|
|
|
<p>In this work, a new method for designing analog circuits in deep sub-micron
CMOS fabrication processes is proposed. The proposed method combines regression
algorithms with the transistor circuit model to size a transistor in a 0.18 um
technology quickly and without using simulation software. Threshold voltage,
output resistance, and the product of mobility and oxide capacitance are the
key parameters of the transistor circuit model used to size a transistor. For
nano-scale transistors, however, these parameters are nonlinear with respect to
the electrical and physical characteristics of the transistor, so a circuit
simulator is needed to find their values, which increases the design time.
Regression analysis is therefore utilized to predict the values of these
parameters. We demonstrate the performance of the proposed method by designing
a Current Feedback Instrumentation Amplifier (CFIA). We show that the presented
method achieves higher than 90% accuracy in predicting the desired value of W
and reduces the design time by over 97% compared to conventional methods. The
circuit designed using the proposed method consumes 5.76 uW of power, has a
Common Mode Rejection Ratio (CMRR) of 35.83 dB, and achieves a gain of 8.17
V/V.
</p>
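<p>To make the workflow concrete, the following is a minimal sketch (our
illustration, not the paper's code) of the sizing idea: a regression model
predicts the threshold voltage and the mobility-oxide-capacitance product, and
the square-law transistor model is then solved for W. The features, training
data, and operating-point values are hypothetical placeholders.</p>
<pre><code>
# Hedged sketch: predict transistor model parameters with regression, then
# size W from the square-law MOSFET equation. Feature names, training data,
# and operating-point values are hypothetical placeholders.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical training set: rows of (L, V_GS, V_DS, I_D) simulated offline;
# targets are (V_th, mu_Cox) extracted from those simulations.
X_train = np.random.rand(200, 4)            # placeholder features
y_train = np.random.rand(200, 2)            # placeholder [V_th, mu_Cox]

model = RandomForestRegressor().fit(X_train, y_train)

def size_width(L, v_gs, v_ds, i_d):
    """Solve the square law I_D = 0.5 * mu_Cox * (W/L) * (V_GS - V_th)^2 for W."""
    v_th, mu_cox = model.predict([[L, v_gs, v_ds, i_d]])[0]
    v_ov = v_gs - v_th                       # overdrive voltage
    return 2 * i_d * L / (mu_cox * v_ov ** 2)

print(size_width(L=0.18e-6, v_gs=0.9, v_ds=0.9, i_d=10e-6))
</code></pre>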
|
|
|
|
<p>Self-supervised learning is the backbone of state-of-the-art language
modeling. It has been argued that training with predictive loss on a
self-supervised dataset causes simulators: entities that internally represent
possible configurations of real-world systems. Under this assumption, a
mathematical model for simulators is built based on the Cartesian frames model
of embedded agents, which is extended to multi-agent worlds by scaling a
two-dimensional frame to arbitrary dimensions, where prior literature instead
uses operations on frames. This variant, which leverages scaled dimensionality,
is named the Cartesian object and is used to represent simulations (where
individual simulacra are the agents and devices in that object). Around the
Cartesian object, functions like token selection and simulation complexity are
accounted for in formalizing the behavior of a simulator, and are used to show
(through the Löbian obstacle) that a proof of alignment between simulacra by
inspection of design is impossible in the simulator context. Following this, a
scheme termed Partial Simulation Extrapolation is proposed, aimed at
circumventing the Löbian obstacle through the evaluation of low-complexity
simulations.
</p>
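<p>To make the dimensional-scaling idea concrete, here is a minimal
illustration (ours, not the paper's formalism): a two-agent Cartesian frame is
an array indexed by (agent choice, environment choice), and the Cartesian
object generalizes this to one axis per simulacrum. The outcome values are toy
placeholders.</p>
<pre><code>
# Minimal illustration (not the paper's formalism): a Cartesian frame as a
# 2D array over (agent options, environment options), scaled to an N-agent
# "Cartesian object" with one axis per simulacrum. Outcome labels are toy values.
import numpy as np

# Classic 2D frame: agent has 2 options, environment has 3; entries are outcomes.
frame = np.arange(6).reshape(2, 3)

# Cartesian object for 3 simulacra with (2, 3, 4) options each: the outcome of
# a world state is indexed by one choice per simulacrum.
cartesian_object = np.arange(2 * 3 * 4).reshape(2, 3, 4)

# A joint choice (one option per simulacrum) selects a single world outcome.
print(cartesian_object[1, 0, 2])
</code></pre>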
|
|
|
|
<p>Fairness has been a popular research topic in recent years. A research topic
closely related to fairness is bias and debiasing. Among the different types of
bias problems, position bias is one of the most widely encountered: recommended
items at the top of a recommendation list have a higher likelihood of being
clicked than items at the bottom of the same list. To mitigate this problem, we
propose a regularization technique to reduce the bias effect. In the experiment
section, we show that our method is superior to other modern algorithms.
</p>
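<p>Since the abstract does not spell out the regularizer, the following is a
hedged sketch of one common pattern consistent with the description: a standard
click-prediction loss plus a penalty that discourages predicted relevance from
tracking display position. The penalty form and weighting are our
assumptions.</p>
<pre><code>
# Hedged sketch of position-bias regularization (one plausible reading of the
# abstract, not the authors' exact formulation): add a penalty that discourages
# predicted relevance from tracking display position.
import torch

def regularized_loss(scores, clicks, positions, lam=0.1):
    """BCE click loss plus a penalty on the score-position correlation."""
    bce = torch.nn.functional.binary_cross_entropy_with_logits(scores, clicks)
    s = scores - scores.mean()
    p = positions.float() - positions.float().mean()
    corr = (s * p).sum() / (s.norm() * p.norm() + 1e-8)
    return bce + lam * corr ** 2   # penalize any linear position effect

scores = torch.randn(32, requires_grad=True)
clicks = torch.randint(0, 2, (32,)).float()
positions = torch.arange(32)
print(regularized_loss(scores, clicks, positions))
</code></pre>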
|
|
|
|
<p>As legal case law databases such as HUDOC continue to grow rapidly, it has
become essential for legal researchers to find efficient methods to handle such
large-scale data sets. Such case law databases usually consist of the textual
content of cases together with the citations between them. This paper focuses
on case law from the European Court of Human Rights on Article 8 of the
European Convention on Human Rights, the right to respect for private and
family life, home and correspondence. In this study, we demonstrate and compare
the potential of topic modelling and citation network analysis to find and
organize case law on Article 8 based on their general themes and citation
patterns, respectively. Additionally, we explore whether combining these two
techniques leads to better results compared to the application of only one of
the methods. We evaluate the effectiveness of the combined method on a unique,
manually collected and annotated dataset of Article 8 case law on evictions.
The results of our experiments show that our combined (text- and
citation-based) approach provides the best results in finding and grouping case
law, providing scholars with an effective way to extract and analyse relevant
cases on a specific issue.
</p>
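<p>A minimal sketch of the combined text-and-citation idea (our illustration
with toy data, not the study's pipeline): blend the cosine similarity of LDA
topic distributions with the Jaccard overlap of citation neighbourhoods.</p>
<pre><code>
# Hedged sketch (our illustration): score case similarity by combining LDA
# topic distributions of judgment texts with overlap in citation neighbourhoods.
# Case texts and citation edges below are toy placeholders.
import networkx as nx
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.metrics.pairwise import cosine_similarity

texts = ["eviction of tenant family home", "surveillance correspondence privacy",
         "eviction housing family life"]                # placeholder case texts
citations = [(0, 2)]                                     # placeholder citations

X = CountVectorizer().fit_transform(texts)
topics = LatentDirichletAllocation(n_components=2, random_state=0).fit_transform(X)
G = nx.Graph(citations)
G.add_nodes_from(range(len(texts)))

def combined_similarity(i, j, alpha=0.5):
    """Blend topic-based and citation-based similarity."""
    text_sim = cosine_similarity(topics[i:i+1], topics[j:j+1])[0, 0]
    ni, nj = set(G[i]) | {i}, set(G[j]) | {j}            # closed neighbourhoods
    cite_sim = len(ni & nj) / len(ni | nj)               # Jaccard overlap
    return alpha * text_sim + (1 - alpha) * cite_sim

print(combined_similarity(0, 2))
</code></pre>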
|
|
|
|
<p>Background: The COVID-19 pandemic has caused severe impacts on health systems
worldwide. Its critical nature and the increased interest of individuals and
organizations to develop countermeasures to the problem has led to a surge of
new studies in scientific journals. Objective: We sought to develop a tool that
incorporates, in a novel way, aspects of Information Retrieval (IR) and
Information Extraction (IE) applied to the COVID-19 Open Research Dataset
(CORD-19). The main focus of this paper is to provide researchers with a better
search tool for COVID-19-related papers, helping them find reference papers and
highlight relevant entities in the text. Method: We applied Latent Dirichlet
Allocation (LDA)
to model, based on research aspects, the topics of all English abstracts in
CORD-19. Relevant named entities of each abstract were extracted and linked to
the corresponding UMLS concept. Regular expressions and the K-Nearest Neighbors
algorithm were used to rank relevant papers. Results: Our tool has shown the
potential to assist researchers by automating a topic-based search of CORD-19
papers. Nonetheless, we identified that more fine-tuned topic modeling
parameters and increased accuracy of the research aspect classifier model could
lead to a more accurate and reliable tool. Conclusion: We emphasize the need for
new automated tools to help researchers find relevant COVID-19 documents, in
addition to automatically extracting useful information contained in them. Our
work suggests that combining different algorithms and models could lead to new
ways of browsing COVID-19 paper data.
</p>
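<p>A hedged sketch of the topic-based search component (toy abstracts, not the
authors' code): fit LDA over abstracts and rank papers for a query by nearest
neighbours in topic space.</p>
<pre><code>
# Hedged sketch of the topic-based search idea (toy data; not the authors'
# code): fit LDA over abstracts, then rank papers for a query by nearest
# neighbours in topic space.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.neighbors import NearestNeighbors

abstracts = ["covid transmission in hospitals", "vaccine trial immune response",
             "hospital ventilation and transmission"]    # placeholder abstracts

vec = CountVectorizer()
X = vec.fit_transform(abstracts)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
topic_vectors = lda.fit_transform(X)

index = NearestNeighbors(n_neighbors=2).fit(topic_vectors)

query = lda.transform(vec.transform(["airborne transmission"]))
distances, paper_ids = index.kneighbors(query)
print(paper_ids)   # indices of the most topically similar abstracts
</code></pre>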
|
|
|
|
<p>The task of predicting conversion rates (CVR) lies at the heart of online
advertising systems aiming to optimize bids to meet advertiser performance
requirements. Even with the recent rise of deep neural networks, these
predictions are often made by factorization machines (FM), especially in
commercial settings where inference latency is key. These models are trained
using the logistic regression framework on labeled tabular data formed from
past user activity that is relevant to the task at hand.
</p>
<p>Many advertisers only care about click-attributed conversions. A major
challenge in training models that predict conversions-given-clicks comes from
data sparsity - clicks are rare, conversions attributed to clicks are even
rarer. However, mitigating sparsity by adding conversions that are not
click-attributed to the training set impairs model calibration. Since
calibration is critical to achieving advertiser goals, this is infeasible.
</p>
<p>In this work we apply the well-known idea of self-supervised pre-training,
using an auxiliary auto-encoder model trained on all conversion events, both
click-attributed and not, as a feature extractor to enrich the main CVR
prediction model. Since the main model does not train on non-click-attributed
conversions, this does not impair calibration. We adapt the basic
self-supervised pre-training idea to our online advertising setup by using a
loss function designed for tabular data, facilitating continual learning by
ensuring auto-encoder stability, and incorporating a neural network into a
large-scale real-time ad auction that ranks tens of thousands of ads, under
strict latency constraints, and without incurring a major engineering cost. We
show improvements both offline, during training, and in an online A/B test.
Following its success in A/B tests, our solution is now fully deployed to the
Yahoo native advertising system.
</p>
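<p>The enrichment pattern described above can be sketched as follows (our
minimal illustration with placeholder tabular features, not the production
system): pre-train an auto-encoder on all conversion events, then feed its
frozen encoder output to the main CVR model, which trains only on
click-attributed data.</p>
<pre><code>
# Hedged sketch of the enrichment pattern described above (ours, not the
# production code): pre-train an auto-encoder on all conversion events, then
# feed its frozen encoder output to the main CVR model, which trains only on
# click-attributed data.
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(64, 16), nn.ReLU())
decoder = nn.Sequential(nn.Linear(16, 64))

all_conversions = torch.randn(1024, 64)       # placeholder tabular features
opt = torch.optim.Adam([*encoder.parameters(), *decoder.parameters()])
for _ in range(100):                          # self-supervised pre-training
    opt.zero_grad()
    recon = decoder(encoder(all_conversions))
    loss = nn.functional.mse_loss(recon, all_conversions)
    loss.backward()
    opt.step()

# Main model: a linear head over raw features + frozen encoder features.
cvr_head = nn.Linear(64 + 16, 1)
clicked_x = torch.randn(128, 64)              # click-attributed examples only
labels = torch.randint(0, 2, (128, 1)).float()
with torch.no_grad():
    enriched = torch.cat([clicked_x, encoder(clicked_x)], dim=1)
logits = cvr_head(enriched)
print(nn.functional.binary_cross_entropy_with_logits(logits, labels))
</code></pre>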
|
|
|
|
<p>Within-basket recommendation (WBR) refers to the task of recommending items
to complete a non-empty shopping basket during a shopping session.
While the latest innovations in this space demonstrate remarkable performance
improvement on benchmark datasets, they often overlook the complexity of user
behaviors in practice, such as 1) co-existence of multiple shopping intentions,
2) multi-granularity of such intentions, and 3) interleaving behavior
(switching intentions) in a shopping session. This paper presents Neural
Pattern Associator (NPA), a deep item-association-mining model that explicitly
models the aforementioned factors. Specifically, inspired by vector
quantization, the NPA model learns to encode common user intentions (or
item-combination patterns) as quantized representations (a.k.a. codebook),
which permits identification of users' shopping intentions via
attention-driven lookup during the reasoning phase. This yields coherent and
self-interpretable recommendations. We evaluated the proposed NPA model across
multiple extensive datasets, encompassing the domains of grocery e-commerce
(shopping basket completion) and music (playlist extension), where our
quantitative evaluations show that the NPA model significantly outperforms a
wide range of existing WBR solutions, reflecting the benefit of explicitly
modeling complex user intentions.
</p>
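<p>A minimal sketch of the attention-driven codebook lookup as we read it (toy
dimensions, not the authors' implementation): a basket embedding soft-attends
over a learned codebook of intention vectors.</p>
<pre><code>
# Minimal sketch of attention-driven codebook lookup (our reading of the NPA
# idea, not the authors' implementation; dimensions are toy values).
import torch
import torch.nn as nn

n_codes, dim = 32, 64
codebook = nn.Parameter(torch.randn(n_codes, dim))   # learned intention patterns

def lookup_intentions(basket_embedding):
    """Soft-attend over the codebook to identify active shopping intentions."""
    attn = torch.softmax(basket_embedding @ codebook.T / dim ** 0.5, dim=-1)
    return attn @ codebook        # blended intention representation

basket = torch.randn(4, dim)      # batch of basket embeddings
print(lookup_intentions(basket).shape)   # -> torch.Size([4, 64])
</code></pre>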
|
|
|
|
<p>An adaptive control approach for a three-phase grid-interfaced solar
photovoltaic system, based on a novel Adaptive Neuro-Fuzzy Inference System
with Rain Optimization Algorithm (ANROA) methodology, is proposed and discussed
in this manuscript. This method incorporates an Adaptive Neuro-Fuzzy Inference
System (ANFIS) with a Rain Optimization Algorithm (ROA). The ANFIS controller
has excellent maximum power point tracking capability because it includes
features of both neural and fuzzy techniques. The ROA technique is in charge of
controlling the voltage source converter switching. The major goal is to avoid
power quality problems, including voltage fluctuations, harmonics, and flicker,
as well as unbalanced loads and reactive power usage. Besides, the proposed
method operates in zero voltage regulation and unity power factor modes. The
suggested control approach has been modeled and simulated, and its performance
has been assessed against existing alternative methods. A statistical analysis
of the proposed and existing techniques has also been presented and discussed.
The simulation results demonstrate that, when compared to alternative
approaches, the suggested strategy can properly and effectively identify the
best global solutions. Furthermore, the system's robustness has been studied in
the MATLAB/SIMULINK environment and experimentally with a Field Programmable
Gate Array (FPGA)-based Hardware-in-the-Loop (HIL) setup.
</p>
|
|
|
|
<p>The Burrows-Wheeler Transform (BWT) is a string transformation technique
widely used in areas such as bioinformatics and file compression. Many
applications combine a run-length encoding (RLE) with the BWT in a way which
preserves the ability to query the compressed data efficiently. However, these
methods may not take full advantage of the compressibility of the BWT as they
do not modify the alphabet ordering for the sorting step embedded in computing
the BWT. Indeed, any such alteration of the alphabet ordering can have a
considerable impact on the output of the BWT, in particular on the number of
runs. For an alphabet $\Sigma$ containing $\sigma$ characters, the space of all
alphabet orderings is of size $\sigma!$. While for small alphabets an
exhaustive investigation is possible, finding the optimal ordering for larger
alphabets is not feasible. Therefore, there is a need for a more informed
search strategy than brute-force sampling the entire space, which motivates a
new heuristic approach. In this paper, we explore the non-trivial cases for the
problem of minimizing the size of a run-length encoded BWT (RLBWT) via
selecting a new ordering for the alphabet. We show that random sampling of the
space of alphabet orderings usually gives sub-optimal orderings for compression
and that a local search strategy can provide a large improvement in relatively
few steps. We also inspect a selection of initial alphabet orderings, including
ASCII, letter appearance, and letter frequency. While this alphabet ordering
problem is computationally hard, we demonstrate gains in compressibility.
</p>
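<p>The following is a minimal sketch of the search procedure described above
(our illustration on a toy string): compute the BWT under a given alphabet
ordering, count runs, and hill-climb over orderings by swapping pairs of
characters.</p>
<pre><code>
# Minimal sketch (our illustration, not the paper's code) of searching alphabet
# orderings to reduce the number of runs in the BWT of a small string.
from itertools import combinations

def bwt(text, order):
    # Rank characters by the given alphabet ordering; "\0" is a sentinel.
    rank = {c: i for i, c in enumerate(order)}
    rank["\0"] = -1
    t = text + "\0"
    rotations = [t[i:] + t[:i] for i in range(len(t))]
    rotations.sort(key=lambda r: [rank[c] for c in r])
    return "".join(r[-1] for r in rotations)

def num_runs(s):
    return 1 + sum(1 for a, b in zip(s, s[1:]) if a != b)

def local_search(text, order):
    # Hill-climb: swap two characters in the ordering while runs decrease.
    order, best = list(order), num_runs(bwt(text, order))
    improved = True
    while improved:
        improved = False
        for i, j in combinations(range(len(order)), 2):
            order[i], order[j] = order[j], order[i]
            runs = num_runs(bwt(text, order))
            if runs < best:
                best, improved = runs, True
            else:
                order[i], order[j] = order[j], order[i]  # undo the swap
    return "".join(order), best

text = "banana_bandana"
print(local_search(text, sorted(set(text))))
</code></pre>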
|
|
|
|
<p>Weather radar is the primary tool used by forecasters to detect and warn for
tornadoes in near-real time. In order to assist forecasters in warning the
public, several algorithms have been developed to automatically detect tornadic
signatures in weather radar observations. Recently, Machine Learning (ML)
algorithms, which learn directly from large amounts of labeled data, have been
shown to be highly effective for this purpose. Since tornadoes are extremely
rare events within the corpus of all available radar observations, the
selection and design of training datasets for ML applications is critical for
the performance, robustness, and ultimate acceptance of ML algorithms. This
study introduces a new benchmark dataset, TorNet, to support the development of
ML algorithms for tornado detection and prediction. TorNet contains
full-resolution, polarimetric, Level-II WSR-88D data sampled from 10 years of
reported storm events. A number of ML baselines for tornado detection are
developed and compared, including a novel deep learning (DL) architecture
capable of processing raw radar imagery without the need for manual feature
extraction required for existing ML algorithms. Despite not benefiting from
manual feature engineering or other preprocessing, the DL model shows increased
detection performance compared to non-DL and operational baselines. The TorNet
dataset, as well as source code and model weights of the DL baseline trained in
this work, are made freely available.
</p>
|
|
|
|
<p>Deep learning models like Transformers and Convolutional Neural Networks
(CNNs) have revolutionized various domains, but their parameter-intensive
nature hampers deployment in resource-constrained settings. In this paper, we
introduce a novel concept that utilizes the column space and row space of weight
matrices, which allows for a substantial reduction in model parameters without
compromising performance. Leveraging this paradigm, we achieve
parameter-efficient deep learning models. Our approach applies to both
Bottleneck and Attention layers, effectively halving the parameters while
incurring only minor performance degradation. Extensive experiments conducted
on the ImageNet dataset with ViT and ResNet50 demonstrate the effectiveness of
our method, showcasing competitive performance when compared to traditional
models. This approach not only addresses the pressing demand for
parameter-efficient deep learning solutions but also holds great promise for
practical
deployment in real-world scenarios.
</p>
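<p>The abstract does not detail the construction, so the following is a hedged,
generic sketch of the family of ideas it points to: factorizing a dense layer
through a lower-dimensional subspace spanned by a reduced set of row- and
column-space directions, roughly halving its parameters.</p>
<pre><code>
# Hedged, generic sketch (not the authors' exact construction): replace a dense
# d x d weight matrix with a rank-r factorization through a smaller subspace,
# cutting parameters from d*d to about half.
import torch.nn as nn

def factorized_linear(d, r):
    """Map input through an r-dimensional subspace instead of a full d x d map."""
    return nn.Sequential(
        nn.Linear(d, r, bias=False),   # project onto r row-space directions
        nn.Linear(r, d, bias=True),    # expand back via r column-space directions
    )

dense = nn.Linear(512, 512)
compact = factorized_linear(512, 128)
count = lambda m: sum(p.numel() for p in m.parameters())
print(count(dense), count(compact))    # 262656 vs. 131584 parameters
</code></pre>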
|
|
|
|
<p>We study the problem of auditing classifiers with the notion of statistical
subgroup fairness. Kearns et al. (2018) have shown that the problem of auditing
combinatorial subgroup fairness is as hard as agnostic learning. Essentially
all work on remedying statistical measures of discrimination against subgroups
assumes access to an oracle for this problem, despite the fact that no
efficient algorithms are known for it. If we assume the data distribution is
Gaussian, or even merely log-concave, then a recent line of work has discovered
efficient agnostic learning algorithms for halfspaces. Unfortunately, the
boosting-style reductions given by Kearns et al. required the agnostic learning
algorithm to succeed on reweighted distributions that may not be log-concave,
even if the original data distribution was. In this work, we give positive and
negative results on auditing for the Gaussian distribution: On the positive
side, we give an alternative approach to leverage these advances in agnostic
learning and thereby obtain the first polynomial-time approximation scheme
(PTAS) for auditing nontrivial combinatorial subgroup fairness: we show how to
audit statistical notions of fairness over homogeneous halfspace subgroups when
the features are Gaussian. On the negative side, we find that under
cryptographic assumptions, no polynomial-time algorithm can guarantee any
nontrivial auditing, even under Gaussian feature distributions, for general
halfspace subgroups.
</p>
|
|
|
|
<p>There has been considerable recent interest in scoring properties on the
basis of eviction risk. The success of methods for eviction prediction is
typically evaluated using different measures of predictive accuracy. However,
the underlying goal of such prediction is to direct appropriate assistance to
households that may be at greater risk so they remain stably housed. Thus, we
must ask the question of how useful such predictions are in targeting outreach
efforts - informing action. In this paper, we investigate this question using a
novel dataset that matches information on properties, evictions, and owners. We
perform an eviction prediction task to produce risk scores and then use these
risk scores to plan targeted outreach policies. We show that the risk scores
are, in fact, useful, enabling a theoretical team of caseworkers to reach more
eviction-prone properties in the same amount of time, compared to outreach
policies that are either neighborhood-based or focus on buildings with a recent
history of evictions. We also discuss the importance of neighborhood and
ownership features in both risk prediction and targeted outreach.
</p>
|
|
|
|
<p>Over the past years, a large number of fake news detection algorithms based
on deep learning have emerged. However, they are often developed under
different frameworks, each mandating distinct utilization methodologies,
consequently hindering reproducibility. Additionally, a substantial amount of
redundancy characterizes the code development of such fake news detection
models. To address these concerns, we propose FaKnow, a unified and
comprehensive fake news detection algorithm library. It encompasses a variety
of widely used fake news detection models, categorized as content-based and
social context-based approaches. This library covers the full spectrum of the
model training and evaluation process, effectively organizing the data, models,
and training procedures within a unified framework. Furthermore, it furnishes a
series of auxiliary functionalities and tools, including visualization and
logging. Our work contributes to the standardization and unification of fake
news detection research, concurrently facilitating the endeavors of researchers
in this field. The open-source code and documentation can be accessed at
https://github.com/NPURG/FaKnow and https://faknow.readthedocs.io,
respectively.
</p>
|
|
|
|
<p>As VR devices become more prevalent in the consumer space, VR applications
are likely to be increasingly used by users unfamiliar with VR. Detecting the
familiarity level of a user with VR as an interaction medium provides the
potential of providing on-demand training for acclimatization and prevents the
user from being burdened by the VR environment in accomplishing their tasks. In
this work, we present preliminary results of using deep classifiers to conduct
automatic detection of familiarity with VR by using hand tracking of the user
as they interact with a numeric passcode entry panel to unlock a VR door. We
use a VR door as we envision it to be the first point of entry to collaborative
virtual spaces, such as meeting rooms, offices, or clinics. Users who are
unfamiliar with VR will have used their hands to open doors with passcode entry
panels in the real world. Thus, while the user may not be familiar with VR,
they would be familiar with the task of opening the door. Using a pilot dataset
consisting of 7 users familiar with VR and 7 not familiar with VR, we achieve
the highest accuracy of 88.03\% when 6 test users, 3 familiar and 3 not
familiar, are evaluated with classifiers trained using data from the remaining
8 users.
Our results indicate potential for using user movement data to detect
familiarity for the simple yet important task of secure passcode-based access.
</p>
|
|
|
|
<p>Existing game AI research mainly focuses on enhancing agents' abilities to
win games, but this does not inherently give humans a better experience when
collaborating with these agents. For example, agents may dominate the
collaboration and exhibit unintended or detrimental behaviors, leading to poor
experiences for their human partners. In other words, most game AI agents are
modeled in a "self-centered" manner. In this paper, we propose a
"human-centered" modeling scheme for collaborative agents that aims to enhance
the experience of humans. Specifically, we model the experience of humans as
the goals they expect to achieve during the task. We expect that agents should
learn to enhance the extent to which humans achieve these goals while
maintaining agents' original abilities (e.g., winning games). To achieve this,
we propose the Reinforcement Learning from Human Gain (RLHG) approach. The RLHG
approach introduces a "baseline", which corresponds to the extent to which
humans achieve their goals on their own, and encourages agents to learn
behaviors that help humans achieve their goals more effectively.
We evaluate the RLHG agent in the popular Multi-player Online Battle Arena
(MOBA) game, Honor of Kings, by conducting real-world human-agent tests. Both
objective performance and subjective preference results show that the RLHG
agent provides participants with a better gaming experience.
</p>
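<p>A minimal sketch of the "human gain" idea as we read it (the baseline
estimate, goal metric, and weighting are hypothetical): the agent is rewarded
for the human's goal achievement only in excess of the human's solo baseline,
on top of the original environment reward.</p>
<pre><code>
# Minimal sketch of a "human gain" shaped reward (our reading of the RLHG idea;
# the baseline estimate, goal metric, and weighting are hypothetical).
def shaped_reward(env_reward, human_goal_score, baseline_goal_score, beta=0.5):
    """Reward the agent for human goal achievement beyond the solo baseline,
    while keeping the original environment reward (e.g., winning)."""
    human_gain = max(0.0, human_goal_score - baseline_goal_score)
    return env_reward + beta * human_gain

# Example: the human achieved 0.8 of a goal they would normally achieve 0.5 of.
print(shaped_reward(env_reward=1.0, human_goal_score=0.8, baseline_goal_score=0.5))
</code></pre>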
|
|
|
|
<p>Large language models (LLMs), as epitomized by models like ChatGPT, have
revolutionized the field of natural language processing (NLP). Along with this
trend, code-based large language models such as StarCoder, WizardCoder, and
CodeLlama have emerged, trained extensively on vast repositories of code data.
Yet, by design, these models primarily focus on generative tasks such as code
generation, code completion, and comment generation, and on general support for
multiple programming languages. While the generic abilities of code
LLMs are useful for many programmers, the area of high-performance computing
(HPC) has a narrower set of requirements that make a smaller and more
domain-specific LM a smarter choice. This paper introduces OMPGPT, a novel
model meticulously designed to harness the inherent strengths of language
models for OpenMP pragma generation. Furthermore, we adopt and adapt prompt
engineering techniques from the NLP domain to create chain-of-OMP, an
innovative strategy designed to enhance OMPGPT's effectiveness. Our extensive
evaluations demonstrate that OMPGPT outperforms existing large language models
specialized in OpenMP tasks and maintains a notably smaller size, aligning it
more closely with the typical hardware constraints of HPC environments. We
consider our contribution as a pivotal bridge, connecting the advantage of
language models with the specific demands of HPC tasks. The success of OMPGPT
lays a solid foundation, suggesting its potential applicability and
adaptability to a wider range of HPC tasks, thereby opening new avenues in the
field of computational efficiency and effectiveness.
</p>
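<p>Chain-of-OMP is not specified in detail here, so the following is a hedged
sketch of one way such an iterative prompting strategy could look (the prompt
wording, step order, and the canned generate() stand-in are our inventions, not
OMPGPT's interface): each step feeds the previously generated pragma fragment
back into the prompt to refine it.</p>
<pre><code>
# Hedged sketch of an iterative "chain-of-OMP"-style prompting loop (the prompt
# wording, step order, and generate() stand-in are our invention, not OMPGPT's).
CODE = """for (int i = 0; i < n; i++)
    a[i] = b[i] * c[i];"""

def generate(prompt):
    # Stand-in for a call to a code LLM; returns canned pragma fragments here.
    return {"Step 1": "#pragma omp parallel",
            "Step 2": "#pragma omp parallel for",
            "Step 3": "#pragma omp parallel for schedule(static)"}[prompt[:6]]

pragma = ""
for step in range(1, 4):
    prompt = (f"Step {step}: given the loop below and the partial pragma "
              f"'{pragma}', extend the OpenMP pragma.\n{CODE}")
    pragma = generate(prompt)
print(pragma)   # -> "#pragma omp parallel for schedule(static)"
</code></pre>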
|
|
|
|
<p>Fast and reliable transmission network reconfiguration is critical in
improving power grid resilience to cyber-attacks. If the network
reconfiguration following cyber-attacks is imperfect, secondary incidents may
delay or interrupt post-attack restoration of the power grid. This paper
proposes a framework of resilient transmission network reconfiguration, taking
into account the impacts of cyber-attacks in the network reconfiguration
process. First, the mechanism of cyber-attack propagation is analyzed based on
the characteristics of network reconfiguration. Second, systematic resilience
indices are specially extracted in which the impact of cyber-attacks on network
reconfiguration is quantified. These indices are defined in terms of the
restoration characteristics of the transmission power system. Third,
representative cyber-attack incidents motivate an optimization-based model of
resilient transmission network reconfiguration, and an optimal reconstruction
scheme is obtained. Finally, simulation results based on the IEEE 39-bus system
verify the feasibility and effectiveness of the proposed framework in enhancing
power grid resilience to cyber-attacks.
</p>
|
|
|
|
<p>This paper presents LLM4SecHW, a novel framework for hardware debugging that
leverages domain-specific Large Language Models (LLMs). Despite the success of
LLMs in automating various software development tasks, their application in the
hardware security domain has been limited due to the constraints of commercial
LLMs and the scarcity of domain-specific data. To address these challenges, we
propose a unique approach to compile a dataset of open-source hardware design
defects and their remediation steps, utilizing version control data. This
dataset provides a substantial foundation for training machine learning models
for hardware. LLM4SecHW employs fine-tuning of medium-sized LLMs on this
dataset, enabling the identification and rectification of bugs in hardware
designs. This pioneering approach offers a reference workflow for fine-tuning
domain-specific LLMs in other research areas. We evaluate the performance of
our proposed system on various open-source hardware designs, demonstrating its
efficacy in accurately identifying and correcting defects. Our work brings a
new perspective on automating the quality control process in hardware design.
</p>
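<p>The dataset construction is described only at a high level; below is a
hedged sketch of how version-control mining for defect/fix pairs is commonly
done (the repository path, keyword heuristic, and output format are our
assumptions, not the authors' pipeline).</p>
<pre><code>
# Hedged sketch of mining (defect, fix) pairs from version control, a common
# pattern for building such datasets (repo path, keywords, and the pairing
# heuristic are our assumptions, not the authors' pipeline).
import subprocess

def bugfix_commits(repo, keywords=("fix", "bug", "error")):
    """Return hashes of commits whose messages suggest a hardware defect fix."""
    log = subprocess.run(["git", "-C", repo, "log", "--pretty=%H %s"],
                         capture_output=True, text=True, check=True).stdout
    return [line.split()[0] for line in log.splitlines()
            if any(k in line.lower() for k in keywords)]

def defect_fix_pair(repo, commit):
    """The diff of a bug-fix commit pairs defective code with its remediation."""
    return subprocess.run(["git", "-C", repo, "show", commit],
                          capture_output=True, text=True, check=True).stdout

for c in bugfix_commits("path/to/hw-design-repo")[:5]:   # placeholder repo path
    print(defect_fix_pair("path/to/hw-design-repo", c)[:200])
</code></pre>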
|
|
|
|
<p>Digital Twins (DT) have become crucial to achieve sustainable and effective
smart urban solutions. However, current DT modelling techniques cannot support
the dynamicity of these smart city environments. This is caused by the lack of
right-time data capturing in traditional approaches, resulting in inaccurate
modelling and high resource and energy consumption challenges. To fill this
gap, we explore spatiotemporal graphs and propose the Reinforcement
Learning-based Adaptive Twinning (RL-AT) mechanism with Deep Q-Networks (DQN).
By doing so, our study contributes to advancing Green Cities and showcases
tangible benefits in accuracy, synchronisation, resource optimization, and
energy efficiency. As a result, we note that the spatiotemporal graphs offer
consistent accuracy and 55% higher querying performance when
implemented using graph databases. In addition, our model demonstrates
right-time data capturing with 20% lower overhead and 25% lower energy
consumption.
</p>
|
|
|
|
<p>With the increasing need for inclusive and user-friendly technology, web
accessibility is crucial to ensuring equal access to online content for
individuals with disabilities, including visual, auditory, cognitive, or motor
impairments. Despite the existence of accessibility guidelines and standards
such as the Web Content Accessibility Guidelines (WCAG) from the W3C's Web
Accessibility Initiative (WAI), over 90\% of websites still fail to meet the
necessary
accessibility requirements. For web users with disabilities, there exists a
need for a tool to automatically fix web page accessibility errors. While
research has demonstrated methods to find and target accessibility errors, no
research has focused on effectively correcting such violations. This paper
presents a novel approach to correcting accessibility violations on the web by
modifying the document object model (DOM) in real time with foundation models.
Leveraging accessibility error information, large language models (LLMs), and
prompt engineering techniques, we achieved greater than a 51\% reduction in
accessibility violation errors after corrections on our novel benchmark:
ACCESS. Our work demonstrates a valuable approach toward the direction of
inclusive web content, and provides directions for future research to explore
advanced methods to automate web accessibility.
</p>
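<p>As a flavor of real-time DOM correction, here is a hedged toy sketch (ours,
not the ACCESS pipeline): find a common WCAG violation, ask an LLM for a
repair, and patch the DOM; ask_llm is a placeholder for a real model call.</p>
<pre><code>
# Hedged sketch of LLM-driven DOM repair (our toy example, not the authors'
# pipeline; ask_llm is a placeholder for a real LLM call).
from bs4 import BeautifulSoup

def ask_llm(prompt):
    return "Company logo"          # placeholder for a real LLM completion

def fix_missing_alt(html):
    """Repair one common WCAG violation: <img> elements without alt text."""
    soup = BeautifulSoup(html, "html.parser")
    for img in soup.find_all("img"):
        if not img.get("alt"):
            context = str(img.parent)[:500]   # give the LLM surrounding DOM
            img["alt"] = ask_llm(f"Write alt text for this image: {context}")
    return str(soup)

print(fix_missing_alt('<div><img src="logo.png"></div>'))
</code></pre>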
|
|
|
|
<p>Offline reinforcement learning (RL) algorithms can improve decision making
by stitching sub-optimal trajectories to obtain more optimal ones. This
capability is a crucial factor in enabling RL to learn policies that are
superior to the behavioral policy. On the other hand, the Decision Transformer
(DT) abstracts decision-making as sequence modeling and shows competitive
performance on offline RL benchmarks; however, recent studies demonstrate that
DT lacks stitching capability, so endowing DT with this capability is vital to
further improving its performance. To endow DT with stitching capability, we
abstract trajectory stitching as expert matching and
introduce our approach, ContextFormer, which integrates contextual
information-based imitation learning (IL) and sequence modeling to stitch
sub-optimal trajectory fragments by emulating the representations of a limited
number of expert trajectories. To validate our claim, we conduct experiments
from two perspectives: 1) We conduct extensive experiments on D4RL benchmarks
under the settings of IL, and experimental results demonstrate ContextFormer
can achieve competitive performance in multi-IL settings. 2) More importantly,
we conduct a comparison of ContextFormer with diverse competitive DT variants
using identical training datasets. The experimental results show that
ContextFormer outperforms all of these variants.
</p>
|
|
|
|
<p>Long-term traffic prediction has always been a challenging task due to its
dynamic temporal dependencies and complex spatial dependencies. In this paper,
we propose a model that combines hybrid Transformer and spatio-temporal
self-supervised learning. The model enhances its robustness by applying
adaptive data augmentation techniques at the sequence and graph levels of the
traffic data. It utilizes a Transformer to overcome the limitations of
recurrent neural networks in capturing long-term sequences, and employs
Chebyshev polynomial graph convolution to capture complex spatial dependencies.
Furthermore, considering the impact of spatio-temporal heterogeneity on traffic
speed, we design two self-supervised learning tasks to model the temporal and
spatial heterogeneity, thereby improving the accuracy and generalization
ability of the model. Experimental evaluations are conducted on two real-world
datasets, PeMS04 and PeMS08, and the results are visualized and analyzed,
demonstrating the superior performance of the proposed model.
</p>
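<p>To make the Chebyshev graph convolution concrete, here is a minimal sketch
of the standard operator (our illustration with a toy graph and random filter
weights): filter a graph signal as $\sum_k \theta_k T_k(\tilde{L}) x$ using the
recursion $T_0 = I$, $T_1 = \tilde{L}$, $T_k = 2\tilde{L}T_{k-1} - T_{k-2}$.</p>
<pre><code>
# Minimal numpy sketch of the standard K-order Chebyshev graph convolution
# (our illustration of the cited operator, with a toy graph and toy weights).
import numpy as np

A = np.array([[0, 1, 0],      # toy adjacency matrix (3-node path graph)
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
D = np.diag(A.sum(axis=1))
L = D - A                                       # combinatorial Laplacian
lmax = np.linalg.eigvalsh(L).max()
L_tilde = 2 * L / lmax - np.eye(3)              # rescale spectrum to [-1, 1]

def cheb_conv(x, theta):
    """y = sum_k theta_k * T_k(L_tilde) @ x via the Chebyshev recursion."""
    Tx = [x, L_tilde @ x]                       # T_0 x = x, T_1 x = L_tilde x
    for _ in range(2, len(theta)):
        Tx.append(2 * L_tilde @ Tx[-1] - Tx[-2])
    return sum(t * txk for t, txk in zip(theta, Tx))

x = np.array([1.0, 0.0, -1.0])                  # signal on the 3 nodes
print(cheb_conv(x, theta=[0.5, 0.3, 0.2]))
</code></pre>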
|
|
|
|
<p>An effective multi-turn instruction-following assistant can be developed by
creating a simulator that can generate useful interaction data. Apart from
relying on its intrinsic weights, an ideal user simulator should also be able
to bootstrap external knowledge rapidly in its raw form to simulate the
multifarious diversity of text available over the internet. Previous user
simulators generally lacked diversity, were mostly closed-domain, and
necessitated rigid schemas, making them inefficient to scale rapidly to
incorporate external knowledge. In this regard, we introduce Kaucus, a
Knowledge-Augmented User Simulator framework, which outlines a process for
creating diverse user simulators that can seamlessly exploit external knowledge
and benefit downstream assistant model training. Through two GPT-J-based
simulators, viz. a Retrieval Augmented Simulator and a Summary Controlled
Simulator, we generate diverse simulator-assistant interactions. Through reward
and preference model-based evaluations, we find that these interactions serve
as useful training data and create more helpful downstream assistants. We also
find that incorporating knowledge through retrieval augmentation or summary
control helps create better assistants.
</p>
|
|
|
|
<p>Recently, efficient Vision Transformers have shown great performance with low
latency on resource-constrained devices. Conventionally, they use 4x4 patch
embeddings and a 4-stage structure at the macro level, while utilizing
sophisticated attention with multi-head configuration at the micro level. This
paper aims to address computational redundancy at all design levels in a
memory-efficient manner. We discover that using a larger-stride patchify stem
not only reduces memory access costs but also achieves competitive performance
by
leveraging token representations with reduced spatial redundancy from the early
stages. Furthermore, our preliminary analyses suggest that attention layers in
the early stages can be substituted with convolutions, and several attention
heads in the latter stages are computationally redundant. To handle this, we
introduce a single-head attention module that inherently prevents head
redundancy and simultaneously boosts accuracy by parallelly combining global
and local information. Building upon our solutions, we introduce SHViT, a
Single-Head Vision Transformer that obtains the state-of-the-art speed-accuracy
tradeoff. For example, on ImageNet-1k, our SHViT-S4 is 3.3x, 8.1x, and 2.4x
faster than MobileViTv2 x1.0 on GPU, CPU, and iPhone12 mobile device,
respectively, while being 1.3% more accurate. For object detection and instance
segmentation on MS COCO using Mask-RCNN head, our model achieves performance
comparable to FastViT-SA12 while exhibiting 3.8x and 2.0x lower backbone
latency on GPU and mobile device, respectively.
</p>
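<p>As a schematic of the single-head design (our simplified sketch, not the
exact SHViT module): one attention head provides global context and is combined
in parallel with a depthwise convolution providing local information.</p>
<pre><code>
# Simplified sketch of a single-head attention block that combines global
# attention with a parallel local (depthwise-conv) branch. This is our
# schematic of the idea, not the exact SHViT module.
import torch
import torch.nn as nn

class SingleHeadAttention(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.qkv = nn.Linear(dim, dim * 3)
        self.local = nn.Conv1d(dim, dim, 3, padding=1, groups=dim)  # depthwise
        self.proj = nn.Linear(dim * 2, dim)
        self.scale = dim ** -0.5

    def forward(self, x):                       # x: (batch, tokens, dim)
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        attn = (q @ k.transpose(-2, -1) * self.scale).softmax(dim=-1)
        global_out = attn @ v                   # one head: no head-splitting
        local_out = self.local(x.transpose(1, 2)).transpose(1, 2)
        return self.proj(torch.cat([global_out, local_out], dim=-1))

print(SingleHeadAttention(64)(torch.randn(2, 49, 64)).shape)  # (2, 49, 64)
</code></pre>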
|
|
|
|
<p>Bias mitigation of Language Models has been the topic of many studies with a
recent focus on learning separate modules like adapters for on-demand
debiasing. Besides optimizing for a modularized debiased model, it is often
critical in practice to control the degree of bias reduction at inference time,
e.g., in order to tune for a desired performance-fairness trade-off in search
results or to control the strength of debiasing in classification tasks. In
this paper, we introduce Controllable Gate Adapter (ConGater), a novel modular
gating mechanism with adjustable sensitivity parameters, which allows for a
gradual transition from the biased state of the model to the fully debiased
version at inference time. We demonstrate ConGater's performance by (1)
conducting adversarial debiasing experiments with three different models on
three classification tasks with four protected attributes, and (2) reducing the
bias of search results through fairness list-wise regularization to enable
adjusting a trade-off between performance and fairness metrics. Our experiments
on the classification tasks show that compared to baselines of the same
caliber, ConGater can maintain higher task performance while containing less
information regarding the attributes. Our results on the retrieval task show
that the fully debiased ConGater can achieve the same fairness performance
while maintaining task performance more than twice as high as that of recent
strong baselines. Overall, besides its strong performance, ConGater enables
continuous
transitioning between biased and debiased states of models, enhancing
personalization of use and interpretability through controllability.
</p>
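<p>The core mechanism, as we read it, can be sketched as a
sensitivity-controlled interpolation between the original and a debiased
representation (a minimal sketch; the gate form and parameter names are our
assumptions, not ConGater's exact formulation).</p>
<pre><code>
# Minimal sketch of a controllable debiasing gate (our reading of the idea;
# the gate form and parameter names are assumptions, not ConGater's exact math).
import torch
import torch.nn as nn

class ControllableGate(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.debias = nn.Linear(dim, dim)     # learned debiasing transform

    def forward(self, h, omega):
        """omega in [0, 1]: 0 keeps the biased state, 1 is fully debiased."""
        return (1 - omega) * h + omega * self.debias(h)

gate = ControllableGate(16)
h = torch.randn(4, 16)
for omega in (0.0, 0.5, 1.0):                 # adjustable at inference time
    print(omega, gate(h, omega).norm().item())
</code></pre>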
|
|
|
|
<p>Peer-to-peer (P2P) lending has emerged as a distinctive financing mechanism,
linking borrowers with lenders through online platforms. However, P2P lending
faces the challenge of information asymmetry, as lenders often lack sufficient
data to assess the creditworthiness of borrowers. This paper proposes a novel
approach to address this issue by leveraging the textual descriptions provided
by borrowers during the loan application process. Our methodology involves
processing these textual descriptions using a Large Language Model (LLM), a
powerful tool capable of discerning patterns and semantics within the text.
Transfer learning is applied to adapt the LLM to the specific task at hand.
</p>
<p>Our results, derived from analysis of the Lending Club dataset, show that
the risk score generated by BERT, a widely used LLM, significantly improves the
performance of credit risk classifiers. However, the inherent opacity of
LLM-based systems, coupled with uncertainties about potential biases,
underscores critical considerations for regulatory frameworks and engenders
trust-related concerns among end-users, opening new avenues for future research
in the dynamic landscape of P2P lending and artificial intelligence.
</p>
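<p>A minimal sketch of the described transfer-learning setup (toy texts and
labels, not the paper's training code): embed borrower descriptions with a
pre-trained BERT and feed the embeddings to a downstream credit-risk
classifier.</p>
<pre><code>
# Minimal sketch of the described pipeline (toy texts/labels, not the paper's
# code): embed loan descriptions with pre-trained BERT, then train a classifier.
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.linear_model import LogisticRegression

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")

texts = ["Consolidating credit card debt, stable job of 10 years.",
         "Need funds fast, between jobs right now."]    # toy descriptions
labels = [0, 1]                                          # toy default labels

with torch.no_grad():
    batch = tok(texts, padding=True, truncation=True, return_tensors="pt")
    features = bert(**batch).last_hidden_state[:, 0]     # [CLS] embeddings

clf = LogisticRegression().fit(features.numpy(), labels)
print(clf.predict_proba(features.numpy())[:, 1])         # risk scores
</code></pre>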
|
|
|
|
<p>The remarkable prowess of diffusion models in image generation has spurred
efforts to extend their application beyond generative tasks. However, a
persistent challenge is the lack of a unified approach for applying diffusion
models to visual perception tasks with diverse semantic granularity
requirements. Our purpose is to establish a unified visual perception
framework, capitalizing on the potential synergies between generative and
discriminative models. In this paper, we propose Vermouth, a simple yet
effective framework comprising a pre-trained Stable Diffusion (SD) model
containing rich generative priors, a unified head (U-head) capable of
integrating hierarchical representations, and an adapted expert providing
discriminative priors. Comprehensive investigations unveil potential
characteristics of Vermouth, such as varying granularity of perception
concealed in latent variables at distinct time steps and various U-net stages.
We emphasize that there is no necessity for incorporating a heavyweight or
intricate decoder to transform diffusion models into potent representation
learners. Extensive comparative evaluations against tailored discriminative
models showcase the efficacy of our approach on zero-shot sketch-based image
retrieval (ZS-SBIR), few-shot classification, and open-vocabulary semantic
segmentation tasks. The promising results demonstrate the potential of
diffusion models as formidable learners, establishing their significance in
furnishing informative and robust visual representations.
</p>
|
|
|
|
<p>A multiagent system can be viewed as a society of autonomous agents, whose
interactions can be effectively regulated via social norms. In general, the
norms of a society are not hardcoded but emerge from the agents' interactions.
Specifically, how the agents in a society react to each other's behavior and
respond to the reactions of others determines which norms emerge in the
society. We think of these reactions by an agent to the satisfactory or
unsatisfactory behaviors of another agent as communications from the first
agent to the second agent. Understanding these communications is a kind of
social intelligence: these communications provide natural drivers for norm
emergence by pushing agents toward certain behaviors, which can become
established as norms. Whereas it is well-known that sanctioning can lead to the
emergence of norms, we posit that a broader kind of social intelligence can
prove more effective in promoting cooperation in a multiagent system.
</p>
<p>Accordingly, we develop Nest, a framework that models social intelligence in
the form of a wider variety of communications and understanding of them than in
previous work. To evaluate Nest, we develop a simulated pandemic environment
and conduct simulation experiments to compare Nest with baselines considering a
combination of three kinds of social communication: sanction, tell, and hint.
</p>
<p>We find that societies formed of Nest agents achieve norms faster; moreover,
Nest agents effectively avoid undesirable consequences, namely negative
sanctions and deviation from goals, and yield higher satisfaction for
themselves than baseline agents, despite requiring only an equivalent amount of
information.
</p>
|
|
|
|
<p>The problem of the Remaining Useful Life (RUL) prediction, aiming at
providing an accurate estimate of the remaining time from the current
predicting moment to the complete failure of the device, has gained significant
attention from researchers in recent years. In this paper, to overcome the
shortcomings of rigid combination for temporal and spatial features in most
existing RUL prediction approaches, a spatial-temporal homogeneous feature
extractor, named the Dual-Mixer model, is first proposed. Flexible layer-wise
progressive feature fusion is employed to ensure the homogeneity of
spatial-temporal features and enhance the prediction accuracy. Secondly, the
Feature Space Global Relationship Invariance (FSGRI) training method is
introduced based on supervised contrastive learning. This method maintains the
consistency of relationships among sample features with their degradation
patterns during model training, simplifying the subsequent regression task in
the output layer and improving the model's performance in RUL prediction.
Finally, the effectiveness of the proposed method is validated through
comparisons with other latest research works on the C-MAPSS dataset. The
Dual-Mixer model demonstrates superiority across most metrics, while the FSGRI
training method shows an average improvement of 7.00% and 2.41% in RMSE and
MAPE, respectively, for all baseline models. Our experiments and model code are
publicly available at https://github.com/fuen1590/PhmDeepLearningProjects.
</p>
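<p>FSGRI's exact formulation is not given above, so the following is a hedged
sketch of the general idea as we read it: a penalty that keeps pairwise feature
relationships consistent with pairwise RUL (degradation) relationships within a
batch.</p>
<pre><code>
# Hedged sketch of a relation-consistency penalty in the spirit of FSGRI (our
# reading; the exact supervised-contrastive formulation is not given above).
import torch

def relation_consistency_loss(features, rul):
    """Align pairwise feature similarities with pairwise RUL similarities."""
    f = torch.nn.functional.normalize(features, dim=1)
    feat_sim = f @ f.T                                   # feature relationships
    rul_gap = (rul[:, None] - rul[None, :]).abs()
    rul_sim = 1 - rul_gap / (rul_gap.max() + 1e-8)       # degradation relationships
    return ((feat_sim - rul_sim) ** 2).mean()

features = torch.randn(8, 32)        # batch of extracted features
rul = torch.rand(8) * 100            # remaining useful life labels
print(relation_consistency_loss(features, rul))
</code></pre>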
|
|
|
|
<p>Robotic hands are an important tool for replacing humans in handling toxic or
radioactive materials. However, these are usually highly expensive, and in many
cases, once they are contaminated, they cannot be re-used. Some solutions cope
with this challenge by 3D printing parts of a tendon-based hand. However,
fabrication requires additional assembly steps. Therefore, a novice user may
have difficulties fabricating a hand upon contamination of the previous one. We
propose the Print-N-Grip (PNG) hand which is a tendon-based underactuated
mechanism able to adapt to the shape of objects. The hand is fabricated through
one-shot 3D printing with no additional engineering effort, and can accommodate
a number of fingers as desired by the practitioner. Due to its low cost, the
PNG hand can easily be detached from a universal base for disposal upon
contamination and replaced by a newly printed one. In addition, the PNG hand is
scalable, such that one can effortlessly resize the computerized model and
print it. We present the design of the PNG hand along with experiments to show
the
capabilities and high durability of the hand.
</p>
|
|
|
|
<p>Creating and maximizing influence among the customers is one of the central
goals of an advertiser, and hence, remains an active area of research in recent
times. In this advertisement technique, the advertisers approach an influence
provider for a specific number of views of their content on a payment basis.
Now, if the influence provider can deliver the required number of views or
more, he will receive the full payment; otherwise, he receives a partial
payment. From the influence provider's perspective, it is a loss to deliver
either more or fewer views than required. This is formalized as 'Regret', and
naturally, in the context of the influence
provider, the goal will be to minimize this quantity. In this paper, we solve
this problem in the context of billboard advertisement and pose it as a
discrete optimization problem. We propose four efficient solution approaches
for this problem and analyze them to understand their time and space
complexity. We implement all the solution methodologies with real-life datasets
and compare the obtained results with the existing solution approaches from the
literature. We observe that the proposed solutions lead to less regret while
taking less computational time.
</p>
|
|
|
|
<p>Apparel's significant role in human appearance underscores the importance of
garment digitalization for digital human creation. Recent advances in 3D
content creation are pivotal for digital human creation. Nonetheless, garment
generation from text guidance is still nascent. We introduce a text-driven 3D
garment generation framework, DressCode, which aims to democratize design for
novices and offer immense potential in fashion design, virtual try-on, and
digital human creation. For our framework, we first introduce SewingGPT, a
GPT-based architecture integrating cross-attention with text-conditioned
embedding to generate sewing patterns with text guidance. We also tailor a
pre-trained Stable Diffusion model for high-quality, tile-based PBR texture
generation. By leveraging a large language model, our framework generates
CG-friendly garments through natural language interaction. Our method also
facilitates pattern completion and texture editing, simplifying the process for
designers through user-friendly interaction. With comprehensive evaluations and
comparisons with other state-of-the-art methods, our method showcases the best
quality and alignment with input prompts. User studies further validate our
high-quality rendering results, highlighting its practical utility and
potential in production settings.
</p>
|
|
|
|
<p>While large language models (LLMs) are increasingly being used for program
synthesis, they lack the global view needed to develop useful abstractions;
they generally predict programs one at a time, often repeating the same
functionality. Generating redundant code from scratch is both inefficient and
error-prone. To address this, we propose Refactoring for Generalizable
Abstraction Learning (ReGAL), a gradient-free method for learning a library of
reusable functions via code refactorization, i.e. restructuring code without
changing its execution output. ReGAL learns from a small set of existing
programs, iteratively verifying and refining its abstractions via execution. We
find that the shared function libraries discovered by ReGAL make programs
easier to predict across diverse domains. On three datasets (LOGO graphics
generation, Date reasoning, and TextCraft, a Minecraft-based text game), both
open-source and proprietary LLMs improve in accuracy when predicting programs
with ReGAL functions. For CodeLlama-13B, ReGAL results in absolute accuracy
increases of 11.5% on graphics, 26.1% on date understanding, and 8.1% on
TextCraft, outperforming GPT-3.5 in two of three domains. Our analysis reveals
that ReGAL's abstractions encapsulate frequently used subroutines as well as
environment dynamics.
</p>
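<p>To illustrate the refactor-and-verify loop, here is a toy sketch of the
general pattern (ReGAL's actual abstraction proposals are LLM-driven, not
hardcoded as below): extract a shared helper from two programs and accept it
only if execution output is unchanged.</p>
<pre><code>
# Toy sketch of refactoring with execution-based verification (the general
# pattern; ReGAL's actual abstraction proposals are LLM-driven, not hardcoded).
def run(src):
    """Execute a program and return its output value, for comparison."""
    ns = {}
    exec(src, ns)
    return ns["result"]

programs = ["result = sum(x * x for x in range(5))",
            "result = sum(x * x for x in range(9))"]

# Candidate abstraction: a shared helper, plus rewritten programs that use it.
helper = "def sum_squares(n): return sum(x * x for x in range(n))"
rewritten = [helper + "\nresult = sum_squares(5)",
             helper + "\nresult = sum_squares(9)"]

# Keep the abstraction only if every rewritten program reproduces its output.
verified = all(run(old) == run(new) for old, new in zip(programs, rewritten))
print("abstraction accepted:", verified)   # -> True
</code></pre>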
|
|
|
|
<p>Image restoration is a fundamental problem that involves recovering a
high-quality clean image from its degraded observation. All-In-One image
restoration models can effectively restore images from various types and levels
of degradation using degradation-specific information as prompts to guide the
restoration model. In this work, we present the first approach that uses
human-written instructions to guide the image restoration model. Given natural
language prompts, our model can recover high-quality images from their degraded
counterparts, considering multiple degradation types. Our method, InstructIR,
achieves state-of-the-art results on several restoration tasks including image
denoising, deraining, deblurring, dehazing, and (low-light) image enhancement.
InstructIR improves +1dB over previous all-in-one restoration methods.
Moreover, our dataset and results represent a novel benchmark for new research
on text-guided image restoration and enhancement. Our code, datasets and models
are available at: https://github.com/mv-lab/InstructIR
</p>
|
|
|
|
<p>Text simplification aims to make technical texts more accessible to laypeople
but often results in deletion of information and vagueness. This work proposes
InfoLossQA, a framework to characterize and recover simplification-induced
information loss in the form of question-and-answer (QA) pairs. Building on the
theory of Question Under Discussion, the QA pairs are designed to help readers
deepen their knowledge of a text. We conduct a range of experiments with this
framework. First, we collect a dataset of 1,000 linguist-curated QA pairs
derived from 104 LLM simplifications of scientific abstracts of medical
studies. Our analyses of this data reveal that information loss occurs
frequently, and that the QA pairs give a high-level overview of what
information was lost. Second, we devise two methods for this task: end-to-end
prompting of open-source and commercial language models, and a natural language
inference pipeline. With a novel evaluation framework considering the
correctness of QA pairs and their linguistic suitability, our expert evaluation
reveals that models struggle to reliably identify information loss and to apply
standards similar to humans' regarding what constitutes information loss.
</p>
|
|
|
|
<p>We propose a novel GPU-cluster scheduler for distributed DL (DDL) workloads
that enables proximity-based consolidation of GPU resources based on the DDL
jobs' sensitivities to anticipated communication-network delays. Our
scheduler consists of three major components: (i) a classical delay scheduling
algorithm to facilitate job placement and consolidation; (ii) a
network-sensitive job preemption strategy; and (iii) an "auto-tuner" mechanism
to optimize delay timers for effective delay scheduling. Additionally, to
enable a cost-effective methodology for large-scale experiments, we develop a
data-driven DDL cluster simulation platform. Employing the simulation platform,
we compare against several state-of-the-art alternatives on real-world workload
traces to demonstrate the benefits of our design. Our scheduler can provide an
improvement of up to 69% in end-to-end makespan for training all jobs compared
to prevailing consolidation-based scheduling methods, while reducing the
average job completion time by up to 83% and minimizing the communication
overheads by up to 98% under congested networking conditions.
</p>
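<p>Classical delay scheduling is a well-known idea; below is a minimal sketch
of how it could drive consolidation in this setting (the job/cluster structures
and timer values are our assumptions, not the paper's scheduler): a job waits
up to a delay timer for a consolidated placement before accepting a spread
one.</p>
<pre><code>
# Minimal sketch of classical delay scheduling applied to GPU consolidation
# (job/cluster structures and timer values are our assumptions, not the
# paper's scheduler).
def schedule(job, free_gpus_per_node, delay_timer, waited):
    """Prefer packing the job onto one node; relax locality after the timer."""
    for node, free in free_gpus_per_node.items():
        if free >= job["gpus"]:
            return ("consolidated", node)       # proximity-based placement
    if waited < delay_timer:
        return ("wait", None)                   # hold out for consolidation
    return ("spread", None)                     # timer expired: accept spreading

job = {"gpus": 4}
cluster = {"node0": 2, "node1": 3}              # free GPUs per node
for waited in range(3):
    print(waited, schedule(job, cluster, delay_timer=2, waited=waited))
# waited 0,1 -> wait; waited 2 -> spread across nodes
</code></pre>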
|
|
|
|
<p>In this paper, we present an advanced approach to solving the inverse rig
problem in blendshape animation, using high-quality corrective blendshapes. Our
algorithm introduces novel enhancements in three key areas: ensuring high data
fidelity in reconstructed meshes, achieving greater sparsity in weight
distributions, and facilitating smoother frame-to-frame transitions. While the
incorporation of corrective terms is a known practice, our method
differentiates itself by employing a unique combination of $l_1$ norm
regularization for sparsity and a temporal smoothness constraint through
roughness penalty, focusing on the sum of second differences in consecutive
frame weights. A significant innovation in our approach is the temporal
decoupling of blendshapes, which permits simultaneous optimization across
entire animation sequences. This feature sets our work apart from existing
methods and contributes to a more efficient and effective solution. Our
algorithm exhibits a marked improvement in maintaining data fidelity and
ensuring smooth frame transitions when compared to prior approaches that either
lack smoothness regularization or rely solely on linear blendshape models. In
addition to superior mesh resemblance and smoothness, our method offers
practical benefits, including reduced computational complexity and execution
time, achieved through a novel parallelization strategy using clustering
methods. Our results not only advance the state of the art in terms of
fidelity, sparsity, and smoothness in inverse rigging but also introduce
significant efficiency improvements. The source code will be made available
upon acceptance of the paper.
</p>
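<p>The stated objective combines data fidelity, $l_1$ sparsity, and a
second-difference roughness penalty over frame weights; the sketch below
evaluates that combined objective on toy values (the linear rig and
coefficients are stand-ins, and we square the second differences, a common
choice for roughness penalties).</p>
<pre><code>
# Minimal sketch of the stated objective: data fidelity + l1 sparsity +
# second-difference temporal roughness over blendshape weights. The linear rig
# and coefficient values are toy stand-ins, not the paper's solver.
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((30, 10))      # toy linear blendshape basis (verts x shapes)
targets = rng.standard_normal((5, 30)) # toy target meshes for 5 frames
W = rng.standard_normal((5, 10))       # blendshape weights per frame

def objective(W, lam1=0.1, lam2=0.1):
    fidelity = ((W @ B.T - targets) ** 2).sum()          # mesh reconstruction
    sparsity = np.abs(W).sum()                           # l1 on weights
    second_diff = W[2:] - 2 * W[1:-1] + W[:-2]           # frame-to-frame smoothness
    roughness = (second_diff ** 2).sum()
    return fidelity + lam1 * sparsity + lam2 * roughness

print(objective(W))
</code></pre>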
|
|
|
|
<p>Extracting meaningful information from high-dimensional data poses a
formidable modeling challenge, particularly when the data is obscured by noise
or represented through different modalities. In this research, we propose a
novel non-parametric modeling approach, leveraging the Gaussian Process (GP),
to characterize high-dimensional data by mapping it to a latent low-dimensional
manifold. This model, named the Latent Discriminative Generative Decoder
(LDGD), utilizes both the data (or its features) and associated labels (such as
category or stimulus) in the manifold discovery process. To infer the latent
variables, we derive a Bayesian solution, allowing LDGD to effectively capture
inherent uncertainties in the data while enhancing the model's predictive
accuracy and robustness. We demonstrate the application of LDGD on both
synthetic and benchmark datasets. Not only does LDGD infer the manifold
accurately, but its prediction accuracy in anticipating labels surpasses
state-of-the-art approaches. We have introduced inducing points to reduce the
computational complexity of Gaussian Processes (GPs) for large datasets. This
enhancement facilitates batch training, allowing for more efficient processing
and scalability in handling extensive data collections. Additionally, we
illustrate that LDGD achieves higher accuracy in predicting labels and operates
effectively with a limited training dataset, underscoring its efficiency and
effectiveness in scenarios where data availability is constrained. These
attributes set the stage for the development of non-parametric modeling
approaches in the analysis of high-dimensional data, especially in fields where
data are both high-dimensional and complex.
</p>
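<p>The inducing-point trick referenced here can be illustrated with the
standard Nyström-style low-rank kernel approximation (a generic sketch, not
LDGD's variational derivation): m inducing points reduce the n x n kernel to
factors of size n x m.</p>
<pre><code>
# Generic sketch of the inducing-point idea (standard Nystrom-style low-rank
# kernel approximation, not LDGD's full variational derivation): m inducing
# points replace the n x n kernel with n x m factors.
import numpy as np

def rbf(A, B, ls=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls ** 2)

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 3))          # latent inputs (n = 500)
Z = X[rng.choice(500, 20, replace=False)]  # m = 20 inducing points

Knm = rbf(X, Z)                            # n x m cross-covariance
Kmm = rbf(Z, Z) + 1e-6 * np.eye(20)        # m x m inducing covariance
K_approx = Knm @ np.linalg.solve(Kmm, Knm.T)   # rank-m approximation of K_nn

K_full = rbf(X, X)
print(np.abs(K_full - K_approx).mean())    # approximation error
</code></pre>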
|
|
|
|
<p>Pneumatic systems are common in manufacturing, healthcare, transportation,
robotics, and many other fields. Failures in these systems can have very
serious consequences, particularly if they go undetected. In this work, we
present an air-powered error detector device that can detect and respond to
failures in pneumatically actuated systems. The device contains 21 monolithic
membrane valves that act like transistors in a pneumatic logic "circuit" that
uses vacuum to represent TRUE and atmospheric pressure to represent FALSE. Three
pneumatic exclusive-OR (XOR) gates are used to calculate the parity bit
corresponding to the values of several control bits. If the calculated value of
the parity bit differs from the expected value, then an error (like a leak or a
blocked air line) has been detected and the device outputs a pneumatic error
signal which can in turn be used to alert a user, shut down the system, or take
some other action. As a proof-of-concept, we used our pneumatic error detector
to monitor the operation of a medical device, an intermittent pneumatic
compression (IPC) device commonly used to prevent the formation of
life-threatening blood clots in the wearer's legs. Experiments confirm that
when the IPC device was damaged, the pneumatic error detector immediately
recognized the error (a leak) and alerted the wearer using sound. By providing
a simple and low-cost way to add fault detection to pneumatic actuation systems
without using sensors, our pneumatic error detector can promote safety and
reliability across the wide range of pneumatic systems.
</p>
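<p>The parity logic is easy to mirror in software; the sketch below (ours, with
toy bit values) chains XORs over the control bits, as the three pneumatic XOR
gates do, and flags a parity mismatch as an error.</p>
<pre><code>
# Software mirror of the described parity-check circuit (ours, with toy bit
# values): chained XORs compute the parity of the control bits; a mismatch
# with the expected parity bit signals a fault such as a leak or blocked line.
def parity(bits):
    p = bits[0]
    for b in bits[1:]:        # chained XOR gates, as in the pneumatic circuit
        p ^= b
    return p

control_bits = [1, 0, 1, 1]   # toy control-line states (vacuum=1, atmosphere=0)
expected_parity = 1

if parity(control_bits) != expected_parity:
    print("error detected: leak or blocked air line")
else:
    print("system operating normally")
</code></pre>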
|
|
|
|
<p>This paper presents a modeling effort to explore the underlying physics of
temperature evolution during additive friction stir deposition (AFSD) by a
human-AI teaming approach. AFSD is an emerging solid-state additive
manufacturing technology that deposits materials without melting. However, both
process modeling and modeling of the AFSD tool are at an early stage. In this
paper, a human-AI teaming approach is proposed to combine models based on first
principles with AI. The resulting human-informed machine learning method,
denoted as AFSD-Physics, can effectively learn the governing equations of
temperature evolution at the tool and the build from in-process measurements.
Experiments are designed and conducted to collect in-process measurements for
the deposition of aluminum 7075 with a total of 30 layers. The acquired
governing equations are physically interpretable models with low computational
cost and high accuracy. Model predictions show good agreement with the
measurements. Experimental validation with new process parameters demonstrates
the model's generalizability and potential for use in tool temperature control
and process optimization.
</p>
|
|
|
|
<p>The growing reliance on online services underscores the crucial role of
recommendation systems, especially on social media platforms seeking increased
user engagement. This study investigates how recommendation systems influence
the impact of personal behavioral traits on social network dynamics. It
explores the interplay between homophily, users' openness to novel ideas, and
recommendation-driven exposure to new opinions. Additionally, the research
examines the impact of recommendation systems on the diversity of newly
generated ideas, shedding light on the challenges and opportunities in
designing effective systems that balance the exploration of new ideas with the
risk of reinforcing biases or filtering valuable, unconventional concepts.
</p>
|
|
|
|
<p>There is a growing demand for transparency in search engines to understand
how search results are curated and to enhance users' trust. Prior research has
introduced search result explanations with a focus on how to explain, assuming
explanations are beneficial. Our study takes a step back to examine whether
search explanations are needed and when they are likely to provide benefits.
Additionally, we summarize key characteristics of helpful explanations and
share users' perspectives on explanation features provided by Google and Bing.
Interviews with non-technical individuals reveal that users do not always seek
or understand search explanations and mostly desire them for complex and
critical tasks. They find Google's search explanations too obvious but
appreciate the ability to contest search results. Based on our findings, we
offer design recommendations for search engines and explanations to help users
better evaluate search results and enhance their search experience.
</p>
|
|
|
|
<p>Artificial intelligence (AI) has seen remarkable advancements across various
domains, including natural language processing, computer vision, autonomous
vehicles, and biology. However, the rapid expansion of AI technologies has
escalated the demand for more powerful computing resources. As digital
computing approaches fundamental limits, neuromorphic photonics emerges as a
promising platform to complement existing digital systems. In neuromorphic
photonic computing, photonic devices are controlled using analog signals. This
necessitates the use of digital-to-analog converters (DAC) and
analog-to-digital converters (ADC) for interfacing with these devices during
inference and training. However, data movement between memory and these
converters in conventional von Neumann computing architectures consumes energy.
To address this, analog memory co-located with photonic computing devices is
proposed. This approach aims to reduce the reliance on DACs and ADCs and
minimize data movement to enhance compute efficiency. This paper demonstrates a
monolithically integrated neuromorphic photonic circuit with co-located
capacitive analog memory and compares various analog memory technologies for
neuromorphic photonic computing using the MNIST dataset as a benchmark.
</p>
|
|
|
|
<p>The Kinematic Theory of rapid movements and its associated Sigma-Lognormal
model describe 2D spatiotemporal trajectories. The model is constructed mainly
as a temporal
overlap of curves between virtual target points. Specifically, it uses an arc
and a lognormal as primitives for the representation of the trajectory and
velocity, respectively. This paper proposes developing this model, in what we
call the Kinematic Theory Transform, which establishes a mathematical framework
that allows further primitives to be used. Mainly, we evaluate Euler curves to
link virtual target points and Gaussian, Beta, Gamma, Double-bounded lognormal,
and Generalized Extreme Value functions to model the bell-shaped velocity
profile. Using these primitives, we report reconstruction results with
spatiotemporal trajectories executed by human beings, animals, and
anthropomorphic robots.
</p>
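<p>The lognormal velocity primitive at the core of the Sigma-Lognormal model
is easy to sketch; parameter values below are illustrative, and the
alternative primitives evaluated in the paper (Gaussian, Beta, Gamma, and so
on) would simply replace the bell-shaped function:</p>
<pre>
import numpy as np

def lognormal_velocity(t, D=1.0, t0=0.0, mu=-1.0, sigma=0.3):
    """Bell-shaped speed profile of one stroke: a lognormal scaled by the
    stroke amplitude D, starting at time t0."""
    v = np.zeros_like(t)
    m = t > t0
    dt = t[m] - t0
    v[m] = D / (sigma * np.sqrt(2 * np.pi) * dt) * np.exp(
        -((np.log(dt) - mu) ** 2) / (2 * sigma ** 2))
    return v

t = np.linspace(0.0, 2.0, 500)
# A trajectory's speed is a temporal overlap of such bells, one per
# virtual target point.
speed = lognormal_velocity(t) + lognormal_velocity(t, D=0.6, t0=0.5)
print(float(speed.max()))
</pre>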
|
|
|
|
<p>In the realm of Earth science, effective cloud property retrieval,
encompassing cloud masking, cloud phase classification, and cloud optical
thickness (COT) prediction, remains pivotal. Traditional methodologies
necessitate distinct models for each sensor instrument due to their unique
spectral characteristics. Recent strides in Earth Science research have
embraced machine learning and deep learning techniques to extract features from
satellite datasets' spectral observations. However, prevailing approaches lack
novel architectures accounting for hierarchical relationships among retrieval
tasks. Moreover, considering the spectral diversity among existing sensors, the
development of models with robust generalization capabilities over different
sensor datasets is imperative. Surprisingly, there is a dearth of methodologies
addressing the selection of an optimal model for diverse datasets. In response,
this paper introduces MT-HCCAR, an end-to-end deep learning model employing
multi-task learning to simultaneously tackle cloud masking, cloud phase
retrieval (classification tasks), and COT prediction (a regression task). The
MT-HCCAR integrates a hierarchical classification network (HC) and a
classification-assisted attention-based regression network (CAR), enhancing
precision and robustness in cloud labeling and COT prediction. Additionally, a
comprehensive model selection method rooted in K-fold cross-validation, one
standard error rule, and two introduced performance scores is proposed to
select the optimal model over three simulated satellite datasets: OCI, VIIRS,
and ABI. The experiments comparing MT-HCCAR with baseline methods, the ablation
studies, and the model selection affirm the superiority and the generalization
capabilities of MT-HCCAR.
</p>
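<p>A minimal sketch of the one-standard-error rule used in the model
selection step follows; the cross-validation scores and model names are made
up for illustration:</p>
<pre>
import numpy as np

# K-fold CV scores (higher is better) for three candidate models.
cv_scores = {
    "HC":     np.array([0.81, 0.83, 0.80, 0.82, 0.81]),
    "CAR":    np.array([0.84, 0.86, 0.83, 0.85, 0.82]),
    "HC+CAR": np.array([0.85, 0.87, 0.84, 0.86, 0.83]),
}
means = {m: s.mean() for m, s in cv_scores.items()}
sems = {m: s.std(ddof=1) / np.sqrt(len(s)) for m, s in cv_scores.items()}

best = max(means, key=means.get)
threshold = means[best] - sems[best]    # one standard error below the best
# Among models within one SE of the best, one would prefer the simplest.
eligible = [m for m in cv_scores if means[m] >= threshold]
print("best:", best, "| within one SE:", eligible)
</pre>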
|
|
|
|
<p>This work undertakes studies to evaluate Interpretability Methods for
Time-Series Deep Learning. Sensitivity analysis assesses how input changes
affect the output, constituting a key component of interpretation. Among the
post-hoc interpretation methods such as back-propagation, perturbation, and
approximation, my work will investigate perturbation-based sensitivity analysis
methods on modern Transformer models to benchmark their performances.
Specifically, my work answers three research questions: 1) Do different
sensitivity analysis (SA) methods yield comparable outputs and attribute
importance rankings? 2) Using the same sensitivity analysis method, do
different Deep Learning (DL) models impact the output of the sensitivity
analysis? 3) How well do the results from sensitivity analysis methods align
with the ground truth?
</p>
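<p>A minimal sketch of perturbation-based sensitivity analysis is given
below; the toy forecaster stands in for a Transformer, and the
finite-difference perturbation scheme is assumed for illustration:</p>
<pre>
import numpy as np

def model(x):
    # Toy forecaster: output depends most strongly on the last steps.
    weights = np.linspace(0.0, 1.0, x.shape[-1])
    return float((x * weights).sum())

def perturbation_sensitivity(x, eps=1e-3):
    """Attribute importance to each time step by measuring how much a
    small perturbation at that step changes the model output."""
    base = model(x)
    scores = np.zeros_like(x)
    for i in range(len(x)):
        xp = x.copy()
        xp[i] += eps
        scores[i] = abs(model(xp) - base) / eps
    return scores

x = np.sin(np.linspace(0, 4 * np.pi, 32))
importance = perturbation_sensitivity(x)
print(np.argsort(importance)[::-1][:5])  # five most influential steps
</pre>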
|
|
|
|
<p>Deep learning-based informative band selection methods on hyperspectral
images (HSI) recently have gained intense attention to eliminate spectral
correlation and redundancies. However, the existing deep learning-based methods
either need additional post-processing strategies to select the descriptive
bands or optimize the model indirectly, due to the parameterization inability
of discrete variables for the selection procedure. To overcome these
limitations, this work proposes a novel end-to-end network for informative band
selection. The proposed network is inspired by the advances in concrete
autoencoder (CAE) and dropout feature ranking strategy. Different from the
traditional deep learning-based methods, the proposed network is trained
directly given the required band subset eliminating the need for further
post-processing. Experimental results on four HSI scenes show that the proposed
dropout CAE achieves substantial and effective performance levels outperforming
the competing methods.
</p>
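<p>The concrete selection idea can be sketched with a Gumbel-softmax draw per
selector neuron; the shapes, logits, and temperatures below are illustrative,
not the proposed network's actual configuration:</p>
<pre>
import numpy as np

rng = np.random.default_rng(0)

def concrete_sample(logits, temperature):
    """Gumbel-softmax relaxation of a categorical draw over bands."""
    gumbel = -np.log(-np.log(rng.uniform(size=logits.shape)))
    y = (logits + gumbel) / temperature
    e = np.exp(y - y.max())
    return e / e.sum()

n_bands, k_selected = 200, 5
logits = rng.normal(size=(k_selected, n_bands))   # learned during training
for temp in (10.0, 0.1):                          # annealed over training
    weights = np.stack([concrete_sample(l, temp) for l in logits])
    print("temp", temp, "max weights", weights.max(axis=1).round(2))
# At low temperature each selector approaches a one-hot choice of a band,
# so the selected subset needs no further post-processing step.
</pre>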
|
|
|
|
<p>FPGA technology mapping is the process of implementing a hardware design
expressed in high-level HDL (hardware description language) code using the
low-level, architecture-specific primitives of the target FPGA. As FPGAs become
increasingly heterogeneous, achieving high performance requires hardware
synthesis tools that better support mapping to complex, highly configurable
primitives like digital signal processors (DSPs). Current tools support DSP
mapping via handwritten special-case mapping rules, which are laborious to
write, error-prone, and often overlook mapping opportunities. We introduce
Lakeroad, a principled approach to technology mapping via sketch-guided program
synthesis. Lakeroad leverages two techniques -- architecture-independent sketch
templates and semantics extraction from HDL -- to provide extensible technology
mapping with stronger correctness guarantees and higher coverage of mapping
opportunities than state-of-the-art tools. Across representative
microbenchmarks, Lakeroad produces 2--3.5$\times$ the number of optimal
mappings compared to proprietary state-of-the-art tools and 6--44$\times$ the
number of optimal mappings compared to popular open-source tools, while also
providing correctness guarantees not given by any other tool.
</p>
|
|
|
|
<p>Scheduling real-time tasks that utilize GPUs with analyzable guarantees poses
a significant challenge due to the intricate interaction between CPU and GPU
resources, as well as the complex GPU hardware and software stack. While much
research has been conducted in the real-time research community, several
limitations persist, including the absence or limited availability of
preemption, extended blocking times, and/or the need for extensive
modifications to program code. In this paper, we propose two novel techniques,
namely the kernel thread and IOCTL-based approaches, to enable preemptive
priority-based scheduling for real-time GPU tasks. Our approaches exert control
over GPU context scheduling at the device driver level and enable preemptive
GPU scheduling based on task priorities. The kernel thread-based approach
achieves this without requiring modifications to user-level programs, while the
IOCTL-based approach needs only a single macro at the boundaries of GPU access
segments. In addition, we provide a comprehensive response time analysis that
takes into account overlaps between different task segments, mitigating
pessimism in worst-case estimates. Through empirical evaluations and case
studies, we demonstrate the effectiveness of the proposed approaches in
improving taskset schedulability and timeliness of real-time tasks. The results
highlight significant improvements over prior work, with up to 40\% higher
schedulability, while also achieving predictable worst-case behavior on Nvidia
Jetson embedded platforms.
</p>
|
|
|
|
<p>Selection of hyperparameters in deep neural networks is a challenging problem
due to the wide search space and emergence of various layers with specific
hyperparameters. Neural architecture selection for convolutional neural
networks (CNNs) in spectrum sensing has, however, received little attention.
Here, we develop a method using reinforcement learning and Q-learning
to systematically search and evaluate various architectures for generated
datasets including different signals and channels in the spectrum sensing
problem. We show by extensive simulations that CNN-based detectors proposed by
our developed method outperform several detectors in the literature. For the
most complex dataset, the proposed approach provides 9% enhancement in accuracy
at the cost of higher computational complexity. Furthermore, a novel method
using multi-armed bandit model for selection of the sensing time is proposed to
achieve higher throughput and accuracy while minimizing the consumed energy.
The method dynamically adjusts the sensing time under the time-varying
condition of the channel without prior information. We demonstrate through a
simulated scenario that the proposed method improves the achieved reward by
about 20% compared to the conventional policies. Consequently, this study
effectively manages the selection of important hyperparameters for CNN-based
detectors, offering superior performance for cognitive radio networks.
</p>
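<p>A condensed sketch of a tabular Q-learning loop over per-layer
architecture choices follows; the state/action encoding and the reward
function are placeholders (in the paper, the reward would come from training
and evaluating the sampled CNN detector):</p>
<pre>
import numpy as np

rng = np.random.default_rng(0)
n_layers, n_choices = 4, 3          # e.g., 3 layer configurations per slot
Q = np.zeros((n_layers, n_choices))
alpha, gamma, epsilon = 0.1, 0.9, 0.2

def reward(arch):
    # Stand-in for detector accuracy on the generated dataset.
    return -abs(sum(arch) - 5) + rng.normal(0, 0.1)

for episode in range(500):
    arch = []
    for layer in range(n_layers):   # epsilon-greedy choice per layer
        if rng.uniform() < epsilon:
            a = int(rng.integers(n_choices))
        else:
            a = int(Q[layer].argmax())
        arch.append(a)
    r = reward(arch)
    for layer, a in enumerate(arch):  # back up the episode reward
        best_next = Q[layer + 1].max() if layer + 1 < n_layers else 0.0
        Q[layer, a] += alpha * (r + gamma * best_next - Q[layer, a])

print("selected architecture:", [int(q.argmax()) for q in Q])
</pre>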
|
|
|
|
<p>In high-end visual effects pipelines, a customized (and expensive) light
stage system is (typically) used to scan an actor in order to acquire both
geometry and texture for various expressions. Aiming towards democratization,
we propose a novel pipeline for obtaining geometry and texture as well as
enough expression information to build a customized person-specific animation
rig without using a light stage or any other high-end hardware (or manual
cleanup). A key novel idea consists of warping real-world images to align with
the geometry of a template avatar and subsequently projecting the warped image
into the template avatar's texture; importantly, this allows us to leverage
baked-in real-world lighting/texture information in order to create surrogate
facial features (and bridge the domain gap) for the sake of geometry
reconstruction. Not only can our method be used to obtain a neutral expression
geometry and de-lit texture, but it can also be used to improve avatars after
they have been imported into an animation system (noting that such imports tend
to be lossy, while also hallucinating various features). Since a default
animation rig will contain template expressions that do not correctly
correspond to those of a particular individual, we use a Simon Says approach to
capture various expressions and build a person-specific animation rig (that
moves like they do). Our aforementioned warping/projection method has high
enough efficacy to reconstruct the geometry corresponding to each expression.
</p>
|
|
|
|
<p>Battery-constrained power consumption, compute limitations, and high frame
rate requirements in head-mounted displays present unique challenges in the
drive to present increasingly immersive and comfortable imagery in virtual
reality. However, humans are not equally sensitive to all regions of the visual
field, and perceptually-optimized rendering techniques are increasingly
utilized to address these bottlenecks. Many of these techniques are
gaze-contingent and often render reduced detail away from a user's fixation.
Such techniques are dependent on spatio-temporally-accurate gaze tracking and
can result in obvious visual artifacts when eye tracking is inaccurate. In this
work we present a gaze-contingent rendering technique which only requires
saccade detection, bypassing the need for highly-accurate eye tracking. In our
first experiment, we show that visual acuity is reduced for several hundred
milliseconds after a saccade. In our second experiment, we use these results to
reduce the rendered image resolution after saccades in a controlled
psychophysical setup, and find that observers cannot discriminate between
saccade-contingent reduced-resolution rendering and full-resolution rendering.
Finally, in our third experiment, we introduce a 90 pixels per degree headset
and validate our saccade-contingent rendering method under typical VR viewing
conditions.
</p>
|
|
|
|
<p>Utilizing administrative data to predict outcomes is an important application
area of machine learning, particularly in healthcare. Most administrative data
records are timestamped and the pattern of records over time is a key input for
machine learning models. This paper explores how best to divide the observation
window of a machine learning model into time segments or "bins". A
computationally efficient process is presented that identifies which data
features benefit most from smaller, higher resolution time segments. Results
generated on healthcare and housing/homelessness administrative data
demonstrate that optimizing the time bin size of these high priority features
while using a single time bin for the other features achieves machine learning
models that are simpler and quicker to train. This approach also achieves
similar and sometimes better performance than more complex models that default
to representing all data features with the same time resolution.
</p>
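<p>The core binning operation can be sketched as follows; the window size,
feature, and bin counts are illustrative:</p>
<pre>
import numpy as np

rng = np.random.default_rng(0)
window_days = 90
event_days = rng.integers(0, window_days, size=40)  # record timestamps

def bin_counts(days, n_bins):
    """Count records per time bin across the observation window."""
    edges = np.linspace(0, window_days, n_bins + 1)
    return np.histogram(days, bins=edges)[0]

coarse = bin_counts(event_days, 1)  # single bin: total count only
fine = bin_counts(event_days, 6)    # higher resolution, reserved for a
                                    # high-priority feature
print("coarse:", coarse, "| fine:", fine)
# The paper's procedure keeps the single-bin representation for most
# features and spends extra dimensions only where finer temporal
# resolution measurably helps the model.
</pre>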
|
|
|
|
<p>This work focuses on non-adaptive group testing, with a primary goal of
efficiently identifying a set of at most $d$ defective elements among a given
set of elements using the fewest possible number of tests. Non-adaptive
combinatorial group testing often employs disjunctive codes and union-free
codes. This paper discusses union-free codes with fast decoding (UFFD codes), a
recently introduced class of union-free codes that combine the best of both
worlds -- the linear-complexity decoding of disjunctive codes and the smaller
number of tests of union-free codes. In our study, we distinguish two
subclasses of these codes -- one subclass, denoted as $(=d)$-UFFD codes, can be
used when the number of defectives $d$ is a priori known, whereas $(\le
d)$-UFFD codes work for any subset of at most $d$ defectives. Previous studies
have established a lower bound on the rate of these codes for $d=2$. Our
contribution lies in deriving new lower bounds on the rate for both $(=d)$- and
$(\le d)$-UFFD codes for an arbitrary number $d \ge 2$ of defectives. Our
results show that for $d\to\infty$, the rate of $(=d)$-UFFD codes is twice as
large as the best-known lower bound on the rate of $d$-disjunctive codes. In
addition, the rate of $(\le d)$-UFFD code is shown to be better than the known
lower bound on the rate of $d$-disjunctive codes for small values of $d$.
</p>
|
|
|
|
<p>The intricate relationship between human decision-making and emotions,
particularly guilt and regret, has significant implications for behavior and
well-being. Yet, these emotions' subtle distinctions and interplay are often
overlooked in computational models. This paper introduces a dataset tailored to
dissect the relationship between guilt and regret and their unique textual
markers, filling a notable gap in affective computing research. Our approach
treats guilt and regret recognition as a binary classification task and employs
three machine learning and six transformer-based deep learning techniques to
benchmark the newly created dataset. The study further implements innovative
reasoning methods like chain-of-thought and tree-of-thought to assess the
models' interpretive logic. The results indicate a clear performance edge for
transformer-based models, achieving a 90.4% macro F1 score compared to the
85.3% scored by the best machine learning classifier, demonstrating their
superior capability in distinguishing complex emotional states.
</p>
|
|
|
|
<p>This paper presents a fully multidimensional kernel-based reconstruction
scheme for finite volume methods applied to systems of hyperbolic conservation
laws, with a particular emphasis on the compressible Euler equations.
Non-oscillatory reconstruction is achieved through an adaptive order weighted
essentially non-oscillatory (WENO-AO) method cast into a form suited to
multidimensional reconstruction. A kernel-based approach inspired by radial
basis functions (RBF) and Gaussian process (GP) modeling, which we call
KFVM-WENO, is presented here. This approach allows the creation of a scheme of
arbitrary order of accuracy with simply defined multidimensional stencils and
substencils. Furthermore, the fully multidimensional nature of the
reconstruction allows for a more straightforward extension to higher spatial
dimensions and removes the need for complicated boundary conditions on
intermediate quantities in modified dimension-by-dimension methods. In
addition, a new simple-yet-effective set of reconstruction variables is
introduced, which could be useful in existing schemes with little modification.
The proposed scheme is applied to a suite of stringent and informative
benchmark problems to demonstrate its efficacy and utility. A highly parallel
multi-GPU implementation using Kokkos and the message passing interface (MPI)
is also provided.
</p>
|
|
|
|
<p>In this study, we developed a real-time connected vehicle (CV) speed advisory
application that uses public cloud services and tested it on a simulated
signalized corridor for different roadway traffic conditions. First, we
developed a scalable serverless cloud computing architecture leveraging public
cloud services offered by Amazon Web Services (AWS) to support the requirements
of a real-time CV application. Second, we developed an optimization-based
real-time CV speed advisory algorithm by taking a modular design approach,
which makes the application automatically scalable and deployable in the cloud
using the serverless architecture. Third, we developed a cloud-in-the-loop
simulation testbed using AWS and an open-source microscopic roadway traffic
simulator called Simulation of Urban Mobility (SUMO). Our analyses based on
different roadway traffic conditions showed that the serverless CV speed
advisory application meets the latency requirement of real-time CV mobility
applications. Moreover, our serverless CV speed advisory application reduced the
average stopped delay (by 77%) and the aggregated risk of collision (by 21%) at
the signalized intersections of a corridor. These results demonstrate both the
feasibility and the efficacy of utilizing public cloud infrastructure to
implement real-time roadway traffic management applications in a CV
environment.
</p>
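<p>As a hedged illustration of the serverless design, the sketch below shows
how one module of a speed advisory service might run as an AWS Lambda
handler; the payload fields and the simple advisory rule are hypothetical,
not the paper's optimization algorithm:</p>
<pre>
def optimal_speed(distance_m, time_to_green_s, v_max=17.0):
    """Advise a speed that reaches the stop bar as the light turns
    green, capped at the speed limit."""
    if time_to_green_s <= 0:
        return v_max
    return min(v_max, distance_m / time_to_green_s)

def lambda_handler(event, context):
    # Each CV sends its state; the function returns an advisory speed.
    advisory = optimal_speed(event["distance_to_intersection_m"],
                             event["time_to_green_s"])
    return {"vehicle_id": event["vehicle_id"],
            "advisory_speed_mps": round(advisory, 2)}

# Local test of the handler logic:
print(lambda_handler({"vehicle_id": "cv-42",
                      "distance_to_intersection_m": 120.0,
                      "time_to_green_s": 10.0}, None))
</pre>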
|
|
|
|
<p>The MPI Forum has recently adopted a Python scripting engine for generating
the API text in the standard document. As a by-product, it made available
reliable and rich descriptions of all MPI functions that are suited for
scripting tools. Using this extracted API information, we developed a Python
code generation toolbox to generate the language binding layers in MPICH. The
toolbox replaces nearly 70,000 lines of manually maintained C and Fortran 2008
binding code with around 5,000 lines of Python scripts plus some simple
configuration. In addition to completely eliminating code duplication in the
binding layer and avoiding bugs from manual code copying, the code generation
also minimizes the effort for API extension and code instrumentation. This is
demonstrated in our implementation of MPI-4 large count functions and the
prototyping of a next generation MPI profiling interface, QMPI.
</p>
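<p>A toy illustration of the code-generation idea follows: emitting a C
binding stub from a machine-readable API description. The dictionary and the
emitted wrapper are made-up stand-ins, not the actual output of the MPICH
toolbox:</p>
<pre>
api = {
    "name": "MPI_Send",
    "params": [("buf", "const void *"), ("count", "int"),
               ("datatype", "MPI_Datatype"), ("dest", "int"),
               ("tag", "int"), ("comm", "MPI_Comm")],
}

def emit_binding(spec):
    """Emit a C wrapper for one entry of the API description."""
    args = ", ".join(f"{ctype} {name}" for name, ctype in spec["params"])
    names = ", ".join(name for name, _ in spec["params"])
    return (f"int {spec['name']}({args})\n"
            f"{{\n"
            f"    /* generated: validate arguments, then call internals */\n"
            f"    return internal_{spec['name'].lower()}({names});\n"
            f"}}\n")

print(emit_binding(api))
</pre>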
|
|
|
|
<p>Multi-label learning is a rapidly growing research area that aims to predict
multiple labels from a single input data point. In the era of big data, tasks
involving multi-label classification (MLC) or ranking present significant and
intricate challenges, capturing considerable attention in diverse domains.
Inherent difficulties in MLC include dealing with high-dimensional data,
addressing label correlations, and handling partial labels, for which
conventional methods prove ineffective. Recent years have witnessed a notable
increase in adopting deep learning (DL) techniques to address these challenges
more effectively in MLC. Notably, there is a burgeoning effort to harness the
robust learning capabilities of DL for improved modelling of label dependencies
and other challenges in MLC. However, it is noteworthy that comprehensive
studies specifically dedicated to DL for multi-label learning are limited.
Thus, this survey aims to thoroughly review recent progress in DL for
multi-label learning, along with a summary of open research problems in MLC.
The review consolidates existing research efforts in DL for MLC, including deep
neural networks, transformers, autoencoders, and convolutional and recurrent
architectures. Finally, the study presents a comparative analysis of the
existing methods to provide insightful observations and stimulate future
research directions in this domain.
</p>
|
|
|
|
<p>MPI+Threads, embodied by the MPI/OpenMP hybrid programming model, is a
parallel programming paradigm where threads are used for on-node shared-memory
parallelization and MPI is used for multi-node distributed-memory
parallelization. OpenMP provides an incremental approach to parallelize code,
while MPI, with its isolated address space and explicit messaging API, affords
straightforward paths to obtain good parallel performance. However, MPI+Threads
is not an ideal solution. Since MPI is unaware of the thread context, it cannot
be used for interthread communication. This results in duplicated efforts to
create separate and sometimes nested solutions for similar parallel tasks. In
addition, because the MPI library is required to obey message-ordering
semantics, mixing threads and MPI via MPI_THREAD_MULTIPLE can easily result in
miserable performance due to accidental serializations.
</p>
<p>We propose a new MPI extension, MPIX Thread Communicator (threadcomm), that
allows threads to be assigned distinct MPI ranks within thread parallel
regions. The threadcomm extension combines both MPI processes and OpenMP
threads to form a unified parallel environment. We show that this MPIxThreads
(MPI Multiply Threads) paradigm allows OpenMP and MPI to work together in a
complementary way to achieve both cleaner codes and better performance.
</p>
|
|
|
|
<p>Database modeling is a key activity towards the fulfillment of storage
requirements. Despite the availability of several database modeling tools for
developers, these often come with associated costs, setup complexities,
usability challenges, or dependency on specific operating systems. In this
paper we present ONDA, a web-based tool developed at the University of Coimbra,
that allows the creation of Entity-Relationship diagrams, visualization of
physical models, and generation of SQL code for various database engines. ONDA
is freely available at https://onda.dei.uc.pt and was created with the
intention of supporting teaching activities at university-level database
courses. At the time of writing, the tool is being used by more than three
hundred university students every academic year.
</p>
|
|
|
|
<p>Training large language models (LLMs) with a large and diverse instruction
dataset aligns the models to comprehend and follow human instructions. Recent
works have shown that using a small set of high-quality instructions can
outperform using larger yet noisier ones. Because instructions are unlabeled
and their responses are natural text, traditional active learning schemes with
the model's confidence cannot be directly applied to the selection of unlabeled
instructions. In this work, we propose a novel method for instruction
selection, called SelectLLM, that leverages LLMs for the selection of
high-quality instructions. Our high-level idea is to use LLMs to estimate the
usefulness and impactfulness of each instruction without the corresponding
labels (i.e., responses), via prompting. SelectLLM involves two steps: dividing
the unlabelled instructions into multiple clusters using a clustering algorithm
(e.g., CoreSet), and then prompting LLMs to choose high-quality instructions
within each cluster. SelectLLM showed comparable or slightly better performance
on the popular instruction benchmarks, compared to the recent state-of-the-art
selection methods. All code and data are publicly available
(https://github.com/minnesotanlp/select-llm).
</p>
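<p>The two steps can be sketched at a high level as follows; the round-robin
"clustering" and the random choice are stand-ins for CoreSet clustering and a
real LLM call, respectively:</p>
<pre>
import random

random.seed(0)
instructions = [f"instruction {i}" for i in range(100)]

def cluster(items, k):
    """Stand-in for CoreSet clustering: split items into k groups."""
    groups = [[] for _ in range(k)]
    for i, item in enumerate(items):
        groups[i % k].append(item)
    return groups

def ask_llm(candidates):
    """Stand-in for prompting an LLM to pick the most useful
    instruction in a cluster, without seeing any responses."""
    prompt = "Pick the most useful instruction:\n" + "\n".join(candidates)
    return random.choice(candidates)  # an LLM call on `prompt` goes here

selected = [ask_llm(group) for group in cluster(instructions, k=10)]
print(selected[:3])
</pre>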
|
|
|
|
<p>The objective of this research is to mitigate vibrations in induction motors.
To achieve this goal, a discontinuous pulse width modulation (PWM) control
strategy based on carrier wave modulation is proposed for multilevel inverters.
This study provides justification for the reduction of machine vibrations
compared to existing control techniques documented in the technical literature.
Additionally, the proposed technique offers the advantage of attenuating the
Total Harmonic Distortion of the multilevel inverter's output voltage while
simultaneously achieving a higher RMS value for the same DC level. By modifying
a parameter of the carrier wave, the control strategy allows for variations in
the electrical spectrum while avoiding natural mechanical resonance
frequencies, thereby reducing motor vibrations. Laboratory results
demonstrating the application of different modulation strategies in a
multilevel inverter for an induction motor and a comparison with the presented
strategy are provided.
</p>
|
|
|
|
<p>The pervasive spread of misinformation and disinformation poses a significant
threat to society. Professional fact-checkers play a key role in addressing
this threat, but the vast scale of the problem forces them to prioritize their
limited resources. This prioritization may consider a range of factors, such as
varying risks of harm posed to specific groups of people. In this work, we
investigate potential implications of using a large language model (LLM) to
facilitate such prioritization. Because fact-checking impacts a wide range of
diverse segments of society, it is important that diverse views are represented
in the claim prioritization process. This paper examines whether an LLM can
reflect the views of various groups when assessing the harms of misinformation,
focusing on gender as a primary variable. We pose two central questions: (1) To
what extent do prompts with explicit gender references reflect gender
differences in opinion in the United States on topics of social relevance? and
(2) To what extent do gender-neutral prompts align with gendered viewpoints on
those topics? To analyze these questions, we present the TopicMisinfo dataset,
containing 160 fact-checked claims from diverse topics, supplemented by nearly
1600 human annotations with subjective perceptions and annotator demographics.
Analyzing responses to gender-specific and neutral prompts, we find that
GPT-3.5-Turbo reflects empirically observed gender differences in opinion but
amplifies the extent of these differences. These findings illuminate AI's
complex role in moderating online communication, with implications for
fact-checkers, algorithm designers, and the use of crowd-workers as annotators.
We also release the TopicMisinfo dataset to support continuing research in the
community.
</p>
|
|
|
|
<p>This paper describes the results of the IEEE BigData 2023 Keystroke
Verification Challenge (KVC), which considers the biometric verification
performance of Keystroke Dynamics (KD), captured as tweet-long sequences of
variable transcript text from over 185,000 subjects. The data are obtained from
two of the largest public databases of KD to date, the Aalto Desktop and
Mobile Keystroke Databases, guaranteeing a minimum amount of data per subject,
age and gender annotations, absence of corrupted data, and avoiding excessively
unbalanced subject distributions with respect to the considered demographic
attributes. Several neural architectures were proposed by the participants,
leading to global Equal Error Rates (EERs) as low as 3.33% and 3.61%, achieved
by the best team in the desktop and mobile scenarios respectively, outperforming
the current state-of-the-art biometric verification performance for KD. Hosted
on CodaLab, the KVC will be made ongoing to represent a useful tool for the
research community to compare different approaches under the same experimental
conditions and to deepen the knowledge of the field.
</p>
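<p>For reference, the Equal Error Rate reported above can be computed from
genuine and impostor score distributions as sketched below (the scores here
are synthetic):</p>
<pre>
import numpy as np

rng = np.random.default_rng(0)
genuine = rng.normal(0.2, 0.15, 1000)   # distances, same-subject pairs
impostor = rng.normal(0.8, 0.15, 1000)  # distances, different subjects

thresholds = np.sort(np.concatenate([genuine, impostor]))
far = np.array([(impostor <= t).mean() for t in thresholds])  # false accepts
frr = np.array([(genuine > t).mean() for t in thresholds])    # false rejects
i = np.argmin(np.abs(far - frr))        # threshold where FAR ~= FRR
print(f"EER ~= {(far[i] + frr[i]) / 2:.3%} at threshold {thresholds[i]:.3f}")
</pre>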
|
|
|
|
<p>Manipulating deformable objects arises in daily life and numerous
applications. Despite phenomenal advances in industrial robotics, manipulation
of deformable objects remains mostly a manual task. This is because of the high
number of internal degrees of freedom and the complexity of predicting their
motion. In this paper, we apply the computationally efficient position-based
dynamics method to predict object motion and distance to obstacles. This
distance is incorporated in a control barrier function for the resolved motion
kinematic control for one or more robots to adjust their motion to avoid
colliding with the obstacles. The controller has been applied in simulations to
1D and 2D deformable objects with varying numbers of assistant agents,
demonstrating its versatility across different object types and multi-agent
systems. Results indicate the feasibility of real-time collision avoidance
through deformable object simulation, minimizing path tracking error while
maintaining a predefined minimum distance from obstacles and preventing
overstretching of the deformable object. The implementation is performed in
ROS, allowing ready portability to different applications.
</p>
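<p>The safety-filter idea behind the control barrier function can be sketched
in one dimension; in the paper, the barrier would instead use the
object-to-obstacle distance predicted by position-based dynamics, so the
scenario below is an assumed simplification:</p>
<pre>
x, x_obs, d_min = 0.0, 1.0, 0.2     # robot, obstacle, safety margin
alpha, dt = 2.0, 0.01
for _ in range(500):
    h = (x_obs - x) - d_min         # barrier value (>= 0 means safe)
    v_des = 1.0                     # nominal controller: move forward
    v = min(v_des, alpha * h)       # CBF filter: enforce hdot >= -alpha*h
    x += v * dt
print(f"final gap: {x_obs - x:.3f} (margin {d_min})")
</pre>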
|
|
|
|
<p>The number of Hindi speakers on social media has increased dramatically in
recent years. Regret is a common emotional experience in our everyday life.
Many speakers on social media share their regretful experiences and opinions
regularly. Regret might cause a re-evaluation of one's choices and a desire to
make a different choice if given the chance. As a result, knowing the source of
regret is critical for investigating its impact on behavior and
decision-making. This study focuses on regret and how it is expressed,
specifically in Hindi, on various social media platforms. In our study, we
present a novel dataset from three different sources, where each sentence has
been manually classified into one of three classes "Regret by action", "Regret
by inaction", and "No regret". Next, we use this dataset to investigate the
linguistic expressions of regret in Hindi text and also identify the textual
domains that are most frequently associated with regret. Our findings indicate
that individuals on social media platforms frequently express regret for both
past inactions and actions, particularly within the domain of interpersonal
relationships. We use a pre-trained BERT model to generate word embeddings for
the Hindi dataset and also compare deep learning models with conventional
machine learning models in order to demonstrate accuracy. Our results show that
BERT embeddings with a CNN consistently surpassed the other models,
demonstrating the effectiveness of BERT in capturing the context and meaning of
words in the regret domain.
</p>
|
|
|
|
<p>The Ego Network Model (ENM) is a model for the structural organisation of
relationships, rooted in evolutionary anthropology, that is found ubiquitously
in social contexts. It takes the perspective of a single user (Ego) and
organises their contacts (Alters) into a series of (typically 5) concentric
circles of decreasing intimacy and increasing size. Alters are sorted based on
their tie strength to the Ego; however, this is difficult to measure directly.
Traditionally, the interaction frequency has been used as a proxy but this
misses the qualitative aspects of connections, such as signs (i.e. polarity),
which have been shown to provide extremely useful information. However, the
sign of an online social relationship is usually an implicit piece of
information, which needs to be estimated by interaction data from Online Social
Networks (OSNs), making sign prediction in OSNs a research challenge in and of
itself. This work aims to bring the ENM into the signed networks domain by
investigating the interplay of signed connections with the ENM. This paper
makes two main contributions: first, a new and data-efficient method for
signing relationships between individuals using sentiment analysis; second, an
in-depth look at the properties of Signed Ego Networks
(SENs), using 9 Twitter datasets of various categories of users. We find that
negative connections are generally over-represented in the active part of the
Ego Networks, suggesting that Twitter greatly over-emphasises negative
relationships with respect to "offline" social networks. Further, users who use
social networks for professional reasons have an even greater share of negative
connections.
</p>
|
|
|
|
<p>This paper proposes a novel, more computationally efficient method for
optimizing robot excitation trajectories for dynamic parameter identification,
emphasizing self-collision avoidance. This addresses the system identification
challenge of obtaining high-quality training data for co-manipulated robotic
arms that can be equipped with a variety of tools, a
common scenario in industrial but also clinical and research contexts.
Utilizing the Unified Robotics Description Format (URDF) to implement a
symbolic Python implementation of the Recursive Newton-Euler Algorithm (RNEA),
the approach aids in dynamically estimating parameters such as inertia using
regression analyses on data from real robots. The excitation trajectory was
evaluated and achieved criteria on par with state-of-the-art reported results,
which did not consider self-collision or tool calibration.
Furthermore, physical Human-Robot Interaction (pHRI) admittance control
experiments were conducted in a surgical context to evaluate the derived
inverse dynamics model showing a 30.1\% workload reduction by the NASA TLX
questionnaire.
</p>
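<p>The regression step rests on the fact that joint torques are linear in the
unknown inertial parameters, tau = Y(q, qd, qdd) theta, so theta follows from
least squares over an excitation trajectory. A minimal sketch with a random
stand-in regressor matrix follows (in the paper, Y is built symbolically from
the URDF via RNEA):</p>
<pre>
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_params = 500, 10
Y = rng.normal(size=(n_samples, n_params))   # stacked regressor matrix
theta_true = rng.normal(size=n_params)       # inertial parameters
tau = Y @ theta_true + rng.normal(0, 0.05, n_samples)  # measured torques

theta_hat, *_ = np.linalg.lstsq(Y, tau, rcond=None)
print("max parameter error:", float(np.abs(theta_hat - theta_true).max()))
# A well-optimized excitation trajectory keeps Y well conditioned, which
# is what the trajectory optimization (with self-collision avoidance)
# is for.
</pre>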
|
|
|
|
<p>This paper introduces a stochastic hybrid system (SHS) framework in state
space model to capture sensor, communication, and system contingencies in
modern power systems (MPS). Within this new framework, the paper concentrates
on the development of state estimation methods and algorithms to provide
reliable state estimation under randomly intermittent and noisy sensor data.
MPSs employ diversified measurement devices for monitoring system operations
that are subject to random measurement errors and rely on communication
networks to transmit data whose channels encounter random packet loss and
interruptions. The contingency and noise form two distinct and interacting
stochastic processes that have a significant impact on state estimation
accuracy and reliability. This paper formulates stochastic hybrid system models
for MPSs, introduces coordinated observer design algorithms for state
estimation, and establishes their convergence and reliability properties. A
further study reveals a fundamental design tradeoff between convergence rates
and steady-state error variances. Simulation studies on the IEEE 5-bus system
and IEEE 33-bus system are used to illustrate the modeling methods, observer
design algorithms, convergence properties, performance evaluations, and the
impact of sensor system selections.
</p>
|
|
|
|
<p>Communication with the goal of accurately conveying meaning, rather than
accurately transmitting symbols, has become an area of growing interest. This
paradigm, termed semantic communication, typically leverages modern
developments in artificial intelligence and machine learning to improve the
efficiency and robustness of communication systems. However, a standard model
for capturing and quantifying the details of "meaning" is lacking, with many
leading approaches to semantic communication adopting a black-box framework
with little understanding of what exactly the model is learning. One solution
is to utilize the conceptual spaces framework, which models meaning explicitly
in a geometric manner. Though prior work studying semantic communication with
conceptual spaces has shown promising results, these previous attempts involve
hand-crafting a conceptual space model, severely limiting the scalability and
practicality of the approach. In this work, we develop a framework for learning
a domain of a conceptual space model using only the raw data with high-level
property labels. In experiments using the MNIST and CelebA datasets, we show
that the domains learned using the framework maintain semantic similarity
relations and possess interpretable dimensions.
</p>
|
|
|
|
<p>This study examines the use of embedded tweets in online news media. In
particular, we add to the previous literature by exploring embedded tweets
across reliable and unreliable news outlets. We use a mixed-method analysis to
examine how the function and frequency of embedded tweets change across outlet
reliability and news topic. We find that, no matter the outlet reliability,
embedded tweets are most often used to relay the opinions of elites, to
syndicate information from another news source, or to self-cite information an
outlet previously produced. Our results also show some notable differences
between reliable media and fringe media's use of tweets. Namely, fringe media
embed tweets more and use those tweets as the source of news more than reliable
media. Our work adds to the literature on hybrid media systems and the
normalization of social media in journalism.
</p>
|
|
|
|
<p>We study an opinion dynamics model in which each agent takes a random
Bernoulli-distributed action whose probability is updated at each discrete time
step, and we prove that this model converges almost surely to consensus. We
also provide a detailed critique of a claimed proof of this result in the
literature. We generalize the result by proving that the assumption of
irreducibility in the original model is not necessary. Furthermore, we prove as
a corollary of the generalized result that the almost sure convergence to
consensus holds also in the presence of a stubborn agent which never changes
its opinion. In addition, we show that the model, in both the original and
generalized cases, converges to consensus also in $r$th mean.
</p>
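<p>A tiny simulation illustrates the flavor of such models; the update rule
below (a step toward the population's average action) is an assumed
illustrative rule, not necessarily the paper's exact dynamics:</p>
<pre>
import numpy as np

rng = np.random.default_rng(1)
n, steps, gamma = 10, 2000, 0.05
p = rng.uniform(size=n)               # each agent's action probability
for _ in range(steps):
    actions = rng.uniform(size=n) < p # Bernoulli(p_i) actions
    p = (1 - gamma) * p + gamma * actions.mean()
print(p.round(3))                     # probabilities cluster together
</pre>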
|
|
|
|
<p>The dominant probing approaches rely on the zero-shot performance of
image-text matching tasks to gain a finer-grained understanding of the
representations learned by recent multimodal image-language transformer models.
The evaluation is carried out on carefully curated datasets focusing on
counting, relations, attributes, and others. This work introduces an
alternative probing strategy called guided masking. The proposed approach
ablates different modalities using masking and assesses the model's ability to
predict the masked word with high accuracy. We focus on studying multimodal
models that consider regions of interest (ROI) features obtained by object
detectors as input tokens. We probe the understanding of verbs using guided
masking on ViLBERT, LXMERT, UNITER, and VisualBERT and show that these models
can predict the correct verb with high accuracy. This contrasts with previous
conclusions drawn from image-text matching probing techniques that frequently
fail in situations requiring verb understanding. The code for all experiments
will be publicly available at https://github.com/ivana-13/guided_masking.
</p>
|
|
|
|
<p>Large Language Models (LLMs) have demonstrated remarkable language
understanding and generation capabilities. However, training, deploying, and
accessing these models pose notable challenges, including resource-intensive
demands, extended training durations, and scalability issues. To address these
issues, we introduce a concept of hierarchical, distributed LLM architecture
that aims at enhancing the accessibility and deployability of LLMs across
heterogeneous computing platforms, including general-purpose computers (e.g.,
laptops) and IoT-style devices (e.g., embedded systems). By introducing a
"layered" approach, the proposed architecture enables on-demand accessibility
to LLMs as a customizable service. This approach also ensures optimal
trade-offs between the available computational resources and the user's
application needs. We envision that the concept of hierarchical LLM will
empower extensive, crowd-sourced user bases to harness the capabilities of
LLMs, thereby fostering advancements in AI technology in general.
</p>
|
|
|
|
<p>In radiology, Artificial Intelligence (AI) has significantly advanced report
generation, but automatic evaluation of these AI-produced reports remains
challenging. Current metrics, such as Conventional Natural Language Generation
(NLG) and Clinical Efficacy (CE), often fall short in capturing the semantic
intricacies of clinical contexts or overemphasize clinical details, undermining
report clarity. To overcome these issues, our proposed method synergizes the
expertise of professional radiologists with Large Language Models (LLMs), like
GPT-3.5 and GPT-4. Utilizing In-Context Instruction Learning (ICIL) and Chain
of Thought (CoT) reasoning, our approach aligns LLM evaluations with
radiologist standards, enabling detailed comparisons between human and AI
generated reports. This is further enhanced by a Regression model that
aggregates sentence evaluation scores. Experimental results show that our
''Detailed GPT-4 (5-shot)'' model achieves a 0.48 score, outperforming the
METEOR metric by 0.19, while our ''Regressed GPT-4'' model shows even greater
alignment with expert evaluations, exceeding the best existing metric by a 0.35
margin. Moreover, the robustness of our explanations has been validated through
a thorough iterative strategy. We plan to publicly release annotations from
radiology experts, setting a new standard for accuracy in future assessments.
This underscores the potential of our approach in enhancing the quality
assessment of AI-driven medical reports.
</p>
|
|
|
|
<p>One-shot channel simulation has recently emerged as a promising alternative
to quantization and entropy coding in machine-learning-based lossy data
compression schemes. However, while there are several potential applications of
channel simulation - lossy compression with realism constraints or differential
privacy, to name a few - little is known about its fundamental limitations. In
this paper, we restrict our attention to a subclass of channel simulation
protocols called causal rejection samplers (CRS), establish new, tighter lower
bounds on their expected runtime and codelength, and demonstrate the bounds'
achievability. Concretely, for an arbitrary CRS, let $Q$ and $P$ denote a
target and proposal distribution supplied as input, and let $K$ be the number
of samples examined by the algorithm. We show that the expected runtime
$\mathbb{E}[K]$ of any CRS scales at least as $\exp_2(D_\infty[Q || P])$, where
$D_\infty[Q || P]$ is the R\'enyi $\infty$-divergence. Regarding the
codelength, we show that $D_{KL}[Q || P] \leq D_{CS}[Q || P] \leq
\mathbb{H}[K]$, where $D_{CS}[Q || P]$ is a new quantity we call the channel
simulation divergence. Furthermore, we prove that our new lower bound, unlike
the $D_{KL}[Q || P]$ lower bound, is achievable tightly, i.e. there is a CRS
such that $\mathbb{H}[K] \leq D_{CS}[Q || P] + \log_2 (e + 1)$. Finally, we
conduct numerical studies of the asymptotic scaling of the codelength of
Gaussian and Laplace channel simulation algorithms.
</p>
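<p>For discrete distributions the runtime bound is straightforward to
evaluate, since D_infty[Q || P] = log2 max_x Q(x)/P(x); a short numerical
sketch:</p>
<pre>
import numpy as np

# Runtime lower bound for discrete Q and P: E[K] >= 2 ** D_inf[Q||P].
Q = np.array([0.7, 0.2, 0.1])   # target distribution
P = np.array([0.3, 0.3, 0.4])   # proposal distribution
D_inf = np.log2((Q / P).max())
print(f"D_inf = {D_inf:.3f} bits -> E[K] >= {2 ** D_inf:.2f} samples")
</pre>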
|
|
|
|
<p>Job shop scheduling problems are one of the most important and challenging
combinatorial optimization problems that have been tackled mainly by exact or
approximate solution approaches. However, finding an exact solution can be
infeasible for real-world problems, and even with an approximate solution
approach, it can require a prohibitive amount of time to find a near-optimal
solution, and the found solutions are not applicable to new problems in
general. To address these challenges, we propose an attention-based
reinforcement learning method for the class of job shop scheduling problems by
integrating policy gradient reinforcement learning with a modified transformer
architecture. An important result is that the learners trained with the
proposed method can be reused to solve large-scale problems not seen in
training. We demonstrate that our approach outperforms the results of recent
studies and widely adopted heuristic rules.
</p>
|
|
|
|
<p>Translation into severely low-resource languages has both the cultural goal
of saving and reviving those languages and the humanitarian goal of assisting
the everyday needs of local communities, which have been intensified by the
recent COVID-19 pandemic. In many humanitarian efforts, translation into severely
low-resource languages often does not require a universal translation engine,
but a dedicated text-specific translation engine. For example, healthcare
records, hygienic procedures, government communication, emergency procedures
and religious texts are all limited texts. While generic translation engines
for all languages do not exist, translation of multilingually known limited
texts into new, low-resource languages may be possible and reduce human
translation effort. We attempt to leverage translation resources from
rich-resource languages to efficiently produce best possible translation
quality for well known texts, which are available in multiple languages, in a
new, low-resource language. To reach this goal, we argue that in translating a
closed text into low-resource languages, generalization to out-of-domain texts
is not necessary, but generalization to new languages is. Performance gain
comes from massive source parallelism by careful choice of close-by language
families, style-consistent corpus-level paraphrases within the same language
and strategic adaptation of existing large pretrained multilingual models to
the domain first and then to the language. Such performance gain makes it
possible for machine translation systems to collaborate with human translators
to expedite the translation process into new, low-resource languages.
</p>
|
|
|
|
<p>Outsourced computation can put client data confidentiality at risk. Existing
solutions are either inefficient or insufficiently secure: cryptographic
techniques like fully-homomorphic encryption incur significant overheads, even
with hardware assistance, while the complexity of hardware-assisted trusted
execution environments has been exploited to leak secret data.
</p>
<p>Recent proposals such as BliMe and OISA show how dynamic information flow
tracking (DIFT) enforced in hardware can protect client data efficiently. They
are designed to protect CPU-only workloads. However, many outsourced computing
applications, like machine learning, make extensive use of accelerators.
</p>
<p>We address this gap with Dolma, which applies DIFT to the Gemmini matrix
multiplication accelerator, efficiently guaranteeing client data
confidentiality, even in the presence of malicious/vulnerable software and side
channel attacks on the server. We show that accelerators can allow DIFT logic
optimizations that significantly reduce area overhead compared with
general-purpose processor architectures. Dolma is integrated with the BliMe
framework to achieve end-to-end security guarantees. We evaluate Dolma on an
FPGA using a ResNet-50 DNN model and show that it incurs low overheads for
large configurations ($4.4\%$, $16.7\%$, $16.5\%$ for performance, resource
usage and power, respectively, with a 32x32 configuration).
</p>
|
|
|
|
<p>A business process model represents the expected behavior of a set of process
instances (cases). The process instances may be executed in parallel and may
affect each other through data or resources. In particular, changes in values
of data shared by process instances may affect a set of process instances and
require some operations in response. Such potential effects do not explicitly
appear in the process model. This paper addresses possible impacts that may
propagate through data shared across process instances and suggests how to
analyze them at design time (when the actual process instances do not yet
exist). The suggested method uses both a process model and a (relational) data
model in order to identify potential inter-instance data impact sets. These
sets may guide process users in tracking the impacts of data changes and
supporting their handling at runtime. They can also assist process designers in
exploring possible constraints over data. The applicability of the method was
evaluated using three different realistic processes. Using a process expert, we
further assessed the usefulness of the method, revealing some useful insights
for coping with unexpected data-related changes suggested by our approach.
</p>
|
|
|
|
<p>Robotic pick and place stands at the heart of autonomous manipulation. When
conducted in cluttered or complex environments, robots must jointly reason about
the selected grasp and desired placement locations to ensure success. While
several works have examined this joint pick-and-place problem, none have fully
leveraged recent learning-based approaches for multi-fingered grasp planning.
We present a modular algorithm for joint pick and place planning that can make
use of state-of-the-art grasp classifiers for planning multi-fingered grasps
for novel objects from partial view point clouds. We demonstrate our joint pick
and place formulation with several costs associated with different placement
tasks. Experiments on pick and place tasks with cluttered scenes using a
physical robot show that our joint inference method is more successful than a
sequential pick then place approach, while also achieving better placement
configurations.
</p>
|
|
|
|
<p>This study explores linguistic differences between human and LLM-generated
dialogues, using 19.5K dialogues generated by ChatGPT-3.5 as a companion to the
EmpathicDialogues dataset. The research employs Linguistic Inquiry and Word
Count (LIWC) analysis, comparing ChatGPT-generated conversations with human
conversations across 118 linguistic categories. Results show greater
variability and authenticity in human dialogues, but ChatGPT excels in
categories such as social processes, analytical style, cognition, attentional
focus, and positive emotional tone, reinforcing recent findings of LLMs being
"more human than human." However, no significant difference was found in
positive or negative affect between ChatGPT and human dialogues. Classifier
analysis of dialogue embeddings indicates implicit coding of the valence of
affect despite no explicit mention of affect in the conversations. The research
also contributes a novel, companion ChatGPT-generated dataset of conversations
between two independent chatbots, which were designed to replicate a corpus of
human conversations available for open access and used widely in AI research on
language modeling. Our findings increase understanding of ChatGPT's linguistic
capabilities and inform ongoing efforts to distinguish between human and
LLM-generated text, which is critical in detecting AI-generated fakes,
misinformation, and disinformation.
</p>
|
|
|
|
<p>Prompt-based methods have been successfully applied to multilingual
pretrained language models for zero-shot cross-lingual understanding. However,
most previous studies primarily focused on sentence-level classification tasks,
and only a few considered token-level labeling tasks such as Named Entity
Recognition (NER) and Part-of-Speech (POS) tagging. In this paper, we propose
Token-Level Prompt Decomposition (ToPro), which facilitates the prompt-based
method for token-level sequence labeling tasks. The ToPro method decomposes an
input sentence into single tokens and applies one prompt template to each
token. Our experiments on multilingual NER and POS tagging datasets demonstrate
that ToPro-based fine-tuning outperforms Vanilla fine-tuning and Prompt-Tuning
in zero-shot cross-lingual transfer, especially for languages that are
typologically different from the source language English. Our method also
attains state-of-the-art performance when employed with the mT5 model. Besides,
our exploratory study in multilingual large language models shows that ToPro
performs much better than the current in-context learning method. Overall, the
performance improvements show that ToPro could potentially serve as a novel and
simple benchmarking method for sequence labeling tasks.
</p>
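<p>The decomposition itself is simple to sketch; the template wording below
is illustrative, not the paper's exact prompt:</p>
<pre>
# One prompt per token, rather than one prompt per sentence.
sentence = "Angela Merkel visited Paris"
template = 'What is the entity type of "{token}" in: "{sentence}"?'

prompts = [template.format(token=tok, sentence=sentence)
           for tok in sentence.split()]
for p in prompts:
    print(p)  # each prompt is scored by the pretrained model, and the
              # predicted label is assigned to that single token
</pre>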
|
|
|
|
<p>We consider the optimization of complex performance metrics in multi-label
classification under the population utility framework. We mainly focus on
metrics linearly decomposable into a sum of binary classification utilities
applied separately to each label with an additional requirement of exactly $k$
labels predicted for each instance. These "macro-at-$k$" metrics possess
desired properties for extreme classification problems with long tail labels.
Unfortunately, the at-$k$ constraint couples the otherwise independent binary
classification tasks, leading to a much more challenging optimization problem
than standard macro-averages. We provide a statistical framework to study this
problem, prove the existence and the form of the optimal classifier, and
propose a statistically consistent and practical learning algorithm based on
the Frank-Wolfe method. Interestingly, our main results concern even more
general metrics being non-linear functions of label-wise confusion matrices.
Empirical results provide evidence for the competitive performance of the
proposed approach.
</p>
|
|
|
|
<p>This paper proposes a fully distributed termination method for distributed
optimization algorithms solved by multiple agents. The proposed method
guarantees terminating a distributed optimization algorithm after satisfying
the global termination criterion using information from local computations and
neighboring agents. The proposed method requires additional iterations after
satisfying the global termination criterion to communicate the termination
status. The number of additional iterations is bounded by the diameter of the
communication network. This paper also proposes a fault-tolerant extension of
this termination method that prevents early termination due to faulty agents or
communication errors. We provide a proof of the method's correctness and
demonstrate the proposed method by solving the optimal power flow problem for
electric power grids using the alternating direction method of multipliers.
</p>
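<p>The diameter-bounded propagation of termination status can be sketched on
a small path graph; the min-consensus counter below is a simplified stand-in
for the proposed method, not its exact algorithm:</p>
<pre>
n = 4
neighbors = {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}
diameter = n - 1
locally_done = [True, True, True, True]  # local termination criteria
counter = [0] * n

# Each round, an agent's counter grows only while it and its whole
# neighborhood keep reporting convergence; a single unconverged agent
# (counter pinned at 0) caps every counter at its graph distance.
for _ in range(diameter + 1):
    counter = [1 + min([counter[j] for j in neighbors[i]] + [counter[i]])
               if locally_done[i] else 0
               for i in range(n)]

print("terminate:", [c > diameter for c in counter])  # all True here
</pre>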
|
|
|
|
<p>The Ising model, originally developed as a spin-glass model for ferromagnetic
elements, has gained popularity as a network-based model for capturing
dependencies in agents' outputs. Its increasing adoption in healthcare and the
social sciences has raised privacy concerns regarding the confidentiality of
agents' responses. In this paper, we present a novel
$(\varepsilon,\delta)$-differentially private algorithm specifically designed
to protect the privacy of individual agents' outcomes. Our algorithm allows for
precise estimation of the natural parameter using a single network through an
objective perturbation technique. Furthermore, we establish regret bounds for
this algorithm and assess its performance on synthetic datasets and two
real-world networks: one involving HIV status in a social network and the other
concerning the political leaning of online blogs.
</p>
|
|
|
|
<p>Coordination of multi-robot systems requires some form of localization between
agents, but most methods today rely on some external infrastructure. Ultra Wide
Band (UWB) sensing has gained popularity in relative localization applications,
and we see many implementations that use cooperative agents augmenting UWB
range measurements with other sensing modalities (e.g., ViO, IMU, VSLAM) for
infrastructure-free relative localization. A lesser researched option is using
Angle of Arrival (AoA) readings obtained from UWB Antenna pairs to perform
relative localization. In this paper we present a UWB platform called ReLoki
that can be used for ranging and AoA-based relative localization in~3D. ReLoki
enables any message sent from a transmitting agent to be localized by using a
Regular Tetrahedral Antenna Array (RTA). As a full scale proof of concept, we
deploy ReLoki on a 3-robot system and compare its performance in terms of
accuracy and speed with prior methods.
</p>
|
|
|
|
<p>Monocular depth estimation (MDE) is a critical component of many medical
tracking and mapping algorithms, particularly from endoscopic or laparoscopic
video. However, because ground truth depth maps cannot be acquired from real
patient data, supervised learning is not a viable approach to predict depth
maps for medical scenes. Although self-supervised learning for MDE has recently
gained attention, the outputs are difficult to evaluate reliably and each MDE's
generalizability to other patients and anatomies is limited. This work
evaluates the zero-shot performance of the newly released Depth Anything Model
on medical endoscopic and laparoscopic scenes. We compare the accuracy and
inference speeds of Depth Anything with other MDE models trained on general
scenes as well as in-domain models trained on endoscopic data. Our findings
show that although the zero-shot capability of Depth Anything is quite
impressive, it is not necessarily better than other models in both speed and
performance. We hope that this study can spark further research in employing
foundation models for MDE in medical scenes.
</p>
|
|
|
|
<p>This paper describes LeftoverLocals: a vulnerability that allows data
recovery from GPU memory created by another process on Apple, Qualcomm, and AMD
GPUs. LeftoverLocals impacts the security posture of GPU applications, with
particular significance to LLMs and ML models that run on impacted GPUs. By
recovering local memory, an optimized GPU memory region, we built a PoC where
an attacker can listen in on another user's interactive LLM session (e.g.,
llama.cpp) across process or container boundaries.
</p>
|
|
|
|
<p>Existing modeling approaches for long-duration energy storage (LDES) are
often based either on an oversimplified representation of power system
operations or limited representation of storage technologies, e.g., evaluation
of only a single application. This manuscript presents an overview of the
challenges of modeling LDES technologies, as well as a discussion regarding the
capabilities and limitations of existing approaches. We used two test power
systems with high shares of solar photovoltaics and wind (70%-90% annual
variable renewable energy shares) to assess LDES dispatch approaches.
Our results estimate that better dispatch modeling of LDES could increase the
associated operational value by 4% - 14% and increase the standard capacity
credit by 14% - 34%. Thus, a better LDES dispatch could represent significant
cost saving opportunities for electric utilities and system operators. In
addition, existing LDES dispatch modeling approaches were tested in terms of
both improved system value (e.g., based on production cost and standard
capacity credit) and scalability (e.g., based on central processing unit time
and peak memory usage). Both copper plate and nodal representations of the
power system were considered. Although the end volume target dispatch approach,
i.e., based on mid-term scheduling, showed promising performance in terms of
both improved system value and scalability, there is a need for robust and
scalable dispatch approaches for LDES in transmission-constrained electric
grids. Moreover, more research is required to better understand the optimal
operation of LDES considering extreme climate/weather events, reliability
applications, and power system operational uncertainties.
</p>
|
|
|
|
<p>Millions of online communities are governed by volunteer moderators, who
shape their communities by setting and enforcing rules, recruiting additional
moderators, and participating in the community themselves. These moderators
must regularly make decisions about how to govern, yet it is challenging to
determine what governance strategies are most successful, as measuring the
`success' of governance is complex and nuanced. Furthermore, the incredible
diversity in community topic, size, and membership all but guarantee that there
is no `one-size-fits-all' solution for community governance. In this work, we
measure governance by assessing how community members publicly discuss their
own moderators. We quantify perceptions of moderators through 1.89 million
labeled posts and comments made on Reddit over an 18-month period, and relate
these perceptions to characteristics of community governance and to different
actions that community moderators can take. We identify key differences between
different types of communities, and highlight promising strategies for
moderator teams. Amongst other findings, we show that positive perceptions of
moderators are associated with other measures of community health, and that
strict rule enforcement is perceived more favorably for certain topics, such as
news communities, than others. We investigate what kinds of moderators have the
most positive impact on the community when they join the mod team, and find
that moderators who are active community members before and during their mod
tenures result in the largest improvement of community members' perceptions of
moderators. We make all our models, datasets, and code public.
</p>
|
|
|
|
<p>Integrating deep learning with the search for new electron-phonon
superconductors represents a burgeoning field of research, where the primary
challenge lies in the computational intensity of calculating the
electron-phonon spectral function, $\alpha^2F(\omega)$, the essential
ingredient of Migdal-Eliashberg theory of superconductivity. To overcome this
challenge, we adopt a two-step approach. First, we compute $\alpha^2F(\omega)$
for 818 dynamically stable materials. We then train a deep-learning model to
predict $\alpha^2F(\omega)$, using an unconventional training strategy to
temper the model's overfitting, enhancing predictions. Specifically, we train a
Bootstrapped Ensemble of Tempered Equivariant graph neural NETworks (BETE-NET),
obtaining an MAE of 0.21, 45 K, and 43 K for the Eliashberg moments derived
from $\alpha^2F(\omega)$: $\lambda$, $\omega_{\log}$, and $\omega_{2}$,
respectively, yielding an MAE of 2.5 K for the critical temperature, $T_c$.
Further, we incorporate domain knowledge of the site-projected phonon density
of states to impose an inductive bias on the model's node attributes and enhance
predictions. This methodological innovation decreases the MAE to 0.18, 29 K,
and 28 K, respectively, yielding an MAE of 2.1 K for $T_c$. We illustrate the
practical application of our model in high-throughput screening for high-$T_c$
materials. The model demonstrates an average precision nearly five times higher
than random screening, highlighting the potential of ML in accelerating
superconductor discovery. BETE-NET accelerates the search for high-$T_c$
superconductors while setting a precedent for applying ML in materials
discovery, particularly when data is limited.
</p>
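<p>For context, the Eliashberg moments named above are standard integrals of
$\alpha^2F(\omega)$, and $T_c$ can be estimated from them with the Allen-Dynes
(McMillan-type) formula; the sketch below uses the textbook expressions with an
illustrative $\mu^*$.</p>
<pre>
import numpy as np

def eliashberg_moments(omega, a2F, mu_star=0.10):
    # omega: positive frequency grid; a2F: spectral function values.
    lam = 2.0 * np.trapz(a2F / omega, omega)
    w_log = np.exp(2.0 / lam * np.trapz(a2F * np.log(omega) / omega,
                                        omega))
    w_2 = np.sqrt(2.0 / lam * np.trapz(a2F * omega, omega))
    # Allen-Dynes estimate of Tc (in the same units as omega).
    Tc = w_log / 1.2 * np.exp(-1.04 * (1 + lam)
                              / (lam - mu_star * (1 + 0.62 * lam)))
    return lam, w_log, w_2, Tc
</pre>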
|
|
|
|
<p>In inverse problems, it is widely recognized that the incorporation of a
sparsity prior yields a regularization effect on the solution. This approach is
grounded in the a priori assumption that the unknown can be appropriately
represented in a basis with a limited number of significant components, while
most coefficients are close to zero. This occurrence is frequently observed in
real-world scenarios, such as with piecewise smooth signals. In this study, we
propose a probabilistic sparsity prior formulated as a mixture of degenerate
Gaussians, capable of modeling sparsity with respect to a generic basis. Under
this premise, we design a neural network that can be interpreted as the Bayes
estimator for linear inverse problems. Additionally, we put forth both a
supervised and an unsupervised training strategy to estimate the parameters of
this network. To evaluate the effectiveness of our approach, we conduct a
numerical comparison with commonly employed sparsity-promoting regularization
techniques, namely LASSO, group LASSO, iterative hard thresholding, and sparse
coding/dictionary learning. Notably, our reconstructions consistently exhibit
lower mean square error values across all $1$D datasets utilized for the
comparisons, even in cases where the datasets significantly deviate from a
Gaussian mixture model.
</p>
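<p>A minimal sketch of the prior itself: each coefficient is drawn from
$N(0,\sigma^2)$ with probability $p$ and from a point mass at zero (a
degenerate Gaussian) otherwise, then mapped through a generic basis $W$; the
hyperparameters are illustrative.</p>
<pre>
import numpy as np

def sample_sparse_prior(n_samples, dim, basis, p=0.1, sigma=1.0, seed=0):
    rng = np.random.default_rng(seed)
    support = rng.random((n_samples, dim)) < p        # active coords
    coeffs = support * rng.normal(0.0, sigma, (n_samples, dim))
    return coeffs @ basis.T                           # signals x = W c

# A random orthonormal basis as the "generic basis" W.
W = np.linalg.qr(np.random.default_rng(1).normal(size=(64, 64)))[0]
print(sample_sparse_prior(1000, 64, W).shape)         # (1000, 64)
</pre>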
|
|
|
|
<p>In this paper, we study linear convolutional networks with one-dimensional
filters and arbitrary strides. The neuromanifold of such a network is a
semialgebraic set, represented by a space of polynomials admitting specific
factorizations. Introducing a recursive algorithm, we generate polynomial
equations whose common zero locus corresponds to the Zariski closure of the
corresponding neuromanifold. Furthermore, we explore the algebraic complexity
of training these networks employing tools from metric algebraic geometry. Our
findings reveal that the number of all complex critical points in the
optimization of such a network is equal to the generic Euclidean distance
degree of a Segre variety. Notably, this count significantly surpasses the
number of critical points encountered in the training of a fully connected
linear network with the same number of parameters.
</p>
|
|
|
|
<p>In this paper, we present an exploration and assessment of employing a
centralized deep Q-network (DQN) controller as a substitute for the prevalent
use of PID controllers in the context of 6DOF swimming robots. Our primary
focus centers on illustrating this transition with the specific case of
underwater object tracking. DQN offers advantages such as data efficiency and
off-policy learning, while remaining simpler to implement than other
reinforcement learning methods. Given the absence of a dynamic model for our
robot, we propose an RL agent to control this multi-input-multi-output (MIMO)
system, where a centralized controller may offer more robust control than
distinct PIDs. Our approach involves initially using classical controllers for
safe exploration, then gradually shifting to DQN to take full control of the
robot.
</p>
<p>We divide the underwater tracking task into vision and control modules. We
use established methods for vision-based tracking and introduce a centralized
DQN controller. By transmitting bounding box data from the vision module to the
control module, we enable adaptation to various objects and effortless vision
system replacement. Furthermore, dealing with low-dimensional data facilitates
cost-effective online learning for the controller. Our experiments, conducted
within a Unity-based simulator, validate the effectiveness of a centralized RL
agent over separated PID controllers, showcasing the applicability of our
framework for training the underwater RL agent and improved performance
compared to traditional control methods. The code for both the real-world and
simulation implementations is available at
https://github.com/FARAZLOTFI/underwater-object-tracking.
</p>
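<p>The gradual handover from the classical controller to the DQN can be
sketched as a simple annealed mixture; the linear schedule and the method names
below are illustrative assumptions, not the paper's exact mechanism.</p>
<pre>
import random

def select_action(dqn, pid_controller, state, step, handover=10_000):
    # Probability of trusting the DQN grows from 0 to 1 over training,
    # so early exploration stays within the safe classical policy.
    p_dqn = min(1.0, step / handover)
    if random.random() < p_dqn:
        return dqn.best_action(state)   # assumed greedy DQN method
    return pid_controller(state)        # classical safe fallback
</pre>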
|
|
|
|
<p>In past work (Onokpasa, Wild, Wong, DCC 2023), we showed (a) that for joint
compression of RNA sequence and structure, stochastic context-free grammars are
the best known compressors and (b) that grammars with better compression
ability also show better performance in ab initio structure prediction.
Previous grammars were manually curated by human experts. In this work, we
develop a framework for the automatic and systematic search for stochastic
grammars with better compression (and prediction) ability for RNA.
We perform an exhaustive search of small grammars and identify grammars that
surpass the performance of human-expert grammars.
</p>
|
|
|
|
<p>We contribute the first publicly available dataset of factual claims from
different platforms and fake YouTube videos on the 2023 Israel-Hamas war for
automatic fake YouTube video classification. The FakeClaim data is collected
from 60 fact-checking organizations in 30 languages and enriched with metadata
from the fact-checking organizations curated by trained journalists specialized
in fact-checking. Further, we classify fake videos within the subset of YouTube
videos using textual information and user comments. We used a pre-trained model
to classify each video with different feature combinations. Our best-performing
fine-tuned language model, Universal Sentence Encoder (USE), achieves a Macro
F1 of 87\%, which shows that the trained model can be helpful for debunking
fake videos using the comments from the user discussion. The dataset is
available on GitHub\footnote{https://github.com/Gautamshahi/FakeClaim}.
</p>
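<p>A minimal sketch of the USE-based classification idea, using frozen
Universal Sentence Encoder embeddings with a simple downstream classifier (the
paper fine-tunes the model; the toy comments and labels here are
placeholders):</p>
<pre>
import tensorflow_hub as hub
from sklearn.linear_model import LogisticRegression

use = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")

comments = ["this is clearly old footage", "verified by journalists"]
labels = [1, 0]                        # 1 = fake, 0 = real (toy data)
X = use(comments).numpy()              # (n, 512) sentence embeddings
clf = LogisticRegression().fit(X, labels)
</pre>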
|
|
|
|
<p>Local zoning ordinances across the United States restrict the development of
energy infrastructure, including utility-scale solar photovoltaics. While these
ordinances may be developed for legitimate purposes to protect public health
and safety, they could impede power sector decarbonization or increase its
costs. We quantify the role of utility-scale solar
zoning ordinances on power sector decarbonization across the Great Lakes region
(Illinois, Indiana, Michigan, Minnesota, Ohio, and Wisconsin) by integrating
6,300 rural community zoning ordinances into a power system planning model.
Relative to no ordinances, solar zoning ordinances reduce total potential
deployment of solar PV by 52% (or 1.6 TW) across our region. Currently,
however, the biggest zoning barrier to deployment is ordinances that are
silent on utility-scale solar. Deployment restrictions translate to up to 4
GW greater investment needs and 5.6% greater PV investment costs to achieve a
10% PV generation target. Starker shifts occur at the state level, e.g.
Wisconsin sees a 40% reduction in PV investments due to zoning restrictions.
Our results underscore the need for planning that aligns local zoning laws with
state and regional goals.
</p>
|
|
|
|
<p>This paper presents new solutions for Private Information Retrieval (PIR)
with side information. This problem is motivated by PIR settings in which a
client has side information about the data held by the servers and would like
to leverage this information in order to improve the download rate. The problem
of PIR with side information has been the subject of several recent studies
that presented achievability schemes as well as converses for both multi-server
and single-server settings. However, the solutions for the multi-server
settings were adapted from the solutions for the single-server setting in a
rather straightforward manner, relying on the concept of super-messages. Such
solutions require an exponential degree of sub-packetization (in terms of the
number of messages).
</p>
<p>This paper makes the following contributions. First, we revisit the PIR
problem with side information and present a new approach to leverage side
information in the context of PIR. The key idea of our approach is a randomized
algorithm to determine the linear combinations of the sub-packets that need to
be recovered from each server. In addition, our approach takes advantage of the
fact that the identity of the side information messages does not need to be
kept private, and, as a result, the information retrieval scheme does not need
to be symmetric. Second, we present schemes for PIR with side information that
achieve a higher rate than previously proposed solutions and require a
significantly lower degree of sub-packetization (linear in the number of
servers). Our scheme not only achieves the highest known download rate for the
problem at hand but also invalidates a previously claimed converse bound on the
maximum achievable download rate.
</p>
|
|
|
|
<p>This paper revisits the problem of multi-server Private Information Retrieval
with Private Side Information (PIR-PSI). In this problem, $N$ non-colluding
servers store identical copies of $K$ messages, each comprising $L$ symbols
from $\mathbb{F}_q$, and a user, who knows $M$ of these messages, wants to
retrieve one of the remaining $K-M$ messages. The user's goal is to retrieve
the desired message by downloading the minimum amount of information from the
servers while revealing no information about the identities of the desired
message and side information messages to any server. The capacity of PIR-PSI,
defined as the maximum achievable download rate, was previously characterized
for all $N$, $K$, and $M$ when $L$ and $q$ are sufficiently large --
specifically, growing exponentially with $K$, to ensure the divisibility of
each message into $N^K$ sub-packets and to guarantee the existence of an MDS
code with its length and dimension being exponential in $K$. In this work, we
propose a new capacity-achieving PIR-PSI scheme that is applicable to all $N$,
$K$, $M$, $L$, and $q$ where $N\geq M+1$ and $N-1\mid L$. The proposed scheme
operates with a sub-packetization level of $N-1$, independent of $K$, and works
over any finite field without requiring an MDS code.
</p>
|
|
|
|
<p>For turbulent problems of industrial scale, computational cost may become
prohibitive due to the stability constraints associated with explicit time
discretization of the underlying conservation laws. On the other hand, implicit
methods allow for larger time-step sizes but require exorbitant computational
resources. Implicit-explicit (IMEX) formulations combine both temporal
approaches, using an explicit method in nonstiff portions of the domain and an
implicit method in stiff portions. While these methods can be shown to be orders of
magnitude faster than typical explicit discretizations, they are still limited
by their implicit discretization in terms of cost. Hybridization reduces the
scaling of these systems to an effective lower dimension, which allows the
system to be solved at significant speedup factors compared to standard
implicit methods. This work proposes an IMEX scheme that combines hybridized
and standard flux reconstruction (FR) methods to tackle geometry-induced
stiffness. By using the so-called transmission conditions, an overall
conservative formulation can be obtained after combining both explicit FR and
hybridized implicit FR methods. We verify and apply our approach to a series of
numerical examples, including a multi-element airfoil at Reynolds number 1.7
million. Results demonstrate speedup factors of four against standard IMEX
formulations and at least 15 against standard explicit formulations for the
same problem.
</p>
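<p>The implicit/explicit splitting idea can be seen in one line with a
first-order IMEX Euler step for $y' = f_E(y) + ay$, where the stiff linear term
$ay$ is treated implicitly; this toy scheme is far simpler than the hybridized
FR method, but it shows why the stiff part no longer limits the step size.</p>
<pre>
import numpy as np

def imex_euler(y0, dt, steps, f_explicit, a):
    # y_{n+1} = y_n + dt*f_E(y_n) + dt*a*y_{n+1}, solved for y_{n+1}.
    y, out = float(y0), [float(y0)]
    for _ in range(steps):
        y = (y + dt * f_explicit(y)) / (1.0 - dt * a)
        out.append(y)
    return np.array(out)

# Stiff decay (a = -50) with a mild nonlinear forcing; dt = 0.1 would
# be unstable for fully explicit Euler (which needs dt < 0.04 here).
traj = imex_euler(1.0, dt=0.1, steps=100,
                  f_explicit=lambda y: np.sin(y), a=-50.0)
print(traj[-1])
</pre>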
|
|
|
|
<p>The execution failure of cyber-physical systems (e.g., autonomous driving
systems, unmanned aerial systems, and robotic systems) could result in the loss
of life, severe injuries, large-scale environmental damage, property
destruction, and major economic loss. Hence, such systems usually require a
strong justification that they will effectively support critical requirements
(e.g., safety, security, and reliability) for which they were designed. Thus,
it is often mandatory to develop compelling assurance cases to support that
justification and allow regulatory bodies to certify such systems. In such
contexts, detecting assurance deficits, relying on patterns to improve the
structure of assurance cases, improving existing assurance case notations, and
(semi-)automating the generation of assurance cases are key to developing
compelling assurance cases and fostering consumer acceptance. We therefore explore
challenges related to such assurance enablers and outline some potential
directions that could be explored to tackle them.
</p>
|
|
|
|
<p>Active learning strategies for 3D object detection in autonomous driving
datasets may help to address challenges of data imbalance, redundancy, and
high-dimensional data. We demonstrate the effectiveness of entropy querying to
select informative samples, aiming to reduce annotation costs and improve model
performance. We experiment using the BEVFusion model for 3D object detection on
the nuScenes dataset, comparing active learning to random sampling and
demonstrating that entropy querying outperforms random sampling in most cases.
The method is
particularly effective in reducing the performance gap between majority and
minority classes. Class-specific analysis reveals efficient allocation of
annotated resources for limited data budgets, emphasizing the importance of
selecting diverse and informative data for model training. Our findings suggest
that entropy querying is a promising strategy for selecting data that enhances
model learning in resource-constrained environments.
</p>
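<p>Entropy querying itself is compact: rank unlabeled samples by the entropy of
the model's predictive distribution and send the most uncertain ones for
annotation. The sketch below is a generic acquisition function, not
BEVFusion-specific code.</p>
<pre>
import numpy as np

def entropy_query(probs, budget):
    # probs: (n_samples, n_classes) softmax outputs.
    H = -(probs * np.log(probs + 1e-12)).sum(axis=1)
    return np.argsort(-H)[:budget]      # indices to label next

p = np.array([[0.98, 0.01, 0.01],       # confident -> skipped
              [0.34, 0.33, 0.33]])      # uncertain -> selected first
print(entropy_query(p, budget=1))       # [1]
</pre>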
|
|
|
|
<p>Reinforcement Learning from Human Feedback (RLHF) is a widely adopted
approach for aligning large language models with human values. However, RLHF
relies on a reward model that is trained with a limited amount of human
preference data, which could lead to inaccurate predictions. As a result, RLHF
may produce outputs that are misaligned with human values. To mitigate this
issue, we contribute a reward ensemble method that allows the reward model to
make more accurate predictions. As using an ensemble of large language
model-based reward models can be computationally expensive and
resource-intensive, we
explore efficient ensemble methods including linear-layer ensemble and
LoRA-based ensemble. Empirically, we run Best-of-$n$ and Proximal Policy
Optimization with our ensembled reward models, and verify that our ensemble
methods help improve the alignment performance of RLHF outputs.
</p>
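<p>A linear-layer ensemble is especially cheap: one shared frozen backbone
feeds several small linear reward heads, and head disagreement can flag
unreliable rewards. The backbone and sizes in this sketch are placeholders, not
the paper's configuration.</p>
<pre>
import torch
import torch.nn as nn

class LinearRewardEnsemble(nn.Module):
    def __init__(self, backbone, hidden_dim, n_heads=5):
        super().__init__()
        self.backbone = backbone.eval().requires_grad_(False)  # frozen
        self.heads = nn.ModuleList(
            nn.Linear(hidden_dim, 1) for _ in range(n_heads))

    def forward(self, ids):
        h = self.backbone(ids)                       # (batch, hidden)
        r = torch.stack([head(h).squeeze(-1) for head in self.heads])
        return r.mean(0), r.std(0)     # ensemble reward, disagreement

# Toy usage with a bag-of-tokens encoder standing in for the LM.
ens = LinearRewardEnsemble(nn.EmbeddingBag(1000, 64), hidden_dim=64)
reward, spread = ens(torch.randint(0, 1000, (4, 16)))
print(reward.shape, spread.shape)      # torch.Size([4]) twice
</pre>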
|
|
|
|
<p>Code completion aims to enhance programming productivity by predicting
potential code based on the current programming context. Recently, pretrained
language models (LMs) have become prominent in this field. Various approaches
have been proposed to fine-tune LMs using supervised fine-tuning (SFT)
techniques for code completion. However, the inherent exposure bias of these
models can cause errors to accumulate early in the sequence completion, leading
to even more errors in subsequent completions. To address this problem, deep
reinforcement learning (DRL) is an alternative technique for fine-tuning LMs
for code completion, which can improve the generalization capabilities and
overall performance. Nevertheless, integrating DRL-based strategies into code
completion faces two major challenges: 1) The dynamic nature of the code
context requires the completion model to quickly adapt to changes, which poses
difficulties for conventional DRL strategies that focus on delayed rewarding of
the final code state. 2) It is difficult to evaluate the correctness of partial
code, thus the reward redistribution-based strategies cannot be adapted to code
completion. To tackle these challenges, we propose IRCoCo, a code
completion-specific DRL-based fine-tuning framework. This framework is designed
to provide immediate rewards as feedback for detecting dynamic context changes
arising from continuous edits during code completion. With the aid of immediate
feedback, the fine-tuned LM can gain a more precise understanding of the
current context, thereby enabling effective adjustment of the LM and optimizing
code completion in a more refined manner. Experimental results demonstrate that
fine-tuning pretrained LMs with IRCoCo leads to significant improvements in the
code completion task, outperforming both SFT-based and other DRL-based
baselines.
</p>
|
|
|
|
<p>Fine-tuning large pre-trained language models (LLMs) on particular datasets
is a commonly employed strategy in Natural Language Processing (NLP)
classification tasks. However, this approach usually results in a loss of
model generalizability. In this paper, we present a framework that allows for
maintaining generalizability, and enhances the performance on the downstream
task by utilizing task-specific context attribution. We show that a linear
transformation of the text representation from any transformer model using the
task-specific concept operator results in a projection onto the latent concept
space, referred to as context attribution in this paper. The specific concept
operator is optimized during the supervised learning stage via novel loss
functions. The proposed framework demonstrates that context attribution of the
text representation for each task objective can improve the capacity of the
discriminator function and thus achieve better performance for the
classification task. Experimental results on three datasets, namely HateXplain,
IMDB reviews, and Social Media Attributions, illustrate that the proposed model
attains superior accuracy and generalizability. Specifically, for the
non-fine-tuned BERT on the HateXplain dataset, we observe an 8% improvement in
accuracy and a 10% improvement in F1-score, whereas for the IMDB dataset, the
fine-tuned state-of-the-art XLNet is outperformed by 1% in both accuracy and
F1-score. Furthermore, in an out-of-domain cross-dataset test, DistilBERT
fine-tuned on the IMDB dataset in conjunction with the proposed model improves
the F1-score on the HateXplain dataset by 7%. For the Social Media Attributions
dataset of YouTube comments, we observe a 5.2% increase in F1-score. The
proposed framework is implemented with PyTorch and provided open-source on
GitHub.
</p>
|
|
|
|
<p>Large language models (LLMs) have significantly advanced natural language
processing, but their progress has been unequal across languages. While most
LLMs are trained in high-resource languages like English, multilingual models
generally underperform monolingual ones. Additionally, aspects of their
multilingual foundation, such as computational demands and licensing regimes,
sometimes restrict the artifacts they produce. In this study, we document the
development of open-foundation models tailored for use in low-resource
settings, their limitations, and their benefits. This is the TeenyTinyLlama
pair: two compact models for Brazilian Portuguese text generation. We release
them under the permissive Apache 2.0 license on GitHub and Hugging Face for
community use and further development. See
https://github.com/Nkluge-correa/TeenyTinyLlama
</p>
|
|
|
|
<p>Online platforms such as YouTube, Instagram, and TikTok rely heavily on
recommender systems to decide what content to show to which users. Content
producers often aim to produce material that is likely to be shown to users and
lead them to engage with said producer. To do so, producers try to align their
content with the preferences of their targeted user base. In this work, we
explore the equilibrium behavior of producers that are interested in maximizing
user engagement. We study two variants of the content-serving rule that the
platform's recommender system uses, and we show structural results on
producers' production at equilibrium. We leverage these structural results to
show that, in simple settings, we see specialization naturally arising from the
competition among producers trying to maximize user engagement. We provide a
heuristic for computing equilibria of our engagement game, and evaluate it
experimentally. We show i) the performance and convergence of our heuristic,
ii) the producer and user utilities at equilibrium, and iii) the level of
producer specialization.
</p>
|
|
|
|
<p>Coding theory revolves around the incorporation of redundancy into
transmitted symbols, computation tasks, and stored data to guard against
adversarial manipulation. However, error correction in coding theory is
contingent upon a strict trust assumption. In the context of computation and
storage, it is required that honest nodes outnumber adversarial ones by a
certain margin. However, in several emerging real-world cases, particularly, in
decentralized blockchain-oriented applications, such assumptions are often
unrealistic. Consequently, despite the important role of coding in addressing
significant challenges within decentralized systems, its applications become
constrained. Still, in decentralized platforms, a distinctive characteristic
emerges, offering new avenues for secure coding beyond the constraints of
conventional methods. In these scenarios, the adversary benefits when the
legitimate decoder recovers the data, preferably with a high estimation
error. This incentive motivates the adversary to act rationally, trying to
maximize its gains. In this paper, we propose a game-theoretic formulation,
called the game of coding, that captures this unique dynamic, in which the
adversary and the data collector (decoder) each have a utility function to
optimize. The utility functions reflect the fact that both the data collector
and the adversary are interested in increasing the chance of the data being
recoverable at the data collector. Moreover, the utility functions express the
data collector's interest in estimating the input with low error and the
adversary's opposite interest. As a first, still highly non-trivial step,
we characterize the equilibrium of the game for the repetition code with
repetition factor of 2, for a wide class of utility functions with minimal
assumptions.
</p>
|
|
|
|
<p>Scientific machine learning (SciML) has emerged as a versatile approach to
address complex computational science and engineering problems. Within this
field, physics-informed neural networks (PINNs) and deep operator networks
(DeepONets) stand out as the leading techniques for solving partial
differential equations by incorporating both physical equations and
experimental data. However, training PINNs and DeepONets requires significant
computational resources, including long computational times and large amounts
of memory. In search of computational efficiency, training neural networks
using half precision (float16) rather than the conventional single (float32) or
double (float64) precision has gained substantial interest, given the inherent
benefits of reduced computation time and memory consumption. However, we find
that float16 cannot be applied directly to SciML methods, because of gradient
divergence at the start of training, weight updates going to zero, and the
inability to converge to a local minimum. To overcome these limitations, we
explore mixed
precision, which is an approach that combines the float16 and float32 numerical
formats to reduce memory usage and increase computational speed. Our
experiments showcase that mixed precision training not only substantially
decreases training times and memory demands but also maintains model accuracy.
We also reinforce our empirical observations with a theoretical analysis. The
research has broad implications for SciML in various computational
applications.
</p>
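<p>In PyTorch, the mixed-precision recipe amounts to wrapping the forward pass
in autocast and scaling the loss so float16 gradients do not flush to zero,
which addresses the vanishing-update failure described above. The model and
loss below are placeholders, not a PINN.</p>
<pre>
import torch

model = torch.nn.Linear(64, 1).cuda()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()

for _ in range(100):
    x = torch.randn(256, 64, device="cuda")
    y = torch.sin(x).sum(dim=1, keepdim=True)
    opt.zero_grad()
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        loss = torch.nn.functional.mse_loss(model(x), y)
    scaler.scale(loss).backward()  # scale up so fp16 grads stay finite
    scaler.step(opt)               # unscales; skips step on inf/nan
    scaler.update()
</pre>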
|
|
|
|
<p>Autoregressive Large Language Models (LLMs) trained for next-word prediction
have demonstrated remarkable proficiency at producing coherent text. But are
they equally adept at forming coherent probability judgments? We use
probabilistic identities and repeated judgments to assess the coherence of
probability judgments made by LLMs. Our results show that the judgments
produced by these models are often incoherent, displaying human-like systematic
deviations from the rules of probability theory. Moreover, when prompted to
judge the same event, the mean-variance relationship of probability judgments
produced by LLMs shows an inverted-U shape like that seen in humans. We
propose that these deviations from rationality can be explained by linking
autoregressive LLMs to implicit Bayesian inference and drawing parallels with
the Bayesian Sampler model of human probability judgments.
</p>
|
|
|
|
<p>In this paper, we focus on the design of binary constant-weight codes that
admit low-complexity encoding and decoding algorithms, and that have size as a
power of $2$. We construct a family of $(n=2^\ell, M=2^k, d=2)$ constant-weight
codes ${\cal C}[\ell, r]$ parameterized by integers $\ell \geq 3$ and $1 \leq r
\leq \lfloor \frac{\ell+3}{4} \rfloor$, by encoding information in the gaps
between successive $1$'s of a vector. The code has weight $w = \ell$ and
combinatorial dimension $k$ that scales quadratically with $\ell$. The encoding
time is linear in the input size $k$, and the decoding time is poly-logarithmic
in the input size $n$, discounting the linear time spent on parsing the input.
Encoding and decoding algorithms of similar codes known in either the
information-theoretic or the combinatorial literature require the computation
of a large number of binomial coefficients. Our algorithms fully eliminate the
need to
evaluate binomial coefficients. While the code has a natural price to pay in
$k$, it performs fairly well against the information-theoretic upper bound
$\lfloor \log_2 {n \choose w} \rfloor$. When $\ell = 3$, the code is optimal,
achieving the upper bound; when $\ell=4$, it is one bit away from the upper
bound, and as $\ell$ grows it is order-optimal in the sense that the ratio of
$k$ with its upper bound becomes a constant $\frac{11}{16}$ when $r=\lfloor
\frac{\ell+3}{4} \rfloor$. With the same or even lower complexity, we derive
new codes permitting a wider range of parameters by modifying ${\cal C}[\ell,
r]$ in two different ways. The code derived using the first approach has the
same blocklength $n=2^\ell$, but weight $w$ is allowed to vary from $\ell-1$ to
$1$. In the second approach, the weight remains fixed as $w = \ell$, but the
blocklength is reduced to $n=2^\ell - 2^r +1$. For certain selected values of
parameters, these modified codes have an optimal $k$.
</p>
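<p>The gap-based encoding is easy to visualize: a tuple of gaps determines
where the ones are placed. The helper below shows only this placement step; the
paper's actual map from $k$ information bits to valid gap tuples (and its
binomial-free decoding) is more involved.</p>
<pre>
def gaps_to_codeword(gaps, n):
    # gaps[i] zeros precede the i-th one; weight = len(gaps).
    word, pos = [0] * n, 0
    for g in gaps:
        pos += g
        word[pos] = 1
        pos += 1
    return word

print(gaps_to_codeword([2, 0, 3], n=8))   # [0, 0, 1, 1, 0, 0, 0, 1]
</pre>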
|
|
|
|
<p>Task-based behavioral biometric authentication of users interacting in
virtual reality (VR) environments enables seamless continuous authentication by
using only the motion trajectories of the person's body as a unique signature.
Deep learning-based approaches for behavioral biometrics show high accuracy
when using complete or near-complete portions of the user trajectory, but show
lower performance when using smaller segments from the start of the task. Thus,
any systems designed with existing techniques are vulnerable while waiting for
future segments of motion trajectories to become available. In this work, we
present the first approach that predicts future user behavior using
Transformer-based forecasting and using the forecasted trajectory to perform
user authentication. Our work leverages the notion that given the current
trajectory of a user in a task-based environment we can predict the future
trajectory of the user as they are unlikely to dramatically shift their
behavior since it would preclude the user from successfully completing their
task goal. Using the publicly available 41-subject ball-throwing dataset of
Miller et al., we show improvement in user authentication when using forecasted
data. Compared to no forecasting, our approach reduces the authentication
equal error rate (EER) by an average of 23.85%, with a maximum reduction of
36.14%.
</p>
|
|
|
|
<p>In continual RL, the environment of a reinforcement learning (RL) agent
undergoes change. A successful system should appropriately balance the
conflicting requirements of retaining agent performance on already learned
tasks, stability, whilst learning new tasks, plasticity. The first-in-first-out
buffer is commonly used to enhance learning in such settings but requires
significant memory. We explore the application of an augmentation to this
buffer which alleviates the memory constraints, and use it with a
world-model-based reinforcement learning algorithm to evaluate its
effectiveness in
facilitating continual learning. We evaluate the effectiveness of our method in
Procgen and Atari RL benchmarks and show that the distribution matching
augmentation to the replay-buffer used in the context of latent world models
can successfully prevent catastrophic forgetting with significantly reduced
computational overhead. Yet we also find that such a solution is not entirely
infallible: other failure modes, such as the opposite problem of lacking
plasticity and being unable to learn a new task, remain a potential limitation
in continual learning systems.
</p>
|
|
|
|
<p>Autonomous manipulation in robot arms is a complex and evolving field of
study in robotics. This paper introduces an innovative approach to this
challenge by focusing on imitation learning (IL). Unlike traditional imitation
methods, our approach uses IL based on bilateral control, allowing for more
precise and adaptable robot movements. Conventional IL methods based on
bilateral control have relied on Long Short-Term Memory (LSTM) networks. In
this paper, we present Imitation Learning for robots using position and torque
information based on Bilateral control with Transformer (ILBiT). The proposed
method employs the
Transformer model, known for its robust performance in handling diverse
datasets and its capability to surpass LSTM's limitations, especially in tasks
requiring detailed force adjustments. A standout feature of ILBiT is its
high-frequency operation at 100 Hz, which significantly improves the system's
adaptability and response to varying environments and objects of different
hardness levels. The effectiveness of the Transformer-based ILBiT method can be
seen through comprehensive real-world experiments.
</p>
|
|
|
|
<p>BLVRUN is a command line shell script designed to offer developers within the
BLV community a succinct and insightful overview of traceback errors. Its
primary function involves parsing errors and utilizing a refined large language
model to generate informative error summaries. In terms of performance, our
model rivals that of well-known models like ChatGPT or AI-chatbot plug-ins
tailored for specific Integrated Development Environments (IDEs). Importantly,
BLV users can seamlessly integrate this tool into their existing development
workflows, eliminating the need for any modifications or adaptations to
facilitate debugging tasks.
</p>
|
|
|
|
<p>We show how continuous-depth neural ODE models can be framed as single-layer,
infinite-width nets using the Chen--Fliess series expansion for nonlinear ODEs.
In this net, the output "weights" are taken from the signature of the control
input -- a tool used to represent infinite-dimensional paths as a sequence of
tensors -- which comprises iterated integrals of the control input over a
simplex. The "features" are taken to be iterated Lie derivatives of the
output function with respect to the vector fields in the controlled ODE model.
The main result of this work applies this framework to derive compact
expressions for the Rademacher complexity of ODE models that map an initial
condition to a scalar output at some terminal time. The result leverages the
straightforward analysis afforded by single-layer architectures. We conclude
with some examples instantiating the bound for some specific systems and
discuss potential follow-up work.
</p>
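<p>As a concrete companion to the "weights", the depth-1 and depth-2 signature
terms of a piecewise-linear path are just iterated integrals and can be
computed exactly; higher depths follow the same pattern. This sketch is
illustrative and independent of the paper's specific ODE models.</p>
<pre>
import numpy as np

def signature_depth2(path):
    # path: (T, d) samples of a piecewise-linear path.
    dx = np.diff(path, axis=0)                 # increments
    S1 = dx.sum(axis=0)                        # int dx_i
    X = np.cumsum(dx, axis=0) - dx             # path value before step
    # int_{s<t} dx_i(s) dx_j(t), exact for piecewise-linear paths.
    S2 = X.T @ dx + 0.5 * np.einsum("ti,tj->ij", dx, dx)
    return S1, S2

t = np.linspace(0, 1, 100)
S1, S2 = signature_depth2(np.stack([t, t**2], axis=1))
print(S1)                      # [1. 1.]: endpoint displacement
print(S2[0, 1] + S2[1, 0])     # ~ S1[0]*S1[1] (shuffle identity)
</pre>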
|
|
|
|
<p>Red teaming is a common strategy for identifying weaknesses in generative
language models (LMs), where adversarial prompts are produced that trigger an
LM to generate unsafe responses. Red teaming is instrumental for both model
alignment and evaluation, but is labor-intensive and difficult to scale when
done by humans. In this paper, we present Gradient-Based Red Teaming (GBRT), a
red teaming method for automatically generating diverse prompts that are likely
to cause an LM to output unsafe responses. GBRT is a form of prompt learning,
trained by scoring an LM response with a safety classifier and then
backpropagating through the frozen safety classifier and LM to update the
prompt. To improve the coherence of input prompts, we introduce two variants
that add a realism loss and fine-tune a pretrained model to generate the
prompts instead of learning the prompts directly. Our experiments show that
GBRT is more effective at finding prompts that trigger an LM to generate unsafe
responses than a strong reinforcement learning-based red teaming approach, and
succeeds even when the LM has been fine-tuned to produce safer outputs.
</p>
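<p>A simplified continuous relaxation conveys the core of the method: treat the
prompt as learnable embeddings and descend the frozen classifier's "safe" score
through both frozen models. GBRT itself optimizes over vocabulary logits with a
Gumbel-softmax; the lm and safety_classifier below are assumed differentiable
callables.</p>
<pre>
import torch

def soft_prompt_red_team(lm, safety_classifier, embed_dim,
                         prompt_len=8, steps=200, lr=0.1):
    prompt = torch.randn(prompt_len, embed_dim, requires_grad=True)
    opt = torch.optim.Adam([prompt], lr=lr)
    for _ in range(steps):
        response = lm(prompt)                      # frozen LM forward
        loss = safety_classifier(response).mean()  # prob. of "safe"
        opt.zero_grad()
        loss.backward()       # gradients flow through both frozen
        opt.step()            # models into the prompt embeddings
    return prompt.detach()
</pre>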
|
|
|
|
<p>Simulating sampling algorithms with people has proven a useful method for
efficiently probing and understanding their mental representations. We propose
that the same methods can be used to study the representations of Large
Language Models (LLMs). While one can always directly prompt either humans or
LLMs to disclose their mental representations introspectively, we show that
increased efficiency can be achieved by using LLMs as elements of a sampling
algorithm. We explore the extent to which we recover human-like representations
when LLMs are interrogated with Direct Sampling and Markov chain Monte Carlo
(MCMC). We find a significant increase in efficiency and performance using
adaptive sampling algorithms based on MCMC. We also highlight the potential of
our method to yield a more general method of conducting Bayesian inference
\textit{with} LLMs.
</p>
|
|
|
|
<p>Recent studies have advocated for fully open foundation models to promote
transparency and open science. As an initial step, the Open Whisper-style
Speech Model (OWSM) reproduced OpenAI's Whisper using publicly available data
and open-source toolkits. Because they aimed to reproduce Whisper, the previous
OWSM v1 through v3 models were still based on the standard Transformer, which
might lead to inferior performance compared to other state-of-the-art speech
encoders. In
this work, we aim to improve the performance and efficiency of OWSM without
extra training data. We present E-Branchformer-based OWSM v3.1 models at two
scales, i.e., 100M and 1B. The 1B model is the largest E-Branchformer-based
speech model that has been made publicly available. It outperforms the previous
OWSM v3 in a vast majority of evaluation benchmarks, while demonstrating up to
25% faster inference speed. We publicly release the data preparation scripts,
pre-trained models and training logs.
</p>
|
|
|
|
<p>Conversational search facilitates complex information retrieval by enabling
multi-turn interactions between users and the system. Supporting such
interactions requires a comprehensive understanding of the conversational
inputs to formulate a good search query based on historical information. In
particular, the search query should include the relevant information from the
previous conversation turns. However, current approaches for conversational
dense retrieval primarily rely on fine-tuning a pre-trained ad-hoc retriever
using the whole conversational search session, which can be lengthy and noisy.
Moreover, existing approaches are limited by the amount of manual supervision
signals in the existing datasets. To address the aforementioned issues, we
propose a History-Aware Conversational Dense Retrieval (HAConvDR) system, which
incorporates two ideas: context-denoised query reformulation and automatic
mining of supervision signals based on the actual impact of historical turns.
Experiments on two public conversational search datasets demonstrate the
improved history modeling capability of HAConvDR, in particular for long
conversations with topic shifts.
</p>
|
|
|
|
<p>LiNGAM determines the variable order from cause to effect using additive
noise models, but it faces challenges with confounding. Previous methods
maintained LiNGAM's fundamental structure while trying to identify and address
variables affected by confounding. As a result, these methods required
significant computational resources regardless of the presence of confounding,
and they did not ensure the detection of all confounding types. In contrast,
this paper enhances LiNGAM by introducing LiNGAM-MMI, a method that quantifies
the magnitude of confounding using KL divergence and arranges the variables to
minimize its impact. This method efficiently achieves a globally optimal
variable order through the shortest path problem formulation. LiNGAM-MMI
processes data as efficiently as traditional LiNGAM in scenarios without
confounding while effectively addressing confounding situations. Our
experimental results suggest that LiNGAM-MMI more accurately determines the
correct variable order, both in the presence and absence of confounding.
</p>
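<p>The shortest-path view can be sketched as a Held-Karp-style dynamic program
over the lattice of variable subsets, where the edge cost of appending a
variable stands in for the paper's KL-based confounding measure; the toy cost
below is purely illustrative.</p>
<pre>
def optimal_order(variables, cost):
    # best maps an ordered-so-far set S to (total cost, order).
    best = {frozenset(): (0.0, [])}
    for _ in range(len(variables)):
        nxt = {}
        for S, (c, order) in best.items():
            for v in variables:
                if v in S:
                    continue
                T, cand = S | {v}, (c + cost(S, v), order + [v])
                if T not in nxt or cand[0] < nxt[T][0]:
                    nxt[T] = cand
        best = nxt
    return best[frozenset(variables)]

# Toy cost preferring alphabetical order (stand-in for the MMI term).
cost = lambda S, v: 0.0 if all(u < v for u in S) else 1.0
print(optimal_order(["x", "y", "z"], cost))   # (0.0, ['x', 'y', 'z'])
</pre>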
|
|
|
|
<p>As consumer Virtual Reality (VR) and Mixed Reality (MR) technologies gain
momentum, there is a growing focus on developing engaging interactions with 3D
virtual content. Unfortunately, traditional techniques for content creation,
editing, and interaction within these virtual spaces are fraught with
difficulties. They tend to be not only engineering-intensive but also require
extensive expertise, which adds to the frustration and inefficiency in virtual
object manipulation. Our proposed VR-GS system represents a leap forward in
human-centered 3D content interaction, offering a seamless and intuitive user
experience. By developing a physical dynamics-aware interactive Gaussian
Splatting in a Virtual Reality setting, and constructing a highly efficient
two-level embedding strategy alongside deformable body simulations, VR-GS
ensures real-time execution with highly realistic dynamic responses. The
components of our Virtual Reality system are designed for high efficiency and
effectiveness, starting from detailed scene reconstruction and object
segmentation, advancing through multi-view image in-painting, and extending to
interactive physics-based editing. The system also incorporates real-time
deformation embedding and dynamic shadow casting, ensuring a comprehensive and
engaging virtual experience. Our project page is available at:
https://yingjiang96.github.io/VR-GS/.
</p>
|
|
|
|
<p>Relationship inference from sparse data is an important task with
applications ranging from product recommendation to drug discovery. A recently
proposed linear model for sparse matrix completion has demonstrated a
surprising advantage in speed and accuracy over more sophisticated
recommender-system algorithms. Here we extend the linear model to develop a
shallow autoencoder
for the dual neighborhood-regularized matrix completion problem. We demonstrate
the speed and accuracy advantage of our approach over the existing
state-of-the-art in predicting drug-target interactions and drug-disease
associations.
</p>
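<p>The linear model in question is, to our reading, in the spirit of EASE
(Steck, 2019), whose item-item weights have a closed form; the sketch below
shows that baseline on a toy interaction matrix (the dual neighborhood
regularization added in this work is not shown).</p>
<pre>
import numpy as np

def ease_like(X, lam=100.0):
    # Minimize ||X - XB||^2 + lam*||B||^2 subject to diag(B) = 0.
    G = X.T @ X + lam * np.eye(X.shape[1])
    P = np.linalg.inv(G)
    B = -P / np.diag(P)           # divide column j by P[j, j]
    np.fill_diagonal(B, 0.0)
    return B                      # predicted scores: X @ B

# Toy sparse drug-target-style interaction matrix.
X = (np.random.default_rng(0).random((50, 20)) < 0.1).astype(float)
print((X @ ease_like(X)).shape)   # (50, 20)
</pre>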
|
|
|
|
<p>Smartphone overuse poses risks to people's physical and mental health.
However, current intervention techniques mainly focus on explicitly changing
screen content (i.e., output) and often fail to persistently reduce smartphone
overuse due to being over-restrictive or over-flexible. We present the design
and implementation of InteractOut, a suite of implicit input manipulation
techniques that leverage interaction proxies to weakly inhibit the natural
execution of common user gestures on mobile devices. We present a design space
for input manipulations and demonstrate 8 Android implementations of input
interventions. We first conducted a pilot lab study (N=30) to evaluate the
usability of these interventions. Based on the results, we then performed a
5-week within-subject field experiment (N=42) to evaluate InteractOut in
real-world scenarios. Compared to the traditional and common timed lockout
technique, InteractOut significantly reduced the usage time by an additional
15.0% and opening frequency by 17.0% on participant-selected target apps.
InteractOut also achieved a 25.4% higher user acceptance rate, and resulted in
less frustration and better user experience according to participants'
subjective feedback. InteractOut demonstrates a new direction for smartphone
overuse intervention and serves as a strong complementary set of techniques
with existing methods.
</p>
|
|
|
|
<p>The rapid advancement of artificial intelligence technologies, particularly
in recent years, has led to the emergence of several large parameter artificial
intelligence weather forecast models. These models represent a significant
breakthrough, overcoming the limitations of traditional numerical weather
prediction models and indicating a potential second revolution for weather
forecasting. This study explores the evolution of these advanced artificial
intelligence forecast models, and based on the identified commonalities,
proposes the "Three Large Rules" for their development. We discuss the
potential of artificial intelligence in revolutionizing numerical weather
prediction, briefly outlining the underlying reasons for this potential.
Additionally, we explore key areas and future development prospects for large
artificial intelligence weather forecast models, spanning the entire
numerical prediction process. Through an example that combines a large
artificial intelligence model with ocean wave forecasting, we illustrate how
forecasters can adapt and leverage the advanced artificial intelligence model.
While acknowledging the high accuracy, computational efficiency, and ease of
deployment of large artificial intelligence forecast models, we emphasize the
irreplaceable value of traditional numerical forecasts. We believe that the
optimal future of weather forecasting lies in achieving a seamless integration
of artificial intelligence and traditional numerical models. Such a synthesis
is anticipated to offer a more comprehensive and reliable approach for future
weather forecasting.
</p>
|
|
|
|
<p>In the rapidly evolving field of scientific research, efficiently extracting
key information from the burgeoning volume of scientific papers remains a
formidable challenge. This paper introduces AutoIE, an innovative framework
designed to automate the extraction of vital data from scientific PDF
documents, enabling researchers to discern future research trajectories more
readily. AutoIE
uniquely integrates four novel components: (1) A multi-semantic feature
fusion-based approach for PDF document layout analysis; (2) Advanced functional
block recognition in scientific texts; (3) A synergistic technique for
extracting and correlating information on molecular sieve synthesis; (4) An
online learning paradigm tailored for molecular sieve literature. Our SBERT
model achieves high Macro F1 scores of 87.19 and 89.65 on CoNLL04 and ADE
datasets. In addition, a practical application of AutoIE in the petrochemical
molecular sieve synthesis domain demonstrates its efficacy, evidenced by an
impressive 78\% accuracy rate. This research paves the way for enhanced data
management and interpretation in molecular sieve synthesis. It is a valuable
asset for seasoned experts and newcomers in this specialized field.
</p>
|
|
|
|
<p>Large Language Models increasingly rely on distributed techniques for their
training and inference. These techniques require communication across devices
which can reduce scaling efficiency as the number of devices increases. While
some distributed techniques can overlap, and thus hide, this communication with
independent computations, techniques such as Tensor Parallelism (TP) inherently
serialize communication with model execution. One approach to hide this
serialized communication is to interleave it with the producer operation (of
the communicated data) in a fine-grained manner. However, this fine-grained
interleaving of communication and computation in software can be difficult.
Furthermore, as with any concurrent execution, it requires compute and memory
resources to be shared between computation and communication, causing resource
contention that reduces overlapping efficacy.
</p>
<p>To overcome these challenges, we propose T3 which applies hardware-software
co-design to transparently overlap serialized communication while minimizing
resource contention with compute. T3 transparently fuses producer operations
with the subsequent communication via a simple configuration of the producer's
output address space and requires minor software changes. At the hardware
level, T3 adds a lightweight track-and-trigger mechanism to orchestrate the
producer's compute and communication. It further uses compute-enhanced
memories for communication's attendant compute. As a result, T3 reduces
resource contention, and efficiently overlaps serialized communication with
computation. For important Transformer models like T-NLG, T3 speeds up
communication-heavy sublayers by 30% geomean (max 47%) and reduces data
movement by 22% geomean (max 36%). Furthermore, T3's benefits persist as models
scale: geomean 29% for sublayers in $\sim$500-billion parameter models, PALM
and MT-NLG.
</p>
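<p>A software-only analogue of the interleaving T3 performs in hardware: split
the producer GEMM into chunks and launch an asynchronous all-reduce on each
chunk while the next one computes. This sketch only shows the overlap pattern
(it assumes torch.distributed has been initialized); T3 does this transparently
and without contending for compute resources.</p>
<pre>
import torch
import torch.distributed as dist

def matmul_allreduce_overlapped(x, w, n_chunks=4):
    outs, handles = [], []
    for xc in x.chunk(n_chunks, dim=0):
        yc = xc @ w                                  # producer compute
        handles.append(dist.all_reduce(yc, op=dist.ReduceOp.SUM,
                                       async_op=True))
        outs.append(yc)          # communication of yc is now in flight
    for h in handles:            # wait only at the end
        h.wait()
    return torch.cat(outs, dim=0)
</pre>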
|
|
|
|
<p>In this paper, we present a variety of classification experiments related to
the task of fictional discourse detection. We utilize a diverse array of
datasets, including contemporary professionally published fiction, historical
fiction from the Hathi Trust, fanfiction, stories from Reddit, folk tales,
GPT-generated stories, and anglophone world literature. Additionally, we
introduce a new feature set of word "supersenses" that facilitate the goal of
semantic generalization. The detection of fictional discourse can help enrich
our knowledge of large cultural heritage archives and assist with the process
of understanding the distinctive qualities of fictional storytelling more
broadly.
</p>
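<p>One plausible way to build such a supersense feature is WordNet's
lexicographer files, which bucket senses into coarse classes like noun.person
or verb.motion; the paper's exact mapping may differ. A minimal sketch using
NLTK:</p>
<pre>
import nltk
from nltk.corpus import wordnet as wn

nltk.download("wordnet", quiet=True)

def supersense(word):
    # Supersense (lexicographer file) of the most frequent sense.
    synsets = wn.synsets(word)
    return synsets[0].lexname() if synsets else None

print([supersense(w) for w in ["castle", "dragon", "whisper"]])
# e.g. ['noun.artifact', 'noun.animal', ...] depending on WordNet
</pre>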
|
|
|
|
<p>Lithium-ion batteries (LIBs) have found wide applications in a variety of
fields such as electrified transportation, stationary storage, and portable
electronic devices. A battery management system (BMS) is critical to ensure
the reliability, efficiency and longevity of LIBs. Recent research has
witnessed the emergence of model-based fault diagnosis methods in advanced
BMSs. This paper provides a comprehensive review on the model-based fault
diagnosis methods for LIBs. First, the widely explored battery models in the
existing literature are classified into physics-based electrochemical models
and electrical equivalent circuit models. Second, a general state-space
representation that describes electrical dynamics of a faulty battery is
presented. The formulation of the state vectors and the identification of the
parameter matrices are then elaborated. Third, the fault mechanisms of both
battery faults (incl. overcharge/overdischarge faults, connection faults,
short circuit faults) and sensor faults (incl. voltage sensor faults and
current sensor faults) are discussed. Furthermore, different types of modeling
uncertainties, such as modeling errors, measurement noise, aging effects, and
measurement outliers, are elaborated. An emphasis is then placed on the
observer design (incl. online state observers and offline state observers). The
algorithm implementation of typical state observers for battery fault diagnosis
is also put forward. Finally, discussion and outlook are offered to envision
some possible future research directions.
</p>
|
|
|
|
<p>In this work we introduce a manifold learning-based surrogate modeling
framework for uncertainty quantification in high-dimensional stochastic
systems. Our first goal is to perform data mining on the available simulation
data to identify a set of low-dimensional (latent) descriptors that efficiently
parameterize the response of the high-dimensional computational model. To this
end, we employ Principal Geodesic Analysis on the Grassmann manifold of the
response to identify a set of disjoint principal geodesic submanifolds, of
possibly different dimension, that captures the variation in the data. Since
operations on the Grassmann manifold require the data to be concentrated, we
propose an adaptive algorithm based on Riemannian K-means and the minimization
of the sample Frechet variance on the Grassmann manifold to identify "local"
principal
geodesic submanifolds that represent different system behavior across the
parameter space. Polynomial chaos expansion is then used to construct a mapping
between the random input parameters and the projection of the response on these
local principal geodesic submanifolds. The method is demonstrated on four test
cases: a toy example that involves points on a hypersphere, a Lotka-Volterra
dynamical system, a continuous-flow stirred-tank chemical reactor system, and a
two-dimensional Rayleigh-Benard convection problem.
</p>
|
|
|
|
<p>Multimodal federated learning (FL) aims to enrich model training in FL
settings where clients are collecting measurements across multiple modalities.
However, key challenges to multimodal FL remain unaddressed, particularly in
heterogeneous network settings where: (i) the set of modalities collected by
each client will be diverse, and (ii) communication limitations prevent clients
from uploading all their locally trained modality models to the server. In this
paper, we propose multimodal Federated learning with joint Modality and Client
selection (mmFedMC), a new FL methodology that can tackle the above-mentioned
challenges in multimodal settings. The joint selection algorithm incorporates
two main components: (a) A modality selection methodology for each client,
which weighs (i) the impact of the modality, gauged by Shapley value analysis,
(ii) the modality model size as a gauge of communication overhead, against
(iii) the frequency of modality model updates, denoted recency, to enhance
generalizability. (b) A client selection strategy for the server based on the
local loss of the modality models at each client. Experiments on five real-world
datasets demonstrate the ability of mmFedMC to achieve comparable accuracy to
several baselines while reducing the communication overhead by over 20x. A demo
video of our methodology is available at https://liangqiy.com/mmfedmc/.
</p>
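<p>A minimal sketch of the joint selection logic described above, with illustrative weights and field names (the exact scoring rule and its calibration are the paper's, not reproduced here): each client ranks its modality models by impact, communication cost, and recency, and uploads only the top ones.</p>
<pre>
def rank_modalities(stats, w_impact=1.0, w_size=0.5, w_recency=0.5, top_k=1):
    """Score each modality: reward impact and staleness, penalize model size."""
    scores = {}
    for m, s in stats.items():
        scores[m] = (w_impact * s["shapley"]                    # Shapley impact
                     - w_size * s["model_mb"]                   # upload overhead
                     + w_recency * s["rounds_since_upload"])    # recency term
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

# Hypothetical per-client statistics for two modalities.
client_stats = {
    "imu":   {"shapley": 0.42, "model_mb": 1.2, "rounds_since_upload": 3},
    "audio": {"shapley": 0.31, "model_mb": 8.5, "rounds_since_upload": 1},
}
print(rank_modalities(client_stats))   # modalities this client would upload
</pre>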
|
|
|
|
<p>Collaborative learning (CL) is a distributed learning framework that aims to
protect user privacy by allowing users to jointly train a model by sharing
their gradient updates only. However, gradient inversion attacks (GIAs), which
recover users' training data from shared gradients, impose severe privacy
threats to CL. Existing defense methods adopt different techniques, e.g.,
differential privacy, cryptography, and perturbation defenses, to defend
against the GIAs. Nevertheless, all current defense methods suffer from a poor
trade-off between privacy, utility, and efficiency. To mitigate the weaknesses
of existing solutions, we propose a novel defense method, Dual Gradient Pruning
(DGP), based on gradient pruning, which can improve communication efficiency
while preserving the utility and privacy of CL. Specifically, DGP slightly
modifies gradient pruning to obtain a stronger privacy guarantee and also
significantly improves communication efficiency; we provide a theoretical
analysis of its convergence and generalization. Our extensive experiments show that DGP can
effectively defend against the most powerful GIAs and reduce the communication
cost without sacrificing the model's utility.
</p>
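<p>For intuition, the sketch below shows plain magnitude-based gradient pruning, the primitive DGP builds on; DGP's actual dual pruning rule, which yields the stronger privacy guarantee, differs and is specified in the paper.</p>
<pre>
import torch

def prune_gradient(grad: torch.Tensor, keep_ratio: float = 0.1) -> torch.Tensor:
    """Keep only the largest-magnitude entries of a gradient tensor."""
    flat = grad.flatten()
    k = max(1, int(keep_ratio * flat.numel()))
    threshold = flat.abs().topk(k).values.min()
    mask = flat.abs() >= threshold          # zero out small entries before sharing
    return (flat * mask).view_as(grad)

g = torch.randn(4, 4)
print(prune_gradient(g, keep_ratio=0.25))
</pre>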
|
|
|
|
<p>In material sciences, characterizing faults in periodic structures is vital
for understanding material properties. To characterize magnetic labyrinthine
patterns, it is necessary to accurately identify junctions and terminals, often
featuring over a thousand closely packed defects per image. This study
introduces a new technique called TM-CNN (Template Matching - Convolutional
Neural Network) designed to detect a multitude of small objects in images, such
as defects in magnetic labyrinthine patterns. TM-CNN was used to identify these
structures in 444 experimental images, and the results were explored to deepen
the understanding of magnetic materials. It employs a two-stage detection
approach combining template matching, used in initial detection, with a
convolutional neural network, used to eliminate incorrect identifications.
Training a CNN classifier normally requires a large number of annotated
training images, a difficulty that prevents the use of CNNs in many practical applications.
TM-CNN significantly reduces the manual workload for creating training images
by automatically making most of the annotations and leaving only a small number
of corrections to human reviewers. In testing, TM-CNN achieved an impressive F1
score of 0.988, far outperforming traditional template matching and CNN-based
object detection algorithms.
</p>
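<p>A minimal sketch of the two-stage pipeline, assuming OpenCV for the template-matching stage and a hypothetical classifier callable standing in for the trained CNN (both thresholds are illustrative):</p>
<pre>
import cv2
import numpy as np

def detect_defects(image, template, classifier, match_thresh=0.7, cls_thresh=0.5):
    """Stage 1: template matching proposes candidates; stage 2: a CNN filters them."""
    response = cv2.matchTemplate(image, template, cv2.TM_CCOEFF_NORMED)
    ys, xs = np.where(response >= match_thresh)   # candidate top-left corners
    h, w = template.shape[:2]
    detections = []
    for y, x in zip(ys, xs):
        patch = image[y:y + h, x:x + w]
        if classifier(patch) >= cls_thresh:       # CNN rejects false positives
            detections.append((x, y))
    return detections
</pre>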
|
|
|
|
<p>The Standard Performance Evaluation Corporation (SPEC) CPU benchmark has been
widely used as a measure of computing performance for decades. SPEC is an
industry-standardized, CPU-intensive benchmark suite, and the collective data
provide a proxy for the history of worldwide CPU and system performance. Past
efforts have not provided or enabled answers to questions such as, how has the
SPEC benchmark suite evolved empirically over time and what micro-architecture
artifacts have had the most influence on performance? -- have any
micro-benchmarks within the suite had undue influence on the results and
comparisons among the codes? -- can the answers to these questions provide
insights to the future of computer system performance? To answer these
questions, we detail our historical and statistical analysis of specific
hardware artifacts (clock frequencies, core counts, etc.) on the performance of
the SPEC benchmarks since 1995. We discuss in detail several methods to
normalize across benchmark evolutions. We perform both isolated and collective
sensitivity analyses for various hardware artifacts and we identify one
benchmark (libquantum) that had somewhat undue influence on performance
outcomes. We also present the use of SPEC data to predict future performance.
</p>
|
|
|
|
<p>Deep learning has been widely adopted across various fields, but there has
been little focus on evaluating the performance of deep learning pipelines.
With the increased use of large datasets and complex models, it has become
common to run the training process only once and compare the result to previous
benchmarks. However, this procedure can lead to imprecise comparisons due to
the variance in neural network evaluation metrics. The metric variance comes
from the randomness inherent in the training process of deep learning
pipelines. Traditional solutions such as running the training process multiple
times are usually not feasible in deep learning due to computational
limitations. In this paper, we propose a new metric framework, Calibrated Loss
Metric, that addresses this issue by reducing the variance relative to its
vanilla counterpart. As a result, the new metric has higher accuracy in
detecting effective modeling improvements. Our approach is supported by theoretical
justifications and extensive experimental validations in the context of Deep
Click-Through Rate Prediction Models.
</p>
|
|
|
|
<p>Emerging applications, such as robot-assisted eldercare and object
recognition, generally employ deep neural network (DNN) models and
naturally require: i) handling streaming-in inference requests and ii) adapting
to possible deployment scenario changes. Online model fine-tuning is widely
adopted to satisfy these needs. However, fine-tuning involves significant
energy consumption, making it challenging to deploy on edge devices. In this
paper, we propose EdgeOL, an edge online learning framework that optimizes
inference accuracy, fine-tuning execution time, and energy efficiency through
both inter-tuning and intra-tuning optimizations. Experimental results show
that, on average, EdgeOL reduces overall fine-tuning execution time by 82%,
energy consumption by 74%, and improves average inference accuracy by 1.70%
over the immediate online learning strategy.
</p>
|
|
|
|
<p>Interactive visual grounding in Human-Robot Interaction (HRI) is challenging
yet practical due to the inevitable ambiguity in natural languages. It requires
robots to disambiguate the user input by active information gathering. Previous
approaches often rely on predefined templates to ask disambiguation questions,
resulting in performance reduction in realistic interactive scenarios. In this
paper, we propose TiO, an end-to-end system for interactive visual grounding in
human-robot interaction. Benefiting from a unified formulation of visual
dialogue and grounding, our method can be trained jointly on extensive public
data and shows superior generality in diversified and challenging open-world
scenarios. In the experiments, we validate TiO on GuessWhat?! and
InViG benchmarks, setting new state-of-the-art performance by a clear margin.
Moreover, we conduct HRI experiments on 150 carefully selected challenging
scenes as well as on real-robot platforms. Results show that our method
demonstrates superior generality to diversified visual and language inputs with
a high success rate. Codes and demos are available at
https://github.com/jxu124/TiO.
</p>
|
|
|
|
<p>3D human pose estimation captures the human joint points in three-dimensional
space while preserving depth information and physical structure, which is
essential for applications that require precise pose information, such as
human-computer interaction, scene understanding, and rehabilitation training.
Due to the challenges in data collection, mainstream datasets of 3D human pose
estimation are primarily composed of multi-view video data collected in
laboratory environments, which contains rich spatial-temporal correlation
information besides the image frame content. Given the remarkable
self-attention mechanism of transformers, capable of capturing the
spatial-temporal correlation from multi-view video datasets, we propose a
multi-stage framework for 3D sequence-to-sequence (seq2seq) human pose
detection. Firstly, the spatial module represents the human pose feature by
intra-image content, while the frame-image relation module extracts temporal
relationships and 3D spatial positional relationship features between the
multi-perspective images. Secondly, the self-attention mechanism is adopted to
eliminate the interference from non-human body parts and reduce computing
resources. Our method is evaluated on Human3.6M, a popular 3D human pose
detection dataset. Experimental results demonstrate that our approach achieves
state-of-the-art performance on this dataset.
</p>
|
|
|
|
<p>Consider the task of estimating a random vector $X$ from noisy observations
$Y = X + Z$, where $Z$ is a standard normal vector, under the $L^p$ fidelity
criterion. This work establishes that, for $1 \leq p \leq 2$, the optimal
Bayesian estimator is linear and positive definite if and only if the prior
distribution on $X$ is a (non-degenerate) multivariate Gaussian. Furthermore,
for $p > 2$, it is demonstrated that there are infinitely many priors that can
induce such an estimator.
</p>
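<p>For concreteness (a standard fact consistent with the "if" direction above): under a Gaussian prior $X \sim \mathcal{N}(\mu, \Sigma)$ with $\Sigma \succ 0$, the $p = 2$ (posterior mean) estimator is $\hat{X}(Y) = \mu + \Sigma(\Sigma + I)^{-1}(Y - \mu)$, which is affine in $Y$ with positive definite linear part $\Sigma(\Sigma + I)^{-1}$.</p>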
|
|
|
|
<p>Existing video-language studies mainly focus on learning short video clips,
leaving long-term temporal dependencies rarely explored due to the prohibitively
high computational cost of modeling long videos. To address this issue, one feasible
solution is learning the correspondence between video clips and captions, which
however inevitably encounters the multi-granularity noisy correspondence (MNC)
problem. To be specific, MNC refers to the clip-caption misalignment
(coarse-grained) and frame-word misalignment (fine-grained), hindering temporal
learning and video understanding. In this paper, we propose NOise Robust
Temporal Optimal traNsport (Norton) that addresses MNC in a unified optimal
transport (OT) framework. In brief, Norton employs video-paragraph and
clip-caption contrastive losses to capture long-term dependencies based on OT.
To address coarse-grained misalignment in video-paragraph contrast, Norton
filters out the irrelevant clips and captions through an alignable prompt
bucket and realigns asynchronous clip-caption pairs based on transport
distance. To address the fine-grained misalignment, Norton incorporates a
soft-maximum operator to identify crucial words and key frames. Additionally,
Norton exploits the potential faulty negative samples in clip-caption contrast
by rectifying the alignment target with OT assignment to ensure precise
temporal modeling. Extensive experiments on video retrieval, videoQA, and
action segmentation verify the effectiveness of our method. Code is available
at https://lin-yijie.github.io/projects/Norton.
</p>
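<p>To illustrate the OT machinery underlying the realignment step, the sketch below runs standard Sinkhorn iterations between uniform marginals on a toy clip-caption similarity matrix; Norton's actual costs, alignable prompt bucket, and soft-maximum smoothing are described in the paper.</p>
<pre>
import numpy as np

def sinkhorn(cost, eps=0.1, n_iter=200):
    """Entropy-regularized optimal transport plan between uniform marginals."""
    m, n = cost.shape
    a, b = np.full(m, 1.0 / m), np.full(n, 1.0 / n)
    K = np.exp(-cost / eps)
    u = np.ones(m)
    for _ in range(n_iter):
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]    # transport (alignment) plan

sim = np.random.rand(4, 5)                # toy clip-caption similarities
plan = sinkhorn(1.0 - sim)                # realign pairs via transport distance
print(plan.round(3))
</pre>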
|
|
|
|
<p>Contemporary theories and models for electric power system stability are
predicated on a widely held assumption that the mechanical inertia of the
rotating mass of synchronous generators provides the sole contribution to
stable and synchronized operation of this class of complex networks on
subsecond timescales. Here we formulate the electromagnetic momentum of the
field around the transmission lines that transports energy and present evidence
from a real-world bulk power network that demonstrates its physical
significance. We show the classical stability model for power networks that
overlooks this property, known as the "swing equation", may become inadequate
to analyze systems with high shares of inverter-based resources, commonly known
as "low-inertia power systems". Subsequently, we introduce a plane wave dynamic
model, consistent with the structural properties of emerging power systems with
up to 100% inverter-based resources, which identifies the concept of inertia in
power grids as a time-varying component. We leverage our theory to discuss a
number of open questions in the electric power industry. Most notably, we
postulate that the changing nature of power networks with a preponderance of
variable renewable energy power plants could strengthen power network stability
in the future; a vision which is irreconcilable with the conventional theories.
</p>
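<p>For reference, the classical model in question is the per-machine swing equation $M\ddot{\delta} + D\dot{\delta} = P_m - P_e(\delta)$, where $\delta$ is the rotor angle, $M$ the (constant) inertia coefficient, $D$ the damping, and $P_m$, $P_e$ the mechanical and electrical powers; it is this lumped, constant-$M$ abstraction that the plane wave dynamic model replaces with a time-varying notion of inertia.</p>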
|
|
|
|
<p>We consider the redundancy of the exact channel synthesis problem under an
i.i.d. assumption. Existing results provide an upper bound on the unnormalized
redundancy that is logarithmic in the block length. We show, via an improved
scheme, that the logarithmic term can be halved for most channels and
eliminated for all others. For full-support discrete memoryless channels, we
show that this is the best possible.
</p>
|
|
|
|
<p>This paper introduces the multivariate beta mixture model (MBMM), a new
probabilistic model for soft clustering. MBMM adapts to diverse cluster shapes
because of the flexible probability density function of the multivariate beta
distribution. We introduce the properties of MBMM, describe the parameter
learning procedure, and present the experimental results, showing that MBMM
fits diverse cluster shapes on synthetic and real datasets. The code is
released anonymously at \url{https://github.com/hhchen1105/mbmm/}.
</p>
|
|
|
|
<p>This paper is concerned with the ordered statistic decoding with local
constraints (LC-OSD) of binary linear block codes, which is a near
maximum-likelihood decoding algorithm. Compared with the conventional OSD, the
LC-OSD significantly reduces both the maximum and the average number of
searches. The former is achieved by performing the serial list Viterbi
algorithm (SLVA) or a two-way flipping pattern tree (FPT) algorithm with local
constraints on the test error patterns, while the latter is achieved by
incorporating tailored early termination criteria. The main objective of this
paper is to explore the relationship between the performance of the LC-OSD and
decoding parameters, such as the constraint degree and the maximum list size.
To this end, we approximate the local parity-check matrix as a totally random
matrix and then estimate the performance of the LC-OSD by analyzing with a
saddlepoint approach the performance of random codes over the channels
associated with the most reliable bits (MRBs). The random coding approach
enables us to derive an upper bound on the performance and predict the average
rank of the transmitted codeword in the list delivered by the LC-OSD. This
allows us to balance the constraint degree and the maximum list size for the
average (or maximum) time complexity reduction. Simulation results show that
the random coding approximation is numerically effective and powerful. They
also show that RS codes decoded by the LC-OSD
can approach the random coding union (RCU) bounds, verifying the efficiency and
universality of the LC-OSD.
</p>
|
|
|
|
<p>Human digital twin (HDT) is an emerging paradigm that bridges physical twins
(PTs) with powerful virtual twins (VTs) for assisting complex task executions
in human-centric services. In this paper, we study a two-timescale online
optimization for building HDT under an end-edge-cloud collaborative framework.
As a unique feature of HDT, we consider that PTs' corresponding VTs are
deployed on edge servers, consisting of not only generic models placed by
downloading experiential knowledge from the cloud but also customized models
updated by collecting personalized data from end devices. To maximize task
execution accuracy with stringent energy and delay constraints, and by taking
into account HDT's inherent mobility and status variation uncertainties, we
jointly and dynamically optimize VTs' construction and PTs' task offloading,
along with communication and computation resource allocations. Observing that
decision variables are asynchronous with different triggers, we propose a novel
two-timescale accuracy-aware online optimization approach (TACO). Specifically,
TACO utilizes an improved Lyapunov method to decompose the problem into
multiple instant ones, and then leverages piecewise McCormick envelopes and
block coordinate descent based algorithms, addressing two timescales
alternately. Theoretical analyses and simulations show that the proposed
approach can reach the asymptotic optimum with polynomial-time complexity and
demonstrate its superiority over counterparts.
</p>
|
|
|
|
<p>Leveraging the rich information extracted from light field (LF) cameras is
instrumental for dense prediction tasks. However, adapting light field data to
enhance Salient Object Detection (SOD) still follows the traditional RGB
methods and remains under-explored in the community. Previous approaches
predominantly employ a custom two-stream design to discover the implicit
angular feature within light field cameras, leading to significant information
isolation between different LF representations. In this study, we propose an
efficient paradigm (LF Tracy) to address this limitation. We eschew the
conventional specialized fusion and decoder architecture for a dual-stream
backbone in favor of a unified, single-pipeline approach. This comprises
firstly a simple yet effective data augmentation strategy called MixLD to
bridge the connection of spatial, depth, and implicit angular information under
different LF representations. A highly efficient information aggregation (IA)
module is then introduced to boost asymmetric feature-wise information fusion.
Owing to this innovative approach, our model surpasses the existing
state-of-the-art methods, particularly demonstrating a 23% improvement over
previous results on the latest large-scale PKU dataset. Using only 28.9M
parameters, the model achieves a 10% increase in accuracy with only 3M
additional parameters over its backbone using RGB images, and an 86% rise over
its backbone using LF images. The source code will be made publicly available at
https://github.com/FeiBryantkit/LF-Tracy.
</p>
|
|
|
|
<p>We demonstrate that large language models can produce reasonable numerical
ratings of the logical consistency of claims. We also outline a mathematical
approach based on sheaf theory for lifting such ratings to hypertexts such as
laws, jurisprudence, and social media and evaluating their consistency
globally. This approach is a promising avenue to increasing consistency in and
of government, as well as to combating mis- and disinformation and related
ills.
</p>
|
|
|
|
<p>Maintainers are now self-sabotaging their work in order to take political or
economic stances, a practice referred to as "protestware". In this poster, we
present our approach to understand how the discourse about such an attack went
viral, how it is received by the community, and whether developers respond to
the attack in a timely manner. We study two notable protestware cases, namely
Colors.js and es5-ext, comparing them with discussions of a typical security
vulnerability, Ua-parser, as a baseline, and perform a thematic analysis
of more than two thousand protest-related posts to extract the different
narratives when discussing protestware.
</p>
|
|
|
|
<p>State estimation for legged robots is challenging due to their highly dynamic
motion and limitations imposed by sensor accuracy. By integrating Kalman
filtering, optimization, and learning-based modalities, we propose a hybrid
solution that combines proprioception and exteroceptive information for
estimating the state of the robot's trunk. Leveraging joint encoder and IMU
measurements, our Kalman filter is enhanced through a single-rigid body model
that incorporates ground reaction force control outputs from convex Model
Predictive Control optimization. The estimate is further refined through
Gated Recurrent Units, which also consider semantic insights and robot height
from a Vision Transformer autoencoder applied to depth images. This framework
not only furnishes accurate robot state estimates, including uncertainty
evaluations, but can minimize the nonlinear errors that arise from sensor
measurements and model simplifications through learning. The proposed
methodology is evaluated in hardware using a quadruped robot on various
terrains, yielding a 65% improvement in Root Mean Squared Error compared to
our VIO SLAM baseline. Code example: https://github.com/AlexS28/OptiState
</p>
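<p>A minimal sketch of the linear Kalman predict/update cycle at the core of the hybrid estimator (generic matrices only; OptiState's filter additionally embeds the single-rigid-body model with MPC ground-reaction-force inputs and the learned refinement):</p>
<pre>
import numpy as np

def kalman_step(x, P, u, z, A, B, H, Q, R):
    """One predict/update cycle of a linear Kalman filter."""
    # Predict with the process model and control input u.
    x = A @ x + B @ u
    P = A @ P @ A.T + Q
    # Update with the measurement z.
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)        # Kalman gain
    x = x + K @ (z - H @ x)
    P = (np.eye(len(x)) - K @ H) @ P
    return x, P
</pre>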
|
|
|
|
<p>There has been a proliferation of artificial intelligence applications, where
model training is key to promising high-quality services for these
applications. However, the model training process is both time-intensive and
energy-intensive, inevitably affecting the user's demand for application
efficiency. Layer freezing, an efficient model training technique, has been
proposed to improve training efficiency. Although existing layer freezing
methods demonstrate great potential to reduce model training costs, they still
have shortcomings, such as a lack of generalizability and compromised
accuracy. For instance, existing layer freezing methods either require the
freeze configurations to be manually defined before training, which does not
apply to different networks, or use heuristic freezing criteria that make it
hard to guarantee decent accuracy in different scenarios. Therefore, a generic
and smart layer freezing method that can automatically perform
``in-situation'' layer freezing for different networks during training is
still missing. To this end, we propose a generic and efficient training framework
(SmartFRZ). The core proposed technique in SmartFRZ is attention-guided layer
freezing, which can automatically select the appropriate layers to freeze
without compromising accuracy. Experimental results show that SmartFRZ
effectively reduces the amount of computation in training and achieves
significant training acceleration, and outperforms the state-of-the-art layer
freezing approaches.
</p>
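<p>For illustration, freezing a layer in a typical framework amounts to disabling its gradients and parameter updates, as in the PyTorch sketch below; SmartFRZ's contribution is choosing which layers to freeze, and when, via its attention-based predictor (here the selection is supplied by hand):</p>
<pre>
import torch.nn as nn

def freeze_layers(model: nn.Module, layers_to_freeze):
    """Freeze the named submodules so backward and updates skip them."""
    for name, module in model.named_children():
        if name in layers_to_freeze:
            for p in module.parameters():
                p.requires_grad = False
            module.eval()        # also fix normalization statistics

# Usage: freeze the early feature extractor of a toy network.
net = nn.Sequential()
net.add_module("stem", nn.Linear(8, 8))
net.add_module("head", nn.Linear(8, 2))
freeze_layers(net, {"stem"})
</pre>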
|
|
|
|
<p>In this paper, we propose a novel approach for conducting face morphing
attacks, which utilizes optimal-landmark-guided image blending. Current face
morphing attacks can be categorized into landmark-based and generation-based
approaches. Landmark-based methods use geometric transformations to warp facial
regions according to averaged landmarks but often produce morphed images with
poor visual quality. Generation-based methods, which employ generation models
to blend multiple face images, can achieve better visual quality but are often
unsuccessful in generating morphed images that can effectively evade
state-of-the-art face recognition systems~(FRSs). Our proposed method overcomes
the limitations of previous approaches by optimizing the morphing landmarks and
using Graph Convolutional Networks (GCNs) to combine landmark and appearance
features. We model facial landmarks as nodes in a bipartite graph that is fully
connected and utilize GCNs to simulate their spatial and structural
relationships. The aim is to capture variations in facial shape and enable
accurate manipulation of facial appearance features during the warping process,
resulting in morphed facial images that are highly realistic and visually
faithful. Experiments on two public datasets prove that our method inherits the
advantages of previous landmark-based and generation-based methods and
generates morphed images with higher quality, posing a more significant threat
to state-of-the-art FRSs.
</p>
|
|
|
|
<p>The trajectory tracking problem is a fundamental control task in the study of
mechanical systems. A key construction in tracking control is the error or
difference between an actual and desired trajectory. This construction also
lies at the heart of observer design and recent advances in the study of
equivariant systems have provided a template for global error construction that
exploits the symmetry structure of a group action if such a structure exists.
Hamiltonian systems are posed on the cotangent bundle of the configuration
space of a mechanical system, and symmetries of the full cotangent bundle are
not commonly used in geometric control theory. In this paper, we propose a group
structure on the cotangent bundle of a Lie group and leverage this to define
momentum and configuration errors for trajectory tracking drawing on recent
work on equivariant observer design. We show that this error definition leads
to error dynamics that are themselves ``Euler-Poincare like'' and use these to
derive simple, almost global trajectory tracking control for fully-actuated
Euler-Poincare systems on a Lie group state space.
</p>
|
|
|
|
<p>We study variable-length feedback (VLF) codes with noiseless feedback for
discrete memoryless channels. We present a novel non-asymptotic bound, which
analyzes the average error probability and average decoding time of our
modified Yamamoto--Itoh scheme. We then optimize the parameters of our code in
the asymptotic regime where the average error probability $\epsilon$ remains a
constant as the average decoding time $N$ approaches infinity. Our second-order
achievability bound is an improvement of Polyanskiy et al.'s (2011)
achievability bound. We also universalize our code by employing the empirical
mutual information in our decoding metric and derive a second-order
achievability bound for universal VLF codes. Our results for both VLF and
universal VLF codes are extended to the additive white Gaussian noise channel
with an average power constraint. The former yields an improvement over Truong
and Tan's (2017) achievability bound. The proof of our results for universal
VLF codes uses a refined version of the method of types and an asymptotic
expansion from the nonlinear renewal theory literature.
</p>
|
|
|
|
<p>In the evolving landscape of online communication, moderating hate speech
(HS) presents an intricate challenge, compounded by the multimodal nature of
digital content. This comprehensive survey delves into the recent strides in HS
moderation, spotlighting the burgeoning role of large language models (LLMs)
and large multimodal models (LMMs). Our exploration begins with a thorough
analysis of current literature, revealing the nuanced interplay between
textual, visual, and auditory elements in propagating HS. We uncover a notable
trend towards integrating these modalities, primarily due to the complexity and
subtlety with which HS is disseminated. A significant emphasis is placed on the
advances facilitated by LLMs and LMMs, which have begun to redefine the
boundaries of detection and moderation capabilities. We identify existing gaps
in research, particularly in the context of underrepresented languages and
cultures, and the need for solutions to handle low-resource settings. The
survey concludes with a forward-looking perspective, outlining potential
avenues for future research, including the exploration of novel AI
methodologies, the ethical governance of AI in moderation, and the development
of more nuanced, context-aware systems. This comprehensive overview aims to
catalyze further research and foster a collaborative effort towards more
sophisticated, responsible, and human-centric approaches to HS moderation in
the digital era.\footnote{\textcolor{red}{WARNING: This paper contains
offensive examples.}}
</p>
|
|
|
|
<p>A recent study on the interpretability of real-valued convolutional neural
networks (CNNs) \cite{Stankovic_Mandic_2023CNN} has revealed a direct and
physically meaningful link with the task of finding features in data through
matched filters. However, applying this paradigm to illuminate the
interpretability of complex-valued CNNs meets a formidable obstacle: the
extension of matched filtering to a general class of noncircular complex-valued
data, referred to here as the widely linear matched filter (WLMF), has been
only implicit in the literature. To this end, to establish the interpretability
of the operation of complex-valued CNNs, we introduce a general WLMF paradigm,
provide its solution and undertake analysis of its performance. For rigor, our
WLMF solution is derived without imposing any assumption on the probability
density of noise. The theoretical advantages of the WLMF over its standard
strictly linear counterpart (SLMF) are provided in terms of their output
signal-to-noise-ratios (SNRs), with WLMF consistently exhibiting enhanced SNR.
Moreover, the lower bound on the SNR gain of the WLMF is derived, together
with the condition for attaining this bound. This serves to revisit the
convolution-activation-pooling chain in complex-valued CNNs through the lens of
matched filtering, which reveals the potential of WLMFs to provide physical
interpretability and enhance explainability of general complex-valued CNNs.
Simulations demonstrate the agreement between the theoretical and numerical
results.
</p>
|
|
|
|
<p>Recent developments in transformer-based language models have allowed them to
capture a wide variety of world knowledge that can be adapted to downstream
tasks with limited resources. However, what pieces of information are
understood in these models is unclear, and neuron-level contributions in
identifying them are largely unknown. Conventional approaches in neuron
explainability either depend on a finite set of pre-defined descriptors or
require manual annotations for training a secondary model that can then explain
the neurons of the primary model. In this paper, we take BERT as an example,
remove these constraints, and propose a novel and scalable framework
that ties textual descriptions to neurons. We leverage the potential of
generative language models to discover human-interpretable descriptors present
in a dataset and use an unsupervised approach to explain neurons with these
descriptors. Through various qualitative and quantitative analyses, we
demonstrate the effectiveness of this framework in generating useful
data-specific descriptors with little human involvement in identifying the
neurons that encode these descriptors. In particular, our experiments show that
the proposed approach achieves 75% precision@2 and 50% recall@2.
</p>
|
|
|
|
<p>This paper presents Flash, an optimized private inference (PI) hybrid
protocol utilizing both homomorphic encryption (HE) and secure two-party
computation (2PC), which can reduce the end-to-end PI latency for deep CNN
models to less than 1 minute on a CPU. To this end, first, Flash proposes a
low-latency convolution algorithm built upon a fast slot rotation operation and
a novel data encoding scheme, which results in 4-94x performance gain over the
state-of-the-art. Second, to minimize the communication cost introduced by the
standard nonlinear activation function ReLU, Flash replaces the entire ReLUs
with the polynomial $x^2+x$ and trains deep CNN models with the new activation
function. The trained models improve the inference accuracy for CIFAR-10/100
and TinyImageNet by 16% on average (up to 40% for ResNet-32) compared to prior
art. Last, Flash proposes an efficient 2PC-based $x^2+x$ evaluation protocol
that does not require any offline communication and that reduces the total
communication cost to process the activation layer by 84-196x over the
state-of-the-art. As a result, the end-to-end PI latency of Flash implemented
on CPU is 0.02 minute for CIFAR-100 and 0.57 minute for TinyImageNet
classification, while the total data communication is 0.07GB for CIFAR-100 and
0.22GB for TinyImageNet. Flash improves the state-of-the-art PI by 16-45x in
latency and 84-196x in communication cost. Moreover, even for ImageNet, Flash
delivers a latency of less than 1 minute on a CPU with total communication of
less than 1GB.
</p>
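<p>The activation replacement is easy to reproduce in a standard framework; a minimal PyTorch sketch (model retraining after the swap, as Flash requires, is omitted):</p>
<pre>
import torch.nn as nn

class QuadActivation(nn.Module):
    """The 2PC-friendly polynomial activation x^2 + x used in place of ReLU."""
    def forward(self, x):
        return x * x + x

def replace_relu(module: nn.Module):
    """Recursively swap every ReLU for the polynomial activation."""
    for name, child in module.named_children():
        if isinstance(child, nn.ReLU):
            setattr(module, name, QuadActivation())
        else:
            replace_relu(child)

net = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU(), nn.Conv2d(8, 8, 3), nn.ReLU())
replace_relu(net)     # the network is then retrained with the new activation
</pre>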
|
|
|
|
<p>The proliferation of deep learning in natural language processing (NLP) has
led to the development and release of innovative technologies capable of
understanding and generating human language with remarkable proficiency.
Atinuke, a Transformer-based neural network, optimises performance across
various language tasks by utilising a unique configuration. The architecture
interweaves layers for processing sequential data with attention mechanisms to
draw meaningful affinities between inputs and outputs. Due to the configuration
of its topology and hyperparameter tuning, it can emulate human-like language
by extracting features and learning complex mappings. Atinuke is modular,
extensible, and integrates seamlessly with existing machine learning pipelines.
Advanced matrix operations like softmax, embeddings, and multi-head attention
enable nuanced handling of textual, acoustic, and visual signals. By unifying
modern deep learning techniques with software design principles and
mathematical theory, the system achieves state-of-the-art results on natural
language tasks whilst remaining interpretable and robust.
</p>
|
|
|
|
<p>Feature matching is a crucial task in the field of computer vision, which
involves finding correspondences between images. Previous studies achieve
remarkable performance using learning-based feature comparison. However, the
pervasive presence of matching redundancy between images gives rise to
unnecessary and error-prone computations in these methods, imposing limitations
on their accuracy. To address this issue, we propose MESA, a novel approach to
establish precise area (or region) matches for efficient matching redundancy
reduction. MESA first leverages the advanced image understanding capability of
SAM, a state-of-the-art foundation model for image segmentation, to obtain
image areas with implicit semantics. Then, a multi-relational graph is proposed
to model the spatial structure of these areas and construct their scale
hierarchy. Based on graphical models derived from the graph, the area matching
is reformulated as an energy minimization task and effectively resolved.
Extensive experiments demonstrate that MESA yields substantial precision
improvement for multiple point matchers in indoor and outdoor downstream tasks,
e.g. +13.61% for DKM in indoor pose estimation.
</p>
|
|
|
|
<p>While generative AI is now widespread and useful in society, there are
potential risks of misuse, e.g., unconsciously influencing cognitive processes
or decision-making. Although this causes a security problem in the cognitive
domain, there has been no research about neural and computational mechanisms
counteracting the impact of malicious generative AI in humans. We propose
DecNefGAN, a novel framework that combines a generative adversarial system and
a neural reinforcement model. More specifically, DecNefGAN bridges human and
generative AI in a closed-loop system, with the AI creating stimuli that induce
specific mental states, thus exerting external control over neural activity.
The objective of the human is the opposite, to compete and reach an orthogonal
mental state. This framework can contribute to elucidating how the human brain
responds to and counteracts the potential influence of generative AI.
</p>
|
|
|
|
<p>In this paper, practical utilization of multiple distributed reconfigurable
intelligent surfaces (RISs), which are able to conduct group-specific
operations, for multi-group multicasting systems is investigated. To tackle the
inter-group interference issue in the multi-group multicasting systems, the
block diagonalization (BD)-based beamforming is considered first. Without any
inter-group interference after the BD operation, the multiple distributed RISs
are operated to maximize the minimum rate for each group. Since the
computational complexity of the BD-based beamforming can be too high, a
multicasting tailored zero-forcing (MTZF) beamforming technique is proposed to
efficiently suppress the inter-group interference, and the novel design for the
multiple RISs that makes up for the inevitable loss of MTZF beamforming is also
described. Effective closed-form solutions for the loss minimizing RIS
operations are obtained with basic linear operations, making the proposed MTZF
beamforming-based RIS design highly practical. Numerical results show that the
BD-based approach can achieve a high sum-rate, but it is useful only when the
base station deploys large antenna arrays. Even with a small number
of antennas, the MTZF beamforming-based approach outperforms the other schemes
in terms of the sum-rate while the technique requires low computational
complexity. The results also show that the proposed techniques can operate
under the minimum rate requirement for each group.
</p>
|
|
|
|
<p>Algorithmic decisions in critical domains such as hiring, college admissions,
and lending are often based on rankings. Because of the impact these decisions
have on individuals, organizations, and population groups, there is a need to
understand them: to know whether the decisions are abiding by the law, to help
individuals improve their rankings, and to design better ranking procedures.
</p>
<p>In this paper, we present ShaRP (Shapley for Rankings and Preferences), a
framework that explains the contributions of features to different aspects of a
ranked outcome, and is based on Shapley values. Using ShaRP, we show that even
when the scoring function used by an algorithmic ranker is known and linear,
the weight of each feature does not correspond to its Shapley value
contribution. The contributions instead depend on the feature distributions,
and on the subtle local interactions between the scoring features. ShaRP builds
on the Quantitative Input Influence framework, and can compute the
contributions of features for multiple Quantities of Interest, including score,
rank, pair-wise preference, and top-k. Because it relies on black-box access to
the ranker, ShaRP can be used to explain both score-based and learned ranking
models. We show results of an extensive experimental validation of ShaRP using
real and synthetic datasets, showcasing its usefulness for qualitative
analysis.
</p>
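<p>To make the Shapley computation concrete, the sketch below estimates feature contributions to a black-box score by permutation sampling; ShaRP applies the same principle to rank-based quantities of interest, and the toy scorer and baseline here are illustrative assumptions.</p>
<pre>
import numpy as np

def shapley_values(score_fn, x, baseline, n_samples=200, seed=0):
    """Monte Carlo Shapley attribution for a black-box scoring function."""
    rng = np.random.default_rng(seed)
    d = len(x)
    phi = np.zeros(d)
    for _ in range(n_samples):
        order = rng.permutation(d)
        current = baseline.astype(float).copy()
        prev = score_fn(current)
        for j in order:
            current[j] = x[j]              # reveal feature j
            new = score_fn(current)
            phi[j] += new - prev           # marginal contribution
            prev = new
    return phi / n_samples

score = lambda v: 0.7 * v[0] + 0.3 * v[1]  # toy linear scorer
print(shapley_values(score, np.array([1.0, 1.0]), np.zeros(2)))
</pre>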
|
|
|
|
<p>Large language models (LLMs) are increasingly relied upon for complex
multi-turn conversations across diverse real-world applications. However,
existing benchmarks predominantly focus on single-turn evaluations, overlooking
the models' capabilities in multi-turn interactions. To address this gap, we
introduce MT-Eval, a comprehensive benchmark designed to evaluate multi-turn
conversational abilities. By analyzing human-LLM conversations, we categorize
interaction patterns into four types: recollection, expansion, refinement, and
follow-up. We construct multi-turn queries for each category either by
augmenting existing datasets or by creating new examples with GPT-4 to avoid
data leakage. To study the factors impacting multi-turn abilities, we create
single-turn versions of the 1170 multi-turn queries and compare performance.
Our evaluation of 11 well-known LLMs shows that while closed-source models
generally surpass open-source ones, certain open-source models exceed
GPT-3.5-Turbo in specific tasks. We observe significant performance degradation
in multi-turn settings compared to single-turn settings in most models, which
is not correlated with the models' fundamental capabilities. Moreover, we
identify the distance to relevant content and susceptibility to error
propagation as the key factors influencing multi-turn performance. MT-Eval is
released publicly to encourage future research towards more robust
conversational models.
</p>
|
|
|
|
<p>Racism is an alarming phenomenon in our country as well as all over the
world. Every day we come across racist comments in both our daily and virtual
lives. We can, however, work to eradicate such racism from virtual life (such
as social media). In this paper, we attempt to detect racist comments with NLP
and deep learning techniques. We built a novel dataset in the Bengali
language, annotated it, and conducted data label validation. After extensive
application of deep learning methodologies, we achieved text detection with an
accuracy of 87.94\% using an ensemble approach. We applied RNN and LSTM models
using BERT embeddings; however, the MCNN-LSTM model performed best among the
individual models. Lastly, the ensemble approach was used to combine all the
model results to increase overall performance.
</p>
|
|
|
|
<p>We study communication over a Gaussian multiple-access channel (MAC) with two
types of transmitters: Digital transmitters hold a message from a discrete set
that needs to be communicated to the receiver. Analog transmitters hold
sequences of analog values, and some function of these distributed values (but
not the values themselves) needs to be conveyed to the receiver. The digital
messages are required to be decoded error-free at the receiver
with high probability while the recovered analog function values have to
satisfy a fidelity criterion such as an upper bound on mean squared error (MSE)
or a certain maximum error with a given confidence. For the case in which the
computed function for the analog transmitters is a sum of values in [-1,1], we
derive inner and outer bounds for the tradeoff of digital and analog rates of
communication under peak and average power constraints for digital transmitters
and a peak power constraint for analog transmitters. We then extend the
achievability part of our result to a larger class of functions that includes
all linear, but also some non-linear functions.
</p>
|
|
|
|
<p>This paper studies zero-shot anomaly classification (AC) and segmentation
(AS) in industrial vision. We reveal that the abundant normal and abnormal cues
implicit in unlabeled test images can be exploited for anomaly determination,
which is ignored by prior methods. Our key observation is that for the
industrial product images, the normal image patches could find a relatively
large number of similar patches in other unlabeled images, while the abnormal
ones only have a few similar patches. We leverage such a discriminative
characteristic to design a novel zero-shot AC/AS method by Mutual Scoring
(MuSc) of the unlabeled images, which does not need any training or prompts.
Specifically, we perform Local Neighborhood Aggregation with Multiple Degrees
(LNAMD) to obtain the patch features that are capable of representing anomalies
in varying sizes. Then we propose the Mutual Scoring Mechanism (MSM) to
leverage the unlabeled test images to assign the anomaly score to each other.
Furthermore, we present an optimization approach named Re-scoring with
Constrained Image-level Neighborhood (RsCIN) for image-level anomaly
classification to suppress the false positives caused by noises in normal
images. The superior performance on the challenging MVTec AD and VisA datasets
demonstrates the effectiveness of our approach. Compared with the
state-of-the-art zero-shot approaches, MuSc achieves a $\textbf{21.1\%}$ PRO
absolute gain (from 72.7% to 93.8%) on MVTec AD, a $\textbf{19.4\%}$ pixel-AP
gain and a $\textbf{14.7\%}$ pixel-AUROC gain on VisA. In addition, our
zero-shot approach outperforms most of the few-shot approaches and is
comparable to some one-class methods. Code is available at
https://github.com/xrli-U/MuSc.
</p>
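<p>A minimal sketch of the mutual-scoring idea on pre-extracted patch features (feature extraction, the LNAMD aggregation, and RsCIN are omitted; array shapes are illustrative): each image's patches are scored by their nearest neighbors in the other unlabeled images, so patches without close matches stand out as anomalous.</p>
<pre>
import numpy as np

def mutual_scores(patch_feats):
    """Score patches of each image against patches of all *other* images.

    patch_feats: list of (num_patches, dim) arrays, one per unlabeled test image.
    """
    scores = []
    for i, F in enumerate(patch_feats):
        others = np.concatenate([G for j, G in enumerate(patch_feats) if j != i])
        Fn = F / np.linalg.norm(F, axis=1, keepdims=True)
        On = others / np.linalg.norm(others, axis=1, keepdims=True)
        nearest = 1.0 - (Fn @ On.T).max(axis=1)   # cosine distance to best match
        scores.append(nearest)                     # large -> likely anomalous
    return scores

feats = [np.random.rand(16, 32) for _ in range(3)]
print([s.max().round(3) for s in mutual_scores(feats)])
</pre>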
|
|
|
|
<p>Powered by the increasing predictive capabilities of machine learning
algorithms, artificial intelligence (AI) systems have begun to be used to
overrule human mistakes in many settings. We provide the first field evidence
that this AI oversight carries psychological costs that can impact human
decision-making. We investigate one of the highest-visibility settings in which
AI oversight has occurred: the Hawk-Eye review of umpires in top tennis
tournaments. We find that umpires lowered their overall mistake rate after the
introduction of Hawk-Eye review, in line with rational inattention given
psychological costs of being overruled by AI. We also find that umpires
increased the rate at which they called balls in, which produced a shift from
making Type II errors (calling a ball out when in) to Type I errors (calling a
ball in when out). We structurally estimate the psychological costs of being
overruled by AI using a model of rational inattentive umpires, and our results
suggest that because of these costs, umpires cared twice as much about Type II
errors under AI oversight.
</p>
|
|
|
|
<p>Dynamical behaviors of complex interacting systems, including brain
activities, financial price movements, and physical collective phenomena, are
associated with underlying interactions between the system's components. The
issue of uncovering interaction relations in such systems using observable
dynamics is called relational inference. In this study, we propose a Diffusion
model for Relational Inference (DiffRI), inspired by a self-supervised method
for probabilistic time series imputation. DiffRI learns to infer the
probability of the presence of connections between components through
conditional diffusion modeling. Experiments on both simulated and quasi-real
datasets show that DiffRI is highly competent compared with other
state-of-the-art models in discovering ground truth interactions in an
unsupervised manner. Our code will be made public soon.
</p>
|
|
|
|
<p>Executing deep neural networks (DNNs) on edge artificial intelligence (AI)
devices enables various autonomous mobile computing applications. However, the
memory budget of edge AI devices restricts the number and complexity of DNNs
allowed in such applications. Existing solutions, such as model compression or
cloud offloading, reduce the memory footprint of DNN inference at the cost of
decreased model accuracy or autonomy. To avoid these drawbacks, we divide DNN
into blocks and swap them in and out in order, such that large DNNs can execute
within a small memory budget. Nevertheless, naive swapping on edge AI devices
induces significant delays due to the redundant memory operations in the DNN
development ecosystem for edge AI devices. To this end, we develop SwapNet, an
efficient DNN block swapping middleware for edge AI devices. We systematically
eliminate the unnecessary memory operations during block swapping while
retaining compatibility with the deep learning frameworks, GPU backends, and
hardware architectures of edge AI devices. We further showcase the utility of
SwapNet via a multi-DNN scheduling scheme. Evaluations on eleven DNN inference
tasks in three applications demonstrate that SwapNet achieves almost the same
latency as the case with sufficient memory even when DNNs demand 2.32x to 5.81x
memory beyond the available budget. The design of SwapNet also provides novel
and feasible insights for deploying large language models (LLMs) on edge AI
devices in the future.
</p>
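<p>The core idea of block swapping is straightforward to sketch, assuming per-block weights stored on disk and hypothetical block constructors; SwapNet's contribution is making this path efficient by removing the redundant memory operations that a naive version like the one below incurs.</p>
<pre>
import torch

def run_swapped(x, block_ctors, block_files, device="cpu"):
    """Execute a DNN block by block, keeping only one block in memory at a time."""
    for ctor, path in zip(block_ctors, block_files):
        block = ctor()
        block.load_state_dict(torch.load(path))   # swap the block in
        block.to(device).eval()
        with torch.no_grad():
            x = block(x)
        del block                                  # swap it out, freeing memory
    return x
</pre>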
|
|
|
|
<p>We construct a system, Sandi, to bring trust in online communication between
parties that share little or no context. Sandi is based on a unique ``somewhat
monotone'' privacy-preserving reputation system, with strong privacy and
security properties. Registered senders request cryptographic tags from Sandi,
which they attach to their messages. Message receivers do not need registered
accounts, but they can use a sender's score to decide how much the sender
should be trusted. If a receiver finds the message inappropriate, they can use
the tag to report the sender to Sandi, thus decreasing the sender's score. The
design of Sandi ensures compatibility with any communication system that allows
for small binary data transmission.
</p>
<p>Sandi aims to benefit both senders and receivers. Senders benefit, as
receivers are more likely to react to their messages with reputation scores
attached. Receivers benefit, as they can make better choices in who to interact
with based on indisputable evidence from prior receivers.
</p>
<p>Sandi does not require senders or receivers to maintain long-term secret
keys. We provide a score integrity guarantee for the senders, a full
communication privacy guarantee for the senders and receivers, a report privacy
guarantee to protect reporting receivers, and an unlinkability guarantee to
protect senders.
</p>
<p>Finally, we provide a game-theoretic analysis for the sender. We prove that,
for any score function satisfying a list of properties, Sandi drives rational
senders towards a strategy that reduces the number of inappropriate messages.
</p>
|
|
|
|
<p>Weight quantization is an effective technique to compress deep neural
networks for their deployment on edge devices with limited resources.
Traditional loss-aware quantization methods commonly use the quantized gradient
to replace the full-precision gradient. However, we discover that the gradient
error leads to an unexpected zig-zagging issue in the gradient descent
learning procedure, where the gradient directions rapidly oscillate or
zig-zag, and this issue seriously slows down model convergence.
Accordingly, this paper proposes a one-step-forward-and-backtrack scheme for
loss-aware quantization to obtain a more accurate and stable gradient
direction and overcome this issue. During the gradient descent learning, a
one-step forward search is designed to find the trial gradient of the next
step, which is adopted to adjust the gradient of the current step towards the direction of fast
convergence. After that, we backtrack the current step to update the
full-precision and quantized weights through the current-step gradient and the
trial gradient. A series of theoretical analysis and experiments on benchmark
deep models have demonstrated the effectiveness and competitiveness of the
proposed method, and our method especially outperforms others on the
convergence performance.
</p>
|
|
|
|
<p>Diffusion-based text-to-image personalization has achieved great success in
generating user-specified subjects in various contexts. However, existing
fine-tuning-based methods still suffer from model overfitting, which greatly
harms generative diversity, especially when the given subject images are few.
To this end, we propose Pick-and-Draw, a training-free semantic
guidance approach to boost identity consistency and generative diversity for
personalization methods. Our approach consists of two components: appearance
picking guidance and layout drawing guidance. As for the former, we construct
an appearance palette with visual features from the reference image, where we
pick local patterns for generating the specified subject with consistent
identity. As for layout drawing, we outline the subject's contour by referring
to a generative template from the vanilla diffusion model, and inherit the
strong image prior to synthesize diverse contexts according to different text
conditions. The proposed approach can be applied to any personalized diffusion
models and requires as few as a single reference image. Qualitative and
quantitative experiments show that Pick-and-Draw consistently improves identity
consistency and generative diversity, pushing the trade-off between subject
fidelity and image-text fidelity to a new Pareto frontier.
</p>
|
|
|
|
<p>We present an overview of recent developments on the convergence analysis of
numerical methods for inviscid multidimensional compressible flows that
preserve underlying physical structures. We introduce the concept of
generalized solutions, the so-called dissipative solutions, and explain their
relationship to other commonly used solution concepts. In numerical
experiments, we apply K-convergence to numerical solutions and approximate
turbulent solutions together with the Reynolds stress defect and the energy defect.
</p>
|
|
|
|
<p>Witnessing the evolution of text-to-image diffusion models, significant
strides have been made in text-to-3D generation. Currently, two primary
paradigms dominate the field of text-to-3D: the feed-forward generation
solutions, capable of swiftly producing 3D assets but often yielding coarse
results, and the Score Distillation Sampling (SDS) based solutions, known for
generating high-fidelity 3D assets albeit at a slower pace. The synergistic
integration of these methods holds substantial promise for advancing 3D
generation techniques. In this paper, we present BoostDream, a highly efficient
plug-and-play 3D refining method designed to transform coarse 3D assets into
high-quality ones. The BoostDream framework comprises three distinct processes: (1)
We introduce 3D model distillation that fits differentiable representations
from the 3D assets obtained through feed-forward generation. (2) A novel
multi-view SDS loss is designed, which utilizes a multi-view aware 2D diffusion
model to refine the 3D assets. (3) We propose to use prompts and multi-view
consistent normal maps as guidance in refinement. Our extensive experiments are
conducted on different differentiable 3D representations, revealing that
BoostDream excels in generating high-quality 3D assets rapidly, overcoming the
Janus problem compared to conventional SDS-based methods. This breakthrough
signifies a substantial advancement in both the efficiency and quality of 3D
generation processes.
</p>
|
|
|
|
<p>Large Language Models (LLMs) have become increasingly popular for their
advanced text generation capabilities across various domains. However, like any
software, they face security challenges, including the risk of 'jailbreak'
attacks that manipulate LLMs to produce prohibited content. A particularly
underexplored area is the Multilingual Jailbreak attack, where malicious
questions are translated into various languages to evade safety filters.
Currently, there is a lack of comprehensive empirical studies addressing this
specific threat.
</p>
<p>To address this research gap, we conducted an extensive empirical study on
Multilingual Jailbreak attacks. We developed a novel semantic-preserving
algorithm to create a multilingual jailbreak dataset and conducted an
exhaustive evaluation on both widely-used open-source and commercial LLMs,
including GPT-4 and LLaMa. Additionally, we performed interpretability analysis
to uncover patterns in Multilingual Jailbreak attacks and implemented a
fine-tuning mitigation method. Our findings reveal that our mitigation strategy
significantly enhances model defense, reducing the attack success rate by
96.2%. This study provides valuable insights into understanding and mitigating
Multilingual Jailbreak attacks.
</p>
|
|
|
|
<p>Deep Neural Network (DNN) models, when deployed on devices as inference
engines, are susceptible to Fault Injection Attacks (FIAs) that manipulate
model parameters and disrupt inference execution with disastrous performance.
This work introduces Contrastive Learning (CL) of visual representations, a
self-supervised learning approach, into the deep learning training and
inference pipeline to implement DNN inference engines with self-resilience
under FIAs. Our proposed CL-based FIA Detection and
Recovery (CFDR) framework features (i) real-time detection with only a single
batch of testing data and (ii) fast recovery effective even with only a small
amount of unlabeled testing data. Evaluated with the CIFAR-10 dataset on
multiple types of FIAs, our CFDR shows promising detection and recovery
effectiveness.
</p>
|
|
|
|
<p>Molecular core structures and R-groups are essential concepts in drug
development. Integration of these concepts with conventional graph pre-training
approaches can promote deeper understanding in molecules. We propose MolPLA, a
novel pre-training framework that employs masked graph contrastive learning to
understand the underlying decomposable parts in molecules that constitute
their core structure and peripheral R-groups. Furthermore, we formulate an
additional framework that grants MolPLA the ability to help chemists find
replaceable R-groups in lead optimization scenarios. Experimental results on
molecular property prediction show that MolPLA exhibits predictability
comparable to current state-of-the-art models. Qualitative analysis indicates
that MolPLA is capable of distinguishing core and R-group sub-structures,
identifying decomposable regions in molecules and contributing to lead
optimization scenarios by rationally suggesting R-group replacements given
various query core templates. The code implementation for MolPLA and its
pre-trained model checkpoint is available at https://github.com/dmis-lab/MolPLA
</p>
|
|
|
|
<p>Imitation learning is often used in addition to reinforcement learning in
environments where reward design is difficult or where rewards are sparse, but
it is difficult to imitate well in unknown states from a small amount of
expert and sampled data. Supervised learning methods such as Behavioral
Cloning do not require sampled data but usually suffer from distribution
shift. Methods based on reinforcement learning, such as inverse reinforcement
learning and Generative Adversarial Imitation Learning (GAIL), can learn from
only a few expert demonstrations; however, they often need to interact with
the environment. Soft Q Imitation Learning (SQIL) addresses these problems and
has been shown to learn efficiently by combining Behavioral Cloning and soft
Q-learning with constant rewards. To make this algorithm more robust to
distribution shift, we propose a more efficient and robust algorithm that adds
to SQIL a reward function based on adversarial inverse reinforcement learning,
which rewards the agent for performing actions in states similar to the
demonstrations. We call this algorithm Discriminator Soft Q Imitation Learning
(DSQIL) and evaluate it on MuJoCo environments.
</p>
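<p>To make the reward structure above concrete, the following is a minimal
PyTorch sketch of the idea: SQIL's constant rewards (+1 for expert transitions,
0 for sampled ones) augmented with an AIRL-style discriminator term. All names
are illustrative assumptions, not taken from the paper.</p>
<pre>
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    """Scores (state, action) pairs; high logits mean expert-like."""
    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1))

    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1)).squeeze(-1)

def dsqil_reward(disc, s, a, is_expert):
    base = is_expert.float()          # SQIL: +1 expert, 0 agent transitions
    shaping = disc(s, a)              # log D - log(1-D) equals the raw logit
    return base + shaping             # fed to soft Q-learning as the reward

disc = Discriminator(obs_dim=4, act_dim=2)
s, a = torch.randn(8, 4), torch.randn(8, 2)
is_expert = torch.tensor([1, 1, 1, 1, 0, 0, 0, 0], dtype=torch.bool)
rewards = dsqil_reward(disc, s, a, is_expert)
</pre>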
|
|
|
|
<p>Activity detection is an important task in the next generation grant-free
multiple access. While there are a number of existing algorithms designed for
this purpose, they mostly require precise information about the network, such
as large-scale fading coefficients, small-scale fading channel statistics,
noise variance at the access points, and user activity probability. Acquiring
this information incurs significant overhead, and the estimated values might
not be accurate. This problem is even more severe in cell-free networks, as
there are many such parameters to acquire. Therefore, this paper
sets out to investigate the activity detection problem without the
above-mentioned information. In order to handle so many unknown parameters,
this paper employs the Bayesian approach, where the unknown variables are
endowed with prior distributions which effectively act as regularizations.
Together with the likelihood function, a maximum a posteriori (MAP) estimator
and a variational inference algorithm are derived. Extensive simulations
demonstrate that the proposed methods, even without the knowledge of these
system parameters, perform better than existing state-of-the-art methods, such
as covariance-based and approximate message passing methods.
</p>
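<p>Schematically, and with notation assumed here only for illustration, the MAP
estimator described above takes the form
$$\hat{\boldsymbol{\gamma}},\hat{\boldsymbol{\theta}}
  = \arg\max_{\boldsymbol{\gamma},\,\boldsymbol{\theta}}
    \log p(\mathbf{Y}\mid\boldsymbol{\gamma},\boldsymbol{\theta})
    + \log p(\boldsymbol{\gamma}) + \log p(\boldsymbol{\theta}),$$
where $\mathbf{Y}$ is the received signal, $\boldsymbol{\gamma}$ collects the
activity indicators, $\boldsymbol{\theta}$ collects the unknown system
parameters (fading coefficients, noise variance, activity probability), and the
log-priors act as the regularizations mentioned above.</p>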
|
|
|
|
<p>Sequential neural posterior estimation (SNPE) techniques have been recently
proposed for dealing with simulation-based models with intractable likelihoods.
They are devoted to learning the posterior from adaptively proposed simulations
using neural network-based conditional density estimators. As a SNPE technique,
the automatic posterior transformation (APT) method proposed by Greenberg et
al. (2019) performs notably well and scales to high-dimensional data. However, the
APT method bears the computation of an expectation of the logarithm of an
intractable normalizing constant, i.e., a nested expectation. Although atomic
APT was proposed to solve this by discretizing the normalizing constant, it
remains challenging to analyze the convergence of learning. In this paper, we
propose a nested APT method to estimate the involved nested expectation
instead. This facilitates establishing the convergence analysis. Since the
nested estimators for the loss function and its gradient are biased, we make
use of unbiased multi-level Monte Carlo (MLMC) estimators for debiasing. To
further reduce the excessive variance of the unbiased estimators, this paper
also develops some truncated MLMC estimators by taking account of the trade-off
between the bias and the average cost. Numerical experiments on approximating
complex multimodal posteriors in moderate dimensions are provided.
</p>
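<p>In schematic form (with notation assumed here for illustration), the
troublesome term is a nested expectation $\mathbb{E}[\log Z(x)]$ with
$Z(x)=\mathbb{E}_{\theta}[w(\theta,x)]$ itself intractable. The multi-level
construction estimates it through a telescoping sum
$$\log Z \approx \mathbb{E}[P_0] + \sum_{\ell=1}^{L}\mathbb{E}[P_\ell - P_{\ell-1}],
\qquad P_\ell = \log\Big(\frac{1}{N_\ell}\sum_{i=1}^{N_\ell} w(\theta_i,x)\Big),
\quad N_\ell = 2^{\ell}N_0,$$
where the level differences $P_\ell - P_{\ell-1}$ shrink as the inner sample
size $N_\ell$ grows, making the bias-versus-average-cost trade-off explicit.</p>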
|
|
|
|
<p>Due to non-stationarity of time series, the distribution shift problem
largely hinders the performance of time series forecasting. Existing solutions
either fail for shifts beyond simple statistics or offer limited compatibility
with forecasting models. In this paper, we propose a general decoupled
formulation for time series forecasting, with no reliance on fixed statistics
and no restriction on forecasting architectures. We then formalize this
formulation as a bi-level optimization problem, enabling the joint learning of
the transformation (outer loop) and forecasting (inner loop).
Moreover, the special requirements of expressiveness and bi-direction for the
transformation motivate us to propose instance normalization flows (IN-Flow), a
novel invertible network for time series transformation. Extensive experiments
demonstrate our method consistently outperforms state-of-the-art baselines on
both synthetic and real-world data.
</p>
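<p>As a minimal illustration of the bi-level structure (not the IN-Flow
architecture itself, which is an invertible network), the sketch below
alternates an inner forecaster update in the transformed space with an outer
update of a toy invertible per-feature transform; the affine transform and all
names are assumptions for illustration.</p>
<pre>
import torch
import torch.nn as nn

class AffineFlow(nn.Module):
    """Toy invertible transform z = (x - shift) * exp(-log_scale)."""
    def __init__(self, dim):
        super().__init__()
        self.shift = nn.Parameter(torch.zeros(dim))
        self.log_scale = nn.Parameter(torch.zeros(dim))
    def forward(self, x):
        return (x - self.shift) * torch.exp(-self.log_scale)
    def inverse(self, z):
        return z * torch.exp(self.log_scale) + self.shift

flow, forecaster = AffineFlow(8), nn.Linear(8, 8)
opt_inner = torch.optim.Adam(forecaster.parameters(), lr=1e-3)
opt_outer = torch.optim.Adam(flow.parameters(), lr=1e-4)
x_hist, x_future = torch.randn(32, 8), torch.randn(32, 8)

for step in range(200):
    # Inner loop: fit the forecaster in the transformed (stationarized) space.
    opt_inner.zero_grad()
    inner = ((forecaster(flow(x_hist)) - flow(x_future)) ** 2).mean()
    inner.backward(); opt_inner.step()
    # Outer loop: update the transform so inverse-mapped forecasts match raw data.
    opt_outer.zero_grad()
    outer = ((flow.inverse(forecaster(flow(x_hist))) - x_future) ** 2).mean()
    outer.backward(); opt_outer.step()
</pre>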
|
|
|
|
<p>In this paper, we present a signaling design for secure integrated sensing
and communication (ISAC) systems comprising a dual-functional multi-input
multi-output (MIMO) base station (BS) that simultaneously communicates with
multiple users while detecting targets present in their vicinity, which are
regarded as potential eavesdroppers. In particular, assuming that the
distribution of each parameter to be estimated is known \textit{a priori}, we
focus on optimizing the targets' sensing performance. To this end, we derive
and minimize the Bayesian Cram\'er-Rao bound (BCRB), while ensuring certain
communication quality of service (QoS) by exploiting constructive interference
(CI). The latter scheme enforces that the received signals at the eavesdropping
targets fall into the destructive region of the signal constellation, to
degrade their decoding probability, thus enhancing the ISAC system's
physical-layer security (PLS) capability. To tackle the nonconvexity of the
formulated problem, a tailored successive convex approximation method is
proposed for its efficient solution. Our extensive numerical results verify the
effectiveness of the proposed secure ISAC design showing that the proposed
algorithm outperforms block-level precoding techniques.
</p>
|
|
|
|
<p>Complex field imaging, which captures both the amplitude and phase
information of input optical fields or objects, can offer rich structural
insights into samples, such as their absorption and refractive index
distributions. However, conventional image sensors are intensity-based and
inherently lack the capability to directly measure the phase distribution of a
field. This limitation can be overcome using interferometric or holographic
methods, often supplemented by iterative phase retrieval algorithms, leading to
a considerable increase in hardware complexity and computational demand. Here,
we present a complex field imager design that enables snapshot imaging of both
the amplitude and quantitative phase information of input fields using an
intensity-based sensor array without any digital processing. Our design
utilizes successive deep learning-optimized diffractive surfaces that are
structured to collectively modulate the input complex field, forming two
independent imaging channels that perform amplitude-to-amplitude and
phase-to-intensity transformations between the input and output planes within a
compact optical design, axially spanning ~100 wavelengths. The intensity
distributions of the output fields at these two channels on the sensor plane
directly correspond to the amplitude and quantitative phase profiles of the
input complex field, eliminating the need for any digital image reconstruction
algorithms. We experimentally validated the efficacy of our complex field
diffractive imager designs through 3D-printed prototypes operating at the
terahertz spectrum, with the output amplitude and phase channel images closely
aligning with our numerical simulations. We envision that this complex field
imager will have various applications in security, biomedical imaging, sensing
and material science, among others.
</p>
|
|
|
|
<p>This paper provides a comprehensive review of the latest advancements in
fetal motion correction in MRI. We delve into various contemporary
methodologies and technological advancements aimed at overcoming the challenges
posed by fetal motion. The review covers traditional 3D fetal MRI correction
methods such as Slice
to Volume Registration (SVR), deep learning-based techniques such as
Convolutional Neural Networks (CNNs), Long Short-Term Memory (LSTM) Networks,
Transformers, Generative Adversarial Networks (GANs) and most recent
advancements of Diffusion Models. The insights derived from this literature
review reflect a thorough understanding of both the technical intricacies and
practical implications of fetal motion in MRI studies, offering a reasoned
perspective on potential solutions and future improvements in this field.
</p>
|
|
|
|
<p>Graph neural networks (GNNs) have achieved remarkable performance on
graph-structured data. However, GNNs may inherit prejudice from the training
data and make discriminatory predictions based on sensitive attributes, such as
gender and race. Recently, there has been increasing interest in ensuring
fairness on GNNs, but existing work assumes that the training and testing data
follow the same distribution, i.e., that training data and testing data come
from the same graph. Will graph fairness performance decrease under
distribution shifts? How do distribution shifts affect graph fairness
learning? These open questions are largely unexplored from a theoretical
perspective. To answer these questions, we first theoretically identify the
factors that determine bias on a graph. Subsequently, we explore the factors
influencing fairness on testing graphs, with a noteworthy factor being the
representation distances of certain groups between the training and testing
graph. Motivated by our theoretical analysis, we propose our framework
FatraGNN. Specifically, to guarantee fairness performance on unknown testing
graphs, we propose a graph generator to produce numerous graphs with
significant bias and under different distributions. Then we minimize the
representation distances for each certain group between the training graph and
generated graphs. This empowers our model to achieve high classification and
fairness performance even on generated graphs with significant bias, thereby
effectively handling unknown testing graphs. Experiments on real-world and
semi-synthetic datasets demonstrate the effectiveness of our model in terms of
both accuracy and fairness.
</p>
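<p>One plausible instantiation of the representation-distance idea (a sketch
under assumptions; FatraGNN's actual generator and loss are not reproduced
here) aligns per-group mean embeddings between the training graph and a
generated, deliberately biased graph:</p>
<pre>
import torch

def group_alignment_loss(h_train, h_gen, g_train, g_gen):
    """Distance between per-group mean node embeddings of the training graph
    and a generated (shifted, biased) graph."""
    loss = h_train.new_zeros(())
    for g in torch.unique(g_train):
        loss = loss + torch.norm(h_train[g_train == g].mean(0)
                                 - h_gen[g_gen == g].mean(0))
    return loss

h_train, h_gen = torch.randn(100, 16), torch.randn(80, 16)
g_train = torch.randint(0, 2, (100,))   # sensitive-group labels per node
g_gen = torch.randint(0, 2, (80,))
loss = group_alignment_loss(h_train, h_gen, g_train, g_gen)
</pre>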
|
|
|
|
<p>Support vector regression (SVR) has garnered significant popularity over the
past two decades owing to its wide range of applications across various fields.
Despite its versatility, SVR encounters challenges when confronted with
outliers and noise, primarily due to the use of the $\varepsilon$-insensitive
loss function. To address this limitation, SVR with bounded loss functions has
emerged as an appealing alternative, offering enhanced generalization
performance and robustness. Notably, recent developments focus on designing
bounded loss functions with smooth characteristics, facilitating the adoption
of gradient-based optimization algorithms. However, it is crucial to highlight
that these bounded and smooth loss functions do not possess an insensitive
zone. In this paper, we address the aforementioned constraints by introducing a
novel symmetric loss function named the HawkEye loss function. It is worth
noting that the HawkEye loss function stands out as the first loss function in
the SVR literature to be bounded, smooth, and simultaneously possess an
insensitive zone. Leveraging this breakthrough, we integrate the HawkEye loss
function into the least squares framework of SVR and obtain a new fast and
robust model termed
HE-LSSVR. The optimization problem inherent to HE-LSSVR is addressed by
harnessing the adaptive moment estimation (Adam) algorithm, known for its
adaptive learning rate and efficacy in handling large-scale problems. To our
knowledge, this is the first time Adam has been employed to solve an SVR
problem. To empirically validate the proposed HE-LSSVR model, we evaluate it on
UCI, synthetic, and time series datasets. The experimental outcomes
unequivocally reveal the superiority of the HE-LSSVR model both in terms of its
remarkable generalization performance and its efficiency in training time.
</p>
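<p>The closed form of the HawkEye loss is not reproduced in this abstract, so
the sketch below uses a stand-in loss with the same three advertised properties
(bounded, smooth, and equipped with an $\varepsilon$-insensitive zone) and
trains a least-squares-style linear SVR with Adam; the formula and all names
are assumptions for illustration only.</p>
<pre>
import torch

def bounded_insensitive_loss(residual, eps=0.1, sigma=0.5):
    """Zero inside |r| <= eps, smooth and bounded in [0, 1) outside.
    NOT the paper's HawkEye formula."""
    excess = torch.clamp(residual.abs() - eps, min=0.0)
    return 1.0 - torch.exp(-excess ** 2 / (2 * sigma ** 2))

X = torch.randn(256, 5)
y = X @ torch.randn(5) + 0.1 * torch.randn(256)
w = torch.zeros(5, requires_grad=True)
b = torch.zeros(1, requires_grad=True)
opt = torch.optim.Adam([w, b], lr=1e-2)   # adaptive learning rate via Adam
for _ in range(500):
    loss = bounded_insensitive_loss(X @ w + b - y).mean() + 1e-3 * (w ** 2).sum()
    opt.zero_grad(); loss.backward(); opt.step()
</pre>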
|
|
|
|
<p>MediaWiki and Wikipedia authors usually use LaTeX to define mathematical
formulas in the wiki text markup. In the Wikimedia ecosystem, these formulas
were processed by a long cascade of web services and finally delivered to
users' browsers in rendered form for visually readable representation as SVG.
</p>
<p>With the latest developments of supporting MathML Core in Chromium-based
browsers, MathML continues its path to be a de facto standard markup language
for mathematical notation on the web. Conveying formulas in MathML enables
semantic annotation and machine readability for extended interpretation of
mathematical content, for example by accessibility technologies.
</p>
<p>With this work, we present WikiTexVC, a novel method for validating LaTeX
formulas from wiki texts and converting them to MathML, which is directly
integrated into MediaWiki. This mitigates the shortcomings of previously used
rendering methods in MediaWiki in terms of robustness, maintainability and
performance. In addition, there is no need for a multitude of web services
running in the background, but processing takes place directly within MediaWiki
instances. We validated this method with an extended dataset of over 300k
formulas which have been incorporated as automated tests to the MediaWiki
continuous integration instances. Furthermore, we conducted an evaluation with
423 formulas, comparing the tree edit distance for produced parse trees to
other MathML renderers. Our method has been released as open source, is in use
on German Wikipedia, and ships with recent MediaWiki versions. As a practical
example of enabling semantic annotations within our method, we present a new
macro that adds disambiguation content to formulas to facilitate accessibility
for visually impaired people.
</p>
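<p>WikiTexVC itself is implemented in PHP inside MediaWiki, so there is no
Python API to show here. For readers who simply want to experiment with
LaTeX-to-MathML conversion, the unrelated third-party latex2mathml package
offers a one-line entry point (sketched below, assuming the package is
installed):</p>
<pre>
# pip install latex2mathml   (third-party package, unrelated to WikiTexVC)
import latex2mathml.converter

mathml = latex2mathml.converter.convert(r"\frac{a}{b} + \sqrt{x^2 + y^2}")
print(mathml)   # an <math>...</math> element for MathML-capable browsers
</pre>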
|
|
|
|
<p>Despite the utility of Large Language Models (LLMs) across a wide range of
tasks and scenarios, developing a method for reliably evaluating LLMs across
varied contexts continues to be challenging. Modern evaluation approaches often
use LLMs to assess responses generated by LLMs. However, the meta-evaluation
conducted to assess the effectiveness of these LLMs as evaluators is typically
constrained by the coverage of existing benchmarks or requires extensive human
annotation. This underscores the urgency of methods for scalable
meta-evaluation that can effectively, reliably, and efficiently evaluate the
performance of LLMs as evaluators across diverse tasks and scenarios,
particularly in potentially new, user-defined scenarios. To fill this gap, we
propose ScaleEval, an agent-debate-assisted meta-evaluation framework that
leverages the capabilities of multiple communicative LLM agents. This framework
supports multi-round discussions to assist human annotators in discerning the
most capable LLMs as evaluators, which significantly eases their workload in
cases that used to require large-scale annotations during meta-evaluation. We
release the code for our framework, which is publicly available at:
\url{https://github.com/GAIR-NLP/scaleeval}.
</p>
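<p>The agent-debate loop can be pictured in a few lines of Python. The ask stub
below stands in for any chat-completion client; none of these names come from
the ScaleEval codebase.</p>
<pre>
def ask(agent, prompt):
    """Stub: replace with a real LLM chat-completion call."""
    return f"{agent}: <verdict and reasoning>"

def debate(task, response_a, response_b,
           agents=("critic_1", "critic_2"), rounds=3):
    transcript = []
    for r in range(rounds):
        for agent in agents:
            prompt = (f"Task: {task}\nResponse A: {response_a}\n"
                      f"Response B: {response_b}\nDiscussion so far:\n"
                      + "\n".join(transcript)
                      + f"\nRound {r + 1}: which response is better, and why?")
            transcript.append(ask(agent, prompt))
    # Shown to a human annotator only when the agents disagree, which is
    # what reduces the annotation workload.
    return transcript
</pre>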
|
|
|
|
<p>Training an effective machine learning (ML) model is an iterative process
that requires effort in multiple dimensions. Vertically, a single pipeline
typically includes an initial ETL (Extract, Transform, Load) of raw datasets, a
model training stage, and an evaluation stage where the practitioners obtain
statistics of the model performance. Horizontally, many such pipelines may be
required to find the best model within a search space of model configurations.
Many practitioners resort to maintaining logs manually and writing simple glue
code to automate the workflow. However, carrying out this process on the cloud
is not a trivial task in terms of resource provisioning, data management, and
bookkeeping of job histories to make sure the results are reproducible. We
propose an end-to-end cloud-based machine learning platform, Accelerated Cloud
for AI (ACAI), to help improve the productivity of ML practitioners. ACAI
achieves this goal by enabling cloud-based storage of indexed, labeled, and
searchable data, as well as automatic resource provisioning, job scheduling,
and experiment tracking. Specifically, ACAI provides practitioners (1) a data
lake for storing versioned datasets and their corresponding metadata, and (2)
an execution engine for executing ML jobs on the cloud with automatic resource
provisioning (auto-provision), logging and provenance tracking. To evaluate
ACAI, we test the efficacy of our auto-provisioner on the MNIST handwritten
digit classification task, and we study the usability of our system using
experiments and interviews. We show that our auto-provisioner produces a 1.7x
speed-up and 39% cost reduction, and our system reduces experiment time for ML
scientists by 20% on typical ML use cases.
</p>
|
|
|
|
<p>The Versal Adaptive Compute Acceleration Platform (ACAP) is a new
architecture that combines AI Engines (AIEs) with reconfigurable fabric. This
architecture offers significant acceleration potential for uniform recurrences
in various domains, such as deep learning, high-performance computation, and
signal processing. However, efficiently mapping these computations onto the
Versal ACAP architecture while achieving high utilization of AIEs poses a
challenge.
</p>
<p>To address this issue, we propose a mapping scheme called \fname, which aims
to accelerate uniform recurrences on the Versal ACAP architecture by leveraging
the features of both the hardware and the computations. Considering the array
architecture of AIEs, our approach utilizes space-time transformations based on
the polyhedral model to generate legally optimized systolic array mappings.
Concurrently, we have developed a routing-aware PLIO assignment algorithm
tailored for communication on the AIE array, and the algorithm aims at
successful compilation while maximizing array utilization. Furthermore, we
introduce an automatic mapping framework. This framework is designed to
generate the corresponding executable code for uniform recurrences, which
encompasses the AIE kernel program, programmable logic bitstreams, and the host
program. The experimental results validate the effectiveness of our mapping
scheme. Specifically, when applying our scheme to matrix multiplication on the
VCK5000 board, we achieve a throughput of 4.15 TOPS on the float data type,
1.11$\times$ the throughput of the state-of-the-art accelerator on the Versal
ACAP architecture.
</p>
|
|
|
|
<p>The development of feedback controllers is undergoing a paradigm shift from
$\textit{modelic}$ (model-driven) control to $\textit{datatic}$ (data-driven)
control. Stability, a fundamental property in control, is less well studied in
the datatic control paradigm. The difficulty is that traditional stability
criteria rely on explicit system models, which are not available for systems
with a datatic description. Some pioneering works explore stability criteria
for datatic systems of special forms such as linear, homogeneous, and
polynomial systems. However, these forms impose overly strong assumptions on
the inherent connection among data points, which do not hold for general
nonlinear systems. This paper proposes a stability verification
algorithm for general datatic control systems called $\eta$-testing. Our
stability criterion only relies on a weak assumption of Lipschitz continuity so
as to extend information from known data points to unmeasured regions. This
information restricts the time derivative of any unknown state to the
intersection of a set of closed balls. Inside the intersection, the worst-case
time derivative of Lyapunov function is estimated by solving a quadratically
constrained linear program (QCLP). By comparing the optimal values of QCLPs to
zero in the whole state space, a sufficient condition of system stability can
be checked. We test our algorithm on three datatic control systems, including
both linear and nonlinear ones. Results show that our algorithm successfully
verifies the stability, instability, and critical stability of tested systems.
</p>
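<p>A single instance of the QCLP described above can be written down directly.
The numbers are illustrative, and cvxpy is used merely as a convenient
off-the-shelf convex solver (the paper does not prescribe one):</p>
<pre>
import cvxpy as cp
import numpy as np

# Lipschitz continuity confines the unknown derivative f(x) at a query state
# to the intersection of closed balls around the sampled derivatives f(x_i).
centers = np.array([[0.9, -1.1], [1.0, -0.9]])   # f(x_i) from data
radii = np.array([0.3, 0.25])                    # L * ||x - x_i||
grad_V = np.array([0.5, 1.0])                    # Lyapunov gradient at x

f = cp.Variable(2)
prob = cp.Problem(cp.Maximize(grad_V @ f),       # worst-case dV/dt
                  [cp.sum_squares(f - c) <= r ** 2
                   for c, r in zip(centers, radii)])
prob.solve()
print(prob.value)   # a value <= 0 at every checked state supports stability
</pre>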
|
|
|
|
<p>We developed an artificial intelligence approach to predict the transfer fee
of a football player. This model can help clubs make better decisions about
which players to buy and sell, which can lead to improved performance and
increased club budgets. Having collected data on player performance, transfer
fees, and other factors that might affect a player's value, we then used this
data to train a machine learning model that can accurately predict a player's
impact on the game. We further passed the obtained results as one of the
features to the predictor of transfer fees. The model can help clubs identify
players who are undervalued and who could be sold for a profit. It can also
help clubs avoid overpaying for players. We believe that our model can be a
valuable tool for football clubs. It can help them make better decisions about
player recruitment and transfers.
</p>
|
|
|
|
<p>Analyzing the health status of patients based on Electronic Health Records
(EHR) is a fundamental research problem in medical informatics. The presence of
extensive missing values in EHR makes it challenging for deep neural networks
to directly model the patient's health status based on EHR. Existing deep
learning training protocols require the use of statistical information or
imputation models to reconstruct missing values; however, these protocols
inject unrealistic data into downstream EHR analysis models, significantly
limiting model performance. This paper introduces Learnable Prompt as Pseudo Imputation
(PAI) as a new training protocol. PAI no longer introduces any imputed data but
constructs a learnable prompt to model the implicit preferences of the
downstream model for missing values, resulting in a significant performance
improvement for all EHR analysis models. Additionally, our experiments show
that PAI exhibits higher robustness in situations of data insufficiency and
high missing rates. More importantly, in a real-world application involving
cross-institutional data with zero-shot evaluation, PAI demonstrates stronger
model generalization capabilities for non-overlapping features.
</p>
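<p>A minimal sketch of the prompt-as-imputation idea (names and the per-feature
parameterization are assumptions; the paper's construction may differ): missing
entries are replaced by a learnable vector that is trained end-to-end with the
downstream model, so no imputed data is ever fabricated offline.</p>
<pre>
import torch
import torch.nn as nn

class PromptImputer(nn.Module):
    """Replaces NaN entries with a learnable per-feature prompt vector."""
    def __init__(self, num_features):
        super().__init__()
        self.prompt = nn.Parameter(torch.zeros(num_features))
    def forward(self, x):
        return torch.where(torch.isnan(x), self.prompt.expand_as(x), x)

imputer, downstream = PromptImputer(16), nn.Linear(16, 2)
opt = torch.optim.Adam(list(imputer.parameters()) + list(downstream.parameters()))
x = torch.randn(8, 16)
x[torch.rand_like(x) < 0.3] = float("nan")   # simulate missing EHR values
loss = downstream(imputer(x)).square().mean()
loss.backward()                               # the prompt receives gradients
opt.step()
</pre>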
|
|
|
|
<p>This paper presents a framework that integrates Large Language Models (LLMs)
into translation validation, targeting LLVM compiler transformations where
formal verification tools are insufficient. Our framework first utilizes
existing formal verification frameworks for translation validation. In this
work, we use Alive2, a well-known tool in LLVM compiler verification, as an
example. When formal verification frameworks are unable to confirm a
transformation's soundness, our framework employs fine-tuned LLMs for
prediction. It applies fuzzing to transformations predicted as potentially
unsound by the LLMs due to return value or memory inconsistencies, aiming to
find counterexamples. In cases where transformations are unsound for other
reasons or sound, or if no counterexamples emerge, the framework directly
reports these outcomes without further fuzzing. This methodology has shown
effectiveness in complex areas like deep-learning accelerator design, where
traditional tools struggle.
</p>
|
|
|
|
<p>In this paper, we propose an online algorithm "mspace" for forecasting node
features in temporal graphs, which adeptly captures spatial cross-correlation
among different nodes as well as the temporal autocorrelation within a node.
The algorithm can be used for both probabilistic and deterministic multi-step
forecasting, making it applicable for estimation and generation tasks.
Comparative evaluations against various baselines, including graph neural
network (GNN) based models and classical Kalman filters, demonstrate that
mspace performs on par with the state-of-the-art and even surpasses it on
some datasets. Importantly, mspace demonstrates consistent robustness across
datasets with varying training sizes, a notable advantage over GNN-based
methods requiring abundant training samples to learn the spatiotemporal trends
in the data effectively. Therefore, employing mspace is advantageous in
scenarios where the training sample availability is limited. Additionally, we
establish theoretical bounds on the multi-step forecasting error of mspace and
show that it scales as $O(q)$ for a $q$-step forecast.
</p>
|
|
|
|
<p>Robotic arms are highly common in various automation processes such as
manufacturing lines. However, these highly capable robots are usually degraded
to simple repetitive tasks such as pick-and-place. On the other hand, designing
an optimal robot for one specific task consumes substantial engineering time
and cost. In this paper, we propose a novel concept for optimizing the
fitness of a robotic arm to perform a specific task based on human
demonstration. Fitness of a robot arm is a measure of its ability to follow
recorded human arm and hand paths. The optimization is conducted using a
modified variant of the Particle Swarm Optimization for the robot design
problem. In the proposed approach, we generate an optimal robot design along
with the required path to complete the task. The approach could reduce the
time-to-market of robotic arms and enable the standardization of modular
robotic parts. Novice users could easily apply a minimal robot arm to various
tasks. Two test cases of common manufacturing tasks are presented yielding
optimal designs and reduced computational effort by up to 92%.
</p>
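<p>For reference, a textbook particle swarm minimizer is sketched below; the
paper's modified variant for robot design is not reproduced, and the quadratic
stand-in plays the role of the demonstration-following fitness.</p>
<pre>
import numpy as np

def pso(fitness, dim, n_particles=30, iters=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Generic PSO: inertia w, cognitive pull c1, social pull c2."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1, 1, (n_particles, dim))
    v = np.zeros_like(x)
    pbest, pbest_f = x.copy(), np.array([fitness(p) for p in x])
    gbest = pbest[pbest_f.argmin()].copy()
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = x + v
        f = np.array([fitness(p) for p in x])
        better = f < pbest_f
        pbest[better], pbest_f[better] = x[better], f[better]
        gbest = pbest[pbest_f.argmin()].copy()
    return gbest, pbest_f.min()

best, val = pso(lambda p: np.sum((p - 0.3) ** 2), dim=4)
</pre>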
|
|
|
|
<p>Tendon-based underactuated hands are intended to be simple, compliant and
affordable. Often, they are 3D printed and do not include tactile sensors.
Hence, performing in-hand object recognition with direct touch sensing is not
feasible. Adding tactile sensors can complicate the hardware and introduce
extra costs to the robotic hand. Also, the common approach of visual perception
may not be available due to occlusions. In this paper, we explore whether
kinesthetic haptics can provide indirect information regarding the geometry of
a grasped object during in-hand manipulation with an underactuated hand. By
solely sensing actuator positions and torques over a period of time during
motion, we show that a classifier can recognize an object from a set of trained
ones with a high success rate of almost 95%. In addition, the implementation of
a real-time majority vote during manipulation further improves recognition. A
trained classifier is also shown to be successful in distinguishing between
shape categories rather than just specific objects.
</p>
|
|
|
|
<p>This article motivates, describes, and presents the PBSCSR dataset for
studying composer style recognition of piano sheet music. Our overarching goal
was to create a dataset for studying composer style recognition that is "as
accessible as MNIST and as challenging as ImageNet." To achieve this goal, we
sample fixed-length bootleg score fragments from piano sheet music images on
IMSLP. The dataset itself contains 40,000 62x64 bootleg score images for a
9-way classification task, 100,000 62x64 bootleg score images for a 100-way
classification task, and 29,310 unlabeled variable-length bootleg score images
for pretraining. The labeled data is presented in a form that mirrors MNIST
images, in order to make it extremely easy to visualize, manipulate, and train
models in an efficient manner. Additionally, we include relevant metadata to
allow access to the underlying raw sheet music images and other related data on
IMSLP. We describe several research tasks that could be studied with the
dataset, including variations of composer style recognition in a few-shot or
zero-shot setting. For tasks that have previously proposed models, we release
code and baseline results for future works to compare against. We also discuss
open research questions that the PBSCSR data is especially well suited to
address, as well as promising areas for future exploration.
</p>
|
|
|
|
<p>In this paper, we distinguish two guessing algorithms for decoding binary
linear codes. One is the guessing noise decoding (GND) algorithm, and the other
is the guessing codeword decoding (GCD) algorithm. We prove that the GCD is a
maximum likelihood (ML) decoding algorithm and that the GCD is more efficient
than GND for most practical applications. We also introduce several variants of
ordered statistic decoding (OSD) to trade off the complexity of the Gaussian
elimination (GE) and that of the guessing, which may find applications in
decoding short block codes in the high signal-to-noise ratio (SNR) region.
</p>
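<p>For contrast, here is a toy hard-decision noise-guessing decoder (in the
spirit of GND/GRAND) for a [7,4] Hamming code: error patterns are tested in
order of increasing Hamming weight, i.e., decreasing likelihood on a binary
symmetric channel. This illustrative sketch is not the paper's algorithm.</p>
<pre>
import itertools
import numpy as np

def gnd_hard(y, H, max_weight=3):
    """Guess noise patterns in decreasing likelihood until parity checks pass."""
    n = len(y)
    for wt in range(max_weight + 1):
        for idx in itertools.combinations(range(n), wt):
            e = np.zeros(n, dtype=int); e[list(idx)] = 1
            c = (y + e) % 2
            if not (H @ c % 2).any():
                return c, e          # first hit is the ML codeword on a BSC
    return None, None

H = np.array([[1, 0, 1, 0, 1, 0, 1],
              [0, 1, 1, 0, 0, 1, 1],
              [0, 0, 0, 1, 1, 1, 1]])   # [7,4] Hamming parity checks
y = np.array([0, 0, 0, 0, 0, 0, 1])     # all-zero codeword, one bit flipped
c, e = gnd_hard(y, H)                    # recovers the all-zero codeword
</pre>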
|
|
|
|
<p>Large Language Models (LLMs), exemplified by ChatGPT, have significantly
reshaped text generation, particularly in the realm of writing assistance.
While ethical considerations underscore the importance of transparently
acknowledging LLM use, especially in scientific communication, genuine
acknowledgment remains infrequent. A potential avenue to encourage accurate
acknowledgment of LLM-assisted writing involves employing automated detectors.
Our evaluation of four cutting-edge LLM-generated text detectors reveals their
suboptimal performance compared to a simple ad-hoc detector designed to
identify abrupt writing style changes around the time of LLM proliferation. We
contend that the development of specialized detectors exclusively dedicated to
LLM-assisted writing detection is necessary. Such detectors could play a
crucial role in fostering more authentic recognition of LLM involvement in
scientific communication, addressing the current challenges in acknowledgment
practices.
</p>
|
|
|
|
<p>Modeling time series data remains a pervasive issue as the temporal dimension
is inherent to numerous domains. Despite significant strides in time series
forecasting, high noise-to-signal ratio, non-normality, non-stationarity, and
lack of data continue challenging practitioners. In response, we leverage a
simple representation augmentation technique to overcome these challenges. Our
augmented representation acts as a statistical-space prior encoded at each time
step. Accordingly, we name our method Statistical-space Augmented
Representation (SSAR). The underlying high-dimensional data-generating process
inspires our representation augmentation. We rigorously examine the empirical
generalization performance on two data sets with two downstream temporal
learning algorithms. Our approach significantly outperforms all five recent
baselines. Moreover, the highly modular nature of our approach makes it easy to
apply to various settings. Lastly, full theoretical perspectives are provided
throughout for a clear and rigorous understanding.
</p>
|
|
|
|
<p>To reconstruct a 3D human surface from a single image, it is important to
consider human pose, shape and clothing details simultaneously. In recent
years, a combination of parametric body models (such as SMPL) that capture body
pose and shape prior, and neural implicit functions that learn flexible
clothing details, has been used to integrate the advantages of both approaches.
However, the combined representation introduces additional computation, e.g.
signed distance calculation, in 3D body feature extraction, which exacerbates
the redundancy of the implicit query-and-infer process and fails to preserve
the underlying body shape prior. To address these issues, we propose a novel
IUVD-Feedback representation, which consists of an IUVD occupancy function and
a feedback query algorithm. With this representation, the time-consuming signed
distance calculation is replaced by a simple linear transformation in the IUVD
space, leveraging the SMPL UV maps. Additionally, the redundant query points in
the query-and-infer process are reduced through a feedback mechanism. This
leads to more reasonable 3D body features and more effective query points,
successfully preserving the parametric body prior. Moreover, the IUVD-Feedback
representation can be embedded into any existing implicit human reconstruction
pipelines without modifying the trained neural networks. Experiments on
THuman2.0 dataset demonstrate that the proposed IUVD-Feedback representation
improves result robustness and achieves a threefold speed-up in the
query-and-infer process. Furthermore, this representation has the potential to
be used in generative applications by leveraging its inherited semantic
information from the parametric body model.
</p>
|
|
|
|
<p>The training datasets used in long-tailed recognition are extremely
unbalanced, resulting in significant variation in per-class accuracy across
categories. Prior works mostly used average accuracy to evaluate their
algorithms, which easily ignores those worst-performing categories. In this
paper, we aim to enhance the accuracy of the worst-performing categories and
utilize the harmonic mean and geometric mean to assess the model's performance.
We revive the balanced undersampling idea to achieve this goal. Because
balanced subsets of a long-tailed dataset are few-shot and will surely
under-fit, this idea is not used in modern long-tailed learning. However, we
find that it produces a more equitable distribution of accuracy across
categories, with much higher harmonic and geometric mean accuracy but lower
average accuracy. Moreover, we
devise a straightforward model ensemble strategy, which does not result in any
additional overhead and achieves improved harmonic and geometric mean while
keeping the average accuracy almost intact when compared to state-of-the-art
long-tailed learning methods. We validate the effectiveness of our approach on
widely utilized benchmark datasets for long-tailed learning. Our code is at
\href{https://github.com/yuhao318/BTM/}{https://github.com/yuhao318/BTM/}.
</p>
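<p>The evaluation protocol is easy to state in code: subsample each class to
the size of the rarest one, then score per-class accuracies with harmonic and
geometric means, both of which punish a collapsed worst class far more than the
average does. The sketch below uses random stand-in labels and predictions.</p>
<pre>
import numpy as np

def balanced_subsample(y, rng):
    """Indices of an equal number of samples per class (undersampling)."""
    classes, counts = np.unique(y, return_counts=True)
    m = counts.min()
    return np.concatenate([rng.choice(np.where(y == c)[0], m, replace=False)
                           for c in classes])

def per_class_accuracy(y_true, y_pred):
    return np.array([(y_pred[y_true == c] == c).mean()
                     for c in np.unique(y_true)])

rng = np.random.default_rng(0)
y = np.array([0] * 900 + [1] * 90 + [2] * 10)     # long-tailed labels
idx = balanced_subsample(y, rng)                  # 10 samples per class
acc = np.clip(per_class_accuracy(y, rng.integers(0, 3, y.size)), 1e-12, None)
harmonic = acc.size / (1.0 / acc).sum()
geometric = np.exp(np.log(acc).mean())
</pre>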
|
|
|
|
<p>While subjective assessments have been the gold standard for evaluating
speech generation, there is a growing need for objective metrics that are
highly correlated with human subjective judgments due to their cost efficiency.
This paper proposes reference-aware automatic evaluation methods for speech
generation inspired by evaluation metrics in natural language processing. The
proposed SpeechBERTScore computes the BERTScore for self-supervised dense
speech features of the generated and reference speech, which can have different
sequential lengths. We also propose SpeechBLEU and SpeechTokenDistance, which
are computed on speech discrete tokens. The evaluations on synthesized speech
show that our method correlates better with human subjective ratings than mel
cepstral distortion and a recent mean opinion score prediction model. Also,
they are effective in noisy speech evaluation and have cross-lingual
applicability.
</p>
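<p>The core of a BERTScore-style comparison between two feature sequences of
different lengths fits in a few lines. Real inputs would be frame-level
features from a self-supervised speech model (e.g., HuBERT); random arrays
stand in here, and the exact aggregation used by SpeechBERTScore may differ.</p>
<pre>
import numpy as np

def bertscore_f1(feats_gen, feats_ref):
    """Greedy best-match of frame-wise cosine similarities, F1-combined."""
    a = feats_gen / np.linalg.norm(feats_gen, axis=1, keepdims=True)
    b = feats_ref / np.linalg.norm(feats_ref, axis=1, keepdims=True)
    sim = a @ b.T                        # (T_gen, T_ref) cosine matrix
    precision = sim.max(axis=1).mean()   # each generated frame -> best ref
    recall = sim.max(axis=0).mean()      # each reference frame -> best gen
    return 2 * precision * recall / (precision + recall)

score = bertscore_f1(np.random.randn(120, 768), np.random.randn(100, 768))
</pre>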
|
|
|
|
<p>We present H2O-Danube-1.8B, a 1.8B language model trained on 1T tokens
following the core principles of Llama 2 and Mistral. We leverage and refine
various techniques for pre-training large language models. Although our model
is trained on significantly fewer total tokens compared to reference models of
similar size, it exhibits highly competitive metrics across a multitude of
benchmarks. We additionally release a chat model trained with supervised
fine-tuning followed by direct preference optimization. We make H2O-Danube-1.8B
openly available under the Apache 2.0 license, further democratizing LLMs
economically to a wider audience.
</p>
|
|
|
|
<p>Localizing linearly moving sound sources using microphone arrays is
particularly challenging as the transient nature of the signal leads to
relatively short observation periods. Commonly, a moving focus is used and most
methods operate at least partially in the time domain. In contrast, here an
inverse source localization algorithm for mono-frequent uniformly moving
sources that acts entirely in the frequency domain is presented. For this, a
2.5D approach is utilized and a transfer function between sources and a
microphone grid is derived. By solving a least squares problem using the data
at the microphone grid, the unknown source distribution in the moving frame can
be determined. For this, the measured time signals need to be transformed into
the frequency domain using a windowed discrete Fourier transform (DFT), which
leads to effects such as spectral leakage that depend on the length of the
time interval and the analysis window used. To include these effects in the
numerical model, the calculation of the transfer matrix is modified using the
Fourier transform of the analysis window. Currently, this approach is limited
to mono-frequent sources as this allows a simplification of the calculation and
reduces the computational effort. The least squares problem is solved using a
Tikhonov regularization employing an L-curve approach to determine a suitable
regularization parameter. As a moving source is considered, the Doppler effect
makes it possible to enhance the stability of the system by combining the
transfer functions for multiple frequencies in the measured signals. The performance of
the approach is validated using simulated data of a moving point source with or
without a reflecting ground. Numerical experiments are performed to show the
effect of the choice of frequencies in the receiver spectrum, the effect of the
DFT, the frequency of the source, and the distance of source and receiver.
</p>
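<p>The regularized inversion at the heart of the method is standard and can be
sketched as follows; the complex matrix is a random stand-in for the derived
source-to-microphone transfer matrix, and the corner of the log-log L-curve
marks a suitable regularization parameter.</p>
<pre>
import numpy as np

def tikhonov_l_curve(A, b, lambdas):
    """Solve min ||A q - b||^2 + lam^2 ||q||^2 over a grid of lam and return
    the L-curve points (residual norm, solution norm)."""
    pts = []
    for lam in lambdas:
        q = np.linalg.solve(A.conj().T @ A + lam ** 2 * np.eye(A.shape[1]),
                            A.conj().T @ b)
        pts.append((np.linalg.norm(A @ q - b), np.linalg.norm(q)))
    return np.array(pts)

rng = np.random.default_rng(0)
A = rng.standard_normal((64, 32)) + 1j * rng.standard_normal((64, 32))
q_true = np.zeros(32); q_true[[3, 17]] = 1.0          # sparse source strengths
b = A @ q_true + 0.01 * (rng.standard_normal(64) + 1j * rng.standard_normal(64))
pts = tikhonov_l_curve(A, b, np.logspace(-4, 1, 30))
</pre>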
|
|
|
|
<p>Large Language Models (LLMs) have been widely deployed for their remarkable
capability to generate texts resembling human language. However, they could be
misused by criminals to create deceptive content, such as fake news and
phishing emails, which raises ethical concerns. Watermarking is a key technique
to mitigate the misuse of LLMs, which embeds a watermark (e.g., a bit string)
into a text generated by an LLM. Consequently, this enables the detection of
texts generated by an LLM as well as the tracing of generated texts to a
specific user. The major limitation of existing watermark techniques is that
they cannot accurately or efficiently extract the watermark from a text,
especially when the watermark is a long bit string. This key limitation impedes
their deployment for real-world applications, e.g., tracing generated texts to
a specific user.
</p>
<p>This work introduces a novel watermarking method for LLM-generated text
grounded in \textbf{error-correction codes} to address this challenge. We
provide strong theoretical analysis, demonstrating that under bounded
adversarial word/token edits (insertion, deletion, and substitution), our
method can correctly extract watermarks, offering a provable robustness
guarantee. This breakthrough is also evidenced by our extensive experimental
results. The experiments show that our method substantially outperforms
existing baselines in both accuracy and robustness on benchmark datasets. For
instance, when embedding a bit string of length 12 into a 200-token generated
text, our approach attains an impressive match rate of $98.4\%$, surpassing the
performance of Yoo et al. (state-of-the-art baseline) at $85.6\%$. When
subjected to a copy-paste attack involving the injection of 50 tokens to
generated texts with 200 words, our method maintains a substantial match rate
of $90.8\%$, while the match rate of Yoo et al. diminishes to below $65\%$.
</p>
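<p>To see why redundancy helps extraction survive token edits, consider the
simplest error-correcting code, a repetition code; the paper uses a stronger
ECC, so this is only an illustration of the principle.</p>
<pre>
import numpy as np

def ecc_encode(bits, reps=5):
    return np.repeat(bits, reps)                      # add redundancy

def ecc_decode(noisy, reps=5):
    return (noisy.reshape(-1, reps).mean(axis=1) > 0.5).astype(int)  # majority

msg = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1])  # 12-bit watermark
code = ecc_encode(msg)
rng = np.random.default_rng(0)
flip = rng.random(code.size) < 0.15                   # edits corrupt ~15% of bits
recovered = ecc_decode(np.where(flip, 1 - code, code))
print((recovered == msg).mean())                      # per-bit match rate
</pre>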
|
|
|
|
<p>Multi-modal large language models (MLLMs) have demonstrated remarkable
success in vision and visual-language tasks within the natural image domain.
However, the significant differences between natural images and remote sensing
(RS) images hinder the development of MLLMs in the RS domain. Currently, a
unified and powerful MLLM capable of various RS visual tasks remains
under-explored. To fill this gap, a pioneering MLLM called EarthGPT is proposed for
universal RS image comprehension, which integrates various multi-sensor RS
interpretation tasks uniformly. More importantly, a large-scale multi-sensor
multi-modal RS instruction-following dataset named MMRS is carefully
constructed, comprising 1,005,842 image-text pairs drawn from 34 existing
diverse RS datasets and including multi-sensor images such as optical,
synthetic aperture radar (SAR), and infrared. MMRS addresses the lack of RS
expert knowledge in MLLMs and stimulates the development of MLLMs in the RS
domain. Extensive experiments demonstrate EarthGPT's superior performance
in various RS visual interpretation tasks compared with the other specialist
models and MLLMs, which proves the effectiveness of the proposed EarthGPT and
provides a versatile paradigm for open-set reasoning tasks.
</p>
|
|
|
|
<p>With the growth of coal production, the load on the production capacity of
coal enterprises also increases, which leads to a concomitant increase in dust
formation in both opencast and underground methods of mining coal deposits.
Dust, generated during drilling, blasting operations, excavation, loading,
crushing and transportation of mined rock is one of the factors that has a
negative impact on the health of mining workers and on the level of
environmental pollution with solid particles. Thus, increasing the efficiency
of controlling the concentration of solid particles in the mine atmosphere and
dust deposits is an urgent scientific and technical task. In doing so, the use
of modern digital technologies within the framework of the industry 4.0 concept
makes it possible to develop approaches that can significantly improve the
quality of monitoring the state of the mine atmosphere at coal mining
enterprises. This article provides a theoretical basis and test results for a
system for continuous automatic monitoring of dust concentration in a mine
atmosphere as the component of the multifunctional coal mine safety system. It
is shown that the aerological safety of mine workings can be monitored in real
time by a new-generation system using artificial intelligence. The ability of
the proposed system to measure basic
physical parameters affecting dust deposition (disperse composition, air
humidity, dust concentration and air flow velocity) is noted.
</p>
|
|
|
|
<p>In current virtual try-on tasks, only the effect of clothing worn on a person
is depicted. In practical applications, users still need to select suitable
clothing from a vast array of individual clothing items, but existing clothes
may not be able to meet the needs of users. Additionally, some user groups may
be uncertain about what clothing combinations suit them and require clothing
selection recommendations. However, retrieval-based recommendation methods
cannot meet users' personalized needs, so we propose the Generative Fashion
Matching-aware Virtual Try-on Framework (GMVT). We generate coordinated and
stylistically diverse clothing for users using the Generative Matching Module.
In order to effectively learn matching information, we leverage a large-scale
matching dataset and transfer the acquired knowledge to the current virtual
try-on domain. Furthermore, we utilize the Virtual Try-on Module to visualize
the generated clothing on the user's body. To validate the effectiveness of our
approach, we enlisted the expertise of fashion designers for a professional
evaluation, assessing the rationality and diversity of the clothing
combinations and conducting an evaluation matrix analysis. Our method
significantly enhances the practicality of virtual try-on, offering users a
wider range of clothing choices and an improved user experience.
</p>
|
|
|
|
<p>This work focuses on distributed linear precoding when users transmit
correlated information over a fading Multiple-Input and Multiple-Output
Multiple Access Channel. Precoders are optimized in order to minimize the
sum-Mean Square Error (MSE) between the source and the estimated symbols. When
sources are correlated, minimizing the sum-MSE results in a non-convex
optimization problem. Precoders for an arbitrary number of users and transmit
and receive antennas are thus obtained via a projected steepest-descent
algorithm and a low-complexity heuristic approach. For the more restrictive
case of two single-antenna users, a closed-form expression for the minimum
sum-MSE precoders is derived. Moreover, for the scenario with a single receive
antenna and any number of users, a solution is obtained by means of a
semidefinite relaxation. Finally, we also consider precoding schemes where the
precoders are decomposed into complex scalars and unit norm vectors. Simulation
results show a significant improvement when source correlation is exploited at
precoding, especially for low SNRs and when the number of receive antennas is
lower than the number of transmitting nodes.
</p>
|
|
|
|
<p>Fluidic logic circuitry analogous to its electric counterpart could
potentially provide soft robots with machine intelligence due to its supreme
adaptability, dexterity, and seamless compatibility using state-of-the-art
additive manufacturing processes. However, conventional microfluidic channel
based circuitry suffers from limited driving force, while macroscopic pneumatic
logic lacks timely responsivity and desirable accuracy. Producing heavy duty,
highly responsive and integrated fluidic soft robotic circuitry for control and
actuation purposes for biomedical applications has yet to be accomplished in a
hydraulic manner. Here, we present a 3D printed hydraulic fluidic half-adder
system, composed of three basic hydraulic fluidic logic building blocks: AND,
OR, and NOT gates. Furthermore, a hydraulic soft robotic half-adder system is
implemented using an XOR operation and modified dual NOT gate system based on
an electrical oscillator structure. This half-adder system possesses binary
arithmetic capability, serving as a key component of the arithmetic logic unit
in modern computers. With slight modifications, it can realize control over three
different directions of deformation of a three degree-of-freedom soft actuation
mechanism solely by changing the states of the two fluidic inputs. This
hydraulic fluidic system utilizing a small number of inputs to control multiple
distinct outputs, can alter the internal state of the circuit solely based on
external inputs, holding significant promises for the development of
microfluidics, fluidic logic, and intricate internal systems of untethered soft
robots with machine intelligence.
</p>
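<p>For reference, the logical behavior that the hydraulic gates realize is the
standard half-adder truth table:</p>
<pre>
# Half-adder: sum = A XOR B, carry = A AND B.
for a in (0, 1):
    for b in (0, 1):
        print(f"A={a} B={b} -> sum={a ^ b} carry={a & b}")
</pre>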
|
|
|
|
<p>This paper presents LatentPatch, a new method for generating realistic images
from a small dataset of only a few images. We use a lightweight model with only
a few thousand parameters. Unlike traditional few-shot generation methods that
finetune pre-trained large-scale generative models, our approach is computed
directly on the latent distribution by sequential feature matching, and is
explainable by design. Avoiding large models based on transformers, recursive
networks, or self-attention, which are not suitable for small datasets, our
method is inspired by non-parametric texture synthesis and style transfer
models, and ensures that generated image features are sampled from the source
distribution. We extend previous single-image models to work with a few images
and demonstrate that our method can generate realistic images, as well as
enable conditional sampling and image editing. We conduct experiments on face
datasets and show that our simple model is effective and versatile.
</p>
|
|
|
|
<p>A maximal planar graph is a graph which can be embedded in the plane such
that every face of the graph is a triangle. The center of a graph is the
subgraph induced by the vertices of minimum eccentricity. We introduce the
notion of quasi-eccentric vertices, and use this to characterize maximal planar
graphs that are the center of some planar graph. We also present some
easier-to-check conditions, either only necessary or only sufficient, for
planar and maximal planar graphs to be the center of a planar graph. Finally,
we use the aforementioned
characterization to prove that all maximal planar graphs of order at most 8 are
the center of some planar graph -- and this bound is sharp.
</p>
|
|
|
|
<p>Knowledge Tracing (KT) aims to predict the future performance of students by
tracking the development of their knowledge states. Despite all the recent
progress made in this field, the application of KT models in education systems
is still restricted from a data perspective: 1) limited access to real-life
data due to data protection concerns, 2) lack of diversity in public datasets,
and 3) noise in benchmark datasets, such as duplicate records. To resolve these
problems, we simulated student data with three statistical strategies based on
public datasets and tested their performance on two KT baselines. While we
observe only minor performance improvements with additional synthetic data, our
work shows that training on synthetic data alone can lead to performance
similar to that of real data.
</p>
|
|
|
|
<p>Polar codes were originally specified for codelengths that are powers of two.
In many applications, it is desired to have a code that is not restricted to
such lengths. Two common strategies of modifying the length of a code are
shortening and puncturing. Simple and explicit schemes for shortening and
puncturing were introduced by Wang and Liu, and by Niu, Chen, and Lin,
respectively. In this paper, we prove that both schemes yield polar codes that
are capacity achieving. Moreover, the probability of error for both the
shortened and the punctured polar codes decreases to zero at the same
exponential rate as seminal polar codes. These claims hold for \emph{all}
codelengths large enough.
</p>
|
|
|
|
<p>We consider a non-conservative nonlinear Schr\"odinger equation (NCNLS) with
time-dependent coefficients, inspired by a water waves problem. This problem
does not have mass or energy conservation, but instead mass and energy change
in time under explicit balance laws. In this paper we extend to the particular
NCNLS two numerical schemes which are known to conserve energy and mass in the
discrete level for the cubic NLS. Both schemes are second oder accurate in
time, and we prove that their extensions satisfy discrete versions of the mass
and energy balance laws for the NCNLS. The first scheme is a relaxation scheme
that is linearly implicit. The other scheme is a modified Delfour-Fortin-Payre
scheme and it is fully implicit. Numerical results show that both schemes
capture robustly the correct values of mass and energy, even in strongly
non-conservative problems. We finally compare the two numerical schemes and
discuss their performance.
</p>
|
|
|
|
<p>Nonnegative Matrix Factorization (NMF) is an important unsupervised learning
method to extract meaningful features from data. To address the NMF problem
within a polynomial time framework, researchers have introduced a separability
assumption, which has recently evolved into the concept of coseparability. This
advancement offers a more efficient core representation for the original data.
However, real-world data is often more naturally represented as a
multi-dimensional array, such as images or videos. Applying NMF to
high-dimensional data involves vectorization, which risks losing essential
multi-dimensional correlations. To retain these inherent correlations in the
data, we turn to tensors (multidimensional arrays) and leverage the tensor
t-product. This approach extends the coseparable NMF to the tensor setting,
creating what we term coseparable Nonnegative Tensor Factorization (NTF). In
this work, we provide an alternating index selection method to select the
coseparable core. Furthermore, we validate the t-CUR sampling theory and
integrate it with the tensor Discrete Empirical Interpolation Method (t-DEIM)
to introduce an alternative, randomized index selection process. These methods
have been tested on both synthetic and facial analysis datasets. The results
demonstrate the efficiency of coseparable NTF when compared to coseparable NMF.
</p>
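<p>The tensor t-product that underlies this construction has a compact
definition: FFT along the third mode, frontal-slice-wise matrix products, then
an inverse FFT. A minimal NumPy sketch (not the paper's implementation):</p>
<pre>
import numpy as np

def t_product(A, B):
    """t-product of A (m, p, n3) and B (p, q, n3) -> (m, q, n3)."""
    Af = np.fft.fft(A, axis=2)
    Bf = np.fft.fft(B, axis=2)
    Cf = np.einsum('ipk,pqk->iqk', Af, Bf)   # per-slice matrix products
    return np.real(np.fft.ifft(Cf, axis=2))

A = np.random.rand(4, 3, 5)
B = np.random.rand(3, 2, 5)
C = t_product(A, B)   # shape (4, 2, 5)
</pre>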
|
|
|
|
<p>The verification of liveness conditions is an important aspect of state-based
rigorous methods. This article investigates this problem in a fragment
$\square$LTL of the logic LTL(EB), the integration of the UNTIL-fragment of
Pnueli's linear time temporal logic (LTL) and the logic of Event-B, in which
the most commonly used liveness conditions can be expressed. For this fragment
a sound set of derivation rules is developed, which is also complete under mild
restrictions for Event-B machines.
</p>
|
|
|
|
<p>We present a novel software feature for the BrainScaleS-2 accelerated
neuromorphic platform that facilitates the emulation of partitioned large-scale
spiking neural networks. This approach is well suited for many deep spiking
neural networks, where the constraint of the largest recurrent subnetwork
fitting on the substrate or the limited fan-in of neurons is often not a
limitation in practice. We demonstrate the training of two deep spiking neural
network models, using the MNIST and EuroSAT datasets, that exceed the physical
size constraints of a single-chip BrainScaleS-2 system. The ability to emulate
and train networks larger than the substrate provides a pathway for accurate
performance evaluation in planned or scaled systems, ultimately advancing the
development and understanding of large-scale models and neuromorphic computing
architectures.
</p>
|
|
|
|
<p>Traditional neuromorphic hardware architectures rely on event-driven
computation, where the asynchronous transmission of events, such as spikes,
triggers local computations within synapses and neurons. While machine learning
frameworks are commonly used for gradient-based training, their emphasis on
dense data structures poses challenges for processing asynchronous data such as
spike trains. This problem is particularly pronounced for typical tensor data
structures. In this context, we present a novel library (jaxsnn) built on top
of JAX, that departs from conventional machine learning frameworks by providing
flexibility in the data structures used and the handling of time, while
maintaining Autograd functionality and composability. Our library facilitates
the simulation of spiking neural networks and gradient estimation, with a focus
on compatibility with time-continuous neuromorphic backends, such as the
BrainScaleS-2 system, during the forward pass. This approach opens avenues for
more efficient and flexible training of spiking neural networks, bridging the
gap between traditional neuromorphic architectures and contemporary machine
learning frameworks.
</p>
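<p>To illustrate the flavor of differentiable spiking simulation in JAX (this
is a generic sketch with a sigmoid surrogate spike, not the jaxsnn API or its
event-based data structures):</p>
<pre>
import jax
import jax.numpy as jnp

def lif_layer(weights, spikes_in, tau=0.9, threshold=1.0):
    """Leaky integrate-and-fire layer unrolled with lax.scan over time."""
    def step(v, x):
        v = tau * v + x @ weights
        out = jax.nn.sigmoid(10.0 * (v - threshold))  # soft spike for autograd
        return v * (1.0 - out), out                   # soft reset
    _, outputs = jax.lax.scan(step, jnp.zeros(weights.shape[1]), spikes_in)
    return outputs                                    # (T, n_out)

key = jax.random.PRNGKey(0)
w = 0.5 * jax.random.normal(key, (8, 4))
x = (jax.random.uniform(key, (50, 8)) < 0.2).astype(jnp.float32)
grads = jax.grad(lambda w: -lif_layer(w, x).sum())(w)  # gradients through time
</pre>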
|
|
|
|
<p>Cybersecurity remains a critical challenge in the digital age, with network
traffic flow anomaly detection being a key pivotal instrument in the fight
against cyber threats. In this study, we address the prevalent issue of data
integrity in network traffic datasets, which are instrumental in developing
machine learning (ML) models for anomaly detection. We introduce two refined
versions of the CICIDS-2017 dataset, NFS-2023-nTE and NFS-2023-TE, processed
using NFStream to ensure methodologically sound flow expiration and labeling.
Our research contrasts the performance of the Random Forest (RF) algorithm
across the original CICIDS-2017, its refined counterparts WTMC-2021 and
CRiSIS-2022, and our NFStream-generated datasets, in both binary and
multi-class classification contexts. We observe that the RF model exhibits
exceptional robustness, achieving consistent high-performance metrics
irrespective of the underlying dataset quality, which prompts a critical
discussion on the actual impact of data integrity on ML efficacy. Our study
underscores the importance of continual refinement and methodological rigor in
dataset generation for network security research. As the landscape of network
threats evolves, so must the tools and techniques used to detect and analyze
them.
</p>
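<p>The core experimental loop is conventional and easy to reproduce; the sketch
below trains a Random Forest on stand-in flow features with scikit-learn
(random arrays replace the NFStream-exported datasets).</p>
<pre>
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

X = np.random.rand(1000, 20)              # flow statistics (stand-in)
y = np.random.randint(0, 2, 1000)         # benign (0) vs. attack (1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=42)

clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_tr, y_tr)
print(classification_report(y_te, clf.predict(X_te)))
</pre>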
|
|
|
|
<p>Congestion pricing, while adopted by many cities to alleviate traffic
congestion, raises concerns about widening socioeconomic disparities due to its
disproportionate impact on low-income travelers. In this study, we address this
concern by proposing a new class of congestion pricing schemes that not only
minimize congestion levels but also incorporate an equity objective to reduce
cost disparities among travelers with different willingness-to-pay. Our
analysis builds on a congestion game model with heterogeneous traveler
populations. We present four pricing schemes that account for practical
considerations, such as the ability to charge differentiated tolls to various
traveler populations and the option to toll all or only a subset of edges in
the network. We evaluate our pricing schemes in the calibrated freeway network
of the San Francisco Bay Area. We demonstrate that the proposed congestion
pricing schemes improve both efficiency (in terms of reduced average travel
time) and equity (the disparities of travel costs experienced by different
populations) compared to the current pricing scheme. Moreover, our pricing
schemes also generate a total revenue comparable to the current pricing scheme.
Our results further show that pricing schemes charging differentiated prices to
traveler populations with varying willingness-to-pay lead to a more equitable
distribution of travel costs compared to those that charge a homogeneous price
to all.
</p>
|
|
|
|
<p>Newspapers are important sources for historians interested in past societies'
cultural values, social structures, and their changes. Since the 19th century,
newspapers have been widely available and spread regionally. Today, historical
newspapers have been digitized, but they are not yet available in a metadata-enhanced form.
Machine-readable metadata, however, is a prerequisite for a mass statistical
analysis of this source. This paper focuses on parsing the complex layout of
historic newspaper pages, which today's machines do not understand well. We
argue for using neural networks, which require detailed annotated data in large
numbers. Our Bonn newspaper dataset consists of 486 pages of the
\textit{K\"olnische Zeitung} from the years 1866 and 1924. We propose solving
the newspaper-understanding problem by training a U-Net on our new dataset,
which delivers satisfactory performance.
</p>
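<p>As a sketch of the model family involved, a minimal two-level U-Net for per-pixel page segmentation can be written in PyTorch as follows; depth, channel sizes, and the class set are illustrative, not the paper's configuration.</p>
<pre>
import torch
import torch.nn as nn

def block(c_in, c_out):
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(),
        nn.Conv2d(c_out, c_out, 3, padding=1), nn.ReLU())

class TinyUNet(nn.Module):
    def __init__(self, n_classes=4):   # e.g. text, image, separator, background
        super().__init__()
        self.enc1, self.enc2 = block(1, 32), block(32, 64)
        self.pool = nn.MaxPool2d(2)
        self.up = nn.ConvTranspose2d(64, 32, 2, stride=2)
        self.dec = block(64, 32)       # consumes skip + upsampled features
        self.head = nn.Conv2d(32, n_classes, 1)

    def forward(self, x):
        e1 = self.enc1(x)              # kept as skip connection
        e2 = self.enc2(self.pool(e1))
        d = self.dec(torch.cat([self.up(e2), e1], dim=1))
        return self.head(d)            # per-pixel class logits

logits = TinyUNet()(torch.randn(1, 1, 256, 256))   # -> (1, 4, 256, 256)
</pre>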
|
|
|
|
<p>A robust multichannel speaker diarization and separation system is proposed
by exploiting the spatio-temporal activity of the speakers. The system is
realized in a hybrid architecture that combines the array signal processing
units and the deep learning units. For speaker diarization, a spatial coherence
matrix across time frames is computed based on the whitened relative transfer
functions (wRTFs) of the microphone array. This serves as a robust feature for
subsequent machine learning without the need for prior knowledge of the array
configuration. A computationally efficient Spatial Activity-driven Speaker
Diarization network (SASDnet) is constructed to estimate the speaker activity
directly from the spatial coherence matrix. For speaker separation, we propose
the Global and Local Activity-driven Speaker Extraction network (GLASEnet) to
separate speaker signals via speaker-specific global and local spatial activity
functions. The local spatial activity functions depend on the coherence between
the wRTFs of each time-frequency bin and the target speaker-dominant bins. The
global spatial activity functions are computed from the global spatial
coherence functions based on frequency-averaged local spatial activity
functions. Experimental results have demonstrated superior speaker
diarization, counting, and separation performance achieved by the proposed
system with low computational complexity compared to the pre-selected
baselines.
</p>
|
|
|
|
<p>This paper presents a new approach that integrates deep learning with
computational chess, using both the Mixture of Experts (MoE) method and
Monte-Carlo Tree Search (MCTS). Our methodology employs a suite of specialized
models, each designed to respond to specific changes in the game's input data.
This results in a framework with sparsely activated models, which provides
significant computational benefits. Our framework combines the MoE method with
MCTS, in order to align it with the strategic phases of chess, thus departing
from the conventional ``one-for-all'' model. Instead, we utilize distinct game
phase definitions to effectively distribute computational tasks across multiple
expert neural networks. Our empirical research shows a substantial improvement
in playing strength, surpassing the traditional single-model framework. This
validates the efficacy of our integrated approach and highlights the potential
of incorporating expert knowledge and strategic principles into neural network
design. The fusion of MoE and MCTS offers a promising avenue for advancing
machine learning architectures.
</p>
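<p>The gating idea can be sketched compactly: route each position to exactly one phase expert, so that only a fraction of the parameters is active per evaluation. The phase thresholds and layer sizes below are illustrative assumptions, not the paper's definitions.</p>
<pre>
import torch
import torch.nn as nn

class PhaseMoE(nn.Module):
    def __init__(self, in_dim=768, out_dim=1):
        super().__init__()
        # One evaluation expert per phase: opening, middlegame, endgame.
        self.experts = nn.ModuleList(nn.Linear(in_dim, out_dim) for _ in range(3))

    @staticmethod
    def phase(material_count):
        # Hard routing on remaining material; exactly one expert runs (sparse).
        if material_count > 28: return 0     # opening
        if material_count > 12: return 1     # middlegame
        return 2                             # endgame

    def forward(self, features, material_count):
        return self.experts[self.phase(material_count)](features)

value = PhaseMoE()(torch.randn(1, 768), material_count=15)  # middlegame expert
</pre>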
|
|
|
|
<p>The design of zero-delay Joint Source-Channel Coding (JSCC) schemes for the
transmission of correlated information over fading Multiple Access Channels
(MACs) is an interesting problem for many communication scenarios like Wireless
Sensor Networks (WSNs). Among the different JSCC schemes so far proposed for
this scenario, Distributed Quantizer Linear Coding (DQLC) represents an
appealing solution since it is able to outperform uncoded transmissions for any
correlation level at high Signal-to-Noise Ratios (SNRs) with a low
computational cost. In this paper, we extend the design of DQLC-based schemes
for fading MACs considering sphere decoding to make the optimal Minimum Mean
Squared Error (MMSE) estimation computationally affordable for an arbitrary
number of transmit users. The use of sphere decoding also makes it possible to formulate a
practical algorithm for the optimization of DQLC-based systems. Finally,
non-linear Kalman Filtering for the DQLC is considered to jointly exploit the
temporal and spatial correlation of the source symbols. The results of computer
experiments show that the proposed DQLC scheme with the Kalman Filter decoding
approach clearly outperforms uncoded transmissions for medium and high SNRs.
</p>
|
|
|
|
<p>This paper presents a novel solution concept, called BAR Nash Equilibrium
(BARNE), and applies it to analyse the Verifier's dilemma, a fundamental problem
in blockchain. Our solution concept adapts the Nash equilibrium (NE) to
accommodate interactions among Byzantine, altruistic and rational agents, which
became known as the BAR setting in the literature. We prove the existence of
BARNE in a large class of games and introduce two natural refinements, global
and local stability. Using this equilibrium and its refinement, we analyse the
free-rider problem in the context of Byzantine consensus. We demonstrate that
by incorporating fines and forced errors into a standard quorum-based
blockchain protocol, we can effectively reestablish honest behavior as a
globally stable BARNE.
</p>
|
|
|
|
<p>Wasserstein distortion is a one-parameter family of distortion measures that
was recently proposed to unify fidelity and realism constraints. After
establishing continuity results for Wasserstein distortion in the extreme cases of pure
fidelity and pure realism, we prove the first coding theorems for compression
under Wasserstein distortion focusing on the regime in which both the rate and
the distortion are small.
</p>
|
|
|
|
<p>Current image manipulation primarily centers on static manipulation, such as
replacing specific regions within an image or altering its overall style. In
this paper, we introduce an innovative dynamic manipulation task, subject
repositioning. This task involves relocating a user-specified subject to a
desired position while preserving the image's fidelity. Our research reveals
that the fundamental sub-tasks of subject repositioning, which include filling
the void left by the repositioned subject, reconstructing obscured portions of
the subject and blending the subject to be consistent with surrounding areas,
can be effectively reformulated as a unified, prompt-guided inpainting task.
Consequently, we can employ a single diffusion generative model to address
these sub-tasks using various task prompts learned through our proposed task
inversion technique. Additionally, we integrate pre-processing and
post-processing techniques to further enhance the quality of subject
repositioning. These elements together form our SEgment-gEnerate-and-bLEnd
(SEELE) framework. To assess SEELE's effectiveness in subject repositioning, we
assemble a real-world subject repositioning dataset called ReS. Our results on
ReS demonstrate the quality of repositioned image generation.
</p>
|
|
|
|
<p>Recently, low-resource dialogue state tracking (DST) has received increasing
attention. The paradigm of first obtaining state values and then generating
slot types conditioned on those values has made great progress in this task.
However, obtaining state values remains an under-studied problem. Existing extraction-based approaches cannot
capture values that require the understanding of context and are not
generalizable either. To address these issues, we propose a novel State VAlue
Generation based framework (SVAG), decomposing DST into state value generation
and domain slot generation. Specifically, we propose to generate state values
and use self-training to further improve state value generation. Moreover, we
design an estimator aiming at detecting incomplete generation and incorrect
generation for pseudo-labeled data selection during self-training. Experimental
results on the MultiWOZ 2.1 dataset show that our method, which has fewer than
1 billion parameters, achieves state-of-the-art performance under the data
ratio settings of 5%, 10%, and 25% when limited to models under 100 billion
parameters. Compared to models with more than 100 billion parameters, SVAG
still reaches competitive results.
</p>
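<p>The estimator's role in the self-training loop can be sketched as a filter over pseudo-labeled data. The generator and estimator callables below are hypothetical stand-ins for SVAG's components, not its actual interfaces.</p>
<pre>
# Keep only pseudo-labels judged neither incomplete nor incorrect.
def select_pseudo_labels(generator, estimator, unlabeled_dialogues):
    selected = []
    for dialogue in unlabeled_dialogues:
        values = generator(dialogue)            # generated state values
        verdict = estimator(dialogue, values)   # {"incomplete": bool, "incorrect": bool}
        if not verdict["incomplete"] and not verdict["incorrect"]:
            selected.append((dialogue, values))
    return selected   # augments the training set for the next self-training round
</pre>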
|
|
|
|
<p>This white paper outlines a long-term scientific vision for the development
of digital-democracy technology. We contend that if digital democracy is to
meet the ambition of enabling a participatory renewal in our societies, then a
comprehensive multi-methods research effort is required that could, over the
years, support its development in a democratically principled, empirically and
computationally informed way. The paper is co-authored by an international and
interdisciplinary team of researchers and arose from the Lorentz Center
Workshop on ``Algorithmic Technology for Democracy'' (Leiden, October 2022).
</p>
|
|
|
|
<p>Since Google introduced Kotlin as an official programming language for
developing Android apps in 2017, Kotlin has gained widespread adoption in
Android development. However, compared to Java, there is limited support for
Kotlin code dependency analysis, which is the foundation of software analysis.
To bridge this gap, we developed Depends-Kotlin to extract entities and their
dependencies in Kotlin source code. Not only does Depends-Kotlin support
extracting entities' dependencies in Kotlin code, but it can also extract
dependency relations between Kotlin and Java. The extraction of such
cross-language dependencies can help developers understand the migration
process from Java to Kotlin. Additionally, we used a Java project with
confirmed dependencies as a benchmark and converted this project to two
projects: Kotlin-only and a combination of Kotlin and Java. The dependencies in
these two projects were then extracted using our tool. The consistency observed
among dependency relations in all three projects confirms the accuracy of
Depends-Kotlin. Furthermore, the performance of Depends-Kotlin was assessed
using three additional projects of varying sizes. The source code of
Depends-Kotlin and the dataset used in this demo paper have been uploaded to
https://github.com/XYZboom/depends-kotlin. We also provided a screencast
presenting Depends-Kotlin https://youtu.be/daZuXOwn1Ls.
</p>
|
|
|
|
<p>Centralized repair refers to repairing $h\geq 2$ node failures using $d$
helper nodes in a centralized way, where the repair bandwidth is counted by the
total amount of data downloaded from the helper nodes. A centralized MSR code
is an MDS array code with $(h,d)$-optimal repair for some $h$ and $d$. In this
paper, we present several classes of centralized MSR codes with small
sub-packetization. First, we construct an alternative MSR code with
$(1,d_i)$-optimal repair for multiple repair degrees $d_i$ simultaneously.
Based on the code structure, we are able to construct a centralized MSR code
with $(h_i,d_i)$-optimal repair property for all possible $(h_i,d_i)$ with
$h_i\mid (d_i-k)$ simultaneously. The sub-packetization is no more than ${\rm
lcm}(1,2,\ldots,n-k)(n-k)^n$, which is much smaller than that of a previous construction given
by Ye and Barg ($({\rm lcm}(1,2,\ldots,n-k))^n$). Moreover, for general
parameters $2\leq h\leq n-k$ and $k\leq d\leq n-h$, we further give a
centralized MSR code enabling $(h,d)$-optimal repair with sub-packetization
smaller than all previous works.
</p>
|
|
|
|
<p>The transformation model is an essential component of any deformable image
registration approach. It provides a representation of physical deformations
between images, thereby defining the range and realism of registrations that
can be found. Two types of transformation models have emerged as popular
choices: B-spline models and mesh models. Although both models have been
investigated in detail, a direct comparison has not yet been made, since the
models are optimized using very different optimization methods in practice.
B-spline models are predominantly optimized using gradient-descent methods,
while mesh models are typically optimized using finite-element method solvers
or evolutionary algorithms. Multi-objective optimization methods, which aim to
find a diverse set of high-quality trade-off registrations, are increasingly
acknowledged to be important in deformable image registration. Since these
methods search for a diverse set of registrations, they can provide a more
complete picture of the capabilities of different transformation models, making
them suitable for a comparison of models. In this work, we conduct the first
direct comparison between B-spline and mesh transformation models, by
optimizing both models with the same state-of-the-art multi-objective
optimization method, the Multi-Objective Real-Valued Gene-pool Optimal Mixing
Evolutionary Algorithm (MO-RV-GOMEA). The combination with B-spline
transformation models, moreover, is novel. We experimentally compare both
models on two different registration problems that are both based on pelvic CT
scans of cervical cancer patients, featuring large deformations. Our results,
on three cervical cancer patients, indicate that the choice of transformation
model can have a profound impact on the diversity and quality of achieved
registration outcomes.
</p>
|
|
|
|
<p>This paper proposes a node flux-linkage synchronizing control method (NFSCM)
for power systems with 100% wind power generation based on a capacitor voltage
balancing scheme (CVBS). Different from the conventional grid-forming
controllers, NFSCM is designed to regulate inverters as virtual flux-linkage
sources. Auto-synchronization of flux-linkage vectors is achieved through the
CVBS-based NFSCM. The mismatch among the angular frequencies of flux-linkage
vectors is eliminated by regulating the tracking errors of DC-link voltages,
which establishes a negative feedback between the output frequency and active
power of the inverter. NFSCM is adaptive to weak and strong grids. It avoids
the excitation inrush currents in the step-up transformer of wind power
generators. It also eliminates the DC components of the three-phase currents,
and avoids low-frequency oscillations in active power. In order to limit the
short-circuit current of inverters, a logic-based bang-bang funnel control
(LBFC) is designed to control the switches of inverter bridges when
over-current is detected. LBFC is able to restrict various fault currents
within an acceptable range within the shortest time. LBFC and NFSCM are
designed to operate in a switched manner according to a state-dependent
switching strategy. Time-domain simulations were conducted on a 100% wind power
generation test system, and the performance of NFSCM and LBFC was
investigated.
</p>
|
|
|
|
<p>RISC-V processors encounter substantial challenges in deploying
multi-precision deep neural networks (DNNs) due to their restricted precision
support, constrained throughput, and suboptimal dataflow design. To tackle
these challenges, a scalable RISC-V vector (RVV) processor, namely SPEED, is
proposed to enable efficient multi-precision DNN inference by innovations from
customized instructions, hardware architecture, and dataflow mapping. Firstly,
dedicated customized RISC-V instructions are proposed based on RVV extensions,
providing SPEED with fine-grained control over processing precision ranging
from 4 to 16 bits. Secondly, a parameterized multi-precision systolic array
unit is incorporated within the scalable module to enhance parallel processing
capability and data reuse opportunities. Finally, a mixed multi-precision
dataflow strategy, compatible with different convolution kernels and data
precision, is proposed to effectively improve data utilization and
computational efficiency. We perform synthesis of SPEED in TSMC 28nm
technology. The experimental results demonstrate that SPEED achieves a peak
throughput of 287.41 GOPS and an energy efficiency of 1335.79 GOPS/W at 4-bit
precision. Moreover, when compared to the pioneering
open-source vector processor Ara, SPEED provides an area efficiency improvement
of 2.04$\times$ and 1.63$\times$ under 16-bit and 8-bit precision conditions,
respectively, which shows SPEED's significant potential for efficient
multi-precision DNN inference.
</p>
|
|
|
|
<p>Linear optical quantum computing (LOQC) offers a quantum computation paradigm
based on well-established and robust technology and flexible environmental
conditions following DiVincenzo's criteria. Within this framework, integrated
photonics can be utilized to achieve gate-based quantum computing, defining
qubits by path-encoding, quantum gates through the use of Mach-Zehnder
interferometers (MZIs) as fundamental building blocks, and measurements through
single-photon detectors. In particular, universal two-qubit gates can be
achieved by suitable structures of MZIs together with post-selection or
heralding. The most resource-efficient choice is given by the post-selected CZ
gate. However, this implementation is characterized by a design which has a
non-regular structure and cannot be cascaded. This limits the implementation of
large-scale LOQC. Starting from these issues, we suggest an approach to move
toward a universal and scalable LOQC on the integrated photonic platform. First
of all, choosing the post-selected CZ as the universal two-qubit gate, we extend
the path-encoded dual-rail qubit to a triplet of waveguides, composed of an
auxiliary waveguide and the pair of waveguides corresponding to the qubit basis
states. Additionally, we introduce a swap photonic network that maps the
regularly-labeled structure of the new path-encoded qubits to the structure
needed for the post-selected CZ. We also discuss the optical swap gate that
allows the connection of non-nearest neighbor path-encoded qubits. In this way,
we can deterministically exchange the locations of the qubits and execute
controlled quantum gates between any path-encoded qubits. Next, by truncating
the auxiliary waveguides after any post-selected CZ, we find that it is
possible to cascade this optical gate when it acts on different pairs that
share only one qubit.
</p>
|
|
|
|
<p>Classification based on Zero-shot Learning (ZSL) is the ability of a model to
classify inputs into novel classes on which the model has not previously seen
any training examples. Providing an auxiliary descriptor in the form of a set
of attributes describing the new classes involved in the ZSL-based
classification is one of the favored approaches to solving this challenging
task. In this work, inspired by Hyperdimensional Computing (HDC), we propose
the use of stationary binary codebooks of symbol-like distributed
representations inside an attribute encoder to compactly represent a
computationally simple end-to-end trainable model, which we name
Hyperdimensional Computing Zero-shot Classifier~(HDC-ZSC). It consists of a
trainable image encoder, an attribute encoder based on HDC, and a similarity
kernel. We show that HDC-ZSC can be used to first perform zero-shot attribute
extraction tasks and can later be repurposed for Zero-shot Classification
tasks with minimal architectural changes and minimal model retraining. HDC-ZSC
achieves Pareto optimal results with a 63.8% top-1 classification accuracy on
the CUB-200 dataset by having only 26.6 million trainable parameters. Compared
to two other state-of-the-art non-generative approaches, HDC-ZSC achieves 4.3%
and 9.9% better accuracy, while they require more than 1.85x and 1.72x
parameters compared to HDC-ZSC, respectively.
</p>
|
|
|
|
<p>Emotions are crucial in human life, influencing perceptions, relationships,
behaviour, and choices. Emotion recognition using Electroencephalography (EEG)
in the Brain-Computer Interface (BCI) domain presents significant challenges,
particularly the need for extensive datasets. This study aims to generate
synthetic EEG samples that are similar to real samples yet distinct, by adding
noise augmentation to a conditional denoising diffusion probabilistic model, thus
addressing the prevalent issue of data scarcity in EEG research. The proposed
method is tested on the DEAP dataset, showcasing a 1.94% improvement in
classification performance when using synthetic data, an improvement exceeding
those reported for traditional GAN-based and DDPM-based approaches. The proposed
diffusion-based approach for EEG data generation appears promising in refining
the accuracy of emotion recognition systems and marks a notable contribution to
EEG-based emotion recognition. Our research further evaluates the effectiveness
of state-of-the-art classifiers on EEG data, employing both real and synthetic
data with varying noise levels.
</p>
|
|
|
|
<p>The optimal control problem considered for a stochastic power system is to
select the set of power supply vectors that infimizes the probability that the
phase-angle differences of any power flow of the network endanger the
transient stability of the power system by leaving a critical subset. The set
of control laws is restricted to be a periodically recomputed set of fixed
power supply vectors based on predictions of power demand for the next short
horizon. Neither state feedback nor output feedback is used. The associated
control objective function is Lipschitz continuous, nondifferentiable, and
nonconvex. The results of the paper include that a minimum exists in the value
range of the control objective function. Furthermore, it includes a two-step
procedure to compute an approximate minimizer based on two key methods: (1) a
projected generalized subgradient method for computing an initial vector, and
(2) a steepest descent method for approximating a local minimizer. Finally, it
includes two convergence theorems that an approximation sequence converges to a
local minimum.
</p>
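<p>The two-step procedure can be illustrated on a toy nonsmooth objective: a projected subgradient phase to obtain an initial vector, followed by a small-step local descent phase. The objective, the box feasible set, and the finite-difference surrogate below are illustrative only.</p>
<pre>
import numpy as np

def f(u):   # Lipschitz, nondifferentiable toy objective
    return np.max(np.abs(u - np.array([0.3, -0.2]))) + 0.1 * np.sum(u ** 2)

def subgrad(u, eps=1e-6):   # central-difference surrogate for a subgradient
    g = np.zeros_like(u)
    for i in range(len(u)):
        e = np.zeros_like(u); e[i] = eps
        g[i] = (f(u + e) - f(u - e)) / (2 * eps)
    return g

lo, hi = np.array([-1.0, -1.0]), np.array([1.0, 1.0])   # feasible box
u = np.zeros(2)
for k in range(1, 200):            # step 1: projected subgradient method
    u = np.clip(u - (0.5 / np.sqrt(k)) * subgrad(u), lo, hi)
for _ in range(100):               # step 2: small-step local descent
    u = np.clip(u - 1e-3 * subgrad(u), lo, hi)
print(u, f(u))
</pre>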
|
|
|
|
<p>We propose a local modification of the standard subdiffusion model by
introducing the initial Fickian diffusion, which results in a multiscale
diffusion model. The developed model resolves the incompatibility between the
nonlocal operators in subdiffusion and the local initial conditions and thus
eliminates the initial singularity of the solutions of the subdiffusion, while
retaining its heavy tail behavior away from the initial time. The
well-posedness of the model and high-order regularity estimates of its
solutions are analyzed by resolvent estimates, based on which the numerical
discretization and analysis are performed. Numerical experiments are carried
out to substantiate the theoretical findings.
</p>
|
|
|
|
<p>Medical image semantic segmentation techniques can help identify tumors
automatically from computed tomography (CT) scans. In this paper, we propose a
Contextual and Attentional feature Fusions enhanced Convolutional Neural
Network (CNN) and Transformer hybrid network (CAFCT) model for liver tumor
segmentation. In the proposed model, three other modules are introduced in the
network architecture: Attentional Feature Fusion (AFF), Atrous Spatial Pyramid
Pooling (ASPP) of DeepLabv3, and Attention Gates (AGs) to improve contextual
information related to tumor boundaries for accurate segmentation. Experimental
results show that the proposed CAFCT achieves a mean Intersection over Union
(IoU) of 90.38% and a Dice score of 86.78% on the Liver Tumor
Segmentation Benchmark (LiTS) dataset, outperforming pure CNN or Transformer
methods, e.g., Attention U-Net and PVTFormer.
</p>
|
|
|
|
<p>Earlier papers \cite{VB2022,VB2023a} introduced the notions of a core and an
index of a relation (an index being a special case of a core). A limited form
of the axiom of choice was postulated -- specifically that all partial
equivalence relations (pers) have an index -- and the consequences of adding
the axiom to axiom systems for point-free reasoning were explored. In this
paper, we define a partial ordering on relations, which we call the
\textsf{thins} ordering. We show that our axiom of choice is equivalent to the
property that core relations are the minimal elements of the \textsf{thins}
ordering. We also postulate a novel axiom that guarantees that, when
\textsf{thins} is restricted to non-empty pers, equivalence relations are
maximal. This and other properties of \textsf{thins} provide further evidence
that our axiom of choice is a desirable means of strengthening point-free
reasoning on relations.
</p>
<p>Although our novel axiom is valid for concrete relations and is a sufficient
condition for characterising maximality, we show that it is not a necessary
condition in the abstract point-free algebra. This leaves open the problem of
deriving a necessary and sufficient condition.
</p>
|
|
|
|
<p>This paper presents a comprehensive study on using deep reinforcement
learning (RL) to create dynamic locomotion controllers for bipedal robots.
Going beyond focusing on a single locomotion skill, we develop a general
control solution that can be used for a range of dynamic bipedal skills, from
periodic walking and running to aperiodic jumping and standing. Our RL-based
controller incorporates a novel dual-history architecture, utilizing both a
long-term and short-term input/output (I/O) history of the robot. This control
architecture, when trained through the proposed end-to-end RL approach,
consistently outperforms other methods across a diverse range of skills in both
simulation and the real world. The study also delves into the adaptivity and
robustness introduced by the proposed RL system in developing locomotion
controllers. We demonstrate that the proposed architecture can adapt to both
time-invariant dynamics shifts and time-variant changes, such as contact
events, by effectively using the robot's I/O history. Additionally, we identify
task randomization as another key source of robustness, fostering better task
generalization and compliance to disturbances. The resulting control policies
can be successfully deployed on Cassie, a torque-controlled human-sized bipedal
robot. This work pushes the limits of agility for bipedal robots through
extensive real-world experiments. We demonstrate a diverse range of locomotion
skills, including: robust standing, versatile walking, fast running with a
demonstration of a 400-meter dash, and a diverse set of jumping skills, such as
standing long jumps and high jumps.
</p>
|
|
|
|
<p>In the field of distributed computing by robot swarms, research
encompasses manifold models in which robots operate in the Euclidean plane through
a sequence of look-compute-move cycles. The models under study differ in (i) the
possibility of storing constant-size information, (ii) the possibility of
communicating constant-size information, and (iii) the synchronization mode. By
varying features (i) and (ii), we obtain the four noted base models: OBLOT (silent
and oblivious robots), FSTA (silent and finite-state robots), FCOM (oblivious
and finite-communication robots), and LUMI (finite-state and
finite-communication robots). Combining each base model with the three main
synchronization modes (fully synchronous, semi-synchronous, and asynchronous),
we obtain the well-known 12 models. Extensive research has studied their
computational power, proving the hierarchical relations between different
models. However, only transparent robots have been considered. In this work, we
study the taxonomy of the 12 models considering collision-intolerant opaque
robots. We present six witness problems that prove the majority of the
computational relations between the 12 models. In particular, the last witness
problem depicts a peculiar issue occurring in the case of obstructed visibility
and asynchrony.
</p>
|
|
|
|
<p>Although multilingual language models exhibit impressive cross-lingual
transfer capabilities on unseen languages, the performance on downstream tasks
is impacted when there is a script disparity with the languages used in the
multilingual model's pre-training data. Using transliteration offers a
straightforward yet effective means to align the script of a resource-rich
language with a target language, thereby enhancing cross-lingual transfer
capabilities. However, for mixed languages, this approach is suboptimal, since
only a subset of the language benefits from the cross-lingual transfer while
the remainder is impeded. In this work, we focus on Maltese, a Semitic
language, with substantial influences from Arabic, Italian, and English, and
notably written in Latin script. We present a novel dataset annotated with
word-level etymology. We use this dataset to train a classifier that enables us
to make informed decisions regarding the appropriate processing of each token
in the Maltese language. We contrast indiscriminate transliteration or
translation with mixed processing pipelines that transliterate only words of
Arabic origin, thereby resulting in text with a mixture of scripts. We
fine-tune the processed data on four downstream tasks and show that conditional
transliteration based on word etymology yields the best results, surpassing
fine-tuning with raw Maltese or Maltese processed with non-selective pipelines.
</p>
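<p>The conditional pipeline reduces to a token-level routing rule. In the sketch below, classify_etymology and translit_ar are hypothetical stand-ins for the trained etymology classifier and an Arabic-script transliterator.</p>
<pre>
def process_sentence(tokens, classify_etymology, translit_ar):
    out = []
    for tok in tokens:
        # Transliterate only tokens of Arabic origin; keep the rest in
        # Latin script, yielding mixed-script text.
        out.append(translit_ar(tok) if classify_etymology(tok) == "arabic" else tok)
    return out
</pre>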
|
|
|
|
<p>Sliced optimal transport, which is basically a Radon transform followed by
one-dimensional optimal transport, became popular in various applications due
to its efficient computation. In this paper, we deal with sliced optimal
transport on the sphere $\mathbb{S}^{d-1}$ and on the rotation group SO(3). We
propose a parallel slicing procedure of the sphere which requires again only
optimal transport on the line. We analyze the properties of the corresponding
parallelly sliced optimal transport, which provides in particular a
rotationally invariant metric on the spherical probability measures. For SO(3),
we introduce a new two-dimensional Radon transform and develop its singular
value decomposition. Based on this, we propose a sliced optimal transport on
SO(3).
</p>
<p>As Wasserstein distances were extensively used in barycenter computations, we
derive algorithms to compute the barycenters with respect to our new sliced
Wasserstein distances and provide synthetic numerical examples on the 2-sphere
that demonstrate their behavior in both the free and fixed support settings of
discrete spherical measures. In terms of computational speed, they outperform
the existing methods for semicircular slicing as well as the regularized
Wasserstein barycenters.
</p>
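<p>For orientation, the generic Euclidean primitive underlying all slicing constructions is easy to state in code: project onto random directions and solve one-dimensional optimal transport by sorting. The spherical and SO(3) variants replace these projections with the transforms described above; the sketch below covers only the generic case with equal sample sizes.</p>
<pre>
import numpy as np

def sliced_w2(x, y, n_slices=100, rng=np.random.default_rng(0)):
    d = x.shape[1]
    total = 0.0
    for _ in range(n_slices):
        v = rng.normal(size=d)
        v /= np.linalg.norm(v)                   # random unit direction
        px, py = np.sort(x @ v), np.sort(y @ v)  # 1-D OT = sorted matching
        total += np.mean((px - py) ** 2)
    return np.sqrt(total / n_slices)

x = np.random.default_rng(1).normal(size=(500, 3))
y = np.random.default_rng(2).normal(loc=0.5, size=(500, 3))
print(sliced_w2(x, y))
</pre>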
|
|
|
|
<p>Performing versatile mobile manipulation actions in human-centered
environments requires highly sophisticated software frameworks that are
flexible enough to handle special use cases, yet general enough to be
applicable across different robotic systems, tasks, and environments. This
paper presents a comprehensive memory-centered, affordance-based, and modular
uni- and multi-manual grasping and mobile manipulation framework, applicable to
complex robot systems with a high number of degrees of freedom such as humanoid
robots. By representing mobile manipulation actions through affordances, i.e.,
interaction possibilities of the robot with its environment, we unify the
autonomous manipulation process for known and unknown objects in arbitrary
environments. Our framework is integrated and embedded into the memory-centric
cognitive architecture of the ARMAR humanoid robot family. This way, robots can
not only interact with the physical world but also use common knowledge about
objects, and learn and adapt manipulation strategies. We demonstrate the
applicability of the framework in real-world experiments, including grasping
known and unknown objects, object placing, and semi-autonomous bimanual
grasping of objects on two different humanoid robot platforms.
</p>
|
|
|
|
<p>The combination of multiple-input multiple-output (MIMO) systems and
intelligent reflecting surfaces (IRSs) is foreseen as a critical enabler of
beyond 5G (B5G) and 6G. In this work, two different approaches are considered
for the joint optimization of the IRS phase-shift matrix and MIMO precoders of
an IRS-assisted multi-stream (MS) multi-user MIMO (MU-MIMO) system. Both
approaches aim to maximize the system sum-rate for every channel realization.
The first proposed solution is a novel contextual bandit (CB) framework with
continuous state and action spaces called deep contextual bandit-oriented deep
deterministic policy gradient (DCB-DDPG). The second is an innovative deep
reinforcement learning (DRL) formulation where the states, actions, and rewards
are selected such that the Markov decision process (MDP) property of
reinforcement learning (RL) is appropriately met. Both proposals perform
remarkably better than state-of-the-art heuristic methods in scenarios with
high multi-user interference.
</p>
|
|
|
|
<p>In this study, we present a secure smart contract-based Verifiable Random
Function (VRF) model, addressing the shortcomings of existing systems. As
quantum computing emerges, conventional public key cryptography faces potential
vulnerabilities. To enhance our VRF's robustness, we employ post-quantum
Ring-LWE encryption for generating pseudo-random sequences. Given the
computational intensity of this approach and associated on-chain gas costs, we
propose a hybrid VRF architecture in which the on-chain and off-chain components can
communicate in a scalable and secure way. To ensure the validity and integrity
of the off-chain computations (e.g., Ring-LWE encryption), we employ a
quantum-secure linkable ring signature scheme on NTRU lattice and also
delegated key generation (DKG) with a secure key encapsulation mechanism (KEM).
Our decentralized VRF employs multi-party computation (MPC) with
blockchain-based decentralized identifiers (DID), ensuring the collective
efforts of enhanced randomness and security. We show the security and privacy
advantages of our proposed VRF model with the approximated estimation of
overall temporal and spatial complexities. We also evaluate our VRF MPC model's
entropy and outline its Solidity smart contract integration. This research also
provides a method to produce and verify the VRF output's proof, optimal for
scenarios necessitating randomness and validation. Lastly, using NIST SP800-22
test suite for randomness, we demonstrate the commendable result with a 97.73%
overall pass rate on 11 standard tests and an average p-value of 0.5459 across
the 176 tests in total.
</p>
|
|
|
|
<p>An innovative approach to hybrid analog-digital precoding for the downlink of
wideband massive MIMO systems is developed. The proposed solution, termed
Rank-Constrained Coordinate Ascent (RCCA), first seeks the full-digital
precoder that maximizes the achievable sum-rate over all the frequency
subcarriers while constraining the rank of the overall transmit covariance
matrix. The frequency-flat constraint on the analog part of the hybrid precoder
and the non-convex nature of the rank constraint are circumvented by
transforming the original problem into a more suitable one, where a convenient
structure for the transmit covariance matrix is imposed. Such structure makes
the resulting full-digital precoder particularly adequate for its posterior
analog-digital factorization. An additional problem formulation to determine an
appropriate power allocation policy according to the rank constraint is also
provided. The numerical results show that the proposed method outperforms
baseline solutions even for practical scenarios with high spatial diversity.
</p>
|
|
|
|
<p>In [2] we show how to construct information sets for Reed-Muller codes only
in terms of their basic parameters. In this work we deal with the corresponding
problem for q-ary Generalized Reed-Muller codes of first and second order. We
see that for first-order codes the result for binary Reed-Muller codes is also
valid, while for second-order codes, with q > 2, we have to manage more complex
defining sets, and we show that we obtain different information sets. We also
present some examples and associated open problems.
</p>
|
|
|
|
<p>Lattices are architected metamaterials whose properties strongly depend on
their geometrical design. The analogy between lattices and graphs enables the
use of graph neural networks (GNNs) as a faster surrogate model compared to
traditional methods such as finite element modelling. In this work we present a
higher-order GNN model trained to predict the fourth-order stiffness tensor of
periodic strut-based lattices. The key features of the model are (i) SE(3)
equivariance, and (ii) consistency with the thermodynamic law of conservation
of energy. We compare the model to non-equivariant models based on a number of
error metrics and demonstrate the benefits of the encoded equivariance and
energy conservation in terms of predictive performance and reduced training
requirements.
</p>
|
|
|
|
<p>We tackle the problem of Byzantine errors in distributed gradient descent
within the Byzantine-resilient gradient coding framework. Our proposed solution
can recover the exact full gradient in the presence of $s$ malicious workers
with a data replication factor of only $s+1$. It generalizes previous solutions
to any data assignment scheme that has a regular replication over all data
samples. The scheme detects malicious workers through additional interactive
communication and a small number of local computations at the main node,
leveraging group-wise comparisons between workers with a provably optimal
grouping strategy. The scheme requires at most $s$ interactive rounds that
incur a total communication cost logarithmic in the number of data samples.
</p>
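<p>Why a replication factor of s+1 can suffice is easy to see in a one-shot caricature: if each sample is computed by s+1 workers and at most one report per group is corrupted, a majority vote pins down the true value. The paper's interactive group-wise comparisons remove this per-group assumption; the sketch below is only the intuition, not the actual scheme.</p>
<pre>
import numpy as np
from collections import Counter

s = 2                                  # tolerated malicious workers; groups of s + 1 = 3
reports = {0: [1.5, 1.5, 9.9],         # per-sample gradient reports from the
           1: [0.7, -3.0, 0.7]}        # three workers assigned to each sample

for sample, vals in reports.items():
    value, count = Counter(np.round(vals, 6)).most_common(1)[0]
    assert count >= 2, "needs at most one corrupted report per group"
    print(f"sample {sample}: gradient {value}")
</pre>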
|
|
|
|
<p>In this paper we extend the equal division and the equal surplus division
values for transferable utility cooperative games to the more general setup of
transferable utility cooperative games with a priori unions. In the case of the
equal surplus division value we propose three possible extensions. We provide
axiomatic characterizations of the new values. Furthermore, we apply the
proposed modifications to a particular cost sharing problem and compare the
numerical results with those obtained with the original values.
</p>
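<p>For reference, the two base values being extended are standard: for a TU game $(N,v)$, the equal division value is $ED_i(N,v)=v(N)/|N|$, while the equal surplus division value is $ESD_i(N,v)=v(\{i\})+\frac{1}{|N|}\bigl(v(N)-\sum_{j\in N}v(\{j\})\bigr)$.</p>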
|
|
|
|
<p>Providing closed-form estimates of the decoding failure rate (DFR) of iterative
decoders for low- and moderate-density parity check codes has attracted
significant interest in the research community over the years. This interest
has grown recently due to the use of iterative decoders in post-quantum
cryptosystems, where the desired decoding failure rates are impossible to
estimate via Monte Carlo simulations. In this work, we propose a new technique
to provide accurate estimates of the DFR of a two-iteration (parallel) bit
flipping decoder, which is also employable for cryptographic purposes. In doing
so, we successfully tackle the estimation of the bit flipping probabilities at
the second decoder iteration, and provide a fitting estimate for the syndrome
weight distribution at the first iteration. We numerically validate our
results, providing comparisons of the modeled and simulated weight of the
syndrome, incorrectly-guessed error bit distribution at the end of the first
iteration, and the two-iteration DFR, both in the floor
and waterfall regime for simulatable codes. Finally, we apply our method to
estimate the DFR of LEDAcrypt parameters, showing improvements by factors
larger than $2^{70}$ (for NIST category $1$) with respect to the previous
estimation techniques. This allows for a $\approx 20$% shortening in public key
and ciphertext sizes, at no security loss, making the smallest ciphertext for
NIST category $1$ only $6$% larger than the one of BIKE. We note that the
analyzed two-iteration decoder is applicable in BIKE, where swapping it with
the current black-gray decoder (and adjusting the parameters) would provide
strong IND-CCA$2$ guarantees.
</p>
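<p>The decoder under analysis belongs to a simple family; a toy parallel bit-flipping decoder over a small parity-check matrix illustrates the mechanics (the matrix and flipping threshold are illustrative, not LEDAcrypt or BIKE parameters).</p>
<pre>
import numpy as np

H = np.array([[1, 1, 0, 1, 0, 0],
              [0, 1, 1, 0, 1, 0],
              [1, 0, 1, 0, 0, 1]], dtype=np.uint8)

def bit_flip(y, H, iters=2, threshold=2):
    y = y.copy()
    for _ in range(iters):
        syndrome = (H @ y) % 2                    # unsatisfied parity checks
        if not syndrome.any():
            break                                 # decoding success
        upc = H.T @ syndrome                      # unsatisfied checks per bit
        y ^= (upc >= threshold).astype(np.uint8)  # flip all such bits at once
    return y, (H @ y) % 2

received = np.zeros(6, dtype=np.uint8); received[0] ^= 1   # one bit error
decoded, syndrome = bit_flip(received, H)
print(decoded, "residual syndrome:", syndrome)
</pre>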
|
|
|
|
<p>This paper uses topological data analysis (TDA) tools and introduces a
data-driven clustering-based stock selection strategy tailored for sparse
portfolio construction. Our asset selection strategy exploits the topological
features of stock price movements to select a subset of topologically similar
(different) assets for a sparse index tracking (Markowitz) portfolio. We
introduce new distance measures, which serve as an input to the clustering
algorithm, on the space of persistence diagrams and landscapes that consider
the time component of a time series. We conduct an empirical analysis on the
S\&P index from 2009 to 2020, including a study on the COVID-19 data to
validate the robustness of our methodology. Our strategy to integrate TDA with
the clustering algorithm significantly enhanced the performance of sparse
portfolios across various performance measures in diverse market scenarios.
</p>
|
|
|
|
<p>We develop a framework for learning properties of quantum states beyond the
assumption of independent and identically distributed (i.i.d.) input states. We
prove that, given any learning problem (under reasonable assumptions), an
algorithm designed for i.i.d. input states can be adapted to handle input
states of any nature, albeit at the expense of a polynomial increase in copy
complexity. Furthermore, we establish that algorithms which perform
non-adaptive incoherent measurements can be extended to encompass non-i.i.d.
input states while maintaining comparable error probabilities. This allows us,
among other applications, to generalize the classical shadows of Huang, Kueng,
and Preskill to the non-i.i.d. setting at the cost of a small loss in
efficiency. Additionally, we can efficiently verify any pure state using
Clifford measurements, in a way that is independent of the ideal state. Our
main techniques are based on de Finetti-style theorems supported by tools from
information theory. In particular, we prove a new randomized local de Finetti
theorem that can be of independent interest.
</p>
|
|
|
|
<p>Integrating information from multiple modalities enhances the robustness of
scene perception systems in autonomous vehicles, providing a more comprehensive
and reliable sensory framework. However, the modality incompleteness in
multi-modal segmentation remains under-explored. In this work, we establish a
task called Modality-Incomplete Scene Segmentation (MISS), which encompasses
both system-level modality absence and sensor-level modality errors. To avoid
the predominant modality reliance in multi-modal fusion, we introduce a
Missing-aware Modal Switch (MMS) strategy to proactively manage missing
modalities during training. Utilizing bit-level batch-wise sampling enhances
the model's performance in both complete and incomplete testing scenarios.
Furthermore, we introduce the Fourier Prompt Tuning (FPT) method to incorporate
representative spectral information into a limited number of learnable prompts
that maintain robustness against all MISS scenarios, achieving effects akin to
fine-tuning but with far fewer tunable parameters (1.1%). Extensive experiments prove
the efficacy of our proposed approach, showcasing an improvement of 5.84% mIoU
over the prior state-of-the-art parameter-efficient methods in the
modality-missing setting. The source code will be publicly available at
https://github.com/RuipingL/MISS.
</p>
|
|
|
|
<p>We investigate the Witsenhausen counterexample in a continuous vector-valued
context with a causal encoder and noncausal decoder. Our main result is the
optimal single-letter condition that characterizes the set of achievable
Witsenhausen power costs and estimation costs, leveraging a modified weak
typicality approach. In particular, we accommodate our power analysis to the
causal encoder constraint, and provide an improved distortion error analysis
for the challenging estimation of the interim state. Interestingly, the idea of
dual role of control is explicitly captured by the two auxiliary random
variables.
</p>
|
|
|
|
<p>The low-rank plus sparse (L+S) decomposition model has enabled better
reconstruction of dynamic magnetic resonance imaging (dMRI) with separation
into background (L) and dynamic (S) components. However, the use of a low-rank prior
alone may not fully explain the slow variations or smoothness of the background
part at the local scale. In this paper, we propose a smoothness-regularized L+S
(SR-L+S) model for dMRI reconstruction from highly undersampled k-t-space data.
We exploit joint low-rank and smooth priors on the background component of dMRI
to better capture both its global and local temporal correlated structures.
Extending the L+S formulation, the low-rank property is encoded by the nuclear
norm, while the smoothness by a general \ell_{p}-norm penalty on the local
differences of the columns of L. The additional smoothness regularizer can
promote piecewise local consistency between neighboring frames. By smoothing
out the noise and dynamic activities, it allows accurate recovery of the
background part, and subsequently more robust dMRI reconstruction. Extensive
experiments on multi-coil cardiac and synthetic data show that the SR-L+S
model outperforms existing reconstruction methods.
</p>
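<p>The classic L+S iteration that SR-L+S builds on alternates singular value thresholding for L with soft-thresholding for S; the smoothness penalty on the local differences of L is the paper's addition and is omitted from this minimal sketch.</p>
<pre>
import numpy as np

def svt(X, tau):          # singular value thresholding (nuclear norm prox)
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0)) @ Vt

def soft(X, lam):         # elementwise soft threshold (l1 prox)
    return np.sign(X) * np.maximum(np.abs(X) - lam, 0)

def l_plus_s(M, tau=1.0, lam=0.05, iters=50):
    L = np.zeros_like(M); S = np.zeros_like(M)
    for _ in range(iters):    # alternating proximal updates
        L = svt(M - S, tau)
        S = soft(M - L, lam)
    return L, S

M = np.random.default_rng(0).normal(size=(64, 32))   # stand-in Casorati matrix
L, S = l_plus_s(M)
print(np.linalg.matrix_rank(L, tol=1e-6))
</pre>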
|
|
|
|
<p>In this paper we introduce the $\Gamma$ value, a new value for cooperative
games with transferable utility. We also provide an axiomatic characterization
of the $\Gamma$ value based on a property concerning the so-called necessary
players. A necessary player of a game is one without whom the characteristic
function is zero. We illustrate the performance of the $\Gamma$ value in a
particular cost allocation problem that arises when the owners of the
apartments in a building plan to install an elevator and share its installation
cost; in the resulting example we compare the proposals of the $\Gamma$ value,
the equal division value and the Shapley value in two different scenarios. In
addition, we propose an extension of the $\Gamma$ value for cooperative games
with transferable utility and with a coalition structure. Finally, we provide
axiomatic characterizations of the coalitional $\Gamma$ value and of the Owen
and Banzhaf-Owen values using alternative properties concerning necessary
players.
</p>
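<p>A small self-contained comparison of the kind described can be run directly; the cost game below is an illustrative elevator-style game (cost driven by the highest floor served), not the example from the paper.</p>
<pre>
from itertools import permutations

def shapley(players, v):
    """Exact Shapley value by enumerating orderings (fine for small games)."""
    phi = {p: 0.0 for p in players}
    orders = list(permutations(players))
    for order in orders:
        coalition = frozenset()
        for p in order:
            phi[p] += v(coalition | {p}) - v(coalition)   # marginal contribution
            coalition = coalition | {p}
    return {p: phi[p] / len(orders) for p in players}

floor = {"A": 1, "B": 2, "C": 3}
v = lambda S: 10 * max((floor[p] for p in S), default=0)  # coalition cost

print(shapley(["A", "B", "C"], v))                    # cost-sensitive split
print({p: v({"A", "B", "C"}) / 3 for p in floor})     # equal division split
</pre>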
|
|
|
|
<p>The world is looking for clean, renewable energy sources that do not pollute
the environment, in an attempt to reduce the greenhouse gas emissions that
contribute to global warming. Wind energy has significant potential not only to
reduce greenhouse gas emissions, but also to meet the ever-increasing demand for
energy. To enable the effective utilization of wind energy, addressing the
following three challenges in wind data analysis is crucial. Firstly, improving
data resolution in various climate conditions to ensure an ample supply of
information for assessing potential energy resources. Secondly, implementing
dimensionality reduction techniques for data collected from sensors/simulations
to efficiently manage and store large datasets. Thirdly, extrapolating wind
data from one spatial specification to another, particularly in cases where
data acquisition may be impractical or costly. We propose a deep learning based
approach to achieve multi-modal continuous resolution wind data prediction from
discontinuous wind data, along with data dimensionality reduction.
</p>
|
|
|
|
<p>Purpose: Wood comprises different cell types, such as fibers and vessels,
defining its properties. Studying their shape, size, and arrangement in
microscopic images is crucial for understanding wood samples. Typically, this
involves macerating (soaking) samples in a solution to separate cells, then
spreading them on slides for imaging with a microscope that covers a wide area,
capturing thousands of cells. However, these cells often cluster and overlap in
images, making the segmentation difficult and time-consuming using standard
image-processing methods. Results: In this work, we develop an automatic deep
learning segmentation approach that utilizes the one-stage YOLOv8 model for
fast and accurate fiber and vessel segmentation and characterization in
microscopy images. The model can analyze images of 32640 x 25920 pixels and
demonstrates effective cell detection and segmentation, achieving a mAP_0.5-0.95
of 78%. To assess the model's robustness, we examined fibers from a
genetically modified tree line known for longer fibers. The outcomes were
comparable to previous manual measurements. Additionally, we created a
user-friendly web application for image analysis and provided the code for use
on Google Colab. Conclusion: By leveraging YOLOv8's advances, this work
provides a deep learning solution to enable efficient quantification and
analysis of wood cells suitable for practical applications.
</p>
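<p>Typical usage of the underlying ultralytics API is short; the dataset file, weights, and hyperparameters below are placeholders, not the training configuration used in this work.</p>
<pre>
from ultralytics import YOLO

model = YOLO("yolov8n-seg.pt")                  # pretrained segmentation weights
model.train(data="wood_cells.yaml", epochs=100, imgsz=640)   # fine-tune

results = model("tile_0001.png")                # run on one image tile
for r in results:
    print(r.masks.data.shape[0], "cell instances segmented")
</pre>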
|
|
|
|
<p>In this paper we extend the equal division and the equal surplus division
values for transferable utility cooperative games to the more general setup of
transferable utility cooperative games with level structures. In the case of
the equal surplus division value we propose three possible extensions, one of
which has already been described in the literature. We provide axiomatic
characterizations of the values considered, apply them to a particular cost
sharing problem and compare them in the framework of such an application.
</p>
|
|
|
|
<p>We consider a model of third-degree price discrimination, in which the seller
has a valuation for the product which is unknown to the market designer, who
aims to maximize the buyers' surplus by revealing information regarding the
buyer's valuation to the seller. Our main result shows that the regret is
bounded by $U^*(0)/e$, where $U^*(0)$ is the optimal buyer surplus in the case
where the seller has zero valuation for the product. This bound is attained by
randomly drawing a seller valuation and applying the segmentation of Bergemann
et al. (2015) with respect to the drawn valuation. We show that the $U^*(0)/e$
bound is tight in the case of binary buyer valuation.
</p>
|
|
|
|
<p>We propose a novel algorithm for online resource allocation with
non-stationary customer arrivals and unknown click-through rates. We assume
multiple types of customers arrive in a nonstationary stochastic fashion, with
unknown arrival rates in each period, and that customers' click-through rates
are unknown and can only be learned online. By leveraging results from the
stochastic contextual bandit with knapsack and online matching with adversarial
arrivals, we develop an online scheme to allocate the resources to
nonstationary customers. We prove that under mild conditions, our scheme
achieves a ``best-of-both-world'' result: the scheme has a sublinear regret
when the customer arrivals are near-stationary, and enjoys an optimal
competitive ratio under general (non-stationary) customer arrival
distributions. Finally, we conduct extensive numerical experiments to show our
approach generates near-optimal revenues for all different customer scenarios.
</p>
|
|
|
|
<p>The importance of addressing security vulnerabilities is indisputable, with
software becoming crucial in sectors such as national defense and finance.
Consequently, the security issues caused by software vulnerabilities cannot be
ignored. Fuzz testing is an automated software testing technology that can
detect vulnerabilities in the software. However, most previous fuzzers
encounter the challenge that fuzzing performance is sensitive to the initial input
seeds. In the absence of high-quality initial input seeds, fuzzers may expend
significant resources on program path exploration, leading to a substantial
decrease in the efficiency of vulnerability detection. To address this issue,
we propose WGAN-AFL. By collecting high-quality testcases, we train a
generative adversarial network (GAN) to learn their features, thereby obtaining
high-quality initial input seeds. To overcome drawbacks like mode collapse and
training instability inherent in GANs, we utilize the Wasserstein GAN (WGAN)
architecture for training, further enhancing the quality of the generated
seeds. Experimental results demonstrate that WGAN-AFL significantly outperforms
the original AFL in terms of code coverage, new paths, and vulnerability
discovery, confirming the effective enhancement of seed quality by WGAN-AFL.
</p>
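<p>A minimal WGAN training loop with weight clipping conveys the core of the seed generator; network sizes, optimizers, and the byte representation are illustrative assumptions, not the WGAN-AFL setup.</p>
<pre>
import torch
import torch.nn as nn

SEED_LEN, Z = 128, 64                      # bytes per seed, latent dimension
G = nn.Sequential(nn.Linear(Z, 256), nn.ReLU(),
                  nn.Linear(256, SEED_LEN), nn.Sigmoid())
C = nn.Sequential(nn.Linear(SEED_LEN, 256), nn.ReLU(), nn.Linear(256, 1))
opt_g = torch.optim.RMSprop(G.parameters(), lr=5e-5)
opt_c = torch.optim.RMSprop(C.parameters(), lr=5e-5)

real = torch.rand(512, SEED_LEN)           # stand-in for collected testcases
for step in range(1000):
    for _ in range(5):                     # several critic steps per G step
        idx = torch.randint(0, real.size(0), (64,))
        fake = G(torch.randn(64, Z)).detach()
        loss_c = C(fake).mean() - C(real[idx]).mean()   # Wasserstein critic loss
        opt_c.zero_grad(); loss_c.backward(); opt_c.step()
        for p in C.parameters():           # enforce Lipschitz constraint
            p.data.clamp_(-0.01, 0.01)
    loss_g = -C(G(torch.randn(64, Z))).mean()
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()

seeds = (G(torch.randn(8, Z)) * 255).to(torch.uint8)   # initial corpus bytes
</pre>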
|
|
|
|
<p>We present a new plug-in for the ARGoS swarm robotic simulator to implement
the Crazyflie drone, including its controllers, sensors, and some expansion
decks. We have based our development on the former Spiri drone, upgrading the
position controller, adding a new speed controller, LED ring, onboard camera,
and battery discharge model. We have compared this new plug-in in terms of
accuracy and efficiency with data obtained from real Crazyflie drones. All our
experiments showed that the proposed plug-in worked well, presenting high
levels of accuracy. We believe that this is an important contribution to robot
simulations which will extend ARGoS capabilities through the use of our
proposed, open-source plug-in.
</p>
|
|
|
|
<p>We consider the transmission of spatially correlated analog information in a
wireless sensor network (WSN) through fading single-input and multiple-output
(SIMO) multiple access channels (MACs) with low-latency requirements. A
lattice-based analog joint source-channel coding (JSCC) approach is considered
where vectors of consecutive source symbols are encoded at each sensor using an
n-dimensional lattice and then transmitted to a multiantenna central node. We
derive a minimum mean square error (MMSE) decoder that accounts for both the
multidimensional structure of the encoding lattices and the spatial
correlation. In addition, a sphere decoder is considered to simplify the
required searches over the multidimensional lattices. Different lattice-based
mappings are considered, and the impact of their size and density on performance
and latency is analyzed. Results show that, while meeting low-latency
constraints, lattice-based analog JSCC provides performance gains and higher
reliability with respect to the state-of-the-art JSCC schemes.
</p>
|
|
|
|
<p>This paper presents an algorithm, called BCM-Broadcast, for the
implementation of causal broadcast in distributed mobile systems in the
presence of Byzantine failures. The BCM-Broadcast algorithm simultaneously
focuses on three critical challenges in distributed systems: Byzantine
failures, Causality, and Mobility. We first present a hierarchical architecture
for BCM-Broadcast. Then, we define twelve properties for BCM-Broadcast,
including validity, integrity, termination, and causality. We then show that
BCM-Broadcast satisfies all these properties. We also prove the safety of
BCM-Broadcast; i.e., no healthy process delivers a message from any Byzantine
process, assuming that the number of Byzantine processes is less than a third
of the total number of mobile nodes. Subsequently, we show that the message
complexity of BCM-Broadcast is linear. Finally, using the Poisson process, we
analyze the probability of the violation of the safety condition under
different mobility scenarios.
</p>
|
|
|
|
<p>This paper answers a fundamental question about the exact distribution of the
signal-to-interference-plus-noise ratio (SINR) under matched-filter (MF)
precoding. Specifically, we derive the exact expressions for the cumulative
distribution function (CDF) and the probability density function (PDF) of SINR
under MF precoding over Rayleigh fading channels. Based on the exact analysis,
we then rigorously prove that the SINR converges to some specific distributions
separately in high SNR and in massive MIMO. To simplify the exact result in
general cases, we develop a good approximation by modelling the interference as
a Beta distribution. We then shift to the exact analysis of the transmit rate,
and answer the fundamental question: How does the exact rate converge to the
well-known asymptotic rate in massive MIMO? After that, we propose a novel
approximation for the ergodic rate, which performs better than various existing
approximations. Finally, we present some numerical results to demonstrate the
accuracy of the derived analytical models.
</p>
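<p>The distributions in question are straightforward to probe numerically; a Monte Carlo sketch of the per-user SINR under matched-filter precoding over i.i.d. Rayleigh channels (equal power, unit noise; all parameters illustrative) reads:</p>
<pre>
import numpy as np

M, K, trials = 32, 4, 10_000          # antennas, users, channel draws
rng = np.random.default_rng(0)
sinr = np.empty(trials)
for t in range(trials):
    H = (rng.normal(size=(K, M)) + 1j * rng.normal(size=(K, M))) / np.sqrt(2)
    W = H.conj().T                    # matched-filter precoder
    W /= np.linalg.norm(W)            # total transmit power normalization
    G = np.abs(H @ W) ** 2            # effective gains, desired on the diagonal
    k = 0                             # evaluate user 0
    sinr[t] = G[k, k] / (G[k, np.arange(K) != k].sum() + 1.0)

print("10/50/90% SINR quantiles (dB):",
      10 * np.log10(np.quantile(sinr, [0.1, 0.5, 0.9])))
</pre>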
|
|
|
|
<p>Entity alignment, which is a prerequisite for creating a more comprehensive
Knowledge Graph (KG), involves pinpointing equivalent entities across disparate
KGs. Contemporary methods for entity alignment have predominantly utilized
knowledge embedding models to procure entity embeddings that encapsulate
various similarities: structural, relational, and attributive. These embeddings
are then integrated through attention-based information fusion mechanisms.
Despite this progress, effectively harnessing multifaceted information remains
challenging due to inherent heterogeneity. Moreover, while Large Language
Models (LLMs) have exhibited exceptional performance across diverse downstream
tasks by implicitly capturing entity semantics, this implicit knowledge has yet
to be exploited for entity alignment. In this study, we propose a Large
Language Model-enhanced Entity Alignment framework (LLMEA), integrating
structural knowledge from KGs with semantic knowledge from LLMs to enhance
entity alignment. Specifically, LLMEA identifies candidate alignments for a
given entity by considering both embedding similarities between entities across
KGs and edit distances to a virtual equivalent entity. It then engages an LLM
iteratively, posing multiple multi-choice questions to draw upon the LLM's
inference capability. The final prediction of the equivalent entity is derived
from the LLM's output. Experiments conducted on three public datasets reveal
that LLMEA surpasses leading baseline models. Additional ablation studies
underscore the efficacy of our proposed framework.
</p>
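<p>A minimal sketch of the candidate-selection step described above, assuming
cosine similarity between entity embeddings, a difflib-based edit-distance
proxy, and hypothetical fusion weights; the shortlist is then framed as a
multiple-choice question for the LLM.
</p>
<pre>
# Illustrative sketch of LLMEA-style candidate selection: fuse embedding
# similarity with an edit-distance-style score, then pose the shortlist to
# an LLM as a multiple-choice question. Weights and data are placeholders.
import difflib
import numpy as np

def candidates(query_vec, query_name, kg2_vecs, kg2_names, k=4):
    # Cosine similarity between the query entity and all entities in KG2
    sims = kg2_vecs @ query_vec / (
        np.linalg.norm(kg2_vecs, axis=1) * np.linalg.norm(query_vec))
    # Lexical similarity to the query's (virtual equivalent) surface form
    lex = np.array([difflib.SequenceMatcher(None, query_name, n).ratio()
                    for n in kg2_names])
    score = 0.7 * sims + 0.3 * lex        # fusion weights are assumptions
    top = np.argsort(-score)[:k]
    return [kg2_names[i] for i in top]

names = ["Paris", "Pariz (city)", "London", "Par"]
vecs = np.eye(4)
opts = candidates(vecs[0], "Paris", vecs, names)
prompt = ("Which entity is equivalent to 'Paris'?\n" +
          "\n".join(f"{chr(65+i)}. {o}" for i, o in enumerate(opts)))
print(prompt)   # the LLM's answer letter would give the final prediction
</pre>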
|
|
|
|
<p>The Finite Fourier Series (FFS) Shape-Based (SB) trajectory approximation
method has been used to rapidly generate initial trajectories that satisfy the
dynamics, trajectory boundary conditions, and limitation on maximum thrust
acceleration. The FFS SB approach solves a nonlinear programming problem (NLP)
in searching for feasible trajectories. This paper extends the development of
the FFS SB approach to generate sub-optimal solutions. Specifically, the
objective function of the NLP is modified to also include a measure of the
time of flight. Numerical results presented in this paper show several
solutions that differ from the original FFS SB ones. The sub-optimal
trajectories generated using a time of flight minimization are shown to be
physically feasible trajectories and potential candidates for direct solvers.
</p>
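<p>For concreteness, a minimal sketch of a finite Fourier series
parametrization of one trajectory coordinate and its analytic acceleration
(the quantity limited by the maximum-thrust constraint); the coefficients and
horizon are placeholders that the NLP solver would tune subject to the
boundary conditions.
</p>
<pre>
# Minimal sketch: a finite Fourier series parametrization of one trajectory
# coordinate and its analytic second derivative (used to check the thrust
# acceleration limit). Coefficients and horizon are placeholders.
import numpy as np

def ffs(t, T, a, b):
    """r(t) = a0/2 + sum_n [a_n cos(2*pi*n*t/T) + b_n sin(2*pi*n*t/T)]"""
    w = 2 * np.pi * np.arange(1, len(a)) / T
    r = a[0] / 2 + sum(a[n] * np.cos(w[n-1] * t) + b[n] * np.sin(w[n-1] * t)
                       for n in range(1, len(a)))
    acc = sum(-w[n-1]**2 * (a[n] * np.cos(w[n-1] * t) +
                            b[n] * np.sin(w[n-1] * t))
              for n in range(1, len(a)))
    return r, acc

T = 10.0
a = np.array([2.0, 0.5, -0.1])   # placeholder Fourier coefficients
b = np.array([0.0, 0.3, 0.2])    # (b[0] is unused by convention)
t = np.linspace(0, T, 201)
r, acc = ffs(t, T, a, b)
print("max |acceleration| along the arc:", np.abs(acc).max())
</pre>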
|
|
|
|
<p>The aim of this paper is to develop hybrid non-orthogonal multiple access
(NOMA) assisted downlink transmission. First, for the single-input
single-output (SISO) scenario, i.e., each node is equipped with a single
antenna, a novel hybrid NOMA scheme is introduced, where NOMA is implemented as
an add-on of a legacy time division multiple access (TDMA) network. Because of
the simplicity of the SISO scenario, analytical results can be developed to
reveal important properties of downlink hybrid NOMA. For example, in the case
that the users' channel gains are ordered and the durations of their time slots
are the same, downlink hybrid NOMA is shown to always outperform TDMA, which is
different from the existing conclusion for uplink hybrid NOMA. Second, the
proposed downlink SISO hybrid NOMA scheme is extended to the multiple-input
single-output (MISO) scenario, i.e., the base station has multiple antennas.
For the MISO scenario, near-field communication is considered to illustrate how
NOMA can be used as an add-on in legacy networks based on space division
multiple access and TDMA. Simulation results verify the developed analytical
results and demonstrate the superior performance of downlink hybrid NOMA
compared to conventional orthogonal multiple access.
</p>
|
|
|
|
<p>In this paper, a direct finite element method is proposed for solving
interface problems on simple unfitted meshes. The fact that the two interface
conditions form an $H^{\frac12}(\Gamma)\times H^{-\frac12}(\Gamma)$ pair leads
to a simple and direct weak formulation with an integral term for the mutual
interaction over the interface, and the well-posedness of this weak formulation
is proved. Based on this formulation, a direct finite element method is
proposed to solve the problem on two adjacent subdomains separated by the
interface by conforming finite element and conforming mixed finite element,
respectively. The well-posedness and an optimal a priori analysis are proved
for this direct finite element method under some reasonable assumptions. A
simple lowest-order direct finite element method, using the linear element
and the lowest-order Raviart-Thomas element, is proposed and shown to admit
the optimal a priori error estimate by verifying the
aforementioned assumptions. Numerical tests are also conducted to verify the
theoretical results and the effectiveness of the direct finite element method.
</p>
|
|
|
|
<p>Recent approaches to automatically detect the speaker of an utterance of
direct speech often disregard general information about characters in favor of
local information found in the context, such as surrounding mentions of
entities. In this work, we explore stylistic representations of characters
built by encoding their quotes with off-the-shelf pretrained Authorship
Verification models in a large corpus of English novels (the Project Dialogism
Novel Corpus). Results suggest that the combination of stylistic and topical
information captured in some of these models accurately distinguishes
characters from one another, but does not necessarily improve over
semantic-only models when attributing quotes. However, these results vary
across novels, and further investigation of stylometric models specifically
tailored to literary texts and the study of characters is needed.
</p>
|
|
|
|
<p>Plagiarism is a pressing concern, even more so with the availability of large
language models. Existing plagiarism detection systems reliably find copied and
moderately reworded text but fail for idea plagiarism, especially in
mathematical science, which heavily uses formal mathematical notation. We make
two contributions. First, we establish a taxonomy of mathematical content reuse
by annotating 122 potentially plagiarised scientific document pairs. Second, we
analyze the best-performing approaches to detect plagiarism and mathematical
content similarity on the newly established taxonomy. We found that the
best-performing methods for plagiarism and math content similarity achieve an
overall detection score (PlagDet) of 0.06 and 0.16, respectively. The
best-performing methods failed to detect most cases from all seven newly
established math similarity types. The outlined contributions will benefit research
in plagiarism detection systems, recommender systems, question-answering
systems, and search engines. We make our experiment's code and annotated
dataset available to the community:
https://github.com/gipplab/Taxonomy-of-Mathematical-Plagiarism
</p>
|
|
|
|
<p>Many High Performance Computing (HPC) facilities have developed and deployed
frameworks in support of continuous monitoring and operational data analytics
(MODA) to help improve efficiency and throughput. Because of the complexity and
scale of systems and workflows and the need for low-latency response to address
dynamic circumstances, automated feedback and response have the potential to be
more effective than current human-in-the-loop approaches which are laborious
and error prone. Progress has been limited, however, by factors such as the
lack of infrastructure and feedback hooks, and successful deployment is often
site- and case-specific. In this position paper we report on the outcomes and
plans from a recent Dagstuhl Seminar, seeking to carve a path for community
progress in the development of autonomous feedback loops for MODA, based on the
established formalism of similar (MAPE-K) loops in autonomous computing and
self-adaptive systems. By defining and developing such loops for significant
cases experienced across HPC sites, we seek to extract commonalities and
develop conventions that will facilitate interoperability and
interchangeability with system hardware, software, and applications across
different sites, and will motivate vendors and others to provide telemetry
interfaces and feedback hooks to enable community development and pervasive
deployment of MODA autonomy loops.
</p>
|
|
|
|
<p>Multi-image super-resolution (MISR) makes it possible to increase the
spatial resolution of a low-resolution (LR) acquisition by combining multiple
images carrying
complementary information in the form of sub-pixel offsets in the scene
sampling, and can be significantly more effective than its single-image
counterpart. Its main difficulty lies in accurately registering and fusing the
multi-image information. Currently studied settings, such as burst photography,
typically involve assumptions of small geometric disparity between the LR
images and rely on optical flow for image registration. We study a MISR method
that can increase the resolution of sets of images acquired with arbitrary, and
potentially wildly different, camera positions and orientations, generalizing
the currently studied MISR settings. Our proposed model, called EpiMISR, moves
away from optical flow and explicitly uses the epipolar geometry of the
acquisition process, together with transformer-based processing of radiance
feature fields to substantially improve over state-of-the-art MISR methods in
presence of large disparities in the LR images.
</p>
|
|
|
|
<p>Causal discovery is the challenging task of inferring causal structure from
data. Motivated by Pearl's Causal Hierarchy (PCH), which tells us that passive
observations alone are not enough to distinguish correlation from causation,
there has been a recent push to incorporate interventions into machine learning
research. Reinforcement learning provides a convenient framework for such an
active approach to learning. This paper presents CORE, a deep reinforcement
learning-based approach for causal discovery and intervention planning. CORE
learns to sequentially reconstruct causal graphs from data while learning to
perform informative interventions. Our results demonstrate that CORE
generalizes to unseen graphs and efficiently uncovers causal structures.
Furthermore, CORE scales to larger graphs with up to 10 variables and
outperforms existing approaches in structure estimation accuracy and sample
efficiency. All relevant code and supplementary material can be found at
https://github.com/sa-and/CORE
</p>
|
|
|
|
<p>A modification of Amdahl's law for the case of an increasing number of
processor elements in a computer system is considered. The coefficient $k$
linking the accelerations of parallel and parallel specialized computer
systems is
determined. The limiting values of the coefficient are investigated and its
theoretical maximum is calculated. It is proved that $k$ > 1 for any positive
increment of processor elements. The obtained formulas are combined into a
single method that allows determining the maximum theoretical acceleration of a
parallel specialized computer system in comparison with the acceleration of a
minimal parallel computer system. The method is tested on Apriori, k-nearest
neighbors, CDF 9/7, fast Fourier transform and naive Bayesian classifier
algorithms.
</p>
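<p>A small sketch of the mechanics involved, using the classical Amdahl
speedup formula and a ratio $k$ between the speedups before and after an
increment of processor elements; the paper's exact definition of $k$ is not
reproduced here, so this only illustrates the classical formula.
</p>
<pre>
# Sketch of the classical Amdahl speedup and a ratio k between the speedups
# of two systems after an increment of processor elements. The paper's exact
# definition of k is not reproduced here; this only illustrates the mechanics.
def amdahl_speedup(p, n):
    """p: parallelizable fraction of the workload, n: processor elements."""
    return 1.0 / ((1.0 - p) + p / n)

p = 0.9          # parallel fraction (placeholder)
n_min, dn = 4, 4 # minimal system size and increment of processor elements

s_par = amdahl_speedup(p, n_min)            # minimal parallel system
s_spec = amdahl_speedup(p, n_min + dn)      # specialized system, n + dn PEs
k = s_spec / s_par
print(f"S(n)={s_par:.3f}, S(n+dn)={s_spec:.3f}, k={k:.3f}")  # k > 1 for dn > 0
</pre>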
|
|
|
|
<p>This paper presents a theoretical analysis of intrinsic message passing
decoding for generalized product codes (GPCs) with irregular degree
distributions, a generalization of product codes that allows every code bit to
be protected by a minimum of two and potentially more component codes. We
derive a random hypergraph-based asymptotic performance analysis for GPCs,
extending previous work that considered the case where every bit is protected
by exactly two component codes. The analysis offers a new tool to guide the
code design of GPCs by providing insights into the influence of degree
distributions on the performance of GPCs.
</p>
|
|
|
|
<p>Generative retrieval models encode pointers to information in a corpus as an
index within the model's parameters. These models serve as part of a larger
pipeline, where retrieved information conditions generation for
knowledge-intensive NLP tasks. However, we identify two limitations: first,
generative retrieval does not account for contextual information; second, the
retrieval cannot be tuned for the downstream readers, as decoding the page
title is a non-differentiable operation. This paper introduces Re3val, trained with
generative reranking and reinforcement learning using limited data. Re3val
leverages context acquired via Dense Passage Retrieval to rerank the retrieved
page titles and utilizes REINFORCE to maximize rewards generated by constrained
decoding. Additionally, we generate questions from our pre-training dataset to
mitigate epistemic uncertainty and bridge the domain gap between the
pre-training and fine-tuning datasets. Subsequently, we extract and rerank
contexts from the KILT database using the reranked page titles. Upon grounding
the top five reranked contexts, Re3val achieves the top-1 KILT scores among
all generative retrieval models across five KILT datasets.
</p>
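<p>A hedged sketch of the REINFORCE update used to maximize a reranking
reward; the policy (raw logits over five page titles) and the reward function
are toy placeholders, not Re3val's actual model or constrained decoder.
</p>
<pre>
# Hedged sketch of a REINFORCE-style update for reranking: the model samples
# a page title, receives a scalar reward, and is updated with the policy
# gradient. The policy and reward function are placeholders, not Re3val's.
import torch

torch.manual_seed(0)
logits = torch.randn(5, requires_grad=True)      # scores for 5 page titles
opt = torch.optim.SGD([logits], lr=0.1)
baseline = 0.0

for step in range(100):
    probs = torch.softmax(logits, dim=0)
    dist = torch.distributions.Categorical(probs)
    action = dist.sample()                       # sampled title index
    reward = 1.0 if action.item() == 2 else 0.0  # toy reward: title 2 is gold
    loss = -(reward - baseline) * dist.log_prob(action)
    opt.zero_grad(); loss.backward(); opt.step()
    baseline = 0.9 * baseline + 0.1 * reward     # running-mean baseline

print(torch.softmax(logits, 0))  # probability mass should concentrate on 2
</pre>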
|
|
|
|
<p>Imaging Atmospheric Cherenkov Telescopes (IACTs) of the gamma-ray
observatory TAIGA detect Extensive Air Showers (EASs) originating from
interactions of cosmic or gamma rays with the atmosphere. Thereby, the
telescopes obtain images of the EASs. The ability to segregate gamma-ray
images from the hadronic cosmic-ray background is one of the main features of
this type of detector. However, actual IACT observations require simultaneous
observation of the background and the gamma-ray source. This observation mode (called wobbling)
modifies images of events, which affects the quality of selection by neural
networks.
</p>
<p>Thus, in this work, the results of applying neural networks (NNs) to the
image classification task on Monte Carlo (MC) images of TAIGA-IACTs are
presented. The wobbling mode is considered together with the image adaptation
needed for adequate analysis by NNs. Simultaneously, we explore several neural
network structures that classify events either directly from images or through
Hillas parameters extracted from images. In addition, by employing NNs, MC simulation
data are used to evaluate the quality of the segregation of rare gamma events
with the account of all necessary image modifications.
</p>
|
|
|
|
<p>The growing popularity of Android requires malware detection systems that can
keep up with the pace of new software being released. According to a recent
study, a new piece of malware appears online every 12 seconds. To address this,
we treat Android malware detection as a streaming data problem and explore the
use of active online learning as a means of mitigating the problem of labelling
applications in a timely and cost-effective manner. Our resulting framework
achieves accuracies of up to 96\%, requires as little as 24\% of the training
data to be labelled, and compensates for concept drift that occurs between the
release and labelling of an application. We also consider the broader
practicalities of online learning within Android malware detection, and
systematically explore the trade-offs between using different static, dynamic
and hybrid feature sets to classify malware.
</p>
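<p>A hedged sketch of the active online learning loop: a linear model is
updated incrementally and a label is requested only when the prediction is
uncertain. Feature extraction and drift compensation, central to the actual
framework, are omitted; the threshold and the toy stream are assumptions.
</p>
<pre>
# Hedged sketch of active online learning for a malware stream: the model is
# updated incrementally and asks for a label only when it is uncertain.
# Feature extraction and drift compensation are omitted for brevity.
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
clf = SGDClassifier(loss="log_loss")
clf.partial_fit(rng.normal(size=(10, 8)),
                rng.integers(0, 2, 10), classes=[0, 1])  # small seed set

labels_requested = 0
for t in range(1000):                       # simulated app stream
    x = rng.normal(size=(1, 8))
    y_true = int(x[0, 0] > 0)               # toy ground truth
    p = clf.predict_proba(x)[0, 1]
    if abs(p - 0.5) < 0.15:                 # uncertain: query an analyst
        clf.partial_fit(x, [y_true])
        labels_requested += 1
print(f"labelled {labels_requested / 1000:.0%} of the stream")
</pre>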
|
|
|
|
<p>This manuscript introduces deep learning models that simultaneously describe
the dynamics of several yield curves. We aim to learn the dependence structure
among the different yield curves induced by the globalization of financial
markets and exploit it to produce more accurate forecasts. By combining the
self-attention mechanism and nonparametric quantile regression, our model
generates both point and interval forecasts of future yields. The architecture
is designed to avoid quantile crossing issues affecting multiple quantile
regression models. Numerical experiments conducted on two different datasets
confirm the effectiveness of our approach. Finally, we explore potential
extensions and enhancements by incorporating deep ensemble methods and transfer
learning mechanisms.
</p>
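<p>A sketch of the two ingredients mentioned above: the pinball (quantile)
loss and one common non-crossing parametrization in which the network emits a
base quantile plus positive softplus increments; the paper's exact
architecture may differ, so this is a generic illustration.
</p>
<pre>
# Sketch of a pinball (quantile) loss and a non-crossing parametrization:
# the network emits a base quantile plus softplus increments, so predicted
# quantiles are monotone by construction. This is one common way to avoid
# quantile crossing; the paper's exact mechanism may differ.
import torch
import torch.nn.functional as F

def pinball_loss(pred, target, qs):
    # pred: (batch, n_quantiles), target: (batch, 1), qs: (n_quantiles,)
    err = target - pred
    return torch.maximum(qs * err, (qs - 1) * err).mean()

def monotone_quantiles(raw):
    # raw: (batch, n_quantiles) unconstrained network outputs
    base = raw[:, :1]
    steps = F.softplus(raw[:, 1:])           # strictly positive increments
    return torch.cat([base, base + torch.cumsum(steps, dim=1)], dim=1)

qs = torch.tensor([0.1, 0.5, 0.9])
raw = torch.randn(4, 3, requires_grad=True)
pred = monotone_quantiles(raw)               # guaranteed non-crossing
loss = pinball_loss(pred, torch.randn(4, 1), qs)
loss.backward()
print(pred)  # each row is non-decreasing across quantile levels
</pre>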
|
|
|
|
<p>The Sustainable Development Goals (SDGs) of the United Nations provide a
blueprint of a better future by 'leaving no one behind', and, to achieve the
SDGs by 2030, poor countries require immense volumes of development aid. In
this paper, we develop a causal machine learning framework for predicting
heterogeneous treatment effects of aid disbursements to inform effective aid
allocation. Specifically, our framework comprises three components: (i) a
balancing autoencoder that uses representation learning to embed
high-dimensional country characteristics while addressing treatment selection
bias; (ii) a counterfactual generator to compute counterfactual outcomes for
varying aid volumes to address small sample-size settings; and (iii) an
inference model that is used to predict heterogeneous treatment-response
curves. We demonstrate the effectiveness of our framework using data with
official development aid earmarked to end HIV/AIDS in 105 countries, amounting
to more than USD 5.2 billion. For this, we first show that our framework
successfully computes heterogeneous treatment-response curves using
semi-synthetic data. Then, we demonstrate our framework using real-world HIV
data. Our framework points to large opportunities for a more effective aid
allocation, suggesting that the total number of new HIV infections could be
reduced by up to 3.3% (~50,000 cases) compared to the current allocation
practice.
</p>
|
|
|
|
<p>Large-scale image datasets are often partially labeled, where only a few
categories' labels are known for each image. Assigning pseudo-labels to unknown
labels to gain additional training signals has become prevalent for training
deep classification models. However, some pseudo-labels are inevitably
incorrect, leading to a notable decline in the model classification
performance. In this paper, we propose a novel method called Category-wise
Fine-Tuning (CFT), aiming to reduce model inaccuracies caused by the wrong
pseudo-labels. In particular, CFT employs known labels without pseudo-labels to
fine-tune the logistic regressions of trained models individually to calibrate
each category's model predictions. A Genetic Algorithm, seldom used for training
deep models, is also utilized in CFT to maximize the classification performance
directly. CFT is applied to well-trained models, unlike most existing methods
that train models from scratch. Hence, CFT is general and compatible with
models trained with different methods and schemes, as demonstrated through
extensive experiments. CFT requires only a few seconds for each category for
calibration with consumer-grade GPUs. We achieve state-of-the-art results on
three benchmarking datasets, including the CheXpert chest X-ray competition
dataset (ensemble mAUC 93.33%, single model 91.82%), partially labeled MS-COCO
(average mAP 83.69%), and Open Image V3 (mAP 85.31%), outperforming the
previous bests by 0.28%, 2.21%, 2.50%, and 0.91%, respectively. The single
model on CheXpert has been officially evaluated by the competition server,
endorsing the correctness of the result. The outstanding results and
generalizability indicate that CFT could be substantial and prevalent for
classification model development. Code is available at:
https://github.com/maxium0526/category-wise-fine-tuning.
</p>
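<p>A simplified sketch of the category-wise idea: for each category, the
final logistic layer is refit on only the samples whose label for that
category is known, so no pseudo-labels enter the calibration step. For
brevity, a plain logistic refit stands in for the paper's Genetic Algorithm
that optimizes the evaluation metric directly; data shapes are placeholders.
</p>
<pre>
# Sketch of the category-wise fine-tuning idea: for each category, refit the
# final logistic layer on only the images whose label for that category is
# known. For brevity this uses a plain logistic refit; the paper additionally
# optimizes the metric directly with a Genetic Algorithm.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
feats = rng.normal(size=(500, 64))           # frozen backbone features
labels = rng.integers(0, 2, size=(500, 10)).astype(float)
known = rng.random((500, 10)) < 0.3          # only ~30% of labels are known

calibrated = []
for c in range(labels.shape[1]):
    mask = known[:, c]                       # use known labels only: no
    lr = LogisticRegression(max_iter=200)    # pseudo-labels enter this step
    lr.fit(feats[mask], labels[mask, c])
    calibrated.append(lr)

probs = np.column_stack([m.predict_proba(feats)[:, 1] for m in calibrated])
print(probs.shape)   # per-category calibrated predictions: (500, 10)
</pre>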
|
|
|
|
<p>This article bridges the gap between two topics used in sharing an encryption
key: (i) Key Consolidation, i.e., extracting two identical strings of bits from
two information sources with similarities (common randomness). (ii)
Quantum-safe Key Encapsulation by incorporating randomness in Public/Private
Key pairs. In the context of Key Consolidation, the proposed scheme adds to the
complexity Eve faces in extracting useful data from leaked information. In this
context, it is applied to the method proposed in [1] for establishing common
randomness from round-trip travel times in a packet data network. The proposed
method allows adapting the secrecy level to the amount of similarity in common
randomness. It can even encapsulate a Quantum-safe encryption key in the
extreme case that no common randomness is available. In the latter case, it is
shown that the proposed scheme offers improvements over the McEliece
cryptosystem, which currently forms the foundation for quantum-safe key
encapsulation.
</p>
<p>[1] A. K. Khandani, "Looping for Encryption Key Generation Over the Internet:
A New Frontier in Physical Layer Security," 2023 Biennial Symposium on
Communications (BSC), Montreal, QC, Canada, 2023, pp. 59-64
</p>
|
|
|
|
<p>In this paper we study the interactions between so-called fractional
relaxations of the integer programs (IPs) which encode homomorphism and
isomorphism of relational structures. We give a combinatorial characterization
of a certain natural linear programming (LP) relaxation of homomorphism in
terms of fractional isomorphism. As a result, we show that the families of
constraint satisfaction problems (CSPs) that are solvable by such linear
program are precisely those that are closed under an equivalence relation which
we call Weisfeiler-Leman invariance. We also generalize this result to the much
broader framework of Promise Valued Constraint Satisfaction Problems, which
brings together two well-studied extensions of the CSP framework. Finally, we
consider the hierarchies of increasingly tighter relaxations of the
homomorphism and isomorphism IPs obtained by applying the Sherali-Adams and
Weisfeiler-Leman methods respectively. We extend our combinatorial
characterization of the basic LP to higher levels of the Sherali-Adams
hierarchy, and we generalize a well-known logical characterization of the
Weisfeiler-Leman test from graphs to relational structures.
</p>
|
|
|
|
<p>Text generation is a compelling sub-field of natural language processing,
aiming to generate human-readable text from input words. In particular, the
decoder-only generative models, such as generative pre-trained transformer
(GPT), are widely used for text generation, with two major computational
stages: summarization and generation. Unlike the summarization stage, which can
process the input tokens in parallel, the generation stage is difficult to
accelerate due to its sequential generation of output tokens through iteration.
Moreover, each iteration requires reading a whole model with little data reuse
opportunity. Therefore, the workload of transformer-based text generation is
severely memory-bound, making external memory bandwidth the system bottleneck.
In this paper, we propose SAL-PIM, a subarray-level HBM-based
processing-in-memory (PIM) architecture for the end-to-end acceleration of
transformer-based text generation. The SAL-PIM architecture includes three
architectural features. First, the SAL-PIM architecture utilizes higher
internal bandwidth by integrating multiple subarray-level arithmetic logic
units with optimized data mapping schemes. Second, the SAL-PIM architecture
adopts LUT-based linear interpolation to perform complex non-linear functions
in PIM. Third, the SAL-PIM architecture accelerates end-to-end inference on PIM
in text generation. Furthermore, to validate the SAL-PIM architecture, we built
a cycle-accurate simulator and implemented SAL-PIM's logic units in 28-nm
CMOS technology. As a result, when the input size is from 32 to 128 and the
output size is from 1 to 256, SAL-PIM achieves a maximum of 4.72 times speedup
and an average of 1.83 times speedup for the text generation based on the GPT-2
medium model compared to the server-level GPU.
</p>
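<p>A small sketch of LUT-based linear interpolation, the technique SAL-PIM
uses to evaluate non-linear functions in memory; the table size, input range,
and choice of tanh are illustrative assumptions.
</p>
<pre>
# Sketch of LUT-based linear interpolation for a non-linear activation, the
# technique SAL-PIM uses to evaluate such functions in PIM. Table size and
# input range are illustrative assumptions.
import numpy as np

lo, hi, entries = -8.0, 8.0, 64
xs = np.linspace(lo, hi, entries)
lut = np.tanh(xs)                            # stored function samples

def lut_interp(x):
    """Piecewise-linear approximation of tanh from the LUT."""
    x = np.clip(x, lo, hi)
    pos = (x - lo) / (hi - lo) * (entries - 1)
    i = np.minimum(pos.astype(int), entries - 2)
    frac = pos - i
    return lut[i] * (1 - frac) + lut[i + 1] * frac

x = np.linspace(-6, 6, 1000)
err = np.abs(lut_interp(x) - np.tanh(x))
print(f"max abs error with {entries} entries: {err.max():.2e}")
</pre>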
|
|
|
|
<p>This paper presents the results of finetuning large language models (LLMs)
for the task of detecting vulnerabilities in source code. We leverage
WizardCoder, a recent improvement of the state-of-the-art LLM StarCoder, and
adapt it for vulnerability detection through further finetuning. To accelerate
training, we modify WizardCoder's training procedure and investigate optimal
training regimes. For the imbalanced dataset, which has many more negative
examples than positive ones, we also explore different techniques to improve
classification performance. The finetuned WizardCoder model achieves
improvements in ROC AUC and F1 measures on balanced and imbalanced
vulnerability datasets over a CodeBERT-like model, demonstrating the
effectiveness of adapting
pretrained LLMs for vulnerability detection in source code. The key
contributions are finetuning the state-of-the-art code LLM, WizardCoder,
increasing its training speed without harming performance, optimizing the
training procedure and regimes, handling class imbalance, and improving
performance on difficult vulnerability detection datasets. This demonstrates
the potential for transfer learning by finetuning large pretrained language
models for specialized source code analysis tasks.
</p>
|
|
|
|
<p>In this paper, we introduce two metrics, namely, age of actuation (AoA) and
age of actuated information (AoAI), within a discrete-time system model that
integrates data caching and energy harvesting (EH). AoA evaluates the
timeliness of actions irrespective of the age of the information, while AoAI
considers the freshness of the utilized data packet. We use Markov Chain
analysis to model the system's evolution. Furthermore, we employ
three-dimensional Markov Chain analysis to characterize the stationary
distributions for AoA and AoAI and calculate their average values. Our findings
from the analysis, validated by simulations, show that while AoAI consistently
decreases with increased data and energy packet arrival rates, AoA presents a
more complex behavior, with potential increases under conditions of limited
data or energy resources. These metrics relate to the semantics of information
and goal-oriented communications, since they consider the timeliness of
utilizing the information to perform an action.
</p>
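<p>As one possible reading of the model, the following hedged discrete-time
simulation uses Bernoulli data and energy arrivals, a one-packet cache, and a
unit battery; AoA resets at each actuation, while AoAI additionally counts the
age of the data packet actually used. It illustrates the two metrics, not the
paper's exact Markov chain.
</p>
<pre>
# Hedged discrete-time sketch: Bernoulli data/energy arrivals, a one-packet
# cache, and actuation whenever both data and energy are available. AoA
# resets at every actuation; AoAI also counts the age of the data packet
# actually used. This is an illustration of the metrics, not the exact model.
import numpy as np

rng = np.random.default_rng(0)
T, p_data, p_energy = 100000, 0.3, 0.4
data_age = None        # age of the cached data packet (None: cache empty)
energy = 0
aoa = aoai = 0
aoa_sum = aoai_sum = 0

for t in range(T):
    if rng.random() < p_data:
        data_age = 0                    # fresh packet replaces cached one
    if rng.random() < p_energy:
        energy = min(energy + 1, 1)     # unit battery
    if data_age is not None and energy > 0:
        energy -= 1
        aoa = 0                         # action performed now
        aoai = data_age                 # freshness of the utilized packet
    aoa_sum += aoa; aoai_sum += aoai
    aoa += 1; aoai += 1
    if data_age is not None:
        data_age += 1

print(f"avg AoA = {aoa_sum / T:.2f}, avg AoAI = {aoai_sum / T:.2f}")
</pre>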
|
|
|
|
<p>This paper belongs to a group of work in the intersection of symbolic
computation and group analysis aiming for the symbolic analysis of differential
equations. The goal is to extract important properties without finding the
explicit general solution. In this contribution, we introduce the algorithmic
verification of nonlinear superposition properties and its implementation.
More precisely, for a first-order system of nonlinear ordinary differential
equations with a polynomial right-hand side, we check whether the differential system
admits a general solution by means of a superposition rule and a certain number
of particular solutions. It is based on the theory of Newton polytopes and
associated symbolic computation. The developed method provides the basis for
the identification of nonlinear superpositions within a given system and for
the construction of numerical methods which preserve important algebraic
properties at the numerical level.
</p>
|
|
|
|
<p>It needs to be systematically investigated to what extent safety measures
evaluate the intended performance of Deep Neural Networks (DNNs) for critical
applications. Due to a lack of verification methods for high-dimensional DNNs,
a trade-off is needed between accepted performance and handling of
out-of-distribution (OOD) samples.
</p>
<p>This work evaluates rejecting outputs from semantic segmentation DNNs by
applying a Mahalanobis distance (MD) based on the most probable
class-conditional Gaussian distribution for the predicted class as an OOD
score. The evaluation covers three DNNs trained on the Cityscapes dataset and
tested on four automotive datasets, and finds that classification risk can be
drastically reduced at the cost of pixel coverage, even when applied to
unseen datasets. The applicability of our findings will support legitimizing
safety measures and motivate their usage when arguing for safe usage of DNNs in
automotive perception.
</p>
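<p>A compact sketch of the MD-based OOD score described above:
class-conditional Gaussians with a shared covariance are fit on training
features, and a prediction is rejected when its squared Mahalanobis distance
to the predicted class's Gaussian exceeds a threshold chosen for a coverage
target; feature dimensions and the threshold are placeholders.
</p>
<pre>
# Sketch of the Mahalanobis-distance OOD score used above: fit class-
# conditional Gaussians (shared covariance) on training features, then score
# a prediction by its distance to the predicted class's Gaussian and reject
# pixels/samples whose score exceeds a threshold.
import numpy as np

rng = np.random.default_rng(0)
feats = rng.normal(size=(1000, 16))              # training features
classes = rng.integers(0, 3, 1000)               # their predicted classes

means = np.stack([feats[classes == c].mean(0) for c in range(3)])
cov = np.cov(feats - means[classes], rowvar=False)  # shared covariance
cov_inv = np.linalg.inv(cov + 1e-6 * np.eye(16))

def md_score(f, pred_class):
    d = f - means[pred_class]
    return float(d @ cov_inv @ d)                # squared Mahalanobis distance

threshold = 30.0                                 # tuned for a coverage target
f_new = rng.normal(size=16) * 3                  # suspiciously scaled feature
print("reject" if md_score(f_new, 0) > threshold else "accept")
</pre>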
|
|
|
|
<p>Extremely large aperture array (ELAA) is anticipated to serve as a pivotal
feature of future multiple-input multiple-output (MIMO) systems in 6G.
Near-field (NF) fading channel models are essential for reliable link-level
simulation and ELAA system design. In this article, we propose a framework
designed to generate NF fading channels for both communication and integrated
sensing and communication (ISAC) applications. The framework allows a mix of
line-of-sight (LoS) and non-LoS (NLoS) links. It also considers a spherical
wave model and spatially non-stationary shadow fading. Based on this framework, we
propose a three-dimensional (3D) fading channel model for ELAA systems deployed
with a uniform rectangular array (URA). It can capture the impact of a sensing
object for ISAC applications. Moreover, all parameters involved in the
framework are based on specifications or measurements from the 3rd Generation
Partnership Project (3GPP) documents. Therefore, the proposed framework and
channel model have the potential to contribute to the standard in various
aspects, including ISAC, extra-large (XL-) MIMO, and reconfigurable intelligent
surface (RIS) aided MIMO systems. Finally, future directions for ELAA are
presented, including not only NF channel modeling but also the design of
next-generation transceivers.
</p>
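<p>To illustrate the spherical-wave modeling at the heart of such NF
channels, the sketch below contrasts the exact per-element near-field
response with the planar-wave far-field response; a uniform linear array and
the chosen carrier are simplifying assumptions (the proposed model targets a
URA).
</p>
<pre>
# Sketch of the spherical-wave (near-field) array response the framework is
# built on, contrasted with the planar-wave far-field response. A ULA is used
# for brevity; the proposed model targets a URA.
import numpy as np

c, f = 3e8, 30e9                      # speed of light, 30 GHz carrier
lam = c / f
N, d = 256, lam / 2                   # elements and half-wavelength spacing
pos = (np.arange(N) - (N - 1) / 2) * d

def nf_response(r, theta):
    """Spherical wave: exact per-element distances to a source at (r, theta)."""
    src = np.array([r * np.sin(theta), r * np.cos(theta)])
    dist = np.hypot(src[0] - pos, src[1])
    return np.exp(-2j * np.pi * (dist - r) / lam)

def ff_response(theta):
    """Planar wave: linear phase across the aperture."""
    return np.exp(-2j * np.pi * pos * np.sin(theta) / lam)

theta, r = np.deg2rad(20), 5.0        # 5 m: inside the Rayleigh distance here
corr = np.abs(nf_response(r, theta) @ ff_response(theta).conj()) / N
print(f"|&lt;near-field, far-field&gt;| / N = {corr:.3f}")  # &lt;&lt; 1: models diverge
</pre>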
|
|
|
|
<p>Subgraph matching has garnered increasing attention for its diverse
real-world applications. Given the dynamic nature of real-world graphs,
addressing evolving scenarios without incurring prohibitive overheads has been
a focus of research. However, existing approaches for dynamic subgraph matching
often proceed serially, retrieving incremental matches for each updated edge
individually. This approach falls short when handling batch data updates,
leading to a decrease in system throughput. Leveraging the parallel processing
power of GPUs, which can execute a massive number of cores simultaneously, has
been widely recognized for performance acceleration in various domains.
Surprisingly, systematic exploration of subgraph matching in the context of
batch-dynamic graphs, particularly on a GPU platform, remains untouched. In
this paper, we bridge this gap by introducing an efficient framework, GAMMA
(GPU-Accelerated Batch-Dynamic Subgraph Matching). Our approach features a
DFS-based warp-centric batch-dynamic subgraph matching algorithm. To ensure
load balance in the DFS-based search, we propose warp-level work stealing via
shared memory. Additionally, we introduce coalesced search to reduce redundant
computations. Comprehensive experiments demonstrate the superior performance of
GAMMA. Compared to state-of-the-art algorithms, GAMMA showcases a performance
improvement of up to hundreds of times.
</p>
|
|
|
|
<p>Metamorphic testing (MT) has proven to be a successful solution to automating
testing and addressing the oracle problem. However, it entails manually
deriving metamorphic relations (MRs) and converting them into an executable
form; these steps are time-consuming and may prevent the adoption of MT. In
this paper, we propose an approach for automatically deriving executable MRs
(EMRs) from requirements using large language models (LLMs). Instead of merely
asking the LLM to produce EMRs, our approach relies on a few-shot prompting
strategy to instruct the LLM to perform activities in the MT process, by
providing requirements and API specifications, as one would do with software
engineers. To assess the feasibility of our approach, we conducted a
questionnaire-based survey in collaboration with Siemens Industry Software,
focusing on four of their software applications. Additionally, we evaluated the
accuracy of the generated EMRs for a web application. The outcomes of our study
are highly promising, as they demonstrate the capability of our approach to
generate MRs and EMRs that are both comprehensible and pertinent for testing
purposes.
</p>
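<p>To make the notion of an executable MR concrete, here is the classic
textbook example for the sine function, where the relation removes the need
for an exact output oracle; the EMRs generated in the study target Siemens
software APIs instead.
</p>
<pre>
# A classic executable metamorphic relation (EMR), to make the notion
# concrete: for sin, the relation sin(pi - x) == sin(x) needs no oracle for
# the exact output value. The EMRs generated in the study target Siemens
# software APIs instead.
import math

def emr_sine(x: float, tol: float = 1e-12) -> bool:
    """Metamorphic relation: sin(pi - x) must equal sin(x)."""
    return abs(math.sin(math.pi - x) - math.sin(x)) <= tol

# A tiny test harness exercising the relation on many follow-up inputs
assert all(emr_sine(x / 7.0) for x in range(-100, 100))
print("metamorphic relation holds on all sampled inputs")
</pre>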
|
|
|
|
<p>Moving object segmentation (MOS) provides a reliable solution for detecting
traffic participants and thus is of great interest in the autonomous driving
field. Dynamic capture is always critical in the MOS problem. Previous methods
capture motion features from the range images directly. Differently, we argue
that the residual maps provide greater potential for motion information, while
range images contain rich semantic guidance. Based on this intuition, we
propose MF-MOS, a novel motion-focused model with a dual-branch structure for
LiDAR moving object segmentation. Specifically, we decouple the spatial-temporal
information by capturing the motion from residual maps and generating semantic
features from range images, which are used as movable object guidance for the
motion branch. Our straightforward yet distinctive solution can make the most
use of both range images and residual maps, thus greatly improving the
performance of the LiDAR-based MOS task. Remarkably, our MF-MOS achieved a
leading IoU of 76.7% on the MOS leaderboard of the SemanticKITTI dataset upon
submission, demonstrating the current state-of-the-art performance. The
implementation of our MF-MOS has been released at
https://github.com/SCNU-RISLAB/MF-MOS.
</p>
|
|
|
|
<p>Developing an automatic signature verification system is challenging and
demands a large number of training samples. This is why synthetic handwriting
generation is an emerging topic in document image analysis. Some handwriting
synthesizers use the motor equivalence model, the well-established hypothesis
from neuroscience, which analyses how a human being accomplishes movement.
Specifically, a motor equivalence model divides human actions into two steps:
1) the effector independent step at cognitive level and 2) the effector
dependent step at motor level. In fact, recent work reports the successful
application to Western scripts of a handwriting synthesizer, based on this
theory. This paper aims to adapt this scheme for the generation of synthetic
signatures in two Indic scripts, Bengali (Bangla), and Devanagari (Hindi). For
this purpose, we use two different online and offline databases for both
Bengali and Devanagari signatures. This paper reports an effective synthesizer
for static and dynamic signatures written in Devanagari or Bengali scripts. We
obtain promising results with artificially generated signatures in terms of
appearance and performance when we compare the results with those for real
signatures.
</p>
|
|
|
|
<p>Deep learning models have demonstrated promising results in estimating
treatment effects (TEE). However, most of them overlook the variations in
treatment outcomes among subgroups with distinct characteristics. This
limitation hinders their ability to provide accurate estimations and treatment
recommendations for specific subgroups. In this study, we introduce a novel
neural network-based framework, named SubgroupTE, which incorporates subgroup
identification and treatment effect estimation. SubgroupTE identifies diverse
subgroups and simultaneously estimates treatment effects for each subgroup,
improving the treatment effect estimation by considering the heterogeneity of
treatment responses. Comparative experiments on synthetic data show that
SubgroupTE outperforms existing models in treatment effect estimation.
Furthermore, experiments on a real-world dataset related to opioid use disorder
(OUD) demonstrate the potential of our approach to enhance personalized
treatment recommendations for OUD patients.
</p>
|
|
|
|
<p>We investigate the prospect of reconstructing the ``cosmic distance ladder''
of the Universe using a novel deep learning framework called LADDER - Learning
Algorithm for Deep Distance Estimation and Reconstruction. LADDER is trained on
the apparent magnitude data from the Pantheon Type Ia supernovae compilation,
incorporating the full covariance information among data points, to produce
predictions along with corresponding errors. After employing several validation
tests with a number of deep learning models, we pick LADDER as the best
performing one. We then demonstrate applications of our method in the
cosmological context, that include serving as a model-independent tool for
consistency checks for other datasets like baryon acoustic oscillations,
calibration of high-redshift datasets such as gamma ray bursts, use as a
model-independent mock catalog generator for future probes, etc. Our analysis
advocates for interesting yet cautious consideration of machine learning
applications in these contexts.
</p>
|
|
|
|
<p>One of the most critical aspects of multimodal Reinforcement Learning (RL) is
the effective integration of different observation modalities. Having robust
and accurate representations derived from these modalities is key to enhancing
the robustness and sample efficiency of RL algorithms. However, learning
representations in RL settings for visuotactile data poses significant
challenges, particularly due to the high dimensionality of the data and the
complexity involved in correlating visual and tactile inputs with the dynamic
environment and task objectives. To address these challenges, we propose
Multimodal Contrastive Unsupervised Reinforcement Learning (M2CURL). Our
approach employs a novel multimodal self-supervised learning technique that
learns efficient representations and contributes to faster convergence of RL
algorithms. Our method is agnostic to the RL algorithm, thus enabling its
integration with any available RL algorithm. We evaluate M2CURL on the Tactile
Gym 2 simulator and we show that it significantly enhances the learning
efficiency in different manipulation tasks. This is evidenced by faster
convergence rates and higher cumulative rewards per episode, compared to
standard RL algorithms without our representation learning approach.
</p>
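<p>A sketch of an InfoNCE-style contrastive objective aligning visual and
tactile embeddings of the same timestep against in-batch negatives; M2CURL's
full self-supervised objective may combine several such terms, so this is
illustrative only.
</p>
<pre>
# Sketch of an InfoNCE-style contrastive loss aligning visual and tactile
# embeddings from the same timestep (positives) against other samples in the
# batch (negatives). M2CURL's full objective may combine several such terms.
import torch
import torch.nn.functional as F

def infonce(z_vis, z_tac, temperature=0.1):
    z_vis = F.normalize(z_vis, dim=1)
    z_tac = F.normalize(z_tac, dim=1)
    logits = z_vis @ z_tac.t() / temperature     # (B, B) similarity matrix
    targets = torch.arange(z_vis.size(0))        # diagonal pairs match
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

z_v = torch.randn(32, 128)   # visual encoder outputs (placeholder)
z_t = torch.randn(32, 128)   # tactile encoder outputs (placeholder)
print(infonce(z_v, z_t))
</pre>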
|
|
|
|
<p>Deep subspace clustering (DSC) networks based on the self-expressive model
learn a representation matrix, often implemented as a fully connected
network, in the embedded space. After learning is finished, the
representation matrix is used by a spectral clustering module to assign
labels to clusters. However, such an approach ignores complementary
information that exists in other layers of the encoder (including the input
data themselves). Herein, we apply a selected linear subspace clustering
algorithm to learn representation matrices from the representations learned
by all layers of the encoder network, including the input data. Afterward, we
learn a multilayer graph that, in a multi-view-like manner, integrates
information from the graph Laplacians of all used layers. This further
improves the performance of the selected DSC network. Furthermore, we also
provide a formulation of our approach to cluster out-of-sample/test data
points. We validate the proposed approach on four well-known datasets with
two DSC networks as baseline models. In almost all cases, the proposed
approach achieved a statistically significant improvement in three
performance metrics. MATLAB code of the proposed algorithm is posted on
https://github.com/lovro-sinda/MLG-DSC.
</p>
|
|
|
|
<p>Kernel methods are applied to many problems in pattern recognition, including
subspace clustering (SC). That way, nonlinear problems in the input data space
become linear in the mapped high-dimensional feature space. Thereby,
computationally tractable nonlinear algorithms are enabled through implicit
mapping by virtue of the kernel trick. However, kernelization of linear
algorithms is possible only if the square of the Frobenius norm of the error
term is used in the related optimization problem. That, however, implies a
normal distribution of the error, which is not appropriate for non-Gaussian
errors such as gross sparse corruptions that are modeled by the
$\ell_1$-norm. Herein, to the best of our knowledge, we propose for the first
time a robust kernel sparse SC (RKSSC) algorithm for data with gross sparse
corruptions. The concept, in principle, can be applied to other SC algorithms
to achieve robustness to the presence of this type of corruption. We
validated the proposed approach on two well-known datasets with the linear
robust SSC algorithm as a baseline model. According to the Wilcoxon test, the
clustering performance obtained by the RKSSC algorithm is statistically
significantly better than the corresponding performance obtained by the
robust SSC algorithm. MATLAB code of the proposed RKSSC algorithm is posted on
https://github.com/ikopriva/RKSSC.
</p>
|
|
|
|
<p>The structure of data organization is widely recognized as having a
substantial influence on the efficacy of machine learning algorithms,
particularly in binary classification tasks. Our research provides a
theoretical framework suggesting that the maximum potential of binary
classifiers on a given dataset is primarily constrained by the inherent
qualities of the data. Through both theoretical reasoning and empirical
examination, we employed standard objective functions, evaluative metrics, and
binary classifiers to arrive at two principal conclusions. Firstly, we show
that the theoretical upper bound of binary classification performance on
actual datasets can be attained. This upper bound represents a
calculable equilibrium between the learning loss and the metric of evaluation.
Secondly, we have computed the precise upper bounds for three commonly used
evaluation metrics, uncovering a fundamental uniformity with our overarching
thesis: the upper bound is intricately linked to the dataset's characteristics,
independent of the classifier in use. Additionally, our subsequent analysis
uncovers a detailed relationship between the upper limit of performance and the
level of class overlap within the binary classification data. This relationship
is instrumental for pinpointing the most effective feature subsets for use in
feature engineering.
</p>
|
|
|
|
<p>This paper studies Bayesian optimization with noise-free observations. We
introduce new algorithms rooted in scattered data approximation that rely on a
random exploration step to ensure that the fill-distance of query points decays
at a near-optimal rate. Our algorithms retain the ease of implementation of the
classical GP-UCB algorithm and satisfy cumulative regret bounds that nearly
match those conjectured in <a href="/abs/2002.05096">arXiv:2002.05096</a>, hence solving a COLT open problem.
Furthermore, the new algorithms outperform GP-UCB and other popular Bayesian
optimization strategies in several examples.
</p>
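<p>A hedged sketch of the overall recipe: a GP-UCB loop in which a fraction
of queries is drawn uniformly at random so that the fill distance of the
query points keeps decaying; the acquisition details, exploration rate, and
toy objective below are assumptions, not the paper's exact algorithms.
</p>
<pre>
# Sketch of a GP-UCB loop with an occasional uniformly random query, the
# random-exploration step that keeps the fill distance of query points
# decaying. The acquisition details of the paper's algorithms may differ.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
f = lambda x: -np.sin(3 * x) - x**2 + 0.7 * x        # noise-free objective
X = rng.uniform(-1, 2, (2, 1)); y = f(X).ravel()
grid = np.linspace(-1, 2, 500).reshape(-1, 1)

for t in range(30):
    gp = GaussianProcessRegressor(RBF(0.5), alpha=1e-10).fit(X, y)
    if rng.random() < 0.2:                           # random exploration step
        x_next = rng.uniform(-1, 2, (1, 1))
    else:                                            # standard UCB step
        mu, sd = gp.predict(grid, return_std=True)
        x_next = grid[[np.argmax(mu + 2.0 * sd)]]
    X = np.vstack([X, x_next]); y = np.append(y, f(x_next).ravel())

print("best query:", X[np.argmax(y)].item(), "value:", y.max())
</pre>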
|
|
|
|
<p>Recently, there has been increasing concern about the vulnerability of deep
neural network (DNN)-based synthetic aperture radar (SAR) automatic target
recognition (ATR) to adversarial attacks, where a DNN could be easily deceived
by clean input with imperceptible but aggressive perturbations. This paper
studies the synthetic-to-measured (S2M) transfer setting, where an attacker
generates adversarial perturbation based solely on synthetic data and transfers
it against victim models trained with measured data. Compared with the current
measured-to-measured (M2M) transfer setting, our approach does not need direct
access to the victim model or the measured SAR data. We also propose the
transferability estimation attack (TEA) to uncover the adversarial risks in
this more challenging and practical scenario. The TEA makes full use of the
limited similarity between the synthetic and measured data pairs for blind
estimation and optimization of S2M transferability, leading to feasible
surrogate model enhancement without mastering the victim model and data.
Comprehensive evaluations based on the publicly available synthetic and
measured paired labeled experiment (SAMPLE) dataset demonstrate that the TEA
outperforms state-of-the-art methods and can significantly enhance various
attack algorithms in computer vision and remote sensing applications. Codes and
data are available at https://github.com/scenarri/S2M-TEA.
</p>
|
|
|
|
<p>Clarification requests are a mechanism to help solve communication problems,
e.g. due to ambiguity or underspecification, in instruction-following
interactions. Despite their importance, even skilful models struggle with
producing or interpreting such repair acts. In this work, we test three
hypotheses concerning the effects of action taking as an auxiliary task in
modelling Instruction Clarification Request (iCR) policies. Contrary to
initial expectations, we conclude that its
contribution to learning an iCR policy is limited, but some information can
still be extracted from prediction uncertainty. We present further evidence
that even well-motivated, Transformer-based models fail to learn good policies
for when to ask iCRs, while the task of determining what to
ask about can be more successfully modelled. Considering the implications of
these findings, we further discuss the shortcomings of the data-driven paradigm
for learning meta-communication acts.
</p>
|
|
|
|
<p>Recently, deep learning techniques have been gradually replacing
traditional statistical and machine learning models as the first choice for price
forecasting tasks. In this paper, we leverage probabilistic deep learning for
inferring the volatility index VIX. We employ the probabilistic counterpart of
WaveNet, Temporal Convolutional Network (TCN), and Transformers. We show that
TCN outperforms all models with an RMSE around 0.189. In addition, it has been
well known that modern neural networks provide inaccurate uncertainty
estimates. To address this problem, we use standard deviation scaling to
calibrate the networks. Furthermore, we find that MNF with a Gaussian prior
outperforms the Reparameterization Trick and Flipout models in terms of
precision and uncertainty predictions. Finally, we show that MNF with Cauchy
and LogUniform prior distributions yields well-calibrated TCN and WaveNet
networks, with the former best inferring the VIX values.
</p>
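<p>A minimal sketch of the standard deviation scaling step: a single factor,
fitted on validation z-scores, rescales every predicted sigma; the values
below are synthetic placeholders, not the paper's networks.
</p>
<pre>
# Sketch of standard deviation scaling: predicted sigmas are rescaled by a
# single factor fitted on a validation set so that z-scores have unit
# variance. This is the post-hoc calibration step referred to above.
import numpy as np

rng = np.random.default_rng(0)
y_val = rng.normal(size=500)
mu_val = y_val + 0.1 * rng.normal(size=500)     # network means (placeholder)
sigma_val = np.full(500, 0.02)                  # overconfident network sigmas

z = (y_val - mu_val) / sigma_val
s = z.std()                                     # scaling factor
print(f"scale factor s = {s:.2f}")

# At test time every predicted sigma is multiplied by s:
sigma_test_calibrated = sigma_val * s
z_cal = (y_val - mu_val) / sigma_test_calibrated
print(f"z-score std after calibration: {z_cal.std():.2f}")   # ~1.0
</pre>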
|
|
|
|
<p>Retrieval-Augmented Generation (RAG) is a technique that enhances the
capabilities of large language models (LLMs) by incorporating external
knowledge sources. This method addresses common LLM limitations, including
outdated information and the tendency to produce inaccurate "hallucinated"
content. However, the evaluation of RAG systems is challenging, as existing
benchmarks are limited in scope and diversity. Most of the current benchmarks
predominantly assess question-answering applications, overlooking the broader
spectrum of situations where RAG could prove advantageous. Moreover, they only
evaluate the performance of the LLM component of the RAG pipeline in the
experiments, and neglect the influence of the retrieval component and the
external knowledge database. To address these issues, this paper constructs a
large-scale and more comprehensive benchmark, and evaluates all the components
of RAG systems in various RAG application scenarios. Specifically, we have
categorized the range of RAG applications into four distinct types: Create,
Read, Update, and Delete (CRUD), each representing a unique use case. "Create"
refers to scenarios requiring the generation of original, varied content.
"Read" involves responding to intricate questions in knowledge-intensive
situations. "Update" focuses on revising and rectifying inaccuracies or
inconsistencies in pre-existing texts. "Delete" pertains to the task of
summarizing extensive texts into more concise forms. For each of these CRUD
categories, we have developed comprehensive datasets to evaluate the
performance of RAG systems. We also analyze the effects of various components
of the RAG system, such as the retriever, the context length, the knowledge
base construction, and the LLM. Finally, we provide useful insights for
optimizing the RAG technology for different scenarios.
</p>
|
|
|
|
<p>Multi-Agent Path Finding (MAPF) involves determining paths for multiple
agents to travel simultaneously through a shared area toward particular goal
locations. This problem is computationally complex, especially when dealing
with large numbers of agents, as is common in realistic applications like
autonomous vehicle coordination. Finding an optimal solution is often
computationally infeasible, making the use of approximate algorithms essential.
Adding to the complexity, agents might act in a self-interested and strategic
way, possibly misrepresenting their goals to the MAPF algorithm if it benefits
them. Although the field of mechanism design offers tools to align incentives,
using these tools without careful consideration can fail when only having
access to approximately optimal outcomes. Since approximations are crucial for
scalable MAPF algorithms, this poses a significant challenge. In this work, we
introduce the problem of scalable mechanism design for MAPF and propose three
strategyproof mechanisms, two of which even use approximate MAPF algorithms. We
test our mechanisms on realistic MAPF domains with problem sizes ranging from
dozens to hundreds of agents. Our findings indicate that they improve welfare
beyond a simple baseline.
</p>
|
|
|
|
<p>The emergence of tools based on artificial intelligence has also led to the
need of producing explanations which are understandable by a human being. In
some approaches, the system is not transparent (often referred to as a "black
box"), making it difficult to generate appropriate explanations. In this work,
though, we consider probabilistic logic programming, a combination of logic
programming (for knowledge representation) and probability (to model
uncertainty). In this setting, one can say that models are interpretable, which
eases their understanding. However, given a particular query, the usual notion of
"explanation" is associated with a set of choices, one for each random variable
of the model. Unfortunately, this set does not have a causal structure and, in
fact, some of the choices are actually irrelevant to the considered query. In
order to overcome these shortcomings, we present an approach to explaining
explanations which is based on the definition of a query-driven inference
mechanism for probabilistic logic programs.
</p>
|
|
|
|
<p>Movable antenna (MA) provides an innovative way to arrange antennas that can
contribute to improved signal quality and more effective interference
management. This method is especially beneficial for full-duplex (FD) wireless,
which struggles with self-interference (SI) that usually overpowers the desired
incoming signals. By dynamically repositioning transmit/receive antennas, we
can mitigate the SI and enhance the reception of incoming signals. Thus, this
paper proposes a novel MA-enabled point-to-point FD wireless system and
formulates the minimum achievable rate of two FD terminals. To maximize the
minimum achievable rate and determine the near-optimal positions of the MAs, we
introduce a solution based on projected particle swarm optimization (PPSO),
which can circumvent common suboptimal positioning issues. Moreover, numerical
results reveal that the PPSO method leads to a better performance compared to
the conventional alternating position optimization (APO). The results also
demonstrate that an MA-enabled FD system outperforms the one using
fixed-position antennas (FPAs).
</p>
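<p>A sketch of projected PSO: a vanilla swarm update whose candidate antenna
positions are projected (here, clipped) back into the feasible region at
every iteration; the objective below is a toy placeholder for the
minimum-achievable-rate expression, and all hyperparameters are assumptions.
</p>
<pre>
# Sketch of projected particle swarm optimization (PPSO): a vanilla PSO whose
# position update is followed by a projection onto the feasible antenna
# region. The objective below is a toy placeholder for the min-rate function.
import numpy as np

rng = np.random.default_rng(0)
n_particles, dim, iters = 30, 2, 200
lo, hi = 0.0, 4.0                           # feasible antenna region (metres)

def objective(p):                           # placeholder, to be maximized
    return -np.sum((p - 2.7) ** 2, axis=-1)

x = rng.uniform(lo, hi, (n_particles, dim))
v = np.zeros_like(x)
pbest, pbest_val = x.copy(), objective(x)
gbest = pbest[np.argmax(pbest_val)]

for _ in range(iters):
    r1, r2 = rng.random((2, n_particles, dim))
    v = 0.7 * v + 1.5 * r1 * (pbest - x) + 1.5 * r2 * (gbest - x)
    x = np.clip(x + v, lo, hi)              # projection onto feasible region
    val = objective(x)
    improved = val > pbest_val
    pbest[improved], pbest_val[improved] = x[improved], val[improved]
    gbest = pbest[np.argmax(pbest_val)]

print("near-optimal MA position:", gbest)   # ~[2.7, 2.7] for the toy objective
</pre>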
|
|
|
|
<p>As computer vision continues to advance and finds widespread applications
across various domains, the need for interpretability in deep learning models
becomes paramount. Existing methods often resort to post-hoc techniques or
prototypes to explain the decision-making process, which can be indirect and
lack intrinsic illustration. In this research, we introduce ViTree, a novel
approach for fine-grained visual categorization that combines the popular
vision transformer as a feature extraction backbone with neural decision trees.
By traversing the tree paths, ViTree effectively selects patches from
transformer-processed features to highlight informative local regions, thereby
refining representations in a step-wise manner. Unlike previous tree-based
models that rely on soft distributions or ensembles of paths, ViTree selects a
single tree path, offering a clearer and simpler decision-making process. This
patch and path selectivity enhances model interpretability of ViTree, enabling
better insights into the model's inner workings. Remarkably, extensive
experimentation validates that this streamlined approach surpasses various
strong competitors and achieves state-of-the-art performance while maintaining
exceptional interpretability, as demonstrated through multi-perspective analyses. Code
can be found at https://github.com/SJTU-DeepVisionLab/ViTree.
</p>
|
|
|
|
<p>Deep learning for tabular data has garnered increasing attention in recent
years, yet employing deep models for structured data remains challenging. While
these models excel with unstructured data, their efficacy with structured data
has been limited. Recent research has introduced retrieval-augmented models to
address this gap, demonstrating promising results in supervised tasks such as
classification and regression. In this work, we investigate using
retrieval-augmented models for anomaly detection on tabular data. We propose a
reconstruction-based approach in which a transformer model learns to
reconstruct masked features of \textit{normal} samples. We test the
effectiveness of KNN-based and attention-based modules to select relevant
samples to help in the reconstruction process of the target sample. Our
experiments on a benchmark of 31 tabular datasets reveal that augmenting this
reconstruction-based anomaly detection (AD) method with non-parametric
relationships via retrieval modules may significantly boost performance.
</p>
|
|
|
|
<p>We present BlockFusion, a diffusion-based model that generates 3D scenes as
unit blocks and seamlessly incorporates new blocks to extend the scene.
BlockFusion is trained using datasets of 3D blocks that are randomly cropped
from complete 3D scene meshes. Through per-block fitting, all training blocks
are converted into hybrid neural fields: a tri-plane containing the
geometry features, followed by a Multi-layer Perceptron (MLP) for decoding the
signed distance values. A variational auto-encoder is employed to compress the
tri-planes into the latent tri-plane space, on which the denoising diffusion
process is performed. Diffusion applied to the latent representations allows
for high-quality and diverse 3D scene generation. To expand a scene during
generation, one needs only to append empty blocks to overlap with the current
scene and extrapolate existing latent tri-planes to populate new blocks. The
extrapolation is done by conditioning the generation process with the feature
samples from the overlapping tri-planes during the denoising iterations. Latent
tri-plane extrapolation produces semantically and geometrically meaningful
transitions that harmoniously blend with the existing scene. A 2D layout
conditioning mechanism is used to control the placement and arrangement of
scene elements. Experimental results indicate that BlockFusion is capable of
generating diverse, geometrically consistent and unbounded large 3D scenes with
unprecedented high-quality shapes in both indoor and outdoor scenarios.
</p>
|
|
|
|
<p>Finding obstacle-free paths in unknown environments is a major navigation
issue for visually impaired people and autonomous robots. Previous works
focus on obstacle avoidance; however, they do not have a general view of the
environment they are moving in. New devices based on computer vision systems
can help impaired people to overcome the difficulties of navigating in
unknown environments in safe conditions. In this work, a combination of
sensors and algorithms is proposed that can lead to the building of a
navigation system for visually impaired people. Building on traditional
systems that use RGB-D cameras for obstacle avoidance, the information of a
fish-eye camera is included and combined, which gives a better understanding
of the user's surroundings. The combination gives the system robustness and
reliability, as well as a wide field of view that allows much information to
be obtained from the environment. This combination of sensors is inspired by
human vision, where the
center of the retina (fovea) provides more accurate information than the
periphery, where humans have a wider field of view. The proposed system is
mounted on a wearable device that provides the obstacle-free zones of the
scene, allowing the planning of trajectories for people guidance.
</p>
|
|
|
|
<p>We leverage the Gibbs inequality and its natural generalization to R\'enyi
entropies to derive closed-form parametric expressions of the optimal lower
bounds of $\rho$th-order guessing entropy (guessing moment) of a secret taking
values on a finite set, in terms of the R\'enyi-Arimoto $\alpha$-entropy. This
is carried out in a non-asymptotic regime where side information may be
available. The resulting bounds yield a theoretical solution to a fundamental
problem in side-channel analysis: Ensure that an adversary will not gain much
guessing advantage when the leakage information is sufficiently weakened by
proper countermeasures in a given cryptographic implementation. Practical
evaluations for classical leakage models show that the proposed bounds greatly
improve on previous ones for analyzing the capability of an adversary to perform
side-channel attacks.
</p>
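<p>For concreteness, a short sketch of the two quantities the bounds relate:
the $\rho$th-order guessing moment under the optimal
(probability-decreasing) guessing order and the R\'enyi $\alpha$-entropy; the
closed-form bounds themselves are not reproduced, and the distribution is a
toy placeholder.
</p>
<pre>
# Sketch of the two quantities the bounds relate: the rho-th order guessing
# moment (optimal guessing order = decreasing probability) and the
# Renyi alpha-entropy of the secret. The closed-form bounds themselves are
# not reproduced here.
import numpy as np

def guessing_moment(p, rho=1.0):
    p = np.sort(p)[::-1]                       # guess most likely first
    return np.sum((np.arange(1, len(p) + 1) ** rho) * p)

def renyi_entropy(p, alpha):
    if alpha == 1.0:
        return -np.sum(p * np.log2(p))         # Shannon limit of the family
    return np.log2(np.sum(p ** alpha)) / (1 - alpha)

p = np.array([0.5, 0.25, 0.125, 0.125])        # toy secret distribution
print(f"G_1 = {guessing_moment(p, 1.0):.3f} expected guesses")
print(f"H_2 = {renyi_entropy(p, 2.0):.3f} bits (collision entropy)")
</pre>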
|
|
|
|
<p>In this work we present a novel approach for 3D layout recovery of indoor
environments using a non-central acquisition system. From a non-central
panorama, full and scaled 3D lines can be independently recovered by geometry
reasoning without geometric nor scale assumptions. However, their sensitivity
to noise and complex geometric modeling has led these panoramas being little
investigated. Our new pipeline aims to extract the boundaries of the structural
lines of an indoor environment with a neural network and exploit the properties
of non-central projection systems in a new geometrical processing to recover a scaled 3D layout. The results of our experiments show that we improve
state-of-the-art methods for layout reconstruction and line extraction in
non-central projection systems. We completely solve the problem in Manhattan
and Atlanta environments, handling occlusions and retrieving the metric scale
of the room without extra measurements. To the best of the authors' knowledge, our approach is the first work to use deep learning on non-central panoramas and to recover scaled layouts from single panoramas.
</p>
|
|
|
|
<p>In data exploration, executing complex non-aggregate queries over large
databases can be time-consuming. Our paper introduces a novel approach to
address this challenge, focusing on finding an optimized subset of data,
referred to as the approximation set, for query execution. The goal is to
maximize query result quality while minimizing execution time. We formalize
this problem as Approximate Non-Aggregates Query Processing (ANAQP) and
establish its NP-completeness. To tackle this, we propose an approximate
solution using advanced Reinforcement Learning architecture, termed ASQP-RL.
This approach overcomes challenges related to the large action space and the
need for generalization beyond a known query workload. Experimental results on
two benchmarks demonstrate the superior performance of ASQP-RL, outperforming
baselines by 30% in accuracy and achieving efficiency gains of 10-35X. Our
research sheds light on the potential of reinforcement learning techniques for
advancing data management tasks.
</p>
|
|
|
|
<p>Omnidirectional and 360{\deg} images are becoming widespread in industry and
in consumer society, causing omnidirectional computer vision to gain attention.
Their wide field of view allows the gathering of a great amount of information
about the environment from only an image. However, the distortion of these
images requires the development of specific algorithms for their treatment and
interpretation. Moreover, a large number of images is essential for the correct training of learning-based computer vision algorithms. In this paper, we
present a tool for generating datasets of omnidirectional images with semantic
and depth information. These images are synthesized from a set of captures that
are acquired in a realistic virtual environment for Unreal Engine 4 through an
interface plugin. We gather a variety of well-known projection models such as
equirectangular and cylindrical panoramas, different fish-eye lenses,
catadioptric systems, and empiric models. Furthermore, we include in our tool photorealistic non-central projection systems such as non-central panoramas and non-central catadioptric systems. As far as we know, this is the first reported
tool for generating photorealistic non-central images in the literature.
Moreover, since the omnidirectional images are made virtually, we provide
pixel-wise information about semantics and depth as well as perfect knowledge
of the calibration parameters of the cameras. This allows the creation of
ground-truth information with pixel precision for training learning algorithms
and testing 3D vision approaches. To validate the proposed tool, different computer vision algorithms are tested, such as line extraction from dioptric and catadioptric central images, 3D layout recovery and SLAM using equirectangular panoramas, and 3D reconstruction from non-central panoramas.
</p>
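<p>As a minimal illustration of the projection models gathered by the tool, the sketch below maps a 3D ray direction to pixel coordinates under the simplest of them, the equirectangular panorama; the axis conventions are an assumption for illustration:</p>
<pre><code>
import numpy as np

def dir_to_equirect(d, width, height):
    """Map a unit direction vector d = (x, y, z) to (u, v) pixel coordinates."""
    x, y, z = d / np.linalg.norm(d)
    lon = np.arctan2(x, z)   # longitude in (-pi, pi]
    lat = np.arcsin(y)       # latitude in [-pi/2, pi/2]
    u = (lon / (2 * np.pi) + 0.5) * width
    v = (0.5 - lat / np.pi) * height
    return u, v

# A forward-pointing ray lands at the image center.
print(dir_to_equirect(np.array([0.0, 0.0, 1.0]), 2048, 1024))  # (1024.0, 512.0)
</code></pre>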
|
|
|
|
<p>This article proposes a test procedure that can be used to test ML models and ML-based systems independently of the actual training process. In this way, typical quality statements such as the accuracy and precision of these models and systems can be verified independently, taking into account their black-box character and the immanent stochastic properties of ML models and their training data. The article presents first results from a set of test experiments and suggests extensions to existing test methods reflecting the stochastic nature of ML models and ML-based systems.
</p>
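<p>As a sketch of how such training-independent verification can look statistically (a generic Hoeffding-bound check, not the article's specific procedure), one can treat each test prediction as a Bernoulli trial and accept a claimed accuracy only if it falls inside a confidence band around the observed accuracy:</p>
<pre><code>
import math

def verify_claimed_accuracy(correct, n, claimed, delta=0.05):
    """Accept the claim if it lies within a (1 - delta) Hoeffding band
    around the accuracy observed on an independent test sample."""
    observed = correct / n
    eps = math.sqrt(math.log(2.0 / delta) / (2.0 * n))
    return abs(observed - claimed) <= eps, observed, eps

ok, obs, eps = verify_claimed_accuracy(correct=912, n=1000, claimed=0.93)
print(f"observed={obs:.3f} +/- {eps:.3f}, claim consistent: {ok}")
</code></pre>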
|
|
|
|
<p>For most service architectures, such as OSGi and Spring,
architecture-specific tools allow software developers and architects to
visualize otherwise obscure configurations hidden in the project files. Such
visualization tools are often used for documentation purposes and help developers understand programs better than source code alone. However, such tools
often do not address project-specific peculiarities or do not exist at all for
less common architectures, requiring developers to use different visualization
and analysis tools within the same architecture. Furthermore, many generic
modeling tools and architecture visualization tools require their users to
create and maintain models manually.
</p>
<p>We here propose a DSL-driven approach that allows software architects to
define and adapt their own project visualization tool. The approach, which we
refer to as Software Project Visualization (SPViz), uses two DSLs, one to
describe architectural elements and their relationships, and one to describe
how these should be visualized. We demonstrate how SPViz can then automatically
synthesize a customized, project-specific visualization tool that can adapt to
changes in the underlying project automatically.
</p>
<p>We implemented our approach in an open-source library, also termed SPViz and
discuss and analyze four different tools that follow this concept, including
open-source projects and projects from an industrial partner in the railway
domain.
</p>
|
|
|
|
<p>As intelligent systems become increasingly important in our daily lives, new
ways of interaction are needed. Classical user interfaces pose issues for the physically impaired and are in part impractical or inconvenient. Gesture recognition is an alternative, but it is often not reactive enough when conventional
cameras are used. This work proposes a Spiking Convolutional Neural Network,
processing event- and depth data for gesture recognition. The network is
simulated using the open-source neuromorphic computing framework LAVA for
offline training and evaluation on an embedded system. For the evaluation three
open source data sets are used. Since these do not represent the applied
bi-modality, a new data set with synchronized event- and depth data was
recorded. The results show that temporal encoding of depth information is viable and that modality fusion, even of differently encoded data, is beneficial to network performance and generalization capabilities.
</p>
|
|
|
|
<p>We propose a simple and effective method to derive the Fundamental Diagram
(FD) from platoon vehicle trajectories. Average traffic state variables are
computed using Edie's generalized definitions within time-dependent trapezoidal
space-time areas. To obtain a clear FD, we employ a bivariate data aggregation
technique to eliminate scatter. Our findings are as follows: (i) The proposed
method demonstrates a remarkably consistent relation between the traffic
variables and a clear triangular shape for autonomously-driven vehicles. (ii)
The FDs are invariant to several factors of heterogeneity such as the platoon
length, vehicle characteristics, road particularities, and data acquisition
accuracy. (iii) ACC-driven vehicle platoons with the minimum headway setting achieve much higher capacity, roughly 90\% more, than those with a large headway setting. (iv) Connectivity might increase capacity. (v) Human drivers have a
wider near-capacity operation area, showing different behaviors at high speeds
than low ones, and (vi) Safety concerns might arise due to high values of
backward wave speed for ACC-driven vehicles. Comparative analysis with the
state-of-the-art confirms the validity of our approach. The proposed method
stands out due to its simplicity and accuracy, which paves the way for
practical applications in real-time traffic flow monitoring and control within
modern intelligent transportation systems.
</p>
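<p>For reference, Edie's generalized definitions over a space-time region $A$ read $q = \sum_i d_i/|A|$ and $k = \sum_i t_i/|A|$, with $v = q/k$, where $d_i$ and $t_i$ are the distance traveled and the time spent by vehicle $i$ inside $A$. A minimal sketch follows (rectangular region for simplicity, whereas the paper uses time-dependent trapezoidal areas):</p>
<pre><code>
import numpy as np

def edie_state(trajectories, t0, t1, x0, x1):
    """Edie's generalized traffic state over A = [t0, t1] x [x0, x1]."""
    area = (t1 - t0) * (x1 - x0)
    total_dist = total_time = 0.0
    for t, x in trajectories:                # arrays of timestamps, positions
        inside = (t >= t0) & (t <= t1) & (x >= x0) & (x <= x1)
        keep = inside[:-1] & inside[1:]      # segments fully inside (approx.)
        total_dist += np.sum(np.diff(x)[keep])
        total_time += np.sum(np.diff(t)[keep])
    q = total_dist / area                    # flow (units follow the inputs)
    k = total_time / area                    # density
    return q, k, (q / k if k > 0 else float("nan"))

# Toy check: one vehicle at a constant 20 m/s; recovered speed should be 20.
t = np.linspace(0.0, 60.0, 601)
print(edie_state([(t, 20.0 * t)], 0.0, 60.0, 0.0, 1200.0))
</code></pre>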
|
|
|
|
<p>Integration of technological solutions aims to improve accuracy, precision
and repeatability in farming operations, and biosensor devices are increasingly
used for understanding basic biology during livestock production. The aim of
this study was to design and validate a miniaturized tri-axial accelerometer
for non-invasive monitoring of farmed fish with re-programmable schedule
protocols. The device was attached to the operculum of gilthead sea bream and
European sea bass juveniles for monitoring their physical activity by
measurements of movement accelerations in x and y-axes, while records of
operculum beats served as a measurement of respiratory frequency. Data
post-processing of exercised fish in swimming test chambers revealed an
exponential increase of fish accelerations with the increase of fish speed from
1 body-length to 4 body-lengths per second, while a close relationship between
oxygen consumption and opercular frequency was consistently found. The usefulness of a low computational load for data pre-processing with on-board algorithms was verified from low to submaximal exercise, this procedure increasing the autonomy of the system up to 6 h of data recording with different programmable schedules. Visual observations regarding tissue damage, feeding behavior, and circulating levels of stress markers did not reveal a short-term negative impact of device tagging. Reduced plasma levels of triglycerides
revealed a transient inhibition of feed intake in small fish, but this
disturbance was not detected in larger fish. All this considered together is
the proof of concept that miniaturized devices are suitable for non-invasive
and reliable metabolic phenotyping of farmed fish to improve their overall
performance and welfare. Further work is underway for improving the attachment
procedure and the full device packaging.
</p>
|
|
|
|
<p>Instruction-tuned Large Language Models (LLMs) have recently showcased
remarkable advancements in their ability to generate fitting responses to
natural language instructions. However, many current works rely on manual
evaluation to judge the quality of generated responses. Since such manual
evaluation is time-consuming, it does not easily scale to the evaluation of
multiple models and model variants. In this short paper, we propose a
straightforward but remarkably effective evaluation metric called SemScore, in
which we directly compare model outputs to gold target responses using semantic
textual similarity (STS). We conduct a comparative evaluation of the model
outputs of 12 prominent instruction-tuned LLMs using 8 widely-used evaluation
metrics for text generation. We find that our proposed SemScore metric
outperforms all other, in many cases more complex, evaluation metrics in terms
of correlation to human evaluation. These findings indicate the utility of our
proposed metric for the evaluation of instruction-tuned LLMs.
</p>
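<p>The metric itself is easy to sketch: embed the model output and the gold response with a sentence encoder and take the cosine similarity. The specific encoder below is an assumption for illustration, not necessarily the one used in the paper:</p>
<pre><code>
from sentence_transformers import SentenceTransformer, util

# SemScore-style similarity between a model output and the gold response.
# The embedding model name is an assumption for this sketch.
model = SentenceTransformer("all-mpnet-base-v2")

def semscore(prediction: str, gold: str) -> float:
    emb = model.encode([prediction, gold], convert_to_tensor=True)
    return util.cos_sim(emb[0], emb[1]).item()

print(semscore("Water boils at 100 degrees Celsius at sea level.",
               "At sea level, the boiling point of water is 100 degrees C."))
</code></pre>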
|
|
|
|
<p>Omnidirectional images are one of the main sources of information for
learning-based scene understanding algorithms. However, annotated datasets of omnidirectional images cannot keep pace with the development of these learning-based algorithms. Among the different panoramas and in contrast to standard central
ones, non-central panoramas provide geometrical information in the distortion
of the image from which we can retrieve 3D information of the environment [2].
However, due to the lack of commercial non-central devices, up until now there
was no dataset of these kinds of panoramas. In this data paper, we present the
first dataset of non-central panoramas for indoor scene understanding. The
dataset is composed of {\bf 2574} RGB non-central panoramas taken in around 650 different rooms. Each panorama has an associated depth map and annotations for obtaining the room layout from the image: a structural edge map, the list of corners in the image, the 3D corners of the room, and the camera pose. The
images are taken from photorealistic virtual environments and pixel-wise
automatically annotated.
</p>
|
|
|
|
<p>We consider the task of learning individual-specific intensities of counting
processes from a set of static variables and irregularly sampled time series.
We introduce a novel modeling approach in which the intensity is the solution to a controlled differential equation. We first design a neural estimator by building on neural controlled differential equations. Second, we show that our model can be linearized in the signature space under
sufficient regularity conditions, yielding a signature-based estimator which we
call CoxSig. We provide theoretical learning guarantees for both estimators,
before showcasing the performance of our models on a vast array of simulated
and real-world datasets from finance, predictive maintenance and food supply
chain management.
</p>
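<p>A brief sketch of the signature features on which such a linearized estimator operates may help: for a piecewise-linear path, the depth-2 signature consists of the total increments and the iterated integrals $S^{(i,j)} = \int (X^i_t - X^i_0)\,dX^j_t$. The generic computation below (not the paper's estimator) is exact for linear segments:</p>
<pre><code>
import numpy as np

def signature_depth2(x):
    """Depth-2 signature terms of a discrete path x of shape (N+1, d)."""
    dx = np.diff(x, axis=0)               # segment increments
    s1 = x[-1] - x[0]                     # level 1: total increment
    disp = x[:-1] - x[0]                  # displacement at segment start
    # level 2: S[i, j] = integral of (x^i - x^i_0) dx^j, exact per segment
    s2 = disp.T @ dx + 0.5 * (dx.T @ dx)
    return s1, s2

t = np.linspace(0.0, 1.0, 101)
path = np.stack([t, t ** 2], axis=1)      # the path (t, t^2) in R^2
s1, s2 = signature_depth2(path)
print(s1)   # [1, 1]
print(s2)   # the antisymmetric part encodes the Levy area
</code></pre>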
|
|
|
|
<p>Casting manipulation has been studied as a way to expand a robot's movable range. In this manipulation, the robot throws its end effector so that it reaches a distant target. Usually, a special casting manipulator, consisting of rigid arm links and a specific flexible linear object, is constructed for effective casting manipulation. However, such a special manipulator cannot perform normal manipulations, such as picking and placing, grasping, and operating objects. We propose that a normal robot arm, which can perform normal tasks, pick up an unknown string from the surrounding environment and realize casting manipulation with it. As the properties of the string are not provided in advance, how to reflect them in the casting manipulation is crucial. This is realized by motion generation for the robot arm with a simulation of the string movement, actual string manipulation by the robot arm, and string parameter estimation from the actual string movement. After repeating these three steps, the simulated string movement approximates the actual one, realizing casting manipulation with the unknown string. We confirmed the effectiveness of the proposed method through experiments. This study can lead to enhanced performance of home service robots, exploration robots, rescue robots, and entertainment robots.
</p>
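<p>The three-step loop (simulate, execute, estimate) can be sketched on a toy stand-in model; the single-parameter "string tip" dynamics below is purely illustrative and not the paper's string model:</p>
<pre><code>
import numpy as np
from scipy.optimize import least_squares

def simulate_tip(damping, v0, steps=200, dt=0.01):
    """Toy stand-in dynamics: tip speed decays with linear damping."""
    pos, vel, out = 0.0, v0, []
    for _ in range(steps):
        vel -= damping * vel * dt
        pos += vel * dt
        out.append(pos)
    return np.array(out)

# "Execute": an observed trajectory (here synthesized with noise) stands in
# for an actual trial with the unknown string.
true_damping = 0.8
observed = simulate_tip(true_damping, v0=3.0)
observed += np.random.default_rng(1).normal(0, 1e-3, observed.shape)

# "Estimate": fit the simulator's parameter to match the observed movement,
# then the calibrated simulator can be used to generate the next throw.
fit = least_squares(lambda d: simulate_tip(d[0], 3.0) - observed, x0=[0.1])
print(f"estimated damping: {fit.x[0]:.3f} (true {true_damping})")
</code></pre>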
|
|
|
|
<p>Autonomous robot navigation within the dynamic unknown environment is of
crucial significance for mobile robotic applications including robot navigation
in last-mile delivery and robot-enabled automated supplies in industrial and
hospital delivery applications. Current solutions still suffer from limitations: for example, the robot cannot recognize unknown objects in real time and cannot navigate freely in dynamic, narrow, and complex environments. We propose a complete software framework for autonomous robot perception and navigation within very dense obstacles and dense human crowds. First, we propose a framework that accurately detects and segments open-world object categories in a zero-shot manner, which overcomes the over-segmentation limitation of the current SAM model. Second, we propose a distillation strategy to distill the knowledge needed to segment the free space of the walkway for robot navigation without labels. In the meantime, we design a trimming
strategy that works collaboratively with distillation to enable lightweight
inference to deploy the neural network on edge devices such as NVIDIA-TX2 or
Xavier NX during autonomous navigation. Integrated into the robot navigation
system, extensive experiments demonstrate that our proposed framework has
achieved superior performance in terms of both accuracy and efficiency in robot
scene perception and autonomous robot navigation.
</p>
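<p>The distillation step can be illustrated with the standard temperature-scaled knowledge-distillation loss (Hinton-style); the paper's exact distillation and trimming strategies are more involved, so treat this as a generic sketch:</p>
<pre><code>
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """KL divergence between temperature-softened teacher and student outputs."""
    log_p_student = F.log_softmax(student_logits / T, dim=1)
    p_teacher = F.softmax(teacher_logits / T, dim=1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * T * T

# e.g. per-pixel free-space vs. non-walkable logits from teacher and student
student = torch.randn(4, 2, 64, 64, requires_grad=True)
teacher = torch.randn(4, 2, 64, 64)
loss = distillation_loss(student, teacher)
loss.backward()
</code></pre>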
|
|
|
|
<p>A multi-input multi-output (MIMO) Gaussian channel with two transmit antennas
and two receive antennas is studied that is subject to an input peak-power
constraint. The capacity and the capacity-achieving input distribution are
unknown in general. The problem is shown to be equivalent to a channel with an
identity matrix but where the input lies inside and on an ellipse with
principal axis length $r_p$ and minor axis length $r_m$. If $r_p \le \sqrt{2}$,
then the capacity-achieving input has support on the ellipse. A sufficient
condition is derived under which a two-point distribution is optimal. Finally,
if $r_m < r_p \le \sqrt{2}$, then the capacity-achieving distribution is
discrete.
</p>
|
|
|
|
<p>Data generation is a data augmentation technique for enhancing the
generalization ability for skeleton-based human action recognition. Most
existing data generation methods face challenges to ensure the temporal
consistency of the dynamic information for action. In addition, the data
generated by these methods lack diversity when only a few training samples are
available. To solve those problems, we propose a novel active generative
network (AGN), which can adaptively learn various action categories by motion
style transfer to generate new actions when the data for a particular action is
only a single sample or few samples. The AGN consists of an action generation
network and an uncertainty metric network. The former, with ST-GCN as the
Backbone, can implicitly learn the morphological features of the target action
while preserving the category features of the source action. The latter guides the generation of actions. Specifically, an action recognition model generates a prediction vector for each generated action, which is then scored using an uncertainty metric. Finally, the uncertainty metric network (UMN) provides the uncertainty sampling basis for the generated actions.
</p>
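<p>One plausible instance of such an uncertainty metric (the paper may use a different one) is the entropy of the recognizer's prediction vector, used to rank generated actions for sampling:</p>
<pre><code>
import numpy as np

def entropy_score(prob):
    """Shannon entropy of a prediction vector; higher means more uncertain."""
    p = np.clip(prob, 1e-12, 1.0)
    return -np.sum(p * np.log(p))

predictions = {
    "gen_action_1": np.array([0.90, 0.05, 0.05]),  # confident -> low score
    "gen_action_2": np.array([0.40, 0.35, 0.25]),  # uncertain -> high score
}
ranked = sorted(predictions, key=lambda k: entropy_score(predictions[k]),
                reverse=True)
print(ranked)  # most uncertain generated samples first, for active selection
</code></pre>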
|
|
|
|
<p>We present a new method to estimate the rate-distortion-perception function
in the perfect realism regime (PR-RDPF), for multivariate continuous sources
subject to a single-letter average distortion constraint. The proposed approach is able to solve not only this specific problem but also two related ones: the entropic optimal transport (EOT) problem and the output-constrained rate-distortion function (OC-RDF), of which the PR-RDPF represents a special case. Using copula
distributions, we show that the OC-RDF can be cast as an I-projection problem
on a convex set, based on which we develop a parametric solution of the optimal
projection proving that its parameters can be estimated, up to an arbitrary
precision, via the solution of a convex program. Subsequently, we propose an iterative scheme based on gradient methods to solve the convex program. Lastly,
we characterize a Shannon lower bound (SLB) for the PR-RDPF under a mean
squared error (MSE) distortion constraint. We support our theoretical findings
with numerical examples by assessing the estimation performance of our
iterative scheme using the PR-RDPF with the obtained SLB for various sources.
</p>
|
|
|
|
<p>The labor market is changing rapidly, prompting increased interest in the
automatic extraction of occupational skills from text. With the advent of
English benchmark job description datasets, there is a need for systems that
handle their diversity well. We tackle the complexity of occupational skill dataset tasks -- combining and leveraging multiple datasets for skill extraction, identifying rarely observed skills within a dataset, and overcoming the scarcity of skills across datasets. In particular, we investigate the
retrieval-augmentation of language models, employing an external datastore for
retrieving similar skills in a dataset-unifying manner. Our proposed method,
\textbf{N}earest \textbf{N}eighbor \textbf{O}ccupational \textbf{S}kill
\textbf{E}xtraction (NNOSE) effectively leverages multiple datasets by
retrieving neighboring skills from other datasets in the datastore. This
improves skill extraction \emph{without} additional fine-tuning. Crucially, we
observe a performance gain in predicting infrequent patterns, with substantial
gains of up to 30\% span-F1 in cross-dataset settings.
</p>
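<p>The retrieval component can be sketched in the style of kNN-LM interpolation: a datastore of token embeddings pooled from all datasets is queried at inference, and the neighbors' labels re-weight the tagger's distribution. Everything below (embedding dimension, label scheme, interpolation weight) is an assumption for illustration, not NNOSE's exact configuration:</p>
<pre><code>
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
keys = rng.normal(size=(1000, 768))       # token embeddings from all datasets
labels = rng.integers(0, 3, size=1000)    # BIO-style tag ids (0=O, 1=B, 2=I)

index = NearestNeighbors(n_neighbors=8).fit(keys)

def knn_probs(query, num_classes=3, temperature=10.0):
    """Distance-weighted label distribution of the retrieved neighbors."""
    dist, idx = index.kneighbors(query[None, :])
    w = np.exp(-dist[0] / temperature)
    probs = np.bincount(labels[idx[0]], weights=w, minlength=num_classes)
    return probs / probs.sum()

query = rng.normal(size=768)
p_model = np.array([0.7, 0.2, 0.1])       # the tagger's own distribution
lam = 0.3                                  # interpolation weight (assumed)
p_final = lam * knn_probs(query) + (1 - lam) * p_model
print(p_final)
</code></pre>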
|
|
|
|
<p>To leverage LLMs for visual synthesis, traditional methods convert raster
image information into discrete grid tokens through specialized visual modules,
while disrupting the model's ability to capture the true semantic
representation of visual scenes. This paper posits that an alternative
representation of images, vector graphics, can effectively surmount this
limitation by enabling a more natural and semantically coherent segmentation of
the image information. Thus, we introduce StrokeNUWA, a pioneering work exploring a better visual representation, ``stroke tokens'', on vector graphics, which are inherently rich in visual semantics, naturally compatible with LLMs, and highly compressed. Equipped with stroke tokens, StrokeNUWA can significantly surpass traditional LLM-based and optimization-based methods across various metrics in the vector graphic generation task. Besides, StrokeNUWA achieves up to a 94x inference speedup over prior methods, with an
</p>
|
|
|
|
<p>In the literature, there are many results about permutation polynomials over
finite fields. However, very few permutations of vector spaces have been constructed, although it has been shown that permutations of vector spaces have many
applications in cryptography, especially in constructing permutations with low
differential and boomerang uniformities.
</p>
<p>In this paper, motivated by the butterfly structure
\cite{perrin2016cryptanalysis} and the work of Qu and Li \cite{qu2023}, we
investigate rotatable permutations from $\gf_{2^m}^3$ to itself with
$d$-homogenous functions.
</p>
<p>Based on the theory of low-degree equations, the resultant of polynomials, and some techniques involving exponential sums, we construct five infinite classes of
$3$-homogeneous rotatable permutations from $\gf_{2^m}^3$ to itself, where $m$
is odd. Moreover, we demonstrate that the corresponding permutation polynomials
of $\gf_{2^{3m}}$ of our newly constructed permutations of $\gf_{2^m}^3$ are
QM-inequivalent to the known ones.
</p>
|
|
|
|
<p>This paper leverages macroscopic models and multi-source spatiotemporal data
collected from automatic traffic counters and probe vehicles to accurately
estimate traffic flow and travel time in links where these measurements are
unavailable. This problem is critical in transportation planning applications
where the sensor coverage is low and the planned interventions have
network-wide impacts. The proposed model, named the Macroscopic Traffic
Estimator (MaTE), can perform network-wide estimations of traffic flow and
travel time only using the set of observed measurements of these quantities.
Because MaTE is grounded in macroscopic flow theory, all parameters and
variables are interpretable. The estimated traffic flow satisfies fundamental
flow conservation constraints and exhibits an increasing monotonic relationship
with the estimated travel time. Using logit-based stochastic traffic assignment
as the principle for routing flow behavior makes the model fully differentiable
with respect to the model parameters. This property facilitates the application
of computational graphs to learn parameters from vast amounts of spatiotemporal
data. We also integrate neural networks and polynomial kernel functions to
capture link flow interactions and enrich the mapping of traffic flows into
travel times. MaTE also adds a destination choice model and a trip generation
model that uses historical data on the number of trips generated by location.
Experiments on synthetic data show that the model can accurately estimate
travel time and traffic flow in out-of-sample links. Results obtained using
real-world multi-source data from a large-scale transportation network suggest
that MaTE outperforms data-driven benchmarks, especially in travel time
estimation. The estimated parameters of MaTE are also informative about the
hourly change in travel demand and supply characteristics of the transportation
network.
</p>
|
|
|
|
<p>Handwritten character recognition (HCR) is a challenging problem for machine
learning researchers. Unlike printed text data, handwritten character datasets
have more variation due to human-introduced bias. With numerous unique
character classes present, some data, such as Logographic Scripts or
Sino-Korean character sequences, bring new complications to the HCR problem.
The classification task on such datasets requires the model to learn
high-complexity details of the images that share similar features. With recent
advances in computational resource availability and further computer vision
theory development, some research teams have effectively addressed the arising
challenges. Although known for achieving high efficiency, many common approaches are still not generalizable and use dataset-specific solutions to achieve better results. Their complex structure and high computational demands also frequently prevent such solutions from gaining popularity. This
paper proposes a straightforward, generalizable, and highly effective approach
(CharNet) for detailed character image classification and compares its
performance to that of existing approaches.
</p>
|
|
|
|
<p>Traditionally, Machine Translation (MT) Evaluation has been treated as a
regression problem -- producing an absolute translation-quality score. This
approach has two limitations: i) the scores lack interpretability, and human
annotators struggle with giving consistent scores; ii) most scoring methods are
based on (reference, translation) pairs, limiting their applicability in
real-world scenarios where references are absent. In practice, we often care
about whether a new MT system is better or worse than some competitors. In
addition, reference-free MT evaluation is increasingly practical and necessary.
Unfortunately, these two practical considerations have yet to be jointly
explored. In this work, we formulate the reference-free MT evaluation into a
pairwise ranking problem. Given the source sentence and a pair of translations,
our system predicts which translation is better. In addition to proposing this
new formulation, we further show that this new paradigm can demonstrate
superior correlation with human judgments by merely using indirect supervision
from natural language inference and weak supervision from our synthetic data.
In the context of reference-free evaluation, MT-Ranker, trained without any
human annotations, achieves state-of-the-art results on the WMT Shared Metrics
Task benchmarks DARR20, MQM20, and MQM21. On a more challenging benchmark,
ACES, which contains fine-grained evaluation criteria such as addition,
omission, and mistranslation errors, MT-Ranker marks state-of-the-art against
reference-free as well as reference-based baselines.
</p>
|
|
|
|
<p>The effectiveness of an IR system is gauged not just by its ability to
retrieve relevant results but also by how it presents these results to users;
an engaging presentation often correlates with increased user satisfaction.
While existing research has delved into the link between user satisfaction, IR
performance metrics, and presentation, these aspects have typically been
investigated in isolation. Our research aims to bridge this gap by examining
the relationship between query performance, presentation and user satisfaction.
For our analysis, we conducted a between-subjects experiment comparing the
effectiveness of various result card layouts for an ad-hoc news search
interface. Drawing data from the TREC WaPo 2018 collection, we centered our
study on four specific topics. Within each of these topics, we assessed six
distinct queries with varying nDCG values. Our study involved 164 participants
who were exposed to one of five distinct layouts containing result cards, such as ``title'', ``title+image'', or ``title+image+summary''. Our findings indicate
that while nDCG is a strong predictor of user satisfaction at the query level,
there exists no linear relationship between the performance of the query,
presentation of results and user satisfaction. However, when considering the
total gain on the initial result page, we observed that presentation does play
a significant role in user satisfaction (at the query level) for certain layouts with result cards, such as title+image or title+image+summary. Our
results also suggest that the layout differences have complex and multifaceted
impacts on satisfaction. We demonstrate the capacity to equalize user
satisfaction levels between queries of varying performance by changing how
results are presented. This emphasizes the necessity to harmonize both
performance and presentation in IR systems, considering users' diverse
preferences.
</p>
|
|
|
|
<p>Purpose: To develop a method for automated segmentation of hypothalamus
subregions informed by ultra-high resolution ex vivo magnetic resonance images
(MRI), which generalizes across MRI sequences and resolutions without
retraining.
</p>
<p>Materials and Methods: We trained our deep learning method, H-SynEx, with synthetic images derived from label maps built from ultra-high resolution ex vivo MRI scans, which enable finer-grained manual segmentation when compared with 1 mm isotropic in vivo images. We validated this retrospective study using
1535 in vivo images from six datasets and six MRI sequences. The quantitative
evaluation used the Dice Coefficient (DC) and Average Hausdorff distance (AVD).
Statistical analysis compared hypothalamic subregion volumes in controls,
Alzheimer's disease (AD), and behavioral variant frontotemporal dementia
(bvFTD) subjects using the area under the curve (AUC) and Wilcoxon rank sum
test.
</p>
<p>Results: H-SynEx can segment the hypothalamus across various MRI sequences,
encompassing FLAIR sequences with significant slice spacing (5mm). Using
hypothalamic volumes on T1w images to distinguish control from AD and bvFTD
patients, we observed AUC values of 0.74 and 0.79, respectively. Additionally, an AUC of 0.66 was found for volume variation on FLAIR scans when comparing controls and patients.
</p>
<p>Conclusion: Our results show that H-SynEx successfully leverages information from ultra-high resolution ex vivo scans to segment in vivo images from different MRI sequences such as T1w, T2w, PD, qT1, FA, and FLAIR. We also found that our
automated segmentation was able to discriminate controls versus patients on
FLAIR images with 5mm spacing. H-SynEx is openly available at
https://github.com/liviamarodrigues/hsynex.
</p>
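<p>For reference, the Dice Coefficient used in the evaluation is $DC = 2|A \cap B| / (|A| + |B|)$ for binary masks $A$ and $B$; a minimal sketch:</p>
<pre><code>
import numpy as np

def dice(a: np.ndarray, b: np.ndarray) -> float:
    """Dice coefficient between two binary segmentation masks."""
    a, b = a.astype(bool), b.astype(bool)
    denom = a.sum() + b.sum()
    return 2.0 * np.logical_and(a, b).sum() / denom if denom else 1.0

pred = np.zeros((10, 10)); pred[2:7, 2:7] = 1
gt = np.zeros((10, 10)); gt[3:8, 3:8] = 1
print(f"Dice = {dice(pred, gt):.3f}")  # 0.640 for this toy overlap
</code></pre>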
|
|
|
|
<p>This paper investigates the secure resource allocation for a downlink
integrated sensing and communication system with multiple legal users and
potential eavesdroppers. In the considered model, the base station (BS)
simultaneously transmits sensing and communication signals through beamforming
design, where the sensing signals can be viewed as artificial noise to enhance
the security of communication signals. To further enhance the security in the
semantic layer, the semantic information is extracted from the original
information before transmission. The user side can only successfully recover
the received information with the help of the knowledge base shared with the
BS, which is stored in advance. Our aim is to maximize the sum semantic secrecy
rate of all users while maintaining the minimum quality of service for each
user and guaranteeing overall sensing performance. To solve this sum semantic
secrecy rate maximization problem, an iterative algorithm is proposed using the
alternating optimization method. The simulation results demonstrate the
superiority of the proposed algorithm in terms of secure semantic communication
and reliable detection.
</p>
|
|
|
|
<p>The field of Neural Style Transfer (NST) has witnessed remarkable progress in
the past few years, with approaches being able to synthesize artistic and
photorealistic images and videos of exceptional quality. To evaluate such
results, a diverse landscape of evaluation methods and metrics is used,
including authors' opinions based on side-by-side comparisons, human evaluation
studies that quantify the subjective judgements of participants, and a
multitude of quantitative computational metrics which objectively assess the
different aspects of an algorithm's performance. However, there is no consensus
regarding the most suitable and effective evaluation procedure that can
guarantee the reliability of the results. In this review, we provide an
in-depth analysis of existing evaluation techniques, identify the
inconsistencies and limitations of current evaluation methods, and give
recommendations for standardized evaluation practices. We believe that the
development of a robust evaluation framework will not only enable more
meaningful and fairer comparisons among NST methods but will also enhance the
comprehension and interpretation of research findings in the field.
</p>
|
|
|
|
<p>This note presents an upper bound of $1.252^n$ on the size of a set system that satisfies the mod-6 town rules. Under these rules the sizes of the sets are not congruent to $0 \bmod 6$, while the sizes of all pairwise intersections are congruent to $0 \bmod 6$.
</p>
|
|
|
|
<p>The Mersenne Twister (MT) is a pseudo-random number generator (PRNG) widely
used in High Performance Computing for parallel stochastic simulations. We aim
to assess the quality of common parallelization techniques used to generate
large streams of MT pseudo-random numbers. We compare three techniques:
sequence splitting, random spacing and MT indexed sequence. The TestU01 Big
Crush battery is used to evaluate the quality of 4096 streams for each
technique on three different hardware configurations. Surprisingly, all techniques exhibited almost 30% defective streams, with no technique showing better quality than the others. While all 106 Big Crush tests showed failures, the failure rate was limited to a small number of tests (a maximum of 6 tests failed per stream, resulting in an over 94% success rate). Thanks to 33 CPU-years of computation, the high-quality streams identified are provided. They can be used for sensitive
parallel simulations such as nuclear medicine and precise high-energy physics
applications.
</p>
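<p>Two of the compared techniques are easy to sketch with NumPy's Mersenne Twister: sequence splitting via the generator's jump-ahead (each jump advances $2^{128}$ steps) and random spacing via independently derived seeds. As the study shows, such streams still need empirical validation (e.g., with TestU01 Big Crush):</p>
<pre><code>
from numpy.random import Generator, MT19937, SeedSequence

# Sequence splitting: stream i jumps i * 2**128 steps past the base state.
base = MT19937(12345)
split_streams = [Generator(base.jumped(i)) for i in range(4)]

# Random spacing: independently seeded generators, spaced "at random"
# in the period via distinct derived seeds.
spaced_streams = [Generator(MT19937(s)) for s in SeedSequence(12345).spawn(4)]

print(split_streams[1].random(), spaced_streams[1].random())
</code></pre>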
|
|
|
|
<p>Quantum computing shows great potential, but errors pose a significant
challenge. This study explores new strategies for mitigating quantum errors
using artificial neural networks (ANN) and the Yang-Baxter equation (YBE).
Unlike traditional error correction methods, which are computationally
intensive, we investigate artificial error mitigation. The manuscript
introduces the basics of quantum error sources and explores the potential of
using classical computation for error mitigation. The Yang-Baxter equation
plays a crucial role, allowing us to compress time dynamics simulations into
constant-depth circuits. By introducing controlled noise through the YBE, we
enhance the dataset for error mitigation. We train an ANN model on partial data
from quantum simulations, demonstrating its effectiveness in correcting errors
in time-evolving quantum states.
</p>
|
|
|
|
<p>Vision-based estimation of the motion of a moving target is usually
formulated as a bearing-only estimation problem where the visual measurement is
modeled as a bearing vector. Although the bearing-only approach has been
studied for decades, a fundamental limitation of this approach is that it
requires extra lateral motion of the observer to enhance the target's
observability. Unfortunately, the extra lateral motion conflicts with the
desired motion of the observer in many tasks. It is well-known that, once a
target has been detected in an image, a bounding box that surrounds the target
can be obtained. Surprisingly, this common visual measurement, especially its size information, has not been well explored up to now. In this paper, we
propose a new bearing-angle approach to estimate the motion of a target by
modeling its image bounding box as bearing-angle measurements. Both theoretical
analysis and experimental results show that this approach can significantly
enhance the observability without relying on additional lateral motion of the
observer. The benefit of the bearing-angle approach comes with no additional
cost because a bounding box is a standard output of object detection
algorithms. The approach simply exploits the information that has not been
fully exploited in the past. No additional sensing devices or special detection
algorithms are required.
</p>
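<p>A sketch of the measurement model may help: under a pinhole camera, the box center yields the bearing vector and the box width yields an angular size. The formula below is exact for a box centered on the optical axis and a small-angle approximation otherwise; the intrinsics are assumptions for illustration:</p>
<pre><code>
import numpy as np

def box_to_bearing_angle(u, v, w, fx, fy, cx, cy):
    """Convert a bounding box (center (u, v), pixel width w) into a unit
    bearing vector and the angular width subtended by the box."""
    bearing = np.array([(u - cx) / fx, (v - cy) / fy, 1.0])
    bearing /= np.linalg.norm(bearing)
    angle = 2.0 * np.arctan(w / (2.0 * fx))
    return bearing, angle

# Box centered at (700, 400), 80 px wide, focal length 640 px.
bearing, angle = box_to_bearing_angle(700, 400, 80, 640, 640, 640, 360)
print(bearing, np.degrees(angle))
</code></pre>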
|
|
|
|
<p>Traditional models grounded in first principles often struggle with accuracy
as the system's complexity increases. Conversely, machine learning approaches,
while powerful, face challenges in interpretability and in handling physical
constraints. Efforts to combine these models often stumble upon difficulties in finding a balance between accuracy and complexity. To address
these issues, we propose a comprehensive framework based on a "mixture of
experts" rationale. This approach enables the data-based fusion of diverse
local models, leveraging the full potential of first-principle-based priors.
Our solution allows independent training of experts, drawing on techniques from
both machine learning and system identification, and it supports both
collaborative and competitive learning paradigms. To enhance interpretability, we penalize abrupt variations in the experts' combination. Experimental results
validate the effectiveness of our approach in producing an interpretable
combination of models closely resembling the target phenomena.
</p>
|
|
|
|
<p>Landscape renderings are realistic images of landscape sites, allowing
stakeholders to perceive better and evaluate design ideas. While recent
advances in Generative Artificial Intelligence (GAI) enable automated
generation of landscape renderings, the end-to-end methods are not compatible
with common design processes, leading to insufficient alignment with design
idealizations and limited cohesion of iterative landscape design. Informed by a
formative study for comprehending design requirements, we present PlantoGraphy,
an iterative design system that allows for interactive configuration of GAI
models to accommodate human-centered design practice. A two-stage pipeline is
incorporated: first, concretization module transforms conceptual ideas into
concrete scene layouts with a domain-oriented large language model; and second,
illustration module converts scene layouts into realistic landscape renderings
using a fine-tuned low-rank adaptation diffusion model. PlantoGraphy has
undergone a series of performance evaluations and user studies, demonstrating
its effectiveness in landscape rendering generation and the high recognition of
its interactive functionality.
</p>
|
|
|
|
<p>3D neural implicit representations play a significant role in many robotic applications. However, reconstructing neural radiance fields (NeRF) from realistic event data remains a challenge due to the sparsity of event data and the lack of information when only event streams are available. In this paper, we
utilize motion, geometry, and density priors behind event data to impose strong
physical constraints to augment NeRF training. The proposed novel pipeline can
directly benefit from those priors to reconstruct 3D scenes without additional
inputs. Moreover, we present a novel density-guided patch-based sampling
strategy for robust and efficient learning, which not only accelerates training but also improves the expression of local geometry. More
importantly, we establish the first large dataset for event-based 3D
reconstruction, which contains 101 objects with various materials and
geometries, along with the groundtruth of images and depth maps for all camera
viewpoints, which significantly facilitates other research in the related
fields. The code and dataset will be publicly available at
https://github.com/Mercerai/PAEv3d.
</p>
|
|
|
|
<p>In recent years the need for DC distribution buses has increased considerably, as can be noticed in transport, for example in the distribution systems of more-electric aircraft, ships, or electric cars. Given the complexity of such systems, the need to use more and more switched power converters has arisen. The main problem with connecting multiple controlled switched converters acting as source and load is the degradation of stability that occurs on the DC distribution bus due to converter interactions. To study the stability of the distribution bus there are some well-established criteria. These criteria require knowledge of the input impedance of the converters that act as loads and the output impedance of the equipment that acts as the source. In order to reduce the complexity of
obtaining the input impedance, a model based on a controlled converter acting as
a constant power load (CPL) is commonly used. This article studies the accuracy
of this model for a topology commonly used in distribution systems nowadays, the Two-Level Voltage Source Converter (2L-VSC), and studies different scenarios that make the model inaccurate.
</p>
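<p>For reference, the standard constant power load model used in such stability studies: a load drawing constant power $P$ from a bus at operating voltage $V_0$ satisfies $i = P/v$, so its small-signal (incremental) input impedance is negative,
$$ Z_{\mathrm{in,CPL}} = \left(\frac{\partial i}{\partial v}\Big|_{v=V_0}\right)^{-1} = -\frac{V_0^2}{P}, $$
which is the destabilizing interaction that impedance-based criteria (e.g., Middlebrook's source-load impedance ratio condition) are designed to detect.</p>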
|
|
|
|
<p>Deep generative models (DGMs) have been widely developed for graph data.
However, much less investigation has been carried out on understanding the
latent space of such pretrained graph DGMs. These understandings possess the
potential to provide constructive guidelines for crucial tasks, such as graph
controllable generation. Thus in this work, we are interested in studying this
problem and propose GraphCG, a method for the unsupervised discovery of
steerable factors in the latent space of pretrained graph DGMs. We first
examine the representation space of three pretrained graph DGMs with six
disentanglement metrics, and we observe that the pretrained representation
space is entangled. Motivated by this observation, GraphCG learns the steerable
factors via maximizing the mutual information between semantic-rich directions,
where the controlled graph moving along the same direction will share the same
steerable factors. We quantitatively verify that GraphCG outperforms four
competitive baselines on two graph DGMs pretrained on two molecule datasets.
Additionally, we qualitatively illustrate seven steerable factors learned by
GraphCG on five pretrained DGMs over five graph datasets, including two for
molecules and three for point clouds.
</p>
|
|
|
|
<p>Personalized federated learning (PFL) has been widely investigated to address
the challenge of data heterogeneity, especially when a single generic model is
inadequate in satisfying the diverse performance requirements of local clients
simultaneously. Existing PFL methods are inherently based on the idea that the
relations between the generic global and personalized local models are captured
by the similarity of model weights. Such a similarity is primarily based on
either partitioning the model architecture into generic versus personalized
components, or modeling client relationships via model weights. To better
capture similar (yet distinct) generic versus personalized model
representations, we propose \textit{spectral distillation}, a novel
distillation method based on model spectrum information. Building upon spectral
distillation, we also introduce a co-distillation framework that establishes a
two-way bridge between generic and personalized model training. Moreover, to
utilize the local idle time in conventional PFL, we propose a wait-free local
training protocol. Through extensive experiments on multiple datasets over
diverse heterogeneous data settings, we demonstrate the superior performance and efficacy of our proposed spectral co-distillation method, as well as our
wait-free training protocol.
</p>
|
|
|
|
<p>A key challenge for supporting elastic behaviour in cloud systems is to
achieve a good performance in automated (de-)provisioning and scheduling of
computing resources. One of the key aspects that can be significant is the
overheads associated with deploying, terminating and maintaining resources.
Therefore, due to their lower start-up and termination overhead, containers are
rapidly replacing Virtual Machines (VMs) in many cloud deployments, as the
computation instance of choice. In this paper, we analyse the performance of
Kubernetes achieved through a Petri net-based performance model. Kubernetes is
a container management system for a distributed cluster environment. Our model
can be characterised using data from a Kubernetes deployment, and can be
exploited for supporting capacity planning and designing Kubernetes-based
elastic applications.
</p>
|
|
|
|
<p>The increased application of machine learning (ML) in sensitive domains
requires protecting the training data through privacy frameworks, such as
differential privacy (DP). DP requires specifying a uniform privacy level
$\varepsilon$ that expresses the maximum privacy loss that each data point in
the entire dataset is willing to tolerate. Yet, in practice, different data
points often have different privacy requirements. Having to set one uniform
privacy level is usually too restrictive, often forcing a learner to guarantee
the stringent privacy requirement, at a large cost to accuracy. To overcome
this limitation, we introduce our novel Personalized-DP Output Perturbation method (PDP-OP) that enables training Ridge regression models with individual
per data point privacy levels. We provide rigorous privacy proofs for our
PDP-OP as well as accuracy guarantees for the resulting model. This work is the
first to provide such theoretical accuracy guarantees when it comes to
personalized DP in machine learning, whereas previous work only provided
empirical evaluations. We empirically evaluate PDP-OP on synthetic and real
datasets and with diverse privacy distributions. We show that by enabling each
data point to specify their own privacy requirement, we can significantly
improve the privacy-accuracy trade-offs in DP. We also show that PDP-OP
outperforms the personalized privacy techniques of Jorgensen et al. (2015).
</p>
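<p>The output-perturbation mechanism is simple to sketch in its generic (non-personalized) form: solve the ridge problem, then add noise with density proportional to $\exp(-\varepsilon \lVert b\rVert_2/\Delta)$, where $\Delta$ is an L2-sensitivity bound. PDP-OP extends this to honor a different $\varepsilon$ per data point; the sensitivity value below is a placeholder, not the paper's bound:</p>
<pre><code>
import numpy as np

def ridge_output_perturbation(X, y, lam, epsilon, delta_f, rng):
    """Generic output perturbation: exact ridge solution plus noise with
    density proportional to exp(-epsilon * ||b||_2 / delta_f)."""
    d = X.shape[1]
    theta = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
    # Sample the noise: uniform direction, Gamma(d, delta_f/epsilon) radius,
    # which realizes the high-dimensional Laplace-type density above.
    direction = rng.normal(size=d)
    direction /= np.linalg.norm(direction)
    radius = rng.gamma(shape=d, scale=delta_f / epsilon)
    return theta + radius * direction

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)); y = X @ np.ones(5) + rng.normal(0, 0.1, 200)
# delta_f = 0.05 is a placeholder sensitivity, assumed for illustration.
theta_priv = ridge_output_perturbation(X, y, lam=1.0, epsilon=1.0,
                                       delta_f=0.05, rng=rng)
print(theta_priv)
</code></pre>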
|
|
|
|
<p>This technical report details our work towards building an enhanced
audio-visual sound event localization and detection (SELD) network. We build on
top of the audio-only SELDnet23 model and adapt it to be audio-visual by
merging both audio and video information prior to the gated recurrent unit
(GRU) of the audio-only network. Our model leverages YOLO and DETIC object
detectors. We also build a framework that implements audio-visual data
augmentation and audio-visual synthetic data generation. We deliver an
audio-visual SELDnet system that outperforms the existing audio-visual SELD
baseline.
</p>
|
|
|
|
<p>More than 70 years ago, Jacques Riguet suggested the existence of an
``analogie frappante'' (striking analogy) between so-called ``relations de
Ferrers'' and a class of difunctional relations, members of which we call
``diagonals''. Inspired by his suggestion, we formulate an ``analogie
frappante'' linking the notion of a block-ordered relation and the notion of
the diagonal of a relation. We formulate several novel properties of the
core/index of a diagonal, and use these properties to rephrase our ``analogie
frappante''. Loosely speaking, we show that a block-ordered relation is a
provisional ordering up to isomorphism and reduction to its core. (Our theorems
make this informal statement precise.) Unlike Riguet (and others who follow his
example), we avoid almost entirely the use of nested complements to express and
reason about properties of these notions: we use factors (aka residuals)
instead. The only (and inevitable) exception to this is to show that our
definition of a ``staircase'' relation is equivalent to Riguet's definition of
a ``relation de Ferrers''. Our ``analogie frappante'' also makes it obvious
that a ``staircase'' relation is not necessarily block-ordered, in spite of the
mental picture of such a relation presented by Riguet.
</p>
|
|
|
|
<p>Singing voice conversion (SVC) automates song covers by converting one
singer's singing voice into another target singer's singing voice with the
original lyrics and melody. However, it raises serious concerns about copyright
and civil right infringements to multiple entities. This work proposes
SongBsAb, the first proactive approach to mitigate unauthorized SVC-based
illegal song covers. SongBsAb introduces human-imperceptible perturbations to
singing voices before releasing them, so that when they are used, the
generation process of SVC will be interfered, resulting in unexpected singing
voices. SongBsAb features a dual prevention effect by causing both (singer)
identity disruption and lyric disruption, namely, the SVC-covered singing voice
neither imitates the target singer nor preserves the original lyrics. To
improve the imperceptibility of perturbations, we refine a psychoacoustic
model-based loss with the backing track as an additional masker, a unique
accompanying element for singing voices compared to ordinary speech voices. To
enhance the transferability, we propose to utilize a frame-level interaction
reduction-based loss. We demonstrate the prevention effectiveness, utility, and
robustness of SongBsAb on three SVC models and two datasets using both
objective and human study-based subjective metrics. Our work fosters an
emerging research direction for mitigating illegal automated song covers.
</p>
|
|
|
|
<p>The principal component analysis (PCA) is widely used for data decorrelation
and dimensionality reduction. However, the use of PCA may be impractical in real-time applications, or in situations where energy and computing constraints are severe. In this context, the discrete cosine transform (DCT) becomes a
low-cost alternative to data decorrelation. This paper presents a method to
derive computationally efficient approximations to the DCT. The proposed method
aims at the minimization of the angle between the rows of the exact DCT matrix
and the rows of the approximated transformation matrix. The resulting transformation matrices are orthogonal and have extremely low arithmetic
complexity. Considering popular performance measures, one of the proposed
transformation matrices outperforms the best competitors in both matrix error
and coding capabilities. Practical applications in image and video coding
demonstrate the relevance of the proposed transformation. In fact, we show that
the proposed approximate DCT can outperform the exact DCT for image encoding
under certain compression ratios. The proposed transform and its direct
competitors are also physically realized as digital prototype circuits using
FPGA technology.
</p>
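<p>The row-angle criterion is easy to illustrate with the well-known signed-DCT approximation (taking the sign of each exact DCT entry), which is not one of the paper's proposed matrices:</p>
<pre><code>
import numpy as np

# Orthonormal 8-point DCT-II matrix.
N = 8
C = np.array([[np.cos((2 * j + 1) * k * np.pi / (2 * N)) for j in range(N)]
              for k in range(N)])
C[0] *= 1 / np.sqrt(2)
C *= np.sqrt(2 / N)

# Multiplier-free approximation: keep only the signs of the entries.
T = np.sign(C)

# Angle between corresponding rows of the exact and approximate transforms.
for k in range(N):
    cosang = (C[k] @ T[k]) / (np.linalg.norm(C[k]) * np.linalg.norm(T[k]))
    ang = np.degrees(np.arccos(np.clip(cosang, -1.0, 1.0)))
    print(f"row {k}: angle = {ang:.2f} deg")
</code></pre>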
|
|
|
|
<p>Brain-computer interfaces (BCIs) use brain signals such as
electroencephalography to reflect user intention and enable two-way
communication between computers and users. BCI technology has recently received
much attention in healthcare applications, such as neurorehabilitation and
diagnosis. BCI applications can also control external devices using only brain
activity, which can help people with physical or mental disabilities,
especially those suffering from neurological and neuromuscular diseases such as
stroke and amyotrophic lateral sclerosis. Motor imagery (MI) has been widely
used for BCI-based device control, but we adopted intuitive visual motion
imagery to overcome the weakness of MI. In this study, we developed a
three-dimensional (3D) BCI training platform to induce users to imagine
upper-limb movements used in real-life activities (picking up a cell phone,
pouring water, opening a door, and eating food). We collected intuitive visual
motion imagery data and proposed a deep learning network based on functional
connectivity as a mind-reading technique. As a result, the proposed network
recorded a high classification performance on average (71.05%). Furthermore, we
applied the leave-one-subject-out approach to confirm the possibility of
improvements in subject-independent classification performance. This study will contribute to the development of BCI-based healthcare applications for rehabilitation, such as robotic arms and wheelchairs, or for assistance in daily life.
</p>
|
|
|
|
<p>In this work, we verify the mutable LongMap from the Scala standard library,
a hash table using open addressing within a single array, using the Stainless
program verifier. As a reference implementation, we write an immutable map
based on a list of tuples. We then show that LongMap's operations correspond to
operations of this association list. To express the resizing of the hash table
array, we introduce a new reference swapping construct in Stainless. This
allows us to apply the decorator pattern without introducing aliasing. Our
verification effort led us to find and fix a bug in the original implementation
that manifests for large hash tables. Our performance analysis shows the verified version to be within a factor of 1.5 of the original data structure.
</p>
|
|
|
|
<p>In large-scale applications including medical imaging, collocation
differential equation solvers, and estimation with differential privacy, the
underlying linear inverse problem can be reformulated as a streaming problem.
In theory, the streaming problem can be effectively solved using
memory-efficient, exponentially-converging streaming solvers. In practice, a
streaming solver's effectiveness is undermined if it is stopped before, or
well-after, the desired accuracy is achieved. In special cases when the
underlying linear inverse problem is finite-dimensional, streaming solvers can
periodically evaluate the residual norm at a substantial computational cost. When the underlying system is infinite-dimensional, streaming solvers can only access noisy estimates of the residual. While such noisy estimates are computationally efficient, they are useful only when their accuracy is known.
In this work, we rigorously develop a general family of
computationally-practical residual estimators and their uncertainty sets for
streaming solvers, and we demonstrate the accuracy of our methods on a number
of large-scale linear problems. Thus, we further enable the practical use of
streaming solvers for important classes of linear inverse problems.
</p>
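<p>A generic sketch (not the paper's estimators) conveys the setting: a streaming randomized-Kaczmarz solver touches one row of $Ax=b$ at a time, and the residual norm is estimated cheaply and unbiasedly from a small random sample of rows:</p>
<pre><code>
import numpy as np

rng = np.random.default_rng(0)
m, n = 2000, 50
A = rng.normal(size=(m, n))
x_true = rng.normal(size=n)
b = A @ x_true                                   # consistent system

x = np.zeros(n)
for it in range(20000):
    i = rng.integers(m)                          # streaming row access
    a = A[i]
    x += (b[i] - a @ x) / (a @ a) * a            # randomized Kaczmarz update
    if it % 5000 == 4999:
        # Unbiased residual-norm estimate from 100 sampled rows: the full
        # residual (m rows) is never formed.
        S = rng.integers(m, size=100)
        est = np.sqrt(m / 100 * np.sum((b[S] - A[S] @ x) ** 2))
        print(f"iter {it + 1}: estimated ||b - Ax|| ~ {est:.2e}")
</code></pre>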
|
|
|
|
<p>Decentralized optimization is gaining increased traction due to its
widespread applications in large-scale machine learning and multi-agent
systems. The same mechanism that enables its success, i.e., information sharing
among participating agents, however, also leads to the disclosure of individual
agents' private information, which is unacceptable when sensitive data are
involved. As differential privacy is becoming a de facto standard for privacy preservation, results have recently emerged integrating differential privacy with distributed optimization. However, directly incorporating differential
privacy design in existing distributed optimization approaches significantly
compromises optimization accuracy. In this paper, we propose to redesign and
tailor gradient methods for differentially-private distributed optimization,
and propose two differential-privacy oriented gradient methods that can ensure
both rigorous epsilon-differential privacy and optimality. The first algorithm
is based on static-consensus based gradient methods, and the second algorithm
is based on dynamic-consensus (gradient-tracking) based distributed
optimization methods and, hence, is applicable to general directed interaction
graph topologies. Both algorithms can simultaneously ensure almost sure
convergence to an optimal solution and a finite privacy budget, even when the
number of iterations goes to infinity. To our knowledge, this is the first time
that both goals are achieved simultaneously. Numerical simulations using a
distributed estimation problem and experimental results on a benchmark dataset
confirm the effectiveness of the proposed approaches.
</p>
|
|
|
|
<p>We consider the dynamics of an elastic continuum under large deformation but
small strain. Such systems can be described by the equations of geometrically
nonlinear elastodynamics in combination with the St. Venant-Kirchhoff material
law. The velocity-stress formulation of the problem turns out to have a formal
port-Hamiltonian structure. In contrast to the linear case, the operators of
the problem are modulated by the displacement field which can be handled as a
passive variable and integrated along with the velocities. A weak formulation
of the problem is derived and essential boundary conditions are incorporated
via Lagrange multipliers. This variational formulation explicitly encodes the
transfer between kinetic and potential energy in the interior as well as across
the boundary, thus leading to a global power balance and ensuring passivity of
the system. The particular geometric structure of the weak formulation can be
preserved under Galerkin approximation via appropriate mixed finite elements.
In addition, a fully discrete power balance can be obtained by appropriate time
discretization. The main properties of the system and its discretization are
shown theoretically and demonstrated by numerical tests.
</p>
|
|
|
|
<p>Flattening, which converts multi-dimensional feature maps or images into
one-dimensional vectors, is an essential operation in computer vision. However,
existing flattening approaches neglect the preservation of local smoothness,
which can limit the representational learning capacity of vision models. In
this paper, we propose Hilbert curve flattening as a method for preserving
locality in flattened matrices. We compare it with the commonly used Zigzag
operation and demonstrate that Hilbert curve flattening better retains the
spatial relationships and local smoothness of the original grid structure,
while remaining robust to input scale variance. We also introduce the
Localformer, a vision transformer architecture that incorporates
Hilbert token sampling with a token aggregator to enhance its locality bias.
Extensive experiments on image classification and semantic segmentation tasks
demonstrate that the Localformer outperforms baseline models consistently. We
also show it brings consistent performance boosts for other popular
architectures (e.g. MLP-Mixer).
</p>
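<p>For concreteness, here is a minimal sketch of the flattening operation itself (the Localformer's token sampling and aggregation are not reproduced); this is the classic bit-twiddling construction of the Hilbert curve on a power-of-two grid:</p>
<pre>
def hilbert_d2xy(n, d):
    """Map index d along the Hilbert curve to (x, y) on an n x n grid,
    where n is a power of two."""
    x = y = 0
    s, t = 1, d
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:                       # rotate the quadrant if needed
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x, y = x + s * rx, y + s * ry
        t //= 4
        s *= 2
    return x, y

def hilbert_flatten(grid):
    """Flatten a square 2-D array along the Hilbert curve, so that
    consecutive tokens stay spatially close (unlike Zigzag)."""
    n = len(grid)
    return [grid[y][x] for x, y in (hilbert_d2xy(n, d) for d in range(n * n))]
</pre>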
|
|
|
|
<p>Algorithmic paradigms such as divide-and-conquer (D&C) have been proposed to
guide developers in designing efficient algorithms, but it can still be
difficult to
apply algorithmic paradigms to practical tasks. To ease the usage of paradigms,
many research efforts have been devoted to the automatic application of
algorithmic paradigms. However, most existing approaches to this problem rely
on syntax-based program transformations and thus put significant restrictions
on the original program.
</p>
<p>In this paper, we study the automatic application of D&C and several similar
paradigms, denoted as D&C-like algorithmic paradigms, and aim to remove the
restrictions from syntax-based transformations. To achieve this goal, we
propose an efficient synthesizer, named AutoLifter, which does not depend on
syntax-based transformations. Specifically, the main challenge of applying
algorithmic paradigms stems from the large scale of the synthesized programs.
AutoLifter addresses this challenge by applying two novel decomposition
methods that do not depend on the syntax of the input program, component
elimination and variable elimination, to soundly divide the whole problem into
simpler subtasks, each of which synthesizes a sub-program of the final program
and is tractable for existing synthesizers.
</p>
<p>We evaluate AutoLifter on 96 programming tasks related to 6 different
algorithmic paradigms. AutoLifter solves 82/96 tasks with an average time cost
of 20.17 seconds, significantly outperforming existing approaches.
</p>
|
|
|
|
<p>This paper explores optimal service resource management strategies, a
continuing challenge for health information services seeking to enhance
service performance, optimise service resource utilisation and deliver
interactive health information services. An adaptive optimal service resource
management strategy was developed, considering a value co-creation model in
health information service, with a focus on collaboration and interaction with
users.
The deep reinforcement learning algorithm was embedded in the Internet of
Things (IoT)-based health information service system (I-HISS) to allocate
service resources by controlling service provision and service adaptation based
on user engagement behaviour. The simulation experiments were conducted to
evaluate the significance of the proposed algorithm under different user
reactions to the health information service.
</p>
|
|
|
|
<p>As with many other tasks, neural networks prove very effective for anomaly
detection purposes. However, very few deep-learning models are suited for
detecting anomalies on tabular datasets. This paper proposes a novel
methodology to flag anomalies based on TracIn, an influence measure initially
introduced for explainability purposes. The proposed methods can serve to
augment any unsupervised deep anomaly detection method. We test our approach
using Variational Autoencoders and show that the average influence of a
subsample of training points on a test point can serve as a proxy for
abnormality. Our model proves to be competitive in comparison with
state-of-the-art approaches: it achieves comparable or better performance in
terms of detection accuracy on medical and cyber-security tabular benchmark
data.
</p>
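<p>To make the abnormality proxy concrete, here is a hedged sketch of the influence computation with a plain logistic model standing in for the unsupervised detector (the paper applies the idea to Variational Autoencoders; function names are ours):</p>
<pre>
import numpy as np

def grad_logistic(w, x, y):
    """Per-example gradient of the logistic loss for a linear model."""
    p = 1.0 / (1.0 + np.exp(-x @ w))
    return (p - y) * x

def influence_score(w_checkpoints, train_subsample, x_test, y_test):
    """TracIn-style influence: average over checkpoints of the dot
    product between training and test gradients. The mean influence of
    a training subsample on a test point serves as an abnormality proxy."""
    score = 0.0
    for w in w_checkpoints:
        g_test = grad_logistic(w, x_test, y_test)
        g_train = np.mean([grad_logistic(w, x, y)
                           for x, y in train_subsample], axis=0)
        score += g_train @ g_test
    return score / len(w_checkpoints)
</pre>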
|
|
|
|
<p>Evaluation of intervention in a multi-agent system, e.g., when humans should
intervene in autonomous driving systems and when a player should pass to
teammates for a good shot, is challenging in various engineering and scientific
fields. Estimating the individual treatment effect (ITE) using counterfactual
long-term prediction is practical for evaluating such interventions. However,
most conventional frameworks do not consider the time-varying complex
structure of multi-agent relationships or the counterfactual prediction of
covariates, which may lead to erroneous ITE assessments and difficulties in
interpretation.
Here we propose an interpretable, counterfactual recurrent network in
multi-agent systems to estimate the effect of the intervention. Our model
leverages graph variational recurrent neural networks and theory-based
computation with domain knowledge for the ITE estimation framework based on
long-term prediction of multi-agent covariates and outcomes, which can confirm
the circumstances under which the intervention is effective. On simulated
models of an automated vehicle and biological agents with time-varying
confounders, we show that our methods achieved lower estimation errors for
counterfactual covariates and identified more effective treatment timing than
the baselines. Furthermore, using real basketball data, our methods performed
realistic counterfactual predictions and evaluated the counterfactual passes in
shot scenarios.
</p>
|
|
|
|
<p>Optical sensors can capture dynamic environments and derive depth information
in near real-time. The quality of these digital reconstructions is determined
by factors like illumination, surface and texture conditions, sensing speed and
other sensor characteristics as well as the sensor-object relations.
Improvements can be obtained by using dynamically collected data from multiple
sensors. However, matching the data from multiple sensors requires a shared
world coordinate system. We present a concept for transferring multi-sensor
data into a commonly referenced world coordinate system: the earth's magnetic
field. The steady presence of our planetary magnetic field provides a reliable
world coordinate system, which can serve as a reference for a position-defined
reconstruction of dynamic environments. Our approach is evaluated using
magnetic field sensors of the ZED 2 stereo camera from Stereolabs, which
provides orientation relative to the North Pole similar to a compass. With the
help of inertial measurement unit information, each camera's position data can
be transferred into the unified world coordinate system. Our evaluation reveals
the level of quality attainable using the earth's magnetic field and provides a basis
for dynamic and real-time-based applications of optical multi-sensors for
environment detection.
</p>
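<p>A simplified, planar sketch of the underlying transformation (the actual pipeline fuses the full IMU orientation; here only the compass heading is used, and all names are ours):</p>
<pre>
import numpy as np

def to_world_frame(p_cam, heading_deg, cam_pos_world):
    """Rotate a camera-local 2-D point into a north-aligned world frame
    using the compass heading (degrees from magnetic north), then
    translate by the camera's position in that frame."""
    h = np.deg2rad(heading_deg)
    R = np.array([[np.cos(h), -np.sin(h)],
                  [np.sin(h),  np.cos(h)]])
    return R @ np.asarray(p_cam) + np.asarray(cam_pos_world)
</pre>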
|
|
|
|
<p>The co-optimization of behind-the-meter distributed energy resources is
considered for prosumers under the net energy metering tariff. The distributed
energy resources considered include renewable generations, flexible demands,
and battery energy storage systems. An energy management system co-schedules
the consumptions and battery storage based on locally available stochastic
renewables by maximizing the expected operation surplus. A stochastic dynamic
programming formulation is introduced for which structural properties of the
dynamic optimization are derived. A closed-form myopic co-optimization
algorithm is proposed, which achieves optimality when the storage capacity
constraints are nonbinding. The proposed co-optimization algorithm has linear
computation complexity and can be implemented in a decentralized fashion. The
myopic co-optimization algorithm's performance and the economic benefits of the
co-optimization policy to prosumers and grid operations are evaluated in
numerical simulations.
</p>
|
|
|
|
<p>The last decade has shown a growing interest in robots as well-being coaches.
However, insightful guidelines for the design of robots as coaches to promote
mental well-being have not yet been proposed. This paper details design and
ethical recommendations based on a qualitative analysis drawing on a grounded
theory approach, which was conducted with a three-step iterative design process
which included user-centred design studies involving robotic well-being
coaches, namely: (1) a user-centred design study conducted with 11 participants
consisting of both prospective users who had participated in a Brief
Solution-Focused Practice study with a human coach, as well as coaches of
different disciplines, (2) semi-structured individual interview data gathered
from 20 participants attending a Positive Psychology intervention study with
the robotic well-being coach Pepper, and (3) a user-centred design study
conducted with 3 participants of the Positive Psychology study as well as 2
relevant well-being coaches. After conducting a thematic analysis and a
qualitative analysis, we collated the data gathered into convergent and
divergent themes, and we distilled from those results a set of design
guidelines and ethical considerations. Our findings can inform researchers and
roboticists on the key aspects to take into account when designing robotic
mental well-being coaches.
</p>
|
|
|
|
<p>By abstracting over well-known properties of De Bruijn's representation with
nameless dummies, we design a new theory of syntax with variable binding and
capture-avoiding substitution. We propose it as a simpler alternative to Fiore,
Plotkin, and Turi's approach, with which we establish a strong formal link. We
also show that our theory easily incorporates simple types and equations
between terms.
</p>
|
|
|
|
<p>Famous people, such as celebrities and influencers, are harassed online on a
daily basis. Online harassment mentally disturbs them and negatively affects
society. However, limited studies have been conducted on the online harassment
victimization of famous people, and its effects remain unclear. We surveyed
Japanese famous people ($N=213$), who were influential people who appeared on
television and other traditional media and on social media, regarding online
harassment victimization, emotional injury, and action against offenders and
revealed that various forms of online harassment are prevalent. Some victims
used the anti-harassment functions provided by weblogs and social media systems
(e.g., blocking/muting/reporting offender accounts and closing comment forms),
talked about their victimization to close people, and contacted relevant
authorities to take legal action (talent agencies, legal consultants, and
police). By contrast, some victims felt compelled to accept harassment and did
not initiate action for offenses. We propose several approaches to support
victims, inhibit online harassment, and educate people. Our findings can help
platforms establish support systems against online harassment.
</p>
|
|
|
|
<p>Deep Reinforcement Learning (DRL) has achieved impressive performance in
robotics and autonomous systems (RAS). A key challenge to its deployment in
real-life operations is the presence of spuriously unsafe DRL policies.
Unexplored states may lead the agent to make wrong decisions that could result
in hazards, especially in applications where DRL-trained end-to-end controllers
govern the behaviour of RAS. This paper proposes a novel quantitative
reliability assessment framework for DRL-controlled RAS, leveraging
verification evidence generated from formal reliability analysis of neural
networks. A two-level verification framework is introduced to check the safety
property with respect to inaccurate observations that are due to, e.g.,
environmental noise and state changes. Reachability verification tools are
leveraged locally to generate safety evidence of trajectories. In contrast, at
the global level, we quantify the overall reliability as an aggregated metric
of local safety evidence, corresponding to a set of distinct tasks and their
occurrence probabilities. The effectiveness of the proposed verification
framework is demonstrated and validated via experiments on real RAS.
</p>
|
|
|
|
<p>We investigate composed image retrieval with text feedback. Users gradually
look for the target of interest by moving from coarse to fine-grained feedback.
However, existing methods merely focus on the latter, i.e., fine-grained
search, by harnessing positive and negative pairs during training. This
pair-based paradigm only considers the one-to-one distance between a pair of
specific points, which is not aligned with the one-to-many coarse-grained
retrieval process and compromises the recall rate. In an attempt to fill this
gap, we introduce a unified learning approach that simultaneously models
coarse- and fine-grained retrieval by considering multi-grained
uncertainty. The key idea underpinning the proposed method is to integrate
fine- and coarse-grained retrieval as matching data points with small and large
fluctuations, respectively. Specifically, our method contains two modules:
uncertainty modeling and uncertainty regularization. (1) The uncertainty
modeling simulates the multi-grained queries by introducing identically
distributed fluctuations in the feature space. (2) Based on the uncertainty
modeling, we further introduce uncertainty regularization to adapt the matching
objective according to the fluctuation range. Compared with existing methods,
the proposed strategy explicitly prevents the model from pushing away potential
candidates in the early stage, and thus improves the recall rate. On the three
public datasets, i.e., FashionIQ, Fashion200k, and Shoes, the proposed method
has achieved +4.03%, +3.38%, and +2.40% Recall@50 accuracy over a strong
baseline, respectively.
</p>
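<p>One way to instantiate the two modules, as a hedged sketch (the fluctuation schedule, margin rule, and names are ours, not the paper's exact formulation):</p>
<pre>
import torch
import torch.nn.functional as F

def multi_grained_loss(q, t, sigma, margin_base=0.2):
    """Uncertainty modeling: jitter query embeddings with per-sample
    noise scale sigma (large sigma mimics a coarse-grained query).
    Uncertainty regularization: relax the margin as the fluctuation
    grows, so potential candidates are not pushed away too early.

    q, t:  (B, D) L2-normalized query / target embeddings
    sigma: (B,) noise scales in [0, 1]
    """
    q_noisy = F.normalize(q + sigma.unsqueeze(1) * torch.randn_like(q), dim=1)
    sim = q_noisy @ t.T                                  # (B, B) similarities
    pos = sim.diag()
    mask = torch.eye(len(q), dtype=torch.bool, device=q.device)
    neg = sim.masked_fill(mask, float('-inf')).max(dim=1).values
    margin = margin_base * (1.0 - sigma)                 # softer for noisy queries
    return F.relu(neg - pos + margin).mean()
</pre>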
|
|
|
|
<p>We introduce and analyze an improved variant of nearest neighbors (NN) for
estimation with missing data in latent factor models. We consider a matrix
completion problem with missing data, where the $(i, t)$-th entry, when
observed, is given by its mean $f(u_i, v_t)$ plus mean-zero noise for an
unknown function $f$ and latent factors $u_i$ and $v_t$. Prior NN strategies
for estimating the mean $f(u_i, v_t)$, such as unit-unit NN, rely on the
existence of other rows $j$ with $u_j \approx u_i$. Similarly, the time-time NN
strategy relies on the existence of columns $t'$ with $v_{t'} \approx v_t$.
These strategies perform poorly when similar rows or similar columns,
respectively, are not available. Our estimate is doubly robust to this deficit
in two ways: (1)
As long as there exist either good row or good column neighbors, our estimate
provides a consistent estimate. (2) Furthermore, if both good row and good
column neighbors exist, it provides a (near-)quadratic improvement in the
non-asymptotic error and admits a significantly narrower asymptotic confidence
interval than both unit-unit and time-time NN.
</p>
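<p>A small sketch of the cross-difference form that makes such an estimate doubly robust, under our reading (index sets and names are ours):</p>
<pre>
import numpy as np

def doubly_robust_nn(Y, obs, i, t, row_nbrs, col_nbrs):
    """Combine row neighbors j (u_j close to u_i) and column neighbors
    t' (v_t' close to v_t):

        est = mean over (j, t') of  Y[j, t] + Y[i, t'] - Y[j, t']

    Bias from poor row neighbors can be corrected by good column
    neighbors, and vice versa, so either kind of neighbor suffices."""
    vals = [Y[j, t] + Y[i, tp] - Y[j, tp]
            for j in row_nbrs for tp in col_nbrs
            if obs[j, t] and obs[i, tp] and obs[j, tp]]
    return float(np.mean(vals))
</pre>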
|
|
|
|
<p>Despite the impressive advancements achieved through vision-and-language
pretraining, it remains unclear whether this joint learning paradigm can help
understand each individual modality. In this work, we conduct a comparative
analysis of the visual representations in existing vision-and-language models
and vision-only models by probing a broad range of tasks, aiming to assess the
quality of the learned representations in a nuanced manner. Interestingly, our
empirical observations suggest that vision-and-language models are better at
label prediction tasks like object and attribute prediction, while vision-only
models are stronger at dense prediction tasks that require more localized
information. We hope our study sheds light on the role of language in visual
learning, and serves as an empirical guide for various pretrained models. Code
will be released at https://github.com/Lizw14/visual_probing
</p>
|
|
|
|
<p>Reversible debuggers help programmers to find the causes of misbehaviours in
concurrent programs more quickly, by executing a program backwards from the
point where a misbehaviour was observed, and looking for the bug(s) that caused
it. Reversible debuggers can be founded on the well-studied theory of
causal-consistent reversibility, which only allows one to undo an action
provided that its consequences, if any, are undone beforehand.
Causal-consistent reversibility yields more efficient debugging by reducing the
number of states to be explored when looking backwards. Until now,
causal-consistent reversibility has never considered time, which is a key
aspect of real-world applications. Here, we study the interplay between
reversibility and time in concurrent systems via a process algebra. The
Temporal Process Language (TPL) by Hennessy and Regan is a well-understood
extension of CCS with discrete-time and a timeout operator. We define revTPL, a
reversible extension of TPL, and we show that it satisfies the properties
expected from a causal-consistent reversible calculus. We show that,
alternatively, revTPL can be interpreted as an extension of reversible CCS with
time.
</p>
|
|
|
|
<p>There is increasing adoption of artificial intelligence in drug discovery.
However, existing machine learning studies mainly utilize the chemical
structures of molecules and ignore the vast textual knowledge available in
chemistry. Incorporating textual knowledge enables us to realize new drug
design objectives, adapt to text-based instructions and predict complex
biological activities. Here we present a multi-modal molecule structure-text
model, MoleculeSTM, by jointly learning molecules' chemical structures and
textual descriptions via a contrastive learning strategy. To train MoleculeSTM,
we construct a large multi-modal dataset, namely, PubChemSTM, with over 280,000
chemical structure-text pairs. To demonstrate the effectiveness and utility of
MoleculeSTM, we design two challenging zero-shot tasks based on text
instructions, including structure-text retrieval and molecule editing.
MoleculeSTM has two main properties: open vocabulary and compositionality via
natural language. In experiments, MoleculeSTM obtains the state-of-the-art
generalization ability to novel biochemical concepts across various benchmarks.
</p>
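<p>The contrastive strategy in question is, in essence, a symmetric InfoNCE objective over paired structure and text embeddings; a minimal sketch (encoders omitted; the temperature and names are ours):</p>
<pre>
import torch
import torch.nn.functional as F

def structure_text_contrastive(z_struct, z_text, tau=0.07):
    """Pull matched structure/text pairs together and push apart
    mismatched pairs within the batch."""
    z_s = F.normalize(z_struct, dim=1)
    z_t = F.normalize(z_text, dim=1)
    logits = z_s @ z_t.T / tau                   # (B, B) similarity matrix
    labels = torch.arange(len(z_s), device=logits.device)
    return 0.5 * (F.cross_entropy(logits, labels) +
                  F.cross_entropy(logits.T, labels))
</pre>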
|
|
|
|
<p>The relationship between electricity demand and weather is well established
in power systems, along with the importance of behavioral and social aspects
such as holidays and significant events. This study explores the link between
electricity demand and more nuanced information about social events. This is
done using mature Natural Language Processing (NLP) and demand forecasting
techniques. The results indicate that day-ahead forecasts are improved by
textual features such as word frequencies, public sentiments, topic
distributions, and word embeddings. The social events contained in these
features include global pandemics, politics, international conflicts,
transportation, etc. Causal effects and correlations are discussed to propose
explanations for the mechanisms behind the highlighted links. This study aims
to bring a new perspective to traditional electricity demand analysis. It
confirms the feasibility of improving forecasts from unstructured text, with
potential consequences for sociology and economics.
</p>
|
|
|
|
<p>Estimating depth from images nowadays yields outstanding results, both in
terms of in-domain accuracy and generalization. However, we identify two main
challenges that remain open in this field: dealing with non-Lambertian
materials and effectively processing high-resolution images. Purposely, we
propose a novel dataset that includes accurate and dense ground-truth labels at
high resolution, featuring scenes containing several specular and transparent
surfaces. Our acquisition pipeline leverages a novel deep space-time stereo
framework, enabling easy and accurate labeling with sub-pixel precision. The
dataset comprises 606 samples collected in 85 different scenes; each sample
includes both a high-resolution pair (12 Mpx) and an unbalanced
stereo pair (Left: 12 Mpx, Right: 1.1 Mpx), typical of modern mobile devices
that mount sensors with different resolutions. Additionally, we provide
manually annotated material segmentation masks and 15K unlabeled samples. The
dataset is split into a train set and two test sets, the latter devoted to the
evaluation of stereo and monocular depth estimation networks. Our experiments
highlight the open challenges and future research directions in this field.
</p>
|
|
|
|
<p>The Shapley value, which is arguably the most popular approach for assigning
a meaningful contribution value to players in a cooperative game, has recently
been used intensively in explainable artificial intelligence. Its
meaningfulness is due to axiomatic properties that only the Shapley value
satisfies, which, however, comes at the expense of an exact computation growing
exponentially with the number of agents. Accordingly, a number of works are
devoted to the efficient approximation of the Shapley value, most of which
revolve around the notion of an agent's marginal contribution. In this paper,
we propose SVARM and Stratified SVARM, two parameter-free and
domain-independent approximation algorithms based on a representation of the
Shapley value detached from the notion of marginal contribution. We prove
unmatched theoretical guarantees regarding their approximation quality and
provide empirical results on synthetic games as well as common explainability
use cases, comparing against state-of-the-art methods.
</p>
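<p>For context, the marginal-contribution-based Monte Carlo estimator that such a representation departs from looks roughly as follows (a standard permutation-sampling baseline, not the paper's algorithm):</p>
<pre>
import random

def shapley_permutation_mc(players, value, n_samples, seed=0):
    """Estimate Shapley values by averaging marginal contributions over
    random permutations; cost grows with n_samples * len(players) calls
    to the (typically expensive) value function."""
    rng = random.Random(seed)
    phi = {p: 0.0 for p in players}
    for _ in range(n_samples):
        perm = players[:]
        rng.shuffle(perm)
        coalition, prev = set(), value(set())
        for p in perm:
            coalition.add(p)
            cur = value(coalition)
            phi[p] += cur - prev
            prev = cur
    return {p: s / n_samples for p, s in phi.items()}
</pre>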
|
|
|
|
<p>In this work we explore the deliberate infusion of ambiguity into the design
of contracts. We show that when the agent is ambiguity-averse and chooses an
action that maximizes their max-min utility, then the principal can strictly
gain from using an ambiguous contract. We provide insights into the structure
of optimal contracts, and establish that optimal ambiguous contracts are
composed of simple contracts. We also provide a geometric characterization of
ambiguity-proof classes of contracts. Finally, we show that when the agent
considers mixed strategies, then there is no advantage in using an ambiguous
contract.
</p>
|
|
|
|
<p>In this paper we adapt previous work on rewriting string diagrams using
hypergraphs to the case where the underlying category has a traced comonoid
structure, in which wires can be forked and the outputs of a morphism can be
connected to its input. Such a structure is particularly interesting because
any traced Cartesian (dataflow) category has an underlying traced comonoid
structure. We show that certain subclasses of hypergraphs are fully complete
for traced comonoid categories: that is to say, every term in such a category
has a unique corresponding hypergraph up to isomorphism, and from every
hypergraph with the desired properties, a unique term in the category can be
retrieved up to the axioms of traced comonoid categories. We also show how the
framework of double pushout rewriting (DPO) can be adapted for traced comonoid
categories by characterising the valid pushout complements for rewriting in our
setting. We conclude by presenting a case study in the form of recent work on
an equational theory for sequential circuits: circuits built from primitive
logic gates with delay and feedback. The graph rewriting framework allows for
the definition of an operational semantics for sequential circuits.
</p>
|
|
|
|
<p>Let $G$ be an undirected graph. We say that $G$ contains a ladder of length
$k$ if the $2 \times (k+1)$ grid graph is an induced subgraph of $G$ that is
only connected to the rest of $G$ via its four corner points. We prove that if
all the ladders contained in $G$ are reduced to length 4, the treewidth remains
unchanged (and that this bound is tight). Our result indicates that, when
computing the treewidth of a graph, long ladders can simply be reduced, and
that minimal forbidden minors for bounded treewidth graphs cannot contain long
ladders. Our result also settles an open problem from algorithmic
phylogenetics: the common chain reduction rule, used to simplify the comparison
of two evolutionary trees, is treewidth-preserving in the display graph of the
two trees.
</p>
|
|
|
|
<p>Tennenbaum's theorem states that the only countable model of Peano arithmetic
(PA) with computable arithmetical operations is the standard model of natural
numbers. In this paper, we use constructive type theory as a framework to
revisit, analyze and generalize this result. The chosen framework allows for a
synthetic approach to computability theory, exploiting that, externally, all
functions definable in constructive type theory can be shown computable. We
then build on this viewpoint and furthermore internalize it by assuming a
version of Church's thesis, which expresses that any function on natural
numbers is representable by a formula in PA. This assumption provides for a
conveniently abstract setup to carry out rigorous computability arguments, even
in the theorem's mechanization. Concretely, we constructivize several classical
proofs and present one inherently constructive rendering of Tennenbaum's
theorem, all following arguments from the literature. Concerning the classical
proofs in particular, the constructive setting allows us to highlight
differences in their assumptions and conclusions which are not visible
classically. All versions are accompanied by a unified mechanization in the Coq
proof assistant.
</p>
|
|
|
|
<p>In this paper, we establish novel data-dependent upper bounds on the
generalization error through the lens of a "variable-size compressibility"
framework that we introduce here. In this framework, the generalization
error of an algorithm is linked to a variable-size 'compression rate' of its
input data. This is shown to yield bounds that depend on the empirical measure
of the given input data at hand, rather than its unknown distribution. The new
generalization bounds we establish are tail bounds, tail bounds on the
expectation, and in-expectation bounds. Moreover, it is shown that our
framework also allows one to derive general bounds on any function of the input
data and output hypothesis random variables. In particular, these general
bounds are shown to subsume and possibly improve over several existing
PAC-Bayes and data-dependent intrinsic dimension-based bounds that are
recovered as special cases, thus unveiling a unifying character of our
approach. For instance, a new data-dependent intrinsic dimension-based bound is
established, which connects the generalization error to the optimization
trajectories and reveals various interesting connections with the
rate-distortion dimension of a process, the R\'enyi information dimension of a
process, and the metric mean dimension.
</p>
|
|
|
|
<p>The pressing need for digitization of historical documents has led to a
strong interest in designing computerised image processing methods for
automatic handwritten text recognition. However, not much attention has been
paid to studying the handwritten text written in the margins, i.e. marginalia,
which also forms an important source of information. Moreover, training an
accurate and robust recognition system for marginalia calls for data-efficient
approaches due to the unavailability of sufficient amounts of annotated
multi-writer texts. Therefore, this work presents an end-to-end framework for
automatic detection and recognition of handwritten marginalia, and leverages
data augmentation and transfer learning to overcome training data scarcity. The
detection phase involves investigation of R-CNN and Faster R-CNN networks. The
recognition phase includes an attention-based sequence-to-sequence model, with
ResNet feature extraction, bidirectional LSTM-based sequence modeling, and
attention-based prediction of marginalia. The effectiveness of the proposed
framework has been empirically evaluated on the data from early book
collections found in the Uppsala University Library in Sweden. Source code and
pre-trained models are available on GitHub.
</p>
|
|
|
|
<p>Social network structures play an important role in the lives of animals by
affecting individual fitness, and the spread of disease and information.
Nevertheless, we still lack a good understanding of how these structures emerge
from the behaviour of individuals. Generative network models based on empirical
knowledge about animal social systems provide a powerful approach that can help
close this gap. In this study: 1) we develop a general model for the emergence
of social structures based on a key generative process of real animal social
networks, namely social preferences for traits (such as the age, sex, etc. of
social partners); 2) we use this model to investigate how different trait
preferences affect social network structure and function. We find that the
preferences used in a population can have far-reaching consequences for the
population, via effects on the transmission of disease and information and the
robustness of the social network against fragmentation when individuals
disappear. The study thus shows that social preferences can have consequences
that go far beyond direct benefits individuals gain from social partner
selection. It also shows that these consequences depend both on the preference
types, and on the types of traits they are used with. We discuss the
implications of the results for social evolution.
</p>
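<p>The generative process at the core of the model can be sketched very compactly; the trait types and preference function below are placeholders of our own choosing:</p>
<pre>
import numpy as np

def preference_network(traits, pref, seed=0):
    """Minimal trait-preference generative model: the probability that
    two individuals bond is given by a preference function over their
    traits. Returns a symmetric adjacency matrix."""
    rng = np.random.default_rng(seed)
    n = len(traits)
    A = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < pref(traits[i], traits[j]):
                A[i, j] = A[j, i] = 1
    return A

# e.g. homophily on a binary trait such as sex:
# A = preference_network(['f', 'm', 'f', 'm'],
#                        lambda a, b: 0.8 if a == b else 0.2)
</pre>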
|
|
|
|
<p>Deep learning has emerged as a strong alternative for classical iterative
methods for deformable medical image registration, where the goal is to find a
mapping between the coordinate systems of two images. Popular classical image
registration methods enforce the useful inductive biases of symmetry, inverse
consistency, and topology preservation by construction. However, while many
deep learning registration methods encourage these properties via loss
functions, no earlier method enforces all of them by construction. Here, we
propose a novel registration architecture, based on extracting multi-resolution
feature representations, that is by construction symmetric, inverse consistent,
and topology preserving. We also develop an implicit layer for memory-efficient
inversion of the deformation fields. Our method achieves state-of-the-art
registration accuracy on two datasets.
</p>
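<p>The deformation-field inversion computed by the implicit layer can be illustrated with the usual fixed-point iteration, here in one dimension for clarity (an implicit layer differentiates through the converged fixed point rather than storing every iterate):</p>
<pre>
import numpy as np

def invert_displacement(u, n_iter=50):
    """Invert a 1-D displacement field u (in pixels): find v such that
    v(x) = -u(x + v(x)), i.e. composing the two warps gives identity."""
    x = np.arange(len(u), dtype=float)
    v = np.zeros_like(u)
    for _ in range(n_iter):
        v = -np.interp(x + v, x, u)   # evaluate u at the warped positions
    return v
</pre>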
|
|
|
|
<p>Finding localized correspondences across different images of the same object
is crucial to understand its geometry. In recent years, this problem has seen
remarkable progress with the advent of deep learning-based local image features
and learnable matchers. Still, learnable matchers often underperform when
there exist only small regions of co-visibility between image pairs (i.e. wide
camera baselines). To address this problem, we leverage recent progress in
coarse single-view geometry estimation methods. We propose LFM-3D, a Learnable
Feature Matching framework that uses models based on graph neural networks and
enhances their capabilities by integrating noisy, estimated 3D signals to boost
correspondence estimation. When integrating 3D signals into the matcher model,
we show that a suitable positional encoding is critical to effectively make use
of the low-dimensional 3D information. We experiment with two different 3D
signals - normalized object coordinates and monocular depth estimates - and
evaluate our method on large-scale (synthetic and real) datasets containing
object-centric image pairs across wide baselines. We observe strong feature
matching improvements compared to 2D-only methods, with up to +6% total recall
and +28% precision at fixed recall. Additionally, we demonstrate that the
resulting improved correspondences lead to much higher relative posing accuracy
for in-the-wild image pairs - up to 8.6% compared to the 2D-only approach.
</p>
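<p>As an indication of what a suitable positional encoding of low-dimensional 3D signals might look like, here is a standard Fourier-feature encoding (the paper's exact choice may differ; dimensions and names are ours):</p>
<pre>
import numpy as np

def fourier_encode(coords, n_freqs=8):
    """Lift 3-D signals (e.g. normalized object coordinates or monocular
    depth) into a higher-dimensional embedding via sine/cosine features
    at geometrically spaced frequencies."""
    coords = np.asarray(coords, dtype=float)       # (N, 3)
    freqs = (2.0 ** np.arange(n_freqs)) * np.pi    # (n_freqs,)
    angles = coords[..., None] * freqs             # (N, 3, n_freqs)
    enc = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    return enc.reshape(len(coords), -1)            # (N, 3 * 2 * n_freqs)
</pre>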
|
|
|
|
<p>Inverse Reinforcement Learning (IRL) is a powerful set of techniques for
imitation learning that aims to learn a reward function that rationalizes
expert demonstrations. Unfortunately, traditional IRL methods suffer from a
computational weakness: they require repeatedly solving a hard reinforcement
learning (RL) problem as a subroutine. This is counter-intuitive from the
viewpoint of reductions: we have reduced the easier problem of imitation
learning to repeatedly solving the harder problem of RL. Another thread of work
has proved that access to the side-information of the distribution of states
where a strong policy spends time can dramatically reduce the sample and
computational complexities of solving an RL problem. In this work, we
demonstrate for the first time a more informed imitation learning reduction
where we utilize the state distribution of the expert to alleviate the global
exploration component of the RL subroutine, providing an exponential speedup in
theory. In practice, we find that we are able to significantly speed up the
prior art on continuous control tasks.
</p>
|
|
|
|
<p>Diffusion-weighted (DW) MRI measures the direction and scale of the local
diffusion process in every voxel through its spectrum in q-space, typically
acquired in one or more shells. Recent developments in micro-structure imaging
and multi-tissue decomposition have sparked renewed attention to the radial
b-value dependence of the signal. Applications in tissue classification and
micro-architecture estimation, therefore, require a signal representation that
extends over the radial as well as angular domain. Multiple approaches have
been proposed that can model the non-linear relationship between the DW-MRI
signal and biological microstructure. In the past few years, many deep
learning-based methods have been developed towards faster inference speed and
higher inter-scan consistency compared with traditional model-based methods
(e.g., multi-shell multi-tissue constrained spherical deconvolution). However,
a multi-stage learning strategy is typically required since the learning
process relies on various middle representations, such as simple harmonic
oscillator reconstruction (SHORE) representation. In this work, we present a
unified dynamic network with a single-stage spherical convolutional neural
network, which allows efficient fiber orientation distribution function (fODF)
estimation through heterogeneous multi-shell diffusion MRI sequences. We study
the Human Connectome Project (HCP) young adults with test-retest scans. From
the experimental results, the proposed single-stage method outperforms prior
multi-stage approaches in repeated fODF estimation with shell dropoff and
single-shell DW-MRI sequences.
</p>
|
|
|
|
<p>This study introduces an approach to estimate the uncertainty in bibliometric
indicator values that is caused by data errors. This approach utilizes Bayesian
regression models, estimated from empirical data samples, which are used to
predict error-free data. Through direct Monte Carlo simulation -- drawing
predicted data from the estimated regression models a large number of times for
the same input data -- probability distributions for indicator values can be
obtained, which provide the information on their uncertainty due to data
errors. It is demonstrated how uncertainty in base quantities, such as the
number of publications of a unit of certain document types and the number of
citations of a publication, can be propagated along a measurement model into
final indicator values. This method can be used to estimate the uncertainty of
indicator values due to sources of errors with known error distributions. The
approach is demonstrated with simple synthetic examples for instructive
purposes and real bibliometric research evaluation data to show its possible
application in practice.
</p>
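<p>The direct Monte Carlo step can be sketched as follows, with a Gaussian predictive distribution standing in for the fitted Bayesian regression models (all names are ours):</p>
<pre>
import numpy as np

def indicator_interval(pred_mean, pred_sd, indicator, n_draws=10_000, seed=0):
    """Draw 'error-free' base quantities (e.g. citation counts) from a
    predictive distribution many times, recompute the indicator for each
    draw, and report a 95% uncertainty interval."""
    rng = np.random.default_rng(seed)
    draws = rng.normal(pred_mean, pred_sd, size=(n_draws, len(pred_mean)))
    draws = np.clip(draws, 0, None)               # counts cannot be negative
    values = np.apply_along_axis(indicator, 1, draws)
    return np.percentile(values, [2.5, 97.5])

# e.g. uncertainty of a unit's mean citation rate:
# lo, hi = indicator_interval(mu, sd, np.mean)
</pre>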
|
|
|
|
<p>We study the problem of federated stochastic multi-arm contextual bandits
with unknown contexts, in which M agents are faced with different bandits and
collaborate to learn. The communication model consists of a central server,
with which the agents periodically share their estimates in order to learn to
choose optimal actions and minimize the total regret. We assume that
the exact contexts are not observable and the agents observe only a
distribution of the contexts. Such a situation arises, for instance, when the
context itself is a noisy measurement or based on a prediction mechanism. Our
goal is to develop a distributed and federated algorithm that facilitates
collaborative learning among the agents to select a sequence of optimal actions
so as to maximize the cumulative reward. By performing a feature vector
transformation, we propose an elimination-based algorithm and prove the regret
bound for linearly parametrized reward functions. Finally, we validate the
performance of our algorithm and compare it with a baseline approach using
numerical simulations on synthetic data and on the real-world MovieLens
dataset.
</p>
|
|
|
|
<p>In smart electrical grids, fault detection tasks may have a high impact on
society due to their economic and critical implications. In recent years,
numerous smart grid applications, such as defect detection and load
forecasting, have embraced data-driven methodologies. The purpose of this study
is to investigate the challenges associated with the security of machine
learning (ML) applications in the smart grid scenario. Indeed, the robustness
and security of these data-driven algorithms have not been extensively studied
in relation to all power grid applications. We first demonstrate that the deep
neural network method used in the smart grid is susceptible to adversarial
perturbation. Then, we highlight how studies on fault localization and type
classification illustrate the weaknesses of present ML algorithms in smart
grids against various adversarial attacks.
</p>
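<p>The kind of perturbation involved can be illustrated with the standard Fast Gradient Sign Method (a generic attack sketch, not the paper's specific threat model):</p>
<pre>
import torch

def fgsm_perturb(model, x, y, loss_fn, eps=0.01):
    """Craft an adversarial input by stepping along the sign of the
    input gradient, a minimal perturbation that can flip the output of
    a fault-classification network."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = loss_fn(model(x_adv), y)
    loss.backward()
    return (x_adv + eps * x_adv.grad.sign()).detach()
</pre>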
|
|
|
|
<p>This paper presents a fully multidimensional kernel-based reconstruction
scheme for finite volume methods applied to systems of hyperbolic conservation
laws, with a particular emphasis on the compressible Euler equations.
Non-oscillatory reconstruction is achieved through an adaptive order weighted
essentially non-oscillatory (WENO-AO) method cast into a form suited to
multidimensional stencils and reconstruction. A kernel-based approach inspired
by Gaussian process (GP) modeling is presented here. This approach allows the
creation of a scheme of arbitrary order with simply defined multidimensional
stencils and substencils. Furthermore, the fully multidimensional nature of the
reconstruction allows a more straightforward extension to higher spatial
dimensions and removes the need for complicated boundary conditions on
intermediate quantities in modified dimension-by-dimension methods. In
addition, a new simple-yet-effective set of reconstruction variables is
introduced, as well as an easy-to-implement effective limiter for positivity
preservation, both of which could be useful in existing schemes with little
modification. The proposed scheme is applied to a suite of stringent and
informative benchmark problems to demonstrate its efficacy and utility.
</p>
|
|
|
|
<p>Analysis of geospatial data has traditionally been model-based, with a mean
model, customarily specified as a linear regression on the covariates, and a
covariance model, encoding the spatial dependence. We relax the strong
assumption of linearity and propose embedding neural networks directly within
the traditional geostatistical models to accommodate non-linear mean functions
while retaining all other advantages including use of Gaussian Processes to
explicitly model the spatial covariance, enabling inference on the covariate
effect through the mean and on the spatial dependence through the covariance,
and offering predictions at new locations via kriging. We propose NN-GLS, a new
neural network estimation algorithm for the non-linear mean in GP models that
explicitly accounts for the spatial covariance through generalized least
squares (GLS), the same loss used in the linear case. We show that NN-GLS
admits a representation as a special type of graph neural network (GNN). This
connection facilitates use of standard neural network computational techniques
for irregular geospatial data, enabling novel and scalable mini-batching,
backpropagation, and kriging schemes. Theoretically, we show that NN-GLS will
be consistent for irregularly observed spatially correlated data processes. To
our knowledge, this is the first asymptotic consistency result for any neural
network algorithm for spatial data. We demonstrate the methodology through
simulated and real datasets.
</p>
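<p>The GLS loss at the heart of NN-GLS-style training is easy to state; a minimal sketch (covariance construction and the GNN connection omitted):</p>
<pre>
import torch

def gls_loss(y, f_x, Sigma):
    """Generalized least squares objective
        (y - f(X))^T Sigma^{-1} (y - f(X)),
    which decorrelates residuals by the spatial covariance Sigma instead
    of treating observations as independent (ordinary least squares)."""
    r = (y - f_x).unsqueeze(1)             # (n, 1) residuals
    L = torch.linalg.cholesky(Sigma)       # Sigma = L L^T
    z = torch.cholesky_solve(r, L)         # solves Sigma z = r
    return (r * z).sum()
</pre>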
|
|
|
|
<p>Translated texts bear several hallmarks distinct from texts originating in
the language. Though individual translated texts are often fluent and preserve
meaning, at a large scale, translated texts have statistical tendencies which
distinguish them from text originally written in the language
("translationese") and can affect model performance. We frame the novel task of
translationese reduction and hypothesize that Abstract Meaning Representation
(AMR), a graph-based semantic representation which abstracts away from the
surface form, can be used as an interlingua to reduce the amount of
translationese in translated texts. By parsing English translations into an AMR
and then generating text from that AMR, the result more closely resembles
originally English text across three quantitative macro-level measures, without
severely compromising fluency or adequacy. We compare our AMR-based approach
against three other techniques based on machine translation or paraphrase
generation. This work makes strides towards reducing translationese in text and
highlights the utility of AMR as an interlingua.
</p>
|
|
|
|
<p>We propose a method to explore the flavor structure of quarks and leptons
with reinforcement learning. As a concrete model, we utilize a basic
value-based algorithm for models with $U(1)$ flavor symmetry. By training
neural networks on the $U(1)$ charges of quarks and leptons, the agent finds 21
models to be consistent with experimentally measured masses and mixing angles
of quarks and leptons. In particular, an intrinsic value of normal ordering
tends to be larger than that of inverted ordering, and the normal ordering is
well fitted with the current experimental data in contrast to the inverted
ordering. A specific value of effective mass for the neutrinoless double beta
decay and a sizable leptonic CP violation induced by an angular component of
flavon field are predicted by the autonomous behavior of the agent. Our
findings indicate that reinforcement learning can be a new method for
understanding the flavor structure.
</p>
|
|
|
|
<p>Orbit recovery problems are a class of problems that often arise in
practice in various forms. In these problems, we aim to estimate an unknown
function after it has been distorted by a group action and observed via a known
operator.
Typically, the observations are contaminated with a non-trivial level of noise.
Two particular orbit recovery problems of interest in this paper are
multireference alignment and single-particle cryo-EM modelling. In order to
suppress the noise, we suggest using the method of moments approach for both
problems while introducing deep neural network priors. In particular, our
neural networks take the moments as input and output the signals and the
distribution of group elements. In the multireference alignment case,
we demonstrate the advantage of using the NN to accelerate the convergence for
the reconstruction of signals from the moments. Finally, we use our method to
reconstruct simulated and biological volumes in the cryo-EM setting.
</p>
|
|
|
|
<p>As semiconductor power density no longer remains constant as the technology
process scales down, modern CPUs are integrating capable data accelerators on
chip, aiming to improve performance and efficiency for a wide range of
applications and usages. One such accelerator is the Intel Data Streaming
Accelerator (DSA) introduced in Intel 4th Generation Xeon Scalable CPUs
(Sapphire Rapids). DSA targets data movement operations in memory that are
common sources of overhead in datacenter workloads and infrastructure. In
addition, it becomes much more versatile by supporting a wider range of
operations on streaming data, such as CRC32 calculations, delta record
creation/merging, and data integrity field (DIF) operations. This paper sets
out to introduce the latest features supported by DSA, deep-dive into its
versatility, and analyze its throughput benefits through a comprehensive
evaluation. Along with the analysis of its characteristics, and the rich
software ecosystem of DSA, we summarize several insights and guidelines for the
programmer to make the most out of DSA, and use an in-depth case study of DPDK
Vhost to demonstrate how these guidelines benefit a real application.
</p>
|
|
|
|
<p>Acquiring new knowledge without forgetting what has been learned in a
sequence of tasks is the central focus of continual learning (CL). While tasks
arrive sequentially, the training data are often prepared and annotated
independently, leading to the CL of incoming supervised learning tasks. This
paper considers the under-explored problem of active continual learning (ACL)
for a sequence of active learning (AL) tasks, where each incoming task includes
a pool of unlabelled data and an annotation budget. We investigate the
effectiveness and interplay between several AL and CL algorithms in the
domain-, class-, and task-incremental scenarios. Our experiments reveal the
trade-off
between two contrasting goals of not forgetting the old knowledge and the
ability to quickly learn new knowledge in CL and AL, respectively. While
conditioning the AL query strategy on the annotations collected for the
previous tasks leads to improved task performance in the domain- and
task-incremental scenarios, our proposed forgetting-learning profile suggests a
gap in balancing the effect of AL and CL for the class-incremental scenario.
</p>
|
|
|
|
<p>Integrating gas and district heating grids with the electrical grid in a
multi-energy grid has been shown to provide flexibility and prevent bottlenecks
in the operation of electrical distribution grids. This integration however
presents new challenges, including uncertainties in demand prediction, energy
prices, and renewable energy availability. In response to these challenges,
this paper proposes a novel approach to apply robust optimization methods in
the integrated planning of multi-energy grids, to reduce the risk of investment
in grid expansion and to optimize the use of different carbon-neutral energy
carriers. The uncertainty in energy prices is modeled using interval
uncertainty with a proportional deviation. This allows planners, operators and
regulators to prioritize the expansion of specific grids in certain areas of a
city. By minimizing a cost function subject to various constraints, the
strategy ensures robustness against uncertainties in energy prices. This robust
optimization approach is applied to Hamburg as a case study. The study
concludes that district heating expansion in high-density areas is a low-risk
investment for carbon neutrality. In less dense areas, electrification supports
decentralized heat pumps. Meanwhile, hydrogen gas grids are viable where
electric expansion is impractical. Increased uncertainty leads to more
conservative solutions. This novel approach can be implemented promptly and
practically by grid planners and is an important component of a new holistic
integrated planning process for multi-energy grids.
</p>
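<p>A toy version of the interval-uncertainty step, using cvxpy (numbers, carriers, and variable names are invented for illustration): with nonnegative energy flows and a cost linear in prices, the worst case over the interval is attained at the upper price bound.</p>
<pre>
import cvxpy as cp
import numpy as np

p_nom = np.array([0.30, 0.12, 0.25])  # nominal prices per carrier (EUR/kWh)
dev = 0.2                             # proportional price deviation
demand = 100.0                        # energy demand to cover (kWh)

x = cp.Variable(3, nonneg=True)       # energy drawn from each carrier
worst_price = p_nom * (1 + dev)       # robust (worst-case) prices
problem = cp.Problem(cp.Minimize(worst_price @ x), [cp.sum(x) == demand])
problem.solve()
print(x.value, problem.value)         # robust dispatch and its cost
</pre>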
|
|
|
|
<p>Principled accountability in the aftermath of harms is essential to the
trustworthy design and governance of algorithmic decision making. Legal theory
offers a paramount method for assessing culpability: putting the agent 'on the
stand' to subject their actions and intentions to cross-examination. We show
that under minimal assumptions automated reasoning can rigorously interrogate
algorithmic behaviors as in the adversarial process of legal fact finding. We
model accountability processes, such as trials or review boards, as
Counterfactual-Guided Logic Exploration and Abstraction Refinement (CLEAR)
loops. We use the formal methods of symbolic execution and satisfiability
modulo theories (SMT) solving to discharge queries about agent behavior in
factual and counterfactual scenarios, as adaptively formulated by a human
investigator. In order to do so, for a decision algorithm $\mathcal{A}$ we use
symbolic execution to represent its logic as a statement $\Pi$ in the decidable
theory $\texttt{QF_FPBV}$. We implement our framework and demonstrate its
utility on an illustrative car crash scenario.
</p>
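<p>A toy factual/counterfactual query in this spirit, using the Z3 SMT solver over reals rather than the floating-point bitvector theory named above (scenario and thresholds are invented):</p>
<pre>
from z3 import Real, Solver, If

dist = Real('dist')
brake = If(dist < 10.0, 1.0, 0.0)   # symbolic rendering of a controller rule

s = Solver()
s.add(dist == 8.0, brake == 1.0)    # factual: car braked at distance 8
print(s.check())                    # sat -- the observed behavior is consistent

s = Solver()
s.add(dist == 15.0, brake == 1.0)   # counterfactual: would it still brake?
print(s.check())                    # unsat -- at distance 15 it would not
</pre>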
|
|
|
|
<p>Cross-platform recommendation aims to improve recommendation accuracy by
gathering heterogeneous features from different platforms. However, such
cross-silo collaborations between platforms are restricted by increasingly
stringent privacy protection regulations, thus data cannot be aggregated for
training. Federated learning (FL) is a practical solution to deal with the data
silo problem in recommendation scenarios. Existing cross-silo FL methods
transmit model information to collaboratively build a global model by
leveraging the data of overlapped users. However, in reality, the number of
overlapped users is often very small, thus largely limiting the performance of
such approaches. Moreover, transmitting model information during training
incurs high communication costs and may cause serious privacy leakage. In
this paper, we propose a novel privacy-preserving double distillation framework
named FedPDD for cross-silo federated recommendation, which efficiently
transfers knowledge when overlapped users are limited. Specifically, our double
distillation strategy enables local models to learn not only explicit knowledge
from the other party but also implicit knowledge from its past predictions.
Moreover, to ensure privacy and high efficiency, we employ an offline training
scheme to reduce communication needs and privacy leakage risk. In addition, we
adopt differential privacy to further protect the transmitted information. The
experiments on two real-world recommendation datasets, HetRec-MovieLens and
Criteo, demonstrate the effectiveness of FedPDD compared to the
state-of-the-art approaches.
</p>
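<p>A hedged sketch of what a double distillation objective can look like (temperatures and weights are placeholders; the paper's exact loss may differ):</p>
<pre>
import torch
import torch.nn.functional as F

def double_distillation_loss(logits, labels, other_logits, past_logits,
                             T=2.0, alpha=0.5, beta=0.5):
    """Supervised loss plus two distillation terms: explicit knowledge
    from the other party's predictions and implicit knowledge from the
    model's own past predictions."""
    ce = F.cross_entropy(logits, labels)
    log_p = F.log_softmax(logits / T, dim=1)
    kd_other = F.kl_div(log_p, F.softmax(other_logits / T, dim=1),
                        reduction='batchmean') * T * T
    kd_past = F.kl_div(log_p, F.softmax(past_logits / T, dim=1),
                       reduction='batchmean') * T * T
    return ce + alpha * kd_other + beta * kd_past
</pre>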
|
|
|
|
<p>Model generalizability to unseen datasets, concerned with in-the-wild
robustness, is less studied for indoor single-image depth prediction. We
leverage gradient-based meta-learning for higher generalizability on zero-shot
cross-dataset inference. Unlike the most-studied image classification in
meta-learning, depth is pixel-level continuous range values, and mappings from
each image to depth vary widely across environments. Thus no explicit task
boundaries exist. We instead propose a fine-grained task formulation that
treats each RGB-D pair as a task in our meta-optimization. We first show that
meta-learning on limited data induces a much better prior (up to +29.4\%).
Using meta-learned weights as initialization for subsequent supervised
learning, without involving extra data or information, consistently outperforms
baselines trained without the method. Compared to most indoor-depth methods
that only train/test on a single
dataset, we propose zero-shot cross-dataset protocols, closely evaluate
robustness, and show consistently higher generalizability and accuracy by our
meta-initialization. This work at the intersection of depth estimation and
meta-learning can potentially drive both research streams closer to practical
use.
</p>
|
|
|
|
<p>$\textbf{Objectives}$: Large Language Models (LLMs) such as ChatGPT and
Med-PaLM have excelled in various medical question-answering tasks. However,
these English-centric models encounter challenges in non-English clinical
settings, primarily due to limited clinical knowledge in respective languages,
a consequence of imbalanced training corpora. We systematically evaluate LLMs
in the Chinese medical context and develop a novel in-context learning
framework to enhance their performance.
</p>
<p>$\textbf{Materials and Methods}$: The latest China National Medical Licensing
Examination (CNMLE-2022) served as the benchmark. We collected 53 medical books
and 381,149 medical questions to construct the medical knowledge base and
question bank. The proposed Knowledge and Few-shot Enhancement In-context
Learning (KFE) framework leverages the in-context learning ability of LLMs to
integrate diverse external clinical knowledge sources. We evaluated KFE with
ChatGPT(GPT3.5), GPT4, Baichuan2(BC2)-7B, and BC2-13B in CNMLE-2022 and
investigated the effectiveness of different pathways for incorporating LLMs
with medical knowledge from 7 perspectives.
</p>
<p>$\textbf{Results}$: Directly applying ChatGPT failed to qualify for the
CNMLE-2022, with a score of 51. Combined with KFE, LLMs of varying sizes
yielded consistent and significant improvements. ChatGPT's performance surged
to 70.04, and GPT-4 achieved the highest score of 82.59. This
surpasses the qualification threshold (60) and exceeds the average human score
of 68.70. It also enabled a smaller BC2-13B to pass the examination, showcasing
the great potential in low-resource settings.
</p>
<p>$\textbf{Conclusion}$: By synergizing medical knowledge through in-context
learning, LLMs can extend clinical insight beyond language barriers,
significantly reducing language-related disparities in LLM applications and
ensuring global benefit in healthcare.
</p>
|
|
|
|
<p>The IoT is vulnerable to network attacks, and Intrusion Detection Systems
(IDS) can provide high attack detection accuracy and are easily installed in
IoT Servers. However, IDS are seldom evaluated under operational conditions,
which can be seriously impaired by attack overload. Thus a Local Area Network
testbed is
used to evaluate the impact of UDP Flood Attacks on an IoT Server, whose first
line of defence is an accurate IDS. We show that attacks overload the
multi-core Server and paralyze its IDS. Thus a mitigation scheme that detects
attacks rapidly and drops packets within milliseconds after the attack begins
is proposed and experimentally evaluated.
</p>
|
|
|
|
<p>Humans work together to solve common problems by having discussions,
explaining, and agreeing or disagreeing with each other. Similarly, if a system
can have discussions with humans when solving tasks, it can improve the
system's performance and reliability. In previous research on explainability,
it has only been possible for the system to make predictions and for humans to
ask questions about them rather than having a mutual exchange of opinions. This
research aims to create a dataset and computational framework for systems that
discuss and refine their predictions through dialogue. Through experiments, we
show that the proposed system can have beneficial discussions with humans,
improving accuracy by up to 25 points on the natural language inference task.
</p>
|
|
|
|
<p>Representations from transformer-based unidirectional language models are
known to be effective at predicting brain responses to natural language.
However, most studies comparing language models to brains have used GPT-2 or
similarly sized language models. Here we tested whether larger open-source
models such as those from the OPT and LLaMA families are better at predicting
brain responses recorded using fMRI. Mirroring scaling results from other
contexts, we found that brain prediction performance scales logarithmically
with model size from 125M to 30B parameter models, with ~15% increased encoding
performance as measured by correlation with a held-out test set across 3
subjects. Similar logarithmic behavior was observed when scaling the size of
the fMRI training set. We also characterized scaling for acoustic encoding
models that use HuBERT, WavLM, and Whisper, and we found comparable
improvements with model size. A noise ceiling analysis of these large,
high-performance encoding models showed that performance is nearing the
theoretical maximum for brain areas such as the precuneus and higher auditory
cortex. These results suggest that increasing scale in both models and data
will yield highly effective models of language processing in the brain,
enabling better scientific understanding as well as applications such as
decoding.
</p>
|
|
|
|
<p>Speech language models (SpeechLMs) process and generate acoustic data only,
without textual supervision. In this work, we propose TWIST, a method for
training SpeechLMs using a warm start from a pretrained textual language
model. We show, using both automatic and human evaluations, that TWIST
outperforms a cold-start SpeechLM across the board. We empirically analyze the
effect of different model design choices such as the speech tokenizer, the
pretrained textual model, and the dataset size. We find that model and dataset
scale both play an important role in constructing better-performing SpeechLMs.
Based on our observations, we present the largest (to the best of our
knowledge) SpeechLM both in terms of number of parameters and training data. We
additionally introduce two spoken versions of the StoryCloze textual benchmark
to further improve model evaluation and advance future research in the field.
We make speech samples, code and models publicly available:
https://pages.cs.huji.ac.il/adiyoss-lab/twist/ .
</p>
|
|
|
|
<p>Training generally capable agents that thoroughly explore their environment
and learn new and diverse skills is a long-term goal of robot learning. Quality
Diversity Reinforcement Learning (QD-RL) is an emerging research area that
blends the best aspects of both fields -- Quality Diversity (QD) provides a
principled form of exploration and produces collections of behaviorally diverse
agents, while Reinforcement Learning (RL) provides a powerful performance
improvement operator enabling generalization across tasks and dynamic
environments. Existing QD-RL approaches have been constrained to
sample-efficient, deterministic off-policy RL algorithms and/or evolution
strategies, and struggle with highly stochastic environments. In this work, we, for the
first time, adapt on-policy RL, specifically Proximal Policy Optimization
(PPO), to the Differentiable Quality Diversity (DQD) framework and propose
additional improvements over prior work that enable efficient optimization and
discovery of novel skills on challenging locomotion tasks. Our new algorithm,
Proximal Policy Gradient Arborescence (PPGA), achieves state-of-the-art
results, including a 4x improvement in best reward over baselines on the
challenging humanoid domain.
</p>
|
|
|
|
<p>Diffusion models, as a kind of powerful generative model, have given
impressive results on image super-resolution (SR) tasks. However, due to the
randomness introduced in the reverse process of diffusion models, the
performance of diffusion-based SR models fluctuates from one sampling run to
the next, especially for samplers with few resampled steps. This inherent
randomness makes diffusion models unreliable and unstable, so it is
challenging for users to guarantee the quality of SR results. Our work,
however, takes this randomness as an opportunity: fully analyzing and
leveraging it leads to an effective plug-and-play sampling method with the
potential to benefit a series of diffusion-based SR
methods. In more detail, we propose to steadily sample high-quality SR images
from pre-trained diffusion-based SR models by solving diffusion ordinary
differential equations (diffusion ODEs) with optimal boundary conditions (BCs)
and analyze the relationship between the choice of BCs and the
corresponding SR results. Our analysis shows a route to obtaining an
approximately optimal BC via efficient exploration of the whole space. The
quality of SR results sampled by the proposed method with fewer steps
outperforms the quality of results sampled by current methods with randomness
from the same pre-trained diffusion-based SR model, which means that our
sampling method "boosts" current diffusion-based SR models without any
additional training.
</p>
|
|
|
|
<p>This paper proposes a new easy-to-implement parameter-free gradient-based
optimizer: DoWG (Distance over Weighted Gradients). We prove that DoWG is
efficient -- matching the convergence rate of optimally tuned gradient descent
in convex optimization up to a logarithmic factor without tuning any
parameters, and universal -- automatically adapting to both smooth and
nonsmooth problems. While popular algorithms following the AdaGrad framework
compute a running average of the squared gradients to use for normalization,
DoWG maintains a new distance-based weighted version of the running average,
which is crucial to achieve the desired properties. To complement our theory,
we also show empirically that DoWG trains at the edge of stability, and
validate its effectiveness on practical machine learning tasks.
</p>
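<p>Based on the description above, a compact sketch of the DoWG update (a
monotone estimate of the distance traveled, used to weight a running sum of
squared gradients) might read as follows. This is one paraphrase of the rule,
not the authors' reference implementation.</p>
<pre><code>
import numpy as np

def dowg(grad, x0, steps=1000, eps=1e-8):
    """Sketch of DoWG (Distance over Weighted Gradients); illustrative only."""
    x, r, v = x0.copy(), eps, 0.0
    for _ in range(steps):
        g = grad(x)
        r = max(r, np.linalg.norm(x - x0))  # monotone distance estimate
        v += r**2 * np.dot(g, g)            # distance-weighted gradient sum
        if v == 0.0:                        # zero gradient at the start
            break
        x -= (r**2 / np.sqrt(v)) * g        # parameter-free step
    return x

# Example: minimize ||x - 3||^2 without tuning a step size.
x_star = dowg(lambda x: 2 * (x - 3.0), x0=np.zeros(2))
print(x_star)  # close to [3., 3.] after enough steps
</code></pre>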
|
|
|
|
<p>Unsupervised domain adaptation (UDA) involves adapting a model trained on a
label-rich source domain to an unlabeled target domain. However, in real-world
scenarios, the absence of target-domain labels makes it challenging to evaluate
the performance of UDA models. Furthermore, prevailing UDA methods relying on
adversarial training and self-training could lead to model degeneration and
negative transfer, further exacerbating the evaluation problem. In this paper,
we propose a novel metric called the \textit{Transfer Score} to address these
issues. The proposed metric enables the unsupervised evaluation of UDA models
by assessing the spatial uniformity of the classifier via model parameters, as
well as the transferability and discriminability of deep representations. Based
on the metric, we achieve three novel objectives without target-domain labels:
(1) selecting the best UDA method from a range of available options, (2)
optimizing hyperparameters of UDA models to prevent model degeneration, and (3)
identifying which checkpoint of a UDA model performs optimally. Our work bridges
the gap between data-level UDA research and practical UDA scenarios, enabling a
realistic assessment of UDA model performance. We validate the effectiveness of
our metric through extensive empirical studies on UDA datasets of different
scales and imbalanced distributions. The results demonstrate that our metric
robustly achieves the aforementioned goals.
</p>
|
|
|
|
<p>Understanding how social situations unfold in people's daily lives is
relevant to designing mobile systems that can support users in their personal
goals, well-being, and activities. As an alternative to questionnaires, some
studies have used passively collected smartphone sensor data to infer social
context (i.e., being alone or not) with machine learning models. However, the
few existing studies have focused on specific daily life occasions and limited
geographic cohorts in one or two countries. This limits the understanding of
how inference models work in terms of generalization to everyday life occasions
and multiple countries. In this paper, we used a novel, large-scale, and
multimodal smartphone sensing dataset with over 216K self-reports collected
from 581 young adults in five countries (Mongolia, Italy, Denmark, UK,
Paraguay), first to understand whether social context inference is feasible
with sensor data, and then, to know how behavioral and country-level diversity
affects inferences. We found that several sensors are informative of social
context, that partially personalized multi-country models (trained and tested
with data from all countries) and country-specific models (trained and tested
within countries) can achieve similar performance above 90% AUC, and that
models do not generalize well to unseen countries regardless of geographic
proximity. These findings confirm the importance of diverse mobile data for
better understanding social context inference models in different countries.
</p>
|
|
|
|
<p>Audio bandwidth extension involves the realistic reconstruction of
high-frequency spectra from bandlimited observations. In cases where the
lowpass degradation is unknown, such as in restoring historical audio
recordings, this becomes a blind problem. This paper introduces a novel method
called BABE (Blind Audio Bandwidth Extension) that addresses the blind problem
in a zero-shot setting, leveraging the generative priors of a pre-trained
unconditional diffusion model. During the inference process, BABE utilizes a
generalized version of diffusion posterior sampling, where the degradation
operator is unknown but parametrized and inferred iteratively. The performance
of the proposed method is evaluated using objective and subjective metrics, and
the results show that BABE surpasses state-of-the-art blind bandwidth extension
baselines and achieves competitive performance compared to informed methods
when tested with synthetic data. Moreover, BABE exhibits robust generalization
capabilities when enhancing real historical recordings, effectively
reconstructing the missing high-frequency content while maintaining coherence
with the original recording. Subjective preference tests confirm that BABE
significantly improves the audio quality of historical music recordings.
Examples of historical recordings restored with the proposed method are
available on the companion webpage:
(<a href="http://research.spa.aalto.fi/publications/papers/ieee-taslp-babe/">this http URL</a>)
</p>
|
|
|
|
<p>This work proposes a Momentum-Enabled Kronecker-Factor-Based Optimizer Using
Rank-1 updates, called MKOR, that improves the training time and convergence
properties of deep neural networks (DNNs). Second-order techniques, while
enjoying higher convergence rates than their first-order counterparts, have
cubic complexity with respect to the model size and/or the training batch
size. Hence they exhibit poor scalability and performance in transformer
models, e.g. large language models (LLMs), because the batch sizes in these
models scale with the attention-mechanism sequence length, leading to large
model and batch sizes. MKOR's complexity is quadratic with respect to the model
size, alleviating the computation bottlenecks in second-order methods. Because
of their high computation complexity, state-of-the-art implementations of
second-order methods can only afford to update the second order information
infrequently, and thus do not fully exploit the promise of better convergence
from these updates. By reducing the communication complexity of the
second-order updates as well as achieving a linear communication complexity,
MKOR increases the frequency of second-order updates. We also propose a hybrid
version of MKOR (called MKOR-H) that falls back mid-training to a first-order
optimizer if the second-order updates no longer accelerate convergence. Our
experiments show that MKOR outperforms state-of-the-art first-order methods,
e.g. the LAMB optimizer, and the best implementations of second-order methods,
i.e. KAISA/KFAC, by up to 2.57x and 1.85x respectively on BERT-Large-Uncased on 64
GPUs.
</p>
|
|
|
|
<p>We tackle the task of conditional music generation. We introduce MusicGen, a
single Language Model (LM) that operates over several streams of compressed
discrete music representation, i.e., tokens. Unlike prior work, MusicGen
comprises a single-stage transformer LM together with efficient token
interleaving patterns, which eliminates the need to cascade several models,
e.g., hierarchically or via upsampling. Following this approach, we demonstrate how
MusicGen can generate high-quality samples, both mono and stereo, while being
conditioned on a textual description or melodic features, allowing better
control over the generated output. We conduct extensive empirical evaluation,
considering both automatic and human studies, showing the proposed approach is
superior to the evaluated baselines on a standard text-to-music benchmark.
Through ablation studies, we shed light over the importance of each of the
components comprising MusicGen. Music samples, code, and models are available
at https://github.com/facebookresearch/audiocraft
</p>
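<p>The exact interleaving patterns are detailed in the paper; one simple
member of this family is a "delay" pattern in which codebook k is shifted by
k steps, sketched below as an illustration rather than MusicGen's reference
code.</p>
<pre><code>
def delay_interleave(codes, pad=-1):
    """Sketch of a 'delay' token-interleaving pattern: codebook k is shifted
    right by k steps so one transformer step emits K staggered tokens.
    codes[k][t] is the token of codebook k at time t; pad fills the gaps."""
    K, T = len(codes), len(codes[0])
    out = [[pad] * (T + K - 1) for _ in range(K)]
    for k in range(K):
        for t in range(T):
            out[k][t + k] = codes[k][t]
    return out

# Two codebooks of four tokens each:
print(delay_interleave([[1, 2, 3, 4], [5, 6, 7, 8]]))
# [[1, 2, 3, 4, -1], [-1, 5, 6, 7, 8]]
</code></pre>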
|
|
|
|
<p>Despite advances in generative methods, accurately modeling the distribution
of graphs remains a challenging task, primarily because of the absence of a
predefined or inherently unique graph representation. Two main strategies have
emerged to tackle this issue: 1) restricting the number of possible
representations by sorting the nodes, or 2) using
permutation-invariant/equivariant functions, specifically Graph Neural Networks
(GNNs).
</p>
<p>In this paper, we introduce a new framework named Discrete Graph Auto-Encoder
(DGAE), which leverages the strengths of both strategies and mitigates their
respective limitations. In essence, we propose a two-step strategy. We first
use a permutation-equivariant auto-encoder to convert graphs into sets of
discrete latent node representations, each node being represented by a sequence
of quantized vectors. In the second step, we sort the sets of discrete latent
representations and learn their distribution with a specifically designed
auto-regressive model based on the Transformer architecture.
</p>
<p>Through multiple experimental evaluations, we demonstrate the competitive
performances of our model in comparison to the existing state-of-the-art across
various datasets. Ablation studies further support the merits of our method.
</p>
|
|
|
|
<p>The rise of Generative Artificial Intelligence systems ("AI systems") has
created unprecedented social engagement. AI code generation systems provide
responses (output) to questions or requests by accessing the vast library of
open-source code created by developers over the past few decades. However, they
do so by allegedly stealing the open-source code stored in virtual libraries,
known as repositories. This Article focuses on how this happens and whether
there is a solution that protects innovation and avoids years of litigation. We
also touch upon the array of issues raised by the relationship between AI and
copyright. Looking ahead, we propose the following: (a) immediate changes to
the licenses for open-source code created by developers that will limit access
and/or use of any open-source code to humans only; (b) we suggest revisions to
the Massachusetts Institute of Technology ("MIT") license so that AI systems
are required to procure appropriate licenses from open-source code developers,
which we believe will harmonize standards and build social consensus for the
benefit of all of humanity, rather than promote profit-driven centers of
innovation; (c) we call for urgent legislative action to protect the future of
AI systems while also promoting innovation; and (d) we propose a shift in the
burden of proof to AI systems in obfuscation cases.
</p>
|
|
|
|
<p>By combining voice and touch interactions, multimodal interfaces can surpass
the efficiency of either modality alone. This paper targets complex
interactions, where users can issue multimodal commands that translate into one
of the possible exponential combinations of actions/function invocations. This
paper presents ReactGenie, a programming framework where developers can code
with simple object-oriented abstractions and labeled user-invocable primitives.
ReactGenie translates multimodal user commands into ReactGenieDSL, a
domain-specific language we created for this purpose, using a neural semantic
parser based on large-language models. The ReactGenie runtime interprets the
parsed ReactGenieDSL and composes primitives to implement complex user
commands. As a result, ReactGenie provides an unprecedented level of richness
in user interactions. Our evaluation showed that 12 developers can learn and
build a ReactGenie application in under 2.5 hours on average. In addition,
compared with a traditional GUI, end users can complete tasks faster and with
less task load using ReactGenie apps.
</p>
|
|
|
|
<p>Recently, ChatGPT, a representative large language model (LLM), has gained
considerable attention due to its powerful emergent abilities. Some researchers
suggest that LLMs could potentially replace structured knowledge bases like
knowledge graphs (KGs) and function as parameterized knowledge bases. However,
while LLMs are proficient at learning probabilistic language patterns from
large corpora and engaging in conversations with humans, they, like previous
smaller pre-trained language models (PLMs), still have difficulty recalling
facts when generating knowledge-grounded content. To overcome these
limitations, researchers have proposed enhancing data-driven PLMs with
knowledge-based KGs to incorporate explicit factual knowledge into PLMs, thus
improving their performance in generating text that requires factual knowledge and
providing more informed responses to user queries. This paper reviews the
studies on enhancing PLMs with KGs, detailing existing knowledge graph enhanced
pre-trained language models (KGPLMs) as well as their applications. Inspired by
existing studies on KGPLM, this paper proposes to enhance LLMs with KGs by
developing knowledge graph-enhanced large language models (KGLLMs). KGLLM
provides a solution to enhance LLMs' factual reasoning ability, opening up new
avenues for LLM research.
</p>
|
|
|
|
<p>Although deep learning has achieved great success, it often relies on a large
amount of training data with accurate labels, which are expensive and
time-consuming to collect. A prominent direction for reducing this cost is
learning with noisy labels, which are ubiquitous in real-world applications. A
critical challenge for such a learning task is to reduce the effect of network
memorization on the falsely-labeled data. In this work, we propose an iterative
selection approach based on the Weibull mixture model, which identifies clean
data by considering the overall learning dynamics of each data instance. In
contrast to previous small-loss heuristics, we leverage the observation
that deep networks memorize clean data easily and forget it slowly. In
particular, we measure the difficulty of memorization and forgetting for each
instance via the transition times between being misclassified and being
memorized in training, and integrate them into a novel metric for selection.
Based on the proposed metric, we retain a subset of identified clean data and
repeat the selection procedure to iteratively refine the clean subset, which is
finally used for model training. To validate our method, we perform extensive
experiments on synthetic noisy datasets and real-world web data, and our
strategy outperforms existing noisy-label learning methods.
</p>
|
|
|
|
<p>One of the fundamental results in quantum foundations is the Kochen-Specker
(KS) theorem, which states that any theory whose predictions agree with quantum
mechanics must be contextual, i.e., a quantum observation cannot be understood
as revealing a pre-existing value. The theorem hinges on the existence of a
mathematical object called a KS vector system. While many KS vector systems are
known, the problem of finding the minimum KS vector system in three dimensions
(3D) has remained stubbornly open for over 55 years.
</p>
<p>To address the minimum KS problem, we present a new verifiable
proof-producing method based on a combination of a Boolean satisfiability (SAT)
solver and a computer algebra system (CAS) that uses an isomorph-free orderly
generation technique that is very effective in pruning away large parts of the
search space. Our method shows that a KS system in 3D must contain at least 24
vectors. We show that our sequential and parallel Cube-and-Conquer (CnC)
SAT+CAS methods are significantly faster than SAT-only, CAS-only, and a prior
CAS-based method of Uijlen and Westerbaan. Further, while our parallel pipeline
is somewhat slower than the parallel CnC version of the recently introduced
Satisfiability Modulo Theories (SMS) method, this is in part due to the
overhead of proof generation. Finally, we provide the first computer-verifiable
proof certificate of a lower bound for the KS problem; the certificate, for
order 23, has a size of 42.9 TiB.
</p>
|
|
|
|
<p>We introduce a general abstract framework for database repairing in which the
repair notions are defined using formal logic. We differentiate between
integrity constraints and the so-called query constraints. The former are used
to model consistency and desirable properties of the data (such as functional
dependencies and independencies), while the latter relate two database
instances according to their answers to the query constraints. The framework
also admits a distinction between hard and soft queries, allowing one to
preserve the answers of a core set of queries as well as to define a distance
between instances based on query answers. We exemplify how various notions of repairs
from the literature can be modelled in our unifying framework. Furthermore, we
initiate a complexity-theoretic analysis of the problems of consistent query
answering, repair computation, and existence of repair within the new
framework. We present both coNP- and NP-hard cases that illustrate the
interplay between computationally hard problems and more flexible repair
notions. We show general upper bounds in NP and the second level of the
polynomial hierarchy. Finally, we relate the existence of a repair to model
checking of existential second-order logic.
</p>
|
|
|
|
<p>Matroidal entropy functions are entropy functions in the form $\mathbf{h} =
\log v \cdot \mathbf{r}_M$, where $v \ge 2$ is an integer and $\mathbf{r}_M$
is the rank function of a matroid $M$. They can be applied to the capacity
characterization and code construction of information-theoretic problems such
as network coding, secret sharing, index coding, and locally repairable codes.
In this paper, by constructing variable strength arrays of some matroid
operations, we characterize the matroidal entropy functions induced by regular
matroids and by some matroids with the same p-characteristic set as the
uniform matroid $U_{2,4}$.
</p>
|
|
|
|
<p>Transfer learning plays a key role in modern data analysis when: (1) the
target data are scarce but the source data are sufficient; (2) the
distributions of the source and target data are heterogeneous. This paper
develops an interpretable unified transfer learning model, termed UTrans,
which can detect both transferable variables and transferable source data.
More specifically, we establish estimation error bounds and prove that our
bounds are lower than those obtained with target data only. In addition, we
propose a source detection algorithm based on hypothesis testing to exclude nontransferable
data. We evaluate and compare UTrans to the existing algorithms in multiple
experiments. It is shown that UTrans attains much lower estimation and
prediction errors than the existing methods, while preserving interpretability.
We finally apply it to the US intergenerational mobility data and compare our
proposed algorithms to the classical machine learning algorithms.
</p>
|
|
|
|
<p>The expressiveness of neural networks highly depends on the nature of the
activation function, although these are usually assumed predefined and fixed
during the training stage. Under a signal processing perspective, in this paper
we present Expressive Neural Network (ENN), a novel model in which the
non-linear activation functions are modeled using the Discrete Cosine Transform
(DCT) and adapted using backpropagation during training. This parametrization
keeps the number of trainable parameters low, is appropriate for gradient-based
schemes, and adapts to different learning tasks. This is the first non-linear
model for activation functions that relies on a signal processing perspective,
providing high flexibility and expressiveness to the network. We contribute
insights into the explainability of the network at convergence by recovering
the concept of a bump, that is, the response of each activation function in
the output space. Finally, through exhaustive experiments we show that the
model can adapt to classification and regression tasks. ENN outperforms
state-of-the-art benchmarks, providing accuracy gains of more than 40% in
some scenarios.
</p>
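<p>The abstract does not give the exact parametrization. A plausible minimal
sketch in PyTorch follows, with the basis size and the input squashing chosen
purely for illustration.</p>
<pre><code>
import math
import torch
import torch.nn as nn

class DCTActivation(nn.Module):
    """Sketch of an activation parameterized by trainable DCT coefficients
    (an illustrative reading of the idea, not the paper's exact model)."""
    def __init__(self, n_coeffs=16):  # assumed basis size
        super().__init__()
        self.c = nn.Parameter(torch.randn(n_coeffs) * 0.1)

    def forward(self, x):
        # Squash the input to (0, 1) so the cosine basis is well defined.
        u = torch.sigmoid(x)
        k = torch.arange(self.c.numel(), device=x.device, dtype=x.dtype)
        # DCT-style basis evaluated at u: cos(pi * k * u)
        basis = torch.cos(math.pi * k * u.unsqueeze(-1))
        return basis @ self.c  # learned nonlinearity, trainable by backprop

act = DCTActivation()
print(act(torch.randn(4, 8)).shape)  # torch.Size([4, 8])
</code></pre>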
|
|
|
|
<p>As a popular channel pruning method for convolutional neural networks (CNNs),
network slimming (NS) has a three-stage process: (1) it trains a CNN with
$\ell_1$ regularization applied to the scaling factors of the batch
normalization layers; (2) it removes channels whose scaling factors are below a
chosen threshold; and (3) it retrains the pruned model to recover the original
accuracy. This time-consuming, three-step process is a result of using
subgradient descent to train CNNs. Because subgradient descent does not exactly
train CNNs towards sparse, accurate structures, the latter two steps are
necessary. Moreover, subgradient descent does not have any convergence
guarantee. Therefore, we develop an alternative algorithm called proximal NS.
Our proposed algorithm trains CNNs towards sparse, accurate structures, so
identifying a scaling factor threshold is unnecessary and fine-tuning the
pruned CNNs is optional. Using Kurdyka-{\L}ojasiewicz assumptions, we establish
global convergence of proximal NS. Lastly, we validate the efficacy of the
proposed algorithm on VGGNet, DenseNet and ResNet on CIFAR 10/100. Our
experiments demonstrate that after one round of training, proximal NS yields a
CNN with competitive accuracy and compression.
</p>
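<p>The proximal step for an $\ell_1$ penalty is soft-thresholding, so a
minimal sketch of the core update on batch-norm scaling factors might look as
follows; the paper's full algorithm and schedule are not reproduced here.</p>
<pre><code>
import torch

@torch.no_grad()
def prox_l1_step(bn_gammas, lr, lam):
    """One proximal step on batch-norm scaling factors: after the usual
    gradient update on the unpenalized loss, apply the prox of
    lr*lam*||.||_1, i.e. soft-thresholding (illustrative sketch)."""
    for g in bn_gammas:
        g.copy_(torch.sign(g) * torch.clamp(g.abs() - lr * lam, min=0.0))

# Hypothetical usage, after optimizer.step() on the unpenalized loss:
# gammas = [m.weight for m in model.modules()
#           if isinstance(m, torch.nn.BatchNorm2d)]
# prox_l1_step(gammas, lr=0.1, lam=1e-4)
</code></pre>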
|
|
|
|
<p>We provide two families of algorithms to compute characteristic polynomials
of endomorphisms and norms of isogenies of Drinfeld modules. Our algorithms
work for Drinfeld modules of any rank, defined over any base curve. When the
base curve is $\mathbb P^1_{\mathbb F_q}$, we do a thorough study of the
complexity, demonstrating that our algorithms are, in many cases, the most
asymptotically performant. The first family of algorithms relies on the
correspondence between Drinfeld modules and Anderson motives, reducing the
computation to linear algebra over a polynomial ring. The second family,
available only for the Frobenius endomorphism, is based on a formula expressing
the characteristic polynomial of the Frobenius as a reduced norm in a central
simple algebra.
</p>
|
|
|
|
<p>In this paper, we provide a theoretical analysis of the recently introduced
weakly adversarial networks (WAN) method, used to approximate partial
differential equations in high dimensions. We address the existence and
stability of the solution, as well as approximation bounds. More precisely, we
prove the existence of discrete solutions, intended in a suitable weak sense,
for which we prove a quasi-best approximation estimate similar to Cea's lemma,
a result commonly found in finite element methods. We also propose two new
stabilized WAN-based formulas that avoid the need for direct normalization.
Furthermore, we analyze the method's effectiveness for the Dirichlet boundary
problem that employs the implicit representation of the geometry. The key
requirement for achieving the best approximation outcome is to ensure that the
space for the test network satisfies a specific condition, known as the inf-sup
condition, essentially requiring that the test network set is sufficiently
large when compared to the trial space. The method's accuracy, however, is only
determined by the space of the trial network. We also devise a pseudo-time
XNODE neural network class for static PDE problems, yielding significantly
faster convergence than classical DNNs.
</p>
|
|
|
|
<p>Translational research requires data at multiple scales of biological
organization. Advancements in sequencing and multi-omics technologies have
increased the availability of these data, but researchers face significant
integration challenges. Knowledge graphs (KGs) are used to model complex
phenomena, and methods exist to construct them automatically. However, tackling
complex biomedical integration problems requires flexibility in the way
knowledge is modeled. Moreover, existing KG construction methods provide robust
tooling at the cost of fixed or limited choices among knowledge representation
models. PheKnowLator (Phenotype Knowledge Translator) is a semantic ecosystem
for automating the FAIR (Findable, Accessible, Interoperable, and Reusable)
construction of ontologically grounded KGs with fully customizable knowledge
representation. The ecosystem includes KG construction resources (e.g., data
preparation APIs), analysis tools (e.g., SPARQL endpoints and abstraction
algorithms), and benchmarks (e.g., prebuilt KGs and embeddings). We evaluated
the ecosystem by systematically comparing it to existing open-source KG
construction methods and by analyzing its computational performance when used
to construct 12 large-scale KGs. With flexible knowledge representation,
PheKnowLator enables fully customizable KGs without compromising performance or
usability.
</p>
|
|
|
|
<p>Optimal Transport has sparked vivid interest in recent years, in particular
thanks to the Wasserstein distance, which provides a geometrically sensible and
intuitive way of comparing probability measures. For computational reasons, the
Sliced Wasserstein (SW) distance was introduced as an alternative to the
Wasserstein distance, and has been used to train generative Neural Networks
(NNs). While convergence of Stochastic Gradient Descent (SGD) has been observed
practically in such a setting, there is to our knowledge no theoretical
guarantee for this observation. Leveraging recent works on convergence of SGD
on non-smooth and non-convex functions by Bianchi et al. (2022), we aim to
bridge that knowledge gap, and provide a realistic context under which
fixed-step SGD trajectories for the SW loss on NN parameters converge. More
precisely, we show that the trajectories approach the set of (sub)-gradient
flow equations as the step decreases. Under stricter assumptions, we show a
much stronger convergence result for noised and projected SGD schemes, namely
that the long-run limits of the trajectories approach a set of generalised
critical points of the loss function.
</p>
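<p>For concreteness, a standard Monte Carlo estimator of the squared SW-2
distance between two equal-sized sample batches, of the kind typically used
as a NN training loss, can be sketched as follows; the projection count is an
arbitrary choice.</p>
<pre><code>
import torch

def sliced_wasserstein(x, y, n_proj=128):
    """Monte Carlo estimate of the squared Sliced Wasserstein-2 distance
    between two equal-sized batches of d-dimensional samples (sketch)."""
    d = x.shape[1]
    theta = torch.randn(d, n_proj)
    theta = theta / theta.norm(dim=0, keepdim=True)   # random unit directions
    # Project, sort each 1-D projection, and compare order statistics.
    px, _ = torch.sort(x @ theta, dim=0)
    py, _ = torch.sort(y @ theta, dim=0)
    return ((px - py) ** 2).mean()

x, y = torch.randn(256, 10), torch.randn(256, 10) + 1.0
print(sliced_wasserstein(x, y))  # larger than for identical distributions
</code></pre>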
|
|
|
|
<p>This paper unveils CG-Eval, the first-ever comprehensive and automated
evaluation framework designed for assessing the generative capabilities of
large Chinese language models across a spectrum of academic disciplines.
CG-Eval stands out for its automated process, which critically assesses models
based on their proficiency in generating precise and contextually relevant
responses to a diverse array of questions within six key domains: Science and
Engineering, Humanities and Social Sciences, Mathematical Calculations, Medical
Practitioner Qualification Examination, Judicial Examination, and Certified
Public Accountant Examination. Alongside this, we introduce Gscore, an
innovative composite index developed from a weighted sum of multiple metrics.
Gscore uniquely automates the quality measurement of a model's text generation
against reference standards, providing a detailed and nuanced assessment of
model performance. This automation not only enhances the efficiency and
scalability of the evaluation process but also ensures objective and consistent
assessment across various models. The detailed test data and results,
highlighting the robust capabilities and comparative performance of the
evaluated models, are accessible at <a href="http://cgeval.besteasy.com/.">this http URL</a>
</p>
|
|
|
|
<p>Self-supervised learning (SSL) has gained significant interest in recent
years as a solution to address the challenges posed by sparse and noisy data in
recommender systems. Despite the growing number of SSL algorithms designed to
provide state-of-the-art performance in various recommendation scenarios (e.g.,
graph collaborative filtering, sequential recommendation, social
recommendation, KG-enhanced recommendation), there is still a lack of unified
frameworks that integrate recommendation algorithms across different domains.
Such a framework could serve as the cornerstone for self-supervised
recommendation algorithms, unifying the validation of existing methods and
driving the design of new ones. To address this gap, we introduce SSLRec, a
novel benchmark platform that provides a standardized, flexible, and
comprehensive framework for evaluating various SSL-enhanced recommenders. The
SSLRec framework features a modular architecture that allows users to easily
evaluate state-of-the-art models and a complete set of data augmentation and
self-supervised toolkits to help create SSL recommendation models with specific
needs. Furthermore, SSLRec simplifies the process of training and evaluating
different recommendation models with consistent and fair settings. Our SSLRec
platform covers a comprehensive set of state-of-the-art SSL-enhanced
recommendation models across different scenarios, enabling researchers to
evaluate these cutting-edge models and drive further innovation in the field.
Our implemented SSLRec framework is available at the source code repository
https://github.com/HKUDS/SSLRec.
</p>
|
|
|
|
<p>The recent proliferation of electric vehicle (EV) charging events has placed
prominent stress on power grid operation. Due to stochastic and volatile
EV charging behaviors, the induced charging loads are extremely uncertain,
posing modeling and control challenges for grid operators and charging
management. Scenario generation, which synthesizes a myriad of realistic
charging profiles, can aid in addressing these challenges. To this end, we propose a novel
myriad of realistic charging scenarios. To this end, we propose a novel
denoising Diffusion-based Charging scenario generation model DiffCharge, which
is capable of generating a broad variety of realistic EV charging profiles with
distinctive temporal properties. It progressively converts simple Gaussian
noise into genuine charging time-series data by learning a parameterized
reversal of a forward diffusion process. Besides, we leverage multi-head
self-attention and prior conditions to capture the temporal correlations and
the unique information associated with EV or charging-station types in real
charging profiles. Moreover, we demonstrate the superiority of
DiffCharge on extensive real-world charging datasets, as well as the efficacy
on EV integration in power distribution grids.
</p>
|
|
|
|
<p>The increasing versatility of language models (LMs) has given rise to a new
class of benchmarks that comprehensively assess a broad range of capabilities.
Such benchmarks are associated with massive computational costs, reaching
thousands of GPU hours per model. However, the efficiency aspect of these
evaluation efforts has raised little discussion in the literature. In this
work, we present the problem of Efficient Benchmarking, namely intelligently
reducing the computation costs of LM evaluation without compromising
reliability. Using the HELM benchmark as a test case, we investigate how
different benchmark design choices affect the computation-reliability
tradeoff. We propose to evaluate the reliability of such decisions by using a
new measure, Decision Impact on Reliability (DIoR for short). We find, for
example, that the current leader on HELM may change by merely removing a
low-ranked model from the benchmark, and observe that a handful of examples
suffice to obtain the correct benchmark ranking. Conversely, a slightly
different choice of HELM scenarios varies rankings widely. Based on our
findings, we outline a set of concrete recommendations for more efficient
benchmark design and utilization practices, leading to dramatic cost savings
with minimal loss of benchmark reliability, often reducing computation
by 100x or more.
</p>
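<p>DIoR's exact definition is not given in the abstract. The sketch below
illustrates the underlying idea with a simple bootstrap on synthetic data:
perturb one benchmark design choice (here, the subset of scenarios, a
hypothetical stand-in) and measure how stable the leaderboard ranking is via
Kendall's tau.</p>
<pre><code>
import numpy as np
from scipy.stats import kendalltau

# Hypothetical leaderboard: scores[m, s] = score of model m on scenario s.
rng = np.random.default_rng(0)
scores = rng.uniform(size=(10, 40))
full_means = scores.mean(axis=1)

# How much does evaluating on a random half of the scenarios change the ranking?
taus = []
for _ in range(200):
    keep = rng.choice(scores.shape[1], size=20, replace=False)
    tau, _ = kendalltau(full_means, scores[:, keep].mean(axis=1))
    taus.append(tau)
print(f"mean Kendall tau vs. full benchmark: {np.mean(taus):.3f}")
</code></pre>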
|
|
|
|
<p>Deep learning (DL) has become one of the mainstream and effective methods for
point cloud analysis tasks such as detection, segmentation, and classification.
To reduce overfitting when training DL models and to improve model
performance, especially when the amount and/or diversity of training data is
limited, augmentation is often crucial. Although various point cloud data augmentation
methods have been widely used in different point cloud processing tasks, there
are currently no published systematic surveys or reviews of these methods.
Therefore, this article surveys these methods, categorizing them into a
taxonomy framework that comprises basic and advanced point cloud data
augmentation methods, according to their levels of complexity. Through a
comprehensive evaluation of these augmentation methods, this article identifies
their potential and limitations, serving as a useful reference for choosing
appropriate augmentation methods. In addition, potential directions for future
research are recommended. This survey contributes to providing a holistic
overview of the current state of point cloud data augmentation, promoting its
wider application and development.
</p>
|
|
|
|
<p>Graph Neural Networks (GNNs) have emerged as promising solutions for
collaborative filtering (CF) through the modeling of user-item interaction
graphs. The nucleus of existing GNN-based recommender systems involves
recursive message passing along user-item interaction edges to refine encoded
embeddings. Despite their demonstrated effectiveness, current GNN-based methods
encounter challenges of limited receptive fields and the presence of noisy
``interest-irrelevant'' connections. In contrast, Transformer-based methods
excel in aggregating information adaptively and globally. Nevertheless, their
application to large-scale interaction graphs is hindered by inherent
complexities and challenges in capturing intricate, entangled structural
information. In this paper, we propose TransGNN, a novel model that integrates
Transformer and GNN layers in an alternating fashion to mutually enhance their
capabilities. Specifically, TransGNN leverages Transformer layers to broaden
the receptive field and disentangle information aggregation from edges, which
aggregates information from more relevant nodes, thereby enhancing the message
passing of GNNs. Additionally, to capture graph structure information
effectively, positional encoding is meticulously designed and integrated into
GNN layers to encode such structural knowledge into node attributes, thus
enhancing the Transformer's performance on graphs. Efficiency concerns are
also addressed by sampling the most relevant nodes for the Transformer, along
with two efficient sample-update strategies to reduce complexity.
Furthermore, theoretical analysis demonstrates that TransGNN offers
increased expressiveness compared to GNNs, with only a marginal increase in
linear complexity. Extensive experiments on five public datasets validate the
effectiveness and efficiency of TransGNN.
</p>
|
|
|
|
<p>During the past decade, deep neural networks have led to fast-paced progress
and significant achievements in computer vision problems, for both academia and
industry. Yet despite their success, state-of-the-art image classification
approaches fail to generalize well in previously unseen visual contexts, as
required by many real-world applications. In this paper, we focus on this
domain generalization (DG) problem and argue that the generalization ability of
deep convolutional neural networks can be improved by taking advantage of
multi-layer and multi-scaled representations of the network. We introduce a
framework that aims at improving domain generalization of image classifiers by
combining both low-level and high-level features at multiple scales, enabling
the network to implicitly disentangle representations in its latent space and
learn domain-invariant attributes of the depicted objects. Additionally, to
further facilitate robust representation learning, we propose a novel objective
function, inspired by contrastive learning, which aims at constraining the
extracted representations to remain invariant under distribution shifts. We
demonstrate the effectiveness of our method by evaluating on the domain
generalization datasets of PACS, VLCS, Office-Home and NICO. Through extensive
experimentation, we show that our model is able to surpass the performance of
previous DG methods and consistently produce competitive and state-of-the-art
results in all datasets.
</p>
|
|
|
|
<p>High-definition (HD) maps play a crucial role in autonomous driving systems.
Recent methods have attempted to construct HD maps in real-time using vehicle
onboard sensors. Due to the inherent limitations of onboard sensors, which
include sensitivity to detection range and susceptibility to occlusion by
nearby vehicles, the performance of these methods significantly declines in
complex scenarios and long-range detection tasks. In this paper, we explore a
new perspective that boosts HD map construction through the use of satellite
maps to complement onboard sensors. We initially generate the satellite map
tiles for each sample in nuScenes and release a complementary dataset for
further research. To enable better integration of satellite maps with existing
methods, we propose a hierarchical fusion module, which includes feature-level
fusion and BEV-level fusion. The feature-level fusion, composed of a mask
generator and a masked cross-attention mechanism, is used to refine the
features from onboard sensors. The BEV-level fusion mitigates the coordinate
differences between features obtained from onboard sensors and satellite maps
through an alignment module. The experimental results on the augmented nuScenes
showcase the seamless integration of our module into three existing HD map
construction methods. The satellite maps and our proposed module notably
enhance their performance in both HD map semantic segmentation and instance
detection tasks.
</p>
|
|
|
|
<p>Recommender models excel at providing domain-specific item recommendations by
leveraging extensive user behavior data. Despite their ability to act as
lightweight domain experts, they struggle to perform versatile tasks such as
providing explanations and engaging in conversations. On the other hand, large
language models (LLMs) represent a significant step towards artificial general
intelligence, showcasing remarkable capabilities in instruction comprehension,
commonsense reasoning, and human interaction. However, LLMs lack the knowledge
of domain-specific item catalogs and behavioral patterns, particularly in areas
that diverge from general world knowledge, such as online e-commerce.
Finetuning LLMs for each domain is neither economical nor efficient.
</p>
<p>In this paper, we bridge the gap between recommender models and LLMs,
combining their respective strengths to create a versatile and interactive
recommender system. We introduce an efficient framework called
\textbf{InteRecAgent}, which employs LLMs as the brain and recommender models
as tools. We first outline a minimal set of essential tools required to
transform LLMs into InteRecAgent. We then propose an efficient workflow within
InteRecAgent for task execution, incorporating key components such as memory
components, dynamic demonstration-augmented task planning, and reflection.
InteRecAgent enables traditional recommender systems, such as ID-based
matrix factorization models, to become interactive systems with a natural
language interface through the integration of LLMs. Experimental results on
several public datasets show that InteRecAgent achieves satisfying performance
as a conversational recommender system, outperforming general-purpose LLMs. The
source code of InteRecAgent is released at https://aka.ms/recagent.
</p>
|
|
|
|
<p>How to solve high-dimensional linear programs (LPs) efficiently is a
fundamental question. Recently, there has been a surge of interest in reducing
LP sizes using \textit{random projections}, which can accelerate solving LPs
independently of improving LP solvers.
</p>
<p>In this paper, we explore a new direction of \emph{data-driven projections},
which use projection matrices learned from data instead of random projection
matrices. Given data of past $n$-dimensional LPs, we learn an $n\times k$
projection matrix such that $n > k$. When addressing a future LP instance, we
reduce its dimensionality from $n$ to $k$ via the learned projection matrix,
solve the resulting LP to obtain a $k$-dimensional solution, and apply the
learned matrix to it to recover an $n$-dimensional solution.
</p>
<p>On the theoretical side, a natural question is: how much data is sufficient
to ensure the quality of recovered solutions? We address this question based on
the framework of \textit{data-driven algorithm design}, which connects the
amount of data sufficient for establishing generalization bounds to the
\textit{pseudo-dimension} of performance metrics. We obtain an
$\tilde{\mathrm{O}}(nk^2)$ upper bound on the pseudo-dimension, where
$\tilde{\mathrm{O}}$ compresses logarithmic factors. We also provide an
$\Omega(nk)$ lower bound, implying our result is tight up to an
$\tilde{\mathrm{O}}(k)$ factor.
</p>
<p>On the practical side, we explore two natural methods for learning projection
matrices: PCA- and gradient-based methods. While the former is simple and
efficient, the latter can sometimes lead to better solution quality. Our
experiments confirm the practical benefit of learning projection matrices from
data, achieving significantly higher solution quality than the existing random
projection while greatly reducing the time for solving LPs.
</p>
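<p>As a rough illustration of the pipeline (the PCA-based variant, not the
paper's gradient-based one), the sketch below learns a projection from past
solutions, solves the reduced LP with SciPy, and recovers a full-dimensional
point. The data, the problem sizes, and the box bound on the reduced
variables are assumptions for the demo, and recovered solutions are generally
approximate.</p>
<pre><code>
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n, k, m = 50, 5, 30

# Hypothetical "past data": optimal solutions of related LPs.
past_solutions = rng.normal(size=(100, n))
# PCA-style projection: top-k principal directions of past solutions.
_, _, Vt = np.linalg.svd(past_solutions - past_solutions.mean(0))
P = Vt[:k].T                                   # n x k projection matrix

# A future LP instance: min c@x  s.t.  A@x <= b.
c, A, b = rng.normal(size=n), rng.normal(size=(m, n)), rng.uniform(1, 2, size=m)

# Solve the reduced k-dimensional LP over y, with x = P @ y.
# (The box bound on y keeps this toy instance bounded.)
res = linprog(P.T @ c, A_ub=A @ P, b_ub=b, bounds=[(-1, 1)] * k)
x_recovered = P @ res.x                        # n-dimensional solution
print(res.status, c @ x_recovered)
</code></pre>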
|
|
|
|
<p>Simulating turbulent flows is crucial for a wide range of applications, and
machine learning-based solvers are gaining increasing relevance. However,
achieving temporal stability when generalizing to longer rollout horizons
remains a persistent challenge for learned PDE solvers. In this work, we
analyze if fully data-driven fluid solvers that utilize an autoregressive
rollout based on conditional diffusion models are a viable option to address
this challenge. We investigate accuracy, posterior sampling, spectral behavior,
and temporal stability, while requiring that methods generalize to flow
parameters beyond the training regime. To quantitatively and qualitatively
benchmark the performance of a range of flow prediction approaches, three
challenging scenarios including incompressible and transonic flows, as well as
isotropic turbulence are employed. We find that even simple diffusion-based
approaches can outperform multiple established flow prediction methods in terms
of accuracy and temporal stability, while being on par with state-of-the-art
stabilization techniques like unrolling at training time. Such traditional
architectures are superior in terms of inference speed; however, the
probabilistic nature of diffusion approaches allows for inferring multiple
predictions that align with the statistics of the underlying physics. Overall,
our benchmark contains three carefully chosen data sets that are suitable for
probabilistic evaluation alongside various established flow prediction
architectures.
</p>
|
|
|
|
<p>Two CNF formulas are called ucp-equivalent, if they behave in the same way
with respect to the unit clause propagation (UCP). A formula is called
ucp-irredundant, if removing any clause leads to a formula which is not
ucp-equivalent to the original one. As a consequence of known results, the
ratio between the size of a ucp-irredundant formula and the size of a smallest
ucp-equivalent formula is at most $n^2$, where $n$ is the number of
variables. We demonstrate an example of a ucp-irredundant formula for a
symmetric definite Horn function which is larger than a smallest ucp-equivalent
formula by a factor $\Omega(n/\ln n)$ and, hence, a general upper bound on the
above ratio cannot be smaller than this.
</p>
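<p>To make "ucp-equivalent" concrete, here is a minimal unit-clause-propagation
routine; informally, two formulas are ucp-equivalent when this procedure
derives the same literals (or the same conflicts) under every partial
assignment.</p>
<pre><code>
def unit_propagate(clauses):
    """Unit clause propagation: repeatedly satisfy unit clauses and simplify.
    Clauses are lists of nonzero ints; a negative int is a negated variable.
    Returns the set of derived literals, or None on a conflict."""
    derived, changed = set(), True
    while changed:
        changed = False
        for clause in clauses:
            if any(l in derived for l in clause):       # clause satisfied
                continue
            unassigned = [l for l in clause if -l not in derived]
            if not unassigned:                          # empty clause: conflict
                return None
            if len(unassigned) == 1 and unassigned[0] not in derived:
                derived.add(unassigned[0])              # unit clause fires
                changed = True
    return derived

# (x1) and (-x1 or x2): UCP derives x1 and then x2.
print(unit_propagate([[1], [-1, 2]]))  # {1, 2}
</code></pre>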
|
|
|
|
<p>In this paper, we explore the dynamic behavior of threshold networks on
undirected signed graphs. Much attention has been dedicated to understanding
the convergence and long-term behavior of this model. Yet, an open question
persists: How does the underlying graph structure impact network dynamics?
Similar studies have been carried out for threshold networks and other types of
networks, but these primarily focus on unsigned networks. Here, we address
this question for signed threshold networks. We introduce the stability index of a
graph, related to the concepts of frustration and balance in signed graphs, to
establish a connection between the structure and the dynamics of such networks.
We show that graphs with a negative stability index exhibit stable dynamics,
i.e., the dynamics converge to fixed points regardless of the threshold
parameters. Conversely, if at least one subgraph has a positive stability
index, oscillations in the long-term behavior may appear. Furthermore, we
generalize the analysis to network dynamics under periodic update modes and
explore the case of the existence of some subgraph with a positive stability
index, for which we find that attractors of super-polynomial period in the size
of the network may appear.
</p>
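<p>The paper's precise update and sign conventions are not given in the
abstract; a minimal sketch of one common convention for synchronous threshold
dynamics on a signed graph follows.</p>
<pre><code>
import numpy as np

def step(W, theta, x):
    """Synchronous update of a signed threshold network: W is a symmetric
    matrix with entries in {-1, 0, +1} (signed adjacency), theta the
    threshold vector, x the current +/-1 state. This sign convention is one
    common choice, not necessarily the paper's exact definition."""
    return np.where(W @ x >= theta, 1, -1)

# Tiny signed graph with positive and negative edges.
W = np.array([[ 0, 1, -1],
              [ 1, 0,  1],
              [-1, 1,  0]])
theta = np.zeros(3)
x = np.array([1, -1, 1])
for _ in range(5):
    x = step(W, theta, x)
    print(x)   # iterate until a fixed point or a cycle appears
</code></pre>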
|
|
|
|
<p>Composed image retrieval, a task involving the search for a target image
using a reference image and a complementary text as the query, has witnessed
significant advancements owing to the progress made in cross-modal modeling.
Unlike the general image-text retrieval problem with only one alignment
relation, i.e., image-text, we argue for the existence of two types of
relations in composed image retrieval. The explicit relation pertains to the
reference image & complementary text-target image, which is commonly exploited
by existing methods. Besides this intuitive relation, the observations during
our practice have uncovered another implicit yet crucial relation, i.e.,
reference image & target image-complementary text, since we found that the
complementary text can be inferred by studying the relation between the target
image and the reference image. Regrettably, existing methods largely focus on
leveraging the explicit relation to learn their networks, while overlooking the
implicit relation. In response to this weakness, we propose a new framework for
composed image retrieval, termed dual relation alignment, which integrates both
explicit and implicit relations to fully exploit the correlations among the
triplets. Specifically, we design a vision compositor to first fuse the
reference image and the target image; the resulting representation then serves
two roles: (1) a counterpart for semantic alignment with the complementary text
and (2) a compensation for the complementary text to boost the explicit
relation modeling, thereby implanting the implicit relation into the alignment learning.
Our method is evaluated on two popular datasets, CIRR and FashionIQ, through
extensive experiments. The results confirm the effectiveness of our
dual-relation learning in substantially enhancing composed image retrieval
performance.
</p>
|
|
|
|
<p>Physical systems can often be described via a continuous-time dynamical
system. In practice, the true system is often unknown and has to be learned
from measurement data. Since data is typically collected in discrete time, e.g.
by sensors, most methods in Gaussian process (GP) dynamics model learning are
trained on one-step ahead predictions. This can become problematic in several
scenarios, e.g. if measurements are provided at irregularly-sampled time steps
or physical system properties have to be conserved. Thus, we aim for a GP model
of the true continuous-time dynamics. Higher-order numerical integrators
provide the necessary tools to address this problem by discretizing the
dynamics function with arbitrary accuracy. Many higher-order integrators
require dynamics evaluations at intermediate time steps making exact GP
inference intractable. In previous work, this problem is often tackled by
approximating the GP posterior with variational inference. However, exact GP
inference is preferable in many scenarios, e.g. due to its mathematical
guarantees. In order to make direct inference tractable, we propose to leverage
multistep and Taylor integrators. We demonstrate how to derive flexible
inference schemes for these types of integrators. Further, we derive tailored
sampling schemes that allow drawing consistent dynamics functions from the
learned posterior. This is crucial for sampling consistent predictions from the
dynamics model. We demonstrate empirically and theoretically that our approach
yields an accurate representation of the continuous-time system.
</p>
|
|
|
|
<p>Dynamics model learning deals with the task of inferring unknown dynamics
from measurement data and predicting the future behavior of the system. A
typical approach to address this problem is to train recurrent models. However,
predictions with these models are often not physically meaningful. Further,
they suffer from deteriorated behavior over time due to accumulating errors.
Often, simulators built on first principles are available and are physically
meaningful by design. However, modeling simplifications typically cause
inaccuracies in these models. Consequently, hybrid modeling is an emerging
trend that aims to combine the best of both worlds. In this paper, we propose a
new approach to hybrid modeling, where we inform the latent states of a learned
model via a black-box simulator. This allows controlling the predictions via
the simulator, preventing them from accumulating errors. This is especially
challenging since, in contrast to previous approaches, access to the
simulator's latent states is not available. We tackle the task by leveraging
observers, a well-known concept from control theory, inferring unknown latent
states from observations and dynamics over time. In our learning-based setting,
we jointly learn the dynamics and an observer that infers the latent states via
the simulator. Thus, the simulator constantly corrects the latent states,
compensating for modeling mismatch caused by learning. To maintain flexibility,
we train an RNN-based residuum for the latent states that cannot be informed by
the simulator.
</p>
|
|
|
|
<p>We explore the impact of coarse quantization on low-rank matrix sensing in
the extreme scenario of dithered one-bit sampling, where the high-resolution
measurements are compared with random time-varying threshold levels. To recover
the low-rank matrix of interest from the highly-quantized collected data, we
offer an enhanced randomized Kaczmarz algorithm that efficiently solves the
emerging highly-overdetermined feasibility problem. Additionally, we provide
theoretical guarantees in terms of the convergence and sample size
requirements. Our numerical results demonstrate the effectiveness of the
proposed methodology.
</p>
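<p>The abstract concerns low-rank matrices; for brevity, the sketch below
treats the unknown as a plain vector and shows only the core mechanism:
dithered one-bit measurements y_i = sign(a_i.x - tau_i) define halfspace
constraints y_i(a_i.x - tau_i) >= 0, and randomized Kaczmarz projects onto a
randomly chosen violated halfspace at each step. This is one reading of the
setup, not the authors' enhanced algorithm.</p>
<pre><code>
import numpy as np

rng = np.random.default_rng(0)
n, m = 20, 2000
x_true = rng.normal(size=n)

A = rng.normal(size=(m, n))           # sensing vectors
tau = rng.normal(size=m)              # random dithering thresholds
y = np.sign(A @ x_true - tau)         # one-bit measurements

# Randomized Kaczmarz on the feasibility system y_i*(a_i@x - tau_i) >= 0:
# project onto a randomly chosen violated halfspace each iteration.
x = np.zeros(n)
for _ in range(50000):
    i = rng.integers(m)
    if y[i] * (A[i] @ x - tau[i]) < 0:          # constraint violated
        x -= (A[i] @ x - tau[i]) / (A[i] @ A[i]) * A[i]
print(np.linalg.norm(x - x_true) / np.linalg.norm(x_true))  # relative error
</code></pre>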
|
|
|
|
<p>There are now many explainable AI methods for understanding the decisions of
a machine learning model. Among these are methods based on counterfactual
reasoning, which involve simulating feature changes and observing the impact
on the prediction. This article proposes to view this simulation process as a
source of knowledge that can be stored and later used in different ways. The
process is illustrated for the additive model and, more specifically, for the
naive Bayes classifier, whose interesting properties for this purpose are
shown.
</p>
|
|
|
|
<p>Language models often exhibit behaviors that improve performance on a
pre-training objective but harm performance on downstream tasks. We propose a
novel approach to removing undesirable behaviors by ablating a small number of
causal pathways between model components, with the intention of disabling the
computational circuit responsible for the bad behavior. Given a small dataset
of inputs where the model behaves poorly, we learn to ablate a small number of
important causal pathways. In the setting of reducing GPT-2 toxic language
generation, we find ablating just 12 of the 11.6K causal edges mitigates toxic
generation with minimal degradation of performance on other inputs.
</p>
|
|
|
|
<p>Video-based remote physiological measurement utilizes facial videos to
measure the blood volume change signal, which is also called remote
photoplethysmography (rPPG). Supervised methods for rPPG measurements have been
shown to achieve good performance. However, the drawback of these methods is
that they require facial videos with ground truth (GT) physiological signals,
which are often costly and difficult to obtain. In this paper, we propose
Contrast-Phys+, a method that can be trained in both unsupervised and
weakly-supervised settings. We employ a 3DCNN model to generate multiple
spatiotemporal rPPG signals and incorporate prior knowledge of rPPG into a
contrastive loss function. We further incorporate the GT signals into
contrastive learning to adapt to partial or misaligned labels. The contrastive
loss encourages rPPG/GT signals from the same video to be grouped together,
while pushing those from different videos apart. We evaluate our methods on
five publicly available datasets that include both RGB and Near-infrared
videos. Contrast-Phys+ outperforms the state-of-the-art supervised methods,
even when using partially available or misaligned GT signals, or no labels at
all. Additionally, we highlight the advantages of our methods in terms of
computational efficiency, noise robustness, and generalization. Our code is
available at https://github.com/zhaodongsun/contrast-phys.
</p>
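<p>The contrastive objective can be sketched as follows, assuming cosine
similarity between normalized power spectral densities as the
signal-similarity measure and one positive pair per video; the paper's exact
loss, sampling, and weak-supervision extension differ in detail.</p>
<pre>
import torch
import torch.nn.functional as F

def psd(x):                                   # x: (n, T) rPPG signals
    spec = torch.fft.rfft(x, dim=-1).abs() ** 2
    return F.normalize(spec, dim=-1)

def contrastive_rppg_loss(sig_a, sig_b, tau=0.08):
    """sig_a, sig_b: (n_videos, T), two rPPG signals per video.
    Same-video pairs are positives; cross-video pairs are negatives."""
    za, zb = psd(sig_a), psd(sig_b)
    logits = za @ zb.t() / tau                # (n, n) similarity matrix
    targets = torch.arange(za.size(0))        # positives on the diagonal
    return F.cross_entropy(logits, targets)
</pre>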
|
|
|
|
<p>In-context learning (ICL), i.e., showing LLMs only a few task-specific
demonstrations, has led to downstream gains with no task-specific fine-tuning
required. However, LLMs are sensitive to the choice of prompts, and therefore
a crucial research question is how to select good demonstrations for ICL. One
effective strategy is leveraging semantic similarity between the ICL
demonstrations and test inputs by using a text retriever, which, however, is
sub-optimal because it does not consider the LLM's existing knowledge about
the task. From prior work (Lyu et al., 2023), we already know that labels
paired with the demonstrations bias the model predictions. This leads us to
hypothesize that considering the LLM's existing knowledge about the task,
especially with respect to the output label space, can help in a better
demonstration selection strategy. Through extensive experimentation on three
text classification tasks, we find that it is beneficial to not only choose
semantically similar ICL demonstrations but also to choose those
demonstrations that help resolve the inherent label ambiguity surrounding the
test example. Interestingly, we find that including demonstrations that the
LLM previously misclassified and that also fall on the test example's decision
boundary brings the most performance gain.
</p>
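<p>A hedged sketch of the resulting selection strategy is given below:
retrieve semantically similar candidates, then prefer demonstrations whose
gold labels fall in the test example's ambiguous label set and that the model
previously misclassified. The embed/llm_* helpers are toy stand-ins, and the
additive scoring is our simplification of the paper's procedure.</p>
<pre>
import numpy as np

def embed(text):                       # toy stand-in sentence encoder
    rng = np.random.default_rng(abs(hash(text)) % (2 ** 32))
    return rng.standard_normal(8)

def llm_top2_labels(text):             # stub: the test input's ambiguous labels
    return ["pos", "neg"]

def llm_predict(text):                 # stub zero-shot prediction
    return "pos"

def select_demos(test_x, pool, k=4):
    """pool: list of (text, gold_label) candidate demonstrations."""
    q = embed(test_x)
    sims = [float(q @ embed(x)) for x, _ in pool]
    ambiguous = set(llm_top2_labels(test_x))
    def score(sim, demo):
        x, gold = demo
        # reward demos that resolve label ambiguity or were misclassified
        bonus = (gold in ambiguous) + (llm_predict(x) != gold)
        return sim + bonus
    ranked = sorted(zip(sims, pool), key=lambda t: -score(*t))
    return [demo for _, demo in ranked[:k]]
</pre>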
|
|
|
|
<p>Language models (LMs) have already demonstrated remarkable abilities in
understanding and generating both natural and formal language. Despite these
advances, their integration with real-world environments such as large-scale
knowledge bases (KBs) remains an underdeveloped area, affecting applications
such as semantic parsing and leading to "hallucinated" information. This
paper is an experimental investigation aimed at uncovering the robustness
challenges that LMs encounter when tasked with knowledge base question
answering (KBQA). The investigation covers scenarios with inconsistent data
distribution between training and inference, such as generalization to unseen
domains, adaptation to various language variations, and transferability across
different datasets. Our comprehensive experiments reveal that even when
employed with our proposed data augmentation techniques, advanced small and
large language models exhibit poor performance in various dimensions. While
the LM is a promising technology, its robustness in its current form, when
dealing with complex environments, is fragile and of limited practicality
because of these data distribution issues. This calls for future research on
data collection and LM learning paradigms.
</p>
|
|
|
|
<p>In recent years, predicting mobile app usage has become increasingly
important for areas like app recommendation, user behaviour analysis, and
mobile resource management. Existing models, however, struggle with the
heterogeneous nature of contextual data and the user cold start problem. This
study introduces a novel prediction model, Mobile App Prediction Leveraging
Large Language Model Embeddings (MAPLE), which employs Large Language Models
(LLMs) and installed app similarity to overcome these challenges. MAPLE
utilises the power of LLMs to process contextual data and discern intricate
relationships within it effectively. Additionally, we explore the use of
installed app similarity to address the cold start problem, facilitating the
modelling of user preferences and habits, even for new users with limited
historical data. In essence, our research presents MAPLE as a novel and
practical approach to app usage prediction that addresses the key issues faced
by existing models. In tests on two real-world datasets, MAPLE
surpasses contemporary models in both standard and cold start scenarios. These
outcomes validate MAPLE's capacity for precise app usage predictions and its
resilience against the cold start problem. This enhanced performance stems from
the model's proficiency in capturing complex temporal patterns and leveraging
contextual information. As a result, MAPLE can potentially improve personalised
mobile app usage predictions and user experiences markedly.
</p>
|
|
|
|
<p>With the increasing frequency of high-profile privacy breaches on various
online platforms, users are becoming more concerned about their privacy. As
the recommender system is the core component of online platforms for providing
personalized services, its privacy preservation has attracted great attention.
As the gold standard of privacy protection, differential privacy has been
widely adopted to preserve privacy in recommender systems. However, existing
differentially private recommender systems only consider static and
independent interactions, so they cannot apply to sequential recommendation,
where behaviors are dynamic and dependent. Meanwhile, little attention has
been paid to the privacy risk of sensitive user features; most existing
methods only protect user feedback. In this work, we propose a novel
DIfferentially Private Sequential recommendation framework with a noisy Graph
Neural Network approach (denoted as DIPSGNN) to address these limitations. To
the best of our knowledge, we are the first to achieve differential privacy in
sequential recommendation with dependent interactions. Specifically, in
DIPSGNN, we first leverage the piecewise mechanism to protect sensitive user
features. Then, we add calibrated noise into the aggregation step of the graph
neural network, based on the aggregation perturbation mechanism. This noisy
graph neural network can protect sequentially dependent interactions and
capture user preferences simultaneously. Extensive experiments demonstrate the
superiority of our method over state-of-the-art differentially private
recommender systems in terms of better balance between privacy and accuracy.
</p>
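<p>The aggregation-perturbation step can be sketched as follows, assuming a
Gaussian mechanism with per-node norm clipping to bound each node's
contribution; the paper's noise calibration, privacy accounting, and the
piecewise mechanism for user features are not reproduced here.</p>
<pre>
import torch

def noisy_aggregate(adj, h, clip=1.0, sigma=1.0):
    """adj: (n, n) 0/1 adjacency matrix; h: (n, d) node embeddings."""
    norms = h.norm(dim=1, keepdim=True).clamp(min=clip)
    h_clipped = h * (clip / norms)        # bound each node's contribution
    agg = adj @ h_clipped                 # sum over neighbors
    noise = torch.randn_like(agg) * sigma * clip
    return agg + noise                    # perturbed, privacy-preserving sum
</pre>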
|
|
|
|
<p>The development of large language models (LLMs) capable of following
instructions and engaging in conversational interactions sparked increased
interest in their utilization across various support tools. We investigate the
utility of modern LLMs in assisting professional writers via an empirical user
study (n=30). The design of our collaborative writing interface is grounded in
the cognitive process model of writing that views writing as a goal-oriented
thinking process encompassing non-linear cognitive activities: planning,
translating, and reviewing. Participants are asked to submit a post-completion
survey to provide feedback on the potential and pitfalls of LLMs as writing
collaborators. Upon analyzing the writer-LLM interactions, we find that while
writers seek the LLM's help across all three types of cognitive activities,
they find LLMs more helpful in translating and reviewing. Our findings from
analyzing both the interactions and the survey responses highlight future
research directions in creative writing assistance using LLMs.
</p>
|
|
|
|
<p>Large language models are becoming increasingly practical for translating
code across programming languages, a process known as $transpiling$. Even
though automated transpilation significantly boosts developer productivity, a
key concern is whether the generated code is correct. Existing work initially
used manually crafted test suites to test the translations of a small corpus of
programs; these test suites were later automated. In contrast, we devise the
first approach for automated, functional, property-based testing of code
translation models. Our general, user-provided specifications about the
transpiled code capture a range of properties, from purely syntactic to purely
semantic ones. As shown by our experiments, this approach is very effective in
detecting property violations in popular code translation models, and
therefore, in evaluating model quality with respect to given properties. We
also go a step further and explore the usage scenario where a user simply aims
to obtain a correct translation of some code with respect to certain properties
without necessarily being concerned about the overall quality of the model. To
this end, we develop the first property-guided search procedure for code
translation models, where a model is repeatedly queried with slightly different
parameters to produce alternative and potentially more correct translations.
Our results show that this search procedure helps to obtain significantly
better code translations.
</p>
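<p>The flavor of such property-based differential testing can be sketched with
the hypothesis library: a user-written property relates the source program and
its translation over generated inputs. Here transpiled_sort is a stand-in for
model-translated code; the tool's property language and the property-guided
search (re-querying the model with varied parameters until the property holds)
are richer than this sketch.</p>
<pre>
from hypothesis import given, strategies as st

def source_sort(xs):                 # original program
    return sorted(xs)

def transpiled_sort(xs):             # stand-in for the model's translation
    out = list(xs)
    out.sort()
    return out

@given(st.lists(st.integers()))
def test_semantic_equivalence(xs):
    # purely semantic property: identical input/output behaviour
    assert transpiled_sort(xs) == source_sort(xs)
</pre>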
|
|
|
|
<p>Indoor monocular depth estimation has attracted increasing research interest.
Most previous works have focused on methodology, primarily experimenting
with the NYU-Depth-V2 (NYUv2) dataset, and concentrated only on the overall
performance over the test set. However, little is known regarding robustness
and generalization when it comes to applying monocular depth estimation methods
to real-world scenarios where highly varying and diverse functional
\textit{space types} are present such as library or kitchen. A study for
performance breakdown into space types is essential to realize a pretrained
model's performance variance. To facilitate our investigation for robustness
and address limitations of previous works, we collect InSpaceType, a
high-quality and high-resolution RGBD dataset for general indoor environments.
We benchmark 12 recent methods on InSpaceType and find they severely suffer
from performance imbalance concerning space types, which reveals their
underlying bias. We extend our analysis to 4 other datasets, 3 mitigation
approaches, and the ability to generalize to unseen space types. Our work marks
the first in-depth investigation of performance imbalance across space types
for indoor monocular depth estimation, drawing attention to potential safety
concerns for model deployment without considering space types, and further
shedding light on potential ways to improve robustness. See
\url{https://depthcomputation.github.io/DepthPublic} for data and the
supplementary document. The benchmark list on the GitHub project page is kept
up to date with the latest monocular depth estimation methods.
</p>
|
|
|
|
<p>Advanced Driver-Assistance Systems (ADAS) have successfully integrated
learning-based techniques into vehicle perception and decision-making. However,
their application in 3D lane detection for effective driving environment
perception is hindered by the lack of comprehensive LiDAR datasets. The sparse
nature of LiDAR point cloud data prevents an efficient manual annotation
process. To solve this problem, we present LiSV-3DLane, a large-scale 3D lane
dataset that comprises 20k frames of surround-view LiDAR point clouds with
enriched semantic annotation. Unlike existing datasets confined to a frontal
perspective, LiSV-3DLane provides a full 360-degree spatial panorama around the
ego vehicle, capturing complex lane patterns in both urban and highway
environments. We leverage the geometric traits of lane lines and the intrinsic
spatial attributes of LiDAR data to design a simple yet effective automatic
annotation pipeline for generating finer lane labels. To propel future
research, we propose a novel LiDAR-based 3D lane detection model, LiLaDet,
incorporating the spatial geometry learning of the LiDAR point cloud into
Bird's Eye View (BEV) based lane identification. Experimental results indicate
that LiLaDet outperforms existing camera- and LiDAR-based approaches in the 3D
lane detection task on the K-Lane dataset and our LiSV-3DLane.
</p>
|
|
|
|
<p>Distributed deep neural networks (DNNs) have been shown to reduce the
computational burden of mobile devices and decrease the end-to-end inference
latency in edge computing scenarios. While distributed DNNs have been studied,
to the best of our knowledge their resilience to adversarial action remains an
open problem. In this paper, we fill this research gap by rigorously analyzing
the robustness of distributed DNNs against
adversarial action. We cast this problem in the context of information theory
and introduce two new measurements for distortion and robustness. Our
theoretical findings indicate that (i) assuming the same level of information
distortion, latent features are always more robust than input representations;
(ii) the adversarial robustness is jointly determined by the feature dimension
and the generalization capability of the DNN. To test our theoretical findings,
we perform extensive experimental analysis by considering 6 different DNN
architectures, 6 different approaches for distributed DNNs, and 10 different
adversarial attacks on the ImageNet-1K dataset. Our experimental results
support our theoretical findings by showing that the compressed latent
representations can reduce the success rate of adversarial attacks by 88% in
the best case and by 57% on average compared to attacks on the input space.
</p>
|
|
|
|
<p>Imitation learning, which learns agent policy by mimicking expert
demonstration, has shown promising results in many applications such as medical
treatment regimes and self-driving vehicles. However, it remains a difficult
task to interpret control policies learned by the agent. Difficulties mainly
come from two aspects: 1) agents in imitation learning are usually implemented
as deep neural networks, which are black-box models and lack interpretability;
2) the latent causal mechanism behind agents' decisions may vary along the
trajectory, rather than staying static throughout time steps. To increase
transparency and offer better interpretability of the neural agent, we propose
to expose its captured knowledge in the form of a directed acyclic causal
graph, with nodes being action and state variables and edges denoting the
causal relations behind predictions. Furthermore, we design this causal
discovery process to be state-dependent, enabling it to model the dynamics in
latent causal graphs. Concretely, we conduct causal discovery from the
perspective of Granger causality and propose a self-explainable imitation
learning framework, {\method}. The proposed framework is composed of three
parts: a dynamic causal discovery module, a causality encoding module, and a
prediction module, and is trained in an end-to-end manner. After the model is
learned, we can obtain causal relations among states and action variables
behind its decisions, thereby exposing the policies it has learned.
Experimental results on both synthetic and real-world datasets demonstrate the
effectiveness of the proposed {\method} in learning dynamic causal graphs for
understanding the decision-making of imitation learning while maintaining high
prediction accuracy.
</p>
|
|
|
|
<p>The generation of lyrics tightly connected to accompanying melodies involves
establishing a mapping between musical notes and syllables of lyrics. This
process requires a deep understanding of musical constraints and of semantic
patterns at the syllable, word, and sentence levels. However, pre-trained
language models specifically designed at the syllable level are publicly
unavailable. To address these challenges, we propose fine-tuning
character-level language models for syllable-level lyrics generation from
symbolic melody. In particular, our method endeavors to
incorporate linguistic knowledge of the language model into the beam search
process of a syllable-level Transformer generator network. Additionally, by
exploring ChatGPT-based evaluation for generated lyrics, along with human
subjective evaluation, we demonstrate that our approach enhances the coherence
and correctness of the generated lyrics, eliminating the need to train
expensive new language models.
</p>
|
|
|
|
<p>Driven by the appealing properties of neural fields for storing and
communicating 3D data, the problem of directly processing them to address tasks
such as classification and part segmentation has emerged and has been
investigated in recent works. Early approaches employ neural fields
parameterized by shared networks trained on the whole dataset, achieving good
task performance but sacrificing reconstruction quality. To improve the latter,
later methods focus on individual neural fields parameterized as large
Multi-Layer Perceptrons (MLPs), which are, however, challenging to process due
to the high dimensionality of the weight space, intrinsic weight space
symmetries, and sensitivity to random initialization. Hence, results turn out
to be significantly inferior to those achieved by processing explicit
representations, e.g., point clouds or meshes. In the meantime, hybrid
representations, in particular based on tri-planes, have emerged as a more
effective and efficient alternative to realize neural fields, but their direct
processing has not been investigated yet. In this paper, we show that the
tri-plane discrete data structure encodes rich information, which can be
effectively processed by standard deep-learning machinery. We define an
extensive benchmark covering a diverse set of fields such as occupancy,
signed/unsigned distance, and, for the first time, radiance fields. While
processing a field with the same reconstruction quality, we achieve task
performance far superior to frameworks that process large MLPs and, for the
first time, almost on par with architectures handling explicit representations.
</p>
|
|
|
|
<p>Existing Self-Supervised Learning (SSL) models for speech typically process
speech signals at a fixed resolution of 20 milliseconds. This approach
overlooks the varying informational content present at different resolutions in
speech signals. In contrast, this paper aims to incorporate multi-resolution
information into speech self-supervised representation learning. We introduce
an SSL model that leverages a hierarchical Transformer architecture, complemented
by HuBERT-style masked prediction objectives, to process speech at multiple
resolutions. Experimental results indicate that the proposed model not only
achieves more efficient inference but also exhibits superior or comparable
performance to the original HuBERT model over various tasks. Specifically,
significant performance improvements over the original HuBERT have been
observed in fine-tuning experiments on the LibriSpeech speech recognition
benchmark as well as in evaluations using the Speech Universal PERformance
Benchmark (SUPERB) and Multilingual SUPERB (ML-SUPERB).
</p>
|
|
|
|
<p>Generative adversarial networks (GANs) have remarkably advanced in diverse
domains, especially image generation and editing. However, the misuse of GANs
for generating deceptive images, such as face replacement, raises significant
security concerns, which have gained widespread attention. Therefore, it is
urgent to develop effective detection methods to distinguish between real and
fake images. Current research centers around the application of transfer
learning. Nevertheless, it encounters challenges such as knowledge forgetting
from the original dataset and inadequate performance when dealing with
imbalanced data during training. To alleviate these issues, this paper introduces
a novel GAN-generated image detection algorithm called X-Transfer, which
enhances transfer learning by utilizing two neural networks that employ
interleaved parallel gradient transmission. In addition, we combine AUC loss
and cross-entropy loss to improve the model's performance. We carry out
comprehensive experiments on multiple facial image datasets. The results show
that our model outperforms the general transfer learning approach, with the
best metric reaching 99.04%, an improvement of approximately 10%. Furthermore,
we demonstrate excellent performance on non-face datasets, validating its
generality and broader application prospects.
</p>
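<p>One plausible reading of the combined objective is sketched below, pairing
binary cross-entropy with a pairwise squared-hinge AUC surrogate; the paper's
exact AUC loss and weighting may differ.</p>
<pre>
import torch
import torch.nn.functional as F

def auc_ce_loss(logits, labels, margin=1.0, alpha=0.5):
    """logits: (n,) real-vs-fake scores; labels: (n,) in {0, 1}."""
    ce = F.binary_cross_entropy_with_logits(logits, labels.float())
    pos, neg = logits[labels == 1], logits[labels == 0]
    if len(pos) == 0 or len(neg) == 0:   # degenerate batch: fall back to CE
        return ce
    diff = pos.unsqueeze(1) - neg.unsqueeze(0)   # all positive/negative pairs
    auc = F.relu(margin - diff).pow(2).mean()    # penalize mis-ranked pairs
    return alpha * ce + (1 - alpha) * auc
</pre>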
|
|
|
|
<p>Neural language models are probabilistic models of human text. They are
predominantly trained using maximum likelihood estimation (MLE), which is
equivalent to minimizing the forward cross-entropy between the empirical data
distribution and the model distribution. However, various degeneration
phenomena are still widely observed when decoding from the distributions
learned by such models. We establish that the forward cross-entropy is
suboptimal as a distance metric for aligning human and model distributions due
to its (1) recall prioritization, (2) negative diversity ignorance, and (3)
train-test mismatch. In this paper, we propose Earth Mover Distance
Optimization (EMO) for auto-regressive language modeling. EMO capitalizes on
the inherent properties of earth mover distance to address the aforementioned
challenges. Due to the high complexity of direct computation, we further
introduce a feasible upper bound for EMO to ease end-to-end training. Upon
extensive evaluation of language models trained using EMO and MLE, we find
that EMO demonstrates consistently better language modeling performance than MLE
across domains. Moreover, EMO demonstrates noteworthy enhancements in
downstream performance with minimal fine-tuning on merely 25,000 sentences.
This highlights the tremendous potential of EMO as a lightweight calibration
method for enhancing large-scale pre-trained language models.
</p>
|
|
|
|
<p>The history of artificial intelligence (AI) has witnessed the significant
impact of high-quality data on various deep learning models, such as ImageNet
for AlexNet and ResNet. Recently, instead of designing more complex neural
architectures as in model-centric approaches, the attention of the AI
community has shifted to data-centric approaches, which focus on better
processing of data to strengthen the ability of neural models. Graph learning,
which operates on
ubiquitous topological data, also plays an important role in the era of deep
learning. In this survey, we comprehensively review graph learning approaches
from the data-centric perspective, and aim to answer three crucial questions:
(1) when to modify graph data, (2) what part of the graph data needs
modification to unlock the potential of various graph models, and (3) how to
safeguard graph models from problematic data influence. Accordingly, we propose
a novel taxonomy based on the stages in the graph learning pipeline, and
highlight the processing methods for different data structures in the graph
data, i.e., topology, feature and label. Furthermore, we analyze some potential
problems embedded in graph data and discuss how to solve them in a data-centric
manner. Finally, we provide some promising future directions for data-centric
graph learning.
</p>
|
|
|
|
<p>Synchronous condensers (SCs) have been reported to improve the overall
stability and short-circuit power of a power system. SCs are also being
integrated into offshore wind power plants (WPPs) for the same reason. This
paper investigates the effect of synchronous condensers on an offshore wind
power plant with grid-following (GFL) and grid-forming (GFM) converter
controls. Primarily, the effect of synchronous condensers can be two-fold: (1)
overall stability enhancement of the WPP by providing reactive power support,
(2) contribution to the effective short circuit ratio (SCR) of the WPP by fault
current support. Therefore, this paper focuses on studies concerning these
effects on an aggregated model of a WPP connected to the grid. To that end, a
state-space model of the test system is developed for small-signal stability
assessment and the synchronous condenser's effect on its stability. In
addition, a mathematical explanation of SCR enhancement with synchronous
condenser is provided and is verified with time-domain electromagnetic
transient simulations.
</p>
|
|
|
|
<p>Various Large Language Models (LLMs) from the Generative Pretrained
Transformer~(GPT) family have achieved outstanding performances in a wide range
of text generation tasks. However, the enormous model sizes have hindered their
practical use in real-world applications due to high inference latency.
Therefore, improving the efficiencies of LLMs through quantization, pruning,
and other means has been a key issue in LLM studies. In this work, we propose a
method based on Hessian sensitivity-aware mixed sparsity pruning to prune LLMs
to at least 50\% sparsity without the need for any retraining. It allocates
sparsity adaptively based on sensitivity, allowing us to reduce pruning-induced
error while maintaining the overall sparsity level. The advantages of the
proposed method become even more pronounced when the sparsity is extremely high.
Furthermore, our method is compatible with quantization, enabling further
compression of LLMs.
</p>
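<p>A hedged sketch of sensitivity-aware mixed sparsity is shown below, using a
diagonal-Hessian saliency $w^2 \cdot \mathrm{diag}(H)$ and a single global
threshold so that per-layer sparsity adapts to sensitivity; the paper's
saliency measure and allocation rule may differ.</p>
<pre>
import numpy as np

def prune_mixed_sparsity(weights, hess_diags, sparsity=0.5):
    """weights / hess_diags: lists of same-shape arrays, one per layer."""
    saliency = [w ** 2 * h for w, h in zip(weights, hess_diags)]
    flat = np.concatenate([s.ravel() for s in saliency])
    thresh = np.quantile(flat, sparsity)   # one global threshold, so the
    pruned = []                            # sparsity of each layer adapts
    for w, s in zip(weights, saliency):
        pruned.append(np.where(s <= thresh, 0.0, w))
    return pruned
</pre>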
|
|
|
|
<p>Diffusion models for text-to-image (T2I) synthesis, such as Stable Diffusion
(SD), have recently demonstrated exceptional capabilities for generating
high-quality content. However, this progress has raised several concerns about
potential misuse, particularly in creating copyrighted, prohibited, and
restricted content, or NSFW (not safe for work) images. While efforts have been
made to mitigate such problems, either by implementing a safety filter at the
evaluation stage or by fine-tuning models to eliminate undesirable concepts or
styles, the effectiveness of these safety measures in dealing with a wide range
of prompts remains largely unexplored. In this work, we investigate
these safety mechanisms by proposing a novel concept retrieval algorithm for
evaluation. We introduce Ring-A-Bell, a model-agnostic red-teaming tool for T2I
diffusion models, where the whole evaluation can be prepared in advance without
prior knowledge of the target model. Specifically, Ring-A-Bell first performs
concept extraction to obtain holistic representations for sensitive and
inappropriate concepts. Subsequently, by leveraging the extracted concept,
Ring-A-Bell automatically identifies problematic prompts for diffusion models
with the corresponding generation of inappropriate content, allowing the user
to assess the reliability of deployed safety mechanisms. Finally, we
empirically validate our method by testing online services such as Midjourney
and various methods of concept removal. Our results show that Ring-A-Bell, by
manipulating safe prompting benchmarks, can transform prompts originally
regarded as safe so that they evade existing safety mechanisms, thus revealing
defects in these safety mechanisms that could, in practice, lead to the
generation of harmful content.
</p>
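<p>The concept-extraction step can be pictured as below, under the assumption
that a concept is represented by the mean difference between text embeddings
of paired prompts with and without it; embed is a toy stand-in for the text
encoder, and the full pipeline, which searches for problematic prompts using
this vector, is omitted.</p>
<pre>
import numpy as np

def embed(text):                           # toy stand-in text encoder
    rng = np.random.default_rng(abs(hash(text)) % (2 ** 32))
    return rng.standard_normal(16)

def extract_concept(with_concept, without_concept):
    """Paired prompt lists differing only in the sensitive concept."""
    diffs = [embed(a) - embed(b)
             for a, b in zip(with_concept, without_concept)]
    return np.mean(diffs, axis=0)          # holistic concept vector
</pre>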
|
|
|
|
<p>Graph Neural Networks (GNNs), especially message-passing neural networks
(MPNNs), have emerged as powerful architectures for learning on graphs in
diverse applications. However, MPNNs face challenges when modeling non-local
interactions in graphs such as large conjugated molecules and social networks,
due to oversmoothing and oversquashing. Although Spectral GNNs and traditional
neural networks such as recurrent neural networks and transformers mitigate
these challenges, they often lack generalizability, or fail to capture detailed
structural relationships or symmetries in the data. To address these concerns,
we introduce Matrix Function Neural Networks (MFNs), a novel architecture that
parameterizes non-local interactions through analytic matrix equivariant
functions. Employing resolvent expansions offers a straightforward
implementation and the potential for linear scaling with system size. The MFN
architecture achieves state-of-the-art performance on standard graph benchmarks,
such as the ZINC and TU datasets, and is able to capture intricate non-local
interactions in quantum systems, paving the way to new state-of-the-art force
fields.
</p>
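<p>The resolvent parameterization admits a very direct implementation: an
analytic matrix function of the graph operator is approximated as a weighted
sum of resolvents $(z_j I - A)^{-1}$ applied to the node features. The sketch
below is illustrative only (fixed poles and coefficients, no learned
parameters or equivariance machinery).</p>
<pre>
import numpy as np

def matrix_function_layer(A, X, poles, coeffs):
    """A: (n, n) graph operator; X: (n, d) node features.
    Returns sum_j c_j (z_j I - A)^{-1} X."""
    n = A.shape[0]
    out = np.zeros_like(X)
    for z, c in zip(poles, coeffs):
        out += c * np.linalg.solve(z * np.eye(n) - A, X)
    return out

# usage: two resolvent terms on a small symmetric operator
A = np.random.rand(5, 5); A = (A + A.T) / 2   # spectral radius <= 5 here
X = np.random.rand(5, 3)
Y = matrix_function_layer(A, X, poles=[6.0, 8.0], coeffs=[0.7, 0.3])
</pre>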
|
|
|
|
<p>This work proposes an optimal power management strategy for shipboard
microgrids equipped with diesel generators, a fuel cell and a battery energy
storage system. The optimization aims to determine both the unit commitment and
the optimal power dispatch for all resources to ensure a reliable power supply
at minimum cost and with minimal environmental impact. This strategy takes into
account the zero-emission capability of the ship and incorporates a soft
constraint related to the ship's speed. The optimization is performed by solving a
mixed integer linear programming problem, where the constraints are defined
according to the operational limits of the resources when a contingency occurs.
The algorithm is tested on a notional all-electric ship where the electrical
load is generated through a Markov chain, modelled on real measurement data.
The results show that the proposed power management strategy successfully
maximizes fuel and emission savings while ensuring blackout prevention
capability.
</p>
|
|
|
|
<p>Finding bugs is key to the correctness of compilers in wide use today. If the
behaviour of a compiled program, as allowed by its architecture memory model,
is not a behaviour of the source program under its source model, then there is
a bug. This holds for all programs, but we focus on concurrency bugs that occur
only with two or more threads of execution. We focus on testing techniques that
detect such bugs in C/C++ compilers.
</p>
<p>We seek a testing technique that automatically covers concurrency bugs up to
fixed bounds on program sizes and that scales to find bugs in compiled programs
with many lines of code. Otherwise, a testing technique can miss bugs.
Unfortunately, the state-of-the-art techniques are yet to satisfy all of these
properties.
</p>
<p>We present the T\'el\'echat compiler testing tool for concurrent programs.
T\'el\'echat compiles a concurrent C/C++ program and compares source and
compiled program behaviours using source and architecture memory models. We
make three claims: T\'el\'echat improves the state-of-the-art at finding bugs
in code generation for multi-threaded execution, it is the first public
description of a compiler testing tool for concurrency that is deployed in
industry, and it is the first tool that takes a significant step towards the
desired properties. We provide experimental evidence suggesting T\'el\'echat
finds bugs missed by other state-of-the-art techniques, case studies indicating
that T\'el\'echat satisfies the properties, and reports of our experience
deploying T\'el\'echat in industry regression testing.
</p>
|
|
|
|
<p>The expressivity of Graph Neural Networks (GNNs) can be entirely
characterized by appropriate fragments of the first order logic. Namely, any
query of the two variable fragment of graded modal logic (GC2) interpreted over
labeled graphs can be expressed using a GNN whose size depends only on the
depth of the query. As pointed out by [Barcelo et al., 2020; Grohe, 2021],
this description holds for a family of activation functions, leaving the
possibility of a hierarchy of logics expressible by GNNs depending on the
chosen activation function. In this article, we show that such a hierarchy
indeed exists by proving that GC2 queries cannot be expressed by GNNs with
polynomial activation functions. This implies a separation between polynomial
and popular non-polynomial activations (such as Rectified Linear Units) and
answers an open question formulated by [Grohe, 2021].
</p>
|
|
|
|
<p>This work is the first study on the effects of attacks on cryptocurrencies as
expressed in the sentiments and emotions of social media users. Our goals are
to design the methodologies for the study including data collection, conduct
volumetric and temporal analyses of the data, and profile the sentiments and
emotions that emerge from the data. As a first step, we have created a
first-of-its-kind comprehensive list of 31 events of 51% attacks on various PoW
cryptocurrencies, showing that these events are quite common, contrary to the
general perception. We have gathered Twitter data on the events as well as
benchmark data during normal times for comparison. We have defined parameters
for profiling the datasets based on their sentiments and emotions. We have
studied the variation of these sentiment and emotion profiles when a
cryptocurrency is under attack and the benchmark otherwise, between multiple
attack events of the same cryptocurrency, and between different
cryptocurrencies. Our results confirm some expected overall behaviour and
reactions while providing nuanced insights that may not be obvious or may even
be considered surprising. Our code and datasets are publicly accessible.
</p>
|
|
|
|
<p>Rule-based models, e.g., decision trees, are widely used in scenarios
demanding high model interpretability for their transparent inner structures
and good model expressivity. However, rule-based models are hard to optimize,
especially on large data sets, due to their discrete parameters and structures.
Ensemble methods and fuzzy/soft rules are commonly used to improve performance,
but they sacrifice the model interpretability. To obtain both good scalability
and interpretability, we propose a new classifier, named Rule-based
Representation Learner (RRL), that automatically learns interpretable non-fuzzy
rules for data representation and classification. To train the
non-differentiable RRL effectively, we project it to a continuous space and
propose a novel training method, called Gradient Grafting, that can directly
optimize the discrete model using gradient descent. A novel design of logical
activation functions is also devised to increase the scalability of RRL and
enable it to discretize the continuous features end-to-end. Exhaustive
experiments on ten small and four large data sets show that RRL outperforms the
competitive interpretable approaches and can be easily adjusted to obtain a
trade-off between classification accuracy and model complexity for different
scenarios. Our code is available at: https://github.com/12wang3/rrl.
</p>
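<p>The idea of optimizing a discrete model through a continuous projection can
be illustrated with a straight-through-style estimator, shown below; this is a
simplification in the spirit of Gradient Grafting, not the paper's actual
training rule.</p>
<pre>
import torch

class Binarize(torch.autograd.Function):
    @staticmethod
    def forward(ctx, w):
        return (w > 0.5).float()       # discrete {0, 1} rule weights
    @staticmethod
    def backward(ctx, grad_out):
        return grad_out                # graft the gradient onto continuous w

w = torch.full((4,), 0.4, requires_grad=True)  # continuous parameters
rules = Binarize.apply(w)                      # discrete rules in the forward
loss = (rules.sum() - 2.0) ** 2
loss.backward()                                # gradients still reach w
</pre>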
|
|
|
|
<p>Restless multi-arm bandits (RMABs), a class of resource allocation problems
with broad application in areas such as healthcare, online advertising, and
anti-poaching, have recently been studied from a multi-agent reinforcement
learning perspective. Prior RMAB research suffers from several limitations,
e.g., it fails to adequately address continuous states, and requires retraining
from scratch when arms opt in and opt out over time, a common challenge in
many real-world applications. We address these limitations by developing a neural
network-based pre-trained model (PreFeRMAB) that has general zero-shot ability
on a wide range of previously unseen RMABs, and which can be fine-tuned on
specific instances in a more sample-efficient way than retraining from scratch.
Our model also accommodates general multi-action settings and discrete or
continuous state spaces. To enable fast generalization, we learn a novel single
policy network model that utilizes feature information and employs a training
procedure in which arms opt in and out over time. We derive a new update rule
for a crucial $\lambda$-network with theoretical convergence guarantees and
empirically demonstrate the advantages of our approach on several challenging,
real-world inspired problems.
</p>
|
|
|
|
<p>A spectrum-sharing satellite-ground integrated network is conceived,
consisting of a pair of non-geostationary orbit (NGSO) constellations and
multiple terrestrial base stations, which impose co-frequency interference
(CFI) on each other. The CFI may increase as the number of satellites grows.
To manage the potentially severe interference, we propose to rely
on joint multi-domain resource aided interference management (JMDR-IM).
Specifically, the coverage overlap of the constellations considered is
analyzed. Then, multi-domain resources - including both the beam-domain and
power-domain - are jointly utilized for managing the CFI in an overlapping
coverage region. This joint resource utilization is performed by relying on our
specifically designed beam-shut-off and switching based beam scheduling, as
well as on long short-term memory based joint autoregressive moving average
assisted deep Q network aided power scheduling. Moreover, the outage
probability (OP) of the proposed JMDR-IM scheme is derived, and the asymptotic
analysis of the OP is also provided. Our performance evaluations demonstrate
the superiority of the proposed JMDR-IM scheme in terms of its increased
throughput and reduced OP.
</p>
|
|
|
|
<p>With the availability of large-scale video datasets and the advances of
diffusion models, text-driven video generation has achieved substantial
progress. However, existing video generation models are typically trained on a
limited number of frames, resulting in the inability to generate high-fidelity
long videos during inference. Furthermore, these models only support
single-text conditions, whereas real-life scenarios often require multi-text
conditions as the video content changes over time. To tackle these challenges,
this study explores the potential of extending the text-driven capability to
generate longer videos conditioned on multiple texts. 1) We first analyze the
impact of initial noise in video diffusion models. Then building upon the
observation of noise, we propose FreeNoise, a tuning-free and time-efficient
paradigm to enhance the generative capabilities of pretrained video diffusion
models while preserving content consistency. Specifically, instead of
initializing noises for all frames, we reschedule a sequence of noises for
long-range correlation and perform temporal attention over them using a
window-based function. 2) Additionally, we design a novel motion injection method to support
the generation of videos conditioned on multiple text prompts. Extensive
experiments validate the superiority of our paradigm in extending the
generative capabilities of video diffusion models. Notably, whereas the
previous best-performing method incurred roughly 255% extra time cost, our
method adds only a negligible cost of approximately 17%. Generated video
samples are available at our website:
<a href="http://haonanqiu.com/projects/FreeNoise.html">http://haonanqiu.com/projects/FreeNoise.html</a>
</p>
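<p>The noise-rescheduling idea can be sketched as follows, under the
assumption that the initial noise is extended by shuffled repetition of the
base frames (shuffling rather than resampling preserves long-range
correlation); FreeNoise's exact schedule and the windowed temporal attention
are not reproduced.</p>
<pre>
import torch

def reschedule_noise(base_noise, total_frames, seed=0):
    """base_noise: (f, c, h, w) initial noise the model was trained with."""
    g = torch.Generator().manual_seed(seed)
    f = base_noise.shape[0]
    chunks = [base_noise]
    while sum(c.shape[0] for c in chunks) < total_frames:
        perm = torch.randperm(f, generator=g)  # shuffle, don't resample
        chunks.append(base_noise[perm])
    return torch.cat(chunks)[:total_frames]

noise = reschedule_noise(torch.randn(16, 4, 8, 8), total_frames=40)
</pre>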
|
|
|
|
<p>In scheduling problems common in the industry and various real-world
scenarios, responding in real-time to disruptive events is essential. Recent
methods propose the use of deep reinforcement learning (DRL) to learn policies
capable of generating solutions under this constraint. The objective of this
paper is to introduce a new DRL method for solving the flexible job-shop
scheduling problem, particularly for large instances. The approach is based on
heterogeneous graph neural networks operating on a more informative graph
representation of the problem. This novel modeling enhances the
policy's ability to capture state information and improve its decision-making
capacity. Additionally, we introduce two novel approaches to enhance the
performance of the DRL approach: the first involves generating a diverse set of
scheduling policies, while the second combines DRL with dispatching rules (DRs)
constraining the action space. Experimental results on two public benchmarks
show that our approach outperforms DRs and achieves superior results compared
to three state-of-the-art DRL methods, particularly for large instances.
</p>
|
|
|
|
<p>The use of large language models for code generation is a rapidly growing
trend in software development. However, without effective methods for ensuring
the correctness of generated code, this trend could lead to any number of
undesirable outcomes. In this paper, we lay out a vision for addressing this
challenge: the Clover paradigm, short for Closed-Loop Verifiable Code
Generation, which reduces correctness checking to the more accessible problem
of consistency checking. At the core of Clover lies a checker that performs
consistency checks among code, docstrings, and formal annotations. The checker
is implemented using a novel integration of formal verification tools and large
language models. We provide a theoretical analysis to support our thesis that
Clover should be effective at consistency checking. We also empirically
investigate its feasibility on a hand-designed dataset (CloverBench) featuring
annotated Dafny programs at a textbook level of difficulty. Experimental
results show that for this dataset, (i) LLMs are reasonably successful at
automatically generating formal specifications; and (ii) our consistency
checker achieves a promising acceptance rate (up to 87%) for correct instances
while maintaining zero tolerance for incorrect ones (no false positives).
</p>
|
|
|
|
<p>Sound over-approximation methods have proved effective for guaranteeing
the absence of errors, but they inevitably produce false alarms that can
hamper programmers. Conversely, under-approximation methods are aimed at bug
finding and are free from false alarms. We introduce Sufficient Incorrectness
Logic~(SIL), a new under-approximating, triple-based program logic to reason
about program errors. SIL is designed to set apart the initial states leading
to errors. We prove that SIL is correct and complete for a minimal set of
rules, and we study additional rules that can facilitate program analyses. We
formally compare SIL to existing triple-based program logics. Incorrectness
Logic and SIL both perform under-approximations, but while the former exposes
only true errors, the latter locates the set of initial states that lead to
such errors. Hoare Logic performs over-approximations and as such cannot
capture the set of initial states leading to errors in nondeterministic
programs -- for deterministic and terminating programs, Hoare Logic and SIL
coincide. Finally, we instantiate SIL with Separation Logic formulae
(Separation SIL) to handle pointers and dynamic allocation and we prove its
correctness and, for loop-free programs, also its completeness. We argue that
in some cases Separation SIL can yield more succinct postconditions and provide
stronger guarantees than Incorrectness Separation Logic and can support
effective backward reasoning.
</p>
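<p>As a hedged aid to this comparison (our reading of the abstract, not the
paper's formal definitions): writing $\mathrm{post}(C)(P)$ for the states
reachable by running $C$ from $P$, and $\mathrm{pre}(C)(Q)$ for the states
from which some execution of $C$ reaches $Q$, a Hoare triple $\{P\}\,C\,\{Q\}$
requires $\mathrm{post}(C)(P) \subseteq Q$ (over-approximation of final
states); an Incorrectness Logic triple $[P]\,C\,[Q]$ requires
$Q \subseteq \mathrm{post}(C)(P)$ (under-approximation of final states); and a
SIL triple instead requires $P \subseteq \mathrm{pre}(C)(Q)$,
under-approximating the initial states that can lead to the errors in $Q$.</p>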
|
|
|
|
<p>The growing computing power over the years has enabled simulations to become
more complex and accurate. While immensely valuable for scientific discovery
and problem-solving, high-fidelity simulations come with significant
computational demands. As a result, it is common to run a low-fidelity model
with a subgrid-scale model to reduce the computational cost, but selecting the
appropriate subgrid-scale models and tuning them are challenging. We propose a
novel method for learning the subgrid-scale model effects when simulating
partial differential equations augmented by neural ordinary differential
operators in the context of discontinuous Galerkin (DG) spatial discretization.
Our approach learns the missing scales of the low-order DG solver at a
continuous level and hence improves the accuracy of the low-order DG
approximations as well as accelerates the filtered high-order DG simulations
with a certain degree of precision. We demonstrate the performance of our
approach through multidimensional Taylor-Green vortex examples at different
Reynolds numbers and times, which cover laminar, transitional, and turbulent
regimes. The proposed method not only reconstructs the subgrid-scale from the
low-order (1st-order) approximation but also speeds up the filtered high-order
DG (6th-order) simulation by two orders of magnitude.
</p>
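<p>The augmentation can be pictured in a generic ODE setting,
$du/dt = f_{\mathrm{coarse}}(u) + \mathrm{NN}(u)$, where a small network
supplies the missing subgrid-scale tendency; the toy sketch below omits the DG
discretization and training machinery that the paper actually works with.</p>
<pre>
import torch
import torch.nn as nn

class AugmentedRHS(nn.Module):
    def __init__(self, f_coarse, dim, hidden=32):
        super().__init__()
        self.f_coarse = f_coarse                   # low-order physics
        self.subgrid = nn.Sequential(              # learned closure term
            nn.Linear(dim, hidden), nn.Tanh(), nn.Linear(hidden, dim))
    def forward(self, u):
        return self.f_coarse(u) + self.subgrid(u)

def rk4_step(rhs, u, dt):
    k1 = rhs(u)
    k2 = rhs(u + 0.5 * dt * k1)
    k3 = rhs(u + 0.5 * dt * k2)
    k4 = rhs(u + dt * k3)
    return u + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

rhs = AugmentedRHS(f_coarse=lambda u: -u, dim=4)   # toy linear decay
u_next = rk4_step(rhs, torch.randn(4), dt=0.01)
</pre>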
|
|
|
|
<p>Emotion recognition is a crucial task for human conversation understanding.
It becomes more challenging with the notion of multimodal data, e.g., language,
voice, and facial expressions. As a typical solution, global and local context
information is exploited to predict the emotional label for every single
sentence, i.e., utterance, in the dialogue. Specifically, the global
representation could be captured via modeling of cross-modal interactions at
the conversation level. The local one is often inferred using the temporal
information of speakers or emotional shifts, which neglects vital factors at
the utterance level. Additionally, most existing approaches take fused
features of multiple modalities as a unified input without leveraging
modality-specific representations. Motivated by these problems, we propose the
Relational Temporal Graph Neural Network with Auxiliary Cross-Modality
Interaction (CORECT), a novel neural network framework that effectively
captures conversation-level cross-modality interactions and utterance-level
temporal dependencies in a modality-specific manner for conversation understanding.
Extensive experiments demonstrate the effectiveness of CORECT via its
state-of-the-art results on the IEMOCAP and CMU-MOSEI datasets for the
multimodal ERC task.
</p>
|
|
|
|
<p>Regev recently introduced a quantum factoring algorithm that may be perceived
as a $d$-dimensional variation of Shor's factoring algorithm. In this work, we
extend Regev's factoring algorithm to an algorithm for computing discrete
logarithms in a natural way. Furthermore, we discuss natural extensions of
Regev's factoring algorithm to order finding, and to factoring completely via
order finding. For all of these algorithms, we discuss various practical
implementation considerations, including in particular the robustness of the
post-processing.
</p>
|
|
|
|
<p>With the booming popularity of smartphones, threats related to these devices
are increasingly on the rise. Smishing, a combination of SMS (Short Message
Service) and phishing, has emerged as a treacherous cyber threat used by
malicious actors to deceive users, aiming to steal sensitive information or
money, or to install malware on their mobile devices. Despite the increase in smishing
attacks in recent years, there are very few studies aimed at understanding the
factors that contribute to a user's ability to differentiate real from fake
messages. To address this gap in knowledge, we have conducted an online survey
on smishing detection with 214 participants. In this study, we presented them
with 16 SMS screenshots and evaluated how different factors affect their
decision making process in smishing detection. Next, we conducted a follow-up
survey to garner information on the participants' security attitudes, behavior
and knowledge. Our results highlighted that attention and security behavioral
scores had a significant impact on participants' accuracy in identifying
smishing messages. Interestingly, we found that participants had more
difficulty identifying real messages from fake ones, with an accuracy of 65.6%
with fake messages and 44.6% with real messages. Our study informs the
development of proactive strategies to counter and mitigate smishing attacks. By
understanding what factors influence smishing detection, we aim to bolster
users' resilience against such threats and create a safer digital environment
for all.
</p>
|
|
|
|
<p>We study the design of a goal-oriented sampling and scheduling strategy
through a channel with highly variable two-way random delay, which can exhibit
memory (e.g., Delay and Disruption Tolerant Networks). The objective of the
communication is to optimize the performance of remote inference, where an
inference algorithm (e.g., a trained neural network) on the receiver side
predicts a time-varying target signal using the data samples transmitted by a
sensor. Previous formulations of this problem either assumed a channel with
IID transmission delay, neglecting feedback delay, or assumed the monotonic
relation that performance only degrades as the input information ages. We
show how, with delayed feedback, one can effectively exploit the knowledge
about delay memory through an index-based threshold policy. This policy
minimizes the expected time-average inference error that can be monotone or
non-monotone in age. The index function is expressed in terms of the Age of
Information (AoI) on the receiver side and a parameter regarding the
distribution of subsequent transmission delay, both of which can readily be
tracked.
</p>
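<p>Schematically, such an index-based threshold policy samples only when the
benefit of a fresh sample exceeds a threshold. The sketch below is our loose
reading (the index compares the expected error at the current age with the
error expected after the remaining delay); the paper's index function and
threshold optimization are more involved.</p>
<pre>
def should_sample(aoi, err_of_age, expected_delay, threshold):
    """aoi: current Age of Information at the receiver;
    err_of_age: maps age -> expected inference error (may be non-monotone);
    expected_delay: estimate from the tracked delay distribution."""
    index = err_of_age(aoi + expected_delay) - err_of_age(aoi)
    return index >= threshold          # transmit only when informative enough

err = lambda a: abs((a % 10) - 5)      # toy non-monotone error-age curve
print(should_sample(3.0, err, expected_delay=2.0, threshold=0.5))
</pre>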
|
|
|
|
<p>This paper presents a comprehensive study of Meta Prompting, an innovative
technique reshaping the utilization of large language models (LLMs),
multi-modal foundation models, and AI systems in problem-solving and data
interpretation. Grounded in type theory and category theory, Meta Prompting
emphasizes the structure and syntax of information over traditional
content-centric methods. The paper explores the formal definitions of Meta
Prompting (MP), sets it apart from Few-Shot Prompting, and underlines its
effectiveness in various AI applications. A key focus is applying Meta
Prompting to complex reasoning (MP-CR) tasks, showing how it effectively
deconstructs intricate problems into simpler sub-problems, enhances token
efficiency, and enables fairer problem-solving comparisons, especially
against few-shot example methods. Additionally, the paper introduces Meta
Prompting for prompting tasks, allowing LLMs to self-generate new prompts in a
recursive, metaprogramming-like manner. This approach marks a significant leap
in AI's autonomous and adaptive capabilities. The paper also introduces the
integration of Meta Prompting into multi-modal foundation model settings,
tackling the challenges and opportunities of incorporating varied data types
such as images, audio, and video within the structured Meta Prompting
framework. Empirical experiments, including solving the Game of 24 tasks,
demonstrate the MP-CR Agent's enhanced reasoning capabilities, achieving high
accuracy and efficiency, and showcasing Meta Prompting's transformative impact
on AI problem-solving. (The code is available at
https://github.com/meta-prompting/meta-prompting)
</p>
|
|
|
|
<p>The emergent abilities of Large Language Models (LLMs), which power tools
like ChatGPT and Bard, have produced both excitement and worry about how AI
will impact academic writing. In response to rising concerns about AI use,
authors of academic publications may decide to voluntarily disclose any AI
tools they use to revise their manuscripts, and journals and conferences could
begin mandating disclosure and/or turn to using detection services, as many
teachers have done with student writing in class settings. Given these looming
possibilities, we investigate whether academics view it as necessary to report
AI use in manuscript preparation and how detectors react to the use of AI in
academic writing.
</p>
|
|
|
|
<p>We present four main contributions to enhance the performance of Large
Language Models (LLMs) in generating domain-specific code: (i) utilizing
LLM-based data splitting and data renovation techniques to improve the
semantic representation of the embedding space; (ii) introducing the Chain of Density for
Renovation Credibility (CoDRC), driven by LLMs, and the Adaptive Text
Renovation (ATR) algorithm for assessing data renovation reliability; (iii)
developing the Implicit Knowledge Expansion and Contemplation (IKEC) Prompt
technique; and (iv) effectively refactoring existing scripts to generate new
and high-quality scripts with LLMs. By using engineering simulation software
RedHawk-SC as a case study, we demonstrate the effectiveness of our data
pre-processing method for expanding and categorizing scripts. When combined
with IKEC, these techniques enhance the Retrieval-Augmented Generation (RAG)
method in retrieving more relevant information, ultimately achieving a 73.33%
"Percentage of Correct Lines" for code generation problems in MapReduce
applications.
</p>
|
|
|
|
<p>Can we avoid wars at the crossroads of history? This question has been
pursued by individuals, scholars, policymakers, and organizations throughout
human history. In this research, we attempt to answer the question based on the
recent advances of Artificial Intelligence (AI) and Large Language Models
(LLMs). We propose \textbf{WarAgent}, an LLM-powered multi-agent AI system, to
simulate the participating countries, their decisions, and the consequences, in
historical international conflicts, including World War I (WWI), World War II
(WWII), and the Warring States Period (WSP) in Ancient China. By
evaluating the simulation effectiveness, we examine the advancements and
limitations of cutting-edge AI systems' abilities in studying complex
collective human behaviors such as international conflicts under diverse
settings. In these simulations, the emergent interactions among agents also
offer a novel perspective for examining the triggers and conditions that lead
to war. Our findings offer data-driven and AI-augmented insights that can
redefine how we approach conflict resolution and peacekeeping strategies. The
implications stretch beyond historical analysis, offering a blueprint for using
AI to understand human history and possibly prevent future international
conflicts. Code and data are available at
\url{https://github.com/agiresearch/WarAgent}.
</p>
|
|
|
|
<p>This work introduces self-infilling code generation, a general framework that
incorporates infilling operations into auto-regressive decoding. Our approach
capitalizes on the observation that recent infilling-capable code language
models can self-infill: whereas infilling operations aim to fill in the middle
based on a predefined prefix and suffix, self-infilling sequentially generates
both such surrounding context and the infilled content. We utilize this
capability to introduce novel interruption and looping mechanisms in
conventional decoding, evolving it into a non-monotonic process. Interruptions
allow for postponing the generation of specific code until a definitive suffix
is established, enhancing control over the output. Meanwhile, the looping
mechanism, which leverages the complementary nature of self-infilling and
left-to-right decoding, can iteratively update and synchronize each piece of
generation cyclically. Extensive experiments are conducted to demonstrate that
our proposed decoding process is effective in enhancing both regularity and
quality across several code generation benchmarks.
</p>
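<p>The decoding loop can be sketched schematically as follows; generate and
infill are hypothetical stand-ins for calls to an infilling-capable code LM,
and the interruption/looping logic here is a simplified rendering of the
mechanism described above.</p>
<pre>
def generate(prefix):
    return "..."     # stub: left-to-right continuation of `prefix`

def infill(prefix, suffix):
    return "..."     # stub: middle conditioned on both sides

def self_infilling_decode(prompt, rounds=2):
    # interruption: defer the middle until a definitive suffix exists
    suffix = generate(prompt)          # e.g. a closing return statement
    middle = infill(prompt, suffix)
    # looping: alternate left-to-right and infilling passes so that each
    # piece is iteratively updated and kept consistent with the others
    for _ in range(rounds):
        suffix = generate(prompt + middle)
        middle = infill(prompt, suffix)
    return prompt + middle + suffix
</pre>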
|
|
|
|
<p>Enhancing the domain generalization performance of Face Anti-Spoofing (FAS)
techniques has emerged as a research focus. Existing methods are dedicated to
extracting domain-invariant features from various training domains. Despite the
promising performance, the extracted features inevitably contain residual style
feature bias (e.g., illumination, capture device), resulting in inferior
generalization performance. In this paper, we propose an alternative and
effective solution, the Textually Guided Domain Generalization (TeG-DG)
framework, which can effectively leverage text information for cross-domain
alignment. Our core insight is that text, as a more abstract and universal form
of expression, can capture the commonalities and essential characteristics
across various attacks, bridging the gap between different image domains.
Contrary to existing vision-language models, the proposed framework is
elaborately designed to enhance the domain generalization ability of the FAS
task. Concretely, we first design a Hierarchical Attention Fusion (HAF) module
to enable adaptive aggregation of visual features at different levels. Then, a
Textual-Enhanced Visual Discriminator (TEVD) is proposed not only to better
align the two modalities but also to regularize the classifier with unbiased
text features. TeG-DG significantly outperforms previous approaches,
especially in situations with extremely limited source domain data (~14% and
~12% improvements on HTER and AUC respectively), showcasing impressive few-shot
performance.
</p>
|
|
|
|
<p>Creativity is core to being human. Generative artificial intelligence (GenAI)
-- including ever more powerful large language models (LLMs) -- holds promise
for humans to be more creative by offering new ideas, or less creative by
anchoring on GenAI ideas. We study the causal impact of GenAI ideas on the
production of a short story in an online experimental study where some writers
could obtain story ideas from a GenAI platform. We find that access to GenAI
ideas causes stories to be evaluated as more creative, better written, and more
enjoyable, especially among less creative writers. However, GenAI-enabled
stories are more similar to each other than stories by humans alone. These
results point to an increase in individual creativity at the risk of losing
collective novelty. This dynamic resembles a social dilemma: with GenAI,
individual writers are better off, but collectively a narrower scope of novel
content may be produced. Our results have implications for researchers,
policy-makers and practitioners interested in bolstering creativity.
</p>
|
|
|
|
<p>The widespread adoption of REST APIs, coupled with their growing complexity
and size, has led to the need for automated REST API testing tools. Current
tools focus on the structured data in REST API specifications but often neglect
valuable insights available in unstructured natural-language descriptions in
the specifications, which leads to suboptimal test coverage. Recently, to
address this gap, researchers have developed techniques that extract rules from
these human-readable descriptions and query knowledge bases to derive
meaningful input values. However, these techniques are limited in the types of
rules they can extract and prone to producing inaccurate results. This paper
presents RESTGPT, an innovative approach that leverages the power and intrinsic
context-awareness of Large Language Models (LLMs) to improve REST API testing.
RESTGPT takes as input an API specification, extracts machine-interpretable
rules, and generates example parameter values from natural-language
descriptions in the specification. It then augments the original specification
with these rules and values. Our evaluations indicate that RESTGPT outperforms
existing techniques in both rule extraction and value generation. Given these
promising results, we outline future research directions for advancing REST API
testing through LLMs.
</p>
|
|
|
|
<p>Graphs can inherently model interconnected objects on the Web, thereby
facilitating a series of Web applications, such as web analysis and content
recommendation. Recently, Graph Neural Networks (GNNs) have emerged as a
mainstream technique for graph representation learning. However, their efficacy
within an end-to-end supervised framework is significantly tied to the
availability of task-specific labels. To mitigate labeling costs and enhance
robustness in few-shot settings, pre-training on self-supervised tasks has
emerged as a promising method, while prompting has been proposed to further
narrow the objective gap between pretext and downstream tasks. Although there
has been some initial exploration of prompt-based learning on graphs, existing
approaches primarily leverage a single pretext task, resulting in a limited
subset of general knowledge that could be learned from the pre-training data.
Hence, in
this paper, we propose MultiGPrompt, a novel multi-task pre-training and
prompting framework to exploit multiple pretext tasks for more comprehensive
pre-trained knowledge. First, in pre-training, we design a set of pretext
tokens to synergize multiple pretext tasks. Second, we propose a dual-prompt
mechanism consisting of composed and open prompts to leverage task-specific and
global pre-training knowledge, to guide downstream tasks in few-shot settings.
Finally, we conduct extensive experiments on six public datasets to evaluate
and analyze MultiGPrompt.
</p>
|
|
|
|
<p>Large language models (LLMs) encapsulate a vast amount of factual information
within their pre-trained weights, as evidenced by their ability to answer
diverse questions across different domains. However, this knowledge is
inherently limited, relying heavily on the characteristics of the training
data. Consequently, using external datasets to incorporate new information or
refine the capabilities of LLMs on previously seen information poses a
significant challenge. In this study, we compare two common approaches:
unsupervised fine-tuning and retrieval-augmented generation (RAG). We evaluate
both approaches on a variety of knowledge-intensive tasks across different
topics. Our findings reveal that while unsupervised fine-tuning offers some
improvement, RAG consistently outperforms it, both for existing knowledge
encountered during training and entirely new knowledge. Moreover, we find that
LLMs struggle to learn new factual information through unsupervised
fine-tuning, and that exposing them to numerous variations of the same fact
during training could alleviate this problem.
</p>
|
|
|
|
<p>The detection of toxic language in the Arabic language has emerged as an
active area of research in recent years, and reviewing the existing datasets
employed for training the developed solutions has become a pressing need. This
paper offers a comprehensive survey of Arabic datasets focused on online toxic
language. We systematically gathered a total of 54 available datasets and their
corresponding papers and conducted a thorough analysis, considering 18 criteria
across four primary dimensions: availability details, content, annotation
process, and reusability. This analysis enabled us to identify existing gaps
and make recommendations for future research. For the convenience of the
research community, the list of the analysed datasets is maintained in a GitHub
repository (https://github.com/Imene1/Arabic-toxic-language).
</p>
|
|
|
|
<p>Zero-shot voice conversion (VC) aims to transfer the source speaker timbre to
arbitrary unseen target speaker timbre, while keeping the linguistic content
unchanged. Although the voice of generated speech can be controlled by
providing the speaker embedding of the target speaker, the speaker similarity
still lags behind the ground truth recordings. In this paper, we propose
SEF-VC, a speaker embedding free voice conversion model, which is designed to
learn and incorporate speaker timbre from reference speech via a powerful
position-agnostic cross-attention mechanism, and then reconstruct waveform from
HuBERT semantic tokens in a non-autoregressive manner. The concise design of
SEF-VC enhances its training stability and voice conversion performance.
Objective and subjective evaluations demonstrate the superiority of SEF-VC in
generating high-quality speech with better similarity to the target reference
than strong zero-shot VC baselines, even for very short reference speech.
</p>
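<p>For illustration, below is a minimal sketch (not the authors' implementation)
of a position-agnostic cross-attention block in PyTorch: content features derived
from HuBERT tokens attend to reference-speech features that carry no positional
encoding, so timbre can be absorbed regardless of where it occurs in the
reference. Dimensions and layer choices are assumptions for the sketch.
</p>
<pre>
import torch
import torch.nn as nn

class PositionAgnosticCrossAttention(nn.Module):
    """Sketch: content tokens query reference-speech features; the reference
    side carries no positional encoding, so attention is position-agnostic."""
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, content, reference):
        # content: (B, T_content, dim), reference: (B, T_ref, dim)
        timbre, _ = self.attn(query=content, key=reference, value=reference)
        return self.norm(content + timbre)  # residual injection of timbre

# Toy usage
block = PositionAgnosticCrossAttention()
out = block(torch.randn(2, 100, 256), torch.randn(2, 40, 256))
print(out.shape)  # torch.Size([2, 100, 256])
</pre>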
|
|
|
|
<p>In this paper, we investigate the existence of self-dual MRD codes $C\subset
L^n$, where $L/F$ is an arbitrary field extension of degree $m\geq n$. We then
apply our results to the case of finite fields, and prove that if $m=n$ and
$F=\mathbb{F}_q$, a self-dual MRD code exists if and only if $q\equiv n\equiv
3 \pmod{4}$.
</p>
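<p>As a quick illustrative aid (not part of the paper), the script below
enumerates small parameter pairs $(q,n)$ with $q$ a prime power and $q\equiv
n\equiv 3 \pmod{4}$, i.e., the finite-field cases where the stated criterion
guarantees a self-dual MRD code for $m=n$.
</p>
<pre>
from sympy import factorint

def is_prime_power(q: int) -> bool:
    return len(factorint(q)) == 1

# Pairs (q, n) admitting a self-dual MRD code when m = n, per the criterion
# q = n = 3 (mod 4) with q a prime power.
pairs = [(q, n)
         for q in range(3, 50) if q % 4 == 3 and is_prime_power(q)
         for n in range(3, 16) if n % 4 == 3]
print(pairs)  # e.g. (3, 3), (3, 7), (3, 11), (3, 15), (7, 3), ...
</pre>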
|
|
|
|
<p>We propose a reinforcement learning (RL)-based system that would
automatically prescribe, to a hypothetical patient, medication that may help
with their mental health-related speech disfluency, and adjust the
medication and the dosages in response to zero-cost frequent measurement of the
fluency of the patient. We demonstrate the components of the system: a module
that detects and evaluates speech disfluency on a large dataset we built, and
an RL algorithm that automatically finds good combinations of medications. To
support the two modules, we collect data on the effect of psychiatric
medications for speech disfluency from the literature, and build a plausible
patient simulation system. We demonstrate that the RL system is, under some
circumstances, able to converge to a good medication regime. We collect and
label a dataset of people with possible speech disfluency and demonstrate our
methods using that dataset. Our work is a proof of concept: we show that there
is promise in the idea of using automatic data collection to address speech
disfluency.
</p>
|
|
|
|
<p>In this work, we examined how fact-checkers prioritize which claims to
inspect for further investigation and publishing, and what tools may assist
them in their efforts. Specifically, through a series of interviews with 23
professional fact-checkers from around the world, we validated that harm
assessment is a central component of how fact-checkers triage their work.
First, we clarify what aspects of misinformation they considered created
urgency or importance. These often revolved around the potential for the claim
to harm others. We also clarify the processes behind collective fact-checking
decisions and gather suggestions for tools that could help with these
processes.
</p>
<p>In addition, to address the needs articulated by these fact-checkers and
others, we present a five-dimension framework of questions to help
fact-checkers negotiate the priority of claims. Our FABLE Framework of
Misinformation Harms incorporates five dimensions of magnitude -- (social)
Fragmentation, Actionability, Believability, Likelihood of spread, and
Exploitativeness -- that can help determine the potential urgency of a specific
message or post when considering misinformation as harm. This effort was
further validated by additional interviews with expert fact-checkers. The
result is a questionnaire, a practical and conceptual tool to support
fact-checkers and other content moderators as they make strategic decisions to
prioritize their efforts.
</p>
|
|
|
|
<p>We locate factual knowledge in large language models by exploring the
residual stream and analyzing subvalues in vocabulary space. We explain why
subvalues correspond to human-interpretable concepts when projected into
vocabulary space: their pre-softmax values are combined by an addition
function, so the probabilities of the top tokens in vocabulary space increase.
Based on this, we find that using the log probability increase to compute the
significance of layers and subvalues is better than using the probability
increase, since the curve of the log probability increase is linear and
monotonically increasing. Moreover, we calculate inner products to evaluate how
much a feed-forward network (FFN) subvalue is activated by previous layers.
Based on these methods, we find where the factual knowledge <France, capital,
Paris> is stored. Specifically, attention layers store "Paris is related to
France", while FFN layers store "Paris is a capital/city", activated by
attention subvalues related to "capital". We apply our method to Baevski-18,
GPT-2 medium, Llama-7B, and Llama-13B. Overall, we provide a new method for
understanding the mechanism of transformers. We will release our code on
GitHub.
</p>
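<p>A minimal logit-lens-style sketch of the log probability increase
measurement, consistent with the description above but not the authors'
released code: project each layer's residual stream through the final layer
norm and unembedding, then track the log probability of the target token
(" Paris") across layers.
</p>
<pre>
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tok = GPT2TokenizerFast.from_pretrained("gpt2-medium")
model = GPT2LMHeadModel.from_pretrained("gpt2-medium").eval()

ids = tok("The capital of France is", return_tensors="pt").input_ids
target = tok.encode(" Paris")[0]

with torch.no_grad():
    hidden = model(ids, output_hidden_states=True).hidden_states

prev = None
for layer, h in enumerate(hidden):
    # Logit lens: decode each layer's residual stream at the last position.
    logits = model.lm_head(model.transformer.ln_f(h[0, -1]))
    logp = torch.log_softmax(logits, dim=-1)[target].item()
    if prev is not None:
        print(f"layer {layer:2d}: log-prob increase {logp - prev:+.3f}")
    prev = logp
</pre>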
|
|
|
|
<p>We consider the problem of approximating a function from $L^2$ by an element
of a given $m$-dimensional space $V_m$, associated with some feature map
$\varphi$, using evaluations of the function at random points $x_1,\dots,x_n$.
After recalling some results on optimal weighted least-squares using
independent and identically distributed points, we consider weighted
least-squares using projection determinantal point processes (DPP) or volume
sampling. These distributions introduce dependence between the points that
promotes diversity in the selected features $\varphi(x_i)$. We first provide a
generalized version of volume-rescaled sampling yielding quasi-optimality
results in expectation with a number of samples $n = O(m\log(m))$, meaning
that the expected $L^2$ error is bounded by a constant times the best
approximation error in $L^2$. Further assuming that the function lies in
some normed vector space $H$ continuously embedded in $L^2$, we prove
that the approximation error is almost surely bounded by the best approximation error
measured in the $H$-norm. This includes the cases of functions from $L^\infty$
or reproducing kernel Hilbert spaces. Finally, we present an alternative
strategy consisting in using independent repetitions of projection DPP (or
volume sampling), yielding similar error bounds as with i.i.d. or volume
sampling, but in practice with a much lower number of samples. Numerical
experiments illustrate the performance of the different strategies.
</p>
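<p>For intuition, here is a small numerical sketch of the recalled optimal
weighted least-squares baseline with i.i.d. points (not the paper's DPP or
volume-sampling schemes): points are drawn from the density proportional to
$k_m(x)=\sum_j \varphi_j(x)^2$ and weighted by $w(x)=m/k_m(x)$, here with a
Legendre basis on $[-1,1]$ as an assumed example.
</p>
<pre>
import numpy as np
from numpy.polynomial import legendre

def basis(x, m):
    # Legendre basis, orthonormal for the uniform measure on [-1, 1]
    return legendre.legvander(x, m - 1) * np.sqrt(2 * np.arange(m) + 1)

f = lambda x: np.exp(x) * np.sin(5 * x)
m, n = 10, 60
rng = np.random.default_rng(0)

# i.i.d. sampling from the optimal density (k_m(x)/m times uniform) via
# rejection; for this basis k_m(x) <= m^2 on [-1, 1].
xs = []
while len(xs) < n:
    x = rng.uniform(-1, 1)
    k = (basis(np.array([x]), m) ** 2).sum()
    if rng.uniform() < k / m**2:
        xs.append(x)
x = np.array(xs)

Phi = basis(x, m)
w = m / (Phi ** 2).sum(axis=1)          # w(x) = m / k_m(x)
sw = np.sqrt(w)
coef, *_ = np.linalg.lstsq(sw[:, None] * Phi, sw * f(x), rcond=None)

xx = np.linspace(-1, 1, 2000)
print("max error:", np.abs(basis(xx, m) @ coef - f(xx)).max())
</pre>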
|
|
|
|
<p>Emergency and non-emergency response systems are essential services provided
by local governments and critical to protecting lives, the environment, and
property. The effective handling of (non-)emergency calls is critical for
public safety and well-being. By reducing the burden imposed by non-emergency
callers, residents in critical need of assistance through 911 will receive a
fast and effective response. Collaborating with the Department of Emergency
Communications (DEC) in Nashville, we analyzed 11,796 non-emergency call
recordings and developed Auto311, the first automated system to handle 311
non-emergency calls, which (1) effectively and dynamically predicts ongoing
non-emergency incident types to generate tailored case reports during the call;
(2) itemizes essential information from dialogue contexts to complete the
generated reports; and (3) strategically structures system-caller dialogues
with optimized confidence. We used real-world data to evaluate the system's
effectiveness and deployability. The experimental results indicate that the
system effectively predicts incident type with an average F-1 score of 92.54%.
Moreover, the system successfully itemizes critical information from relevant
contexts to complete reports, evincing a 0.93 average consistency score
compared to the ground truth. Additionally, emulations demonstrate that the
system effectively decreases the number of conversation turns as utterances get
longer and categorizes the ongoing call with 94.49% mean accuracy.
</p>
|
|
|
|
<p>This paper proposes a novel method for demand forecasting in a pricing
context. Here, modeling the causal relationship between price, as an input
variable, and demand is crucial because retailers aim to set prices in a
(profit-)optimal manner in a downstream decision-making problem. Our methods bring
together the Double Machine Learning methodology for causal inference and
state-of-the-art transformer-based forecasting models. In extensive empirical
experiments, we show on the one hand that our method estimates the causal
effect better in a fully controlled setting via synthetic, yet realistic data.
On the other hand, we demonstrate on real-world data that our method
outperforms forecasting methods in off-policy settings (i.e., when there is a
change in the pricing policy) while only slightly trailing in the on-policy
setting.
</p>
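<p>To make the causal component concrete, the following is a minimal
Double Machine Learning sketch for a partially linear price-demand model on
synthetic data. It uses gradient boosting for the nuisance regressions rather
than the paper's transformer-based forecasters; the data-generating process and
parameter values are assumptions for illustration.
</p>
<pre>
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
n = 4000
X = rng.normal(size=(n, 5))                  # confounders (season, promo, ...)
price = 10 + X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=n)
theta_true = -1.5                            # causal price effect on demand
demand = (50 + theta_true * price + 2 * np.sin(X[:, 0]) + X[:, 2] ** 2
          + rng.normal(size=n))

# Cross-fitted residualization (residual-on-residual DML)
res_p, res_d = np.zeros(n), np.zeros(n)
for tr, te in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    res_p[te] = price[te] - GradientBoostingRegressor().fit(
        X[tr], price[tr]).predict(X[te])
    res_d[te] = demand[te] - GradientBoostingRegressor().fit(
        X[tr], demand[tr]).predict(X[te])

theta_hat = (res_p @ res_d) / (res_p @ res_p)
print(f"estimated price effect: {theta_hat:.3f} (true {theta_true})")
</pre>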
|
|
|
|
<p>Machine unlearning refers to the process of mitigating the influence of
specific training data on machine learning models based on removal requests
from data owners. However, one important area that has been largely overlooked
in the research of unlearning is reinforcement learning. Reinforcement learning
focuses on training an agent to make optimal decisions within an environment to
maximize its cumulative rewards. During the training, the agent tends to
memorize the features of the environment, which raises a significant concern
about privacy. As per data protection regulations, the owner of the environment
holds the right to revoke access to the agent's training data, thus
necessitating the development of a novel and pressing research field, known as
\emph{reinforcement unlearning}. Reinforcement unlearning focuses on revoking
entire environments rather than individual data samples. This unique
characteristic presents three distinct challenges: 1) how to propose unlearning
schemes for environments; 2) how to avoid degrading the agent's performance in
remaining environments; and 3) how to evaluate the effectiveness of unlearning.
To tackle these challenges, we propose two reinforcement unlearning methods.
The first method is based on decremental reinforcement learning, which aims to
erase the agent's previously acquired knowledge gradually. The second method
leverages environment poisoning attacks, which encourage the agent to learn
new, albeit incorrect, knowledge to remove the unlearning environment.
Particularly, to tackle the third challenge, we introduce the concept of
``environment inference attack'' to evaluate the unlearning outcomes. The
source code is available at
\url{https://anonymous.4open.science/r/Reinforcement-Unlearning-D347}.
</p>
|
|
|
|
<p>Recent advancements in Text-to-Image (T2I) diffusion models have demonstrated
impressive success in generating high-quality images with zero-shot
generalization capabilities. Yet, current models struggle to closely adhere to
prompt semantics, often misrepresenting or overlooking specific attributes. To
address this, we propose a simple, training-free approach that modulates the
guidance direction of diffusion models during inference. We first decompose the
prompt semantics into a set of concepts, and monitor the guidance trajectory in
relation to each concept. Our key observation is that deviations in the model's
adherence to prompt semantics are highly correlated with divergence of the
guidance from one or more of these concepts. Based on this observation, we
devise a technique to steer the guidance direction towards any concept from
which the model diverges. Extensive experimentation validates that our method
improves the semantic alignment of images generated by diffusion models in
response to prompts. Project page is available at: https://korguy.github.io/
</p>
|
|
|
|
<p>Text-guided image-to-video (I2V) generation aims to generate a coherent video
that preserves the identity of the input image and semantically aligns with the
input prompt. Existing methods typically augment pretrained text-to-video (T2V)
models by either concatenating the image with noised video frames channel-wise
before being fed into the model or injecting the image embedding produced by
pretrained image encoders in cross-attention modules. However, the former
approach often necessitates altering the fundamental weights of pretrained T2V
models, thus restricting the model's compatibility within the open-source
communities and disrupting the model's prior knowledge. Meanwhile, the latter
typically fails to preserve the identity of the input image. We present
I2V-Adapter to overcome such limitations. I2V-Adapter adeptly propagates the
unnoised input image to subsequent noised frames through a cross-frame
attention mechanism, maintaining the identity of the input image without any
changes to the pretrained T2V model. Notably, I2V-Adapter introduces only a few
trainable parameters, significantly alleviating the training cost while also
ensuring compatibility with existing community-driven personalized models and
control tools. Moreover, we propose a novel Frame Similarity Prior to balance
the motion amplitude and the stability of generated videos through two
adjustable control coefficients. Our experimental results demonstrate that
I2V-Adapter is capable of producing high-quality videos. This performance,
coupled with its agility and adaptability, represents a substantial advancement
in the field of I2V, particularly for personalized and controllable
applications.
</p>
|
|
|
|
<p>Inspired by human driving focus, this research introduces networks augmented
with Focusing Sampling, a Partial Field of View Evaluation, an enhanced FPN
architecture, and a Directional IoU Loss -- targeted innovations addressing
obstacles to precise lane detection for autonomous driving. Experiments
demonstrate that our Focusing Sampling strategy, which emphasizes vital distant
details unlike uniform approaches, significantly boosts both benchmark
performance and the practical curved/distant lane recognition accuracy
essential for safety. FENetV1 achieves state-of-the-art performance on
conventional metrics by isolating perspective-aware contexts that mimic driver
vision, while FENetV2 proves most reliable under the proposed Partial Field
evaluation. We therefore recommend V2 for practical lane navigation despite a
fractional degradation on standard whole-image measures. Future directions
include collecting on-road data and integrating the two complementary
frameworks, guided by human perception principles. The code is available at
https://github.com/HanyangZhong/FENet.
</p>
|
|
|
|
<p>Recent advances in large language models (LLMs) have led to the development
of various evaluation benchmarks. These benchmarks typically rely on a single
instruction template for evaluating all LLMs on a specific task. In this paper,
we comprehensively analyze the brittleness of results obtained via
single-prompt evaluations across 6.5M instances, involving 20 different LLMs
and 39 tasks from 3 benchmarks. To improve the robustness of the analysis, we
propose to evaluate LLMs with a set of diverse prompts instead. We discuss
tailored evaluation metrics for specific use cases (e.g., LLM developers vs.
developers interested in a specific downstream task), ensuring a more reliable
and meaningful assessment of LLM capabilities. We then implement these criteria
and conduct evaluations of multiple models, providing insights into the true
strengths and limitations of current LLMs.
</p>
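<p>A toy sketch of the multi-prompt evaluation idea (an illustration of the
setup, not the paper's protocol): score the same dataset under several
instruction templates and report the spread, which single-prompt evaluation
hides. The generate function is a hypothetical stand-in for a real model call.
</p>
<pre>
import statistics

PROMPTS = [
    "Q: {q}\nA:",
    "Answer the question: {q}",
    "{q} Respond with the answer only.",
]

def generate(prompt: str) -> str:
    # Hypothetical stand-in for an LLM call; swap in a real client.
    return "Paris" if "capital of France" in prompt else "unknown"

def accuracy(template, dataset):
    hits = [generate(template.format(q=q)).strip() == a for q, a in dataset]
    return sum(hits) / len(hits)

data = [("What is the capital of France?", "Paris")]
scores = [accuracy(t, data) for t in PROMPTS]
print(f"mean={statistics.mean(scores):.2f} "
      f"min={min(scores):.2f} max={max(scores):.2f}")
</pre>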
|
|
|
|
<p>This paper introduces HAAQI-Net, a non-intrusive deep learning model for
music quality assessment tailored to hearing aid users. In contrast to
traditional methods like the Hearing Aid Audio Quality Index (HAAQI), HAAQI-Net
utilizes a Bidirectional Long Short-Term Memory (BLSTM) with attention. It
takes an assessed music sample and a hearing loss pattern as input, generating
a predicted HAAQI score. The model employs the pre-trained Bidirectional
Encoder representation from Audio Transformers (BEATs) for acoustic feature
extraction. Comparing predicted scores with ground truth, HAAQI-Net achieves a
Longitudinal Concordance Correlation (LCC) of 0.9368, Spearman's Rank
Correlation Coefficient (SRCC) of 0.9486, and Mean Squared Error (MSE) of
0.0064. Notably, this high performance comes with a substantial reduction in
inference time: from 62.52 seconds (by HAAQI) to 2.54 seconds (by HAAQI-Net),
serving as an efficient music quality assessment model for hearing aid users.
</p>
|
|
|
|
<p>3D human body shape and pose estimation from RGB images is a challenging
problem with potential applications in augmented/virtual reality, healthcare
and fitness technology and virtual retail. Recent solutions have focused on
three types of inputs: i) single images, ii) multi-view images and iii) videos.
In this study, we surveyed and compared 3D body shape and pose estimation
methods for contemporary dance and performing arts, with a special focus on
human body pose and dressing, camera viewpoint, illumination conditions and
background conditions. We demonstrated that multi-frame methods, such as PHALP,
provide better results than single-frame methods for pose estimation when
dancers are performing contemporary dances.
</p>
|
|
|
|
<p>In this paper, we present a novel transformer architecture tailored for
learning robust power system state representations, which strives to optimize
power dispatch for the power flow adjustment across different transmission
sections. Specifically, our proposed approach, named Powerformer, develops a
dedicated section-adaptive attention mechanism, separating itself from the
self-attention used in conventional transformers. This mechanism effectively
integrates power system states with transmission section information, which
facilitates the development of robust state representations. Furthermore, by
considering the graph topology of the power system and the electrical attributes
bus nodes, we introduce two customized strategies to further enhance the
expressiveness: graph neural network propagation and multi-factor attention
mechanism. Extensive evaluations are conducted on three power system scenarios,
including the IEEE 118-bus system, a realistic 300-bus system in China, and a
large-scale European system with 9241 buses, where Powerformer demonstrates its
superior performance over several baseline methods.
</p>
|
|
|
|
<p>This paper introduces RAISE (Reasoning and Acting through Scratchpad and
Examples), an advanced architecture enhancing the integration of Large Language
Models (LLMs) like GPT-4 into conversational agents. RAISE, an enhancement of
the ReAct framework, incorporates a dual-component memory system, mirroring
human short-term and long-term memory, to maintain context and continuity in
conversations. It entails a comprehensive agent construction scenario,
including phases like Conversation Selection, Scene Extraction, CoT Completion,
and Scene Augmentation, leading to the LLMs Training phase. This approach
appears to enhance agent controllability and adaptability in complex,
multi-turn dialogues. Our preliminary evaluations in a real estate sales
context suggest that RAISE has some advantages over traditional agents,
indicating its potential for broader applications. This work contributes to the
AI field by providing a robust framework for developing more context-aware and
versatile conversational agents.
</p>
|
|
|
|
<p>This paper addresses the challenge of example-based non-stationary texture
synthesis. We introduce a novel two-step approach wherein users first modify a
reference texture using standard image editing tools, yielding an initial rough
target for the synthesis. Subsequently, our proposed method, termed
"self-rectification", automatically refines this target into a coherent,
seamless texture, while faithfully preserving the distinct visual
characteristics of the reference exemplar. Our method leverages a pre-trained
diffusion network, and uses self-attention mechanisms, to gradually align the
synthesized texture with the reference, ensuring the retention of the
structures in the provided target. Through experimental validation, our
approach exhibits exceptional proficiency in handling non-stationary textures,
demonstrating significant advancements in texture synthesis when compared to
existing state-of-the-art techniques. Code is available at
https://github.com/xiaorongjun000/Self-Rectification
</p>
|
|
|
|
<p>There has been a growing interest in the task of generating sound for silent
videos, primarily because of its practicality in streamlining video
post-production. However, existing methods for video-sound generation attempt
to directly create sound from visual representations, which can be challenging
due to the difficulty of aligning visual representations with audio
representations. In this paper, we present SonicVisionLM, a novel framework
aimed at generating a wide range of sound effects by leveraging vision language
models. Instead of generating audio directly from video, we use the
capabilities of powerful vision language models (VLMs). When provided with a
silent video, our approach first identifies events within the video using a VLM
to suggest possible sounds that match the video content. This shift in approach
transforms the challenging task of aligning image and audio into the
better-studied sub-problems of aligning image-to-text and text-to-audio through
the popular diffusion models. To improve the quality of audio recommendations
with LLMs, we have collected an extensive dataset that maps text descriptions
to specific sound effects and developed temporally controlled audio adapters.
Our approach surpasses current state-of-the-art methods for converting video to
audio, resulting in enhanced synchronization with the visuals and improved
alignment between audio and video components. Project page:
https://yusiissy.github.io/SonicVisionLM.github.io/
</p>
|
|
|
|
<p>The success of drug discovery and development relies on the precise
prediction of molecular activities and properties. While in silico molecular
property prediction has shown remarkable potential, its use has been limited so
far to assays for which large amounts of data are available. In this study, we
use a fine-tuned large language model to integrate biological assays based on
their textual information, coupled with Barlow Twins, a Siamese neural network
using a novel self-supervised learning approach. This architecture uses both
assay information and molecular fingerprints to extract the true molecular
information. TwinBooster enables the prediction of properties of unseen
bioassays and molecules, delivering state-of-the-art zero-shot learning performance.
Remarkably, our artificial intelligence pipeline shows excellent performance on
the FS-Mol benchmark. This breakthrough demonstrates the application of deep
learning to critical property prediction tasks where data is typically scarce.
By accelerating the early identification of active molecules in drug discovery
and development, this method has the potential to help streamline the
identification of novel therapeutics.
</p>
|
|
|
|
<p>In this paper, for any fixed integer $q>2$, we construct $q$-ary codes
correcting a burst of at most $t$ deletions with redundancy $\log n+8\log\log
n+o(\log\log n)+\gamma_{q,t}$ bits and near-linear encoding/decoding
complexity, where $n$ is the message length and $\gamma_{q,t}$ is a constant
that only depends on $q$ and $t$. Previous works gave constructions of
such codes with redundancy $\log n+O(\log q\log\log n)$ bits or $\log
n+O(t^2\log\log n)+O(t\log q)$ bits. The redundancy of our new construction is
independent of $q$ and $t$ in the second term.
</p>
|
|
|
|
<p>We propose Compact and Swift Segmenting 3D Gaussians (CoSSegGaussians), a
method for compact 3D-consistent scene segmentation at fast rendering speed
with only RGB images input. Previous NeRF-based segmentation methods have
relied on time-consuming neural scene optimization. While recent 3D Gaussian
Splatting has notably improved speed, existing Gaussian-based segmentation
methods struggle to produce compact masks, especially in zero-shot
segmentation. This issue probably stems from their straightforward assignment
of learnable parameters to each Gaussian, resulting in a lack of robustness
against cross-view inconsistent 2D machine-generated labels. Our method aims to
address this problem by employing Dual Feature Fusion Network as Gaussians'
segmentation field. Specifically, we first optimize 3D Gaussians under RGB
supervision. After Gaussian Locating, DINO features extracted from images are
applied through explicit unprojection, which are further incorporated with
spatial features from the efficient point cloud processing network. Feature
aggregation is utilized to fuse them in a global-to-local strategy for compact
segmentation features. Experimental results show that our model outperforms
baselines on both semantic and panoptic zero-shot segmentation tasks, while
consuming less than 10% of the inference time of NeRF-based methods. Code and
more results will be available at https://David-Dou.github.io/CoSSegGaussians
</p>
|
|
|
|
<p>Adversarial generative models, such as Generative Adversarial Networks
(GANs), are widely applied for generating various types of data, e.g., images,
text, and audio. Accordingly, their promising performance has led to GAN-based
adversarial attack methods in white-box and black-box attack scenarios. The
importance of transferable black-box attacks lies in their
ability to be effective across different models and settings, more closely
aligning with real-world applications. However, it remains challenging to
retain the performance in terms of transferable adversarial examples for such
methods. Meanwhile, we observe that some enhanced gradient-based transferable
adversarial attack algorithms require prolonged time for adversarial sample
generation. Thus, in this work, we propose a novel algorithm named GE-AdvGAN to
enhance the transferability of adversarial samples whilst improving the
algorithm's efficiency. The main approach is via optimising the training
process of the generator parameters. With the functional and characteristic
similarity analysis, we introduce a novel gradient editing (GE) mechanism and
verify its feasibility in generating transferable samples on various models.
Moreover, by exploring the frequency domain information to determine the
gradient editing direction, GE-AdvGAN can generate highly transferable
adversarial samples while minimizing the execution time in comparison to the
state-of-the-art transferable adversarial attack algorithms. The performance of
GE-AdvGAN is comprehensively evaluated by large-scale experiments on different
datasets, and the results demonstrate the superiority of our algorithm. The code
for our algorithm is available at: https://github.com/LMBTough/GE-advGAN
</p>
|
|
|
|
<p>Multi-modal large language models have demonstrated impressive performance
across various tasks in different modalities. However, existing multi-modal
models primarily emphasize capturing global information within each modality
while neglecting the importance of perceiving local information across
modalities. Consequently, these models lack the ability to effectively
understand the fine-grained details of input data, limiting their performance
in tasks that require a more nuanced understanding. To address this limitation,
there is a compelling need to develop models that enable fine-grained
understanding across multiple modalities, thereby enhancing their applicability
to a wide range of tasks. In this paper, we propose GroundingGPT, a language
enhanced multi-modal grounding model. Beyond capturing global information like
other multi-modal models, our proposed model excels at tasks demanding a
detailed understanding of local information within the input. It demonstrates
precise identification and localization of specific regions in images or
moments in videos. To achieve this objective, we design a diversified dataset
construction pipeline, resulting in a multi-modal, multi-granularity dataset
for model training. The code, dataset, and demo of our model can be found at
https://github.com/lzw-lzw/GroundingGPT.
</p>
|
|
|
|
<p>We give complete presentations for the dagger-compact props of affine
Lagrangian and coisotropic relations over an arbitrary field. This provides a
unified family of graphical languages for both affinely constrained classical
mechanical systems, as well as odd-prime-dimensional stabiliser quantum
circuits. To this end, we present affine Lagrangian relations by a particular
class of undirected coloured graphs. In order to reason about composite
systems, we introduce a powerful scalable notation where the vertices of these
graphs are themselves coloured by graphs. In the setting of stabiliser quantum
mechanics, this scalable notation gives an extremely concise description of
graph states, which can be composed via ``phased spider fusion.'' Likewise, in
the classical mechanical setting of electrical circuits, we show that impedance
matrices for reciprocal networks are presented in essentially the same way.
</p>
|
|
|
|
<p>In this chapter, we investigate the mathematical foundation of the modeling
and design of reconfigurable intelligent surfaces (RIS) in both the far- and
near-field regimes. More specifically, we first present RIS-assisted wireless
channel models for the far- and near-field regimes, discussing relevant
phenomena, such as line-of-sight (LOS) and non-LOS links, rich and poor
scattering, channel correlation, and array manifold. Subsequently, we introduce
two general approaches for the RIS reflective beam design, namely
optimization-based and analytical, which offer different degrees of design
flexibility and computational complexity. Furthermore, we provide a
comprehensive set of simulation results for the performance evaluation of the
studied RIS beam designs and the investigation of the impact of the system
parameters.
</p>
|
|
|
|
<p>There are two common ways in which developers are incorporating proprietary
and domain-specific data when building applications of Large Language Models
(LLMs): Retrieval-Augmented Generation (RAG) and fine-tuning. RAG augments the
prompt with the external data, while fine-tuning incorporates the additional
knowledge into the model itself. However, the pros and cons of both approaches
are not well understood. In this paper, we propose a pipeline for fine-tuning
and RAG, and present the tradeoffs of both for multiple popular LLMs, including
Llama2-13B, GPT-3.5, and GPT-4. Our pipeline consists of multiple stages,
including extracting information from PDFs, generating questions and answers,
using them for fine-tuning, and leveraging GPT-4 for evaluating the results. We
propose metrics to assess the performance of different stages of the RAG and
fine-tuning pipeline. We conduct an in-depth study on an agricultural dataset.
Agriculture as an industry has not seen much penetration of AI, and we study a
potentially disruptive application - what if we could provide location-specific
insights to a farmer? Our results show the effectiveness of our dataset
generation pipeline in capturing geographic-specific knowledge, and the
quantitative and qualitative benefits of RAG and fine-tuning. We see an
accuracy increase of over 6 p.p. when fine-tuning the model and this is
cumulative with RAG, which increases accuracy by 5 p.p. further. In one
particular experiment, we also demonstrate that the fine-tuned model leverages
information from across geographies to answer specific questions, increasing
answer similarity from 47% to 72%. Overall, the results point to how systems
built using LLMs can be adapted to respond and incorporate knowledge across a
dimension that is critical for a specific industry, paving the way for further
applications of LLMs in other industrial domains.
</p>
|
|
|
|
<p>Human participants play a central role in the development of modern
artificial intelligence (AI) technology, in psychological science, and in user
research. Recent advances in generative AI have attracted growing interest in
the possibility of replacing human participants in these domains with AI
surrogates. We survey several such "substitution proposals" to better
understand the arguments for and against substituting human participants with
modern generative AI. Our scoping review indicates that the recent wave of
these proposals is motivated by goals such as reducing the costs of research
and development work and increasing the diversity of collected data. However,
these proposals ignore and ultimately conflict with foundational values of work
with human participants: representation, inclusion, and understanding. This
paper critically examines the principles and goals underlying human
participation to help chart out paths for future work that truly centers and
empowers participants.
</p>
|
|
|
|
<p>As deep neural networks are more commonly deployed in high-stakes domains,
their lack of interpretability makes uncertainty quantification challenging. We
investigate the effects of presenting conformal prediction
sets -- a method for generating valid confidence sets in
distribution-free uncertainty quantification -- to express
uncertainty in AI-advised decision-making. Through a large online experiment,
we compare the utility of conformal prediction sets to displays of Top-$1$ and
Top-$k$ predictions for AI-advised image labeling. We find that the utility of
prediction sets for accuracy varies with the difficulty of the task: while they
result in accuracy on par with or less than Top-$1$ and Top-$k$ displays for
easy images, prediction sets excel at assisting humans in labeling
out-of-distribution (OOD) images especially when the set size is small. Our
results empirically pinpoint the practical challenges of conformal prediction
sets and provide implications on how to incorporate them for real-world
decision-making.
</p>
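<p>For readers unfamiliar with the underlying procedure, below is a minimal
split conformal prediction sketch for classification (standard method, not
the study's experimental code): calibrate a score threshold on held-out data,
then return every label whose score clears it. The toy probabilities are
assumptions for illustration.
</p>
<pre>
import numpy as np

def conformal_sets(cal_probs, cal_labels, test_probs, alpha=0.1):
    """Split conformal prediction for classification with score 1 - p_y."""
    n = len(cal_labels)
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    q = np.quantile(scores, min(1.0, np.ceil((n + 1) * (1 - alpha)) / n),
                    method="higher")
    return [np.flatnonzero(1.0 - p <= q) for p in test_probs]

# Toy classifier probabilities for a 3-class problem
rng = np.random.default_rng(0)
cal_probs = rng.dirichlet([4, 1, 1], size=500)
cal_labels = np.array([rng.choice(3, p=p) for p in cal_probs])
test_probs = rng.dirichlet([4, 1, 1], size=5)
for s in conformal_sets(cal_probs, cal_labels, test_probs):
    print(s)  # harder inputs yield larger sets
</pre>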
|
|
|
|
<p>Despite the advancements in large language models (LLMs) for mathematical
reasoning, solving competition-level math problems remains a significant
challenge, especially for open-source LLMs without external tools. We introduce
the MMIQC dataset, comprising a mixture of processed web data and synthetic
question-response pairs, aimed at enhancing the mathematical reasoning
capabilities of base language models. Models fine-tuned on MMIQC consistently
surpass their counterparts in performance on the MATH benchmark across various
model sizes. Notably, Qwen-72B-MMIQC achieves a 45.0% accuracy, exceeding the
previous open-source state-of-the-art by 8.2% and outperforming the initial
version of GPT-4 released in 2023. Extensive evaluation results on Hungarian high
school finals suggest that such improvement can generalize to unseen data. Our
ablation study on MMIQC reveals that a large part of the improvement can be
attributed to our novel augmentation method, Iterative Question Composing
(IQC), which involves iteratively composing new questions from seed problems
using an LLM and applying rejection sampling through another LLM. The MMIQC
dataset is available on the HuggingFace hub at
https://huggingface.co/datasets/Vivacem/MMIQC. Our code is available at
https://github.com/iiis-ai/IterativeQuestionComposing.
</p>
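<p>A schematic of the IQC loop as described above (our reading of the abstract,
not the released implementation): one LLM composes new questions from the
current pool and a second performs rejection sampling. The compose and accept
callables are hypothetical stand-ins for the two LLM calls.
</p>
<pre>
def iterative_question_composing(seed_problems, compose, accept, rounds=3):
    """Sketch of Iterative Question Composing: each round, `compose` derives
    new questions from the pool and `accept` rejection-samples them."""
    pool = list(seed_problems)
    for _ in range(rounds):
        candidates = [compose(p) for p in pool]
        pool.extend(q for q in candidates if accept(q))
    return pool

# Toy usage with trivial stand-ins for the two LLM calls
grown = iterative_question_composing(
    ["Solve x^2 - 5x + 6 = 0."],
    compose=lambda p: "Harder variant of: " + p,
    accept=lambda q: len(q) < 200,
)
print(len(grown), grown[-1])
</pre>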
|
|
|
|
<p>Whilst spectral Graph Neural Networks (GNNs) are theoretically well-founded
in the spectral domain, their practical reliance on polynomial approximation
implies a profound linkage to the spatial domain. As previous studies rarely
examine spectral GNNs from the spatial perspective, their spatial-domain
interpretability remains elusive, e.g., what information is essentially encoded
by spectral GNNs in the spatial domain? In this paper, to answer this question,
we establish a theoretical connection between spectral filtering and spatial
aggregation, unveiling that spectral filtering implicitly transforms the
original graph into an adapted new graph, which is explicitly computed for
spatial aggregation. Both theoretical and empirical investigations
reveal that the adapted new graph not only exhibits non-locality but also
accommodates signed edge weights to reflect label consistency among nodes.
These findings thus highlight the interpretable role of spectral GNNs in the
spatial domain and inspire us to rethink graph spectral filters beyond the
fixed-order polynomials, which neglect global information. Built upon the
theoretical findings, we revisit the state-of-the-art spectral GNNs and propose
a novel Spatially Adaptive Filtering (SAF) framework, which leverages the
adapted new graph by spectral filtering for an auxiliary non-local aggregation.
Notably, our proposed SAF comprehensively models both node similarity and
dissimilarity from a global perspective, therefore alleviating persistent
deficiencies of GNNs related to long-range dependencies and graph heterophily.
Extensive experiments over 13 node classification benchmarks demonstrate the
superiority of our proposed framework to the state-of-the-art models.
</p>
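<p>A small numerical illustration of the filtering-as-aggregation connection
(an aid to the statement above, not the SAF framework itself): a polynomial
spectral filter $h(L)=\sum_k \theta_k L^k$ acts on signals exactly as
aggregation over the dense matrix $h(L)$, whose entries can be inspected. The
graph and coefficients are assumptions for the sketch.
</p>
<pre>
import numpy as np

# A polynomial spectral filter h(L) applied to signals X is h(L) @ X, i.e.,
# plain aggregation over an "adapted new graph" with weight matrix h(L).
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], float)   # a 4-node path graph
L = np.diag(A.sum(1)) - A             # combinatorial Laplacian
theta = [1.0, -0.6, 0.1, -0.02]       # illustrative filter coefficients
H = sum(t * np.linalg.matrix_power(L, k) for k, t in enumerate(theta))
print(H.round(3))
# The adapted graph H is non-local (nonzero entry between nodes 0 and 3,
# which are 3 hops apart) and carries signed edge weights.
</pre>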
|
|
|
|
<p>E-commerce customers frequently seek detailed product information for
purchase decisions, commonly contacting sellers directly with extended queries.
This manual response requirement imposes additional costs and disrupts buyers'
shopping experience with response time fluctuations ranging from hours to days.
We seek to automate buyer inquiries to sellers in a leading e-commerce store
using a domain-specific federated Question Answering (QA) system. The main
challenge is adapting current QA systems, designed for single questions, to
address detailed customer queries. We address this with a low-latency,
sequence-to-sequence approach, MESSAGE-TO-QUESTION (M2Q). It reformulates
buyer messages into succinct questions by identifying and extracting the most
salient information from a message. Evaluation against baselines shows that M2Q
yields relative increases of 757% in question understanding, and 1,746% in
answering rate from the federated QA system. Live deployment shows that
automatic answering saves sellers from manually responding to millions of
messages per year, and also accelerates customer purchase decisions by
eliminating the need for buyers to wait for a reply.
</p>
|
|
|
|
<p>Study Objectives: Polysomnography (PSG) currently serves as the benchmark for
evaluating sleep disorders. Its discomfort, impracticality for home-use, and
introduction of bias in sleep quality assessment necessitate the exploration of
less invasive, cost-effective, and portable alternatives. One promising
contender is the in-ear-EEG sensor, which offers advantages in terms of
comfort, fixed electrode positions, resistance to electromagnetic interference,
and user-friendliness. This study aims to establish a methodology to assess the
similarity between the in-ear-EEG signal and standard PSG.
</p>
<p>Methods: We assess the agreement between the PSG and in-ear-EEG derived
hypnograms. We extract features in the time and frequency domains from PSG and
in-ear-EEG 30-second epochs. We only consider the epochs where the PSG-scorers
and the in-ear-EEG-scorers were in agreement. We introduce a methodology to
quantify the similarity between PSG derivations and the single-channel
in-ear-EEG. The approach relies on a comparison of distributions of selected
features -- extracted for each sleep stage and subject on both PSG and the
in-ear-EEG signals -- via a Jensen-Shannon Divergence Feature-based Similarity
Index (JSD-FSI).
</p>
<p>Results: We found a high intra-scorer variability, mainly due to the
uncertainty the scorers had in evaluating the in-ear-EEG signals. We show that
the similarity between PSG and in-ear-EEG signals is high (JSD-FSI: 0.61 +/-
0.06 in awake, 0.60 +/- 0.07 in NREM and 0.51 +/- 0.08 in REM), and in line
with the similarity values computed independently on standard
PSG-channel-combinations.
</p>
<p>Conclusions: In-ear-EEG is a valuable solution for home-based sleep
monitoring; however, further studies with a larger and more heterogeneous
dataset are needed.
</p>
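<p>A schematic of the feature-distribution comparison described in the Methods
(a simplified reading; the exact index definition is in the paper): compute the
Jensen-Shannon distance between the empirical distributions of a feature from
the two signals and report one minus that distance as the similarity. The
synthetic feature values are assumptions for illustration.
</p>
<pre>
import numpy as np
from scipy.spatial.distance import jensenshannon

def jsd_fsi(feat_a, feat_b, bins=30):
    """Illustrative JSD-based similarity between two empirical feature
    distributions; 1 means identical, 0 maximally dissimilar."""
    lo = min(feat_a.min(), feat_b.min())
    hi = max(feat_a.max(), feat_b.max())
    edges = np.linspace(lo, hi, bins + 1)
    p, _ = np.histogram(feat_a, bins=edges)
    q, _ = np.histogram(feat_b, bins=edges)
    # jensenshannon returns the JS distance (sqrt of divergence), in [0, 1]
    return 1.0 - jensenshannon(p, q, base=2)

rng = np.random.default_rng(0)
psg = rng.normal(0.0, 1.0, 500)   # e.g. a spectral feature from PSG epochs
ear = rng.normal(0.1, 1.1, 500)   # same feature from in-ear-EEG epochs
print(f"JSD-FSI: {jsd_fsi(psg, ear):.2f}")
</pre>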
|
|
|
|
<p>Tactics, Techniques and Procedures (TTPs) represent sophisticated attack
patterns in the cybersecurity domain, described encyclopedically in textual
knowledge bases. Identifying TTPs in cybersecurity writing, often called TTP
mapping, is an important and challenging task. Conventional learning approaches
often target the problem in the classical multi-class or multilabel
classification setting. This setting hinders the learning ability of the model
due to a large number of classes (i.e., TTPs), the inevitable skewness of the
label distribution and the complex hierarchical structure of the label space.
We formulate the problem in a different learning paradigm, where the assignment
of a text to a TTP label is decided by the direct semantic similarity between
the two, thus reducing the complexity of competing solely over the large
labeling space. To that end, we propose a neural matching architecture with an
effective sampling-based learn-to-compare mechanism, facilitating the learning
process of the matching model despite constrained resources.
</p>
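<p>To illustrate the semantic-similarity formulation (with a generic bi-encoder
as a stand-in, not the paper's matching architecture or sampling mechanism):
score a report sentence against textual TTP descriptions and rank the labels.
The model name and the abbreviated TTP descriptions are assumptions for the
sketch.
</p>
<pre>
from sentence_transformers import SentenceTransformer, util

# Rank TTP labels by semantic similarity to the input text.
model = SentenceTransformer("all-MiniLM-L6-v2")
ttps = {
    "T1566 Phishing": "Adversaries send messages with malicious attachments "
                      "or links to gain access to victim systems.",
    "T1110 Brute Force": "Adversaries repeatedly guess credentials to gain "
                         "access to accounts.",
}
text = "The actor emailed a weaponized document to finance staff."
t_emb = model.encode(text, convert_to_tensor=True)
for name, desc in ttps.items():
    sim = util.cos_sim(t_emb, model.encode(desc, convert_to_tensor=True))
    print(f"{name}: {float(sim):.3f}")
</pre>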
|
|
|
|
<p>This study explores the nuanced capabilities and inherent biases of
Recommender Systems using Large Language Models (RecLLMs), with a focus on
ChatGPT-based systems. It examines the contrasting behaviors of generative
models and traditional collaborative filtering models in movie recommendations.
The research primarily investigates prompt design strategies and their impact
on various aspects of recommendation quality, including accuracy, provider
fairness, diversity, stability, genre dominance, and temporal freshness
(recency).
</p>
<p>Our experimental analysis reveals that the introduction of specific 'system
roles' and 'prompt strategies' in RecLLMs significantly influences their
performance. For instance, role-based prompts enhance fairness and diversity in
recommendations, mitigating popularity bias. We find that while GPT-based
models do not always match the performance of CF baselines, they exhibit a
unique tendency to recommend newer and more diverse movie genres. Notably,
GPT-based models tend to recommend more recent films, particularly those
released post-2000, and show a preference for genres like 'Drama', 'Comedy',
and 'Romance' (compared to 'Action' and 'Adventure' for CF models), presumably
due to the RecLLMs' training on varied datasets, which allows them to capture recent
trends and discussions more effectively than CF models. Interestingly, our
results demonstrate that the 'Simple' and 'Chain of Thought (COT)' paradigms
yield the highest accuracy. These findings imply the potential of combining
these strategies with scenarios that favor more recent content, thereby
offering a more balanced and up-to-date recommendation experience. This study
contributes significantly to the understanding of emerging RecLLMs,
particularly in the context of harms and biases within these systems.
</p>
|
|
|
|
<p>Electroencephalography (EEG) signals are frequently used for various
Brain-Computer Interface (BCI) tasks. While Deep Learning (DL) techniques have
shown promising results, they are hindered by the substantial data
requirements. By leveraging data from multiple subjects, transfer learning
enables more effective training of DL models. A technique that is gaining
popularity is Euclidean Alignment (EA) due to its ease of use, low
computational complexity, and compatibility with Deep Learning models. However,
few studies evaluate its impact on the training performance of shared and
individual DL models. In this work, we systematically evaluate the effect of EA
combined with DL for decoding BCI signals. We used EA to train shared models
with data from multiple subjects and evaluated its transferability to new
subjects. Our experimental results show that it improves decoding in the target
subject by 4.33% and decreases convergence time by more than 70%. We also
trained individual models for each subject to use as a majority-voting ensemble
classifier. In this scenario, using EA improved the 3-model ensemble accuracy
by 3.7%. However, when compared to the shared model with EA, the ensemble
accuracy was 3.62% lower.
</p>
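<p>For reference, a minimal Euclidean Alignment sketch (the standard technique
evaluated here, not the study's training pipeline): whiten each subject's
trials with the inverse square root of that subject's mean spatial covariance,
so aligned trials have an identity mean covariance across subjects. Trial
shapes are assumptions for the sketch.
</p>
<pre>
import numpy as np
from scipy.linalg import fractional_matrix_power

def euclidean_alignment(trials):
    """Align EEG trials (n_trials, n_channels, n_samples) by whitening with
    the inverse square root of the subject's mean spatial covariance."""
    covs = np.array([t @ t.T / t.shape[-1] for t in trials])
    R = covs.mean(axis=0)
    R_isqrt = np.real(fractional_matrix_power(R, -0.5))
    return np.array([R_isqrt @ t for t in trials])

rng = np.random.default_rng(0)
trials = rng.normal(size=(20, 8, 250))     # 20 trials, 8 channels, 1 s @ 250 Hz
aligned = euclidean_alignment(trials)
mean_cov = np.mean([t @ t.T / 250 for t in aligned], axis=0)
print(np.allclose(mean_cov, np.eye(8), atol=1e-6))  # True
</pre>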
|
|
|
|
<p>Embedding methods transform the knowledge graph into a continuous,
low-dimensional space, facilitating inference and completion tasks. Existing
methods are mainly divided into two types: translational distance models and
semantic matching models. A key challenge in translational distance models is
their inability to effectively differentiate between 'head' and 'tail' entities
in graphs. To address this problem, a novel location-sensitive embedding (LSE)
method has been developed. LSE innovatively modifies the head entity using
relation-specific mappings, conceptualizing relations as linear transformations
rather than mere translations. The theoretical foundations of LSE, including
its representational capabilities and its connections to existing models, have
been thoroughly examined. A more streamlined variant, LSE-d, which employs a
diagonal matrix for transformations to enhance practical efficiency, is also
proposed. Experiments conducted on four large-scale KG datasets for link
prediction show that LSE-d either outperforms or is competitive with
state-of-the-art related works.
</p>
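<p>A sketch of a diagonal location-sensitive scorer in the spirit of LSE-d
(our reading of the description above; the paper's exact parameterization may
differ): the head entity is first mapped by a relation-specific linear
transform, diagonal here for efficiency, and then compared translationally with
the tail.
</p>
<pre>
import torch
import torch.nn as nn

class LSEd(nn.Module):
    """Diagonal location-sensitive embedding sketch: relations act as linear
    transformations on the head entity, not mere translations."""
    def __init__(self, n_ent, n_rel, dim=200):
        super().__init__()
        self.ent = nn.Embedding(n_ent, dim)
        self.rel = nn.Embedding(n_rel, dim)
        self.diag = nn.Embedding(n_rel, dim)  # diagonal of the transform

    def score(self, h, r, t):
        mapped_head = self.diag(r) * self.ent(h)   # relation-specific map
        return -(mapped_head + self.rel(r) - self.ent(t)).norm(dim=-1)

m = LSEd(1000, 50)
print(m.score(torch.tensor([1]), torch.tensor([2]), torch.tensor([3])))
</pre>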
|
|
|
|
<p>In the current landscape of online abuses and harms, effective content
moderation is necessary to cultivate safe and inclusive online spaces. Yet, the
effectiveness of many moderation interventions is still unclear. Here, we
assess the effectiveness of The Great Ban, a massive deplatforming operation
that affected nearly 2,000 communities on Reddit. By analyzing 16M comments
posted by 17K users during 14 months, we provide nuanced results on the
effects, both desired and otherwise, of the ban. Among our main findings is
that 15.6% of the affected users left Reddit and that those who remained
reduced their toxicity by 6.6% on average. The ban also caused 5% of users to
increase their toxicity by more than 70% of their pre-ban level. However, these
resentful users likely had limited impact on Reddit due to low activity and
little support by peers. Overall, our multifaceted results provide new insights
into the efficacy of deplatforming. Our findings can inform the development of
future moderation interventions and the policing of online platforms.
</p>
|
|
|
|
<p>Graph classification is an important learning task for graph-structured data.
Graph neural networks (GNNs) have recently gained growing attention in graph
learning and have shown significant improvements in many important graph
problems. Despite their state-of-the-art performances, existing GNNs only use
local information from a very limited neighborhood around each node, suffering
from loss of multi-modal information and overheads of excessive computation. To
address these issues, we propose a novel Tensor-view Topological Graph Neural
Network (TTG-NN), a class of simple yet effective topological deep learning
models built upon persistent homology, graph convolution, and tensor
operations. This
new method incorporates tensor learning to simultaneously capture Tensor-view
Topological (TT), as well as Tensor-view Graph (TG) structural information on
both local and global levels. Computationally, to fully exploit graph topology
and structure, we propose two flexible TT and TG representation learning
modules that disentangle feature tensor aggregation and transformation and
learn to preserve multi-modal structure with less computation. Theoretically,
we derive high probability bounds on both the out-of-sample and in-sample mean
squared approximation errors for our proposed Tensor Transformation Layer
(TTL). Real data experiments show that the proposed TTG-NN outperforms 20
state-of-the-art methods on various graph benchmarks.
</p>
|
|
|
|
<p>Computational modeling of the melt pool dynamics in laser-based powder bed
fusion metal additive manufacturing (PBF-LB/M) promises to shed light on
fundamental defect generation mechanisms. These processes are typically
accompanied by rapid evaporation so that the evaporation-induced recoil
pressure and cooling arise as major driving forces for fluid dynamics and
temperature evolution. The magnitude of these interface fluxes depends
exponentially on the melt pool surface temperature, which, therefore, has to be
predicted with high accuracy. The present work utilizes a diffuse interface
model based on a continuum surface flux (CSF) description on the interfaces to
study dimensionally reduced thermal two-phase problems representing PBF-LB/M in
a finite element framework. It is demonstrated that the extreme temperature
gradients combined with the high ratios of material properties between metal
and ambient gas lead to significant errors in the interface temperatures and
fluxes when classical CSF approaches, along with typical interface thicknesses
and discretizations, are applied. A novel parameter-scaled CSF approach is
proposed, which is constructed to yield a smoother temperature rate in the
diffuse interface region, significantly increasing the solution accuracy. The
interface thickness required to predict the temperature field with a given
level of accuracy is less restrictive by at least one order of magnitude for
the proposed parameter-scaled CSF approach compared to classical CSF,
drastically reducing computational costs. Finally, we showcase the general
applicability of the parameter-scaled CSF to a three-dimensional simulation of
stationary laser melting of PBF-LB/M considering the fully coupled
thermo-hydrodynamic multi-phase problem, including phase change.
</p>
|
|
|
|
<p>In the open source software (OSS) ecosystem, there exists a complex software
supply chain, where developers upstream and downstream widely borrow and reuse
code. This results in the widespread occurrence of recurring defects, missing
fixes, and propagation issues. These are collectively referred to as cognate
defects, and their scale and threats have not received extensive attention and
systematic research. Software composition analysis and code clone detection
methods are unable to cover the various variant issues in the supply chain
scenario, while code static analysis, or static application security testing
(SAST) techniques struggle to target specific defects. In this paper, we
propose a novel technique for detecting cognate defects in OSS through the
automatic generation of SAST rules. Specifically, it extracts key syntax and
semantic information from pre- and post-patch versions of code through
structural comparison and control flow to data flow analysis, and generates
rules that match these key elements. We have implemented a prototype tool
called Patch2QL and applied it to fundamental OSS in C/C++. In experiments, we
discovered 7 new vulnerabilities with medium to critical severity in the most
popular upstream software, as well as numerous potential security issues. When
analyzing downstream projects in the supply chain, we found a significant
number of representative cognate defects, clarifying the threat posed by this
issue. Additionally, compared to general-purpose SAST and signature-based
mechanisms, the generated rules perform better at discovering all variants of
cognate defects.
</p>
|
|
|
|
<p>The dynamic nature of language, particularly evident in the realm of slang
and memes on the Internet, poses serious challenges to the adaptability of
large language models (LLMs). Traditionally anchored to static datasets, these
models often struggle to keep up with the rapid linguistic evolution
characteristic of online communities. This research addresses the critical need
to bridge this gap, aiming to enhance LLMs' comprehension of the evolving new
concepts on the internet, without the high cost of continual retraining. To
address this issue, we propose a new benchmark, $\textbf{SLANG}$, which
autonomously integrates novel data to keep the dataset up-to-date, to assess
LLMs' capability in comprehending emerging concepts, and an approach,
$\textbf{FOCUS}$, which uses causal inference to help LLMs understand new
phrases and their colloquial context. The benchmark and approach involve
digesting real-world instances of linguistic shifts, which serve as contextual
beacons, to form more
precise and contextually relevant connections between newly emerging
expressions and their intended meanings. The empirical analysis shows that our
causal inference-based approach outperforms the traditional models in terms of
precision and relevance in the interpretation of internet slang and memes.
</p>
|
|
|
|
<p>Multiview clustering (MVC) segregates data samples into meaningful clusters
by synthesizing information across multiple views. Moreover, deep
learning-based methods have demonstrated their strong feature learning
capabilities in MVC scenarios. However, effectively generalizing feature
representations while maintaining consistency is still an intractable problem.
In addition, most existing deep clustering methods based on contrastive
learning overlook the consistency of the clustering representations during the
clustering process. In this paper, we show how the above problems can be
overcome and propose a consistent enhancement-based deep MVC method via
contrastive learning (CCEC). Specifically, semantic connection blocks are
incorporated into a feature representation to preserve the consistent
information among multiple views. Furthermore, the representation process for
clustering is enhanced through spectral clustering, and the consistency across
multiple views is improved. Experiments conducted on five datasets demonstrate
the effectiveness and superiority of our method in comparison with the
state-of-the-art (SOTA) methods. The code for this method can be accessed at
https://anonymous.4open.science/r/CCEC-E84E/.
</p>
|
|
|
|
<p>Gaze interaction presents a promising avenue in Virtual Reality (VR) due to
its intuitive and efficient user experience. Yet, the depth control inherent in
our visual system remains underutilized in current methods. In this study, we
introduce FocusFlow, a hands-free interaction method that capitalizes on human
visual depth perception within the 3D scenes of Virtual Reality. We first
develop a binocular visual depth detection algorithm to understand eye input
characteristics. We then propose a layer-based user interface and introduce the
concept of a 'Virtual Window' that offers intuitive and robust gaze-depth VR
interaction despite the limited accuracy and precision of visual depth
estimation at larger distances. Finally, to help novice users actively
manipulate their visual depth, we propose two learning strategies that use
different visual cues to help users master visual depth control. Our user
studies on 24 participants demonstrate the usability of our proposed virtual
window concept as a gaze-depth interaction method. In addition, our findings
reveal that the user experience can be enhanced through an effective learning
process with adaptive visual cues, helping users to develop muscle memory for
this brand-new input mechanism. We conclude the paper by discussing strategies
to optimize learning and potential research topics of gaze-depth interaction.
</p>
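<p>As a point of reference, the textbook geometry behind binocular gaze-depth detection can be sketched as below. This is a generic vergence-based estimate under the assumption that the eye tracker reports one normalized gaze direction per eye; it is not the authors' exact algorithm.</p>
<pre>
# Minimal vergence-based depth estimate (illustrative assumption).
import numpy as np

def gaze_depth(left_dir: np.ndarray, right_dir: np.ndarray,
               ipd: float = 0.063) -> float:
    """Estimate fixation depth (meters) from the vergence angle between
    the two gaze rays, for inter-pupillary distance `ipd`."""
    l = left_dir / np.linalg.norm(left_dir)
    r = right_dir / np.linalg.norm(right_dir)
    theta = np.arccos(np.clip(np.dot(l, r), -1.0, 1.0))  # vergence angle
    if theta < 1e-6:            # near-parallel rays: depth unresolvable,
        return float("inf")     # which is why accuracy degrades far away
    return (ipd / 2.0) / np.tan(theta / 2.0)
</pre>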
|
|
|
|
<p>Despite much progress, achieving real-time high-fidelity head avatar
animation is still difficult and existing methods have to trade-off between
speed and quality. 3DMM based methods often fail to model non-facial structures
such as eyeglasses and hairstyles, while neural implicit models suffer from
deformation inflexibility and rendering inefficiency. Although 3D Gaussian has
been demonstrated to possess promising capability for geometry representation
and radiance field reconstruction, applying 3D Gaussian in head avatar creation
remains a major challenge since it is difficult for 3D Gaussian to model the
head shape variations caused by changing poses and expressions. In this paper,
we introduce PSAvatar, a novel framework for animatable head avatar creation
that utilizes discrete geometric primitives to create a parametric morphable
shape model and employs 3D Gaussian for fine detail representation and high
fidelity rendering. The parametric morphable shape model is a Point-based
Morphable Shape Model (PMSM) which uses points instead of meshes for 3D
representation to achieve enhanced representation flexibility. The PMSM first
converts the FLAME mesh to points by sampling on the surfaces as well as off
the meshes to enable the reconstruction of not only surface-like structures but
also complex geometries such as eyeglasses and hairstyles. By aligning these
points with the head shape in an analysis-by-synthesis manner, the PMSM makes
it possible to utilize 3D Gaussian for fine detail representation and
appearance modeling, thus enabling the creation of high-fidelity avatars. We
show that PSAvatar can reconstruct high-fidelity head avatars of a variety of
subjects and the avatars can be animated in real-time ($\ge$ 25 fps at a
resolution of 512 $\times$ 512).
</p>
|
|
|
|
<p>We introduce Coverage Axis++, a novel and efficient approach to 3D shape
skeletonization. The current state-of-the-art approaches for this task often
rely on the watertightness of the input or suffer from substantial
computational costs, thereby limiting their practicality. To address this
challenge, Coverage Axis++ proposes a heuristic algorithm to select skeletal
points, offering a high-accuracy approximation of the Medial Axis Transform
(MAT) while significantly mitigating computational intensity for various shape
representations. We introduce a simple yet effective strategy that considers
both shape coverage and uniformity to derive skeletal points. The selection
procedure enforces consistency with the shape structure while favoring the
dominant medial balls, which thus introduces a compact underlying shape
representation in terms of MAT. As a result, Coverage Axis++ allows for the
skeletonization of various shape representations (e.g., watertight meshes,
triangle soups, point clouds), specification of the number of skeletal points,
few hyperparameters, and highly efficient computation with improved
reconstruction accuracy. Extensive experiments across a wide range of 3D shapes
validate the efficiency and effectiveness of Coverage Axis++. The code will be
publicly available once the paper is published.
</p>
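<p>The flavor of such a coverage-driven selection can be illustrated with a greedy sketch: candidate inner balls (center, radius) are repeatedly chosen by how many still-uncovered surface samples they cover. This coverage-only toy is an assumption for illustration; the actual method also scores uniformity and is considerably more refined.</p>
<pre>
# Greedy coverage-based skeletal-point selection (illustrative sketch).
import numpy as np

def select_skeletal_points(surface_pts, centers, radii,
                           n_select=64, dilation=1.1):
    covered = np.zeros(len(surface_pts), dtype=bool)
    # coverage[i, j] = candidate ball i covers surface sample j
    d = np.linalg.norm(surface_pts[None, :, :] - centers[:, None, :], axis=-1)
    coverage = d <= dilation * radii[:, None]
    chosen = []
    for _ in range(n_select):
        gains = (coverage & ~covered[None, :]).sum(axis=1)
        best = int(np.argmax(gains))
        if gains[best] == 0:       # every sample is already covered
            break
        chosen.append(best)
        covered |= coverage[best]
    return chosen
</pre>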
|
|
|
|
<p>Large-scale neural language models exhibit a remarkable capacity for
in-context learning (ICL): they can infer novel functions from datasets
provided as input. Most of our current understanding of when and how ICL arises
comes from LMs trained on extremely simple learning problems like linear
regression and associative recall. There remains a significant gap between
these model problems and the "real" ICL exhibited by LMs trained on large text
corpora, which involves not just retrieval and function approximation but
free-form generation of language and other structured outputs. In this paper,
we study ICL through the lens of a new family of model problems we term
in-context language learning (ICLL). In ICLL, LMs are presented with a set of
strings from a formal language, and must generate additional strings from the
same language. We focus on in-context learning of regular languages generated
by random finite automata. We evaluate a diverse set of neural sequence models
(including several RNNs, Transformers, and state-space model variants) on
regular ICLL tasks, aiming to answer three questions: (1) Which model classes
are empirically capable of ICLL? (2) What algorithmic solutions do successful
models implement to perform ICLL? (3) What architectural changes can improve
ICLL in less performant models? We first show that Transformers significantly
outperform neural sequence models with recurrent or convolutional
representations on ICLL tasks. Next, we provide evidence that their ability to
do so relies on specialized "n-gram heads" (higher-order variants of induction
heads) that compute input-conditional next-token distributions. Finally, we
show that hard-wiring these heads into neural models improves performance not
just on ICLL but also on natural language modeling, improving the perplexity of
340M-parameter models by up to 1.14 points (6.7%) on the SlimPajama dataset.
</p>
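<p>The behavior attributed to "n-gram heads", computing next-token distributions conditioned on the recent context, can be mimicked by a simple count-based baseline over the in-context string, sketched below. This mirrors the input-conditional distribution idea, not the learned attention mechanism itself.</p>
<pre>
# A count-based "n-gram head" baseline for in-context prediction.
from collections import Counter, defaultdict

def ngram_head(context: list[str], n: int = 2) -> dict[str, float]:
    """Next-token distribution conditioned on the last n-1 context tokens."""
    counts: dict[tuple, Counter] = defaultdict(Counter)
    for i in range(len(context) - n + 1):
        prefix, nxt = tuple(context[i:i + n - 1]), context[i + n - 1]
        counts[prefix][nxt] += 1
    prefix = tuple(context[len(context) - n + 1:])
    dist = counts.get(prefix, Counter())
    total = sum(dist.values())
    return {tok: c / total for tok, c in dist.items()} if total else {}

# e.g. ngram_head(list("abab")) -> {'a': 1.0}: after prefix ('b',),
# only 'a' was observed earlier in the context.
</pre>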
|
|
|
|
<p>Large Language Models (LLMs) have demonstrated remarkable success in various
natural language processing and software engineering tasks, such as code
generation. The LLMs are mainly utilized in the prompt-based zero/few-shot
paradigm to guide the model in accomplishing the task. GPT-based models are one
of the popular ones studied for tasks such as code comment generation or test
generation. These tasks are `generative' tasks. However, there is limited
research on the usage of LLMs for `non-generative' tasks such as classification
using the prompt-based paradigm. In this preliminary exploratory study, we
investigated the applicability of LLMs for Code Clone Detection (CCD), a
non-generative task. By building a mono-lingual and cross-lingual CCD dataset
derived from CodeNet, we first investigated two different prompts using ChatGPT
to detect Type-4 code clones in Java-Java and Java-Ruby pairs in a zero-shot
setting. We then conducted an analysis to understand the strengths and
weaknesses of ChatGPT in CCD. ChatGPT surpasses the baselines in cross-language
CCD, attaining an F1-score of 0.877, and achieves comparable performance to
fully fine-tuned models for mono-lingual CCD, with an F1-score of 0.878. Also,
the prompt and the difficulty level of the problems have an impact on the
performance of ChatGPT. Finally, we provide insights and future directions
based on our initial analysis.
</p>
|
|
|
|
<p>Social media feeds are deeply personal spaces that reflect individual values
and preferences. However, top-down, platform-wide content algorithms can reduce
users' sense of agency and fail to account for nuanced experiences and values.
Drawing on the paradigm of interactive machine teaching (IMT), an interaction
framework for non-expert algorithmic adaptation, we map out a design space for
teachable social media feed experiences to empower agential, personalized feed
curation. To do so, we conducted a think-aloud study (N=24) featuring four
social media platforms -- Instagram, Mastodon, TikTok, and Twitter -- to
understand key signals users leveraged to determine the value of a post in
their feed. We synthesized users' signals into taxonomies that, when combined
with user interviews, inform five design principles that extend IMT into the
social media setting. We finally embodied our principles into three feed
designs that we present as sensitizing concepts for teachable feed experiences
moving forward.
</p>
|
|
|
|
<p>Large-scale text-to-image generative models have made impressive strides,
showcasing their ability to synthesize a vast array of high-quality images.
However, adapting these models for artistic image editing presents two
significant challenges. Firstly, users struggle to craft textual prompts that
meticulously detail visual elements of the input image. Secondly, prevalent
models, when effecting modifications in specific zones, frequently disrupt the
overall artistic style, complicating the attainment of cohesive and
aesthetically unified artworks. To surmount these obstacles, we build the
innovative unified framework CreativeSynth, which is based on a diffusion model
with the ability to coordinate multimodal inputs and multitask in the field of
artistic image generation. By integrating multimodal features with customized
attention mechanisms, CreativeSynth facilitates the importation of real-world
semantic content into the domain of art through inversion and real-time style
transfer. This allows for the precise manipulation of image style and content
while maintaining the integrity of the original model parameters. Rigorous
qualitative and quantitative evaluations underscore that CreativeSynth excels
in enhancing the fidelity of artistic images while preserving their innate
aesthetic essence. By bridging the gap between generative models and artistic finesse,
CreativeSynth becomes a custom digital palette.
</p>
|
|
|
|
<p>Indexed Linear Logic, introduced by Ehrhard and Bucciarelli, can be seen as
a logical presentation of non-idempotent intersection types, extended through
the relational semantics to full linear logic. We introduce an idempotent
variant of Indexed Linear Logic. We give a fine-grained
reformulation of the syntax by exposing implicit parameters and by unifying
several operations on formulae via the notion of base change. Idempotency is
achieved by means of an appropriate subtyping relation. We carry out an in-depth
study of indLL as a logic, showing how it determines a refinement of classical
linear logic and establishing a terminating cut-elimination procedure.
Cut-elimination is proved to be confluent up to an appropriate congruence
induced by the subtyping relation.
</p>
|
|
|
|
<p>This work considers an asynchronous $\textsf{K}_\text{a}$-active-user
unsourced multiple access channel (AUMAC) with the worst-case asynchronicity.
The transmitted messages must be decoded within $n$ channel uses, while some
codewords are not completely received due to asynchronicities. We consider a
constraint of the largest allowed delay of the transmission. The AUMAC lacks
the permutation-invariant property of the synchronous UMAC since different
permutations of the same codewords with a fixed asynchronicity are
distinguishable. Hence, the analyses require calculating all
$2^{\textsf{K}_\text{a}}-1$ combinations of erroneously decoded messages.
Moreover, transmitters cannot adapt the corresponding codebooks according to
asynchronicity due to a lack of information on asynchronicities. To overcome
this challenge, a uniform bound of the per-user probability of error (PUPE) is
derived by investigating the worst case of the asynchronous patterns with the
delay constraint. Numerical results show the trade-off between the
energy-per-bit and the number of active users for different delay constraints.
In addition, although the asynchronous transmission reduces interference, the
required energy-per-bit increases as the receiver decodes with incompletely
received codewords, compared to the synchronous case.
</p>
|
|
|
|
<p>We study the (quantum) security of pseudorandom generators (PRGs) constructed
from random oracles. We prove a "lifting theorem" showing, roughly, that if
such a PRG is unconditionally secure against classical adversaries making
polynomially many queries to the random oracle, then it is also
(unconditionally) secure against quantum adversaries in the same sense. As a
result of independent interest, we also show that any pseudo-deterministic
quantum-oracle algorithm (i.e., a quantum algorithm that with high probability
returns the same value on repeated executions) can be simulated by a
computationally unbounded but query-bounded classical-oracle algorithm with
only a polynomial blowup in the number of queries. This implies as a corollary
that our lifting theorem holds even for PRGs that themselves make quantum
queries to the random oracle.
</p>
|
|
|
|
<p>Recent TTS models with decoder-only Transformer architecture, such as
SPEAR-TTS and VALL-E, achieve impressive naturalness and demonstrate the
ability for zero-shot adaptation given a speech prompt. However, such
decoder-only TTS models lack monotonic alignment constraints, sometimes leading
to hallucination issues such as mispronunciation, word skipping and repeating.
To address this limitation, we propose VALL-T, a generative Transducer model
that introduces shifting relative position embeddings for the input phoneme
sequence, explicitly indicating the monotonic generation process while
maintaining the architecture of decoder-only Transformer. Consequently, VALL-T
retains the capability of prompt-based zero-shot adaptation and demonstrates
better robustness against hallucinations with a relative reduction of 28.3% in
the word error rate. Furthermore, the controllability of alignment in VALL-T
during decoding facilitates the use of untranscribed speech prompts, even in
unknown languages. It also enables the synthesis of lengthy speech by utilizing
an aligned context window.
</p>
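<p>The shifting-relative-position idea can be sketched as follows: phoneme positions are re-indexed relative to the current alignment position, so index 0 always marks the phoneme being generated. The clipping range and exact usage below are illustrative assumptions, not the paper's parametrization.</p>
<pre>
# Sketch of shifting relative position indices (illustrative only).
import numpy as np

def relative_phoneme_positions(num_phonemes: int, a_t: int,
                               max_rel: int = 16) -> np.ndarray:
    """Index of every phoneme relative to the current alignment a_t;
    0 marks the phoneme being generated now."""
    rel = np.arange(num_phonemes) - a_t
    return np.clip(rel, -max_rel, max_rel)  # clip to the embedding table

# As decoding proceeds, a_t advances monotonically (0, 1, 2, ...), so the
# same relative-position embeddings slide along the phoneme sequence,
# making the monotonic alignment explicit to the decoder.
</pre>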
|
|
|
|
<p>While Bangla is considered a language with limited resources, sentiment
analysis has been a subject of extensive research in the literature.
Nevertheless, there is a scarcity of exploration into sentiment analysis
specifically in the realm of noisy Bangla texts. In this paper, we introduce a
dataset (NC-SentNoB) that we annotated manually to identify ten different types
of noise found in a pre-existing sentiment analysis dataset comprising around
15K noisy Bangla texts. First, given an input noisy text, we identify
the noise type, addressing this as a multi-label classification task. Then, we
introduce baseline noise reduction methods to alleviate noise prior to
conducting sentiment analysis. Finally, we assess the performance of fine-tuned
sentiment analysis models with both noisy and noise-reduced texts to make
comparisons. The experimental findings indicate that the noise reduction
methods utilized are not satisfactory, highlighting the need for more suitable
noise reduction methods in future research endeavors. We have made the
implementation and dataset presented in this paper publicly available at
https://github.com/ktoufiquee/A-Comparative-Analysis-of-Noise-Reduction-Methods-in-Sentiment-Analysis-on-Noisy-Bangla-Texts
</p>
|
|
|
|
<p>Prompt design and engineering has become an important discipline in just the
past few months. In this paper, we provide an introduction to the main concepts
and design approaches. We also cover more advanced techniques, up to those
needed to design LLM-based agents. We finish by providing a list of
existing tools for prompt engineering.
</p>
|
|
|
|
<p>Finding a concise and interpretable mathematical formula that accurately
describes the relationship between each variable and the predicted value in the
data is a crucial task in scientific research, as well as a significant
challenge in artificial intelligence. This problem is referred to as symbolic
regression, which is an NP-hard problem. In the past year, a novel symbolic
regression methodology utilizing Monte Carlo Tree Search (MCTS) was proposed,
achieving state-of-the-art results on a diverse range of datasets. Although
this algorithm has shown considerable improvement in recovering target
expressions compared to previous methods, the lack of guidance during the MCTS
process severely hampers its search efficiency. Recently, some algorithms have
added a pre-trained policy network to guide the search of MCTS, but the
pre-trained policy network generalizes poorly. To optimize the trade-off
between efficiency and versatility, we introduce SR-GPT, a novel algorithm for
symbolic regression that integrates Monte Carlo Tree Search (MCTS) with a
Generative Pre-Trained Transformer (GPT). By using GPT to guide the MCTS, the
search efficiency of MCTS is significantly improved. Next, we utilize the MCTS
results to further refine the GPT, enhancing its capabilities and providing
more accurate guidance for the MCTS. MCTS and GPT are coupled together and
optimize each other until the target expression is successfully determined. We
conducted extensive evaluations of SR-GPT using 222 expressions sourced from
over 10 different symbolic regression datasets. The experimental results
demonstrate that SR-GPT outperforms existing state-of-the-art algorithms in
accurately recovering symbolic expressions both with and without added noise.
</p>
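<p>The coupling between a generative prior and tree search is typically realized through a PUCT-style selection rule, sketched below: the model's prior probability for each candidate symbol biases exploration. The constants and the exact interface to the GPT are illustrative assumptions.</p>
<pre>
# PUCT-style node selection with a policy prior (illustrative sketch).
import math

def puct_select(children, c_puct: float = 1.4):
    """children: list of dicts with keys q (mean value), n (visit count),
    p (prior from the generative model). Returns the best child."""
    total_n = sum(ch["n"] for ch in children)
    def score(ch):
        return ch["q"] + c_puct * ch["p"] * math.sqrt(total_n + 1) / (1 + ch["n"])
    return max(children, key=score)
</pre>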
|
|
|
|
<p>People employ expressive behaviors to effectively communicate and coordinate
their actions with others, such as nodding to acknowledge a person glancing at
them or saying "excuse me" to pass people in a busy corridor. We would like
robots to also demonstrate expressive behaviors in human-robot interaction.
Prior work proposes rule-based methods that struggle to scale to new
communication modalities or social situations, while data-driven methods
require specialized datasets for each social situation the robot is used in. We
propose to leverage the rich social context available from large language
models (LLMs) and their ability to generate motion based on instructions or
user preferences, to produce expressive robot behaviors that are adaptable and
composable, building upon one another. Our approach utilizes few-shot
chain-of-thought prompting to translate human language instructions into
parametrized control code using the robot's available and learned skills.
Through user studies and simulation experiments, we demonstrate that our
approach produces behaviors that users found to be competent and easy to
understand. Supplementary material can be found at
https://generative-expressive-motion.github.io/.
</p>
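<p>A hedged sketch of this kind of few-shot chain-of-thought prompt is shown below. The skill names and the worked example are hypothetical placeholders, not the authors' actual prompt or robot API.</p>
<pre>
# Hypothetical few-shot prompt: instruction -> reasoning -> control code.
FEW_SHOT_PROMPT = """You control a mobile robot with skills:
  nod(times), gaze_at(target), say(text), move(direction, meters).

Instruction: Acknowledge the person who waved at you.
Reasoning: A wave is a greeting; a nod plus brief eye contact is a
competent, easy-to-read acknowledgment.
Code: gaze_at("person"); nod(times=1)

Instruction: {instruction}
Reasoning:"""

def build_prompt(instruction: str) -> str:
    return FEW_SHOT_PROMPT.format(instruction=instruction)
</pre>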
|
|
|
|
<p>This work delves into the expanding role of large language models (LLMs) in
generating artificial data. LLMs are increasingly employed to create a variety
of outputs, including annotations, preferences, instruction prompts, simulated
dialogues, and free text. As these forms of LLM-generated data often intersect
in their application, they exert mutual influence on each other and raise
significant concerns about the quality and diversity of the artificial data
incorporated into training cycles, leading to an artificial data ecosystem. To
the best of our knowledge, this is the first study to aggregate various types
of LLM-generated text data, from more tightly constrained data like "task
labels" to more lightly constrained "free-form text". We then stress test the
quality and implications of LLM-generated artificial data, comparing it with
human data across various existing benchmarks. Despite artificial data's
capability to match human performance, this paper reveals significant hidden
disparities, especially in complex tasks where LLMs often miss the nuanced
understanding of intrinsic human-generated content. This study critically
examines diverse LLM-generated data and emphasizes the need for ethical
practices when creating and using LLM-generated data. It highlights the LLMs'
shortcomings in replicating human traits and behaviors, underscoring the
importance of addressing biases and artifacts produced in LLM-generated content
for future research and development. All data and code are available on our
project page.
</p>
|
|
|
|
<p>In high-dimensional data analysis, such as financial index tracking or
biomedical applications, it is crucial to select the few relevant variables
while maintaining control over the false discovery rate (FDR). In these
applications, strong dependencies often exist among the variables (e.g., stock
returns), which can undermine the FDR control property of existing methods like
the model-X knockoff method or the T-Rex selector. To address this issue, we
have expanded the T-Rex framework to accommodate overlapping groups of highly
correlated variables. This is achieved by integrating a nearest neighbors
penalization mechanism into the framework, which provably controls the FDR at
the user-defined target level. A real-world example of sparse index tracking
demonstrates the proposed method's ability to accurately track the S&P 500
index over the past 20 years based on a small number of stocks. An open-source
implementation is provided within the R package TRexSelector on CRAN.
</p>
|
|
|
|
<p>In recent years, deep learning-based solutions have proven successful in the
domain of image enhancement. This paper introduces LYT-Net, or Lightweight YUV
Transformer-based Network, as a novel approach for low-light image enhancement.
The proposed architecture, distinct from conventional Retinex-based models,
leverages the YUV color space's natural separation of luminance (Y) and
chrominance (U and V) to simplify the intricate task of disentangling light and
color information in images. By utilizing the strengths of transformers, known
for their capability to capture long-range dependencies, LYT-Net ensures a
comprehensive contextual understanding of the image while maintaining reduced
model complexity. By employing a novel hybrid loss function, our proposed
method achieves state-of-the-art results on low-light image enhancement
datasets, all while being considerably more compact than its counterparts. The
source code and pre-trained models are available at
https://github.com/albrateanu/LYT-Net
</p>
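<p>The YUV decomposition motivating the design can be sketched with the standard BT.601 conversion: luminance and chrominance are split and can then be processed by separate branches. The per-branch networks are placeholders, not the paper's architecture.</p>
<pre>
# RGB -> YUV (BT.601) split into luminance/chrominance streams.
import torch

def rgb_to_yuv(rgb: torch.Tensor) -> torch.Tensor:
    """rgb: (B, 3, H, W) in [0, 1] -> yuv: (B, 3, H, W)."""
    m = torch.tensor([[ 0.299,   0.587,   0.114 ],
                      [-0.14713, -0.28886, 0.436 ],
                      [ 0.615,  -0.51499, -0.10001]], device=rgb.device)
    return torch.einsum("ij,bjhw->bihw", m, rgb)

x = torch.rand(1, 3, 256, 256)
yuv = rgb_to_yuv(x)
y, uv = yuv[:, :1], yuv[:, 1:]   # enhance Y and UV with separate branches
</pre>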
|
|
|
|
<p>Automating the visual inspection of civil structures for surface defects is
crucial, as manual inspection is labour-intensive and time-consuming. An
important aspect of automated inspection is image acquisition, which
is rapid and cost-effective considering the pervasive developments in both
software and hardware computing in recent years. Previous studies largely
focused on concrete and asphalt, with less attention to masonry cracks. The
latter also lacks publicly available datasets. In this paper, we first present
a corresponding data set for instance segmentation with 1,300 annotated images
(640 pixels x 640 pixels), named MCrack1300, covering bricks, broken bricks,
and cracks. We then test several leading algorithms for benchmarking, including
the latest large-scale model, the prompt-based Segment Anything Model (SAM). We
fine-tune the encoder using Low-Rank Adaptation (LoRA) and propose two novel
methods for automating SAM execution. The first method involves abandoning
the prompt encoder and connecting the SAM encoder to other decoders, while the
second method introduces a learnable self-generating prompter. In order to
ensure the seamless integration of the two proposed methods with the SAM
encoder, we redesign the feature extractor. Both proposed methods exceed
state-of-the-art performance, surpassing the best benchmark by approximately 3%
for all classes and around 6% for cracks specifically. Based on successful
detection, we propose a method based on a monocular camera and the Hough Line
Transform to automatically transform images into orthographic projection maps.
By incorporating known real sizes of brick units, we accurately estimate crack
dimensions, with the results differing by less than 10% from those obtained by
laser scanning. Overall, we address important research gaps in automated
masonry crack detection and size estimation.
</p>
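<p>The rectification step rests on standard line detection, sketched below with OpenCV's probabilistic Hough transform. The file name is a hypothetical placeholder, and this simplified pipeline illustrates the idea rather than reproducing the paper's full method.</p>
<pre>
# Detect dominant joint lines as a first step toward rectification.
import cv2
import numpy as np

img = cv2.imread("masonry.jpg", cv2.IMREAD_GRAYSCALE)  # hypothetical input
edges = cv2.Canny(img, 50, 150)
lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180,
                        threshold=80, minLineLength=60, maxLineGap=10)
# Intersections of the dominant horizontal/vertical mortar lines give four
# correspondences; cv2.getPerspectiveTransform(src, dst) then maps the
# facade to an orthographic projection, and known brick dimensions convert
# pixel measurements to physical crack sizes.
</pre>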
|
|
|
|
<p>In this paper, we consider multi-robot localization problems with a focus on
cooperative localization and observability analysis of relative pose
estimation. For cooperative localization, there is extra information available
to each robot via communication network and message passing. If odometry data
of a target robot can be transmitted to the ego-robot then the observability of
their relative pose estimation can be achieved by range-only or bearing-only
measurements provided both of their linear velocities are non-zero. If odometry
data of a target robot is not directly transmitted but estimated by the
ego-robot then there must be both range and bearing measurements to guarantee
the observability of relative pose estimation. For ROS/Gazebo simulations, we
consider four different sensing and communication structures in which extended
Kalman filtering (EKF) and pose graph optimization (PGO) estimation with
different robust loss functions (filtering and smoothing with different batch
sizes of sliding window) are compared in terms of estimation accuracy. For
hardware experiments, two TurtleBot3 robots equipped with UWB modules are used for
real-world inter-robot relative pose estimation, in which both EKF and PGO are
applied and compared.
</p>
|
|
|
|
<p>Large language models (LLMs) have exhibited an array of reasoning
capabilities but face challenges like error propagation and hallucination,
particularly in specialised areas like finance, where data is heterogeneous,
and precision is paramount. We explore the potential of language model
augmentation with external tools to mitigate these limitations and offload
certain reasoning steps to external tools that are more suited for the task,
instead of solely depending on the LLM's inherent abilities. More concretely,
using financial domain question-answering datasets, we apply supervised
fine-tuning on a LLaMA-2 13B Chat model to act both as a 'task router' and
'task solver'. The 'task router' dynamically directs a question to either be
answered internally by the LLM or externally via the right tool from the tool
set. Our tool-equipped SFT model, Raven, demonstrates an improvement of 35.2%
and 5.06% over the base model and SFT-only baselines, respectively, and is
highly competitive with strong GPT-3.5 results. To the best of our knowledge,
our work is the first that investigates tool augmentation of language models
for the finance domain.
</p>
|
|
|
|
<p>Learning about religions is challenging due to the complexity and depth of
religious doctrines and teachings. Chatbots, as question-answering systems, can
help address these challenges. LLM chatbots use NLP techniques to establish
connections between topics and accurately respond to complex questions. These
capabilities make them well suited to serve as question-answering chatbots for
religious enlightenment. However,
LLMs also have a tendency to generate false information, known as
hallucination. The responses of the chatbots can include content that insults
personal religious beliefs, interfaith conflicts, and controversial or
sensitive topics. It needs to avoid such cases without promoting hate speech or
offending certain groups of people or their beliefs. This study uses a vector
database-based Retrieval Augmented Generation (RAG) approach to enhance the
accuracy and transparency of LLMs. Our question-answering system is called
"MufassirQAS". We created a vector database from several open-access books with
Turkish content, namely Turkish translations of, and interpretations on, Islam.
We crafted the system prompts with care, ensuring they provide instructions
that prevent harmful, offensive, or disrespectful responses. We also tested
MufassirQAS and ChatGPT with sensitive questions, and our system achieved
better performance. The study and its enhancements are still in progress;
results and future work are presented.
</p>
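<p>A minimal sketch of the vector-database RAG loop described above follows. The toy embedding function and the prompt wording are illustrative stand-ins, not the authors' setup; a real system would use a proper sentence encoder.</p>
<pre>
# Minimal retrieval-augmented generation loop (illustrative sketch).
import numpy as np

def embed(texts: list[str]) -> np.ndarray:
    """Toy hashed bag-of-words embedding; swap in a real sentence encoder."""
    vecs = np.zeros((len(texts), 512))
    for i, text in enumerate(texts):
        for word in text.lower().split():
            vecs[i, hash(word) % 512] += 1.0
    return vecs

def retrieve(question: str, chunks: list[str], chunk_vecs: np.ndarray,
             k: int = 3) -> list[str]:
    q = embed([question])[0]
    sims = chunk_vecs @ q / (np.linalg.norm(chunk_vecs, axis=1)
                             * (np.linalg.norm(q) + 1e-9) + 1e-9)
    return [chunks[i] for i in np.argsort(-sims)[:k]]

def build_prompt(question: str, passages: list[str]) -> str:
    context = "\n---\n".join(passages)
    return ("Answer only from the passages below, cite which passage you "
            "used, remain respectful of all beliefs, and say so when the "
            "passages do not contain the answer.\n\n"
            f"Passages:\n{context}\n\nQuestion: {question}\nAnswer:")
</pre>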
|
|
|
|
<p>Diffusion planning has been recognized as an effective decision-making
paradigm in various domains. The high-quality conditional generation capability
of long-horizon trajectories makes it a promising research direction. However,
existing diffusion planning methods suffer from low decision-making frequencies
because of the expensive iterative sampling cost. To address this issue, we
introduce DiffuserLite, a fast and lightweight diffusion planning framework.
DiffuserLite employs a planning refinement process (PRP) to generate
coarse-to-fine-grained trajectories, significantly reducing the modeling of
redundant information and leading to notable increases in decision-making
frequency. Our experimental results demonstrate that DiffuserLite needs only
$0.88\%$ of the runtime cost compared to previous frameworks, achieves an
average decision-making frequency of $122$Hz, and reaches state-of-the-art
performance on D4RL benchmarks. In addition, our clean DiffuserLite framework
can serve as a flexible plugin to enhance decision frequency in other diffusion
planning algorithms, providing a structural design reference for future works.
More details and visualizations are available at [project
website](https://diffuserlite.github.io/).
</p>
|
|
|
|
<p>The synthesis of 3D facial animations from speech has garnered considerable
attention. Due to the scarcity of high-quality 4D facial data and
well-annotated abundant multi-modality labels, previous methods often suffer
from limited realism and a lack of flexible conditioning. We address this
challenge through a trilogy. We first introduce Generalized Neural Parametric
Facial Asset (GNPFA), an efficient variational auto-encoder mapping facial
geometry and images to a highly generalized expression latent space, decoupling
expressions and identities. Then, we utilize GNPFA to extract high-quality
expressions and accurate head poses from a large array of videos. This presents
the M2F-D dataset, a large, diverse, and scan-level co-speech 3D facial
animation dataset with well-annotated emotional and style labels. Finally, we
propose Media2Face, a diffusion model in GNPFA latent space for co-speech
facial animation generation, accepting rich multi-modality guidances from
audio, text, and image. Extensive experiments demonstrate that our model not
only achieves high fidelity in facial animation synthesis but also broadens the
scope of expressiveness and style adaptability in 3D facial animation.
</p>
|
|
|
|
<p>Despite significant advancements in text-to-image models for generating
high-quality images, these methods still struggle to ensure the controllability
of text prompts over images in the context of complex text prompts, especially
when it comes to retaining object attributes and relationships. In this paper,
we propose CompAgent, a training-free approach for compositional text-to-image
generation, with a large language model (LLM) agent as its core. The
fundamental idea underlying CompAgent is premised on a divide-and-conquer
methodology. Given a complex text prompt containing multiple concepts including
objects, attributes, and relationships, the LLM agent initially decomposes it,
which entails the extraction of individual objects, their associated
attributes, and the prediction of a coherent scene layout. These individual
objects can then be independently conquered. Subsequently, the agent performs
reasoning by analyzing the text, then plans and employs the tools to compose
these isolated objects. A verification and human feedback mechanism is finally
incorporated into our agent to further correct the potential attribute errors
and refine the generated images. Guided by the LLM agent, we propose a
tuning-free multi-concept customization model and a layout-to-image generation
model as the tools for concept composition, and a local image editing method as
the tool to interact with the agent for verification. The scene layout controls
the image generation process among these tools to prevent confusion among
multiple objects. Extensive experiments demonstrate the superiority of our
approach for compositional text-to-image generation: CompAgent achieves more
than 10\% improvement on T2I-CompBench, a comprehensive benchmark for
open-world compositional T2I generation. The extension to various related tasks
also illustrates the flexibility of our CompAgent for potential applications.
</p>
|
|
|
|
<p>Solving feedback Stackelberg games with nonlinear dynamics and coupled
constraints, a common scenario in practice, presents significant challenges.
This work introduces an efficient method for computing local feedback
Stackelberg policies in multi-player general-sum dynamic games, with continuous
state and action spaces. Different from existing (approximate) dynamic
programming solutions that are primarily designed for unconstrained problems,
our approach involves reformulating a feedback Stackelberg dynamic game into a
sequence of nested optimization problems, enabling the derivation of
Karush-Kuhn-Tucker (KKT) conditions and the establishment of a second-order
sufficient condition for local feedback Stackelberg policies. We propose a
Newton-style primal-dual interior point method for solving constrained linear
quadratic (LQ) feedback Stackelberg games, offering provable convergence
guarantees. Our method is further extended to compute local feedback
Stackelberg policies for more general nonlinear games by iteratively
approximating them using LQ games, ensuring that their KKT conditions are
locally aligned with those of the original nonlinear games. We prove the
exponential convergence of our algorithm in constrained nonlinear games. In a
feedback Stackelberg game with nonlinear dynamics and (nonconvex) coupled costs
and constraints, our experimental results reveal the algorithm's ability to
handle infeasible initial conditions and achieve exponential convergence
towards an approximate local feedback Stackelberg equilibrium.
</p>
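<p>For reference, the Karush-Kuhn-Tucker conditions the reformulation targets take the standard textbook form (stated generically here; the paper derives the analogous conditions for the nested Stackelberg problems): for $\min_x f(x)$ subject to $g(x) \le 0$ and $h(x) = 0$, with Lagrangian $\mathcal{L}(x,\lambda,\nu) = f(x) + \lambda^\top g(x) + \nu^\top h(x)$, a local solution satisfies $$\nabla_x \mathcal{L}(x,\lambda,\nu) = 0,\quad g(x) \le 0,\quad h(x) = 0,\quad \lambda \ge 0,\quad \lambda_i\, g_i(x) = 0 \;\; \forall i.$$</p>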
|
|
|
|
<p>Training machine learning and statistical models often involves optimizing a
data-driven risk criterion. The risk is usually computed with respect to the
empirical data distribution, but this may result in poor and unstable
out-of-sample performance due to distributional uncertainty. In the spirit of
distributionally robust optimization, we propose a novel robust criterion by
combining insights from Bayesian nonparametric (i.e., Dirichlet Process) theory
and recent decision-theoretic models of smooth ambiguity-averse preferences.
First, we highlight novel connections with standard regularized empirical risk
minimization techniques, including Ridge and LASSO regression. Then, we
theoretically demonstrate the existence of favorable finite-sample and
asymptotic statistical guarantees on the performance of the robust optimization
procedure. For practical implementation, we propose and study tractable
approximations of the criterion based on well-known Dirichlet Process
representations. We also show that the smoothness of the criterion naturally
leads to standard gradient-based numerical optimization. Finally, we provide
insights into the workings of our method by applying it to high-dimensional
sparse linear regression and robust location parameter estimation tasks.
</p>
|
|
|
|
<p>Conducting real road testing for autonomous driving algorithms can be
expensive and sometimes impractical, particularly for small startups and
research institutes. Thus, simulation becomes an important method for
evaluating these algorithms. However, the availability of free and open-source
simulators is limited, and the installation and configuration process can be
daunting for beginners and interdisciplinary researchers. We introduce an
autonomous driving simulator with photorealistic scenes that nevertheless keeps
a user-friendly workflow. The simulator is able to communicate with external
algorithms through ROS2 or Socket.IO, making it compatible with existing
software stacks. Furthermore, we implement a highly accurate vehicle dynamics
model within the simulator to enhance the realism of the vehicle's physical
effects. The simulator is able to serve various functions, including generating
synthetic data and driving with machine learning-based algorithms. Moreover, we
prioritize simplicity in the deployment process, ensuring that beginners find
it approachable and user-friendly.
</p>
|
|
|
|
<p>PageRank is a widely used centrality measure that assesses the significance
of vertices in a graph by considering their connections and the importance of
those connections. Efficiently updating PageRank on dynamic graphs is essential
for various applications due to the increasing scale of datasets. This
technical report introduces our improved Dynamic Frontier (DF) and Dynamic
Frontier with Pruning (DF-P) approaches. Given a batch update comprising edge
insertions and deletions, these approaches iteratively identify vertices likely
to change their ranks with minimal overhead. On a server featuring a 64-core
AMD EPYC-7742 processor, our approaches outperform Static and Dynamic Traversal
PageRank by 5.2x/15.2x and 1.3x/3.5x, respectively, on real-world dynamic
graphs, and by 7.2x/9.6x and 4.0x/5.6x on large static graphs with random batch
updates. Furthermore, our approaches improve performance at a rate of 1.8x/1.7x
for every doubling of threads.
</p>
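<p>The dynamic-frontier idea can be illustrated with a simplified sequential sketch: endpoints of changed edges seed the frontier, and a vertex's out-neighbors join it whenever its rank moves by more than a tolerance. This is an illustration of the concept, not the optimized multicore implementation evaluated in the report.</p>
<pre>
# Simplified sequential Dynamic Frontier PageRank (illustrative sketch).
def dynamic_frontier_pagerank(graph, ranks, changed, alpha=0.85, tol=1e-10):
    """graph: {u: [out-neighbors]}; ranks: dict from the previous snapshot;
    changed: vertices touched by the batch of insertions/deletions."""
    n = len(graph)
    indeg = {v: [] for v in graph}
    for u, nbrs in graph.items():
        for v in nbrs:
            indeg[v].append(u)
    frontier = set(changed)
    while frontier:
        v = frontier.pop()
        new_rank = (1 - alpha) / n + alpha * sum(
            ranks[u] / len(graph[u]) for u in indeg[v] if graph[u])
        if abs(new_rank - ranks[v]) > tol:
            ranks[v] = new_rank
            frontier.update(graph[v])   # affected out-neighbors re-enter
    return ranks
</pre>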
|
|
|
|
<p>In the field of causal modeling, potential outcomes (PO) and structural
causal models (SCMs) stand as the predominant frameworks. However, these
frameworks face notable challenges in practically modeling counterfactuals,
formalized as parameters of the joint distribution of potential outcomes.
Counterfactual reasoning holds paramount importance in contemporary
decision-making processes, especially in scenarios that demand personalized
incentives based on the joint values of $(Y(0), Y(1))$. This paper begins with
an investigation of the PO and SCM frameworks for modeling counterfactuals.
Through the analysis, we identify an inherent model capacity limitation, termed
as the ``degenerative counterfactual problem'', emerging from the consistency
rule that is the cornerstone of both frameworks. To address this limitation, we
introduce a novel \textit{distribution-consistency} assumption, and in
alignment with it, we propose the Distribution-consistency Structural Causal
Models (DiscoSCMs) offering enhanced capabilities to model counterfactuals. To
concretely reveal the enhanced model capacity, we introduce a new identifiable
causal parameter, \textit{the probability of consistency}, which holds
practical significance within DiscoSCM alone, showcased with a personalized
incentive example. Furthermore, we provide a comprehensive set of theoretical
results about the ``Ladder of Causation'' within the DiscoSCM framework. We
hope it opens new avenues for future research of counterfactual modeling,
ultimately enhancing our understanding of causality and its real-world
applications.
</p>
|
|
|
|
<p>Recently, DNA storage has surfaced as a promising alternative for data
storage, presenting notable benefits in terms of storage capacity,
cost-effectiveness in maintenance, and the capability for parallel replication.
Mathematically, the DNA storage process can be conceptualized as an insertion,
deletion, and substitution (IDS) channel. Due to the mathematical complexity
associated with the Levenshtein distance, creating a code that corrects for IDS
remains a challenging task. In this paper, we propose a bottom-up generation
approach to grow the required codebook based on the computation of Edit
Computational Graph (ECG) which differs from the algebraic constructions by
incorporating the Derivative-Free Optimization (DFO) method. Notably, this
approach is agnostic to the type of errors. Compared with prior work on
1-substitution-1-deletion and 2-deletion codes, the redundancy is reduced by
about 30 bits and 60 bits, respectively. As far as we know, our method is the
first IDS-correcting code designed using classical Natural Language Process
(NLP) techniques, marking a turning point in the field of error correction code
research. Based on the codebook generated by our method, there may be
significant breakthroughs in the complexity of encoding and decoding
algorithms.
</p>
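<p>The combinatorial core is the Levenshtein (edit) distance: a code corrects $t$ IDS errors when every pair of codewords has edit distance at least $2t+1$. The brute-force bottom-up growth below only illustrates the search space; the paper's ECG/DFO machinery is far more efficient than this exhaustive sketch.</p>
<pre>
# Edit distance plus exhaustive greedy codebook growth (illustration only).
from itertools import product

def edit_distance(a: str, b: str) -> int:
    """Single-row dynamic program for the Levenshtein distance."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1,
                                     prev + (ca != cb))
    return dp[len(b)]

def grow_codebook(n: int = 6, t: int = 1, alphabet: str = "ACGT") -> list[str]:
    """Greedily admit length-n words keeping min pairwise distance >= 2t+1."""
    code: list[str] = []
    for word in ("".join(w) for w in product(alphabet, repeat=n)):
        if all(edit_distance(word, c) >= 2 * t + 1 for c in code):
            code.append(word)
    return code
</pre>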
|
|
|
|
<p>This study investigates self-supervised learning techniques to obtain
representations of Event Sequences, a key modality in various applications,
including but not limited to banking, e-commerce, and healthcare.
</p>
<p>We perform a comprehensive study of generative and contrastive approaches in
self-supervised learning, applying them both independently. We find that there
is no single supreme method. Consequently, we explore the potential benefits of
combining these approaches. To achieve this goal, we introduce a novel method
that aligns generative and contrastive embeddings as distinct modalities,
drawing inspiration from contemporary multimodal research.
</p>
<p>Generative and contrastive approaches are often treated as mutually
exclusive, leaving a gap for their combined exploration. Our results
demonstrate that this aligned model performs at least on par with, and mostly
surpasses, existing methods and is more universal across a variety of tasks.
Furthermore, we demonstrate that self-supervised methods consistently
outperform the supervised approach on our datasets.
</p>
|
|
|
|
<p>Guessing random additive noise decoding (GRAND) is a recently proposed
decoding paradigm particularly suitable for codes with short length and high
rate. Among its variants, ordered reliability bits GRAND (ORBGRAND) exploits
soft information in a simple and effective fashion to schedule its queries,
thereby allowing efficient hardware implementation. Compared with maximum
likelihood (ML) decoding, however, ORBGRAND still exhibits a noticeable
performance gap in terms of block error rate (BLER). In order to improve the
performance of ORBGRAND while still retaining its amenability to hardware
implementation, a new variant of ORBGRAND termed RS-ORBGRAND is proposed, whose
basic idea is to reshuffle the queries of ORBGRAND so that the expected number
of queries is minimized. Numerical simulations show that RS-ORBGRAND leads to
noticeable gains compared with ORBGRAND and its existing variants, and is only
0.1dB away from ML decoding, for BLER as low as $10^{-6}$.
</p>
|
|
|
|
<p>We consider channels with synchronization errors modeled as insertions and
deletions. A classical result is the information stability, and hence the
existence of the Shannon capacity, of such channels when the synchronization
errors are memoryless. In this paper, we extend this result to
the case where the insertions and deletions have memory. Specifically, we
assume that the synchronization errors are governed by a stationary and ergodic
finite state Markov chain, and prove that the mutual information capacity of
such channels exists and is equal to the coding capacity, showing that there
exists a coding scheme which achieves this limit.
</p>
|
|
|
|
<p>The share of online video traffic in global carbon dioxide emissions is
growing steadily. To meet the demand for video media, dedicated
compression techniques are continuously optimized, but at the expense of
increasingly higher computational demands and thus rising energy consumption at
the video encoder side. In order to find the best trade-off between compression
and energy consumption, modeling encoding energy for a wide range of encoding
parameters is crucial. We propose an encoding time and energy model for SVT-AV1
based on empirical relations between the encoding time and video parameters as
well as encoder configurations. Furthermore, we model the influence of video
content by established content descriptors such as spatial and temporal
information. We then use the predicted encoding time to estimate the required
energy demand and achieve a prediction error of 19.6 % for encoding time and
20.9 % for encoding energy.
</p>
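<p>The mechanics of fitting such an empirical model can be sketched with a least-squares regression of log encoding time on configuration and content descriptors. The feature set, the linear-in-log form, and all numbers below are placeholder assumptions for illustration, not the paper's model or measurements.</p>
<pre>
# Hedged sketch: fit log(encoding time) on hypothetical features.
import numpy as np

# Hypothetical rows: [log(pixels), preset, SI, TI] -> measured seconds.
X = np.array([[np.log(1920*1080), 8, 60.2, 21.7],
              [np.log(1280*720),  8, 45.1, 15.3],
              [np.log(1920*1080), 6, 60.2, 21.7],
              [np.log(1280*720),  6, 45.1, 15.3],
              [np.log(3840*2160), 8, 72.9, 25.4],
              [np.log(3840*2160), 6, 72.9, 25.4]])
t = np.array([12.1, 5.3, 30.5, 13.8, 49.0, 120.0])   # placeholder timings
A = np.hstack([X, np.ones((len(X), 1))])             # add an intercept
coef, *_ = np.linalg.lstsq(A, np.log(t), rcond=None) # fit log-time model

def predict_seconds(features: np.ndarray) -> float:
    """Predicted time; multiply by measured power to estimate energy."""
    return float(np.exp(np.append(features, 1.0) @ coef))
</pre>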
|
|
|
|
<p>Instruction finetuning on a variety of image-text instruction data is the key
to obtaining a versatile Multimodal Large Language Model (MLLM), and different
configurations of the instruction data can lead to finetuned models with
different capabilities. However, we have discovered that data conflicts are
inevitable when mixing instruction data from distinct domains, which can result
in performance drops for tasks of a specific domain. To address this issue, we
propose to apply an efficient Mixture of Experts (MoE) design, which is a
sparse Mixture of LoRA Experts (MoLE) for instruction finetuning MLLMs. Within
the Transformer layers, we extend the popular Low-Rank Adaption (LoRA) method
by creating a set of LoRA experts specifically for the MLP layer, and route
each token to the top-1 expert based on a routing function, allowing adaptive
choices for tokens from different domains. Since the LoRA experts are sparsely
activated, the training and inference costs are kept roughly constant compared
to the original LoRA method. By replacing the plain-LoRA of LLaVA-1.5 with our
MoE design, our final model is named LLaVA-MoLE. Extensive experiments show
that LLaVA-MoLE effectively mitigates the data conflict issue when mixing
multiple distinct instruction datasets with various configurations, and
achieves consistent performance gains over the strong plain-LoRA baselines.
Most importantly, on the mixed datasets, LLaVA-MoLE can even outperform the
plain-LoRA baseline trained with twice the samples.
</p>
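<p>A minimal sparse Mixture-of-LoRA-Experts layer with top-1 routing, in the spirit of the design above, is sketched below: each token is routed to one expert whose low-rank update is added to the frozen base output. Sizes and the routing function are illustrative assumptions, not the paper's configuration.</p>
<pre>
# Top-1-routed LoRA experts over a frozen base layer (illustrative sketch).
import torch
import torch.nn as nn

class MoLELinear(nn.Module):
    def __init__(self, d_in: int, d_out: int, n_experts: int = 4, rank: int = 8):
        super().__init__()
        self.base = nn.Linear(d_in, d_out)     # stands in for a pretrained MLP
        for p in self.base.parameters():
            p.requires_grad_(False)            # base stays frozen
        self.router = nn.Linear(d_in, n_experts)
        self.lora_a = nn.Parameter(torch.randn(n_experts, d_in, rank) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(n_experts, rank, d_out))

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, seq, d_in)
        out = self.base(x)
        expert = self.router(x).argmax(dim=-1)   # top-1 expert per token
        for e in range(self.lora_a.shape[0]):    # only chosen experts fire,
            mask = expert == e                   # keeping cost near-constant
            if mask.any():
                out[mask] = out[mask] + x[mask] @ self.lora_a[e] @ self.lora_b[e]
        return out
</pre>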
|
|
|
|
<p>This paper discusses the error and cost aspects of ill-posed integral
equations when given discrete noisy point evaluations on a fine grid. Standard
solution methods usually employ discretization schemes that are directly
induced by the measurement points. Thus, they may scale unfavorably with the
number of evaluation points, which can result in computational inefficiency. To
address this issue, we propose an algorithm that achieves the same level of
accuracy while significantly reducing computational costs. Our approach
involves an initial averaging procedure to sparsify the underlying grid. To
keep the exposition simple, we focus only on one-dimensional ill-posed integral
equations that have sufficient smoothness. However, the approach can be
generalized to more complicated two- and three-dimensional problems with
appropriate modifications.
</p>
|
|
|
|
<p>Federated learning enhanced by differential privacy has emerged as a popular
approach to better safeguard the privacy of client-side data by protecting
clients' contributions during the training process. Existing solutions
typically assume a uniform privacy budget for all records and provide
one-size-fits-all solutions that may not be adequate to meet each record's
privacy requirement. In this paper, we explore the uncharted territory of
cross-silo FL with record-level personalized differential privacy. We devise a
novel framework named rPDP-FL, employing a two-stage hybrid sampling scheme
with both client-level sampling and non-uniform record-level sampling to
accommodate varying privacy requirements. A critical and non-trivial problem is
to select the ideal per-record sampling probability q given the personalized
privacy budget {\epsilon}. We introduce a versatile solution named
Simulation-CurveFitting, allowing us to uncover a significant insight into the
nonlinear correlation between q and {\epsilon} and derive an elegant
mathematical model to tackle the problem. Our evaluation demonstrates that our
solution can provide significant performance gains over the baselines that do
not consider personalized privacy preservation.
</p>
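<p>The non-uniform record-level sampling at the heart of the scheme reduces to independent per-record Poisson sampling, sketched below. The probabilities are placeholders; in the actual framework each q_i is derived from the record's personal budget via the Simulation-CurveFitting step, which is not shown.</p>
<pre>
# Per-record Poisson sampling with individual probabilities (sketch).
import numpy as np

rng = np.random.default_rng(0)
q = np.array([0.05, 0.20, 0.50, 0.80])   # hypothetical per-record q_i values

def sample_round(q: np.ndarray) -> np.ndarray:
    """Indices of records included in this training round: each record i
    participates independently with probability q[i]."""
    return np.flatnonzero(rng.random(q.shape) < q)
</pre>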
|
|
|
|
<p>Process modeling and discovery techniques aim to construct sound and valid
process models for different types of processes, i.e., process orchestrations
and collaboration processes. Orchestrations represent behavior of cases within
one process. Collaboration processes represent behavior of collaborating cases
within multiple process orchestrations that interact via collaboration concepts
such as organizations, agents, objects, and services. The heterogeneity of
collaboration concepts and types such as message exchange and resource sharing
has led to different representations and discovery techniques for collaboration
process models, but a standard model class is lacking. We propose collaboration
Petri nets (cPN) to achieve comparability between techniques, to enable
approach and property transfer, and to build a standardized collaboration
mining pipeline similar to process mining. For cPN, we require desirable
modeling power, decision power, modeling convenience, and relations to existing
model classes. We show the representation of collaboration types, structural
characterization as workflow nets, automatic verification of soundness,
bisimulation equivalence to existing model classes, and application in a
general discovery framework. As empirical evidence to discover cPN, we conduct
a comparative evaluation between three discovery techniques on a set of
existing collaboration event logs.
</p>
|
|
|
|
<p>This paper delves into the degradability of quantum channels, with a specific
focus on high-dimensional extensions of qubit depolarizing channels in
low-noise regimes. We build upon the foundation of $\eta$-approximate
degradable channels, as established by Sutter et al. and Leditzky et al., to
introduce and examine the Modified Landau-Streater (MLS) channels. These
channels expand upon the qubit depolarizing and the recently proposed modified
Werner-Holevo channels by Roofeh and Karimipour, extending them to
higher-dimensional Hilbert spaces (with dimension $d=2j+1$, where $j$ are
positive half-integers). Our investigation centers on their conformity to the
$O(\varepsilon^2)$ degradability pattern, aligning with and extending Leditzky
et al.'s findings in the $d=2$ case. By replacing the SU($2$) generators with
SU($d$) in our treatment, we may explore the potential inclusion of generalized
Gell-Mann matrices in future research. Our results enhance the understanding of
super-additivity in quantum channels within the low-noise regime and lay the
groundwork for future explorations into conditions and structures that could
lead to $O(\varepsilon^2)$ degradability across a broader spectrum of quantum
channels.
</p>
|
|
|
|
<p>Simulating realistic time-domain observations of gravitational waves (GWs)
and GW detector glitches can help in advancing GW data analysis. Simulated data
can be used in downstream tasks by augmenting datasets for signal searches,
balancing data sets for machine learning, and validating detection schemes. In
this work, we present Conditional Derivative GAN (cDVGAN), a novel conditional
model in the Generative Adversarial Network framework for simulating multiple
classes of time-domain observations that represent gravitational waves (GWs)
and detector glitches. cDVGAN can also generate generalized hybrid samples that
span the variation between classes through interpolation in the conditioned
class vector. cDVGAN introduces an additional player into the typical 2-player
adversarial game of GANs, where an auxiliary discriminator analyzes the
first-order derivative time-series. Our results show that this provides
synthetic data that better captures the features of the original data. cDVGAN
conditions on three classes, two denoised from LIGO blip and tomte glitch
events from its 3rd observing run (O3), and the third representing binary black
hole (BBH) mergers. Our proposed cDVGAN outperforms 4 different baseline GAN
models in replicating the features of the three classes. Specifically, our
experiments show that training convolutional neural networks (CNNs) with our
cDVGAN-generated data improves the detection of samples embedded in detector
noise beyond the synthetic data from other state-of-the-art GAN models. Our
best synthetic dataset yields as much as a 4.2% increase in
area-under-the-curve (AUC) performance compared to synthetic datasets from
baseline GANs. Moreover, training the CNN with hybrid samples from our cDVGAN
outperforms CNNs trained only on the standard classes, when identifying real
samples embedded in LIGO detector background (4% AUC improvement for cDVGAN).
</p>
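<p>The auxiliary-player idea can be sketched as a second critic that judges the first-order difference of each time series, so the generator must match both the waveform and its rate of change. The network sizes here are placeholders; only the derivative-input idea follows the paper.</p>
<pre>
# Auxiliary derivative discriminator (illustrative sketch).
import torch
import torch.nn as nn

class Critic(nn.Module):
    def __init__(self, length: int):
        super().__init__()
        self.net = nn.Sequential(nn.Flatten(), nn.Linear(length, 128),
                                 nn.LeakyReLU(0.2), nn.Linear(128, 1))
    def forward(self, x):                  # x: (batch, length)
        return self.net(x)

length = 1024
d_signal = Critic(length)                  # judges the raw waveform
d_deriv = Critic(length - 1)               # judges its first difference

def discriminator_scores(x: torch.Tensor):
    """Combined judgments on the signal and its first-order derivative."""
    return d_signal(x), d_deriv(torch.diff(x, dim=-1))
</pre>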
|
|
|
|
<p>In this work we present FreDSNet, a deep learning solution which obtains
semantic 3D understanding of indoor environments from single panoramas.
Omnidirectional images reveal task-specific advantages when addressing scene
understanding problems due to the 360-degree contextual information about the
entire environment they provide. However, the inherent characteristics of the
omnidirectional images add additional problems to obtain an accurate detection
and segmentation of objects or a good depth estimation. To overcome these
problems, we exploit convolutions in the frequency domain, obtaining a wider
receptive field in each convolutional layer. These convolutions allow the
network to leverage the whole contextual information from omnidirectional
images. FreDSNet is the first network that jointly provides monocular depth
estimation and semantic segmentation from a single panoramic image exploiting
fast Fourier convolutions. Our experiments show that FreDSNet performs
comparably to specialized state-of-the-art methods for semantic segmentation
and depth estimation. FreDSNet code is publicly available at
https://github.com/Sbrunoberenguel/FreDSNet
</p>
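<p>A minimal fast-Fourier-convolution unit of the kind the network builds on is sketched below: features are transformed with a real 2-D FFT, a pointwise convolution is applied in the frequency domain (giving an image-wide receptive field), and the result is transformed back. Channel sizes and the activation are illustrative.</p>
<pre>
# Minimal Fourier convolution unit (illustrative sketch).
import torch
import torch.nn as nn

class FourierUnit(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # real and imaginary parts are stacked -> 2x channels
        self.conv = nn.Conv2d(2 * channels, 2 * channels, kernel_size=1)

    def forward(self, x):                            # x: (B, C, H, W)
        b, c, h, w = x.shape
        spec = torch.fft.rfft2(x, norm="ortho")      # (B, C, H, W//2+1)
        f = torch.cat([spec.real, spec.imag], dim=1)
        f = torch.relu(self.conv(f))                 # pointwise in frequency
        real, imag = f.chunk(2, dim=1)
        spec = torch.complex(real, imag)
        return torch.fft.irfft2(spec, s=(h, w), norm="ortho")
</pre>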
|
|
|
|
<p>Feature attribution methods (FAs) are popular approaches for providing
insights into the model reasoning process of making predictions. The more
faithful a FA is, the more accurately it reflects which parts of the input are
more important for the prediction. Widely used faithfulness metrics, such as
sufficiency and comprehensiveness, use a hard erasure criterion, i.e., entirely
removing or retaining the most important tokens ranked by a given FA and
observing the changes in predictive likelihood. However, this hard criterion
ignores the importance of each individual token, treating them all equally for
computing sufficiency and comprehensiveness. In this paper, we propose a simple
yet effective soft erasure criterion. Instead of entirely removing or retaining
tokens from the input, we randomly mask parts of the token vector
representations proportionately to their FA importance. Extensive experiments
across various natural language processing tasks and different FAs show that
our soft-sufficiency and soft-comprehensiveness metrics consistently prefer
more faithful explanations compared to hard sufficiency and comprehensiveness.
Our code: https://github.com/casszhao/SoftFaith
</p>
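<p>The proposed soft criterion can be sketched as follows: rather than deleting top-ranked tokens outright, dimensions of each token's vector are randomly zeroed with probability proportional to its attribution score. The normalization of scores into probabilities is an assumption for illustration.</p>
<pre>
# Soft erasure of token representations (illustrative sketch).
import torch

def soft_erase(embeddings: torch.Tensor,
               importance: torch.Tensor) -> torch.Tensor:
    """embeddings: (seq, dim); importance: (seq,) attribution scores.
    Higher importance -> a larger expected fraction of the vector masked."""
    imp = importance.clamp(min=0)
    p = imp / (imp.max() + 1e-9)      # map scores into [0, 1]
    keep = torch.bernoulli((1 - p).unsqueeze(-1).expand_as(embeddings))
    return embeddings * keep
</pre>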
|
|
|
|
<p>Context: Software model optimization is a process that automatically
generates design alternatives, typically to enhance quantifiable non-functional
properties of software systems, such as performance and reliability.
Multi-objective evolutionary algorithms have been shown to be effective in this
context for assisting the designer in identifying trade-offs between the
desired non-functional properties. Objective: In this work, we investigate the
effects of imposing a time budget to limit the search for design alternatives,
which inevitably affects the quality of the resulting alternatives. Method: The
effects of time budgets are analyzed by investigating both the quality of the
generated design alternatives and their structural features when varying the
budget and the genetic algorithm (NSGA-II, PESA2, SPEA2). This is achieved by
employing multi-objective quality indicators and a tree-based representation of
the search space. Results: The study reveals that the time budget significantly
affects the quality of Pareto fronts, especially for performance and
reliability. NSGA-II is the fastest algorithm, while PESA2 generates the
highest-quality solutions. The imposition of a time budget results in
structurally distinct models compared to those obtained without a budget,
indicating that the search process is influenced by both the budget and
algorithm selection. Conclusions: In software model optimization, imposing a
time budget can be effective in saving optimization time, but designers should
carefully consider the trade-off between time and solution quality in the
Pareto front, along with the structural characteristics of the generated
models. By making informed choices about the specific genetic algorithm,
designers can achieve different trade-offs.
</p>
|