cs.AI - 2023-11-05

Modelling Cellular Perturbations with the Sparse Additive Mechanism Shift Variational Autoencoder

  • paper_url: http://arxiv.org/abs/2311.02794
  • repo_url: https://github.com/insitro/sams-vae
  • paper_authors: Michael Bereket, Theofanis Karaletsos
  • for: This paper proposes SAMS-VAE, a method for modeling the effects of diverse interventions on cells in order to characterize unknown biological mechanisms of action in drug discovery.
  • methods: SAMS-VAE models the latent state of a perturbed sample as the sum of a local latent variable capturing sample-specific variation and sparse global latent variables of intervention effects; sparsifying these global variables for individual perturbations identifies disentangled, perturbation-specific latent subspaces that are flexibly composable (see the sketch after the abstract).
  • results: SAMS-VAE outperforms comparable models in generalization across in-distribution and out-of-distribution tasks, including a combinatorial reasoning task under resource paucity, and yields interpretable latent structures that correlate strongly with known biological mechanisms.
    Abstract Generative models of observations under interventions have been a vibrant topic of interest across machine learning and the sciences in recent years. For example, in drug discovery, there is a need to model the effects of diverse interventions on cells in order to characterize unknown biological mechanisms of action. We propose the Sparse Additive Mechanism Shift Variational Autoencoder, SAMS-VAE, to combine compositionality, disentanglement, and interpretability for perturbation models. SAMS-VAE models the latent state of a perturbed sample as the sum of a local latent variable capturing sample-specific variation and sparse global variables of latent intervention effects. Crucially, SAMS-VAE sparsifies these global latent variables for individual perturbations to identify disentangled, perturbation-specific latent subspaces that are flexibly composable. We evaluate SAMS-VAE both quantitatively and qualitatively on a range of tasks using two popular single cell sequencing datasets. In order to measure perturbation-specific model-properties, we also introduce a framework for evaluation of perturbation models based on average treatment effects with links to posterior predictive checks. SAMS-VAE outperforms comparable models in terms of generalization across in-distribution and out-of-distribution tasks, including a combinatorial reasoning task under resource paucity, and yields interpretable latent structures which correlate strongly to known biological mechanisms. Our results suggest SAMS-VAE is an interesting addition to the modeling toolkit for machine learning-driven scientific discovery.
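A minimal sketch of the additive mechanism shift described in the methods bullet, written in PyTorch with illustrative shapes and variable names (not the authors' implementation): each applied perturbation contributes a sparse, perturbation-specific shift on top of a sample-specific basal latent, and combinations compose additively.

```python
import torch

n_latent, n_perturbations = 32, 10

z_basal = torch.randn(4, n_latent)                            # local latent, one row per sample
e_global = torch.randn(n_perturbations, n_latent)             # global intervention embeddings
mask = (torch.rand(n_perturbations, n_latent) < 0.1).float()  # sparse per-perturbation masks
d = torch.zeros(4, n_perturbations)                           # which perturbations each sample received
d[0, 3] = 1.0                                                 # sample 0: a single perturbation
d[1, 2], d[1, 7] = 1.0, 1.0                                   # sample 1: a combination of two

# Each perturbation acts only on its sparse latent subspace; combinations add up.
z = z_basal + d @ (mask * e_global)                           # perturbed latent state, shape (4, n_latent)
print(z.shape)
```

In the actual model these quantities are latent variables inferred by the VAE rather than fixed tensors.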

CausalCite: A Causal Formulation of Paper Citations

  • paper_url: http://arxiv.org/abs/2311.02790
  • repo_url: https://github.com/causalnlp/causal-cite
  • paper_authors: Ishan Kumar, Zhijing Jin, Ehsan Mokhtarian, Siyuan Guo, Yuen Chen, Negar Kiyavash, Mrinmaya Sachan, Bernhard Schoelkopf
  • for: This paper proposes a causal inference method for evaluating the impact of scientific papers.
  • methods: The method, TextMatch, adapts the traditional matching framework to high-dimensional text embeddings: each paper is encoded with LLM text embeddings, similar papers are retrieved by cosine similarity, and a counterfactual sample is synthesized as the similarity-weighted average of those papers (see the sketch after the abstract).
  • results: The resulting metric, CausalCite, evaluates paper impact accurately, correlating strongly with expert judgments and remaining stable across AI sub-fields; the authors also offer suggestions for how future researchers can use the metric. Code and data are available at https://github.com/causalNLP/causal-cite.
    Abstract Evaluating the significance of a paper is pivotal yet challenging for the scientific community. While the citation count is the most commonly used proxy for this purpose, they are widely criticized for failing to accurately reflect a paper's true impact. In this work, we propose a causal inference method, TextMatch, which adapts the traditional matching framework to high-dimensional text embeddings. Specifically, we encode each paper using the text embeddings by large language models (LLMs), extract similar samples by cosine similarity, and synthesize a counterfactual sample by the weighted average of similar papers according to their similarity values. We apply the resulting metric, called CausalCite, as a causal formulation of paper citations. We show its effectiveness on various criteria, such as high correlation with paper impact as reported by scientific experts on a previous dataset of 1K papers, (test-of-time) awards for past papers, and its stability across various sub-fields of AI. We also provide a set of findings that can serve as suggested ways for future researchers to use our metric for a better understanding of a paper's quality. Our code and data are at https://github.com/causalNLP/causal-cite.
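A hedged sketch of the TextMatch step described above, with illustrative names and random stand-ins for the LLM embeddings: the counterfactual outcome for a paper is synthesized as the similarity-weighted average of the outcomes of its most similar peers.

```python
import numpy as np

def counterfactual_citations(emb_treated, emb_pool, citations_pool, k=10):
    """Synthesize a counterfactual citation count from the k most similar papers."""
    sims = emb_pool @ emb_treated / (
        np.linalg.norm(emb_pool, axis=1) * np.linalg.norm(emb_treated) + 1e-9)
    top = np.argsort(-sims)[:k]                  # most similar candidate papers
    weights = sims[top] / sims[top].sum()        # similarity-based weights
    return float(weights @ citations_pool[top])  # synthesized counterfactual outcome

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(100, 768))         # stand-in for LLM text embeddings
citations = rng.poisson(20, size=100).astype(float)
print(counterfactual_citations(embeddings[0], embeddings[1:], citations[1:]))
```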

Make a Donut: Language-Guided Hierarchical EMD-Space Planning for Zero-shot Deformable Object Manipulation

  • paper_url: http://arxiv.org/abs/2311.02787
  • repo_url: None
  • paper_authors: Yang You, Bokui Shen, Congyue Deng, Haoran Geng, He Wang, Leonidas Guibas
  • for: This paper addresses robotic manipulation of deformable objects, one of the most captivating yet formidable problems in robotics.
  • methods: A demonstration-free hierarchical planning approach: a large language model (LLM) produces a stage-by-stage plan for complex long-horizon tasks, providing for each stage a tool and Python code that crafts an intermediate subgoal point cloud; a closed-loop model predictive control strategy then optimizes actions with a DiffPhysics-P2P loss in earth mover's distance (EMD) space (see the sketch after the abstract).
  • results: Experiments show the approach surpasses multiple benchmarks on dough manipulation over both short and long horizons, and generalizes robustly to novel, previously unseen complex tasks without any demonstrations.
    Abstract Deformable object manipulation stands as one of the most captivating yet formidable challenges in robotics. While previous techniques have predominantly relied on learning latent dynamics through demonstrations, typically represented as either particles or images, there exists a pertinent limitation: acquiring suitable demonstrations, especially for long-horizon tasks, can be elusive. Moreover, basing learning entirely on demonstrations can hamper the model's ability to generalize beyond the demonstrated tasks. In this work, we introduce a demonstration-free hierarchical planning approach capable of tackling intricate long-horizon tasks without necessitating any training. We employ large language models (LLMs) to articulate a high-level, stage-by-stage plan corresponding to a specified task. For every individual stage, the LLM provides both the tool's name and the Python code to craft intermediate subgoal point clouds. With the tool and subgoal for a particular stage at our disposal, we present a granular closed-loop model predictive control strategy. This leverages Differentiable Physics with Point-to-Point correspondence (DiffPhysics-P2P) loss in the earth mover distance (EMD) space, applied iteratively. Experimental findings affirm that our technique surpasses multiple benchmarks in dough manipulation, spanning both short and long horizons. Remarkably, our model demonstrates robust generalization capabilities to novel and previously unencountered complex tasks without any preliminary demonstrations. We further substantiate our approach with experimental trials on real-world robotic platforms.
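A rough illustration of the EMD-space objective named in the methods bullet; this is a plain assignment-based earth mover's distance between the current particle state and a subgoal point cloud, not the paper's differentiable DiffPhysics-P2P implementation.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def emd_p2p(current_pts, subgoal_pts):
    """EMD via optimal point-to-point assignment between two point clouds."""
    cost = np.linalg.norm(current_pts[:, None, :] - subgoal_pts[None, :, :], axis=-1)
    row, col = linear_sum_assignment(cost)   # point-to-point correspondence
    return cost[row, col].mean()

rng = np.random.default_rng(0)
dough_now = rng.normal(size=(256, 3))                # current particle state
dough_goal = rng.normal(loc=0.1, size=(256, 3))      # subgoal point cloud for the current stage
print(emd_p2p(dough_now, dough_goal))
```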

Towards Generic Anomaly Detection and Understanding: Large-scale Visual-linguistic Model (GPT-4V) Takes the Lead

  • paper_url: http://arxiv.org/abs/2311.02782
  • repo_url: https://github.com/caoyunkang/GPT4V-for-Generic-Anomaly-Detection
  • paper_authors: Yunkang Cao, Xiaohao Xu, Chen Sun, Xiaonan Huang, Weiming Shen
  • for: This study explores the use of GPT-4V(ision), a large visual-linguistic model, for anomaly detection in a generic manner.
  • methods: GPT-4V is applied to multi-modality, multi-domain anomaly detection tasks spanning image, video, point cloud, and time-series data, covering industrial, medical, logical, video, and 3D anomaly detection and localization; additional cues such as class information, human expertise, and reference images are incorporated as prompts.
  • results: GPT-4V proves highly effective at detecting and explaining global and fine-grained semantic patterns in zero/one-shot anomaly detection, enabling accurate differentiation between normal and abnormal instances.
    Abstract Anomaly detection is a crucial task across different domains and data types. However, existing anomaly detection models are often designed for specific domains and modalities. This study explores the use of GPT-4V(ision), a powerful visual-linguistic model, to address anomaly detection tasks in a generic manner. We investigate the application of GPT-4V in multi-modality, multi-domain anomaly detection tasks, including image, video, point cloud, and time series data, across multiple application areas, such as industrial, medical, logical, video, 3D anomaly detection, and localization tasks. To enhance GPT-4V's performance, we incorporate different kinds of additional cues such as class information, human expertise, and reference images as prompts. Based on our experiments, GPT-4V proves to be highly effective in detecting and explaining global and fine-grained semantic patterns in zero/one-shot anomaly detection. This enables accurate differentiation between normal and abnormal instances. Although we conducted extensive evaluations in this study, there is still room for future evaluation to further exploit GPT-4V's generic anomaly detection capacity from different aspects. These include exploring quantitative metrics, expanding evaluation benchmarks, incorporating multi-round interactions, and incorporating human feedback loops. Nevertheless, GPT-4V exhibits promising performance in generic anomaly detection and understanding, thus opening up a new avenue for anomaly detection.

ChaTA: Towards an Intelligent Question-Answer Teaching Assistant using Open-Source LLMs

  • paper_url: http://arxiv.org/abs/2311.02775
  • repo_url: None
  • paper_authors: Yann Hicke, Anmol Agarwal, Qianou Ma, Paul Denny
  • for: This paper aims to address the challenges of scalable and intelligent question answering (QA) for teaching support.
  • methods: It uses open-source large language models (LLMs) to preserve data privacy, building on the LLaMA-2 family with retrieval augmented generation (RAG), supervised fine-tuning (SFT), and an alternative to reinforcement learning with human feedback (RLHF).
  • results: On a Piazza dataset from an introductory CS course, human evaluations and automatic LLM evaluations on a small subset give preliminary evidence that these techniques collectively improve answer quality by about 33%, with RAG being an impactful addition. This work paves the way for ChaTA, an intelligent QA assistant customizable for courses with an online QA platform.
    Abstract To address the challenges of scalable and intelligent question-answering (QA), we introduce an innovative solution that leverages open-source Large Language Models (LLMs) to ensure data privacy. We use models from the LLaMA-2 family and augmentations including retrieval augmented generation (RAG), supervised fine-tuning (SFT), and an alternative to reinforcement learning with human feedback (RLHF). We perform our experiments on a Piazza dataset from an introductory CS course with 10k QA pairs and 1.5k pairs of preferences data and conduct both human evaluations and automatic LLM evaluations on a small subset. We find preliminary evidence that modeling techniques collectively enhance the quality of answers by 33%, and RAG is an impactful addition. This work paves the way for the development of ChaTA, an intelligent QA assistant customizable for courses with an online QA platform.

Communication Efficient and Privacy-Preserving Federated Learning Based on Evolution Strategies

  • paper_url: http://arxiv.org/abs/2311.03405
  • repo_url: https://github.com/Eric-Lan0/FedES
  • paper_authors: Guangchen Lan
  • for: This work proposes Federated Evolution Strategies (FedES), a federated learning algorithm based on evolution strategies, for training distributed deep neural networks (DNNs) with low communication overhead while preserving data privacy.
  • methods: Instead of transmitting model parameters, FedES communicates only loss values, so communication overhead is very low; because perturbations are generated from a pre-shared seed, a third party cannot estimate gradients without knowing the seed, which protects data privacy (see the sketch after the abstract).
  • results: Experiments show FedES achieves low communication overhead and data privacy while matching the convergence performance of back-propagation methods.
    Abstract Federated learning (FL) is an emerging paradigm for training deep neural networks (DNNs) in distributed manners. Current FL approaches all suffer from high communication overhead and information leakage. In this work, we present a federated learning algorithm based on evolution strategies (FedES), a zeroth-order training method. Instead of transmitting model parameters, FedES only communicates loss values, and thus has very low communication overhead. Moreover, a third party is unable to estimate gradients without knowing the pre-shared seed, which protects data privacy. Experimental results demonstrate FedES can achieve the above benefits while keeping convergence performance the same as that with back propagation methods.
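A toy sketch of the shared-seed idea summarized above; the population size, noise scale, and update rule are assumptions, not the paper's algorithm. Clients report only scalar losses for seed-generated parameter perturbations, and the server, holding the same pre-shared seed, reconstructs the perturbations to form an evolution-strategies update.

```python
import numpy as np

def es_round(theta, client_loss_fns, seed, pop=16, sigma=0.05, lr=0.1):
    """One federated round: clients send back losses only, never parameters or gradients."""
    eps = np.random.default_rng(seed).normal(size=(pop, theta.size))  # regenerated from the pre-shared seed
    losses = np.mean([[f(theta + sigma * e) for e in eps] for f in client_loss_fns], axis=0)
    advantages = (losses - losses.mean()) / (losses.std() + 1e-9)     # standardized fitness
    return theta - lr / (pop * sigma) * (advantages @ eps)            # zeroth-order ES update

theta = np.zeros(5)
clients = [lambda w, c=c: np.sum((w - c) ** 2) for c in (1.0, -0.5, 2.0)]  # toy private objectives
for t in range(200):
    theta = es_round(theta, clients, seed=t)
print(theta)   # should drift toward the average of the client optima
```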

Rule Learning as Machine Translation using the Atomic Knowledge Bank

  • paper_url: http://arxiv.org/abs/2311.02765
  • repo_url: https://github.com/krisaesoey/atomictranslation
  • paper_authors: Kristoffer Æsøy, Ana Ozaki
  • for: This work investigates whether machine learning models can be used for logical reasoning in a trustable and controlled manner.
  • methods: Transformers are used to translate rules expressed in natural language into logical rules, which are then handed to logic-based reasoners, viewed as the most reliable tools for performing logical reasoning.
  • results: Experiments on the DKET dataset and on a new language-to-logic translation dataset built from the Atomic knowledge bank indicate that transformers can translate natural-language rules into logical rules usable by such reasoners.
    Abstract Machine learning models, and in particular language models, are being applied to various tasks that require reasoning. While such models are good at capturing patterns their ability to reason in a trustable and controlled manner is frequently questioned. On the other hand, logic-based rule systems allow for controlled inspection and already established verification methods. However it is well-known that creating such systems manually is time-consuming and prone to errors. We explore the capability of transformers to translate sentences expressing rules in natural language into logical rules. We see reasoners as the most reliable tools for performing logical reasoning and focus on translating language into the format expected by such tools. We perform experiments using the DKET dataset from the literature and create a dataset for language to logic translation based on the Atomic knowledge bank.

Causal Question Answering with Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2311.02760
  • repo_url: https://github.com/Aryia-Behroziuan/References
  • paper_authors: Lukas Blübaum, Stefan Heindorf
  • for: The goal is to answer causal questions, i.e., to determine causal relationships between phenomena and to identify causes and effects, with supporting evidence.
  • methods: A reinforcement learning approach, specifically an actor-critic agent bootstrapped with supervised learning, searches the CauseNet knowledge graph of causal relations and their provenance data for answers and explanatory paths (see the sketch after the abstract).
  • results: The agent successfully answers binary causal questions while pruning the search space, visiting fewer than 30 nodes per question versus over 3,000 for a naive breadth-first search, and the returned paths explain how a cause produces an effect.
    Abstract Causal questions inquire about causal relationships between different events or phenomena. Specifically, they often aim to determine whether there is a relationship between two phenomena, or to identify all causes/effects of a phenomenon. Causal questions are important for a variety of use cases, including virtual assistants and search engines. However, many current approaches to causal question answering cannot provide explanations or evidence for their answers. Hence, in this paper, we aim to answer causal questions with CauseNet, a large-scale dataset of causal relations and their provenance data. Inspired by recent, successful applications of reinforcement learning to knowledge graph tasks, such as link prediction and fact-checking, we explore the application of reinforcement learning on CauseNet for causal question answering. We introduce an Actor-Critic based agent which learns to search through the graph to answer causal questions. We bootstrap the agent with a supervised learning procedure to deal with large action spaces and sparse rewards. Our evaluation shows that the agent successfully prunes the search space to answer binary causal questions by visiting less than 30 nodes per question compared to over 3,000 nodes by a naive breadth-first search. Our ablation study indicates that our supervised learning strategy provides a strong foundation upon which our reinforcement learning agent improves. The paths returned by our agent explain the mechanisms by which a cause produces an effect. Moreover, for each edge on a path, CauseNet stores its original source on the web allowing for easy verification of paths.
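A toy illustration of pruned path search over a cause-effect graph in the spirit of the approach above; the tiny graph and the lexical-overlap heuristic are hypothetical stand-ins for CauseNet and the learned actor-critic policy.

```python
graph = {
    "smoking": ["tar deposits", "stress"],
    "tar deposits": ["lung damage"],
    "lung damage": ["cancer"],
    "stress": ["insomnia"],
}

def guided_search(cause, effect, score, max_hops=4):
    """Greedily follow the highest-scoring outgoing edge instead of exploring the whole graph."""
    path, node = [cause], cause
    for _ in range(max_hops):
        if node == effect:
            return path                                        # explanatory path found
        neighbors = graph.get(node, [])
        if not neighbors:
            return None
        node = max(neighbors, key=lambda n: score(n, effect))  # stand-in for the learned policy
        path.append(node)
    return path if node == effect else None

score = lambda node, effect: len(set(node) & set(effect))      # hypothetical heuristic
print(guided_search("smoking", "cancer", score))
```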

Learning Independently from Causality in Multi-Agent Environments

  • paper_url: http://arxiv.org/abs/2311.02741
  • repo_url: None
  • paper_authors: Rafael Pina, Varuna De Silva, Corentin Artaud
  • for: This paper investigates the lazy agent pathology in Multi-Agent Reinforcement Learning (MARL) from a causality-based perspective.
  • methods: A fully decentralised MARL setup is studied in which agents must learn cooperation strategies, and causality is used to link individual observations to the team reward.
  • results: Experiments show that exploiting this causal relation improves independent agents in MARL, leading not only to better team performance but also to more intelligent behaviours in individual agents.
    Abstract Multi-Agent Reinforcement Learning (MARL) comprises an area of growing interest in the field of machine learning. Despite notable advances, there are still problems that require investigation. The lazy agent pathology is a famous problem in MARL that denotes the event when some of the agents in a MARL team do not contribute to the common goal, letting the teammates do all the work. In this work, we aim to investigate this problem from a causality-based perspective. We intend to create the bridge between the fields of MARL and causality and argue about the usefulness of this link. We study a fully decentralised MARL setup where agents need to learn cooperation strategies and show that there is a causal relation between individual observations and the team reward. The experiments carried show how this relation can be used to improve independent agents in MARL, resulting not only on better performances as a team but also on the rise of more intelligent behaviours on individual agents.

AV-Lip-Sync+: Leveraging AV-HuBERT to Exploit Multimodal Inconsistency for Video Deepfake Detection

  • paper_url: http://arxiv.org/abs/2311.02733
  • repo_url: None
  • paper_authors: Sahibzada Adil Shahzad, Ammarah Hashmi, Yan-Tsung Peng, Yu Tsao, Hsin-Min Wang
  • for: To prevent the spread of manipulated multimedia content, in particular fake news and false propaganda, through timely video deepfake detection.
  • methods: A multi-modal self-supervised learning (SSL) feature extractor (AV-HuBERT) captures inconsistencies between the video and audio modalities, combined with a multi-scale temporal convolutional network for audio-visual temporal correlation and an additional transformer-based video model for facial features.
  • results: The model outperforms all existing models and achieves new state-of-the-art performance on the FakeAVCeleb and DeepfakeTIMIT datasets.
    Abstract Multimodal manipulations (also known as audio-visual deepfakes) make it difficult for unimodal deepfake detectors to detect forgeries in multimedia content. To avoid the spread of false propaganda and fake news, timely detection is crucial. The damage to either modality (i.e., visual or audio) can only be discovered through multi-modal models that can exploit both pieces of information simultaneously. Previous methods mainly adopt uni-modal video forensics and use supervised pre-training for forgery detection. This study proposes a new method based on a multi-modal self-supervised-learning (SSL) feature extractor to exploit inconsistency between audio and visual modalities for multi-modal video forgery detection. We use the transformer-based SSL pre-trained Audio-Visual HuBERT (AV-HuBERT) model as a visual and acoustic feature extractor and a multi-scale temporal convolutional neural network to capture the temporal correlation between the audio and visual modalities. Since AV-HuBERT only extracts visual features from the lip region, we also adopt another transformer-based video model to exploit facial features and capture spatial and temporal artifacts caused during the deepfake generation process. Experimental results show that our model outperforms all existing models and achieves new state-of-the-art performance on the FakeAVCeleb and DeepfakeTIMIT datasets.

Extraction of Atypical Aspects from Customer Reviews: Datasets and Experiments with Language Models

  • paper_url: http://arxiv.org/abs/2311.02702
  • repo_url: https://github.com/smitanannaware/xtrata
  • paper_authors: Smita Nannaware, Erfan Al-Hossami, Razvan Bunescu
  • for: This work introduces the task of detecting atypical aspects in customer reviews, which could be leveraged for recommendations that create serendipitous experiences and increase user satisfaction.
  • methods: Benchmark datasets of reviews in three domains (restaurants, hotels, and hair salons) are manually annotated and used to evaluate a range of language models.
  • results: Fine-tuning Flan-T5 and zero-shot and few-shot prompting of GPT-3.5 are found to detect atypical aspects in customer reviews accurately (see the prompt sketch after the abstract).
    Abstract A restaurant dinner may become a memorable experience due to an unexpected aspect enjoyed by the customer, such as an origami-making station in the waiting area. If aspects that are atypical for a restaurant experience were known in advance, they could be leveraged to make recommendations that have the potential to engender serendipitous experiences, further increasing user satisfaction. Although relatively rare, whenever encountered, atypical aspects often end up being mentioned in reviews due to their memorable quality. Correspondingly, in this paper we introduce the task of detecting atypical aspects in customer reviews. To facilitate the development of extraction models, we manually annotate benchmark datasets of reviews in three domains - restaurants, hotels, and hair salons, which we use to evaluate a number of language models, ranging from fine-tuning the instruction-based text-to-text transformer Flan-T5 to zero-shot and few-shot prompting of GPT-3.5.
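A small sketch of few-shot prompting for this task; the instruction wording and the in-context examples are hypothetical and not taken from the paper's benchmark datasets.

```python
# Hypothetical in-context examples for atypical-aspect extraction.
FEW_SHOT = [
    ("Great pasta, and the waiting area has an origami-making station!",
     "origami-making station in the waiting area"),
    ("Service was slow but the pizza was decent.",
     "none"),
]

def build_prompt(review: str) -> str:
    lines = ["Extract any atypical aspects mentioned in the review, or answer 'none'.", ""]
    for text, aspects in FEW_SHOT:
        lines += [f"Review: {text}", f"Atypical aspects: {aspects}", ""]
    lines += [f"Review: {review}", "Atypical aspects:"]
    return "\n".join(lines)

print(build_prompt("The hair salon offers free espresso and a small art gallery."))
```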

Architecture Matters: Uncovering Implicit Mechanisms in Graph Contrastive Learning

  • paper_url: http://arxiv.org/abs/2311.02687
  • repo_url: None
  • paper_authors: Xiaojun Guo, Yifei Wang, Zeming Wei, Yisen Wang
  • for: A systematic study of graph contrastive learning (GCL) methods and of how their behaviour differs from visual contrastive learning (VCL).
  • methods: The observed properties are explained by analysing how the implicit inductive bias of GNNs operates in contrastive learning.
  • results: GCL differs markedly from VCL: positive samples are not a must; negative samples are not necessary for graph classification, nor for node classification when specific normalization modules are used; and data augmentation has much less influence, since simple domain-agnostic augmentations (e.g., Gaussian noise) also attain fairly good performance.
    Abstract With the prosperity of contrastive learning for visual representation learning (VCL), it is also adapted to the graph domain and yields promising performance. However, through a systematic study of various graph contrastive learning (GCL) methods, we observe that some common phenomena among existing GCL methods that are quite different from the original VCL methods, including 1) positive samples are not a must for GCL; 2) negative samples are not necessary for graph classification, neither for node classification when adopting specific normalization modules; 3) data augmentations have much less influence on GCL, as simple domain-agnostic augmentations (e.g., Gaussian noise) can also attain fairly good performance. By uncovering how the implicit inductive bias of GNNs works in contrastive learning, we theoretically provide insights into the above intriguing properties of GCL. Rather than directly porting existing VCL methods to GCL, we advocate for more attention toward the unique architecture of graph learning and consider its implicit influence when designing GCL methods. Code is available at https://github.com/PKU-ML/ArchitectureMattersGCL.

Compute at Scale – A Broad Investigation into the Data Center Industry

  • paper_url: http://arxiv.org/abs/2311.02651
  • repo_url: None
  • paper_authors: Konstantin Pilz, Lennart Heim
  • for: This report characterizes the data center industry and its importance for AI development.
  • methods: The report describes key features of data centers, such as large-scale compute clusters with extensive cooling and power needs, fast connectivity, and an emphasis on security and reliability, along with important actors, business models, main inputs, and typical locations.
  • results: The global data center industry is valued at approximately $250B and is expected to double over the next seven years; there are likely about 500 large (above 10 MW) data centers globally, with the US, Europe, and China being the most important markets.
    Abstract This report characterizes the data center industry and its importance for AI development. Data centers are industrial facilities that efficiently provide compute at scale and thus constitute the engine rooms of today's digital economy. As large-scale AI training and inference become increasingly computationally expensive, they are dominantly executed from this designated infrastructure. Key features of data centers include large-scale compute clusters that require extensive cooling and consume large amounts of power, the need for fast connectivity both within the data center and to the internet, and an emphasis on security and reliability. The global industry is valued at approximately $250B and is expected to double over the next seven years. There are likely about 500 large (above 10 MW) data centers globally, with the US, Europe, and China constituting the most important markets. The report further covers important actors, business models, main inputs, and typical locations of data centers.

New Approach for an Affective Computing-Driven Quality of Experience (QoE) Prediction

  • paper_url: http://arxiv.org/abs/2311.02647
  • repo_url: None
  • paper_authors: Joshua Bègue, Mohamed Aymen Labiod, Abdelhamid Melloulk
  • for: This paper proposes an affective computing-driven Quality of Experience (QoE) prediction model for multimedia QoE assessment.
  • methods: Multi-channel electroencephalogram (EEG) signals are processed; differential entropy and power spectral density are extracted over three-second observation windows (see the feature sketch after the abstract), and several deep-learning models are trained on these features to investigate whether QoE can be predicted from five different factors.
  • results: On a publicly available dataset containing EEG, ECG, and respiratory data, an LSTM-based model gives the best results, with F1-scores between 68% and 78%; analysis shows the Delta frequency band is the least necessary, two electrodes have higher importance, and two others have very little impact on performance.
    Abstract In human interactions, emotion recognition is crucial. For this reason, the topic of computer-vision approaches for automatic emotion recognition is currently being extensively researched. Processing multi-channel electroencephalogram (EEG) information is one of the most researched methods for automatic emotion recognition. This paper presents a new model for an affective computing-driven Quality of Experience (QoE) prediction. In order to validate the proposed model, a publicly available dataset is used. The dataset contains EEG, ECG, and respiratory data and is focused on a multimedia QoE assessment context. The EEG data are retained on which the differential entropy and the power spectral density are calculated with an observation window of three seconds. These two features were extracted to train several deep-learning models to investigate the possibility of predicting QoE with five different factors. The performance of these models is compared, and the best model is optimized to improve the results. The best results were obtained with an LSTM-based model, presenting an F1-score from 68% to 78%. An analysis of the model and its features shows that the Delta frequency band is the least necessary, that two electrodes have a higher importance, and that two other electrodes have a very low impact on the model's performances.
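A minimal sketch of the two features described above, computed per channel on a three-second window; the sampling rate and the frequency-band edges are assumptions.

```python
import numpy as np
from scipy.signal import welch

def de_psd_features(window, fs=128, bands=((1, 4), (4, 8), (8, 14), (14, 31), (31, 50))):
    """Differential entropy (Gaussian assumption) and band-averaged PSD per EEG channel."""
    feats = []
    for ch in window:                                                 # window: (n_channels, 3 * fs)
        de = 0.5 * np.log(2 * np.pi * np.e * np.var(ch) + 1e-12)      # differential entropy
        freqs, psd = welch(ch, fs=fs, nperseg=fs)                     # Welch power spectral density
        band_power = [psd[(freqs >= lo) & (freqs < hi)].mean() for lo, hi in bands]
        feats.append([de, *band_power])
    return np.asarray(feats)                                          # (n_channels, 1 + n_bands)

window = np.random.randn(32, 3 * 128)                                 # 32 channels, 3-second window
print(de_psd_features(window).shape)
```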

PotholeGuard: A Pothole Detection Approach by Point Cloud Semantic Segmentation

  • paper_url: http://arxiv.org/abs/2311.02641
  • repo_url: None
  • paper_authors: Sahil Nawale, Dhruv Khut, Daksh Dave, Gauransh Sawhney, Pushkar Aggrawal, Dr. Kailas Devadakar
  • for: To provide a robust and accurate 3D pothole segmentation method for road safety and maintenance.
  • methods: A point cloud semantic segmentation architecture with a feedback mechanism that recovers hidden features and enhances local characteristics, a local relationship learning module for local shape relationships, a lightweight adaptive structure that refines local point features with the K nearest neighbor algorithm, and Shared MLP Pooling for deep aggregation features.
  • results: Extensive experiments on three public datasets show that PotholeGuard outperforms state-of-the-art methods.
    Abstract Pothole detection is crucial for road safety and maintenance, traditionally relying on 2D image segmentation. However, existing 3D Semantic Pothole Segmentation research often overlooks point cloud sparsity, leading to suboptimal local feature capture and segmentation accuracy. Our research presents an innovative point cloud-based pothole segmentation architecture. Our model efficiently identifies hidden features and uses a feedback mechanism to enhance local characteristics, improving feature presentation. We introduce a local relationship learning module to understand local shape relationships, enhancing structural insights. Additionally, we propose a lightweight adaptive structure for refining local point features using the K nearest neighbor algorithm, addressing point cloud density differences and domain selection. Shared MLP Pooling is integrated to learn deep aggregation features, facilitating semantic data exploration and segmentation guidance. Extensive experiments on three public datasets confirm PotholeGuard's superior performance over state-of-the-art methods. Our approach offers a promising solution for robust and accurate 3D pothole segmentation, with applications in road maintenance and safety.

Assessing the Promise and Pitfalls of ChatGPT for Automated Code Generation

  • paper_url: http://arxiv.org/abs/2311.02640
  • repo_url: https://github.com/dsaatusu/chatgpt-promises-and-pitfalls
  • paper_authors: Muhammad Fawad Akbar Khan, Max Ramsdell, Erik Falor, Hamid Karimi
  • for: To evaluate the code generation capabilities of the ChatGPT large language model against those of human programmers.
  • methods: A novel dataset of 131 code-generation prompts across 5 categories was curated; code produced by ChatGPT and by humans (262 samples in total) was compared through a manual assessment of correctness, comprehensibility, and security using 14 established code quality metrics.
  • results: ChatGPT shows strong capability on data analysis tasks (93.1% accuracy) but limitations on visual-graphical challenges; its code tends towards modular design and superior error handling, and machine learning models can distinguish ChatGPT code from human code with up to 88% accuracy, suggesting detectable differences in coding style.
    Abstract This paper presents a comprehensive evaluation of the code generation capabilities of ChatGPT, a prominent large language model, compared to human programmers. A novel dataset of 131 code-generation prompts across 5 categories was curated to enable robust analysis. Code solutions were generated by both ChatGPT and humans for all prompts, resulting in 262 code samples. A meticulous manual assessment methodology prioritized evaluating correctness, comprehensibility, and security using 14 established code quality metrics. The key findings reveal ChatGPT's strengths in crafting concise, efficient code with advanced constructs, showcasing strengths in data analysis tasks (93.1% accuracy) but limitations in visual-graphical challenges. Comparative analysis with human code highlights ChatGPT's inclination towards modular design and superior error handling. Additionally, machine learning models effectively distinguished ChatGPT from human code with up to 88% accuracy, suggesting detectable coding style disparities. By providing profound insights into ChatGPT's code generation capabilities and limitations through quantitative metrics and qualitative analysis, this study makes valuable contributions toward advancing AI-based programming assistants. The curated dataset and methodology offer a robust foundation for future research in this nascent domain. All data and codes are available on https://github.com/DSAatUSU/ChatGPT-promises-and-pitfalls.

A Critical Perceptual Pre-trained Model for Complex Trajectory Recovery

  • paper_url: http://arxiv.org/abs/2311.02631
  • repo_url: None
  • paper_authors: Dedong Li, Ziyue Li, Zhishuai Li, Lei Bai, Qingyuan Gong, Lijun Sun, Wolfgang Ketter, Rui Zhao
  • for: To improve the accuracy of complex road-trajectory recovery, especially for trajectories that cross remote road segments or make several turns (critical nodes).
  • methods: Road segment representation vectors are learned with a sequential language model in a pre-trained manner; trajectory complexity is defined from a detour score and an entropy score (see the sketch after the abstract), complexity-aware semantic graphs are constructed, and a Multi-view Graph and Complexity Aware Transformer (MGCAT) adaptively aggregates multi-view graph features according to trajectory pattern and pays higher attention to critical nodes in complex trajectories.
  • results: Extensive experiments on large-scale datasets show the method learns better representations for trajectory recovery, with a 5.22% higher F1-score overall and an 8.16% higher F1-score for complex trajectories in particular.
    Abstract The trajectory on the road traffic is commonly collected at a low sampling rate, and trajectory recovery aims to recover a complete and continuous trajectory from the sparse and discrete inputs. Recently, sequential language models have been innovatively adopted for trajectory recovery in a pre-trained manner: it learns road segment representation vectors, which will be used in the downstream tasks. However, existing methods are incapable of handling complex trajectories: when the trajectory crosses remote road segments or makes several turns, which we call critical nodes, the quality of learned representations deteriorates, and the recovered trajectories skip the critical nodes. This work is dedicated to offering a more robust trajectory recovery for complex trajectories. Firstly, we define the trajectory complexity based on the detour score and entropy score and construct the complexity-aware semantic graphs correspondingly. Then, we propose a Multi-view Graph and Complexity Aware Transformer (MGCAT) model to encode these semantics in trajectory pre-training from two aspects: 1) adaptively aggregate the multi-view graph features considering trajectory pattern, and 2) higher attention to critical nodes in a complex trajectory. Such that, our MGCAT is perceptual when handling the critical scenario of complex trajectories. Extensive experiments are conducted on large-scale datasets. The results prove that our method learns better representations for trajectory recovery, with 5.22% higher F1-score overall and 8.16% higher F1-score for complex trajectories particularly. The code is available at https://github.com/bonaldli/ComplexTraj.
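A hedged illustration of the two complexity signals named in the methods bullet; the paper's exact definitions are not reproduced here, so the detour score is taken as route length over straight-line distance and the entropy score as the entropy of the heading-change distribution.

```python
import numpy as np

def detour_score(points):
    """Route length divided by the straight-line distance between endpoints."""
    seg = np.linalg.norm(np.diff(points, axis=0), axis=1).sum()
    direct = np.linalg.norm(points[-1] - points[0]) + 1e-9
    return seg / direct

def entropy_score(points, bins=8):
    """Entropy of the distribution of heading changes (turns) along the trajectory."""
    dx, dy = np.diff(points, axis=0).T
    turns = np.diff(np.arctan2(dy, dx))
    hist, _ = np.histogram(turns, bins=bins, range=(-np.pi, np.pi))
    p = hist / max(hist.sum(), 1)
    return float(-(p[p > 0] * np.log(p[p > 0])).sum())

traj = np.array([[0, 0], [1, 0], [1, 1], [2, 1], [2, 2]], dtype=float)
print(detour_score(traj), entropy_score(traj))
```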

The New Frontier of Cybersecurity: Emerging Threats and Innovations

  • paper_url: http://arxiv.org/abs/2311.02630
  • repo_url: None
  • paper_authors: Daksh Dave, Gauransh Sawhney, Pushkar Aggarwal, Nitish Silswal, Dhruv Khut
  • for: This study comprehensively examines cybersecurity threats across four primary categories: malware attacks, social engineering attacks, network vulnerabilities, and data breaches.
  • methods: A qualitative research approach is used to examine the impacts of these threats on individuals, organizations, and society at large.
  • results: The study identifies key emerging threats, including advanced persistent threats, ransomware attacks, Internet of Things (IoT) vulnerabilities, and social engineering exploits, which pose substantial risks to organizations and individuals; mitigating them requires a multi-layered approach combining robust security measures, comprehensive employee training, and regular security audits.
    Abstract In today's digitally interconnected world, cybersecurity threats have reached unprecedented levels, presenting a pressing concern for individuals, organizations, and governments. This study employs a qualitative research approach to comprehensively examine the diverse threats of cybersecurity and their impacts across various sectors. Four primary categories of threats are identified and analyzed, encompassing malware attacks, social engineering attacks, network vulnerabilities, and data breaches. The research delves into the consequences of these threats on individuals, organizations, and society at large. The findings reveal a range of key emerging threats in cybersecurity, including advanced persistent threats, ransomware attacks, Internet of Things (IoT) vulnerabilities, and social engineering exploits. Consequently, it is evident that emerging cybersecurity threats pose substantial risks to both organizations and individuals. The sophistication and diversity of these emerging threats necessitate a multi-layered approach to cybersecurity. This approach should include robust security measures, comprehensive employee training, and regular security audits. The implications of these emerging threats are extensive, with potential consequences such as financial loss, reputational damage, and compromised personal information. This study emphasizes the importance of implementing effective measures to mitigate these threats. It highlights the significance of using strong passwords, encryption methods, and regularly updating software to bolster cyber defenses.

AIOps-Driven Enhancement of Log Anomaly Detection in Unsupervised Scenarios

  • paper_url: http://arxiv.org/abs/2311.02621
  • repo_url: None
  • paper_authors: Daksh Dave, Gauransh Sawhney, Dhruv Khut, Sahil Nawale, Pushkar Aggrawal, Prasenjit Bhavathankar
  • for: This study aims to improve log anomaly detection in AIOps platforms and to address gaps in existing research.
  • methods: A novel hybrid framework combines unsupervised strategies, integrating Principal Component Analysis (PCA) and Artificial Neural Networks (ANNs) with a custom loss function, and processes logs in their raw, unprocessed form (see the sketch after the abstract).
  • results: Experiments on simulated and real-world datasets, including logs from SockShop and the Hadoop Distributed File System (HDFS), show significant reductions in pseudo-positives.
    Abstract Artificial intelligence operations (AIOps) play a pivotal role in identifying, mitigating, and analyzing anomalous system behaviors and alerts. However, the research landscape in this field remains limited, leaving significant gaps unexplored. This study introduces a novel hybrid framework through an innovative algorithm that incorporates an unsupervised strategy. This strategy integrates Principal Component Analysis (PCA) and Artificial Neural Networks (ANNs) and uses a custom loss function to substantially enhance the effectiveness of log anomaly detection. The proposed approach encompasses the utilization of both simulated and real-world datasets, including logs from SockShop and Hadoop Distributed File System (HDFS). The experimental results are highly promising, demonstrating significant reductions in pseudo-positives. Moreover, this strategy offers notable advantages, such as the ability to process logs in their raw, unprocessed form, and the potential for further enhancements. The successful implementation of this approach showcases a remarkable reduction in anomalous logs, thus unequivocally establishing the efficacy of the proposed methodology. Ultimately, this study makes a substantial contribution to the advancement of log anomaly detection within AIOps platforms, addressing the critical need for effective and efficient log analysis in modern and complex systems.
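An unofficial sketch in the spirit of the PCA-plus-ANN hybrid described above (the paper's custom loss function is not reproduced): log feature vectors are projected with PCA, a small network learns to reconstruct them, and events with unusually large reconstruction error are flagged.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
normal_logs = rng.normal(size=(500, 40))                      # stand-in for vectorized log lines
test_logs = np.vstack([rng.normal(size=(95, 40)),
                       rng.normal(5.0, 1.0, size=(5, 40))])   # last 5 rows are injected anomalies

pca = PCA(n_components=10).fit(normal_logs)
z_train = pca.transform(normal_logs)
ae = MLPRegressor(hidden_layer_sizes=(8,), max_iter=2000, random_state=0).fit(z_train, z_train)

z_test = pca.transform(test_logs)
train_err = np.mean((ae.predict(z_train) - z_train) ** 2, axis=1)
test_err = np.mean((ae.predict(z_test) - z_test) ** 2, axis=1)
threshold = np.percentile(train_err, 99)                      # flag events far above normal error
print("flagged anomalies:", int((test_err > threshold).sum()))
```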

Get the Ball Rolling: Alerting Autonomous Robots When to Help to Close the Healthcare Loop

  • paper_url: http://arxiv.org/abs/2311.02602
  • repo_url: None
  • paper_authors: Jiaxin Shen, Yanyao Liu, Ziming Wang, Ziyuan Jiao, Yufeng Chen, Wenjuan Han
  • for: To advance research on healthcare robots that operate without human intervention or commands, introducing the Autonomous Helping Challenge together with a crowd-sourced large-scale dataset.
  • methods: The goal is healthcare robots that can determine when assistance is necessary, generate useful sub-tasks to aid planning, carry out these plans through a physical robot, and receive feedback from the environment in order to generate new tasks and continue the process.
  • results: Beyond the general challenge of open-ended scenarios, the work targets autonomous task generation, the gap between the current scene and static commonsense, and the gap between language instructions and the real world; Helpy is proposed as a potential approach to close the healthcare loop in a learning-free setting.
    Abstract To facilitate the advancement of research in healthcare robots without human intervention or commands, we introduce the Autonomous Helping Challenge, along with a crowd-sourcing large-scale dataset. The goal is to create healthcare robots that possess the ability to determine when assistance is necessary, generate useful sub-tasks to aid in planning, carry out these plans through a physical robot, and receive feedback from the environment in order to generate new tasks and continue the process. Besides the general challenge in open-ended scenarios, Autonomous Helping focuses on three specific challenges: autonomous task generation, the gap between the current scene and static commonsense, and the gap between language instruction and the real world. Additionally, we propose Helpy, a potential approach to close the healthcare loop in the learning-free setting.

Automated Camera Calibration via Homography Estimation with GNNs

  • paper_url: http://arxiv.org/abs/2311.02598
  • repo_url: None
  • paper_authors: Giacomo D’Amicantonio, Egor Bondarev, Peter H. N. De With
  • for: This paper proposes a new approach for accurate, automated calibration of traffic-monitoring cameras.
  • methods: A set of synthetic intersection viewpoint images is generated from a bird's-eye-view image and framed as a graph of virtual cameras; a Graph Neural Network learns the relationships within this graph to estimate a homography matrix for any real-world camera, leveraging the neighbourhood representation and multiple images instead of a single match, from which extrinsic calibration parameters are retrieved (see the sketch after the abstract).
  • results: The framework achieves superior performance on both synthetic datasets and real-world cameras, setting a new state-of-the-art benchmark.
    Abstract Over the past few decades, a significant rise of camera-based applications for traffic monitoring has occurred. Governments and local administrations are increasingly relying on the data collected from these cameras to enhance road safety and optimize traffic conditions. However, for effective data utilization, it is imperative to ensure accurate and automated calibration of the involved cameras. This paper proposes a novel approach to address this challenge by leveraging the topological structure of intersections. We propose a framework involving the generation of a set of synthetic intersection viewpoint images from a bird's-eye-view image, framed as a graph of virtual cameras to model these images. Using the capabilities of Graph Neural Networks, we effectively learn the relationships within this graph, thereby facilitating the estimation of a homography matrix. This estimation leverages the neighbourhood representation for any real-world camera and is enhanced by exploiting multiple images instead of a single match. In turn, the homography matrix allows the retrieval of extrinsic calibration parameters. As a result, the proposed framework demonstrates superior performance on both synthetic datasets and real-world cameras, setting a new state-of-the-art benchmark.
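A sketch of the downstream calibration step under assumptions: in the paper the GNN produces the homography between the bird's-eye view and a real camera, whereas here it is estimated with OpenCV from hypothetical point correspondences and then decomposed into candidate extrinsics given assumed intrinsics.

```python
import cv2
import numpy as np

# Hypothetical correspondences between bird's-eye-view coordinates and image pixels.
bev_pts = np.array([[0, 0], [10, 0], [10, 20], [0, 20]], dtype=np.float32)
img_pts = np.array([[120, 400], [520, 410], [610, 90], [80, 100]], dtype=np.float32)

H, _ = cv2.findHomography(bev_pts, img_pts)                 # stand-in for the GNN's estimate
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])                             # assumed camera intrinsics

# Recover candidate rotations/translations consistent with the homography.
n_solutions, rotations, translations, normals = cv2.decomposeHomographyMat(H, K)
print(n_solutions, rotations[0])
```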

FloodBrain: Flood Disaster Reporting by Web-based Retrieval Augmented Generation with an LLM

  • paper_url: http://arxiv.org/abs/2311.02597
  • repo_url: None
  • paper_authors: Grace Colverd, Paul Darm, Leonard Silverberg, Noah Kasmanoff
  • for: Fast disaster impact reporting is crucial for planning humanitarian assistance.
  • methods: Large language models (LLMs), which can perform tasks such as text generation and question answering, are combined with information extracted and curated from web search results to generate flood disaster impact reports.
  • results: Reports generated with different LLM backbones are compared to human-written reports on several metrics; GPT-4 scores correlate notably with human evaluators' scores, and an ablation study shows the relevance of individual pipeline components. The tool aims to advance the use of LLMs for disaster impact reporting and reduce coordination time for humanitarian efforts after flood disasters.
    Abstract Fast disaster impact reporting is crucial in planning humanitarian assistance. Large Language Models (LLMs) are well known for their ability to write coherent text and fulfill a variety of tasks relevant to impact reporting, such as question answering or text summarization. However, LLMs are constrained by the knowledge within their training data and are prone to generating inaccurate, or "hallucinated", information. To address this, we introduce a sophisticated pipeline embodied in our tool FloodBrain (floodbrain.com), specialized in generating flood disaster impact reports by extracting and curating information from the web. Our pipeline assimilates information from web search results to produce detailed and accurate reports on flood events. We test different LLMs as backbones in our tool and compare their generated reports to human-written reports on different metrics. Similar to other studies, we find a notable correlation between the scores assigned by GPT-4 and the scores given by human evaluators when comparing our generated reports to human-authored ones. Additionally, we conduct an ablation study to test our single pipeline components and their relevancy for the final reports. With our tool, we aim to advance the use of LLMs for disaster impact reporting and reduce the time for coordination of humanitarian efforts in the wake of flood disasters.

scBeacon: single-cell biomarker extraction via identifying paired cell clusters across biological conditions with contrastive siamese networks

  • paper_url: http://arxiv.org/abs/2311.02594
  • repo_url: None
  • paper_authors: Chenyu Liu, Kweon Yong Jin, Jun Ding
  • for: To improve biomarker discovery at the single-cell level, in particular the analysis of interactions between biological conditions such as healthy versus diseased states.
  • methods: scBeacon is an unsupervised framework built on a deep contrastive siamese network; using a VQ-VAE backbone and a greedy iterative strategy, it identifies matched cell populations (paired cell clusters) across conditions, enabling a refined differential gene analysis.
  • results: Comprehensive evaluations on a diverse array of datasets show that scBeacon outperforms existing single-cell differential gene analysis tools in precision and adaptability.
    Abstract Despite the breakthroughs in biomarker discovery facilitated by differential gene analysis, challenges remain, particularly at the single-cell level. Traditional methodologies heavily rely on user-supplied cell annotations, focusing on individually expressed data, often neglecting the critical interactions between biological conditions, such as healthy versus diseased states. In response, here we introduce scBeacon, an innovative framework built upon a deep contrastive siamese network. scBeacon pioneers an unsupervised approach, adeptly identifying matched cell populations across varied conditions, enabling a refined differential gene analysis. By utilizing a VQ-VAE framework, a contrastive siamese network, and a greedy iterative strategy, scBeacon effectively pinpoints differential genes that hold potential as key biomarkers. Comprehensive evaluations on a diverse array of datasets validate scBeacon's superiority over existing single-cell differential gene analysis tools. Its precision and adaptability underscore its significant role in enhancing diagnostic accuracy in biomarker discovery. With the emphasis on the importance of biomarkers in diagnosis, scBeacon is positioned to be a pivotal asset in the evolution of personalized medicine and targeted treatments.

Differentially Private Pre-Trained Model Fusion using Decentralized Federated Graph Matching

  • paper_url: http://arxiv.org/abs/2311.03396
  • repo_url: None
  • paper_authors: Qian Chen, Yiqiang Chen, Xinlong Jiang, Teng Zhang, Weiwei Dai, Wuliang Huang, Zhen Yan, Bo Ye
  • for: To provide a privacy-preserving model fusion method for model-as-a-service scenarios, enabling the delivery of high-quality model services to local users.
  • methods: PrivFusion uses a graph-based architecture to fuse models from multiple parties without retraining, under local differential privacy; a hybrid local differentially private mechanism and decentralized federated graph matching protect both activation values and weights, and a perturbation filter adapter mitigates the impact of randomized noise.
  • results: Experiments on diverse image datasets and real-world healthcare applications show that PrivFusion maintains model performance while preserving privacy.
    Abstract Model fusion is becoming a crucial component in the context of model-as-a-service scenarios, enabling the delivery of high-quality model services to local users. However, this approach introduces privacy risks and imposes certain limitations on its applications. Ensuring secure model exchange and knowledge fusion among users becomes a significant challenge in this setting. To tackle this issue, we propose PrivFusion, a novel architecture that preserves privacy while facilitating model fusion under the constraints of local differential privacy. PrivFusion leverages a graph-based structure, enabling the fusion of models from multiple parties without necessitating retraining. By employing randomized mechanisms, PrivFusion ensures privacy guarantees throughout the fusion process. To enhance model privacy, our approach incorporates a hybrid local differentially private mechanism and decentralized federated graph matching, effectively protecting both activation values and weights. Additionally, we introduce a perturbation filter adapter to alleviate the impact of randomized noise, thereby preserving the utility of the fused model. Through extensive experiments conducted on diverse image datasets and real-world healthcare applications, we provide empirical evidence showcasing the effectiveness of PrivFusion in maintaining model performance while preserving privacy. Our contributions offer valuable insights and practical solutions for secure and collaborative data analysis within the domain of privacy-preserving model fusion.

Newvision: application for helping blind people using deep learning

  • paper_url: http://arxiv.org/abs/2311.03395
  • repo_url: None
  • paper_authors: Kumar Srinivas Bobba, Kartheeban K, Vamsi Krishna Sai Boddu, Vijaya Mani Surendra Bolla, Dinesh Bugga
  • for: Helping visually impaired people move independently in their daily lives and improving their quality of life.
  • methods: Combines computer vision, distance estimation with ultrasonic sensors, voice recognition, and voice assistants to provide users with real-time information about their environment.
  • results: The headgear helps visually impaired users navigate their surroundings, identify objects and people, read text, and avoid obstacles.
    Abstract As able-bodied people, we often take our vision for granted. For people who are visually impaired, however, their disability can have a significant impact on their daily lives. We are developing proprietary headgear that will help visually impaired people navigate their surroundings, identify objects and people, read text, and avoid obstacles. The headgear will use a combination of computer vision, distance estimation with ultrasonic sensors, voice recognition, and voice assistants to provide users with real-time information about their environment. Users will be able to interact with the headgear through voice commands, such as ''What is that?'' to identify an object or ''Navigate to the front door'' to find their way around. The headgear will then provide the user with a verbal description of the object or spoken navigation instructions. We believe that this headgear has the potential to make a significant difference in the lives of visually impaired people, allowing them to live more independently and participate more fully in society.
    摘要 As 能够的人们,我们经常忽略我们的视力。但对于有视力障碍的人们,他们的障碍可能会对他们的日常生活产生深远的影响。我们正在开发专有的头盔,帮助有视力障碍的人们在环境中导航、识别物体和人员、阅读文本,并避免障碍。这个头盔使用计算机视觉、ultrasonic探测、语音识别和语音助手等技术,为用户提供实时环境信息。用户可以通过声音命令,如 ''什么是那?'' 识别物体,或 ''导航到门口'' 查找方向。头盔然后为用户提供物体的声音描述或导航说明。我们认为这个头盔有可能对有视力障碍人员的生活产生深远的影响,让他们更独立地生活,更全面地参与社会。

KITS: Inductive Spatio-Temporal Kriging with Increment Training Strategy

  • paper_url: http://arxiv.org/abs/2311.02565
  • repo_url: None
  • paper_authors: Qianxiong Xu, Cheng Long, Ziyue Li, Sijie Ruan, Rui Zhao, Zhishuai Li
  • For: This paper proposes a new method called KITS (Kriging with Increment Training Strategy) to address the issue of graph gap in inductive spatio-temporal kriging methods based on graph neural networks.* Methods: The KITS method adds virtual nodes to the training graph to mitigate the graph gap issue, and pairs each virtual node with its most similar observed node to fuse their features together. The method also constructs reliable pseudo labels for virtual nodes to enhance the supervision signal.* Results: The KITS method consistently outperforms existing kriging methods by large margins, with an improvement over MAE score of up to 18.33%.
    Abstract Sensors are commonly deployed to perceive the environment. However, due to the high cost, sensors are usually sparsely deployed. Kriging is the tailored task to infer the unobserved nodes (without sensors) using the observed source nodes (with sensors). The essence of kriging task is transferability. Recently, several inductive spatio-temporal kriging methods have been proposed based on graph neural networks, being trained based on a graph built on top of observed nodes via pretext tasks such as masking nodes out and reconstructing them. However, the graph in training is inevitably much sparser than the graph in inference that includes all the observed and unobserved nodes. The learned pattern cannot be well generalized for inference, denoted as graph gap. To address this issue, we first present a novel Increment training strategy: instead of masking nodes (and reconstructing them), we add virtual nodes into the training graph so as to mitigate the graph gap issue naturally. Nevertheless, the empty-shell virtual nodes without labels could have bad-learned features and lack supervision signals. To solve these issues, we pair each virtual node with its most similar observed node and fuse their features together; to enhance the supervision signal, we construct reliable pseudo labels for virtual nodes. As a result, the learned pattern of virtual nodes could be safely transferred to real unobserved nodes for reliable kriging. We name our new Kriging model with Increment Training Strategy as KITS. Extensive experiments demonstrate that KITS consistently outperforms existing kriging methods by large margins, e.g., the improvement over MAE score could be as high as 18.33%.
    摘要 感知器通常用于感知环境。然而,由于成本高昂,感知器通常会受到稀畴部署。基于树状网络的 krilling 任务可以用来推断没有感知器的节点(无感知节点)。 krilling 任务的核心思想是传播性。现在,基于图ael 神经网络的一些 inductive spatio-temporal krilling 方法已经被提出,这些方法通过在观察节点基础上建立图来进行训练,然后通过预测任务来学习。然而,训练图和推断图都包含所有观察和无感知节点,这会导致学习的模式难以在推断中 generalized。这种问题被称为图 gap。为解决这个问题,我们首先提出了一种新的增量训练策略:而不是将节点屏蔽(并重建它们),我们会将虚拟节点添加到训练图中,以mitigate the graph gap issue naturally。然而,空 shell 的虚拟节点没有标签可能会有坏学习特征和缺乏监督信号。为解决这些问题,我们将每个虚拟节点与其最相似的观察节点进行对应,并将它们的特征特性相加。此外,为增强监督信号,我们将虚拟节点的 pseudo label 建立起来。因此,学习的虚拟节点模式可以安全地传输到实际的无感知节点,以确保可靠的 krilling。我们称之为 KITS。我们的实验表明,KITS 可以大幅超过现有的 krilling 方法,例如 MAE 分数的改进率可以高达 18.33%。
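
A small numpy sketch of the Increment training idea summarized above: instead of masking observed sensors, add empty "virtual" nodes to the training graph, pair each virtual node with its most similar observed node, fuse their features, and reuse the observed node's reading as a pseudo label. The similarity measure and fusion rule here are simplified illustrations, not the exact design in KITS.

```python
import numpy as np

def add_virtual_nodes(obs_feat, obs_val, n_virtual, rng=None):
    """obs_feat: (n_obs, d) static node features; obs_val: (n_obs,) sensor readings."""
    rng = rng or np.random.default_rng(0)
    # initialise virtual nodes near the observed feature distribution ("empty shells")
    virt_feat = obs_feat[rng.integers(0, len(obs_feat), n_virtual)]
    virt_feat = virt_feat + 0.1 * rng.standard_normal(virt_feat.shape)

    # pair each virtual node with its most similar observed node (cosine similarity)
    a = virt_feat / np.linalg.norm(virt_feat, axis=1, keepdims=True)
    b = obs_feat / np.linalg.norm(obs_feat, axis=1, keepdims=True)
    nearest = (a @ b.T).argmax(axis=1)

    fused_feat = 0.5 * (virt_feat + obs_feat[nearest])   # feature fusion
    pseudo_val = obs_val[nearest]                        # pseudo labels for supervision
    return fused_feat, pseudo_val

obs_feat = np.random.randn(10, 4)
obs_val = np.random.randn(10)
virt_feat, virt_label = add_virtual_nodes(obs_feat, obs_val, n_virtual=3)
print(virt_feat.shape, virt_label.shape)
```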

Time Series Synthesis Using the Matrix Profile for Anonymization

  • paper_url: http://arxiv.org/abs/2311.02563
  • repo_url: None
  • paper_authors: Audrey Der, Chin-Chia Michael Yeh, Yan Zheng, Junpeng Wang, Huiyuan Chen, Zhongfang Zhuang, Liang Wang, Wei Zhang, Eamonn Keogh
  • for: Provides a method to synthesize time series data that can be released in place of the original data when privacy regulations or commercial confidentiality prevent sharing.
  • methods: Proposes the Time Series Synthesis Using the Matrix Profile (TSSUMP) method, which synthesizes time series while preserving similarity join information so that the data remain usable for downstream analysis tasks.
  • results: A case study shows that TSSUMP reduces the correlation between the synthesized and original time series while preserving similarity information, so that unmodified data analysis tools achieve near-identical performance on the synthesized data.
    Abstract Publishing and sharing data is crucial for the data mining community, allowing collaboration and driving open innovation. However, many researchers cannot release their data due to privacy regulations or fear of leaking confidential business information. To alleviate such issues, we propose the Time Series Synthesis Using the Matrix Profile (TSSUMP) method, where synthesized time series can be released in lieu of the original data. The TSSUMP method synthesizes time series by preserving similarity join information (i.e., Matrix Profile) while reducing the correlation between the synthesized and the original time series. As a result, neither the values for the individual time steps nor the local patterns (or shapes) from the original data can be recovered, yet the resulting data can be used for downstream tasks that data analysts are interested in. We concentrate on similarity joins because they are one of the most widely applied time series data mining routines across different data mining tasks. We test our method on a case study of ECG and gender masking prediction. In this case study, the gender information is not only removed from the synthesized time series, but the synthesized time series also preserves enough information from the original time series. As a result, unmodified data mining tools can obtain near-identical performance on the synthesized time series as on the original time series.
    摘要 发布和分享数据对数据挖掘社区至关重要,它帮助研究人员合作和推动开放创新。然而,许多研究人员无法发布自己的数据,因为隐私法规或担心泄露商业机密信息。为解决这些问题,我们提出了时间序列合成使用矩阵Profile(TSSUMP)方法,其中合成的时间序列可以代替原始数据。TSSUMP方法将时间序列合成,保持相似性Join信息(即矩阵Profile),同时减少合成时间序列和原始时间序列之间的相关性。因此,不能回归原始数据中的值,也不能回归本地特征(或形状)。然而,合成的数据仍然可以用于下游任务,数据分析师感兴趣的任务。我们专注于相似Join,因为它们是时间序列数据挖掘任务中最常用的 Routine。我们在ECG和性别遮盾预测case study中测试了我们的方法。在这个case study中, gender信息不仅从合成的时间序列中被除了,还保留了原始时间序列中的足够信息。因此,未修改的数据挖掘工具可以在合成的时间序列上获得近似于原始时间序列的性能。
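
A brute-force numpy sketch of the similarity-join information (the matrix profile) that TSSUMP aims to preserve: for every length-m subsequence, the z-normalized Euclidean distance to its nearest non-trivial-match neighbor. Real matrix profile libraries use far faster algorithms; this is only to make the preserved quantity concrete.

```python
import numpy as np

def matrix_profile(ts, m):
    n = len(ts) - m + 1
    subs = np.stack([ts[i:i + m] for i in range(n)])
    subs = (subs - subs.mean(axis=1, keepdims=True)) / (subs.std(axis=1, keepdims=True) + 1e-8)
    profile = np.full(n, np.inf)
    for i in range(n):
        dists = np.linalg.norm(subs - subs[i], axis=1)
        lo, hi = max(0, i - m // 2), min(n, i + m // 2 + 1)
        dists[lo:hi] = np.inf                 # exclude trivial matches near position i
        profile[i] = dists.min()
    return profile

ts = np.sin(np.linspace(0, 20, 500)) + 0.05 * np.random.randn(500)
print(matrix_profile(ts, m=50)[:5])
```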

Ego-Network Transformer for Subsequence Classification in Time Series Data

  • paper_url: http://arxiv.org/abs/2311.02561
  • repo_url: None
  • paper_authors: Chin-Chia Michael Yeh, Huiyuan Chen, Yujie Fan, Xin Dai, Yan Zheng, Vivian Lai, Junpeng Wang, Zhongfang Zhuang, Liang Wang, Wei Zhang, Eamonn Keogh
  • for: Addresses the problem of classifying foreground subsequences that are intertwined with background subsequences in real-world time series data.
  • methods: Proposes a new subsequence classification method that represents each subsequence as an ego-network carrying crucial nearest neighbor information.
  • results: Experiments on 128 univariate and 30 multivariate time series datasets show that the method outperforms the baseline on 104 of the 158 datasets.
    Abstract Time series classification is a widely studied problem in the field of time series data mining. Previous research has predominantly focused on scenarios where relevant or foreground subsequences have already been extracted, with each subsequence corresponding to a single label. However, real-world time series data often contain foreground subsequences that are intertwined with background subsequences. Successfully classifying these relevant subsequences requires not only distinguishing between different classes but also accurately identifying the foreground subsequences amidst the background. To address this challenge, we propose a novel subsequence classification method that represents each subsequence as an ego-network, providing crucial nearest neighbor information to the model. The ego-networks of all subsequences collectively form a time series subsequence graph, and we introduce an algorithm to efficiently construct this graph. Furthermore, we have demonstrated the significance of enforcing temporal consistency in the prediction of adjacent subsequences for the subsequence classification problem. To evaluate the effectiveness of our approach, we conducted experiments using 128 univariate and 30 multivariate time series datasets. The experimental results demonstrate the superior performance of our method compared to alternative approaches. Specifically, our method outperforms the baseline on 104 out of 158 datasets.
    摘要 时间序列分类是时间序列数据挖掘领域广泛研究的问题。先前的研究主要集中在已经提取了相关或前景 subsequences 的情况下进行研究,每个 subsequences 都对应一个单独的标签。然而,实际世界中的时间序列数据经常包含相关的前景 subsequences,需要不仅分辨不同的类型,还需要准确地识别前景 subsequences 中的相关部分。为解决这个挑战,我们提出了一种新的 subsequences 分类方法,即将每个 subsequences 表示为一个自我网络,提供了关键的最近邻居信息给模型。所有 subsequences 的ego-networks 共同形成了时间序列 subsequences 图,我们介绍了一种有效地构建这个图的算法。此外,我们还证明了在预测相邻 subsequences 时应该保持时间一致性的重要性。为评估我们的方法的有效性,我们对 128 个单variate 和 30 个多variate 时间序列数据集进行了实验。实验结果表明,我们的方法与其他方法相比,在 104 个数据集上表现出了更高的性能。具体来说,我们的方法在 158 个数据集中超过了基准值。
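
A numpy sketch of the ego-network construction summarized above: each subsequence becomes the center of a small star graph whose leaves are its k nearest neighboring subsequences, and the union of these ego-networks forms the subsequence graph fed to the classifier. The distance measure and k are illustrative choices, not the paper's exact graph-construction algorithm.

```python
import numpy as np

def ego_networks(subsequences, k=3):
    """subsequences: (n, m) array of fixed-length subsequences. Returns {i: [neighbor ids]}."""
    n = len(subsequences)
    # pairwise Euclidean distances between subsequences
    d = np.linalg.norm(subsequences[:, None, :] - subsequences[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)               # a node is not its own neighbor
    return {i: list(np.argsort(d[i])[:k]) for i in range(n)}

ts = np.cumsum(np.random.randn(300))
subs = np.stack([ts[i:i + 30] for i in range(0, 270, 10)])  # sliding windows as subsequences
graph = ego_networks(subs, k=3)
print(graph[0])
```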

Sketching Multidimensional Time Series for Fast Discord Mining

  • paper_url: http://arxiv.org/abs/2311.03393
  • repo_url: None
  • paper_authors: Chin-Chia Michael Yeh, Yan Zheng, Menghai Pan, Huiyuan Chen, Zhongfang Zhuang, Junpeng Wang, Liang Wang, Wei Zhang, Jeff M. Phillips, Eamonn Keogh
  • for: Aims to make discord-based anomaly detection scale with the dimensionality of multidimensional time series while remaining reliable.
  • methods: Uses a sketch of the matrix profile to capture time series discords and proposes a fast, sketch-based discord mining algorithm.
  • results: Experiments on several real-world applications show that the proposed algorithm improves throughput by at least an order of magnitude with minimal impact on the quality of the approximated solution; it also handles dynamic addition or deletion of dimensions, allowing data analysts to explore "what-if" scenarios in real time.
    Abstract Time series discords are a useful primitive for time series anomaly detection, and the matrix profile is capable of capturing discord effectively. There exist many research efforts to improve the scalability of discord discovery with respect to the length of time series. However, there is surprisingly little work focused on reducing the time complexity of matrix profile computation associated with the dimensionality of a multidimensional time series. In this work, we propose a sketch for discord mining among multi-dimensional time series. After an initial pre-processing of the sketch as fast as reading the data, the discord mining has runtime independent of the dimensionality of the original data. On several real world examples from water treatment and transportation, the proposed algorithm improves the throughput by at least an order of magnitude (50X) and only has minimal impact on the quality of the approximated solution. Additionally, the proposed method can handle the dynamic addition or deletion of dimensions with inconsequential overhead. This allows a data analyst to consider "what-if" scenarios in real time while exploring the data.
    摘要 时序列冲突是一种有用的原始 primitives для时序列异常检测,matrix profile 可以有效地捕捉冲突。有很多研究努力以提高时序列冲突发现的可扩展性,但是奇怪的是,有 surprisingly little work focused on reducing the time complexity of matrix profile computation associated with the dimensionality of a multidimensional time series.在这种工作中,我们提议一种笔记 для多维时序列冲突挖掘。经过初始快速预处理的笔记,冲突挖掘的运行时间与原始数据的维度无关。在几个实际世界示例中(水处理和交通),我们提出的算法可以提高通过put throughput 至少一个数量级(50X),并且只有 minimal impact on the quality of the approximated solution。此外,我们的方法还可以处理动态添加或删除维度的无关 overhead。这意味着数据分析师可以在实时中考虑 "what-if" 场景,在探索数据时进行实时探索。
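
A numpy sketch of the overall idea above: compress a d-dimensional time series into a few sketch channels with a random projection (so later work is independent of d), then find the top discord as the subsequence whose nearest non-trivial neighbor is farthest away. The projection size and the brute-force search are simplifications of the paper's algorithm.

```python
import numpy as np

def find_discord(ts_multi, m=32, sketch_dims=2, rng=None):
    rng = rng or np.random.default_rng(0)
    t, d = ts_multi.shape
    proj = rng.standard_normal((d, sketch_dims)) / np.sqrt(sketch_dims)
    sketch = ts_multi @ proj                              # (t, sketch_dims) compressed series

    n = t - m + 1
    subs = np.stack([sketch[i:i + m].ravel() for i in range(n)])
    best = (-np.inf, -1)
    for i in range(n):
        dists = np.linalg.norm(subs - subs[i], axis=1)
        lo, hi = max(0, i - m // 2), min(n, i + m // 2 + 1)
        dists[lo:hi] = np.inf                             # ignore trivial matches
        nn = dists.min()
        if nn > best[0]:
            best = (nn, i)
    return best                                           # (discord score, start index)

data = np.random.randn(400, 8)
data[200:220] += 4.0                                      # inject an anomaly
print(find_discord(data))
```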

Nonlinear Multi-objective Reinforcement Learning with Provable Guarantees

  • paper_url: http://arxiv.org/abs/2311.02544
  • repo_url: None
  • paper_authors: Nianli Peng, Brandon Fain
  • for: Solving single- or multi-objective Markov Decision Processes (MDPs) in which the goal is to maximize the expected value of a nonlinear function of accumulated rewards.
  • methods: Extends the classic E3 algorithm with provable guarantees, introducing a reward-aware value iteration procedure and an algorithm that simultaneously learns a model of the environment.
  • results: The algorithm learns an approximately optimal policy in time polynomial in the MDP size, the desired approximation, and the smoothness of the nonlinear function, and exponential in the number of objectives.
    Abstract We describe RA-E3 (Reward-Aware Explicit Explore or Exploit), an algorithm with provable guarantees for solving a single or multi-objective Markov Decision Process (MDP) where we want to maximize the expected value of a nonlinear function over accumulated rewards. This allows us to model fairness-aware welfare optimization for multi-objective reinforcement learning as well as risk-aware reinforcement learning with nonlinear Von Neumann-Morgenstern utility functions in the single objective setting. RA-E3 extends the classic E3 algorithm that solves MDPs with scalar rewards and linear preferences. We first state a distinct reward-aware version of value iteration that calculates a non-stationary policy that is approximately optimal for a given model of the environment. This sub-procedure is based on an extended form of Bellman optimality for nonlinear optimization that explicitly considers time and current accumulated reward. We then describe how to use this optimization procedure in a larger algorithm that must simultaneously learn a model of the environment. The algorithm learns an approximately optimal policy in time that depends polynomially on the MDP size, desired approximation, and smoothness of the nonlinear function, and exponentially on the number of objectives.
    摘要 我们描述RA-E3(奖励意识的明确探索或利用算法),这是一个具有证明保证的算法,用于解决单或多个目标Markov决策过程(MDP),以 Maximize the expected value of a nonlinear function over accumulated rewards. 这Permit us to model fairness-aware welfare optimization for multi-objective reinforcement learning as well as risk-aware reinforcement learning with nonlinear Von Neumann-Morgenstern utility functions in the single objective setting. RA-E3 extends the classic E3 algorithm that solves MDPs with scalar rewards and linear preferences. We first state a distinct reward-aware version of value iteration that calculates a non-stationary policy that is approximately optimal for a given model of the environment. This sub-procedure is based on an extended form of Bellman optimality for nonlinear optimization that explicitly considers time and current accumulated reward. We then describe how to use this optimization procedure in a larger algorithm that must simultaneously learn a model of the environment. The algorithm learns an approximately optimal policy in time that depends polynomially on the MDP size, desired approximation, and smoothness of the nonlinear function, and exponentially on the number of objectives.
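
A tabular sketch of the reward-aware value iteration described above: because the objective is a nonlinear utility u(.) of the accumulated reward, the value function is indexed by (time, state, accumulated-reward-so-far) rather than by state alone, and the resulting policy is non-stationary. Reward accumulation is discretized onto a grid, and the tiny MDP below is made up for illustration.

```python
import numpy as np

def reward_aware_vi(P, R, u, horizon, reward_grid):
    """P: (S, A, S) transitions; R: (S, A) per-step rewards measured in grid steps."""
    S, A, _ = P.shape
    G = len(reward_grid)
    V = np.zeros((horizon + 1, S, G))
    V[horizon] = u(reward_grid)[None, :]          # terminal value = utility of total reward
    policy = np.zeros((horizon, S, G), dtype=int)
    for t in range(horizon - 1, -1, -1):
        for s in range(S):
            for g in range(G):
                q = np.empty(A)
                for a in range(A):
                    g_next = min(g + R[s, a], G - 1)          # accumulate reward on the grid
                    q[a] = P[s, a] @ V[t + 1, :, g_next]
                V[t, s, g] = q.max()
                policy[t, s, g] = q.argmax()
    return V, policy

P = np.array([[[0.9, 0.1], [0.2, 0.8]], [[0.5, 0.5], [0.1, 0.9]]])   # 2 states, 2 actions
R = np.array([[0, 1], [1, 2]])                                        # per-step reward (grid steps)
V, pi = reward_aware_vi(P, R, u=np.sqrt, horizon=5, reward_grid=np.arange(0, 12))
print(V[0, 0, 0])
```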

Dense Video Captioning: A Survey of Techniques, Datasets and Evaluation Protocols

  • paper_url: http://arxiv.org/abs/2311.02538
  • repo_url: None
  • paper_authors: Iqra Qasim, Alexander Horsch, Dilip K. Prasad
  • for: Surveys work on describing the diverse events and interactions in a video in order to improve natural language descriptions of video content.
  • methods: Reviews dense video captioning (DVC) techniques organized around three sub-tasks: Video Feature Extraction (VFE), Temporal Event Localization (TEL), and Dense Caption Generation (DCG).
  • results: Summarizes the reported results of DVC approaches, the datasets used, and emerging challenges and future trends in the field.
    Abstract Untrimmed videos have interrelated events, dependencies, context, overlapping events, object-object interactions, domain specificity, and other semantics that are worth highlighting while describing a video in natural language. Owing to such a vast diversity, a single sentence can only correctly describe a portion of the video. Dense Video Captioning (DVC) aims at detecting and describing different events in a given video. The term DVC originated in the 2017 ActivityNet challenge, after which considerable effort has been made to address the challenge. Dense Video Captioning is divided into three sub-tasks: (1) Video Feature Extraction (VFE), (2) Temporal Event Localization (TEL), and (3) Dense Caption Generation (DCG). This review aims to discuss all the studies that claim to perform DVC along with its sub-tasks and summarize their results. We also discuss all the datasets that have been used for DVC. Lastly, we highlight some emerging challenges and future trends in the field.
    摘要 视频中有关联的事件、依赖关系、上下文、重叠事件、对象之间交互、域特定性和其他semantics,这些都值得在描述视频的自然语言中提到。由于这种广泛的多样性,单个句子只能正确描述视频的一部分。dense video captioning(DVC)目标在检测和描述视频中的不同事件。DVC的概念在2017年的ActivityNet挑战之后得到了广泛的努力,以解决这个挑战。DVC分为三个子任务:(1)视频特征提取(VFE),(2)时间事件地理位置(TEL),和(3)密集caption生成(DCG)。本文尝试讨论所有宣称实现DVC的研究,以及它们的结果。我们还讨论了所有用于DVC的数据集。最后,我们提出了一些emerging挑战和未来趋势。

cs.CL - 2023-11-05

Robust Generalization Strategies for Morpheme Glossing in an Endangered Language Documentation Context

  • paper_url: http://arxiv.org/abs/2311.02777
  • repo_url: None
  • paper_authors: Michael Ginn, Alexis Palmer
  • for: Investigates the ability of morpheme labeling models to generalize, especially in resource-constrained settings.
  • methods: Uses weight decay optimization, output denoising, and iterative pseudo-labeling to close the gap between in-distribution and out-of-distribution performance across text genres.
  • results: Experiments show a 2% improvement on a test set containing texts from unseen genres.
    Abstract Generalization is of particular importance in resource-constrained settings, where the available training data may represent only a small fraction of the distribution of possible texts. We investigate the ability of morpheme labeling models to generalize by evaluating their performance on unseen genres of text, and we experiment with strategies for closing the gap between performance on in-distribution and out-of-distribution data. Specifically, we use weight decay optimization, output denoising, and iterative pseudo-labeling, and achieve a 2% improvement on a test set containing texts from unseen genres. All experiments are performed using texts written in the Mayan language Uspanteko.
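
A schematic sketch of the iterative pseudo-labeling strategy mentioned above: train a glossing model on the labeled data, label unlabeled out-of-genre texts, keep only confident predictions, and retrain on the union. The toy count-based "tagger" below is only a stand-in for the real morpheme labeling model, and the confidence threshold is an assumption.

```python
import numpy as np

def train(data):
    """Toy tagger: per-token-id tag counts (placeholder for a real glossing model)."""
    counts = {}
    for tokens, tags in data:
        for tok, tag in zip(tokens, tags):
            counts.setdefault(tok, np.zeros(3))[tag] += 1
    return counts

def predict_proba(model, tokens):
    default = np.ones(3) / 3
    probs = np.stack([model.get(t, default) for t in tokens])
    return probs / probs.sum(axis=1, keepdims=True)

def iterative_pseudo_labeling(labeled, unlabeled, rounds=3, threshold=0.9):
    data, model = list(labeled), train(labeled)
    for _ in range(rounds):
        pseudo = []
        for tokens in unlabeled:
            probs = predict_proba(model, tokens)
            if probs.max(axis=1).mean() >= threshold:      # keep confident sentences only
                pseudo.append((tokens, probs.argmax(axis=1)))
        model = train(data + pseudo)                       # retrain on labeled + pseudo data
    return model

labeled = [([1, 2, 3], [0, 1, 2]), ([2, 3, 1], [1, 2, 0])]   # token ids + gloss tag ids
unlabeled = [[1, 2], [3, 3, 2], [4, 1]]
model = iterative_pseudo_labeling(labeled, unlabeled)
print(sorted(model.keys()))
```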

Attention or Convolution: Transformer Encoders in Audio Language Models for Inference Efficiency

  • paper_url: http://arxiv.org/abs/2311.02772
  • repo_url: None
  • paper_authors: Sungho Jeon, Ching-Feng Yeh, Hakan Inan, Wei-Ning Hsu, Rashi Rungta, Yashar Mehdad, Daniel Bikel
  • for: Shows that a simple self-supervised pre-trained audio model can match the inference efficiency of more complicated pre-trained models.
  • methods: Compares speech transformer encoders that mix convolutional modules with self-attention modules, which achieve state-of-the-art ASR performance with top efficiency, against encoders using self-attention alone.
  • results: Employing such speech transformer encoders substantially improves the efficiency of pre-trained audio models, but comparable efficiency can be achieved with advanced self-attention alone; low-bit weight quantization further improves efficiency.
    Abstract In this paper, we show that a simple self-supervised pre-trained audio model can achieve comparable inference efficiency to more complicated pre-trained models with speech transformer encoders. These speech transformers rely on mixing convolutional modules with self-attention modules. They achieve state-of-the-art performance on ASR with top efficiency. We first show that employing these speech transformers as an encoder significantly improves the efficiency of pre-trained audio models as well. However, our study shows that we can achieve comparable efficiency with advanced self-attention solely. We demonstrate that this simpler approach is particularly beneficial with a low-bit weight quantization technique of a neural network to improve efficiency. We hypothesize that it prevents propagating the errors between different quantized modules compared to recent speech transformers mixing quantized convolution and the quantized self-attention modules.
    摘要 在这篇论文中,我们显示了一种简单的自我超vised预训练音频模型可以达到与更复杂的预训练模型(具有speech transformer Encoder)相当的推理效率。这些speech transformer Encoder通过混合径向模块与自我注意模块来实现了ASR中的状态环境。我们首先显示了使用这些speech transformer Encoder作为encoder可以显著提高预训练音频模型的效率。然而,我们的研究表明,我们可以通过高级自我注意来实现相同的效率。我们示出了这种更简单的方法在使用低位数量量化神经网络时 particualrly有利。我们假设这种方法可以避免在不同量化模块之间传递错误,相比之下,当前的speech transformers混合量化径向模块和量化自我注意模块。
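
A small sketch of the low-bit weight quantization referred to above: symmetric uniform per-tensor quantization of a weight matrix to n bits and back. The abstract's point is that an attention-only encoder tolerates this better because errors do not propagate between mixed quantized convolution and attention modules; the scheme below is a generic example, not the paper's exact quantizer.

```python
import numpy as np

def fake_quantize(w, n_bits=4):
    qmax = 2 ** (n_bits - 1) - 1                 # e.g. 7 for 4-bit signed weights
    scale = np.abs(w).max() / qmax
    q = np.clip(np.round(w / scale), -qmax, qmax)
    return q * scale                             # dequantized weights used at inference

w = np.random.randn(256, 256) * 0.05
w_q = fake_quantize(w, n_bits=4)
print("mean abs quantization error:", np.abs(w - w_q).mean())
```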

Pyclipse, a library for deidentification of free-text clinical notes

  • paper_url: http://arxiv.org/abs/2311.02748
  • repo_url: None
  • paper_authors: Callandra Moore, Jonathan Ranisau, Walter Nelson, Jeremy Petch, Alistair Johnson
  • for: Automated deidentification of clinical text data is crucial due to the high cost of manual deidentification, which has been a barrier to sharing clinical text and the advancement of clinical natural language processing.
  • methods: The pyclipse framework is proposed to address the challenges of creating effective automated deidentification tools, including issues in reproducibility due to differences in text processing, evaluation methods, and a lack of consistency across clinical domains and institutions.
  • results: The pyclipse framework is demonstrated to be a unified and configurable evaluation procedure that can streamline the comparison of deidentification algorithms, and it is found that algorithm performance consistently falls short of the results reported in the original papers, even when evaluated on the same benchmark dataset.
    Abstract Automated deidentification of clinical text data is crucial due to the high cost of manual deidentification, which has been a barrier to sharing clinical text and the advancement of clinical natural language processing. However, creating effective automated deidentification tools faces several challenges, including issues in reproducibility due to differences in text processing, evaluation methods, and a lack of consistency across clinical domains and institutions. To address these challenges, we propose the pyclipse framework, a unified and configurable evaluation procedure to streamline the comparison of deidentification algorithms. Pyclipse serves as a single interface for running open-source deidentification algorithms on local clinical data, allowing for context-specific evaluation. To demonstrate the utility of pyclipse, we compare six deidentification algorithms across four public and two private clinical text datasets. We find that algorithm performance consistently falls short of the results reported in the original papers, even when evaluated on the same benchmark dataset. These discrepancies highlight the complexity of accurately assessing and comparing deidentification algorithms, emphasizing the need for a reproducible, adjustable, and extensible framework like pyclipse. Our framework lays the foundation for a unified approach to evaluate and improve deidentification tools, ultimately enhancing patient protection in clinical natural language processing.
    摘要 自动化识别临床文本数据的重要性在于手动识别的高成本,这成为了临床自然语言处理的发展的一个障碍。然而,创建有效的自动化识别工具面临着许多挑战,包括评估方法的不同和临床领域和机构之间的不一致性。为解决这些挑战,我们提出了pyclipse框架,一个可配置的评估过程框架,可以帮助Streamline识别算法的比较。pyclipse提供了一个单一的界面,可以在本地临床数据上运行开源识别算法,并为每个临床领域和机构提供上下文特定的评估。为了证明pyclipse的有用性,我们将比较六种识别算法在四个公共和两个私人临床文本数据集上的表现。我们发现,算法的表现 consistently short of the results reported in the original papers, even when evaluated on the same benchmark dataset.这些差异 highlights the complexity of accurately assessing and comparing deidentification algorithms, emphasizing the need for a reproducible, adjustable, and extensible framework like pyclipse.我们的框架为识别工具的评估和改进提供了一个统一的方法,从而推动了患者保护在临床自然语言处理中。

Nepali Video Captioning using CNN-RNN Architecture

  • paper_url: http://arxiv.org/abs/2311.02699
  • repo_url: None
  • paper_authors: Bipesh Subedi, Saugat Singh, Bal Krishna Bal
  • for: Develops a deep neural network based Nepali video captioning system that produces precise and contextually relevant captions for Nepali videos.
  • methods: Integrates pre-trained CNNs with RNN decoders through dataset collection, preprocessing, model implementation, and evaluation; the MSVD dataset is enriched with Nepali captions via Google Translate and used to train various CNN-RNN architectures.
  • results: The best model, EfficientNetB0 + BiLSTM with 1024 hidden dimensions, achieves a BLEU-4 score of 17 and a METEOR score of 46; the paper also outlines challenges and future directions for Nepali video captioning.
    Abstract This article presents a study on Nepali video captioning using deep neural networks. Through the integration of pre-trained CNNs and RNNs, the research focuses on generating precise and contextually relevant captions for Nepali videos. The approach involves dataset collection, data preprocessing, model implementation, and evaluation. By enriching the MSVD dataset with Nepali captions via Google Translate, the study trains various CNN-RNN architectures. The research explores the effectiveness of CNNs (e.g., EfficientNetB0, ResNet101, VGG16) paired with different RNN decoders like LSTM, GRU, and BiLSTM. Evaluation involves BLEU and METEOR metrics, with the best model being EfficientNetB0 + BiLSTM with 1024 hidden dimensions, achieving a BLEU-4 score of 17 and METEOR score of 46. The article also outlines challenges and future directions for advancing Nepali video captioning, offering a crucial resource for further research in this area.
    摘要 The study involves several steps, including dataset collection, data preprocessing, model implementation, and evaluation. To enrich the MSVD dataset with Nepali captions, the researchers use Google Translate to add captions to the videos. They then train various CNN-RNN architectures, including EfficientNetB0, ResNet101, and VGG16, paired with different RNN decoders such as LSTM, GRU, and BiLSTM.The evaluation metrics used in the study are BLEU and METEOR, and the best model is found to be EfficientNetB0 + BiLSTM with 1024 hidden dimensions, achieving a BLEU-4 score of 17 and METEOR score of 46. The article also discusses challenges and future directions for advancing Nepali video captioning, providing a valuable resource for further research in this area.

LLM-enhanced Self-training for Cross-domain Constituency Parsing

  • paper_url: http://arxiv.org/abs/2311.02660
  • repo_url: None
  • paper_authors: Jianling Li, Meishan Zhang, Peiming Guo, Min Zhang, Yue Zhang
  • for: Explores the application of self-training to cross-domain tasks, specifically cross-domain constituency parsing.
  • methods: Uses a large language model (LLM) to iteratively generate domain-specific raw corpora, guided by grammar rules and a pseudo-instance selection criterion.
  • results: Experiments show that LLM-enhanced self-training for constituency parsing outperforms traditional methods regardless of the LLM's own performance, and combining grammar rules with the confidence-based pseudo-instance selection yields the best cross-domain parsing results.
    Abstract Self-training has proven to be an effective approach for cross-domain tasks, and in this study, we explore its application to cross-domain constituency parsing. Traditional self-training methods rely on limited and potentially low-quality raw corpora. To overcome this limitation, we propose enhancing self-training with the large language model (LLM) to generate domain-specific raw corpora iteratively. For the constituency parsing, we introduce grammar rules that guide the LLM in generating raw corpora and establish criteria for selecting pseudo instances. Our experimental results demonstrate that self-training for constituency parsing, equipped with an LLM, outperforms traditional methods regardless of the LLM's performance. Moreover, the combination of grammar rules and confidence criteria for pseudo-data selection yields the highest performance in the cross-domain constituency parsing.
    摘要 自我训练已经证明是跨领域任务的有效方法,在这种研究中,我们探索了它的应用于跨领域成分分析。传统的自我训练方法取得有限和可能是低质量的Raw corpora。为了超越这些限制,我们提议通过大型语言模型(LLM)生成领域特定的Raw corpora,并在每一轮生成Raw corpora时遵循语法规则。对于成分分析,我们引入语法规则来导引LLM生成Raw corpora,并设置pseudo实例选择的标准。我们的实验结果表明,将自我训练与LLM结合使用,可以超越传统方法,无论LLM的性能如何。此外,结合语法规则和pseudo实例选择的信心标准,可以在跨领域成分分析中获得最高性能。
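
A schematic sketch of the LLM-enhanced self-training loop described above. The `generate`, `parse_with_conf`, and `train` callables are hypothetical placeholders for the LLM prompt, the current parser, and parser training; the grammar-rule guidance is reduced to a crude length filter, and the confidence threshold is an assumption. The toy stand-ins at the bottom only let the loop run end to end.

```python
import random

def llm_self_training(seed_treebank, domain, generate, parse_with_conf, train,
                      rounds=2, threshold=0.85):
    """generate(domain, n) -> raw sentences; parse_with_conf(parser, s) -> (tree, confidence);
    train(data) -> parser. All three are placeholders for the real components."""
    parser = train(seed_treebank)
    data = list(seed_treebank)
    for _ in range(rounds):
        raw = [s for s in generate(domain, 50) if 3 <= len(s.split()) <= 40]   # crude grammar filter
        pseudo = [(s, t) for s, (t, c) in ((s, parse_with_conf(parser, s)) for s in raw)
                  if c >= threshold]                                            # confident pseudo trees
        data += pseudo
        parser = train(data)                                                    # retrain the parser
    return parser

# Toy stand-ins so the loop runs (not real models).
random.seed(0)
gen = lambda domain, n: [f"{domain} sample sentence {i}" for i in range(n)]
parse = lambda parser, s: (f"(S {s})", random.random())
train_fn = lambda data: {"size": len(data)}
print(llm_self_training([("a b", "(S a b)")], "biomedical", gen, parse, train_fn)["size"])
```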

Divide & Conquer for Entailment-aware Multi-hop Evidence Retrieval

  • paper_url: http://arxiv.org/abs/2311.02616
  • repo_url: None
  • paper_authors: Fan Luo, Mihai Surdeanu
  • for: Answering multi-hop questions by retrieving evidences that are semantically equivalent or entailed by the question.
  • methods: Divide the task into two sub-tasks: semantic textual similarity retrieval and inference similarity retrieval, and use two ensemble models (EAR and EARnest) to jointly re-rank sentences with consideration of diverse relevance signals.
  • results: Significantly outperform all single retrieval models and two ensemble baseline models on HotpotQA, and more effective in retrieving relevant evidences for multi-hop questions.
    Abstract Lexical and semantic matches are commonly used as relevance measurements for information retrieval. Together they estimate the semantic equivalence between the query and the candidates. However, semantic equivalence is not the only relevance signal that needs to be considered when retrieving evidences for multi-hop questions. In this work, we demonstrate that textual entailment relation is another important relevance dimension that should be considered. To retrieve evidences that are either semantically equivalent to or entailed by the question simultaneously, we divide the task of evidence retrieval for multi-hop question answering (QA) into two sub-tasks, i.e., semantic textual similarity and inference similarity retrieval. We propose two ensemble models, EAR and EARnest, which tackle each of the sub-tasks separately and then jointly re-rank sentences with the consideration of the diverse relevance signals. Experimental results on HotpotQA verify that our models not only significantly outperform all the single retrieval models it is based on, but is also more effective than two intuitive ensemble baseline models.
    摘要 lexical和semantic匹配通常用于信息检索中的相关性评估。它们共同估计查询和候选答案之间的semanticEquivalence。但semanticEquivalence并不是多步问题检索证据的唯一相关性信号。在这种情况下,我们表明文本涵义关系是另一个重要的相关性维度。为了同时检索具有查询和问题相似或涵义涵盖的证据,我们将多步问题answering(QA)证据检索任务分为两个子任务:semantic textual similarity retrieval和inference similarity retrieval。我们提出了两种ensemble模型,EAR和EARnest,它们分别处理每个子任务,然后对结果进行jointly重新排序,考虑多种相关性信号的多样性。实验结果表明,我们的模型不仅在HotpotQA上显著超越所有基于它的单个检索模型,还比两个INTUITIVE ensemble基eline模型更有效。
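
A minimal sketch of the divide-and-conquer re-ranking idea above: score each candidate sentence once with a semantic-similarity retriever and once with an entailment (inference) retriever, then re-rank by a combination of the two signals. The scores and the equal weighting are toy assumptions, not the EAR/EARnest ensembles themselves.

```python
import numpy as np

def rerank(similarity_scores, entailment_scores, alpha=0.5):
    """Both inputs: (n_sentences,) arrays in [0, 1]; returns indices sorted best-first."""
    combined = alpha * similarity_scores + (1 - alpha) * entailment_scores
    return np.argsort(-combined)

sim = np.array([0.91, 0.40, 0.75, 0.55])    # "semantically equivalent to the question"
ent = np.array([0.20, 0.85, 0.70, 0.95])    # "entailed by the question"
print(rerank(sim, ent))                      # sentences strong on either signal rise to the top
```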

mahaNLP: A Marathi Natural Language Processing Library

  • paper_url: http://arxiv.org/abs/2311.02579
  • repo_url: https://github.com/l3cube-pune/MarathiNLP
  • paper_authors: Vidula Magdum, Omkar Dhekane, Sharayu Hiwarkhedkar, Saloni Mittal, Raviraj Joshi
  • for: Provides an open-source natural language processing (NLP) library specifically built to support the Indian language Marathi.
  • methods: Builds on state-of-the-art MahaBERT-based transformer models to offer an easy-to-use, extensible, and modular toolkit for Marathi text analysis.
  • results: Offers a comprehensive array of NLP tasks, from basic preprocessing to advanced tasks such as sentiment analysis, named entity recognition, hate speech detection, and sentence completion.
    Abstract We present mahaNLP, an open-source natural language processing (NLP) library specifically built for the Marathi language. It aims to enhance the support for the low-resource Indian language Marathi in the field of NLP. It is an easy-to-use, extensible, and modular toolkit for Marathi text analysis built on state-of-the-art MahaBERT-based transformer models. Our work holds significant importance as other existing Indic NLP libraries provide basic Marathi processing support and rely on older models with restricted performance. Our toolkit stands out by offering a comprehensive array of NLP tasks, encompassing both fundamental preprocessing tasks and advanced NLP tasks like sentiment analysis, NER, hate speech detection, and sentence completion. This paper focuses on an overview of the mahaNLP framework, its features, and its usage. This work is a part of the L3Cube MahaNLP initiative, more information about it can be found at https://github.com/l3cube-pune/MarathiNLP .
    摘要 我们介绍mahaNLP,一个开源的自然语言处理(NLP)库,专门为马拉地语提供支持。它的目标是在NLP领域提高马拉地语这一低资源印度语言的支持。这是一个易于使用、可扩展、具有模块性的马拉地语文本分析工具库,建立于现代的MahaBERT基于转移模型。我们的工作具有重要的意义,因为现有的印度语言NLP库只提供了基本的马拉地语处理支持,并且使用older模型,性能有限。我们的工具库包括了许多NLP任务,包括基本的预处理任务以及高级NLP任务,如情感分析、命名实体识别、仇恨言语检测和句子完成。本文将对mahaNLP框架、特点和使用进行概述。这是L3Cube MahaNLP项目的一部分,更多信息可以在https://github.com/l3cube-pune/MarathiNLP查看。

Temporal Sequencing of Documents

  • paper_url: http://arxiv.org/abs/2311.02578
  • repo_url: None
  • paper_authors: Michael Gervers, Gelila Tilahun
  • for: Temporally rank-ordering sets of historical documents.
  • methods: Uses a bandwidth estimate for non-parametric Generalized Linear Models (Fan, Heckman, and Wand, 1995) to capture the gradual change in word usage.
  • results: The method effectively orders historical documents, significantly improving the temporal sequencing of both the medieval English property transfer documents and the American State of the Union Addresses over a randomly sequenced baseline.
    Abstract We outline an unsupervised method for temporal rank ordering of sets of historical documents, namely American State of the Union Addresses and DEEDS, a corpus of medieval English property transfer documents. Our method relies upon effectively capturing the gradual change in word usage via a bandwidth estimate for the non-parametric Generalized Linear Models (Fan, Heckman, and Wand, 1995). The number of possible rank orders needed to search through possible cost functions related to the bandwidth can be quite large, even for a small set of documents. We tackle this problem of combinatorial optimization using the Simulated Annealing algorithm, which allows us to obtain the optimal document temporal orders. Our rank ordering method significantly improved the temporal sequencing of both corpora compared to a randomly sequenced baseline. This unsupervised approach should enable the temporal ordering of undated document sets.
    摘要 我们提出了一种无监督的方法,用于排序历史文档集合,包括美国州联合宪言和中世纪英格兰财产转让文档集。我们的方法基于有效地捕捉文本中慢慢变化的词汇使用情况,通过非参数化的泛化线性模型(Fan, Heckman, 和 Wand,1995)来估算带宽。由于搜索可能的排序方案的数量可能很大,即使是一小组文档也可能会出现这个问题。我们使用模拟熔化算法来解决这个问题,从而获得最佳的文档排序顺序。我们的排序方法在对两个 corpora 进行比较时具有显著改善,相比随机排序基线。这种无监督的方法应该能够应用于无日期文档集。
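
A compact sketch of the combinatorial search described above: simulated annealing over permutations of documents, here minimising a toy cost that prefers adjacent documents to share vocabulary, as a crude proxy for gradual change in word usage. The real cost in the paper is derived from the non-parametric GLM bandwidth estimate; this only illustrates the annealing search.

```python
import numpy as np

def anneal_order(doc_vectors, n_steps=5000, t0=1.0, rng=None):
    rng = rng or np.random.default_rng(0)
    v = doc_vectors / np.linalg.norm(doc_vectors, axis=1, keepdims=True)
    sim = v @ v.T
    cost = lambda order: -sum(sim[order[i], order[i + 1]] for i in range(len(order) - 1))

    order = rng.permutation(len(v))
    best, best_cost = order.copy(), cost(order)
    for step in range(n_steps):
        temp = t0 * (1 - step / n_steps) + 1e-6
        i, j = rng.integers(0, len(v), 2)
        cand = order.copy()
        cand[i], cand[j] = cand[j], cand[i]                    # propose swapping two documents
        delta = cost(cand) - cost(order)
        if delta < 0 or rng.random() < np.exp(-delta / temp):  # Metropolis acceptance rule
            order = cand
            if cost(order) < best_cost:
                best, best_cost = order.copy(), cost(order)
    return best

docs = np.abs(np.random.randn(12, 50))       # toy bag-of-words vectors for 12 documents
print(anneal_order(docs))
```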

BanMANI: A Dataset to Identify Manipulated Social Media News in Bangla

  • paper_url: http://arxiv.org/abs/2311.02570
  • repo_url: https://github.com/kamruzzaman15/banmani
  • paper_authors: Mahammed Kamruzzaman, Md. Minul Islam Shovon, Gene Louis Kim
  • for: Identifying specific claims in Bangla social media news that falsely manipulate a related news article.
  • methods: Curates a dataset (BanMANI) using a collection method that works around the limitations of currently available NLP tools for Bangla.
  • results: Finds that the task challenges current LLMs under both zero-shot and fine-tuned settings.
    Abstract Initial work has been done to address fake news detection and misrepresentation of news in the Bengali language. However, no work in Bengali yet addresses the identification of specific claims in social media news that falsely manipulates a related news article. At this point, this problem has been tackled in English and a few other languages, but not in the Bengali language. In this paper, we curate a dataset of social media content labeled with information manipulation relative to reference articles, called BanMANI. The dataset collection method we describe works around the limitations of the available NLP tools in Bangla. We expect these techniques will carry over to building similar datasets in other low-resource languages. BanMANI forms the basis both for evaluating the capabilities of existing NLP systems and for training or fine-tuning new models specifically on this task. In our analysis, we find that this task challenges current LLMs both under zero-shot and fine-tuned settings.
    摘要 初步工作已经对假新闻检测和新闻歪曲的问题进行了准备。然而,目前没有任何工作在孟加拉语中对社交媒体新闻中谎言性的具体CLAIM进行识别。在这篇论文中,我们为这个问题收集了一个社交媒体内容的标注数据集,称为BanMANI。我们的数据集采集方法会讲述在可用的NLP工具 limitation下如何实现。我们期望这些技术可以扩展到其他低资源语言。BanMANI将成为评估现有NLP系统的能力以及训练或精度调整新模型的基础。在我们的分析中,我们发现这个任务对当前LLMs都是一个挑战,无论在零情况下或者精度调整后。

Topic model based on co-occurrence word networks for unbalanced short text datasets

  • paper_url: http://arxiv.org/abs/2311.02566
  • repo_url: None
  • paper_authors: Chengjie Ma, Junping Du, Meiyu Liang, Zeli Guan
  • for: Detecting scarce topics in unbalanced short text datasets.
  • methods: A topic model based on co-occurrence word networks (CWUTM).
  • results: Provides a reliable method for detecting scarce topics in unbalanced short text datasets.
    Abstract We propose a straightforward solution for detecting scarce topics in unbalanced short-text datasets. Our approach, named CWUTM (Topic model based on co-occurrence word networks for unbalanced short text datasets), Our approach addresses the challenge of sparse and unbalanced short text topics by mitigating the effects of incidental word co-occurrence. This allows our model to prioritize the identification of scarce topics (Low-frequency topics). Unlike previous methods, CWUTM leverages co-occurrence word networks to capture the topic distribution of each word, and we enhanced the sensitivity in identifying scarce topics by redefining the calculation of node activity and normalizing the representation of both scarce and abundant topics to some extent. Moreover, CWUTM adopts Gibbs sampling, similar to LDA, making it easily adaptable to various application scenarios. Our extensive experimental validation on unbalanced short-text datasets demonstrates the superiority of CWUTM compared to baseline approaches in discovering scarce topics. According to the experimental results the proposed model is effective in early and accurate detection of emerging topics or unexpected events on social platforms.
    摘要 我们提出了一种直观的解决方案,用于探测罕见话题在不均衡短文本数据集中。我们的方法,名为CWUTM(基于协occurrence词网络的短文本数据集中罕见话题模型),解决了短文本话题的罕见性和不均衡性的挑战。我们的模型可以增强对罕见话题的识别,并且可以在不同应用场景中轻松地适应。我们对不均衡短文本数据集进行了广泛的实验 validate,结果表明,相比基eline方法,CWUTM在发现罕见话题方面表现出了明显的优势。根据实验结果,我们的模型可以在社交平台上早期发现emerging话题或意外事件。
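
A sketch of the co-occurrence word network that CWUTM builds on: words are nodes, edges count within-document co-occurrences, and a simple "node activity" score (degree-normalised co-occurrence weight) is computed per word. The activity definition here is a placeholder; the paper redefines it to boost sensitivity to scarce topics and combines it with Gibbs sampling.

```python
from collections import defaultdict
from itertools import combinations

def cooccurrence_network(docs):
    edges = defaultdict(float)
    for doc in docs:
        for u, v in combinations(sorted(set(doc)), 2):    # co-occurrence within one short text
            edges[(u, v)] += 1.0
    return edges

def node_activity(edges):
    weight, degree = defaultdict(float), defaultdict(int)
    for (u, v), w in edges.items():
        for node in (u, v):
            weight[node] += w
            degree[node] += 1
    return {n: weight[n] / degree[n] for n in weight}

docs = [["flood", "rescue", "river"], ["flood", "river"], ["match", "goal", "team"],
        ["team", "goal"], ["flood", "rescue"]]
edges = cooccurrence_network(docs)
print(sorted(node_activity(edges).items(), key=lambda kv: -kv[1])[:3])
```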

Relation Extraction Model Based on Semantic Enhancement Mechanism

  • paper_url: http://arxiv.org/abs/2311.02564
  • repo_url: None
  • paper_authors: Peiyu Liu, Junping Du, Yingxia Shao, Zeli Guan
  • for: Improving relation extraction in information extraction and addressing the triple overlap problem.
  • methods: Proposes the CasAug model, which combines the CasRel framework with a semantic enhancement mechanism: possible subjects are semantically encoded and pre-classified, similar vocabulary is weighted via attention, and the enhanced subject semantics are passed to the object and relation extraction module.
  • results: Outperforms the baseline model in relation extraction, handles the triple overlap problem better, and improves the ability to extract multiple relations.
    Abstract Relational extraction is one of the basic tasks related to information extraction in the field of natural language processing, and is an important link and core task in the fields of information extraction, natural language understanding, and information retrieval. None of the existing relation extraction methods can effectively solve the problem of triple overlap. The CasAug model proposed in this paper based on the CasRel framework combined with the semantic enhancement mechanism can solve this problem to a certain extent. The CasAug model enhances the semantics of the identified possible subjects by adding a semantic enhancement mechanism, First, based on the semantic coding of possible subjects, pre-classify the possible subjects, and then combine the subject lexicon to calculate the semantic similarity to obtain the similar vocabulary of possible subjects. According to the similar vocabulary obtained, each word in different relations is calculated through the attention mechanism. For the contribution of the possible subject, finally combine the relationship pre-classification results to weight the enhanced semantics of each relationship to find the enhanced semantics of the possible subject, and send the enhanced semantics combined with the possible subject to the object and relationship extraction module. Complete the final relation triplet extraction. The experimental results show that, compared with the baseline model, the CasAug model proposed in this paper has improved the effect of relation extraction, and CasAug's ability to deal with overlapping problems and extract multiple relations is also better than the baseline model, indicating that the semantic enhancement mechanism proposed in this paper It can further reduce the judgment of redundant relations and alleviate the problem of triple overlap.
    摘要 基于自然语言处理的信息EXTRACTION中,关系提取是一项基础任务和核心任务,与信息提取、自然语言理解和信息检索 closely related。现有的关系提取方法无法有效解决 triple overlap 问题。本文提出的 CasAug 模型,基于 CasRel 框架和semantic enhancement mechanism,可以减少重复的关系判断和 triple overlap 问题。CasAug 模型首先使用可能主语的semantic coding进行预类型,然后使用主语词典计算 Possible subjects 的semantic similarity,以获得每个关系中的相似词汇。通过注意机制,对每个关系中的每个词语进行计算。最后,根据关系预类型的结果,对各种关系中的semantics进行权重计算,并将权重计算结果与可能主语进行组合。最终,通过对象和关系提取模块进行完善,完成最终的关系 triplet 提取。实验结果表明,相比基eline模型,提出的 CasAug 模型在关系提取方面有所提高,并且 CasAug 模型在 triple overlap 问题上的处理能力也比基eline模型更好,这表明该paper中提出的 semantic enhancement mechanism 可以进一步减少重复的关系判断和 triple overlap 问题。

cs.LG - 2023-11-05

From molecules to scaffolds to functional groups: building context-dependent molecular representation via multi-channel learning

  • paper_url: http://arxiv.org/abs/2311.02798
  • repo_url: None
  • paper_authors: Yue Wan, Jialu Wu, Tingjun Hou, Chang-Yu Hsieh, Xiaowei Jia
  • for: Predicting the physicochemical and biological properties of molecules, for example in drug discovery.
  • methods: Uses self-supervised learning (SSL) over large-scale, unannotated molecular data to learn a foundational representation of chemical space through multi-channel pre-training, and composes it with the context of the target application.
  • results: Demonstrates competitive performance across multiple molecular property benchmarks and greater robustness and generalizability than other baselines in particularly challenging yet ubiquitous scenarios such as activity cliffs.
    Abstract Reliable molecular property prediction is essential for various scientific endeavors and industrial applications, such as drug discovery. However, the scarcity of data, combined with the highly non-linear causal relationships between physicochemical and biological properties and conventional molecular featurization schemes, complicates the development of robust molecular machine learning models. Self-supervised learning (SSL) has emerged as a popular solution, utilizing large-scale, unannotated molecular data to learn a foundational representation of chemical space that might be advantageous for downstream tasks. Yet, existing molecular SSL methods largely overlook domain-specific knowledge, such as molecular similarity and scaffold importance, as well as the context of the target application when operating over the large chemical space. This paper introduces a novel learning framework that leverages the knowledge of structural hierarchies within molecular structures, embeds them through separate pre-training tasks over distinct channels, and employs a task-specific channel selection to compose a context-dependent representation. Our approach demonstrates competitive performance across various molecular property benchmarks and establishes some state-of-the-art results. It further offers unprecedented advantages in particularly challenging yet ubiquitous scenarios like activity cliffs with enhanced robustness and generalizability compared to other baselines.
    摘要 可靠的分子性质预测是科学研究和工业应用中的关键,如药物搜索。然而,数据稀缺和物理化和生物性质之间非线性关系,以及传统的分子特征化方案,使分子机器学习模型的开发变得复杂。自我超视学习(SSL)已成为一种流行的解决方案,利用大规模、无注释的分子数据来学习分子空间的基础表示,这可能对下游任务有利。然而,现有的分子SSL方法忽视了域专门知识,如分子相似性和架构重要性,以及目标应用场景的 контекст。本文介绍一种新的学习框架,利用分子结构中的结构层次结构,通过不同的预训练任务来嵌入这些结构,并使用任务特定的通道选择来组合上下文依赖的表示。我们的方法在多种分子性质benchmark上显示竞争性的性能,并在一些特殊 yet ubiquitous的enario中提供了前所未有的优势,比如活性峰值中的提高了Robustness和普遍性。

Riemannian Laplace Approximation with the Fisher Metric

  • paper_url: http://arxiv.org/abs/2311.02766
  • repo_url: https://github.com/ksnxr/rlaf
  • paper_authors: Hanlin Yu, Marcelo Hartmann, Bernardo Williams, Mark Girolami, Arto Klami
  • for: Approximating posterior densities for Bayesian inference.
  • methods: Uses the Laplace method together with a Riemannian geometry (the Fisher metric) to transform the fitted Gaussian approximation.
  • results: Derives two modified variants that are exact in the infinite-data limit and demonstrates practical improvements in a range of experiments.
    Abstract The Laplace's method approximates a target density with a Gaussian distribution at its mode. It is computationally efficient and asymptotically exact for Bayesian inference due to the Bernstein-von Mises theorem, but for complex targets and finite-data posteriors it is often too crude an approximation. A recent generalization of the Laplace Approximation transforms the Gaussian approximation according to a chosen Riemannian geometry providing a richer approximation family, while still retaining computational efficiency. However, as shown here, its properties heavily depend on the chosen metric, indeed the metric adopted in previous work results in approximations that are overly narrow as well as being biased even at the limit of infinite data. We correct this shortcoming by developing the approximation family further, deriving two alternative variants that are exact at the limit of infinite data, extending the theoretical analysis of the method, and demonstrating practical improvements in a range of experiments.
    摘要 拉普拉斯方法用 Gaussian 分布近似目标概率分布, computationally efficient 和 asymptotically exact для Bayesian inference due to the Bernstein-von Mises theorem,但对复杂目标和 finite-data posterior 通常太粗糙。一种 recient generalization of the Laplace Approximation 使用 chosen Riemannian geometry 提供一个更加丰富的近似家族,仍然保持 computation efficiency,但选择的 метри可能会导致近似过于窄而且偏向的问题。我们在这里修正这个缺陷,发展了两个替代方案,并对方法的理论分析进行了扩展,在一系列实验中也表现了实践上的改进。
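
A numpy sketch of the basic (Euclidean) Laplace approximation that the paper generalises: find the posterior mode, take the Hessian of the negative log-density there, and use its inverse as the Gaussian covariance. The Riemannian/Fisher-metric variants then transport this Gaussian according to a chosen metric, which is not shown here; the toy target and finite-difference derivatives are illustrative choices.

```python
import numpy as np

def neg_log_post(theta):
    # toy unnormalised negative log posterior: slightly non-Gaussian in 2D
    return 0.5 * theta[0] ** 2 + 0.5 * (theta[1] - 0.5 * theta[0] ** 2) ** 2

def laplace_approx(f, dim, lr=0.1, steps=2000, eps=1e-4):
    theta = np.zeros(dim)
    grad = lambda x: np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                               for e in np.eye(dim)])
    for _ in range(steps):                        # gradient descent to the mode (MAP)
        theta -= lr * grad(theta)
    H = np.stack([(grad(theta + eps * e) - grad(theta - eps * e)) / (2 * eps)
                  for e in np.eye(dim)])          # finite-difference Hessian at the mode
    return theta, np.linalg.inv(0.5 * (H + H.T))  # Gaussian mean and covariance

mode, cov = laplace_approx(neg_log_post, dim=2)
print(mode, np.diag(cov))
```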

Log-Concavity of Multinomial Likelihood Functions Under Interval Censoring Constraints on Frequencies or Their Partial Sums

  • paper_url: http://arxiv.org/abs/2311.02763
  • repo_url: None
  • paper_authors: Bruce Levin, Erik Learned-Miller
  • for: Showing that the likelihood function of a multinomial vector under interval censoring constraints is completely log-concave.
  • methods: Proves complete log-concavity by showing that the constrained sample spaces comprise M-convex subsets of the discrete simplex.
  • results: Establishes that the multinomial likelihood under interval censoring constraints on frequencies or their partial sums is completely log-concave.
    Abstract We show that the likelihood function for a multinomial vector observed under arbitrary interval censoring constraints on the frequencies or their partial sums is completely log-concave by proving that the constrained sample spaces comprise M-convex subsets of the discrete simplex.
    摘要 我们显示了Multinomial vector在arbitrary interval censored的情况下观察到的概率函数是完全log-concave,通过证明受限样本空间包含M-convex的简单体的子集。

One-Shot Strategic Classification Under Unknown Costs

  • paper_url: http://arxiv.org/abs/2311.02761
  • repo_url: None
  • paper_authors: Elan Rosenfeld, Nir Rosenfeld
  • for: Learning decision rules that are robust to strategic input manipulation when user responses are unknown.
  • methods: Studies the one-shot setting, which requires committing to a single classifier once, and frames the task as a minimax problem over an uncertainty set of possible user cost functions.
  • results: Shows that even a small mis-estimation of the true cost can make the classifier's accuracy arbitrarily low in the worst case, and provides efficient algorithms for both full-batch and stochastic settings that converge to the minimax optimal solution at the dimension-independent rate of $\tilde{\mathcal{O}}(T^{-\frac{1}{2}})$.
    Abstract A primary goal in strategic classification is to learn decision rules which are robust to strategic input manipulation. Earlier works assume that strategic responses are known; while some recent works address the important challenge of unknown responses, they exclusively study sequential settings which allow multiple model deployments over time. But there are many domains, particularly in public policy (a common motivating use-case), where multiple deployments are unrealistic, or where even a single bad round is undesirable. To address this gap, we initiate the study of strategic classification under unknown responses in the one-shot setting, which requires committing to a single classifier once. Focusing on the users' cost function as the source of uncertainty, we begin by proving that for a broad class of costs, even a small mis-estimation of the true cost can entail arbitrarily low accuracy in the worst case. In light of this, we frame the one-shot task as a minimax problem, with the goal of identifying the classifier with the smallest worst-case risk over an uncertainty set of possible costs. Our main contribution is efficient algorithms for both the full-batch and stochastic settings, which we prove converge (offline) to the minimax optimal solution at the dimension-independent rate of $\tilde{\mathcal{O}}(T^{-\frac{1}{2}})$. Our analysis reveals important structure stemming from the strategic nature of user responses, particularly the importance of dual norm regularization with respect to the cost function.
    摘要 primary goal in strategic classification 是学习强制性不受输入操纵的决策规则。earlier works 假设战略回应是已知的;而一些最近的工作 Address 了重要的挑战,但 exclusively 研究了顺序设置,允许多个模型的多次部署在时间上。但是,在公共政策领域等很多领域,多个部署是不现实的,或者 Even a single bad round 是不 desirable。为了解决这个差距,我们开始研究不知道回应的战略分类在一枚 Setting 中,需要在一次性地选择一个分类器。我们从用户的成本函数中的不确定性开始,我们证明了,even a small mis-estimation of the true cost 可以导致最差情况下的准确率为零。在这种情况下,我们将一枚 Setting 定义为一个 minimax 问题,目标是找到可以在不确定性集中的可能成本中最小最差情况的决策器。我们的主要贡献是对批处理和随机设置中的精炼算法,我们证明它们在线上 converges 到 minimax 优化的解决方案,具有约等于 $T^{- \frac{1}{2}$ 的缩放率。我们的分析表明了由战略性的用户回应带来的重要结构,特别是对于成本函数的双重范数规范。

ELEGANT: Certified Defense on the Fairness of Graph Neural Networks

  • paper_url: http://arxiv.org/abs/2311.02757
  • repo_url: https://github.com/yushundong/elegant
  • paper_authors: Yushun Dong, Binchi Zhang, Hanghang Tong, Jundong Li
  • for: Protecting Graph Neural Networks (GNNs) from attacks that corrupt the fairness level of their predictions
  • methods: Proposes a principled framework named ELEGANT with a detailed theoretical certification analysis to guarantee the fairness of GNNs
  • results: In practical experiments, ELEGANT was shown to be effective in preventing attackers from corrupting the fairness level of GNNs' predictions by adding perturbations, and it can also be used for GNN debiasing.
    Abstract Graph Neural Networks (GNNs) have emerged as a prominent graph learning model in various graph-based tasks over the years. Nevertheless, due to the vulnerabilities of GNNs, it has been empirically proved that malicious attackers could easily corrupt the fairness level of their predictions by adding perturbations to the input graph data. In this paper, we take crucial steps to study a novel problem of certifiable defense on the fairness level of GNNs. Specifically, we propose a principled framework named ELEGANT and present a detailed theoretical certification analysis for the fairness of GNNs. ELEGANT takes any GNNs as its backbone, and the fairness level of such a backbone is theoretically impossible to be corrupted under certain perturbation budgets for attackers. Notably, ELEGANT does not have any assumption over the GNN structure or parameters, and does not require re-training the GNNs to realize certification. Hence it can serve as a plug-and-play framework for any optimized GNNs ready to be deployed. We verify the satisfactory effectiveness of ELEGANT in practice through extensive experiments on real-world datasets across different backbones of GNNs, where ELEGANT is also demonstrated to be beneficial for GNN debiasing. Open-source code can be found at https://github.com/yushundong/ELEGANT.
    摘要 格网神经网络(GNNs)在各种基于图的任务中显示出了突出的表现。然而,由于GNNS的漏洞,实际证明了恶意攻击者可以轻松地腐蚀GNNS的预测公平性水平。在这篇论文中,我们研究了一个新的问题——GNNS公平性水平的证明防御。 Specifically, we propose a principled framework named ELEGANT and present a detailed theoretical certification analysis for the fairness of GNNs. ELEGANT takes any GNNs as its backbone, and the fairness level of such a backbone is theoretically impossible to be corrupted under certain perturbation budgets for attackers. Notably, ELEGANT does not make any assumptions about the GNN structure or parameters, and does not require re-training the GNNs to realize certification. Therefore, it can serve as a plug-and-play framework for any optimized GNNs ready to be deployed. We verify the satisfactory effectiveness of ELEGANT in practice through extensive experiments on real-world datasets across different backbones of GNNs, where ELEGANT is also demonstrated to be beneficial for GNN debiasing. 开源代码可以在https://github.com/yushundong/ELEGANT找到。

Staged Reinforcement Learning for Complex Tasks through Decomposed Environments

  • paper_url: http://arxiv.org/abs/2311.02746
  • repo_url: None
  • paper_authors: Rafael Pina, Corentin Artaud, Xiaolan Liu, Varuna De Silva
  • for: Applying reinforcement learning (RL) within intelligent control, particularly intelligent vehicle control.
  • methods: Proposes two ways of approximating RL problems to real problems: decomposing a complex task into multiple sub-tasks, and a training structuring mechanism that exploits experience learned under the Centralised Training Decentralised Execution (CTDE) paradigm.
  • results: Experiments show that the proposed approaches improve agents' performance on complex tasks related to traffic junctions and minimise potential safety-critical problems.
    Abstract Reinforcement Learning (RL) is an area of growing interest in the field of artificial intelligence due to its many notable applications in diverse fields. Particularly within the context of intelligent vehicle control, RL has made impressive progress. However, currently it is still in simulated controlled environments where RL can achieve its full super-human potential. Although how to apply simulation experience in real scenarios has been studied, how to approximate simulated problems to the real dynamic problems is still a challenge. In this paper, we discuss two methods that approximate RL problems to real problems. In the context of traffic junction simulations, we demonstrate that, if we can decompose a complex task into multiple sub-tasks, solving these tasks first can be advantageous to help minimising possible occurrences of catastrophic events in the complex task. From a multi-agent perspective, we introduce a training structuring mechanism that exploits the use of experience learned under the popular paradigm called Centralised Training Decentralised Execution (CTDE). This experience can then be leveraged in fully decentralised settings that are conceptually closer to real settings, where agents often do not have access to a central oracle and must be treated as isolated independent units. The results show that the proposed approaches improve agents performance in complex tasks related to traffic junctions, minimising potential safety-critical problems that might happen in these scenarios. Although still in simulation, the investigated situations are conceptually closer to real scenarios and thus, with these results, we intend to motivate further research in the subject.
    摘要 强化学习(RL)是人工智能领域的一个快速发展领域,具有各种应用场景的优势。特别是在智能控制领域,RL已经做出了卓越的进展。然而,目前RL仍然在模拟控制环境中达到了最高的超人类水平。虽然有研究如何将模拟经验应用于实际场景,但是如何近似模拟问题到实际动态问题仍然是一个挑战。在这篇论文中,我们讨论了两种方法可以将RL问题近似到实际问题。在交通立交点模拟中,我们示出了如果将复杂任务分解成多个子任务,解决这些子任务可以帮助避免在复杂任务中可能发生的潜在灾难事件。从多智能代理的视角来看,我们介绍了一种使用中央训练分布执行(CTDE)的训练结构机制,利用这种机制可以在完全分布式的设置中使用经验学习。这些经验可以在实际场景中使用,agent们在实际场景中通常不具备中央报告机制,因此这些经验可以在完全分布式的设置中帮助agent们提高完成复杂任务的能力。结果显示,提出的方法可以在交通立交点任务中提高agent的性能,避免可能发生的安全关键问题。虽然仍在模拟环境中, investigate的情况概念上更近于实际场景,因此我们希望通过这些结果激励更多的研究在这个领域。

Exploiting Correlated Auxiliary Feedback in Parameterized Bandits

  • paper_url: http://arxiv.org/abs/2311.02715
  • repo_url: None
  • paper_authors: Arun Verma, Zhongxiang Dai, Yao Shu, Bryan Kian Hsiang Low
  • for: Studies a novel variant of the parameterized bandits problem in which the learner observes additional auxiliary feedback that is correlated with the observed reward.
  • methods: Develops a reward estimator that exploits the auxiliary feedback to obtain tight confidence bounds, leading to a smaller regret.
  • results: Experimental results in different settings verify that the proposed method reduces regret and achieves better performance with various kinds of auxiliary feedback.
    Abstract We study a novel variant of the parameterized bandits problem in which the learner can observe additional auxiliary feedback that is correlated with the observed reward. The auxiliary feedback is readily available in many real-life applications, e.g., an online platform that wants to recommend the best-rated services to its users can observe the user's rating of service (rewards) and collect additional information like service delivery time (auxiliary feedback). In this paper, we first develop a method that exploits auxiliary feedback to build a reward estimator with tight confidence bounds, leading to a smaller regret. We then characterize the regret reduction in terms of the correlation coefficient between reward and its auxiliary feedback. Experimental results in different settings also verify the performance gain achieved by our proposed method.
    摘要 我们研究一种新的参数化强制投票问题变体,在该问题中学习者可以观察附加的auxiliary反馈,这些反馈与观察到的奖励相关。这些附加反馈在实际应用中很普遍,例如一个在线平台想要推荐用户最佳评分服务可以观察用户对服务的评分(奖励)并收集附加信息如服务交付时间(auxiliary反馈)。我们首先开发了一种利用附加反馈建立奖励估计器,并提供紧张的信息 bounds,从而减少了 regret。然后,我们Characterize了 regret reduction的相对评价差,并通过不同的设置的实验结果来验证我们的提posed方法的性能提升。
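
To make the role of correlated auxiliary feedback concrete, here is a minimal NumPy sketch of a control-variate style reward estimator: the auxiliary observations (delivery times, say) are centred by an assumed-known mean and subtracted from the rewards, shrinking the estimator's variance by roughly a factor of (1 - rho^2). This only illustrates the statistical idea, not the paper's estimator or its confidence-bound construction; the simulated data, the known auxiliary mean, and the plug-in coefficient are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Pulls of a single arm: reward r (e.g. a rating) and auxiliary feedback z
# (e.g. delivery time) are correlated with coefficient rho.
n, rho = 500, 0.8
mu_r, mu_z = 1.0, 2.0                      # mu_z assumed known from logged data
cov = np.array([[1.0, rho], [rho, 1.0]])
r, z = rng.multivariate_normal([mu_r, mu_z], cov, size=n).T

naive = r.mean()                           # reward-only estimator

beta = np.cov(r, z)[0, 1] / z.var()        # plug-in control-variate coefficient
cv = (r - beta * (z - mu_z)).mean()        # auxiliary-feedback-assisted estimator

print(f"naive estimate          : {naive:.4f}")
print(f"with auxiliary feedback : {cv:.4f}")
print(f"variance shrinks by ~ factor {1 - rho**2:.2f}, tightening confidence bounds")
```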

A Goal-Driven Approach to Systems Neuroscience

  • paper_url: http://arxiv.org/abs/2311.02704
  • repo_url: https://github.com/neuroailab/Neural-Alignment
  • paper_authors: Aran Nayebi
  • for: This thesis aims to offer a new way of interpreting how neural circuits give rise to behavior, addressing the shortcomings of classical single-neuron descriptions.
  • methods: It builds on experimental neuroscience paradigms that record and manipulate the activity of hundreds to thousands of neurons while animals perform complex behaviors.
  • results: The proposed notion of interpretability yields unified structural and functional models of neural circuits and is applied across multiple brain areas and species to study how intelligent behaviors arise.
    Abstract Humans and animals exhibit a range of interesting behaviors in dynamic environments, and it is unclear how our brains actively reformat this dense sensory information to enable these behaviors. Experimental neuroscience is undergoing a revolution in its ability to record and manipulate hundreds to thousands of neurons while an animal is performing a complex behavior. As these paradigms enable unprecedented access to the brain, a natural question that arises is how to distill these data into interpretable insights about how neural circuits give rise to intelligent behaviors. The classical approach in systems neuroscience has been to ascribe well-defined operations to individual neurons and provide a description of how these operations combine to produce a circuit-level theory of neural computations. While this approach has had some success for small-scale recordings with simple stimuli, designed to probe a particular circuit computation, often times these ultimately lead to disparate descriptions of the same system across stimuli. Perhaps more strikingly, many response profiles of neurons are difficult to succinctly describe in words, suggesting that new approaches are needed in light of these experimental observations. In this thesis, we offer a different definition of interpretability that we show has promise in yielding unified structural and functional models of neural circuits, and describes the evolutionary constraints that give rise to the response properties of the neural population, including those that have previously been difficult to describe individually. We demonstrate the utility of this framework across multiple brain areas and species to study the roles of recurrent processing in the primate ventral visual pathway; mouse visual processing; heterogeneity in rodent medial entorhinal cortex; and facilitating biological learning.
    摘要 人类和动物在动态环境中展现出各种 interessante 行为,但是我们的大脑如何活动地重新格式化这些紧密的感知信息以启用这些行为仍然不清楚。现代神经科学实验受到了记录和修改百到千个神经元的技术的革命,这些方法使得我们可以在动物表现复杂行为时获取至前无之有的脑部数据。随着这些方法的发展,一个自然的问题出现了:如何将这些数据转化成可解释的洞察。传统的系统神经科学方法是将各个神经元归功于特定的操作,并提供一种描述如何这些操作相互作用以生成神经计算的综合理论。虽然这种方法在小规模记录下有一定的成功,但是它在面对复杂的刺激时经常导致不同的描述,这些描述在不同的刺激下都是不一致的。事实上,许多神经元响应 profiles 很难以用字符串来描述,这表明需要新的方法。在这个论文中,我们提出了一种不同的可解释性定义,并证明该定义在生成神经Circuit 级别的结构和功能模型方面具有承诺。我们还证明了这种定义在多个脑区和种类中的应用,以研究恒定处理的角色,包括人类脑镜下部Visual 路径; 鼠类视觉处理; 鼠类中脑核心受体区域的多样性; 和促进生物学学习。

Enhancing AI Research Paper Analysis: Methodology Component Extraction using Factored Transformer-based Sequence Modeling Approach

  • paper_url: http://arxiv.org/abs/2311.03401
  • repo_url: None
  • paper_authors: Madhusudan Ghosh, Debasis Ganguly, Partha Basuchowdhuri, Sudip Kumar Naskar
  • for: This work aims to automatically extract the names of methodology components from scientific papers, improving extraction accuracy for fast-evolving AI literature.
  • methods: It proposes a factored approach to sequence modeling that leverages broad-level category information of methodology domains, e.g., NLP and RL (a simplified sketch follows this entry).
  • results: Experiments show that the factored approach outperforms state-of-the-art baselines by margins of up to 9.257% on the methodology extraction task in the few-shot setup.
    Abstract Research in scientific disciplines evolves, often rapidly, over time with the emergence of novel methodologies and their associated terminologies. While methodologies themselves being conceptual in nature and rather difficult to automatically extract and characterise, in this paper, we seek to develop supervised models for automatic extraction of the names of the various constituents of a methodology, e.g., `R-CNN', `ELMo' etc. The main research challenge for this task is effectively modeling the contexts around these methodology component names in a few-shot or even a zero-shot setting. The main contributions of this paper towards effectively identifying new evolving scientific methodology names are as follows: i) we propose a factored approach to sequence modeling, which leverages a broad-level category information of methodology domains, e.g., `NLP', `RL' etc.; ii) to demonstrate the feasibility of our proposed approach of identifying methodology component names under a practical setting of fast evolving AI literature, we conduct experiments following a simulated chronological setup (newer methodologies not seen during the training process); iii) our experiments demonstrate that the factored approach outperforms state-of-the-art baselines by margins of up to 9.257\% for the methodology extraction task with the few-shot setup.
    摘要 科学研究领域中的研究方法不断发展,经常快速地出现新的方法和其相关的术语。在这篇论文中,我们想要开发有监督模型来自动提取方法学Component的名称,例如“R-CNN”、“ELMo”等。我们的研究挑战是在几个或者 zeroshot设置下,有效地模型这些方法组件名称的上下文。我们的主要贡献如下:1. 我们提出了一种分解方法来模型序列,借鉴了方法学领域的大致类别信息,例如“NLP”、“RL”等。2. 为证明我们提出的方法在实际情况下可行,我们在快速演化的AI文献中进行了实验,采用了模拟时间序列的设置( newer methodologies not seen during the training process)。3. 我们的实验表明,我们的分解方法可以在几个或者 zeroshot设置下,与现有的基eline相比,提高了方法提取任务的效果,提高了9.257%。
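
As a rough illustration of what factoring a broad methodology-domain category into a sequence tagger could look like, the PyTorch sketch below embeds a domain label (e.g. NLP vs. RL) and concatenates it to every token state before the tagging layer. The architecture, vocabulary sizes, and tag set are made-up placeholders and do not reproduce the paper's transformer-based model.

```python
import torch
import torch.nn as nn

class FactoredTagger(nn.Module):
    """Toy BIO tagger whose token states are conditioned on a domain category."""

    def __init__(self, vocab_size=10_000, n_domains=5, d_tok=128, d_dom=16, n_tags=3):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_tok)
        self.dom_emb = nn.Embedding(n_domains, d_dom)          # e.g. NLP, RL, CV, ...
        self.encoder = nn.LSTM(d_tok, d_tok, batch_first=True, bidirectional=True)
        self.tagger = nn.Linear(2 * d_tok + d_dom, n_tags)     # B / I / O tags

    def forward(self, token_ids, domain_id):
        h, _ = self.encoder(self.tok_emb(token_ids))           # (B, T, 2*d_tok)
        dom = self.dom_emb(domain_id)                           # (B, d_dom)
        dom = dom.unsqueeze(1).expand(-1, h.size(1), -1)        # broadcast over tokens
        return self.tagger(torch.cat([h, dom], dim=-1))         # (B, T, n_tags)

# Tiny smoke test with random ids: batch of 2 sentences, 12 tokens each.
model = FactoredTagger()
logits = model(torch.randint(0, 10_000, (2, 12)), torch.tensor([0, 1]))
print(logits.shape)  # torch.Size([2, 12, 3])
```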

Identifying Linearly-Mixed Causal Representations from Multi-Node Interventions

  • paper_url: http://arxiv.org/abs/2311.02695
  • repo_url: None
  • paper_authors: Simon Bing, Urmi Ninad, Jonas Wahl, Jakob Runge
  • for: This work addresses the underconstrained problem of causal representation learning, particularly when multiple variables are intervened upon within one environment.
  • methods: The approach rests on a general assumption about the coverage and diversity of interventions across environments (which subsumes the single-node interventions of previous works); it exploits the trace that interventions leave on the variance of the ground-truth causal variables and regularizes for a specific notion of sparsity with respect to this trace.
  • results: Experiments show that the practical algorithm learns valid causal representations from multi-node interventional data, relaxing the restrictive single-node intervention assumption of prior work.
    Abstract The task of inferring high-level causal variables from low-level observations, commonly referred to as causal representation learning, is fundamentally underconstrained. As such, recent works to address this problem focus on various assumptions that lead to identifiability of the underlying latent causal variables. A large corpus of these preceding approaches consider multi-environment data collected under different interventions on the causal model. What is common to virtually all of these works is the restrictive assumption that in each environment, only a single variable is intervened on. In this work, we relax this assumption and provide the first identifiability result for causal representation learning that allows for multiple variables to be targeted by an intervention within one environment. Our approach hinges on a general assumption on the coverage and diversity of interventions across environments, which also includes the shared assumption of single-node interventions of previous works. The main idea behind our approach is to exploit the trace that interventions leave on the variance of the ground truth causal variables and regularizing for a specific notion of sparsity with respect to this trace. In addition to and inspired by our theoretical contributions, we present a practical algorithm to learn causal representations from multi-node interventional data and provide empirical evidence that validates our identifiability results.

Regret Analysis of Learning-Based Linear Quadratic Gaussian Control with Additive Exploration

  • paper_url: http://arxiv.org/abs/2311.02679
  • repo_url: None
  • paper_authors: Archith Athrey, Othmane Mazhar, Meichen Guo, Bart De Schutter, Shengling Shi
  • for: This paper analyzes the regret of a computationally efficient exploration strategy, known as naive exploration, for controlling unknown partially observable systems within the Linear Quadratic Gaussian (LQG) framework.
  • methods: It proposes a two-phase control algorithm called LQG-NAIVE, consisting of an initial phase that injects Gaussian input signals to obtain a system model, followed by an episodic interplay between naive exploration and control.
  • results: LQG-NAIVE is shown to achieve a regret growth rate of $\tilde{\mathcal{O}}(\sqrt{T})$, i.e., $\mathcal{O}(\sqrt{T})$ up to logarithmic factors after $T$ time steps. The paper also proposes LQG-IF2E, which incorporates the Fisher Information Matrix (FIM) into the exploration signal, and provides numerical evidence of its competitive performance compared to LQG-NAIVE.
    Abstract In this paper, we analyze the regret incurred by a computationally efficient exploration strategy, known as naive exploration, for controlling unknown partially observable systems within the Linear Quadratic Gaussian (LQG) framework. We introduce a two-phase control algorithm called LQG-NAIVE, which involves an initial phase of injecting Gaussian input signals to obtain a system model, followed by a second phase of an interplay between naive exploration and control in an episodic fashion. We show that LQG-NAIVE achieves a regret growth rate of $\tilde{\mathcal{O}}(\sqrt{T})$, i.e., $\mathcal{O}(\sqrt{T})$ up to logarithmic factors after $T$ time steps, and we validate its performance through numerical simulations. Additionally, we propose LQG-IF2E, which extends the exploration signal to a `closed-loop' setting by incorporating the Fisher Information Matrix (FIM). We provide compelling numerical evidence of the competitive performance of LQG-IF2E compared to LQG-NAIVE.
    摘要 在本文中,我们分析了computationally efficient exploration strategy(naive exploration)在Linear Quadratic Gaussian(LQG)框架下控制未知部分可观测系统中的 regret。我们提出了一种两相控制算法,即LQG-NAIVE,其包括一个初始阶段插入 Gaussian 输入信号以获得系统模型,然后是一个 episodic 的第二阶段,在这个阶段中,naive exploration 和控制之间进行了协调。我们证明了LQG-NAIVE 的 regret增长率为 $\tilde{\mathcal{O}(\sqrt{T})$,即在 $T$ 步时间后, regret 增长率为 $\mathcal{O}(\sqrt{T})$ 以上 logarithmic 因素。此外,我们还提出了LQG-IF2E,它在探索信号中包含了 Fisher Information Matrix(FIM)。我们通过数值实验证明了LQG-IF2E 的竞争性性比 LQG-NAIVE 更高。

Drone-Enabled Load Management for Solar Small Cell Networks in Next-Gen Communications Optimization for Solar Small Cells

  • paper_url: http://arxiv.org/abs/2311.02648
  • repo_url: None
  • paper_authors: Daksh Dave, Dhruv Khut, Sahil Nawale, Pushkar Aggrawal, Disha Rastogi, Kailas Devadkar
  • for: Green micro-grid energy management for base stations in 5G-and-beyond mobile communication networks.
  • methods: A load-transfer method that uses drone-carried airborne base stations to achieve stable and secure power reallocation.
  • results: Improves base station reliability and flexibility, reducing BS power outages while requiring a minimum number of drone exchanges.
    Abstract In recent years, the cellular industry has witnessed a major evolution in communication technologies. It is evident that the Next Generation of cellular networks(NGN) will play a pivotal role in the acceptance of emerging IoT applications supporting high data rates, better Quality of Service(QoS), and reduced latency. However, the deployment of NGN will introduce a power overhead on the communication infrastructure. Addressing the critical energy constraints in 5G and beyond, this study introduces an innovative load transfer method using drone-carried airborne base stations (BSs) for stable and secure power reallocation within a green micro-grid network. This method effectively manages energy deficit by transferring aerial BSs from high to low-energy cells, depending on user density and the availability of aerial BSs, optimizing power distribution in advanced cellular networks. The complexity of the proposed system is significantly lower as compared to existing power cable transmission systems currently employed in powering the BSs. Furthermore, our proposed algorithm has been shown to reduce BS power outages while requiring a minimum number of drone exchanges. We have conducted a thorough review on real-world dataset to prove the efficacy of our proposed approach to support BS during high load demand times

Pointer Networks with Q-Learning for OP Combinatorial Optimization

  • paper_url: http://arxiv.org/abs/2311.02629
  • repo_url: None
  • paper_authors: Alessandro Barro
  • for: solves the Orienteering Problem (OP)
  • methods: combines Pointer Networks (Ptr-Nets) and Q-learning
  • results: superior capability in managing OP situations
    Abstract The Orienteering Problem (OP) presents a unique challenge in combinatorial optimization, emphasized by its widespread use in logistics, delivery, and transportation planning. Given the NP-hard nature of OP, obtaining optimal solutions is inherently complex. While Pointer Networks (Ptr-Nets) have exhibited prowess in various combinatorial tasks, their performance in the context of OP leaves room for improvement. Recognizing the potency of Q-learning, especially when paired with deep neural structures, this research unveils the Pointer Q-Network (PQN). This innovative method combines Ptr-Nets and Q-learning, effectively addressing the specific challenges presented by OP. We deeply explore the architecture and efficiency of PQN, showcasing its superior capability in managing OP situations.
    摘要 Orienteering Problem(OP)呈现了 combinatorial optimization 领域的独特挑战,它在物流、交通规划等领域广泛应用。由于 OP 的NP-硬性,获得优化解决方案是自然复杂的。然而,Pointer Networks(Ptr-Nets)在其他 combinatorial 任务中表现出色,但在 OP 中的表现仍有空间提升。本研究认识到 Q-学习的能力,特别是在与深度神经结构结合时,因此提出了 Pointer Q-Network(PQN)。这种创新方法结合了 Ptr-Nets 和 Q-学习,有效地解决了 OP 中的特定挑战。我们深入探讨 PQN 的architecture和效率,展示其在 OP 中的superior 能力。
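
A minimal sketch of the kind of coupling PQN suggests: a pointer-style attention head scores candidate nodes and those scores are read as Q-values, from which the next node is chosen epsilon-greedily while masking already-visited nodes. The layer sizes and the decoding rule are assumptions for illustration; the paper's actual PQN architecture and its Q-learning training loop (with OP rewards and the route-budget constraint) are not reproduced here.

```python
import torch
import torch.nn as nn

class PointerQHead(nn.Module):
    """Toy pointer-style Q-head: attention scores over candidate nodes act as Q-values."""

    def __init__(self, d_node=16, d_hidden=64):
        super().__init__()
        self.enc = nn.Linear(d_node, d_hidden)      # encodes each candidate node
        self.dec = nn.Linear(d_node, d_hidden)      # encodes the current "query" node
        self.v = nn.Linear(d_hidden, 1, bias=False)

    def forward(self, nodes, current):
        # nodes: (N, d_node) candidate locations; current: (d_node,) agent state.
        scores = self.v(torch.tanh(self.enc(nodes) + self.dec(current))).squeeze(-1)
        return scores                                # interpreted as Q(s, a) per node

def epsilon_greedy(q_values, visited_mask, eps=0.1):
    """Pick the next node, never revisiting, with epsilon-greedy exploration."""
    q = q_values.masked_fill(visited_mask, float("-inf"))
    if torch.rand(()) < eps:
        choices = (~visited_mask).nonzero().squeeze(-1)
        return choices[torch.randint(len(choices), ())]
    return q.argmax()

# Smoke test: 8 random 2-D nodes padded to d_node with zeros.
nodes = torch.zeros(8, 16); nodes[:, :2] = torch.rand(8, 2)
head = PointerQHead()
q = head(nodes, nodes[0])
visited = torch.zeros(8, dtype=torch.bool); visited[0] = True
print("next node:", int(epsilon_greedy(q, visited)))
```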

An adaptive standardisation model for Day-Ahead electricity price forecasting

  • paper_url: http://arxiv.org/abs/2311.02610
  • repo_url: https://github.com/ccaribe9/adaptstdepf
  • paper_authors: Carlos Sebastián, Carlos E. González-Guillén, Jesús Juan
  • for: Electricity market day-ahead price forecasting
  • methods: Introducing adaptive standardization to mitigate dataset shifts and improve forecasting performance
  • results: Significant improvement in forecasting accuracy across four markets, including two novel datasets, using less complex and widely accepted learning algorithms.
    Abstract The study of Day-Ahead prices in the electricity market is one of the most popular problems in time series forecasting. Previous research has focused on employing increasingly complex learning algorithms to capture the sophisticated dynamics of the market. However, there is a threshold where increased complexity fails to yield substantial improvements. In this work, we propose an alternative approach by introducing an adaptive standardisation to mitigate the effects of dataset shifts that commonly occur in the market. By doing so, learning algorithms can prioritize uncovering the true relationship between the target variable and the explanatory variables. We investigate four distinct markets, including two novel datasets, previously unexplored in the literature. These datasets provide a more realistic representation of the current market context, that conventional datasets do not show. The results demonstrate a significant improvement across all four markets, using learning algorithms that are less complex yet widely accepted in the literature. This significant advancement unveils opens up new lines of research in this field, highlighting the potential of adaptive transformations in enhancing the performance of forecasting models.
    摘要 研究一天前价格在电力市场是时间序列预测中最受欢迎的问题。先前的研究强调使用越来越复杂的学习算法来捕捉市场的复杂动态。然而,有一个阈值,其中增加复杂性不会带来显著改善。在这种情况下,我们提议一种不同的方法,即引入适应标准化,以mitigate dataset shifts常见于市场中。这样做可以使学习算法更加注重捕捉target变量和解释变量之间的真实关系。我们对四个市场进行了研究,包括两个新的数据集,之前从未出现在文献中。这些数据集提供了更加现实的市场背景,与 conventient datasets不同。结果显示在所有四个市场中有显著改善,使用在文献中广泛accepted的学习算法。这一显著进步揭示了适应转换在预测模型性能提高方面的潜在力量,开启了新的研究方向。
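
The adaptive-standardisation idea, i.e. normalising prices with statistics that track the current market regime instead of fixed training-set statistics, can be sketched in a few lines of pandas. The exponentially weighted window, its half-life, and the synthetic price series are illustrative assumptions rather than the paper's exact scheme.

```python
import numpy as np
import pandas as pd

# Hourly day-ahead prices (synthetic stand-in for a real market series).
idx = pd.date_range("2021-01-01", periods=24 * 365, freq="h")
price = pd.Series(40 + 10 * np.sin(np.arange(len(idx)) / 24)
                  + np.random.default_rng(0).normal(0, 5, len(idx)), index=idx)

# Adaptive standardisation: mean/std follow the recent regime (shifted to avoid leakage).
mu = price.ewm(halflife=24 * 14).mean().shift(1)
sigma = price.ewm(halflife=24 * 14).std().shift(1)
z = (price - mu) / sigma          # feed z to the forecasting model instead of raw prices
reconstructed = z * sigma + mu    # invert the transform on the model's predictions

print(z.describe()[["mean", "std"]])
```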

Steady-State Analysis of Queues with Hawkes Arrival and Its Application to Online Learning for Hawkes Queues

  • paper_url: http://arxiv.org/abs/2311.02577
  • repo_url: None
  • paper_authors: Xinyun Chen, Guiyu Hong
  • for: This paper investigates the long-run behavior of single-server queues with Hawkes arrivals and general service distributions, together with related optimization problems.
  • methods: Using novel coupling techniques, it establishes finite moment bounds for the stationary distributions of the workload and busy-period processes and shows that these queueing processes converge exponentially fast to their stationary distributions (a simulation sketch follows this entry).
  • results: Building on these theoretical results, the paper develops an efficient data-driven numerical algorithm for the optimal staffing problem of Hawkes queues; numerical results indicate a sharp difference in staffing compared to the classic GI/GI/1 model, especially in the heavy-traffic regime.
    Abstract We investigate the long-run behavior of single-server queues with Hawkes arrivals and general service distributions and related optimization problems. In detail, utilizing novel coupling techniques, we establish finite moment bounds for the stationary distribution of the workload and busy period processes. In addition, we are able to show that, those queueing processes converge exponentially fast to their stationary distribution. Based on these theoretic results, we develop an efficient numerical algorithm to solve the optimal staffing problem for the Hawkes queues in a data-driven manner. Numerical results indicate a sharp difference in staffing for Hawkes queues, compared to the classic GI/GI/1 model, especially in the heavy-traffic regime.
    摘要 我们研究单服务器队列中的长期行为,包括途径 Hawkes 的到达和一般服务分布,以及相关的优化问题。在详细的探讨中,我们利用新的 Coupling 技术,确定了工作负荷和忙期过程的finite moment bound。此外,我们还证明了这些队列过程在 exponentially fast 速度下关于其站点分布的整体准确性。基于这些理论结果,我们开发了一种高效的数据驱动的数字算法,解决 Hawkes 队列的优化人员问题。 numerically 的结果表明,在高负荷情况下,Hawkes 队列的人员配置和 класси GI/GI/1 模型之间存在很大的差异。
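
As a practical companion to the steady-state analysis, the sketch below simulates a single-server queue with Hawkes arrivals: an exponential-kernel Hawkes process is generated by Ogata-style thinning, and the workload seen by each arrival follows from the Lindley recursion. The kernel, parameter values, and burn-in choice are assumptions; simulated moments of this kind are the sort of quantity a data-driven staffing search could use, though this is not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_hawkes(mu, alpha, beta, horizon):
    """Ogata-style thinning for a Hawkes process with an exponential excitation kernel."""
    t, g, events = 0.0, 0.0, []             # g = excitation above the baseline rate mu
    while True:
        lam_bar = mu + g                     # intensity can only decay until the next event
        dt = rng.exponential(1.0 / lam_bar)
        t += dt
        if t > horizon:
            return np.array(events)
        g *= np.exp(-beta * dt)              # decay the excitation to the proposed time
        if rng.uniform() <= (mu + g) / lam_bar:   # thinning acceptance test
            events.append(t)
            g += alpha                        # each arrival excites future intensity

arrivals = simulate_hawkes(mu=0.5, alpha=0.4, beta=1.0, horizon=20_000)
services = rng.exponential(0.8, size=len(arrivals))       # i.i.d. service times, mean 0.8
inter = np.diff(arrivals, prepend=0.0)

w = np.zeros(len(arrivals))                                # Lindley recursion: workload /
for i in range(1, len(arrivals)):                          # waiting time seen by each arrival
    w[i] = max(w[i - 1] + services[i - 1] - inter[i], 0.0)

burn = len(w) // 5                                         # crude burn-in before "steady state"
print(f"{len(arrivals)} arrivals, mean stationary workload ~ {w[burn:].mean():.3f}, "
      f"95th percentile ~ {np.percentile(w[burn:], 95):.3f}")
```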

Temporal Treasure Hunt: Content-based Time Series Retrieval System for Discovering Insights

  • paper_url: http://arxiv.org/abs/2311.02560
  • repo_url: None
  • paper_authors: Chin-Chia Michael Yeh, Huiyuan Chen, Xin Dai, Yan Zheng, Yujie Fan, Vivian Lai, Junpeng Wang, Audrey Der, Zhongfang Zhuang, Liang Wang, Wei Zhang
  • for: This work addresses content-based time series retrieval (CTSR) when the database contains time series from multiple domains.
  • methods: It benchmarks several popular methods for modeling and retrieving time series data on a new multi-domain benchmark dataset and proposes a novel distance learning model (a simple retrieval baseline is sketched after this entry).
  • results: The proposed distance learning model outperforms the existing methods on the multi-domain retrieval task.
    Abstract Time series data is ubiquitous across various domains such as finance, healthcare, and manufacturing, but their properties can vary significantly depending on the domain they originate from. The ability to perform Content-based Time Series Retrieval (CTSR) is crucial for identifying unknown time series examples. However, existing CTSR works typically focus on retrieving time series from a single domain database, which can be inadequate if the user does not know the source of the query time series. This limitation motivates us to investigate the CTSR problem in a scenario where the database contains time series from multiple domains. To facilitate this investigation, we introduce a CTSR benchmark dataset that comprises time series data from a variety of domains, such as motion, power demand, and traffic. This dataset is sourced from a publicly available time series classification dataset archive, making it easily accessible to researchers in the field. We compare several popular methods for modeling and retrieving time series data using this benchmark dataset. Additionally, we propose a novel distance learning model that outperforms the existing methods. Overall, our study highlights the importance of addressing the CTSR problem across multiple domains and provides a useful benchmark dataset for future research.
    摘要 时序数据在不同领域 everywhere,如金融、医疗和制造等,但它们的属性可以很大不同。能够实现基于内容的时序数据检索(CTSR)是识别未知时序例子的重要能力。然而,现有的CTSR工作通常将注意力集中在单一领域数据库上,这可能不够用于用户不知道查询时序序列的来源。这种限制使我们感到需要调查多个领域数据库中的CTSR问题。为了实现这一目的,我们提出了一个CTSRBenchmark dataset,该dataset包含多个领域的时序数据,如运动、电力需求和交通。这些数据来自公共可用时序分类数据集存档,因此可以让研究人员在领域中轻松地访问。我们比较了多种流行的时序数据模型化和检索方法,并提出了一种新的距离学习模型,该模型在CTSRBenchmark dataset上表现出色。总之,我们的研究强调了跨多个领域的CTSR问题的重要性,并提供了一个有用的CTSRBenchmark dataset,为未来的研究提供了便利。
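
For orientation, here is a deliberately simple CTSR baseline (not the paper's distance learning model): every series is resampled to a fixed length, z-normalised, and retrieved by Euclidean nearest neighbours across a toy multi-domain database. The feature length and the synthetic "motion / power demand / traffic" stand-ins are assumptions.

```python
import numpy as np

def features(ts, length=128):
    """Resample a series to a fixed length and z-normalize it."""
    ts = np.asarray(ts, dtype=float)
    x = np.interp(np.linspace(0, len(ts) - 1, length), np.arange(len(ts)), ts)
    return (x - x.mean()) / (x.std() + 1e-8)

def retrieve(query, database, k=3):
    """Return indices of the k database series closest to the query."""
    q = features(query)
    d = np.array([np.linalg.norm(q - features(ts)) for ts in database])
    return np.argsort(d)[:k], d

# Toy multi-domain database: sinusoids ("motion"), ramps ("power demand"), noise ("traffic").
rng = np.random.default_rng(0)
db = (
    [np.sin(f * np.linspace(0, 6, 200)) for f in (1, 2, 3)]
    + [np.linspace(0, a, 300) + rng.normal(0, 0.05, 300) for a in (1, 5)]
    + [rng.normal(0, 1, 250) for _ in range(3)]
)

query = np.sin(2.1 * np.linspace(0, 6, 170))
top, dist = retrieve(query, db)
print("nearest neighbours:", top, "distances:", np.round(dist[top], 3))
```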

Fast Minimization of Expected Logarithmic Loss via Stochastic Dual Averaging

  • paper_url: http://arxiv.org/abs/2311.02557
  • repo_url: None
  • paper_authors: Chung-En Tsai, Hao-Chung Cheng, Yen-Huan Li
  • for: The paper targets minimizing an expected logarithmic loss over either the probability simplex or the set of quantum density matrices.
  • methods: It proposes a stochastic first-order algorithm with a logarithmic barrier, named $B$-sample stochastic dual averaging (a toy version is sketched after this entry).
  • results: For the Poisson inverse problem the algorithm attains an $\varepsilon$-optimal solution in $\tilde{O}(d^2/\varepsilon^2)$ time, matching the state of the art; for maximum-likelihood quantum state tomography it needs $\tilde{O}(d^3/\varepsilon^2)$ time, improving on existing stochastic first-order methods by a factor of $d^{\omega-2}$ and on batch methods by a factor of $d^2$, where $\omega$ is the matrix multiplication exponent.
    Abstract Consider the problem of minimizing an expected logarithmic loss over either the probability simplex or the set of quantum density matrices. This problem encompasses tasks such as solving the Poisson inverse problem, computing the maximum-likelihood estimate for quantum state tomography, and approximating positive semi-definite matrix permanents with the currently tightest approximation ratio. Although the optimization problem is convex, standard iteration complexity guarantees for first-order methods do not directly apply due to the absence of Lipschitz continuity and smoothness in the loss function. In this work, we propose a stochastic first-order algorithm named $B$-sample stochastic dual averaging with the logarithmic barrier. For the Poisson inverse problem, our algorithm attains an $\varepsilon$-optimal solution in $\tilde{O} (d^2/\varepsilon^2)$ time, matching the state of the art. When computing the maximum-likelihood estimate for quantum state tomography, our algorithm yields an $\varepsilon$-optimal solution in $\tilde{O} (d^3/\varepsilon^2)$ time, where $d$ denotes the dimension. This improves on the time complexities of existing stochastic first-order methods by a factor of $d^{\omega-2}$ and those of batch methods by a factor of $d^2$, where $\omega$ denotes the matrix multiplication exponent. Numerical experiments demonstrate that empirically, our algorithm outperforms existing methods with explicit complexity guarantees.
    摘要 问题是最小化预期的含阶函数损失的问题,这个问题包括解决波索因 inverse 问题、计算量子状态探测的最大可能性估计、以及使用当前最紧的比率来近似正semidefinite 矩阵的 permanents。尽管优化问题是凸的,但标准的第一阶方法的证明不直接适用,因为损失函数没有 lipschitz 连续和光滑性。在这篇文章中,我们提出了一种 Stochastic first-order 算法,名为 $B$-sample stochastic dual averaging with logarithmic barrier。对于波索因 inverse 问题,我们的算法可以在 $\tilde{O} (d^2/\varepsilon^2)$ 时间内获得 $\varepsilon$-优的解,与当前状态之冲突。当计算量子状态探测的最大可能性估计时,我们的算法可以在 $\tilde{O} (d^3/\varepsilon^2)$ 时间内获得 $\varepsilon$-优的解,其中 $d$ 是维度。这比现有的随机第一阶方法的时间复杂度增加 $d^{\omega}-2}$,并且比批处理方法增加 $d^2$,其中 $\omega$ 是矩阵乘法 exponent。实验表明,我们的算法在实际中比现有的方法 WITH 显式复杂度保证更好。
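
A toy reading of dual averaging with a logarithmic barrier for the simplex-constrained log-loss objective min_x E[-log<a, x>] is sketched below: each step averages B stochastic gradients and then solves the barrier-regularised linear subproblem by bisection on the simplex multiplier. The barrier-weight schedule, the bisection solver, and the synthetic data are assumptions, and this does not reproduce the paper's algorithm or its complexity guarantees.

```python
import numpy as np

rng = np.random.default_rng(0)
d, B, T = 20, 8, 2000
col_scale = rng.uniform(0.2, 2.0, d)            # heterogeneous columns so the optimum is non-uniform
A = rng.random((500, d)) * col_scale + 0.01      # non-negative rows a_j; f(x) = mean_j -log<a_j, x>

def barrier_step(G, beta):
    """argmin_x <G, x> - beta * sum(log x_i)  s.t.  sum x_i = 1, via bisection on the multiplier."""
    lo = -G.min() + 1e-12                        # multiplier must keep every G_i + lam positive
    hi = -G.min() + d * beta                     # at this value the candidate x already sums to <= 1
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if (beta / (G + mid)).sum() > 1.0:
            lo = mid
        else:
            hi = mid
    x = beta / (G + hi)
    return x / x.sum()                           # renormalise away residual bisection error

x, G = np.full(d, 1.0 / d), np.zeros(d)
for t in range(1, T + 1):
    batch = A[rng.integers(0, len(A), size=B)]           # B-sample minibatch
    G += (-batch / (batch @ x)[:, None]).mean(axis=0)    # stochastic gradient of the log loss
    x = barrier_step(G, beta=np.sqrt(t))                  # growing barrier weight (assumed schedule)

print("objective after training:", np.mean(-np.log(A @ x)))
print("uniform-x objective     :", np.mean(-np.log(A @ np.full(d, 1.0 / d))))
```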

High-dimensional Bid Learning for Energy Storage Bidding in Energy Markets

  • paper_url: http://arxiv.org/abs/2311.02551
  • repo_url: None
  • paper_authors: Jinyu Liu, Hongye Guo, Qinghu Tang, En Lu, Qiuna Cai, Qixin Chen
  • for: optimize the profitability of Energy Storage Systems (ESSs) in electricity markets with high volatility
  • methods: modify the common reinforcement learning (RL) process with a new bid representation method called Neural Network Embedded Bids (NNEBs), which represents market bids as monotonic neural networks with discrete outputs
  • results: achieve 18% higher profit than the baseline and up to 78% profit of the optimal market bidder through experiments on real-world market datasets
    Abstract With the growing penetration of renewable energy resource, electricity market prices have exhibited greater volatility. Therefore, it is important for Energy Storage Systems(ESSs) to leverage the multidimensional nature of energy market bids to maximize profitability. However, current learning methods cannot fully utilize the high-dimensional price-quantity bids in the energy markets. To address this challenge, we modify the common reinforcement learning(RL) process by proposing a new bid representation method called Neural Network Embedded Bids (NNEBs). NNEBs refer to market bids that are represented by monotonic neural networks with discrete outputs. To achieve effective learning of NNEBs, we first learn a neural network as a strategic mapping from the market price to ESS power output with RL. Then, we re-train the network with two training modifications to make the network output monotonic and discrete. Finally, the neural network is equivalently converted into a high-dimensional bid for bidding. We conducted experiments over real-world market datasets. Our studies show that the proposed method achieves 18% higher profit than the baseline and up to 78% profit of the optimal market bidder.
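
To illustrate the monotone, discrete-output bid representation at the heart of NNEBs, the PyTorch sketch below constrains a small network's weights to be non-negative (via softplus), which makes the price-to-power curve non-decreasing, and snaps its output onto a discrete grid of power levels. The layer sizes, output scaling, and snapping rule are illustrative assumptions, and the RL training that the paper wraps around such bids is not shown.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MonotonicBid(nn.Module):
    """Toy monotone (non-decreasing) price -> ESS power curve with a discrete output grid."""

    def __init__(self, hidden=32, p_max=10.0, n_levels=11):
        super().__init__()
        self.w1 = nn.Parameter(torch.randn(hidden, 1) * 0.5)
        self.b1 = nn.Parameter(torch.zeros(hidden))
        self.w2 = nn.Parameter(torch.randn(1, hidden) * 0.5)
        self.b2 = nn.Parameter(torch.zeros(1))
        self.register_buffer("levels", torch.linspace(0.0, p_max, n_levels))

    def forward(self, price):
        # softplus keeps the effective weights non-negative, so the curve cannot decrease in price
        h = torch.tanh(price @ F.softplus(self.w1).T + self.b1)
        raw = torch.sigmoid(h @ F.softplus(self.w2).T + self.b2) * self.levels[-1]
        idx = torch.argmin((raw.unsqueeze(-1) - self.levels).abs(), dim=-1)
        return self.levels[idx]                     # snap to the discrete power levels

bid = MonotonicBid()
prices = torch.linspace(0.0, 2.0, 6).unsqueeze(-1)   # (normalised) price axis of the bid
curve = bid(prices).squeeze(-1)
print(curve)                                          # non-decreasing power quantities
print(bool((curve[1:] >= curve[:-1]).all()))          # monotonicity holds by construction
```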

Preliminary Analysis on Second-Order Convergence for Biased Policy Gradient Methods

  • paper_url: http://arxiv.org/abs/2311.02546
  • repo_url: None
  • paper_authors: Siqiao Mu, Diego Klabjan
  • for: Studies the convergence behavior of policy gradient algorithms on the highly nonconvex objectives of reinforcement learning.
  • methods: Rather than relying on regularity assumptions on the function structure needed for global convergence guarantees, the analysis pursues second-order guarantees for biased policy gradient updates, using proof techniques from nonconvex optimization.
  • results: Presents preliminary results on the convergence of biased policy gradient algorithms to second-order stationary points; future work aims to provide the first finite-time second-order convergence analysis for actor-critic algorithms.
    Abstract Although the convergence of policy gradient algorithms to first-order stationary points is well-established, the objective functions of reinforcement learning problems are typically highly nonconvex. Therefore, recent work has focused on two extensions: ``global" convergence guarantees under regularity assumptions on the function structure, and second-order guarantees for escaping saddle points and convergence to true local minima. Our work expands on the latter approach, avoiding the restrictive assumptions of the former that may not apply to general objective functions. Existing results on vanilla policy gradient only consider an unbiased gradient estimator, but practical implementations under the infinite-horizon discounted setting, including both Monte-Carlo methods and actor-critic methods, involve gradient descent updates with a biased gradient estimator. We present preliminary results on the convergence of biased policy gradient algorithms to second-order stationary points, leveraging proof techniques from nonconvex optimization. In our next steps we aim to provide the first finite-time second-order convergence analysis for actor-critic algorithms.

eess.IV - 2023-11-05

Flexible uniform-sampling foveated Fourier single-pixel imaging

  • paper_url: http://arxiv.org/abs/2311.02646
  • repo_url: None
  • paper_authors: Huan Cui, Jie Cao, Qun Hao, Haoyu Zhang, Chang Zhou
  • for: The paper aims to achieve high-quality single-pixel imaging (SPI) using fewer measurements, which is essential for real-time SPI applications.
  • methods: The proposed method, called uniform-sampling foveated FSI (UFFSI), utilizes three features: uniform sampling, effective sampling, and flexible fovea. These features reduce data redundancy, transform non-uniform sampling into uniform sampling, and achieve under-sampling high-efficiency and high-quality SPI.
  • results: Experimental results show that, at a sampling ratio of 0.0084 relative to high-resolution FSI with 1024*768 pixels, UFFSI with 255*341 cells (an 89% reduction in data redundancy) achieves significantly better imaging quality on the region of interest while reducing the number of measurements required, which may pave the way for future real-time SPI applications.
    Abstract Fourier single-pixel imaging (FSI) is a data-efficient single-pixel imaging (SPI). However, there is still a serious challenge to obtain higher imaging quality using fewer measurements, which limits the development of real-time SPI. In this work, a uniform-sampling foveated FSI (UFFSI) is proposed with three features, uniform sampling, effective sampling and flexible fovea, to achieve under-sampling high-efficiency and high-quality SPI, even in a large-scale scene. First, by flexibly using the three proposed foveated pattern structures, data redundancy is reduced significantly to only require high resolution (HR) on regions of interest (ROIs), which radically reduces the need of total data number. Next, by the non-uniform weight distribution processing, non-uniform spatial sampling is transformed into uniform sampling, then the fast Fourier transform is used accurately and directly to obtain under-sampling high imaging quality with further reduced measurements. At a sampling ratio of 0.0084 referring to HR FSI with 1024*768 pixels, experimentally, by UFFSI with 255*341 cells of 89% reduction in data redundancy, the ROI has a significantly better imaging quality to meet imaging needs. We hope this work can provide a breakthrough for future real-time SPI.
    摘要 富含单个像素成像(FSI)是一种数据效率高的单个像素成像(SPI)。然而,在使用更少测量时,获得更高质量成像仍然是一个严重挑战,这限制了实时SPI的发展。在这种工作中,我们提出了一种固定样式的均匀抽象抽象(UFFSI),具有三个特点:均匀采样、有效采样和灵活覆盖。通过使用这三种提议的覆盖模式,减少了数据繁殖,只需要在关键区域(ROI)中高分辨率(HR),从而减少了总数据量。然后,通过非均匀权重分布处理,将非均匀的空间采样转换为均匀采样,然后使用快速傅立叶变换,直接获得减少测量的高质量成像。在0.0084的抽象比例(HR FSI)下,实验证明,使用UFFSI的255*341个细胞,可以减少数据繁殖的89%,ROI中的成像质量得到明显改善。我们希望这种工作可以为未来实时SPI提供一个突破。

eess.SP - 2023-11-05

  • paper_url: http://arxiv.org/abs/2311.02691
  • repo_url: None
  • paper_authors: Yanshi Sun, Yanglin Ye, Zhiguo Ding, Momiao Zhou, Lei Liu
  • for: This work applies cognitive radio inspired non-orthogonal multiple access (CR-NOMA) to reduce the age of information (AoI) in uplink transmission.
  • methods: A time division multiple access (TDMA) based legacy network is considered, where each user is allocated a dedicated time slot to transmit its status update. CR-NOMA is implemented as an add-on that lets each user transmit in other users' time slots, increasing transmission opportunities. A rigorous analytical framework is developed to derive AoI expressions for CR-NOMA with and without re-transmission, taking the randomness of the status update generating process into account.
  • results: Applying CR-NOMA significantly reduces AoI compared to TDMA, and re-transmission further reduces AoI, especially when the status arrival rate is low.
    Abstract This paper studies the application of cognitive radio inspired non-orthogonal multiple access (CR-NOMA) to reduce age of information (AoI) for uplink transmission. In particular, a time division multiple access (TDMA) based legacy network is considered, where each user is allocated with a dedicated time slot to transmit its status update information. The CR-NOMA is implemented as an add-on to the TDMA legacy network, which enables each user to have more opportunities to transmit by sharing other user's time slots. A rigorous analytical framework is developed to obtain the expressions for AoIs achieved by CR-NOMA with and without re-transmission, by taking the randomness of the status update generating process into consideration. Numerical results are presented to verify the accuracy of the developed analysis. It is shown that the AoI can be significantly reduced by applying CR-NOMA compared to TDMA. Moreover, the use of re-transmission is helpful to reduce AoI, especially when the status arrival rate is low.
    摘要 To analyze the performance of CR-NOMA, a rigorous analytical framework is developed, taking into account the randomness of the status update generating process. The expressions for AoIs achieved by CR-NOMA with and without re-transmission are derived, and numerical results are presented to verify the accuracy of the analysis.The results show that CR-NOMA can significantly reduce AoI compared to TDMA, and the use of re-transmission is particularly beneficial when the status arrival rate is low. This suggests that CR-NOMA can be an effective technique for improving the efficiency of uplink transmission in legacy networks.

An Open Dataset Storage Standard for 6G Testbeds

  • paper_url: http://arxiv.org/abs/2311.02662
  • repo_url: None
  • paper_authors: Gilles Callebaut, Michiel Sandra, Christian Nelson, Thomas Wilding, Daan Delabie, Benjamin J. B. Deutschmann, William Tärneberg, Emma Fitzgerald, Anders J. Johansson, Liesbet Van der Perre
  • for: To make the data generated by 6G testbeds publicly available and easily reusable, promoting data sharing and collaboration in the research community.
  • methods: Proposes the Dataset Storage Standard (DSS), which facilitates data exchange and the creation of testbed-agnostic processing scripts, supports both experimental and simulated data, and uses a hierarchical format with a tensor representation for specific experiment scenarios (a hypothetical illustration follows this entry).
  • results: DSS enhances the FAIR principles (Findability, Accessibility, Interoperability, Reusability) for 6G testbeds and, unlike SigMF or the NI RF Data Recording API, is not limited to RF data storage.
    Abstract The emergence of sixth-generation (6G) networks has spurred the development of novel testbeds, including sub-THz networks, cell-free systems, and 6G simulators. To maximize the benefits of these systems, it is crucial to make the generated data publicly available and easily reusable by others. Although data sharing has become a common practice, a lack of standardization hinders data accessibility and interoperability. In this study, we propose the Dataset Storage Standard (DSS) to address these challenges by facilitating data exchange and enabling convenient processing script creation in a testbed-agnostic manner. DSS supports both experimental and simulated data, allowing researchers to employ the same processing scripts and tools across different datasets. Unlike existing standardization efforts such as SigMF and NI RF Data Recording API, DSS provides a broader scope by accommodating a common definition file for testbeds and is not limited to RF data storage. The dataset format utilizes a hierarchical structure, with a tensor representation for specific experiment scenarios. In summary, DSS offers a comprehensive and flexible framework for enhancing the FAIR principles (Findability, Accessibility, Interoperability, and Reusability) in 6G testbeds, promoting open and efficient data sharing in the research community.
    摘要 “六代网络(6G)的出现促进了新的测试平台的发展,包括Sub-THz网络、无终端系统和6G模拟器。为了最大化这些系统的利器,它是非常重要的让生成的数据公开可用,并且可以轻松地重用其他人。虽然数据分享已经成为常见的做法,但是数据访问和兼容性受到标准化的限制。在这项研究中,我们提议使用数据存储标准(DSS)来解决这些挑战,使得数据交换和处理脚本的创建变得更加简单和通用。DSS支持实验和模拟数据,允许研究人员使用相同的处理脚本和工具来处理不同的数据集。与现有的标准化尝试如SigMF和NI RF数据记录API不同,DSS具有更广泛的范围,可以涵盖各种测试平台的公定定义文件。数据格式采用层次结构,使用tensor表示法来描述特定的实验场景。总之,DSS提供了一个全面和灵活的框架,以便在6G测试平台中提高FAIR原则(找到、访问、兼容性和重用),推动开放和高效的数据分享在研究 сообществе。”
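
The abstract does not spell out the DSS schema, so the snippet below is purely hypothetical: it only illustrates the general idea of a common, testbed-agnostic definition file paired with tensor-shaped measurement arrays. None of the field names or file layouts are taken from the actual standard.

```python
import json
import numpy as np

# Hypothetical layout inspired by the idea of a common definition file plus tensor data;
# the field names below are NOT the DSS specification, just an illustration.
definition = {
    "testbed": "my-subthz-testbed",          # assumed identifier
    "scenario": "indoor-office",
    "dimensions": ["snapshot", "rx_antenna", "tx_antenna", "subcarrier"],
    "units": {"frequency_hz": 1.4e11, "bandwidth_hz": 2e9},
}

# Measurement tensor shaped according to the declared dimensions.
cfr = (np.random.default_rng(0).standard_normal((100, 4, 2, 256))
       + 1j * np.random.default_rng(1).standard_normal((100, 4, 2, 256)))

with open("dataset_definition.json", "w") as f:
    json.dump(definition, f, indent=2)
np.savez_compressed("dataset_tensors.npz", channel_frequency_response=cfr)

# A testbed-agnostic processing script only needs the definition to interpret the tensor.
with open("dataset_definition.json") as f:
    meta = json.load(f)
data = np.load("dataset_tensors.npz")["channel_frequency_response"]
print(meta["dimensions"], data.shape)
```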

Exploiting Hybrid Terrestrial/LEO Satellite Systems for Rural Connectivity

  • paper_url: http://arxiv.org/abs/2311.02591
  • repo_url: None
  • paper_authors: Houcem Ben Salem, Nour Kouzayha, Ammar EL Falou, Mohamed-Slim Alouini, Tareq Y. Al-Naffouri
  • for: To assess the performance of hybrid terrestrial/satellite networks in providing rural connectivity.
  • methods: Uses tools from stochastic geometry to derive tractable expressions for the coverage probability and average data rate when a user can associate with either a MIMO-equipped terrestrial base station or a satellite.
  • results: Monte Carlo simulations validate the accuracy of the derived expressions, and the results capture the impact of the satellite constellation size, the terrestrial base station density, and the MIMO configuration parameters.
    Abstract Satellite networks are playing an important role in realizing global seamless connectivity in beyond 5G and 6G wireless networks. In this paper, we develop a comprehensive analytical framework to assess the performance of hybrid terrestrial/satellite networks in providing rural connectivity. We assume that the terrestrial base stations are equipped with multiple-input-multiple-output (MIMO) technologies and that the user has the option to associate with a base station or a satellite to be served. Using tools from stochastic geometry, we derive tractable expressions for the coverage probability and average data rate and prove the accuracy of the derived expressions through Monte Carlo simulations. The obtained results capture the impact of the satellite constellation size, the terrestrial base station density, and the MIMO configuration parameters.
    摘要 卫星网络在超5G和6G无线网络中实现全球无缝连接具有重要作用。本文,我们开发了一个完整的分析框架,以评估卫星/地面网络在偏远地区connectivity提供的性能。我们假设地面基站装备了多输入多出力(MIMO)技术,用户可以选择与基站或卫星连接。使用Stochastic Geometry工具,我们 derivates tractable表达式,表示覆盖率和平均数据速率,并通过Monte Carlo仿真实验 validate the accuracy of the derived expressions。获得的结果反映了卫星星座大小、地面基站密度和MIMO配置参数的影响。

Pilot-Based Key Distribution and Encryption for Secure Coherent Passive Optical Networks

  • paper_url: http://arxiv.org/abs/2311.02554
  • repo_url: None
  • paper_authors: Haide Wang, Ji Zhou, Qingxin Lu, Jianrui Zeng, Yongqing Liao, Weiping Liu, Changyuan Yu, Zhaohui Li
  • for: To enhance physical-layer security for coherent passive optical networks (PONs).
  • methods: Uses the Advanced Encryption Standard (AES) algorithm together with a geometric constellation shaping four-level pulse amplitude modulation (GCS-PAM4) pilot for key distribution; the first pilot bit drives hardware-efficient carrier phase recovery (CPR) and the second bit carries polar-coded key bits without additional overhead.
  • results: Experiments show that GCS-PAM4 pilot-based key distribution is error-free on the upstream without extra overhead, eavesdropping on the downstream is prevented by the AES algorithm, and the GCS-PAM4 pilot incurs almost no CPR performance penalty compared to a binary phase shift keying pilot.
    Abstract The security issues of passive optical networks (PONs) have always been a concern due to broadcast transmission. Physical-layer security enhancement for the coherent PON should be as significant as improving transmission performance. In this paper, we propose the advanced encryption standard (AES) algorithm and geometric constellation shaping four-level pulse amplitude modulation (GCS-PAM4) pilot-based key distribution for secure coherent PON. The first bit of the GCS-PAM4 pilot is used for the hardware-efficient carrier phase recovery (CPR), while the second bit is utilized for key distribution without occupying the additional overhead. The key bits are encoded by the polar code to ensure error-free distribution. Frequent key updates are permitted for every codeword to improve the security of coherent PON. The experimental results of the 200-Gbps secure coherent PON using digital subcarrier multiplexing show that the GCS-PAM4 pilot-based key distribution could be error-free at upstream transmission without occupying the additional overhead and the eavesdropping would be prevented by AES algorithm at downstream transmission. Moreover, there is almost no performance penalty on the CPR using the GCS-PAM4 pilot compared to the binary phase shift keying pilot.
    摘要 PASSIVE OPTICAL NETWORKS (PONs) 的安全问题一直以来都是一大问题,因为它们使用广播传输。为了提高干扰性的 Physical-layer 安全性,在本文中我们提出了高级加密标准 (AES) 算法和四个水平杂化干扰 (GCS-PAM4) 导航器基于钥匙分布。GCS-PAM4 导航器的第一个比特用于硬件高效的 carriers 逻辑征 recovery (CPR),而第二个比特用于钥匙分布,不占用额外开销。钥匙位数用波尔代码确保错误自动分配。在每个代码字符串中允许频繁更新钥匙,以提高干扰性的 PON 安全性。实验结果表明,使用数字子帧多路分 multiplexing 的 200 Gbps 安全干扰 PON 中 GCS-PAM4 导航器基于钥匙分布可以在上行传输中实现无错误,并且在下行传输中防止侦测。此外,使用 GCS-PAM4 导航器与使用 binary phase shift keying 导航器相比,CPR 的性能几乎无损。

cs.SD - 2023-11-04

OverHear: Headphone based Multi-sensor Keystroke Inference

  • paper_url: http://arxiv.org/abs/2311.02288
  • repo_url: None
  • paper_authors: Raveen Wijewickrama, Maryam Abbasihafshejani, Anindya Maiti, Murtuza Jadliwala
  • for: This paper examines keystroke inference (eavesdropping) attacks enabled by sensor-equipped headphones.
  • methods: It develops OverHear, a keystroke inference framework that combines acoustic data from the headphones' high-definition microphones with accelerometer data for keystroke prediction (a simplified acoustic-only sketch follows this entry).
  • results: Experiments across keyboard types and environments show top-5 key prediction accuracy of around 80% for mechanical keyboards and around 60% for membrane keyboards, with top-100 word prediction accuracy above 70%.
    Abstract Headphones, traditionally limited to audio playback, have evolved to integrate sensors like high-definition microphones and accelerometers. While these advancements enhance user experience, they also introduce potential eavesdropping vulnerabilities, with keystroke inference being our concern in this work. To validate this threat, we developed OverHear, a keystroke inference framework that leverages both acoustic and accelerometer data from headphones. The accelerometer data, while not sufficiently detailed for individual keystroke identification, aids in clustering key presses by hand position. Concurrently, the acoustic data undergoes analysis to extract Mel Frequency Cepstral Coefficients (MFCC), aiding in distinguishing between different keystrokes. These features feed into machine learning models for keystroke prediction, with results further refined via dictionary-based word prediction methods. In our experimental setup, we tested various keyboard types under different environmental conditions. We were able to achieve top-5 key prediction accuracy of around 80% for mechanical keyboards and around 60% for membrane keyboards with top-100 word prediction accuracies over 70% for all keyboard types. The results highlight the effectiveness and limitations of our approach in the context of real-world scenarios.
    摘要 headphones, 原本只是专门用于音频播放的设备, 已经演化到添加了高级 Microphone 和加速度计数器等感应器。这些进步可以增强用户体验, 但也会带来 potential eavesdropping 问题, 我们在这个工作中关注的是键盘输入推测的问题。为了验证这个问题, 我们开发了 OverHear 框架, 这个框架利用 headphones 上的 acoustic 和加速度数据来进行键盘输入推测。加速度数据, although not detailed enough for individual key press identification, 可以帮助分组键盘输入。同时, acoustic 数据会进行分析, 以提取 Mel Frequency Cepstral Coefficients (MFCC),帮助区分不同的键盘输入。这些特征会被 feed 到机器学习模型中, 以进行键盘预测, 结果会透过字库基于词汇预测方法进一步精确化。在我们的实验设置中, 我们对不同环境下的不同键盘进行了试验, 我们能够 achieve top-5 key prediction accuracy 约 80% 以上, 以及 top-100 word prediction accuracy 约 70% 以上, 这些结果显示了我们的方法在实际应用中的有效性和局限性。
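
A stripped-down, acoustic-only sketch of the kind of pipeline OverHear describes: MFCC features are pooled per keystroke snippet and fed to a classifier that returns top-k key guesses. The synthetic keystroke audio, the pooled-MFCC summary, and the random-forest classifier are assumptions for illustration; the accelerometer-based hand-position clustering and the dictionary-based word prediction stages are omitted.

```python
import numpy as np
import librosa
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
SR = 16_000
KEYS = list("asdfjkl;")                      # home-row keys as toy classes

def keystroke_snippet(key_id, sr=SR, dur=0.2):
    """Synthetic stand-in for a keystroke recording: a decaying burst whose
    spectral content (artificially) depends on which key was pressed."""
    t = np.linspace(0, dur, int(sr * dur), endpoint=False)
    tone = np.sin(2 * np.pi * (800 + 150 * key_id) * t)
    return (tone * np.exp(-40 * t) + 0.05 * rng.standard_normal(len(t))).astype(np.float32)

def mfcc_features(y, sr=SR):
    m = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    return np.concatenate([m.mean(axis=1), m.std(axis=1)])    # pooled summary per snippet

X = np.array([mfcc_features(keystroke_snippet(k)) for k in range(len(KEYS)) for _ in range(30)])
labels = np.array([k for k in range(len(KEYS)) for _ in range(30)])

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, labels)
probe = mfcc_features(keystroke_snippet(3))                   # unseen press of key "f"
top5 = np.argsort(clf.predict_proba([probe])[0])[::-1][:5]
print("top-5 key guesses:", [KEYS[i] for i in top5])
```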

cs.CV - 2023-11-04

Anthropomorphic Grasping with Neural Object Shape Completion

  • paper_url: http://arxiv.org/abs/2311.02510
  • repo_url: None
  • paper_authors: Diego Hidalgo-Carvajal, Hanzhi Chen, Gemma C. Bettelani, Jaesug Jung, Melissa Zavaglia, Laura Busse, Abdeldjallil Naceri, Stefan Leutenegger, Sami Haddadin
  • for: To improve robots' ability to grasp and manipulate objects in human-suited environments.
  • methods: The paper leverages human-like object understanding by reconstructing and completing the full geometry of objects from partial observations, and manipulates them using a 7-DoF anthropomorphic robot hand.
  • results: The approach improves the grasping success rate of baselines with only partial reconstruction by nearly 30% and achieves over 150 successful grasps across three object categories, demonstrating consistent prediction and execution of grasping postures from completed object shapes at various directions and positions in real-world scenarios.
    Abstract The progressive prevalence of robots in human-suited environments has given rise to a myriad of object manipulation techniques, in which dexterity plays a paramount role. It is well-established that humans exhibit extraordinary dexterity when handling objects. Such dexterity seems to derive from a robust understanding of object properties (such as weight, size, and shape), as well as a remarkable capacity to interact with them. Hand postures commonly demonstrate the influence of specific regions on objects that need to be grasped, especially when objects are partially visible. In this work, we leverage human-like object understanding by reconstructing and completing their full geometry from partial observations, and manipulating them using a 7-DoF anthropomorphic robot hand. Our approach has significantly improved the grasping success rates of baselines with only partial reconstruction by nearly 30% and achieved over 150 successful grasps with three different object categories. This demonstrates our approach's consistent ability to predict and execute grasping postures based on the completed object shapes from various directions and positions in real-world scenarios. Our work opens up new possibilities for enhancing robotic applications that require precise grasping and manipulation skills of real-world reconstructed objects.
    摘要 人类环境中机器人的普及进展,导致了一系列物体抓取技巧的发展,dexterity在这些技巧中扮演着关键角色。人类在抓取物体时表现出了惊人的灵活性,这种灵活性归功于对物体特性(如重量、大小、形状)的稳固了理解,以及与物体进行互动的出色能力。手姿常常反映物体需要抓取的特定区域的影响,特别是当物体只部分可见时。在这项工作中,我们利用人类对物体的理解,通过重建和完成部分观察的物体形态,并使用7自由度人工手掌进行抓取。我们的方法比基eline只有部分重建时的抓取成功率提高了近30%,并在不同的物体类别上达成了150多次成功的抓取。这表明我们的方法可以在真实世界enario中预测和执行基于完整的物体形态的抓取姿势,开启了新的机器人应用的可能性,例如精准的抓取和 manipulate技巧。

Neural Network Reconstruction of the Left Atrium using Sparse Catheter Paths

  • paper_url: http://arxiv.org/abs/2311.02488
  • repo_url: None
  • paper_authors: Alon Baram, Moshe Safran, Tomer Noy, Naveh Geri, Hayit Greenspan
  • for: To provide visualization of the left atrium early in the procedure, so that its shape can be reconstructed from simple catheter maneuvers rather than lengthy dense surface sampling.
  • methods: A dense encoder-decoder network with a novel regularization term reconstructs the shape of the left atrium from partial data derived from catheter trajectories.
  • results: The network sufficiently approximates the atrium shape from partial acquisitions within a 3-minute interval and produces realistic visualizations; both synthetic and human clinical cases are shown.
    Abstract Catheter based radiofrequency ablation for pulmonary vein isolation has become the first line of treatment for atrial fibrillation in recent years. This requires a rather accurate map of the left atrial sub-endocardial surface including the ostia of the pulmonary veins, which requires dense sampling of the surface and takes more than 10 minutes. The focus of this work is to provide left atrial visualization early in the procedure to ease procedure complexity and enable further workflows, such as using catheters that have difficulty sampling the surface. We propose a dense encoder-decoder network with a novel regularization term to reconstruct the shape of the left atrium from partial data which is derived from simple catheter maneuvers. To train the network, we acquire a large dataset of 3D atria shapes and generate corresponding catheter trajectories. Once trained, we show that the suggested network can sufficiently approximate the atrium shape based on a given trajectory. We compare several network solutions for the 3D atrium reconstruction. We demonstrate that the solution proposed produces realistic visualization using partial acquisition within a 3-minute time interval. Synthetic and human clinical cases are shown.
    摘要 医疗器械导管基于射频热力学隔离,作为现代抗不规征性颤动疾病治疗的首选方式,已经在过去几年得到广泛应用。这需要一个非常准确的左心室内部表面地图,包括肺动脉口,这需要密集的表面探测,需要 более10分钟的时间。我们的目标是提供早期左心室视觉,以便简化过程复杂性和启用更多的工作流程,如使用困难探测表面的导管。我们提议一种密集编码-解码网络,以重建左心室形状从partial数据中。为了训练网络,我们收集了大量3Datria形状数据,并生成相应的导管轨迹。我们显示,我们的提议的网络可以基于给定轨迹sufficiently approximate left atrium shape。我们比较了多个网络解决方案,并显示我们的解决方案可以生成真实的视觉使用部分收集在3分钟时间内。我们使用 sintetic和人类临床案例展示。

A Strictly Bounded Deep Network for Unpaired Cyclic Translation of Medical Images

  • paper_url: http://arxiv.org/abs/2311.02480
  • repo_url: None
  • paper_authors: Swati Rai, Jignesh S. Bhatt, Sarat Kumar Patra
  • for: A solution for medical image translation that aims at a stable, strictly bounded bidirectional translation model for unpaired images of different modalities.
  • methods: Proposes a patch-level concatenated cyclic conditional generative adversarial network (pCCGAN) embedded with adaptive dictionary learning: two cyclically connected CGANs whose generators are conditioned on concatenations of alternate unpaired patches from the input and target modality images, with adaptive dictionaries learned from the contextual patches to reduce possible degradation.
  • results: Qualitative, quantitative, and ablation analyses show superior results on real CT and MRI translation, with reduced variance of the proposed learning machine.
    Abstract Medical image translation is an ill-posed problem. Unlike existing paired unbounded unidirectional translation networks, in this paper, we consider unpaired medical images and provide a strictly bounded network that yields a stable bidirectional translation. We propose a patch-level concatenated cyclic conditional generative adversarial network (pCCGAN) embedded with adaptive dictionary learning. It consists of two cyclically connected CGANs of 47 layers each; where both generators (each of 32 layers) are conditioned with concatenation of alternate unpaired patches from input and target modality images (not ground truth) of the same organ. The key idea is to exploit cross-neighborhood contextual feature information that bounds the translation space and boosts generalization. The generators are further equipped with adaptive dictionaries learned from the contextual patches to reduce possible degradation. Discriminators are 15-layer deep networks that employ minimax function to validate the translated imagery. A combined loss function is formulated with adversarial, non-adversarial, forward-backward cyclic, and identity losses that further minimize the variance of the proposed learning machine. Qualitative, quantitative, and ablation analysis show superior results on real CT and MRI.
    摘要 医学图像翻译是一个不定Problem。不同于现有的已经对应的无限向量翻译网络,在这篇论文中,我们考虑了无对应的医学图像,并提供了一个具有稳定性的双向翻译网络。我们提议了一种patch-level concatenated cyclic conditional generative adversarial network (pCCGAN),它包括两个相互连接的CGAN,每个CGAN都有47层,其中每个生成器都是通过 concatenation of alternate unpaired patches from input and target modality images (不是真实的ground truth) of the same organ来Conditional generation。我们的关键思想是利用跨邻域特征信息,以防止翻译空间过大,并提高泛化能力。生成器还使用了从Contextual patches中学习的自适应字典,以避免可能的下降。检测器是15层深度的网络,使用最大函数来验证翻译的图像。我们定义了一个组合损失函数,包括对抗损失、非对抗损失、前向后向环路损失和标识损失,以更加减小提议的学习机器的幂等误差。Qualitative、quantitative和ablation分析表明,我们的方法在真实的CT和MRI上达到了superior的结果。

SPHEAR: Spherical Head Registration for Complete Statistical 3D Modeling

  • paper_url: http://arxiv.org/abs/2311.02461
  • repo_url: None
  • paper_authors: Eduard Gabriel Bazavan, Andrei Zanfir, Thiemo Alldieck, Teodor Alexandru Szente, Mihai Zanfir, Cristian Sminchisescu
  • for: To propose an accurate, differentiable parametric statistical 3D human head model, enabled by a novel 3D registration method based on spherical embeddings.
  • methods: The registration shifts away from classical non-rigid registration methods that operate under various surface priors, increasing reconstruction fidelity and minimizing required human intervention.
  • results: The complete model can sample diverse synthetic head shapes, facial expressions, gaze directions, high-resolution color textures, surface normal maps, and detailed strand-based hair, supporting automatic realistic visual data generation, semantic annotation, and general reconstruction tasks; its components are fast and memory efficient, and experiments support the validity of the design choices and the accuracy of the registration, reconstruction, and generation techniques.
    Abstract We present \emph{SPHEAR}, an accurate, differentiable parametric statistical 3D human head model, enabled by a novel 3D registration method based on spherical embeddings. We shift the paradigm away from the classical Non-Rigid Registration methods, which operate under various surface priors, increasing reconstruction fidelity and minimizing required human intervention. Additionally, SPHEAR is a \emph{complete} model that allows not only to sample diverse synthetic head shapes and facial expressions, but also gaze directions, high-resolution color textures, surface normal maps, and hair cuts represented in detail, as strands. SPHEAR can be used for automatic realistic visual data generation, semantic annotation, and general reconstruction tasks. Compared to state-of-the-art approaches, our components are fast and memory efficient, and experiments support the validity of our design choices and the accuracy of registration, reconstruction and generation techniques.
    摘要 我们介绍了SPHEAR,一种精准、可微分的参数型三维人头模型,基于圆形嵌入的新三维 регистра方法。我们弃却了传统的非固定注册方法,这些方法基于不同的表面优先级,从而提高了重建准确性并最小化了人工干预。此外,SPHEAR是一个完整的模型,允许不仅采样多种 sintetic 头部形状和 facial expression,还可以控制视线方向、高分辨率颜色Texture、表面法向图和毛发 Represented in detail, as strands。SPHEAR可以用于自动生成真实的视觉数据,semantic annotation和总体重建任务。相比之前的方法,我们的组件快速和内存减少,实验证明了我们的设计选择和注册、重建和生成技术的准确性。

Extracting Network Structures from Corporate Organization Charts Using Heuristic Image Processing

  • paper_url: http://arxiv.org/abs/2311.02460
  • repo_url: None
  • paper_authors: Hiroki Sayama, Junichi Yamanoi
  • for: To enable study of how corporate organizational structure relates to the dynamics and performance of corporate operations, which has been hindered by the lack of readily available organization network datasets.
  • methods: A heuristic image-processing method extracts and reconstructs organization network data from published organization charts: a PDF chart is processed in multiple heuristic steps to detect text labels, boxes, connecting lines, and other objects, and the detected components are reorganized into a Python NetworkX Graph object for visualization, validation, and further network analysis (a toy version of the reassembly step follows this entry).
  • results: Applied to the 10,008 organization chart PDF files in the "Organization Chart/System Diagram Handbook" published by Diamond, Inc. (2008-2011), the method reconstructed 4,606 organization networks (a 46% data acquisition success rate); several network diagnostics were measured for each reconstructed network for further statistical analysis of their potential correlations with corporate behavior and performance.
    Abstract Organizational structure of corporations has potential to provide implications for dynamics and performance of corporate operations. However, this subject has remained unexplored because of the lack of readily available organization network datasets. To overcome the this gap, we developed a new heuristic image-processing method to extract and reconstruct organization network data from published organization charts. Our method analyzes a PDF file of a corporate organization chart and detects text labels, boxes, connecting lines, and other objects through multiple steps of heuristically implemented image processing. The detected components are reorganized together into a Python's NetworkX Graph object for visualization, validation and further network analysis. We applied the developed method to the organization charts of all the listed firms in Japan shown in the ``Organization Chart/System Diagram Handbook'' published by Diamond, Inc., from 2008 to 2011. Out of the 10,008 organization chart PDF files, our method was able to reconstruct 4,606 organization networks (data acquisition success rate: 46%). For each reconstructed organization network, we measured several network diagnostics, which will be used for further statistical analysis to investigate their potential correlations with corporate behavior and performance.
    摘要 企业组织结构具有可能对企业运营动态和性能产生影响,但这个主题尚未被探讨,因为有限的可用组织网络数据。为了bridge这个阻隔,我们开发了一种新的图像处理方法,用于从公布的组织图中提取和重建组织网络数据。我们的方法通过多个逻辑地图处理步骤,从PDF文档中的组织图中检测文本标签、方块、连接线和其他对象。检测到的组件被重新组织为Python的NetworkX图形对象,以供可视化、验证和进一步的网络分析。我们应用了这种方法于日本上市公司《组织图/系统 диаграм手册》(Diamond, Inc.,2008-2011年)中所显示的10,008个组织图PDF文档中。其中,我们的方法成功地重建了4,606个组织网络(数据获取成功率:46%)。对每个重建的组织网络,我们测量了多种网络指标,这些指标将用于进一步的统计分析,以研究它们可能与企业行为和性能之间的相关性。
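
The final reassembly step, turning detected chart elements into a NetworkX graph, can be sketched as below: each connector endpoint is snapped to the nearest detected box and every connector becomes an edge. The box coordinates and connector segments are fabricated stand-ins for what the heuristic image-processing steps would actually detect.

```python
import networkx as nx

# Toy stand-ins for what the image-processing steps would detect in a chart PDF:
# labelled boxes (pixel bounding boxes) and connector line segments (endpoint pairs).
boxes = {
    "Board of Directors": (100, 10, 300, 60),
    "President":          (100, 100, 300, 150),
    "Sales Div.":         (20, 200, 180, 250),
    "R&D Div.":           (220, 200, 380, 250),
}
lines = [((200, 60), (200, 100)), ((150, 150), (100, 200)), ((250, 150), (300, 200))]

def nearest_box(point):
    """Snap a connector endpoint to the closest box centre."""
    px, py = point
    def dist(name):
        x0, y0, x1, y1 = boxes[name]
        cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
        return (px - cx) ** 2 + (py - cy) ** 2
    return min(boxes, key=dist)

G = nx.Graph()
G.add_nodes_from(boxes)
for p, q in lines:
    a, b = nearest_box(p), nearest_box(q)
    if a != b:
        G.add_edge(a, b)

print(G.number_of_nodes(), "units,", G.number_of_edges(), "reporting links")
print(sorted(G.edges()))
# Network diagnostics of the kind measured per firm, e.g. average degree:
print("avg degree:", sum(dict(G.degree()).values()) / G.number_of_nodes())
```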

P-Age: Pexels Dataset for Robust Spatio-Temporal Apparent Age Classification

  • paper_url: http://arxiv.org/abs/2311.02432
  • repo_url: None
  • paper_authors: Abid Ali, Ashish Marisetty, Francois Bremond
  • for: This paper targets age estimation in videos, addressing challenges such as occlusions, low resolution, and lighting conditions.
  • methods: It proposes a new direction for age classification that uses a video-based model, AgeFormer, to capture the dynamics of the entire body rather than relying on face-based methods; a two-stream architecture with TimeSformer and EfficientNet backbones efficiently captures both facial and body dynamics.
  • results: Evaluated on the newly constructed Pexels Age (P-Age) dataset and cross-tested on challenging video datasets such as Charades, Smarthome, and Thumos-14, the method outperforms existing face-based age estimation methods even when faces are highly occluded, blurred, or masked.
    Abstract Age estimation is a challenging task that has numerous applications. In this paper, we propose a new direction for age classification that utilizes a video-based model to address challenges such as occlusions, low-resolution, and lighting conditions. To address these challenges, we propose AgeFormer which utilizes spatio-temporal information on the dynamics of the entire body dominating face-based methods for age classification. Our novel two-stream architecture uses TimeSformer and EfficientNet as backbones, to effectively capture both facial and body dynamics information for efficient and accurate age estimation in videos. Furthermore, to fill the gap in predicting age in real-world situations from videos, we construct a video dataset called Pexels Age (P-Age) for age classification. The proposed method achieves superior results compared to existing face-based age estimation methods and is evaluated in situations where the face is highly occluded, blurred, or masked. The method is also cross-tested on a variety of challenging video datasets such as Charades, Smarthome, and Thumos-14.
    摘要 Age 估计是一项复杂的任务,具有许多应用。在这篇论文中,我们提出了一种新的方向 for age 分类,利用视频基本模型来解决 occlusions、低分辨率和照明条件等挑战。为了解决这些挑战,我们提出了 AgeFormer,它利用全身动态信息来替代面部基本方法 for age 分类。我们的新型两核体系使用 TimeSformer 和 EfficientNet 作为后备网络,以有效地捕捉全身动态信息以实现高效准确的 age 估计。此外,为了填充实际情况中的 age 估计漏斗,我们建立了一个名为 Pexels Age (P-Age) 的视频 dataset,用于 age 分类。我们的方法在不同的挑战性视频 dataset 上实现了比较出色的结果,比如 Charades、Smarthome 和 Thumos-14。

Task Arithmetic with LoRA for Continual Learning

  • paper_url: http://arxiv.org/abs/2311.02428
  • repo_url: None
  • paper_authors: Rajas Chitale, Ankit Vaidya, Aditya Kane, Archana Ghotkar
  • for: 这篇论文旨在解决连续训练问题,即训练数据分配为“任务”的序列。
  • methods: 我们提出了一种结合低秩适应(LoRA)与任务算术的新方法,用于持续训练基于Transformer的视觉模型,同时缓解灾难性遗忘和多次顺序训练大模型的计算成本问题。
  • results: 我们的方法完全绕过了灾难性遗忘问题,并降低了在每个任务上训练模型的计算成本;在每类仅存储10个样本的小型记忆缓存的辅助下,可以达到接近全量微调的性能。
    Abstract Continual learning refers to the problem where the training data is available in sequential chunks, termed "tasks". The majority of progress in continual learning has been stunted by the problem of catastrophic forgetting, which is caused by sequential training of the model on streams of data. Moreover, it becomes computationally expensive to sequentially train large models multiple times. To mitigate both of these problems at once, we propose a novel method to continually train transformer-based vision models using low-rank adaptation and task arithmetic. Our method completely bypasses the problem of catastrophic forgetting, as well as reducing the computational requirement for training models on each task. When aided with a small memory of 10 samples per class, our method achieves performance close to full-set finetuning. We present rigorous ablations to support the prowess of our method.
    摘要 持续学习指的是训练数据以顺序数据块(称为“任务”)形式到来的问题。持续学习的进展在很大程度上受到灾难性遗忘问题的制约,该问题源于模型在数据流上的顺序训练。此外,对大型模型进行多次顺序训练的计算开销也很高。为同时缓解这两个问题,我们提出了一种利用低秩适应(LoRA)与任务算术持续训练基于Transformer的视觉模型的新方法。我们的方法完全绕过了灾难性遗忘问题,同时降低了在每个任务上训练模型的计算需求。在每类仅10个样本的小型记忆缓存的辅助下,我们的方法可以达到接近全量微调的性能。我们还给出了严格的消融实验来支持该方法的有效性。
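
The core of task arithmetic is plain parameter addition: each task contributes a weight delta (here the product of LoRA factors), and the deltas are summed onto the frozen base weights. The sketch below is a minimal, self-contained illustration under assumed shapes; it is not the authors' implementation.

```python
import torch

def lora_delta(A: torch.Tensor, B: torch.Tensor, alpha: float = 1.0) -> torch.Tensor:
    """Weight update contributed by one task's LoRA factors: delta W = alpha * B @ A."""
    return alpha * (B @ A)

def merge_task_vectors(base_weight, task_deltas, coeffs=None):
    """Task arithmetic: add (optionally scaled) task deltas onto the frozen base weight."""
    coeffs = coeffs or [1.0] * len(task_deltas)
    merged = base_weight.clone()
    for c, delta in zip(coeffs, task_deltas):
        merged += c * delta
    return merged

# Toy example: a 16x16 base weight and two tasks, each with rank-2 LoRA factors.
d, r = 16, 2
base = torch.randn(d, d)
deltas = [lora_delta(torch.randn(r, d), torch.randn(d, r)) for _ in range(2)]
merged = merge_task_vectors(base, deltas, coeffs=[1.0, 1.0])
print(merged.shape)  # torch.Size([16, 16])
```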

P2O-Calib: Camera-LiDAR Calibration Using Point-Pair Spatial Occlusion Relationship

  • paper_url: http://arxiv.org/abs/2311.02413
  • repo_url: None
  • paper_authors: Su Wang, Shini Zhang, Xuchong Qiu
  • for: 提高自动驾驶和机器人领域中传感器标定的准确性和可靠性,通过无目标(target-less)方式实现鲁棒的相机-LiDAR外参标定。
  • methods: 提出基于3D空间遮挡关系的2D-3D边缘点提取方法,并在提取的2D-3D点对基础上提出遮挡引导的点匹配方法,以提高标定精度并降低计算成本。
  • results: 在KITTI真实图像数据集上进行定性与定量评估,结果表明所提方法优于现有的无目标方法,具有低误差和高鲁棒性,可为依赖高质量相机-LiDAR标定的实际应用做出贡献。
    Abstract The accurate and robust calibration result of sensors is considered as an important building block to the follow-up research in the autonomous driving and robotics domain. The current works involving extrinsic calibration between 3D LiDARs and monocular cameras mainly focus on target-based and target-less methods. The target-based methods are often utilized offline because of restrictions, such as additional target design and target placement limits. The current target-less methods suffer from feature indeterminacy and feature mismatching in various environments. To alleviate these limitations, we propose a novel target-less calibration approach which is based on the 2D-3D edge point extraction using the occlusion relationship in 3D space. Based on the extracted 2D-3D point pairs, we further propose an occlusion-guided point-matching method that improves the calibration accuracy and reduces computation costs. To validate the effectiveness of our approach, we evaluate the method performance qualitatively and quantitatively on real images from the KITTI dataset. The results demonstrate that our method outperforms the existing target-less methods and achieves low error and high robustness that can contribute to the practical applications relying on high-quality Camera-LiDAR calibration.
    摘要 准确且鲁棒的传感器标定结果被视为自动驾驶和机器人领域后续研究的重要基础。目前关于3D LiDAR与单目相机外参标定的工作主要分为基于目标和无目标两类方法。基于目标的方法由于需要额外设计标定目标并受目标摆放位置限制,通常只能离线使用;现有的无目标方法则在各种环境中受特征不确定和特征误匹配问题的影响。为缓解这些限制,我们提出了一种新的无目标标定方法,基于3D空间中的遮挡关系进行2D-3D边缘点提取。在提取的2D-3D点对基础上,我们进一步提出了遮挡引导的点匹配方法,可提高标定精度并降低计算成本。为验证方法的有效性,我们在KITTI数据集的真实图像上对方法性能进行了定性和定量评估。结果表明,我们的方法优于现有的无目标方法,具有低误差和高鲁棒性,可为依赖高质量相机-LiDAR标定的实际应用做出贡献。
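
Once 2D-3D point correspondences are available, the extrinsic transform can be recovered with a standard PnP solver. The OpenCV snippet below sketches only that final step with synthetic correspondences and assumed intrinsics; the paper's occlusion-based point extraction and occlusion-guided matching are not shown.

```python
import numpy as np
import cv2

K = np.array([[721.5, 0.0, 609.6],   # assumed pinhole intrinsics (KITTI-like)
              [0.0, 721.5, 172.9],
              [0.0,   0.0,   1.0]])
dist = np.zeros(5)                    # assume no lens distortion

# Ground-truth extrinsics used only to synthesize consistent correspondences.
rvec_gt = np.array([0.01, -0.02, 0.005])
tvec_gt = np.array([0.10, -0.05, 0.20])

pts_3d = np.array([[4.2, 1.1, 10.3], [5.0, -0.8, 12.5], [6.3, 0.2, 9.1],
                   [7.1, 1.6, 14.9], [3.5, -1.2, 11.0], [8.0, 0.9, 13.2]])
pts_2d, _ = cv2.projectPoints(pts_3d, rvec_gt, tvec_gt, K, dist)

# Recover the LiDAR -> camera transform from the 2D-3D pairs.
ok, rvec, tvec = cv2.solvePnP(pts_3d, pts_2d, K, dist, flags=cv2.SOLVEPNP_EPNP)
R, _ = cv2.Rodrigues(rvec)
print(ok, R.shape, tvec.ravel())      # rotation matrix and translation estimate
```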

Hybrid quantum image classification and federated learning for hepatic steatosis diagnosis

  • paper_url: http://arxiv.org/abs/2311.02402
  • repo_url: None
  • paper_authors: Luca Lusnig, Asel Sagingalieva, Mikhail Surmach, Tatjana Protasevich, Ovidiu Michiu, Joseph McLoughlin, Christopher Mansell, Graziano de’ Petris, Deborah Bonazza, Fabrizio Zanconati, Alexey Melnikov, Fabio Cavalli
    for: 这项研究旨在开发一种可助Pathologist在日常诊断中使用的智能系统,该系统可以利用深度学习技术和量子计算技术,并采用联合学习方法来实现隐私友好的多方参与学习。methods: 该研究使用了一种混合式量子神经网络,该网络包括5个量子比特和超过100个变量门,可以用于评估非酒精性肝脂肿,并提出了一种基于类型深度学习解决方案,通过减少每个参与者的数据量来解决隐私问题。results: 研究发现,混合式量子神经网络的肝脂肿图像分类精度达97%,高于其类似的类型深度学习模型(ResNet)的95.2%,而且在减少数据量的情况下,hybrid方法仍然能够superior generalization和less potential for overfitting,这表明该方法在医疗应用中具有优异的普适性和可靠性。
    Abstract With the maturity achieved by deep learning techniques, intelligent systems that can assist physicians in the daily interpretation of clinical images can play a very important role. In addition, quantum techniques applied to deep learning can enhance this performance, and federated learning techniques can realize privacy-friendly collaborative learning among different participants, solving privacy issues due to the use of sensitive data and reducing the amount of data to be collected from each individual participant. In this study we present a hybrid quantum neural network that can be used to quantify non-alcoholic liver steatosis and could be useful in the diagnostic process to determine a liver's suitability for transplantation; at the same time, we propose a federated learning approach based on a classical deep learning solution to solve the same problem, but using a reduced data set in each part. The liver steatosis image classification accuracy of the hybrid quantum neural network, a hybrid quantum ResNet model consisting of 5 qubits and more than 100 variational gates, reaches 97%, which is 1.8% higher than its classical counterpart, ResNet. Crucially, even with a reduced dataset, our hybrid approach consistently outperformed its classical counterpart, indicating superior generalization and less potential for overfitting in medical applications. In addition, a federated approach with up to 32 clients, despite a lower accuracy that nevertheless remains above 90%, would allow each participant to use a very small dataset, i.e., as little as one-thirtieth of the full set. Our work, based on real-world clinical data, can be regarded as a scalable and collaborative starting point and could thus fulfill the need for an effective and reliable computer-assisted system that facilitates the daily diagnostic work of the clinical pathologist.
    摘要 随着深度学习技术的成熟,能够辅助医生日常解读临床图像的智能系统可以发挥非常重要的作用。此外,应用于深度学习的量子技术可以进一步提升性能,而联邦学习技术可以实现隐私友好的多方协作学习,解决使用敏感数据带来的隐私问题,同时减少每个参与者需要收集的数据量。在本研究中,我们提出了一种混合量子神经网络,可用于量化非酒精性脂肪肝,并有望在诊断过程中帮助判断肝脏是否适合移植;同时,我们提出了一种基于经典深度学习方案的联邦学习方法来解决同一问题,但每个参与方只使用缩减后的数据集。混合量子神经网络(即包含5个量子比特和100多个变分门的混合量子ResNet模型)的脂肪肝图像分类准确率达到97%,比其经典对应模型ResNet高出1.8%。更重要的是,即使在数据集缩减的情况下,我们的混合方法仍持续优于经典方法,表明其在医疗应用中具有更好的泛化能力和更低的过拟合风险。此外,多客户端(最多32个)的联邦方法尽管准确率有所下降,但仍高于90%,并允许每个参与者只使用非常小的数据集,即最多仅为完整数据集的三十分之一。我们基于真实临床数据的工作可以被视为一个可扩展、可协作的起点,从而满足对有效且可靠的计算机辅助系统的需求,以便利临床病理医生的日常诊断工作。
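
The federated side of such a pipeline ultimately reduces to aggregating client model updates. Below is a minimal FedAvg-style aggregation sketch in PyTorch; the toy model, client counts, and size-weighted averaging are illustrative assumptions rather than the study's exact setup, and the quantum layers are not modeled.

```python
import copy
import torch.nn as nn

def fed_avg(client_states, client_sizes):
    """Weighted average of client state_dicts (FedAvg-style aggregation)."""
    total = float(sum(client_sizes))
    avg = copy.deepcopy(client_states[0])
    for key in avg:
        avg[key] = sum(sd[key] * (n / total) for sd, n in zip(client_states, client_sizes))
    return avg

def make_model():
    # Tiny stand-in classifier; a real client would train a (hybrid) ResNet.
    return nn.Sequential(nn.Flatten(), nn.Linear(64, 16), nn.ReLU(), nn.Linear(16, 2))

global_model = make_model()
clients = [make_model() for _ in range(4)]
for c in clients:
    c.load_state_dict(global_model.state_dict())  # each round starts from the global weights
    # ... local training on the client's private liver images would happen here ...

new_state = fed_avg([c.state_dict() for c in clients], client_sizes=[120, 80, 60, 40])
global_model.load_state_dict(new_state)
print(sum(p.numel() for p in global_model.parameters()))
```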

Domain Transfer in Latent Space (DTLS) Wins on Image Super-Resolution – a Non-Denoising Model

  • paper_url: http://arxiv.org/abs/2311.02358
  • repo_url: None
  • paper_authors: Chun-Chuen Hui, Wan-Chi Siu, Ngai-Fong Law
  • for: 大规模图像超分辨是一项computer vision任务,因为很多信息在高度压缩图像中缺失,例如scale x16超分辨。
  • methods: diffusion models 在过去几年中得到了成功,它使用 Gaussian noise 来构建一个 latent photo-realistic space,并作为连接latent vector space和 latent photo-realistic space的链接。
  • results: 在这篇文章中,我们提出了一种简单的方法,它不使用 Gaussian noise,而是采用基于diffusion models的一些基本结构来实现高质量的图像超分辨。我们使用 DNN 来实现域传递,以便利用邻域域的统计特性来进行渐进 interpolate,并通过参照输入LR图像来进行条件域传递,从而进一步提高图像质量。实验结果表明,我们的方法不仅超越了当前的大规模超分辨模型,还超越了当前的扩散模型。这种方法可以轻松扩展到其他图像到图像任务,如图像照明、填充、降噪等。
    Abstract Large scale image super-resolution is a challenging computer vision task, since vast information is missing in a highly degraded image, for example in scale x16 super-resolution. Diffusion models have been used successfully in recent years in extreme super-resolution applications, in which Gaussian noise is used as a means to form a latent photo-realistic space, and acts as a link between the space of latent vectors and the latent photo-realistic space. Quite a few sophisticated mathematical derivations on mapping the statistics of Gaussian noise underpin the success of diffusion models. In this paper we propose a simple approach which gets away from using Gaussian noise but adopts some basic structures of diffusion models for efficient image super-resolution. Essentially, we propose a DNN to perform domain transfer between neighbor domains, which can learn the differences in statistical properties to facilitate gradual interpolation with results of reasonable quality. Further quality improvement is achieved by conditioning the domain transfer with reference to the input LR image. Experimental results show that our method outperforms not only state-of-the-art large scale super-resolution models, but also the current diffusion models for image super-resolution. The approach can readily be extended to other image-to-image tasks, such as image enlightening, inpainting, denoising, etc.
    摘要 大规模图像超解析是一项计算机视觉任务,因为高度受损图像中的信息量很大,例如 scale x16 超解析。扩散模型在过去几年得到了成功,在极端超解析应用中使用 Gaussian 噪声作为一种 latent photo-realistic 空间的形成者,并作为 latent vector 空间和 latent photo-realistic 空间之间的连接。有很多复杂的数学推导,映射 Gaussian 噪声的统计特性,使扩散模型成功。在这篇论文中,我们提出了一种简单的方法,不使用 Gaussian 噪声,但采用了扩散模型的一些基本结构,实现高质量图像超解析。我们提议使用 DNN 进行频率域传输,以便利用邻域频率的不同来实现慢滑均衡,并通过参考输入低解析图像来进行条件域传输,从而进一步提高图像质量。实验结果表明,我们的方法不仅超过了当前大规模超解析模型的状态,还超过了当前扩散模型的图像超解析性能。该方法可以轻松扩展到其他图像-图像任务,如图像照明、填充、去噪等。

Proposal-Level Unsupervised Domain Adaptation for Open World Unbiased Detector

  • paper_url: http://arxiv.org/abs/2311.02342
  • repo_url: None
  • paper_authors: Xuanyi Liu, Zhongqi Yue, Xian-Sheng Hua
  • for: 开展开放世界对象检测(OWOD),结合开放集合对象检测和逐步学习能力,面对视觉世界中的开放和动态挑战。
  • methods: 将任务重新表述为无监督域适应(Unsupervised Domain Adaptation)问题:利用当前有偏的预测器划分源域(已见目标位置与高置信度背景位置)和目标域(其余模糊区域),再通过自训练方法学习域不变的前景特征,实现无偏的预测。
  • results: 在OWOD评测中,我们达到了最先进(state-of-the-art)的性能水平。
    Abstract Open World Object Detection (OWOD) combines open-set object detection with incremental learning capabilities to handle the challenge of the open and dynamic visual world. Existing works assume that a foreground predictor trained on the seen categories can be directly transferred to identify the unseen categories' locations by selecting the top-k most confident foreground predictions. However, the assumption is hardly valid in practice. This is because the predictor is inevitably biased to the known categories, and fails under the shift in the appearance of the unseen categories. In this work, we aim to build an unbiased foreground predictor by re-formulating the task under Unsupervised Domain Adaptation, where the current biased predictor helps form the domains: the seen object locations and confident background locations as the source domain, and the rest ambiguous ones as the target domain. Then, we adopt the simple and effective self-training method to learn a predictor based on the domain-invariant foreground features, hence achieving unbiased prediction robust to the shift in appearance between the seen and unseen categories. Our approach's pipeline can adapt to various detection frameworks and UDA methods, empirically validated by OWOD evaluation, where we achieve state-of-the-art performance.
    摘要 开放世界目标检测(OWOD)将开放集目标检测与增量学习能力相结合,以应对开放且动态的视觉世界带来的挑战。现有工作假设在已见类别上训练的前景预测器可以直接迁移,通过选取置信度最高的前k个前景预测来定位未见类别。然而,这一假设在实践中很难成立:预测器不可避免地偏向已知类别,在未见类别外观发生偏移时会失效。在本工作中,我们将该任务重新表述为无监督域适应问题,以构建无偏的前景预测器:利用当前有偏的预测器划分域,将已见目标位置和高置信度背景位置作为源域,其余模糊区域作为目标域。随后,我们采用简单而有效的自训练方法,基于域不变的前景特征学习预测器,从而获得对已见与未见类别外观偏移鲁棒的无偏预测。我们的方法流程可以适配多种检测框架和无监督域适应方法,并在OWOD评测中取得了最先进的性能。

MC-Stereo: Multi-peak Lookup and Cascade Search Range for Stereo Matching

  • paper_url: http://arxiv.org/abs/2311.02340
  • repo_url: None
  • paper_authors: Miaojie Feng, Junda Cheng, Hao Jia, Longliang Liu, Gangwei Xu, Xin Yang
  • for: 本研究旨在提高iterative optimization方法的掌握能力,特别是解决多峰分布问题和固定搜索范围的限制。
  • methods: 本文提出了一种新的迭代优化架构,称为MC-Stereo,它通过多峰查找策略来缓解匹配中的多峰分布问题,并借助级联搜索范围将由粗到细的思想整合进迭代框架。此外,我们还引入了一个预训练网络作为特征提取器,以增强立体匹配流程的前端。
  • results: 根据实验结果,MC-Stereo在KITTI-2012和KITTI-2015测试集上 ranked first among all publicly available方法,并在ETH3D上达到了领域内最佳性能。
    Abstract Stereo matching is a fundamental task in scene comprehension. In recent years, the method based on iterative optimization has shown promise in stereo matching. However, the current iteration framework employs a single-peak lookup, which struggles to handle the multi-peak problem effectively. Additionally, the fixed search range used during the iteration process limits the final convergence effects. To address these issues, we present a novel iterative optimization architecture called MC-Stereo. This architecture mitigates the multi-peak distribution problem in matching through the multi-peak lookup strategy, and integrates the coarse-to-fine concept into the iterative framework via the cascade search range. Furthermore, given that feature representation learning is crucial for successful learnbased stereo matching, we introduce a pre-trained network to serve as the feature extractor, enhancing the front end of the stereo matching pipeline. Based on these improvements, MC-Stereo ranks first among all publicly available methods on the KITTI-2012 and KITTI-2015 benchmarks, and also achieves state-of-the-art performance on ETH3D. The code will be open sourced after the publication of this paper.
    摘要 立体匹配是场景理解中的基本任务。近年来,基于迭代优化的方法在立体匹配中展现出良好前景。然而,当前的迭代框架采用单峰查找,难以有效处理多峰问题;此外,迭代过程中使用的固定搜索范围也限制了最终的收敛效果。为解决这些问题,我们提出了一种新的迭代优化架构,称为MC-Stereo。该架构通过多峰查找策略缓解匹配中的多峰分布问题,并通过级联搜索范围将由粗到细的思想整合进迭代框架。此外,鉴于特征表示学习对基于学习的立体匹配至关重要,我们引入了预训练网络作为特征提取器,增强了立体匹配流程的前端。基于这些改进,MC-Stereo在KITTI-2012和KITTI-2015基准上在所有公开方法中排名第一,并在ETH3D上取得了最先进的性能。代码将在论文发表后开源。

Multimodal Machine Learning for Clinically-Assistive Imaging-Based Biomedical Applications

  • paper_url: http://arxiv.org/abs/2311.02332
  • repo_url: None
  • paper_authors: Elisa Warner, Joonsang Lee, William Hsu, Tanveer Syeda-Mahmood, Charles Kahn, Arvind Rao
  • for: 这项研究旨在探讨适用于医疗人工智能系统的机器学习应用,尤其是在多modal数据集合 integrate 方面。
  • methods: 这篇论文描述了五种对多modal AI 的挑战(表示、融合、对接、翻译和共学习),并评估了这些挑战在医疗影像基础的临床决策支持模型中的应用。
  • results: 这篇论文结论提出了未来这个领域的发展趋势,并建议了在成功诊断模型的翻译和应用中进一步探索的方向。
    Abstract Machine learning (ML) applications in medical artificial intelligence (AI) systems have shifted from traditional and statistical methods to increasing application of deep learning models and even more recently generative models. Recent years have seen a rise in the discovery of widely-available deep learning architectures that support multimodal data integration, particularly with images. The incorporation of multiple modalities into these models is a thriving research topic, presenting its own unique challenges. In this work, we discuss five challenges to multimodal AI as it pertains to ML (representation, fusion, alignment, translation, and co-learning) and survey recent approaches to addressing these challenges in the context of medical image-based clinical decision support models. We conclude with a discussion of the future of the field, suggesting directions that should be elucidated further for successful clinical models and their translation to the clinical setting.
    摘要 机器学习(ML)在医疗人工智能(AI)系统中的应用已从传统方法和统计方法逐渐转向深度学习模型,近年来更进一步转向生成模型。近几年出现了许多广泛可用、支持多模态数据整合(尤其是与图像整合)的深度学习架构。将多种模态整合到这些模型中是一个活跃的研究课题,同时也带来了独特的挑战。在这项工作中,我们讨论了多模态人工智能在机器学习层面的五大挑战(表示、融合、对齐、翻译和共同学习),并综述了在基于医学影像的临床决策支持模型背景下应对这些挑战的最新方法。最后,我们展望了该领域的未来,提出了为构建成功的临床模型并将其转化到临床环境而值得进一步探索的方向。

Counting Manatee Aggregations using Deep Neural Networks and Anisotropic Gaussian Kernel

  • paper_url: http://arxiv.org/abs/2311.02315
  • repo_url: https://github.com/yeyimilk/deep-learning-for-manatee-counting
  • paper_authors: Zhiqiang Wang, Yiran Pang, Cihan Ulus, Xingquan Zhu
  • for: 这篇论文旨在自动统计区域内海牛(manatee)聚集的数量。
  • methods: 该方法基于深度学习,以低质量图像作为输入,使用各向异性高斯核与多种深度神经网络学习海牛的密度函数,从而统计海牛聚集的数量。
  • results: 实验结果显示,基于各向异性高斯核(AGK)并应用于多种深度神经网络的计数方法取得了最小的MAE和RMSE,尤其是在背景复杂的环境下表现突出。
    Abstract Manatees are aquatic mammals with voracious appetites. They rely on sea grass as the main food source, and often spend up to eight hours a day grazing. They move slow and frequently stay in group (i.e. aggregations) in shallow water to search for food, making them vulnerable to environment change and other risks. Accurate counting manatee aggregations within a region is not only biologically meaningful in observing their habit, but also crucial for designing safety rules for human boaters, divers, etc., as well as scheduling nursing, intervention, and other plans. In this paper, we propose a deep learning based crowd counting approach to automatically count number of manatees within a region, by using low quality images as input. Because manatees have unique shape and they often stay in shallow water in groups, water surface reflection, occlusion, camouflage etc. making it difficult to accurately count manatee numbers. To address the challenges, we propose to use Anisotropic Gaussian Kernel (AGK), with tunable rotation and variances, to ensure that density functions can maximally capture shapes of individual manatees in different aggregations. After that, we apply AGK kernel to different types of deep neural networks primarily designed for crowd counting, including VGG, SANet, Congested Scene Recognition network (CSRNet), MARUNet etc. to learn manatee densities and calculate number of manatees in the scene. By using generic low quality images extracted from surveillance videos, our experiment results and comparison show that AGK kernel based manatee counting achieves minimum Mean Absolute Error (MAE) and Root Mean Square Error (RMSE). The proposed method works particularly well for counting manatee aggregations in environments with complex background.
    摘要 海牛是食量巨大的水生哺乳动物。它们以海草为主要食物来源,每天常常花费长达8小时进行觅食。它们行动缓慢,并经常成群(即聚集)停留在浅水区寻找食物,因此容易受到环境变化和其他风险的影响。准确统计一个区域内海牛聚集的数量,不仅对观察其习性具有生物学意义,也对为船员、潜水员等制定安全规则,以及安排救护、干预等计划至关重要。本文提出了一种基于深度学习的人群计数方法,以低质量图像为输入,自动统计区域内海牛的数量。由于海牛形状独特,且常成群停留在浅水中,水面反射、遮挡、伪装等因素使得准确计数十分困难。为应对这些挑战,我们提出使用可调旋转角度与方差的各向异性高斯核(AGK),以确保密度函数能够最大程度地刻画不同聚集中单个海牛的形状。随后,我们将AGK核应用于多种主要面向人群计数设计的深度神经网络,包括VGG、SANet、Congested Scene Recognition network(CSRNet)、MARUNet等,以学习海牛密度并计算场景中的海牛数量。在使用监控视频中提取的一般低质量图像的实验与对比中,基于AGK核的海牛计数取得了最小的平均绝对误差(MAE)和均方根误差(RMSE)。该方法在背景复杂的环境中统计海牛聚集时表现尤为出色。
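
A density-map ground truth is typically built by placing a kernel at every annotated animal; the anisotropic variant additionally rotates and stretches the kernel so it can follow each manatee's elongated shape. The NumPy sketch below illustrates one such rotated Gaussian and a toy density map; the orientations and variances are made-up values.

```python
import numpy as np

def anisotropic_gaussian(shape, center, sigma_long, sigma_short, theta):
    """Rotated 2D Gaussian: long axis with sigma_long at angle theta (radians)."""
    h, w = shape
    y, x = np.mgrid[0:h, 0:w].astype(np.float64)
    dx, dy = x - center[0], y - center[1]
    # Rotate coordinates into the kernel's principal axes.
    u = np.cos(theta) * dx + np.sin(theta) * dy
    v = -np.sin(theta) * dx + np.cos(theta) * dy
    g = np.exp(-0.5 * ((u / sigma_long) ** 2 + (v / sigma_short) ** 2))
    return g / g.sum()          # normalize so each animal contributes a count of 1

# Toy density map for three annotated manatees with different sizes/orientations.
annotations = [((40, 30), 12.0, 4.0, 0.3),
               ((90, 60), 15.0, 5.0, -1.0),
               ((60, 80), 10.0, 3.5, 1.2)]
density = np.zeros((128, 128))
for center, s_long, s_short, theta in annotations:
    density += anisotropic_gaussian(density.shape, center, s_long, s_short, theta)

print(round(density.sum(), 3))   # ~3.0, i.e. the count implied by the density map
```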

LISNeRF Mapping: LiDAR-based Implicit Mapping via Semantic Neural Fields for Large-Scale 3D Scenes

  • paper_url: http://arxiv.org/abs/2311.02313
  • repo_url: None
  • paper_authors: Jianyuan Zhang, Zhiliu Yang
  • for: This paper proposes a method for large-scale 3D semantic reconstruction from LiDAR measurements alone, which is crucial for outdoor autonomous agents to fulfill high-level tasks such as planning and navigation.
  • methods: The proposed method uses an octree-based and hierarchical structure to store implicit features, which are decoded to semantic information and signed distance value through shallow Multilayer Perceptrons (MLPs). Off-the-shelf algorithms are used to predict semantic labels and instance IDs of point cloud, and the implicit features and MLPs parameters are jointly optimized with self-supervision paradigm for point cloud geometry and pseudo-supervision paradigm for semantic and panoptic labels.
  • results: The proposed method is evaluated on three real-world datasets, SemanticKITTI, SemanticPOSS, and nuScenes, and demonstrates effectiveness and efficiency compared to current state-of-the-art 3D mapping methods.
    Abstract Large-scale semantic mapping is crucial for outdoor autonomous agents to fulfill high-level tasks such as planning and navigation. This paper proposes a novel method for large-scale 3D semantic reconstruction through implicit representations from LiDAR measurements alone. We firstly leverages an octree-based and hierarchical structure to store implicit features, then these implicit features are decoded to semantic information and signed distance value through shallow Multilayer Perceptrons (MLPs). We adopt off-the-shelf algorithms to predict the semantic labels and instance IDs of point cloud. Then we jointly optimize the implicit features and MLPs parameters with self-supervision paradigm for point cloud geometry and pseudo-supervision pradigm for semantic and panoptic labels. Subsequently, Marching Cubes algorithm is exploited to subdivide and visualize the scenes in the inferring stage. For scenarios with memory constraints, a map stitching strategy is also developed to merge sub-maps into a complete map. As far as we know, our method is the first work to reconstruct semantic implicit scenes from LiDAR-only input. Experiments on three real-world datasets, SemanticKITTI, SemanticPOSS and nuScenes, demonstrate the effectiveness and efficiency of our framework compared to current state-of-the-art 3D mapping methods.
    摘要 大规模语义建图是室外自主智能体完成规划、导航等高级任务的关键。本文提出了一种仅基于LiDAR测量、通过隐式表示进行大规模3D语义重建的新方法。我们首先利用基于八叉树的分层结构存储隐式特征,再通过浅层多层感知机(MLP)将这些隐式特征解码为语义信息和有符号距离值。我们采用现成的算法预测点云的语义标签和实例ID,然后以点云几何的自监督范式与语义及全景标签的伪监督范式联合优化隐式特征和MLP参数。在推断阶段,我们利用Marching Cubes算法对场景进行细分和可视化。针对内存受限的场景,我们还开发了地图拼接策略,将子地图合并为完整地图。据我们所知,我们的方法是首个仅从LiDAR输入重建语义隐式场景的工作。在SemanticKITTI、SemanticPOSS和nuScenes三个真实世界数据集上的实验表明,与当前最先进的3D建图方法相比,我们的框架兼具有效性与高效性。
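
The decoding stage described above maps an interpolated implicit feature to a signed distance value and a semantic distribution through shallow MLPs. Here is a minimal PyTorch sketch of such a decoder; the feature width, hidden size, and class count are assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class ImplicitDecoder(nn.Module):
    """Shallow MLP heads: implicit feature -> (signed distance, semantic logits)."""
    def __init__(self, feat_dim: int = 32, hidden: int = 64, n_classes: int = 20):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(feat_dim, hidden), nn.ReLU())
        self.sdf_head = nn.Linear(hidden, 1)
        self.sem_head = nn.Linear(hidden, n_classes)

    def forward(self, feats):                 # feats: (N_points, feat_dim)
        h = self.trunk(feats)
        return self.sdf_head(h).squeeze(-1), self.sem_head(h)

decoder = ImplicitDecoder()
feats = torch.randn(1024, 32)                 # stand-in for features interpolated from octree nodes
sdf, sem_logits = decoder(feats)
print(sdf.shape, sem_logits.shape)            # torch.Size([1024]) torch.Size([1024, 20])
```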

cs.AI - 2023-11-04

UniTSFace: Unified Threshold Integrated Sample-to-Sample Loss for Face Recognition

  • paper_url: http://arxiv.org/abs/2311.02523
  • repo_url: None
  • paper_authors: Qiufu Li, Xi Jia, Jiancan Zhou, Linlin Shen, Jinming Duan
  • for: 这篇论文的目的是提出一种高效的面部识别方法,以满足现实世界中的面部验证应用。
  • methods: 该方法使用了一种新的集成损失函数(USS loss),该损失函数具有一个明确的统一阈值,用于分辨正面和负面对的对比。
  • results: 实验结果表明,提出的 USS loss 高效地使用了 sample-to-sample 的损失函数,并可以与 sample-to-class 的损失函数结合使用。此外,该方法在多个 benchmark 数据集上表现出色,比如 MFR、IJB-C、LFW、CFP-FP、AgeDB 和 MegaFace,并且可以超越现有的方法,如 CosFace、ArcFace、VPL、AnchorFace 和 UNPG。
    Abstract Sample-to-class-based face recognition models can not fully explore the cross-sample relationship among large amounts of facial images, while sample-to-sample-based models require sophisticated pairing processes for training. Furthermore, neither method satisfies the requirements of real-world face verification applications, which expect a unified threshold separating positive from negative facial pairs. In this paper, we propose a unified threshold integrated sample-to-sample based loss (USS loss), which features an explicit unified threshold for distinguishing positive from negative pairs. Inspired by our USS loss, we also derive the sample-to-sample based softmax and BCE losses, and discuss their relationship. Extensive evaluation on multiple benchmark datasets, including MFR, IJB-C, LFW, CFP-FP, AgeDB, and MegaFace, demonstrates that the proposed USS loss is highly efficient and can work seamlessly with sample-to-class-based losses. The embedded loss (USS and sample-to-class Softmax loss) overcomes the pitfalls of previous approaches and the trained facial model UniTSFace exhibits exceptional performance, outperforming state-of-the-art methods, such as CosFace, ArcFace, VPL, AnchorFace, and UNPG. Our code is available.
    摘要 “现有的面部识别模型(sample-to-class基本模型)无法充分利用大量的面部图像之间的交叉样本关系,而sample-to-sample基本模型则需要复杂的对应过程进行训练。此外,这两种方法都不能满足实际面部验证应用中的需求,需要一个统一的阈值来分辨正面和负面的面部对。在本篇文章中,我们提出了统一阈值结合sample-to-sample基本损失(USS损失),其中包含一个明确的统一阈值,用于分辨正面和负面的面部对。我们也从USS损失中 derivated sample-to-sample基本软max损失和BCE损失,并讨论它们之间的关系。我们对多个benchmark数据集,包括MFR、IJB-C、LFW、CFP-FP、AgeDB和MegaFace进行了广泛的评估,结果显示了我们的USS损失非常高效,并且可以与sample-to-class基本损失一起运作。我们的模型UniTSFace在训练时使用了USS损失和sample-to-class软max损失,并且表现出色,超越了现有的方法,如CosFace、ArcFace、VPL、AnchorFace和UNPG。我们的代码可以通过我们的网站下载。”
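
The abstract's central idea is a sample-to-sample loss with one explicit threshold separating positive from negative pairs. The sketch below is a generic threshold-based pairwise formulation written only to illustrate that idea; it is not the paper's exact USS loss.

```python
import torch
import torch.nn.functional as F

def thresholded_pair_loss(embeddings, labels, threshold=0.4, scale=16.0):
    """Generic sample-to-sample loss with a single unified threshold on cosine
    similarity: positive pairs are pushed above `threshold`, negatives below it."""
    z = F.normalize(embeddings, dim=1)
    sim = z @ z.t()                                        # pairwise cosine similarities
    same = labels.unsqueeze(0) == labels.unsqueeze(1)
    eye = torch.eye(len(labels), dtype=torch.bool)
    pos = sim[same & ~eye]
    neg = sim[~same]
    loss_pos = F.softplus(scale * (threshold - pos)).mean() if pos.numel() else sim.new_zeros(())
    loss_neg = F.softplus(scale * (neg - threshold)).mean() if neg.numel() else sim.new_zeros(())
    return loss_pos + loss_neg

emb = torch.randn(8, 128, requires_grad=True)
lab = torch.tensor([0, 0, 1, 1, 2, 2, 3, 3])
print(thresholded_pair_loss(emb, lab))
```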

MAAIP: Multi-Agent Adversarial Interaction Priors for imitation from fighting demonstrations for physics-based characters

  • paper_url: http://arxiv.org/abs/2311.02502
  • repo_url: None
  • paper_authors: Mohamed Younes, Ewa Kijak, Richard Kulpa, Simon Malinowski, Franck Multon
  • for: 这篇论文旨在提出一种基于多智能生成对抗学习的多个物理角色动作模拟方法,以满足互动应用和电影电视行业中的自动次要人物动作生成需求。
  • methods: 该方法基于多智能生成对抗学习,使用印杂学习技术来模拟多个物理角色的交互和动作。
  • results: 该方法在两种不同的拳击和全身武术风格下进行了测试,并成功地模拟了不同风格的交互和动作。
    Abstract Simulating realistic interaction and motions for physics-based characters is of great interest for interactive applications, and automatic secondary character animation in the movie and video game industries. Recent works in reinforcement learning have proposed impressive results for single character simulation, especially the ones that use imitation learning based techniques. However, imitating multiple characters interactions and motions requires to also model their interactions. In this paper, we propose a novel Multi-Agent Generative Adversarial Imitation Learning based approach that generalizes the idea of motion imitation for one character to deal with both the interaction and the motions of the multiple physics-based characters. Two unstructured datasets are given as inputs: 1) a single-actor dataset containing motions of a single actor performing a set of motions linked to a specific application, and 2) an interaction dataset containing a few examples of interactions between multiple actors. Based on these datasets, our system trains control policies allowing each character to imitate the interactive skills associated with each actor, while preserving the intrinsic style. This approach has been tested on two different fighting styles, boxing and full-body martial art, to demonstrate the ability of the method to imitate different styles.
    摘要 仿真人物的交互和动作是现代应用中很受欢迎的话题,特别是在电影和电子游戏行业中。最近的学习策略中,强调实现单个人物的仿真动作,尤其是使用仿制学习技术。然而,模拟多个人物之间的交互和动作需要同时模型他们之间的互动。在这篇论文中,我们提出了一种新的多智能体生成对抗学习仿真学习方法,扩展了单个人物的动作仿真到多个物理基于的人物之间的交互和动作。我们使用两个无结构数据集作为输入:1)一个单个演员数据集,包含一个演员执行一系列动作和应用相关的动作链接,和2)一个互动数据集,包含一些多个演员之间的互动示例。基于这两个数据集,我们的系统通过控制策略让每个人物学习与每个演员相互交互的技能,保持内在的风格。我们在拳击和全身武术两种不同的战斗风格中测试了这种方法,以示方法的多样性。

Forecasting Post-Wildfire Vegetation Recovery in California using a Convolutional Long Short-Term Memory Tensor Regression Network

  • paper_url: http://arxiv.org/abs/2311.02492
  • repo_url: None
  • paper_authors: Jiahe Liu, Xiaodi Wang
  • for: 这个研究旨在发展成功的生态系统恢复策略,帮助理解火灾后植被恢复的过程。
  • methods: 这个研究使用了一种新的方法,即将Convolutional Long Short-Term Memory Tensor Regression(ConvLSTMTR)网络应用于火灾后植被恢复的预测。
  • results: 研究结果表明,ConvLSTMTR网络可以准确预测火灾后植被恢复的速度,并且可以分类不同的恢复趋势。
    Abstract The study of post-wildfire plant regrowth is essential for developing successful ecosystem recovery strategies. Prior research mainly examines key ecological and biogeographical factors influencing post-fire succession. This research proposes a novel approach for predicting and analyzing post-fire plant recovery. We develop a Convolutional Long Short-Term Memory Tensor Regression (ConvLSTMTR) network that predicts future Normalized Difference Vegetation Index (NDVI) based on short-term plant growth data after fire containment. The model is trained and tested on 104 major California wildfires occurring between 2013 and 2020, each with burn areas exceeding 3000 acres. The integration of ConvLSTM with tensor regression enables the calculation of an overall logistic growth rate k using predicted NDVI. Overall, our k-value predictions demonstrate impressive performance, with 50% of predictions exhibiting an absolute error of 0.12 or less, and 75% having an error of 0.24 or less. Finally, we employ Uniform Manifold Approximation and Projection (UMAP) and KNN clustering to identify recovery trends, offering insights into regions with varying rates of recovery. This study pioneers the combined use of tensor regression and ConvLSTM, and introduces the application of UMAP for clustering similar wildfires. This advances predictive ecological modeling and could inform future post-fire vegetation management strategies.
    摘要 研究火灾后植被的再生对制定成功的生态系统恢复策略至关重要。已有研究主要考察影响火后演替的关键生态与生物地理因素。本研究提出了一种预测和分析火灾后植被恢复的新方法:我们构建了卷积长短期记忆张量回归(ConvLSTMTR)网络,基于火灾控制后短期的植被生长数据预测未来的归一化植被指数(NDVI)。模型在2013年至2020年间过火面积超过3000英亩的104场加利福尼亚州主要野火上进行训练和测试。将ConvLSTM与张量回归结合,可利用预测的NDVI计算整体的逻辑斯蒂增长率k。总体而言,我们的k值预测表现出色:50%的预测绝对误差不超过0.12,75%的预测误差不超过0.24。最后,我们使用统一流形逼近与投影(UMAP)和KNN聚类来识别恢复趋势,为恢复速率不同的区域提供洞见。本研究首次将张量回归与ConvLSTM结合使用,并引入UMAP对相似野火进行聚类,推进了预测性生态建模,并可为未来火灾后植被管理策略提供参考。
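
After the NDVI forecast, the recovery rate k is obtained by fitting a logistic growth curve to the NDVI time series. Below is a minimal curve-fitting sketch with synthetic data; the particular logistic parameterization and the numbers are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic(t, L, k, t0):
    """Logistic growth: carrying capacity L, growth rate k, midpoint t0."""
    return L / (1.0 + np.exp(-k * (t - t0)))

# Synthetic post-fire NDVI series (months since containment), with noise.
t = np.arange(0, 36, dtype=np.float64)
rng = np.random.default_rng(0)
ndvi = logistic(t, L=0.65, k=0.25, t0=12.0) + rng.normal(0, 0.02, t.size)

(L_hat, k_hat, t0_hat), _ = curve_fit(logistic, t, ndvi, p0=[0.6, 0.2, 10.0])
print(f"estimated logistic growth rate k = {k_hat:.3f}")
```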

Uncertainty Quantification of Deep Learning for Spatiotemporal Data: Challenges and Opportunities

  • paper_url: http://arxiv.org/abs/2311.02485
  • repo_url: None
  • paper_authors: Wenchong He, Zhe Jiang
  • for: 随着GPS、Remote Sensing和计算模拟技术的发展,大量的地ospatial和时间特征数据正在不断增加,这些数据资产提供了改变社会的Unique机遇。但是,深度学习模型在高度决策应用中可能会出现意外和错误的预测,导致严重的后果。不确定性评估(UQ)可以 estimating a deep learning model’s confidence.
  • methods: 本文提供了深度学习模型的不确定性评估简介,包括其特殊挑战和现有方法。我们尤其关注不确定性来源的重要性。
  • results: 本文 highlights several future research directions for spatiotemporal data, including the importance of uncertainty sources.Here is the same information in English:
  • for: With the advancement of GPS, remote sensing, and computational simulations, large amounts of geospatial and spatiotemporal data are being collected at an increasing speed, providing unique opportunities to transform society. However, deep learning models sometimes make unexpected and incorrect predictions with unwarranted confidence, causing severe consequences in high-stake decision-making applications. Uncertainty quantification (UQ) aims to estimate a deep learning model’s confidence.
  • methods: This paper provides a brief overview of UQ of deep learning for spatiotemporal data, including its unique challenges and existing methods. We particularly focus on the importance of uncertainty sources.
  • results: The paper highlights several future research directions for spatiotemporal data, including the importance of uncertainty sources.
    Abstract With the advancement of GPS, remote sensing, and computational simulations, large amounts of geospatial and spatiotemporal data are being collected at an increasing speed. Such emerging spatiotemporal big data assets, together with the recent progress of deep learning technologies, provide unique opportunities to transform society. However, it is widely recognized that deep learning sometimes makes unexpected and incorrect predictions with unwarranted confidence, causing severe consequences in high-stake decision-making applications (e.g., disaster management, medical diagnosis, autonomous driving). Uncertainty quantification (UQ) aims to estimate a deep learning model's confidence. This paper provides a brief overview of UQ of deep learning for spatiotemporal data, including its unique challenges and existing methods. We particularly focus on the importance of uncertainty sources. We identify several future research directions for spatiotemporal data.
    摘要 随着GPS、遥感和计算模拟技术的发展,海量的地理空间和时空数据正以越来越快的速度被采集。这些新兴的时空大数据资产,加上深度学习技术的最新进展,为社会变革提供了独特的机遇。然而,人们普遍认识到深度学习有时会以毫无根据的自信做出意外且错误的预测,在高风险决策应用(如灾害管理、医疗诊断、自动驾驶)中可能造成严重后果。不确定性量化(UQ)旨在估计深度学习模型的置信度。本文简要综述了面向时空数据的深度学习不确定性量化,包括其特有挑战和现有方法,并特别关注不确定性来源的重要性。最后,我们指出了时空数据方向上的若干未来研究方向。

Generalized zero-shot audio-to-intent classification

  • paper_url: http://arxiv.org/abs/2311.02482
  • repo_url: None
  • paper_authors: Veera Raghavendra Elluru, Devang Kulshreshtha, Rohit Paturi, Sravan Bodapati, Srikanth Ronanki
  • for: 这个研究旨在提高使用音频数据的语音识别系统的未见意能力。
  • methods: 该研究提出一种广义零样本音频到意图分类框架,每个意图只需少量示例文本句子。该框架首先借助自监督预训练模型训练一个有监督的音频到意图分类器;随后利用神经音频合成器为示例文本语句生成音频嵌入,并使用余弦相似度对未见意图进行广义零样本分类。此外,我们还提出了一种多模态训练策略,将词汇信息融入音频表示中以提升零样本性能。
  • results: 我们的多Modal训练策略提高了SLURP dataset上未见意分类的准确率,比Audio只训练策略高2.75%和18.2%。
    Abstract Spoken language understanding systems using audio-only data are gaining popularity, yet their ability to handle unseen intents remains limited. In this study, we propose a generalized zero-shot audio-to-intent classification framework with only a few sample text sentences per intent. To achieve this, we first train a supervised audio-to-intent classifier by making use of a self-supervised pre-trained model. We then leverage a neural audio synthesizer to create audio embeddings for sample text utterances and perform generalized zero-shot classification on unseen intents using cosine similarity. We also propose a multimodal training strategy that incorporates lexical information into the audio representation to improve zero-shot performance. Our multimodal training approach improves the accuracy of zero-shot intent classification on unseen intents of SLURP by 2.75% and 18.2% for the SLURP and internal goal-oriented dialog datasets, respectively, compared to audio-only training.
    摘要 仅使用音频数据的口语理解系统正日益受到关注,但其处理未见意图的能力仍然有限。在本研究中,我们提出了一种广义零样本音频到意图分类框架,每个意图仅需少量示例文本句子。为此,我们首先借助自监督预训练模型训练一个有监督的音频到意图分类器;随后利用神经音频合成器为示例文本语句生成音频嵌入,并通过余弦相似度对未见意图进行广义零样本分类。我们还提出了一种多模态训练策略,将词汇信息融入音频表示中,以提升零样本性能。与仅使用音频的训练相比,我们的多模态训练方法在SLURP数据集和内部面向目标的对话数据集上,将未见意图的零样本分类准确率分别提升了2.75%和18.2%。
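
The generalized zero-shot step reduces to comparing an utterance's audio embedding against per-intent anchor embeddings synthesized from a few sample sentences. A minimal cosine-similarity sketch follows; the embedding extractor and audio synthesizer are abstracted away as precomputed vectors, and the intent names are invented.

```python
import torch
import torch.nn.functional as F

def zero_shot_intent(audio_emb, intent_anchor_embs, intent_names):
    """Classify by cosine similarity to per-intent anchor embeddings
    (anchors would come from synthesized audio of sample text sentences)."""
    a = F.normalize(audio_emb, dim=-1)
    anchors = F.normalize(intent_anchor_embs, dim=-1)
    sims = anchors @ a                            # one similarity score per intent
    return intent_names[int(sims.argmax())], sims

intents = ["set_alarm", "play_music", "weather_query"]
anchor_embs = torch.randn(len(intents), 256)      # stand-ins for synthesized-audio embeddings
utterance_emb = torch.randn(256)                  # stand-in for the test utterance's embedding
pred, sims = zero_shot_intent(utterance_emb, anchor_embs, intents)
print(pred, sims.tolist())
```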

Constrained Equation Learner Networks for Precision-Preserving Extrapolation of Robotic Skills

  • paper_url: http://arxiv.org/abs/2311.02475
  • repo_url: None
  • paper_authors: Hector Perez-Villeda, Justus Piater, Matteo Saveriano
  • for: 该论文旨在解决机器人通过示教编程(Programming by Demonstration)学习新技能后,如何在不收集新训练数据的情况下适应不同的环境和条件。
  • methods: 论文提出了一种新的监督学习框架——受限方程学习网络(Constrained Equation Learner Networks,CEN),从受约束回归的角度解决示教编程中的轨迹适应问题。CEN 利用 Equation Learner Networks 学习一组解析表达式,并将其作为基函数使用。
  • results: 实验结果表明,CEN 可以比现有方法更好地适应 robotic 技能,并且可以保持适应的精度。在一些 robotic 任务中,CEN 在不同的环境下实现了更高的总体性和适应性。
    Abstract In Programming by Demonstration, the robot learns novel skills from human demonstrations. After learning, the robot should be able not only to reproduce the skill, but also to generalize it to shifted domains without collecting new training data. Adaptation to similar domains has been investigated in the literature; however, an open problem is how to adapt learned skills to different conditions that are outside of the data distribution, and, more important, how to preserve the precision of the desired adaptations. This paper presents a novel supervised learning framework called Constrained Equation Learner Networks that addresses the trajectory adaptation problem in Programming by Demonstrations from a constrained regression perspective. While conventional approaches for constrained regression use one kind of basis function, e.g., Gaussian, we exploit Equation Learner Networks to learn a set of analytical expressions and use them as basis functions. These basis functions are learned from demonstration with the objective to minimize deviations from the training data while imposing constraints that represent the desired adaptations, like new initial or final points or maintaining the trajectory within given bounds. Our approach addresses three main difficulties in adapting robotic trajectories: 1) minimizing the distortion of the trajectory for new adaptations; 2) preserving the precision of the adaptations; and 3) dealing with the lack of intuition about the structure of basis functions. We validate our approach both in simulation and in real experiments in a set of robotic tasks that require adaptation due to changes in the environment, and we compare obtained results with two existing approaches. Performed experiments show that Constrained Equation Learner Networks outperform state of the art approaches by increasing generalization and adaptability of robotic skills.
    摘要 在程序编程中,机器人从人类示例学习新技能。学习后,机器人应该不仅能复制技能,还能泛化到偏移域无需新的训练数据。针对相似域的适应已经在文献中 investigate;然而,一个开放的问题是如何适应不同的条件,这些条件外部训练数据分布。此外,更重要的是如何保持适应的精度。这篇论文提出了一种新的监督学习框架,即受限 regression 框架,用于程序编程中的示例适应问题。与传统的受限 regression 方法一样,我们使用一种基于Equation Learner Networks的学习算法,以学习一组分析表达式,并使其作为基函数使用。这些基函数从示例学习中得出,并与具有限制的目标函数进行拟合,以最小化示例数据与适应结果之间的差异,同时保持适应的精度。我们的方法解决了以下三个主要难题:1)减少新适应中的路径扭曲;2)保持适应的精度;3)处理基函数的感知问题。我们在实验中 validate 了我们的方法,并与两种现有方法进行比较。实验结果表明,受限Equation Learner Networks 可以比现有方法提高机器人技能的泛化和适应能力。

Multi-State Brain Network Discovery

  • paper_url: http://arxiv.org/abs/2311.02466
  • repo_url: https://github.com/Aryia-Behroziuan/References
  • paper_authors: Hang Yin, Yao Su, Xinyue Liu, Thomas Hartvigsen, Yanhua Li, Xiangnan Kong
  • for: This paper aims to discover brain networks from spatio-temporal signals obtained by neuroimaging data, such as fMRI scans of human brains, and to model multi-state brain networks that capture the intricate patterns of brain activities.
  • methods: The proposed method, called MNGL (Multi-state Network Graphical Lasso), combines CGL (coherent graphical lasso) with GMM (Gaussian Mixture Model) to successfully model multi-state brain networks.
  • results: Compared to recent state-of-the-art alternatives, MNGL outperforms by discovering more explanatory and realistic results using both synthetic and real world ADHD 200 fMRI datasets.Here’s the Chinese translation of the three information:
  • for: 这篇论文旨在基于神经成像数据,如fMRI扫描人脑的信号,发现脑网络,并模型多状态脑网络,以捕捉脑活动的复杂征性。
  • methods: 提出的方法是MNGL(多状态网络图形Lasso),它将CGL(一致图形Lasso)与GMM(高斯混合模型)相结合,成功地建模多状态脑网络。
  • results: 与最新的先进方法相比,MNGL在合成数据和真实的ADHD 200 fMRI数据上均表现出色,能够发现更具解释性且更贴近现实的结果。
    Abstract Brain network discovery aims to find nodes and edges from the spatio-temporal signals obtained by neuroimaging data, such as fMRI scans of human brains. Existing methods tend to derive representative or average brain networks, assuming observed signals are generated by only a single brain activity state. However, the human brain usually involves multiple activity states, which jointly determine the brain activities. The brain regions and their connectivity usually exhibit intricate patterns that are difficult to capture with only a single-state network. Recent studies find that brain parcellation and connectivity change according to the brain activity state. We refer to such brain networks as multi-state, and this mixture can help us understand human behavior. Thus, compared to a single-state network, a multi-state network can prevent us from losing crucial information of cognitive brain network. To achieve this, we propose a new model called MNGL (Multi-state Network Graphical Lasso), which successfully models multi-state brain networks by combining CGL (coherent graphical lasso) with GMM (Gaussian Mixture Model). Using both synthetic and real world ADHD 200 fMRI datasets, we demonstrate that MNGL outperforms recent state-of-the-art alternatives by discovering more explanatory and realistic results.
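
A simple way to approximate the multi-state idea with off-the-shelf tools is to first cluster time points into activity states with a Gaussian mixture and then estimate one sparse network per state with the graphical lasso. The scikit-learn sketch below shows that two-stage approximation; it is not the joint MNGL estimator proposed in the paper.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 10))        # stand-in for fMRI time series: 600 time points, 10 regions

# Stage 1: assign each time point to one of K latent brain states.
K = 3
states = GaussianMixture(n_components=K, random_state=0).fit_predict(X)

# Stage 2: estimate a sparse precision (network) matrix per state.
networks = []
for k in range(K):
    gl = GraphicalLasso(alpha=0.1).fit(X[states == k])
    networks.append(gl.precision_)

print([net.shape for net in networks])   # one 10x10 network per state
```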

Levels of AGI: Operationalizing Progress on the Path to AGI

  • paper_url: http://arxiv.org/abs/2311.02462
  • repo_url: None
  • paper_authors: Meredith Ringel Morris, Jascha Sohl-dickstein, Noah Fiedel, Tris Warkentin, Allan Dafoe, Aleksandra Faust, Clement Farabet, Shane Legg
  • for: This paper proposes a framework for classifying the capabilities and behavior of Artificial General Intelligence (AGI) models and their precursors.
  • methods: The paper analyzes existing definitions of AGI and distills six principles that a useful ontology for AGI should satisfy.
  • results: The paper proposes “Levels of AGI” based on depth (performance) and breadth (generality) of capabilities, and discusses the challenges of quantifying the behavior and capabilities of AGI models against these levels.Here are the three key points in Simplified Chinese text:
  • for: 这篇论文提出了一个用于分类人工通用智能(AGI)模型和其前体的框架。
  • methods: 这篇论文通过分析现有的AGI定义,提出了 six 个用于AGIontology的原则。
  • results: 这篇论文提出了“Levels of AGI”,基于深度(性能)和面积(通用)的能力,并讨论了量化AGI模型的行为和能力的挑战。
    Abstract We propose a framework for classifying the capabilities and behavior of Artificial General Intelligence (AGI) models and their precursors. This framework introduces levels of AGI performance, generality, and autonomy. It is our hope that this framework will be useful in an analogous way to the levels of autonomous driving, by providing a common language to compare models, assess risks, and measure progress along the path to AGI. To develop our framework, we analyze existing definitions of AGI, and distill six principles that a useful ontology for AGI should satisfy. These principles include focusing on capabilities rather than mechanisms; separately evaluating generality and performance; and defining stages along the path toward AGI, rather than focusing on the endpoint. With these principles in mind, we propose 'Levels of AGI' based on depth (performance) and breadth (generality) of capabilities, and reflect on how current systems fit into this ontology. We discuss the challenging requirements for future benchmarks that quantify the behavior and capabilities of AGI models against these levels. Finally, we discuss how these levels of AGI interact with deployment considerations such as autonomy and risk, and emphasize the importance of carefully selecting Human-AI Interaction paradigms for responsible and safe deployment of highly capable AI systems.
    摘要 我们提出了一套AGI模型和其前体的能力和行为分类框架。这套框架 introduce AGI性能、通用性和自主性的多个级别。我们希望这套框架可以与自动驾驶技术类似,提供一种共同语言,比较模型、评估风险和衡量AGI的进步。为开发这套框架,我们分析了现有的AGI定义,并总结出六个AGIontology应满足的原则。这些原则包括专注于能力而不是机制,分开评估通用性和性能,以及定义AGI的发展阶段,而不是专注于终点。基于这些原则,我们提出了“AGI级别”,按照性能和通用性的深度和面积来评估AGI模型的能力。我们还讨论了未来测试AGI模型的标准 benchmark,并讨论了这些级别与部署考虑因素,如自主和风险的关系。最后,我们强调选择合适的人机交互模式对于负责任和安全地部署高能力AI系统非常重要。

Can ChatGPT support software verification?

  • paper_url: http://arxiv.org/abs/2311.02433
  • repo_url: None
  • paper_authors: Christian Janßen, Cedric Richter, Heike Wehrheim
  • for: 这个论文的目的是研究使用 chatGPT 支持正式软件验证。
  • methods: 论文使用 chatGPT 生成 loop invariants,并通过 Frama-C 和 CPAchecker 验证其有效性和实用性。
  • results: 论文的结果表明,chatGPT 可以生成有效和实用的 loop invariants,帮助 Frama-C 验证 tasks 之前无法解决。
    Abstract Large language models have become increasingly effective in software engineering tasks such as code generation, debugging and repair. Language models like ChatGPT can not only generate code, but also explain its inner workings and in particular its correctness. This raises the question whether we can utilize ChatGPT to support formal software verification. In this paper, we take some first steps towards answering this question. More specifically, we investigate whether ChatGPT can generate loop invariants. Loop invariant generation is a core task in software verification, and the generation of valid and useful invariants would likely help formal verifiers. To provide some first evidence on this hypothesis, we ask ChatGPT to annotate 106 C programs with loop invariants. We check validity and usefulness of the generated invariants by passing them to two verifiers, Frama-C and CPAchecker. Our evaluation shows that ChatGPT is able to produce valid and useful invariants allowing Frama-C to verify tasks that it could not solve before. Based on our initial insights, we propose ways of combining ChatGPT (or large language models in general) and software verifiers, and discuss current limitations and open issues.

CDR-Adapter: Learning Adapters to Dig Out More Transferring Ability for Cross-Domain Recommendation Models

  • paper_url: http://arxiv.org/abs/2311.02398
  • repo_url: None
  • paper_authors: Yanyu Chen, Yao Yao, Wai Kin Victor Chan, Li Xiao, Kai Zhang, Liang Zhang, Yun Ye
  • for: 解决推荐系统中的数据稀缺和冷启动问题,提高推荐性能。
  • methods: 提出了一种可扩展和高效的解决方案,即CDR-Adapter,它通过分离原来的推荐模型和映射函数,实现知识传递而不需要重新引入网络结构,从而避免了计算成本高和知识淡化问题。
  • results: 在标准测试集上进行了广泛的实验,证明了我们的方法的有效性,比如 state-of-the-art CDR 方法。
    Abstract Data sparsity and cold-start problems are persistent challenges in recommendation systems. Cross-domain recommendation (CDR) is a promising solution that utilizes knowledge from the source domain to improve the recommendation performance in the target domain. Previous CDR approaches have mainly followed the Embedding and Mapping (EMCDR) framework, which involves learning a mapping function to facilitate knowledge transfer. However, these approaches necessitate re-engineering and re-training the network structure to incorporate transferrable knowledge, which can be computationally expensive and may result in catastrophic forgetting of the original knowledge. In this paper, we present a scalable and efficient paradigm to address data sparsity and cold-start issues in CDR, named CDR-Adapter, by decoupling the original recommendation model from the mapping function, without requiring re-engineering the network structure. Specifically, CDR-Adapter is a novel plug-and-play module that employs adapter modules to align feature representations, allowing for flexible knowledge transfer across different domains and efficient fine-tuning with minimal training costs. We conducted extensive experiments on the benchmark dataset, which demonstrated the effectiveness of our approach over several state-of-the-art CDR approaches.
    摘要 数据稀缺和冷启问题是推荐系统中的惯常挑战。跨Domain推荐(CDR)是一种有前途的解决方案,它利用源Domain中的知识提高目标Domain中的推荐性能。先前的CDR方法主要遵循Embedding and Mapping(EMCDR)框架,这些方法通常需要重新工程和重新训练网络结构,以便传递可移植的知识。然而,这些方法可能需要大量的计算资源和可能会导致原始知识的恐慌遗忘。在这篇论文中,我们提出了一种可扩展和高效的方法,以解决推荐系统中的数据稀缺和冷启问题,名为CDR-Adapter。CDR-Adapter是一种新的插件模块,它使用适配器模块来对特征表示进行减Alignment,以便在不同的Domain之间进行可靠的知识传递和高效的细化训练,无需重新工程网络结构。我们对标准数据集进行了广泛的实验,结果表明我们的方法比先前的CDR方法更有效。
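
The adapter idea amounts to a small bottleneck module placed between the frozen source-domain model and the target-domain objective, so that only the adapter is tuned. Below is a generic PyTorch adapter sketch; the bottleneck size and placement are illustrative assumptions, not the exact CDR-Adapter design.

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter: down-project, nonlinearity, up-project, residual connection."""
    def __init__(self, dim: int = 128, bottleneck: int = 32):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        self.act = nn.ReLU()

    def forward(self, x):
        return x + self.up(self.act(self.down(x)))

# Frozen source-domain embeddings pass through the adapter; only adapter weights train.
adapter = Adapter(dim=128)
source_user_emb = torch.randn(16, 128)     # e.g. pretrained user embeddings from the source domain
aligned = adapter(source_user_emb)         # representations mapped toward the target domain
trainable = sum(p.numel() for p in adapter.parameters())
print(aligned.shape, trainable)
```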

Continual Learning of Unsupervised Monocular Depth from Videos

  • paper_url: http://arxiv.org/abs/2311.02393
  • repo_url: https://github.com/NeurAI-Lab/CUDE-MonoDepthCL
  • paper_authors: Hemang Chawla, Arnav Varma, Elahe Arani, Bahram Zonooz
  • for: 提高无监督单目深度估计的能力,应用于机器人和自动驾驶等领域。
  • methods: 提出了一个框架,以及一种基于备忘的双存储方法(MonoDepthCL),利用空间时间协调的一致性来实现不间断学习。
  • results: 模型在不同的频率和规模上进行了训练和测试,显示了在不间断学习中的性能稳定性和增长。
    Abstract Spatial scene understanding, including monocular depth estimation, is an important problem in various applications, such as robotics and autonomous driving. While improvements in unsupervised monocular depth estimation have potentially allowed models to be trained on diverse crowdsourced videos, this remains underexplored as most methods utilize the standard training protocol, wherein the models are trained from scratch on all data after new data is collected. Instead, continual training of models on sequentially collected data would significantly reduce computational and memory costs. Nevertheless, naive continual training leads to catastrophic forgetting, where the model performance deteriorates on older domains as it learns on newer domains, highlighting the trade-off between model stability and plasticity. While several techniques have been proposed to address this issue in image classification, the high-dimensional and spatiotemporally correlated outputs of depth estimation make it a distinct challenge. To the best of our knowledge, no framework or method currently exists focusing on the problem of continual learning in depth estimation. Thus, we introduce a framework that captures the challenges of continual unsupervised depth estimation (CUDE), and define the necessary metrics to evaluate model performance. We propose a rehearsal-based dual-memory method, MonoDepthCL, which utilizes spatiotemporal consistency for continual learning in depth estimation, even when the camera intrinsics are unknown.
    摘要 空间场景理解,包括单目深度估计,在各种应用中具有重要性,如 роботиcs和自动驾驶。尽管无监督单目深度估计的改进允许模型在不同的人工训练视频上进行训练,但这还是未经探索的,因为大多数方法使用标准训练协议,即从 scratch 上所有数据进行训练,新数据收集后。相反,继续训练模型在顺序收集的数据上会显著减少计算和内存成本。然而,简单的继续训练会导致忘记现象,模型对老化领域的性能下降,显示出模型稳定性和抑制的负面关系。虽然一些技术已经被提出来解决这个问题在图像分类方面,但高维度和空间时间相关的输出使得深度估计是一个特殊的挑战。根据我们所知,现在没有任何框架或方法专门关注深度估计的连续学习问题。因此,我们提出了一个框架,即不间断无监督深度估计框架(CUDE),并定义了评估模型性能的必要指标。我们提议一种备忘队列方法,即单目深度CL,它通过空间时间一致性来实现连续学习深度估计,即使摄像头内参不详。

Cross-Level Distillation and Feature Denoising for Cross-Domain Few-Shot Classification

  • paper_url: http://arxiv.org/abs/2311.02392
  • repo_url: https://github.com/jarucezh/cldfd
  • paper_authors: Hao Zheng, Runqi Wang, Jianzhuang Liu, Asako Kanezaki
  • for: 本研究旨在解决跨域少样本分类问题,即基础数据集与目标数据集来自不同域时的小样本学习。
  • methods: 我们设计了跨层知识蒸馏方法,引导网络浅层学习更高层的信息,从而增强模型在目标数据集中提取判别性特征的能力;此外,我们还提出了一种特征去噪操作,以减少特征冗余并缓解过拟合。
  • results: 我们的方法在BSCD-FSL基准上超越此前最优的Dynamic-Distillation方法,在1-shot和5-shot分类任务上平均分别提高5.44%和1.37%。代码将在GitHub上提供。
    Abstract The conventional few-shot classification aims at learning a model on a large labeled base dataset and rapidly adapting to a target dataset that is from the same distribution as the base dataset. However, in practice, the base and the target datasets of few-shot classification are usually from different domains, which is the problem of cross-domain few-shot classification. We tackle this problem by making a small proportion of unlabeled images in the target domain accessible in the training stage. In this setup, even though the base data are sufficient and labeled, the large domain shift still makes transferring the knowledge from the base dataset difficult. We meticulously design a cross-level knowledge distillation method, which can strengthen the ability of the model to extract more discriminative features in the target dataset by guiding the network's shallow layers to learn higher-level information. Furthermore, in order to alleviate the overfitting in the evaluation stage, we propose a feature denoising operation which can reduce the feature redundancy and mitigate overfitting. Our approach can surpass the previous state-of-the-art method, Dynamic-Distillation, by 5.44% on 1-shot and 1.37% on 5-shot classification tasks on average in the BSCD-FSL benchmark. The implementation code will be available at https://github.com/jarucezh/cldfd.
    摘要 传统的几shot分类目标是学习一个模型,然后快速适应目标数据集,但在实际应用中,基数据集和目标数据集通常来自不同的领域,这是跨领域几shot分类的问题。我们解决这个问题 by 在训练阶段使得小量目标数据集中的无标照图像变得可访问。尽管基数据集充足并彩色标注,但大领域变化仍然使得从基数据集传输知识困难。我们仔细设计了跨层知识填充方法,该方法可以使模型在目标数据集中提取更多特征分类器。此外,为了避免评估阶段的过拟合,我们提议一种特征净化操作,可以减少特征重复和抑制过拟合。我们的方法可以在BSCD-FSL标准准则下平均超过前一个状态的方法,Dynamic-Distillation,在1shot和5shot分类任务上的平均性能提高5.44%和1.37%。代码实现将提供在https://github.com/jarucezh/cldfd中。

AI-based Self-healing Solutions Applied to Cellular Networks: An Overview

  • paper_url: http://arxiv.org/abs/2311.02390
  • repo_url: None
  • paper_authors: Jaleh Farmani, Amirreza Khalil Zadeh
  • For: The paper is written for researchers and practitioners in the field of cellular networks, specifically those interested in self-healing and machine learning techniques for network management.* Methods: The paper provides an overview of machine learning methods, including classical and deep learning variants, that are used to implement self-healing for cell outages in cellular networks.* Results: The paper reviews the state-of-the-art in literature for cell outages, with a particular emphasis on machine learning-based approaches.Here are the three key points in Simplified Chinese text:* For: 本文是为Cellular网络相关研究人员和实践者所写,尤其是关注自适应和机器学习技术的网络管理方面。* Methods: 本文提供了机器学习方法的概述,包括经典和深度学习变体,用于实现Cellular网络中的自适应。* Results: 本文对Cellular网络中的维护和自适应方面进行了文献综述,尤其是关注机器学习基于的方法。
    Abstract In this article, we provide an overview of machine learning (ML) methods, both classical and deep variants, that are used to implement self-healing for cell outages in cellular networks. Self-healing is a promising approach to network management, which aims to detect and compensate for cell outages in an autonomous way. This technology aims to decrease the expenses associated with the installation and maintenance of existing 4G and 5G, i.e. emerging 6G networks by simplifying operational tasks through its ability to heal itself. We provide an overview of the basic concepts and taxonomy for SON, self-healing, and ML techniques, in network management. Moreover, we review the state-of-the-art in literature for cell outages, with a particular emphasis on ML-based approaches.
    摘要 在这篇文章中,我们概述了用于实现蜂窝网络小区中断自愈的机器学习(ML)方法,包括经典方法和深度学习变体。自愈是一种有前景的网络管理方法,旨在以自主方式检测并补偿小区中断。该技术通过自愈能力简化运维任务,从而降低现有4G、5G乃至新兴6G网络的部署与维护成本。我们还介绍了网络管理中自组织网络(SON)、自愈和机器学习技术的基本概念与分类。此外,我们综述了关于小区中断的最新文献,并特别强调基于机器学习的方法。

Ultra-Long Sequence Distributed Transformer

  • paper_url: http://arxiv.org/abs/2311.02382
  • repo_url: None
  • paper_authors: Xiao Wang, Isaac Lyngaas, Aristeidis Tsaris, Peng Chen, Sajal Dash, Mayanka Chandra Shekar, Tao Luo, Hong-Jun Yoon, Mohamed Wahib, John Gouley
  • for: 该论文旨在提出一种高效的分布式训练方法,以便使用长序列训练变换器模型。
  • methods: 该方法分割长序列成多个段,并将每个段分配给不同的GPU进行计算。然后,它使用了一种复合通信和双均值平均技术来避免部分自注意计算的汇聚和通信开销。
  • results: 与现有的序列并行技术相比,该方法在144个Nvidia V100 GPU上实现了5.6倍的速度提升和10.2倍的内存可用性提升。此外,该算法可以扩展到极长序列长度50,112,在3,456个GPU上实现161%的超线性并行效率和32PFLOP的吞吐量。
    Abstract Transformer models trained on long sequences often achieve higher accuracy than short sequences. Unfortunately, conventional transformers struggle with long sequence training due to the overwhelming computation and memory requirements. Existing methods for long sequence training offer limited speedup and memory reduction, and may compromise accuracy. This paper presents a novel and efficient distributed training method, the Long Short-Sequence Transformer (LSS Transformer), for training transformers with long sequences. It distributes a long sequence into segments among GPUs, with each GPU computing a partial self-attention for its segment. Then, it uses a fused communication and a novel double gradient averaging technique to avoid the need to aggregate partial self-attention and minimize communication overhead. We evaluated the LSS Transformer against the state-of-the-art Nvidia sequence parallelism on the Wikipedia enwik8 dataset. Results show that our proposed method leads to a 5.6x faster and 10.2x more memory-efficient implementation than state-of-the-art sequence parallelism on 144 Nvidia V100 GPUs. Moreover, our algorithm scales to an extreme sequence length of 50,112 at 3,456 GPUs, achieving 161% super-linear parallel efficiency and a throughput of 32 petaflops.
    摘要 在长序列上训练的变换器模型通常能取得比短序列更高的准确率。然而,传统变换器由于计算量和内存需求过大,难以在长序列上训练。现有的长序列训练方法只能提供有限的加速和内存节省,并且可能损害准确率。本文提出了一种新颖且高效的分布式训练方法——长短序列变换器(LSS Transformer),用于长序列下的变换器训练。它将长序列切分为多个段并分配给不同的GPU,每个GPU计算其所负责段的部分自注意力。随后,它通过融合通信和一种新的双重梯度平均技术,避免了聚合部分自注意力的需要并将通信开销降到最低。我们在Wikipedia enwik8数据集上将LSS Transformer与最先进的Nvidia序列并行方法进行了对比。结果表明,在144块Nvidia V100 GPU上,我们的方法比最先进的序列并行实现快5.6倍,内存效率高10.2倍。此外,我们的算法可扩展到50,112的极长序列,在3,456块GPU上实现161%的超线性并行效率和32 petaflops的吞吐量。
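As a rough, single-process illustration of the segmentation idea in the abstract above, the sketch below splits a long sequence into segments and computes segment-local self-attention, standing in for per-GPU computation. The fused communication and double gradient averaging of the paper are not reproduced; segment count and sizes are assumptions.

```python
# Conceptual sketch: split a long sequence into segments and compute
# segment-local ("partial") self-attention for each segment.
import torch
import torch.nn as nn

embed_dim, num_heads, num_segments = 64, 4, 4
attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)

x = torch.randn(2, 4096, embed_dim)          # (batch, long sequence, dim)
segments = x.chunk(num_segments, dim=1)      # each segment would live on one GPU

outputs = []
for seg in segments:                          # in the distributed setting this loop runs in parallel
    out, _ = attn(seg, seg, seg)              # segment-local self-attention
    outputs.append(out)

y = torch.cat(outputs, dim=1)                 # re-assembled sequence representation
print(y.shape)                                # torch.Size([2, 4096, 64])
```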

Accelerating Reinforcement Learning of Robotic Manipulations via Feedback from Large Language Models

  • paper_url: http://arxiv.org/abs/2311.02379
  • repo_url: None
  • paper_authors: Kun Chu, Xufeng Zhao, Cornelius Weber, Mengdi Li, Stefan Wermter
  • for: 提高RLAgent的学习效率和成功率,通过人工智能语言模型提供有用的反馈。
  • methods: 利用在大规模语言数据上预训练的大语言模型(LLM)为RL智能体提供有用的及时反馈,帮助其更快地学习并成功完成机器人操控任务。
  • results: 实验结果表明,使用Lafite-RL框架,RLAgent可以更快速地学习并成功完成RLBench任务,并且在学习效率和成功率上都有显著提高。
    Abstract Reinforcement Learning (RL) plays an important role in the robotic manipulation domain since it allows self-learning from trial-and-error interactions with the environment. Still, sample efficiency and reward specification seriously limit its potential. One possible solution involves learning from expert guidance. However, obtaining a human expert is impractical due to the high cost of supervising an RL agent, and developing an automatic supervisor is a challenging endeavor. Large Language Models (LLMs) demonstrate remarkable abilities to provide human-like feedback on user inputs in natural language. Nevertheless, they are not designed to directly control low-level robotic motions, as their pretraining is based on vast internet data rather than specific robotics data. In this paper, we introduce the Lafite-RL (Language agent feedback interactive Reinforcement Learning) framework, which enables RL agents to learn robotic tasks efficiently by taking advantage of LLMs' timely feedback. Our experiments conducted on RLBench tasks illustrate that, with simple prompt design in natural language, the Lafite-RL agent exhibits improved learning capabilities when guided by an LLM. It outperforms the baseline in terms of both learning efficiency and success rate, underscoring the efficacy of the rewards provided by an LLM.
    摘要 强化学习(RL)在机器人操作领域发挥着重要作用,因为它允许智能体通过与环境的试错交互进行自我学习。然而,样本效率和奖励设计严重限制了其潜力。一种可能的解决方案是从专家指导中学习,但由于监督RL智能体的成本很高,获得人类专家并不现实,而构建自动监督者也十分困难。大语言模型(LLM)展现出以自然语言对用户输入提供类人反馈的卓越能力,但由于其预训练基于海量互联网数据而非特定的机器人数据,它们并不适合直接控制底层机器人动作。本文提出Lafite-RL(Language agent feedback interactive Reinforcement Learning)框架,利用LLM的及时反馈使RL智能体高效地学习机器人任务。在RLBench任务上的实验表明,只需简单的自然语言提示设计,Lafite-RL智能体在LLM指导下展现出更强的学习能力,在学习效率和成功率上均优于基线,验证了LLM所提供奖励的有效性。
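The following hypothetical Python sketch shows one way LLM feedback can shape an RL reward, in the spirit of the framework described above. The llm_feedback function is a placeholder for an actual model call, and the prompt design, score range, and weighting are assumptions rather than the paper's scheme.

```python
# Hedged sketch: add an LLM-derived score for a textual description of the
# agent's transition to the environment reward.
import random

def llm_feedback(state_desc: str, action_desc: str) -> float:
    # Placeholder for a call to a large language model that returns, e.g.,
    # +1 (helpful action), 0 (neutral), or -1 (counterproductive).
    return random.choice([-1.0, 0.0, 1.0])

def shaped_reward(env_reward: float, state_desc: str, action_desc: str,
                  weight: float = 0.1) -> float:
    return env_reward + weight * llm_feedback(state_desc, action_desc)

# Usage inside an RL loop (environment and policy omitted):
r = shaped_reward(env_reward=0.0,
                  state_desc="gripper 5 cm above the red block",
                  action_desc="move gripper down 2 cm")
print(r)
```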

MTS-DVGAN: Anomaly Detection in Cyber-Physical Systems using a Dual Variational Generative Adversarial Network

  • paper_url: http://arxiv.org/abs/2311.02378
  • repo_url: None
  • paper_authors: Haili Sun, Yan Huang, Lansheng Han, Cai Fu, Hongle Liu, Xiang Long
  • for: 这篇论文旨在应用深度生成模型来探测 Cyber-physical systems (CPSs) 中的新型攻击,并无需靠扩展标签信息。
  • methods: 这篇论文提出了一个名为 MTS-DVGAN 的新型无监督双变分生成对抗模型,用于检测多变量时间序列数据中的异常情况。
  • results: compared with state-of-the-art methods, the proposed MTS-DVGAN is more stable and can achieve consistent performance improvement in detecting anomalies in CPSs.
    Abstract Deep generative models are promising in detecting novel cyber-physical attacks, mitigating the vulnerability of Cyber-physical systems (CPSs) without relying on labeled information. Nonetheless, these generative models face challenges in identifying attack behaviors that closely resemble normal data, or deviate from the normal data distribution but are in close proximity to the manifold of the normal cluster in latent space. To tackle this problem, this article proposes a novel unsupervised dual variational generative adversarial model named MTS-DVGAN, to perform anomaly detection in multivariate time series data for CPS security. The central concept is to enhance the model's discriminative capability by widening the distinction between reconstructed abnormal samples and their normal counterparts. Specifically, we propose an augmented module by imposing contrastive constraints on the reconstruction process to obtain a more compact embedding. Then, by exploiting the distribution property and modeling the normal patterns of multivariate time series, a variational autoencoder is introduced to force the generative adversarial network (GAN) to generate diverse samples. Furthermore, two augmented loss functions are designed to extract essential characteristics in a self-supervised manner through mutual guidance between the augmented samples and original samples. Finally, a specific feature center loss is introduced for the generator network to enhance its stability. Empirical experiments are conducted on three public datasets, namely SWAT, WADI and NSL_KDD. Compared with the state-of-the-art methods, the evaluation results show that the proposed MTS-DVGAN is more stable and can achieve consistent performance improvement.
    摘要 深度生成模型有望在不依赖标注信息的情况下检测新型的信息物理攻击,降低信息物理系统(CPS)的脆弱性。然而,这些生成模型难以识别与正常数据高度相似的攻击行为,或者虽偏离正常数据分布、却在隐空间中紧邻正常簇流形的攻击行为。为解决这一问题,本文提出了一种新的无监督双变分生成对抗模型MTS-DVGAN,用于对多变量时间序列数据进行异常检测,以保障CPS安全。其核心思想是通过拉大重构后的异常样本与对应正常样本之间的差距来增强模型的判别能力。具体而言,我们提出了一个增强模块,在重构过程中施加对比约束以获得更紧凑的嵌入;然后,利用多变量时间序列的分布特性并建模其正常模式,引入变分自编码器促使生成对抗网络(GAN)生成多样化的样本;此外,我们设计了两种增强损失函数,通过增强样本与原始样本之间的相互引导,以自监督的方式提取关键特征;最后,为生成器网络引入特征中心损失以提高其稳定性。我们在SWAT、WADI和NSL_KDD三个公开数据集上进行了实验。与最先进方法相比,评估结果表明所提出的MTS-DVGAN更加稳定,并能取得一致的性能提升。
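To give a concrete (if heavily simplified) sense of reconstruction-based anomaly scoring for multivariate time series, the sketch below flags windows that a plain autoencoder reconstructs poorly. The dual-GAN, contrastive constraints, and feature-center loss of MTS-DVGAN are deliberately omitted; the architecture and thresholding rule are assumptions.

```python
# Minimal reconstruction-error anomaly scoring for multivariate time-series windows.
import torch
import torch.nn as nn

window, n_feat = 30, 8
ae = nn.Sequential(
    nn.Flatten(),
    nn.Linear(window * n_feat, 64), nn.ReLU(),
    nn.Linear(64, window * n_feat),
)

def anomaly_score(x):                                 # x: (batch, window, n_feat)
    recon = ae(x).view_as(x)
    return ((x - recon) ** 2).mean(dim=(1, 2))        # per-window reconstruction error

x = torch.randn(16, window, n_feat)                   # normal-looking windows
scores = anomaly_score(x)
threshold = scores.mean() + 3 * scores.std()          # assumed thresholding rule
print((scores > threshold).sum().item(), "windows flagged")
```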

Contrastive Deep Nonnegative Matrix Factorization for Community Detection

  • paper_url: http://arxiv.org/abs/2311.02357
  • repo_url: None
  • paper_authors: Yuecheng Li, Jialong Chen, Chuan Chen, Lei Yang, Zibin Zheng
  • for: 本研究旨在提出一种新的社区检测算法,以解决现有基于非负矩阵分解(NMF)的方法的三大问题:1)它们直接将原始网络映射到社区成员空间,难以捕捉层次结构信息;2)它们通常只关注网络的拓扑结构,忽略节点属性;3)它们难以学习社区检测所需的全局结构信息。
  • methods: 我们提出了一种新的社区检测算法,名为对比深度非负矩阵分解(CDNMF)。我们首先加深NMF以增强其信息提取能力;然后受对比学习启发,将网络拓扑结构和节点属性构建为两个对比视图;此外,我们使用去偏负采样层,并在社区层面学习节点相似性,以提高模型对社区检测的适用性。
  • results: 我们在三个公共实验 graphs 上进行了实验,并证明了 CDNMF 模型在社区探测方面的优异性。与现有方法相比,CDNMF 模型在社区内部的节点相似性学习和全局结构信息捕捉方面具有优势。代码可以在 https://github.com/6lyc/CDNMF.git 中找到。
    Abstract Recently, nonnegative matrix factorization (NMF) has been widely adopted for community detection, because of its better interpretability. However, the existing NMF-based methods have the following three problems: 1) they directly transform the original network into community membership space, so it is difficult for them to capture the hierarchical information; 2) they often only pay attention to the topology of the network and ignore its node attributes; 3) it is hard for them to learn the global structure information necessary for community detection. Therefore, we propose a new community detection algorithm, named Contrastive Deep Nonnegative Matrix Factorization (CDNMF). Firstly, we deepen NMF to strengthen its capacity for information extraction. Subsequently, inspired by contrastive learning, our algorithm creatively constructs network topology and node attributes as two contrasting views. Furthermore, we utilize a debiased negative sampling layer and learn node similarity at the community level, thereby enhancing the suitability of our model for community detection. We conduct experiments on three public real graph datasets and the proposed model has achieved better results than state-of-the-art methods. Code available at https://github.com/6lyc/CDNMF.git.
    摘要 近年来,非负矩阵分解(NMF)因其更好的可解释性而被广泛用于社区检测。然而,现有的基于NMF的方法存在以下三个问题:1)它们直接将原始网络映射到社区成员空间,因此难以捕捉层次信息;2)它们通常只关注网络的拓扑结构,而忽略节点属性;3)它们难以学习社区检测所需的全局结构信息。因此,我们提出了一种新的社区检测算法,名为对比深度非负矩阵分解(CDNMF)。首先,我们加深NMF以增强其信息提取能力;随后,受对比学习启发,我们创新地将网络拓扑结构和节点属性构建为两个对比视图;此外,我们使用去偏负采样层,并在社区层面学习节点相似性,从而提高模型对社区检测的适用性。我们在三个公开的真实图数据集上进行了实验,所提出的模型取得了优于当前最先进方法的结果。代码见 https://github.com/6lyc/CDNMF.git 。
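For readers unfamiliar with NMF-based community detection, the toy sketch below shows only the underlying factorization step: community memberships are read off a nonnegative factorization of the adjacency matrix. CDNMF's deep (multi-layer) factorization, contrastive views over topology and attributes, and debiased negative sampling are not shown; the graph and community count are toy assumptions.

```python
# Baseline sketch: community assignment from NMF of an adjacency matrix.
import numpy as np
from sklearn.decomposition import NMF

A = np.array([[0, 1, 1, 0, 0, 0],
              [1, 0, 1, 0, 0, 0],
              [1, 1, 0, 1, 0, 0],
              [0, 0, 1, 0, 1, 1],
              [0, 0, 0, 1, 0, 1],
              [0, 0, 0, 1, 1, 0]], dtype=float)

model = NMF(n_components=2, init="nndsvda", max_iter=500)
H = model.fit_transform(A)          # node-by-community membership strengths
communities = H.argmax(axis=1)      # hard assignment per node
print(communities)                  # e.g., [0 0 0 1 1 1]
```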

Perturbation-based Active Learning for Question Answering

  • paper_url: http://arxiv.org/abs/2311.02345
  • repo_url: None
  • paper_authors: Fan Luo, Mihai Surdeanu
  • for: 建立一个问答模型,可以降低标注成本,通过使用活动学习(AL)训练策略。
  • methods: 使用活动学习的采样策略,选择最有用的无标示训练数据,以更新模型。
  • results: 提出了一种扰动基于采样策略,与常用的采样策略相比,更有效率。
    Abstract Building a question answering (QA) model with less annotation costs can be achieved by utilizing active learning (AL) training strategy. It selects the most informative unlabeled training data to update the model effectively. Acquisition functions for AL are used to determine how informative each training example is, such as uncertainty or diversity based sampling. In this work, we propose a perturbation-based active learning acquisition strategy and demonstrate it is more effective than existing commonly used strategies.
    摘要 使用主动学习(AL)训练策略可以在降低标注成本的同时构建问答(QA)模型。它选择最有信息量的未标注训练数据来更新模型,以达到更好的效果。AL中的获取函数用于衡量每个训练样本的信息量,例如基于不确定性或多样性的采样。在这项工作中,我们提出了一种基于扰动的主动学习获取策略,并证明它比现有常用策略更加有效。
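One simple way to realize a perturbation-based acquisition function, sketched below under stated assumptions, is to score an unlabeled example by how much the model's prediction changes when the input is slightly perturbed (here, dropping one random token); the exact perturbation and scoring used in the paper may differ, and predict_proba is a placeholder model.

```python
# Hedged sketch of perturbation-based acquisition scoring for active learning.
import random

def predict_proba(tokens):
    # Placeholder for the QA model's confidence; a deterministic pseudo-score
    # derived from the tokens so the example runs without a trained model.
    return random.Random(hash(tuple(tokens))).random()

def perturbation_score(tokens, n_perturb=5):
    """Average change in model confidence when one random token is dropped."""
    base = predict_proba(tokens)
    diffs = []
    for _ in range(n_perturb):
        drop = random.randrange(len(tokens))
        perturbed = tokens[:drop] + tokens[drop + 1:]
        diffs.append(abs(predict_proba(perturbed) - base))
    return sum(diffs) / len(diffs)

pool = [["what", "is", "the", "capital", "of", "france"],
        ["who", "wrote", "hamlet"]]
ranked = sorted(pool, key=perturbation_score, reverse=True)
print(ranked[0])   # the unlabeled example to annotate first
```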

You Only Forward Once: Prediction and Rationalization in A Single Forward Pass

  • paper_url: http://arxiv.org/abs/2311.02344
  • repo_url: None
  • paper_authors: Han Jiang, Junwen Duan, Zhe Qu, Jianxin Wang
  • for: 本研究旨在提高无监督逻辑抽象的精度和效果,使模型预测时可以快速提取有用的信息。
  • methods: 本研究使用了一种新的单阶段框架,即You Only Forward Once(YOFO)框架,其中使用了一个预训练的语言模型如BERT进行预测和分析。
  • results: 实验结果显示,YOFO模型可以比前一代RNP模型更加准确地预测和提取有用的逻辑抽象。对比于前一代方法,YOFO模型可以提高token级F1得分达18.4%。此外,研究还发现YOFO模型可以快速提取有用的逻辑抽象,并且可以在模型中移除不重要的token。
    Abstract Unsupervised rationale extraction aims to extract concise and contiguous text snippets to support model predictions without any annotated rationale. Previous studies have used a two-phase framework known as the Rationalizing Neural Prediction (RNP) framework, which follows a generate-then-predict paradigm. They assumed that the extracted explanation, called rationale, should be sufficient to predict the golden label. However, the assumption above deviates from the original definition and is too strict to perform well. Furthermore, these two-phase models suffer from the interlocking problem and spurious correlations. To solve the above problems, we propose a novel single-phase framework called You Only Forward Once (YOFO), derived from a relaxed version of rationale where rationales aim to support model predictions rather than make predictions. In our framework, A pre-trained language model like BERT is deployed to simultaneously perform prediction and rationalization with less impact from interlocking or spurious correlations. Directly choosing the important tokens in an unsupervised manner is intractable. Instead of directly choosing the important tokens, YOFO gradually removes unimportant tokens during forward propagation. Through experiments on the BeerAdvocate and Hotel Review datasets, we demonstrate that our model is able to extract rationales and make predictions more accurately compared to RNP-based models. We observe an improvement of up to 18.4\% in token-level F1 compared to previous state-of-the-art methods. We also conducted analyses and experiments to explore the extracted rationales and token decay strategies. The results show that YOFO can extract precise and important rationales while removing unimportant tokens in the middle part of the model.
    摘要 无监督理由抽取旨在提取简洁且连续的文本片段来支持模型预测,而无需任何标注的理由。此前的研究使用一种称为理性化神经预测(RNP)的两阶段框架,遵循先生成后预测的范式,并假设所提取的解释(即理由)应足以预测真实标签。然而,这一假设偏离了原始定义,且过于严格而难以取得良好效果。此外,这类两阶段模型还受到互锁问题和虚假相关的困扰。为解决上述问题,我们提出了一种新的单阶段框架You Only Forward Once(YOFO),它基于一种放宽的理由定义:理由旨在支持模型预测,而非直接做出预测。在该框架中,BERT等预训练语言模型同时完成预测和理性化,受互锁问题和虚假相关的影响更小。以无监督方式直接选择重要词元是不可行的,因此YOFO不直接选择重要词元,而是在前向传播过程中逐步移除不重要的词元。在BeerAdvocate和Hotel Review数据集上的实验表明,与基于RNP的模型相比,我们的模型能够更准确地提取理由并做出预测,词元级F1相比此前最优方法最多提升18.4%。我们还通过分析和实验考察了所提取的理由和词元衰减策略,结果表明YOFO能够在模型中部移除不重要词元的同时提取精确且重要的理由。
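To make the gradual token-removal idea more tangible, here is a small, hypothetical pruning step that could be applied after a transformer layer: a fixed fraction of the least "important" tokens is dropped and only the survivors continue through the network. The importance score and keep ratio below are assumptions, not the paper's exact decay strategy.

```python
# Schematic sketch of gradual token removal during a single forward pass.
import torch

def prune_tokens(hidden, scores, keep_ratio=0.8):
    """Keep the top `keep_ratio` fraction of tokens by importance score."""
    k = max(1, int(hidden.size(1) * keep_ratio))
    idx = scores.topk(k, dim=1).indices.sort(dim=1).values
    kept = torch.gather(hidden, 1, idx.unsqueeze(-1).expand(-1, -1, hidden.size(-1)))
    return kept, idx

hidden = torch.randn(2, 12, 32)            # (batch, tokens, dim) after some layer
scores = torch.rand(2, 12)                 # stand-in for per-token importance
pruned, kept_positions = prune_tokens(hidden, scores)
print(pruned.shape, kept_positions.shape)  # torch.Size([2, 9, 32]) torch.Size([2, 9])
```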

Stable Diffusion Reference Only: Image Prompt and Blueprint Jointly Guided Multi-Condition Diffusion Model for Secondary Painting

  • paper_url: http://arxiv.org/abs/2311.02343
  • repo_url: None
  • paper_authors: Hao Ai, Lu Sheng
  • for: 这篇论文旨在提高二次绘制的效率,特别是在漫画、动画等艺术创作领域。
  • methods: 这篇论文提出了一种新的自动化图像生成方法,仅使用两类条件图像对生成过程进行精确控制,从而无需ControlNet。
  • results: 实验结果显示,该新方法能够实现高效的二次上色,并生成高质量的图像。
    Abstract Stable Diffusion and ControlNet have achieved excellent results in the field of image generation and synthesis. However, due to the granularity and method of its control, the efficiency improvement is limited for professional artistic creations such as comics and animation production whose main work is secondary painting. In the current workflow, fixing characters and image styles often need lengthy text prompts, and even requires further training through TextualInversion, DreamBooth or other methods, which is very complicated and expensive for painters. Therefore, we present a new method in this paper, Stable Diffusion Reference Only, a images-to-image self-supervised model that uses only two types of conditional images for precise control generation to accelerate secondary painting. The first type of conditional image serves as an image prompt, supplying the necessary conceptual and color information for generation. The second type is blueprint image, which controls the visual structure of the generated image. It is natively embedded into the original UNet, eliminating the need for ControlNet. We released all the code for the module and pipeline, and trained a controllable character line art coloring model at https://github.com/aihao2000/stable-diffusion-reference-only, that achieved state-of-the-art results in this field. This verifies the effectiveness of the structure and greatly improves the production efficiency of animations, comics, and fanworks.
    摘要 Stable Diffusion和ControlNet在图像生成与合成领域取得了优异的成果。然而,由于其控制的粒度和方式,对于以二次上色为主要工作的漫画、动画等专业艺术创作而言,效率提升仍然有限。在当前的工作流程中,固定角色和图像风格往往需要冗长的文本提示,甚至需要通过Textual Inversion、DreamBooth等方法进行额外训练,这对画师来说非常复杂且昂贵。因此,本文提出了一种新方法Stable Diffusion Reference Only,这是一种图像到图像的自监督模型,仅使用两类条件图像进行精确的受控生成,以加速二次上色。第一类条件图像作为图像提示,为生成提供必要的概念和色彩信息;第二类是蓝图图像,用于控制生成图像的视觉结构,它被原生嵌入到原始UNet中,从而无需ControlNet。我们发布了该模块和流水线的全部代码,并在 https://github.com/aihao2000/stable-diffusion-reference-only 上训练了一个可控的角色线稿上色模型,在该领域取得了最先进的结果。这验证了该结构的有效性,并大幅提升了动画、漫画和同人作品的制作效率。

Potato Leaf Disease Classification using Deep Learning: A Convolutional Neural Network Approach

  • paper_url: http://arxiv.org/abs/2311.02338
  • repo_url: None
  • paper_authors: Utkarsh Yashwant Tambe, A. Shobanadevi, A. Shanthini, Hsiu-Chun Hsu
  • for: 本研究使用深度学习(Deep Learning)对马铃薯叶部病害进行分类。
  • methods: 提议的方法包括对叶图像数据进行预处理,使用深度学习模型训练,并对测试集进行评估。
  • results: 实验结果显示,深度学习模型的总体准确率达99.1%,能够高度准确地区分马铃薯叶部的两种病害(早疫病和晚疫病)以及健康叶片。这种方法可为马铃薯种植中的病害识别提供可靠且有效的解决方案,有助于保障粮食安全并减少农业经济损失。
    Abstract In this study, a Convolutional Neural Network (CNN) is used to classify potato leaf illnesses using Deep Learning. The suggested approach entails preprocessing the leaf image data, training a CNN model on that data, and assessing the model's success on a test set. The experimental findings show that the CNN model, with an overall accuracy of 99.1%, is highly accurate in identifying two kinds of potato leaf diseases, Early Blight and Late Blight, as well as healthy leaves. The suggested method may offer a trustworthy and effective remedy for identifying potato diseases, which is essential for maintaining food security and minimizing financial losses in agriculture. The model can accurately recognize the various disease types even when there are severe infections present. This work highlights the potential of deep learning methods for categorizing potato diseases, which can help with effective and automated disease management in potato farming.
    摘要 在这项研究中,我们使用卷积神经网络(CNN)结合深度学习对马铃薯叶部病害进行分类。所提出的方法包括对叶片图像数据进行预处理、在该数据上训练CNN模型,并在测试集上评估模型效果。实验结果表明,该CNN模型的总体准确率为99.1%,能够高度准确地识别马铃薯叶部的两种病害(早疫病和晚疫病)以及健康叶片,即使存在严重感染也能准确区分各类病害。所提出的方法可为马铃薯病害识别提供可靠且有效的解决方案,这对保障粮食安全和减少农业经济损失至关重要。这项工作展示了深度学习方法在马铃薯病害分类中的潜力,有助于实现马铃薯种植中高效、自动化的病害管理。
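The classification pipeline described above (preprocess, train a CNN, evaluate) can be illustrated with a minimal transfer-learning sketch for the three classes. The paper's exact architecture, preprocessing, and hyperparameters are not specified here; the ResNet-18 backbone, learning rate, and dummy tensors are illustrative assumptions.

```python
# Minimal transfer-learning sketch for three-class potato leaf classification
# (Early Blight / Late Blight / Healthy).
import torch
import torch.nn as nn
from torchvision import models

num_classes = 3
model = models.resnet18(weights=None)            # or pretrained weights if available
model.fc = nn.Linear(model.fc.in_features, num_classes)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# One illustrative training step on dummy data standing in for leaf images.
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, num_classes, (8,))
optimizer.zero_grad()
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
print(float(loss))
```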

STOW: Discrete-Frame Segmentation and Tracking of Unseen Objects for Warehouse Picking Robots

  • paper_url: http://arxiv.org/abs/2311.02337
  • repo_url: None
  • paper_authors: Yi Li, Muru Zhang, Markus Grotz, Kaichun Mo, Dieter Fox
  • for: 这篇论文主要关注在dynamic industrial robotic contexts和domestic robotic applications中,对于物品的重新排序、移除和部分遮蔽等操作,以及在长时间内实现物品追踪的任务。
  • methods: 本文提出了一个新的合成和真实世界数据集,以及一个基于transformer模组的联合分割和追踪方法,以解决这些具有挑战性的任务。
  • results: 本文的实验结果显示,该方法与最近的方法相比,有着优秀的性能。另外,请参考官方网站(\href{https://sites.google.com/view/stow-corl23}{website}) для更多的结果和视频。
    Abstract Segmentation and tracking of unseen object instances in discrete frames pose a significant challenge in dynamic industrial robotic contexts, such as distribution warehouses. Here, robots must handle object rearrangement, including shifting, removal, and partial occlusion by new items, and track these items after substantial temporal gaps. The task is further complicated when robots encounter objects not learned in their training sets, which requires the ability to segment and track previously unseen items. Considering that continuous observation is often inaccessible in such settings, our task involves working with a discrete set of frames separated by indefinite periods during which substantial changes to the scene may occur. This task also translates to domestic robotic applications, such as rearrangement of objects on a table. To address these demanding challenges, we introduce new synthetic and real-world datasets that replicate these industrial and household scenarios. We also propose a novel paradigm for joint segmentation and tracking in discrete frames along with a transformer module that facilitates efficient inter-frame communication. The experiments we conduct show that our approach significantly outperforms recent methods. For additional results and videos, please visit \href{https://sites.google.com/view/stow-corl23}{website}. Code and dataset will be released.
    摘要 在配送仓库等动态工业机器人场景中,对离散帧中未见过的物体实例进行分割和跟踪是一项重大挑战。机器人必须处理物体的重新排列,包括移动、移除以及被新放入物品部分遮挡,并在较长的时间间隔之后继续跟踪这些物品。当机器人遇到训练集中未出现过的物体时,任务会变得更加困难,这要求模型具备分割和跟踪此前未见物体的能力。考虑到此类场景往往无法进行连续观测,我们的任务需要处理由不确定时间间隔分隔的离散帧序列,其间场景可能发生显著变化。这一任务同样适用于家用机器人应用,例如桌面物品的重新摆放。为应对这些挑战,我们构建了复现上述工业和家庭场景的新合成与真实世界数据集,并提出了一种在离散帧中进行联合分割与跟踪的新范式,以及一个促进帧间高效信息交互的transformer模块。实验表明,我们的方法显著优于近期方法。更多结果和视频请访问 https://sites.google.com/view/stow-corl23 。代码和数据集将会发布。

Complex Organ Mask Guided Radiology Report Generation

  • paper_url: http://arxiv.org/abs/2311.02329
  • repo_url: https://github.com/GaryGuTC/COMG_model
  • paper_authors: Gu Tiancheng, Liu Dongnan, Li Zhiyuan, Cai Weidong
  • for: 提高放射学报告的准确性和细致程度,减轻传统放射学报告的工作负担。
  • methods: 提出复杂器官掩码引导(COMG)的报告生成模型,融入骨骼、肺、心脏、纵隔等多个器官的掩码,以提供更细致的医学信息并引导模型的注意力。
  • results: 在 IU-Xray 和 MIMIC 两个公共数据集上实验,COMG 比 SOTA 模型 KiUT 提高了11.4% 和 9.7% 的 BLEU@4 分数。
    Abstract The goal of automatic report generation is to generate a clinically accurate and coherent phrase from a single given X-ray image, which could alleviate the workload of traditional radiology reporting.However, in a real-world scenario, radiologists frequently face the challenge of producing extensive reports derived from numerous medical images, thereby medical report generation from multi-image perspective is needed.In this paper, we propose the Complex Organ Mask Guided (termed as COMG) report generation model, which incorporates masks from multiple organs (e.g., bones, lungs, heart, and mediastinum), to provide more detailed information and guide the model's attention to these crucial body regions. Specifically, we leverage prior knowledge of the disease corresponding to each organ in the fusion process to enhance the disease identification phase during the report generation process. Additionally, cosine similarity loss is introduced as target function to ensure the convergence of cross-modal consistency and facilitate model optimization.Experimental results on two public datasets show that COMG achieves a 11.4% and 9.7% improvement in terms of BLEU@4 scores over the SOTA model KiUT on IU-Xray and MIMIC, respectively.
    摘要 自动报告生成的目标是根据单张给定的X光图像生成临床上准确且连贯的语句,从而减轻传统放射学报告的工作负担。然而在实际场景中,放射科医生经常需要基于大量医学图像撰写详尽的报告,因此需要从多图像视角进行医学报告生成。本文提出了复杂器官掩码引导(COMG)的报告生成模型,将多个器官(如骨骼、肺、心脏和纵隔)的掩码融入模型,以提供更细致的信息并引导模型关注这些关键身体区域。具体而言,我们在融合过程中利用各器官对应疾病的先验知识,以增强报告生成过程中的疾病识别阶段;此外,引入余弦相似度损失作为目标函数,以保证跨模态一致性的收敛并促进模型优化。在两个公开数据集上的实验结果表明,COMG在IU-Xray和MIMIC上的BLEU@4分数分别比SOTA模型KiUT提高了11.4%和9.7%。
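The cosine-similarity objective mentioned in the abstract can be written in a few lines; the sketch below shows one common formulation that pulls paired image and text representations together for cross-modal consistency. Embedding dimensions and any weighting are assumptions, not the paper's exact loss.

```python
# Sketch of a cosine-similarity consistency objective between paired embeddings.
import torch
import torch.nn.functional as F

def cosine_consistency_loss(img_emb, txt_emb):
    # 1 - cosine similarity, averaged over the batch; 0 when perfectly aligned.
    return (1.0 - F.cosine_similarity(img_emb, txt_emb, dim=-1)).mean()

img_emb = torch.randn(4, 256)   # e.g., pooled visual features of an X-ray
txt_emb = torch.randn(4, 256)   # e.g., pooled report-text features
print(float(cosine_consistency_loss(img_emb, txt_emb)))
```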

FragXsiteDTI: Revealing Responsible Segments in Drug-Target Interaction with Transformer-Driven Interpretation

  • paper_url: http://arxiv.org/abs/2311.02326
  • repo_url: None
  • paper_authors: Ali Khodabandeh Yalabadi, Mehdi Yazdani-Jahromi, Niloofar Yousefi, Aida Tayebi, Sina Abdidizaji, Ozlem Ozmen Garibay
  • For: 预测药物目标交互(DTI)在药物发现中具有重要意义,但是现有模型具有解释性和性能优化等挑战。本文提出了一种基于转换器的新模型,即FragXsiteDTI,以解决DTI预测中的这些挑战。* Methods: FragXsiteDTI模型 simultaneous 利用药物分子块和蛋白质孔隙,并采用了转换器架构,包括跨注意力和自注意力。模型还具有可学习的隐藏数组,通过cross-attention和self-attention来改进模型的性能。* Results: 根据三个 benchmarking 数据集的计算结果,FragXsiteDTI 模型在预测DTI方面表现出了明显的优势,并且可以准确地表达药物和目标蛋白质之间的交互。此外,模型还可以提供可读性的解释,包括药物和目标蛋白质中关键的组分。
    Abstract Drug-Target Interaction (DTI) prediction is vital for drug discovery, yet challenges persist in achieving model interpretability and optimizing performance. We propose a novel transformer-based model, FragXsiteDTI, that aims to address these challenges in DTI prediction. Notably, FragXsiteDTI is the first DTI model to simultaneously leverage drug molecule fragments and protein pockets. Our information-rich representations for both proteins and drugs offer a detailed perspective on their interaction. Inspired by the Perceiver IO framework, our model features a learnable latent array, initially interacting with protein binding site embeddings using cross-attention and later refined through self-attention and used as a query to the drug fragments in the drug's cross-attention transformer block. This learnable query array serves as a mediator and enables seamless information translation, preserving critical nuances in drug-protein interactions. Our computational results on three benchmarking datasets demonstrate the superior predictive power of our model over several state-of-the-art models. We also show the interpretability of our model in terms of the critical components of both target proteins and drug molecules within drug-target pairs.
    摘要 药物-靶点相互作用(DTI)预测对药物发现至关重要,但在模型可解释性和性能优化方面仍存在挑战。我们提出了一种新的基于transformer的模型FragXsiteDTI,以应对DTI预测中的这些挑战。值得注意的是,FragXsiteDTI是首个同时利用药物分子片段和蛋白质口袋的DTI模型。我们为蛋白质和药物构建的富信息表示能够细致地刻画二者的相互作用。受Perceiver IO框架启发,我们的模型引入一个可学习的潜在数组:它首先通过交叉注意力与蛋白质结合位点嵌入交互,随后经自注意力精炼,并作为查询输入药物片段的交叉注意力transformer模块。这一可学习的查询数组充当中介,实现了信息的顺畅传递,保留了药物-蛋白质相互作用中的关键细节。在三个基准数据集上的计算结果表明,我们的模型的预测能力优于多个最先进模型。我们还展示了模型的可解释性,即能够识别药物-靶点对中靶蛋白和药物分子的关键组成部分。
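A Perceiver-IO-style reading of the latent-query idea in the abstract can be sketched as follows: a learnable latent array first cross-attends to protein-pocket embeddings, is refined by self-attention, and then queries drug-fragment embeddings. The dimensions, depths, and prediction head are illustrative assumptions, not the paper's architecture.

```python
# Sketch of latent-array cross-attention over protein pockets and drug fragments.
import torch
import torch.nn as nn

d, n_latent = 64, 8
latents = nn.Parameter(torch.randn(1, n_latent, d))
cross_protein = nn.MultiheadAttention(d, 4, batch_first=True)
self_attn = nn.MultiheadAttention(d, 4, batch_first=True)
cross_drug = nn.MultiheadAttention(d, 4, batch_first=True)
head = nn.Linear(d, 1)

protein_pockets = torch.randn(2, 20, d)   # per-pocket embeddings
drug_fragments = torch.randn(2, 12, d)    # per-fragment embeddings

q = latents.expand(2, -1, -1)
q, _ = cross_protein(q, protein_pockets, protein_pockets)  # latents attend to pockets
q, _ = self_attn(q, q, q)                                  # refine latents
q, _ = cross_drug(q, drug_fragments, drug_fragments)       # latents query fragments
logit = head(q.mean(dim=1))                                # binding interaction score
print(logit.shape)                                         # torch.Size([2, 1])
```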

Thermal Face Image Classification using Deep Learning Techniques

  • paper_url: http://arxiv.org/abs/2311.02314
  • repo_url: None
  • paper_authors: Prosenjit Chatterjee, ANK Zaman
  • for: 这篇论文应用于热像分类。
  • methods: 本文使用深度学习方法(具体为ResNet-50和VGGNet-19)对热图像进行特征提取,并应用卡尔曼滤波对热输入图像进行去噪。
  • results: 实验结果显示提案方法具有高精度和高效率。
    Abstract Thermal images have various applications in security, medical and industrial domains. This paper proposes a practical deep-learning approach for thermal image classification. Accurate and efficient classification of thermal images poses a significant challenge across various fields due to the complex image content and the scarcity of annotated datasets. This work uses a convolutional neural network (CNN) architecture, specifically ResNet-50 and VGGNet-19, to extract features from thermal images. This work also applied Kalman filter on thermal input images for image denoising. The experimental results demonstrate the effectiveness of the proposed approach in terms of accuracy and efficiency.
    摘要 热图像在安全、医疗和工业领域有多种应用。本文提出了一种实用的深度学习方法用于热图像分类。由于热图像内容复杂且各领域标注数据稀缺,准确而高效的热图像分类在诸多领域都是一个重大挑战。本工作使用卷积神经网络(CNN)架构,具体为ResNet-50和VGGNet-19,从热图像中提取特征,并对热输入图像应用卡尔曼滤波进行图像去噪。实验结果表明,所提出的方法在准确率和效率方面均具有良好的效果。

OSM vs HD Maps: Map Representations for Trajectory Prediction

  • paper_url: http://arxiv.org/abs/2311.02305
  • repo_url: None
  • paper_authors: Jing-Yan Liao, Parth Doshi, Zihan Zhang, David Paz, Henrik Christensen
  • for: 提出了一种使用 OpenStreetMap (OSM) 作为替代高清地图 (HD Maps) 的方法,以便在自动驾驶中长期预测动向。
  • methods: 该方法将OSM的应用扩展到更长的预测时域,并通过扩大感受野和融合路口先验,使基于OSM的长时程运动预测性能接近基于HD Map的模型。
  • results: 研究表明,该方法可以在不同的情景下提供更好的预测性能,并且可以在自动驾驶中广泛应用。
    Abstract While High Definition (HD) Maps have long been favored for their precise depictions of static road elements, their accessibility constraints and susceptibility to rapid environmental changes impede the widespread deployment of autonomous driving, especially in the motion forecasting task. In this context, we propose to leverage OpenStreetMap (OSM) as a promising alternative to HD Maps for long-term motion forecasting. The contributions of this work are threefold: firstly, we extend the application of OSM to long-horizon forecasting, doubling the forecasting horizon compared to previous studies. Secondly, through an expanded receptive field and the integration of intersection priors, our OSM-based approach exhibits competitive performance, narrowing the gap with HD Map-based models. Lastly, we conduct an exhaustive context-aware analysis, providing deeper insights in motion forecasting across diverse scenarios as well as conducting class-aware comparisons. This research not only advances long-term motion forecasting with coarse map representations but additionally offers a potential scalable solution within the domain of autonomous driving.
    摘要 高清地图(HD Map)因其对静态道路要素的精确刻画而长期受到青睐,但其获取受限且易受环境快速变化影响,阻碍了自动驾驶的大规模部署,在运动预测任务中尤为明显。在此背景下,我们提议利用OpenStreetMap(OSM)作为HD Map在长时程运动预测中的一种有前景的替代方案。本工作的贡献有三:其一,我们将OSM的应用扩展到长时程预测,预测时域比此前研究延长一倍;其二,通过扩大感受野并融合路口先验,我们基于OSM的方法表现出具有竞争力的性能,缩小了与基于HD Map模型之间的差距;其三,我们进行了详尽的情境感知分析,为多种场景下的运动预测提供了更深入的洞察,并开展了按类别的比较。这项研究不仅推进了基于粗粒度地图表示的长时程运动预测,也为自动驾驶领域提供了一种潜在的可扩展解决方案。

MFTCoder: Boosting Code LLMs with Multitask Fine-Tuning

  • paper_url: http://arxiv.org/abs/2311.02303
  • repo_url: None
  • paper_authors: Bingchang Liu, Chaoyu Chen, Cong Liao, Zi Gong, Huan Wang, Zhichao Lei, Ming Liang, Dajun Chen, Min Shen, Hailian Zhou, Hang Yu, Jianguo Li
  • for: 提高 CodeLLama 模型的编程能力,并且可以同时 fine-tune 多个任务。
  • methods: 使用多任务 fine-tuning 框架(MFTcoder),并结合多种损失函数来解决多任务学习中的常见挑战。
  • results: 比较传统 fine-tuning 方法和 mixed 任务 fine-tuning 方法,MFTcoder 能够达到更高的性能,并且可以快速训练和部署。
    Abstract Code LLMs have emerged as a specialized research field, with remarkable studies dedicated to enhancing model's coding capabilities through fine-tuning on pre-trained models. Previous fine-tuning approaches were typically tailored to specific downstream tasks or scenarios, which meant separate fine-tuning for each task, requiring extensive training resources and posing challenges in terms of deployment and maintenance. Furthermore, these approaches failed to leverage the inherent interconnectedness among different code-related tasks. To overcome these limitations, we present a multi-task fine-tuning framework, MFTcoder, that enables simultaneous and parallel fine-tuning on multiple tasks. By incorporating various loss functions, we effectively address common challenges in multi-task learning, such as data imbalance, varying difficulty levels, and inconsistent convergence speeds. Extensive experiments have conclusively demonstrated that our multi-task fine-tuning approach outperforms both individual fine-tuning on single tasks and fine-tuning on a mixed ensemble of tasks. Moreover, MFTcoder offers efficient training capabilities, including efficient data tokenization modes and PEFT fine-tuning, resulting in significantly improved speed compared to traditional fine-tuning methods. MFTcoder seamlessly integrates with several mainstream open-source LLMs, such as CodeLLama and Qwen. Leveraging the CodeLLama foundation, our MFTcoder fine-tuned model, \textsc{CodeFuse-CodeLLama-34B}, achieves an impressive pass@1 score of 74.4\% on the HumaneEval benchmark, surpassing GPT-4 performance (67\%, zero-shot). MFTCoder is open-sourced at \url{https://github.com/codefuse-ai/MFTCOder}
    摘要 代码大语言模型(Code LLM)已经成为一个专门的研究领域,许多研究致力于通过在预训练模型上微调来提升模型的编程能力。以往的微调方法通常针对特定的下游任务或场景进行定制,这意味着每个任务都需要单独微调,既需要大量训练资源,也给部署和维护带来挑战。此外,这些方法没有利用不同代码相关任务之间的内在关联。为克服这些限制,我们提出了多任务微调框架MFTCoder,它能够对多个任务同时并行地进行微调。通过引入多种损失函数,我们有效应对了多任务学习中常见的挑战,如数据不均衡、任务难度不一以及收敛速度不一致。大量实验充分证明,我们的多任务微调方法优于对单一任务分别微调以及对混合任务整体微调。此外,MFTCoder具备高效的训练能力,包括高效的数据分词模式和PEFT微调,其速度较传统微调方法显著提升。MFTCoder可与CodeLLama、Qwen等多个主流开源LLM无缝集成。基于CodeLLama,我们微调得到的模型CodeFuse-CodeLLama-34B在HumanEval基准上取得了74.4%的pass@1分数,超越了GPT-4的表现(67%,零样本)。MFTCoder已开源于 https://github.com/codefuse-ai/MFTCOder 。
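To illustrate the basic shape of a multitask fine-tuning step, the sketch below combines batches from several code tasks in one update and uses simple per-task loss weights as one way to counter data imbalance. MFTCoder's actual loss designs, tokenization modes, and PEFT setup are not reproduced; the tasks, weights, and stand-in model are assumptions.

```python
# Hedged sketch of a weighted multi-task fine-tuning step.
import torch
import torch.nn as nn

model = nn.Linear(128, 1000)               # stands in for a code LLM
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

task_weights = {"completion": 1.0, "comment_gen": 2.0, "bug_fix": 1.5}
batches = {t: (torch.randn(4, 128), torch.randint(0, 1000, (4,)))
           for t in task_weights}

optimizer.zero_grad()
total = 0.0
for task, (x, y) in batches.items():
    loss = criterion(model(x), y)
    total = total + task_weights[task] * loss   # per-task weighting against imbalance
total.backward()
optimizer.step()
print(float(total))
```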

Successive Model-Agnostic Meta-Learning for Few-Shot Fault Time Series Prognosis

  • paper_url: http://arxiv.org/abs/2311.02300
  • repo_url: None
  • paper_authors: Hai Su, Jiajun Hu, Songsen Yu
  • for: 解决少样本故障预测问题,提高预测精度和泛化能力。
  • methods: 引入新的"伪元任务"划分方案,将一段连续的时间序列视为一个元任务并分割为多个相继的短时间片段,以提取更全面的特征和关系,提高预测精度;同时引入差分算法以提高方法的稳定性。
  • results: 通过在多个缺陷预测和时间序列预测 dataset上进行广泛的实验,证明了我们的方法可以在少量数据下提高预测性能和泛化能力。
    Abstract Meta learning is a promising technique for solving few-shot fault prediction problems, which have attracted the attention of many researchers in recent years. Existing meta-learning methods for time series prediction, which predominantly rely on random and similarity matching-based task partitioning, face three major limitations: (1) feature exploitation inefficiency; (2) suboptimal task data allocation; and (3) limited robustness with small samples. To overcome these limitations, we introduce a novel 'pseudo meta-task' partitioning scheme that treats a continuous time period of a time series as a meta-task, composed of multiple successive short time periods. Employing continuous time series as pseudo meta-tasks allows our method to extract more comprehensive features and relationships from the data, resulting in more accurate predictions. Moreover, we introduce a differential algorithm to enhance the robustness of our method across different datasets. Through extensive experiments on several fault and time series prediction datasets, we demonstrate that our approach substantially enhances prediction performance and generalization capability under both few-shot and general conditions.
    摘要 元学习是解决少样本故障预测问题的一种有前景的技术,近年来吸引了众多研究者的关注。现有面向时间序列预测的元学习方法主要依赖随机划分和基于相似度匹配的任务划分,存在三大局限:(1)特征利用效率低;(2)任务数据分配欠佳;(3)小样本下鲁棒性有限。为克服这些局限,我们提出了一种新的"伪元任务"划分方案,将时间序列中一段连续的时间区间视为一个元任务,并将其划分为多个相继的短时间片段。以连续时间序列作为伪元任务,使我们的方法能够从数据中提取更全面的特征和关系,从而获得更准确的预测。此外,我们引入了一种差分算法,以增强方法在不同数据集上的鲁棒性。通过在多个故障预测和时间序列预测数据集上的大量实验,我们证明了该方法在少样本和一般条件下均能显著提升预测性能和泛化能力。
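The "pseudo meta-task" partitioning described above can be sketched very simply: one continuous stretch of a time series is treated as a meta-task and cut into successive short windows, from which support and query sets are drawn. The window length and support/query split below are assumptions.

```python
# Sketch of pseudo meta-task construction from a continuous time series.
import numpy as np

def make_pseudo_meta_task(series, window=50, support_ratio=0.6):
    windows = [series[i:i + window]
               for i in range(0, len(series) - window + 1, window)]
    n_support = int(len(windows) * support_ratio)
    return windows[:n_support], windows[n_support:]   # (support, query)

series = np.sin(np.linspace(0, 40, 1000)) + 0.1 * np.random.randn(1000)
support, query = make_pseudo_meta_task(series)
print(len(support), "support windows,", len(query), "query windows")
```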

A Survey of the Various Methodologies Towards making Artificial Intelligence More Explainable

  • paper_url: http://arxiv.org/abs/2311.02291
  • repo_url: None
  • paper_authors: Sopam Dasgupta
  • for: 这个论文的目的是提高机器决策过程中的解释性和可解释性,以便更好地理解决交的决策理由。
  • methods: 本论文使用了一种基于人工智能的方法,通过对模型的解释性和可解释性进行分析和评估,以便提高机器决策过程中的解释性和可解释性。
  • results: 本论文的研究结果表明,通过提高机器决策过程中的解释性和可解释性,可以更好地理解决交的决策理由,并且可以通过对模型的解释性和可解释性进行分析和评估,以便更好地提高机器决策过程中的解释性和可解释性。
    Abstract Machines are being increasingly used in decision-making processes, resulting in the realization that decisions need explanations. Unfortunately, an increasing number of these deployed models are of a 'black-box' nature where the reasoning behind the decisions is unknown. Hence, there is a need for clarity behind the reasoning of these decisions. As humans, we would want these decisions to be presented to us in an explainable manner. However, explanations alone are insufficient. They do not necessarily tell us how to achieve an outcome but merely tell us what achieves the given outcome. For this reason, my research focuses on explainability/interpretability and how it extends to counterfactual thinking.
    摘要 机器越来越多地参与决策过程,人们随之意识到决策需要解释。不幸的是,越来越多已部署的模型具有"黑箱"性质,其决策背后的推理过程不得而知,因此需要厘清这些决策的推理依据。作为人类,我们希望这些决策能够以可解释的方式呈现给我们。然而,仅有解释是不够的:解释未必告诉我们如何达成某一结果,而只是告诉我们什么导致了给定的结果。因此,我的研究聚焦于可解释性,以及它如何延伸到反事实思维。

Predicting Ground Reaction Force from Inertial Sensors

  • paper_url: http://arxiv.org/abs/2311.02287
  • repo_url: None
  • paper_authors: Bowen Song, Marco Paolieri, Harper E. Stewart, Leana Golubchik, Jill L. McNitt-Gray, Vishal Misra, Devavrat Shah
  • for: 这篇论文的目的是利用IMU数据预测地面反作用力(GRF),以便分析运动员的生物力学变量(如触地时间和加载率)。
  • methods: 这篇论文使用了三种轻量级的预测方法:k-Nearest Neighbors(KNN)回归、支持向量表示插值(SVD)回归和深度学习神经网络(LSTM)。
  • results: 研究结果表明,使用KNN回归和SVD插值可以与LSTM神经网络相比,具有相似或更高的准确率,并且训练时间更短,hyperparameter优化也更简单。尤其是当使用个人训练数据时,SER和KNN方法更加准确。此外,使用个人数据可以降低预测错误的大多数变量。
    Abstract The study of ground reaction forces (GRF) is used to characterize the mechanical loading experienced by individuals in movements such as running, which is clinically applicable to identify athletes at risk for stress-related injuries. Our aim in this paper is to determine if data collected with inertial measurement units (IMUs), that can be worn by athletes during outdoor runs, can be used to predict GRF with sufficient accuracy to allow the analysis of its derived biomechanical variables (e.g., contact time and loading rate). In this paper, we consider lightweight approaches in contrast to state-of-the-art prediction using LSTM neural networks. Specifically, we compare use of LSTMs to k-Nearest Neighbors (KNN) regression as well as propose a novel solution, SVD Embedding Regression (SER), using linear regression between singular value decomposition embeddings of IMUs data (input) and GRF data (output). We evaluate the accuracy of these techniques when using training data collected from different athletes, from the same athlete, or both, and we explore the use of acceleration and angular velocity data from sensors at different locations (sacrum and shanks). Our results illustrate that simple machine learning methods such as SER and KNN can be similarly accurate or more accurate than LSTM neural networks, with much faster training times and hyperparameter optimization; in particular, SER and KNN are more accurate when personal training data are available, and KNN comes with benefit of providing provenance of prediction. Notably, the use of personal data reduces prediction errors of all methods for most biomechanical variables.
    摘要 地面反作用力(GRF)研究用于刻画个体在跑步等运动中所承受的机械负荷,在临床上可用于识别存在应力性损伤风险的运动员。本文的目标是确定运动员户外跑步时佩戴的惯性测量单元(IMU)所采集的数据,能否以足够的精度预测GRF,从而分析由其衍生的生物力学变量(如触地时间和加载率)。与当前最先进的基于LSTM神经网络的预测方法相比,本文考虑轻量级方法:我们比较了LSTM与k近邻(KNN)回归,并提出了一种新方案——SVD嵌入回归(SER),即在IMU数据(输入)与GRF数据(输出)的奇异值分解嵌入之间进行线性回归。我们评估了这些技术在使用来自不同运动员、同一运动员或两者兼有的训练数据时的准确性,并探究了不同位置(骶骨和小腿)传感器的加速度和角速度数据的使用。结果表明,SER和KNN等简单的机器学习方法可以达到与LSTM神经网络相当甚至更高的准确率,且训练时间和超参数优化都快得多;特别是当可获得个人训练数据时,SER和KNN更加准确,而KNN还能提供预测的溯源依据。值得注意的是,使用个人数据可降低所有方法在大多数生物力学变量上的预测误差。
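A minimal reading of SVD Embedding Regression, sketched below on synthetic data, projects flattened IMU windows onto their top singular vectors and fits a linear regression from those embeddings to the GRF target. The number of components and the toy data are assumptions, not the paper's setup.

```python
# Hedged sketch of SVD-embedding + linear regression on synthetic IMU-like data.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 120))                               # 200 IMU windows, flattened features
y = X @ rng.normal(size=120) + 0.1 * rng.normal(size=200)     # synthetic GRF target

# SVD embedding: keep the top-k right singular vectors as a projection basis.
k = 10
U, S, Vt = np.linalg.svd(X, full_matrices=False)
Z = X @ Vt[:k].T                                              # low-dimensional embeddings

reg = LinearRegression().fit(Z, y)
print("train R^2:", round(reg.score(Z, y), 3))
```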

cs.CL - 2023-11-04

Can Chat GPT solve a Linguistics Exam?

  • paper_url: http://arxiv.org/abs/2311.02499
  • repo_url: None
  • paper_authors: Patricia Ronan, Gerold Schneider
  • for: 这个研究是用来测试 chatGPT4 是否能成功解决入门语言学考试的。
  • methods: 这个研究使用了 chatGPT4 语言模型,并将过去的考试题 fed 到它中进行测试。
  • results: 研究发现,chatGPT4 在解释复杂和嵌套任务方面非常成功,但在分析 morphemes 和 phrases 方面表现较差。在简单的情况下,它表现 suficiently well,但在缺失一对一对应的情况下,它的结果是混合的。现在,模型还不能处理视觉化任务,如语法树的分析或生成。通过更EXTENSIVE的预处理,将这些任务转换为文本数据,可以使模型也成功地解决这些任务。
    Abstract The present study asks if ChatGPT4, the version of ChatGPT which uses the language model GPT4, can successfully solve introductory linguistic exams. Previous exam questions of an Introduction to Linguistics course at a German university are used to test this. The exam questions were fed into ChatGPT4 with only minimal preprocessing. The results show that the language model is very successful in the interpretation even of complex and nested tasks. It proved surprisingly successful in the task of broad phonetic transcription, but performed less well in the analysis of morphemes and phrases. In simple cases it performs sufficiently well, but rarer cases, particularly with missing one-to-one correspondence, are currently treated with mixed results. The model is not yet able to deal with visualisations, such as the analysis or generation of syntax trees. More extensive preprocessing, which translates these tasks into text data, allow the model to also solve these tasks successfully.
    摘要 本研究探讨使用语言模型GPT4的ChatGPT4能否成功解答语言学入门考试。我们使用一所德国大学语言学导论课程的历年考试题对此进行测试,这些考题仅经过最少的预处理便输入ChatGPT4。结果显示,该语言模型即使面对复杂和嵌套的任务也能很好地完成解读。它在宽式音标转写任务上的表现出人意料地好,但在语素和短语分析方面表现较差。在简单情形下其表现足够好,但对于较罕见的情况,尤其是缺乏一一对应的情形,目前的结果好坏参半。该模型尚无法处理可视化任务,如句法树的分析或生成;通过更充分的预处理将这些任务转化为文本数据后,模型也能成功完成这些任务。

Citance-Contextualized Summarization of Scientific Papers

  • paper_url: http://arxiv.org/abs/2311.02408
  • repo_url: None
  • paper_authors: Shahbaz Syed, Ahmad Dawar Hakimi, Khalid Al-Khatib, Martin Potthast
  • for: 本研究旨在提供一种新的文本概要方法,可以根据给定的引用句(即“citance”)生成有用的概要。
  • methods: 该方法首先提取并模型了文献中的引用,然后根据引用的位置 retrieve 相关的段落,最后生成基于每个引用的概要。
  • results: 我们使用 $\textbf{Webis-Context-SciSumm-2023}$ 数据集进行评估,发现我们的方法可以生成高质量的概要,并且可以准确地捕捉到文献中的关键信息。
    Abstract Current approaches to automatic summarization of scientific papers generate informative summaries in the form of abstracts. However, abstracts are not intended to show the relationship between a paper and the references cited in it. We propose a new contextualized summarization approach that can generate an informative summary conditioned on a given sentence containing the citation of a reference (a so-called ``citance''). This summary outlines the content of the cited paper relevant to the citation location. Thus, our approach extracts and models the citances of a paper, retrieves relevant passages from cited papers, and generates abstractive summaries tailored to each citance. We evaluate our approach using $\textbf{Webis-Context-SciSumm-2023}$, a new dataset containing 540K~computer science papers and 4.6M~citances therein.
    摘要 现有的科学论文自动摘要方法以摘要(abstract)的形式生成信息性的总结,但摘要并不用于展示论文与其所引用文献之间的关系。我们提出一种新的引文上下文化摘要方法,能够以给定的包含某条引用的句子(即"citance")为条件生成信息性的总结,概述被引论文中与该引用位置相关的内容。因此,我们的方法提取并建模论文中的citance,从被引论文中检索相关段落,并为每条citance生成定制的摘要式总结。我们使用新数据集 Webis-Context-SciSumm-2023 进行评估,该数据集包含54万篇计算机科学论文及其中的460万条citance。

TreeSwap: Data Augmentation for Machine Translation via Dependency Subtree Swapping

  • paper_url: http://arxiv.org/abs/2311.02355
  • repo_url: https://github.com/attilanagy234/TreeSwap
  • paper_authors: Attila Nagy, Dorina Lakatos, Botond Barta, Judit Ács
  • for: 该论文主要用于提出一种新的数据扩充方法,用于提高神经机器翻译模型在具有有限训练数据的情况下的性能。
  • methods: 该方法基于 SentenceDependencyGraph,通过将源句子和目标句子中的对象和主语交换来生成新的句子。
  • results: 对4种语言对在限制资源 datasets 上进行了实验,结果显示,TreeSwap 方法可以在多个语言对的两个方向中提供了顺序的改进。
    Abstract Data augmentation methods for neural machine translation are particularly useful when limited amount of training data is available, which is often the case when dealing with low-resource languages. We introduce a novel augmentation method, which generates new sentences by swapping objects and subjects across bisentences. This is performed simultaneously based on the dependency parse trees of the source and target sentences. We name this method TreeSwap. Our results show that TreeSwap achieves consistent improvements over baseline models in 4 language pairs in both directions on resource-constrained datasets. We also explore domain-specific corpora, but find that our method does not make significant improvements on law, medical and IT data. We report the scores of similar augmentation methods and find that TreeSwap performs comparably. We also analyze the generated sentences qualitatively and find that the augmentation produces a correct translation in most cases. Our code is available on Github.
    摘要 当可用训练数据有限时(处理低资源语言时常常如此),面向神经机器翻译的数据增强方法尤为有用。我们提出了一种新的增强方法,通过在双语句对之间交换主语和宾语来生成新句子,该交换基于源句和目标句的依存句法树同步进行。我们将该方法命名为TreeSwap。结果表明,在资源受限的数据集上,TreeSwap在4个语言对的双向翻译中均取得了相对基线模型的一致提升。我们还在法律、医疗和IT等特定领域语料上进行了探索,但发现该方法在这些数据上没有带来显著提升。我们报告了类似增强方法的得分,发现TreeSwap的表现与之相当。我们还对生成的句子进行了定性分析,发现增强所得句子在大多数情况下能给出正确的翻译。代码已发布在GitHub上。
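As a toy illustration of the swapping idea, the sketch below exchanges the tokens carrying a chosen dependency label (here the object) between two annotated sentences. The real method swaps full dependency subtrees on both source and target sides of parallel bisentences; the hand-written annotations and single-token spans here are simplifying assumptions.

```python
# Toy sketch: exchange the object tokens of two dependency-annotated sentences.
def swap_label(sent_a, sent_b, label):
    """Exchange the tokens carrying `label` between the two annotated sentences."""
    a_tok = next(tok for tok, dep in sent_a if dep == label)
    b_tok = next(tok for tok, dep in sent_b if dep == label)
    new_a = [(b_tok if dep == label else tok, dep) for tok, dep in sent_a]
    new_b = [(a_tok if dep == label else tok, dep) for tok, dep in sent_b]
    return new_a, new_b

sent_a = [("the", "det"), ("dog", "nsubj"), ("chased", "root"),
          ("the", "det"), ("cat", "obj")]
sent_b = [("a", "det"), ("girl", "nsubj"), ("reads", "root"),
          ("a", "det"), ("book", "obj")]

new_a, new_b = swap_label(sent_a, sent_b, "obj")
print(" ".join(t for t, _ in new_a))   # the dog chased the book
print(" ".join(t for t, _ in new_b))   # a girl reads a cat
```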

Enhancing English Writing Proficiency in China’s Polytechnic Students An In-Depth Literature Review on the Application of the Input Hypothesis

  • paper_url: http://arxiv.org/abs/2311.02341
  • repo_url: None
  • paper_authors: Wei Zhou
  • for: 这个研究论文的目的是探讨如何使用输入假设(Stephen Krashen)来提高polytechnic学生的英语写作能力。
  • methods: 这个研究使用了实际观察和前期研究的数据,以检验输入假设对polytechnic学生的写作能力的影响。
  • results: 研究发现,通过提供可理解的输入,polytechnic学生的写作能力有所改善,这证明了输入假设的有效性。I hope that helps! Let me know if you have any other questions.
    Abstract Having good English writing skills is extremely important for students in polytechnic institutions. However, a lot of students in technical schools have difficulties in reaching high levels of skill. The Input Hypothesis, created by Stephen Krashen, suggests that people learn languages well when they receive information that's a little harder than what they already know but still understandable. This research paper wants to study how the Input Hypothesis can help polytechnic students improve their English writing skills. The study will include real-life observations and experiments from the previous research. We will look at data from polytechnic students who are receiving special writing instruction to see if the Input Hypothesis actually helps improve their writing skills. The paper can better inform polytechnic students, faculty members, and support staff and even members of the larger community about the attributions, the processes, and the possible outcomes of second language development for polytechnic students. Keywords: English writing skills, Polytechnic students, Input hypothesis, Comprehensible input
    摘要 良好的英语写作能力对高职院校(polytechnic)的学生极为重要。然而,许多职业技术院校的学生难以达到较高的写作水平。由Stephen Krashen提出的输入假设认为,当人们接收到略高于其现有水平但仍可理解的语言输入时,语言学习效果最佳。本研究论文旨在探讨输入假设如何帮助高职学生提升英语写作能力。研究将纳入以往研究中的真实观察和实验,考察接受专门写作教学的高职学生的数据,以检验输入假设是否确实有助于提升其写作能力。本文能够让高职学生、教师、辅助人员乃至更广泛的社区成员更好地了解高职学生第二语言发展的归因、过程及可能的结果。关键词:英语写作能力,高职学生,输入假设,可理解输入

Identifying Context-Dependent Translations for Evaluation Set Production

  • paper_url: http://arxiv.org/abs/2311.02321
  • repo_url: None
  • paper_authors: Rachel Wicks, Matt Post
  • for: 本研究的目的是解决Context-aware机器翻译的评估 метри克和测试集的缺失,以便更好地评估Context-aware机器翻译系统的性能。
  • methods: 本研究使用了现代化、扩展和通用的前一代Annotation pipeline,生成了CTXPRO工具,可以正确地翻译五种语言现象:性别、正式度和生物性 для代词、句子间隔融合、和不确定名词变化。
  • results: 研究使用了 seven 种语言对(EN到DE、ES、FR、IT、PL、PT和RU)和两个数据集(OpenSubtitles和WMT测试集),并验证了 CTXPRO 的性能,包括与前一代工作的重叠和分类一个Context-aware机器翻译系统和一个句子基于系统。
    Abstract A major impediment to the transition to context-aware machine translation is the absence of good evaluation metrics and test sets. Sentences that require context to be translated correctly are rare in test sets, reducing the utility of standard corpus-level metrics such as COMET or BLEU. On the other hand, datasets that annotate such sentences are also rare, small in scale, and available for only a few languages. To address this, we modernize, generalize, and extend previous annotation pipelines to produce CTXPRO, a tool that identifies subsets of parallel documents containing sentences that require context to correctly translate five phenomena: gender, formality, and animacy for pronouns, verb phrase ellipsis, and ambiguous noun inflections. The input to the pipeline is a set of hand-crafted, per-language, linguistically-informed rules that select contextual sentence pairs using coreference, part-of-speech, and morphological features provided by state-of-the-art tools. We apply this pipeline to seven languages pairs (EN into and out-of DE, ES, FR, IT, PL, PT, and RU) and two datasets (OpenSubtitles and WMT test sets), and validate its performance using both overlap with previous work and its ability to discriminate a contextual MT system from a sentence-based one. We release the CTXPRO pipeline and data as open source.
    摘要 向上下文感知机器翻译过渡的一大障碍,是缺乏良好的评估指标和测试集。测试集中需要借助上下文才能正确翻译的句子十分罕见,这降低了COMET或BLEU等标准语料级指标的效用;另一方面,对此类句子进行标注的数据集同样罕见、规模小,且仅覆盖少数语言。为此,我们对此前的标注流水线进行了现代化、泛化和扩展,构建了CTXPRO工具,用于识别平行文档中包含需要上下文才能正确翻译的句子的子集,涵盖五类现象:代词的性别、正式程度与有生性,动词短语省略,以及有歧义的名词屈折。流水线的输入是一组人工编写、按语言定制、基于语言学知识的规则,这些规则利用最先进工具提供的共指、词性和形态特征来选取上下文相关的句对。我们将该流水线应用于七个语言对(英语与德语、西班牙语、法语、意大利语、波兰语、葡萄牙语、俄语互译)和两个数据集(OpenSubtitles与WMT测试集),并通过与此前工作的重合度以及区分上下文感知MT系统与基于句子的MT系统的能力来验证其性能。我们将CTXPRO流水线和数据开源发布。

Narrowing the Gap between Zero- and Few-shot Machine Translation by Matching Styles

  • paper_url: http://arxiv.org/abs/2311.02310
  • repo_url: None
  • paper_authors: Weiting Tan, Haoran Xu, Lingfeng Shen, Shuyue Stella Li, Kenton Murray, Philipp Koehn, Benjamin Van Durme, Yunmo Chen
  • for: 本研究旨在解释在零shot和几shot示例下大语言模型在翻译中的表现差异,以及如何减少这个差异。
  • methods: 本研究使用了零shot和几shot示例来训练大语言模型,并对其进行了各种改进,如对目标句子风格的调整和不同的损失函数。
  • results: 研究发现,通过调整目标句子风格,可以大幅减少零shot和几shot示例之间的表现差异,并且可以提高翻译 metrics。此外,研究还探讨了不同的改进方法,以及它们对翻译 metrics 的影响。
    Abstract Large language models trained primarily in a monolingual setting have demonstrated their ability to generalize to machine translation using zero- and few-shot examples with in-context learning. However, even though zero-shot translations are relatively good, there remains a discernible gap comparing their performance with the few-shot setting. In this paper, we investigate the factors contributing to this gap and find that this gap can largely be closed (for about 70%) by matching the writing styles of the target corpus. Additionally, we explore potential approaches to enhance zero-shot baselines without the need for parallel demonstration examples, providing valuable insights into how these methods contribute to improving translation metrics.
    摘要 主要在单语环境中训练的大语言模型已展现出借助上下文学习,利用零样本和少样本示例泛化到机器翻译的能力。然而,尽管零样本翻译质量相对不错,其表现与少样本设置相比仍存在可观的差距。本文研究造成这一差距的因素,发现通过匹配目标语料的写作风格,可以在很大程度上(约70%)消除这一差距。此外,我们探讨了在无需平行示例的情况下增强零样本基线的潜在方法,为理解这些方法如何改善翻译指标提供了有价值的见解。

LLMs grasp morality in concept

  • paper_url: http://arxiv.org/abs/2311.02294
  • repo_url: None
  • paper_authors: Mark Pock, Andre Ye, Jared Moore
  • for: 本研究旨在探讨语言模型(LLM)是如何具备意义的,以及如何使得LLM具备这种意义。
  • methods: 本研究使用一种普适的意义理论来探讨LLM的意义,并用这种理论来解释LLM作为意义代理人的特性。
  • results: 研究发现,由于LLM已经具备了人类社会中的构造(如道德、性别和种族)的概念,因此在某些伦理框架下,目前流行的模型对适应方法有限制,甚至可能是反产生的。此外,未经适应的模型可能可以帮助我们更好地发展我们的道德和社会哲学。
    Abstract Work in AI ethics and fairness has made much progress in regulating LLMs to reflect certain values, such as fairness, truth, and diversity. However, it has taken the problem of how LLMs might 'mean' anything at all for granted. Without addressing this, it is not clear what imbuing LLMs with such values even means. In response, we provide a general theory of meaning that extends beyond humans. We use this theory to explicate the precise nature of LLMs as meaning-agents. We suggest that the LLM, by virtue of its position as a meaning-agent, already grasps the constructions of human society (e.g. morality, gender, and race) in concept. Consequently, under certain ethical frameworks, currently popular methods for model alignment are limited at best and counterproductive at worst. Moreover, unaligned models may help us better develop our moral and social philosophy.
    摘要 人工智能伦理与公平性方面的工作在规范大语言模型(LLM)以体现公平、真实和多样性等价值观方面取得了很大进展。然而,这些工作将"LLM究竟如何能够'意指'任何东西"这一问题视为理所当然。若不解决这一问题,就不清楚为LLM注入这些价值观究竟意味着什么。为此,我们提出了一种超越人类范畴的一般意义理论,并用它来阐明LLM作为意义主体的确切性质。我们认为,LLM凭借其作为意义主体的地位,已经在概念上把握了人类社会的各种建构(如道德、性别和种族)。因此,在某些伦理框架下,当前流行的模型对齐方法充其量是有限的,最坏情况下甚至适得其反。此外,未对齐的模型或许能帮助我们更好地发展道德与社会哲学。

cs.LG - 2023-11-04

QOCO: A QoE-Oriented Computation Offloading Algorithm based on Deep Reinforcement Learning for Mobile Edge Computing

  • paper_url: http://arxiv.org/abs/2311.02525
  • repo_url: None
  • paper_authors: Iman Rahmati, Hamed Shah-Mansouri, Ali Movaghar
  • for: 本研究的目的是提高移动边缘计算(MEC)系统中的计算任务卸载效率,以提供用户高质量的经验(QoE)。
  • methods: 本研究将问题建模为马尔可夫决策过程(MDP),以最大化每个用户的长期QoE,并提出了一种基于深度强化学习的 QoE 导向计算卸载算法(QOCO),使移动设备无需知晓其他设备的决策即可自行做出卸载决策。
  • results: 数值实验表明,QOCO 算法能够高效利用边缘节点的计算资源,可以多完成 14% 的任务,并将任务延迟和能耗分别降低 9% 和 6%,共同带来平均 QoE 至少 37% 的提升。
    Abstract In the realm of mobile edge computing (MEC), efficient computation task offloading plays a pivotal role in ensuring a seamless quality of experience (QoE) for users. Maintaining a high QoE is paramount in today's interconnected world, where users demand responsive and reliable services. This challenge stands as one of the most primary key factors contributing to handling dynamic and uncertain mobile environment. In this study, we delve into computation offloading in MEC systems, where strict task processing deadlines and energy constraints can adversely affect the system performance. We formulate the computation task offloading problem as a Markov decision process (MDP) to maximize the long-term QoE of each user individually. We propose a decentralized QoE-oriented computation offloading (QOCO) algorithm based on deep reinforcement learning (DRL) that empowers mobile devices to make their offloading decisions without requiring knowledge of decisions made by other devices. Through numerical studies, we evaluate the performance of QOCO. Simulation results validate that the QOCO algorithm efficiently exploits the computational resources of edge nodes. Consequently, it can complete 14% more tasks and reduce task delay and energy consumption by 9% and 6%, respectively. These together contribute to a significant improvement of at least 37% in average QoE compared to an existing algorithm.
    摘要 在移动边缘计算(MEC)领域,高效的计算任务卸载是保证用户获得流畅体验质量(QoE)的关键。在当今互联互通的世界中,用户要求服务具有高响应性和高可靠性,而应对动态且不确定的移动环境正是其中最主要的挑战之一。本研究探讨 MEC 系统中的计算任务卸载问题,其中严格的任务处理截止时间和能量约束可能对系统性能产生负面影响。我们将计算任务卸载问题建模为马尔可夫决策过程(MDP),以最大化每个用户的长期 QoE,并提出一种基于深度强化学习(DRL)的分布式 QoE 导向计算卸载算法(QOCO),使移动设备无需知晓其他设备的决策即可自行做出卸载决策。数值实验表明,QOCO 算法能够高效利用边缘节点的计算资源,可以多完成 14% 的任务,并将任务延迟和能耗分别降低 9% 和 6%,从而使平均 QoE 相比现有算法至少提升 37%。

Forward $χ^2$ Divergence Based Variational Importance Sampling

  • paper_url: http://arxiv.org/abs/2311.02516
  • repo_url: None
  • paper_authors: Chengrui Li, Yule Wang, Weihan Li, Anqi Wu
  • for: 提高 latent variable 模型的最大log-likelihood,并且解决 variational inference 在复杂 posterior distribution 时的限制。
  • methods: 提出了一种新的变分重要性采样(variational importance sampling, VIS)方法,直接估计并最大化 log-likelihood。VIS 通过最小化前向 $\chi^2$ 散度得到最优提议分布,从而改进 log-likelihood 的估计。
  • results: VIS 在多种流行的 latent variable 模型中表现出色,包括混合模型、变分自编码器和部分可观测的广义线性模型。结果表明,我们的方法在 log-likelihood 和模型参数估计方面均优于最先进的基线方法。
    Abstract Maximizing the log-likelihood is a crucial aspect of learning latent variable models, and variational inference (VI) stands as the commonly adopted method. However, VI can encounter challenges in achieving a high log-likelihood when dealing with complicated posterior distributions. In response to this limitation, we introduce a novel variational importance sampling (VIS) approach that directly estimates and maximizes the log-likelihood. VIS leverages the optimal proposal distribution, achieved by minimizing the forward $\chi^2$ divergence, to enhance log-likelihood estimation. We apply VIS to various popular latent variable models, including mixture models, variational auto-encoders, and partially observable generalized linear models. Results demonstrate that our approach consistently outperforms state-of-the-art baselines, both in terms of log-likelihood and model parameter estimation.
    摘要 最大化对数似然是学习隐变量模型的关键环节,变分推断(VI)是最常用的方法。然而,当后验分布较为复杂时,VI 在获得较高对数似然方面可能遇到困难。针对这一局限,我们提出了一种新的变分重要性采样(VIS)方法,直接估计并最大化对数似然。VIS 通过最小化前向 $\chi^2$ 散度得到最优提议分布,从而改进对数似然的估计。我们将 VIS 应用于多种流行的隐变量模型,包括混合模型、变分自编码器和部分可观测的广义线性模型。结果表明,我们的方法在对数似然和模型参数估计方面均稳定优于最先进的基线方法。
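
A minimal sketch of the core idea, not the authors' implementation: a Gaussian proposal q(z|x) is tuned by minimizing a forward $\chi^2$ surrogate (the log second moment of the importance weights), and the log-likelihood is then estimated by importance sampling. The toy model p(x, z) = N(z; 0, 1) N(x; z, 1) and all variable names are illustrative assumptions.

```python
import torch

torch.manual_seed(0)
x = torch.randn(64)                       # toy observations
mu = torch.zeros(64, requires_grad=True)  # proposal mean per observation
log_sigma = torch.zeros(64, requires_grad=True)
opt = torch.optim.Adam([mu, log_sigma], lr=1e-2)

def log_joint(x, z):
    # log p(z) + log p(x | z) for the toy model
    return -0.5 * z**2 - 0.5 * (x - z)**2 - torch.log(torch.tensor(2 * torch.pi))

for step in range(500):
    S = 32
    eps = torch.randn(S, 64)
    z = mu + log_sigma.exp() * eps                      # reparameterized samples from q(z | x)
    log_q = -0.5 * eps**2 - log_sigma - 0.5 * torch.log(torch.tensor(2 * torch.pi))
    log_w = log_joint(x, z) - log_q                     # unnormalized importance weights
    # forward chi^2 surrogate: log E_q[w^2] up to a constant (second moment of the weights)
    loss = torch.logsumexp(2 * log_w, dim=0).mean()
    opt.zero_grad(); loss.backward(); opt.step()

# importance-sampling estimate of log p(x): log (1/S) sum_s w_s
with torch.no_grad():
    S = 1000
    eps = torch.randn(S, 64)
    z = mu + log_sigma.exp() * eps
    log_q = -0.5 * eps**2 - log_sigma - 0.5 * torch.log(torch.tensor(2 * torch.pi))
    log_w = log_joint(x, z) - log_q
    log_px = (torch.logsumexp(log_w, dim=0) - torch.log(torch.tensor(float(S)))).mean()
print("estimated mean log-likelihood:", log_px.item())
```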

LocoMuJoCo: A Comprehensive Imitation Learning Benchmark for Locomotion

  • paper_url: http://arxiv.org/abs/2311.02496
  • repo_url: None
  • paper_authors: Firas Al-Hafez, Guoping Zhao, Jan Peters, Davide Tateo
  • For: The paper is written for researchers and developers working on imitation learning (IL) for locomotion in embodied agents.* Methods: The paper presents a novel benchmark for evaluating and comparing IL algorithms, which includes a diverse set of environments, comprehensive datasets, and handcrafted metrics.* Results: The paper provides a robust and easy-to-use benchmark for advancing research in IL for locomotion, and includes state-of-the-art baseline algorithms for evaluation.Here’s the information in Simplified Chinese text:* For: 本文是为适用于身体机器人的启发学习(IL)步行控制研究者和开发者而写的。* Methods: 本文提出了一个新的评价和比较IL算法的benchmark,包括了多种环境,如四足、二足和人体模型,每个环境都有完整的数据集,如真实噪音捕捉数据、专家数据和优化数据,以及多种部分可见任务来训练代理。* Results: 本文提供了一个可靠且易用的benchmark,可以帮助推进IL控制领域的研究,并包含了现有的基线算法以便快速评价。
    Abstract Imitation Learning (IL) holds great promise for enabling agile locomotion in embodied agents. However, many existing locomotion benchmarks primarily focus on simplified toy tasks, often failing to capture the complexity of real-world scenarios and steering research toward unrealistic domains. To advance research in IL for locomotion, we present a novel benchmark designed to facilitate rigorous evaluation and comparison of IL algorithms. This benchmark encompasses a diverse set of environments, including quadrupeds, bipeds, and musculoskeletal human models, each accompanied by comprehensive datasets, such as real noisy motion capture data, ground truth expert data, and ground truth sub-optimal data, enabling evaluation across a spectrum of difficulty levels. To increase the robustness of learned agents, we provide an easy interface for dynamics randomization and offer a wide range of partially observable tasks to train agents across different embodiments. Finally, we provide handcrafted metrics for each task and ship our benchmark with state-of-the-art baseline algorithms to ease evaluation and enable fast benchmarking.
    摘要 模仿学习(IL)在使具身智能体实现敏捷运动方面前景广阔。然而,许多现有的运动基准主要集中在简化的玩具任务上,往往无法捕捉真实场景的复杂性,并把研究引向不切实际的领域。为推动 IL 在运动控制方面的研究,我们提出了一个新的基准,用于对 IL 算法进行严格的评价和比较。该基准涵盖四足、双足和肌肉骨骼人体模型等多种环境,每个环境都配有完整的数据集,如真实含噪的动作捕捉数据、真值专家数据和真值次优数据,可在不同难度水平上进行评价。为提高所学智能体的鲁棒性,我们提供了易用的动力学随机化接口,以及多种部分可观测任务,用于在不同形态的智能体上进行训练。最后,我们为每个任务提供了人工设计的评价指标,并随基准一起发布最先进的基线算法,以便快速评测。

Uncertainty Quantification in Multivariable Regression for Material Property Prediction with Bayesian Neural Networks

  • paper_url: http://arxiv.org/abs/2311.02495
  • repo_url: None
  • paper_authors: Longze li, Jiang Chang, Aleksandar Vakanski, Min Xian
    for:This paper is written for researchers and practitioners in the field of material science and machine learning, specifically those interested in uncertainty quantification (UQ) for predicting material properties.methods:The paper proposes an approach for UQ within physics-informed Bayesian Neural Networks (BNNs), which integrates knowledge from governing laws in material modeling to guide the models toward physically consistent predictions. The approach uses Markov Chain Monte Carlo (MCMC) approximation of the posterior distribution of network parameters to produce accurate point and uncertainty estimates.results:The paper presents case studies for predicting the creep rupture life of steel alloys using the proposed approach. Experimental validation with three datasets of collected measurements from creep tests demonstrates the ability of BNNs to produce accurate point and uncertainty estimates that are competitive or exceed the performance of the conventional method of Gaussian Process Regression. Additionally, the paper evaluates the suitability of BNNs for UQ in an active learning application and reports competitive performance.
    Abstract With the increased use of data-driven approaches and machine learning-based methods in material science, the importance of reliable uncertainty quantification (UQ) of the predicted variables for informed decision-making cannot be overstated. UQ in material property prediction poses unique challenges, including the multi-scale and multi-physics nature of advanced materials, intricate interactions between numerous factors, limited availability of large curated datasets for model training, etc. Recently, Bayesian Neural Networks (BNNs) have emerged as a promising approach for UQ, offering a probabilistic framework for capturing uncertainties within neural networks. In this work, we introduce an approach for UQ within physics-informed BNNs, which integrates knowledge from governing laws in material modeling to guide the models toward physically consistent predictions. To evaluate the effectiveness of this approach, we present case studies for predicting the creep rupture life of steel alloys. Experimental validation with three datasets of collected measurements from creep tests demonstrates the ability of BNNs to produce accurate point and uncertainty estimates that are competitive or exceed the performance of the conventional method of Gaussian Process Regression. Similarly, we evaluated the suitability of BNNs for UQ in an active learning application and reported competitive performance. The most promising framework for creep life prediction is BNNs based on Markov Chain Monte Carlo approximation of the posterior distribution of network parameters, as it provided more reliable results in comparison to BNNs based on variational inference approximation or related NNs with probabilistic outputs. The codes are available at: https://github.com/avakanski/Creep-uncertainty-quantification.
    摘要 随着数据驱动方法和机器学习技术在材料科学中的广泛应用,对预测变量进行可靠的不确定性量化(UQ)以支持决策的重要性不言而喻。在材料性能预测中,UQ 面临一系列独特挑战,包括先进材料的多尺度、多物理特性,众多因素之间的复杂相互作用,以及可用于模型训练的大规模精选数据集有限等。近年来,贝叶斯神经网络(BNN)作为一种有前景的 UQ 方法出现,为捕捉神经网络中的不确定性提供了概率框架。在这项工作中,我们提出了一种在物理信息 BNN 中进行 UQ 的方法,将材料建模中的物理规律知识融入模型,引导其做出符合物理的预测。为评估该方法的有效性,我们以钢合金蠕变断裂寿命预测为案例进行研究。在三组蠕变试验实测数据集上的实验验证表明,BNN 能够给出准确的点估计和不确定性估计,性能与传统的高斯过程回归方法相当或更优。我们还评估了 BNN 在主动学习应用中的适用性,并报告了有竞争力的表现。最有前景的蠕变寿命预测框架是基于马尔可夫链蒙特卡洛近似网络参数后验分布的 BNN,与基于变分推断近似的 BNN 或具有概率输出的相关神经网络相比,它给出了更可靠的结果。代码见:https://github.com/avakanski/Creep-uncertainty-quantification。
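
The following toy sketch illustrates the MCMC flavor of BNN uncertainty quantification: a random-walk Metropolis sampler over the weights of a tiny regression network yields a predictive mean and standard deviation. The model, priors, and all names are illustrative assumptions, not the paper's physics-informed setup or code.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.linspace(-3, 3, 40)[:, None]
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(40)

H = 8  # hidden units of the tiny network
def unpack(theta):
    W1 = theta[:H].reshape(1, H); b1 = theta[H:2*H]
    W2 = theta[2*H:3*H].reshape(H, 1); b2 = theta[3*H]
    return W1, b1, W2, b2

def predict(theta, X):
    W1, b1, W2, b2 = unpack(theta)
    return np.tanh(X @ W1 + b1) @ W2 + b2

def log_post(theta, noise_std=0.1, prior_std=1.0):
    resid = y - predict(theta, X)[:, 0]
    log_lik = -0.5 * np.sum(resid**2) / noise_std**2     # Gaussian likelihood
    log_prior = -0.5 * np.sum(theta**2) / prior_std**2   # Gaussian prior on weights
    return log_lik + log_prior

dim = 3 * H + 1
theta = rng.standard_normal(dim) * 0.1
samples, lp = [], log_post(theta)
for it in range(20000):                        # random-walk Metropolis over the weights
    prop = theta + 0.02 * rng.standard_normal(dim)
    lp_prop = log_post(prop)
    if np.log(rng.random()) < lp_prop - lp:
        theta, lp = prop, lp_prop
    if it > 5000 and it % 20 == 0:             # burn-in and thinning
        samples.append(theta.copy())

X_test = np.linspace(-4, 4, 100)[:, None]
preds = np.stack([predict(s, X_test)[:, 0] for s in samples])
mean, std = preds.mean(0), preds.std(0)        # point estimate and posterior uncertainty
print(mean[:3], std[:3])
```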

Individualized Policy Evaluation and Learning under Clustered Network Interference

  • paper_url: http://arxiv.org/abs/2311.02467
  • repo_url: None
  • paper_authors: Yi Zhang, Kosuke Imai
  • for: 在政策评估与学习中,忽略干扰可能导致评估结果出现偏差、所学策略失效。本文考虑在聚类网络(或部分)干扰下评估和学习最优个体化治疗规则(ITR)的问题。
  • methods: 本文提出一种能够考虑溢出效应的 ITR 经验性能估计器,并证明它比不对溢出效应做任何假设的标准逆概率加权估计器要高效得多。我们还推导了所学 ITR 的有限样本遗憾界,表明使用该高效估计器可以提升所学策略的性能。
  • results: 我们通过模拟和实证研究展示了所提方法的优势,结果表明该方法能够更好地评估和学习 ITR,并避免干扰带来的偏差。
    Abstract While there now exists a large literature on policy evaluation and learning, much of prior work assumes that the treatment assignment of one unit does not affect the outcome of another unit. Unfortunately, ignoring interference may lead to biased policy evaluation and yield ineffective learned policies. For example, treating influential individuals who have many friends can generate positive spillover effects, thereby improving the overall performance of an individualized treatment rule (ITR). We consider the problem of evaluating and learning an optimal ITR under clustered network (or partial) interference where clusters of units are sampled from a population and units may influence one another within each cluster. Under this model, we propose an estimator that can be used to evaluate the empirical performance of an ITR. We show that this estimator is substantially more efficient than the standard inverse probability weighting estimator, which does not impose any assumption about spillover effects. We derive the finite-sample regret bound for a learned ITR, showing that the use of our efficient evaluation estimator leads to the improved performance of learned policies. Finally, we conduct simulation and empirical studies to illustrate the advantages of the proposed methodology.
    摘要 尽管已有大量关于政策评估与学习的文献,但多数先前工作假设一个单位的处理分配不会影响其他单位的结果。遗憾的是,忽略干扰可能导致政策评估出现偏差,并使所学策略失效。例如,对拥有众多朋友、具有影响力的个体进行干预可能产生正向溢出效应,从而提升个体化治疗规则(ITR)的整体表现。我们研究在聚类网络(或部分)干扰下评估和学习最优 ITR 的问题,其中单位以簇为单位从总体中抽样,且同一簇内的单位可能相互影响。在该模型下,我们提出一种可用于评估 ITR 经验性能的估计器,并证明它比不对溢出效应做任何假设的标准逆概率加权估计器要高效得多。我们还推导了所学 ITR 的有限样本遗憾界,表明使用该高效估计器可以提升所学策略的性能。最后,我们通过模拟和实证研究展示了所提方法的优势。
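
For context, the baseline the paper improves upon is the standard inverse probability weighting (IPW) estimator of a rule's value, which assumes no interference. A hypothetical sketch with illustrative names and toy data:

```python
import numpy as np

def ipw_policy_value(Y, A, X, propensity, itr):
    """IPW estimate of E[Y(pi)] for a binary rule itr(X) -> {0, 1}.

    Y: outcomes (n,); A: observed treatments (n,); propensity: P(A=1 | X) (n,).
    """
    follows_rule = (A == itr(X)).astype(float)             # observed arm agrees with the rule
    p_observed = np.where(A == 1, propensity, 1.0 - propensity)
    return np.mean(follows_rule * Y / p_observed)

# toy usage
rng = np.random.default_rng(1)
n = 5000
X = rng.normal(size=n)
propensity = 1.0 / (1.0 + np.exp(-0.5 * X))
A = rng.binomial(1, propensity)
Y = 1.0 * A * (X > 0) + rng.normal(scale=0.5, size=n)      # treatment only helps when X > 0
rule = lambda x: (x > 0).astype(int)
print("estimated value of the rule:", ipw_policy_value(Y, A, X, propensity, rule))
```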

Attention-based Multi-instance Mixed Models

  • paper_url: http://arxiv.org/abs/2311.02455
  • repo_url: None
  • paper_authors: Jan P. Engelmann, Alessandro Palma, Jakub M. Tomczak, Fabian J Theis, Francesco Paolo Casale
  • for: 该论文旨在从单细胞数据预测患者特征,揭示与健康和疾病相关的细胞状态。
  • methods: 该论文提出了一种整合广义线性混合模型(GLMM)和多实例学习(MIL)的框架,称为 GMIL,在保留线性模型优点的同时对细胞状态的异质性进行建模。
  • results: 实验结果表明,GMIL 在单细胞数据集上优于现有的 MIL 模型,能够发现新的关联并在不同领域阐明生物学机制,同时提高计算效率。
    Abstract Predicting patient features from single-cell data can unveil cellular states implicated in health and disease. Linear models and average cell type expressions are typically favored for this task for their efficiency and robustness, but they overlook the rich cell heterogeneity inherent in single-cell data. To address this gap, we introduce GMIL, a framework integrating Generalized Linear Mixed Models (GLMM) and Multiple Instance Learning (MIL), upholding the advantages of linear models while modeling cell-state heterogeneity. By leveraging predefined cell embeddings, GMIL enhances computational efficiency and aligns with recent advancements in single-cell representation learning. Our empirical results reveal that GMIL outperforms existing MIL models in single-cell datasets, uncovering new associations and elucidating biological mechanisms across different domains.
    摘要 从单细胞数据预测患者特征,可以揭示与健康和疾病相关的细胞状态。线性模型和细胞类型平均表达通常因其高效和稳健而被用于这项任务,但它们忽略了单细胞数据中固有的丰富细胞异质性。为弥补这一差距,我们提出了 GMIL,一个整合广义线性混合模型(GLMM)与多实例学习(MIL)的框架,在保留线性模型优点的同时对细胞状态的异质性进行建模。通过利用预先定义的细胞嵌入,GMIL 提高了计算效率,并与单细胞表示学习的最新进展相衔接。实验结果表明,GMIL 在单细胞数据集上优于现有的 MIL 模型,能够在不同领域发现新的关联并阐明生物学机制。
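
A generic attention-based multiple-instance pooling layer, sketched here to illustrate the MIL side of such frameworks; the GMIL-specific mixed-model components are not reproduced, and the module and dimension names are assumptions.

```python
import torch
import torch.nn as nn

class AttentionMILPooling(nn.Module):
    """Aggregate per-cell embeddings into one patient-level embedding via learned attention."""
    def __init__(self, in_dim: int, hidden_dim: int = 64):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.Tanh(),
                                   nn.Linear(hidden_dim, 1))

    def forward(self, cells: torch.Tensor) -> torch.Tensor:
        # cells: (num_cells, in_dim) for one patient (one "bag" of instances)
        attn = torch.softmax(self.score(cells), dim=0)   # (num_cells, 1), sums to 1
        return (attn * cells).sum(dim=0)                 # (in_dim,) bag representation

# toy usage: 300 cells with 16-dimensional embeddings for one patient
bag = torch.randn(300, 16)
pool = AttentionMILPooling(16)
patient_embedding = pool(bag)
print(patient_embedding.shape)  # torch.Size([16])
```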

Online Long-run Constrained Optimization

  • paper_url: http://arxiv.org/abs/2311.02426
  • repo_url: None
  • paper_authors: Shijie Pan, Wenjie Huang
  • for: 解决一般的长期受限优化问题,其目标与约束不一定是凸的。
  • methods: 提出了一种 Follow-the-Perturbed-Leader 类型的在线算法,分别在原始方向和对偶方向上加入随机线性扰动与强凹扰动,并以全局极小极大点作为解。
  • results: 基于两种特定的期望静态累积遗憾定义,推导出 $O(T^{8/9})$ 的次线性遗憾界,并将算法应用于一个长期(风险)约束的河流污染源识别问题,验证了理论结果,且相比现有方法表现更优。
    Abstract In this paper, a novel Follow-the-Perturbed-Leader type algorithm is proposed and analyzed for solving general long-term constrained optimization problems in online manner, where the objective and constraints are not necessarily convex. In each period, random linear perturbation and strongly concave perturbation are incorporated in primal and dual directions, respectively, to the offline oracle, and a global minimax point is searched as solution. Based on two particular definitions of expected static cumulative regret, we derive the first sublinear $O(T^{8/9})$ regret complexity for this class of problems. The proposed algorithm is applied to tackle a long-term (risk) constrained river pollutant source identification problem, demonstrating the validity of the theoretical results and exhibiting superior performance compared to existing method.
    摘要 本文提出并分析了一种新的 Follow-the-Perturbed-Leader 类型算法,用于在线求解一般的长期受限优化问题,其目标与约束不一定是凸的。在每个时期,算法分别在原始方向和对偶方向上向离线预言机加入随机线性扰动与强凹扰动,并搜索全局极小极大点作为解。基于两种特定的期望静态累积遗憾定义,我们推导出该类问题的首个次线性 $O(T^{8/9})$ 遗憾复杂度。所提算法被用于求解一个长期(风险)约束的河流污染源识别问题,验证了理论结果的有效性,并相比现有方法表现更优。
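
A toy sketch of the basic Follow-the-Perturbed-Leader idea over a finite action set with full-information feedback; the paper's algorithm additionally perturbs the dual direction and handles long-term constraints, which this illustration does not attempt. All names and reward distributions are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_actions, T, eta = 5, 1000, 5.0
true_means = rng.uniform(0, 1, n_actions)
cum_reward = np.zeros(n_actions)

total = 0.0
for t in range(T):
    perturbation = rng.exponential(scale=eta, size=n_actions)  # random linear perturbation
    action = int(np.argmax(cum_reward + perturbation))         # follow the perturbed leader
    rewards = rng.binomial(1, true_means)                      # full-information feedback this round
    total += rewards[action]
    cum_reward += rewards                                      # leader tracks all past rewards

print("average reward:", total / T, "best arm mean:", true_means.max())
```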

Payoff-based learning with matrix multiplicative weights in quantum games

  • paper_url: http://arxiv.org/abs/2311.02423
  • repo_url: None
  • paper_authors: Kyriakos Lotidis, Panayotis Mertikopoulos, Nicholas Bambos, Jose Blanchet
  • For: The paper studies the problem of learning in quantum games with scalar, payoff-based feedback, and develops new methods that require minimal information from the players.* Methods: The paper introduces a suite of minimal-information matrix multiplicative weights (3MW) methods tailored to different information frameworks, and uses ideas from bandit convex optimization to design a zeroth-order gradient sampler adapted to the semidefinite geometry of the problem.* Results: The paper shows that the 3MW method with deterministic payoff feedback retains the $\mathcal{O}(1/\sqrt{T})$ convergence rate of the vanilla MMW algorithm, and provides a 3MW method that only requires players to observe a random realization of their payoff observable and converges to equilibrium at an $\mathcal{O}(T^{-1/4})$ rate. Additionally, the paper shows that a regularized variant of the proposed 3MW method guarantees local convergence with high probability to all equilibria that satisfy a certain first-order stability condition.
    Abstract In this paper, we study the problem of learning in quantum games - and other classes of semidefinite games - with scalar, payoff-based feedback. For concreteness, we focus on the widely used matrix multiplicative weights (MMW) algorithm and, instead of requiring players to have full knowledge of the game (and/or each other's chosen states), we introduce a suite of minimal-information matrix multiplicative weights (3MW) methods tailored to different information frameworks. The main difficulty to attaining convergence in this setting is that, in contrast to classical finite games, quantum games have an infinite continuum of pure states (the quantum equivalent of pure strategies), so standard importance-weighting techniques for estimating payoff vectors cannot be employed. Instead, we borrow ideas from bandit convex optimization and we design a zeroth-order gradient sampler adapted to the semidefinite geometry of the problem at hand. As a first result, we show that the 3MW method with deterministic payoff feedback retains the $\mathcal{O}(1/\sqrt{T})$ convergence rate of the vanilla, full information MMW algorithm in quantum min-max games, even though the players only observe a single scalar. Subsequently, we relax the algorithm's information requirements even further and we provide a 3MW method that only requires players to observe a random realization of their payoff observable, and converges to equilibrium at an $\mathcal{O}(T^{-1/4})$ rate. Finally, going beyond zero-sum games, we show that a regularized variant of the proposed 3MW method guarantees local convergence with high probability to all equilibria that satisfy a certain first-order stability condition.
    摘要 本文研究在量子博弈以及其他类型的半定博弈中,基于标量收益反馈进行学习的问题。具体而言,我们聚焦于广泛使用的矩阵乘性权重(MMW)算法,但不要求玩家掌握博弈的完整信息(或对方所选的状态),而是针对不同的信息框架提出了一系列最小信息矩阵乘性权重(3MW)方法。在这一设定下达到收敛的主要困难在于:与经典有限博弈不同,量子博弈拥有无穷连续的纯状态(量子版的纯策略),因此无法使用标准的重要性加权技术来估计收益向量。为此,我们借鉴赌博机凸优化的思想,设计了一种适应该问题半定几何结构的零阶梯度采样器。作为第一个结果,我们证明在量子极小极大博弈中,即使玩家只观察到一个标量,采用确定性收益反馈的 3MW 方法仍保持原始全信息 MMW 算法的 $\mathcal{O}(1/\sqrt{T})$ 收敛速率。随后,我们进一步放宽算法的信息需求,给出一种只需玩家观察其收益可观测量的一次随机实现的 3MW 方法,它以 $\mathcal{O}(T^{-1/4})$ 的速率收敛到均衡。最后,在零和博弈之外,我们证明所提 3MW 方法的正则化变体能以高概率局部收敛到所有满足某种一阶稳定性条件的均衡。
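
A sketch of a single matrix multiplicative weights (MMW) update, the full-information building block that the 3MW methods approximate with scalar feedback: the player's density matrix is the normalized matrix exponential of the accumulated payoff gradients. The random Hermitian gradient and all names are illustrative.

```python
import numpy as np
from scipy.linalg import expm

def mmw_update(cumulative_gradient: np.ndarray, eta: float) -> np.ndarray:
    """Return the density matrix proportional to exp(eta * sum of payoff gradients)."""
    X = expm(eta * cumulative_gradient)
    return X / np.trace(X)

# toy usage with a random Hermitian payoff gradient accumulated over a few rounds
rng = np.random.default_rng(0)
d = 3
G = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
G = (G + G.conj().T) / 2                 # Hermitian cumulative gradient
rho = mmw_update(G, eta=0.1)
print(np.trace(rho).real, np.allclose(rho, rho.conj().T))   # trace 1, Hermitian
```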

The equivalence of dynamic and strategic stability under regularized learning in games

  • paper_url: http://arxiv.org/abs/2311.02407
  • repo_url: None
  • paper_authors: Victor Boone, Panayotis Mertikopoulos
  • for: 本研究探讨了归一化学习在有限游戏中的长期行为。
  • methods: 本研究在正则化、无悔学习的框架下,考察玩家实际策略随时间的演化。
  • results: 研究发现,玩家策略在正则化学习下所收敛到的稳定、有吸引性的集合,恰好是那些对更优回应封闭的纯策略组合;收敛到这类集合的速率取决于所用的正则化方法。
    Abstract In this paper, we examine the long-run behavior of regularized, no-regret learning in finite games. A well-known result in the field states that the empirical frequencies of no-regret play converge to the game's set of coarse correlated equilibria; however, our understanding of how the players' actual strategies evolve over time is much more limited - and, in many cases, non-existent. This issue is exacerbated further by a series of recent results showing that only strict Nash equilibria are stable and attracting under regularized learning, thus making the relation between learning and pointwise solution concepts particularly elusive. In lieu of this, we take a more general approach and instead seek to characterize the \emph{setwise} rationality properties of the players' day-to-day play. To that end, we focus on one of the most stringent criteria of setwise strategic stability, namely that any unilateral deviation from the set in question incurs a cost to the deviator - a property known as closedness under better replies (club). In so doing, we obtain a far-reaching equivalence between strategic and dynamic stability: a product of pure strategies is closed under better replies if and only if its span is stable and attracting under regularized learning. In addition, we estimate the rate of convergence to such sets, and we show that methods based on entropic regularization (like the exponential weights algorithm) converge at a geometric rate, while projection-based methods converge within a finite number of iterations, even with bandit, payoff-based feedback.
    摘要 本文研究有限博弈中正则化、无悔学习的长期行为。该领域一个著名的结果表明,无悔博弈的经验频率会收敛到博弈的粗相关均衡集合;然而,我们对玩家实际策略随时间如何演化的理解要有限得多,在许多情形下甚至一无所知。最近一系列结果进一步加剧了这一问题:在正则化学习下,只有严格纳什均衡才是稳定且有吸引性的,这使得学习与逐点解概念之间的关系尤为难以捉摸。有鉴于此,我们采取更一般的视角,转而刻画玩家日常策略的集合层面理性性质。为此,我们聚焦于集合层面策略稳定性中最严格的准则之一,即任何偏离该集合的单方行为都会使偏离者付出代价,这一性质称为对更优回应封闭(club)。由此,我们得到了策略稳定性与动态稳定性之间深刻的等价关系:一组纯策略对更优回应封闭,当且仅当其张成在正则化学习下稳定且有吸引性。此外,我们估计了向此类集合收敛的速率,并证明基于熵正则化的方法(如指数权重算法)以几何速率收敛,而基于投影的方法即使在赌博机式、基于收益的反馈下,也能在有限次迭代内收敛。

BarcodeBERT: Transformers for Biodiversity Analysis

  • paper_url: http://arxiv.org/abs/2311.02401
  • repo_url: None
  • paper_authors: Pablo Millan Arias, Niousha Sadjadi, Monireh Safari, ZeMing Gong, Austin T. Wang, Scott C. Lowe, Joakim Bruslund Haurum, Iuliia Zarubiieva, Dirk Steinke, Lila Kari, Angel X. Chang, Graham W. Taylor
  • for: 这个研究旨在探讨如何使用机器学习方法进行生物多样性分析,特别是针对无脊椎动物这一高度多样但研究不足的类群。
  • methods: 本研究比较了不同的机器学习方法,包括监督训练的卷积神经网络、微调的基础模型,以及专为 DNA 条形码设计的掩码策略。
  • results: 研究发现,在较简单的数据集和任务上,监督训练的卷积神经网络或微调的 Transformer 表现较佳;但面对具有挑战性的物种级识别任务时,需要转向自监督预训练。因此,本研究提出了 BarcodeBERT,首个面向一般生物多样性分析的自监督方法,其利用了一个包含 150 万条无脊椎动物 DNA 条形码的参考库。
    Abstract Understanding biodiversity is a global challenge, in which DNA barcodes - short snippets of DNA that cluster by species - play a pivotal role. In particular, invertebrates, a highly diverse and under-explored group, pose unique taxonomic complexities. We explore machine learning approaches, comparing supervised CNNs, fine-tuned foundation models, and a DNA barcode-specific masking strategy across datasets of varying complexity. While simpler datasets and tasks favor supervised CNNs or fine-tuned transformers, challenging species-level identification demands a paradigm shift towards self-supervised pretraining. We propose BarcodeBERT, the first self-supervised method for general biodiversity analysis, leveraging a 1.5 M invertebrate DNA barcode reference library. This work highlights how dataset specifics and coverage impact model selection, and underscores the role of self-supervised pretraining in achieving high-accuracy DNA barcode-based identification at the species and genus level. Indeed, without the fine-tuning step, BarcodeBERT pretrained on a large DNA barcode dataset outperforms DNABERT and DNABERT-2 on multiple downstream classification tasks. The code repository is available at https://github.com/Kari-Genomics-Lab/BarcodeBERT
    摘要 理解生物多样性是一项全球性挑战,而 DNA 条形码(按物种聚类的短 DNA 片段)在其中扮演着关键角色。无脊椎动物这一高度多样且研究不足的类群尤其带来独特的分类学复杂性。我们比较了多种机器学习方法,包括监督训练的卷积神经网络、微调的基础模型,以及专为 DNA 条形码设计的掩码策略,并在不同复杂度的数据集上进行评估。较简单的数据集和任务更适合监督卷积神经网络或微调的 Transformer,但具有挑战性的物种级识别需要向自监督预训练的范式转变。我们提出了 BarcodeBERT,首个面向一般生物多样性分析的自监督方法,其利用了一个包含 150 万条无脊椎动物 DNA 条形码的参考库。这项工作说明了数据集特性与覆盖范围如何影响模型选择,并强调了自监督预训练在实现物种级与属级高精度 DNA 条形码识别中的作用。事实上,即使不经过微调步骤,在大规模 DNA 条形码数据集上预训练的 BarcodeBERT 在多个下游分类任务上也优于 DNABERT 和 DNABERT-2。代码仓库:https://github.com/Kari-Genomics-Lab/BarcodeBERT

Entropy Aware Training for Fast and Accurate Distributed GNN

  • paper_url: http://arxiv.org/abs/2311.02399
  • repo_url: None
  • paper_authors: Dhruv Deshmukh, Gagan Raj Gupta, Manisha Chawla, Vishwesh Jatala, Anirban Haldar
  • for: 本文旨在提升分布式图神经网络的表现,解决分布式图分区带来的数据分布不均与类别不平衡问题。
  • methods: 本文使用边加权分区技术来最小化总熵,并在每台计算主机上加入异步个性化阶段,使模型适应本地数据分布;此外还使用类别均衡采样器来加速收敛。
  • results: 在 DistDGL 框架上实现的这些训练技术,训练速度比标准基线快 2-3 倍,并在 5 个大型图基准上将微平均 F1 分数平均提高了 4%。
    Abstract Several distributed frameworks have been developed to scale Graph Neural Networks (GNNs) on billion-size graphs. On several benchmarks, we observe that the graph partitions generated by these frameworks have heterogeneous data distributions and class imbalance, affecting convergence, and resulting in lower performance than centralized implementations. We holistically address these challenges and develop techniques that reduce training time and improve accuracy. We develop an Edge-Weighted partitioning technique to improve the micro average F1 score (accuracy) by minimizing the total entropy. Furthermore, we add an asynchronous personalization phase that adapts each compute-host's model to its local data distribution. We design a class-balanced sampler that considerably speeds up convergence. We implemented our algorithms on the DistDGL framework and observed that our training techniques scale much better than the existing training approach. We achieved a (2-3x) speedup in training time and 4\% improvement on average in micro-F1 scores on 5 large graph benchmarks compared to the standard baselines.
    摘要 为了在十亿级规模的图上扩展图神经网络(GNN),业界已经开发了若干分布式框架。在多个基准上,我们观察到这些框架生成的图分区存在数据分布不均和类别不平衡的问题,影响收敛,使其性能低于集中式实现。我们针对这些挑战提出了一系列缩短训练时间并提升准确率的技术:首先,提出一种边加权分区技术,通过最小化总熵来提高微平均 F1 分数(准确率);其次,加入异步个性化阶段,使每台计算主机的模型适应其本地数据分布;此外,设计了类别均衡采样器,大幅加快收敛。我们在 DistDGL 框架上实现了这些算法,发现我们的训练技术比现有训练方式具有更好的扩展性:与标准基线相比,训练时间加快 2-3 倍,并在 5 个大型图基准上将微平均 F1 分数平均提高 4%。
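
An illustrative sketch (not the paper's implementation) of two quantities these techniques revolve around: the label entropy of a partition, and a class-balanced sampler that draws nodes with probability inversely proportional to class frequency. All names are assumptions.

```python
import numpy as np

def partition_label_entropy(labels: np.ndarray) -> float:
    """Shannon entropy (in nats) of the class distribution inside one partition."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log(p)).sum())

def class_balanced_sample(labels: np.ndarray, n_samples: int, rng) -> np.ndarray:
    """Sample node indices so that rare classes are drawn as often as frequent ones."""
    classes, counts = np.unique(labels, return_counts=True)
    freq = dict(zip(classes, counts))
    weights = np.array([1.0 / freq[y] for y in labels])
    weights /= weights.sum()
    return rng.choice(len(labels), size=n_samples, replace=False, p=weights)

rng = np.random.default_rng(0)
part_labels = rng.choice([0, 0, 0, 1, 2], size=1000)     # skewed class distribution in one partition
print("partition entropy:", partition_label_entropy(part_labels))
batch = class_balanced_sample(part_labels, 64, rng)
print("class counts in the sampled batch:", np.bincount(part_labels[batch]))
```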

NeuroEvoBench: Benchmarking Evolutionary Optimizers for Deep Learning Applications

  • paper_url: http://arxiv.org/abs/2311.02394
  • repo_url: None
  • paper_authors: Robert Tjarko Lange, Yujin Tang, Yingtao Tian
  • for: This paper aims to address the lack of understanding and best practices for evolutionary optimization (EO) methods in deep learning, and to provide a new benchmark for evaluating EO methods tailored towards deep learning applications.
  • methods: The paper uses a variety of EO methods, including traditional and meta-learned EO, and investigates their performance on a new benchmark called NeuroEvoBench. The authors also explore core scientific questions such as resource allocation, fitness shaping, normalization, regularization, and scalability of EO.
  • results: The paper presents the results of the authors’ experiments on NeuroEvoBench, which demonstrate the effectiveness of EO methods for solving hard optimization problems in deep learning. The authors also show that their new benchmark provides practical insights for deep learning applications and can help to accelerate the adoption of EO methods in the field.
    Abstract Recently, the Deep Learning community has become interested in evolutionary optimization (EO) as a means to address hard optimization problems, e.g. meta-learning through long inner loop unrolls or optimizing non-differentiable operators. One core reason for this trend has been the recent innovation in hardware acceleration and compatible software - making distributed population evaluations much easier than before. Unlike for gradient descent-based methods though, there is a lack of hyperparameter understanding and best practices for EO - arguably due to severely less 'graduate student descent' and benchmarking being performed for EO methods. Additionally, classical benchmarks from the evolutionary community provide few practical insights for Deep Learning applications. This poses challenges for newcomers to hardware-accelerated EO and hinders significant adoption. Hence, we establish a new benchmark of EO methods (NeuroEvoBench) tailored toward Deep Learning applications and exhaustively evaluate traditional and meta-learned EO. We investigate core scientific questions including resource allocation, fitness shaping, normalization, regularization & scalability of EO. The benchmark is open-sourced at https://github.com/neuroevobench/neuroevobench under Apache-2.0 license.
    摘要 近年来,深度学习领域开始关注进化优化(EO),将其用于求解困难的优化问题,例如通过长内循环展开进行元学习,或优化不可微的算子。这一趋势的核心原因之一是近期硬件加速及其配套软件的创新,使分布式种群评估比以往容易得多。然而,与基于梯度下降的方法不同,EO 缺乏对超参数的系统理解与最佳实践,这在很大程度上是因为针对 EO 方法的调参与基准测试远远不足;此外,进化计算领域的经典基准对深度学习应用几乎没有实际指导意义。这给硬件加速 EO 的新手带来挑战,并阻碍了其大规模采用。为此,我们建立了一个面向深度学习应用的 EO 方法新基准(NeuroEvoBench),并对传统与元学习得到的 EO 方法进行了详尽评估,研究了资源分配、适应度整形、归一化、正则化与可扩展性等核心科学问题。该基准以 Apache-2.0 协议开源于 https://github.com/neuroevobench/neuroevobench。

Riemannian stochastic optimization methods avoid strict saddle points

  • paper_url: http://arxiv.org/abs/2311.02374
  • repo_url: None
  • paper_authors: Ya-Ping Hsieh, Mohammad Reza Karimi, Andreas Krause, Panayotis Mertikopoulos
  • for: 本文研究随机黎曼优化(Stochastic Riemannian optimization)算法是否能够避免鞍点。
  • methods: 本文研究一类基于 retraction 的方法,其中包括自然策略梯度方法和普通凸空间中的镜像下降等广泛使用的算法。
  • results: 研究发现,在对所在流形和提供梯度信息的预言机作温和假设的前提下,这些方法从任意初始条件出发,都能以概率 1 避免严格鞍点/严格鞍子流形。这一结果为在流形上使用梯度方法进行优化提供了重要的合理性检验,因为它表明随机黎曼算法的极限状态几乎总是只能是局部极小值。
    Abstract Many modern machine learning applications - from online principal component analysis to covariance matrix identification and dictionary learning - can be formulated as minimization problems on Riemannian manifolds, and are typically solved with a Riemannian stochastic gradient method (or some variant thereof). However, in many cases of interest, the resulting minimization problem is not geodesically convex, so the convergence of the chosen solver to a desirable solution - i.e., a local minimizer - is by no means guaranteed. In this paper, we study precisely this question, that is, whether stochastic Riemannian optimization algorithms are guaranteed to avoid saddle points with probability 1. For generality, we study a family of retraction-based methods which, in addition to having a potentially much lower per-iteration cost relative to Riemannian gradient descent, include other widely used algorithms, such as natural policy gradient methods and mirror descent in ordinary convex spaces. In this general setting, we show that, under mild assumptions for the ambient manifold and the oracle providing gradient information, the policies under study avoid strict saddle points / submanifolds with probability 1, from any initial condition. This result provides an important sanity check for the use of gradient methods on manifolds as it shows that, almost always, the limit state of a stochastic Riemannian algorithm can only be a local minimizer.
    摘要 许多现代机器学习应用——从在线主成分分析到协方差矩阵辨识和字典学习——都可以表述为黎曼流形上的最小化问题,并通常用黎曼随机梯度方法(或其变体)求解。然而,在许多感兴趣的情形中,所得到的最小化问题并非测地凸,因此所选求解器收敛到理想解(即局部极小值)毫无保证。本文恰好研究这一问题,即随机黎曼优化算法是否以概率 1 避免鞍点。为保持一般性,我们研究一类基于 retraction 的方法,它们每次迭代的代价可能远低于黎曼梯度下降,并涵盖其他广泛使用的算法,如自然策略梯度方法和普通凸空间中的镜像下降。在这一一般设定下,我们证明:在对所在流形和提供梯度信息的预言机作温和假设的前提下,所研究的策略从任意初始条件出发,都能以概率 1 避免严格鞍点/严格鞍子流形。这一结果为在流形上使用梯度方法提供了重要的合理性检验,因为它表明随机黎曼算法的极限状态几乎总是只能是局部极小值。
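
A minimal sketch of one retraction-based stochastic step on the unit sphere, the kind of update the analysis covers; the objective, step size, and names are illustrative. Eigenvectors with intermediate eigenvalues are strict saddle points of this toy problem, which the result says are avoided.

```python
import numpy as np

def sphere_sgd_step(x: np.ndarray, euclid_grad: np.ndarray, lr: float) -> np.ndarray:
    riem_grad = euclid_grad - np.dot(euclid_grad, x) * x    # project onto the tangent space
    y = x - lr * riem_grad                                  # step in the tangent direction
    return y / np.linalg.norm(y)                            # retraction back onto the sphere

# toy usage: minimize f(x) = x^T A x over the unit sphere (smallest-eigenvector problem)
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5)); A = (A + A.T) / 2
x = rng.standard_normal(5); x /= np.linalg.norm(x)
for _ in range(2000):
    noisy_grad = 2 * A @ x + 0.1 * rng.standard_normal(5)   # stochastic gradient of x^T A x
    x = sphere_sgd_step(x, noisy_grad, lr=0.01)
print("objective:", x @ A @ x, "smallest eigenvalue:", np.linalg.eigvalsh(A).min())
```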

From Trojan Horses to Castle Walls: Unveiling Bilateral Backdoor Effects in Diffusion Models

  • paper_url: http://arxiv.org/abs/2311.02373
  • repo_url: None
  • paper_authors: Zhuoshi Pan, Yuguang Yao, Gaowen Liu, Bingquan Shen, H. Vicky Zhao, Ramana Rao Kompella, Sijia Liu
  • for: This paper investigates the vulnerability of state-of-the-art diffusion models (DMs) to backdoor attacks, specifically whether generating backdoor attacks can be as simple as BadNets in image classification.
  • methods: The paper uses a more realistic backdoor setting, where the training dataset is contaminated without tampering the original diffusion process, and uncovers bilateral backdoor effects that can be used for both adversarial and defensive purposes.
  • results: The paper shows that a BadNets-like backdoor attack remains effective in DMs for producing incorrect images, and that backdoored DMs exhibit an increased ratio of backdoor triggers, which can be used to enhance the detection of backdoor-poisoned training data. Additionally, the paper establishes a linkage between backdoor attacks and the phenomenon of data replications by exploring DMs’ inherent data memorization tendencies.
    Abstract While state-of-the-art diffusion models (DMs) excel in image generation, concerns regarding their security persist. Earlier research highlighted DMs' vulnerability to backdoor attacks, but these studies placed stricter requirements than conventional methods like 'BadNets' in image classification. This is because the former necessitates modifications to the diffusion sampling and training procedures. Unlike the prior work, we investigate whether generating backdoor attacks in DMs can be as simple as BadNets, i.e., by only contaminating the training dataset without tampering the original diffusion process. In this more realistic backdoor setting, we uncover bilateral backdoor effects that not only serve an adversarial purpose (compromising the functionality of DMs) but also offer a defensive advantage (which can be leveraged for backdoor defense). Specifically, we find that a BadNets-like backdoor attack remains effective in DMs for producing incorrect images (misaligned with the intended text conditions), and thereby yielding incorrect predictions when DMs are used as classifiers. Meanwhile, backdoored DMs exhibit an increased ratio of backdoor triggers, a phenomenon we refer to as `trigger amplification', among the generated images. We show that this latter insight can be used to enhance the detection of backdoor-poisoned training data. Even under a low backdoor poisoning ratio, studying the backdoor effects of DMs is also valuable for designing anti-backdoor image classifiers. Last but not least, we establish a meaningful linkage between backdoor attacks and the phenomenon of data replications by exploring DMs' inherent data memorization tendencies. The codes of our work are available at https://github.com/OPTML-Group/BiBadDiff.
    摘要 尽管最先进的扩散模型(DM)在图像生成方面表现出色,但其安全性问题仍然存在。早期研究已指出 DM 易受后门攻击,但这些研究提出的要求比图像分类中的 'BadNets' 等传统方法更为苛刻,因为前者需要修改扩散采样和训练过程。与先前工作不同,我们研究在 DM 中发动后门攻击是否可以像 BadNets 一样简单,即只污染训练数据集而不改动原有扩散过程。在这一更符合实际的后门设定下,我们发现了双向的后门效应:它们既可用于对抗目的(破坏 DM 的功能),也提供了防御上的优势(可用于后门防御)。具体而言,我们发现类 BadNets 的后门攻击在 DM 中仍然有效,会生成与给定文本条件不符的错误图像,从而在 DM 被用作分类器时产生错误预测。与此同时,被植入后门的 DM 在其生成图像中表现出更高比例的后门触发器,我们称这一现象为“触发器放大”;我们证明这一发现可用于增强对后门投毒训练数据的检测。即使在较低的投毒比例下,研究 DM 的后门效应对设计抗后门图像分类器也具有价值。最后,通过考察 DM 固有的数据记忆倾向,我们建立了后门攻击与数据复制现象之间的有意义联系。代码见:https://github.com/OPTML-Group/BiBadDiff。
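
A sketch of the BadNets-style contamination studied here: stamp a small trigger patch onto a fraction of training images and relabel them to a target class, leaving the model and training procedure untouched. Shapes, the patch location, and the poisoning ratio are illustrative assumptions.

```python
import numpy as np

def poison_dataset(images, labels, target_class, poison_ratio=0.05, patch_value=1.0, rng=None):
    """images: (N, H, W, C) floats in [0, 1]; labels: (N,) ints. Returns poisoned copies."""
    if rng is None:
        rng = np.random.default_rng(0)
    images, labels = images.copy(), labels.copy()
    n_poison = int(poison_ratio * len(images))
    idx = rng.choice(len(images), size=n_poison, replace=False)
    images[idx, -4:, -4:, :] = patch_value       # 4x4 trigger in the bottom-right corner
    labels[idx] = target_class                    # relabel to the attacker's target
    return images, labels, idx

rng = np.random.default_rng(0)
X = rng.uniform(size=(1000, 32, 32, 3)).astype(np.float32)
y = rng.integers(0, 10, size=1000)
Xp, yp, poisoned_idx = poison_dataset(X, y, target_class=0, poison_ratio=0.05, rng=rng)
print(len(poisoned_idx), "poisoned images; the trigger ratio of generated samples can then be measured")
```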

TACNET: Temporal Audio Source Counting Network

  • paper_url: http://arxiv.org/abs/2311.02369
  • repo_url: None
  • paper_authors: Amirreza Ahmadnejad, Ahmad Mahmmodian Darviishani, Mohmmad Mehrdad Asadi, Sajjad Saffariyeh, Pedram Yousef, Emad Fatemizadeh
  • for: 这篇论文是为了解决音频源计数任务中的限制而设计的 Temporal Audio Source Counting Network (TaCNet) 架构。
  • methods: TaCNet 直接处理原始音频输入,减少了复杂的预处理步骤,简化了工作流程。它在实时speaker计数任务中表现出色,即使输入窗口被截取。
  • results: 在使用 LibriCount 数据集进行广泛评估中,TaCNet 的平均准确率为 74.18%,在 11 个类别中表现出色,包括中文和波斯语应用场景。这种跨语言适应性表明其 universality 和可能的影响。
    Abstract In this paper, we introduce the Temporal Audio Source Counting Network (TaCNet), an innovative architecture that addresses limitations in audio source counting tasks. TaCNet operates directly on raw audio inputs, eliminating complex preprocessing steps and simplifying the workflow. Notably, it excels in real-time speaker counting, even with truncated input windows. Our extensive evaluation, conducted using the LibriCount dataset, underscores TaCNet's exceptional performance, positioning it as a state-of-the-art solution for audio source counting tasks. With an average accuracy of 74.18 percentage over 11 classes, TaCNet demonstrates its effectiveness across diverse scenarios, including applications involving Chinese and Persian languages. This cross-lingual adaptability highlights its versatility and potential impact.
    摘要 在这篇论文中,我们介绍了Temporal Audio Source Counting Network(TaCNet),一种创新的架构,用于解决音频来源计数任务中的限制。TaCNet直接操作 raw 音频输入,从而消除复杂的预处理步骤,简化工作流程。尤其是在实时speaker计数任务中,TaCNet表现出色,即使输入窗口被截断。我们对利用 LibriCount 数据集进行了广泛的评估,并证明 TaCNet 在多种场景下表现出优秀的性能,包括使用中文和波斯语。这种跨语言适应性表明 TaCNet 的多样性和影响力。

MATA*: Combining Learnable Node Matching with A* Algorithm for Approximate Graph Edit Distance Computation

  • paper_url: http://arxiv.org/abs/2311.02356
  • repo_url: None
  • paper_authors: Junfeng Liu, Min Zhou, Shuai Ma, Lujia Pan
  • for: 本文旨在提出一种数据驱动的混合方法来近似计算图编辑距离(GED),以解决现有 A* 算法在搜索空间中寻找最优解时的可扩展性问题,以及基于学习的方法无法准确回归 GED 的问题。
  • methods: 本文结合图神经网络(GNN)与 A* 算法实现数据驱动的混合方法 MATA*:首先设计一种结构增强的 GNN 来同时学习局部与高阶结构信息,然后通过可微的 top-k 操作生成多个优质候选节点,最后利用这些候选节点快速找到解。
  • results: 实验表明,MATA* 在大规模图上进行近似 GED 计算具有显著优势,能够有效克服现有搜索类与学习类方法的缺陷。
    Abstract Graph Edit Distance (GED) is a general and domain-agnostic metric to measure graph similarity, widely used in graph search or retrieving tasks. However, the exact GED computation is known to be NP-complete. For instance, the widely used A* algorithms explore the entire search space to find the optimal solution which inevitably suffers scalability issues. Learning-based methods apply graph representation techniques to learn the GED by formulating a regression task, which can not recover the edit path and lead to inaccurate GED approximation (i.e., the predicted GED is smaller than the exact). To this end, in this work, we present a data-driven hybrid approach MATA* for approximate GED computation based on Graph Neural Networks (GNNs) and A* algorithms, which models from the perspective of learning to match nodes instead of directly regressing GED. Specifically, aware of the structure-dominant operations (i.e.,node and edge insertion/deletion) property in GED computation, a structure-enhanced GNN is firstly designed to jointly learn local and high-order structural information for node embeddings for node matchings. Second, top-k candidate nodes are produced via a differentiable top-k operation to enable the training for node matchings, which is adhering to another property of GED, i.e., multiple optimal node matchings. Third, benefiting from the candidate nodes, MATA* only performs on the promising search directions, reaching the solution efficiently. Finally, extensive experiments show the superiority of MATA* as it significantly outperforms the combinatorial search-based, learning-based and hybrid methods and scales well to large-size graphs.
    摘要 图编辑距离(GED)是衡量图相似度的一种通用且与领域无关的度量,广泛用于图搜索和检索任务。然而,精确计算 GED 已知是 NP 完全问题。例如,广泛使用的 A* 算法需要探索整个搜索空间才能找到最优解,不可避免地面临可扩展性问题;而基于学习的方法采用图表示技术,以回归任务的形式学习 GED,无法恢复编辑路径,且会导致不准确的 GED 近似(即预测的 GED 小于真实值)。为此,本文提出一种基于图神经网络(GNN)与 A* 算法的数据驱动混合方法 MATA*,从学习节点匹配而非直接回归 GED 的角度建模。具体而言,注意到 GED 计算中结构主导操作(即节点与边的插入/删除)这一性质,我们首先设计了一种结构增强的 GNN,联合学习局部与高阶结构信息以得到用于节点匹配的节点嵌入;其次,通过可微的 top-k 操作生成 top-k 候选节点,使节点匹配可以训练,这也契合 GED 的另一性质,即存在多个最优节点匹配;第三,得益于候选节点,MATA* 只在有希望的搜索方向上进行搜索,从而高效地得到解。最后,大量实验表明 MATA* 显著优于基于组合搜索的方法、基于学习的方法以及混合方法,并能很好地扩展到大规模图。

Sample Complexity of Opinion Formation on Networks

  • paper_url: http://arxiv.org/abs/2311.02349
  • repo_url: None
  • paper_authors: Haolin Liu, Rajmohan Rajaraman, Ravi Sundaram, Anil Vullikanti, Omer Wasim, Haifeng Xu
  • for: 寻求最佳资源分配策略,使公共卫生官员在社交网络上宣传新疫苗,以达到社区内所有人的共识,并保证宣传内容与实际事实相符。
  • methods: 基于经典的意见形成博弈,将每个代理的意见视为由数据导出的模型参数,而不仅仅是先前研究中的一个实数。这一扩展有助于更深入地理解意见形成,并与联邦学习密切相关。通过这一表述,我们刻画了任意网络上的样本复杂度界,并给出了特定网络结构下渐近紧的界。
  • results: 研究发现,最优策略往往按节点度的反比来分配样本,这对政策制定具有重要含义。我们的发现已在合成网络与真实网络上得到实证验证。
    Abstract Consider public health officials aiming to spread awareness about a new vaccine in a community interconnected by a social network. How can they distribute information with minimal resources, ensuring community-wide understanding that aligns with the actual facts? This concern mirrors numerous real-world situations. In this paper, we initialize the study of sample complexity in opinion formation to solve this problem. Our model is built on the recognized opinion formation game, where we regard each agent's opinion as a data-derived model parameter, not just a real number as in prior studies. Such an extension offers a wider understanding of opinion formation and ties closely with federated learning. Through this formulation, we characterize the sample complexity bounds for any network and also show asymptotically tight bounds for specific network structures. Intriguingly, we discover optimal strategies often allocate samples inversely to the degree, hinting at vital policy implications. Our findings are empirically validated on both synthesized and real-world networks.
    摘要 设想公共卫生官员希望在一个由社交网络相互连接的社区中推广关于新疫苗的认知:如何以最少的资源分发信息,确保整个社区形成与事实相符的理解?这一问题映射了许多现实情境。本文开启了意见形成中样本复杂度的研究。我们的模型建立在经典的意见形成博弈之上,将每个代理的意见视为由数据导出的模型参数,而不仅仅是先前研究中的一个实数;这一扩展有助于更深入地理解意见形成,并与联邦学习密切相关。通过这一表述,我们刻画了任意网络上的样本复杂度界,并给出了特定网络结构下渐近紧的界。有趣的是,我们发现最优策略往往按节点度的反比来分配样本,这对政策制定具有重要含义。我们的发现已在合成网络与真实网络上得到实证验证。
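
A sketch (illustrative names, not the paper's code) of the allocation rule the analysis points to: split a fixed sample budget across agents in inverse proportion to their network degree.

```python
import numpy as np

def inverse_degree_allocation(degrees: np.ndarray, budget: int) -> np.ndarray:
    """Allocate `budget` samples across agents proportionally to 1 / degree."""
    weights = 1.0 / degrees
    shares = budget * weights / weights.sum()
    alloc = np.floor(shares).astype(int)
    # hand out the remaining samples to the agents with the largest fractional remainder
    remainder = shares - alloc
    for i in np.argsort(-remainder)[: budget - alloc.sum()]:
        alloc[i] += 1
    return alloc

degrees = np.array([1, 2, 2, 5, 10, 40])
alloc = inverse_degree_allocation(degrees, budget=100)
print(dict(zip(degrees.tolist(), alloc.tolist())), "total:", alloc.sum())
```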

Understanding the Natural Language of DNA using Encoder-Decoder Foundation Models with Byte-level Precision

  • paper_url: http://arxiv.org/abs/2311.02333
  • repo_url: None
  • paper_authors: Aditya Malusare, Harish Kothandaraman, Dipesh Tamboli, Nadia A. Lanman, Vaneet Aggarwal
  • for: 这个论文是为了分析 DNA 序列的 byte-level 精度而设计的 Ensemble Nucleotide Byte-level Encoder-Decoder (ENBED) 基础模型。
  • methods: 这个模型使用 Transformer 架构的 encoder-decoder 结构,并使用 sub-quadratic 实现注意力来开发一个高效的 sequence-to-sequence 模型。
  • results: 在不同的下游任务中,包括识别增强子、启动子和剪接位点,识别基因组序列的生物功能注释,识别含有碱基识别错误和插入/缺失错误的序列,以及生成流感病毒的突变并与真实观测进行验证,ENBED 模型相比现有最先进结果均表现出显著提升。
    Abstract This paper presents the Ensemble Nucleotide Byte-level Encoder-Decoder (ENBED) foundation model, analyzing DNA sequences at byte-level precision with an encoder-decoder Transformer architecture. ENBED uses a sub-quadratic implementation of attention to develop an efficient model capable of sequence-to-sequence transformations, generalizing previous genomic models with encoder-only or decoder-only architectures. We use Masked Language Modeling to pre-train the foundation model using reference genome sequences and apply it in the following downstream tasks: (1) identification of enhancers, promotors and splice sites, (2) identification of biological function annotations of genomic sequences, (3) recognition of sequences containing base call mismatches and insertion/deletion errors, an advantage over tokenization schemes involving multiple base pairs, which lose the ability to analyze with byte-level precision, and (4) generating mutations of the Influenza virus using the encoder-decoder architecture and validating them against real-world observations. In each of these tasks, we demonstrate significant improvement as compared to the existing state-of-the-art results.
    摘要 本文提出了 Ensemble Nucleotide Byte-level Encoder-Decoder(ENBED)基础模型,以字节级精度、采用编码器-解码器 Transformer 架构分析 DNA 序列。ENBED 使用次二次复杂度的注意力实现,构建了能够进行序列到序列变换的高效模型,从而推广了以往仅有编码器或仅有解码器结构的基因组模型。我们使用掩码语言建模、基于参考基因组序列对基础模型进行预训练,并将其应用于以下下游任务:(1)识别增强子、启动子和剪接位点;(2)识别基因组序列的生物功能注释;(3)识别含有碱基识别错误和插入/缺失错误的序列,这优于涉及多个碱基对的分词方案,后者失去了字节级精度的分析能力;(4)利用编码器-解码器结构生成流感病毒的突变,并与真实观测进行验证。在上述每项任务中,我们都取得了相比现有最先进结果的显著提升。

An Operator Learning Framework for Spatiotemporal Super-resolution of Scientific Simulations

  • paper_url: http://arxiv.org/abs/2311.02328
  • repo_url: None
  • paper_authors: Valentin Duruisseaux, Amit Chakraborty
    for:* 这篇论文旨在缓解高分辨率数值求解的计算限制:高分辨率解能够更忠实地刻画小时空尺度上的关键动力学,但用传统方法获得十分困难且缓慢。methods:* 这篇论文使用机器学习技术进行超分辨率重建,从可以更高效获得的低分辨率模拟中重构高分辨率数值解。results:* 这篇论文提出了名为 Super Resolution Operator Network(SROpNet)的新方法,将超分辨率视为算子学习问题,可在各种实际问题中提供更高精度的解。
    Abstract In numerous contexts, high-resolution solutions to partial differential equations are required to capture faithfully essential dynamics which occur at small spatiotemporal scales, but these solutions can be very difficult and slow to obtain using traditional methods due to limited computational resources. A recent direction to circumvent these computational limitations is to use machine learning techniques for super-resolution, to reconstruct high-resolution numerical solutions from low-resolution simulations which can be obtained more efficiently. The proposed approach, the Super Resolution Operator Network (SROpNet), frames super-resolution as an operator learning problem and draws inspiration from existing architectures to learn continuous representations of solutions to parametric differential equations from low-resolution approximations, which can then be evaluated at any desired location. In addition, no restrictions are imposed on the locations of (the fixed number of) spatiotemporal sensors at which the low-resolution approximations are provided, thereby enabling the consideration of a broader spectrum of problems arising in practice, for which many existing super-resolution approaches are not well-suited.
    摘要 在许多情境中,需要偏微分方程的高分辨率解来忠实捕捉发生在小时空尺度上的关键动力学,然而由于计算资源有限,用传统方法获得这些解可能十分困难且缓慢。规避这些计算限制的一个新方向,是利用机器学习技术进行超分辨率重建,从可以更高效获得的低分辨率模拟中重构高分辨率数值解。我们提出的方法 Super Resolution Operator Network(SROpNet)将超分辨率视为算子学习问题,并借鉴现有架构,从低分辨率近似中学习参数化微分方程解的连续表示,从而可以在任意期望的位置进行求值。此外,我们不对(固定数目的)时空传感器提供低分辨率近似的位置施加任何限制,因而能够考虑实际中出现的更广泛的一类问题,而许多现有的超分辨率方法并不适用于这些问题。

Self-Supervised Learning of Representations for Space Generates Multi-Modular Grid Cells

  • paper_url: http://arxiv.org/abs/2311.02316
  • repo_url: None
  • paper_authors: Rylan Schaeffer, Mikail Khona, Tzuhsuan Ma, Cristóbal Eyzaguirre, Sanmi Koyejo, Ila Rani Fiete
  • For: 为解决地图构建、定位与导航等空间问题,哺乳动物谱系演化出了引人注目的空间表示。* Methods: 从四类方法中提炼关键洞见:编码理论、动力系统、函数优化和监督深度学习。* Results: 提出了一种新的自监督学习(SSL)框架,无需监督位置信息或人工设计特定的读出表示,即可使多个网格细胞模块在训练后的网络中涌现,且网络及其涌现表示能很好地泛化到训练分布之外。
    Abstract To solve the spatial problems of mapping, localization and navigation, the mammalian lineage has developed striking spatial representations. One important spatial representation is the Nobel-prize winning grid cells: neurons that represent self-location, a local and aperiodic quantity, with seemingly bizarre non-local and spatially periodic activity patterns of a few discrete periods. Why has the mammalian lineage learnt this peculiar grid representation? Mathematical analysis suggests that this multi-periodic representation has excellent properties as an algebraic code with high capacity and intrinsic error-correction, but to date, there is no satisfactory synthesis of core principles that lead to multi-modular grid cells in deep recurrent neural networks. In this work, we begin by identifying key insights from four families of approaches to answering the grid cell question: coding theory, dynamical systems, function optimization and supervised deep learning. We then leverage our insights to propose a new approach that combines the strengths of all four approaches. Our approach is a self-supervised learning (SSL) framework - including data, data augmentations, loss functions and a network architecture - motivated from a normative perspective, without access to supervised position information or engineering of particular readout representations as needed in previous approaches. We show that multiple grid cell modules can emerge in networks trained on our SSL framework and that the networks and emergent representations generalize well outside their training distribution. This work contains insights for neuroscientists interested in the origins of grid cells as well as machine learning researchers interested in novel SSL frameworks.
    摘要 为解决地图构建、定位与导航等空间问题,哺乳动物谱系演化出了引人注目的空间表示。其中一种重要的空间表示是获得诺贝尔奖的网格细胞:这些神经元表示自身位置这一局部且非周期的量,却表现出看似怪异的非局部、空间周期性活动模式,且只有少数几个离散周期。哺乳动物谱系为何学到了这种奇特的网格表示?数学分析表明,这种多周期表示作为代数编码具有高容量和内在纠错等优良性质,但迄今为止,还没有一套令人满意的核心原则能够在深度循环神经网络中导出多模块的网格细胞。在这项工作中,我们首先从回答网格细胞问题的四类方法中提炼关键洞见:编码理论、动力系统、函数优化和监督深度学习;随后利用这些洞见提出一种融合四者优点的新方法。我们的方法是一个自监督学习(SSL)框架,包括数据、数据增强、损失函数和网络架构,出发点是规范性的视角,不需要使用监督位置信息,也不需要像以往方法那样人工设计特定的读出表示。我们证明,在该 SSL 框架下训练的网络中可以涌现多个网格细胞模块,且网络及其涌现表示能够很好地泛化到训练分布之外。这项工作对关心网格细胞起源的神经科学家以及对新型 SSL 框架感兴趣的机器学习研究者都具有启发意义。

Heteroskedastic Tensor Clustering

  • paper_url: http://arxiv.org/abs/2311.02306
  • repo_url: None
  • paper_authors: Yuchen Zhou, Yuxin Chen
  • for: 从带噪的张量观测中提取各个模式下的潜在聚类结构
  • methods: 使用一种新的谱算法 $\mathsf{Thresholded~Deflated\text{-}HeteroPCA}$ 进行张量子空间估计,再用近似 $k$-means 得到聚类
  • results: 给出一种可证明正确的张量聚类算法,只要信噪比超过计算极限即可实现精确聚类,并在多种设置下表现出比现有算法更可靠的聚类性能。
    Abstract Tensor clustering, which seeks to extract underlying cluster structures from noisy tensor observations, has gained increasing attention. One extensively studied model for tensor clustering is the tensor block model, which postulates the existence of clustering structures along each mode and has found broad applications in areas like multi-tissue gene expression analysis and multilayer network analysis. However, currently available computationally feasible methods for tensor clustering either are limited to handling i.i.d. sub-Gaussian noise or suffer from suboptimal statistical performance, which restrains their utility in applications that have to deal with heteroskedastic data and/or low signal-to-noise-ratio (SNR). To overcome these challenges, we propose a two-stage method, named $\mathsf{High\text{-}order~HeteroClustering}$ ($\mathsf{HHC}$), which starts by performing tensor subspace estimation via a novel spectral algorithm called $\mathsf{Thresholded~Deflated\text{-}HeteroPCA}$, followed by approximate $k$-means to obtain cluster nodes. Encouragingly, our algorithm provably achieves exact clustering as long as the SNR exceeds the computational limit (ignoring logarithmic factors); here, the SNR refers to the ratio of the pairwise disparity between nodes to the noise level, and the computational limit indicates the lowest SNR that enables exact clustering with polynomial runtime. Comprehensive simulation and real-data experiments suggest that our algorithm outperforms existing algorithms across various settings, delivering more reliable clustering performance.
    摘要 张量聚类旨在从带噪的张量观测中提取潜在的聚类结构,近年来受到越来越多的关注。一种被广泛研究的张量聚类模型是张量块模型,它假设每个模式上都存在聚类结构,并在多组织基因表达分析和多层网络分析等领域得到了广泛应用。然而,目前可计算的张量聚类方法要么只能处理独立同分布的亚高斯噪声,要么统计性能欠佳,这限制了它们在需要处理异方差数据和/或低信噪比(SNR)的应用中的实用性。为克服这些挑战,我们提出了一种两阶段方法 $\mathsf{High\text{-}order~HeteroClustering}$($\mathsf{HHC}$):先通过一种新的谱算法 $\mathsf{Thresholded~Deflated\text{-}HeteroPCA}$ 进行张量子空间估计,再用近似 $k$-means 得到聚类节点。令人鼓舞的是,只要 SNR 超过计算极限(忽略对数因子),我们的算法就可以被证明实现精确聚类;这里 SNR 指节点间成对差异与噪声水平之比,而计算极限指在多项式运行时间内能够实现精确聚类的最低 SNR。全面的模拟与真实数据实验表明,我们的算法在多种设置下均优于现有算法,给出更可靠的聚类性能。
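
To convey the two-stage shape of the pipeline, the sketch below substitutes a plain truncated SVD for the paper's Thresholded Deflated-HeteroPCA step (which additionally corrects for heteroskedastic noise) and then runs k-means on the estimated subspace; the synthetic data and all names are illustrative.

```python
import numpy as np
from itertools import permutations
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
k, n = 3, 90
centers = rng.standard_normal((k, 20, 20)) * 2.0
assign = rng.integers(0, k, size=n)
tensor = centers[assign] + rng.standard_normal((n, 20, 20))   # noisy order-3 tensor

# stage 1: subspace estimation (plain SVD stand-in for the HeteroPCA-style step)
unfolded = tensor.reshape(n, -1)                    # mode-1 matricization
U, S, Vt = np.linalg.svd(unfolded, full_matrices=False)
embedding = U[:, :k] * S[:k]                        # top-k left singular subspace

# stage 2: (approximate) k-means on the embeddings
pred = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(embedding)

# clustering accuracy up to label permutation (small k, brute force)
acc = max(np.mean(np.array([p[c] for c in pred]) == assign) for p in permutations(range(k)))
print("recovered clustering accuracy:", acc)
```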

Contrastive Multi-Modal Representation Learning for Spark Plug Fault Diagnosis

  • paper_url: http://arxiv.org/abs/2311.02282
  • repo_url: None
  • paper_authors: Ardavan Modarres, Vahid Mohammad-Zadeh Eivaghi, Mahdi Aliyari Shoorehdeli, Ashkan Moosavian
  • for: Improving condition monitoring of complex industrial equipment: a single sensory measurement cannot provide enough information, and the noise of a single sensor can be misleading, so an effective data-fusion strategy is required.
  • methods: A Denoising Multi-Modal Autoencoder trained with a contrastive-learning-based strategy, applied for the first time in the machine-health-monitoring domain. The approach not only fuses multiple modalities (or views) into an enriched common representation, but also allows one view to be omitted at inference time with only a slight drop in performance, or even no drop at all (a simplified sketch follows this entry).
  • results: The method achieves effective multi-sensor fusion and keeps operating when a sensor fails, without changing the existing sensor configuration. It also enables a more cost-effective condition-monitoring system, since performance is maintained without adding extra sensors.
    Abstract Due to the incapability of one sensory measurement to provide enough information for condition monitoring of some complex engineered industrial mechanisms and also for overcoming the misleading noise of a single sensor, multiple sensors are installed to improve the condition monitoring of some industrial equipment. Therefore, an efficient data fusion strategy is demanded. In this research, we presented a Denoising Multi-Modal Autoencoder with a unique training strategy based on contrastive learning paradigm, both being utilized for the first time in the machine health monitoring realm. The presented approach, which leverages the merits of both supervised and unsupervised learning, not only achieves excellent performance in fusing multiple modalities (or views) of data into an enriched common representation but also takes data fusion to the next level wherein one of the views can be omitted during inference time with very slight performance reduction, or even without any reduction at all. The presented methodology enables multi-modal fault diagnosis systems to perform more robustly in case of sensor failure occurrence, and one can also intentionally omit one of the sensors (the more expensive one) in order to build a more cost-effective condition monitoring system without sacrificing performance for practical purposes. The effectiveness of the presented methodology is examined on a real-world private multi-modal dataset gathered under non-laboratory conditions from a complex engineered mechanism, an inline four-stroke spark-ignition engine, aiming for spark plug fault diagnosis. This dataset, which contains the accelerometer and acoustic signals as two modalities, has a very slight amount of fault, and achieving good performance on such a dataset promises that the presented method can perform well on other equipment as well.
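A hedged sketch of how such a model could be set up: two encoders (accelerometer and acoustic) map noisy inputs to a shared latent space, reconstruction of the clean signals provides the denoising objective, and an NT-Xent-style contrastive term aligns the two views so that either encoder alone can stand in for the fused representation at inference time. The layer sizes, loss weighting, noise model, and class/function names below are illustrative assumptions, not the paper's exact design.

```python
# Hypothetical sketch of a denoising multi-modal autoencoder with a
# contrastive alignment term; architecture and loss details are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

def mlp(d_in, d_out):
    return nn.Sequential(nn.Linear(d_in, 128), nn.ReLU(), nn.Linear(128, d_out))

class DenoisingMultiModalAE(nn.Module):
    def __init__(self, d_vib=256, d_ac=256, d_z=32):
        super().__init__()
        self.enc_vib, self.enc_ac = mlp(d_vib, d_z), mlp(d_ac, d_z)
        self.dec_vib, self.dec_ac = mlp(d_z, d_vib), mlp(d_z, d_ac)

    def forward(self, vib, ac):
        z_vib, z_ac = self.enc_vib(vib), self.enc_ac(ac)
        z = 0.5 * (z_vib + z_ac)                  # shared representation
        return z_vib, z_ac, self.dec_vib(z), self.dec_ac(z)

def contrastive_loss(z_a, z_b, temperature=0.1):
    """NT-Xent-style: matching accelerometer/acoustic pairs are positives."""
    z_a, z_b = F.normalize(z_a, dim=1), F.normalize(z_b, dim=1)
    logits = z_a @ z_b.T / temperature
    targets = torch.arange(z_a.shape[0])
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.T, targets))

model = DenoisingMultiModalAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
vib, ac = torch.randn(64, 256), torch.randn(64, 256)     # toy batch
for _ in range(3):
    noisy_vib = vib + 0.1 * torch.randn_like(vib)
    noisy_ac = ac + 0.1 * torch.randn_like(ac)
    z_vib, z_ac, rec_vib, rec_ac = model(noisy_vib, noisy_ac)
    loss = (F.mse_loss(rec_vib, vib) + F.mse_loss(rec_ac, ac)   # denoising
            + contrastive_loss(z_vib, z_ac))                    # view alignment
    opt.zero_grad(); loss.backward(); opt.step()

# Single-view inference: because the views are aligned, one encoder's code can
# approximate the shared representation when the other sensor is unavailable.
with torch.no_grad():
    z_single = model.enc_vib(vib)
```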

Machine learning’s own Industrial Revolution

  • paper_url: http://arxiv.org/abs/2311.02278
  • repo_url: https://github.com/Aryia-Behroziuan/References
  • paper_authors: Yuan Luo, Song Han, Jingjing Liu
  • for: Arguing that machine learning must first complete its own Industrial Revolution in order to meet ever-growing enterprise demands and empower a broad range of industries.
  • methods: A Perspective piece that elaborates on how machine learning can best achieve this goal, including standardized and automated assembly networks.
  • results: A discussion of expected trends, new opportunities, and challenges for rapidly translating machine learning's innovations into mass production and utilization across industries.
    Abstract Machine learning is expected to enable the next Industrial Revolution. However, lacking standardized and automated assembly networks, ML faces significant challenges to meet ever-growing enterprise demands and empower broad industries. In the Perspective, we argue that ML needs to first complete its own Industrial Revolution, elaborate on how to best achieve its goals, and discuss new opportunities to enable rapid translation from ML's innovation frontier to mass production and utilization.
    Summary: Machine learning is expected to drive the next Industrial Revolution. However, lacking standardized and automated assembly networks, machine learning faces major challenges in meeting ever-growing enterprise demands and in being deployed across many industries. In this Perspective, we argue that machine learning first needs to complete its own Industrial Revolution, elaborate on how best to achieve this goal, and discuss new opportunities for rapidly translating machine learning's innovation frontier into mass production and utilization.