cs.AI - 2023-09-03

Generative Social Choice

  • paper_url: http://arxiv.org/abs/2309.01291
  • repo_url: https://github.com/babatundeibukun/simple-social-learning-environment
  • paper_authors: Sara Fish, Paul Gölz, David C. Parkes, Ariel D. Procaccia, Gili Rusak, Itai Shapira, Manuel Wüthrich
  • for: This paper explores how AI can augment democratic processes, specifically how large language models can support collective decisions over textual statements rather than a few predetermined alternatives.
  • methods: It combines the mathematical rigor of social choice theory with large language models' ability to generate text and extrapolate preferences, proposing a generative social choice framework for complex collective decisions.
  • results: Applying the framework yields a slate of statements that is representative of opinions expressed as free-form text, for instance in an online deliberative process.
    Abstract Traditionally, social choice theory has only been applicable to choices among a few predetermined alternatives but not to more complex decisions such as collectively selecting a textual statement. We introduce generative social choice, a framework that combines the mathematical rigor of social choice theory with large language models' capability to generate text and extrapolate preferences. This framework divides the design of AI-augmented democratic processes into two components: first, proving that the process satisfies rigorous representation guarantees when given access to oracle queries; second, empirically validating that these queries can be approximately implemented using a large language model. We illustrate this framework by applying it to the problem of generating a slate of statements that is representative of opinions expressed as free-form text, for instance in an online deliberative process.
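A hedged sketch of the framework's second component, approximating the oracle queries with an LLM. The prompt wording, the greedy slate-building loop, and the `complete` / `endorses` helpers are illustrative assumptions, not the paper's exact queries:

```python
def endorses(opinion: str, statement: str, complete) -> bool:
    """Discriminative-style query: would this opinion's author endorse the statement?"""
    ans = complete(
        f"Opinion: {opinion}\nStatement: {statement}\n"
        "Would the opinion's author endorse the statement? Answer YES or NO."
    )
    return ans.strip().upper().startswith("YES")

def generative_query(opinions: list[str], complete) -> str:
    """Generative-style query: one statement representing a group of opinions."""
    prompt = (
        "Here are free-form opinions from several participants:\n"
        + "\n".join(f"- {o}" for o in opinions)
        + "\nWrite one concise statement that as many of these participants "
        "as possible would endorse."
    )
    return complete(prompt)

def build_slate(opinions, complete, k=5):
    """Greedily build a k-statement slate, each round targeting participants
    not yet represented by an earlier statement."""
    remaining, slate = list(opinions), []
    for _ in range(k):
        if not remaining:
            break
        statement = generative_query(remaining, complete)
        slate.append(statement)
        remaining = [o for o in remaining if not endorses(o, statement, complete)]
    return slate
```

Here `complete` is any text-in, text-out LLM call; the paper's contribution is proving representation guarantees for the process assuming such queries, then validating their LLM implementation empirically.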

Traveling Waves Encode the Recent Past and Enhance Sequence Learning

  • paper_url: http://arxiv.org/abs/2309.08045
  • repo_url: https://github.com/anon-neurips-2023/wave-rnn
  • paper_authors: T. Anderson Keller, Lyle Muller, Terrence Sejnowski, Max Welling
  • for: This paper aims to explain how neural activity in the cortical sheet could implement a short-term memory of sequential stimuli.
  • methods: It introduces a simple recurrent neural network model, the Wave-RNN (wRNN), that exhibits wave-like dynamics, showing that both connectivity constraints and initialization are crucial for the waves to emerge.
  • results: wRNNs learn faster and perform significantly better than wave-free counterparts on synthetic memory tasks; on more complex sequence-modeling tasks such as sequential image classification, they again outperform comparable wave-free RNNs with far fewer parameters and perform comparably to gated architectures such as LSTMs and GRUs.
    Abstract Traveling waves of neural activity have been observed throughout the brain at a diversity of regions and scales; however, their precise computational role is still debated. One physically grounded hypothesis suggests that the cortical sheet may act like a wave-field capable of storing a short-term memory of sequential stimuli through induced waves traveling across the cortical surface. To date, however, the computational implications of this idea have remained hypothetical due to the lack of a simple recurrent neural network architecture capable of exhibiting such waves. In this work, we introduce a model to fill this gap, which we denote the Wave-RNN (wRNN), and demonstrate how both connectivity constraints and initialization play a crucial role in the emergence of wave-like dynamics. We then empirically show how such an architecture indeed efficiently encodes the recent past through a suite of synthetic memory tasks where wRNNs learn faster and perform significantly better than wave-free counterparts. Finally, we explore the implications of this memory storage system on more complex sequence modeling tasks such as sequential image classification and find that wave-based models not only again outperform comparable wave-free RNNs while using significantly fewer parameters, but additionally perform comparably to more complex gated architectures such as LSTMs and GRUs. We conclude with a discussion of the implications of these results for both neuroscience and machine learning.
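A minimal PyTorch sketch of the wave idea, assuming a 1-D hidden "wave field" and a cyclic-shift initialization of the recurrent weights as one simple way to realize the paper's connectivity constraint; the sizes and choice of nonlinearity are illustrative, not the paper's exact architecture:

```python
import torch
import torch.nn as nn

class WaveRNN(nn.Module):
    def __init__(self, input_size: int, hidden_size: int):
        super().__init__()
        self.inp = nn.Linear(input_size, hidden_size)
        self.rec = nn.Linear(hidden_size, hidden_size, bias=False)
        # Shift-matrix initialization: unit i receives from unit i-1
        # (cyclically), so activity injected at one site travels across
        # the hidden units over time -- a traveling wave.
        with torch.no_grad():
            self.rec.weight.copy_(torch.roll(torch.eye(hidden_size), 1, dims=0))

    def forward(self, x):  # x: (T, B, input_size)
        h = x.new_zeros(x.shape[1], self.rec.weight.shape[0])
        states = []
        for t in range(x.shape[0]):
            h = torch.tanh(self.rec(h) + self.inp(x[t]))
            states.append(h)
        return torch.stack(states)  # (T, B, hidden_size): the wave field

waves = WaveRNN(8, 64)(torch.randn(20, 4, 8))  # inspect waves[:, 0] over time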

Bayesian inference of composition-dependent phase diagrams

  • paper_url: http://arxiv.org/abs/2309.01271
  • repo_url: None
  • paper_authors: Timofei Miryashkin, Olga Klimanova, Vladimir Ladygin, Alexander Shapeev
  • for: This paper was written to develop a method for constructing temperature-concentration phase diagrams for materials using Bayesian inference and molecular dynamics simulations.
  • methods: The paper uses Bayesian inference to combine thermodynamic data from molecular dynamics simulations, melting point simulations, and phonon calculations, and to extrapolate the results to the infinite-atom limit.
  • results: The paper reports the development of an algorithm that can be used to construct temperature-concentration phase diagrams for materials with a high degree of accuracy and precision, and demonstrates the effectiveness of the algorithm on two binary systems, Ge-Si and K-Na.
    Abstract Phase diagrams serve as a highly informative tool for materials design, encapsulating information about the phases that a material can manifest under specific conditions. In this work, we develop a method in which Bayesian inference is employed to combine thermodynamic data from molecular dynamics (MD), melting point simulations, and phonon calculations, process these data, and yield a temperature-concentration phase diagram. The employed Bayesian framework yields not only the free energies of different phases as functions of temperature and concentration but also the uncertainties of these free energies originating from statistical errors inherent to finite-length MD trajectories. Furthermore, it extrapolates the results of the finite-atom calculations to the infinite-atom limit and facilitates the choice of the temperature, chemical potentials, and number of atoms with which to conduct the next simulation so as to most efficiently reduce the uncertainty of the phase diagram. The developed algorithm was successfully tested on two binary systems, Ge-Si and K-Na, in the full range of concentrations and temperatures.
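A toy numpy sketch of the uncertainty-aware step, assuming synthetic free-energy posteriors for two phases (the actual method infers these from MD, melting-point, and phonon data); Monte Carlo over the posteriors yields phase-stability probabilities and picks the most informative next simulation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic posterior means/stds of two phases' free energies on a
# (temperature, concentration) grid, standing in for MD-derived estimates.
T = np.linspace(300.0, 1200.0, 50)          # temperature (K)
x = np.linspace(0.0, 1.0, 50)               # concentration
TT, XX = np.meshgrid(T, x, indexing="ij")
mean = {"solid": 0.10 * XX - 1e-4 * TT, "liquid": 0.05 - 2e-4 * TT}
std = {phase: 0.005 * np.ones_like(TT) for phase in mean}

# Monte Carlo over the free-energy posteriors: probability that the solid
# phase is stable at each (T, x), i.e., an uncertainty-aware phase diagram.
draws = 500
p_solid = np.zeros_like(TT)
for _ in range(draws):
    g = {phase: rng.normal(mean[phase], std[phase]) for phase in mean}
    p_solid += g["solid"] < g["liquid"]
p_solid /= draws

# The next simulation is most efficient where the diagram is least certain.
i, j = np.unravel_index(np.argmin(np.abs(p_solid - 0.5)), TT.shape)
print(f"most informative next simulation near T={T[i]:.0f} K, x={x[j]:.2f}")
```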

COMEDIAN: Self-Supervised Learning and Knowledge Distillation for Action Spotting using Transformers

  • paper_url: http://arxiv.org/abs/2309.01270
  • repo_url: https://github.com/juliendenize/eztorch
  • paper_authors: Julien Denize, Mykola Liashuha, Jaonary Rabarisoa, Astrid Orcesi, Romain Hérault
  • for: This paper proposes COMEDIAN, an initialization pipeline for action spotting that combines self-supervised learning and knowledge distillation.
  • methods: The pipeline has two initialization stages: a spatial transformer is first initialized with self-supervised learning on short video inputs; a temporal transformer that enhances the spatial transformer's outputs with global context is then initialized by knowledge distillation from a pre-computed feature bank, before a final fine-tuning step.
  • results: Experiments show that COMEDIAN's pretraining reaches state-of-the-art performance on the SoccerNet-v2 dataset and converges faster than non-pretrained models, validating the effectiveness of the pretraining pipeline.
    Abstract We present COMEDIAN, a novel pipeline to initialize spatio-temporal transformers for action spotting, which involves self-supervised learning and knowledge distillation. Action spotting is a timestamp-level temporal action detection task. Our pipeline consists of three steps, with two initialization stages. First, we perform self-supervised initialization of a spatial transformer using short videos as input. Additionally, we initialize a temporal transformer that enhances the spatial transformer's outputs with global context through knowledge distillation from a pre-computed feature bank aligned with each short video segment. In the final step, we fine-tune the transformers to the action spotting task. The experiments, conducted on the SoccerNet-v2 dataset, demonstrate state-of-the-art performance and validate the effectiveness of COMEDIAN's pretraining paradigm. Our results highlight several advantages of our pretraining pipeline, including improved performance and faster convergence compared to non-pretrained models.
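A hedged PyTorch sketch of the second initialization stage, where a temporal transformer learns global context by matching a pre-computed feature bank; the shapes, the frozen spatial encoder, and the MSE matching loss are assumptions standing in for the paper's exact distillation objective:

```python
import torch
import torch.nn.functional as F

def distillation_step(spatial_tf, temporal_tf, clips, bank_feats, opt):
    """Train the temporal transformer to enrich per-clip spatial features
    with global context by matching a pre-computed feature bank aligned
    with each short video segment (the spatial transformer stays frozen)."""
    with torch.no_grad():
        z = spatial_tf(clips)            # (B, T, D) per-clip spatial features
    ctx = temporal_tf(z)                 # (B, T, D) context-enhanced features
    loss = F.mse_loss(ctx, bank_feats)   # distill toward the feature bank
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```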

Learning-Aware Safety for Interactive Autonomy

  • paper_url: http://arxiv.org/abs/2309.01267
  • repo_url: None
  • paper_authors: Haimin Hu, Zixu Zhang, Kensuke Nakamura, Andrea Bajcsy, Jaime F. Fisac
  • for: This work proposes a new closed-loop paradigm for synthesizing safe control policies so that robotic systems remain safe while learning and adapting at runtime.
  • methods: The method uses adversarial deep reinforcement learning to reason about possible future scenarios, jointly accounting for the physical dynamics and the evolution of the robot learning algorithm's internal belief.
  • results: The approach makes safety analysis tractable and scales to high dimensions; the authors further demonstrate that it works with both Bayesian belief propagation and a large pre-trained neural trajectory predictor.
    Abstract One of the outstanding challenges for the widespread deployment of robotic systems like autonomous vehicles is ensuring safe interaction with humans without sacrificing efficiency. Existing safety analysis methods often neglect the robot's ability to learn and adapt at runtime, leading to overly conservative behavior. This paper proposes a new closed-loop paradigm for synthesizing safe control policies that explicitly account for the system's evolving uncertainty under possible future scenarios. The formulation reasons jointly about the physical dynamics and the robot's learning algorithm, which updates its internal belief over time. We leverage adversarial deep reinforcement learning (RL) for scaling to high dimensions, enabling tractable safety analysis even for implicit learning dynamics induced by state-of-the-art prediction models. We demonstrate our framework's ability to work with both Bayesian belief propagation and the implicit learning induced by a large pre-trained neural trajectory predictor.

Large AI Model Empowered Multimodal Semantic Communications

  • paper_url: http://arxiv.org/abs/2309.01249
  • repo_url: None
  • paper_authors: Feibo Jiang, Yubo Peng, Li Dong, Kezhi Wang, Kun Yang, Cunhua Pan, Xiaohu You
  • for: To provide an immersive, low-latency, high-quality Semantic Communication (SC) experience over multimodal signals (text, audio, image, and video).
  • methods: The work leverages large AI models, specifically a Multimodal Language Model (MLM) and a Large Language Model (LLM), to address data heterogeneity, semantic ambiguity, and signal fading.
  • results: It proposes a Large AI Model-based Multimodal SC (LAM-MSC) framework comprising MLM-based Multimodal Alignment (MMA), a personalized LLM-based Knowledge Base (LKB), and Conditional Generative Adversarial Networks-based Channel Estimation (CGE); simulations show the framework effectively improves SC performance.
    Abstract Multimodal signals, including text, audio, image and video, can be integrated into Semantic Communication (SC) for providing an immersive experience with low latency and high quality at the semantic level. However, the multimodal SC has several challenges, including data heterogeneity, semantic ambiguity, and signal fading. Recent advancements in large AI models, particularly in Multimodal Language Model (MLM) and Large Language Model (LLM), offer potential solutions for these issues. To this end, we propose a Large AI Model-based Multimodal SC (LAM-MSC) framework, in which we first present the MLM-based Multimodal Alignment (MMA) that utilizes the MLM to enable the transformation between multimodal and unimodal data while preserving semantic consistency. Then, a personalized LLM-based Knowledge Base (LKB) is proposed, which allows users to perform personalized semantic extraction or recovery through the LLM. This effectively addresses the semantic ambiguity. Finally, we apply the Conditional Generative adversarial networks-based channel Estimation (CGE) to obtain Channel State Information (CSI). This approach effectively mitigates the impact of fading channels in SC. Finally, we conduct simulations that demonstrate the superior performance of the LAM-MSC framework.

Representations Matter: Embedding Modes of Large Language Models using Dynamic Mode Decomposition

  • paper_url: http://arxiv.org/abs/2309.01245
  • repo_url: None
  • paper_authors: Mohamed Akrout
  • for: This work aims to detect "hallucinated" content generated by large language models (LLMs), i.e., fabricated text that is not grounded in fact.
  • methods: It uses the dynamic mode decomposition (DMD) tool to analyze how the modes of text embeddings evolve across the sentences of generated text.
  • results: The spectrum of sentence embeddings over paragraphs is consistently low-rank for generated text, unlike that of ground-truth text; cases with LLM hallucinations correspond to ground-truth embedding patterns whose many modes are poorly approximated by the few LLM embedding modes, suggesting hallucinations stem from both the generation techniques and the underlying representation.
    Abstract Existing large language models (LLMs) are known for generating "hallucinated" content, namely a fabricated text of plausibly looking, yet unfounded, facts. To identify when these hallucination scenarios occur, we examine the properties of the generated text in the embedding space. Specifically, we draw inspiration from the dynamic mode decomposition (DMD) tool in analyzing the pattern evolution of text embeddings across sentences. We empirically demonstrate how the spectrum of sentence embeddings over paragraphs is constantly low-rank for the generated text, unlike that of the ground-truth text. Importantly, we find that evaluation cases having LLM hallucinations correspond to ground-truth embedding patterns with a higher number of modes being poorly approximated by the few modes associated with LLM embedding patterns. In analogy to near-field electromagnetic evanescent waves, the embedding DMD eigenmodes of the generated text with hallucinations vanishes quickly across sentences as opposed to those of the ground-truth text. This suggests that the hallucinations result from both the generation techniques and the underlying representation.
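A minimal numpy sketch of exact DMD applied to a sentence-embedding sequence, the analysis tool the paper builds on; the rank threshold and returned quantities are illustrative choices:

```python
import numpy as np

def dmd_modes(E: np.ndarray, rank: int | None = None):
    """Exact DMD on sentence embeddings E (d x n, one column per sentence):
    fit a linear operator with E[:, 1:] ~ A @ E[:, :-1] and return its
    eigenvalues plus the singular spectrum of the data. A fast-decaying
    spectrum and quickly vanishing modes across sentences are the low-rank
    signature the paper associates with hallucinated text."""
    X, Y = E[:, :-1], E[:, 1:]
    U, s, Vh = np.linalg.svd(X, full_matrices=False)
    r = rank or int(np.sum(s > 1e-10 * s[0]))
    U, s, Vh = U[:, :r], s[:r], Vh[:r]
    A_tilde = U.T @ Y @ Vh.T @ np.diag(1.0 / s)  # reduced-order operator
    return np.linalg.eigvals(A_tilde), s
```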

Saturn: An Optimized Data System for Large Model Deep Learning Workloads

  • paper_url: http://arxiv.org/abs/2309.01226
  • repo_url: https://github.com/knagrecha/saturn
  • paper_authors: Kabir Nagrecha, Arun Kumar
  • for: This work helps deep learning (DL) users such as data scientists select and run large models, tackling three burdens they face: parallelism selection, resource apportioning, and task scheduling.
  • methods: It proposes a new information system architecture that formalizes the three burdens as a joint problem (SPASE: Select a Parallelism, Allocate resources, and SchedulE), combining an extensible template for parallelism schemes, an automated empirical profiler for runtime estimation, an MILP formulation, and an introspective scheduling approach.
  • results: Experiments show that directly using an MILP solver is significantly more effective than baseline heuristics, and that the resulting data system, Saturn, reduces model selection runtimes by 39-49% compared to typical current DL practice.
    Abstract Large language models such as GPT-3 & ChatGPT have transformed deep learning (DL), powering applications that have captured the public's imagination. These models are rapidly being adopted across domains for analytics on various modalities, often by finetuning pre-trained base models. Such models need multiple GPUs due to both their size and computational load, driving the development of a bevy of "model parallelism" techniques & tools. Navigating such parallelism choices, however, is a new burden for end users of DL such as data scientists, domain scientists, etc. who may lack the necessary systems knowhow. The need for model selection, which leads to many models to train due to hyper-parameter tuning or layer-wise finetuning, compounds the situation with two more burdens: resource apportioning and scheduling. In this work, we tackle these three burdens for DL users in a unified manner by formalizing them as a joint problem that we call SPASE: Select a Parallelism, Allocate resources, and SchedulE. We propose a new information system architecture to tackle the SPASE problem holistically, representing a key step toward enabling wider adoption of large DL models. We devise an extensible template for existing parallelism schemes and combine it with an automated empirical profiler for runtime estimation. We then formulate SPASE as an MILP. We find that direct use of an MILP-solver is significantly more effective than several baseline heuristics. We optimize the system runtime further with an introspective scheduling approach. We implement all these techniques into a new data system we call Saturn. Experiments with benchmark DL workloads show that Saturn achieves 39-49% lower model selection runtimes than typical current DL practice.
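A toy PuLP sketch of the MILP flavor behind SPASE: choose one parallelism scheme per model under a GPU budget while minimizing a makespan-like bound. The profiled runtimes, the memory-free simplification, and the constraint set are illustrative assumptions, not Saturn's actual formulation:

```python
import pulp

models = ["m1", "m2", "m3"]
schemes = ["data-parallel", "pipeline", "fsdp"]
# Stand-ins for the automated profiler's runtime (hours) and GPU estimates.
runtime = {(m, s): 10 + 3 * i + 2 * j
           for i, m in enumerate(models) for j, s in enumerate(schemes)}
gpus = {(m, s): 2 + j
        for m in models for j, s in enumerate(schemes)}
GPU_BUDGET = 8

prob = pulp.LpProblem("spase_toy", pulp.LpMinimize)
pick = pulp.LpVariable.dicts("pick", runtime.keys(), cat="Binary")
makespan = pulp.LpVariable("makespan", lowBound=0)
prob += makespan                                   # objective: minimize makespan
for m in models:                                   # exactly one scheme per model
    prob += pulp.lpSum(pick[m, s] for s in schemes) == 1
prob += pulp.lpSum(gpus[k] * pick[k] for k in gpus) <= GPU_BUDGET
for m in models:                                   # makespan bounds every job
    prob += makespan >= pulp.lpSum(runtime[m, s] * pick[m, s] for s in schemes)
prob.solve(pulp.PULP_CBC_CMD(msg=False))
print([k for k, v in pick.items() if v.value() == 1])
```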

Siren’s Song in the AI Ocean: A Survey on Hallucination in Large Language Models

  • paper_url: http://arxiv.org/abs/2309.01219
  • repo_url: https://github.com/hillzhang1999/llm-hallucination-survey
  • paper_authors: Yue Zhang, Yafu Li, Leyang Cui, Deng Cai, Lemao Liu, Tingchen Fu, Xinting Huang, Enbo Zhao, Yu Zhang, Yulong Chen, Longyue Wang, Anh Tuan Luu, Wei Bi, Freda Shi, Shuming Shi
  • for: This survey examines the reliability challenge that occasional hallucination poses for large language models (LLMs) in real-world applications.
  • methods: It reviews and analyzes existing methods for detecting, explaining, and mitigating LLM hallucination, and discusses directions for future research.
  • results: The survey presents taxonomies of LLM hallucination phenomena and evaluation benchmarks, analyzes the effectiveness of existing mitigation approaches, and identifies promising avenues for future work.
    Abstract While large language models (LLMs) have demonstrated remarkable capabilities across a range of downstream tasks, a significant concern revolves around their propensity to exhibit hallucinations: LLMs occasionally generate content that diverges from the user input, contradicts previously generated context, or misaligns with established world knowledge. This phenomenon poses a substantial challenge to the reliability of LLMs in real-world scenarios. In this paper, we survey recent efforts on the detection, explanation, and mitigation of hallucination, with an emphasis on the unique challenges posed by LLMs. We present taxonomies of the LLM hallucination phenomena and evaluation benchmarks, analyze existing approaches aiming at mitigating LLM hallucination, and discuss potential directions for future research.

Physics-inspired Neural Networks for Parameter Learning of Adaptive Cruise Control Systems

  • paper_url: http://arxiv.org/abs/2309.01211
  • repo_url: None
  • paper_authors: Theocharis Apostolakis, Konstantinos Ampountolas
  • for: This work proposes a physics-inspired neural network (PiNN) for learning the parameters of commercially implemented adaptive cruise control (ACC) systems.
  • methods: It uses multi-layer artificial neural networks as universal function approximators and adopts the constant time-headway policy (CTHP) to model the longitudinal dynamics of ACC-engaged vehicles.
  • results: The PiNN efficiently learns the unknown parameters of stock ACC systems, rigorously evaluated on data from different car manufacturers; the inferred parameters reveal that the ACC systems considered are neither $L_2$ nor $L_\infty$ string stable.
    Abstract This paper proposes and develops a physics-inspired neural network (PiNN) for learning the parameters of commercially implemented adaptive cruise control (ACC) systems in the automotive industry. To emulate the core functionality of stock ACC systems, which have proprietary control logic and undisclosed parameters, the constant time-headway policy (CTHP) is adopted. Leveraging the multi-layer artificial neural networks as universal approximators, the developed PiNN serves as a surrogate model for the longitudinal dynamics of ACC-engaged vehicles, efficiently learning the unknown parameters of the CTHP. The ability of the PiNN to infer the unknown ACC parameters is meticulously evaluated using both synthetic and high-fidelity empirical data of space-gap and relative velocity involving ACC-engaged vehicles in platoon formation. The results have demonstrated the superior predictive ability of the proposed PiNN in learning the unknown design parameters of stock ACC systems from different car manufacturers. The set of ACC model parameters obtained from the PiNN revealed that the stock ACC systems of the considered vehicles in three experimental campaigns are neither $L_2$ nor $L_\infty$ string stable.
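A hedged PyTorch sketch of the physics component, assuming a common CTHP form a = k_p (s - s_0 - t_h v) + k_d (v_lead - v) with the unknown ACC parameters as learnable scalars; the paper's exact policy parametrization may differ, and the initial values below are guesses:

```python
import torch

class CTHPPolicy(torch.nn.Module):
    """Constant time-headway policy with learnable ACC design parameters."""
    def __init__(self):
        super().__init__()
        self.kp = torch.nn.Parameter(torch.tensor(0.1))   # gap-error gain
        self.kd = torch.nn.Parameter(torch.tensor(0.5))   # speed-error gain
        self.th = torch.nn.Parameter(torch.tensor(1.5))   # time headway (s)
        self.s0 = torch.nn.Parameter(torch.tensor(2.0))   # standstill gap (m)

    def forward(self, s, v, v_lead):
        return self.kp * (s - self.s0 - self.th * v) + self.kd * (v_lead - v)

def physics_residual(policy, s, v, v_lead, a_observed):
    # Fitting this residual to observed space-gap / relative-velocity
    # trajectories recovers the stock controller's design parameters.
    return torch.mean((policy(s, v, v_lead) - a_observed) ** 2)
```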

A Visual Interpretation-Based Self-Improved Classification System Using Virtual Adversarial Training

  • paper_url: http://arxiv.org/abs/2309.01196
  • repo_url: None
  • paper_authors: Shuai Jiang, Sayaka Kamei, Chen Li, Shengzhe Hou, Yasuhiko Morimoto
  • for: This paper proposes a visual interpretation-based self-improving classification model to address the black-box nature and low robustness of BERT-based classifiers in natural language processing.
  • methods: A fine-tuned BERT model first classifies the sentiment of the text; the predicted sentiment labels then form part of the input to another BERT model for spam classification, trained in a semi-supervised manner with virtual adversarial training (VAT); visualization techniques (word importance and normalized attention-head matrices) analyze each component's relevance.
  • results: Experiments on a Twitter tweet dataset demonstrate effective classification performance, and an ablation study shows how each component of the model affects the classification results.
    Abstract The successful application of large pre-trained models such as BERT in natural language processing has attracted more attention from researchers. Since BERT typically acts as an end-to-end black box, classification systems based on it usually have difficulty in interpretation and low robustness. This paper proposes a visual interpretation-based self-improving classification model with a combination of virtual adversarial training (VAT) and BERT models to address the above problems. Specifically, a fine-tuned BERT model is used as a classifier to classify the sentiment of the text. Then, the predicted sentiment classification labels are used as part of the input of another BERT for spam classification via a semi-supervised training manner using VAT. Additionally, visualization techniques, including visualizing the importance of words and normalizing the attention head matrix, are employed to analyze the relevance of each component to classification accuracy. Moreover, the visual analysis uncovers brand-new features, which further improve classification performance. Experimental results on Twitter's tweet dataset demonstrate the effectiveness of the proposed model on the classification task. Furthermore, the ablation study results illustrate the effect of different components of the proposed model on the classification results.
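A minimal PyTorch sketch of the VAT regularizer (Miyato et al.) used for the semi-supervised training; for BERT, x would be the embedded inputs so the perturbation lives in embedding space, and the hyperparameters here are typical defaults, not the paper's:

```python
import torch
import torch.nn.functional as F

def vat_loss(model, x, xi=1e-6, eps=2.5, n_power=1):
    """Find the small perturbation of x that most changes the model's
    prediction (via power iteration on the KL divergence) and penalize
    the prediction change under that perturbation."""
    with torch.no_grad():
        p = F.softmax(model(x), dim=-1)          # current predictions
    d = torch.randn_like(x)
    for _ in range(n_power):                     # power iteration
        d = (xi * F.normalize(d.flatten(1), dim=1).view_as(x)).requires_grad_(True)
        kl = F.kl_div(F.log_softmax(model(x + d), dim=-1), p, reduction="batchmean")
        d = torch.autograd.grad(kl, d)[0].detach()
    r_adv = eps * F.normalize(d.flatten(1), dim=1).view_as(x)
    return F.kl_div(F.log_softmax(model(x + r_adv), dim=-1), p, reduction="batchmean")
```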

A Survey on Service Route and Time Prediction in Instant Delivery: Taxonomy, Progress, and Prospects

  • paper_url: http://arxiv.org/abs/2309.01194
  • repo_url: None
  • paper_authors: Haomin Wen, Youfang Lin, Lixia Wu, Xiaowei Mao, Tianyue Cai, Yunfeng Hou, Shengnan Guo, Yuxuan Liang, Guangyin Jin, Yiji Zhao, Roger Zimmermann, Jieping Ye, Huaiyu Wan
  • for: This paper provides a systematic overview of service Route&Time Prediction (RTP) for instant delivery platforms, to help researchers navigate the field.
  • methods: It introduces a novel taxonomy that categorizes RTP methods along three criteria: task type (only-route prediction, only-time prediction, and joint route&time prediction), model architecture (sequence-based and graph-based), and learning paradigm (supervised learning and deep reinforcement learning).
  • results: The survey comprehensively categorizes and summarizes existing RTP methods, highlights the limitations of current research, and suggests prospective directions for future work.
    Abstract Instant delivery services, such as food delivery and package delivery, have achieved explosive growth in recent years by providing customers with daily-life convenience. An emerging research area within these services is service Route\&Time Prediction (RTP), which aims to estimate the future service route as well as the arrival time of a given worker. As one of the most crucial tasks in those service platforms, RTP stands central to enhancing user satisfaction and trimming operational expenditures on these platforms. Despite a plethora of algorithms developed to date, there is no systematic, comprehensive survey to guide researchers in this domain. To fill this gap, our work presents the first comprehensive survey that methodically categorizes recent advances in service route and time prediction. We start by defining the RTP challenge and then delve into the metrics that are often employed. Following that, we scrutinize the existing RTP methodologies, presenting a novel taxonomy of them. We categorize these methods based on three criteria: (i) type of task, subdivided into only-route prediction, only-time prediction, and joint route\&time prediction; (ii) model architecture, which encompasses sequence-based and graph-based models; and (iii) learning paradigm, including Supervised Learning (SL) and Deep Reinforcement Learning (DRL). Conclusively, we highlight the limitations of current research and suggest prospective avenues. We believe that the taxonomy, progress, and prospects introduced in this paper can significantly promote the development of this field.

LogGPT: Exploring ChatGPT for Log-Based Anomaly Detection

  • paper_url: http://arxiv.org/abs/2309.01189
  • repo_url: None
  • paper_authors: Jiaxing Qi, Shaohan Huang, Zhongzhi Luan, Carol Fung, Hailong Yang, Depei Qian
  • for: This work proposes a ChatGPT-based method for log-based anomaly detection, addressing the difficulty of analyzing high-dimensional and noisy log data.
  • methods: It leverages ChatGPT's language interpretation capabilities to explore the transferability of knowledge from large-scale corpora to log-based anomaly detection.
  • results: Experiments show that LogGPT achieves promising results with good interpretability, providing preliminary insights into prompt-based models for the log-based anomaly detection task.
    Abstract The increasing volume of log data produced by software-intensive systems makes it impractical to analyze them manually. Many deep learning-based methods have been proposed for log-based anomaly detection. These methods face several challenges such as high-dimensional and noisy log data, class imbalance, generalization, and model interpretability. Recently, ChatGPT has shown promising results in various domains. However, there is still a lack of study on the application of ChatGPT for log-based anomaly detection. In this work, we proposed LogGPT, a log-based anomaly detection framework based on ChatGPT. By leveraging the ChatGPT's language interpretation capabilities, LogGPT aims to explore the transferability of knowledge from large-scale corpora to log-based anomaly detection. We conduct experiments to evaluate the performance of LogGPT and compare it with three deep learning-based methods on BGL and Spirit datasets. LogGPT shows promising results and has good interpretability. This study provides preliminary insights into prompt-based models, such as ChatGPT, for the log-based anomaly detection task.
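A hedged sketch of what a prompt-based detector of this kind can look like; the paper's actual prompts are not reproduced here, and `complete` stands in for any chat-completion call:

```python
def detect_anomaly(log_lines, complete):
    """Ask the LLM to judge a log session; the few-shot-free template
    below is an illustrative assumption, not LogGPT's exact prompt."""
    prompt = (
        "You are a system-log analyst. Decide whether the log session "
        "below is NORMAL or ANOMALOUS, then explain briefly.\n\n"
        "Session:\n" + "\n".join(log_lines) + "\n\nAnswer:"
    )
    answer = complete(prompt)
    return answer.strip().upper().startswith("ANOMAL"), answer
```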

Pre-trained Neural Recommenders: A Transferable Zero-Shot Framework for Recommendation Systems

  • paper_url: http://arxiv.org/abs/2309.01188
  • repo_url: None
  • paper_authors: Junting Wang, Adit Krishnan, Hari Sundaram, Yunzhe Li
  • for: This work develops pre-trained neural recommender models to support building recommender systems in new domains, extending the modern neural collaborative filtering techniques that underpin e-commerce, social media, and content-sharing platforms.
  • methods: Inspired by pre-trained vision and language models, it explores pre-trained recommender models that transfer to new domains with minimal or no retraining and without any auxiliary user or item information.
  • results: The paper shows that zero-shot recommendation models can be learned from the statistical characteristics of the user-item interaction matrix alone, without user or item auxiliary information, and that these models adapt across domains and datasets.
    Abstract Modern neural collaborative filtering techniques are critical to the success of e-commerce, social media, and content-sharing platforms. However, despite technical advances -- for every new application domain, we need to train an NCF model from scratch. In contrast, pre-trained vision and language models are routinely applied to diverse applications directly (zero-shot) or with limited fine-tuning. Inspired by the impact of pre-trained models, we explore the possibility of pre-trained recommender models that support building recommender systems in new domains, with minimal or no retraining, without the use of any auxiliary user or item information. Zero-shot recommendation without auxiliary information is challenging because we cannot form associations between users and items across datasets when there are no overlapping users or items. Our fundamental insight is that the statistical characteristics of the user-item interaction matrix are universally available across different domains and datasets. Thus, we use the statistical characteristics of the user-item interaction matrix to identify dataset-independent representations for users and items. We show how to learn universal (i.e., supporting zero-shot adaptation without user or item auxiliary information) representations for nodes and edges from the bipartite user-item interaction graph. We learn representations by exploiting the statistical properties of the interaction data, including user and item marginals, and the size and density distributions of their clusters.
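A small numpy/scipy sketch of the core insight: describe each user only by dataset-independent statistics of the interaction matrix (degree plus a histogram over neighbor degrees). The exact features and encoders in the paper differ, so treat this as an illustration:

```python
import numpy as np
from scipy.sparse import csr_matrix

def statistical_user_features(R: csr_matrix, bins: int = 16) -> np.ndarray:
    """Represent each user by (degree, histogram of their items' popularity).
    Such statistics exist in any user-item interaction matrix, so the
    representation transfers across datasets with disjoint users and items."""
    u_deg = np.asarray(R.sum(axis=1)).ravel()      # user marginals
    i_deg = np.asarray(R.sum(axis=0)).ravel()      # item marginals
    edges = np.linspace(0, i_deg.max() + 1, bins + 1)
    feats = []
    for u in range(R.shape[0]):
        items = R.getrow(u).indices                # items user u interacted with
        hist, _ = np.histogram(i_deg[items], bins=edges)
        feats.append(np.concatenate(([u_deg[u]], hist / max(len(items), 1))))
    return np.vstack(feats)
```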

Cognition-Mode Aware Variational Representation Learning Framework for Knowledge Tracing

  • paper_url: http://arxiv.org/abs/2309.01179
  • repo_url: https://github.com/zmy-9/CMVF
  • paper_authors: Moyu Zhang, Xinning Zhu, Chunhong Zhang, Feng Pan, Wenchen Qian, Hui Zhao
  • for: This paper targets the knowledge tracing (KT) task in personalized learning, addressing its data sparsity problem so that robust representations can be learned even for students with few practice records.
  • methods: It proposes a Cognition-Mode Aware Variational Representation Learning Framework (CMVF) that can be directly applied to existing KT methods: a probabilistic model generates a distribution for each student, estimated via variational inference (VI) to account for the uncertainty of limited practice records, while a cognition-mode-aware multinomial distribution serves as prior knowledge constraining the posteriors, so that students with similar cognition modes have similar distributions and students with few records are not over-personalized.
  • results: Extensive experiments confirm that CMVF effectively helps existing KT methods learn more robust student representations.
    Abstract The Knowledge Tracing (KT) task plays a crucial role in personalized learning, and its purpose is to predict student responses based on their historical practice behavior sequence. However, the KT task suffers from data sparsity, which makes it challenging to learn robust representations for students with few practice records and increases the risk of model overfitting. Therefore, in this paper, we propose a Cognition-Mode Aware Variational Representation Learning Framework (CMVF) that can be directly applied to existing KT methods. Our framework uses a probabilistic model to generate a distribution for each student, accounting for uncertainty in those with limited practice records, and estimate the student's distribution via variational inference (VI). In addition, we also introduce a cognition-mode aware multinomial distribution as prior knowledge that constrains the posterior student distributions learning, so as to ensure that students with similar cognition modes have similar distributions, avoiding overwhelming personalization for students with few practice records. At last, extensive experimental results confirm that CMVF can effectively aid existing KT methods in learning more robust student representations. Our code is available at https://github.com/zmy-9/CMVF.

Logic of subjective probability

  • paper_url: http://arxiv.org/abs/2309.01173
  • repo_url: None
  • paper_authors: Vladimir Vovk
  • for: This paper studies both the syntax and semantics of subjective probability.
  • methods: It examines ways of testing probability statements, covering important varieties of subjective probabilities including intersubjective probabilities and impersonal probabilities.
  • results: The paper argues that well-tested impersonal probabilities acquire features of objective probabilities, invokes Jeffreys's law in support of this idea, and discusses connections between subjective and frequentist probability.
    Abstract In this paper I discuss both syntax and semantics of subjective probability. The semantics determines ways of testing probability statements. Among important varieties of subjective probabilities are intersubjective probabilities and impersonal probabilities, and I will argue that well-tested impersonal probabilities acquire features of objective probabilities. Jeffreys's law, my next topic, states that two successful probability forecasters must issue forecasts that are close to each other, thus supporting the idea of objective probabilities. Finally, I will discuss connections between subjective and frequentist probability.

FusionAI: Decentralized Training and Deploying LLMs with Massive Consumer-Level GPUs

  • paper_url: http://arxiv.org/abs/2309.01172
  • repo_url: None
  • paper_authors: Zhenheng Tang, Yuxin Wang, Xin He, Longteng Zhang, Xinglin Pan, Qiang Wang, Rongfei Zeng, Kaiyong Zhao, Shaohuai Shi, Bingsheng He, Xiaowen Chu
  • for: This paper addresses the rapidly growing memory and computation requirements of large language models (LLMs), which prevent people without large-scale high-end GPUs from training or deploying them. Consumer-level GPUs, despite their larger market share, are typically overlooked for LLMs due to weaker compute, smaller storage, and lower communication bandwidth; users may also have privacy concerns when interacting with remote LLMs.
  • methods: It envisions a decentralized system that unlocks the vast untapped potential of consumer-level GPUs for pre-training, inference, and fine-tuning of LLMs with privacy protection, while confronting critical challenges: limited CPU and GPU memory, low network bandwidth, and the variability of peer and device heterogeneity.
  • results: The system design incorporates: 1) a broker with a backup pool for dynamic join and quit of computing providers; 2) task scheduling based on hardware performance to improve system efficiency; 3) abstraction of ML procedures into directed acyclic graphs (DAGs) for model and task universality. The performance analysis shows that 50 RTX 3080 GPUs can achieve throughputs comparable to those of 4 far more expensive H100 GPUs.
    Abstract The rapid growth of memory and computation requirements of large language models (LLMs) has outpaced the development of hardware, hindering people who lack large-scale high-end GPUs from training or deploying LLMs. However, consumer-level GPUs, which constitute a larger market share, are typically overlooked in LLM due to their weaker computing performance, smaller storage capacity, and lower communication bandwidth. Additionally, users may have privacy concerns when interacting with remote LLMs. In this paper, we envision a decentralized system unlocking the potential vast untapped consumer-level GPUs in pre-training, inference and fine-tuning of LLMs with privacy protection. However, this system faces critical challenges, including limited CPU and GPU memory, low network bandwidth, the variability of peer and device heterogeneity. To address these challenges, our system design incorporates: 1) a broker with backup pool to implement dynamic join and quit of computing providers; 2) task scheduling with hardware performance to improve system efficiency; 3) abstracting ML procedures into directed acyclic graphs (DAGs) to achieve model and task universality; 4) abstracting intermediate represention and execution planes to ensure compatibility of various devices and deep learning (DL) frameworks. Our performance analysis demonstrates that 50 RTX 3080 GPUs can achieve throughputs comparable to those of 4 H100 GPUs, which are significantly more expensive.

End-to-End Learning on Multimodal Knowledge Graphs

  • paper_url: http://arxiv.org/abs/2309.01169
  • repo_url: https://gitlab.com/wxwilcke/mrgcn
  • paper_authors: W. X. Wilcke, P. Bloem, V. de Boer, R. H. van t Veer
  • for: This paper aims to enable data scientists to learn end-to-end on heterogeneous knowledge by proposing a multimodal message passing network that can learn from the structure of graphs and multimodal node features.
  • methods: The proposed model uses dedicated neural encoders to learn embeddings for node features belonging to five different types of modalities, which are then projected into a joint representation space together with their relational information.
  • results: The authors implement and demonstrate their model on node classification and link prediction for artificial and real-world datasets, and conduct an inverse ablation study to evaluate the effect that each modality has on the overall performance. The results show that end-to-end multimodal learning from any arbitrary knowledge graph is possible, and that including multimodal information can significantly affect performance, but much depends on the characteristics of the data.
    Abstract Knowledge graphs enable data scientists to learn end-to-end on heterogeneous knowledge. However, most end-to-end models solely learn from the relational information encoded in graphs' structure: raw values, encoded as literal nodes, are either omitted completely or treated as regular nodes without consideration for their values. In either case we lose potentially relevant information which could have otherwise been exploited by our learning methods. We propose a multimodal message passing network which not only learns end-to-end from the structure of graphs, but also from their possibly diverse set of multimodal node features. Our model uses dedicated (neural) encoders to naturally learn embeddings for node features belonging to five different types of modalities, including numbers, texts, dates, images and geometries, which are projected into a joint representation space together with their relational information. We implement and demonstrate our model on node classification and link prediction for artificial and real-world datasets, and evaluate the effect that each modality has on the overall performance in an inverse ablation study. Our results indicate that end-to-end multimodal learning from any arbitrary knowledge graph is indeed possible, and that including multimodal information can significantly affect performance, but that much depends on the characteristics of the data.

Spatial-temporal Vehicle Re-identification

  • paper_url: http://arxiv.org/abs/2309.01166
  • repo_url: https://github.com/Zhongdao/VehicleReIDKeyPointData
  • paper_authors: Hye-Geun Kim, YouKyoung Na, Hae-Won Joe, Yong-Hyuk Moon, Yeong-Jun Cho
  • for: To solve vehicle re-identification across large-scale camera networks, which matters for public safety, traffic control, and security.
  • methods: The method estimates a reliable camera network topology with an adaptive Parzen window approach and optimally combines appearance and spatial-temporal similarities through a fusion network.
  • results: The approach achieves 99.64% rank-1 accuracy on the public VeRi776 dataset, showing that spatial and temporal information can leverage the accuracy of appearance-based methods and effectively deal with appearance ambiguities.
    Abstract Vehicle re-identification (ReID) in a large-scale camera network is important in public safety, traffic control, and security. However, due to the appearance ambiguities of vehicles, previous appearance-based ReID methods often fail to track vehicles across multiple cameras. To overcome the challenge, we propose a spatial-temporal vehicle ReID framework that estimates reliable camera network topology based on the adaptive Parzen window method and optimally combines the appearance and spatial-temporal similarities through the fusion network. The proposed method achieves superior performance on the public dataset (VeRi776), with 99.64% rank-1 accuracy. The experimental results support that utilizing spatial and temporal information for ReID can leverage the accuracy of appearance-based methods and effectively deal with appearance ambiguities.
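A small scipy sketch of the spatial-temporal cue, assuming a Parzen-window (KDE) estimate of each camera pair's transition-time density and a simple convex fusion with appearance similarity; the paper uses an adaptive window and a learned fusion network instead:

```python
import numpy as np
from scipy.stats import gaussian_kde

def spatial_temporal_score(train_dt, query_dt):
    """Parzen-window (KDE) estimate of a camera pair's transition-time
    density, evaluated at a query's observed time gap. scipy's
    rule-of-thumb bandwidth stands in for the adaptive window."""
    kde = gaussian_kde(np.asarray(train_dt, dtype=float))
    return float(kde(np.asarray([query_dt], dtype=float))[0])

def fused_score(appearance_sim, train_dt, query_dt, alpha=0.7):
    # Convex fusion of appearance and spatial-temporal cues (illustrative).
    return alpha * appearance_sim + (1 - alpha) * spatial_temporal_score(train_dt, query_dt)
```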

Large Language Models for Generative Recommendation: A Survey and Visionary Discussions

  • paper_url: http://arxiv.org/abs/2309.01157
  • repo_url: None
  • paper_authors: Lei Li, Yongfeng Zhang, Dugang Liu, Li Chen
  • for: This survey examines the application and development of large language models (LLM) in recommender systems (RS).
  • methods: Through an extensive literature review, it analyzes LLM-based generative recommendation, in which recommendations are generated directly from the complete pool of items rather than computed through multi-stage scoring and re-ranking.
  • results: The survey argues that LLMs can deliver more accurate and personalized recommendations while simplifying the recommendation process into a single generative stage, and discusses how to implement LLM-based generative recommendation for various RS tasks.
    Abstract Recent years have witnessed the wide adoption of large language models (LLM) in different fields, especially natural language processing and computer vision. Such a trend can also be observed in recommender systems (RS). However, most of related work treat LLM as a component of the conventional recommendation pipeline (e.g., as a feature extractor) which may not be able to fully leverage the generative power of LLM. Instead of separating the recommendation process into multiple stages such as score computation and re-ranking, this process can be simplified to one stage with LLM: directly generating recommendations from the complete pool of items. This survey reviews the progress, methods and future directions of LLM-based generative recommendation by examining three questions: 1) What generative recommendation is, 2) Why RS should advance to generative recommendation, and 3) How to implement LLM-based generative recommendation for various RS tasks. We hope that the survey can provide the context and guidance needed to explore this interesting and emerging topic.

FedFwd: Federated Learning without Backpropagation

  • paper_url: http://arxiv.org/abs/2309.01150
  • repo_url: None
  • paper_authors: Seonghwan Park, Dahun Shin, Jinseok Chung, Namhoon Lee
  • for: To ease the resource constraints of clients in federated learning (FL) and improve training efficiency.
  • methods: The work adopts the recent backpropagation-free Forward-Forward algorithm proposed by Hinton (2022), updating parameters layer-wise during each client's local training.
  • results: Experiments on standard datasets such as MNIST and CIFAR-10 show that FedFwd is competitive with other BP-dependent FL methods.
    Abstract In federated learning (FL), clients with limited resources can disrupt the training efficiency. A potential solution to this problem is to leverage a new learning procedure that does not rely on backpropagation (BP). We present a novel approach to FL called FedFwd that employs a recent BP-free method by Hinton (2022), namely the Forward Forward algorithm, in the local training process. FedFwd can reduce a significant amount of computations for updating parameters by performing layer-wise local updates, and therefore, there is no need to store all intermediate activation values during training. We conduct various experiments to evaluate FedFwd on standard datasets including MNIST and CIFAR-10, and show that it works competitively to other BP-dependent FL methods.
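A minimal PyTorch sketch of one Forward-Forward layer update, the BP-free primitive that FedFwd applies layer-wise on each client; the goodness threshold and softplus loss follow Hinton's formulation, while the layer/optimizer wiring is an illustrative assumption:

```python
import torch
import torch.nn.functional as F

def ff_layer_update(layer, opt, x_pos, x_neg, theta=2.0):
    """One Forward-Forward step for a single layer: raise the 'goodness'
    (sum of squared activations) on positive data and lower it on negative
    data, with no gradient flowing between layers."""
    g_pos = layer(x_pos).pow(2).sum(dim=1)
    g_neg = layer(x_neg).pow(2).sum(dim=1)
    loss = F.softplus(torch.cat([theta - g_pos, g_neg - theta])).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    # Detach so the next layer trains on these activations without BP,
    # and no intermediate activations need to be stored across layers.
    with torch.no_grad():
        return layer(x_pos), layer(x_neg)
```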

Interpretable Sequence Clustering

  • paper_url: http://arxiv.org/abs/2309.01140
  • repo_url: https://github.com/jd445/Interpretable-Sequence-Clustering-Tree
  • paper_authors: Junjie Dong, Xinyi Yang, Mudi Jiang, Lianyu Hu, Zengyou He
  • for: To address the lack of interpretability in categorical sequence clustering by producing cluster assignments explained by a concise tree structure.
  • methods: The method combines sequential patterns with a boosting-based tree construction strategy: sequences are first projected into random subspaces, the k-means algorithm provides initial cluster assignments, and a pattern-based decision tree is then built, re-projecting and re-clustering sequences at each node before mining the top-1 discriminative splitting pattern.
  • results: Experiments show that the proposed method yields an interpretable tree structure while delivering fast and accurate cluster assignments.
    Abstract Categorical sequence clustering plays a crucial role in various fields, but the lack of interpretability in cluster assignments poses significant challenges. Sequences inherently lack explicit features, and existing sequence clustering algorithms heavily rely on complex representations, making it difficult to explain their results. To address this issue, we propose a method called Interpretable Sequence Clustering Tree (ISCT), which combines sequential patterns with a concise and interpretable tree structure. ISCT leverages k-1 patterns to generate k leaf nodes, corresponding to k clusters, which provides an intuitive explanation on how each cluster is formed. More precisely, ISCT first projects sequences into random subspaces and then utilizes the k-means algorithm to obtain high-quality initial cluster assignments. Subsequently, it constructs a pattern-based decision tree using a boosting-based construction strategy in which sequences are re-projected and re-clustered at each node before mining the top-1 discriminative splitting pattern. Experimental results on 14 real-world data sets demonstrate that our proposed method provides an interpretable tree structure while delivering fast and accurate cluster assignments.
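A small sketch of the first stage under simple assumptions: project each categorical sequence onto indicator features for randomly sampled subsequence patterns, then obtain initial assignments with k-means. The pattern-sampling scheme and feature choice here are illustrative, not the paper's exact procedure:

```python
import random
import numpy as np
from sklearn.cluster import KMeans

def project_random_subspace(sequences, n_patterns=64, max_len=3, seed=0):
    """Indicator features: does each randomly sampled pattern occur as a
    (not necessarily contiguous) subsequence of the sequence?"""
    rng = random.Random(seed)
    alphabet = sorted({tok for seq in sequences for tok in seq})
    patterns = [tuple(rng.choices(alphabet, k=rng.randint(1, max_len)))
                for _ in range(n_patterns)]

    def contains(seq, pat):
        it = iter(seq)
        return all(tok in it for tok in pat)   # subsequence test

    X = np.array([[contains(seq, p) for p in patterns] for seq in sequences],
                 dtype=float)
    return X, patterns

seqs = [list("abcab"), list("bbbca"), list("aacc"), list("cbacba")]
X, pats = project_random_subspace(seqs)
init_assignments = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(init_assignments)
```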

Financial Fraud Detection using Quantum Graph Neural Networks

  • paper_url: http://arxiv.org/abs/2309.01127
  • repo_url: None
  • paper_authors: Nouhaila Innan, Abhishek Sawaika, Ashim Dhor, Siddhant Dutta, Sairupa Thota, Husayn Gokal, Nandan Patel, Muhammad Al-Zafar Khan, Ioannis Theodonis, Mohamed Bennai
  • for: To prevent financial fraud and protect the reputation of financial institutions, where conventional detection methods have limited effectiveness.
  • methods: The approach uses Quantum Graph Neural Networks (QGNNs) built on Variational Quantum Circuits (VQCs).
  • results: On a real-world financial fraud detection dataset, the QGNN achieves an AUC of 0.85, outperforming a classical GNN baseline and suggesting QGNNs as a promising new approach to improving financial fraud detection.
    Abstract Financial fraud detection is essential for preventing significant financial losses and maintaining the reputation of financial institutions. However, conventional methods of detecting financial fraud have limited effectiveness, necessitating the need for new approaches to improve detection rates. In this paper, we propose a novel approach for detecting financial fraud using Quantum Graph Neural Networks (QGNNs). QGNNs are a type of neural network that can process graph-structured data and leverage the power of Quantum Computing (QC) to perform computations more efficiently than classical neural networks. Our approach uses Variational Quantum Circuits (VQC) to enhance the performance of the QGNN. In order to evaluate the efficiency of our proposed method, we compared the performance of QGNNs to Classical Graph Neural Networks using a real-world financial fraud detection dataset. The results of our experiments showed that QGNNs achieved an AUC of $0.85$, which outperformed classical GNNs. Our research highlights the potential of QGNNs and suggests that QGNNs are a promising new approach for improving financial fraud detection.
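A hedged PennyLane sketch of the kind of VQC scorer such models build on; the embedding, ansatz, and read-out below are generic choices, not the paper's circuit:

```python
import pennylane as qml
from pennylane import numpy as np

n_qubits = 4
dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev)
def fraud_score(features, weights):
    # Angle-encode (aggregated) node features, apply a short variational
    # ansatz, and read out one expectation value as a fraud score.
    qml.AngleEmbedding(features, wires=range(n_qubits))
    qml.BasicEntanglerLayers(weights, wires=range(n_qubits))
    return qml.expval(qml.PauliZ(0))

weights = np.random.uniform(0, np.pi, (2, n_qubits), requires_grad=True)
print(fraud_score(np.random.uniform(0, np.pi, n_qubits), weights))
```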

MedChatZH: a Better Medical Adviser Learns from Better Instructions

  • paper_url: http://arxiv.org/abs/2309.01114
  • repo_url: https://github.com/tyang816/medchatzh
  • paper_authors: Yang Tan, Mingchen Li, Zijie Huang, Huiqun Yu, Guisheng Fan
  • for: To improve domain-specific question answering (QA) in traditional Chinese medicine using generative large language models (LLMs).
  • methods: The model is pre-trained on traditional Chinese medical books and then fine-tuned with a carefully curated medical instruction dataset.
  • results: The model outperforms several solid baselines on a real-world medical dialogue dataset; the model, code, and dataset are released to encourage further research in this area.
    Abstract Generative large language models (LLMs) have shown great success in various applications, including question-answering (QA) and dialogue systems. However, in specialized domains like traditional Chinese medical QA, these models may perform unsatisfactorily without fine-tuning on domain-specific datasets. To address this, we introduce MedChatZH, a dialogue model designed specifically for traditional Chinese medical QA. Our model is pre-trained on Chinese traditional medical books and fine-tuned with a carefully curated medical instruction dataset. It outperforms several solid baselines on a real-world medical dialogue dataset. We release our model, code, and dataset on https://github.com/tyang816/MedChatZH to facilitate further research in the domain of traditional Chinese medicine and LLMs.

A Study on the Implementation of Generative AI Services Using an Enterprise Data-Based LLM Application Architecture

  • paper_url: http://arxiv.org/abs/2309.01105
  • repo_url: None
  • paper_authors: Cheonsu Jeong
  • for: This study presents a method for implementing generative AI services based on a Large Language Model (LLM) application architecture.
  • methods: It uses fine-tuning techniques and direct document integration to mitigate data scarcity, and develops a Retrieval-Augmented Generation (RAG) model that enhances the information storage and retrieval processes underlying content generation.
  • results: The study shows that the RAG model effectively mitigates data scarcity and improves the practical usability of LLM-based services within enterprises.
    Abstract This study presents a method for implementing generative AI services by utilizing the Large Language Models (LLM) application architecture. With recent advancements in generative AI technology, LLMs have gained prominence across various domains. In this context, the research addresses the challenge of information scarcity and proposes specific remedies by harnessing LLM capabilities. The investigation delves into strategies for mitigating the issue of inadequate data, offering tailored solutions. The study delves into the efficacy of employing fine-tuning techniques and direct document integration to alleviate data insufficiency. A significant contribution of this work is the development of a Retrieval-Augmented Generation (RAG) model, which tackles the aforementioned challenges. The RAG model is carefully designed to enhance information storage and retrieval processes, ensuring improved content generation. The research elucidates the key phases of the information storage and retrieval methodology underpinned by the RAG model. A comprehensive analysis of these steps is undertaken, emphasizing their significance in addressing the scarcity of data. The study highlights the efficacy of the proposed method, showcasing its applicability through illustrative instances. By implementing the RAG model for information storage and retrieval, the research not only contributes to a deeper comprehension of generative AI technology but also facilitates its practical usability within enterprises utilizing LLMs. This work holds substantial value in advancing the field of generative AI, offering insights into enhancing data-driven content generation and fostering active utilization of LLM-based services within corporate settings.
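A minimal sketch of the store-retrieve-generate loop a RAG model is built around, assuming generic `embed` and `complete` stand-ins for an embedding model and an LLM call; the prompt wording and cosine-similarity retrieval are illustrative:

```python
import numpy as np

def build_index(chunks, embed):
    """Storage phase: embed each enterprise document chunk once."""
    return [(c, np.asarray(embed(c))) for c in chunks]

def rag_answer(question, index, embed, complete, k=3):
    """Retrieval phase: rank chunks by cosine similarity to the question.
    Generation phase: condition the LLM on the retrieved context."""
    q = np.asarray(embed(question))
    scored = sorted(
        ((float(q @ v) / (np.linalg.norm(q) * np.linalg.norm(v)), c)
         for c, v in index),
        key=lambda t: t[0], reverse=True)
    context = "\n---\n".join(c for _, c in scored[:k])
    return complete(
        f"Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:")
```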

M2HGCL: Multi-Scale Meta-Path Integrated Heterogeneous Graph Contrastive Learning

  • paper_url: http://arxiv.org/abs/2309.01101
  • repo_url: None
  • paper_authors: Yuanyuan Guo, Yu Xia, Rui Wang, Rongcheng Duan, Lu Li, Jiangmeng Li
  • for: This paper focuses on improving the performance of heterogeneous graph contrastive learning models by proposing a new multi-scale meta-path integrated model (M2HGCL) that captures discriminative information from various types of meta-paths.
  • methods: The proposed M2HGCL model discards the conventional heterogeneity-homogeneity transformation and performs graph contrastive learning in a joint manner, aggregating direct neighbor information, initial meta-path neighbor information, and expanded meta-path neighbor information to capture sufficient discriminative information (a multi-view contrastive-loss sketch follows the abstract below).
  • results: The proposed M2HGCL model outperforms current state-of-the-art baseline models on three real-world datasets through extensive experiments, demonstrating its effectiveness in improving the performance of heterogeneous graph contrastive learning models.
    Abstract Inspired by the successful application of contrastive learning on graphs, researchers attempt to impose graph contrastive learning approaches on heterogeneous information networks. Orthogonal to homogeneous graphs, the types of nodes and edges in heterogeneous graphs are diverse so that specialized graph contrastive learning methods are required. Most existing methods for heterogeneous graph contrastive learning are implemented by transforming heterogeneous graphs into homogeneous graphs, which may lead to ramifications that the valuable information carried by non-target nodes is undermined thereby exacerbating the performance of contrastive learning models. Additionally, current heterogeneous graph contrastive learning methods are mainly based on initial meta-paths given by the dataset, yet according to our deep-going exploration, we derive empirical conclusions: only initial meta-paths cannot contain sufficiently discriminative information; and various types of meta-paths can effectively promote the performance of heterogeneous graph contrastive learning methods. To this end, we propose a new multi-scale meta-path integrated heterogeneous graph contrastive learning (M2HGCL) model, which discards the conventional heterogeneity-homogeneity transformation and performs the graph contrastive learning in a joint manner. Specifically, we expand the meta-paths and jointly aggregate the direct neighbor information, the initial meta-path neighbor information and the expanded meta-path neighbor information to sufficiently capture discriminative information. A specific positive sampling strategy is further imposed to remedy the intrinsic deficiency of contrastive learning, i.e., the hard negative sample sampling issue. Through extensive experiments on three real-world datasets, we demonstrate that M2HGCL outperforms the current state-of-the-art baseline models.
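
The joint contrastive objective over multiple meta-path views can be illustrated with a standard InfoNCE loss averaged over view pairs (direct-neighbor, initial meta-path, and expanded meta-path embeddings). The sketch below assumes generic per-view node embeddings; the paper's encoders, aggregation, and positive sampling strategy are not reproduced.

```python
# A hedged sketch of jointly contrasting node embeddings from several
# meta-path views, in the spirit of M2HGCL.
import torch
import torch.nn.functional as F

def info_nce(z1: torch.Tensor, z2: torch.Tensor, tau: float = 0.5) -> torch.Tensor:
    """InfoNCE between two views: matching rows are positive pairs."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / tau           # (N, N) similarity matrix
    labels = torch.arange(z1.size(0))    # positives lie on the diagonal
    return F.cross_entropy(logits, labels)

def multi_view_loss(views: list[torch.Tensor]) -> torch.Tensor:
    """Average pairwise contrastive loss over the direct-neighbor, initial
    meta-path, and expanded meta-path views."""
    losses = [info_nce(a, b) for i, a in enumerate(views)
              for b in views[i + 1:]]
    return torch.stack(losses).mean()

# Three hypothetical views of 8 nodes with 16-d embeddings.
views = [torch.randn(8, 16) for _ in range(3)]
print(multi_view_loss(views))
```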

Stabilize to Act: Learning to Coordinate for Bimanual Manipulation

  • paper_url: http://arxiv.org/abs/2309.01087
  • repo_url: None
  • paper_authors: Jennifer Grannen, Yilin Wu, Brandon Vu, Dorsa Sadigh
  • for: This paper presents a strategy for taming the high-dimensional action space of bimanual control so that dual-arm systems can achieve rich, dexterous manipulation.
  • methods: Drawing inspiration from humans, the paper proposes a novel role-assignment framework: a stabilizing arm holds an object in place to simplify the environment while an acting arm executes the task. The framework, instantiated as BUDS, uses a learned restabilizing classifier to alternate between updating a stabilization position and executing the task with a policy learned from demonstrations (a schematic control loop follows the abstract below).
  • results: On four bimanual tasks of varying complexity, BUDS achieves 76.9% task success from only 20 demonstrations and generalizes to out-of-distribution objects within a class at a 52.7% success rate; it is 56.0% more successful than an unstructured baseline that instead learns a BC stabilizing policy.
    Abstract Key to rich, dexterous manipulation in the real world is the ability to coordinate control across two hands. However, while the promise afforded by bimanual robotic systems is immense, constructing control policies for dual arm autonomous systems brings inherent difficulties. One such difficulty is the high-dimensionality of the bimanual action space, which adds complexity to both model-based and data-driven methods. We counteract this challenge by drawing inspiration from humans to propose a novel role assignment framework: a stabilizing arm holds an object in place to simplify the environment while an acting arm executes the task. We instantiate this framework with BimanUal Dexterity from Stabilization (BUDS), which uses a learned restabilizing classifier to alternate between updating a learned stabilization position to keep the environment unchanged, and accomplishing the task with an acting policy learned from demonstrations. We evaluate BUDS on four bimanual tasks of varying complexities on real-world robots, such as zipping jackets and cutting vegetables. Given only 20 demonstrations, BUDS achieves 76.9% task success across our task suite, and generalizes to out-of-distribution objects within a class with a 52.7% success rate. BUDS is 56.0% more successful than an unstructured baseline that instead learns a BC stabilizing policy due to the precision required of these complex tasks. Supplementary material and videos can be found at https://sites.google.com/view/stabilizetoact .
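
The role-assignment loop can be summarized in a few lines: consult the restabilizing classifier, move the stabilizing arm when it fires, and otherwise let the acting arm execute. Everything below is a toy stand-in for the learned components, not the authors' implementation.

```python
# A schematic of the stabilize-then-act control loop, with stand-ins for the
# learned restabilizing classifier, stabilization position, and acting policy.
import random

class Arm:
    def __init__(self, name: str):
        self.name = name
    def move_to(self, pose):
        print(f"{self.name} arm holds object at {pose}")
    def apply(self, action):
        print(f"{self.name} arm executes {action}")

def needs_restabilizing(obs) -> bool:
    """Stand-in for the learned restabilizing classifier."""
    return obs["object_drift"] > 0.05

def stabilization_target(obs):
    """Stand-in for the learned stabilization position."""
    return obs["object_pose"]

def acting_policy(obs):
    """Stand-in for the acting policy learned from demonstrations."""
    return {"gripper_delta": 0.01}

def bimanual_step(obs, stabilizer: Arm, actor: Arm):
    # Alternate roles: re-stabilize when the classifier fires, act otherwise.
    if needs_restabilizing(obs):
        stabilizer.move_to(stabilization_target(obs))
    else:
        actor.apply(acting_policy(obs))

obs = {"object_drift": random.random() * 0.1, "object_pose": (0.3, 0.1, 0.2)}
bimanual_step(obs, Arm("stabilizing"), Arm("acting"))
```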

UnsMOT: Unified Framework for Unsupervised Multi-Object Tracking with Geometric Topology Guidance

  • paper_url: http://arxiv.org/abs/2309.01078
  • repo_url: None
  • paper_authors: Son Tran, Cong Tran, Anh Tran, Cuong Pham
  • for: Improving unsupervised multi-object tracking (MOT) so that strong performance can be achieved without the immense cost of annotating tracking data.
  • methods: Proposes UnsMOT, a novel framework that explicitly combines the appearance and motion features of objects with geometric information: CNN and RNN models extract appearance and motion features, and a graph built from the relative distances between objects is fed with the CNN features into a GNN to produce geometric embeddings optimized with an unsupervised loss (a sketch of the distance-graph step follows the abstract below).
  • results: Experimental results show that UnsMOT outperforms state-of-the-art methods on the HOTA, IDF1, and MOTA metrics.
    Abstract Object detection has long been a topic of high interest in computer vision literature. Motivated by the fact that annotating data for the multi-object tracking (MOT) problem is immensely expensive, recent studies have turned their attention to the unsupervised learning setting. In this paper, we push forward the state-of-the-art performance of unsupervised MOT methods by proposing UnsMOT, a novel framework that explicitly combines the appearance and motion features of objects with geometric information to provide more accurate tracking. Specifically, we first extract the appearance and motion features using CNN and RNN models, respectively. Then, we construct a graph of objects based on their relative distances in a frame, which is fed into a GNN model together with CNN features to output geometric embedding of objects optimized using an unsupervised loss function. Finally, associations between objects are found by matching not only similar extracted features but also geometric embedding of detections and tracklets. Experimental results show remarkable performance in terms of HOTA, IDF1, and MOTA metrics in comparison with state-of-the-art methods.
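
The geometric step, building a graph over detections from their relative distances before feeding it to a GNN, can be sketched as follows. The distance threshold and 2D centers are illustrative assumptions; the paper's exact graph construction may differ.

```python
# Build an adjacency matrix over detections from pairwise center distances,
# the kind of geometric graph a GNN would consume downstream.
import numpy as np

def distance_graph(centers: np.ndarray, radius: float) -> np.ndarray:
    """Adjacency matrix connecting detections closer than `radius` pixels."""
    diff = centers[:, None, :] - centers[None, :, :]   # (N, N, 2) offsets
    dist = np.linalg.norm(diff, axis=-1)               # (N, N) distances
    adj = (dist < radius).astype(float)
    np.fill_diagonal(adj, 0.0)                         # no self-loops
    return adj

centers = np.array([[10., 20.], [12., 21.], [80., 90.]])
print(distance_graph(centers, radius=5.0))
```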

Multidomain transformer-based deep learning for early detection of network intrusion

  • paper_url: http://arxiv.org/abs/2309.01070
  • repo_url: None
  • paper_authors: Jinxin Liu, Murat Simsek, Michele Nogueira, Burak Kantarci
  • for: This paper aims to improve the timeliness of Network Intrusion Detection Systems (NIDS) by using Multivariate Time Series (MTS) early detection to identify malicious flows before they reach their target systems.
  • methods: The paper proposes a novel feature extractor, the Time Series Network Flow Meter (TS-NFM), which represents network flows as MTS with explainable features. It also introduces a deep-learning-based early detection model, the Multi-Domain Transformer (MDT), which incorporates the frequency domain into the Transformer, together with a Multi-Domain Multi-Head Attention (MD-MHA) mechanism to improve feature extraction (a simplified attention sketch follows the abstract below).
  • results: The proposed methodology improves the earliness of conventional NIDS by 5x10^4 times and duration-based earliness by a factor of 60, yielding an 84.1% macro F1 score (31% higher than Transformer) on the SCVIC-TS-2022 dataset. MDT also outperforms state-of-the-art early detection methods by 5% and 6% on the ECG and Wafer datasets, respectively.
    Abstract Timely response of Network Intrusion Detection Systems (NIDS) is constrained by the flow generation process which requires accumulation of network packets. This paper introduces Multivariate Time Series (MTS) early detection into NIDS to identify malicious flows prior to their arrival at target systems. With this in mind, we first propose a novel feature extractor, Time Series Network Flow Meter (TS-NFM), that represents network flow as MTS with explainable features, and a new benchmark dataset is created using TS-NFM and the meta-data of CICIDS2017, called SCVIC-TS-2022. Additionally, a new deep learning-based early detection model called Multi-Domain Transformer (MDT) is proposed, which incorporates the frequency domain into Transformer. This work further proposes a Multi-Domain Multi-Head Attention (MD-MHA) mechanism to improve the ability of MDT to extract better features. Based on the experimental results, the proposed methodology improves the earliness of the conventional NIDS (i.e., percentage of packets that are used for classification) by 5x10^4 times and duration-based earliness (i.e., percentage of duration of the classified packets of a flow) by a factor of 60, resulting in a 84.1% macro F1 score (31% higher than Transformer) on SCVIC-TS-2022. Additionally, the proposed MDT outperforms the state-of-the-art early detection methods by 5% and 6% on ECG and Wafer datasets, respectively.
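
The core multi-domain idea, attending over a flow's features in both the time and frequency domains and then fusing the results, can be sketched as below. This is a simplified stand-in built from stock PyTorch attention and an FFT; the paper's MD-MHA mechanism and TS-NFM features are not reproduced.

```python
# A hedged sketch of dual time/frequency-domain attention over a network
# flow represented as a multivariate time series.
import torch
import torch.nn as nn

class MultiDomainBlock(nn.Module):
    def __init__(self, d_model: int, n_heads: int = 4):
        super().__init__()
        self.time_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.freq_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.fuse = nn.Linear(2 * d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, d_model) flow features
        t, _ = self.time_attn(x, x, x)                 # time-domain attention
        spec = torch.fft.rfft(x, dim=1).abs()          # magnitude spectrum
        f, _ = self.freq_attn(spec, spec, spec)        # frequency-domain attention
        f = nn.functional.interpolate(                 # re-align sequence lengths
            f.transpose(1, 2), size=x.size(1)).transpose(1, 2)
        return self.fuse(torch.cat([t, f], dim=-1))

x = torch.randn(2, 16, 32)                             # 2 flows, 16 steps, 32 features
print(MultiDomainBlock(32)(x).shape)                   # torch.Size([2, 16, 32])
```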

Separable Hamiltonian Neural Networks

  • paper_url: http://arxiv.org/abs/2309.01069
  • repo_url: https://github.com/zykhoo/separablenns
  • paper_authors: Zi-Yu Khoo, Jonathan Sze Choong Low, Stéphane Bressan
  • for: Regressing the Hamiltonian of a dynamical system from discrete observations of its vector field, which becomes difficult in higher dimensions where the state space is large relative to the number of samples.
  • methods: Proposes three separable Hamiltonian neural networks that embed the additive separability of the Hamiltonian system into the model: by quadratically scaling the training data, by embedding separability in the loss function, and by structuring the architecture as conjoined multilayer perceptrons (a minimal architectural sketch follows the abstract below).
  • results: The separable models alleviate the complexity between state variables and regress the Hamiltonian and its vector field more effectively than state-of-the-art Hamiltonian neural networks.
    Abstract The modelling of dynamical systems from discrete observations is a challenge faced by modern scientific and engineering data systems. Hamiltonian systems are one such fundamental and ubiquitous class of dynamical systems. Hamiltonian neural networks are state-of-the-art models that unsupervised-ly regress the Hamiltonian of a dynamical system from discrete observations of its vector field under the learning bias of Hamilton's equations. Yet Hamiltonian dynamics are often complicated, especially in higher dimensions where the state space of the Hamiltonian system is large relative to the number of samples. A recently discovered remedy to alleviate the complexity between state variables in the state space is to leverage the additive separability of the Hamiltonian system and embed that additive separability into the Hamiltonian neural network. Following the nomenclature of physics-informed machine learning, we propose three separable Hamiltonian neural networks. These models embed additive separability within Hamiltonian neural networks. The first model uses additive separability to quadratically scale the amount of data for training Hamiltonian neural networks. The second model embeds additive separability within the loss function of the Hamiltonian neural network. The third model embeds additive separability through the architecture of the Hamiltonian neural network using conjoined multilayer perceptrons. We empirically compare the three models against state-of-the-art Hamiltonian neural networks, and demonstrate that the separable Hamiltonian neural networks, which alleviate complexity between the state variables, are more effective at regressing the Hamiltonian and its vector field.
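
As an illustration of the third, architectural variant, here is a minimal separable Hamiltonian neural network assuming the common split H(q, p) = T(p) + V(q): two small MLPs whose sum is the Hamiltonian, with the vector field recovered through autograd via Hamilton's equations. Layer sizes and the exact conjoined architecture are illustrative assumptions, not the paper's specification.

```python
# A minimal sketch of architectural additive separability: H(q, p) = T(p) + V(q).
import torch
import torch.nn as nn

def mlp(din: int) -> nn.Module:
    return nn.Sequential(nn.Linear(din, 64), nn.Tanh(), nn.Linear(64, 1))

class SeparableHNN(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.T = mlp(dim)   # kinetic term, a function of p only
        self.V = mlp(dim)   # potential term, a function of q only

    def forward(self, q: torch.Tensor, p: torch.Tensor) -> torch.Tensor:
        return self.T(p) + self.V(q)

    def vector_field(self, q: torch.Tensor, p: torch.Tensor):
        """Hamilton's equations: dq/dt = dH/dp, dp/dt = -dH/dq."""
        q = q.requires_grad_(True)
        p = p.requires_grad_(True)
        H = self.forward(q, p).sum()
        dHdq, dHdp = torch.autograd.grad(H, (q, p), create_graph=True)
        return dHdp, -dHdq

model = SeparableHNN(dim=1)
q, p = torch.randn(5, 1), torch.randn(5, 1)
dq, dp = model.vector_field(q, p)
print(dq.shape, dp.shape)   # torch.Size([5, 1]) twice
```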

AB2CD: AI for Building Climate Damage Classification and Detection

  • paper_url: http://arxiv.org/abs/2309.01066
  • repo_url: None
  • paper_authors: Maximilian Nitsche, S. Karthik Mukkavilli, Niklas Kühl, Thomas Brunschwiler
  • for: This study applies deep learning to precise building damage assessment after natural hazards using remote sensing data, with the xBD dataset of global disaster events as the primary benchmark.
  • methods: Different deep learning models with residual, squeeze-and-excitation, and dual path network backbones are evaluated, together with ensemble techniques, while accounting for low-quality and noisy labels and for generalization to novel disasters and regions (a resolution-perturbation sketch follows the abstract below).
  • results: Symmetric and asymmetric resolution perturbation analyses establish that satellite imagery of 3 m resolution or finer is needed for effective building damage detection, and below 1 m for classification; a U-Net Siamese network ensemble performs best with an F-1 score of 0.812 against the xView2 challenge benchmark.
    Abstract We explore the implementation of deep learning techniques for precise building damage assessment in the context of natural hazards, utilizing remote sensing data. The xBD dataset, comprising diverse disaster events from across the globe, serves as the primary focus, facilitating the evaluation of deep learning models. We tackle the challenges of generalization to novel disasters and regions while accounting for the influence of low-quality and noisy labels inherent in natural hazard data. Furthermore, our investigation quantitatively establishes that the minimum satellite imagery resolution essential for effective building damage detection is 3 meters and below 1 meter for classification using symmetric and asymmetric resolution perturbation analyses. To achieve robust and accurate evaluations of building damage detection and classification, we evaluated different deep learning models with residual, squeeze and excitation, and dual path network backbones, as well as ensemble techniques. Overall, the U-Net Siamese network ensemble with F-1 score of 0.812 performed the best against the xView2 challenge benchmark. Additionally, we evaluate a Universal model trained on all hazards against a flood expert model and investigate generalization gaps across events, and out of distribution from field data in the Ahr Valley. Our research findings showcase the potential and limitations of advanced AI solutions in enhancing the impact assessment of climate change-induced extreme weather events, such as floods and hurricanes. These insights have implications for disaster impact assessment in the face of escalating climate challenges.
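
The resolution perturbation analysis can be approximated by synthetically coarsening imagery and re-evaluating the damage model at each effective ground sample distance. The sketch below shows one such perturbation (average-pool then upsample); the factors and tile size are illustrative assumptions, not the paper's protocol.

```python
# Simulate coarser satellite imagery by box-downsampling a tile and
# repeating it back to the original size.
import numpy as np

def perturb_resolution(img: np.ndarray, factor: int) -> np.ndarray:
    """Average-pool by `factor`, then upsample back to the original grid."""
    h, w = img.shape
    h2, w2 = h // factor * factor, w // factor * factor   # crop to a multiple
    pooled = img[:h2, :w2].reshape(h2 // factor, factor,
                                   w2 // factor, factor).mean(axis=(1, 3))
    return np.kron(pooled, np.ones((factor, factor)))

tile = np.random.rand(512, 512)        # stand-in for a 0.5 m GSD image tile
for factor in (2, 4, 8):               # e.g. 1 m, 2 m, 4 m effective GSD
    coarse = perturb_resolution(tile, factor)
    print(factor, coarse.shape)
```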

Generative Data Augmentation using LLMs improves Distributional Robustness in Question Answering

  • paper_url: http://arxiv.org/abs/2309.06358
  • repo_url: None
  • paper_authors: Arijit Ghosh Chowdhury, Aman Chadha
  • for: investigate the influence of generated datasets on the performance of QA models under natural distribution shifts
  • methods: two-step generation approach, generating both contexts and QA pairs to augment existing datasets (a schematic of the two steps follows the abstract below)
  • results: augmenting reading comprehension datasets with generated data leads to better robustness towards natural distribution shifts
    Abstract Robustness in Natural Language Processing continues to be a pertinent issue, where state of the art models under-perform under naturally shifted distributions. In the context of Question Answering, work on domain adaptation methods continues to be a growing body of research. However, very little attention has been given to the notion of domain generalization under natural distribution shifts, where the target domain is unknown. With drastic improvements in the quality and access to generative models, we answer the question: How do generated datasets influence the performance of QA models under natural distribution shifts? We perform experiments on 4 different datasets under varying amounts of distribution shift, and analyze how "in-the-wild" generation can help achieve domain generalization. We take a two-step generation approach, generating both contexts and QA pairs to augment existing datasets. Through our experiments, we demonstrate how augmenting reading comprehension datasets with generated data leads to better robustness towards natural distribution shifts.
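
The two-step recipe, generating an in-domain context first and then a QA pair grounded in it, reduces to two chained generator calls. In the sketch below, `llm` is a placeholder for whatever generative model is used; the paper's prompts and model are not reproduced.

```python
# Two-step generative augmentation: context first, then a QA pair on it.
def llm(prompt: str) -> str:
    """Placeholder generator; replace with a real LLM call."""
    return f"<generated for: {prompt[:40]}...>"

def generate_example(topic: str) -> dict:
    context = llm(f"Write a short passage about {topic}.")
    qa = llm(f"Write one question and its answer based on: {context}")
    return {"context": context, "qa": qa}

# Augment an existing reading-comprehension dataset with synthetic examples.
augmented = [generate_example(t) for t in ("volcanoes", "tax law")]
print(augmented[0])
```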

Integration of Vision-based Object Detection and Grasping for Articulated Manipulator in Lunar Conditions

  • paper_url: http://arxiv.org/abs/2309.01055
  • repo_url: None
  • paper_authors: Camille Boucher, Gustavo H. Diaz, Shreya Santra, Kentaro Uno, Kazuya Yoshida
  • for: This paper develops vision-based manipulation capabilities for lunar robot applications.
  • methods: It presents a generic task pipeline built from object detection, instance segmentation, and grasp detection, whose outputs can be reused in different ways across applications (a pipeline skeleton follows the abstract below).
  • results: The pipeline achieves a 92% success rate on a rock-stacking task on a non-flat surface under difficult lighting conditions, and an experiment assembling 3D-printed robot components points toward more complex tasks.
    Abstract The integration of vision-based frameworks to achieve lunar robot applications faces numerous challenges such as terrain configuration or extreme lighting conditions. This paper presents a generic task pipeline using object detection, instance segmentation and grasp detection, that can be used for various applications by using the results of these vision-based systems in a different way. We achieve a rock stacking task on a non-flat surface in difficult lighting conditions with a very good success rate of 92%. Eventually, we present an experiment to assemble 3D printed robot components to initiate more complex tasks in the future.
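
At a high level, the pipeline chains the three vision modules and picks the best-scored grasp. The skeleton below uses toy stand-ins for each stage; the module interfaces are assumptions for illustration, not the authors' code.

```python
# Detect -> segment -> grasp pipeline skeleton with toy stand-in modules.
def run_pipeline(image, detector, segmenter, grasp_net):
    boxes = detector(image)                       # object detection
    masks = [segmenter(image, b) for b in boxes]  # instance segmentation
    grasps = [grasp_net(image, m) for m in masks] # grasp detection per mask
    return max(grasps, key=lambda g: g["score"])  # pick the best-scored grasp

# Toy stand-ins so the skeleton runs end to end.
image = "frame0"
detector = lambda im: ["rock1", "rock2"]
segmenter = lambda im, b: f"mask({b})"
grasp_net = lambda im, m: {"mask": m, "score": len(m) / 10}
print(run_pipeline(image, detector, segmenter, grasp_net))
```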