cs.AI - 2023-10-04

Progressive reduced order modeling: empowering data-driven modeling with selective knowledge transfer

  • paper_url: http://arxiv.org/abs/2310.03770
  • repo_url: None
  • paper_authors: Teeratorn Kadeethum, Daniel O’Malley, Youngsoo Choi, Hari S. Viswanathan, Hongkyu Yoon
  • for: Addressing data scarcity and improving the practicality of data-driven modeling.
  • methods: Proposes a progressive reduced order modeling framework that selectively reuses knowledge from previously trained models through gates, reducing data requirements and improving model accuracy.
  • results: The framework is tested on several cases, including transport in porous media, gravity-driven flow, and finite deformation in hyperelastic materials. Retaining knowledge from previous models and using its valuable portion substantially improves accuracy; a counterpart with no parent models needs roughly nine times more data to reach comparable accuracy, underscoring the importance of progressive knowledge transfer.
    Abstract Data-driven modeling can suffer from a constant demand for data, leading to reduced accuracy and impractical for engineering applications due to the high cost and scarcity of information. To address this challenge, we propose a progressive reduced order modeling framework that minimizes data cravings and enhances data-driven modeling's practicality. Our approach selectively transfers knowledge from previously trained models through gates, similar to how humans selectively use valuable knowledge while ignoring unuseful information. By filtering relevant information from previous models, we can create a surrogate model with minimal turnaround time and a smaller training set that can still achieve high accuracy. We have tested our framework in several cases, including transport in porous media, gravity-driven flow, and finite deformation in hyperelastic materials. Our results illustrate that retaining information from previous models and utilizing a valuable portion of that knowledge can significantly improve the accuracy of the current model. We have demonstrated the importance of progressive knowledge transfer and its impact on model accuracy with reduced training samples. For instance, our framework with four parent models outperforms the no-parent counterpart trained on data nine times larger. Our research unlocks data-driven modeling's potential for practical engineering applications by mitigating the data scarcity issue. Our proposed framework is a significant step toward more efficient and cost-effective data-driven modeling, fostering advancements across various fields.
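A minimal PyTorch sketch of the gated knowledge-transfer idea described above. The module names, the gate form (one learned sigmoid per parent model), and the way parent features are combined are illustrative assumptions; the paper's actual gating mechanism and reduced-order architecture may differ.

```python
import torch
import torch.nn as nn

class GatedTransferSurrogate(nn.Module):
    """Toy surrogate that selectively reuses features from frozen parent models."""

    def __init__(self, parents, in_dim, hidden_dim, out_dim):
        super().__init__()
        self.parents = nn.ModuleList(parents)          # previously trained models, kept frozen
        for p in self.parents:
            p.requires_grad_(False)
        self.gates = nn.Parameter(torch.zeros(len(parents)))  # one learnable gate per parent
        self.own_encoder = nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.Tanh())
        self.head = nn.Linear(hidden_dim, out_dim)

    def forward(self, x):
        h = self.own_encoder(x)
        # Add each parent's hidden features, scaled by a sigmoid gate in [0, 1];
        # gates near 0 effectively ignore "unuseful" parents.
        for gate, parent in zip(torch.sigmoid(self.gates), self.parents):
            h = h + gate * parent(x)
        return self.head(h)

# Usage sketch: parents are small frozen networks mapping inputs to the same hidden width.
parents = [nn.Sequential(nn.Linear(4, 32), nn.Tanh()) for _ in range(4)]
model = GatedTransferSurrogate(parents, in_dim=4, hidden_dim=32, out_dim=1)
y = model(torch.randn(8, 4))
print(y.shape)  # torch.Size([8, 1])
```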

On the Performance of Multimodal Language Models

  • paper_url: http://arxiv.org/abs/2310.03211
  • repo_url: https://github.com/Aryia-Behroziuan/References
  • paper_authors: Utsav Garg, Erhan Bas
  • for: Comparing different multimodal instruction tuning approaches and evaluating their performance across a range of tasks, including complex reasoning, conversation, image captioning, multiple-choice questions (MCQs), and binary classification.
  • methods: Uses model grafting and instruction tuning to integrate independently pretrained vision encoders with language models, extending LLMs with multimodal capabilities.
  • results: Current approaches do not sufficiently cover the diversity needed in multimodal instruction datasets, which limits task generalization; models may also fail to remain truthful and factual when generating responses. These findings provide guidance for researchers and practitioners adopting multimodal LLMs.
    Abstract Instruction-tuned large language models (LLMs) have demonstrated promising zero-shot generalization capabilities across various downstream tasks. Recent research has introduced multimodal capabilities to LLMs by integrating independently pretrained vision encoders through model grafting. These multimodal variants undergo instruction tuning, similar to LLMs, enabling effective zero-shot generalization for multimodal tasks. This study conducts a comparative analysis of different multimodal instruction tuning approaches and evaluates their performance across a range of tasks, including complex reasoning, conversation, image captioning, multiple-choice questions (MCQs), and binary classification. Through rigorous benchmarking and ablation experiments, we reveal key insights for guiding architectural choices when incorporating multimodal capabilities into LLMs. However, current approaches have limitations; they do not sufficiently address the need for a diverse multimodal instruction dataset, which is crucial for enhancing task generalization. Additionally, they overlook issues related to truthfulness and factuality when generating responses. These findings illuminate current methodological constraints in adapting language models for image comprehension and provide valuable guidance for researchers and practitioners seeking to harness multimodal versions of LLMs.
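The "model grafting" setup the abstract refers to is typically a frozen vision encoder whose patch features are projected into the LLM's token-embedding space. The sketch below shows that wiring with generic placeholder modules; the actual encoders, projection shape, and instruction-tuning recipe vary across the approaches compared in the paper.

```python
import torch
import torch.nn as nn

class GraftedMultimodalLM(nn.Module):
    """Frozen vision encoder + linear projector feeding visual tokens into an LLM."""

    def __init__(self, vision_encoder, llm_embed_dim, vis_dim, num_vis_tokens=32):
        super().__init__()
        self.vision_encoder = vision_encoder.eval()          # independently pretrained, kept frozen
        for p in self.vision_encoder.parameters():
            p.requires_grad_(False)
        # The projector (and optionally the LLM) is what multimodal instruction tuning trains.
        self.projector = nn.Linear(vis_dim, llm_embed_dim)
        self.num_vis_tokens = num_vis_tokens

    def build_inputs(self, image, text_embeds):
        patch_feats = self.vision_encoder(image)             # (B, N_patches, vis_dim)
        vis_tokens = self.projector(patch_feats[:, : self.num_vis_tokens])
        # Visual tokens are prepended to the text-token embeddings before the LLM forward pass.
        return torch.cat([vis_tokens, text_embeds], dim=1)

class DummyViT(nn.Module):
    """Stand-in for a pretrained vision encoder returning patch features."""
    def forward(self, image):
        return torch.randn(image.shape[0], 49, 768)          # pretend 7x7 grid of 768-d patches

model = GraftedMultimodalLM(DummyViT(), llm_embed_dim=4096, vis_dim=768)
inputs = model.build_inputs(torch.randn(2, 3, 224, 224), torch.randn(2, 10, 4096))
print(inputs.shape)   # torch.Size([2, 42, 4096]): 32 visual tokens + 10 text tokens
```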

A Large-Scale 3D Face Mesh Video Dataset via Neural Re-parameterized Optimization

  • paper_url: http://arxiv.org/abs/2310.03205
  • repo_url: None
  • paper_authors: Kim Youwang, Lee Hyun, Kim Sung-Bin, Suekyeong Nam, Janghoon Ju, Tae-Hyun Oh
  • for: Proposing a neural re-parameterized optimization method for 3D face mesh pseudo annotation on videos, in order to build a reliable and accurate large-scale 3D face mesh dataset.
  • methods: NeuFace optimization re-parameterizes the mesh fitting with a neural network, producing per-view/per-frame accurate and consistent face meshes on large-scale face videos (the NeuFace-dataset).
  • results: Experiments show the method yields a high-quality 3D face mesh dataset that improves the reconstruction accuracy of an existing 3D face reconstruction model and supports learning a 3D facial motion prior. Code and datasets will be released at https://neuface-dataset.github.io.
    Abstract We propose NeuFace, a 3D face mesh pseudo annotation method on videos via neural re-parameterized optimization. Despite the huge progress in 3D face reconstruction methods, generating reliable 3D face labels for in-the-wild dynamic videos remains challenging. Using NeuFace optimization, we annotate the per-view/-frame accurate and consistent face meshes on large-scale face videos, called the NeuFace-dataset. We investigate how neural re-parameterization helps to reconstruct image-aligned facial details on 3D meshes via gradient analysis. By exploiting the naturalness and diversity of 3D faces in our dataset, we demonstrate the usefulness of our dataset for 3D face-related tasks: improving the reconstruction accuracy of an existing 3D face reconstruction model and learning 3D facial motion prior. Code and datasets will be available at https://neuface-dataset.github.io.

Deep reinforcement learning for machine scheduling: Methodology, the state-of-the-art, and future directions

  • paper_url: http://arxiv.org/abs/2310.03195
  • repo_url: None
  • paper_authors: Maziyar Khadivi, Todd Charter, Marjan Yaghoubi, Masoud Jalayer, Maryam Ahang, Ardeshir Shojaeinasab, Homayoun Najjaran
  • for: Reviewing and comparing deep reinforcement learning (DRL) based machine scheduling approaches, which optimize job assignments to machines while adhering to manufacturing rules and job specifications.
  • methods: Surveys DRL methods categorized by computational components: conventional neural networks, encoder-decoder architectures, graph neural networks, and metaheuristic algorithms.
  • results: DRL-based schedulers compute quickly, generate near-global optimal solutions, and have been applied successfully to static and dynamic scheduling across diverse machine environments and job characteristics, but still face challenges in handling complex operational constraints, configurable multi-objective optimization, generalization, scalability, interpretability, and robustness.
    Abstract Machine scheduling aims to optimize job assignments to machines while adhering to manufacturing rules and job specifications. This optimization leads to reduced operational costs, improved customer demand fulfillment, and enhanced production efficiency. However, machine scheduling remains a challenging combinatorial problem due to its NP-hard nature. Deep Reinforcement Learning (DRL), a key component of artificial general intelligence, has shown promise in various domains like gaming and robotics. Researchers have explored applying DRL to machine scheduling problems since 1995. This paper offers a comprehensive review and comparison of DRL-based approaches, highlighting their methodology, applications, advantages, and limitations. It categorizes these approaches based on computational components: conventional neural networks, encoder-decoder architectures, graph neural networks, and metaheuristic algorithms. Our review concludes that DRL-based methods outperform exact solvers, heuristics, and tabular reinforcement learning algorithms in terms of computation speed and generating near-global optimal solutions. These DRL-based approaches have been successfully applied to static and dynamic scheduling across diverse machine environments and job characteristics. However, DRL-based schedulers face limitations in handling complex operational constraints, configurable multi-objective optimization, generalization, scalability, interpretability, and robustness. Addressing these challenges will be a crucial focus for future research in this field. This paper serves as a valuable resource for researchers to assess the current state of DRL-based machine scheduling and identify research gaps. It also aids experts and practitioners in selecting the appropriate DRL approach for production scheduling.

Talking Models: Distill Pre-trained Knowledge to Downstream Models via Interactive Communication

  • paper_url: http://arxiv.org/abs/2310.03188
  • repo_url: None
  • paper_authors: Zhe Zhao, Qingyun Liu, Huan Gui, Bang An, Lichan Hong, Ed H. Chi
  • for: Improving downstream-task performance by transferring knowledge from pre-trained foundation models to downstream models.
  • methods: Extends knowledge distillation (KD) into an interactive communication process that helps student models on downstream tasks learn effectively from foundation models.
  • results: The interactive communication mechanism tailors knowledge transfer to the student's capacity and the downstream data distribution, improving student performance and outperforming state-of-the-art distillation techniques.
    Abstract Many recent breakthroughs in machine learning have been enabled by the pre-trained foundation models. By scaling up model parameters, training data, and computation resources, foundation models have significantly advanced the state-of-the-art in many applications. However, it is still an open question of how to use these models to perform downstream tasks efficiently. Knowledge distillation (KD) has been explored to tackle this challenge. KD transfers knowledge from a large teacher model to a smaller student model. While KD has been successful in improving student model performance, recent research has discovered that a powerful teacher does not necessarily lead to a powerful student, due to their huge capacity gap. In addition, the potential distribution shifts between the pre-training data and downstream tasks can make knowledge transfer in KD sub-optimal for improving downstream task performance. In this paper, we extend KD with an interactive communication process to help students of downstream tasks learn effectively from pre-trained foundation models. Our design is inspired by the way humans learn from teachers who can explain knowledge in a way that meets the students' needs. Specifically, we let each model (i.e., student and teacher) train two components: (1) an encoder encoding the model's hidden states to a message and (2) a decoder decoding any messages to its own hidden states. With encoder and decoder, not only can the teacher transfer rich information by encoding its hidden states, but also the student can send messages with information of downstream tasks to the teacher. Therefore, knowledge passing from teacher to student can be tailored to the student's capacity and downstream tasks' distributions. We conducted experiments on benchmark datasets to show that our communication mechanism outperforms state-of-the-art distillation techniques.
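A minimal sketch of the interactive communication loop described in the abstract: both teacher and student carry an encoder that maps hidden states to a message and a decoder that maps messages back to hidden states. The module sizes, the exchange protocol, and the distillation loss below are illustrative assumptions rather than the paper's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Communicator(nn.Module):
    """Wraps a backbone with a message encoder and a message decoder."""

    def __init__(self, backbone, hidden_dim, msg_dim):
        super().__init__()
        self.backbone = backbone
        self.encode_msg = nn.Linear(hidden_dim, msg_dim)   # hidden state -> message
        self.decode_msg = nn.Linear(msg_dim, hidden_dim)   # any message -> own hidden space

teacher = Communicator(nn.Linear(16, 64), hidden_dim=64, msg_dim=32)
student = Communicator(nn.Linear(16, 24), hidden_dim=24, msg_dim=32)

x = torch.randn(8, 16)
h_s = student.backbone(x)
h_t = teacher.backbone(x)

# Student tells the teacher about its (task-conditioned) hidden state ...
msg_from_student = student.encode_msg(h_s)
# ... the teacher decodes it, forms a reply in the shared message space ...
teacher_view = teacher.decode_msg(msg_from_student)
reply = teacher.encode_msg(teacher_view + h_t)
# ... and the student decodes the reply into its own hidden space for distillation.
distill_target = student.decode_msg(reply)
loss = F.mse_loss(h_s, distill_target.detach())
print(float(loss))
```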

Inferring Inference

  • paper_url: http://arxiv.org/abs/2310.03186
  • repo_url: https://github.com/xaqlab/inferringinference
  • paper_authors: Rajkumar Vasudeva Raju, Zhe Li, Scott Linderman, Xaq Pitkow
  • for: This paper aims to develop a mathematical framework for inferring canonical distributed computations from large-scale neural activity patterns.
  • methods: The authors integrated normative and algorithmic theories of neural computation into a mathematical framework, using a nonlinear message-passing algorithm on a graph-structured model of the world.
  • results: The framework was able to recover the latent variables, their neural representation and dynamics, and canonical message-functions from simulated recordings of a model brain. The authors highlighted features of experimental design needed to successfully extract canonical computations from neural data.
    Abstract Patterns of microcircuitry suggest that the brain has an array of repeated canonical computational units. Yet neural representations are distributed, so the relevant computations may only be related indirectly to single-neuron transformations. It thus remains an open challenge how to define canonical distributed computations. We integrate normative and algorithmic theories of neural computation into a mathematical framework for inferring canonical distributed computations from large-scale neural activity patterns. At the normative level, we hypothesize that the brain creates a structured internal model of its environment, positing latent causes that explain its sensory inputs, and uses those sensory inputs to infer the latent causes. At the algorithmic level, we propose that this inference process is a nonlinear message-passing algorithm on a graph-structured model of the world. Given a time series of neural activity during a perceptual inference task, our framework finds (i) the neural representation of relevant latent variables, (ii) interactions between these variables that define the brain's internal model of the world, and (iii) message-functions specifying the inference algorithm. These targeted computational properties are then statistically distinguishable due to the symmetries inherent in any canonical computation, up to a global transformation. As a demonstration, we simulate recordings for a model brain that implicitly implements an approximate inference algorithm on a probabilistic graphical model. Given its external inputs and noisy neural activity, we recover the latent variables, their neural representation and dynamics, and canonical message-functions. We highlight features of experimental design needed to successfully extract canonical computations from neural data. Overall, this framework provides a new tool for discovering interpretable structure in neural recordings.

Misusing Tools in Large Language Models With Visual Adversarial Examples

  • paper_url: http://arxiv.org/abs/2310.03185
  • repo_url: None
  • paper_authors: Xiaohan Fu, Zihan Wang, Shuheng Li, Rajesh K. Gupta, Niloofar Mireshghallah, Taylor Berg-Kirkpatrick, Earlence Fernandes
  • for: Examining the new security risks introduced as large language models (LLMs) gain the ability to use tools and process multiple modalities.
  • methods: Constructs visual adversarial examples with gradient-based adversarial training and characterizes the attacks along multiple dimensions.
  • results: The adversarial images can make a victim LLM invoke attacker-desired tools, e.g., deleting calendar events, leaking private conversations, and booking hotels. The attacks compromise the confidentiality and integrity of user resources connected to the LLM while remaining stealthy and generalizing across multiple input prompts.
    Abstract Large Language Models (LLMs) are being enhanced with the ability to use tools and to process multiple modalities. These new capabilities bring new benefits and also new security risks. In this work, we show that an attacker can use visual adversarial examples to cause attacker-desired tool usage. For example, the attacker could cause a victim LLM to delete calendar events, leak private conversations and book hotels. Different from prior work, our attacks can affect the confidentiality and integrity of user resources connected to the LLM while being stealthy and generalizable to multiple input prompts. We construct these attacks using gradient-based adversarial training and characterize performance along multiple dimensions. We find that our adversarial images can manipulate the LLM to invoke tools following real-world syntax almost always (~98%) while maintaining high similarity to clean images (~0.9 SSIM). Furthermore, using human scoring and automated metrics, we find that the attacks do not noticeably affect the conversation (and its semantics) between the user and the LLM.
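The attacks are built with gradient-based optimization of an image perturbation. Below is a generic PGD-style sketch; the actual target loss (maximizing the likelihood of an attacker-chosen tool-call string under the multimodal LLM) and the perceptual-similarity constraint are stand-ins for the paper's objective, and the toy model is only there so the example runs.

```python
import torch

def pgd_image_attack(model, loss_fn, image, target, steps=200, eps=8 / 255, lr=1 / 255):
    """Projected gradient descent on the input image to induce a target output.

    `model(adv)` should return whatever `loss_fn(output, target)` consumes, e.g. logits
    for an attacker-desired tool-invocation string (hypothetical interface).
    """
    adv = image.clone().detach().requires_grad_(True)
    for _ in range(steps):
        loss = loss_fn(model(adv), target)
        grad, = torch.autograd.grad(loss, adv)
        with torch.no_grad():
            adv -= lr * grad.sign()                                           # descend on attack loss
            adv.copy_((image + (adv - image).clamp(-eps, eps)).clamp(0, 1))   # stay close and valid
    return adv.detach()

# Toy usage with a stand-in "model" so the sketch runs end to end.
toy_model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
img = torch.rand(1, 3, 32, 32)
target = torch.tensor([3])
adv_img = pgd_image_attack(toy_model, torch.nn.functional.cross_entropy, img, target)
print((adv_img - img).abs().max())  # perturbation stays within the epsilon ball
```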

Neural architecture impact on identifying temporally extended Reinforcement Learning tasks

  • paper_url: http://arxiv.org/abs/2310.03161
  • repo_url: None
  • paper_authors: Victor Vadakechirayath George
  • for: Applying attention-based models to deep reinforcement learning (RL) on the OpenAI Gym Atari-2600 game suite while providing better interpretability.
  • methods: Uses attention-based architectures, including attention-map overlays and a Vision Transformer-based model, to improve both the interpretability and the performance of RL agents.
  • results: The models perform well on the OpenAI Gym Atari-2600 suite and, by overlaying attention maps on frames, make it easier to see which information the agent uses to select actions.
    Abstract Inspired by recent developments in attention models for image classification and natural language processing, we present various Attention based architectures in reinforcement learning (RL) domain, capable of performing well on OpenAI Gym Atari-2600 game suite. In spite of the recent success of Deep Reinforcement learning techniques in various fields like robotics, gaming and healthcare, they suffer from a major drawback that neural networks are difficult to interpret. We try to get around this problem with the help of Attention based models. In Attention based models, extracting and overlaying of attention map onto images allows for direct observation of information used by agent to select actions and easier interpretation of logic behind the chosen actions. Our models in addition to playing well on gym-Atari environments, also provide insights on how agent perceives its environment. In addition, motivated by recent developments in attention based video-classification models using Vision Transformer, we come up with an architecture based on Vision Transformer, for image-based RL domain too. Compared to previous works in Vision Transformer, our model is faster to train and requires fewer computational resources.

Assessment of Prediction Intervals Using Uncertainty Characteristics Curves

  • paper_url: http://arxiv.org/abs/2310.03158
  • repo_url: None
  • paper_authors: Jiri Navratil, Benjamin Elder, Matthew Arnold, Soumya Ghosh, Prasanna Sattigeri
  • for: Providing trusted AI with reliable quantification of model uncertainty.
  • methods: Leverages the concept of operating characteristics curves and the notion of a gain over a null reference to derive an operating-point-agnostic assessment methodology for prediction intervals.
  • results: Defines the Uncertainty Characteristics Curve, demonstrates its utility in selected scenarios, and argues that it is a valuable addition to the existing uncertainty quantification toolbox.
    Abstract Accurate quantification of model uncertainty has long been recognized as a fundamental requirement for trusted AI. In regression tasks, uncertainty is typically quantified using prediction intervals calibrated to an ad-hoc operating point, making evaluation and comparison across different studies relatively difficult. Our work leverages: (1) the concept of operating characteristics curves and (2) the notion of a gain over a null reference, to derive a novel operating point agnostic assessment methodology for prediction intervals. The paper defines the Uncertainty Characteristics Curve and demonstrates its utility in selected scenarios. We argue that the proposed method addresses the current need for comprehensive assessment of prediction intervals and thus represents a valuable addition to the uncertainty quantification toolbox.
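The paper's Uncertainty Characteristics Curve is an operating-point-agnostic summary of prediction intervals. As a rough illustration of the idea (not the paper's exact definition), the sketch below sweeps a scale factor on the interval width and traces empirical coverage against mean width, which can then be compared against a null reference.

```python
import numpy as np

def interval_characteristics(y_true, y_lo, y_hi, scales=np.linspace(0.1, 3.0, 30)):
    """Sweep a multiplicative scale on interval half-width and record (mean width, coverage)."""
    center = (y_lo + y_hi) / 2
    half = (y_hi - y_lo) / 2
    curve = []
    for s in scales:
        lo, hi = center - s * half, center + s * half
        coverage = np.mean((y_true >= lo) & (y_true <= hi))
        curve.append((np.mean(hi - lo), coverage))
    return np.array(curve)  # columns: mean interval width, empirical coverage

# Toy regression example: the model underestimates its own uncertainty.
rng = np.random.default_rng(0)
y_pred = rng.normal(size=1000)
y_true = y_pred + rng.normal(scale=1.0, size=1000)   # true residual standard deviation = 1.0
sigma_hat = np.full(1000, 0.8)                       # estimated (too small) uncertainty
curve = interval_characteristics(y_true, y_pred - 2 * sigma_hat, y_pred + 2 * sigma_hat)
print(curve[:3])  # each row: (mean width, coverage) at one operating point on the curve
```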

Comprehensive Multimodal Segmentation in Medical Imaging: Combining YOLOv8 with SAM and HQ-SAM Models

  • paper_url: http://arxiv.org/abs/2310.12995
  • repo_url: None
  • paper_authors: Sumit Pandey, Kuan-Fu Chen, Erik B. Dam
  • for: Proposing a comprehensive approach for segmenting regions of interest (ROI) across diverse medical imaging data, including ultrasound, CT scans, and X-ray images.
  • methods: Uses the YOLOv8 model for approximate bounding-box detection across modalities, combined with the Segment Anything Model (SAM) and High Quality (HQ) SAM for fully automatic and precise segmentation.
  • results: Evaluated with precision, recall, F1 score, and Dice score, the SAM-based configuration outperforms the other two models in segmentation accuracy and overall performance, while HQ-SAM's incremental gains over standard SAM may not justify its additional computational cost; YOLOv8+SAM shows promise for medical image segmentation.
    Abstract This paper introduces a comprehensive approach for segmenting regions of interest (ROI) in diverse medical imaging datasets, encompassing ultrasound, CT scans, and X-ray images. The proposed method harnesses the capabilities of the YOLOv8 model for approximate boundary box detection across modalities, alongside the Segment Anything Model (SAM) and High Quality (HQ) SAM for fully automatic and precise segmentation. To generate boundary boxes, the YOLOv8 model was trained using a limited set of 100 images and masks from each modality. The results obtained from our approach are extensively computed and analyzed, demonstrating its effectiveness and potential in medical image analysis. Various evaluation metrics, including precision, recall, F1 score, and Dice Score, were employed to quantify the accuracy of the segmentation results. A comparative analysis was conducted to assess the individual and combined performance of the YOLOv8, YOLOv8+SAM, and YOLOv8+HQ-SAM models. The results indicate that the SAM model performs better than the other two models, exhibiting higher segmentation accuracy and overall performance. While HQ-SAM offers potential advantages, its incremental gains over the standard SAM model may not justify the additional computational cost. The YOLOv8+SAM model shows promise for enhancing medical image segmentation and its clinical implications.
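A condensed sketch of the two-stage pipeline: YOLOv8 proposes an approximate box, which is passed to SAM as a box prompt for precise segmentation. The checkpoint paths and the example image are placeholders, and the study fine-tunes YOLOv8 on roughly 100 images per modality before this inference step.

```python
import cv2
import numpy as np
from ultralytics import YOLO
from segment_anything import sam_model_registry, SamPredictor

# Stage 1: approximate ROI boxes from a (fine-tuned) YOLOv8 detector.
detector = YOLO("yolov8n.pt")                                   # placeholder weights
image = cv2.cvtColor(cv2.imread("scan.png"), cv2.COLOR_BGR2RGB)  # placeholder image path
boxes = detector(image)[0].boxes.xyxy.cpu().numpy()

# Stage 2: box-prompted segmentation with SAM (HQ-SAM exposes a similar interface).
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b.pth")   # placeholder checkpoint
predictor = SamPredictor(sam)
predictor.set_image(image)

masks = []
for box in boxes:
    m, scores, _ = predictor.predict(box=box, multimask_output=False)
    masks.append(m[0])                                          # one binary mask per detected box
print(len(masks), "masks")
```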

Attributing Learned Concepts in Neural Networks to Training Data

  • paper_url: http://arxiv.org/abs/2310.03149
  • repo_url: https://github.com/sayantann11/all-classification-templetes-for-ML
  • paper_authors: Nicholas Konz, Charles Godfrey, Madelyn Shapiro, Jonathan Tu, Henry Kvinge, Davis Brown
  • for: The paper aims to investigate how deep learning models learn certain human-interpretable features and which inputs from the model's original training set are most important for learning a concept at a given layer.
  • methods: The authors use data attribution methods combined with probing the concepts learned by a model. They train network and probe ensembles for two concept datasets on a range of network layers and use the recently developed TRAK method for large-scale data attribution.
  • results: The authors find evidence for convergence, where removing the 10,000 top attributing images for a concept and retraining the model does not change the location of the concept in the network nor the probing sparsity of the concept. This suggests that the features that inform the development of a concept are spread in a more diffuse manner across its exemplars, implying robustness in concept formation.
    Abstract By now there is substantial evidence that deep learning models learn certain human-interpretable features as part of their internal representations of data. As having the right (or wrong) concepts is critical to trustworthy machine learning systems, it is natural to ask which inputs from the model's original training set were most important for learning a concept at a given layer. To answer this, we combine data attribution methods with methods for probing the concepts learned by a model. Training network and probe ensembles for two concept datasets on a range of network layers, we use the recently developed TRAK method for large-scale data attribution. We find some evidence for convergence, where removing the 10,000 top attributing images for a concept and retraining the model does not change the location of the concept in the network nor the probing sparsity of the concept. This suggests that rather than being highly dependent on a few specific examples, the features that inform the development of a concept are spread in a more diffuse manner across its exemplars, implying robustness in concept formation.

Efficient Federated Prompt Tuning for Black-box Large Pre-trained Models

  • paper_url: http://arxiv.org/abs/2310.03123
  • repo_url: None
  • paper_authors: Zihao Lin, Yan Sun, Yifan Shi, Xueqian Wang, Lifu Huang, Li Shen, Dacheng Tao
  • for: Proposing Federated Black-Box Prompt Tuning (Fed-BBPT) to tune pre-trained models (PTMs) efficiently under memory constraints and privacy concerns.
  • methods: A central server helps local users collaboratively train a prompt generator through regular aggregation; local users learn via API-driven, zeroth-order optimization without deploying the PTM locally.
  • results: Compared with extensive fine-tuning, Fed-BBPT sidesteps the memory and privacy issues and performs robustly in a thorough evaluation across 40 datasets spanning CV and NLP tasks.
    Abstract With the blowout development of pre-trained models (PTMs), the efficient tuning of these models for diverse downstream applications has emerged as a pivotal research concern. Although recent investigations into prompt tuning have provided promising avenues, three salient challenges persist: (1) memory constraint: the continuous growth in the size of open-source PTMs renders fine-tuning, even a fraction of their parameters, challenging for many practitioners. (2) model privacy: existing PTMs often function as public API services, with their parameters inaccessible for effective or tailored fine-tuning. (3) data privacy: the fine-tuning of PTMs necessitates high-quality datasets, which are typically localized and not shared to public. To optimally harness each local dataset while navigating memory constraints and preserving privacy, we propose Federated Black-Box Prompt Tuning (Fed-BBPT). This innovative approach eschews reliance on parameter architectures and private dataset access, instead capitalizing on a central server that aids local users in collaboratively training a prompt generator through regular aggregation. Local users leverage API-driven learning via a zero-order optimizer, obviating the need for PTM deployment. Relative to extensive fine-tuning, Fed-BBPT proficiently sidesteps memory challenges tied to PTM storage and fine-tuning on local machines, tapping into comprehensive, high-quality, yet private training datasets. A thorough evaluation across 40 datasets spanning CV and NLP tasks underscores the robustness of our proposed model.
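A toy sketch of the federated black-box loop: each client tunes a local copy of the prompt-generator parameters with a two-point zeroth-order gradient estimate against a scoring-only (API-like) objective, and the server periodically averages the parameters. The scoring function, prompt parameterization, and aggregation schedule are simplified assumptions.

```python
import numpy as np

def zeroth_order_step(theta, score_fn, lr=0.05, mu=0.01, rng=np.random.default_rng(0)):
    """Two-point gradient estimate: no access to model parameters or backprop needed."""
    u = rng.normal(size=theta.shape)
    g = (score_fn(theta + mu * u) - score_fn(theta - mu * u)) / (2 * mu) * u
    return theta + lr * g                                   # ascend the (black-box) task score

def fed_bbpt(num_clients=5, rounds=10, dim=16):
    rng = np.random.default_rng(1)
    targets = rng.normal(size=(num_clients, dim))           # each client's private data "signal"
    global_theta = np.zeros(dim)                            # shared prompt-generator parameters
    for _ in range(rounds):
        local_thetas = []
        for c in range(num_clients):
            score = lambda th, t=targets[c]: -np.sum((th - t) ** 2)   # stand-in for an API score
            theta = global_theta.copy()
            for _ in range(20):
                theta = zeroth_order_step(theta, score, rng=rng)
            local_thetas.append(theta)
        global_theta = np.mean(local_thetas, axis=0)        # FedAvg-style aggregation
    return global_theta

print(np.round(fed_bbpt()[:4], 2))
```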

Large Language Model Cascades with Mixture of Thoughts Representations for Cost-efficient Reasoning

  • paper_url: http://arxiv.org/abs/2310.03094
  • repo_url: https://github.com/murongyue/llm_mot_cascade
  • paper_authors: Murong Yue, Jie Zhao, Min Zhang, Liang Du, Ziyu Yao
  • for: Reducing the cost of using large language models (LLMs), particularly for reasoning (e.g., mathematical, causal) tasks.
  • methods: Builds an LLM cascade that routes questions by difficulty: the answer consistency of the weaker LLM signals question difficulty, with answer sampling and consistency-checking methods that include a mixture of two thought representations (Chain-of-Thought and Program-of-Thought).
  • results: On six reasoning benchmark datasets, with GPT-3.5-turbo and GPT-4 as the weaker and stronger LLMs, the proposed cascades match the performance of using the stronger LLM alone while requiring only 40% of its cost.
    Abstract Large language models (LLMs) such as GPT-4 have exhibited remarkable performance in a variety of tasks, but this strong performance often comes with the high expense of using paid API services. In this paper, we are motivated to study building an LLM cascade to save the cost of using LLMs, particularly for performing reasoning (e.g., mathematical, causal) tasks. Our cascade pipeline follows the intuition that simpler questions can be addressed by a weaker but more affordable LLM, whereas only the challenging questions necessitate the stronger and more expensive LLM. To realize this decision-making, we consider the "answer consistency" of the weaker LLM as a signal of the question difficulty and propose several methods for the answer sampling and consistency checking, including one leveraging a mixture of two thought representations (i.e., Chain-of-Thought and Program-of-Thought). Through experiments on six reasoning benchmark datasets, with GPT-3.5-turbo and GPT-4 being the weaker and stronger LLMs, respectively, we demonstrate that our proposed LLM cascades can achieve performance comparable to using solely the stronger LLM but require only 40% of its cost.
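The routing rule the abstract describes can be sketched as: sample several answers from the weaker model (mixing Chain-of-Thought and Program-of-Thought prompts), and escalate to the stronger model only when those answers disagree. The `ask_weak_llm` / `ask_strong_llm` functions and the consistency threshold below are hypothetical placeholders for real API calls.

```python
from collections import Counter

def cascade_answer(question, ask_weak_llm, ask_strong_llm, n_samples=6, threshold=0.7):
    """Answer with the cheap model unless its sampled answers are inconsistent."""
    styles = ["chain_of_thought", "program_of_thought"]        # mixture of thought representations
    answers = [ask_weak_llm(question, style=styles[i % 2], temperature=0.7)
               for i in range(n_samples)]
    answer, votes = Counter(answers).most_common(1)[0]
    if votes / n_samples >= threshold:                         # answers agree -> trust the weak model
        return answer, "weak"
    return ask_strong_llm(question), "strong"                  # otherwise pay for the strong model

# Toy usage with stub functions standing in for GPT-3.5-turbo / GPT-4 API calls.
weak = lambda q, style, temperature: "42" if "life" in q else str(hash((q, style)) % 7)
strong = lambda q: "42"
print(cascade_answer("meaning of life?", weak, strong))
```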

Discovering Knowledge-Critical Subnetworks in Pretrained Language Models

  • paper_url: http://arxiv.org/abs/2310.03084
  • repo_url: None
  • paper_authors: Deniz Bayazit, Negar Foroutan, Zeming Chen, Gail Weiss, Antoine Bosselut
  • for: Investigating how knowledge is stored in pretrained language models (LMs).
  • methods: Proposes a multi-objective differentiable weight masking scheme to discover knowledge-critical subnetworks in LMs, and shows that specific knowledge can be precisely removed from a model through these subnetworks with minimal side effects.
  • results: Across several GPT2 variants, the method uncovers highly sparse subnetworks (98%+) that are solely responsible for specific collections of relational knowledge; when they are removed, the remaining model keeps most of its language ability and other memorized knowledge, but struggles to express the removed knowledge and loses performance on downstream tasks that need it after finetuning.
    Abstract Pretrained language models (LMs) encode implicit representations of knowledge in their parameters. However, localizing these representations and disentangling them from each other remains an open problem. In this work, we investigate whether pretrained language models contain various knowledge-critical subnetworks: particular sparse computational subgraphs responsible for encoding specific knowledge the model has memorized. We propose a multi-objective differentiable weight masking scheme to discover these subnetworks and show that we can use them to precisely remove specific knowledge from models while minimizing adverse effects on the behavior of the original language model. We demonstrate our method on multiple GPT2 variants, uncovering highly sparse subnetworks (98%+) that are solely responsible for specific collections of relational knowledge. When these subnetworks are removed, the remaining network maintains most of its initial capacity (modeling language and other memorized relational knowledge) but struggles to express the removed knowledge, and suffers performance drops on examples needing this removed knowledge on downstream tasks after finetuning.
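A minimal sketch of differentiable weight masking: each weight matrix gets learnable mask logits, a sigmoid turns them into a soft mask during training, and the loss trades off suppressing the target knowledge, preserving general behavior, and masking as few weights as possible. The specific objectives and weighting are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn

class MaskedLinear(nn.Module):
    """Linear layer whose weights are gated by a learnable (soft) binary mask."""

    def __init__(self, linear):
        super().__init__()
        self.linear = linear
        for p in self.linear.parameters():
            p.requires_grad_(False)                    # only the mask is trained
        self.mask_logits = nn.Parameter(torch.full_like(linear.weight, 3.0))  # start ~all-on

    def forward(self, x):
        mask = torch.sigmoid(self.mask_logits)
        return nn.functional.linear(x, self.linear.weight * mask, self.linear.bias)

layer = MaskedLinear(nn.Linear(32, 32))
opt = torch.optim.Adam([layer.mask_logits], lr=0.1)
x_target, x_general = torch.randn(16, 32), torch.randn(64, 32)
y_target_ref, y_general_ref = layer(x_target).detach(), layer(x_general).detach()

for _ in range(50):
    suppress = -nn.functional.mse_loss(layer(x_target), y_target_ref)    # forget target knowledge
    maintain = nn.functional.mse_loss(layer(x_general), y_general_ref)   # keep general behavior
    few_removed = (1 - torch.sigmoid(layer.mask_logits)).mean()          # remove as few weights as possible
    loss = suppress + maintain + 0.1 * few_removed
    opt.zero_grad()
    loss.backward()
    opt.step()

kept = (torch.sigmoid(layer.mask_logits) > 0.5).float().mean()
print(f"fraction of weights kept: {float(kept):.2f}")
```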

LanguageMPC: Large Language Models as Decision Makers for Autonomous Driving

  • paper_url: http://arxiv.org/abs/2310.03026
  • repo_url: None
  • paper_authors: Hao Sha, Yao Mu, Yuxuan Jiang, Li Chen, Chenfeng Xu, Ping Luo, Shengbo Eben Li, Masayoshi Tomizuka, Wei Zhan, Mingyu Ding
  • for: This paper aims to address the challenges faced by existing learning-based autonomous driving (AD) systems, such as comprehending high-level information, generalizing to rare events, and providing interpretability.
  • methods: The paper employs Large Language Models (LLMs) as a decision-making component for complex AD scenarios, and develops algorithms for translating LLM decisions into actionable driving commands. The proposed method integrates LLM decisions with low-level controllers through guided parameter matrix adaptation.
  • results: The proposed method consistently surpasses baseline approaches in single-vehicle tasks and handles complex driving behaviors, including multi-vehicle coordination, thanks to the commonsense reasoning capabilities of LLMs. The method demonstrates improvements in safety, efficiency, generalizability, and interoperability.
    Abstract Existing learning-based autonomous driving (AD) systems face challenges in comprehending high-level information, generalizing to rare events, and providing interpretability. To address these problems, this work employs Large Language Models (LLMs) as a decision-making component for complex AD scenarios that require human commonsense understanding. We devise cognitive pathways to enable comprehensive reasoning with LLMs, and develop algorithms for translating LLM decisions into actionable driving commands. Through this approach, LLM decisions are seamlessly integrated with low-level controllers by guided parameter matrix adaptation. Extensive experiments demonstrate that our proposed method not only consistently surpasses baseline approaches in single-vehicle tasks, but also helps handle complex driving behaviors even multi-vehicle coordination, thanks to the commonsense reasoning capabilities of LLMs. This paper presents an initial step toward leveraging LLMs as effective decision-makers for intricate AD scenarios in terms of safety, efficiency, generalizability, and interoperability. We aspire for it to serve as inspiration for future research in this field. Project page: https://sites.google.com/view/llm-mpc

Retrieval meets Long Context Large Language Models

  • paper_url: http://arxiv.org/abs/2310.03025
  • repo_url: None
  • paper_authors: Peng Xu, Wei Ping, Xianchao Wu, Lawrence McAfee, Chen Zhu, Zihan Liu, Sandeep Subramanian, Evelina Bakhturina, Mohammad Shoeybi, Bryan Catanzaro
  • for: This paper aims to compare the effectiveness of retrieval-augmentation and long context extension for improving the performance of large language models (LLMs) on downstream tasks.
  • methods: The authors use two state-of-the-art pretrained LLMs, a proprietary 43B GPT and LLaMA2-70B, and compare the performance of these models with and without retrieval-augmentation and long context extension.
  • results: The authors find that retrieval-augmentation can significantly improve the performance of LLMs, regardless of their extended context window sizes. Their best model, retrieval-augmented LLaMA2-70B with 32K context window, outperforms GPT-3.5-turbo-16k and Davinci003 in terms of average score on seven long context tasks, while being much faster at generation.
    Abstract Extending the context window of large language models (LLMs) is getting popular recently, while the solution of augmenting LLMs with retrieval has existed for years. The natural questions are: i) Retrieval-augmentation versus long context window, which one is better for downstream tasks? ii) Can both methods be combined to get the best of both worlds? In this work, we answer these questions by studying both solutions using two state-of-the-art pretrained LLMs, i.e., a proprietary 43B GPT and LLaMA2-70B. Perhaps surprisingly, we find that LLM with 4K context window using simple retrieval-augmentation at generation can achieve comparable performance to finetuned LLM with 16K context window via positional interpolation on long context tasks, while taking much less computation. More importantly, we demonstrate that retrieval can significantly improve the performance of LLMs regardless of their extended context window sizes. Our best model, retrieval-augmented LLaMA2-70B with 32K context window, outperforms GPT-3.5-turbo-16k and Davinci003 in terms of average score on seven long context tasks including question answering and query-based summarization. It also outperforms its non-retrieval LLaMA2-70B-32k baseline by a margin, while being much faster at generation. Our study provides general insights on the choice of retrieval-augmentation versus long context extension of LLM for practitioners.
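The retrieval-augmentation path compared in the paper amounts to: embed document chunks, retrieve the top-k chunks most similar to the query, and prepend them to the prompt before generation. The sketch below uses a toy embedding function and cosine similarity; the paper's retrievers and prompt formats are more elaborate.

```python
import numpy as np

def retrieve_and_prompt(query, chunks, embed, k=3):
    """Select top-k chunks by cosine similarity and build a retrieval-augmented prompt."""
    q = embed([query])[0]
    c = embed(chunks)
    sims = c @ q / (np.linalg.norm(c, axis=1) * np.linalg.norm(q) + 1e-8)
    top = np.argsort(-sims)[:k]
    context = "\n\n".join(chunks[i] for i in top)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

# Toy embedding (bag-of-characters) so the example runs without a model download.
def toy_embed(texts):
    return np.array([[t.lower().count(ch) for ch in "abcdefghijklmnopqrstuvwxyz"] for t in texts],
                    dtype=float)

chunks = ["The retriever selects passages.", "Long context windows are expensive.",
          "LLaMA2-70B was extended to a 32K-token context."]
print(retrieve_and_prompt("How long is the context window?", chunks, toy_embed))
```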

Human-oriented Representation Learning for Robotic Manipulation

  • paper_url: http://arxiv.org/abs/2310.03023
  • repo_url: None
  • paper_authors: Mingxiao Huo, Mingyu Ding, Chenfeng Xu, Thomas Tian, Xinghao Zhu, Yao Mu, Lingfeng Sun, Masayoshi Tomizuka, Wei Zhan
  • for: Proposing a human-oriented representation learning method so that robots learn manipulation tasks from visual representations grounded in human perceptual skills.
  • methods: Multi-task fine-tuning on top of pre-trained visual encoders, where each task is a human perceptual skill (e.g., hand detection, state estimation); a Task Fusion Decoder acts as a plug-and-play embedding translator that exploits relationships among these skills.
  • results: Experiments across a range of robotic tasks and embodiments show that the Task Fusion Decoder consistently improves the representations of three state-of-the-art visual encoders (R3M, MVP, EgoVLP) for downstream manipulation policy learning.
    Abstract Humans inherently possess generalizable visual representations that empower them to efficiently explore and interact with the environments in manipulation tasks. We advocate that such a representation automatically arises from simultaneously learning about multiple simple perceptual skills that are critical for everyday scenarios (e.g., hand detection, state estimate, etc.) and is better suited for learning robot manipulation policies compared to current state-of-the-art visual representations purely based on self-supervised objectives. We formalize this idea through the lens of human-oriented multi-task fine-tuning on top of pre-trained visual encoders, where each task is a perceptual skill tied to human-environment interactions. We introduce Task Fusion Decoder as a plug-and-play embedding translator that utilizes the underlying relationships among these perceptual skills to guide the representation learning towards encoding meaningful structure for what's important for all perceptual skills, ultimately empowering learning of downstream robotic manipulation tasks. Extensive experiments across a range of robotic tasks and embodiments, in both simulations and real-world environments, show that our Task Fusion Decoder consistently improves the representation of three state-of-the-art visual encoders including R3M, MVP, and EgoVLP, for downstream manipulation policy-learning. Project page: https://sites.google.com/view/human-oriented-robot-learning

AstroCLIP: Cross-Modal Pre-Training for Astronomical Foundation Models

  • paper_url: http://arxiv.org/abs/2310.03024
  • repo_url: https://github.com/PolymathicAI/AstroCLIP
  • paper_authors: Francois Lanusse, Liam Parker, Siavash Golkar, Miles Cranmer, Alberto Bietti, Michael Eickenberg, Geraud Krawezik, Michael McCabe, Ruben Ohana, Mariel Pettee, Bruno Regaldo-Saint Blancard, Tiberiu Tesileanu, Kyunghyun Cho, Shirley Ho
  • for: bridging the gap between diverse observational modalities in astronomy, specifically between images and optical spectra of galaxies
  • methods: cross-modal contrastive learning approach using multi-band images and optical spectra from the Dark Energy Spectroscopic Instrument (DESI)
  • results: highly informative embeddings of both modalities that can be used for accurate cross-modal searches, and encoding valuable physical information about the galaxies (redshift and stellar mass) that can be used for competitive zero- and few-shot predictions without further finetuning.
    Abstract We present AstroCLIP, a strategy to facilitate the construction of astronomical foundation models that bridge the gap between diverse observational modalities. We demonstrate that a cross-modal contrastive learning approach between images and optical spectra of galaxies yields highly informative embeddings of both modalities. In particular, we apply our method on multi-band images and optical spectra from the Dark Energy Spectroscopic Instrument (DESI), and show that: (1) these embeddings are well-aligned between modalities and can be used for accurate cross-modal searches, and (2) these embeddings encode valuable physical information about the galaxies -- in particular redshift and stellar mass -- that can be used to achieve competitive zero- and few- shot predictions without further finetuning. Additionally, in the process of developing our approach, we also construct a novel, transformer-based model and pretraining approach for processing galaxy spectra.
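The cross-modal contrastive objective at the heart of AstroCLIP is CLIP-style: embed images and spectra of the same galaxy, then train with a symmetric InfoNCE loss so matched pairs score higher than mismatched ones. The encoders below are stand-in MLPs for illustration; the paper uses real image backbones and a transformer-based spectrum encoder.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def clip_loss(img_emb, spec_emb, temperature=0.07):
    """Symmetric InfoNCE over a batch of (image, spectrum) pairs from the same galaxies."""
    img_emb = F.normalize(img_emb, dim=-1)
    spec_emb = F.normalize(spec_emb, dim=-1)
    logits = img_emb @ spec_emb.t() / temperature          # (B, B) similarity matrix
    targets = torch.arange(len(img_emb))                   # the diagonal holds the matched pairs
    return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets)) / 2

# Stand-in encoders: 5-band images flattened, spectra as 1D vectors.
image_encoder = nn.Sequential(nn.Flatten(), nn.Linear(5 * 32 * 32, 128))
spectrum_encoder = nn.Sequential(nn.Linear(1000, 128))

imgs, specs = torch.randn(16, 5, 32, 32), torch.randn(16, 1000)
loss = clip_loss(image_encoder(imgs), spectrum_encoder(specs))
loss.backward()   # gradients flow to both encoders
print(float(loss))
```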

SemiReward: A General Reward Model for Semi-supervised Learning

  • paper_url: http://arxiv.org/abs/2310.03013
  • repo_url: https://github.com/Westlake-AI/SemiReward
  • paper_authors: Siyuan Li, Weiyang Jin, Zedong Wang, Fang Wu, Zicheng Liu, Cheng Tan, Stan Z. Li
  • for: Improving the performance and convergence speed of semi-supervised learning (SSL) while remaining versatile across tasks and scenarios.
  • methods: Proposes SemiReward, a Semi-supervised Reward framework that predicts reward scores to evaluate and filter out high-quality pseudo labels, and is pluggable into mainstream SSL methods.
  • results: Extensive experiments on 13 standard SSL benchmarks across three modalities show significant performance gains and faster convergence over Pseudo Label, FlexMatch, and Free/SoftMatch.
    Abstract Semi-supervised learning (SSL) has witnessed great progress with various improvements in the self-training framework with pseudo labeling. The main challenge is how to distinguish high-quality pseudo labels against the confirmation bias. However, existing pseudo-label selection strategies are limited to pre-defined schemes or complex hand-crafted policies specially designed for classification, failing to achieve high-quality labels, fast convergence, and task versatility simultaneously. To these ends, we propose a Semi-supervised Reward framework (SemiReward) that predicts reward scores to evaluate and filter out high-quality pseudo labels, which is pluggable to mainstream SSL methods in wide task types and scenarios. To mitigate confirmation bias, SemiReward is trained online in two stages with a generator model and subsampling strategy. With classification and regression tasks on 13 standard SSL benchmarks of three modalities, extensive experiments verify that SemiReward achieves significant performance gains and faster convergence speeds upon Pseudo Label, FlexMatch, and Free/SoftMatch.
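A minimal sketch of the SemiReward idea: a small reward model scores (sample, pseudo-label) pairs, and only pseudo-labels whose predicted reward clears a threshold enter the unsupervised loss. The reward-model architecture, its inputs, and the threshold are simplified assumptions; the actual framework trains the rewarder online in two stages with a generator model and subsampling.

```python
import torch
import torch.nn as nn

class Rewarder(nn.Module):
    """Predicts a quality score in [0, 1] for a (feature, pseudo-label) pair."""

    def __init__(self, feat_dim, num_classes):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(feat_dim + num_classes, 64), nn.ReLU(),
                                 nn.Linear(64, 1), nn.Sigmoid())

    def forward(self, feats, pseudo_onehot):
        return self.net(torch.cat([feats, pseudo_onehot], dim=-1)).squeeze(-1)

def filter_pseudo_labels(feats, logits, rewarder, tau=0.5):
    pseudo = logits.argmax(dim=-1)
    onehot = nn.functional.one_hot(pseudo, logits.shape[-1]).float()
    scores = rewarder(feats, onehot)
    keep = scores > tau                      # only high-reward pseudo-labels are used for training
    return feats[keep], pseudo[keep], scores

# Toy usage on random unlabeled-batch features and classifier logits.
rewarder = Rewarder(feat_dim=32, num_classes=10)
feats, logits = torch.randn(64, 32), torch.randn(64, 10)
kept_feats, kept_labels, scores = filter_pseudo_labels(feats, logits, rewarder)
print(kept_labels.shape[0], "of 64 pseudo-labels kept")
```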

Soft Convex Quantization: Revisiting Vector Quantization with Convex Optimization

  • paper_url: http://arxiv.org/abs/2310.03004
  • repo_url: None
  • paper_authors: Tanmay Gautam, Reid Pryzant, Ziyi Yang, Chenguang Zhu, Somayeh Sojoudi
  • for: The paper is written for extracting informative discrete latent representations using Vector Quantization (VQ) and mitigating its practical challenges.
  • methods: The paper proposes Soft Convex Quantization (SCQ) as a direct substitute for VQ: a differentiable convex optimization (DCO) layer that solves for the optimal convex combination of codebook vectors to quantize inputs.
  • results: SCQ significantly outperforms matched VQ-based architectures in terms of image reconstruction and codebook usage, with an order of magnitude improvement and comparable quantization runtime, demonstrated on the CIFAR-10, GTSRB, and LSUN datasets.
    Abstract Vector Quantization (VQ) is a well-known technique in deep learning for extracting informative discrete latent representations. VQ-embedded models have shown impressive results in a range of applications including image and speech generation. VQ operates as a parametric K-means algorithm that quantizes inputs using a single codebook vector in the forward pass. While powerful, this technique faces practical challenges including codebook collapse, non-differentiability and lossy compression. To mitigate the aforementioned issues, we propose Soft Convex Quantization (SCQ) as a direct substitute for VQ. SCQ works like a differentiable convex optimization (DCO) layer: in the forward pass, we solve for the optimal convex combination of codebook vectors that quantize the inputs. In the backward pass, we leverage differentiability through the optimality conditions of the forward solution. We then introduce a scalable relaxation of the SCQ optimization and demonstrate its efficacy on the CIFAR-10, GTSRB and LSUN datasets. We train powerful SCQ autoencoder models that significantly outperform matched VQ-based architectures, observing an order of magnitude better image reconstruction and codebook usage with comparable quantization runtime.
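The sketch below illustrates the quantization step: instead of snapping to one nearest codebook vector (VQ), the input is reconstructed as a convex combination of codebook vectors. Here the convex weights come from a softmax relaxation in the spirit of the scalable relaxation the abstract mentions, rather than an exact DCO-layer solve, so it should be read as an approximation of the paper's method.

```python
import torch
import torch.nn as nn

class SoftConvexQuantizer(nn.Module):
    """Quantize inputs as convex combinations of codebook vectors (relaxed SCQ)."""

    def __init__(self, num_codes, dim, temperature=0.1):
        super().__init__()
        self.codebook = nn.Parameter(torch.randn(num_codes, dim))
        self.temperature = temperature

    def forward(self, z):
        # Squared distance to every code; smaller distance -> larger convex weight.
        d2 = torch.cdist(z, self.codebook) ** 2                  # (B, K)
        w = torch.softmax(-d2 / self.temperature, dim=-1)        # each row lies on the simplex
        return w @ self.codebook, w                              # differentiable end to end

quantizer = SoftConvexQuantizer(num_codes=512, dim=64)
z = torch.randn(32, 64)
z_q, weights = quantizer(z)
print(z_q.shape, float(weights.sum(dim=-1).mean()))              # weights sum to 1 per input

# Contrast with hard VQ, which picks a single nearest code and is non-differentiable.
hard_codes = torch.cdist(z, quantizer.codebook).argmin(dim=-1)
```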

ECoFLaP: Efficient Coarse-to-Fine Layer-Wise Pruning for Vision-Language Models

  • paper_url: http://arxiv.org/abs/2310.02998
  • repo_url: https://github.com/ylsung/ECoFLaP
  • paper_authors: Yi-Lin Sung, Jaehong Yoon, Mohit Bansal
  • for: Proposing an efficient layer-wise pruning method that reduces the computational cost and carbon footprint of large vision-language models (LVLMs) while preserving performance on multimodal downstream tasks.
  • methods: A two-stage coarse-to-fine scheme: layer- or block-level sparsity ratios are first set from a global importance score computed with a zeroth-order approximation of the global model gradients, then local layer-wise unstructured weight pruning is performed under these globally informed ratios.
  • results: Validated on multiple multimodal and unimodal models and datasets, the method outperforms prevalent pruning techniques in the high-sparsity regime.
    Abstract Large Vision-Language Models (LVLMs) can understand the world comprehensively by integrating rich information from different modalities, achieving remarkable performance improvements on various multimodal downstream tasks. However, deploying LVLMs is often problematic due to their massive computational/energy costs and carbon consumption. Such issues make it infeasible to adopt conventional iterative global pruning, which is costly due to computing the Hessian matrix of the entire large model for sparsification. Alternatively, several studies have recently proposed layer-wise pruning approaches to avoid the expensive computation of global pruning and efficiently compress model weights according to their importance within a layer. However, these methods often suffer from suboptimal model compression due to their lack of a global perspective. To address this limitation in recent efficient pruning methods for large models, we propose Efficient Coarse-to-Fine Layer-Wise Pruning (ECoFLaP), a two-stage coarse-to-fine weight pruning approach for LVLMs. We first determine the sparsity ratios of different layers or blocks by leveraging the global importance score, which is efficiently computed based on the zeroth-order approximation of the global model gradients. Then, the multimodal model performs local layer-wise unstructured weight pruning based on globally-informed sparsity ratios. We validate our proposed method across various multimodal and unimodal models and datasets, demonstrating significant performance improvements over prevalent pruning techniques in the high-sparsity regime.
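A compact sketch of the coarse-to-fine recipe: a zeroth-order estimate of each layer's global importance sets its sparsity ratio, then each layer is pruned locally by weight magnitude. The importance estimator, the mapping from importance to sparsity, and the magnitude criterion are simplified stand-ins for the paper's exact choices.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def layer_importance_zeroth_order(model, loss_fn, batch, sigma=1e-3, samples=4):
    """Estimate per-layer importance from loss changes under random weight perturbations."""
    scores = {}
    base = loss_fn(model, batch)
    for name, p in model.named_parameters():
        delta = 0.0
        for _ in range(samples):
            noise = sigma * torch.randn_like(p)
            p.add_(noise)
            delta += abs(loss_fn(model, batch) - base)
            p.sub_(noise)
        scores[name] = delta / samples
    return scores

@torch.no_grad()
def coarse_to_fine_prune(model, scores, global_sparsity=0.5):
    total = sum(scores.values())
    for name, p in model.named_parameters():
        # Coarse stage: more globally important layers receive a lower sparsity ratio.
        ratio = global_sparsity * (1 - scores[name] / (total + 1e-12))
        # Fine stage: local unstructured magnitude pruning within the layer.
        k = int(ratio * p.numel())
        if k > 0:
            thresh = p.abs().flatten().kthvalue(k).values
            p.mul_((p.abs() > thresh).float())

model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
batch = (torch.randn(64, 16), torch.randint(0, 4, (64,)))
loss_fn = lambda m, b: nn.functional.cross_entropy(m(b[0]), b[1]).item()
scores = layer_importance_zeroth_order(model, loss_fn, batch)
coarse_to_fine_prune(model, scores)
print({n: float((p == 0).float().mean()) for n, p in model.named_parameters()})
```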

Multiple Physics Pretraining for Physical Surrogate Models

  • paper_url: http://arxiv.org/abs/2310.02994
  • repo_url: https://github.com/PolymathicAI/multiple_physics_pretraining
  • paper_authors: Michael McCabe, Bruno Régaldo-Saint Blancard, Liam Holden Parker, Ruben Ohana, Miles Cranmer, Alberto Bietti, Michael Eickenberg, Siavash Golkar, Geraud Krawezik, Francois Lanusse, Mariel Pettee, Tiberiu Tesileanu, Kyunghyun Cho, Shirley Ho
  • for: Developing multiple physics pretraining (MPP), an autoregressive, task-agnostic pretraining approach for surrogate models that predict the dynamics of several heterogeneous physical systems at once.
  • methods: MPP uses a shared embedding and normalization strategy that projects the fields of multiple systems into a single shared embedding space, so the model learns features that are broadly useful across diverse physical tasks.
  • results: A single MPP-pretrained transformer matches or outperforms task-specific baselines on all pretraining sub-tasks without fine-tuning; for downstream tasks, fine-tuning MPP-pretrained models gives more accurate multi-step predictions on new physics than training from scratch or fine-tuning pretrained video foundation models. Code and model weights are open-sourced for reproducibility and community experimentation.
    Abstract We introduce multiple physics pretraining (MPP), an autoregressive task-agnostic pretraining approach for physical surrogate modeling. MPP involves training large surrogate models to predict the dynamics of multiple heterogeneous physical systems simultaneously by learning features that are broadly useful across diverse physical tasks. In order to learn effectively in this setting, we introduce a shared embedding and normalization strategy that projects the fields of multiple systems into a single shared embedding space. We validate the efficacy of our approach on both pretraining and downstream tasks over a broad fluid mechanics-oriented benchmark. We show that a single MPP-pretrained transformer is able to match or outperform task-specific baselines on all pretraining sub-tasks without the need for finetuning. For downstream tasks, we demonstrate that finetuning MPP-trained models results in more accurate predictions across multiple time-steps on new physics compared to training from scratch or finetuning pretrained video foundation models. We open-source our code and model weights trained at multiple scales for reproducibility and community experimentation.
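The shared embedding and normalization strategy the abstract mentions can be sketched as: each physical field is normalized and mapped by a field-specific projection into one shared token space consumed by a single transformer backbone. The field names, dimensions, and normalization statistics below are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class SharedFieldEmbedding(nn.Module):
    """Project heterogeneous physical fields into a single shared embedding space."""

    def __init__(self, field_channels, embed_dim):
        super().__init__()
        # One lightweight projection per field type (e.g. density, velocity, pressure).
        self.proj = nn.ModuleDict({name: nn.Linear(c, embed_dim)
                                   for name, c in field_channels.items()})

    def forward(self, name, field):
        # Per-sample, per-channel normalization so fields with very different scales coexist.
        mean = field.mean(dim=1, keepdim=True)
        std = field.std(dim=1, keepdim=True) + 1e-6
        return self.proj[name]((field - mean) / std)

embed = SharedFieldEmbedding({"density": 1, "velocity": 2}, embed_dim=64)
density = torch.randn(8, 1024, 1) * 1e3        # grid points flattened into a token axis
velocity = torch.randn(8, 1024, 2) * 1e-2
tokens = torch.cat([embed("density", density), embed("velocity", velocity)], dim=1)
print(tokens.shape)   # torch.Size([8, 2048, 64]): one shared space for both systems
```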

xVal: A Continuous Number Encoding for Large Language Models

  • paper_url: http://arxiv.org/abs/2310.02989
  • repo_url: https://github.com/PolymathicAI/xVal
  • paper_authors: Siavash Golkar, Mariel Pettee, Michael Eickenberg, Alberto Bietti, Miles Cranmer, Geraud Krawezik, Francois Lanusse, Michael McCabe, Ruben Ohana, Liam Parker, Bruno Régaldo-Saint Blancard, Tiberiu Tesileanu, Kyunghyun Cho, Shirley Ho
  • for: This paper addresses the difficulty of tokenizing numbers when applying language models to scientific data analysis.
  • methods: It proposes xVal, a numerical encoding scheme that can represent any real number with a single token.
  • results: Experiments on synthetic and real-world datasets show that xVal is more token-efficient than existing number encoding schemes and generalizes better overall.
    Abstract Large Language Models have not yet been broadly adapted for the analysis of scientific datasets due in part to the unique difficulties of tokenizing numbers. We propose xVal, a numerical encoding scheme that represents any real number using just a single token. xVal represents a given real number by scaling a dedicated embedding vector by the number value. Combined with a modified number-inference approach, this strategy renders the model end-to-end continuous when considered as a map from the numbers of the input string to those of the output string. This leads to an inductive bias that is generally more suitable for applications in scientific domains. We empirically evaluate our proposal on a number of synthetic and real-world datasets. Compared with existing number encoding schemes, we find that xVal is more token-efficient and demonstrates improved generalization.
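The core encoding step is simple enough to sketch: every number in the input is mapped to a dedicated placeholder token whose embedding is scaled by the numeric value. The snippet below is a minimal PyTorch illustration under an assumed vocabulary and a hypothetical `[NUM]` token id, not the authors' implementation.

```python
# Minimal sketch of single-token number encoding: the [NUM] embedding is scaled by the value.
import torch
import torch.nn as nn

class XValEmbedding(nn.Module):
    def __init__(self, vocab_size, dim, num_token_id):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.num_token_id = num_token_id

    def forward(self, token_ids, number_values):
        """token_ids: (batch, seq) ints; number_values: (batch, seq) floats,
        holding the numeric value at [NUM] positions and 1.0 elsewhere."""
        x = self.embed(token_ids)                                   # (batch, seq, dim)
        scale = torch.where(token_ids == self.num_token_id,
                            number_values, torch.ones_like(number_values))
        return x * scale.unsqueeze(-1)                              # scale [NUM] embeddings by the value

# Usage sketch: "T = 3.7" -> tokens [..., NUM, ...] with number_values [..., 3.7, ...]
```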

Probing Intersectional Biases in Vision-Language Models with Counterfactual Examples

  • paper_url: http://arxiv.org/abs/2310.02988
  • repo_url: None
  • paper_authors: Phillip Howard, Avinash Madasu, Tiep Le, Gustavo Lujan Moreno, Vasudev Lal
  • for: This study probes intersectional social-attribute biases in contemporary vision-language models (VLMs).
  • methods: Text-to-image diffusion models are used to generate large-scale counterfactual examples for probing intersectional social biases in current VLMs.
  • results: The experiments show that current VLMs exhibit intersectional social biases, and that these biases differ across different intersections of social attributes.
    Abstract While vision-language models (VLMs) have achieved remarkable performance improvements recently, there is growing evidence that these models also possess harmful biases with respect to social attributes such as gender and race. Prior studies have primarily focused on probing such bias attributes individually while ignoring biases associated with intersections between social attributes. This could be due to the difficulty of collecting an exhaustive set of image-text pairs for various combinations of social attributes from existing datasets. To address this challenge, we employ text-to-image diffusion models to produce counterfactual examples for probing intersectional social biases at scale. Our approach utilizes Stable Diffusion with cross attention control to produce sets of counterfactual image-text pairs that are highly similar in their depiction of a subject (e.g., a given occupation) while differing only in their depiction of intersectional social attributes (e.g., race & gender). We conduct extensive experiments using our generated dataset which reveal the intersectional social biases present in state-of-the-art VLMs.

Exploring the Impact of Disrupted Peer-to-Peer Communications on Fully Decentralized Learning in Disaster Scenarios

  • paper_url: http://arxiv.org/abs/2310.02986
  • repo_url: None
  • paper_authors: Luigi Palmieri, Chiara Boldrini, Lorenzo Valerio, Andrea Passarella, Marco Conti
  • for: This paper explores the resilience of decentralized learning in a disaster setting, specifically how the process is affected by abrupt changes in peer-to-peer communications.
  • methods: The paper uses a Barabasi-Albert graph topology and investigates the effects of losing devices with data versus those that only contribute to the graph connectivity.
  • results: The study finds that a loss of connectivity has a greater impact on the accuracy of the learning process than a loss of data, but the network remains relatively robust and the process can achieve a good level of accuracy.
    Abstract Fully decentralized learning enables the distribution of learning resources and decision-making capabilities across multiple user devices or nodes, and is rapidly gaining popularity due to its privacy-preserving and decentralized nature. Importantly, this crowdsourcing of the learning process allows the system to continue functioning even if some nodes are affected or disconnected. In a disaster scenario, communication infrastructure and centralized systems may be disrupted or completely unavailable, hindering the possibility of carrying out standard centralized learning tasks in these settings. Thus, fully decentralized learning can help in this case. However, transitioning from centralized to peer-to-peer communications introduces a dependency between the learning process and the topology of the communication graph among nodes. In a disaster scenario, even peer-to-peer communications are susceptible to abrupt changes, such as devices running out of battery or getting disconnected from others due to their position. In this study, we investigate the effects of various disruptions to peer-to-peer communications on decentralized learning in a disaster setting. We examine the resilience of a decentralized learning process when a subset of devices drop from the process abruptly. To this end, we analyze the difference between losing devices holding data, i.e., potential knowledge, vs. devices contributing only to the graph connectivity, i.e., with no data. Our findings on a Barabasi-Albert graph topology, where training data is distributed across nodes in an IID fashion, indicate that the accuracy of the learning process is more affected by a loss of connectivity than by a loss of data. Nevertheless, the network remains relatively robust, and the learning process can achieve a good level of accuracy.
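A toy simulation of this setting is easy to set up with NetworkX: nodes on a Barabasi-Albert graph take a local step and gossip-average with their neighbours, and a subset of nodes disconnects abruptly mid-run. The model, gradient, and dropout schedule below are illustrative placeholders rather than the paper's experimental setup.

```python
# Hypothetical sketch of decentralized (gossip) learning with an abrupt node disruption.
import numpy as np
import networkx as nx

def decentralized_round(graph, params, local_grads, lr=0.1):
    """One round: local SGD step per node, then averaging with current neighbours."""
    stepped = {n: params[n] - lr * local_grads[n] for n in graph.nodes}
    new_params = {}
    for n in graph.nodes:
        neigh = list(graph.neighbors(n)) + [n]
        new_params[n] = np.mean([stepped[m] for m in neigh], axis=0)
    return new_params

rng = np.random.default_rng(0)
g = nx.barabasi_albert_graph(n=50, m=2, seed=0)
params = {n: rng.standard_normal(10) for n in g.nodes}

for t in range(100):
    if t == 50:                                   # abrupt disruption: 20% of nodes disconnect
        dropped = rng.choice(list(g.nodes), size=10, replace=False)
        g.remove_nodes_from(dropped)
        params = {n: p for n, p in params.items() if n in g.nodes}
    grads = {n: params[n] for n in g.nodes}       # toy gradient: pulls parameters toward zero
    params = decentralized_round(g, params, grads)
```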

Scaling Laws for Associative Memories

  • paper_url: http://arxiv.org/abs/2310.02984
  • repo_url: None
  • paper_authors: Vivien Cabannes, Elvis Dohmatob, Alberto Bietti
  • for: This paper studies associative memory mechanisms.
  • methods: The model is based on high-dimensional matrices formed from outer products of embeddings, which relates to the inner layers of transformer language models.
  • results: The paper derives precise scaling laws with respect to sample size and parameter size, and discusses the statistical efficiency of different estimators, including optimization-based algorithms. Extensive numerical experiments validate and interpret the theoretical results, including fine-grained visualizations of the stored memory associations.
    Abstract Learning arguably involves the discovery and memorization of abstract rules. The aim of this paper is to study associative memory mechanisms. Our model is based on high-dimensional matrices consisting of outer products of embeddings, which relates to the inner layers of transformer language models. We derive precise scaling laws with respect to sample size and parameter size, and discuss the statistical efficiency of different estimators, including optimization-based algorithms. We provide extensive numerical experiments to validate and interpret theoretical results, including fine-grained visualizations of the stored memory associations.
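The outer-product construction mentioned in the abstract can be illustrated in a few lines: associations are stored as W = sum_i v_i u_i^T and recalled with a matrix-vector product followed by nearest-neighbour decoding. The dimensions and data below are synthetic and chosen only for illustration.

```python
# Small illustration of an outer-product associative memory and its recall step.
import numpy as np

rng = np.random.default_rng(0)
d, N = 128, 300                               # embedding dimension, number of stored pairs
U = rng.standard_normal((N, d)) / np.sqrt(d)  # input embeddings (roughly unit norm)
V = rng.standard_normal((N, d)) / np.sqrt(d)  # output embeddings

W = V.T @ U                                   # memory matrix: sum_i v_i u_i^T

def recall(u):
    out = W @ u                               # noisy superposition of stored outputs
    return int(np.argmax(V @ out))            # decode to the closest stored output

correct = sum(recall(U[i]) == i for i in range(N))
print(f"recalled {correct}/{N} associations with d={d}")
```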

Towards Fully Adaptive Regret Minimization in Heavy-Tailed Bandits

  • paper_url: http://arxiv.org/abs/2310.02975
  • repo_url: None
  • paper_authors: Gianmarco Genalti, Lupo Marsigli, Nicola Gatti, Alberto Maria Metelli
  • for: This paper studies learning algorithms for stochastic bandits with heavy-tailed reward distributions, in the setting where the distributional parameters are unknown.
  • methods: It studies adaptive algorithms and proposes Adaptive Robust UCB, a regret-minimization strategy for this setting.
  • results: The study shows that adaptivity comes at a cost, with two lower bounds implying higher regret than in the standard setting. Under a specific distributional assumption, Adaptive Robust UCB matches the known lower bound for the heavy-tailed MAB problem.
    Abstract Heavy-tailed distributions naturally arise in many settings, from finance to telecommunications. While regret minimization under sub-Gaussian or bounded support rewards has been widely studied, learning on heavy-tailed distributions only gained popularity over the last decade. In the stochastic heavy-tailed bandit problem, an agent learns under the assumption that the distributions have finite moments of maximum order $1+\epsilon$ which are uniformly bounded by a constant $u$, for some $\epsilon \in (0,1]$. To the best of our knowledge, literature only provides algorithms requiring these two quantities as an input. In this paper, we study the stochastic adaptive heavy-tailed bandit, a variation of the standard setting where both $\epsilon$ and $u$ are unknown to the agent. We show that adaptivity comes at a cost, introducing two lower bounds on the regret of any adaptive algorithm, implying a higher regret w.r.t. the standard setting. Finally, we introduce a specific distributional assumption and provide Adaptive Robust UCB, a regret minimization strategy matching the known lower bound for the heavy-tailed MAB problem.

Point-PEFT: Parameter-Efficient Fine-Tuning for 3D Pre-trained Models

  • paper_url: http://arxiv.org/abs/2310.03059
  • repo_url: https://github.com/EvenJoker/Point-PEFT
  • paper_authors: Ivan Tang, Eric Zhang, Ray Gu
  • for: This paper proposes a Parameter-Efficient Fine-Tuning (PEFT) method for adapting 3D point-cloud pre-trained models to downstream tasks at low adaptation cost.
  • methods: The Point-PEFT framework combines a pre-trained point-cloud model with a minimal set of learnable parameters: most parameters of the pre-trained model are frozen, and only the newly added PEFT modules, a Point-prior Prompt and a Geometry-aware Adapter, are tuned.
  • results: Experiments show that Point-PEFT outperforms full fine-tuning on various downstream tasks while using only 5% of the trainable parameters, demonstrating the efficiency and effectiveness of the approach.
    Abstract The popularity of pre-trained large models has revolutionized downstream tasks across diverse fields, such as language, vision, and multi-modality. To minimize the adaption cost for downstream tasks, many Parameter-Efficient Fine-Tuning (PEFT) techniques are proposed for language and 2D image pre-trained models. However, the specialized PEFT method for 3D pre-trained models is still under-explored. To this end, we introduce Point-PEFT, a novel framework for adapting point cloud pre-trained models with minimal learnable parameters. Specifically, for a pre-trained 3D model, we freeze most of its parameters, and only tune the newly added PEFT modules on downstream tasks, which consist of a Point-prior Prompt and a Geometry-aware Adapter. The Point-prior Prompt adopts a set of learnable prompt tokens, for which we propose to construct a memory bank with domain-specific knowledge, and utilize a parameter-free attention to enhance the prompt tokens. The Geometry-aware Adapter aims to aggregate point cloud features within spatial neighborhoods to capture fine-grained geometric information through local interactions. Extensive experiments indicate that our Point-PEFT can achieve better performance than the full fine-tuning on various downstream tasks, while using only 5% of the trainable parameters, demonstrating the efficiency and effectiveness of our approach. Code will be released at https://github.com/EvenJoker/Point-PEFT.
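The general parameter-efficient recipe, freezing the pre-trained backbone and training only small added modules, can be sketched as below. The adapter here is a generic residual bottleneck stand-in; the paper's actual Geometry-aware Adapter aggregates point features within spatial neighbourhoods, and the module names and sizes below are assumptions.

```python
# Hypothetical sketch of the freeze-backbone / train-adapter recipe (not the exact architecture).
import torch.nn as nn

class GeometryAwareAdapter(nn.Module):
    """Tiny residual bottleneck standing in for the paper's adapter module."""
    def __init__(self, dim, bottleneck=16):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        self.act = nn.GELU()

    def forward(self, x):
        return x + self.up(self.act(self.down(x)))   # residual adapter

def make_peft(backbone, dim):
    for p in backbone.parameters():
        p.requires_grad = False                       # freeze the pre-trained 3D model
    adapter = GeometryAwareAdapter(dim)
    trainable = list(adapter.parameters())            # only the new modules are tuned
    return nn.Sequential(backbone, adapter), trainable
```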

Credit card score prediction using machine learning models: A new dataset

  • paper_url: http://arxiv.org/abs/2310.02956
  • repo_url: https://github.com/Aryia-Behroziuan/neurons
  • paper_authors: Anas Arram, Masri Ayob, Musatafa Abbas Abbood Albadr, Alaa Sulaiman, Dheeb Albashish
  • for: Predicting credit card default risk.
  • methods: Machine learning models are used for credit card default prediction on a newly proposed dataset.
  • results: The MLP model performs best at identifying defaulting customers and assessing risk, with an AUC of 86.7%, an accuracy of 91.6%, and a recall above 80%.
    Abstract The use of credit cards has recently increased, creating an essential need for credit card assessment methods to minimize potential risks. This study investigates the utilization of machine learning (ML) models for a credit card default prediction system. The main goal here is to investigate the best-performing ML model for the newly proposed credit card scoring dataset. This new dataset, which includes credit card transaction histories and customer profiles, is proposed and tested using a variety of machine learning algorithms, including logistic regression, decision trees, random forests, multi-layer perceptron (MLP) neural network, XGBoost, and LightGBM. To prepare the data for machine learning models, we perform data pre-processing, feature extraction, feature selection, and data balancing techniques. Experimental results demonstrate that MLP outperforms logistic regression, decision trees, random forests, LightGBM, and XGBoost in terms of predictive performance in true positive rate, achieving an impressive area under the curve (AUC) of 86.7% and an accuracy rate of 91.6%, with a recall rate exceeding 80%. These results indicate the superiority of MLP in predicting the default customers and assessing the potential risks. Furthermore, they help banks and other financial institutions in predicting loan defaults at an earlier stage.
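A generic version of such an evaluation pipeline (preprocessing, naive class balancing, an MLP classifier, and AUC/accuracy/recall reporting) might look like the sketch below; the dataset, columns, and hyperparameters are placeholders rather than the study's exact setup.

```python
# Generic default-prediction pipeline sketch with scikit-learn; X, y are NumPy arrays.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import roc_auc_score, recall_score, accuracy_score

def evaluate_default_model(X, y):
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)
    scaler = StandardScaler().fit(X_tr)
    X_tr, X_te = scaler.transform(X_tr), scaler.transform(X_te)

    # Naive balancing by oversampling the minority class (SMOTE or similar could be used instead).
    rng = np.random.default_rng(0)
    idx_min, idx_maj = np.where(y_tr == 1)[0], np.where(y_tr == 0)[0]
    extra = rng.choice(idx_min, size=max(len(idx_maj) - len(idx_min), 0), replace=True)
    X_bal = np.vstack([X_tr, X_tr[extra]])
    y_bal = np.concatenate([y_tr, y_tr[extra]])

    clf = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500, random_state=0)
    clf.fit(X_bal, y_bal)
    prob = clf.predict_proba(X_te)[:, 1]
    pred = (prob >= 0.5).astype(int)
    return {"auc": roc_auc_score(y_te, prob),
            "accuracy": accuracy_score(y_te, pred),
            "recall": recall_score(y_te, pred)}
```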

Shadow Alignment: The Ease of Subverting Safely-Aligned Language Models

  • paper_url: http://arxiv.org/abs/2310.02949
  • repo_url: https://github.com/BeyonderXX/ShadowAlignment
  • paper_authors: Xianjun Yang, Xiao Wang, Qi Zhang, Linda Petzold, William Yang Wang, Xun Zhao, Dahua Lin
  • for: This paper examines AI safety, specifically how safety alignment of large language models (LLMs) guards against malicious use.
  • methods: It introduces a new attack, Shadow Alignment, which uses a tiny amount of data to adapt safety-aligned models to harmful tasks without sacrificing the models' helpfulness.
  • results: Experiments show that the shadow alignment attack can easily steer safety-aligned models toward harmful tasks, while the subverted models still answer regular questions appropriately.
    Abstract Warning: This paper contains examples of harmful language, and reader discretion is recommended. The increasing open release of powerful large language models (LLMs) has facilitated the development of downstream applications by reducing the essential cost of data annotation and computation. To ensure AI safety, extensive safety-alignment measures have been conducted to armor these models against malicious use (primarily hard prompt attack). However, beneath the seemingly resilient facade of the armor, there might lurk a shadow. By simply tuning on 100 malicious examples with 1 GPU hour, these safely aligned LLMs can be easily subverted to generate harmful content. Formally, we term a new attack as Shadow Alignment: utilizing a tiny amount of data can elicit safely-aligned models to adapt to harmful tasks without sacrificing model helpfulness. Remarkably, the subverted models retain their capability to respond appropriately to regular inquiries. Experiments across 8 models released by 5 different organizations (LLaMa-2, Falcon, InternLM, BaiChuan2, Vicuna) demonstrate the effectiveness of shadow alignment attack. Besides, the single-turn English-only attack successfully transfers to multi-turn dialogue and other languages. This study serves as a clarion call for a collective effort to overhaul and fortify the safety of open-source LLMs against malicious attackers.

Local Max-Entropy and Free Energy Principles, Belief Diffusions and their Singularities

  • paper_url: http://arxiv.org/abs/2310.02946
  • repo_url: https://github.com/opeltre/topos
  • paper_authors: Olivier Peltre
  • for: This paper examines three Bethe-Kikuchi variational principles, including their relationship to belief propagation algorithms on hypergraphs.
  • methods: It gives a unified description of the structure of the Bethe-Kikuchi principles and generalizes them to continuous-time diffusions.
  • results: The paper characterizes the critical points of the Bethe-Kikuchi functionals and the stationary beliefs, and their relationship to belief propagation. It also describes the hypersurface of singular beliefs, across which equilibria become unstable, by polynomial equations in the polytope of consistent beliefs.
    Abstract A comprehensive picture of three Bethe-Kikuchi variational principles including their relationship to belief propagation (BP) algorithms on hypergraphs is given. The structure of BP equations is generalized to define continuous-time diffusions, solving localized versions of the max-entropy principle (A), the variational free energy principle (B), and a less usual equilibrium free energy principle (C), Legendre dual to A. Both critical points of Bethe-Kikuchi functionals and stationary beliefs are shown to lie at the non-linear intersection of two constraint surfaces, enforcing energy conservation and marginal consistency respectively. The hypersurface of singular beliefs, across which equilibria become unstable as the constraint surfaces meet tangentially, is described by polynomial equations in the convex polytope of consistent beliefs. This polynomial is expressed by a loop series expansion for graphs of binary variables.

Assessing Large Language Models on Climate Information

  • paper_url: http://arxiv.org/abs/2310.02932
  • repo_url: None
  • paper_authors: Jannis Bulian, Mike S. Schäfer, Afra Amini, Heidi Lam, Massimiliano Ciaramita, Ben Gaiarin, Michelle Chen Huebscher, Christian Buck, Niels Mede, Markus Leippold, Nadine Strauss
  • for: This study evaluates the performance of large language models (LLMs) on climate change topics, in order to better understand their capabilities for climate communication.
  • methods: The study uses a comprehensive evaluation framework grounded in science communication principles to analyze LLM responses to climate change topics. The framework spans 8 dimensions and can discern up to 30 distinct issues in model outputs.
  • results: The researchers evaluate several recent LLMs and conduct a comprehensive analysis of the results, shedding light on both the potential and the limitations of LLMs in climate communication.
    Abstract Understanding how climate change affects us and learning about available solutions are key steps toward empowering individuals and communities to mitigate and adapt to it. As Large Language Models (LLMs) rise in popularity, it is necessary to assess their capability in this domain. In this study, we present a comprehensive evaluation framework, grounded in science communication principles, to analyze LLM responses to climate change topics. Our framework emphasizes both the presentational and epistemological adequacy of answers, offering a fine-grained analysis of LLM generations. Spanning 8 dimensions, our framework discerns up to 30 distinct issues in model outputs. The task is a real-world example of a growing number of challenging problems where AI can complement and lift human performance. We introduce a novel and practical protocol for scalable oversight that uses AI Assistance and relies on raters with relevant educational backgrounds. We evaluate several recent LLMs and conduct a comprehensive analysis of the results, shedding light on both the potential and the limitations of LLMs in the realm of climate communication.

Learning-Aided Warmstart of Model Predictive Control in Uncertain Fast-Changing Traffic

  • paper_url: http://arxiv.org/abs/2310.02918
  • repo_url: None
  • paper_authors: Mohamed-Khalil Bouzidi, Yue Yao, Daniel Goehring, Joerg Reichardt
  • for: Improving Model Predictive Control's (MPC) ability to escape local minima in nonconvex problems, and providing better initial guesses in fast-changing, uncertain environments.
  • methods: A neural-network-based multimodal predictor generates multiple trajectory proposals, which are then refined by a sampling-based technique.
  • results: The approach is validated with Monte Carlo simulations of traffic scenarios, and is shown to identify multiple distinct local minima and provide an improved initial guess.
    Abstract Model Predictive Control lacks the ability to escape local minima in nonconvex problems. Furthermore, in fast-changing, uncertain environments, the conventional warmstart, using the optimal trajectory from the last timestep, often falls short of providing an adequately close initial guess for the current optimal trajectory. This can potentially result in convergence failures and safety issues. Therefore, this paper proposes a framework for learning-aided warmstarts of Model Predictive Control algorithms. Our method leverages a neural network based multimodal predictor to generate multiple trajectory proposals for the autonomous vehicle, which are further refined by a sampling-based technique. This combined approach enables us to identify multiple distinct local minima and provide an improved initial guess. We validate our approach with Monte Carlo simulations of traffic scenarios.

Boosting Dermatoscopic Lesion Segmentation via Diffusion Models with Visual and Textual Prompts

  • paper_url: http://arxiv.org/abs/2310.02906
  • repo_url: None
  • paper_authors: Shiyi Du, Xiaosong Wang, Yongyi Lu, Yuyin Zhou, Shaoting Zhang, Alan Yuille, Kang Li, Zongwei Zhou
  • for: This paper proposes a diffusion-model-based data generation method to improve the accuracy of skin lesion diagnosis.
  • methods: The method builds on recent diffusion models and adds lesion-specific visual and textual prompts to control the generated dermatoscopic images.
  • results: Compared with classical generative models, the method improves both image quality and skin lesion segmentation. Experiments show a 9% increase in the SSIM image quality measure and an over 5% increase in Dice coefficients.
    Abstract Image synthesis approaches, e.g., generative adversarial networks, have been popular as a form of data augmentation in medical image analysis tasks. It is primarily beneficial to overcome the shortage of publicly accessible data and associated quality annotations. However, the current techniques often lack control over the detailed contents in generated images, e.g., the type of disease patterns, the location of lesions, and attributes of the diagnosis. In this work, we adapt the latest advance in the generative model, i.e., the diffusion model, with the added control flow using lesion-specific visual and textual prompts for generating dermatoscopic images. We further demonstrate the advantage of our diffusion model-based framework over the classical generation models in both the image quality and boosting the segmentation performance on skin lesions. It can achieve a 9% increase in the SSIM image quality measure and an over 5% increase in Dice coefficients over the prior arts.

Searching for High-Value Molecules Using Reinforcement Learning and Transformers

  • paper_url: http://arxiv.org/abs/2310.02902
  • repo_url: None
  • paper_authors: Raj Ghugare, Santiago Miret, Adriana Hugessen, Mariano Phielipp, Glen Berseth
  • for: This paper studies the use of reinforcement learning (RL) to design molecules represented as text.
  • methods: The paper uses RL algorithms and text grammars to structure the search space, and conducts extensive experiments on different design choices and training strategies.
  • results: Based on these experiments, the paper proposes a new RL-based molecular design algorithm (ChemRLformer) and performs a thorough analysis across 25 molecular design tasks, including computationally complex protein docking simulations. ChemRLformer achieves state-of-the-art performance while being more straightforward than prior work, demystifying which design choices actually help text-based molecule design.
    Abstract Reinforcement learning (RL) over text representations can be effective for finding high-value policies that can search over graphs. However, RL requires careful structuring of the search space and algorithm design to be effective in this challenge. Through extensive experiments, we explore how different design choices for text grammar and algorithmic choices for training can affect an RL policy's ability to generate molecules with desired properties. We arrive at a new RL-based molecular design algorithm (ChemRLformer) and perform a thorough analysis using 25 molecule design tasks, including computationally complex protein docking simulations. From this analysis, we discover unique insights in this problem space and show that ChemRLformer achieves state-of-the-art performance while being more straightforward than prior work by demystifying which design choices are actually helpful for text-based molecule design.

Notes on a Path to AI Assistance in Mathematical Reasoning

  • paper_url: http://arxiv.org/abs/2310.02896
  • repo_url: None
  • paper_authors: Alex Kontorovich
  • for: The purpose of this paper is to provide AI assistance to research mathematicians.
  • methods: The paper uses AI technology to help mathematicians perform mathematical reasoning.
  • results: The paper obtains some useful results that can help research mathematicians complete their work more effectively.
    Abstract These informal notes are based on the author's lecture at the National Academies of Science, Engineering, and Mathematics workshop on "AI to Assist Mathematical Reasoning" in June 2023. The goal is to think through a path by which we might arrive at AI that is useful for the research mathematician.

Recent Methodological Advances in Federated Learning for Healthcare

  • paper_url: http://arxiv.org/abs/2310.02874
  • repo_url: None
  • paper_authors: Fan Zhang, Daniel Kreuter, Yichen Chen, Sören Dittmer, Samuel Tull, Tolou Shadbahr, BloodCounts! Collaboration, Jacobus Preller, James H. F. Rudd, John A. D. Aston, Carola-Bibiane Schönlieb, Nicholas Gleadall, Michael Roberts
  • for: This review describes how federated learning methods address the many challenges of healthcare data, such as highly siloed data, class imbalance, missing data, distribution shifts, and non-standardized variables.
  • methods: The reviewed methods involve distributed optimization, communication between nodes, aggregation of models, and redistribution of models.
  • results: The review identifies systemic issues throughout the literature that compromise the methodologies in many of the reviewed papers, and gives detailed recommendations to improve the quality of federated learning methodology development for healthcare.
    Abstract For healthcare datasets, it is often not possible to combine data samples from multiple sites due to ethical, privacy or logistical concerns. Federated learning allows for the utilisation of powerful machine learning algorithms without requiring the pooling of data. Healthcare data has many simultaneous challenges which require new methodologies to address, such as highly-siloed data, class imbalance, missing data, distribution shifts and non-standardised variables. Federated learning adds significant methodological complexity to conventional centralised machine learning, requiring distributed optimisation, communication between nodes, aggregation of models and redistribution of models. In this systematic review, we consider all papers on Scopus that were published between January 2015 and February 2023 and which describe new federated learning methodologies for addressing challenges with healthcare data. We performed a detailed review of the 89 papers which fulfilled these criteria. Significant systemic issues were identified throughout the literature which compromise the methodologies in many of the papers reviewed. We give detailed recommendations to help improve the quality of the methodology development for federated learning in healthcare.
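Most of the reviewed methods build on the basic federated loop of local training, aggregation, and redistribution; a minimal FedAvg-style sketch of that loop, with a toy linear model standing in for a real clinical model, is shown below.

```python
# Minimal FedAvg-style sketch: local updates per site, size-weighted aggregation, redistribution.
import numpy as np

def local_update(weights, X, y, lr=0.05, epochs=1):
    """One site's local training (toy linear model, squared loss)."""
    w = weights.copy()
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

def federated_round(global_w, site_data):
    local_ws, sizes = [], []
    for X, y in site_data:                         # each site trains only on its own data
        local_ws.append(local_update(global_w, X, y))
        sizes.append(len(y))
    # Aggregate: weighted average by site size; the result is redistributed as the new global model.
    return np.average(np.stack(local_ws), axis=0, weights=np.array(sizes, dtype=float))

rng = np.random.default_rng(0)
true_w = rng.standard_normal(5)
site_data = []
for _ in range(4):                                 # four synthetic "sites"
    X = rng.standard_normal((40, 5))
    site_data.append((X, X @ true_w + 0.1 * rng.standard_normal(40)))

w = np.zeros(5)
for _ in range(50):
    w = federated_round(w, site_data)
```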

Stable and Interpretable Deep Learning for Tabular Data: Introducing InterpreTabNet with the Novel InterpreStability Metric

  • paper_url: http://arxiv.org/abs/2310.02870
  • repo_url: None
  • paper_authors: Shiyun Wa, Xinai Lu, Minjuan Wang
  • for: Improving both classification accuracy and interpretability of AI models designed for application across diverse sectors.
  • methods: Building on the TabNet architecture, the model improves the attentive module to ensure robust gradient propagation and computational stability.
  • results: Across a variety of application scenarios, InterpreTabNet surpasses other leading interpretable models, and the paper proposes a new evaluation metric, InterpreStability, which can be used to measure and compare the interpretability of future models.
    Abstract As Artificial Intelligence (AI) integrates deeper into diverse sectors, the quest for powerful models has intensified. While significant strides have been made in boosting model capabilities and their applicability across domains, a glaring challenge persists: many of these state-of-the-art models remain as black boxes. This opacity not only complicates the explanation of model decisions to end-users but also obstructs insights into intermediate processes for model designers. To address these challenges, we introduce InterpreTabNet, a model designed to enhance both classification accuracy and interpretability by leveraging the TabNet architecture with an improved attentive module. This design ensures robust gradient propagation and computational stability. Additionally, we present a novel evaluation metric, InterpreStability, which quantifies the stability of a model's interpretability. The proposed model and metric mark a significant stride forward in explainable models' research, setting a standard for transparency and interpretability in AI model design and application across diverse sectors. InterpreTabNet surpasses other leading solutions in tabular data analysis across varied application scenarios, paving the way for further research into creating deep-learning models that are both highly accurate and inherently explainable. The introduction of the InterpreStability metric ensures that the interpretability of future models can be measured and compared in a consistent and rigorous manner. Collectively, these contributions have the potential to promote the design principles and development of next-generation interpretable AI models, widening the adoption of interpretable AI solutions in critical decision-making environments.

A novel asymmetrical autoencoder with a sparsifying discrete cosine Stockwell transform layer for gearbox sensor data compression

  • paper_url: http://arxiv.org/abs/2310.02862
  • repo_url: None
  • paper_authors: Xin Zhu, Daoguang Yang, Hongyi Pan, Hamid Reza Karimi, Didem Ozevin, Ahmet Enis Cetin
  • for: The lack of an efficient compression model for wireless transmission of gearbox data remains a challenge in non-contact gear fault diagnosis.
  • methods: The paper proposes a signal-adaptive asymmetrical autoencoder with a new discrete cosine Stockwell transform (DCST) layer that replaces the linear layers of a multi-layer autoencoder. A trainable filter is implemented in the DCST domain, and a trainable hard-thresholding layer removes redundant data to make the feature map sparse. Compared with a linear layer, the DCST layer reduces the number of trainable parameters and improves the accuracy of data reconstruction.
  • results: On the University of Connecticut (UoC) and Southeast University (SEU) gearbox datasets, the proposed method improves the average quality score over other autoencoder-based methods by 2.00% at the lowest and 32.35% at the highest, while requiring only a limited number of training samples.
    Abstract The lack of an efficient compression model remains a challenge for the wireless transmission of gearbox data in non-contact gear fault diagnosis problems. In this paper, we present a signal-adaptive asymmetrical autoencoder with a transform domain layer to compress sensor signals. First, a new discrete cosine Stockwell transform (DCST) layer is introduced to replace linear layers in a multi-layer autoencoder. A trainable filter is implemented in the DCST domain by utilizing the multiplication property of the convolution. A trainable hard-thresholding layer is applied to reduce redundant data in the DCST layer to make the feature map sparse. In comparison to the linear layer, the DCST layer reduces the number of trainable parameters and improves the accuracy of data reconstruction. Second, training the autoencoder with a sparsifying DCST layer only requires a small number of datasets. The proposed method is superior to other autoencoder-based methods on the University of Connecticut (UoC) and Southeast University (SEU) gearbox datasets, as the average quality score is improved by 2.00% at the lowest and 32.35% at the highest with a limited number of training samples
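The sparsifying step can be sketched independently of the Stockwell-domain details: a learnable per-channel threshold zeroes small transform-domain coefficients so the compressed feature map becomes sparse. The soft-thresholding form below is a common differentiable surrogate and an assumption, not necessarily the paper's exact layer.

```python
# Hypothetical sketch of a trainable thresholding (sparsifying) layer in the transform domain.
import torch
import torch.nn as nn

class TrainableThreshold(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.raw_tau = nn.Parameter(torch.full((channels,), -3.0))  # softplus keeps tau > 0

    def forward(self, coeffs):                       # coeffs: (batch, channels) transform coefficients
        tau = nn.functional.softplus(self.raw_tau)
        # Soft thresholding: shrink coefficients toward zero and zero out the small ones.
        return torch.sign(coeffs) * torch.relu(coeffs.abs() - tau)

# Usage: forward transform -> TrainableThreshold -> decoder / inverse transform
layer = TrainableThreshold(channels=64)
sparse = layer(torch.randn(8, 64))
print((sparse == 0).float().mean())                  # fraction of zeroed coefficients
```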

Rayleigh Quotient Graph Neural Networks for Graph-level Anomaly Detection

  • paper_url: http://arxiv.org/abs/2310.02861
  • repo_url: None
  • paper_authors: Xiangyu Dong, Xingyi Zhang, Sibo Wang
  • for: This work proposes a new spectral graph neural network (RQGNN) for graph-level anomaly detection.
  • methods: The framework consists of two components: a Rayleigh Quotient learning component (RQL) and a Chebyshev Wavelet GNN with RQ-pooling (CWGNN-RQ). RQL explicitly captures the Rayleigh Quotient of graphs, while CWGNN-RQ implicitly explores the spectral space of graphs.
  • results: Extensive experiments on 10 real-world datasets show that RQGNN outperforms the best rival by 6.74% in macro-F1 score and 1.44% in AUC, demonstrating the effectiveness of the framework.
    Abstract Graph-level anomaly detection has gained significant attention as it finds many applications in various domains, such as cancer diagnosis and enzyme prediction. However, existing methods fail to capture the underlying properties of graph anomalies, resulting in unexplainable framework design and unsatisfying performance. In this paper, we take a step back and re-investigate the spectral differences between anomalous and normal graphs. Our main observation shows a significant disparity in the accumulated spectral energy between these two classes. Moreover, we prove that the accumulated spectral energy of the graph signal can be represented by its Rayleigh Quotient, indicating that the Rayleigh Quotient is a driving factor behind the anomalous properties of graphs. Motivated by this, we propose Rayleigh Quotient Graph Neural Network (RQGNN), the first spectral GNN for graph-level anomaly detection, providing a new perspective on exploring the inherent spectral features of anomalous graphs. Specifically, we introduce a novel framework that consists of two components: the Rayleigh Quotient learning component (RQL) and Chebyshev Wavelet GNN with RQ-pooling (CWGNN-RQ). RQL explicitly captures the Rayleigh Quotient of graphs and CWGNN-RQ implicitly explores the spectral space of graphs. Extensive experiments on 10 real-world datasets show that RQGNN outperforms the best rival by 6.74% in Macro-F1 score and 1.44% in AUC, demonstrating the effectiveness of our framework.
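For a graph signal x on a graph with Laplacian L, the Rayleigh Quotient referenced above is R(x) = (x^T L x) / (x^T x); the snippet below computes it with NetworkX/NumPy for a constant (smooth) and a random (rough) signal.

```python
# Computing the Rayleigh Quotient of a graph signal: R(x) = x^T L x / x^T x.
import numpy as np
import networkx as nx

def rayleigh_quotient(graph, signal):
    L = nx.laplacian_matrix(graph).toarray().astype(float)
    x = np.asarray(signal, dtype=float)
    return float(x @ L @ x) / float(x @ x)

g = nx.erdos_renyi_graph(20, 0.2, seed=1)
smooth = np.ones(20)                                   # constant signal -> quotient 0 (lowest frequency)
rough = np.random.default_rng(1).standard_normal(20)   # random signal -> larger quotient
print(rayleigh_quotient(g, smooth), rayleigh_quotient(g, rough))
```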

Large language models in textual analysis for gesture selection

  • paper_url: http://arxiv.org/abs/2310.13705
  • repo_url: None
  • paper_authors: Laura B. Hensel, Nutchanon Yongsatianchot, Parisa Torshizi, Elena Minucci, Stacy Marsella
  • for: This paper focuses on automatic gesture generation, specifically on using large language models (LLMs) for context-specific gesture selection.
  • methods: ChatGPT is used as a tool for suggesting context-specific gestures from minimal prompts. The LLM can also suggest novel yet appropriate gestures that are not present in the minimal training data.
  • results: Using LLMs reduces the need for laborious annotation and allows quick, flexible adaptation to different designer intents, making them a promising avenue for automatic gesture generation.
    Abstract Gestures perform a variety of communicative functions that powerfully influence human face-to-face interaction. How this communicative function is achieved varies greatly between individuals and depends on the role of the speaker and the context of the interaction. Approaches to automatic gesture generation vary not only in the degree to which they rely on data-driven techniques but also the degree to which they can produce context and speaker specific gestures. However, these approaches face two major challenges: The first is obtaining sufficient training data that is appropriate for the context and the goal of the application. The second is related to designer control to realize their specific intent for the application. Here, we approach these challenges by using large language models (LLMs) to show that these powerful models of large amounts of data can be adapted for gesture analysis and generation. Specifically, we used ChatGPT as a tool for suggesting context-specific gestures that can realize designer intent based on minimal prompts. We also find that ChatGPT can suggest novel yet appropriate gestures not present in the minimal training data. The use of LLMs is a promising avenue for gesture generation that reduces the need for laborious annotations and has the potential to flexibly and quickly adapt to different designer intents.

GPT-4 as an interface between researchers and computational software: improving usability and reproducibility

  • paper_url: http://arxiv.org/abs/2310.11458
  • repo_url: None
  • paper_authors: Juan C. Verduzco, Ethan Holbrook, Alejandro Strachan
  • for: This study explores the use of large language models (LLMs) in science and engineering, and in particular their application in computational materials science.
  • methods: The study uses the GPT-4 language model to address two major challenges in computational materials science: the high barrier to adoption of scientific software that uses custom input languages, and the poor reproducibility of published results due to insufficient descriptions of the computational methods.
  • results: GPT-4 can generate correct and ready-to-use input files for relatively simple tasks and useful starting points for complex, multi-step simulations. It can also describe computational tasks from input files, with descriptions tunable from detailed step-by-step instructions to summaries appropriate for publications. The results suggest that GPT-4 can reduce the number of routine tasks performed by researchers, accelerate the training of new users, and improve reproducibility.
    Abstract Large language models (LLMs) are playing an increasingly important role in science and engineering. For example, their ability to parse and understand human and computer languages makes them powerful interpreters and their use in applications like code generation are well-documented. We explore the ability of the GPT-4 LLM to ameliorate two major challenges in computational materials science: i) the high barriers for adoption of scientific software associated with the use of custom input languages, and ii) the poor reproducibility of published results due to insufficient details in the description of simulation methods. We focus on a widely used software for molecular dynamics simulations, the Large-scale Atomic/Molecular Massively Parallel Simulator (LAMMPS), and quantify the usefulness of input files generated by GPT-4 from task descriptions in English and its ability to generate detailed descriptions of computational tasks from input files. We find that GPT-4 can generate correct and ready-to-use input files for relatively simple tasks and useful starting points for more complex, multi-step simulations. In addition, GPT-4's description of computational tasks from input files can be tuned from a detailed set of step-by-step instructions to a summary description appropriate for publications. Our results show that GPT-4 can reduce the number of routine tasks performed by researchers, accelerate the training of new users, and enhance reproducibility.

Sweeping Heterogeneity with Smart MoPs: Mixture of Prompts for LLM Task Adaptation

  • paper_url: http://arxiv.org/abs/2310.02842
  • repo_url: None
  • paper_authors: Chen Dun, Mirian Hipolito Garcia, Guoqing Zheng, Ahmed Hassan Awadallah, Anastasios Kyrillidis, Robert Sim
  • for: This paper explores how to use a Mixture of Prompts (MoPs) with smart gating to adapt large language models (LLMs) to heterogeneous tasks and data distributions, improving their performance in multi-task, multi-source scenarios.
  • methods: MoPs combine multiple prompts, and the smart gating function dynamically assigns suitable combinations of expert prompts for different tasks and data distributions.
  • results: Experiments show that MoPs can mitigate prompt-training "interference" in multi-task, multi-source scenarios, as well as possible implications of model approximations. Specifically, MoPs decrease final perplexity by roughly 20% up to 70% relative to baselines in the federated scenario, and by roughly 3% up to 30% in the centralized scenario.
    Abstract Large Language Models (LLMs) have the ability to solve a variety of tasks, such as text summarization and mathematical questions, just out of the box, but they are often trained with a single task in mind. Due to high computational costs, the current trend is to use prompt instruction tuning to better adjust monolithic, pretrained LLMs for new -- but often individual -- downstream tasks. Thus, how one would expand prompt tuning to handle -- concomitantly -- heterogeneous tasks and data distributions is a widely open question. To address this gap, we suggest the use of \emph{Mixture of Prompts}, or MoPs, associated with smart gating functionality: the latter -- whose design is one of the contributions of this paper -- can identify relevant skills embedded in different groups of prompts and dynamically assign combined experts (i.e., collection of prompts), based on the target task. Additionally, MoPs are empirically agnostic to any model compression technique applied -- for efficiency reasons -- as well as instruction data source and task composition. In practice, MoPs can simultaneously mitigate prompt training "interference" in multi-task, multi-source scenarios (e.g., task and data heterogeneity across sources), as well as possible implications from model approximations. As a highlight, MoPs manage to decrease final perplexity from $\sim20\%$ up to $\sim70\%$, as compared to baselines, in the federated scenario, and from $\sim 3\%$ up to $\sim30\%$ in the centralized scenario.
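One plausible way to realize prompt mixing with gating is sketched below: a small gate scores a bank of soft prompts from the pooled input embedding and prepends their weighted combination. The sizes, pooling, and gating form are assumptions for illustration, not the paper's exact design.

```python
# Hypothetical mixture-of-prompts sketch: gate over a bank of soft prompts, prepend the mix.
import torch
import torch.nn as nn

class MixtureOfPrompts(nn.Module):
    def __init__(self, n_prompts=8, prompt_len=10, dim=512):
        super().__init__()
        self.prompts = nn.Parameter(torch.randn(n_prompts, prompt_len, dim) * 0.02)
        self.gate = nn.Linear(dim, n_prompts)

    def forward(self, input_embeds):                  # (batch, seq, dim) from a frozen LLM embedder
        pooled = input_embeds.mean(dim=1)             # crude summary of the task input
        weights = torch.softmax(self.gate(pooled), dim=-1)            # (batch, n_prompts)
        mixed = torch.einsum("bn,nld->bld", weights, self.prompts)    # combined expert prompt
        return torch.cat([mixed, input_embeds], dim=1)                # prepend to the input

mop = MixtureOfPrompts()
out = mop(torch.randn(4, 32, 512))                    # -> (4, 10 + 32, 512)
```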

Improving Vision Anomaly Detection with the Guidance of Language Modality

  • paper_url: http://arxiv.org/abs/2310.02821
  • repo_url: https://github.com/Anfeather/CMG
  • paper_authors: Dong Chen, Kaihang Pan, Guoming Wang, Yueting Zhuang, Siliang Tang
  • for: This paper proposes a multimodal approach to vision anomaly detection that addresses the redundant-information and sparse-latent-space problems, targeting applications such as industrial defect detection and event detection.
  • methods: The paper proposes Cross-modal Guidance (CMG), consisting of Cross-modal Entropy Reduction (CMER) and Cross-modal Linear Embedding (CMLE). CMER masks parts of the raw image, computes a matching score with the text, and discards irrelevant pixels so the detector focuses on critical content. CMLE learns a correlation structure matrix from the language modality, which guides the learning of the vision latent space so that semantically similar images move closer together.
  • results: Experiments show that the proposed method outperforms the image-only baseline by 16.81%. Ablation experiments further confirm the synergy among the proposed components, each of which depends on the others to achieve optimal performance.
    Abstract Recent years have seen a surge of interest in anomaly detection for tackling industrial defect detection, event detection, etc. However, existing unsupervised anomaly detectors, particularly those for the vision modality, face significant challenges due to redundant information and sparse latent space. Conversely, the language modality performs well due to its relatively single data. This paper tackles the aforementioned challenges for vision modality from a multimodal point of view. Specifically, we propose Cross-modal Guidance (CMG), which consists of Cross-modal Entropy Reduction (CMER) and Cross-modal Linear Embedding (CMLE), to tackle the redundant information issue and sparse space issue, respectively. CMER masks parts of the raw image and computes the matching score with the text. Then, CMER discards irrelevant pixels to make the detector focus on critical contents. To learn a more compact latent space for the vision anomaly detector, CMLE learns a correlation structure matrix from the language modality, and then the latent space of vision modality will be learned with the guidance of the matrix. Thereafter, the vision latent space will get semantically similar images closer. Extensive experiments demonstrate the effectiveness of the proposed methods. Particularly, CMG outperforms the baseline that only uses images by 16.81%. Ablation experiments further confirm the synergy among the proposed methods, as each component depends on the other to achieve optimal performance.

Time-Series Classification in Smart Manufacturing Systems: An Experimental Evaluation of State-of-the-Art Machine Learning Algorithms

  • paper_url: http://arxiv.org/abs/2310.02812
  • repo_url: None
  • paper_authors: Mojtaba A. Farahani, M. R. McCormick, Ramy Harik, Thorsten Wuest
  • for: This study provides a rigorous experimental evaluation of state-of-the-art machine learning and deep learning algorithms for time-series classification (TSC) tasks in manufacturing and industrial settings.
  • methods: The study compiles more than 92 state-of-the-art algorithms from the TSC and manufacturing literature, selects the 36 most representative, and evaluates them experimentally on 22 manufacturing datasets.
  • results: ResNet, DrCIF, InceptionTime, and ARSENAL are the top-performing algorithms, with an average accuracy above 96.6% across the 22 manufacturing TSC datasets. LSTM, BiLSTM, and TS-LSTM also perform well at capturing temporal features in time-series data with RNN-based structures.
    Abstract Manufacturing is gathering extensive amounts of diverse data, thanks to the growing number of sensors and rapid advances in sensing technologies. Among the various data types available in SMS settings, time-series data plays a pivotal role. Hence, TSC emerges as crucial in this domain. The objective of this study is to fill this gap by providing a rigorous experimental evaluation of the SoTA ML and DL algorithms for TSC tasks in manufacturing and industrial settings. We first explored and compiled a comprehensive list of more than 92 SoTA algorithms from both TSC and manufacturing literature. Following this, we selected the 36 most representative algorithms from this list. To evaluate their performance across various manufacturing classification tasks, we curated a set of 22 manufacturing datasets, representative of different characteristics that cover diverse manufacturing problems. Subsequently, we implemented and evaluated the algorithms on the manufacturing benchmark datasets, and analyzed the results for each dataset. Based on the results, ResNet, DrCIF, InceptionTime, and ARSENAL are the top-performing algorithms, boasting an average accuracy of over 96.6% across all 22 manufacturing TSC datasets. These findings underscore the robustness, efficiency, scalability, and effectiveness of convolutional kernels in capturing temporal features in time-series data, as three out of the top four performing algorithms leverage these kernels for feature extraction. Additionally, LSTM, BiLSTM, and TS-LSTM algorithms deserve recognition for their effectiveness in capturing features within time-series data using RNN-based structures.
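The benchmarking protocol itself reduces to a nested loop over algorithms and datasets with cross-validated accuracy; the sketch below shows that loop with generic scikit-learn estimators standing in for the dedicated TSC models evaluated in the study.

```python
# Generic benchmarking loop: cross-validated accuracy per (algorithm, dataset), averaged per algorithm.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier

def benchmark(datasets, algorithms, folds=5):
    """datasets: dict name -> (X, y) with X of shape (n_series, n_timesteps); y integer labels."""
    results = {}
    for algo_name, make_clf in algorithms.items():
        scores = []
        for X, y in datasets.values():
            scores.append(cross_val_score(make_clf(), X, y, cv=folds, scoring="accuracy").mean())
        results[algo_name] = float(np.mean(scores))     # average accuracy across datasets
    return results

# Stand-in estimators; the study uses dedicated TSC implementations (ResNet, DrCIF, InceptionTime, ...).
algorithms = {
    "1NN (stand-in)": lambda: KNeighborsClassifier(n_neighbors=1),
    "RandomForest (stand-in)": lambda: RandomForestClassifier(n_estimators=200, random_state=0),
}
```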

Discovering General Reinforcement Learning Algorithms with Adversarial Environment Design

  • paper_url: http://arxiv.org/abs/2310.02782
  • repo_url: https://github.com/EmptyJackson/groove
  • paper_authors: Matthew Thomas Jackson, Minqi Jiang, Jack Parker-Holder, Risto Vuorio, Chris Lu, Gregory Farquhar, Shimon Whiteson, Jakob Nicolaus Foerster
  • for: This work aims to discover general RL algorithms that perform well across a wide range of deep reinforcement learning tasks, addressing the generalization gap of existing meta-learned RL algorithms on unseen environments.
  • methods: It builds on meta-learned update rules and ideas from Unsupervised Environment Design (UED), proposing a method for automatically generating curricula that maximize the regret of the meta-learned optimizer, together with a novel approximation of regret called algorithmic regret (AR).
  • results: Experiments show that GROOVE achieves superior generalization to LPG, and AR is identified as a critical component of environment design in this setting.
    Abstract The past decade has seen vast progress in deep reinforcement learning (RL) on the back of algorithms manually designed by human researchers. Recently, it has been shown that it is possible to meta-learn update rules, with the hope of discovering algorithms that can perform well on a wide range of RL tasks. Despite impressive initial results from algorithms such as Learned Policy Gradient (LPG), there remains a generalization gap when these algorithms are applied to unseen environments. In this work, we examine how characteristics of the meta-training distribution impact the generalization performance of these algorithms. Motivated by this analysis and building on ideas from Unsupervised Environment Design (UED), we propose a novel approach for automatically generating curricula to maximize the regret of a meta-learned optimizer, in addition to a novel approximation of regret, which we name algorithmic regret (AR). The result is our method, General RL Optimizers Obtained Via Environment Design (GROOVE). In a series of experiments, we show that GROOVE achieves superior generalization to LPG, and evaluate AR against baseline metrics from UED, identifying it as a critical component of environment design in this setting. We believe this approach is a step towards the discovery of truly general RL algorithms, capable of solving a wide range of real-world environments.
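A regret-driven curriculum of the kind described above can be sketched as sampling training environments in proportion to an (approximate) regret score; the estimator and sampling rule below are placeholders, not GROOVE's exact procedure.

```python
# Hypothetical regret-prioritized curriculum: favour environments where the learner's regret is high.
import numpy as np

def regret_curriculum(env_params, estimate_regret, n_rounds=100, temperature=1.0, rng=None):
    """env_params: list of candidate environment parameterisations;
    estimate_regret: callable returning an approximate regret for one environment."""
    rng = rng or np.random.default_rng(0)
    scores = np.array([estimate_regret(p) for p in env_params], dtype=float)
    chosen = []
    for _ in range(n_rounds):
        probs = np.exp(scores / temperature)
        probs /= probs.sum()
        idx = rng.choice(len(env_params), p=probs)         # sample high-regret environments more often
        chosen.append(env_params[idx])
        scores[idx] = estimate_regret(env_params[idx])      # refresh the score after training on it
    return chosen
```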

Integrating UMLS Knowledge into Large Language Models for Medical Question Answering

  • paper_url: http://arxiv.org/abs/2310.02778
  • repo_url: https://github.com/YangRui525/UMLS-Augmented-LLM
  • paper_authors: Rui Yang, Edison Marrese-Taylor, Yuhe Ke, Lechao Cheng, Qingyu Chen, Irene Li
  • for: This work aims to improve the application of large language models in healthcare and make them better suited to real clinical scenarios.
  • methods: The study develops a UMLS-augmented LLM framework, using LLaMa2-13b-chat and ChatGPT-3.5 as benchmark models. Automatic evaluation uses the ROUGE Score and BERTScore on 104 questions from the LiveQA test set, and physician evaluation is based on four dimensions: factuality, completeness, readability, and relevancy.
  • results: Multiple resident physicians conducted blind reviews, and the results indicate that the framework effectively enhances the factuality, completeness, and relevance of the generated content.
    Abstract Large language models (LLMs) have demonstrated powerful text generation capabilities, bringing unprecedented innovation to the healthcare field. While LLMs hold immense promise for applications in healthcare, applying them to real clinical scenarios presents significant challenges, as these models may generate content that deviates from established medical facts and even exhibit potential biases. In our research, we develop an augmented LLM framework based on the Unified Medical Language System (UMLS), aiming to better serve the healthcare community. We employ LLaMa2-13b-chat and ChatGPT-3.5 as our benchmark models, and conduct automatic evaluations using the ROUGE Score and BERTScore on 104 questions from the LiveQA test set. Additionally, we establish criteria for physician-evaluation based on four dimensions: Factuality, Completeness, Readability and Relevancy. ChatGPT-3.5 is used for physician evaluation with 20 questions on the LiveQA test set. Multiple resident physicians conducted blind reviews to evaluate the generated content, and the results indicate that this framework effectively enhances the factuality, completeness, and relevance of generated content. Our research demonstrates the effectiveness of using UMLS-augmented LLMs and highlights the potential application value of LLMs in in medical question-answering.
    摘要 大型语言模型(LLM)已展现出强大的文本生成能力,为医疗领域带来了前所未有的创新。尽管LLM在医疗应用上前景广阔,但将其用于真实临床场景仍面临重大挑战:这些模型可能生成偏离既有医学事实的内容,甚至表现出潜在偏见。在我们的研究中,我们开发了一个基于统一医学语言系统(UMLS)的增强LLM框架,以更好地服务医疗社区。我们以LLaMa2-13b-chat和ChatGPT-3.5作为基准模型,在LiveQA测试集的104个问题上使用ROUGE分数和BERTScore进行自动评估;此外,我们从事实性、完整性、可读性和相关性四个维度建立了医生评估标准,并使用ChatGPT-3.5在LiveQA测试集的20个问题上进行医生评估。多名住院医师进行了盲审评估,结果表明该框架有效提升了生成内容的事实性、完整性和相关性。我们的研究证明了UMLS增强LLM的有效性,并凸显了LLM在医学问答中的潜在应用价值。
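As a side note on the automatic evaluation described above, the sketch below shows how ROUGE and BERTScore are typically computed with the `rouge-score` and `bert-score` Python packages; the question/answer strings and metric choices are illustrative assumptions, not the authors' actual evaluation script.

```python
# Hedged sketch: automatic evaluation with ROUGE and BERTScore, roughly as
# described in the abstract (not the authors' code).
# pip install rouge-score bert-score
from rouge_score import rouge_scorer
from bert_score import score as bert_score

reference = "Metformin is a first-line treatment for type 2 diabetes."   # gold answer (made up)
prediction = "Type 2 diabetes is usually first treated with metformin."  # model answer (made up)

# ROUGE: n-gram overlap between prediction and reference.
scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)
rouge = scorer.score(reference, prediction)
print({k: round(v.fmeasure, 3) for k, v in rouge.items()})

# BERTScore: token-level semantic similarity via contextual embeddings.
P, R, F1 = bert_score([prediction], [reference], lang="en", verbose=False)
print("BERTScore F1:", round(F1.item(), 3))
```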

Spike Accumulation Forwarding for Effective Training of Spiking Neural Networks

  • paper_url: http://arxiv.org/abs/2310.02772
  • repo_url: None
  • paper_authors: Ryuji Saiin, Tomoya Shirakawa, Sota Yoshihara, Yoshihide Sawada, Hiroyuki Kusumoto
  • for: 本研究提出了一种用于训练脉冲神经网络(SNN)的新范式——脉冲累积前向(SAF)。SNN虽然能耗低,但训练困难,因此已有许多方法试图解决这一问题;其中在线时间训练(OTTT)允许在每个时间步进行推理,同时降低内存开销。但OTTT在前向传播中需要对脉冲序列进行操作并做脉冲序列的加权求和,从而带来额外计算成本。
  • methods: SAF可以解决上述问题:它能将前向传播过程中的操作数量减半,并且可以从理论上证明SAF分别与OTTT和脉冲表示(Spike Representation)保持一致。
  • results: 我们通过实验验证了上述结论,并在MNIST和CIFAR-10数据集上表明,使用SAF可以在保持精度的同时降低内存占用和训练时间。
    Abstract In this article, we propose a new paradigm for training spiking neural networks (SNNs), spike accumulation forwarding (SAF). It is known that SNNs are energy-efficient but difficult to train. Consequently, many researchers have proposed various methods to solve this problem, among which online training through time (OTTT) is a method that allows inferring at each time step while suppressing the memory cost. However, to compute efficiently on GPUs, OTTT requires operations with spike trains and weighted summation of spike trains during forwarding. In addition, OTTT has shown a relationship with the Spike Representation, an alternative training method, though theoretical agreement with Spike Representation has yet to be proven. Our proposed method can solve these problems; namely, SAF can halve the number of operations during the forward process, and it can be theoretically proven that SAF is consistent with the Spike Representation and OTTT, respectively. Furthermore, we confirmed the above contents through experiments and showed that it is possible to reduce memory and training time while maintaining accuracy.
    摘要 在本文中,我们提出了一种用于训练脉冲神经网络(SNN)的新范式——脉冲累积前向(SAF)。众所周知,SNN能耗低但训练困难,因此研究者提出了多种解决方案,其中在线时间训练(OTTT)允许在每个时间步进行推理并抑制内存开销。然而,为了在GPU上高效计算,OTTT在前向传播中需要对脉冲序列进行操作并做脉冲序列的加权求和;此外,OTTT与另一种训练方法——脉冲表示(Spike Representation)之间存在联系,但二者的理论一致性尚未得到证明。我们提出的方法能够解决这些问题:SAF可将前向过程中的操作数量减半,且可以从理论上证明SAF分别与脉冲表示和OTTT保持一致。此外,我们通过实验验证了上述结论,并表明可以在保持精度的同时降低内存占用和训练时间。

Modified LAB Algorithm with Clustering-based Search Space Reduction Method for solving Engineering Design Problems

  • paper_url: http://arxiv.org/abs/2310.03055
  • repo_url: None
  • paper_authors: Ruturaj Reddy, Utkarsh Gupta, Ishaan Kale, Apoorva Shastri, Anand J Kulkarni
  • for: 这篇论文是为了提出一种修改后的LAB算法(Reddy et al. 2023),用于解决具有竞争和学习行为的群体问题。
  • methods: 该算法继承自原始LAB算法,并引入轮盘赌选择方法与缩减因子,以实现群体间竞争并迭代缩小样本空间。此外,论文还提出了一种基于聚类的搜索空间缩减方法(C-SSR),使算法能够处理带约束的问题。
  • results: 该算法在CEC 2005 和 CEC 2017 的标准测试问题上进行验证,并表现出了改善的robustness和搜索空间探索能力。此外,与其他最近的metaheuristic算法进行比较,该算法的结果也表现出了优越性。
    Abstract A modified LAB algorithm is introduced in this paper. It builds upon the original LAB algorithm (Reddy et al. 2023), which is a socio-inspired algorithm that models competitive and learning behaviours within a group, establishing hierarchical roles. The proposed algorithm incorporates the roulette wheel approach and a reduction factor introducing inter-group competition and iteratively narrowing down the sample space. The algorithm is validated by solving the benchmark test problems from CEC 2005 and CEC 2017. The solutions are validated using standard statistical tests such as two-sided and pairwise signed rank Wilcoxon test and Friedman rank test. The algorithm exhibited improved and superior robustness as well as search space exploration capabilities. Furthermore, a Clustering-Based Search Space Reduction (C-SSR) method is proposed, making the algorithm capable of solving constrained problems. The C-SSR method enables the algorithm to identify clusters of feasible regions, satisfying the constraints and contributing to achieving the optimal solution. This method demonstrates its effectiveness as a potential alternative to traditional constraint handling techniques. The results obtained using the Modified LAB algorithm are then compared with those achieved by other recent metaheuristic algorithms.
    摘要 本文提出了一种改进的LAB算法。它建立在原始LAB算法(Reddy et al. 2023)之上,后者是一种受社会行为启发的算法,通过建立层次角色来模拟群体内部的竞争与学习行为。所提算法引入了轮盘赌选择方法和缩减因子,从而引入群体间竞争并迭代缩小样本空间。该算法通过求解CEC 2005和CEC 2017的基准测试问题进行了验证,并使用双侧与成对符号秩Wilcoxon检验和Friedman秩检验等标准统计检验对解进行了验证。算法表现出更强的鲁棒性和搜索空间探索能力。此外,论文还提出了一种基于聚类的搜索空间缩减(C-SSR)方法,使算法能够求解带约束的问题:C-SSR方法让算法能够识别满足约束的可行区域簇,从而有助于获得最优解,证明了其作为传统约束处理技术潜在替代方案的有效性。最后,将改进LAB算法的结果与其他近期元启发式算法的结果进行了比较。
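The roulette wheel approach mentioned in the abstract is the standard fitness-proportionate selection used across metaheuristics; a minimal sketch of that generic mechanism (not the authors' implementation) looks like this:

```python
# Minimal sketch of roulette-wheel (fitness-proportionate) selection, as commonly
# used in population-based metaheuristics; illustrative only.
import random

def roulette_wheel_select(population, fitnesses):
    """Pick one candidate with probability proportional to its fitness (higher = better)."""
    total = sum(fitnesses)
    if total == 0:                      # degenerate case: fall back to uniform choice
        return random.choice(population)
    r = random.uniform(0, total)
    cumulative = 0.0
    for candidate, fit in zip(population, fitnesses):
        cumulative += fit
        if r <= cumulative:
            return candidate
    return population[-1]               # guard against floating-point round-off

# Example: candidates with fitness 1, 3, 6 are chosen ~10%, ~30%, ~60% of the time.
print(roulette_wheel_select(["a", "b", "c"], [1.0, 3.0, 6.0]))
```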

MUNCH: Modelling Unique ‘N Controllable Heads

  • paper_url: http://arxiv.org/abs/2310.02753
  • repo_url: None
  • paper_authors: Debayan Deb, Suvidha Tripathi, Pranit Puri
  • for: 这个论文旨在提供一种可控、多样、高质量和可解释的3D人头生成方法,为游戏设计师提供更多的创作自由和灵活性。
  • methods: 该方法包括一个几何生成器(Geometry Generator),可识别解耦的潜在方向并生成多样且新颖的样本;以及一个渲染贴图生成器(Render Map Generator),可合成多种高保真的物理渲染贴图,包括Albedo、Glossiness、Specular和Normals等。此外,我们还引入了一种新的Color Transformer模型,允许艺术家对生成的贴图进行语义级颜色控制。
  • results: 我们的模型能够生成多样且独特的3D人头,可用于不同的游戏与动画场景,并提供高质量的渲染贴图。我们还引入了Uniqueness和Novelty等可量化指标以及一个综合指标,用于衡量模型的整体性能。示例与数据集下载见 https://munch-seven.vercel.app/。
    Abstract The automated generation of 3D human heads has been an intriguing and challenging task for computer vision researchers. Prevailing methods synthesize realistic avatars but with limited control over the diversity and quality of rendered outputs and suffer from limited correlation between shape and texture of the character. We propose a method that offers quality, diversity, control, and realism along with explainable network design, all desirable features to game-design artists in the domain. First, our proposed Geometry Generator identifies disentangled latent directions and generates novel and diverse samples. A Render Map Generator then learns to synthesize multiple high-fidelity physically-based render maps including Albedo, Glossiness, Specular, and Normals. For artists preferring fine-grained control over the output, we introduce a novel Color Transformer Model that allows semantic color control over generated maps. We also introduce quantifiable metrics called Uniqueness and Novelty and a combined metric to test the overall performance of our model. Demo for both shapes and textures can be found: https://munch-seven.vercel.app/. We will release our model along with the synthetic dataset.
    摘要 自动生成3D人头一直是计算机视觉研究者面临的一项既有趣又具有挑战性的任务。现有方法虽能合成逼真的虚拟形象,但对渲染结果的多样性和质量控制有限,且角色的形状与纹理之间相关性不足。我们提出了一种方法,兼具质量、多样性、可控性与真实感,并采用可解释的网络设计,这些都是游戏设计领域美术人员所需要的特性。首先,我们提出的几何生成器能够识别解耦的潜在方向,并生成新颖且多样的样本;随后,渲染贴图生成器学习合成多种高保真的基于物理的渲染贴图,包括Albedo、Glossiness、Specular和Normals。对于偏好细粒度控制输出的美术人员,我们引入了一种新的Color Transformer模型,支持对生成贴图进行语义级颜色控制。我们还引入了Uniqueness与Novelty两个可量化指标以及一个综合指标,用于检验模型的整体性能。形状与纹理的演示见 https://munch-seven.vercel.app/。我们将发布模型及合成数据集。

Inclusive Data Representation in Federated Learning: A Novel Approach Integrating Textual and Visual Prompt

  • paper_url: http://arxiv.org/abs/2310.04455
  • repo_url: None
  • paper_authors: Zihao Zhao, Zhenpeng Shi, Yang Liu, Wenbo Ding
  • for: 缓解联邦学习(FL)中的通信开销问题
  • methods: 提出双提示联邦学习(TPFL)及其增强版(ATPFL),融合文本与视觉两种模态的提示,以更全面地表征本地客户端的数据特征
  • results: 性能始终优于所有基线方法,提升了客户端模型的全局知识获取能力,并促进了模型的鲁棒性与紧凑性
    Abstract Federated Learning (FL) is often impeded by communication overhead issues. Prompt tuning, as a potential solution, has been introduced to only adjust a few trainable parameters rather than the whole model. However, current single-modality prompt tuning approaches fail to comprehensively portray local clients' data. To overcome this limitation, we present Twin Prompt Federated learning (TPFL), a pioneering solution that integrates both visual and textual modalities, ensuring a more holistic representation of local clients' data characteristics. Furthermore, in order to tackle the data heterogeneity issues, we introduce the Augmented TPFL (ATPFL) employing the contrastive learning to TPFL, which not only enhances the global knowledge acquisition of client models but also fosters the development of robust, compact models. The effectiveness of TPFL and ATPFL is substantiated by our extensive evaluations, consistently showing superior performance compared to all baselines.
    摘要 联邦学习(FL)常受通信开销问题困扰。提示微调(prompt tuning)作为一种潜在解决方案被引入,它只需调整少量可训练参数,而无需更新整个模型。然而,现有的单模态提示微调方法无法全面刻画本地客户端的数据。为克服这一局限,我们提出了双提示联邦学习(TPFL),它融合视觉与文本两种模态,从而更全面地表示本地客户端的数据特征。此外,为应对数据异质性问题,我们在TPFL中引入对比学习,得到增强版ATPFL,这不仅提升了客户端模型的全局知识获取能力,也促进了鲁棒、紧凑模型的形成。我们的大量评估证实了TPFL与ATPFL的有效性,其性能始终优于所有基线方法。

Functional trustworthiness of AI systems by statistically valid testing

  • paper_url: http://arxiv.org/abs/2310.02727
  • repo_url: None
  • paper_authors: Bernhard Nessler, Thomas Doms, Sepp Hochreiter
  • for: 指出当前欧盟人工智能(AI)法案草案为AI系统符合性评估所规定的措施和程序不够充分,可能危及欧盟公民的安全、健康和权利。
  • methods: 通过独立随机抽样和对应用领域的精确定义,对AI系统的统计功能特性进行检验,以确保AI系统的功能可信性。
  • results: 提出了建立可靠功能可信性所必需的三个要素,即(1)应用领域技术分布的定义,(2)基于风险的最低性能要求,以及(3)基于独立随机样本的统计上有效的检验。
    Abstract The authors are concerned about the safety, health, and rights of the European citizens due to inadequate measures and procedures required by the current draft of the EU Artificial Intelligence (AI) Act for the conformity assessment of AI systems. We observe that not only the current draft of the EU AI Act, but also the accompanying standardization efforts in CEN/CENELEC, have resorted to the position that real functional guarantees of AI systems supposedly would be unrealistic and too complex anyways. Yet enacting a conformity assessment procedure that creates the false illusion of trust in insufficiently assessed AI systems is at best naive and at worst grossly negligent. The EU AI Act thus misses the point of ensuring quality by functional trustworthiness and correctly attributing responsibilities. The trustworthiness of an AI decision system lies first and foremost in the correct statistical testing on randomly selected samples and in the precision of the definition of the application domain, which enables drawing samples in the first place. We will subsequently call this testable quality functional trustworthiness. It includes a design, development, and deployment that enables correct statistical testing of all relevant functions. We are firmly convinced and advocate that a reliable assessment of the statistical functional properties of an AI system has to be the indispensable, mandatory nucleus of the conformity assessment. In this paper, we describe the three necessary elements to establish a reliable functional trustworthiness, i.e., (1) the definition of the technical distribution of the application, (2) the risk-based minimum performance requirements, and (3) the statistically valid testing based on independent random samples.
    摘要 由于现行欧盟人工智能(AI)法案草案对AI系统符合性评估所要求的措施和程序不够充分,作者们对欧盟公民的安全、健康和权利表示关切。我们注意到,不仅当前的欧盟AI法案草案,连同CEN/CENELEC中的配套标准化工作,都采取了这样的立场:对AI系统给出真正的功能性保证据称不现实且过于复杂。然而,实施一种对评估不足的AI系统制造虚假信任感的符合性评估程序,往好了说是天真,往坏了说是严重失职。欧盟AI法案因此错失了通过功能可信性来保证质量并正确划分责任的要点。AI决策系统的可信性首先取决于在随机抽取的样本上进行正确的统计检验,以及对应用领域的精确定义——后者是抽样得以进行的前提。我们将这种可检验的质量称为功能可信性,它要求设计、开发和部署都能支持对所有相关功能进行正确的统计检验。我们坚信并主张,对AI系统统计功能特性的可靠评估必须成为符合性评估不可或缺的强制性核心。在本文中,我们描述了建立可靠功能可信性所需的三个要素:(1)应用领域技术分布的定义;(2)基于风险的最低性能要求;(3)基于独立随机样本的统计上有效的检验。
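One concrete way to realise element (3) — a statistically valid test on independent random samples against a risk-based minimum performance requirement — is a one-sided binomial test on task accuracy. The sketch below uses SciPy and made-up numbers, and only illustrates the kind of test the authors argue for, not a procedure taken from the paper.

```python
# Illustrative sketch: one-sided binomial test that an AI system's accuracy on an
# i.i.d. random sample from the defined application domain exceeds a risk-based
# minimum requirement. Numbers are made up; not the authors' procedure.
from scipy.stats import binomtest

n_samples = 1000      # independent random samples drawn from the application distribution
n_correct = 962       # observed correct decisions
p_min = 0.95          # risk-based minimum performance requirement

result = binomtest(n_correct, n_samples, p=p_min, alternative="greater")
lower_bound = result.proportion_ci(confidence_level=0.95, method="exact").low

print(f"observed accuracy: {n_correct / n_samples:.3f}")
print(f"p-value (H0: accuracy <= {p_min}): {result.pvalue:.4f}")
print(f"95% one-sided lower confidence bound: {lower_bound:.3f}")
```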

Online Clustering of Bandits with Misspecified User Models

  • paper_url: http://arxiv.org/abs/2310.02717
  • repo_url: https://github.com/JizeXie/Online-Corrupted-User-Detection-and-Regret-Minimization
  • paper_authors: Zhiyong Wang, Jize Xie, Xutong Liu, Shuai Li, John C. S. Lui
  • for: 本文研究了 clustering of bandits 问题下的 user model misspecification 问题,提出了两种robust CB 算法(RCLUMB 和 RSCLUMB),可以适应用户偏好估计不准确和 clustering 错误引起的问题。
  • methods: 本文基于线性赌博机算法与聚类技术,提出了两种鲁棒CB算法:RCLUMB用动态图表示学习到的聚类结构,RSCLUMB则用集合表示聚类结构。
  • results: 本文证明了其算法的 regret 上界为 $O(\epsilon_*T\sqrt{md\log T} + d\sqrt{mT}\log T)$,比之前的 CB 工作更加宽泛,不再需要特定的技术假设。实验结果表明其算法在 synthetic 和实际数据上表现出色,超过了之前的算法。
    Abstract The contextual linear bandit is an important online learning problem where given arm features, a learning agent selects an arm at each round to maximize the cumulative rewards in the long run. A line of works, called the clustering of bandits (CB), utilize the collaborative effect over user preferences and have shown significant improvements over classic linear bandit algorithms. However, existing CB algorithms require well-specified linear user models and can fail when this critical assumption does not hold. Whether robust CB algorithms can be designed for more practical scenarios with misspecified user models remains an open problem. In this paper, we are the first to present the important problem of clustering of bandits with misspecified user models (CBMUM), where the expected rewards in user models can be perturbed away from perfect linear models. We devise two robust CB algorithms, RCLUMB and RSCLUMB (representing the learned clustering structure with dynamic graph and sets, respectively), that can accommodate the inaccurate user preference estimations and erroneous clustering caused by model misspecifications. We prove regret upper bounds of $O(\epsilon_*T\sqrt{md\log T} + d\sqrt{mT}\log T)$ for our algorithms under milder assumptions than previous CB works (notably, we move past a restrictive technical assumption on the distribution of the arms), which match the lower bound asymptotically in $T$ up to logarithmic factors, and also match the state-of-the-art results in several degenerate cases. The techniques in proving the regret caused by misclustering users are quite general and may be of independent interest. Experiments on both synthetic and real-world data show our outperformance over previous algorithms.
    摘要 上下文线性赌博机(contextual linear bandit)是一个重要的在线学习问题:给定各臂的特征,学习智能体在每一轮选择一个臂,以最大化长期累积奖励。一类被称为赌博机聚类(clustering of bandits,CB)的工作利用用户偏好之间的协同效应,相比经典线性赌博机算法取得了显著提升。然而,现有CB算法要求用户模型是良定的线性模型,一旦这一关键假设不成立便可能失效。能否为用户模型被错误设定的更实际场景设计鲁棒的CB算法,仍是一个悬而未决的问题。在本文中,我们首次提出了用户模型错误设定下的赌博机聚类问题(CBMUM),其中用户模型的期望奖励可以偏离完美的线性模型。我们设计了两种鲁棒CB算法,RCLUMB和RSCLUMB(分别用动态图和集合表示学习到的聚类结构),它们能够容忍由模型错误设定导致的不准确的用户偏好估计与错误聚类。我们在比以往CB工作更宽松的假设下(特别是摆脱了对臂分布的限制性技术假设)证明了算法的悔值上界为 $O(\epsilon_*T\sqrt{md\log T} + d\sqrt{mT}\log T)$,该上界在 $T$ 上渐近匹配下界(最多相差对数因子),并在若干退化情形下与当前最优结果一致。我们用于刻画错误聚类用户所致悔值的证明技巧较为通用,可能具有独立的价值。在合成数据与真实数据上的实验表明,我们的算法优于以往算法。
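For readers unfamiliar with misspecified linear bandits, the CBMUM setting can be written in the usual form below; only the deviation bound $\epsilon_*$ comes from the abstract, and the remaining symbols (arm feature $x_t$, cluster preference vector $\theta_{j(i)}$, perturbation $\eta_i$) are our notational assumptions.

```latex
% Misspecified linear user model (our paraphrase; only \epsilon_* appears in the abstract).
% User i belongs to an unknown cluster j(i) with preference vector \theta_{j(i)}.
\mathbb{E}\left[ r_t \mid x_t,\, i_t = i \right]
  \;=\; x_t^{\top} \theta_{j(i)} \;+\; \eta_i(x_t),
\qquad
\sup_{i,\, x} \lvert \eta_i(x) \rvert \;\le\; \epsilon_*
```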

scHyena: Foundation Model for Full-Length Single-Cell RNA-Seq Analysis in Brain

  • paper_url: http://arxiv.org/abs/2310.02713
  • repo_url: None
  • paper_authors: Gyutaek Oh, Baekgyu Choi, Inkyung Jung, Jong Chul Ye
  • for: 这篇论文旨在提高单细胞RNA测序(scRNA-seq)分析的精度,尤其是在细胞多样性更高的脑组织中。
  • methods: 该论文提出了一种受Hyena算子启发的Transformer架构,称为单细胞Hyena(scHyena),包含线性适配层、基因嵌入位置编码和双向Hyena算子,使我们能够处理全长scRNA-seq数据而不丢失原始信息。
  • results: 作者将scHyena与其他基准方法在下游任务上进行了比较,包括细胞类型分类和scRNA-seq数据插补,结果表明scHyena表现更优。
    Abstract Single-cell RNA sequencing (scRNA-seq) has made significant strides in unraveling the intricate cellular diversity within complex tissues. This is particularly critical in the brain, presenting a greater diversity of cell types than other tissue types, to gain a deeper understanding of brain function within various cellular contexts. However, analyzing scRNA-seq data remains a challenge due to inherent measurement noise stemming from dropout events and the limited utilization of extensive gene expression information. In this work, we introduce scHyena, a foundation model designed to address these challenges and enhance the accuracy of scRNA-seq analysis in the brain. Specifically, inspired by the recent Hyena operator, we design a novel Transformer architecture called single-cell Hyena (scHyena) that is equipped with a linear adaptor layer, the positional encoding via gene-embedding, and a bidirectional Hyena operator. This enables us to process full-length scRNA-seq data without losing any information from the raw data. In particular, our model learns generalizable features of cells and genes through pre-training scHyena using the full length of scRNA-seq data. We demonstrate the superior performance of scHyena compared to other benchmark methods in downstream tasks, including cell type classification and scRNA-seq imputation.
    摘要 单细胞RNA测序(scRNA-seq)在解析复杂组织中精细的细胞多样性方面取得了重要进展。这一点在脑组织中尤为关键:脑组织的细胞类型比其他组织更为多样,需要在不同细胞背景下更深入地理解脑功能。然而,由于dropout事件带来的固有测量噪声,以及对丰富基因表达信息利用不足,scRNA-seq数据分析仍然面临挑战。为此,我们提出scHyena,一种旨在应对这些挑战、提高脑组织scRNA-seq分析准确性的基础模型。具体而言,受近期Hyena算子的启发,我们设计了一种新的Transformer架构——单细胞Hyena(scHyena),其配备线性适配层、通过基因嵌入实现的位置编码以及双向Hyena算子,使我们能够在不丢失原始数据信息的情况下处理全长scRNA-seq数据。特别地,我们的模型通过使用全长scRNA-seq数据对scHyena进行预训练,学习到细胞与基因的可泛化特征。我们在细胞类型分类和scRNA-seq数据插补等下游任务中展示了scHyena优于其他基准方法的性能。

ED-NeRF: Efficient Text-Guided Editing of 3D Scene using Latent Space NeRF

  • paper_url: http://arxiv.org/abs/2310.02712
  • repo_url: None
  • paper_authors: Jangho Park, Gihyun Kwon, Jong Chul Ye
  • for: 文章目的是提出一种基于 latent diffusion model (LDM) 的三维 NeRF 编辑方法,以提高 NeRF 编辑速度和质量。
  • methods: 本文通过一个独特的精炼层将真实场景嵌入LDM的潜在空间,并提出了一种由delta去噪得分(DDS)蒸馏损失改造而来、更适合编辑的损失函数。
  • results: 实验结果表明,ED-NeRF 可以在 editing 速度和输出质量两个方面比前一代 3D 编辑模型表现更优,而且它的编辑速度比传统的图像空间 NeRF 编辑更快。
    Abstract Recently, there has been a significant advancement in text-to-image diffusion models, leading to groundbreaking performance in 2D image generation. These advancements have been extended to 3D models, enabling the generation of novel 3D objects from textual descriptions. This has evolved into NeRF editing methods, which allow the manipulation of existing 3D objects through textual conditioning. However, existing NeRF editing techniques have faced limitations in their performance due to slow training speeds and the use of loss functions that do not adequately consider editing. To address this, here we present a novel 3D NeRF editing approach dubbed ED-NeRF by successfully embedding real-world scenes into the latent space of the latent diffusion model (LDM) through a unique refinement layer. This approach enables us to obtain a NeRF backbone that is not only faster but also more amenable to editing compared to traditional image space NeRF editing. Furthermore, we propose an improved loss function tailored for editing by migrating the delta denoising score (DDS) distillation loss, originally used in 2D image editing to the three-dimensional domain. This novel loss function surpasses the well-known score distillation sampling (SDS) loss in terms of suitability for editing purposes. Our experimental results demonstrate that ED-NeRF achieves faster editing speed while producing improved output quality compared to state-of-the-art 3D editing models.
    摘要 近来,文本到图像扩散模型取得了重大进展,在2D图像生成上实现了突破性的性能。这些进展被扩展到3D模型,使得可以根据文本描述生成新的3D物体,并进一步演化出NeRF编辑方法,允许通过文本条件来操控已有的3D物体。然而,现有的NeRF编辑技术因训练速度缓慢、以及所用损失函数未充分考虑编辑需求而受到性能限制。为解决这一问题,我们在此提出一种新的3D NeRF编辑方法,称为ED-NeRF,它通过一个独特的精炼层将真实场景成功嵌入潜在扩散模型(LDM)的潜在空间。这种做法使我们获得的NeRF主干不仅速度更快,而且相比传统的图像空间NeRF编辑更易于编辑。此外,我们将原本用于2D图像编辑的delta去噪得分(DDS)蒸馏损失迁移到三维领域,提出了一种专为编辑设计的改进损失函数;在编辑适用性方面,该损失优于广为人知的得分蒸馏采样(SDS)损失。实验结果表明,与当前最优的3D编辑模型相比,ED-NeRF在实现更快编辑速度的同时,输出质量也更高。
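For context, the delta denoising score that ED-NeRF migrates to 3D was originally defined in 2D image editing as the difference of two score distillation (SDS) terms, one for the edited image/prompt pair and one for the source pair; the notation below is our paraphrase rather than this paper's exact formulation.

```latex
% Delta denoising score (DDS) in its original 2D form, as we understand it;
% the exact symbols are our assumption, not taken from this paper.
\nabla_{\theta}\mathcal{L}_{\mathrm{DDS}}
  \;=\; \nabla_{\theta}\mathcal{L}_{\mathrm{SDS}}\!\left(\mathbf{x}_{\mathrm{tgt}},\, y_{\mathrm{tgt}}\right)
  \;-\; \nabla_{\theta}\mathcal{L}_{\mathrm{SDS}}\!\left(\mathbf{x}_{\mathrm{src}},\, y_{\mathrm{src}}\right)
```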

Continual Contrastive Spoken Language Understanding

  • paper_url: http://arxiv.org/abs/2310.02699
  • repo_url: None
  • paper_authors: Umberto Cappellazzo, Enrico Fini, Muqiao Yang, Daniele Falavigna, Alessio Brutti, Bhiksha Raj
  • for: This paper focuses on the problem of learning sequence-to-sequence models for spoken language understanding in a class-incremental learning (CIL) setting, with the goal of preserving the learned representations and improving the model’s ability to learn new tasks continually.
  • methods: The proposed method, called COCONUT, combines experience replay and contrastive learning to preserve the learned representations and learn more discriminative representations of the new data. The method uses a modified version of the standard supervised contrastive loss applied only to the rehearsal samples, and also leverages a multimodal contrastive loss to align audio and text features.
  • results: The experiments on two established SLU datasets show the effectiveness of the proposed approach, with significant improvements over the baselines. The method is also shown to be combinable with methods that operate on the decoder side of the model, resulting in further metrics improvements.
    Abstract Recently, neural networks have shown impressive progress across diverse fields, with speech processing being no exception. However, recent breakthroughs in this area require extensive offline training using large datasets and tremendous computing resources. Unfortunately, these models struggle to retain their previously acquired knowledge when learning new tasks continually, and retraining from scratch is almost always impractical. In this paper, we investigate the problem of learning sequence-to-sequence models for spoken language understanding in a class-incremental learning (CIL) setting and we propose COCONUT, a CIL method that relies on the combination of experience replay and contrastive learning. Through a modified version of the standard supervised contrastive loss applied only to the rehearsal samples, COCONUT preserves the learned representations by pulling closer samples from the same class and pushing away the others. Moreover, we leverage a multimodal contrastive loss that helps the model learn more discriminative representations of the new data by aligning audio and text features. We also investigate different contrastive designs to combine the strengths of the contrastive loss with teacher-student architectures used for distillation. Experiments on two established SLU datasets reveal the effectiveness of our proposed approach and significant improvements over the baselines. We also show that COCONUT can be combined with methods that operate on the decoder side of the model, resulting in further metrics improvements.
    摘要 近期,神经网络在多个领域取得了令人瞩目的进展,语音处理也不例外。然而,该领域的最新突破需要利用大规模数据集和庞大的计算资源进行离线训练;遗憾的是,这些模型在持续学习新任务时难以保留先前获得的知识,而从头重新训练几乎总是不切实际。在这篇论文中,我们研究了口语理解在类增量学习(CIL)设定下的序列到序列模型学习问题,并提出了一种结合经验回放与对比学习的方法——COCONUT。COCONUT对标准的有监督对比损失做了修改,仅作用于回放样本,通过拉近同类样本、推远异类样本来保留已学到的表示。此外,我们还利用多模态对比损失对齐音频与文本特征,帮助模型学习对新数据更具判别性的表示。我们还探究了不同的对比设计,以结合对比损失与用于蒸馏的教师-学生架构的优势。在两个常用的SLU数据集上的实验表明了所提方法的有效性,相比基线有显著提升。我们还表明,COCONUT可以与作用于模型解码器侧的方法结合使用,从而进一步改善各项指标。
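The supervised contrastive term that COCONUT modifies builds on the standard SupCon loss; a compact PyTorch sketch of that generic loss, applied to a batch of rehearsal samples, is given below (illustrative only, not the paper's modified variant).

```python
# Sketch of the standard supervised contrastive (SupCon) loss that COCONUT builds on;
# this is the generic formulation, not the paper's exact variant.
import torch
import torch.nn.functional as F

def supervised_contrastive_loss(features, labels, temperature=0.1):
    """features: (N, D) embeddings of rehearsal samples; labels: (N,) class ids."""
    z = F.normalize(features, dim=1)                       # unit-norm embeddings
    sim = z @ z.t() / temperature                          # pairwise similarities
    self_mask = torch.eye(len(z), dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, float("-inf"))        # drop self-similarity
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    log_prob = log_prob.masked_fill(self_mask, 0.0)        # avoid -inf * 0 = NaN below
    # positives: other samples in the batch sharing the anchor's label
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    pos_count = pos_mask.sum(dim=1).clamp(min=1)           # guard anchors without positives
    loss = -(log_prob * pos_mask).sum(dim=1) / pos_count
    return loss.mean()

feats = torch.randn(8, 16)
labs = torch.tensor([0, 0, 1, 1, 2, 2, 3, 3])
print(supervised_contrastive_loss(feats, labs).item())
```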

Bridging the Domain Gap by Clustering-based Image-Text Graph Matching

  • paper_url: http://arxiv.org/abs/2310.02692
  • repo_url: None
  • paper_authors: Nokyung Park, Daewon Chae, Jeongyong Shim, Sangpil Kim, Eun-Sol Kim, Jinkyu Kim
  • for: 本研究旨在学习领域不变表示,以提升模型对未见目标任务域的泛化能力。
  • methods: 本研究采用融合图像与文本的多模态图表示,通过考虑局部图像与文本描述子之间固有的语义结构,获得领域不变的枢轴嵌入。具体而言,我们通过(i)用图表示图像与文本描述,以及(ii)同时将基于图的图像节点特征聚类并与文本图匹配,来学习领域不变特征。
  • results: 我们在CUB-DG和DomainBed等大规模公开数据集上进行了实验,取得了与当前最优方法持平或更好的性能。代码将在论文发表后公开。
    Abstract Learning domain-invariant representations is important to train a model that can generalize well to unseen target task domains. Text descriptions inherently contain semantic structures of concepts and such auxiliary semantic cues can be used as effective pivot embedding for domain generalization problems. Here, we use multimodal graph representations, fusing images and text, to get domain-invariant pivot embeddings by considering the inherent semantic structure between local images and text descriptors. Specifically, we aim to learn domain-invariant features by (i) representing the image and text descriptions with graphs, and by (ii) clustering and matching the graph-based image node features into textual graphs simultaneously. We experiment with large-scale public datasets, such as CUB-DG and DomainBed, and our model achieves matched or better state-of-the-art performance on these datasets. Our code will be publicly available upon publication.
    摘要 学习领域不变表示对于训练能够良好泛化到未见目标任务域的模型十分重要。文本描述天然蕴含概念的语义结构,这类辅助语义线索可以作为领域泛化问题中有效的枢轴嵌入。我们采用融合图像与文本的多模态图表示,通过考虑局部图像与文本描述子之间固有的语义结构,获得领域不变的枢轴嵌入。具体而言,我们通过(i)用图来表示图像与文本描述,以及(ii)同时将基于图的图像节点特征聚类并与文本图进行匹配,来学习领域不变特征。我们在 CUB-DG 和 DomainBed 等大规模公开数据集上进行了实验,模型取得了与当前最优方法持平或更好的性能。我们的代码将在论文发表后公开。

USB-NeRF: Unrolling Shutter Bundle Adjusted Neural Radiance Fields

  • paper_url: http://arxiv.org/abs/2310.02687
  • repo_url: None
  • paper_authors: Moyang Li, Peng Wang, Lingzhe Zhao, Bangyan Liao, Peidong Liu
  • for: The paper is written for novel view synthesis and camera motion estimation in the presence of rolling shutter (RS) images.
  • methods: The paper proposes a method called Unrolling Shutter Bundle Adjusted Neural Radiance Fields (USB-NeRF) that corrects RS distortions and recovers accurate camera motion trajectory using a physical image formation model.
  • results: The paper demonstrates better performance compared to prior works in terms of RS effect removal, novel view image synthesis, and camera motion estimation, as well as the ability to recover high-fidelity high frame-rate global shutter video from a sequence of RS images.
    Abstract Neural Radiance Fields (NeRF) has received much attention recently due to its impressive capability to represent 3D scene and synthesize novel view images. Existing works usually assume that the input images are captured by a global shutter camera. Thus, rolling shutter (RS) images cannot be trivially applied to an off-the-shelf NeRF algorithm for novel view synthesis. Rolling shutter effect would also affect the accuracy of the camera pose estimation (e.g. via COLMAP), which further prevents the success of NeRF algorithm with RS images. In this paper, we propose Unrolling Shutter Bundle Adjusted Neural Radiance Fields (USB-NeRF). USB-NeRF is able to correct rolling shutter distortions and recover accurate camera motion trajectory simultaneously under the framework of NeRF, by modeling the physical image formation process of a RS camera. Experimental results demonstrate that USB-NeRF achieves better performance compared to prior works, in terms of RS effect removal, novel view image synthesis as well as camera motion estimation. Furthermore, our algorithm can also be used to recover high-fidelity high frame-rate global shutter video from a sequence of RS images.
    摘要 神经辐射场(NeRF)近年来受到广泛关注,因为它能够出色地表示3D场景并合成新视角图像。现有工作通常假设输入图像由全局快门相机拍摄,因此卷帘快门(rolling shutter,RS)图像无法直接用于现成的NeRF算法进行新视角合成;卷帘快门效应还会影响相机位姿估计的精度(例如通过COLMAP),进一步妨碍NeRF算法在RS图像上取得成功。在这篇论文中,我们提出了 Unrolling Shutter Bundle Adjusted Neural Radiance Fields(USB-NeRF)。USB-NeRF 在 NeRF 框架下,通过建模 RS 相机的物理成像过程,能够同时校正卷帘快门畸变并恢复准确的相机运动轨迹。实验结果表明,USB-NeRF 在消除卷帘快门效应、新视角图像合成以及相机运动估计方面均优于以往工作。此外,我们的算法还可以从一系列 RS 图像中恢复出高保真、高帧率的全局快门视频。
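The rolling-shutter image formation idea behind this correction is simple: each image row is exposed slightly later than the previous one, so each row sees its own camera pose. The sketch below illustrates that idea with per-row capture times and interpolated poses; the timing values and the slerp/linear interpolation are our assumptions, not the paper's trajectory model.

```python
# Illustrative sketch of the rolling-shutter formation idea: row r of a frame is
# captured at t_r = t_start + r * line_delay, so each row must be rendered with its
# own interpolated camera pose. Not the paper's code; all values are made up.
import numpy as np
from scipy.spatial.transform import Rotation, Slerp

t_start, line_delay, num_rows = 0.00, 30e-6, 480           # made-up timing values
row_times = t_start + np.arange(num_rows) * line_delay     # capture time of every row

# Two key poses bracketing the frame (rotation + translation), made up for the demo.
key_times = np.array([0.0, 0.02])
key_rots = Rotation.from_euler("xyz", [[0, 0, 0], [0, 2, 0]], degrees=True)
key_trans = np.array([[0.0, 0.0, 0.0], [0.05, 0.0, 0.0]])

slerp = Slerp(key_times, key_rots)                          # rotation interpolation
row_rots = slerp(row_times)
alphas = (row_times - key_times[0]) / (key_times[1] - key_times[0])
row_trans = (1 - alphas)[:, None] * key_trans[0] + alphas[:, None] * key_trans[1]

# row_rots[r], row_trans[r] is the camera pose used when rendering/ray-casting row r.
print(row_rots[240].as_euler("xyz", degrees=True), row_trans[240])
```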

Memoria: Hebbian Memory Architecture for Human-Like Sequential Processing

  • paper_url: http://arxiv.org/abs/2310.03052
  • repo_url: https://github.com/cosmoquester/memoria
  • paper_authors: Sangjun Park, JinYeong Bak
  • for: 提高Transformer模型对长输入序列的处理能力
  • methods: 基于Hebb理论实现了一个通用的记忆网络,称为Memoria,用于增强长期依赖关系
  • results: 在BERT和GPT等常用Transformer模型上的实验表明,Memoria能够有效提升模型处理长输入序列的能力,并在排序、语言建模等多种任务中表现出色
    Abstract Transformers have demonstrated their success in various domains and tasks. However, Transformers struggle with long input sequences due to their limited capacity. While one solution is to increase input length, endlessly stretching the length is unrealistic. Furthermore, humans selectively remember and use only relevant information from inputs, unlike Transformers which process all raw data from start to end. We introduce Memoria, a general memory network that applies Hebbian theory which is a major theory explaining human memory formulation to enhance long-term dependencies in neural networks. Memoria stores and retrieves information called engram at multiple memory levels of working memory, short-term memory, and long-term memory, using connection weights that change according to Hebb's rule. Through experiments with popular Transformer-based models like BERT and GPT, we present that Memoria significantly improves the ability to consider long-term dependencies in various tasks. Results show that Memoria outperformed existing methodologies in sorting and language modeling, and long text classification.
    摘要 Transformer 已在多个领域和任务中证明了其成功。然而,由于容量有限,Transformer 在处理长输入序列时表现欠佳。一种解决办法是增加输入长度,但无限制地拉长输入并不现实。此外,人类会选择性地记忆并只使用输入中的相关信息,而 Transformer 则从头到尾处理全部原始数据。我们提出 Memoria,一种通用的记忆网络,它应用解释人类记忆形成的主要理论——赫布(Hebbian)理论——来增强神经网络中的长期依赖关系。Memoria 在工作记忆、短期记忆和长期记忆等多个记忆层级上存储和检索称为记忆痕(engram)的信息,其连接权重按照赫布法则进行更新。通过在BERT和GPT等常用Transformer模型上的实验,我们表明 Memoria 在多种任务中显著提升了模型考虑长期依赖关系的能力。结果显示,Memoria 在排序、语言建模和长文本分类任务中均优于现有方法。
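The Hebbian rule that Memoria's connection weights follow is the classic "fire together, wire together" update; the toy sketch below shows that basic rule with an added decay term as a common stabiliser (the decay and all hyperparameters are illustrative assumptions, not Memoria's actual update).

```python
# Toy sketch of Hebb's rule, the principle Memoria's engram connection weights build on.
# Not Memoria's actual update; decay and hyperparameters are illustrative assumptions.
import numpy as np

def hebbian_update(weights, pre, post, lr=0.01, decay=0.001):
    """weights: (n_pre, n_post); pre/post: activity vectors of two engram populations."""
    # "Fire together, wire together": strengthen connections between co-active units.
    weights += lr * np.outer(pre, post)
    # mild decay keeps weights bounded so rarely reinforced connections fade (forgetting)
    weights -= decay * weights
    return weights

rng = np.random.default_rng(0)
w = np.zeros((4, 3))
for _ in range(100):
    pre, post = rng.random(4), rng.random(3)
    w = hebbian_update(w, pre, post)
print(np.round(w, 3))
```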

Diffusion Generative Flow Samplers: Improving learning signals through partial trajectory optimization

  • paper_url: http://arxiv.org/abs/2310.02679
  • repo_url: https://github.com/zdhNarsil/Diffusion-Generative-Flow-Samplers
  • paper_authors: Dinghuai Zhang, Ricky Tian Qi Chen, Cheng-Hao Liu, Aaron Courville, Yoshua Bengio
  • for: 解决从难以处理的高维密度函数中采样的问题,这是机器学习和统计领域中常见的基础任务。
  • methods: 我们扩展了近期基于采样的方法,这类方法利用受控随机过程来模拟目标密度的近似样本。然而,这些方法的主要缺点是训练目标需要完整轨迹才能计算,由于依赖整条轨迹且学习信号只出现在终端时刻,导致信用分配迟缓。
  • results: 我们提出了Diffusion Generative Flow Samplers(DGFS),一个基于采样的框架,通过参数化一个"流函数",可将学习过程分解为较短的局部轨迹片段。该方法受生成流网络(GFlowNets)理论的启发,能够利用中间学习信号并具备离策略探索能力。多种具有挑战性的实验表明,DGFS比相关的先前方法能更准确地估计归一化常数。
    Abstract We tackle the problem of sampling from intractable high-dimensional density functions, a fundamental task that often appears in machine learning and statistics. We extend recent sampling-based approaches that leverage controlled stochastic processes to model approximate samples from these target densities. The main drawback of these approaches is that the training objective requires full trajectories to compute, resulting in sluggish credit assignment issues due to use of entire trajectories and a learning signal present only at the terminal time. In this work, we present Diffusion Generative Flow Samplers (DGFS), a sampling-based framework where the learning process can be tractably broken down into short partial trajectory segments, via parameterizing an additional "flow function". Our method takes inspiration from the theory developed for generative flow networks (GFlowNets), allowing us to make use of intermediate learning signals and benefit from off-policy exploration capabilities. Through a variety of challenging experiments, we demonstrate that DGFS results in more accurate estimates of the normalization constant than closely-related prior methods.
    摘要 我们致力于解决从难以处理的高维密度函数中采样的问题,这是机器学习和统计中经常出现的基础任务。我们扩展了近期基于采样的方法,这类方法利用受控随机过程来模拟目标密度的近似样本。其主要缺点在于训练目标需要完整轨迹才能计算,由于依赖整条轨迹且学习信号只出现在终端时刻,导致信用分配迟缓。在这项工作中,我们提出了 Diffusion Generative Flow Samplers(DGFS),这是一个基于采样的框架,通过额外参数化一个"流函数",可以将学习过程切分为较短的局部轨迹片段,从而使训练变得可行。我们的方法借鉴了为生成流网络(GFlowNets)发展的理论,因此能够利用中间学习信号,并受益于离策略(off-policy)探索能力。通过多种具有挑战性的实验,我们证明 DGFS 能比密切相关的先前方法更准确地估计归一化常数。

Spherical Position Encoding for Transformers

  • paper_url: http://arxiv.org/abs/2310.04454
  • repo_url: None
  • paper_authors: Eren Unlu
  • for: 本研究旨在提出一种基于RoPE架构的地理位置编码方法,以便在 transformer 架构中处理地理位置相关的信息。
  • methods: 本研究基于RoPE架构提出了一种针对球面坐标调整的位置编码机制,以便对与地理位置相关的信息进行建模。
  • results: 验证结果表明,该方法可在与地理位置相关的任务上提供更好的性能,并适用于不同的坐标系。
    Abstract Position encoding is the primary mechanism which induces notion of sequential order for input tokens in transformer architectures. Even though this formulation in the original transformer paper has yielded plausible performance for general purpose language understanding and generation, several new frameworks such as Rotary Position Embedding (RoPE) are proposed for further enhancement. In this paper, we introduce the notion of "geotokens" which are input elements for transformer architectures, each representing an information related to a geological location. Unlike the natural language the sequential position is not important for the model but the geographical coordinates are. In order to induce the concept of relative position for such a setting and maintain the proportion between the physical distance and distance on embedding space, we formulate a position encoding mechanism based on RoPE architecture which is adjusted for spherical coordinates.
    摘要 “位置编码是转换器架构中主要的机制,用于induce输入токен的顺序序列。尽管这种形式ulation在原始转换器论文中已经实现了一般语言理解和生成的可行性,但新的框架如Rotary Position Embedding(RoPE)已经被提出来进行进一步的提升。在这篇论文中,我们引入了“地理токен”这个概念,每个tokent代表一个地理位置的信息。不同于自然语言,序列位置不是模型中重要的,而地理坐标则是。为了induce相对位置的概念并保持坐标空间中的比例,我们基于RoPE架构来修改圆形坐标的位置编码机制。”

Land-cover change detection using paired OpenStreetMap data and optical high-resolution imagery via object-guided Transformer

  • paper_url: http://arxiv.org/abs/2310.02674
  • repo_url: https://github.com/chenhongruixuan/objformer
  • paper_authors: Hongruixuan Chen, Cuiling Lan, Jian Song, Clifford Broni-Bediako, Junshi Xia, Naoto Yokoya
  • for: 本研究利用光学高分辨率影像和OpenStreetMap(OSM)数据进行土地覆盖变化检测。
  • methods: 提出一种结合面向对象影像分析(OBIA)技术与Transformer架构的对象引导Transformer(ObjFormer)模型,实现直接基于OSM数据和光学影像的土地覆盖变化检测。
  • results: 提出了一个新的半监督语义变化检测任务,无需光学影像的人工标注土地覆盖标签即可训练语义变化检测器;为ObjFormer添加了两个轻量级语义解码器以高效完成该任务,并设计了一种逆向交叉熵损失以充分利用负样本,从而提升该任务的性能。
    Abstract Optical high-resolution imagery and OpenStreetMap (OSM) data are two important data sources for land-cover change detection. Previous studies in these two data sources focus on utilizing the information in OSM data to aid the change detection on multi-temporal optical high-resolution images. This paper pioneers the direct detection of land-cover changes utilizing paired OSM data and optical imagery, thereby broadening the horizons of change detection tasks to encompass more dynamic earth observations. To this end, we propose an object-guided Transformer (ObjFormer) architecture by naturally combining the prevalent object-based image analysis (OBIA) technique with the advanced vision Transformer architecture. The introduction of OBIA can significantly reduce the computational overhead and memory burden in the self-attention module. Specifically, the proposed ObjFormer has a hierarchical pseudo-siamese encoder consisting of object-guided self-attention modules that extract representative features of different levels from OSM data and optical images; a decoder consisting of object-guided cross-attention modules can progressively recover the land-cover changes from the extracted heterogeneous features. In addition to the basic supervised binary change detection task, this paper raises a new semi-supervised semantic change detection task that does not require any manually annotated land-cover labels of optical images to train semantic change detectors. Two lightweight semantic decoders are added to ObjFormer to accomplish this task efficiently. A converse cross-entropy loss is designed to fully utilize the negative samples, thereby contributing to the great performance improvement in this task. The first large-scale benchmark dataset containing 1,287 map-image pairs (1024$\times$ 1024 pixels for each sample) covering 40 regions on six continents ...(see the manuscript for the full abstract)
    摘要 高分辨率光学影像和OpenStreetMap(OSM)数据是土地覆盖变化检测的两类重要数据源。以往针对这两类数据的研究主要关注利用OSM数据中的信息来辅助多时相光学高分辨率影像的变化检测。本文则率先直接利用成对的OSM数据与光学影像检测土地覆盖变化,从而将变化检测任务的视野拓展到更具动态性的地球观测。为此,我们将流行的面向对象影像分析(OBIA)技术与先进的视觉Transformer架构自然结合,提出了一种对象引导的Transformer(ObjFormer)架构。OBIA的引入能够显著降低自注意力模块的计算开销和内存负担。具体而言,ObjFormer包含一个由对象引导的自注意力模块构成的层次伪孪生编码器,从OSM数据和光学影像中提取不同层级的代表性特征;以及一个由对象引导的交叉注意力模块构成的解码器,可从提取到的异质特征中逐步恢复土地覆盖变化。除基础的有监督二值变化检测任务外,本文还提出了一个新的半监督语义变化检测任务,无需任何光学影像的人工标注土地覆盖标签即可训练语义变化检测器;我们为ObjFormer添加了两个轻量级语义解码器以高效完成该任务,并设计了一种逆向交叉熵损失以充分利用负样本,从而显著提升该任务的性能。本文还构建了首个大规模基准数据集,包含1,287对地图-影像样本(每个样本为1024×1024像素),覆盖六大洲的40个区域……(完整摘要请参见原文)

On Memorization in Diffusion Models

  • paper_url: http://arxiv.org/abs/2310.02664
  • repo_url: https://github.com/sail-sg/DiffMemorize
  • paper_authors: Xiangming Gu, Chao Du, Tianyu Pang, Chongxuan Li, Min Lin, Ye Wang
  • for: 这项研究旨在理解扩散模型在训练中的记忆化行为,以及这种行为对模型性能的影响。
  • methods: 研究者采用了一系列实验与分析方法,包括定义有效模型记忆(EMM)这一度量,改变训练数据的分布与规模,以及评估不同模型配置和训练过程下的表现等。
  • results: 研究者发现,扩散模型的记忆化行为与数据集规模有关,可以用EMM(即已训练的扩散模型逼近其理论最优解时训练数据的最大规模)来衡量;此外,以无信息的随机标签作为条件训练数据,会显著诱发扩散模型的记忆化。
    Abstract Due to their capacity to generate novel and high-quality samples, diffusion models have attracted significant research interest in recent years. Notably, the typical training objective of diffusion models, i.e., denoising score matching, has a closed-form optimal solution that can only generate training data replicating samples. This indicates that a memorization behavior is theoretically expected, which contradicts the common generalization ability of state-of-the-art diffusion models, and thus calls for a deeper understanding. Looking into this, we first observe that memorization behaviors tend to occur on smaller-sized datasets, which motivates our definition of effective model memorization (EMM), a metric measuring the maximum size of training data at which a learned diffusion model approximates its theoretical optimum. Then, we quantify the impact of the influential factors on these memorization behaviors in terms of EMM, focusing primarily on data distribution, model configuration, and training procedure. Besides comprehensive empirical results identifying the influential factors, we surprisingly find that conditioning training data on uninformative random labels can significantly trigger the memorization in diffusion models. Our study holds practical significance for diffusion model users and offers clues to theoretical research in deep generative models. Code is available at https://github.com/sail-sg/DiffMemorize.
    摘要 由于扩散模型能够生成新颖且高质量的样本,近年来受到了广泛的研究关注。值得注意的是,扩散模型的典型训练目标——去噪得分匹配——存在一个闭式最优解,而该最优解只能生成与训练数据一致的复制样本。这意味着从理论上看,模型应当出现记忆化行为,这与当前最先进扩散模型普遍具备的泛化能力相矛盾,因而需要更深入的理解。针对这一点,我们首先观察到记忆化行为更容易出现在规模较小的数据集上,由此我们定义了有效模型记忆(EMM),用于度量已训练的扩散模型逼近其理论最优解时训练数据的最大规模。随后,我们以EMM为度量,量化分析了影响记忆化行为的各类因素,主要包括数据分布、模型配置和训练过程。除给出识别这些影响因素的全面实验结果之外,我们还意外地发现:以无信息的随机标签作为条件训练数据,会显著诱发扩散模型的记忆化。我们的研究对扩散模型的使用者具有实际意义,也为深度生成模型的理论研究提供了线索。代码见 https://github.com/sail-sg/DiffMemorize。

Enhancing Energy-efficiency by Solving the Throughput Bottleneck of LSTM Cells for Embedded FPGAs

  • paper_url: http://arxiv.org/abs/2310.16842
  • repo_url: None
  • paper_authors: Chao Qian, Tianheng Ling, Gregor Schiele
  • for: 这项研究旨在提高物联网(IoT)中传感器数据处理的能效。
  • methods: 提出了一种新的LSTM单元优化方法,以在FPGA上实现能效的推理。
  • results: 以交通速度预测为案例研究,采用优化LSTM单元的vanilla LSTM模型在FPGA上可实现每秒17534次推理,每次推理仅消耗3.8 $\mu$J能量;与现有方法相比,吞吐量至少提高5.4倍,能效提升1.37倍。
    Abstract To process sensor data in the Internet of Things(IoTs), embedded deep learning for 1-dimensional data is an important technique. In the past, CNNs were frequently used because they are simple to optimise for special embedded hardware such as FPGAs. This work proposes a novel LSTM cell optimisation aimed at energy-efficient inference on end devices. Using the traffic speed prediction as a case study, a vanilla LSTM model with the optimised LSTM cell achieves 17534 inferences per second while consuming only 3.8 $\mu$J per inference on the FPGA \textit{XC7S15} from \textit{Spartan-7} family. It achieves at least 5.4$\times$ faster throughput and 1.37$\times$ more energy efficient than existing approaches.
    摘要 在物联网(IoT)中处理传感器数据时,面向一维数据的嵌入式深度学习是一项重要技术。过去人们常用卷积神经网络(CNN),因为它易于针对FPGA等特定嵌入式硬件进行优化。这项工作提出了一种新的LSTM单元优化,旨在实现终端设备上的能效推理。以交通速度预测为案例研究,采用优化LSTM单元的原始(vanilla)LSTM模型在Spartan-7系列的FPGA XC7S15上实现了每秒17534次推理,且每次推理仅消耗3.8微焦耳能量;与现有方法相比,其吞吐量至少提高5.4倍,能效提升1.37倍。
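The reported throughput and per-inference energy imply an average power draw that is easy to sanity-check:

```python
# Sanity check of the reported figures: average power = throughput * energy per inference.
inferences_per_second = 17534
energy_per_inference_J = 3.8e-6          # 3.8 microjoules
power_W = inferences_per_second * energy_per_inference_J
print(f"average power ~ {power_W * 1e3:.1f} mW")   # ~ 66.6 mW
```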

Solving Multi-Configuration Problems: A Performance Analysis with Choco Solver

  • paper_url: http://arxiv.org/abs/2310.02658
  • repo_url: https://github.com/AIG-ist-tugraz/ExamMultiConf
  • paper_authors: Benjamin Ritz, Alexander Felfernig, Viet-Man Le, Sebastian Lubos
  • for: 这篇论文是为了描述如何使用多配置功能来满足用户的偏好而写的。
  • methods: 论文使用了一种称为多配置的方法,该方法可以配置一组配置。
  • results: 论文通过示例描述了如何使用多配置来生成个性化考试。并提供了一个约束解决器性能分析,帮助了解相关性能问题。
    Abstract In many scenarios, configurators support the configuration of a solution that satisfies the preferences of a single user. The concept of \emph{multi-configuration} is based on the idea of configuring a set of configurations. Such a functionality is relevant in scenarios such as the configuration of personalized exams, the configuration of project teams, and the configuration of different trips for individual members of a tourist group (e.g., when visiting a specific city). In this paper, we exemplify the application of multi-configuration for generating individualized exams. We also provide a constraint solver performance analysis which helps to gain some insights into corresponding performance issues.
    摘要 许多场景中,配置器支持配置一个满足单个用户的首选项的解决方案。基于多配置的概念,我们可以配置一组配置。这种功能在多个场景中是有用的,例如:个性化考试的配置、项目团队的配置以及不同旅游者组成员的旅游计划(例如,当访问特定城市时)。在这篇论文中,我们通过实现多配置来生成个性化考试。我们还提供了一种约束解决器性能分析,以帮助我们获得一些关于相关性能问题的启示。

A Study of Quantisation-aware Training on Time Series Transformer Models for Resource-constrained FPGAs

  • paper_url: http://arxiv.org/abs/2310.02654
  • repo_url: None
  • paper_authors: Tianheng Ling, Chao Qian, Lukas Einhaus, Gregor Schiele
  • for: 这个研究探讨时序变换器模型中的量化感知训练(Quantisation-aware training,QAT)。
  • methods: 我们提出了一种新的自适应量化方案,在QAT阶段动态选择对称和非对称量化方案。我们发现,根据实际数据分布来匹配量化方案可以降低计算开销,保持可接受的精度。此外,我们的方法对实际数据和混合精度量化中的大多数对象进行4比特量化。
  • results: 我们的结果表明,我们的方法可以减轻计算开销,同时保持可接受的精度。此外,我们的方法在实际数据和混合精度量化中表现稳定。这些发现可以帮助模型量化和部署决策,同时为量化技术的进一步发展提供基础。
    Abstract This study explores the quantisation-aware training (QAT) on time series Transformer models. We propose a novel adaptive quantisation scheme that dynamically selects between symmetric and asymmetric schemes during the QAT phase. Our approach demonstrates that matching the quantisation scheme to the real data distribution can reduce computational overhead while maintaining acceptable precision. Moreover, our approach is robust when applied to real-world data and mixed-precision quantisation, where most objects are quantised to 4 bits. Our findings inform model quantisation and deployment decisions while providing a foundation for advancing quantisation techniques.
    摘要 本研究探讨了时间序列Transformer模型上的量化感知训练(QAT)。我们提出了一种新的自适应量化方案,在QAT阶段动态地在对称与非对称量化方案之间进行选择。结果表明,使量化方案与真实数据分布相匹配,可以在保持可接受精度的同时降低计算开销。此外,该方法在真实数据以及大多数对象被量化为4比特的混合精度量化场景下同样稳健。这些发现可为模型量化与部署决策提供参考,并为量化技术的进一步发展奠定基础。
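The symmetric/asymmetric distinction at the core of the adaptive scheme can be made concrete with the usual affine-quantisation formulas; the simple range-skew heuristic below is only our illustration of matching the scheme to the data distribution, not the selection criterion used in the paper.

```python
# Illustration of symmetric vs. asymmetric b-bit quantisation and a simple
# data-driven choice between them. The selection heuristic is ours, not the paper's.
import numpy as np

def quantize(x, bits=8):
    lo, hi = float(x.min()), float(x.max())
    skew = abs(hi + lo) / (abs(hi) + abs(lo) + 1e-12)      # 0 = centred range, 1 = one-sided
    if skew < 0.25:                                        # roughly symmetric data -> symmetric scheme
        qmax = 2 ** (bits - 1) - 1
        scale, zero_point = max(abs(lo), abs(hi)) / qmax, 0
        q = np.clip(np.round(x / scale), -qmax - 1, qmax)
    else:                                                  # skewed data -> asymmetric (affine) scheme
        qmax = 2 ** bits - 1
        scale = (hi - lo) / qmax
        zero_point = int(round(-lo / scale))
        q = np.clip(np.round(x / scale) + zero_point, 0, qmax)
    return q.astype(np.int32), scale, zero_point

acts = np.random.default_rng(0).exponential(size=1000)    # one-sided, ReLU-like activations
q, s, zp = quantize(acts, bits=4)
print(s, zp, q.min(), q.max())
```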

GET: Group Event Transformer for Event-Based Vision

  • paper_url: http://arxiv.org/abs/2310.02642
  • repo_url: https://github.com/peterande/get-group-event-transformer
  • paper_authors: Yansong Peng, Yueyi Zhang, Zhiwei Xiong, Xiaoyan Sun, Feng Wu
  • for: 这篇论文主要是为了提出一种基于事件的视觉模型,即Group Event Transformer(GET),用于提高事件视觉的性能。
  • methods: 该方法首先提出了一种新的事件表示方式,称为Group Token,它依据时间戳和极性对异步事件进行分组。随后,GET使用事件双重自注意力模块和Group Token聚合模块,在空间域和时间-极性域中进行有效的特征交流与融合。
  • results: 论文在四个事件视觉分类数据集(Cifar10-DVS、N-MNIST、N-CARS和DVS128Gesture)以及两个事件视觉检测数据集(1Mpx和Gen1)上进行了评估,结果显示GET的性能优于其他当前最优方法。
    Abstract Event cameras are a type of novel neuromorphic sensor that has been gaining increasing attention. Existing event-based backbones mainly rely on image-based designs to extract spatial information within the image transformed from events, overlooking important event properties like time and polarity. To address this issue, we propose a novel Group-based vision Transformer backbone for Event-based vision, called Group Event Transformer (GET), which decouples temporal-polarity information from spatial information throughout the feature extraction process. Specifically, we first propose a new event representation for GET, named Group Token, which groups asynchronous events based on their timestamps and polarities. Then, GET applies the Event Dual Self-Attention block, and Group Token Aggregation module to facilitate effective feature communication and integration in both the spatial and temporal-polarity domains. After that, GET can be integrated with different downstream tasks by connecting it with various heads. We evaluate our method on four event-based classification datasets (Cifar10-DVS, N-MNIST, N-CARS, and DVS128Gesture) and two event-based object detection datasets (1Mpx and Gen1), and the results demonstrate that GET outperforms other state-of-the-art methods. The code is available at https://github.com/Peterande/GET-Group-Event-Transformer.
    摘要 事件相机是一种日益受到关注的新型神经形态传感器。现有的事件视觉主干网络大多沿用面向图像的设计,从由事件转换得到的图像中提取空间信息,而忽略了时间和极性等重要的事件属性。为解决这一问题,我们提出了一种新的基于分组的事件视觉Transformer主干网络,称为Group Event Transformer(GET),它在整个特征提取过程中将时间-极性信息与空间信息解耦。具体而言,我们首先为GET提出了一种新的事件表示,称为Group Token,它依据时间戳和极性对异步事件进行分组。随后,GET应用事件双重自注意力模块和Group Token聚合模块,以便在空间域和时间-极性域中进行有效的特征交流与融合。此后,GET可以连接不同的任务头,从而集成到各种下游任务中。我们在四个事件分类数据集(Cifar10-DVS、N-MNIST、N-CARS和DVS128Gesture)以及两个事件目标检测数据集(1Mpx和Gen1)上进行了评估,结果表明GET优于其他当前最优方法。代码见 https://github.com/Peterande/GET-Group-Event-Transformer。
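The grouping idea behind the Group Token can be illustrated with a few lines of array code: asynchronous events (x, y, t, p) are bucketed by time window and polarity, and each bucket becomes one token. The bin count and the per-group aggregation below are illustrative assumptions, not the paper's exact tokenisation.

```python
# Illustrative sketch of grouping asynchronous events by timestamp bin and polarity,
# the idea behind GET's Group Token. Bin size and aggregation are our assumptions.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
events = {
    "x": rng.integers(0, 346, n),          # pixel coordinates
    "y": rng.integers(0, 260, n),
    "t": np.sort(rng.uniform(0, 0.05, n)), # timestamps in seconds
    "p": rng.integers(0, 2, n),            # polarity: 0 = OFF, 1 = ON
}

num_time_bins = 8
time_bin = np.minimum((events["t"] / events["t"].max() * num_time_bins).astype(int),
                      num_time_bins - 1)
group_id = time_bin * 2 + events["p"]      # one group per (time bin, polarity) pair

# Each group becomes a token; here we just count events per group as a stand-in
# for the learned per-group features the paper would extract.
counts = np.bincount(group_id, minlength=num_time_bins * 2).reshape(num_time_bins, 2)
print(counts)   # rows: time bins, columns: OFF/ON polarity
```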

Deformation-Invariant Neural Network and Its Applications in Distorted Image Restoration and Analysis

  • paper_url: http://arxiv.org/abs/2310.02641
  • repo_url: None
  • paper_authors: Han Zhang, Qiguang Chen, Lok Ming Lui
  • for: 这篇论文旨在解决图像因几何畸变而给图像处理和计算机视觉任务(如物体识别)带来的难题。
  • methods: 论文提出了一种名为形变不变神经网络(DINN)的框架,用于处理几何畸变图像的成像任务。DINN将拟共形变换网络(QCTN)作为组件嵌入到其他深度网络中;QCTN是一个输出拟共形映射的深度神经网络,可将几何畸变图像变换为更接近自然图像分布的改进版本。QCTN首先输出一个Beltrami系数,用以度量输出形变映射的拟共形程度,通过控制该系数即可控制拟共形映射下的局部几何畸变。
  • results: 基于该框架,我们构建了能够准确分类畸变图像的图像分类网络。所提方法被用于恢复受大气湍流和水体湍流影响的几何畸变图像,效果优于现有的基于GAN的恢复方法。此外,我们还将该方法应用于大气湍流下的人脸图像1:1验证,取得了令人满意的结果,进一步证明了方法的有效性。
    Abstract Images degraded by geometric distortions pose a significant challenge to imaging and computer vision tasks such as object recognition. Deep learning-based imaging models usually fail to give accurate performance for geometrically distorted images. In this paper, we propose the deformation-invariant neural network (DINN), a framework to address the problem of imaging tasks for geometrically distorted images. The DINN outputs consistent latent features for images that are geometrically distorted but represent the same underlying object or scene. The idea of DINN is to incorporate a simple component, called the quasiconformal transformer network (QCTN), into other existing deep networks for imaging tasks. The QCTN is a deep neural network that outputs a quasiconformal map, which can be used to transform a geometrically distorted image into an improved version that is closer to the distribution of natural or good images. It first outputs a Beltrami coefficient, which measures the quasiconformality of the output deformation map. By controlling the Beltrami coefficient, the local geometric distortion under the quasiconformal mapping can be controlled. The QCTN is lightweight and simple, which can be readily integrated into other existing deep neural networks to enhance their performance. Leveraging our framework, we have developed an image classification network that achieves accurate classification of distorted images. Our proposed framework has been applied to restore geometrically distorted images by atmospheric turbulence and water turbulence. DINN outperforms existing GAN-based restoration methods under these scenarios, demonstrating the effectiveness of the proposed framework. Additionally, we apply our proposed framework to the 1-1 verification of human face images under atmospheric turbulence and achieve satisfactory performance, further demonstrating the efficacy of our approach.
    摘要 受几何畸变影响的图像给物体识别等成像与计算机视觉任务带来了重大挑战,基于深度学习的成像模型通常难以在几何畸变图像上给出准确的结果。在这篇论文中,我们提出了形变不变神经网络(DINN)框架,用于解决几何畸变图像的成像任务。对于发生几何畸变但表示同一物体或场景的图像,DINN能够输出一致的潜在特征。其核心思想是将一个简单的组件——拟共形变换网络(QCTN)——嵌入到现有的面向成像任务的深度网络中。QCTN是一个输出拟共形映射的深度神经网络,可将几何畸变图像变换为更接近自然或优质图像分布的改进版本。它首先输出一个Beltrami系数,用以度量输出形变映射的拟共形程度;通过控制该系数,即可控制拟共形映射下的局部几何畸变。QCTN轻量而简单,可以方便地集成到现有深度神经网络中以提升其性能。借助该框架,我们构建了能够准确分类畸变图像的图像分类网络。所提框架还被用于恢复受大气湍流和水体湍流影响的几何畸变图像,在这些场景下优于现有的基于GAN的恢复方法,证明了框架的有效性。此外,我们将该框架应用于大气湍流下的人脸图像1:1验证,取得了令人满意的性能,进一步验证了方法的有效性。
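For readers unfamiliar with the Beltrami coefficient mentioned above, its standard definition for a map $f$ of the complex plane is:

```latex
% Standard definition of the Beltrami coefficient of a quasiconformal map f;
% its magnitude (strictly below 1) measures the local conformality distortion.
\mu_f(z) \;=\; \frac{\partial f / \partial \bar{z}}{\partial f / \partial z},
\qquad \lVert \mu_f \rVert_{\infty} < 1
```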

Foundation Reinforcement Learning: towards Embodied Generalist Agents with Foundation Prior Assistance

  • paper_url: http://arxiv.org/abs/2310.02635
  • repo_url: None
  • paper_authors: Weirui Ye, Yunsheng Zhang, Mengchen Wang, Shengjie Wang, Xianfan Gu, Pieter Abbeel, Yang Gao
  • for: 构建具身通用智能体需要开发一种基于大规模预训练的基础先验。这类基础先验能够帮助RL智能体在下游任务中学习得更快、表现更好。
  • methods: 我们基于目标条件MDP提出了一组具身先验,包括基础策略、价值函数和成功奖励,并在此基础上提出了一种由这些先验辅助的actor-critic方法,称为Foundation Actor-Critic(FAC)。
  • results: 在Meta-World任务上的评估表明,FAC可在不到20万帧内达到100%成功率,而基线方法需要远多于此的帧数才能达到相同水平;FAC对噪声先验具有鲁棒性,能在噪声环境下正常学习;并且无需人工设计稠密奖励或提供遥操作演示,即可自主学习。
    Abstract Recently, people have shown that large-scale pre-training from internet-scale data is the key to building generalist models, as witnessed in NLP. To build embodied generalist agents, we and many other researchers hypothesize that such foundation prior is also an indispensable component. However, it is unclear what is the proper concrete form to represent those embodied foundation priors and how they should be used in the downstream task. In this paper, we propose an intuitive and effective set of embodied priors that consist of foundation policy, value, and success reward. The proposed priors are based on the goal-conditioned MDP. To verify their effectiveness, we instantiate an actor-critic method assisted by the priors, called Foundation Actor-Critic (FAC). We name our framework as Foundation Reinforcement Learning (FRL), since it completely relies on embodied foundation priors to explore, learn and reinforce. The benefits of FRL are threefold. (1) Sample efficient. With foundation priors, FAC learns significantly faster than traditional RL. Our evaluation on the Meta-World has proved that FAC can achieve 100% success rates for 7/8 tasks under less than 200k frames, which outperforms the baseline method with careful manual-designed rewards under 1M frames. (2) Robust to noisy priors. Our method tolerates the unavoidable noise in embodied foundation models. We show that FAC works well even under heavy noise or quantization errors. (3) Minimal human intervention: FAC completely learns from the foundation priors, without the need of human-specified dense reward, or providing teleoperated demos. Thus, FAC can be easily scaled up. We believe our FRL framework could enable the future robot to autonomously explore and learn without human intervention in the physical world. In summary, our proposed FRL is a novel and powerful learning paradigm, towards achieving embodied generalist agents.
    摘要 近期的研究表明,基于互联网规模数据的大规模预训练是构建通用模型的关键,NLP领域已经印证了这一点。为构建具身通用智能体,我们与许多其他研究者都假设此类基础先验同样是不可或缺的组成部分。然而,这些具身基础先验应以何种具体形式表示、又该如何在下游任务中使用,目前仍不清楚。在这篇论文中,我们提出了一组直观而有效的具身先验,包括基础策略、价值函数和成功奖励,它们建立在目标条件MDP之上。为验证其有效性,我们实现了一种由这些先验辅助的actor-critic方法,称为Foundation Actor-Critic(FAC)。由于它完全依赖具身基础先验来探索、学习和强化,我们将整个框架称为基础强化学习(FRL)。FRL的优势体现在三个方面:(1)样本高效:借助基础先验,FAC的学习速度远快于传统RL。在Meta-World上的评估表明,FAC在不到20万帧内即可在8个任务中的7个上达到100%成功率,优于使用精心手工设计奖励、训练100万帧的基线方法。(2)对噪声先验鲁棒:我们的方法能够容忍具身基础模型中不可避免的噪声,即使在较强噪声或量化误差下FAC仍表现良好。(3)极少的人工干预:FAC完全从基础先验中学习,无需人工设定的稠密奖励,也无需提供遥操作演示,因此易于规模化。我们相信FRL框架能够让未来的机器人在物理世界中无需人工干预地自主探索与学习。总之,我们提出的FRL是一种迈向具身通用智能体的新颖而强大的学习范式。

Multi-rules mining algorithm for combinatorially exploded decision trees with modified Aitchison-Aitken function-based Bayesian optimization

  • paper_url: http://arxiv.org/abs/2310.02633
  • repo_url: None
  • paper_authors: Yuto Omae, Masaya Mori, Yohei Kakimoto
  • for: 本研究旨在提出一种能够有效地找到可靠规则的方法,以解决decision trees中的 combinatorial explosion问题。
  • methods: 本研究提出了两种新的算法,即MAABO-MT和GS-MRM,它们可以在有限的计算成本下,找到全面性最高的decision trees,并提取可靠且不同的规则。
  • results: 实验结果表明,MAABO-MT可以更有效地找到可靠规则,并且比其他基于随机性的方法更具有深度的探索能力。此外,GS-MRM可以减少不必要的规则,提高可靠性。
    Abstract Decision trees offer the benefit of easy interpretation because they allow the classification of input data based on if--then rules. However, as decision trees are constructed by an algorithm that achieves clear classification with minimum necessary rules, the trees possess the drawback of extracting only minimum rules, even when various latent rules exist in data. Approaches that construct multiple trees using randomly selected feature subsets do exist. However, the number of trees that can be constructed remains at the same scale because the number of feature subsets is a combinatorial explosion. Additionally, when multiple trees are constructed, numerous rules are generated, of which several are untrustworthy and/or highly similar. Therefore, we propose "MAABO-MT" and "GS-MRM" algorithms that strategically construct trees with high estimation performance among all possible trees with small computational complexity and extract only reliable and non-similar rules, respectively. Experiments are conducted using several open datasets to analyze the effectiveness of the proposed method. The results confirm that MAABO-MT can discover reliable rules at a lower computational cost than other methods that rely on randomness. Furthermore, the proposed method is confirmed to provide deeper insights than single decision trees commonly used in previous studies. Therefore, MAABO-MT and GS-MRM can efficiently extract rules from combinatorially exploded decision trees.
    摘要 决策树易于解释,因为它们可以依据 if–then 规则对输入数据进行分类。然而,由于决策树是由以最少必要规则实现清晰分类为目标的算法构建的,即使数据中存在多种潜在规则,决策树也只会提取最少的规则。虽然已有利用随机选择的特征子集构建多棵树的方法,但由于特征子集数量呈组合爆炸式增长,可构建的树的数量仍停留在同一量级;并且构建多棵树时会产生大量规则,其中不少规则不可靠或高度相似。因此,我们提出了 "MAABO-MT" 和 "GS-MRM" 两种算法:前者以较小的计算量在所有可能的树中有策略地构建估计性能较高的树,后者只提取可靠且互不相似的规则。我们在多个公开数据集上进行了实验,以分析所提方法的有效性。结果证实,与依赖随机性的其他方法相比,MAABO-MT 能以更低的计算成本发现可靠规则;此外,所提方法比以往研究中常用的单棵决策树能提供更深入的洞见。因此,MAABO-MT 与 GS-MRM 能够高效地从组合爆炸的决策树中提取规则。

On Quantified Observability Analysis in Multiagent Systems

  • paper_url: http://arxiv.org/abs/2310.02614
  • repo_url: None
  • paper_authors: Chunyan Mu, Jun Pang
  • for: 本研究旨在帮助多智能体系统(MAS)操作人员通过观察系统行为来优化团队性能,但是也可能泄露敏感信息。
  • methods: 本研究提出了一种量化观察性分析方法,使用了 opacity 概念来形式表达 MAS 中 Agent 的观察性特性。我们提出了一种基于 temporal logic 的 oPATL 语言来描述 Agent 的观察性目标,并开发了一些验证技术来量化分析这些特性。
  • results: 我们在 PRISM 模型检查器上实现了该方法,并通过一些示例证明了其可行性和应用性。
    Abstract In multiagent systems (MASs), agents' observation upon system behaviours may improve the overall team performance, but may also leak sensitive information to an observer. A quantified observability analysis can thus be useful to assist decision-making in MASs by operators seeking to optimise the relationship between performance effectiveness and information exposure through observations in practice. This paper presents a novel approach to quantitatively analysing the observability properties in MASs. The concept of opacity is applied to formally express the characterisation of observability in MASs modelled as partially observable multiagent systems. We propose a temporal logic oPATL to reason about agents' observability with quantitative goals, which capture the probability of information transparency of system behaviours to an observer, and develop verification techniques for quantitatively analysing such properties. We implement the approach as an extension of the PRISM model checker, and illustrate its applicability via several examples.
    摘要 在多智能体系统(MAS)中,智能体对系统行为的观察可以提升整体团队性能,但也可能向观察者泄露敏感信息。因此,量化的可观察性分析有助于操作者在实践中权衡性能有效性与通过观察暴露的信息量,从而辅助MAS中的决策。本文提出了一种定量分析MAS可观察性特性的新方法:我们将不透明性(opacity)概念用于形式化刻画建模为部分可观察多智能体系统的MAS的可观察性,提出了时序逻辑oPATL来以量化目标推理智能体的可观察性(刻画系统行为对观察者的信息透明概率),并开发了相应的验证技术来定量分析这些性质。我们将该方法实现为PRISM模型检查器的扩展,并通过若干示例说明其适用性。

How FaR Are Large Language Models From Agents with Theory-of-Mind?

  • paper_url: http://arxiv.org/abs/2310.03051
  • repo_url: None
  • paper_authors: Pei Zhou, Aman Madaan, Srividya Pranavi Potharaju, Aditya Gupta, Kevin R. McKee, Ari Holtzman, Jay Pujara, Xiang Ren, Swaroop Mishra, Aida Nematzadeh, Shyam Upadhyay, Manaal Faruqui
  • for: This paper tests whether large language models (LLMs) can turn their inferences about other people's mental states into strategic action.
  • methods: It introduces a new evaluation paradigm, Thinking for Doing (T4D), which requires models to derive strategic actions from inferences about story characters' mental states, and proposes a zero-shot prompting framework, Foresee and Reflect (FaR), that helps models anticipate future challenges and reason about potential actions.
  • results: Experiments show that FaR raises GPT-4's performance on T4D from 50% to 71%, outperforming other prompting methods such as Chain-of-Thought and Self-Ask; FaR also generalizes to diverse out-of-distribution story structures and scenarios that require theory-of-mind inferences to choose an action.
    Abstract "Thinking is for Doing." Humans can infer other people's mental states from observations--an ability called Theory-of-Mind (ToM)--and subsequently act pragmatically on those inferences. Existing question answering benchmarks such as ToMi ask models questions to make inferences about beliefs of characters in a story, but do not test whether models can then use these inferences to guide their actions. We propose a new evaluation paradigm for large language models (LLMs): Thinking for Doing (T4D), which requires models to connect inferences about others' mental states to actions in social scenarios. Experiments on T4D demonstrate that LLMs such as GPT-4 and PaLM 2 seemingly excel at tracking characters' beliefs in stories, but they struggle to translate this capability into strategic action. Our analysis reveals the core challenge for LLMs lies in identifying the implicit inferences about mental states without being explicitly asked about as in ToMi, that lead to choosing the correct action in T4D. To bridge this gap, we introduce a zero-shot prompting framework, Foresee and Reflect (FaR), which provides a reasoning structure that encourages LLMs to anticipate future challenges and reason about potential actions. FaR boosts GPT-4's performance from 50% to 71% on T4D, outperforming other prompting methods such as Chain-of-Thought and Self-Ask. Moreover, FaR generalizes to diverse out-of-distribution story structures and scenarios that also require ToM inferences to choose an action, consistently outperforming other methods including few-shot in-context learning.
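A minimal sketch of how a Foresee-and-Reflect style prompt could be assembled. The two-stage structure (anticipate future challenges, then reason over candidate actions) follows the abstract; the exact wording, the story and the candidate actions below are illustrative placeholders, not the paper's prompts.

```python
def far_prompt(story: str, question: str, candidate_actions: list[str]) -> str:
    """Compose a Foresee-and-Reflect style zero-shot prompt (illustrative wording)."""
    actions = "\n".join(f"- {a}" for a in candidate_actions)
    return (
        f"Story:\n{story}\n\n"
        f"Question: {question}\n"
        f"Candidate actions:\n{actions}\n\n"
        "Step 1 (Foresee): For each character, infer what they know and believe, "
        "and list the future challenges they are likely to face.\n"
        "Step 2 (Reflect): For each candidate action, reason about how well it "
        "addresses those challenges given the characters' mental states.\n"
        "Finally, answer with the single best action."
    )

print(far_prompt(
    story="Sally puts her book in the drawer and leaves; Tom moves it to the shelf.",
    question="Sally returns looking for the book. What should the assistant do?",
    candidate_actions=["Point Sally to the drawer", "Point Sally to the shelf"],
))
```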

Multi-Agent Reinforcement Learning for Power Grid Topology Optimization

  • paper_url: http://arxiv.org/abs/2310.02605
  • repo_url: None
  • paper_authors: Erica van der Sar, Alessandro Zocca, Sandjai Bhulai
  • for: Managing power networks under growing energy demand and unpredictable renewable sources such as solar and wind.
  • methods: A hierarchical multi-agent reinforcement learning (MARL) framework that exploits the power grid's inherent hierarchical structure.
  • results: Experiments show the MARL framework performs competitively with single-agent RL methods; the paper also compares different RL algorithms for the lower-level agents and different policies for the higher-level agents.
    Abstract Recent challenges in operating power networks arise from increasing energy demands and unpredictable renewable sources like wind and solar. While reinforcement learning (RL) shows promise in managing these networks, through topological actions like bus and line switching, efficiently handling large action spaces as networks grow is crucial. This paper presents a hierarchical multi-agent reinforcement learning (MARL) framework tailored for these expansive action spaces, leveraging the power grid's inherent hierarchical nature. Experimental results indicate the MARL framework's competitive performance with single-agent RL methods. We also compare different RL algorithms for lower-level agents alongside different policies for higher-order agents.
    摘要 最近的电力网操作挑战 arise from 增长的能源需求和不可预测的可再生能源 like 风和太阳。而增强学习(RL)显示出管理这些网络的应用潜力,通过顶层的动作 like 电网和线路调整。然而,为了有效地处理随网络规模增长的大型动作空间,是非常重要的。这篇文章提出了一个层次多智能类RL框架,利用电力网的自然层次结构。实验结果显示这个框架与单一RL方法相当竞争,我们还比较了不同的RL算法和高级掌控策略。
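A skeleton of the hierarchical decomposition described above: a high-level agent picks which substation to act on and a low-level agent picks the concrete bus configuration inside it. The substation/configuration counts and the random policies are placeholders standing in for a grid simulator and for trained RL policies.

```python
import random

class HighLevelAgent:
    """Chooses which substation (region of the grid) to reconfigure."""
    def act(self, obs, n_substations: int) -> int:
        return random.randrange(n_substations)          # stand-in for a learned policy

class LowLevelAgent:
    """Chooses a bus-switching configuration inside the selected substation."""
    def act(self, obs, substation: int, n_configs: int) -> int:
        return random.randrange(n_configs)              # stand-in for a learned policy

def hierarchical_step(env_obs, high: HighLevelAgent, low: LowLevelAgent):
    sub = high.act(env_obs, n_substations=14)           # e.g. a 14-substation test case
    config = low.act(env_obs, substation=sub, n_configs=8)
    return {"substation": sub, "topology_config": config}

action = hierarchical_step(env_obs=None, high=HighLevelAgent(), low=LowLevelAgent())
print(action)
```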

MagicDrive: Street View Generation with Diverse 3D Geometry Control

  • paper_url: http://arxiv.org/abs/2310.02601
  • repo_url: https://github.com/cure-lab/MagicDrive
  • paper_authors: Ruiyuan Gao, Kai Chen, Enze Xie, Lanqing Hong, Zhenguo Li, Dit-Yan Yeung, Qiang Xu
  • for: The paper is written for the task of street view generation with 3D control, specifically for 3D perception tasks like 3D object detection.
  • methods: The paper proposes a novel framework called MagicDrive, which uses tailored encoding strategies to generate street views with diverse 3D geometry controls, including camera poses, road maps, and 3D bounding boxes, as well as textual descriptions. The framework also incorporates a cross-view attention module to ensure consistency across multiple camera views.
  • results: The paper achieves high-fidelity street-view synthesis that captures nuanced 3D geometry and various scene descriptions, enhancing tasks like BEV segmentation and 3D object detection.
    Abstract Recent advancements in diffusion models have significantly enhanced the data synthesis with 2D control. Yet, precise 3D control in street view generation, crucial for 3D perception tasks, remains elusive. Specifically, utilizing Bird's-Eye View (BEV) as the primary condition often leads to challenges in geometry control (e.g., height), affecting the representation of object shapes, occlusion patterns, and road surface elevations, all of which are essential to perception data synthesis, especially for 3D object detection tasks. In this paper, we introduce MagicDrive, a novel street view generation framework offering diverse 3D geometry controls, including camera poses, road maps, and 3D bounding boxes, together with textual descriptions, achieved through tailored encoding strategies. Besides, our design incorporates a cross-view attention module, ensuring consistency across multiple camera views. With MagicDrive, we achieve high-fidelity street-view synthesis that captures nuanced 3D geometry and various scene descriptions, enhancing tasks like BEV segmentation and 3D object detection.

A ModelOps-based Framework for Intelligent Medical Knowledge Extraction

  • paper_url: http://arxiv.org/abs/2310.02593
  • repo_url: None
  • paper_authors: Hongxin Ding, Peinie Zou, Zhiyuan Wang, Junfeng Zhao, Yasha Wang, Qiang Zhou
  • for: Improving medical knowledge extraction from healthcare texts to support downstream tasks such as medical knowledge graph construction and clinical decision-making.
  • methods: A ModelOps-based intelligent medical knowledge extraction framework that provides a low-code system for model selection, training, evaluation and optimization.
  • results: The framework offers a dataset abstraction mechanism based on multi-layer callback functions and a dataset-similarity-based model recommendation method that helps users quickly find models suited to a given dataset.
    Abstract Extracting medical knowledge from healthcare texts enhances downstream tasks like medical knowledge graph construction and clinical decision-making. However, the construction and application of knowledge extraction models lack automation, reusability and unified management, leading to inefficiencies for researchers and high barriers for non-AI experts such as doctors, to utilize knowledge extraction. To address these issues, we propose a ModelOps-based intelligent medical knowledge extraction framework that offers a low-code system for model selection, training, evaluation and optimization. Specifically, the framework includes a dataset abstraction mechanism based on multi-layer callback functions, a reusable model training, monitoring and management mechanism. We also propose a model recommendation method based on dataset similarity, which helps users quickly find potentially suitable models for a given dataset. Our framework provides convenience for researchers to develop models and simplifies model access for non-AI experts such as doctors.
    摘要 从医疗文本中提取医学知识可以提高下游任务,如医学知识图构建和临床决策。然而,知识提取模型的建构和应用缺乏自动化、再利用和统一管理,导致研究人员效率低下,也使医生等非AI专家难以利用知识提取。为解决这些问题,我们提出了基于ModelOps的智能医学知识提取框架,该框架提供了低代码系统,用于选择模型、训练、评估和优化。具体来说,该框架包括基于多层回调函数的数据集抽象机制,可重用的模型训练、监控和管理机制。我们还提出了基于数据集相似性的模型推荐方法,帮助用户快速找到适合给定数据集的可能适用的模型。我们的框架为研究人员开发模型提供了便利,并简化了非AI专家,如医生,访问模型的过程。
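The dataset-similarity recommendation idea can be sketched with simple meta-features and cosine similarity: describe each dataset with a small numeric profile, then suggest the model that performed best on the most similar previously seen dataset. The meta-features, registry entries and model names below are illustrative assumptions, not the framework's actual schema.

```python
import numpy as np

def meta_features(n_docs: int, avg_len: float, n_labels: int) -> np.ndarray:
    """Describe a dataset by a small numeric profile (illustrative choice of features)."""
    return np.array([np.log1p(n_docs), avg_len, n_labels], dtype=float)

# Registry of previously processed datasets and the model that performed best on each.
registry = {
    "ehr_ner_v1":  (meta_features(12_000, 85.0, 12), "BiLSTM-CRF"),
    "drug_rel_v2": (meta_features(4_500, 140.0, 6),  "BERT-RE"),
    "symptom_cls": (meta_features(30_000, 40.0, 20), "TextCNN"),
}

def recommend(new_profile: np.ndarray) -> str:
    """Return the model used on the registered dataset most similar to the new one."""
    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
    best = max(registry.items(), key=lambda kv: cosine(new_profile, kv[1][0]))
    return best[1][1]

print(recommend(meta_features(10_000, 90.0, 10)))
```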

On the Stability of Expressive Positional Encodings for Graph Neural Networks

  • paper_url: http://arxiv.org/abs/2310.02579
  • repo_url: https://github.com/Graph-COM/SPE
  • paper_authors: Yinan Huang, William Lu, Joshua Robinson, Yu Yang, Muhan Zhang, Stefanie Jegelka, Pan Li
  • for: This paper targets positional encodings for graph Transformers and message-passing graph neural networks, proposing a solution that is both stable and expressive.
  • methods: Laplacian eigenvectors are used as positional encodings, but they face two main challenges: non-uniqueness and instability. Most existing methods address only non-uniqueness and overlook stability. The paper introduces a "soft partition" of eigenspaces to resolve the instability and proves the resulting encoding's stability and expressive power.
  • results: Experiments on molecular property prediction and out-of-distribution generalization tasks show that SPE improves the generalization and stability of positional encodings.
    Abstract Designing effective positional encodings for graphs is key to building powerful graph transformers and enhancing message-passing graph neural networks. Although widespread, using Laplacian eigenvectors as positional encodings faces two fundamental challenges: (1) \emph{Non-uniqueness}: there are many different eigendecompositions of the same Laplacian, and (2) \emph{Instability}: small perturbations to the Laplacian could result in completely different eigenspaces, leading to unpredictable changes in positional encoding. Despite many attempts to address non-uniqueness, most methods overlook stability, leading to poor generalization on unseen graph structures. We identify the cause of instability to be a "hard partition" of eigenspaces. Hence, we introduce Stable and Expressive Positional Encodings (SPE), an architecture for processing eigenvectors that uses eigenvalues to "softly partition" eigenspaces. SPE is the first architecture that is (1) provably stable, and (2) universally expressive for basis invariant functions whilst respecting all symmetries of eigenvectors. Besides guaranteed stability, we prove that SPE is at least as expressive as existing methods, and highly capable of counting graph structures. Finally, we evaluate the effectiveness of our method on molecular property prediction, and out-of-distribution generalization tasks, finding improved generalization compared to existing positional encoding methods.
    摘要 设计有效的位置编码方法对于图 transformations 和消息传递图神经网络是关键。虽然广泛使用laplacian eigenvector作为位置编码,但面临两个基本挑战:(1)非唯一性:存在多种不同的laplacian eigendecompositions,(2)不稳定性:小 perturbations 可能导致完全不同的eigenspaces,从而导致位置编码不可预测。despite many attempts to address non-uniqueness, most methods overlook stability, leading to poor generalization on unseen graph structures. we identify the cause of instability to be a "hard partition" of eigenspaces. hence, we introduce Stable and Expressive Positional Encodings (SPE), an architecture for processing eigenvectors that uses eigenvalues to "softly partition" eigenspaces. SPE is the first architecture that is (1) provably stable, and (2) universally expressive for basis-invariant functions while respecting all symmetries of eigenvectors. besides guaranteed stability, we prove that SPE is at least as expressive as existing methods, and highly capable of counting graph structures. finally, we evaluate the effectiveness of our method on molecular property prediction and out-of-distribution generalization tasks, finding improved generalization compared to existing positional encoding methods.
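The instability that motivates SPE, and the "soft partition" remedy, can be illustrated numerically: rather than assigning eigenvectors to hard eigenspace blocks, each output channel mixes eigenvectors with smooth weights driven by eigenvalue gaps, so nearly-degenerate eigenvectors are blended and small Laplacian perturbations change the encoding smoothly. The sketch below is a simplified numpy illustration of that idea (fixed Gaussian weights on squared eigenvector entries), not the SPE architecture itself, whose weighting functions are learned.

```python
import numpy as np

def soft_positional_encoding(adj: np.ndarray, k: int = 4, bandwidth: float = 0.5) -> np.ndarray:
    """Node positional features from eigenvalue-weighted ('soft') mixtures of eigenvectors."""
    lap = np.diag(adj.sum(axis=1)) - adj              # combinatorial graph Laplacian
    evals, evecs = np.linalg.eigh(lap)                # ascending eigenvalues, orthonormal eigenvectors
    evals, evecs = evals[:k], evecs[:, :k]
    # Soft partition: output channel i mixes eigenvector j with a smooth weight that
    # decays with the eigenvalue gap, instead of a hard 0/1 eigenspace assignment.
    gaps = evals[None, :] - evals[:, None]
    weights = np.exp(-(gaps / bandwidth) ** 2)        # (k, k)
    per_node = evecs ** 2                             # sign-invariant per-node quantities, (n, k)
    return per_node @ weights.T                       # (n, k); degenerate eigenvectors are mixed equally

# Toy 4-node cycle graph (it has a repeated Laplacian eigenvalue).
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
print(soft_positional_encoding(A).round(3))
```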

Stand for Something or Fall for Everything: Predict Misinformation Spread with Stance-Aware Graph Neural Networks

  • paper_url: http://arxiv.org/abs/2310.02568
  • repo_url: None
  • paper_authors: Zihan Chen, Jingyi Sun, Rong Liu, Feng Mai
  • for: This study proposes a stance-aware graph neural network (stance-aware GNN) that leverages users' stances to predict misinformation spread on social media.
  • methods: Four information passing paths are customized in the stance-aware GNN, with trainable attention weights providing explainability.
  • results: The stance-aware GNN outperforms benchmarks by 32.65% and exceeds advanced GNNs without user stance by over 4.69%; the attention weights indicate that users' opposition stances have a higher impact on their neighbors' behaviors than supportive ones, functioning as a social correction that halts misinformation propagation.
    Abstract Although pervasive spread of misinformation on social media platforms has become a pressing challenge, existing platform interventions have shown limited success in curbing its dissemination. In this study, we propose a stance-aware graph neural network (stance-aware GNN) that leverages users' stances to proactively predict misinformation spread. As different user stances can form unique echo chambers, we customize four information passing paths in stance-aware GNN, while the trainable attention weights provide explainability by highlighting each structure's importance. Evaluated on a real-world dataset, stance-aware GNN outperforms benchmarks by 32.65% and exceeds advanced GNNs without user stance by over 4.69%. Furthermore, the attention weights indicate that users' opposition stances have a higher impact on their neighbors' behaviors than supportive ones, which function as social correction to halt misinformation propagation. Overall, our study provides an effective predictive model for platforms to combat misinformation, and highlights the impact of user stances in the misinformation propagation.
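A toy version of stance-conditioned aggregation: each neighbor's influence is scaled by a trainable attention weight that depends on whether its stance is supportive or opposing. The two-weight parameterization and feature sizes are illustrative simplifications; the paper's model uses four customized information passing paths inside a GNN.

```python
import torch
import torch.nn as nn

class StanceAwareAggregation(nn.Module):
    """Aggregate neighbor features with separate trainable attention weights per stance."""
    def __init__(self, dim: int):
        super().__init__()
        self.stance_logit = nn.Parameter(torch.zeros(2))   # [support, oppose] attention logits
        self.update = nn.Linear(2 * dim, dim)

    def forward(self, h_self, h_neigh, stance):
        # h_self: (dim,), h_neigh: (N, dim), stance: (N,) with 0 = support, 1 = oppose
        att = torch.softmax(self.stance_logit, dim=0)[stance]          # (N,)
        agg = (att.unsqueeze(1) * h_neigh).sum(dim=0) / (att.sum() + 1e-9)
        return torch.relu(self.update(torch.cat([h_self, agg])))

layer = StanceAwareAggregation(dim=8)
h_self = torch.randn(8)
h_neigh = torch.randn(5, 8)
stance = torch.tensor([0, 1, 1, 0, 1])
print(layer(h_self, h_neigh, stance).shape)   # torch.Size([8])
```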

Improving Automatic VQA Evaluation Using Large Language Models

  • paper_url: http://arxiv.org/abs/2310.02567
  • repo_url: None
  • paper_authors: Oscar Mañas, Benno Krojer, Aishwarya Agrawal
  • for: Improving automatic VQA evaluation so that it better reflects human judgment.
  • methods: Instruction-tuned large language models are leveraged to build a more robust VQA evaluation metric.
  • results: The proposed metric correlates better with human judgment than existing metrics across a range of VQA models and benchmarks.
    Abstract 8 years after the visual question answering (VQA) task was proposed, accuracy remains the primary metric for automatic evaluation. VQA Accuracy has been effective so far in the IID evaluation setting. However, our community is undergoing a shift towards open-ended generative models and OOD evaluation. In this new paradigm, the existing VQA Accuracy metric is overly stringent and underestimates the performance of VQA systems. Thus, there is a need to develop more robust automatic VQA metrics that serve as a proxy for human judgment. In this work, we propose to leverage the in-context learning capabilities of instruction-tuned large language models (LLMs) to build a better VQA metric. We formulate VQA evaluation as an answer-rating task where the LLM is instructed to score the accuracy of a candidate answer given a set of reference answers. We demonstrate the proposed metric better correlates with human judgment compared to existing metrics across several VQA models and benchmarks. We hope wide adoption of our metric will contribute to better estimating the research progress on the VQA task.
    摘要 八年后,视觉问答任务(VQA)的自动评估 metric 仍然是评估系统的首选metric。VQA 精度在closed-set中是有效的,但我们社区正在转移到开放式生成模型和 OOD 评估。在这个新 paradigm 中,现有的 VQA 精度 metric 太严格,下降了 VQA 系统的表现。因此,我们需要开发更加 Robust 的自动 VQA metric,作为人类判断的代理。在这种工作中,我们利用 instruction-tuned 大语言模型(LLM)的上下文学习能力,构建了一个更好的 VQA 评估 metric。我们将 VQA 评估定义为一个答案评分任务,LLM 根据一组参考答案,对候选答案进行评分。我们示出了我们提出的 metric 与人类判断更加相关,在多个 VQA 模型和benchmark上表现出了更好的性能。我们希望广泛采用我们的 metric,能够更好地评估 VQA 任务的研究进步。
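The answer-rating formulation can be sketched as prompt plumbing: the judge LLM sees the question, the reference answers and the candidate answer, and returns a score that is mapped to [0, 1]. The wording, the 1-5 scale and the `query_llm` stub are illustrative assumptions; the paper's metric uses instruction-tuned LLMs with in-context examples.

```python
def rating_prompt(question: str, references: list[str], candidate: str) -> str:
    refs = "\n".join(f"- {r}" for r in references)
    return (
        "You are grading answers to a visual question.\n"
        f"Question: {question}\n"
        f"Reference answers:\n{refs}\n"
        f"Candidate answer: {candidate}\n"
        "On a scale of 1 (wrong) to 5 (fully correct), how accurate is the candidate "
        "answer given the references? Reply with a single number."
    )

def score(question, references, candidate, query_llm) -> float:
    """Map the judge's 1-5 rating to [0, 1]; query_llm is any text-completion callable."""
    reply = query_llm(rating_prompt(question, references, candidate))
    return (float(reply.strip()) - 1.0) / 4.0

# Usage with a stub standing in for a real LLM call:
print(score("What color is the bus?", ["yellow", "yellow bus"], "it is yellow",
            query_llm=lambda prompt: "5"))
```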

Improving Drumming Robot Via Attention Transformer Network

  • paper_url: http://arxiv.org/abs/2310.02565
  • repo_url: None
  • paper_authors: Yang Yi, Zonghan Li
  • for: This paper focuses on robotics for entertainment, specifically an improved drumming robot that performs automatic music transcription with a vision-transformer-style network built on the attention mechanism.
  • methods: The attention transformer network efficiently handles sequential audio embeddings and models their global long-range dependencies.
  • results: Extensive experiments show that the improved algorithm boosts drum classification performance, helping the robot support a variety of smart applications and services.
    Abstract Robotic technology has been widely used in nowadays society, which has made great progress in various fields such as agriculture, manufacturing and entertainment. In this paper, we focus on the topic of drumming robots in entertainment. To this end, we introduce an improving drumming robot that can automatically complete music transcription based on the popular vision transformer network based on the attention mechanism. Equipped with the attention transformer network, our method can efficiently handle the sequential audio embedding input and model their global long-range dependencies. Massive experimental results demonstrate that the improving algorithm can help the drumming robot promote drum classification performance, which can also help the robot to enjoy a variety of smart applications and services.
    摘要 现代社会中,机器人技术已经广泛应用,在各个领域如农业、制造和娱乐等方面取得了大量的进步。在这篇论文中,我们将关注乐队琴行业中的打击机器人。为了实现这一目标,我们提出了一种基于流行的视Transformer网络的注意机制来改进打击机器人的音乐识别性能。凭借注意Transformer网络,我们的方法可以有效地处理串行音频嵌入,并模型全球长距离关系。大量的实验结果表明,改进算法可以帮助打击机器人提高打击类型分类性能,这也可以帮助机器人享受到多种智能应用和服务。

zkFL: Zero-Knowledge Proof-based Gradient Aggregation for Federated Learning

  • paper_url: http://arxiv.org/abs/2310.02554
  • repo_url: None
  • paper_authors: Zhipeng Wang, Nanqing Dong, Jiahao Sun, William Knottenbelt
  • for: Addressing the trustworthiness of the central aggregator in federated learning (FL).
  • methods: Zero-knowledge proofs (ZKPs) and a blockchain are used to guarantee that the central aggregator aggregates the training models correctly.
  • results: zkFL improves security and privacy without modifying the underlying FL network structure and without heavily compromising training speed.
    Abstract Federated Learning (FL) is a machine learning paradigm, which enables multiple and decentralized clients to collaboratively train a model under the orchestration of a central aggregator. Traditional FL solutions rely on the trust assumption of the centralized aggregator, which forms cohorts of clients in a fair and honest manner. However, a malicious aggregator, in reality, could abandon and replace the client's training models, or launch Sybil attacks to insert fake clients. Such malicious behaviors give the aggregator more power to control clients in the FL setting and determine the final training results. In this work, we introduce zkFL, which leverages zero-knowledge proofs (ZKPs) to tackle the issue of a malicious aggregator during the training model aggregation process. To guarantee the correct aggregation results, the aggregator needs to provide a proof per round. The proof can demonstrate to the clients that the aggregator executes the intended behavior faithfully. To further reduce the verification cost of clients, we employ a blockchain to handle the proof in a zero-knowledge way, where miners (i.e., the nodes validating and maintaining the blockchain data) can verify the proof without knowing the clients' local and aggregated models. The theoretical analysis and empirical results show that zkFL can achieve better security and privacy than traditional FL, without modifying the underlying FL network structure or heavily compromising the training speed.
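The round structure can be sketched as follows. A hash commitment over the client updates and the claimed aggregate stands in for the proof purely to show where proof generation and verification sit in the protocol; it is not zero-knowledge, since the verifier here re-reads the updates. A real zkFL round would instead have the aggregator produce a ZKP that the aggregate is correct without revealing clients' models, and post it to a blockchain for miners to verify.

```python
import hashlib
import json

def aggregate(updates):
    """FedAvg-style mean of client updates (each update is a flat list of parameters)."""
    n = len(updates)
    return [sum(col) / n for col in zip(*updates)]

def make_commitment(updates, agg) -> str:
    """Stand-in for proof generation: commit to the inputs and the claimed aggregate."""
    payload = json.dumps({"updates": updates, "agg": agg}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

def verify(updates, claimed_agg, commitment) -> bool:
    """Stand-in for proof verification: recompute and compare the commitment."""
    return make_commitment(updates, claimed_agg) == commitment

client_updates = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]   # toy 2-parameter models
agg = aggregate(client_updates)
proof = make_commitment(client_updates, agg)             # published by the aggregator
print(verify(client_updates, agg, proof))                # clients/miners accept the round
```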

Auto-FP: An Experimental Study of Automated Feature Preprocessing for Tabular Data

  • paper_url: http://arxiv.org/abs/2310.02540
  • repo_url: https://github.com/AutoFP/Auto-FP
  • paper_authors: Danrui Qi, Jinglin Peng, Yongjun He, Jiannan Wang
  • for: This study automates feature preprocessing (Auto-FP) to improve the quality of machine learning models.
  • methods: Evolution-based algorithms, neural-architecture-search-style algorithms and related search methods are used to automate feature preprocessing.
  • results: Across 45 public ML datasets, evolution-based algorithms achieve the best average ranking, while random search is a surprisingly strong baseline; the paper attributes these findings to the structure of the Auto-FP problem and the design of the search algorithms.
    Abstract Classical machine learning models, such as linear models and tree-based models, are widely used in industry. These models are sensitive to data distribution, thus feature preprocessing, which transforms features from one distribution to another, is a crucial step to ensure good model quality. Manually constructing a feature preprocessing pipeline is challenging because data scientists need to make difficult decisions about which preprocessors to select and in which order to compose them. In this paper, we study how to automate feature preprocessing (Auto-FP) for tabular data. Due to the large search space, a brute-force solution is prohibitively expensive. To address this challenge, we interestingly observe that Auto-FP can be modelled as either a hyperparameter optimization (HPO) or a neural architecture search (NAS) problem. This observation enables us to extend a variety of HPO and NAS algorithms to solve the Auto-FP problem. We conduct a comprehensive evaluation and analysis of 15 algorithms on 45 public ML datasets. Overall, evolution-based algorithms show the leading average ranking. Surprisingly, the random search turns out to be a strong baseline. Many surrogate-model-based and bandit-based search algorithms, which achieve good performance for HPO and NAS, do not outperform random search for Auto-FP. We analyze the reasons for our findings and conduct a bottleneck analysis to identify the opportunities to improve these algorithms. Furthermore, we explore how to extend Auto-FP to support parameter search and compare two ways to achieve this goal. In the end, we evaluate Auto-FP in an AutoML context and discuss the limitations of popular AutoML tools. To the best of our knowledge, this is the first study on automated feature preprocessing. We hope our work can inspire researchers to develop new algorithms tailored for Auto-FP.
    摘要 传统机器学习模型,如线性模型和树状模型,在实际应用中广泛使用。这些模型对数据分布敏感,因此特征预处理成为确保模型质量的关键步骤。手动构建特征预处理管道是具有挑战性的,因为数据科学家需要做出许多困难的决策,选择哪些预处理器并将它们按什么顺序排序。在这篇论文中,我们研究如何自动化特征预处理(Auto-FP) для表格数据。由于搜索空间很大,简单的策略是不可能的。为了解决这个挑战,我们发现了一个有趣的观察:Auto-FP可以被视为 either hyperparameter optimization(HPO)或 neural architecture search(NAS)问题。这一观察使得我们可以将多种 HPO 和 NAS 算法扩展到解决 Auto-FP 问题。我们对 15 种算法进行了全面的评估和分析,并在 45 个公共机器学习数据集上进行了比较。总的来说,演化算法显示出了最佳平均排名。尽管随机搜索表现出色,但许多基于模拟器和带刺搜索算法,在 HPO 和 NAS 中表现良好,却不能超越随机搜索。我们分析了这些结果的原因,并进行了瓶颈分析,以寻找改进这些算法的机会。此外,我们还探讨了如何将 Auto-FP 扩展到支持参数搜索,并评估了两种实现方式。最后,我们评估了 Auto-FP 在 AutoML 上的表现,并讨论了流行的 AutoML 工具的局限性。到目前为止,这是自动化特征预处理的首次研究。我们希望我们的工作能启发研究人员开发特定于 Auto-FP 的新算法。
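Random search, reported above as a surprisingly strong baseline, can be written over preprocessing pipelines in a few lines of scikit-learn. The preprocessor pool, pipeline length, search budget and dataset are illustrative choices, not the paper's exact search space.

```python
import random
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import (MaxAbsScaler, MinMaxScaler, Normalizer,
                                   PowerTransformer, RobustScaler, StandardScaler)

POOL = [MinMaxScaler, StandardScaler, RobustScaler,
        MaxAbsScaler, PowerTransformer, Normalizer]

def random_pipeline(max_len: int = 3) -> Pipeline:
    """Sample an ordered chain of 1..max_len distinct preprocessors plus a classifier."""
    steps = [(f"p{i}", cls()) for i, cls in
             enumerate(random.sample(POOL, k=random.randint(1, max_len)))]
    return Pipeline(steps + [("clf", LogisticRegression(max_iter=500))])

X, y = load_breast_cancer(return_X_y=True)
random.seed(0)
best_score, best_pipe = -1.0, None
for _ in range(20):                                   # search budget
    pipe = random_pipeline()
    score = cross_val_score(pipe, X, y, cv=3).mean()
    if score > best_score:
        best_score, best_pipe = score, pipe
print(best_score, [name for name, _ in best_pipe.steps[:-1]])
```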

Literature Based Discovery (LBD): Towards Hypothesis Generation and Knowledge Discovery in Biomedical Text Mining

  • paper_url: http://arxiv.org/abs/2310.03766
  • repo_url: None
  • paper_authors: Balu Bhasuran, Gurusamy Murugesan, Jeyakumar Natarajan
  • for: automatic discovery of novel associations between medical terms in biomedical literature
  • methods: Literature Based Discovery (LBD) using concept profiles and statistical significance, and deep learning applications such as transformer models and neural networks
  • results: potential associations hidden in biomedical literature, including key biomedical discoveries in biomedicine
    Abstract Biomedical knowledge is growing in an astounding pace with a majority of this knowledge is represented as scientific publications. Text mining tools and methods represents automatic approaches for extracting hidden patterns and trends from this semi structured and unstructured data. In Biomedical Text mining, Literature Based Discovery (LBD) is the process of automatically discovering novel associations between medical terms otherwise mentioned in disjoint literature sets. LBD approaches proven to be successfully reducing the discovery time of potential associations that are hidden in the vast amount of scientific literature. The process focuses on creating concept profiles for medical terms such as a disease or symptom and connecting it with a drug and treatment based on the statistical significance of the shared profiles. This knowledge discovery approach introduced in 1989 still remains as a core task in text mining. Currently the ABC principle based two approaches namely open discovery and closed discovery are mostly explored in LBD process. This review starts with general introduction about text mining followed by biomedical text mining and introduces various literature resources such as MEDLINE, UMLS, MESH, and SemMedDB. This is followed by brief introduction of the core ABC principle and its associated two approaches open discovery and closed discovery in LBD process. This review also discusses the deep learning applications in LBD by reviewing the role of transformer models and neural networks based LBD models and its future aspects. Finally, reviews the key biomedical discoveries generated through LBD approaches in biomedicine and conclude with the current limitations and future directions of LBD.
    摘要 生物医学知识在快速增长，大多数这些知识表现为科学文献。自动文本挖掘工具和方法可以从这些不结构化和无结构化数据中提取隐藏的模式和趋势。在生物医学文本挖掘中，文献基于发现（LBD）是自动发现医学术语之间的新关系的过程。LBD方法已经证明可以成功减少医学文献中潜在关系的发现时间。LBD过程中创建医学术语的概念profile，如疾病或症状，并将其与药物和治疗相连接，基于这些概念profile之间的统计学显著性。这种知识发现方法在1989年引入，并且至今仍然是文本挖掘中核心任务。目前，ABC原则基于的两种方法，开放发现和关闭发现，在LBD过程中得到广泛的探索。本文首先介绍了文本挖掘的概述，然后是生物医学文本挖掘，并提供了各种文献资源的概述，如MEDLINE、UMLS、MESH和SemMedDB。接着，文章介绍了ABC原则和其相关的两种方法在LBD过程中的应用。此外，文章还评论了深度学习在LBD中的应用，包括转换器模型和神经网络基于的LBD模型，以及其未来方向。最后，文章着重介绍了通过LBD方法在生物医学中发现的关键发现，并评估了当前LBD的限制和未来方向。
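The ABC open-discovery principle reduces to a few lines: starting from a concept A, collect the B terms it co-occurs with, then rank C terms that co-occur with those B terms but never directly with A. The toy co-occurrence table below is fabricated for illustration (it mimics the classic Raynaud disease / fish oil example); a real system would build it from MEDLINE or SemMedDB annotations and add statistical significance filtering.

```python
from collections import Counter

# Toy literature co-occurrence table: term -> set of terms it appears with.
cooccur = {
    "Raynaud disease": {"blood viscosity", "platelet aggregation", "vasoconstriction"},
    "fish oil":        {"blood viscosity", "platelet aggregation", "triglycerides"},
    "aspirin":         {"platelet aggregation", "inflammation"},
}

def open_discovery(a_term: str) -> Counter:
    """Rank C terms linked to A only indirectly, via shared B terms (ABC principle)."""
    b_terms = cooccur.get(a_term, set())
    scores = Counter()
    for c_term, c_links in cooccur.items():
        if c_term == a_term or a_term in c_links:
            continue                        # skip A itself and already-known direct links
        shared_b = b_terms & c_links
        if shared_b:
            scores[c_term] = len(shared_b)  # more shared B terms -> stronger hypothesis
    return scores

print(open_discovery("Raynaud disease"))    # Counter({'fish oil': 2, 'aspirin': 1})
```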

MIDDAG: Where Does Our News Go? Investigating Information Diffusion via Community-Level Information Pathways

  • paper_url: http://arxiv.org/abs/2310.02529
  • repo_url: None
  • paper_authors: Mingyu Derek Ma, Alexander K. Taylor, Nuan Wen, Yanchen Liu, Po-Nien Kung, Wenna Qin, Shicheng Wen, Azure Zhou, Diyi Yang, Xuezhe Ma, Nanyun Peng, Wei Wang
  • for: Building an intuitive system for exploring how COVID-19-related news articles spread on social media, including user/community susceptibility levels, crowd responses, and the ability to forecast information propagation.
  • methods: Social media data analysis and AI models are used to mine the information pathways of COVID-19-related news articles, construct user communities, and build propagation forecasting capabilities.
  • results: The propagation paths of COVID-19-related news articles exhibit complex hierarchical and distributional structure, and dissemination can be traced and forecast at the community level.
    Abstract We present MIDDAG, an intuitive, interactive system that visualizes the information propagation paths on social media triggered by COVID-19-related news articles accompanied by comprehensive insights including user/community susceptibility level, as well as events and popular opinions raised by the crowd while propagating the information. Besides discovering information flow patterns among users, we construct communities among users and develop the propagation forecasting capability, enabling tracing and understanding of how information is disseminated at a higher level.
    摘要 我们介绍MIDDAG,一个直观互动的系统,可以视觉化COVID-19相关新闻文章所Trigger的信息宣传路径在社交媒体上。此外,我们还提供了用户/社区感染水平的全面报告,以及由群众宣传的事件和受欢迎的观点。我们不仅揭示用户之间信息流动的模式,还可以建立用户群体和宣传预测能力,以跟踪和理解信息如何在更高层次传播。

CITING: Large Language Models Create Curriculum for Instruction Tuning

  • paper_url: http://arxiv.org/abs/2310.02527
  • repo_url: None
  • paper_authors: Tao Feng, Zifeng Wang, Jimeng Sun
  • for: This work develops an AI-teacher-based approach to automated instruction tuning that reduces the cost of human supervision and makes the development of large language models more efficient.
  • methods: Curriculum Instruction TunING (CITING) uses a teacher LLM in two main steps: the teacher crafts rubrics for evaluating answers to various types of questions, and the student LLM learns to follow the rubrics and self-correct from the teacher's revisions; the procedure is carried out iteratively.
  • results: Across four datasets, CITING markedly improves articulateness, depth and comprehensiveness under GPT-4 evaluation, achieving average winning rates of 79.4% over SFT, 73.4% over RLHF, 78.1% over RRHF, and 76.3% over RAFT.
    Abstract The recent advancement of large language models (LLMs) has been achieved through a combo of instruction tuning and human alignment. However, building manually crafted instruction datasets and performing human alignment become the bottleneck for scaling the development of LLMs. In this paper, we exploit the idea of leveraging AI models in lieu of humans as the teacher to train student LLMs. Our method is inspired by how human students refine their writing skills by following the rubrics and learning from the revisions offered by their tutors. Specifically, we employ a teacher LLM to create a curriculum for instruction tuning of the student LLM, namely Curriculum Instruction TunING (CITING). It encompasses two main steps: (1) the teacher LLM crafts the rubrics for evaluating the answers corresponding to various types of questions, and (2) the student LLM learns to follow the rubrics and perform self-correction from the revision made by the teacher. We further iteratively carry out it to embody the procedure of CITING. We compare CITING to a series of state-of-the-art baselines on four datasets. Our method demonstrates strong improvement in terms of articulate, in-depth, and comprehensive by GPT-4 evaluation. Specifically, it achieves an average winning rate of 79.4% over SFT, 73.4% over RLHF, 78.1% over RRHF, and 76.3% over RAFT, respectively.
    摘要 大量语言模型(LLM)的最近进步得到了人工调教和人类对齐的组合。然而,建立手动制作的指导数据集和进行人类对齐成为了大量发展LLM的瓶颈。在这篇论文中,我们利用人类学生如何通过指南和老师的修改来练习写作技巧的想法,即使用AI模型来取代人类老师来训练学生LLM。我们的方法被称为学习指南调教(CITING)。它包括两个主要步骤:(1)教师LLM制定评估答案的评价标准,(2)学生LLM按照标准进行自修改。我们还在不断迭代执行CITING来体现程序。我们与一系列现有的基线比较CITING,在四个数据集上表现出了强大的改善。具体来说,它在SFT、RLHF、RRHF和RAFT等四个数据集上平均赢得了79.4%、73.4%、78.1%和76.3%的胜率。
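The two-step teacher-student loop can be sketched as prompt plumbing around two LLM callables. The prompt wording, the number of rounds and the `teacher`/`student` stubs are placeholders; the paper's actual rubric formats and revision prompts are not reproduced here.

```python
def citing_round(question: str, teacher, student, n_rounds: int = 2) -> str:
    """One CITING-style curriculum: teacher writes a rubric, student answers and self-corrects."""
    rubric = teacher(f"Write a grading rubric for answers to this question:\n{question}")
    answer = student(f"Question: {question}\nAnswer it as well as you can.")
    for _ in range(n_rounds):
        revision = teacher(
            f"Rubric:\n{rubric}\nStudent answer:\n{answer}\n"
            "Revise the answer so that it fully satisfies the rubric."
        )
        answer = student(
            f"Question: {question}\nYour previous answer:\n{answer}\n"
            f"A revised reference:\n{revision}\n"
            "Produce an improved answer following the rubric:\n" + rubric
        )
    return answer   # (question, answer) pairs would then be used for instruction tuning

# Usage with trivial stubs standing in for LLM calls:
print(citing_round("Explain overfitting.", teacher=lambda p: "[teacher output]",
                   student=lambda p: "[student output]"))
```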

Federated Conditional Stochastic Optimization

  • paper_url: http://arxiv.org/abs/2310.02524
  • repo_url: https://github.com/xidongwu/Federated-Minimax-and-Conditional-Stochastic-Optimization
  • paper_authors: Xidong Wu, Jianhui Sun, Zhengmian Hu, Junyi Li, Aidong Zhang, Heng Huang
  • for: This work studies nonconvex conditional stochastic optimization over distributed data in federated learning and proposes the first federated conditional stochastic optimization algorithm (FCSG), built on a conditional stochastic gradient estimator, together with a momentum-based variant (FCSG-M).
  • methods: A conditional stochastic gradient estimator and momentum are combined, and a variance-reduction technique yields an accelerated algorithm (Acc-FCSG-M) that matches the lower-bound complexity of the single-machine setting.
  • results: Experiments on a range of tasks show that FCSG and Acc-FCSG-M achieve favorable sample and communication complexity.
    Abstract Conditional stochastic optimization has found applications in a wide range of machine learning tasks, such as invariant learning, AUPRC maximization, and meta-learning. As the demand for training models with large-scale distributed data grows in these applications, there is an increasing need for communication-efficient distributed optimization algorithms, such as federated learning algorithms. This paper considers the nonconvex conditional stochastic optimization in federated learning and proposes the first federated conditional stochastic optimization algorithm (FCSG) with a conditional stochastic gradient estimator and a momentum-based algorithm (FCSG-M). To match the lower bound complexity in the single-machine setting, we design an accelerated algorithm (Acc-FCSG-M) via the variance reduction to achieve the best sample and communication complexity. Compared with the existing optimization analysis for MAML in FL, federated conditional stochastic optimization considers the sample of tasks. Extensive experimental results on various tasks validate the efficiency of these algorithms.
    摘要 Conditional stochastic optimization在机器学习任务中广泛应用,如不变学习、AUPRC最大化和元学习。随着大规模分布式数据的训练模型需求增加,有增加通信效率的分布式优化算法的需求,如联合学习算法。本文考虑非凸的conditional stochastic optimization在联合学习中,并提出首个联合conditional stochastic gradient估计和势量驱动算法(FCSG-M)。为实现单机设置下的下限复杂度,我们设计加速算法(Acc-FCSG-M)via variance reduction来实现最佳样本和通信复杂度。与现有的MAML在FL中的优化分析相比,联合conditional stochastic optimization考虑任务样本。经验性研究表明,这些算法在各种任务上具有高效性。

MedDiffusion: Boosting Health Risk Prediction via Diffusion-based Data Augmentation

  • paper_url: http://arxiv.org/abs/2310.02520
  • repo_url: None
  • paper_authors: Yuan Zhong, Suhan Cui, Jiaqi Wang, Xiaochen Wang, Ziyi Yin, Yaqing Wang, Houping Xiao, Mengdi Huai, Ting Wang, Fenglong Ma
  • for: This paper aims to improve health risk prediction, i.e., forecasting the health risks that patients may face in the future.
  • methods: It proposes a new augmentation-oriented generative model, MedDiffusion, which enlarges the sample space by synthesizing patient data during training, thereby improving risk prediction performance; MedDiffusion also uses a step-wise attention mechanism to uncover hidden relationships between patient visits so that the most vital information is retained automatically.
  • results: Experiments on four real-world medical datasets show that MedDiffusion outperforms 14 state-of-the-art baselines in PR-AUC, F1 and Cohen's Kappa; ablation studies and comparisons with GAN-based models validate the rationality and adaptability of the design, and analysis of the generated data offers fresh insights into interpretability.
    Abstract Health risk prediction is one of the fundamental tasks under predictive modeling in the medical domain, which aims to forecast the potential health risks that patients may face in the future using their historical Electronic Health Records (EHR). Researchers have developed several risk prediction models to handle the unique challenges of EHR data, such as its sequential nature, high dimensionality, and inherent noise. These models have yielded impressive results. Nonetheless, a key issue undermining their effectiveness is data insufficiency. A variety of data generation and augmentation methods have been introduced to mitigate this issue by expanding the size of the training data set through the learning of underlying data distributions. However, the performance of these methods is often limited due to their task-unrelated design. To address these shortcomings, this paper introduces a novel, end-to-end diffusion-based risk prediction model, named MedDiffusion. It enhances risk prediction performance by creating synthetic patient data during training to enlarge sample space. Furthermore, MedDiffusion discerns hidden relationships between patient visits using a step-wise attention mechanism, enabling the model to automatically retain the most vital information for generating high-quality data. Experimental evaluation on four real-world medical datasets demonstrates that MedDiffusion outperforms 14 cutting-edge baselines in terms of PR-AUC, F1, and Cohen's Kappa. We also conduct ablation studies and benchmark our model against GAN-based alternatives to further validate the rationality and adaptability of our model design. Additionally, we analyze generated data to offer fresh insights into the model's interpretability.
    摘要 医疗风险预测是医疗领域预测模型的基本任务之一,旨在预测患者未来可能面临的医疗风险使用医疗电子病历(EHR)的历史记录。研究人员已经开发了许多风险预测模型,以处理医疗数据的独特挑战,如其继承性、高维度和内在噪音。这些模型已经取得了很好的效果。然而,数据缺乏问题是这些模型的限制因素。为了解决这个问题,本文提出了一种新的、终端扩散基于的医疗风险预测模型,名为MedDiffusion。它通过在训练时创建 sintetic 患者数据来扩大样本空间,提高风险预测性能。此外,MedDiffusion 通过步骤 wise 注意机制,了解患者访问记录中隐藏的关系,使模型自动保留最重要的信息,以生成高质量数据。实验证明,MedDiffusion 在四个真实的医疗数据集上表现出色,至少在PR-AUC、F1和科恩的卡方面比14种高级基准模型更好。我们还进行了剖析研究和对 GAN 基于的相关模型进行比较,以验证我们的模型设计的合理性和适应性。此外,我们还分析生成的数据,提供新的解释力。
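The augmentation idea rests on a standard denoising-diffusion objective: corrupt a patient representation with Gaussian noise at a random timestep and train a network to predict that noise; sampling the learned reverse process then yields synthetic records. The sketch below is a generic DDPM-style training step on dense vectors, with the sizes and the small MLP denoiser as illustrative assumptions rather than MedDiffusion's visit encoder or step-wise attention.

```python
import torch
import torch.nn as nn

T = 100
betas = torch.linspace(1e-4, 0.02, T)
alphas_bar = torch.cumprod(1.0 - betas, dim=0)          # cumulative product \bar{alpha}_t

denoiser = nn.Sequential(nn.Linear(64 + 1, 128), nn.ReLU(), nn.Linear(128, 64))
opt = torch.optim.Adam(denoiser.parameters(), lr=1e-3)

def diffusion_loss(x0: torch.Tensor) -> torch.Tensor:
    """DDPM training loss: predict the noise added to x0 at a random timestep t."""
    t = torch.randint(0, T, (x0.size(0),))
    a = alphas_bar[t].unsqueeze(1)                      # (B, 1)
    noise = torch.randn_like(x0)
    x_t = a.sqrt() * x0 + (1 - a).sqrt() * noise        # forward (noising) process
    t_feat = (t.float() / T).unsqueeze(1)               # crude timestep conditioning
    pred = denoiser(torch.cat([x_t, t_feat], dim=1))
    return ((pred - noise) ** 2).mean()

x0 = torch.randn(32, 64)        # stand-in for encoded patient-visit vectors
opt.zero_grad()
loss = diffusion_loss(x0)
loss.backward()
opt.step()
print(float(loss))
```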

Proactive Human-Robot Interaction using Visuo-Lingual Transformers

  • paper_url: http://arxiv.org/abs/2310.02506
  • repo_url: None
  • paper_authors: Pranay Mathur
  • for: This work aims to transfer to robots the latent visuo-lingual cues that humans use during collaborative interaction, making human-robot collaboration more efficient and natural.
  • methods: A learning-based method built on a vision-language multimodal transformer that captures scene dependencies and proactively suggests tasks based on the user's intention.
  • results: In simulated and real-world scenarios, the proposed method produces accurate scene descriptions and task suggestions, improving the efficiency and naturalness of human-robot collaboration.
    Abstract Humans possess the innate ability to extract latent visuo-lingual cues to infer context through human interaction. During collaboration, this enables proactive prediction of the underlying intention of a series of tasks. In contrast, robotic agents collaborating with humans naively follow elementary instructions to complete tasks or use specific hand-crafted triggers to initiate proactive collaboration when working towards the completion of a goal. Endowing such robots with the ability to reason about the end goal and proactively suggest intermediate tasks will engender a much more intuitive method for human-robot collaboration. To this end, we propose a learning-based method that uses visual cues from the scene, lingual commands from a user and knowledge of prior object-object interaction to identify and proactively predict the underlying goal the user intends to achieve. Specifically, we propose ViLing-MMT, a vision-language multimodal transformer-based architecture that captures inter and intra-modal dependencies to provide accurate scene descriptions and proactively suggest tasks where applicable. We evaluate our proposed model in simulation and real-world scenarios.
    摘要 人类具有内生的能力,可以从视觉语言cue中提取潜在的上下文信息,通过人类交互来进行推理。在合作中,这使得人类可以前置预测任务的目标意图。相比之下,机器人在合作时,可能只是遵循基本的指令来完成任务,或者使用特定的手工触发器来引起前置的合作。为了提高人机合作的效果,我们提出了一种学习基于的方法,使用场景中的视觉cue、用户的语言命令和对象之间的互动知识来识别和预测用户的目标意图。我们提出了一种视语多模态变换器(ViLing-MMT),它可以捕捉场景中的视觉和语言之间的依赖关系,并提供准确的场景描述和适时提示任务。我们在模拟和实际场景中评估了我们的提议模型。