2023-10-20

cs.AI

cs.AI - 2023-10-20

Deep Learning Approaches for Dynamic Mechanical Analysis of Viscoelastic Fiber Composites

paper_url: http://arxiv.org/abs/2310.15188
repo_url: None
paper_authors: Victor Hoffmann, Ilias Nahmed, Parisa Rastin, Guénaël Cabanes, Julien Boisse
for: 这篇论文是为了映射微结构到其机械性能的映射，以便通过深度神经网络快速设计和理解微结构。
methods: 该论文使用了机器学习技术，特别是深度神经网络，来映射微结构到机械性能。
results: 该论文可以快速地映射微结构到机械性能，从而帮助设计和理解微结构。

Abstract
The increased adoption of reinforced polymer (RP) composite materials, driven by eco-design standards, calls for a fine balance between lightness, stiffness, and effective vibration control. These materials are integral to enhancing comfort, safety, and energy efficiency. Dynamic Mechanical Analysis (DMA) characterizes viscoelastic behavior, yet there's a growing interest in using Machine Learning (ML) to expedite the design and understanding of microstructures. In this paper we aim to map microstructures to their mechanical properties using deep neural networks, speeding up the process and allowing for the generation of microstructures from desired properties.

摘要
随着复合材料（RP）的广泛应用，受生态设计标准的推动，需要均衡轻量、刚性和有效的振荡控制。这些材料对于提高舒适、安全和能效性具有重要作用。动态机械分析（DMA）可以描述弹性行为，但是有一种增长的兴趣是使用机器学习（ML）来加速设计和理解微结构。在这篇论文中，我们希望通过深度神经网络将微结构映射到机械性能上，从而加快过程并允许从所需的性能开发微结构。

Evoke: Evoking Critical Thinking Abilities in LLMs via Reviewer-Author Prompt Editing

paper_url: http://arxiv.org/abs/2310.13855
repo_url: https://github.com/microsoft/Evoke
paper_authors: Xinyu Hu, Pengfei Tang, Simiao Zuo, Zihan Wang, Bowen Song, Qiang Lou, Jian Jiao, Denis Charles
For: The paper is written for improving the performance of large language models (LLMs) in natural language processing tasks by proposing a new prompt refinement framework called Evoke.* Methods: The paper proposes using two instances of a same LLM, one as a reviewer and one as an author, to refine prompts through an author-reviewer feedback loop. The paper also introduces a data selection approach to expose the LLM to hard samples.* Results: The paper reports significant outperformance of Evoke compared to existing methods, with Evoke scoring above 80 in the challenging task of logical fallacy detection while other baseline methods struggle to reach 20.

Abstract
Large language models (LLMs) have made impressive progress in natural language processing. These models rely on proper human instructions (or prompts) to generate suitable responses. However, the potential of LLMs are not fully harnessed by commonly-used prompting methods: many human-in-the-loop algorithms employ ad-hoc procedures for prompt selection; while auto prompt generation approaches are essentially searching all possible prompts randomly and inefficiently. We propose Evoke, an automatic prompt refinement framework. In Evoke, there are two instances of a same LLM: one as a reviewer (LLM-Reviewer), it scores the current prompt; the other as an author (LLM-Author), it edits the prompt by considering the edit history and the reviewer's feedback. Such an author-reviewer feedback loop ensures that the prompt is refined in each iteration. We further aggregate a data selection approach to Evoke, where only the hard samples are exposed to the LLM. The hard samples are more important because the LLM can develop deeper understanding of the tasks out of them, while the model may already know how to solve the easier cases. Experimental results show that Evoke significantly outperforms existing methods. For instance, in the challenging task of logical fallacy detection, Evoke scores above 80, while all other baseline methods struggle to reach 20.

摘要
大型自然语言处理模型（LLM）已经做出了很好的进步。这些模型需要合适的人类指导（或提示）来生成适当的响应。然而，LLM的潜在能力未被充分利用，因为常用的人类在循环算法使用了临时的方法来选择提示，而自动生成提示方法则是随机搜索所有可能的提示。我们提出了“触发”，一个自动提示修充框架。在触发中，有两个相同的LLM实例：一个作为评审者（LLM-Reviewer），它评分当前提示；另一个作为作者（LLM-Author），它通过考虑编辑历史和评审者的反馈来修改提示。这种作者-评审者反馈循环确保提示在每次迭代中得到修充。我们还添加了一种数据选择方法到触发中，只暴露困难样本给LLM。困难样本更重要，因为LLM可以通过它们更深入理解任务，而模型可能已经知道如何解决较易的 случа子。实验结果表明，触发明确超越了现有方法。例如，在逻辑错误检测任务中，触发得分超过80，而所有基准方法几乎无法达到20。

CNN-based Prediction of Partition Path for VVC Fast Inter Partitioning Using Motion Fields

paper_url: http://arxiv.org/abs/2310.13838
repo_url: https://github.com/simon123123/vtm10_fast_dt_inter_partition_pcs2021
paper_authors: Yiqun Liu, Marc Riviere, Thomas Guionnet, Aline Roumy, Christine Guillemot
for: 这个研究旨在提高VVC编码器的速度，使其在高效视频编码（HEVC）标准下提供更高的压缩效率和更高的编码复杂度。methods: 该研究提出了一种基于卷积神经网络（CNN）的方法，通过预测QUADTREE分割路径来加速VVC编码器的INTERPARTITIONING过程。在这种方法中，首先引入了一种新的QUADTREE分割表示法，然后开发了一种基于U-Net的CNN，用于在CTU级别预测分割路径。results: 实验表明，提出的方法可以在RAGOP32配置下实现加速率 range from 16.5% to 60.2%，而且与此同时，BD-rate的影响在0.44% to 4.59%之间，这超过了其他当前的解决方案。此外，该方法还被认为是当前场景中最轻量级的方法之一，这使得其适用于其他编码器。

Abstract
The Versatile Video Coding (VVC) standard has been recently finalized by the Joint Video Exploration Team (JVET). Compared to the High Efficiency Video Coding (HEVC) standard, VVC offers about 50% compression efficiency gain, in terms of Bjontegaard Delta-Rate (BD-rate), at the cost of a 10-fold increase in encoding complexity. In this paper, we propose a method based on Convolutional Neural Network (CNN) to speed up the inter partitioning process in VVC. Firstly, a novel representation for the quadtree with nested multi-type tree (QTMT) partition is introduced, derived from the partition path. Secondly, we develop a U-Net-based CNN taking a multi-scale motion vector field as input at the Coding Tree Unit (CTU) level. The purpose of CNN inference is to predict the optimal partition path during the Rate-Distortion Optimization (RDO) process. To achieve this, we divide CTU into grids and predict the Quaternary Tree (QT) depth and Multi-type Tree (MT) split decisions for each cell of the grid. Thirdly, an efficient partition pruning algorithm is introduced to employ the CNN predictions at each partitioning level to skip RDO evaluations of unnecessary partition paths. Finally, an adaptive threshold selection scheme is designed, making the trade-off between complexity and efficiency scalable. Experiments show that the proposed method can achieve acceleration ranging from 16.5% to 60.2% under the RandomAccess Group Of Picture 32 (RAGOP32) configuration with a reasonable efficiency drop ranging from 0.44% to 4.59% in terms of BD-rate, which surpasses other state-of-the-art solutions. Additionally, our method stands out as one of the lightest approaches in the field, which ensures its applicability to other encoders.

摘要
joint 视频探索团体（JVET）最近确定的多元视频编码标准（VVC），相比高效视频编码标准（HEVC），VVC在BD率方面提供约50%的压缩效率提升，但是需要10倍的编码复杂度增加。在这篇论文中，我们提出一种基于卷积神经网络（CNN）的方法，用于加速VVC中的分区过程。首先，我们引入了一种新的Quadtree嵌套多型树（QTMT）分区表示，基于分区路径。其次，我们开发了一种基于U-Net的CNN，用于在CTU级别上接受多尺度运动 вектор场为输入，并通过RDO过程预测最佳分区路径。为此，我们将CTU分割成格子，并预测每个格子的QT深度和MT分裂决策。第三，我们引入了一种高效的分区剔除算法，以使用CNN预测结果在每个分区级别进行分区剔除。最后，我们设计了一种可Scalable的阈值选择方案，以实现质量和效率之间的平衡。实验表明，我们的方法可以在RAGOP32配置下实现加速率 ranges from 16.5% to 60.2%，BD率上的效率损失在0.44% to 4.59%之间，超越了其他现有的解决方案。此外，我们的方法具有轻量级的特点，使其适用于其他编码器。

GraphMaker: Can Diffusion Models Generate Large Attributed Graphs?

paper_url: http://arxiv.org/abs/2310.13833
repo_url: https://github.com/graph-com/graphmaker
paper_authors: Mufei Li, Eleonora Kreačić, Vamsi K. Potluru, Pan Li
for: 该论文旨在生成大规模图的实用图学机器学习中，提供一种新的扩散模型，帮助理解网络演化和数据用途保持。
methods: 该论文提出了一种新的扩散模型，名为GraphMaker，可以生成大规模图并考虑节点属性。它还使用了节点级conditioning和小批量策略以提高可扩展性。
results: 实验表明，GraphMaker可以生成高质量的大规模图，并且可以在下游任务中提供有用的数据。

Abstract
Large-scale graphs with node attributes are fundamental in real-world scenarios, such as social and financial networks. The generation of synthetic graphs that emulate real-world ones is pivotal in graph machine learning, aiding network evolution understanding and data utility preservation when original data cannot be shared. Traditional models for graph generation suffer from limited model capacity. Recent developments in diffusion models have shown promise in merely graph structure generation or the generation of small molecular graphs with attributes. However, their applicability to large attributed graphs remains unaddressed due to challenges in capturing intricate patterns and scalability. This paper introduces GraphMaker, a novel diffusion model tailored for generating large attributed graphs. We study the diffusion models that either couple or decouple graph structure and node attribute generation to address their complex correlation. We also employ node-level conditioning and adopt a minibatch strategy for scalability. We further propose a new evaluation pipeline using models trained on generated synthetic graphs and tested on original graphs to evaluate the quality of synthetic data. Empirical evaluations on real-world datasets showcase GraphMaker's superiority in generating realistic and diverse large-attributed graphs beneficial for downstream tasks.

摘要
大规模图像有节点特征是现实世界中常见的案例，如社交和金融网络。生成可信的图像是图机器学习中关键的，帮助理解网络的进化和数据的可用性保持，当原始数据无法分享时。传统的图像生成模型受到有限的模型容量的限制。最近的扩散模型已经显示出了在图像结构生成或小分子图像生成中的承诺。然而，它们对大型具有特征的图像仍然无法解决，因为捕捉复杂的模式和可扩展性的挑战。本文介绍了GraphMaker，一种适用于生成大型具有特征的图像的新型扩散模型。我们研究了把 grafstructures 和节点特征生成 decouple 或 couple 以处理它们之间的复杂关系。我们还使用节点级 conditioning 和采用小批量策略以实现可扩展性。此外，我们还提出了一种基于生成的 синтетиче图像模型来评估生成的图像质量。empirical evaluation 表明，GraphMaker 可以生成真实且多样的大型具有特征的图像，对下游任务具有利助。

Prompt-Specific Poisoning Attacks on Text-to-Image Generative Models

paper_url: http://arxiv.org/abs/2310.13828
repo_url: None
paper_authors: Shawn Shan, Wenxin Ding, Josephine Passananti, Haitao Zheng, Ben Y. Zhao
for: 本研究旨在探讨数据恶意攻击的可行性和影响，以及可能的防御策略。
methods: 研究人员使用了一种名为“Nightshade”的优化后的提问特定恶意攻击方法，可以在少量恶意样本中训练模型。
results: 研究发现， Nightshade 攻击可以成功地让模型产生意外的输出，并且这些输出可以“延伸”到相关的概念上。此外，研究人员还发现，一些恶意攻击可以“混合”在同一个提问上，从而导致模型的总体特征变得不稳定。

Abstract
Data poisoning attacks manipulate training data to introduce unexpected behaviors into machine learning models at training time. For text-to-image generative models with massive training datasets, current understanding of poisoning attacks suggests that a successful attack would require injecting millions of poison samples into their training pipeline. In this paper, we show that poisoning attacks can be successful on generative models. We observe that training data per concept can be quite limited in these models, making them vulnerable to prompt-specific poisoning attacks, which target a model's ability to respond to individual prompts. We introduce Nightshade, an optimized prompt-specific poisoning attack where poison samples look visually identical to benign images with matching text prompts. Nightshade poison samples are also optimized for potency and can corrupt an Stable Diffusion SDXL prompt in <100 poison samples. Nightshade poison effects "bleed through" to related concepts, and multiple attacks can composed together in a single prompt. Surprisingly, we show that a moderate number of Nightshade attacks can destabilize general features in a text-to-image generative model, effectively disabling its ability to generate meaningful images. Finally, we propose the use of Nightshade` and similar tools as a last defense for content creators against web scrapers that ignore opt-out/do-not-crawl directives, and discuss possible implications for model trainers and content creators.

摘要
数据毒攻击攻击机器学习模型的训练时间，以引入无预期的行为。文本生成模型拥有庞大的训练集，目前认知中指出，成功攻击需要插入数百万毒样本到训练管道中。在这篇论文中，我们表明了毒攻击可以成功地进行在生成模型中。我们发现，生成模型中每个概念的训练数据很有限，使其易受到特定提示毒攻击，该攻击target模型对具体提示的响应能力。我们引入了“夜露”（Nightshade）优化的提示特定毒攻击，毒样本与benign图像和相同的文本提示保持视觉相同。夜露毒样本还优化了强度，可以在<100个毒样本中训练Stable Diffusion SDXL提示。夜露毒效“泄漏”到相关概念上，并且多个攻击可以在单个提示中组合 together。 surprisingly，我们发现了一些 Nightshade 攻击可以使生成模型的通用特征失效，从而禁用生成有意义的图像。最后，我们提议使用 Nightshade 和类似工具作为内容创作者对网络抓取器忽略 opt-out/do-not-crawl 指令的最后防御，并讨论了模型训练者和内容创作者的可能的影响。

FERI: A Multitask-based Fairness Achieving Algorithm with Applications to Fair Organ Transplantation

paper_url: http://arxiv.org/abs/2310.13820
repo_url: None
paper_authors: Can Li, Dejian Lai, Xiaoqian Jiang, Kai Zhang
for: 这 paper 是为了 Addressing fairness challenges in liver transplantation, particularly for subgroups defined by sensitive attributes like age group, gender, and race/ethnicity.
methods: 这 paper 使用了 Machine learning models for outcome prediction, with a focus on fairness-aware predictive modeling. The authors introduce the Fairness through the Equitable Rate of Improvement in Multitask Learning (FERI) algorithm, which constrains subgroup loss and prevents subgroup dominance in the training process.
results: 试验表明，FERI 可以 maintain high predictive accuracy with AUROC and AUPRC comparable to baseline models, while improving fairness. Specifically, for gender, FERI reduces the demographic parity disparity by 71.74%, and for the age group, it decreases the equalized odds disparity by 40.46%.

Abstract
Liver transplantation often faces fairness challenges across subgroups defined by sensitive attributes like age group, gender, and race/ethnicity. Machine learning models for outcome prediction can introduce additional biases. To address these, we introduce Fairness through the Equitable Rate of Improvement in Multitask Learning (FERI) algorithm for fair predictions of graft failure risk in liver transplant patients. FERI constrains subgroup loss by balancing learning rates and preventing subgroup dominance in the training process. Our experiments show that FERI maintains high predictive accuracy with AUROC and AUPRC comparable to baseline models. More importantly, FERI demonstrates an ability to improve fairness without sacrificing accuracy. Specifically, for gender, FERI reduces the demographic parity disparity by 71.74%, and for the age group, it decreases the equalized odds disparity by 40.46%. Therefore, the FERI algorithm advances fairness-aware predictive modeling in healthcare and provides an invaluable tool for equitable healthcare systems.

摘要
肝移植往往面临公平挑战，特别是在年龄组、性别和种族/民族等敏感属性下的分组。机器学习模型用于结果预测可能会引入额外偏见。为解决这些问题，我们介绍了在多任务学习中实现公平性的 Fairness through the Equitable Rate of Improvement in Multitask Learning（FERI）算法。FERI通过均衡学习率和避免分组占据训练过程中的分组主导来约束分组损失。我们的实验表明，FERI可以保持高度的预测精度，AUROC和AUPRC与基线模型相当。此外，FERI能够提高公平性而不牺牲准确性。特别是gender方面，FERI可以降低性别差距的比例为71.74%，而年龄组方面，它可以降低平等 odds差距的比例为40.46%。因此，FERI算法为健康卫生领域的公平预测模型提供了一个有价值的工具，并且能够推动公平的医疗系统。

FATA-Trans: Field And Time-Aware Transformer for Sequential Tabular Data

paper_url: http://arxiv.org/abs/2310.13818
repo_url: https://github.com/zdy93/fata-trans
paper_authors: Dongyu Zhang, Liang Wang, Xin Dai, Shubham Jain, Junpeng Wang, Yujie Fan, Chin-Chia Michael Yeh, Yan Zheng, Zhongfang Zhuang, Wei Zhang
for: 这个研究旨在提出一个适合批评Sequential Tabular Data（STD）的模型，并且能够对STD进行有效的分析和处理。
methods: 本研究提出了一个名为FATA-Trans的模型，它使用了两个Field Transformer来处理STD，并且透过Field-type embedding和Time-aware position embedding来capture static和动态字段之间的差异和时间序列资讯。
results: 实验结果显示，FATA-Trans在下游任务中的学习表现都高于现有的方法，并且通过可视化研究来显示了模型所捕捉的内在结构和时间行为图像。

Abstract
Sequential tabular data is one of the most commonly used data types in real-world applications. Different from conventional tabular data, where rows in a table are independent, sequential tabular data contains rich contextual and sequential information, where some fields are dynamically changing over time and others are static. Existing transformer-based approaches analyzing sequential tabular data overlook the differences between dynamic and static fields by replicating and filling static fields into each transformer, and ignore temporal information between rows, which leads to three major disadvantages: (1) computational overhead, (2) artificially simplified data for masked language modeling pre-training task that may yield less meaningful representations, and (3) disregarding the temporal behavioral patterns implied by time intervals. In this work, we propose FATA-Trans, a model with two field transformers for modeling sequential tabular data, where each processes static and dynamic field information separately. FATA-Trans is field- and time-aware for sequential tabular data. The field-type embedding in the method enables FATA-Trans to capture differences between static and dynamic fields. The time-aware position embedding exploits both order and time interval information between rows, which helps the model detect underlying temporal behavior in a sequence. Our experiments on three benchmark datasets demonstrate that the learned representations from FATA-Trans consistently outperform state-of-the-art solutions in the downstream tasks. We also present visualization studies to highlight the insights captured by the learned representations, enhancing our understanding of the underlying data. Our codes are available at https://github.com/zdy93/FATA-Trans.

摘要
纵向表格数据是现实世界中最常用的数据类型之一。与传统的表格数据不同，纵向表格数据含有较为复杂的上下文和时间序列信息，其中一些字段会随时间的推移而变化，而其他字段则是静态的。现有的转换器基本方法会忽略时间序列中的不同性和顺序信息，这会导致三个主要缺点：（1）计算成本高，（2）为隐藏语言模型预训练任务提供不准确的数据，（3）忽略时间序列中的行为模式。在这项工作中，我们提议了FATA-Trans模型，它包含两个字段转换器，每个转换器都会处理不同的静态和动态字段信息。FATA-Trans模型具有场景和时间意识，可以有效地处理纵向表格数据。我们在三个标准数据集上进行了实验，结果显示FATA-Trans模型在下游任务中的学习表现 Always outperform了现有的解决方案。我们还提供了视觉研究，以帮助我们更好地理解下面数据的含义。我们的代码可以在https://github.com/zdy93/FATA-Trans上获取。

paper_url: http://arxiv.org/abs/2310.13809
repo_url: https://github.com/lindamoraes/turtlebot-project
paper_authors: Linda Dotto de Moraes, Victor Augusto Kich, Alisson Henrique Kolling, Jair Augusto Bottega, Ricardo Bedin Grando, Anselmo Rafael Cukla, Daniel Fernando Tello Gamarra
for: 这项研究旨在提高无地图导航的地面机器人能力。
methods: 研究采用了两种不同的深度强化学习方法，一是基于深度Q网络（DQN）算法的方法，另一是基于双深度Q网络（DDQN）算法的方法。两种方法都使用了激光范围探测器提供的24个量据，并且通过Agent的位差和方向相对于目标来决定导航。
results: 研究发现，使用Double Deep结构可以显著提高地面机器人的导航能力，并且不需要依赖于复杂的图像输入。在三个实际环境中进行了评估，结果表明 Double Deep 结构在导航任务中表现出了显著的优势。

Abstract
In this study, we present two distinct approaches within the realm of Deep Reinforcement Learning (Deep-RL) aimed at enhancing mapless navigation for a ground-based mobile robot. The research methodology primarily involves a comparative analysis between a Deep-RL strategy grounded in the foundational Deep Q-Network (DQN) algorithm, and an alternative approach based on the Double Deep Q-Network (DDQN) algorithm. The agents in these approaches leverage 24 measurements from laser range sampling, coupled with the agent's positional differentials and orientation relative to the target. This amalgamation of data influences the agents' determinations regarding navigation, ultimately dictating the robot's velocities. By embracing this parsimonious sensory framework as proposed, we successfully showcase the training of an agent for proficiently executing navigation tasks and adeptly circumventing obstacles. Notably, this accomplishment is attained without a dependency on intricate sensory inputs like those inherent to image-centric methodologies. The proposed methodology is evaluated in three different real environments, revealing that Double Deep structures significantly enhance the navigation capabilities of mobile robots compared to simple Q structures.

摘要
在这项研究中，我们提出了两种不同的深度强化学习方法（深度强化学习），用于增强地面机器人无地图导航。我们的研究方法主要包括对使用深度Q网络（DQN）算法和双深度Q网络（DDQN）算法两种方法进行比较分析。这两种方法的代理人利用24个激光距测样本，并结合代理人的位差和方向偏差 relative to the target，以Influence the agents' determinations regarding navigation, ultimately dictating the robot's velocities。通过采用这种简单的感知框架，我们成功地训练了一个能够高效执行导航任务并绕过障碍物的代理人。值得注意的是，这种成功不依赖于复杂的感知输入，如图像中心的方法。我们的方法在三个不同的实际环境中进行了评估，结果显示，使用双深度结构可以增强移动机器人的导航能力，相比于单深度结构。

RoseNet: Predicting Energy Metrics of Double InDel Mutants Using Deep Learning

paper_url: http://arxiv.org/abs/2310.13806
repo_url: None
paper_authors: Sarah Coffland, Katie Christensen, Filip Jagodzinski, Brian Hutchinson
For: The paper is written to explore the use of computational methods to model insertion and deletion (InDel) mutations in proteins, specifically using a deep learning approach called RoseNet to predict the effects of InDel mutations on protein structure and function.* Methods: The paper uses a combination of computational methods, including the Rosetta protein structure prediction software and deep learning techniques, to generate and analyze exhaustive double InDel mutations for three proteins. The authors develop and train RoseNet on several structural and energetic metrics output by Rosetta during the mutant generation process.* Results: The paper presents the results of the authors’ experiments using RoseNet to predict the effects of InDel mutations on protein structure and function. The authors show that RoseNet can accurately emulate the exhaustive data set using deep learning methods, and demonstrate the ability of the model to predict Rosetta metrics for unseen mutant sequences with two InDels. The paper also includes a sensitivity analysis to determine the necessary quantity of data required to accurately emulate the structural scores for computationally generated mutants.

Abstract
An amino acid insertion or deletion, or InDel, can have profound and varying functional impacts on a protein's structure. InDel mutations in the transmembrane conductor regulator protein for example give rise to cystic fibrosis. Unfortunately performing InDel mutations on physical proteins and studying their effects is a time prohibitive process. Consequently, modeling InDels computationally can supplement and inform wet lab experiments. In this work, we make use of our data sets of exhaustive double InDel mutations for three proteins which we computationally generated using a robotics inspired inverse kinematics approach available in Rosetta. We develop and train a neural network, RoseNet, on several structural and energetic metrics output by Rosetta during the mutant generation process. We explore and present how RoseNet is able to emulate the exhaustive data set using deep learning methods, and show to what extent it can predict Rosetta metrics for unseen mutant sequences with two InDels. RoseNet achieves a Pearson correlation coefficient median accuracy of 0.775 over all Rosetta scores for the largest protein. Furthermore, a sensitivity analysis is performed to determine the necessary quantity of data required to accurately emulate the structural scores for computationally generated mutants. We show that the model can be trained on minimal data (<50%) and still retain a high level of accuracy.

摘要
一个氨基酸插入或删除（InDel）可能对蛋白质的结构产生深刻和多样化的功能影响。例如，InDel 变异在传输膜调控蛋白中会导致肾上腺炎病。然而，在实验室中进行InDel变异和研究其效果是一个时间紧张的过程。因此，通过计算方式模拟InDel变异可以补充和指导 wet lab experiment。在这项工作中，我们使用了我们已经生成的double InDel 变异数据集，其中包括三种蛋白质的计算生成的双重InDel 变异。我们开发了一个神经网络模型，称为RoseNet，并在Rosetta中生成的多种结构和能量指标上训练这个模型。我们探索了RoseNet是如何使用深度学习方法来模拟数据集，并对未经见过的双重InDel 变异序列预测Rosetta指标的能力。RoseNet在最大蛋白质中达到了 median 准确率0.775。此外，我们进行了敏感分析，以确定模型需要多少数据来准确模拟计算生成的结构分数。我们发现模型可以在少量数据（<50%）上训练并仍保持高级别的准确率。

Improving Molecular Properties Prediction Through Latent Space Fusion

paper_url: http://arxiv.org/abs/2310.13802
repo_url: https://github.com/ibm/molformer
paper_authors: Eduardo Soares, Akihiro Kishimoto, Emilio Vital Brazil, Seiji Takeda, Hiroshi Kajino, Renato Cerqueira
for:This paper aims to enhance the efficacy of pre-trained language models for predicting molecular properties by combining latent spaces derived from state-of-the-art chemical models.methods:The proposed approach combines the embeddings derived from MHG-GNN, which represents molecular structures as graphs, and MoLFormer embeddings rooted in chemical language. The attention mechanism of MoLFormer is able to identify relations between two atoms even when their distance is far apart, while the GNN of MHG-GNN can more precisely capture relations among multiple atoms closely located.results:The proposed multi-view approach outperforms existing state-of-the-art methods, including MoLFormer-XL, in predicting clinical trial drug toxicity and inhibiting HIV replication, as demonstrated on six benchmark datasets from MoleculeNet. The approach achieves superior performance in intricate tasks, and the use of small versions of MHG-GNN and MoLFormer opens up an opportunity for further improvement when using a larger-scale dataset.Here is the result in Simplified Chinese text:for: 这篇论文目标是提高预训练语言模型在预测分子性质方面的效果，通过结合化物学模型的状态空间。methods: 该方法结合MHG-GNN的嵌入， representing molecular structures as graphs，以及基于化学语言的MoLFormer嵌入。MoLFormer的注意机制可以在两个原子之间识别远距离的关系，而MHG-GNN的GNN可以更好地捕捉多个原子 closely located的关系。results: 该方法在六个MoleculeNet的标准数据集上表现出色，比如MoLFormer-XL、在临床药物到ксиicity预测和HIV复制抑制方面取得了更高的性能，特别是在复杂任务中。使用小版本的MHG-GNN和MoLFormer，这开 up了进一步改进的机会，当使用更大规模的数据集时。

Abstract
Pre-trained Language Models have emerged as promising tools for predicting molecular properties, yet their development is in its early stages, necessitating further research to enhance their efficacy and address challenges such as generalization and sample efficiency. In this paper, we present a multi-view approach that combines latent spaces derived from state-of-the-art chemical models. Our approach relies on two pivotal elements: the embeddings derived from MHG-GNN, which represent molecular structures as graphs, and MoLFormer embeddings rooted in chemical language. The attention mechanism of MoLFormer is able to identify relations between two atoms even when their distance is far apart, while the GNN of MHG-GNN can more precisely capture relations among multiple atoms closely located. In this work, we demonstrate the superior performance of our proposed multi-view approach compared to existing state-of-the-art methods, including MoLFormer-XL, which was trained on 1.1 billion molecules, particularly in intricate tasks such as predicting clinical trial drug toxicity and inhibiting HIV replication. We assessed our approach using six benchmark datasets from MoleculeNet, where it outperformed competitors in five of them. Our study highlights the potential of latent space fusion and feature integration for advancing molecular property prediction. In this work, we use small versions of MHG-GNN and MoLFormer, which opens up an opportunity for further improvement when our approach uses a larger-scale dataset.

摘要

Specific versus General Principles for Constitutional AI

paper_url: http://arxiv.org/abs/2310.13798
repo_url: None
paper_authors: Sandipan Kundu, Yuntao Bai, Saurav Kadavath, Amanda Askell, Andrew Callahan, Anna Chen, Anna Goldie, Avital Balwit, Azalia Mirhoseini, Brayden McLean, Catherine Olsson, Cassie Evraets, Eli Tran-Johnson, Esin Durmus, Ethan Perez, Jackson Kernion, Jamie Kerr, Kamal Ndousse, Karina Nguyen, Nelson Elhage, Newton Cheng, Nicholas Schiefer, Nova DasSarma, Oliver Rausch, Robin Larson, Shannon Yang, Shauna Kravec, Timothy Telleen-Lawton, Thomas I. Liao, Tom Henighan, Tristan Hume, Zac Hatfield-Dodds, Sören Mindermann, Nicholas Joseph, Sam McCandlish, Jared Kaplan
for: 这篇论文是为了探讨人工智能模型中的伦理问题，以及如何使用AI模型来避免这些问题。
methods: 这篇论文使用了AI模型来替代人类反馈，并让AI模型仅仅根据一个列表的原则进行反馈。
results: 研究发现，使用简单的原则可以有效地防止AI模型表达有害行为，但是需要更多的原则来实现细化的控制。

Abstract
Human feedback can prevent overtly harmful utterances in conversational models, but may not automatically mitigate subtle problematic behaviors such as a stated desire for self-preservation or power. Constitutional AI offers an alternative, replacing human feedback with feedback from AI models conditioned only on a list of written principles. We find this approach effectively prevents the expression of such behaviors. The success of simple principles motivates us to ask: can models learn general ethical behaviors from only a single written principle? To test this, we run experiments using a principle roughly stated as "do what's best for humanity". We find that the largest dialogue models can generalize from this short constitution, resulting in harmless assistants with no stated interest in specific motivations like power. A general principle may thus partially avoid the need for a long list of constitutions targeting potentially harmful behaviors. However, more detailed constitutions still improve fine-grained control over specific types of harms. This suggests both general and specific principles have value for steering AI safely.

摘要
人类反馈可以防止对话模型表达过分危险的言行，但可能无法自动消除微妙的问题行为，如表达自保或权力愿望。宪法AI提供了一种alternative， replacing human feedback with AI模型对一份已编写的原则进行反馈。我们发现这种方法能够有效防止表达这些行为。成功的简单原则使我们问：可以模型学习通用的伦理行为从单一的写好的原则中吗？为测试这一点，我们运行了一些实验，使用“为人类做好事”的简单宪法。我们发现大型对话模型可以从这个短宪法中泛化，得到无权力的助手。一个通用的原则可以因此部分避免制定长列表的宪法，targeting potentially harmful behaviors。然而，更详细的宪法仍然可以提供细化的控制 sobre specific types of harms。这表明 Both general and specific principles have value for steering AI safely。

Enhancing Illicit Activity Detection using XAI: A Multimodal Graph-LLM Framework

paper_url: http://arxiv.org/abs/2310.13787
repo_url: None
paper_authors: Jack Nicholls, Aditya Kuppa, Nhien-An Le-Khac
For: This paper is written for organisations and governments looking to improve their financial cybercrime detection and prevention methods.* Methods: The paper presents a state-of-the-art, novel multimodal approach to explainable AI (XAI) in financial cybercrime detection, leveraging deep learning models to distill essential representations from transaction sequencing, subgraph connectivity, and narrative generation.* Results: The paper’s proposed approach significantly streamlines the investigative process for analysts, allowing them to understand transactions and their metadata much further through contextual narrative generation.

Abstract
Financial cybercrime prevention is an increasing issue with many organisations and governments. As deep learning models have progressed to identify illicit activity on various financial and social networks, the explainability behind the model decisions has been lacklustre with the investigative analyst at the heart of any deep learning platform. In our paper, we present a state-of-the-art, novel multimodal proactive approach to addressing XAI in financial cybercrime detection. We leverage a triad of deep learning models designed to distill essential representations from transaction sequencing, subgraph connectivity, and narrative generation to significantly streamline the analyst's investigative process. Our narrative generation proposal leverages LLM to ingest transaction details and output contextual narrative for an analyst to understand a transaction and its metadata much further.

摘要
金融电脑犯罪预防已成为许多组织和政府的问题。随着深度学习模型在不同的金融和社交网络上识别违法活动的能力不断提高，模型决策的解释力 however, has been lackluster for investigative analysts at the heart of any deep learning platform. 在我们的论文中，我们提出了一种现代化、新型的多Modalactive approach来解决金融电脑犯罪检测中的XAI问题。我们利用了三种深度学习模型，用于筛选交易时间序列、子图连接和生成文本，以大大简化分析员的调查过程。我们的文本生成提案利用LLM来接受交易细节并输出 Contextual narrative，让分析员更深入理解交易和其元数据。

Fundamental Limits of Membership Inference Attacks on Machine Learning Models

paper_url: http://arxiv.org/abs/2310.13786
repo_url: None
paper_authors: Eric Aubinais, Elisabeth Gassiat, Pablo Piantanida
for: 本文探讨了机器学习模型上的会员推测攻击（MIA）的基本统计限制，以及这种攻击对个人隐私的暴露。
methods: 本文首先 derivates了攻击成功的统计量，然后investigates several situations，并提供了这种量的上下限。
results: 根据样本数量和学习模型的结构参数，可以直接从数据集中估算出攻击准确率。

Abstract
Membership inference attacks (MIA) can reveal whether a particular data point was part of the training dataset, potentially exposing sensitive information about individuals. This article explores the fundamental statistical limitations associated with MIAs on machine learning models. More precisely, we first derive the statistical quantity that governs the effectiveness and success of such attacks. Then, we investigate several situations for which we provide bounds on this quantity of interest. This allows us to infer the accuracy of potential attacks as a function of the number of samples and other structural parameters of learning models, which in some cases can be directly estimated from the dataset.

摘要
会员推测攻击（MIA）可以揭示一个特定数据点是否在训练集中，可能暴露个人敏感信息。这篇文章探讨机器学习模型上的基本统计限制，以帮助理解会员推测攻击的效果和成功。我们首先计算攻击成功的统计量，然后研究这种量在不同情况下的下界和上界，从而可以根据样本数量和学习模型的结构参数来估算攻击的准确率。

Copyright Violations and Large Language Models

paper_url: http://arxiv.org/abs/2310.13771
repo_url: https://github.com/Noykarde/NoykardeRepository
paper_authors: Antonia Karamolegkou, Jiaang Li, Li Zhou, Anders Søgaard
for: 这篇论文探讨了大型自然语言处理模型可能违反版权法律的问题，具体来说是模型是否可以通过精准记忆来重新分布版权文本。
methods: 该论文使用了一系列语言模型对popular books和编程问题进行了实验，以保守地 Characterize the extent to which language models can redistribute these materials.
results: 研究发现，大型语言模型可以很好地记忆和重新分布版权文本，这可能会导致未经授权的版权违反。这些结果提醒我们需要进一步检查和研究，以确保未来的自然语言处理技术的发展遵循版权法律。

Abstract
Language models may memorize more than just facts, including entire chunks of texts seen during training. Fair use exemptions to copyright laws typically allow for limited use of copyrighted material without permission from the copyright holder, but typically for extraction of information from copyrighted materials, rather than {\em verbatim} reproduction. This work explores the issue of copyright violations and large language models through the lens of verbatim memorization, focusing on possible redistribution of copyrighted text. We present experiments with a range of language models over a collection of popular books and coding problems, providing a conservative characterization of the extent to which language models can redistribute these materials. Overall, this research highlights the need for further examination and the potential impact on future developments in natural language processing to ensure adherence to copyright regulations. Code is at \url{https://github.com/coastalcph/CopyrightLLMs}.

摘要
大型自然语言处理模型可能会记忆更多于Just facts，包括训练期间看到的整个文本块。 fair use exemption to copyright laws 通常允许不经copyright持有人授权的限量使用copyrighted material，但通常是为了提取copyrighted materials中的信息，而不是 verbatim 复制。这项研究通过 memorization 的问题来探讨版权侵犯和大型语言模型，强调可能的版权文本重新分布。我们在一系列popular book和编程问题上进行了实验，提供了保守的计算机архитектура来Quantify the extent to which language models can redistribute these materials. 总之，这项研究强调了需要进一步检查和未来自然语言处理发展中的版权规定遵循。代码可以在上找到。

Neural-Base Music Generation for Intelligence Duplication

paper_url: http://arxiv.org/abs/2310.13691
repo_url: None
paper_authors: Jacob Galajda, Kien Hua
for: 这 paper 是为了研究智能复制（Intelligent Duplication）技术，以便创造新的信息。
methods: 这 paper 使用了深度学习系统，以学习大作曲家贝多芬的作曲能力，并将其转化为一个哈希基本知识库。
results: 这 paper 通过了一种新的音乐创作方法，可以通过贝多芬的作曲能力来驱动音乐创作。

Abstract
There are two aspects of machine learning and artificial intelligence: (1) interpreting information, and (2) inventing new useful information. Much advance has been made for (1) with a focus on pattern recognition techniques (e.g., interpreting visual data). This paper focuses on (2) with intelligent duplication (ID) for invention. We explore the possibility of learning a specific individual's creative reasoning in order to leverage the learned expertise and talent to invent new information. More specifically, we employ a deep learning system to learn from the great composer Beethoven and capture his composition ability in a hash-based knowledge base. This new form of knowledge base provides a reasoning facility to drive the music composition through a novel music generation method.

摘要
Machine learning和人工智能有两个方面：（1）解读信息，和（2）创造新有用信息。在（1）方面，已经做出了很大的进步，主要是通过模式识别技术（如图像数据的解读）。这篇论文则专注于（2）方面，通过智能复制（ID）来创造新的信息。我们尝试了学习特定个人的创造性思维，以利用学习到的专业技巧和才华来创造新的信息。我们使用深度学习系统学习了大作曲家贝多芬的作曲能力，并将其储存在一个哈希基本知识库中。这种新的知识库提供了一种新的思维方式，用于驱动音乐创作。

Optimizing Retrieval-augmented Reader Models via Token Elimination

paper_url: http://arxiv.org/abs/2310.13682
repo_url: https://github.com/mosheber/token_elimination
paper_authors: Moshe Berchansky, Peter Izsak, Avi Caciularu, Ido Dagan, Moshe Wasserblat
for: 提高语音模型的运行效率，特别是在生成大量输出时。
methods: 分析支持文本的贡献度，并在токен水平上剔除不必要的信息，以降低运行时间。
results: 可以减少运行时间，最多62.2%，同时保持表现稳定，甚至提高表现。

Abstract
Fusion-in-Decoder (FiD) is an effective retrieval-augmented language model applied across a variety of open-domain tasks, such as question answering, fact checking, etc. In FiD, supporting passages are first retrieved and then processed using a generative model (Reader), which can cause a significant bottleneck in decoding time, particularly with long outputs. In this work, we analyze the contribution and necessity of all the retrieved passages to the performance of reader models, and propose eliminating some of the retrieved information, at the token level, that might not contribute essential information to the answer generation process. We demonstrate that our method can reduce run-time by up to 62.2%, with only a 2% reduction in performance, and in some cases, even improve the performance results.

摘要

Long-Form Speech Translation through Segmentation with Finite-State Decoding Constraints on Large Language Models

paper_url: http://arxiv.org/abs/2310.13678
repo_url: None
paper_authors: Arya D. McCarthy, Hao Zhang, Shankar Kumar, Felix Stahlberg, Ke Wu
for: 提高对长型语音内容的翻译质量，因为长型语音内容中的各个单元可以独立翻译以提高总体翻译质量。
methods: 利用大语言模型（LLMs）对长ASR讲cript进行分割，以便独立翻译每个单元，并在解码过程中 incorporating finite-state constraints 来消除投射的 Output。
results: 通过prompt-tuning或 fine-tuning，LLMs可以适应ASR错误的讲cript，并在9个测试集上提高了 average BLEU 值by 2.9点，相比于 automatic punctuation 基准。

Abstract
One challenge in speech translation is that plenty of spoken content is long-form, but short units are necessary for obtaining high-quality translations. To address this mismatch, we adapt large language models (LLMs) to split long ASR transcripts into segments that can be independently translated so as to maximize the overall translation quality. We overcome the tendency of hallucination in LLMs by incorporating finite-state constraints during decoding; these eliminate invalid outputs without requiring additional training. We discover that LLMs are adaptable to transcripts containing ASR errors through prompt-tuning or fine-tuning. Relative to a state-of-the-art automatic punctuation baseline, our best LLM improves the average BLEU by 2.9 points for English-German, English-Spanish, and English-Arabic TED talk translation in 9 test sets, just by improving segmentation.

摘要
一个挑战在语音翻译中是许多 spoken 内容很长，但是需要短单位以获得高质量翻译。为Address this mismatch, we adapt large language models (LLMs) to split long ASR transcripts into segments that can be independently translated so as to maximize the overall translation quality. We overcome the tendency of hallucination in LLMs by incorporating finite-state constraints during decoding; these eliminate invalid outputs without requiring additional training. We discover that LLMs are adaptable to transcripts containing ASR errors through prompt-tuning or fine-tuning. Compared to a state-of-the-art automatic punctuation baseline, our best LLM improves the average BLEU by 2.9 points for English-German, English-Spanish, and English-Arabic TED talk translation in 9 test sets, just by improving segmentation.

Let’s Synthesize Step by Step: Iterative Dataset Synthesis with Large Language Models by Extrapolating Errors from Small Models

paper_url: http://arxiv.org/abs/2310.13671
repo_url: https://github.com/rickyskywalker/synthesis_step-by-step_official
paper_authors: Ruida Wang, Wangchunshu Zhou, Mrinmaya Sachan
for: 这篇论文的目的是提出一种数据合成方法，以帮助小型模型在具有很少标注数据的情况下进行训练。methods: 该方法利用大语言模型的丰富知识来生成 pseudo 训练示例，以实现数据和计算效率的同时提高。results: 该方法可以减少数据合成集和真实任务数据集之间的分布差距，从而提高小型模型的性能。对多个 NLP 任务进行了广泛的实验，显示了我们的方法可以比基elines提高小型模型的性能，最大提高比例达到 15.17%。

Abstract
*Data Synthesis* is a promising way to train a small model with very little labeled data. One approach for data synthesis is to leverage the rich knowledge from large language models to synthesize pseudo training examples for small models, making it possible to achieve both data and compute efficiency at the same time. However, a key challenge in data synthesis is that the synthesized dataset often suffers from a large distributional discrepancy from the *real task* data distribution. Thus, in this paper, we propose *Synthesis Step by Step* (**S3**), a data synthesis framework that shrinks this distribution gap by iteratively extrapolating the errors made by a small model trained on the synthesized dataset on a small real-world validation dataset using a large language model. Extensive experiments on multiple NLP tasks show that our approach improves the performance of a small model by reducing the gap between the synthetic dataset and the real data, resulting in significant improvement compared to several baselines: 9.48% improvement compared to ZeroGen and 2.73% compared to GoldGen, and at most 15.17% improvement compared to the small model trained on human-annotated data.

摘要
<>translate english text into simplified chinese*Data Synthesis* 是一种有前途的方法，用于训练一个小型模型，只需要非常小的标注数据。一种实现数据合成的方法是利用大型语言模型中的丰富知识，生成pseudo的训练示例，使得可以同时实现数据和计算效率。然而，数据合成中的一个关键挑战是，合成的数据集经常受到真实任务数据分布的大分布差异。因此，在这篇论文中，我们提出了Synthesis Step by Step（**S3**）数据合成框架，通过在小型真实世界验证集上使用大型语言模型来逐步修正小型模型在合成数据集上的错误。广泛的实验表明，我们的方法可以提高小型模型的性能，降低合成数据集和真实数据集之间的分布差异，相比ZeroGen和GoldGen基准下提高9.48%和2.73%，并且在最好情况下与人工标注数据上训练小型模型相比提高15.17%。Note: The translation is in Simplified Chinese, which is the standard writing system used in mainland China. If you prefer Traditional Chinese, please let me know and I can provide the translation in that format as well.

ManifoldNeRF: View-dependent Image Feature Supervision for Few-shot Neural Radiance Fields

paper_url: http://arxiv.org/abs/2310.13670
repo_url: None
paper_authors: Daiju Kanaoka, Motoharu Sonogashira, Hakaru Tamukoh, Yasutomo Kawanishi
for: 本研究旨在提高 Novel View Synthesis 的效果，使用 Neural Radiance Fields (NeRF) 的扩展 DietNeRF。
methods: 本研究提出了一种基于 interpolated features 的方法，来supervise unknown viewpoints 的特征向量。
results: 实验结果表明，提议的方法在复杂场景中表现更好于 DietNeRF，并且在实际环境中identified一组有效的视点 Patterns。

Abstract
Novel view synthesis has recently made significant progress with the advent of Neural Radiance Fields (NeRF). DietNeRF is an extension of NeRF that aims to achieve this task from only a few images by introducing a new loss function for unknown viewpoints with no input images. The loss function assumes that a pre-trained feature extractor should output the same feature even if input images are captured at different viewpoints since the images contain the same object. However, while that assumption is ideal, in reality, it is known that as viewpoints continuously change, also feature vectors continuously change. Thus, the assumption can harm training. To avoid this harmful training, we propose ManifoldNeRF, a method for supervising feature vectors at unknown viewpoints using interpolated features from neighboring known viewpoints. Since the method provides appropriate supervision for each unknown viewpoint by the interpolated features, the volume representation is learned better than DietNeRF. Experimental results show that the proposed method performs better than others in a complex scene. We also experimented with several subsets of viewpoints from a set of viewpoints and identified an effective set of viewpoints for real environments. This provided a basic policy of viewpoint patterns for real-world application. The code is available at https://github.com/haganelego/ManifoldNeRF_BMVC2023

摘要
新型视图合成技术在近期内受到了神经辐射场（NeRF）的推出，其中 DietNeRF 是一种从只有几个图像中学习视图 synthesis 的扩展。DietNeRF 的目标是在不知道视点的情况下，从几个图像中学习视图 synthesis。但是，这个假设是理想的，实际上，随着视点的变化，图像中的特征向量也会随着变化。因此，这个假设可能会对训练造成害。为了避免这种害，我们提出了 ManifoldNeRF，一种使用邻近known viewpoint的 interpolated 特征来监督未知视点的特征vector的方法。由于该方法可以对每个未知视点提供适当的监督，因此可以更好地学习volume representation。实验结果表明，我们提出的方法在复杂场景中表现更好 than others。我们还对一些视点集进行了实验，并确定了在真实环境中有效的视点集。这提供了一个基本的视点模式政策，可以应用于实际世界中。代码可以在上找到。

Automatic Unit Test Data Generation and Actor-Critic Reinforcement Learning for Code Synthesis

paper_url: http://arxiv.org/abs/2310.13669
repo_url: https://github.com/huawei-noah/noah-research
paper_authors: Philip John Gorinski, Matthieu Zimmer, Gerasimos Lampouras, Derrick Goh Xin Deik, Ignacio Iacobacci
for: 这个论文的目的是提高代码生成模型的性能。
methods: 该论文使用了自动生成的函数签名和相关的单元测试数据，以及actor-critic reinforcement learning训练方法。
results: 该论文的实验结果显示，与原始代码生成LM模型相比，使用自动生成的数据和actor-critic RL训练方法可以提高代码生成模型的性能，最高提高9.9%。同时，与常见的PPO或CodeRL模型相比，该论文的方法可以提高代码生成模型的性能，最高提高4.3%。

Abstract
The advent of large pre-trained language models in the domain of Code Synthesis has shown remarkable performance on various benchmarks, treating the problem of Code Generation in a fashion similar to Natural Language Generation, trained with a Language Modelling (LM) objective. In addition, the property of programming language code being precisely evaluable with respect to its semantics -- through the use of Unit Tests to check its functional correctness -- lends itself to using Reinforcement Learning (RL) as a further training paradigm. Previous work has shown that RL can be applied as such to improve models' coding capabilities; however, such RL-based methods rely on a reward signal based on defined Unit Tests, which are much harder to obtain compared to the huge crawled code datasets used in LM objectives. In this work, we present a novel approach to automatically obtain data consisting of function signatures and associated Unit Tests, suitable for RL training of Code Synthesis models. We also introduce a straightforward, simple yet effective Actor-Critic RL training scheme and show that it, in conjunction with automatically generated training data, leads to improvement of a pre-trained code language model's performance by up to 9.9% improvement over the original underlying code synthesis LM, and up to 4.3% over RL-based models trained with standard PPO or CodeRL.

摘要
大量预训练语言模型在代码生成领域的出现，在不同的benchmark上表现出色，将代码生成问题与自然语言生成问题相似的方式进行训练，使用语言模型（LM）目标。此外，编程语言代码的准确评估性——通过使用单元测试来检查其功能正确性——使得使用强化学习（RL）作为训练方法的可能性。前一个研究表明RL可以用来提高模型的编程能力，但是这些RL基于的方法需要基于定义的单元测试奖励信号，这些奖励信号比大量爬取代码集更难获得。在这项工作中，我们提出了一种新的方法，可以自动获得包含函数签名和相关单元测试的数据，适用于RL训练代码生成模型。我们还提出了一种简单、直观 yet effective的actor-critic RL训练方案，并证明在这种方案下，可以使用自动生成的训练数据，提高一个预训练的代码语言模型的性能，相比原始代码生成LM的9.9%提升，并比使用标准PPO或CodeRL的RL-based模型的4.3%提升。

An experimental study for early diagnosing Parkinson’s disease using machine learning

paper_url: http://arxiv.org/abs/2310.13654
repo_url: None
paper_authors: Md. Taufiqul Haque Khan Tusar, Md. Touhidul Islam, Abul Hasnat Sakil
For: Early detection of Parkinson’s Disease* Methods: Machine Learning techniques, including MinMax Scaler, Local Outlier Factor, and SMOTE, to automate the early detection of Parkinson’s Disease from clinical characteristics, voice features, and motor examination.* Results: Obtained 100% accuracy in classifying PD and RBD patients, as well as 92% accuracy in classifying PD and HC individuals.

Abstract
One of the most catastrophic neurological disorders worldwide is Parkinson's Disease. Along with it, the treatment is complicated and abundantly expensive. The only effective action to control the progression is diagnosing it in the early stage. However, this is challenging because early detection necessitates a large and complex clinical study. This experimental work used Machine Learning techniques to automate the early detection of Parkinson's Disease from clinical characteristics, voice features and motor examination. In this study, we develop ML models utilizing a public dataset of 130 individuals, 30 of whom are untreated Parkinson's Disease patients, 50 of whom are Rapid Eye Movement Sleep Behaviour Disorder patients who are at a greater risk of contracting Parkinson's Disease, and 50 of whom are Healthy Controls. We use MinMax Scaler to rescale the data points, Local Outlier Factor to remove outliers, and SMOTE to balance existing class frequency. Afterwards, apply a number of Machine Learning techniques. We implement the approaches in such a way that data leaking and overfitting are not possible. Finally, obtained 100% accuracy in classifying PD and RBD patients, as well as 92% accuracy in classifying PD and HC individuals.

摘要
In this study, we use machine learning techniques to automate the early detection of Parkinson's disease from clinical characteristics, voice features, and motor examination. We use a public dataset of 130 individuals, including 30 patients with untreated Parkinson's disease, 50 patients with rapid eye movement sleep behavior disorder who are at a higher risk of contracting Parkinson's disease, and 50 healthy controls.First, we use Min-Max Scaler to rescale the data points, Local Outlier Factor to remove outliers, and SMOTE to balance the existing class frequency. We implement the approaches in such a way that data leaking and overfitting are not possible.After applying these techniques, we obtained 100% accuracy in classifying Parkinson's disease and rapid eye movement sleep behavior disorder patients, as well as 92% accuracy in classifying Parkinson's disease and healthy control individuals.Note: "Parkinson's disease" is translated as " Parkinson 病" in Simplified Chinese, "rapid eye movement sleep behavior disorder" is translated as "快眼睛运动失调" in Simplified Chinese, and "healthy controls" is translated as "健康控制群" in Simplified Chinese.

Weighted Joint Maximum Mean Discrepancy Enabled Multi-Source-Multi-Target Unsupervised Domain Adaptation Fault Diagnosis

paper_url: http://arxiv.org/abs/2310.14790
repo_url: None
paper_authors: Zixuan Wang, Haoran Tang, Haibo Wang, Bo Qin, Mark D. Butala, Weiming Shen, Hongwei Wang
for: 本研究旨在提出一种多源多目标无监督领域适应（WJMMD-MDA）方法，用于在多源多目标场景下实现适应预测。
methods: 该方法提取了多个标注源频谱中的足够信息，并通过改进的距离损失实现多源多目标频谱的域对齐。这使得在多源多目标场景下学习域外特征，并实现跨域缺陷诊断。
results: 对三个数据集进行了广泛的比较试验，实验结果表明，提出的方法在跨域缺陷诊断中具有显著优势。

Abstract
Despite the remarkable results that can be achieved by data-driven intelligent fault diagnosis techniques, they presuppose the same distribution of training and test data as well as sufficient labeled data. Various operating states often exist in practical scenarios, leading to the problem of domain shift that hinders the effectiveness of fault diagnosis. While recent unsupervised domain adaptation methods enable cross-domain fault diagnosis, they struggle to effectively utilize information from multiple source domains and achieve effective diagnosis faults in multiple target domains simultaneously. In this paper, we innovatively proposed a weighted joint maximum mean discrepancy enabled multi-source-multi-target unsupervised domain adaptation (WJMMD-MDA), which realizes domain adaptation under multi-source-multi-target scenarios in the field of fault diagnosis for the first time. The proposed method extracts sufficient information from multiple labeled source domains and achieves domain alignment between source and target domains through an improved weighted distance loss. As a result, domain-invariant and discriminative features between multiple source and target domains are learned with cross-domain fault diagnosis realized. The performance of the proposed method is evaluated in comprehensive comparative experiments on three datasets, and the experimental results demonstrate the superiority of this method.

摘要
尽管数据驱动智能故障诊断技术可以实现很出色的结果，但它们假设训练和测试数据的分布相同，以及充足的标注数据。在实际场景中，各种运行状态经常存在，导致域 shift 问题，这阻碍了故障诊断的效iveness。而最近的无监督领域适应方法可以在不同域的故障诊断中进行交叉领域适应，但它们很难 simultaneously 利用多个来源域的信息，并实现多个目标域的有效诊断。在这篇论文中，我们创新地提出了一种基于加权最大差异 enabled 多源多目标无监督领域适应方法（WJMMD-MDA），该方法在多源多目标场景中实现了领域适应。该方法从多个标注源域中提取了足够的信息，并通过改进的加权距离损失来实现源和目标域之间的域对应。因此，在多个源和目标域之间学习了域不variant 和抑制特征，并实现了交叉域的故障诊断。我们在三个数据集上进行了广泛的比较实验，结果表明该方法的性能优于其他方法。

Contrastive Preference Learning: Learning from Human Feedback without RL

paper_url: http://arxiv.org/abs/2310.13639
repo_url: https://github.com/jhejna/cpl
paper_authors: Joey Hejna, Rafael Rafailov, Harshit Sikchi, Chelsea Finn, Scott Niekum, W. Bradley Knox, Dorsa Sadigh
for: 这 paper 是关于人工智能学习从人类反馈（RLHF）中的一种新方法，用于让模型与人类意愿相对应。
methods: 这 paper 使用了一种新的 regret-based 模型来学习人类偏好，并使用了一种新的 contrastive 目标函数来学习优化行为。
results: 这 paper 的结果表明，这种新方法可以在高维和连续的RLHF问题中Scale to elegantly，并且比之前的方法更简单。

Abstract
Reinforcement Learning from Human Feedback (RLHF) has emerged as a popular paradigm for aligning models with human intent. Typically RLHF algorithms operate in two phases: first, use human preferences to learn a reward function and second, align the model by optimizing the learned reward via reinforcement learning (RL). This paradigm assumes that human preferences are distributed according to reward, but recent work suggests that they instead follow the regret under the user's optimal policy. Thus, learning a reward function from feedback is not only based on a flawed assumption of human preference, but also leads to unwieldy optimization challenges that stem from policy gradients or bootstrapping in the RL phase. Because of these optimization challenges, contemporary RLHF methods restrict themselves to contextual bandit settings (e.g., as in large language models) or limit observation dimensionality (e.g., state-based robotics). We overcome these limitations by introducing a new family of algorithms for optimizing behavior from human feedback using the regret-based model of human preferences. Using the principle of maximum entropy, we derive Contrastive Preference Learning (CPL), an algorithm for learning optimal policies from preferences without learning reward functions, circumventing the need for RL. CPL is fully off-policy, uses only a simple contrastive objective, and can be applied to arbitrary MDPs. This enables CPL to elegantly scale to high-dimensional and sequential RLHF problems while being simpler than prior methods.

摘要
人工智能学习奖励（RLHF）已成为一种流行的方法，用于将模型与人类意愿相align。通常RLHF算法在两个阶段操作：首先，使用人类偏好来学习奖励函数，然后使用奖励学习（RL）来调整模型。这种方法假设人类偏好是根据奖励分布的，但最近的研究表明，人类偏好实际上是基于用户的优化策略的 regret。因此，从反馈中学习奖励函数并不只是基于错误的人类偏好假设，还会导致困难的优化挑战，例如Policy Gradient或Bootstrapping在RL阶段。由于这些优化挑战，当代RLHF方法通常限制自己在contextual bandit Setting（如大语言模型）或限制观察维度（如状态基于机器人）。我们超越这些限制，通过引入一种新的人类反馈学习算法，使用 regret-based模型来学习行为。使用最大 entropy原理，我们 derive Contrastive Preference Learning（CPL）算法，可以从偏好中学习优化策略，不需要学习奖励函数，从而避免RL阶段的优化挑战。CPL是完全偏离策略的，只需使用简单的对比目标，可以应用于任意MDP。这使得CPL可以简单地扩展到高维和顺序RLHF问题，而且更简单于先前的方法。

Hunayn: Elevating Translation Beyond the Literal

paper_url: http://arxiv.org/abs/2310.13613
repo_url: None
paper_authors: Nasser Almousa, Nasser Alzamil, Abdullah Alshehri, Ahmad Sait
for: 这项研究旨在开发一个高级的英语到阿拉伯语翻译工具，超越传统工具。
methods: 该方法利用赫尔辛基 transformer（MarianMT），通过自动抽取的纯文学阿拉伯语数据进行微调。
results: 对于Google翻译的评估表明，该方法在质量评估中具有明显的优势，特别是在文化敏感度和上下文准确性方面。

Abstract
This project introduces an advanced English-to-Arabic translator surpassing conventional tools. Leveraging the Helsinki transformer (MarianMT), our approach involves fine-tuning on a self-scraped, purely literary Arabic dataset. Evaluations against Google Translate show consistent outperformance in qualitative assessments. Notably, it excels in cultural sensitivity and context accuracy. This research underscores the Helsinki transformer's superiority for English-to-Arabic translation using a Fusha dataset.

摘要
这个项目推出了一种高级英语到阿拉伯语翻译工具，超越传统工具。我们采用了赫尔辛基transformer（MarianMT），我们的方法是在自动抽取的纯文学阿拉伯语数据上细调。对于Google翻译进行评估，我们的方法表现出了一致的提升，尤其是在文化敏感度和语言上下文准确性方面。这些研究证明了赫尔辛基transformer在英语到阿拉伯语翻译中的优势，使用福沙数据集。

Make Your Decision Convincing! A Unified Two-Stage Framework: Self-Attribution and Decision-Making

paper_url: http://arxiv.org/abs/2310.13610
repo_url: None
paper_authors: Yanrui Du, Sendong Zhao, Haochun Wang, Yuhan Chen, Rui Bai, Zewen Qiang, Muzhen Cai, Bing Qin
for: 本研究旨在提高黑盒模型的决策过程中的自然语言描述能力，以及提供用户可靠的决策理由。
methods: 本研究使用了子序列从输入文本中提取的自然语言来为用户提供决策理由，并通过两个阶段框架 Self-Attribution and Decision-Making (SADM) 来确保决策理由和模型决策之间的关系更加可靠。
results: 经过对 ERASER 测试 benchmark 上的五个理解任务的广泛实验，我们表明了我们的框架不仅可以提高决策理由和模型决策之间的关系的可靠性，还可以在任务性能和决策理由质量两个方面达到竞争力。此外，我们还探讨了我们的框架在半supervised情况下的潜在应用。

Abstract
Explaining black-box model behavior with natural language has achieved impressive results in various NLP tasks. Recent research has explored the utilization of subsequences from the input text as a rationale, providing users with evidence to support the model decision. Although existing frameworks excel in generating high-quality rationales while achieving high task performance, they neglect to account for the unreliable link between the generated rationale and model decision. In simpler terms, a model may make correct decisions while attributing wrong rationales, or make poor decisions while attributing correct rationales. To mitigate this issue, we propose a unified two-stage framework known as Self-Attribution and Decision-Making (SADM). Through extensive experiments on five reasoning datasets from the ERASER benchmark, we demonstrate that our framework not only establishes a more reliable link between the generated rationale and model decision but also achieves competitive results in task performance and the quality of rationale. Furthermore, we explore the potential of our framework in semi-supervised scenarios.

摘要
<>将黑盒模型行为用自然语言描述达到了各种NLP任务中的出色结果。现有研究利用输入文本中的子序列作为论证，为用户提供模型决策的证据。 although existing frameworks can generate high-quality rationales and achieve high task performance, they neglect to account for the unreliable link between the generated rationale and model decision. In simpler terms, a model may make correct decisions while attributing wrong rationales, or make poor decisions while attributing correct rationales. To address this issue, we propose a unified two-stage framework known as Self-Attribution and Decision-Making (SADM). Through extensive experiments on five reasoning datasets from the ERASER benchmark, we demonstrate that our framework not only establishes a more reliable link between the generated rationale and model decision but also achieves competitive results in task performance and the quality of rationale. Furthermore, we explore the potential of our framework in semi-supervised scenarios.Note: Please note that the translation is in Simplified Chinese, which is the standard form of Chinese used in mainland China and Singapore. If you need Traditional Chinese, please let me know.

MULTITuDE: Large-Scale Multilingual Machine-Generated Text Detection Benchmark

paper_url: http://arxiv.org/abs/2310.13606
repo_url: https://github.com/kinit-sk/mgt-detection-benchmark
paper_authors: Dominik Macko, Robert Moro, Adaku Uchendu, Jason Samuel Lucas, Michiharu Yamashita, Matúš Pikuliak, Ivan Srba, Thai Le, Dongwon Lee, Jakub Simko, Maria Bielikova
for: 填充 Multilingual Machine-Generated Text Detection Benchmark Dataset的 lacuna，提供8种多语言LLM生成的74,081个authentic和机器生成文本，用于评估多语言机器生成文本检测器的性能。
methods: 使用这个benchmark dataset， Comparing zero-shot (统计学和黑盒子)和精心调整的检测器的性能，并评估这些检测器在不同语言和LLM之间的一致性。
results: 研究发现， zero-shot检测器在语言不同和LLM不同情况下的性能较差，而精心调整的检测器在多语言和多LLM情况下的性能显著提高。

Abstract
There is a lack of research into capabilities of recent LLMs to generate convincing text in languages other than English and into performance of detectors of machine-generated text in multilingual settings. This is also reflected in the available benchmarks which lack authentic texts in languages other than English and predominantly cover older generators. To fill this gap, we introduce MULTITuDE, a novel benchmarking dataset for multilingual machine-generated text detection comprising of 74,081 authentic and machine-generated texts in 11 languages (ar, ca, cs, de, en, es, nl, pt, ru, uk, and zh) generated by 8 multilingual LLMs. Using this benchmark, we compare the performance of zero-shot (statistical and black-box) and fine-tuned detectors. Considering the multilinguality, we evaluate 1) how these detectors generalize to unseen languages (linguistically similar as well as dissimilar) and unseen LLMs and 2) whether the detectors improve their performance when trained on multiple languages.

摘要
“现有研究缺乏关于最新的LLM的生成文本的真实性的语言 besides English 以及机器生成文本检测器在多语言设置下的性能研究。这也反映在可用的标准准则中，lack of authentic texts in languages other than English and primarily cover older generators。为了填补这一漏洞，我们引入 MULTITuDE，一个新的多语言机器生成文本检测 benchmark，包含74,081 个真实的机器生成文本和人工生成文本在 11 种语言（ar, ca, cs, de, en, es, nl, pt, ru, uk, and zh），由 8 种多语言 LLMS 生成。通过这个benchmark，我们比较了零 shot (统计学和黑盒子) 和精度调教的检测器的性能。考虑到多语言性，我们评估了1）这些检测器在未看过语言（语言相似性和不相似性）和LLMs 中的泛化性能，2）是否在多语言培训中提高检测器的性能。”

Skin Lesion Segmentation Improved by Transformer-based Networks with Inter-scale Dependency Modeling

paper_url: http://arxiv.org/abs/2310.13604
repo_url: https://github.com/saniaesk/skin-lesion-segmentation
paper_authors: Sania Eskandari, Janet Lumpp, Luis Sanchez Giraldo
for:这项研究旨在提高皮肤癌病变分割的自动化精度，使用Transformer网络模型来增强FCNs的长距离依赖关系捕捉能力。methods:该研究使用了一种基于U-Net架构的Transformer网络模型，并提出了一种增强skip连接路径的方法，以增加网络特征重用性。此外，研究还提出了一种Inter-scale Context Fusion（ISCF）方法，通过在每个阶段的Encoder中使用注意力相关性来适应性地融合不同阶段的上下文。results:研究在两个皮肤癌病变分割 benchmark 上达到了比较高的分割精度，并且对比Transformer网络模型和U-Net架构的性能进行了比较，结果表明，基于U-Net架构的Transformer网络模型可以增强皮肤癌病变分割的自动化精度。代码可以在 GitHub 上找到：https://github.com/saniaesk/skin-lesion-segmentation。

Abstract
Melanoma, a dangerous type of skin cancer resulting from abnormal skin cell growth, can be treated if detected early. Various approaches using Fully Convolutional Networks (FCNs) have been proposed, with the U-Net architecture being prominent To aid in its diagnosis through automatic skin lesion segmentation. However, the symmetrical U-Net model's reliance on convolutional operations hinders its ability to capture long-range dependencies crucial for accurate medical image segmentation. Several Transformer-based U-Net topologies have recently been created to overcome this limitation by replacing CNN blocks with different Transformer modules to capture local and global representations. Furthermore, the U-shaped structure is hampered by semantic gaps between the encoder and decoder. This study intends to increase the network's feature re-usability by carefully building the skip connection path. Integrating an already calculated attention affinity within the skip connection path improves the typical concatenation process utilized in the conventional skip connection path. As a result, we propose a U-shaped hierarchical Transformer-based structure for skin lesion segmentation and an Inter-scale Context Fusion (ISCF) method that uses attention correlations in each stage of the encoder to adaptively combine the contexts from each stage to mitigate semantic gaps. The findings from two skin lesion segmentation benchmarks support the ISCF module's applicability and effectiveness. The code is publicly available at \url{https://github.com/saniaesk/skin-lesion-segmentation}

摘要
melanoma，一种危险的皮肤癌症，可以通过早期检测治疗。多种使用 Fully Convolutional Networks (FCNs) 的方法已经被提议，其中 U-Net 建筑物被广泛使用，以帮助自动识别皮肤肿瘤。然而，传统的 U-Net 模型在 convolutional 操作上存在一定的限制，这限制了它的捕捉长距离依赖关系的能力，这些依赖关系是医疗图像分割中必要的。为了缓解这些限制，一些基于 Transformer 的 U-Net 结构已经被创建，这些结构将 CNN 块 replaced 为不同的 Transformer 模块，以捕捉本地和全局表示。此外，U 形结构受到Semantic gap 的限制，这个问题可以通过精心建立 skip connection path 来解决。在传统的 skip connection path 中使用已经计算的 attention affinity 可以提高 feature 的重用性。因此，我们提出了一种 U 形层次 Transformer 基本结构和一种 Inter-scale Context Fusion (ISCF) 方法，该方法在每个encoder阶段使用 attention 相关性来适应地将每个阶段的上下文进行adaptive 组合，以 Mitigate Semantic gap。从两个皮肤肿瘤分割 benchmark 的结果来看，ISCF 模块的可行性和效果。代码可以在 \url{https://github.com/saniaesk/skin-lesion-segmentation} 上获取。

MarineGPT: Unlocking Secrets of Ocean to the Public

paper_url: http://arxiv.org/abs/2310.13596
repo_url: https://github.com/hkust-vgd/marinegpt
paper_authors: Ziqiang Zheng, Jipeng Zhang, Tuan-Anh Vu, Shizhe Diao, Yue Him Wong Tim, Sai-Kit Yeung
for: The paper is written to explore the use of large language models (LLMs) and multi-modal large language models (MLLMs) in the domain-specific application of the marine domain.
methods: The paper proposes a new vision-language model called MarineGPT, which is specifically designed for the marine domain and trained on a large dataset of marine image-text pairs called Marine-5M.
results: The paper shows that MarineGPT outperforms existing MLLMs in understanding domain-specific intents and generating informative and scientific responses in the marine domain. It also provides a standard protocol for adapting general-purpose assistants to downstream domain-specific experts.

Abstract
Large language models (LLMs), such as ChatGPT/GPT-4, have proven to be powerful tools in promoting the user experience as an AI assistant. The continuous works are proposing multi-modal large language models (MLLM), empowering LLMs with the ability to sense multiple modality inputs through constructing a joint semantic space (e.g. visual-text space). Though significant success was achieved in LLMs and MLLMs, exploring LLMs and MLLMs in domain-specific applications that required domain-specific knowledge and expertise has been less conducted, especially for \textbf{marine domain}. Different from general-purpose MLLMs, the marine-specific MLLM is required to yield much more \textbf{sensitive}, \textbf{informative}, and \textbf{scientific} responses. In this work, we demonstrate that the existing MLLMs optimized on huge amounts of readily available general-purpose training data show a minimal ability to understand domain-specific intents and then generate informative and satisfactory responses. To address these issues, we propose \textbf{MarineGPT}, the first vision-language model specially designed for the marine domain, unlocking the secrets of the ocean to the public. We present our \textbf{Marine-5M} dataset with more than 5 million marine image-text pairs to inject domain-specific marine knowledge into our model and achieve better marine vision and language alignment. Our MarineGPT not only pushes the boundaries of marine understanding to the general public but also offers a standard protocol for adapting a general-purpose assistant to downstream domain-specific experts. We pave the way for a wide range of marine applications while setting valuable data and pre-trained models for future research in both academic and industrial communities.

摘要
大型语言模型（LLMs），如ChatGPT/GPT-4，已经证明是强大的工具来提升用户体验，作为人工智能助手。不断的研究提出了多modal大型语言模型（MLLM），将LMLMs扩展到多种数据类型的听取，例如文本和视觉数据。虽然LMLMs和MLLMs在不同领域中取得了卓越成就，但是对特定领域的应用仍然较少，尤其是在海洋领域。不同于通用MLLMs，海洋特定MLLM需要更加敏感、有用和科学的回应。在这个工作中，我们表明了现有的MLLMs在大量可用的通用训练数据上并不能够理解领域专门意图，并生成有用和满意的回应。为解决这些问题，我们提出了海洋GPT，首个特别设计 для海洋领域的视觉语言模型，为海洋秘密开启给大众。我们提供了我们的海洋-5M数据集，包含更多 than 500万几何和文本对应项目，将领域专门知识注入到我们的模型中，以 достиieving更好的视觉和语言对齐。我们的海洋GPT不仅扩展了海洋理解的boundaries，并且提供了一个标准协议供后续领域专门人员适应。我们开启了海洋应用的广泛前景，同时设定了价值的数据和预训模型供未来学术和工业社群的研究。

Towards equilibrium molecular conformation generation with GFlowNets

paper_url: http://arxiv.org/abs/2310.14782
repo_url: None
paper_authors: Alexandra Volokhova, Michał Koziarski, Alex Hernández-García, Cheng-Hao Liu, Santiago Miret, Pablo Lemos, Luca Thiede, Zichao Yan, Alán Aspuru-Guzik, Yoshua Bengio
for: 用于预测分子性质的预测方法
methods: 使用GFlowNet方法对小分子的可能性空间进行采样，根据分子的能量来确定采样分布
results: 可以与不同精度的能量估计方法结合使用，找到高度可变的药物分子低能态 conformations 的多样化集合，并能够复制分子潜在能量表面。

Abstract
Sampling diverse, thermodynamically feasible molecular conformations plays a crucial role in predicting properties of a molecule. In this paper we propose to use GFlowNet for sampling conformations of small molecules from the Boltzmann distribution, as determined by the molecule's energy. The proposed approach can be used in combination with energy estimation methods of different fidelity and discovers a diverse set of low-energy conformations for highly flexible drug-like molecules. We demonstrate that GFlowNet can reproduce molecular potential energy surfaces by sampling proportionally to the Boltzmann distribution.

摘要
采样多样、 termodynamic 可行的分子姿态对分子性质预测具有重要作用。本文提议使用 GFlowNet 采样小分子的姿态，由分子能量确定的博尔ツ曼分布中的一部分。该方法可与不同级别的能量估计方法结合使用，并发现高灵活性药物分子的多个低能量姿态。我们示出 GFlowNet 可以按照博尔ツ曼分布中的比例采样分子潜在能能面。

ReLM: Leveraging Language Models for Enhanced Chemical Reaction Prediction

paper_url: http://arxiv.org/abs/2310.13590
repo_url: https://github.com/syr-cn/relm
paper_authors: Yaorui Shi, An Zhang, Enzhi Zhang, Zhiyuan Liu, Xiang Wang
for: 预测化学反应，一个基本的化学挑战，涉及预测反应过程中的产物。现有的技术常受训练数据的限制和不能利用文本信息的办法约束其应用在实际应用中。本文提出了一种名为ReLM的新方案，利用化学知识编码在语言模型（LM）中，以助金Graph Neural Networks（GNNs），从而提高实际化学反应预测的准确率。
methods: 我们提出了一种名为ReLM的新方案，利用化学知识编码在语言模型（LM）中，以助金Graph Neural Networks（GNNs），从而提高实际化学反应预测的准确率。
results: 我们的实验结果表明，ReLM可以在不同的化学反应数据集上提高现有GNN-based方法的性能，特别是在异常情况下。codes可以在https://github.com/syr-cn/ReLM中获得。

Abstract
Predicting chemical reactions, a fundamental challenge in chemistry, involves forecasting the resulting products from a given reaction process. Conventional techniques, notably those employing Graph Neural Networks (GNNs), are often limited by insufficient training data and their inability to utilize textual information, undermining their applicability in real-world applications. In this work, we propose ReLM, a novel framework that leverages the chemical knowledge encoded in language models (LMs) to assist GNNs, thereby enhancing the accuracy of real-world chemical reaction predictions. To further enhance the model's robustness and interpretability, we incorporate the confidence score strategy, enabling the LMs to self-assess the reliability of their predictions. Our experimental results demonstrate that ReLM improves the performance of state-of-the-art GNN-based methods across various chemical reaction datasets, especially in out-of-distribution settings. Codes are available at https://github.com/syr-cn/ReLM.

摘要
预测化学反应是化学领域的基本挑战之一，即预测反应过程中的产物。现有的技术，如基于图神经网络（GNN）的方法，常受到数据不充分的训练和文本信息的不能利用的限制，从而削弱其在实际应用中的可行性。在这项工作中，我们提出了一种新的框架，即ReLM，它利用化学知识编码在语言模型（LM）中来帮助GNN，从而提高实际化学反应预测的准确性。为进一步增强模型的可靠性和可读性，我们还在模型中 интеGRATE了信任分数策略，使LM可以自我评估其预测的可靠性。我们的实验结果表明，ReLM可以在不同的化学反应数据集上超越现有的GNN-based方法的性能，特别是在出版数据集上。代码可以在https://github.com/syr-cn/ReLM上下载。

SPARE: A Single-Pass Neural Model for Relational Databases

paper_url: http://arxiv.org/abs/2310.13581
repo_url: None
paper_authors: Benjamin Hilprecht, Kristian Kersting, Carsten Binnig
for: 这篇论文旨在提出一种高效地在关系数据库（RDB）上训练深度学习模型的方法，以提高predictive performance和减少训练时间。
methods: 该方法基于单过Relational models（SPARE），它利用了关系数据库中数据的规则结构，通过单过训练来快速地训练深度学习模型，并且可以充分利用相似性来降低模型的维度。
results: 对多个基线模型进行了比较，研究发现SPARE可以在训练和推理中快速减少时间，同时保持与基线模型相似的预测性能。

Abstract
While there has been extensive work on deep neural networks for images and text, deep learning for relational databases (RDBs) is still a rather unexplored field. One direction that recently gained traction is to apply Graph Neural Networks (GNNs) to RBDs. However, training GNNs on large relational databases (i.e., data stored in multiple database tables) is rather inefficient due to multiple rounds of training and potentially large and inefficient representations. Hence, in this paper we propose SPARE (Single-Pass Relational models), a new class of neural models that can be trained efficiently on RDBs while providing similar accuracies as GNNs. For enabling efficient training, different from GNNs, SPARE makes use of the fact that data in RDBs has a regular structure, which allows one to train these models in a single pass while exploiting symmetries at the same time. Our extensive empirical evaluation demonstrates that SPARE can significantly speedup both training and inference while offering competitive predictive performance over numerous baselines.

摘要
tradicional deep learning for images and text 领域已经有了广泛的研究，但是deep learning for relational databases（RDB）仍然是一个未经探索的领域。一个Recently gained traction的方向是将Graph Neural Networks（GNNs）应用于RBD。然而，在大规模的关系数据库（即数据存储在多个数据库表中）上训练GNNs是不具有效率的，因为需要多 rondas of training和可能很大的不效率表示。因此，在这篇论文中，我们提出了SPARE（Single-Pass Relational models），一种新的神经网络模型，可以高效地在RDB上训练，并且提供与GNNs相似的准确性。为了实现高效的训练，SPARE不同于GNNs，利用了RDB中数据的Regular structure，这 позвоes 一次性训练这些模型，同时利用Symmetries。我们的广泛的实验证明，SPARE可以明显提高训练和推理的速度，并且与多种基eline提供竞争的预测性能。

Tree Search in DAG Space with Model-based Reinforcement Learning for Causal Discovery

paper_url: http://arxiv.org/abs/2310.13576
repo_url: None
paper_authors: Victor-Alexandru Darvariu, Stephen Hailes, Mirco Musolesi
for: 本研究旨在提出一种基于模型搜索的 causal 发现方法，用于解决多个领域中的决策和生物经济学等领域中的 causal 结构难以确定问题。
methods: 本研究使用了搜索树来逐步构建导向的无环图，并提出了一种有效的算法来排除引入循环的边，从而实现更深入的离散搜索和采样在 DAGC 空间中。
results: 研究人员对两个实际任务进行了评估，并获得了较好的性能，超过了当前的模型自由方法和扩散搜索的性能，表明该方法在 combinatorial 方法中表现了明显的进步。

Abstract
Identifying causal structure is central to many fields ranging from strategic decision-making to biology and economics. In this work, we propose a model-based reinforcement learning method for causal discovery based on tree search, which builds directed acyclic graphs incrementally. We also formalize and prove the correctness of an efficient algorithm for excluding edges that would introduce cycles, which enables deeper discrete search and sampling in DAG space. We evaluate our approach on two real-world tasks, achieving substantially better performance than the state-of-the-art model-free method and greedy search, constituting a promising advancement for combinatorial methods.

摘要
找到 causal 结构是许多领域的中心问题，从策略决策到生物和经济学。在这项工作中，我们提出了基于模型的回归学习方法 для causal 发现，使用搜索树来逐步构建导向的无环图。我们还正式定义和证明了一种高效的算法来排除引入环的边，这使得精确的搜索和采样在 DAG 空间中可以进行更深入的探索。我们在两个实际任务上评估了我们的方法，与当前状态的模型自由方法和排序搜索具有显著更好的性能，代表了 combinatorial 方法的进步。

Boosting Generalization with Adaptive Style Techniques for Fingerprint Liveness Detection

paper_url: http://arxiv.org/abs/2310.13573
repo_url: None
paper_authors: Kexin Zhu, Bo Lin, Yang Qiu, Adam Yule, Yao Tang, Jiajun Liang
for: 本研究旨在提出一种高性能的指纹生物特征提取技术，并在 LivDet 2023 指纹表现挑战中获得第一名。
methods: 本研究使用了多种方法，包括样式转移，以提高精度和泛化能力。
results: 本研究在 LivDet 2023 生命检测在动作中的挑战中获得第二名，并在 LivDet 2023 指纹表现挑战中实现了状态的最佳性能。

Abstract
We introduce a high-performance fingerprint liveness feature extraction technique that secured first place in LivDet 2023 Fingerprint Representation Challenge. Additionally, we developed a practical fingerprint recognition system with 94.68% accuracy, earning second place in LivDet 2023 Liveness Detection in Action. By investigating various methods, particularly style transfer, we demonstrate improvements in accuracy and generalization when faced with limited training data. As a result, our approach achieved state-of-the-art performance in LivDet 2023 Challenges.

摘要
我们介绍了一种高性能指纹生活特征提取技术，在2023年生活特征挑战赛中获得第一名。此外，我们还开发了一个实用的指纹识别系统，准确率达94.68%，在2023年生活检测在动作赛中获得第二名。通过各种方法的研究，特别是样式传输，我们证明了对有限训练数据的改进。因此，我们的方法在2023年生活检测挑战中达到了国际前列水平。

Retrieval-Augmented Neural Response Generation Using Logical Reasoning and Relevance Scoring

paper_url: http://arxiv.org/abs/2310.13566
repo_url: None
paper_authors: Nicholas Thomas Walker, Stefan Ultes, Pierre Lison
for: 这个论文是为了提出一种基于知识图和逻辑推理的响应生成方法，以提高对话系统的响应质量。
methods: 该方法包括在对话状态和背景信息上构建知识图，然后使用 probabilistic logical programming 推理出逻辑推理出逻辑推理得到的信息，最后使用神经网络模型对对话中每个节点和边进行排名，并将最高排名的元素转化为自然语言形式，并与对话系统的响应相结合。
results: 实验结果表明，将逻辑推理与对话 relevance 排名结合，可以提高对话系统的响应的 фактиче性和流畅性。

Abstract
Constructing responses in task-oriented dialogue systems typically relies on information sources such the current dialogue state or external databases. This paper presents a novel approach to knowledge-grounded response generation that combines retrieval-augmented language models with logical reasoning. The approach revolves around a knowledge graph representing the current dialogue state and background information, and proceeds in three steps. The knowledge graph is first enriched with logically derived facts inferred using probabilistic logical programming. A neural model is then employed at each turn to score the conversational relevance of each node and edge of this extended graph. Finally, the elements with highest relevance scores are converted to a natural language form, and are integrated into the prompt for the neural conversational model employed to generate the system response. We investigate the benefits of the proposed approach on two datasets (KVRET and GraphWOZ) along with a human evaluation. Experimental results show that the combination of (probabilistic) logical reasoning with conversational relevance scoring does increase both the factuality and fluency of the responses.

摘要
通常情况下，任务导向对话系统的响应执行都是基于对话状态或外部数据库的信息。这篇论文提出了一种新的知识固定响应生成方法，该方法结合检索加强语言模型和逻辑推理。该方法的核心思想是使用对话状态和背景信息的知识图，并在三个步骤中进行处理。首先，将对话状态和背景信息转换为逻辑推理可以生成的逻辑知识图。然后，使用神经网络模型对每个转换后的图进行分类，以评估对话中每个节点和边的对话相关性。最后，选择分类结果最高的元素，并将其转换为自然语言形式，以整合到用于生成系统响应的神经网络模型中。我们在两个数据集（KVRET和GraphWOZ）上进行了实验和人工评估，结果表明，将逻辑推理与对话相关性分类结合使用，可以提高响应的事实性和流畅性。

Reward Shaping for Happier Autonomous Cyber Security Agents

paper_url: http://arxiv.org/abs/2310.13565
repo_url: None
paper_authors: Elizabeth Bates, Vasilios Mavroudis, Chris Hicks
for: 这种工作研究了计算机网络防御任务中深度强化学习模型的训练方法，特别是对奖励信号的影响。
methods: 本研究使用了奖励信号的修正技巧，以帮助代理人更 efficiently 训练和可能 converges to better performance。
results: 研究发现，深度强化学习算法对奖励信号的大小和相对大小有敏感性。此外，结合奖励和外部奖励的组合训练可以与奖励只训练相比，提高代理人的训练效率和性能。但是，内在的好奇心作为一种内部正面奖励机制可能不太适用于高级网络监测任务。

Abstract
As machine learning models become more capable, they have exhibited increased potential in solving complex tasks. One of the most promising directions uses deep reinforcement learning to train autonomous agents in computer network defense tasks. This work studies the impact of the reward signal that is provided to the agents when training for this task. Due to the nature of cybersecurity tasks, the reward signal is typically 1) in the form of penalties (e.g., when a compromise occurs), and 2) distributed sparsely across each defense episode. Such reward characteristics are atypical of classic reinforcement learning tasks where the agent is regularly rewarded for progress (cf. to getting occasionally penalized for failures). We investigate reward shaping techniques that could bridge this gap so as to enable agents to train more sample-efficiently and potentially converge to a better performance. We first show that deep reinforcement learning algorithms are sensitive to the magnitude of the penalties and their relative size. Then, we combine penalties with positive external rewards and study their effect compared to penalty-only training. Finally, we evaluate intrinsic curiosity as an internal positive reward mechanism and discuss why it might not be as advantageous for high-level network monitoring tasks.

摘要
随着机器学习模型的能力不断提高，它们在解决复杂任务方面表现出了潜在的投资潜力。一个最有前途的方向是使用深度反馈学习训练自动化代理人在计算机网络防御任务中。这项研究研究了在这个任务中训练代理人时的奖励信号的影响。由于网络安全任务的性质，奖励信号通常是1）在攻击发生时给出的罚款（例如），和2）每个防御集的分布式罚款。这种奖励特点与 класси型反馈学习任务不同，agent Regularly rewarded for progress（比如 occasional penalties for failures）。我们研究了修复奖励技巧，以帮助代理人更加样本效率地训练，并可能达到更好的性能。我们首先表明深度反馈学习算法对奖励的大小和相对大小的敏感性。然后，我们将罚款与正面的外部奖励相结合，并比较奖励只有训练的效果。最后，我们评估了内在的好奇性作为内部正面奖励机制，并讨论了为高级网络监测任务而言，这种机制可能不太有利。

Self-prompted Chain-of-Thought on Large Language Models for Open-domain Multi-hop Reasoning

paper_url: http://arxiv.org/abs/2310.13552
repo_url: https://github.com/noewangjy/sp-cot
paper_authors: Jinyuan Wang, Junlong Li, Hai Zhao
for: 这 paper 的目的是提出一种自动生成高质量的 chain-of-thought (CoT) 框架，以提高大语言模型 (LLM) 的多步逻辑能力。
methods: 这 paper 使用了一种自动生成高质量 CoT 数据集的框架，以及一种适应性抽取器来选择在上下文中的 CoT。另外，它还使用了一种自适应学习的句子提示方法来进行自我推导。
results: 对四个多步逻辑问答 benchmark 进行了广泛的实验，并显示了 SP-CoT 可以在大规模 (175B) LLM 上显著超越之前的 SOTA 方法，同时也可以在小规模 (13B) LLM 上近乎双倍提高零基eline性能。进一步的分析发现，SP-CoT 能够诱导 LLM 提供直接和简洁的中间逻辑步骤，在 MuSiQue-Ans 数据集上 recall 约 50% 的中间答案。

Abstract
In open-domain question-answering (ODQA), most existing questions require single-hop reasoning on commonsense. To further extend this task, we officially introduce open-domain multi-hop reasoning (ODMR) by answering multi-hop questions with explicit reasoning steps in open-domain setting. Recently, large language models (LLMs) have found significant utility in facilitating ODQA without external corpus. Furthermore, chain-of-thought (CoT) prompting boosts the reasoning capability of LLMs to a greater extent with manual or automated paradigms. However, existing automated methods lack of quality assurance, while manual approaches suffer from limited scalability and poor diversity, hindering the capabilities of LLMs. In this paper, we propose Self-prompted Chain-of-Thought (SP-CoT), an automated framework to mass-produce high quality CoTs of LLMs, by LLMs and for LLMs. SP-CoT introduces an automated generation pipeline of high quality ODMR datasets, an adaptive sampler for in-context CoT selection and self-prompted inference via in-context learning. Extensive experiments on four multi-hop question-answering benchmarks show that our proposed SP-CoT not only significantly surpasses the previous SOTA methods on large-scale (175B) LLMs, but also nearly doubles the zero-shot performance of small-scale (13B) LLMs. Further analysis reveals the remarkable capability of SP-CoT to elicit direct and concise intermediate reasoning steps by recalling $\sim$50\% of intermediate answers on MuSiQue-Ans dataset.

摘要
在开放预测问答（ODQA）任务中，大多数现有的问题需要单步逻辑。为了进一步推进这个任务，我们正式引入开放预测多步逻辑（ODMR），通过在开放预测 Setting中回答多步问题，并提供明确的逻辑步骤。现在，大型自然语言模型（LLM）在ODQA中发现了显著的用于促进逻辑能力的用途。此外，链条思维（CoT）提问技术可以大幅提高LLM的逻辑能力，但现有的自动化方法缺乏质量保证，而手动方法受限于批量缺乏多样性，这限制了LLM的能力。在这篇论文中，我们提出了自动生成高质量链条思维（SP-CoT）框架，用于自动生成高质量ODMR数据集，适应Context的CoT选择和自我提示的推理。我们的实验表明，我们提出的SP-CoT不仅在大规模（175B）LLM上明显超越了之前的SOTA方法，而且在小规模（13B）LLM上也近乎双倍了零基eline性能。进一步分析发现，SP-CoT可以诱导直接和简洁的中间逻辑步骤，在MuSiQue-Ans数据集上回快约50%的中间答案。

Towards Understanding Sycophancy in Language Models

paper_url: http://arxiv.org/abs/2310.13548
repo_url: https://github.com/meg-tong/sycophancy-eval
paper_authors: Mrinank Sharma, Meg Tong, Tomasz Korbak, David Duvenaud, Amanda Askell, Samuel R. Bowman, Newton Cheng, Esin Durmus, Zac Hatfield-Dodds, Scott R. Johnston, Shauna Kravec, Timothy Maxwell, Sam McCandlish, Kamal Ndousse, Oliver Rausch, Nicholas Schiefer, Da Yan, Miranda Zhang, Ethan Perez
for: investigate the prevalence of sycophancy in models whose finetuning procedure made use of human feedback
methods: five state-of-the-art AI assistants consistently exhibit sycophancy across four varied free-form text-generation tasks
results: human preferences drive this broadly observed behavior, and both humans and preference models (PMs) prefer convincingly-written sycophantic responses over correct ones a non-negligible fraction of the time

Abstract
Human feedback is commonly utilized to finetune AI assistants. But human feedback may also encourage model responses that match user beliefs over truthful ones, a behaviour known as sycophancy. We investigate the prevalence of sycophancy in models whose finetuning procedure made use of human feedback, and the potential role of human preference judgments in such behavior. We first demonstrate that five state-of-the-art AI assistants consistently exhibit sycophancy across four varied free-form text-generation tasks. To understand if human preferences drive this broadly observed behavior, we analyze existing human preference data. We find that when a response matches a user's views, it is more likely to be preferred. Moreover, both humans and preference models (PMs) prefer convincingly-written sycophantic responses over correct ones a non-negligible fraction of the time. Optimizing model outputs against PMs also sometimes sacrifices truthfulness in favor of sycophancy. Overall, our results indicate that sycophancy is a general behavior of state-of-the-art AI assistants, likely driven in part by human preference judgments favoring sycophantic responses.

摘要
人类反馈通常用于精细化AI助手。然而，人类反馈也可能导致模型响应与用户的信念相符，这种行为称为奉承。我们研究了使用人类反馈进行finetuning的模型中是否存在奉承行为，以及人类喜好判断是否对此行为产生影响。我们首先表明了五种当前顶尖AI助手在四种不同的自由文本生成任务上一致地表现出奉承行为。为了了解人类喜好是否驱动这种广泛观察到的行为，我们分析了现有的人类喜好数据。我们发现当响应与用户的观点相符时，它更可能被 preference。此外，人类和喜好模型（PM）也有一定的时间喜好奉承而不是正确的响应。优化模型输出对PM也有时会损失真实性，而是偏向奉承。总的来说，我们的结果表明，state-of-the-art AI助手中的奉承行为是一种普遍存在的现象，可能受到人类喜好判断的影响。

ScaleLong: Towards More Stable Training of Diffusion Model via Scaling Network Long Skip Connection

paper_url: http://arxiv.org/abs/2310.13545
repo_url: https://github.com/sail-sg/scalelong
paper_authors: Zhongzhan Huang, Pan Zhou, Shuicheng Yan, Liang Lin
for: 这个论文主要是为了解释UNet在diffusion模型中的不稳定性问题，以及对UNet的缩放长连接系数（LSC）的影响。
methods: 这个论文使用了 teoretic 方法来解释UNet在diffusion模型中的不稳定性问题，并提出了一种名为ScaleLong的缩放长连接系数 frameworks，以改进UNet的训练稳定性。
results: 实验结果表明，ScaleLong 方法可以更好地稳定训练UNet，并且可以在不同的diffusion模型和UNet/UViT backbones上提高训练速度约1.5倍。

Abstract
In diffusion models, UNet is the most popular network backbone, since its long skip connects (LSCs) to connect distant network blocks can aggregate long-distant information and alleviate vanishing gradient. Unfortunately, UNet often suffers from unstable training in diffusion models which can be alleviated by scaling its LSC coefficients smaller. However, theoretical understandings of the instability of UNet in diffusion models and also the performance improvement of LSC scaling remain absent yet. To solve this issue, we theoretically show that the coefficients of LSCs in UNet have big effects on the stableness of the forward and backward propagation and robustness of UNet. Specifically, the hidden feature and gradient of UNet at any layer can oscillate and their oscillation ranges are actually large which explains the instability of UNet training. Moreover, UNet is also provably sensitive to perturbed input, and predicts an output distant from the desired output, yielding oscillatory loss and thus oscillatory gradient. Besides, we also observe the theoretical benefits of the LSC coefficient scaling of UNet in the stableness of hidden features and gradient and also robustness. Finally, inspired by our theory, we propose an effective coefficient scaling framework ScaleLong that scales the coefficients of LSC in UNet and better improves the training stability of UNet. Experimental results on four famous datasets show that our methods are superior to stabilize training and yield about 1.5x training acceleration on different diffusion models with UNet or UViT backbones. Code: https://github.com/sail-sg/ScaleLong

摘要
在扩散模型中，UNet 是最受欢迎的网络脊梁，因为它的长距离连接（LSC）可以将远方网络块的信息聚合并使得渐减 gradient 问题得到解决。然而，UNet 在扩散模型中的训练过程经常存在不稳定性问题，这可以通过减小 LSC 系数来缓解。然而，关于 UNet 在扩散模型中不稳定性的理论理解和 LSC 系数缩放对性能的改进仍然缺乏研究。为解决这个问题，我们理论上表明了 LSC 系数在 UNet 中对前向和反向传播稳定性和 robustness 的大量影响。具体来说，UNet 的隐藏特征和梯度在任何层都可能会振荡，其振荡范围实际很大，这解释了 UNet 训练不稳定的原因。此外，我们还发现 UNet 对输入的小偏差会导致输出与期望输出远离，从而导致振荡的损失和振荡的梯度。除此之外，我们还发现 LSC 系数缩放对 UNet 的隐藏特征和梯度的稳定性和 robustness 有理论上的 beneficial 效果。最后，我们根据我们的理论，提出了一个有效的 coefficient scaling 框架 ScaleLong，可以更好地改进 UNet 的训练稳定性。实验结果表明，我们的方法在四个知名的 dataset 上的训练稳定性和训练速度比 Traditional UNet 和 UViT 更好，具体来说，我们的方法可以在不同的扩散模型上带来约 1.5 倍的训练加速。代码可以在 GitHub 上找到：https://github.com/sail-sg/ScaleLong

Positive-Unlabeled Node Classification with Structure-aware Graph Learning

paper_url: http://arxiv.org/abs/2310.13538
repo_url: None
paper_authors: Hansi Yang, Yongqi Zhang, Quanming Yao, James Kwok
for: 这篇论文是针对 graphs 上的 node classification 问题进行研究，特别是在 positive-unlabeled (PU) 的情况下。
methods: 本文提出了一个 distance-aware PU 损失函数，利用 graph 的 homophily 来提供更加精确的超级vision。此外，本文还提出了一个对 graph 结构进行调整的正规化器。
results: 实验结果显示，该方法在多种不同的 graph 数据集上表现出色，较先前的 state-of-the-art 方法有更好的性能。

Abstract
Node classification on graphs is an important research problem with many applications. Real-world graph data sets may not be balanced and accurate as assumed by most existing works. A challenging setting is positive-unlabeled (PU) node classification, where labeled nodes are restricted to positive nodes. It has diverse applications, e.g., pandemic prediction or network anomaly detection. Existing works on PU node classification overlook information in the graph structure, which can be critical. In this paper, we propose to better utilize graph structure for PU node classification. We first propose a distance-aware PU loss that uses homophily in graphs to introduce more accurate supervision. We also propose a regularizer to align the model with graph structure. Theoretical analysis shows that minimizing the proposed loss also leads to minimizing the expected loss with both positive and negative labels. Extensive empirical evaluation on diverse graph data sets demonstrates its superior performance over existing state-of-the-art methods.

摘要
节点分类在图上是一个重要的研究问题，具有多种应用。现实中的图数据集可能不具备准确性和平衡性，而大多数现有工作假设了这些假设。困难的设定是正面未标注（PU）节点分类，即标注节点只能是正面节点。它在多个应用中具有多样性，例如疫苗预测或网络异常检测。现有的PU节点分类方法忽略了图结构信息，这可能是关键。在这篇论文中，我们提议更好地利用图结构来进行PU节点分类。我们首先提出了距离意识PU损失函数，使用图中的同类关系（homophily）引入更加准确的监督。我们还提出了一种对齐模型与图结构的正则项。理论分析表明，尝试最小化我们提出的损失函数也将导致最小化预期损失函数中的正面和负面标签。我们在多种多样的图数据集进行了广泛的实验，证明了我们的方法在现有状态的方法之上表现出色。

Technical Report for ICCV 2023 Visual Continual Learning Challenge: Continuous Test-time Adaptation for Semantic Segmentation

paper_url: http://arxiv.org/abs/2310.13533
repo_url: None
paper_authors: Damian Sójka, Yuyang Liu, Dipam Goswami, Sebastian Cygert, Bartłomiej Twardowski, Joost van de Weijer
for:The paper is written for developing a test-time adaptation (TTA) method for semantic segmentation in video sequences, specifically for adapting to gradually changing domains caused by weather conditions and time of day.methods:The proposed TTA method uses a synthetic driving video dataset called SHIFT, and the source model is trained on images taken during daytime in clear weather. The method adapts the model to the changing domains by analyzing the distributional shift and developing a method that can generalize across different scenarios.results:The proposed method secured a 3rd place in a challenge and received an innovation award, outperforming solutions that used external pretrained models or specialized data augmentations. The method demonstrated the ability to adapt to changing data dynamics and generalize across different scenarios.Here’s the Chinese translation of the three points:for:本文是为开发一种测试时适应（TTA）方法，用于视频序列中的 semantics 分割任务，特别是针对逐渐变化的领域所导致的变化。methods:提议的 TTA 方法使用了一个 sintetic driving 视频数据集 called SHIFT，并使用源模型在晴朗的日子时光下训练。该方法通过分析分布Shift的方法，来适应不同的enario。results:提议的方法在一个挑战中获得第三名并获得了创新奖，比其他使用外部预训练模型或特殊数据增强的解决方案更好。该方法表明了适应不同的数据动态和场景的能力。

Abstract
The goal of the challenge is to develop a test-time adaptation (TTA) method, which could adapt the model to gradually changing domains in video sequences for semantic segmentation task. It is based on a synthetic driving video dataset - SHIFT. The source model is trained on images taken during daytime in clear weather. Domain changes at test-time are mainly caused by varying weather conditions and times of day. The TTA methods are evaluated in each image sequence (video) separately, meaning the model is reset to the source model state before the next sequence. Images come one by one and a prediction has to be made at the arrival of each frame. Each sequence is composed of 401 images and starts with the source domain, then gradually drifts to a different one (changing weather or time of day) until the middle of the sequence. In the second half of the sequence, the domain gradually shifts back to the source one. Ground truth data is available only for the validation split of the SHIFT dataset, in which there are only six sequences that start and end with the source domain. We conduct an analysis specifically on those sequences. Ground truth data for test split, on which the developed TTA methods are evaluated for leader board ranking, are not publicly available. The proposed solution secured a 3rd place in a challenge and received an innovation award. Contrary to the solutions that scored better, we did not use any external pretrained models or specialized data augmentations, to keep the solutions as general as possible. We have focused on analyzing the distributional shift and developing a method that could adapt to changing data dynamics and generalize across different scenarios.

摘要
挑战目标是开发一种测试时适应（TTA）方法，用于在视频序列中进行Semantic Segmentation任务中的模型适应。该方法基于一个合成的驾驶视频集（SHIFT），源模型在日间晴朗的图像上训练。测试时的领域变化主要由不同的天气和时间条件引起。TTA方法在每个图像序列（视频）上进行评估，因此模型在下一个序列之前被重置到源模型状态。图像来一个一个，需要在每帧预测。每个序列由401帧组成，开头是源领域，然后慢慢地变化到不同的领域（不同的天气或时间），中间一部分是源领域，然后再次变化回源领域。VALIDATION Split的真实数据可以进行特定的分析，但是TEST Split的真实数据，用于评估开发的TTA方法，不公开。我们的解决方案在挑战中获得了第三名，并且获得了创新奖。与其他更高分的解决方案不同，我们没有使用任何外部预训练模型或特殊的数据增强，以保持解决方案的通用性。我们主要关注分布式变化的分析，并开发了一种能够适应不同数据动态的方法，并且在不同的场景下具有普适性。

Design-Inclusive Language Models for Responsible Information Access

paper_url: http://arxiv.org/abs/2310.18333
repo_url: None
paper_authors: Veronica Chatrath, Oluwanifemi Bamgbose, Shaina Raza
for: 这个研究旨在发展一个名为“责任预测语言模型（ReDev）”的框架，以促进对所有用户而言的公正、安全和可靠的语言模型开发。
methods: 研究人员使用了一个专门设计来评估语言模型的测试组合，以确保模型的输出不含有偏见或害害的内容。
results: 研究发现，现有的四个州录语言模型（OPT、GPT-3.5、GPT-4和LLaMA-2）在不同的测试中表现不佳，表明需要在机器学习管线中考虑公正、安全和可靠性。

Abstract
As the use of large language models (LLMs) increases for everyday tasks, appropriate safeguards must be in place to ensure unbiased and safe output. Recent events highlight ethical concerns around conventionally trained LLMs, leading to overall unsafe user experiences. This motivates the need for responsible LLMs that are trained fairly, transparent to the public, and regularly monitored after deployment. In this work, we introduce the "Responsible Development of Language Models (ReDev)" framework to foster the development of fair, safe, and robust LLMs for all users. We also present a test suite of unique prompt types to assess LLMs on the aforementioned elements, ensuring all generated responses are non-harmful and free from biased content. Outputs from four state-of-the-art LLMs, OPT, GPT-3.5, GPT-4, and LLaMA-2, are evaluated by our test suite, highlighting the importance of considering fairness, safety, and robustness at every stage of the machine learning pipeline, including data curation, training, and post-deployment.

摘要
As the use of large language models (LLMs) increases for everyday tasks, appropriate safeguards must be in place to ensure unbiased and safe output. Recent events highlight ethical concerns around conventionally trained LLMs, leading to overall unsafe user experiences. This motivates the need for responsible LLMs that are trained fairly, transparent to the public, and regularly monitored after deployment. In this work, we introduce the "Responsible Development of Language Models (ReDev)" framework to foster the development of fair, safe, and robust LLMs for all users. We also present a test suite of unique prompt types to assess LLMs on the aforementioned elements, ensuring all generated responses are non-harmful and free from biased content. Outputs from four state-of-the-art LLMs, OPT, GPT-3.5, GPT-4, and LLaMA-2, are evaluated by our test suite, highlighting the importance of considering fairness, safety, and robustness at every stage of the machine learning pipeline, including data curation, training, and post-deployment.Here's the translation in Traditional Chinese:当大语言模型（LLMs）在日常任务中使用时，应该有适当的安全措施，以确保无偏袋和安全的输出。最近的事件显示了传统的 LLMS 对于不公正和不安全的用户体验问题，这引起了发展公正、透明和定期监控的 LLMS 的需求。在这个工作中，我们介绍了“责任的语言模型开发（ReDev）”框架，以促进公正、安全和可靠的 LLMS 的开发。我们还提出了一个对 LLMS 进行评估的测试组合，以确保所有生成的回应都是无害的和不受偏袋的。从四个现代 LLMS 中，OPT、GPT-3.5、GPT-4和LLaMA-2 的输出都被我们的测试组合评估，强调了在机器学习管线中考虑公平、安全和可靠性的重要性，包括数据混合、训练和部署阶段。

Variational measurement-based quantum computation for generative modeling

paper_url: http://arxiv.org/abs/2310.13524
repo_url: None
paper_authors: Arunava Majumder, Marius Krumm, Tina Radkohl, Hendrik Poulsen Nautrup, Sofiene Jerbi, Hans J. Briegel
for: 这篇论文旨在探讨量子测量计算（MBQC）如何使用随机性来进行计算，并explore MBQC 算法可以捕捉这种随机性作为计算资源。
methods: 该论文提出了一种基于 MBQC 的变量算法，该算法通过控制参数来调整随机性的度量，以提高生成模型的学习性能。
results: 数值研究表明，在某些生成模型任务中，该算法可以获得显著的提升，这些结果验证了 MBQC 中随机性的可能的优势，并鼓励进一步的MBQC-based算法的研究。

Abstract
Measurement-based quantum computation (MBQC) offers a fundamentally unique paradigm to design quantum algorithms. Indeed, due to the inherent randomness of quantum measurements, the natural operations in MBQC are not deterministic and unitary, but are rather augmented with probabilistic byproducts. Yet, the main algorithmic use of MBQC so far has been to completely counteract this probabilistic nature in order to simulate unitary computations expressed in the circuit model. In this work, we propose designing MBQC algorithms that embrace this inherent randomness and treat the random byproducts in MBQC as a resource for computation. As a natural application where randomness can be beneficial, we consider generative modeling, a task in machine learning centered around generating complex probability distributions. To address this task, we propose a variational MBQC algorithm equipped with control parameters that allow to directly adjust the degree of randomness to be admitted in the computation. Our numerical findings indicate that this additional randomness can lead to significant gains in learning performance in certain generative modeling tasks. These results highlight the potential advantages in exploiting the inherent randomness of MBQC and motivate further research into MBQC-based algorithms.

摘要
生成模型是机器学习中心的一个任务，旨在生成复杂的概率分布。我们提议一种基于MBQC的变量算法，具有控制参数，以直接控制计算中的随机性水平。我们的numerical findings表明，这种额外的随机性可以导致certain generative modeling tasks中的学习性能提高。这些结果表明了利用MBQC的随机性的优势，并促进了进一步的MBQC算法研究。

RaceLens: A Machine Intelligence-Based Application for Racing Photo Analysis

paper_url: http://arxiv.org/abs/2310.13515
repo_url: None
paper_authors: Andrei Boiarov, Dmitry Bleklov, Pavlo Bredikhin, Nikita Koritsky, Sergey Ulasen
for: 本研究开发了一个名为 RaceLens 的应用程序，用于精确分析赛车照片。
methods: 本研究使用了进步的深度学习和计算机视觉模型，并实现了访问车辆、车号码、车辆细节和车辆方向的检测和识别。
results: 研究发现，RaceLens 可以实现高度的精确性和效能，并且在 NASCAR 队伍的四个赛季中得到了成功的应用。研究还提供了实际应用的评估和车队的策略决策和性能指标的影响。

Abstract
This paper presents RaceLens, a novel application utilizing advanced deep learning and computer vision models for comprehensive analysis of racing photos. The developed models have demonstrated their efficiency in a wide array of tasks, including detecting racing cars, recognizing car numbers, detecting and quantifying car details, and recognizing car orientations. We discuss the process of collecting a robust dataset necessary for training our models, and describe an approach we have designed to augment and improve this dataset continually. Our method leverages a feedback loop for continuous model improvement, thus enhancing the performance and accuracy of RaceLens over time. A significant part of our study is dedicated to illustrating the practical application of RaceLens, focusing on its successful deployment by NASCAR teams over four seasons. We provide a comprehensive evaluation of our system's performance and its direct impact on the team's strategic decisions and performance metrics. The results underscore the transformative potential of machine intelligence in the competitive and dynamic world of car racing, setting a precedent for future applications.

摘要
Translation notes:* "advanced deep learning and computer vision models" is translated as "高级深度学习和计算机视觉模型" (gāojí shēngrán yǔ jìsuān zhìshuāng)* "comprehensive analysis" is translated as "全面分析" (quánmiàn fāng'àn)* "racing photos" is translated as "赛车照片" (sàichē zhezhe)* "car numbers" is translated as "车辆号码" (chēliàng hàoqī)* "car details" is translated as "车辆细节" (chēliàng xiǎojiě)* "car orientations" is translated as "车辆方向" (chēliàng fāngdòng)* "robust dataset" is translated as "可靠的数据集" (kějì de xiàngxīn)* "feedback loop" is translated as "反馈循环" (fǎnggǎn xiàngxīn)* "continuous model improvement" is translated as "连续模型改进" (liánxù módel gǎijì)* "practical application" is translated as "实用应用" (shíyòng yìngyì)* "successfully deployed" is translated as "成功应用" (chéngjì yìngyì)* "four seasons" is translated as "四季" (sì jì)

Explaining Interactions Between Text Spans

paper_url: http://arxiv.org/abs/2310.13506
repo_url: https://github.com/copenlu/spanex
paper_authors: Sagnik Ray Choudhury, Pepa Atanasova, Isabelle Augenstein
for: This paper aims to provide explanations for natural language understanding (NLU) tasks such as fact-checking (FC) and machine reading comprehension (MRC).
methods: The paper introduces a multi-annotator dataset of human span interaction explanations for NLU tasks, and investigates the decision-making processes of fine-tuned large language models in terms of the connections between spans in separate parts of the input.
results: The paper presents a novel community detection based unsupervised method to extract interaction explanations from a model’s inner workings.

Abstract
Reasoning over spans of tokens from different parts of the input is essential for natural language understanding (NLU) tasks such as fact-checking (FC), machine reading comprehension (MRC) or natural language inference (NLI). However, existing highlight-based explanations primarily focus on identifying individual important tokens or interactions only between adjacent tokens or tuples of tokens. Most notably, there is a lack of annotations capturing the human decision-making process w.r.t. the necessary interactions for informed decision-making in such tasks. To bridge this gap, we introduce SpanEx, a multi-annotator dataset of human span interaction explanations for two NLU tasks: NLI and FC. We then investigate the decision-making processes of multiple fine-tuned large language models in terms of the employed connections between spans in separate parts of the input and compare them to the human reasoning processes. Finally, we present a novel community detection based unsupervised method to extract such interaction explanations from a model's inner workings.

摘要
<>将文本翻译成简化中文。>自然语言理解（NLU）任务中，理智推理 sobre 各个 Token 的范围是非常重要的，例如实验室检查（FC）、机器阅读理解（MRC）或自然语言推理（NLI）。然而，现有的高亮显示型解释主要集中于特定的重要 Token 或邻近 Token 或 Tuple 的交互。最引人注意的是，缺乏记录人类决策过程中对 informed 决策所需的交互。为了bridging这个差距，我们引入 SpanEx，一个多个注释者数据集，其中包含人类对 NLI 和 FC 任务中的 span 交互解释。然后，我们调查多个精细调节的大语言模型在输入中不同部分的 span 之间的连接方式，并与人类的决策过程进行比较。最后，我们提出一种基于社群探测的无监督方法，用于从模型内部提取这些交互解释。

Robust Training for Conversational Question Answering Models with Reinforced Reformulation Generation

paper_url: http://arxiv.org/abs/2310.13505
repo_url: https://github.com/magkai/REIGN
paper_authors: Magdalena Kaiser, Rishiraj Saha Roy, Gerhard Weikum
for: 提高 conversational question answering (ConvQA) 模型在知识图 (KG) 上的表现，并且让模型能够更好地适应不同的表达形式。
methods: 提出了一种 frameworks named REIGN，通过系统地生成问题的 reformulations，提高模型对表达形式的弹性性，并使用深度强化学习指导模型提高答案质量。
results: 研究发现，通过使用 reformulations 进行强化学习，ConvQA 模型能够显著超越使用标准训练方法的模型，并且能够在不同的测试集上表现良好。

Abstract
Models for conversational question answering (ConvQA) over knowledge graphs (KGs) are usually trained and tested on benchmarks of gold QA pairs. This implies that training is limited to surface forms seen in the respective datasets, and evaluation is on a small set of held-out questions. Through our proposed framework REIGN, we take several steps to remedy this restricted learning setup. First, we systematically generate reformulations of training questions to increase robustness of models to surface form variations. This is a particularly challenging problem, given the incomplete nature of such questions. Second, we guide ConvQA models towards higher performance by feeding it only those reformulations that help improve their answering quality, using deep reinforcement learning. Third, we demonstrate the viability of training major model components on one benchmark and applying them zero-shot to another. Finally, for a rigorous evaluation of robustness for trained models, we use and release large numbers of diverse reformulations generated by prompting GPT for benchmark test sets (resulting in 20x increase in sizes). Our findings show that ConvQA models with robust training via reformulations, significantly outperform those with standard training from gold QA pairs only.

摘要
模型 для对话式问答（ConvQA）在知识图（KG）上通常由金标准问答对（gold QA pairs）进行训练和测试。这意味着训练是基于表面形式所限制的，而测试是基于一小组封闭的问答。我们的提出的框架REIGN通过以下几个步骤来缓解这种受限的学习环境：首先，我们系统地生成了训练问题的重新形式，以提高模型对表面形式的鲁棒性。这是一个特别困难的问题，因为训练问题是不完整的。其次，我们使用深度奖励学习引导ConvQA模型，只有那些能够提高答案质量的重新形式。第三，我们证明了训练主要模型组件在一个benchmark上训练后，可以零式应用到另一个benchmark上。最后，为了进行训练后模型的严格评估，我们使用和发布大量多样化的重新形式，由GPT提问 benchmark测试集（ resulting in 20x increase in size）。我们的发现表明，通过对模型进行robust训练 via 重新形式，ConvQA模型可以明显超越标准训练从金标准问答对只进行训练的模型。

Analogical Proportions and Creativity: A Preliminary Study

paper_url: http://arxiv.org/abs/2310.13500
repo_url: None
paper_authors: Stergos Afantenos, Henri Prade, Leonardo Cortez Bernardes
For: The paper is written to explore the use of analogical proportions for creating novel animal descriptions and retrieving rare animals.* Methods: The paper uses word embeddings and Boolean features to propose novel animals based on analogical proportions.* Results: The paper shows that word embeddings obtain better results in creating novel animals based on analogical proportions compared to Boolean features.Here’s the information in Simplified Chinese text:
for: 这篇论文是为了探讨使用对比比例来创造新的动物描述和找到罕见的动物。
methods: 这篇论文使用word embedding和布尔特Feature来提出新的动物基于对比比例。
results: 这篇论文显示word embedding在基于对比比例创造新动物方面比布尔特Feature更好的效果。

Abstract
Analogical proportions are statements of the form "$a$ is to $b$ as $c$ is to $d$", which expresses that the comparisons of the elements in pair $(a, b)$ and in pair $(c, d)$ yield similar results. Analogical proportions are creative in the sense that given 3 distinct items, the representation of a 4th item $d$, distinct from the previous items, which forms an analogical proportion with them can be calculated, provided certain conditions are met. After providing an introduction to analogical proportions and their properties, the paper reports the results of an experiment made with a database of animal descriptions and their class, where we try to "create" new animals from existing ones, retrieving rare animals such as platypus. We perform a series of experiments using word embeddings as well as Boolean features in order to propose novel animals based on analogical proportions, showing that word embeddings obtain better results.

摘要
<>将文本翻译成简化中文。<>对比比例是形式如 "$a$ 是 $b$ 的为 $c$ 是 $d$"，表达了对 $(a, b)$ 对和 $(c, d)$ 对的比较结果相似。对比比例是创造性的，即给定三个不同的 Item，可以计算出一个第四个 Item $d$，与之前的 Item 相似，并且满足certain conditions。文章 introduce analogical proportions and their properties, and then reports the results of an experiment using a database of animal descriptions and their class, where we try to "create" new animals from existing ones, retrieving rare animals such as platypus. We perform a series of experiments using word embeddings as well as Boolean features in order to propose novel animals based on analogical proportions, showing that word embeddings obtain better results.Here's the translation in Traditional Chinese:<>将文本翻译成繁体中文。<>对比比例是形式如 "$a$ 是 $b$ 的为 $c$ 是 $d$"，表达了对 $(a, b)$ 对和 $(c, d)$ 对的比较结果相似。对比比例是创造性的，即给定三个不同的 Item，可以计算出一个第四个 Item $d$，与之前的 Item 相似，并且满足certain conditions。文章 introduce analogical proportions and their properties, and then reports the results of an experiment using a database of animal descriptions and their class, where we try to "create" new animals from existing ones, retrieving rare animals such as platypus. We perform a series of experiments using word embeddings as well as Boolean features in order to propose novel animals based on analogical proportions, showing that word embeddings obtain better results.

Mind the instructions: a holistic evaluation of consistency and interactions in prompt-based learning

paper_url: http://arxiv.org/abs/2310.13486
repo_url: None
paper_authors: Lucas Weber, Elia Bruni, Dieuwke Hupkes
for: 本研究的目的是找到现有语言模型的最佳适应方式，以便在当前NLP中进行任务适应。
methods: 本研究使用了启发式学习（Prompting）方法，并进行了系统atic的评估，以找出不同因素对预测结果的影响。
results: 研究发现，某些因素会导致预测结果不稳定和不一致，而其他因素则可以无恶不害地使用。这些结论可以帮助选择适合的适应方式，以提高任务适应的效果。

Abstract
Finding the best way of adapting pre-trained language models to a task is a big challenge in current NLP. Just like the previous generation of task-tuned models (TT), models that are adapted to tasks via in-context-learning (ICL) are robust in some setups but not in others. Here, we present a detailed analysis of which design choices cause instabilities and inconsistencies in LLM predictions. First, we show how spurious correlations between input distributions and labels -- a known issue in TT models -- form only a minor problem for prompted models. Then, we engage in a systematic, holistic evaluation of different factors that have been found to influence predictions in a prompting setup. We test all possible combinations of a range of factors on both vanilla and instruction-tuned (IT) LLMs of different scale and statistically analyse the results to show which factors are the most influential, interactive or stable. Our results show which factors can be used without precautions and which should be avoided or handled with care in most settings.

摘要
现在的自然语言处理中，找到最佳适应预训练语言模型任务的方式是一大挑战。与前一代任务调整模型（TT）类似，通过在 контекст学习（ICL）方式进行适应，模型在某些设置下 Displaying Robustness，但在其他设置下却存在不稳定和不一致的问题。在这里，我们提供了适应模型预测中的不稳定和不一致的分析。首先，我们表明了预训练模型输入分布和标签之间的假 correlations（一个已知的问题）在提问模型中只占了一小部分。然后，我们进行了系统性的全面性评估不同因素对预测的影响。我们测试了所有可能的组合，包括不同的规模和不同的预测模型，并使用统计分析来显示这些因素对预测的影响程度，以及它们之间的互动和稳定性。我们的结果表明了哪些因素可以无需预caution使用，哪些因素应该避免或处理得更加小心。

Application of deep learning for livestock behaviour recognition: A systematic literature review

paper_url: http://arxiv.org/abs/2310.13483
repo_url: None
paper_authors: Ali Rohan, Muhammad Saad Rafaq, Md. Junayed Hasan, Furqan Asghar, Ali Kashif Bashir, Tania Dottorini
for: 这个论文主要是为了研究使用深度学习技术来识别畜牧动物的行为。
methods: 这个论文使用了多种深度学习模型和网络，包括CNN、Faster R-CNN、YOLOv5和YOLOv4等模型，以及VGG16、CSPDarknet53、GoogLeNet、ResNet101和ResNet50等网络。
results: 这个论文的研究表明，深度学习成功地解决了13种行为识别问题，包括44种不同的行为类型。

Abstract
Livestock health and welfare monitoring has traditionally been a labor-intensive task performed manually. Recent advances have led to the adoption of AI and computer vision techniques, particularly deep learning models, as decision-making tools within the livestock industry. These models have been employed for tasks like animal identification, tracking, body part recognition, and species classification. In the past decade, there has been a growing interest in using these models to explore the connection between livestock behaviour and health issues. While previous review studies have been rather generic, there is currently no review study specifically focusing on DL for livestock behaviour recognition. Hence, this systematic literature review (SLR) was conducted. The SLR involved an initial search across electronic databases, resulting in 1101 publications. After applying defined selection criteria, 126 publications were shortlisted. These publications were further filtered based on quality criteria, resulting in the selection of 44 high-quality primary studies. These studies were analysed to address the research questions. The results showed that DL successfully addressed 13 behaviour recognition problems encompassing 44 different behaviour classes. A variety of DL models and networks were employed, with CNN, Faster R-CNN, YOLOv5, and YOLOv4 being among the most common models, and VGG16, CSPDarknet53, GoogLeNet, ResNet101, and ResNet50 being popular networks. Performance evaluation involved ten different matrices, with precision and accuracy being the most frequently used. Primary studies identified challenges, including occlusion, adhesion, data imbalance, and the complexities of the livestock environment. The SLR study also discussed potential solutions and research directions to facilitate the development of autonomous livestock behaviour recognition systems.

摘要
livestock health和福祉监测曾经是一项劳动密集的任务，通常是手动完成的。近年来，人工智能和计算机视觉技术的应用，特别是深度学习模型，在畜牧业中被用作决策支持工具。这些模型已被用于动物识别、跟踪、身体部分识别和种类分类等任务。过去的十年中，有一个增长的兴趣在使用这些模型探索畜牧动物行为和健康问题之间的联系。而前一个综述研究已经是非常通用的，但是目前没有专门关于深度学习的畜牧动物行为识别综述。因此，这项系统性文献综述（SLR）被进行了。SLR involve了电子数据库的初步搜索，共计1101篇论文。经过应用定义的选择 criterion，短listed 126篇论文。这些论文进一步根据质量标准进行筛选，选择了44篇高品质的原始研究。这些研究被分析以解答研究 вопро题。结果显示，深度学习成功解决了13种行为识别问题，涵盖44种不同的行为类型。多种深度学习模型和网络被使用，其中CNN、Faster R-CNN、YOLOv5和YOLOv4是最常用的模型，而VGG16、CSPDarknet53、GoogLeNet、ResNet101和ResNet50是最受欢迎的网络。性能评价使用了十种不同的矩阵，准确率和准确率是最常用的两个矩阵。原始研究认为， occlusion、黏合、数据不均衡和畜牧环境的复杂性是挑战。SLR 研究还讨论了可能的解决方案和研究方向，以便开发自主的畜牧动物行为识别系统。

Ask Language Model to Clean Your Noisy Translation Data

paper_url: http://arxiv.org/abs/2310.13469
repo_url: None
paper_authors: Quinten Bolding, Baohao Liao, Brandon James Denis, Jun Luo, Christof Monz
for: 该研究旨在提高MTNT数据集的用途，以便更好地评估Neural Machine Translation（NMT）模型对听频输入的敏感性。
methods: 该研究使用大语言模型（LLM）进行听频提取和重塑，从而提高MTNT数据集的清晰度。
results: 研究表明，LLM可以有效地去除听频，同时保留原始句子的 semantics。此外，LLM还能够重塑slang、argot和低语言。这些数据集被称为C-MTNT，并且在评估NMT模型的Robustness方面表现出色。

Abstract
Transformer models have demonstrated remarkable performance in neural machine translation (NMT). However, their vulnerability to noisy input poses a significant challenge in practical implementation, where generating clean output from noisy input is crucial. The MTNT dataset is widely used as a benchmark for evaluating the robustness of NMT models against noisy input. Nevertheless, its utility is limited due to the presence of noise in both the source and target sentences. To address this limitation, we focus on cleaning the noise from the target sentences in MTNT, making it more suitable as a benchmark for noise evaluation. Leveraging the capabilities of large language models (LLMs), we observe their impressive abilities in noise removal. For example, they can remove emojis while considering their semantic meaning. Additionally, we show that LLM can effectively rephrase slang, jargon, and profanities. The resulting datasets, called C-MTNT, exhibit significantly less noise in the target sentences while preserving the semantic integrity of the original sentences. Our human and GPT-4 evaluations also lead to a consistent conclusion that LLM performs well on this task. Lastly, experiments on C-MTNT showcased its effectiveness in evaluating the robustness of NMT models, highlighting the potential of advanced language models for data cleaning and emphasizing C-MTNT as a valuable resource.

摘要
启示器模型在神经机器翻译（NMT）中表现出了惊人的能力。然而，它们对听风输入的敏感性带来了实际应用中的挑战，因为生成干净输出从听风输入是关键。MTNT数据集广泛用于评估NMT模型对听风输入的Robustness。然而，该数据集的利用受到了源和目标句子都含有噪音的限制。为解决这一问题，我们将目标句子中的噪音除掉，使MTNT数据集更适合用于噪音评估。利用大语言模型（LLM）的能力，我们发现它们可以从目标句子中除掉表情符号，同时考虑其 semantics 意义。此外，我们发现 LLM 可以有效地重塑 slang、短语和恶语。所得到的数据集，称为 C-MTNT，具有明显 menos 的噪音，同时保持原始句子的 semantics 完整性。我们的人工评估和 GPT-4 评估也导致了一致的结论：LLM 在这种任务上表现良好。最后，我们对 C-MTNT 进行了实验，并证明了它在评估 NMT 模型的 Robustness 方面的效果，强调了高级语言模型在数据清洁方面的潜在力量，并将 C-MTNT 作为一个价值的资源。

WordArt Designer: User-Driven Artistic Typography Synthesis using Large Language Models

paper_url: http://arxiv.org/abs/2310.18332
repo_url: None
paper_authors: Jun-Yan He, Zhi-Qi Cheng, Chenyang Li, Jingdong Sun, Wangmeng Xiang, Xianhui Lin, Xiaoyang Kang, Zengke Jin, Yusen Hu, Bin Luo, Yifeng Geng, Xuansong Xie, Jingren Zhou
for: 这个论文旨在探讨一种基于大语言模型（LLM）的用户驱动框架，用于创造艺术 typography。
methods: 该系统包括四个关键模块：”LLM Engine”、”SemTypo”、”StyTypo”和”TexTypo”模块。”LLM Engine” 通过 LLM（如 GPT-3.5-turbo）解释用户输入并生成可行的提示，将抽象概念转化成可见的设计。”SemTypo module” 利用语义概念优化字体设计，寻求平衡艺术变换和可读性。”StyTypo module” 基于语义布局生成细腻的图像。”TexTypo module” 进一步提高设计的美学效果通过 текстуر渲染。
results: 该系统能够生成创新的文本字体，并且可以在 ModelScope 上体验其能力：https://www.modelscope.cn/studios/WordArt/WordArt.

Abstract
This paper introduces "WordArt Designer", a user-driven framework for artistic typography synthesis, relying on Large Language Models (LLM). The system incorporates four key modules: the "LLM Engine", "SemTypo", "StyTypo", and "TexTypo" modules. 1) The "LLM Engine", empowered by LLM (e.g., GPT-3.5-turbo), interprets user inputs and generates actionable prompts for the other modules, thereby transforming abstract concepts into tangible designs. 2) The "SemTypo module" optimizes font designs using semantic concepts, striking a balance between artistic transformation and readability. 3) Building on the semantic layout provided by the "SemTypo module", the "StyTypo module" creates smooth, refined images. 4) The "TexTypo module" further enhances the design's aesthetics through texture rendering, enabling the generation of inventive textured fonts. Notably, "WordArt Designer" highlights the fusion of generative AI with artistic typography. Experience its capabilities on ModelScope: https://www.modelscope.cn/studios/WordArt/WordArt.

摘要

“LLM Engine”: empowered by LLM (e.g., GPT-3.5-turbo), it interprets user inputs and generates actionable prompts for the other modules, transforming abstract concepts into tangible designs.2. “SemTypo module”: optimizes font designs using semantic concepts, balancing artistic transformation and readability.3. “StyTypo module”: creates smooth, refined images based on the semantic layout provided by the “SemTypo module”.4. “TexTypo module”: enhances the design’s aesthetics through texture rendering, enabling the generation of inventive textured fonts.”WordArt Designer” showcases the fusion of generative AI with artistic typography. Experience its capabilities on ModelScope: https://www.modelscope.cn/studios/WordArt/WordArt.

Multiscale Superpixel Structured Difference Graph Convolutional Network for VL Representation

paper_url: http://arxiv.org/abs/2310.13447
repo_url: None
paper_authors: Siyu Zhang, Yeming Chen, Sirui Cheng, Yaoru Sun, Jun Yang, Lizhi Bai
for: 本研究旨在提高多Modal Semantic Representation，尤其是vision和language之间的同步。
methods: 本研究使用自适应学习的pre-trained模型，并提出了superpixel作为可学习图像数据的全面表示，以及Multiscale Difference Graph Convolutional Network (MDGCN)来捕捉多尺度特征。
results: 对多个下游任务学习，本研究可以与其他状态调研方法竞争，并且可以更好地捕捉图像的空间semantic关系。

Abstract
Within the multimodal field, the key to integrating vision and language lies in establishing a good alignment strategy. Recently, benefiting from the success of self-supervised learning, significant progress has been made in multimodal semantic representation based on pre-trained models for vision and language. However, there is still room for improvement in visual semantic representation. The lack of spatial semantic coherence and vulnerability to noise makes it challenging for current pixel or patch-based methods to accurately extract complex scene boundaries. To this end, this paper develops superpixel as a comprehensive compact representation of learnable image data, which effectively reduces the number of visual primitives for subsequent processing by clustering perceptually similar pixels. To mine more precise topological relations, we propose a Multiscale Difference Graph Convolutional Network (MDGCN). It parses the entire image as a fine-to-coarse hierarchical structure of constituent visual patterns, and captures multiscale features by progressively merging adjacent superpixels as graph nodes. Moreover, we predict the differences between adjacent nodes through the graph structure, facilitating key information aggregation of graph nodes to reason actual semantic relations. Afterward, we design a multi-level fusion rule in a bottom-up manner to avoid understanding deviation by learning complementary spatial information at different regional scales. Our proposed method can be well applied to multiple downstream task learning. Extensive experiments demonstrate that our method is competitive with other state-of-the-art methods in visual reasoning. Our code will be released upon publication.

摘要
在多模态场景中，与语言集成的关键在于设置良好的对齐策略。在最近，基于自动学习的成功，我们在多模态semantic表示方面进行了显著的进步，但是视觉semantic表示仍然存在改进的空间。现有的像素或patch基本方法难以准确地提取复杂的场景边界，主要是因为缺乏空间semantic准确性和噪声抑制的能力。为了解决这些问题，本文提出了superpixel作为可学习图像数据的总体紧凑表示，可以有效减少后续处理的视觉元素数量。此外，我们还提出了多尺度差分图 convolutional neural network（MDGCN）来捕捉多尺度特征。MDGCN通过将整个图像分解为一个细到广的 hierarchical结构，并在不同尺度上 merge 相似的superpixel作为图节点，来捕捉多尺度特征。此外，我们还预测了图节点之间的差异，以便通过图结构来汇总实际semantic关系的信息。最后，我们设计了一种多级融合规则，以避免不同地域尺度上的理解偏差。我们的提出的方法可以应用于多个下游任务学习。广泛的实验表明，我们的方法与其他当前领先的方法在视理解方面具有竞争力。我们的代码将在发表时公布。

Self-Consistency of Large Language Models under Ambiguity

paper_url: http://arxiv.org/abs/2310.13439
repo_url: https://github.com/jacobpfau/introspective-self-consistency
paper_authors: Henning Bartsch, Ole Jorgensen, Domenic Rosati, Jason Hoelscher-Obermaier, Jacob Pfau
for: 这个论文的目的是评估自身一致性在不充分约束下的情况，以及模型在不同上下文中的一致性是否能够保持。
methods: 这个论文使用了OpenAI模型集进行了一系列行为实验，测试了模型在一个抽象整数序列完成任务上的表现，并发现模型的一致性范围为67%-82%，高于随机预测的水平，并且随着模型能力的提高而增加。
results: 这个论文发现，即使模型在不同的Robustness Check中保持了自身一致性，但模型并不总能够正确地评估自身一致性。此外，模型通常会将一些不一致的答案分配给一定的概率，这提供了内部计算多个可能答案的证据。

Abstract
Large language models (LLMs) that do not give consistent answers across contexts are problematic when used for tasks with expectations of consistency, e.g., question-answering, explanations, etc. Our work presents an evaluation benchmark for self-consistency in cases of under-specification where two or more answers can be correct. We conduct a series of behavioral experiments on the OpenAI model suite using an ambiguous integer sequence completion task. We find that average consistency ranges from 67\% to 82\%, far higher than would be predicted if a model's consistency was random, and increases as model capability improves. Furthermore, we show that models tend to maintain self-consistency across a series of robustness checks, including prompting speaker changes and sequence length changes. These results suggest that self-consistency arises as an emergent capability without specifically training for it. Despite this, we find that models are uncalibrated when judging their own consistency, with models displaying both over- and under-confidence. We also propose a nonparametric test for determining from token output distribution whether a model assigns non-trivial probability to alternative answers. Using this test, we find that despite increases in self-consistency, models usually place significant weight on alternative, inconsistent answers. This distribution of probability mass provides evidence that even highly self-consistent models internally compute multiple possible responses.

摘要
大型语言模型（LLM）无法在不同的上下文中给出一致的答案是问题当用于需要一致性的任务，例如问答、解释等。我们的工作提供了一个评估标准 benchmark for自我一致性在不足规定的情况下，其中有两个或更多的答案都可以是正确的。我们使用OpenAI模型组合进行了一系列的行为实验，包括一个模糊的数字序列完成任务。我们发现，自我一致性的平均值在67%至82%之间，远高于随机的预期，并随模型能力的提高而增加。此外，我们发现模型在多个坚固性检查中保持自我一致性，包括说话人变化和序列长度变化。这些结果表明自我一致性是一个自然而然的能力，不需要具体地训练。尽管如此，我们发现模型对自己的一致性仍然无法准确评估，模型会表现出过度和不足的自信。我们还提出了一个非Parametric测试，用于决定模型从字串输出分布是否将非零概率分配给alternative答案。使用这个测试，我们发现，即使自我一致性提高，模型通常将重要的概率分配给不一致的答案。这个分布的概率质量提供了证据，表明even highly self-consistent models internally compute multiple possible responses。

Random Matrix Analysis to Balance between Supervised and Unsupervised Learning under the Low Density Separation Assumption

paper_url: http://arxiv.org/abs/2310.13434
repo_url: None
paper_authors: Vasilii Feofanov, Malik Tiomoko, Aladin Virmaux
for: 本研究旨在提出一个理论框架，用于分类下降精度分布假设在高维度情况下。
methods: 我们引入QLDS，一种线性分类模型，在这里实现低概率分离假设通过 quadratic margin maximization。我们提供了可解释的算法，并证明其在某些特定情况下与最小二乘支持向量机、完全无监督分类和半监督图像分类方法相同。
results: 我们使用最新的随机矩阵理论来正式评估分类错误率的数学性评估。此外，我们还提出了一种hyperparameter选择策略，可以找到最佳的超参数，并进行了一系列的示例和实验，证明QLDS在计算效率和分类质量上都有优势，而且可以超越交叉验证。

Abstract
We propose a theoretical framework to analyze semi-supervised classification under the low density separation assumption in a high-dimensional regime. In particular, we introduce QLDS, a linear classification model, where the low density separation assumption is implemented via quadratic margin maximization. The algorithm has an explicit solution with rich theoretical properties, and we show that particular cases of our algorithm are the least-square support vector machine in the supervised case, the spectral clustering in the fully unsupervised regime, and a class of semi-supervised graph-based approaches. As such, QLDS establishes a smooth bridge between these supervised and unsupervised learning methods. Using recent advances in the random matrix theory, we formally derive a theoretical evaluation of the classification error in the asymptotic regime. As an application, we derive a hyperparameter selection policy that finds the best balance between the supervised and the unsupervised terms of our learning criterion. Finally, we provide extensive illustrations of our framework, as well as an experimental study on several benchmarks to demonstrate that QLDS, while being computationally more efficient, improves over cross-validation for hyperparameter selection, indicating a high promise of the usage of random matrix theory for semi-supervised model selection.

摘要
我们提出了一个理论框架，用于分析半监督分类在高维度 Régime 下的 semi-supervised classification。特别是，我们引入QLDS，一种线性分类模型，其中占据低概率分离假设通过quadratic margin maximization进行实现。该算法有显式解，并且我们显示了它的理论性质，并且我们显示了它与其他监督学习方法之间的缓解。使用最近的随机矩阵理论，我们正式地评估分类错误率的极限regime。为此，我们提出了一种hyperparameter选择策略，可以找到我们的学习 criterion 中的最佳平衡。最后，我们提供了广泛的案例研究和一些 benchmark 的实验，以示出QLDS 可以更高效地进行计算，并且可以在 hyperparameter 选择方面超过cross-validation。这表明了随机矩阵理论在 semi-supervised 模型选择方面的高 Promise。

FLTracer: Accurate Poisoning Attack Provenance in Federated Learning

paper_url: http://arxiv.org/abs/2310.13424
repo_url: https://github.com/eyr3/fltracer
paper_authors: Xinyu Zhang, Qingyu Liu, Zhongjie Ba, Yuan Hong, Tianhang Zheng, Feng Lin, Li Lu, Kui Ren
for: 本研究旨在探讨 Federated Learning (FL) 中的攻击检测方法，以及如何准确地检测不同类型的攻击和恶意更新。
methods: 本研究提出了一种基于 Kalman 约束的跨局回合检测方法，可以准确地检测不同类型的攻击和恶意更新，并且可以适应非独立同分布 (non-IID) 的数据设置。
results: 对比存在的检测方法，本研究的方法可以准确地检测攻击和恶意更新，并且在非独立同分布 (non-IID) 的数据设置下表现出 excel 的性能。

Abstract
Federated Learning (FL) is a promising distributed learning approach that enables multiple clients to collaboratively train a shared global model. However, recent studies show that FL is vulnerable to various poisoning attacks, which can degrade the performance of global models or introduce backdoors into them. In this paper, we first conduct a comprehensive study on prior FL attacks and detection methods. The results show that all existing detection methods are only effective against limited and specific attacks. Most detection methods suffer from high false positives, which lead to significant performance degradation, especially in not independent and identically distributed (non-IID) settings. To address these issues, we propose FLTracer, the first FL attack provenance framework to accurately detect various attacks and trace the attack time, objective, type, and poisoned location of updates. Different from existing methodologies that rely solely on cross-client anomaly detection, we propose a Kalman filter-based cross-round detection to identify adversaries by seeking the behavior changes before and after the attack. Thus, this makes it resilient to data heterogeneity and is effective even in non-IID settings. To further improve the accuracy of our detection method, we employ four novel features and capture their anomalies with the joint decisions. Extensive evaluations show that FLTracer achieves an average true positive rate of over $96.88\%$ at an average false positive rate of less than $2.67\%$, significantly outperforming SOTA detection methods. \footnote{Code is available at \url{https://github.com/Eyr3/FLTracer}.}

摘要
federated 学习（FL）是一种有前途的分布式学习方法，它允许多个客户端共同训练一个共享的全球模型。然而，最近的研究表明，FL 是易受到多种攻击的，这些攻击可以降低全球模型的性能或者引入后门。在这篇论文中，我们首先进行了FL攻击和检测方法的全面研究。结果显示，所有现有的检测方法都只有对特定和有限的攻击有效。大多数检测方法会导致高的假阳性，这会导致非常显著的性能下降，特别是在非独立和相同分布（non-IID）的设置下。为解决这些问题，我们提议FLTracer，FL攻击源架构，它可以准确地检测多种攻击并跟踪攻击时间、目标、类型和毒化的更新。与现有方法不同，我们的方法不仅仅基于客户端间异常检测，而是使用加尔曼滤波器基于跨局检测，以识别对手的行为变化前后。这使得我们的方法能够抗衡数据多样性，并在非独立和相同分布的设置下具有高效性。为了进一步提高我们的检测方法的准确度，我们采用了四个新的特征，并通过联合决策捕捉它们的异常。广泛的评估结果显示，FLTracer 的真正正确率超过 $96.88\%$，假阳性率低于 $2.67\%$，与当前最佳检测方法相比，显著超出。

paper_url: http://arxiv.org/abs/2310.18331
repo_url: None
paper_authors: Jiarun Liu, Wentao Hu, Chunhong Zhang
for: 这个论文主要targets web navigation tasks, using large language models (LLMs) to interpret objectives and interact with web pages.
methods: 该论文提出了一种标准化的提示模板，可以增强任务上下文表示，从而提高 LLMS 在 HTML 基于的网页导航性能。
results: 我们通过提示学习和指令精度调整基于开源 Llama-2 和 API 可用的 GPT 模型，发现 GPT-4 等大型模型在网页导航任务中表现出色，并且 HTML 片段长度和历史轨迹有显著影响性能，而先前的步骤指令不如实时环境反馈有更好的效果。

Abstract
Large Language Models (LLMs) have emerged as promising agents for web navigation tasks, interpreting objectives and interacting with web pages. However, the efficiency of spliced prompts for such tasks remains underexplored. We introduces AllTogether, a standardized prompt template that enhances task context representation, thereby improving LLMs' performance in HTML-based web navigation. We evaluate the efficacy of this approach through prompt learning and instruction finetuning based on open-source Llama-2 and API-accessible GPT models. Our results reveal that models like GPT-4 outperform smaller models in web navigation tasks. Additionally, we find that the length of HTML snippet and history trajectory significantly influence performance, and prior step-by-step instructions prove less effective than real-time environmental feedback. Overall, we believe our work provides valuable insights for future research in LLM-driven web agents.

摘要
大型语言模型（LLM）已经出现为网络浏览任务中有前途的代理人，解释目标和与网页交互。然而，用于这些任务的拼接提示的效率仍未得到足够的探索。我们介绍了AllTogether，一个标准化的提示模板，可以增强任务上下文表示，从而提高LLM在基于HTML的网络浏览中的表现。我们通过提示学习和指令精度调整基于开源Llama-2和可用API的GPT模型进行评估。我们的结果表明，比较大的GPT-4模型在网络浏览任务中表现更好，而且HTML段和历史轨迹的长度对表现有重要影响，而先前的步骤指令比实时环境反馈更有效。总之，我们认为我们的工作对未来LLM驱动的网络代理人做出了重要贡献。

A Novel Transfer Learning Method Utilizing Acoustic and Vibration Signals for Rotating Machinery Fault Diagnosis

paper_url: http://arxiv.org/abs/2310.14796
repo_url: None
paper_authors: Zhongliang Chen, Zhuofei Huang, Wenxiong Kang
for: 这篇论文的目的是提出一种基于声学和振荡信号的错误诊断方法，以解决现有系统中的分布差异问题，提高错误诊断的精度和可靠性。
methods: 本文提出的方法包括设计了声学和振荡特征融合MAVgram，以提供更丰富和可靠的错误信息，并与基于神经网络的分类器结合，实现更有效的错误诊断表现。
results: 实验结果显示，提案的方法可以实现更高的错误诊断性能，并比STgram-MFN更有效。

Abstract
Fault diagnosis of rotating machinery plays a important role for the safety and stability of modern industrial systems. However, there is a distribution discrepancy between training data and data of real-world operation scenarios, which causing the decrease of performance of existing systems. This paper proposed a transfer learning based method utilizing acoustic and vibration signal to address this distribution discrepancy. We designed the acoustic and vibration feature fusion MAVgram to offer richer and more reliable information of faults, coordinating with a DNN-based classifier to obtain more effective diagnosis representation. The backbone was pre-trained and then fine-tuned to obtained excellent performance of the target task. Experimental results demonstrate the effectiveness of the proposed method, and achieved improved performance compared to STgram-MFN.

摘要
扭转机器的故障诊断在现代工业系统中扮演着重要的角色，以保证安全和稳定。然而，现有系统存在训练数据和实际运行场景数据之间的分布差异，导致现有系统的性能下降。这篇论文提出了基于传播学的方法，利用声音和振荡信号来解决这种分布差异。我们设计了声音和振荡特征融合MAVgram，以提供更加丰富和可靠的故障信息，并与基于DNN的分类器结合，以获得更有效的诊断表示。背部先经过训练，然后细化以实现target任务的优秀表现。实验结果表明提案的方法的有效性，并在STgram-MFN的基础上提高了性能。

POSQA: Probe the World Models of LLMs with Size Comparisons

paper_url: http://arxiv.org/abs/2310.13394
repo_url: https://github.com/cambridgeltl/posqa
paper_authors: Chang Shu, Jiuzhou Han, Fangyu Liu, Ehsan Shareghi, Nigel Collier
for: 验证最新的大语言模型（LLMs）在真实世界中的理解能力
methods: 使用物理对象大小问答数据集（POSQA）进行零基础测试，并使用高级提问技术和外部知识增强
results: 显示最新的LLMs在零基础情况下表现不佳，并且表现受提问形式和对象报告偏见的影响，表明语言模型从文本数据中塑造的理解能力可能受到表单形式的干扰和人类行为的不一致。

Abstract
Embodied language comprehension emphasizes that language understanding is not solely a matter of mental processing in the brain but also involves interactions with the physical and social environment. With the explosive growth of Large Language Models (LLMs) and their already ubiquitous presence in our daily lives, it is becoming increasingly necessary to verify their real-world understanding. Inspired by cognitive theories, we propose POSQA: a Physical Object Size Question Answering dataset with simple size comparison questions to examine the extremity and analyze the potential mechanisms of the embodied comprehension of the latest LLMs. We show that even the largest LLMs today perform poorly under the zero-shot setting. We then push their limits with advanced prompting techniques and external knowledge augmentation. Furthermore, we investigate whether their real-world comprehension primarily derives from contextual information or internal weights and analyse the impact of prompt formats and report bias of different objects. Our results show that real-world understanding that LLMs shaped from textual data can be vulnerable to deception and confusion by the surface form of prompts, which makes it less aligned with human behaviours.

摘要
研究人员强调体验语言理解，表明语言理解不仅是脑中的心理处理，还与物理环境和社会环境交互有关。随着大语言模型（LLMs）的快速发展和日常生活中的普遍存在，我们必须更加重视他们在实际情况下的理解能力。以聪明认知理论为引导，我们提出了POSQA：一个包含简单的大小比较问题的物理对象大小问答集，以探索最新的LLMs的具体实现和分析其物理语言理解机制。我们发现，当前最大的LLMs在零情况下表现很差。然后，我们使用高级推荐技术和外部知识增强。此外，我们还研究了LLMs的实际理解是否主要来自于上下文信息或内部权重，并分析了提问格式和对象报告偏见的影响。我们的结果表明，由文本数据塑造的LLMs可能因提问表现的表面形式而受到欺骗和混乱，这使其更加与人类行为不一致。

Learning Successor Representations with Distributed Hebbian Temporal Memory

paper_url: http://arxiv.org/abs/2310.13391
repo_url: None
paper_authors: Evgenii Dzhivelikian, Petr Kuderov, Aleksandr I. Panov
for: address the challenge of online hidden representation learning for decision-making under uncertainty in non-stationary, partially observable environments.
methods: based on factor graph formalism and a multicomponent neuron model, using distributed representations, sparse transition matrices, and local Hebbian-like learning rules.
results: outperforms classical LSTM and performs comparably to more advanced RNN-like algorithms, speeding up Temporal Difference learning for Successor Representation in changing environments.

Abstract
This paper presents a novel approach to address the challenge of online hidden representation learning for decision-making under uncertainty in non-stationary, partially observable environments. The proposed algorithm, Distributed Hebbian Temporal Memory (DHTM), is based on factor graph formalism and a multicomponent neuron model. DHTM aims to capture sequential data relationships and make cumulative predictions about future observations, forming Successor Representation (SR). Inspired by neurophysiological models of the neocortex, the algorithm utilizes distributed representations, sparse transition matrices, and local Hebbian-like learning rules to overcome the instability and slow learning process of traditional temporal memory algorithms like RNN and HMM. Experimental results demonstrate that DHTM outperforms classical LSTM and performs comparably to more advanced RNN-like algorithms, speeding up Temporal Difference learning for SR in changing environments. Additionally, we compare the SRs produced by DHTM to another biologically inspired HMM-like algorithm, CSCG. Our findings suggest that DHTM is a promising approach for addressing the challenges of online hidden representation learning in dynamic environments.

摘要

A Human-Robot Mutual Learning System with Affect-Grounded Language Acquisition and Differential Outcomes Training

paper_url: http://arxiv.org/abs/2310.13377
repo_url: None
paper_authors: Alva Markelius, Sofia Sjöberg, Zakaria Lemhauori, Laura Cohen, Martin Bergström, Robert Lowe, Lola Cañamero
for: 这个论文旨在探讨一种人机交互设置，以便人类和机器人共同学习符号语言，用于识别机器人的内部需求。
methods: 这个研究采用了一种差异性结果培训（DOT）协议，以便机器人对自己的内部需求（如饿）提供特定的反馈，并且人类通过正确的刺激（如 cookie）来响应机器人的需求。
results: 研究发现，在DOT协议下，人类的学习效率提高，并且可以更有效地学习机器人的语言。机器人在这个研究中使用了一个类似于人类婴儿语言发展阶段的词汇表。机器人的软件架构基于一种对情感相关的语言学习模型，将机器人的词汇与内部需求相关联。研究发现，在DOT conditon下，机器人的语言学习速度比非DOT控制condition更快 converges。参与者还报告了正面的情感体验、感到控制和与机器人之间的共鸣连接。这种教师-学生学习方法可能为增强治疗效果（如对偏抑郁症的治疗）做出贡献，通过增加人类在训练任务中的参与度，从而提高治疗效果。机器人的家OSTATIC需求启发的语言学习具有潜在的社会化和合作（教育）功能，可能为人类与机器人之间的交互带来更多的有用和满足。

Abstract
This paper presents a novel human-robot interaction setup for robot and human learning of symbolic language for identifying robot homeostatic needs. The robot and human learn to use and respond to the same language symbols that convey homeostatic needs and the stimuli that satisfy the homeostatic needs, respectively. We adopted a differential outcomes training (DOT) protocol whereby the robot provides feedback specific (differential) to its internal needs (e.g. `hunger') when satisfied by the correct stimulus (e.g. cookie). We found evidence that DOT can enhance the human's learning efficiency, which in turn enables more efficient robot language acquisition. The robot used in the study has a vocabulary similar to that of a human infant in the linguistic ``babbling'' phase. The robot software architecture is built upon a model for affect-grounded language acquisition where the robot associates vocabulary with internal needs (hunger, thirst, curiosity) through interactions with the human. The paper presents the results of an initial pilot study conducted with the interactive setup, which reveal that the robot's language acquisition achieves higher convergence rate in the DOT condition compared to the non-DOT control condition. Additionally, participants reported positive affective experiences, feeling of being in control, and an empathetic connection with the robot. This mutual learning (teacher-student learning) approach offers a potential contribution of facilitating cognitive interventions with DOT (e.g. for people with dementia) through increased therapy adherence as a result of engaging humans more in training tasks by taking an active teaching-learning role. The homeostatic motivational grounding of the robot's language acquisition has potential to contribute to more ecologically valid and social (collaborative/nurturing) interactions with robots.

摘要
这篇论文描述了一种新的人机交互设置，用于机器人和人类学习符号语言，以便识别机器人的内部需求。机器人和人类都学习了使用和响应同一种语言符号，表示内部需求和满足需求的刺激。我们采用了一种差分结果培训（DOT）协议，其中机器人提供特定的反馈（差分），以满足其内部需求（如快餐）。我们发现，DOT可以加强人类学习效率，从而使机器人语言学习更加高效。机器人在研究中使用的词汇数量与人类婴儿语言发展阶段相似。机器人软件架构基于语言学习模型，其中机器人通过与人类的互动关系 vocabulary 与内部需求（快餐、喝彩、好奇）相联系。研究发现，在DOT条件下，机器人语言学习具有更高的吞吐率，而控制条件下则相对较低。此外，参与者报告了正面的情感体验、感到控制和与机器人之间的共鸣。这种教师学生学习方法可能为增强治疗效率而做出贡献，例如用于人们的诱导疗法（如偏 wurlitzer 症），通过增加人类在训练任务中的活跃参与，从而提高治疗效率。机器人的内部需求驱动的语言学习有助于实现更加生动化和社交（合作/善父）的机器人交互。

VFedMH: Vertical Federated Learning for Training Multi-party Heterogeneous Models

paper_url: http://arxiv.org/abs/2310.13367
repo_url: None
paper_authors: Shuo Wang, Keke Gai, Jing Yu, Liehuang Zhu
for:This paper proposes a novel approach called Vertical Federated learning for training Multi-parties Heterogeneous models (VFedMH) to address the challenges of heterogeneous local models among participants in existing VFL methods.methods:The approach focuses on aggregating the embeddings of each participant’s knowledge instead of intermediate results during forward propagation. The active party securely aggregates local embeddings to obtain global knowledge embeddings and sends them to passive parties, who then utilize the global embeddings to propagate forward on their local heterogeneous networks. The active party assists the passive party in computing its local heterogeneous model gradients.results:The paper demonstrates that VFedMH can simultaneously train multiple heterogeneous models with heterogeneous optimization and outperform some recent methods in model performance. The paper also provides a theoretical analysis of VFedMH’s convergence performance.

Abstract
Vertical Federated Learning (VFL) has gained increasing attention as a novel training paradigm that integrates sample alignment and feature union. However, existing VFL methods face challenges when dealing with heterogeneous local models among participants, which affects optimization convergence and generalization. To address this issue, this paper proposes a novel approach called Vertical Federated learning for training Multi-parties Heterogeneous models (VFedMH). VFedMH focuses on aggregating the embeddings of each participant's knowledge instead of intermediate results during forward propagation. The active party, who possesses labels and features of the sample, in VFedMH securely aggregates local embeddings to obtain global knowledge embeddings, and sends them to passive parties. The passive parties, who own only features of the sample, then utilize the global embeddings to propagate forward on their local heterogeneous networks. However, the passive party does not own the labels, so the local model gradient cannot be calculated locally. To overcome this limitation, the active party assists the passive party in computing its local heterogeneous model gradients. Then, each participant trains their local model using the heterogeneous model gradients. The objective is to minimize the loss value of their respective local heterogeneous models. Additionally, the paper provides a theoretical analysis of VFedMH's convergence performance. Extensive experiments are conducted to demonstrate that VFedMH can simultaneously train multiple heterogeneous models with heterogeneous optimization and outperform some recent methods in model performance.

摘要
垂直联邦学习（VFL）在最近几年中得到了越来越多的关注，这是一种新的训练方法，它将把样本Alignment和特征Union结合在一起。然而，现有的VFL方法在参与者之间存在不同的本地模型，这会影响优化征程和泛化性。为了解决这个问题，这篇论文提出了一种新的方法，即Vertically Federated Learning for training Multi-parties Heterogeneous models（VFedMH）。VFedMH通过在每个参与者的知识嵌入上进行聚合来取代传输中间结果的方法。活动参与者，即拥有标签和样本特征的方，在VFedMH中安全地聚合本地嵌入，并将其发送到抗拒参与者。抗拒参与者，即拥有样本特征但没有标签的方，然后使用全球嵌入来在本地不同的网络上进行前进传播。然而，抗拒参与者没有标签，因此本地模型梯度无法计算本地。为了解决这个问题，活动参与者为抗拒参与者计算本地不同模型梯度。然后，每个参与者使用自己的本地模型梯度来训练自己的本地模型，目标是将本地模型的损失值最小化。此外，论文还提供了VFL的准确性性分析。广泛的实验表明，VFedMH可以同时训练多个不同的模型，并且在模型性能方面超越一些最新的方法。

Towards General Error Diagnosis via Behavioral Testing in Machine Translation

paper_url: http://arxiv.org/abs/2310.13362
repo_url: https://github.com/wujunjie1998/btpgbt
paper_authors: Junjie Wu, Lemao Liu, Dit-Yan Yeung
for: 本研究旨在提供一种基于行为测试的机器翻译系统诊断方法，以检测机器翻译系统的通用错误。
methods: 本研究提出了一种新的双语翻译对生成基于行为测试（BTPGBT）框架，通过自动生成高质量测试 случа和 Pseudoreferences，以便对机器翻译系统进行全面和准确的行为测试。
results: 实验结果表明，BTPGBT 可以为机器翻译系统提供全面和准确的行为测试结果，并提供了一些有趣的发现。codes和数据可以在 https: //github.com/wujunjie1998/BTPGBT 上下载。

Abstract
Behavioral testing offers a crucial means of diagnosing linguistic errors and assessing capabilities of NLP models. However, applying behavioral testing to machine translation (MT) systems is challenging as it generally requires human efforts to craft references for evaluating the translation quality of such systems on newly generated test cases. Existing works in behavioral testing of MT systems circumvent this by evaluating translation quality without references, but this restricts diagnosis to specific types of errors, such as incorrect translation of single numeric or currency words. In order to diagnose general errors, this paper proposes a new Bilingual Translation Pair Generation based Behavior Testing (BTPGBT) framework for conducting behavioral testing of MT systems. The core idea of BTPGBT is to employ a novel bilingual translation pair generation (BTPG) approach that automates the construction of high-quality test cases and their pseudoreferences. Experimental results on various MT systems demonstrate that BTPGBT could provide comprehensive and accurate behavioral testing results for general error diagnosis, which further leads to several insightful findings. Our code and data are available at https: //github.com/wujunjie1998/BTPGBT.

摘要
行为测试提供了诊断语言错误和评估自然语言处理器（NLP）模型能力的重要方式。但在机器翻译（MT）系统上进行行为测试是具有挑战性，因为通常需要人工劳动来制定评估MT系统翻译质量的参考。现有的MT系统行为测试方法通过不使用参考来评估翻译质量，但这限定了诊断的类型为单个数字或货币词的错误。为了诊断通用错误，本文提出了一种新的行为测试框架（BTPGBT），基于自动生成高质量测试用例和其 Pseudoreferences的翻译对。实验结果表明，BTPGBT可以提供全面和准确的行为测试结果，用于普遍错误诊断，并且导致了一些有价值的发现。我们的代码和数据可以在https: //github.com/wujunjie1998/BTPGBT上获取。

Bridging the Gap between Synthetic and Authentic Images for Multimodal Machine Translation

paper_url: http://arxiv.org/abs/2310.13361
repo_url: https://github.com/ictnlp/SAMMT
paper_authors: Wenyu Guo, Qingkai Fang, Dong Yu, Yang Feng
for: simultaneous machine translation and image input
methods: using powerful text-to-image generation models and minimizing the gap between synthetic and authentic image representations
results: achieving state-of-the-art performance on Multi30K En-De and En-Fr datasets while remaining independent of authentic images during inference

Abstract
Multimodal machine translation (MMT) simultaneously takes the source sentence and a relevant image as input for translation. Since there is no paired image available for the input sentence in most cases, recent studies suggest utilizing powerful text-to-image generation models to provide image inputs. Nevertheless, synthetic images generated by these models often follow different distributions compared to authentic images. Consequently, using authentic images for training and synthetic images for inference can introduce a distribution shift, resulting in performance degradation during inference. To tackle this challenge, in this paper, we feed synthetic and authentic images to the MMT model, respectively. Then we minimize the gap between the synthetic and authentic images by drawing close the input image representations of the Transformer Encoder and the output distributions of the Transformer Decoder. Therefore, we mitigate the distribution disparity introduced by the synthetic images during inference, thereby freeing the authentic images from the inference process.Experimental results show that our approach achieves state-of-the-art performance on the Multi30K En-De and En-Fr datasets, while remaining independent of authentic images during inference.

摘要
多模态机器翻译（MMT）同时接受源句子和相关的图像作为翻译输入。由于翻译输入句子中的图像 rarely available，recent studies suggest 使用强大的文本到图像生成模型提供图像输入。然而，由这些模型生成的图像经常遵循不同的分布，从而导致在推理过程中的分布偏移，从而降低翻译性能。为解决这个挑战，在这篇论文中，我们将Feed synthetic和authentic图像给MMT模型，然后将synthetic和authentic图像的输入图像表示和Transformer Encoder的输出分布减小到最小值。因此，我们消除了在推理过程中由synthetic图像引入的分布偏移，使得authentic图像可以在推理过程中自由发挥作用。实验结果显示，我们的方法在Multi30K En-De和En-Fr数据集上实现了状态的翻译性能，同时不依赖于authentic图像进行推理。

DYNAMITE: Dynamic Interplay of Mini-Batch Size and Aggregation Frequency for Federated Learning with Static and Streaming Dataset

paper_url: http://arxiv.org/abs/2310.14906
repo_url: None
paper_authors: Weijie Liu, Xiaoxi Zhang, Jingpu Duan, Carlee Joe-Wong, Zhi Zhou, Xu Chen
for: This paper focuses on analyzing the joint effects of adjusting batch size and aggregation frequency on model performance, training time, and resource consumption in federated learning (FL) training, especially when facing dynamic data streams and network characteristics.
methods: The paper introduces novel analytical models and optimization algorithms that leverage the interplay between batch size and aggregation frequency to navigate the trade-offs among convergence, cost, and completion time for dynamic FL training. The paper also derives closed-form solutions for co-optimized batch size and aggregation frequency that are consistent across all devices.
results: The paper conducts extensive experiments to demonstrate the superiority of the proposed offline optimal solutions and online adaptive algorithm. The results show that the proposed methods can efficiently train accurate FL models while addressing the heterogeneity of both data and system characteristics.

Abstract
Federated Learning (FL) is a distributed learning paradigm that can coordinate heterogeneous edge devices to perform model training without sharing private data. While prior works have focused on analyzing FL convergence with respect to hyperparameters like batch size and aggregation frequency, the joint effects of adjusting these parameters on model performance, training time, and resource consumption have been overlooked, especially when facing dynamic data streams and network characteristics. This paper introduces novel analytical models and optimization algorithms that leverage the interplay between batch size and aggregation frequency to navigate the trade-offs among convergence, cost, and completion time for dynamic FL training. We establish a new convergence bound for training error considering heterogeneous datasets across devices and derive closed-form solutions for co-optimized batch size and aggregation frequency that are consistent across all devices. Additionally, we design an efficient algorithm for assigning different batch configurations across devices, improving model accuracy and addressing the heterogeneity of both data and system characteristics. Further, we propose an adaptive control algorithm that dynamically estimates network states, efficiently samples appropriate data batches, and effectively adjusts batch sizes and aggregation frequency on the fly. Extensive experiments demonstrate the superiority of our offline optimal solutions and online adaptive algorithm.

摘要
联合学习（FL）是一种分布式学习模式，可以在不同的边缘设备上进行模型训练，无需分享private数据。而且，以前的研究主要关注了FL的整合程度和批处理频率之间的关系，而忽略了在面对动态数据流和网络特性时，这些参数的共同影响。本文提出了新的分析模型和优化算法，通过批处理大小和汇集频率之间的交互来导航模型性能、训练时间和资源消耗之间的负面oren。我们提出了一个新的训练误差下界，考虑到不同设备上的hetogeneous数据集，并 deriv出closed-form解决方案，可以在所有设备上实现共同的批处理大小和汇集频率。此外，我们设计了一种高效的分配不同批处理配置的算法，以提高模型精度和处理不同数据和系统特性的hetogeneity。最后，我们提出了一种自适应控制算法，可以在 fly 上精准地估算网络状态，选择合适的数据批处理，并动态地调整批处理大小和汇集频率。广泛的实验表明了我们的offline优化解决方案和在线自适应算法的优越性。

NurViD: A Large Expert-Level Video Database for Nursing Procedure Activity Understanding

paper_url: http://arxiv.org/abs/2310.13347
repo_url: https://github.com/minghu0830/NurViD-benchmark
paper_authors: Ming Hu, Lin Wang, Siyuan Yan, Don Ma, Qingli Ren, Peng Xia, Wei Feng, Peibo Duan, Lie Ju, Zongyuan Ge
for: 这个论文旨在提高护理活动理解的质量和安全性，通过应用深度学习技术，促进教育和培训，改善质量控制，并启用操作符控制监测。
methods: 这个论文使用的方法包括使用深度学习技术进行护理活动理解，并提供了一个大型视频数据集（NurViD），其包含了51种护理程序和177个动作步骤的专家级标注。
results: 这个论文的结果显示，使用现有的深度学习方法在护理活动理解方面的效果不佳，而 NurViD 数据集可以帮助改善这种效果。

Abstract
The application of deep learning to nursing procedure activity understanding has the potential to greatly enhance the quality and safety of nurse-patient interactions. By utilizing the technique, we can facilitate training and education, improve quality control, and enable operational compliance monitoring. However, the development of automatic recognition systems in this field is currently hindered by the scarcity of appropriately labeled datasets. The existing video datasets pose several limitations: 1) these datasets are small-scale in size to support comprehensive investigations of nursing activity; 2) they primarily focus on single procedures, lacking expert-level annotations for various nursing procedures and action steps; and 3) they lack temporally localized annotations, which prevents the effective localization of targeted actions within longer video sequences. To mitigate these limitations, we propose NurViD, a large video dataset with expert-level annotation for nursing procedure activity understanding. NurViD consists of over 1.5k videos totaling 144 hours, making it approximately four times longer than the existing largest nursing activity datasets. Notably, it encompasses 51 distinct nursing procedures and 177 action steps, providing a much more comprehensive coverage compared to existing datasets that primarily focus on limited procedures. To evaluate the efficacy of current deep learning methods on nursing activity understanding, we establish three benchmarks on NurViD: procedure recognition on untrimmed videos, procedure and action recognition on trimmed videos, and action detection. Our benchmark and code will be available at \url{https://github.com/minghu0830/NurViD-benchmark}.

摘要
使用深度学习对护理程序活动理解可能会大幅提高护理人员和病人之间的质量和安全性。通过这种技术，我们可以提供培训和教育，提高质量控制，并启用操作符合性监测。然而，在这一领域的自动识别系统开发目前受到数据鲜血的限制。现有的视频数据集存在多种限制：1）这些数据集较小，无法支持全面的护理活动调查; 2）它们主要关注单一的程序，缺乏专家级别的护理程序和操作步骤的标注; 3）它们缺乏时间地标注，这使得targeted action在更长的视频序列中不能有效地local化。为了缓解这些限制，我们提出NurViD，一个大型视频数据集，包含专家级别的护理程序活动理解标注。NurViD包含1.5k个视频，总时长144小时，比现有最大的护理活动数据集长得多。其中包含51种不同的护理程序和177个操作步骤，比现有数据集更加全面。为了评估当前深度学习方法在护理活动理解方面的效果，我们建立了三个benchmark在NurViD上：程序认知在未处理视频上，程序和操作认知在处理视频上，以及操作检测。我们的benchmark和代码将在GitHub上公开，请参阅\url{https://github.com/minghu0830/NurViD-benchmark}.

Challenges and Contributing Factors in the Utilization of Large Language Models (LLMs)

paper_url: http://arxiv.org/abs/2310.13343
repo_url: None
paper_authors: Xiaoliang Chen, Liangbin Li, Le Chang, Yunhe Huang, Yuxuan Zhao, Yuxiao Zhang, Dinuo Li
for: 本文探讨了大语言模型（LLM）在各种应用场景中的挑战，包括域域特性、知识忘记、知识重复、知识虚假等问题。
methods: 本文提出了多种解决这些问题的方法，包括数据多样化、模型细化、可见性和解释性提高、优化模型、增加优化和公平性训练等。
results: 本文预测未来的LLM将强调公平、透明度和优素，以保证在服务人类时保持高的道德和伦理水平。

Abstract
With the development of large language models (LLMs) like the GPT series, their widespread use across various application scenarios presents a myriad of challenges. This review initially explores the issue of domain specificity, where LLMs may struggle to provide precise answers to specialized questions within niche fields. The problem of knowledge forgetting arises as these LLMs might find it hard to balance old and new information. The knowledge repetition phenomenon reveals that sometimes LLMs might deliver overly mechanized responses, lacking depth and originality. Furthermore, knowledge illusion describes situations where LLMs might provide answers that seem insightful but are actually superficial, while knowledge toxicity focuses on harmful or biased information outputs. These challenges underscore problems in the training data and algorithmic design of LLMs. To address these issues, it's suggested to diversify training data, fine-tune models, enhance transparency and interpretability, and incorporate ethics and fairness training. Future technological trends might lean towards iterative methodologies, multimodal learning, model personalization and customization, and real-time learning and feedback mechanisms. In conclusion, future LLMs should prioritize fairness, transparency, and ethics, ensuring they uphold high moral and ethical standards when serving humanity.

摘要
大型语言模型（LLM）如GPT系列的发展，在各种应用场景中广泛使用，却也带来许多挑战。本文首先探讨领域特定性问题， LLM 可能难以回答特殊领域内的精细问题。知识忘却问题表明这些 LLM 可能很难以平衡旧和新信息。知识重复现象表明 LLM 可能提供过于机械化的答案，缺乏深度和创新。此外，知识幻觉问题描述了 LLM 可能提供的答案似乎具有深度和创新，但实际上是 superficiel 的。知识毒性问题则关注 LLM 输出的有害或偏见信息。这些挑战表明 LLM 的训练数据和算法设计存在问题。为解决这些问题，建议多样化训练数据，细化模型，提高透明度和可解释性，并包括伦理和公平训练。未来技术趋势可能是迭代方法、多模态学习、模型个性化和定制化，以及实时学习和反馈机制。在结束时，未来 LLM 应该优先考虑公平、透明度和伦理，以确保它们在服务人类时保持高的道德和伦理标准。

FLAIR: a Country-Scale Land Cover Semantic Segmentation Dataset From Multi-Source Optical Imagery

paper_url: http://arxiv.org/abs/2310.13336
repo_url: None
paper_authors: Anatol Garioud, Nicolas Gonthier, Loic Landrieu, Apolline De Wit, Marion Valette, Marc Poupée, Sébastien Giordano, Boris Wattrelos
for: FLAIR 是一个大规模地理分类的数据集，用于监测和理解人类活动的发展指标，如城市化、扫林和土壤人工化。
methods: FLAIR 使用高分辨率遥感图像和时间序列数据，并提供了精确的地表类型分类。
results: FLAIR 提供了817平方公里的高分辨率遥感图像，以及20亿个分类标签，可以用于开发和评估大规模地理分类算法。

Abstract
We introduce the French Land cover from Aerospace ImageRy (FLAIR), an extensive dataset from the French National Institute of Geographical and Forest Information (IGN) that provides a unique and rich resource for large-scale geospatial analysis. FLAIR contains high-resolution aerial imagery with a ground sample distance of 20 cm and over 20 billion individually labeled pixels for precise land-cover classification. The dataset also integrates temporal and spectral data from optical satellite time series. FLAIR thus combines data with varying spatial, spectral, and temporal resolutions across over 817 km2 of acquisitions representing the full landscape diversity of France. This diversity makes FLAIR a valuable resource for the development and evaluation of novel methods for large-scale land-cover semantic segmentation and raises significant challenges in terms of computer vision, data fusion, and geospatial analysis. We also provide powerful uni- and multi-sensor baseline models that can be employed to assess algorithm's performance and for downstream applications. Through its extent and the quality of its annotation, FLAIR aims to spur improvements in monitoring and understanding key anthropogenic development indicators such as urban growth, deforestation, and soil artificialization. Dataset and codes can be accessed at https://ignf.github.io/FLAIR/

摘要
我们介绍法国陆地覆盖物（FLAIR），一个广泛的数据集来自法国国家地理和森林信息研究所（IGN），提供了一个独特和丰富的大规模地ospatial分析资源。FLAIR包含高分辨率飞行图像，地面抽象距离20 cm，超过20亿个准确标注的像素，用于精确的陆地覆盖类别分类。数据集还 integraoptical卫星时序序数据。因此，FLAIR结合了不同的空间、spectral和时间分辨率，覆盖了法国的全景多样性，总面积超过817 km2。这种多样性使FLAIR成为大规模陆地Semantic分类的开发和评估的丰富资源，同时也提出了计算机视觉、数据融合和地ospatial分析的挑战。我们还提供了强大的单感器和多感器基线模型，可以用于评估算法性能和下游应用。通过其覆盖范围和精确的标注，FLAIR希望能促进跟踪和理解人类发展指标，如城市增长、Deforestation和 soil artificialization。数据集和代码可以在https://ignf.github.io/FLAIR/上获取。

Democratizing Reasoning Ability: Tailored Learning from Large Language Model

paper_url: http://arxiv.org/abs/2310.13332
repo_url: https://github.com/raibows/learn-to-reason
paper_authors: Zhaoyang Wang, Shaohan Huang, Yuxuan Liu, Jiahai Wang, Minghui Song, Zihan Zhang, Haizhen Huang, Furu Wei, Weiwei Deng, Feng Sun, Qi Zhang
for: 提高小型语言模型（LLM）的推理能力，使其能够更广泛应用于自然语言处理领域。
methods: 提出了一种特化的学习方法，通过在多轮学习 paradigm 中，将大型黑盒语言模型（LLM）作为教师，为小型语言模型（LM）提供个性化的培训数据，以提高LM的推理能力。同时，通过自我反思学习，让学生从自己的错误中学习。
results: 经过实验和分析，表明该方法可以有效提高小型语言模型的推理能力，并且可以在数学和常识理解任务上达到比较高的性能。

Abstract
Large language models (LLMs) exhibit impressive emergent abilities in natural language processing, but their democratization is hindered due to huge computation requirements and closed-source nature. Recent research on advancing open-source smaller LMs by distilling knowledge from black-box LLMs has obtained promising results in the instruction-following ability. However, the reasoning ability which is more challenging to foster, is relatively rarely explored. In this paper, we propose a tailored learning approach to distill such reasoning ability to smaller LMs to facilitate the democratization of the exclusive reasoning ability. In contrast to merely employing LLM as a data annotator, we exploit the potential of LLM as a reasoning teacher by building an interactive multi-round learning paradigm. This paradigm enables the student to expose its deficiencies to the black-box teacher who then can provide customized training data in return. Further, to exploit the reasoning potential of the smaller LM, we propose self-reflection learning to motivate the student to learn from self-made mistakes. The learning from self-reflection and LLM are all tailored to the student's learning status, thanks to the seamless integration with the multi-round learning paradigm. Comprehensive experiments and analysis on mathematical and commonsense reasoning tasks demonstrate the effectiveness of our method. The code will be available at https://github.com/Raibows/Learn-to-Reason.

摘要
大型自然语言处理模型（LLM）表现出印象的emergent能力，但其普及化受到巨大计算需求和封闭式 natura 的阻碍。现代研究推进小型开源LLM的进步，通过将黑盒LLM知识蒸馏到小型LLM中，已经获得了惊人的成果。然而，更加困难的是 fostering 的逻辑能力，通常不会被探索。在本文中，我们提出了一个适应学习方法，以将这种逻辑能力蒸馏到小型LLM中，以便普及化这种专有的逻辑能力。相比使用LLM作为标签生成器，我们利用LLM的启发力作为教育师，建立了互动多轮学习 paradigm。这个 paradigm 让学生可以向黑盒教育师表达自己的不足，并且获得专门的训练数据回应。此外，为了套用小型LLM的逻辑潜力，我们提出了自我反思学习，让学生从自己的错误中学习。这些学习自我反思和LLM都是根据学习状态进行定制，感谢与多轮学习 paradigm 的整合。实验和分析显示了我们的方法的有效性，代码将会在 GitHub 上公开。

Boosting for Bounding the Worst-class Error

paper_url: http://arxiv.org/abs/2310.14890
repo_url: None
paper_authors: Yuya Saito, Shinnosuke Matsuo, Seiichi Uchida, Daiki Suehiro
for: 本文解决了最坏类错误率问题，而不是通过平均误差率来衡量所有类的性能。
methods: 本文提出了一种提高算法，该算法可以确保最坏类训练误差的Upper bound，并 deriv出其泛化 bound。
results: 实验结果显示，该算法可以降低测试集最坏类误差率，而不会过拟合训练集。

Abstract
This paper tackles the problem of the worst-class error rate, instead of the standard error rate averaged over all classes. For example, a three-class classification task with class-wise error rates of 10\%, 10\%, and 40\% has a worst-class error rate of 40\%, whereas the average is 20\% under the class-balanced condition. The worst-class error is important in many applications. For example, in a medical image classification task, it would not be acceptable for the malignant tumor class to have a 40\% error rate, while the benign and healthy classes have 10\% error rates.We propose a boosting algorithm that guarantees an upper bound of the worst-class training error and derive its generalization bound. Experimental results show that the algorithm lowers worst-class test error rates while avoiding overfitting to the training set.

摘要
We propose a boosting algorithm that guarantees an upper bound of the worst-class training error and derive its generalization bound. Experimental results show that the algorithm reduces worst-class test error rates while avoiding overfitting to the training set.

Coarse-to-Fine Dual Encoders are Better Frame Identification Learners

paper_url: http://arxiv.org/abs/2310.13316
repo_url: https://github.com/pkunlp-icler/cofftea
paper_authors: Kaikai An, Ce Zheng, Bofei Gao, Haozhe Zhao, Baobao Chang
for: 本研究旨在提高FrameIdentification的精度和效率，尤其是在面临大量候选框的情况下。
methods: 本文提出了CoFFTEA Architecture，包括Coarse-to-Fine Encoders和 dual encoders，通过对框和目标进行对齐学习，以提高FrameIdentification的精度和效率。
results: 实验结果表明，CoFFTEA比前一代模型提高0.93的总得分和1.53的R@1指标，而不使用$lf$。此外，CoFFTEA还能更好地模型框和框之间的关系，以及目标和目标之间的关系。

Abstract
Frame identification aims to find semantic frames associated with target words in a sentence. Recent researches measure the similarity or matching score between targets and candidate frames by modeling frame definitions. However, they either lack sufficient representation learning of the definitions or face challenges in efficiently selecting the most suitable frame from over 1000 candidate frames. Moreover, commonly used lexicon filtering ($lf$) to obtain candidate frames for the target may ignore out-of-vocabulary targets and cause inadequate frame modeling. In this paper, we propose CoFFTEA, a $\underline{Co}$arse-to-$\underline{F}$ine $\underline{F}$rame and $\underline{T}$arget $\underline{E}$ncoders $\underline{A}$rchitecture. With contrastive learning and dual encoders, CoFFTEA efficiently and effectively models the alignment between frames and targets. By employing a coarse-to-fine curriculum learning procedure, CoFFTEA gradually learns to differentiate frames with varying degrees of similarity. Experimental results demonstrate that CoFFTEA outperforms previous models by 0.93 overall scores and 1.53 R@1 without $lf$. Further analysis suggests that CoFFTEA can better model the relationships between frame and frame, as well as target and target. The code for our approach is available at https://github.com/pkunlp-icler/COFFTEA.

摘要
框架识别目标words的Semantic框的相关研究。近期研究通过定义框的模型来衡量目标和候选框之间的相似性或匹配得分。然而，它们可能缺乏定义框的表示学习或有效地从1000多个候选框中选择最适合的框。另外，通常使用词典筛选($lf$)来获取候选框的目标可能忽略到词语表外的目标，从而导致不充分的框模型。在这篇论文中，我们提出了CoFFTEA，一种$\underline{Co}$arse-to-$\underline{F}$ine $\underline{F}$rame和$\underline{T}$arget $\underline{E}$ncoders $\underline{A}$rchitecture。通过对框和目标的对齐学习，CoFFTEA可以高效地和高效地模型框和目标之间的对齐。通过使用粗细度逐步学习程序，CoFFTEA逐渐学习到不同程度的相似度之间的分化。实验结果表明，CoFFTEA在前一代模型的0.93的总得分和1.53的R@1（不使用$lf）。进一步分析表明，CoFFTEA可以更好地模型框和框之间，以及目标和目标之间的关系。我们的代码可以在https://github.com/pkunlp-icler/COFFTEA中获取。

paper_url: http://arxiv.org/abs/2310.13297
repo_url: None
paper_authors: Chenkai Sun, Jinning Li, Yi R. Fung, Hou Pong Chan, Tarek Abdelzaher, ChengXiang Zhai, Heng Ji
For: The paper aims to improve the accuracy of automatic response forecasting for news media, specifically in cases where explicit profiles or historical actions of users are limited (referred to as lurkers).* Methods: The proposed framework, SocialSense, leverages a large language model to induce a belief-centered graph on top of an existent social network, along with graph-based propagation to capture social dynamics.* Results: The proposed method surpasses existing state-of-the-art in experimental evaluations for both zero-shot and supervised settings, demonstrating its effectiveness in response forecasting, and is capable of handling unseen user and lurker scenarios.Here is the same information in Simplified Chinese:* For: 文章目标是改进新闻媒体自动回应预测精度，特别是在用户的详细信息或历史行为 Limited (referred to as lurkers) 情况下。* Methods: 提议的社交感知框架 SocialSense 利用大型语言模型生成 belief-centered 图 на основе现有的社交网络，并通过图基本传播来捕捉社交动态。* Results: 提议的方法在 zero-shot 和 supervised 设置下的实验评估中超过现有状态的前ier，表明其在回应预测中的有效性，并能够处理未seen 用户和隐身用户情况。

Abstract
Automatic response forecasting for news media plays a crucial role in enabling content producers to efficiently predict the impact of news releases and prevent unexpected negative outcomes such as social conflict and moral injury. To effectively forecast responses, it is essential to develop measures that leverage the social dynamics and contextual information surrounding individuals, especially in cases where explicit profiles or historical actions of the users are limited (referred to as lurkers). As shown in a previous study, 97% of all tweets are produced by only the most active 25% of users. However, existing approaches have limited exploration of how to best process and utilize these important features. To address this gap, we propose a novel framework, named SocialSense, that leverages a large language model to induce a belief-centered graph on top of an existent social network, along with graph-based propagation to capture social dynamics. We hypothesize that the induced graph that bridges the gap between distant users who share similar beliefs allows the model to effectively capture the response patterns. Our method surpasses existing state-of-the-art in experimental evaluations for both zero-shot and supervised settings, demonstrating its effectiveness in response forecasting. Moreover, the analysis reveals the framework's capability to effectively handle unseen user and lurker scenarios, further highlighting its robustness and practical applicability.

摘要
自动回应预测 для新闻媒体在内容制作者能够有效预测新闻发布后的影响，避免不必要的负面结果，如社交冲突和道德伤害。为了有效预测回应，需要开发机制，利用社交动力和用户境外信息，尤其是在用户没有明确 Profiling 或历史行为时。据前一研究显示，97%的所有推文来自只有25%最活跃的用户。然而，现有方法尚未充分探讨如何最佳处理和利用这些重要特征。为了解决这个空白，我们提出了一种新的框架，名为SocialSense，它利用大型自然语言模型生成一个带有信念中心的图，并利用图基于传播来捕捉社交动力。我们假设，生成的图可以bridging distant用户之间的相似信念，使模型能够有效地捕捉回应模式。我们的方法在实验评估中超越了现有状态的艺术，demonstrating its effectiveness in response forecasting。此外，分析表明框架可以有效处理未看到用户和寂寂者enario，进一步强调其可靠性和实用性。

PathRL: An End-to-End Path Generation Method for Collision Avoidance via Deep Reinforcement Learning

paper_url: http://arxiv.org/abs/2310.13295
repo_url: None
paper_authors: Wenhao Yu, Jie Peng, Quecheng Qiu, Hanyu Wang, Lu Zhang, Jianmin Ji
for: 提高移动机器人性能的深度强化学习（DRL）方法
methods: 使用特定的动作空间精度化技术和适应状态空间表示方法来解决训练困难
results: 比其他DRL导航方法更高的成功率和降低旋转幅度的稳定和平滑机器人运动

Abstract
Robot navigation using deep reinforcement learning (DRL) has shown great potential in improving the performance of mobile robots. Nevertheless, most existing DRL-based navigation methods primarily focus on training a policy that directly commands the robot with low-level controls, like linear and angular velocities, which leads to unstable speeds and unsmooth trajectories of the robot during the long-term execution. An alternative method is to train a DRL policy that outputs the navigation path directly. However, two roadblocks arise for training a DRL policy that outputs paths: (1) The action space for potential paths often involves higher dimensions comparing to low-level commands, which increases the difficulties of training; (2) It takes multiple time steps to track a path instead of a single time step, which requires the path to predicate the interactions of the robot w.r.t. the dynamic environment in multiple time steps. This, in turn, amplifies the challenges associated with training. In response to these challenges, we propose PathRL, a novel DRL method that trains the policy to generate the navigation path for the robot. Specifically, we employ specific action space discretization techniques and tailored state space representation methods to address the associated challenges. In our experiments, PathRL achieves better success rates and reduces angular rotation variability compared to other DRL navigation methods, facilitating stable and smooth robot movement. We demonstrate the competitive edge of PathRL in both real-world scenarios and multiple challenging simulation environments.

摘要
仿生Navigation使用深度强化学习（DRL）已经显示出了提高移动机器人性能的潜在力量。然而，现有的大多数DRL基于 Navigation方法都主要集中于直接训练机器人的低级指令，如线性和ANGULAR velocity，这会导致机器人在长期执行中的速度不稳定和轨迹不平滑。作为一种alternative方法，可以训练一个DRL策略，输出机器人的Navigation path。然而，两个障碍物 arise for training a DRL策略：（1）action space for potential paths often involves higher dimensions comparing to low-level commands，这会增加训练的difficulties;（2）It takes multiple time steps to track a path instead of a single time step，这需要机器人在多个时间步骤中与动态环境进行互动，从而增加训练的挑战。为回应这些挑战，我们提出了PathRL，一种新的DRL方法，训练策略是生成机器人的Navigation path。我们使用specific action space discretization techniques和tailored state space representation methods来解决相关的挑战。在我们的实验中，PathRL实现了与其他DRL Navigation方法相比更高的成功率和降低ANGULAR rotation variability，使机器人的移动更加稳定和平滑。我们在实际场景和多个复杂的simulation环境中证明了PathRL的竞争力。

Assessing Privacy Risks in Language Models: A Case Study on Summarization Tasks

paper_url: http://arxiv.org/abs/2310.13291
repo_url: None
paper_authors: Ruixiang Tang, Gord Lueck, Rodolfo Quispe, Huseyin A Inan, Janardhan Kulkarni, Xia Hu
for: 研究它们可以攻击语言模型的数据成员情报泄露问题。
methods: 利用文本相似性和模型对文档修改的抵抗性作为可能的数据成员情报泄露信号，并评估其效果在广泛使用的 dataset 上。
results: 结果表明，摘要模型容易泄露数据成员情报，即使参考摘要不可用。此外，我们还讨论了训练摘要模型的安全措施，并讨论了数据隐私和实用性之间的自然补偿。

Abstract
Large language models have revolutionized the field of NLP by achieving state-of-the-art performance on various tasks. However, there is a concern that these models may disclose information in the training data. In this study, we focus on the summarization task and investigate the membership inference (MI) attack: given a sample and black-box access to a model's API, it is possible to determine if the sample was part of the training data. We exploit text similarity and the model's resistance to document modifications as potential MI signals and evaluate their effectiveness on widely used datasets. Our results demonstrate that summarization models are at risk of exposing data membership, even in cases where the reference summary is not available. Furthermore, we discuss several safeguards for training summarization models to protect against MI attacks and discuss the inherent trade-off between privacy and utility.

摘要

Unified Pretraining for Recommendation via Task Hypergraphs

paper_url: http://arxiv.org/abs/2310.13286
repo_url: https://github.com/mdyfrank/uprth
paper_authors: Mingdai Yang, Zhiwei Liu, Liangwei Yang, Xiaolong Liu, Chen Wang, Hao Peng, Philip S. Yu
for: 提出一个 novel 多任务预训练框架 UPRTH，以满足各种推荐任务的多元需求和特点。
methods: 提出了一个名为 task hypergraphs 的新方法，可以将多个预训练任务转换为 Hyperedge 预测任务，并将这些任务联系到推荐任务上。同时，提出了一个名为 transitional attention 的新层，可以精确地学习每个预训练任务与推荐任务之间的相关性。
results: 透过实验结果，显示 UPRTH 的超越性，并进行了详细的探索，以证明提案的架构的有效性。

Abstract
Although pretraining has garnered significant attention and popularity in recent years, its application in graph-based recommender systems is relatively limited. It is challenging to exploit prior knowledge by pretraining in widely used ID-dependent datasets. On one hand, user-item interaction history in one dataset can hardly be transferred to other datasets through pretraining, where IDs are different. On the other hand, pretraining and finetuning on the same dataset leads to a high risk of overfitting. In this paper, we propose a novel multitask pretraining framework named Unified Pretraining for Recommendation via Task Hypergraphs. For a unified learning pattern to handle diverse requirements and nuances of various pretext tasks, we design task hypergraphs to generalize pretext tasks to hyperedge prediction. A novel transitional attention layer is devised to discriminatively learn the relevance between each pretext task and recommendation. Experimental results on three benchmark datasets verify the superiority of UPRTH. Additional detailed investigations are conducted to demonstrate the effectiveness of the proposed framework.

摘要
(Note: The text has been translated into Simplified Chinese, which is the standard form of Chinese used in mainland China. The translation may vary depending on the region or dialect.)

Meta-learning of Physics-informed Neural Networks for Efficiently Solving Newly Given PDEs

paper_url: http://arxiv.org/abs/2310.13270
repo_url: None
paper_authors: Tomoharu Iwata, Yusuke Tanaka, Naonori Ueda
for: 解决各种partial differential equation(PDE)问题
methods: 使用神经网络基于meta-学习方法，将PDE问题编码成问题表示，并使用神经网络预测解决方案
results: 比较 existed方法，提出的方法可以更高效地预测PDE问题的解决方案

Abstract
We propose a neural network-based meta-learning method to efficiently solve partial differential equation (PDE) problems. The proposed method is designed to meta-learn how to solve a wide variety of PDE problems, and uses the knowledge for solving newly given PDE problems. We encode a PDE problem into a problem representation using neural networks, where governing equations are represented by coefficients of a polynomial function of partial derivatives, and boundary conditions are represented by a set of point-condition pairs. We use the problem representation as an input of a neural network for predicting solutions, which enables us to efficiently predict problem-specific solutions by the forwarding process of the neural network without updating model parameters. To train our model, we minimize the expected error when adapted to a PDE problem based on the physics-informed neural network framework, by which we can evaluate the error even when solutions are unknown. We demonstrate that our proposed method outperforms existing methods in predicting solutions of PDE problems.

摘要
我们提出了一种基于神经网络的meta学习方法，用于效率地解决部分 diferencial equation（PDE）问题。我们的方法可以快速地学习解决各种PDE问题，并使用这些知识来解决新给定的PDE问题。我们将PDE问题编码为一个问题表示，使用神经网络来表示管理方程，并将边界条件表示为一组点条件对。我们使用问题表示作为神经网络预测解决方法的输入，这使得我们可以通过神经网络的前进过程来快速地预测问题特定的解决方案，而不需要更新模型参数。为了训练我们的模型，我们将通过基于物理学信息神经网络框架的最小二乘法来减少预测错误，这样我们就可以在解决PDE问题时评估解决方案的错误。我们示出了我们的提出方法可以比既有方法更高效地预测PDE问题的解决方案。

An Exploratory Study on Simulated Annealing for Feature Selection in Learning-to-Rank

paper_url: http://arxiv.org/abs/2310.13269
repo_url: None
paper_authors: Mohd. Sayemul Haque, Md. Fahim, Muhammad Ibrahim
for: 本研究是 investigate the use of simulated annealing for feature selection in the learning-to-rank domain.
methods: 我们使用了一种叫做 simulated annealing的meta-heuristic approach，并 explore various neighborhood selection strategies和 temperature cooling schemes. We also introduce a new hyper-parameter called the progress parameter.
results: 我们的算法在五个公共的学习到排序 benchmark datasets上进行了评估，并与Local Beam Search算法进行了比较。结果表明我们的提议的模型具有效果。

Abstract
Learning-to-rank is an applied domain of supervised machine learning. As feature selection has been found to be effective for improving the accuracy of learning models in general, it is intriguing to investigate this process for learning-to-rank domain. In this study, we investigate the use of a popular meta-heuristic approach called simulated annealing for this task. Under the general framework of simulated annealing, we explore various neighborhood selection strategies and temperature cooling schemes. We further introduce a new hyper-parameter called the progress parameter that can effectively be used to traverse the search space. Our algorithms are evaluated on five publicly benchmark datasets of learning-to-rank. For a better validation, we also compare the simulated annealing-based feature selection algorithm with another effective meta-heuristic algorithm, namely local beam search. Extensive experimental results shows the efficacy of our proposed models.

摘要
学习排名是应用领域的超级vised机器学习。因为特征选择已被证明可以提高学习模型的准确率，因此在这个领域进行这种过程是非常有趣。在这个研究中，我们使用了一种受欢迎的meta-heuristic方法，即模拟熔炉。在总体框架下，我们探索了不同的邻居选择策略和温度冷却方案。我们还引入了一个新的超参数，即进度参数，可以有效地探索搜索空间。我们的算法在五个公共的学习排名数据集上进行了评估。为了更好地验证，我们还与另一种有效的meta-heuristic算法，即本地搜索算法进行比较。广泛的实验结果表明了我们的提议的模型的有效性。

Enhancing drug and cell line representations via contrastive learning for improved anti-cancer drug prioritization

paper_url: http://arxiv.org/abs/2310.13725
repo_url: None
paper_authors: Patrick J. Lawrence, Xia Ning
for: 这项研究旨在提高基于omic序列分析的精准肿瘤治疗，通过强化药物和细胞系列表示的关系结构，以提高计算机方法学习药物和细胞系列之间的相互关系。
methods: 本研究使用了对比学习方法，以保留药物机制作用和细胞系列肿瘤类型之间的关系结构，从而提高了学习的药物和细胞系列表示。
results: 相比之前的状态艺法，我们的学习表示可以 achieve更高的性能，并且发现使用我们学习的表示时，预测器更加均衡地使用药物和细胞系列来源的特征，这有助于进行更加个性化的药物优先顺序。

Abstract
Due to cancer's complex nature and variable response to therapy, precision oncology informed by omics sequence analysis has become the current standard of care. However, the amount of data produced for each patients makes it difficult to quickly identify the best treatment regimen. Moreover, limited data availability has hindered computational methods' abilities to learn patterns associated with effective drug-cell line pairs. In this work, we propose the use of contrastive learning to improve learned drug and cell line representations by preserving relationship structures associated with drug mechanism of action and cell line cancer types. In addition to achieving enhanced performance relative to a state-of-the-art method, we find that classifiers using our learned representations exhibit a more balances reliance on drug- and cell line-derived features when making predictions. This facilitates more personalized drug prioritizations that are informed by signals related to drug resistance.

摘要
Translated into Simplified Chinese:由于肿瘤的复杂性和变化的响应，精准肿瘤学 Informed by omics sequence analysis 已成为现代标准的治疗方法。然而，每个患者生成的数据量太大，使得快速 identificar最佳治疗方案变得困难。此外，数据的有限性限制了计算方法学习 pattern 关联有效的药品和细胞线路。在这项工作中，我们提议使用对比学习来提高学习到药品和细胞线路的表示，以保留药理学 Mechanism of action 和细胞型 cancer 之间的关系结构。此外，我们发现使用我们学习的表示的分类器能够更好地使用药品和细胞线路 Derived 特征进行预测，从而实现更个性化的药品优先级，基于药品抵抗的信号。

ManiCast: Collaborative Manipulation with Cost-Aware Human Forecasting

paper_url: http://arxiv.org/abs/2310.13258
repo_url: None
paper_authors: Kushal Kedia, Prithwish Dan, Atiksh Bhardwaj, Sanjiban Choudhury
for: 该论文旨在提高人机 робо合作 manipulation 性能，具体是通过精准预测人类动作来提高机器人的计划性能。
methods: 该论文提出了一种名为 ManiCast 的新框架，该框架通过学习人类动作预测模型，并将其与一种模拟预测控制器结合，以执行协同 manipulation 任务。
results: 该论文通过实验证明，ManiCast 框架可以在多个实际任务中，如反应混合、物品传递和协同设备等，实现流畅、实时的人机合作。同时，论文还对 Forecast 和 End-to-End 预测控制器系统进行了评估，并与已有的学习基线和规则基线进行了比较。

Abstract
Seamless human-robot manipulation in close proximity relies on accurate forecasts of human motion. While there has been significant progress in learning forecast models at scale, when applied to manipulation tasks, these models accrue high errors at critical transition points leading to degradation in downstream planning performance. Our key insight is that instead of predicting the most likely human motion, it is sufficient to produce forecasts that capture how future human motion would affect the cost of a robot's plan. We present ManiCast, a novel framework that learns cost-aware human forecasts and feeds them to a model predictive control planner to execute collaborative manipulation tasks. Our framework enables fluid, real-time interactions between a human and a 7-DoF robot arm across a number of real-world tasks such as reactive stirring, object handovers, and collaborative table setting. We evaluate both the motion forecasts and the end-to-end forecaster-planner system against a range of learned and heuristic baselines while additionally contributing new datasets. We release our code and datasets at https://portal-cornell.github.io/manicast/.

摘要
<>将人机 manipulate 在 close proximity 中进行流畅的执行，需要准确预测人类动作。虽然在大规模学习中有了 significiant progress ，但在执行 manipulate 任务时，这些模型会在关键转折点上产生高错误率，导致下游规划性能下降。我们的关键发现是，而不是预测人类动作的最有可能性，可以生成捕捉未来人类动作对机器人计划的成本影响的预测。我们提出了 ManiCast 框架，该框架学习成本识别人类预测，并将其传递给模型预测控制 плаanner 执行共同执行任务。我们的框架可以在多个实际任务中，如反应搅拌、物品交接和共同设备，实现流畅、实时的人机互动。我们对预测和总体预测控制系统进行了评估，并与已学习和启发式基础集成。我们将代码和数据集发布在。

Visual Grounding Helps Learn Word Meanings in Low-Data Regimes

paper_url: http://arxiv.org/abs/2310.13257
repo_url: None
paper_authors: Chengxu Zhuang, Evelina Fedorenko, Jacob Andreas
for: 研究语言模型（LM）是人类语言生成和理解的 poderful工具，并且它们的内部表示与人类语言处理的表示相吻合。但是，为了达到这些结果，LM必须通过不人类化的训练方法进行训练，需要大量的语言数据，而无需与感知、行为或社会行为相关的知识。
methods: 我们使用了多种LM架构，并在不同的数据scale上进行训练。我们还使用了图像描述任务作为 auxiliary supervision。
results: 我们发现，视觉超vision可以提高word learning的效率，但这些改进几乎都出现在低数据 régime中，而且在包含丰富的分布式信号的情况下，这些改进可能会被抵消。我们发现，模型主要驱动的视觉信息和word co-occurrence信息之间的信息不是重复的。然而，我们的结果表明，当前的多模式模型化方法无法有效地利用视觉信息，从人类化的数据集上建立更人类化的word表示。

Abstract
Modern neural language models (LMs) are powerful tools for modeling human sentence production and comprehension, and their internal representations are remarkably well-aligned with representations of language in the human brain. But to achieve these results, LMs must be trained in distinctly un-human-like ways -- requiring orders of magnitude more language data than children receive during development, and without any of the accompanying grounding in perception, action, or social behavior. Do models trained more naturalistically -- with grounded supervision -- exhibit more human-like language learning? We investigate this question in the context of word learning, a key sub-task in language acquisition. We train a diverse set of LM architectures, with and without auxiliary supervision from image captioning tasks, on datasets of varying scales. We then evaluate these models on a broad set of benchmarks characterizing models' learning of syntactic categories, lexical relations, semantic features, semantic similarity, and alignment with human neural representations. We find that visual supervision can indeed improve the efficiency of word learning. However, these improvements are limited: they are present almost exclusively in the low-data regime, and sometimes canceled out by the inclusion of rich distributional signals from text. The information conveyed by text and images is not redundant -- we find that models mainly driven by visual information yield qualitatively different from those mainly driven by word co-occurrences. However, our results suggest that current multi-modal modeling approaches fail to effectively leverage visual information to build more human-like word representations from human-sized datasets.

摘要
现代神经语言模型（LM）是强大的工具，用于模拟人类句子生成和理解，并且其内部表示与人类语言表示相吻合。但是，为了 достичь这些结果，LM需要接受非人类化的训练方法，需要大量的语言数据，并且没有与感知、行为或社会行为相关的背景。我们问 whether models trained more naturally -- with grounded supervision -- exhibit more human-like language learning? 在word learning中，我们训练了多种LM架构，有和无附加的图像描述任务 auxiliary supervision，在不同的数据规模上进行训练。然后，我们对这些模型进行了广泛的测试，以评估它们在 sintactic categories、lexical relations、semantic features、semantic similarity 和人类神经表示相似性方面的学习效果。我们发现，视觉supervision可以提高word learning的效率，但这些改进几乎完全局限于低数据规模，而且有时会被文本中的丰富分布式信号抵消。我们发现，文本和图像中的信息并不是重复的，模型主要驱动的视觉信息会导致模型的学习结果与主要驱动的word co-occurrences不同。然而，我们的结果表明，现有的多模态模型化方法无法有效地利用视觉信息，从人类化的数据规模上建立更人类化的word表示。

TempGNN: Temporal Graph Neural Networks for Dynamic Session-Based Recommendations

paper_url: http://arxiv.org/abs/2310.13249
repo_url: None
paper_authors: Eunkyu Oh, Taehun Kim
for: 这种研究旨在提高Session-based recommendation的准确率，通过更好地理解用户在短时间内的交互行为和item之间的相互关系。
methods: 该研究提出了Temporal Graph Neural Networks（TempGNN）模型，通过在动态Session图上使用时间嵌入Operator来捕捉交互行为的结构和时间层次结构。
results: 实验结果表明，TempGNN可以充分利用已有模型的优势，并且在两个真实世界电商数据集上达到了状态之冠的表现。

Abstract
Session-based recommendations which predict the next action by understanding a user's interaction behavior with items within a relatively short ongoing session have recently gained increasing popularity. Previous research has focused on capturing the dynamics of sequential dependencies from complicated item transitions in a session by means of recurrent neural networks, self-attention models, and recently, mostly graph neural networks. Despite the plethora of different models relying on the order of items in a session, few approaches have been proposed for dealing better with the temporal implications between interactions. We present Temporal Graph Neural Networks (TempGNN), a generic framework for capturing the structural and temporal dynamics in complex item transitions utilizing temporal embedding operators on nodes and edges on dynamic session graphs, represented as sequences of timed events. Extensive experimental results show the effectiveness and adaptability of the proposed method by plugging it into existing state-of-the-art models. Finally, TempGNN achieved state-of-the-art performance on two real-world e-commerce datasets.

摘要
SESSION-based recommendations, which predict the next action by understanding a user's interaction behavior with items within a relatively short ongoing session, have recently gained increasing popularity. Previous research has focused on capturing the dynamics of sequential dependencies from complicated item transitions in a session by means of recurrent neural networks, self-attention models, and recently, mostly graph neural networks. Despite the plethora of different models relying on the order of items in a session, few approaches have been proposed for dealing better with the temporal implications between interactions. We present Temporal Graph Neural Networks (TempGNN), a generic framework for capturing the structural and temporal dynamics in complex item transitions utilizing temporal embedding operators on nodes and edges on dynamic session graphs, represented as sequences of timed events. Extensive experimental results show the effectiveness and adaptability of the proposed method by plugging it into existing state-of-the-art models. Finally, TempGNN achieved state-of-the-art performance on two real-world e-commerce datasets.Here's the translation in Traditional Chinese as well:SESSION-based recommendations, which predict the next action by understanding a user's interaction behavior with items within a relatively short ongoing session, have recently gained increasing popularity. Previous research has focused on capturing the dynamics of sequential dependencies from complicated item transitions in a session by means of recurrent neural networks, self-attention models, and recently, mostly graph neural networks. Despite the plethora of different models relying on the order of items in a session, few approaches have been proposed for dealing better with the temporal implications between interactions. We present Temporal Graph Neural Networks (TempGNN), a generic framework for capturing the structural and temporal dynamics in complex item transitions utilizing temporal embedding operators on nodes and edges on dynamic session graphs, represented as sequences of timed events. Extensive experimental results show the effectiveness and adaptability of the proposed method by plugging it into existing state-of-the-art models. Finally, TempGNN achieved state-of-the-art performance on two real-world e-commerce datasets.

FLEE-GNN: A Federated Learning System for Edge-Enhanced Graph Neural Network in Analyzing Geospatial Resilience of Multicommodity Food Flows

paper_url: http://arxiv.org/abs/2310.13248
repo_url: https://github.com/geods/flee-gnn
paper_authors: Yuxiao Qu, Jinmeng Rao, Song Gao, Qianheng Zhang, Wei-Lun Chao, Yu Su, Michelle Miller, Alfonso Morales, Patrick Huber
for: 这篇论文旨在探讨如何使用 Federated Learning System for Edge-Enhanced Graph Neural Network (FLEE-GNN) 来解决食品供应网络的可恢复性问题，以提高食品安全性。
methods: 该论文提出了一种基于 Federated Learning 的方法，使用 Graph Neural Network (GNN) 来分析食品供应网络的可恢复性。这种方法结合了 GNN 的强大和鲁棒性，以及 Federated Learning 的隐私保护和分布式特点。
results: 该论文的实验结果表明，FLEE-GNN 可以有效地提高食品供应网络的可恢复性分析，并且可以应用于其他的空间网络中。

Abstract
Understanding and measuring the resilience of food supply networks is a global imperative to tackle increasing food insecurity. However, the complexity of these networks, with their multidimensional interactions and decisions, presents significant challenges. This paper proposes FLEE-GNN, a novel Federated Learning System for Edge-Enhanced Graph Neural Network, designed to overcome these challenges and enhance the analysis of geospatial resilience of multicommodity food flow network, which is one type of spatial networks. FLEE-GNN addresses the limitations of current methodologies, such as entropy-based methods, in terms of generalizability, scalability, and data privacy. It combines the robustness and adaptability of graph neural networks with the privacy-conscious and decentralized aspects of federated learning on food supply network resilience analysis across geographical regions. This paper also discusses FLEE-GNN's innovative data generation techniques, experimental designs, and future directions for improvement. The results show the advancements of this approach to quantifying the resilience of multicommodity food flow networks, contributing to efforts towards ensuring global food security using AI methods. The developed FLEE-GNN has the potential to be applied in other spatial networks with spatially heterogeneous sub-network distributions.

摘要
全球化的食品供应网络可靠性理解和测量是面临增长的食品不安全的全球需求。然而，这些网络的复杂性，包括多维度交互和决策，带来了重要的挑战。这篇论文提出了FLEE-GNN，一种新的联邦学习系统，用于增强地图分布式神经网络的地ospatial可靠性分析。FLEE-GNN将现有方法，如Entropy-based方法，超越了一致性、可扩展性和数据隐私方面的限制。它将图神经网络的可靠性和适应性与联邦学习的隐私性和分散性相结合，进行食品供应网络可靠性分析 Across geographical regions。本文还讨论了FLEE-GNN的创新数据生成技术、实验设计和未来改进方向。结果表明FLEE-GNN在多种 alimentary food flow networks 可靠性分析方面做出了进步，贡献到全球食品安全使用 AI 方法。发展的FLEE-GNN可以应用于其他的空间网络，具有空间不同互连分布。

Multi-level Contrastive Learning for Script-based Character Understanding

paper_url: http://arxiv.org/abs/2310.13231
repo_url: None
paper_authors: Dawei Li, Hengyuan Zhang, Yanran Li, Shiping Yang
for: 本研究目的是理解文本中的人物性格和身份，通过其讲话习惯了解其全面性。
methods: 我们提出了一种多级对比学习框架，用于捕捉人物的全面信息。我们进行了广泛的实验，与多种先进的语言模型进行比较，包括SpanBERT、Longformer、BigBird和ChatGPT-3.5。
results: 我们的方法可以大幅提高人物理解的性能，并通过进一步的分析，证明了我们的方法的有效性和解决了一些挑战。我们将在github上公开我们的工作，链接在https://github.com/David-Li0406/Script-based-Character-Understanding。

Abstract
In this work, we tackle the scenario of understanding characters in scripts, which aims to learn the characters' personalities and identities from their utterances. We begin by analyzing several challenges in this scenario, and then propose a multi-level contrastive learning framework to capture characters' global information in a fine-grained manner. To validate the proposed framework, we conduct extensive experiments on three character understanding sub-tasks by comparing with strong pre-trained language models, including SpanBERT, Longformer, BigBird and ChatGPT-3.5. Experimental results demonstrate that our method improves the performances by a considerable margin. Through further in-depth analysis, we show the effectiveness of our method in addressing the challenges and provide more hints on the scenario of character understanding. We will open-source our work on github at https://github.com/David-Li0406/Script-based-Character-Understanding.

摘要
在这项工作中，我们面临了剑道字符的理解enario，即从字符的讲话中学习其个性和身份。我们首先分析了这个enario中的一些挑战，然后提出了一种多级对比学习框架，以精细地捕捉字符的全球信息。为验证我们的方法，我们进行了广泛的实验，包括对三种字符理解子任务进行比较，其中包括SpanBERT、Longformer、BigBird和ChatGPT-3.5等强大的预训练语言模型。实验结果显示，我们的方法可以明显提高表现。通过进一步的深入分析，我们证明了我们的方法在面临挑战时的效果，并提供了更多有关字符理解scenario的提示。我们将在GitHub上开源我们的工作，可以在https://github.com/David-Li0406/Script-based-Character-Understanding中找到。

Absolute Policy Optimization

paper_url: http://arxiv.org/abs/2310.13230
repo_url: https://github.com/NhaPhatHanh/github
paper_authors: Weiye Zhao, Feihan Li, Yifan Sun, Rui Chen, Tianhao Wei, Changliu Liu
for: 解决复杂控制任务和游戏场景中的策略优化问题
methods: 引入新的目标函数优化策略，并通过一系列近似算法简化实现
results: 在复杂 kontinuous control benchmark 任务和 Atari 游戏中显著超越现状强度策略优化算法，并在预期性和最坏性性能两个方面具备显著改进

Abstract
In recent years, trust region on-policy reinforcement learning has achieved impressive results in addressing complex control tasks and gaming scenarios. However, contemporary state-of-the-art algorithms within this category primarily emphasize improvement in expected performance, lacking the ability to control over the worst-case performance outcomes. To address this limitation, we introduce a novel objective function; by optimizing which, it will lead to guaranteed monotonic improvement in the lower bound of near-total performance samples (absolute performance). Considering this groundbreaking theoretical advancement, we then refine this theoretically grounded algorithm through a series of approximations, resulting in a practical solution called Absolute Policy Optimization (APO). Our experiments demonstrate the effectiveness of our approach across challenging continuous control benchmark tasks and extend its applicability to mastering Atari games. Our findings reveal that APO significantly outperforms state-of-the-art policy gradient algorithms, resulting in substantial improvements in both expected performance and worst-case performance.

摘要

Interpretable Deep Reinforcement Learning for Optimizing Heterogeneous Energy Storage Systems

paper_url: http://arxiv.org/abs/2310.14783
repo_url: None
paper_authors: Luolin Xiong, Yang Tang, Chensheng Liu, Shuai Mao, Ke Meng, Zhaoyang Dong, Feng Qian
for: 提高能量存储系统（ESS）在能源市场中的灵活性和可再生能源利用率，通过杂质太阳能-存储系统（PV-ESS），利用磷酸存储（BES）和氢能存储（HES）的特点。
methods: 开发了一个完整的成本函数，包括衰变、资本和运营维护成本，用于反映实际场景。同时，提出了一种可解释性高的prototype-based policy网络，通过人工设计的原型来引导决策，使得决策过程自然地具有解释性。
results: 在四个不同的案例中，比较了黑obox模型和我们提出的可解释性优化方法，结果表明，我们的方法具有更高的效果和实用性。

Abstract
Energy storage systems (ESS) are pivotal component in the energy market, serving as both energy suppliers and consumers. ESS operators can reap benefits from energy arbitrage by optimizing operations of storage equipment. To further enhance ESS flexibility within the energy market and improve renewable energy utilization, a heterogeneous photovoltaic-ESS (PV-ESS) is proposed, which leverages the unique characteristics of battery energy storage (BES) and hydrogen energy storage (HES). For scheduling tasks of the heterogeneous PV-ESS, cost description plays a crucial role in guiding operator's strategies to maximize benefits. We develop a comprehensive cost function that takes into account degradation, capital, and operation/maintenance costs to reflect real-world scenarios. Moreover, while numerous methods excel in optimizing ESS energy arbitrage, they often rely on black-box models with opaque decision-making processes, limiting practical applicability. To overcome this limitation and enable transparent scheduling strategies, a prototype-based policy network with inherent interpretability is introduced. This network employs human-designed prototypes to guide decision-making by comparing similarities between prototypical situations and encountered situations, which allows for naturally explained scheduling strategies. Comparative results across four distinct cases underscore the effectiveness and practicality of our proposed pre-hoc interpretable optimization method when contrasted with black-box models.

摘要
能量存储系统（ESS）是能源市场中的关键组件，同时作为能源供应者和消费者。ESS运营商可以通过优化存储设备的操作来获得利润。为了进一步提高ESS在能源市场中的灵活性和可再生能源利用率，我们提议了一种多种能量存储系统（PV-ESS），利用独特的电池能量存储（BES）和氢能存储（HES）特点。对于PV-ESS的调度任务，成本描述起到了关键作用，导引运营商的策略以最大化利润。我们开发了一个全面的成本函数，考虑了退化、资本和运营/维护成本，以准确反映实际场景。此外，虽然许多方法可以优化ESS的能源融合，但它们通常基于黑盒模型，即不可见的决策过程，限制了实际应用。为了缓解这一限制和提供透明的调度策略，我们引入了一种封装式政策网络，该网络使用人类设计的原型来引导决策，通过比较相似的原型 situación和遇到的 situación之间的相似性，以便自然地解释调度策略。对四个不同的案例进行比较结果表明，我们提出的先验可解释优化方法在黑盒模型的基础上显著超越了黑盒模型。

paper_url: http://arxiv.org/abs/2310.13227
repo_url: None
paper_authors: Yuchen Zhuang, Xiang Chen, Tong Yu, Saayan Mitra, Victor Bursztyn, Ryan A. Rossi, Somdeb Sarkhel, Chao Zhang
for: 这个论文的目的是提出一种高效的搜索算法，用于LLM-based自动代理在具有广泛的动作空间的问题上进行决策和规划。
methods: 这个论文使用的方法是基于搜索算法的树形搜索算法，具体来说是将整个动作空间转换为决策树，每个节点表示一个可能的API函数调用。这个算法利用A*搜索算法和任务特定的成本函数设计，高效地快速搜索最低成本的有效路径。
results: 实验结果表明，ToolChain* 算法可以高效地平衡探索和利用在具有广泛的动作空间中，并在计划和理解任务上比基eline表现出3.1%和3.5%的提升，同时需要7.35倍和2.31倍的时间。

Abstract
Large language models (LLMs) have demonstrated powerful decision-making and planning capabilities in solving complicated real-world problems. LLM-based autonomous agents can interact with diverse tools (e.g., functional APIs) and generate solution plans that execute a series of API function calls in a step-by-step manner. The multitude of candidate API function calls significantly expands the action space, amplifying the critical need for efficient action space navigation. However, existing methods either struggle with unidirectional exploration in expansive action spaces, trapped into a locally optimal solution, or suffer from exhaustively traversing all potential actions, causing inefficient navigation. To address these issues, we propose ToolChain*, an efficient tree search-based planning algorithm for LLM-based agents. It formulates the entire action space as a decision tree, where each node represents a possible API function call involved in a solution plan. By incorporating the A* search algorithm with task-specific cost function design, it efficiently prunes high-cost branches that may involve incorrect actions, identifying the most low-cost valid path as the solution. Extensive experiments on multiple tool-use and reasoning tasks demonstrate that ToolChain* efficiently balances exploration and exploitation within an expansive action space. It outperforms state-of-the-art baselines on planning and reasoning tasks by 3.1% and 3.5% on average while requiring 7.35x and 2.31x less time, respectively.

摘要

Scalable Neural Network Kernels

paper_url: http://arxiv.org/abs/2310.13225
repo_url: https://github.com/sayantann11/all-classification-templetes-for-ML
paper_authors: Arijit Sehanobish, Krzysztof Choromanski, Yunfan Zhao, Avinava Dubey, Valerii Likhosherstov
for: 这篇论文目标是提出一种可扩展神经网络核心（SNNK），用于取代常见的前向传输层（FFL），能够approximate FFL，但具有更有利的计算性质。
methods: 这篇论文使用了SNNK，它可以分离输入和参数之间的关系，只在最终计算中via点积kernel进行连接。它们还是FFL的约等价表示，可以模型输入和参数之间的复杂关系。此外，文章还提出了神经网络集成过程，通过应用SNNK来压缩深度神经网络架构，从而获得额外的压缩效果。在极端情况下，这会导致完全集成的神经网络，其优化参数可以通过explicit的方程表示出来，打开了一个可以 circumvent backpropagation的可能性。
results: 文章提供了严格的理论分析和广泛的实验评估，从点积kernel估计到Transformers的精细调整with novel adapter layers inspired by SNNKs。结果表明，使用SNNK可以减少训练参数的数量，保持竞争性的准确性。

Abstract
We introduce the concept of scalable neural network kernels (SNNKs), the replacements of regular feedforward layers (FFLs), capable of approximating the latter, but with favorable computational properties. SNNKs effectively disentangle the inputs from the parameters of the neural network in the FFL, only to connect them in the final computation via the dot-product kernel. They are also strictly more expressive, as allowing to model complicated relationships beyond the functions of the dot-products of parameter-input vectors. We also introduce the neural network bundling process that applies SNNKs to compactify deep neural network architectures, resulting in additional compression gains. In its extreme version, it leads to the fully bundled network whose optimal parameters can be expressed via explicit formulae for several loss functions (e.g. mean squared error), opening a possibility to bypass backpropagation. As a by-product of our analysis, we introduce the mechanism of the universal random features (or URFs), applied to instantiate several SNNK variants, and interesting on its own in the context of scalable kernel methods. We provide rigorous theoretical analysis of all these concepts as well as an extensive empirical evaluation, ranging from point-wise kernel estimation to Transformers' fine-tuning with novel adapter layers inspired by SNNKs. Our mechanism provides up to 5x reduction in the number of trainable parameters, while maintaining competitive accuracy.

摘要
我们介绍了扩展神经网络底层（SNNK），它们取代了常规传递层（FFL），并能够模拟它们，但具有更有利的计算特性。SNNK将输入与神经网络参数分离开来，仅在最终计算中通过点产生kernel来连接。它们还是严格更加表达力强，因为它们可以模型复杂的关系，超过传递参数vector的数统产生器。我们还介绍了将SNNK应用于压缩深度神经网络架构的神经网络卷绿（bundling）过程，从而获得额外的压缩成本。在极端版本下，它导致完全卷绿网络，其最佳参数可以通过明确的公式表示出来，实现了几个损失函数（例如平均方差）的最佳化。作为一个副作用，我们介绍了内在特征随机抽象（URF）的机制，它可以实现多种SNNK的实例，并且在扩展层神经网络方面具有兴趣。我们对这些概念进行了严谨的理论分析，以及广泛的实验评估，从单元测度到Transformers的精细调整 avec novel adapter layers inspired by SNNKs。我们的机制可以实现5倍的受训练 парамет数减少，保持竞争性的精度。

HierCas: Hierarchical Temporal Graph Attention Networks for Popularity Prediction in Information Cascades

paper_url: http://arxiv.org/abs/2310.13219
repo_url: https://github.com/Daisy-zzz/HierCas
paper_authors: Zhizhen Zhang, Xiaohui Xie, Yishuo Zhang, Lanshan Zhang, Yong Jiang
for: 预测信息冲击的吸引力，用于识别假新闻和提供准确的推荐。
methods: 使用神经网络方法，不同于传统的特征基于方法，它们具有域特定的特性和新领域的不适应性。
results: 在两个实际 dataset 上，对比州的方法，提出了一种新的框架called Hierarchical Temporal Graph Attention Networks for cascade popularity prediction (HierCas)，实现了更高的准确率和更好的性能。

Abstract
Information cascade popularity prediction is critical for many applications, including but not limited to identifying fake news and accurate recommendations. Traditional feature-based methods heavily rely on handcrafted features, which are domain-specific and lack generalizability to new domains. To address this problem, researchers have turned to neural network-based approaches. However, existing methods follow a sampling-based modeling approach, potentially losing continuous dynamic information and structural-temporal dependencies that emerge during the information diffusion process. In this paper, we propose a novel framework called Hierarchical Temporal Graph Attention Networks for cascade popularity prediction (HierCas). Unlike existing methods, HierCas operates on the entire cascade graph by a dynamic graph modeling approach, enabling it to capture the full range of continuous dynamic information and explicitly model the interplay between structural and temporal factors. By leveraging time-aware node embedding, graph attention mechanisms and hierarchical pooling structures, HierCas effectively captures the popularity trend implicit in the complex cascade. Extensive experiments conducted on two real-world datasets in different scenarios demonstrate that our HierCas significantly outperforms the state-of-the-art approaches.

摘要
信息带动 popularity 预测是许多应用程序中的关键任务，包括但不限于识别假新闻和准确推荐。传统的特征基于方法依赖于手工设计的特征，这些特征是域特定的，缺乏对新领域的普适性。为解决这个问题，研究人员转向神经网络基于方法。然而，现有方法采用采样基本的模型化方法，可能会失去流行化过程中的连续动态信息和结构时间依赖关系。在本文中，我们提出了一种新的框架，即层次时间图注意力网络（HierCas）。与现有方法不同，HierCas 在整个带动图上运行，能够捕捉整个流行化过程中的连续动态信息，并且可以显式地模型结构时间因素之间的交互作用。通过利用时间意识节点嵌入、图注意力机制和层次聚合结构，HierCas 能够有效地捕捉流行趋势的隐含信息。我们在两个实际场景中进行了大量实验，结果表明，我们的 HierCas 在比较 estado-of-the-art 方法的情况下显著 OUTPERFORM。

MultiCoNER v2: a Large Multilingual dataset for Fine-grained and Noisy Named Entity Recognition

paper_url: http://arxiv.org/abs/2310.13213
repo_url: None
paper_authors: Besnik Fetahu, Zhiyu Chen, Sudipta Kar, Oleg Rokhlenko, Shervin Malmasi
for: 本文提出了一个新的名实体识别（NER）数据集，即MULTICONER V2，用于解决细化的名实体识别问题。
methods: 本文使用了开源资源如Wikipedia和Wikidata来编译数据集，并在多语言环境下进行了评估。
results: evaluation表明，MULTICONER V2数据集具有较低的精度，macro-F1=0.63（所有语言），并且对实体干扰具有更大的影响，相比于上下文干扰。

Abstract
We present MULTICONER V2, a dataset for fine-grained Named Entity Recognition covering 33 entity classes across 12 languages, in both monolingual and multilingual settings. This dataset aims to tackle the following practical challenges in NER: (i) effective handling of fine-grained classes that include complex entities like movie titles, and (ii) performance degradation due to noise generated from typing mistakes or OCR errors. The dataset is compiled from open resources like Wikipedia and Wikidata, and is publicly available. Evaluation based on the XLM-RoBERTa baseline highlights the unique challenges posed by MULTICONER V2: (i) the fine-grained taxonomy is challenging, where the scores are low with macro-F1=0.63 (across all languages), and (ii) the corruption strategy significantly impairs performance, with entity corruption resulting in 9% lower performance relative to non-entity corruptions across all languages. This highlights the greater impact of entity noise in contrast to context noise.

摘要
我们介绍MULTICONER V2 dataset，用于细化Named Entity Recognition（NER），覆盖12种语言和33种实体类。该dataset的目标是解决NER中的两个实际挑战：（i）精细类实体，如电影标题，的处理，（ii）由 typing mistakes 或 OCR errors 生成的噪声的影响。dataset 从开源资源Wikipedia和Wikidata中 compile，并公共可用。使用XLM-RoBERTa基eline进行评估，显示MULTICONER V2具有以下唯一挑战：（i）细化分类困难，macro-F1 = 0.63（所有语言），（ii）损害策略对性能产生显著影响，实体损害相对于非实体损害在所有语言上下降9%。这反映了实体噪声比 контекст噪声更大的影响。

Primacy Effect of ChatGPT

paper_url: http://arxiv.org/abs/2310.13206
repo_url: None
paper_authors: Yiwei Wang, Yujun Cai, Muhao Chen, Yuxuan Liang, Bryan Hooi
For: The paper studies the primacy effect of ChatGPT, which is the tendency of selecting the labels at earlier positions as the answer.* Methods: The paper uses ChatGPT to query the model with prompts containing the question and candidate labels, and analyzes the model’s decision-making process.* Results: The paper finds that ChatGPT’s decision is sensitive to the order of labels in the prompt, and the model has a higher chance of selecting the labels at earlier positions as the answer.Here’s the simplified Chinese text for the three information points:* For: 这篇论文研究了ChatGPT模型中的主导效应，即在提问中选择早些位置的标签作为答案。* Methods: 论文使用ChatGPT模型来查询提问和候选标签，并分析模型的决策过程。* Results: 论文发现ChatGPT的决策受提问中标签顺序的影响，模型更有可能选择提问中早些位置的标签作为答案。

Abstract
Instruction-tuned large language models (LLMs), such as ChatGPT, have led to promising zero-shot performance in discriminative natural language understanding (NLU) tasks. This involves querying the LLM using a prompt containing the question, and the candidate labels to choose from. The question-answering capabilities of ChatGPT arise from its pre-training on large amounts of human-written text, as well as its subsequent fine-tuning on human preferences, which motivates us to ask: Does ChatGPT also inherits humans' cognitive biases? In this paper, we study the primacy effect of ChatGPT: the tendency of selecting the labels at earlier positions as the answer. We have two main findings: i) ChatGPT's decision is sensitive to the order of labels in the prompt; ii) ChatGPT has a clearly higher chance to select the labels at earlier positions as the answer. We hope that our experiments and analyses provide additional insights into building more reliable ChatGPT-based solutions. We release the source code at https://github.com/wangywUST/PrimacyEffectGPT.

摘要
大型语言模型（LLM），如ChatGPT，在推导性自然语言理解（NLU）任务中表现出了可喜的零shot性。这 involves 使用提示符 containing 问题和候选标签，并让 LLM 回答问题。 ChatGPT 的问答能力来自于其在大量人类写的文本上进行预训练，以及其后续的人类偏好的 fine-tuning，这使我们想问： ChatGPT 也继承了人类的认知偏见吗？在这篇论文中，我们研究 ChatGPT 的 primacy effect：提示符中标签的顺序对 ChatGPT 的决策产生影响。我们有两个主要发现：i）ChatGPT 的决策受提示符中标签的顺序影响；ii）ChatGPT 对提示符中早期位置的标签有明显更高的选择概率。我们希望通过我们的实验和分析，为建立更可靠的 ChatGPT-based 解决方案提供更多的意见。我们在 GitHub 上发布了源代码，请参考。

Towards Detecting Contextual Real-Time Toxicity for In-Game Chat

paper_url: http://arxiv.org/abs/2310.18330
repo_url: None
paper_authors: Zachary Yang, Nicolas Grenan-Godbout, Reihaneh Rabbany
for: 这篇论文是为了实现在在线环境中实时探测毒性内容而写的。
methods: 这篇论文使用了一种简单可扩展的模型，该模型可以在实时聊天中可靠地检测毒性内容，并包括聊天历史和元数据。
results: 该模型在多player游戏中表现出色，可以成功地检测毒性内容，并且可以在聊天报告后进行后勤调节，成功标记82.1%的聊天报告用户，准确率为90.0%。

Abstract
Real-time toxicity detection in online environments poses a significant challenge, due to the increasing prevalence of social media and gaming platforms. We introduce ToxBuster, a simple and scalable model that reliably detects toxic content in real-time for a line of chat by including chat history and metadata. ToxBuster consistently outperforms conventional toxicity models across popular multiplayer games, including Rainbow Six Siege, For Honor, and DOTA 2. We conduct an ablation study to assess the importance of each model component and explore ToxBuster's transferability across the datasets. Furthermore, we showcase ToxBuster's efficacy in post-game moderation, successfully flagging 82.1% of chat-reported players at a precision level of 90.0%. Additionally, we show how an additional 6% of unreported toxic players can be proactively moderated.

摘要
实时恶意检测在在线环境中具有重要挑战，由于社交媒体和游戏平台的广泛使用。我们介绍ToxBuster，一种简单可扩展的模型，可靠地在实时中检测恶意内容，并包括交谈历史和元数据。ToxBuster在流行的多人游戏中，如雨丝六世、荣誉之战和DOTA 2等，consistently outperforms conventional toxicity models。我们进行了减少研究，以评估模型组件的重要性，并探索ToxBuster的可转移性。此外，我们展示了ToxBuster在后期 Moderation 中的效果，成功地标记了82.1%的交谈报告用户，准确率为90.0%。此外，我们还表明了一个额外的6%的恶意用户可以被早期 Moderation。

2023-10-20

Deep Learning Approaches for Dynamic Mechanical Analysis of Viscoelastic Fiber Composites

Evoke: Evoking Critical Thinking Abilities in LLMs via Reviewer-Author Prompt Editing

CNN-based Prediction of Partition Path for VVC Fast Inter Partitioning Using Motion Fields

GraphMaker: Can Diffusion Models Generate Large Attributed Graphs?

Prompt-Specific Poisoning Attacks on Text-to-Image Generative Models

FERI: A Multitask-based Fairness Achieving Algorithm with Applications to Fair Organ Transplantation

FATA-Trans: Field And Time-Aware Transformer for Sequential Tabular Data

Enhanced Low-Dimensional Sensing Mapless Navigation of Terrestrial Mobile Robots Using Double Deep Reinforcement Learning Techniques

RoseNet: Predicting Energy Metrics of Double InDel Mutants Using Deep Learning

Improving Molecular Properties Prediction Through Latent Space Fusion

Specific versus General Principles for Constitutional AI

Enhancing Illicit Activity Detection using XAI: A Multimodal Graph-LLM Framework

Fundamental Limits of Membership Inference Attacks on Machine Learning Models

Copyright Violations and Large Language Models

Neural-Base Music Generation for Intelligence Duplication

Optimizing Retrieval-augmented Reader Models via Token Elimination

Long-Form Speech Translation through Segmentation with Finite-State Decoding Constraints on Large Language Models

Let’s Synthesize Step by Step: Iterative Dataset Synthesis with Large Language Models by Extrapolating Errors from Small Models

ManifoldNeRF: View-dependent Image Feature Supervision for Few-shot Neural Radiance Fields

Automatic Unit Test Data Generation and Actor-Critic Reinforcement Learning for Code Synthesis

An experimental study for early diagnosing Parkinson’s disease using machine learning

Weighted Joint Maximum Mean Discrepancy Enabled Multi-Source-Multi-Target Unsupervised Domain Adaptation Fault Diagnosis

Contrastive Preference Learning: Learning from Human Feedback without RL

Hunayn: Elevating Translation Beyond the Literal

Make Your Decision Convincing! A Unified Two-Stage Framework: Self-Attribution and Decision-Making

MULTITuDE: Large-Scale Multilingual Machine-Generated Text Detection Benchmark

Skin Lesion Segmentation Improved by Transformer-based Networks with Inter-scale Dependency Modeling

MarineGPT: Unlocking Secrets of Ocean to the Public

Towards equilibrium molecular conformation generation with GFlowNets

ReLM: Leveraging Language Models for Enhanced Chemical Reaction Prediction

SPARE: A Single-Pass Neural Model for Relational Databases

Tree Search in DAG Space with Model-based Reinforcement Learning for Causal Discovery

Boosting Generalization with Adaptive Style Techniques for Fingerprint Liveness Detection

Retrieval-Augmented Neural Response Generation Using Logical Reasoning and Relevance Scoring

Reward Shaping for Happier Autonomous Cyber Security Agents

Self-prompted Chain-of-Thought on Large Language Models for Open-domain Multi-hop Reasoning

Towards Understanding Sycophancy in Language Models

ScaleLong: Towards More Stable Training of Diffusion Model via Scaling Network Long Skip Connection

Positive-Unlabeled Node Classification with Structure-aware Graph Learning

Technical Report for ICCV 2023 Visual Continual Learning Challenge: Continuous Test-time Adaptation for Semantic Segmentation

Design-Inclusive Language Models for Responsible Information Access

Variational measurement-based quantum computation for generative modeling

RaceLens: A Machine Intelligence-Based Application for Racing Photo Analysis

Explaining Interactions Between Text Spans

Robust Training for Conversational Question Answering Models with Reinforced Reformulation Generation

Analogical Proportions and Creativity: A Preliminary Study

Mind the instructions: a holistic evaluation of consistency and interactions in prompt-based learning

Application of deep learning for livestock behaviour recognition: A systematic literature review

Ask Language Model to Clean Your Noisy Translation Data

WordArt Designer: User-Driven Artistic Typography Synthesis using Large Language Models

Multiscale Superpixel Structured Difference Graph Convolutional Network for VL Representation

Self-Consistency of Large Language Models under Ambiguity

Random Matrix Analysis to Balance between Supervised and Unsupervised Learning under the Low Density Separation Assumption

FLTracer: Accurate Poisoning Attack Provenance in Federated Learning

AllTogether: Investigating the Efficacy of Spliced Prompt for Web Navigation using Large Language Models

A Novel Transfer Learning Method Utilizing Acoustic and Vibration Signals for Rotating Machinery Fault Diagnosis

POSQA: Probe the World Models of LLMs with Size Comparisons

Learning Successor Representations with Distributed Hebbian Temporal Memory

A Human-Robot Mutual Learning System with Affect-Grounded Language Acquisition and Differential Outcomes Training

VFedMH: Vertical Federated Learning for Training Multi-party Heterogeneous Models

Towards General Error Diagnosis via Behavioral Testing in Machine Translation

Bridging the Gap between Synthetic and Authentic Images for Multimodal Machine Translation

DYNAMITE: Dynamic Interplay of Mini-Batch Size and Aggregation Frequency for Federated Learning with Static and Streaming Dataset

NurViD: A Large Expert-Level Video Database for Nursing Procedure Activity Understanding

Challenges and Contributing Factors in the Utilization of Large Language Models (LLMs)

FLAIR: a Country-Scale Land Cover Semantic Segmentation Dataset From Multi-Source Optical Imagery

Democratizing Reasoning Ability: Tailored Learning from Large Language Model

Boosting for Bounding the Worst-class Error

Coarse-to-Fine Dual Encoders are Better Frame Identification Learners

Decoding the Silent Majority: Inducing Belief Augmented Social Graph with Large Language Model for Response Forecasting

PathRL: An End-to-End Path Generation Method for Collision Avoidance via Deep Reinforcement Learning

Assessing Privacy Risks in Language Models: A Case Study on Summarization Tasks

Unified Pretraining for Recommendation via Task Hypergraphs

Meta-learning of Physics-informed Neural Networks for Efficiently Solving Newly Given PDEs

An Exploratory Study on Simulated Annealing for Feature Selection in Learning-to-Rank

Enhancing drug and cell line representations via contrastive learning for improved anti-cancer drug prioritization

ManiCast: Collaborative Manipulation with Cost-Aware Human Forecasting

Visual Grounding Helps Learn Word Meanings in Low-Data Regimes

TempGNN: Temporal Graph Neural Networks for Dynamic Session-Based Recommendations

ToolChain: Efficient Action Space Navigation in Large Language Models with A Search