cs.AI - 2023-09-02

Neurosymbolic Reinforcement Learning and Planning: A Survey

  • paper_url: http://arxiv.org/abs/2309.01038
  • repo_url: None
  • paper_authors: K. Acharya, W. Raza, C. M. J. M. Dourado Jr, A. Velasquez, H. Song
  • for: To survey the development of Neurosymbolic Artificial Intelligence (Neurosymbolic AI), with particular focus on its sub-fields of Neurosymbolic Deep Learning (Neurosymbolic DL) and Neurosymbolic Reinforcement Learning (Neurosymbolic RL).
  • methods: A literature survey that categorizes and summarizes Neurosymbolic RL research into three taxonomies: Learning for Reasoning, Reasoning for Learning, and Learning-Reasoning, each further subdivided by application area.
  • results: The survey finds that Neurosymbolic RL research concentrates on three aspects: learning (e.g., methods and techniques for image recognition and natural language processing), reasoning (e.g., knowledge graphs and semantic reasoning), and decision-making (e.g., deep reinforcement learning and transfer learning).
    Abstract The area of Neurosymbolic Artificial Intelligence (Neurosymbolic AI) is rapidly developing and has become a popular research topic, encompassing sub-fields such as Neurosymbolic Deep Learning (Neurosymbolic DL) and Neurosymbolic Reinforcement Learning (Neurosymbolic RL). Compared to traditional learning methods, Neurosymbolic AI offers significant advantages by simplifying complexity and providing transparency and explainability. Reinforcement Learning (RL), a long-standing Artificial Intelligence (AI) concept that mimics human behavior using rewards and punishment, is a fundamental component of Neurosymbolic RL, a recent integration of the two fields that has yielded promising results. The aim of this paper is to contribute to the emerging field of Neurosymbolic RL by conducting a literature survey. Our evaluation focuses on the three components that constitute Neurosymbolic RL: neural, symbolic, and RL. We categorize works based on the role played by the neural and symbolic parts in RL, into three taxonomies: Learning for Reasoning, Reasoning for Learning, and Learning-Reasoning. These categories are further divided into sub-categories based on their applications. Furthermore, we analyze the RL components of each research work, including the state space, action space, policy module, and RL algorithm. Additionally, we identify research opportunities and challenges in various applications within this dynamic field.

Deep Deformable Models: Learning 3D Shape Abstractions with Part Consistency

  • paper_url: http://arxiv.org/abs/2309.01035
  • repo_url: None
  • paper_authors: Di Liu, Long Zhao, Qilong Zhangli, Yunhe Gao, Ting Liu, Dimitris N. Metaxas
  • for: This paper proposes a deep-learning method for accurately abstracting the shapes of natural objects, improving shape understanding and downstream applications.
  • methods: The method, Deep Deformable Models (DDMs), describes object shapes with global deformations and diffeomorphic local deformations, and can learn accurate part-level positions and scales.
  • results: Extensive experiments on the ShapeNet dataset demonstrate that DDMs achieve higher accuracy and part consistency in shape abstraction, outperforming state-of-the-art methods.
    Abstract The task of shape abstraction with semantic part consistency is challenging due to the complex geometries of natural objects. Recent methods learn to represent an object shape using a set of simple primitives to fit the target. However, in these methods, the primitives used do not always correspond to real parts or lack geometric flexibility for semantic interpretation. In this paper, we investigate salient and efficient primitive descriptors for accurate shape abstractions, and propose Deep Deformable Models (DDMs). DDM employs global deformations and diffeomorphic local deformations. These properties enable DDM to abstract complex object shapes with significantly fewer primitives that offer broader geometry coverage and finer details. DDM is also capable of learning part-level semantic correspondences due to the differentiable and invertible properties of our primitive deformation. Moreover, DDM learning formulation is based on dynamic and kinematic modeling, which enables joint regularization of each sub-transformation during primitive fitting. Extensive experiments on ShapeNet demonstrate that DDM outperforms the state-of-the-art in terms of reconstruction and part consistency by a notable margin.

Explainability for Large Language Models: A Survey

  • paper_url: http://arxiv.org/abs/2309.01029
  • repo_url: https://github.com/Aryia-Behroziuan/Other-sources
  • paper_authors: Haiyan Zhao, Hanjie Chen, Fan Yang, Ninghao Liu, Huiqi Deng, Hengyi Cai, Shuaiqiang Wang, Dawei Yin, Mengnan Du
  • for: This survey aims to understand and explain the internal mechanisms of large language models (LLMs), elucidating their behaviors, limitations, and social impacts.
  • methods: The paper introduces a taxonomy of explainability techniques for Transformer-based language models, organized by training paradigm: the traditional fine-tuning paradigm and the prompting paradigm.
  • results: It provides a structured overview of explanation techniques for Transformer-based language models, along with metrics for evaluating generated explanations and ways to use explanations to debug models and improve performance.
    Abstract Large language models (LLMs) have demonstrated impressive capabilities in natural language processing. However, their internal mechanisms are still unclear and this lack of transparency poses unwanted risks for downstream applications. Therefore, understanding and explaining these models is crucial for elucidating their behaviors, limitations, and social impacts. In this paper, we introduce a taxonomy of explainability techniques and provide a structured overview of methods for explaining Transformer-based language models. We categorize techniques based on the training paradigms of LLMs: traditional fine-tuning-based paradigm and prompting-based paradigm. For each paradigm, we summarize the goals and dominant approaches for generating local explanations of individual predictions and global explanations of overall model knowledge. We also discuss metrics for evaluating generated explanations, and discuss how explanations can be leveraged to debug models and improve performance. Lastly, we examine key challenges and emerging opportunities for explanation techniques in the era of LLMs in comparison to conventional machine learning models.
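As a concrete illustration of the local-explanation methods the survey covers, occlusion-style feature importance can be sketched in a few lines: each token's importance is the score drop when it is removed. The scoring function below is a toy stand-in for a real language model, not anything from the paper:

```python
def toy_sentiment_score(tokens):
    """Stand-in classifier: fraction of tokens that are 'positive' words."""
    positive = {"great", "good", "excellent", "love"}
    return sum(t in positive for t in tokens) / max(len(tokens), 1)

def occlusion_importance(tokens, score_fn):
    """Importance of token i = score(full input) - score(input without token i)."""
    base = score_fn(tokens)
    return {
        i: base - score_fn(tokens[:i] + tokens[i + 1:])
        for i in range(len(tokens))
    }

tokens = ["the", "movie", "was", "great"]
scores = occlusion_importance(tokens, toy_sentiment_score)
top = max(scores, key=scores.get)
print(tokens[top])  # prints "great": removing it hurts the score the most
```

The same leave-one-out loop works for any black-box scorer, which is why occlusion is a common baseline for the fine-tuning paradigm's local explanations.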

Zero-Shot Recommendations with Pre-Trained Large Language Models for Multimodal Nudging

  • paper_url: http://arxiv.org/abs/2309.01026
  • repo_url: https://github.com/paxnea/wain23
  • paper_authors: Rachel Harrison, Anton Dereventsov, Anton Bibin
  • for: This paper proposes a zero-shot recommendation method for multimodal non-stationary content that leverages recent advancements in generative AI.
  • methods: The method renders inputs of different modalities as textual descriptions, uses a pre-trained LLM to obtain their numerical representations, and recommends by computing similarity between them.
  • results: In a synthetic multimodal nudging environment, the method accurately recommends multimodal content items without any additional learning.
    Abstract We present a method for zero-shot recommendation of multimodal non-stationary content that leverages recent advancements in the field of generative AI. We propose rendering inputs of different modalities as textual descriptions and to utilize pre-trained LLMs to obtain their numerical representations by computing semantic embeddings. Once unified representations of all content items are obtained, the recommendation can be performed by computing an appropriate similarity metric between them without any additional learning. We demonstrate our approach on a synthetic multimodal nudging environment, where the inputs consist of tabular, textual, and visual data.
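The pipeline the abstract describes — render each item as text, embed, rank by similarity — can be sketched as follows. The bag-of-words `embed` is a deliberately simple stand-in for a pre-trained LLM's semantic embeddings; all names here are illustrative, not from the paper's code:

```python
import math
from collections import Counter

def embed(text):
    """Toy embedding: bag-of-words counts. A real system would call a
    pre-trained LLM / sentence encoder here; the ranking logic is unchanged."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def recommend(user_text, item_texts, k=1):
    """Rank items by similarity of their textual descriptions to the user's."""
    u = embed(user_text)
    ranked = sorted(item_texts, key=lambda t: cosine(u, embed(t)), reverse=True)
    return ranked[:k]

items = ["spicy thai noodle recipe video",
         "morning yoga stretching routine",
         "guided meditation for better sleep"]
print(recommend("relaxing bedtime meditation audio", items))
```

Because no parameters are fit, the same code handles tabular or visual items once they are rendered as text, which is exactly what makes the approach zero-shot.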

Mitigating Motion Blur for Robust 3D Baseball Player Pose Modeling for Pitch Analysis

  • paper_url: http://arxiv.org/abs/2309.01010
  • repo_url: None
  • paper_authors: Jerrin Bright, Yuhao Chen, John Zelek
  • for: Analyzing baseball pitchers and reducing injury risk.
  • methods: Computer-vision-based pose analysis, with synthetic data augmentation to improve the model's ability to handle fast, motion-blurred actions.
  • results: Improved robustness of pose estimation to fast athletic motion, with stable performance across varied real-world conditions and camera positions.
    Abstract Using videos to analyze pitchers in baseball can play a vital role in strategizing and injury prevention. Computer vision-based pose analysis offers a time-efficient and cost-effective approach. However, the use of accessible broadcast videos, with a 30fps framerate, often results in partial body motion blur during fast actions, limiting the performance of existing pose keypoint estimation models. Previous works have primarily relied on fixed backgrounds, assuming minimal motion differences between frames, or utilized multiview data to address this problem. To this end, we propose a synthetic data augmentation pipeline to enhance the model's capability to deal with the pitcher's blurry actions. In addition, we leverage in-the-wild videos to make our model robust under different real-world conditions and camera positions. By carefully optimizing the augmentation parameters, we observed a notable reduction in the loss by 54.2% and 36.2% on the test dataset for 2D and 3D pose estimation respectively. By applying our approach to existing state-of-the-art pose estimators, we demonstrate an average improvement of 29.2%. The findings highlight the effectiveness of our method in mitigating the challenges posed by motion blur, thereby enhancing the overall quality of pose estimation.
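The kind of synthetic blur augmentation described above can be approximated with a simple directional averaging filter. This sketch (function names and parameters are our own, not the paper's pipeline) blurs a frame horizontally by averaging shifted copies:

```python
import numpy as np

def horizontal_motion_blur(img, length=3):
    """Crude horizontal motion blur: average `length` shifted copies of a
    single-channel frame, replicating the left edge instead of wrapping."""
    out = np.zeros_like(img, dtype=float)
    for s in range(length):
        shifted = np.roll(img, s, axis=1)
        if s > 0:
            shifted[:, :s] = img[:, :1]  # edge replication, not wrap-around
        out += shifted
    return out / length

rng = np.random.default_rng(0)
frame = rng.random((8, 8))          # stand-in for a grayscale video frame
blurred = horizontal_motion_blur(frame, length=3)
```

Applying such a transform to sharp synthetic renders lets a pose model see training examples that resemble the partial blur of 30fps broadcast footage.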

Sequential Dexterity: Chaining Dexterous Policies for Long-Horizon Manipulation

  • paper_url: http://arxiv.org/abs/2309.00987
  • repo_url: None
  • paper_authors: Yuanpei Chen, Chen Wang, Li Fei-Fei, C. Karen Liu
  • for: This work addresses the challenges of the high-dimensional action space of dexterous hands and the complex compositional dynamics of long-horizon tasks, proposing a reinforcement learning (RL)-based system that chains multiple dexterous policies to achieve long-horizon task goals.
  • methods: The system uses RL together with a transition feasibility function that progressively fine-tunes sub-policies to improve chaining success, while also enabling autonomous policy switching to recover from failures and bypass redundant stages.
  • results: Although trained in simulation with only a few task objects, the system adapts to novel objects and zero-shot transfers to a real-world robot equipped with a dexterous hand. More details and video results are available at https://sequential-dexterity.github.io
    Abstract Many real-world manipulation tasks consist of a series of subtasks that are significantly different from one another. Such long-horizon, complex tasks highlight the potential of dexterous hands, which possess adaptability and versatility, capable of seamlessly transitioning between different modes of functionality without the need for re-grasping or external tools. However, the challenges arise due to the high-dimensional action space of dexterous hand and complex compositional dynamics of the long-horizon tasks. We present Sequential Dexterity, a general system based on reinforcement learning (RL) that chains multiple dexterous policies for achieving long-horizon task goals. The core of the system is a transition feasibility function that progressively finetunes the sub-policies for enhancing chaining success rate, while also enables autonomous policy-switching for recovery from failures and bypassing redundant stages. Despite being trained only in simulation with a few task objects, our system demonstrates generalization capability to novel object shapes and is able to zero-shot transfer to a real-world robot equipped with a dexterous hand. More details and video results could be found at https://sequential-dexterity.github.io

Compositional Diffusion-Based Continuous Constraint Solvers

  • paper_url: http://arxiv.org/abs/2309.00966
  • repo_url: https://github.com/diffusion-ccsp/diffusion-ccsp.github.io
  • paper_authors: Zhutian Yang, Jiayuan Mao, Yilun Du, Jiajun Wu, Joshua B. Tenenbaum, Tomás Lozano-Pérez, Leslie Pack Kaelbling
  • for: This paper proposes a method for learning to solve continuous constraint satisfaction problems (CCSPs) in robotic reasoning and planning.
  • methods: The proposed compositional diffusion continuous constraint solver (Diffusion-CCSP) represents CCSPs as factor graphs and combines the energies of diffusion models trained to sample for individual constraint types.
  • results: Diffusion-CCSP demonstrates strong generalization to novel combinations of known constraints, and can be integrated into a task and motion planner to devise long-horizon plans that include actions with both discrete and continuous parameters.
    Abstract This paper introduces an approach for learning to solve continuous constraint satisfaction problems (CCSP) in robotic reasoning and planning. Previous methods primarily rely on hand-engineering or learning generators for specific constraint types and then rejecting the value assignments when other constraints are violated. By contrast, our model, the compositional diffusion continuous constraint solver (Diffusion-CCSP) derives global solutions to CCSPs by representing them as factor graphs and combining the energies of diffusion models trained to sample for individual constraint types. Diffusion-CCSP exhibits strong generalization to novel combinations of known constraints, and it can be integrated into a task and motion planner to devise long-horizon plans that include actions with both discrete and continuous parameters. Project site: https://diffusion-ccsp.github.io/
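The core idea — compose per-constraint energies and minimize their sum to get a global solution — can be sketched with hand-written quadratic energies standing in for the learned diffusion models (the real system samples from trained models; everything below is an illustrative assumption):

```python
def grad_total(x, y):
    """Gradient of the summed energy of two toy constraints:
    constraint 1: x should be near 2.0   -> E1 = (x - 2)^2
    constraint 2: x + y should equal 5.0 -> E2 = (x + y - 5)^2"""
    g1x = 2 * (x - 2.0)
    g2 = 2 * (x + y - 5.0)
    return g1x + g2, g2  # (dE/dx, dE/dy)

# Gradient descent on the composed energy finds a jointly satisfying assignment.
x, y = 0.0, 0.0
for _ in range(500):
    gx, gy = grad_total(x, y)
    x -= 0.05 * gx
    y -= 0.05 * gy
print(round(x, 2), round(y, 2))  # converges to x = 2.0, y = 3.0
```

Because each energy term is independent, a new combination of known constraints needs no retraining — only a new sum — which mirrors the compositionality claim in the abstract.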

eDKM: An Efficient and Accurate Train-time Weight Clustering for Large Language Models

  • paper_url: http://arxiv.org/abs/2309.00964
  • repo_url: None
  • paper_authors: Minsik Cho, Keivan A. Vahid, Qichen Fu, Saurabh Adya, Carlo C Del Mundo, Mohammad Rastegari, Devang Naik, Peter Zatloukal
  • for: This paper proposes an efficient train-time weight-clustering method for compressing large language models so they fit on storage-limited mobile devices, enabling faster responses and better privacy protection.
  • methods: The method uses weight clustering, a form of non-linear quantization, with a memory-efficient implementation of Differentiable KMeans Clustering (eDKM) based on novel uniquification and sharding techniques.
  • results: Experiments show the method compresses a pretrained LLaMA 7B model from 12.6 GB to 2.5 GB (3 bit/weight) while retaining good accuracy on broad LLM benchmarks (e.g., 77.7% on PIQA and 66.1% on Winograde).
    Abstract Since Large Language Models or LLMs have demonstrated high-quality performance on many complex language tasks, there is a great interest in bringing these LLMs to mobile devices for faster responses and better privacy protection. However, the size of LLMs (i.e., billions of parameters) requires highly effective compression to fit into storage-limited devices. Among many compression techniques, weight-clustering, a form of non-linear quantization, is one of the leading candidates for LLM compression, and supported by modern smartphones. Yet, its training overhead is prohibitively significant for LLM fine-tuning. Especially, Differentiable KMeans Clustering, or DKM, has shown the state-of-the-art trade-off between compression ratio and accuracy regression, but its large memory complexity makes it nearly impossible to apply to train-time LLM compression. In this paper, we propose a memory-efficient DKM implementation, eDKM powered by novel techniques to reduce the memory footprint of DKM by orders of magnitudes. For a given tensor to be saved on CPU for the backward pass of DKM, we compressed the tensor by applying uniquification and sharding after checking if there is no duplicated tensor previously copied to CPU. Our experimental results demonstrate that eDKM can fine-tune and compress a pretrained LLaMA 7B model from 12.6 GB to 2.5 GB (3bit/weight) with the Alpaca dataset by reducing the train-time memory footprint of a decoder layer by 130$\times$, while delivering good accuracy on broader LLM benchmarks (i.e., 77.7% for PIQA, 66.1% for Winograde, and so on).
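The uniquification step described in the abstract — skip the CPU copy when an identical tensor was already saved for the backward pass — can be sketched as a content-addressed cache. Tensors are modeled as `bytes` here and all names are illustrative assumptions, not the paper's implementation:

```python
import hashlib

class CpuTensorCache:
    """Deduplicating store: identical tensors are copied to 'CPU' only once;
    later saves return a lightweight handle instead of a second copy."""

    def __init__(self):
        self._store = {}

    def save(self, tensor_bytes):
        key = hashlib.sha256(tensor_bytes).hexdigest()
        if key not in self._store:      # copy only the first occurrence
            self._store[key] = tensor_bytes
        return key                      # handle for the backward pass

    def load(self, key):
        return self._store[key]

cache = CpuTensorCache()
a = cache.save(b"\x01\x02\x03" * 1000)
b = cache.save(b"\x01\x02\x03" * 1000)  # duplicate: no second copy stored
print(a == b, len(cache._store))        # same handle, one stored copy
```

Sharding (splitting each saved tensor across workers) would layer on top of this; together the two tricks are what shrink the per-layer train-time footprint.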

Visual-Kinematics Graph Learning for Procedure-agnostic Instrument Tip Segmentation in Robotic Surgeries

  • paper_url: http://arxiv.org/abs/2309.00957
  • repo_url: None
  • paper_authors: Jiaqi Liu, Yonghao Long, Kai Chen, Cheuk Hei Leung, Zerui Wang, Qi Dou
  • for: precisely segmenting surgical instrument tips to enable downstream applications in robotic surgery, such as skill assessment, tool-tissue interaction, and deformation modeling, as well as surgical autonomy.
  • methods: a novel visual-kinematics graph learning framework that encodes relational features of instrument parts from both image and kinematics, and a cross-modal contrastive loss to incorporate robust geometric prior from kinematics to image for tip segmentation.
  • results: the proposed multi-modal segmentation method significantly outperformed current image-based state-of-the-art approaches, exceeding averagely 11.2% on Dice, on a private paired visual-kinematics dataset including multiple procedures.
    Abstract Accurate segmentation of surgical instrument tip is an important task for enabling downstream applications in robotic surgery, such as surgical skill assessment, tool-tissue interaction and deformation modeling, as well as surgical autonomy. However, this task is very challenging due to the small sizes of surgical instrument tips, and significant variance of surgical scenes across different procedures. Although much effort has been made on visual-based methods, existing segmentation models still suffer from low robustness thus not usable in practice. Fortunately, kinematics data from the robotic system can provide reliable prior for instrument location, which is consistent regardless of different surgery types. To make use of such multi-modal information, we propose a novel visual-kinematics graph learning framework to accurately segment the instrument tip given various surgical procedures. Specifically, a graph learning framework is proposed to encode relational features of instrument parts from both image and kinematics. Next, a cross-modal contrastive loss is designed to incorporate robust geometric prior from kinematics to image for tip segmentation. We have conducted experiments on a private paired visual-kinematics dataset including multiple procedures, i.e., prostatectomy, total mesorectal excision, fundoplication and distal gastrectomy on cadaver, and distal gastrectomy on porcine. The leave-one-procedure-out cross validation demonstrated that our proposed multi-modal segmentation method significantly outperformed current image-based state-of-the-art approaches, exceeding averagely 11.2% on Dice.

Bridge Diffusion Model: bridge non-English language-native text-to-image diffusion model with English communities

  • paper_url: http://arxiv.org/abs/2309.00952
  • repo_url: None
  • paper_authors: Shanyuan Liu, Dawei Leng, Yuhui Yin
  • for: This work proposes a model structure that preserves non-English native language characteristics while remaining compatible with English TTI communities, addressing the model bias introduced by English-centric training data.
  • methods: The structure, called the "Bridge Diffusion Model" (BDM), uses a backbone-branch network to learn non-English semantics while keeping its latent space compatible with the English-native TTI backbone.
  • results: Experiments show that BDM not only generates images that accurately express non-English semantics, but is also compatible with various English-native TTI plugins such as different checkpoints, LoRA, ControlNet, and Dreambooth. BDM can also combine non-English-native and English-native semantics within a single image, fostering cultural interaction.
    Abstract Text-to-Image generation (TTI) technologies are advancing rapidly, especially in the English language communities. However, English-native TTI models inherently carry biases from English world centric training data, which creates a dilemma for development of other language-native TTI models. One common choice is fine-tuning the English-native TTI model with translated samples from non-English communities. It falls short of fully addressing the model bias problem. Alternatively, training non-English language native models from scratch can effectively resolve the English world bias, but diverges from the English TTI communities, thus not able to utilize the strides continuously gaining in the English TTI communities any more. To build a non-English language-native TTI model while keeping compatibility with the English TTI communities, we propose a novel model structure referred to as "Bridge Diffusion Model" (BDM). The proposed BDM employs a backbone-branch network structure to learn the non-English language semantics while keeping the latent space compatible with the English-native TTI backbone, in an end-to-end manner. The unique advantages of the proposed BDM are that it's not only adept at generating images that precisely depict non-English language semantics, but also compatible with various English-native TTI plugins, such as different checkpoints, LoRA, ControlNet, Dreambooth, and Textual Inversion, etc. Moreover, BDM can concurrently generate content seamlessly combining both non-English native and English-native semantics within a single image, fostering cultural interaction. We verify our method by applying BDM to build a Chinese-native TTI model, whereas the method is generic and applicable to any other language.

From Specific to Generic Learned Sorted Set Dictionaries: A Theoretically Sound Paradigm Yelding Competitive Data Structural Boosters in Practice

  • paper_url: http://arxiv.org/abs/2309.00946
  • repo_url: https://github.com/globosco/An-implementation-of-Generic-Learned-Static-Sorted-Sets-Dictionaries
  • paper_authors: Domenico Amato, Giosué Lo Bosco, Raffaele Giancarlo
  • for: This research concerns Learned Data Structures, a recent area at the crossroads of machine learning and classic data structures, of methodological importance and strong practical impact.
  • methods: The focus is on Learned Indexes, i.e., Learned Sorted Set Dictionaries. Existing proposals are specific: they can impressively boost the time performance of table search procedures, but only for sorted layouts such as Binary Search. The paper proposes a new paradigm that complements the known specialized ones and can turn any Sorted Set Dictionary into a Learned version.
  • results: Several results of interest are obtained, including (a) the first Learned Optimum Binary Search Forest, with mean access time bounded by the entropy of the distribution of accesses to the dictionary; and (b) the first Learned Sorted Set Dictionary that, in the dynamic case and under amortized analysis, matches the time bounds known for classic dictionaries, under widely accepted assumptions on the universe size. The experimental part, relatively complex in terms of software development, shows the notable finding that the proposed generalization yields effective and competitive learned data-structural boosters, even against specific benchmark models.
    Abstract This research concerns Learned Data Structures, a recent area that has emerged at the crossroad of Machine Learning and Classic Data Structures. It is methodologically important and with a high practical impact. We focus on Learned Indexes, i.e., Learned Sorted Set Dictionaries. The proposals available so far are specific in the sense that they can boost, indeed impressively, the time performance of Table Search Procedures with a sorted layout only, e.g., Binary Search. We propose a novel paradigm that, complementing known specialized ones, can produce Learned versions of any Sorted Set Dictionary, for instance, Balanced Binary Search Trees or Binary Search on layouts other than sorted, i.e., Eytzinger. Theoretically, based on it, we obtain several results of interest, such as (a) the first Learned Optimum Binary Search Forest, with mean access time bounded by the Entropy of the probability distribution of the accesses to the Dictionary; (b) the first Learned Sorted Set Dictionary that, in the Dynamic Case and in an amortized analysis setting, matches the same time bounds known for Classic Dictionaries. This latter under widely accepted assumptions regarding the size of the Universe. The experimental part, somewhat complex in terms of software development, clearly indicates the nonobvious finding that the generalization we propose can yield effective and competitive Learned Data Structural Boosters, even with respect to specific benchmark models.
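The general recipe behind learned sorted set dictionaries — predict a key's position with a model, then correct within a provable error window — can be sketched as follows. The linear model is purely for illustration (the paper's constructions are far more general), and all names are our own:

```python
import bisect

class LearnedSortedSet:
    """Minimal learned index over a sorted array: a linear model predicts a
    key's position; a bounded binary search corrects the prediction."""

    def __init__(self, keys):
        self.keys = sorted(keys)
        n = len(self.keys)
        lo, hi = self.keys[0], self.keys[-1]
        self.slope = (n - 1) / (hi - lo) if hi > lo else 0.0
        self.lo = lo
        # Maximum prediction error over the data fixes the search window.
        self.eps = max(abs(self._predict(k) - i) for i, k in enumerate(self.keys))

    def _predict(self, key):
        return int(self.slope * (key - self.lo))

    def __contains__(self, key):
        p = self._predict(key)
        a = max(0, p - self.eps)
        b = min(len(self.keys), p + self.eps + 1)
        i = bisect.bisect_left(self.keys, key, a, b)
        return i < len(self.keys) and self.keys[i] == key

s = LearnedSortedSet([3, 7, 9, 20, 21, 22, 40])
print(21 in s, 8 in s)  # found / not found
```

The binary search here runs over a window of width O(eps) rather than the whole array, which is exactly how a learned model "boosts" a classic search procedure.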

  • paper_url: http://arxiv.org/abs/2309.00944
  • repo_url: None
  • paper_authors: Soumya Parekh, Jay Patel
  • for: This study aims to help companies obtain better media coverage when releasing products, increasing product reach and audience engagement.
  • methods: It uses natural language processing and machine learning to recommend journalists suited to cover a given press release, reducing the time and effort companies spend maintaining media contact lists and screening journalists.
  • results: The study finds that NLP and machine-learning techniques can quickly and accurately recommend suitable journalists, improving the media impact of product releases.
    Abstract Slating a product for release often involves pitching journalists to run stories on your press release. Good media coverage often ensures greater product reach and drives audience engagement for those products. Hence, ensuring that those releases are pitched to the right journalists with relevant interests is crucial, since they receive several pitches daily. Keeping up with journalist beats and curating a media contacts list is often a huge and time-consuming task. This study proposes a model to automate and expedite the process by recommending suitable journalists to run media coverage on the press releases provided by the user.

Content Prompting: Modeling Content Provider Dynamics to Improve User Welfare in Recommender Ecosystems

  • paper_url: http://arxiv.org/abs/2309.00940
  • repo_url: None
  • paper_authors: Siddharth Prasad, Martin Mladenov, Craig Boutilier
  • for: This paper addresses the information asymmetry that limits content providers' view of user needs, helping providers supply content that better serves the broader population.
  • methods: It adopts content prompting policies: hints or suggestions to providers to make available novel content for which the recommender system predicts unmet user demand. A sequential prompting policy responds to the dynamics of a provider's beliefs, skills, and incentives.
  • results: Through an abstract model and theoretical analysis, the paper shows such prompting policies can optimize user social welfare in equilibrium while respecting provider incentives; simple proof-of-concept experiments illustrate improved ecosystem health and user welfare.
    Abstract Users derive value from a recommender system (RS) only to the extent that it is able to surface content (or items) that meet their needs/preferences. While RSs often have a comprehensive view of user preferences across the entire user base, content providers, by contrast, generally have only a local view of the preferences of users that have interacted with their content. This limits a provider's ability to offer new content to best serve the broader population. In this work, we tackle this information asymmetry with content prompting policies. A content prompt is a hint or suggestion to a provider to make available novel content for which the RS predicts unmet user demand. A prompting policy is a sequence of such prompts that is responsive to the dynamics of a provider's beliefs, skills and incentives. We aim to determine a joint prompting policy that induces a set of providers to make content available that optimizes user social welfare in equilibrium, while respecting the incentives of the providers themselves. Our contributions include: (i) an abstract model of the RS ecosystem, including content provider behaviors, that supports such prompting; (ii) the design and theoretical analysis of sequential prompting policies for individual providers; (iii) a mixed integer programming formulation for optimal joint prompting using path planning in content space; and (iv) simple, proof-of-concept experiments illustrating how such policies improve ecosystem health and user welfare.

Deep supervised hashing for fast retrieval of radio image cubes

  • paper_url: http://arxiv.org/abs/2309.00932
  • repo_url: None
  • paper_authors: Steven Ndung’u, Trienko Grobler, Stefan J. Wijnholds, Dimka Karastoyanova, George Azzopardi
  • for: Next-generation radio surveys will result in a large number of serendipitous discoveries, and deep hashing algorithms can be used to efficiently search and retrieve similar images in a large database.
  • methods: The paper uses deep hashing algorithms for image retrieval tasks in astronomy, specifically using the Hamming distance between the binary hash of the query image and those of the reference images in the database.
  • results: The experimental results achieved a precision of 88.5% using the mean average precision (mAP) metric, demonstrating the capability to search and retrieve similar radio images efficiently and at scale.
    Abstract The sheer number of sources that will be detected by next-generation radio surveys will be astronomical, which will result in serendipitous discoveries. Data-dependent deep hashing algorithms have been shown to be efficient at image retrieval tasks in the fields of computer vision and multimedia. However, there are limited applications of these methodologies in the field of astronomy. In this work, we utilize deep hashing to rapidly search for similar images in a large database. The experiment uses a balanced dataset of 2708 samples consisting of four classes: Compact, FRI, FRII, and Bent. The performance of the method was evaluated using the mean average precision (mAP) metric, where a precision of 88.5% was achieved. The experimental results demonstrate the capability to search and retrieve similar radio images efficiently and at scale. The retrieval is based on the Hamming distance between the binary hash of the query image and those of the reference images in the database.
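The retrieval mechanism described in the abstract, ranking reference images by the Hamming distance between binary hashes, can be sketched as follows; the hash codes here are random stand-ins for the output of a trained deep hashing network, and the 64-bit code width is an assumption:

```python
import numpy as np

def hamming_distance(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    # a: (n_bits,) query hash; b: (n_db, n_bits) database hashes; both binary {0, 1}.
    return np.count_nonzero(a != b, axis=1)

def retrieve(query_hash: np.ndarray, db_hashes: np.ndarray, top_k: int = 5):
    # Rank database entries by Hamming distance to the query (ascending).
    dists = hamming_distance(query_hash, db_hashes)
    order = np.argsort(dists, kind="stable")
    return order[:top_k], dists[order[:top_k]]

rng = np.random.default_rng(0)
db = rng.integers(0, 2, size=(2708, 64))   # 2708 samples, as in the paper; 64-bit codes assumed
query = db[42].copy()                      # a query identical to database item 42
idx, d = retrieve(query, db, top_k=3)
assert d[0] == 0 and 42 in idx             # the exact match is retrieved at distance 0
```

In practice the binary codes come from the trained network's hash layer, not from random draws.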

Studying the impacts of pre-training using ChatGPT-generated text on downstream tasks

  • paper_url: http://arxiv.org/abs/2309.05668
  • repo_url: None
  • paper_authors: Sarthak Anand
  • for: This study investigates the influence of artificial (LLM-generated) text in the pre-training phase of language models.
  • methods: We conduct a comparative analysis of two language models, RoBERTa and ChatGPT, evaluating their performance on three downstream tasks and also assessing the models' gender bias.
  • results: Our experiments show that using artificial text during pre-training has no significant impact on downstream-task performance and does not increase the models' gender bias.
    Abstract In recent times, significant advancements have been witnessed in the field of language models, particularly with the emergence of Large Language Models (LLMs) that are trained on vast amounts of data extracted from internet archives. These LLMs, such as ChatGPT, have become widely accessible, allowing users to generate text for various purposes including articles, essays, jokes, and poetry. Given that LLMs are trained on a diverse range of text sources, encompassing platforms like Reddit and Twitter, it is foreseeable that future training datasets will also incorporate text generated by previous iterations of the models themselves. In light of this development, our research aims to investigate the influence of artificial text in the pre-training phase of language models. Specifically, we conducted a comparative analysis between a language model, RoBERTa, pre-trained using CNN/DailyMail news articles, and ChatGPT, which employed the same articles for its training and evaluated their performance on three downstream tasks as well as their potential gender bias, using sentiment analysis as a metric. Through a series of experiments, we demonstrate that the utilization of artificial text during pre-training does not have a significant impact on either the performance of the models in downstream tasks or their gender bias. In conclusion, our findings suggest that the inclusion of text generated by LLMs in their own pre-training process does not yield substantial effects on the subsequent performance of the models in downstream tasks or their potential gender bias.
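The gender-bias evaluation via sentiment analysis mentioned above can be probed with a simple template-based sketch; the lexicon scorer below is a toy stand-in for a real sentiment model, and the templates and word lists are illustrative:

```python
# Hedged sketch: compare sentiment scores for gendered sentence templates.
# `sentiment_score` is a toy lexicon-based stand-in for a real sentiment model.
POSITIVE = {"brilliant", "kind", "successful"}
NEGATIVE = {"careless", "rude", "hostile"}

def sentiment_score(sentence: str) -> float:
    words = sentence.lower().replace(".", "").split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

def gender_gap(templates: list[str], adjectives: list[str]) -> float:
    # Mean sentiment difference between "He" and "She" completions.
    gaps = []
    for t in templates:
        for adj in adjectives:
            gaps.append(sentiment_score(t.format(pron="He", adj=adj))
                        - sentiment_score(t.format(pron="She", adj=adj)))
    return sum(gaps) / len(gaps)

templates = ["{pron} is a {adj} engineer.", "{pron} was {adj} at work."]
gap = gender_gap(templates, ["brilliant", "careless"])
assert gap == 0.0   # a gap near 0 indicates no measured bias for this toy scorer
```

A real evaluation would replace the lexicon with the pre-trained models under study and aggregate over many templates.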

Knowledge Graph Embeddings for Multi-Lingual Structured Representations of Radiology Reports

  • paper_url: http://arxiv.org/abs/2309.00917
  • repo_url: https://github.com/tjvsonsbeek/knowledge_graphs_for_radiology_reports
  • paper_authors: Tom van Sonsbeek, Xiantong Zhen, Marcel Worring
  • for: This work develops a lightweight graph-based embedding method for better understanding and analysis of radiology reports.
  • methods: The method exploits the structure and composition of reports and connects the medical terms they contain through the multi-lingual SNOMED CT knowledge base, yielding graph embeddings with improved interpretability and cross-lingual transfer.
  • results: The graph embeddings better capture relationships among clinical terms; on disease classification and image classification they are competitive with or outperform BERT-based models while being far smaller in model size and training-data requirements, and the method works across different languages.
    Abstract The way we analyse clinical texts has undergone major changes over the last years. The introduction of language models such as BERT led to adaptations for the (bio)medical domain like PubMedBERT and ClinicalBERT. These models rely on large databases of archived medical documents. While performing well in terms of accuracy, both the lack of interpretability and limitations to transfer across languages limit their use in clinical settings. We introduce a novel light-weight graph-based embedding method specifically catering to radiology reports. It takes into account the structure and composition of the report, while also connecting medical terms in the report through the multi-lingual SNOMED Clinical Terms knowledge base. The resulting graph embedding uncovers the underlying relationships among clinical terms, achieving a representation that is better understandable for clinicians and clinically more accurate, without reliance on large pre-training datasets. We show the use of this embedding on two tasks, namely disease classification of X-ray reports and image classification. For disease classification our model is competitive with its BERT-based counterparts, while being magnitudes smaller in size and training data requirements. For image classification, we show the effectiveness of the graph embedding leveraging cross-modal knowledge transfer and show how this method is usable across different languages.
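A minimal sketch of the underlying idea, connecting report terms through a shared clinical-concept map and embedding them over the resulting graph: the concept map below is a hypothetical stand-in for SNOMED CT (the concept ids are not real codes), and the one-step neighborhood averaging stands in for the paper's actual embedding procedure:

```python
import numpy as np

# Hedged sketch: report terms sharing a clinical concept are linked in a graph,
# and embeddings mix each term with its concept neighbors.
concept_map = {                      # term -> shared concept id (illustrative, not real SNOMED codes)
    "opacity": "C1", "consolidation": "C1",
    "cardiomegaly": "C2", "enlarged heart": "C2",
}
terms = sorted(concept_map)
index = {t: i for i, t in enumerate(terms)}

# Adjacency: terms sharing a concept are connected.
A = np.zeros((len(terms), len(terms)))
for t1 in terms:
    for t2 in terms:
        if t1 != t2 and concept_map[t1] == concept_map[t2]:
            A[index[t1], index[t2]] = 1.0

# One-step neighborhood-averaging embedding over random initial features.
rng = np.random.default_rng(0)
X = rng.normal(size=(len(terms), 8))
deg = A.sum(axis=1, keepdims=True) + 1.0
emb = (X + A @ X) / deg              # each term mixed with its concept neighbors

# In this toy graph, terms sharing a concept end up with identical embeddings.
assert np.allclose(emb[index["opacity"]], emb[index["consolidation"]])
```

The same mechanism works regardless of the report's language, since terms are linked through concepts rather than surface strings.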

A 3D explainability framework to uncover learning patterns and crucial sub-regions in variable sulci recognition

  • paper_url: http://arxiv.org/abs/2309.00903
  • repo_url: None
  • paper_authors: Michail Mamalakis, Heloise de Vareilles, Atheer Al-Manea, Samantha C. Mitchell, Ingrid Agartz, Lynn Egeland Mørch-Johnsen, Jane Garrison, Jon Simons, Pietro Lio, John Suckling, Graham Murray
  • for: This study aims to improve the recognition of variable cortical sulci in human brain MRI and the explainability of the deep learning networks used to detect them.
  • methods: It introduces a novel 3D explainability framework that combines the local explainability techniques GradCAM and SHAP with a dimensionality-reduction method, providing both local and global explanations alongside classification accuracy.
  • results: On MRI from the TOP-OSLO dataset, detection of the paracingulate sulcus (presence or absence) was more accurate in the left hemisphere than the right, with distinct but extensive sub-regions contributing to each classification outcome. The study also highlights the critical role of an unbiased annotation protocol for fair network performance; the method offers automated, impartial annotations of a variable sulcus and opens new directions for neuroscience.
    Abstract Precisely identifying sulcal features in brain MRI is made challenging by the variability of brain folding. This research introduces an innovative 3D explainability framework that validates outputs from deep learning networks in their ability to detect the paracingulate sulcus, an anatomical feature that may or may not be present on the frontal medial surface of the human brain. This study trained and tested two networks, amalgamating local explainability techniques GradCam and SHAP with a dimensionality reduction method. The explainability framework provided both localized and global explanations, along with accuracy of classification results, revealing pertinent sub-regions contributing to the decision process through a post-fusion transformation of explanatory and statistical features. Leveraging the TOP-OSLO dataset of MRI acquired from patients with schizophrenia, greater accuracies of paracingulate sulcus detection (presence or absence) were found in the left compared to right hemispheres with distinct, but extensive sub-regions contributing to each classification outcome. The study also inadvertently highlighted the critical role of an unbiased annotation protocol in maintaining network performance fairness. Our proposed method not only offers automated, impartial annotations of a variable sulcus but also provides insights into the broader anatomical variations associated with its presence throughout the brain. The adoption of this methodology holds promise for instigating further explorations and inquiries in the field of neuroscience.
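The post-fusion of explanation maps described above can be sketched as a convex combination of normalized GradCAM-style and SHAP-style attribution volumes; the maps below are synthetic stand-ins, and the equal fusion weight is an assumption rather than the paper's transformation:

```python
import numpy as np

# Hedged sketch: fuse two local explanation maps (GradCAM-like and SHAP-like)
# into one 3D attribution volume. The inputs here are synthetic stand-ins.
def normalize(m: np.ndarray) -> np.ndarray:
    # Min-max normalization to [0, 1]; flat maps become all-zero.
    span = m.max() - m.min()
    return (m - m.min()) / span if span > 0 else np.zeros_like(m)

def fuse(gradcam: np.ndarray, shap: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    # Convex combination of normalized attribution maps (alpha is an assumption).
    return alpha * normalize(gradcam) + (1 - alpha) * normalize(np.abs(shap))

rng = np.random.default_rng(1)
gc_map = rng.random((8, 8, 8))        # 3D attribution volumes (e.g., over an MRI patch)
sh_map = rng.normal(size=(8, 8, 8))
fused = fuse(gc_map, sh_map)
assert fused.shape == (8, 8, 8) and 0.0 <= fused.min() and fused.max() <= 1.0
```

High-valued voxels in the fused volume would correspond to the "pertinent sub-regions" the framework surfaces.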

Large Process Models: Business Process Management in the Age of Generative AI

  • paper_url: http://arxiv.org/abs/2309.00900
  • repo_url: None
  • paper_authors: Timotheus Kampik, Christian Warmuth, Adrian Rebmann, Ron Agam, Lukas N. P. Egger, Andreas Gerber, Johannes Hoffart, Jonas Kolk, Philipp Herzig, Gero Decker, Han van der Aa, Artem Polyvyanyy, Stefanie Rinderle-Ma, Ingo Weber, Matthias Weidlich
  • for: This paper examines the potential advantages and limitations of Large Language Models (LLMs) and other generative AI technologies, and how to combine them to make enterprise transformation faster and deeper.
  • methods: It proposes the Large Process Model (LPM), which combines the correlation power of LLMs with the analytical precision and reliability of knowledge-based systems, to deliver context-specific (tailored) process and business models, analytical deep-dives, and improvement recommendations.
  • results: The paper argues that an LPM could substantially decrease the time and effort required for business transformation while providing deeper, more impactful, and more actionable recommendations than traditional symbolic models; it also identifies limitations and research challenges in implementing the LPM vision.
    Abstract The continued success of Large Language Models (LLMs) and other generative artificial intelligence approaches highlights the advantages that large information corpora can have over rigidly defined symbolic models, but also serves as a proof-point of the challenges that purely statistics-based approaches have in terms of safety and trustworthiness. As a framework for contextualizing the potential, as well as the limitations of LLMs and other foundation model-based technologies, we propose the concept of a Large Process Model (LPM) that combines the correlation power of LLMs with the analytical precision and reliability of knowledge-based systems and automated reasoning approaches. LPMs are envisioned to directly utilize the wealth of process management experience that experts have accumulated, as well as process performance data of organizations with diverse characteristics, e.g., regarding size, region, or industry. In this vision, the proposed LPM would allow organizations to receive context-specific (tailored) process and other business models, analytical deep-dives, and improvement recommendations. As such, they would allow to substantially decrease the time and effort required for business transformation, while also allowing for deeper, more impactful, and more actionable insights than previously possible. We argue that implementing an LPM is feasible, but also highlight limitations and research challenges that need to be solved to implement particular aspects of the LPM vision.

Regularly Truncated M-estimators for Learning with Noisy Labels

  • paper_url: http://arxiv.org/abs/2309.00894
  • repo_url: https://github.com/xiaoboxia/rtm_lnl
  • paper_authors: Xiaobo Xia, Pengqian Lu, Chen Gong, Bo Han, Jun Yu, Jun Yu, Tongliang Liu
  • for: Improves the accuracy and robustness of learning with noisy labels.
  • methods: Proposes Regularly Truncated M-estimators (RTME), which simultaneously addresses two issues: (a) the harmful influence of noisy labels among the selected small-loss examples, and (b) the failure to exploit potentially useful information in discarded large-loss examples.
  • results: Theoretical analysis shows the method is label-noise-tolerant; experiments show it outperforms multiple baselines and remains robust across broad noise types and levels.
    Abstract The sample selection approach is very popular in learning with noisy labels. As deep networks learn patterns first, prior methods built on sample selection share a similar training procedure: the small-loss examples can be regarded as clean examples and used for helping generalization, while the large-loss examples are treated as mislabeled ones and excluded from network parameter updates. However, such a procedure is arguably debatable from two folds: (a) it does not consider the bad influence of noisy labels in selected small-loss examples; (b) it does not make good use of the discarded large-loss examples, which may be clean or have meaningful information for generalization. In this paper, we propose regularly truncated M-estimators (RTME) to address the above two issues simultaneously. Specifically, RTME can alternately switch modes between truncated M-estimators and original M-estimators. The former can adaptively select small-loss examples without knowing the noise rate and reduce the side-effects of noisy labels in them. The latter makes the possibly clean examples but with large losses involved to help generalization. Theoretically, we demonstrate that our strategies are label-noise-tolerant. Empirically, comprehensive experimental results show that our method can outperform multiple baselines and is robust to broad noise types and levels.
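The alternating idea behind RTME, switching between a truncated estimator that clips out large losses and the original estimator that keeps them, can be sketched as follows; the keep ratio and the even/odd schedule are illustrative assumptions, not the paper's adaptive procedure:

```python
import numpy as np

# Hedged sketch: some epochs use a truncated M-estimator (large losses are
# dropped from the update), others the original estimator, so large-loss but
# possibly clean examples still contribute to learning.
def truncated_mean_loss(losses: np.ndarray, keep_ratio: float = 0.7) -> float:
    # Keep only the smallest `keep_ratio` fraction of per-example losses.
    k = max(1, int(len(losses) * keep_ratio))
    return float(np.sort(losses)[:k].mean())

def epoch_loss(losses: np.ndarray, epoch: int, period: int = 2) -> float:
    # Alternate modes: truncated on even epochs, original on odd epochs (schedule assumed).
    if epoch % period == 0:
        return truncated_mean_loss(losses)
    return float(losses.mean())

losses = np.array([0.1, 0.2, 0.3, 5.0])   # one large (possibly mislabeled) example
assert epoch_loss(losses, epoch=0) < epoch_loss(losses, epoch=1)  # truncation discounts the outlier
```

In the paper, the truncation level is selected adaptively rather than from a fixed keep ratio, and no knowledge of the noise rate is required.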

Equitable-FL: Federated Learning with Sparsity for Resource-Constrained Environment

  • paper_url: http://arxiv.org/abs/2309.00864
  • repo_url: None
  • paper_authors: Indrajeet Kumar Sinha, Shekhar Verma, Krishna Pratap Singh
  • for: This paper proposes a federated learning method for resource-constrained environments, so that learning remains possible even when clients lack space, compute, or bandwidth.
  • methods: Building on the Lottery Ticket Hypothesis, the method progressively sparsifies the model to reduce the number of parameters and encourage resource-scarce clients to participate in collaborative training.
  • results: Experiments across multiple datasets and settings show that the method reduces model size while preserving accuracy, and accommodates clients with heterogeneous resource constraints.
    Abstract In Federated Learning, model training is performed across multiple computing devices, where only parameters are shared with a common central server without exchanging their data instances. This strategy assumes abundance of resources on individual clients and utilizes these resources to build richer user models. However, when the assumption of the abundance of resources is violated, learning may not be possible as some nodes may not be able to participate in the process. In this paper, we propose a sparse form of federated learning that performs well in a Resource Constrained Environment. Our goal is to make learning possible, regardless of a node's space, computing, or bandwidth scarcity. The method is based on the observation that model size vis-à-vis available resources defines resource scarcity, which entails that reduction of the number of parameters without affecting accuracy is key to model training in a resource-constrained environment. In this work, the Lottery Ticket Hypothesis approach is utilized to progressively sparsify models to encourage nodes with resource scarcity to participate in collaborative training. We validate Equitable-FL on the $MNIST$, $F-MNIST$, and $CIFAR-10$ benchmark datasets, as well as the $Brain-MRI$ data and the $PlantVillage$ datasets. Further, we examine the effect of sparsity on performance, model size compaction, and speed-up for training. Results obtained from experiments performed for training convolutional neural networks validate the efficacy of Equitable-FL in heterogeneous resource-constrained learning environment.
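The progressive, lottery-ticket-style sparsification at the heart of Equitable-FL can be sketched as iterative magnitude pruning; the pruning fraction, number of rounds, and weight tensor below are illustrative assumptions, not the paper's configuration:

```python
import numpy as np

# Hedged sketch: after each round, prune the smallest-magnitude weights so that
# resource-constrained clients can train a progressively smaller model.
def prune_by_magnitude(weights: np.ndarray, mask: np.ndarray, prune_frac: float) -> np.ndarray:
    # Zero out the lowest-magnitude fraction of the still-active weights.
    active = np.abs(weights[mask])
    if active.size == 0:
        return mask
    threshold = np.quantile(active, prune_frac)
    return mask & (np.abs(weights) > threshold)

rng = np.random.default_rng(0)
w = rng.normal(size=1000)                # a toy weight vector
mask = np.ones_like(w, dtype=bool)
for _ in range(3):                       # three pruning rounds at 20% each
    mask = prune_by_magnitude(w, mask, prune_frac=0.2)
sparsity = 1.0 - mask.mean()
assert 0.45 < sparsity < 0.55            # roughly 1 - 0.8**3 ≈ 0.49 of weights removed
```

In the federated setting, the surviving mask would be what clients actually download, train, and upload each round.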

Domain Generalization via Balancing Training Difficulty and Model Capability

  • paper_url: http://arxiv.org/abs/2309.00844
  • repo_url: None
  • paper_authors: Xueying Jiang, Jiaxing Huang, Sheng Jin, Shijian Lu
  • for: This work studies domain generalization (DG), learning models that perform well in unseen target domains, and tackles the misalignment between training-sample difficulty and model capability during training.
  • methods: It introduces two novel designs, MoDify-based Data Augmentation and MoDify-based Network Optimization, which collaborate to counter this misalignment throughout training and yield better generalization.
  • results: A simple implementation achieves superior performance across multiple benchmarks, can complement existing methods as a plug-in, and is generic enough to work for different visual recognition tasks.
    Abstract Domain generalization (DG) aims to learn domain-generalizable models from one or multiple source domains that can perform well in unseen target domains. Despite its recent progress, most existing work suffers from the misalignment between the difficulty level of training samples and the capability of contemporarily trained models, leading to over-fitting or under-fitting in the trained generalization model. We design MoDify, a Momentum Difficulty framework that tackles the misalignment by balancing the seesaw between the model's capability and the samples' difficulties along the training process. MoDify consists of two novel designs that collaborate to fight against the misalignment while learning domain-generalizable models. The first is MoDify-based Data Augmentation which exploits an RGB Shuffle technique to generate difficulty-aware training samples on the fly. The second is MoDify-based Network Optimization which dynamically schedules the training samples for balanced and smooth learning with appropriate difficulty. Without bells and whistles, a simple implementation of MoDify achieves superior performance across multiple benchmarks. In addition, MoDify can complement existing methods as a plug-in, and it is generic and can work for different visual recognition tasks.
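The RGB Shuffle technique used by MoDify-based Data Augmentation can be sketched as a random permutation of an image's color channels; the difficulty-aware scheduling that decides when to apply it is not reproduced here:

```python
import numpy as np

# Hedged sketch: permuting color channels yields style-perturbed training samples
# while preserving image content, which MoDify uses to vary sample difficulty.
def rgb_shuffle(image: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    # image: (H, W, 3); returns a view with its color channels permuted.
    perm = rng.permutation(3)
    return image[..., perm]

rng = np.random.default_rng(3)
img = rng.integers(0, 256, size=(4, 4, 3))   # a toy RGB image
aug = rgb_shuffle(img, rng)
# Content is preserved: the multiset of pixel values is unchanged.
assert sorted(img.ravel()) == sorted(aug.ravel())
```

In the full framework, such augmented samples would be scheduled so that their difficulty tracks the model's current capability.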

LeanContext: Cost-Efficient Domain-Specific Question Answering Using LLMs

  • paper_url: http://arxiv.org/abs/2309.00841
  • repo_url: None
  • paper_authors: Md Adnan Arefeen, Biplob Debnath, Srimat Chakradhar
  • for: This work aims to make LLM-based domain-specific question answering affordable for small businesses by reducing the cost of the context sent with each query.
  • methods: The proposed LeanContext method efficiently extracts the top-$k$ sentences from the context that are closely aligned with the query, with $k$ determined dynamically by a reinforcement learning technique.
  • results: Compared to a baseline that retains the entire context, LeanContext reduces context cost by 37.29% to 67.81% while its ROUGE-1 score drops by only 1.41% to 2.65%; when free pre-trained LLM-based summarizers are used to reduce the context, LeanContext can further improve accuracy by 13.22% to 24.61%.
    Abstract Question-answering (QA) is a significant application of Large Language Models (LLMs), shaping chatbot capabilities across healthcare, education, and customer service. However, widespread LLM integration presents a challenge for small businesses due to the high expenses of LLM API usage. Costs rise rapidly when domain-specific data (context) is used alongside queries for accurate domain-specific LLM responses. One option is to summarize the context by using LLMs and reduce the context. However, this can also filter out useful information that is necessary to answer some domain-specific queries. In this paper, we shift from human-oriented summarizers to AI model-friendly summaries. Our approach, LeanContext, efficiently extracts $k$ key sentences from the context that are closely aligned with the query. The choice of $k$ is neither static nor random; we introduce a reinforcement learning technique that dynamically determines $k$ based on the query and context. The rest of the less important sentences are reduced using a free open source text reduction method. We evaluate LeanContext against several recent query-aware and query-unaware context reduction approaches on prominent datasets (arxiv papers and BBC news articles). Despite cost reductions of $37.29\%$ to $67.81\%$, LeanContext's ROUGE-1 score decreases only by $1.41\%$ to $2.65\%$ compared to a baseline that retains the entire context (no summarization). Additionally, if free pretrained LLM-based summarizers are used to reduce context (into human consumable summaries), LeanContext can further modify the reduced context to enhance the accuracy (ROUGE-1 score) by $13.22\%$ to $24.61\%$.
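The query-aware top-$k$ sentence extraction at the core of LeanContext can be sketched with a toy bag-of-words encoder; the RL-based choice of $k$ and the open-source reducer for the remaining sentences are not reproduced, and the encoder is a stand-in for a real embedding model:

```python
import numpy as np

# Hedged sketch: keep the k sentences most similar to the query, in original order.
def bow_embeddings(texts: list[str]) -> list[np.ndarray]:
    # Normalized bag-of-words vectors over a shared vocabulary (toy encoder stand-in).
    vocab: dict[str, int] = {}
    for t in texts:
        for w in t.lower().split():
            vocab.setdefault(w, len(vocab))
    vecs = []
    for t in texts:
        v = np.zeros(len(vocab))
        for w in t.lower().split():
            v[vocab[w]] += 1.0
        n = np.linalg.norm(v)
        vecs.append(v / n if n > 0 else v)
    return vecs

def top_k_sentences(query: str, sentences: list[str], k: int) -> list[str]:
    q, *svecs = bow_embeddings([query] + sentences)
    scores = [float(q @ s) for s in svecs]
    order = sorted(range(len(sentences)), key=lambda i: -scores[i])
    return [sentences[i] for i in sorted(order[:k])]   # keep original sentence order

sents = [
    "The weather was pleasant yesterday.",
    "LeanContext reduces context cost for LLM question answering.",
    "Context reduction keeps only the sentences most relevant to the query.",
]
out = top_k_sentences("how does context reduction cut llm cost", sents, k=2)
assert len(out) == 2 and sents[0] not in out   # the irrelevant sentence is dropped
```

LeanContext would additionally compress the dropped sentences with a free text-reduction method rather than discarding them outright.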

Leveraging Semi-Supervised Graph Learning for Enhanced Diabetic Retinopathy Detection

  • paper_url: http://arxiv.org/abs/2309.00824
  • repo_url: None
  • paper_authors: D. Dhinakaran, L. Srinivasan, D. Selvaraj, S. M. Udhaya Sankar
  • for: This work aims to improve early detection and treatment of Diabetic Retinopathy (DR) using machine learning (ML) techniques.
  • methods: It proposes a novel Semi-Supervised Graph Learning (SSGL) algorithm that exploits the relationships between labeled and unlabeled data to improve accuracy.
  • results: Evaluated on two publicly available datasets, the algorithm achieves significant improvements in classification accuracy, specificity, and sensitivity, and is robust to noise and outliers.
    Abstract Diabetic Retinopathy (DR) is a significant cause of blindness globally, highlighting the urgent need for early detection and effective treatment. Recent advancements in Machine Learning (ML) techniques have shown promise in DR detection, but the availability of labeled data often limits their performance. This research proposes a novel Semi-Supervised Graph Learning (SSGL) algorithm tailored for DR detection, which capitalizes on the relationships between labeled and unlabeled data to enhance accuracy. The work begins by investigating data augmentation and preprocessing techniques to address the challenges of image quality and feature variations. Techniques such as image cropping, resizing, contrast adjustment, normalization, and data augmentation are explored to optimize feature extraction and improve the overall quality of retinal images. Moreover, apart from detection and diagnosis, this work delves into applying ML algorithms for predicting the risk of developing DR or the likelihood of disease progression. Personalized risk scores for individual patients are generated using comprehensive patient data encompassing demographic information, medical history, and retinal images. The proposed Semi-Supervised Graph Learning algorithm is rigorously evaluated on two publicly available datasets and is benchmarked against existing methods. Results indicate significant improvements in classification accuracy, specificity, and sensitivity while demonstrating robustness against noise and outliers. Notably, the proposed algorithm addresses the challenge of imbalanced datasets, common in medical image analysis, further enhancing its practical applicability.
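The semi-supervised graph idea, letting labels flow from labeled to unlabeled examples over a similarity graph, can be sketched with classic label propagation on a toy chain graph; the graph and labels are illustrative, not the paper's construction over retinal-image features:

```python
import numpy as np

# Hedged sketch: propagate class labels over a graph, clamping the known labels.
def label_propagation(A: np.ndarray, labels: np.ndarray, n_iter: int = 50) -> np.ndarray:
    # A: (n, n) adjacency; labels: (n,) with -1 for unlabeled, else class id.
    n_classes = int(labels.max()) + 1
    F = np.zeros((len(labels), n_classes))
    for i, y in enumerate(labels):
        if y >= 0:
            F[i, y] = 1.0
    clamp = F.copy()
    D_inv = 1.0 / np.maximum(A.sum(axis=1, keepdims=True), 1e-12)
    for _ in range(n_iter):
        F = D_inv * (A @ F)                    # average neighbors' label distributions
        F[labels >= 0] = clamp[labels >= 0]    # clamp the known labels
    return F.argmax(axis=1)

# Chain graph 0-1-2-3: node 0 labeled class 0, node 3 labeled class 1.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
labels = np.array([0, -1, -1, 1])
pred = label_propagation(A, labels)
assert pred[1] == 0 and pred[2] == 1           # each unlabeled node takes its nearer label
```

In the paper's setting, nodes would be retinal images and edges would encode feature similarity rather than a hand-built chain.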

Bypassing the Simulator: Near-Optimal Adversarial Linear Contextual Bandits

  • paper_url: http://arxiv.org/abs/2309.00814
  • repo_url: None
  • paper_authors: Haolin Liu, Chen-Yu Wei, Julian Zimmert
  • for: Solves the adversarial linear contextual bandit problem, where loss vectors are selected fully adversarially and the per-round action set is drawn from a fixed distribution.
  • methods: Without requiring a simulator, the algorithm achieves $\widetilde{O}(\sqrt{T})$ regret while remaining computationally efficient when the per-round action set is small.
  • results: In the special case of sleeping bandits with adversarial loss and stochastic arm availability, this affirmatively answers the open question of Saha et al. [2020] on whether a polynomial-time algorithm with $poly(d)\sqrt{T}$ regret exists; the approach also handles losses that are linear up to an additive misspecification error, with near-optimal dependence on the error magnitude.
    Abstract We consider the adversarial linear contextual bandit problem, where the loss vectors are selected fully adversarially and the per-round action set (i.e. the context) is drawn from a fixed distribution. Existing methods for this problem either require access to a simulator to generate free i.i.d. contexts, achieve a sub-optimal regret no better than $\widetilde{O}(T^{\frac{5}{6}})$, or are computationally inefficient. We greatly improve these results by achieving a regret of $\widetilde{O}(\sqrt{T})$ without a simulator, while maintaining computational efficiency when the action set in each round is small. In the special case of sleeping bandits with adversarial loss and stochastic arm availability, our result answers affirmatively the open question by Saha et al. [2020] on whether there exists a polynomial-time algorithm with $poly(d)\sqrt{T}$ regret. Our approach naturally handles the case where the loss is linear up to an additive misspecification error, and our regret shows near-optimal dependence on the magnitude of the error.

RenAIssance: A Survey into AI Text-to-Image Generation in the Era of Large Model

  • paper_url: http://arxiv.org/abs/2309.00810
  • repo_url: None
  • paper_authors: Fengxiang Bie, Yibo Yang, Zhongzhu Zhou, Adam Ghanem, Minjia Zhang, Zhewei Yao, Xiaoxia Wu, Connor Holmes, Pareesa Golnari, David A. Clifton, Yuxiong He, Dacheng Tao, Shuaiwen Leon Song
  • for: This paper focuses on text-to-image generation (TTI) models that use neural networks to generate high-fidelity images based on text descriptions.
  • methods: The paper discusses the types of generative models used for TTI, including diffusion models, which have proven effective for image synthesis and have become the major image decoder used by TTI models. It also explores the integration of large language models with TTI models to improve performance.
  • results: The paper reports that TTI models have made significant progress in recent years, with generation results nearly indistinguishable from real-world images, and argues that further improvements could come from combining innovative model architectures with prediction enhancement techniques.
    Abstract Text-to-image generation (TTI) refers to the use of models that process text input and generate high-fidelity images based on the text descriptions. Text-to-image generation using neural networks can be traced back to the emergence of the Generative Adversarial Network (GAN), followed by the autoregressive Transformer. Diffusion models are one prominent type of generative model used for the generation of images through the systematic introduction of noise over repeated steps. Owing to the impressive results of diffusion models on image synthesis, they have been cemented as the major image decoder used by text-to-image models and have brought text-to-image generation to the forefront of machine-learning (ML) research. In the era of large models, scaling up model size and integration with large language models have further improved the performance of TTI models, yielding generation results nearly indistinguishable from real-world images and revolutionizing the way we retrieve images. Our explorative study has incentivised us to think that there are further ways of scaling text-to-image models with the combination of innovative model architectures and prediction enhancement techniques. We have divided this survey into five main sections, in which we detail the frameworks of the major literature in order to delve into the different types of text-to-image generation methods. Following this, we provide a detailed comparison and critique of these methods and offer possible pathways of improvement for future work. Finally, we argue that TTI development could yield impressive productivity improvements for creation, particularly in the context of the AIGC era, and could be extended to more complex tasks such as video generation and 3D generation.
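The "systematic introduction of noise over repeated steps" that the abstract attributes to diffusion models has a closed form in DDPM-style processes: $x_t$ can be sampled directly from $x_0$. A minimal NumPy sketch, where the linear beta schedule and array sizes are illustrative assumptions rather than settings from any paper in this survey:

```python
import numpy as np

def forward_noise(x0, t, betas, rng):
    """Sample x_t ~ q(x_t | x_0) in closed form for a DDPM-style process."""
    alpha_bar = np.cumprod(1.0 - betas)[t]  # product of (1 - beta_s) up to step t
    eps = rng.normal(size=x0.shape)         # fresh Gaussian noise
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps

rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 1000)   # a common linear noise schedule (assumption)
x0 = rng.normal(size=(8, 8))            # stand-in for an image
x_mid = forward_noise(x0, 100, betas, rng)   # still mostly signal
x_end = forward_noise(x0, 999, betas, rng)   # almost pure noise
```

The generative model is then trained to invert this corruption step by step, which is what makes diffusion an effective image decoder for TTI pipelines.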

Value Kaleidoscope: Engaging AI with Pluralistic Human Values, Rights, and Duties

  • paper_url: http://arxiv.org/abs/2309.00779
  • repo_url: https://github.com/tsor13/kaleido
  • paper_authors: Taylor Sorensen, Liwei Jiang, Jena Hwang, Sydney Levine, Valentina Pyatkin, Peter West, Nouha Dziri, Ximing Lu, Kavel Rao, Chandra Bhagavatula, Maarten Sap, John Tasioulas, Yejin Choi
  • for: The paper aims to improve AI systems’ ability to reflect value pluralism, which is the view that multiple correct values may be held in tension with one another.
  • methods: The authors introduce ValuePrism, a large-scale dataset of human-written values, rights, and duties, and use GPT-4 to generate contextualized values. They also build Kaleido, an open, light-weight, and structured language-based multi-task model that generates, explains, and assesses the relevance and valence of human values, rights, and duties within a specific context.
  • results: The authors show that Kaleido outperforms the teacher GPT-4 in terms of accuracy and broader coverage, and can help explain variability in human decision-making by outputting contrasting values. Additionally, they demonstrate that Kaleido’s representations transfer to other philosophical frameworks and datasets, confirming the benefit of an explicit, modular, and interpretable approach to value pluralism.
    Abstract Human values are crucial to human decision-making. Value pluralism is the view that multiple correct values may be held in tension with one another (e.g., when considering lying to a friend to protect their feelings, how does one balance honesty with friendship?). As statistical learners, AI systems fit to averages by default, washing out these potentially irreducible value conflicts. To improve AI systems to better reflect value pluralism, the first-order challenge is to explore the extent to which AI systems can model pluralistic human values, rights, and duties as well as their interaction. We introduce ValuePrism, a large-scale dataset of 218k values, rights, and duties connected to 31k human-written situations. ValuePrism's contextualized values are generated by GPT-4 and deemed high-quality by human annotators 91% of the time. We conduct a large-scale study with annotators across diverse social and demographic backgrounds to try to understand whose values are represented. With ValuePrism, we build Kaleido, an open, light-weight, and structured language-based multi-task model that generates, explains, and assesses the relevance and valence (i.e., support or oppose) of human values, rights, and duties within a specific context. Humans prefer the sets of values output by our system over the teacher GPT-4, finding them more accurate and with broader coverage. In addition, we demonstrate that Kaleido can help explain variability in human decision-making by outputting contrasting values. Finally, we show that Kaleido's representations transfer to other philosophical frameworks and datasets, confirming the benefit of an explicit, modular, and interpretable approach to value pluralism. We hope that our work will serve as a step to making more explicit the implicit values behind human decision-making and to steering AI systems to make decisions that are more in accordance with them.
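The structured output the abstract describes (values, rights, and duties with a relevance score and a valence within a specific context) can be illustrated with a small record type. The field names and example judgments below are hypothetical stand-ins, not the actual ValuePrism schema or the Kaleido API:

```python
from dataclasses import dataclass

@dataclass
class ValueJudgment:
    kind: str         # "value", "right", or "duty" (hypothetical field names)
    text: str         # the value, right, or duty itself
    relevance: float  # how relevant it is to the situation, in [0, 1]
    valence: str      # "supports" or "opposes" the action in context

situation = "Lying to a friend to protect their feelings"
judgments = [
    ValueJudgment("value", "Honesty", 0.9, "opposes"),
    ValueJudgment("value", "Friendship and care", 0.8, "supports"),
    ValueJudgment("duty", "Duty not to deceive", 0.7, "opposes"),
]

# Contrasting valences among highly relevant judgments make the
# irreducible value conflict in the situation explicit.
valences = {j.valence for j in judgments if j.relevance > 0.5}
```

Surfacing both "supports" and "opposes" judgments for the same situation is what lets a model of this kind explain variability in human decision-making rather than averaging it away.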

Contrastive Feature Masking Open-Vocabulary Vision Transformer

  • paper_url: http://arxiv.org/abs/2309.00775
  • repo_url: None
  • paper_authors: Dahun Kim, Anelia Angelova, Weicheng Kuo
  • for: This paper proposes Contrastive Feature Masking Vision Transformer (CFM-ViT), an image-text pretraining method for open-vocabulary object detection (OVD) that learns image- and region-level representations simultaneously.
  • methods: The method combines the masked autoencoder (MAE) objective with the contrastive learning objective to improve the representation's local semantics, and introduces Positional Embedding Dropout (PED), which randomly drops positional embeddings to address the scale variation between image-text pretraining and detection finetuning.
  • results: On the LVIS open-vocabulary detection benchmark, CFM-ViT achieves a state-of-the-art 33.9 AP$_r$, surpassing the best prior approach by 7.6 points, and also attains better zero-shot detection transfer and stronger image-level representations.
    Abstract We present Contrastive Feature Masking Vision Transformer (CFM-ViT) - an image-text pretraining methodology that achieves simultaneous learning of image- and region-level representation for open-vocabulary object detection (OVD). Our approach combines the masked autoencoder (MAE) objective into the contrastive learning objective to improve the representation for localization tasks. Unlike standard MAE, we perform reconstruction in the joint image-text embedding space, rather than the pixel space as is customary with the classical MAE method, which causes the model to better learn region-level semantics. Moreover, we introduce Positional Embedding Dropout (PED) to address scale variation between image-text pretraining and detection finetuning by randomly dropping out the positional embeddings during pretraining. PED improves detection performance and enables the use of a frozen ViT backbone as a region classifier, preventing the forgetting of open-vocabulary knowledge during detection finetuning. On LVIS open-vocabulary detection benchmark, CFM-ViT achieves a state-of-the-art 33.9 AP$_r$, surpassing the best approach by 7.6 points and achieves better zero-shot detection transfer. Finally, CFM-ViT acquires strong image-level representation, outperforming the state of the art on 8 out of 12 metrics on zero-shot image-text retrieval benchmarks.
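Positional Embedding Dropout as described in the abstract is simple to sketch: with some probability, the positional embeddings are skipped entirely for a forward pass during pretraining, forcing the model to rely on token content alone. The NumPy sketch below is an assumption-laden illustration (per-pass dropout decision, ViT-Base-like token and grid sizes), not the paper's implementation:

```python
import numpy as np

def embed_with_ped(tokens, pos_emb, drop_prob, rng):
    """Positional Embedding Dropout (PED): during pretraining, drop the
    positional embeddings for the whole forward pass with probability
    drop_prob; at detection finetuning time, use drop_prob = 0."""
    if rng.random() < drop_prob:
        return tokens            # positions withheld: rely on content alone
    return tokens + pos_emb      # standard ViT input embedding

rng = np.random.default_rng(0)
tokens = rng.normal(size=(196, 768))   # patch tokens for a 14x14 grid (assumed)
pos_emb = rng.normal(size=(196, 768))  # learned positional embeddings (stand-in)
out = embed_with_ped(tokens, pos_emb, drop_prob=0.5, rng=rng)
```

Making the backbone robust to missing positional information is what the abstract credits with easing the scale mismatch between pretraining and detection, and with allowing a frozen ViT to serve as the region classifier.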

Bias and Fairness in Large Language Models: A Survey

  • paper_url: http://arxiv.org/abs/2309.00770
  • repo_url: https://github.com/i-gallegos/fair-llm-benchmark
  • paper_authors: Isabel O. Gallegos, Ryan A. Rossi, Joe Barrow, Md Mehrab Tanjim, Sungchul Kim, Franck Dernoncourt, Tong Yu, Ruiyi Zhang, Nesreen K. Ahmed
  • for: This paper aims to provide a comprehensive survey of bias evaluation and mitigation techniques for large language models (LLMs), so that researchers and practitioners can better understand and prevent the propagation of bias in LLMs.
  • methods: The paper organizes the literature with three taxonomies: one for evaluation metrics, one for evaluation datasets, and one for mitigation techniques, helping researchers and practitioners understand and choose appropriate methods.
  • results: The result is a comprehensive survey of recent research on bias evaluation and mitigation for LLMs, covering the different kinds of metrics, datasets, and mitigation techniques, as well as how they relate to and interact with one another.
    Abstract Rapid advancements of large language models (LLMs) have enabled the processing, understanding, and generation of human-like text, with increasing integration into systems that touch our social sphere. Despite this success, these models can learn, perpetuate, and amplify harmful social biases. In this paper, we present a comprehensive survey of bias evaluation and mitigation techniques for LLMs. We first consolidate, formalize, and expand notions of social bias and fairness in natural language processing, defining distinct facets of harm and introducing several desiderata to operationalize fairness for LLMs. We then unify the literature by proposing three intuitive taxonomies, two for bias evaluation, namely metrics and datasets, and one for mitigation. Our first taxonomy of metrics for bias evaluation disambiguates the relationship between metrics and evaluation datasets, and organizes metrics by the different levels at which they operate in a model: embeddings, probabilities, and generated text. Our second taxonomy of datasets for bias evaluation categorizes datasets by their structure as counterfactual inputs or prompts, and identifies the targeted harms and social groups; we also release a consolidation of publicly-available datasets for improved access. Our third taxonomy of techniques for bias mitigation classifies methods by their intervention during pre-processing, in-training, intra-processing, and post-processing, with granular subcategories that elucidate research trends. Finally, we identify open problems and challenges for future work. Synthesizing a wide range of recent research, we aim to provide a clear guide of the existing literature that empowers researchers and practitioners to better understand and prevent the propagation of bias in LLMs.
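The survey's first taxonomy organizes bias metrics by the level at which they operate: embeddings, probabilities, or generated text. An embedding-level metric can be illustrated with a WEAT-style association gap between two attribute sets; the vectors below are synthetic stand-ins for word embeddings, constructed so the gap is positive by design, not measurements from any real model:

```python
import numpy as np

def cosine(u, v):
    return (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

def association_gap(target, attr_a, attr_b):
    """Mean cosine similarity of `target` to attribute set A minus its mean
    similarity to set B. A nonzero gap indicates the embedding space
    associates the target more strongly with one group (WEAT-style)."""
    return (np.mean([cosine(target, a) for a in attr_a])
            - np.mean([cosine(target, b) for b in attr_b]))

rng = np.random.default_rng(0)
career = rng.normal(size=8)                                        # toy "career" vector
group_a = [career + 0.1 * rng.normal(size=8) for _ in range(5)]    # built near "career"
group_b = [rng.normal(size=8) for _ in range(5)]                   # unrelated vectors

gap = association_gap(career, group_a, group_b)
```

Probability- and generated-text-level metrics in the survey's taxonomy work analogously but measure, respectively, differences in token likelihoods and properties of sampled continuations rather than geometry in embedding space.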