2023-09-30

cs.AI

Reinforcement learning adaptive fuzzy controller for lighting systems: application to aircraft cabin

paper_url: http://arxiv.org/abs/2310.00525
repo_url: None
paper_authors: Kritika Vashishtha, Anas Saad, Reza Faieghi, Fengfeng Xi
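for: This paper aims to develop a smart lighting algorithm that adapts to user preferences.
methods: The paper combines fuzzy logic and reinforcement learning. A baseline fuzzy inference system (FIS), built from domain knowledge, generates lighting-setting recommendations from environmental conditions (e.g., the daily glare index) and user information (age, activity, and chronotype). Through a feedback mechanism, users correct the output, and these corrections are passed as rewards to a Q-learning agent that tunes the FIS parameters.
results: An extensive user study in an aircraft cabin mockup evaluated the algorithm's effectiveness and learning behavior. The results show that the algorithm can learn user preferences while adapting to a wide range of environmental conditions and user characteristics, which suggests it is a viable approach to intelligent lighting.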

Abstract
Lighting requirements are subjective, and one light setting cannot work for all. However, there is little work on developing smart lighting algorithms that can adapt to user preferences. To address this gap, this paper uses fuzzy logic and reinforcement learning to develop an adaptive lighting algorithm. In particular, we develop a baseline fuzzy inference system (FIS) using domain knowledge. We use the existing literature to create a FIS that generates lighting setting recommendations based on environmental conditions, i.e., the daily glare index, and user information including age, activity, and chronotype. Through a feedback mechanism, the user interacts with the algorithm, correcting the algorithm output to their preferences. We interpret these corrections as rewards to a Q-learning agent, which tunes the FIS parameters online to match the user preferences. We implement the algorithm in an aircraft cabin mockup and conduct an extensive user study to evaluate the effectiveness of the algorithm and understand its learning behavior. Our implementation results demonstrate that the developed algorithm possesses the capability to learn user preferences while successfully adapting to a wide range of environmental conditions and user characteristics. This underscores its viability as a potent solution for intelligent light management, featuring advanced learning capabilities.

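Summary
Lighting requirements are subjective, and no single light setting suits everyone. However, there has been little research on smart lighting algorithms that adapt to user preferences. To fill this gap, this study uses fuzzy logic and reinforcement learning to develop an adaptive lighting algorithm. Specifically, we develop a baseline fuzzy inference system (FIS) that uses domain knowledge to generate lighting-setting recommendations based on environmental conditions (the daily glare index) and user information (age, activity, and chronotype). Through a feedback mechanism, the user interacts with the algorithm and corrects its output to match their preferences. We treat these corrections as rewards for a Q-learning agent, which tunes the FIS parameters online to match the user's preferences. We implement the algorithm in an aircraft cabin mockup and conduct an extensive user study to evaluate its effectiveness and learning behavior. Our implementation results show that the algorithm can learn user preferences and successfully adapts to a wide range of environmental conditions and user characteristics, which indicates its viability as a strong solution for intelligent light management.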
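
The abstract above says user corrections are interpreted as rewards for a Q-learning agent. A minimal sketch of the standard tabular Q-learning update under that reading follows. The action set, the state label, the reward definition (negative size of the user's correction), and the learning-rate and discount values are all assumptions made for illustration; the paper's actual state, action, and reward design is not given in this digest.

```python
ACTIONS = (-1, 0, 1)  # hypothetical: lower, keep, or raise one FIS parameter

def q_update(q, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """Standard tabular Q-learning update; returns the new Q-value."""
    best_next = max(q.get((next_state, a), 0.0) for a in ACTIONS)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q[(state, action)]

def reward_from_correction(recommended, corrected):
    """Assumed reward: the smaller the user's correction, the higher the reward."""
    return -abs(corrected - recommended)

q = {}
r = reward_from_correction(recommended=70, corrected=60)  # user dimmed by 10
q_update(q, state="reading", action=1, reward=r, next_state="reading")
```

With an empty table, the first update moves the Q-value halfway toward the observed reward, so an action that provoked a large correction becomes less attractive the next time the same state occurs.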

Learning Informative Latent Representation for Quantum State Tomography

paper_url: http://arxiv.org/abs/2310.00518
repo_url: None
paper_authors: Hailan Ma, Zhenhong Sun, Daoyi Dong, Dong Gong
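for: Quantum state tomography (QST), the process of reconstructing the complete state of a quantum system (mathematically described as a density matrix) from a series of different measurements, when the measurement data are imperfect.
methods: The authors propose a transformer-based autoencoder that reconstructs quantum states from imperfect measurement data by extracting a high-dimensional informative latent representation (ILR).
results: The encoder is pre-trained on a pretext task that reconstructs high-quality frequencies from measured frequencies, and a decoder predicts the quantum state from the ILR. Simulations and experiments show that the method handles imperfect measurement data remarkably well.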

Abstract
Quantum state tomography (QST) is the process of reconstructing the complete state of a quantum system (mathematically described as a density matrix) through a series of different measurements. These measurements are performed on a number of identical copies of the quantum system, with outcomes gathered as frequencies. QST aims to recover the density matrix and the corresponding properties of the quantum state from the measured frequencies. Although an informationally complete set of measurements can specify quantum state accurately in an ideal scenario with a large number of identical copies, both measurements and identical copies are restricted and imperfect in practical scenarios, making QST highly ill-posed. The conventional QST methods usually assume adequate or accurate measured frequencies or rely on manually designed regularizers to handle the ill-posed reconstruction problem, suffering from limited applications in realistic scenarios. Recent advances in deep neural networks (DNNs) led to the emergence of deep learning (DL) in QST. However, existing DL-based QST approaches often employ generic DNN models that are not optimized for imperfect conditions of QST. In this paper, we propose a transformer-based autoencoder architecture tailored for QST with imperfect measurement data. Our method leverages a transformer-based encoder to extract an informative latent representation (ILR) from imperfect measurement data and employs a decoder to predict the quantum states based on the ILR. We anticipate that the high-dimensional ILR will capture more comprehensive information about quantum states. To achieve this, we conduct pre-training of the encoder using a pretext task that involves reconstructing high-quality frequencies from measured frequencies. Extensive simulations and experiments demonstrate the remarkable ability of the ILR in dealing with imperfect measurement data in QST.

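Summary
Quantum state tomography (QST) is the process of reconstructing the complete state of a quantum system (mathematically described as a density matrix) from the outcomes of a series of different measurements. These measurements are performed on many identical copies of the quantum system, and the outcomes are recorded as frequencies. QST aims to recover the density matrix and the corresponding properties of the quantum state from the measured frequencies. In practice, however, both the measurements and the identical copies are limited and imperfect, which makes QST highly ill-posed. Conventional QST methods usually assume adequate or accurate measured frequencies, or rely on manually designed regularizers to handle the ill-posed reconstruction problem, which limits their use in realistic scenarios. Recent advances in deep neural networks (DNNs) have brought deep learning (DL) to QST. However, existing DL-based QST approaches often employ generic DNN models that are not optimized for the imperfect conditions of QST. In this paper, we propose a transformer-based autoencoder architecture tailored to QST with imperfect measurement data. Our method uses a transformer-based encoder to extract an informative latent representation (ILR) from imperfect measurement data and a decoder to predict the quantum state from the ILR. We expect the high-dimensional ILR to capture more comprehensive information about quantum states. To achieve this, we pre-train the encoder on a pretext task that reconstructs high-quality frequencies from measured frequencies. Our simulations and experiments show that the ILR is remarkably capable of handling imperfect measurement data in QST.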
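
The abstract above describes measurement outcomes being gathered as frequencies over a finite number of identical copies. The sketch below illustrates why those frequencies are imperfect: for a single qubit measured in the computational basis, the Born rule gives outcome probabilities equal to the diagonal of the density matrix, and a finite sample only approximates them. The example density matrix, the choice of measurement basis, and the function names are assumptions for illustration, not the paper's setup.

```python
import random

def outcome_probabilities(rho):
    """Born rule for a computational-basis measurement: p_i = rho[i][i]."""
    return [rho[i][i].real for i in range(len(rho))]

def measured_frequencies(rho, copies, seed=0):
    """Simulate measuring `copies` identical copies and return outcome frequencies."""
    rng = random.Random(seed)
    probs = outcome_probabilities(rho)
    counts = [0] * len(probs)
    for outcome in rng.choices(range(len(probs)), weights=probs, k=copies):
        counts[outcome] += 1
    return [c / copies for c in counts]

# Hypothetical single-qubit density matrix (Hermitian, trace 1, positive semidefinite).
rho = [[0.75 + 0j, 0.25 + 0j], [0.25 + 0j, 0.25 + 0j]]
freqs = measured_frequencies(rho, copies=20)
```

With few copies, `freqs` generally deviates from the ideal probabilities `[0.75, 0.25]`, which is the kind of imperfect input a reconstruction method has to cope with.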

A Brief History of Prompt: Leveraging Language Models

paper_url: http://arxiv.org/abs/2310.04438
repo_url: None
paper_authors: Golam Md Muktadir
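for: This paper explores the evolution of prompt engineering in natural language processing.
methods: The paper traces the key developments in prompt engineering over the years, starting from early language models and information retrieval systems. The introduction of attention mechanisms in 2015 revolutionized language understanding, bringing advances in controllability and context-awareness. Later advances in reinforcement learning techniques further improved prompt engineering, addressing exposure bias and biases in generated text. The paper also discusses significant contributions in 2018 and 2019, including fine-tuning strategies, control codes, and template-based generation.
results: The paper describes the rise of contextual prompting and transfer learning in 2020 and 2021, and the emergence of unsupervised pre-training and novel reward shaping in 2022 and 2023. Throughout, it cites specific studies to illustrate how these developments affected prompt engineering.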

Abstract
This paper presents a comprehensive exploration of the evolution of prompt engineering and generation in the field of natural language processing (NLP). Starting from the early language models and information retrieval systems, we trace the key developments that have shaped prompt engineering over the years. The introduction of attention mechanisms in 2015 revolutionized language understanding, leading to advancements in controllability and context-awareness. Subsequent breakthroughs in reinforcement learning techniques further enhanced prompt engineering, addressing issues like exposure bias and biases in generated text. We examine the significant contributions in 2018 and 2019, focusing on fine-tuning strategies, control codes, and template-based generation. The paper also discusses the growing importance of fairness, human-AI collaboration, and low-resource adaptation. In 2020 and 2021, contextual prompting and transfer learning gained prominence, while 2022 and 2023 witnessed the emergence of advanced techniques like unsupervised pre-training and novel reward shaping. Throughout the paper, we reference specific research studies that exemplify the impact of various developments on prompt engineering. The journey of prompt engineering continues, with ethical considerations being paramount for the responsible and inclusive future of AI systems.


Unveiling the Unborn: Advancing Fetal Health Classification through Machine Learning

paper_url: http://arxiv.org/abs/2310.00505
repo_url: None
paper_authors: Sujith K Mandala
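for: This study aims to improve the accuracy of fetal health classification and provide a better assessment method.
methods: The study uses a LightGBM classifier and combines multiple features, such as fetal heart rate, uterine contractions, and maternal blood pressure, to provide a comprehensive evaluation.
results: The model achieves 98.31% accuracy, indicating the potential of machine learning for fetal health assessment.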

Abstract
Fetal health classification is a critical task in obstetrics, enabling early identification and management of potential health problems. However, it remains challenging due to data complexity and limited labeled samples. This research paper presents a novel machine-learning approach for fetal health classification, leveraging a LightGBM classifier trained on a comprehensive dataset. The proposed model achieves an impressive accuracy of 98.31% on a test set. Our findings demonstrate the potential of machine learning in enhancing fetal health classification, offering a more objective and accurate assessment. Notably, our approach combines various features, such as fetal heart rate, uterine contractions, and maternal blood pressure, to provide a comprehensive evaluation. This methodology holds promise for improving early detection and treatment of fetal health issues, ensuring better outcomes for both mothers and babies. Beyond the high accuracy achieved, the novelty of our approach lies in its comprehensive feature selection and assessment methodology. By incorporating multiple data points, our model offers a more holistic and reliable evaluation compared to traditional methods. This research has significant implications in the field of obstetrics, paving the way for advancements in early detection and intervention of fetal health concerns. Future work involves validating the model on a larger dataset and developing a clinical application. Ultimately, we anticipate that our research will revolutionize the assessment and management of fetal health, contributing to improved healthcare outcomes for expectant mothers and their babies.

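Summary
Fetal health classification is a critical task in obstetrics that enables early identification and management of potential health problems. However, it remains challenging because of data complexity and limited labeled samples. This paper presents a new machine-learning approach that trains a LightGBM classifier on a comprehensive dataset. In our experiments, the proposed model reaches 98.31% accuracy on a test set. Our findings indicate the potential of machine learning for fetal health classification, offering a more objective and accurate assessment. Our approach combines several features, such as fetal heart rate, uterine contractions, and maternal blood pressure, to provide a comprehensive evaluation. Its novelty lies in its comprehensive feature selection and assessment methodology: by incorporating multiple data points, the model offers a more holistic and reliable evaluation than traditional methods. This research has significant implications for obstetrics and opens a path toward earlier detection and treatment of fetal health problems. Future work includes validating the model on a larger dataset and developing a clinical application. Ultimately, we expect this research to have a far-reaching effect on the assessment and management of fetal health, leading to better healthcare outcomes for expectant mothers and their babies.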

From Language Modeling to Instruction Following: Understanding the Behavior Shift in LLMs after Instruction Tuning

paper_url: http://arxiv.org/abs/2310.00492
repo_url: None
paper_authors: Xuansheng Wu, Wenlin Yao, Jianshu Chen, Xiaoman Pan, Xiaoyang Wang, Ninghao Liu, Dong Yu
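for: To study how instruction fine-tuning modifies pre-trained models and improves their instruction-following ability.
methods: A suite of local and global explanation methods, including a gradient-based approach for input-output attribution and techniques for interpreting patterns and concepts in self-attention and feed-forward layers.
results: Instruction fine-tuning has three significant effects on pre-trained models: 1) it helps the model better recognize the instruction parts of user prompts, which improves response quality and addresses the "lost-in-the-middle" issue; 2) it aligns the knowledge stored in feed-forward layers with user-oriented tasks, with minimal shifts across linguistic levels; 3) through the self-attention mechanism, the model better recognizes instruction words. These findings contribute to understanding the behavior shift in pre-trained models after instruction fine-tuning and lay the groundwork for interpreting and optimizing such models for different applications.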

Abstract
Large Language Models (LLMs) have achieved remarkable success, demonstrating powerful instruction-following capabilities across diverse tasks. Instruction fine-tuning is critical in enabling LLMs to align with user intentions and effectively follow instructions. In this work, we investigate how instruction fine-tuning modifies pre-trained models, focusing on two perspectives: instruction recognition and knowledge evolution. To study the behavior shift of LLMs, we employ a suite of local and global explanation methods, including a gradient-based approach for input-output attribution and techniques for interpreting patterns and concepts in self-attention and feed-forward layers. Our findings reveal three significant impacts of instruction fine-tuning: 1) It empowers LLMs to better recognize the instruction parts from user prompts, thereby facilitating high-quality response generation and addressing the "lost-in-the-middle" issue observed in pre-trained models; 2) It aligns the knowledge stored in feed-forward layers with user-oriented tasks, exhibiting minimal shifts across linguistic levels; 3) It facilitates the learning of word-word relations with instruction verbs through the self-attention mechanism, particularly in the lower and middle layers, indicating enhanced recognition of instruction words. These insights contribute to a deeper understanding of the behavior shifts in LLMs after instruction fine-tuning and lay the groundwork for future research aimed at interpreting and optimizing LLMs for various applications. We will release our code and data soon.

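Summary

1) It enables LLMs to better recognize the instruction parts of user prompts, which facilitates high-quality response generation and addresses the "lost-in-the-middle" issue observed in pre-trained models; 2) it aligns the knowledge stored in feed-forward layers with user-oriented tasks, showing only small shifts across linguistic levels; 3) it facilitates the learning of instruction words through the self-attention mechanism, particularly in the lower and middle layers, indicating better recognition of instruction words. These findings provide a deeper understanding of the behavior shift in LLMs after instruction fine-tuning and lay the groundwork for future work on interpreting and optimizing LLMs for various applications. We will release our code and data soon.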

On Memorization and Privacy risks of Sharpness Aware Minimization

paper_url: http://arxiv.org/abs/2310.00488
repo_url: None
paper_authors: Young In Kim, Pratiksha Agrawal, Johannes O. Royset, Rajiv Khanna
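for: This paper examines why optimization algorithms that seek flatter optima, such as Sharpness Aware Minimization (SAM), achieve better generalization when training neural networks.
methods: The paper defines a new metric to identify the data points on which flatness-seeking algorithms do better, and compares these algorithms against vanilla SGD.
results: The generalization gains of SAM are especially pronounced on atypical data points, but they come with higher privacy risks. The paper also proposes strategies to mitigate these risks.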

Abstract
In many recent works, there is an increased focus on designing algorithms that seek flatter optima for neural network loss optimization, as there is empirical evidence that doing so leads to better generalization performance on many datasets. In this work, we dissect these performance gains through the lens of data memorization in overparameterized models. We define a new metric that helps us identify the specific data points on which algorithms seeking flatter optima do better than vanilla SGD. We find that the generalization gains achieved by Sharpness Aware Minimization (SAM) are particularly pronounced for atypical data points, which necessitate memorization. This insight helps us unearth higher privacy risks associated with SAM, which we verify through exhaustive empirical evaluations. Finally, we propose mitigation strategies to achieve a more desirable accuracy vs privacy tradeoff.

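Summary
Many recent works focus on designing algorithms that seek flatter optima, because there is evidence that this improves generalization on many datasets. In this work, we analyze these performance gains through the lens of data memorization. We define a new metric that helps us identify the specific data points on which flatness-seeking algorithms do better than vanilla SGD. We find that the gains from Sharpness Aware Minimization (SAM) are especially pronounced on atypical data points. This insight helps us uncover higher privacy risks, which we verify through detailed experiments. Finally, we propose mitigation strategies to achieve a better tradeoff between accuracy and privacy.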
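
For readers unfamiliar with SAM, the following is a minimal sketch of one SAM update as it is commonly described: perturb the weights by a step of radius `rho` along the normalized gradient, then apply the gradient computed at that perturbed point to the original weights. The toy quadratic loss, the values of `lr` and `rho`, and the function names are assumptions for illustration; this is not the experimental setup of the paper.

```python
import math

def sam_step(w, grad_fn, lr=0.1, rho=0.05):
    """One SAM update: move to a nearby worst-case point, then descend using its gradient."""
    g = grad_fn(w)
    norm = math.sqrt(sum(gi * gi for gi in g)) or 1.0  # avoid division by zero
    w_adv = [wi + rho * gi / norm for wi, gi in zip(w, g)]  # ascent step of radius rho
    g_adv = grad_fn(w_adv)  # gradient at the perturbed weights
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]

# Toy quadratic loss L(w) = sum(w_i^2), whose gradient is 2w.
grad = lambda w: [2.0 * wi for wi in w]
w_new = sam_step([1.0, 0.0], grad)
```

Plain gradient descent would use `g` directly; SAM's use of `g_adv` is what biases the optimizer toward regions where the loss stays low in a whole neighborhood.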

UPAR: A Kantian-Inspired Prompting Framework for Enhancing Large Language Model Capabilities

paper_url: http://arxiv.org/abs/2310.01441
repo_url: None
paper_authors: Hejia Geng, Boxun Xu, Peng Li
for: The paper aims to improve the inferential capabilities of large language models (LLMs) by proposing a new prompting framework called UPAR, which is inspired by Kant’s a priori philosophy.
methods: The UPAR framework consists of four phases: “Understand”, “Plan”, “Act”, and “Reflect”. It enables the extraction of structured information from complex contexts, prior planning of solutions, execution according to plan, and self-reflection.
results: The paper demonstrates the effectiveness of the UPAR framework by testing it on two tasks: a challenging subset of GSM8K and the causal judgment task. The results show that the accuracy of LLM inference is significantly improved, with an increase from 22.92% to 58.33% in the GSM8K task and from 67.91% to 75.40% in the causal judgment task.

Abstract
Large Language Models (LLMs) have demonstrated impressive inferential capabilities, with numerous research endeavors devoted to enhancing this capacity through prompting. Despite these efforts, a unified epistemological foundation is still conspicuously absent. Drawing inspiration from Kant's a priori philosophy, we propose the UPAR prompting framework, designed to emulate the structure of human cognition within LLMs. The UPAR framework is delineated into four phases: "Understand", "Plan", "Act", and "Reflect", enabling the extraction of structured information from complex contexts, prior planning of solutions, execution according to plan, and self-reflection. This structure significantly augments the explainability and accuracy of LLM inference, producing a human-understandable and inspectable inferential trajectory. Furthermore, our work offers an epistemological foundation for existing prompting techniques, allowing for a possible systematic integration of these methods. With GPT-4, our approach elevates the accuracy from COT baseline of 22.92% to 58.33% in a challenging subset of GSM8K, and from 67.91% to 75.40% in the causal judgment task.
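
The four phases named in the abstract can be laid out as a prompt template. The sketch below is a hypothetical illustration only: the phase names come from the abstract, but the instruction wording under each phase and the function name are assumptions, not the authors' actual prompts.

```python
UPAR_PHASES = ("Understand", "Plan", "Act", "Reflect")

def build_upar_prompt(question):
    """Assemble a four-phase prompt; the instruction wording is illustrative only."""
    instructions = {
        "Understand": "Extract the structured information given in the question.",
        "Plan": "Outline the solution steps before solving.",
        "Act": "Carry out the plan step by step.",
        "Reflect": "Check the result and revise it if needed.",
    }
    lines = [f"Question: {question}"]
    for i, phase in enumerate(UPAR_PHASES, start=1):
        lines.append(f"{i}. {phase}: {instructions[phase]}")
    return "\n".join(lines)

prompt = build_upar_prompt("A train travels 60 km in 1.5 hours. What is its speed?")
```

The resulting string would be sent to a language model; keeping the phases as explicit numbered sections is what makes the model's reasoning trajectory easy to inspect afterwards.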


Encouraging Inferable Behavior for Autonomy: Repeated Bimatrix Stackelberg Games with Observations

paper_url: http://arxiv.org/abs/2310.00468
repo_url: None
paper_authors: Mustafa O. Karabag, Sophia Smith, David Fridovich-Keil, Ufuk Topcu
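for: This paper addresses how an autonomous agent can convey its intention and strategy when interacting with other non-competitive decision-making agents.
methods: The authors model the interaction, for example between an autonomous car and those around it, as a repeated bimatrix Stackelberg game. The leader uses a fixed, potentially mixed strategy, while the follower reacts based on the leader's previous actions.
results: The authors show that, in the setting with observations, the leader may suffer an inferability loss, i.e., lower performance than in the setting where the follower has perfect information about the leader's strategy. They also provide a game in which a certain number of interactions is required to guarantee inferability.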

Abstract
When interacting with other non-competitive decision-making agents, it is critical for an autonomous agent to have inferable behavior: Their actions must convey their intention and strategy. For example, an autonomous car's strategy must be inferable by the pedestrians interacting with the car. We model the inferability problem using a repeated bimatrix Stackelberg game with observations where a leader and a follower repeatedly interact. During the interactions, the leader uses a fixed, potentially mixed strategy. The follower, on the other hand, does not know the leader's strategy and dynamically reacts based on observations that are the leader's previous actions. In the setting with observations, the leader may suffer from an inferability loss, i.e., the performance compared to the setting where the follower has perfect information of the leader's strategy. We show that the inferability loss is upper-bounded by a function of the number of interactions and the stochasticity level of the leader's strategy, encouraging the use of inferable strategies with lower stochasticity levels. As a converse result, we also provide a game where the required number of interactions is lower bounded by a function of the desired inferability loss.
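
The abstract describes a follower who does not know the leader's strategy and reacts to the leader's previous actions. One simple way to picture this is a follower that estimates the leader's mixed strategy from empirical action frequencies and best-responds to the estimate. The sketch below assumes that follower model; the payoff numbers and action names are hypothetical, and the paper's formal follower model may differ.

```python
from collections import Counter

def empirical_strategy(observed_actions, actions):
    """Follower's estimate of the leader's mixed strategy from past leader actions."""
    counts = Counter(observed_actions)
    total = len(observed_actions)
    return {a: counts[a] / total for a in actions}

def best_response(estimate, follower_payoff):
    """Follower action maximizing expected payoff under the estimated leader strategy."""
    def expected(f_action):
        return sum(p * follower_payoff[(l_action, f_action)]
                   for l_action, p in estimate.items())
    return max(sorted({f for _, f in follower_payoff}), key=expected)

# Hypothetical payoffs for a pedestrian (follower) facing a car (leader).
payoff = {("go", "cross"): -10.0, ("go", "wait"): 0.0,
          ("yield", "cross"): 1.0, ("yield", "wait"): 0.0}
estimate = empirical_strategy(["yield", "yield", "go", "yield"], ["go", "yield"])
action = best_response(estimate, payoff)
```

Here a single observed "go" is enough to make the follower wait, which illustrates why a more stochastic leader strategy is harder to infer and can cost the leader performance.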

Consistent Aggregation of Objectives with Diverse Time Preferences Requires Non-Markovian Rewards

paper_url: http://arxiv.org/abs/2310.00435
repo_url: None
paper_authors: Silviu Pitis
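for: This paper examines how objectives can be aggregated in a principled way when AI systems serve multiple diverse objectives and stakeholders.
methods: The paper takes a normative approach: starting from a set of intuitively appealing axioms, it shows that Markovian aggregation of Markovian reward functions is not possible when the time preference (discount factor) differs across objectives.
results: The paper proposes a practical non-Markovian aggregation scheme that overcomes this impossibility with only one additional parameter per objective. The results offer new insights into sequential, multi-objective agency and intertemporal choice, and help in designing AI systems that serve multiple principals.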

Abstract
As the capabilities of artificial agents improve, they are being increasingly deployed to service multiple diverse objectives and stakeholders. However, the composition of these objectives is often performed ad hoc, with no clear justification. This paper takes a normative approach to multi-objective agency: from a set of intuitively appealing axioms, it is shown that Markovian aggregation of Markovian reward functions is not possible when the time preference (discount factor) for each objective may vary. It follows that optimal multi-objective agents must admit rewards that are non-Markovian with respect to the individual objectives. To this end, a practical non-Markovian aggregation scheme is proposed, which overcomes the impossibility with only one additional parameter for each objective. This work offers new insights into sequential, multi-objective agency and intertemporal choice, and has practical implications for the design of AI systems deployed to serve multiple generations of principals with varying time preference.

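Summary
As the capabilities of artificial agents improve, they are increasingly deployed to serve multiple diverse objectives and stakeholders. However, the composition of these objectives is often ad hoc, with no clear justification. This paper takes a normative approach: starting from a set of intuitively appealing axioms, it proves that Markovian aggregation of Markovian reward functions is not possible when the time preference (discount factor) for each objective may vary. It follows that optimal multi-objective agents must admit rewards that are non-Markovian with respect to the individual objectives. To this end, a practical non-Markovian aggregation scheme is proposed, which overcomes the impossibility with one additional parameter per objective. This work offers new insights into sequential, multi-objective agency and intertemporal choice, and has practical implications for the design of AI systems that serve multiple principals.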
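
A small numerical illustration, not the paper's axiomatic argument, can give some intuition for why objectives with different discount factors resist aggregation under a single fixed discount. If two objectives are summed with equal weight, the weight the sum places on a unit reward at step t is the sum of the individual discount powers, and the ratio between consecutive weights changes over time. The equal weighting and the discount values below are assumptions chosen for the example.

```python
def aggregate_weight(t, gammas):
    """Weight that an equal-weight sum of discounted objectives puts on a unit reward at step t."""
    return sum(g ** t for g in gammas)

def effective_discounts(gammas, horizon):
    """One-step ratios w(t+1)/w(t); a single fixed discount factor would make these constant."""
    return [aggregate_weight(t + 1, gammas) / aggregate_weight(t, gammas)
            for t in range(horizon)]

ratios = effective_discounts([0.5, 0.9], horizon=3)
single = effective_discounts([0.9], horizon=3)
```

For a single objective the ratios stay at its discount factor, while for the mixed pair they drift upward toward the larger factor, so the aggregate's time preference depends on how much time has elapsed.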

Active-Perceptive Motion Generation for Mobile Manipulation

paper_url: http://arxiv.org/abs/2310.00433
repo_url: None
paper_authors: Snehal Jauhri, Sophie Lueth, Georgia Chalvatzaki
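for: This paper aims to give mobile manipulation systems task-relevant visual information so that they can complete grasping tasks in initially unknown environments.
methods: The paper uses an active perception pipeline that samples trajectories in a receding-horizon fashion and computes path-wise utilities to raise the success of grasping tasks.
results: Experiments show that the method is effective for mobile grasping in simulated scenes and transfers to the real world.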

Abstract
Mobile Manipulation (MoMa) systems incorporate the benefits of mobility and dexterity, thanks to the enlarged space in which they can move and interact with their environment. MoMa robots can also continuously perceive their environment when equipped with onboard sensors, e.g., an embodied camera. However, extracting task-relevant visual information in unstructured and cluttered environments such as households remains a challenge. In this work, we introduce an active perception pipeline for mobile manipulators to generate motions that are informative toward manipulation tasks such as grasping, in initially unknown, cluttered scenes. Our proposed approach, ActPerMoMa, generates robot trajectories in a receding horizon fashion, sampling trajectories and computing path-wise utilities that trade off reconstructing the unknown scene, by maximizing the visual information gain, against the task-oriented objective, e.g., grasp success, by maximizing grasp reachability efficiently. We demonstrate the efficacy of our method in simulated experiments with a dual-arm TIAGo++ MoMa robot performing mobile grasping in cluttered scenes and when its path is obstructed by external obstacles. We empirically analyze the contribution of various utilities and hyperparameters, and compare against representative baselines both with and without active perception objectives. Finally, we demonstrate the transfer of our mobile grasping strategy to the real world, showing a promising direction for active-perceptive MoMa.

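Summary
Mobile Manipulation (MoMa) systems combine the benefits of mobility and dexterity, and can perceive their environment with onboard sensors such as an embodied camera. However, extracting task-relevant visual information in unstructured, cluttered environments such as households remains a challenge. In this work, we introduce an active perception pipeline that generates motions for mobile manipulators that are informative for manipulation tasks such as grasping. Our proposed approach, ActPerMoMa, generates robot trajectories in a receding-horizon fashion, computing path-wise utilities that weigh scene reconstruction against the task objective. In our experiments, a dual-arm TIAGo++ MoMa robot performs mobile grasping in cluttered scenes, and we evaluate the contribution of the various utilities and hyperparameters. We also compare against baselines without active perception objectives. Finally, we demonstrate the transfer of our mobile grasping strategy to the real world, showing the promise of active-perceptive MoMa.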
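
The trade-off between information gain and the task objective described above can be pictured as scoring each sampled trajectory with a weighted utility and executing the best one before re-planning. The sketch below assumes a simple weighted sum with a hypothetical `weight` parameter and pre-computed utility values; how ActPerMoMa actually combines and computes its utilities is not specified in this digest.

```python
def select_trajectory(candidates, info_gain, reachability, weight=0.5):
    """Return the candidate with the highest weighted sum of the two utilities."""
    def utility(traj):
        return weight * info_gain(traj) + (1.0 - weight) * reachability(traj)
    return max(candidates, key=utility)

# Hypothetical sampled trajectories with pre-computed utility values.
candidates = [
    {"name": "look_around", "gain": 0.9, "reach": 0.1},
    {"name": "approach_object", "gain": 0.5, "reach": 0.8},
]
best = select_trajectory(candidates, lambda t: t["gain"], lambda t: t["reach"])
```

In a receding-horizon loop, only the first segment of `best` would be executed before new trajectories are sampled with the updated scene estimate.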

Making Friends in the Dark: Ad Hoc Teamwork Under Partial Observability

paper_url: http://arxiv.org/abs/2310.01439
repo_url: https://github.com/jmribeiro/adhoc-teamwork-under-partial-observability
paper_authors: João G. Ribeiro, Cassandro Martinho, Alberto Sardinha, Francisco S. Melo
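for: This study aims to provide an ad hoc teamwork approach, based on prior knowledge and partial observations of the environment, for assisting unknown teammates in solving unknown tasks under partial observability.
methods: The study makes three assumptions: the state of the environment is always partially observable, the teammates' actions are never observable, and the ad hoc agent has no direct access to a reward signal. On this basis, it proposes an ad hoc teamwork approach that relies only on prior knowledge and partial observations.
results: Experiments on 70 POMDPs from 11 domains show that the approach not only helps unknown teammates solve unknown tasks but also performs robustly on more complex problems.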

Abstract
This paper introduces a formal definition of the setting of ad hoc teamwork under partial observability and proposes a first-principled model-based approach which relies only on prior knowledge and partial observations of the environment in order to perform ad hoc teamwork. We make three distinct assumptions that set it apart from previous works, namely: i) the state of the environment is always partially observable, ii) the actions of the teammates are always unavailable to the ad hoc agent and iii) the ad hoc agent has no access to a reward signal which could be used to learn the task from scratch. Our results in 70 POMDPs from 11 domains show that our approach is not only effective in assisting unknown teammates in solving unknown tasks but is also robust in scaling to more challenging problems.

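Summary
This paper introduces a formal definition of ad hoc teamwork under partial observability and proposes a first-principled, model-based approach that relies only on prior knowledge and partial observations of the environment to perform ad hoc teamwork. We make three distinct assumptions: i) the state of the environment is always partially observable, ii) the actions of the teammates are never available to the ad hoc agent, and iii) the ad hoc agent has no access to a reward signal from which it could learn the task. Our results in 70 POMDPs from 11 domains show that the approach is not only effective in assisting unknown teammates in solving unknown tasks but also robust on more challenging problems.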

Building Flexible, Scalable, and Machine Learning-ready Multimodal Oncology Datasets

paper_url: http://arxiv.org/abs/2310.01438
repo_url: None
paper_authors: Aakash Tripathi, Asim Waqas, Kavya Venkatesan, Yasin Yilmaz, Ghulam Rasool
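for: This study proposes a flexible, scalable metadata framework for efficiently fusing medical data from different sources, including radiological scans, histopathology images, molecular information, and clinical data, to enable precision medicine and personalized treatment.
methods: The study presents the Multimodal Integration of Oncology Data System (MINDS), a flexible, scalable metadata framework that efficiently fuses data from different sources and offers an interface for exploring relationships across data types and building large-scale multimodal machine learning models.
results: MINDS fuses multimodal data into a patient-centric framework in support of precision medicine and personalized treatment. It tracks granular data provenance to ensure reproducibility and transparency, and its cloud-native architecture handles rapidly growing data in a secure, scalable, and efficient manner.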

Abstract
The advancements in data acquisition, storage, and processing techniques have resulted in the rapid growth of heterogeneous medical data. Integrating radiological scans, histopathology images, and molecular information with clinical data is essential for developing a holistic understanding of the disease and optimizing treatment. The need for integrating data from multiple sources is further pronounced in complex diseases such as cancer for enabling precision medicine and personalized treatments. This work proposes Multimodal Integration of Oncology Data System (MINDS) - a flexible, scalable, and cost-effective metadata framework for efficiently fusing disparate data from public sources such as the Cancer Research Data Commons (CRDC) into an interconnected, patient-centric framework. MINDS offers an interface for exploring relationships across data types and building cohorts for developing large-scale multimodal machine learning models. By harmonizing multimodal data, MINDS aims to potentially empower researchers with greater analytical ability to uncover diagnostic and prognostic insights and enable evidence-based personalized care. MINDS tracks granular end-to-end data provenance, ensuring reproducibility and transparency. The cloud-native architecture of MINDS can handle exponential data growth in a secure, cost-optimized manner while ensuring substantial storage optimization, replication avoidance, and dynamic access capabilities. Auto-scaling, access controls, and other mechanisms guarantee pipelines' scalability and security. MINDS overcomes the limitations of existing biomedical data silos via an interoperable metadata-driven approach that represents a pivotal step toward the future of oncology data integration.

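Summary
Advances in data acquisition, storage, and processing have led to rapid growth of heterogeneous medical data. Integrating radiological scans, histopathology images, and molecular information with clinical data is essential for a holistic understanding of disease and for optimizing treatment. The need to integrate data from multiple sources is even greater in complex diseases such as cancer, where it enables precision medicine and personalized treatment. This work proposes the Multimodal Integration of Oncology Data System (MINDS), a flexible, scalable, and cost-effective metadata framework that efficiently fuses disparate data from different sources into an interconnected, patient-centric framework. MINDS offers an interface for exploring relationships across data types and for building large-scale multimodal machine learning models. By harmonizing multimodal data, MINDS aims to give researchers greater analytical ability to uncover diagnostic and prognostic insights and to support evidence-based personalized care. MINDS tracks granular data provenance, ensuring reproducibility and transparency. Its cloud-native architecture handles rapidly growing data in a secure, cost-optimized manner, with storage optimization, replication avoidance, and dynamic access. Auto-scaling, access controls, and other mechanisms ensure the pipelines' scalability and security. MINDS overcomes the limitations of existing biomedical data silos through an interoperable, metadata-driven approach, representing an important step toward the future of oncology data integration.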

Refutation of Shapley Values for XAI – Additional Evidence

paper_url: http://arxiv.org/abs/2310.00416
repo_url: None
paper_authors: Xuanxiang Huang, Joao Marques-Silva
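for: To show that Shapley values are inadequate for explainable artificial intelligence (XAI).
methods: The paper considers families of classifiers whose features are not Boolean, as well as families of classifiers in which multiple classes can be picked.
results: The paper shows that Shapley values are inadequate for XAI in these families, and that the features changed in any minimal $l_0$ distance adversarial example do not include irrelevant features.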

Abstract
Recent work demonstrated the inadequacy of Shapley values for explainable artificial intelligence (XAI). Although to disprove a theory a single counterexample suffices, a possible criticism of earlier work is that the focus was solely on Boolean classifiers. To address such possible criticism, this paper demonstrates the inadequacy of Shapley values for families of classifiers where features are not boolean, but also for families of classifiers for which multiple classes can be picked. Furthermore, the paper shows that the features changed in any minimal $l_0$ distance adversarial examples do not include irrelevant features, thus offering further arguments regarding the inadequacy of Shapley values for XAI.

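Summary
Recent work demonstrated the inadequacy of Shapley values for explainable artificial intelligence (XAI). Although a single counterexample suffices to disprove a theory, a possible criticism of earlier work is that it focused on Boolean classifiers. To address this criticism, this paper shows that Shapley values are inadequate for families of classifiers whose features are not Boolean, and also for families of classifiers in which multiple classes can be picked. Furthermore, the paper shows that the features changed in any minimal $l_0$ distance adversarial example do not include irrelevant features, which offers further arguments regarding the inadequacy of Shapley values for XAI.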
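
For reference, the quantity under discussion can be computed exactly for small feature sets by averaging each feature's marginal contribution over all orderings. The sketch below uses a generic, hypothetical characteristic function `value`; it does not reproduce the paper's counterexamples or the specific characteristic function used for classifiers in XAI, and only shows what a Shapley value is.

```python
from itertools import permutations

def shapley_values(n_features, value):
    """Exact Shapley values by averaging marginal contributions over all feature orderings."""
    phi = [0.0] * n_features
    orderings = list(permutations(range(n_features)))
    for order in orderings:
        coalition = frozenset()
        for i in order:
            phi[i] += value(coalition | {i}) - value(coalition)
            coalition = coalition | {i}
    return [p / len(orderings) for p in phi]

# Toy characteristic function (hypothetical): only feature 0 matters.
value = lambda s: 1.0 if 0 in s else 0.0
phi = shapley_values(2, value)
```

In this toy case the irrelevant feature receives a score of zero; the paper's argument is that for certain classifiers the scores and feature relevance can disagree. The enumeration is factorial in the number of features, so it is only practical for very small examples.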

Order-Preserving GFlowNets

paper_url: http://arxiv.org/abs/2310.00386
repo_url: None
paper_authors: Yihang Chen, Lukas Mauch
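for: This paper addresses optimization tasks, in particular multi-objective ones, where the reward function is not directly accessible or is computationally expensive.
methods: The paper proposes Order-Preserving GFlowNets (OP-GFNs), which sample candidates according to a learned reward consistent with a provided (partial) order, without requiring an explicit formulation of the reward function.
results: Experiments show that OP-GFNs perform strongly in single-objective maximization and multi-objective Pareto front approximation tasks, including synthetic datasets, molecule generation, and neural architecture search.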

Abstract
Generative Flow Networks (GFlowNets) have been introduced as a method to sample a diverse set of candidates with probabilities proportional to a given reward. However, GFlowNets can only be used with a predefined scalar reward, which can be either computationally expensive or not directly accessible, in the case of multi-objective optimization (MOO) tasks for example. Moreover, to prioritize identifying high-reward candidates, the conventional practice is to raise the reward to a higher exponent, the optimal choice of which may vary across different environments. To address these issues, we propose Order-Preserving GFlowNets (OP-GFNs), which sample with probabilities in proportion to a learned reward function that is consistent with a provided (partial) order on the candidates, thus eliminating the need for an explicit formulation of the reward function. We theoretically prove that the training process of OP-GFNs gradually sparsifies the learned reward landscape in single-objective maximization tasks. The sparsification concentrates on candidates of a higher hierarchy in the ordering, ensuring exploration at the beginning and exploitation towards the end of the training. We demonstrate OP-GFN's state-of-the-art performance in single-objective maximization (totally ordered) and multi-objective Pareto front approximation (partially ordered) tasks, including synthetic datasets, molecule generation, and neural architecture search.

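Summary
Generative Flow Networks (GFlowNets) were introduced as a method for sampling a diverse set of candidates with probabilities proportional to a given reward. However, GFlowNets can only be used with a predefined scalar reward, which may be computationally expensive or not directly accessible, for example in multi-objective optimization (MOO) tasks. Moreover, to prioritize high-reward candidates, the conventional practice is to raise the reward to a higher exponent, whose optimal value may vary across environments. To address these issues, we propose Order-Preserving GFlowNets (OP-GFNs), which sample with probabilities consistent with a provided (partial) order on the candidates, so no explicit formulation of the reward function is needed. We prove that, in single-objective maximization tasks, the training process of OP-GFNs gradually sparsifies the learned reward landscape, with exploration at the beginning of training and exploitation toward the end. We demonstrate the state-of-the-art performance of OP-GFNs in single-objective maximization (totally ordered) and multi-objective Pareto front approximation (partially ordered) tasks, including synthetic datasets, molecule generation, and neural architecture search.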
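
The conventional practice mentioned above, raising the reward to a higher exponent, can be illustrated by computing the target sampling distribution directly. The sketch below is only that illustration: the rewards and exponent values are hypothetical, and it does not implement a GFlowNet or the order-preserving training objective.

```python
def sampling_probs(rewards, beta=1.0):
    """Target distribution of a GFlowNet-style sampler: p(x) proportional to R(x)**beta."""
    powered = [r ** beta for r in rewards]
    total = sum(powered)
    return [p / total for p in powered]

probs_flat = sampling_probs([1.0, 2.0, 4.0], beta=1.0)
probs_sharp = sampling_probs([1.0, 2.0, 4.0], beta=4.0)
```

A larger exponent concentrates probability on the highest-reward candidate at the cost of diversity, and the right value depends on the reward scale, which is the tuning burden the abstract says OP-GFNs are designed to avoid.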

Dynamic Demonstrations Controller for In-Context Learning

paper_url: http://arxiv.org/abs/2310.00385
repo_url: https://github.com/tjtp/d2controller
paper_authors: Fei Zhao, Taotian Pang, Zhen Wu, Zheng Ma, Shujian Huang, Xinyu Dai
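for: This study examines how the number of demonstrations affects the performance of large language models (LLMs) in in-context learning (ICL), and proposes a Dynamic Demonstrations Controller (D$^2$Controller) to improve ICL performance.
methods: Pilot experiments show that adding demonstrations does not necessarily improve performance; D$^2$Controller therefore adjusts the number of demonstrations dynamically and is tested on LLMs of different sizes.
results: D$^2$Controller yields a 5.4% relative improvement on eight LLM sizes across ten datasets, and the method also achieves competitive results when extended to previous ICL models.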

Abstract
In-Context Learning (ICL) is a new paradigm for natural language processing (NLP), where a large language model (LLM) observes a small number of demonstrations and a test instance as its input, and directly makes predictions without updating model parameters. Previous studies have revealed that ICL is sensitive to the selection and the ordering of demonstrations. However, there are few studies regarding the impact of the demonstration number on the ICL performance within a limited input length of LLM, because it is commonly believed that the number of demonstrations is positively correlated with model performance. In this paper, we found this conclusion does not always hold true. Through pilot experiments, we discover that increasing the number of demonstrations does not necessarily lead to improved performance. Building upon this insight, we propose a Dynamic Demonstrations Controller (D$^2$Controller), which can improve the ICL performance by adjusting the number of demonstrations dynamically. The experimental results show that D$^2$Controller yields a 5.4% relative improvement on eight different sizes of LLMs across ten datasets. Moreover, we also extend our method to previous ICL models and achieve competitive results.

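Summary
In-context learning (ICL) is a new paradigm for natural language processing (NLP) in which a large language model (LLM) observes a small number of demonstrations and a test instance and makes predictions directly, without updating model parameters. Earlier studies showed that ICL is sensitive to the selection and ordering of demonstrations. However, few studies examine how the number of demonstrations affects ICL performance within the limited input length of an LLM, because the number of demonstrations is commonly assumed to correlate positively with performance. We find that this assumption does not always hold: pilot experiments show that adding demonstrations does not necessarily improve performance. Based on this, we propose the Dynamic Demonstrations Controller (D$^2$Controller), which improves ICL performance by adjusting the number of demonstrations dynamically. Experiments show a 5.4% relative improvement on eight LLM sizes across ten datasets. We also extend the method to previous ICL models and obtain competitive results.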
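
The idea of treating the demonstration count as something to be chosen, rather than maximized, can be sketched as follows. This is a deliberately simplified stand-in and not the D$^2$Controller procedure itself: the helper `pick_num_demonstrations`, the `score_fn` callback, and the toy scores are all hypothetical.

```python
def pick_num_demonstrations(candidate_ks, score_fn):
    """Return the k with the highest validation score (ties go to the smaller k)."""
    best_k, best_score = None, float("-inf")
    for k in sorted(candidate_ks):
        score = score_fn(k)  # e.g. accuracy on a small held-out set with k demonstrations
        if score > best_score:
            best_k, best_score = k, score
    return best_k


# Made-up validation scores in which accuracy peaks at k=4 and then drops.
toy_scores = {1: 0.61, 2: 0.66, 4: 0.70, 8: 0.68, 16: 0.64}
best = pick_num_demonstrations(toy_scores, toy_scores.get)
```

In practice `score_fn` would run the LLM with `k` demonstrations on held-out examples; here it is a dictionary lookup so the sketch stays self-contained.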

Measuring Value Understanding in Language Models through Discriminator-Critique Gap

paper_url: http://arxiv.org/abs/2310.00378
repo_url: None
paper_authors: Zhaowei Zhang, Fengshuo Bai, Jun Gao, Yaodong Yang
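for: This study evaluates how well large language models (LLMs) understand human values and proposes the Value Understanding Measurement (VUM) framework for that purpose.
methods: The evaluation values are specified with the Schwartz Value Survey, and a thousand-level dialogue dataset is built with GPT-4. The assessment compares the value alignment of LLM outputs against baseline answers, and the LLM's reasons for value recognition against GPT-4's annotations.
results: Scaling significantly affects "know what" but has little effect on "know why", which stays at a consistently high level. This suggests that LLMs may craft plausible explanations from the provided context without truly understanding the underlying values, indicating potential risks.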

Abstract
Recent advancements in Large Language Models (LLMs) have heightened concerns about their potential misalignment with human values. However, evaluating their grasp of these values is complex due to their intricate and adaptable nature. We argue that truly understanding values in LLMs requires considering both "know what" and "know why". To this end, we present the Value Understanding Measurement (VUM) framework that quantitatively assesses both "know what" and "know why" by measuring the discriminator-critique gap related to human values. Using the Schwartz Value Survey, we specify our evaluation values and develop a thousand-level dialogue dataset with GPT-4. Our assessment looks at both the value alignment of LLM's outputs compared to baseline answers and how LLM responses align with reasons for value recognition versus GPT-4's annotations. We evaluate five representative LLMs and provide strong evidence that the scaling law significantly impacts "know what" but not much on "know why", which has consistently maintained a high level. This may further suggest that LLMs might craft plausible explanations based on the provided context without truly understanding their inherent value, indicating potential risks.

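Summary
Recent advances in large language models (LLMs) have raised concerns about their potential misalignment with human values. Evaluating how well these models grasp human values is difficult because the models are intricate and adaptable. We argue that truly understanding values requires both "know what" and "know why". We therefore present the Value Understanding Measurement (VUM) framework, which quantitatively assesses both by measuring the discriminator-critique gap related to human values. We specify the evaluation values with the Schwartz Value Survey and build a thousand-level dialogue dataset with GPT-4. The assessment covers the value alignment of LLM outputs against baseline answers and the agreement between the LLM's reasons for value recognition and GPT-4's annotations. Evaluating five representative LLMs, we find that the scaling law significantly affects "know what" but has little effect on "know why". This suggests that LLMs may craft plausible explanations from the provided context without truly understanding the underlying values, which indicates potential risks.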

AI-Dentify: Deep learning for proximal caries detection on bitewing x-ray – HUNT4 Oral Health Study

paper_url: http://arxiv.org/abs/2310.00354
repo_url: None
paper_authors: Javier Pérez de Frutos, Ragnhild Holden Helland, Shreya Desai, Line Cathrine Nymoen, Thomas Langø, Theodor Remman, Abhijit Sen
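for: This study applies deep learning to the detection of proximal caries on bitewing X-rays, with the aim of improving the accuracy and efficiency of dental diagnosis.
methods: Three object-detection architectures were trained on 13,887 annotated bitewing images: RetinaNet (ResNet50), YOLOv5 (M size), and EfficientDet (D0 and D1 sizes).
results: The trained models show higher average precision and F1-score and a lower false negative rate than the dental clinicians. YOLOv5 performs best, with 0.647 mean average precision, 0.548 mean F1-score, and 0.149 mean false negative rate.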

Abstract
Background: Dental caries diagnosis requires the manual inspection of diagnostic bitewing images of the patient, followed by a visual inspection and probing of the identified dental pieces with potential lesions. Yet the use of artificial intelligence, and in particular deep learning, has the potential to aid in the diagnosis by providing a quick and informative analysis of the bitewing images. Methods: A dataset of 13,887 bitewings from the HUNT4 Oral Health Study was annotated individually by six different experts, and used to train three different object detection deep-learning architectures: RetinaNet (ResNet50), YOLOv5 (M size), and EfficientDet (D0 and D1 sizes). A consensus dataset of 197 images, annotated jointly by the same six dentists, was used for evaluation. A five-fold cross validation scheme was used to evaluate the performance of the AI models. Results: The trained models show an increase in average precision and F1-score, and a decrease in false negative rate, with respect to the dental clinicians. Out of the three architectures studied, YOLOv5 shows the largest improvement, reporting 0.647 mean average precision, 0.548 mean F1-score, and 0.149 mean false negative rate, whereas the best annotators on each of these metrics reported 0.299, 0.495, and 0.164 respectively. Conclusion: Deep-learning models have shown the potential to assist dental professionals in the diagnosis of caries. Yet, the task remains challenging due to the artifacts natural to the bitewings.

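Summary
Background: Diagnosing dental caries requires manual inspection of the patient's bitewing images, followed by visual inspection and probing of the teeth with potential lesions. Artificial intelligence, and deep learning in particular, can aid diagnosis by providing a quick and informative analysis of the bitewing images. Methods: 13,887 bitewings from the HUNT4 Oral Health Study were annotated individually by six experts and used to train three object-detection architectures: RetinaNet (ResNet50), YOLOv5 (M size), and EfficientDet (D0 and D1 sizes). A consensus dataset of 197 images, annotated jointly by the same six dentists, was used for evaluation under a five-fold cross-validation scheme. Results: Compared with the dental clinicians, the trained models show higher average precision and F1-score and a lower false negative rate. YOLOv5 shows the largest improvement, with 0.647 mean average precision, 0.548 mean F1-score, and 0.149 mean false negative rate; the best annotators on each metric scored 0.299, 0.495, and 0.164 respectively. Conclusion: Deep-learning models show potential for assisting in caries diagnosis, but artifacts natural to bitewings keep the task challenging.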
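
For readers less familiar with the reported metrics, the sketch below shows how F1-score and false negative rate follow from detection counts. It assumes detections have already been matched to ground-truth lesions (true positives, false positives, false negatives); the matching rule and the mean-average-precision computation used in the study are not reproduced here, and the counts are made-up.

```python
def detection_metrics(tp, fp, fn):
    """Compute precision, recall, F1 and false negative rate from matched counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if precision + recall else 0.0
    fnr = fn / (tp + fn) if tp + fn else 0.0  # share of real lesions that were missed
    return {"precision": precision, "recall": recall, "f1": f1, "fnr": fnr}


# Illustrative counts only, not taken from the paper.
m = detection_metrics(tp=80, fp=20, fn=20)
```

Note that the false negative rate is simply one minus recall, so a lower value means fewer missed lesions.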

Neuroadaptation in Physical Human-Robot Collaboration

paper_url: http://arxiv.org/abs/2310.00351
repo_url: None
paper_authors: Avinash Singh, Dikai Liu, Chin-Teng Lin
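for: This paper addresses the need for robots in physical human-robot collaboration (pHRC) to adapt their behavior and operation to the performance and intention of the human co-worker, the differing abilities of co-workers in collision avoidance, and singularities of the robot operation.
methods: The authors propose a closed-loop neuroadaptive framework that uses reinforcement learning with cognitive conflict information to adapt the robot strategy, and compare it with open-loop settings.
results: Experiments show that the closed-loop neuroadaptive framework reduces the level of cognitive conflict during pHRC, making collaboration smoother and more intuitive. The results suggest that electroencephalogram (EEG) signals are a feasible basis for future pHRC control systems.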

Abstract
Robots for physical Human-Robot Collaboration (pHRC) systems need to change their behavior and how they operate in consideration of several factors, such as the performance and intention of a human co-worker and the capabilities of different human co-workers in collision avoidance and singularity of the robot operation. As the system's admittance becomes variable throughout the workspace, a potential solution is to tune the interaction forces and control the parameters based on the operator's requirements. To this end, we demonstrate a novel closed-loop neuroadaptive framework for pHRC. We apply cognitive conflict information in a closed-loop manner, with the help of reinforcement learning, to adapt the robot strategy, and compare this with open-loop settings. The experiment results show that the closed-loop neuroadaptive framework successfully reduces the level of cognitive conflict during pHRC, consequently increasing the smoothness and intuitiveness of human-robot collaboration. These results suggest the feasibility of a neuroadaptive approach for future pHRC control systems through electroencephalogram (EEG) signals.

Visual Political Communication in a Polarized Society: A Longitudinal Study of Brazilian Presidential Elections on Instagram

paper_url: http://arxiv.org/abs/2310.00349
repo_url: None
paper_authors: Mathias-Felipe de-Lima-Santos, Isabella Gonçalves, Marcos G. Quiles, Lucia Mesquita, Wilson Ceron
for: This study aims to investigate the visual communication strategies employed by Brazilian presidential candidates on Instagram in the 2018 and 2022 national elections.
methods: The study employs a combination of computational methods and a qualitative approach to analyze a dataset of 11,263 Instagram posts by 19 Brazilian presidential candidates.
results: The study finds consistent patterns of celebratory and positively toned images, a strong sense of personalization, and unique contextual nuances specific to the Brazilian political landscape. The study also uncovers the prevalence of screenshots from news websites and other social media platforms, as well as text-edited images with portrayals.

Abstract
In today's digital age, images have emerged as powerful tools for politicians to engage with their voters on social media platforms. Visual content possesses a unique emotional appeal that often leads to increased user engagement. However, research on visual communication remains relatively limited, particularly in the Global South. This study aims to bridge this gap by employing a combination of computational methods and qualitative approach to investigate the visual communication strategies employed in a dataset of 11,263 Instagram posts by 19 Brazilian presidential candidates in 2018 and 2022 national elections. Through two studies, we observed consistent patterns across these candidates on their use of visual political communication. Notably, we identify a prevalence of celebratory and positively toned images. They also exhibit a strong sense of personalization, portraying candidates connected with their voters on a more emotional level. Our research also uncovers unique contextual nuances specific to the Brazilian political landscape. We note a substantial presence of screenshots from news websites and other social media platforms. Furthermore, text-edited images with portrayals emerge as a prominent feature. In light of these results, we engage in a discussion regarding the implications for the broader field of visual political communication. This article serves as a testament to the pivotal role that Instagram has played in shaping the narrative of two fiercely polarized Brazilian elections, casting a revealing light on the ever-evolving dynamics of visual political communication in the digital age. Finally, we propose avenues for future research in the realm of visual political communication.

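Summary
In today's digital age, images have become powerful tools for politicians to engage with voters on social media. Visual content has a unique emotional appeal that often increases user engagement. Research on visual communication nevertheless remains limited, particularly in the Global South. This study combines computational methods with a qualitative approach to investigate the visual communication strategies in 11,263 Instagram posts by 19 Brazilian presidential candidates in the 2018 and 2022 national elections. Across two studies, we observe consistent patterns in how the candidates use visual political communication. Notably, celebratory and positively toned images are prevalent. The posts are also strongly personalized, portraying candidates as emotionally connected with their voters. We also uncover contextual nuances specific to the Brazilian political landscape: a substantial presence of screenshots from news websites and other social media platforms, and a prominent use of text-edited images with portrayals. Based on these results, we discuss the implications for the broader field of visual political communication. The article shows the pivotal role Instagram played in shaping the narrative of two fiercely polarized Brazilian elections, and it closes by proposing avenues for future research.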

Unlocking Bias Detection: Leveraging Transformer-Based Models for Content Analysis

paper_url: http://arxiv.org/abs/2310.00347
repo_url: None
paper_authors: Shaina Raza, Oluwanifemi Bamgbose, Veronica Chatrath, Shardul Ghuge, Yan Sidyakin, Abdullah Y Muaad
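for: Detecting bias in text, because bias can reinforce negative stereotypes, spread misinformation, and influence decisions.
methods: We introduce the Contextualized Bi-Directional Dual Transformer (CBDT) classifier, an architecture that uses two synergistic transformer networks, the Context Transformer and the Entity Transformer, to improve bias detection.
results: In rigorous tests on several datasets, CBDT distinguishes biased from neutral statements and pinpoints the exact biased words, improving on existing methods by 2-4%.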

Abstract
Bias detection in text is imperative due to its role in reinforcing negative stereotypes, disseminating misinformation, and influencing decisions. Current language models often fall short in generalizing beyond their training sets. In response, we introduce the Contextualized Bi-Directional Dual Transformer (CBDT) Classifier. This novel architecture utilizes two synergistic transformer networks: the Context Transformer and the Entity Transformer, aiming for enhanced bias detection. Our dataset preparation follows the FAIR principles, ensuring ethical data usage. Through rigorous testing on various datasets, CBDT showcases its ability in distinguishing biased from neutral statements, while also pinpointing exact biased lexemes. Our approach outperforms existing methods, achieving a 2-4% increase over benchmark performances. This opens avenues for adapting the CBDT model across diverse linguistic and cultural landscapes.

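Summary
Detecting bias in text is important because bias reinforces negative stereotypes, spreads misinformation, and influences decisions. Current language models often fail to generalize beyond their training sets. In response, we introduce the Contextualized Bi-Directional Dual Transformer (CBDT) classifier. The architecture uses two synergistic transformer networks, the Context Transformer and the Entity Transformer, to improve bias detection. Our dataset preparation follows the FAIR principles to ensure ethical data usage. In rigorous tests on several datasets, CBDT distinguishes biased from neutral statements and pinpoints the exact biased words. The approach improves on existing methods by 2-4%, which opens the way to adapting the CBDT model across diverse linguistic and cultural settings.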

paper_url: http://arxiv.org/abs/2310.08595
repo_url: https://github.com/palatiumAI/deep-intersection-navigation
paper_authors: Badr Ben Elallid, Hamza El Alaoui, Nabil Benamar
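for: This study explores the challenges autonomous vehicles (AVs) face when navigating complex T-intersections in dense traffic.
methods: The study uses the Twin Delayed Deep Deterministic Policy Gradient (TD3) reinforcement learning algorithm so that AVs can make safe and efficient decisions in real time.
results: Trained and tested in the CARLA simulation platform, the TD3-based method shows stable convergence and improved safety across traffic densities. It lets the AV navigate T-intersections effectively, outperforming previous methods in travel delay, collision minimization, and overall cost. The study contributes to reinforcement learning applications in autonomous driving and highlights the potential of single-agent, cost-effective methods.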

Abstract
In this paper, we explore the challenges associated with navigating complex T-intersections in dense traffic scenarios for autonomous vehicles (AVs). Reinforcement learning algorithms have emerged as a promising approach to address these challenges by enabling AVs to make safe and efficient decisions in real-time. Here, we address the problem of efficiently and safely navigating T-intersections using a lower-cost, single-agent approach based on the Twin Delayed Deep Deterministic Policy Gradient (TD3) reinforcement learning algorithm. We show that our TD3-based method, when trained and tested in the CARLA simulation platform, demonstrates stable convergence and improved safety performance in various traffic densities. Our results reveal that the proposed approach enables the AV to effectively navigate T-intersections, outperforming previous methods in terms of travel delays, collision minimization, and overall cost. This study contributes to the growing body of knowledge on reinforcement learning applications in autonomous driving and highlights the potential of single-agent, cost-effective methods for addressing more complex driving scenarios and advancing reinforcement learning algorithms in the future.

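Summary
In this paper, we explore the challenges autonomous vehicles (AVs) face when navigating complex T-intersections in dense traffic. Reinforcement learning algorithms have emerged as a promising way to address these challenges, enabling AVs to make safe and efficient decisions in real time. We address efficient and safe T-intersection navigation with a lower-cost, single-agent approach based on the TD3 reinforcement learning algorithm. Trained and tested in the CARLA simulation platform, the TD3-based method shows stable convergence and improved safety across traffic densities. Our results show that the approach lets the AV navigate T-intersections effectively, outperforming previous methods in travel delay, collision minimization, and overall cost. The study adds to the growing body of knowledge on reinforcement learning in autonomous driving and highlights the potential of single-agent, cost-effective methods for advancing reinforcement learning algorithms.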
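
The core of TD3 is its critic target, which takes the smaller of two target Q-values and adds clipped noise to the target action. The sketch below shows that target for a scalar action. The callables `target_actor`, `target_q1`, and `target_q2` are placeholders for trained networks, and the hyperparameter defaults are common TD3 choices rather than values reported for this paper.

```python
import random


def clip(x, lo, hi):
    """Clamp x to the closed interval [lo, hi]."""
    return max(lo, min(hi, x))


def td3_target(reward, done, next_state, target_actor, target_q1, target_q2,
               gamma=0.99, noise_std=0.2, noise_clip=0.5,
               act_low=-1.0, act_high=1.0, rng=random):
    """TD3 critic target: r + gamma * (1 - done) * min(Q1', Q2') at a smoothed action."""
    # Target policy smoothing: perturb the target action with clipped Gaussian noise.
    noise = clip(rng.gauss(0.0, noise_std), -noise_clip, noise_clip)
    next_action = clip(target_actor(next_state) + noise, act_low, act_high)
    # Clipped double Q-learning: the smaller estimate limits overestimation.
    q_min = min(target_q1(next_state, next_action),
                target_q2(next_state, next_action))
    return reward + gamma * (1.0 - float(done)) * q_min


# Stand-in functions so the sketch runs without any trained networks.
y = td3_target(1.0, False, None,
               target_actor=lambda s: 0.0,
               target_q1=lambda s, a: 1.0,
               target_q2=lambda s, a: 2.0)
```

The third TD3 ingredient, delaying actor updates relative to critic updates, lives in the training loop and is not shown here.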

FedLPA: Personalized One-shot Federated Learning with Layer-Wise Posterior Aggregation

paper_url: http://arxiv.org/abs/2310.00339
repo_url: None
paper_authors: Xiang Liu, Liangxi Liu, Feiyang Ye, Yunheng Shen, Xia Li, Linshan Jiang, Jialin Li
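for: Improve one-shot aggregation performance in federated learning, reduce client-server communication overhead, and protect client privacy.
methods: Proposes FedLPA, a one-shot aggregation method that uses layer-wise posterior aggregation to obtain a more accurate global model from local models trained on non-identical data, without extra auxiliary datasets or exposing any confidential client information.
results: Achieves significant improvements over state-of-the-art methods on several metrics, including learning performance and communication overhead.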

Abstract
Efficiently aggregating trained neural networks from local clients into a global model on a server is a widely researched topic in federated learning. Recently, motivated by diminishing privacy concerns, mitigating potential attacks, and reducing the overhead of communication, one-shot federated learning (i.e., limiting client-server communication into a single round) has gained popularity among researchers. However, the one-shot aggregation performances are sensitively affected by the non-identical training data distribution, which exhibits high statistical heterogeneity in some real-world scenarios. To address this issue, we propose a novel one-shot aggregation method with Layer-wise Posterior Aggregation, named FedLPA. FedLPA aggregates local models to obtain a more accurate global model without requiring extra auxiliary datasets or exposing any confidential local information, e.g., label distributions. To effectively capture the statistics maintained in the biased local datasets in the practical non-IID scenario, we efficiently infer the posteriors of each layer in each local model using layer-wise Laplace approximation and aggregate them to train the global parameters. Extensive experimental results demonstrate that FedLPA significantly improves learning performance over state-of-the-art methods across several metrics.

Summary
Efficiently aggregating trained neural networks from local clients into a global model on a server is a widely researched topic in federated learning. Recently, motivated by diminishing privacy concerns, mitigating potential attacks, and reducing communication overhead, one-shot federated learning (limiting client-server communication to a single round) has gained popularity among researchers. However, one-shot aggregation performance is sensitive to non-identical training data distributions, which show high statistical heterogeneity in some real-world scenarios. To address this issue, we propose a novel one-shot aggregation method called FedLPA, which aggregates local models to obtain a more accurate global model without requiring extra auxiliary datasets or exposing any confidential local information, such as label distributions. To effectively capture the statistics maintained in the biased local datasets in the practical non-IID scenario, we efficiently infer the posteriors of each layer in each local model using layer-wise Laplace approximation and aggregate them to train the global parameters. Extensive experimental results demonstrate that FedLPA significantly improves learning performance over state-of-the-art methods across several metrics.
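
To give an intuition for aggregating posteriors rather than averaging weights, the sketch below combines per-client Gaussian posteriors with diagonal precision: the product of Gaussians has a precision equal to the sum of precisions and a precision-weighted mean. The diagonal form is an assumption made for brevity; the paper's layer-wise Laplace approximation and its training of global parameters are not reproduced, and the numbers are made-up.

```python
def aggregate_diag_gaussians(means, precisions):
    """Combine per-client diagonal Gaussian posteriors over one layer's parameters.

    means[k][j] and precisions[k][j] describe client k's posterior for parameter j.
    Returns the precision-weighted mean and the summed precision.
    """
    n_params = len(means[0])
    agg_mean, agg_prec = [], []
    for j in range(n_params):
        total_prec = sum(p[j] for p in precisions)
        weighted = sum(p[j] * m[j] for m, p in zip(means, precisions))
        agg_prec.append(total_prec)
        agg_mean.append(weighted / total_prec)
    return agg_mean, agg_prec


# Two clients, two parameters (illustrative values).
means = [[0.0, 2.0], [1.0, 4.0]]
precisions = [[1.0, 3.0], [3.0, 1.0]]
global_mean, global_prec = aggregate_diag_gaussians(means, precisions)
```

Each parameter is pulled towards the client that is more certain about it, which plain weight averaging would not do.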

Quantization of Deep Neural Networks to facilitate self-correction of weights on Phase Change Memory-based analog hardware

paper_url: http://arxiv.org/abs/2310.00337
repo_url: None
paper_authors: Arseni Ivanov
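for: This work targets hardware-accelerated neural networks for edge computing applications.
methods: It combines a quantization technique designed specifically for the hardware architecture with a self-correcting mechanism.
results: When paired with an on-chip pulse generator, the self-correcting neural network performs comparably to networks trained with analog-aware algorithms.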

Abstract
In recent years, hardware-accelerated neural networks have gained significant attention for edge computing applications. Among various hardware options, crossbar arrays offer a promising avenue for efficient storage and manipulation of neural network weights. However, the transition from trained floating-point models to hardware-constrained analog architectures remains a challenge. In this work, we combine a quantization technique specifically designed for such architectures with a novel self-correcting mechanism. By utilizing dual crossbar connections to represent both the positive and negative parts of a single weight, we develop an algorithm to approximate a set of multiplicative weights. These weights, along with their differences, aim to represent the original network's weights with minimal loss in performance. We implement the models using IBM's aihwkit and evaluate their efficacy over time. Our results demonstrate that, when paired with an on-chip pulse generator, our self-correcting neural network performs comparably to those trained with analog-aware algorithms.

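Summary
In recent years, hardware-accelerated neural networks have gained significant attention for edge computing applications. Among the hardware options, crossbar arrays offer a promising route to efficient storage and manipulation of neural network weights. However, moving from trained floating-point models to hardware-constrained analog architectures remains a challenge. In this work, we combine a quantization technique designed specifically for such architectures with a novel self-correcting mechanism. Using dual crossbar connections to represent the positive and negative parts of a single weight, we develop an algorithm to approximate a set of multiplicative weights. These weights, together with their differences, aim to represent the original network's weights with minimal loss in performance. We implement the models with IBM's aihwkit and evaluate their efficacy over time. Our results show that, when paired with an on-chip pulse generator, the self-correcting neural network performs comparably to networks trained with analog-aware algorithms.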
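
The "dual crossbar connections" idea, where one device holds the positive part of a weight and another the negative part, can be sketched as a differential pair with a limited number of conductance levels. The function `split_weight`, the uniform level grid, and the parameters `g_max` and `levels` are assumptions for illustration; this is not the paper's multiplicative-weight algorithm and does not use aihwkit.

```python
def split_weight(w, g_max=1.0, levels=16):
    """Quantize w into a (g_plus, g_minus) pair such that w is roughly g_plus - g_minus.

    Each conductance takes one of `levels` evenly spaced values in [0, g_max];
    weights larger in magnitude than g_max are clipped.
    """
    step = g_max / (levels - 1)
    magnitude = min(abs(w), g_max)
    q = round(magnitude / step) * step  # nearest representable conductance
    return (q, 0.0) if w >= 0 else (0.0, q)


# Encode a few weights as differential pairs.
pairs = [split_weight(w) for w in [-0.8, 0.0, 0.35]]
```

For weights within the representable range, the reconstruction error `abs((g_plus - g_minus) - w)` is at most half a quantization step.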

Efficient Planning with Latent Diffusion

paper_url: http://arxiv.org/abs/2310.00311
repo_url: None
paper_authors: Wenhao Li
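for: This work addresses temporal abstraction and efficient planning in offline reinforcement learning, especially for temporally extended tasks with delayed sparse rewards. Existing methods typically plan in the raw action space, which can be inefficient and inflexible.
methods: The paper plans in a latent action space, which captures only possible actions within the behavior policy support and decouples the temporal structure between planning and modeling. Because existing latent-action-based methods are limited to discrete spaces and need expensive planning, it learns a continuous latent action representation with latent, score-based diffusion models.
results: The proposed $\texttt{LatentDiffuser}$ is competitive on low-dimensional locomotion control tasks and surpasses existing methods on higher-dimensional tasks.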

Abstract
Temporal abstraction and efficient planning pose significant challenges in offline reinforcement learning, mainly when dealing with domains that involve temporally extended tasks and delayed sparse rewards. Existing methods typically plan in the raw action space and can be inefficient and inflexible. Latent action spaces offer a more flexible paradigm, capturing only possible actions within the behavior policy support and decoupling the temporal structure between planning and modeling. However, current latent-action-based methods are limited to discrete spaces and require expensive planning. This paper presents a unified framework for continuous latent action space representation learning and planning by leveraging latent, score-based diffusion models. We establish the theoretical equivalence between planning in the latent action space and energy-guided sampling with a pretrained diffusion model and incorporate a novel sequence-level exact sampling method. Our proposed method, $\texttt{LatentDiffuser}$, demonstrates competitive performance on low-dimensional locomotion control tasks and surpasses existing methods in higher-dimensional tasks.

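Summary
Temporal abstraction and efficient planning pose significant challenges in offline reinforcement learning, especially in domains with temporally extended tasks and delayed sparse rewards. Existing methods typically plan in the raw action space, which can be inefficient and inflexible. Latent action spaces offer a more flexible paradigm: they capture only the actions possible within the behavior policy support and decouple the temporal structure between planning and modeling. However, current latent-action-based methods are limited to discrete spaces and require expensive planning. This paper presents a unified framework for learning and planning with continuous latent action space representations using latent, score-based diffusion models. We establish the theoretical equivalence between planning in the latent action space and energy-guided sampling with a pretrained diffusion model, and incorporate a novel sequence-level exact sampling method. The proposed method, $\texttt{LatentDiffuser}$, is competitive on low-dimensional locomotion control tasks and surpasses existing methods on higher-dimensional tasks.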

A Hierarchical Approach to Environment Design with Generative Trajectory Modeling

paper_url: http://arxiv.org/abs/2310.00301
repo_url: None
paper_authors: Dexun Li, Pradeep Varakantham
for: trains generally capable agents to achieve good zero-shot transfer performance
methods: uses Hierarchical MDP and synthetic experience dataset to reduce resource-intensive interactions
results: significantly improves efficiency and robustness of agent under limited training resources, with manifold advantages and effectiveness in various domains.

Abstract
Unsupervised Environment Design (UED) is a paradigm for training generally capable agents to achieve good zero-shot transfer performance. This paradigm hinges on automatically generating a curriculum of training environments. Leading approaches for UED predominantly use randomly generated environment instances to train the agent. While these methods exhibit good zero-shot transfer performance, they often encounter challenges in effectively exploring large design spaces or leveraging previously discovered underlying structures. To address these challenges, we introduce a novel framework based on Hierarchical MDP (Markov Decision Processes). Our approach includes an upper-level teacher's MDP responsible for training a lower-level MDP student agent, guided by the student's performance. To expedite the learning of the upper-level MDP, we leverage recent advancements in generative modeling to generate a synthetic experience dataset for training the teacher agent. Our algorithm, called Synthetically-enhanced Hierarchical Environment Design (SHED), significantly reduces the resource-intensive interactions between the agent and the environment. To validate the effectiveness of SHED, we conduct empirical experiments across various domains, with the goal of developing an efficient and robust agent under limited training resources. Our results show the manifold advantages of SHED and highlight its effectiveness as a potent instrument for curriculum-based learning within the UED framework. This work contributes to exploring the next generation of RL agents capable of adeptly handling an ever-expanding range of complex tasks.

Summary
Unsupervised Environment Design (UED) is a paradigm for training generally capable agents to achieve good zero-shot transfer performance. It relies on automatically generating a curriculum of training environments. Leading UED approaches mainly train the agent on randomly generated environment instances. Although these methods have shown good zero-shot transfer performance, they often encounter challenges in effectively exploring large design spaces or leveraging previously discovered underlying structures. To address these challenges, we propose a novel framework based on Hierarchical MDP (Markov Decision Processes). Our approach includes an upper-level teacher's MDP responsible for training a lower-level MDP student agent, guided by the student's performance. To expedite the learning of the upper-level MDP, we leverage recent advancements in generative modeling to generate synthetic experience datasets for training the teacher agent. Our algorithm, called Synthetically-enhanced Hierarchical Environment Design (SHED), significantly reduces the resource-intensive interactions between the agent and the environment. To validate the effectiveness of SHED, we conduct empirical experiments across various domains, with the goal of developing an efficient and robust agent under limited training resources. Our results show the manifold advantages of SHED and highlight its effectiveness as a potent instrument for curriculum-based learning within the UED framework. This work contributes to exploring the next generation of RL agents capable of adeptly handling an ever-expanding range of complex tasks.

Graph Neural Architecture Search with GPT-4

paper_url: http://arxiv.org/abs/2310.01436
repo_url: None
paper_authors: Haishuai Wang, Yang Gao, Xin Zheng, Peng Zhang, Hongyang Chen, Jiajun Bu
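for: Automatically designing graph neural networks.
methods: Designs a new class of prompts for GPT-4 within GNAS that guide GPT-4 toward the task of generating graph neural architectures.
results: Experiments show that embedding GPT-4 into GNAS outperforms state-of-the-art GNAS methods and converges faster.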

Abstract
Graph Neural Architecture Search (GNAS) has shown promising results in automatically designing graph neural networks. However, GNAS still requires intensive human labor with rich domain knowledge to design the search space and search strategy. In this paper, we integrate GPT-4 into GNAS and propose a new GPT-4 based Graph Neural Architecture Search method (GPT4GNAS for short). The basic idea of our method is to design a new class of prompts for GPT-4 to guide GPT-4 toward the generative task of graph neural architectures. The prompts consist of descriptions of the search space, search strategy, and search feedback of GNAS. By iteratively running GPT-4 with the prompts, GPT4GNAS generates more accurate graph neural networks with fast convergence. Experimental results show that embedding GPT-4 into GNAS outperforms the state-of-the-art GNAS methods.

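Summary
Graph Neural Architecture Search (GNAS) has shown promising results in automatically designing graph neural networks. However, GNAS still requires intensive human labor and rich domain knowledge to design the search space and search strategy. In this paper, we integrate GPT-4 into GNAS and propose a new GPT-4 based Graph Neural Architecture Search method (GPT4GNAS for short). The basic idea is to design a new class of prompts that guide GPT-4 toward the task of generating graph neural architectures. The prompts describe the search space, search strategy, and search feedback of GNAS. By running GPT-4 iteratively with these prompts, GPT4GNAS generates more accurate graph neural networks with fast convergence. Experimental results show that embedding GPT-4 into GNAS outperforms state-of-the-art GNAS methods.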

Active Learning Based Fine-Tuning Framework for Speech Emotion Recognition

paper_url: http://arxiv.org/abs/2310.00283
repo_url: None
paper_authors: Dongyuan Li, Yusong Wang, Kotaro Funakoshi, Manabu Okumura
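for: This method aims to improve the performance and efficiency of speech emotion recognition (SER) for human-machine interaction.
methods: It combines task adaptation pre-training (TAPT) with active learning (AL). TAPT first narrows the information gap between the pre-training task and the downstream task; AL then iteratively selects the most informative and diverse samples for fine-tuning, reducing time consumption.
results: Experiments show that using only 20% of the samples improves accuracy by 8.45% and reduces time consumption by 79%.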

Abstract
Speech emotion recognition (SER) has drawn increasing attention for its applications in human-machine interaction. However, existing SER methods ignore the information gap between the pre-training speech recognition task and the downstream SER task, leading to sub-optimal performance. Moreover, they require much time to fine-tune on each specific speech dataset, restricting their effectiveness in real-world scenes with large-scale noisy data. To address these issues, we propose an active learning (AL) based Fine-Tuning framework for SER that leverages task adaptation pre-training (TAPT) and AL methods to enhance performance and efficiency. Specifically, we first use TAPT to minimize the information gap between the pre-training and the downstream task. Then, AL methods are used to iteratively select a subset of the most informative and diverse samples for fine-tuning, reducing time consumption. Experiments demonstrate that using only 20% of the samples improves accuracy by 8.45% and reduces time consumption by 79%.

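Summary
Speech emotion recognition (SER) has drawn increasing attention for its applications in human-machine interaction. However, existing SER methods ignore the information gap between the pre-training speech recognition task and the downstream SER task, which leads to sub-optimal performance. They also need a long time to fine-tune on each speech dataset, which limits their usefulness in real-world settings. To address these issues, we propose an active learning (AL) based fine-tuning framework for SER that uses task adaptation pre-training (TAPT) and AL methods to improve performance and efficiency. We first use TAPT to minimize the information gap between the pre-training and downstream tasks. We then use AL methods to select the most informative and diverse samples for fine-tuning, which reduces time consumption. Experiments show that using only 20% of the samples improves accuracy by 8.45% and reduces time consumption by 79%.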
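
One common way to pick "informative" samples in active learning is to rank unlabeled items by the entropy of the model's predicted class distribution. The sketch below shows only that uncertainty criterion; the diversity criterion mentioned in the abstract is not modelled, and the function names and toy probabilities are assumptions rather than the paper's selection strategy.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_most_uncertain(pred_probs, budget):
    """Return indices of the `budget` samples with the highest predictive entropy."""
    ranked = sorted(range(len(pred_probs)),
                    key=lambda i: entropy(pred_probs[i]),
                    reverse=True)
    return ranked[:budget]


# Made-up class probabilities for four unlabeled utterances.
pool = [[0.98, 0.01, 0.01],
        [0.34, 0.33, 0.33],
        [0.60, 0.30, 0.10],
        [0.50, 0.50, 0.00]]
chosen = select_most_uncertain(pool, budget=2)
```

In an iterative loop, the chosen samples would be labeled, added to the fine-tuning set, and the model's predictions on the remaining pool recomputed before the next round.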

Corex: Pushing the Boundaries of Complex Reasoning through Multi-Model Collaboration

paper_url: http://arxiv.org/abs/2310.00280
repo_url: https://github.com/qiushisun/corex
paper_authors: Qiushi Sun, Zhangyue Yin, Xiang Li, Zhiyong Wu, Xipeng Qiu, Lingpeng Kong
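for: This work aims to improve the performance of large language models (LLMs) on complex reasoning tasks by designing multi-model collaboration strategies that let models "think outside the box" and improve factuality, faithfulness, and reliability.
methods: It proposes Corex, a suite of multi-model collaboration strategies comprising Debate, Review, and Retrieve modes. Inspired by human behavior, these modes foster task-agnostic approaches to reasoning.
results: Extensive experiments on four types of reasoning tasks show that multi-model collaboration performs substantially better than existing methods. Further results and analysis demonstrate the method's cost-effectiveness and annotation efficiency.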

Abstract
Large Language Models (LLMs) are evolving at an unprecedented pace and have exhibited considerable capability in the realm of natural language processing (NLP) with world knowledge. Benefiting from ultra-large-scale training corpora, a single LLM can manage typical NLP tasks competently. However, its performance in executing reasoning tasks is still confined by the limitations of its internal representations. To push this boundary further, we introduce Corex in this paper, a suite of novel general-purpose strategies that transform LLMs into autonomous agents pioneering multi-model collaborations for complex task-solving. Inspired by human behaviors, Corex is constituted by diverse collaboration paradigms including Debate, Review, and Retrieve modes, which collectively work towards enhancing the factuality, faithfulness, and reliability of the reasoning process. These paradigms foster task-agnostic approaches that enable LLMs to "think outside the box," thereby overcoming hallucinations and providing better solutions. Through extensive experiments across four different types of reasoning tasks, we demonstrate that orchestrating multiple LLMs to work in concert yields substantially better performance compared to existing methods. Further results and in-depth analysis demonstrate the cost-effectiveness of our method, facilitating collaboration among different LLMs and promoting annotation efficiency.

A Unified Framework for Generative Data Augmentation: A Comprehensive Survey

paper_url: http://arxiv.org/abs/2310.00277
repo_url: None
paper_authors: Yunhao Chen, Zihui Yan, Yunjie Zhu
for: To alleviate the problem of data scarcity in machine learning applications.
methods: Uses generative data augmentation (GDA) techniques to increase the quantity of data.
results: Provides a unified framework for the GDA landscape and identifies research directions such as effective data selection, theoretical development for applying large-scale models in GDA, and establishing a benchmark for GDA.

Abstract
Generative data augmentation (GDA) has emerged as a promising technique to alleviate data scarcity in machine learning applications. This thesis presents a comprehensive survey and unified framework of the GDA landscape. We first provide an overview of GDA, discussing its motivation, taxonomy, and key distinctions from synthetic data generation. We then systematically analyze the critical aspects of GDA: selection of generative models, techniques to utilize them, data selection methodologies, validation approaches, and diverse applications. Our proposed unified framework categorizes the extensive GDA literature, revealing gaps such as the lack of universal benchmarks. The thesis summarises promising research directions, including effective data selection, theoretical development for large-scale models' application in GDA, and establishing a benchmark for GDA. By laying a structured foundation, this thesis aims to nurture more cohesive development and accelerate progress in the vital arena of generative data augmentation.

Investigating the Efficacy of Large Language Models in Reflective Assessment Methods through Chain of Thoughts Prompting

paper_url: http://arxiv.org/abs/2310.00272
repo_url: None
paper_authors: Baphumelele Masikisiki, Vukosi Marivate, Yvette Hlope
for: Assessing how well four language models can grade the reflective essays of third-year medical students, with a focus on evaluating critical thinking skills.
methods: Uses Chain of Thought (CoT) prompting to instruct large language models to carry out the grading task.
results: Among all the models, Llama-7b performs the least effectively, displaying the highest mean squared error. Conversely, ChatGPT emerges as the superior model, with a higher Cohen's kappa score of 0.53.

Abstract
Large Language Models (LLMs), such as Generative Pre-trained Transformer 3 (GPT-3), have been developed to understand language through the analysis of extensive text data, allowing them to identify patterns and connections between words. While LLMs have demonstrated impressive performance across various text-related tasks, they encounter challenges in tasks associated with reasoning. To address this challenge, the Chain of Thought (CoT) prompting method has been proposed as a means to enhance LLMs' proficiency in complex reasoning tasks like solving math word problems and answering questions based on logical argumentative reasoning. The primary aim of this research is to assess how well four language models can grade reflective essays of third-year medical students. The assessment will specifically target the evaluation of critical thinking skills using CoT prompting. The research will provide the following contributions: to introduce and educate on the process of instructing models to evaluate reflective essays from a dataset they have not been previously trained on, and to illustrate the use of CoT prompting as an instructional approach for training large models to carry out particular tasks. Our results suggest that among all the models, Llama-7b performs the least effectively, displaying the highest mean squared error. Conversely, ChatGPT emerges as the superior model, boasting a higher Cohen's kappa score of 0.53. Lastly, it's important to note that the selected models do prioritise user privacy by allowing users to delete their own conducted conversations.
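The two metrics named in the abstract, mean squared error and Cohen's kappa, can be computed as in the minimal sketch below. The grade lists are made-up illustrative values, not data from the study, and the unweighted form of kappa is an assumption; the abstract does not say which variant was used.

```python
from collections import Counter


def mean_squared_error(human, model):
    """Average squared difference between two equal-length score lists."""
    return sum((h - m) ** 2 for h, m in zip(human, model)) / len(human)


def cohen_kappa(human, model):
    """Unweighted Cohen's kappa: observed agreement corrected for chance.

    Undefined (division by zero) when chance agreement equals 1,
    i.e. when both raters always use the same single label.
    """
    n = len(human)
    observed = sum(h == m for h, m in zip(human, model)) / n
    h_counts, m_counts = Counter(human), Counter(model)
    # Chance agreement: probability both raters pick the same label independently.
    expected = sum(h_counts[c] * m_counts[c] for c in h_counts) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical grades on a 1-3 scale (illustrative only).
human_grades = [1, 2, 3, 3, 2, 1, 3, 2]
model_grades = [1, 2, 3, 2, 2, 1, 3, 3]

mse = mean_squared_error(human_grades, model_grades)
kappa = cohen_kappa(human_grades, model_grades)
print(f"MSE={mse:.3f}  kappa={kappa:.3f}")
```

A lower mean squared error means the model's grades sit closer to the human grades, while a higher kappa means the two agree more often than chance alone would predict.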

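Summary
Large language models such as Generative Pre-trained Transformer 3 (GPT-3) have been developed to understand language by analyzing large amounts of text data and finding the patterns and relationships in it. Although these models have shown impressive performance in various text-related tasks, they struggle with tasks that require reasoning. To address this challenge, the Chain of Thought (CoT) prompting method has been proposed to enhance the models' ability in complex reasoning tasks, such as solving math word problems and answering questions based on logical argumentative reasoning. The main goal of this research is to assess how four language models perform when grading the reflective essays of third-year medical students, specifically targeting the evaluation of critical thinking skills using CoT prompting. Our results show that, among all the models, Llama-7b performs worst, with the highest mean squared error. Conversely, ChatGPT emerges as the superior model, with a higher Cohen's kappa score of 0.53. Finally, it is worth noting that the selected models prioritise user privacy by allowing users to delete their own conversations.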

Unravel Anomalies: An End-to-end Seasonal-Trend Decomposition Approach for Time Series Anomaly Detection

paper_url: http://arxiv.org/abs/2310.00268
repo_url: None
paper_authors: Zhenwei Zhang, Ruiqi Wang, Ran Ding, Yuantao Gu
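for: Addressing the difficulty that traditional time-series anomaly detection methods have with complex time-series data and diverse anomaly types.
methods: Proposes TADNet, an end-to-end time-series anomaly detection model that uses seasonal-trend decomposition to link different anomaly types to specific decomposition components, simplifying the analysis of complex time series and improving detection performance. Training consists of pre-training on a synthetic dataset followed by fine-tuning, which strikes a balance between effective decomposition and precise anomaly detection.
results: Experiments on real-world datasets show that TADNet achieves state-of-the-art performance and accurately detects a diverse range of anomalies.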

Abstract
Traditional Time-series Anomaly Detection (TAD) methods often struggle with the composite nature of complex time-series data and a diverse array of anomalies. We introduce TADNet, an end-to-end TAD model that leverages Seasonal-Trend Decomposition to link various types of anomalies to specific decomposition components, thereby simplifying the analysis of complex time-series and enhancing detection performance. Our training methodology, which includes pre-training on a synthetic dataset followed by fine-tuning, strikes a balance between effective decomposition and precise anomaly detection. Experimental validation on real-world datasets confirms TADNet's state-of-the-art performance across a diverse range of anomalies.
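To illustrate why decomposition helps, the sketch below uses a classical additive seasonal-trend decomposition (moving-average trend, per-phase seasonal means) and flags the largest residual as the anomaly. This is a hand-written textbook procedure used only as an assumption-laden stand-in; TADNet itself learns its decomposition end to end, and none of the function names or the synthetic series come from the paper.

```python
import math


def moving_average(series, window):
    """Centered moving average; the window shrinks at the edges."""
    half = window // 2
    out = []
    for i in range(len(series)):
        lo, hi = max(0, i - half), min(len(series), i + half + 1)
        out.append(sum(series[lo:hi]) / (hi - lo))
    return out


def decompose(series, period):
    """Additive split: series = trend + seasonal + residual."""
    window = period if period % 2 else period + 1  # force an odd window
    trend = moving_average(series, window)
    detrended = [x - t for x, t in zip(series, trend)]
    # Seasonal component: mean detrended value at each phase of the period.
    phase_means = []
    for p in range(period):
        values = detrended[p::period]
        phase_means.append(sum(values) / len(values))
    seasonal = [phase_means[i % period] for i in range(len(series))]
    residual = [x - t - s for x, t, s in zip(series, trend, seasonal)]
    return trend, seasonal, residual


# Synthetic series: linear trend + period-12 seasonality + one point anomaly.
PERIOD = 12
series = [0.05 * i + math.sin(2 * math.pi * i / PERIOD) for i in range(120)]
series[50] += 5.0  # injected spike (illustrative)

trend, seasonal, residual = decompose(series, PERIOD)
suspect = max(range(len(residual)), key=lambda i: abs(residual[i]))
print("most anomalous index:", suspect)
```

Once the trend and seasonal parts are removed, a point anomaly stands out in the residual, whereas a shift in level or a change in periodic shape would instead show up in the trend or seasonal component.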

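Summary
Traditional time-series anomaly detection (TAD) methods often struggle with the composite nature of complex time-series data and with diverse anomalies. We introduce TADNet, an end-to-end TAD model that uses seasonal-trend decomposition to link different anomaly types to specific decomposition components, simplifying the analysis of complex time series and improving detection performance. Our training method, pre-training followed by fine-tuning, strikes a balance between effective decomposition and precise anomaly detection. Experiments on real-world datasets confirm that TADNet reaches state-of-the-art performance and detects a wide range of anomalies well.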

The Physics of Preference: Unravelling Imprecision of Human Preferences through Magnetisation Dynamics

paper_url: http://arxiv.org/abs/2310.00267
repo_url: None
paper_authors: Ivan S. Maksymov, Ganna Pogrebna
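for: Studying inconsistent behaviour and preference reversal in human decision-making.
methods: Uses a physical principle, specifically current-driven magnetisation reversal in magnetic nanostructures, to model human decision-making.
results: Tests show that this blend of physics and psychology accurately captures the complexity of human decision-making.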

Abstract
Paradoxical decision-making behaviours such as preference reversal often arise from imprecise or noisy human preferences. By harnessing the physical principle of magnetisation reversal in ferromagnetic nanostructures driven by electric current, we developed a model that closely reflects human decision-making dynamics. Tested against a spectrum of psychological data, our model adeptly captures the complexities inherent in individual choices. This blend of physics and psychology paves the way for fresh perspectives on understanding human decision-making processes.

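Summary
Human decision-making often shows paradoxical behaviour such as preference reversal. By harnessing current-driven magnetisation reversal in ferromagnetic nanostructures, we developed a model that closely reflects human decision-making dynamics. Tested against a range of psychological data, the model captures the complexity of individual choices. This blend of physics and psychology offers fresh perspectives on understanding human decision-making.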

A quantum system control method based on enhanced reinforcement learning

paper_url: http://arxiv.org/abs/2310.03036
repo_url: None
paper_authors: Wenjie Liu, Bosi Wang, Jihao Fan, Yebo Ge, Mohammed Zidan
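for: Controlling quantum systems with improved control accuracy and efficiency.
methods: Proposes a quantum system control method based on enhanced reinforcement learning (QSC-ERL) that casts quantum system control as a reinforcement learning task. New enhanced neural networks let the agent quickly maximize the long-term cumulative reward and accurately steer a quantum state from an initial state to a target state.
results: Compared with other methods, QSC-ERL achieves learning control of quantum systems with fidelity close to 1 under limited resources and needs fewer episodes for quantum state evolution.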

Abstract
Traditional quantum system control methods often face different constraints and are prone to both leakage and stochastic control errors under the condition of limited resources. Reinforcement learning has proved to be an efficient way to complete the quantum system control task. To learn a satisfactory control strategy under the condition of limited resources, a quantum system control method based on enhanced reinforcement learning (QSC-ERL) is proposed. The states and actions in reinforcement learning are mapped to quantum states and control operations in quantum systems. By using new enhanced neural networks, reinforcement learning can quickly achieve the maximization of long-term cumulative rewards, and a quantum state can be evolved accurately from an initial state to a target state. According to the number of candidate unitary operations, the three-switch control is used for simulation experiments. Compared with other methods, QSC-ERL achieves learning control of quantum systems with fidelity close to 1, and takes fewer episodes to evolve the quantum state under the condition of limited resources.
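The mapping described above (control operations as actions, fidelity to the target as the success measure) can be sketched for a single qubit as follows. The gates, states, and function names are illustrative assumptions rather than the paper's setup, and the fidelity shown is the standard pure-state form F = |<target|state>|^2.

```python
import math


def apply_gate(gate, state):
    """Multiply a square matrix (list of rows) by a state vector."""
    return [sum(gate[r][c] * state[c] for c in range(len(state)))
            for r in range(len(gate))]


def fidelity(target, state):
    """Pure-state fidelity |<target|state>|^2, between 0 and 1."""
    overlap = sum(t.conjugate() * s for t, s in zip(target, state))
    return abs(overlap) ** 2


S = 1 / math.sqrt(2)
# Hypothetical action set: each action is one unitary control operation.
ACTIONS = {
    "H": [[S, S], [S, -S]],  # Hadamard
    "X": [[0, 1], [1, 0]],   # bit flip
}

initial = [1 + 0j, 0j]     # |0>
target = [S + 0j, S + 0j]  # |+>

# An RL agent would pick actions to maximize a fidelity-based reward;
# here we simply score each single-step action.
scores = {name: fidelity(target, apply_gate(gate, initial))
          for name, gate in ACTIONS.items()}
print(scores)
```

A fidelity of 1 means the evolved state matches the target exactly, which is why "fidelity close to 1" serves as the measure of successful control in the abstract.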

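Summary
Traditional quantum system control methods often face different constraints and are prone to leakage and stochastic control errors, especially when resources are limited. A quantum system control method based on enhanced reinforcement learning (QSC-ERL) is an effective way to complete the quantum system control task. In this method, the states and actions of reinforcement learning are mapped to quantum states and control operations in the quantum system. With new enhanced neural networks, reinforcement learning can quickly maximize the long-term cumulative reward and evolve a quantum state from an initial state to a target state. According to the number of candidate unitary operations, three-switch control is used in the simulation experiments. Compared with other methods, QSC-ERL controls quantum systems accurately under limited resources and completes quantum state evolution in fewer episodes.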

AdaptNet: Policy Adaptation for Physics-Based Character Control

paper_url: http://arxiv.org/abs/2310.00239
repo_url: None
paper_authors: Pei Xu, Kaixiang Xie, Sheldon Andrews, Paul G. Kry, Michael Neff, Morgan McGuire, Ioannis Karamouzas, Victor Zordan
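for: Proposing a method for quickly learning new behaviors from existing policies.
methods: Builds on a given reinforcement learning controller with a two-tier hierarchy that first augments the original state embedding and then modifies the policy network layers to make larger changes.
results: The method quickly adapts existing physics-based controllers to a wide range of new locomotion styles, task targets, character morphologies, and environment changes. It also learns markedly more efficiently than other approaches, as shown by reduced training times.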

Abstract
Motivated by humans' ability to adapt skills in the learning of new ones, this paper presents AdaptNet, an approach for modifying the latent space of existing policies to allow new behaviors to be quickly learned from like tasks in comparison to learning from scratch. Building on top of a given reinforcement learning controller, AdaptNet uses a two-tier hierarchy that augments the original state embedding to support modest changes in a behavior and further modifies the policy network layers to make more substantive changes. The technique is shown to be effective for adapting existing physics-based controllers to a wide range of new styles for locomotion, new task targets, changes in character morphology and extensive changes in environment. Furthermore, it exhibits a significant increase in learning efficiency, as indicated by greatly reduced training times when compared to training from scratch or using other approaches that modify existing policies. Code is available at https://motion-lab.github.io/AdaptNet.

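Summary
Motivated by humans' ability to adapt existing skills when learning new ones, this paper presents AdaptNet, an approach that modifies existing policies so that new behaviors can be learned quickly from similar tasks. Building on a given reinforcement learning controller, AdaptNet uses a two-tier hierarchy that augments the original state embedding to support modest changes in behavior and modifies the policy network layers to make more substantive changes. The technique effectively adapts existing physics-based controllers to a range of new locomotion styles, task targets, character morphologies, and environment changes. It also markedly improves learning efficiency, with much shorter training times than training from scratch or using other approaches that modify existing policies. Code is available at https://motion-lab.github.io/AdaptNet.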

Combining Spatial and Temporal Abstraction in Planning for Better Generalization

paper_url: http://arxiv.org/abs/2310.00229
repo_url: https://github.com/pwnerharry/skipper
paper_authors: Mingde Zhao, Safa Alver, Harm van Seijen, Romain Laroche, Doina Precup, Yoshua Bengio
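for: Improving an agent's execution efficiency and generalization.
methods: A model-based reinforcement learning approach that uses spatial and temporal abstraction to generalize learned skills to novel situations.
results: Compared with existing state-of-the-art hierarchical planning methods, Skipper shows a significant advantage in zero-shot generalization.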

Abstract
Inspired by human conscious planning, we propose Skipper, a model-based reinforcement learning agent that utilizes spatial and temporal abstractions to generalize learned skills in novel situations. It automatically decomposes the task at hand into smaller-scale, more manageable subtasks and hence enables sparse decision-making and focuses its computation on the relevant parts of the environment. This relies on the definition of a high-level proxy problem represented as a directed graph, in which vertices and edges are learned end-to-end using hindsight. Our theoretical analyses provide performance guarantees under appropriate assumptions and establish where our approach is expected to be helpful. Generalization-focused experiments validate Skipper's significant advantage in zero-shot generalization, compared to existing state-of-the-art hierarchical planning methods.

Steered Diffusion: A Generalized Framework for Plug-and-Play Conditional Image Synthesis

paper_url: http://arxiv.org/abs/2310.00224
repo_url: None
paper_authors: Nithin Gopalakrishnan Nair, Anoop Cherian, Suhas Lohit, Ye Wang, Toshiaki Koike-Akino, Vishal M. Patel, Tim K. Marks
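for: Achieving high-quality, precise conditional image generation with a generative model that was not trained on the conditional task.
methods: Uses a diffusion model and steers its sampling at inference time with a loss built from a pre-trained inverse model.
results: The method produces high-quality generations without task-specific training and makes it easy to combine multiple conditions.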

Abstract
Conditional generative models typically demand large annotated training sets to achieve high-quality synthesis. As a result, there has been significant interest in designing models that perform plug-and-play generation, i.e., to use a predefined or pretrained model, which is not explicitly trained on the generative task, to guide the generative process (e.g., using language). However, such guidance is typically useful only towards synthesizing high-level semantics rather than editing fine-grained details as in image-to-image translation tasks. To this end, and capitalizing on the powerful fine-grained generative control offered by the recent diffusion-based generative models, we introduce Steered Diffusion, a generalized framework for photorealistic zero-shot conditional image generation using a diffusion model trained for unconditional generation. The key idea is to steer the image generation of the diffusion model at inference time via designing a loss using a pre-trained inverse model that characterizes the conditional task. This loss modulates the sampling trajectory of the diffusion process. Our framework allows for easy incorporation of multiple conditions during inference. We present experiments using steered diffusion on several tasks including inpainting, colorization, text-guided semantic editing, and image super-resolution. Our results demonstrate clear qualitative and quantitative improvements over state-of-the-art diffusion-based plug-and-play models while adding negligible additional computational cost.

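Summary
Conditional generative models typically need large annotated training sets to achieve high-quality synthesis. There is therefore strong interest in designing models that use a predefined or pretrained model to guide the generative process. However, such guidance is typically useful only for synthesizing high-level semantics rather than editing fine-grained details, as in image-to-image translation. To address this, we introduce Steered Diffusion, a generalized framework for zero-shot conditional image generation using a diffusion model trained for unconditional generation. The key idea is to use a pre-trained inverse model at inference time to define a loss for the conditional task, which steers the sampling trajectory of the diffusion process. The framework makes it easy to add multiple conditions during inference. We run experiments with Steered Diffusion on several tasks, including inpainting, colorization, text-guided semantic editing, and image super-resolution, and obtain clear qualitative and quantitative improvements while adding negligible computational cost.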

Beyond Random Noise: Insights on Anonymization Strategies from a Latent Bandit Study

paper_url: http://arxiv.org/abs/2310.00221
repo_url: None
paper_authors: Alexander Galozy, Sadi Alawadi, Victor Kebande, Sławomir Nowaczyk
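for: Investigating privacy in a learning scenario where users share knowledge. The study contributes to the fast-growing field of privacy-preserving machine learning and highlights the need for privacy techniques tailored to specific attack patterns.
methods: Uses the latent bandit setting to evaluate the trade-off between privacy and recommender performance, with aggregation strategies such as averaging, nearest neighbor, and clustering combined with noise injection. A linkage attack scenario is simulated using publicly available auxiliary information acquired by the adversary.
results: Experiments on three open real-world datasets show that adding noise to an individual user's data record is a poor choice. Combining noise with aggregation strategies offers more flexibility than a standard noise mechanism; for example, averaging over clusters of different sizes provides flexibility that noise alone cannot. Overall, no single aggregation strategy consistently achieves the optimum regret for a given level of privacy.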

Abstract
This paper investigates the issue of privacy in a learning scenario where users share knowledge for a recommendation task. Our study contributes to the growing body of research on privacy-preserving machine learning and underscores the need for tailored privacy techniques that address specific attack patterns rather than relying on one-size-fits-all solutions. We use the latent bandit setting to evaluate the trade-off between privacy and recommender performance by employing various aggregation strategies, such as averaging, nearest neighbor, and clustering combined with noise injection. More specifically, we simulate a linkage attack scenario leveraging publicly available auxiliary information acquired by the adversary. Our results on three open real-world datasets reveal that adding noise using the Laplace mechanism to an individual user's data record is a poor choice. It provides the highest regret for any noise level, relative to de-anonymization probability and the ADS metric. Instead, one should combine noise with appropriate aggregation strategies. For example, using averages from clusters of different sizes provides flexibility not achievable by varying the amount of noise alone. Generally, no single aggregation strategy can consistently achieve the optimum regret for a given desired level of privacy.
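The two anonymization routes contrasted in the abstract, per-record Laplace noise versus sharing a (noisy) cluster average, can be sketched as below. The helper names, record values, and noise scale are hypothetical, and the sketch makes no claim about calibrating noise to a formal privacy guarantee or about the regret or ADS metrics used in the paper.

```python
import random


def laplace_noise(scale, rng):
    """Laplace(0, scale) sample: the difference of two i.i.d. exponentials."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def noisy_record(record, scale, rng):
    """Route 1: perturb every feature of one user's record."""
    return [x + laplace_noise(scale, rng) for x in record]


def cluster_average(records):
    """Feature-wise mean over the records in one cluster."""
    return [sum(column) / len(records) for column in zip(*records)]


def noisy_cluster_average(records, scale, rng):
    """Route 2: share the cluster mean, optionally with noise on top."""
    return noisy_record(cluster_average(records), scale, rng)


rng = random.Random(0)
# Hypothetical user records (e.g. per-item preference estimates).
cluster = [[0.9, 0.1, 0.4], [0.7, 0.3, 0.6], [0.8, 0.2, 0.5]]

print("individual + noise:", noisy_record(cluster[0], scale=0.5, rng=rng))
print("cluster mean:      ", cluster_average(cluster))
print("cluster mean+noise:", noisy_cluster_average(cluster, scale=0.1, rng=rng))
```

The cluster size becomes a second knob alongside the noise scale: larger clusters hide any one user more thoroughly, which is the kind of flexibility the abstract says cannot be obtained by varying the noise level alone.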

Pairwise Proximal Policy Optimization: Harnessing Relative Feedback for LLM Alignment

paper_url: http://arxiv.org/abs/2310.00212
repo_url: None
paper_authors: Tianhao Wu, Banghua Zhu, Ruoyu Zhang, Zhaojin Wen, Kannan Ramchandran, Jiantao Jiao
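for: Proposing a new reinforcement learning framework and a new policy gradient algorithm that help large language models (LLMs) learn better behavior from human feedback.
methods: Introduces reinforcement learning with relative feedback and Pairwise Proximal Policy Optimization (P3O), a policy gradient algorithm that learns a policy directly from comparative feedback and avoids the token-wise updates required by PPO.
results: Experiments show that P3O achieves a better KL-Reward trade-off than PPO and aligns better with human preferences. It also optimizes effectively across different comparative feedback.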

Abstract
Large Language Models (LLMs) can acquire extensive world knowledge through pre-training on large corpora. However, due to exposure to low-quality data, LLMs may exhibit harmful behavior without aligning with human values. The dominant approach for steering LLMs towards beneficial behavior involves Reinforcement Learning with Human Feedback (RLHF), with Proximal Policy Optimization (PPO) serving as the default RL optimizer. Despite its effectiveness, PPO has limitations when optimizing rewards trained from comparison-based loss. Primarily, PPO is not invariant to equivalent reward functions containing identical preference information due to the need to calibrate the reward scale. Additionally, PPO's necessity for token-wise updates introduces complexity in both function approximation and algorithm design compared to trajectory-wise optimization. This paper proposes a new framework, reinforcement learning with relative feedback, and a novel trajectory-wise policy gradient algorithm, Pairwise Proximal Policy Optimization (P3O) that operates directly on comparative rewards. We show theoretically that P3O is invariant to equivalent rewards and avoids the complexity of PPO. Empirical evaluations demonstrate that P3O outperforms PPO in the KL-Reward trade-off and can align with human preferences as well as or better than prior methods. In summary, this work introduces a simpler yet effective approach for aligning LLMs to human preferences through relative feedback.
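The notion of "equivalent reward functions containing identical preference information" can be made concrete with a small check. The sketch assumes a Bradley-Terry style comparison model, in which the preference probability depends only on the reward difference between two responses to the same prompt. That modelling choice and all numbers are assumptions for illustration, and the code is not an implementation of P3O.

```python
import math


def preference_prob(reward_a, reward_b):
    """Bradley-Terry probability that response A is preferred over B."""
    return 1 / (1 + math.exp(-(reward_a - reward_b)))


# Hypothetical rewards for two responses to the same prompt.
r_a, r_b = 1.0, 0.2

# An equivalent reward function: shift both rewards by a prompt-dependent
# constant. The comparison data cannot distinguish the two reward functions.
shift = 7.5
r_a_shifted, r_b_shifted = r_a + shift, r_b + shift

p_original = preference_prob(r_a, r_b)
p_shifted = preference_prob(r_a_shifted, r_b_shifted)

# The pairwise signal (reward difference) is unchanged by the shift...
diff_original = r_a - r_b
diff_shifted = r_a_shifted - r_b_shifted
# ...but the absolute reward that a pointwise optimizer would see is not.
print(p_original, p_shifted, diff_original, diff_shifted, r_a, r_a_shifted)
```

An update rule built on absolute rewards responds differently to the two reward functions, whereas one built on the difference between paired responses does not, which is the intuition behind operating directly on comparative rewards.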

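Summary
Large language models (LLMs) can acquire extensive world knowledge through pre-training on large corpora. However, due to exposure to low-quality data, LLMs may exhibit harmful behavior that does not align with human values. The dominant approach for steering LLMs is reinforcement learning with human feedback, with Proximal Policy Optimization (PPO) serving as the default RL optimizer. Despite its effectiveness, PPO has limitations when optimizing rewards trained from a comparison-based loss. First, PPO is not invariant to equivalent reward functions and needs the reward scale to be calibrated. Second, PPO requires token-wise updates, which adds complexity to function approximation and algorithm design. This paper proposes a new framework, reinforcement learning with relative feedback, and a new trajectory-wise policy gradient algorithm, Pairwise Proximal Policy Optimization (P3O). We show that P3O is invariant to equivalent reward functions and does not need token-wise updates. Experiments show that P3O outperforms PPO in the KL-Reward trade-off and aligns with human preferences as well as or better than prior methods. In summary, this work introduces a simple yet effective approach for aligning LLMs with human values through relative feedback.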

A Prefrontal Cortex-inspired Architecture for Planning in Large Language Models

paper_url: http://arxiv.org/abs/2310.00194
repo_url: None
paper_authors: Taylor Webb, Shanka Subhra Mondal, Chi Wang, Brian Krabach, Ida Momennejad
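for: Improving the planning ability of large language models (LLMs) on tasks that require multi-step reasoning and goal-directed planning.
methods: Draws on the specialized modules of the human prefrontal cortex (PFC), which perform functions such as conflict monitoring, state prediction, state evaluation, task decomposition, and task coordination, and applies them to LLMs.
results: A black box architecture built from multiple LLM-based (GPT-4) modules yields significant improvements on two challenging planning tasks, graph traversal and Tower of Hanoi, over standard LLM methods such as zero-shot prompting or in-context learning. The results show that knowledge from cognitive neuroscience can improve planning in LLMs.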

Abstract
Large language models (LLMs) demonstrate impressive performance on a wide variety of tasks, but they often struggle with tasks that require multi-step reasoning or goal-directed planning. To address this, we take inspiration from the human brain, in which planning is accomplished via the recurrent interaction of specialized modules in the prefrontal cortex (PFC). These modules perform functions such as conflict monitoring, state prediction, state evaluation, task decomposition, and task coordination. We find that LLMs are sometimes capable of carrying out these functions in isolation, but struggle to autonomously coordinate them in the service of a goal. Therefore, we propose a black box architecture with multiple LLM-based (GPT-4) modules. The architecture improves planning through the interaction of specialized PFC-inspired modules that break down a larger problem into multiple brief automated calls to the LLM. We evaluate the combined architecture on two challenging planning tasks -- graph traversal and Tower of Hanoi -- finding that it yields significant improvements over standard LLM methods (e.g., zero-shot prompting or in-context learning). These results demonstrate the benefit of utilizing knowledge from cognitive neuroscience to improve planning in LLMs.
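For readers unfamiliar with the Tower of Hanoi task, the sketch below gives a reference recursive solver and a rule checker that could be used to validate a plan proposed by any planner. Both functions are illustrative assumptions; they are not the paper's modules, and no LLM is called.

```python
def solve(n, src="A", aux="B", dst="C"):
    """Optimal move list for n discs from src to dst (2**n - 1 moves)."""
    if n == 0:
        return []
    return (solve(n - 1, src, dst, aux)      # park n-1 discs on the spare peg
            + [(src, dst)]                   # move the largest disc
            + solve(n - 1, aux, src, dst))   # bring the n-1 discs on top


def is_valid_plan(n, moves):
    """Check every move against the rules and confirm the goal state."""
    pegs = {"A": list(range(n, 0, -1)), "B": [], "C": []}
    for src, dst in moves:
        if not pegs[src]:
            return False  # nothing to move from this peg
        if pegs[dst] and pegs[dst][-1] < pegs[src][-1]:
            return False  # larger disc placed on a smaller one
        pegs[dst].append(pegs[src].pop())
    return pegs["C"] == list(range(n, 0, -1))


plan = solve(3)
print(len(plan), is_valid_plan(3, plan))
```

A checker of this kind makes the difficulty of the task explicit: every intermediate move must respect the stacking rule, so a plan fails as soon as a single step is invalid even if the final configuration looks plausible.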
