cs.AI - 2023-08-11

Towards a Causal Probabilistic Framework for Prediction, Action-Selection & Explanations for Robot Block-Stacking Tasks

  • paper_url: http://arxiv.org/abs/2308.06203
  • repo_url: None
  • paper_authors: Ricardo Cannizzaro, Jonathan Routley, Lars Kunze
  • for: This paper presents a causal probabilistic framework that embeds a physics-simulation capability into a structural causal model, so that a robot can perceive and assess the current state of a block-stacking task, select the next-best action, and explain outcomes.
  • methods: The framework uses causal inference and Bayesian networks to encode formal knowledge of causal relationships together with probabilistic representations of noise and uncertainty; a physics simulation models the current state of the stacking task, and a post-hoc explanation method based on counterfactuals is proposed (a minimal simulation-backed query sketch follows the abstract below).
  • results: The paper proposes the novel causal probabilistic framework and provides exemplar next-best-action selection results; experimental validation and extensions on simulated and real-world robot stacking tasks are planned as future work.
    Abstract Uncertainties in the real world mean that it is impossible for system designers to anticipate and explicitly design for all scenarios that a robot might encounter. Thus, robots designed like this are fragile and fail outside of highly-controlled environments. Causal models provide a principled framework to encode formal knowledge of the causal relationships that govern the robot's interaction with its environment, in addition to probabilistic representations of noise and uncertainty typically encountered by real-world robots. Combined with causal inference, these models permit an autonomous agent to understand, reason about, and explain its environment. In this work, we focus on the problem of a robot block-stacking task due to the fundamental perception and manipulation capabilities it demonstrates, required by many applications including warehouse logistics and domestic human support robotics. We propose a novel causal probabilistic framework to embed a physics simulation capability into a structural causal model to permit robots to perceive and assess the current state of a block-stacking task, reason about the next-best action from placement candidates, and generate post-hoc counterfactual explanations. We provide exemplar next-best action selection results and outline planned experimentation in simulated and real-world robot block-stacking tasks.
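    A minimal sketch of the kind of query such a framework supports: a toy stability check stands in for the physics simulator, perception noise is marginalised by Monte Carlo sampling, and the placement candidate with the highest estimated stability probability is selected. The stability model, noise levels, and block dimensions below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_tower(offsets, block_width=0.05):
    """Toy 'physics' check: the stack stands if every block's centre of mass
    stays over the support below it (a stand-in for a real simulator)."""
    cumulative = np.cumsum(offsets)
    return bool(np.all(np.abs(cumulative) < block_width / 2))

def p_stable(candidate_offset, current_offsets, perception_noise=0.005, n_samples=200):
    """SCM-style query: P(stable | do(place at candidate)), marginalising over
    Gaussian perception noise on the observed block poses."""
    stable = 0
    for _ in range(n_samples):
        noisy = np.array(current_offsets) + rng.normal(0, perception_noise, len(current_offsets))
        stable += simulate_tower(list(noisy) + [candidate_offset])
    return stable / n_samples

current = [0.002, -0.004, 0.006]                # observed offsets of the existing stack
candidates = [-0.02, -0.01, 0.0, 0.01, 0.02]    # placement candidates for the next block
scores = {c: p_stable(c, current) for c in candidates}
print(scores, "next-best placement:", max(scores, key=scores.get))
```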

Exploring Predicate Visual Context in Detecting of Human-Object Interactions

  • paper_url: http://arxiv.org/abs/2308.06202
  • repo_url: https://github.com/fredzzhang/pvic
  • paper_authors: Frederic Z. Zhang, Yuhui Yuan, Dylan Campbell, Zhuoyao Zhong, Stephen Gould
  • for: This paper studies human-object interaction (HOI) detection, in particular two-stage transformer-based HOI detectors.
  • methods: Through visualisations and carefully designed experiments, the paper investigates how best to re-introduce image features via cross-attention, using an improved query design, an extensive exploration of keys and values, and box-pair positional embeddings as spatial guidance to enhance the predicate visual context (PViC); a minimal cross-attention sketch follows the abstract below.
  • results: The model outperforms state-of-the-art methods on the HICO-DET and V-COCO benchmarks while maintaining low training cost.
    Abstract Recently, the DETR framework has emerged as the dominant approach for human--object interaction (HOI) research. In particular, two-stage transformer-based HOI detectors are amongst the most performant and training-efficient approaches. However, these often condition HOI classification on object features that lack fine-grained contextual information, eschewing pose and orientation information in favour of visual cues about object identity and box extremities. This naturally hinders the recognition of complex or ambiguous interactions. In this work, we study these issues through visualisations and carefully designed experiments. Accordingly, we investigate how best to re-introduce image features via cross-attention. With an improved query design, extensive exploration of keys and values, and box pair positional embeddings as spatial guidance, our model with enhanced predicate visual context (PViC) outperforms state-of-the-art methods on the HICO-DET and V-COCO benchmarks, while maintaining low training cost.
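    A minimal PyTorch sketch of the general idea of re-introducing image features via cross-attention, with a box-pair positional embedding added to each human-object pair query as spatial guidance. The module names, dimensions, and the 117-class verb head are assumptions for illustration, not the PViC architecture itself.

```python
import torch
import torch.nn as nn

class PairCrossAttention(nn.Module):
    """Sketch: human-object pair queries cross-attend to image feature tokens,
    with a box-pair positional embedding added to each query."""
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.box_pair_embed = nn.Linear(8, dim)           # (x1,y1,x2,y2) for both boxes
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.classifier = nn.Linear(dim, 117)             # e.g. HICO-DET verb classes

    def forward(self, pair_feats, pair_boxes, image_tokens):
        # pair_feats: (B, P, dim); pair_boxes: (B, P, 8); image_tokens: (B, N, dim)
        queries = pair_feats + self.box_pair_embed(pair_boxes)   # spatial guidance
        ctx, _ = self.cross_attn(queries, image_tokens, image_tokens)
        return self.classifier(ctx)                               # verb logits per pair

model = PairCrossAttention()
logits = model(torch.randn(2, 5, 256), torch.rand(2, 5, 8), torch.randn(2, 400, 256))
print(logits.shape)  # torch.Size([2, 5, 117])
```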

Complex Facial Expression Recognition Using Deep Knowledge Distillation of Basic Features

  • paper_url: http://arxiv.org/abs/2308.06197
  • repo_url: https://github.com/angusmaiden/complex-fer
  • paper_authors: Angus Maiden, Bahareh Nakisa
  • for: This paper proposes a novel continual learning method, inspired by human cognition and learning, that accurately recognises new compound facial expression classes from few training samples.
  • methods: The method builds on and retains knowledge of basic expression classes through knowledge distillation and a novel Predictive Sorting Memory Replay, and uses GradCAM visualisations to demonstrate the relationship between basic and compound expressions (a generic distillation-loss sketch follows the abstract below).
  • results: The method achieves state-of-the-art continual learning performance for complex facial expression recognition with 74.28% overall accuracy on new classes, improves on state-of-the-art non-continual learning methods by 13.95%, and is the first to apply few-shot learning to complex facial expression recognition, reaching 100% accuracy with a single training sample per expression class.
    Abstract Complex emotion recognition is a cognitive task that has so far eluded the same excellent performance of other tasks that are at or above the level of human cognition. Emotion recognition through facial expressions is particularly difficult due to the complexity of emotions expressed by the human face. For a machine to approach the same level of performance in this domain as a human, it may need to synthesise knowledge and understand new concepts in real-time as humans do. Humans are able to learn new concepts using only few examples, by distilling the important information from memories and discarding the rest. Similarly, continual learning methods learn new classes whilst retaining the knowledge of known classes, whilst few-shot learning methods are able to learn new classes using very few training examples. We propose a novel continual learning method inspired by human cognition and learning that can accurately recognise new compound expression classes using few training samples, by building on and retaining its knowledge of basic expression classes. Using GradCAM visualisations, we demonstrate the relationship between basic and compound facial expressions, which our method leverages through knowledge distillation and a novel Predictive Sorting Memory Replay. Our method achieves the current state-of-the-art in continual learning for complex facial expression recognition with 74.28% Overall Accuracy on new classes. We also demonstrate that using continual learning for complex facial expression recognition achieves far better performance than non-continual learning methods, improving on state-of-the-art non-continual learning methods by 13.95%. To the best of our knowledge, our work is also the first to apply few-shot learning to complex facial expression recognition, achieving the state-of-the-art with 100% accuracy using a single training sample for each expression class.
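    The snippet below only illustrates a standard soft-target knowledge-distillation loss of the kind used to retain basic-expression knowledge while fitting new compound classes; the 13-class split, temperature, and weighting are assumptions, and the paper's Predictive Sorting Memory Replay is not shown.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Soft-target distillation: the compound-expression student keeps the
    basic-expression teacher's knowledge while fitting the new labels."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

student = torch.randn(8, 13, requires_grad=True)   # assumed 6 basic + 7 compound classes
teacher = torch.randn(8, 13)
labels = torch.randint(0, 13, (8,))
print(distillation_loss(student, teacher, labels))
```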

Software Doping Analysis for Human Oversight

  • paper_url: http://arxiv.org/abs/2308.06186
  • repo_url: None
  • paper_authors: Sebastian Biewer, Kevin Baum, Sarah Sterz, Holger Hermanns, Sven Hetmank, Markus Langer, Anne Lauber-Rönsberg, Franz Lehr
  • for: This article introduces a framework intended to help mitigate societal risks posed by software, covering software doping as well as unfairness and discrimination in high-risk decision-making systems.
  • methods: It combines the formal foundations of software doping analysis with established probabilistic falsification techniques to obtain a black-box analysis technique for identifying undesired effects of software, applied to emission cleaning systems in diesel cars and to high-risk systems that evaluate humans; a toy falsification loop is sketched after the abstract below.
  • results: The approach helps humans-in-the-loop make better informed and more responsible decisions, promoting the effective human oversight required by the European Union's upcoming AI Act; the technical contribution is complemented by a juridically, philosophically, and psychologically informed perspective on the potential problems caused by such systems.
    Abstract This article introduces a framework that is meant to assist in mitigating societal risks that software can pose. Concretely, this encompasses facets of software doping as well as unfairness and discrimination in high-risk decision-making systems. The term software doping refers to software that contains surreptitiously added functionality that is against the interest of the user. A prominent example of software doping are the tampered emission cleaning systems that were found in millions of cars around the world when the diesel emissions scandal surfaced. The first part of this article combines the formal foundations of software doping analysis with established probabilistic falsification techniques to arrive at a black-box analysis technique for identifying undesired effects of software. We apply this technique to emission cleaning systems in diesel cars but also to high-risk systems that evaluate humans in a possibly unfair or discriminating way. We demonstrate how our approach can assist humans-in-the-loop to make better informed and more responsible decisions. This is to promote effective human oversight, which will be a central requirement enforced by the European Union's upcoming AI Act. We complement our technical contribution with a juridically, philosophically, and psychologically informed perspective on the potential problems caused by such systems.
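    A toy illustration of black-box probabilistic falsification for doping-style properties: the system under test, the input perturbation, and the cleanness tolerance are stand-ins, not the paper's formal cleanness definition or tooling. The search samples inputs near a reference test cycle and looks for an output deviation exceeding the allowed tolerance.

```python
import random

def emission_cleaning(speed_profile):
    """Stand-in for the system under test: cleaning is throttled once the
    driven profile deviates from a narrow 'test cycle' (illustrative only)."""
    deviation = sum(abs(v - 50) for v in speed_profile) / len(speed_profile)
    return 0.95 if deviation < 5 else 0.40          # fraction of NOx removed

def falsify(reference_profile, tolerance=0.2, trials=1000, seed=1):
    """Black-box search for an input close to the reference on which the output
    differs more than the contract allows (a doping witness)."""
    rng = random.Random(seed)
    ref_out = emission_cleaning(reference_profile)
    for _ in range(trials):
        candidate = [v + rng.uniform(-10, 10) for v in reference_profile]
        if abs(emission_cleaning(candidate) - ref_out) > tolerance:
            return candidate                         # counterexample found
    return None

reference = [50.0] * 60                              # idealised test-cycle speeds
witness = falsify(reference)
print("doping witness found" if witness else "no violation within budget")
```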

  • paper_url: http://arxiv.org/abs/2308.06173
  • repo_url: None
  • paper_authors: Amira Guesmi, Muhammad Abdullah Hanif, Bassem Ouni, Muhammed Shafique
  • for: This paper provides a comprehensive survey of physical adversarial attacks, analysing their key characteristics and the requirements and challenges of executing attacks in the physical world, as a resource for researchers, practitioners, and policymakers working towards robust and secure DNN-based systems.
  • methods: It categorises physical adversarial attack methods by their target tasks across applications including classification, detection, face recognition, semantic segmentation, and depth estimation, and assesses them in terms of effectiveness, stealthiness, and robustness.
  • results: The survey examines how each technique manipulates DNNs while mitigating the risk of detection and withstanding real-world distortions, and outlines open challenges and future directions, including enhanced defence mechanisms, novel attack strategies, evaluation across application domains, and standardised benchmarks and evaluation criteria.
    Abstract In this paper, we present a comprehensive survey of the current trends focusing specifically on physical adversarial attacks. We aim to provide a thorough understanding of the concept of physical adversarial attacks, analyzing their key characteristics and distinguishing features. Furthermore, we explore the specific requirements and challenges associated with executing attacks in the physical world. Our article delves into various physical adversarial attack methods, categorized according to their target tasks in different applications, including classification, detection, face recognition, semantic segmentation and depth estimation. We assess the performance of these attack methods in terms of their effectiveness, stealthiness, and robustness. We examine how each technique strives to ensure the successful manipulation of DNNs while mitigating the risk of detection and withstanding real-world distortions. Lastly, we discuss the current challenges and outline potential future research directions in the field of physical adversarial attacks. We highlight the need for enhanced defense mechanisms, the exploration of novel attack strategies, the evaluation of attacks in different application domains, and the establishment of standardized benchmarks and evaluation criteria for physical adversarial attacks. Through this comprehensive survey, we aim to provide a valuable resource for researchers, practitioners, and policymakers to gain a holistic understanding of physical adversarial attacks in computer vision and facilitate the development of robust and secure DNN-based systems.

Phased Deep Spatio-temporal Learning for Highway Traffic Volume Prediction

  • paper_url: http://arxiv.org/abs/2308.06155
  • repo_url: None
  • paper_authors: Weilong Ding, Tianpu Zhang, Zhe Wang
  • for: Predicting daily traffic volume at inter-city highway toll stations, a routine analysis that supports modern urban life.
  • methods: A deep spatio-temporal learning method with three phases: a feature pre-processing phase that carefully normalises the data to handle its latent long-tail distribution; a spatio-temporal learning phase with a hybrid model combining a fully convolutional network (FCN) and an LSTM over time, space, meteorology, and calendar features; and a decision phase that predicts next-day traffic volumes at network-wide toll stations, specially calibrated for the vital few stations (a simplified hybrid-model sketch follows the abstract below).
  • results: Extensive experiments on real-world data from a Chinese provincial highway show distinct accuracy improvements over traditional models, reaching 5.269 and 0.997 on the MPAE and R-square metrics, respectively.
    Abstract Inter-city highway transportation is significant for citizens' modern urban life and generates heterogeneous sensory data with spatio-temporal characteristics. As a routine analysis in transportation domain, daily traffic volume estimation faces challenges for highway toll stations including lacking of exploration of correlative spatio-temporal features from a long-term perspective and effective means to deal with data imbalance which always deteriorates the predictive performance. In this paper, a deep spatio-temporal learning method is proposed to predict daily traffic volume in three phases. In feature pre-processing phase, data is normalized elaborately according to latent long-tail distribution. In spatio-temporal learning phase, a hybrid model is employed combining fully convolution network (FCN) and long short-term memory (LSTM), which considers time, space, meteorology, and calendar from heterogeneous data. In decision phase, traffic volumes on a coming day at network-wide toll stations would be achieved effectively, which is especially calibrated for vital few highway stations. Using real-world data from one Chinese provincial highway, extensive experiments show our method has distinct improvement for predictive accuracy than various traditional models, reaching 5.269 and 0.997 in MPAE and R-squre metrics, respectively.
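    A simplified stand-in for the hybrid spatio-temporal model: an LSTM over the recent daily-volume sequence plus a small fully connected branch for calendar/weather features of the target day (the paper uses a fully convolutional network for that part). Dimensions and feature choices are illustrative assumptions.

```python
import torch
import torch.nn as nn

class HybridTrafficModel(nn.Module):
    """Sketch: LSTM over normalised daily volumes, fused with exogenous
    calendar/meteorology features, predicting the next day's volume."""
    def __init__(self, exo_dim=6, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
        self.exo = nn.Sequential(nn.Linear(exo_dim, hidden), nn.ReLU())
        self.head = nn.Linear(2 * hidden, 1)

    def forward(self, history, exogenous):
        # history: (B, T, 1) normalised volumes; exogenous: (B, exo_dim)
        _, (h, _) = self.lstm(history)
        fused = torch.cat([h[-1], self.exo(exogenous)], dim=1)
        return self.head(fused).squeeze(-1)          # next-day volume per station

model = HybridTrafficModel()
pred = model(torch.randn(32, 14, 1), torch.randn(32, 6))
print(pred.shape)  # torch.Size([32])
```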

Application of Artificial Neural Networks for Investigation of Pressure Filtration Performance, a Zinc Leaching Filter Cake Moisture Modeling

  • paper_url: http://arxiv.org/abs/2308.06138
  • repo_url: None
  • paper_authors: Masoume Kazemi, Davood Moradkhani, Alireza A. Alipour
  • for: The study develops an artificial neural network (ANN) model to predict the cake moisture of the pressure filtration process in zinc production.
  • methods: An ANN model was trained and evaluated on 288 tests covering seven influencing parameters (temperature, solid concentration, pH, air-blow time, cake thickness, pressure, and filtration time) using two filter fabrics: polypropylene (S1) and polyester (S2); a generic regression sketch on placeholder data follows the abstract below.
  • results: The model predicted cake moisture with high accuracy, with R2 values of 0.88 and 0.83, MSE values of 6.243x10-07 and 1.086x10-06, and MAE values of 0.00056 and 0.00088 for S1 and S2, respectively.
    Abstract Machine Learning (ML) is a powerful tool for material science applications. Artificial Neural Network (ANN) is a machine learning technique that can provide high prediction accuracy. This study aimed to develop an ANN model to predict the cake moisture of the pressure filtration process of zinc production. The cake moisture was influenced by seven parameters: temperature (35 and 65 Celsius), solid concentration (0.2 and 0.38 g/L), pH (2, 3.5, and 5), air-blow time (2, 10, and 15 min), cake thickness (14, 20, 26, and 34 mm), pressure, and filtration time. The study conducted 288 tests using two types of fabrics: polypropylene (S1) and polyester (S2). The ANN model was evaluated by the Coefficient of determination (R2), the Mean Square Error (MSE), and the Mean Absolute Error (MAE) metrics for both datasets. The results showed R2 values of 0.88 and 0.83, MSE values of 6.243x10-07 and 1.086x10-06, and MAE values of 0.00056 and 0.00088 for S1 and S2, respectively. These results indicated that the ANN model could predict the cake moisture of pressure filtration in the zinc leaching process with high accuracy.
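    A generic sketch of the modelling setup on synthetic placeholder data (the 288 real tests are not public): a small MLP regressor over the seven inputs, evaluated with R2, MSE, and MAE as in the paper. The parameter ranges and the synthetic target are assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error

rng = np.random.default_rng(42)
# Placeholder data over the seven inputs: temperature, solid concentration, pH,
# air-blow time, cake thickness, pressure, filtration time (assumed ranges).
X = rng.uniform([35, 0.2, 2, 2, 14, 1, 1], [65, 0.38, 5, 15, 34, 5, 10], size=(288, 7))
y = 0.2 - 0.002 * X[:, 3] - 0.001 * X[:, 5] + rng.normal(0, 0.002, 288)  # synthetic moisture

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
model = make_pipeline(StandardScaler(),
                      MLPRegressor(hidden_layer_sizes=(16, 8), max_iter=5000, random_state=0))
model.fit(X_tr, y_tr)
pred = model.predict(X_te)
print("R2", r2_score(y_te, pred),
      "MSE", mean_squared_error(y_te, pred),
      "MAE", mean_absolute_error(y_te, pred))
```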

A Game-Theoretic Framework for Joint Forecasting and Planning

  • paper_url: http://arxiv.org/abs/2308.06137
  • repo_url: https://github.com/portal-cornell/game-theoretic-forecasting-planning
  • paper_authors: Kushal Kedia, Prithwish Dan, Sanjiban Choudhury
  • for: This work targets safer robot motion planning in the presence of humans by jointly learning forecasting and planning.
  • methods: A novel game-theoretic framework for joint planning and forecasting, where the payoff is the planner's performance against the demonstrator, together with practical algorithms to train the models end-to-end.
  • results: The proposed algorithm yields safer plans in a crowd-navigation simulator and on real-world pedestrian-motion datasets, handling the long tail of human behaviour better. Code is available at https://github.com/portal-cornell/Game-Theoretic-Forecasting-Planning.
    Abstract Planning safe robot motions in the presence of humans requires reliable forecasts of future human motion. However, simply predicting the most likely motion from prior interactions does not guarantee safety. Such forecasts fail to model the long tail of possible events, which are rarely observed in limited datasets. On the other hand, planning for worst-case motions leads to overtly conservative behavior and a ``frozen robot''. Instead, we aim to learn forecasts that predict counterfactuals that humans guard against. We propose a novel game-theoretic framework for joint planning and forecasting with the payoff being the performance of the planner against the demonstrator, and present practical algorithms to train models in an end-to-end fashion. We demonstrate that our proposed algorithm results in safer plans in a crowd navigation simulator and real-world datasets of pedestrian motion. We release our code at https://github.com/portal-cornell/Game-Theoretic-Forecasting-Planning.

Improving Joint Speech-Text Representations Without Alignment

  • paper_url: http://arxiv.org/abs/2308.06125
  • repo_url: None
  • paper_authors: Cal Peyser, Zhong Meng, Ke Hu, Rohit Prabhavalkar, Andrew Rosenberg, Tara N. Sainath, Michael Picheny, Kyunghyun Cho
  • for: This paper studies joint speech-text encoders for ASR, which scale to very large parameter models by training on both unpaired speech and text.
  • methods: Rather than up-sampling heuristics or an explicit alignment model, the authors argue that a consistency loss which disregards sequence length can simply assume the best alignment across modalities (a length-agnostic consistency-loss sketch follows the abstract below).
  • results: Such a loss improves downstream WER in both a large-parameter monolingual system and a multilingual system.
    Abstract The last year has seen astonishing progress in text-prompted image generation premised on the idea of a cross-modal representation space in which the text and image domains are represented jointly. In ASR, this idea has found application as joint speech-text encoders that can scale to the capacities of very large parameter models by being trained on both unpaired speech and text. While these methods show promise, they have required special treatment of the sequence-length mismatch inherent in speech and text, either by up-sampling heuristics or an explicit alignment model. In this work, we offer evidence that joint speech-text encoders naturally achieve consistent representations across modalities by disregarding sequence length, and argue that consistency losses could forgive length differences and simply assume the best alignment. We show that such a loss improves downstream WER in both a large-parameter monolingual and multilingual system.
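    One plausible instantiation of a consistency loss that forgives sequence-length differences: each modality's encoder output is mean-pooled to an utterance-level embedding and paired embeddings are pulled together with a cosine objective. The pooling choice and loss form are assumptions, not the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def modality_consistency_loss(speech_frames, text_tokens):
    """Length-agnostic consistency sketch: mean-pool each modality's encoder
    output and pull the paired utterance-level embeddings together, rather
    than aligning frames to tokens explicitly."""
    s = F.normalize(speech_frames.mean(dim=1), dim=-1)   # (B, D)
    t = F.normalize(text_tokens.mean(dim=1), dim=-1)     # (B, D)
    return (1 - (s * t).sum(dim=-1)).mean()              # cosine distance

speech = torch.randn(4, 600, 512)   # 600 acoustic frames per utterance
text = torch.randn(4, 40, 512)      # 40 text tokens of the same utterances
print(modality_consistency_loss(speech, text))
```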

Learning Deductive Reasoning from Synthetic Corpus based on Formal Logic

  • paper_url: http://arxiv.org/abs/2308.07336
  • repo_url: https://github.com/hitachi-nlp/fld
  • paper_authors: Terufumi Morishita, Gaku Morio, Atsuki Yamaguchi, Yasuhiro Sogawa
  • for: The study aims to have language models (LMs) acquire logical deductive reasoning ability from a synthetic corpus.
  • methods: The corpus adopts a well-grounded set of deduction rules based on formal logic theory, from which any other deduction rule can be derived when combined in a multistep way (a toy example generator is sketched after the abstract below).
  • results: Experiments show that LMs trained on the proposed FLD corpora acquire more generalizable deductive reasoning ability; the authors also identify which aspects of deductive reasoning such corpora can and cannot enhance, and release the code, data, and models.
    Abstract We study a synthetic corpus-based approach for language models (LMs) to acquire logical deductive reasoning ability. The previous studies generated deduction examples using specific sets of deduction rules. However, these rules were limited or otherwise arbitrary. This can limit the generalizability of acquired deductive reasoning ability. We rethink this and adopt a well-grounded set of deduction rules based on formal logic theory, which can derive any other deduction rules when combined in a multistep way. We empirically verify that LMs trained on the proposed corpora, which we name $\textbf{FLD}$ ($\textbf{F}$ormal $\textbf{L}$ogic $\textbf{D}$eduction), acquire more generalizable deductive reasoning ability. Furthermore, we identify the aspects of deductive reasoning ability on which deduction corpora can enhance LMs and those on which they cannot. Finally, on the basis of these results, we discuss the future directions for applying deduction corpora or other approaches for each aspect. We release the code, data, and models.
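    A toy generator in the spirit of the corpus format (context, hypothesis, label, proof steps). FLD derives examples from a principled rule set of formal logic; this sketch only chains modus ponens over propositional atoms and is purely illustrative.

```python
import random

rng = random.Random(0)
ATOMS = ["A", "B", "C", "D", "E"]

def make_example(depth=3):
    """Generate one synthetic multistep deduction: a chain of implications plus
    a starting fact, with the final atom as the (provable) hypothesis."""
    chain = rng.sample(ATOMS, depth + 1)
    facts = [chain[0]] + [f"{p} -> {q}" for p, q in zip(chain, chain[1:])]
    rng.shuffle(facts)
    return {"context": "; ".join(facts),
            "hypothesis": chain[-1],
            "label": "PROVED",
            "proof": [f"{p} & ({p} -> {q}) |- {q}" for p, q in zip(chain, chain[1:])]}

example = make_example()
print(example["context"])
print("hypothesis:", example["hypothesis"])
for step in example["proof"]:
    print("  ", step)
```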

Improving Zero-Shot Text Matching for Financial Auditing with Large Language Models

  • paper_url: http://arxiv.org/abs/2308.06111
  • repo_url: None
  • paper_authors: Lars Hillebrand, Armin Berger, Tobias Deußer, Tim Dilmaghani, Mohamed Khaled, Bernd Kliem, Rüdiger Loitz, Maren Pielka, David Leonhard, Christian Bauckhage, Rafet Sifa
  • for: Assisting the auditing of financial documents by recommending relevant text passages for each legal requirement of rigorous accounting standards, improving efficiency and accuracy.
  • methods: ZeroShotALI combines a state-of-the-art large language model (LLM) with a domain-specifically optimised transformer-based text-matching solution: a custom BERT-based model first retrieves the best-matching document sections per legal requirement, and an LLM then filters these selections (a two-step retrieve-and-filter sketch follows the abstract below).
  • results: The two-step approach yields significant performance improvements over existing approaches.
    Abstract Auditing financial documents is a very tedious and time-consuming process. As of today, it can already be simplified by employing AI-based solutions to recommend relevant text passages from a report for each legal requirement of rigorous accounting standards. However, these methods need to be fine-tuned regularly, and they require abundant annotated data, which is often lacking in industrial environments. Hence, we present ZeroShotALI, a novel recommender system that leverages a state-of-the-art large language model (LLM) in conjunction with a domain-specifically optimized transformer-based text-matching solution. We find that a two-step approach of first retrieving a number of best matching document sections per legal requirement with a custom BERT-based model and second filtering these selections using an LLM yields significant performance improvements over existing approaches.
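    A sketch of the two-step recommend-then-filter idea: a public sentence-embedding model (all-MiniLM-L6-v2, standing in for the paper's domain-optimised BERT matcher) retrieves the best-matching sections per requirement, and a placeholder llm_confirms function marks where the LLM-based filtering step would go. Both the model choice and the placeholder are assumptions.

```python
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")   # stand-in for the custom BERT matcher

def llm_confirms(requirement: str, passage: str) -> bool:
    """Placeholder for the second stage: in the paper an LLM judges whether the
    retrieved passage actually satisfies the legal requirement."""
    return True   # hypothetical; a real implementation would prompt an LLM here

def recommend(requirement, report_sections, top_k=5):
    req_emb = encoder.encode(requirement, convert_to_tensor=True)
    sec_emb = encoder.encode(report_sections, convert_to_tensor=True)
    scores = util.cos_sim(req_emb, sec_emb)[0]
    ranked = scores.argsort(descending=True)[:top_k]
    candidates = [report_sections[int(i)] for i in ranked]
    return [s for s in candidates if llm_confirms(requirement, s)]

sections = ["Goodwill is tested annually for impairment.",
            "Revenue is recognised over time.",
            "Leases are capitalised."]
print(recommend("Disclose the impairment test for goodwill.", sections, top_k=2))
```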

Neural Conversation Models and How to Rein Them in: A Survey of Failures and Fixes

  • paper_url: http://arxiv.org/abs/2308.06095
  • repo_url: None
  • paper_authors: Fabian Galetzka, Anne Beyer, David Schlangen
  • for: This survey examines open-domain conversational systems built on powerful language models and what it takes for such models to make appropriate contributions to a written dialogue.
  • methods: The paper interprets Grice's maxims of cooperative conversation for this research area and systematises the literature by what makes a contribution appropriate (fluent, informative, consistent, coherent, and following social norms) and by where recent approaches intervene to tame the underlying language models: data, training regime, or decoding.
  • results: The survey discusses promising attempts, sorted by these categories and intervention points, and suggests novel directions for future research.
    Abstract Recent conditional language models are able to continue any kind of text source in an often seemingly fluent way. This fact encouraged research in the area of open-domain conversational systems that are based on powerful language models and aim to imitate an interlocutor by generating appropriate contributions to a written dialogue. From a linguistic perspective, however, the complexity of contributing to a conversation is high. In this survey, we interpret Grice's maxims of cooperative conversation from the perspective of this specific research area and systematize the literature under the aspect of what makes a contribution appropriate: A neural conversation model has to be fluent, informative, consistent, coherent, and follow social norms. In order to ensure these qualities, recent approaches try to tame the underlying language models at various intervention points, such as data, training regime or decoding. Sorted by these categories and intervention points, we discuss promising attempts and suggest novel ways for future research.

Reinforcement Logic Rule Learning for Temporal Point Processes

  • paper_url: http://arxiv.org/abs/2308.06094
  • repo_url: None
  • paper_authors: Chao Yang, Lu Wang, Kun Gao, Shuang Li
  • for: The method incrementally expands an explanatory temporal logic rule set to explain the occurrence of temporal events.
  • methods: Building on temporal point process modelling and learning, the rule content and weights are gradually optimised until the likelihood of the observed event sequences is maximal; the algorithm alternates between a convex master problem that updates the current rule weights and a subproblem that searches the combinatorial rule space for a new rule via a neural search policy trained with reinforcement learning (a simplified alternation sketch follows the abstract below).
  • results: The method obtains promising results on both synthetic and real healthcare datasets.
    Abstract We propose a framework that can incrementally expand the explanatory temporal logic rule set to explain the occurrence of temporal events. Leveraging the temporal point process modeling and learning framework, the rule content and weights will be gradually optimized until the likelihood of the observational event sequences is optimal. The proposed algorithm alternates between a master problem, where the current rule set weights are updated, and a subproblem, where a new rule is searched and included to best increase the likelihood. The formulated master problem is convex and relatively easy to solve using continuous optimization, whereas the subproblem requires searching the huge combinatorial rule predicate and relationship space. To tackle this challenge, we propose a neural search policy to learn to generate the new rule content as a sequence of actions. The policy parameters will be trained end-to-end using the reinforcement learning framework, where the reward signals can be efficiently queried by evaluating the subproblem objective. The trained policy can be used to generate new rules in a controllable way. We evaluate our methods on both synthetic and real healthcare datasets, obtaining promising results.
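    A heavily simplified illustration of the alternation between a master problem (refit non-negative rule weights) and a subproblem (add the rule that most improves the fit). The real method maximises a temporal point process likelihood and searches rule content with an RL-trained neural policy; here least squares and exhaustive search over a tiny synthetic candidate pool stand in for both.

```python
import numpy as np

rng = np.random.default_rng(0)

# Candidate rule "features": each column is 1 when a candidate temporal-logic rule
# fires in a time window; event counts are modelled as a weighted sum of firings.
n_windows, n_candidates = 200, 6
firings = rng.integers(0, 2, size=(n_windows, n_candidates)).astype(float)
true_w = np.array([2.0, 0.0, 1.5, 0.0, 0.0, 0.0])
counts = rng.poisson(firings @ true_w + 0.1)

def master(active):
    """Master problem: refit non-negative weights for the current rule set."""
    X = firings[:, active]
    w, *_ = np.linalg.lstsq(X, counts, rcond=None)
    w = np.clip(w, 0, None)
    return w, np.sum((counts - X @ w) ** 2)

active, loss = [], np.sum(counts ** 2.0)
for _ in range(n_candidates):
    # Subproblem (exhaustive here, instead of the paper's learned search policy):
    # add the candidate rule that most reduces the fitting loss.
    gains = {j: loss - master(active + [j])[1] for j in range(n_candidates) if j not in active}
    best, gain = max(gains.items(), key=lambda kv: kv[1])
    if gain <= 1e-6:
        break
    active.append(best)
    _, loss = master(active)

print("selected rules:", active, "weights:", master(active)[0].round(2))
```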

Toward a Better Understanding of Loss Functions for Collaborative Filtering

  • paper_url: http://arxiv.org/abs/2308.06091
  • repo_url: https://github.com/psm1206/mawu
  • paper_authors: Seongmin Park, Mincheol Yoon, Jae-woong Lee, Hogun Park, Jongwuk Lee
  • for: The paper examines the three components of learning collaborative filtering (CF) models: the interaction encoder, the loss function, and negative sampling.
  • methods: It analyses the relationships among existing loss functions, interpreting them as alignment and uniformity terms, and proposes a new loss, Margin-aware Alignment and Weighted Uniformity (MAWU), which accounts for dataset characteristics to improve CF performance (a sketch of an alignment-and-uniformity loss in this spirit follows the abstract below).
  • results: Experiments show that MF and LightGCN equipped with MAWU are comparable or superior to state-of-the-art CF models with various loss functions on three public datasets.
    Abstract Collaborative filtering (CF) is a pivotal technique in modern recommender systems. The learning process of CF models typically consists of three components: interaction encoder, loss function, and negative sampling. Although many existing studies have proposed various CF models to design sophisticated interaction encoders, recent work shows that simply reformulating the loss functions can achieve significant performance gains. This paper delves into analyzing the relationship among existing loss functions. Our mathematical analysis reveals that the previous loss functions can be interpreted as alignment and uniformity functions: (i) the alignment matches user and item representations, and (ii) the uniformity disperses user and item distributions. Inspired by this analysis, we propose a novel loss function that improves the design of alignment and uniformity considering the unique patterns of datasets called Margin-aware Alignment and Weighted Uniformity (MAWU). The key novelty of MAWU is two-fold: (i) margin-aware alignment (MA) mitigates user/item-specific popularity biases, and (ii) weighted uniformity (WU) adjusts the significance between user and item uniformities to reflect the inherent characteristics of datasets. Extensive experimental results show that MF and LightGCN equipped with MAWU are comparable or superior to state-of-the-art CF models with various loss functions on three public datasets.
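    A sketch of an alignment-and-uniformity loss in the spirit of MAWU: a margin relaxes the alignment of positive user-item pairs, and the user and item uniformity terms get separate weights. The margin handling and weight values are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def mawu_style_loss(user_emb, item_emb, margin=0.1, gamma_user=1.0, gamma_item=0.5):
    """Margin-relaxed alignment of positive pairs plus separately weighted
    uniformity terms for users and items (illustrative, not the paper's form)."""
    u = F.normalize(user_emb, dim=-1)
    i = F.normalize(item_emb, dim=-1)
    align = F.relu((u - i).pow(2).sum(dim=-1) - margin).mean()

    def uniformity(x):
        # log of the mean Gaussian potential over all pairwise distances
        return torch.pdist(x, p=2).pow(2).mul(-2).exp().mean().log()

    return align + gamma_user * uniformity(u) + gamma_item * uniformity(i)

users = torch.randn(256, 64, requires_grad=True)
items = torch.randn(256, 64, requires_grad=True)
print(mawu_style_loss(users, items))
```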

An Autoethnographic Exploration of XAI in Algorithmic Composition

  • paper_url: http://arxiv.org/abs/2308.06089
  • repo_url: None
  • paper_authors: Ashley Noel-Hirst, Nick Bryan-Kinns
  • for: The study explores how an explainable AI (XAI) generative model can be used in practice to compose music in the style of Irish folk tunes.
  • methods: An autoethnographic study of the MeasureVAE generative model, which has interpretable latent dimensions and was trained on Irish folk music.
  • results: Findings suggest that the exploratory music-making workflow foregrounds musical features of the training dataset rather than features of the generative model itself, and that appropriating an XAI model within an iterative workflow lets it form part of a richer and more complex workflow than it was initially designed for.
    Abstract Machine Learning models are capable of generating complex music across a range of genres from folk to classical music. However, current generative music AI models are typically difficult to understand and control in meaningful ways. Whilst research has started to explore how explainable AI (XAI) generative models might be created for music, no generative XAI models have been studied in music making practice. This paper introduces an autoethnographic study of the use of the MeasureVAE generative music XAI model with interpretable latent dimensions trained on Irish folk music. Findings suggest that the exploratory nature of the music-making workflow foregrounds musical features of the training dataset rather than features of the generative model itself. The appropriation of an XAI model within an iterative workflow highlights the potential of XAI models to form part of a richer and more complex workflow than they were initially designed for.

Assessing Student Errors in Experimentation Using Artificial Intelligence and Large Language Models: A Comparative Study with Human Raters

  • paper_url: http://arxiv.org/abs/2308.06088
  • repo_url: None
  • paper_authors: Arne Bewersdorff, Kathrin Seßler, Armin Baur, Enkelejda Kasneci, Claudia Nerdel
  • for: The study aims to provide automatic detection of errors in students' experimentation protocols using large language models (LLMs), as a foundation for productive, personalised feedback and streamlined teacher assessment.
  • methods: An AI system based on the GPT-3.5 and GPT-4 series was developed and tested against human raters on a dataset of 65 student protocols.
  • results: The system reliably identifies several fundamental student errors, such as focusing the hypothesis on an expected observation rather than the dependent variable (acc. = 0.90), modifying trials during an ongoing investigation (acc. = 1), and conducting valid test trials (acc. = 0.82); more complex errors, such as whether a student conducts a valid control trial (acc. = 0.60), remain challenging.
    Abstract Identifying logical errors in complex, incomplete or even contradictory and overall heterogeneous data like students' experimentation protocols is challenging. Recognizing the limitations of current evaluation methods, we investigate the potential of Large Language Models (LLMs) for automatically identifying student errors and streamlining teacher assessments. Our aim is to provide a foundation for productive, personalized feedback. Using a dataset of 65 student protocols, an Artificial Intelligence (AI) system based on the GPT-3.5 and GPT-4 series was developed and tested against human raters. Our results indicate varying levels of accuracy in error detection between the AI system and human raters. The AI system can accurately identify many fundamental student errors, for instance, the AI system identifies when a student is focusing the hypothesis not on the dependent variable but solely on an expected observation (acc. = 0.90), when a student modifies the trials in an ongoing investigation (acc. = 1), and whether a student is conducting valid test trials (acc. = 0.82) reliably. The identification of other, usually more complex errors, like whether a student conducts a valid control trial (acc. = .60), poses a greater challenge. This research explores not only the utility of AI in educational settings, but also contributes to the understanding of the capabilities of LLMs in error detection in inquiry-based learning like experimentation.

Audio-Visual Spatial Integration and Recursive Attention for Robust Sound Source Localization

  • paper_url: http://arxiv.org/abs/2308.06087
  • repo_url: https://github.com/visualaikhu/sira-ssl
  • paper_authors: Sung Jin Um, Dongjin Kim, Jung Uk Kim
  • for: This work targets sound source localization that integrates the audio and visual modalities, so that machines can detect the location of sound-making objects within a visual scene.
  • methods: An audio-visual spatial integration network uses spatial cues from both modalities, mimicking human behaviour, together with a recursive attention network that imitates humans' iterative focusing on objects; an audio-visual pair matching loss and a spatial region alignment loss encode spatial information from both modalities.
  • results: Comprehensive experiments on the Flickr SoundNet and VGG-Sound Source datasets demonstrate superiority over existing approaches. Code is available at https://github.com/VisualAIKHU/SIRA-SSL.
    Abstract The objective of the sound source localization task is to enable machines to detect the location of sound-making objects within a visual scene. While the audio modality provides spatial cues to locate the sound source, existing approaches only use audio as an auxiliary role to compare spatial regions of the visual modality. Humans, on the other hand, utilize both audio and visual modalities as spatial cues to locate sound sources. In this paper, we propose an audio-visual spatial integration network that integrates spatial cues from both modalities to mimic human behavior when detecting sound-making objects. Additionally, we introduce a recursive attention network to mimic human behavior of iterative focusing on objects, resulting in more accurate attention regions. To effectively encode spatial information from both modalities, we propose audio-visual pair matching loss and spatial region alignment loss. By utilizing the spatial cues of audio-visual modalities and recursively focusing objects, our method can perform more robust sound source localization. Comprehensive experimental results on the Flickr SoundNet and VGG-Sound Source datasets demonstrate the superiority of our proposed method over existing approaches. Our code is available at: https://github.com/VisualAIKHU/SIRA-SSL

Cost-effective On-device Continual Learning over Memory Hierarchy with Miro

  • paper_url: http://arxiv.org/abs/2308.06053
  • repo_url: None
  • paper_authors: Xinyue Ma, Suyeon Jeong, Minjia Zhang, Di Wang, Jonghyun Choi, Myeongjae Jeon
  • for: This paper focuses on training neural network models incrementally on edge devices using continual learning (CL), with the goal of achieving cost-effectiveness while maintaining high model accuracy.
  • methods: The paper explores the design space of hierarchical memory replay-based CL and presents a novel system runtime called Miro that dynamically configures the CL system based on resource states for the best cost-effectiveness. Miro also performs online profiling on parameters with clear accuracy-energy trade-offs and adapts to optimal values with low overhead.
  • results: The paper shows that Miro significantly outperforms baseline systems in terms of cost-effectiveness, achieving higher accuracy while using less energy on edge devices.
    Abstract Continual learning (CL) trains NN models incrementally from a continuous stream of tasks. To remember previously learned knowledge, prior studies store old samples over a memory hierarchy and replay them when new tasks arrive. Edge devices that adopt CL to preserve data privacy are typically energy-sensitive and thus require high model accuracy while not compromising energy efficiency, i.e., cost-effectiveness. Our work is the first to explore the design space of hierarchical memory replay-based CL to gain insights into achieving cost-effectiveness on edge devices. We present Miro, a novel system runtime that carefully integrates our insights into the CL framework by enabling it to dynamically configure the CL system based on resource states for the best cost-effectiveness. To reach this goal, Miro also performs online profiling on parameters with clear accuracy-energy trade-offs and adapts to optimal values with low overhead. Extensive evaluations show that Miro significantly outperforms baseline systems we build for comparison, consistently achieving higher cost-effectiveness.

Learning to Guide Human Experts via Personalized Large Language Models

  • paper_url: http://arxiv.org/abs/2308.06039
  • repo_url: None
  • paper_authors: Debodeep Banerjee, Stefano Teso, Andrea Passerini
  • for: In learning to defer, a predictor identifies risky decisions and defers them to a human expert, which can lead to over-reliance on the machine; this work targets better support for human decision-making.
  • methods: The learning-to-guide (LTG) framework is proposed: instead of suggesting ready-made decisions, the machine provides guidance useful for decision-making, and the human remains entirely responsible for the decision.
  • results: SLOG, an LTG implementation, leverages a small amount of human supervision to turn a generic large language model into a module that generates textual guidance, with preliminary but promising results on a medical diagnosis task.
    Abstract In learning to defer, a predictor identifies risky decisions and defers them to a human expert. One key issue with this setup is that the expert may end up over-relying on the machine's decisions, due to anchoring bias. At the same time, whenever the machine chooses the deferral option the expert has to take decisions entirely unassisted. As a remedy, we propose learning to guide (LTG), an alternative framework in which -- rather than suggesting ready-made decisions -- the machine provides guidance useful to guide decision-making, and the human is entirely responsible for coming up with a decision. We also introduce SLOG, an LTG implementation that leverages (a small amount of) human supervision to convert a generic large language model into a module capable of generating textual guidance, and present preliminary but promising results on a medical diagnosis task.

Deep Context Interest Network for Click-Through Rate Prediction

  • paper_url: http://arxiv.org/abs/2308.06037
  • repo_url: None
  • paper_authors: Xuyang Hou, Zhe Wang, Qi Liu, Tan Qu, Jia Cheng, Jun Lei
  • for: Click-Through Rate (CTR) prediction, estimating the probability of a user clicking on an item, which is essential in industrial applications such as online advertising.
  • methods: The Deep Context Interest Network (DCIN) integrally models each click together with its display context to learn context-aware user interests, via three modules: 1) a Position-aware Context Aggregation Module (PCAM) that aggregates display items with an attention mechanism; 2) a Feedback-Context Fusion Module (FCFM) that fuses click and display-context representations through non-linear feature interaction; and 3) an Interest Matching Module (IMM) that activates interests related to the target item (a simplified context-aggregation sketch follows the abstract below).
  • results: DCIN shows significant improvements in both offline and online evaluations and, deployed on the main traffic of an online advertising system, brings a 1.5% CTR and 1.5% RPM lift.
    Abstract Click-Through Rate (CTR) prediction, estimating the probability of a user clicking on an item, is essential in industrial applications, such as online advertising. Many works focus on user behavior modeling to improve CTR prediction performance. However, most of those methods only model users' positive interests from users' click items while ignoring the context information, which is the display items around the clicks, resulting in inferior performance. In this paper, we highlight the importance of context information on user behavior modeling and propose a novel model named Deep Context Interest Network (DCIN), which integrally models the click and its display context to learn users' context-aware interests. DCIN consists of three key modules: 1) Position-aware Context Aggregation Module (PCAM), which performs aggregation of display items with an attention mechanism; 2) Feedback-Context Fusion Module (FCFM), which fuses the representation of clicks and display contexts through non-linear feature interaction; 3) Interest Matching Module (IMM), which activates interests related with the target item. Moreover, we provide our hands-on solution to implement our DCIN model on large-scale industrial systems. The significant improvements in both offline and online evaluations demonstrate the superiority of our proposed DCIN method. Notably, DCIN has been deployed on our online advertising system serving the main traffic, which brings 1.5% CTR and 1.5% RPM lift.
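    A simplified stand-in for position-aware aggregation over the display context around a click: display-item embeddings plus a position embedding are pooled with attention and fused with the click embedding. Module sizes and the fusion form are assumptions, not the DCIN modules themselves.

```python
import torch
import torch.nn as nn

class ContextAggregation(nn.Module):
    """Sketch of PCAM/FCFM-style processing: attention-pool the display context
    with position embeddings, then fuse it with the click embedding."""
    def __init__(self, dim=32, max_pos=10):
        super().__init__()
        self.pos_embed = nn.Embedding(max_pos, dim)
        self.attn = nn.Linear(dim, 1)
        self.fuse = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU())

    def forward(self, click_emb, display_emb, display_pos):
        # click_emb: (B, dim); display_emb: (B, L, dim); display_pos: (B, L) int positions
        h = display_emb + self.pos_embed(display_pos)
        weights = torch.softmax(self.attn(h).squeeze(-1), dim=-1)     # (B, L)
        context = (weights.unsqueeze(-1) * h).sum(dim=1)              # (B, dim)
        return self.fuse(torch.cat([click_emb, context], dim=-1))     # context-aware click

module = ContextAggregation()
out = module(torch.randn(4, 32), torch.randn(4, 10, 32), torch.arange(10).repeat(4, 1))
print(out.shape)  # torch.Size([4, 32])
```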

Evidence of Human-Like Visual-Linguistic Integration in Multimodal Large Language Models During Predictive Language Processing

  • paper_url: http://arxiv.org/abs/2308.06035
  • repo_url: None
  • paper_authors: Viktor Kewenig, Christopher Edwards, Quitterie Lacome DEstalenx, Akilles Rechardt, Jeremy I Skipper, Gabriella Vigliocco
  • for: This paper investigates whether predictive language processing based on multimodal input aligns between multimodal large language models (mLLMs) and humans.
  • methods: 200 human participants watched short audio-visual clips and estimated the predictability of an upcoming verb or noun; the same clips were processed by the mLLM CLIP, with predictability scores derived from comparing image and text feature vectors, while eye-tracking captured the visual features participants attended to and CLIP's visual attention weights were recorded (a minimal CLIP scoring sketch follows the abstract below).
  • results: Human predictability estimates align significantly with CLIP scores but not with a unimodal LLM of comparable parameter size; the alignment vanishes when CLIP's visual attention weights are perturbed or when the input is fed to a multimodal model without attention, and CLIP's visual attention weights overlap spatially with human eye-tracking data, suggesting comparable attention-guided multimodal integration in mLLMs and humans.
    Abstract The advanced language processing abilities of large language models (LLMs) have stimulated debate over their capacity to replicate human-like cognitive processes. One differentiating factor between language processing in LLMs and humans is that language input is often grounded in more than one perceptual modality, whereas most LLMs process solely text-based information. Multimodal grounding allows humans to integrate - e.g. visual context with linguistic information and thereby place constraints on the space of upcoming words, reducing cognitive load and improving perception and comprehension. Recent multimodal LLMs (mLLMs) combine visual and linguistic embedding spaces with a transformer type attention mechanism for next-word prediction. To what extent does predictive language processing based on multimodal input align in mLLMs and humans? To answer this question, 200 human participants watched short audio-visual clips and estimated the predictability of an upcoming verb or noun. The same clips were processed by the mLLM CLIP, with predictability scores based on a comparison of image and text feature vectors. Eye-tracking was used to estimate what visual features participants attended to, and CLIP's visual attention weights were recorded. We find that human estimates of predictability align significantly with CLIP scores, but not for a unimodal LLM of comparable parameter size. Further, alignment vanished when CLIP's visual attention weights were perturbed, and when the same input was fed to a multimodal model without attention. Analysing attention patterns, we find a significant spatial overlap between CLIP's visual attention weights and human eye-tracking data. Results suggest that comparable processes of integrating multimodal information, guided by attention to relevant visual features, supports predictive language processing in mLLMs and humans.
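    A minimal sketch of CLIP-based predictability scoring as described in the methods: candidate next words are scored against a video frame by image-text similarity using the public openai/clip-vit-base-patch32 checkpoint. The candidate words and the local image file are assumed for illustration.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def predictability(image_path, candidate_words):
    """Score each candidate next word by image-text similarity for one clip
    frame, analogous to the paper's feature-vector comparison."""
    image = Image.open(image_path)
    inputs = processor(text=candidate_words, images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        out = model(**inputs)
    sims = out.logits_per_image[0]          # image-text similarity logits
    return dict(zip(candidate_words, sims.softmax(dim=-1).tolist()))

# Example (assumed local frame): which upcoming noun does the visual context favour?
# print(predictability("clip_frame.jpg", ["guitar", "sandwich", "book"]))
```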

Large Language Models in Cryptocurrency Securities Cases: Can ChatGPT Replace Lawyers?

  • paper_url: http://arxiv.org/abs/2308.06032
  • repo_url: None
  • paper_authors: Arianna Trozze, Toby Davies, Bennett Kleinberg
  • for: The paper aims to study the effectiveness of large language models (LLMs) in conducting legal tasks, specifically in the context of securities cases involving cryptocurrencies.
  • methods: The paper uses GPT-3.5, a large language model, to evaluate its legal reasoning and drafting capabilities in real-life cases. The authors compare the performance of GPT-3.5 with human lawyers in terms of determining potential violations and drafting legal complaints.
  • results: The paper finds that GPT-3.5’s legal reasoning skills are weak and misses additional correct violations, but it performs better in legal drafting. The authors also find that jurors’ decisions are not statistically significantly associated with the author of the document upon which they based their decisions. Overall, the paper suggests that LLMs cannot satisfactorily conduct legal reasoning tasks but could provide access to justice for more individuals by reducing the cost of legal services.
    Abstract Large Language Models (LLMs) could enhance access to the legal system. However, empirical research on their effectiveness in conducting legal tasks is scant. We study securities cases involving cryptocurrencies as one of numerous contexts where AI could support the legal process, studying LLMs' legal reasoning and drafting capabilities. We examine whether a) an LLM can accurately determine which laws are potentially being violated from a fact pattern, and b) whether there is a difference in juror decision-making based on complaints written by a lawyer compared to an LLM. We feed fact patterns from real-life cases to GPT-3.5 and evaluate its ability to determine correct potential violations from the scenario and exclude spurious violations. Second, we had mock jurors assess complaints written by the LLM and lawyers. GPT-3.5's legal reasoning skills proved weak, though we expect improvement in future models, particularly given the violations it suggested tended to be correct (it merely missed additional, correct violations). GPT-3.5 performed better at legal drafting, and jurors' decisions were not statistically significantly associated with the author of the document upon which they based their decisions. Because LLMs cannot satisfactorily conduct legal reasoning tasks, they would be unable to replace lawyers at this stage. However, their drafting skills (though, perhaps, still inferior to lawyers), could provide access to justice for more individuals by reducing the cost of legal services. Our research is the first to systematically study LLMs' legal drafting and reasoning capabilities in litigation, as well as in securities law and cryptocurrency-related misconduct.

AI-Assisted Investigation of On-Chain Parameters: Risky Cryptocurrencies and Price Factors

  • paper_url: http://arxiv.org/abs/2308.08554
  • repo_url: None
  • paper_authors: Abdulrezzak Zekiye, Semih Utku, Fadi Amroush, Oznur Ozkasap
  • for: The study aims to help investors make informed decisions by analysing the factors that affect cryptocurrency prices and identifying risky cryptocurrencies.
  • methods: Historical on-chain data are analysed with artificial intelligence algorithms, measuring correlations between price and other parameters and applying clustering and classification (a correlation-plus-KNN sketch on placeholder data follows the abstract below).
  • results: A significant proportion of cryptocurrencies (39%) disappeared from the market, while only 10% survived for more than 1000 days; price shows a significant negative correlation with maximum and total supply and a weak positive correlation with 24-hour trading volume; cryptocurrencies cluster into five distinct groups based on on-chain parameters, and the best classifier for predicting whether a cryptocurrency is risky, K-Nearest Neighbor, reaches an f1-score of 76%.
    Abstract Cryptocurrencies have become a popular and widely researched topic of interest in recent years for investors and scholars. In order to make informed investment decisions, it is essential to comprehend the factors that impact cryptocurrency prices and to identify risky cryptocurrencies. This paper focuses on analyzing historical data and using artificial intelligence algorithms on on-chain parameters to identify the factors affecting a cryptocurrency's price and to find risky cryptocurrencies. We conducted an analysis of historical cryptocurrencies' on-chain data and measured the correlation between the price and other parameters. In addition, we used clustering and classification in order to get a better understanding of a cryptocurrency and classify it as risky or not. The analysis revealed that a significant proportion of cryptocurrencies (39%) disappeared from the market, while only a small fraction (10%) survived for more than 1000 days. Our analysis revealed a significant negative correlation between cryptocurrency price and maximum and total supply, as well as a weak positive correlation between price and 24-hour trading volume. Moreover, we clustered cryptocurrencies into five distinct groups using their on-chain parameters, which provides investors with a more comprehensive understanding of a cryptocurrency when compared to those clustered with it. Finally, by implementing multiple classifiers to predict whether a cryptocurrency is risky or not, we obtained the best f1-score of 76% using K-Nearest Neighbor.
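    A sketch of the correlation-plus-classification workflow on synthetic placeholder features (the study's real dataset is not distributed here): price correlations against supply and volume, followed by a K-Nearest Neighbor classifier evaluated with the f1-score. Feature distributions and the risk label are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.metrics import f1_score

rng = np.random.default_rng(7)
# Placeholder on-chain features for 500 coins.
df = pd.DataFrame({
    "price": rng.lognormal(0, 2, 500),
    "max_supply": rng.lognormal(18, 3, 500),
    "total_supply": rng.lognormal(17, 3, 500),
    "volume_24h": rng.lognormal(10, 3, 500),
})
df["risky"] = (df["max_supply"] > df["max_supply"].median()).astype(int)  # synthetic label

print(df.corr()["price"])   # e.g. price vs. supply / volume correlations

X, y = df.drop(columns="risky"), df["risky"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0, stratify=y)
clf = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
clf.fit(X_tr, y_tr)
print("f1:", f1_score(y_te, clf.predict(X_te)))
```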

Controlling Character Motions without Observable Driving Source

  • paper_url: http://arxiv.org/abs/2308.06025
  • repo_url: None
  • paper_authors: Weiyuan Li, Bin Dai, Ziyi Zhou, Qi Yao, Baoyuan Wang
  • for: Generating diverse, life-like, and unlimited-length head/body motion sequences without any driving source.
  • methods: A systematic framework that marries the benefits of VQ-VAE with a novel token-level control policy trained with reinforcement learning using carefully designed reward functions; a high-level prior model can be injected on top to generate unlimited long and diverse sequences.
  • results: Comprehensive evaluations show that the framework addresses the out-of-distribution (OOD) error accumulation, insufficient diversity, and undesired periodicity issues of driving-source-free generation, and significantly outperforms strong baselines.
    Abstract How to generate diverse, life-like, and unlimited long head/body sequences without any driving source? We argue that this under-investigated research problem is non-trivial at all, and has unique technical challenges behind it. Without semantic constraints from the driving sources, using the standard autoregressive model to generate infinitely long sequences would easily result in 1) out-of-distribution (OOD) issue due to the accumulated error, 2) insufficient diversity to produce natural and life-like motion sequences and 3) undesired periodic patterns along the time. To tackle the above challenges, we propose a systematic framework that marries the benefits of VQ-VAE and a novel token-level control policy trained with reinforcement learning using carefully designed reward functions. A high-level prior model can be easily injected on top to generate unlimited long and diverse sequences. Although we focus on no driving sources now, our framework can be generalized for controlled synthesis with explicit driving sources. Through comprehensive evaluations, we conclude that our proposed framework can address all the above-mentioned challenges and outperform other strong baselines very significantly.

Optimizing transformer-based machine translation model for single GPU training: a hyperparameter ablation study

  • paper_url: http://arxiv.org/abs/2308.06017
  • repo_url: None
  • paper_authors: Luv Verma, Ketaki N. Kolhatkar
  • for: explore the relationship between model complexity and performance in machine translation tasks
  • methods: systematic investigation using ablation and a single NVIDIA A100 GPU
  • results: unexpected insight that smaller models can be more effective, and the importance of precise hyperparameter tuning over mere scaling
    Abstract In machine translation tasks, the relationship between model complexity and performance is often presumed to be linear, driving an increase in the number of parameters and consequent demands for computational resources like multiple GPUs. To explore this assumption, this study systematically investigates the effects of hyperparameters through ablation on a sequence-to-sequence machine translation pipeline, utilizing a single NVIDIA A100 GPU. Contrary to expectations, our experiments reveal that combinations with the most parameters were not necessarily the most effective. This unexpected insight prompted a careful reduction in parameter sizes, uncovering "sweet spots" that enable training sophisticated models on a single GPU without compromising translation quality. The findings demonstrate an intricate relationship between hyperparameter selection, model size, and computational resource needs. The insights from this study contribute to the ongoing efforts to make machine translation more accessible and cost-effective, emphasizing the importance of precise hyperparameter tuning over mere scaling.
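    A schematic of the ablation loop: every hyperparameter combination is trained and scored on a single GPU, and the top configurations are inspected rather than assuming that the largest model wins. The grid values and the synthetic scoring function are placeholders for real training runs.

```python
import itertools

# Hypothetical ablation grid for a transformer NMT model; values are illustrative.
grid = {
    "layers": [2, 4, 6],
    "d_model": [256, 512],
    "heads": [4, 8],
    "dropout": [0.1, 0.3],
}

def train_and_eval(config):
    """Placeholder: train the model with `config` on one GPU and return BLEU.
    A synthetic score stands in for a real training run here."""
    return 20 + 2 * config["layers"] - 0.004 * config["d_model"] - 5 * abs(config["dropout"] - 0.15)

results = []
for values in itertools.product(*grid.values()):
    config = dict(zip(grid.keys(), values))
    results.append((train_and_eval(config), config))

for score, config in sorted(results, key=lambda r: r[0], reverse=True)[:3]:
    print(round(score, 2), config)   # "sweet spots" need not be the largest models
```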

Large Language Models for Telecom: Forthcoming Impact on the Industry

  • paper_url: http://arxiv.org/abs/2308.06013
  • repo_url: None
  • paper_authors: Ali Maatouk, Nicola Piovesan, Fadhel Ayed, Antonio De Domenico, Merouane Debbah
  • for: The paper explores the potential impact of Large Language Models (LLMs) on the telecom industry and provides insights into their current capabilities and limitations.
  • methods: It examines use cases that can be readily implemented in the telecom industry, streamlining numerous tasks that currently hinder operational efficiency and demand significant manpower and engineering expertise.
  • results: It uncovers essential research directions that address the distinctive challenges of utilizing LLMs within the telecom domain; addressing these challenges is a significant step towards fully harnessing the potential of LLMs in telecom.
    Abstract Large Language Models (LLMs) have emerged as a transformative force, revolutionizing numerous fields well beyond the conventional domain of Natural Language Processing (NLP) and garnering unprecedented attention. As LLM technology continues to progress, the telecom industry is facing the prospect of its potential impact on its landscape. To elucidate these implications, we delve into the inner workings of LLMs, providing insights into their current capabilities and limitations. We also examine the use cases that can be readily implemented in the telecom industry, streamlining numerous tasks that currently hinder operational efficiency and demand significant manpower and engineering expertise. Furthermore, we uncover essential research directions that deal with the distinctive challenges of utilizing the LLMs within the telecom domain. Addressing these challenges represents a significant stride towards fully harnessing the potential of LLMs and unlocking their capabilities to the fullest extent within the telecom domain.

Deep Task-specific Bottom Representation Network for Multi-Task Recommendation

  • paper_url: http://arxiv.org/abs/2308.05996
  • repo_url: None
  • paper_authors: Qi Liu, Zhilong Zhou, Gangwei Jiang, Tiezheng Ge, Defu Lian
  • for: 提高 recommendation system 的性能,解决多任务学习中任务之间的负向传递问题。
  • methods: 提出了 Deep Task-specific Bottom Representation Network (DTRN),通过在底层表示模型学习阶段为每个任务分别学习专门的表示,解决任务之间的负向传递问题。
  • results: 通过实验证明,DTRN 可以提高 recommendation system 的性能,并且可以与现有的多任务学习方法结合使用。
    Abstract Neural-based multi-task learning (MTL) has gained significant improvement, and it has been successfully applied to recommendation system (RS). Recent deep MTL methods for RS (e.g. MMoE, PLE) focus on designing soft gating-based parameter-sharing networks that implicitly learn a generalized representation for each task. However, MTL methods may suffer from performance degeneration when dealing with conflicting tasks, as negative transfer effects can occur on the task-shared bottom representation. This can result in a reduced capacity for MTL methods to capture task-specific characteristics, ultimately impeding their effectiveness and hindering the ability to generalize well on all tasks. In this paper, we focus on the bottom representation learning of MTL in RS and propose the Deep Task-specific Bottom Representation Network (DTRN) to alleviate the negative transfer problem. DTRN obtains task-specific bottom representation explicitly by making each task have its own representation learning network in the bottom representation modeling stage. Specifically, it extracts the user's interests from multiple types of behavior sequences for each task through the parameter-efficient hypernetwork. To further obtain the dedicated representation for each task, DTRN refines the representation of each feature by employing a SENet-like network for each task. The two proposed modules can achieve the purpose of getting task-specific bottom representation to relieve tasks' mutual interference. Moreover, the proposed DTRN is flexible to combine with existing MTL methods. Experiments on one public dataset and one industrial dataset demonstrate the effectiveness of the proposed DTRN.
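As a rough illustration of the "SENet-like" per-task refinement described in the abstract, the sketch below gates a shared bottom representation with one squeeze-and-excitation block per task, so each task reads its own re-weighted copy of the features. The module names, dimensions, and the two-task (ctr/cvr) setup are assumptions for demonstration, not DTRN's actual implementation.

```python
# Minimal PyTorch sketch of task-specific SE-style feature gating (illustrative, not DTRN itself).
import torch
import torch.nn as nn

class TaskSEGate(nn.Module):
    """Squeeze-and-excitation style gate: one per task, re-weighting the shared bottom features."""
    def __init__(self, dim: int, reduction: int = 4):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Linear(dim, dim // reduction), nn.ReLU(),
            nn.Linear(dim // reduction, dim), nn.Sigmoid(),
        )

    def forward(self, shared: torch.Tensor) -> torch.Tensor:
        return shared * self.gate(shared)   # element-wise re-weighting specific to this task

class TwoTaskBottom(nn.Module):
    def __init__(self, in_dim: int = 64, hidden: int = 32):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.gates = nn.ModuleDict({"ctr": TaskSEGate(hidden), "cvr": TaskSEGate(hidden)})
        self.heads = nn.ModuleDict({"ctr": nn.Linear(hidden, 1), "cvr": nn.Linear(hidden, 1)})

    def forward(self, x: torch.Tensor) -> dict:
        h = self.shared(x)
        return {task: torch.sigmoid(self.heads[task](self.gates[task](h))) for task in self.heads}

model = TwoTaskBottom()
out = model(torch.randn(8, 64))
print({k: v.shape for k, v in out.items()})   # each task receives its own gated prediction
```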

Audio is all in one: speech-driven gesture synthetics using WavLM pre-trained model

  • paper_url: http://arxiv.org/abs/2308.05995
  • repo_url: None
  • paper_authors: Fan Zhang, Naye Ji, Fuxing Gao, Siyuan Zhao, Zhaohan Wang, Shunman Li
  • for: 这篇论文主要目标是提出一种基于扩散的非自回归 Transformer 模型,用于生成基于语音的自然化姿势。
  • methods: 该模型使用 WavLM 预训练模型提取低级和高级语音信息,并使用自适应层归一化方案学习语音信息和附属姿势之间的关系。
  • results: 对于 Trinity、ZEGGS 和 BEAT 等 dataset,模型能够生成自然化的姿势,并且可以控制姿势的风格和质量。
    Abstract The generation of co-speech gestures for digital humans is an emerging area in the field of virtual human creation. Prior research has made progress by using acoustic and semantic information as input and adopting classify method to identify the person's ID and emotion for driving co-speech gesture generation. However, this endeavour still faces significant challenges. These challenges go beyond the intricate interplay between co-speech gestures, speech acoustic, and semantics; they also encompass the complexities associated with personality, emotion, and other obscure but important factors. This paper introduces "diffmotion-v2," a speech-conditional diffusion-based and non-autoregressive transformer-based generative model with WavLM pre-trained model. It can produce individual and stylized full-body co-speech gestures only using raw speech audio, eliminating the need for complex multimodal processing and manually annotated. Firstly, considering that speech audio not only contains acoustic and semantic features but also conveys personality traits, emotions, and more subtle information related to accompanying gestures, we pioneer the adaptation of WavLM, a large-scale pre-trained model, to extract low-level and high-level audio information. Secondly, we introduce an adaptive layer norm architecture in the transformer-based layer to learn the relationship between speech information and accompanying gestures. Extensive subjective evaluation experiments are conducted on the Trinity, ZEGGS, and BEAT datasets to confirm the WavLM and the model's ability to synthesize natural co-speech gestures with various styles.
    摘要 数字人的伴随语音手势生成是虚拟人创建领域中的一个新兴方向。先前的研究以声学和语义信息作为输入,并采用分类方法识别说话人的身份和情绪来驱动伴随手势生成,取得了一定进展。然而,这一工作仍面临重大挑战:这些挑战不仅涉及伴随手势、语音声学与语义之间的复杂交互,还包括人格、情绪以及其他隐含但重要的因素。本文介绍了 diffmotion-v2,一种以语音为条件、基于扩散的非自回归 Transformer 生成模型,并结合 WavLM 预训练模型。它仅凭原始语音音频即可生成个性化、风格化的全身伴随手势,无需复杂的多模态处理和人工标注。首先,考虑到语音音频不仅包含声学和语义特征,还传递人格特质、情绪以及与伴随手势相关的更细微信息,我们率先采用大规模预训练模型 WavLM 来提取低层和高层音频信息。其次,我们在 Transformer 层中引入自适应层归一化架构,以学习语音信息与伴随手势之间的关系。我们在 Trinity、ZEGGS 和 BEAT 数据集上进行了广泛的主观评估实验,验证了 WavLM 以及该模型合成多种风格自然伴随手势的能力。
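The adaptive layer norm mentioned above can be sketched as a normalization layer whose scale and shift are regressed from a pooled speech feature, so gesture tokens are modulated by the audio condition. The dimensions and the assumption that WavLM features are mean-pooled into a single vector are illustrative choices, not the paper's exact design.

```python
# Illustrative adaptive layer norm conditioned on a speech embedding (not the diffmotion-v2 code).
import torch
import torch.nn as nn

class AdaptiveLayerNorm(nn.Module):
    def __init__(self, feat_dim: int, cond_dim: int):
        super().__init__()
        self.norm = nn.LayerNorm(feat_dim, elementwise_affine=False)  # plain normalization
        self.to_scale_shift = nn.Linear(cond_dim, 2 * feat_dim)       # predicts gamma and beta

    def forward(self, x: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, feat_dim) gesture tokens; cond: (batch, cond_dim) pooled speech feature
        scale, shift = self.to_scale_shift(cond).chunk(2, dim=-1)
        return self.norm(x) * (1 + scale.unsqueeze(1)) + shift.unsqueeze(1)

layer = AdaptiveLayerNorm(feat_dim=128, cond_dim=256)
gestures = torch.randn(2, 50, 128)   # 50 gesture frames per sample
speech = torch.randn(2, 256)         # assumed: mean-pooled WavLM features
print(layer(gestures, speech).shape) # torch.Size([2, 50, 128])
```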

Defensive Perception: Estimation and Monitoring of Neural Network Performance under Deployment

  • paper_url: http://arxiv.org/abs/2308.06299
  • repo_url: None
  • paper_authors: Hendrik Vogt, Stefan Buehler, Mark Schutera
  • for: 本研究旨在 Addressing the issue of unnoticed catastrophic deployment and domain shift in neural networks for semantic segmentation in autonomous driving.
  • methods: 我们的方法基于 deep learning-based perception for autonomous driving 是 uncertain 的,并且可以通过 Monte Carlo Dropout 方法来 estimating epistemic uncertainty. 我们的方法不需要修改已经部署的神经网络,并且可以保证预期的模型性能。
  • results: 我们的方法可以 estimate neural network performance,并且可以 monitoring 和 notification of entering domains of reduced neural network performance under deployment. 我们还提出了一些新的方法来改进应用在部署设置下,包括减少计算成本和限制估计噪声。 最后,我们示出了我们的方法在多种不同的部署转移 relevante to autonomous driving 中的应用,如夜晚、雨天或雪天等。 总的来说,我们的方法在部署设置下有很大的潜力,可以实现 operational design domain recognition via uncertainty,并且可以提供 defensive perception、safe state triggers、 warning notifications 和 feedback for testing or development and adaptation of the perception stack.
    Abstract In this paper, we propose a method for addressing the issue of unnoticed catastrophic deployment and domain shift in neural networks for semantic segmentation in autonomous driving. Our approach is based on the idea that deep learning-based perception for autonomous driving is uncertain and best represented as a probability distribution. As autonomous vehicles' safety is paramount, it is crucial for perception systems to recognize when the vehicle is leaving its operational design domain, anticipate hazardous uncertainty, and reduce the performance of the perception system. To address this, we propose to encapsulate the neural network under deployment within an uncertainty estimation envelope that is based on the epistemic uncertainty estimation through the Monte Carlo Dropout approach. This approach does not require modification of the deployed neural network and guarantees expected model performance. Our defensive perception envelope has the capability to estimate a neural network's performance, enabling monitoring and notification of entering domains of reduced neural network performance under deployment. Furthermore, our envelope is extended by novel methods to improve the application in deployment settings, including reducing compute expenses and confining estimation noise. Finally, we demonstrate the applicability of our method for multiple different potential deployment shifts relevant to autonomous driving, such as transitions into the night, rainy, or snowy domain. Overall, our approach shows great potential for application in deployment settings and enables operational design domain recognition via uncertainty, which allows for defensive perception, safe state triggers, warning notifications, and feedback for testing or development and adaptation of the perception stack.
    摘要 在这篇论文中,我们提出了一种方法来解决深度学习基于自动驾驶的神经网络中的不良发展和领域转移问题。我们的方法基于神经网络的不确定性,即神经网络在自动驾驶中的观测是一个可能性 Distribution 的。由于自动驾驶的安全性 paramount,因此神经网络的观测系统必须能够识别自动车辆离开操作设计域,预测危险不确定性,并降低神经网络的性能。为此,我们提议将神经网络在部署过程中包裹在一个不确定性估计封装中,该封装基于 Monte Carlo Dropout 方法来估计神经网络的不确定性。这种方法不需要修改已部署的神经网络,并保证预期的模型性能。我们的防御观测封装具有估计神经网络性能的能力,可以监测和通知部署过程中神经网络性能下降。此外,我们还提出了一些新的方法来改进部署设置中的应用,包括减少计算成本和限制估计噪声。最后,我们示例了我们的方法在多种不同的部署转移中的应用,如夜晚、雨天或雪天等。总之,我们的方法在部署设置中具有潜力,可以实现操作设计域识别,并提供了防御观测、安全状态触发器、警示通知和测试或开发和适应观测堆的反馈。
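A minimal sketch of the Monte Carlo Dropout idea behind such an uncertainty envelope is shown below: dropout is kept active at inference, several stochastic forward passes are collected, and the variance across passes serves as the epistemic-uncertainty signal that a monitor can threshold. The toy network, the number of passes, and the idea of flagging on a mean-uncertainty threshold are assumptions; as in the paper, the wrapped network itself is not modified.

```python
# Monte Carlo Dropout uncertainty sketch for a segmentation-style network (illustrative only).
import torch
import torch.nn as nn

class TinySegNet(nn.Module):
    def __init__(self, n_classes: int = 3):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Dropout2d(p=0.2),                      # dropout stays stochastic at inference
            nn.Conv2d(16, n_classes, 1),
        )

    def forward(self, x):
        return self.body(x)

@torch.no_grad()
def mc_dropout_predict(model: nn.Module, x: torch.Tensor, passes: int = 20):
    model.train()                                     # keep dropout on; no gradients are computed
    probs = torch.stack([model(x).softmax(dim=1) for _ in range(passes)])
    mean_prob = probs.mean(dim=0)                     # averaged prediction
    epistemic = probs.var(dim=0).mean(dim=1)          # per-pixel variance across passes
    return mean_prob, epistemic

model = TinySegNet()
image = torch.randn(1, 3, 64, 64)
pred, uncertainty = mc_dropout_predict(model, image)
# A hypothetical monitor would flag frames whose mean uncertainty exceeds a calibrated threshold.
print(pred.argmax(dim=1).shape, float(uncertainty.mean()))
```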

TrajPAC: Towards Robustness Verification of Pedestrian Trajectory Prediction Models

  • paper_url: http://arxiv.org/abs/2308.05985
  • repo_url: https://github.com/zl-helios/trajpac
  • paper_authors: Liang Zhang, Nathaniel Xu, Pengfei Yang, Gaojie Jin, Cheng-Chao Huang, Lijun Zhang
  • for: 本研究旨在提高自动驾驶车辆的安全性,即使在拥有不同的情况下进行预测。
  • methods: 本研究使用了一种可靠的拟合正确(PAC)框架,以确保方法的稳定性和可靠性。
  • results: 本研究对四种state-of-the-art trajectory prediction模型进行了robustness测试,并通过了TrajPAC工具的评估。同时,研究还探讨了影响robustness性表现的多种因素。
    Abstract Robust pedestrian trajectory forecasting is crucial to developing safe autonomous vehicles. Although previous works have studied adversarial robustness in the context of trajectory forecasting, some significant issues remain unaddressed. In this work, we try to tackle these crucial problems. Firstly, the previous definitions of robustness in trajectory prediction are ambiguous. We thus provide formal definitions for two kinds of robustness, namely label robustness and pure robustness. Secondly, as previous works fail to consider robustness about all points in a disturbance interval, we utilise a probably approximately correct (PAC) framework for robustness verification. Additionally, this framework can not only identify potential counterexamples, but also provides interpretable analyses of the original methods. Our approach is applied using a prototype tool named TrajPAC. With TrajPAC, we evaluate the robustness of four state-of-the-art trajectory prediction models -- Trajectron++, MemoNet, AgentFormer, and MID -- on trajectories from five scenes of the ETH/UCY dataset and scenes of the Stanford Drone Dataset. Using our framework, we also experimentally study various factors that could influence robustness performance.
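One common way to realize a PAC-style robustness check is by sampling: draw enough i.i.d. perturbations from the disturbance interval and, if none violates the property, conclude with confidence 1 - delta that the violation probability is below epsilon. The sketch below applies that bound to a stand-in predictor; the extrapolation "model", the perturbation radius, and the bounded-output-shift property are assumptions, not TrajPAC's actual procedure.

```python
# PAC-style sampling check for trajectory-prediction robustness (illustrative sketch).
import math
import numpy as np

def predict(history: np.ndarray) -> np.ndarray:
    """Stand-in predictor: extrapolates the last step; a real trajectory model would go here."""
    return history[-1] + (history[-1] - history[-2])

def pac_robust(history, radius=0.1, eps=0.05, delta=0.01, shift_budget=0.5, seed=0):
    """If no sampled perturbation violates the property, then with confidence 1 - delta
    the probability of violation under the sampling distribution is below eps."""
    rng = np.random.default_rng(seed)
    n_samples = math.ceil(math.log(1.0 / delta) / eps)     # scenario-style sample bound
    base = predict(history)
    for _ in range(n_samples):
        noise = rng.uniform(-radius, radius, size=history.shape)
        displaced = predict(history + noise)
        if np.linalg.norm(displaced - base) > shift_budget:  # property: bounded output shift
            return False, n_samples                          # counterexample found
    return True, n_samples

history = np.array([[0.0, 0.0], [0.5, 0.1], [1.0, 0.2], [1.5, 0.3]])
ok, n = pac_robust(history)
print(f"PAC-robust: {ok} (checked {n} sampled perturbations)")
```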

Contrastive Explanations of Multi-agent Optimization Solutions

  • paper_url: http://arxiv.org/abs/2308.05984
  • repo_url: None
  • paper_authors: Parisa Zehtabi, Alberto Pozanco, Ayala Bloch, Daniel Borrajo, Sarit Kraus
  • for: 提供了一种适用于多代理优化问题的域独立方法来获取冲突解释。
  • methods: 该方法包括生成一个新的解决方案,并将该解决方案与原始解决方案进行比较,以便高亮差异。
  • results: 计算机实验和用户研究表明,该方法可以为大型多代理优化问题提供有用的冲突解释,并使人类用户对原始解决方案的满意度提高。
    Abstract In many real-world scenarios, agents are involved in optimization problems. Since most of these scenarios are over-constrained, optimal solutions do not always satisfy all agents. Some agents might be unhappy and ask questions of the form ``Why does solution $S$ not satisfy property $P$?''. In this paper, we propose MAoE, a domain-independent approach to obtain contrastive explanations by (i) generating a new solution $S^\prime$ where the property $P$ is enforced, while also minimizing the differences between $S$ and $S^\prime$; and (ii) highlighting the differences between the two solutions. Such explanations aim to help agents understanding why the initial solution is better than what they expected. We have carried out a computational evaluation that shows that MAoE can generate contrastive explanations for large multi-agent optimization problems. We have also performed an extensive user study in four different domains that shows that, after being presented with these explanations, humans' satisfaction with the original solution increases.
    摘要 在许多实际场景中,智能体都会参与优化问题。由于这些场景大多是过约束的,最优解并不总能让所有智能体满意。一些智能体可能不满意,并提出诸如“为什么解决方案 $S$ 不满足属性 $P$?”之类的问题。在这篇论文中,我们提出了 MAoE,一种领域无关的方法,通过(i)生成一个新的解决方案 $S^\prime$,使属性 $P$ 得到满足,同时尽量减少 $S$ 与 $S^\prime$ 之间的差异;以及(ii)高亮显示两个解决方案之间的差异,来获得对比性解释。这些解释旨在帮助智能体理解为什么初始解决方案比他们预期的更好。我们进行的计算评估表明,MAoE 能够为大规模多智能体优化问题生成对比性解释。我们还在四个不同领域进行了广泛的用户研究,结果显示,在看到这些解释后,人们对初始解决方案的满意度有所提升。

Face Encryption via Frequency-Restricted Identity-Agnostic Attacks

  • paper_url: http://arxiv.org/abs/2308.05983
  • repo_url: None
  • paper_authors: Xin Dong, Rui Wang, Siyuan Liang, Aishan Liu, Lihua Jing
  • for: 防止face recognition系统的敏感资讯泄露
  • methods: 利用频率限制identity-agnostic(FRIA)框架实现隐藏人脸图像
  • results: 实验结果显示FRIA可以实现高比例的黑盒攻击成功率(96%),并且在实际应用中显示出实际的应用前景。
    Abstract Billions of people are sharing their daily live images on social media everyday. However, malicious collectors use deep face recognition systems to easily steal their biometric information (e.g., faces) from these images. Some studies are being conducted to generate encrypted face photos using adversarial attacks by introducing imperceptible perturbations to reduce face information leakage. However, existing studies need stronger black-box scenario feasibility and more natural visual appearances, which challenge the feasibility of privacy protection. To address these problems, we propose a frequency-restricted identity-agnostic (FRIA) framework to encrypt face images from unauthorized face recognition without access to personal information. As for the weak black-box scenario feasibility, we obverse that representations of the average feature in multiple face recognition models are similar, thus we propose to utilize the average feature via the crawled dataset from the Internet as the target to guide the generation, which is also agnostic to identities of unknown face recognition systems; in nature, the low-frequency perturbations are more visually perceptible by the human vision system. Inspired by this, we restrict the perturbation in the low-frequency facial regions by discrete cosine transform to achieve the visual naturalness guarantee. Extensive experiments on several face recognition models demonstrate that our FRIA outperforms other state-of-the-art methods in generating more natural encrypted faces while attaining high black-box attack success rates of 96%. In addition, we validate the efficacy of FRIA using real-world black-box commercial API, which reveals the potential of FRIA in practice. Our codes can be found in https://github.com/XinDong10/FRIA.
    摘要 每天都有数十亿人在社交媒体上分享日常生活照片。然而,恶意收集者可以利用深度人脸识别系统轻易地从这些照片中窃取生物特征信息(例如人脸)。已有一些研究尝试通过对抗攻击引入难以察觉的扰动来生成加密人脸照片,以减少人脸信息泄露,但现有研究在黑盒场景可行性和视觉自然性方面仍面临挑战,制约了隐私保护的实用性。为了解决这些问题,我们提出了频率受限、身份无关(FRIA)框架,在无法获取个人信息的情况下对人脸图像加密,防止未经授权的人脸识别。针对弱黑盒场景可行性,我们观察到多个人脸识别模型中平均特征的表示相似,因此提出利用从互联网爬取的数据集得到的平均特征作为目标来指导生成,该目标对未知人脸识别系统的身份同样是无关的。同时,低频扰动更容易被人类视觉系统察觉,受此启发,我们通过离散余弦变换(DCT)限制低频人脸区域中的扰动,以保证视觉自然性。在多个人脸识别模型上的大量实验表明,FRIA 在生成更自然的加密人脸的同时,取得了高达 96% 的黑盒攻击成功率,优于其他最先进方法。此外,我们使用真实的黑盒商用 API 验证了 FRIA 的有效性,展示了其实际应用潜力。代码见 https://github.com/XinDong10/FRIA。
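The frequency restriction can be illustrated in a few lines of NumPy/SciPy: move the image into the DCT domain, zero the perturbation over the low-frequency corner (our reading of "restricting the perturbation in the low-frequency facial regions"), and invert the transform. The block size, noise scale, and the use of random noise instead of an optimized adversarial perturbation are assumptions for illustration only.

```python
# Sketch of a frequency-restricted (DCT-domain) image perturbation (illustrative, not the FRIA attack).
import numpy as np
from scipy.fft import dctn, idctn

def freq_restricted_perturb(image: np.ndarray, keep_low: int = 16, scale: float = 2.0, seed: int = 0):
    """Add noise in the DCT domain while zeroing it over the low-frequency corner,
    so the visually dominant low frequencies stay untouched."""
    rng = np.random.default_rng(seed)
    coeffs = dctn(image, norm="ortho")                 # 2-D DCT of the grayscale image
    noise = rng.normal(0.0, scale, size=coeffs.shape)
    noise[:keep_low, :keep_low] = 0.0                  # restrict (suppress) low-frequency change
    return np.clip(idctn(coeffs + noise, norm="ortho"), 0, 255)

face = np.random.default_rng(1).uniform(0, 255, size=(112, 112))   # stand-in face crop
encrypted = freq_restricted_perturb(face)
print(float(np.abs(encrypted - face).mean()))          # average pixel change remains modest
```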

CyberForce: A Federated Reinforcement Learning Framework for Malware Mitigation

  • paper_url: http://arxiv.org/abs/2308.05978
  • repo_url: None
  • paper_authors: Chao Feng, Alberto Huertas Celdran, Pedro Miguel Sanchez Sanchez, Jan Kreischer, Jan von der Assen, Gerome Bovet, Gregorio Martinez Perez, Burkhard Stiller
  • for: 提高互联网物联网设备的网络安全性
  • methods: 使用联邦强化学习(FRL)和设备指纹识别技术,采用分布式强化学习来采集和私有地确定适用于防御零日攻击的最佳防御策略
  • results: 在一个真实的互联网物联网平台上进行了一组实验,证明了CyberForce可以高精度地学习适合防御零日攻击的最佳防御策略,并且在所有客户端受到所有攻击时,FRL Agent比中央RL Agent更快速地训练和选择合适的防御策略。在不同的客户端遭受不同的攻击时,CyberForce客户端可以从其他客户端中获得知识并采用相似的攻击行为。此外,CyberForce还显示了强大的数据欺诈攻击Robustness。
    Abstract The expansion of the Internet-of-Things (IoT) paradigm is inevitable, but vulnerabilities of IoT devices to malware incidents have become an increasing concern. Recent research has shown that the integration of Reinforcement Learning with Moving Target Defense (MTD) mechanisms can enhance cybersecurity in IoT devices. Nevertheless, the numerous new malware attacks and the time that agents take to learn and select effective MTD techniques make this approach impractical for real-world IoT scenarios. To tackle this issue, this work presents CyberForce, a framework that employs Federated Reinforcement Learning (FRL) to collectively and privately determine suitable MTD techniques for mitigating diverse zero-day attacks. CyberForce integrates device fingerprinting and anomaly detection to reward or penalize MTD mechanisms chosen by an FRL-based agent. The framework has been evaluated in a federation consisting of ten devices of a real IoT platform. A pool of experiments with six malware samples affecting the devices has demonstrated that CyberForce can precisely learn optimum MTD mitigation strategies. When all clients are affected by all attacks, the FRL agent exhibits high accuracy and reduced training time when compared to a centralized RL agent. In cases where different clients experience distinct attacks, the CyberForce clients gain benefits through the transfer of knowledge from other clients and similar attack behavior. Additionally, CyberForce showcases notable robustness against data poisoning attacks.
    摘要 物联网(IoT)的扩张不可避免,但 IoT 设备易受恶意软件攻击的问题日益令人担忧。最新研究表明,将强化学习与移动目标防御(MTD)机制相结合可以增强 IoT 设备的网络安全。然而,层出不穷的新型恶意软件攻击,以及智能体学习并选择有效 MTD 技术所需的时间,使这种方法在现实 IoT 场景中难以实用。为解决这一问题,本工作提出了 CyberForce 框架,利用联邦强化学习(FRL)以集体且保护隐私的方式确定适合缓解多种零日攻击的 MTD 技术。CyberForce 结合设备指纹识别与异常检测,对基于 FRL 的智能体所选择的 MTD 机制进行奖励或惩罚。该框架在由真实 IoT 平台上十台设备组成的联邦中进行了评估。针对影响这些设备的六个恶意软件样本的一系列实验表明,CyberForce 能够精确学习最优的 MTD 缓解策略。当所有客户端都遭受所有攻击时,FRL 智能体相比集中式 RL 智能体具有更高的精度和更短的训练时间;当不同客户端遭受不同攻击时,CyberForce 客户端可以通过来自其他客户端和相似攻击行为的知识迁移而获益。此外,CyberForce 对数据投毒攻击表现出显著的鲁棒性。
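The federated side of such a framework usually reduces to parameter averaging: each device improves its own copy of the RL policy locally and a coordinator averages the weights each round. Below is a minimal FedAvg-style sketch over plain NumPy weight dictionaries; the local_update() stand-in and the ten-client loop are assumptions, not CyberForce's training procedure.

```python
# FedAvg-style aggregation sketch for federated RL policies (illustrative only).
import numpy as np

def local_update(weights: dict, seed: int) -> dict:
    """Stand-in for one device's local RL training step (here: a small random weight change)."""
    rng = np.random.default_rng(seed)
    return {name: w + 0.01 * rng.standard_normal(w.shape) for name, w in weights.items()}

def fed_avg(client_weights: list) -> dict:
    """Average each parameter tensor across clients."""
    return {
        name: np.mean([cw[name] for cw in client_weights], axis=0)
        for name in client_weights[0]
    }

global_weights = {"policy/W": np.zeros((4, 2)), "policy/b": np.zeros(2)}
for round_id in range(3):                                   # a few federation rounds
    updates = [local_update(global_weights, seed=round_id * 10 + c) for c in range(10)]
    global_weights = fed_avg(updates)
print({k: v.shape for k, v in global_weights.items()})
```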

Tweet Sentiment Extraction using Viterbi Algorithm with Transfer Learning

  • paper_url: http://arxiv.org/abs/2308.05973
  • repo_url: https://github.com/Zied130/Tweet_Sentiment-
  • paper_authors: Zied Baklouti
  • for: 本研究旨在Identifying tweet sentence中的情感部分。
  • methods: 该研究基于Modified Viterbi algorithm,并引入了信任分数和向量作为内部评估指标。
  • results: 研究发现,通过微调该非参数模型,可以获得高度可解释的结果,并且置信分数向量能够准确指出模型置信度最低的预测状态。
    Abstract Tweet sentiment extraction extracts the most significant portion of the sentence, determining whether the sentiment is positive or negative. This research aims to identify the part of tweet sentences that strikes any emotion. To reach this objective, we continue improving the Viterbi algorithm previously modified by the author to make it able to receive pre-trained model parameters. We introduce the confidence score and vector as two indicators responsible for evaluating the model internally before assessing the final results. We then present a method to fine-tune this nonparametric model. We found that the model gets highly explainable as the confidence score vector reveals precisely where the least confidence predicted states are and if the modifications approved ameliorate the confidence score or if the tuning is going in the wrong direction.
    摘要 推文情感提取的任务是从推文句子中提取承载情感的最重要部分,以确定情感是正面还是负面。本研究旨在识别推文句子中触发情感的部分。为达到这个目标,我们继续改进了作者先前修改过的维特比(Viterbi)算法,使其能够接受预训练模型参数。我们引入了置信分数和置信向量作为两个内部评估指标,在给出最终结果之前先对模型进行内部评估。随后,我们提出了一种对这种非参数模型进行微调的方法。我们发现,置信分数向量能够精确指出置信度最低的预测状态,并显示所采纳的修改是否改善了置信分数、或者调优是否走向了错误的方向,因此模型具有很高的可解释性。
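To make the Viterbi-with-confidence idea concrete, here is a standard log-space Viterbi decoder extended with a per-position confidence score, taken as the margin between the best and second-best state scores at each step. The tiny two-state HMM parameters are invented numbers; the paper's modified algorithm ingests pre-trained model parameters rather than a hand-written table.

```python
# Viterbi decoding with a simple per-step confidence score (illustrative sketch).
import numpy as np

def viterbi_with_confidence(obs, log_start, log_trans, log_emit):
    """Return the best state path and, per step, the log-prob margin between
    the top two states as a crude confidence indicator."""
    n_states = log_start.shape[0]
    T = len(obs)
    score = np.full((T, n_states), -np.inf)
    back = np.zeros((T, n_states), dtype=int)
    score[0] = log_start + log_emit[:, obs[0]]
    for t in range(1, T):
        cand = score[t - 1][:, None] + log_trans          # (previous state, current state)
        back[t] = cand.argmax(axis=0)
        score[t] = cand.max(axis=0) + log_emit[:, obs[t]]
    path = [int(score[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    path.reverse()
    sorted_scores = np.sort(score, axis=1)
    confidence = sorted_scores[:, -1] - sorted_scores[:, -2]   # margin per position
    return path, confidence

# Toy 2-state tagger (0 = neutral span, 1 = sentiment-bearing span), 3 observation symbols.
log_start = np.log([0.7, 0.3])
log_trans = np.log([[0.8, 0.2], [0.4, 0.6]])
log_emit = np.log([[0.6, 0.3, 0.1], [0.1, 0.3, 0.6]])
path, conf = viterbi_with_confidence([0, 2, 2, 1], log_start, log_trans, log_emit)
print(path, np.round(conf, 3))
```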

An Encoder-Decoder Approach for Packing Circles

  • paper_url: http://arxiv.org/abs/2308.07335
  • repo_url: None
  • paper_authors: Akshay Kiran Jose, Gangadhar Karevvanavar, Rajshekhar V Bhat
  • for: 本研究旨在解决一个多年来吸引了广泛关注的封装问题,即将小型对象封装在大型对象中,并且要求小型对象不可以相互重叠或者尽量减少重叠。
  • methods: 本研究提出了一种新的编码器-解码器架构,包括编码器块、扰动块和解码器块,用于封装同形圆形在大型圆形中。该方法中,编码器接受一个圆形的标识符作为输入,并通过一个归一化层输出圆心,扰动层添加了控制的扰动,使圆心不能超过小圆形的半径,而解码器接受扰动后的圆心作为输入,并估算出封装的圆形标识符。
  • results: 该方法可以对高维度和不同形状的对象进行封装,并且可以提供竞争性的性能 compared to 经典方法。
    Abstract The problem of packing smaller objects within a larger object has been of interest since decades. In these problems, in addition to the requirement that the smaller objects must lie completely inside the larger objects, they are expected to not overlap or have minimum overlap with each other. Due to this, the problem of packing turns out to be a non-convex problem, obtaining whose optimal solution is challenging. As such, several heuristic approaches have been used for obtaining sub-optimal solutions in general, and provably optimal solutions for some special instances. In this paper, we propose a novel encoder-decoder architecture consisting of an encoder block, a perturbation block and a decoder block, for packing identical circles within a larger circle. In our approach, the encoder takes the index of a circle to be packed as an input and outputs its center through a normalization layer, the perturbation layer adds controlled perturbations to the center, ensuring that it does not deviate beyond the radius of the smaller circle to be packed, and the decoder takes the perturbed center as input and estimates the index of the intended circle for packing. We parameterize the encoder and decoder by a neural network and optimize it to reduce an error between the decoder's estimated index and the actual index of the circle provided as input to the encoder. The proposed approach can be generalized to pack objects of higher dimensions and different shapes by carefully choosing normalization and perturbation layers. The approach gives a sub-optimal solution and is able to pack smaller objects within a larger object with competitive performance with respect to classical methods.
    摘要 “ Packing smaller objects within a larger object 已经是多年来的研究问题。在这些问题中,除了要求小 objet完全嵌入大 objet 之外,还需要避免它们之间的重叠或最小化重叠。由于这个原因, packing 问题变得非 convex 的,获得优化的解决方案具有挑战性。因此,许多启发法被用来获得不优化的解决方案,以及对特殊情况下的可证优化解决方案。在这篇论文中,我们提出了一种新的编码器-解码器架构,包括编码器块、扰动块和解码器块,用于嵌入 identical circles within a larger circle。在我们的方法中,编码器接受一个圆的索引作为输入,并通过normalization layer输出圆心,perturbation layer添加了控制的扰动,确保圆心不会超过小圆的半径,而解码器接受扰动后的圆心作为输入,并估计圆的索引。我们将编码器和解码器参数化为神经网络,并优化它以降低神经网络的输出与实际输入圆的索引之间的错误。我们的方法可以扩展到嵌入高维度和不同形状的对象,通过合适的 normalization 和扰动层来进行parameterization。我们的方法可以提供竞争性的性能,并且可以嵌入小对象 within a larger object 中。”
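A minimal version of the encoder-perturbation-decoder loop could look like the sketch below: the encoder maps a circle index to a center, the perturbation layer adds noise whose magnitude is tied to the small-circle radius, and the decoder tries to recover the index from the perturbed center, which pushes centers apart. The radii, network sizes, and the plain cross-entropy objective are assumptions chosen to illustrate the architecture, not the paper's exact losses or constraints.

```python
# Encoder-perturbation-decoder sketch for packing identical circles (illustrative only).
import torch
import torch.nn as nn

N_CIRCLES, SMALL_R, BIG_R = 7, 0.3, 1.0

class Packer(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(nn.Embedding(N_CIRCLES, 32), nn.Linear(32, 2), nn.Tanh())
        self.decoder = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, N_CIRCLES))

    def forward(self, idx: torch.Tensor):
        # Tanh output scaled into a box inside the container (a simplification of the paper's normalization).
        center = self.encoder(idx) * (BIG_R - SMALL_R)
        noise = torch.randn_like(center)
        noise = noise * (SMALL_R / noise.norm(dim=-1, keepdim=True).clamp(min=1e-6))  # radius-bounded shift
        logits = self.decoder(center + noise)            # recover which circle this center belongs to
        return center, logits

model = Packer()
idx = torch.arange(N_CIRCLES)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for _ in range(200):                                      # short demonstration loop
    center, logits = model(idx)
    loss = nn.functional.cross_entropy(logits, idx)       # decoder must keep the circles distinguishable
    opt.zero_grad()
    loss.backward()
    opt.step()
print(model(idx)[0].detach())                              # learned, mutually separated centers
```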

Decentralised Governance for Foundation Model based Systems: Exploring the Role of Blockchain in Responsible AI

  • paper_url: http://arxiv.org/abs/2308.05962
  • repo_url: None
  • paper_authors: Yue Liu, Qinghua Lu, Liming Zhu, Hye-Young Paik
  • for: 本研究旨在探讨基础模型 Based AI 系统的治理问题,以确保其可靠性和避免滥用,并对人类、社会和环境造成害。
  • methods: 本研究采用了 eight 个治理挑战,涵盖基础模型 Based AI 系统的三个基本维度:决策权、激励和责任。此外,研究还探讨了使用区块链技术来解决这些挑战的可能性。
  • results: 研究表明,使用区块链技术可以实现基础模型 Based AI 系统的分布式治理,并提高其可靠性和安全性。
    Abstract Foundation models are increasingly attracting interest worldwide for their distinguished capabilities and potential to perform a wide variety of tasks. Nevertheless, people are concerned about whether foundation model based AI systems are properly governed to ensure trustworthiness of foundation model based AI systems and to prevent misuse that could harm humans, society and the environment. In this paper, we identify eight governance challenges in the entire lifecycle of foundation model based AI systems regarding the three fundamental dimensions of governance: decision rights, incentives, and accountability. Furthermore, we explore the potential of blockchain as a solution to address the challenges by providing a distributed ledger to facilitate decentralised governance. We present an architecture that demonstrates how blockchain can be leveraged to realise governance in foundation model based AI systems.
    摘要 基础模型在全球引起了越来越多的关注,因为它们具有突出的能力和可以执行各种任务。然而,人们担心基础模型基于的AI系统是否得到了适当的管理,以确保该系统的可靠性和避免滥用,以避免对人类、社会和环境造成伤害。在这篇论文中,我们认为基础模型基于AI系统的管理存在八个挑战,这些挑战分布在三个基本维度上:决策权、激励和责任。此外,我们还探讨了使用区块链解决这些挑战的可能性,并提出了一种架构,以示如何使用区块链实现基础模型基于AI系统的管理。

BOLAA: Benchmarking and Orchestrating LLM-augmented Autonomous Agents

  • paper_url: http://arxiv.org/abs/2308.05960
  • repo_url: https://github.com/salesforce/bolaa
  • paper_authors: Zhiwei Liu, Weiran Yao, Jianguo Zhang, Le Xue, Shelby Heinecke, Rithesh Murthy, Yihao Feng, Zeyuan Chen, Juan Carlos Niebles, Devansh Arpit, Ran Xu, Phil Mui, Huan Wang, Caiming Xiong, Silvio Savarese
  • for: 这篇论文旨在比较不同类型的自主代理(LAA)和大语言模型(LLM)的可比性,以及提出一种新的多代理管理策略,以提高LAA在各种决策和多步逻辑环境中的表现。
  • methods: 论文使用了多种代理体系和LLM脊梁,并进行了广泛的 simulations validate LAAs 的性能。
  • results: 研究结果表明,BOLAA 可以在各种环境中提高 LAAs 的表现,并且可以提供可靠的量化建议 для LAA 的设计和 LLMS 的选择。
    Abstract The massive successes of large language models (LLMs) encourage the emerging exploration of LLM-augmented Autonomous Agents (LAAs). An LAA is able to generate actions with its core LLM and interact with environments, which facilitates the ability to resolve complex tasks by conditioning on past interactions such as observations and actions. Since the investigation of LAA is still very recent, limited explorations are available. Therefore, we provide a comprehensive comparison of LAA in terms of both agent architectures and LLM backbones. Additionally, we propose a new strategy to orchestrate multiple LAAs such that each labor LAA focuses on one type of action, \textit{i.e.} BOLAA, where a controller manages the communication among multiple agents. We conduct simulations on both decision-making and multi-step reasoning environments, which comprehensively justify the capacity of LAAs. Our performance results provide quantitative suggestions for designing LAA architectures and the optimal choice of LLMs, as well as the compatibility of both. We release our implementation code of LAAs to the public at \url{https://github.com/salesforce/BOLAA}.
    摘要 大型语言模型(LLM)的巨大成功激发了对 LLM 增强自主智能体(LAA)的新兴探索。LAA 能够通过核心 LLM 生成动作并与环境交互,从而以过去的交互(如观察和动作)为条件来解决复杂任务。由于对 LAA 的研究还很新,目前可用的探索仍然有限。因此,我们从智能体体系结构和 LLM 骨干两个方面对 LAA 进行了全面比较。此外,我们提出了一种编排多个 LAA 的新策略,使每个劳动 LAA 专注于一种类型的动作,即 BOLAA,其中由一个控制器负责多个智能体之间的通信。我们在决策和多步推理环境中进行了模拟,全面验证了 LAA 的能力。我们的性能结果为设计 LAA 体系结构、选择最优 LLM 以及两者之间的兼容性提供了量化建议。我们在 GitHub 上公开了 LAA 的实现代码,请参考 \url{https://github.com/salesforce/BOLAA}。

FoodSAM: Any Food Segmentation

  • paper_url: http://arxiv.org/abs/2308.05938
  • repo_url: https://github.com/jamesjg/foodsam
  • paper_authors: Xing Lan, Jiayi Lyu, Hanyu Jiang, Kun Dong, Zehai Niu, Yi Zhang, Jian Xue
  • for: 这篇论文探讨了Segment Anything Model(SAM)在食物图像分割中的零基础能力。
  • methods: 作者提出了一种新的框架,即FoodSAM,用于增强SAM生成的mask的semantic segmentation质量。此外,作者还提出了一种基于独立个体的思想,对食物图像进行实例分割。
  • results: 广泛的实验表明FoodSAM可以有效地分割食物项目,并且可以在多种级别进行分割。此外,FoodSAM还可以实现实例、�anoptic和Promptable segmentation,是首次在食物图像分割领域实现这些功能的工作。
    Abstract In this paper, we explore the zero-shot capability of the Segment Anything Model (SAM) for food image segmentation. To address the lack of class-specific information in SAM-generated masks, we propose a novel framework, called FoodSAM. This innovative approach integrates the coarse semantic mask with SAM-generated masks to enhance semantic segmentation quality. Besides, we recognize that the ingredients in food can be supposed as independent individuals, which motivated us to perform instance segmentation on food images. Furthermore, FoodSAM extends its zero-shot capability to encompass panoptic segmentation by incorporating an object detector, which renders FoodSAM to effectively capture non-food object information. Drawing inspiration from the recent success of promptable segmentation, we also extend FoodSAM to promptable segmentation, supporting various prompt variants. Consequently, FoodSAM emerges as an all-encompassing solution capable of segmenting food items at multiple levels of granularity. Remarkably, this pioneering framework stands as the first-ever work to achieve instance, panoptic, and promptable segmentation on food images. Extensive experiments demonstrate the feasibility and impressing performance of FoodSAM, validating SAM's potential as a prominent and influential tool within the domain of food image segmentation. We release our code at https://github.com/jamesjg/FoodSAM.
    摘要 在这篇论文中,我们探讨 Segment Anything Model(SAM)在食品图像分割中的零样本能力。为了解决 SAM 生成的掩码缺乏类别特定信息的问题,我们提出了一种新的框架,称为 FoodSAM。该方法将粗粒度语义掩码与 SAM 生成的掩码相融合,以提升语义分割质量。此外,我们认为食品中的各种食材可以被视为独立个体,这促使我们在食品图像上进行实例分割。FoodSAM 还通过引入目标检测器,将零样本能力进一步扩展到全景分割,使其能够有效捕捉非食品对象的信息。受可提示分割最新进展的启发,我们还将 FoodSAM 扩展到可提示分割,支持多种提示变体。因此,FoodSAM 成为一个能够在多个粒度层次上分割食品的综合解决方案。值得一提的是,这是首个在食品图像上实现实例、全景和可提示分割的工作。大量实验证明了 FoodSAM 的可行性和出色性能,验证了 SAM 作为食品图像分割领域重要工具的潜力。我们在 https://github.com/jamesjg/FoodSAM 上发布了我们的代码。

A Deep Recurrent-Reinforcement Learning Method for Intelligent AutoScaling of Serverless Functions

  • paper_url: http://arxiv.org/abs/2308.05937
  • repo_url: None
  • paper_authors: Siddharth Agarwal, Maria A. Rodriguez, Rajkumar Buyya
  • for: Function autoscaling in cloud environments, specifically for IoT-edge data processing and anomaly detection.
  • methods: Model-free Recurrent RL agent and Proximal Policy Optimization (PPO) algorithm.
  • results: Improved throughput and function execution, and accounted for more function instances compared to commercially used threshold-based function autoscaling.
    Abstract Function-as-a-Service (FaaS) introduces a lightweight, function-based cloud execution model that finds its relevance in applications like IoT-edge data processing and anomaly detection. While CSP offer a near-infinite function elasticity, these applications often experience fluctuating workloads and stricter performance constraints. A typical CSP strategy is to empirically determine and adjust desired function instances, "autoscaling", based on monitoring-based thresholds such as CPU or memory, to cope with demand and performance. However, threshold configuration either requires expert knowledge, historical data or a complete view of environment, making autoscaling a performance bottleneck lacking an adaptable solution.RL algorithms are proven to be beneficial in analysing complex cloud environments and result in an adaptable policy that maximizes the expected objectives. Most realistic cloud environments usually involve operational interference and have limited visibility, making them partially observable. A general solution to tackle observability in highly dynamic settings is to integrate Recurrent units with model-free RL algorithms and model a decision process as a POMDP. Therefore, in this paper, we investigate a model-free Recurrent RL agent for function autoscaling and compare it against the model-free Proximal Policy Optimisation (PPO) algorithm. We explore the integration of a LSTM network with the state-of-the-art PPO algorithm to find that under our experimental and evaluation settings, recurrent policies were able to capture the environment parameters and show promising results for function autoscaling. We further compare a PPO-based autoscaling agent with commercially used threshold-based function autoscaling and posit that a LSTM-based autoscaling agent is able to improve throughput by 18%, function execution by 13% and account for 8.4% more function instances.
    摘要 Function-as-a-Service (FaaS) 引入了一种轻量级、功能基于云执行模型,在 IoT-edge 数据处理和异常检测等应用中发挥作用。而 CSP 提供了近乎无限的功能灵活性,但这些应用经常遇到波动性的工作负荷和更严格的性能限制。一般 CSP 策略是通过实际观察数据或历史数据来确定和调整所需的功能实例数量,以适应需求和性能。但是,这种策略通常需要专业知识、历史数据或完整的环境视图,从而导致自适应缩放成性能瓶颈。RL 算法已经在分析复杂云环境方面展现出了有利的特点,因此在这篇论文中,我们将 investigate 一种基于 POMDP 的模型自由 RL 代理来解决函数自适应缩放问题。我们将比较使用 PPO 算法和 LSTM 网络来模型决策过程,并发现在我们的实验和评估环境下,循环策略能够捕捉环境参数并显示出扎实的结果。我们进一步比较了使用 PPO 算法进行自适应缩放的代理和商业使用的阈值基于自适应缩放,并论证 LSTM 基于的自适应缩放代理能够提高吞吐量by 18%、功能执行by 13% 和覆盖8.4%更多的功能实例。

LittleMu: Deploying an Online Virtual Teaching Assistant via Heterogeneous Sources Integration and Chain of Teach Prompts

  • paper_url: http://arxiv.org/abs/2308.05935
  • repo_url: https://github.com/thu-keg/vta
  • paper_authors: Shangqing Tu, Zheyuan Zhang, Jifan Yu, Chunyang Li, Siyu Zhang, Zijun Yao, Lei Hou, Juanzi Li
  • for: 这篇论文旨在提供一个基于少量标注数据的虚拟MOOC教学助手,以支持在线学习。
  • methods: 该系统包括两个互动模块:一是结构化、半结构化和无结构化知识源的集成,以提供准确的答案;另一个是通过大规模预训练模型的“链式教学”示例,处理复杂的未收集问题。
  • results: 作者在线测试和实际投入中证明了该系统的性能,并在XuetangX MOOC平台上服务了超过80,000名用户,处理了超过300,000个问题。
    Abstract Teaching assistants have played essential roles in the long history of education. However, few MOOC platforms are providing human or virtual teaching assistants to support learning for massive online students due to the complexity of real-world online education scenarios and the lack of training data. In this paper, we present a virtual MOOC teaching assistant, LittleMu with minimum labeled training data, to provide question answering and chit-chat services. Consisting of two interactive modules of heterogeneous retrieval and language model prompting, LittleMu first integrates structural, semi- and unstructured knowledge sources to support accurate answers for a wide range of questions. Then, we design delicate demonstrations named "Chain of Teach" prompts to exploit the large-scale pre-trained model to handle complex uncollected questions. Except for question answering, we develop other educational services such as knowledge-grounded chit-chat. We test the system's performance via both offline evaluation and online deployment. Since May 2020, our LittleMu system has served over 80,000 users with over 300,000 queries from over 500 courses on XuetangX MOOC platform, which continuously contributes to a more convenient and fair education. Our code, services, and dataset will be available at https://github.com/THU-KEG/VTA.
    摘要 教学助手在教育历史中扮演了关键角色,但目前许多MOOC平台没有提供人工或虚拟教学助手来支持在线学习者,这主要因为在线教育场景复杂,缺乏培训数据。在这篇论文中,我们提出了一个名为“小慕”的虚拟MOOC教学助手,可以提供问答和聊天服务。“小慕”包括两个互动模块:一是结构化、半结构化和无结构化知识源的集成,以支持各种问题的准确答案。其次,我们设计了细腻的示例名为“链条教”,以利用大规模预训练模型来处理复杂的未收集问题。除了问答外,我们还开发了其他教育服务,如知识基于聊天。我们对系统的性能进行了线上评估和下载测试。自2020年5月以来,我们的“小慕”系统已经为超过80,000名用户提供了超过300,000个问题的回答,从超过500门课程中获得了XuetangX MOOC平台的线上执行。我们的代码、服务和数据将在https://github.com/THU-KEG/VTA上提供。

Enhancing Phenotype Recognition in Clinical Notes Using Large Language Models: PhenoBCBERT and PhenoGPT

  • paper_url: http://arxiv.org/abs/2308.06294
  • repo_url: None
  • paper_authors: Jingye Yang, Cong Liu, Wendy Deng, Da Wu, Chunhua Weng, Yunyun Zhou, Kai Wang
  • for: 本研究旨在开发一种基于 transformer 架构的大语言模型(LLM),以便自动检测临床现象术语,包括未被 HPO 收录的术语。
  • methods: 我们开发了两种模型:PhenoBCBERT和PhenoGPT。PhenoBCBERT使用 Bio+Clinical BERT 作为预训模型,而 PhenoGPT 则可以从多种 GPT 模型中 initialize,包括开源版本如 GPT-J、Falcon 和 LLaMA,以及关闭源版本如 GPT-3 和 GPT-3.5。
  • results: 我们发现我们的方法可以从临床观察纪录中提取更多的现象术语,包括 novel 的术语不受 HPO 规范。我们还进行了生物医学文献中的案例研究,以示新现象信息的识别和提取。我们比较了现有的 BERT 基本的 versus GPT 基本的模型,包括模型架构、内存使用、速度、准确率和隐私保护等多个方面。
    Abstract We hypothesize that large language models (LLMs) based on the transformer architecture can enable automated detection of clinical phenotype terms, including terms not documented in the HPO. In this study, we developed two types of models: PhenoBCBERT, a BERT-based model, utilizing Bio+Clinical BERT as its pre-trained model, and PhenoGPT, a GPT-based model that can be initialized from diverse GPT models, including open-source versions such as GPT-J, Falcon, and LLaMA, as well as closed-source versions such as GPT-3 and GPT-3.5. We compared our methods with PhenoTagger, a recently developed HPO recognition tool that combines rule-based and deep learning methods. We found that our methods can extract more phenotype concepts, including novel ones not characterized by HPO. We also performed case studies on biomedical literature to illustrate how new phenotype information can be recognized and extracted. We compared current BERT-based versus GPT-based models for phenotype tagging, in multiple aspects including model architecture, memory usage, speed, accuracy, and privacy protection. We also discussed the addition of a negation step and an HPO normalization layer to the transformer models for improved HPO term tagging. In conclusion, PhenoBCBERT and PhenoGPT enable the automated discovery of phenotype terms from clinical notes and biomedical literature, facilitating automated downstream tasks to derive new biological insights on human diseases.

Learning to Team-Based Navigation: A Review of Deep Reinforcement Learning Techniques for Multi-Agent Pathfinding

  • paper_url: http://arxiv.org/abs/2308.05893
  • repo_url: None
  • paper_authors: Jaehoon Chung, Jamil Fayyad, Younes Al Younes, Homayoun Najjaran
  • For: 本论文主要探讨了多 Agent Pathfinding(MAPF)领域中 Deep Reinforcement Learning(DRL)的应用,并提供了一个综合的评估 metric 来评估不同的 MAPF 算法。
  • Methods: 本文使用了 DRL 技术来解决 MAPF 中的复杂问题,并提供了一个综合的评估 metric 来评估不同的 MAPF 算法。
  • Results: 本文提供了一个综合的评估 metric 来评估不同的 MAPF 算法,并介绍了 Model-based DRL 作为未来研究的可能性,以及其所需的基础理解。
    Abstract Multi-agent pathfinding (MAPF) is a critical field in many large-scale robotic applications, often being the fundamental step in multi-agent systems. The increasing complexity of MAPF in complex and crowded environments, however, critically diminishes the effectiveness of existing solutions. In contrast to other studies that have either presented a general overview of the recent advancements in MAPF or extensively reviewed Deep Reinforcement Learning (DRL) within multi-agent system settings independently, our work presented in this review paper focuses on highlighting the integration of DRL-based approaches in MAPF. Moreover, we aim to bridge the current gap in evaluating MAPF solutions by addressing the lack of unified evaluation metrics and providing comprehensive clarification on these metrics. Finally, our paper discusses the potential of model-based DRL as a promising future direction and provides its required foundational understanding to address current challenges in MAPF. Our objective is to assist readers in gaining insight into the current research direction, providing unified metrics for comparing different MAPF algorithms and expanding their knowledge of model-based DRL to address the existing challenges in MAPF.

DF2: Distribution-Free Decision-Focused Learning

  • paper_url: http://arxiv.org/abs/2308.05889
  • repo_url: None
  • paper_authors: Lingkai Kong, Wenhao Mu, Jiaming Cui, Yuchen Zhuang, B. Aditya Prakash, Bo Dai, Chao Zhang
  • for: 这篇论文是关于解决预测然后优化问题的决策尝试学(DFL)方法中的三个瓶颈的研究。
  • methods: 该论文提出了一种新的分布自由决策尝试学方法(DF2),该方法可以解决预测模型与优化目标之间的模型匹配错误、样本平均approximation错误和梯度approximation错误。
  • results: 该论文通过在一个 sintetic 问题、一个风力发电拍卖问题和一个非几何疫苗分布问题上进行测试,证明了 DF2 的有效性。
    Abstract Decision-focused learning (DFL) has recently emerged as a powerful approach for predict-then-optimize problems by customizing a predictive model to a downstream optimization task. However, existing end-to-end DFL methods are hindered by three significant bottlenecks: model mismatch error, sample average approximation error, and gradient approximation error. Model mismatch error stems from the misalignment between the model's parameterized predictive distribution and the true probability distribution. Sample average approximation error arises when using finite samples to approximate the expected optimization objective. Gradient approximation error occurs as DFL relies on the KKT condition for exact gradient computation, while most methods approximate the gradient for backpropagation in non-convex objectives. In this paper, we present DF2 -- the first \textit{distribution-free} decision-focused learning method explicitly designed to address these three bottlenecks. Rather than depending on a task-specific forecaster that requires precise model assumptions, our method directly learns the expected optimization function during training. To efficiently learn the function in a data-driven manner, we devise an attention-based model architecture inspired by the distribution-based parameterization of the expected objective. Our method is, to the best of our knowledge, the first to address all three bottlenecks within a single model. We evaluate DF2 on a synthetic problem, a wind power bidding problem, and a non-convex vaccine distribution problem, demonstrating the effectiveness of DF2.
    摘要 决策关注学习(DFL)是一种针对“先预测后优化”问题的有力方法,它使预测模型适配于下游优化任务。然而,现有的端到端 DFL 方法受到三大瓶颈的制约:模型失配误差、样本平均近似误差和梯度近似误差。模型失配误差来自模型参数化的预测分布与真实概率分布之间的不一致;样本平均近似误差产生于使用有限样本来近似期望优化目标;梯度近似误差则源于 DFL 依赖 KKT 条件进行精确梯度计算,而大多数方法在非凸目标下只能用近似梯度进行反向传播。在这篇论文中,我们提出了 DF2,这是首个显式针对上述三大瓶颈设计的分布无关决策关注学习方法。与依赖任务特定预测器(需要精确模型假设)的方法不同,我们的方法在训练中直接学习期望优化函数。为了以数据驱动的方式高效学习该函数,我们设计了一种受期望目标的分布式参数化启发的注意力模型结构。据我们所知,这是首个在单一模型中同时解决三大瓶颈的方法。我们在一个合成问题、一个风电竞价问题和一个非凸疫苗分配问题上评估了 DF2,验证了其有效性。

Shared Memory-contention-aware Concurrent DNN Execution for Diversely Heterogeneous System-on-Chips

  • paper_url: http://arxiv.org/abs/2308.05869
  • repo_url: None
  • paper_authors: Ismet Dagli, Mehmet Belviranli
  • for: 这篇论文是为了提出一种新的策略来管理移动和自动化系统中的多个工作负荷,以提高系统的性能和资源利用率。
  • methods: 该论文使用了一种新的策略,即HaX-CoNN,来映射具有不同加速器的concurrently执行深度神经网络(DNN)推理任务到SoC中的多种加速器中。该策略考虑了每层执行特性、共享内存(SM)竞争和间接加速器转换,以找到最佳调度。
  • results: 实验结果表明,HaX-CoNN可以减少SM竞争量达45%,并且可以提高延迟和总吞吐量相对于现有方法的32%和29%。
    Abstract Two distinguishing features of state-of-the-art mobile and autonomous systems are 1) there are often multiple workloads, mainly deep neural network (DNN) inference, running concurrently and continuously; and 2) they operate on shared memory system-on-chips (SoC) that embed heterogeneous accelerators tailored for specific operations. State-of-the-art lacks efficient performance and resource management techniques necessary to either maximize total system throughput or minimize end-to-end workload latency. In this work, we propose HaX-CoNN, a novel scheme that characterizes and maps layers in concurrently executing DNN inference workloads to a diverse set of accelerators within a SoC. Our scheme uniquely takes per-layer execution characteristics, shared memory (SM) contention, and inter-accelerator transitions into account to find optimal schedules. We evaluate HaX-CoNN on NVIDIA Orin, NVIDIA Xavier, and Qualcomm Snapdragon 865 SoCs. Our experimental results indicate that HaX-CoNN minimizes memory contention by up to 45% and can improve latency and total throughput by up to 32% and 29%, respectively, compared to the state-of-the-art approaches.
    摘要 两个特点 distinguishing state-of-the-art 移动和自动化系统是:1)经常有多个工作负载,主要是深度神经网络(DNN)推理,同时并发运行;2)它们运行在共享内存系统在板(SoC)中,该系统嵌入特化为特定操作的多种加速器。现状缺乏有效的性能和资源管理技术,以最大化总系统吞吐量或最小化终端工作负载延迟。在这种工作中,我们提出了 HaX-CoNN 方案,它将在同时执行的 DNN 推理工作负载中映射层到 SoC 中的多种加速器中。我们的方案独特地考虑每层执行特性、共享内存(SM)竞争以及 между加速器的转换,以找到最佳时间表。我们对 NVIDIA Orin、NVIDIA Xavier 和 Qualcomm Snapdragon 865 SoC 进行了实验。我们的实验结果表明,HaX-CoNN 可以最大化内存竞争减少至 45%,并可以提高延迟和总吞吐量相对于现状方法的 32% 和 29%。

Unleashing the Strengths of Unlabeled Data in Pan-cancer Abdominal Organ Quantification: the FLARE22 Challenge

  • paper_url: http://arxiv.org/abs/2308.05862
  • repo_url: https://github.com/junma11/flare
  • paper_authors: Jun Ma, Yao Zhang, Song Gu, Cheng Ge, Shihao Ma, Adamo Young, Cheng Zhu, Kangkang Meng, Xin Yang, Ziyan Huang, Fan Zhang, Wentao Liu, YuanKe Pan, Shoujin Huang, Jiacheng Wang, Mingze Sun, Weixin Xu, Dengqiang Jia, Jae Won Choi, Natália Alves, Bram de Wilde, Gregor Koehler, Yajun Wu, Manuel Wiesenfarth, Qiongjie Zhu, Guoqiang Dong, Jian He, the FLARE Challenge Consortium, Bo Wang
  • for: 这篇论文的目的是要探讨自动化腹部疾病诊断和治疗规划中的量化器官评估。
  • methods: 这篇论文使用了许多人工智能(AI)算法,以测试它们在实际世界中的多元国际设定下的精度和效率。
  • results: 这篇论文发现了一些AI算法可以实现高度的准确性和效率,并且可以在不同的种族、疾病、阶段和生产商的CT扫描 immagini中实现一致性。这些算法还可以将腹部器官的生物特征自动提取出来,这是传统manual measurement的劳动密集的领域。
    Abstract Quantitative organ assessment is an essential step in automated abdominal disease diagnosis and treatment planning. Artificial intelligence (AI) has shown great potential to automatize this process. However, most existing AI algorithms rely on many expert annotations and lack a comprehensive evaluation of accuracy and efficiency in real-world multinational settings. To overcome these limitations, we organized the FLARE 2022 Challenge, the largest abdominal organ analysis challenge to date, to benchmark fast, low-resource, accurate, annotation-efficient, and generalized AI algorithms. We constructed an intercontinental and multinational dataset from more than 50 medical groups, including Computed Tomography (CT) scans with different races, diseases, phases, and manufacturers. We independently validated that a set of AI algorithms achieved a median Dice Similarity Coefficient (DSC) of 90.0\% by using 50 labeled scans and 2000 unlabeled scans, which can significantly reduce annotation requirements. The best-performing algorithms successfully generalized to holdout external validation sets, achieving a median DSC of 89.5\%, 90.9\%, and 88.3\% on North American, European, and Asian cohorts, respectively. They also enabled automatic extraction of key organ biology features, which was labor-intensive with traditional manual measurements. This opens the potential to use unlabeled data to boost performance and alleviate annotation shortages for modern AI models.
    摘要 量化器官评估是自动化腹部疾病诊断和治疗规划中的关键步骤。人工智能(AI)在自动化这一过程方面展现出了巨大潜力。然而,现有的大多数 AI 算法仍然依赖大量专家标注,并且缺乏在真实世界多国场景中对精度和效率的全面评估。为了克服这些局限,我们组织了 FLARE 2022 挑战赛,这是迄今为止规模最大的腹部器官分析挑战赛,用于评测快速、低资源、精确、标注高效且具备泛化能力的 AI 算法。我们构建了来自 50 多个医疗团队的跨洲、多国数据集,包含不同种族、疾病、阶段和设备制造商的 CT 扫描图像。我们独立验证了一组 AI 算法仅使用 50 例标注扫描和 2000 例未标注扫描即可达到 90.0% 的中位 Dice 相似系数(DSC),从而显著降低标注需求。表现最好的算法成功泛化到留出的外部验证集,在北美、欧洲和亚洲队列上分别取得 89.5%、90.9% 和 88.3% 的中位 DSC。它们还能够自动提取关键的器官生物学特征,而传统的手工测量非常费时费力。这为利用未标注数据提升性能、缓解现代 AI 模型的标注短缺问题开启了可能。
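The challenge's headline metric, the Dice Similarity Coefficient, compares two binary masks as 2|A∩B| / (|A| + |B|). A minimal NumPy helper is sketched below; the smoothing constant is a common convention assumed here, not something prescribed by FLARE22.

```python
# Dice Similarity Coefficient between two binary segmentation masks (illustrative helper).
import numpy as np

def dice(pred: np.ndarray, target: np.ndarray, smooth: float = 1e-6) -> float:
    """DSC = 2 * |pred ∩ target| / (|pred| + |target|), with a small smoothing term."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return (2.0 * intersection + smooth) / (pred.sum() + target.sum() + smooth)

# Toy organ masks on a 4x4 slice: the prediction overlaps 3 of the 4 labeled voxels.
gt = np.array([[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]])
pr = np.array([[1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]])
print(round(dice(pr, gt), 3))   # 2*3 / (3 + 4) ≈ 0.857
```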

Are We Closing the Loop Yet? Gaps in the Generalizability of VIS4ML Research

  • paper_url: http://arxiv.org/abs/2308.06290
  • repo_url: None
  • paper_authors: Hariharan Subramonyam, Jessica Hullman
  • for: 这个论文的目的是探讨机器学习(ML)领域的人工智能(AI)研究,以帮助专家更好地发展、理解和改进机器学习模型。
  • methods: 研究人员使用了交互式可视化技术来让专家更好地理解机器学习组件,并使用了人类知识来支持人类在loop任务。
  • results: 研究人员发现,目前的VIS4ML研究范围和实践中的应用Scope有一定的差距,许多论文的结论通常是基于非代表性的场景、少量的ML专家和已知的数据集进行验证,而且存在一些关键依赖关系和不充分评估的问题。
    Abstract Visualization for machine learning (VIS4ML) research aims to help experts apply their prior knowledge to develop, understand, and improve the performance of machine learning models. In conceiving VIS4ML systems, researchers characterize the nature of human knowledge to support human-in-the-loop tasks, design interactive visualizations to make ML components interpretable and elicit knowledge, and evaluate the effectiveness of human-model interchange. We survey recent VIS4ML papers to assess the generalizability of research contributions and claims in enabling human-in-the-loop ML. Our results show potential gaps between the current scope of VIS4ML research and aspirations for its use in practice. We find that while papers motivate that VIS4ML systems are applicable beyond the specific conditions studied, conclusions are often overfitted to non-representative scenarios, are based on interactions with a small set of ML experts and well-understood datasets, fail to acknowledge crucial dependencies, and hinge on decisions that lack justification. We discuss approaches to close the gap between aspirations and research claims and suggest documentation practices to report generality constraints that better acknowledge the exploratory nature of VIS4ML research.

Knowledge Propagation over Conditional Independence Graphs

  • paper_url: http://arxiv.org/abs/2308.05857
  • repo_url: None
  • paper_authors: Urszula Chajewska, Harsh Shrivastava
  • for: 这个论文主要是为了提出知识传播算法来处理 conditional independence 图(CI graph)。
  • methods: 该论文使用了一种基于 undirected graph 的方法,用于模型特征之间的相互关系。
  • results: 实验结果表明,该算法在公开available的 Cora 和 PubMed 数据集上比现有技术更高效。
    Abstract Conditional Independence (CI) graph is a special type of a Probabilistic Graphical Model (PGM) where the feature connections are modeled using an undirected graph and the edge weights show the partial correlation strength between the features. Since the CI graphs capture direct dependence between features, they have been garnering increasing interest within the research community for gaining insights into the systems from various domains, in particular discovering the domain topology. In this work, we propose algorithms for performing knowledge propagation over the CI graphs. Our experiments demonstrate that our techniques improve upon the state-of-the-art on the publicly available Cora and PubMed datasets.
    摘要 条件独立(CI)图是一种特殊的概率图模型(PGM),其特征之间的连接用无向图建模,边权表示特征之间的偏相关强度。由于 CI 图刻画特征之间的直接依赖关系,它在各个领域的研究中受到越来越多的关注,尤其是用于发现领域的拓扑结构。在这项工作中,我们提出了在 CI 图上进行知识传播的算法。我们的实验表明,我们的技术在公开可用的 Cora 和 PubMed 数据集上超越了现有最优方法。
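The edge weights of a CI graph are partial correlations, which can be read off the inverse covariance (precision) matrix as rho_ij = -Theta_ij / sqrt(Theta_ii * Theta_jj). The sketch below builds such a graph from synthetic data with a simple threshold; the threshold and the toy data are assumptions, and the paper's propagation algorithms would then operate on a graph like this rather than being reproduced here.

```python
# Building a conditional-independence (CI) graph from partial correlations (illustrative sketch).
import numpy as np

def ci_graph(X: np.ndarray, threshold: float = 0.1):
    """Return the partial-correlation matrix and a thresholded adjacency matrix."""
    precision = np.linalg.inv(np.cov(X, rowvar=False))
    d = np.sqrt(np.diag(precision))
    partial_corr = -precision / np.outer(d, d)       # off-diagonal partial correlations
    np.fill_diagonal(partial_corr, 1.0)
    adjacency = (np.abs(partial_corr) > threshold) & ~np.eye(X.shape[1], dtype=bool)
    return partial_corr, adjacency

rng = np.random.default_rng(0)
z = rng.standard_normal((500, 1))
X = np.hstack([
    z + 0.1 * rng.standard_normal((500, 1)),          # features 0 and 1 share a common cause
    z + 0.1 * rng.standard_normal((500, 1)),
    rng.standard_normal((500, 1)),                    # feature 2 is independent of the others
])
pc, adj = ci_graph(X)
print(np.round(pc, 2))
print(adj.astype(int))                                 # expected edge only between features 0 and 1
```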

Seed Kernel Counting using Domain Randomization and Object Tracking Neural Networks

  • paper_url: http://arxiv.org/abs/2308.05846
  • repo_url: None
  • paper_authors: Venkat Margapuri, Prapti Thapaliya, Mitchell Neilsen
  • For: The paper is written for the seed production industry, specifically for small-scale seed production firms who cannot afford high-priced mechanized seed kernel counters.
  • Methods: The paper proposes the use of object tracking neural network models, such as YOLO, to estimate cereal yield inexpensively. The authors also use synthetic imagery as a feasible substitute to train neural networks for object tracking.
  • Results: The paper demonstrates the use of a low-cost mechanical hopper, a trained YOLOv8 neural network model, and object tracking algorithms on StrongSORT and ByteTrack to estimate cereal yield from videos. The results show an accuracy of 95.2% and 93.2% for Soy and Wheat respectively using the StrongSORT algorithm, and an accuracy of 96.8% and 92.4% for Soy and Wheat respectively using the ByteTrack algorithm.
    Abstract High-throughput phenotyping (HTP) of seeds, also known as seed phenotyping, is the comprehensive assessment of complex seed traits such as growth, development, tolerance, resistance, ecology, yield, and the measurement of parameters that form more complex traits. One of the key aspects of seed phenotyping is cereal yield estimation that the seed production industry relies upon to conduct their business. While mechanized seed kernel counters are available in the market currently, they are often priced high and sometimes outside the range of small scale seed production firms' affordability. The development of object tracking neural network models such as You Only Look Once (YOLO) enables computer scientists to design algorithms that can estimate cereal yield inexpensively. The key bottleneck with neural network models is that they require a plethora of labelled training data before they can be put to task. We demonstrate that the use of synthetic imagery serves as a feasible substitute to train neural networks for object tracking that includes the tasks of object classification and detection. Furthermore, we propose a seed kernel counter that uses a low-cost mechanical hopper, trained YOLOv8 neural network model, and object tracking algorithms on StrongSORT and ByteTrack to estimate cereal yield from videos. The experiment yields a seed kernel count with an accuracy of 95.2\% and 93.2\% for Soy and Wheat respectively using the StrongSORT algorithm, and an accuray of 96.8\% and 92.4\% for Soy and Wheat respectively using the ByteTrack algorithm.
    摘要 高通量现象评估(HTP)的种子也称为种子现象评估,是全面评估复杂种子特征,如生长、发展、耐受、抗性、生态学、产量等参数的评估。种子生产行业很需要产量预测,以便进行业务。目前市场上有机器化种子坚果数计算机,但它们往往很昂贵,小规模种子生产公司可能无法负担。基于对象跟踪神经网络模型,如一下只看一次(YOLO),计算科学家可以设计便宜的算法来预测小麦和豫 corn 的产量。然而,神经网络模型的主要瓶颈是需要大量标注训练数据。我们表明,使用合成图像作为可行的替代方案,可以用于训练对象跟踪神经网络模型,包括对象分类和检测任务。此外,我们提议一种使用低成本机械吸盘、训练过 YOLOv8 神经网络模型和对象跟踪算法的种子坚果计数器,用于从视频中预测小麦和豫 corn 的产量。实验结果表明,使用 StrongSORT 算法和 ByteTrack 算法,可以准确地预测小麦和豫 corn 的产量,准确率分别为 95.2% 和 93.2%,以及 96.8% 和 92.4%。

DiLogics: Creating Web Automation Programs With Diverse Logics

  • paper_url: http://arxiv.org/abs/2308.05828
  • repo_url: None
  • paper_authors: Kevin Pu, Jim Yang, Angel Yuan, Minyi Ma, Rui Dong, Xinyu Wang, Yan Chen, Tovi Grossman
  • For: The paper is written for knowledge workers who frequently encounter repetitive web data entry tasks and want to increase their productivity through web automation.* Methods: The paper presents a programming-by-demonstration system called DiLogics, which uses natural language processing (NLP) to assist users in creating web automation programs that can handle diverse specifications.* Results: The paper shows that non-experts can effectively use DiLogics to create automation programs that fulfill diverse input instructions, and that DiLogics provides an efficient, intuitive, and expressive method for developing web automation programs satisfying diverse specifications.Here’s the same information in Simplified Chinese:* For: 论文是为知识工作者所写,他们经常遇到重复的网络数据录入任务,想通过网络自动化提高工作效率。* Methods: 论文提出了一种基于示例示出的编程系统——DiLogics,通过自然语言处理(NLP)助手,帮助用户创建满足多样化要求的网络自动化程序。* Results: 论文表明,非专业人员可以有效使用 DiLogics 创建满足多样化输入指令的自动化程序,而 DiLogics 提供了高效、直观、表达力强的网络自动化程序开发方法。
    Abstract Knowledge workers frequently encounter repetitive web data entry tasks, like updating records or placing orders. Web automation increases productivity, but translating tasks to web actions accurately and extending to new specifications is challenging. Existing tools can automate tasks that perform the same logical trace of UI actions (e.g., input text in each field in order), but do not support tasks requiring different executions based on varied input conditions. We present DiLogics, a programming-by-demonstration system that utilizes NLP to assist users in creating web automation programs that handle diverse specifications. DiLogics first semantically segments input data to structured task steps. By recording user demonstrations for each step, DiLogics generalizes the web macros to novel but semantically similar task requirements. Our evaluation showed that non-experts can effectively use DiLogics to create automation programs that fulfill diverse input instructions. DiLogics provides an efficient, intuitive, and expressive method for developing web automation programs satisfying diverse specifications.
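
The following sketch illustrates the general idea of routing each input entry to the most semantically similar demonstrated task step; it is an illustration of the concept, not DiLogics' implementation, and the step descriptions, macro names, and embedding model are assumptions:

```python
# Route each new data entry to the demonstrated step it most resembles, so the
# web macro recorded for that step can be replayed on semantically similar input.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

# Hypothetical demonstrated steps: natural-language description -> recorded macro name.
demonstrated_steps = {
    "order office supplies from the stationery form": "macro_stationery",
    "file an IT support ticket for hardware issues": "macro_it_ticket",
    "update a customer shipping address": "macro_address_update",
}
step_texts = list(demonstrated_steps.keys())
step_embeddings = model.encode(step_texts, convert_to_tensor=True)

def route_entry(entry: str) -> str:
    """Return the macro whose demonstrated step best matches the new entry."""
    entry_embedding = model.encode(entry, convert_to_tensor=True)
    scores = util.cos_sim(entry_embedding, step_embeddings)[0]
    return demonstrated_steps[step_texts[int(scores.argmax())]]

print(route_entry("customer moved, change delivery address to 12 Elm St"))
```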

Encode-Store-Retrieve: Enhancing Memory Augmentation through Language-Encoded Egocentric Perception

  • paper_url: http://arxiv.org/abs/2308.05822
  • repo_url: None
  • paper_authors: Junxiao Shen, John Dudley, Per Ola Kristensson
  • for: Augmenting human memory, particularly long-term and episodic (life-logging) memory.
  • methods: Capturing and preserving egocentric life-logging video with an augmented reality head-mounted display, encoding it in natural language, and storing it in a vector database.
  • results: State-of-the-art performance with a BLEU score of 8.3, surpassing conventional machine learning models that scored between 3.4 and 5.8. In a user study, the system's responses scored 4.13/5, compared with 2.46/5 for human participants.
    Abstract We depend on our own memory to encode, store, and retrieve our experiences. However, memory lapses can occur. One promising avenue for achieving memory augmentation is through the use of augmented reality head-mounted displays to capture and preserve egocentric videos, a practice commonly referred to as life logging. However, a significant challenge arises from the sheer volume of video data generated through life logging, as the current technology lacks the capability to encode and store such large amounts of data efficiently. Further, retrieving specific information from extensive video archives requires substantial computational power, further complicating the task of quickly accessing desired content. To address these challenges, we propose a memory augmentation system that involves leveraging natural language encoding for video data and storing them in a vector database. This approach harnesses the power of large vision language models to perform the language encoding process. Additionally, we propose using large language models to facilitate natural language querying. Our system underwent extensive evaluation using the QA-Ego4D dataset and achieved state-of-the-art results with a BLEU score of 8.3, outperforming conventional machine learning models that scored between 3.4 and 5.8. Additionally, in a user study, our system received a higher mean response score of 4.13/5 compared to the human participants' score of 2.46/5 on real-life episodic memory tasks.
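
A minimal sketch of the encode-store-retrieve loop under the assumption that each video segment has already been captioned by a vision-language model (that stage is stubbed out); the captions, embedding model, and cosine-similarity store are placeholders, not the paper's system:

```python
# Store language-encoded memories as embeddings and answer queries by retrieval.
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")

# Hypothetical captions produced by the (omitted) vision-language encoding stage.
captions = [
    "09:14 placed car keys in the blue bowl by the front door",
    "12:30 lunch with Sam at the corner cafe, paid by card",
    "17:45 took medication, two white tablets with water",
]
store = encoder.encode(captions)                  # (n_segments, dim) vector database

def retrieve(query: str, k: int = 1) -> list[str]:
    """Return the k stored captions most similar to the natural-language query."""
    q = encoder.encode([query])[0]
    sims = store @ q / (np.linalg.norm(store, axis=1) * np.linalg.norm(q))
    return [captions[i] for i in np.argsort(-sims)[:k]]

print(retrieve("where did I leave my keys?"))
```

A large language model could then answer in natural language from the retrieved captions, which is the querying step the abstract describes.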

Neural Progressive Meshes

  • paper_url: http://arxiv.org/abs/2308.05741
  • repo_url: None
  • paper_authors: Yun-Chun Chen, Vladimir G. Kim, Noam Aigerman, Alec Jacobson
  • for: efficiently transmitting large geometric data (e.g., 3D meshes) over the Internet
  • methods: subdivision-based encoder-decoder architecture trained on a large collection of surfaces, with progressive transmission of residual features
  • results: outperforms baselines in terms of compression ratio and reconstruction quality
    Abstract The recent proliferation of 3D content that can be consumed on hand-held devices necessitates efficient tools for transmitting large geometric data, e.g., 3D meshes, over the Internet. Detailed high-resolution assets can pose a challenge to storage as well as transmission bandwidth, and level-of-detail techniques are often used to transmit an asset using an appropriate bandwidth budget. It is especially desirable for these methods to transmit data progressively, improving the quality of the geometry with more data. Our key insight is that the geometric details of 3D meshes often exhibit similar local patterns even across different shapes, and thus can be effectively represented with a shared learned generative space. We learn this space using a subdivision-based encoder-decoder architecture trained in advance on a large collection of surfaces. We further observe that additional residual features can be transmitted progressively between intermediate levels of subdivision that enable the client to control the tradeoff between bandwidth cost and quality of reconstruction, providing a neural progressive mesh representation. We evaluate our method on a diverse set of complex 3D shapes and demonstrate that it outperforms baselines in terms of compression ratio and reconstruction quality.
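
A conceptual sketch of progressive, subdivision-based decoding on the client side, where each received packet of residual features refines the vertices of the next subdivision level; all module names, shapes, and the placeholder subdivision routine are illustrative assumptions rather than the paper's architecture:

```python
# Reconstruct a mesh progressively: quality improves with each residual packet received.
import torch
import torch.nn as nn

class LevelDecoder(nn.Module):
    """Predicts per-vertex displacements from per-vertex residual features."""
    def __init__(self, feat_dim: int = 32):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU(), nn.Linear(64, 3))

    def forward(self, verts: torch.Tensor, residual_feats: torch.Tensor) -> torch.Tensor:
        return verts + self.mlp(residual_feats)   # refined vertex positions

def midpoint_subdivide(verts: torch.Tensor) -> torch.Tensor:
    """Placeholder subdivision: duplicates vertices to stand in for midpoint insertion."""
    return torch.cat([verts, verts], dim=0)

verts = torch.randn(100, 3)                       # coarse base mesh vertices
decoders = [LevelDecoder() for _ in range(3)]
received_packets = [torch.randn(200, 32), torch.randn(400, 32)]  # bandwidth-limited

for decoder, packet in zip(decoders, received_packets):
    verts = midpoint_subdivide(verts)             # move to the next subdivision level
    verts = decoder(verts, packet)                # apply the transmitted residual features
print(verts.shape)                                # finer mesh after two refinement levels
```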

AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining

  • paper_url: http://arxiv.org/abs/2308.05734
  • repo_url: https://github.com/haoheliu/AudioLDM2
  • paper_authors: Haohe Liu, Qiao Tian, Yi Yuan, Xubo Liu, Xinhao Mei, Qiuqiang Kong, Yuping Wang, Wenwu Wang, Yuxuan Wang, Mark D. Plumbley
  • for: Designing generation models for different kinds of audio, including speech, music, and sound effects, within a single framework that lets these models share the same learning method.
  • methods: The framework introduces a general representation called the "language of audio" (LOA). Any audio can be translated into LOA, a GPT-2 model maps other modalities into LOA, and a latent diffusion model conditioned on LOA performs self-supervised generation learning.
  • results: Experiments show new state-of-the-art or competitive performance compared with previous approaches, along with in-context learning abilities and reusable self-supervised pretrained models. Code and demos are available at https://audioldm.github.io/audioldm2.
    Abstract Although audio generation shares commonalities across different types of audio, such as speech, music, and sound effects, designing models for each type requires careful consideration of specific objectives and biases that can significantly differ from those of other types. To bring us closer to a unified perspective of audio generation, this paper proposes a framework that utilizes the same learning method for speech, music, and sound effect generation. Our framework introduces a general representation of audio, called language of audio (LOA). Any audio can be translated into LOA based on AudioMAE, a self-supervised pre-trained representation learning model. In the generation process, we translate any modalities into LOA by using a GPT-2 model, and we perform self-supervised audio generation learning with a latent diffusion model conditioned on LOA. The proposed framework naturally brings advantages such as in-context learning abilities and reusable self-supervised pretrained AudioMAE and latent diffusion models. Experiments on the major benchmarks of text-to-audio, text-to-music, and text-to-speech demonstrate new state-of-the-art or competitive performance to previous approaches. Our demo and code are available at https://audioldm.github.io/audioldm2.
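
A highly simplified sketch of the two-stage idea, with untrained placeholder modules standing in for the GPT-2 LOA predictor and the LOA-conditioned latent diffusion model; it is meant only to show the data flow, not the AudioLDM 2 codebase (available at the repo above):

```python
# Stage 1 predicts a "language of audio" (LOA) sequence from the conditioning input;
# stage 2 generates audio latents with a denoising loop conditioned on that LOA.
import torch
import torch.nn as nn

class LOAPredictor(nn.Module):
    """Stands in for the GPT-2 stage that maps conditioning tokens to LOA features."""
    def __init__(self, dim: int = 128):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, cond: torch.Tensor) -> torch.Tensor:
        return self.proj(cond)                    # (batch, loa_len, dim)

class LatentDenoiser(nn.Module):
    """Stands in for the latent diffusion model conditioned on LOA."""
    def __init__(self, dim: int = 128):
        super().__init__()
        self.net = nn.Linear(dim * 2, dim)

    def forward(self, noisy: torch.Tensor, loa: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([noisy, loa], dim=-1))   # predicted noise / correction

text_cond = torch.randn(1, 16, 128)               # placeholder text-encoder output
loa = LOAPredictor()(text_cond)                   # stage 1: predict LOA
latent = torch.randn(1, 16, 128)                  # start from Gaussian noise
denoiser = LatentDenoiser()
for _ in range(10):                               # toy reverse-diffusion loop
    latent = latent - 0.1 * denoiser(latent, loa) # schematic update, not a real sampler
# A pretrained decoder (e.g. a VAE plus vocoder) would map `latent` back to a waveform.
```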

PDE-Refiner: Achieving Accurate Long Rollouts with Neural PDE Solvers

  • paper_url: http://arxiv.org/abs/2308.05732
  • repo_url: None
  • paper_authors: Phillip Lippe, Bastiaan S. Veeling, Paris Perdikaris, Richard E. Turner, Johannes Brandstetter
  • for: Proposing a deep-neural-network-based solver for time-dependent partial differential equations (PDEs) that improves the efficiency and accuracy of long solution rollouts.
  • methods: Uses a diffusion-inspired multistep refinement process to better model all frequency components of the PDE solution.
  • results: Validated on challenging fluid dynamics benchmarks, PDE-Refiner delivers stable and accurate rollouts that outperform existing neural, numerical, and hybrid neural-numerical models. It also greatly improves data efficiency and allows an accurate assessment of the model's predictive uncertainty.
    Abstract Time-dependent partial differential equations (PDEs) are ubiquitous in science and engineering. Recently, mostly due to the high computational cost of traditional solution techniques, deep neural network based surrogates have gained increased interest. The practical utility of such neural PDE solvers relies on their ability to provide accurate, stable predictions over long time horizons, which is a notoriously hard problem. In this work, we present a large-scale analysis of common temporal rollout strategies, identifying the neglect of non-dominant spatial frequency information, often associated with high frequencies in PDE solutions, as the primary pitfall limiting stable, accurate rollout performance. Based on these insights, we draw inspiration from recent advances in diffusion models to introduce PDE-Refiner; a novel model class that enables more accurate modeling of all frequency components via a multistep refinement process. We validate PDE-Refiner on challenging benchmarks of complex fluid dynamics, demonstrating stable and accurate rollouts that consistently outperform state-of-the-art models, including neural, numerical, and hybrid neural-numerical architectures. We further demonstrate that PDE-Refiner greatly enhances data efficiency, since the denoising objective implicitly induces a novel form of spectral data augmentation. Finally, PDE-Refiner's connection to diffusion models enables an accurate and efficient assessment of the model's predictive uncertainty, allowing us to estimate when the surrogate becomes inaccurate.
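
A conceptual sketch of a multistep refinement rollout in the spirit described above, where a network repeatedly denoises its own next-state estimate at decreasing noise amplitudes; the network, noise schedule, and problem size are placeholder assumptions, not the authors' implementation:

```python
# Refining the next-state prediction at decreasing noise levels forces the model to
# capture low-amplitude (often high-frequency) components of the solution.
import torch
import torch.nn as nn

class Refiner(nn.Module):
    """Placeholder network; the paper uses a neural-operator / U-Net style model."""
    def __init__(self, n_points: int = 256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2 * n_points, 512), nn.GELU(),
                                 nn.Linear(512, n_points))

    def forward(self, u_t: torch.Tensor, estimate: torch.Tensor) -> torch.Tensor:
        # Predicts a refined next state given the current state and a noisy estimate.
        return self.net(torch.cat([u_t, estimate], dim=-1))

def predict_next(model: Refiner, u_t: torch.Tensor,
                 noise_levels=(1.0, 0.1, 0.01)) -> torch.Tensor:
    estimate = torch.zeros_like(u_t)              # step 0: refine from scratch
    for sigma in noise_levels:                    # decreasing noise amplitudes
        noisy = estimate + sigma * torch.randn_like(estimate)
        estimate = model(u_t, noisy)              # denoise back to the prediction
    return estimate

model = Refiner()
u = torch.randn(1, 256)                           # discretised 1D solution at time t
for _ in range(5):                                # autoregressive rollout over time
    u = predict_next(model, u)
```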

Rethinking Integration of Prediction and Planning in Deep Learning-Based Automated Driving Systems: A Review

  • paper_url: http://arxiv.org/abs/2308.05731
  • repo_url: None
  • paper_authors: Steffen Hagedorn, Marcel Hallgarten, Martin Stoll, Alexandru Condurache
  • for: Improving the safety, efficiency, and comfort of automated driving systems.
  • methods: Reviews deep learning models for prediction and planning and how the two can be integrated into an interdependent joint step.
  • results: A systematic review and analysis of existing models that identifies research gaps and open challenges across different integration approaches, and points out promising directions for future research.
    Abstract Automated driving has the potential to revolutionize personal, public, and freight mobility. Besides the enormous challenge of perception, i.e. accurately perceiving the environment using available sensor data, automated driving comprises planning a safe, comfortable, and efficient motion trajectory. To promote safety and progress, many works rely on modules that predict the future motion of surrounding traffic. Modular automated driving systems commonly handle prediction and planning as sequential separate tasks. While this accounts for the influence of surrounding traffic on the ego-vehicle, it fails to anticipate the reactions of traffic participants to the ego-vehicle's behavior. Recent works suggest that integrating prediction and planning in an interdependent joint step is necessary to achieve safe, efficient, and comfortable driving. While various models implement such integrated systems, a comprehensive overview and theoretical understanding of different principles are lacking. We systematically review state-of-the-art deep learning-based prediction, planning, and integrated prediction and planning models. Different facets of the integration ranging from model architecture and model design to behavioral aspects are considered and related to each other. Moreover, we discuss the implications, strengths, and limitations of different integration methods. By pointing out research gaps, describing relevant future challenges, and highlighting trends in the research field, we identify promising directions for future research.

Testing GPT-4 with Wolfram Alpha and Code Interpreter plug-ins on math and science problems

  • paper_url: http://arxiv.org/abs/2308.05713
  • repo_url: None
  • paper_authors: Ernest Davis, Scott Aaronson
  • for: Testing the GPT-4 language model on 105 original science and math problems at the high school and college levels.
  • methods: Uses the Wolfram Alpha and Code Interpreter plug-ins.
  • results: The plug-ins significantly enhance GPT's ability to solve these problems, but "interface" failures remain: GPT often has trouble formulating problems in a way that elicits useful answers from the plug-ins.
    Abstract This report describes a test of the large language model GPT-4 with the Wolfram Alpha and the Code Interpreter plug-ins on 105 original problems in science and math, at the high school and college levels, carried out in June-August 2023. Our tests suggest that the plug-ins significantly enhance GPT's ability to solve these problems. Having said that, there are still often "interface" failures; that is, GPT often has trouble formulating problems in a way that elicits useful answers from the plug-ins. Fixing these interface failures seems like a central challenge in making GPT a reliable tool for college-level calculation problems.

Exploring the Potential of World Models for Anomaly Detection in Autonomous Driving

  • paper_url: http://arxiv.org/abs/2308.05701
  • repo_url: None
  • paper_authors: Daniel Bogdoll, Lukas Bosch, Tim Joseph, Helen Gremmelmaier, Yitian Yang, J. Marius Zöllner
  • for: Exploring how world models can be applied to anomaly detection in autonomous driving systems.
  • methods: Uses world models to detect anomalies in autonomous driving systems.
  • results: An overview of world models for anomaly detection in autonomous driving that relates individual components to prior anomaly detection research, providing a basis for further work in the field.
    Abstract In recent years there have been remarkable advancements in autonomous driving. While autonomous vehicles demonstrate high performance in closed-set conditions, they encounter difficulties when confronted with unexpected situations. At the same time, world models emerged in the field of model-based reinforcement learning as a way to enable agents to predict the future depending on potential actions. This led to outstanding results in sparse reward and complex control tasks. This work provides an overview of how world models can be leveraged to perform anomaly detection in the domain of autonomous driving. We provide a characterization of world models and relate individual components to previous works in anomaly detection to facilitate further research in the field.
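
One common way to use a world model for anomaly detection is to flag transitions whose prediction error under the learned dynamics exceeds a threshold; the sketch below illustrates that formulation with placeholder modules and is not taken from the paper:

```python
# Flag frames whose observed latent deviates strongly from the world model's prediction.
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 128))   # obs -> latent
dynamics = nn.Linear(128 + 2, 128)                                   # (latent, action) -> next latent

def anomaly_scores(frames: torch.Tensor, actions: torch.Tensor) -> torch.Tensor:
    """frames: (T, 3, 64, 64) camera images; actions: (T, 2), e.g. steering and throttle."""
    latents = encoder(frames)                                         # (T, 128)
    pred_next = dynamics(torch.cat([latents[:-1], actions[:-1]], dim=-1))
    # Per-step L2 error between the predicted and actually observed next latent.
    return torch.linalg.norm(pred_next - latents[1:], dim=-1)

frames = torch.randn(10, 3, 64, 64)
actions = torch.randn(10, 2)
scores = anomaly_scores(frames, actions)
threshold = scores.mean() + 2 * scores.std()       # one simple calibration choice
print((scores > threshold).nonzero().flatten())    # indices of anomalous transitions
```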
    摘要 In this work, we explore how world models can be used for anomaly detection in the field of autonomous driving. We provide a comprehensive overview of world models and how they can be applied to detect anomalies in this domain. Additionally, we relate individual components of world models to previous works in anomaly detection, providing a foundation for further research in this area.

SSLRec: A Self-Supervised Learning Library for Recommendation

  • paper_url: http://arxiv.org/abs/2308.05697
  • repo_url: https://github.com/hkuds/sslrec
  • paper_authors: Xubin Ren, Lianghao Xia, Yuhao Yang, Wei Wei, Tianle Wang, Xuheng Cai, Chao Huang
  • for: This paper is written to address the lack of unified frameworks for evaluating self-supervised learning (SSL) recommendation algorithms across different domains.
  • methods: The paper introduces SSLRec, a novel benchmark platform that provides a standardized, flexible, and comprehensive framework for evaluating various SSL-enhanced recommenders. The platform features a modular architecture and a complete set of data augmentation and self-supervised toolkits.
  • results: The paper provides a comprehensive set of state-of-the-art SSL-enhanced recommendation models across different scenarios, enabling researchers to evaluate these cutting-edge models and drive further innovation in the field. The paper also simplifies the process of training and evaluating different recommendation models with consistent and fair settings.
    Abstract Self-supervised learning (SSL) has gained significant interest in recent years as a solution to address the challenges posed by sparse and noisy data in recommender systems. Despite the growing number of SSL algorithms designed to provide state-of-the-art performance in various recommendation scenarios (e.g., graph collaborative filtering, sequential recommendation, social recommendation, KG-enhanced recommendation), there is still a lack of unified frameworks that integrate recommendation algorithms across different domains. Such a framework could serve as the cornerstone for self-supervised recommendation algorithms, unifying the validation of existing methods and driving the design of new ones. To address this gap, we introduce SSLRec, a novel benchmark platform that provides a standardized, flexible, and comprehensive framework for evaluating various SSL-enhanced recommenders. The SSLRec library features a modular architecture that allows users to easily evaluate state-of-the-art models and a complete set of data augmentation and self-supervised toolkits to help create SSL recommendation models with specific needs. Furthermore, SSLRec simplifies the process of training and evaluating different recommendation models with consistent and fair settings. Our SSLRec platform covers a comprehensive set of state-of-the-art SSL-enhanced recommendation models across different scenarios, enabling researchers to evaluate these cutting-edge models and drive further innovation in the field. Our implemented SSLRec framework is available at the source code repository https://github.com/HKUDS/SSLRec.
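
SSLRec's own API is not reproduced here; instead, the sketch below shows the kind of self-supervised auxiliary objective such libraries implement, an InfoNCE loss aligning two augmented views of the same user embeddings, with all tensors and weights as placeholders:

```python
# Contrastive (InfoNCE) alignment between two augmented views, added to the main loss.
import torch
import torch.nn.functional as F

def info_nce(view_a: torch.Tensor, view_b: torch.Tensor, temperature: float = 0.2):
    """view_a, view_b: (n_users, dim) embeddings from two augmented interaction graphs."""
    a = F.normalize(view_a, dim=-1)
    b = F.normalize(view_b, dim=-1)
    logits = a @ b.T / temperature                 # similarity of every pair of users
    labels = torch.arange(a.size(0))               # positives are the matching rows
    return F.cross_entropy(logits, labels)

# Toy usage: embeddings produced by, e.g., a GNN run on two edge-dropped graphs.
emb_view_1 = torch.randn(64, 32)
emb_view_2 = emb_view_1 + 0.1 * torch.randn(64, 32)
ssl_loss = info_nce(emb_view_1, emb_view_2)
# total_loss = bpr_loss + lambda_ssl * ssl_loss    # combined with the main recommendation objective
print(float(ssl_loss))
```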

Hard No-Box Adversarial Attack on Skeleton-Based Human Action Recognition with Skeleton-Motion-Informed Gradient

  • paper_url: http://arxiv.org/abs/2308.05681
  • repo_url: https://github.com/luyg45/hardnoboxattack
  • paper_authors: Zhengzhi Lu, He Wang, Ziyi Chang, Guoan Yang, Hubert P. H. Shum
  • for: Demonstrating that skeleton-based human activity recognition methods are vulnerable and proposing a new attack task in which the attacker has no access to the victim model, its training data, or its labels.
  • methods: A new attack method based on a learned motion manifold, using the skeleton-motion-informed (SMI) gradient, which can attack skeleton-based activity recognition models without any knowledge of the victim.
  • results: Experiments show that the attack poses a real threat to skeleton-based activity recognition models, and that the SMI gradient improves the transferability and imperceptibility of adversarial samples in both no-box and transfer-based black-box settings.
    Abstract Recently, methods for skeleton-based human activity recognition have been shown to be vulnerable to adversarial attacks. However, these attack methods require either the full knowledge of the victim (i.e. white-box attacks), access to training data (i.e. transfer-based attacks) or frequent model queries (i.e. black-box attacks). All their requirements are highly restrictive, raising the question of how detrimental the vulnerability is. In this paper, we show that the vulnerability indeed exists. To this end, we consider a new attack task: the attacker has no access to the victim model or the training data or labels, where we coin the term hard no-box attack. Specifically, we first learn a motion manifold where we define an adversarial loss to compute a new gradient for the attack, named skeleton-motion-informed (SMI) gradient. Our gradient contains information of the motion dynamics, which is different from existing gradient-based attack methods that compute the loss gradient assuming each dimension in the data is independent. The SMI gradient can augment many gradient-based attack methods, leading to a new family of no-box attack methods. Extensive evaluation and comparison show that our method imposes a real threat to existing classifiers. They also show that the SMI gradient improves the transferability and imperceptibility of adversarial samples in both no-box and transfer-based black-box settings.
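
A rough sketch of the no-box idea, attacking via a learned motion manifold rather than a victim model; the autoencoder, loss, and update rule below are illustrative stand-ins and are not the paper's SMI gradient:

```python
# Perturb a skeleton sequence using gradients of a manifold-based loss, without
# ever querying a victim classifier.
import torch
import torch.nn as nn

T, J = 60, 25                                      # frames, joints (e.g. NTU-style skeletons)
encoder = nn.Sequential(nn.Flatten(), nn.Linear(T * J * 3, 128))   # pretrained in practice

def no_box_attack(motion: torch.Tensor, steps: int = 20, eps: float = 0.01) -> torch.Tensor:
    """motion: (T, J, 3) clean skeleton sequence; returns a perturbed sequence."""
    z_clean = encoder(motion.reshape(1, -1)).detach()
    adv = motion.clone().requires_grad_(True)
    for _ in range(steps):
        z_adv = encoder(adv.reshape(1, -1))
        # Push the latent code away from the clean one while staying close in joint space.
        loss = -torch.norm(z_adv - z_clean) + 10.0 * torch.norm(adv - motion)
        grad, = torch.autograd.grad(loss, adv)
        with torch.no_grad():
            adv -= eps * grad.sign()               # FGSM-style signed update
            adv.clamp_(motion - 0.05, motion + 0.05)  # keep the perturbation small
    return adv.detach()

perturbed = no_box_attack(torch.randn(T, J, 3))
```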

Exploring Deep Learning Approaches to Predict Person and Vehicle Trips: An Analysis of NHTS Data

  • paper_url: http://arxiv.org/abs/2308.05665
  • repo_url: None
  • paper_authors: Kojo Adu-Gyamfi, Sharma Anuj
  • for: Exploring the potential of deep learning techniques for trip prediction in transportation planning, to improve the accuracy and reliability of planning forecasts.
  • methods: Develops a deep learning model for predicting person and vehicle trips using the National Household Travel Survey (NHTS) dataset, exploiting the rich information in the data to capture complex non-linear relationships that traditional models overlook.
  • results: The model achieves 98% accuracy for person trip prediction and 96% for vehicle trip prediction, a significant improvement over traditional transportation planning models, demonstrating the potential of deep learning for transportation planning.
    Abstract Modern transportation planning relies heavily on accurate predictions of person and vehicle trips. However, traditional planning models often fail to account for the intricacies and dynamics of travel behavior, leading to less-than-optimal accuracy in these predictions. This study explores the potential of deep learning techniques to transform the way we approach trip predictions, and ultimately, transportation planning. Utilizing a comprehensive dataset from the National Household Travel Survey (NHTS), we developed and trained a deep learning model for predicting person and vehicle trips. The proposed model leverages the vast amount of information in the NHTS data, capturing complex, non-linear relationships that were previously overlooked by traditional models. As a result, our deep learning model achieved an impressive accuracy of 98% for person trip prediction and 96% for vehicle trip estimation. This represents a significant improvement over the performances of traditional transportation planning models, thereby demonstrating the power of deep learning in this domain. The implications of this study extend beyond just more accurate predictions. By enhancing the accuracy and reliability of trip prediction models, planners can formulate more effective, data-driven transportation policies, infrastructure, and services. As such, our research underscores the need for the transportation planning field to embrace advanced techniques like deep learning. The detailed methodology, along with a thorough discussion of the results and their implications, are presented in the subsequent sections of this paper.
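
A minimal sketch of the modelling setup described in the abstract, a feed-forward network regressing trip counts from household and person attributes; the feature names, architecture, and training data below are placeholders rather than the authors' specification:

```python
# Regress daily person-trip counts from NHTS-style household and person features.
import torch
import torch.nn as nn

features = ["household_size", "vehicles_owned", "workers", "income_bracket",
            "urban_flag", "age", "has_license"]    # hypothetical NHTS-style inputs

model = nn.Sequential(
    nn.Linear(len(features), 64), nn.ReLU(),
    nn.Linear(64, 32), nn.ReLU(),
    nn.Linear(32, 1),                              # predicted trips per day
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# Placeholder data standing in for preprocessed NHTS records.
X = torch.randn(1024, len(features))
y = torch.randint(0, 8, (1024, 1)).float()

for epoch in range(10):                            # toy training loop
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    optimizer.step()
```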