cs.AI - 2023-08-01

Hessian-Aware Bayesian Optimization for Decision Making Systems

  • paper_url: http://arxiv.org/abs/2308.00629
  • repo_url: None
  • paper_authors: Mohit Rajpal, Lac Gia Tran, Yehong Zhang, Bryan Kian Hsiang Low
  • for: Optimizing decision making systems, especially when feedback from the environment is sparse or uninformative.
  • methods: Derivative-free methods such as Bayesian Optimization are used to reduce the dependency on feedback quality, together with a compact multi-layered architecture that models actor interactions through the concept of role.
  • results: Experiments under resource constraints and malformed feedback settings show that the proposed method (HA-GP-UCB) optimizes decision making systems effectively.
    Abstract Many approaches for optimizing decision making systems rely on gradient based methods requiring informative feedback from the environment. However, in the case where such feedback is sparse or uninformative, such approaches may result in poor performance. Derivative-free approaches such as Bayesian Optimization mitigate the dependency on the quality of gradient feedback, but are known to scale poorly in the high-dimension setting of complex decision making systems. This problem is exacerbated if the system requires interactions between several actors cooperating to accomplish a shared goal. To address the dimensionality challenge, we propose a compact multi-layered architecture modeling the dynamics of actor interactions through the concept of role. Additionally, we introduce Hessian-aware Bayesian Optimization to efficiently optimize the multi-layered architecture parameterized by a large number of parameters. Experimental results demonstrate that our method (HA-GP-UCB) works effectively on several benchmarks under resource constraints and malformed feedback settings.
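As context, the sketch below illustrates only the standard GP-UCB acquisition rule that the method's name suggests HA-GP-UCB builds on; the Hessian-aware components and the multi-layered role architecture are not reproduced here, and the kernel and beta values are hypothetical choices, not taken from the paper.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def gp_ucb_step(X_obs, y_obs, candidates, beta=2.0):
    """One GP-UCB iteration: fit a GP to the observations and pick the
    candidate maximizing the upper confidence bound mu + sqrt(beta) * sigma."""
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), normalize_y=True)
    gp.fit(X_obs, y_obs)
    mu, sigma = gp.predict(candidates, return_std=True)
    ucb = mu + np.sqrt(beta) * sigma
    return candidates[np.argmax(ucb)]

# Toy 1-D usage: choose the next point to evaluate for an unknown objective.
rng = np.random.default_rng(0)
f = lambda x: -np.sin(3 * x) - x**2 + 0.7 * x
X_obs = rng.uniform(-2, 2, size=(5, 1))
y_obs = f(X_obs).ravel() + 0.05 * rng.standard_normal(5)
candidates = np.linspace(-2, 2, 200).reshape(-1, 1)
print("next point to evaluate:", gp_ucb_step(X_obs, y_obs, candidates))
```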

Human-M3: A Multi-view Multi-modal Dataset for 3D Human Pose Estimation in Outdoor Scenes

  • paper_url: http://arxiv.org/abs/2308.00628
  • repo_url: https://github.com/soullessrobot/human-m3-dataset
  • paper_authors: Bohao Fan, Siqi Wang, Wenxuan Guo, Wenzhao Zheng, Jianjiang Feng, Jie Zhou
  • for: This paper presents a multi-modal, multi-view, multi-person 3D human pose database to further advance research on multi-modal, multi-view 3D human pose estimation.
  • methods: A ground-truth annotation algorithm based on multi-modal data input yields more accurate human poses; in addition, a 3D human pose estimation algorithm based on multi-modal input is proposed to validate the advantages of multi-modal data.
  • results: Experiments show that the database is challenging and diverse, making it suitable for future research, and that the multi-modal pose estimation algorithm has clear advantages.
    Abstract 3D human pose estimation in outdoor environments has garnered increasing attention recently. However, prevalent 3D human pose datasets pertaining to outdoor scenes lack diversity, as they predominantly utilize only one type of modality (RGB image or pointcloud), and often feature only one individual within each scene. This limited scope of dataset infrastructure considerably hinders the variability of available data. In this article, we propose Human-M3, an outdoor multi-modal multi-view multi-person human pose database which includes not only multi-view RGB videos of outdoor scenes but also corresponding pointclouds. In order to obtain accurate human poses, we propose an algorithm based on multi-modal data input to generate ground truth annotation. This benefits from robust pointcloud detection and tracking, which solves the problem of inaccurate human localization and matching ambiguity that may exist in previous multi-view RGB videos in outdoor multi-person scenes, and generates reliable ground truth annotations. Evaluation of multiple different modalities algorithms has shown that this database is challenging and suitable for future research. Furthermore, we propose a 3D human pose estimation algorithm based on multi-modal data input, which demonstrates the advantages of multi-modal data input for 3D human pose estimation. Code and data will be released on https://github.com/soullessrobot/Human-M3-Dataset.

JIANG: Chinese Open Foundation Language Model

  • paper_url: http://arxiv.org/abs/2308.00624
  • repo_url: None
  • paper_authors: Qinhua Duan, Wenchao Gu, Yujia Chen, Wenxin Mao, Zewen Tian, Hui Cao
  • for: This work develops a large language model designed specifically for Chinese, so that it can express its capabilities more fully in the Chinese domain.
  • methods: A large Chinese corpus is used to train the model, and the model structure is optimized.
  • results: Experimental results show that the model performs very well in the Chinese domain.
    Abstract With the advancements in large language model technology, it has showcased capabilities that come close to those of human beings across various tasks. This achievement has garnered significant interest from companies and scientific research institutions, leading to substantial investments in the research and development of these models. While numerous large models have emerged during this period, the majority of them have been trained primarily on English data. Although they exhibit decent performance in other languages, such as Chinese, their potential remains limited due to factors like vocabulary design and training corpus. Consequently, their ability to fully express their capabilities in Chinese falls short. To address this issue, we introduce the model named JIANG (Chinese pinyin of ginger) specifically designed for the Chinese language. We have gathered a substantial amount of Chinese corpus to train the model and have also optimized its structure. The extensive experimental results demonstrate the excellent performance of our model.

Beyond One-Hot-Encoding: Injecting Semantics to Drive Image Classifiers

  • paper_url: http://arxiv.org/abs/2308.00607
  • repo_url: https://github.com/s1m0n38/semantic-encodings
  • paper_authors: Alan Perotti, Simone Bertolotto, Eliana Pastor, André Panisson
  • for: The paper aims to improve the interpretability and trustworthiness of machine learning models for image classification by integrating semantic information into the training process.
  • methods: The authors propose a generic approach to derive an additional loss term starting from any kind of semantic information about the classification label, and demonstrate its application to ontologies and word embeddings.
  • results: The authors train image classifiers with the semantically enriched loss and analyze the trade-offs between accuracy, mistake severity, and learned internal representations. They also discuss the potential of this approach for improving explainability and adversarial robustness.
    Abstract Images are loaded with semantic information that pertains to real-world ontologies: dog breeds share mammalian similarities, food pictures are often depicted in domestic environments, and so on. However, when training machine learning models for image classification, the relative similarities amongst object classes are commonly paired with one-hot-encoded labels. According to this logic, if an image is labelled as 'spoon', then 'tea-spoon' and 'shark' are equally wrong in terms of training loss. To overcome this limitation, we explore the integration of additional goals that reflect ontological and semantic knowledge, improving model interpretability and trustworthiness. We suggest a generic approach that allows to derive an additional loss term starting from any kind of semantic information about the classification label. First, we show how to apply our approach to ontologies and word embeddings, and discuss how the resulting information can drive a supervised learning process. Second, we use our semantically enriched loss to train image classifiers, and analyse the trade-offs between accuracy, mistake severity, and learned internal representations. Finally, we discuss how this approach can be further exploited in terms of explainability and adversarial robustness. Code repository: https://github.com/S1M0N38/semantic-encodings
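To make the idea of a semantically enriched loss concrete, here is a minimal, hypothetical sketch (not the authors' exact formulation): cross-entropy on the one-hot labels is combined with a KL term toward soft targets derived from class-embedding similarities, so that "tea-spoon" is penalized less than "shark" when the true label is "spoon". The temperature and weighting are illustrative choices.

```python
import torch
import torch.nn.functional as F

def semantically_enriched_loss(logits, targets, class_embeddings, alpha=0.5, temp=0.1):
    """Cross-entropy plus a semantic term built from class-embedding similarity.

    logits:           (B, C) raw model outputs
    targets:          (B,)   integer class labels
    class_embeddings: (C, D) one vector per class (e.g. word embeddings)
    """
    ce = F.cross_entropy(logits, targets)
    emb = F.normalize(class_embeddings, dim=1)
    # Soft targets: classes semantically close to the true class receive more mass.
    sims = emb[targets] @ emb.T                      # (B, C) cosine similarities
    soft_targets = F.softmax(sims / temp, dim=1)
    sem = F.kl_div(F.log_softmax(logits, dim=1), soft_targets, reduction="batchmean")
    return ce + alpha * sem

# Toy usage with 4 classes and random embeddings.
logits = torch.randn(8, 4)
targets = torch.randint(0, 4, (8,))
class_emb = torch.randn(4, 16)
print(semantically_enriched_loss(logits, targets, class_emb))
```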

Collaborative filtering to capture AI user’s preferences as norms

  • paper_url: http://arxiv.org/abs/2308.02542
  • repo_url: None
  • paper_authors: Marc Serramia, Natalia Criado, Michael Luck
  • for: This work aims to improve the personalization of AI technologies so that they better match users' true preferences.
  • methods: Collaborative filtering is used to capture users' preferences automatically, by analyzing the large amount of preference information available from whole systems of users.
  • results: The authors argue that collaborative filtering can capture users' norm preferences accurately while avoiding excessive user involvement in the configuration process, improving the experience of using AI technologies.
    Abstract Customising AI technologies to each user's preferences is fundamental to them functioning well. Unfortunately, current methods require too much user involvement and fail to capture their true preferences. In fact, to avoid the nuisance of manually setting preferences, users usually accept the default settings even if these do not conform to their true preferences. Norms can be useful to regulate behaviour and ensure it adheres to user preferences but, while the literature has thoroughly studied norms, most proposals take a formal perspective. Indeed, while there has been some research on constructing norms to capture a user's privacy preferences, these methods rely on domain knowledge which, in the case of AI technologies, is difficult to obtain and maintain. We argue that a new perspective is required when constructing norms, which is to exploit the large amount of preference information readily available from whole systems of users. Inspired by recommender systems, we believe that collaborative filtering can offer a suitable approach to identifying a user's norm preferences without excessive user involvement.
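As a hedged illustration of the direction the paper argues for, the following sketch shows plain user-based collaborative filtering over a user-preference matrix: a user's unset preferences are predicted from the most similar users' choices. The data layout, similarity measure, and function name are illustrative assumptions, not the authors' design.

```python
import numpy as np

def predict_preferences(prefs, user, k=2):
    """Fill in a user's unknown preferences (NaN entries) from the k most
    similar users, weighting neighbours by cosine similarity."""
    filled = np.nan_to_num(prefs)                          # unknowns -> 0 for similarity only
    norms = np.linalg.norm(filled, axis=1, keepdims=True) + 1e-9
    sims = (filled @ filled.T) / (norms @ norms.T)         # user-user cosine similarity
    order = [u for u in np.argsort(-sims[user]) if u != user][:k]
    weights = sims[user, order]
    known = ~np.isnan(prefs[order])
    num = (np.nan_to_num(prefs[order]) * weights[:, None]).sum(axis=0)
    den = (known * weights[:, None]).sum(axis=0) + 1e-9    # count only neighbours who answered
    estimate = num / den
    return np.where(np.isnan(prefs[user]), estimate, prefs[user])

# Rows are users, columns are preference settings (1 = accept, 0 = reject, NaN = not set).
prefs = np.array([[1, 0, np.nan, 1],
                  [1, 0, 1,      1],
                  [0, 1, 0,      0],
                  [1, 0, 1,      0]], dtype=float)
print(predict_preferences(prefs, user=0))
```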

Towards More Human-like AI Communication: A Review of Emergent Communication Research

  • paper_url: http://arxiv.org/abs/2308.02541
  • repo_url: None
  • paper_authors: Nicolo’ Brandizzi
  • for: This review examines how humans use language and how artificial agents communicate, with the goal of helping machines use natural language more effectively in human-machine interaction.
  • methods: The review surveys emergent communication (Emecom) research, in which artificial agents learn to use language to communicate and to learn new concepts.
  • results: The review identifies the common properties shared across the literature and groups the work into two subcategories, clarifying the relation between human language use and machine communication.
    Abstract In the recent shift towards human-centric AI, the need for machines to accurately use natural language has become increasingly important. While a common approach to achieve this is to train large language models, this method presents a form of learning misalignment where the model may not capture the underlying structure and reasoning humans employ in using natural language, potentially leading to unexpected or unreliable behavior. Emergent communication (Emecom) is a field of research that has seen a growing number of publications in recent years, aiming to develop artificial agents capable of using natural language in a way that goes beyond simple discriminative tasks and can effectively communicate and learn new concepts. In this review, we present Emecom under two aspects. Firstly, we delineate all the common properties we find across the literature and how they relate to human interactions. Secondly, we identify two subcategories and highlight their characteristics and open challenges. We encourage researchers to work together by demonstrating that different methods can be viewed as diverse solutions to a common problem and emphasize the importance of including diverse perspectives and expertise in the field. We believe a deeper understanding of human communication is crucial to developing machines that can accurately use natural language in human-machine interactions.

Reinforcement Learning-based Non-Autoregressive Solver for Traveling Salesman Problems

  • paper_url: http://arxiv.org/abs/2308.00560
  • repo_url: None
  • paper_authors: Yubin Xiao, Di Wang, Huanhuan Chen, Boyang Li, Wei Pang, Xuan Wu, Hao Li, Dong Xu, Yanchun Liang, You Zhou
  • for: Proposes a Traveling Salesman Problem (TSP) solver based on a Graph Neural Network (GNN) and reinforcement learning (RL) to improve both solution speed and solution quality.
  • methods: A specially designed GNN performs non-autoregressive (NAR) decoding, and an enhanced RL strategy removes the dependency on the costly labels needed to train conventional supervised-learning-based NAR models.
  • results: Experiments on synthetic and real-world TSP instances show that NAR4TSP outperforms four state-of-the-art models in solution quality, inference speed, and generalization ability; visualizations of the decoding process and the overall path planning illustrate its feasibility and effectiveness.
    Abstract The Traveling Salesman Problem (TSP) is a well-known problem in combinatorial optimization with applications in various domains. However, existing TSP solvers face challenges in producing high-quality solutions with low latency. To address this issue, we propose NAR4TSP, which produces TSP solutions in a Non-Autoregressive (NAR) manner using a specially designed Graph Neural Network (GNN), achieving faster inference speed. Moreover, NAR4TSP is trained using an enhanced Reinforcement Learning (RL) strategy, eliminating the dependency on costly labels used to train conventional supervised learning-based NAR models. To the best of our knowledge, NAR4TSP is the first TSP solver that successfully combines RL and NAR decoding. The experimental results on both synthetic and real-world TSP instances demonstrate that NAR4TSP outperforms four state-of-the-art models in terms of solution quality, inference latency, and generalization ability. Lastly, we present visualizations of NAR4TSP's decoding process and its overall path planning to showcase the feasibility of implementing NAR4TSP in an end-to-end manner and its effectiveness, respectively.
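The sketch below is only a toy illustration of the non-autoregressive idea: a model emits a full matrix of edge scores in one shot, and a tour is then decoded greedily from that matrix. It is not NAR4TSP's GNN or its RL training; the negative-distance matrix here merely stands in for a learned score matrix.

```python
import numpy as np

def decode_tour(edge_scores, start=0):
    """Greedily decode a TSP tour from an (n x n) matrix of edge scores,
    e.g. produced in a single forward pass by a non-autoregressive model."""
    n = edge_scores.shape[0]
    visited = {start}
    tour = [start]
    while len(tour) < n:
        scores = edge_scores[tour[-1]].copy()
        scores[list(visited)] = -np.inf          # mask nodes already in the tour
        nxt = int(np.argmax(scores))
        tour.append(nxt)
        visited.add(nxt)
    return tour

rng = np.random.default_rng(0)
coords = rng.random((6, 2))
# Stand-in "scores": negative distances, so closer nodes score higher.
dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
print("decoded tour:", decode_tour(-dists))
```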

Copula for Instance-wise Feature Selection and Ranking

  • paper_url: http://arxiv.org/abs/2308.00549
  • repo_url: None
  • paper_authors: Hanyu Peng, Guanhua Fang, Ping Li
  • for: Improve the accuracy of instance-wise feature selection and ranking in neural networks, enhancing model performance and interpretability.
  • methods: The Gaussian copula is integrated into the current feature selection framework, with no changes to existing methods required.
  • results: On synthetic and real datasets, the method captures meaningful correlations between features better than existing approaches, improving performance and interpretability.
    Abstract Instance-wise feature selection and ranking methods can achieve a good selection of task-friendly features for each sample in the context of neural networks. However, existing approaches that assume feature subsets to be independent are imperfect when considering the dependency between features. To address this limitation, we propose to incorporate the Gaussian copula, a powerful mathematical technique for capturing correlations between variables, into the current feature selection framework with no additional changes needed. Experimental results on both synthetic and real datasets, in terms of performance comparison and interpretability, demonstrate that our method is capable of capturing meaningful correlations.
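To illustrate the Gaussian copula ingredient in isolation (a generic sketch, not the authors' integration into a feature-selection framework): each feature is rank-transformed to uniform margins and mapped through the inverse normal CDF, after which the correlation of the transformed features captures their dependence regardless of the original margins.

```python
import numpy as np
from scipy.stats import norm, rankdata

def gaussian_copula_correlation(X):
    """Estimate the Gaussian-copula correlation matrix of the columns of X."""
    n = X.shape[0]
    U = np.apply_along_axis(rankdata, 0, X) / (n + 1)   # empirical CDF values in (0, 1)
    Z = norm.ppf(U)                                     # Gaussianized features
    return np.corrcoef(Z, rowvar=False)

# Toy data: two dependent features with very different margins, one independent feature.
rng = np.random.default_rng(0)
a = rng.standard_normal(500)
X = np.column_stack([a, np.exp(a) + 0.1 * rng.standard_normal(500), rng.random(500)])
print(np.round(gaussian_copula_correlation(X), 2))
```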

Predicting Early Dropouts of an Active and Healthy Ageing App

  • paper_url: http://arxiv.org/abs/2308.00539
  • repo_url: None
  • paper_authors: Vasileios Perifanis, Ioanna Michailidi, Giorgos Stamatelatos, George Drosatos, Pavlos S. Efraimidis
  • for: The paper is written for predicting early dropouts of an active and healthy ageing app.
  • methods: The paper uses machine learning algorithms, specifically classification models constructed using pre-processing techniques and dynamic/static features. The authors also employed oversampling methods like SMOTE and ADASYN to improve performance.
  • results: The paper achieved high-quality adherence predictions, with dynamic features positively influencing the model’s performance. The oversampling approaches led to a remarkable improvement of 10%. The authors won first place in the IFMBE Scientific Challenge 2022.
    Abstract In this work, we present a machine learning approach for predicting early dropouts of an active and healthy ageing app. The presented algorithms have been submitted to the IFMBE Scientific Challenge 2022, part of IUPESM WC 2022. We have processed the given database and generated seven datasets. We used pre-processing techniques to construct classification models that predict the adherence of users using dynamic and static features. We submitted 11 official runs and our results show that machine learning algorithms can provide high-quality adherence predictions. Based on the results, the dynamic features positively influence a model's classification performance. Due to the imbalanced nature of the dataset, we employed oversampling methods such as SMOTE and ADASYN to improve the classification performance. The oversampling approaches led to a remarkable improvement of 10%. Our methods won first place in the IFMBE Scientific Challenge 2022.
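Since the summary above hinges on oversampling the minority class, here is a minimal sketch of how SMOTE and ADASYN are typically applied with the imbalanced-learn package before fitting a classifier; the toy data and classifier choice are illustrative, not the authors' pipeline.

```python
from imblearn.over_sampling import SMOTE, ADASYN
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Toy imbalanced dataset standing in for the adherence-prediction features.
X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=42)

for sampler in (SMOTE(random_state=42), ADASYN(random_state=42)):
    X_res, y_res = sampler.fit_resample(X_tr, y_tr)   # oversample only the training split
    clf = RandomForestClassifier(random_state=42).fit(X_res, y_res)
    print(type(sampler).__name__, "F1:", round(f1_score(y_te, clf.predict(X_te)), 3))
```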

PressureTransferNet: Human Attribute Guided Dynamic Ground Pressure Profile Transfer using 3D simulated Pressure Maps

  • paper_url: http://arxiv.org/abs/2308.00538
  • repo_url: None
  • paper_authors: Lala Shakti Swarup Ray, Vitor Fortes Rey, Bo Zhou, Sungho Suh, Paul Lukowicz
  • for: Research and development of Human Activity Recognition (HAR) systems.
  • methods: Existing pressure data from different individuals and an encoder-decoder model are used to generate activity-specific, body-specific dynamic ground pressure profiles.
  • results: Human attributes are accurately transferred to ground pressure profiles across different scenarios, validated with a physics-based deep learning model.
    Abstract We propose PressureTransferNet, a novel method for Human Activity Recognition (HAR) using ground pressure information. Our approach generates body-specific dynamic ground pressure profiles for specific activities by leveraging existing pressure data from different individuals. PressureTransferNet is an encoder-decoder model taking a source pressure map and a target human attribute vector as inputs, producing a new pressure map reflecting the target attribute. To train the model, we use a sensor simulation to create a diverse dataset with various human attributes and pressure profiles. Evaluation on a real-world dataset shows its effectiveness in accurately transferring human attributes to ground pressure profiles across different scenarios. We visually confirm the fidelity of the synthesized pressure shapes using a physics-based deep learning model and achieve a binary R-square value of 0.79 on areas with ground contact. Validation through classification with F1 score (0.911$\pm$0.015) on physical pressure mat data demonstrates the correctness of the synthesized pressure maps, making our method valuable for data augmentation, denoising, sensor simulation, and anomaly detection. Applications span sports science, rehabilitation, and bio-mechanics, contributing to the development of HAR systems.

Graph Embedding Dynamic Feature-based Supervised Contrastive Learning of Transient Stability for Changing Power Grid Topologies

  • paper_url: http://arxiv.org/abs/2308.00537
  • repo_url: None
  • paper_authors: Zijian Lv, Xin Chen, Zijian Feng
  • for: Accurate online transient stability prediction is critical for power system stability under disturbances, but traditional stability analysis based on time-domain simulation cannot adapt quickly to changes in power grid topology.
  • methods: Building on graph embedding dynamic features (GEDF), a supervised-contrastive-learning model (GEDF-SCL) is proposed that predicts transient stability while taking power grid topology information into account.
  • results: On power grids of varying topologies, with simulated N-1 and N-m-1 contingencies, GEDF-SCL achieves high prediction accuracy and adapts well to topology changes.
    Abstract Accurate online transient stability prediction is critical for ensuring power system stability when facing disturbances. While traditional transient stability analysis relies on time domain simulations, it cannot be quickly adapted to power grid topology changes. In order to vectorize high-dimensional power grid topological structure information into low-dimensional node-based graph embedding streaming data, graph embedding dynamic feature (GEDF) has been proposed. The transient stability GEDF-based supervised contrastive learning (GEDF-SCL) model uses supervised contrastive learning to predict transient stability with GEDFs, considering power grid topology information. To evaluate the performance of the proposed GEDF-SCL model, power grids of varying topologies were generated based on the IEEE 39-bus system model. Transient operational data was obtained by simulating N-1 and N-m-1 contingencies on these generated power system topologies. Test results demonstrated that the GEDF-SCL model can achieve high accuracy in transient stability prediction and adapt well to changing power grid topologies.
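For reference, a standard form of the supervised contrastive loss (Khosla et al., 2020), which the GEDF-SCL summary refers to, is shown below; whether the paper uses exactly this form is an assumption.

L_sup = \sum_{i \in I} \frac{-1}{|P(i)|} \sum_{p \in P(i)} \log \frac{\exp(z_i \cdot z_p / \tau)}{\sum_{a \in A(i)} \exp(z_i \cdot z_a / \tau)}

where I indexes the samples in a batch, P(i) is the set of samples sharing the label of anchor i, A(i) is the set of all samples other than i, z are the (normalized) embeddings, and \tau is a temperature.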

Transfer-Ensemble Learning based Deep Convolutional Neural Networks for Diabetic Retinopathy Classification

  • paper_url: http://arxiv.org/abs/2308.00525
  • repo_url: None
  • paper_authors: Susmita Ghosh, Abhiroop Chatterjee
  • for: This paper aims to classify diabetic retinopathy (DR) into five different classes using an ensemble approach based on two popular pre-trained convolutional neural networks: VGG16 and Inception V3.
  • methods: In the ensemble architecture, a portion of the layers in each pre-trained model is frozen to exploit their learned representations; global average pooling layers convert the feature maps of the input image into fixed-length vectors, which are concatenated.
  • results: Experimental results show that the ensemble model classifies diabetic retinopathy highly effectively, with an accuracy of 96.4%.
    Abstract This article aims to classify diabetic retinopathy (DR) disease into five different classes using an ensemble approach based on two popular pre-trained convolutional neural networks: VGG16 and Inception V3. The proposed model aims to leverage the strengths of the two individual nets to enhance the classification performance for diabetic retinopathy. The ensemble model architecture involves freezing a portion of the layers in each pre-trained model to utilize their learned representations effectively. Global average pooling layers are added to transform the output feature maps into fixed-length vectors. These vectors are then concatenated to form a consolidated representation of the input image. The ensemble model is trained using a dataset of diabetic retinopathy images (APTOS), divided into training and validation sets. During the training process, the model learns to classify the retinal images into the corresponding diabetic retinopathy classes. Experimental results on the test set demonstrate the efficacy of the proposed ensemble model for DR classification achieving an accuracy of 96.4%.
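A minimal sketch of the ensemble architecture described above, assuming Keras, ImageNet weights, 224x224 inputs, and freezing all backbone layers; the exact layers frozen, the input size, and the classifier head in the paper may differ.

```python
from tensorflow.keras import layers, Model
from tensorflow.keras.applications import VGG16, InceptionV3

inputs = layers.Input(shape=(224, 224, 3))
vgg = VGG16(weights="imagenet", include_top=False, input_tensor=inputs)
inc = InceptionV3(weights="imagenet", include_top=False, input_tensor=inputs)

# Freeze the pre-trained backbones to reuse their learned representations.
for layer in vgg.layers + inc.layers:
    layer.trainable = False

# Global average pooling turns each backbone's feature maps into a fixed-length vector.
v = layers.GlobalAveragePooling2D()(vgg.output)
i = layers.GlobalAveragePooling2D()(inc.output)
merged = layers.Concatenate()([v, i])                     # consolidated representation
outputs = layers.Dense(5, activation="softmax")(merged)   # five DR severity classes

model = Model(inputs, outputs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.summary()
```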

SurveyLM: A platform to explore emerging value perspectives in augmented language models’ behaviors

  • paper_url: http://arxiv.org/abs/2308.00521
  • repo_url: None
  • paper_authors: Steve J. Bickley, Ho Fai Chan, Bang Dao, Benno Torgler, Son Tran
  • for: This white paper presents SurveyLM, a platform for analyzing the emergent alignment behaviors of augmented language models (ALMs) in complex social scenarios.
  • methods: Survey and experimental methodologies, traditionally used to study social behaviors, are applied to evaluate ALMs systematically, yielding new insight into their alignment and emergent behaviors.
  • results: Through the SurveyLM platform, factors influencing ALMs' emergent behaviors are identified, and survey and experiment designs can be refined to promote alignment with human intentions and expectations, supporting the responsible development and deployment of advanced social AI systems.
    Abstract This white paper presents our work on SurveyLM, a platform for analyzing augmented language models' (ALMs) emergent alignment behaviors through their dynamically evolving attitude and value perspectives in complex social contexts. Social Artificial Intelligence (AI) systems, like ALMs, often function within nuanced social scenarios where there is no singular correct response, or where an answer is heavily dependent on contextual factors, thus necessitating an in-depth understanding of their alignment dynamics. To address this, we apply survey and experimental methodologies, traditionally used in studying social behaviors, to evaluate ALMs systematically, thus providing unprecedented insights into their alignment and emergent behaviors. Moreover, the SurveyLM platform leverages the ALMs' own feedback to enhance survey and experiment designs, exploiting an underutilized aspect of ALMs, which accelerates the development and testing of high-quality survey frameworks while conserving resources. Through SurveyLM, we aim to shed light on factors influencing ALMs' emergent behaviors, facilitate their alignment with human intentions and expectations, and thereby contributed to the responsible development and deployment of advanced social AI systems. This white paper underscores the platform's potential to deliver robust results, highlighting its significance to alignment research and its implications for future social AI systems.

Explainable Graph Spectral Clustering of Text Documents

  • paper_url: http://arxiv.org/abs/2308.00504
  • repo_url: None
  • paper_authors: Bartłomiej Starosta, Mieczysław A. Kłopotek, Sławomir T. Wierzchoń
  • for: This work provides a method for explaining graph spectral clustering results, so that users can better understand document clustering outcomes.
  • methods: An explanation method for combinatorial-Laplacian-based graph spectral clustering is proposed, based on the approximate equivalence of the combinatorial Laplacian embedding, the proposed K-embedding, and the term vector space embedding.
  • results: Experiments show that the K-embedding approximates the Laplacian embedding well under favourable block matrix conditions, and that the approximation remains good enough under other conditions.
    Abstract Spectral clustering methods are known for their ability to represent clusters of diverse shapes, densities etc. However, results of such algorithms, when applied e.g. to text documents, are hard to explain to the user, especially due to embedding in the spectral space which has no obvious relation to document contents. Therefore there is an urgent need to elaborate methods for explaining the outcome of the clustering. This paper presents a contribution towards this goal. We present a proposal of explanation of results of combinatorial Laplacian based graph spectral clustering. It is based on showing (approximate) equivalence of combinatorial Laplacian embedding, $K$-embedding (proposed in this paper) and term vector space embedding. Hence a bridge is constructed between the textual contents and the clustering results. We provide theoretical background for this approach. We performed experimental study showing that $K$-embedding approximates well Laplacian embedding under favourable block matrix conditions and show that approximation is good enough under other conditions.
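For readers unfamiliar with the underlying pipeline, this is a generic sketch of combinatorial-Laplacian spectral clustering of documents (term vectors, similarity graph, Laplacian eigenvectors, k-means); the paper's K-embedding and the explanation layer built on top of it are not reproduced here, and the tiny corpus is made up.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = ["the cat sat on the mat", "dogs and cats are pets",
        "stocks fell on monday", "the market rallied after earnings"]
k = 2

# Term-vector-space representation and a similarity graph between documents.
X = TfidfVectorizer().fit_transform(docs)
A = cosine_similarity(X)
np.fill_diagonal(A, 0.0)

# Combinatorial (unnormalized) Laplacian L = D - A and its smallest eigenvectors.
D = np.diag(A.sum(axis=1))
L = D - A
eigvals, eigvecs = np.linalg.eigh(L)
embedding = eigvecs[:, :k]            # Laplacian embedding of the documents

labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(embedding)
print(labels)
```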

Retrieval Augmented Generation and Representative Vector Summarization for large unstructured textual data in Medical Education

  • paper_url: http://arxiv.org/abs/2308.00479
  • repo_url: https://github.com/ssm123ssm/docgpt-pharm
  • paper_authors: S. S. Manathunga, Y. A. Illangasekara
  • for: This paper discusses applications of large language models in medical education, aiming to reduce hallucination and the generation of harmful answers in domain-specific tasks.
  • methods: Retrieval Augmented Generation (RAG) is used to easily attach a non-parametric knowledge base to a large language model, together with a combined extractive and abstractive summarization method for large unstructured text based on representative vectors.
  • results: The paper finds that RAG helps large language models provide better answers in medical education while reducing hallucination and harmful responses.
    Abstract Large Language Models are increasingly being used for various tasks including content generation and as chatbots. Despite their impressive performances in general tasks, LLMs need to be aligned when applying for domain specific tasks to mitigate the problems of hallucination and producing harmful answers. Retrieval Augmented Generation (RAG) allows to easily attach and manipulate a non-parametric knowledgebases to LLMs. Applications of RAG in the field of medical education are discussed in this paper. A combined extractive and abstractive summarization method for large unstructured textual data using representative vectors is proposed.
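A minimal sketch of the retrieval step in a RAG pipeline: document chunks are embedded, the chunks most similar to the query are retrieved, and they are prepended to the prompt sent to the LLM. The `embed` function here is a deliberately crude stand-in for a real sentence-embedding model, and the medical snippets are invented examples, not content from the paper's repository.

```python
import numpy as np

def embed(texts):
    """Placeholder embedding: hashed character trigrams, normalized.
    In practice this would be a sentence-embedding model."""
    vecs = np.zeros((len(texts), 512))
    for i, t in enumerate(texts):
        for j in range(len(t) - 2):
            vecs[i, hash(t[j:j + 3]) % 512] += 1.0
    return vecs / (np.linalg.norm(vecs, axis=1, keepdims=True) + 1e-9)

def retrieve(query, chunks, k=2):
    """Return the k chunks most cosine-similar to the query."""
    sims = embed(chunks) @ embed([query])[0]
    return [chunks[i] for i in np.argsort(-sims)[:k]]

chunks = ["Beta blockers reduce heart rate and blood pressure.",
          "Penicillin allergy is a contraindication for amoxicillin.",
          "Metformin is first-line therapy for type 2 diabetes."]
query = "What is the first-line drug for type 2 diabetes?"
context = "\n".join(retrieve(query, chunks))
prompt = f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}"
print(prompt)
```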

A Satellite Imagery Dataset for Long-Term Sustainable Development in United States Cities

  • paper_url: http://arxiv.org/abs/2308.00465
  • repo_url: https://github.com/axin1301/satellite-imagery-dataset
  • paper_authors: Yanxin Xi, Yu Liu, Tong Li, Jintao Ding, Yunke Zhang, Sasu Tarkoma, Yong Li, Pan Hui
  • for: This work supports research on the sustainable development goals (SDGs) in U.S. cities, particularly studies that use satellite imagery to investigate sustainable urban development.
  • methods: Deep learning models are applied to collected satellite imagery, combined with population, nighttime light, survey, and built environment data, to characterize sustainable development indicators for the cities.
  • results: The resulting satellite imagery dataset covers the 100 most populated U.S. cities and the corresponding Census Block Groups, and can help urban planners and researchers advance SDG-related studies, especially long-term, multi-scale monitoring of urban SDGs from satellite imagery.
    Abstract Cities play an important role in achieving sustainable development goals (SDGs) to promote economic growth and meet social needs. Especially satellite imagery is a potential data source for studying sustainable urban development. However, a comprehensive dataset in the United States (U.S.) covering multiple cities, multiple years, multiple scales, and multiple indicators for SDG monitoring is lacking. To support the research on SDGs in U.S. cities, we develop a satellite imagery dataset using deep learning models for five SDGs containing 25 sustainable development indicators. The proposed dataset covers the 100 most populated U.S. cities and corresponding Census Block Groups from 2014 to 2023. Specifically, we collect satellite imagery and identify objects with state-of-the-art object detection and semantic segmentation models to observe cities' bird's-eye view. We further gather population, nighttime light, survey, and built environment data to depict SDGs regarding poverty, health, education, inequality, and living environment. We anticipate the dataset to help urban policymakers and researchers to advance SDGs-related studies, especially applying satellite imagery to monitor long-term and multi-scale SDGs in cities.

DMFC-GraspNet: Differentiable Multi-Fingered Robotic Grasp Generation in Cluttered Scenes

  • paper_url: http://arxiv.org/abs/2308.00456
  • repo_url: None
  • paper_authors: Philipp Blättner, Johannes Brand, Gerhard Neumann, Ngo Anh Vien
  • for: Improve the computational efficiency and diversity of multi-fingered robotic grasping.
  • methods: A differentiable multi-fingered grasp generation network (DMFC-GraspNet) is proposed, with three main contributions: a novel neural grasp planner, a scene creation and label mapping method, and end-to-end training with a combined supervised loss, differentiable collision loss, and generalized Q1 grasp metric loss.
  • results: Tests using the Shadow Dexterous Hand in the MuJoCo simulator show that the proposed method improves the computational efficiency and diversity of multi-fingered grasping, marking notable progress in the field.
    Abstract Robotic grasping is a fundamental skill required for object manipulation in robotics. Multi-fingered robotic hands, which mimic the structure of the human hand, can potentially perform complex object manipulation. Nevertheless, current techniques for multi-fingered robotic grasping frequently predict only a single grasp for each inference time, limiting computational efficiency and their versatility, i.e. unimodal grasp distribution. This paper proposes a differentiable multi-fingered grasp generation network (DMFC-GraspNet) with three main contributions to address this challenge. Firstly, a novel neural grasp planner is proposed, which predicts a new grasp representation to enable versatile and dense grasp predictions. Secondly, a scene creation and label mapping method is developed for dense labeling of multi-fingered robotic hands, which allows a dense association of ground truth grasps. Thirdly, we propose to train DMFC-GraspNet end-to-end using a forward-backward automatic differentiation approach with both a supervised loss and a differentiable collision loss and a generalized Q1 grasp metric loss. The proposed approach is evaluated using the Shadow Dexterous Hand on Mujoco simulation and ablated by different choices of loss functions. The results demonstrate the effectiveness of the proposed approach in predicting versatile and dense grasps, and in advancing the field of multi-fingered robotic grasping.

MAiVAR-T: Multimodal Audio-image and Video Action Recognizer using Transformers

  • paper_url: http://arxiv.org/abs/2308.03741
  • repo_url: None
  • paper_authors: Muhammad Bilal Shaikh, Douglas Chai, Syed Mohammed Shamsul Islam, Naveed Akhtar
  • for: Improve the effectiveness of multimodal human action recognition (MHAR).
  • methods: The audio and image modalities are combined by transmuting audio representations into the image domain, and this audio-image depiction is fused with the video modality to form a unified representation.
  • results: MAiVAR-T outperforms existing state-of-the-art strategies; experiments on a benchmark action recognition dataset confirm the model's strong performance on human action recognition.
    Abstract In line with the human capacity to perceive the world by simultaneously processing and integrating high-dimensional inputs from multiple modalities like vision and audio, we propose a novel model, MAiVAR-T (Multimodal Audio-Image to Video Action Recognition Transformer). This model employs an intuitive approach for the combination of audio-image and video modalities, with a primary aim to escalate the effectiveness of multimodal human action recognition (MHAR). At the core of MAiVAR-T lies the significance of distilling substantial representations from the audio modality and transmuting these into the image domain. Subsequently, this audio-image depiction is fused with the video modality to formulate a unified representation. This concerted approach strives to exploit the contextual richness inherent in both audio and video modalities, thereby promoting action recognition. In contrast to existing state-of-the-art strategies that focus solely on audio or video modalities, MAiVAR-T demonstrates superior performance. Our extensive empirical evaluations conducted on a benchmark action recognition dataset corroborate the model's remarkable performance. This underscores the potential enhancements derived from integrating audio and video modalities for action recognition purposes.

Structural Embeddings of Tools for Large Language Models

  • paper_url: http://arxiv.org/abs/2308.00447
  • repo_url: None
  • paper_authors: Eren Unlu
  • for: The central aim is to highlight the importance of graph-based approaches to the interaction between large language models (LLMs) and external tools in the near future.
  • methods: An exemplary framework is proposed for orchestrating an exponentially growing number of external tools with LLMs, in which the objectives and functionalities of tools are encoded hierarchically in a graph.
  • results: The authors argue that graph-based approaches can open new avenues for applying LLMs across tasks, including treating textual segments of a Chain-of-Thought (CoT) as tools.
    Abstract It is evident that the current state of Large Language Models (LLMs) necessitates the incorporation of external tools. The lack of straightforward algebraic and logical reasoning is well documented and prompted researchers to develop frameworks which allow LLMs to operate via external tools. The ontological nature of tool utilization for a specific task can be well formulated with a Directed Acyclic Graph (DAG). The central aim of the paper is to highlight the importance of graph based approaches to LLM-tool interaction in near future. We propose an exemplary framework to guide the orchestration of exponentially increasing numbers of external tools with LLMs,where objectives and functionalities of tools are graph encoded hierarchically. Assuming that textual segments of a Chain-of-Thought (CoT) can be imagined as a tool as defined here, the graph based framework can pave new avenues in that particular direction as well.
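A small sketch of the graph-encoding idea: external tools become nodes in a directed acyclic graph whose edges express which tool's output feeds which other tool, and a topological sort yields a valid order in which an LLM could invoke them. The tool names and descriptions here are hypothetical, not from the paper.

```python
import networkx as nx

# Nodes are tools; an edge (a, b) means tool b consumes the output of tool a.
dag = nx.DiGraph()
dag.add_node("web_search", description="retrieve documents for a query")
dag.add_node("summarizer", description="condense retrieved documents")
dag.add_node("calculator", description="evaluate arithmetic expressions")
dag.add_node("report_writer", description="compose the final answer")
dag.add_edges_from([("web_search", "summarizer"),
                    ("summarizer", "report_writer"),
                    ("calculator", "report_writer")])

assert nx.is_directed_acyclic_graph(dag)
# A topological order gives one valid sequence for invoking the tools.
print(list(nx.topological_sort(dag)))
```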

ALE: A Simulation-Based Active Learning Evaluation Framework for the Parameter-Driven Comparison of Query Strategies for NLP

  • paper_url: http://arxiv.org/abs/2308.02537
  • repo_url: https://github.com/philipp-kohl/active-learning-evaluation-framework
  • paper_authors: Philipp Kohl, Nils Freyer, Yoka Krämer, Henri Werth, Steffen Wolf, Bodo Kraft, Matthias Meinecke, Albert Zündorf
  • for: This paper aims to provide an empirical basis for choosing between different active learning (AL) strategies in natural language processing (NLP) tasks.
  • methods: The paper introduces a reproducible active learning evaluation (ALE) framework for comparing AL strategies in NLP. The framework allows for the implementation of AL strategies with low effort and a fair data-driven comparison, and it tracks experiment parameters such as initial dataset size, number of data points per query step, and budget.
  • results: The paper presents a case study to illustrate how to use the ALE framework, and it provides a basis for practitioners to make more informed decisions and for researchers to focus on developing new, effective AL strategies and deriving best practices for specific use cases.
    Abstract Supervised machine learning and deep learning require a large amount of labeled data, which data scientists obtain in a manual, and time-consuming annotation process. To mitigate this challenge, Active Learning (AL) proposes promising data points to annotators they annotate next instead of a subsequent or random sample. This method is supposed to save annotation effort while maintaining model performance. However, practitioners face many AL strategies for different tasks and need an empirical basis to choose between them. Surveys categorize AL strategies into taxonomies without performance indications. Presentations of novel AL strategies compare the performance to a small subset of strategies. Our contribution addresses the empirical basis by introducing a reproducible active learning evaluation (ALE) framework for the comparative evaluation of AL strategies in NLP. The framework allows the implementation of AL strategies with low effort and a fair data-driven comparison through defining and tracking experiment parameters (e.g., initial dataset size, number of data points per query step, and the budget). ALE helps practitioners to make more informed decisions, and researchers can focus on developing new, effective AL strategies and deriving best practices for specific use cases. With best practices, practitioners can lower their annotation costs. We present a case study to illustrate how to use the framework.
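To make the experiment parameters tracked by ALE concrete (initial dataset size, query size per step, budget), here is a generic pool-based active-learning loop with least-confidence sampling; it is a simplified stand-in, not the ALE framework's actual API, and the classifier and dataset are placeholders.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X_pool, y_pool = make_classification(n_samples=1000, random_state=0)
rng = np.random.default_rng(0)

initial_size, query_size, budget = 20, 10, 100   # the kind of parameters ALE tracks
labeled = list(rng.choice(len(X_pool), size=initial_size, replace=False))

while len(labeled) < initial_size + budget:
    clf = LogisticRegression(max_iter=1000).fit(X_pool[labeled], y_pool[labeled])
    unlabeled = np.setdiff1d(np.arange(len(X_pool)), labeled)
    proba = clf.predict_proba(X_pool[unlabeled])
    # Least-confidence query strategy: annotate the points the model is least sure about.
    uncertainty = 1.0 - proba.max(axis=1)
    queried = unlabeled[np.argsort(-uncertainty)[:query_size]]
    labeled.extend(queried.tolist())
    # Accuracy on the full pool, printed only to track progress of the loop.
    print(f"labeled={len(labeled)}  pool accuracy={clf.score(X_pool, y_pool):.3f}")
```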

SelfCheck: Using LLMs to Zero-Shot Check Their Own Step-by-Step Reasoning

  • paper_url: http://arxiv.org/abs/2308.00436
  • repo_url: https://github.com/ningmiao/selfcheck
  • paper_authors: Ning Miao, Yee Whye Teh, Tom Rainforth
  • for: This work investigates whether large language models (LLMs) can recognize their own errors, without resorting to external resources.
  • methods: A zero-shot verification scheme is proposed to identify errors within step-by-step reasoning; the verification results are then used to improve question-answering performance via weighted voting over multiple generated answers.
  • results: On three math datasets (GSM8K, MathQA, and MATH), the method successfully recognizes errors and, in turn, increases final predictive performance.
    Abstract The recent progress in large language models (LLMs), especially the invention of chain-of-thoughts (CoT) prompting, makes it possible to solve reasoning problems. However, even the strongest LLMs are still struggling with more complicated problems that require non-linear thinking and multi-step reasoning. In this work, we explore whether LLMs have the ability to recognize their own errors, without resorting to external resources. In particular, we investigate whether they can be used to identify individual errors within a step-by-step reasoning. To this end, we propose a zero-shot verification scheme to recognize such errors. We then use this verification scheme to improve question-answering performance, by using it to perform weighted voting on different generated answers. We test the method on three math datasets-GSM8K, MathQA, and MATH-and find that it successfully recognizes errors and, in turn, increases final predictive performance.
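The integration step described above (weighted voting over multiple generated answers) can be illustrated in a few lines; the answers and verification scores here are made-up placeholders for sampled solutions and SelfCheck-style per-solution confidences.

```python
from collections import defaultdict

def weighted_vote(answers, scores):
    """Pick the answer with the highest total verification score across samples."""
    totals = defaultdict(float)
    for ans, s in zip(answers, scores):
        totals[ans] += s
    return max(totals, key=totals.get)

# Five sampled chain-of-thought solutions to the same question, with hypothetical
# verification confidences produced by the checking stage.
answers = ["42", "42", "36", "42", "36"]
scores  = [0.9, 0.4, 0.8, 0.7, 0.2]
print(weighted_vote(answers, scores))   # -> "42"
```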

Patch-wise Auto-Encoder for Visual Anomaly Detection

  • paper_url: http://arxiv.org/abs/2308.00429
  • repo_url: None
  • paper_authors: Yajie Cui, Zhaoxiang Liu, Shiguo Lian
  • for: Improve unsupervised anomaly detection.
  • methods: A patch-wise auto-encoder (Patch AE) framework reconstructs each image patch from its corresponding spatially distributed feature vector, strengthening the model's anomaly sensitivity instead of weakening its reconstruction ability.
  • results: The method sets a new state of the art on the MVTec AD benchmark, demonstrating its effectiveness and strong potential for practical industrial applications.
    Abstract Anomaly detection without priors of the anomalies is challenging. In the field of unsupervised anomaly detection, traditional auto-encoder (AE) tends to fail based on the assumption that by training only on normal images, the model will not be able to reconstruct abnormal images correctly. On the contrary, we propose a novel patch-wise auto-encoder (Patch AE) framework, which aims at enhancing the reconstruction ability of AE to anomalies instead of weakening it. Each patch of image is reconstructed by corresponding spatially distributed feature vector of the learned feature representation, i.e., patch-wise reconstruction, which ensures anomaly-sensitivity of AE. Our method is simple and efficient. It advances the state-of-the-art performances on Mvtec AD benchmark, which proves the effectiveness of our model. It shows great potential in practical industrial application scenarios.

Generative adversarial networks with physical sound field priors

  • paper_url: http://arxiv.org/abs/2308.00426
  • repo_url: https://github.com/xefonon/soundfieldgan
  • paper_authors: Xenofon Karakonstantis, Efren Fernandez-Grande
  • for: This method reconstructs sound fields using generative adversarial networks (GANs).
  • methods: A plane wave basis is used to learn the underlying statistical distribution of pressure in rooms, so that sound fields can be reconstructed accurately from a limited number of measurements.
  • results: The model improves reconstruction accuracy and energy retention, particularly in the high-frequency range and when extrapolating beyond the measurement region, and it handles varying numbers and configurations of measurement positions without sacrificing performance.
    Abstract This paper presents a deep learning-based approach for the spatio-temporal reconstruction of sound fields using Generative Adversarial Networks (GANs). The method utilises a plane wave basis and learns the underlying statistical distributions of pressure in rooms to accurately reconstruct sound fields from a limited number of measurements. The performance of the method is evaluated using two established datasets and compared to state-of-the-art methods. The results show that the model is able to achieve an improved reconstruction performance in terms of accuracy and energy retention, particularly in the high-frequency range and when extrapolating beyond the measurement region. Furthermore, the proposed method can handle a varying number of measurement positions and configurations without sacrificing performance. The results suggest that this approach provides a promising approach to sound field reconstruction using generative models that allow for a physically informed prior to acoustics problems.

Discourse-Aware Text Simplification: From Complex Sentences to Linked Propositions

  • paper_url: http://arxiv.org/abs/2308.00425
  • repo_url: None
  • paper_authors: Christina Niklaus, Matthias Cetto, André Freitas, Siegfried Handschuh
  • for: This paper aims to improve the predictive quality of downstream natural language processing applications by restructuring complex sentences so that they are easier to analyze.
  • methods: A linguistically grounded text simplification approach splits and rephrases complex sentences, using clausal and phrasal disembedding, while preserving the semantic context in which they occur.
  • results: Experimental results indicate that this knowledge-driven simplification improves the predictive quality of downstream NLP applications while keeping the meaning of the original sentences intact.
    Abstract Sentences that present a complex syntax act as a major stumbling block for downstream Natural Language Processing applications whose predictive quality deteriorates with sentence length and complexity. The task of Text Simplification (TS) may remedy this situation. It aims to modify sentences in order to make them easier to process, using a set of rewriting operations, such as reordering, deletion, or splitting. State-of-the-art syntactic TS approaches suffer from two major drawbacks: first, they follow a very conservative approach in that they tend to retain the input rather than transforming it, and second, they ignore the cohesive nature of texts, where context spread across clauses or sentences is needed to infer the true meaning of a statement. To address these problems, we present a discourse-aware TS approach that splits and rephrases complex English sentences within the semantic context in which they occur. Based on a linguistically grounded transformation stage that uses clausal and phrasal disembedding mechanisms, complex sentences are transformed into shorter utterances with a simple canonical structure that can be easily analyzed by downstream applications. With sentence splitting, we thus address a TS task that has hardly been explored so far. Moreover, we introduce the notion of minimality in this context, as we aim to decompose source sentences into a set of self-contained minimal semantic units. To avoid breaking down the input into a disjointed sequence of statements that is difficult to interpret because important contextual information is missing, we incorporate the semantic context between the split propositions in the form of hierarchical structures and semantic relationships. In that way, we generate a semantic hierarchy of minimal propositions that leads to a novel representation of complex assertions that puts a semantic layer on top of the simplified sentences.

Exploring the Role of Explainability in AI-Assisted Embryo Selection

  • paper_url: http://arxiv.org/abs/2308.02534
  • repo_url: None
  • paper_authors: Lucia Urcelay, Daniel Hinjos, Pablo A. Martin-Torres, Marta Gonzalez, Marta Mendez, Salva Cívico, Sergio Álvarez-Napagao, Dario Garcia-Gasulla
  • for: This study examines how embryos are evaluated and selected in In Vitro Fertilization and how artificial intelligence (AI) can be applied to embryo analysis to make the evaluation process more accurate and reliable.
  • methods: The study analyzes the explainability of existing deep-learning-based AI-assisted embryo analysis models.
  • results: Current AI-assisted embryo analysis models offer limited explainability; the paper proposes how to integrate these models into clinical practice as decision support systems that meet the needs of clinicians and patients.
    Abstract In Vitro Fertilization is among the most widespread treatments for infertility. One of its main challenges is the evaluation and selection of embryo for implantation, a process with large inter- and intra-clinician variability. Deep learning based methods are gaining attention, but their opaque nature compromises their acceptance in the clinical context, where transparency in the decision making is key. In this paper we analyze the current work in the explainability of AI-assisted embryo analysis models, identifying the limitations. We also discuss how these models could be integrated in the clinical context as decision support systems, considering the needs of clinicians and patients. Finally, we propose guidelines for the sake of increasing interpretability and trustworthiness, pushing this technology forward towards established clinical practice.

BiERL: A Meta Evolutionary Reinforcement Learning Framework via Bilevel Optimization

  • paper_url: http://arxiv.org/abs/2308.01207
  • repo_url: https://github.com/chriswang98sz/bierl
  • paper_authors: Junyi Wang, Yuanyang Zhu, Zhi Wang, Yan Zheng, Jianye Hao, Chunlin Chen
  • for: Improve the ability to solve complex RL problems and make the approach applicable across different RL algorithms.
  • methods: A general meta evolutionary reinforcement learning framework is proposed that uses bilevel optimization to update the inner RL model and the meta-parameters jointly within a single agent, without requiring prior domain knowledge or a costly optimization procedure.
  • results: Extensive experiments on MuJoCo and Box2D tasks show that BiERL outperforms various baselines and consistently improves learning performance across a diversity of ERL algorithms.
    Abstract Evolutionary reinforcement learning (ERL) algorithms recently raise attention in tackling complex reinforcement learning (RL) problems due to high parallelism, while they are prone to insufficient exploration or model collapse without carefully tuning hyperparameters (aka meta-parameters). In the paper, we propose a general meta ERL framework via bilevel optimization (BiERL) to jointly update hyperparameters in parallel to training the ERL model within a single agent, which relieves the need for prior domain knowledge or costly optimization procedure before model deployment. We design an elegant meta-level architecture that embeds the inner-level's evolving experience into an informative population representation and introduce a simple and feasible evaluation of the meta-level fitness function to facilitate learning efficiency. We perform extensive experiments in MuJoCo and Box2D tasks to verify that as a general framework, BiERL outperforms various baselines and consistently improves the learning performance for a diversity of ERL algorithms.
    摘要 进化强化学习(ERL)算法凭借高度并行性,近来在求解复杂强化学习(RL)问题上受到关注;但若不精心调节超参数(即 meta-parameters),它们容易出现探索不足或模型崩溃。本文提出一种基于双层优化的通用 meta-ERL 框架(BiERL),在单个智能体内一边训练 ERL 模型、一边并行更新超参数,从而无需先验领域知识或部署前昂贵的调参过程。我们设计了一个简洁的元层架构,将内层的进化经验嵌入为富含信息的种群表示,并引入一种简单可行的元层适应度评估,以提升学习效率。我们在 MuJoCo 和 Box2D 任务上进行了大量实验,验证了 BiERL 作为通用框架优于多种基线方法,并能持续提升多种 ERL 算法的学习性能。
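下面附上一段极简的示意代码(Python/NumPy),仅用于说明"双层优化"在 meta-ERL 中的大致思路:内层在给定超参数下用进化策略优化策略参数,外层依据种群的元适应度挑选并更新超参数。其中的适应度函数、超参数含义与元更新方式均为假设性占位,并非 BiERL 的官方实现。

```python
import numpy as np

# 示意的双层优化循环:内层为进化策略(ES),外层依据元适应度更新超参数。
# fitness()、超参数含义与更新规则均为假设,与 BiERL 论文实现无关。

def fitness(theta):
    # 假设的 RL 回报代理:一个简单的二次函数,最优点在 [1, 2]
    return -np.sum((theta - np.array([1.0, 2.0])) ** 2)

def inner_erl(theta, sigma, lr, pop_size=16, iters=20, rng=None):
    """内层:基本的进化策略,sigma/lr 是待元学习的超参数。"""
    rng = rng or np.random.default_rng(0)
    for _ in range(iters):
        eps = rng.normal(size=(pop_size, theta.size))
        pop = theta + sigma * eps                      # 扰动出一个种群
        scores = np.array([fitness(p) for p in pop])
        ranks = (scores - scores.mean()) / (scores.std() + 1e-8)
        theta = theta + lr * (ranks @ eps) / pop_size  # ES 梯度估计
    return theta, scores.mean()                        # 种群均分作为元适应度

def outer_meta_loop(meta_iters=10):
    theta = np.zeros(2)
    hyper = np.array([0.5, 0.3])                       # [sigma, lr]
    rng = np.random.default_rng(42)
    for t in range(meta_iters):
        # 对超参数做简单的随机扰动式元更新(示意,并非论文做法)
        cand = hyper * np.exp(rng.normal(scale=0.2, size=(4, 2)))
        meta_fit = [inner_erl(theta.copy(), s, l, rng=rng)[1] for s, l in cand]
        hyper = cand[int(np.argmax(meta_fit))]         # 选取元适应度最高的超参数
        theta, best = inner_erl(theta, *hyper, rng=rng)
        print(f"meta-iter {t}: hyper={hyper.round(3)}, fitness={best:.3f}")
    return theta, hyper

if __name__ == "__main__":
    outer_meta_loop()
```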

Challenging the Myth of Graph Collaborative Filtering: a Reasoned and Reproducibility-driven Analysis

  • paper_url: http://arxiv.org/abs/2308.00404
  • repo_url: https://github.com/sisinflab/graph-rss-reproducibility
  • paper_authors: Vito Walter Anelli, Daniele Malitesta, Claudio Pomo, Alejandro Bellogín, Tommaso Di Noia, Eugenio Di Sciascio
  • for: 本研究旨在检验图神经网络(Graph Neural Network,GNN)推荐模型结果的可复现性,以便更准确地理解不同图模型在具体配置下的表现。
  • methods: 本研究复现了六种流行的图推荐模型(NGCF、DGCF、LightGCN、SGL、UltraGCN、GFCF),在三个常用数据集(Gowalla、Yelp 2018、Amazon Book)上进行实验;并与历史上在离线评估中表现良好的传统协同过滤模型进行比较。
  • results: 研究发现,不同图模型在三个常用数据集上的表现存在差异,且在部分数据集上不及传统协同过滤模型;此外,数据集自身的特性对推荐准确性有明显影响。
    Abstract The success of graph neural network-based models (GNNs) has significantly advanced recommender systems by effectively modeling users and items as a bipartite, undirected graph. However, many original graph-based works often adopt results from baseline papers without verifying their validity for the specific configuration under analysis. Our work addresses this issue by focusing on the replicability of results. We present a code that successfully replicates results from six popular and recent graph recommendation models (NGCF, DGCF, LightGCN, SGL, UltraGCN, and GFCF) on three common benchmark datasets (Gowalla, Yelp 2018, and Amazon Book). Additionally, we compare these graph models with traditional collaborative filtering models that historically performed well in offline evaluations. Furthermore, we extend our study to two new datasets (Allrecipes and BookCrossing) that lack established setups in existing literature. As the performance on these datasets differs from the previous benchmarks, we analyze the impact of specific dataset characteristics on recommendation accuracy. By investigating the information flow from users' neighborhoods, we aim to identify which models are influenced by intrinsic features in the dataset structure. The code to reproduce our experiments is available at: https://github.com/sisinflab/Graph-RSs-Reproducibility.
    摘要 基于图神经网络(GNN)的模型通过将用户和物品建模为二部无向图,显著推动了推荐系统的发展。然而,许多原始的图推荐工作往往直接沿用基线论文的结果,而不验证这些结果在所分析的具体配置下是否成立。我们的工作聚焦于结果的可复现性:我们提供的代码成功复现了六种流行且较新的图推荐模型(NGCF、DGCF、LightGCN、SGL、UltraGCN、GFCF)在三个常用基准数据集(Gowalla、Yelp 2018、Amazon Book)上的结果。此外,我们将这些图模型与历史上在离线评估中表现良好的传统协同过滤模型进行比较,并把研究扩展到两个在现有文献中缺乏既定实验设置的新数据集(Allrecipes、BookCrossing)。由于这些数据集上的表现与以往基准不同,我们分析了数据集特性对推荐精度的影响,并通过考察来自用户邻域的信息流,识别哪些模型更易受数据集结构内在特征的影响。复现实验的代码见:https://github.com/sisinflab/Graph-RSs-Reproducibility。

Counterfactual Graph Transformer for Traffic Flow Prediction

  • paper_url: http://arxiv.org/abs/2308.00391
  • repo_url: None
  • paper_authors: Ying Yang, Kai Du, Xingyuan Dai, Jianwu Fang
  • for: 提高流量预测的可解释性和可靠性
  • methods: 提出了反事实图 Transformer(Counterfactual Graph Transformer,CGT)模型,并在输入传感器特征的时间维度与图结构上设计扰动掩码生成器,以获得空间与时间上的反事实(counterfactual)解释
  • results: 对三个实际世界公共数据集进行了广泛的实验,并证明了 CGT 可以生成可靠的解释和提高流量预测的可靠性。
    Abstract Traffic flow prediction (TFP) is a fundamental problem of the Intelligent Transportation System (ITS), as it models the latent spatial-temporal dependency of traffic flow for potential congestion prediction. Recent graph-based models with multiple kinds of attention mechanisms have achieved promising performance. However, existing methods for traffic flow prediction tend to inherit the bias pattern from the dataset and lack interpretability. To this end, we propose a Counterfactual Graph Transformer (CGT) model with an instance-level explainer (e.g., finding the important subgraphs) specifically designed for TFP. We design a perturbation mask generator over input sensor features at the time dimension and the graph structure on the graph transformer module to obtain spatial and temporal counterfactual explanations. By searching the optimal perturbation masks on the input data feature and graph structures, we can obtain the concise and dominant data or graph edge links for the subsequent TFP task. After re-training the utilized graph transformer model after counterfactual perturbation, we can obtain improved and interpretable traffic flow prediction. Extensive results on three real-world public datasets show that CGT can produce reliable explanations and is promising for traffic flow prediction.
    摘要 交通流量预测(TFP)是智能交通系统(ITS)的基础问题,它通过建模交通流量中潜在的时空依赖关系来预测潜在拥堵。近来,带有多种注意力机制的图模型取得了可观的效果。然而,现有的交通流量预测方法往往会继承数据集中的偏差模式,且缺乏可解释性。为此,我们提出了带有实例级解释器(例如找出重要子图)的反事实图 Transformer(CGT)模型,专为交通流量预测设计。我们在输入传感器特征的时间维度以及图 Transformer 模块的图结构上设计了扰动掩码生成器,以获得空间与时间上的反事实解释;通过在输入特征与图结构上搜索最优扰动掩码,可以得到对后续预测任务起主导作用的简洁数据或图边连接。在反事实扰动后重新训练所用的图 Transformer 模型,即可得到更好且可解释的交通流量预测。我们在三个真实公开数据集上的大量实验表明,CGT 能够给出可靠的解释,并在交通流量预测上表现出色。

Artificial-Intelligence-Based Triple Phase Shift Modulation for Dual Active Bridge Converter with Minimized Current Stress

  • paper_url: http://arxiv.org/abs/2308.00382
  • repo_url: None
  • paper_authors: Xinze Li, Xin Zhang, Fanfan Lin, Changjiang Sun, Kezhi Mao
  • for: 本研究旨在提出一种基于人工智能的三重移相(TPS)调制策略,以最小化DAB变换器的电流应力。
  • methods: 本研究利用神经网络(NN)与模糊推理系统(FIS),分别解决TPS调制中三个调制变量对电流应力影响的分析难题与调制实现难题,由此提出基于AI的TPS调制策略(AI-TPSM)。
  • results: 实验结果表明,所提出的AI-TPSM策略能够降低DAB变换器的电流应力,并提升TPS调制的精度与可靠性。
    Abstract The dual active bridge (DAB) converter has been popular in many applications for its outstanding power density and bidirectional power transfer capacity. Up to now, triple phase shift (TPS) can be considered as one of the most advanced modulation techniques for DAB converter. It can widen zero voltage switching range and improve power efficiency significantly. Currently, current stress of the DAB converter has been an important performance indicator when TPS modulation is applied for smaller size and higher efficiency. However, to minimize the current stress when the DAB converter is under TPS modulation, two difficulties exist in analysis process and realization process, respectively. Firstly, three degrees of modulation variables in TPS modulation bring challenges to the analysis of current stress in different operating modes. This analysis and deduction process leads to heavy computational burden and also suffers from low accuracy. Secondly, to realize TPS modulation, if a lookup table is adopted after the optimization of modulation variables, modulation performance will be unsatisfactory because of the discrete nature of lookup table. Therefore, an AI-based TPS modulation (AI-TPSM) strategy is proposed in this paper. Neural network (NN) and fuzzy inference system (FIS) are utilized to deal with the two difficulties mentioned above. With the proposed AI-TPSM, the optimization of TPS modulation for minimized current stress will enjoy high degree of automation which can relieve engineers' working burden and improve accuracy. In the end of this paper, the effectiveness of the proposed AI-TPSM has been experimentally verified with a 1 kW prototype.
    摘要 双有源桥(DAB)变换器凭借出色的功率密度与双向功率传输能力,在许多应用中广受欢迎。迄今为止,三重移相(TPS)可以被视为DAB变换器最先进的调制技术之一,它能显著拓宽零电压开关(ZVS)范围并提升效率。在采用TPS调制以追求更小体积和更高效率时,电流应力成为DAB变换器的重要性能指标。然而,要在TPS调制下最小化电流应力,存在分析与实现两方面的困难:其一,TPS调制含有三个调制变量,使不同工作模式下的电流应力分析与推导过程计算量大且精度有限;其二,若在调制变量优化后采用查找表来实现TPS调制,由于查找表的离散性,调制性能难以令人满意。为此,本文提出一种基于人工智能的TPS调制策略(AI-TPSM),利用神经网络(NN)与模糊推理系统(FIS)分别应对上述两个难点。借助AI-TPSM,以最小化电流应力为目标的TPS调制优化具有更高的自动化程度,既减轻工程师的工作负担,又提高了精度。最后,本文在1 kW样机上实验验证了所提AI-TPSM的有效性。

Artificial-Intelligence-Based Hybrid Extended Phase Shift Modulation for the Dual Active Bridge Converter with Full ZVS Range and Optimal Efficiency

  • paper_url: http://arxiv.org/abs/2308.00381
  • repo_url: None
  • paper_authors: Xinze Li, Xin Zhang, Fanfan Lin, Changjiang Sun, Kezhi Mao
  • For: The paper aims to propose an artificial-intelligence-based hybrid extended phase shift (HEPS) modulation for dual active bridge (DAB) converters to achieve optimal efficiency with full zero-voltage switching (ZVS) operation over the entire operating range.* Methods: The HEPS modulation is developed using an automated fashion, which alleviates the cumbersome model building process while maintaining high model accuracy. The paper uses extreme gradient boosting (XGBoost) to build data-driven models of ZVS and efficiency performance, and particle swarm optimization with state-based adaptive velocity limit (PSO-SAVL) to select the best EPS strategy and optimize modulation parameters.* Results: The paper verifies the feasibility of HEPS with 1 kW hardware experiments, achieving optimal efficiency of up to 97.1% and full-range ZVS operation.
    Abstract Dual active bridge (DAB) converter is the key enabler in many popular applications such as wireless charging, electric vehicle and renewable energy. ZVS range and efficiency are two significant performance indicators for DAB converter. To obtain the desired ZVS and efficiency performance, modulation should be carefully designed. Hybrid modulation considers several single modulation strategies to achieve good comprehensive performance. Conventionally, to design a hybrid modulation, harmonic approach or piecewise approach is used, but they suffer from time-consuming model building process and inaccuracy. Therefore, an artificial-intelligence-based hybrid extended phase shift (HEPS) modulation is proposed. Generally, the HEPS modulation is developed in an automated fashion, which alleviates cumbersome model building process while keeping high model accuracy. In HEPS modulation, two EPS strategies are considered to realize optimal efficiency with full ZVS operation over entire operating ranges. Specifically, to build data-driven models of ZVS and efficiency performance, extreme gradient boosting (XGBoost), which is a state-of-the-art ensemble learning algorithm, is adopted. Afterwards, particle swarm optimization with state-based adaptive velocity limit (PSO-SAVL) is utilized to select the best EPS strategy and optimize modulation parameters. With 1 kW hardware experiments, the feasibility of HEPS has been verified, achieving optimal efficiency with maximum of 97.1% and full-range ZVS operation.
    摘要 双活桥(DAB)转换器是许多受欢迎应用程序中的关键启用器,如无线充电、电动车和可再生能源。ZVS范围和效率是DAB转换器的两个重要性能指标。为了实现所需的ZVS和效率性能,模拟应该仔细设计。在本文中,我们提出了人工智能基于的扩展phas shift(HEPS)模ulation,以解决传统的异步模ulation和分割模ulation的缺点。HEPS模ulation通过自动化模型建立过程来缓解繁琐的模型建立过程,同时保持高度准确。在HEPS模ulation中,两种EPS策略被考虑以实现最佳的效率和ZVS操作。具体来说,通过使用极限梯度搜索(XGBoost)算法来建立数据驱动模型,以实现ZVS和效率性能的优化。然后,使用粒子群搜索与状态基于适应限速(PSO-SAVL)算法来选择最佳EPS策略和优化模ulation参数。在1千瓦硬件实验中,我们证明了HEPS的可行性,实现了最佳效率(97.1%)和全范围ZVS操作。

Shape Completion with Prediction of Uncertain Regions

  • paper_url: http://arxiv.org/abs/2308.00377
  • repo_url: https://github.com/dlr-rm/shape-completion
  • paper_authors: Matthias Humt, Dominik Winkelbauer, Ulrich Hillenbrand
  • for: 这个研究是为了解决Shape completion问题,即从partial observation中推算出物体的完整几何构造。
  • methods: 本研究提出了两种新的方法来预测 uncertain regions,一种是通过处理occupancy scores的后处理,另一种是直接预测不确定指标。
  • results: 比较两种新方法和两种已知的方法,新方法在Shape completion和uncertain region prediction中都表现出来了较高的精度,并且可以避免预测的uncertain regions以提高grasps的质量。
    Abstract Shape completion, i.e., predicting the complete geometry of an object from a partial observation, is highly relevant for several downstream tasks, most notably robotic manipulation. When basing planning or prediction of real grasps on object shape reconstruction, an indication of severe geometric uncertainty is indispensable. In particular, there can be an irreducible uncertainty in extended regions about the presence of entire object parts when given ambiguous object views. To treat this important case, we propose two novel methods for predicting such uncertain regions as straightforward extensions of any method for predicting local spatial occupancy, one through postprocessing occupancy scores, the other through direct prediction of an uncertainty indicator. We compare these methods together with two known approaches to probabilistic shape completion. Moreover, we generate a dataset, derived from ShapeNet, of realistically rendered depth images of object views with ground-truth annotations for the uncertain regions. We train on this dataset and test each method in shape completion and prediction of uncertain regions for known and novel object instances and on synthetic and real data. While direct uncertainty prediction is by far the most accurate in the segmentation of uncertain regions, both novel methods outperform the two baselines in shape completion and uncertain region prediction, and avoiding the predicted uncertain regions increases the quality of grasps for all tested methods. Web: https://github.com/DLR-RM/shape-completion
    摘要 形状补全(shape completion),即从部分观测推断物体的完整几何形状,与下游任务(尤其是机器人抓取)高度相关。当抓取规划或预测建立在物体形状重建之上时,必须能够指示出严重的几何不确定性。特别是,当物体视角存在歧义时,某些较大区域中是否存在整个物体部件可能存在无法消除的不确定性。针对这一重要情形,我们提出两种预测此类不确定区域的新方法,它们都可以作为任意局部占据(occupancy)预测方法的直接扩展:一种通过对占据得分做后处理,另一种直接预测不确定性指标。我们将这两种方法与两种已知的概率形状补全方法进行比较。此外,我们基于ShapeNet构建了一个数据集,包含真实感渲染的物体视角深度图像,并带有不确定区域的真值标注。我们在该数据集上训练,并分别在已知与新物体实例、合成与真实数据上测试各方法的形状补全与不确定区域预测能力。虽然直接预测不确定性在不确定区域分割上精度最高,但两种新方法在形状补全与不确定区域预测上均优于两种基线;并且对所有被测试的方法而言,避开预测出的不确定区域都能提升抓取质量。代码与数据见:https://github.com/DLR-RM/shape-completion。
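下面是对"基于占据得分后处理"这一类不确定区域预测思路的最小示意:把占据概率接近 0.5 的体素标记为不确定区域,供抓取规划时规避。占据网格为随机占位数据,阈值取值为假设,并非论文的官方实现。

```python
import numpy as np

# 示意:对占据(occupancy)得分做后处理,标记"不确定区域"。
# 占据网格为随机占位数据,阈值 0.3/0.7 为假设值。

rng = np.random.default_rng(0)
occupancy = rng.random((32, 32, 32))          # 假设为网络输出的占据概率体素网格

occupied  = occupancy >= 0.7                  # 高置信占据
free      = occupancy <= 0.3                  # 高置信空闲
uncertain = ~(occupied | free)                # 介于两者之间 -> 不确定区域

print("occupied voxels :", int(occupied.sum()))
print("free voxels     :", int(free.sum()))
print("uncertain voxels:", int(uncertain.sum()))

# 抓取规划时可直接避开不确定体素,例如只在高置信占据的表面上采样接触点。
```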

Fountain – an intelligent contextual assistant combining knowledge representation and language models for manufacturing risk identification

  • paper_url: http://arxiv.org/abs/2308.00364
  • repo_url: None
  • paper_authors: Saurabh Kumar, Daniel Fuchs, Klaus Spindler
  • for: 本研究旨在提供一种结合语言模型与知识表示的情境助手(Fountain),嵌入偏差(deviation)管理流程,帮助工程师及早识别因产品设计或制造工艺变更而带来的风险。
  • methods: 本研究使用针对领域语义相似度微调的语言模型,以及由物料清单(BOM)、失效模式与影响分析(FMEA)和客户既往失效报告构建的属性图知识表示。
  • results: 研究表明,所提方法给出的建议具有可解释性和一致性;模型适配只需制造企业工程团队普遍具备的中等计算资源即可完成,推理也可以在仅有CPU的标准实例上运行,便于与现有应用集成部署。
    Abstract Deviations from the approved design or processes during mass production can lead to unforeseen risks. However, these changes are sometimes necessary due to changes in the product design characteristics or an adaptation in the manufacturing process. A major challenge is to identify these risks early in the workflow so that failures leading to warranty claims can be avoided. We developed Fountain as a contextual assistant integrated in the deviation management workflow that helps in identifying the risks based on the description of the existing design and process criteria and the proposed deviation. In the manufacturing context, it is important that the assistant provides recommendations that are explainable and consistent. We achieve this through a combination of the following two components 1) language models finetuned for domain specific semantic similarity and, 2) knowledge representation in the form of a property graph derived from the bill of materials, Failure Modes and Effect Analysis (FMEA) and prior failures reported by customers. Here, we present the nuances of selecting and adapting pretrained language models for an engineering domain, continuous model updates based on user interaction with the contextual assistant and creating the causal chain for explainable recommendations based on the knowledge representation. Additionally, we demonstrate that the model adaptation is feasible using moderate computational infrastructure already available to most engineering teams in manufacturing organizations and inference can be performed on standard CPU only instances for integration with existing applications making these methods easily deployable.
    摘要 量产过程中偏离已批准的设计或工艺可能带来难以预见的风险;然而,由于产品设计特性变化或制造工艺调整,这类变更有时又是必要的。主要挑战在于尽早在流程中识别这些风险,以避免引发保修索赔的失效。我们开发了 Fountain——一个嵌入偏差管理流程的情境助手,它根据既有设计与工艺准则的描述以及所提议的偏差,帮助识别风险。在制造场景中,助手给出的建议必须可解释且一致。我们通过以下两个组件的组合实现这一点:1)针对领域语义相似度微调的语言模型;2)由物料清单、失效模式与影响分析(FMEA)及客户既往失效报告导出的属性图知识表示。本文介绍了为工程领域选择和适配预训练语言模型的细节、基于用户与情境助手交互的持续模型更新,以及基于知识表示为可解释建议构建因果链的方法。此外,我们证明模型适配只需大多数制造企业工程团队已有的中等计算基础设施即可完成,推理也可在仅有CPU的标准实例上运行以便与现有应用集成,因此这些方法易于部署。
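下面给出一段示意代码,演示"用句向量做语义相似检索"这一步:给定新的偏差描述,从 FMEA/历史失效条目中检索最相近的风险记录。其中 embed() 为假设的占位函数(实际应调用领域微调后的语言模型),条目内容亦为虚构示例,与 Fountain 的真实数据和接口无关。

```python
import numpy as np

# 示意:句向量 + 余弦相似度的语义检索。embed() 为占位函数,知识条目为虚构示例。

def embed(text: str) -> np.ndarray:
    # 占位实现:基于字符串哈希的伪向量,仅保证脚本可运行
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.normal(size=128)
    return v / np.linalg.norm(v)

knowledge_base = [
    "FMEA: 焊点虚焊导致接触电阻升高",
    "客户失效: 外壳材料变更后出现开裂",
    "FMEA: 涂层厚度偏差引起耐腐蚀性下降",
]
kb_vecs = np.stack([embed(t) for t in knowledge_base])

deviation = "提议将外壳塑料牌号更换为低成本替代料"
query = embed(deviation)

scores = kb_vecs @ query                      # 归一化向量的点积即余弦相似度
for idx in np.argsort(-scores)[:2]:
    print(f"{scores[idx]:+.3f}  {knowledge_base[idx]}")
```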

MetaGPT: Meta Programming for Multi-Agent Collaborative Framework

  • paper_url: http://arxiv.org/abs/2308.00352
  • repo_url: https://github.com/geekan/metagpt
  • paper_authors: Sirui Hong, Xiawu Zheng, Jonathan Chen, Yuheng Cheng, Jinlin Wang, Ceyao Zhang, Zili Wang, Steven Ka Shing Yau, Zijuan Lin, Liyang Zhou, Chenyu Ran, Lingfeng Xiao, Chenglin Wu
  • for: 这个论文的目的是探讨如何使用大型自然语言模型(LLM)驱动多代理人工作,以解决复杂的多代理人问题。
  • methods: 这个论文使用了一个创新的框架,名为MetaGPT,它将人类工作流程组合入大型自然语言模型中,以提高多代理人协作的有效性。
  • results: 实验结果显示,MetaGPT可以生成更一致和正确的解决方案,比较先进的对话式多代理人系统。这显示了将人类专业知识 integrate 到多代理人系统中的潜在可能性,并创造了复杂的现实世界挑战的新机会。
    Abstract Recently, remarkable progress has been made in automated task-solving through the use of multi-agent driven by large language models (LLMs). However, existing LLM-based multi-agent works primarily focus on solving simple dialogue tasks, and complex tasks are rarely studied, mainly due to the LLM hallucination problem. This type of hallucination becomes cascading when naively chaining multiple intelligent agents, resulting in a failure to effectively address complex problems. Therefore, we introduce MetaGPT, an innovative framework that incorporates efficient human workflows as a meta programming approach into LLM-based multi-agent collaboration. Specifically, MetaGPT encodes Standardized Operating Procedures (SOPs) into prompts to enhance structured coordination. Subsequently, it mandates modular outputs, empowering agents with domain expertise comparable to human professionals, to validate outputs and minimize compounded errors. In this way, MetaGPT leverages the assembly line paradigm to assign diverse roles to various agents, thereby establishing a framework that can effectively and cohesively deconstruct complex multi-agent collaborative problems. Our experiments on collaborative software engineering benchmarks demonstrate that MetaGPT generates more coherent and correct solutions compared to existing chat-based multi-agent systems. This highlights the potential of integrating human domain knowledge into multi-agent systems, thereby creating new opportunities to tackle complex real-world challenges. The GitHub repository of this project is publicly available on:https://github.com/geekan/MetaGPT.
    摘要 近来,由大语言模型(LLM)驱动的多智能体在自动任务求解方面取得了显著进展。然而,现有基于 LLM 的多智能体工作主要集中在简单的对话任务上,复杂任务鲜有研究,主要原因是 LLM 的幻觉问题:当简单地把多个智能体串联起来时,这类幻觉会层层级联放大,导致无法有效解决复杂问题。为此,我们提出 MetaGPT——一个把高效的人类工作流程作为元编程方法融入 LLM 多智能体协作的创新框架。具体而言,MetaGPT 将标准作业流程(SOP)编码进提示词,以强化结构化协同;并要求各智能体产出模块化结果,使其像具备领域专长的人类专家一样校验输出、减少误差的累积。借助这种流水线范式,MetaGPT 为不同智能体分配多样化角色,从而建立一个能够有效、连贯地拆解复杂多智能体协作问题的框架。我们在协作式软件工程基准上的实验表明,与现有基于对话的多智能体系统相比,MetaGPT 生成的解决方案更连贯、更正确。这凸显了将人类领域知识融入多智能体系统的潜力,为应对复杂的现实挑战开辟了新的机会。本项目的 GitHub 仓库公开于:https://github.com/geekan/MetaGPT。
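下面是一段把 SOP 编码进提示词、按流水线串联多个角色的最小示意(Python)。call_llm() 为占位函数,角色划分与 SOP 文本均为虚构示例,仅用于说明"SOP 提示 + 模块化产出 + 逐级传递"的组织方式,并非 MetaGPT 仓库中的实际代码。

```python
# 示意:把标准作业流程(SOP)编码进提示词,按流水线依次交给
# 产品经理 -> 架构师 -> 工程师 三个角色,每个角色产出结构化结果供下一角色使用。

def call_llm(prompt: str) -> str:
    # 占位:实际应调用大语言模型接口;这里仅回显提示词片段以保证可运行
    return f"[LLM 输出 for: {prompt[:40]}...]"

SOPS = {
    "ProductManager": "输出: 用户故事列表与验收标准,使用编号列表。",
    "Architect":      "输入: 用户故事。输出: 模块划分与接口定义(函数签名)。",
    "Engineer":       "输入: 接口定义。输出: 每个接口的实现代码与单元测试。",
}

def run_pipeline(requirement: str) -> dict:
    artifacts, context = {}, requirement
    for role, sop in SOPS.items():
        prompt = (f"你是{role}。严格遵循以下 SOP:\n{sop}\n"
                  f"上游产物:\n{context}\n请只输出 SOP 要求的结构化内容。")
        output = call_llm(prompt)
        artifacts[role] = output          # 模块化产出,便于下一角色与人工校验
        context = output                  # 流水线:当前产出作为下一角色输入
    return artifacts

if __name__ == "__main__":
    for role, out in run_pipeline("开发一个命令行版 2048 小游戏").items():
        print(role, "->", out)
```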

Learning Green’s Function Efficiently Using Low-Rank Approximations

  • paper_url: http://arxiv.org/abs/2308.00350
  • repo_url: https://github.com/kishanwn/decgreennet
  • paper_authors: Kishan Wimalawarne, Taiji Suzuki, Sophie Langer
  • for: 用深度学习模型求解不同类型的偏微分方程(PDE)
  • methods: 通过低秩分解学习格林函数(Green's function),避免重复进行计算代价高昂的蒙特卡洛积分近似
  • results: 在保持与PINNs和MOD-Net相当精度的同时,显著缩短了计算时间
    Abstract Learning the Green's function using deep learning models enables to solve different classes of partial differential equations. A practical limitation of using deep learning for the Green's function is the repeated computationally expensive Monte-Carlo integral approximations. We propose to learn the Green's function by low-rank decomposition, which results in a novel architecture to remove redundant computations by separate learning with domain data for evaluation and Monte-Carlo samples for integral approximation. Using experiments we show that the proposed method improves computational time compared to MOD-Net while achieving comparable accuracy compared to both PINNs and MOD-Net.
    摘要 利用深度学习模型学习格林函数,可以求解多类偏微分方程。但在实际中,基于深度学习的格林函数方法受限于需要反复进行计算代价高昂的蒙特卡洛积分近似。我们提出通过低秩分解来学习格林函数,由此得到一种新的网络架构:把用于评估的领域数据与用于积分近似的蒙特卡洛样本分开学习,从而消除冗余计算。实验表明,所提方法在保持与 PINNs 和 MOD-Net 相当精度的同时,显著缩短了计算时间。
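下面用一段 NumPy 示意说明低秩分解为何能消除重复的蒙特卡洛积分:当 G(x,y) ≈ Σ_r p_r(x) q_r(y) 时,u(x) = ∫ G(x,y) f(y) dy ≈ Σ_r p_r(x) · (1/N) Σ_i q_r(y_i) f(y_i),即对每个 q_r 的积分只需估计一次,与查询点 x 解耦。这里的低秩因子为手写示例函数(真实方法中由网络学习得到),并非 DecGreenNet 的实现。

```python
import numpy as np

# 示意:低秩分解把蒙特卡洛积分与查询点解耦。低秩因子 p_r/q_r 为手写示例。

rng = np.random.default_rng(0)
R = 4                                              # 低秩秩数(假设)

def p(x):   # p_r(x), 形状 (len(x), R)
    return np.stack([np.sin((r + 1) * np.pi * x) for r in range(R)], axis=-1)

def q(y):   # q_r(y), 形状 (len(y), R)
    return np.stack([np.sin((r + 1) * np.pi * y) for r in range(R)], axis=-1)

f = lambda y: np.ones_like(y)                      # 右端项 f(y)=1(示例)

# 一次性的蒙特卡洛积分:I_r ≈ (1/N) Σ_i q_r(y_i) f(y_i),N 个样本只算一遍
y_samples = rng.uniform(0.0, 1.0, size=4096)
I = (q(y_samples) * f(y_samples)[:, None]).mean(axis=0)   # 形状 (R,)

# 之后任意查询点的解只是 R 项的线性组合,无需重复积分
x_query = np.linspace(0.0, 1.0, 5)
u = p(x_query) @ I
print("u(x) at", x_query.round(2), "=", u.round(4))
```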

Dynamic ensemble selection based on Deep Neural Network Uncertainty Estimation for Adversarial Robustness

  • paper_url: http://arxiv.org/abs/2308.00346
  • repo_url: None
  • paper_authors: Ruoxi Qin, Linyuan Wang, Xuehui Du, Xingyuan Chen, Bin Yan
  • for: 提升模型的对抗鲁棒性与稳定性
  • methods: 动态集成选择技术,结合 Dirichlet 分布先验与子模型多样性约束
  • results: 在不损害准确率的前提下,提升了模型的对抗鲁棒性与稳定性
    Abstract The deep neural network has attained significant efficiency in image recognition. However, it has vulnerable recognition robustness under extensive data uncertainty in practical applications. The uncertainty is attributed to the inevitable ambient noise and, more importantly, the possible adversarial attack. Dynamic methods can effectively improve the defense initiative in the arms race of attack and defense of adversarial examples. Different from the previous dynamic method depend on input or decision, this work explore the dynamic attributes in model level through dynamic ensemble selection technology to further protect the model from white-box attacks and improve the robustness. Specifically, in training phase the Dirichlet distribution is apply as prior of sub-models' predictive distribution, and the diversity constraint in parameter space is introduced under the lightweight sub-models to construct alternative ensembel model spaces. In test phase, the certain sub-models are dynamically selected based on their rank of uncertainty value for the final prediction to ensure the majority accurate principle in ensemble robustness and accuracy. Compared with the previous dynamic method and staic adversarial traning model, the presented approach can achieve significant robustness results without damaging accuracy by combining dynamics and diversity property.
    摘要 深度神经网络在图像识别上已取得显著成效,但在实际应用中面对广泛的数据不确定性时,其识别鲁棒性较为脆弱。这种不确定性既来自难以避免的环境噪声,更来自可能存在的对抗攻击。在对抗样本的攻防博弈中,动态方法能够有效增强防御的主动性。不同于以往依赖输入或决策层面的动态方法,本工作通过动态集成选择技术探索模型层面的动态属性,以进一步保护模型免受白盒攻击并提升鲁棒性。具体而言,在训练阶段以 Dirichlet 分布作为子模型预测分布的先验,并在轻量级子模型的参数空间中引入多样性约束,以构建可供选择的集成模型空间;在测试阶段,依据各子模型不确定性数值的排序动态选出若干子模型用于最终预测,以保证集成鲁棒性与准确性中的多数正确原则。与以往的动态方法和静态对抗训练模型相比,所提方法结合了动态性与多样性,在不损害准确率的情况下取得了显著的鲁棒性提升。
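下面给出"测试阶段按不确定性动态选择子模型"这一步的最小示意:用预测熵作为不确定性度量的简化替代,选出最确定的 k 个子模型做集成预测。子模型的预测概率为随机占位数据,论文中基于 Dirichlet 先验与多样性约束的训练部分未包含在内。

```python
import numpy as np

# 示意:按不确定性(这里用预测熵代替)对子模型排序并动态选取进行集成。

rng = np.random.default_rng(0)
n_models, n_classes, top_k = 6, 10, 3

# 每个子模型对同一张输入图像的类别概率(随机占位)
probs = rng.dirichlet(alpha=np.ones(n_classes), size=n_models)   # (n_models, n_classes)

entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)           # 每个子模型的预测熵
selected = np.argsort(entropy)[:top_k]                           # 熵最低(最确定)的 k 个

ensemble_prob = probs[selected].mean(axis=0)                     # 所选子模型的平均预测
print("selected sub-models:", selected.tolist())
print("ensemble prediction:", int(ensemble_prob.argmax()))
```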

Kidnapping Deep Learning-based Multirotors using Optimized Flying Adversarial Patches

  • paper_url: http://arxiv.org/abs/2308.00344
  • repo_url: https://github.com/imrclab/flying_adversarial_patch
  • paper_authors: Pia Hanfeld, Khaled Wahba, Marina M. -C. Höhne, Michael Bussmann, Wolfgang Hönig
  • for: 本文研究自主飞行机器人(如多旋翼无人机)所依赖的深度学习模型如何受到攻击。
  • methods: 本文提出"飞行对抗补丁"(flying adversarial patches):将计算得到的对抗补丁搭载在攻击方飞行机器人上,并比较三种同时优化多个补丁及其在输入图像中位置的方法,同时设计了一种新的攻击策略。
  • results: 结果表明,这些攻击方法能够操纵受害多旋翼机的神经网络预测,且方法随对抗补丁数量的增加扩展良好;在两架机器人的真实飞行实验中,利用计算出的对抗补丁成功"劫持"了本应跟随人类的机器人。
    Abstract Autonomous flying robots, such as multirotors, often rely on deep learning models that makes predictions based on a camera image, e.g. for pose estimation. These models can predict surprising results if applied to input images outside the training domain. This fault can be exploited by adversarial attacks, for example, by computing small images, so-called adversarial patches, that can be placed in the environment to manipulate the neural network's prediction. We introduce flying adversarial patches, where multiple images are mounted on at least one other flying robot and therefore can be placed anywhere in the field of view of a victim multirotor. By introducing the attacker robots, the system is extended to an adversarial multi-robot system. For an effective attack, we compare three methods that simultaneously optimize multiple adversarial patches and their position in the input image. We show that our methods scale well with the number of adversarial patches. Moreover, we demonstrate physical flights with two robots, where we employ a novel attack policy that uses the computed adversarial patches to kidnap a robot that was supposed to follow a human.
    摘要 自主飞行机器人(如多旋翼无人机)通常依赖以相机图像为输入的深度学习模型,例如用于位姿估计。当输入图像超出训练分布时,这些模型可能给出出人意料的结果。这一缺陷可被对抗攻击利用,例如计算出可放置于环境中的小尺寸图像(即对抗补丁),以操纵神经网络的预测。我们提出"飞行对抗补丁":将多张对抗补丁图像搭载在至少一架攻击方飞行机器人上,因而可以出现在受害多旋翼机视野内的任意位置。引入攻击机器人后,系统便扩展为一个对抗性的多机器人系统。为实现有效攻击,我们比较了三种同时优化多个对抗补丁及其在输入图像中位置的方法,结果显示这些方法随补丁数量增加扩展良好。此外,我们在两架机器人的真实飞行中演示了一种新的攻击策略:利用计算出的对抗补丁"劫持"了一架本应跟随人类的机器人。

Monitoring Algorithmic Fairness under Partial Observations

  • paper_url: http://arxiv.org/abs/2308.00341
  • repo_url: None
  • paper_authors: Thomas A. Henzinger, Konstantin Kueffner, Kaushik Mallik
  • for: 本文旨在对已部署系统的算法公平性进行运行时监控,以确保AI与机器学习软件在决策时保持公平、无偏见。
  • methods: 以往的运行时验证技术假设被监控系统的状态完全可观测,且只能监控以不同事件概率的算术表达式形式给出的公平性性质;本文将公平性监控扩展到部分可观测马尔可夫链(POMC),并支持含有事件序列上数值函数期望值算术表达式的性质规约。
  • results: 在底层POMC非周期、从平稳分布出发且混合时间上界已知的假设下,监控器只需观测单次长程运行,即可对被监控系统所有可能执行的整体分布给出该性质的PAC估计,并在每次新观测后更新系统公平或有偏程度的估计。
    Abstract As AI and machine-learned software are used increasingly for making decisions that affect humans, it is imperative that they remain fair and unbiased in their decisions. To complement design-time bias mitigation measures, runtime verification techniques have been introduced recently to monitor the algorithmic fairness of deployed systems. Previous monitoring techniques assume full observability of the states of the (unknown) monitored system. Moreover, they can monitor only fairness properties that are specified as arithmetic expressions over the probabilities of different events. In this work, we extend fairness monitoring to systems modeled as partially observed Markov chains (POMC), and to specifications containing arithmetic expressions over the expected values of numerical functions on event sequences. The only assumptions we make are that the underlying POMC is aperiodic and starts in the stationary distribution, with a bound on its mixing time being known. These assumptions enable us to estimate a given property for the entire distribution of possible executions of the monitored POMC, by observing only a single execution. Our monitors observe a long run of the system and, after each new observation, output updated PAC-estimates of how fair or biased the system is. The monitors are computationally lightweight and, using a prototype implementation, we demonstrate their effectiveness on several real-world examples.
    摘要 随着AI与机器学习软件越来越多地用于影响人类的决策,确保其决策的公平与无偏至关重要。作为设计阶段偏差缓解措施的补充,近来已有运行时验证技术被用于监控已部署系统的算法公平性。以往的监控技术假设被监控(且未知)系统的状态完全可观测,并且只能监控以不同事件概率的算术表达式形式给出的公平性性质。本文将公平性监控扩展到以部分可观测马尔可夫链(POMC)建模的系统,以及含有事件序列上数值函数期望值算术表达式的性质规约。我们仅假设底层POMC非周期、从平稳分布出发,且其混合时间存在已知上界。在这些假设下,只需观测被监控POMC的单次执行,即可对其所有可能执行的整体分布估计给定性质。我们的监控器观测系统的长程运行,并在每次新观测后输出关于系统公平或有偏程度的最新PAC估计。监控器计算开销很小;借助原型实现,我们在若干真实案例上展示了其有效性。

Target Search and Navigation in Heterogeneous Robot Systems with Deep Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2308.00331
  • repo_url: None
  • paper_authors: Yun Chen, Jiaping Xiao
  • for: 研究面向搜索与导航任务的异构多机器人协作系统,以提升任务效率。
  • methods: 使用深度强化学习算法学习策略,并引入多阶段强化学习框架与好奇心(curiosity)模块,鼓励智能体探索未访问过的环境。
  • results: 仿真实验表明,该框架能够训练异构机器人系统在目标位置未知的情况下完成搜索与导航,而现有基线难以做到,同时还加快了训练速度。
    Abstract Collaborative heterogeneous robot systems can greatly improve the efficiency of target search and navigation tasks. In this paper, we design a heterogeneous robot system consisting of a UAV and a UGV for search and rescue missions in unknown environments. The system is able to search for targets and navigate to them in a maze-like mine environment with the policies learned through deep reinforcement learning algorithms. During the training process, if two robots are trained simultaneously, the rewards related to their collaboration may not be properly obtained. Hence, we introduce a multi-stage reinforcement learning framework and a curiosity module to encourage agents to explore unvisited environments. Experiments in simulation environments show that our framework can train the heterogeneous robot system to achieve the search and navigation with unknown target locations while existing baselines may not, and accelerate the training speed.
    摘要 异构多机器人协作系统能够大幅提升目标搜索与导航任务的效率。本文设计了一个由无人机(UAV)与无人地面车(UGV)组成的异构机器人系统,用于未知环境中的搜救任务。借助深度强化学习算法学到的策略,该系统能够在迷宫式矿井环境中搜索目标并导航至目标。在训练过程中,若两个机器人同时训练,与协作相关的奖励可能无法被正确获得;因此我们引入多阶段强化学习框架与好奇心模块,鼓励智能体探索未访问过的环境。仿真实验表明,我们的框架能够训练该异构机器人系统在目标位置未知时完成搜索与导航,而现有基线难以做到,并且加快了训练速度。
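下面附一段示意代码,说明"好奇心模块"的一种常见实现思路:用前向模型预测下一状态特征,以预测误差作为内在奖励鼓励探索。网络结构、维度与权重系数均为假设,并非论文中多机器人框架的实际实现。

```python
import torch
import torch.nn as nn

# 示意:前向模型的预测误差作为内在奖励(好奇心),误差越大说明环境越陌生。

class ForwardModel(nn.Module):
    def __init__(self, state_dim=16, action_dim=4, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, state_dim),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

def curiosity_reward(model, state, action, next_state, eta=0.1):
    pred = model(state, action)
    return eta * (pred - next_state).pow(2).mean(dim=-1)   # 每个样本的内在奖励

if __name__ == "__main__":
    model = ForwardModel()
    s, a, s_next = torch.randn(8, 16), torch.randn(8, 4), torch.randn(8, 16)
    r_int = curiosity_reward(model, s, a, s_next)
    r_ext = torch.zeros(8)                                  # 稀疏外在奖励(占位)
    total = r_ext + r_int.detach()                          # 内在+外在奖励用于 RL 更新
    print("intrinsic reward:", r_int.detach().numpy().round(4))
```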

Threshold-aware Learning to Generate Feasible Solutions for Mixed Integer Programs

  • paper_url: http://arxiv.org/abs/2308.00327
  • repo_url: None
  • paper_authors: Taehyun Yoon, Jinwon Choi, Hyokun Yun, Sungbin Lim
  • for: 在有限时间内为组合优化(CO)问题寻找高质量的可行解。
  • methods: 以 Neural Diving(ND)这一基于机器学习的方法为基础,为混合整数规划(MIP)生成部分离散变量赋值;并提出一种后处理方法和一种基于学习的方法来优化变量赋值比例(覆盖率),即联合学习收缩覆盖率搜索空间并在该空间中预测覆盖率,以弥合机器学习目标与MIP目标之间的差距。
  • results: 实验表明,学习一个深度神经网络来估计覆盖率、进而寻找高质量可行解,可在 NeurIPS ML4CO 数据集上取得最先进的性能;特别是在工作量分配(workload apportionment)数据集上,在一分钟时限内取得 0.45% 的最优性差距,相比 SCIP 提升十倍。
    Abstract Finding a high-quality feasible solution to a combinatorial optimization (CO) problem in a limited time is challenging due to its discrete nature. Recently, there has been an increasing number of machine learning (ML) methods for addressing CO problems. Neural diving (ND) is one of the learning-based approaches to generating partial discrete variable assignments in Mixed Integer Programs (MIP), a framework for modeling CO problems. However, a major drawback of ND is a large discrepancy between the ML and MIP objectives, i.e., variable value classification accuracy over primal bound. Our study investigates that a specific range of variable assignment rates (coverage) yields high-quality feasible solutions, where we suggest optimizing the coverage bridges the gap between the learning and MIP objectives. Consequently, we introduce a post-hoc method and a learning-based approach for optimizing the coverage. A key idea of our approach is to jointly learn to restrict the coverage search space and to predict the coverage in the learned search space. Experimental results demonstrate that learning a deep neural network to estimate the coverage for finding high-quality feasible solutions achieves state-of-the-art performance in NeurIPS ML4CO datasets. In particular, our method shows outstanding performance in the workload apportionment dataset, achieving the optimality gap of 0.45%, a ten-fold improvement over SCIP within the one-minute time limit.
    摘要 由于组合优化(CO)问题的离散特性,在有限时间内找到高质量可行解颇具挑战。近来,用于求解CO问题的机器学习(ML)方法日益增多。Neural Diving(ND)是一种基于学习的方法,用于在混合整数规划(MIP,即CO问题的一种建模框架)中生成部分离散变量赋值。然而,ND的一个主要缺点在于ML目标与MIP目标之间差距较大,即以变量取值分类准确率衡量的目标与原始界(primal bound)并不一致。我们的研究发现,特定范围的变量赋值比例(覆盖率)能带来高质量的可行解,因此建议通过优化覆盖率来弥合学习目标与MIP目标之间的差距。据此,我们提出了一种后处理方法和一种基于学习的覆盖率优化方法,其核心思想是联合学习收缩覆盖率搜索空间,并在学到的搜索空间中预测覆盖率。实验结果表明,学习一个深度神经网络来估计覆盖率、进而寻找高质量可行解,可在 NeurIPS ML4CO 数据集上取得最先进的性能;特别是在工作量分配数据集上取得 0.45% 的最优性差距,在一分钟时限内相比 SCIP 提升十倍。
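下面用一段简单的示意代码说明"覆盖率(coverage)"在这类方法中的操作含义:按预测置信度只固定最有把握的一部分二值变量,其余留给 MIP 求解器。预测概率为随机占位,未调用真实求解器,覆盖率取值仅作演示。

```python
import numpy as np

# 示意:按置信度固定 coverage 比例的二值变量,其余交给 MIP 求解器。

rng = np.random.default_rng(0)
n_vars = 20
pred_prob = rng.random(n_vars)            # 模型预测每个二值变量取 1 的概率(占位)
coverage = 0.4                            # 论文中需要优化的覆盖率,这里取固定值示意

confidence = np.abs(pred_prob - 0.5)      # 离 0.5 越远越有把握
n_fix = int(round(coverage * n_vars))
fix_idx = np.argsort(-confidence)[:n_fix]

partial_assignment = {int(i): int(pred_prob[i] > 0.5) for i in fix_idx}
free_vars = sorted(set(range(n_vars)) - set(partial_assignment))

print("fixed vars :", partial_assignment)      # 作为子 MIP 的附加约束
print("left to MIP:", free_vars)
```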

Choir Transformer: Generating Polyphonic Music with Relative Attention on Transformer

  • paper_url: http://arxiv.org/abs/2308.02531
  • repo_url: None
  • paper_authors: Jiuyang Zhou, Hong Zhu, Xingping Wang
  • for: 提出一种基于 Transformer 的多声部(复调)音乐生成模型,以更好地建模音乐的结构与音符间的关系。
  • methods: 使用相对位置注意力来更好地建模长距离音符之间的关系,并提出一种适合多声部音乐生成的音乐表示方式。
  • results: 实验表明,该模型的准确率超越了此前 4.06% 的最优水平,并且可以根据指定输入调整生成的旋律与节奏,支持民乐、流行等不同风格。
    Abstract Polyphonic music generation is still a challenge direction due to its correct between generating melody and harmony. Most of the previous studies used RNN-based models. However, the RNN-based models are hard to establish the relationship between long-distance notes. In this paper, we propose a polyphonic music generation neural network named Choir Transformer[ https://github.com/Zjy0401/choir-transformer], with relative positional attention to better model the structure of music. We also proposed a music representation suitable for polyphonic music generation. The performance of Choir Transformer surpasses the previous state-of-the-art accuracy of 4.06%. We also measures the harmony metrics of polyphonic music. Experiments show that the harmony metrics are close to the music of Bach. In practical application, the generated melody and rhythm can be adjusted according to the specified input, with different styles of music like folk music or pop music and so on.
    摘要 复调(多声部)音乐生成仍是一个具有挑战性的方向,难点在于同时正确地生成旋律与和声。以往研究多采用基于RNN的模型,但RNN模型难以建立长距离音符之间的关系。本文提出一种复调音乐生成神经网络 Choir Transformer[https://github.com/Zjy0401/choir-transformer],利用相对位置注意力更好地建模音乐结构,并提出一种适合复调音乐生成的音乐表示。Choir Transformer 的表现超越了此前 4.06% 的最优准确率。我们还度量了复调音乐的和声指标,实验显示其与巴赫作品的和声指标十分接近。在实际应用中,生成的旋律与节奏可按指定输入进行调整,支持民乐、流行等不同风格。
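下面给出单头自注意力加"相对位置偏置"的最小 NumPy 示意:在注意力得分中加入只依赖 i−j 的偏置项,从而更容易刻画长距离音符间的关系。权重均为随机初始化的占位,并非 Choir Transformer 的实际代码。

```python
import numpy as np

# 示意:单头自注意力 + 相对位置偏置。所有权重随机初始化,仅演示机制。

rng = np.random.default_rng(0)
seq_len, d_model = 8, 16

x   = rng.normal(size=(seq_len, d_model))            # 输入序列(如音符嵌入)
W_q = rng.normal(size=(d_model, d_model)) / np.sqrt(d_model)
W_k = rng.normal(size=(d_model, d_model)) / np.sqrt(d_model)
W_v = rng.normal(size=(d_model, d_model)) / np.sqrt(d_model)
rel_bias = rng.normal(scale=0.1, size=2 * seq_len - 1)   # 每个相对距离一个偏置

Q, K, V = x @ W_q, x @ W_k, x @ W_v
logits = Q @ K.T / np.sqrt(d_model)                   # 内容项

idx = np.arange(seq_len)
rel = idx[:, None] - idx[None, :] + (seq_len - 1)     # 把 i-j 映射到 [0, 2L-2]
logits = logits + rel_bias[rel]                       # 相对位置项

attn = np.exp(logits - logits.max(axis=-1, keepdims=True))
attn = attn / attn.sum(axis=-1, keepdims=True)
out = attn @ V
print("attention output shape:", out.shape)
```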

Pixel to policy: DQN Encoders for within & cross-game reinforcement learning

  • paper_url: http://arxiv.org/abs/2308.00318
  • repo_url: None
  • paper_authors: Ashrya Agrawal, Priyanshi Shah, Sourabh Prakash
  • for: 本研究探讨基于强化学习的游戏智能体开发,以及如何通过知识迁移提升RL性能。
  • methods: 本研究使用基于深度强化学习的DQN模型,在不同游戏环境中训练;比较了从零训练与多种迁移学习方式的RL模型,并尝试用多个游戏环境训练通用游戏智能体,以及对预训练的DQN编码器做迁移学习后在同一或不同游戏上继续训练。
  • results: 结果显示,DQN模型仅用 2 万个回合便取得 46.16 的平均回合奖励,甚至超过人类水平,而所需回合数远低于 DeepMind 的 100 万回合;在 Assault 和 Space Invader 环境中分别取得 533.42 与 402.17 的平均奖励,在这些具有挑战性的环境中表现十分出色。
    Abstract Reinforcement Learning can be applied to various tasks, and environments. Many of these environments have a similar shared structure, which can be exploited to improve RL performance on other tasks. Transfer learning can be used to take advantage of this shared structure, by learning policies that are transferable across different tasks and environments and can lead to more efficient learning as well as improved performance on a wide range of tasks. This work explores as well as compares the performance between RL models being trained from the scratch and on different approaches of transfer learning. Additionally, the study explores the performance of a model trained on multiple game environments, with the goal of developing a universal game-playing agent as well as transfer learning a pre-trained encoder using DQN, and training it on the same game or a different game. Our DQN model achieves a mean episode reward of 46.16 which even beats the human-level performance with merely 20k episodes which is significantly lower than deepmind's 1M episodes. The achieved mean rewards of 533.42 and 402.17 on the Assault and Space Invader environments respectively, represent noteworthy performance on these challenging environments.
    摘要 强化学习可应用于多种任务与环境。许多环境具有相似的共享结构,可以利用这种结构提升RL在其他任务上的表现。迁移学习能够利用这种共享结构,学习可在不同任务与环境间迁移的策略,从而带来更高效的学习以及在广泛任务上的性能提升。本工作探讨并比较了从零训练的RL模型与多种迁移学习方式的性能;此外,还研究了在多个游戏环境上训练同一模型、以构建通用游戏智能体的做法,以及对预训练的DQN编码器进行迁移学习后,在同一或不同游戏上继续训练的效果。我们的DQN模型取得 46.16 的平均回合奖励,仅用 2 万个回合便超过了人类水平,远低于 DeepMind 所用的 100 万回合。在 Assault 和 Space Invader 环境中分别取得 533.42 与 402.17 的平均奖励,在这些具有挑战性的环境中表现十分亮眼。
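下面是"冻结预训练编码器、只训练新 Q 值头"这种跨游戏迁移做法的最小示意(PyTorch)。网络形状沿用常见的 Atari DQN 配置,权重与输入均为随机占位,并非论文模型或其真实超参数。

```python
import torch
import torch.nn as nn

# 示意:冻结预训练的 DQN 卷积编码器,只为目标游戏训练新的 Q 值头。

class DQNEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.Flatten(),
        )

    def forward(self, x):
        return self.conv(x)

encoder = DQNEncoder()                      # 假设已载入源游戏上的预训练权重
for p in encoder.parameters():
    p.requires_grad = False                 # 冻结编码器

n_actions_target_game = 6
q_head = nn.Sequential(nn.Linear(64 * 7 * 7, 512), nn.ReLU(),
                       nn.Linear(512, n_actions_target_game))
optimizer = torch.optim.Adam(q_head.parameters(), lr=1e-4)   # 只优化新的 Q 头

frames = torch.randn(8, 4, 84, 84)          # 一个批次的堆叠帧(占位数据)
with torch.no_grad():
    features = encoder(frames)
q_values = q_head(features)
print("Q-values shape:", tuple(q_values.shape))   # (8, 6)
```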

Revolutionizing TCAD Simulations with Universal Device Encoding and Graph Attention Networks

  • paper_url: http://arxiv.org/abs/2308.11624
  • repo_url: None
  • paper_authors: Guangxi Fan, Kain Lu Low
  • for: 本文旨在为TCAD器件仿真提出一种基于人工智能(AI)与图表示的通用器件编码方法。
  • methods: 提出一种基于图的通用编码方案,既考虑材料级与器件级嵌入,又引入一种受有限元网格划分中插值操作启发的新型空间关系嵌入。
  • results: 借助一种新的图注意力网络 RelGAT,结合器件仿真中的通用物理规律,实现了全面的数据驱动建模,包括泊松方程的代理仿真以及基于漂移-扩散模型的电流-电压(IV)特性预测。
    Abstract An innovative methodology that leverages artificial intelligence (AI) and graph representation for semiconductor device encoding in TCAD device simulation is proposed. A graph-based universal encoding scheme is presented that not only considers material-level and device-level embeddings, but also introduces a novel spatial relationship embedding inspired by interpolation operations typically used in finite element meshing. Universal physical laws from device simulations are leveraged for comprehensive data-driven modeling, which encompasses surrogate Poisson emulation and current-voltage (IV) prediction based on drift-diffusion model. Both are achieved using a novel graph attention network, referred to as RelGAT. Comprehensive technical details based on the device simulator Sentaurus TCAD are presented, empowering researchers to adopt the proposed AI-driven Electronic Design Automation (EDA) solution at the device level.
    摘要 本文提出一种创新方法,利用人工智能(AI)与图表示对TCAD器件仿真中的半导体器件进行编码。文中给出一种基于图的通用编码方案:不仅考虑材料级与器件级嵌入,还引入一种受有限元网格划分中常用插值操作启发的新型空间关系嵌入。该方法结合器件仿真中的通用物理规律进行全面的数据驱动建模,涵盖泊松方程的代理仿真以及基于漂移-扩散模型的电流-电压(IV)特性预测,二者均由一种称为 RelGAT 的新型图注意力网络实现。本文还给出了基于器件仿真器 Sentaurus TCAD 的完整技术细节,便于研究者在器件层面采用所提出的AI驱动电子设计自动化(EDA)方案。

Adapt and Decompose: Efficient Generalization of Text-to-SQL via Domain Adapted Least-To-Most Prompting

  • paper_url: http://arxiv.org/abs/2308.02582
  • repo_url: None
  • paper_authors: Aseem Arora, Shabbirhussain Bhaisaheb, Harshit Nigam, Manasi Patwardhan, Lovekesh Vig, Gautam Shroff
  • for: 这篇论文关注提升文本到SQL语义解析的跨领域与跨组合泛化能力。
  • methods: 所提方法在训练数据中离线采样一组最小的few-shot示例,在允许的token长度内同时覆盖全部SQL子句、运算符与函数并最大化领域覆盖,据此合成一个固定的通用提示(Generic Prompt,GP);再将GP自适应到目标数据库领域(DA-GP)以处理跨领域泛化,并采用分解式的Least-To-Most提示(LTMP-DA-GP)来处理跨组合泛化。
  • results: 所提方法在专为评估文本到SQL泛化能力设计的KaggleDBQA数据集上表现优越;并且在KaggleDBQA的不同LLM与数据库上,LTMP-DA-GP相对GP均带来一致的性能提升,凸显了这种基于提示的"自适应+分解"方法的有效性与模型无关性。
    Abstract Cross-domain and cross-compositional generalization of Text-to-SQL semantic parsing is a challenging task. Existing Large Language Model (LLM) based solutions rely on inference-time retrieval of few-shot exemplars from the training set to synthesize a run-time prompt for each Natural Language (NL) test query. In contrast, we devise an algorithm which performs offline sampling of a minimal set-of few-shots from the training data, with complete coverage of SQL clauses, operators and functions, and maximal domain coverage within the allowed token length. This allows for synthesis of a fixed Generic Prompt (GP), with a diverse set-of exemplars common across NL test queries, avoiding expensive test time exemplar retrieval. We further auto-adapt the GP to the target database domain (DA-GP), to better handle cross-domain generalization; followed by a decomposed Least-To-Most-Prompting (LTMP-DA-GP) to handle cross-compositional generalization. The synthesis of LTMP-DA-GP is an offline task, to be performed one-time per new database with minimal human intervention. Our approach demonstrates superior performance on the KaggleDBQA dataset, designed to evaluate generalizability for the Text-to-SQL task. We further showcase consistent performance improvement of LTMP-DA-GP over GP, across LLMs and databases of KaggleDBQA, highlighting the efficacy and model agnostic benefits of our prompt based adapt and decompose approach.
    摘要 文本到SQL语义解析的跨领域与跨组合泛化是一项具有挑战性的任务。现有基于大语言模型(LLM)的方案依赖在推理时从训练集中检索few-shot示例,为每条自然语言(NL)测试查询即时合成提示。与之不同,我们设计了一种算法,离线地从训练数据中采样一组最小的few-shot示例,在允许的token长度内完整覆盖SQL子句、运算符与函数,并尽可能扩大领域覆盖。据此可以合成一个固定的通用提示(GP),其中包含对所有NL测试查询通用的多样化示例,从而避免昂贵的测试时示例检索。我们进一步将GP自动适配到目标数据库领域(DA-GP),以更好地处理跨领域泛化;随后采用分解式的Least-To-Most提示(LTMP-DA-GP)处理跨组合泛化。LTMP-DA-GP的合成是一项离线任务,每个新数据库只需执行一次且几乎无需人工干预。我们的方法在专为评估文本到SQL泛化能力设计的KaggleDBQA数据集上表现优越;并且在KaggleDBQA的各种LLM与数据库上,LTMP-DA-GP相对GP均有一致的性能提升,凸显了这种基于提示的自适应与分解方法的有效性及模型无关性。

Making the V in Text-VQA Matter

  • paper_url: http://arxiv.org/abs/2308.00295
  • repo_url: None
  • paper_authors: Shamanthak Hegde, Soumya Jahagirdar, Shankar Gangisetty
  • for: 提升基于文本的视觉问答(Text-based VQA)的回答质量,解决模型对图像与图中文字关系理解不足的问题。
  • methods: 将VQA数据集作为外部知识,与TextVQA数据集联合训练,使模型在学习OCR特征与问题特征的同时学习视觉特征,从而加强图像特征与图中文字之间的关联。
  • results: 通过联合TextVQA与VQA数据集训练,提升了回答质量,并在多个数据集上进行了定性与定量评估和比较。
    Abstract Text-based VQA aims at answering questions by reading the text present in the images. It requires a large amount of scene-text relationship understanding compared to the VQA task. Recent studies have shown that the question-answer pairs in the dataset are more focused on the text present in the image but less importance is given to visual features and some questions do not require understanding the image. The models trained on this dataset predict biased answers due to the lack of understanding of visual context. For example, in questions like "What is written on the signboard?", the answer predicted by the model is always "STOP" which makes the model to ignore the image. To address these issues, we propose a method to learn visual features (making V matter in TextVQA) along with the OCR features and question features using VQA dataset as external knowledge for Text-based VQA. Specifically, we combine the TextVQA dataset and VQA dataset and train the model on this combined dataset. Such a simple, yet effective approach increases the understanding and correlation between the image features and text present in the image, which helps in the better answering of questions. We further test the model on different datasets and compare their qualitative and quantitative results.
    摘要 基于文本的视觉问答(Text-based VQA)旨在通过阅读图像中的文字来回答问题,相比普通VQA任务需要更多的场景-文字关系理解。已有研究表明,此类数据集中的问答对更多地聚焦于图像中的文字,而对视觉特征重视不足,部分问题甚至无需理解图像即可作答;在这类数据上训练的模型由于缺乏对视觉上下文的理解而给出带偏差的答案,例如对于"招牌上写着什么?"这类问题,模型总是预测"STOP",实际上忽略了图像。为了解决这些问题,我们提出一种方法:以VQA数据集作为外部知识,在学习OCR特征与问题特征的同时学习视觉特征(让TextVQA中的"V"真正发挥作用)。具体而言,我们将TextVQA数据集与VQA数据集合并后训练模型。这种简单而有效的做法增强了图像特征与图中文字之间的理解与关联,有助于更好地回答问题。我们进一步在不同数据集上测试模型,并比较其定性与定量结果。

Gated Driver Attention Predictor

  • paper_url: http://arxiv.org/abs/2308.02530
  • repo_url: https://github.com/jwfangit/gate-dap
  • paper_authors: Tianci Zhao, Xue Bai, Jianwu Fang, Jianru Xue
  • for: 预测司机注意力,以提高驾驶任务理解和安全预测。
  • methods: 使用网络连接门控机制,学习不同空间、时间与信息类型在各种道路类型、场合、光照与天气条件的驾驶场景中的重要性。
  • results: 在DADA-2000和BDDA数据集上证明提出方法的优越性,与现有方法进行比较。
    Abstract Driver attention prediction implies the intention understanding of where the driver intends to go and what object the driver concerned about, which commonly provides a driving task-guided traffic scene understanding. Some recent works explore driver attention prediction in critical or accident scenarios and find a positive role in helping accident prediction, while the promotion ability is constrained by the prediction accuracy of driver attention maps. In this work, we explore the network connection gating mechanism for driver attention prediction (Gate-DAP). Gate-DAP aims to learn the importance of different spatial, temporal, and modality information in driving scenarios with various road types, occasions, and light and weather conditions. The network connection gating in Gate-DAP consists of a spatial encoding network gating, long-short-term memory network gating, and information type gating modules. Each connection gating operation is plug-and-play and can be flexibly assembled, which makes the architecture of Gate-DAP transparent for evaluating different spatial, temporal, and information types for driver attention prediction. Evaluations on DADA-2000 and BDDA datasets verify the superiority of the proposed method with the comparison with state-of-the-art approaches. The code is available on https://github.com/JWFangit/Gate-DAP.
    摘要 驾驶员注意力预测意味着理解驾驶员的意图,包括其想去哪里以及关注哪些对象,通常为面向驾驶任务的交通场景理解提供引导。一些近期工作研究了关键或事故场景下的驾驶员注意力预测,发现其对事故预测有积极作用,但这种提升能力受限于注意力图预测的精度。本工作研究了用于驾驶员注意力预测的网络连接门控机制(Gate-DAP)。Gate-DAP旨在学习不同空间、时间与模态信息在各种道路类型、场合、光照与天气条件的驾驶场景中的重要性。Gate-DAP中的网络连接门控由空间编码网络门控、长短时记忆网络门控与信息类型门控模块组成。每个连接门控操作都是即插即用、可灵活组装的,这使得Gate-DAP的架构便于评估不同空间、时间与信息类型对驾驶员注意力预测的作用。在DADA-2000与BDDA数据集上的评估验证了所提方法相对现有方法的优越性。代码见 https://github.com/JWFangit/Gate-DAP。
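下面附一段示意代码,演示"信息类型门控"这一思想的最小实现:为每种模态特征学习一个 0~1 的门控系数再加权融合。模态种类、维度与融合方式均为假设,并非 Gate-DAP 的官方实现。

```python
import torch
import torch.nn as nn

# 示意:信息类型门控 —— 为每种模态特征学习门控系数后加权融合,可即插即用地串接。

class TypeGate(nn.Module):
    def __init__(self, n_modalities, feat_dim):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Linear(n_modalities * feat_dim, n_modalities), nn.Sigmoid())

    def forward(self, feats):                 # feats: (B, n_modalities, feat_dim)
        b, m, d = feats.shape
        g = self.gate(feats.reshape(b, m * d))        # (B, n_modalities) 的门控系数
        fused = (g.unsqueeze(-1) * feats).sum(dim=1)  # 门控加权后求和融合
        return fused, g

if __name__ == "__main__":
    # 假设三种信息:RGB 帧特征、光流特征、车辆状态特征,均已编码为 256 维
    feats = torch.randn(2, 3, 256)
    fused, gates = TypeGate(n_modalities=3, feat_dim=256)(feats)
    print("fused:", tuple(fused.shape), "gates:", gates.detach().numpy().round(3))
```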

Multi-Modality Multi-Loss Fusion Network

  • paper_url: http://arxiv.org/abs/2308.00264
  • repo_url: None
  • paper_authors: Zehui Wu, Ziwei Gong, Jaywon Koo, Julia Hirschberg
  • for: 本研究探讨了多模态特征选择和融合的优化方法,以提高情感识别。
  • methods: 研究比较了不同的融合方法,并考察了多模态融合网络中多损失(multi-loss)训练的影响,得到了与子网络性能相关的有用结论。
  • results: 我们的最佳模型在三个数据集(CMU-MOSI、CMU-MOSEI、CH-SIMS)上取得最先进性能,并在大多数指标上超过其他方法。我们发现,在多模态特征上训练可以提升单模态测试效果,而根据数据集标注方式设计融合方法也能提高模型性能。这些结果为优化特征选择与融合、提升神经网络情感识别性能提供了一条路线图。
    Abstract In this work we investigate the optimal selection and fusion of features across multiple modalities and combine these in a neural network to improve emotion detection. We compare different fusion methods and examine the impact of multi-loss training within the multi-modality fusion network, identifying useful findings relating to subnet performance. Our best model achieves state-of-the-art performance for three datasets (CMU-MOSI, CMU-MOSEI and CH-SIMS), and outperforms the other methods in most metrics. We have found that training on multimodal features improves single modality testing and designing fusion methods based on dataset annotation schema enhances model performance. These results suggest a roadmap towards an optimized feature selection and fusion approach for enhancing emotion detection in neural networks.
    摘要 在这项研究中,我们研究了多modalities之间的最佳选择和融合,并将其 integrate into a neural network to improve emotion detection。我们比较了不同的融合方法,并研究在多模态融合网络中的多loss训练的影响,发现了有用的发现关于子网络性能。我们的最佳模型在三个 datasets(CMU-MOSI、CMU-MOSEI 和 CH-SIMS)上达到了状态部署性能,并在大多数指标上超过了其他方法。我们发现,训练在多模态特征上提高了单模态测试,而基于数据集注释 schema 设计的融合方法可以提高模型性能。这些结果表明了一种优化特征选择和融合方法的道路,以提高神经网络中的情感检测。

LGViT: Dynamic Early Exiting for Accelerating Vision Transformer

  • paper_url: http://arxiv.org/abs/2308.00255
  • repo_url: None
  • paper_authors: Guanyu Xu, Jiawei Hao, Li Shen, Han Hu, Yong Luo, Hui Lin, Jialie Shen
  • for: 本文旨在资源受限的边缘设备上高效部署并加速视觉Transformer(ViT),以支撑多媒体服务。
  • methods: 作者系统研究了在ViT中使用提前退出(early exiting)方法的有效性——这类方法此前多用于卷积神经网络(CNN)与自然语言处理(NLP)模型——并指出其不足之处:浅层内部分类器的特征表示不充分,深层内部分类器捕获目标语义信息的能力有限。
  • results: 作者提出名为LGViT的提前退出框架,引入局部感知头与全局聚合头两类异构退出头,以在效率与精度之间取得权衡;并提出包含端到端训练与(主干冻结下的)自蒸馏的两阶段训练方案,以融合全局与局部信息。实验表明,LGViT能以约1.8倍的加速取得有竞争力的性能。
    Abstract Recently, the efficient deployment and acceleration of powerful vision transformers (ViTs) on resource-limited edge devices for providing multimedia services have become attractive tasks. Although early exiting is a feasible solution for accelerating inference, most works focus on convolutional neural networks (CNNs) and transformer models in natural language processing (NLP).Moreover, the direct application of early exiting methods to ViTs may result in substantial performance degradation. To tackle this challenge, we systematically investigate the efficacy of early exiting in ViTs and point out that the insufficient feature representations in shallow internal classifiers and the limited ability to capture target semantic information in deep internal classifiers restrict the performance of these methods. We then propose an early exiting framework for general ViTs termed LGViT, which incorporates heterogeneous exiting heads, namely, local perception head and global aggregation head, to achieve an efficiency-accuracy trade-off. In particular, we develop a novel two-stage training scheme, including end-to-end training and self-distillation with the backbone frozen to generate early exiting ViTs, which facilitates the fusion of global and local information extracted by the two types of heads. We conduct extensive experiments using three popular ViT backbones on three vision datasets. Results demonstrate that our LGViT can achieve competitive performance with approximately 1.8 $\times$ speed-up.
    摘要 近来,在资源受限的边缘设备上高效部署并加速强大的视觉Transformer(ViT)以提供多媒体服务,成为颇具吸引力的课题。提前退出(early exiting)是加速推理的可行方案,但现有工作大多针对卷积神经网络(CNN)以及自然语言处理(NLP)中的Transformer模型;将提前退出方法直接套用到ViT上可能导致明显的性能下降。为应对这一挑战,我们系统研究了提前退出在ViT中的有效性,指出浅层内部分类器特征表示不足、深层内部分类器难以捕获目标语义信息,是限制这类方法性能的原因。随后,我们提出面向通用ViT的提前退出框架LGViT,其引入局部感知头与全局聚合头两类异构退出头,以在效率与精度之间取得平衡。特别地,我们设计了一种新的两阶段训练方案,包括端到端训练以及主干冻结下的自蒸馏,以生成可提前退出的ViT,促进两类退出头所提取的全局与局部信息的融合。我们在三个视觉数据集上使用三种流行的ViT主干进行了大量实验,结果表明LGViT能以约1.8倍的加速取得有竞争力的性能。
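下面给出提前退出(early exiting)推理流程的最小示意:逐块前向,每个内部分类器给出置信度,超过阈值即提前返回。这里用线性层代替 ViT 块与 LGViT 的局部/全局异构退出头,维度、阈值与层数均为假设。

```python
import torch
import torch.nn as nn

# 示意:带内部分类器的提前退出推理,置信度达到阈值即停止后续计算。

torch.manual_seed(0)
n_blocks, d, n_classes, threshold = 4, 64, 10, 0.6

blocks = nn.ModuleList(nn.Sequential(nn.Linear(d, d), nn.GELU()) for _ in range(n_blocks))
exit_heads = nn.ModuleList(nn.Linear(d, n_classes) for _ in range(n_blocks))

def early_exit_infer(x):
    for i, (block, head) in enumerate(zip(blocks, exit_heads)):
        x = block(x)
        prob = head(x).softmax(dim=-1)
        conf, pred = prob.max(dim=-1)
        if conf.item() >= threshold:              # 足够自信 -> 提前退出,节省后续计算
            return pred.item(), i
    return pred.item(), n_blocks - 1              # 走到最后一层

with torch.no_grad():
    label, exit_at = early_exit_infer(torch.randn(1, d))
print(f"prediction={label}, exited after block {exit_at}")
```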

EEG-based Cognitive Load Classification using Feature Masked Autoencoding and Emotion Transfer Learning

  • paper_url: http://arxiv.org/abs/2308.00246
  • repo_url: None
  • paper_authors: Dustin Pulver, Prithila Angkan, Paul Hungler, Ali Etemad
  • for: 本研究旨在开发一种基于脑电图(EEG)的认知负荷分类方法。
  • methods: 本研究采用 transformer 架构,利用情感与认知负荷之间的迁移学习:先在情感相关的EEG数据集上进行自监督的特征掩码自编码预训练,再分别以冻结权重和微调两种方式做迁移学习,完成下游的认知负荷分类。
  • results: 实验结果显示,所提方法取得了优异的效果,优于传统的单阶段全监督学习;我们还进行了详细的消融与敏感性研究,以评估方案中各组成部分的影响。本研究为情感计算领域的发展做出贡献,并为基于自监督预训练的跨领域迁移学习开辟了新方向。
    Abstract Cognitive load, the amount of mental effort required for task completion, plays an important role in performance and decision-making outcomes, making its classification and analysis essential in various sensitive domains. In this paper, we present a new solution for the classification of cognitive load using electroencephalogram (EEG). Our model uses a transformer architecture employing transfer learning between emotions and cognitive load. We pre-train our model using self-supervised masked autoencoding on emotion-related EEG datasets and use transfer learning with both frozen weights and fine-tuning to perform downstream cognitive load classification. To evaluate our method, we carry out a series of experiments utilizing two publicly available EEG-based emotion datasets, namely SEED and SEED-IV, for pre-training, while we use the CL-Drive dataset for downstream cognitive load classification. The results of our experiments show that our proposed approach achieves strong results and outperforms conventional single-stage fully supervised learning. Moreover, we perform detailed ablation and sensitivity studies to evaluate the impact of different aspects of our proposed solution. This research contributes to the growing body of literature in affective computing with a focus on cognitive load, and opens up new avenues for future research in the field of cross-domain transfer learning using self-supervised pre-training.
    摘要 认知负荷指完成任务所需的心理努力程度,它对任务表现与决策结果有重要影响,因此在多个敏感领域中对其进行分类与分析十分必要。本文提出一种基于脑电图(EEG)的认知负荷分类新方案。我们的模型采用 transformer 架构,并利用情感与认知负荷之间的迁移学习:先在情感相关的EEG数据集上进行自监督的特征掩码自编码预训练,随后分别以冻结权重与微调两种迁移方式完成下游认知负荷分类。为评估该方法,我们使用两个公开的EEG情感数据集(SEED 与 SEED-IV)进行预训练,并在 CL-Drive 数据集上进行下游认知负荷分类实验。实验结果表明,所提方法效果显著,优于传统的单阶段全监督学习。我们还进行了详细的消融与敏感性研究,以评估方案各部分的影响。本研究为聚焦认知负荷的情感计算文献做出了贡献,并为基于自监督预训练的跨领域迁移学习开辟了新的研究方向。
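下面用一段简短的 PyTorch 代码示意论文的两阶段思路:先在情感 EEG 特征上做特征掩码自编码的自监督预训练,再复用编码器微调做认知负荷分类。为简洁起见用 MLP 代替 transformer,数据为随机占位,掩码比例与维度均为假设。

```python
import torch
import torch.nn as nn

# 示意:阶段1 特征掩码自编码预训练;阶段2 复用编码器做认知负荷分类。

torch.manual_seed(0)
feat_dim, hidden, n_load_classes, mask_ratio = 62, 128, 3, 0.5

encoder = nn.Sequential(nn.Linear(feat_dim, hidden), nn.ReLU())
decoder = nn.Linear(hidden, feat_dim)

# --- 阶段 1:在情感数据上做掩码重建(自监督) ---
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)
for step in range(200):
    x = torch.randn(32, feat_dim)                     # 占位:情感 EEG 特征
    mask = (torch.rand_like(x) < mask_ratio).float()
    recon = decoder(encoder(x * (1 - mask)))          # 只看未被掩码的部分
    loss = ((recon - x) ** 2 * mask).mean()           # 只在被掩码位置上算重建误差
    opt.zero_grad(); loss.backward(); opt.step()

# --- 阶段 2:迁移到认知负荷分类(此处微调整个编码器;也可先冻结权重) ---
classifier = nn.Linear(hidden, n_load_classes)
opt2 = torch.optim.Adam(list(encoder.parameters()) + list(classifier.parameters()), lr=1e-4)
for step in range(100):
    x = torch.randn(32, feat_dim)                     # 占位:驾驶认知负荷 EEG 特征
    y = torch.randint(0, n_load_classes, (32,))
    logits = classifier(encoder(x))
    loss = nn.functional.cross_entropy(logits, y)
    opt2.zero_grad(); loss.backward(); opt2.step()

print("final transfer-stage loss:", float(loss))
```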

The Hitchhiker’s Guide to Program Analysis: A Journey with Large Language Models

  • paper_url: http://arxiv.org/abs/2308.00245
  • repo_url: None
  • paper_authors: Haonan Li, Yu Hao, Yizhuo Zhai, Zhiyun Qian
  • for: This paper is written to explore the use of Large Language Models (LLMs) in assisting static analysis for identifying bugs in software systems.
  • methods: The paper proposes a fully automated agent called LLift, which interfaces with a static analysis tool and an LLM to overcome challenges in using LLMs for bug discovery.
  • results: The paper demonstrates the effectiveness of LLift in identifying potential use-before-initialization (UBI) bugs in a real-world scenario, with a high precision (50%) and recall rate (100%). Additionally, LLift identified 13 previously unknown UBI bugs in the Linux kernel.
    Abstract Static analysis is a widely used technique in software engineering for identifying and mitigating bugs. However, a significant hurdle lies in achieving a delicate balance between precision and scalability. Large Language Models (LLMs) offer a promising alternative, as recent advances demonstrate remarkable capabilities in comprehending, generating, and even debugging code. Yet, the logic of bugs can be complex and require sophisticated reasoning and a large analysis scope spanning multiple functions. Therefore, at this point, LLMs are better used in an assistive role to complement static analysis. In this paper, we take a deep dive into the open space of LLM-assisted static analysis, using use-before-initialization (UBI) bugs as a case study. To this end, we develop LLift, a fully automated agent that interfaces with both a static analysis tool and an LLM. By carefully designing the agent and the prompts, we are able to overcome a number of challenges, including bug-specific modeling, the large problem scope, the non-deterministic nature of LLMs, etc. Tested in a real-world scenario analyzing nearly a thousand potential UBI bugs produced by static analysis, LLift demonstrates an extremely potent capability, showcasing a high precision (50%) and recall rate (100%). It even identified 13 previously unknown UBI bugs in the Linux kernel. This research paves the way for new opportunities and methodologies in the use of LLMs for bug discovery in extensive, real-world datasets.
    摘要 静态分析是软件工程中广泛用于发现和缓解缺陷的技术,但其难点在于精度与可扩展性之间的微妙平衡。大语言模型(LLM)提供了一个有前景的替代方案:近期进展表明其在理解、生成乃至调试代码方面能力惊人。然而,缺陷的逻辑可能相当复杂,需要严密的推理以及跨越多个函数的大范围分析,因此现阶段LLM更适合作为辅助角色来补充静态分析。本文以use-before-initialization(UBI)缺陷为例,深入探讨LLM辅助静态分析这一开放领域。为此,我们开发了 LLift——一个同时对接静态分析工具与LLM的全自动代理。通过精心设计代理与提示词,我们克服了一系列挑战,包括针对特定缺陷类型的建模、庞大的问题范围、LLM的非确定性等。在真实场景中对静态分析产生的近一千个潜在UBI缺陷进行分析时,LLift表现出极强的能力,精度达50%、召回率达100%,甚至在Linux内核中发现了13个此前未知的UBI缺陷。这项研究为在大规模真实数据集上利用LLM进行缺陷发现开辟了新的机会与方法。

ChatMOF: An Autonomous AI System for Predicting and Generating Metal-Organic Frameworks

  • paper_url: http://arxiv.org/abs/2308.01423
  • repo_url: https://github.com/yeonghun1675/chatmof
  • paper_authors: Yeonghun Kang, Jihan Kim
  • for: 这个论文是为了探讨和开发一个基于自然语言处理的金属有机框架预测和生成系统(ChatMOF)。
  • methods: 该系统使用了一个大规模语言模型(gpt-3.5-turbo),从文本输入中提取关键信息并提供相应的回答,从而消除了僵化的结构化查询的需求。系统由三个核心组件(代理、工具包和评估器)组成,实现了许多任务,包括数据检索、性能预测和结构生成。
  • results: 研究探讨了在材料科学中使用大语言模型AI系统的优势与局限,表明其具备出色的预测与生成能力,并展示了其对未来发展的变革性潜力。
    Abstract ChatMOF is an autonomous Artificial Intelligence (AI) system that is built to predict and generate of metal-organic frameworks (MOFs). By leveraging a large-scale language model (gpt-3.5-turbo), ChatMOF extracts key details from textual inputs and delivers appropriate responses, thus eliminating the necessity for rigid structured queries. The system is comprised of three core components (i.e. an agent, a toolkit, and an evaluator) and it forms a robust pipeline that manages a variety of tasks, including data retrieval, property prediction, and structure generation. The study further explores the merits and constraints of using large language models (LLMs) AI system in material sciences using and showcases its transformative potential for future advancements.
    摘要 ChatMOF是一个用于预测和生成金属有机框架(MOF)的自主人工智能系统。借助大规模语言模型(gpt-3.5-turbo),ChatMOF能从文本输入中提取关键信息并给出相应回答,从而免去僵化的结构化查询。该系统由三个核心组件(代理、工具包与评估器)构成,形成一条稳健的流水线,可处理数据检索、性质预测与结构生成等多种任务。研究进一步探讨了在材料科学中使用大语言模型AI系统的优势与限制,并展示了其对未来发展的变革性潜力。

Capsa: A Unified Framework for Quantifying Risk in Deep Neural Networks

  • paper_url: http://arxiv.org/abs/2308.00231
  • repo_url: None
  • paper_authors: Sadhana Lolla, Iaroslav Elistratov, Alejandro Perez, Elaheh Ahmadi, Daniela Rus, Alexander Amini
  • for: Presents a framework for extending deep neural network (NN) models with risk-awareness.
  • methods: The framework supports multiple risk-quantification methods, including aleatoric uncertainty, epistemic uncertainty, and bias estimation, which can be used individually or composed together to provide comprehensive risk awareness.
  • results: Experiments show that capsa makes it easy to compose different risk-quantification methods in a single procedure and to benchmark them on complex perception datasets, yielding comprehensive risk awareness that extends readily to different application settings.
    Abstract The modern pervasiveness of large-scale deep neural networks (NNs) is driven by their extraordinary performance on complex problems but is also plagued by their sudden, unexpected, and often catastrophic failures, particularly on challenging scenarios. Existing algorithms that provide risk-awareness to NNs are complex and ad-hoc. Specifically, these methods require significant engineering changes, are often developed only for particular settings, and are not easily composable. Here we present capsa, a framework for extending models with risk-awareness. Capsa provides a methodology for quantifying multiple forms of risk and composing different algorithms together to quantify different risk metrics in parallel. We validate capsa by implementing state-of-the-art uncertainty estimation algorithms within the capsa framework and benchmarking them on complex perception datasets. We demonstrate capsa's ability to easily compose aleatoric uncertainty, epistemic uncertainty, and bias estimation together in a single procedure, and show how this approach provides a comprehensive awareness of NN risk.
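To make the composition idea concrete, here is a minimal sketch of wrapping one model with an ensemble-based epistemic estimate and another with a Gaussian aleatoric head; capsa's actual API is not described in this digest, so all class and function names below are illustrative assumptions.
```python
# Illustrative sketch only: capsa's real interface is not shown here, so the wrapper
# names and composition below are assumptions, not the library's API.
import torch
import torch.nn as nn

class EnsembleRisk(nn.Module):
    """Wraps a base model with a small ensemble to estimate epistemic uncertainty."""
    def __init__(self, make_model, n_members: int = 3):
        super().__init__()
        self.members = nn.ModuleList([make_model() for _ in range(n_members)])

    def forward(self, x):
        preds = torch.stack([m(x) for m in self.members])   # (n_members, batch, out)
        return {
            "prediction": preds.mean(dim=0),
            "epistemic": preds.var(dim=0),                   # disagreement across members
        }

class GaussianAleatoric(nn.Module):
    """Wraps a model that outputs (mean, log_var) to expose aleatoric uncertainty."""
    def __init__(self, model):
        super().__init__()
        self.model = model

    def forward(self, x):
        mean, log_var = self.model(x).chunk(2, dim=-1)
        return {"prediction": mean, "aleatoric": log_var.exp()}

# Composing both views on the same input, in the spirit of quantifying
# different risk metrics in parallel:
if __name__ == "__main__":
    make_reg = lambda: nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 1))
    epistemic_view = EnsembleRisk(make_reg)
    aleatoric_view = GaussianAleatoric(nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 2)))
    x = torch.randn(8, 4)
    report = {**epistemic_view(x), **{k: v for k, v in aleatoric_view(x).items() if k != "prediction"}}
    print({k: v.shape for k, v in report.items()})
```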

Experiments on Generative AI-Powered Parametric Modeling and BIM for Architectural Design

  • paper_url: http://arxiv.org/abs/2308.00227
  • repo_url: None
  • paper_authors: Jaechang Ko, John Ajibefun, Wei Yan
  • for: Proposes a new architectural design framework that couples generative AI tools, including ChatGPT and Veras, with parametric modeling and Building Information Modeling (BIM) to enhance the design process.
  • methods: The study experiments with the potential of ChatGPT and generative AI in 3D architectural design, extending beyond their use in text and 2D image generation.
  • results: The framework fosters collaboration between architects and AI, enabling rapid exploration of design ideas and producing context-sensitive, creative designs. Using ChatGPT for scripting and Veras for generating design ideas, integrated with widely used parametric modeling and BIM tools, it gives architects an intuitive and powerful way to convey design intent, leading to more efficient, creative, and collaborative design processes.
    Abstract This paper introduces a new architectural design framework that utilizes generative AI tools including ChatGPT and Veras with parametric modeling and Building Information Modeling (BIM) to enhance the design process. The study experiments with the potential of ChatGPT and generative AI in 3D architectural design, extending beyond its use in text and 2D image generation. The proposed framework promotes collaboration between architects and AI, facilitating a quick exploration of design ideas and producing context-sensitive, creative design generation. By integrating ChatGPT for scripting and Veras for generating design ideas with widely used parametric modeling and BIM tools, the framework provides architects with an intuitive and powerful method to convey design intent, leading to more efficient, creative, and collaborative design processes.

Instructed to Bias: Instruction-Tuned Language Models Exhibit Emergent Cognitive Bias

  • paper_url: http://arxiv.org/abs/2308.00225
  • repo_url: None
  • paper_authors: Itay Itzhak, Gabriel Stanovsky, Nir Rosenfeld, Yonatan Belinkov
  • for: Investigates whether large language models (LMs) tuned with instructions and human feedback exhibit stronger cognitive biases than their pretrained predecessors.
  • methods: The study probes models for three biases known to influence human decision-making and reasoning, namely the decoy effect, the certainty effect, and the belief bias, comparing instruction-tuned models against their pretrained counterparts.
  • results: After instruction tuning and learning from human feedback, these biases become more pronounced, especially in models such as Flan-T5, GPT3.5, and GPT4. The findings deepen our understanding of biases in instruction-tuned LMs, which is crucial for developing more reliable and unbiased language models.
    Abstract Recent studies show that instruction tuning and learning from human feedback improve the abilities of large language models (LMs) dramatically. While these tuning methods can make models generate high-quality text, we conjecture that more implicit cognitive biases may arise in these fine-tuned models. Our work provides evidence that these fine-tuned models exhibit biases that were absent or less pronounced in their pretrained predecessors. We examine the extent of this phenomenon in three cognitive biases - the decoy effect, the certainty effect, and the belief bias - all of which are known to influence human decision-making and reasoning. Our findings highlight the presence of these biases in various models, especially those that have undergone instruction tuning, such as Flan-T5, GPT3.5, and GPT4. This research constitutes a step toward comprehending cognitive biases in instruction-tuned LMs, which is crucial for the development of more reliable and unbiased language models.

Advancing Beyond Identification: Multi-bit Watermark for Language Models

  • paper_url: http://arxiv.org/abs/2308.00221
  • repo_url: None
  • paper_authors: KiYoon Yoo, Wonhyuk Ahn, Nojun Kwak
  • for: Aims to proactively counter misuse of large language models, going beyond merely identifying machine-generated text.
  • methods: Proposes "Multi-bit Watermark through Color-listing" (COLOR), which embeds traceable multi-bit information during language model generation.
  • results: Preliminary experiments show that COLOR embeds 32-bit messages in moderate-length texts (around 500 tokens) with 91.9% accuracy, providing a new strategy against language model misuse.
    Abstract This study aims to proactively tackle misuse of large language models beyond identification of machine-generated text. While existing methods focus on detection, some malicious misuses demand tracing the adversary user for counteracting them. To address this, we propose "Multi-bit Watermark through Color-listing" (COLOR), embedding traceable multi-bit information during language model generation. Leveraging the benefits of zero-bit watermarking (Kirchenbauer et al., 2023a), COLOR enables extraction without model access, on-the-fly embedding, and maintains text quality, while allowing zero-bit detection all at the same time. Preliminary experiments demonstrates successful embedding of 32-bit messages with 91.9% accuracy in moderate-length texts ($\sim$500 tokens). This work advances strategies to counter language model misuse effectively.
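The abstract does not spell out how color-listing encodes bits, so the toy below only illustrates the general multi-bit idea under an assumed partition-and-prefer scheme in the spirit of zero-bit green-list watermarking: a keyed hash assigns each token a color, generation prefers tokens whose color matches the current message bit, and extraction recovers bits by majority vote. COLOR's actual algorithm may differ.
```python
# Toy sketch of a multi-bit "color-list" watermark; the hashing and boosting choices
# here are illustrative assumptions, not the paper's method.
import hashlib
import random

VOCAB = 50_000

def color_of(token_id: int, position: int, key: str = "secret") -> int:
    """Deterministically assign each (position, token) pair to color 0 or 1."""
    h = hashlib.sha256(f"{key}:{position}:{token_id}".encode()).digest()
    return h[0] & 1

def generate_with_message(length: int, message_bits: list) -> list:
    """Stand-in for LM decoding: at each step, prefer tokens whose color matches
    the current message bit (bits are cycled over positions)."""
    rng, out = random.Random(0), []
    for pos in range(length):
        bit = message_bits[pos % len(message_bits)]
        candidates = [rng.randrange(VOCAB) for _ in range(8)]  # mock top-k proposals
        matching = [t for t in candidates if color_of(t, pos) == bit]
        out.append((matching or candidates)[0])
    return out

def extract_message(tokens: list, n_bits: int) -> list:
    """Recover each bit by majority vote over the positions assigned to it."""
    votes = [[0, 0] for _ in range(n_bits)]
    for pos, tok in enumerate(tokens):
        votes[pos % n_bits][color_of(tok, pos)] += 1
    return [int(v[1] > v[0]) for v in votes]

msg = [1, 0, 1, 1, 0, 0, 1, 0]
text_tokens = generate_with_message(length=500, message_bits=msg)
print("recovered:", extract_message(text_tokens, len(msg)), "original:", msg)
```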

Deep Reinforcement Learning-Based Battery Conditioning Hierarchical V2G Coordination for Multi-Stakeholder Benefits

  • paper_url: http://arxiv.org/abs/2308.00218
  • repo_url: None
  • paper_authors: Yubao Zhang, Xin Chen, Yi Gu, Zhicheng Li, Wu Kai
  • for: Targets large-scale electric vehicle (EV) charging scheduling that improves renewable energy utilization and power grid stability.
  • methods: Proposes a battery-conditioning hierarchical V2G coordination strategy based on deep reinforcement learning (DRL) and the Proof of Stake algorithm, coordinating multiple stakeholders (the grid, EV aggregators, and users).
  • results: Compared with four typical baselines, the strategy enhances renewable energy consumption, mitigates load fluctuations, meets EV charging demand, and reduces charging costs and battery degradation.
    Abstract With the growing prevalence of electric vehicles (EVs) and advancements in EV electronics, vehicle-to-grid (V2G) techniques and large-scale scheduling strategies have emerged to promote renewable energy utilization and power grid stability. This study proposes a multi-stakeholder hierarchical V2G coordination based on deep reinforcement learning (DRL) and the Proof of Stake algorithm. Furthermore, the multi-stakeholders include the power grid, EV aggregators (EVAs), and users, and the proposed strategy can achieve multi-stakeholder benefits. On the grid side, load fluctuations and renewable energy consumption are considered, while on the EVA side, energy constraints and charging costs are considered. The three critical battery conditioning parameters of battery SOX are considered on the user side, including state of charge, state of power, and state of health. Compared with four typical baselines, the multi-stakeholder hierarchical coordination strategy can enhance renewable energy consumption, mitigate load fluctuations, meet the energy demands of EVA, and reduce charging costs and battery degradation under realistic operating conditions.

Performance Evaluation of Swin Vision Transformer Model using Gradient Accumulation Optimization Technique

  • paper_url: http://arxiv.org/abs/2308.00197
  • repo_url: None
  • paper_authors: Sanad Aburass, Osama Dorgham
  • for: Evaluates how the gradient accumulation optimization (GAO) technique affects the accuracy and training time of the Swin Vision Transformer (Swin ViT) model.
  • methods: Applies the GAO technique to the Swin ViT model and compares it against the standard Swin Transformer training setup.
  • results: GAO leads to a significant drop in the Swin ViT model's accuracy and a significant increase in training time, suggesting that GAO may not be suitable for Swin ViT and should be applied with caution to other transformer-based models.
    Abstract Vision Transformers (ViTs) have emerged as a promising approach for visual recognition tasks, revolutionizing the field by leveraging the power of transformer-based architectures. Among the various ViT models, Swin Transformers have gained considerable attention due to their hierarchical design and ability to capture both local and global visual features effectively. This paper evaluates the performance of Swin ViT model using gradient accumulation optimization (GAO) technique. We investigate the impact of gradient accumulation optimization technique on the model's accuracy and training time. Our experiments show that applying the GAO technique leads to a significant decrease in the accuracy of the Swin ViT model, compared to the standard Swin Transformer model. Moreover, we detect a significant increase in the training time of the Swin ViT model when GAO model is applied. These findings suggest that applying the GAO technique may not be suitable for the Swin ViT model, and concern should be undertaken when using GAO technique for other transformer-based models.
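For reference, gradient accumulation itself is a simple training-loop change; the sketch below shows the standard PyTorch pattern the paper evaluates, with the model, data loader, and loss function left as placeholders.
```python
# Minimal sketch of gradient accumulation ("GAO"), assuming a standard PyTorch loop;
# model, loader, and loss_fn are placeholders supplied by the caller.
import torch

def train_with_accumulation(model, loader, optimizer, loss_fn, accum_steps: int = 4):
    model.train()
    optimizer.zero_grad()
    for step, (images, labels) in enumerate(loader):
        loss = loss_fn(model(images), labels) / accum_steps  # scale so the accumulated
        loss.backward()                                      # gradient averages over the large batch
        if (step + 1) % accum_steps == 0:
            optimizer.step()          # one update per accum_steps micro-batches,
            optimizer.zero_grad()     # emulating a batch accum_steps times larger
```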

Analytical Techniques to Support Hospital Case Mix Planning

  • paper_url: http://arxiv.org/abs/2308.07323
  • repo_url: None
  • paper_authors: Robert L Burdett, Paul Corry, David Cook, Prasad Yarlagadda
  • for: Provides analytical techniques and a decision support tool for hospital capacity assessment and case mix planning (CMP).
  • methods: Proposes an optimization model for analysing the impact of changing an existing case mix; the model identifies how other patient types should be altered proportionately as hospital resource availability changes. Multi-objective decision-making techniques are also proposed to compare and critique competing case mix solutions.
  • results: The techniques give hospital managers greater situational awareness of how hospital resources are used and help them select case mix plans under resource constraints.
    Abstract This article introduces analytical techniques and a decision support tool to support capacity assessment and case mix planning (CMP) approaches previously created for hospitals. First, an optimization model is proposed to analyse the impact of making a change to an existing case mix. This model identifies how other patient types should be altered proportionately to the changing levels of hospital resource availability. Then we propose multi-objective decision-making techniques to compare and critique competing case mix solutions obtained. The proposed techniques are embedded seamlessly within an Excel Visual Basic for Applications (VBA) personal decision support tool (PDST), for performing informative quantitative assessments of hospital capacity. The PDST reports informative metrics of difference and reports the impact of case mix modifications on the other types of patient present. The techniques developed in this article provide a bridge between theory and practice that is currently missing and provides further situational awareness around hospital capacity.

Generative Models as a Complex Systems Science: How can we make sense of large language model behavior?

  • paper_url: http://arxiv.org/abs/2308.00189
  • repo_url: None
  • paper_authors: Ari Holtzman, Peter West, Luke Zettlemoyer
  • for: Seeks explanations of the behaviors that allow language models to complete many different tasks, to help future researchers understand and predict model behavior.
  • methods: Argues for a systematic effort to decompose language model behavior into categories that explain cross-task performance.
  • results: Such a decomposition would guide mechanistic explanations and help future-proof analytic research, making it easier to understand, predict, and apply language models.
    Abstract Coaxing out desired behavior from pretrained models, while avoiding undesirable ones, has redefined NLP and is reshaping how we interact with computers. What was once a scientific engineering discipline-in which building blocks are stacked one on top of the other-is arguably already a complex systems science, in which emergent behaviors are sought out to support previously unimagined use cases. Despite the ever increasing number of benchmarks that measure task performance, we lack explanations of what behaviors language models exhibit that allow them to complete these tasks in the first place. We argue for a systematic effort to decompose language model behavior into categories that explain cross-task performance, to guide mechanistic explanations and help future-proof analytic research.

Multicriteria Optimization Techniques for Understanding the Case Mix Landscape of a Hospital

  • paper_url: http://arxiv.org/abs/2308.07322
  • repo_url: None
  • paper_authors: Robert L Burdett, Paul Corry, Prasad Yarlagadda, David Cook, Sean Birgan
  • for: Studies how treating different patient case mixes affects a hospital, with the aim of improving quality and efficiency.
  • methods: Proposes an improved multicriteria optimization (MCO) approach, using a parallelised epsilon constraint method (ECM) and KD-Trees to generate a larger archive of non-dominated (Pareto optimal) case mixes.
  • results: The approach produces a better archive of non-dominated case mixes, and a decision support tool (DST) is developed for generating, viewing, navigating, and querying that archive.
    Abstract Various medical and surgical units operate in a typical hospital and to treat their patients these units compete for infrastructure like operating rooms (OR) and ward beds. How that competition is regulated affects the capacity and output of a hospital. This article considers the impact of treating different patient case mix (PCM) in a hospital. As each case mix has an economic consequence and a unique profile of hospital resource usage, this consideration is important. To better understand the case mix landscape and to identify those which are optimal from a capacity utilisation perspective, an improved multicriteria optimization (MCO) approach is proposed. As there are many patient types in a typical hospital, the task of generating an archive of non-dominated (i.e., Pareto optimal) case mix is computationally challenging. To generate a better archive, an improved parallelised epsilon constraint method (ECM) is introduced. Our parallel random corrective approach is significantly faster than prior methods and is not restricted to evaluating points on a structured uniform mesh. As such we can generate more solutions. The application of KD-Trees is another new contribution. We use them to perform proximity testing and to store the high dimensional Pareto frontier (PF). For generating, viewing, navigating, and querying an archive, the development of a suitable decision support tool (DST) is proposed and demonstrated.
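As a toy illustration of the epsilon constraint method at the heart of the approach, the sketch below sweeps an epsilon bound over one objective of a small two-objective LP and keeps the non-dominated solutions; the objectives, constraints, and numbers are invented for the example and are not the paper's hospital model.
```python
# Illustrative epsilon-constraint sweep on a toy two-objective "case mix" LP.
# Requires scipy; all coefficients are made up for the sketch.
import numpy as np
from scipy.optimize import linprog

def epsilon_constraint_frontier(n_points: int = 20):
    """Maximize patient-type-1 throughput while requiring at least `eps` of type 2."""
    frontier = []
    for eps in np.linspace(0.0, 5.0, n_points):
        res = linprog(
            c=[-1.0, 0.0],                       # maximize x1 (linprog minimizes)
            A_ub=[[1.0, 2.0], [3.0, 1.0], [0.0, -1.0]],
            b_ub=[10.0, 15.0, -eps],             # resource limits + epsilon constraint x2 >= eps
            bounds=[(0, None), (0, None)],
        )
        if res.success:
            frontier.append((float(res.x[0]), float(res.x[1])))
    # keep only non-dominated points
    pareto = [p for p in frontier
              if not any(q[0] >= p[0] and q[1] >= p[1] and q != p for q in frontier)]
    return pareto

print(epsilon_constraint_frontier())
```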

The Efficacy of Utility Functions for Multicriteria Hospital Case-Mix Planning

  • paper_url: http://arxiv.org/abs/2308.07321
  • repo_url: None
  • paper_authors: Robert L Burdett, Paul Corry, Prasad Yarlagadda, David Cook, Sean Birgan
  • for: Develops a new hospital case-mix planning approach that captures the preferences and standpoints of different decision makers.
  • methods: Utility functions (UF) articulate decision makers' preferences regarding outputs; scalarizing these UFs yields a quantitative technique for distributing hospital resources to different operating units and producing better capacity allocations and case mixes.
  • results: The UF method can evaluate trade-offs between different hospital stakeholders and objectives, model the varying importance of different levels of output, and, via sensitivity analysis of each UF's parameters, give planners, managers, and executives insight into the case mixes they want to treat.
    Abstract A new approach to perform hospital case-mix planning (CMP) is introduced in this article. Our multi-criteria approach utilises utility functions (UF) to articulate the preferences and standpoint of independent decision makers regarding outputs. The primary aim of this article is to test whether a utility functions method (UFM) based upon the scalarization of aforesaid UF is an appropriate quantitative technique to, i) distribute hospital resources to different operating units, and ii) provide a better capacity allocation and case mix. Our approach is motivated by the need to provide a method able to evaluate the trade-off between different stakeholders and objectives of hospitals. To the best of our knowledge, no such approach has been considered before in the literature. As we will later show, this idea addresses various technical limitations, weaknesses, and flaws in current CMP. The efficacy of the aforesaid approach is tested on a case study of a large tertiary hospital. Currently UF are not used by hospital managers, and real functions are unavailable, hence, 14 rational options are tested. Our exploratory analysis has provided important guidelines for the application of these UF. It indicates that these UF provide a valuable starting point for planners, managers, and executives of hospitals to impose their goals and aspirations. In conclusion, our approach may be better at identifying case mix that users want to treat and seems more capable of modelling the varying importance of different levels of output. Apart from finding desirable case mixes to consider, the approach can provide important insights via a sensitivity analysis of the parameters of each UF.

Attribution-Scores in Data Management and Explainable Machine Learning

  • paper_url: http://arxiv.org/abs/2308.00184
  • repo_url: None
  • paper_authors: Leopoldo Bertossi
  • for: Surveys recent research on using actual causality to define responsibility scores that explain query answers in databases and outcomes of classification models in machine learning.
  • methods: Responsibility scores are defined via actual causality and connected to database repairs, which also yield a quantitative measure of database consistency; for classification models the score is properly extended and illustrated, and the efficient computation of the Shap-score is analyzed.
  • results: The survey shows how causality-based responsibility scores help explain both database query answers and classification outcomes, and discusses efficient methods for computing the Shap-score.
    Abstract We describe recent research on the use of actual causality in the definition of responsibility scores as explanations for query answers in databases, and for outcomes from classification models in machine learning. In the case of databases, useful connections with database repairs are illustrated and exploited. Repairs are also used to give a quantitative measure of the consistency of a database. For classification models, the responsibility score is properly extended and illustrated. The efficient computation of Shap-score is also analyzed and discussed. The emphasis is placed on work done by the author and collaborators.
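For readers unfamiliar with the Shap-score, the sketch below computes exact Shapley values for a tiny classifier by averaging marginal contributions over all feature permutations; the value function used (revealing features against a fixed background) is an illustrative choice, not the specific game analyzed in the paper.
```python
# Exact Shapley (Shap-score) computation for a small number of features; only feasible
# for toy examples because it enumerates all permutations.
from itertools import permutations

def shapley_scores(predict, x, background, n_features):
    """predict: callable on a feature list; background: values used for 'absent' features."""
    scores = [0.0] * n_features
    perms = list(permutations(range(n_features)))
    for order in perms:
        present = list(background)               # start with all features "absent"
        prev = predict(present)
        for f in order:
            present[f] = x[f]                    # reveal feature f
            cur = predict(present)
            scores[f] += (cur - prev) / len(perms)
            prev = cur
    return scores

# Toy classifier over 3 binary features: fires if at least two of them are set.
clf = lambda feats: float(sum(feats) >= 2)
print(shapley_scores(clf, x=[1, 1, 0], background=[0, 0, 0], n_features=3))
```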

Pretrained deep models outperform GBDTs in Learning-To-Rank under label scarcity

  • paper_url: http://arxiv.org/abs/2308.00177
  • repo_url: None
  • paper_authors: Charlie Hou, Kiran Koshy Thekumparampil, Michael Shavlovsky, Giulia Fanti, Yesh Dattatreya, Sujay Sanghavi
  • for: Tests whether unsupervised pretraining can improve Learning-To-Rank (LTR) performance over Gradient Boosted Decision Trees (GBDTs) and other non-pretrained models.
  • methods: Uses simple design choices, including SimCLR-Rank, a ranking-specific modification of SimCLR (an unsupervised pretraining method for images), to produce pretrained deep learning models for tabular LTR.
  • results: When labeled data is vastly outnumbered by unlabeled data, the pretrained models soundly outperform GBDTs and other non-pretrained models, and they are often significantly more robust when ranking outlier data.
    Abstract While deep learning (DL) models are state-of-the-art in text and image domains, they have not yet consistently outperformed Gradient Boosted Decision Trees (GBDTs) on tabular Learning-To-Rank (LTR) problems. Most of the recent performance gains attained by DL models in text and image tasks have used unsupervised pretraining, which exploits orders of magnitude more unlabeled data than labeled data. To the best of our knowledge, unsupervised pretraining has not been applied to the LTR problem, which often produces vast amounts of unlabeled data. In this work, we study whether unsupervised pretraining can improve LTR performance over GBDTs and other non-pretrained models. Using simple design choices--including SimCLR-Rank, our ranking-specific modification of SimCLR (an unsupervised pretraining method for images)--we produce pretrained deep learning models that soundly outperform GBDTs (and other non-pretrained models) in the case where labeled data is vastly outnumbered by unlabeled data. We also show that pretrained models also often achieve significantly better robustness than non-pretrained models (GBDTs or DL models) in ranking outlier data.
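The digest does not detail how SimCLR-Rank adapts SimCLR to ranking data, so the sketch below shows only the generic NT-Xent (InfoNCE) contrastive objective that this family of unsupervised pretraining methods builds on, applied to two augmented views of a batch.
```python
# Generic SimCLR-style NT-Xent loss; how the paper constructs views of ranking data
# is not specified here, so the embeddings are random placeholders.
import torch
import torch.nn.functional as F

def nt_xent_loss(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.5) -> torch.Tensor:
    """z1, z2: (batch, dim) embeddings of two augmented views of the same items."""
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)        # (2B, dim)
    sim = z @ z.t() / temperature                              # cosine similarities
    n = z1.size(0)
    mask = torch.eye(2 * n, dtype=torch.bool)
    sim = sim.masked_fill(mask, float("-inf"))                 # drop self-similarity
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])  # positive pairs
    return F.cross_entropy(sim, targets)

z1, z2 = torch.randn(16, 64), torch.randn(16, 64)
print(nt_xent_loss(z1, z2).item())
```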

  • paper_url: http://arxiv.org/abs/2308.00165
  • repo_url: None
  • paper_authors: Rohit Raj, V Susheela Devi
  • for: Studies legal judgment prediction systems, which predict the outcome of court cases from textual descriptions of case facts.
  • methods: These systems apply Natural Language Processing (NLP) techniques to predict judgment outcomes from fact descriptions; this work implements adversarial attacks against early existing systems and proposes an approach for building robust Legal Judgement Prediction (LJP) systems.
  • results: Extensive experiments on three legal datasets show significant improvements over the state-of-the-art LJP system in handling adversarial attacks.
    Abstract Legal judgment prediction is the task of predicting the outcome of court cases on a given text description of facts of cases. These tasks apply Natural Language Processing (NLP) techniques to predict legal judgment results based on facts. Recently, large-scale public datasets and NLP models have increased research in areas related to legal judgment prediction systems. For such systems to be practically helpful, they should be robust from adversarial attacks. Previous works mainly focus on making a neural legal judgement system; however, significantly less or no attention has been given to creating a robust Legal Judgement Prediction(LJP) system. We implemented adversarial attacks on early existing LJP systems and found that none of them could handle attacks. In this work, we proposed an approach for making robust LJP systems. Extensive experiments on three legal datasets show significant improvements in our approach over the state-of-the-art LJP system in handling adversarial attacks. To the best of our knowledge, we are the first to increase the robustness of early-existing LJP systems.

Predicting Perfect Quality Segments in MT Output with Fine-Tuned OpenAI LLM: Is it possible to capture editing distance patterns from historical data?

  • paper_url: http://arxiv.org/abs/2308.00158
  • repo_url: None
  • paper_authors: Serge Gladkoff, Gleb Erofeev, Lifeng Han, Goran Nenadic
  • for: Examines whether state-of-the-art large language models (LLMs) can be fine-tuned for the translation quality estimation (TQE) task, and how capable they are.
  • methods: Takes ChatGPT as an example and frames TQE as a binary classification task, fine-tuning on training corpora for eight language pairs: English to Italian, German, French, Japanese, Dutch, Portuguese, Turkish, and Chinese.
  • results: Fine-tuned ChatGPT via its API achieves relatively high scores for predicting translation quality, e.g. 82.42% for English-Italian and 83.69% for English-German, although there is still considerable room to improve model accuracy.
    Abstract Translation Quality Estimation (TQE) is an essential step before deploying the output translation into usage. TQE is also critical in assessing machine translation (MT) and human translation (HT) quality without seeing the reference translations. This work examines whether the state-of-the-art large language models (LLMs) can be fine-tuned for the TQE task and their capability. We take ChatGPT as one example and approach TQE as a binary classification task. Using eight language pairs including English to Italian, German, French, Japanese, Dutch, Portuguese, Turkish, and Chinese training corpora, our experimental results show that fine-tuned ChatGPT via its API can achieve a relatively high score on predicting translation quality, i.e. whether the translation needs to be edited. However, there is definitely much space to improve the model accuracy, e.g. the scores are 82.42% and 83.69% for English-Italian and English-German respectively using our experimental settings. An English-Italian bilingual abstract is available in the paper.

Formally Explaining Neural Networks within Reactive Systems

  • paper_url: http://arxiv.org/abs/2308.00143
  • repo_url: None
  • paper_authors: Shahaf Bassan, Guy Amir, Davide Corsi, Idan Refaeli, Guy Katz
  • for: Aims to explain the behavior of reactive systems controlled by deep neural networks (DNNs), to improve their transparency and trustworthiness.
  • methods: Proposes a formal, verification-based explainable AI technique that pinpoints the input features behind the DNN's decisions across multiple steps, exploiting the system's transition constraints to curtail the search space explored by the underlying verifier.
  • results: On two popular benchmarks from automated navigation, the method efficiently computes minimal and minimum explanations, significantly outperforming the state of the art, and the resulting formal explanations are more reliable than those of competing non-verification-based XAI techniques.
    Abstract Deep neural networks (DNNs) are increasingly being used as controllers in reactive systems. However, DNNs are highly opaque, which renders it difficult to explain and justify their actions. To mitigate this issue, there has been a surge of interest in explainable AI (XAI) techniques, capable of pinpointing the input features that caused the DNN to act as it did. Existing XAI techniques typically face two limitations: (i) they are heuristic, and do not provide formal guarantees that the explanations are correct; and (ii) they often apply to ``one-shot'' systems, where the DNN is invoked independently of past invocations, as opposed to reactive systems. Here, we begin bridging this gap, and propose a formal DNN-verification-based XAI technique for reasoning about multi-step, reactive systems. We suggest methods for efficiently calculating succinct explanations, by exploiting the system's transition constraints in order to curtail the search space explored by the underlying verifier. We evaluate our approach on two popular benchmarks from the domain of automated navigation; and observe that our methods allow the efficient computation of minimal and minimum explanations, significantly outperforming the state of the art. We also demonstrate that our methods produce formal explanations that are more reliable than competing, non-verification-based XAI techniques.
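A common way to obtain the minimal explanations mentioned above is a greedy shrinking loop around a formal verifier; the sketch below assumes an abstract verifier(fixed_features) call that returns True when fixing those features guarantees the network's decision, and uses a trivial stand-in in place of a real DNN verification query.
```python
# Greedy subset-minimal explanation loop; the verifier here is a toy stand-in, not a
# real DNN verification backend, and the greedy result is minimal but not necessarily
# minimum-size.
def minimal_explanation(features: list, verifier) -> list:
    """Drop features one by one; keep a feature only if removing it breaks the guarantee."""
    explanation = list(features)
    for f in list(features):
        candidate = [g for g in explanation if g != f]
        if verifier(candidate):          # decision still guaranteed without f
            explanation = candidate      # so f is not needed in the explanation
    return explanation

# Toy stand-in: the "decision" is guaranteed whenever both sensor_a and sensor_b are fixed.
toy_verifier = lambda fixed: {"sensor_a", "sensor_b"}.issubset(fixed)
print(minimal_explanation(["sensor_a", "sensor_b", "sensor_c", "sensor_d"], toy_verifier))
```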

A Suite of Fairness Datasets for Tabular Classification

  • paper_url: http://arxiv.org/abs/2308.00133
  • repo_url: None
  • paper_authors: Martin Hirzel, Michael Feffer
  • for: Supports more rigorous research on improving the fairness of machine-learning classifiers for tabular data.
  • methods: Introduces a suite of functions for fetching 20 fairness datasets together with associated fairness metadata, to enable more thorough experimental evaluation in future fairness-aware machine learning research.
  • results: No experimental results are reported; the contribution is the datasets and metadata for use in follow-up studies.
    Abstract There have been many papers with algorithms for improving fairness of machine-learning classifiers for tabular data. Unfortunately, most use only very few datasets for their experimental evaluation. We introduce a suite of functions for fetching 20 fairness datasets and providing associated fairness metadata. Hopefully, these will lead to more rigorous experimental evaluations in future fairness-aware machine learning research.
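The paper's actual function names and metadata schema are not given in the digest, so the following is only a hypothetical sketch of what a dataset-plus-fairness-metadata interface of this kind could look like.
```python
# Hypothetical interface sketch; the real suite's names, dataset list, and metadata
# fields are assumptions made for illustration only.
from dataclasses import dataclass, field

@dataclass
class FairnessMetadata:
    protected_attributes: list = field(default_factory=list)  # e.g. ["sex", "race"]
    label_column: str = "label"
    favorable_label: int = 1        # label value treated as the positive outcome

_REGISTRY = {
    # name -> (loader, metadata); a real suite would cover 20 datasets
    "toy_credit": (
        lambda: [{"age": 35, "sex": "F", "label": 1}, {"age": 52, "sex": "M", "label": 0}],
        FairnessMetadata(protected_attributes=["sex"]),
    ),
}

def fetch_fairness_dataset(name: str):
    """Return (rows, metadata) so fairness metrics know which columns are protected."""
    loader, meta = _REGISTRY[name]
    return loader(), meta

rows, meta = fetch_fairness_dataset("toy_credit")
print(len(rows), meta.protected_attributes, meta.favorable_label)
```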

A Modular Ontology for MODS – Metadata Object Description Schema

  • paper_url: http://arxiv.org/abs/2308.00116
  • repo_url: None
  • paper_authors: Rushrukh Rayan, Cogan Shimizu, Heidi Sieverding, Pascal Hitzler
  • for: Describes the development of a modular ontology for the Metadata Object Description Schema (MODS).
  • methods: The Modular MODS Ontology (MMODS-O), designed with the Modular Ontology Design Methodology (MOMo), incorporates all elements and attributes of the MODS XML schema while balancing modular, high-quality ontology design against conservative backward compatibility with MODS.
  • results: Reworking the XML-based schema as an ontology makes MODS metadata better suited to modeling as knowledge graph data.
    Abstract The Metadata Object Description Schema (MODS) was developed to describe bibliographic concepts and metadata and is maintained by the Library of Congress. Its authoritative version is given as an XML schema based on an XML mindset which means that it has significant limitations for use in a knowledge graphs context. We have therefore developed the Modular MODS Ontology (MMODS-O) which incorporates all elements and attributes of the MODS XML schema. In designing the ontology, we adopt the recent Modular Ontology Design Methodology (MOMo) with the intention to strike a balance between modularity and quality ontology design on the one hand, and conservative backward compatibility with MODS on the other.

Can A Single Human Supervise A Swarm of 100 Heterogeneous Robots?

  • paper_url: http://arxiv.org/abs/2308.00102
  • repo_url: None
  • paper_authors: Julie A. Adams, Joshua Hamell, Phillip Walker
  • for: Examines whether a single human can supervise a true heterogeneous swarm of robots completing tasks in real-world environments.
  • methods: Uses the Command and Control of Aggregate Swarm Tactics integrator system with a heterogeneous robot swarm during the DARPA OFFSET field exercises at U.S. Army urban training sites.
  • results: A single human swarm commander successfully supervised 100 heterogeneous robots on real-world missions, although the estimated workload frequently crossed the overload threshold.
    Abstract An open research question has been whether a single human can supervise a true heterogeneous swarm of robots completing tasks in real world environments. A general concern is whether or not the human's workload will be taxed to the breaking point. The Defense Advanced Research Projects Agency's OFFensive Swarm-Enabled Tactics (OFFSET) program's field exercises that occurred at U.S. Army urban training sites provided the opportunity to understand the impact of achieving such swarm deployments. The Command and Control of Aggregate Swarm Tactics integrator team's swarm commander uses the heterogeneous robot swarm to conduct relevant missions. During the final OFFSET program field exercise, the team collected objective and subjective metrics related to the swarm commander's human performance. A multi-dimensional workload algorithm that estimates overall workload based on five components of workload was used to analyze the results. While the swarm commander's workload estimate did cross the overload threshold frequently, the swarm commander was able to successfully complete the missions, often under challenging operational conditions. The presented results demonstrate that a single human can deploy a swarm of 100 heterogeneous robots to conduct real-world missions.

Towards Semantically Enriched Embeddings for Knowledge Graph Completion

  • paper_url: http://arxiv.org/abs/2308.00081
  • repo_url: None
  • paper_authors: Mehwish Alam, Frank van Harmelen, Maribel Acosta
  • for: Reviews the state of the art and future directions for embedding-based knowledge graph (KG) completion algorithms.
  • methods: Surveys KG completion algorithms, including transductive and inductive link prediction and entity type prediction, then algorithms that exploit type information within KGs and large language models (LLMs), and finally approaches that capture the semantics represented in description logic axioms.
  • results: The paper argues that combining KGs with LLMs and capturing the semantics of description logic axioms can make KG completion more accurate, and it closes with a critical reflection on the state of the field and recommendations for future work.
    Abstract Embedding based Knowledge Graph (KG) Completion has gained much attention over the past few years. Most of the current algorithms consider a KG as a multidirectional labeled graph and lack the ability to capture the semantics underlying the schematic information. In a separate development, a vast amount of information has been captured within the Large Language Models (LLMs) which has revolutionized the field of Artificial Intelligence. KGs could benefit from these LLMs and vice versa. This vision paper discusses the existing algorithms for KG completion based on the variations for generating KG embeddings. It starts with discussing various KG completion algorithms such as transductive and inductive link prediction and entity type prediction algorithms. It then moves on to the algorithms utilizing type information within the KGs, LLMs, and finally to algorithms capturing the semantics represented in different description logic axioms. We conclude the paper with a critical reflection on the current state of work in the community and give recommendations for future directions.
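As a reminder of what embedding-based KG completion scores look like, here is the classic TransE scoring function, the simplest of the many variants such a survey covers.
```python
# Tiny TransE sketch: a true triple (h, r, t) should satisfy h + r ≈ t, so a lower
# distance means a more plausible triple. Embeddings here are random toys.
import numpy as np

def transe_score(h: np.ndarray, r: np.ndarray, t: np.ndarray, p: int = 1) -> float:
    """Lower is better."""
    return float(np.linalg.norm(h + r - t, ord=p))

rng = np.random.default_rng(0)
dim = 8
emb = {name: rng.normal(size=dim) for name in ["paris", "capital_of", "france"]}
emb["france"] = emb["paris"] + emb["capital_of"]          # make the toy triple hold
print(transe_score(emb["paris"], emb["capital_of"], emb["france"]))  # ≈ 0 for a true triple
```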

A Novel Deep Learning based Model to Defend Network Intrusion Detection System against Adversarial Attacks

  • paper_url: http://arxiv.org/abs/2308.00077
  • repo_url: None
  • paper_authors: Khushnaseeb Roshan, Aasim Zafar, Shiekh Burhan Ul Haque
  • for: Studies powerful adversarial attack algorithms against deep learning based network intrusion detection systems (NIDS) and a defence strategy against them.
  • methods: Implements four strong adversarial attacks, namely the Fast Gradient Sign Method (FGSM), the Jacobian Saliency Map Attack (JSMA), Projected Gradient Descent (PGD), and Carlini & Wagner (C&W), and uses adversarial training as the defence to increase the robustness of the NIDS model.
  • results: Results are reported in three phases: before the adversarial attack, after the adversarial attack, and after the adversarial defence, using the Canadian Institute for Cybersecurity Intrusion Detection System 2017 (CICIDS-2017) dataset and performance measures such as f1-score and accuracy.
    Abstract Network Intrusion Detection System (NIDS) is an essential tool in securing cyberspace from a variety of security risks and unknown cyberattacks. A number of solutions have been implemented for Machine Learning (ML), and Deep Learning (DL) based NIDS. However, all these solutions are vulnerable to adversarial attacks, in which the malicious actor tries to evade or fool the model by injecting adversarial perturbed examples into the system. The main aim of this research work is to study powerful adversarial attack algorithms and their defence method on DL-based NIDS. Fast Gradient Sign Method (FGSM), Jacobian Saliency Map Attack (JSMA), Projected Gradient Descent (PGD) and Carlini & Wagner (C&W) are four powerful adversarial attack methods implemented against the NIDS. As a defence method, Adversarial Training is used to increase the robustness of the NIDS model. The results are summarized in three phases, i.e., 1) before the adversarial attack, 2) after the adversarial attack, and 3) after the adversarial defence. The Canadian Institute for Cybersecurity Intrusion Detection System 2017 (CICIDS-2017) dataset is used for evaluation purposes with various performance measurements like f1-score, accuracy etc.
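Of the four attacks listed, FGSM is the simplest to show; the sketch below applies it to a placeholder NIDS-style classifier and ignores the feature-space constraints (valid ranges, discrete fields) that real network-flow features would impose.
```python
# Minimal FGSM sketch; the model and feature tensor are placeholders, not the paper's
# NIDS architecture or the CICIDS-2017 features.
import torch
import torch.nn as nn

def fgsm_attack(model: nn.Module, x: torch.Tensor, y: torch.Tensor, epsilon: float = 0.05):
    """Perturb inputs in the direction of the sign of the loss gradient."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = nn.functional.cross_entropy(model(x_adv), y)
    loss.backward()
    return (x_adv + epsilon * x_adv.grad.sign()).detach()

# Toy NIDS-style classifier over 20 flow features, 2 classes (benign / attack).
model = nn.Sequential(nn.Linear(20, 32), nn.ReLU(), nn.Linear(32, 2))
x, y = torch.rand(4, 20), torch.tensor([0, 1, 0, 1])
x_adv = fgsm_attack(model, x, y)
print((x_adv - x).abs().max().item())  # perturbation magnitude bounded by epsilon
```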

Crowd Safety Manager: Towards Data-Driven Active Decision Support for Planning and Control of Crowd Events

  • paper_url: http://arxiv.org/abs/2308.00076
  • repo_url: None
  • paper_authors: Panchamy Krishnakumari, Sascha Hoogendoorn-Lanser, Jeroen Steenbakkers, Serge Hoogendoorn
  • for: Proposes new technology and methodology to improve crowd management in both the planning and operational phases, combining innovative data collection, data integration and visualization in a 3D Digital Twin, and AI-based risk identification.
  • methods: Uses the Bowtie model, a comprehensive framework for assessing and predicting risk levels that combines objective estimates such as traffic flow operations and crowdedness with aggravating factors such as weather conditions, sentiment, and visitor purpose; it draws on rich real-time data sources such as Resono for visitor counts and movements.
  • results: Among the compared machine learning models, the XGBoost framework produced the most accurate multi-day-ahead forecasts, although some locations would benefit from additional input data to further improve prediction quality. The work nonetheless contributes to a more effective crowd management system and opens avenues for further advances in this critical field.
    Abstract This paper presents novel technology and methodology aimed at enhancing crowd management in both the planning and operational phases. The approach encompasses innovative data collection techniques, data integration, and visualization using a 3D Digital Twin, along with the incorporation of artificial intelligence (AI) tools for risk identification. The paper introduces the Bowtie model, a comprehensive framework designed to assess and predict risk levels. The model combines objective estimations and predictions, such as traffic flow operations and crowdedness levels, with various aggravating factors like weather conditions, sentiments, and the purpose of visitors, to evaluate the expected risk of incidents. The proposed framework is applied to the Crowd Safety Manager project in Scheveningen, where the DigiTwin is developed based on a wealth of real-time data sources. One noteworthy data source is Resono, offering insights into the number of visitors and their movements, leveraging a mobile phone panel of over 2 million users in the Netherlands. Particular attention is given to the left-hand side of the Bowtie, which includes state estimation, prediction, and forecasting. Notably, the focus is on generating multi-day ahead forecasts for event-planning purposes using Resono data. Advanced machine learning techniques, including the XGBoost framework, are compared, with XGBoost demonstrating the most accurate forecasts. The results indicate that the predictions are adequately accurate. However, certain locations may benefit from additional input data to further enhance prediction quality. Despite these limitations, this work contributes to a more effective crowd management system and opens avenues for further advancements in this critical field.

Reinforcement Learning for Generative AI: State of the Art, Opportunities and Open Research Challenges

  • paper_url: http://arxiv.org/abs/2308.00031
  • repo_url: None
  • paper_authors: Giorgio Franceschelli, Mirco Musolesi
  • for: Surveys the state of the art, opportunities, and open research challenges in applying reinforcement learning (RL) to generative AI.
  • methods: Discusses RL as an alternative way to generate outputs without specified objectives, as a way to generate outputs while concurrently maximizing an objective function, and as a way to embed desired characteristics that cannot easily be captured by an objective function into the generative process.
  • results: Rather than presenting new experiments, the survey closes with an in-depth discussion of the opportunities and challenges in this emerging area.
    Abstract Generative Artificial Intelligence (AI) is one of the most exciting developments in Computer Science of the last decade. At the same time, Reinforcement Learning (RL) has emerged as a very successful paradigm for a variety of machine learning tasks. In this survey, we discuss the state of the art, opportunities and open research questions in applying RL to generative AI. In particular, we will discuss three types of applications, namely, RL as an alternative way for generation without specified objectives; as a way for generating outputs while concurrently maximizing an objective function; and, finally, as a way of embedding desired characteristics, which cannot be easily captured by means of an objective function, into the generative process. We conclude the survey with an in-depth discussion of the opportunities and challenges in this fascinating emerging area.

DiVA-360: The Dynamic Visuo-Audio Dataset for Immersive Neural Fields

  • paper_url: http://arxiv.org/abs/2307.16897
  • repo_url: None
  • paper_authors: Cheng-You Lu, Peisen Zhou, Angela Xing, Chandradeep Pokhariya, Arnab Dey, Ishaan Shah, Rugved Mavidipalli, Dylan Hu, Andrew Comport, Kefan Chen, Srinath Sridhar
  • for: Aims to improve the fidelity with which neural fields capture the shape and appearance of real-world static and dynamic scenes, which is currently limited by the lack of large-scale multimodal datasets.
  • methods: Uses a new hardware system with 53 RGB cameras at 120 FPS and 6 microphones to capture synchronized multi-view visual, audio, and textual data of table-scale scenes.
  • results: Releases DiVA-360, a large-scale real-world dataset with 46 dynamic scenes, 30 static scenes, and 95 static objects, together with detailed text descriptions, foreground-background segmentation masks, category-specific 3D pose alignment for static objects, and comparison metrics.
    Abstract Advances in neural fields are enabling high-fidelity capture of the shape and appearance of static and dynamic scenes. However, their capabilities lag behind those offered by representations such as pixels or meshes due to algorithmic challenges and the lack of large-scale real-world datasets. We address the dataset limitation with DiVA-360, a real-world 360 dynamic visual-audio dataset with synchronized multimodal visual, audio, and textual information about table-scale scenes. It contains 46 dynamic scenes, 30 static scenes, and 95 static objects spanning 11 categories captured using a new hardware system using 53 RGB cameras at 120 FPS and 6 microphones for a total of 8.6M image frames and 1360 s of dynamic data. We provide detailed text descriptions for all scenes, foreground-background segmentation masks, category-specific 3D pose alignment for static objects, as well as metrics for comparison. Our data, hardware and software, and code are available at https://diva360.github.io/.

Predicting masked tokens in stochastic locations improves masked image modeling

  • paper_url: http://arxiv.org/abs/2308.00566
  • repo_url: None
  • paper_authors: Amir Bar, Florian Bordes, Assaf Shocher, Mahmoud Assran, Pascal Vincent, Nicolas Ballas, Trevor Darrell, Amir Globerson, Yann LeCun
  • for: Improves masked image modeling, the self-supervised pretraining approach in which a model must predict semantic content at accurate image locations, which is particularly hard for dense tasks such as segmentation.
  • methods: Proposes FlexPredict, a stochastic model that conditions the predictor on stochastic masked token positions, introducing location uncertainty so that the learned features become more robust to it.
  • results: Improves downstream performance on a range of tasks; for example, compared with MIM baselines, FlexPredict boosts ImageNet linear probing by 1.6% with ViT-B and semi-supervised video segmentation by 2.5% with ViT-L.
    Abstract Self-supervised learning is a promising paradigm in deep learning that enables learning from unlabeled data by constructing pretext tasks that require learning useful representations. In natural language processing, the dominant pretext task has been masked language modeling (MLM), while in computer vision there exists an equivalent called Masked Image Modeling (MIM). However, MIM is challenging because it requires predicting semantic content in accurate locations. E.g, given an incomplete picture of a dog, we can guess that there is a tail, but we cannot determine its exact location. In this work, we propose FlexPredict, a stochastic model that addresses this challenge by incorporating location uncertainty into the model. Specifically, we condition the model on stochastic masked token positions to guide the model toward learning features that are more robust to location uncertainties. Our approach improves downstream performance on a range of tasks, e.g, compared to MIM baselines, FlexPredict boosts ImageNet linear probing by 1.6% with ViT-B and by 2.5% for semi-supervised video segmentation using ViT-L.
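The exact way FlexPredict injects location noise is not given in the digest, so the sketch below only illustrates the underlying idea of conditioning the masked-token predictor on jittered, rather than exact, target positions; the noise model and embedding lookup are assumptions.
```python
# Rough sketch of stochastic masked-token positions; not the paper's exact formulation.
import torch

def stochastic_position_embeddings(positions: torch.Tensor,
                                    pos_table: torch.Tensor,
                                    sigma: float = 1.0) -> torch.Tensor:
    """positions: (n_masked, 2) integer patch coordinates on a grid.
    Instead of the exact masked location, the predictor is conditioned on a jittered
    location, so its targets cannot rely on pixel-perfect placement."""
    noisy = positions.float() + sigma * torch.randn_like(positions.float())
    noisy = noisy.round().long().clamp(0, pos_table.size(0) - 1)   # snap back onto the grid
    return pos_table[noisy[:, 0]] + pos_table[noisy[:, 1]]          # row + column embeddings

grid, dim = 14, 32
pos_table = torch.randn(grid, dim)            # toy positional table shared by rows and columns
masked_xy = torch.randint(0, grid, (5, 2))
print(stochastic_position_embeddings(masked_xy, pos_table).shape)   # (5, 32)
```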

Foundational Models for Fault Diagnosis of Electrical Motors

  • paper_url: http://arxiv.org/abs/2307.16891
  • repo_url: None
  • paper_authors: Sriram Anbalagan, Deepesh Agarwal, Balasubramaniam Natarajan, Babji Srinivasan
  • for: Proposes a foundational model for fault diagnosis of electrical motors.
  • methods: Builds a neural-network backbone that learns high-level features with self-supervised learning, then fine-tunes the backbone for specific target tasks, requiring far less labeled training data than traditional supervised approaches.
  • results: Fine-tuning the backbone achieves more than 90% classification accuracy across different fault scenarios, operating conditions, and even different machines, despite limited training data.
    Abstract A majority of recent advancements related to the fault diagnosis of electrical motors are based on the assumption that training and testing data are drawn from the same distribution. However, the data distribution can vary across different operating conditions during real-world operating scenarios of electrical motors. Consequently, this assumption limits the practical implementation of existing studies for fault diagnosis, as they rely on fully labelled training data spanning all operating conditions and assume a consistent distribution. This is because obtaining a large number of labelled samples for several machines across different fault cases and operating scenarios may be unfeasible. In order to overcome the aforementioned limitations, this work proposes a framework to develop a foundational model for fault diagnosis of electrical motors. It involves building a neural network-based backbone to learn high-level features using self-supervised learning, and then fine-tuning the backbone to achieve specific objectives. The primary advantage of such an approach is that the backbone can be fine-tuned to achieve a wide variety of target tasks using very less amount of training data as compared to traditional supervised learning methodologies. The empirical evaluation demonstrates the effectiveness of the proposed approach by obtaining more than 90\% classification accuracy by fine-tuning the backbone not only across different types of fault scenarios or operating conditions, but also across different machines. This illustrates the promising potential of the proposed approach for cross-machine fault diagnosis tasks in real-world applications.

Learning to Model the World with Language

  • paper_url: http://arxiv.org/abs/2308.01399
  • repo_url: https://github.com/microsoft/OpenKP
  • paper_authors: Jessy Lin, Yuqing Du, Olivia Watkins, Danijar Hafner, Pieter Abbeel, Dan Klein, Anca Dragan
  • for: Aims to build agents that understand the diverse types of language people use, relate language to the visual world, and act on it via language-conditioned future prediction.
  • methods: Presents Dynalang, an agent that learns a multimodal world model predicting future text and image representations and learns to act from imagined model rollouts; past language is used to predict future language, video, and rewards, and the model can also be pretrained on text or video datasets without actions or rewards.
  • results: Leveraging diverse types of language, such as environment descriptions, game rules, and instructions, improves task performance in settings ranging from language hints in grid worlds to navigating photorealistic scans of homes.
    Abstract To interact with humans in the world, agents need to understand the diverse types of language that people use, relate them to the visual world, and act based on them. While current agents learn to execute simple language instructions from task rewards, we aim to build agents that leverage diverse language that conveys general knowledge, describes the state of the world, provides interactive feedback, and more. Our key idea is that language helps agents predict the future: what will be observed, how the world will behave, and which situations will be rewarded. This perspective unifies language understanding with future prediction as a powerful self-supervised learning objective. We present Dynalang, an agent that learns a multimodal world model that predicts future text and image representations and learns to act from imagined model rollouts. Unlike traditional agents that use language only to predict actions, Dynalang acquires rich language understanding by using past language also to predict future language, video, and rewards. In addition to learning from online interaction in an environment, Dynalang can be pretrained on datasets of text, video, or both without actions or rewards. From using language hints in grid worlds to navigating photorealistic scans of homes, Dynalang utilizes diverse types of language to improve task performance, including environment descriptions, game rules, and instructions.

Discovering Adaptable Symbolic Algorithms from Scratch

  • paper_url: http://arxiv.org/abs/2307.16890
  • repo_url: None
  • paper_authors: Stephen Kelly, Daniel S. Park, Xingyou Song, Mitchell McIntire, Pranav Nashikkar, Ritam Guha, Wolfgang Banzhaf, Kalyanmoy Deb, Vishnu Naresh Boddeti, Jie Tan, Esteban Real
  • for: Develops control policies for autonomous robots that adapt rapidly to environmental changes.
  • methods: Proposes AutoRobotics-Zero (ARZ), a method based on AutoML-Zero that discovers zero-shot adaptable policies from scratch. Unlike neural-network adaptation policies, which optimize only model parameters, ARZ evolves control algorithms with the full expressive power of a linear register machine, producing modular policies that tune their parameters and alter their inference algorithm on the fly.
  • results: On a realistic simulated quadruped robot, ARZ evolves safe control policies that avoid falling when individual limbs suddenly break, a task on which two popular neural-network baselines fail; analysis on the challenging non-stationary Cataclysmic Cartpole task confirms that ARZ is significantly more robust to sudden environmental changes and produces simple, interpretable control policies.
    Abstract Autonomous robots deployed in the real world will need control policies that rapidly adapt to environmental changes. To this end, we propose AutoRobotics-Zero (ARZ), a method based on AutoML-Zero that discovers zero-shot adaptable policies from scratch. In contrast to neural network adaption policies, where only model parameters are optimized, ARZ can build control algorithms with the full expressive power of a linear register machine. We evolve modular policies that tune their model parameters and alter their inference algorithm on-the-fly to adapt to sudden environmental changes. We demonstrate our method on a realistic simulated quadruped robot, for which we evolve safe control policies that avoid falling when individual limbs suddenly break. This is a challenging task in which two popular neural network baselines fail. Finally, we conduct a detailed analysis of our method on a novel and challenging non-stationary control task dubbed Cataclysmic Cartpole. Results confirm our findings that ARZ is significantly more robust to sudden environmental changes and can build simple, interpretable control policies.

Image Synthesis under Limited Data: A Survey and Taxonomy

  • paper_url: http://arxiv.org/abs/2307.16879
  • repo_url: https://github.com/kobeshegu/awesome-few-shot-generation
  • paper_authors: Mengping Yang, Zhe Wang
  • for: Provides a systematic review and a novel taxonomy of image synthesis under limited data, to help researchers new to the topic understand the area and conduct related research.
  • methods: Surveys the literature comprehensively, covering the problem definition, requirements, main solutions, popular benchmarks, and the pros, cons, and remaining limitations of existing approaches.
  • results: The survey delivers a clear problem definition and taxonomy of tasks, an in-depth analysis of existing methods, and a thorough discussion of potential applications, open challenges, and future directions for image synthesis under limited data.
    Abstract Deep generative models, which target reproducing the given data distribution to produce novel samples, have made unprecedented advancements in recent years. Their technical breakthroughs have enabled unparalleled quality in the synthesis of visual content. However, one critical prerequisite for their tremendous success is the availability of a sufficient number of training samples, which requires massive computation resources. When trained on limited data, generative models tend to suffer from severe performance deterioration due to overfitting and memorization. Accordingly, researchers have devoted considerable attention to develop novel models that are capable of generating plausible and diverse images from limited training data recently. Despite numerous efforts to enhance training stability and synthesis quality in the limited data scenarios, there is a lack of a systematic survey that provides 1) a clear problem definition, critical challenges, and taxonomy of various tasks; 2) an in-depth analysis on the pros, cons, and remain limitations of existing literature; as well as 3) a thorough discussion on the potential applications and future directions in the field of image synthesis under limited data. In order to fill this gap and provide a informative introduction to researchers who are new to this topic, this survey offers a comprehensive review and a novel taxonomy on the development of image synthesis under limited data. In particular, it covers the problem definition, requirements, main solutions, popular benchmarks, and remain challenges in a comprehensive and all-around manner.

Contrastive Learning for API Aspect Analysis

  • paper_url: http://arxiv.org/abs/2307.16878
  • repo_url: https://github.com/disa-lab/contrastive-learning-api-aspect-ase2023
  • paper_authors: G. M. Shahariar, Tahmid Hasan, Anindya Iqbal, Gias Uddin
  • for: This work develops a new approach, CLAA, for aspect detection in API reviews, based on transformer models trained with a supervised contrastive loss objective.
  • methods: CLAA is evaluated on a benchmark dataset of developer discussions collected from Stack Overflow and compared against state-of-the-art transformer models, complemented by an empirical study and a developer study for impact analysis.
  • results: Experiments show that contrastive learning clearly improves transformer models at detecting aspects such as Performance, Security, Usability, and Documentation; in the impact analysis, using 'Stack Overflow + CLAA' increased accuracy and confidence during API selection.
    Abstract We present a novel approach - CLAA - for API aspect detection in API reviews that utilizes transformer models trained with a supervised contrastive loss objective function. We evaluate CLAA using performance and impact analysis. For performance analysis, we utilized a benchmark dataset on developer discussions collected from Stack Overflow and compare the results to those obtained using state-of-the-art transformer models. Our experiments show that contrastive learning can significantly improve the performance of transformer models in detecting aspects such as Performance, Security, Usability, and Documentation. For impact analysis, we performed empirical and developer study. On a randomly selected and manually labeled 200 online reviews, CLAA achieved 92% accuracy while the SOTA baseline achieved 81.5%. According to our developer study involving 10 participants, the use of 'Stack Overflow + CLAA' resulted in increased accuracy and confidence during API selection. Replication package: https://github.com/disa-lab/Contrastive-Learning-API-Aspect-ASE2023
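The abstract's key ingredient is a supervised contrastive training objective for the transformer encoder. Below is a minimal sketch of a standard supervised contrastive loss of that kind (in the style of SupCon); the temperature, batch layout, and the assumption that aspect labels define positive pairs are ours, not necessarily the exact CLAA setup.

```python
import torch
import torch.nn.functional as F

def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """embeddings: (N, d) pooled review embeddings; labels: (N,) aspect ids."""
    z = F.normalize(embeddings, dim=1)
    sim = z @ z.t() / temperature                      # pairwise cosine similarities
    self_mask = torch.eye(len(z), dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, float("-inf"))    # exclude self-pairs
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    positives = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    # pull together reviews that share an aspect label, push apart the rest
    loss = -(log_prob * positives).sum(dim=1) / positives.sum(dim=1).clamp(min=1)
    return loss.mean()

emb = torch.randn(8, 16)                               # stand-in for transformer [CLS] embeddings
aspects = torch.tensor([0, 0, 1, 1, 2, 2, 3, 3])       # e.g. Performance/Security/Usability/Documentation
print(supervised_contrastive_loss(emb, aspects))
```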

Evaluating Correctness and Faithfulness of Instruction-Following Models for Question Answering

  • paper_url: http://arxiv.org/abs/2307.16877
  • repo_url: https://github.com/mcgill-nlp/instruct-qa
  • paper_authors: Vaibhav Adlakha, Parishad BehnamGhader, Xing Han Lu, Nicholas Meade, Siva Reddy
  • for: This paper examines how retriever-augmented instruction-following models perform on question answering tasks, and whether traditional evaluation metrics accurately reflect their performance.
  • methods: The paper evaluates these models with both automatic and human evaluation along two dimensions: correctness with respect to the user's information need, and faithfulness to the provided knowledge.
  • results: The study finds that these models can achieve high correctness, but they struggle to stick to the provided knowledge and often hallucinate in their responses.
    Abstract Retriever-augmented instruction-following models are attractive alternatives to fine-tuned approaches for information-seeking tasks such as question answering (QA). By simply prepending retrieved documents in its input along with an instruction, these models can be adapted to various information domains and tasks without additional fine-tuning. While the model responses tend to be natural and fluent, the additional verbosity makes traditional QA evaluation metrics such as exact match (EM) and F1 unreliable for accurately quantifying model performance. In this work, we investigate the performance of instruction-following models across three information-seeking QA tasks. We use both automatic and human evaluation to evaluate these models along two dimensions: 1) how well they satisfy the user's information need (correctness), and 2) whether they produce a response based on the provided knowledge (faithfulness). Guided by human evaluation and analysis, we highlight the shortcomings of traditional metrics for both correctness and faithfulness. We then propose simple token-overlap based and model-based metrics that reflect the true performance of these models. Our analysis reveals that instruction-following models are competitive, and sometimes even outperform fine-tuned models for correctness. However, these models struggle to stick to the provided knowledge and often hallucinate in their responses. We hope our work encourages a more holistic evaluation of instruction-following models for QA. Our code and data is available at https://github.com/McGill-NLP/instruct-qa
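The abstract argues that exact match and F1 under-credit verbose but correct answers and proposes simple token-overlap based metrics instead. The snippet below sketches one plausible token-overlap recall metric for illustration; the normalization and the exact metric definition used in the paper are assumptions here.

```python
import re

def normalize(text):
    """Lowercase, strip punctuation, and split into tokens."""
    return re.sub(r"[^a-z0-9 ]", " ", text.lower()).split()

def token_recall(response, reference):
    """Fraction of reference-answer tokens that appear in the model response."""
    ref_tokens = normalize(reference)
    resp_tokens = set(normalize(response))
    if not ref_tokens:
        return 0.0
    return sum(t in resp_tokens for t in ref_tokens) / len(ref_tokens)

# A verbose but correct answer still scores 1.0, whereas exact match scores 0.
print(token_recall("Sure! Based on the passage, the capital of France is Paris.", "Paris"))
```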

Towards Trustworthy and Aligned Machine Learning: A Data-centric Survey with Causality Perspectives

  • paper_url: http://arxiv.org/abs/2307.16851
  • repo_url: None
  • paper_authors: Haoyang Liu, Maheep Chaudhary, Haohan Wang
  • for: This survey systematically reviews the past decade of research on trustworthy machine learning, covering robustness, security, interpretability, and fairness.
  • methods: The paper connects methods that were developed independently across these subfields, framing them in a unified data-centric language grounded in Pearl's hierarchy of causality.
  • results: The survey links these methods through a shared mathematical vocabulary, revealing their convergence, and extends the discussion to the trustworthiness of large pretrained models, including fine-tuning, parameter-efficient fine-tuning, prompting, and reinforcement learning with human feedback.
    Abstract The trustworthiness of machine learning has emerged as a critical topic in the field, encompassing various applications and research areas such as robustness, security, interpretability, and fairness. The last decade saw the development of numerous methods addressing these challenges. In this survey, we systematically review these advancements from a data-centric perspective, highlighting the shortcomings of traditional empirical risk minimization (ERM) training in handling challenges posed by the data. Interestingly, we observe a convergence of these methods, despite being developed independently across trustworthy machine learning subfields. Pearl's hierarchy of causality offers a unifying framework for these techniques. Accordingly, this survey presents the background of trustworthy machine learning development using a unified set of concepts, connects this language to Pearl's causal hierarchy, and finally discusses methods explicitly inspired by causality literature. We provide a unified language with mathematical vocabulary to link these methods across robustness, adversarial robustness, interpretability, and fairness, fostering a more cohesive understanding of the field. Further, we explore the trustworthiness of large pretrained models. After summarizing dominant techniques like fine-tuning, parameter-efficient fine-tuning, prompting, and reinforcement learning with human feedback, we draw connections between them and the standard ERM. This connection allows us to build upon the principled understanding of trustworthy methods, extending it to these new techniques in large pretrained models, paving the way for future methods. Existing methods under this perspective are also reviewed. Lastly, we offer a brief summary of the applications of these methods and discuss potential future aspects related to our survey. For more information, please visit http://trustai.one.

Decidable Fragments of LTLf Modulo Theories (Extended Version)

  • paper_url: http://arxiv.org/abs/2307.16840
  • repo_url: None
  • paper_authors: Luca Geatti, Alessandro Gianola, Nicola Gigante, Sarah Winkler
  • for: This paper studies Linear Temporal Logic Modulo Theories over Finite Traces (LTLfMT).
  • methods: The paper introduces a new sound and complete pruning rule for the tableau-based semi-decision procedure for LTLfMT satisfiability.
  • results: The paper proves that for any LTLfMT formula satisfying an abstract semantic condition called finite memory, the tableau augmented with the new rule is guaranteed to terminate, yielding novel decidability results for several fragments of LTLfMT as well as new decidability proofs for classes that were already known.
    Abstract We study Linear Temporal Logic Modulo Theories over Finite Traces (LTLfMT), a recently introduced extension of LTL over finite traces (LTLf) where propositions are replaced by first-order formulas and where first-order variables referring to different time points can be compared. In general, LTLfMT was shown to be semi-decidable for any decidable first-order theory (e.g., linear arithmetics), with a tableau-based semi-decision procedure. In this paper we present a sound and complete pruning rule for the LTLfMT tableau. We show that for any LTLfMT formula that satisfies an abstract, semantic condition, that we call finite memory, the tableau augmented with the new rule is also guaranteed to terminate. Last but not least, this technique allows us to establish novel decidability results for the satisfiability of several fragments of LTLfMT, as well as to give new decidability proofs for classes that are already known.
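To make the extension concrete: in LTLfMT, atomic propositions are replaced by first-order formulas, and variables referring to different time points can be compared. A small illustrative formula of this kind over linear integer arithmetic (our own example, not taken from the paper, with x' denoting the value of x at the next instant) is:

```latex
\[
  \mathsf{G}\,(x' = x + 1) \;\wedge\; \mathsf{F}\,(x > 100)
\]
```

It states that a counter increases by one at every step and eventually exceeds 100, so satisfiability over finite traces depends on the arithmetic theory as well as on the temporal structure.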

Alpha-GPT: Human-AI Interactive Alpha Mining for Quantitative Investment

  • paper_url: http://arxiv.org/abs/2308.00016
  • repo_url: None
  • paper_authors: Saizhuo Wang, Hang Yuan, Leon Zhou, Lionel M. Ni, Heung-Yeung Shum, Jian Guo
  • for: This paper addresses alpha mining, the discovery of effective trading signals or factors, and proposes a new human-AI interactive paradigm for it.
  • methods: The paper leverages large language models through a novel prompt engineering algorithmic framework to realize this human-AI interactive alpha mining paradigm.
  • results: Through a number of alpha mining experiments, the paper demonstrates the effectiveness and advantages of the Alpha-GPT framework, which helps "understand" the ideas of quant researchers and outputs creative, insightful, and effective alphas.
    Abstract One of the most important tasks in quantitative investment research is mining new alphas (effective trading signals or factors). Traditional alpha mining methods, either hand-crafted factor synthesizing or algorithmic factor mining (e.g., search with genetic programming), have inherent limitations, especially in implementing the ideas of quants. In this work, we propose a new alpha mining paradigm by introducing human-AI interaction, and a novel prompt engineering algorithmic framework to implement this paradigm by leveraging the power of large language models. Moreover, we develop Alpha-GPT, a new interactive alpha mining system framework that provides a heuristic way to ``understand'' the ideas of quant researchers and outputs creative, insightful, and effective alphas. We demonstrate the effectiveness and advantage of Alpha-GPT via a number of alpha mining experiments.
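Since the framework centers on prompt engineering plus human-AI iteration, here is a minimal, hypothetical sketch of such a loop. The prompt wording, the `llm` and `backtest` callables, and the acceptance threshold are placeholders of ours, not the Alpha-GPT implementation.

```python
PROMPT_TEMPLATE = (
    "You are a quantitative researcher. Express the following trading idea as a "
    "formulaic alpha over daily fields open, high, low, close, volume.\n"
    "Idea: {idea}\nAlpha expression:"
)

def mine_alpha(idea, llm, backtest, rounds=3, threshold=0.05):
    """llm: prompt -> text; backtest: expression -> information coefficient (both supplied by the caller)."""
    expression, score = None, float("-inf")
    feedback = ""
    for _ in range(rounds):
        prompt = PROMPT_TEMPLATE.format(idea=idea + feedback)
        expression = llm(prompt)        # e.g. "-correlation(rank(volume), rank(close), 10)"
        score = backtest(expression)
        if score > threshold:           # good enough: hand back to the human researcher
            break
        feedback = f"\nA previous attempt scored IC={score:.3f}; try a different structure."
    return expression, score
```

In an interactive system the human would also inspect and edit the generated expressions between rounds; the loop above only shows the automated part of that exchange.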

Recent advancement in Disease Diagnostic using machine learning: Systematic survey of decades, comparisons, and challenges

  • paper_url: http://arxiv.org/abs/2308.01319
  • repo_url: None
  • paper_authors: Farzaneh Tajidini, Mohammad-Javad Kheiri
  • for: This paper surveys computer-aided diagnosis and the application of machine learning techniques to disease detection and diagnosis.
  • methods: The survey covers machine learning approaches, including learning from examples (pattern recognition) and deep learning, for analyzing high-dimensional and multimodal biomedical data to improve diagnostic precision.
  • results: The paper summarizes the machine learning algorithms and techniques that have been applied to diagnosing diseases such as hepatitis, diabetes, liver disease, dengue fever, and heart disease, and their effectiveness.
    Abstract Computer-aided diagnosis (CAD), a vibrant medical imaging research field, is expanding quickly. Because errors in medical diagnostic systems might lead to seriously misleading medical treatments, major efforts have been made in recent years to improve computer-aided diagnostics applications. The use of machine learning in computer-aided diagnosis is crucial. A simple equation may result in a false indication of items like organs. Therefore, learning from examples is a vital component of pattern recognition. Pattern recognition and machine learning in the biomedical area promise to increase the precision of disease detection and diagnosis. They also support the decision-making process's objectivity. Machine learning provides a practical method for creating elegant and autonomous algorithms to analyze high-dimensional and multimodal bio-medical data. This review article examines machine-learning algorithms for detecting diseases, including hepatitis, diabetes, liver disease, dengue fever, and heart disease. It draws attention to the collection of machine learning techniques and algorithms employed in studying conditions and the ensuing decision-making process.

Getting from Generative AI to Trustworthy AI: What LLMs might learn from Cyc

  • paper_url: http://arxiv.org/abs/2308.04445
  • repo_url: None
  • paper_authors: Doug Lenat, Gary Marcus
  • for: The paper is written to address the limitations of current AI approaches, particularly the lack of reasoning ability and unpredictability of large language models (LLMs).
  • methods: The paper proposes an alternative approach to AI that combines the strengths of LLMs with the reasoning ability of symbolic AI systems, using curated pieces of explicit knowledge and rules of thumb to enable an inference engine to automatically deduce logical entailments.
  • results: The paper describes how one AI system, Cyc, has developed ways to overcome the tradeoff between expressiveness and speed in reasoning, allowing it to reason in higher order logic in real time. The authors suggest that any trustworthy general AI will need to hybridize the LLM and symbolic approaches, and lay out a path to realizing this dream.
    Abstract Generative AI, the most popular current approach to AI, consists of large language models (LLMs) that are trained to produce outputs that are plausible, but not necessarily correct. Although their abilities are often uncanny, they are lacking in aspects of reasoning, leading LLMs to be less than completely trustworthy. Furthermore, their results tend to be both unpredictable and uninterpretable. We lay out 16 desiderata for future AI, and discuss an alternative approach to AI which could theoretically address many of the limitations associated with current approaches: AI educated with curated pieces of explicit knowledge and rules of thumb, enabling an inference engine to automatically deduce the logical entailments of all that knowledge. Even long arguments produced this way can be both trustworthy and interpretable, since the full step-by-step line of reasoning is always available, and for each step the provenance of the knowledge used can be documented and audited. There is however a catch: if the logical language is expressive enough to fully represent the meaning of anything we can say in English, then the inference engine runs much too slowly. That's why symbolic AI systems typically settle for some fast but much less expressive logic, such as knowledge graphs. We describe how one AI system, Cyc, has developed ways to overcome that tradeoff and is able to reason in higher order logic in real time. We suggest that any trustworthy general AI will need to hybridize the approaches, the LLM approach and more formal approach, and lay out a path to realizing that dream.
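The symbolic approach the authors describe rests on an inference engine that deduces entailments step by step, with an auditable provenance for every conclusion. The toy forward-chaining sketch below illustrates only that auditability property; the rule format is an assumption of ours and is vastly simpler than Cyc's higher-order logic.

```python
def forward_chain(facts, rules):
    """facts: iterable of statements; rules: list of (premises, conclusion) pairs.
    Returns a mapping from each known statement to how it was obtained."""
    provenance = {f: ("given", []) for f in facts}
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in provenance and all(p in provenance for p in premises):
                provenance[conclusion] = ("derived from", list(premises))
                changed = True
    return provenance

kb = forward_chain(
    facts={"Socrates is a man"},
    rules=[(["Socrates is a man"], "Socrates is mortal")],
)
print(kb["Socrates is mortal"])   # ('derived from', ['Socrates is a man'])
```

Every derived statement carries the premises that produced it, which is the kind of step-by-step, documentable line of reasoning the abstract contrasts with opaque LLM outputs.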

On the use of associative memory in Hopfield networks designed to solve propositional satisfiability problems

  • paper_url: http://arxiv.org/abs/2307.16807
  • repo_url: https://github.com/nata-web/SO_for_SAT
  • paper_authors: Natalya Weber, Werner Koch, Ozan Erdem, Tom Froese
  • for: The paper studies Hopfield networks equipped with the Self-Optimization (SO) model as solvers for computational problems, since they provide a biologically plausible mechanism.
  • methods: The SO model combines a biologically founded Hebbian learning rule with repeated resets of the network to arbitrary initial states, so that the network optimizes its own behavior toward a desirable goal state encoded in the network.
  • results: Using the Liars problem and the map coloring problem as examples, the paper shows that the SO model can solve concrete combinatorial problems in SAT form, but also that under some conditions critical information can be lost forever, with the learned network producing seemingly optimal solutions that are in fact inappropriate for the problem; this apparent side-effect offers insight into how the model tackles intractable problems.
    Abstract Hopfield networks are an attractive choice for solving many types of computational problems because they provide a biologically plausible mechanism. The Self-Optimization (SO) model adds to the Hopfield network by using a biologically founded Hebbian learning rule, in combination with repeated network resets to arbitrary initial states, for optimizing its own behavior towards some desirable goal state encoded in the network. In order to better understand that process, we demonstrate first that the SO model can solve concrete combinatorial problems in SAT form, using two examples of the Liars problem and the map coloring problem. In addition, we show how under some conditions critical information might get lost forever with the learned network producing seemingly optimal solutions that are in fact inappropriate for the problem it was tasked to solve. What appears to be an undesirable side-effect of the SO model, can provide insight into its process for solving intractable problems.
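The SO mechanism the abstract refers to can be stated simply: relax a Hopfield network from a random state, apply a small Hebbian update toward the attractor reached, and repeat from a fresh random reset. The sketch below illustrates that loop under our own assumptions about sizes, learning rate, and schedule; it is not the authors' code (their repository is linked above).

```python
import numpy as np

rng = np.random.default_rng(0)
N = 20
W = rng.normal(0, 1, (N, N))
W = (W + W.T) / 2                          # symmetric constraint weights (e.g. encoding a SAT instance)
np.fill_diagonal(W, 0)
L = np.zeros((N, N))                       # learned associative weights, initially zero
alpha = 0.001                              # Hebbian learning rate

def relax(state, weights, steps=300):
    """Asynchronous Hopfield updates until (approximately) settled."""
    for _ in range(steps):
        i = rng.integers(N)
        state[i] = 1 if weights[i] @ state >= 0 else -1
    return state

for reset in range(500):
    s = rng.choice([-1, 1], size=N)        # reset to an arbitrary initial state
    s = relax(s, W + L)                    # relax under constraints plus learned associations
    L += alpha * np.outer(s, s)            # Hebbian reinforcement of the attractor reached
    np.fill_diagonal(L, 0)
```

The failure mode the paper highlights is visible in this loop: once the learned weights L dominate, the network can keep settling into a memorized attractor even when it no longer satisfies the original constraints in W.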

Multiobjective Evolutionary Component Effect on Algorithm behavior

  • paper_url: http://arxiv.org/abs/2308.02527
  • repo_url: None
  • paper_authors: Yuri Lavinas, Marcelo Ladeira, Gabriela Ochoa, Claus Aranha
  • for: This paper aims to investigate the effects of the final configuration of an automatically designed multiobjective evolutionary algorithm (MOEA) on the algorithm’s performance.
  • methods: The paper specifies a methodology to analyze the impact of the algorithm components on the MOEA's behavior, using Search Trajectory Networks (STNs), population diversity, and anytime hypervolume values.
  • results: The study finds that the MOEA converges to good hypervolume values in analytical artificial and real-world problems, but the search is still ongoing in simulated real-world problems. The paper also observes a diverse set of trajectories in the analytical artificial problems, and these trajectories are more similar and frequently reach optimal solutions in the other problems.
    Abstract The performance of multiobjective evolutionary algorithms (MOEAs) varies across problems, making it hard to develop new algorithms or apply existing ones to new problems. To simplify the development and application of new multiobjective algorithms, there has been an increasing interest in their automatic design from their components. These automatically designed metaheuristics can outperform their human-developed counterparts. However, it is still unknown what are the most influential components that lead to performance improvements. This study specifies a new methodology to investigate the effects of the final configuration of an automatically designed algorithm. We apply this methodology to a tuned Multiobjective Evolutionary Algorithm based on Decomposition (MOEA/D) designed by the iterated racing (irace) configuration package on constrained problems of 3 groups: (1) analytical real-world problems, (2) analytical artificial problems and (3) simulated real-world. We then compare the impact of the algorithm components in terms of their Search Trajectory Networks (STNs), the diversity of the population, and the anytime hypervolume values. Looking at the objective space behavior, the MOEAs studied converged before half of the search to generally good HV values in the analytical artificial problems and the analytical real-world problems. For the simulated problems, the HV values are still improving at the end of the run. In terms of decision space behavior, we see a diverse set of the trajectories of the STNs in the analytical artificial problems. These trajectories are more similar and frequently reach optimal solutions in the other problems.
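For readers unfamiliar with the tuned algorithm family: MOEA/D decomposes a multiobjective problem into scalar subproblems, typically with a weighted Tchebycheff scalarization. The sketch below shows that scalarization in isolation; the weight vectors and reference point are illustrative assumptions, and it omits the component choices the paper actually studies.

```python
import numpy as np

def tchebycheff(objectives, weights, ideal_point):
    """Scalarize one candidate's objective vector for a given weight vector (minimization)."""
    return float(np.max(weights * np.abs(objectives - ideal_point)))

# One scalar subproblem per weight vector; MOEA/D evolves them jointly,
# sharing offspring among subproblems with neighboring weights.
weights = np.array([[w, 1.0 - w] for w in np.linspace(0.0, 1.0, 11)])
ideal = np.array([0.0, 0.0])               # best value seen so far on each objective
candidate = np.array([0.3, 0.8])
scores = [tchebycheff(candidate, w, ideal) for w in weights]
print(min(scores), max(scores))
```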

Structural Transfer Learning in NL-to-Bash Semantic Parsers

  • paper_url: http://arxiv.org/abs/2307.16795
  • repo_url: None
  • paper_authors: Kyle Duffy, Satwik Bhattamishra, Phil Blunsom
  • for: This work aims to make progress toward a quantitative understanding of the design of pre-training datasets.
  • methods: The paper proposes a methodology for quantifying structural overlap between machine translation tasks, applies it to the natural language to Bash semantic parsing task (NLBash), and shows that NLBash is largely reducible to lexical alignment; it also finds strong structural overlap between natural language to SQL and NLBash.
  • results: In a study varying the compute expended during pre-training on English to German machine translation, the paper finds that more pre-training compute does not always yield semantic representations with stronger transfer to NLBash.
    Abstract Large-scale pre-training has made progress in many fields of natural language processing, though little is understood about the design of pre-training datasets. We propose a methodology for obtaining a quantitative understanding of structural overlap between machine translation tasks. We apply our methodology to the natural language to Bash semantic parsing task (NLBash) and show that it is largely reducible to lexical alignment. We also find that there is strong structural overlap between NLBash and natural language to SQL. Additionally, we perform a study varying the compute expended during pre-training on the English to German machine translation task and find that more compute expended during pre-training does not always correspond to semantic representations with stronger transfer to NLBash.