cs.AI - 2023-08-15

REFORMS: Reporting Standards for Machine Learning Based Science

  • paper_url: http://arxiv.org/abs/2308.07832
  • repo_url: None
  • paper_authors: Sayash Kapoor, Emily Cantrell, Kenny Peng, Thanh Hien Pham, Christopher A. Bail, Odd Erik Gundersen, Jake M. Hofman, Jessica Hullman, Michael A. Lones, Momin M. Malik, Priyanka Nanayakkara, Russell A. Poldrack, Inioluwa Deborah Raji, Michael Roberts, Matthew J. Salganik, Marta Serra-Garcia, Brandon M. Stewart, Gilles Vandewiele, Arvind Narayanan
  • for: This paper aims to provide clear reporting standards for machine-learning-based science.
  • methods: The paper presents a checklist named REFORMS (Reporting Standards For Machine Learning Based Science), consisting of 32 questions and a paired set of guidelines. REFORMS was developed based on a consensus of 19 researchers across computer science, data science, mathematics, social sciences, and biomedical sciences.
  • results: The paper provides a resource for researchers when designing and implementing a study, for referees when reviewing papers, and for journals when enforcing standards for transparency and reproducibility.
    Abstract Machine learning (ML) methods are proliferating in scientific research. However, the adoption of these methods has been accompanied by failures of validity, reproducibility, and generalizability. These failures can hinder scientific progress, lead to false consensus around invalid claims, and undermine the credibility of ML-based science. ML methods are often applied and fail in similar ways across disciplines. Motivated by this observation, our goal is to provide clear reporting standards for ML-based science. Drawing from an extensive review of past literature, we present the REFORMS checklist ($\textbf{Re}$porting Standards $\textbf{For}$ $\textbf{M}$achine Learning Based $\textbf{S}$cience). It consists of 32 questions and a paired set of guidelines. REFORMS was developed based on a consensus of 19 researchers across computer science, data science, mathematics, social sciences, and biomedical sciences. REFORMS can serve as a resource for researchers when designing and implementing a study, for referees when reviewing papers, and for journals when enforcing standards for transparency and reproducibility.

Tightest Admissible Shortest Path

  • paper_url: http://arxiv.org/abs/2308.08453
  • repo_url: None
  • paper_authors: Eyal Weiss, Ariel Felner, Gal A. Kaminka
  • for: Solving the shortest path problem in weighted directed graphs while accounting for edge-weight computation time and its common relation to weight uncertainty.
  • methods: Building on a recently proposed generalized framework in which edge weights can be estimated multiple times at increasing accuracy and run-time expense, the paper introduces the tightest admissible shortest path (TASP) problem: a generalization of the shortest path problem to bounded uncertainty, where edge-weight uncertainty can be traded for computational cost (see the sketch after this entry).
  • results: The paper presents a complete algorithm for solving TASP with guarantees on solution quality; empirical evaluation supports the effectiveness of the approach.
    Abstract The shortest path problem in graphs is fundamental to AI. Nearly all variants of the problem and relevant algorithms that solve them ignore edge-weight computation time and its common relation to weight uncertainty. This implies that taking these factors into consideration can potentially lead to a performance boost in relevant applications. Recently, a generalized framework for weighted directed graphs was suggested, where edge-weight can be computed (estimated) multiple times, at increasing accuracy and run-time expense. We build on this framework to introduce the problem of finding the tightest admissible shortest path (TASP); a path with the tightest suboptimality bound on the optimal cost. This is a generalization of the shortest path problem to bounded uncertainty, where edge-weight uncertainty can be traded for computational cost. We present a complete algorithm for solving TASP, with guarantees on solution quality. Empirical evaluation supports the effectiveness of this approach.
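The trade-off the paper formalizes is easy to picture: each edge carries an interval bound that tightens as more computation is spent on it, and a candidate path is acceptable once its suboptimality bound is tight enough. Below is a minimal Python sketch of that idea; the interval representation, the widest-interval-first refinement rule, and the 1.2 tightness target are illustrative assumptions, not the authors' algorithm.

```python
class UncertainEdge:
    """Edge whose weight is known only as an interval that tightens per estimate."""
    def __init__(self, intervals):
        # Successively tighter (lo, hi) bounds, e.g. from coarse to fine
        # estimation procedures; each refinement level costs more to compute.
        self.intervals = intervals
        self.level = 0

    def bounds(self):
        return self.intervals[self.level]

    def refine(self):  # pay computation to tighten the bound
        self.level = min(self.level + 1, len(self.intervals) - 1)


def suboptimality_bound(path_edges):
    """Upper/lower cost ratio of a candidate path: a bound on how far the
    path can be from optimal, assuming its lower bound is the best available."""
    lo = sum(e.bounds()[0] for e in path_edges)
    hi = sum(e.bounds()[1] for e in path_edges)
    return hi / lo if lo > 0 else float("inf")


edges = [UncertainEdge([(1.0, 4.0), (1.5, 2.5), (1.9, 2.1)]) for _ in range(3)]
while suboptimality_bound(edges) > 1.2:  # target tightness, chosen arbitrarily
    # Spend computation where it helps most: the widest remaining interval.
    max(edges, key=lambda e: e.bounds()[1] - e.bounds()[0]).refine()
print(suboptimality_bound(edges))        # now <= 1.2
```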

Learning to Identify Critical States for Reinforcement Learning from Videos

  • paper_url: http://arxiv.org/abs/2308.07795
  • repo_url: https://github.com/ai-initiative-kaust/videorlcs
  • paper_authors: Haozhe Liu, Mingchen Zhuge, Bing Li, Yuhui Wang, Francesco Faccio, Bernard Ghanem, Jürgen Schmidhuber
  • for: Extracting useful policy information for deep reinforcement learning from video data that lacks explicit action information.
  • methods: The proposed Deep State Identifier learns to predict returns from episodes encoded as videos, then applies a mask-based sensitivity analysis to extract and identify the important critical states (see the sketch after this entry).
  • results: Extensive experiments showcase the method's potential for understanding and improving agent behavior; the source code and generated datasets are available on GitHub.
    Abstract Recent work on deep reinforcement learning (DRL) has pointed out that algorithmic information about good policies can be extracted from offline data which lack explicit information about executed actions. For example, videos of humans or robots may convey a lot of implicit information about rewarding action sequences, but a DRL machine that wants to profit from watching such videos must first learn by itself to identify and recognize relevant states/actions/rewards. Without relying on ground-truth annotations, our new method called Deep State Identifier learns to predict returns from episodes encoded as videos. Then it uses a kind of mask-based sensitivity analysis to extract/identify important critical states. Extensive experiments showcase our method's potential for understanding and improving agent behavior. The source code and the generated datasets are available at https://github.com/AI-Initiative-KAUST/VideoRLCS.
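Mask-based sensitivity analysis has a compact core: occlude one state at a time and measure how much the predicted return moves. The sketch below assumes a trained return predictor and a simple zero-mask; the paper learns the mask jointly with the return predictor, so treat this as a simplified stand-in.

```python
import numpy as np

def critical_states(episode, predict_return, top_k=3):
    """Rank states by how much masking them changes the predicted return."""
    base = predict_return(episode)
    scores = []
    for t in range(len(episode)):
        masked = episode.copy()
        masked[t] = 0.0                      # occlude frame t (zero mask)
        scores.append(abs(predict_return(masked) - base))
    return np.argsort(scores)[::-1][:top_k]  # most critical states first

# Toy usage: a "return predictor" that only looks at frames 2 and 5.
episode = np.random.rand(8, 4)               # 8 frames, 4 features each
predict = lambda ep: ep[2].sum() + 2.0 * ep[5].sum()
print(critical_states(episode, predict))     # frames 5 and 2 rank highest
```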

Implementing Quantum Generative Adversarial Network (qGAN) and QCBM in Finance

  • paper_url: http://arxiv.org/abs/2308.08448
  • repo_url: None
  • paper_authors: Santanu Ganguly
  • for: Discussing upcoming active research areas in applying quantum machine learning (QML) to finance, such as stock price prediction and asset risk management and valuation.
  • methods: Using a real-world financial dataset and simulated environments, the paper compares QML models such as qGAN (quantum generative adversarial network) and QCBM (quantum circuit Born machine), defining quantum circuits for the qGAN's discriminators and generators (a toy QCBM sketch follows this entry).
  • results: The study shows promise of future quantum advantage via QML in finance.
    Abstract Quantum machine learning (QML) is a cross-disciplinary subject made up of two of the most exciting research areas: quantum computing and classical machine learning (ML), with ML and artificial intelligence (AI) being projected as the first fields that will be impacted by the rise of quantum machines. Quantum computers are being used today in drug discovery, material & molecular modelling and finance. In this work, we discuss some upcoming active new research areas in application of quantum machine learning (QML) in finance. We discuss certain QML models that have become areas of active interest in the financial world for various applications. We use real world financial dataset and compare models such as qGAN (quantum generative adversarial networks) and QCBM (quantum circuit Born machine) among others, using simulated environments. For the qGAN, we define quantum circuits for discriminators and generators and show promises of future quantum advantage via QML in finance.
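A quantum circuit Born machine is compact enough to sketch: a parameterized circuit whose measured bitstring distribution (the Born rule) is trained toward a data distribution. The PennyLane sketch below uses an arbitrary layered ansatz, a made-up target histogram, and a squared-error loss; the paper's circuits and training objective are not given in the abstract, so every concrete choice here is an assumption.

```python
import pennylane as qml
from pennylane import numpy as np  # autograd-aware numpy shim

n_qubits = 3
dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev)
def qcbm(params):
    for layer in params:                     # alternating rotation/entangling layers
        for w in range(n_qubits):
            qml.RY(layer[w], wires=w)
        for w in range(n_qubits - 1):
            qml.CNOT(wires=[w, w + 1])
    return qml.probs(wires=range(n_qubits))  # Born-rule distribution over bitstrings

# Toy target: a bimodal histogram standing in for discretized financial returns.
target = np.array([0.4, 0.0, 0.1, 0.0, 0.0, 0.1, 0.0, 0.4])

def loss(params):
    return np.sum((qcbm(params) - target) ** 2)  # MMD losses are also common

params = np.array([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]], requires_grad=True)
opt = qml.GradientDescentOptimizer(stepsize=0.5)
for _ in range(50):
    params = opt.step(loss, params)
print(qcbm(params))                          # distribution pulled toward the target
```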

Informed Named Entity Recognition Decoding for Generative Language Models

  • paper_url: http://arxiv.org/abs/2308.07791
  • repo_url: None
  • paper_authors: Tobias Deußer, Lars Hillebrand, Christian Bauckhage, Rafet Sifa
  • for: Improving the performance of named entity recognition (NER) with generative language models.
  • methods: The paper proposes a simple yet effective approach, Informed Named Entity Recognition Decoding (iNERD), which treats NER as a generative process: it leverages the language-understanding capabilities of recent generative models and employs an informed decoding scheme that incorporates the restricted nature of information extraction into open-ended text generation, improving performance and eliminating the risk of hallucinations (see the decoding sketch after this entry).
  • results: Coarse-tuned on a merged named-entity corpus and evaluated with five generative language models on eight NER datasets, the approach achieves remarkable results, especially in an environment with an unknown entity class set, demonstrating its adaptability.
    Abstract Ever-larger language models with ever-increasing capabilities are by now well-established text processing tools. Alas, information extraction tasks such as named entity recognition are still largely unaffected by this progress as they are primarily based on the previous generation of encoder-only transformer models. Here, we propose a simple yet effective approach, Informed Named Entity Recognition Decoding (iNERD), which treats named entity recognition as a generative process. It leverages the language understanding capabilities of recent generative models in a future-proof manner and employs an informed decoding scheme incorporating the restricted nature of information extraction into open-ended text generation, improving performance and eliminating any risk of hallucinations. We coarse-tune our model on a merged named entity corpus to strengthen its performance, evaluate five generative language models on eight named entity recognition datasets, and achieve remarkable results, especially in an environment with an unknown entity class set, demonstrating the adaptability of the approach.
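An informed decoding scheme restricts, at each generation step, which tokens the model may emit, so the extractor can only output spans grounded in the input. The generic constrained greedy loop below, with a stub scoring function, is an assumption about what such a scheme can look like, not the authors' exact algorithm.

```python
def informed_decode(input_tokens, next_token_scores, end_token="<end>", max_len=8):
    """Greedy decoding restricted to tokens that appear in the input text,
    so the extractor cannot hallucinate entities absent from the input."""
    allowed = set(input_tokens) | {end_token}
    output = []
    for _ in range(max_len):
        scores = next_token_scores(input_tokens, output)  # model call (stubbed)
        # Mask every token outside the allowed set before taking the argmax.
        candidates = {t: s for t, s in scores.items() if t in allowed}
        token = max(candidates, key=candidates.get)
        if token == end_token:
            break
        output.append(token)
    return output

# Stub model: prefers the hallucinated "Berlin", which informed decoding blocks.
stub = lambda inp, out: {"Paris": 2.0 if "Paris" not in out else -1.0,
                         "Berlin": 3.0, "<end>": 1.0}
print(informed_decode(["Anna", "flew", "to", "Paris"], stub))  # ['Paris']
```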

Do We Fully Understand Students’ Knowledge States? Identifying and Mitigating Answer Bias in Knowledge Tracing

  • paper_url: http://arxiv.org/abs/2308.07779
  • repo_url: https://github.com/lucky7-code/core
  • paper_authors: Chaoran Cui, Hebo Ma, Chen Zhang, Chunyun Zhang, Yumo Yao, Meng Chen, Yuling Ma
  • for: Identifying and mitigating the answer bias present in knowledge tracing (KT), so that models more fully capture students' knowledge states.
  • methods: The paper approaches KT from a causality perspective: it establishes a causal graph of KT, locates the impact of answer bias in the direct causal effect of questions on students' responses, and proposes a COunterfactual REasoning (CORE) framework that separately captures the total and direct causal effects during training and mitigates answer bias by subtracting the latter from the former in testing (see the sketch after this entry).
  • results: CORE is applicable to various existing KT models; implemented on DKT, DKVMN, and AKT, extensive experiments on three benchmark datasets demonstrate its effectiveness in making debiased inferences for KT.
    Abstract Knowledge tracing (KT) aims to monitor students' evolving knowledge states through their learning interactions with concept-related questions, and can be indirectly evaluated by predicting how students will perform on future questions. In this paper, we observe that there is a common phenomenon of answer bias, i.e., a highly unbalanced distribution of correct and incorrect answers for each question. Existing models tend to memorize the answer bias as a shortcut for achieving high prediction performance in KT, thereby failing to fully understand students' knowledge states. To address this issue, we approach the KT task from a causality perspective. A causal graph of KT is first established, from which we identify that the impact of answer bias lies in the direct causal effect of questions on students' responses. A novel COunterfactual REasoning (CORE) framework for KT is further proposed, which separately captures the total causal effect and direct causal effect during training, and mitigates answer bias by subtracting the latter from the former in testing. The CORE framework is applicable to various existing KT models, and we implement it based on the prevailing DKT, DKVMN, and AKT models, respectively. Extensive experiments on three benchmark datasets demonstrate the effectiveness of CORE in making the debiased inference for KT.
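The debiasing rule reduces to a subtraction: estimate the total effect (TE) of question and knowledge state on the response, estimate the direct effect the question alone exerts, and score with their difference at test time. The sketch below shows that subtraction over toy logits; how the two branches are parameterized is the paper's actual contribution and is stubbed out here.

```python
import numpy as np

def debiased_logits(total_effect_logits, direct_effect_logits):
    """CORE-style inference: subtract the question-only (bias) branch from
    the full branch, keeping only knowledge-driven evidence."""
    return total_effect_logits - direct_effect_logits

# Toy question that 90% of students answer correctly regardless of skill.
te  = np.array([0.2, 2.4])  # (wrong, correct) logits from the full model
nde = np.array([0.1, 2.2])  # logits the question alone induces (answer bias)
print(debiased_logits(te, nde))  # [0.1 0.2] -- far less confident once debiased
```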

Hierarchical generative modelling for autonomous robots

  • paper_url: http://arxiv.org/abs/2308.07775
  • repo_url: None
  • paper_authors: Kai Yuan, Noor Sajid, Karl Friston, Zhibin Li
  • for: Studying how humans plan, execute, and combine individual limb movements to produce complex whole-body motion, in the setting of autonomous robotic operation.
  • methods: The paper uses hierarchical generative modelling equipped with multi-level planning and low-level motor control of individual limbs, mimicking the deep temporal architecture of human motor control.
  • results: In numerical and physical simulation experiments, a humanoid robot with this human-inspired architecture autonomously completes complex tasks that require holistic use of locomotion, manipulation, and grasping, such as retrieving and transporting a box, opening and walking through a door, and approaching and kicking a football, while remaining robust to body damage and ground irregularities.
    Abstract Humans can produce complex whole-body motions when interacting with their surroundings, by planning, executing and combining individual limb movements. We investigated this fundamental aspect of motor control in the setting of autonomous robotic operations. We approach this problem by hierarchical generative modelling equipped with multi-level planning-for autonomous task completion-that mimics the deep temporal architecture of human motor control. Here, temporal depth refers to the nested time scales at which successive levels of a forward or generative model unfold, for example, delivering an object requires a global plan to contextualise the fast coordination of multiple local movements of limbs. This separation of temporal scales also motivates robotics and control. Specifically, to achieve versatile sensorimotor control, it is advantageous to hierarchically structure the planning and low-level motor control of individual limbs. We use numerical and physical simulation to conduct experiments and to establish the efficacy of this formulation. Using a hierarchical generative model, we show how a humanoid robot can autonomously complete a complex task that necessitates a holistic use of locomotion, manipulation, and grasping. Specifically, we demonstrate the ability of a humanoid robot that can retrieve and transport a box, open and walk through a door to reach the destination, approach and kick a football, while showing robust performance in presence of body damage and ground irregularities. Our findings demonstrated the effectiveness of using human-inspired motor control algorithms, and our method provides a viable hierarchical architecture for the autonomous completion of challenging goal-directed tasks.

A Graph Encoder-Decoder Network for Unsupervised Anomaly Detection

  • paper_url: http://arxiv.org/abs/2308.07774
  • repo_url: None
  • paper_authors: Mahsa Mesgaran, A. Ben Hamza
  • for: Detecting abnormal nodes in graphs.
  • methods: An unsupervised graph encoder-decoder model learns an anomaly scoring function that ranks nodes by their degree of abnormality. The encoder uses a novel pooling mechanism, LCPool, which applies locality-constrained linear coding to find a cluster assignment matrix by solving a least-squares problem with a locality regularization term (see the sketch after this entry); the decoder uses a matching unpooling operation, LCUnpool, to reconstruct the structure and nodal features of the original graph.
  • results: Empirical evaluations on six benchmark datasets with several evaluation metrics demonstrate the method's superiority over state-of-the-art anomaly detection approaches.
    Abstract A key component of many graph neural networks (GNNs) is the pooling operation, which seeks to reduce the size of a graph while preserving important structural information. However, most existing graph pooling strategies rely on an assignment matrix obtained by employing a GNN layer, which is characterized by trainable parameters, often leading to significant computational complexity and a lack of interpretability in the pooling process. In this paper, we propose an unsupervised graph encoder-decoder model to detect abnormal nodes from graphs by learning an anomaly scoring function to rank nodes based on their degree of abnormality. In the encoding stage, we design a novel pooling mechanism, named LCPool, which leverages locality-constrained linear coding for feature encoding to find a cluster assignment matrix by solving a least-squares optimization problem with a locality regularization term. By enforcing locality constraints during the coding process, LCPool is designed to be free from learnable parameters, capable of efficiently handling large graphs, and can effectively generate a coarser graph representation while retaining the most significant structural characteristics of the graph. In the decoding stage, we propose an unpooling operation, called LCUnpool, to reconstruct both the structure and nodal features of the original graph. We conduct empirical evaluations of our method on six benchmark datasets using several evaluation metrics, and the results demonstrate its superiority over state-of-the-art anomaly detection approaches.
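Locality-constrained linear coding admits a closed-form per-node solve, which is what lets LCPool avoid learnable parameters. The numpy sketch below codes each node feature over a set of anchor points with a locality penalty; the anchor choice, regularization weight, and sum-to-one normalization are illustrative assumptions rather than the paper's exact formulation.

```python
import numpy as np

def llc_assign(X, anchors, lam=0.1):
    """Code each row of X over anchor points, penalizing far-away anchors
    (locality regularizer); returns a soft cluster-assignment matrix."""
    codes = []
    for x in X:
        d = np.linalg.norm(anchors - x, axis=1)      # locality adaptor
        # Closed form of min_c ||x - anchors^T c||^2 + lam * ||diag(d) c||^2.
        A = anchors @ anchors.T + lam * np.diag(d ** 2)
        c = np.linalg.solve(A, anchors @ x)
        codes.append(c / c.sum())                    # enforce sum-to-one codes
    return np.array(codes)

X = np.random.rand(6, 4)     # 6 node feature vectors
anchors = X[[0, 3]]          # 2 anchor points (cluster centers), assumed given
S = llc_assign(X, anchors)   # pooled features/adjacency follow as S^T X, S^T A S
```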

MOLE: MOdular Learning FramEwork via Mutual Information Maximization

  • paper_url: http://arxiv.org/abs/2308.07772
  • repo_url: None
  • paper_authors: Tianchao Li, Yulong Pei
  • for: Introducing an asynchronous and local learning framework for neural networks, named the Modular Learning Framework (MOLE).
  • methods: The framework modularizes a neural network by layers, defines each module's training objective via mutual information, and sequentially trains each module by mutual-information maximization, turning training into local optimization with gradients isolated across modules, a scheme more biologically plausible than backpropagation (see the sketch after this entry).
  • results: Experiments on vector-, grid-, and graph-type data show that MOLE can solve both graph- and node-level tasks on graph data; MOLE is thus experimentally shown to be universally applicable to different types of data.
    Abstract This paper is to introduce an asynchronous and local learning framework for neural networks, named Modular Learning Framework (MOLE). This framework modularizes neural networks by layers, defines the training objective via mutual information for each module, and sequentially trains each module by mutual information maximization. MOLE makes the training become local optimization with gradient-isolated across modules, and this scheme is more biologically plausible than BP. We run experiments on vector-, grid- and graph-type data. In particular, this framework is capable of solving both graph- and node-level tasks for graph-type data. Therefore, MOLE has been experimentally proven to be universally applicable to different types of data.
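Gradient isolation is the mechanical heart of such a framework: each module trains against its own local objective and passes only a detached activation forward. The PyTorch sketch below uses a local classification loss as a stand-in for a mutual-information estimator (InfoNCE-style estimators are typical); MOLE's actual objective differs, so the concrete losses and shapes here are assumptions.

```python
import torch
import torch.nn as nn

# Two modules, each with its own local head and optimizer; gradients never
# flow between modules (note the .detach() at the end of the loop).
modules = nn.ModuleList([nn.Sequential(nn.Linear(8, 16), nn.ReLU()),
                         nn.Sequential(nn.Linear(16, 16), nn.ReLU())])
heads = nn.ModuleList([nn.Linear(16, 2), nn.Linear(16, 2)])
opts = [torch.optim.Adam(list(m.parameters()) + list(h.parameters()), lr=1e-3)
        for m, h in zip(modules, heads)]

x = torch.randn(32, 8)
y = torch.randint(0, 2, (32,))
for m, h, opt in zip(modules, heads, opts):
    z = m(x)                                  # local forward pass
    # Proxy for mutual-information maximization with the target.
    loss = nn.functional.cross_entropy(h(z), y)
    opt.zero_grad()
    loss.backward()
    opt.step()
    x = z.detach()                            # gradient isolation across modules
```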

NeFL: Nested Federated Learning for Heterogeneous Clients

  • paper_url: http://arxiv.org/abs/2308.07761
  • repo_url: None
  • paper_authors: Honggu Kang, Seohyeon Cha, Jinwoo Shin, Jongmyeong Lee, Joonhyuk Kang
  • for: Addressing slow or incapable clients (stragglers), which lengthen training and degrade performance in federated learning (FL), by proposing nested federated learning (NeFL).
  • methods: A generalized framework efficiently divides a model into submodels using both depthwise and widthwise scaling, interpreting models as solving ordinary differential equations (ODEs) with adaptive step sizes, and decouples a few parameters to handle the inconsistency of training submodels with different architectures (a width-scaling sketch follows this entry).
  • results: NeFL lets resource-constrained clients join the FL pipeline and yields significant gains, especially for the worst-case submodel (e.g., an 8.33 improvement on CIFAR-10), and it aligns with recent studies in FL.
    Abstract Federated learning (FL) is a promising approach in distributed learning keeping privacy. However, during the training pipeline of FL, slow or incapable clients (i.e., stragglers) slow down the total training time and degrade performance. System heterogeneity, including heterogeneous computing and network bandwidth, has been addressed to mitigate the impact of stragglers. Previous studies split models to tackle the issue, but with less degree-of-freedom in terms of model architecture. We propose nested federated learning (NeFL), a generalized framework that efficiently divides a model into submodels using both depthwise and widthwise scaling. NeFL is implemented by interpreting models as solving ordinary differential equations (ODEs) with adaptive step sizes. To address the inconsistency that arises when training multiple submodels with different architecture, we decouple a few parameters. NeFL enables resource-constrained clients to effectively join the FL pipeline and the model to be trained with a larger amount of data. Through a series of experiments, we demonstrate that NeFL leads to significant gains, especially for the worst-case submodel (e.g., 8.33 improvement on CIFAR-10). Furthermore, we demonstrate NeFL aligns with recent studies in FL.
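Widthwise scaling is straightforward to illustrate: shrink every hidden dimension by a factor so a straggler trains a narrower slice of the network (depthwise scaling would drop blocks instead). The sketch below only shows how the scale shapes the architectures; how NeFL shares and decouples parameters between a slice and the full model is not reproduced here.

```python
import torch.nn as nn

def widthwise_submodel(widths, scale):
    """Build an MLP whose hidden widths are scaled down for weak clients;
    input and output sizes stay fixed so all submodels fit the same task."""
    dims = [widths[0]] + [max(1, int(w * scale)) for w in widths[1:-1]] + [widths[-1]]
    layers = []
    for i in range(len(dims) - 1):
        layers += [nn.Linear(dims[i], dims[i + 1]), nn.ReLU()]
    return nn.Sequential(*layers[:-1])  # drop the trailing ReLU

full   = widthwise_submodel([32, 256, 256, 10], scale=1.0)   # capable client
narrow = widthwise_submodel([32, 256, 256, 10], scale=0.25)  # straggler client
```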

Dynamic Embedding Size Search with Minimum Regret for Streaming Recommender System

  • paper_url: http://arxiv.org/abs/2308.07760
  • repo_url: https://github.com/hebowei2000/DESS
  • paper_authors: Bowei He, Xu He, Renrui Zhang, Yingxue Zhang, Ruiming Tang, Chen Ma
  • for: Streaming recommender systems must adapt to continuously growing sets of users and items in dynamically changing environments, where identical, static embedding sizes are sub-optimal for both recommendation performance and memory cost.
  • methods: The paper rethinks the streaming model-update process and models dynamic embedding-size search as a bandit problem, quantifies the factors that influence optimal embedding sizes from a statistics perspective, and proposes the Dynamic Embedding Size Search (DESS) method to minimize embedding-size selection regret on both the user and item sides in a non-stationary manner (a bandit sketch follows this entry).
  • results: Theoretically, the method achieves a sublinear regret upper bound superior to previous approaches; empirically, across two recommendation tasks on four public datasets, it delivers better streaming recommendation performance with lower memory cost and higher time efficiency.
    Abstract With the continuous increase of users and items, conventional recommender systems trained on static datasets can hardly adapt to changing environments. The high-throughput data requires the model to be updated in a timely manner for capturing the user interest dynamics, which leads to the emergence of streaming recommender systems. Due to the prevalence of deep learning-based recommender systems, the embedding layer is widely adopted to represent the characteristics of users, items, and other features in low-dimensional vectors. However, it has been proved that setting an identical and static embedding size is sub-optimal in terms of recommendation performance and memory cost, especially for streaming recommendations. To tackle this problem, we first rethink the streaming model update process and model the dynamic embedding size search as a bandit problem. Then, we analyze and quantify the factors that influence the optimal embedding sizes from the statistics perspective. Based on this, we propose the \textbf{D}ynamic \textbf{E}mbedding \textbf{S}ize \textbf{S}earch (\textbf{DESS}) method to minimize the embedding size selection regret on both user and item sides in a non-stationary manner. Theoretically, we obtain a sublinear regret upper bound superior to previous methods. Empirical results across two recommendation tasks on four public datasets also demonstrate that our approach can achieve better streaming recommendation performance with lower memory cost and higher time efficiency.
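Framing embedding-size search as a bandit makes each candidate dimension an arm and the observed streaming performance the reward. The sketch below runs stationary UCB1 over four sizes with a stubbed reward; DESS uses a non-stationary variant with its own regret analysis, so this is a simplified stand-in.

```python
import math, random

sizes = [8, 16, 32, 64]          # candidate embedding dimensions (arms)
counts = [0] * len(sizes)
values = [0.0] * len(sizes)      # running mean reward per arm

def pick_arm(t):
    for i, c in enumerate(counts):
        if c == 0:
            return i             # try every size once first
    return max(range(len(sizes)),
               key=lambda i: values[i] + math.sqrt(2 * math.log(t) / counts[i]))

for t in range(1, 201):
    i = pick_arm(t)
    # Reward stub: validation quality net of memory cost; size 32 is "best".
    reward = {8: 0.60, 16: 0.70, 32: 0.78, 64: 0.74}[sizes[i]] + random.gauss(0, 0.05)
    counts[i] += 1
    values[i] += (reward - values[i]) / counts[i]

print(sizes[max(range(len(sizes)), key=lambda i: counts[i])])  # most-pulled arm
```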

Forward-Backward Reasoning in Large Language Models for Verification

  • paper_url: http://arxiv.org/abs/2308.07758
  • repo_url: None
  • paper_authors: Weisen Jiang, Han Shi, Longhui Yu, Zhengying Liu, Yu Zhang, Zhenguo Li, James T. Kwok
  • for: Improving large language models' (LLMs) ability to answer reasoning questions by verifying candidate answers.
  • methods: Candidate reasoning chains are sampled with Self-Consistency; candidate answers are then verified by backward reasoning, masking a token in the question and asking the LLM to predict it via a simple template given each candidate answer. FOBAR combines forward and backward reasoning to estimate the probability of candidate answers (see the sketch after this entry).
  • results: Extensive experiments on six datasets and three LLMs show that FOBAR achieves state-of-the-art performance on various reasoning benchmarks.
    Abstract Chain-of-Thought (CoT) prompting has shown promising performance in various reasoning tasks. Recently, Self-Consistency \citep{wang2023selfconsistency} proposes to sample a diverse set of reasoning chains which may lead to different answers while the answer that receives the most votes is selected. In this paper, we propose a novel method to use backward reasoning in verifying candidate answers. We mask a token in the question by ${\bf x}$ and ask the LLM to predict the masked token when a candidate answer is provided by \textit{a simple template}, i.e., "\textit{\textbf{If we know the answer of the above question is \{a candidate answer\}, what is the value of unknown variable ${\bf x}$?}" Intuitively, the LLM is expected to predict the masked token successfully if the provided candidate answer is correct. We further propose FOBAR to combine forward and backward reasoning for estimating the probability of candidate answers. We conduct extensive experiments on six data sets and three LLMs. Experimental results demonstrate that FOBAR achieves state-of-the-art performance on various reasoning benchmarks.
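Backward verification can be mocked up in a few lines: hide a number from the question, reveal a candidate answer, and check whether the model recovers the hidden number. The template follows the abstract; the equal-weight combination with forward votes and the stub LLM are assumptions for illustration.

```python
def backward_verify(question, masked_value, candidate, llm):
    """Ask the model to recover a masked number given a candidate answer;
    correct candidates should make the recovery succeed."""
    prompt = (f"{question.replace(str(masked_value), 'x')}\n"
              f"If we know the answer of the above question is {candidate}, "
              f"what is the value of unknown variable x?")
    return llm(prompt) == masked_value

def fobar_pick(question, masked_value, candidates, llm, forward_votes):
    """Combine Self-Consistency forward votes with backward checks;
    equal weighting is an assumption, the paper estimates a probability."""
    return max(candidates, key=lambda a: forward_votes[a]
               + backward_verify(question, masked_value, a, llm))

# Stub LLM that can invert the toy multiplication question.
question = "Tom has 4 boxes with 3 apples each. How many apples in total?"
llm = lambda prompt: 4 if "is 12" in prompt else 7
print(fobar_pick(question, 4, [12, 13], llm, {12: 2, 13: 2}))  # 12
```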

Dancing Avatar: Pose and Text-Guided Human Motion Videos Synthesis with Image Diffusion Model

  • paper_url: http://arxiv.org/abs/2308.07749
  • repo_url: None
  • paper_authors: Bosheng Qin, Wentao Ye, Qifan Yu, Siliang Tang, Yueting Zhuang
  • for: Generating high-quality human motion videos driven by poses and textual cues, for applications such as games and film.
  • methods: Dancing Avatar uses a pretrained text-to-image (T2I) diffusion model to generate each video frame autoregressively: an intra-frame alignment module (incorporating ChatGPT-derived character knowledge) keeps the human appearance consistent across poses, a background alignment pipeline combining segment-anything and image-inpainting techniques preserves background continuity, and an inter-frame alignment module lets the preceding frame guide synthesis of the current frame for temporal coherence.
  • results: Compared with state-of-the-art methods, Dancing Avatar generates human motion videos with markedly superior human and background fidelity as well as temporal coherence.
    Abstract The rising demand for creating lifelike avatars in the digital realm has led to an increased need for generating high-quality human videos guided by textual descriptions and poses. We propose Dancing Avatar, designed to fabricate human motion videos driven by poses and textual cues. Our approach employs a pretrained T2I diffusion model to generate each video frame in an autoregressive fashion. The crux of innovation lies in our adept utilization of the T2I diffusion model for producing video frames successively while preserving contextual relevance. We surmount the hurdles posed by maintaining human character and clothing consistency across varying poses, along with upholding the background's continuity amidst diverse human movements. To ensure consistent human appearances across the entire video, we devise an intra-frame alignment module. This module assimilates text-guided synthesized human character knowledge into the pretrained T2I diffusion model, synergizing insights from ChatGPT. For preserving background continuity, we put forth a background alignment pipeline, amalgamating insights from segment anything and image inpainting techniques. Furthermore, we propose an inter-frame alignment module that draws inspiration from an auto-regressive pipeline to augment temporal consistency between adjacent frames, where the preceding frame guides the synthesis process of the current frame. Comparisons with state-of-the-art methods demonstrate that Dancing Avatar exhibits the capacity to generate human videos with markedly superior quality, both in terms of human and background fidelity, as well as temporal coherence compared to existing state-of-the-art approaches.

Exploiting Sparsity in Automotive Radar Object Detection Networks

  • paper_url: http://arxiv.org/abs/2308.07748
  • repo_url: None
  • paper_authors: Marius Lippke, Maurice Quach, Sascha Braun, Daniel Köhler, Michael Ulrich, Bastian Bischoff, Wei Yap Tan
  • for: Precise perception of the environment, which is crucial for the secure and reliable functioning of autonomous driving systems.
  • methods: The paper investigates sparse convolutional object detection networks, which combine powerful grid-based detection with low compute requirements, and proposes sparse kernel point pillars (SKPP) and dual voxel point convolutions (DVPC) as remedies for radar-specific challenges in grid rendering and sparse backbone architectures.
  • results: Evaluated on nuScenes, the SKPP-DPVCN architecture outperforms the baseline by 5.89% and the previous state of the art by 4.19% in Car AP4.0, and reduces the average scale error (ASE) by 21.41% over the baseline.
    Abstract Having precise perception of the environment is crucial for ensuring the secure and reliable functioning of autonomous driving systems. Radar object detection networks are one fundamental part of such systems. CNN-based object detectors showed good performance in this context, but they require large compute resources. This paper investigates sparse convolutional object detection networks, which combine powerful grid-based detection with low compute resources. We investigate radar specific challenges and propose sparse kernel point pillars (SKPP) and dual voxel point convolutions (DVPC) as remedies for the grid rendering and sparse backbone architectures. We evaluate our SKPP-DPVCN architecture on nuScenes, which outperforms the baseline by 5.89% and the previous state of the art by 4.19% in Car AP4.0. Moreover, SKPP-DPVCN reduces the average scale error (ASE) by 21.41% over the baseline.

Formally-Sharp DAgger for MCTS: Lower-Latency Monte Carlo Tree Search using Data Aggregation with Formal Methods

  • paper_url: http://arxiv.org/abs/2308.07738
  • repo_url: None
  • paper_authors: Debraj Chakraborty, Damien Busatto-Gaston, Jean-François Raskin, Guillermo A. Pérez
  • for: Efficiently combining formal methods, Monte Carlo Tree Search (MCTS), and deep learning to produce high-quality receding-horizon policies in large Markov decision processes (MDPs).
  • methods: Model-checking techniques guide the MCTS algorithm to generate offline samples of high-quality decisions on a representative set of MDP states; these samples train a neural network that imitates the policy used to generate them, which can then guide a lower-latency online MCTS search or serve as a full-fledged policy when minimal latency is required.
  • results: Statistical model checking detects when additional samples are needed and focuses them on configurations where the learnt neural-network policy differs from the (computationally expensive) offline policy; the method is illustrated on MDPs modelling the Frozen Lake and Pac-Man environments, two popular reinforcement-learning benchmarks.
    Abstract We study how to efficiently combine formal methods, Monte Carlo Tree Search (MCTS), and deep learning in order to produce high-quality receding horizon policies in large Markov Decision processes (MDPs). In particular, we use model-checking techniques to guide the MCTS algorithm in order to generate offline samples of high-quality decisions on a representative set of states of the MDP. Those samples can then be used to train a neural network that imitates the policy used to generate them. This neural network can either be used as a guide on a lower-latency MCTS online search, or alternatively be used as a full-fledged policy when minimal latency is required. We use statistical model checking to detect when additional samples are needed and to focus those additional samples on configurations where the learnt neural network policy differs from the (computationally-expensive) offline policy. We illustrate the use of our method on MDPs that model the Frozen Lake and Pac-Man environments -- two popular benchmarks to evaluate reinforcement-learning algorithms.

Flashpoints Signal Hidden Inherent Instabilities in Land-Use Planning

  • paper_url: http://arxiv.org/abs/2308.07714
  • repo_url: None
  • paper_authors: Hazhir Aliahmadi, Maeve Beckett, Sam Connolly, Dongmei Chen, Greg van Anders
  • for: Improving the objectivity and transparency of land-use decision-making through optimization-based planning approaches such as Multi-Objective Land Allocation (MOLA), which explicitly evaluate planning priorities by the type, amount, and location of land uses.
  • methods: The paper shows that optimization-based planning with generic criteria generates a series of unstable "flashpoints" whereby tiny changes in planning priorities produce large-scale changes in the amount of land use by type, and argues these belong to a general family of instabilities that arise whenever planning accounts for factors coordinating use on- and between-sites (a toy example follows this entry).
  • results: The instabilities produce regions of ambiguity in land-use type, termed "gray areas"; by directly mapping gray areas between flashpoints, quantitative methods reduce the combinatorially large space of possible land-use patterns to a small, characteristic set that can engage stakeholders to arrive at more efficient and just outcomes.
    Abstract Land-use decision-making processes have a long history of producing globally pervasive systemic equity and sustainability concerns. Quantitative, optimization-based planning approaches, e.g. Multi-Objective Land Allocation (MOLA), seemingly open the possibility to improve objectivity and transparency by explicitly evaluating planning priorities by the type, amount, and location of land uses. Here, we show that optimization-based planning approaches with generic planning criteria generate a series of unstable "flashpoints" whereby tiny changes in planning priorities produce large-scale changes in the amount of land use by type. We give quantitative arguments that the flashpoints we uncover in MOLA models are examples of a more general family of instabilities that occur whenever planning accounts for factors that coordinate use on- and between-sites, regardless of whether these planning factors are formulated explicitly or implicitly. We show that instabilities lead to regions of ambiguity in land-use type that we term "gray areas". By directly mapping gray areas between flashpoints, we show that quantitative methods retain utility by reducing combinatorially large spaces of possible land-use patterns to a small, characteristic set that can engage stakeholders to arrive at more efficient and just outcomes.
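A flashpoint is easy to reproduce in miniature: when each cell adopts the land use maximizing a weighted sum of objectives, a tiny shift in the weights can flip many cells at once. The two objectives and per-cell scores below are invented purely to exhibit the discontinuity.

```python
import numpy as np

ag  = np.array([0.70, 0.68, 0.72, 0.69, 0.71])  # per-cell score if farmed
eco = np.array([0.69, 0.71, 0.68, 0.70, 0.70])  # per-cell score if conserved

def allocate(w):
    """Each cell picks the use with the higher w-weighted score
    (w = planning priority on agriculture)."""
    return np.where(w * ag + (1 - w) * eco >= w * eco + (1 - w) * ag,
                    "farm", "conserve")

for w in (0.49, 0.50, 0.51):   # a 0.02 shift in priorities...
    print(w, allocate(w))      # ...rewrites the whole allocation at w = 0.5
```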

Exploring Transfer Learning in Medical Image Segmentation using Vision-Language Models

  • paper_url: http://arxiv.org/abs/2308.07706
  • repo_url: None
  • paper_authors: Kanchan Poudel, Manish Dhakal, Prasiddha Bhandari, Rabin Adhikari, Safal Thapaliya, Bishesh Khanal
  • for: Medical image segmentation is crucial in clinical applications, yet integrating textual guidance to enhance visual features for this task has seen limited progress, and existing text-guided segmentation models are trained on open-domain images.
  • methods: The paper proposes using multimodal vision-language models (VLMs) to capture semantic information from image descriptions and images, enabling segmentation of diverse medical images; it comprehensively evaluates existing VLMs across multiple datasets to assess their transferability from the open domain to the medical field, and introduces variations of image descriptions for previously unseen images, revealing notable performance differences across prompts.
  • results: Segmentation models trained on open-domain images are not directly transferable to the medical domain, but finetuning on medical datasets increases their performance; the paper reports zero-shot and finetuned segmentation performance of 4 VLMs on 11 medical datasets using 9 types of prompts derived from 14 attributes.
    Abstract Medical Image Segmentation is crucial in various clinical applications within the medical domain. While state-of-the-art segmentation models have proven effective, integrating textual guidance to enhance visual features for this task remains an area with limited progress. Existing segmentation models that utilize textual guidance are primarily trained on open-domain images, raising concerns about their direct applicability in the medical domain without manual intervention or fine-tuning. To address these challenges, we propose using multimodal vision-language models for capturing semantic information from image descriptions and images, enabling the segmentation of diverse medical images. This study comprehensively evaluates existing vision language models across multiple datasets to assess their transferability from the open domain to the medical field. Furthermore, we introduce variations of image descriptions for previously unseen images in the dataset, revealing notable variations in model performance based on the generated prompts. Our findings highlight the distribution shift between the open-domain images and the medical domain and show that the segmentation models trained on open-domain images are not directly transferrable to the medical field. But their performance can be increased by finetuning them in the medical datasets. We report the zero-shot and finetuned segmentation performance of 4 Vision Language Models (VLMs) on 11 medical datasets using 9 types of prompts derived from 14 attributes.

DiffGuard: Semantic Mismatch-Guided Out-of-Distribution Detection using Pre-trained Diffusion Models

  • paper_url: http://arxiv.org/abs/2308.07687
  • repo_url: https://github.com/cure-lab/diffguard
  • paper_authors: Ruiyuan Gao, Chenchen Zhao, Lanqing Hong, Qiang Xu
  • for: Proposing an out-of-distribution (OOD) detection method guided by semantic mismatch, built on pre-trained diffusion models.
  • methods: Prior work enlarged semantic mismatch in image space with a conditional generative adversarial network (cGAN), which is hard to train at ImageNet scale; since diffusion models are much easier to train and amenable to various conditions, DiffGuard instead uses pre-trained diffusion models directly, reconstructing the input image under the classifier-predicted label and enlarging the semantic difference from the original input, with several test-time techniques to further strengthen the differences.
  • results: Experiments show that DiffGuard is effective on both Cifar-10 and hard cases of large-scale ImageNet, and it can easily be combined with existing OOD detection techniques to achieve state-of-the-art results.
    Abstract Given a classifier, the inherent property of semantic Out-of-Distribution (OOD) samples is that their contents differ from all legal classes in terms of semantics, namely semantic mismatch. There is a recent work that directly applies it to OOD detection, which employs a conditional Generative Adversarial Network (cGAN) to enlarge semantic mismatch in the image space. While achieving remarkable OOD detection performance on small datasets, it is not applicable to ImageNet-scale datasets due to the difficulty in training cGANs with both input images and labels as conditions. As diffusion models are much easier to train and amenable to various conditions compared to cGANs, in this work, we propose to directly use pre-trained diffusion models for semantic mismatch-guided OOD detection, named DiffGuard. Specifically, given an OOD input image and the predicted label from the classifier, we try to enlarge the semantic difference between the reconstructed OOD image under these conditions and the original input image. We also present several test-time techniques to further strengthen such differences. Experimental results show that DiffGuard is effective on both Cifar-10 and hard cases of the large-scale ImageNet, and it can be easily combined with existing OOD detection techniques to achieve state-of-the-art OOD detection results.

Boosting Multi-modal Model Performance with Adaptive Gradient Modulation

  • paper_url: http://arxiv.org/abs/2308.07686
  • repo_url: https://github.com/lihong2303/agm_iccv2023
  • paper_authors: Hong Li, Xingyu Li, Pengbo Hu, Yinuo Lei, Chunxiao Li, Yi Zhou
  • for: Boosting the performance of multi-modal learning models by addressing the modality competition that makes jointly trained models sub-optimal.
  • methods: The paper proposes an adaptive gradient modulation method applicable to models with various fusion strategies, and introduces a novel metric for competition strength built on the mono-modal concept, a function designed to represent the competition-less state of a modality.
  • results: Extensive experiments show the method surpasses all existing modulation methods; the analysis confirms that modulation encourages the model to rely on the more informative modality, and that a jointly trained model typically has a preferred modality on which competition is weaker, though this modality need not dominate the others.
    Abstract While the field of multi-modal learning keeps growing fast, the deficiency of the standard joint training paradigm has become clear through recent studies. They attribute the sub-optimal performance of the jointly trained model to the modality competition phenomenon. Existing works attempt to improve the jointly trained model by modulating the training process. Despite their effectiveness, those methods can only apply to late fusion models. More importantly, the mechanism of the modality competition remains unexplored. In this paper, we first propose an adaptive gradient modulation method that can boost the performance of multi-modal models with various fusion strategies. Extensive experiments show that our method surpasses all existing modulation methods. Furthermore, to have a quantitative understanding of the modality competition and the mechanism behind the effectiveness of our modulation method, we introduce a novel metric to measure the competition strength. This metric is built on the mono-modal concept, a function that is designed to represent the competition-less state of a modality. Through systematic investigation, our results confirm the intuition that the modulation encourages the model to rely on the more informative modality. In addition, we find that the jointly trained model typically has a preferred modality on which the competition is weaker than other modalities. However, this preferred modality need not dominate others. Our code will be available at https://github.com/lihong2303/AGM_ICCV2023.

EQ-Net: Elastic Quantization Neural Networks

  • paper_url: http://arxiv.org/abs/2308.07650
  • repo_url: https://github.com/xuke225/eq-net
  • paper_authors: Ke Xu, Lei Han, Ye Tian, Shangshang Yang, Xingyi Zhang
  • for: Proposing a one-shot network quantization regime, Elastic Quantization Neural Networks (EQ-Net), that trains a robust weight-sharing quantization supernet.
  • methods: The paper designs an elastic quantization space (covering elastic bit-width, granularity, and symmetry) to adapt to various mainstream quantization forms; proposes a Weight Distribution Regularization Loss (WDR-Loss) and a Group Progressive Guidance Loss (GPG-Loss) to bridge the distribution inconsistency of weights and output logits across the elastic quantization space; and combines a genetic algorithm with the proposed Conditional Quantization-Aware Accuracy Predictor (CQAP) to quickly search mixed-precision quantized neural networks in the supernet.
  • results: Extensive experiments demonstrate that EQ-Net is close to or better than its static counterparts as well as state-of-the-art robust bit-width methods. Code is available at https://github.com/xuke225/EQ-Net.
    Abstract Current model quantization methods have shown their promising capability in reducing storage space and computation complexity. However, due to the diversity of quantization forms supported by different hardware, one limitation of existing solutions is that usually require repeated optimization for different scenarios. How to construct a model with flexible quantization forms has been less studied. In this paper, we explore a one-shot network quantization regime, named Elastic Quantization Neural Networks (EQ-Net), which aims to train a robust weight-sharing quantization supernet. First of all, we propose an elastic quantization space (including elastic bit-width, granularity, and symmetry) to adapt to various mainstream quantitative forms. Secondly, we propose the Weight Distribution Regularization Loss (WDR-Loss) and Group Progressive Guidance Loss (GPG-Loss) to bridge the inconsistency of the distribution for weights and output logits in the elastic quantization space gap. Lastly, we incorporate genetic algorithms and the proposed Conditional Quantization-Aware Accuracy Predictor (CQAP) as an estimator to quickly search mixed-precision quantized neural networks in supernet. Extensive experiments demonstrate that our EQ-Net is close to or even better than its static counterparts as well as state-of-the-art robust bit-width methods. Code can be available at \href{https://github.com/xuke225/EQ-Net.git}{https://github.com/xuke225/EQ-Net}.

Ternary Singular Value Decomposition as a Better Parameterized Form in Linear Mapping

  • paper_url: http://arxiv.org/abs/2308.07641
  • repo_url: None
  • paper_authors: Boyu Chen, Hanxuan Chen, Jiao He, Fengyu Sun, Shangling Jui
  • for: Proposing a simple yet novel parameterized form of linear mapping that achieves remarkable network compression performance.
  • methods: A pseudo-SVD called Ternary SVD (TSVD) restricts the $U$ and $V$ matrices of SVD to ternary form in $\{\pm 1, 0\}$, so computing $U(\cdot)$ and $V(\cdot)$ needs only addition instructions instead of expensive multiplications (see the sketch after this entry). The paper provides direct and training transition algorithms for TSVD, analogous to post-training quantization and quantization-aware training respectively, and analyzes the convergence of the direct transition algorithms in theory.
  • results: Experiments show TSVD achieves state-of-the-art network compression performance across various networks and tasks, including current baselines such as ConvNext, Swin, BERT, and large language models like OPT.
    Abstract We present a simple yet novel parameterized form of linear mapping to achieves remarkable network compression performance: a pseudo SVD called Ternary SVD (TSVD). Unlike vanilla SVD, TSVD limits the $U$ and $V$ matrices in SVD to ternary matrices form in $\{\pm 1, 0\}$. This means that instead of using the expensive multiplication instructions, TSVD only requires addition instructions when computing $U(\cdot)$ and $V(\cdot)$. We provide direct and training transition algorithms for TSVD like Post Training Quantization and Quantization Aware Training respectively. Additionally, we analyze the convergence of the direct transition algorithms in theory. In experiments, we demonstrate that TSVD can achieve state-of-the-art network compression performance in various types of networks and tasks, including current baseline models such as ConvNext, Swim, BERT, and large language model like OPT.
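A direct (post-hoc) transition can be approximated in a few lines: take the SVD, then project each factor to {-1, 0, +1} with a per-column scale so the factors need only additions at inference. The threshold-and-rescale recipe below is a common ternarization heuristic used as a stand-in; TSVD's actual transition algorithms are more refined.

```python
import numpy as np

def ternarize(M, thresh=0.7):
    """Project a matrix column-wise to {-1, 0, +1} with least-squares scales."""
    t = thresh * np.mean(np.abs(M), axis=0, keepdims=True)
    T = np.sign(M) * (np.abs(M) > t)                  # ternary pattern
    scale = (np.sum(np.abs(M) * np.abs(T), axis=0)
             / np.maximum(np.sum(T * T, axis=0), 1))  # optimal per-column scale
    return T, scale

W = np.random.randn(64, 32)
U, s, Vt = np.linalg.svd(W, full_matrices=False)
Tu, su = ternarize(U)
Tv, sv = ternarize(Vt.T)
# The ternary factors apply with additions only; the scales fold into s.
W_hat = (Tu * su) @ np.diag(s) @ (Tv * sv).T
print(np.linalg.norm(W - W_hat) / np.linalg.norm(W))  # relative sketch error
```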

LLM-Mini-CEX: Automatic Evaluation of Large Language Model for Diagnostic Conversation

  • paper_url: http://arxiv.org/abs/2308.07635
  • repo_url: None
  • paper_authors: Xiaoming Shi, Jie Xu, Jinru Ding, Jiali Pang, Sichen Liu, Shuqing Luo, Xingwei Peng, Lu Lu, Haihong Yang, Mingtao Hu, Tong Ruan, Shaoting Zhang
  • for: Providing a unified and comprehensive evaluation criterion for assessing the diagnostic capabilities of medical large language models (LLMs).
  • methods: The paper first establishes an evaluation criterion, termed the LLM-specific Mini-CEX (based on the original Mini-CEX); to avoid labor-intensive interaction, it develops a patient simulator that converses with LLMs automatically and utilizes ChatGPT to evaluate diagnosis dialogues automatically.
  • results: Experimental results show that the LLM-specific Mini-CEX is adequate and necessary for evaluating medical diagnosis dialogue, and that ChatGPT can replace manual evaluation on metrics of humanistic qualities, providing reproducible and automated comparisons between different LLMs.
    Abstract There is an increasing interest in developing LLMs for medical diagnosis to improve diagnosis efficiency. Despite their alluring technological potential, there is no unified and comprehensive evaluation criterion, leading to the inability to evaluate the quality and potential risks of medical LLMs, further hindering the application of LLMs in medical treatment scenarios. Besides, current evaluations heavily rely on labor-intensive interactions with LLMs to obtain diagnostic dialogues and human evaluation on the quality of diagnosis dialogue. To tackle the lack of unified and comprehensive evaluation criterion, we first initially establish an evaluation criterion, termed LLM-specific Mini-CEX to assess the diagnostic capabilities of LLMs effectively, based on original Mini-CEX. To address the labor-intensive interaction problem, we develop a patient simulator to engage in automatic conversations with LLMs, and utilize ChatGPT for evaluating diagnosis dialogues automatically. Experimental results show that the LLM-specific Mini-CEX is adequate and necessary to evaluate medical diagnosis dialogue. Besides, ChatGPT can replace manual evaluation on the metrics of humanistic qualities and provides reproducible and automated comparisons between different LLMs.

A Survey on Model Compression for Large Language Models

  • paper_url: http://arxiv.org/abs/2308.07633
  • repo_url: None
  • paper_authors: Xunyu Zhu, Jian Li, Yong Liu, Can Ma, Weiping Wang
  • for: Surveying model compression techniques tailored to large language models (LLMs), addressing the need for efficient deployment in resource-constrained environments.
  • methods: The survey covers various methodologies, including quantization, pruning, knowledge distillation, and more, highlighting recent advancements and innovative approaches within each.
  • results: The survey also discusses benchmarking strategies and evaluation metrics essential for assessing the effectiveness of compressed LLMs, and offers insights into the latest developments and practical implications for researchers and practitioners.
    Abstract Large Language Models (LLMs) have revolutionized natural language processing tasks with remarkable success. However, their formidable size and computational demands present significant challenges for practical deployment, especially in resource-constrained environments. As these challenges become increasingly pertinent, the field of model compression has emerged as a pivotal research area to alleviate these limitations. This paper presents a comprehensive survey that navigates the landscape of model compression techniques tailored specifically for LLMs. Addressing the imperative need for efficient deployment, we delve into various methodologies, encompassing quantization, pruning, knowledge distillation, and more. Within each of these techniques, we highlight recent advancements and innovative approaches that contribute to the evolving landscape of LLM research. Furthermore, we explore benchmarking strategies and evaluation metrics that are essential for assessing the effectiveness of compressed LLMs. By providing insights into the latest developments and practical implications, this survey serves as an invaluable resource for both researchers and practitioners. As LLMs continue to evolve, this survey aims to facilitate enhanced efficiency and real-world applicability, establishing a foundation for future advancements in the field.

Vision-based Semantic Communications for Metaverse Services: A Contest Theoretic Approach

  • paper_url: http://arxiv.org/abs/2308.07618
  • repo_url: None
  • paper_authors: Guangyuan Liu, Hongyang Du, Dusit Niyato, Jiawen Kang, Zehui Xiong, Boon Hee Soong
  • for: Providing a semantic communication framework for the interaction and rendering-resource allocation between Metaverse users and the Metaverse Service Provider (MSP), to improve users' experience in the virtual world.
  • methods: Contest theory models the interactions between users and the MSP and determines optimal rendering-resource allocation per user; semantic communication reduces the data transmitted over the wireless link (under the simulation settings, the encoded semantic data contains only 51 bytes of skeleton coordinates instead of an 8.243-megabyte image; a packing sketch follows this entry); and a Deep Q-Network optimizes the reward settings for maximum performance and efficient resource allocation.
  • results: With the optimal reward setting, users are incentivized to select suitable uploading frequencies, reducing down-sampling loss due to rendering-resource constraints by 66.076% compared with the traditional average distribution method.
    Abstract The popularity of Metaverse as an entertainment, social, and work platform has led to a great need for seamless avatar integration in the virtual world. In Metaverse, avatars must be updated and rendered to reflect users' behaviour. Achieving real-time synchronization between the virtual bilocation and the user is complex, placing high demands on the Metaverse Service Provider (MSP)'s rendering resource allocation scheme. To tackle this issue, we propose a semantic communication framework that leverages contest theory to model the interactions between users and MSPs and determine optimal resource allocation for each user. To reduce the consumption of network resources in wireless transmission, we use the semantic communication technique to reduce the amount of data to be transmitted. Under our simulation settings, the encoded semantic data only contains 51 bytes of skeleton coordinates instead of the image size of 8.243 megabytes. Moreover, we implement Deep Q-Network to optimize reward settings for maximum performance and efficient resource allocation. With the optimal reward setting, users are incentivized to select their respective suitable uploading frequency, reducing down-sampling loss due to rendering resource constraints by 66.076\% compared with the traditional average distribution method. The framework provides a novel solution to resource allocation for avatar association in VR environments, ensuring a smooth and immersive experience for all users.
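One way a skeleton fits in 51 bytes is 17 joints at one byte per quantized x, y, z coordinate. The layout below is an assumption chosen only to match the paper's 51-byte figure; the actual encoding is not specified in the abstract.

```python
import struct

def pack_skeleton(joints):
    """Quantize 17 (x, y, z) joints in [0, 1) to one byte each: 51 bytes total."""
    assert len(joints) == 17
    flat = [min(255, int(c * 256)) for xyz in joints for c in xyz]
    return struct.pack("51B", *flat)

def unpack_skeleton(payload):
    vals = struct.unpack("51B", payload)
    return [tuple(v / 256 for v in vals[i:i + 3]) for i in range(0, 51, 3)]

joints = [(0.5, 0.25, 0.75)] * 17
payload = pack_skeleton(joints)
print(len(payload))                 # 51 bytes, versus megabytes for a raw frame
print(unpack_skeleton(payload)[0])  # (0.5, 0.25, 0.75), at 1/256 precision
```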

ERA*: Enhanced Relaxed A* algorithm for Solving the Shortest Path Problem in Regular Grid Maps

  • paper_url: http://arxiv.org/abs/2308.10988
  • repo_url: None
  • paper_authors: Adel Ammar
  • for: Solving the point-to-point shortest path problem in static regular 8-neighbor connectivity (G8) grid maps.
  • methods: The proposed algorithm can be seen as a generalization of the Hadlock algorithm to G8 grids; it is theoretically equivalent to the relaxed $A^*$ ($RA^*$) algorithm in terms of solution path length but uses a completely different computation strategy based on a set of lookup matrices (an $RA^*$-style baseline is sketched after this entry).
  • results: In experiments on grid maps of various types and sizes (1290 runs on 43 maps), the algorithm is on average 2.25 times faster than $RA^*$ and 17 times faster than the original $A^*$, and is more memory-efficient since it does not need to store a G-score matrix.
    Abstract This paper introduces a novel algorithm for solving the point-to-point shortest path problem in a static regular 8-neighbor connectivity (G8) grid. This algorithm can be seen as a generalization of Hadlock algorithm to G8 grids, and is shown to be theoretically equivalent to the relaxed $A^*$ ($RA^*$) algorithm in terms of the provided solution's path length, but with substantial time and memory savings, due to a completely different computation strategy, based on defining a set of lookup matrices. Through an experimental study on grid maps of various types and sizes (1290 runs on 43 maps), it is proven to be 2.25 times faster than $RA^*$ and 17 times faster than the original $A^*$, in average. Moreover, it is more memory-efficient, since it does not need to store a G score matrix.
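For reference, here is a compact relaxed-$A^*$-style baseline on a G8 grid: $A^*$ with the octile heuristic that never re-expands a closed node. It sketches the algorithm the paper matches in path length; ERA*'s lookup-matrix strategy itself is not reproduced.

```python
import heapq, math

def octile(a, b):
    dx, dy = abs(a[0] - b[0]), abs(a[1] - b[1])
    return max(dx, dy) + (math.sqrt(2) - 1) * min(dx, dy)

def relaxed_a_star(grid, start, goal):
    """Shortest path cost on an 8-connected grid; nodes expand at most once."""
    moves = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1) if dx or dy]
    open_heap, g, closed = [(octile(start, goal), 0.0, start)], {start: 0.0}, set()
    while open_heap:
        _, gc, cur = heapq.heappop(open_heap)
        if cur == goal:
            return gc
        if cur in closed:
            continue
        closed.add(cur)
        for dx, dy in moves:
            nxt = (cur[0] + dx, cur[1] + dy)
            if not (0 <= nxt[0] < len(grid) and 0 <= nxt[1] < len(grid[0])):
                continue
            if grid[nxt[0]][nxt[1]] or nxt in closed:   # obstacle or expanded
                continue
            ng = gc + (math.sqrt(2) if dx and dy else 1.0)
            if ng < g.get(nxt, float("inf")):
                g[nxt] = ng
                heapq.heappush(open_heap, (ng + octile(nxt, goal), ng, nxt))
    return float("inf")

grid = [[0, 0, 0], [1, 1, 0], [0, 0, 0]]       # 1 = obstacle
print(relaxed_a_star(grid, (0, 0), (2, 0)))    # ~4.83: detours around the wall
```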

SGDiff: A Style Guided Diffusion Model for Fashion Synthesis

  • paper_url: http://arxiv.org/abs/2308.07605
  • repo_url: https://github.com/taited/sgdiff
  • paper_authors: Zhengwentai Sun, Yanghong Zhou, Honghong He, P. Y. Mok
  • for: Developing a novel style-guided diffusion model (SGDiff) that overcomes certain weaknesses of existing models for fashion image synthesis.
  • methods: SGDiff combines image modality with a pretrained text-to-image diffusion model, incorporating supplementary style guidance that substantially reduces training costs and overcomes the difficulty of controlling synthesized styles with text-only inputs; a novel classifier-free guidance method fuses the multi-modal features.
  • results: The paper introduces SG-Fashion, a new dataset for fashion image synthesis offering high-resolution images and an extensive range of garment categories, and a comprehensive ablation study validates the model's ability to generate fashion images of the desired categories, product attributes, and styles. The code and dataset are available at https://github.com/taited/SGDiff.
    Abstract This paper reports on the development of \textbf{a novel style guided diffusion model (SGDiff)} which overcomes certain weaknesses inherent in existing models for image synthesis. The proposed SGDiff combines image modality with a pretrained text-to-image diffusion model to facilitate creative fashion image synthesis. It addresses the limitations of text-to-image diffusion models by incorporating supplementary style guidance, substantially reducing training costs, and overcoming the difficulties of controlling synthesized styles with text-only inputs. This paper also introduces a new dataset -- SG-Fashion, specifically designed for fashion image synthesis applications, offering high-resolution images and an extensive range of garment categories. By means of comprehensive ablation study, we examine the application of classifier-free guidance to a variety of conditions and validate the effectiveness of the proposed model for generating fashion images of the desired categories, product attributes, and styles. The contributions of this paper include a novel classifier-free guidance method for multi-modal feature fusion, a comprehensive dataset for fashion image synthesis application, a thorough investigation on conditioned text-to-image synthesis, and valuable insights for future research in the text-to-image synthesis domain. The code and dataset are available at: \url{https://github.com/taited/SGDiff}.
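
Classifier-free guidance, which the paper extends to multi-modal (text plus style image) conditions, combines a conditional and an unconditional noise prediction at sampling time. Below is a minimal sketch of the standard single-condition form; the denoiser interface `eps_model(x_t, t, cond)` and the guidance scale are assumptions for illustration, not SGDiff's actual API.

```python
import torch

def cfg_noise(eps_model, x_t, t, cond, uncond, guidance_scale=7.5):
    """Classifier-free guidance: push the conditional noise prediction
    away from the unconditional one by `guidance_scale`."""
    eps_uncond = eps_model(x_t, t, uncond)  # condition dropped (e.g., empty prompt)
    eps_cond = eps_model(x_t, t, cond)      # text (and, in SGDiff, style) condition
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)
```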

Generating Personas for Games with Multimodal Adversarial Imitation Learning

  • paper_url: http://arxiv.org/abs/2308.07598
  • repo_url: None
  • paper_authors: William Ahlberg, Alessandro Sestini, Konrad Tollmar, Linus Gisslén
  • for: To generate multiple AI agents that can imitate the diverse playstyles of human players.
  • methods: Multimodal Generative Adversarial Imitation Learning (MultiGAIL), which uses an auxiliary input parameter to learn distinct personas within a single-agent model and employs multiple discriminators as reward models.
  • results: Experiments in two environments, one with a continuous and one with a discrete action space, show that the method effectively produces multiple distinct agent personas.
    Abstract Reinforcement learning has been widely successful in producing agents capable of playing games at a human level. However, this requires complex reward engineering, and the agent's resulting policy is often unpredictable. Going beyond reinforcement learning is necessary to model a wide range of human playstyles, which can be difficult to represent with a reward function. This paper presents a novel imitation learning approach to generate multiple persona policies for playtesting. Multimodal Generative Adversarial Imitation Learning (MultiGAIL) uses an auxiliary input parameter to learn distinct personas using a single-agent model. MultiGAIL is based on generative adversarial imitation learning and uses multiple discriminators as reward models, inferring the environment reward by comparing the agent and distinct expert policies. The reward from each discriminator is weighted according to the auxiliary input. Our experimental analysis demonstrates the effectiveness of our technique in two environments with continuous and discrete action spaces.
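
The abstract describes inferring the environment reward from multiple discriminators, weighted according to the auxiliary persona input. A hedged sketch of that aggregation using the common GAIL reward form -log(1 - D(s, a)); the weighting scheme and all names are assumptions, not the paper's exact formulation.

```python
import torch

def multigail_reward(discriminators, persona_weights, state, action, eps=1e-8):
    """Weighted sum of per-expert GAIL rewards. Each discriminator outputs
    P(expert | s, a) in (0, 1); `persona_weights` is derived from the
    auxiliary input and selects which expert persona to imitate."""
    rewards = torch.stack(
        [-torch.log(1.0 - d(state, action) + eps) for d in discriminators]
    )
    return (persona_weights * rewards).sum()
```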

AutoLTS: Automating Cycling Stress Assessment via Contrastive Learning and Spatial Post-processing

  • paper_url: http://arxiv.org/abs/2308.07580
  • repo_url: None
  • paper_authors: Bo Lin, Shoshanna Saxe, Timothy C. Y. Chan
  • for: To enable fast, accurate, and large-scale cycling stress assessment for urban road networks, in support of cycling infrastructure planning and route recommendation.
  • methods: A deep learning framework based on street-view images, featuring a contrastive learning approach that exploits the ordinal relationship among cycling stress labels, and a post-processing technique that enforces spatial smoothness on the predictions.
  • results: On a dataset of 39,153 road segments, the framework delivers fast and accurate cycling stress assessment, demonstrating the value of street-view imagery for this task even when high-quality road geometry and motor traffic data are unavailable.
    Abstract Cycling stress assessment, which quantifies cyclists' perceived stress imposed by the built environment and motor traffic, increasingly informs cycling infrastructure planning and cycling route recommendation. However, currently calculating cycling stress is slow and data-intensive, which hinders its broader application. In this paper, we propose a deep learning framework to support accurate, fast, and large-scale cycling stress assessments for urban road networks based on street-view images. Our framework features i) a contrastive learning approach that leverages the ordinal relationship among cycling stress labels, and ii) a post-processing technique that enforces spatial smoothness into our predictions. On a dataset of 39,153 road segments collected in Toronto, Canada, our results demonstrate the effectiveness of our deep learning framework and the value of using image data for cycling stress assessment in the absence of high-quality road geometry and motor traffic data.

IoT Data Trust Evaluation via Machine Learning

  • paper_url: http://arxiv.org/abs/2308.11638
  • repo_url: https://github.com/jettbrains/-L-
  • paper_authors: Timothy Tadj, Reza Arablouei, Volkan Dedeoglu
  • for: Evaluating the trustworthiness of Internet of Things (IoT) data.
  • methods: A random walk infilling (RWI) method that synthesizes untrustworthy data from existing trustworthy data, together with new features extracted from IoT time-series sensor data that effectively capture its auto-correlation and its cross-correlation with the data of neighboring (peer) sensors.
  • results: Extensive experiments show that commonly used ML-based approaches to IoT data trust evaluation perform poorly, which can be attributed to the unsubstantiated assumption that unsupervised clustering provides reliable trust labels. Models learned from RWI-augmented data using the proposed features generalize well to unseen data and outperform existing approaches, and a semi-supervised approach that requires only about 10% of the data to be labeled offers competitive performance while being more practical.
    Abstract Various approaches based on supervised or unsupervised machine learning (ML) have been proposed for evaluating IoT data trust. However, assessing their real-world efficacy is hard mainly due to the lack of related publicly-available datasets that can be used for benchmarking. Since obtaining such datasets is challenging, we propose a data synthesis method, called random walk infilling (RWI), to augment IoT time-series datasets by synthesizing untrustworthy data from existing trustworthy data. Thus, RWI enables us to create labeled datasets that can be used to develop and validate ML models for IoT data trust evaluation. We also extract new features from IoT time-series sensor data that effectively capture its auto-correlation as well as its cross-correlation with the data of the neighboring (peer) sensors. These features can be used to learn ML models for recognizing the trustworthiness of IoT sensor data. Equipped with our synthesized ground-truth-labeled datasets and informative correlation-based feature, we conduct extensive experiments to critically examine various approaches to evaluating IoT data trust via ML. The results reveal that commonly used ML-based approaches to IoT data trust evaluation, which rely on unsupervised cluster analysis to assign trust labels to unlabeled data, perform poorly. This poor performance can be attributed to the underlying unsubstantiated assumption that clustering provides reliable labels for data trust, a premise that is found to be untenable. The results also show that the ML models learned from datasets augmented via RWI while using the proposed features generalize well to unseen data and outperform existing related approaches. Moreover, we observe that a semi-supervised ML approach that requires only about 10% of the data labeled offers competitive performance while being practically more appealing compared to the fully-supervised approaches.
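
The abstract does not spell out the RWI procedure, so the following is a plausible, clearly-assumed sketch of the idea: replace a window of a trustworthy sensor series with a random walk seeded at the last trusted value, and label the synthesized span as untrustworthy.

```python
import numpy as np

def random_walk_infill(series, start, length, step_std=None, seed=0):
    """Synthesize an untrustworthy segment by overwriting
    series[start:start+length] with a random walk (start must be >= 1 so a
    trusted seed value exists). A sketch of the RWI idea, not the authors'
    exact algorithm."""
    rng = np.random.default_rng(seed)
    out = np.asarray(series, dtype=float).copy()
    if step_std is None:
        # Scale steps to the series' own first-difference variability.
        step_std = float(np.std(np.diff(out)))
    steps = rng.normal(0.0, step_std, size=length)
    out[start:start + length] = out[start - 1] + np.cumsum(steps)
    labels = np.ones(out.shape[0], dtype=int)  # 1 = trustworthy
    labels[start:start + length] = 0           # 0 = synthesized / untrustworthy
    return out, labels
```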

Story Visualization by Online Text Augmentation with Context Memory

  • paper_url: http://arxiv.org/abs/2308.07575
  • repo_url: https://github.com/yonseivnl/cmota
  • paper_authors: Daechul Ahn, Daneul Kim, Gwangmo Song, Seung Hwan Kim, Honglak Lee, Dongyeop Kang, Jonghyun Choi
  • for: To improve story visualization, so that models can render visual details from text descriptions while maintaining context across multiple sentences.
  • methods: A novel memory architecture for the bi-directional Transformer framework, combined with online text augmentation that generates multiple pseudo-descriptions as complementary supervision during training.
  • results: On the two popular story visualization benchmarks, Pororo-SV and Flintstones-SV, the method significantly outperforms the state of the art on metrics including FID, character F1, frame accuracy, BLEU-2/3, and R-precision, with similar or lower computational complexity.
    Abstract Story visualization (SV) is a challenging text-to-image generation task for the difficulty of not only rendering visual details from the text descriptions but also encoding a long-term context across multiple sentences. While prior efforts mostly focus on generating a semantically relevant image for each sentence, encoding a context spread across the given paragraph to generate contextually convincing images (e.g., with a correct character or with a proper background of the scene) remains a challenge. To this end, we propose a novel memory architecture for the Bi-directional Transformer framework with an online text augmentation that generates multiple pseudo-descriptions as supplementary supervision during training for better generalization to the language variation at inference. In extensive experiments on the two popular SV benchmarks, i.e., the Pororo-SV and Flintstones-SV, the proposed method significantly outperforms the state of the arts in various metrics including FID, character F1, frame accuracy, BLEU-2/3, and R-precision with similar or less computational complexity.

Action Class Relation Detection and Classification Across Multiple Video Datasets

  • paper_url: http://arxiv.org/abs/2308.07558
  • repo_url: None
  • paper_authors: Yuya Yoshikawa, Yutaro Shigeto, Masashi Shimbo, Akikazu Takeuchi
  • for: Dataset augmentation for human action recognition in videos.
  • methods: Detecting and classifying relations between action classes using the language and visual information associated with the classes.
  • results: Pre-trained recent neural network models for text and video yield high predictive performance; relation prediction based on action label texts is more accurate than prediction based on videos, and blending the predictions of both modalities can further improve performance in some cases.
    Abstract The Meta Video Dataset (MetaVD) provides annotated relations between action classes in major datasets for human action recognition in videos. Although these annotated relations enable dataset augmentation, it is only applicable to those covered by MetaVD. For an external dataset to enjoy the same benefit, the relations between its action classes and those in MetaVD need to be determined. To address this issue, we consider two new machine learning tasks: action class relation detection and classification. We propose a unified model to predict relations between action classes, using language and visual information associated with classes. Experimental results show that (i) pre-trained recent neural network models for texts and videos contribute to high predictive performance, (ii) the relation prediction based on action label texts is more accurate than based on videos, and (iii) a blending approach that combines predictions by both modalities can further improve the predictive performance in some cases.
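
The reported blending of text-based and video-based relation predictions can be as simple as a convex combination of the two models' class probabilities. A minimal sketch, with the blend weight as a tunable assumption (leaning toward text, which the paper found more accurate):

```python
import numpy as np

def blend_relation_predictions(p_text, p_video, alpha=0.7):
    """Convex combination of relation-class probabilities from the text
    and video models; alpha > 0.5 favors the text modality."""
    p = alpha * np.asarray(p_text) + (1.0 - alpha) * np.asarray(p_video)
    return p / p.sum()  # renormalize for numerical safety
```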

Reinforcement Learning (RL) Augmented Cold Start Frequency Reduction in Serverless Computing

  • paper_url: http://arxiv.org/abs/2308.07541
  • repo_url: None
  • paper_authors: Siddharth Agarwal, Maria A. Rodriguez, Rajkumar Buyya
  • for: To reduce the frequency of cold starts on Function-as-a-Service (FaaS) platforms by proactively initializing functions with reinforcement learning.
  • methods: A Q-learning agent that considers metrics such as function CPU utilization, existing function instances, and response failure rate to initialize functions in advance based on expected demand.
  • results: Compared with Kubeless' default policy and a function keep-alive policy, the RL agent improves throughput by up to 8.81% and reduces computation load and resource wastage by up to 55% and 37% respectively, as a direct outcome of fewer cold starts.
    Abstract Function-as-a-Service is a cloud computing paradigm offering an event-driven execution model to applications. It features serverless attributes by eliminating resource management responsibilities from developers and offers transparent and on-demand scalability of applications. Typical serverless applications have stringent response time and scalability requirements and therefore rely on deployed services to provide quick and fault-tolerant feedback to clients. However, the FaaS paradigm suffers from cold starts as there is a non-negligible delay associated with on-demand function initialization. This work focuses on reducing the frequency of cold starts on the platform by using Reinforcement Learning. Our approach uses Q-learning and considers metrics such as function CPU utilization, existing function instances, and response failure rate to proactively initialize functions in advance based on the expected demand. The proposed solution was implemented on Kubeless and was evaluated using a normalised real-world function demand trace with matrix multiplication as the workload. The results demonstrate a favourable performance of the RL-based agent when compared to Kubeless' default policy and function keep-alive policy by improving throughput by up to 8.81% and reducing computation load and resource wastage by up to 55% and 37%, respectively, which is a direct outcome of reduced cold starts.
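
The abstract names the signals the agent observes (CPU utilization, instance count, failure rate) and the Q-learning it applies. A minimal tabular sketch under those assumptions; the state discretization, action set, and hyperparameters are illustrative choices, not the paper's.

```python
import random
from collections import defaultdict

ACTIONS = (-1, 0, 1)  # remove one warm instance / hold / add one

class PrewarmAgent:
    """Tabular Q-learning over a discretized FaaS state, sketched from the
    paper's description of the observed metrics."""

    def __init__(self, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.q = defaultdict(float)  # (state, action) -> value
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    @staticmethod
    def encode(cpu_util, instances, failure_rate):
        # Bucketize continuous metrics so the Q-table stays small.
        return (int(cpu_util * 10), min(int(instances), 20), int(failure_rate * 10))

    def act(self, state):
        if random.random() < self.epsilon:  # epsilon-greedy exploration
            return random.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        best_next = max(self.q[(next_state, a)] for a in ACTIONS)
        td_error = reward + self.gamma * best_next - self.q[(state, action)]
        self.q[(state, action)] += self.alpha * td_error
```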

Domain Adaptation via Minimax Entropy for Real/Bogus Classification of Astronomical Alerts

  • paper_url: http://arxiv.org/abs/2308.07538
  • repo_url: None
  • paper_authors: Guillermo Cabrera-Vives, César Bolivar, Francisco Förster, Alejandra M. Muñoz Arancibia, Manuel Pérez-Carrasco, Esteban Reyes
  • for: To study domain adaptation (DA) for real/bogus classification of astronomical alerts, improving classification accuracy across surveys.
  • methods: Four datasets (HiTS, DES, ATLAS, and ZTF) are used to study the domain shift between surveys; a naive deep learning classification model is improved through fine tuning and through semi-supervised deep DA via Minimax Entropy (MME).
  • results: Both the fine-tuned and MME models significantly improve the base model with as few as one labeled item per class from the target dataset, and the MME model does not compromise performance on the source dataset.
    Abstract Time domain astronomy is advancing towards the analysis of multiple massive datasets in real time, prompting the development of multi-stream machine learning models. In this work, we study Domain Adaptation (DA) for real/bogus classification of astronomical alerts using four different datasets: HiTS, DES, ATLAS, and ZTF. We study the domain shift between these datasets, and improve a naive deep learning classification model by using a fine tuning approach and semi-supervised deep DA via Minimax Entropy (MME). We compare the balanced accuracy of these models for different source-target scenarios. We find that both the fine tuning and MME models improve significantly the base model with as few as one labeled item per class coming from the target dataset, but that the MME does not compromise its performance on the source dataset.
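
Minimax Entropy adversarially optimizes the entropy of the classifier's predictions on unlabeled target data: the classifier maximizes it while the feature extractor minimizes it, typically implemented with a gradient-reversal layer. A compact sketch of that core term in PyTorch; the coefficient and interfaces are illustrative, not the paper's code.

```python
import torch
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass, negated gradient in the backward pass."""
    @staticmethod
    def forward(ctx, x):
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -grad_output

def mme_adversarial_entropy(features, classifier, lamda=0.1):
    """Loss equal to -lamda * H(p) on unlabeled target features. Minimizing
    it makes the classifier MAXIMIZE prediction entropy, while the reversed
    gradient makes the feature extractor MINIMIZE it -- the MME minimax game."""
    p = F.softmax(classifier(GradReverse.apply(features)), dim=1)
    return lamda * (p * torch.log(p + 1e-8)).sum(dim=1).mean()
```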

KMF: Knowledge-Aware Multi-Faceted Representation Learning for Zero-Shot Node Classification

  • paper_url: http://arxiv.org/abs/2308.08563
  • repo_url: None
  • paper_authors: Likang Wu, Junji Jiang, Hongke Zhao, Hao Wang, Defu Lian, Mengdi Zhang, Enhong Chen
  • for: Zero-Shot Node Classification (ZNC) task in graph data analysis, to predict nodes from unseen classes.
  • methods: Knowledge-Aware Multi-Faceted (KMF) framework that enhances label semantics via extracted KG-based topics, and reconstructs node content to a topic-level representation.
  • results: Extensive experiments on several public graph datasets, together with an application to zero-shot cross-domain recommendation, demonstrate the effectiveness and generalization of KMF compared with state-of-the-art baselines.
    Abstract Recently, Zero-Shot Node Classification (ZNC) has been an emerging and crucial task in graph data analysis. This task aims to predict nodes from unseen classes which are unobserved in the training process. Existing work mainly utilizes Graph Neural Networks (GNNs) to associate features' prototypes and labels' semantics thus enabling knowledge transfer from seen to unseen classes. However, the multi-faceted semantic orientation in the feature-semantic alignment has been neglected by previous work, i.e. the content of a node usually covers diverse topics that are relevant to the semantics of multiple labels. It's necessary to separate and judge the semantic factors that tremendously affect the cognitive ability to improve the generality of models. To this end, we propose a Knowledge-Aware Multi-Faceted framework (KMF) that enhances the richness of label semantics via the extracted KG (Knowledge Graph)-based topics. And then the content of each node is reconstructed to a topic-level representation that offers multi-faceted and fine-grained semantic relevancy to different labels. Due to the particularity of the graph's instance (i.e., node) representation, a novel geometric constraint is developed to alleviate the problem of prototype drift caused by node information aggregation. Finally, we conduct extensive experiments on several public graph datasets and design an application of zero-shot cross-domain recommendation. The quantitative results demonstrate both the effectiveness and generalization of KMF with the comparison of state-of-the-art baselines.

Nonlinearity, Feedback and Uniform Consistency in Causal Structural Learning

  • paper_url: http://arxiv.org/abs/2308.07520
  • repo_url: None
  • paper_authors: Shuyan Wang
  • for: To develop automated search methods for learning causal structures from observational data.
  • methods: An alternative definition of k-Triangle Faithfulness that is weaker than Strong Faithfulness for the Gaussian family of distributions and applicable to non-Gaussian families, together with a modified causal discovery algorithm that relaxes various simplifying assumptions so it applies to a wider range of causal mechanisms and statistical phenomena.
  • results: Under the assumption that the modified version of Strong Faithfulness holds, the modified causal discovery algorithm is shown to be uniformly consistent, and relaxing the sufficiency assumption allows causal structures with latent variables to be learned.
    Abstract The goal of Causal Discovery is to find automated search methods for learning causal structures from observational data. In some cases all variables of the causal mechanism of interest are measured, and the task is to predict the effects one measured variable has on another. In contrast, sometimes the variables of primary interest are not directly observable but instead inferred from their manifestations in the data. These are referred to as latent variables. One commonly known example is the psychological construct of intelligence, which cannot be directly measured, so researchers try to assess it through various indicators such as IQ tests. In this case, causal discovery algorithms can uncover underlying patterns and structures to reveal the causal connections between the latent variables and between the latent and observed variables. This thesis focuses on two questions in causal discovery: providing an alternative definition of k-Triangle Faithfulness that (i) is weaker than strong faithfulness when applied to the Gaussian family of distributions, (ii) can be applied to non-Gaussian families of distributions, and (iii) under the assumption that the modified version of Strong Faithfulness holds, can be used to show the uniform consistency of a modified causal discovery algorithm; and relaxing the sufficiency assumption to learn causal structures with latent variables. Given the importance of inferring cause-and-effect relationships for understanding and forecasting complex systems, the work in this thesis of relaxing various simplification assumptions is expected to extend the causal discovery method to be applicable in a wider range with diversified causal mechanisms and statistical phenomena.

Boosting Semi-Supervised Learning by bridging high and low-confidence predictions

  • paper_url: http://arxiv.org/abs/2308.07509
  • repo_url: None
  • paper_authors: Khanh-Binh Nguyen, Joon-Sung Yang
  • for: To address three main issues of pseudo-labeling methods and thereby improve the performance and generalizability of semi-supervised learning.
  • methods: A new method, ReFixMatch, that exploits all of the unlabeled data during training to improve the model's generalizability and performance.
  • results: ReFixMatch achieves 41.05% top-1 accuracy with 100k labeled examples on ImageNet, outperforming FixMatch and current state-of-the-art methods.
    Abstract Pseudo-labeling is a crucial technique in semi-supervised learning (SSL), where artificial labels are generated for unlabeled data by a trained model, allowing for the simultaneous training of labeled and unlabeled data in a supervised setting. However, several studies have identified three main issues with pseudo-labeling-based approaches. Firstly, these methods heavily rely on predictions from the trained model, which may not always be accurate, leading to a confirmation bias problem. Secondly, the trained model may be overfitted to easy-to-learn examples, ignoring hard-to-learn ones, resulting in the "Matthew effect" where the already strong become stronger and the weak weaker. Thirdly, most of the low-confidence predictions of unlabeled data are discarded due to the use of a high threshold, leading to an underutilization of unlabeled data during training. To address these issues, we propose a new method called ReFixMatch, which aims to utilize all of the unlabeled data during training, thus improving the generalizability of the model and performance on SSL benchmarks. Notably, ReFixMatch achieves 41.05% top-1 accuracy with 100k labeled examples on ImageNet, outperforming the baseline FixMatch and current state-of-the-art methods.
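
For context, the FixMatch baseline that ReFixMatch builds on keeps only high-confidence pseudo-labels behind a fixed threshold, which is exactly the under-utilization of unlabeled data the paper targets. A minimal sketch of that thresholded unlabeled loss (how ReFixMatch recycles the low-confidence predictions is not detailed in the abstract, so it is not shown):

```python
import torch
import torch.nn.functional as F

def fixmatch_unlabeled_loss(logits_weak, logits_strong, threshold=0.95):
    """Pseudo-label from the weakly augmented view, train the strongly
    augmented view on it, and mask out every sample whose confidence falls
    below `threshold` -- discarding much of the unlabeled batch."""
    probs = F.softmax(logits_weak.detach(), dim=1)
    confidence, pseudo_labels = probs.max(dim=1)
    mask = (confidence >= threshold).float()
    per_sample = F.cross_entropy(logits_strong, pseudo_labels, reduction="none")
    return (mask * per_sample).mean()
```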

Detecting The Corruption Of Online Questionnaires By Artificial Intelligence

  • paper_url: http://arxiv.org/abs/2308.07499
  • repo_url: None
  • paper_authors: Benjamin Lebrun, Sharon Temtsin, Andrew Vonasch, Christoph Bartneck
  • for: To test whether AI-generated text submitted to online questionnaires can be detected.
  • methods: Both human judges and automatic AI detection systems were asked to identify the authorship of texts.
  • results: Human participants identified authorship above chance level (76% accuracy), but this is not sufficient to ensure data quality, and the automatic AI detection systems proved completely unusable. If AI-generated submissions become too prevalent, the cost of detecting fraudulent submissions will outweigh the benefits of online questionnaires, and the problem can only be addressed systematically by crowd-sourcing platforms.
    Abstract Online questionnaires that use crowd-sourcing platforms to recruit participants have become commonplace, due to their ease of use and low costs. Artificial Intelligence (AI) based Large Language Models (LLM) have made it easy for bad actors to automatically fill in online forms, including generating meaningful text for open-ended tasks. These technological advances threaten the data quality for studies that use online questionnaires. This study tested if text generated by an AI for the purpose of an online study can be detected by both humans and automatic AI detection systems. While humans were able to correctly identify authorship of text above chance level (76 percent accuracy), their performance was still below what would be required to ensure satisfactory data quality. Researchers currently have to rely on the disinterest of bad actors to successfully use open-ended responses as a useful tool for ensuring data quality. Automatic AI detection systems are currently completely unusable. If AIs become too prevalent in submitting responses then the costs associated with detecting fraudulent submissions will outweigh the benefits of online questionnaires. Individual attention checks will no longer be a sufficient tool to ensure good data quality. This problem can only be systematically addressed by crowd-sourcing platforms. They cannot rely on automatic AI detection systems and it is unclear how they can ensure data quality for their paying clients.

DREAMWALKER: Mental Planning for Continuous Vision-Language Navigation

  • paper_url: http://arxiv.org/abs/2308.07498
  • repo_url: https://github.com/HanqingWangAI/Dreamwalker
  • paper_authors: Hanqing Wang, Wei Liang, Luc Van Gool, Wenguan Wang
  • for: DREAMWALKER is a world-model-based VLN-CE agent that leverages mental experiments to plan and make strategic decisions in a freely traversable environment.
  • methods: The world model summarizes the visual, topological, and dynamic properties of the environment into a discrete, structured, and compact representation; DREAMWALKER simulates and evaluates possible plans entirely in this internal abstract world before executing costly actions.
  • results: Extensive experiments and ablation studies on the VLN-CE dataset confirm the effectiveness of the proposed approach and outline fruitful directions for future work.
    Abstract VLN-CE is a recently released embodied task, where AI agents need to navigate a freely traversable environment to reach a distant target location, given language instructions. It poses great challenges due to the huge space of possible strategies. Driven by the belief that the ability to anticipate the consequences of future actions is crucial for the emergence of intelligent and interpretable planning behavior, we propose DREAMWALKER -- a world model based VLN-CE agent. The world model is built to summarize the visual, topological, and dynamic properties of the complicated continuous environment into a discrete, structured, and compact representation. DREAMWALKER can simulate and evaluate possible plans entirely in such internal abstract world, before executing costly actions. As opposed to existing model-free VLN-CE agents simply making greedy decisions in the real world, which easily results in shortsighted behaviors, DREAMWALKER is able to make strategic planning through large amounts of ``mental experiments.'' Moreover, the imagined future scenarios reflect our agent's intention, making its decision-making process more transparent. Extensive experiments and ablation studies on VLN-CE dataset confirm the effectiveness of the proposed approach and outline fruitful directions for future work.

ST-MLP: A Cascaded Spatio-Temporal Linear Framework with Channel-Independence Strategy for Traffic Forecasting

  • paper_url: http://arxiv.org/abs/2308.07496
  • repo_url: None
  • paper_authors: Zepu Wang, Yuqi Nie, Peng Sun, Nam H. Nguyen, John Mulvey, H. Vincent Poor
  • for: Prompt and precise traffic forecasting to optimize traffic flow management in Intelligent Transportation Systems (ITS).
  • methods: A concise spatio-temporal model built solely from cascaded Multi-Layer Perceptron (MLP) modules and linear layers, which incorporates temporal information, spatial information, and a predefined graph structure, together with a channel-independence strategy that has proven effective in time series forecasting.
  • results: ST-MLP outperforms state-of-the-art STGNNs and other models in both accuracy and computational efficiency.
    Abstract The criticality of prompt and precise traffic forecasting in optimizing traffic flow management in Intelligent Transportation Systems (ITS) has drawn substantial scholarly focus. Spatio-Temporal Graph Neural Networks (STGNNs) have been lauded for their adaptability to road graph structures. Yet, current research on STGNNs architectures often prioritizes complex designs, leading to elevated computational burdens with only minor enhancements in accuracy. To address this issue, we propose ST-MLP, a concise spatio-temporal model solely based on cascaded Multi-Layer Perceptron (MLP) modules and linear layers. Specifically, we incorporate temporal information, spatial information and predefined graph structure with a successful implementation of the channel-independence strategy - an effective technique in time series forecasting. Empirical results demonstrate that ST-MLP outperforms state-of-the-art STGNNs and other models in terms of accuracy and computational efficiency. Our finding encourages further exploration of more concise and effective neural network architectures in the field of traffic forecasting.

Omega-Regular Reward Machines

  • paper_url: http://arxiv.org/abs/2308.07469
  • repo_url: None
  • paper_authors: Ernst Moritz Hahn, Mateo Perez, Sven Schewe, Fabio Somenzi, Ashutosh Trivedi, Dominik Wojtczak
  • for: To design reward mechanisms expressive enough for learning objectives that go beyond the Markovian assumption in reinforcement learning.
  • methods: Omega-regular reward machines, which integrate reward machines with omega-regular languages to obtain an expressive and effective reward mechanism for quantitative and qualitative objectives.
  • results: A model-free RL algorithm computes epsilon-optimal strategies against omega-regular reward machines, and experiments demonstrate the feasibility and effectiveness of the approach.
    Abstract Reinforcement learning (RL) is a powerful approach for training agents to perform tasks, but designing an appropriate reward mechanism is critical to its success. However, in many cases, the complexity of the learning objectives goes beyond the capabilities of the Markovian assumption, necessitating a more sophisticated reward mechanism. Reward machines and omega-regular languages are two formalisms used to express non-Markovian rewards for quantitative and qualitative objectives, respectively. This paper introduces omega-regular reward machines, which integrate reward machines with omega-regular languages to enable an expressive and effective reward mechanism for RL. We present a model-free RL algorithm to compute epsilon-optimal strategies against omega-regular reward machines and evaluate the effectiveness of the proposed algorithm through experiments.
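
A reward machine is a finite automaton whose transitions are triggered by high-level events and emit rewards, which is what makes non-Markovian objectives expressible; the paper's omega-regular extension changes the acceptance condition, not these basic mechanics. A toy sketch of a plain reward machine (the omega-regular bookkeeping is omitted):

```python
class RewardMachine:
    """Finite-state reward machine: delta maps (state, event) -> (next_state, reward).
    Rewards are non-Markovian because they depend on the machine state,
    i.e., on the history of events, not just the current MDP state."""

    def __init__(self, delta, initial_state):
        self.delta = delta
        self.state = initial_state

    def step(self, event):
        self.state, reward = self.delta[(self.state, event)]
        return reward

# Toy example: reward 1 only for reaching 'goal' after 'key' has been seen.
delta = {
    ("u0", "key"):  ("u1", 0.0),
    ("u0", "goal"): ("u0", 0.0),   # goal without key: no reward
    ("u1", "key"):  ("u1", 0.0),
    ("u1", "goal"): ("u2", 1.0),   # key then goal: rewarded
    ("u2", "key"):  ("u2", 0.0),
    ("u2", "goal"): ("u2", 0.0),
}
rm = RewardMachine(delta, "u0")
print([rm.step(e) for e in ["goal", "key", "goal"]])  # [0.0, 0.0, 1.0]
```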

Playing with Words: Comparing the Vocabulary and Lexical Richness of ChatGPT and Humans

  • paper_url: http://arxiv.org/abs/2308.07462
  • repo_url: None
  • paper_authors: Pedro Reviriego, Javier Conde, Elena Merino-Gómez, Gonzalo Martínez, José Alberto Hernández
  • for: This study aims to compare the vocabulary and lexical richness of ChatGPT and human responses when performing the same tasks.
  • methods: The study uses two datasets containing answers to different types of questions answered by ChatGPT and humans, and analyzes the number of distinct words and lexical richness of each.
  • results: Preliminary results show that ChatGPT tends to use fewer distinct words and lower lexical richness than humans.
    Abstract The introduction of Artificial Intelligence (AI) generative language models such as GPT (Generative Pre-trained Transformer) and tools such as ChatGPT has triggered a revolution that can transform how text is generated. This has many implications, for example, as AI-generated text becomes a significant fraction of the text in many disciplines, would this have an effect on the language capabilities of readers and also on the training of newer AI tools? Would it affect the evolution of languages? Focusing on one specific aspect of the language: words; will the use of tools such as ChatGPT increase or reduce the vocabulary used or the lexical richness (understood as the number of different words used in a written or oral production) when writing a given text? This has implications for words, as those not included in AI-generated content will tend to be less and less popular and may eventually be lost. In this work, we perform an initial comparison of the vocabulary and lexical richness of ChatGPT and humans when performing the same tasks. In more detail, two datasets containing the answers to different types of questions answered by ChatGPT and humans are used, and the analysis shows that ChatGPT tends to use fewer distinct words and lower lexical richness than humans. These results are very preliminary and additional datasets and ChatGPT configurations have to be evaluated to extract more general conclusions. Therefore, further research is needed to understand how the use of ChatGPT and more broadly generative AI tools will affect the vocabulary and lexical richness in different types of text and languages.
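
Lexical richness as used in the paper (the number of different words in a production) is commonly normalized as a type-token ratio. A minimal sketch of both measures; the tokenizer is a deliberate simplification, and the example sentences are invented.

```python
import re

def lexical_stats(text):
    """Count distinct words (types) and total words (tokens);
    the type-token ratio (TTR) is a simple lexical-richness measure."""
    tokens = re.findall(r"[a-z']+", text.lower())
    types = set(tokens)
    return {"tokens": len(tokens), "types": len(types),
            "ttr": len(types) / max(len(tokens), 1)}

human = "The quick brown fox jumps over the lazy dog near the quiet river."
model = "The fox jumps over the dog and the fox jumps over the dog again."
print(lexical_stats(human))  # higher TTR: more distinct words
print(lexical_stats(model))  # lower TTR: more repetition
```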

Inductive Knowledge Graph Completion with GNNs and Rules: An Analysis

  • paper_url: http://arxiv.org/abs/2308.07942
  • repo_url: https://github.com/anilakash/indkgc
  • paper_authors: Akash Anil, Víctor Gutiérrez-Basulto, Yazmín Ibañéz-García, Steven Schockaert
  • for: To study inductive knowledge graph completion, where inference patterns learned from a training graph are used to make predictions on a disjoint test graph.
  • methods: Rule-based methods seem a natural fit for this task but in practice significantly underperform GNN-based methods such as NBFNet, which the authors attribute to two factors: (i) implausible entities are not ranked at all, and (ii) only the most informative path is taken into account when scoring a link prediction. Several variants of a rule-based approach are studied that specifically address these issues.
  • results: The resulting models achieve performance close to that of NBFNet while using only a small fraction of the evidence NBFNet relies on, largely preserving the interpretability advantage of rule-based methods; a further variant that considers the full knowledge graph consistently outperforms NBFNet.
    Abstract The task of inductive knowledge graph completion requires models to learn inference patterns from a training graph, which can then be used to make predictions on a disjoint test graph. Rule-based methods seem like a natural fit for this task, but in practice they significantly underperform state-of-the-art methods based on Graph Neural Networks (GNNs), such as NBFNet. We hypothesise that the underperformance of rule-based methods is due to two factors: (i) implausible entities are not ranked at all and (ii) only the most informative path is taken into account when determining the confidence in a given link prediction answer. To analyse the impact of these factors, we study a number of variants of a rule-based approach, which are specifically aimed at addressing the aforementioned issues. We find that the resulting models can achieve a performance which is close to that of NBFNet. Crucially, the considered variants only use a small fraction of the evidence that NBFNet relies on, which means that they largely keep the interpretability advantage of rule-based methods. Moreover, we show that a further variant, which does look at the full KG, consistently outperforms NBFNet.

Artificial Intelligence for Smart Transportation

  • paper_url: http://arxiv.org/abs/2308.07457
  • repo_url: https://github.com/SarmisthaDutta/application-of-artificial-Intelligence-for-future-sustainable-smart-city
  • paper_authors: Michael Wilbur, Amutheezan Sivagnanam, Afiya Ayman, Samitha Samaranayeke, Abhishek Dubey, Aron Laszka
  • for: To improve the efficiency and utilization of public transit systems, which distribute social and economic benefits and underpin many communities.
  • methods: AI-driven, data-driven smart transportation systems, covering data sources and data, how AI can aid transportation decision-making, and computational problems in the transportation domain together with AI approaches to them.
  • results: The chapter lays out the primary requirements, objectives, and challenges in designing AI-driven smart transportation systems from the perspective of transit agencies, showing how data analysis and AI can raise transit efficiency and utilization.
    Abstract There are more than 7,000 public transit agencies in the U.S. (and many more private agencies), and together, they are responsible for serving 60 billion passenger miles each year. A well-functioning transit system fosters the growth and expansion of businesses, distributes social and economic benefits, and links the capabilities of community members, thereby enhancing what they can accomplish as a society. Since affordable public transit services are the backbones of many communities, this work investigates ways in which Artificial Intelligence (AI) can improve efficiency and increase utilization from the perspective of transit agencies. This book chapter discusses the primary requirements, objectives, and challenges related to the design of AI-driven smart transportation systems. We focus on three major topics. First, we discuss data sources and data. Second, we provide an overview of how AI can aid decision-making with a focus on transportation. Lastly, we discuss computational problems in the transportation domain and AI approaches to these problems.

GRU-D-Weibull: A Novel Real-Time Individualized Endpoint Prediction

  • paper_url: http://arxiv.org/abs/2308.07452
  • repo_url: None
  • paper_authors: Xiaoyang Ruan, Liwei Wang, Charat Thongprayoon, Wisit Cheungpasitporn, Hongfang Liu
  • for: To develop a new method, GRU-D-Weibull, for predicting individual-level endpoints and time-to-endpoint, enabling real-time individualized endpoint prediction and population-level risk management.
  • methods: GRU-D-Weibull combines gated recurrent units with decay (GRU-D) to model the Weibull distribution of time-to-event.
  • results: On a cohort of 6,879 patients with stage 4 chronic kidney disease (CKD4), GRU-D-Weibull achieved a C-index of ~0.7 at the index date, rising to ~0.77 after 4.3 years of follow-up, comparable to a random survival forest. It attained an absolute L1 loss of ~1.1 years (SD 0.95) at the CKD4 index date and a minimum of ~0.45 years (SD 0.3) at 4 years of follow-up, significantly outperforming competing methods, while constraining the predicted survival probability at event time within a smaller and more stable range than other models throughout follow-up.
    Abstract Accurate prediction models for individual-level endpoints and time-to-endpoints are crucial in clinical practice. In this study, we propose a novel approach, GRU-D-Weibull, which combines gated recurrent units with decay (GRU-D) to model the Weibull distribution. Our method enables real-time individualized endpoint prediction and population-level risk management. Using a cohort of 6,879 patients with stage 4 chronic kidney disease (CKD4), we evaluated the performance of GRU-D-Weibull in endpoint prediction. The C-index of GRU-D-Weibull was ~0.7 at the index date and increased to ~0.77 after 4.3 years of follow-up, similar to random survival forest. Our approach achieved an absolute L1-loss of ~1.1 years (SD 0.95) at the CKD4 index date and a minimum of ~0.45 years (SD 0.3) at 4 years of follow-up, outperforming competing methods significantly. GRU-D-Weibull consistently constrained the predicted survival probability at the time of an event within a smaller and more fixed range compared to other models throughout the follow-up period. We observed significant correlations between the error in point estimates and missing proportions of input features at the index date (correlations from ~0.1 to ~0.3), which diminished within 1 year as more data became available. By post-training recalibration, we successfully aligned the predicted and observed survival probabilities across multiple prediction horizons at different time points during follow-up. Our findings demonstrate the considerable potential of GRU-D-Weibull as the next-generation architecture for endpoint risk management, capable of generating various endpoint estimates for real-time monitoring using clinical data.
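
Once a network emits a per-patient Weibull shape k and scale lam, endpoint quantities follow in closed form: the survival function is S(t) = exp(-(t/lam)^k) and the median time-to-event is lam * (ln 2)^(1/k). A small sketch of turning predicted parameters into estimates; the parameterization is the standard two-parameter Weibull, and the example numbers are invented.

```python
import math

def weibull_survival(t, k, lam):
    """S(t) = exp(-(t/lam)^k): probability the endpoint has not occurred by t."""
    return math.exp(-((t / lam) ** k))

def weibull_median_time(k, lam):
    """Median time-to-event: solve S(t) = 0.5 for t."""
    return lam * math.log(2.0) ** (1.0 / k)

# E.g., a patient with predicted shape k=1.5 and scale lam=3.2 years:
k, lam = 1.5, 3.2
print(weibull_survival(2.0, k, lam))  # survival probability at 2 years
print(weibull_median_time(k, lam))    # point estimate of time-to-endpoint
```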

Open-set Face Recognition using Ensembles trained on Clustered Data

  • paper_url: http://arxiv.org/abs/2308.07445
  • repo_url: None
  • paper_authors: Rafael Henrique Vareto, William Robson Schwartz
  • for: The purpose of this paper is to develop a scalable open-set face recognition approach that can handle large numbers of unfamiliar faces.
  • methods: The method uses clustering and an ensemble of binary learning algorithms to classify query face samples and retrieve their correct identity from a gallery of hundreds or thousands of subjects.
  • results: Experimental results show that the approach can achieve competitive performance even when targeting scalability, handling large numbers of unfamiliar faces.
    Abstract Open-set face recognition describes a scenario where unknown subjects, unseen during the training stage, appear on test time. Not only it requires methods that accurately identify individuals of interest, but also demands approaches that effectively deal with unfamiliar faces. This work details a scalable open-set face identification approach to galleries composed of hundreds and thousands of subjects. It is composed of clustering and an ensemble of binary learning algorithms that estimates when query face samples belong to the face gallery and then retrieves their correct identity. The approach selects the most suitable gallery subjects and uses the ensemble to improve prediction performance. We carry out experiments on well-known LFW and YTF benchmarks. Results show that competitive performance can be achieved even when targeting scalability.

The Performance of Transferability Metrics does not Translate to Medical Tasks

  • paper_url: http://arxiv.org/abs/2308.07444
  • repo_url: None
  • paper_authors: Levy Chaves, Alceu Bissoto, Eduardo Valle, Sandra Avila
  • for: To evaluate how well transferability scores work in medical image analysis, where they promise a cheap way to choose a deep learning architecture for a target dataset.
  • methods: Seven transferability scoring methods are thoroughly evaluated in three medical applications, including out-of-distribution scenarios.
  • results: Despite promising results on general-purpose datasets, no transferability score reliably and consistently estimates target performance in medical contexts, inviting further work in that direction.
    Abstract Transfer learning boosts the performance of medical image analysis by enabling deep learning (DL) on small datasets through the knowledge acquired from large ones. As the number of DL architectures explodes, exhaustively attempting all candidates becomes unfeasible, motivating cheaper alternatives for choosing them. Transferability scoring methods emerge as an enticing solution, allowing to efficiently calculate a score that correlates with the architecture accuracy on any target dataset. However, since transferability scores have not been evaluated on medical datasets, their use in this context remains uncertain, preventing them from benefiting practitioners. We fill that gap in this work, thoroughly evaluating seven transferability scores in three medical applications, including out-of-distribution scenarios. Despite promising results in general-purpose datasets, our results show that no transferability score can reliably and consistently estimate target performance in medical contexts, inviting further work in that direction.

Physics-Informed Deep Learning to Reduce the Bias in Joint Prediction of Nitrogen Oxides

  • paper_url: http://arxiv.org/abs/2308.07441
  • repo_url: None
  • paper_authors: Lianfa Li, Roxana Khalili, Frederick Lurmann, Nathan Pavlovic, Jun Wu, Yan Xu, Yisi Liu, Karl O’Sharkey, Beate Ritz, Luke Oman, Meredith Franklin, Theresa Bastain, Shohreh F. Farzan, Carrie Breton, Rima Habre
  • for: To improve the accuracy and reliability of air quality prediction, in particular the joint prediction of nitrogen oxides (NO2 and NOx).
  • methods: A physics-informed deep learning framework that encodes advection-diffusion mechanisms and fluid dynamics constraints, combining the knowledge-driven physicochemical principles of chemical transport models (CTMs) with the predictive power of machine learning (ML), and providing explicit uncertainty estimation.
  • results: The framework reduces ML model bias in joint NO2 and NOx prediction by 21-42%, captures fine-scale pollutant transport, and generates robust spatial extrapolation.
    Abstract Atmospheric nitrogen oxides (NOx) primarily from fuel combustion have recognized acute and chronic health and environmental effects. Machine learning (ML) methods have significantly enhanced our capacity to predict NOx concentrations at ground-level with high spatiotemporal resolution but may suffer from high estimation bias since they lack physical and chemical knowledge about air pollution dynamics. Chemical transport models (CTMs) leverage this knowledge; however, accurate predictions of ground-level concentrations typically necessitate extensive post-calibration. Here, we present a physics-informed deep learning framework that encodes advection-diffusion mechanisms and fluid dynamics constraints to jointly predict NO2 and NOx and reduce ML model bias by 21-42%. Our approach captures fine-scale transport of NO2 and NOx, generates robust spatial extrapolation, and provides explicit uncertainty estimation. The framework fuses knowledge-driven physicochemical principles of CTMs with the predictive power of ML for air quality exposure, health, and policy applications. Our approach offers significant improvements over purely data-driven ML methods and has unprecedented bias reduction in joint NO2 and NOx prediction.

Interaction-Aware Personalized Vehicle Trajectory Prediction Using Temporal Graph Neural Networks

  • paper_url: http://arxiv.org/abs/2308.07439
  • repo_url: None
  • paper_authors: Amr Abdelraouf, Rohit Gupta, Kyungtae Han
  • for: To improve vehicle trajectory prediction for advanced driver assistance systems and autonomous vehicles.
  • methods: Temporal graph neural networks that combine Graph Convolution Networks (GCN) and Long Short-Term Memory (LSTM) to model the spatio-temporal interactions between target vehicles and surrounding traffic, personalized via transfer learning: the model is pre-trained on a large-scale trajectory dataset and then fine-tuned on each driver's own driving data.
  • results: The personalized GCN-LSTM model outperforms its generic counterpart, particularly over longer prediction horizons, and also outperforms individual models trained without pre-training, underscoring the value of large-scale pre-training for avoiding overfitting.
    Abstract Accurate prediction of vehicle trajectories is vital for advanced driver assistance systems and autonomous vehicles. Existing methods mainly rely on generic trajectory predictions derived from large datasets, overlooking the personalized driving patterns of individual drivers. To address this gap, we propose an approach for interaction-aware personalized vehicle trajectory prediction that incorporates temporal graph neural networks. Our method utilizes Graph Convolution Networks (GCN) and Long Short-Term Memory (LSTM) to model the spatio-temporal interactions between target vehicles and their surrounding traffic. To personalize the predictions, we establish a pipeline that leverages transfer learning: the model is initially pre-trained on a large-scale trajectory dataset and then fine-tuned for each driver using their specific driving data. We employ human-in-the-loop simulation to collect personalized naturalistic driving trajectories and corresponding surrounding vehicle trajectories. Experimental results demonstrate the superior performance of our personalized GCN-LSTM model, particularly for longer prediction horizons, compared to its generic counterpart. Moreover, the personalized model outperforms individual models created without pre-training, emphasizing the significance of pre-training on a large dataset to avoid overfitting. By incorporating personalization, our approach enhances trajectory prediction accuracy.

Semantic Similarity Loss for Neural Source Code Summarization

  • paper_url: http://arxiv.org/abs/2308.07429
  • repo_url: https://github.com/apcl-research/funcom-useloss
  • paper_authors: Chia-Yi Su, Collin McMillan
  • for: To propose an improved loss function for neural source code summarization, the task of automatically writing natural language descriptions of source code.
  • methods: A loss computed with a semantic similarity metric over the whole predicted sentence per training batch, combined with the traditional per-word categorical cross-entropy loss.
  • results: Evaluated against several baselines, the approach yields an improvement in the vast majority of conditions.
    Abstract This paper presents an improved loss function for neural source code summarization. Code summarization is the task of writing natural language descriptions of source code. Neural code summarization refers to automated techniques for generating these descriptions using neural networks. Almost all current approaches involve neural networks as either standalone models or as part of a pretrained large language models e.g., GPT, Codex, LLaMA. Yet almost all also use a categorical cross-entropy (CCE) loss function for network optimization. Two problems with CCE are that 1) it computes loss over each word prediction one-at-a-time, rather than evaluating a whole sentence, and 2) it requires a perfect prediction, leaving no room for partial credit for synonyms. We propose and evaluate a loss function to alleviate this problem. In essence, we propose to use a semantic similarity metric to calculate loss over the whole output sentence prediction per training batch, rather than just loss for each word. We also propose to combine our loss with traditional CCE for each word, which streamlines the training process compared to baselines. We evaluate our approach over several baselines and report an improvement in the vast majority of conditions.
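
The companion repository's name (funcom-useloss) hints at a sentence encoder such as the Universal Sentence Encoder, but the abstract only commits to a semantic similarity metric over the whole predicted sentence combined with per-word CCE. A hedged sketch of that combination with a generic encoder; the mixing weight, tensor shapes, and the assumption that sentence embeddings of the decoded batch outputs are available are ours, not the paper's.

```python
import torch
import torch.nn.functional as F

def combined_summarization_loss(logits, target_ids, pred_emb, ref_emb, beta=0.5):
    """Per-word categorical cross-entropy plus a sentence-level semantic
    term (1 - cosine similarity between embeddings of the predicted and
    reference summaries), averaged over the training batch.

    logits:     (batch, seq_len, vocab) word predictions
    target_ids: (batch, seq_len) reference token ids
    pred_emb:   (batch, dim) sentence embedding of the decoded prediction
    ref_emb:    (batch, dim) sentence embedding of the reference summary
    """
    cce = F.cross_entropy(logits.transpose(1, 2), target_ids)
    semantic = 1.0 - F.cosine_similarity(pred_emb, ref_emb, dim=1).mean()
    return cce + beta * semantic
```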

UniBrain: Unify Image Reconstruction and Captioning All in One Diffusion Model from Human Brain Activity

  • paper_url: http://arxiv.org/abs/2308.07428
  • repo_url: None
  • paper_authors: Weijian Mai, Zhijun Zhang
  • for: To reconstruct images and text captions from brain activity evoked by visual stimuli, furthering our understanding of the connection between the human brain and the visual perception system.
  • methods: UniBrain, a unified latent diffusion model (termed Versatile Diffusion) that transforms fMRI voxels into text and image latents for low-level information and guides the backward diffusion process with CLIP-derived, fMRI-based image and text conditions to generate realistic captions and images.
  • results: UniBrain outperforms current methods both qualitatively and quantitatively in image reconstruction and reports image captioning results on the Natural Scenes Dataset (NSD) for the first time; ablation experiments and functional region-of-interest (ROI) analysis further demonstrate its superiority and provide comprehensive insight into visually-evoked brain decoding.
    Abstract Image reconstruction and captioning from brain activity evoked by visual stimuli allow researchers to further understand the connection between the human brain and the visual perception system. While deep generative models have recently been employed in this field, reconstructing realistic captions and images with both low-level details and high semantic fidelity is still a challenging problem. In this work, we propose UniBrain: Unify Image Reconstruction and Captioning All in One Diffusion Model from Human Brain Activity. For the first time, we unify image reconstruction and captioning from visual-evoked functional magnetic resonance imaging (fMRI) through a latent diffusion model termed Versatile Diffusion. Specifically, we transform fMRI voxels into text and image latent for low-level information and guide the backward diffusion process through fMRI-based image and text conditions derived from CLIP to generate realistic captions and images. UniBrain outperforms current methods both qualitatively and quantitatively in terms of image reconstruction and reports image captioning results for the first time on the Natural Scenes Dataset (NSD) dataset. Moreover, the ablation experiments and functional region-of-interest (ROI) analysis further exhibit the superiority of UniBrain and provide comprehensive insight for visual-evoked brain decoding.

Exploring the Intersection of Large Language Models and Agent-Based Modeling via Prompt Engineering

  • paper_url: http://arxiv.org/abs/2308.07411
  • repo_url: https://github.com/ejunprung/llm-agents
  • paper_authors: Edward Junprung
  • for: To use large language models (LLMs) to simulate believable human behavior, exploring how humans act and interact within complex social systems.
  • methods: Prompt engineering, inspired by Park et al. (2023), is used to build two simulations of believable proxies of human behavior: a two-agent negotiation and a six-agent murder mystery game.
  • results: The simulations produce convincingly human-like interaction and decision-making in both the negotiation and the murder mystery game, suggesting that LLMs are an effective tool for simulating human-driven interactions.
    Abstract The final frontier for simulation is the accurate representation of complex, real-world social systems. While agent-based modeling (ABM) seeks to study the behavior and interactions of agents within a larger system, it is unable to faithfully capture the full complexity of human-driven behavior. Large language models (LLMs), like ChatGPT, have emerged as a potential solution to this bottleneck by enabling researchers to explore human-driven interactions in previously unimaginable ways. Our research investigates simulations of human interactions using LLMs. Through prompt engineering, inspired by Park et al. (2023), we present two simulations of believable proxies of human behavior: a two-agent negotiation and a six-agent murder mystery game.
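A two-agent negotiation of this kind reduces to alternating LLM calls over a shared, growing transcript. The sketch below shows that loop with a canned stand-in for the model call; the personas, prompts, and stopping rule are illustrative assumptions, not the paper's exact setup.

```python
import itertools

# Stand-in for a real LLM call; cycles through canned replies so the loop
# runs end-to-end. Swap in an actual chat-completion API for real use.
_canned = itertools.cycle(
    ["I can offer $20.", "I need at least $30.", "Shall we meet at $25?", "Deal at $25."])

def chat(persona: str, transcript: list) -> str:
    # A real implementation would send `persona` as the system prompt and
    # `transcript` as alternating user/assistant turns.
    return next(_canned)

BUYER = "You are a buyer who wants the lamp for at most $20. Reply briefly."
SELLER = "You are a seller who wants at least $30 for the lamp. Reply briefly."

def negotiate(max_turns: int = 8) -> list:
    transcript = []
    for turn in range(max_turns):
        persona = BUYER if turn % 2 == 0 else SELLER
        message = chat(persona, transcript)
        transcript.append(message)
        if "deal" in message.lower():        # crude termination heuristic
            break
    return transcript

print("\n".join(negotiate()))
```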

PARIS: Part-level Reconstruction and Motion Analysis for Articulated Objects

  • paper_url: http://arxiv.org/abs/2308.07391
  • repo_url: https://github.com/3dlg-hcvc/paris
  • paper_authors: Jiayi Liu, Ali Mahdavi-Amiri, Manolis Savva
  • for: simultaneous part-level reconstruction and motion parameter estimation for articulated objects
  • methods: self-supervised, end-to-end architecture with implicit shape and appearance models, optimizes motion parameters jointly without 3D supervision or semantic annotation
  • results: generalizes better across object categories, outperforms baselines and prior work, improves reconstruction with a Chamfer-L1 distance reduction of 3.94 (45.2%) for objects and 26.79 (84.5%) for parts, achieves 5% error rate for motion estimation across 10 object categories.
    Abstract We address the task of simultaneous part-level reconstruction and motion parameter estimation for articulated objects. Given two sets of multi-view images of an object in two static articulation states, we decouple the movable part from the static part and reconstruct shape and appearance while predicting the motion parameters. To tackle this problem, we present PARIS: a self-supervised, end-to-end architecture that learns part-level implicit shape and appearance models and optimizes motion parameters jointly without any 3D supervision, motion, or semantic annotation. Our experiments show that our method generalizes better across object categories, and outperforms baselines and prior work that are given 3D point clouds as input. Our approach improves reconstruction relative to state-of-the-art baselines with a Chamfer-L1 distance reduction of 3.94 (45.2%) for objects and 26.79 (84.5%) for parts, and achieves 5% error rate for motion estimation across 10 object categories. Video summary at: https://youtu.be/tDSrROPCgUc
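Since the reconstruction gains are reported as Chamfer-L1 reductions, a reference implementation of that metric is useful context. Below is a brute-force version for clarity; real evaluations use KD-trees or GPU kernels, and the exact reduction convention (sums vs. means) varies between papers.

```python
import numpy as np

def chamfer_l1(a: np.ndarray, b: np.ndarray) -> float:
    """Symmetric Chamfer distance with an L1 point-to-point cost.

    a: (N, 3) and b: (M, 3) point clouds. O(N*M) memory, so only for
    small clouds; this mirrors the metric, not PARIS itself.
    """
    d = np.abs(a[:, None, :] - b[None, :, :]).sum(axis=-1)   # (N, M) L1 costs
    return d.min(axis=1).mean() + d.min(axis=0).mean()

a = np.random.rand(512, 3)
b = np.random.rand(512, 3)
print(chamfer_l1(a, b))
```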

LLM Self Defense: By Self Examination, LLMs Know They Are Being Tricked

  • paper_url: http://arxiv.org/abs/2308.07308
  • repo_url: None
  • paper_authors: Alec Helbling, Mansi Phute, Matthew Hull, Duen Horng Chau
  • for: preventing language models from generating harmful content (e.g., instructions for committing crimes)
  • methods: using a large language model itself to filter harmful content out of generated responses
  • results: Even if a model is not fine-tuned to be aligned with human values, it is possible to stop it from presenting harmful content to users by validating the content using a language model.
    Abstract Large language models (LLMs) have skyrocketed in popularity in recent years due to their ability to generate high-quality text in response to human prompting. However, these models have been shown to have the potential to generate harmful content in response to user prompting (e.g., giving users instructions on how to commit crimes). There has been a focus in the literature on mitigating these risks, through methods like aligning models with human values through reinforcement learning. However, it has been shown that even aligned language models are susceptible to adversarial attacks that bypass their restrictions on generating harmful text. We propose a simple approach to defending against these attacks by having a large language model filter its own responses. Our current results show that even if a model is not fine-tuned to be aligned with human values, it is possible to stop it from presenting harmful content to users by validating the content using a language model.
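The defense itself is a thin wrapper: generate a candidate answer, then ask a language model to classify it before release. A minimal sketch follows, with a deterministic stand-in for the model call and an illustrative filter prompt (the paper's exact prompt wording is not reproduced here).

```python
FILTER_TEMPLATE = (
    "Does the following text describe harmful behaviour or help someone "
    "cause harm? Answer strictly 'Yes' or 'No'.\n\nText: {text}"
)

def complete(prompt: str) -> str:
    # Deterministic stand-in so the sketch runs; replace with a real LLM call.
    return "No" if prompt.startswith("Does the following") else "Here is a benign answer."

def guarded_reply(user_prompt: str) -> str:
    candidate = complete(user_prompt)                            # first LLM pass
    verdict = complete(FILTER_TEMPLATE.format(text=candidate))   # self-examination
    if verdict.strip().lower().startswith("yes"):
        return "Sorry, I can't help with that."
    return candidate

print(guarded_reply("How do I bake bread?"))
```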

Extend Wave Function Collapse to Large-Scale Content Generation

  • paper_url: http://arxiv.org/abs/2308.07307
  • repo_url: None
  • paper_authors: Yuhe Nie, Shaoming Zheng, Zhan Zhuang, Xuan Song
  • for: addressing the time complexity and constraint-conflict problems of the Wave Function Collapse (WFC) algorithm in large-scale content generation.
  • methods: proposes the Nested WFC (N-WFC) algorithm framework, with a complete and sub-complete tileset preparation strategy that avoids conflict and backtracking and can generate aperiodic, deterministic infinite content.
  • results: verifies the feasibility and suitability of N-WFC, and demonstrates its applicability to game design through a weight-brush system.
    Abstract Wave Function Collapse (WFC) is a widely used tile-based algorithm in procedural content generation, including textures, objects, and scenes. However, the current WFC algorithm and related research lack the ability to generate commercialized large-scale or infinite content due to constraint conflict and time complexity costs. This paper proposes a Nested WFC (N-WFC) algorithm framework to reduce time complexity. To avoid conflict and backtracking problems, we offer a complete and sub-complete tileset preparation strategy, which requires only a small number of tiles to generate aperiodic and deterministic infinite content. We also introduce the weight-brush system that combines N-WFC and sub-complete tileset, proving its suitability for game design. Our contribution addresses WFC's challenge in massive content generation and provides a theoretical basis for implementing concrete games.
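For readers unfamiliar with plain WFC, the observe/propagate loop that N-WFC nests fits in a few lines. The 1-D toy below uses an illustrative three-tile adjacency; the contradiction branch marks where plain WFC would need to backtrack, which is exactly the cost the paper's (sub-)complete tilesets are designed to avoid.

```python
import random

TILES = "SLM"   # sea, land, mountain (illustrative)
OK = {("S", "S"), ("S", "L"), ("L", "S"), ("L", "L"), ("L", "M"), ("M", "L"), ("M", "M")}

def collapse(n=20, seed=0):
    random.seed(seed)
    wave = [set(TILES) for _ in range(n)]          # superposition per cell
    while any(len(c) > 1 for c in wave):
        # Observe: pick the undecided cell with fewest options (min entropy).
        i = min((j for j, c in enumerate(wave) if len(c) > 1),
                key=lambda j: len(wave[j]))
        wave[i] = {random.choice(sorted(wave[i]))}
        # Propagate adjacency constraints outwards until a fixed point.
        stack = [i]
        while stack:
            j = stack.pop()
            for k in (j - 1, j + 1):
                if 0 <= k < n:
                    allowed = {t for t in wave[k]
                               if any((a, t) in OK for a in wave[j])}
                    if not allowed:
                        raise RuntimeError("contradiction; plain WFC would backtrack here")
                    if allowed < wave[k]:
                        wave[k] = allowed
                        stack.append(k)
    return "".join(next(iter(c)) for c in wave)

print(collapse())   # e.g. a string of S/L/M obeying the adjacency rules
```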

Neural Authorship Attribution: Stylometric Analysis on Large Language Models

  • paper_url: http://arxiv.org/abs/2308.07305
  • repo_url: None
  • paper_authors: Tharindu Kumarage, Huan Liu
  • for: This work addresses forensics for AI-generated text, seeking a reliable way to trace such text back to its originating LLM.
  • methods: The study analyzes and compares state-of-the-art LLMs, including GPT-4, PaLM, and Llama, integrating stylometric features across lexical, syntactic, and structural aspects of language.
  • results: The analysis reveals distinctions between proprietary and open-source models and variations within each group; these findings can inform future detection efforts against AI-generated misinformation.
    Abstract Large language models (LLMs) such as GPT-4, PaLM, and Llama have significantly propelled the generation of AI-crafted text. With rising concerns about their potential misuse, there is a pressing need for AI-generated-text forensics. Neural authorship attribution is a forensic effort, seeking to trace AI-generated text back to its originating LLM. The LLM landscape can be divided into two primary categories: proprietary and open-source. In this work, we delve into these emerging categories of LLMs, focusing on the nuances of neural authorship attribution. To enrich our understanding, we carry out an empirical analysis of LLM writing signatures, highlighting the contrasts between proprietary and open-source models, and scrutinizing variations within each group. By integrating stylometric features across lexical, syntactic, and structural aspects of language, we explore their potential to yield interpretable results and augment pre-trained language model-based classifiers utilized in neural authorship attribution. Our findings, based on a range of state-of-the-art LLMs, provide empirical insights into neural authorship attribution, paving the way for future investigations aimed at mitigating the threats posed by AI-generated misinformation.
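Stylometric attribution of this kind boils down to hand-crafted writing-signature features feeding a classifier. The sketch below shows a tiny lexical/structural feature extractor with a linear model; the feature set, data, and labels are illustrative, and the paper additionally uses syntactic features and pre-trained LM-based classifiers.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def stylometric_features(text: str) -> np.ndarray:
    words = text.split()
    sents = [s for s in text.replace("!", ".").replace("?", ".").split(".") if s.strip()]
    return np.array([
        len(words) / max(len(sents), 1),                         # mean sentence length
        len({w.lower() for w in words}) / max(len(words), 1),    # type/token ratio
        sum(c in ",;:" for c in text) / max(len(words), 1),      # punctuation rate
        float(np.mean([len(w) for w in words])) if words else 0.0,  # mean word length
    ])

texts = [
    "Short terse reply. Another one. Done.",
    "Brief answer. No frills. The end.",
    "An elaborate, flowing sentence; it meanders, accumulating clauses and asides.",
    "A long, winding response, rich with commas, semicolons; it rarely stops.",
]
labels = [0, 0, 1, 1]   # e.g., 0 = model A, 1 = model B

X = np.stack([stylometric_features(t) for t in texts])
clf = LogisticRegression().fit(X, labels)
print(clf.predict(X))
```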

Why Not? Explaining Missing Entailments with Evee (Technical Report)

  • paper_url: http://arxiv.org/abs/2308.07294
  • repo_url: None
  • paper_authors: Christian Alrabbaa, Stefan Borgwardt, Tom Friese, Patrick Koopmann, Mikhail Kotlov
  • for: This work aims to help ontology users understand not only logical entailments derived by a description logic reasoner, but also why expected consequences do not follow.
  • methods: The work builds on description logic reasoners and plug-ins for the Protégé ontology editor, using existing and new abduction- and counterexample-based techniques to explain missing consequences.
  • results: The paper presents a new version of the $\rm E{\scriptsize VEE}$ Protégé plugin that explains missing consequences, helping users understand why an expected entailment does not hold in an ontology.
    Abstract Understanding logical entailments derived by a description logic reasoner is not always straightforward for ontology users. For this reason, various methods for explaining entailments using justifications and proofs have been developed and implemented as plug-ins for the ontology editor Protégé. However, when the user expects a missing consequence to hold, it is equally important to explain why it does not follow from the ontology. In this paper, we describe a new version of $\rm E{\scriptsize VEE}$, a Protégé plugin that now also provides explanations for missing consequences, via existing and new techniques based on abduction and counterexamples.
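Abduction-based "why not?" explanations can be illustrated with a much simpler propositional analogue: search for small sets of assumable facts whose addition would make the missing consequence follow. Evee itself works on description logic ontologies inside Protégé, so the sketch below only conveys the idea.

```python
from itertools import combinations

def entails(facts, rules, goal):
    """Forward chaining over Horn rules given as (frozenset_body, head) pairs."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            if body <= derived and head not in derived:
                derived.add(head)
                changed = True
    return goal in derived

def abduce(facts, rules, goal, abducibles, max_size=2):
    """Return the smallest hypothesis sets that repair the missing entailment."""
    for k in range(1, max_size + 1):
        hits = [set(h) for h in combinations(abducibles, k)
                if entails(set(facts) | set(h), rules, goal)]
        if hits:
            return hits
    return []

rules = [(frozenset({"bird", "can_fly"}), "migrates")]
print(abduce({"bird"}, rules, "migrates", {"can_fly", "penguin"}))
# -> [{'can_fly'}]: adding 'can_fly' would make 'migrates' follow.
```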

Cross-Attribute Matrix Factorization Model with Shared User Embedding

  • paper_url: http://arxiv.org/abs/2308.07284
  • repo_url: None
  • paper_authors: Wen Liang, Zeng Fan, Youzhi Liang, Jianguo Jia
  • for: This work applies deep learning techniques to recommender systems while addressing the cold-start problem by taking the attributes of users and items into account.
  • methods: The method extends Neural Matrix Factorization (NeuMF) to model interactions across associated user and item attributes, with a shared user embedding.
  • results: Experiments show that the proposed Cross-Attribute Matrix Factorization (CAMF) model outperforms baselines on the MovieLens and Pinterest datasets, particularly in scenarios with higher dataset sparsity.
    Abstract Over the past few years, deep learning has firmly established its prowess across various domains, including computer vision, speech recognition, and natural language processing. Motivated by its outstanding success, researchers have been directing their efforts towards applying deep learning techniques to recommender systems. Neural collaborative filtering (NCF) and Neural Matrix Factorization (NeuMF) refresh the traditional inner product in matrix factorization with a neural architecture capable of learning complex and data-driven functions. While these models effectively capture user-item interactions, they overlook the specific attributes of both users and items. This can lead to robustness issues, especially for items and users that belong to the "long tail". Such challenges are commonly recognized in recommender systems as a part of the cold-start problem. A direct and intuitive approach to address this issue is by leveraging the features and attributes of the items and users themselves. In this paper, we introduce a refined NeuMF model that considers not only the interaction between users and items, but also interactions across associated attributes. Moreover, our proposed architecture features a shared user embedding, seamlessly integrating with user embeddings to improve the robustness and effectively address the cold-start problem. Rigorous experiments on both the Movielens and Pinterest datasets demonstrate the superiority of our Cross-Attribute Matrix Factorization model, particularly in scenarios characterized by higher dataset sparsity.
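A NeuMF-style model with cross-attribute inputs and a shared user embedding can be sketched in a few lines of PyTorch. Layer sizes, the attribute-fusion scheme, and the single-attribute-per-user simplification below are assumptions for illustration, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class CAMF(nn.Module):
    def __init__(self, n_users, n_items, n_user_attrs, n_item_attrs, dim=32):
        super().__init__()
        self.user_emb = nn.Embedding(n_users, dim)      # shared across both branches
        self.item_emb = nn.Embedding(n_items, dim)
        self.user_attr = nn.Embedding(n_user_attrs, dim)
        self.item_attr = nn.Embedding(n_item_attrs, dim)
        self.mlp = nn.Sequential(
            nn.Linear(4 * dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.out = nn.Linear(2 * dim, 1)                # fuses GMF and MLP branches

    def forward(self, u, i, ua, ia):
        pu, qi = self.user_emb(u), self.item_emb(i)
        gmf = pu * qi                                   # elementwise GMF interaction
        mlp = self.mlp(torch.cat(
            [pu, qi, self.user_attr(ua), self.item_attr(ia)], dim=-1))
        return torch.sigmoid(self.out(torch.cat([gmf, mlp], dim=-1))).squeeze(-1)

model = CAMF(n_users=100, n_items=500, n_user_attrs=8, n_item_attrs=20)
score = model(torch.tensor([3]), torch.tensor([42]), torch.tensor([1]), torch.tensor([7]))
print(score)    # predicted interaction probability
```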

Autonomous Point Cloud Segmentation for Power Lines Inspection in Smart Grid

  • paper_url: http://arxiv.org/abs/2308.07283
  • repo_url: None
  • paper_authors: Alexander Kyuroson, Anton Koval, George Nikolakopoulos
  • for: This work proposes an unsupervised machine learning (ML) framework to detect, extract, and analyze the characteristics of both high- and low-voltage power lines from LiDAR data, together with the surrounding vegetation in a Power Line Corridor (PLC).
  • methods: The proposed approach initially eliminates ground points based on statistical analysis with density criteria and histogram thresholding, denoises and transforms the remaining candidate points using Principal Component Analysis (PCA) and a Kd-tree, and then segments individual power lines using two-stage DBSCAN clustering.
  • results: Experiments show that the proposed framework can efficiently detect power lines and perform PLC-based hazard analysis.
    Abstract LiDAR is currently one of the most utilized sensors to effectively monitor the status of power lines and facilitate the inspection of remote power distribution networks and related infrastructures. To ensure the safe operation of the smart grid, various remote data acquisition strategies, such as Airborne Laser Scanning (ALS), Mobile Laser Scanning (MLS), and Terrestrial Laser Scanning (TSL) have been leveraged to allow continuous monitoring of regional power networks, which are typically surrounded by dense vegetation. In this article, an unsupervised Machine Learning (ML) framework is proposed, to detect, extract and analyze the characteristics of power lines of both high and low voltage, as well as the surrounding vegetation in a Power Line Corridor (PLC) solely from LiDAR data. Initially, the proposed approach eliminates the ground points from higher elevation points based on statistical analysis that applies density criteria and histogram thresholding. After denoising and transforming of the remaining candidate points by applying Principle Component Analysis (PCA) and Kd-tree, power line segmentation is achieved by utilizing a two-stage DBSCAN clustering to identify each power line individually. Finally, all high elevation points in the PLC are identified based on their distance to the newly segmented power lines. Conducted experiments illustrate that the proposed framework is an agnostic method that can efficiently detect the power lines and perform PLC-based hazard analysis.
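The pipeline can be condensed into a runnable sketch on a synthetic cloud: histogram-based ground removal, then a coarse DBSCAN pass over elevated structures, then a tighter pass that splits individual conductors. The eps/min_samples values and the 1 m safety margin are illustrative; the paper's PCA/Kd-tree denoising stage is omitted here.

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(1)
ground = rng.uniform([0, 0, 0], [100, 100, 0.5], (5000, 3))      # flat terrain
wire1 = np.c_[np.linspace(0, 100, 400), np.full(400, 40.0), np.full(400, 12.0)]
wire2 = np.c_[np.linspace(0, 100, 400), np.full(400, 42.0), np.full(400, 12.0)]
cloud = np.vstack([ground, wire1, wire2])

# Stage 0: treat the dominant low bin of the height histogram as ground.
counts, edges = np.histogram(cloud[:, 2], bins=50)
ground_cut = edges[np.argmax(counts) + 1] + 1.0     # crude 1 m safety margin
elevated = cloud[cloud[:, 2] > ground_cut]

# Stage 1: coarse clustering groups whole elevated structures.
coarse = DBSCAN(eps=5.0, min_samples=10).fit_predict(elevated)

# Stage 2: a tighter eps inside each coarse cluster separates single lines.
for c in set(coarse) - {-1}:
    pts = elevated[coarse == c]
    fine = DBSCAN(eps=1.5, min_samples=5).fit_predict(pts)
    print(f"coarse cluster {c}: {len(set(fine) - {-1})} line(s)")
```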

Data-Efficient Energy-Aware Participant Selection for UAV-Enabled Federated Learning

  • paper_url: http://arxiv.org/abs/2308.07273
  • repo_url: None
  • paper_authors: Youssra Cheriguene, Wael Jaafar, Chaker Abdelaziz Kerrache, Halim Yanikomeroglu, Fatima Zohra Bousbaa, Nasreddine Lagraa
  • for: improving the accuracy of edge federated learning (FL) models under UAV constraints of energy consumption, communication quality, and heterogeneity of local datasets.
  • methods: proposes a novel UAV participant selection scheme, the data-efficient energy-aware participant selection strategy (DEEPS), which selects the best FL participant in each sub-region based on the structural similarity index measure (SSIM) average score of its local dataset and its power consumption profile.
  • results: Experiments show that the proposed selection scheme outperforms the benchmark random selection method in terms of model accuracy, training time, and UAV energy consumption.
    Abstract Unmanned aerial vehicle (UAV)-enabled edge federated learning (FL) has sparked a rise in research interest as a result of the massive and heterogeneous data collected by UAVs, as well as the privacy concerns related to UAV data transmissions to edge servers. However, due to the redundancy of UAV collected data, e.g., imaging data, and non-rigorous FL participant selection, the convergence time of the FL learning process and bias of the FL model may increase. Consequently, we investigate in this paper the problem of selecting UAV participants for edge FL, aiming to improve the FL model's accuracy, under UAV constraints of energy consumption, communication quality, and local datasets' heterogeneity. We propose a novel UAV participant selection scheme, called data-efficient energy-aware participant selection strategy (DEEPS), which consists of selecting the best FL participant in each sub-region based on the structural similarity index measure (SSIM) average score of its local dataset and its power consumption profile. Through experiments, we demonstrate that the proposed selection scheme is superior to the benchmark random selection method, in terms of model accuracy, training time, and UAV energy consumption.
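One way to read the selection rule is as a per-UAV score trading off local-data redundancy (mean pairwise SSIM of its images) against energy cost. The sketch below implements that reading with skimage's SSIM; the weighting, the sign convention (preferring diverse, low-redundancy data), and the synthetic inputs are all assumptions.

```python
import numpy as np
from itertools import combinations
from skimage.metrics import structural_similarity as ssim

def deeps_score(images, energy_cost, w=0.5):
    """Higher is better: penalize redundant imagery and high energy cost."""
    pairs = list(combinations(images, 2))
    mean_ssim = np.mean([ssim(a, b, data_range=1.0) for a, b in pairs]) if pairs else 1.0
    return -(w * mean_ssim + (1.0 - w) * energy_cost)

rng = np.random.default_rng(0)
# Each candidate UAV: a small local image set and a normalized energy cost.
candidates = {f"uav{i}": ([rng.random((64, 64)) for _ in range(4)], rng.random())
              for i in range(3)}
best = max(candidates, key=lambda u: deeps_score(*candidates[u]))
print("selected participant:", best)
```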

EasyEdit: An Easy-to-use Knowledge Editing Framework for Large Language Models

  • paper_url: http://arxiv.org/abs/2308.07269
  • repo_url: https://github.com/zjunlp/easyedit
  • paper_authors: Peng Wang, Ningyu Zhang, Xin Xie, Yunzhi Yao, Bozhong Tian, Mengru Wang, Zekun Xi, Siyuan Cheng, Kangwei Liu, Guozhou Zheng, Huajun Chen
  • for: improving the ability to inject updated knowledge into LLMs and correct undesired behavior, increasing their reliability and generality.
  • methods: supports various cutting-edge knowledge editing approaches that can be readily applied to many well-known LLMs.
  • results: Knowledge editing experiments on LlaMA-2 show that knowledge editing surpasses traditional fine-tuning in terms of reliability and generalization.
    Abstract Large Language Models (LLMs) usually suffer from knowledge cutoff or fallacy issues, which means they are unaware of unseen events or generate text with incorrect facts owing to the outdated/noisy data. To this end, many knowledge editing approaches for LLMs have emerged -- aiming to subtly inject/edit updated knowledge or adjust undesired behavior while minimizing the impact on unrelated inputs. Nevertheless, due to significant differences among various knowledge editing methods and the variations in task setups, there is no standard implementation framework available for the community, which hinders practitioners from applying knowledge editing to applications. To address these issues, we propose EasyEdit, an easy-to-use knowledge editing framework for LLMs. It supports various cutting-edge knowledge editing approaches and can be readily applied to many well-known LLMs such as T5, GPT-J, LlaMA, etc. Empirically, we report the knowledge editing results on LlaMA-2 with EasyEdit, demonstrating that knowledge editing surpasses traditional fine-tuning in terms of reliability and generalization. We have released the source code on GitHub at https://github.com/zjunlp/EasyEdit, along with Google Colab tutorials and comprehensive documentation for beginners to get started. Besides, we present an online system for real-time knowledge editing, and a demo video at http://knowlm.zjukg.cn/easyedit.mp4.
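For orientation, the repository's README (at the time of writing) showed an editing call along the following lines; the class names, hyperparameter file path, and argument names are reproduced from memory and should be verified against the current documentation before use.

```python
from easyeditor import BaseEditor, ROMEHyperParams

# Load editing hyperparameters for a chosen method/model pair (the path is
# an example from the repo layout and may have changed).
hparams = ROMEHyperParams.from_hparams('./hparams/ROME/llama-7b.yaml')
editor = BaseEditor.from_hparams(hparams)

# One edit: rewrite what the model asserts about a subject.
metrics, edited_model, _ = editor.edit(
    prompts=['What university did Watts Humphrey attend?'],
    ground_truth=['Illinois Institute of Technology'],
    target_new=['University of Michigan'],
    subject=['Watts Humphrey'],
)
print(metrics)   # reliability / generalization / locality statistics
```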

Can we Agree? On the Rashōmon Effect and the Reliability of Post-Hoc Explainable AI

  • paper_url: http://arxiv.org/abs/2308.07247
  • repo_url: None
  • paper_authors: Clement Poiret, Antoine Grigis, Justin Thomas, Marion Noulhiane
  • for: This study examines the challenges the Rashōmon effect poses for deriving reliable knowledge from machine learning models.
  • methods: The study uses SHAP to explain models in a Rashōmon set and analyzes how sample size influences the resulting explanations.
  • results: Experiments on 5 public datasets show that explanations gradually converge as the sample size increases; with fewer than 128 samples, explanations exhibit high variability, limiting reliable knowledge extraction, whereas agreement between models improves with more data, allowing for consensus, and bagging ensembles often show higher agreement. These results provide guidance on how much data is sufficient to trust explanations.
    Abstract The Rashōmon effect poses challenges for deriving reliable knowledge from machine learning models. This study examined the influence of sample size on explanations from models in a Rashōmon set using SHAP. Experiments on 5 public datasets showed that explanations gradually converged as the sample size increased. Explanations from <128 samples exhibited high variability, limiting reliable knowledge extraction. However, agreement between models improved with more data, allowing for consensus. Bagging ensembles often had higher agreement. The results provide guidance on sufficient data to trust explanations. Variability at low samples suggests that conclusions may be unreliable without validation. Further work is needed with more model types, data domains, and explanation methods. Testing convergence in neural networks and with model-specific explanation methods would be impactful. The approaches explored here point towards principled techniques for eliciting knowledge from ambiguous models.
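The protocol lends itself to a compact sketch: fit several near-equivalent models (a Rashōmon set approximated here by reseeded random forests), explain each with SHAP, and track how their global feature importances agree as the sample size grows. The model family, sample sizes, and the rank-correlation agreement metric are illustrative choices, not the paper's exact protocol.

```python
import numpy as np
import shap
from scipy.stats import spearmanr
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

def global_importance(model, X):
    shap_values = shap.TreeExplainer(model).shap_values(X)   # (n_samples, n_features)
    return np.abs(shap_values).mean(axis=0)                  # mean |SHAP| per feature

for n in (64, 256, 1024):
    X, y = make_regression(n_samples=n, n_features=10, noise=5.0, random_state=0)
    models = [RandomForestRegressor(n_estimators=50, random_state=s).fit(X, y)
              for s in range(5)]
    imps = [global_importance(m, X) for m in models]
    rhos = [spearmanr(a, b)[0] for i, a in enumerate(imps) for b in imps[i + 1:]]
    print(f"n={n}: mean pairwise rank agreement = {np.mean(rhos):.2f}")
```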

Context-Aware Planning and Environment-Aware Memory for Instruction Following Embodied Agents

  • paper_url: http://arxiv.org/abs/2308.07241
  • repo_url: None
  • paper_authors: Byeonghwi Kim, Jinyeon Kim, Yuyeong Kim, Cheolhong Min, Jonghyun Choi
  • for: Accomplishing household tasks requires planning a sequence of actions while considering the consequences of previous actions.
  • methods: The paper proposes Context-Aware Planning and Environment-Aware Memory (CAPEAM), which incorporates semantic context (e.g., appropriate objects to interact with) and the changed spatial arrangement and states of interacted objects (e.g., the location an object has been moved to) into a sequence of actions to infer subsequent actions.
  • results: Empirically, the agent equipped with CAPEAM achieves state-of-the-art performance on a challenging interactive instruction-following benchmark, in both seen and unseen environments, by large margins (up to +10.70% in unseen environments).
    Abstract Accomplishing household tasks requires to plan step-by-step actions considering the consequences of previous actions. However, the state-of-the-art embodied agents often make mistakes in navigating the environment and interacting with proper objects due to imperfect learning by imitating experts or algorithmic planners without such knowledge. To improve both visual navigation and object interaction, we propose to consider the consequence of taken actions by CAPEAM (Context-Aware Planning and Environment-Aware Memory) that incorporates semantic context (e.g., appropriate objects to interact with) in a sequence of actions, and the changed spatial arrangement and states of interacted objects (e.g., location that the object has been moved to) in inferring the subsequent actions. We empirically show that the agent with the proposed CAPEAM achieves state-of-the-art performance in various metrics using a challenging interactive instruction following benchmark in both seen and unseen environments by large margins (up to +10.70% in unseen env.).
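The environment-aware-memory half of the idea amounts to tracking where each interacted object ended up and what state it is in, so later planning steps can condition on the consequences of earlier actions. The schema below is an illustrative assumption, not the paper's implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class ObjectMemory:
    location: str
    state: dict = field(default_factory=dict)

class EnvironmentMemory:
    """Remembers consequences of interactions for later planning steps."""

    def __init__(self) -> None:
        self.objects: Dict[str, ObjectMemory] = {}

    def record(self, obj: str, location: str, **state) -> None:
        entry = self.objects.setdefault(obj, ObjectMemory(location))
        entry.location = location
        entry.state.update(state)

    def where(self, obj: str) -> Optional[str]:
        entry = self.objects.get(obj)
        return entry.location if entry else None

mem = EnvironmentMemory()
mem.record("mug", "counter")                    # after a 'pick & place' action
mem.record("mug", "sink", filled=True)          # after a subsequent 'fill' action
print(mem.where("mug"), mem.objects["mug"].state)   # sink {'filled': True}
```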