cs.AI - 2023-07-30

DRL4Route: A Deep Reinforcement Learning Framework for Pick-up and Delivery Route Prediction

  • paper_url: http://arxiv.org/abs/2307.16246
  • repo_url: https://github.com/maoxiaowei97/drl4route
  • paper_authors: Xiaowei Mao, Haomin Wen, Hengrui Zhang, Huaiyu Wan, Lixia Wu, Jianbin Zheng, Haoyuan Hu, Youfang Lin
  • For: Pick-up and Delivery Route Prediction (PDRP), which estimates a worker's future service route, has received growing attention in recent years.
  • Methods: A deep reinforcement learning framework that learns workers' behavior patterns from massive historical data and incorporates non-differentiable objective optimization into the training process.
  • Results: Extensive offline experiments and online deployment on a real-world dataset show improvements on PDRP in Location Square Deviation (LSD) and Accuracy@3 (ACC@3).
    Abstract Pick-up and Delivery Route Prediction (PDRP), which aims to estimate the future service route of a worker given his current task pool, has received rising attention in recent years. Deep neural networks based on supervised learning have emerged as the dominant model for the task because of their powerful ability to capture workers' behavior patterns from massive historical data. Though promising, they fail to introduce the non-differentiable test criteria into the training process, leading to a mismatch between training and test criteria, which considerably trims down their performance when applied in practical systems. To tackle the above issue, we present the first attempt to generalize Reinforcement Learning (RL) to the route prediction task, leading to a novel RL-based framework called DRL4Route. It combines the behavior-learning abilities of previous deep learning models with the non-differentiable objective optimization ability of reinforcement learning. DRL4Route can serve as a plug-and-play component to boost the existing deep learning models. Based on the framework, we further implement a model named DRL4Route-GAE for PDRP in logistic service. It follows the actor-critic architecture which is equipped with a Generalized Advantage Estimator that can balance the bias and variance of the policy gradient estimates, thus achieving a better policy. Extensive offline experiments and the online deployment show that DRL4Route-GAE improves Location Square Deviation (LSD) by 0.9%-2.7%, and Accuracy@3 (ACC@3) by 2.4%-3.2% over existing methods on the real-world dataset.
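The Generalized Advantage Estimator mentioned in the abstract balances bias and variance of the policy-gradient estimate by exponentially weighting multi-step TD residuals. The sketch below is a minimal, generic implementation of the standard GAE recursion, not the authors' code; the reward and value arrays are hypothetical placeholders for one route rollout.

```python
import numpy as np

def generalized_advantage_estimates(rewards, values, gamma=0.99, lam=0.95):
    """Standard GAE: exponentially weighted sum of TD residuals.

    rewards: array of shape (T,)   -- per-step rewards along one rollout
    values:  array of shape (T+1,) -- critic value estimates, including bootstrap V(s_T)
    """
    T = len(rewards)
    advantages = np.zeros(T)
    gae = 0.0
    for t in reversed(range(T)):
        delta = rewards[t] + gamma * values[t + 1] - values[t]  # TD residual
        gae = delta + gamma * lam * gae
        advantages[t] = gae
    return advantages

# Hypothetical rollout: e.g., negative location-square-deviation as a per-step reward.
rewards = np.array([-0.8, -0.3, -0.5, -0.1])
values = np.array([-1.5, -1.0, -0.6, -0.2, 0.0])
print(generalized_advantage_estimates(rewards, values))
```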

Synaptic Plasticity Models and Bio-Inspired Unsupervised Deep Learning: A Survey

  • paper_url: http://arxiv.org/abs/2307.16236
  • repo_url: None
  • paper_authors: Gabriele Lagani, Fabrizio Falchi, Claudio Gennaro, Giuseppe Amato
  • for: This survey explores emerging applications of Deep Learning (DL) and the biologically grounded mechanisms that may address the challenges DL currently faces.
  • methods: The paper surveys biologically inspired models of synaptic plasticity, their application in DL scenarios, and their connections to plasticity models in Spiking Neural Networks (SNNs).
  • results: Bio-inspired deep learning models perform well across a range of scenarios and may help address challenges faced by DL techniques, such as adversarial robustness and ecological impact.
    Abstract Recently emerged technologies based on Deep Learning (DL) achieved outstanding results on a variety of tasks in the field of Artificial Intelligence (AI). However, these encounter several challenges related to robustness to adversarial inputs, ecological impact, and the necessity of huge amounts of training data. In response, researchers are focusing more and more interest on biologically grounded mechanisms, which are appealing due to the impressive capabilities exhibited by biological brains. This survey explores a range of these biologically inspired models of synaptic plasticity, their application in DL scenarios, and the connections with models of plasticity in Spiking Neural Networks (SNNs). Overall, Bio-Inspired Deep Learning (BIDL) represents an exciting research direction, aiming at advancing not only our current technologies but also our understanding of intelligence.
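As a concrete example of the synaptic plasticity rules such a survey covers, the sketch below applies Oja's rule, a normalized Hebbian update, to a single linear neuron. This is a generic textbook illustration with synthetic data, not code from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 2-D inputs with one dominant direction of variance.
X = rng.normal(size=(1000, 2)) @ np.array([[2.0, 0.3], [0.3, 0.5]])

w = rng.normal(size=2)          # synaptic weights of one linear neuron
eta = 0.01                      # learning rate

for x in X:
    y = w @ x                   # post-synaptic activity
    w += eta * y * (x - y * w)  # Oja's rule: Hebbian term with implicit weight decay

# Oja's rule converges toward the leading principal direction of the inputs.
print("learned weights:", w / np.linalg.norm(w))
```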

Spiking Neural Networks and Bio-Inspired Supervised Deep Learning: A Survey

  • paper_url: http://arxiv.org/abs/2307.16235
  • repo_url: None
  • paper_authors: Gabriele Lagani, Fabrizio Falchi, Claudio Gennaro, Giuseppe Amato
  • for: This survey provides a comprehensive review of recent biologically inspired approaches to Artificial Intelligence (AI).
  • methods: It introduces the principles of computation and synaptic plasticity in biological neurons, presents Spiking Neural Network (SNN) models in detail, and highlights the main challenges of SNN training.
  • results: It discusses bio-inspired training methods as alternatives to traditional backprop-based optimization, aiming to advance the computational capabilities and biological plausibility of current models.
    Abstract For a long time, biology and neuroscience fields have been a great source of inspiration for computer scientists, towards the development of Artificial Intelligence (AI) technologies. This survey aims at providing a comprehensive review of recent biologically-inspired approaches for AI. After introducing the main principles of computation and synaptic plasticity in biological neurons, we provide a thorough presentation of Spiking Neural Network (SNN) models, and we highlight the main challenges related to SNN training, where traditional backprop-based optimization is not directly applicable. Therefore, we discuss recent bio-inspired training methods, which pose themselves as alternatives to backprop, both for traditional and spiking networks. Bio-Inspired Deep Learning (BIDL) approaches aim at advancing the computational capabilities and biological plausibility of current models.
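To make the spiking-neuron models the survey discusses concrete, here is a minimal leaky integrate-and-fire (LIF) simulation. It is a standard textbook discretization with arbitrary constants, not code from the paper.

```python
import numpy as np

def lif_simulate(input_current, dt=1.0, tau=20.0, v_rest=0.0, v_thresh=1.0, v_reset=0.0):
    """Discretized leaky integrate-and-fire neuron; returns membrane trace and spike times."""
    v = v_rest
    voltages, spikes = [], []
    for t, i_t in enumerate(input_current):
        # Leaky integration: dv/dt = (-(v - v_rest) + I) / tau
        v += dt * (-(v - v_rest) + i_t) / tau
        if v >= v_thresh:          # threshold crossing emits a spike
            spikes.append(t)
            v = v_reset            # hard reset after the spike
        voltages.append(v)
    return np.array(voltages), spikes

current = np.concatenate([np.zeros(20), 1.5 * np.ones(80)])  # step input
_, spike_times = lif_simulate(current)
print("spike times:", spike_times)
```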

Robust Electric Vehicle Balancing of Autonomous Mobility-On-Demand System: A Multi-Agent Reinforcement Learning Approach

  • paper_url: http://arxiv.org/abs/2307.16228
  • repo_url: None
  • paper_authors: Sihong He, Shuo Han, Fei Miao
  • for: This paper aims to design a multi-agent reinforcement learning (MARL) framework for electric autonomous vehicles (EAVs) balancing in future autonomous mobility-on-demand (AMoD) systems, with adversarial agents to model both the EAVs supply and mobility demand uncertainties.
  • methods: The proposed method uses a MARL-based framework to train a robust EAVs balancing policy that considers both the supply-demand ratio and charging utilization rate across the whole city.
  • results: Experiments show that the proposed robust method performs better compared with a non-robust MARL method, with improvements of 19.28% in reward, 28.18% in charging utilization fairness, and 3.97% in supply-demand fairness. Compared with a robust optimization-based method, the proposed MARL algorithm improves the reward, charging utilization fairness, and supply-demand fairness by 8.21%, 8.29%, and 9.42%, respectively.
    Abstract Electric autonomous vehicles (EAVs) are getting attention in future autonomous mobility-on-demand (AMoD) systems due to their economic and societal benefits. However, EAVs' unique charging patterns (long charging time, high charging frequency, unpredictable charging behaviors, etc.) make it challenging to accurately predict the EAVs supply in E-AMoD systems. Furthermore, the mobility demand's prediction uncertainty makes it an urgent and challenging task to design an integrated vehicle balancing solution under supply and demand uncertainties. Despite the success of reinforcement learning-based E-AMoD balancing algorithms, state uncertainties under the EV supply or mobility demand remain unexplored. In this work, we design a multi-agent reinforcement learning (MARL)-based framework for EAVs balancing in E-AMoD systems, with adversarial agents to model both the EAVs supply and mobility demand uncertainties that may undermine the vehicle balancing solutions. We then propose a robust E-AMoD Balancing MARL (REBAMA) algorithm to train a robust EAVs balancing policy to balance both the supply-demand ratio and charging utilization rate across the whole city. Experiments show that our proposed robust method performs better compared with a non-robust MARL method that does not consider state uncertainties; it improves the reward, charging utilization fairness, and supply-demand fairness by 19.28%, 28.18%, and 3.97%, respectively. Compared with a robust optimization-based method, the proposed MARL algorithm can improve the reward, charging utilization fairness, and supply-demand fairness by 8.21%, 8.29%, and 9.42%, respectively.

Text Analysis Using Deep Neural Networks in Digital Humanities and Information Science

  • paper_url: http://arxiv.org/abs/2307.16217
  • repo_url: None
  • paper_authors: Omri Suissa, Avshalom Elmalech, Maayan Zhitomirsky-Geffet
  • for: This paper aims to explore the use of deep neural networks (DNNs) in Digital Humanities (DH) research and provide a practical decision model for DH experts to choose the appropriate deep learning approaches for their research.
  • methods: The paper analyzes multiple use-cases of DH studies in recent literature and their possible solutions, and lays out a practical decision model for DH experts to choose the appropriate deep learning approaches for their research.
  • results: The paper aims to raise awareness of the benefits of utilizing deep learning models in the DH community and provide a practical decision model for DH experts to choose the appropriate deep learning approaches for their research.
    Abstract Combining computational technologies and humanities is an ongoing effort aimed at making resources such as texts, images, audio, video, and other artifacts digitally available, searchable, and analyzable. In recent years, deep neural networks (DNN) dominate the field of automatic text analysis and natural language processing (NLP), in some cases presenting a super-human performance. DNNs are the state-of-the-art machine learning algorithms solving many NLP tasks that are relevant for Digital Humanities (DH) research, such as spell checking, language detection, entity extraction, author detection, question answering, and other tasks. These supervised algorithms learn patterns from a large number of "right" and "wrong" examples and apply them to new examples. However, using DNNs for analyzing the text resources in DH research presents two main challenges: (un)availability of training data and a need for domain adaptation. This paper explores these challenges by analyzing multiple use-cases of DH studies in recent literature and their possible solutions and lays out a practical decision model for DH experts for when and how to choose the appropriate deep learning approaches for their research. Moreover, in this paper, we aim to raise awareness of the benefits of utilizing deep learning models in the DH community.

Question Answering with Deep Neural Networks for Semi-Structured Heterogeneous Genealogical Knowledge Graphs

  • paper_url: http://arxiv.org/abs/2307.16214
  • repo_url: https://github.com/omrivm/uncle-bert
  • paper_authors: Omri Suissa, Maayan Zhitomirsky-Geffet, Avshalom Elmalech
  • for: This study aims to develop a question-answering system over genealogical family trees to better support genealogical research.
  • methods: Genealogical data is represented as a knowledge graph, converted to text, combined with unstructured texts, and used to train a Transformer-based question-answering model.
  • results: The dedicated approach reduces model complexity while increasing accuracy, and may have practical implications for genealogical research and real-world projects, making genealogical data more accessible.
    Abstract With the rising popularity of user-generated genealogical family trees, new genealogical information systems have been developed. State-of-the-art natural question answering algorithms use deep neural network (DNN) architecture based on self-attention networks. However, some of these models use sequence-based inputs and are not suitable to work with graph-based structure, while graph-based DNN models rely on high levels of comprehensiveness of knowledge graphs that is nonexistent in the genealogical domain. Moreover, these supervised DNN models require training datasets that are absent in the genealogical domain. This study proposes an end-to-end approach for question answering using genealogical family trees by: 1) representing genealogical data as knowledge graphs, 2) converting them to texts, 3) combining them with unstructured texts, and 4) training a transformer-based question answering model. To evaluate the need for a dedicated approach, a comparison between the fine-tuned model (Uncle-BERT) trained on the auto-generated genealogical dataset and state-of-the-art question-answering models was performed. The findings indicate that there are significant differences between answering genealogical questions and open-domain questions. Moreover, the proposed methodology reduces complexity while increasing accuracy and may have practical implications for genealogical research and real-world projects, making genealogical data accessible to experts as well as the general public.

Robust Multi-Agent Reinforcement Learning with State Uncertainty

  • paper_url: http://arxiv.org/abs/2307.16212
  • repo_url: https://github.com/sihongho/robust_marl_with_state_uncertainty
  • paper_authors: Sihong He, Songyang Han, Sanbao Su, Shuo Han, Shaofeng Zou, Fei Miao
  • for: This work addresses state uncertainty in multi-agent reinforcement learning (MARL) to improve the robustness and reliability of MARL policies.
  • methods: The problem is modeled as a Markov Game with state perturbation adversaries (MG-SPA), with robust equilibrium (RE) as the solution concept; a robust multi-agent Q-learning (RMAQ) algorithm is proposed, together with a robust multi-agent actor-critic (RMAAC) algorithm for high-dimensional state-action spaces.
  • results: Experiments show that RMAQ converges to the optimal value function, and RMAAC outperforms several MARL and robust MARL methods in multiple multi-agent environments when state uncertainty is present.
    Abstract In real-world multi-agent reinforcement learning (MARL) applications, agents may not have perfect state information (e.g., due to inaccurate measurement or malicious attacks), which challenges the robustness of agents' policies. Though robustness is getting important in MARL deployment, little prior work has studied state uncertainties in MARL, neither in problem formulation nor algorithm design. Motivated by this robustness issue and the lack of corresponding studies, we study the problem of MARL with state uncertainty in this work. We provide the first attempt to the theoretical and empirical analysis of this challenging problem. We first model the problem as a Markov Game with state perturbation adversaries (MG-SPA) by introducing a set of state perturbation adversaries into a Markov Game. We then introduce robust equilibrium (RE) as the solution concept of an MG-SPA. We conduct a fundamental analysis regarding MG-SPA such as giving conditions under which such a robust equilibrium exists. Then we propose a robust multi-agent Q-learning (RMAQ) algorithm to find such an equilibrium, with convergence guarantees. To handle high-dimensional state-action space, we design a robust multi-agent actor-critic (RMAAC) algorithm based on an analytical expression of the policy gradient derived in the paper. Our experiments show that the proposed RMAQ algorithm converges to the optimal value function; our RMAAC algorithm outperforms several MARL and robust MARL methods in multiple multi-agent environments when state uncertainty is present. The source code is public on \url{https://github.com/sihongho/robust_marl_with_state_uncertainty}.
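The core idea of training against state-perturbation adversaries can be illustrated with a single-agent, tabular caricature: the Bellman target is evaluated under the worst-case admissible perturbation of the next observation. This is only a simplified illustration of the robust-backup idea, not the paper's RMAQ algorithm; the toy MDP and the perturbation set are hypothetical.

```python
import numpy as np

n_states, n_actions = 5, 2
rng = np.random.default_rng(1)

# Toy MDP: random rewards, deterministic ring transitions.
R = rng.uniform(size=(n_states, n_actions))
def next_state(s, a):
    return (s + a + 1) % n_states

# Admissible observation perturbations: the adversary may shift the observed state by +/-1.
def perturbations(s):
    return [(s - 1) % n_states, s, (s + 1) % n_states]

Q = np.zeros((n_states, n_actions))
gamma, alpha = 0.9, 0.1

for _ in range(5000):
    s = rng.integers(n_states)
    a = rng.integers(n_actions)
    s_next = next_state(s, a)
    # Robust target: the agent will act greedily on a perturbed observation,
    # so back up the worst case over the adversary's admissible perturbations.
    worst = min(Q[s_next, np.argmax(Q[o])] for o in perturbations(s_next))
    Q[s, a] += alpha * (R[s, a] + gamma * worst - Q[s, a])

print(np.round(Q, 2))
```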

Rethinking Uncertainly Missing and Ambiguous Visual Modality in Multi-Modal Entity Alignment

  • paper_url: http://arxiv.org/abs/2307.16210
  • repo_url: https://github.com/zjukg/UMAEA
  • paper_authors: Zhuo Chen, Lingbing Guo, Yin Fang, Yichi Zhang, Jiaoyan Chen, Jeff Z. Pan, Yangning Li, Huajun Chen, Wen Zhang
  • for: This work addresses the challenges of multi-modal entity alignment (MMEA), including missing visual modalities and the intrinsic ambiguity of visual images.
  • methods: A robust multi-modal entity alignment approach, UMAEA, is proposed to handle uncertainly missing and ambiguous visual modalities, and it achieves the best performance across multiple benchmark splits.
  • results: UMAEA performs well in the face of modality incompleteness and ambiguity, surpassing models with more parameters and longer runtimes while effectively alleviating their limitations.
    Abstract As a crucial extension of entity alignment (EA), multi-modal entity alignment (MMEA) aims to identify identical entities across disparate knowledge graphs (KGs) by exploiting associated visual information. However, existing MMEA approaches primarily concentrate on the fusion paradigm of multi-modal entity features, while neglecting the challenges presented by the pervasive phenomenon of missing and intrinsic ambiguity of visual images. In this paper, we present a further analysis of visual modality incompleteness, benchmarking latest MMEA models on our proposed dataset MMEA-UMVM, where the types of alignment KGs covering bilingual and monolingual, with standard (non-iterative) and iterative training paradigms to evaluate the model performance. Our research indicates that, in the face of modality incompleteness, models succumb to overfitting the modality noise, and exhibit performance oscillations or declines at high rates of missing modality. This proves that the inclusion of additional multi-modal data can sometimes adversely affect EA. To address these challenges, we introduce UMAEA , a robust multi-modal entity alignment approach designed to tackle uncertainly missing and ambiguous visual modalities. It consistently achieves SOTA performance across all 97 benchmark splits, significantly surpassing existing baselines with limited parameters and time consumption, while effectively alleviating the identified limitations of other models. Our code and benchmark data are available at https://github.com/zjukg/UMAEA.

Around the GLOBE: Numerical Aggregation Question-Answering on Heterogeneous Genealogical Knowledge Graphs with Deep Neural Networks

  • paper_url: http://arxiv.org/abs/2307.16208
  • repo_url: None
  • paper_authors: Omri Suissa, Maayan Zhitomirsky-Geffet, Avshalom Elmalech
  • for: This paper is written for researchers and practitioners in the field of natural language processing and genealogy, as well as for the general public who are interested in exploring cultural heritage domains.
  • methods: The paper proposes a new end-to-end methodology for numerical aggregation question-answering (QA) for genealogical trees, which includes an automatic method for training dataset generation, a transformer-based table selection method, and an optimized transformer-based numerical aggregation QA model.
  • results: The proposed architecture, called GLOBE, outperforms the state-of-the-art models and pipelines by achieving 87% accuracy for the task of numerical aggregation QA compared to only 21% by current state-of-the-art models.
    Abstract One of the key AI tools for textual corpora exploration is natural language question-answering (QA). Unlike keyword-based search engines, QA algorithms receive and process natural language questions and produce precise answers to these questions, rather than long lists of documents that need to be manually scanned by the users. State-of-the-art QA algorithms based on DNNs were successfully employed in various domains. However, QA in the genealogical domain is still underexplored, while researchers in this field (and other fields in humanities and social sciences) can highly benefit from the ability to ask questions in natural language, receive concrete answers and gain insights hidden within large corpora. While some research has been recently conducted for factual QA in the genealogical domain, to the best of our knowledge, there is no previous research on the more challenging task of numerical aggregation QA (i.e., answering questions combining aggregation functions, e.g., count, average, max). Numerical aggregation QA is critical for distant reading and analysis for researchers (and the general public) interested in investigating cultural heritage domains. Therefore, in this study, we present a new end-to-end methodology for numerical aggregation QA for genealogical trees that includes: 1) an automatic method for training dataset generation; 2) a transformer-based table selection method, and 3) an optimized transformer-based numerical aggregation QA model. The findings indicate that the proposed architecture, GLOBE, outperforms the state-of-the-art models and pipelines by achieving 87% accuracy for this task compared to only 21% by current state-of-the-art models. This study may have practical implications for genealogical information centers and museums, making genealogical data research easy and scalable for experts as well as the general public.

Synthesizing Event-centric Knowledge Graphs of Daily Activities Using Virtual Space

  • paper_url: http://arxiv.org/abs/2307.16206
  • repo_url: https://github.com/aistairc/virtualhome2kg
  • paper_authors: Shusaku Egami, Takanori Ugai, Mikiko Oono, Koji Kitamura, Ken Fukuda
  • for: This study provides a framework for constructing event-centric knowledge graphs (KGs) of daily activities in virtual space, to support analysis of human behavior and decision making in home environments.
  • methods: The approach combines virtual-space simulation, an event-centric schema, and the generation of contextual semantic data corresponding to synthetic video data.
  • results: Several use cases demonstrate the utility and potential of the VirtualHome2KG framework, including daily-activity analysis via querying, embedding, and clustering, as well as fall risk detection.
    Abstract Artificial intelligence (AI) is expected to be embodied in software agents, robots, and cyber-physical systems that can understand the various contextual information of daily life in the home environment to support human behavior and decision making in various situations. Scene graph and knowledge graph (KG) construction technologies have attracted much attention for knowledge-based embodied question answering meeting this expectation. However, collecting and managing real data on daily activities under various experimental conditions in a physical space are quite costly, and developing AI that understands the intentions and contexts is difficult. In the future, data from both virtual spaces, where conditions can be easily modified, and physical spaces, where conditions are difficult to change, are expected to be combined to analyze daily living activities. However, studies on the KG construction of daily activities using virtual space and their application have yet to progress. The potential and challenges must still be clarified to facilitate AI development for human daily life. Thus, this study proposes the VirtualHome2KG framework to generate synthetic KGs of daily life activities in virtual space. This framework augments both the synthetic video data of daily activities and the contextual semantic data corresponding to the video contents based on the proposed event-centric schema and virtual space simulation results. Therefore, context-aware data can be analyzed, and various applications that have conventionally been difficult to develop due to the insufficient availability of relevant data and semantic information can be developed. We also demonstrate herein the utility and potential of the proposed VirtualHome2KG framework through several use cases, including the analysis of daily activities by querying, embedding, and clustering, and fall risk detection among ...

Shuffled Differentially Private Federated Learning for Time Series Data Analytics

  • paper_url: http://arxiv.org/abs/2307.16196
  • repo_url: None
  • paper_authors: Chenxi Huang, Chaoyang Jiang, Zhenghua Chen
  • for: Trustworthy federated learning on time series data, achieving strong performance while ensuring clients' privacy.
  • methods: Local differential privacy is used to extend the privacy protection trust boundary to the clients, and shuffling is incorporated for privacy amplification, mitigating the accuracy decline caused by local differential privacy.
  • results: Extensive experiments on five time series datasets show minimal accuracy loss compared with non-private federated learning in both small- and large-client scenarios, and improved accuracy over centralized differentially private federated learning at the same level of privacy protection.
    Abstract Trustworthy federated learning aims to achieve optimal performance while ensuring clients' privacy. Existing privacy-preserving federated learning approaches are mostly tailored for image data, lacking applications for time series data, which have many important applications, like machine health monitoring, human activity recognition, etc. Furthermore, protective noising on a time series data analytics model can significantly interfere with temporal-dependent learning, leading to a greater decline in accuracy. To address these issues, we develop a privacy-preserving federated learning algorithm for time series data. Specifically, we employ local differential privacy to extend the privacy protection trust boundary to the clients. We also incorporate shuffle techniques to achieve a privacy amplification, mitigating the accuracy decline caused by leveraging local differential privacy. Extensive experiments were conducted on five time series datasets. The evaluation results reveal that our algorithm experienced minimal accuracy loss compared to non-private federated learning in both small and large client scenarios. Under the same level of privacy protection, our algorithm demonstrated improved accuracy compared to the centralized differentially private federated learning in both scenarios.
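A minimal sketch of the two mechanisms the abstract combines: per-client noise on clipped model updates (one common way to realize local differential privacy) plus a shuffler that breaks the link between clients and reports before server-side aggregation. This is an illustrative construction, not the paper's algorithm; the clipping norm, noise scale, and update shapes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def local_dp_update(update, clip_norm=1.0, noise_std=0.5):
    """Clip the client's update and add Gaussian noise locally, before it leaves the device."""
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip_norm / (norm + 1e-12))
    return clipped + rng.normal(scale=noise_std, size=update.shape)

# Hypothetical model updates from 10 clients training on local time series.
client_updates = [rng.normal(size=8) for _ in range(10)]
noisy_updates = [local_dp_update(u) for u in client_updates]

# Shuffler: a trusted intermediary permutes the reports so the server cannot
# link an update to a client, which amplifies the local privacy guarantee.
perm = rng.permutation(len(noisy_updates))
shuffled = [noisy_updates[i] for i in perm]

aggregated = np.mean(shuffled, axis=0)  # server-side averaging
print(aggregated)
```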

CLGT: A Graph Transformer for Student Performance Prediction in Collaborative Learning

  • paper_url: http://arxiv.org/abs/2308.02038
  • repo_url: https://github.com/tianhao-peng/clgt
  • paper_authors: Tianhao Peng, Yu Liang, Wenjun Wu, Jian Ren, Zhao Pengrui, Yanjun Pu
  • for: This study aims to model and predict student performance in a collaborative learning paradigm. Most prior work focuses on discussion forums and social learning networks; only a few studies examine how students interact in team projects and how such interactions affect their academic performance.
  • methods: Using a software engineering course in which students team up to complete a software project, a student interaction graph is constructed from the activities of students grouped in various teams, and an extended graph transformer framework for collaborative learning (CLGT), equipped with an interpretation module, is proposed to evaluate and predict student performance.
  • results: Experimental results show that CLGT outperforms baseline models on real-world datasets; it also differentiates low-performing students in the collaborative learning paradigm and gives teachers early warnings so that appropriate assistance can be provided.
    Abstract Modeling and predicting the performance of students in collaborative learning paradigms is an important task. Most of the research presented in literature regarding collaborative learning focuses on the discussion forums and social learning networks. There are only a few works that investigate how students interact with each other in team projects and how such interactions affect their academic performance. In order to bridge this gap, we choose a software engineering course as the study subject. The students who participate in a software engineering course are required to team up and complete a software project together. In this work, we construct an interaction graph based on the activities of students grouped in various teams. Based on this student interaction graph, we present an extended graph transformer framework for collaborative learning (CLGT) for evaluating and predicting the performance of students. Moreover, the proposed CLGT contains an interpretation module that explains the prediction results and visualizes the student interaction patterns. The experimental results confirm that the proposed CLGT outperforms the baseline models in terms of performing predictions based on the real-world datasets. Moreover, the proposed CLGT differentiates the students with poor performance in the collaborative learning paradigm and gives teachers early warnings, so that appropriate assistance can be provided.

ESP: Exploiting Symmetry Prior for Multi-Agent Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2307.16186
  • repo_url: None
  • paper_authors: Xin Yu, Rongye Shi, Pu Feng, Yongkai Tian, Jie Luo, Wenjun Wu
  • for: Improve the data efficiency and model accuracy of multi-agent reinforcement learning (MARL).
  • methods: Data augmentation and a well-designed consistency loss are integrated into existing MARL methods to exploit symmetry priors; the framework is model-agnostic and can be applied to most current MARL algorithms.
  • results: The framework is effective on multiple challenging tasks and demonstrates its advantages on a physical multi-robot testbed.
    Abstract Multi-agent reinforcement learning (MARL) has achieved promising results in recent years. However, most existing reinforcement learning methods require a large amount of data for model training. In addition, data-efficient reinforcement learning requires the construction of strong inductive biases, which are ignored in the current MARL approaches. Inspired by the symmetry phenomenon in multi-agent systems, this paper proposes a framework for exploiting prior knowledge by integrating data augmentation and a well-designed consistency loss into the existing MARL methods. In addition, the proposed framework is model-agnostic and can be applied to most of the current MARL algorithms. Experimental tests on multiple challenging tasks demonstrate the effectiveness of the proposed framework. Moreover, the proposed framework is applied to a physical multi-robot testbed to show its superiority.
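The combination of symmetry-based data augmentation with a consistency loss can be sketched as follows: states are transformed by a known symmetry (here, a reflection), actions are mapped accordingly, and the policy is penalized when its output on the transformed state disagrees with the transformed output on the original state. This is a generic illustration under assumed state/action conventions, not the ESP implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy policy over a 2-D continuous action for one agent; state = 4-D observation.
policy = nn.Sequential(nn.Linear(4, 32), nn.Tanh(), nn.Linear(32, 2))

def reflect_state(s):   # hypothetical symmetry: mirror the x-axis components
    return s * torch.tensor([-1.0, 1.0, -1.0, 1.0])

def reflect_action(a):  # the same mirror applied to the action's x component
    return a * torch.tensor([-1.0, 1.0])

states = torch.randn(64, 4)                 # batch of synthetic observations
aug_states = reflect_state(states)          # symmetry-based data augmentation

actions = policy(states)
aug_actions = policy(aug_states)

# Consistency loss: acting in the mirrored world should be the mirrored action.
consistency_loss = F.mse_loss(aug_actions, reflect_action(actions).detach())
print(consistency_loss.item())
```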

Data-Driven Modeling with Experimental Augmentation for the Modulation Strategy of the Dual-Active-Bridge Converter

  • paper_url: http://arxiv.org/abs/2307.16173
  • repo_url: None
  • paper_authors: Xinze Li, Josep Pou, Jiaxin Dong, Fanfan Lin, Changyun Wen, Suvajit Mukherjee, Xin Zhang
  • for: Improve the accuracy and practicality of performance models for power converters.
  • methods: Combines simulation data and experimental data to establish a highly accurate and practical data-driven model.
  • results: Achieves 99.92% efficiency modeling accuracy and a peak efficiency of 98.45% in 2-kW hardware experiments.
    Abstract For the performance modeling of power converters, the mainstream approaches are essentially knowledge-based, suffering from heavy manpower burden and low modeling accuracy. Recent emerging data-driven techniques greatly relieve human reliance by automatic modeling from simulation data. However, model discrepancy may occur due to unmodeled parasitics, deficient thermal and magnetic models, unpredictable ambient conditions, etc. These inaccurate data-driven models based on pure simulation cannot represent the practical performance in physical world, hindering their applications in power converter modeling. To alleviate model discrepancy and improve accuracy in practice, this paper proposes a novel data-driven modeling with experimental augmentation (D2EA), leveraging both simulation data and experimental data. In D2EA, simulation data aims to establish basic functional landscape, and experimental data focuses on matching actual performance in real world. The D2EA approach is instantiated for the efficiency optimization of a hybrid modulation for neutral-point-clamped dual-active-bridge (NPC-DAB) converter. The proposed D2EA approach realizes 99.92% efficiency modeling accuracy, and its feasibility is comprehensively validated in 2-kW hardware experiments, where the peak efficiency of 98.45% is attained. Overall, D2EA is data-light and can achieve highly accurate and highly practical data-driven models in one shot, and it is scalable to other applications, effortlessly.
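One way to read the simulation-plus-experiment idea: fit a base model to dense simulation data to capture the functional landscape, then fit a small correction model to the residuals on a handful of experimental points so the final prediction matches hardware behavior. The sketch below is a generic two-stage regression illustration with synthetic data, not the D2EA method itself.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stage 1: dense simulation data (modulation parameter -> simulated efficiency).
x_sim = np.linspace(0.0, 1.0, 200)
eff_sim = 0.95 - 0.10 * (x_sim - 0.6) ** 2             # idealized simulation landscape
base = np.poly1d(np.polyfit(x_sim, eff_sim, deg=3))    # base data-driven model

# Stage 2: a few hardware measurements that deviate from simulation
# (unmodeled parasitics, thermal effects, ...).
x_exp = np.array([0.1, 0.3, 0.5, 0.7, 0.9])
eff_exp = 0.95 - 0.10 * (x_exp - 0.6) ** 2 - 0.02 * x_exp + rng.normal(0, 1e-3, 5)

residual = np.poly1d(np.polyfit(x_exp, eff_exp - base(x_exp), deg=1))

def corrected_model(x):
    """Simulation-trained base model plus experiment-fitted residual correction."""
    return base(x) + residual(x)

print(corrected_model(np.array([0.6, 0.8])))
```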

HierVST: Hierarchical Adaptive Zero-shot Voice Style Transfer

  • paper_url: http://arxiv.org/abs/2307.16171
  • repo_url: None
  • paper_authors: Sang-Hoon Lee, Ha-Yeong Choi, Hyung-Seok Oh, Seong-Whan Lee
  • for: This paper addresses zero-shot voice style transfer (VST) for novel speakers.
  • methods: A hierarchical adaptive end-to-end zero-shot VST model is trained without any text transcripts, using only speech data, hierarchical variational inference, and self-supervised representations.
  • results: Experimental results show that the method outperforms other VST models in zero-shot VST scenarios. Audio samples are available at https://hiervst.github.io/.
    Abstract Despite rapid progress in the voice style transfer (VST) field, recent zero-shot VST systems still lack the ability to transfer the voice style of a novel speaker. In this paper, we present HierVST, a hierarchical adaptive end-to-end zero-shot VST model. Without any text transcripts, we only use the speech dataset to train the model by utilizing hierarchical variational inference and self-supervised representation. In addition, we adopt a hierarchical adaptive generator that generates the pitch representation and waveform audio sequentially. Moreover, we utilize unconditional generation to improve the speaker-relative acoustic capacity in the acoustic representation. With a hierarchical adaptive structure, the model can adapt to a novel voice style and convert speech progressively. The experimental results demonstrate that our method outperforms other VST models in zero-shot VST scenarios. Audio samples are available at \url{https://hiervst.github.io/}.

An Effective LSTM-DDPM Scheme for Energy Theft Detection and Forecasting in Smart Grid

  • paper_url: http://arxiv.org/abs/2307.16149
  • repo_url: None
  • paper_authors: Xun Yuan, Yang Yang, Arwa Alromih, Prosanta Gope, Biplab Sikdar
  • for: This paper addresses two interconnected challenges in smart grid systems, energy theft detection (ETD) and energy consumption forecasting (ECF), to ensure system security.
  • methods: The proposed solution combines long short-term memory (LSTM) with a denoising diffusion probabilistic model (DDPM) for input reconstruction and forecasting; energy theft is identified from the reconstruction and forecasting errors, which complement each other in detecting different types of attacks.
  • results: Experiments on real-world and synthetic datasets show that the proposed scheme outperforms baseline methods on both ETD and ECF; the ensemble method significantly enhances ETD performance, accurately detecting energy theft attacks that baseline methods fail to detect.
    Abstract Energy theft detection (ETD) and energy consumption forecasting (ECF) are two interconnected challenges in smart grid systems. Addressing these issues collectively is crucial for ensuring system security. This paper addresses the interconnected challenges of ETD and ECF in smart grid systems. The proposed solution combines long short-term memory (LSTM) and a denoising diffusion probabilistic model (DDPM) to generate input reconstruction and forecasting. By leveraging the reconstruction and forecasting errors, the system identifies instances of energy theft, with the methods based on reconstruction error and forecasting error complementing each other in detecting different types of attacks. Through extensive experiments on real-world and synthetic datasets, the proposed scheme outperforms baseline methods in ETD and ECF problems. The ensemble method significantly enhances ETD performance, accurately detecting energy theft attacks that baseline methods fail to detect. The research offers a comprehensive and effective solution for addressing ETD and ECF challenges, demonstrating promising results and improved security in smart grid systems.
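The detection logic described in the abstract, flagging theft when reconstruction and forecasting errors are jointly abnormal, can be sketched independently of the LSTM/DDPM models themselves. The thresholds and error arrays below are hypothetical placeholders for the models' outputs.

```python
import numpy as np

def detect_energy_theft(recon_error, forecast_error, k=3.0):
    """Flag a timestep when either error is far above its typical level.

    recon_error, forecast_error: per-timestep errors produced by the
    reconstruction and forecasting models (placeholders here).
    k: number of robust standard deviations used as the threshold.
    """
    def threshold(err):
        med = np.median(err)
        mad = np.median(np.abs(err - med)) + 1e-12   # robust spread estimate
        return med + k * 1.4826 * mad

    flags_recon = recon_error > threshold(recon_error)
    flags_forecast = forecast_error > threshold(forecast_error)
    return flags_recon | flags_forecast               # the two errors complement each other

rng = np.random.default_rng(0)
recon = np.abs(rng.normal(0.1, 0.02, 200)); recon[150:160] += 0.3   # injected anomaly
fcast = np.abs(rng.normal(0.2, 0.05, 200)); fcast[40:45] += 0.5
print(np.flatnonzero(detect_energy_theft(recon, fcast)))
```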

Fully $1\times1$ Convolutional Network for Lightweight Image Super-Resolution

  • paper_url: http://arxiv.org/abs/2307.16140
  • repo_url: https://github.com/aitical/scnet
  • paper_authors: Gang Wu, Junjun Jiang, Kui Jiang, Xianming Liu
  • for: Improve performance on single image super-resolution (SISR), particularly relative to deep models with large kernels (3×3 or larger).
  • methods: A simple yet effective fully 1×1 convolutional network, the Shift-Conv-based Network (SCNet), is proposed; a parameter-free spatial-shift operation equips the fully 1×1 convolutional network with strong representation capability and impressive computational efficiency.
  • results: Experiments show that SCNets, despite their fully 1×1 convolutional structure, match or even surpass the performance of existing lightweight SR models.
    Abstract Deep models have achieved significant progress on single image super-resolution (SISR) tasks, in particular large models with large kernel ($3\times3$ or more). However, the heavy computational footprint of such models prevents their deployment in real-time, resource-constrained environments. Conversely, $1\times1$ convolutions bring substantial computational efficiency, but struggle with aggregating local spatial representations, an essential capability to SISR models. In response to this dichotomy, we propose to harmonize the merits of both $3\times3$ and $1\times1$ kernels, and exploit a great potential for lightweight SISR tasks. Specifically, we propose a simple yet effective fully $1\times1$ convolutional network, named Shift-Conv-based Network (SCNet). By incorporating a parameter-free spatial-shift operation, it equips the fully $1\times1$ convolutional network with powerful representation capability while impressive computational efficiency. Extensive experiments demonstrate that SCNets, despite its fully $1\times1$ convolutional structure, consistently matches or even surpasses the performance of existing lightweight SR models that employ regular convolutions.
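The key building block described, a parameter-free spatial shift followed by 1×1 convolutions, can be sketched as a PyTorch module: channel groups are rolled in different spatial directions so that purely pointwise convolutions can still mix local neighborhoods. This is an interpretation of the idea in the abstract, not the released SCNet code; the group layout and sizes are assumptions.

```python
import torch
import torch.nn as nn

class ShiftConvBlock(nn.Module):
    """Parameter-free spatial shift + fully 1x1 convolutions (illustrative sketch)."""

    def __init__(self, channels):
        super().__init__()
        self.pw1 = nn.Conv2d(channels, channels, kernel_size=1)
        self.pw2 = nn.Conv2d(channels, channels, kernel_size=1)
        self.act = nn.ReLU(inplace=True)

    @staticmethod
    def spatial_shift(x):
        # Split channels into 5 groups: shift 4 of them by one pixel in each
        # direction and leave the last group in place, so 1x1 convs see neighbors.
        g = x.shape[1] // 5
        out = x.clone()
        out[:, 0*g:1*g] = torch.roll(x[:, 0*g:1*g], shifts=1,  dims=2)   # down
        out[:, 1*g:2*g] = torch.roll(x[:, 1*g:2*g], shifts=-1, dims=2)   # up
        out[:, 2*g:3*g] = torch.roll(x[:, 2*g:3*g], shifts=1,  dims=3)   # right
        out[:, 3*g:4*g] = torch.roll(x[:, 3*g:4*g], shifts=-1, dims=3)   # left
        return out

    def forward(self, x):
        return x + self.pw2(self.act(self.pw1(self.spatial_shift(x))))

block = ShiftConvBlock(40)
print(block(torch.randn(1, 40, 32, 32)).shape)   # torch.Size([1, 40, 32, 32])
```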

User-Controlled Knowledge Fusion in Large Language Models: Balancing Creativity and Hallucination

  • paper_url: http://arxiv.org/abs/2307.16139
  • repo_url: None
  • paper_authors: Chen Zhang
  • for: This paper proposes a user-controllable mechanism that modulates the balance between a large language model's (LLM's) creativity and its faithfulness to factual knowledge.
  • methods: During fine-tuning, a numerical tag represents the degree of faithfulness to the reference knowledge in the generated responses; this degree is computed automatically from ROUGE scores, Sentence-BERT embeddings, and an LLM self-evaluation score.
  • results: Extensive experiments across various scenarios demonstrate the adaptability and effectiveness of the method; it enhances the versatility of LLMs while maintaining a balance between creativity and hallucination.
    Abstract In modern dialogue systems, the use of Large Language Models (LLMs) has grown exponentially due to their capacity to generate diverse, relevant, and creative responses. Despite their strengths, striking a balance between the LLMs' creativity and their faithfulness to external knowledge remains a key challenge. This paper presents an innovative user-controllable mechanism that modulates the balance between an LLM's imaginative capabilities and its adherence to factual information. Our approach incorporates a numerical tag during the fine-tuning phase of the LLM's training, representing the degree of faithfulness to the reference knowledge in the generated responses. This degree is computed through an automated process that measures lexical overlap using ROUGE scores, semantic similarity using Sentence-BERT embeddings, and an LLM's self-evaluation score. During model inference, users can manipulate this numerical tag, thus controlling the degree of the LLM's reliance on external knowledge. We conduct extensive experiments across various scenarios, demonstrating the adaptability of our method and its efficacy in ensuring the quality and accuracy of the LLM's responses. The results highlight the potential of our approach to enhance the versatility of LLMs while maintaining a balance between creativity and hallucination.
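A rough sketch of how the faithfulness degree described in the abstract could be computed at data-preparation time: lexical overlap (ROUGE-L), embedding similarity (Sentence-BERT), and a placeholder self-evaluation score are averaged and bucketed into a discrete tag. The rouge-score and sentence-transformers calls are standard library usage, but the equal weighting, the bucketing, and the self-evaluation stub are assumptions, not the paper's exact recipe.

```python
from rouge_score import rouge_scorer
from sentence_transformers import SentenceTransformer, util

_rouge = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)
_embedder = SentenceTransformer("all-MiniLM-L6-v2")

def llm_self_eval(reference: str, response: str) -> float:
    """Placeholder for the LLM's own 0-1 faithfulness judgement."""
    return 0.8  # would be produced by prompting the LLM in the actual pipeline

def faithfulness_tag(reference: str, response: str, n_buckets: int = 5) -> int:
    lexical = _rouge.score(reference, response)["rougeL"].fmeasure
    emb = _embedder.encode([reference, response], convert_to_tensor=True)
    semantic = util.cos_sim(emb[0], emb[1]).item()
    degree = (lexical + semantic + llm_self_eval(reference, response)) / 3.0
    return min(int(degree * n_buckets), n_buckets - 1)   # discrete tag in 0..n_buckets-1

print(faithfulness_tag("The Nile is the longest river in Africa.",
                       "Africa's longest river is the Nile."))
```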

Uncertainty-Encoded Multi-Modal Fusion for Robust Object Detection in Autonomous Driving

  • paper_url: http://arxiv.org/abs/2307.16121
  • repo_url: None
  • paper_authors: Yang Lou, Qun Song, Qian Xu, Rui Tan, Jianping Wang
  • for: Improve the accuracy and robustness of object detection in autonomous driving perception.
  • methods: Detection results from different sensors and their single-modal uncertainties are fused, with a gating network weighting the expert outputs.
  • results: Achieves performance gains of up to 10.67%, 3.17%, and 5.40% over state-of-the-art proposal-level multi-modal object detectors under extreme weather, adversarial, and blinding attack scenarios.
    Abstract Multi-modal fusion has shown initial promising results for object detection of autonomous driving perception. However, many existing fusion schemes do not consider the quality of each fusion input and may suffer from adverse conditions on one or more sensors. While predictive uncertainty has been applied to characterize single-modal object detection performance at run time, incorporating uncertainties into the multi-modal fusion still lacks effective solutions due primarily to the uncertainty's cross-modal incomparability and distinct sensitivities to various adverse conditions. To fill this gap, this paper proposes Uncertainty-Encoded Mixture-of-Experts (UMoE) that explicitly incorporates single-modal uncertainties into LiDAR-camera fusion. UMoE uses individual expert network to process each sensor's detection result together with encoded uncertainty. Then, the expert networks' outputs are analyzed by a gating network to determine the fusion weights. The proposed UMoE module can be integrated into any proposal fusion pipeline. Evaluation shows that UMoE achieves a maximum of 10.67%, 3.17%, and 5.40% performance gain compared with the state-of-the-art proposal-level multi-modal object detectors under extreme weather, adversarial, and blinding attack scenarios.
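A simplified reading of the uncertainty-encoded mixture-of-experts idea: each sensor branch emits detection features together with an uncertainty estimate, an expert network processes each branch, and a gating network turns the concatenated expert outputs into fusion weights. The dimensions and the softmax gate below are illustrative assumptions, not the UMoE architecture.

```python
import torch
import torch.nn as nn

class UncertaintyGatedFusion(nn.Module):
    """Illustrative uncertainty-aware mixture-of-experts fusion over two sensor branches."""

    def __init__(self, feat_dim=16, hidden=32):
        super().__init__()
        # One expert per modality: input = detection features + 1 uncertainty scalar.
        self.expert_lidar = nn.Sequential(nn.Linear(feat_dim + 1, hidden), nn.ReLU())
        self.expert_camera = nn.Sequential(nn.Linear(feat_dim + 1, hidden), nn.ReLU())
        self.gate = nn.Linear(2 * hidden, 2)          # produces one weight per expert

    def forward(self, lidar_feat, lidar_unc, cam_feat, cam_unc):
        e1 = self.expert_lidar(torch.cat([lidar_feat, lidar_unc], dim=-1))
        e2 = self.expert_camera(torch.cat([cam_feat, cam_unc], dim=-1))
        w = torch.softmax(self.gate(torch.cat([e1, e2], dim=-1)), dim=-1)
        return w[..., :1] * e1 + w[..., 1:] * e2      # uncertainty-weighted fusion

fusion = UncertaintyGatedFusion()
out = fusion(torch.randn(4, 16), torch.rand(4, 1), torch.randn(4, 16), torch.rand(4, 1))
print(out.shape)   # torch.Size([4, 32])
```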

AI Increases Global Access to Reliable Flood Forecasts

  • paper_url: http://arxiv.org/abs/2307.16104
  • repo_url: https://github.com/google-research-datasets/global_streamflow_model_paper
  • paper_authors: Grey Nearing, Deborah Cohen, Vusumuzi Dube, Martin Gauch, Oren Gilon, Shaun Harrigan, Avinatan Hassidim, Frederik Kratzert, Asher Metzger, Sella Nevo, Florian Pappenberger, Christel Prudhomme, Guy Shalev, Shlomo Shenzis, Tadele Tekalign, Dana Weitzner, Yoss Matias
  • for: Develop an artificial intelligence flood forecasting model that provides more accurate and timely flood warnings.
  • methods: The model uses artificial intelligence together with globally available satellite and open data to predict flood events.
  • results: The model performs well across all continents and, in particular, exceeds the accuracy of existing global flood models in ungauged basins.
    Abstract Floods are one of the most common and impactful natural disasters, with a disproportionate impact in developing countries that often lack dense streamflow monitoring networks. Accurate and timely warnings are critical for mitigating flood risks, but accurate hydrological simulation models typically must be calibrated to long data records in each watershed where they are applied. We developed an Artificial Intelligence (AI) model to predict extreme hydrological events at timescales up to 7 days in advance. This model significantly outperforms current state of the art global hydrology models (the Copernicus Emergency Management Service Global Flood Awareness System) across all continents, lead times, and return periods. AI is especially effective at forecasting in ungauged basins, which is important because only a few percent of the world's watersheds have stream gauges, with a disproportionate number of ungauged basins in developing countries that are especially vulnerable to the human impacts of flooding. We produce forecasts of extreme events in South America and Africa that achieve reliability approaching the current state of the art in Europe and North America, and we achieve reliability at between 4 and 6-day lead times that are similar to current state of the art nowcasts (0-day lead time). Additionally, we achieve accuracies over 10-year return period events that are similar to current accuracies over 2-year return period events, meaning that AI can provide warnings earlier and over larger and more impactful events. The model that we develop in this paper has been incorporated into an operational early warning system that produces publicly available (free and open) forecasts in real time in over 80 countries. This work using AI and open data highlights a need for increasing the availability of hydrological data to continue to improve global access to reliable flood warnings.

PD-SEG: Population Disaggregation Using Deep Segmentation Networks For Improved Built Settlement Mask

  • paper_url: http://arxiv.org/abs/2307.16084
  • repo_url: None
  • paper_authors: Muhammad Abdul Rahman, Muhammad Ahmad Waseem, Zubair Khalid, Muhammad Tahir, Momin Uppal
  • For: Provide accurate population statistics to support policy-level decision making and resource allocation for development and planning initiatives.
  • Methods: Deep segmentation networks and satellite imagery are used to produce an accurate built settlement mask, and Points of Interest (POI) data are used to exclude non-residential areas.
  • Results: Population counts and densities are estimated accurately at a 30 m by 30 m resolution.
    Abstract Any policy-level decision-making procedure and academic research involving the optimum use of resources for development and planning initiatives depends on accurate population density statistics. The current cutting-edge datasets offered by WorldPop and Meta do not succeed in achieving this aim for developing nations like Pakistan; the inputs to their algorithms provide flawed estimates that fail to capture the spatial and land-use dynamics. In order to precisely estimate population counts at a resolution of 30 meters by 30 meters, we use an accurate built settlement mask obtained using deep segmentation networks and satellite imagery. The Points of Interest (POI) data is also used to exclude non-residential areas.
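The disaggregation step itself, spreading an administrative-unit population total over only the pixels flagged as built-up, is a simple dasymetric computation. The sketch below shows it with a synthetic mask; it is not the paper's pipeline, which obtains the mask from deep segmentation and filters non-residential areas using POI data.

```python
import numpy as np

def disaggregate_population(total_population, built_mask):
    """Spread an admin-unit population total uniformly over built-up pixels.

    built_mask: boolean array (H, W), True where a 30 m x 30 m pixel is marked
    as built settlement (residential after POI-based filtering).
    """
    n_built = built_mask.sum()
    density_map = np.zeros(built_mask.shape, dtype=float)
    if n_built > 0:
        density_map[built_mask] = total_population / n_built   # persons per pixel
    return density_map

# Hypothetical 6x6 admin unit with 9 built pixels and 1,800 residents.
mask = np.zeros((6, 6), dtype=bool)
mask[1:4, 2:5] = True
pop = disaggregate_population(1800, mask)
print(pop[2, 3], pop.sum())   # 200.0 persons per built pixel, total preserved
```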

EnrichEvent: Enriching Social Data with Contextual Information for Emerging Event Extraction

  • paper_url: http://arxiv.org/abs/2307.16082
  • repo_url: None
  • paper_authors: Mohammadali Sefidi Esfahani, Mohammad Akbari
  • for: This paper proposes an event detection method over streaming social data to better detect and identify different types of social events.
  • methods: Lexical and contextual knowledge are leveraged to detect semantically related tweets, and cluster chains are constructed to show how each event evolves over time.
  • results: Experimental results show that the method effectively detects and distinguishes different types of social events and accurately captures their evolution.
    Abstract Social platforms have emerged as crucial platforms for disseminating information and discussing real-life social events, which offers an excellent opportunity for researchers to design and implement novel event detection frameworks. However, most existing approaches merely exploit keyword burstiness or network structures to detect unspecified events. Thus, they often fail to identify unspecified events, given the challenging nature of events and social data. Social data, e.g., tweets, is characterized by misspellings, incompleteness, word sense ambiguity, and irregular language, as well as variation in aspects of opinions. Moreover, extracting discriminative features and patterns for evolving events by exploiting the limited structural knowledge is almost infeasible. To address these challenges, in this thesis, we propose a novel framework, namely EnrichEvent, that leverages the lexical and contextual representations of streaming social data. In particular, we leverage contextual knowledge, as well as lexical knowledge, to detect semantically related tweets and enhance the effectiveness of the event detection approaches. Eventually, our proposed framework produces cluster chains for each event to show the evolving variation of the event through time. We conducted extensive experiments to evaluate our framework, validating its high performance and effectiveness in detecting and distinguishing unspecified social events.
    摘要 社交平台已成为传播信息和讨论现实社会事件的重要平台,这为研究人员设计并实现新型事件检测框架提供了绝佳机会。然而,大多数现有方法仅利用关键词突发性或网络结构来检测未指定事件,因而在事件本身及社交数据的复杂特性面前,常常无法识别这类事件。社交数据(例如推文)具有拼写错误、信息不完整、词义歧义和语言不规范等特点,且观点角度多变。此外,仅凭有限的结构知识来为不断演化的事件提取有区分度的特征和模式几乎是不可行的。为应对这些挑战,本文提出了一个名为 EnrichEvent 的新框架,它利用流式社交数据的词汇与上下文表示。具体而言,我们同时利用上下文知识和词汇知识来检测语义相关的推文,从而提升事件检测方法的有效性。最终,所提框架为每个事件生成聚类链,以展示事件随时间的演化变化。我们开展了大量实验来评估该框架,验证了其在检测和区分未指定社会事件方面的高性能和有效性。
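
The "cluster chain" idea can be sketched as: embed the tweets of each time window, cluster them, and link each cluster to the most similar cluster of the previous window so that a chain traces one evolving event. The snippet below is a minimal illustration with scikit-learn KMeans and cosine similarity; the window size, cluster count, and linking threshold are assumptions for the example, not EnrichEvent's actual settings.

```python
import numpy as np
from sklearn.cluster import KMeans

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def build_cluster_chains(windows, k=3, link_threshold=0.7):
    """windows: list of (n_tweets, dim) embedding matrices, one per time window.
    Returns chains as lists of (window_index, cluster_id)."""
    chains, prev_centroids, prev_chain_of_cluster = [], None, {}
    for t, X in enumerate(windows):
        km = KMeans(n_clusters=min(k, len(X)), n_init=10, random_state=0).fit(X)
        chain_of_cluster = {}
        for c, centroid in enumerate(km.cluster_centers_):
            best, best_sim = None, link_threshold
            if prev_centroids is not None:
                for pc, pcent in enumerate(prev_centroids):
                    sim = cosine(centroid, pcent)
                    if sim > best_sim:
                        best, best_sim = pc, sim
            if best is None:           # start a new event chain
                chains.append([(t, c)])
                chain_of_cluster[c] = len(chains) - 1
            else:                      # extend the chain of the matched cluster
                idx = prev_chain_of_cluster[best]
                chains[idx].append((t, c))
                chain_of_cluster[c] = idx
        prev_centroids, prev_chain_of_cluster = km.cluster_centers_, chain_of_cluster
    return chains

# Toy usage with random "tweet embeddings" for three windows.
rng = np.random.default_rng(0)
windows = [rng.normal(size=(30, 16)) for _ in range(3)]
print(len(build_cluster_chains(windows)))
```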

cs.CL - 2023-07-30

A Private Watermark for Large Language Models

  • paper_url: http://arxiv.org/abs/2307.16230
  • repo_url: https://github.com/THU-BPM/private_watermark
  • paper_authors: Aiwei Liu, Leyi Pan, Xuming Hu, Shu’ang Li, Lijie Wen, Irwin King, Philip S. Yu
  • for: 保护大语言模型生成的文本免遭伪造和版权侵犯
  • methods: 使用两个不同的神经网络:一个用于水印生成,另一个用于水印检测,而且一部分参数共享两者
  • results: 实现高检测精度,无需大量参数和计算资源,同时难以从检测网络中提取水印生成规则
    Abstract Recently, text watermarking algorithms for large language models (LLMs) have been mitigating the potential harms of text generated by the LLMs, including fake news and copyright issues. However, the watermark detection of current text algorithms requires the key from the generation process, making them susceptible to breaches and counterfeiting. In this work, we propose the first private watermarking algorithm, which extends the current text watermarking algorithms by using two different neural networks respectively for watermark generation and detection, rather than using the same key at both stages. Meanwhile, part of the parameters of the watermark generation and detection networks are shared, which makes the detection network achieve a high accuracy very efficiently. Experiments show that our algorithm ensures high detection accuracy with minimal impact on generation and detection speed, due to the small parameter size of both networks. Additionally, our subsequent analysis demonstrates the difficulty of reverting the watermark generation rules from the detection network.
    摘要 近来,针对大型语言模型(LLM)的文本水印算法正在缓解LLM生成文本可能带来的危害,包括假新闻和版权问题。然而,现有文本水印算法的检测需要使用生成阶段的密钥,这使其容易遭到泄露和伪造。在这项工作中,我们提出了首个私有水印算法,它在现有文本水印算法的基础上,分别使用两个不同的神经网络进行水印生成和水印检测,而不是在两个阶段使用同一个密钥。同时,水印生成网络和检测网络共享部分参数,使检测网络能够高效地达到很高的准确率。实验表明,由于两个网络的参数规模都很小,我们的算法在保证高检测准确率的同时,对生成和检测速度的影响极小。进一步的分析还表明,从检测网络反推出水印生成规则是十分困难的。
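
One way to picture the two-network design is a shared token-embedding layer feeding (i) a small generation network that decides how the watermark biases token choices for a given prefix, and (ii) a separate detection network that scores whether a text carries the watermark, without exposing the generation key. The PyTorch sketch below only illustrates that parameter-sharing structure; the layer sizes and pooling are assumptions, not the THU-BPM implementation.

```python
import torch
import torch.nn as nn

VOCAB, EMB = 5000, 64

shared_embedding = nn.Embedding(VOCAB, EMB)   # parameters shared by both networks

class WatermarkGenerator(nn.Module):
    """Maps the embedding of the current prefix window to per-token watermark logits."""
    def __init__(self):
        super().__init__()
        self.embed = shared_embedding
        self.head = nn.Sequential(nn.Linear(EMB, 128), nn.ReLU(), nn.Linear(128, VOCAB))
    def forward(self, prefix_ids):                    # (batch, window)
        ctx = self.embed(prefix_ids).mean(dim=1)      # simple pooled context
        return self.head(ctx)                         # logits over vocabulary

class WatermarkDetector(nn.Module):
    """Scores whether a token sequence was generated with the watermark."""
    def __init__(self):
        super().__init__()
        self.embed = shared_embedding                 # same embedding object, shared weights
        self.head = nn.Sequential(nn.Linear(EMB, 128), nn.ReLU(), nn.Linear(128, 1))
    def forward(self, token_ids):                     # (batch, length)
        pooled = self.embed(token_ids).mean(dim=1)
        return torch.sigmoid(self.head(pooled)).squeeze(-1)  # P(watermarked)

gen, det = WatermarkGenerator(), WatermarkDetector()
prefix = torch.randint(0, VOCAB, (2, 8))
text = torch.randint(0, VOCAB, (2, 32))
print(gen(prefix).shape, det(text).shape)  # torch.Size([2, 5000]) torch.Size([2])
```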

Optimizing the Neural Network Training for OCR Error Correction of Historical Hebrew Texts

  • paper_url: http://arxiv.org/abs/2307.16220
  • repo_url: https://github.com/smartinternz02/SI-GuidedProject-2307-1622049182
  • paper_authors: Omri Suissa, Avshalom Elmalech, Maayan Zhitomirsky-Geffet
  • For: 这篇论文旨在提高历史文档(特别是希伯来文文本)光学字符识别(OCR)后校正的准确率。* Methods: 论文提出了一种创新方法,用显著更少的人工标注数据训练轻量级神经网络;其核心是自动生成面向特定语言和任务的训练数据,以改进神经网络在OCR后校正上的效果。* Results: 论文表明,所提方法优于其他最先进的OCR后校正神经网络和复杂的拼写检查器;结果还显示,神经网络的性能取决于训练数据的体裁和领域。
    Abstract Over the past few decades, large archives of paper-based documents such as books and newspapers have been digitized using Optical Character Recognition. This technology is error-prone, especially for historical documents. To correct OCR errors, post-processing algorithms have been proposed based on natural language analysis and machine learning techniques such as neural networks. Neural network's disadvantage is the vast amount of manually labeled data required for training, which is often unavailable. This paper proposes an innovative method for training a light-weight neural network for Hebrew OCR post-correction using significantly less manually created data. The main research goal is to develop a method for automatically generating language and task-specific training data to improve the neural network results for OCR post-correction, and to investigate which type of dataset is the most effective for OCR post-correction of historical documents. To this end, a series of experiments using several datasets was conducted. The evaluation corpus was based on Hebrew newspapers from the JPress project. An analysis of historical OCRed newspapers was done to learn common language and corpus-specific OCR errors. We found that training the network using the proposed method is more effective than using randomly generated errors. The results also show that the performance of the neural network for OCR post-correction strongly depends on the genre and area of the training data. Moreover, neural networks that were trained with the proposed method outperform other state-of-the-art neural networks for OCR post-correction and complex spellcheckers. These results may have practical implications for many digital humanities projects.
    摘要 在过去几十年里,大量纸质文档(如书籍和报纸)已通过光学字符识别(OCR)完成数字化。这项技术容易出错,对历史文档尤其如此。为了纠正OCR错误,人们提出了基于自然语言分析和神经网络等机器学习技术的后处理算法。神经网络的缺点在于训练需要大量人工标注数据,而这类数据往往难以获得。本文提出了一种创新方法,用显著更少的人工数据训练一个轻量级神经网络,用于希伯来文OCR后校正。研究的主要目标是开发一种自动生成面向特定语言和任务的训练数据的方法,以提升神经网络在OCR后校正上的效果,并研究哪种类型的数据集对历史文档的OCR后校正最为有效。为此,我们使用多个数据集开展了一系列实验。评估语料基于JPress项目的希伯来文报纸。我们对历史OCR报纸进行了分析,以总结常见的语言及语料特有的OCR错误。我们发现,使用所提方法训练网络比使用随机生成的错误更有效。结果还表明,神经网络在OCR后校正上的性能强烈依赖于训练数据的体裁和领域。此外,用所提方法训练的神经网络优于其他最先进的OCR后校正神经网络和复杂的拼写检查器。这些结果可能对许多数字人文项目具有实际意义。
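
The key idea of generating language- and corpus-specific training data can be sketched as: estimate a character confusion distribution from a small set of aligned OCR/ground-truth pairs, then corrupt clean text with that distribution to create unlimited (noisy, clean) training pairs. The snippet below is a simplified illustration; the toy confusion table and error rate are invented for the example and are not the statistics the authors derived from the JPress newspapers.

```python
import random

# Toy character-confusion statistics (substitution probabilities), e.g. learned by
# aligning OCR output with manually corrected text. Values here are invented.
CONFUSIONS = {
    "ר": [("ד", 0.6), ("ך", 0.4)],   # resh misread as dalet / final kaf
    "ו": [("י", 0.7), ("ן", 0.3)],   # vav misread as yod / final nun
    "ח": [("ה", 1.0)],               # het misread as he
}
ERROR_RATE = 0.08  # fraction of confusable characters to corrupt (assumption)

def inject_ocr_errors(clean_text: str, rng: random.Random) -> str:
    """Return an artificial 'OCR output' for a clean sentence."""
    noisy = []
    for ch in clean_text:
        options = CONFUSIONS.get(ch)
        if options and rng.random() < ERROR_RATE:
            chars, weights = zip(*options)
            noisy.append(rng.choices(chars, weights=weights, k=1)[0])
        else:
            noisy.append(ch)
    return "".join(noisy)

rng = random.Random(42)
clean = "חדשות ורשימות מן העיתון"
pairs = [(inject_ocr_errors(clean, rng), clean) for _ in range(3)]
for noisy, target in pairs:
    print(noisy, "->", target)
```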

Toward a Period-Specific Optimized Neural Network for OCR Error Correction of Historical Hebrew Texts

  • paper_url: http://arxiv.org/abs/2307.16213
  • repo_url: None
  • paper_authors: Omri Suissa, Maayan Zhitomirsky-Geffet, Avshalom Elmalech
  • for: corrected historical documents
  • methods: neural networks, OCR error correction
  • results: effective OCR post-correction in Hebrew
    Abstract Over the past few decades, large archives of paper-based historical documents, such as books and newspapers, have been digitized using the Optical Character Recognition (OCR) technology. Unfortunately, this broadly used technology is error-prone, especially when an OCRed document was written hundreds of years ago. Neural networks have shown great success in solving various text processing tasks, including OCR post-correction. The main disadvantage of using neural networks for historical corpora is the lack of sufficiently large training datasets they require to learn from, especially for morphologically-rich languages like Hebrew. Moreover, it is not clear what are the optimal structure and values of hyperparameters (predefined parameters) of neural networks for OCR error correction in Hebrew due to its unique features. Furthermore, languages change across genres and periods. These changes may affect the accuracy of OCR post-correction neural network models. To overcome these challenges, we developed a new multi-phase method for generating artificial training datasets with OCR errors and hyperparameters optimization for building an effective neural network for OCR post-correction in Hebrew.
    摘要 在过去几十年里,大量纸质历史文献(如书籍和报纸)已通过光学字符识别(OCR)技术完成数字化。遗憾的是,这项被广泛使用的技术容易出错,当被识别的文档写于数百年前时尤其如此。神经网络在包括OCR后校正在内的各类文本处理任务上已展现出巨大成功。将神经网络用于历史语料的主要缺点在于缺乏其学习所需的足够大的训练数据集,对于希伯来语这类形态丰富的语言尤甚。此外,由于希伯来语的独特特征,用于希伯来语OCR纠错的神经网络的最优结构和超参数取值尚不明确。而且,语言会随体裁和时代而变化,这些变化可能影响OCR后校正神经网络模型的准确性。为克服这些挑战,我们开发了一种新的多阶段方法,用于生成带有OCR错误的人工训练数据集并进行超参数优化,从而构建面向希伯来语OCR后校正的有效神经网络。

A Knowledge-enhanced Two-stage Generative Framework for Medical Dialogue Information Extraction

  • paper_url: http://arxiv.org/abs/2307.16200
  • repo_url: https://github.com/flyingcat-fa/ktgf
  • paper_authors: Zefa Hu, Ziyi Ni, Jing Shi, Shuang Xu, Bo Xu
  • For: 这篇论文关注医疗对话中的术语-状态对抽取(MD-TSPE),该任务是诊断对话系统和电子病历(EMR)自动记录的基础。近年来,尤其在生成式方法取得显著进展之后,MD-TSPE 受到越来越多的研究关注;但现有生成式方法在单一阶段输出由术语-状态对构成的完整序列,忽略了先验知识的融合,而建模术语间关系并推断每个术语的状态需要更深入的理解。* Methods: 论文提出知识增强的两阶段生成框架(KTGF):借助任务特定提示,用单个模型以统一的生成形式分两个阶段完成 MD-TSPE——先生成全部术语,再为每个已生成术语生成状态;第一阶段仅含术语的序列有助于更有效地学习术语间关系,第二阶段的知识增强提示则利用术语的类别和状态候选项辅助状态生成;此外,特殊状态"未提及"使更多术语可用,丰富了第二阶段的训练数据,这在低资源场景中尤为关键。* Results: 在 Chunyu 和 CMDD 数据集上,所提方法在全量训练和低资源设置下均优于现有最先进模型。
    Abstract This paper focuses on term-status pair extraction from medical dialogues (MD-TSPE), which is essential in diagnosis dialogue systems and the automatic scribe of electronic medical records (EMRs). In the past few years, works on MD-TSPE have attracted increasing research attention, especially after the remarkable progress made by generative methods. However, these generative methods output a whole sequence consisting of term-status pairs in one stage and ignore integrating prior knowledge, which demands a deeper understanding to model the relationship between terms and infer the status of each term. This paper presents a knowledge-enhanced two-stage generative framework (KTGF) to address the above challenges. Using task-specific prompts, we employ a single model to complete the MD-TSPE through two phases in a unified generative form: we generate all terms the first and then generate the status of each generated term. In this way, the relationship between terms can be learned more effectively from the sequence containing only terms in the first phase, and our designed knowledge-enhanced prompt in the second phase can leverage the category and status candidates of the generated term for status generation. Furthermore, our proposed special status ``not mentioned" makes more terms available and enriches the training data in the second phase, which is critical in the low-resource setting. The experiments on the Chunyu and CMDD datasets show that the proposed method achieves superior results compared to the state-of-the-art models in the full training and low-resource settings.
    摘要 本文关注医疗对话中的术语-状态对抽取(MD-TSPE),这一任务是诊断对话系统和电子病历(EMR)自动记录的基础。近年来,尤其是在生成式方法取得显著进展之后,MD-TSPE 的研究受到越来越多的关注。然而,这些生成式方法在单一阶段输出由术语-状态对组成的完整序列,并忽略了先验知识的融合,而建模术语之间的关系并推断每个术语的状态需要更深入的理解。为了应对上述挑战,本文提出了知识增强的两阶段生成框架(KTGF)。借助任务特定的提示,我们使用单个模型以统一的生成形式分两个阶段完成 MD-TSPE:先生成全部术语,再为每个已生成的术语生成状态。通过这种方式,第一阶段仅包含术语的序列能够更有效地学习术语之间的关系;第二阶段设计的知识增强提示可以利用已生成术语的类别和状态候选项进行状态生成。此外,我们提出的特殊状态"未提及"使更多术语可用,丰富了第二阶段的训练数据,这在低资源场景中至关重要。在 Chunyu 和 CMDD 数据集上的实验表明,所提方法在全量训练和低资源设置下均优于最先进模型。
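
The two-stage generation can be sketched as two prompt templates over the same model: the first asks for all terms mentioned in the dialogue, the second asks for the status of one generated term while injecting its category and the allowed status candidates (including "not mentioned"). The code below only shows that control flow; `generate()` is a placeholder for whatever seq2seq model is used, and the prompt wording is an assumption rather than KTGF's actual templates.

```python
STATUS_CANDIDATES = ["阳性", "阴性", "未提及"]  # positive / negative / not mentioned

def generate(prompt: str) -> str:
    """Placeholder for a seq2seq model call (e.g. a fine-tuned generator)."""
    # A real system would return model output here; we fake it for the demo.
    return "发热, 咳嗽" if "列出" in prompt else "阳性"

def extract_term_status_pairs(dialogue: str, term_category: dict) -> list:
    # Stage 1: generate the full term list from the dialogue alone.
    stage1_prompt = f"对话:{dialogue}\n请列出对话中提到的所有医学术语,用逗号分隔:"
    terms = [t.strip() for t in generate(stage1_prompt).split(",") if t.strip()]

    # Stage 2: for each term, generate its status with a knowledge-enhanced prompt
    # that exposes the term's category and the allowed status candidates.
    pairs = []
    for term in terms:
        category = term_category.get(term, "症状")
        stage2_prompt = (f"对话:{dialogue}\n术语:{term}(类别:{category})\n"
                         f"请从{STATUS_CANDIDATES}中选择该术语的状态:")
        pairs.append((term, generate(stage2_prompt)))
    return pairs

print(extract_term_status_pairs("患者自述发热两天,伴咳嗽。", {"发热": "症状", "咳嗽": "症状"}))
```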

Improving TTS for Shanghainese: Addressing Tone Sandhi via Word Segmentation

  • paper_url: http://arxiv.org/abs/2307.16199
  • repo_url: https://github.com/edward-martyr/shanghainese-tts
  • paper_authors: Yuanhao Chen
  • for: 这篇论文的目的是改善上海话 TTS 模型中的连读变调(tone sandhi)表现。
  • methods: 作者在文本预处理中引入分词,以增强 TTS 模型对连读变调的处理能力;同一个词内的音节用特殊符号标注,作为左重变调(left-dominant sandhi)作用域韵律信息的代理。
  • results: 作者发现,分词能够提高 TTS 模型生成连读变调的质量,从而更好地体现上海话的声调特征。这项工作有望成为将上海话的形式语言学研究引入计算项目的起点。
    Abstract Tone is a crucial component of the prosody of Shanghainese, a Wu Chinese variety spoken primarily in urban Shanghai. Tone sandhi, which applies to all multi-syllabic words in Shanghainese, then, is key to natural-sounding speech. Unfortunately, recent work on Shanghainese TTS (text-to-speech) such as Apple's VoiceOver has shown poor performance with tone sandhi, especially LD (left-dominant sandhi). Here I show that word segmentation during text preprocessing can improve the quality of tone sandhi production in TTS models. Syllables within the same word are annotated with a special symbol, which serves as a proxy for prosodic information of the domain of LD. Contrary to the common practice of using prosodic annotation mainly for static pauses, this paper demonstrates that prosodic annotation can also be applied to dynamic tonal phenomena. I anticipate this project to be a starting point for bringing formal linguistic accounts of Shanghainese into computational projects. Too long have we been using the Mandarin models to approximate Shanghainese, but it is a different language with its own linguistic features, and its digitisation and revitalisation should be treated as such.
    摘要 声调是上海话(一种主要通行于上海市区的吴语变体)韵律的关键组成部分。连读变调适用于上海话中所有的多音节词,因此是自然语音的关键。遗憾的是,近期的上海话语音合成(TTS)工作(如苹果的 VoiceOver)在连读变调上表现不佳,尤其是左重变调(LD)。本文表明,在文本预处理阶段进行分词可以提高 TTS 模型生成连读变调的质量:同一个词内的音节被标注上一个特殊符号,作为 LD 作用域韵律信息的代理。与通常仅将韵律标注用于静态停顿的做法不同,本文展示了韵律标注同样可以应用于动态的声调现象。我期望这一项目能够成为将上海话的形式语言学研究引入计算项目的起点。长期以来我们一直用普通话模型来近似上海话,但它是一门拥有自身语言特征的不同语言,其数字化与复兴也应当得到相应的对待。
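
Concretely, the preprocessing step amounts to segmenting the input into words and inserting a special symbol between syllables that belong to the same word, so that the TTS front end can see the domain over which left-dominant sandhi applies. The sketch below shows that annotation with a toy lexicon-based longest-match segmenter; the lexicon, the `+` symbol, and the romanised syllables are illustrative assumptions, not the project's actual resources.

```python
# Toy Shanghainese lexicon mapping words to syllables (romanisation is illustrative).
LEXICON = {
    "上海": ["zaon", "he"],
    "闲话": ["ghe", "gho"],
    "讲": ["kaon"],
}
WORD_INTERNAL = "+"   # special symbol marking "same word" = one LD sandhi domain

def segment(text: str) -> list:
    """Greedy longest-match word segmentation over the toy lexicon."""
    words, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):
            if text[i:j] in LEXICON:
                words.append(text[i:j]); i = j; break
        else:
            words.append(text[i]); i += 1  # unknown character becomes its own word
    return words

def annotate_sandhi_domains(text: str) -> str:
    """Join syllables inside a word with '+', separate words with spaces."""
    out = []
    for w in segment(text):
        syllables = LEXICON.get(w, [w])
        out.append(WORD_INTERNAL.join(syllables))
    return " ".join(out)

print(annotate_sandhi_domains("上海闲话讲"))  # zaon+he ghe+gho kaon
```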

Do LLMs Possess a Personality? Making the MBTI Test an Amazing Evaluation for Large Language Models

  • paper_url: http://arxiv.org/abs/2307.16180
  • repo_url: https://github.com/harderthenharder/transformers_tasks
  • paper_authors: Keyu Pan, Yawen Zeng
  • for: investigating the feasibility of using the Myers-Briggs Type Indicator (MBTI) as an evaluation metric for large language models (LLMs)
  • methods: extensive experiments to explore the personality types of different LLMs, the possibility of changing the personality types by prompt engineering, and the impact of training datasets on the model’s personality
  • results: the study aims to determine whether LLMs with human-like abilities possess human-like personalities, and whether the MBTI can serve as a rough indicator of this similarity
    Abstract The field of large language models (LLMs) has made significant progress, and their knowledge storage capacity is approaching that of human beings. Furthermore, advanced techniques, such as prompt learning and reinforcement learning, are being employed to address ethical concerns and hallucination problems associated with LLMs, bringing them closer to aligning with human values. This situation naturally raises the question of whether LLMs with human-like abilities possess a human-like personality? In this paper, we aim to investigate the feasibility of using the Myers-Briggs Type Indicator (MBTI), a widespread human personality assessment tool, as an evaluation metric for LLMs. Specifically, extensive experiments will be conducted to explore: 1) the personality types of different LLMs, 2) the possibility of changing the personality types by prompt engineering, and 3) How does the training dataset affect the model's personality. Although the MBTI is not a rigorous assessment, it can still reflect the similarity between LLMs and human personality. In practice, the MBTI has the potential to serve as a rough indicator. Our codes are available at https://github.com/HarderThenHarder/transformers_tasks/tree/main/LLM/llms_mbti.
    摘要 大型语言模型(LLM)领域已取得重大进展,其知识存储能力正接近人类水平。此外,提示学习和强化学习等先进技术也被用于应对LLM的伦理问题和幻觉问题,使其更贴近人类价值观。这一形势自然引出一个问题:具备类人能力的LLM是否也拥有类人的人格?本文旨在研究将迈尔斯-布里格斯类型指标(MBTI)这一广泛使用的人格测评工具作为LLM评估指标的可行性。具体而言,我们开展了大量实验,以探索:1)不同LLM的人格类型;2)能否通过提示工程改变其人格类型;3)训练数据集如何影响模型的人格。尽管MBTI并非严格的测评工具,它仍能反映LLM与人类人格之间的相似性;在实践中,MBTI有潜力充当一个粗略的指标。我们的代码见 https://github.com/HarderThenHarder/transformers_tasks/tree/main/LLM/llms_mbti。
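
Administering the MBTI to a model boils down to asking each questionnaire item, mapping the chosen option to one pole of a dimension (E/I, S/N, T/F, J/P), and reporting the majority pole per dimension. The snippet below shows that tallying; the two example items and the `ask_model` stub are invented for illustration and are not the questionnaire or prompts used in the paper.

```python
from collections import Counter

# Each item maps answer "A"/"B" to one pole of an MBTI dimension (toy examples).
ITEMS = [
    {"question": "At a party you usually...", "A": "E", "B": "I"},
    {"question": "You prefer decisions based on...", "A": "T", "B": "F"},
]
DIMENSIONS = [("E", "I"), ("S", "N"), ("T", "F"), ("J", "P")]

def ask_model(question: str) -> str:
    """Placeholder for prompting an LLM and parsing 'A' or 'B' from its reply."""
    return "A"

def mbti_type(items=ITEMS) -> str:
    votes = Counter()
    for item in items:
        choice = ask_model(item["question"])     # 'A' or 'B'
        votes[item[choice]] += 1
    # Pick the majority pole per dimension; default to the first pole on ties/no data.
    return "".join(a if votes[a] >= votes[b] else b for a, b in DIMENSIONS)

print(mbti_type())  # e.g. 'ESTJ' with the stubbed answers
```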

SEED-Bench: Benchmarking Multimodal LLMs with Generative Comprehension

  • paper_url: http://arxiv.org/abs/2307.16125
  • repo_url: https://github.com/ailab-cvc/seed-bench
  • paper_authors: Bohao Li, Rui Wang, Guangzhi Wang, Yuying Ge, Yixiao Ge, Ying Shan
  • for: 本研究旨在评估多模态大语言模型(MLLMs)的生成式理解能力,作为全面评估生成式模型的第一步,并为此提出了名为 SEED-Bench 的基准。
  • methods: 本研究构建了一条先进的生成管线,结合自动筛选与人工验证环节,用于生成覆盖12个评估维度(包括图像与视频模态理解)的多项选择题。
  • results: 本研究在全部12个维度上对18个模型进行了全面评估,揭示了现有MLLMs在不同维度上的表现与局限。这些结果可为未来研究提供指导,相应的排行榜也为社区提供了评估和研究模型能力的平台。
    Abstract Based on powerful Large Language Models (LLMs), recent generative Multimodal Large Language Models (MLLMs) have gained prominence as a pivotal research area, exhibiting remarkable capability for both comprehension and generation. In this work, we address the evaluation of generative comprehension in MLLMs as a preliminary step towards a comprehensive assessment of generative models, by introducing a benchmark named SEED-Bench. SEED-Bench consists of 19K multiple choice questions with accurate human annotations (x 6 larger than existing benchmarks), which spans 12 evaluation dimensions including the comprehension of both the image and video modality. We develop an advanced pipeline for generating multiple-choice questions that target specific evaluation dimensions, integrating both automatic filtering and manual verification processes. Multiple-choice questions with groundtruth options derived from human annotation enables an objective and efficient assessment of model performance, eliminating the need for human or GPT intervention during evaluation. We further evaluate the performance of 18 models across all 12 dimensions, covering both the spatial and temporal understanding. By revealing the limitations of existing MLLMs through evaluation results, we aim for SEED-Bench to provide insights for motivating future research. We will launch and consistently maintain a leaderboard to provide a platform for the community to assess and investigate model capability.
    摘要 基于强大的大型语言模型(LLM),近期的生成式多模态大型语言模型(MLLM)已成为一个重要的研究方向,在理解与生成两方面都展现出卓越能力。在这项工作中,我们将评估 MLLM 的生成式理解能力作为全面评估生成式模型的第一步,提出了名为 SEED-Bench 的基准。SEED-Bench 包含1.9万道带有精确人工标注的多项选择题(规模是现有基准的6倍),覆盖图像与视频模态理解在内的12个评估维度。我们开发了一条先进的管线来生成针对特定评估维度的多项选择题,结合了自动过滤与人工核验流程。带有源自人工标注的标准答案选项的多项选择题,使模型性能的评估客观且高效,评估过程中无需人工或 GPT 介入。我们进一步在全部12个维度上评估了18个模型,涵盖空间与时间理解。我们希望通过评估结果揭示现有 MLLM 的局限,为未来研究提供启发。我们将发布并持续维护一个排行榜,为社区提供评估和研究模型能力的平台。
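
With ground-truth options available, accuracy can be computed fully automatically, for example by scoring each candidate answer's likelihood under the model and taking the argmax. The snippet below shows that generic protocol with a stubbed scorer; it is one common way to run multiple-choice evaluation and is not necessarily SEED-Bench's exact procedure.

```python
def option_loglik(question: str, option: str) -> float:
    """Placeholder: return the model's log-likelihood of `option` given `question`
    (and the associated image or video frames in the multimodal case)."""
    return -float(len(option))   # stub: shorter options score higher

def evaluate(benchmark):
    """benchmark: list of dicts with 'question', 'options' (list) and 'answer' (index)."""
    correct = 0
    for item in benchmark:
        scores = [option_loglik(item["question"], o) for o in item["options"]]
        prediction = max(range(len(scores)), key=scores.__getitem__)
        correct += int(prediction == item["answer"])
    return correct / len(benchmark)

toy_bench = [
    {"question": "What is shown in the image?",
     "options": ["a cat", "a very long caption", "dog", "tree"], "answer": 2},
]
print(evaluate(toy_bench))   # 1.0 with the stub, because 'dog' is shortest
```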

Proposing a conceptual framework: social media listening for public health behavior

  • paper_url: http://arxiv.org/abs/2308.02037
  • repo_url: None
  • paper_authors: Shu-Feng Tsao, Helen Chen, Samantha Meyer, Zahid A. Butt
  • for: This study aims to propose a novel conceptual framework for misinformation research using social media data and natural language processing techniques, with a focus on understanding public discourse on social media and its impact on public health behavior.
  • methods: The study uses a literature review to analyze and critique existing theories and models used in COVID-19 related studies, and proposes a new conceptual framework that integrates important attributes of existing theories and adds new attributes. The proposed framework is demonstrated through a case study of the Freedom Convoy social media listening.
  • results: The proposed conceptual framework can be used to better understand public discourse on social media and its impact on public health behavior, and can be integrated with other data analyses to gather a more comprehensive picture. The framework is flexible and can be revised and adopted as health misinformation evolves.
    Abstract Existing communications and behavioral theories have been adopted to address health misinformation. Although various theories and models have been used to investigate the COVID-19 pandemic, there is no framework specially designed for social listening or misinformation studies using social media data and natural language processing techniques. This study aimed to propose a novel yet theory-based conceptual framework for misinformation research. We collected theories and models used in COVID-19 related studies published in peer-reviewed journals. The theories and models ranged from health behaviors, communications, to misinformation. They are analyzed and critiqued for their components, followed by proposing a conceptual framework with a demonstration. We reviewed Health Belief Model, Theory of Planned Behavior/Reasoned Action, Communication for Behavioral Impact, Transtheoretical Model, Uses and Gratifications Theory, Social Judgment Theory, Risk Information Seeking and Processing Model, Behavioral and Social Drivers, and Hype Loop. Accordingly, we proposed the Social Media Listening for Public Health Behavior Conceptual Framework by not only integrating important attributes of existing theories, but also adding new attributes. The proposed conceptual framework was demonstrated in the Freedom Convoy social media listening. The proposed conceptual framework can be used to better understand public discourse on social media, and it can be integrated with other data analyses to gather a more comprehensive picture. The framework will continue to be revised and adopted as health misinformation evolves.
    摘要 现有的传播学与行为学理论已被用于应对健康虚假信息。尽管多种理论和模型被用于研究COVID-19疫情,但目前尚无专门面向利用社交媒体数据和自然语言处理技术开展社交聆听或虚假信息研究的框架。本研究旨在提出一个新颖且以理论为基础的虚假信息研究概念框架。我们收集了发表在同行评审期刊上的COVID-19相关研究所使用的理论和模型,范围涵盖健康行为、传播学到虚假信息,并对其构成要素进行分析与评述,进而提出一个概念框架并加以示范。我们回顾了健康信念模型、计划行为/理性行为理论、行为影响传播(Communication for Behavioral Impact)、跨理论模型、使用与满足理论、社会判断理论、风险信息寻求与处理模型、行为与社会驱动因素以及 Hype Loop。在此基础上,我们提出了"面向公共卫生行为的社交媒体聆听"概念框架,它不仅整合了现有理论的重要属性,还加入了新的属性。我们以"自由车队"(Freedom Convoy)社交媒体聆听为例对该框架进行了示范。该框架可用于更好地理解社交媒体上的公共讨论,并可与其他数据分析相结合,以获得更全面的认识。随着健康虚假信息的演变,该框架也将持续修订和采用。

Roll Up Your Sleeves: Working with a Collaborative and Engaging Task-Oriented Dialogue System

  • paper_url: http://arxiv.org/abs/2307.16081
  • repo_url: None
  • paper_authors: Lingbo Mo, Shijie Chen, Ziru Chen, Xiang Deng, Ashley Lewis, Sunit Singh, Samuel Stevens, Chang-You Tai, Zhen Wang, Xiang Yue, Tianshu Zhang, Yu Su, Huan Sun
  • for: 论文的主要目标是构建一个以用户为中心的任务导向对话系统,帮助用户完成包含多个步骤的复杂现实任务。
  • methods: 系统由语言理解、对话管理和回复生成组件构成,并由一个强大的搜索引擎提供支持,以实现高效的任务协助;为增强对话体验,论文还探索了一系列利用LLM的数据增强策略,用于持续训练先进的神经模型。
  • results: 论文以参加首届 Alexa Prize TaskBot Challenge 并获得第三名的经历为基础,验证了 TacoBot 在烹饪和 how-to 类任务协助上的效果;此外,论文将 TacoBot 以开源框架的形式发布,作为部署任务导向对话系统的实践范例。
    Abstract We introduce TacoBot, a user-centered task-oriented digital assistant designed to guide users through complex real-world tasks with multiple steps. Covering a wide range of cooking and how-to tasks, we aim to deliver a collaborative and engaging dialogue experience. Equipped with language understanding, dialogue management, and response generation components supported by a robust search engine, TacoBot ensures efficient task assistance. To enhance the dialogue experience, we explore a series of data augmentation strategies using LLMs to train advanced neural models continuously. TacoBot builds upon our successful participation in the inaugural Alexa Prize TaskBot Challenge, where our team secured third place among ten competing teams. We offer TacoBot as an open-source framework that serves as a practical example for deploying task-oriented dialogue systems.
    摘要 我们介绍 TacoBot,一个以用户为中心的任务导向数字助手,旨在引导用户完成包含多个步骤的复杂现实任务。它覆盖广泛的烹饪和 how-to 任务,目标是提供协作且有吸引力的对话体验。TacoBot 配备了语言理解、对话管理和回复生成组件,并由一个强大的搜索引擎提供支持,从而确保高效的任务协助。为增强对话体验,我们探索了一系列利用LLM的数据增强策略,以持续训练先进的神经模型。TacoBot 建立在我们成功参加首届 Alexa Prize TaskBot Challenge 的基础之上——我们的团队在十支参赛队伍中获得第三名。我们将 TacoBot 作为开源框架发布,作为部署任务导向对话系统的实践范例。

ÌròyìnSpeech: A multi-purpose Yorùbá Speech Corpus

  • paper_url: http://arxiv.org/abs/2307.16071
  • repo_url: None
  • paper_authors: Tolulope Ogunremi, Kola Tubosun, Anuoluwapo Aremu, Iroro Orife, David Ifeoluwa Adelani
  • for: 提供一个高质量、可自由获取的现代约鲁巴语(Yorùbá)语音数据集
  • methods: 使用新闻和创作域的文本句子,并由多个说话者录音
  • results: 提供38.5小时的数据集,来自80名志愿者的录音
    Abstract We introduce the \`{I}r\`{o}y\`{i}nSpeech corpus -- a new dataset influenced by a desire to increase the amount of high quality, freely available, contemporary Yor\`{u}b\'{a} speech. We release a multi-purpose dataset that can be used for both TTS and ASR tasks. We curated text sentences from the news and creative writing domains under an open license i.e., CC-BY-4.0 and had multiple speakers record each sentence. We provide 5000 of our utterances to the Common Voice platform to crowdsource transcriptions online. The dataset has 38.5 hours of data in total, recorded by 80 volunteers.
    摘要 我们介绍 ÌròyìnSpeech 语料库——一个新的数据集,其建设源于希望增加高质量、可自由获取的现代约鲁巴语语音数据。我们发布的是一个多用途数据集,可同时用于 TTS 和 ASR 任务。我们在开放许可(CC-BY-4.0)下从新闻和创意写作领域挑选文本句子,并由多名说话人分别录制每个句子。我们向 Common Voice 平台提供了其中5000条语音,以便在线众包转写。数据集总计38.5小时,由80名志愿者录制。

Automatic Extraction of the Romanian Academic Word List: Data and Methods

  • paper_url: http://arxiv.org/abs/2307.16045
  • repo_url: https://github.com/bucuram/ro-awl
  • paper_authors: Ana-Maria Bucur, Andreea Dincă, Mădălina Chitez, Roxana Rogobete
  • for: 这篇论文介绍自动提取罗马尼亚学术词表(Ro-AWL)所用的方法和数据。
  • methods: 论文将语料库语言学与计算语言学的方法同二语(L2)学术写作的研究路径相结合,综合两类数据生成 Ro-AWL。
  • results: 研究者结合现有数据(基于 ROMBAC 语料库的罗马尼亚语词频表)与自建数据(专家学术写作语料库 EXPRES)生成了 Ro-AWL,其在四个学科数据集上的分布特征(总体分布、词性分布)与先前研究一致。
    Abstract This paper presents the methodology and data used for the automatic extraction of the Romanian Academic Word List (Ro-AWL). Academic Word Lists are useful in both L2 and L1 teaching contexts. For the Romanian language, no such resource exists so far. Ro-AWL has been generated by combining methods from corpus and computational linguistics with L2 academic writing approaches. We use two types of data: (a) existing data, such as the Romanian Frequency List based on the ROMBAC corpus, and (b) self-compiled data, such as the expert academic writing corpus EXPRES. For constructing the academic word list, we follow the methodology for building the Academic Vocabulary List for the English language. The distribution of Ro-AWL features (general distribution, POS distribution) into four disciplinary datasets is in line with previous research. Ro-AWL is freely available and can be used for teaching, research and NLP applications.
    摘要 本文介绍自动提取罗马尼亚学术词表(Ro-AWL)所用的方法和数据。学术词表在二语(L2)和母语(L1)教学情境中都十分有用,而罗马尼亚语目前尚无此类资源。Ro-AWL 是将语料库语言学、计算语言学方法与二语学术写作研究路径相结合生成的。我们使用两类数据:(a)现有数据,如基于 ROMBAC 语料库的罗马尼亚语词频表;(b)自建数据,如专家学术写作语料库 EXPRES。在构建学术词表时,我们遵循英语学术词汇表(Academic Vocabulary List)的构建方法。Ro-AWL 的分布特征(总体分布、词性分布)在四个学科数据集上与先前研究一致。Ro-AWL 可免费获取,可用于教学、研究和自然语言处理应用。
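
Following the Academic Vocabulary List methodology means selecting lemmas that are both frequent in academic prose and evenly dispersed across disciplines. The snippet below sketches one such filter using a simple range-and-frequency criterion over per-discipline counts; the thresholds and the toy counts are assumptions, not the Ro-AWL selection parameters.

```python
# Toy per-discipline frequencies (occurrences per million words) for candidate lemmas.
CANDIDATES = {
    "analiza":  {"humanities": 110, "engineering": 95, "medicine": 120, "economics": 100},
    "rezultat": {"humanities": 90,  "engineering": 130, "medicine": 140, "economics": 85},
    "pisica":   {"humanities": 20,  "engineering": 0,   "medicine": 1,   "economics": 0},
}
MIN_FREQ_PER_DISCIPLINE = 50     # assumed threshold: must occur in every discipline
MIN_TOTAL_FREQ = 300             # assumed overall frequency threshold

def academic_word_list(candidates):
    selected = []
    for lemma, freqs in candidates.items():
        total = sum(freqs.values())
        in_all = all(f >= MIN_FREQ_PER_DISCIPLINE for f in freqs.values())  # dispersion check
        if in_all and total >= MIN_TOTAL_FREQ:
            selected.append(lemma)
    return sorted(selected)

print(academic_word_list(CANDIDATES))   # ['analiza', 'rezultat']; 'pisica' (cat) is filtered out
```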

Okapi: Instruction-tuned Large Language Models in Multiple Languages with Reinforcement Learning from Human Feedback

  • paper_url: http://arxiv.org/abs/2307.16039
  • repo_url: https://github.com/nlp-uoregon/okapi
  • paper_authors: Viet Dac Lai, Chien Van Nguyen, Nghia Trung Ngo, Thuat Nguyen, Franck Dernoncourt, Ryan A. Rossi, Thien Huu Nguyen
  • for: The paper is written to explore instruction tuning for large language models (LLMs) in multiple languages, with a focus on reinforcement learning from human feedback (RLHF) as an alternative approach to supervised fine-tuning (SFT).
  • methods: The paper uses RLHF to instruction-tune LLMs for multiple languages, introducing instruction and response-ranked data in 26 diverse languages to facilitate the experiments.
  • results: The paper demonstrates the advantages of RLHF for multilingual instruction over SFT for different base models and datasets, and releases the framework and resources at https://github.com/nlp-uoregon/Okapi.
    Abstract A key technology for the development of large language models (LLMs) involves instruction tuning that helps align the models' responses with human expectations to realize impressive learning abilities. Two major approaches for instruction tuning characterize supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF), which are currently applied to produce the best commercial LLMs (e.g., ChatGPT). To improve the accessibility of LLMs for research and development efforts, various instruction-tuned open-source LLMs have also been introduced recently, e.g., Alpaca, Vicuna, to name a few. However, existing open-source LLMs have only been instruction-tuned for English and a few popular languages, thus hindering their impacts and accessibility to many other languages in the world. Among a few very recent work to explore instruction tuning for LLMs in multiple languages, SFT has been used as the only approach to instruction-tune LLMs for multiple languages. This has left a significant gap for fine-tuned LLMs based on RLHF in diverse languages and raised important questions on how RLHF can boost the performance of multilingual instruction tuning. To overcome this issue, we present Okapi, the first system with instruction-tuned LLMs based on RLHF for multiple languages. Okapi introduces instruction and response-ranked data in 26 diverse languages to facilitate the experiments and development of future multilingual LLM research. We also present benchmark datasets to enable the evaluation of generative LLMs in multiple languages. Our experiments demonstrate the advantages of RLHF for multilingual instruction over SFT for different base models and datasets. Our framework and resources are released at https://github.com/nlp-uoregon/Okapi.
    摘要 大型语言模型(LLM)发展的一项关键技术是指令调优,它帮助模型的回复与人类期望保持一致,从而实现出色的学习能力。指令调优的两大主要方法是有监督微调(SFT)和基于人类反馈的强化学习(RLHF),目前最优秀的商用LLM(如ChatGPT)正是由这些方法打造的。为提升LLM在研究和开发中的可及性,近来也出现了多种经过指令调优的开源LLM,例如 Alpaca、Vicuna 等。然而,现有的开源LLM仅针对英语和少数流行语言进行了指令调优,这限制了它们对世界上许多其他语言的影响力和可及性。在为数不多的探索多语言LLM指令调优的近期工作中,SFT是唯一被使用的方法,这为基于RLHF的多语言微调LLM留下了显著空白,也提出了RLHF如何提升多语言指令调优性能这一重要问题。为此,我们提出了 Okapi,第一个面向多语言、基于RLHF进行指令调优的LLM系统。Okapi 提供了26种不同语言的指令数据和回复排序数据,以支持未来多语言LLM研究的实验与发展。我们还提供了用于评估多语言生成式LLM的基准数据集。实验表明,在不同的基础模型和数据集上,RLHF在多语言指令调优方面均优于SFT。我们的框架和资源已发布于 https://github.com/nlp-uoregon/Okapi。
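
The response-ranked data released with Okapi is the kind of signal used to train a reward model for RLHF: for a prompt with a preferred and a dispreferred response, the reward model is trained so that the preferred one scores higher, typically with a pairwise logistic (Bradley-Terry style) loss. The NumPy sketch below computes that loss for toy scores; it is a generic illustration of the objective, not Okapi's training code.

```python
import numpy as np

def pairwise_ranking_loss(chosen_scores: np.ndarray, rejected_scores: np.ndarray) -> float:
    """Mean of -log sigmoid(r_chosen - r_rejected) over a batch of response pairs."""
    margin = chosen_scores - rejected_scores
    # -log(sigmoid(x)) written in a numerically friendly form: log(1 + exp(-x)).
    return float(np.mean(np.log1p(np.exp(-margin))))

# Toy reward-model outputs for three (chosen, rejected) response pairs.
chosen = np.array([1.2, 0.3, 2.0])
rejected = np.array([0.1, 0.5, -1.0])
print(pairwise_ranking_loss(chosen, rejected))   # small when chosen > rejected
# Minimising this loss pushes the reward model's scores apart; the trained reward
# model then supplies the scalar reward used by PPO-style RLHF fine-tuning.
```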

cs.LG - 2023-07-30

Efficient Federated Learning via Local Adaptive Amended Optimizer with Linear Speedup

  • paper_url: http://arxiv.org/abs/2308.00522
  • repo_url: None
  • paper_authors: Yan Sun, Li Shen, Hao Sun, Liang Ding, Dacheng Tao
  • for: 本文旨在提出一种基于动量的联邦学习算法,以解决分布式训练中收敛不稳定和客户端漂移的问题。
  • methods: 本文提出了联邦本地自适应修正优化器(FedLADA),它将全局梯度下降与本地自适应修正优化器相结合:利用上一通信轮次估计的全局平均偏移量,通过一个类动量项修正本地更新,从而提高实际训练速度并缓解各客户端上的异质过拟合。
  • results: 实验结果表明,FedLADA 能显著减少通信轮数,并比多个基线方法取得更高的准确率。
    Abstract Adaptive optimization has achieved notable success for distributed learning while extending adaptive optimizer to federated Learning (FL) suffers from severe inefficiency, including (i) rugged convergence due to inaccurate gradient estimation in global adaptive optimizer; (ii) client drifts exacerbated by local over-fitting with the local adaptive optimizer. In this work, we propose a novel momentum-based algorithm via utilizing the global gradient descent and locally adaptive amended optimizer to tackle these difficulties. Specifically, we incorporate a locally amended technique to the adaptive optimizer, named Federated Local ADaptive Amended optimizer (\textit{FedLADA}), which estimates the global average offset in the previous communication round and corrects the local offset through a momentum-like term to further improve the empirical training speed and mitigate the heterogeneous over-fitting. Theoretically, we establish the convergence rate of \textit{FedLADA} with a linear speedup property on the non-convex case under the partial participation settings. Moreover, we conduct extensive experiments on the real-world dataset to demonstrate the efficacy of our proposed \textit{FedLADA}, which could greatly reduce the communication rounds and achieves higher accuracy than several baselines.
    摘要 自适应优化在分布式学习中取得了显著成功,但将自适应优化器扩展到联邦学习(FL)却面临严重的低效问题,包括:(i)全局自适应优化器中梯度估计不准确导致的收敛不稳定;(ii)本地自适应优化器带来的局部过拟合加剧了客户端漂移。在这项工作中,我们提出了一种新的基于动量的算法,利用全局梯度下降与本地自适应修正优化器来解决上述困难。具体而言,我们在自适应优化器中引入本地修正技术,提出了联邦本地自适应修正优化器(FedLADA):它估计上一通信轮次的全局平均偏移量,并通过一个类动量项修正本地偏移,从而进一步提升实际训练速度并缓解异质过拟合。理论上,我们在部分参与设置下的非凸情形中建立了 FedLADA 的收敛速率,并证明其具有线性加速性质。此外,我们在真实数据集上开展了大量实验,验证了所提出的 FedLADA 的有效性:它能大幅减少通信轮数,并比多个基线方法取得更高的准确率。
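
The correction idea can be sketched as: each client runs a local adaptive (Adam-like) step, but blends in a momentum-like term built from the global average update of the previous communication round, so that local steps drift less from the global direction. The NumPy sketch below shows one such corrected local step; the mixing coefficient `lam` and the exact way the global offset enters are assumptions made for illustration, not the precise FedLADA update.

```python
import numpy as np

def local_amended_step(w, grad, state, global_offset, lr=0.01, lam=0.5,
                       beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam-like local step corrected toward the previous round's global direction.

    global_offset: average per-parameter update direction estimated in the last round
                   (e.g. (w_global_new - w_global_old) / local_steps), sent to clients.
    """
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad**2
    adaptive_step = state["m"] / (np.sqrt(state["v"]) + eps)
    # Momentum-like amendment: blend the local adaptive step with the global offset.
    return w - lr * ((1 - lam) * adaptive_step - lam * global_offset)

# Toy usage on a 3-parameter model for one client.
w = np.zeros(3)
state = {"m": np.zeros(3), "v": np.zeros(3)}
global_offset = np.array([0.1, -0.05, 0.0])   # estimated in the previous round
grad = np.array([0.3, -0.2, 0.1])
print(local_amended_step(w, grad, state, global_offset))
```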

DRL4Route: A Deep Reinforcement Learning Framework for Pick-up and Delivery Route Prediction

  • paper_url: http://arxiv.org/abs/2307.16246
  • repo_url: https://github.com/maoxiaowei97/drl4route
  • paper_authors: Xiaowei Mao, Haomin Wen, Hengrui Zhang, Huaiyu Wan, Lixia Wu, Jianbin Zheng, Haoyuan Hu, Youfang Lin
  • for: 预测劳务者的服务路线,提高快递服务质量和效率。
  • methods: 基于强化学习框架, combining 深度学习模型的行为学习能力和强化学习的非导数目标优化能力。
  • results: 在实际数据集上,对比既有方法,DRL4Route-GAE 提高了 Location Square Deviation (LSD) 和 Accuracy@3 (ACC@3) 的值,具体提高了 0.9%-2.7% 和 2.4%-3.2%。
    Abstract Pick-up and Delivery Route Prediction (PDRP), which aims to estimate the future service route of a worker given his current task pool, has received rising attention in recent years. Deep neural networks based on supervised learning have emerged as the dominant model for the task because of their powerful ability to capture workers' behavior patterns from massive historical data. Though promising, they fail to introduce the non-differentiable test criteria into the training process, leading to a mismatch in training and test criteria. Which considerably trims down their performance when applied in practical systems. To tackle the above issue, we present the first attempt to generalize Reinforcement Learning (RL) to the route prediction task, leading to a novel RL-based framework called DRL4Route. It combines the behavior-learning abilities of previous deep learning models with the non-differentiable objective optimization ability of reinforcement learning. DRL4Route can serve as a plug-and-play component to boost the existing deep learning models. Based on the framework, we further implement a model named DRL4Route-GAE for PDRP in logistic service. It follows the actor-critic architecture which is equipped with a Generalized Advantage Estimator that can balance the bias and variance of the policy gradient estimates, thus achieving a more optimal policy. Extensive offline experiments and the online deployment show that DRL4Route-GAE improves Location Square Deviation (LSD) by 0.9%-2.7%, and Accuracy@3 (ACC@3) by 2.4%-3.2% over existing methods on the real-world dataset.
    摘要 取送件路线预测(PDRP)旨在根据工作者当前的任务池估计其未来的服务路线,近年来受到越来越多的关注。基于有监督学习的深度神经网络能够从海量历史数据中捕捉工作者的行为模式,因而成为该任务的主流模型。这些方法虽有前景,却无法将不可微的测试指标引入训练过程,导致训练目标与测试指标不匹配,在实际系统中应用时性能大打折扣。为了解决上述问题,我们首次尝试将强化学习(RL)推广到路线预测任务,提出了一个新颖的基于RL的框架 DRL4Route。它将以往深度学习模型的行为学习能力与强化学习对不可微目标的优化能力相结合,并可作为即插即用组件提升现有深度学习模型。在该框架基础上,我们进一步实现了面向物流服务PDRP的模型 DRL4Route-GAE。它采用actor-critic架构,并配备能够在策略梯度估计的偏差与方差之间取得平衡的广义优势估计器(GAE),从而获得更优的策略。大量离线实验和在线部署表明,在真实数据集上,DRL4Route-GAE 相比现有方法将位置平方偏差(LSD)改进了0.9%-2.7%,将Accuracy@3(ACC@3)提升了2.4%-3.2%。
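
The Generalized Advantage Estimator referenced above trades off bias and variance of the policy-gradient signal by exponentially weighting k-step advantages: A_t = sum_l (gamma*lambda)^l * delta_{t+l} with delta_t = r_t + gamma*V(s_{t+1}) - V(s_t). The short function below computes it for one rollout; it is a generic GAE implementation, not the DRL4Route-GAE code, and in this setting the per-step reward would come from the (non-differentiable) route-prediction metric, e.g. an LSD-based score.

```python
def compute_gae(rewards, values, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation for a single finished rollout.

    rewards: list of length T; values: list of length T+1 (V(s_T) bootstraps the tail).
    Returns advantages A_0..A_{T-1}.
    """
    advantages, gae = [0.0] * len(rewards), 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]   # TD residual
        gae = delta + gamma * lam * gae                          # exponential weighting
        advantages[t] = gae
    return advantages

# Toy rollout: 4 predicted route steps with per-step rewards and critic values.
rewards = [0.0, 0.2, 0.1, 1.0]
values = [0.5, 0.55, 0.6, 0.7, 0.0]
print(compute_gae(rewards, values))
```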

Synaptic Plasticity Models and Bio-Inspired Unsupervised Deep Learning: A Survey

  • paper_url: http://arxiv.org/abs/2307.16236
  • repo_url: None
  • paper_authors: Gabriele Lagani, Fabrizio Falchi, Claudio Gennaro, Giuseppe Amato
  • for: 本文探讨了基于深度学习的新技术,以及它们在人工智能领域中的应用和挑战。
  • methods: 本文描述了一些基于生物机制的深度学习模型,包括synaptic plasticity模型和脉冲神经网络(SNNs)模型。
  • results: 本文总结了这些生物启发的深度学习模型在不同场景下的应用和效果,并指出了这些模型在人工智能领域的潜在发展前景。
    Abstract Recently emerged technologies based on Deep Learning (DL) achieved outstanding results on a variety of tasks in the field of Artificial Intelligence (AI). However, these encounter several challenges related to robustness to adversarial inputs, ecological impact, and the necessity of huge amounts of training data. In response, researchers are focusing more and more interest on biologically grounded mechanisms, which are appealing due to the impressive capabilities exhibited by biological brains. This survey explores a range of these biologically inspired models of synaptic plasticity, their application in DL scenarios, and the connections with models of plasticity in Spiking Neural Networks (SNNs). Overall, Bio-Inspired Deep Learning (BIDL) represents an exciting research direction, aiming at advancing not only our current technologies but also our understanding of intelligence.
    摘要 近年来兴起的基于深度学习(DL)的技术在人工智能(AI)领域的各类任务上取得了出色的成果。然而,这些技术也面临若干挑战,包括对对抗性输入的鲁棒性、生态影响,以及对海量训练数据的依赖。为此,研究者们正将越来越多的注意力投向具有生物学基础的机制——生物大脑所展现的出色能力使这类机制颇具吸引力。本综述探讨了一系列受生物启发的突触可塑性模型、它们在深度学习场景中的应用,以及它们与脉冲神经网络(SNN)中可塑性模型的联系。总体而言,受生物启发的深度学习(BIDL)是一个令人振奋的研究方向,其目标不仅是推进现有技术,也包括加深我们对智能本身的理解。

Spiking Neural Networks and Bio-Inspired Supervised Deep Learning: A Survey

  • paper_url: http://arxiv.org/abs/2307.16235
  • repo_url: None
  • paper_authors: Gabriele Lagani, Fabrizio Falchi, Claudio Gennaro, Giuseppe Amato
  • for: This survey provides a comprehensive review of recent biologically-inspired approaches for Artificial Intelligence (AI) technologies, with a focus on Spiking Neural Network (SNN) models and bio-inspired training methods.
  • methods: The survey discusses SNN models and their challenges, as well as bio-inspired training methods that pose alternatives to traditional backprop-based optimization. These methods aim to advance the computational capabilities and biological plausibility of current models.
  • results: The survey provides a thorough presentation of recent biologically-inspired approaches for AI, including SNN models and bio-inspired training methods. These approaches aim to improve the computational capabilities and biological plausibility of current AI models.
    Abstract For a long time, biology and neuroscience fields have been a great source of inspiration for computer scientists, towards the development of Artificial Intelligence (AI) technologies. This survey aims at providing a comprehensive review of recent biologically-inspired approaches for AI. After introducing the main principles of computation and synaptic plasticity in biological neurons, we provide a thorough presentation of Spiking Neural Network (SNN) models, and we highlight the main challenges related to SNN training, where traditional backprop-based optimization is not directly applicable. Therefore, we discuss recent bio-inspired training methods, which pose themselves as alternatives to backprop, both for traditional and spiking networks. Bio-Inspired Deep Learning (BIDL) approaches towards advancing the computational capabilities and biological plausibility of current models.
    摘要 长期以来,生物学和神经科学领域一直是计算机科学家的重要灵感来源,推动着人工智能(AI)技术的发展。本综述旨在对近期受生物启发的 AI 方法进行全面回顾。在介绍生物神经元中计算与突触可塑性的基本原理之后,我们对脉冲神经网络(SNN)模型进行了系统阐述,并着重讨论了 SNN 训练面临的主要挑战——传统的基于反向传播的优化方法在此并不直接适用。因此,我们讨论了近期受生物启发的训练方法,它们为传统网络和脉冲网络提供了反向传播之外的替代方案。受生物启发的深度学习(BIDL)致力于提升现有模型的计算能力与生物学合理性。

Robust Electric Vehicle Balancing of Autonomous Mobility-On-Demand System: A Multi-Agent Reinforcement Learning Approach

  • paper_url: http://arxiv.org/abs/2307.16228
  • repo_url: None
  • paper_authors: Sihong He, Shuo Han, Fei Miao
  • for: The paper is written for electric autonomous vehicles (EAVs) in future autonomous mobility-on-demand (AMoD) systems, with the goal of designing an integrated vehicle balancing solution that can handle supply and demand uncertainties.
  • methods: The paper uses multi-agent reinforcement learning (MARL) to model both the EAVs supply and mobility demand uncertainties, and proposes a robust E-AMoD Balancing MARL (REBAMA) algorithm to train a robust EAVs balancing policy that can balance both the supply-demand ratio and charging utilization rate across the whole city.
  • results: The proposed robust method improves the reward, charging utilization fairness, and supply-demand fairness compared to a non-robust MARL method and a robust optimization-based method. Specifically, the proposed method improves the reward by 19.28%, charging utilization fairness by 28.18%, and supply-demand fairness by 3.97%, compared to the non-robust MARL method. Compared to the robust optimization-based method, the proposed MARL algorithm improves the reward by 8.21%, charging utilization fairness by 8.29%, and supply-demand fairness by 9.42%.
    Abstract Electric autonomous vehicles (EAVs) are getting attention in future autonomous mobility-on-demand (AMoD) systems due to their economic and societal benefits. However, EAVs' unique charging patterns (long charging time, high charging frequency, unpredictable charging behaviors, etc.) make it challenging to accurately predict the EAVs supply in E-AMoD systems. Furthermore, the mobility demand's prediction uncertainty makes it an urgent and challenging task to design an integrated vehicle balancing solution under supply and demand uncertainties. Despite the success of reinforcement learning-based E-AMoD balancing algorithms, state uncertainties under the EV supply or mobility demand remain unexplored. In this work, we design a multi-agent reinforcement learning (MARL)-based framework for EAVs balancing in E-AMoD systems, with adversarial agents to model both the EAVs supply and mobility demand uncertainties that may undermine the vehicle balancing solutions. We then propose a robust E-AMoD Balancing MARL (REBAMA) algorithm to train a robust EAVs balancing policy to balance both the supply-demand ratio and charging utilization rate across the whole city. Experiments show that our proposed robust method performs better compared with a non-robust MARL method that does not consider state uncertainties; it improves the reward, charging utilization fairness, and supply-demand fairness by 19.28%, 28.18%, and 3.97%, respectively. Compared with a robust optimization-based method, the proposed MARL algorithm can improve the reward, charging utilization fairness, and supply-demand fairness by 8.21%, 8.29%, and 9.42%, respectively.
    摘要 电动自动驾驶车辆(EAV)因其经济和社会效益,正受到未来自动按需出行(AMoD)系统的关注。然而,EAV 独特的充电模式(充电时间长、充电频率高、充电行为难以预测等)使得准确预测 E-AMoD 系统中的 EAV 供给十分困难。此外,出行需求预测的不确定性也使得在供需双重不确定条件下设计一体化的车辆调度均衡方案成为一项紧迫且具有挑战性的任务。尽管基于强化学习的 E-AMoD 均衡算法已取得成功,但 EV 供给或出行需求下的状态不确定性仍未被探讨。在这项工作中,我们为 E-AMoD 系统中的 EAV 均衡设计了一个基于多智能体强化学习(MARL)的框架,引入对抗智能体来建模可能破坏车辆均衡方案的 EAV 供给与出行需求不确定性。我们进而提出了鲁棒 E-AMoD 均衡 MARL(REBAMA)算法,用于训练能够在全市范围内同时平衡供需比和充电利用率的鲁棒 EAV 均衡策略。实验表明,与不考虑状态不确定性的非鲁棒 MARL 方法相比,所提出的鲁棒方法将奖励、充电利用公平性和供需公平性分别提升了19.28%、28.18%和3.97%;与基于鲁棒优化的方法相比,所提出的 MARL 算法将奖励、充电利用公平性和供需公平性分别提升了8.21%、8.29%和9.42%。

Optimizing the Neural Network Training for OCR Error Correction of Historical Hebrew Texts

  • paper_url: http://arxiv.org/abs/2307.16220
  • repo_url: https://github.com/smartinternz02/SI-GuidedProject-2307-1622049182
  • paper_authors: Omri Suissa, Avshalom Elmalech, Maayan Zhitomirsky-Geffet
  • For: This paper aims to improve the accuracy of Optical Character Recognition (OCR) post-correction for historical documents by developing a method for automatically generating language and task-specific training data.* Methods: The proposed method uses a light-weight neural network and significantly less manually created data to correct OCR errors in Hebrew newspapers. The method is based on natural language analysis and machine learning techniques such as neural networks.* Results: The proposed method outperforms other state-of-the-art neural networks and complex spellcheckers for OCR post-correction, and the performance of the neural network depends on the genre and area of the training data.
    Abstract Over the past few decades, large archives of paper-based documents such as books and newspapers have been digitized using Optical Character Recognition. This technology is error-prone, especially for historical documents. To correct OCR errors, post-processing algorithms have been proposed based on natural language analysis and machine learning techniques such as neural networks. Neural network's disadvantage is the vast amount of manually labeled data required for training, which is often unavailable. This paper proposes an innovative method for training a light-weight neural network for Hebrew OCR post-correction using significantly less manually created data. The main research goal is to develop a method for automatically generating language and task-specific training data to improve the neural network results for OCR post-correction, and to investigate which type of dataset is the most effective for OCR post-correction of historical documents. To this end, a series of experiments using several datasets was conducted. The evaluation corpus was based on Hebrew newspapers from the JPress project. An analysis of historical OCRed newspapers was done to learn common language and corpus-specific OCR errors. We found that training the network using the proposed method is more effective than using randomly generated errors. The results also show that the performance of the neural network for OCR post-correction strongly depends on the genre and area of the training data. Moreover, neural networks that were trained with the proposed method outperform other state-of-the-art neural networks for OCR post-correction and complex spellcheckers. These results may have practical implications for many digital humanities projects.
    摘要 在过去几十年里,大量纸质文档(如书籍和报纸)已通过光学字符识别(OCR)完成数字化。这项技术容易出错,对历史文档尤其如此。为了纠正OCR错误,人们提出了基于自然语言分析和神经网络等机器学习技术的后处理算法。神经网络的缺点在于训练需要大量人工标注数据,而这类数据往往难以获得。本文提出了一种创新方法,用显著更少的人工数据训练一个轻量级神经网络,用于希伯来文OCR后校正。研究的主要目标是开发一种自动生成面向特定语言和任务的训练数据的方法,以提升神经网络在OCR后校正上的效果,并研究哪种类型的数据集对历史文档的OCR后校正最为有效。为此,我们使用多个数据集开展了一系列实验。评估语料基于JPress项目的希伯来文报纸。我们对历史OCR报纸进行了分析,以总结常见的语言及语料特有的OCR错误。我们发现,使用所提方法训练网络比使用随机生成的错误更有效。结果还表明,神经网络在OCR后校正上的性能强烈依赖于训练数据的体裁和领域。此外,用所提方法训练的神经网络优于其他最先进的OCR后校正神经网络和复杂的拼写检查器。这些结果可能对许多数字人文项目具有实际意义。

Improving Probabilistic Bisimulation for MDPs Using Machine Learning

  • paper_url: http://arxiv.org/abs/2308.02519
  • repo_url: None
  • paper_authors: Mohammadsadegh Mohaghegh, Khayyam Salehi
  • for: 本文旨在缓解将模型检测等形式化验证技术应用于复杂系统分析时遇到的状态空间爆炸问题。
  • methods: 本文使用互模拟(bisimulation)最小化来减少带标签迁移系统的状态数;对于表现出随机行为的系统,采用概率互模拟将给定模型最小化为状态更少的等价形式。
  • results: 本文提出了一种新方法,利用给定模型的 PRISM 程序构造若干小规模版本来训练分类器,再用机器学习分类技术近似划分模型的状态空间。实验结果表明,与现有工具相比,该方法能显著减少运行时间。
    Abstract The utilization of model checking has been suggested as a formal verification technique for analyzing critical systems. However, the primary challenge in applying to complex systems is state space explosion problem. To address this issue, bisimulation minimization has emerged as a prominent method for reducing the number of states in a labeled transition system, aiming to overcome the difficulties associated with the state space explosion problem. In the case of systems exhibiting stochastic behaviors, probabilistic bisimulation is employed to minimize a given model, obtaining its equivalent form with fewer states. Recently, various techniques have been introduced to decrease the time complexity of the iterative methods used to compute probabilistic bisimulation for stochastic systems that display nondeterministic behaviors. In this paper, we propose a new technique to partition the state space of a given probabilistic model to its bisimulation classes. This technique uses the PRISM program of a given model and constructs some small versions of the model to train a classifier. It then applies machine learning classification techniques to approximate the related partition. The resulting partition is used as an initial one for the standard bisimulation technique in order to reduce the running time of the method. The experimental results show that the approach can decrease significantly the running time compared to state-of-the-art tools.
    摘要 模型检测被认为是一种可用于分析关键系统的形式化验证技术。然而,将其应用于复杂系统的主要挑战在于状态空间爆炸问题。为了解决这一问题,互模拟最小化已成为减少带标签迁移系统状态数的重要方法,旨在克服状态空间爆炸带来的困难。对于表现出随机行为的系统,则采用概率互模拟对给定模型进行最小化,从而得到状态更少的等价形式。近来,人们提出了多种技术来降低计算具有非确定性行为的随机系统概率互模拟所用迭代方法的时间复杂度。在本文中,我们提出了一种将给定概率模型的状态空间划分为其互模拟等价类的新技术。该技术利用给定模型的 PRISM 程序构造若干小规模版本来训练分类器,然后应用机器学习分类技术来近似相应的划分。所得到的划分被用作标准互模拟技术的初始划分,以减少该方法的运行时间。实验结果表明,与最先进的工具相比,该方法能显著减少运行时间。
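
The pipeline sketched in the abstract can be pictured as: train a classifier on the exact bisimulation classes of some small models, use it to predict an initial block for every state of the large model, and then run ordinary signature-based partition refinement until the predicted partition stabilises. The code below illustrates only the refinement phase on a toy labelled transition system with an (assumed) predicted initial partition; it simplifies away the probabilistic signatures and is not tied to PRISM's data structures.

```python
def refine(states, transitions, initial_partition):
    """Signature-based partition refinement (non-probabilistic simplification).

    transitions: dict state -> list of (action, target_state)
    initial_partition: dict state -> block id, e.g. predicted by a trained classifier.
    Returns a stable partition (dict state -> block id).
    """
    partition = dict(initial_partition)
    while True:
        # A state's signature: the set of (action, block of target) pairs it can reach.
        signatures = {
            s: frozenset((a, partition[t]) for a, t in transitions.get(s, []))
            for s in states
        }
        # Split blocks so that states in one block share (old block, signature).
        keys = {s: (partition[s], signatures[s]) for s in states}
        new_ids = {key: i for i, key in enumerate(sorted(set(keys.values()), key=repr))}
        new_partition = {s: new_ids[keys[s]] for s in states}
        if len(set(new_partition.values())) == len(set(partition.values())):
            return new_partition   # no block was split: partition is stable
        partition = new_partition

# Toy LTS; the initial partition stands in for the ML classifier's prediction.
states = ["s0", "s1", "s2", "s3"]
transitions = {"s0": [("a", "s2")], "s1": [("a", "s3")], "s2": [("b", "s2")], "s3": []}
predicted = {"s0": 0, "s1": 0, "s2": 1, "s3": 1}
print(refine(states, transitions, predicted))
```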

Text Analysis Using Deep Neural Networks in Digital Humanities and Information Science

  • paper_url: http://arxiv.org/abs/2307.16217
  • repo_url: None
  • paper_authors: Omri Suissa, Avshalom Elmalech, Maayan Zhitomirsky-Geffet
  • for: 本研究的目的是探讨如何在人文科技领域中使用深度神经网络(DNN)来自动分析文本资源,以便为人文科学研究(DH)提供更多的可靠的数据分析方法。
  • methods: 本研究使用了多个DNN模型来解决各种NLP任务,包括拼写检查、语言检测、实体提取、作者检测、问答等任务。这些模型通过从大量“正确”和“错误”示例中学习模式,并将其应用于新的示例。
  • results: 本研究通过分析近期文献中多个DH研究的实践,探讨了在DH研究中使用DNN模型的两大挑战:训练数据的可获得性和领域适应,并讨论了可能的解决方案。此外,本研究还提出了一个实用的决策模型,以帮助DH专家选择合适的深度学习方法。
    Abstract Combining computational technologies and humanities is an ongoing effort aimed at making resources such as texts, images, audio, video, and other artifacts digitally available, searchable, and analyzable. In recent years, deep neural networks (DNN) dominate the field of automatic text analysis and natural language processing (NLP), in some cases presenting a super-human performance. DNNs are the state-of-the-art machine learning algorithms solving many NLP tasks that are relevant for Digital Humanities (DH) research, such as spell checking, language detection, entity extraction, author detection, question answering, and other tasks. These supervised algorithms learn patterns from a large number of "right" and "wrong" examples and apply them to new examples. However, using DNNs for analyzing the text resources in DH research presents two main challenges: (un)availability of training data and a need for domain adaptation. This paper explores these challenges by analyzing multiple use-cases of DH studies in recent literature and their possible solutions and lays out a practical decision model for DH experts for when and how to choose the appropriate deep learning approaches for their research. Moreover, in this paper, we aim to raise awareness of the benefits of utilizing deep learning models in the DH community.
    摘要 将计算技术与人文学科相结合是一项持续的努力,旨在使文本、图像、音频、视频等资源以数字形式可获取、可检索、可分析。近年来,深度神经网络(DNN)主导了自动文本分析和自然语言处理(NLP)领域,在某些场景下甚至表现出超越人类的性能。DNN是当前最先进的机器学习算法,能够解决许多与数字人文(DH)研究相关的NLP任务,如拼写检查、语言检测、实体抽取、作者识别、问答等。这些有监督算法从大量"正确"与"错误"示例中学习模式,再将其应用于新示例。然而,在DH研究中使用DNN分析文本资源面临两大挑战:训练数据的(不)可获得性以及领域适应的需要。本文通过分析近期文献中多项DH研究的用例及其可能的解决方案来探讨这些挑战,并为DH专家给出一个实用的决策模型,帮助其判断何时以及如何为自己的研究选择合适的深度学习方法。此外,本文也希望提高DH社区对使用深度学习模型所带来收益的认识。

Question Answering with Deep Neural Networks for Semi-Structured Heterogeneous Genealogical Knowledge Graphs

  • paper_url: http://arxiv.org/abs/2307.16214
  • repo_url: https://github.com/omrivm/uncle-bert
  • paper_authors: Omri Suissa, Maayan Zhitomirsky-Geffet, Avshalom Elmalech
  • for: 这个研究旨在开发一种基于家谱树的问答系统,以便为家谱研究提供更加精准的问答功能。
  • methods: 该研究使用了转换器模型,将家谱数据转换为知识图,然后与文本结合,并使用自动生成的家谱数据进行训练。
  • results: 研究发现,与开放领域问答模型相比,针对家谱问答训练的模型精度更高、复杂度更低。此外,该方法对家谱研究和实际项目可能具有实际意义,使家谱数据对专家和普通公众都更易获取。
    Abstract With the rising popularity of user-generated genealogical family trees, new genealogical information systems have been developed. State-of-the-art natural question answering algorithms use deep neural network (DNN) architecture based on self-attention networks. However, some of these models use sequence-based inputs and are not suitable to work with graph-based structure, while graph-based DNN models rely on high levels of comprehensiveness of knowledge graphs that is nonexistent in the genealogical domain. Moreover, these supervised DNN models require training datasets that are absent in the genealogical domain. This study proposes an end-to-end approach for question answering using genealogical family trees by: 1) representing genealogical data as knowledge graphs, 2) converting them to texts, 3) combining them with unstructured texts, and 4) training a trans-former-based question answering model. To evaluate the need for a dedicated approach, a comparison between the fine-tuned model (Uncle-BERT) trained on the auto-generated genealogical dataset and state-of-the-art question-answering models was per-formed. The findings indicate that there are significant differences between answering genealogical questions and open-domain questions. Moreover, the proposed methodology reduces complexity while increasing accuracy and may have practical implications for genealogical research and real-world projects, making genealogical data accessible to experts as well as the general public.
    摘要 随着用户生成的家谱树日益流行,新的家谱信息系统也随之出现。最先进的自然语言问答算法采用基于自注意力网络的深度神经网络(DNN)架构。然而,其中一些模型使用基于序列的输入,不适合处理图结构数据;而基于图的DNN模型又依赖高度完备的知识图谱,这在家谱领域并不存在。此外,这些有监督DNN模型所需的训练数据在家谱领域同样缺乏。本研究提出了一种端到端的方法,利用家谱树进行问答:1)将家谱数据表示为知识图谱;2)将其转换为文本;3)与非结构化文本相结合;4)训练一个基于Transformer的问答模型。为评估是否需要专门的方法,我们将基于自动生成的家谱数据集微调的模型(Uncle-BERT)与最先进的问答模型进行了比较。结果表明,回答家谱问题与回答开放领域问题之间存在显著差异。此外,所提方法在降低复杂度的同时提高了准确率,可能对家谱研究和实际项目具有现实意义,使家谱数据对专家和普通公众都更易获取。

Toward a Period-Specific Optimized Neural Network for OCR Error Correction of Historical Hebrew Texts

  • paper_url: http://arxiv.org/abs/2307.16213
  • repo_url: None
  • paper_authors: Omri Suissa, Maayan Zhitomirsky-Geffet, Avshalom Elmalech
  • for: 为了提高希伯来文件中的Optical Character Recognition(OCR)识别精度,提供一种多阶段方法。
  • methods: 使用神经网络进行OCR识别错误修复,并且通过人工生成的训练数据集和优化超参数来提高模型的性能。
  • results: 通过实验表明,该方法可以提高希伯来文件中OCR识别精度,并且可以适应不同的语言风格和时期变化。
    Abstract Over the past few decades, large archives of paper-based historical documents, such as books and newspapers, have been digitized using the Optical Character Recognition (OCR) technology. Unfortunately, this broadly used technology is error-prone, especially when an OCRed document was written hundreds of years ago. Neural networks have shown great success in solving various text processing tasks, including OCR post-correction. The main disadvantage of using neural networks for historical corpora is the lack of sufficiently large training datasets they require to learn from, especially for morphologically-rich languages like Hebrew. Moreover, it is not clear what are the optimal structure and values of hyperparameters (predefined parameters) of neural networks for OCR error correction in Hebrew due to its unique features. Furthermore, languages change across genres and periods. These changes may affect the accuracy of OCR post-correction neural network models. To overcome these challenges, we developed a new multi-phase method for generating artificial training datasets with OCR errors and hyperparameters optimization for building an effective neural network for OCR post-correction in Hebrew.
    摘要 在过去几十年里,大量纸质历史文献(如书籍和报纸)已通过光学字符识别(OCR)技术完成数字化。遗憾的是,这项被广泛使用的技术容易出错,当被识别的文档写于数百年前时尤其如此。神经网络在包括OCR后校正在内的各类文本处理任务上已展现出巨大成功。将神经网络用于历史语料的主要缺点在于缺乏其学习所需的足够大的训练数据集,对于希伯来语这类形态丰富的语言尤甚。此外,由于希伯来语的独特特征,用于希伯来语OCR纠错的神经网络的最优结构和超参数取值尚不明确。而且,语言会随体裁和时代而变化,这些变化可能影响OCR后校正神经网络模型的准确性。为克服这些挑战,我们开发了一种新的多阶段方法,用于生成带有OCR错误的人工训练数据集并进行超参数优化,从而构建面向希伯来语OCR后校正的有效神经网络。

Robust Multi-Agent Reinforcement Learning with State Uncertainty

  • paper_url: http://arxiv.org/abs/2307.16212
  • repo_url: https://github.com/sihongho/robust_marl_with_state_uncertainty
  • paper_authors: Sihong He, Songyang Han, Sanbao Su, Shuo Han, Shaofeng Zou, Fei Miao
  • for: 本研究旨在解决多代理人学习(MARL)中存在状态不确定性(state uncertainty)的问题,提高agent的稳定性和可靠性。
  • methods: 本研究使用Markov Game with state perturbation adversaries(MG-SPA)模型,并提出了一种robust equilibrium(RE)作为解题方法。然后,提出了一种robust multi-agent Q-learning(RMAQ)算法来实现RE,并在高维状态动作空间中提出了一种robust multi-agent actor-critic(RMAAC)算法。
  • results: 实验结果表明,提出的RMAQ算法能够 converges to the optimal value function,而RMAAC算法在多个多代理人环境中比较多个MARL和robust MARL方法表现更好,特别是在状态不确定性存在时。
    Abstract In real-world multi-agent reinforcement learning (MARL) applications, agents may not have perfect state information (e.g., due to inaccurate measurement or malicious attacks), which challenges the robustness of agents' policies. Though robustness is getting important in MARL deployment, little prior work has studied state uncertainties in MARL, neither in problem formulation nor algorithm design. Motivated by this robustness issue and the lack of corresponding studies, we study the problem of MARL with state uncertainty in this work. We provide the first attempt to the theoretical and empirical analysis of this challenging problem. We first model the problem as a Markov Game with state perturbation adversaries (MG-SPA) by introducing a set of state perturbation adversaries into a Markov Game. We then introduce robust equilibrium (RE) as the solution concept of an MG-SPA. We conduct a fundamental analysis regarding MG-SPA such as giving conditions under which such a robust equilibrium exists. Then we propose a robust multi-agent Q-learning (RMAQ) algorithm to find such an equilibrium, with convergence guarantees. To handle high-dimensional state-action space, we design a robust multi-agent actor-critic (RMAAC) algorithm based on an analytical expression of the policy gradient derived in the paper. Our experiments show that the proposed RMAQ algorithm converges to the optimal value function; our RMAAC algorithm outperforms several MARL and robust MARL methods in multiple multi-agent environments when state uncertainty is present. The source code is public on \url{https://github.com/sihongho/robust_marl_with_state_uncertainty}.
    摘要 在实际的多智能体强化学习(MARL)应用中,智能体可能无法获得完整准确的状态信息(例如由于测量误差或恶意攻击),这对智能体策略的鲁棒性提出了挑战。尽管鲁棒性在MARL部署中越来越重要,但此前几乎没有工作在问题建模或算法设计层面研究MARL中的状态不确定性。受此启发,我们在本文中研究带状态不确定性的MARL问题,并首次对这一难题进行了理论与实证分析。我们首先通过在马尔可夫博弈中引入一组状态扰动对手,将问题建模为带状态扰动对手的马尔可夫博弈(MG-SPA),并提出鲁棒均衡(RE)作为MG-SPA的解概念。我们对MG-SPA进行了基础性分析,例如给出了鲁棒均衡存在的条件。随后,我们提出了具有收敛保证的鲁棒多智能体Q学习(RMAQ)算法来求解该均衡。为了处理高维状态-动作空间,我们基于文中推导的策略梯度解析表达式,设计了鲁棒多智能体actor-critic(RMAAC)算法。实验表明,所提出的RMAQ算法收敛到最优值函数;在存在状态不确定性的多个多智能体环境中,RMAAC算法优于多种MARL及鲁棒MARL方法。源代码公开于 https://github.com/sihongho/robust_marl_with_state_uncertainty 。
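
The following is a heavily simplified, single-agent illustration of the worst-case state-perturbation idea behind MG-SPA: an adversary shows the agent the most damaging observation within a small perturbation set, and Q-learning bootstraps against that worst case. The chain MDP, perturbation budget, and hyperparameters are assumptions; this is not the paper's RMAQ algorithm.

```python
import numpy as np

# Tiny chain MDP: states 0..4, action 0 moves left, action 1 moves right; reward 1 at the last state.
n_s, n_a, gamma, alpha = 5, 2, 0.9, 0.1
rng = np.random.default_rng(0)

def step(s, a):
    s2 = min(s + 1, n_s - 1) if a == 1 else max(s - 1, 0)
    return s2, float(s2 == n_s - 1)

def perturb_set(s):
    # Assumed perturbation budget: the adversary may shift the observed state by at most 1.
    return [p for p in (s - 1, s, s + 1) if 0 <= p < n_s]

Q = np.zeros((n_s, n_a))
for episode in range(2000):
    s = 0
    for t in range(20):
        # Adversary reveals the observation that minimizes the agent's greedy value.
        obs = min(perturb_set(s), key=lambda p: Q[p].max())
        a = int(rng.integers(n_a)) if rng.random() < 0.1 else int(Q[obs].argmax())
        s2, r = step(s, a)
        obs2 = min(perturb_set(s2), key=lambda p: Q[p].max())  # worst-case next observation
        Q[obs, a] += alpha * (r + gamma * Q[obs2].max() - Q[obs, a])
        s = s2
        if r > 0:
            break

print(np.round(Q, 2))  # learned action values under worst-case observation perturbations
```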

Rethinking Uncertainly Missing and Ambiguous Visual Modality in Multi-Modal Entity Alignment

  • paper_url: http://arxiv.org/abs/2307.16210
  • repo_url: https://github.com/zjukg/UMAEA
  • paper_authors: Zhuo Chen, Lingbing Guo, Yin Fang, Yichi Zhang, Jiaoyan Chen, Jeff Z. Pan, Yangning Li, Huajun Chen, Wen Zhang
  • for: 本文主要目标是提出一种robust多模态实体对应方法,以解决在多个知识图(KG)中存在不完整的视觉模态的问题。
  • methods: 本文使用了最新的MMEA模型,并在我们提出的MMEA-UMVM数据集上进行了 benchmarking。该数据集包括了双语和单语对照KG,并采用了标准(非迭代)和迭代训练方法来评估模型性能。
  • results: 研究结果表明,在面临多模态不完整性时,模型容易过拟合多模态噪声,并在高缺失率下出现性能震荡或下降。这表明,在某些情况下,附加的多模态数据可能会对实体对齐性能产生负面影响。为解决这些挑战,我们提出了UMAEA方法,它可以有效地处理不确定缺失与歧义的视觉模态信息。UMAEA在全部97个基准划分上均取得最优表现,以更少的参数量和时间开销显著超越现有基线方法,同时有效缓解了其他模型暴露出的局限。
    Abstract As a crucial extension of entity alignment (EA), multi-modal entity alignment (MMEA) aims to identify identical entities across disparate knowledge graphs (KGs) by exploiting associated visual information. However, existing MMEA approaches primarily concentrate on the fusion paradigm of multi-modal entity features, while neglecting the challenges presented by the pervasive phenomenon of missing and intrinsic ambiguity of visual images. In this paper, we present a further analysis of visual modality incompleteness, benchmarking latest MMEA models on our proposed dataset MMEA-UMVM, where the types of alignment KGs covering bilingual and monolingual, with standard (non-iterative) and iterative training paradigms to evaluate the model performance. Our research indicates that, in the face of modality incompleteness, models succumb to overfitting the modality noise, and exhibit performance oscillations or declines at high rates of missing modality. This proves that the inclusion of additional multi-modal data can sometimes adversely affect EA. To address these challenges, we introduce UMAEA , a robust multi-modal entity alignment approach designed to tackle uncertainly missing and ambiguous visual modalities. It consistently achieves SOTA performance across all 97 benchmark splits, significantly surpassing existing baselines with limited parameters and time consumption, while effectively alleviating the identified limitations of other models. Our code and benchmark data are available at https://github.com/zjukg/UMAEA.
    摘要 作为实体对齐(EA)的重要扩展,多模态实体对齐(MMEA)旨在利用相关的视觉信息,在不同知识图谱(KG)之间识别相同实体。然而,现有的MMEA方法主要关注多模态实体特征的融合范式,而忽视了视觉图像普遍存在的缺失与固有歧义所带来的挑战。本文进一步分析了视觉模态的不完整性,并在我们提出的MMEA-UMVM数据集上对最新的MMEA模型进行了基准测试;该数据集涵盖双语与单语对齐知识图谱,并采用标准(非迭代)与迭代两种训练范式来评估模型性能。研究表明,面对模态不完整性,模型容易过拟合模态噪声,并在高缺失率下出现性能震荡或下降,这说明引入额外的多模态数据有时反而会损害实体对齐。为应对这些挑战,我们提出了UMAEA——一种面向不确定缺失与歧义视觉模态的鲁棒多模态实体对齐方法。它在全部97个基准划分上均取得SOTA表现,以有限的参数量和时间开销显著超越现有基线,同时有效缓解了其他模型的局限。代码与基准数据见 https://github.com/zjukg/UMAEA 。

Around the GLOBE: Numerical Aggregation Question-Answering on Heterogeneous Genealogical Knowledge Graphs with Deep Neural Networks

  • paper_url: http://arxiv.org/abs/2307.16208
  • repo_url: None
  • paper_authors: Omri Suissa, Maayan Zhitomirsky-Geffet, Avshalom Elmalech
  • for: 这项研究旨在为家谱领域提供数值聚合问答能力,使家谱数据研究对专家和公众都更加便捷、可扩展。
  • methods: 该研究使用了自动化数据集训练方法、转换器基于表格选择方法和优化的转换器基于数字聚合Question Answering模型。
  • results: 研究发现,提案的建筑GLOBE,在数字聚合Question Answering任务中的准确率为87%,比现有状态方法和管道的准确率提高了66%。
    Abstract One of the key AI tools for textual corpora exploration is natural language question-answering (QA). Unlike keyword-based search engines, QA algorithms receive and process natural language questions and produce precise answers to these questions, rather than long lists of documents that need to be manually scanned by the users. State-of-the-art QA algorithms based on DNNs were successfully employed in various domains. However, QA in the genealogical domain is still underexplored, while researchers in this field (and other fields in humanities and social sciences) can highly benefit from the ability to ask questions in natural language, receive concrete answers and gain insights hidden within large corpora. While some research has been recently conducted for factual QA in the genealogical domain, to the best of our knowledge, there is no previous research on the more challenging task of numerical aggregation QA (i.e., answering questions combining aggregation functions, e.g., count, average, max). Numerical aggregation QA is critical for distant reading and analysis for researchers (and the general public) interested in investigating cultural heritage domains. Therefore, in this study, we present a new end-to-end methodology for numerical aggregation QA for genealogical trees that includes: 1) an automatic method for training dataset generation; 2) a transformer-based table selection method, and 3) an optimized transformer-based numerical aggregation QA model. The findings indicate that the proposed architecture, GLOBE, outperforms the state-of-the-art models and pipelines by achieving 87% accuracy for this task compared to only 21% by current state-of-the-art models. This study may have practical implications for genealogical information centers and museums, making genealogical data research easy and scalable for experts as well as the general public.
    摘要 一种关键的人工智能工具 для文本 corpora 探索是自然语言问答(QA)。与关键词搜索引擎不同,QA 算法会根据自然语言问题进行处理,而不是将长列表交给用户手动搜索。现状的QA 算法基于 DNN 已经在不同领域得到成功应用。然而,在家谱领域,QA 仍然处于未探索的阶段,而研究人员在这个领域(以及人文社会科学领域)可以很大程度上受益于能够通过自然语言提问,得到准确的答案,并从大量文本中获得隐藏的洞察。虽然有些研究在家谱领域的事实Question Answering(QA)方面已经进行,但我们知道,对数字和平均函数的QA(即Answering questions combining aggregation functions, e.g., count, average, max)的研究尚未进行。这种QA 任务对于远程阅读和分析是非常重要的,因此,在这种研究中,我们提出了一种新的综合方法,包括:1)自动生成训练数据集方法;2)基于 transformer 的表格选择方法;3)优化的 transformer 基于 numerical aggregation QA 模型。研究结果表明,我们提出的架构,GLOBE,在这个任务上比现状模型和管道的性能高出87%,而不是只有21%。这个研究可能具有实质性的实际应用,使得家谱信息中心和博物馆的研究变得容易和可扩展,以便专家和一般公众都可以使用。
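
GLOBE itself is a transformer pipeline; the sketch below only illustrates the target behavior of numerical aggregation QA, mapping count/average/max questions onto a toy family-tree table. The table schema and keyword routing are assumptions made for illustration.

```python
import pandas as pd

# Toy family-tree table (assumed schema; real genealogical knowledge graphs are far richer).
people = pd.DataFrame({
    "name":         ["Ada", "Ben", "Cara", "Dan", "Eve"],
    "generation":   [1, 2, 2, 3, 3],
    "children":     [2, 1, 2, 0, 0],
    "age_at_death": [81, 67, 74, None, None],
})

def aggregate_answer(question: str) -> float:
    """Keyword-routed aggregation; a stand-in for the learned aggregation QA model."""
    q = question.lower()
    if "how many" in q:
        return float(len(people))                       # count
    if "average" in q and "age" in q:
        return float(people["age_at_death"].mean())     # average, ignoring missing values
    if "most children" in q:
        return float(people["children"].max())          # max
    raise ValueError("unsupported question in this toy sketch")

print(aggregate_answer("How many people are in the tree?"))           # 5.0
print(aggregate_answer("What is the average age at death?"))          # 74.0
print(aggregate_answer("What is the most children any person had?"))  # 2.0
```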

Deep Convolutional Neural Networks with Zero-Padding: Feature Extraction and Learning

  • paper_url: http://arxiv.org/abs/2307.16203
  • repo_url: https://github.com/liubc17/eDCNN_zero_padding
  • paper_authors: Zhi Han, Baichen Liu, Shao-Bo Lin, Ding-Xuan Zhou
  • for: 这个论文研究了深度卷积神经网络(DCNNs)中零填充的表现和学习。
  • methods: 论文首先论证了零填充在实现平移等变性中的作用,以及池化(pooling)带来平移不变性的本质;随后证明在自由参数数量相近的情况下,任何深度全连接网络(DFCN)都可以由带零填充的DCNN表示,说明带零填充的DCNN在特征提取上更优。
  • results: 论文推导了带零填充DCNN的普适一致性(universal consistency),并证明了其学习过程中的平移不变性;理论结论通过玩具仿真与真实数据的数值实验得到了验证。
    Abstract This paper studies the performance of deep convolutional neural networks (DCNNs) with zero-padding in feature extraction and learning. After verifying the roles of zero-padding in enabling translation-equivalence, and pooling in its translation-invariance driven nature, we show that with similar number of free parameters, any deep fully connected networks (DFCNs) can be represented by DCNNs with zero-padding. This demonstrates that DCNNs with zero-padding is essentially better than DFCNs in feature extraction. Consequently, we derive universal consistency of DCNNs with zero-padding and show its translation-invariance in the learning process. All our theoretical results are verified by numerical experiments including both toy simulations and real-data running.
    摘要 本文研究带零填充的深度卷积神经网络(DCNN)在特征提取与学习中的性能。在论证零填充能够带来平移等变性、池化体现平移不变性之后,我们证明在自由参数数量相近的情况下,任何深度全连接网络(DFCN)都可以由带零填充的DCNN表示,说明带零填充的DCNN在特征提取方面本质上优于DFCN。进而,我们推导了带零填充DCNN的普适一致性,并证明其学习过程具有平移不变性。所有理论结果均通过玩具仿真与真实数据的数值实验得到验证。
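
A quick numpy check of the translation property the paper formalizes: away from the padded boundary, a zero-padded convolution shifts its output when the input is shifted. The filter and signal below are arbitrary.

```python
import numpy as np

def conv1d_zero_pad(x, w):
    """'Same'-size 1-D correlation with zero-padding at both ends."""
    k = len(w)
    x_pad = np.pad(x, (k // 2, k // 2))
    return np.array([np.dot(x_pad[i:i + k], w) for i in range(len(x))])

rng = np.random.default_rng(0)
x = rng.normal(size=20)
w = np.array([0.25, 0.5, 0.25])

y = conv1d_zero_pad(x, w)
y_shifted_input = conv1d_zero_pad(np.roll(x, 3), w)

# Away from the padded boundary, shifting the input by 3 shifts the output by 3.
print(np.allclose(y_shifted_input[4:-4], np.roll(y, 3)[4:-4]))  # True
```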

Shuffled Differentially Private Federated Learning for Time Series Data Analytics

  • paper_url: http://arxiv.org/abs/2307.16196
  • repo_url: None
  • paper_authors: Chenxi Huang, Chaoyang Jiang, Zhenghua Chen
  • for: 针对时间序列数据进行信任性联合学习,以达到最佳性能 while 保护客户端的隐私。
  • methods: 使用本地差分隐私将隐私保护的信任边界延伸到客户端,并引入洗牌(shuffle)技术实现隐私放大。
  • results: 在五个时间序列数据集上进行了广泛实验;无论客户端规模大小,算法的精度损失都很小,且在同等隐私保护水平下优于中心化差分隐私联邦学习。
    Abstract Trustworthy federated learning aims to achieve optimal performance while ensuring clients' privacy. Existing privacy-preserving federated learning approaches are mostly tailored for image data, lacking applications for time series data, which have many important applications, like machine health monitoring, human activity recognition, etc. Furthermore, protective noising on a time series data analytics model can significantly interfere with temporal-dependent learning, leading to a greater decline in accuracy. To address these issues, we develop a privacy-preserving federated learning algorithm for time series data. Specifically, we employ local differential privacy to extend the privacy protection trust boundary to the clients. We also incorporate shuffle techniques to achieve a privacy amplification, mitigating the accuracy decline caused by leveraging local differential privacy. Extensive experiments were conducted on five time series datasets. The evaluation results reveal that our algorithm experienced minimal accuracy loss compared to non-private federated learning in both small and large client scenarios. Under the same level of privacy protection, our algorithm demonstrated improved accuracy compared to the centralized differentially private federated learning in both scenarios.
    摘要 可信联邦学习的目标是在保障客户端隐私的同时取得最佳性能。现有的隐私保护联邦学习方法主要针对图像数据,缺乏面向时间序列数据的应用,而时间序列数据在机器健康监测、人体活动识别等领域具有许多重要应用。此外,对时间序列数据分析模型施加保护性噪声会严重干扰依赖时间关系的学习,导致更大的精度下降。为解决这些问题,我们开发了一种面向时间序列数据的隐私保护联邦学习算法。具体而言,我们使用本地差分隐私将隐私保护的信任边界延伸到客户端,并引入洗牌技术实现隐私放大,以缓解使用本地差分隐私带来的精度下降。我们在五个时间序列数据集上进行了广泛实验。评估结果表明,无论客户端规模大小,与非隐私联邦学习相比,我们的算法精度损失极小;在同等隐私保护水平下,我们的算法在两种场景中的精度均优于中心化差分隐私联邦学习。
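
A minimal sketch of the two ingredients named in the abstract: local differential privacy on the client (here via clipping plus Gaussian noise) and a shuffler that permutes the anonymized reports before aggregation. The clipping norm and noise scale are assumed values, and the paper's actual mechanism for time-series models is not reproduced.

```python
import numpy as np

rng = np.random.default_rng(0)
n_clients, dim = 8, 4
clip, sigma = 1.0, 0.5          # assumed clipping norm and local noise scale

true_updates = rng.normal(size=(n_clients, dim))

def local_dp_report(update):
    """Clip to a bounded norm, then add Gaussian noise on the client (local DP)."""
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip / norm)
    return clipped + rng.normal(scale=sigma, size=update.shape)

reports = np.array([local_dp_report(u) for u in true_updates])

# Shuffler: randomly permute reports so the server cannot link a report to a client,
# which is what yields the privacy amplification the abstract refers to.
shuffled = reports[rng.permutation(n_clients)]

server_average = shuffled.mean(axis=0)   # aggregation is permutation-invariant
print(np.round(server_average, 3))
```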

An Efficient Approach to Mitigate Numerical Instability in Backpropagation for 16-bit Neural Network Training

  • paper_url: http://arxiv.org/abs/2307.16189
  • repo_url: None
  • paper_authors: Juyoung Yun
  • for: 这项研究探讨了机器学习模型在16位计算中出现的数值不稳定问题,尤其是在广泛使用的优化算法RMSProp和Adam中。
  • methods: 研究发现超参数 epsilon 是造成这种数值不稳定的主要原因,并提出了一种新的方法来缓解该问题。
  • results: 只需对 epsilon 的取值做轻微调整,即可恢复RMSProp和Adam在16位计算中的正常功能,从而提升深度神经网络训练过程的稳定性。
    Abstract In this research, we delve into the intricacies of the numerical instability observed in 16-bit computations of machine learning models, particularly when employing popular optimization algorithms such as RMSProp and Adam. This instability is commonly experienced during the training phase of deep neural networks, leading to disrupted learning processes and hindering the effective deployment of such models. We identify the single hyperparameter, epsilon, as the main culprit behind this numerical instability. An in-depth exploration of the role of epsilon in these optimizers within 16-bit computations reveals that a minor adjustment of its value can restore the functionality of RMSProp and Adam, consequently enabling the effective utilization of 16-bit neural networks. We propose a novel method to mitigate the identified numerical instability issues. This method capitalizes on the updates from the Adam optimizer and significantly improves the robustness of the learning process in 16-bit computations. This study contributes to better understanding of optimization in low-precision computations and provides an effective solution to a longstanding issue in training deep neural networks, opening new avenues for more efficient and stable model training.
    摘要 在这项研究中,我们探讨了16位计算中机器学习模型的数值不稳定现象,尤其是在广泛使用的优化算法such as RMSProp和Adam时。这种不稳定现象通常在深度神经网络训练阶段出现,导致学习过程中断并阻碍深度神经网络的有效部署。我们确定了ε参数为这种数值不稳定的主要罪魁。在16位计算中的RMSProp和Adam优化器中,我们进行了深入的探讨,发现一小调整ε参数的值可以恢复这些优化器的功能,从而启用16位神经网络的有效使用。我们提出了一种新的约束数值不稳定问题的方法。这种方法基于Adam优化器的更新,可以在16位计算中提高学习过程的稳定性。本研究对低精度计算中优化的理解做出了贡献,并提供了训练深度神经网络的有效解决方案,开创了更高效和稳定的模型训练新途径。
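
A small numpy illustration of why the epsilon hyperparameter matters in 16-bit training: the common default of 1e-8 underflows to zero in float16, so an RMSProp/Adam-style denominator can vanish, while a larger value keeps the update finite. The adjusted value 1e-4 is illustrative, not necessarily the paper's recommendation.

```python
import numpy as np

eps_default, eps_adjusted = np.float16(1e-8), np.float16(1e-4)
print(eps_default)    # 0.0  -- 1e-8 underflows in float16
print(eps_adjusted)   # ~1e-4, still representable

# One RMSProp/Adam-style update denominator with a tiny second-moment estimate v:
v, grad = np.float16(0.0), np.float16(0.01)
for eps in (eps_default, eps_adjusted):
    denom = np.sqrt(v) + eps
    step = grad / denom if denom > 0 else np.float16(np.inf)  # zero denominator blows the update up
    print(float(eps), "->", float(step))
```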

ESP: Exploiting Symmetry Prior for Multi-Agent Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2307.16186
  • repo_url: None
  • paper_authors: Xin Yu, Rongye Shi, Pu Feng, Yongkai Tian, Jie Luo, Wenjun Wu
  • for: 这篇论文旨在提高多智能体强化学习(MARL)的数据效率。
  • methods: 该论文受多智能体系统中对称性现象的启发,提出了一个利用先验知识的框架,将数据增强与精心设计的一致性损失融入现有MARL方法,以提高其数据效率。
  • results: 实验结果显示,提案的框架能够提高多个具有挑战性的任务的数据效率。此外,该框架还应用于物理多机器人测试平台,以显示其优势。
    Abstract Multi-agent reinforcement learning (MARL) has achieved promising results in recent years. However, most existing reinforcement learning methods require a large amount of data for model training. In addition, data-efficient reinforcement learning requires the construction of strong inductive biases, which are ignored in the current MARL approaches. Inspired by the symmetry phenomenon in multi-agent systems, this paper proposes a framework for exploiting prior knowledge by integrating data augmentation and a well-designed consistency loss into the existing MARL methods. In addition, the proposed framework is model-agnostic and can be applied to most of the current MARL algorithms. Experimental tests on multiple challenging tasks demonstrate the effectiveness of the proposed framework. Moreover, the proposed framework is applied to a physical multi-robot testbed to show its superiority.
    摘要 多智能体强化学习(MARL)近年来取得了可喜的成果。然而,现有的大多数强化学习方法需要大量数据进行模型训练;而数据高效的强化学习又需要构建强归纳偏置,这一点在当前的MARL方法中被忽视了。受多智能体系统中对称性现象的启发,本文提出了一个利用先验知识的框架,将数据增强与精心设计的一致性损失整合到现有的MARL方法中。该框架与具体模型无关,可应用于当前大多数MARL算法。在多个具有挑战性的任务上的实验验证了所提框架的有效性;此外,我们还将该框架应用于真实的多机器人测试平台,展示了其优越性。

Unified Model for Image, Video, Audio and Language Tasks

  • paper_url: http://arxiv.org/abs/2307.16184
  • repo_url: https://github.com/mshukor/unival
  • paper_authors: Mustafa Shukor, Corentin Dancette, Alexandre Rame, Matthieu Cord
  • for: The paper aims to build a unified model that supports text, images, video, and audio efficiently, without relying on huge datasets or models with billions of parameters.
  • methods: The proposed ~0.25B-parameter model, UnIVAL, is pretrained on many tasks using task balancing and multimodal curriculum learning; weight interpolation of models trained on different multimodal tasks is used to improve generalization to out-of-distribution inputs.
  • results: UnIVAL shows competitive performance on image- and video-text tasks and, despite not being pretrained on audio, achieves competitive performance on audio-text tasks; the unified model demonstrates the synergy between tasks and improves generalization to out-of-distribution inputs.
    Abstract Large Language Models (LLMs) have made the ambitious quest for generalist agents significantly far from being a fantasy. A key hurdle for building such general models is the diversity and heterogeneity of tasks and modalities. A promising solution is unification, allowing the support of a myriad of tasks and modalities within one unified framework. While few large models (e.g., Flamingo (Alayrac et al., 2022), trained on massive datasets, can support more than two modalities, current small to mid-scale unified models are still limited to 2 modalities, usually image-text or video-text. The question that we ask is: is it possible to build efficiently a unified model that can support all modalities? To answer this, we propose UnIVAL, a step further towards this ambitious goal. Without relying on fancy datasets sizes or models with billions of parameters, the ~ 0.25B parameter UnIVAL model goes beyond two modalities and unifies text, images, video, and audio into a single model. Our model is efficiently pretrained on many tasks, based on task balancing and multimodal curriculum learning. UnIVAL shows competitive performance to existing state-of-the-art approaches, across image and video-text tasks. The feature representations learned from image and video-text modalities, allows the model to achieve competitive performance when finetuned on audio-text tasks, despite not being pretrained on audio. Thanks to the unified model, we propose a novel study on multimodal model merging via weight interpolation of models trained on different multimodal tasks, showing their benefits in particular for out-of-distribution generalization. Finally, we motivate unification by showing the synergy between tasks. The model weights and code are released here: https://github.com/mshukor/UnIVAL.
    摘要 大型语言模型(LLM)已经让普通的通用代理人变得不是一个梦想。一个关键的难点是多元化和多种多样的任务和模式。一个有前途的解决方案是统一,允许支持一大量的任务和模式在一个统一框架下。现在的小型至中型统一模型都只支持2种模式,通常是图像文本或视频文本。我们的问题是:是否可以有效地建立一个统一模型,可以支持所有模式?为了回答这个问题,我们提出了 UnIVAL,这是一个进一步的目标。不需要庞大的数据集或者具有亿位 Parameters 的模型,我们的 ~ 0.25B 参数的 UnIVAL 模型可以超过二种模式,并将文本、图像、视频和音频统一为一个模型。我们的模型通过多任务调整和多模式学习来快速预训。 UnIVAL 在图像和视频文本任务上显示了竞争性的表现,而且可以在不直接预训的音频文本任务上 achieve 竞争性的表现,只因为它可以从图像和视频文本模式中学习出来的特征表现。我们还提出了一个新的研究,通过多模式模型的权重 interpolating 来评估多模式模型在不同多模式任务之间的融合效果,这些任务包括 audio-text 任务。最后,我们鼓励统一,因为多个任务之间存在联互关系,这使得模型可以从不同任务中学习到普遍的特征表现。模型和代码可以在 GitHub 上获取:https://github.com/mshukor/UnIVAL。
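
A hedged sketch of the weight-interpolation study mentioned in the abstract: linearly interpolating the parameters of two models (here tiny stand-in MLPs assumed to share an architecture and initialization) and loading the result as a new model.

```python
import torch
import torch.nn as nn

def make_model():
    return nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))

# Stand-ins for two checkpoints fine-tuned on different multimodal tasks.
model_a, model_b = make_model(), make_model()

def interpolate(state_a, state_b, lam: float):
    """Per-parameter linear interpolation: lam * A + (1 - lam) * B."""
    return {k: lam * state_a[k] + (1.0 - lam) * state_b[k] for k in state_a}

merged = make_model()
merged.load_state_dict(interpolate(model_a.state_dict(), model_b.state_dict(), lam=0.5))

x = torch.randn(2, 16)
print(merged(x).shape)   # torch.Size([2, 4]) -- the merged model is directly usable
```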

Redundancy-aware unsupervised rankings for collections of gene sets

  • paper_url: http://arxiv.org/abs/2307.16182
  • repo_url: None
  • paper_authors: Chiara Balestra, Carlo Maj, Emmanuel Müller, Andreas Mayr
  • for: 提高生物学 Pathway 的可读性和解释力
  • methods: 使用 Shapley 值来评估 Pathway 的重要性,并使用 trick 避免计算复杂性
  • results: 可以在降低基因集合(通路集合)维度的同时,保持对所有基因的高覆盖率
    Abstract The biological roles of gene sets are used to group them into collections. These collections are often characterized by being high-dimensional, overlapping, and redundant families of sets, thus precluding a straightforward interpretation and study of their content. Bioinformatics looked for solutions to reduce their dimension or increase their interpretability. One possibility lies in aggregating overlapping gene sets to create larger pathways, but the modified biological pathways are hardly biologically justifiable. We propose to use importance scores to rank the pathways in the collections, studying the context from a set covering perspective. The proposed Shapley values-based scores consider the distribution of the singletons and the size of the sets in the families; furthermore, a trick allows us to circumvent the usual exponential complexity of Shapley values' computation. Finally, we address the challenge of including a redundancy awareness in the obtained rankings where, in our case, sets are redundant if they show prominent intersections. The rankings can be used to reduce the dimension of collections of gene sets, such that they show lower redundancy and still a high coverage of the genes. We further investigate the impact of our selection on Gene Sets Enrichment Analysis. The proposed method shows a practical utility in bioinformatics to increase the interpretability of the collections of gene sets and a step forward to include redundancy into Shapley values computations.
    摘要 生物学角色集合用于分组 gene set。这些集合经常是高维、重叠、重复的家庭集合,因此禁止直接解释和研究其内容。生物信息学搜索解决方案以降低维度或增加可读性。一种可能性在于将重叠的 gene set 聚合成更大的路径,但修改后的生物路径几乎不能正确地表达生物学意义。我们提议使用importance scores来排序pathway,并研究集合从集合覆盖角度来学习context。我们的提案基于 Shapley 值,考虑单个元素和集合的大小,并且可以避免通常的对 Shapley 值的计算的指数复杂性。 finally,我们解决了包含重复性在内的获得的排名中的挑战。这些排名可以用来降低集合的维度,以便仍然保持高度覆盖所有的基因。我们进一步调查了我们的选择对 Gene Sets Enrichment Analysis 的影响。我们的方法显示了生物信息学中可行的增加可读性的集合,以及包含重复性在内的 Shapley 值计算的一个进步。
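
An illustrative permutation-sampling approximation of Shapley values under a set-covering value function, where a gene set's contribution is the number of new genes it covers. The toy collection below is made up, and the paper avoids this Monte Carlo cost with a closed-form trick.

```python
import random

# Toy collection of overlapping, redundant gene sets.
gene_sets = {
    "pathway_A": {"g1", "g2", "g3"},
    "pathway_B": {"g2", "g3", "g4"},
    "pathway_C": {"g5"},
    "pathway_D": {"g1", "g2", "g3", "g4"},
}

def coverage(names):
    covered = set().union(*(gene_sets[n] for n in names)) if names else set()
    return len(covered)

def shapley_coverage(n_perm=2000, seed=0):
    rng = random.Random(seed)
    names = list(gene_sets)
    phi = {n: 0.0 for n in names}
    for _ in range(n_perm):
        rng.shuffle(names)
        seen = []
        for n in names:
            phi[n] += coverage(seen + [n]) - coverage(seen)  # marginal new genes covered
            seen.append(n)
    return {n: v / n_perm for n, v in phi.items()}

scores = shapley_coverage()
print(sorted(scores.items(), key=lambda kv: -kv[1]))  # redundant sets receive lower importance
```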

Adaptive learning of density ratios in RKHS

  • paper_url: http://arxiv.org/abs/2307.16164
  • repo_url: None
  • paper_authors: Werner Zellinger, Stefan Kindermann, Sergei V. Pereverzyev
  • for: 从有限多个观测样本中估计两个概率密度之比。
  • methods: 在再生核希尔伯特空间(RKHS)中,通过最小化真实密度比与模型之间的正则化 Bregman 散度来估计密度比。
  • results: 给出了新的有限样本误差界,并提出了一种 Lepskii 型参数选择准则,可在不知道密度比正则性(光滑性)的情况下最小化该误差界;在二次损失的特殊情形下,方法可自适应地达到极小极大最优误差率。
    Abstract Estimating the ratio of two probability densities from finitely many observations of the densities is a central problem in machine learning and statistics with applications in two-sample testing, divergence estimation, generative modeling, covariate shift adaptation, conditional density estimation, and novelty detection. In this work, we analyze a large class of density ratio estimation methods that minimize a regularized Bregman divergence between the true density ratio and a model in a reproducing kernel Hilbert space (RKHS). We derive new finite-sample error bounds, and we propose a Lepskii type parameter choice principle that minimizes the bounds without knowledge of the regularity of the density ratio. In the special case of quadratic loss, our method adaptively achieves a minimax optimal error rate. A numerical illustration is provided.
    摘要 从有限多个观测样本中估计两个概率密度之比,是机器学习与统计中的核心问题,应用于双样本检验、散度估计、生成建模、协变量偏移自适应、条件密度估计和新颖性检测等领域。在本工作中,我们分析了一大类密度比估计方法,这些方法在再生核希尔伯特空间(RKHS)中最小化真实密度比与模型之间的正则化 Bregman 散度。我们推导了新的有限样本误差界,并提出了一种 Lepskii 型参数选择准则,可在不知道密度比正则性(光滑性)的情况下最小化该误差界。在二次损失的特殊情形下,我们的方法可自适应地达到极小极大最优误差率。文中还给出了数值示例。
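
The quadratic-loss case discussed in the abstract corresponds to least-squares density-ratio fitting; below is a standard uLSIF-style sketch with a Gaussian kernel. The bandwidth, ridge parameter, and toy densities are assumptions, and the Lepskii-type parameter choice is not implemented.

```python
import numpy as np

rng = np.random.default_rng(0)
x_nu = rng.normal(0.0, 1.0, size=300)    # numerator samples   ~ p
x_de = rng.normal(0.5, 1.2, size=300)    # denominator samples ~ q

centers = x_nu[:50]                      # kernel centers
sigma, lam = 0.5, 0.1                    # assumed bandwidth and ridge parameter

def K(x, c):
    return np.exp(-(x[:, None] - c[None, :]) ** 2 / (2 * sigma ** 2))

Phi_de, Phi_nu = K(x_de, centers), K(x_nu, centers)
H = Phi_de.T @ Phi_de / len(x_de)        # empirical E_q[phi phi^T]
h = Phi_nu.mean(axis=0)                  # empirical E_p[phi]
alpha = np.linalg.solve(H + lam * np.eye(len(centers)), h)

def ratio_hat(x):
    return np.maximum(K(x, centers) @ alpha, 0.0)   # estimated p(x)/q(x), clipped at 0

grid = np.linspace(-2, 2, 5)
print(np.round(ratio_hat(grid), 2))
```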

Variance Control for Distributional Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2307.16152
  • repo_url: https://github.com/kuangqi927/qem
  • paper_authors: Qi Kuang, Zhoufan Zhu, Liwen Zhang, Fan Zhou
  • for: 本文旨在检验值分布强化学习(distributional RL)中Q函数估计器的有效性。
  • methods: 论文通过误差分析理解值分布设定下Q函数近似误差的影响,并提出了新的估计器 Quantiled Expansion Mean(QEM)以及相应的新算法 QEMRL。
  • results: 在多个 Atari 和 Mujoco 基准任务上,QEMRL 算法在样本效率与收敛性能方面均显著优于基线算法。
    Abstract Although distributional reinforcement learning (DRL) has been widely examined in the past few years, very few studies investigate the validity of the obtained Q-function estimator in the distributional setting. To fully understand how the approximation errors of the Q-function affect the whole training process, we do some error analysis and theoretically show how to reduce both the bias and the variance of the error terms. With this new understanding, we construct a new estimator \emph{Quantiled Expansion Mean} (QEM) and introduce a new DRL algorithm (QEMRL) from the statistical perspective. We extensively evaluate our QEMRL algorithm on a variety of Atari and Mujoco benchmark tasks and demonstrate that QEMRL achieves significant improvement over baseline algorithms in terms of sample efficiency and convergence performance.
    摘要 尽管分布式强化学习(DRL)在过去几年内得到了广泛的研究,但是很少研究对于分布式设定中的Q函数估计器的有效性。为了全面理解Q函数估计器的折衔错误对整个训练过程的影响,我们进行了错误分析并从统计角度提出了一种新的估计器——量划扩展均值(QEM),以及一种基于统计学的新DRL算法(QEMRL)。我们对多个Atari和Mujoco benchmark任务进行了广泛的评估,并证明了QEMRL在样本效率和收敛性方面具有显著的改进。

An Effective LSTM-DDPM Scheme for Energy Theft Detection and Forecasting in Smart Grid

  • paper_url: http://arxiv.org/abs/2307.16149
  • repo_url: None
  • paper_authors: Xun Yuan, Yang Yang, Arwa Alromih, Prosanta Gope, Biplab Sikdar
  • for: 该研究目标是解决智能电网系统中的能源盗用检测(ETD)和能源消耗预测(ECF)两个相关的挑战。
  • methods: 该研究提出了一种将长短期记忆网络(LSTM)与去噪扩散概率模型(DDPM)相结合的方法,通过生成输入重构与预测来实现ETD和ECF。
  • results: 在真实数据集与合成数据集上的大量实验表明,该方法能有效解决ETD和ECF问题,且在ETD上的提升尤为显著。
    Abstract Energy theft detection (ETD) and energy consumption forecasting (ECF) are two interconnected challenges in smart grid systems. Addressing these issues collectively is crucial for ensuring system security. This paper addresses the interconnected challenges of ETD and ECF in smart grid systems. The proposed solution combines long short-term memory (LSTM) and a denoising diffusion probabilistic model (DDPM) to generate input reconstruction and forecasting. By leveraging the reconstruction and forecasting errors, the system identifies instances of energy theft, with the methods based on reconstruction error and forecasting error complementing each other in detecting different types of attacks. Through extensive experiments on real-world and synthetic datasets, the proposed scheme outperforms baseline methods in ETD and ECF problems. The ensemble method significantly enhances ETD performance, accurately detecting energy theft attacks that baseline methods fail to detect. The research offers a comprehensive and effective solution for addressing ETD and ECF challenges, demonstrating promising results and improved security in smart grid systems.
    摘要 智能Grid系统中的能源抢夺检测(ETD)和能源消耗预测(ECF)是两个相关的挑战。对这两个问题进行集中解决是确保系统安全的关键。这篇论文解决了智能Grid系统中的ETD和ECF问题。提议的解决方案将长期短期记忆(LSTM)和杂度减少概率模型(DDPM)结合使用,生成输入重建和预测。通过利用重建和预测错误,系统可以识别能源抢夺行为,基于重建错误和预测错误来识别不同类型的攻击。经过广泛的实验,提议的方案在ETD和ECF问题上表现出优于基eline方法。 ensemble方法可以明显提高ETD性能,准确地检测基eline方法无法检测的能源抢夺攻击。这项研究提供了智能Grid系统中ETD和ECF问题的全面和有效解决方案,实验结果表明,该方案在智能Grid系统中提供了更好的安全保障。
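
The LSTM and DDPM components are not reproduced here; this sketch only shows the decision rule the abstract describes, flagging a meter when the reconstruction error or the forecasting error against a model output exceeds a threshold. The synthetic consumption profile, the stand-in "model output", and the thresholds are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 96                                                   # one day of 15-minute readings
expected = 2.0 + np.sin(np.linspace(0, 4 * np.pi, T))    # stand-in for the model's normal profile
normal = expected + rng.normal(0, 0.1, T)
theft = normal.copy()
theft[40:70] *= 0.3                                      # under-reporting attack

def flagged(series, model_output, recon_thr=0.3, forecast_thr=0.4):
    recon_err = np.abs(series - model_output).mean()              # stands in for the DDPM reconstruction error
    forecast_err = np.abs(series[1:] - model_output[:-1]).mean()  # stands in for the LSTM forecasting error
    return recon_err > recon_thr or forecast_err > forecast_thr

print("normal meter flagged:", flagged(normal, expected))   # False
print("theft  meter flagged:", flagged(theft, expected))    # True
```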

Pupil Learning Mechanism

  • paper_url: http://arxiv.org/abs/2307.16141
  • repo_url: None
  • paper_authors: Rua-Huan Tsaih, Yu-Hang Chien, Shih-Yi Chien
  • for: 该研究旨在同时解决人工神经网络中的梯度消失与过拟合问题。
  • methods: 本研究借鉴学生式学习过程(pupil learning procedure),其包含解释、选择、理解、填充与组织等特点,由此推导出学生式学习机制(PLM),并将其用于修改两层神经网络(2LNN)的结构与权重。PLM包含顺序学习、自适应学习、完美学习与降低过拟合学习等模块。
  • results: 在铜价预测实验中,PLM模型的表现优于线性回归模型和传统基于反向传播的2LNN模型。
    Abstract Studies on artificial neural networks rarely address both vanishing gradients and overfitting issues. In this study, we follow the pupil learning procedure, which has the features of interpreting, picking, understanding, cramming, and organizing, to derive the pupil learning mechanism (PLM) by which to modify the network structure and weights of 2-layer neural networks (2LNNs). The PLM consists of modules for sequential learning, adaptive learning, perfect learning, and less-overfitted learning. Based upon a copper price forecasting dataset, we conduct an experiment to validate the PLM module design modules, and an experiment to evaluate the performance of PLM. The empirical results indeed approve the PLM module design and show the superiority of the proposed PLM model over the linear regression model and the conventional backpropagation-based 2LNN model.
    摘要 研究人工神经网络通常不关注两个问题:衰减梯度和适应过度。在这个研究中,我们采用学生学习过程,具有解释、选择、理解、填充和组织等特点, derivate学生学习机制(PLM),用于修改网络结构和权重。PLM包括顺序学习、适应学习、完美学习和较少适应学习模块。我们使用铜价预测数据集进行实验验证PLM模块设计,并对PLM模型的性能进行评估。实验结果证明PLM模块设计的正确性,并表明我们提出的PLM模型在线性回归模型和传统的反射层2LNN模型的性能上显著优于。

User-Controlled Knowledge Fusion in Large Language Models: Balancing Creativity and Hallucination

  • paper_url: http://arxiv.org/abs/2307.16139
  • repo_url: None
  • paper_authors: Chen Zhang
  • for: 这篇论文旨在解决现代对话系统中使用大语言模型(LLM)的一个关键问题:如何在LLM的创造力与其对外部事实知识的忠实度之间取得平衡。
  • methods: 论文提出了一种新的用户可控机制:在LLM的微调阶段加入一个数值标签,用以表示生成回复对参考知识的忠实程度。该标签的取值由自动化流程计算,综合ROUGE分数、Sentence-BERT嵌入相似度以及LLM的自我评估分数,衡量回复对参考知识的依赖程度。
  • results: 论文通过大量实验证明了该方法的适应性与可控性,能够在不同场景下保证回复的质量与准确性;结果表明该方法可以在创造力与幻觉之间保持平衡的同时提升LLM的通用性。
    Abstract In modern dialogue systems, the use of Large Language Models (LLMs) has grown exponentially due to their capacity to generate diverse, relevant, and creative responses. Despite their strengths, striking a balance between the LLMs' creativity and their faithfulness to external knowledge remains a key challenge. This paper presents an innovative user-controllable mechanism that modulates the balance between an LLM's imaginative capabilities and its adherence to factual information. Our approach incorporates a numerical tag during the fine-tuning phase of the LLM's training, representing the degree of faithfulness to the reference knowledge in the generated responses. This degree is computed through an automated process that measures lexical overlap using ROUGE scores, semantic similarity using Sentence-BERT embeddings, and an LLM's self-evaluation score. During model inference, users can manipulate this numerical tag, thus controlling the degree of the LLM's reliance on external knowledge. We conduct extensive experiments across various scenarios, demonstrating the adaptability of our method and its efficacy in ensuring the quality and accuracy of the LLM's responses. The results highlight the potential of our approach to enhance the versatility of LLMs while maintaining a balance between creativity and hallucination.
    摘要 在现代对话系统中,大语言模型(LLM)因其能够生成多样、相关且富有创造性的回复而得到爆发式应用。尽管LLM能力强大,如何在其创造力与对外部知识的忠实度之间取得平衡仍是一个关键挑战。本文提出了一种创新的用户可控机制,用于调节LLM的想象能力与其对事实信息的遵循程度。我们的方法在LLM训练的微调阶段加入一个数值标签,表示生成回复对参考知识的忠实程度;该程度由一个自动化流程计算,综合使用ROUGE分数衡量词面重叠、Sentence-BERT嵌入衡量语义相似度,以及LLM的自我评估分数。在模型推理阶段,用户可以调整该数值标签,从而控制LLM对外部知识的依赖程度。我们在多种场景下进行了大量实验,证明了该方法的适应性及其在保证LLM回复质量与准确性方面的有效性。结果表明,该方法有望在保持创造力与幻觉之间平衡的同时,提升LLM的通用性。
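
A hedged sketch of how the numerical faithfulness tag could be computed from the signals the abstract lists: ROUGE lexical overlap and Sentence-BERT semantic similarity with the reference knowledge. The LLM self-evaluation term is omitted, and the equal weighting and five-level binning are assumptions.

```python
# pip install rouge-score sentence-transformers
from rouge_score import rouge_scorer
from sentence_transformers import SentenceTransformer, util

scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)
embedder = SentenceTransformer("all-MiniLM-L6-v2")

def faithfulness_tag(reference: str, response: str, n_levels: int = 5) -> int:
    """Map a response to a discrete faithfulness level w.r.t. the reference knowledge."""
    lexical = scorer.score(reference, response)["rougeL"].fmeasure            # lexical overlap
    semantic = float(util.cos_sim(embedder.encode(reference), embedder.encode(response)))
    score = 0.5 * lexical + 0.5 * semantic   # assumed equal weighting; the paper also adds an LLM self-evaluation score
    return max(0, min(n_levels - 1, int(score * n_levels)))

ref = "The Eiffel Tower was completed in 1889 and is 330 metres tall."
print(faithfulness_tag(ref, "The Eiffel Tower, finished in 1889, stands about 330 m high."))
print(faithfulness_tag(ref, "A wizard built the tower overnight using dragon fire."))
```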

Deep Unrolling Networks with Recurrent Momentum Acceleration for Nonlinear Inverse Problems

  • paper_url: http://arxiv.org/abs/2307.16120
  • repo_url: https://github.com/zhouqp631/dunets-rma
  • paper_authors: Qingping Zhou, Jiayu Qian, Junqi Tang, Jinglai Li
  • for: 解决非线性逆问题
  • methods: 使用梯度加速技术(RMA)扩展深度推导网络(DuNets)
  • results: 对两种流行的 DuNets 方法(LPGD 和 LPD)进行了改进,提高了非线性逆问题的解决能力。实验结果表明,RMA 技术在非线性逆问题中的改进效果随问题的非线性程度增长。
    Abstract Combining the strengths of model-based iterative algorithms and data-driven deep learning solutions, deep unrolling networks (DuNets) have become a popular tool to solve inverse imaging problems. While DuNets have been successfully applied to many linear inverse problems, nonlinear problems tend to impair the performance of the method. Inspired by momentum acceleration techniques that are often used in optimization algorithms, we propose a recurrent momentum acceleration (RMA) framework that uses a long short-term memory recurrent neural network (LSTM-RNN) to simulate the momentum acceleration process. The RMA module leverages the ability of the LSTM-RNN to learn and retain knowledge from the previous gradients. We apply RMA to two popular DuNets -- the learned proximal gradient descent (LPGD) and the learned primal-dual (LPD) methods, resulting in LPGD-RMA and LPD-RMA respectively. We provide experimental results on two nonlinear inverse problems: a nonlinear deconvolution problem, and an electrical impedance tomography problem with limited boundary measurements. In the first experiment we have observed that the improvement due to RMA largely increases with respect to the nonlinearity of the problem. The results of the second example further demonstrate that the RMA schemes can significantly improve the performance of DuNets in strongly ill-posed problems.
    摘要 深度展开网络(DuNets)结合了基于模型的迭代算法与数据驱动深度学习方法的优点,已成为求解逆成像问题的流行工具。尽管DuNets已成功应用于许多线性逆问题,但非线性问题往往会削弱其性能。受优化算法中常用的动量加速技术启发,我们提出了一种循环动量加速(RMA)框架,利用长短期记忆循环神经网络(LSTM-RNN)来模拟动量加速过程。RMA模块利用LSTM-RNN学习并保留先前梯度信息的能力。我们将RMA应用于两种流行的DuNets方法——学习型近端梯度下降(LPGD)与学习型原始-对偶(LPD)方法,分别得到LPGD-RMA与LPD-RMA。我们在两个非线性逆问题上给出了实验结果:一个非线性反卷积问题,以及一个边界测量受限的电阻抗断层成像问题。在第一个实验中,我们观察到RMA带来的改进随问题非线性程度的增大而显著增加;第二个例子的结果进一步表明,RMA方案能显著提升DuNets在强病态问题上的表现。

TMPNN: High-Order Polynomial Regression Based on Taylor Map Factorization

  • paper_url: http://arxiv.org/abs/2307.16105
  • repo_url: https://github.com/andiva/tmpnn
  • paper_authors: Andrei Ivanov, Stefan Maria Ailuro
  • for: 本文提出一种基于 Taylor map 分解的高阶多项式回归方法,用于刻画非线性模式。
  • methods: 方法基于 Taylor map 分解构建高阶多项式回归,自然支持多目标回归,并能捕捉各目标之间的内在关系;此外还给出了以微分方程组形式进行模型解释的途径。
  • results: 在 UCI 公开数据集、Feynman 符号回归数据集和 Friedman-1 数据集上的基准测试表明,所提方法与最先进的回归方法性能相当,并在部分任务上更优。
    Abstract Polynomial regression is widely used and can help to express nonlinear patterns. However, considering very high polynomial orders may lead to overfitting and poor extrapolation ability for unseen data. The paper presents a method for constructing a high-order polynomial regression based on the Taylor map factorization. This method naturally implements multi-target regression and can capture internal relationships between targets. Additionally, we introduce an approach for model interpretation in the form of systems of differential equations. By benchmarking on UCI open access datasets, Feynman symbolic regression datasets, and Friedman-1 datasets, we demonstrate that the proposed method performs comparable to the state-of-the-art regression methods and outperforms them on specific tasks.
    摘要 多项式回归应用广泛,能够表达非线性模式。然而,采用过高的多项式阶数可能导致过拟合,并且对未见数据的外推能力较差。本文提出一种基于 Taylor map 分解构建高阶多项式回归的方法。该方法自然地实现了多目标回归,并能捕捉目标之间的内在关系;此外,我们还提出了一种以微分方程组形式进行模型解释的途径。在 UCI 公开数据集、Feynman 符号回归数据集和 Friedman-1 数据集上的基准测试表明,所提方法与最先进的回归方法性能相当,并在部分任务上更优。

AI Increases Global Access to Reliable Flood Forecasts

  • paper_url: http://arxiv.org/abs/2307.16104
  • repo_url: https://github.com/google-research-datasets/global_streamflow_model_paper
  • paper_authors: Grey Nearing, Deborah Cohen, Vusumuzi Dube, Martin Gauch, Oren Gilon, Shaun Harrigan, Avinatan Hassidim, Frederik Kratzert, Asher Metzger, Sella Nevo, Florian Pappenberger, Christel Prudhomme, Guy Shalev, Shlomo Shenzis, Tadele Tekalign, Dana Weitzner, Yoss Matias
  • for: 开发一个利用人工智能(AI)预测极端水文事件的模型,以提供更准确、更提前的洪水预警。
  • methods: 使用AI模型预测极端水文事件,并将其与现有的全球水文模型(Copernicus Emergency Management Service Global Flood Awareness System)进行性能比较。
  • results: 该AI模型在全球各地区、不同预见期和重现期下均具有更高的准确性和更早的预警能力,在无测站流域尤为突出;该模型已被集成到一个业务化预警系统中,在80多个国家提供免费开放的实时预报。
    Abstract Floods are one of the most common and impactful natural disasters, with a disproportionate impact in developing countries that often lack dense streamflow monitoring networks. Accurate and timely warnings are critical for mitigating flood risks, but accurate hydrological simulation models typically must be calibrated to long data records in each watershed where they are applied. We developed an Artificial Intelligence (AI) model to predict extreme hydrological events at timescales up to 7 days in advance. This model significantly outperforms current state of the art global hydrology models (the Copernicus Emergency Management Service Global Flood Awareness System) across all continents, lead times, and return periods. AI is especially effective at forecasting in ungauged basins, which is important because only a few percent of the world's watersheds have stream gauges, with a disproportionate number of ungauged basins in developing countries that are especially vulnerable to the human impacts of flooding. We produce forecasts of extreme events in South America and Africa that achieve reliability approaching the current state of the art in Europe and North America, and we achieve reliability at between 4 and 6-day lead times that are similar to current state of the art nowcasts (0-day lead time). Additionally, we achieve accuracies over 10-year return period events that are similar to current accuracies over 2-year return period events, meaning that AI can provide warnings earlier and over larger and more impactful events. The model that we develop in this paper has been incorporated into an operational early warning system that produces publicly available (free and open) forecasts in real time in over 80 countries. This work using AI and open data highlights a need for increasing the availability of hydrological data to continue to improve global access to reliable flood warnings.
    摘要 洪水是最常见、影响最大的自然灾害之一,对往往缺乏密集流量监测网络的发展中国家影响尤为严重。准确及时的预警对降低洪水风险至关重要,但精确的水文模拟模型通常需要在其应用的每个流域利用长期数据记录进行率定。我们开发了一个人工智能(AI)模型,可提前最多7天预测极端水文事件。在所有大陆、所有预见期和所有重现期上,该模型均显著优于当前最先进的全球水文模型(Copernicus Emergency Management Service Global Flood Awareness System)。AI在无测站流域的预报中尤为有效,这一点非常重要,因为全球只有很小比例的流域设有流量测站,而无测站流域又不成比例地集中在最易受洪水影响的发展中国家。我们对南美洲和非洲的极端事件预报达到了接近欧洲和北美洲当前最先进水平的可靠性,并在4至6天的预见期内达到了与当前最先进的临近预报(0天预见期)相当的可靠性。此外,我们对10年一遇事件的预报精度与当前对2年一遇事件的精度相当,这意味着AI能够针对更大、影响更严重的事件更早地发出预警。本文开发的模型已被集成到一个业务化预警系统中,在80多个国家实时提供公开(免费开放)的预报。这项基于AI与开放数据的工作也表明,需要提升水文数据的可获得性,以持续改善全球范围内可靠洪水预警的可及性。

On Neural Network approximation of ideal adversarial attack and convergence of adversarial training

  • paper_url: http://arxiv.org/abs/2307.16099
  • repo_url: None
  • paper_authors: Rajdeep Haldar, Qifan Song
  • for: 本文针对适用于防御模型对抗攻击的方法。
  • methods: 本文使用了一种基于神经网络的方法,将攻击表示为可训练的函数,不需要进一步的梯度计算。
  • results: 本文证明了在适当的条件下,攻击可以被表示为光滑的块状函数(块状Holder函数),并使用神经网络实现理想的攻击过程。
    Abstract Adversarial attacks are usually expressed in terms of a gradient-based operation on the input data and model, this results in heavy computations every time an attack is generated. In this work, we solidify the idea of representing adversarial attacks as a trainable function, without further gradient computation. We first motivate that the theoretical best attacks, under proper conditions, can be represented as smooth piece-wise functions (piece-wise H\"older functions). Then we obtain an approximation result of such functions by a neural network. Subsequently, we emulate the ideal attack process by a neural network and reduce the adversarial training to a mathematical game between an attack network and a training model (a defense network). We also obtain convergence rates of adversarial loss in terms of the sample size $n$ for adversarial training in such a setting.
    摘要 adversarial attacks 通常表示为输入数据和模型的梯度基于操作,这会导致每次生成攻击时需要重大计算。在这项工作中,我们坚持思考表达攻击为可学习函数,不需要进一步的梯度计算。我们首先证明,理论上最佳的攻击,在适当的条件下,可以表示为流畅的割辑函数(割辑Holder函数)。然后,我们得到了这些函数的近似结果,使用神经网络。接着,我们模拟理想的攻击过程,用神经网络来实现,并将反恐训练转化为数学游戏, между攻击网络和训练模型(防御网络)。我们还得到了对攻击损失的整数化速率,随着样本大小 $n$ 的增加。

ADR-GNN: Advection-Diffusion-Reaction Graph Neural Networks

  • paper_url: http://arxiv.org/abs/2307.16092
  • repo_url: None
  • paper_authors: Moshe Eliasof, Eldad Haber, Eran Treister
  • for: 本文提出了一种基于扩散吸引系统的图 neural network 架构(ADR-GNN),用于解决图数据上复杂现象的学习表示。
  • methods: 该架构结合了扩散、吸引和反应三种过程,以模型图数据上的导向传输信息、本地平滑信息和非线性变换信息。
  • results: 作者对实验数据集进行了评估,并显示了 ADR-GNN 在图分类和空间时间数据集上提供了改进或与状态艺术网络竞争的表现。
    Abstract Graph neural networks (GNNs) have shown remarkable success in learning representations for graph-structured data. However, GNNs still face challenges in modeling complex phenomena that involve advection. In this paper, we propose a novel GNN architecture based on Advection-Diffusion-Reaction systems, called ADR-GNN. Advection models the directed transportation of information, diffusion captures the local smoothing of information, and reaction represents the non-linear transformation of information in channels. We provide an analysis of the qualitative behavior of ADR-GNN, that shows the benefit of combining advection, diffusion, and reaction. To demonstrate its efficacy, we evaluate ADR-GNN on real-world node classification and spatio-temporal datasets, and show that it improves or offers competitive performance compared to state-of-the-art networks.
    摘要 GRAPH Neural Networks (GNNs) 已经取得了非常成功的表示图structured数据的学习。然而,GNNS仍然面临Complex Phenomena 的挑战,包括适应。在这篇论文中,我们提出了一种基于适应扩散反应系统的新GNN架构,称为ADR-GNN。适应模型化了irectional transportation of information,扩散捕捉了Local Smoothing of information,并且Reaction表示通道中的非线性变换。我们提供了ADR-GNN的qualitative行为分析,显示了结合适应、扩散和反应的优势。为证明其有效性,我们对实际世界节点分类和空时间数据集进行了评估,并显示了它与当前网络的竞争性或提高性。
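
A simplified numpy sketch of one advection-diffusion-reaction step on a graph: diffusion smooths node features through the graph Laplacian, advection transports features along row-normalized directed edges, and reaction applies a pointwise non-linear map. The random graph, step size, and weights are assumptions; ADR-GNN's actual layers are learned and considerably more elaborate.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, dt = 6, 3, 0.2
A = (rng.random((n, n)) < 0.4).astype(float)
np.fill_diagonal(A, 0.0)                          # random directed graph (assumed)
X = rng.normal(size=(n, d))                       # node features
W_react = rng.normal(scale=0.3, size=(d, d))      # stand-in for a learned reaction weight

def adr_step(X):
    A_sym = np.maximum(A, A.T)
    L = np.diag(A_sym.sum(axis=1)) - A_sym        # graph Laplacian
    diffusion = -L @ X                            # local smoothing of features
    out_deg = A.sum(axis=1, keepdims=True)
    P = np.where(out_deg > 0, A / np.maximum(out_deg, 1e-8), np.eye(n))  # row-stochastic transport
    advection = P.T @ X - X                       # features carried along directed edges
    reaction = np.tanh(X @ W_react)               # pointwise non-linear transformation
    return X + dt * (diffusion + advection + reaction)

print(np.round(adr_step(X), 2))
```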

Rapid Flood Inundation Forecast Using Fourier Neural Operator

  • paper_url: http://arxiv.org/abs/2307.16090
  • repo_url: None
  • paper_authors: Alexander Y. Sun, Zhi Li, Wonhyun Lee, Qixing Huang, Bridget R. Scanlon, Clint Dawson
  • for: 预测洪水淹没范围与水深,为洪水事件前及事件中的应急准备与响应提供关键信息。
  • methods: 结合基于过程的模型与数据驱动的机器学习方法,采用傅里叶神经算子(FNO)进行代理建模。
  • results: 使用六场历史暴雨事件的模拟水深数据(15分钟间隔)训练FNO模型,并在两场保留事件上测试;FNO优于基线U-Net模型,在所有预见期(最长3小时)内保持高预测精度,应用于新地点时表现良好,显示出较强的泛化能力。
    Abstract Flood inundation forecast provides critical information for emergency planning before and during flood events. Real time flood inundation forecast tools are still lacking. High-resolution hydrodynamic modeling has become more accessible in recent years, however, predicting flood extents at the street and building levels in real-time is still computationally demanding. Here we present a hybrid process-based and data-driven machine learning (ML) approach for flood extent and inundation depth prediction. We used the Fourier neural operator (FNO), a highly efficient ML method, for surrogate modeling. The FNO model is demonstrated over an urban area in Houston (Texas, U.S.) by training using simulated water depths (in 15-min intervals) from six historical storm events and then tested over two holdout events. Results show FNO outperforms the baseline U-Net model. It maintains high predictability at all lead times tested (up to 3 hrs) and performs well when applying to new sites, suggesting strong generalization skill.
    摘要 洪水泛洪预测提供了重要的紧急准备和应急管理之前和在洪水事件发生时的信息。实时洪水泛洪预测工具仍然缺乏。高分解力 hidrodynamic 模型在过去几年中变得更加可 accessible,但是在实时预测洪水泛洪范围和泛洪深度方面仍然是计算挑战。我们提出了一种 hybrid 过程基于的数据驱动机器学习(ML)方法,用于预测洪水泛洪范围和泛洪深度。我们使用了 Fourier 神经网络(FNO)模型,这是一种高效的 ML 方法,用于模拟器。FNO 模型在得克萨斯州HOUSTON 市区域上进行了训练,使用了六个历史洪水事件中的 simulate 水深数据(每 15 分钟一个数据点),然后在两个保留事件上进行测试。结果显示,FNO 模型在所有领先时间(最长 3 小时)中保持高度预测性,并在应用于新地点时表现良好,这表明其具有强大的泛化能力。

Using Implicit Behavior Cloning and Dynamic Movement Primitive to Facilitate Reinforcement Learning for Robot Motion Planning

  • paper_url: http://arxiv.org/abs/2307.16062
  • repo_url: None
  • paper_authors: Zengjie Zhang, Jayden Hong, Amir Soufi Enayati, Homayoun Najjaran
  • for: The paper is written to improve the efficiency and generalizability of reinforcement learning (RL) for motion planning of multi-degree-of-freedom robots.
  • methods: The proposed method uses implicit behavior cloning (IBC) and dynamic movement primitive (DMP) to improve the training speed and generalizability of an off-policy RL agent.
  • results: The proposed method is compared with conventional RL agents in simulation and real-robot experiments, showing faster training speed and higher scores.
    Abstract Reinforcement learning (RL) for motion planning of multi-degree-of-freedom robots still suffers from low efficiency in terms of slow training speed and poor generalizability. In this paper, we propose a novel RL-based robot motion planning framework that uses implicit behavior cloning (IBC) and dynamic movement primitive (DMP) to improve the training speed and generalizability of an off-policy RL agent. IBC utilizes human demonstration data to leverage the training speed of RL, and DMP serves as a heuristic model that transfers motion planning into a simpler planning space. To support this, we also create a human demonstration dataset using a pick-and-place experiment that can be used for similar studies. Comparison studies in simulation reveal the advantage of the proposed method over the conventional RL agents with faster training speed and higher scores. A real-robot experiment indicates the applicability of the proposed method to a simple assembly task. Our work provides a novel perspective on using motion primitives and human demonstration to leverage the performance of RL for robot applications.
    摘要 Translated into Simplified Chinese:利用强化学习(RL)的动作规划方法,多度 freedom 机器人的准确率仍然受到低效率的困扰,即训练速度慢和泛化能力差。在这篇论文中,我们提出了一种基于RL的机器人动作规划框架,使用隐式行为封装(IBC)和动态运动 primitives(DMP)来提高RLagent的训练速度和泛化能力。IBC利用人类示范数据来利用RL的训练速度,而DMP作为一种启发模型,将动作规划转移到简单的规划空间。为支持这一点,我们还创建了一个人类示范数据集,用于类似的研究。在模拟环境中的比较研究表明,提案方法比普通RL代理人具有更快的训练速度和更高的分数。一个真实机器人实验表明提案方法对简单的组装任务有应用性。我们的工作提供了一种新的思路,利用动作 primitives 和人类示范来提高RL的表现 для机器人应用。

Click-Conversion Multi-Task Model with Position Bias Mitigation for Sponsored Search in eCommerce

  • paper_url: http://arxiv.org/abs/2307.16060
  • repo_url: None
  • paper_authors: Yibo Wang, Yanbing Xue, Bo Liu, Musen Wen, Wenting Zhao, Stephen Guo, Philip S. Yu
  • for: This paper aims to mitigate position bias in ranking systems, particularly in e-commerce sponsored product search.
  • methods: The authors propose two position-bias-free CTR and CVR prediction models: Position-Aware Click-Conversion (PACC) and PACC via Position Embedding (PACC-PE). PACC is built upon probability decomposition, while PACC-PE utilizes neural networks to model product-specific position information as embedding.
  • results: The proposed models have better ranking effectiveness and can greatly alleviate position bias in both CTR and CVR prediction, as shown in experiments on the e-commerce sponsored product search dataset.
    Abstract Position bias, the phenomenon whereby users tend to focus on higher-ranked items of the search result list regardless of the actual relevance to queries, is prevailing in many ranking systems. Position bias in training data biases the ranking model, leading to increasingly unfair item rankings, click-through-rate (CTR), and conversion rate (CVR) predictions. To jointly mitigate position bias in both item CTR and CVR prediction, we propose two position-bias-free CTR and CVR prediction models: Position-Aware Click-Conversion (PACC) and PACC via Position Embedding (PACC-PE). PACC is built upon probability decomposition and models position information as a probability. PACC-PE utilizes neural networks to model product-specific position information as embedding. Experiments on the E-commerce sponsored product search dataset show that our proposed models have better ranking effectiveness and can greatly alleviate position bias in both CTR and CVR prediction.
    摘要 位置偏差是指用户倾向于关注搜索结果列表中排名靠前的条目,而不论其与查询的实际相关性如何,这一现象在许多排序系统中普遍存在。训练数据中的位置偏差会使排序模型产生偏差,导致条目排序、点击率(CTR)与转化率(CVR)预测愈发不公平。为同时缓解CTR与CVR预测中的位置偏差,我们提出了两种不受位置偏差影响的预测模型:位置感知点击-转化模型(PACC)和基于位置嵌入的PACC(PACC-PE)。PACC基于概率分解,将位置信息建模为概率;PACC-PE利用神经网络将商品相关的位置信息建模为嵌入。在电商赞助商品搜索数据集上的实验表明,我们提出的模型具有更好的排序效果,并能显著缓解CTR与CVR预测中的位置偏差。
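
PACC's exact formulation is not given here; the sketch below illustrates the underlying probability decomposition (the examination hypothesis): observed CTR factorizes into a position-dependent examination probability times a position-independent item relevance, so dividing by the position propensity removes the bias. All numbers are synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)
n_items, n_pos = 5, 3
relevance = rng.uniform(0.05, 0.4, size=n_items)   # true item CTR, independent of position
examine = np.array([1.0, 0.6, 0.3])                 # probability that a user examines each position

# Observed CTR under the examination hypothesis: P(click | item, pos) = P(examine | pos) * P(click | item).
observed_ctr = examine[None, :] * relevance[:, None]

# Debias: divide the observed CTR at each position by that position's examination propensity.
debiased = observed_ctr / examine[None, :]
print(np.allclose(debiased, relevance[:, None] * np.ones((1, n_pos))))  # True: position effect removed
```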

Evaluating the Robustness of Test Selection Methods for Deep Neural Networks

  • paper_url: http://arxiv.org/abs/2308.01314
  • repo_url: None
  • paper_authors: Qiang Hu, Yuejun Guo, Xiaofei Xie, Maxime Cordy, Wei Ma, Mike Papadakis, Yves Le Traon
  • for: This paper aims to investigate the reliability of multiple test selection methods for deep learning-based systems, and to identify potential pitfalls in their construction.
  • methods: The paper examines 11 test selection methods from top-tier venues, and conducts a study on five datasets with two model architectures per dataset to empirically confirm the existence of pitfalls.
  • results: The paper finds that methods for fault detection suffer from test data that are correctly classified but uncertain, or misclassified but confident, leading to a drop in test relative coverage of up to 86.85%. Additionally, methods for performance estimation are sensitive to the choice of intermediate-layer output, and can be less effective than random selection when using an inappropriate layer.
    Abstract Testing deep learning-based systems is crucial but challenging due to the required time and labor for labeling collected raw data. To alleviate the labeling effort, multiple test selection methods have been proposed where only a subset of test data needs to be labeled while satisfying testing requirements. However, we observe that such methods with reported promising results are only evaluated under simple scenarios, e.g., testing on original test data. This brings a question to us: are they always reliable? In this paper, we explore when and to what extent test selection methods fail for testing. Specifically, first, we identify potential pitfalls of 11 selection methods from top-tier venues based on their construction. Second, we conduct a study on five datasets with two model architectures per dataset to empirically confirm the existence of these pitfalls. Furthermore, we demonstrate how pitfalls can break the reliability of these methods. Concretely, methods for fault detection suffer from test data that are: 1) correctly classified but uncertain, or 2) misclassified but confident. Remarkably, the test relative coverage achieved by such methods drops by up to 86.85%. On the other hand, methods for performance estimation are sensitive to the choice of intermediate-layer output. The effectiveness of such methods can be even worse than random selection when using an inappropriate layer.
    摘要 测试深度学习系统是关键但困难的,因为需要大量的时间和劳动来标注采集的原始数据。为了减轻标注劳动,许多测试选择方法已经被提出,只需要标注一个子集的测试数据而不符合测试要求。然而,我们发现这些方法在报道了Promising结果后,很少被评估在复杂的场景下。这引发了我们的问题:这些方法是否总是可靠?在这篇论文中,我们探索测试选择方法在测试时会失败的情况。 Specifically, first, we identify potential pitfalls of 11 selection methods from top-tier venues based on their construction. Second, we conduct a study on five datasets with two model architectures per dataset to empirically confirm the existence of these pitfalls. Furthermore, we demonstrate how pitfalls can break the reliability of these methods. Concretely, methods for fault detection suffer from test data that are: 1) correctly classified but uncertain, or 2) misclassified but confident. Remarkably, the test relative coverage achieved by such methods drops by up to 86.85%. On the other hand, methods for performance estimation are sensitive to the choice of intermediate-layer output. The effectiveness of such methods can be even worse than random selection when using an inappropriate layer.

Unveiling Exotic Magnetic Phases in Fibonacci Quasicrystalline Stacking of Ferromagnetic Layers through Machine Learning

  • paper_url: http://arxiv.org/abs/2307.16052
  • repo_url: None
  • paper_authors: Pablo S. Cornaglia, Matias Nuñez, D. J. Garcia
  • for: 研究一种可由范德华磁性材料实现的铁磁层斐波那契准晶堆叠结构的磁学性质。
  • methods: 构建包含至多次近邻层间磁相互作用的模型,并利用机器学习方法探索该准晶系统中几何阻挫与磁有序之间的复杂关系,给出磁相图。
  • results: 发现了一种独特的铁磁交替螺旋相,其磁化强度随堆叠高度呈对数下降;此外还识别出其他共线与非共线磁相。
    Abstract In this study, we conduct a comprehensive theoretical analysis of a Fibonacci quasicrystalline stacking of ferromagnetic layers, potentially realizable using van der Waals magnetic materials. We construct a model of this magnetic heterostructure, which includes up to second neighbor interlayer magnetic interactions, that displays a complex relationship between geometric frustration and magnetic order in this quasicrystalline system. To navigate the parameter space and identify distinct magnetic phases, we employ a machine learning approach, which proves to be a powerful tool in revealing the complex magnetic behavior of this system. We offer a thorough description of the magnetic phase diagram as a function of the model parameters. Notably, we discover among other collinear and non-collinear phases, a unique ferromagnetic alternating helical phase. In this non-collinear quasiperiodic ferromagnetic configuration the magnetization decreases logarithmically with the stack height.
    摘要 在这项研究中,我们进行了详细的理论分析,涉及到费波纳契镁磁层的杂合堆叠,可能通过磁性van der Waals材料实现。我们构建了这种磁性异构体系的模型,包括最多的第二邻居层磁交互,这种系统显示了复杂的几何阻碍和磁ORDER之间的关系。为了探索参数空间并特征化不同磁相,我们使用机器学习方法,这证明了这种方法在揭示这种系统的复杂磁性行为上是一个强大工具。我们提供了磁相图的全面描述,其中包括了模型参数的函数。特别是,我们发现了一种独特的梯形扁平磁相,在堆高上呈指数减少的情况下,磁化强度下降。

Okapi: Instruction-tuned Large Language Models in Multiple Languages with Reinforcement Learning from Human Feedback

  • paper_url: http://arxiv.org/abs/2307.16039
  • repo_url: https://github.com/nlp-uoregon/okapi
  • paper_authors: Viet Dac Lai, Chien Van Nguyen, Nghia Trung Ngo, Thuat Nguyen, Franck Dernoncourt, Ryan A. Rossi, Thien Huu Nguyen
  • for: 本研究旨在提高大语言模型(LLM)的可获取性和影响力:通过指令调整使LLM更好地契合人类期望,从而展现出色的能力。
  • methods: 本研究结合使用监督微调(SFT)和基于人类反馈的强化学习(RLHF)两种方法进行多语言指令调整,以在多种语言上取得最佳性能。
  • results: 我们的实验表明,在不同的基础模型和数据集上,使用RLHF进行多语言指令调整均优于仅使用SFT。
    Abstract A key technology for the development of large language models (LLMs) involves instruction tuning that helps align the models' responses with human expectations to realize impressive learning abilities. Two major approaches for instruction tuning characterize supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF), which are currently applied to produce the best commercial LLMs (e.g., ChatGPT). To improve the accessibility of LLMs for research and development efforts, various instruction-tuned open-source LLMs have also been introduced recently, e.g., Alpaca, Vicuna, to name a few. However, existing open-source LLMs have only been instruction-tuned for English and a few popular languages, thus hindering their impacts and accessibility to many other languages in the world. Among a few very recent work to explore instruction tuning for LLMs in multiple languages, SFT has been used as the only approach to instruction-tune LLMs for multiple languages. This has left a significant gap for fine-tuned LLMs based on RLHF in diverse languages and raised important questions on how RLHF can boost the performance of multilingual instruction tuning. To overcome this issue, we present Okapi, the first system with instruction-tuned LLMs based on RLHF for multiple languages. Okapi introduces instruction and response-ranked data in 26 diverse languages to facilitate the experiments and development of future multilingual LLM research. We also present benchmark datasets to enable the evaluation of generative LLMs in multiple languages. Our experiments demonstrate the advantages of RLHF for multilingual instruction over SFT for different base models and datasets. Our framework and resources are released at https://github.com/nlp-uoregon/Okapi.
    摘要 开发大语言模型(LLM)的关键技术之一是指令调整,以使模型的回答与人类期望相一致,从而实现很好的学习能力。目前,supervised fine-tuning(SFT)和人类反馈强化学习(RLHF)是两种主要的指令调整方法,用于生产最佳商业LLM(如ChatGPT)。为了提高LLM研究和开发的可获取性,各种经过指令调整的开源LLM也已经被引入,如Alpaca和Vicuna等。然而,现有的开源LLM只在英语和一些流行的语言上进行了指令调整,这限制了它们在全球各种语言中的影响和可用性。在最近少数探索多语言指令调整的工作中,只使用了SFT作为唯一的调整方法。这在多语言场景下基于RLHF的微调方面留下了很大的空白,也引出了RLHF如何提升多语言指令调整性能的重要问题。为了解决这个问题,我们提出了Okapi,首个基于RLHF的多语言指令调整系统。Okapi提供了26种语言的指令数据和回答排名数据,以便进行实验并推动未来多语言LLM研究的发展。我们还提供了用于评估多语言生成式LLM的基准数据集。我们的实验表明,在不同的基础模型和数据集上,RLHF在多语言指令调整中均优于SFT。我们的框架和资源在https://github.com/nlp-uoregon/Okapi上发布。
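
For readers unfamiliar with how response-ranked data is consumed, the sketch below shows the standard Bradley-Terry-style pairwise loss used to fit a reward model in RLHF pipelines. It is a generic PyTorch illustration, not Okapi's training code, and the tiny bag-of-embeddings scorer merely stands in for an LLM with a scalar reward head.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyRewardModel(nn.Module):
    """Stand-in for an LLM with a reward head: a bag-of-embeddings scorer."""
    def __init__(self, vocab_size=1000, dim=64):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, dim)
        self.head = nn.Linear(dim, 1)

    def forward(self, token_ids):              # token_ids: (batch, seq_len)
        pooled = self.emb(token_ids).mean(dim=1)
        return self.head(pooled).squeeze(-1)   # (batch,) scalar rewards

def ranking_loss(r_chosen, r_rejected):
    """Bradley-Terry pairwise loss: prefer the higher-ranked response."""
    return -F.logsigmoid(r_chosen - r_rejected).mean()

model = TinyRewardModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Fake batch of (prompt+chosen, prompt+rejected) token ids from ranked data.
chosen   = torch.randint(0, 1000, (8, 32))
rejected = torch.randint(0, 1000, (8, 32))

loss = ranking_loss(model(chosen), model(rejected))
loss.backward()
opt.step()
print("pairwise ranking loss:", float(loss))
```

The fitted reward model then scores policy outputs during the RL stage; the details of that stage (PPO or otherwise) are outside this sketch.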

Developing novel ligands with enhanced binding affinity for the sphingosine 1-phosphate receptor 1 using machine learning

  • paper_url: http://arxiv.org/abs/2307.16037
  • repo_url: None
  • paper_authors: Colin Zhang, Yang Ha
  • for: 这项研究旨在利用机器学习模型加速多发性硬化症(MS)的药物发现过程,并通过分析蛋白-配体相互作用的化学特性,为药物设计提供新的思路。
  • methods: 该研究微调了一个将化学式转化为数学向量的自编码器机器学习模型,并基于西尼莫德(siponimod)生成了500多个分子变体,其中25个分子对靶蛋白S1PR1具有更高的预测结合亲和力。
  • results: 该研究发现了6种有药理性和易合成的药物候选者,并通过分析这些药物与S1PR1的绑定交互,探讨了一些靶蛋白-药物交互的化学特性,这些结果表明机器学习可以加速药物发现过程,并为药物设计提供新的视角。
    Abstract Multiple sclerosis (MS) is a debilitating neurological disease affecting nearly one million people in the United States. Sphingosine-1-phosphate receptor 1, or S1PR1, is a protein target for MS. Siponimod, a ligand of S1PR1, was approved by the FDA in 2019 for MS treatment, but there is a demonstrated need for better therapies. To this end, we finetuned an autoencoder machine learning model that converts chemical formulas into mathematical vectors and generated over 500 molecular variants based on siponimod, out of which 25 compounds had higher predicted binding affinity to S1PR1. The model was able to generate these ligands in just under one hour. Filtering these compounds led to the discovery of six promising candidates with good drug-like properties and ease of synthesis. Furthermore, by analyzing the binding interactions for these ligands, we uncovered several chemical properties that contribute to high binding affinity to S1PR1. This study demonstrates that machine learning can accelerate the drug discovery process and reveal new insights into protein-drug interactions.
    摘要 多发性硬化症(MS)是一种使人衰弱的神经系统疾病,在美国影响着近一百万人。鞘氨醇-1-磷酸受体1(S1PR1)是MS的一个蛋白靶点。S1PR1的配体西尼莫德(siponimod)于2019年获FDA批准用于MS治疗,但仍需要更好的疗法。为此,我们微调了一个将化学式转化为数学向量的自编码器机器学习模型,并基于siponimod生成了500多个分子变体,其中25个化合物对S1PR1具有更高的预测结合亲和力。该模型在不到一小时内即可生成这些配体。经过筛选,我们发现了6个具有良好类药性质且易于合成的候选化合物。此外,通过分析这些配体的结合相互作用,我们揭示了若干有助于与S1PR1高亲和力结合的化学性质。这项研究表明,机器学习可以加速药物发现过程,并为蛋白-药物相互作用提供新的见解。

MUSE: Multi-View Contrastive Learning for Heterophilic Graphs

  • paper_url: http://arxiv.org/abs/2307.16026
  • repo_url: None
  • paper_authors: Mengyi Yuan, Minjie Chen, Xiang Li
    for: 这篇文章的目的是提出一种基于多视图对照学习的自动学习模型,即MUSE,以解决传统Graph Neural Networks(GNN)中的标签依赖和泛化性问题。methods: 该模型使用了两种视图来捕捉egos节点和其邻居的信息,即GNNs增强了对照学习的视图,然后将这两个视图融合以生成节点表示。此外,该模型还使用了对照强化和信息整合控制器来模型节点邻居相似性的多样性。results: 对于9个benchmark数据集,我们的实验结果表明MUSE模型在节点分类和聚类任务中具有显著的效果。
    Abstract In recent years, self-supervised learning has emerged as a promising approach in addressing the issues of label dependency and poor generalization performance in traditional GNNs. However, existing self-supervised methods have limited effectiveness on heterophilic graphs, due to the homophily assumption that results in similar node representations for connected nodes. In this work, we propose a multi-view contrastive learning model for heterophilic graphs, namely, MUSE. Specifically, we construct two views to capture the information of the ego node and its neighborhood by GNNs enhanced with contrastive learning, respectively. Then we integrate the information from these two views to fuse the node representations. Fusion contrast is utilized to enhance the effectiveness of fused node representations. Further, considering that the influence of neighboring contextual information on information fusion may vary across different ego nodes, we employ an information fusion controller to model the diversity of node-neighborhood similarity at both the local and global levels. Finally, an alternating training scheme is adopted to ensure that unsupervised node representation learning and information fusion controller can mutually reinforce each other. We conduct extensive experiments to evaluate the performance of MUSE on 9 benchmark datasets. Our results show the effectiveness of MUSE on both node classification and clustering tasks.
    摘要 近年来,自监督学习已成为解决传统GNN中标签依赖与泛化性能不佳问题的一种有前景的方法。然而,由于同质性假设会使相连节点获得相似的节点表示,现有的自监督方法在异质图上的效果有限。在这项工作中,我们提出了一种面向异质图的多视图对比学习模型MUSE。具体而言,我们分别构建两个视图,利用结合对比学习增强的GNN来捕捉中心节点及其邻域的信息,然后将这两个视图的信息整合以融合节点表示,并利用融合对比来增强融合后节点表示的有效性。此外,考虑到邻域上下文信息对信息融合的影响在不同中心节点之间可能存在差异,我们引入信息融合控制器,在局部和全局两个层面上刻画节点-邻域相似度的多样性。最后,我们采用交替训练方案,使无监督节点表示学习与信息融合控制器能够相互促进。我们在9个基准数据集上进行了大量实验,结果表明MUSE在节点分类与聚类任务上均有效。
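
A minimal sketch of the two-view idea (ours, not the authors' implementation): an ego view and a neighborhood view of each node are aligned with an InfoNCE-style loss and then fused; plain linear layers stand in for the contrastive-learning-enhanced GNN encoders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def info_nce(z1, z2, tau=0.5):
    """InfoNCE-style loss: node i's ego view should match its own neighborhood view."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / tau                        # (n, n) similarity matrix
    targets = torch.arange(z1.size(0))
    return F.cross_entropy(logits, targets)

n_nodes, in_dim, hid = 100, 32, 64
x_ego   = torch.randn(n_nodes, in_dim)                # ego-node features
x_neigh = torch.randn(n_nodes, in_dim)                # aggregated neighborhood features

ego_enc   = nn.Linear(in_dim, hid)                    # stand-ins for the two GNN views
neigh_enc = nn.Linear(in_dim, hid)
fuse      = nn.Linear(2 * hid, hid)                   # simple fusion of the two views

z_ego, z_neigh = ego_enc(x_ego), neigh_enc(x_neigh)
loss = info_nce(z_ego, z_neigh)                       # cross-view contrast
z_fused = fuse(torch.cat([z_ego, z_neigh], dim=1))    # fused node representations

loss.backward()
print("contrastive loss:", float(loss), "fused shape:", tuple(z_fused.shape))
```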

Discrete neural nets and polymorphic learning

  • paper_url: http://arxiv.org/abs/2308.00677
  • repo_url: https://github.com/caten2/tripods2021ua
  • paper_authors: Charlotte Aten
  • for: 这篇论文旨在以统一的视角阐述神经网络与泛代数(universal algebra)之间的关系,并介绍一种基于关系结构多态(polymorphisms of relational structures)的学习算法。
  • methods: 这篇论文将 Murskiĭ 在泛代数中的经典定理与 Cybenko 的神经网络通用逼近定理加以对照,并在此基础上提出了一种基于关系结构多态的学习算法。
  • results: 这篇论文的结果表明,该学习算法可以用于解决一个经典的学习任务。
    Abstract Theorems from universal algebra such as that of Murski\u{i} from the 1970s have a striking similarity to universal approximation results for neural nets along the lines of Cybenko's from the 1980s. We consider here a discrete analogue of the classical notion of a neural net which places these results in a unified setting. We introduce a learning algorithm based on polymorphisms of relational structures and show how to use it for a classical learning task.
    摘要 泛代数中的定理(如 Murskiĭ 在20世纪70年代的结果)与 Cybenko 在20世纪80年代给出的神经网络通用逼近结果有着惊人的相似之处。我们在此考虑经典神经网络概念的一个离散类比,将这些结果置于一个统一的框架中。我们介绍了一种基于关系结构多态的学习算法,并展示了如何将其用于一个经典的学习任务。
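
The central object, a polymorphism of a relational structure, has a concrete computational reading: an operation on the universe that, applied coordinatewise to tuples of related elements, yields related elements again. The snippet below (written independently of the paper's repository) checks this property for a small graph.

```python
from itertools import product

# A relational structure: vertex set and a binary relation (edges of a 4-cycle).
vertices = [0, 1, 2, 3]
edges = {(0, 1), (1, 0), (1, 2), (2, 1), (2, 3), (3, 2), (3, 0), (0, 3)}

def is_polymorphism(op, arity):
    """op takes `arity` vertices to a vertex.  It is a polymorphism if applying
    it coordinatewise to any choice of `arity` edges again yields an edge."""
    for edge_tuple in product(edges, repeat=arity):
        heads = tuple(e[0] for e in edge_tuple)
        tails = tuple(e[1] for e in edge_tuple)
        if (op(*heads), op(*tails)) not in edges:
            return False
    return True

# Projections are always polymorphisms; an arbitrary candidate operation may not be.
first_projection = lambda x, y: x
sum_mod_4 = lambda x, y: (x + y) % 4

print("projection is a polymorphism:", is_polymorphism(first_projection, 2))
print("(x + y) mod 4 is a polymorphism:", is_polymorphism(sum_mod_4, 2))
```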

Fuzzy Logic Visual Network (FLVN): A neuro-symbolic approach for visual features matching

  • paper_url: http://arxiv.org/abs/2307.16019
  • repo_url: https://gitlab.com/grains2/flvn
  • paper_authors: Francesco Manigrasso, Lia Morra, Fabrizio Lamberti
  • for: 这个研究目的是实现具有symbolic知识表示和深度神经网络学习的neuro-symbolic整合。
  • methods: 这个研究使用了Logic Tensor Networks (LTNs)来将背景知识转换为可微分的操作,并将其应用到零例学习(ZSL)分类任务中。
  • results: 这个研究提出了Fuzzy Logic Visual Network (FLVN),它在neuro-symbolic LTN框架下学习了一个可视 Semantic embedding 空间,并将内在知识(例如类别和概念阶层)统一到这个 embedding 空间中。 FLVN 在 Generalized ZSL(GZSL)测试 benchmark 上表现出色,与其他最新的 ZSL 方法相比,具有较少的计算负载。
    Abstract Neuro-symbolic integration aims at harnessing the power of symbolic knowledge representation combined with the learning capabilities of deep neural networks. In particular, Logic Tensor Networks (LTNs) allow to incorporate background knowledge in the form of logical axioms by grounding a first order logic language as differentiable operations between real tensors. Yet, few studies have investigated the potential benefits of this approach to improve zero-shot learning (ZSL) classification. In this study, we present the Fuzzy Logic Visual Network (FLVN) that formulates the task of learning a visual-semantic embedding space within a neuro-symbolic LTN framework. FLVN incorporates prior knowledge in the form of class hierarchies (classes and macro-classes) along with robust high-level inductive biases. The latter allow, for instance, to handle exceptions in class-level attributes, and to enforce similarity between images of the same class, preventing premature overfitting to seen classes and improving overall performance. FLVN reaches state of the art performance on the Generalized ZSL (GZSL) benchmarks AWA2 and CUB, improving by 1.3% and 3%, respectively. Overall, it achieves competitive performance to recent ZSL methods with less computational overhead. FLVN is available at https://gitlab.com/grains2/flvn.
    摘要 In this study, we present the Fuzzy Logic Visual Network (FLVN), which formulates the task of learning a visual-semantic embedding space within a neuro-symbolic LTN framework. FLVN incorporates prior knowledge in the form of class hierarchies and robust high-level inductive biases, allowing for exception handling and similarity enforcement between images of the same class. This improves overall performance and reduces premature overfitting to seen classes.FLVN achieves state-of-the-art performance on the Generalized ZSL (GZSL) benchmarks AWA2 and CUB, improving by 1.3% and 3%, respectively. It also achieves competitive performance to recent ZSL methods with less computational overhead. FLVN is available at .
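
LTNs ground logical connectives as differentiable operations on truth values in [0, 1]. The toy fragment below (product t-norm and Reichenbach implication; not FLVN's code, and the predicate names are invented for illustration) turns the class-hierarchy axiom "every instance of a class is also an instance of its macro-class" into a loss that gradient descent can minimize.

```python
import torch
import torch.nn as nn

# Fuzzy connectives on truth values in [0, 1] (product t-norm family).
def f_and(a, b):      return a * b
def f_implies(a, b):  return 1 - a + a * b        # Reichenbach implication
def f_forall(truths): return truths.mean()        # soft universal quantifier

# Two tiny "predicates" mapping image features to membership degrees.
feat_dim = 16
is_class      = nn.Sequential(nn.Linear(feat_dim, 1), nn.Sigmoid())   # e.g. a class
is_macroclass = nn.Sequential(nn.Linear(feat_dim, 1), nn.Sigmoid())   # its macro-class

x = torch.randn(32, feat_dim)                      # a batch of image embeddings

# Axiom: forall x, is_class(x) -> is_macroclass(x); satisfaction degree in [0, 1].
sat = f_forall(f_implies(is_class(x).squeeze(1), is_macroclass(x).squeeze(1)))
loss = 1 - sat                                     # maximize axiom satisfaction

loss.backward()
print("axiom satisfaction:", float(sat))
```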

eess.IV - 2023-07-30

Unsupervised Decomposition Networks for Bias Field Correction in MR Image

  • paper_url: http://arxiv.org/abs/2307.16219
  • repo_url: https://github.com/leongdong/bias-decomposition-networks
  • paper_authors: Dong Liang, Xingyu Qiu, Kuanquan Wang, Gongning Luo, Wei Wang, Yashu Liu
  • for: 这项研究旨在提出一种无需监督学习的分解网络,用于从受偏置影响的MR图像中估计偏置场。
  • methods: 该方法由一组分解网络构成,包括一个分割部分和一个估计部分,两者交替优化,以对受偏置影响的MR图像进行分解。
  • results: 实验结果表明,该方法可以准确地估算偏差场并生成更好的偏差 corrections。 codes 可以在以下链接中找到:https://github.com/LeongDong/Bias-Decomposition-Networks。
    Abstract Bias field, which is caused by imperfect MR devices or imaged objects, introduces intensity inhomogeneity into MR images and degrades the performance of MR image analysis methods. Many retrospective algorithms were developed to facilitate the bias correction, to which the deep learning-based methods outperformed. However, in the training phase, the supervised deep learning-based methods heavily rely on the synthesized bias field. As the formation of the bias field is extremely complex, it is difficult to mimic the true physical property of MR images by synthesized data. While bias field correction and image segmentation are strongly related, the segmentation map is precisely obtained by decoupling the bias field from the original MR image, and the bias value is indicated by the segmentation map in reverse. Thus, we proposed novel unsupervised decomposition networks that are trained only with biased data to obtain the bias-free MR images. Networks are made up of: a segmentation part to predict the probability of every pixel belonging to each class, and an estimation part to calculate the bias field, which are optimized alternately. Furthermore, loss functions based on the combination of fuzzy clustering and the multiplicative bias field are also devised. The proposed loss functions introduce the smoothness of bias field and construct the soft relationships among different classes under intra-consistency constraints. Extensive experiments demonstrate that the proposed method can accurately estimate bias fields and produce better bias correction results. The code is available on the link: https://github.com/LeongDong/Bias-Decomposition-Networks.
    摘要 偏置场由不完美的MR设备或被成像物体所引起,会在MR图像中引入强度不均匀性,并降低MR图像分析方法的性能。许多回顾性算法已被开发用于偏置校正,其中基于深度学习的方法表现更优。然而,在训练阶段,有监督的深度学习方法严重依赖于合成的偏置场。由于偏置场的形成极其复杂,难以通过合成数据模拟MR图像的真实物理特性。偏置场校正与图像分割密切相关:分割图恰可通过将偏置场从原始MR图像中解耦而得到,反过来,分割图也指示了偏置值。因此,我们提出了一种新的无监督分解网络,仅用带偏置的数据进行训练即可获得无偏置的MR图像。网络由两部分组成:一个分割部分用于预测每个像素属于各类别的概率,一个估计部分用于计算偏置场,两者交替优化。此外,我们还设计了结合模糊聚类与乘性偏置场的损失函数,该损失函数引入了偏置场的平滑性,并在类内一致性约束下构建了不同类别之间的软关系。大量实验表明,我们的方法可以准确地估计偏置场并产生更好的偏置校正结果。代码可在以下链接获取:https://github.com/LeongDong/Bias-Decomposition-Networks。
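
For context on the multiplicative model the abstract refers to (observed = true image x smooth bias field), the NumPy sketch below fits a low-order 2D polynomial to the log-intensity and divides it out. This is a classical-style illustration of the bias model on simulated data, not the paper's decomposition network.

```python
import numpy as np

def polynomial_basis(h, w, order=2):
    """Low-order 2D polynomial basis evaluated on the pixel grid (smooth fields)."""
    ys, xs = np.mgrid[0:h, 0:w]
    ys, xs = ys / h - 0.5, xs / w - 0.5
    cols = [xs**i * ys**j for i in range(order + 1)
                          for j in range(order + 1 - i)]
    return np.stack([c.ravel() for c in cols], axis=1)     # (h*w, n_basis)

# Simulate a biased MR slice: piecewise-constant tissue * smooth multiplicative bias.
h, w = 64, 64
rng = np.random.default_rng(0)
tissue = np.where(rng.random((h, w)) > 0.5, 1.0, 0.4)      # two "tissue classes"
ys, xs = np.mgrid[0:h, 0:w]
bias = np.exp(0.8 * (xs / w - 0.5) + 0.5 * (ys / h - 0.5))  # smooth bias field
observed = tissue * bias

# Fit a smooth field to the log-intensity and treat it as the bias estimate.
B = polynomial_basis(h, w)
coef, *_ = np.linalg.lstsq(B, np.log(observed).ravel(), rcond=None)
bias_est = np.exp((B @ coef).reshape(h, w))
corrected = observed / bias_est                             # bias-corrected image

# The bias field is only defined up to scale, so normalize both before comparing.
err = np.abs(bias_est / bias_est.mean() - bias / bias.mean()).mean()
print("mean absolute error of estimated bias (up to scale):", round(float(err), 4))
```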

Gastrointestinal Mucosal Problems Classification with Deep Learning

  • paper_url: http://arxiv.org/abs/2307.16198
  • repo_url: None
  • paper_authors: Mohammadhasan Goharian, Vahid Goharian, Hamidreza Bolhasani
  • for: 旨在检测胃肠粘膜变化,早期诊断和预防胃肠癌。
  • methods: 使用深度学习算法,特别是基于卷积神经网络(CNN)的迁移学习(TL)。
  • results: 在测试图像上,模型精度达93%,并在实际的内窥镜和结肠镜视频中进行了预测。
    Abstract Gastrointestinal mucosal changes can cause cancers after some years and early diagnosing them can be very useful to prevent cancers and early treatment. In this article, 8 classes of mucosal changes and anatomical landmarks including Polyps, Ulcerative Colitis, Esophagitis, Normal Z-Line, Normal Pylorus, Normal Cecum, Dyed Lifted Polyps, and Dyed Lifted Margin were predicted by deep learning. We used neural networks in this article. It is a black box artificial intelligence algorithm that works like a human neural system. In this article, Transfer Learning (TL) based on the Convolutional Neural Networks (CNNs), which is one of the well-known types of neural networks in image processing is used. We compared some famous CNN architecture including VGG, Inception, Xception, and ResNet. Our best model got 93% accuracy in test images. At last, we used our model in some real endoscopy and colonoscopy movies to classify problems.
    摘要 胃肠粘膜变化经过若干年后可能导致癌变,早期诊断有助于预防癌变并尽早治疗。在这篇文章中,我们利用深度学习预测了8类粘膜变化与解剖标志,包括息肉、溃疡性结肠炎、食管炎、正常Z线、正常幽门、正常盲肠、染色抬举息肉和染色抬举边缘。我们使用了神经网络进行预测,这是一种工作方式类似人类神经系统的黑盒人工智能算法。本文采用了基于卷积神经网络(CNN)的迁移学习(TL),这是图像处理中广泛使用的一类神经网络。我们比较了若干著名的CNN架构,包括VGG、Inception、Xception和ResNet。我们的最佳模型在测试图像上达到了93%的准确率。最后,我们将模型应用于一些真实的内窥镜和结肠镜视频以对问题进行分类。
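
A generic transfer-learning recipe in the spirit of the abstract, sketched in PyTorch: load an ImageNet-pretrained VGG backbone, freeze its convolutional features, and replace the classifier head with an 8-way output. The backbone choice, hyperparameters, and preprocessing here are assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 8   # Polyps, Ulcerative Colitis, Esophagitis, Normal Z-Line, ...

# Load an ImageNet-pretrained VGG16 and freeze its convolutional features.
model = models.vgg16(weights=models.VGG16_Weights.DEFAULT)
for p in model.features.parameters():
    p.requires_grad = False

# Replace the final classifier layer with an 8-way head.
model.classifier[6] = nn.Linear(model.classifier[6].in_features, NUM_CLASSES)

optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4)
criterion = nn.CrossEntropyLoss()

# One dummy training step on random 224x224 endoscopy-like images.
images = torch.randn(4, 3, 224, 224)
labels = torch.randint(0, NUM_CLASSES, (4,))
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
print("loss:", float(loss))
```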

StarSRGAN: Improving Real-World Blind Super-Resolution

  • paper_url: http://arxiv.org/abs/2307.16169
  • repo_url: https://github.com/kynthesis/StarSRGAN
  • paper_authors: Khoa D. Vo, Len T. Bui
  • for: This paper is written for improving the blind super-resolution (SR) in computer vision, aiming to enhance the resolution of low-resolution images without prior knowledge of the degradation process.
  • methods: The paper introduces StarSRGAN, a novel GAN model that utilizes 5 various architectures to achieve state-of-the-art (SOTA) performance in blind SR tasks. The model is designed to provide visually compelling outcomes with improved super-resolved quality.
  • results: The experimental comparisons with Real-ESRGAN show that StarSRGAN achieves roughly 10% better performance on the MANIQA and AHIQ measures, while StarSRGAN Lite provides approximately 7.5 times faster reconstruction speed with only a slight decrease in image quality. The codes are available at https://github.com/kynthesis/StarSRGAN.
    Abstract The aim of blind super-resolution (SR) in computer vision is to improve the resolution of an image without prior knowledge of the degradation process that caused the image to be low-resolution. The State of the Art (SOTA) model Real-ESRGAN has advanced perceptual loss and produced visually compelling outcomes using more complex degradation models to simulate real-world degradations. However, there is still room to improve the super-resolved quality of Real-ESRGAN by implementing recent techniques. This research paper introduces StarSRGAN, a novel GAN model designed for blind super-resolution tasks that utilize 5 various architectures. Our model provides new SOTA performance with roughly 10% better on the MANIQA and AHIQ measures, as demonstrated by experimental comparisons with Real-ESRGAN. In addition, as a compact version, StarSRGAN Lite provides approximately 7.5 times faster reconstruction speed (real-time upsampling from 540p to 4K) but can still keep nearly 90% of image quality, thereby facilitating the development of a real-time SR experience for future research. Our codes are released at https://github.com/kynthesis/StarSRGAN.
    摘要 目的是提高计算机视觉中的盲超分辨率(SR),无需先知道降低过程的信息,以提高图像的分辨率。现有的最佳实践(SOTA)模型Real-ESRGAN已经使用了更复杂的降低模型来模拟实际世界中的降低。然而,还有余地可以提高Real-ESRGAN中的超分辨率质量。这篇研究论文介绍了StarSRGAN,一种新的GAN模型,用于盲SR任务。我们的模型使用了5种不同的建筑,并提供了新的SOTA性能,在MANIQA和AHIQ测试中比Real-ESRGAN提高了约10%。此外,我们还提供了一个快速重建速度版本StarSRGAN Lite,可以在540p到4K的快速扩展中实现实时SR体验。我们的代码在https://github.com/kynthesis/StarSRGAN上发布。

Structure-Preserving Synthesis: MaskGAN for Unpaired MR-CT Translation

  • paper_url: http://arxiv.org/abs/2307.16143
  • repo_url: https://github.com/HieuPhan33/MaskGAN
  • paper_authors: Minh Hieu Phan, Zhibin Liao, Johan W. Verjans, Minh-Son To
  • for: 这篇论文旨在提供一种可靠且低成本的医学图像合成方法,以便在配对数据稀缺的情况下完成跨模态合成。
  • methods: 这篇论文在CycleGAN架构的基础上,将自动提取的粗略掩码输入网络,以保持解剖结构的一致性。
  • results: 实验结果显示,MaskGAN在一个儿科数据集上表现出色;该数据集中的MR与CT扫描因儿童快速生长而严重错位,而MaskGAN无需专家标注即可保持解剖结构的一致性。
    Abstract Medical image synthesis is a challenging task due to the scarcity of paired data. Several methods have applied CycleGAN to leverage unpaired data, but they often generate inaccurate mappings that shift the anatomy. This problem is further exacerbated when the images from the source and target modalities are heavily misaligned. Recently, current methods have aimed to address this issue by incorporating a supplementary segmentation network. Unfortunately, this strategy requires costly and time-consuming pixel-level annotations. To overcome this problem, this paper proposes MaskGAN, a novel and cost-effective framework that enforces structural consistency by utilizing automatically extracted coarse masks. Our approach employs a mask generator to outline anatomical structures and a content generator to synthesize CT contents that align with these structures. Extensive experiments demonstrate that MaskGAN outperforms state-of-the-art synthesis methods on a challenging pediatric dataset, where MR and CT scans are heavily misaligned due to rapid growth in children. Specifically, MaskGAN excels in preserving anatomical structures without the need for expert annotations. The code for this paper can be found at https://github.com/HieuPhan33/MaskGAN.
    摘要 医学图像生成是一项具有挑战性的任务,因为精度匹配数据罕见。许多方法使用CycleGAN来利用无对数据,但它们经常生成错误的映射,导致身体结构的偏移。这个问题更加严重当图像来源和目标模式之间的偏移很大。目前的方法通过添加辅助分割网络来解决这个问题,但这需要成本和时间昂贵的像素级别标注。为了缓解这个问题,这篇论文提出了MaskGAN,一种新的和经济的框架,通过自动提取的粗略Mask来保持结构一致性。我们的方法使用Mask生成器将体结构析出,并使用内容生成器Synthesize CT内容,与这些结构相对应。我们的实验表明,MaskGAN在一个复杂的儿童数据集上表现出色,特别是在MR和CT扫描中存在快速增长的儿童身体中,具有优秀的结构保持性,而不需要专家标注。相关代码可以在https://github.com/HieuPhan33/MaskGAN中找到。
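
One way to read the structural-consistency idea is as an extra loss term that compares the coarse mask of the source image with the mask predicted on the synthesized image. The fragment below is our own Dice-based sketch of such a term, with `mask_of` standing in for the automatic coarse-mask extractor; it is not MaskGAN's code.

```python
import torch

def dice_loss(pred, target, eps=1e-6):
    """Soft Dice loss between two [0, 1] masks of shape (batch, 1, H, W)."""
    inter = (pred * target).sum(dim=(1, 2, 3))
    union = pred.sum(dim=(1, 2, 3)) + target.sum(dim=(1, 2, 3))
    return (1 - (2 * inter + eps) / (union + eps)).mean()

def structure_loss(real_mr_mask, fake_ct, mask_of):
    """Penalize anatomy drift: the mask of the synthesized CT should match
    the (automatically extracted) mask of the source MR."""
    return dice_loss(mask_of(fake_ct), real_mr_mask)

# Toy usage with random tensors and a trivial stand-in mask extractor.
mask_of = lambda img: torch.sigmoid(img.mean(dim=1, keepdim=True))
real_mr_mask = (torch.rand(2, 1, 64, 64) > 0.5).float()
fake_ct = torch.randn(2, 3, 64, 64, requires_grad=True)   # generator output stand-in

loss = structure_loss(real_mr_mask, fake_ct, mask_of)
loss.backward()      # gradients flow back into the generator output
print("structural-consistency loss:", float(loss))
```

In a full unpaired-translation setup this term would be added to the usual adversarial and cycle-consistency losses.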

Implicit Neural Representation in Medical Imaging: A Comparative Survey

  • paper_url: http://arxiv.org/abs/2307.16142
  • repo_url: https://github.com/mindflow-institue/awesome-implicit-neural-representations-in-medical-imaging
  • paper_authors: Amirali Molaei, Amirhossein Aminimehr, Armin Tavakoli, Amirhossein Kazerouni, Bobby Azad, Reza Azad, Dorit Merhof
  • for: This survey provides a comprehensive overview of implicit neural representations (INRs) in the field of medical imaging, exploring their applications and advantages in various medical imaging tasks.
  • methods: The survey discusses the use of INRs in image reconstruction, segmentation, registration, novel view synthesis, and compression, highlighting their resolution-agnostic nature, memory efficiency, ability to avoid locality biases, and differentiability.
  • results: The survey addresses the challenges and considerations specific to medical imaging data, such as data availability, computational complexity, and dynamic clinical scene analysis, and identifies future research directions and opportunities, including integration with multi-modal imaging, real-time and interactive systems, and domain adaptation for clinical decision support.
    Abstract Implicit neural representations (INRs) have gained prominence as a powerful paradigm in scene reconstruction and computer graphics, demonstrating remarkable results. By utilizing neural networks to parameterize data through implicit continuous functions, INRs offer several benefits. Recognizing the potential of INRs beyond these domains, this survey aims to provide a comprehensive overview of INR models in the field of medical imaging. In medical settings, numerous challenging and ill-posed problems exist, making INRs an attractive solution. The survey explores the application of INRs in various medical imaging tasks, such as image reconstruction, segmentation, registration, novel view synthesis, and compression. It discusses the advantages and limitations of INRs, highlighting their resolution-agnostic nature, memory efficiency, ability to avoid locality biases, and differentiability, enabling adaptation to different tasks. Furthermore, the survey addresses the challenges and considerations specific to medical imaging data, such as data availability, computational complexity, and dynamic clinical scene analysis. It also identifies future research directions and opportunities, including integration with multi-modal imaging, real-time and interactive systems, and domain adaptation for clinical decision support. To facilitate further exploration and implementation of INRs in medical image analysis, we have provided a compilation of cited studies along with their available open-source implementations on \href{https://github.com/mindflow-institue/Awesome-Implicit-Neural-Representations-in-Medical-imaging}. Finally, we aim to consistently incorporate the most recent and relevant papers regularly.
    摘要 启发神经表示(INR)在场景重建和计算机图形领域已经崭新出名,表现出色。通过使用神经网络来参数化数据通过间接连续函数,INR提供了多个优势。认识到INR在医疗领域之外的潜在应用,这份报告提供了医学成像领域INR模型的全面回顾。在医疗设置下,存在许多复杂和不稳定的问题,使INR成为一种吸引人的解决方案。本报告探讨了INR在各种医学成像任务中的应用,如图像重建、分割、注册、新视图生成和压缩。它讨论了INR的优点和限制,包括其分辨率不依赖、内存效率高、避免地方偏好和可导 differentiability,以便适应不同任务。此外,报告还考虑了医学成像数据特有的挑战和考虑因素,如数据可用性、计算复杂度和临床Scene analysis。最后,报告还提出了未来研究方向和机会,包括与多模态成像集成、实时交互系统和适应医疗决策的领域适应。为便于进一步探索和实现INR在医学成像分析中,我们在\href{https://github.com/mindflow-institue/Awesome-Implicit-Neural-Representations-in-Medical-imaging}提供了参考文献和其可用的开源实现。
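
A minimal implicit neural representation, to make the survey's central object concrete: an MLP maps normalized (x, y) coordinates to intensity and is fit to a toy image, after which it can be queried at any resolution. This is a generic sketch, not tied to any particular paper in the survey.

```python
import torch
import torch.nn as nn

class INR(nn.Module):
    """Coordinate MLP: (x, y) in [-1, 1]^2  ->  intensity."""
    def __init__(self, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1))

    def forward(self, coords):
        return self.net(coords)

def grid(h, w):
    ys = torch.linspace(-1, 1, h)
    xs = torch.linspace(-1, 1, w)
    gy, gx = torch.meshgrid(ys, xs, indexing="ij")
    return torch.stack([gx, gy], dim=-1).reshape(-1, 2)

# Toy "image": a smooth radial pattern sampled on a 32x32 grid.
h = w = 32
coords = grid(h, w)
target = torch.cos(3 * coords.norm(dim=1, keepdim=True))

model = INR()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for step in range(500):
    opt.zero_grad()
    loss = ((model(coords) - target) ** 2).mean()
    loss.backward()
    opt.step()

# Resolution-agnostic query: render the same signal at 128x128 without retraining.
hires = model(grid(128, 128)).reshape(128, 128)
print("final MSE on training grid:", float(loss), "hi-res shape:", tuple(hires.shape))
```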

RIS-Enhanced Semantic Communications Adaptive to User Requirements

  • paper_url: http://arxiv.org/abs/2307.16100
  • repo_url: None
  • paper_authors: Peiwen Jiang, Chao-Kai Wen, Shi Jin, Geoffrey Ye Li
  • for: 这篇论文提出了一种由可重构智能表面(RIS)增强的语义通信框架,以适应不断变化的用户需求和信道环境。
  • methods: 该框架利用RIS在不同环境中定制信道,根据用户需求为不同的语义内容分配不同程度的RIS辅助;它还考虑用户移动和视距遮挡,使RIS资源能够在恶劣信道条件下保护重要语义,并在信道条件良好时在多个用户之间高效分配RIS资源。
  • results: simulations results indicate that the proposed RIS-SC framework can achieve reasonable task performance and adapt to diverse channel conditions and user requirements. However, under severe channel conditions, some semantic parts may be abandoned. To address this issue, a reconstruction method is introduced to improve visual acceptance by inferring missing semantic parts. Additionally, the framework can efficiently allocate RIS resources among multiple users in friendly channel conditions.
    Abstract Semantic communication significantly reduces required bandwidth by understanding semantic meaning of the transmitted. However, current deep learning-based semantic communication methods rely on joint source-channel coding design and end-to-end training, which limits their adaptability to new physical channels and user requirements. Reconfigurable intelligent surfaces (RIS) offer a solution by customizing channels in different environments. In this study, we propose the RIS-SC framework, which allocates semantic contents with varying levels of RIS assistance to satisfy the changing user requirements. It takes into account user movement and line-of-sight obstructions, enabling the RIS resource to protect important semantics in challenging channel conditions. The simulation results indicate reasonable task performance, but some semantic parts that have no effect on task performances are abandoned under severe channel conditions. To address this issue, a reconstruction method is also introduced to improve visual acceptance by inferring those missing semantic parts. Furthermore, the framework can adjust RIS resources in friendly channel conditions to save and allocate them efficiently among multiple users. Simulation results demonstrate the adaptability and efficiency of the RIS-SC framework across diverse channel conditions and user requirements.
    摘要 semantic communication 可以减少需要的带宽,因为它理解传输的 semantic 含义。但是,现有的深度学习基于 semantic communication 方法依赖于共同源-通道编码设计和端到端训练,这限制了它们在新的物理通道和用户需求中的适应性。可重配置智能表面(RIS)提供了一种解决方案,可以在不同环境中自定义通道。在本研究中,我们提出了 RIS-SC 框架,它将具有不同水平的 RIS 帮助分配到满足变化的用户需求。它考虑用户的运动和视线干扰,使得 RIS 资源能够保护重要的 semantics 在具有挑战性的通道条件下。 sim 结果表明任务性能合理,但在严重的通道条件下,一些无关任务性能的 semantic 部分会被放弃。为解决这个问题,我们还提出了一种重建方法,可以通过推理这些缺失的 semantic 部分来提高视觉接受度。此外,框架还可以在友好的通道条件下调整 RIS 资源,以efficiently 地分配它们于多个用户。 sim 结果表明 RIS-SC 框架在多种通道条件和用户需求下展示了适应性和效率。

A New Multi-Level Hazy Image and Video Dataset for Benchmark of Dehazing Methods

  • paper_url: http://arxiv.org/abs/2307.16050
  • repo_url: None
  • paper_authors: Bedrettin Cetinkaya, Yucel Cimtay, Fatma Nazli Gunay, Gokce Nur Yilmaz
  • for: This study aims to present a new multi-level hazy color image dataset and compare the dehazing performance of five different dehazing methods/models.
  • methods: The study uses color video data captured for two real scenes with controlled levels of haze, and the dehazing performance is evaluated based on SSIM, PSNR, VSI, and DISTS image quality metrics.
  • results: The results show that traditional methods can generalize the dehazing problem better than many deep learning-based methods, and the performance of deep models depends mostly on the scene and is generally poor on cross-dataset dehazing.
    Abstract The changing level of haze is one of the main factors which affects the success of the proposed dehazing methods. However, there is a lack of controlled multi-level hazy dataset in the literature. Therefore, in this study, a new multi-level hazy color image dataset is presented. Color video data is captured for two real scenes with a controlled level of haze. The distance of the scene objects from the camera, haze level, and ground truth (clear image) are available so that different dehazing methods and models can be benchmarked. In this study, the dehazing performance of five different dehazing methods/models is compared on the dataset based on SSIM, PSNR, VSI and DISTS image quality metrics. Results show that traditional methods can generalize the dehazing problem better than many deep learning based methods. The performance of deep models depends mostly on the scene and is generally poor on cross-dataset dehazing.
    摘要 雾度的变化是这些提议的滤雾方法成功的一个主要因素,但在文献中没有受控多级雾度数据集。因此,在本研究中,一个新的多级雾度彩色图像数据集被提出。实际拍摄的彩色视频数据被捕捉到两个场景中,并且有控制雾度、距离相机和真实预期(清晰图像)的资讯,以便不同的滤雾方法和模型进行比较。在本研究中,五种不同的滤雾方法/模型的比较结果显示,传统方法在不同场景下能够更好地应对滤雾问题,而深度学习基本方法则受到场景的影响,一般而言,跨数据集的滤雾性能较差。
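
The benchmark ranks dehazing methods with full-reference metrics such as PSNR and SSIM. PSNR follows directly from its definition, 10 log10(MAX^2 / MSE), computed against the haze-free ground truth; the snippet below implements it in NumPy (SSIM and the other metrics are available in third-party libraries and are not reimplemented here).

```python
import numpy as np

def psnr(reference, restored, max_val=255.0):
    """Peak signal-to-noise ratio in dB between a ground-truth (haze-free)
    image and a dehazed result, both with values in [0, max_val]."""
    mse = np.mean((reference.astype(np.float64) - restored.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(max_val ** 2 / mse)

rng = np.random.default_rng(0)
clear = rng.integers(0, 256, size=(256, 256, 3))                 # stand-in ground truth
dehazed = np.clip(clear + rng.normal(0, 5, clear.shape), 0, 255)  # stand-in result
print("PSNR (dB):", round(psnr(clear, dehazed), 2))
```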

CoVid-19 Detection leveraging Vision Transformers and Explainable AI

  • paper_url: http://arxiv.org/abs/2307.16033
  • repo_url: None
  • paper_authors: Pangoth Santhosh Kumar, Kundrapu Supriya, Mallikharjuna Rao K
  • for: 这项研究旨在实现肺部疾病的早期诊断,以提高病人的生存机会和生活质量。
  • methods: 这项研究比较了多种深度学习方法,包括卷积神经网络(CNN)、普通神经网络、VGG网络和胶囊网络(Capsule Network),并提出了一种基于视觉Transformer的端到端框架用于肺病预测。
  • results: 研究中提出的紧凑卷积Transformer(CCT)模型在 Covid 19 Radiography Database 上进行了训练和验证,在训练和验证中均取得了更高的准确率。
    Abstract Lung disease is a common health problem in many parts of the world. It is a significant risk to people health and quality of life all across the globe since it is responsible for five of the top thirty leading causes of death. Among them are COVID 19, pneumonia, and tuberculosis, to name just a few. It is critical to diagnose lung diseases in their early stages. Several different models including machine learning and image processing have been developed for this purpose. The earlier a condition is diagnosed, the better the patient chances of making a full recovery and surviving into the long term. Thanks to deep learning algorithms, there is significant promise for the autonomous, rapid, and accurate identification of lung diseases based on medical imaging. Several different deep learning strategies, including convolutional neural networks (CNN), vanilla neural networks, visual geometry group based networks (VGG), and capsule networks , are used for the goal of making lung disease forecasts. The standard CNN has a poor performance when dealing with rotated, tilted, or other aberrant picture orientations. As a result of this, within the scope of this study, we have suggested a vision transformer based approach end to end framework for the diagnosis of lung disorders. In the architecture, data augmentation, training of the suggested models, and evaluation of the models are all included. For the purpose of detecting lung diseases such as pneumonia, Covid 19, lung opacity, and others, a specialised Compact Convolution Transformers (CCT) model have been tested and evaluated on datasets such as the Covid 19 Radiography Database. The model has achieved a better accuracy for both its training and validation purposes on the Covid 19 Radiography Database.
    摘要 肺病是全球许多地区的常见健康问题。它对人们的健康和生活质量构成了重要的威胁,因为它负责全球前30名死亡原因中的5个。包括COVID-19、肺炎和结核病等在内,这些疾病的普遍性使得早期诊断变得非常重要。为了实现这一目标,许多不同的模型,包括机器学习和图像处理,已经被开发出来。随着深度学习算法的出现,对于基于医疗图像的肺病诊断,存在 significante 的承诺。在这种情况下,我们建议使用视transformer基本框架,以实现肺病诊断。在这个框架中,包括数据增强、模型训练和评估等方面。为了检测肺病如肺炎、COVID-19、肺抑制等,我们提出了一种专门的Compact Convolution Transformers(CCT)模型,并在 Covid 19 胸部X射线数据库上进行了测试和评估。该模型在训练和验证过程中具有更高的准确率。

LOTUS: Learning to Optimize Task-based US representations

  • paper_url: http://arxiv.org/abs/2307.16021
  • repo_url: None
  • paper_authors: Yordanka Velikova, Mohammad Farid Azampour, Walter Simson, Vanessa Gonzalez Duque, Nassir Navab
  • for: The paper is written for the task of anatomical segmentation of organs in ultrasound images, specifically for diagnosis and monitoring.
  • methods: The paper proposes a novel approach for learning to optimize task-based ultrasound image representations, using annotated CT segmentation maps as a simulation medium to generate ultrasound training data. The approach includes a fully differentiable ultrasound simulator that learns to optimize the parameters for generating physics-based ultrasound images guided by the downstream segmentation task, as well as an image adaptation network between real and simulated images to achieve simultaneous image synthesis and automatic segmentation on US images in an end-to-end training setting.
  • results: The proposed method is evaluated on aorta and vessel segmentation tasks and shows promising quantitative results, as well as qualitative results of optimized image representations on other organs.
    Abstract Anatomical segmentation of organs in ultrasound images is essential to many clinical applications, particularly for diagnosis and monitoring. Existing deep neural networks require a large amount of labeled data for training in order to achieve clinically acceptable performance. Yet, in ultrasound, due to characteristic properties such as speckle and clutter, it is challenging to obtain accurate segmentation boundaries, and precise pixel-wise labeling of images is highly dependent on the expertise of physicians. In contrast, CT scans have higher resolution and improved contrast, easing organ identification. In this paper, we propose a novel approach for learning to optimize task-based ultra-sound image representations. Given annotated CT segmentation maps as a simulation medium, we model acoustic propagation through tissue via ray-casting to generate ultrasound training data. Our ultrasound simulator is fully differentiable and learns to optimize the parameters for generating physics-based ultrasound images guided by the downstream segmentation task. In addition, we train an image adaptation network between real and simulated images to achieve simultaneous image synthesis and automatic segmentation on US images in an end-to-end training setting. The proposed method is evaluated on aorta and vessel segmentation tasks and shows promising quantitative results. Furthermore, we also conduct qualitative results of optimized image representations on other organs.
    摘要 医学应用中对ultrasound图像的结构分割是非常重要的,特别是诊断和监测。现有的深度神经网络需要大量标注数据进行训练以达到临床可接受的性能。然而,在ultrasound中,由特有的斑点和噪声而导致的分割边界很难确定,并且医生 preciselly pixel-wise 标注图像是高度dependent于医生的专业技巧。然而,CT扫描机有更高的分辨率和更好的对比度,使得器官识别变得更容易。在这篇论文中,我们提出了一种新的方法,用于学习优化任务基于ultrasound图像的表示。我们使用了ray-casting模拟声波传播through tissue,以生成ultrasound训练数据。我们的ultrasound模拟器是完全可导的,可以学习优化参数,以便生成physics-based ultasound图像,并且被下游分割任务导引。此外,我们还训练了一种图像适应网络,以实现同时的图像合成和自动分割任务。我们的提案方法在AAA和血管分割任务中表现出了有力的量化结果。此外,我们还进行了其他器官的优化图像结果的质量评估。

cs.SD - 2023-07-29

MSStyleTTS: Multi-Scale Style Modeling with Hierarchical Context Information for Expressive Speech Synthesis

  • paper_url: http://arxiv.org/abs/2307.16012
  • repo_url: None
  • paper_authors: Shun Lei, Yixuan Zhou, Liyang Chen, Zhiyong Wu, Xixin Wu, Shiyin Kang, Helen Meng
  • for: 这个论文的目的是提出一种基于多尺度 Style 模型的 expresive speech synthesis 方法,以便在人机交互场景中更加自然和 expresive。
  • methods: 该方法使用两个子模块:一个是多尺度 Style 提取器,另一个是多尺度 Style 预测器。这两个子模块与 FastSpeech 2 基于 acoustic model 一起训练。预测器通过考虑上下文结构关系来探索层次结构上的 Context information,并预测 Style 嵌入。提取器则提取了多尺度 Style 嵌入从真实的speech中。
  • results: 论文的实验结果表明,该方法与三个基准方法进行比较,在域内和域外 audiobook 数据集上具有显著的优势。此外,文章还进行了Context information 和多尺度 Style 表示的分析,这些分析从未被讨论过。
    Abstract Expressive speech synthesis is crucial for many human-computer interaction scenarios, such as audiobooks, podcasts, and voice assistants. Previous works focus on predicting the style embeddings at one single scale from the information within the current sentence. Whereas, context information in neighboring sentences and multi-scale nature of style in human speech are neglected, making it challenging to convert multi-sentence text into natural and expressive speech. In this paper, we propose MSStyleTTS, a style modeling method for expressive speech synthesis, to capture and predict styles at different levels from a wider range of context rather than a sentence. Two sub-modules, including multi-scale style extractor and multi-scale style predictor, are trained together with a FastSpeech 2 based acoustic model. The predictor is designed to explore the hierarchical context information by considering structural relationships in context and predict style embeddings at global-level, sentence-level and subword-level. The extractor extracts multi-scale style embedding from the ground-truth speech and explicitly guides the style prediction. Evaluations on both in-domain and out-of-domain audiobook datasets demonstrate that the proposed method significantly outperforms the three baselines. In addition, we conduct the analysis of the context information and multi-scale style representations that have never been discussed before.
    摘要 人机交互场景中,富有表现力的语音合成是非常重要的,如 audiobooks、podcasts 和 voice assistants。先前的工作都是根据当前句子内的信息在单一尺度上预测 style embedding,而忽略了相邻句子的上下文信息和人类语音中风格的多尺度特性,这使得将多句子文本转化为自然且富有表现力的语音变得困难。在这篇论文中,我们提出了 MSStyleTTS,一种用于表现力语音合成的风格建模方法,可以在比单句更广泛的上下文中捕捉和预测不同层级的风格。该方法包括两个子模块:多尺度风格提取器和多尺度风格预测器,二者与基于 FastSpeech 2 的声学模型联合训练。预测器通过考虑上下文中的结构关系来挖掘层次化的上下文信息,并在全局、句子和子词三个层级上预测 style embedding;提取器从真实语音中提取多尺度 style embedding,并显式地指导风格预测。在域内和域外的 audiobook 数据集上的评估表明,我们的方法显著优于三个基线。此外,我们还对上下文信息和多尺度风格表示进行了此前未曾讨论过的分析。

Moisesdb: A dataset for source separation beyond 4-stems

  • paper_url: http://arxiv.org/abs/2307.15913
  • repo_url: https://github.com/moises-ai/moises-db
  • paper_authors: Igor Pereira, Felipe Araújo, Filip Korzeniowski, Richard Vogl
  • for: 这个论文是为了介绍音乐源分离的MoisesDB数据集而写的。
  • methods: 这个论文使用了一个二级层次的分类法来组织音频源,并提供了一个使用Python编程的易于使用的库来下载、处理和使用MoisesDB数据集。
  • results: 这个论文提供了不同粒度的开源分离模型的基准结果,并分析了数据集的内容。
    Abstract In this paper, we introduce the MoisesDB dataset for musical source separation. It consists of 240 tracks from 45 artists, covering twelve musical genres. For each song, we provide its individual audio sources, organized in a two-level hierarchical taxonomy of stems. This will facilitate building and evaluating fine-grained source separation systems that go beyond the limitation of using four stems (drums, bass, other, and vocals) due to lack of data. To facilitate the adoption of this dataset, we publish an easy-to-use Python library to download, process and use MoisesDB. Alongside a thorough documentation and analysis of the dataset contents, this work provides baseline results for open-source separation models for varying separation granularities (four, five, and six stems), and discuss their results.
    摘要 在这篇论文中,我们介绍了MoisesDB数据集,用于音乐来源分离。它包含240首歌曲,来自45位艺术家,涵盖了12种音乐类型。对每首歌曲,我们提供了它的个别音频来源,以二级层级的分类系统组织。这将促进建立和评估细化来源分离系统,超出了使用四个来源(鼓、 bass、其他和 vocals)的限制,因为缺乏数据。为便于使用这个数据集,我们在Python库中发布了一个易于使用的下载、处理和使用MoisesDB的工具。此外,我们还提供了数据集的详细文档和分析,以及不同的分离精度(四、五、六个来源)的基准结果。

UniBriVL: Robust Universal Representation and Generation of Audio Driven Diffusion Models

  • paper_url: http://arxiv.org/abs/2307.15898
  • repo_url: None
  • paper_authors: Sen Fang, Bowen Gao, Yangjian Wu, Jingwen Cai, Teik Toe Teoh
  • for: 这篇论文旨在提出一种基于 Bridging-Vision-and-Language(BriVL)的universal语言表示学习方法,以实现多modal应用程序的开发。
  • methods: 该方法使用audio、图像和文本在共享空间内嵌入,解决了多modal语言表示学习中的主要挑战,同时能够有效地捕捉audio和图像之间的相关性。
  • results: 我们的实验结果表明,UniBriVL在下游任务中具有较高的效果,并且能够从audio中生成相应的图像。此外,我们还进行了质量评估,发现UniBriVL能够生成高质量的图像。
    Abstract Multimodal large models have been recognized for their advantages in various performance and downstream tasks. The development of these models is crucial towards achieving general artificial intelligence in the future. In this paper, we propose a novel universal language representation learning method called UniBriVL, which is based on Bridging-Vision-and-Language (BriVL). Universal BriVL embeds audio, image, and text into a shared space, enabling the realization of various multimodal applications. Our approach addresses major challenges in robust language (both text and audio) representation learning and effectively captures the correlation between audio and image. Additionally, we demonstrate the qualitative evaluation of the generated images from UniBriVL, which serves to highlight the potential of our approach in creating images from audio. Overall, our experimental results demonstrate the efficacy of UniBriVL in downstream tasks and its ability to choose appropriate images from audio. The proposed approach has the potential for various applications such as speech recognition, music signal processing, and captioning systems.
    摘要 多Modal大型模型已被认为具有多种表现和下游任务的优势。这些模型的开发是未来通用人工智能的重要步骤。在这篇文章中,我们提出了一种新的通用语言表现学习方法,即UniBriVL,它基于桥接视觉和语言(BriVL)。这个通用BriVL嵌入音频、影像和文本到共享空间中,使得实现多modal应用的可能性。我们的方法解决了语言表现学习中的重要挑战,并具有优秀的捕捉音频和影像之间的联乘。此外,我们还进行了生成图像的质感评估,以强调我们的方法在创建图像的能力。总的来说,我们的实验结果显示UniBriVL在下游任务中的有效性,并能够从音频中选择适当的图像。这种方法的应用包括语音识别、音乐信号处理和描述系统等。

eess.AS - 2023-07-29

METTS: Multilingual Emotional Text-to-Speech by Cross-speaker and Cross-lingual Emotion Transfer

  • paper_url: http://arxiv.org/abs/2307.15951
  • repo_url: None
  • paper_authors: Xinfa Zhu, Yi Lei, Tao Li, Yongmao Zhang, Hongbin Zhou, Heng Lu, Lei Xie
  • for: 本文提出了一种解决cross-speaker和cross-lingual语言 Transfer issue的Multilingual Emotional TTS(METTS)模型,以提高语音合成器的表情和语言能力。
  • methods: 本文使用了DelightfulTTS作为基础模型,并提出了以下设计:首先,通过多尺度情感建模将语音韵律分解为粗粒度和细粒度两个层面,分别得到语言无关和语言相关的情感表示;其次,作为预处理步骤,对参考信号施加基于共振峰偏移(formant shift)的信息扰动,以更好地解耦说话人音色;最后,通过基于向量量化的情感匹配器来选择参考音频,以确保合成的语音具有良好的自然性和情感多样性。
  • results: 实验结果表明,METTS模型能够有效地解决cross-speaker和cross-lingual语言 Transfer issue,并且能够生成高质量的语音合成。
    Abstract Previous multilingual text-to-speech (TTS) approaches have considered leveraging monolingual speaker data to enable cross-lingual speech synthesis. However, such data-efficient approaches have ignored synthesizing emotional aspects of speech due to the challenges of cross-speaker cross-lingual emotion transfer - the heavy entanglement of speaker timbre, emotion, and language factors in the speech signal will make a system produce cross-lingual synthetic speech with an undesired foreign accent and weak emotion expressiveness. This paper proposes the Multilingual Emotional TTS (METTS) model to mitigate these problems, realizing both cross-speaker and cross-lingual emotion transfer. Specifically, METTS takes DelightfulTTS as the backbone model and proposes the following designs. First, to alleviate the foreign accent problem, METTS introduces multi-scale emotion modeling to disentangle speech prosody into coarse-grained and fine-grained scales, producing language-agnostic and language-specific emotion representations, respectively. Second, as a pre-processing step, formant shift-based information perturbation is applied to the reference signal for better disentanglement of speaker timbre in the speech. Third, a vector quantization-based emotion matcher is designed for reference selection, leading to decent naturalness and emotion diversity in cross-lingual synthetic speech. Experiments demonstrate the good design of METTS.
    摘要 以往的多语言文本到语音(TTS)方法利用单语言说话人数据来实现跨语言语音合成。然而,这些数据高效的方法忽略了合成语音的情感方面,因为跨说话人、跨语言情感迁移面临的挑战在于:语音信号中说话人音色、情感和语言因素严重纠缠,会使系统生成带有不期望的外国口音且情感表现力不足的跨语言合成语音。这篇论文提出了多语言情感TTS(METTS)模型,以解决这些问题,实现跨说话人和跨语言的情感迁移。具体而言,METTS使用DelightfulTTS作为基础模型,并提出以下设计:首先,为缓解外国口音问题,METTS引入多尺度情感建模,将语音韵律分解为粗粒度和细粒度两个层面,分别得到语言无关和语言相关的情感表示;其次,作为预处理步骤,对参考信号施加基于共振峰偏移的信息扰动,以更好地解耦语音中的说话人音色;第三,设计了基于向量量化的情感匹配器用于参考选择,从而使跨语言合成语音具有良好的自然度和情感多样性。实验表明METTS的设计是合理的。

cs.CV - 2023-07-29

Enhancing Object Detection in Ancient Documents with Synthetic Data Generation and Transformer-Based Models

  • paper_url: http://arxiv.org/abs/2307.16005
  • repo_url: None
  • paper_authors: Zahra Ziran, Francesco Leotta, Massimo Mecella
  • for: 提高古文献中对象检测精度,减少假阳性结果。
  • methods: 通过计算媒体创建synthetic数据集,并将视觉特征提取integrated到对象检测过程中。
  • results: 通过实验,我们表明可以提高对象检测精度,有助于 Paleography 领域进行深入分析和理解历史文献。
    Abstract The study of ancient documents provides a glimpse into our past. However, the low image quality and intricate details commonly found in these documents present significant challenges for accurate object detection. The objective of this research is to enhance object detection in ancient documents by reducing false positives and improving precision. To achieve this, we propose a method that involves the creation of synthetic datasets through computational mediation, along with the integration of visual feature extraction into the object detection process. Our approach includes associating objects with their component parts and introducing a visual feature map to enable the model to discern between different symbols and document elements. Through our experiments, we demonstrate that improved object detection has a profound impact on the field of Paleography, enabling in-depth analysis and fostering a greater understanding of these valuable historical artifacts.
    摘要 古文献研究可以带我们回到过去,但是这些文献中的图像质量低下和细节复杂度往往会对准确的对象探测带来挑战。我们的研究目标是提高古文献中对象探测的精度,减少假阳性和提高准确率。我们提出了一种方法,即通过计算媒介创造 synthetic 数据集,并将视觉特征提取 integrate 到对象探测过程中。我们的方法包括对象与其组成部分关联,并通过视觉特征地图来使模型能够辨别不同的符号和文档元素。经过我们的实验,我们发现,改进对象探测对 Paleography 领域有着深远的影响,帮助我们进行深入的分析和更好地理解这些历史遗产。

Automated Hit-frame Detection for Badminton Match Analysis

  • paper_url: http://arxiv.org/abs/2307.16000
  • repo_url: https://github.com/arthur900530/Automated-Hit-frame-Detection-for-Badminton-Match-Analysis
  • paper_authors: Yu-Hang Chien, Fang Yu
  • for: 这项研究旨在为羽毛球运动员提供更高水平的表现分析,帮助教练和运动员通过自动化工具来系统地评估自己的表现。
  • methods: 本研究使用现代深度学习技术,从比赛视频中自动检测击球帧。检测流程包括回合级视频裁剪、球员与球场关键点检测、羽毛球飞行方向预测以及击球帧检测等自动化步骤。
  • results: 在本研究中,用于视频裁剪的击球角度识别准确率达到99%,基于球员关键点序列的羽毛球飞行方向预测准确率超过92%,并报告了回合级视频裁剪与击球帧检测的评估结果。
    Abstract Sports professionals constantly under pressure to perform at the highest level can benefit from sports analysis, which allows coaches and players to reduce manual efforts and systematically evaluate their performance using automated tools. This research aims to advance sports analysis in badminton, systematically detecting hit-frames automatically from match videos using modern deep learning techniques. The data included in hit-frames can subsequently be utilized to synthesize players' strokes and on-court movement, as well as for other downstream applications such as analyzing training tasks and competition strategy. The proposed approach in this study comprises several automated procedures like rally-wise video trimming, player and court keypoints detection, shuttlecock flying direction prediction, and hit-frame detection. In the study, we achieved 99% accuracy on shot angle recognition for video trimming, over 92% accuracy for applying player keypoints sequences on shuttlecock flying direction prediction, and reported the evaluation results of rally-wise video trimming and hit-frame detection.
    摘要 运动专业人员需要一直处于最高水平的压力,可以从运动分析中受益,使得教练和运动员可以通过自动化工具来系统地评估自己的表现。这项研究的目的是为了提高羽毛球运动分析,通过现代深度学习技术自动检测比赛视频中的击框帧。这些数据可以用于synthesize运动员的拍打和场上运动,以及其他下游应用程序,如分析训练任务和竞赛策略。本研究的方法包括多个自动化步骤,如赛事截割、运动员和场地关键点检测、拍球飞行方向预测和击框检测。在研究中,我们实现了视频截割时的射击角度识别率99%,以及在拍球飞行方向预测中的运动员关键点序列应用率超过92%。我们还发布了赛事截割和击框检测的评估结果。

Separate Scene Text Detector for Unseen Scripts is Not All You Need

  • paper_url: http://arxiv.org/abs/2307.15991
  • repo_url: None
  • paper_authors: Prateek Keserwani, Taveena Lotey, Rohit Keshari, Partha Pratim Roy
  • for: 解决野外环境中多种文字(脚本)的场景文本检测问题,尤其是训练时未见过的文字。
  • methods: 利用向量嵌入(vector embedding)将文本的笔画(stroke)信息映射到对应的文字类别。
  • results: 在零 shot Setting下,提出的方法可以准确地检测未看过的文字类别
    Abstract Text detection in the wild is a well-known problem that becomes more challenging while handling multiple scripts. In the last decade, some scripts have gained the attention of the research community and achieved good detection performance. However, many scripts are low-resourced for training deep learning-based scene text detectors. It raises a critical question: Is there a need for separate training for new scripts? It is an unexplored query in the field of scene text detection. This paper acknowledges this problem and proposes a solution to detect scripts not present during training. In this work, the analysis has been performed to understand cross-script text detection, i.e., trained on one and tested on another. We found that the identical nature of text annotation (word-level/line-level) is crucial for better cross-script text detection. The different nature of text annotation between scripts degrades cross-script text detection performance. Additionally, for unseen script detection, the proposed solution utilizes vector embedding to map the stroke information of text corresponding to the script category. The proposed method is validated with a well-known multi-lingual scene text dataset under a zero-shot setting. The results show the potential of the proposed method for unseen script detection in natural images.
    摘要 文本检测在野外是一个非常有挑战性的问题,其中多种文本的检测增加了挑战。过去的一个 десятилетие,一些文本种类在研究者们中引起了关注,并实现了良好的检测性能。然而,许多文本种类具有训练深度学习基于Scene文本检测器的资源不充分的问题。这个问题提出了一个关键问题:是否需要分开训练新的文本种类?这是Scene文本检测领域未曾研究的问题。本文承认这个问题,并提出了一种用于检测训练之外的文本种类的解决方案。在这种情况下,我们进行了跨脚本文本检测的分析,即将训练的一种文本种类测试在另一种文本种类上。我们发现,文本注释的标准化(word-level/line-level)是跨脚本文本检测的关键因素。不同的文本注释 между脚本会导致跨脚本文本检测性能下降。此外,为了检测未经训练的脚本,我们提出了使用vector embedding将文本的行为映射到脚本类别。我们的方法在一个知名的多语言Scene文本 dataset上进行零shot设定下进行验证。结果表明,我们的方法具有检测未经训练的脚本的潜在能力。

RGB-D-Fusion: Image Conditioned Depth Diffusion of Humanoid Subjects

  • paper_url: http://arxiv.org/abs/2307.15988
  • repo_url: None
  • paper_authors: Sascha Kirch, Valeria Olyunina, Jan Ondřej, Rafael Pagés, Sergio Martin, Clara Pérez-Molina
  • for: 生成高分辨率深度图从低分辨率灰度图像中
  • methods: 使用多模态条件杂音扩散概率模型,首先生成低分辨率深度图,然后使用第二个杂音扩散概率模型来upsample深度图,并 introduce了一种新的增强技术,深度噪声增强
  • results: 实现高效地生成高分辨率深度图,并提高模型的鲁棒性。
    Abstract We present RGB-D-Fusion, a multi-modal conditional denoising diffusion probabilistic model to generate high resolution depth maps from low-resolution monocular RGB images of humanoid subjects. RGB-D-Fusion first generates a low-resolution depth map using an image conditioned denoising diffusion probabilistic model and then upsamples the depth map using a second denoising diffusion probabilistic model conditioned on a low-resolution RGB-D image. We further introduce a novel augmentation technique, depth noise augmentation, to increase the robustness of our super-resolution model.
    摘要 我们介绍RGB-D-Fusion,一种多模态条件去噪扩散概率模型,用于从低分辨率单目RGB图像生成高分辨率深度图。RGB-D-Fusion首先使用一种以图像为条件的去噪扩散概率模型生成低分辨率深度图,然后使用第二种以低分辨率RGB-D图像为条件的去噪扩散概率模型对深度图进行上采样。我们还介绍了一种新的数据增强技术,即深度噪声增强,以提升超分辨率模型的鲁棒性。

Class-Specific Distribution Alignment for Semi-Supervised Medical Image Classification

  • paper_url: http://arxiv.org/abs/2307.15987
  • repo_url: None
  • paper_authors: Zhongzheng Huang, Jiawei Wu, Tao Wang, Zuoyong Li, Anastasia Ioannou
  • for: 这篇论文是为了解决医疗影像分类问题,因为数据标注是时间耗费很大,且疾病的分布是不均匀的。
  • methods: 本文提出了类别特定分布对齐(CSDA)框架,这是一种基于自训练的半监督学习方法,适用于从高度不平衡的数据集中学习。具体而言,我们将分布对齐视为由边缘预测张成的向量空间中的一次基变换,并据此推导出CSDA,在有标注和无标注数据上同时捕捉与类别相关的边缘预测,以避免偏向多数类。此外,我们提出了可变条件队列(VCQ)模块,为每个类别维持数量均衡的无标注样本。
  • results: 在HAM10000、CheXpert和Kvasir三个公开数据集上的实验表明,我们的方法在半监督皮肤病、胸部疾病和内窥镜图像分类任务上取得了具有竞争力的表现。
    Abstract Despite the success of deep neural networks in medical image classification, the problem remains challenging as data annotation is time-consuming, and the class distribution is imbalanced due to the relative scarcity of diseases. To address this problem, we propose Class-Specific Distribution Alignment (CSDA), a semi-supervised learning framework based on self-training that is suitable to learn from highly imbalanced datasets. Specifically, we first provide a new perspective to distribution alignment by considering the process as a change of basis in the vector space spanned by marginal predictions, and then derive CSDA to capture class-dependent marginal predictions on both labeled and unlabeled data, in order to avoid the bias towards majority classes. Furthermore, we propose a Variable Condition Queue (VCQ) module to maintain a proportionately balanced number of unlabeled samples for each class. Experiments on three public datasets HAM10000, CheXpert and Kvasir show that our method provides competitive performance on semi-supervised skin disease, thoracic disease, and endoscopic image classification tasks.
    摘要 尽管深度神经网络在医疗图像分类中取得了成功,但问题仍然具有挑战性,因为数据标注是时间consuming的,而且疾病的分布是不均衡的,因为疾病的相对罕见性。为了解决这个问题,我们提出了类别特定分布对齐(CSDA),一种基于自动训练的 semi-supervised 学习框架,适用于学习高度不均衡的数据集。 Specifically, we first provide a new perspective to distribution alignment by considering the process as a change of basis in the vector space spanned by marginal predictions, and then derive CSDA to capture class-dependent marginal predictions on both labeled and unlabeled data, in order to avoid the bias towards majority classes. Furthermore, we propose a Variable Condition Queue (VCQ) module to maintain a proportionately balanced number of unlabeled samples for each class. Experiments on three public datasets HAM10000, CheXpert and Kvasir show that our method provides competitive performance on semi-supervised skin disease, thoracic disease, and endoscopic image classification tasks.
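
The Variable Condition Queue is described as keeping a proportionately balanced pool of unlabeled samples per class. One plausible minimal reading of that idea, sketched below as an assumption rather than the authors' implementation, is a fixed-capacity FIFO per (pseudo-)class, so majority-class samples cannot crowd minority classes out of the pool.

```python
from collections import deque

class VariableConditionQueue:
    """Per-(pseudo-)class FIFO queues with equal capacity, so minority classes
    are not crowded out of the unlabeled pool by majority-class samples."""
    def __init__(self, num_classes, per_class_capacity):
        self.queues = [deque(maxlen=per_class_capacity) for _ in range(num_classes)]

    def push(self, sample_id, pseudo_label):
        self.queues[pseudo_label].append(sample_id)   # oldest entry drops automatically

    def batch(self):
        """Return a class-balanced list of sample ids for the next training step."""
        return [sid for q in self.queues for sid in q]

vcq = VariableConditionQueue(num_classes=3, per_class_capacity=4)
# Simulate a stream of pseudo-labeled unlabeled samples dominated by class 0.
stream = [(i, 0) for i in range(20)] + [(100, 1), (101, 1), (200, 2)]
for sid, lab in stream:
    vcq.push(sid, lab)

print("balanced pool:", vcq.batch())   # at most 4 ids per class despite the imbalance
```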

GaitASMS: Gait Recognition by Adaptive Structured Spatial Representation and Multi-Scale Temporal Aggregation

  • paper_url: http://arxiv.org/abs/2307.15981
  • repo_url: None
  • paper_authors: Yan Sun, Hu Long, Xueling Feng, Mark Nixon
  • for: 本研究旨在提出一种新的步态识别方法,以提高步态识别精度和稳定性。
  • methods: 该方法使用自适应结构化表示提取模块(ASRE)和多尺度时间聚合模块(MSTA),分别提取步态的自适应结构化空间表示和多尺度时间信息。此外,提出了一种新的数据增强技术,即随机掩码(random mask),以丰富长期遮挡的样本空间并提升模型的泛化能力。
  • results: 对于两个数据集,该方法能够达到竞争力的性能,特别是在复杂场景下(BG和CL)。在CASIA-B数据集上,GaitASMS方法的均值准确率为93.5%,与基准方法相比,在rank-1准确率上提高3.4%和6.3%。
    Abstract Gait recognition is one of the most promising video-based biometric technologies. The edge of silhouettes and motion are the most informative feature and previous studies have explored them separately and achieved notable results. However, due to occlusions and variations in viewing angles, their gait recognition performance is often affected by the predefined spatial segmentation strategy. Moreover, traditional temporal pooling usually neglects distinctive temporal information in gait. To address the aforementioned issues, we propose a novel gait recognition framework, denoted as GaitASMS, which can effectively extract the adaptive structured spatial representations and naturally aggregate the multi-scale temporal information. The Adaptive Structured Representation Extraction Module (ASRE) separates the edge of silhouettes by using the adaptive edge mask and maximizes the representation in semantic latent space. Moreover, the Multi-Scale Temporal Aggregation Module (MSTA) achieves effective modeling of long-short-range temporal information by temporally aggregated structure. Furthermore, we propose a new data augmentation, denoted random mask, to enrich the sample space of long-term occlusion and enhance the generalization of the model. Extensive experiments conducted on two datasets demonstrate the competitive advantage of proposed method, especially in complex scenes, i.e. BG and CL. On the CASIA-B dataset, GaitASMS achieves the average accuracy of 93.5\% and outperforms the baseline on rank-1 accuracies by 3.4\% and 6.3\%, respectively, in BG and CL. The ablation experiments demonstrate the effectiveness of ASRE and MSTA.
    摘要 步态识别是最有前景的基于视频的生物特征识别技术之一。轮廓边缘与运动是其中最具信息量的特征,先前的研究分别探索了这两类特征并取得了可观的成果。然而,由于遮挡和视角变化,其步态识别性能常常受到预先定义的空间分割策略的影响;此外,传统的时间池化通常忽略了步态中独特的时间信息。为了解决上述问题,我们提出了一种新的步态识别框架 GaitASMS,能够有效提取自适应结构化空间表示,并自然地聚合多尺度时间信息。自适应结构表示提取模块(ASRE)使用自适应边缘掩码分离轮廓边缘,并在语义隐空间中最大化表示;多尺度时间聚合模块(MSTA)通过时间聚合结构有效建模长短程时间信息。此外,我们还提出了一种新的数据增强方法——随机掩码(random mask),以丰富长期遮挡的样本空间并提升模型的泛化能力。在两个数据集上进行的大量实验证明了所提方法的竞争优势,尤其是在复杂场景(BG 和 CL)下。在 CASIA-B 数据集上,GaitASMS 取得了 93.5% 的平均准确率,在 BG 和 CL 场景下的 rank-1 准确率分别比基线高出 3.4% 和 6.3%。消融实验证明了 ASRE 和 MSTA 的有效性。
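
The random mask augmentation mentioned above can be illustrated with a minimal sketch: zeroing out one random rectangular region across a whole silhouette clip to mimic a long-term occlusion. Applying the same region to all frames, the function name, and the size fractions are assumptions for illustration; the paper's exact masking scheme may differ.

```python
import torch

def random_mask(seq, max_h_frac=0.4, max_w_frac=0.4):
    """seq: (T, H, W) binary silhouette sequence. Zeroes one random rectangular
    region across all frames to mimic a long-term occlusion (illustrative)."""
    T, H, W = seq.shape
    mh = int(torch.randint(1, max(2, int(H * max_h_frac)), (1,)))
    mw = int(torch.randint(1, max(2, int(W * max_w_frac)), (1,)))
    y = int(torch.randint(0, H - mh + 1, (1,)))
    x = int(torch.randint(0, W - mw + 1, (1,)))
    out = seq.clone()
    out[:, y:y + mh, x:x + mw] = 0
    return out

seq = (torch.rand(30, 64, 44) > 0.5).float()  # toy 30-frame silhouette clip
aug = random_mask(seq)
```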

Fingerprints of Generative Models in the Frequency Domain

  • paper_url: http://arxiv.org/abs/2307.15977
  • repo_url: None
  • paper_authors: Tianyun Yang, Juan Cao, Danding Wang, Chang Xu
  • for: 这篇论文旨在分析CNN基于生成模型中的唯一指纹,以及这些指纹如何影响生成图像质量。
  • methods: 这篇论文使用频谱分析方法来解释CNN生成模型中的网络组件,并从这些频谱分析中提取出频谱分布和格子异常的源头。
  • results: 研究发现,通过使用低成本的生成模型,可以生成图像,这些图像具有与实际CNN生成模型中的频谱分布和格子异常相同的特征。这些特征可以用于验证、识别和分析CNN基于生成模型的唯一指纹。
    Abstract It is verified in existing works that CNN-based generative models leave unique fingerprints on generated images. There is a lack of analysis about how they are formed in generative models. Interpreting network components in the frequency domain, we derive sources for frequency distribution and grid-like pattern discrepancies exhibited on the spectrum. These insights are leveraged to develop low-cost synthetic models, which generate images emulating the frequency patterns observed in real generative models. The resulting fingerprint extractor pre-trained on synthetic data shows superior transferability in verifying, identifying, and analyzing the relationship of real CNN-based generative models such as GAN, VAE, Flow, and diffusion.
    摘要 已有研究证明,基于卷积神经网络(CNN)的生成模型会在生成图像中留下独特的指纹,但关于这些指纹如何在生成模型中形成的分析仍然缺乏。我们在频域中解释网络组件,推导出频谱分布差异和网格状模式差异的来源。基于这些发现,我们构建了低成本的合成模型,其生成的图像能够模拟真实生成模型中观察到的频率模式。在合成数据上预训练得到的指纹提取器,在验证、识别和分析 GAN、VAE、Flow 和扩散模型等真实 CNN 生成模型之间的关系时,展现出更优的可迁移性。
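
A minimal way to inspect the frequency-domain fingerprints discussed above is to average the log-magnitude FFT spectrum over a set of generated images; periodic peaks in this average typically correspond to the grid-like upsampling artifacts the abstract mentions. The snippet below is a generic diagnostic, not the paper's fingerprint extractor.

```python
import numpy as np

def average_log_spectrum(images):
    """images: (N, H, W) grayscale array in [0, 1]. Returns the mean
    log-magnitude spectrum; grid-like artifacts of CNN generators tend to
    show up as periodic peaks in this average (illustrative diagnostic)."""
    spectra = []
    for img in images:
        f = np.fft.fftshift(np.fft.fft2(img - img.mean()))
        spectra.append(np.log(np.abs(f) + 1e-8))
    return np.mean(spectra, axis=0)

fake = np.random.rand(16, 128, 128)   # stand-in for generated images
fingerprint = average_log_spectrum(fake)
print(fingerprint.shape)  # (128, 128)
```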

XMem++: Production-level Video Segmentation From Few Annotated Frames

  • paper_url: http://arxiv.org/abs/2307.15958
  • repo_url: https://github.com/max810/XMem2
  • paper_authors: Maksym Bekuzarov, Ariana Bermudez, Joon-Young Lee, Hao Li
  • for: 该论文旨在提高现有的内存基于模型,以提高用户指导视频分割的精度和效率。
  • methods: 该方法提出了一种带有永久记忆模块的新型半监督视频目标分割模型,能够有效处理多个由用户选择、外观各异的标注帧。
  • results: 该方法可以在不需要重新训练的情况下,在具有多类和多帧的分割任务中提供了最佳性能,并且需要较少的帧注释量。
    Abstract Despite advancements in user-guided video segmentation, extracting complex objects consistently for highly complex scenes is still a labor-intensive task, especially for production. It is not uncommon that a majority of frames need to be annotated. We introduce a novel semi-supervised video object segmentation (SSVOS) model, XMem++, that improves existing memory-based models, with a permanent memory module. Most existing methods focus on single frame annotations, while our approach can effectively handle multiple user-selected frames with varying appearances of the same object or region. Our method can extract highly consistent results while keeping the required number of frame annotations low. We further introduce an iterative and attention-based frame suggestion mechanism, which computes the next best frame for annotation. Our method is real-time and does not require retraining after each user input. We also introduce a new dataset, PUMaVOS, which covers new challenging use cases not found in previous benchmarks. We demonstrate SOTA performance on challenging (partial and multi-class) segmentation scenarios as well as long videos, while ensuring significantly fewer frame annotations than any existing method. Project page: https://max810.github.io/xmem2-project-page/
    摘要 尽管用户引导的视频分割技术已经取得了进步,但在高度复杂的场景下持续提取复杂对象仍然是一项劳动密集的任务,在生产环境中尤其如此,往往需要对大多数帧进行标注。我们提出了一种新的半监督视频目标分割(SSVOS)模型 XMem++,它在现有的记忆型模型基础上加入了永久记忆模块。大多数现有方法只针对单帧标注,而我们的方法可以有效处理多个由用户选择、同一对象或区域外观各异的标注帧。我们的方法能够在保持所需帧标注数量较低的同时,提取高度一致的分割结果。我们还引入了一种迭代的、基于注意力的帧建议机制,用于计算下一个最适合标注的帧。我们的方法是实时的,且在每次用户输入后无需重新训练。此外,我们提出了一个新的数据集 PUMaVOS,涵盖了以往基准中未出现的新的具有挑战性的用例。我们在具有挑战性的(部分与多类)分割场景以及长视频上展示了最先进(SOTA)的性能,同时所需的帧标注数量显著少于任何现有方法。项目页面:https://max810.github.io/xmem2-project-page/

CMDA: Cross-Modality Domain Adaptation for Nighttime Semantic Segmentation

  • paper_url: http://arxiv.org/abs/2307.15942
  • repo_url: https://github.com/xiarho/cmda
  • paper_authors: Ruihao Xia, Chaoqiang Zhao, Meng Zheng, Ziyan Wu, Qiyu Sun, Yang Tang
  • for: 提高夜间semantic segmentation的精度和效果,使用多Modalities的信息(图像和事件)进行培育。
  • methods: 提出了一种基于无监督的cross-modality domain adaptation(CMDA)框架,通过图像动态特征Extractor和图像内容特征Extractor来桥接不同的Modalities和频率域。
  • results: 在公共图像集和提出的图像-事件集上进行了广泛的实验,并得到了效果的结果,同时还开源了代码、模型和数据集。
    Abstract Most nighttime semantic segmentation studies are based on domain adaptation approaches and image input. However, limited by the low dynamic range of conventional cameras, images fail to capture structural details and boundary information in low-light conditions. Event cameras, as a new form of vision sensors, are complementary to conventional cameras with their high dynamic range. To this end, we propose a novel unsupervised Cross-Modality Domain Adaptation (CMDA) framework to leverage multi-modality (Images and Events) information for nighttime semantic segmentation, with only labels on daytime images. In CMDA, we design the Image Motion-Extractor to extract motion information and the Image Content-Extractor to extract content information from images, in order to bridge the gap between different modalities (Images to Events) and domains (Day to Night). Besides, we introduce the first image-event nighttime semantic segmentation dataset. Extensive experiments on both the public image dataset and the proposed image-event dataset demonstrate the effectiveness of our proposed approach. We open-source our code, models, and dataset at https://github.com/XiaRho/CMDA.
    摘要 大多数夜间语义分割研究基于域适应方法和图像输入。然而,受限于普通相机的低动态范围,图像在低照度条件下无法捕捉结构细节和边界信息。事件相机作为一种新型视觉传感器,凭借其高动态范围与普通相机形成互补。为此,我们提出了一种新的无监督跨模态域适应(CMDA)框架,利用多模态信息(图像和事件)进行夜间语义分割,且只需在白天图像上提供标签。在 CMDA 中,我们设计了图像运动提取器和图像内容提取器,分别提取运动信息和内容信息,以桥接不同模态(图像到事件)与不同域(白天到夜晚)之间的差距。此外,我们提出了首个图像-事件夜间语义分割数据集。在公共图像数据集和我们提出的图像-事件数据集上进行的大量实验证明了所提方法的有效性。我们在 https://github.com/XiaRho/CMDA 上开源了代码、模型和数据集。

Sat2Cap: Mapping Fine-Grained Textual Descriptions from Satellite Images

  • paper_url: http://arxiv.org/abs/2307.15904
  • repo_url: None
  • paper_authors: Aayush Dhakal, Adeel Ahmad, Subash Khanal, Srikumar Sastry, Nathan Jacobs
  • for: 这个论文旨在开发一种基于自由文本描述(或标题)的弱监督地图创建方法。
  • methods: 该论文使用了一种叫做Sat2Cap的对比学习框架,在一个大规模的对比图像和地面图像 dataset 上训练。
  • results: 该模型能够successfully capture细腻概念和有效地适应时间变化。 codes, datasets, 和模型将被公开发布。
    Abstract We propose a novel weakly supervised approach for creating maps using free-form textual descriptions (or captions). We refer to this new line of work of creating textual maps as zero-shot mapping. Prior works have approached mapping tasks by developing models that predict over a fixed set of attributes using overhead imagery. However, these models are very restrictive as they can only solve highly specific tasks for which they were trained. Mapping text, on the other hand, allows us to solve a large variety of mapping problems with minimal restrictions. To achieve this, we train a contrastive learning framework called Sat2Cap on a new large-scale dataset of paired overhead and ground-level images. For a given location, our model predicts the expected CLIP embedding of the ground-level scenery. Sat2Cap is also conditioned on temporal information, enabling it to learn dynamic concepts that vary over time. Our experimental results demonstrate that our models successfully capture fine-grained concepts and effectively adapt to temporal variations. Our approach does not require any text-labeled data making the training easily scalable. The code, dataset, and models will be made publicly available.
    摘要 我们提出了一种新的弱监督方法,通过自由形式的文本描述(或标题)来创建地图,并将这类生成文本地图的新工作称为零样本制图(zero-shot mapping)。以往的工作通过开发基于俯视影像、在一组固定属性上进行预测的模型来完成制图任务,但这类模型限制很大,只能解决其训练所针对的高度特定任务;而对文本进行制图,则可以在几乎没有限制的情况下解决各类制图问题。为实现这一目标,我们在一个新的大规模成对俯视影像与地面图像数据集上训练了名为 Sat2Cap 的对比学习框架。对于给定位置,我们的模型预测地面场景的期望 CLIP 嵌入。Sat2Cap 还以时间信息为条件,使其能够学习随时间变化的动态概念。实验结果表明,我们的模型能够成功捕捉细粒度概念,并有效适应时间变化。该方法不需要任何文本标注数据,因此训练易于扩展。代码、数据集和模型将公开发布。
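
Since Sat2Cap is described as a contrastive framework that predicts the CLIP embedding of the ground-level scene from an overhead image, a standard symmetric InfoNCE objective between the two embedding sets is one plausible training loss. The sketch below assumes both embeddings are already computed; the temperature and the exact loss form are assumptions, not the paper's confirmed objective.

```python
import torch
import torch.nn.functional as F

def clip_style_contrastive_loss(overhead_emb, ground_clip_emb, temperature=0.07):
    """overhead_emb: (B, D) embeddings predicted from satellite images.
    ground_clip_emb: (B, D) frozen CLIP embeddings of the paired ground-level
    photos. Symmetric InfoNCE pulls matched pairs together (illustrative)."""
    a = F.normalize(overhead_emb, dim=-1)
    b = F.normalize(ground_clip_emb, dim=-1)
    logits = a @ b.t() / temperature
    targets = torch.arange(a.size(0), device=a.device)
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

loss = clip_style_contrastive_loss(torch.randn(16, 512), torch.randn(16, 512))
```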

Effective Whole-body Pose Estimation with Two-stages Distillation

  • paper_url: http://arxiv.org/abs/2307.15880
  • repo_url: https://github.com/idea-research/dwpose
  • paper_authors: Zhendong Yang, Ailing Zeng, Chun Yuan, Yu Li
  • for: 本研究旨在提高全身姿势估计器的效率和精度。
  • methods: 我们提出了一个两阶段的姿势炼制方法,名为DWPose,以提高姿势估计器的效果和效率。
  • results: 我们的方法可以在COCO-WholeBody测试集上实现新的顶尖性能,从RTMPose-l的64.8% Whole Body AP提高到66.5%,甚至超过RTMPose-x教师的65.3% AP。
    Abstract Whole-body pose estimation localizes the human body, hand, face, and foot keypoints in an image. This task is challenging due to multi-scale body parts, fine-grained localization for low-resolution regions, and data scarcity. Meanwhile, applying a highly efficient and accurate pose estimator to widely human-centric understanding and generation tasks is urgent. In this work, we present a two-stage pose \textbf{D}istillation for \textbf{W}hole-body \textbf{P}ose estimators, named \textbf{DWPose}, to improve their effectiveness and efficiency. The first-stage distillation designs a weight-decay strategy while utilizing a teacher's intermediate feature and final logits with both visible and invisible keypoints to supervise the student from scratch. The second stage distills the student model itself to further improve performance. Different from the previous self-knowledge distillation, this stage finetunes the student's head with only 20% training time as a plug-and-play training strategy. For data limitations, we explore the UBody dataset that contains diverse facial expressions and hand gestures for real-life applications. Comprehensive experiments show the superiority of our proposed simple yet effective methods. We achieve new state-of-the-art performance on COCO-WholeBody, significantly boosting the whole-body AP of RTMPose-l from 64.8% to 66.5%, even surpassing RTMPose-x teacher with 65.3% AP. We release a series of models with different sizes, from tiny to large, for satisfying various downstream tasks. Our codes and models are available at https://github.com/IDEA-Research/DWPose.
    摘要 全身姿态估计旨在定位图像中人体、手、面部和脚部的关键点。由于身体各部分尺度差异大、低分辨率区域需要细粒度定位,且数据稀缺,这项任务具有挑战性;与此同时,将高效且精确的姿态估计器应用于各类以人为中心的理解与生成任务也十分迫切。为此,我们提出了一种两阶段的全身姿态蒸馏方法,称为 DWPose,以提升姿态估计器的有效性与效率。第一阶段蒸馏设计了权重衰减策略,并利用教师的中间特征和包含可见与不可见关键点的最终 logits,从零开始监督学生模型;第二阶段对学生模型自身进行蒸馏,以进一步提升性能。与以往的自我知识蒸馏不同,该阶段仅用 20% 的训练时间微调学生的头部,可作为即插即用的训练策略。针对数据有限的问题,我们探索了包含多样面部表情和手势、适用于真实场景的 UBody 数据集。全面的实验证明了这一简单而有效方法的优越性:我们在 COCO-WholeBody 上取得了新的最先进性能,将 RTMPose-l 的全身 AP 从 64.8% 提升到 66.5%,甚至超过了 RTMPose-x 教师的 65.3% AP。我们发布了从小到大不同规模的一系列模型,以满足各种下游任务。代码和模型可在 https://github.com/IDEA-Research/DWPose 获取。
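
A rough sketch of the first-stage distillation described above (supervising the student with a teacher's intermediate features and final keypoint logits, visible and invisible alike) is given below. The loss weights and the use of plain MSE are illustrative choices rather than the paper's exact formulation, and the second-stage head-only fine-tuning is only indicated in a comment.

```python
import torch
import torch.nn.functional as F

def stage1_distill_loss(student_feat, teacher_feat, student_logits, teacher_logits,
                        feat_w=1.0, logit_w=1.0):
    """First-stage distillation sketch: match an intermediate teacher feature map
    and the final keypoint logits (soft targets are kept for both visible and
    invisible keypoints). Weights and MSE terms are illustrative assumptions."""
    feat_loss = F.mse_loss(student_feat, teacher_feat)
    logit_loss = F.mse_loss(student_logits, teacher_logits)
    return feat_w * feat_loss + logit_w * logit_loss

loss = stage1_distill_loss(torch.randn(2, 256, 48, 64), torch.randn(2, 256, 48, 64),
                           torch.randn(2, 133, 2), torch.randn(2, 133, 2))

# Second stage (head-only self-distillation) would freeze the backbone and update
# just the head parameters, e.g.:
# head_params = [p for n, p in student.named_parameters() if n.startswith("head")]
# optimizer = torch.optim.AdamW(head_params, lr=1e-4)
```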

Cross-dimensional transfer learning in medical image segmentation with deep learning

  • paper_url: http://arxiv.org/abs/2307.15872
  • repo_url: https://github.com/hic-messaoudi/cross-dimensional-transfer-learning-in-medical-image-segmentation-with-deep-learning
  • paper_authors: Hicham Messaoudi, Ahror Belaid, Douraied Ben Salem, Pierre-Henri Conze
  • for: This paper is focused on improving the efficiency of medical image segmentation using convolutional neural networks (CNNs) and transfer learning.
  • methods: The authors propose two novel architectures based on weight transfer and dimensional transfer to adapt a pre-trained 2D CNN to 2D, 3D uni- and multi-modal medical image segmentation tasks.
  • results: The proposed methods were tested on several benchmarks and achieved promising results, ranking first on the CAMUS challenge and outperforming other 2D-based methods on the CHAOS challenge. The 3D network also achieved good results on the BraTS 2022 competition, with an average Dice score of 91.69% for the whole tumor.
    Abstract Over the last decade, convolutional neural networks have emerged and advanced the state-of-the-art in various image analysis and computer vision applications. The performance of 2D image classification networks is constantly improving and being trained on databases made of millions of natural images. However, progress in medical image analysis has been hindered by limited annotated data and acquisition constraints. These limitations are even more pronounced given the volumetry of medical imaging data. In this paper, we introduce an efficient way to transfer the efficiency of a 2D classification network trained on natural images to 2D, 3D uni- and multi-modal medical image segmentation applications. In this direction, we designed novel architectures based on two key principles: weight transfer by embedding a 2D pre-trained encoder into a higher dimensional U-Net, and dimensional transfer by expanding a 2D segmentation network into a higher dimension one. The proposed networks were tested on benchmarks comprising different modalities: MR, CT, and ultrasound images. Our 2D network ranked first on the CAMUS challenge dedicated to echo-cardiographic data segmentation and surpassed the state-of-the-art. Regarding 2D/3D MR and CT abdominal images from the CHAOS challenge, our approach largely outperformed the other 2D-based methods described in the challenge paper on Dice, RAVD, ASSD, and MSSD scores and ranked third on the online evaluation platform. Our 3D network applied to the BraTS 2022 competition also achieved promising results, reaching an average Dice score of 91.69% (91.22%) for the whole tumor, 83.23% (84.77%) for the tumor core, and 81.75% (83.88%) for enhanced tumor using the approach based on weight (dimensional) transfer. Experimental and qualitative results illustrate the effectiveness of our methods for multi-dimensional medical image segmentation.
    摘要 Our approach is based on two key principles: weight transfer and dimensional transfer. We embed a pre-trained 2D encoder into a higher dimensional U-Net to transfer the weights of the 2D network to the 3D network. Additionally, we expand the 2D segmentation network into a higher dimensional one to transfer the dimensionality of the 2D network to the 3D network.We tested our proposed networks on several benchmarks, including MR, CT, and ultrasound images. Our 2D network ranked first on the CAMUS challenge for echo-cardiographic data segmentation and surpassed the state-of-the-art. On the CHAOS challenge, our approach outperformed other 2D-based methods on Dice, RAVD, ASSD, and MSSD scores and ranked third on the online evaluation platform. Our 3D network achieved promising results on the BraTS 2022 competition, with an average Dice score of 91.69% (91.22%) for the whole tumor, 83.23% (84.77%) for the tumor core, and 81.75% (83.88%) for enhanced tumor.Experimental and qualitative results demonstrate the effectiveness of our methods for multi-dimensional medical image segmentation. By leveraging the efficiency of 2D networks and the robustness of 3D networks, our approach offers a promising solution for medical image analysis.
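
One common way to realize the "weight transfer" of a pre-trained 2D encoder into a 3D network is I3D-style kernel inflation: replicate each 2D kernel along the new depth axis and rescale so activation magnitudes are preserved. The sketch below shows that mechanism; it is an assumption about how such a transfer could be implemented, not necessarily the exact architecture proposed in the paper.

```python
import torch
import torch.nn as nn

def inflate_conv2d_to_3d(conv2d: nn.Conv2d, depth: int = 3) -> nn.Conv3d:
    """Copies a pre-trained 2D kernel into a 3D kernel by replicating it along the
    new depth axis and rescaling by 1/depth (illustrative weight-transfer sketch)."""
    conv3d = nn.Conv3d(conv2d.in_channels, conv2d.out_channels,
                       kernel_size=(depth, *conv2d.kernel_size),
                       stride=(1, *conv2d.stride),
                       padding=(depth // 2, *conv2d.padding),
                       bias=conv2d.bias is not None)
    with torch.no_grad():
        w3d = conv2d.weight.unsqueeze(2).repeat(1, 1, depth, 1, 1) / depth
        conv3d.weight.copy_(w3d)
        if conv2d.bias is not None:
            conv3d.bias.copy_(conv2d.bias)
    return conv3d

conv2d = nn.Conv2d(3, 16, kernel_size=3, padding=1)   # stands in for a pretrained layer
conv3d = inflate_conv2d_to_3d(conv2d)
print(conv3d(torch.randn(1, 3, 8, 32, 32)).shape)     # torch.Size([1, 16, 8, 32, 32])
```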

Catching Elusive Depression via Facial Micro-Expression Recognition

  • paper_url: http://arxiv.org/abs/2307.15862
  • repo_url: None
  • paper_authors: Xiaohui Chen, Tie Luo
  • for: 本研究旨在通过识别被掩饰的真实情绪来诊断隐藏性抑郁症。
  • methods: 使用面部微表情(FMEs)检测并识别潜在的真实情绪,并提出基于面部特征点的感兴趣区域(ROI)方法,以应对 FMEs 强度极低、特征细微带来的识别难题。
  • results: 提出了一种低成本、保护隐私的自诊断方案,可在家庭等个人环境中使用便携式移动设备完成;实验结果验证了该方法的有效性,并讨论了将其应用于真实临床场景的其他技术挑战与未来方向。
    Abstract Depression is a common mental health disorder that can cause consequential symptoms with continuously depressed mood that leads to emotional distress. One category of depression is Concealed Depression, where patients intentionally or unintentionally hide their genuine emotions through exterior optimism, thereby complicating and delaying diagnosis and treatment and leading to unexpected suicides. In this paper, we propose to diagnose concealed depression by using facial micro-expressions (FMEs) to detect and recognize underlying true emotions. However, the extremely low intensity and subtle nature of FMEs make their recognition a tough task. We propose a facial landmark-based Region-of-Interest (ROI) approach to address the challenge, and describe a low-cost and privacy-preserving solution that enables self-diagnosis using portable mobile devices in a personal setting (e.g., at home). We present results and findings that validate our method, and discuss other technical challenges and future directions in applying such techniques to real clinical settings.
    摘要 抑郁是一种常见的心理健康问题,可能导致持续的沮丧情绪,从而引起情感压力。一种类型的抑郁是隐藏的抑郁,病人可能有意或无意地隐藏真实的情感,从而使诊断和治疗受阻,并可能导致意外的自杀。在这篇论文中,我们提议使用表情微表情(FMEs)来检测和识别隐藏的抑郁。然而,FMEs的非常低敏感度和细微的特征使其识别成为一项困难的任务。我们提议一种面部特征点-基于的区域引用(ROI)方法,以解决这个挑战。我们描述了一种低成本、隐私保护的解决方案,允许个人在家中使用可携带的移动设备进行自诊断。我们提供的结果和发现证明了我们的方法的有效性,并讨论了在实际临床设置中应用这种技术的其他技术挑战和未来方向。

What can Discriminator do? Towards Box-free Ownership Verification of Generative Adversarial Network

  • paper_url: http://arxiv.org/abs/2307.15860
  • repo_url: None
  • paper_authors: Ziheng Huang, Boheng Li, Yan Cai, Run Wang, Shangwei Guo, Liming Fang, Jing Chen, Lina Wang
  • for: 本研究旨在防止生成器模型被非法盗用或泄露,通过在不选择输入的情况下进行所有权验证。
  • methods: 我们提出了一种基于权威识别器的IP保护方案,通过训练权威识别器学习一个圆柱体来捕捉生成器唯一的分布。
  • results: 我们的方案在两个流行的 GAN 任务和多达 10 种 GAN 架构上进行了广泛评估,结果显示其能够高效地验证所有权;此外,该方案还能抵御基于输入的移除攻击及其他已有攻击。
    Abstract In recent decades, Generative Adversarial Network (GAN) and its variants have achieved unprecedented success in image synthesis. However, well-trained GANs are under the threat of illegal steal or leakage. The prior studies on remote ownership verification assume a black-box setting where the defender can query the suspicious model with specific inputs, which we identify is not enough for generation tasks. To this end, in this paper, we propose a novel IP protection scheme for GANs where ownership verification can be done by checking outputs only, without choosing the inputs (i.e., box-free setting). Specifically, we make use of the unexploited potential of the discriminator to learn a hypersphere that captures the unique distribution learned by the paired generator. Extensive evaluations on two popular GAN tasks and more than 10 GAN architectures demonstrate our proposed scheme to effectively verify the ownership. Our proposed scheme shown to be immune to popular input-based removal attacks and robust against other existing attacks. The source code and models are available at https://github.com/AbstractTeen/gan_ownership_verification
    摘要 近几十年来,生成对抗网络(GAN)及其变体在图像合成方面取得了前所未有的成功。然而,训练好的 GAN 面临被非法窃取或泄露的威胁。先前关于远程所有权验证的研究假设黑盒设置,即防御方可以用特定输入查询可疑模型;我们发现这对于生成任务是不够的。为此,本文提出了一种新的 GAN 知识产权保护方案,仅通过检查输出即可完成所有权验证,而无需选择输入(即无盒设置)。具体而言,我们利用判别器尚未被发掘的潜力,学习一个能够刻画配对生成器所学独特分布的超球面。在两个流行的 GAN 任务和超过 10 种 GAN 架构上的大量评估表明,所提方案能够有效地验证所有权,并且对基于输入的移除攻击具有免疫性,对其他现有攻击也具有鲁棒性。源代码和模型可在 https://github.com/AbstractTeen/gan_ownership_verification 获取。
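
The hypersphere learned by the discriminator can be sketched with a Deep SVDD-style objective on discriminator features: pull features of the paired generator's outputs towards a fixed centre, then verify ownership by checking whether a suspect model's outputs fall inside the learned radius. The functions, centre initialisation, and thresholding below are illustrative assumptions, not the paper's exact scheme.

```python
import torch

def hypersphere_loss(feats, center):
    """feats: (B, D) discriminator features of images from the paired generator;
    center: (D,) fixed hypersphere centre. Minimising the mean squared distance
    pulls the generator's output distribution inside a compact sphere."""
    return ((feats - center) ** 2).sum(dim=1).mean()

def ownership_score(feats, center, radius):
    """At verification time, a suspect model is flagged as 'same generator' when
    most of its output features fall inside the learned radius (illustrative)."""
    dist = ((feats - center) ** 2).sum(dim=1).sqrt()
    return (dist <= radius).float().mean()

feats = torch.randn(32, 128)                 # stand-in for discriminator features
center = feats.mean(dim=0).detach()          # a common centre initialisation choice
loss = hypersphere_loss(feats, center)
score = ownership_score(feats, center, radius=12.0)
```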

Seeing Behind Dynamic Occlusions with Event Cameras

  • paper_url: http://arxiv.org/abs/2307.15829
  • repo_url: None
  • paper_authors: Rong Zou, Manasi Muglikar, Nico Messikommer, Davide Scaramuzza
  • for: 提高计算机视觉系统的性能,解决干扰物(如尘埃、雨滴、雪花)对计算机视觉系统的影响
  • methods: 组合传统摄像机和事件摄像机,利用事件提供高时间分辨率的背景内容重建
  • results: 比image填充方法高3dB的PSNR提高,在我们的数据集上表现出色
    Abstract Unwanted camera occlusions, such as debris, dust, rain-drops, and snow, can severely degrade the performance of computer-vision systems. Dynamic occlusions are particularly challenging because of the continuously changing pattern. Existing occlusion-removal methods currently use synthetic aperture imaging or image inpainting. However, they face issues with dynamic occlusions as these require multiple viewpoints or user-generated masks to hallucinate the background intensity. We propose a novel approach to reconstruct the background from a single viewpoint in the presence of dynamic occlusions. Our solution relies for the first time on the combination of a traditional camera with an event camera. When an occlusion moves across a background image, it causes intensity changes that trigger events. These events provide additional information on the relative intensity changes between foreground and background at a high temporal resolution, enabling a truer reconstruction of the background content. We present the first large-scale dataset consisting of synchronized images and event sequences to evaluate our approach. We show that our method outperforms image inpainting methods by 3dB in terms of PSNR on our dataset.
    摘要 不想要的摄像头干扰,如垃圾、尘埃、雨滴和雪,可能严重降低计算机视觉系统的性能。动态干扰特别困难,因为它们的模式在不断变化。现有的干扰除方法使用合成光学投影或图像填充。然而,它们在动态干扰情况下存在问题,因为它们需要多个视点或用户生成的面积图来描述背景强度。我们提出了一种新的方法,使用传统摄像头和事件摄像头的组合来重建背景。当干扰物移动过背景图像时,它会导致强度变化,这些变化会触发事件。这些事件提供高度准确的时间分辨率中的背景内容的重建信息。我们提供了首个大规模的同步图像和事件序列数据集,以评估我们的方法。我们表明,我们的方法在我们的数据集上比图像填充方法高3dB的PSNR。

Multi-growth stage plant recognition: a case study of Palmer amaranth (Amaranthus palmeri) in cotton (Gossypium hirsutum)

  • paper_url: http://arxiv.org/abs/2307.15816
  • repo_url: None
  • paper_authors: Guy RY Coleman, Matthew Kutugata, Michael J Walsh, Muthukumar Bagavathiannan
  • For: The paper is focused on developing and testing a method for recognizing growth stages of Amaranthus palmeri (a weed plant in cotton production) using convolutional neural networks (CNNs) and the You Only Look Once (YOLO) architecture.
  • Methods: The authors use 26 different architecture variants from YOLO v3, v5, v6, v6 3.0, v7, and v8 to recognize eight different growth stages of A. palmeri. They compare the performance of these architectures on an eight-class growth stage dataset and use class activation maps (CAM) to understand model attention on the complex dataset.
  • Results: The highest mAP@[0.5:0.95] for recognition of all growth stage classes was 47.34% achieved by v8-X, with inter-class confusion across visually similar growth stages. With all growth stages grouped as a single class, performance increased, with a maximum mAP@[0.5:0.95] of 67.05% achieved by v7-Original. Single class recall of up to 81.42% was achieved by v5-X, and precision of up to 89.72% was achieved by v8-X.
    Abstract Many advanced, image-based precision agricultural technologies for plant breeding, field crop research, and site-specific crop management hinge on the reliable detection and phenotyping of plants across highly variable morphological growth stages. Convolutional neural networks (CNNs) have shown promise for image-based plant phenotyping and weed recognition, but their ability to recognize growth stages, often with stark differences in appearance, is uncertain. Amaranthus palmeri (Palmer amaranth) is a particularly challenging weed plant in cotton (Gossypium hirsutum) production, exhibiting highly variable plant morphology both across growth stages over a growing season, as well as between plants at a given growth stage due to high genetic diversity. In this paper, we investigate eight-class growth stage recognition of A. palmeri in cotton as a challenging model for You Only Look Once (YOLO) architectures. We compare 26 different architecture variants from YOLO v3, v5, v6, v6 3.0, v7, and v8 on an eight-class growth stage dataset of A. palmeri. The highest mAP@[0.5:0.95] for recognition of all growth stage classes was 47.34% achieved by v8-X, with inter-class confusion across visually similar growth stages. With all growth stages grouped as a single class, performance increased, with a maximum mean average precision (mAP@[0.5:0.95]) of 67.05% achieved by v7-Original. Single class recall of up to 81.42% was achieved by v5-X, and precision of up to 89.72% was achieved by v8-X. Class activation maps (CAM) were used to understand model attention on the complex dataset. Fewer classes, grouped by visual or size features improved performance over the ground-truth eight-class dataset. Successful growth stage detection highlights the substantial opportunity for improving plant phenotyping and weed recognition technologies with open-source object detection architectures.

VPP: Efficient Conditional 3D Generation via Voxel-Point Progressive Representation

  • paper_url: http://arxiv.org/abs/2307.16605
  • repo_url: https://github.com/qizekun/vpp
  • paper_authors: Zekun Qi, Muzhou Yu, Runpei Dong, Kaisheng Ma
  • for: 这篇论文主要用于提高3D生成的效率和质量。
  • methods: 该论文提出了一种渐进式生成方法——体素-点渐进表示(VPP),结合结构化体素表示与稀疏点表示,实现高效的多类别物体生成。
  • results: 实验表明,VPP 能在 0.2 秒内高效生成高质量的 8K 点云,并可迁移到生成、编辑、补全和预训练等多种 3D 下游任务。
    Abstract Conditional 3D generation is undergoing a significant advancement, enabling the free creation of 3D content from inputs such as text or 2D images. However, previous approaches have suffered from low inference efficiency, limited generation categories, and restricted downstream applications. In this work, we revisit the impact of different 3D representations on generation quality and efficiency. We propose a progressive generation method through Voxel-Point Progressive Representation (VPP). VPP leverages structured voxel representation in the proposed Voxel Semantic Generator and the sparsity of unstructured point representation in the Point Upsampler, enabling efficient generation of multi-category objects. VPP can generate high-quality 8K point clouds within 0.2 seconds. Additionally, the masked generation Transformer allows for various 3D downstream tasks, such as generation, editing, completion, and pre-training. Extensive experiments demonstrate that VPP efficiently generates high-fidelity and diverse 3D shapes across different categories, while also exhibiting excellent representation transfer performance. Codes will be released on https://github.com/qizekun/VPP.
    摘要 条件 3D 生成正在取得重大进展,使人们能够根据文本或 2D 图像等输入自由创建 3D 内容。然而,以往的方法存在推理效率低、生成类别有限、下游应用受限等问题。在本工作中,我们重新审视了不同 3D 表示对生成质量与效率的影响,并提出了一种渐进式生成方法——体素-点渐进表示(VPP)。VPP 在所提出的体素语义生成器中利用结构化体素表示,并在点上采样器中利用非结构化点表示的稀疏性,实现高效的多类别物体生成。VPP 可以在 0.2 秒内生成高质量的 8K 点云。此外,带掩码的生成 Transformer 还支持生成、编辑、补全和预训练等多种 3D 下游任务。大量实验表明,VPP 能够高效地生成不同类别的高保真且多样的 3D 形状,并展现出优秀的表示迁移性能。代码将在 https://github.com/qizekun/VPP 发布。

Semi-Supervised Object Detection in the Open World

  • paper_url: http://arxiv.org/abs/2307.15710
  • repo_url: None
  • paper_authors: Garvita Allabadi, Ana Lucic, Peter Pao-Huang, Yu-Xiong Wang, Vikram Adve
  • for: 本研究针对开放世界下的半监督目标检测问题,解决现有方法假设训练数据与无标注数据中只包含固定类别的局限。
  • methods: 我们提出了 Open World Semi-supervised Detection 框架(OWSSD),能够有效检测分布外(OOD)样本并从中学习;其中的 OOD 检测器由仅在分布内(ID)数据上训练的轻量级自编码器集成构成。
  • results: 我们通过广泛的评估表明,我们的方法可以与现状的OOD探测算法竞争,同时也可以在开放世界enario中显著提高 semi-supervised 学习性能。
    Abstract Existing approaches for semi-supervised object detection assume a fixed set of classes present in training and unlabeled datasets, i.e., in-distribution (ID) data. The performance of these techniques significantly degrades when these techniques are deployed in the open-world, due to the fact that the unlabeled and test data may contain objects that were not seen during training, i.e., out-of-distribution (OOD) data. The two key questions that we explore in this paper are: can we detect these OOD samples and if so, can we learn from them? With these considerations in mind, we propose the Open World Semi-supervised Detection framework (OWSSD) that effectively detects OOD data along with a semi-supervised learning pipeline that learns from both ID and OOD data. We introduce an ensemble based OOD detector consisting of lightweight auto-encoder networks trained only on ID data. Through extensive evalulation, we demonstrate that our method performs competitively against state-of-the-art OOD detection algorithms and also significantly boosts the semi-supervised learning performance in open-world scenarios.
    摘要 现有的半监督目标检测方法假设训练数据和无标注数据集中只包含固定的类别集合,即分布内(ID)数据。当这些技术部署到开放世界时,其性能会显著下降,因为无标注数据和测试数据可能包含训练中未见过的对象,即分布外(OOD)数据。我们在这篇论文中探讨两个关键问题:能否检测这些 OOD 样本?如果可以,能否从中学习?针对这些考虑,我们提出了开放世界半监督检测框架(OWSSD),它能够有效检测 OOD 数据,同时从 ID 和 OOD 数据中共同学习。我们提出了一种由仅在 ID 数据上训练的轻量级自编码器组成的集成 OOD 检测器。经过广泛的评估,我们证明了该方法与最先进的 OOD 检测算法相比具有竞争力,同时在开放世界场景中显著提升了半监督学习性能。
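
The ensemble-of-autoencoders OOD detector described above can be sketched as follows: several lightweight auto-encoders are trained only on in-distribution features, and the averaged reconstruction error serves as the OOD score. The tiny architecture, feature dimensionality, and ensemble size below are illustrative assumptions.

```python
import torch
import torch.nn as nn

class TinyAE(nn.Module):
    """Lightweight auto-encoder over pooled detector features (illustrative sizes)."""
    def __init__(self, dim=256, hidden=32):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU())
        self.dec = nn.Linear(hidden, dim)

    def forward(self, x):
        return self.dec(self.enc(x))

def ood_score(feats, ensemble):
    """Mean reconstruction error over the ensemble; since the auto-encoders are
    trained only on ID features, large errors indicate out-of-distribution inputs."""
    with torch.no_grad():
        errs = [((ae(feats) - feats) ** 2).mean(dim=1) for ae in ensemble]
    return torch.stack(errs, dim=0).mean(dim=0)

ensemble = [TinyAE() for _ in range(3)]   # would be trained on ID features first
feats = torch.randn(10, 256)
print(ood_score(feats, ensemble).shape)   # torch.Size([10]); higher = more OOD
```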

MeMOTR: Long-Term Memory-Augmented Transformer for Multi-Object Tracking

  • paper_url: http://arxiv.org/abs/2307.15700
  • repo_url: https://github.com/mcg-nju/memotr
  • paper_authors: Ruopeng Gao, Limin Wang
  • for: 本研究旨在提高多目标追踪(MOT)中的时间信息捕捉效果,尤其是在短时间内的目标追踪 tasks 上。
  • methods: 我们提出了一种基于Transformer的长期记忆扩展方法,named MeMOTR,通过将长期记忆注入到自定义的记忆注意力层中,使同一个目标的追踪嵌入更加稳定和区别化。
  • results: 我们在DanceTrack上进行了实验,结果显示MeMOTR比前一代方法提高7.9%和13.0%的HOTA和AssA指标,并且在MOT17和BDD100K上也超越了其他Transformer-based方法的协议性表现。
    Abstract As a video task, Multiple Object Tracking (MOT) is expected to capture temporal information of targets effectively. Unfortunately, most existing methods only explicitly exploit the object features between adjacent frames, while lacking the capacity to model long-term temporal information. In this paper, we propose MeMOTR, a long-term memory-augmented Transformer for multi-object tracking. Our method is able to make the same object's track embedding more stable and distinguishable by leveraging long-term memory injection with a customized memory-attention layer. This significantly improves the target association ability of our model. Experimental results on DanceTrack show that MeMOTR impressively surpasses the state-of-the-art method by 7.9% and 13.0% on HOTA and AssA metrics, respectively. Furthermore, our model also outperforms other Transformer-based methods on association performance on MOT17 and generalizes well on BDD100K. Code is available at https://github.com/MCG-NJU/MeMOTR.
    摘要 作为视频任务,多目标跟踪(MOT)预期能够有效捕捉目标的时间信息。然而,大多数现有方法只是直接利用邻帧对象特征,而忽略了长期时间信息的模型。在这篇论文中,我们提出了MeMOTR,一种带有长期记忆的Transformer型多目标跟踪方法。我们的方法可以使同一个目标的跟踪嵌入更加稳定和分化,通过自定义的记忆注意力层进行长期记忆注入。这意味着我们的模型可以更好地进行目标相关性能。实验结果表明,MeMOTR在DanceTrack上表现出色,与比较方法相比,提高了7.9%和13.0%的HOTA和AssA指标。此外,我们的模型还超过了其他基于Transformer的方法在相关性能上,并在MOT17和BDD100K上进行了良好的总体化。代码可以在https://github.com/MCG-NJU/MeMOTR中找到。
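
A minimal reading of "long-term memory injection" is to keep one slowly updated embedding per track, e.g. an exponential moving average, and reuse it when associating new detections. The sketch below shows only that memory update; the momentum value is an assumption, and the paper's customized memory-attention layer is not reproduced here.

```python
import torch

class LongTermMemory:
    """Keeps one slowly updated embedding per track identity (EMA sketch).
    The momentum and dictionary-based storage are illustrative assumptions."""
    def __init__(self, momentum=0.9):
        self.momentum = momentum
        self.memory = {}  # track_id -> (D,) tensor

    def update(self, track_id, embedding):
        if track_id not in self.memory:
            self.memory[track_id] = embedding.detach()
        else:
            m = self.memory[track_id]
            self.memory[track_id] = self.momentum * m + (1 - self.momentum) * embedding.detach()
        return self.memory[track_id]

mem = LongTermMemory()
stable = mem.update(track_id=7, embedding=torch.randn(256))  # more stable track embedding
```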

SimDETR: Simplifying self-supervised pretraining for DETR

  • paper_url: http://arxiv.org/abs/2307.15697
  • repo_url: None
  • paper_authors: Ioannis Maniadis Metaxas, Adrian Bulat, Ioannis Patras, Brais Martinez, Georgios Tzimiropoulos
  • for: 提高 DETR 基于检测器的效果,提高 sample efficiency 和速度
  • methods: 采用无监督预训练:利用高层特征图生成更具语义的初始候选框,基于聚类产生的对象伪标签进行判别式训练,并通过自训练利用检测器不断改进的候选框。
  • results: 与先前的预训练方法相比,我们的预训练方法在全量数据和低数据设置下均带来显著提升,并且可以直接在 COCO 等复杂图像数据集上从头预训练 DETR(包括骨干网络)。
    Abstract DETR-based object detectors have achieved remarkable performance but are sample-inefficient and exhibit slow convergence. Unsupervised pretraining has been found to be helpful to alleviate these impediments, allowing training with large amounts of unlabeled data to improve the detector's performance. However, existing methods have their own limitations, like keeping the detector's backbone frozen in order to avoid performance degradation and utilizing pretraining objectives misaligned with the downstream task. To overcome these limitations, we propose a simple pretraining framework for DETR-based detectors that consists of three simple yet key ingredients: (i) richer, semantics-based initial proposals derived from high-level feature maps, (ii) discriminative training using object pseudo-labels produced via clustering, (iii) self-training to take advantage of the improved object proposals learned by the detector. We report two main findings: (1) Our pretraining outperforms prior DETR pretraining works on both the full and low data regimes by significant margins. (2) We show we can pretrain DETR from scratch (including the backbone) directly on complex image datasets like COCO, paving the path for unsupervised representation learning directly using DETR.
    摘要 基于 DETR 的目标检测器性能出色,但样本效率低、收敛慢。无监督预训练有助于缓解这些问题,但现有方法存在各自的局限,例如为避免性能退化而冻结骨干网络,或使用与下游任务不一致的预训练目标。为此,我们提出了一个简单的 DETR 预训练框架,包含三个关键要素:(1)基于高层特征图的、更具语义的初始候选框;(2)利用聚类产生的对象伪标签进行判别式训练;(3)通过自训练利用检测器学到的更优候选框。我们的研究发现:(1)我们的预训练在全量数据和低数据设置下都显著优于以往的 DETR 预训练工作;(2)我们可以直接在 COCO 等复杂图像数据集上从头(包括骨干网络)预训练 DETR,为直接使用 DETR 进行无监督表示学习铺平道路。

PatchMixer: Rethinking network design to boost generalization for 3D point cloud understanding

  • paper_url: http://arxiv.org/abs/2307.15692
  • repo_url: https://github.com/davideboscaini/patchmixer
  • paper_authors: Davide Boscaini, Fabio Poiesi
  • for: 本研究旨在评估深度学习方法对3D点云理解的能力,并提出一种简单 yet effective的方法来扩展MLP-Mixer纸质。
  • methods: 本方法使用本地小块处理而不是整个形状,以促进对部分点云的稳定性,并使用MLP进行局部特征聚合。
  • results: 我们在形态分类和部分分割任务中评估了我们的方法,与一些相关的深度架构进行比较,得到了更好的总体适应性表现。
    Abstract The recent trend in deep learning methods for 3D point cloud understanding is to propose increasingly sophisticated architectures either to better capture 3D geometries or by introducing possibly undesired inductive biases. Moreover, prior works introducing novel architectures compared their performance on the same domain, devoting less attention to their generalization to other domains. We argue that the ability of a model to transfer the learnt knowledge to different domains is an important feature that should be evaluated to exhaustively assess the quality of a deep network architecture. In this work we propose PatchMixer, a simple yet effective architecture that extends the ideas behind the recent MLP-Mixer paper to 3D point clouds. The novelties of our approach are the processing of local patches instead of the whole shape to promote robustness to partial point clouds, and the aggregation of patch-wise features using an MLP as a simpler alternative to the graph convolutions or the attention mechanisms that are used in prior works. We evaluated our method on the shape classification and part segmentation tasks, achieving superior generalization performance compared to a selection of the most relevant deep architectures.
    摘要 现代深度学习方法的趋势是不断提出更加复杂的架构,以更好地捕捉3D形态或者引入可能不希望的逻辑偏见。然而,先前的工作通常只在同一个领域中评估其性能,对其在其他领域的泛化性能 menos 关注。我们认为,深度网络模型在不同领域中的泛化性能是一项重要的评估标准。在这篇文章中,我们提出了PatchMixer,一种简单 yet effective的架构,扩展了最近的 MLP-Mixer 文献中的想法,并对3D 点云进行处理。我们的方法的创新之处在于处理本地小块而不是整个形态,以便增强对部分点云的Robustness,以及通过 MLP 作为更简单的替代品,对于先前的图像 convolution 或者注意力机制进行聚合。我们对 shape classification 和部分 segmentation 任务进行评估,并实现了相比一些最相关的深度架构的更好的泛化性能。
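
The patch-based processing plus MLP aggregation described above can be sketched as: sample patch centres, gather the k nearest neighbours of each centre, encode each centred patch with a shared point MLP, then aggregate patch descriptors with another MLP. Random centre sampling and the layer sizes below are simplifying assumptions, not the paper's exact PatchMixer design.

```python
import torch
import torch.nn as nn

class SimplePatchMixer(nn.Module):
    """Processes local patches of a point cloud with a shared MLP and aggregates
    patch descriptors with another MLP (instead of graph convolutions/attention).
    Random patch centres and sizes are illustrative simplifications."""
    def __init__(self, k=32, n_patches=64, dim=64, n_classes=10):
        super().__init__()
        self.k, self.n_patches = k, n_patches
        self.point_mlp = nn.Sequential(nn.Linear(3, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.patch_mlp = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, n_classes))

    def forward(self, pts):                      # pts: (B, N, 3)
        B, N, _ = pts.shape
        idx = torch.randint(0, N, (B, self.n_patches), device=pts.device)
        centers = torch.gather(pts, 1, idx.unsqueeze(-1).expand(-1, -1, 3))
        d = torch.cdist(centers, pts)            # (B, P, N) distances to all points
        knn = d.topk(self.k, dim=-1, largest=False).indices
        patches = torch.gather(pts.unsqueeze(1).expand(-1, self.n_patches, -1, -1),
                               2, knn.unsqueeze(-1).expand(-1, -1, -1, 3))
        local = patches - centers.unsqueeze(2)   # centre each local patch
        feat = self.point_mlp(local).max(dim=2).values      # per-patch descriptor
        return self.patch_mlp(feat.mean(dim=1))             # (B, n_classes)

model = SimplePatchMixer()
logits = model(torch.randn(2, 1024, 3))
```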

TrackAgent: 6D Object Tracking via Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2307.15671
  • repo_url: None
  • paper_authors: Konstantin Röhrl, Dominik Bauer, Timothy Patten, Markus Vincze
  • for: Tracking an object’s 6D pose in robotics and augmented reality applications, while either the object or the observing camera is moving.
  • methods: Simplify object tracking to a reinforced point cloud (depth only) alignment task, using a streamlined approach with limited amounts of sparse 3D point clouds and a reinforcement learning (RL) agent that jointly solves for both objectives.
  • results: The RL agent’s uncertainty and a rendering-based mask propagation are effective reinitialization triggers, and the proposed method outperforms previous RGB(D)-based methods in terms of computational efficiency and robustness to tracking loss.
    Abstract Tracking an object's 6D pose, while either the object itself or the observing camera is moving, is important for many robotics and augmented reality applications. While exploiting temporal priors eases this problem, object-specific knowledge is required to recover when tracking is lost. Under the tight time constraints of the tracking task, RGB(D)-based methods are often conceptionally complex or rely on heuristic motion models. In comparison, we propose to simplify object tracking to a reinforced point cloud (depth only) alignment task. This allows us to train a streamlined approach from scratch with limited amounts of sparse 3D point clouds, compared to the large datasets of diverse RGBD sequences required in previous works. We incorporate temporal frame-to-frame registration with object-based recovery by frame-to-model refinement using a reinforcement learning (RL) agent that jointly solves for both objectives. We also show that the RL agent's uncertainty and a rendering-based mask propagation are effective reinitialization triggers.

Multi-layer Aggregation as a key to feature-based OOD detection

  • paper_url: http://arxiv.org/abs/2307.15647
  • repo_url: https://github.com/benolmbrt/MedicOOD
  • paper_authors: Benjamin Lambert, Florence Forbes, Senan Doyle, Michel Dojat
  • for: 本研究旨在探讨基于深度学习模型的异常检测方法,以提高医学图像分析中的精度和可靠性。
  • methods: 本研究使用的方法包括单层方法和多层方法,单层方法根据特定层获得的特征图进行检测,多层方法则使用模型生成的ensemble特征图进行检测。
  • results: 本研究对20种异常类型(约7800个3D MRI)进行了大规模的异常检测实验,结果表明多层方法在异常检测中表现更好,而单层方法则具有不一致的行为,具体取决于异常类型。此外,研究还发现了模型的结构对异常检测性能产生了很大影响。
    Abstract Deep Learning models are easily disturbed by variations in the input images that were not observed during the training stage, resulting in unpredictable predictions. Detecting such Out-of-Distribution (OOD) images is particularly crucial in the context of medical image analysis, where the range of possible abnormalities is extremely wide. Recently, a new category of methods has emerged, based on the analysis of the intermediate features of a trained model. These methods can be divided into 2 groups: single-layer methods that consider the feature map obtained at a fixed, carefully chosen layer, and multi-layer methods that consider the ensemble of the feature maps generated by the model. While promising, a proper comparison of these algorithms is still lacking. In this work, we compared various feature-based OOD detection methods on a large spectra of OOD (20 types), representing approximately 7800 3D MRIs. Our experiments shed the light on two phenomenons. First, multi-layer methods consistently outperform single-layer approaches, which tend to have inconsistent behaviour depending on the type of anomaly. Second, the OOD detection performance highly depends on the architecture of the underlying neural network.
    摘要 深度学习模型容易受到输入图像的变化所影响,导致预测结果不可预测。在医学图像分析中,检测这些外围数据(Out-of-Distribution,OOD)图像特别重要。近期,一种新的分类方法在发展,基于模型的中间特征分析。这些方法可以分为两类:单层方法,利用特定层的特征图,和多层方法,利用模型生成的特征图的ensemble。虽有承诺,但对这些算法进行正确的比较仍然缺乏。在这项工作中,我们对20种OOD类型(约7800个3D MRI)进行了大规模的比较。我们的实验揭示了两个现象:首先,多层方法在不同类型的异常情况下表现更好,单层方法则具有不一致的行为;其次,OOD检测性能强烈依赖于下面的神经网络架构。
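
A simple multi-layer feature-based OOD score, in the spirit of the methods compared above, fits per-layer statistics on in-distribution features and averages the per-layer deviation scores at test time. The diagonal-covariance scoring and plain averaging below are illustrative choices; the surveyed methods use various per-layer scores and aggregation rules.

```python
import numpy as np

def fit_layer_stats(id_feats_per_layer):
    """id_feats_per_layer: list of (N, D_l) ID feature arrays, one per layer."""
    return [(f.mean(axis=0), f.std(axis=0) + 1e-6) for f in id_feats_per_layer]

def multilayer_ood_score(test_feats_per_layer, stats):
    """Per-layer score = mean squared z-score to the ID statistics; the final OOD
    score averages the per-layer scores (one possible multi-layer aggregation)."""
    scores = []
    for f, (mu, sd) in zip(test_feats_per_layer, stats):
        scores.append((((f - mu) / sd) ** 2).mean(axis=1))
    return np.mean(np.stack(scores, axis=0), axis=0)

layers_id = [np.random.randn(100, 64), np.random.randn(100, 128)]
layers_test = [np.random.randn(5, 64) + 3.0, np.random.randn(5, 128) + 3.0]
stats = fit_layer_stats(layers_id)
print(multilayer_ood_score(layers_test, stats))  # higher = more OOD
```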

Scale-aware Test-time Click Adaptation for Pulmonary Nodule and Mass Segmentation

  • paper_url: http://arxiv.org/abs/2307.15645
  • repo_url: https://github.com/splinterli/sattca
  • paper_authors: Zhihao Li, Jiancheng Yang, Yongchao Xu, Li Zhang, Wenhui Dong, Bo Du
  • for: 针对肺癌筛查中肺结节与肿块尺寸差异大的问题,提出了一种带有尺度感知测试时适应的多尺度神经网络分割方法。
  • methods: 提出了尺度感知的测试时点击适应方法,利用易于获取的病灶点击作为测试时线索来提升分割性能,尤其是对大病灶;该方法可无缝集成到现有网络中。
  • results: Extensive experiments on both open-source and in-house datasets consistently demonstrate the effectiveness of the proposed method over several CNN- and Transformer-based segmentation methods.
    Abstract Pulmonary nodules and masses are crucial imaging features in lung cancer screening that require careful management in clinical diagnosis. Despite the success of deep learning-based medical image segmentation, the robust performance on various sizes of lesions of nodule and mass is still challenging. In this paper, we propose a multi-scale neural network with scale-aware test-time adaptation to address this challenge. Specifically, we introduce an adaptive Scale-aware Test-time Click Adaptation method based on effortlessly obtainable lesion clicks as test-time cues to enhance segmentation performance, particularly for large lesions. The proposed method can be seamlessly integrated into existing networks. Extensive experiments on both open-source and in-house datasets consistently demonstrate the effectiveness of the proposed method over some CNN and Transformer-based segmentation methods. Our code is available at https://github.com/SplinterLi/SaTTCA
    摘要 肺结节和肿块是肺癌筛查中需要在临床诊断中谨慎管理的重要影像特征。尽管基于深度学习的医学图像分割已取得成功,但在不同尺寸的结节与肿块上保持稳健的性能仍然具有挑战性。本文提出了一种带有尺度感知测试时适应的多尺度神经网络来应对这一挑战。具体而言,我们引入了一种自适应的尺度感知测试时点击适应方法,将易于获取的病灶点击作为测试时线索来提升分割性能,尤其是对大病灶。所提方法可以无缝集成到现有网络中。在多个开源和内部数据集上的大量实验一致表明,所提方法优于若干基于 CNN 和 Transformer 的分割方法。代码见 https://github.com/SplinterLi/SaTTCA。

CLIP Brings Better Features to Visual Aesthetics Learners

  • paper_url: http://arxiv.org/abs/2307.15640
  • repo_url: None
  • paper_authors: Liwu Xu, Jinjin Xu, Yuzhe Yang, Yijie Huang, Yanchun Xie, Yaqian Li
  • for: The paper addresses image aesthetics assessment (IAA), which has a subjective and expensive labeling procedure.
  • methods: The proposed method uses a two-phase approach that integrates and leverages a multi-source unlabeled dataset to align rich features between a given visual encoder and an off-the-shelf CLIP image encoder via a feature alignment loss.
  • results: The proposed method achieves state-of-the-art performance on multiple widely used IAA benchmarks, alleviating the feature collapse issue and showcasing the necessity of feature alignment instead of training directly on the CLIP image encoder.
    Abstract The success of pre-training approaches on a variety of downstream tasks has revitalized the field of computer vision. Image aesthetics assessment (IAA) is one of the ideal application scenarios for such methods due to subjective and expensive labeling procedure. In this work, an unified and flexible two-phase \textbf{C}LIP-based \textbf{S}emi-supervised \textbf{K}nowledge \textbf{D}istillation paradigm is proposed, namely \textbf{\textit{CSKD}. Specifically, we first integrate and leverage a multi-source unlabeled dataset to align rich features between a given visual encoder and an off-the-shelf CLIP image encoder via feature alignment loss. Notably, the given visual encoder is not limited by size or structure and, once well-trained, it can seamlessly serve as a better visual aesthetic learner for both student and teacher. In the second phase, the unlabeled data is also utilized in semi-supervised IAA learning to further boost student model performance when applied in latency-sensitive production scenarios. By analyzing the attention distance and entropy before and after feature alignment, we notice an alleviation of feature collapse issue, which in turn showcase the necessity of feature alignment instead of training directly based on CLIP image encoder. Extensive experiments indicate the superiority of CSKD, which achieves state-of-the-art performance on multiple widely used IAA benchmarks.
    摘要 随着预训练方法在多种下游任务上的成功,计算机视觉领域得到了新的动力。图像美学评价(IAA)是适用于这些方法的理想应用场景,因为评价标签过程是主观且昂贵的。在这项工作中,我们提出了一种统一和灵活的两阶段\textbf{C}LIP-基于\textbf{S}emi-supervised \textbf{K}nowledge \textbf{D}istillation(CSKD)方法。具体来说,我们首先将多种无标注数据集集成并利用,以确保图像编码器和CLIP图像编码器之间的特征相似性。注意,给定的图像编码器没有尺寸或结构的限制,一旦它很好地训练,它就可以成为更好的图像美学学习者。在第二阶段,我们还利用无标注数据集进行 semi-supervised IAA 学习,以进一步提高学生模型在响应时间敏感的生产环境中的性能。通过分析特征距离和熵之前和之后准对,我们发现了特征坍塌问题的缓解,这反映了特征准对的重要性,而不是直接基于CLIP图像编码器进行训练。我们进行了广泛的实验,并证明了 CSKD 的优越性,在多个广泛使用的 IAA benchmark 上达到了状态之最的表现。
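
The feature alignment between a given visual encoder and a frozen CLIP image encoder can be sketched as a cosine-distance loss on projected features. The projection layer, the cosine distance, and the dimensions below are assumptions for illustration; the paper's exact alignment loss may differ.

```python
import torch
import torch.nn.functional as F

def feature_alignment_loss(student_feat, clip_feat, proj):
    """student_feat: (B, D_s) features from the visual aesthetics encoder.
    clip_feat: (B, D_c) features from a frozen, off-the-shelf CLIP image encoder.
    proj: nn.Linear(D_s, D_c) mapping the student into CLIP's feature space.
    Cosine-distance alignment is an illustrative choice of loss."""
    s = F.normalize(proj(student_feat), dim=-1)
    t = F.normalize(clip_feat, dim=-1)
    return (1.0 - (s * t).sum(dim=-1)).mean()

proj = torch.nn.Linear(768, 512)                       # assumed dimensions
loss = feature_alignment_loss(torch.randn(4, 768), torch.randn(4, 512), proj)
```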

cs.AI - 2023-07-29

Marrying Dialogue Systems with Data Visualization: Interactive Data Visualization Generation from Natural Language Conversations

  • paper_url: http://arxiv.org/abs/2307.16013
  • repo_url: None
  • paper_authors: Yuanfeng Song, Xuefang Zhao, Raymond Chi-Wing Wong
  • for: 本研究旨在提高数据视化(DV)系统的使用效率,通过自动化DV任务,如自然语言问题(NLQ)到视化翻译(formally called text-to-vis)。
  • methods: 本研究提出了一个新任务名为CoVis,即对话式文本到视化,旨在通过用户和系统之间的多次交互来构建DV。
  • results: 研究人员构建了名为 Dial-NVBench 的基准数据集,并提出了名为 MMCoVisNet 的多模态神经网络来回答 DV 相关的查询。MMCoVisNet 首先充分理解对话上下文,再使用自适应解码器给出相应的回答。实验结果表明,MMCoVisNet 优于各基线方法,达到了最先进的性能。
    Abstract Data visualization (DV) has become the prevailing tool in the market due to its effectiveness into illustrating insights in vast amounts of data. To lower the barrier of using DVs, automatic DV tasks, such as natural language question (NLQ) to visualization translation (formally called text-to-vis), have been investigated in the research community. However, text-to-vis assumes the NLQ to be well-organized and expressed in a single sentence. However, in real-world settings, complex DV is needed through consecutive exchanges between the DV system and the users. In this paper, we propose a new task named CoVis, short for Conversational text-to-Visualization, aiming at constructing DVs through a series of interactions between users and the system. Since it is the task which has not been studied in the literature, we first build a benchmark dataset named Dial-NVBench, including dialogue sessions with a sequence of queries from a user and responses from the system. Then, we propose a multi-modal neural network named MMCoVisNet to answer these DV-related queries. In particular, MMCoVisNet first fully understands the dialogue context and determines the corresponding responses. Then, it uses adaptive decoders to provide the appropriate replies: (i) a straightforward text decoder is used to produce general responses, (ii) an SQL-form decoder is applied to synthesize data querying responses, and (iii) a DV-form decoder tries to construct the appropriate DVs. We comparatively evaluate MMCoVisNet with other baselines over our proposed benchmark dataset. Experimental results validate that MMCoVisNet performs better than existing baselines and achieves a state-of-the-art performance.
    摘要 数据视化(DV)已成为市场上最受欢迎的工具,因为它能够快速和有效地表示大量数据中的意见。为了降低使用DV的门槛,研究者们已经对自动DV任务进行了详细的研究,如自然语言问题(NLQ)到视觉翻译(文本至图)。然而,文本至图假设NLQ是结束的和整洁的单句话。在实际应用中,需要通过多次交互来构建复杂的DV。在这篇论文中,我们提出了一个新的任务,即对话式文本至视觉(CoVis),旨在通过用户和系统之间的多次交互来构建DV。由于这是Literature中没有研究过的任务,我们首先构建了一个名为Dial-NVBench的 benchmark数据集,包括用户和系统之间的对话会话,以及一系列关于DV的查询。然后,我们提出了一种多模态神经网络名为MMCoVisNet,用于回答这些DV相关的查询。具体来说,MMCoVisNet首先完全理解对话上下文,然后确定相应的回答。接着,它使用适应编码器提供相应的答案,包括:(i)一般文本编码器用于生成通用答案,(ii)SQL形式编码器用于生成数据查询答案,(iii)DV形式编码器用于构建适当的DV。我们对MMCoVisNet进行了与其他基准点进行比较的实验,并证明它在我们提出的benchmark数据集上表现出色。

RoCar: A Relationship Network-based Evaluation Method to Large Language Models

  • paper_url: http://arxiv.org/abs/2307.15997
  • repo_url: https://github.com/neu-datamining/rocar
  • paper_authors: Ming Wang, Wenfang Wu, Chongyun Gao, Daling Wang, Shi Feng, Yifei Zhang
  • for: 评估大语言模型(LLMs)的能力
  • methods: 利用预先定义的基本模式随机构建任务图,并基于任务图生成自然语言评估任务,分别评估 LLM 的推理能力和记忆能力。
  • results: Ensures fairness of evaluation method by preventing LLMs from directly learning the evaluation tasks
    Abstract Large language models (LLMs) have received increasing attention. However, due to the complexity of its capabilities, how to rationally evaluate the capabilities of LLMs is still a task to be solved. We propose the RoCar method, which utilizes the defined basic schemas to randomly construct a task graph and generates natural language evaluation tasks based on the task graph to evaluate the reasoning and memory abilities of LLMs respectively. Due to the very large randomness of the task construction process, it is possible to ensure that none of the LLMs to be tested has directly learned the evaluation tasks, guaranteeing the fairness of the evaluation method.
    摘要 大型语言模型(LLMs)已经获得了越来越多的关注。然而,由于其能力相当复杂,如何合理评估它们的能力仍然是一个有待解决的问题。我们提出了 RoCar 方法,它利用定义好的基本模式随机构建任务图,并基于任务图生成自然语言评估任务,分别评估 LLM 的推理能力和记忆能力。由于任务构建过程具有极大的随机性,可以保证任何待测 LLM 都没有直接学习过这些评估任务,从而确保评估方法的公平性。

UPFL: Unsupervised Personalized Federated Learning towards New Clients

  • paper_url: http://arxiv.org/abs/2307.15994
  • repo_url: None
  • paper_authors: Tiandi Ye, Cen Chen, Yinggui Wang, Xiang Li, Ming Gao
  • for: addressing the challenge of providing personalized models for new clients in federated learning settings
  • methods: extends adaptive risk minimization technique to unsupervised personalized federated learning, with two optimization strategies (proxy regularization and early-stopping) and a knowledge distillation loss specifically designed for FedTTA
  • results: extensive experiments on five datasets against eleven baselines demonstrate the effectiveness of the proposed FedTTA and its variants
    Abstract Personalized federated learning has gained significant attention as a promising approach to address the challenge of data heterogeneity. In this paper, we address a relatively unexplored problem in federated learning. When a federated model has been trained and deployed, and an unlabeled new client joins, providing a personalized model for the new client becomes a highly challenging task. To address this challenge, we extend the adaptive risk minimization technique into the unsupervised personalized federated learning setting and propose our method, FedTTA. We further improve FedTTA with two simple yet effective optimization strategies: enhancing the training of the adaptation model with proxy regularization and early-stopping the adaptation through entropy. Moreover, we propose a knowledge distillation loss specifically designed for FedTTA to address the device heterogeneity. Extensive experiments on five datasets against eleven baselines demonstrate the effectiveness of our proposed FedTTA and its variants. The code is available at: https://github.com/anonymous-federated-learning/code.
    摘要 个人化联合学习已经吸引了广泛关注,作为数据不同性的解决方案。在这篇论文中,我们解决了联合学习中较少研究的问题。当一个联合模型已经训练并部署后,新客户加入时,为新客户提供个性化模型是一个非常具有挑战性的任务。为解决这个挑战,我们将适应风险最小化技术推广到无标签联合学习设置中,并提出我们的方法,FedTTA。我们还通过两种简单却有效的优化策略来进一步提高FedTTA:在适应模型训练中添加代理规则,并在适应过程中使用熵来停止。此外,我们还提出了特有的知识传播损失,用于解决设备不同性。我们在五个数据集上对十一个基准进行了广泛的实验,并证明了我们提出的FedTTA和其变种的效果。代码可以在以下地址获取:https://github.com/anonymous-federated-learning/code。
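
FedTTA is described as test-time adaptation with entropy-based early stopping; a generic version of that idea is entropy minimization on the new client's unlabeled data, stopped once the mean prediction entropy stops improving. The sketch below is that generic version, with the learning rate, patience, and toy model purely illustrative; the paper's proxy regularization and knowledge distillation terms are omitted.

```python
import torch
import torch.nn.functional as F

def test_time_adapt(model, unlabeled_x, lr=1e-4, max_steps=50, patience=3):
    """Adapts a received model to a new unlabeled client by minimizing prediction
    entropy, stopping early once mean entropy no longer improves. All
    hyperparameters here are illustrative assumptions."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    best, stale = float("inf"), 0
    for _ in range(max_steps):
        probs = F.softmax(model(unlabeled_x), dim=1)
        entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=1).mean()
        opt.zero_grad()
        entropy.backward()
        opt.step()
        if entropy.item() < best - 1e-4:
            best, stale = entropy.item(), 0
        else:
            stale += 1
            if stale >= patience:   # entropy-based early stopping
                break
    return model

model = torch.nn.Sequential(torch.nn.Linear(20, 32), torch.nn.ReLU(), torch.nn.Linear(32, 5))
adapted = test_time_adapt(model, torch.randn(64, 20))
```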

Ultrasound Image Reconstruction with Denoising Diffusion Restoration Models

  • paper_url: http://arxiv.org/abs/2307.15990
  • repo_url: https://github.com/yuxin-zhang-jasmine/drus-v1
  • paper_authors: Yuxin Zhang, Clément Huneau, Jérôme Idier, Diana Mateus
  • for: 这个论文是为了解决超声影像重建问题,通过学习前知识来提高重建质量。
  • methods: 该论文在 Denoising Diffusion Restoration Models(DDRM)框架下利用学习到的先验实现超声图像重建,提出了 DDRM 的两种变体 DRUS 和 WDRUS,并在合成数据和 PICMUS 数据上进行了测试。
  • results: 该方法可以从单个平面波开始,并且可以达到或更好于DAS和当前最佳方法的图像质量。可以在https://github.com/Yuxin-Zhang-Jasmine/DRUS-v1中下载代码。
    Abstract Ultrasound image reconstruction can be approximately cast as a linear inverse problem that has traditionally been solved with penalized optimization using the $l_1$ or $l_2$ norm, or wavelet-based terms. However, such regularization functions often struggle to balance the sparsity and the smoothness. A promising alternative is using learned priors to make the prior knowledge closer to reality. In this paper, we rely on learned priors under the framework of Denoising Diffusion Restoration Models (DDRM), initially conceived for restoration tasks with natural images. We propose and test two adaptions of DDRM to ultrasound inverse problem models, DRUS and WDRUS. Our experiments on synthetic and PICMUS data show that from a single plane wave our method can achieve image quality comparable to or better than DAS and state-of-the-art methods. The code is available at: https://github.com/Yuxin-Zhang-Jasmine/DRUS-v1.
    摘要 超声图像重建可以近似地视为一个线性逆问题,传统上使用 $l_1$ 或 $l_2$ 范数或基于小波的正则化项,通过带惩罚的优化来求解。然而,这类正则化函数往往难以在稀疏性与平滑性之间取得平衡。一种有前景的替代方案是使用学习到的先验,使先验知识更接近真实情况。在这篇论文中,我们在 Denoising Diffusion Restoration Models(DDRM)框架下利用学习到的先验,DDRM 最初是为自然图像的修复任务设计的。我们提出并测试了将 DDRM 适配到超声逆问题模型的两种变体:DRUS 和 WDRUS。实验表明,仅凭单次平面波,我们的方法即可达到与 DAS 及现有最先进方法相当或更好的图像质量。代码可在 https://github.com/Yuxin-Zhang-Jasmine/DRUS-v1 获取。

Freespace Optical Flow Modeling for Automated Driving

  • paper_url: http://arxiv.org/abs/2307.15989
  • repo_url: None
  • paper_authors: Yi Feng, Ruge Zhang, Jiayuan Du, Qijun Chen, Rui Fan
  • for: 这篇论文的目的是为自动驾驶视觉识别提出一个新的方法,具体来说是计算车辆在驾驶环境中的运动流。
  • methods: 该论文提出了一种在三维驾驶环境中充分利用几何信息来建模光流的新方法。该方法在无碰撞空间(也称为可行驶区域或"自由空间")中显式表示光流,并推导出光流分量与垂直图像坐标之间的二次关系。
  • results: 在多个公共数据集上的实验结果表明了该光流模型的高精度与鲁棒性;此外,该模型在自动驾驶领域还有多种应用,可为自由空间检测、车辆定位等任务提供几何约束。作者已公开源代码供其他研究人员使用。
    Abstract Optical flow and disparity are two informative visual features for autonomous driving perception. They have been used for a variety of applications, such as obstacle and lane detection. The concept of "U-V-Disparity" has been widely explored in the literature, while its counterpart in optical flow has received relatively little attention. Traditional motion analysis algorithms estimate optical flow by matching correspondences between two successive video frames, which limits the full utilization of environmental information and geometric constraints. Therefore, we propose a novel strategy to model optical flow in the collision-free space (also referred to as drivable area or simply freespace) for intelligent vehicles, with the full utilization of geometry information in a 3D driving environment. We provide explicit representations of optical flow and deduce the quadratic relationship between the optical flow component and the vertical coordinate. Through extensive experiments on several public datasets, we demonstrate the high accuracy and robustness of our model. Additionally, our proposed freespace optical flow model boasts a diverse array of applications within the realm of automated driving, providing a geometric constraint in freespace detection, vehicle localization, and more. We have made our source code publicly available at https://mias.group/FSOF.

You Can Backdoor Personalized Federated Learning

  • paper_url: http://arxiv.org/abs/2307.15971
  • repo_url: None
  • paper_authors: Tiandi Ye, Cen Chen, Yinggui Wang, Xiang Li, Ming Gao
  • for: This paper focuses on backdoor attacks in personalized federated learning (pFL) scenarios, where each client constructs a personalized model based on its local data.
  • methods: The paper proposes three backdoor attack methods: BapFL, BapFL+, and Gen-BapFL, which can effectively attack pFL methods by maintaining clean local parameters while implanting the backdoor into the global parameters, and by introducing Gaussian noise to the local parameters.
  • results: The paper demonstrates the effectiveness of the proposed attack methods against two classic pFL methods with partial model-sharing, FedPer and LG-FedAvg, on four FL benchmark datasets. Additionally, the paper assesses the defense efficacy of various defense strategies against the proposed attacks and finds that Gradient Norm-Clipping is particularly effective.
    Abstract Backdoor attacks pose a significant threat to the security of federated learning systems. However, existing research primarily focuses on backdoor attacks and defenses within the generic FL scenario, where all clients collaborate to train a single global model. \citet{qin2023revisiting} conduct the first study of backdoor attacks in the personalized federated learning (pFL) scenario, where each client constructs a personalized model based on its local data. Notably, the study demonstrates that pFL methods with partial model-sharing can significantly boost robustness against backdoor attacks. In this paper, we whistleblow that pFL methods with partial model-sharing are still vulnerable to backdoor attacks in the absence of any defense. We propose three backdoor attack methods: BapFL, BapFL+, and Gen-BapFL, and we empirically demonstrate that they can effectively attack the pFL methods. Specifically, the key principle of BapFL lies in maintaining clean local parameters while implanting the backdoor into the global parameters. BapFL+ generalizes the attack success to benign clients by introducing Gaussian noise to the local parameters. Furthermore, we assume the collaboration of malicious clients and propose Gen-BapFL, which leverages meta-learning techniques to further enhances attack generalization. We evaluate our proposed attack methods against two classic pFL methods with partial model-sharing, FedPer and LG-FedAvg. Extensive experiments on four FL benchmark datasets demonstrate the effectiveness of our proposed attack methods. Additionally, we assess the defense efficacy of various defense strategies against our proposed attacks and find that Gradient Norm-Clipping is particularly effective. It is crucial to note that pFL method is not always secure in the presence of backdoor attacks, and we hope to inspire further research on attack and defense in pFL scenarios.
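A minimal sketch of the BapFL principle described above, assuming a FedPer-style split into a shared backbone (uploaded to the server) and a personal head (kept local); the trigger pattern, model sizes, and training loop are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

# Hedged sketch: a malicious client implants the backdoor into the *shared* backbone
# using poisoned data, while keeping its *local* head trained on clean data only.

backbone = nn.Linear(10, 8)     # shared (global) parameters, sent to the server
head = nn.Linear(8, 2)          # personal (local) parameters, kept on the client

opt_shared = torch.optim.SGD(backbone.parameters(), lr=0.1)
opt_local = torch.optim.SGD(head.parameters(), lr=0.1)
ce = nn.CrossEntropyLoss()

x_clean, y_clean = torch.randn(32, 10), torch.randint(0, 2, (32,))
x_poison = x_clean.clone()
x_poison[:, 0] = 5.0                                    # toy trigger pattern
y_target = torch.zeros(32, dtype=torch.long)            # attacker-chosen label

for _ in range(100):
    # 1) keep the local head clean: update it on clean data only
    opt_local.zero_grad()
    ce(head(backbone(x_clean).detach()), y_clean).backward()
    opt_local.step()
    # 2) implant the backdoor into the shared backbone via poisoned data
    opt_shared.zero_grad()
    (ce(head(backbone(x_clean)), y_clean)
     + ce(head(backbone(x_poison)), y_target)).backward()
    opt_shared.step()

# Only backbone.state_dict() would be uploaded, carrying the backdoor into the
# global parameters while the client's personal parameters stay benign-looking.
```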

Graph Condensation for Inductive Node Representation Learning

  • paper_url: http://arxiv.org/abs/2307.15967
  • repo_url: None
  • paper_authors: Xinyi Gao, Tong Chen, Yilong Zang, Wentao Zhang, Quoc Viet Hung Nguyen, Kai Zheng, Hongzhi Yin
  • for: Improve the computational efficiency of Graph Neural Networks (GNNs) on large-scale graphs so that they remain usable across diverse applications.
  • methods: Graph condensation constructs a small synthetic graph for training GNNs; in addition, a one-to-many node mapping from original nodes to synthetic nodes is learned so that new nodes can propagate information directly on the synthetic graph.
  • results: On the Reddit dataset, the proposed MCond method achieves up to 121.5x inference speedup and 55.9x reduction in storage requirements compared with counterparts based on the original graph.
    Abstract Graph neural networks (GNNs) encounter significant computational challenges when handling large-scale graphs, which severely restricts their efficacy across diverse applications. To address this limitation, graph condensation has emerged as a promising technique, which constructs a small synthetic graph for efficiently training GNNs while retaining performance. However, due to the topology structure among nodes, graph condensation is limited to condensing only the observed training nodes and their corresponding structure, thus lacking the ability to effectively handle the unseen data. Consequently, the original large graph is still required in the inference stage to perform message passing to inductive nodes, resulting in substantial computational demands. To overcome this issue, we propose mapping-aware graph condensation (MCond), explicitly learning the one-to-many node mapping from original nodes to synthetic nodes to seamlessly integrate new nodes into the synthetic graph for inductive representation learning. This enables direct information propagation on the synthetic graph, which is much more efficient than on the original large graph. Specifically, MCond employs an alternating optimization scheme with innovative loss terms from transductive and inductive perspectives, facilitating the mutual promotion between graph condensation and node mapping learning. Extensive experiments demonstrate the efficacy of our approach in inductive inference. On the Reddit dataset, MCond achieves up to 121.5x inference speedup and 55.9x reduction in storage requirements compared with counterparts based on the original graph.
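A hedged sketch of the inductive-inference idea: a learned one-to-many mapping redirects a new node's edges from original nodes to synthetic nodes, so aggregation runs on the small synthetic graph only. Shapes, the mapping matrix, and the single mean-aggregation layer are illustrative assumptions.

```python
import numpy as np

# Hedged sketch: original nodes are softly mapped to synthetic nodes, so a new node
# with edges to original nodes can be aggregated on the condensed graph directly,
# without touching the large original graph.

n_orig, n_syn, d = 1000, 50, 16
rng = np.random.default_rng(0)
M = rng.random((n_orig, n_syn))
M /= M.sum(axis=1, keepdims=True)       # learned one-to-many mapping: rows sum to 1
X_syn = rng.normal(size=(n_syn, d))     # condensed node features
W = rng.normal(size=(d, d))             # a trained GNN weight matrix (stand-in)

def embed_new_node(x_new, orig_neighbors):
    """Aggregate a new node over the synthetic graph via the node mapping."""
    syn_weights = M[orig_neighbors].mean(axis=0)   # redirect edges through M, shape (n_syn,)
    neighbor_msg = syn_weights @ X_syn             # weighted aggregation over synthetic nodes
    return np.tanh((x_new + neighbor_msg) @ W)     # one message-passing layer

h = embed_new_node(rng.normal(size=d), orig_neighbors=[3, 17, 256])
print(h.shape)
```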

Towards the Visualization of Aggregated Class Activation Maps to Analyse the Global Contribution of Class Features

  • paper_url: http://arxiv.org/abs/2308.00710
  • repo_url: None
  • paper_authors: Igor Cherepanov, David Sessler, Alex Ulmer, Hendrik Lücke-Tieke, Jörn Kohlhammer
  • for: This paper aims to explain the decision-making of deep learning (DL) classification models so that they can be used in risk-sensitive applications.
  • methods: The authors extend the recent Class Activation Maps (CAMs) method, which visualizes how much each feature of a data sample contributes to the classification. CAMs from multiple samples are aggregated into a global explanation view, and each feature is shown as a square glyph whose color encodes its classification impact and whose filled size encodes the variability of that impact across samples.
  • results: The visual representation helps analysts identify the features of high-dimensional data that are important to the model's decisions, and an interactive histogram lets them filter samples and refine the CAM to further analyse interesting features.
    Abstract Deep learning (DL) models achieve remarkable performance in classification tasks. However, models with high complexity can not be used in many risk-sensitive applications unless a comprehensible explanation is presented. Explainable artificial intelligence (xAI) focuses on the research to explain the decision-making of AI systems like DL. We extend a recent method of Class Activation Maps (CAMs) which visualizes the importance of each feature of a data sample contributing to the classification. In this paper, we aggregate CAMs from multiple samples to show a global explanation of the classification for semantically structured data. The aggregation allows the analyst to make sophisticated assumptions and analyze them with further drill-down visualizations. Our visual representation for the global CAM illustrates the impact of each feature with a square glyph containing two indicators. The color of the square indicates the classification impact of this feature. The size of the filled square describes the variability of the impact between single samples. For interesting features that require further analysis, a detailed view is necessary that provides the distribution of these values. We propose an interactive histogram to filter samples and refine the CAM to show relevant samples only. Our approach allows an analyst to detect important features of high-dimensional data and derive adjustments to the AI model based on our global explanation visualization.
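A small sketch of the aggregation step behind the global view: per-sample CAM values are reduced per feature to a mean impact (glyph colour) and a variability measure (filled-square size), with a histogram as the drill-down view. The CAM array below is a random stand-in.

```python
import numpy as np

# Hedged sketch of aggregating per-sample CAM contributions into a global explanation.

rng = np.random.default_rng(0)
cams = rng.normal(size=(500, 20))            # (samples, features) CAM contributions

mean_impact = cams.mean(axis=0)              # would drive the colour of each square glyph
variability = cams.std(axis=0)               # would drive the size of the filled square

# Drill-down for one interesting feature: the distribution behind the glyph,
# analogous to the interactive histogram used to filter samples.
feature = int(np.argmax(np.abs(mean_impact)))
hist, edges = np.histogram(cams[:, feature], bins=10)
print(feature, mean_impact[feature].round(3), variability[feature].round(3), hist.tolist())
```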

The effect of network topologies on fully decentralized learning: a preliminary investigation

  • paper_url: http://arxiv.org/abs/2307.15947
  • repo_url: None
  • paper_authors: Luigi Palmieri, Lorenzo Valerio, Chiara Boldrini, Andrea Passarella
  • for: This paper studies how the topology of the network connecting nodes in a decentralized machine learning system affects model training and performance.
  • methods: Machine learning models are trained through direct collaboration between nodes, and the impact of different network topologies on the "spreading of knowledge" is investigated.
  • results: The study finds that even weak connectivity between network components, while sufficient to spread information, may not be sufficient to spread knowledge. Hubs play a more significant role than leaves in spreading knowledge, even when hubs have only moderately more connections than leaves, and tightly knit communities severely hinder knowledge spread.
    Abstract In a decentralized machine learning system, data is typically partitioned among multiple devices or nodes, each of which trains a local model using its own data. These local models are then shared and combined to create a global model that can make accurate predictions on new data. In this paper, we start exploring the role of the network topology connecting nodes on the performance of a Machine Learning model trained through direct collaboration between nodes. We investigate how different types of topologies impact the "spreading of knowledge", i.e., the ability of nodes to incorporate in their local model the knowledge derived by learning patterns in data available in other nodes across the networks. Specifically, we highlight the different roles in this process of more or less connected nodes (hubs and leaves), as well as that of macroscopic network properties (primarily, degree distribution and modularity). Among others, we show that, while it is known that even weak connectivity among network components is sufficient for information spread, it may not be sufficient for knowledge spread. More intuitively, we also find that hubs have a more significant role than leaves in spreading knowledge, although this manifests itself not only for heavy-tailed distributions but also when "hubs" have only moderately more connections than leaves. Finally, we show that tightly knit communities severely hinder knowledge spread.
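To make the "spreading of knowledge" setting concrete, the sketch below runs simple decentralized gossip averaging of local parameters over two hand-built topologies (a sparse ring and a star with a hub); the scalar models and averaging rule are illustrative and much simpler than the paper's setup.

```python
import numpy as np

# Hedged sketch: each node holds a local model (a scalar here) and repeatedly averages
# with its neighbours. Topology changes how fast local models agree.

def mixing_matrix(adj):
    deg = adj.sum(axis=1, keepdims=True)
    return adj / deg                                  # row-stochastic neighbour averaging

def disagreement_after(adj, params, steps):
    W = mixing_matrix(adj + np.eye(len(adj)))         # include a self-loop
    for _ in range(steps):
        params = W @ params
    return float(params.std())

n = 20
ring = np.zeros((n, n))
for i in range(n):
    ring[i, (i - 1) % n] = ring[i, (i + 1) % n] = 1   # each node linked to two neighbours
star = np.zeros((n, n))
star[0, 1:] = star[1:, 0] = 1                          # node 0 acts as the hub

rng = np.random.default_rng(0)
p0 = rng.normal(size=(n, 1))                           # heterogeneous local models
print("ring:", disagreement_after(ring, p0.copy(), 10))
print("star:", disagreement_after(star, p0.copy(), 10))
```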

A Theory for Emergence of Complex Skills in Language Models

  • paper_url: http://arxiv.org/abs/2307.15936
  • repo_url: https://github.com/dia2018/What-is-the-Difference-Between-AI-and-Machine-Learning
  • paper_authors: Sanjeev Arora, Anirudh Goyal
  • for: This paper seeks to explain the emergence of new skills in language models when their parameter set and training corpora are scaled up, a phenomenon whose mechanism remains poorly understood.
  • methods: The analysis combines the well-known (empirical) Scaling Laws of LLMs with a simple statistical framework.
  • results: The framework relates the cross-entropy loss of LLMs to competence on the basic skills underlying language tasks, and shows that the Scaling Laws imply a strong inductive bias that lets pre-trained models learn very efficiently. For example, competence at tasks involving $k$-tuples of skills emerges at essentially the same scaling and rate as competence on the elementary skills themselves.
    Abstract A major driver of AI products today is the fact that new skills emerge in language models when their parameter set and training corpora are scaled up. This phenomenon is poorly understood, and a mechanistic explanation via mathematical analysis of gradient-based training seems difficult. The current paper takes a different approach, analysing emergence using the famous (and empirical) Scaling Laws of LLMs and a simple statistical framework. Contributions include: (a) A statistical framework that relates cross-entropy loss of LLMs to competence on the basic skills that underlie language tasks. (b) Mathematical analysis showing that the Scaling Laws imply a strong form of inductive bias that allows the pre-trained model to learn very efficiently. We informally call this {\em slingshot generalization} since naively viewed it appears to give competence levels at skills that violate usual generalization theory. (c) A key example of slingshot generalization, that competence at executing tasks involving $k$-tuples of skills emerges essentially at the same scaling and same rate as competence on the elementary skills themselves.

Language models as master equation solvers

  • paper_url: http://arxiv.org/abs/2308.02514
  • repo_url: https://github.com/Aryia-Behroziuan/References
  • paper_authors: Chuanbo Liu, Jin Wang
  • for: Solving master equations, the fundamental equations for modeling stochastic dynamical systems.
  • methods: A language model is repurposed as the solver: a prompt-based network maps rate parameters, initial conditions, and time values directly to the state joint probability distribution that matches the input context, and is trained with the policy gradient algorithm in a reinforcement learning framework, with feedback rewards provided by a set of variational autoregressive models.
  • results: On representative multi-module and high-dimensional systems, the approach shows high accuracy and extrapolation ability, suggesting that a single pretrained large model could be used to solve any master equation.
    Abstract Master equations are of fundamental importance in modeling stochastic dynamical systems.However, solving master equations is challenging due to the exponential increase in the number of possible states or trajectories with the dimension of the state space. In this study, we propose repurposing language models as a machine learning approach to solve master equations. We design a prompt-based neural network to map rate parameters, initial conditions, and time values directly to the state joint probability distribution that exactly matches the input contexts. In this way, we approximate the solution of the master equation in its most general form. We train the network using the policy gradient algorithm within the reinforcement learning framework, with feedback rewards provided by a set of variational autoregressive models. By applying this approach to representative examples, we observe high accuracy for both multi-module and high-dimensional systems. The trained network also exhibits extrapolating ability, extending its predictability to unseen data. Our findings establish the connection between language models and master equations, highlighting the possibility of using a single pretrained large model to solve any master equation.
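For orientation, the sketch below writes down the exact target that such a learned solver approximates: for a small master equation $dp/dt = Qp$, the state distribution is $p(t) = e^{Qt} p(0)$. The two-state on/off switch and its rates are illustrative.

```python
import numpy as np
from scipy.linalg import expm

# Hedged sketch of what the learned network is asked to reproduce: for a toy
# two-state switch with rates k_on, k_off, the master equation dp/dt = Q p has the
# exact solution p(t) = expm(Q t) p(0).

k_on, k_off = 0.8, 0.3
Q = np.array([[-k_on,  k_off],
              [ k_on, -k_off]])          # columns sum to zero (probability conserved)
p0 = np.array([1.0, 0.0])                # initial condition: fully in the "off" state

for t in (0.5, 1.0, 5.0):
    p_t = expm(Q * t) @ p0               # exact target distribution at time t
    print(t, p_t.round(4))
```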

ATESA-BÆRT: A Heterogeneous Ensemble Learning Model for Aspect-Based Sentiment Analysis

  • paper_url: http://arxiv.org/abs/2307.15920
  • repo_url: None
  • paper_authors: Elena-Simona Apostol, Alin-Georgian Pisică, Ciprian-Octavian Truică
  • for: This study aims to improve the analysis of online reviews by determining users' opinions about individual aspects of products and services.
  • methods: The paper proposes ATESA-BÆRT, a heterogeneous ensemble learning model that splits the problem into two sub-tasks, Aspect Term Extraction and Aspect Term Sentiment Analysis, and applies argmax multi-class classification over six transformer-based learners for each sub-task.
  • results: Initial experiments on two datasets show that the model outperforms current state-of-the-art solutions while handling reviews that mention multiple aspects.
    Abstract The increasing volume of online reviews has made possible the development of sentiment analysis models for determining the opinion of customers regarding different products and services. Until now, sentiment analysis has proven to be an effective tool for determining the overall polarity of reviews. To improve the granularity at the aspect level for a better understanding of the service or product, the task of aspect-based sentiment analysis aims to first identify aspects and then determine the user's opinion about them. The complexity of this task lies in the fact that the same review can present multiple aspects, each with its own polarity. Current solutions have poor performance on such data. We address this problem by proposing ATESA-B{\AE}RT, a heterogeneous ensemble learning model for Aspect-Based Sentiment Analysis. Firstly, we divide our problem into two sub-tasks, i.e., Aspect Term Extraction and Aspect Term Sentiment Analysis. Secondly, we use the \textit{argmax} multi-class classification on six transformers-based learners for each sub-task. Initial experiments on two datasets prove that ATESA-B{\AE}RT outperforms current state-of-the-art solutions while solving the many aspects problem.
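A minimal sketch of the ensemble decision described in the abstract: six transformer-based learners each emit class probabilities for an aspect term, and the final label is the argmax over the combined scores. The random probability vectors and the mean-combination rule are assumptions standing in for the real learners.

```python
import numpy as np

# Hedged sketch of argmax multi-class classification over an ensemble of six learners.

rng = np.random.default_rng(0)
n_learners, n_classes = 6, 3                    # e.g., negative / neutral / positive
logits = rng.normal(size=(n_learners, n_classes))
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

ensemble_scores = probs.mean(axis=0)            # aggregate the six learners
prediction = int(np.argmax(ensemble_scores))    # argmax multi-class decision
print(ensemble_scores.round(3), prediction)
```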

Opportunistic Air Quality Monitoring and Forecasting with Expandable Graph Neural Networks

  • paper_url: http://arxiv.org/abs/2307.15916
  • repo_url: None
  • paper_authors: Jingwei Zuo, Wenbin Li, Michele Baldo, Hakim Hacid
  • for: This paper proposes an expandable graph attention network (EGAT) model that fuses data collected from infrastructures with different spatial structures, improving the flexibility and accuracy of air quality forecasting.
  • methods: EGAT digests data from existing and newly-added infrastructures with different spatial structures, and can be embedded into any existing air quality forecasting model.
  • results: The proposal is validated on real air quality data from PurpleAir, demonstrating improved flexibility and accuracy of air quality forecasting.
    Abstract Air Quality Monitoring and Forecasting has been a popular research topic in recent years. Recently, data-driven approaches for air quality forecasting have garnered significant attention, owing to the availability of well-established data collection facilities in urban areas. Fixed infrastructures, typically deployed by national institutes or tech giants, often fall short in meeting the requirements of diverse personalized scenarios, e.g., forecasting in areas without any existing infrastructure. Consequently, smaller institutes or companies with limited budgets are compelled to seek tailored solutions by introducing more flexible infrastructures for data collection. In this paper, we propose an expandable graph attention network (EGAT) model, which digests data collected from existing and newly-added infrastructures, with different spatial structures. Additionally, our proposal can be embedded into any air quality forecasting models, to apply to the scenarios with evolving spatial structures. The proposal is validated over real air quality data from PurpleAir.

Moisesdb: A dataset for source separation beyond 4-stems

  • paper_url: http://arxiv.org/abs/2307.15913
  • repo_url: https://github.com/moises-ai/moises-db
  • paper_authors: Igor Pereira, Felipe Araújo, Filip Korzeniowski, Richard Vogl
  • for: This work introduces the MoisesDB dataset for musical source separation, to drive and evaluate the development of fine-grained source separation systems.
  • methods: Audio sources are organized in a two-level hierarchical taxonomy of stems, and an easy-to-use Python library is provided to download, process, and use MoisesDB.
  • results: The work provides baseline results for open-source separation models at varying separation granularities, alongside thorough documentation and analysis of the dataset's contents.
    Abstract In this paper, we introduce the MoisesDB dataset for musical source separation. It consists of 240 tracks from 45 artists, covering twelve musical genres. For each song, we provide its individual audio sources, organized in a two-level hierarchical taxonomy of stems. This will facilitate building and evaluating fine-grained source separation systems that go beyond the limitation of using four stems (drums, bass, other, and vocals) due to lack of data. To facilitate the adoption of this dataset, we publish an easy-to-use Python library to download, process and use MoisesDB. Alongside a thorough documentation and analysis of the dataset contents, this work provides baseline results for open-source separation models for varying separation granularities (four, five, and six stems), and discuss their results.
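A hedged sketch of the two-level stem taxonomy as a plain data structure (this is not the moises-db library API): leaf arrays stand in for per-source waveforms, sums within a top-level stem give coarser stem targets, and the sum of everything reconstructs the mixture.

```python
import numpy as np

# Hedged sketch of a two-level hierarchy of stems -> individual sources.
# Toy sine tones stand in for recorded waveforms; names and groupings are illustrative.

sr, seconds = 44100, 2

def tone(freq):
    t = np.arange(sr * seconds) / sr
    return 0.1 * np.sin(2 * np.pi * freq * t)

track = {                                          # top-level stem -> named sources
    "drums":  {"kick": tone(60), "snare": tone(200)},
    "bass":   {"electric_bass": tone(80)},
    "vocals": {"lead_vocal": tone(440)},
    "other":  {"piano": tone(330), "guitar": tone(520)},
}

stem_mixes = {name: sum(srcs.values()) for name, srcs in track.items()}  # 4-stem targets
mixture = sum(stem_mixes.values())                                        # full mix
print({k: v.shape for k, v in stem_mixes.items()}, mixture.shape)
```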

Reinforcement Learning Under Probabilistic Spatio-Temporal Constraints with Time Windows

  • paper_url: http://arxiv.org/abs/2307.15910
  • repo_url: None
  • paper_authors: Xiaoshan Lin, Abbasali Koochakzadeh, Yasin Yazicioglu, Derya Aksaray
  • for: This paper presents an automata-theoretic approach to reinforcement learning (RL) under complex spatio-temporal constraints with time windows.
  • methods: The problem is formulated as a Markov decision process under a bounded temporal logic constraint, which is translated into a total automaton; "unsafe" actions are avoided based on available prior information about the transition probabilities, namely a pair of upper and lower bounds for each transition probability.
  • results: The paper provides theoretical guarantees on the resulting probability of constraint satisfaction, and numerical results for a scenario in which a robot explores the environment while fulfilling periodic pick-up and delivery tasks demonstrate the effectiveness of the approach.
    Abstract We propose an automata-theoretic approach for reinforcement learning (RL) under complex spatio-temporal constraints with time windows. The problem is formulated using a Markov decision process under a bounded temporal logic constraint. Different from existing RL methods that can eventually learn optimal policies satisfying such constraints, our proposed approach enforces a desired probability of constraint satisfaction throughout learning. This is achieved by translating the bounded temporal logic constraint into a total automaton and avoiding "unsafe" actions based on the available prior information regarding the transition probabilities, i.e., a pair of upper and lower bounds for each transition probability. We provide theoretical guarantees on the resulting probability of constraint satisfaction. We also provide numerical results in a scenario where a robot explores the environment to discover high-reward regions while fulfilling some periodic pick-up and delivery tasks that are encoded as temporal logic constraints.

UniBriVL: Robust Universal Representation and Generation of Audio Driven Diffusion Models

  • paper_url: http://arxiv.org/abs/2307.15898
  • repo_url: None
  • paper_authors: Sen Fang, Bowen Gao, Yangjian Wu, Jingwen Cai, Teik Toe Teoh
  • for: This paper proposes a universal language representation learning method based on bridging vision and language, enabling the development of multimodal applications.
  • methods: The method, UniBriVL, builds on Bridging-Vision-and-Language (BriVL) and embeds audio, images, and text into a shared space, addressing the major challenges of robust language (text and audio) representation learning while effectively capturing the correlation between audio and images.
  • results: Experiments demonstrate the efficacy of UniBriVL in downstream tasks and its ability to choose appropriate images from audio; a qualitative evaluation of images generated from audio highlights its potential for applications such as speech recognition, music signal processing, and captioning systems.
    Abstract Multimodal large models have been recognized for their advantages in various performance and downstream tasks. The development of these models is crucial towards achieving general artificial intelligence in the future. In this paper, we propose a novel universal language representation learning method called UniBriVL, which is based on Bridging-Vision-and-Language (BriVL). Universal BriVL embeds audio, image, and text into a shared space, enabling the realization of various multimodal applications. Our approach addresses major challenges in robust language (both text and audio) representation learning and effectively captures the correlation between audio and image. Additionally, we demonstrate the qualitative evaluation of the generated images from UniBriVL, which serves to highlight the potential of our approach in creating images from audio. Overall, our experimental results demonstrate the efficacy of UniBriVL in downstream tasks and its ability to choose appropriate images from audio. The proposed approach has the potential for various applications such as speech recognition, music signal processing, and captioning systems.

A new Gradient TD Algorithm with only One Step-size: Convergence Rate Analysis using $L$-$λ$ Smoothness

  • paper_url: http://arxiv.org/abs/2307.15892
  • repo_url: None
  • paper_authors: Hengshuai Yao
  • for: This paper concerns Gradient TD (GTD) algorithms, the first $O(d)$ algorithms (where $d$ is the number of features) with convergence guarantees for off-policy learning with linear function approximation, and in particular their convergence rates and the difficulty of tuning their two step-size parameters.
  • methods: A truly single-time-scale GTD algorithm, called Impression GTD, is presented; it minimizes the Norm of Expected TD Update (NEU) objective with only one step-size parameter, and is analyzed (together with three other GTD algorithms) in a generic framework based on a generalization of expected smoothness called $L$-$\lambda$ smoothness.
  • results: Impression GTD is proven to converge at least as fast as $O(1/t)$, and in fact at a linear rate under the $L$-$\lambda$ smoothness assumption, improving previous bounds. Experiments on Random walks, the Boyan chain, and Baird's counterexample show that Impression GTD converges much faster than existing GTD algorithms, with well-performing step-sizes over a wide range.
    Abstract Gradient Temporal Difference (GTD) algorithms (Sutton et al., 2008, 2009) are the first $O(d)$ ($d$ is the number features) algorithms that have convergence guarantees for off-policy learning with linear function approximation. Liu et al. (2015) and Dalal et. al. (2018) proved the convergence rates of GTD, GTD2 and TDC are $O(t^{-\alpha/2})$ for some $\alpha \in (0,1)$. This bound is tight (Dalal et al., 2020), and slower than $O(1/\sqrt{t})$. GTD algorithms also have two step-size parameters, which are difficult to tune. In literature, there is a "single-time-scale" formulation of GTD. However, this formulation still has two step-size parameters. This paper presents a truly single-time-scale GTD algorithm for minimizing the Norm of Expected td Update (NEU) objective, and it has only one step-size parameter. We prove that the new algorithm, called Impression GTD, converges at least as fast as $O(1/t)$. Furthermore, based on a generalization of the expected smoothness (Gower et al. 2019), called $L$-$\lambda$ smoothness, we are able to prove that the new GTD converges even faster, in fact, with a linear rate. Our rate actually also improves Gower et al.'s result with a tighter bound under a weaker assumption. Besides Impression GTD, we also prove the rates of three other GTD algorithms, one by Yao and Liu (2008), another called A-transpose-TD (Sutton et al., 2008), and a counterpart of A-transpose-TD. The convergence rates of all the four GTD algorithms are proved in a single generic GTD framework to which $L$-$\lambda$ smoothness applies. Empirical results on Random walks, Boyan chain, and Baird counterexample show that Impression GTD converges much faster than existing GTD algorithms for both on-policy and off-policy learning problems, with well-performing step-sizes in a big range.
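To make the objective concrete, the sketch below minimizes the Norm of Expected TD Update, $\mathrm{NEU}(\theta) = \|\mathbb{E}[\delta\phi]\|^2$, by plain gradient descent with a single step-size on a tiny Markov reward process where the expectation can be computed exactly. The actual Impression GTD algorithm works from samples, so this exact-expectation version is only illustrative.

```python
import numpy as np

# Hedged sketch: with known dynamics, E[delta * phi] = A theta + b where
# A = Phi^T D (gamma*P - I) Phi and b = Phi^T D r, so NEU(theta) = ||A theta + b||^2
# can be minimized by ordinary gradient descent with one step-size.

gamma, alpha = 0.9, 0.1
P = np.array([[0.5, 0.5, 0.0],          # transition matrix of a 3-state MRP
              [0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5]])
r = np.array([0.0, 0.0, 1.0])            # expected reward per state
d = np.array([1/3, 1/3, 1/3])            # assumed behavior-policy state distribution
Phi = np.array([[1.0, 0.0],              # linear features (d = 2)
                [1.0, 1.0],
                [0.0, 1.0]])

D = np.diag(d)
A = Phi.T @ D @ (gamma * P @ Phi - Phi)  # E[ phi (gamma*phi' - phi)^T ]
b = Phi.T @ D @ r                        # E[ r * phi ]

theta = np.zeros(2)
for _ in range(2000):
    g = A @ theta + b                    # expected TD update E[delta * phi]
    theta -= alpha * 2 * A.T @ g         # gradient step on NEU = ||g||^2

g = A @ theta + b
print(theta.round(4), float(g @ g))
```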

Point Annotation Probability Map: Towards Dense Object Counting by Tolerating Annotation Noise

  • paper_url: http://arxiv.org/abs/2308.00530
  • repo_url: None
  • paper_authors: Yuehai Chen
  • for: This study aims to improve the accuracy and robustness of counting objects in crowded scenes in computer vision.
  • methods: Dense object counting is commonly formulated as a Gaussian density regression problem, which may not properly account for the annotation noise introduced by the human annotation process. To improve robustness, a generalized Gaussian distribution (GGD) with a tunable bandwidth and shape parameter is used to form the learning target, a point annotation probability map (PAPM).
  • results: Both a hand-designed PAPM method (HD-PAPM) and an adaptively learned PAPM method (AL-PAPM), the latter built on an effective GGD-based transport cost within an optimal transport framework, show strong robustness to annotation noise, and extensive experiments demonstrate the superiority of the proposed methods.
    Abstract Counting objects in crowded scenes remains a challenge to computer vision. The current deep learning based approach often formulate it as a Gaussian density regression problem. Such a brute-force regression, though effective, may not consider the annotation noise properly which arises from the human annotation process and may lead to different distributions. We conjecture that it would be beneficial to consider the annotation noise in the dense object counting task. To obtain strong robustness against annotation noise, generalized Gaussian distribution (GGD) function with a tunable bandwidth and shape parameter is exploited to form the learning target point annotation probability map, PAPM. Specifically, we first present a hand-designed PAPM method (HD-PAPM), in which we design a function based on GGD to tolerate the annotation noise. For end-to-end training, the hand-designed PAPM may not be optimal for the particular network and dataset. An adaptively learned PAPM method (AL-PAPM) is proposed. To improve the robustness to annotation noise, we design an effective transport cost function based on GGD. With such transport cost constraints, a better PAPM presentation could be adaptively learned with an optimal transport framework from point annotation in an end-to-end manner. Extensive experiments show the superiority of our proposed methods.
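A small sketch of forming a point annotation probability map with a generalized Gaussian kernel $\exp(-(d/\alpha)^\beta)$, where $\alpha$ is the bandwidth and $\beta$ the shape parameter ($\beta = 2$ recovers a Gaussian; smaller $\beta$ gives heavier tails that tolerate annotation noise). Grid size, annotation points, and parameter values are illustrative.

```python
import numpy as np

# Hedged sketch of a PAPM built from noisy point annotations with a GGD kernel.

def ggd_papm(points, shape=(64, 64), alpha=4.0, beta=1.5):
    ys, xs = np.mgrid[0:shape[0], 0:shape[1]]
    papm = np.zeros(shape)
    for (py, px) in points:
        dist = np.sqrt((ys - py) ** 2 + (xs - px) ** 2)
        kernel = np.exp(-(dist / alpha) ** beta)
        papm += kernel / kernel.sum()          # each annotation contributes unit mass
    return papm

annotations = [(10, 12), (30, 40), (50, 20)]   # noisy point annotations of object centers
papm = ggd_papm(annotations)
print(papm.shape, round(float(papm.sum()), 3))  # total mass equals the object count
```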

Recent neutrino oscillation result with the IceCube experiment

  • paper_url: http://arxiv.org/abs/2307.15855
  • repo_url: None
  • paper_authors: Shiqi Yu, Jessie Micallef
  • for: Detection of TeV neutrino emission from astrophysical sources, and GeV-scale atmospheric neutrino oscillation studies enabled by the DeepCore subdetector.
  • methods: Neutrino interactions in the DeepCore detector are reconstructed using Convolutional Neural Networks.
  • results: The latest IceCube result from the atmospheric muon neutrino disappearance analysis, based on the CNN-reconstructed sample, is presented and compared with existing worldwide measurements.
    Abstract The IceCube South Pole Neutrino Observatory is a Cherenkov detector instrumented in a cubic kilometer of ice at the South Pole. IceCube's primary scientific goal is the detection of TeV neutrino emissions from astrophysical sources. At the lower center of the IceCube array, there is a subdetector called DeepCore, which has a denser configuration that makes it possible to lower the energy threshold of IceCube and observe GeV-scale neutrinos, opening the window to atmospheric neutrino oscillations studies. Advances in physics sensitivity have recently been achieved by employing Convolutional Neural Networks to reconstruct neutrino interactions in the DeepCore detector. In this contribution, the recent IceCube result from the atmospheric muon neutrino disappearance analysis using the CNN-reconstructed neutrino sample is presented and compared to the existing worldwide measurements.

Dimensionless Policies based on the Buckingham $π$ Theorem: Is it a good way to Generalize Numerical Results?

  • paper_url: http://arxiv.org/abs/2307.15852
  • repo_url: None
  • paper_authors: Alexandre Girard
  • for: This paper asks whether optimal feedback laws computed numerically for one motion control problem, here the swing-up of a torque-limited inverted pendulum, can be generalized to other, dimensionally similar systems.
  • methods: Optimal control laws are generated numerically for the torque-limited pendulum swing-up problem, and the problem formulation is recast in dimensionless variables following the Buckingham $\pi$ theorem so that a controller computed for one system can be reused.
  • results: With the dimensionless formulation, the numerically generated optimal controller can be reused across the sub-space of dimensionally similar systems. The paper also introduces the concept of a regime, a region in the space of context variables that can help relax the dimensional-similarity condition, and discusses how dimensionally scaling a policy's inputs and outputs is equivalent to substituting the new system parameters into an analytical equation for dimensionally similar systems.
    Abstract Yes if the context, the list of variables defining the motion control problem, is dimensionally similar. Here we show that by modifying the problem formulation using dimensionless variables, we can re-use the optimal control law generated numerically for a specific system to a sub-space of dimensionally similar systems. This is demonstrated, with numerically generated optimal controllers, for the classic motion control problem of swinging-up a torque-limited inverted pendulum. We also discuss the concept of regime, a region in the space of context variables, that can help relax the condition on dimensional similarity. Futhermore, we discuss how applying dimensionnal scaling of the input and output of a context-specific policy is equivalent to substituing the new systems parameters in an analytical equation for dimentionnaly similar systems. It remains to be seen if this approach can also help generalizing policies for more complex high-dimensional problems.
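A hedged sketch of the dimensional-scaling idea: angles are already dimensionless, angular velocity is scaled by $\omega = \sqrt{g/l}$, and torque by $mgl$, so a policy computed for pendulum A can be wrapped into a dimensionless policy and rescaled for a dimensionally similar pendulum B. The placeholder feedback law below is not the paper's numerically optimized controller.

```python
import numpy as np

# Hedged sketch of reusing a policy across dimensionally similar pendulums
# via Buckingham-pi style scaling of inputs (angular velocity) and outputs (torque).

g = 9.81

def policy_A(theta, theta_dot, m=1.0, l=0.5):
    """Stand-in for a controller computed for pendulum A (m = 1 kg, l = 0.5 m)."""
    return -m * g * l * np.sin(theta) - 0.1 * m * l**2 * theta_dot   # toy feedback law

def dimensionless_policy(theta, theta_dot_star, m_A=1.0, l_A=0.5):
    omega_A = np.sqrt(g / l_A)
    tau = policy_A(theta, theta_dot_star * omega_A, m_A, l_A)
    return tau / (m_A * g * l_A)                       # tau* = tau / (m g l)

def policy_B(theta, theta_dot, m_B=2.0, l_B=1.0):
    """Reuse the same dimensionless policy on a dimensionally similar pendulum B."""
    omega_B = np.sqrt(g / l_B)
    tau_star = dimensionless_policy(theta, theta_dot / omega_B)
    return tau_star * (m_B * g * l_B)                  # rescale the output torque

print(policy_A(0.5, 1.0), policy_B(0.5, 1.0))
```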

Comprehensive Algorithm Portfolio Evaluation using Item Response Theory

  • paper_url: http://arxiv.org/abs/2307.15850
  • repo_url: https://github.com/sevvandi/airt-scripts
  • paper_authors: Sevvandi Kandanaarachchi, Kate Smith-Miles
  • for: Evaluating the performance of a portfolio of machine learning algorithms across a repository of datasets, including richer characteristics such as algorithm consistency and anomalousness.
  • methods: A modified Item Response Theory (IRT) based framework is used, obtained by inverting and reinterpreting the traditional IRT model so that algorithms play the role of students and dataset instances play the role of test questions, without requiring additional dataset feature computations.
  • results: The framework provides a simple, explainable, and insightful way to evaluate algorithm portfolios across a wide range of applications, with the IRT parameters yielding an increased understanding of the portfolios.
    Abstract Item Response Theory (IRT) has been proposed within the field of Educational Psychometrics to assess student ability as well as test question difficulty and discrimination power. More recently, IRT has been applied to evaluate machine learning algorithm performance on a single classification dataset, where the student is now an algorithm, and the test question is an observation to be classified by the algorithm. In this paper we present a modified IRT-based framework for evaluating a portfolio of algorithms across a repository of datasets, while simultaneously eliciting a richer suite of characteristics - such as algorithm consistency and anomalousness - that describe important aspects of algorithm performance. These characteristics arise from a novel inversion and reinterpretation of the traditional IRT model without requiring additional dataset feature computations. We test this framework on algorithm portfolios for a wide range of applications, demonstrating the broad applicability of this method as an insightful algorithm evaluation tool. Furthermore, the explainable nature of IRT parameters yield an increased understanding of algorithm portfolios.
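For reference, the sketch below evaluates the classic two-parameter-logistic IRT model underlying the framework: the probability that "student" $i$ (here an algorithm) succeeds on item $j$ is $\sigma(a_j(\theta_i - b_j))$, with $\theta$ the ability, $b$ the difficulty, and $a$ the discrimination; the parameter values are illustrative, not fitted.

```python
import numpy as np

# Hedged sketch of the 2PL IRT model with algorithms as students and dataset
# instances as test items. Values are illustrative, not fitted to data.

def irt_2pl(theta, a, b):
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

theta = np.array([-0.5, 0.3, 1.2])          # abilities of three algorithms
a = np.array([0.8, 1.5, 1.0, 2.0])          # item discriminations
b = np.array([-1.0, 0.0, 0.5, 1.5])         # item difficulties

# Probability matrix: rows = algorithms, columns = test items (dataset instances).
P = irt_2pl(theta[:, None], a[None, :], b[None, :])
print(P.round(2))
```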

Primitive Skill-based Robot Learning from Human Evaluative Feedback

  • paper_url: http://arxiv.org/abs/2307.15801
  • repo_url: None
  • paper_authors: Ayano Hiranaka, Minjune Hwang, Sharon Lee, Chen Wang, Li Fei-Fei, Jiajun Wu, Ruohan Zhang
  • for: Improve the sample efficiency and safety of RL algorithms for long-horizon robot manipulation tasks in real-world environments.
  • methods: The framework combines reinforcement learning from human evaluative feedback (RLHF) with primitive skill-based reinforcement learning, which together address sparse-reward issues and the complexity of long-horizon tasks.
  • results: Extensive experiments on five manipulation tasks of varying complexity show that SEED significantly outperforms state-of-the-art RL algorithms in sample efficiency and safety, while requiring substantially less human effort than other RLHF methods.
    Abstract Reinforcement learning (RL) algorithms face significant challenges when dealing with long-horizon robot manipulation tasks in real-world environments due to sample inefficiency and safety issues. To overcome these challenges, we propose a novel framework, SEED, which leverages two approaches: reinforcement learning from human feedback (RLHF) and primitive skill-based reinforcement learning. Both approaches are particularly effective in addressing sparse reward issues and the complexities involved in long-horizon tasks. By combining them, SEED reduces the human effort required in RLHF and increases safety in training robot manipulation with RL in real-world settings. Additionally, parameterized skills provide a clear view of the agent's high-level intentions, allowing humans to evaluate skill choices before they are executed. This feature makes the training process even safer and more efficient. To evaluate the performance of SEED, we conducted extensive experiments on five manipulation tasks with varying levels of complexity. Our results show that SEED significantly outperforms state-of-the-art RL algorithms in sample efficiency and safety. In addition, SEED also exhibits a substantial reduction of human effort compared to other RLHF methods. Further details and video results can be found at https://seediros23.github.io/.

Summaries, Highlights, and Action items: Design, implementation and evaluation of an LLM-powered meeting recap system

  • paper_url: http://arxiv.org/abs/2307.15793
  • repo_url: None
  • paper_authors: Sumit Asthana, Sagih Hilleli, Pengcheng He, Aaron Halfaker
  • for: This paper aims to improve the experience of meetings in online computer-mediated spaces by using large language models for meeting recap, reducing individuals' meeting load and increasing the clarity and alignment of meeting outputs.
  • methods: The authors design and implement a meeting recap system built on LLM-based dialogue summarization, with two salient recap representations: important highlights and a structured, hierarchical minutes view.
  • results: An evaluation with seven users in the context of their work meetings shows promise for LLM-based meeting recap and the need for both representations in different contexts, while also revealing gaps in personal relevance and summarization quality; high-quality recaps could enable collaboration artifacts such as a shared recap document.
    Abstract Meetings play a critical infrastructural role in the coordination of work. In recent years, due to shift to hybrid and remote work, more meetings are moving to online Computer Mediated Spaces. This has led to new problems (e.g. more time spent in less engaging meetings) and new opportunities (e.g. automated transcription/captioning and recap support). Recent advances in large language models (LLMs) for dialog summarization have the potential to improve the experience of meetings by reducing individuals' meeting load and increasing the clarity and alignment of meeting outputs. Despite this potential, they face technological limitation due to long transcripts and inability to capture diverse recap needs based on user's context. To address these gaps, we design, implement and evaluate in-context a meeting recap system. We first conceptualize two salient recap representations -- important highlights, and a structured, hierarchical minutes view. We develop a system to operationalize the representations with dialogue summarization as its building blocks. Finally, we evaluate the effectiveness of the system with seven users in the context of their work meetings. Our findings show promise in using LLM-based dialogue summarization for meeting recap and the need for both representations in different contexts. However, we find that LLM-based recap still lacks an understanding of whats personally relevant to participants, can miss important details, and mis-attributions can be detrimental to group dynamics. We identify collaboration opportunities such as a shared recap document that a high quality recap enables. We report on implications for designing AI systems to partner with users to learn and improve from natural interactions to overcome the limitations related to personal relevance and summarization quality.

SAFE: Saliency-Aware Counterfactual Explanations for DNN-based Automated Driving Systems

  • paper_url: http://arxiv.org/abs/2307.15786
  • repo_url: None
  • paper_authors: Amir Samadi, Amir Shirian, Konstantinos Koufos, Kurt Debattista, Mehrdad Dianati
  • for: This paper proposes a new approach to CF explanations, which generates more informative CF examples to better understand the decision-making process of black-box models.
  • methods: The proposed method uses saliency maps to generate CF examples and evaluates their usefulness.
  • results: Experimental results show that the proposed CF explanation method can generate more informative CF examples and help understand the decision-making process of black-box models.
    Abstract A CF explainer identifies the minimum modifications in the input that would alter the model's output to its complement. In other words, a CF explainer computes the minimum modifications required to cross the model's decision boundary. Current deep generative CF models often work with user-selected features rather than focusing on the discriminative features of the black-box model. Consequently, such CF examples may not necessarily lie near the decision boundary, thereby contradicting the definition of CFs. To address this issue, we propose in this paper a novel approach that leverages saliency maps to generate more informative CF explanations. Source codes are available at: https://github.com/Amir-Samadi//Saliency_Aware_CF.

Spherical and Hyperbolic Toric Topology-Based Codes On Graph Embedding for Ising MRF Models: Classical and Quantum Topology Machine Learning

  • paper_url: http://arxiv.org/abs/2307.15778
  • repo_url: https://github.com/Lcrypto/Topology-Signal-Processing
  • paper_authors: Vasiliy Usatyuk, Sergey Egorov, Denis Sapozhnikov
  • for: This paper explores the application of information geometry to describe the ground states of Ising models.
  • methods: The approach uses parity-check matrices of cyclic and quasi-cyclic LDPC codes on toric and spherical topologies, connecting machine learning and error-correcting coding through code automorphisms and the size of the circulant of the quasi-cyclic code.
  • results: The study establishes a direct connection between DNN architectures and error-correcting codes, and shows how statistical physics and number geometry can be used to optimize codes, yielding new embedding and sparse factorization methods with implications for DNN architecture design, efficient hardware design, and materials science.
    Abstract The paper introduces the application of information geometry to describe the ground states of Ising models. This is achieved by utilizing parity-check matrices of cyclic and quasi-cyclic codes on toric and spherical topologies. The approach establishes a connection between machine learning and error-correcting coding, specifically in terms of automorphism and the size of the circulant of the quasi-cyclic code. This proposed approach has implications for the development of new embedding methods based on trapping sets. Statistical physics and number geometry are utilized to optimize error-correcting codes, leading to these embedding and sparse factorization methods. The paper establishes a direct connection between DNN architecture and error-correcting coding by demonstrating how state-of-the-art DNN architectures (ChordMixer, Mega, Mega-chunk, CDIL, ...) from the long-range arena can be equivalent to specific types (Cage-graph, Repeat Accumulate) of block and convolutional LDPC codes. QC codes correspond to certain types of chemical elements, with the carbon element being represented by the mixed automorphism Shu-Lin-Fossorier QC-LDPC code. The Quantum Approximate Optimization Algorithm (QAOA) used in the Sherrington-Kirkpatrick Ising model can be seen as analogous to the back-propagation loss function landscape in training DNNs. This similarity creates a comparable problem with TS pseudo-codeword, resembling the belief propagation method. Additionally, the layer depth in QAOA correlates to the number of decoding belief propagation iterations in the Wiberg decoding tree. Overall, this work has the potential to advance multiple fields, from Information Theory, DNN architecture design (sparse and structured prior graph topology), efficient hardware design for Quantum and Classical DPU/TPU (graph, quantize and shift register architect.) to Materials Science and beyond.

Select and Augment: Enhanced Dense Retrieval Knowledge Graph Augmentation

  • paper_url: http://arxiv.org/abs/2307.15776
  • repo_url: None
  • paper_authors: Micheal Abaho, Yousef H. Alfaifi
  • for: Improve performance on knowledge graph (KG) oriented tasks such as link prediction.
  • methods: A multi-task framework jointly selects a set of text descriptions relevant to KG entities and aligns or augments the KG embeddings with those descriptions.
  • results: On link prediction, the approach improves Mean Reciprocal Rank (MRR) and Hits@10 scores by 5.5% and 3.5% respectively over text-enhanced knowledge graph augmentation methods using traditional CNNs.
    Abstract Injecting textual information into knowledge graph (KG) entity representations has been a worthwhile expedition in terms of improving performance in KG oriented tasks within the NLP community. External knowledge often adopted to enhance KG embeddings ranges from semantically rich lexical dependency parsed features to a set of relevant key words to entire text descriptions supplied from an external corpus such as wikipedia and many more. Despite the gains this innovation (Text-enhanced KG embeddings) has made, the proposal in this work suggests that it can be improved even further. Instead of using a single text description (which would not sufficiently represent an entity because of the inherent lexical ambiguity of text), we propose a multi-task framework that jointly selects a set of text descriptions relevant to KG entities as well as align or augment KG embeddings with text descriptions. Different from prior work that plugs formal entity descriptions declared in knowledge bases, this framework leverages a retriever model to selectively identify richer or highly relevant text descriptions to use in augmenting entities. Furthermore, the framework treats the number of descriptions to use in augmentation process as a parameter, which allows the flexibility of enumerating across several numbers before identifying an appropriate number. Experiment results for Link Prediction demonstrate a 5.5% and 3.5% percentage increase in the Mean Reciprocal Rank (MRR) and Hits@10 scores respectively, in comparison to text-enhanced knowledge graph augmentation methods using traditional CNNs.
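A hedged sketch of the select-and-augment step: a retriever scores candidate descriptions against an entity embedding, the top-k most relevant ones are selected (with k treated as a tunable parameter, as in the paper), and their encodings are fused into the entity representation. Random vectors stand in for the encoder, and the fusion rule is an assumption.

```python
import numpy as np

# Hedged sketch: select the most relevant text descriptions for an entity and use
# them to augment its KG embedding.

rng = np.random.default_rng(0)
dim, n_candidates = 64, 10
entity_emb = rng.normal(size=dim)
desc_embs = rng.normal(size=(n_candidates, dim))     # encoded candidate descriptions

def top_k_descriptions(entity, descs, k=3):
    sims = descs @ entity / (np.linalg.norm(descs, axis=1) * np.linalg.norm(entity))
    return np.argsort(-sims)[:k]                     # retriever: cosine-similarity ranking

def augment(entity, descs, k=3, alpha=0.5):
    idx = top_k_descriptions(entity, descs, k)
    text_context = descs[idx].mean(axis=0)
    return (1 - alpha) * entity + alpha * text_context   # simple fusion rule (assumption)

for k in (1, 3, 5):                                  # enumerate candidate numbers of descriptions
    print(k, np.linalg.norm(augment(entity_emb, desc_embs, k) - entity_emb).round(3))
```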

The Hydra Effect: Emergent Self-repair in Language Model Computations

  • paper_url: http://arxiv.org/abs/2307.15771
  • repo_url: None
  • paper_authors: Thomas McGrath, Matthew Rahtz, Janos Kramar, Vladimir Mikulik, Shane Legg
  • for: This work uses causal analysis to investigate the internal structure of language model computations.
  • methods: Ablation studies are used to examine how the layers of a language model interact with one another.
  • results: Two motifs are found: a form of adaptive computation in which ablating one attention layer causes another layer to compensate (termed the Hydra effect), and a counterbalancing function of late MLP layers that downregulate the maximum-likelihood token. These effects occur even in language models trained without any form of dropout.
    Abstract We investigate the internal structure of language model computations using causal analysis and demonstrate two motifs: (1) a form of adaptive computation where ablations of one attention layer of a language model cause another layer to compensate (which we term the Hydra effect) and (2) a counterbalancing function of late MLP layers that act to downregulate the maximum-likelihood token. Our ablation studies demonstrate that language model layers are typically relatively loosely coupled (ablations to one layer only affect a small number of downstream layers). Surprisingly, these effects occur even in language models trained without any form of dropout. We analyse these effects in the context of factual recall and consider their implications for circuit-level attribution in language models.

CHATREPORT: Democratizing Sustainability Disclosure Analysis through LLM-based Tools

  • paper_url: http://arxiv.org/abs/2307.15770
  • repo_url: https://github.com/edisonni-hku/chatreport
  • paper_authors: Jingwei Ni, Julia Bingler, Chiara Colesanti-Senni, Mathias Kraus, Glen Gostlow, Tobias Schimanski, Dominik Stammbach, Saeid Ashraf Vaghefi, Qian Wang, Nicolas Webersinke, Tobias Wekhof, Tingyu Yu, Markus Leippold
  • for: The paper aims to provide a novel LLM-based system for automating the analysis of corporate sustainability reports, with the goal of improving transparency and stakeholder empowerment.
  • methods: The system, called ChatReport, uses large language models (LLMs) to analyze sustainability reports and generate analyses, while addressing two key challenges: hallucination and the inefficiency of involving domain experts in the development loop.
  • results: The authors provide a methodology, annotated datasets, and generated analyses of 1015 reports to demonstrate the effectiveness of ChatReport. The results show that the system can provide accurate and traceable analyses of sustainability reports, empowering stakeholders and improving transparency in sustainability reporting.
    Abstract In the face of climate change, are companies really taking substantial steps toward more sustainable operations? A comprehensive answer lies in the dense, information-rich landscape of corporate sustainability reports. However, the sheer volume and complexity of these reports make human analysis very costly. Therefore, only a few entities worldwide have the resources to analyze these reports at scale, which leads to a lack of transparency in sustainability reporting. Empowering stakeholders with LLM-based automatic analysis tools can be a promising way to democratize sustainability report analysis. However, developing such tools is challenging due to (1) the hallucination of LLMs and (2) the inefficiency of bringing domain experts into the AI development loop. In this paper, we ChatReport, a novel LLM-based system to automate the analysis of corporate sustainability reports, addressing existing challenges by (1) making the answers traceable to reduce the harm of hallucination and (2) actively involving domain experts in the development loop. We make our methodology, annotated datasets, and generated analyses of 1015 reports publicly available.

Goodness-of-Fit of Attributed Probabilistic Graph Generative Models

  • paper_url: http://arxiv.org/abs/2308.03773
  • repo_url: None
  • paper_authors: Pablo Robles-Granda, Katherine Tsai, Oluwasanmi Koyejo
  • for: This paper addresses how to assess the goodness of fit of probabilistic generative models of attributed graphs.
  • methods: Goodness of fit is defined in terms of the mean square contingency coefficient for random binary networks, and a procedure is outlined to ensure, with high probability, that the discrepancy of this coefficient (constant or random) for the structure of the learned attributed graph is minimal.
  • results: The criteria are applied to verify the representation capability of a probabilistic generative model for various popular types of graph models.
    Abstract Probabilistic generative models of graphs are important tools that enable representation and sampling. Many recent works have created probabilistic models of graphs that are capable of representing not only entity interactions but also their attributes. However, given a generative model of random attributed graph(s), the general conditions that establish goodness of fit are not clear a-priori. In this paper, we define goodness of fit in terms of the mean square contingency coefficient for random binary networks. For this statistic, we outline a procedure for assessing the quality of the structure of a learned attributed graph by ensuring that the discrepancy of the mean square contingency coefficient (constant, or random) is minimal with high probability. We apply these criteria to verify the representation capability of a probabilistic generative model for various popular types of graph models.
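For reference, the sketch below computes the statistic named in the abstract: for two binary variables, the mean square contingency coefficient is $\phi^2 = \chi^2/n$, obtainable from the 2x2 contingency table. Contrasting an edge indicator with an attribute-agreement indicator is only an illustrative choice, not the paper's exact procedure.

```python
import numpy as np

# Hedged sketch of the mean square contingency coefficient (phi^2) for two binary
# variables, here an edge indicator and an attribute-agreement indicator per node pair.

def mean_square_contingency(x, y):
    x, y = np.asarray(x, bool), np.asarray(y, bool)
    n11 = np.sum(x & y); n10 = np.sum(x & ~y)
    n01 = np.sum(~x & y); n00 = np.sum(~x & ~y)
    num = (n11 * n00 - n10 * n01) ** 2
    den = (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    return num / den if den > 0 else 0.0

rng = np.random.default_rng(0)
same_attr = rng.random(1000) < 0.5                          # pairs sharing a binary attribute
edges = rng.random(1000) < np.where(same_attr, 0.4, 0.1)    # homophilous random edges
print(round(mean_square_contingency(edges, same_attr), 3))
```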
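
For readers unfamiliar with the statistic, the sketch below shows one way the mean square contingency coefficient (phi squared) can be computed for binary networks represented as NumPy arrays. The Erdos-Renyi-style toy model and the simple averaging of discrepancies are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def mscc(x, y):
    """Mean square contingency coefficient (phi^2) between two binary arrays."""
    x, y = x.ravel().astype(bool), y.ravel().astype(bool)
    n11 = np.sum(x & y); n10 = np.sum(x & ~y)
    n01 = np.sum(~x & y); n00 = np.sum(~x & ~y)
    denom = (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    if denom == 0:
        return 0.0
    return float(((n11 * n00 - n10 * n01) ** 2) / denom)

rng = np.random.default_rng(0)
n = 60
attr = rng.random(n) < 0.4                      # a binary node attribute
same_attr = np.equal.outer(attr, attr)          # "both endpoints share the attribute"
edges_obs = rng.random((n, n)) < np.where(same_attr, 0.15, 0.05)  # observed attributed graph
stat_obs = mscc(edges_obs, same_attr)

# Draw graphs from a fitted generative model and check how far their MSCC
# falls from the observed one; a good fit keeps this discrepancy small.
discrepancies = []
for _ in range(200):
    edges_gen = rng.random((n, n)) < np.where(same_attr, 0.15, 0.05)
    discrepancies.append(abs(mscc(edges_gen, same_attr) - stat_obs))
print("observed MSCC:", round(stat_obs, 4))
print("mean discrepancy:", round(float(np.mean(discrepancies)), 4))
```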

Lessons in Reproducibility: Insights from NLP Studies in Materials Science

  • paper_url: http://arxiv.org/abs/2307.15759
  • repo_url: None
  • paper_authors: Xiangyun Lei, Edward Kim, Viktoriia Baibakova, Shijing Sun
  • for: A reproducibility analysis of two pioneering papers in this domain: "Machine-learned and codified synthesis parameters of oxide materials" and "Unsupervised word embeddings capture latent knowledge from materials science literature".
  • methods: Both papers provide thorough workflows, tidy and well-documented codebases, and clear guidance for model evaluation, which makes it easier to replicate their results and partially reproduce their findings.
  • results: The analysis shows that both papers set commendable standards for future materials science publications to build on. It also highlights areas for improvement, such as providing access to training data where copyright restrictions permit, more transparency on model architecture and the training process, and specifying software dependency versions. A cross-comparison of the two papers' word embedding models further reveals key reproducibility and cross-compatibility issues attributable to design choices outside the models themselves.
    Abstract Natural Language Processing (NLP), a cornerstone field within artificial intelligence, has been increasingly utilized in the field of materials science literature. Our study conducts a reproducibility analysis of two pioneering works within this domain: "Machine-learned and codified synthesis parameters of oxide materials" by Kim et al., and "Unsupervised word embeddings capture latent knowledge from materials science literature" by Tshitoyan et al. We aim to comprehend these studies from a reproducibility perspective, acknowledging their significant influence on the field of materials informatics, rather than critiquing them. Our study indicates that both papers offered thorough workflows, tidy and well-documented codebases, and clear guidance for model evaluation. This makes it easier to replicate their results successfully and partially reproduce their findings. In doing so, they set commendable standards for future materials science publications to aspire to. However, our analysis also highlights areas for improvement such as to provide access to training data where copyright restrictions permit, more transparency on model architecture and the training process, and specifications of software dependency versions. We also cross-compare the word embedding models between papers, and find that some key differences in reproducibility and cross-compatibility are attributable to design choices outside the bounds of the models themselves. In summary, our study appreciates the benchmark set by these seminal papers while advocating for further enhancements in research reproducibility practices in the field of NLP for materials science. This balance of understanding and continuous improvement will ultimately propel the intersecting domains of NLP and materials science literature into a future of exciting discoveries.

Uncertainty in Natural Language Generation: From Theory to Applications

  • paper_url: http://arxiv.org/abs/2307.15703
  • repo_url: https://github.com/Rastaman4e/-1
  • paper_authors: Joris Baan, Nico Daheim, Evgenia Ilia, Dennis Ulmer, Haau-Sing Li, Raquel Fernández, Barbara Plank, Rico Sennrich, Chrysoula Zerva, Wilker Aziz
  • for: Explores how to make natural language generation (NLG) systems more trustworthy and reliable, reflecting the needs of diverse human sub-populations.
  • methods: Presents the fundamental theory, frameworks, and vocabulary required to represent uncertainty, and characterizes the main sources of uncertainty in NLG from a linguistic perspective.
  • results: Argues that a principled treatment of uncertainty helps create systems and evaluation protocols better aligned with these goals, and highlights promising research directions that exploit uncertainty, such as decoding, controllable generation, self-assessment, selective answering, and active learning.
    Abstract Recent advances of powerful Language Models have allowed Natural Language Generation (NLG) to emerge as an important technology that can not only perform traditional tasks like summarisation or translation, but also serve as a natural language interface to a variety of applications. As such, it is crucial that NLG systems are trustworthy and reliable, for example by indicating when they are likely to be wrong; and supporting multiple views, backgrounds and writing styles -- reflecting diverse human sub-populations. In this paper, we argue that a principled treatment of uncertainty can assist in creating systems and evaluation protocols better aligned with these goals. We first present the fundamental theory, frameworks and vocabulary required to represent uncertainty. We then characterise the main sources of uncertainty in NLG from a linguistic perspective, and propose a two-dimensional taxonomy that is more informative and faithful than the popular aleatoric/epistemic dichotomy. Finally, we move from theory to applications and highlight exciting research directions that exploit uncertainty to power decoding, controllable generation, self-assessment, selective answering, active learning and more.
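
As a concrete, minimal illustration of representing uncertainty in NLG, the sketch below computes token-level predictive entropy from next-token logits, one common building block behind uncertainty-aware decoding and self-assessment. The random logits and the mean-entropy sequence score are stand-ins for illustration, not the paper's taxonomy.

```python
import torch
import torch.nn.functional as F

def token_entropies(logits):
    """Entropy (in nats) of each next-token distribution.

    logits: tensor of shape (sequence_length, vocab_size) produced by any
    autoregressive language model at each generation step.
    """
    log_probs = F.log_softmax(logits, dim=-1)
    probs = log_probs.exp()
    return -(probs * log_probs).sum(dim=-1)

# Toy example with random logits standing in for a real model's outputs.
torch.manual_seed(0)
fake_logits = torch.randn(12, 32000)        # 12 generated tokens, 32k-token vocabulary
per_token = token_entropies(fake_logits)    # high values flag uncertain steps
sequence_score = per_token.mean()           # one simple sequence-level summary
print(per_token[:5])
print("mean predictive entropy:", float(sequence_score))
```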

AI for Anticipatory Action: Moving Beyond Climate Forecasting

  • paper_url: http://arxiv.org/abs/2307.15727
  • repo_url: None
  • paper_authors: Benjamin Q. Huynh, Mathew V. Kiang
  • for: Provides an overview of the shift from climate forecasting to anticipatory action and reviews how machine learning models are applied to climate forecasting.
  • methods: Reviews relevant applications of machine learning and identifies common challenges, such as how to make machine learning models better support anticipatory action.
  • results: Highlights areas where machine learning can help mitigate the impact of climate change on the most vulnerable populations, while noting that further research is needed to close the remaining methodological gaps.
    Abstract Disaster response agencies have been shifting from a paradigm of climate forecasting towards one of anticipatory action: assessing not just what the climate will be, but how it will impact specific populations, thereby enabling proactive response and resource allocation. Machine learning models are becoming exceptionally powerful at climate forecasting, but methodological gaps remain in terms of facilitating anticipatory action. Here we provide an overview of anticipatory action, review relevant applications of machine learning, identify common challenges, and highlight areas where machine learning can uniquely contribute to advancing disaster response for populations most vulnerable to climate change.

A supervised hybrid quantum machine learning solution to the emergency escape routing problem

  • paper_url: http://arxiv.org/abs/2307.15682
  • repo_url: None
  • paper_authors: Nathan Haboury, Mo Kordzanganeh, Sebastian Schmitt, Ayush Joshi, Igor Tokarev, Lukas Abdallah, Andrii Kurkin, Basil Kyriacou, Alexey Melnikov
  • for: Explores how supervised hybrid quantum machine learning can optimize emergency evacuation plans during natural disasters.
  • methods: Proposes a novel hybrid supervised learning approach that runs a quantum feature-wise linear modulation (FiLM) neural network in parallel with a classical FiLM network, trained on hypothetical situations over a concrete city graph.
  • results: Combining the quantum and classical FiLM networks increases the overall model's expressivity and improves accuracy on the navigation task by 7% over the purely classical approach on the training dataset.
    Abstract Managing the response to natural disasters effectively can considerably mitigate their devastating impact. This work explores the potential of using supervised hybrid quantum machine learning to optimize emergency evacuation plans for cars during natural disasters. The study focuses on earthquake emergencies and models the problem as a dynamic computational graph where an earthquake damages an area of a city. The residents seek to evacuate the city by reaching the exit points where traffic congestion occurs. The situation is modeled as a shortest-path problem on an uncertain and dynamically evolving map. We propose a novel hybrid supervised learning approach and test it on hypothetical situations on a concrete city graph. This approach uses a novel quantum feature-wise linear modulation (FiLM) neural network parallel to a classical FiLM network to imitate Dijkstra's node-wise shortest path algorithm on a deterministic dynamic graph. Adding the quantum neural network in parallel increases the overall model's expressivity by splitting the dataset's harmonic and non-harmonic features between the quantum and classical components. The hybrid supervised learning agent is trained on a dataset of Dijkstra's shortest paths and can successfully learn the navigation task. The hybrid quantum network improves over the purely classical supervised learning approach by 7% in accuracy. We show that the quantum part has a significant contribution of 45.(3)% to the prediction and that the network could be executed on an ion-based quantum computer. The results demonstrate the potential of supervised hybrid quantum machine learning in improving emergency evacuation planning during natural disasters.
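
The classical FiLM block referenced in the abstract can be sketched in a few lines of PyTorch; the quantum branch and the city-graph encoder are outside the scope of this illustration, and the tensor shapes are assumptions.

```python
import torch
import torch.nn as nn

class FiLMLayer(nn.Module):
    """Feature-wise linear modulation: scale and shift features
    using parameters predicted from a conditioning vector."""

    def __init__(self, feature_dim, condition_dim):
        super().__init__()
        # One linear map produces both gamma (scale) and beta (shift).
        self.film = nn.Linear(condition_dim, 2 * feature_dim)

    def forward(self, features, condition):
        gamma, beta = self.film(condition).chunk(2, dim=-1)
        return gamma * features + beta

# Example: modulate per-node features with a graph-level condition
# (e.g., an encoding of which area of the city was damaged).
nodes = torch.randn(32, 64)        # 32 nodes, 64-dim features
condition = torch.randn(1, 16)     # one 16-dim conditioning vector
layer = FiLMLayer(feature_dim=64, condition_dim=16)
out = layer(nodes, condition.expand(32, -1))
print(out.shape)                   # torch.Size([32, 64])
```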

Benchmarking Anomaly Detection System on various Jetson Edge Devices

  • paper_url: http://arxiv.org/abs/2307.16834
  • repo_url: None
  • paper_authors: Hoang Viet Pham, Thinh Gia Tran, Chuong Dinh Le, An Dinh Le, Hien Bich Vo
  • for: Capturing abnormal events from surveillance videos to enhance the safety and well-being of citizens.
  • methods: Applies EdgeAI (edge-computing-based AI) to build an end-to-end crime-scene anomaly detection system.
  • results: The anomaly detection model is competitive with other state-of-the-art algorithms; the AI system is tested and deployed on multiple Jetson edge devices, and experience with Docker-based deployment for system performance improvement is reported.
    Abstract Capturing the abnormal event from surveillance videos enhances the safety and well-being of the citizens. The application of EdgeAI (Edge computing-based Artificial Intelligence) meets the strict latency requirements for security. In this paper, we apply weakly supervised video anomaly detection called Robust Temporal Feature Magnitude Learning (RTFM) to an end-to-end crime-scene anomaly detection system from the surveillance cameras with the help of edge computing technology. The system is tested directly on multiple Jetson edge devices combined with TensorRT as the software developer kit from NVIDIA for system performance enhancement. The experience of an AI-based system deployment on various Jetson Edge devices with Docker technology is also provided. The anomaly detection model yields competitive results compared to other state-of-the-art (SOTA) algorithms on available datasets such as UCF-Crime and UIT VNAnomaly. The system reaches an inference speed of 47.56 frames per second (FPS) on a Jetson edge device with only 3.11 GB of total RAM usage. We also identify a promising Jetson device on which the AI system achieves 15% better performance than the previous generation of Jetson devices while consuming 50% less power.

Case Studies of Causal Discovery from IT Monitoring Time Series

  • paper_url: http://arxiv.org/abs/2307.15678
  • repo_url: https://github.com/Aryia-Behroziuan/References
  • paper_authors: Ali Aït-Bachir, Charles K. Assaad, Christophe de Bignicourt, Emilie Devijver, Simon Ferreira, Eric Gaussier, Hosein Mohanna, Lei Zan
  • for: Applying causal discovery algorithms to IT monitoring data to obtain causal relations between components of the IT system.
  • methods: Presents case studies that apply different causal discovery algorithms, including the PC algorithm, the FCI algorithm, and the Causal Additive Model (CAM), to several IT monitoring datasets.
  • results: These algorithms can help uncover causal relations in the system, but challenges remain, such as misaligned time series, sleeping time series, timestamp errors, and missing values.
    Abstract Information technology (IT) systems are vital for modern businesses, handling data storage, communication, and process automation. Monitoring these systems is crucial for their proper functioning and efficiency, as it allows collecting extensive observational time series data for analysis. The interest in causal discovery is growing in IT monitoring systems as knowing causal relations between different components of the IT system helps in reducing downtime, enhancing system performance and identifying root causes of anomalies and incidents. It also allows proactive prediction of future issues through historical data analysis. Despite its potential benefits, applying causal discovery algorithms on IT monitoring data poses challenges, due to the complexity of the data. For instance, IT monitoring data often contains misaligned time series, sleeping time series, timestamp errors and missing values. This paper presents case studies on applying causal discovery algorithms to different IT monitoring datasets, highlighting benefits and ongoing challenges.
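
A minimal sketch of the kind of pipeline such case studies involve: aligning misaligned monitoring series, filling gaps, and then running a constraint-based discovery algorithm. The causal-learn package and the toy metric names are assumptions, not details taken from the paper.

```python
import numpy as np
import pandas as pd
from causallearn.search.ConstraintBased.PC import pc  # assumed dependency: causal-learn

# Toy monitoring series with misaligned timestamps and missing values.
rng = np.random.default_rng(0)
idx_a = pd.date_range("2023-01-01", periods=500, freq="10s")
idx_b = idx_a + pd.Timedelta(seconds=3)                     # misaligned clock
cpu = pd.Series(rng.normal(size=500), index=idx_a, name="cpu")
latency = pd.Series(0.7 * cpu.values + rng.normal(size=500), index=idx_b, name="latency")

# Align everything onto a common grid and fill small gaps before discovery.
df = pd.concat([cpu, latency], axis=1).resample("10s").mean().interpolate().dropna()

# Constraint-based causal discovery (PC algorithm) on the aligned data.
graph = pc(df.to_numpy(), alpha=0.05)
print(df.columns.tolist())
print(graph.G)   # adjacency structure of the learned causal graph
```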

Scaling Data Generation in Vision-and-Language Navigation

  • paper_url: http://arxiv.org/abs/2307.15644
  • repo_url: https://github.com/wz0919/scalevln
  • paper_authors: Zun Wang, Jialu Li, Yicong Hong, Yi Wang, Qi Wu, Mohit Bansal, Stephen Gould, Hao Tan, Yu Qiao
  • for: Improving the performance of language-guided navigation agents in generalizing to unseen environments.
  • methods: Generates large-scale training data from 1200+ photo-realistic environments in the HM3D and Gibson datasets using fully accessible web resources, and uses this data to pre-train and fine-tune agents.
  • results: The augmented data lifts an existing agent's performance (+11% absolute over the previous SoTA) to a new best of 80% single-run success rate on the R2R test split and reduces the generalization gap between seen and unseen environments to less than 1% (versus 8% previously). The paradigm also enables different models to achieve new state-of-the-art navigation results on CVDN, REVERIE, and R2R in continuous environments.
    Abstract Recent research in language-guided visual navigation has demonstrated a significant demand for the diversity of traversable environments and the quantity of supervision for training generalizable agents. To tackle the common data scarcity issue in existing vision-and-language navigation datasets, we propose an effective paradigm for generating large-scale data for learning, which applies 1200+ photo-realistic environments from HM3D and Gibson datasets and synthesizes 4.9 million instruction trajectory pairs using fully-accessible resources on the web. Importantly, we investigate the influence of each component in this paradigm on the agent's performance and study how to adequately apply the augmented data to pre-train and fine-tune an agent. Thanks to our large-scale dataset, the performance of an existing agent can be pushed up (+11% absolute with regard to previous SoTA) to a significantly new best of 80% single-run success rate on the R2R test split by simple imitation learning. The long-lasting generalization gap between navigating in seen and unseen environments is also reduced to less than 1% (versus 8% in the previous best method). Moreover, our paradigm also facilitates different models to achieve new state-of-the-art navigation results on CVDN, REVERIE, and R2R in continuous environments.

cs.CL - 2023-07-29

Towards Codable Text Watermarking for Large Language Models

  • paper_url: http://arxiv.org/abs/2307.15992
  • repo_url: https://github.com/lancopku/codable-watermarking-for-llm
  • paper_authors: Lean Wang, Wenkai Yang, Deli Chen, Hao Zhou, Yankai Lin, Fandong Meng, Jie Zhou, Xu Sun
  • for: Preventing the abuse of large language models (LLMs) by injecting hidden patterns into generated text so that it can be identified as LLM-generated.
  • methods: Uses text watermarking techniques that embed hidden patterns into LLM-generated text to identify its source.
  • results: Conducts the first systematic study of Codable Text Watermarking for LLMs (CTWL), which allows watermarks to carry more customizable information and to meet the diverse encoding needs of different LLM application scenarios.
    Abstract As large language models (LLMs) generate texts with increasing fluency and realism, there is a growing need to identify the source of texts to prevent the abuse of LLMs. Text watermarking techniques have proven reliable in distinguishing whether a text is generated by LLMs by injecting hidden patterns into the generated texts. However, we argue that existing watermarking methods for LLMs are encoding-inefficient (only contain one bit of information - whether it is generated from an LLM or not) and cannot flexibly meet the diverse information encoding needs (such as encoding model version, generation time, user id, etc.) in different LLMs application scenarios. In this work, we conduct the first systematic study on the topic of Codable Text Watermarking for LLMs (CTWL) that allows text watermarks to carry more customizable information. First of all, we study the taxonomy of LLM watermarking technology and give a mathematical formulation for CTWL. Additionally, we provide a comprehensive evaluation system for CTWL: (1) watermarking success rate, (2) robustness against various corruptions, (3) coding rate of payload information, (4) encoding and decoding efficiency, (5) impacts on the quality of the generated text. To meet the requirements of these non-Pareto-improving metrics, we devise a CTWL method named Balance-Marking, based on the motivation of ensuring that available and unavailable vocabularies for encoding information have approximately equivalent probabilities. Compared to the random vocabulary partitioning extended from the existing work, a probability-balanced vocabulary partition can significantly improve the quality of the generated text. Extensive experimental results have shown that our method outperforms a direct baseline under comprehensive evaluation.
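
A simplified sketch of the probability-balanced vocabulary partition idea behind Balance-Marking: at each decoding step the vocabulary is split into two halves with roughly equal probability mass under a proxy model, and the half encoding the current payload bit gets a logit boost. The greedy partition and the fixed boost value are illustrative assumptions.

```python
import numpy as np

def balanced_partition(probs):
    """Greedily split token ids into two sets with ~equal total probability."""
    order = np.argsort(-probs)                 # most probable tokens first
    totals, sets = [0.0, 0.0], [[], []]
    for tok in order:
        side = int(totals[1] < totals[0])      # add to the lighter side
        sets[side].append(int(tok))
        totals[side] += probs[tok]
    return sets

def embed_bit(logits, proxy_probs, bit, boost=2.5):
    """Bias next-token logits toward the vocabulary half encoding `bit`."""
    sets = balanced_partition(proxy_probs)
    biased = logits.copy()
    biased[sets[bit]] += boost
    return biased

rng = np.random.default_rng(0)
vocab = 1000
logits = rng.normal(size=vocab)                               # LLM next-token logits
proxy = np.exp(rng.normal(size=vocab)); proxy /= proxy.sum()  # proxy LM probabilities
payload_bit = 1
chosen = int(np.argmax(embed_bit(logits, proxy, payload_bit)))
# Detection recomputes the same partition from the proxy and reads the bit back.
print("decoded bit:", int(chosen in set(balanced_partition(proxy)[1])))
```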

GeneMask: Fast Pretraining of Gene Sequences to Enable Few-Shot Learning

  • paper_url: http://arxiv.org/abs/2307.15933
  • repo_url: https://github.com/roysoumya/genemask
  • paper_authors: Soumyadeep Roy, Jonas Wallat, Sowmya S Sundaram, Wolfgang Nejdl, Niloy Ganguly
  • for: Learning optimal gene representations for gene sequence classification.
  • methods: Proposes a novel masking algorithm, GeneMask, for masked language modeling (MLM) training on gene sequences.
  • results: On four gene sequence classification datasets, GeneMask-based models significantly outperform the SOTA models (DNABert and LOGO) in five few-shot settings (10 to 1000-shot), while the GeneMask-based DNABert is trained for less than one-tenth of the epochs of the original SOTA model. A strong correlation between top-ranked PMI tokens and conserved DNA sequence motifs may indicate the incorporation of latent genomic information.
    Abstract Large-scale language models such as DNABert and LOGO aim to learn optimal gene representations and are trained on the entire Human Reference Genome. However, standard tokenization schemes involve a simple sliding window of tokens like k-mers that do not leverage any gene-based semantics and thus may lead to (trivial) masking of easily predictable sequences and subsequently inefficient Masked Language Modeling (MLM) training. Therefore, we propose a novel masking algorithm, GeneMask, for MLM training of gene sequences, where we randomly identify positions in a gene sequence as mask centers and locally select the span around the mask center with the highest Normalized Pointwise Mutual Information (NPMI) to mask. We observe that in the absence of human-understandable semantics in the genomics domain (in contrast, semantic units like words and phrases are inherently available in NLP), GeneMask-based models substantially outperform the SOTA models (DNABert and LOGO) over four benchmark gene sequence classification datasets in five few-shot settings (10 to 1000-shot). More significantly, the GeneMask-based DNABert model is trained for less than one-tenth of the number of epochs of the original SOTA model. We also observe a strong correlation between top-ranked PMI tokens and conserved DNA sequence motifs, which may indicate the incorporation of latent genomic information. The codes (including trained models) and datasets are made publicly available at https://github.com/roysoumya/GeneMask.
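
A simplified sketch of NPMI-guided span selection over k-mer tokens: estimate NPMI for adjacent token pairs from a corpus, pick a random mask center, and keep the surrounding span with the highest average NPMI. The toy corpus, 2-mer tokens, and span size are assumptions for illustration.

```python
import math
import random
from collections import Counter

def npmi_table(sequences):
    """NPMI for adjacent token pairs estimated from a token corpus."""
    uni, bi, total_bi = Counter(), Counter(), 0
    for seq in sequences:
        uni.update(seq)
        bi.update(zip(seq, seq[1:]))
        total_bi += max(len(seq) - 1, 0)
    total_uni = sum(uni.values())
    table = {}
    for (a, b), c in bi.items():
        p_ab = c / total_bi
        p_a, p_b = uni[a] / total_uni, uni[b] / total_uni
        pmi = math.log(p_ab / (p_a * p_b))
        table[(a, b)] = pmi / -math.log(p_ab)          # normalize to [-1, 1]
    return table

def select_mask_span(tokens, npmi, span=5):
    """Pick a random mask center, then the surrounding span with highest mean NPMI."""
    center = random.randrange(len(tokens))
    candidates = []
    for start in range(max(0, center - span + 1), min(center + 1, len(tokens) - span + 1)):
        window = tokens[start:start + span]
        score = sum(npmi.get(p, -1.0) for p in zip(window, window[1:])) / (span - 1)
        candidates.append((score, start))
    _, start = max(candidates)
    return list(range(start, start + span))

random.seed(0)
# 2-mer tokens stand in for the k-mer tokenization of real gene sequences.
corpus = [[random.choice("ACGT") + random.choice("ACGT") for _ in range(200)] for _ in range(50)]
table = npmi_table(corpus)
print(select_mask_span(corpus[0], table))   # token indices to mask during MLM training
```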

Analysing the Resourcefulness of the Paragraph for Precedence Retrieval

  • paper_url: http://arxiv.org/abs/2308.01203
  • repo_url: https://github.com/bhoomeendra/paragraph_resourcefulness
  • paper_authors: Bhoomeendra Singh Sisodiya, Narendra Babu Unnam, P. Krishna Reddy, Apala Das, K. V. K. Santhy, V. Balakista Reddy
  • for: aid legal practitioners in retrieving relevant legal information
  • methods: analyzed the resourcefulness of paragraph-level information in capturing similarity among judgments
  • results: found that paragraph-level methods could capture similarity with only a few paragraph interactions and exhibit more discriminating power over the baseline document-level method, with comparable performance to state-of-the-art methods.
    Abstract Developing methods for extracting relevant legal information to aid legal practitioners is an active research area. In this regard, research efforts are being made by leveraging different kinds of information, such as meta-data, citations, keywords, sentences, paragraphs, etc. Similar to any text document, legal documents are composed of paragraphs. In this paper, we have analyzed the resourcefulness of paragraph-level information in capturing similarity among judgments for improving the performance of precedence retrieval. We found that the paragraph-level methods could capture the similarity among the judgments with only a few paragraph interactions and exhibit more discriminating power over the baseline document-level method. Moreover, the comparison results on two benchmark datasets for the precedence retrieval on the Indian supreme court judgments task show that the paragraph-level methods exhibit comparable performance with the state-of-the-art methods
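
A minimal sketch of paragraph-level similarity between two judgments: TF-IDF vectors per paragraph, a cosine-similarity matrix of paragraph interactions, and an aggregate score from the strongest few interactions. The mean-of-top-k aggregation rule is an assumption for illustration.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def paragraph_similarity(judgment_a, judgment_b, top_k=3):
    """Score similarity of two judgments from their paragraph-level interactions."""
    paras_a = [p for p in judgment_a.split("\n\n") if p.strip()]
    paras_b = [p for p in judgment_b.split("\n\n") if p.strip()]
    vec = TfidfVectorizer(stop_words="english")
    mat = vec.fit_transform(paras_a + paras_b)
    sims = cosine_similarity(mat[:len(paras_a)], mat[len(paras_a):])
    # Only a few strong paragraph interactions are needed to signal relevance.
    top = np.sort(sims.ravel())[::-1][:top_k]
    return float(top.mean())

doc1 = "The appellant challenges the land acquisition.\n\nCompensation must follow market value."
doc2 = "Market value governs compensation in acquisition cases.\n\nThe procedural objection is dismissed."
print(round(paragraph_similarity(doc1, doc2), 3))
```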

Dialogue Shaping: Empowering Agents through NPC Interaction

  • paper_url: http://arxiv.org/abs/2307.15833
  • repo_url: None
  • paper_authors: Wei Zhou, Xiangyu Peng, Mark Riedl
  • for: Helping RL agents in text-based game environments, where the action space is extensive, converge to an optimal policy with fewer training steps.
  • methods: Uses large language models (LLMs) to interact and converse with NPC agents to obtain key information, then incorporates this information into the RL agent's training using knowledge graphs (KGs) and Story Shaping.
  • results: Shows that key information obtained by conversing with NPCs helps the RL agent converge to the optimal policy faster, improving training efficiency.
    Abstract One major challenge in reinforcement learning (RL) is the large amount of steps for the RL agent needs to converge in the training process and learn the optimal policy, especially in text-based game environments where the action space is extensive. However, non-player characters (NPCs) sometimes hold some key information about the game, which can potentially help to train RL agents faster. Thus, this paper explores how to interact and converse with NPC agents to get the key information using large language models (LLMs), as well as incorporate this information to speed up RL agent's training using knowledge graphs (KGs) and Story Shaping.

RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control

  • paper_url: http://arxiv.org/abs/2307.15818
  • repo_url: None
  • paper_authors: Anthony Brohan, Noah Brown, Justice Carbajal, Yevgen Chebotar, Xi Chen, Krzysztof Choromanski, Tianli Ding, Danny Driess, Avinava Dubey, Chelsea Finn, Pete Florence, Chuyuan Fu, Montse Gonzalez Arenas, Keerthana Gopalakrishnan, Kehang Han, Karol Hausman, Alexander Herzog, Jasmine Hsu, Brian Ichter, Alex Irpan, Nikhil Joshi, Ryan Julian, Dmitry Kalashnikov, Yuheng Kuang, Isabel Leal, Lisa Lee, Tsang-Wei Edward Lee, Sergey Levine, Yao Lu, Henryk Michalewski, Igor Mordatch, Karl Pertsch, Kanishka Rao, Krista Reymann, Michael Ryoo, Grecia Salazar, Pannag Sanketi, Pierre Sermanet, Jaspiar Singh, Anikait Singh, Radu Soricut, Huong Tran, Vincent Vanhoucke, Quan Vuong, Ayzaan Wahid, Stefan Welker, Paul Wohlhart, Jialin Wu, Fei Xia, Ted Xiao, Peng Xu, Sichun Xu, Tianhe Yu, Brianna Zitkovich
  • for: Incorporating vision-language models directly into end-to-end robotic control to boost generalization and enable emergent semantic reasoning.
  • methods: Proposes a simple, general recipe: express actions as text tokens and incorporate them directly into the model's training set in the same way as natural language tokens.
  • results: The approach yields performant robotic policies with strong generalization and a range of emergent capabilities from Internet-scale training, including generalization to novel objects, interpreting commands not present in the robot training data, and multi-stage semantic reasoning.
    Abstract We study how vision-language models trained on Internet-scale data can be incorporated directly into end-to-end robotic control to boost generalization and enable emergent semantic reasoning. Our goal is to enable a single end-to-end trained model to both learn to map robot observations to actions and enjoy the benefits of large-scale pretraining on language and vision-language data from the web. To this end, we propose to co-fine-tune state-of-the-art vision-language models on both robotic trajectory data and Internet-scale vision-language tasks, such as visual question answering. In contrast to other approaches, we propose a simple, general recipe to achieve this goal: in order to fit both natural language responses and robotic actions into the same format, we express the actions as text tokens and incorporate them directly into the training set of the model in the same way as natural language tokens. We refer to such category of models as vision-language-action models (VLA) and instantiate an example of such a model, which we call RT-2. Our extensive evaluation (6k evaluation trials) shows that our approach leads to performant robotic policies and enables RT-2 to obtain a range of emergent capabilities from Internet-scale training. This includes significantly improved generalization to novel objects, the ability to interpret commands not present in the robot training data (such as placing an object onto a particular number or icon), and the ability to perform rudimentary reasoning in response to user commands (such as picking up the smallest or largest object, or the one closest to another object). We further show that incorporating chain of thought reasoning allows RT-2 to perform multi-stage semantic reasoning, for example figuring out which object to pick up for use as an improvised hammer (a rock), or which type of drink is best suited for someone who is tired (an energy drink).
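
A minimal sketch of the "actions as text tokens" recipe: discretize each continuous action dimension into 256 bins and render the result as a token string that can live in the same training set as natural-language targets. The bin count and normalized action bounds are illustrative assumptions.

```python
import numpy as np

BINS = 256
LOW, HIGH = -1.0, 1.0   # assumed normalized action bounds

def action_to_tokens(action):
    """Map a continuous action vector to a space-separated string of integer tokens."""
    clipped = np.clip(action, LOW, HIGH)
    ids = np.round((clipped - LOW) / (HIGH - LOW) * (BINS - 1)).astype(int)
    return " ".join(str(i) for i in ids)

def tokens_to_action(text):
    """Invert the discretization back to (approximate) continuous values."""
    ids = np.array([int(t) for t in text.split()], dtype=float)
    return ids / (BINS - 1) * (HIGH - LOW) + LOW

action = np.array([0.12, -0.53, 0.98, 0.0, -1.0, 0.25, 1.0])  # e.g. 6-DoF delta + gripper
text = action_to_tokens(action)
print(text)                    # a plain string of discretized action tokens
print(tokens_to_action(text))  # approximately recovers the original action
```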

Resume Evaluation through Latent Dirichlet Allocation and Natural Language Processing for Effective Candidate Selection

  • paper_url: http://arxiv.org/abs/2307.15752
  • repo_url: None
  • paper_authors: Vidhita Jagwani, Smit Meghani, Krishna Pai, Sudhir Dhage
  • for: Proposing a resume rating method based on Latent Dirichlet Allocation (LDA) and entity detection with spaCy, so that scores are more content-driven.
  • methods: First extracts relevant entities such as education, experience, and skills from the resume using spaCy's Named Entity Recognition (NER); the LDA model then assigns topic probabilities to these entities to rate the resume.
  • results: Using LDA, the proposed system breaks resumes down into latent topics and extracts meaningful semantic representations, achieving 77% accuracy with only skills in consideration and an overall 82% accuracy with all attributes (college name, work experience, degree, and skills) in consideration.
    Abstract In this paper, we propose a method for resume rating using Latent Dirichlet Allocation (LDA) and entity detection with SpaCy. The proposed method first extracts relevant entities such as education, experience, and skills from the resume using SpaCy's Named Entity Recognition (NER). The LDA model then uses these entities to rate the resume by assigning topic probabilities to each entity. Furthermore, we conduct a detailed analysis of the entity detection using SpaCy's NER and report its evaluation metrics. Using LDA, our proposed system breaks down resumes into latent topics and extracts meaningful semantic representations. With a vision to define our resume score to be more content-driven rather than a structure and keyword match driven, our model has achieved 77% accuracy with respect to only skills in consideration and an overall 82% accuracy with all attributes in consideration. (like college name, work experience, degree and skills)
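
A minimal sketch combining spaCy NER with a gensim LDA model over extracted resume entities; the entity labels kept, the number of topics, and the toy resumes are assumptions, and turning topic probabilities into a final score is left out.

```python
import spacy
from gensim import corpora, models

nlp = spacy.load("en_core_web_sm")   # assumes the small English model is installed

def extract_entities(text):
    """Keep entity texts that typically matter for resumes (orgs, dates, places, people)."""
    doc = nlp(text)
    keep = {"ORG", "DATE", "GPE", "PERSON"}
    return [ent.text.lower() for ent in doc.ents if ent.label_ in keep]

resumes = [
    "B.Tech from IIT Bombay, 2019. Software engineer at Google for 3 years. Python, SQL.",
    "MBA from Harvard Business School. Marketing lead at Unilever since 2020.",
]
docs = [extract_entities(r) for r in resumes]

dictionary = corpora.Dictionary(docs)
corpus = [dictionary.doc2bow(d) for d in docs]
lda = models.LdaModel(corpus, num_topics=2, id2word=dictionary, passes=10, random_state=0)

# Topic probabilities per resume act as the content-driven signal for rating.
for i, bow in enumerate(corpus):
    print(f"resume {i}:", lda.get_document_topics(bow, minimum_probability=0.0))
```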

Context-VQA: Towards Context-Aware and Purposeful Visual Question Answering

  • paper_url: http://arxiv.org/abs/2307.15745
  • repo_url: None
  • paper_authors: Nandita Naik, Christopher Potts, Elisa Kreiss
  • for: Making the Internet more accessible in an interactive way, allowing people who cannot see images to ask questions about them.
  • methods: Introduces the Context-VQA dataset, which pairs images with contexts, specifically types of websites (e.g., a shopping website).
  • results: Finds that question types vary systematically across contexts: images presented in a travel context garner 2 times more "Where?" questions, and images on social media and news garner 2.8 and 1.8 times more "Who?" questions than the average. Context effects are especially important when participants cannot see the image.
    Abstract Visual question answering (VQA) has the potential to make the Internet more accessible in an interactive way, allowing people who cannot see images to ask questions about them. However, multiple studies have shown that people who are blind or have low-vision prefer image explanations that incorporate the context in which an image appears, yet current VQA datasets focus on images in isolation. We argue that VQA models will not fully succeed at meeting people's needs unless they take context into account. To further motivate and analyze the distinction between different contexts, we introduce Context-VQA, a VQA dataset that pairs images with contexts, specifically types of websites (e.g., a shopping website). We find that the types of questions vary systematically across contexts. For example, images presented in a travel context garner 2 times more "Where?" questions, and images on social media and news garner 2.8 and 1.8 times more "Who?" questions than the average. We also find that context effects are especially important when participants can't see the image. These results demonstrate that context affects the types of questions asked and that VQA models should be context-sensitive to better meet people's needs, especially in accessibility settings.

cs.LG - 2023-07-29

Monaural Multi-Speaker Speech Separation Using Efficient Transformer Model

  • paper_url: http://arxiv.org/abs/2308.00010
  • repo_url: None
  • paper_authors: S. Rijal, R. Neupane, S. P. Mainali, S. K. Regmi, S. Maharjan
  • for: Addressing the cocktail party problem, i.e., the difficulty of separating or distinguishing individual speakers from a mixture of several speakers.
  • methods: Builds a monaural multi-speaker speech separation model based on the Transformer architecture and its efficient forms, trained on the LibriMix dataset to separate two distinct speaker sources from a mixed audio input.
  • results: The model reduces the computational complexity of speech separation with minimal trade-off against the performance of prevalent separation models, and is expected to contribute to ongoing speech-separation research with computational efficiency at its core.
    Abstract The cocktail party problem is the scenario in which it is difficult to separate or distinguish an individual speaker from speech mixed from several speakers. There has been considerable research in this field, but model size and complexity are typically traded off against the accuracy and robustness of speech separation. "Monaural multi-speaker speech separation" presents a speech-separation model based on the Transformer architecture and its efficient forms. The model has been trained with the LibriMix dataset containing diverse speakers' utterances and separates 2 distinct speaker sources from a mixed audio input. The developed model reduces the computational complexity of speech separation with minimal trade-off against the performance of prevalent speech-separation models, showing significant progress towards that goal. This project foresees a rise in contributions towards the ongoing research in the field of speech separation with computational efficiency at its core.

A 3D deep learning classifier and its explainability when assessing coronary artery disease

  • paper_url: http://arxiv.org/abs/2308.00009
  • repo_url: None
  • paper_authors: Wing Keung Cheung, Jeremy Kalindjian, Robert Bell, Arjun Nair, Leon J. Menezes, Riyaz Patel, Simon Wan, Kacy Chou, Jiahang Chen, Ryo Torii, Rhodri H. Davies, James C. Moon, Daniel C. Alexander, Joseph Jacob
  • for: Early detection and diagnosis of coronary artery disease (CAD) to save lives and reduce healthcare costs.
  • methods: A 3D ResNet-50 deep learning model directly classifies normal subjects and CAD patients on computed tomography coronary angiography images.
  • results: Outperforms a 2D ResNet-50 model by 23.65% in accuracy and provides explainability via Grad-GAM; the 3D CAD classification is further linked to a 2D two-class semantic segmentation for improved explainability and accurate abnormality localisation.
    Abstract Early detection and diagnosis of coronary artery disease (CAD) could save lives and reduce healthcare costs. In this study, we propose a 3D Resnet-50 deep learning model to directly classify normal subjects and CAD patients on computed tomography coronary angiography images. Our proposed method outperforms a 2D Resnet-50 model by 23.65%. Explainability is also provided by using a Grad-GAM. Furthermore, we link the 3D CAD classification to a 2D two-class semantic segmentation for improved explainability and accurate abnormality localisation.

A data-centric deep learning approach to airway segmentation

  • paper_url: http://arxiv.org/abs/2308.00008
  • repo_url: None
  • paper_authors: Wing Keung Cheung, Ashkan Pakzad, Nesrin Mogulkoc, Sarah Needleman, Bojidar Rangelov, Eyjolfur Gudmundsson, An Zhao, Mariam Abbas, Davina McLaverty, Dimitrios Asimakopoulos, Robert Chapman, Recep Savas, Sam M Janes, Yipeng Hu, Daniel C. Alexander, John R Hurst, Joseph Jacob
  • for: Characterising the morphology and distribution of airway tree abnormalities across chronic respiratory conditions, enabling estimation of disease extent and severity.
  • methods: Proposes a data-centric deep learning technique for segmenting the airway tree that uses interpolation and image splitting to improve data usefulness and quality, followed by an ensemble learning strategy that aggregates airway trees segmented at different scales.
  • results: With a combined loss, the method outperforms the baseline model by 2.5% on average in segmentation performance (Dice similarity coefficient), while having low GPU usage and high flexibility, so it can be deployed on any 2D deep learning model.
    Abstract The morphology and distribution of airway tree abnormalities enables diagnosis and disease characterisation across a variety of chronic respiratory conditions. In this regard, airway segmentation plays a critical role in the production of the outline of the entire airway tree to enable estimation of disease extent and severity. In this study, we propose a data-centric deep learning technique to segment the airway tree. The proposed technique utilises interpolation and image split to improve data usefulness and quality. Then, an ensemble learning strategy is implemented to aggregate the segmented airway trees at different scales. In terms of segmentation performance (dice similarity coefficient), our method outperforms the baseline model by 2.5% on average when a combined loss is used. Further, our proposed technique has a low GPU usage and high flexibility enabling it to be deployed on any 2D deep learning model.

UPFL: Unsupervised Personalized Federated Learning towards New Clients

  • paper_url: http://arxiv.org/abs/2307.15994
  • repo_url: None
  • paper_authors: Tiandi Ye, Cen Chen, Yinggui Wang, Xiang Li, Ming Gao
  • for: Targets the problem of providing personalized models for new, unlabeled clients that join after a federated model has already been trained and deployed.
  • methods: Extends adaptive risk minimization to the unsupervised personalized federated learning setting, proposing FedTTA together with two optimization strategies (proxy regularization and entropy-based early stopping of adaptation), plus a knowledge distillation loss specifically designed to address device heterogeneity.
  • results: Extensive experiments on five datasets against eleven baselines demonstrate the effectiveness of FedTTA and its variants. Code is available at: https://github.com/anonymous-federated-learning/code.
    Abstract Personalized federated learning has gained significant attention as a promising approach to address the challenge of data heterogeneity. In this paper, we address a relatively unexplored problem in federated learning. When a federated model has been trained and deployed, and an unlabeled new client joins, providing a personalized model for the new client becomes a highly challenging task. To address this challenge, we extend the adaptive risk minimization technique into the unsupervised personalized federated learning setting and propose our method, FedTTA. We further improve FedTTA with two simple yet effective optimization strategies: enhancing the training of the adaptation model with proxy regularization and early-stopping the adaptation through entropy. Moreover, we propose a knowledge distillation loss specifically designed for FedTTA to address the device heterogeneity. Extensive experiments on five datasets against eleven baselines demonstrate the effectiveness of our proposed FedTTA and its variants. The code is available at: https://github.com/anonymous-federated-learning/code.
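
A simplified sketch of the flavor of test-time adaptation used for new clients: adapt a copy of the global model on unlabeled client data by minimizing prediction entropy, regularize toward the global parameters as a stand-in for proxy regularization, and early-stop when entropy stops improving. This is a sketch under those assumptions, not the exact FedTTA algorithm.

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

def adapt_to_new_client(global_model, unlabeled_batches, lr=1e-3, reg_weight=0.1, patience=5):
    """unlabeled_batches: an iterable of input tensors from the new client."""
    model = copy.deepcopy(global_model)
    global_params = [p.detach().clone() for p in global_model.parameters()]
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    best, bad = float("inf"), 0
    for x in unlabeled_batches:
        probs = F.softmax(model(x), dim=-1)
        entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=-1).mean()
        # Keep the personalized model close to the global one (proxy-style regularizer).
        reg = sum(((p - g) ** 2).sum() for p, g in zip(model.parameters(), global_params))
        loss = entropy + reg_weight * reg
        opt.zero_grad()
        loss.backward()
        opt.step()
        # Entropy-based early stopping: halt once adaptation stops helping.
        if entropy.item() < best - 1e-4:
            best, bad = entropy.item(), 0
        else:
            bad += 1
            if bad >= patience:
                break
    return model

# Hypothetical usage with a tiny model and random unlabeled data.
global_model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 5))
batches = [torch.randn(32, 20) for _ in range(100)]
personalized = adapt_to_new_client(global_model, batches)
```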

Feature Reweighting for EEG-based Motor Imagery Classification

  • paper_url: http://arxiv.org/abs/2308.02515
  • repo_url: None
  • paper_authors: Taveena Lotey, Prateek Keserwani, Debi Prosad Dogra, Partha Pratim Roy
  • for: Classifying motor imagery (MI) from non-invasive electroencephalographic (EEG) signals to predict a subject's intended limb movements.
  • methods: Uses convolutional neural network (CNN) methods for MI-EEG classification and proposes a feature reweighting module with a noise-reduction mechanism that suppresses irrelevant temporal and channel feature maps during training.
  • results: The proposed method improves the classification of MI-EEG signals on the Physionet EEG-MMIDB and BCI Competition IV 2a datasets by margins of 9.34% and 3.82%, respectively, over existing methods.
    Abstract Classification of motor imagery (MI) using non-invasive electroencephalographic (EEG) signals is a critical objective as it is used to predict the intention of limb movements of a subject. In recent research, convolutional neural network (CNN) based methods have been widely utilized for MI-EEG classification. The challenges of training neural networks for MI-EEG signals classification include low signal-to-noise ratio, non-stationarity, non-linearity, and high complexity of EEG signals. The features computed by CNN-based networks on the highly noisy MI-EEG signals contain irrelevant information. Subsequently, the feature maps of the CNN-based network computed from the noisy and irrelevant features contain irrelevant information. Thus, many non-contributing features often mislead the neural network training and degrade the classification performance. Hence, a novel feature reweighting approach is proposed to address this issue. The proposed method gives a noise reduction mechanism named feature reweighting module that suppresses irrelevant temporal and channel feature maps. The feature reweighting module of the proposed method generates scores that reweight the feature maps to reduce the impact of irrelevant information. Experimental results show that the proposed method significantly improved the classification of MI-EEG signals of Physionet EEG-MMIDB and BCI Competition IV 2a datasets by a margin of 9.34% and 3.82%, respectively, compared to the state-of-the-art methods.
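
A minimal PyTorch sketch of a feature reweighting module in the spirit described above: learned scores for the channel and temporal dimensions of a CNN feature map down-weight irrelevant slices. The scoring layers and shapes are assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class FeatureReweighting(nn.Module):
    """Reweight CNN feature maps of shape (batch, channels, time)
    with learned channel-wise and temporal scores."""

    def __init__(self, channels, time_steps):
        super().__init__()
        self.channel_score = nn.Sequential(nn.Linear(time_steps, 1), nn.Sigmoid())
        self.temporal_score = nn.Sequential(nn.Linear(channels, 1), nn.Sigmoid())

    def forward(self, x):                                        # x: (B, C, T)
        ch = self.channel_score(x).squeeze(-1)                   # (B, C): score per channel
        tm = self.temporal_score(x.transpose(1, 2)).squeeze(-1)  # (B, T): score per time step
        return x * ch.unsqueeze(-1) * tm.unsqueeze(1)            # suppress irrelevant maps

features = torch.randn(8, 32, 128)   # 8 trials, 32 feature channels, 128 time steps
module = FeatureReweighting(channels=32, time_steps=128)
print(module(features).shape)        # torch.Size([8, 32, 128])
```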

RGB-D-Fusion: Image Conditioned Depth Diffusion of Humanoid Subjects

  • paper_url: http://arxiv.org/abs/2307.15988
  • repo_url: None
  • paper_authors: Sascha Kirch, Valeria Olyunina, Jan Ondřej, Rafael Pagés, Sergio Martin, Clara Pérez-Molina
  • for: Generating high-resolution depth maps from low-resolution monocular RGB images of humanoid subjects.
  • methods: First generates a low-resolution depth map with an image-conditioned denoising diffusion probabilistic model, then upsamples it with a second denoising diffusion probabilistic model conditioned on a low-resolution RGB-D image.
  • results: Presents a multi-modal conditional denoising diffusion probabilistic model that efficiently produces high-resolution depth maps, along with a novel depth noise augmentation technique that increases the robustness of the super-resolution model.
    Abstract We present RGB-D-Fusion, a multi-modal conditional denoising diffusion probabilistic model to generate high resolution depth maps from low-resolution monocular RGB images of humanoid subjects. RGB-D-Fusion first generates a low-resolution depth map using an image conditioned denoising diffusion probabilistic model and then upsamples the depth map using a second denoising diffusion probabilistic model conditioned on a low-resolution RGB-D image. We further introduce a novel augmentation technique, depth noise augmentation, to increase the robustness of our super-resolution model.

Vehicle Price Prediction By Aggregating decision tree model With Boosting Model

  • paper_url: http://arxiv.org/abs/2307.15982
  • repo_url: None
  • paper_authors: Auwal Tijjani Amshi
  • for: Predicting the price of used vehicles, an interesting and much-needed problem, since accurate prediction requires many attributes to be taken into account.
  • methods: Python scripts normalize, standardize, and clean the data to avoid unnecessary noise for the machine learning algorithms.
  • results: A decision tree model and a gradient boosting predictive model are combined to get closer to an accurate prediction; the combined model shows promising performance, and future work on the same dataset can explore different prediction techniques.
    Abstract Predicting the price of used vehicles is an interesting problem needed by many users. Vehicle price prediction can be a challenging task due to the high number of attributes that should be considered for accurate prediction. The major step in the prediction process is the collection and pre-processing of the data. In this project, Python scripts were built to normalize, standardize, and clean data to avoid unnecessary noise for machine learning algorithms. The data set used in this project can be very valuable in conducting similar research using different prediction techniques. Many assumptions were made on the basis of the data set. The proposed system uses a decision tree model and a gradient boosting predictive model, which are combined in order to get close to an accurate prediction; the proposed model was evaluated and gives promising performance. Future price prediction of used vehicles with the help of the same data set will comprise different models.
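
A minimal scikit-learn sketch of aggregating a decision tree with a gradient boosting regressor for price prediction; the toy features and the simple prediction-averaging ensemble are assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingRegressor, VotingRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeRegressor

# Toy data standing in for a cleaned used-vehicle dataset.
df = pd.DataFrame({
    "make": ["toyota", "honda", "toyota", "ford", "honda", "ford"] * 20,
    "year": np.tile([2015, 2018, 2012, 2020, 2016, 2010], 20),
    "mileage_km": np.tile([80000, 40000, 150000, 15000, 70000, 190000], 20),
})
price = (30000 - 0.08 * df["mileage_km"] + 500 * (df["year"] - 2010)
         + np.random.default_rng(0).normal(0, 500, len(df)))

prep = ColumnTransformer(
    [("cat", OneHotEncoder(handle_unknown="ignore"), ["make"])],
    remainder="passthrough",
)
# Aggregate a single decision tree with a boosting model by averaging predictions.
ensemble = VotingRegressor([
    ("tree", DecisionTreeRegressor(max_depth=6, random_state=0)),
    ("boost", GradientBoostingRegressor(random_state=0)),
])
model = make_pipeline(prep, ensemble)
model.fit(df, price)
print(model.predict(df.head(3)).round(0))
```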

Initial State Interventions for Deconfounded Imitation Learning

  • paper_url: http://arxiv.org/abs/2307.15980
  • repo_url: None
  • paper_authors: Samuel Pfrommer, Yatong Bai, Hyunin Lee, Somayeh Sojoudi
  • for: Addressing causal confusion in imitation learning, where learned policies attend to features that do not causally influence expert actions, yielding low open-loop supervised loss but poor closed-loop performance upon deployment.
  • methods: Proposes a novel masking algorithm that masks observed confounders in a disentangled representation of the observation space by intervening on the initial system state; it requires no expert querying, expert reward functions, or causal graph specification. Under certain assumptions, the algorithm is theoretically proven to be conservative, i.e., it does not incorrectly mask observations that causally influence the expert.
  • results: Applies the masking algorithm to behavior cloning on two illustrative control systems, CartPole and Reacher, demonstrating in practice that it mitigates causal confusion.
    Abstract Imitation learning suffers from causal confusion. This phenomenon occurs when learned policies attend to features that do not causally influence the expert actions but are instead spuriously correlated. Causally confused agents produce low open-loop supervised loss but poor closed-loop performance upon deployment. We consider the problem of masking observed confounders in a disentangled representation of the observation space. Our novel masking algorithm leverages the usual ability to intervene in the initial system state, avoiding any requirement involving expert querying, expert reward functions, or causal graph specification. Under certain assumptions, we theoretically prove that this algorithm is conservative in the sense that it does not incorrectly mask observations that causally influence the expert; furthermore, intervening on the initial state serves to strictly reduce excess conservatism. The masking algorithm is applied to behavior cloning for two illustrative control systems: CartPole and Reacher.

Blockchain-empowered Federated Learning for Healthcare Metaverses: User-centric Incentive Mechanism with Optimal Data Freshness

  • paper_url: http://arxiv.org/abs/2307.15975
  • repo_url: None
  • paper_authors: Jiawen Kang, Jinbo Wen, Dongdong Ye, Bingkun Lai, Tianhao Wu, Zehui Xiong, Jiangtian Nie, Dusit Niyato, Yang Zhang, Shengli Xie
  • for: Developing a user-centric privacy-preserving framework for healthcare metaverses that improves security and data freshness.
  • methods: Proposes a user-centric privacy-preserving framework based on decentralized Federated Learning (FL), extended with a cross-chain empowered FL architecture (a main chain with multiple subchains) to enhance the security of sensing data.
  • results: Numerical results show that the proposed schemes effectively protect sensing data in healthcare metaverses and improve the incentives for service providers to share data.
    Abstract Given the revolutionary role of metaverses, healthcare metaverses are emerging as a transformative force, creating intelligent healthcare systems that offer immersive and personalized services. The healthcare metaverses allow for effective decision-making and data analytics for users. However, there still exist critical challenges in building healthcare metaverses, such as the risk of sensitive data leakage and issues with sensing data security and freshness, as well as concerns around incentivizing data sharing. In this paper, we first design a user-centric privacy-preserving framework based on decentralized Federated Learning (FL) for healthcare metaverses. To further improve the privacy protection of healthcare metaverses, a cross-chain empowered FL framework is utilized to enhance sensing data security. This framework utilizes a hierarchical cross-chain architecture with a main chain and multiple subchains to perform decentralized, privacy-preserving, and secure data training in both virtual and physical spaces. Moreover, we utilize Age of Information (AoI) as an effective data-freshness metric and propose an AoI-based contract theory model under Prospect Theory (PT) to motivate sensing data sharing in a user-centric manner. This model exploits PT to better capture the subjective utility of the service provider. Finally, our numerical results demonstrate the effectiveness of the proposed schemes for healthcare metaverses.

Graph Condensation for Inductive Node Representation Learning

  • paper_url: http://arxiv.org/abs/2307.15967
  • repo_url: None
  • paper_authors: Xinyi Gao, Tong Chen, Yilong Zang, Wentao Zhang, Quoc Viet Hung Nguyen, Kai Zheng, Hongzhi Yin
  • for: Improving the computational efficiency of graph neural networks (GNNs) on large-scale graphs so that they can be applied effectively across diverse applications.
  • methods: Mapping-aware graph condensation (MCond) explicitly learns a one-to-many mapping from original nodes to synthetic nodes, so that new nodes can be integrated directly into the condensed graph for inductive representation learning.
  • results: In inductive inference, MCond reduces computational overhead and storage requirements, achieving up to a 121.5x inference speedup and a 55.9x reduction in storage on the Reddit dataset.
    Abstract Graph neural networks (GNNs) encounter significant computational challenges when handling large-scale graphs, which severely restricts their efficacy across diverse applications. To address this limitation, graph condensation has emerged as a promising technique, which constructs a small synthetic graph for efficiently training GNNs while retaining performance. However, due to the topology structure among nodes, graph condensation is limited to condensing only the observed training nodes and their corresponding structure, thus lacking the ability to effectively handle the unseen data. Consequently, the original large graph is still required in the inference stage to perform message passing to inductive nodes, resulting in substantial computational demands. To overcome this issue, we propose mapping-aware graph condensation (MCond), explicitly learning the one-to-many node mapping from original nodes to synthetic nodes to seamlessly integrate new nodes into the synthetic graph for inductive representation learning. This enables direct information propagation on the synthetic graph, which is much more efficient than on the original large graph. Specifically, MCond employs an alternating optimization scheme with innovative loss terms from transductive and inductive perspectives, facilitating the mutual promotion between graph condensation and node mapping learning. Extensive experiments demonstrate the efficacy of our approach in inductive inference. On the Reddit dataset, MCond achieves up to 121.5x inference speedup and 55.9x reduction in storage requirements compared with counterparts based on the original graph.

Recommendation Unlearning via Matrix Correction

  • paper_url: http://arxiv.org/abs/2307.15960
  • repo_url: None
  • paper_authors: Jiahao Liu, Dongsheng Li, Hansu Gu, Tun Lu, Jiongran Wu, Peng Zhang, Li Shang, Ning Gu
  • for: 提供个性化服务,但大量用户数据带来隐私、安全和实用性问题。
  • methods: 使用推荐遗忘(recommendation unlearning)方法,允许遗忘特定的数据和模型影响,以降低敏感/恶意/有害用户数据带来的风险(见下方示例代码)。
  • results: 提出一种基于交互矩阵与映射矩阵修正(IMCorrect)的方法,无需重新训练模型即可提升推荐遗忘的完整性、效用和效率。实验结果表明,IMCorrect 在多种推荐遗忘场景中具有更高的完整性、效用和效率,并且能够增量学习新数据,进一步提高实用性。
    Abstract Recommender systems are important for providing personalized services to users, but the vast amount of collected user data has raised concerns about privacy (e.g., sensitive data), security (e.g., malicious data) and utility (e.g., toxic data). To address these challenges, recommendation unlearning has emerged as a promising approach, which allows specific data and models to be forgotten, mitigating the risks of sensitive/malicious/toxic user data. However, existing methods often struggle to balance completeness, utility, and efficiency, i.e., compromising one for the other, leading to suboptimal recommendation unlearning. In this paper, we propose an Interaction and Mapping Matrices Correction (IMCorrect) method for recommendation unlearning. Firstly, we reveal that many collaborative filtering (CF) algorithms can be formulated as mapping-based approach, in which the recommendation results can be obtained by multiplying the user-item interaction matrix with a mapping matrix. Then, IMCorrect can achieve efficient recommendation unlearning by correcting the interaction matrix and enhance the completeness and utility by correcting the mapping matrix, all without costly model retraining. Unlike existing methods, IMCorrect is a whitebox model that offers greater flexibility in handling various recommendation unlearning scenarios. Additionally, it has the unique capability of incrementally learning from new data, which further enhances its practicality. We conducted comprehensive experiments to validate the effectiveness of IMCorrect and the results demonstrate that IMCorrect is superior in completeness, utility, and efficiency, and is applicable in many recommendation unlearning scenarios.
    摘要 推荐系统对于向用户提供个性化服务十分重要,但大量收集的用户数据引发了隐私(如敏感数据)、安全(如恶意数据)和效用(如有害数据)方面的担忧。为应对这些挑战,推荐遗忘(recommendation unlearning)作为一种有前景的方法出现,它允许遗忘特定的数据和模型影响,从而降低敏感/恶意/有害用户数据带来的风险。然而,现有方法往往难以在完整性、效用和效率之间取得平衡,常常顾此失彼,导致次优的推荐遗忘效果。在本文中,我们提出一种交互与映射矩阵修正(IMCorrect)方法用于推荐遗忘。首先,我们揭示出许多协同过滤(CF)算法都可以表述为基于映射的方法,即推荐结果可以由用户-物品交互矩阵与一个映射矩阵相乘得到。在此基础上,IMCorrect 通过修正交互矩阵实现高效的推荐遗忘,并通过修正映射矩阵提升完整性和效用,而无需代价高昂的模型重训练。与现有方法不同,IMCorrect 是一种白盒模型,在处理各种推荐遗忘场景时具有更大的灵活性;此外,它还具备从新数据中增量学习的独特能力,进一步增强了实用性。我们进行了全面的实验来验证 IMCorrect 的有效性,结果表明 IMCorrect 在完整性、效用和效率方面均更优,并适用于多种推荐遗忘场景。
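A toy illustration of the mapping-based view of collaborative filtering that IMCorrect builds on (scores = interaction matrix × mapping matrix) and of unlearning by correcting both matrices without retraining. The cosine item-item mapping is a stand-in assumption; the paper's corrections are more refined.

```python
import numpy as np

def item_item_mapping(R):
    """A simple item-kNN style mapping matrix W, so that scores = R @ W.
    Stands in for the many CF models the paper casts in this mapping form."""
    norms = np.linalg.norm(R, axis=0, keepdims=True) + 1e-8
    W = (R / norms).T @ (R / norms)        # cosine similarity between items
    np.fill_diagonal(W, 0.0)
    return W

rng = np.random.default_rng(0)
R = (rng.random((100, 40)) < 0.1).astype(float)    # user-item interaction matrix

W = item_item_mapping(R)
scores = R @ W                                     # recommendations before unlearning

# "unlearn" some (user, item) interactions without retraining a model
forget = [(3, 5), (7, 12)]
for u, i in forget:
    R[u, i] = 0.0            # interaction-matrix correction
W = item_item_mapping(R)     # mapping-matrix correction (here: cheap closed-form recompute)
scores_after = R @ W
```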

Towards the Visualization of Aggregated Class Activation Maps to Analyse the Global Contribution of Class Features

  • paper_url: http://arxiv.org/abs/2308.00710
  • repo_url: None
  • paper_authors: Igor Cherepanov, David Sessler, Alex Ulmer, Hendrik Lücke-Tieke, Jörn Kohlhammer
  • for: 该论文的目的是解释深度学习模型在分类任务中的决策过程。
  • methods: 该论文扩展了类激活映射(Class Activation Maps,CAMs)方法,该方法可视化每个数据样本中各特征对分类决策的重要性(见下方示例代码)。
  • results: 该论文通过聚合多个样本的 CAM,提供了一个全局的解释可视化,帮助分析人员理解深度学习模型的决策过程。
    Abstract Deep learning (DL) models achieve remarkable performance in classification tasks. However, models with high complexity can not be used in many risk-sensitive applications unless a comprehensible explanation is presented. Explainable artificial intelligence (xAI) focuses on the research to explain the decision-making of AI systems like DL. We extend a recent method of Class Activation Maps (CAMs) which visualizes the importance of each feature of a data sample contributing to the classification. In this paper, we aggregate CAMs from multiple samples to show a global explanation of the classification for semantically structured data. The aggregation allows the analyst to make sophisticated assumptions and analyze them with further drill-down visualizations. Our visual representation for the global CAM illustrates the impact of each feature with a square glyph containing two indicators. The color of the square indicates the classification impact of this feature. The size of the filled square describes the variability of the impact between single samples. For interesting features that require further analysis, a detailed view is necessary that provides the distribution of these values. We propose an interactive histogram to filter samples and refine the CAM to show relevant samples only. Our approach allows an analyst to detect important features of high-dimensional data and derive adjustments to the AI model based on our global explanation visualization.
    摘要 深度学习(DL)模型在分类任务中取得了出色的性能。然而,除非能够给出可理解的解释,高复杂度的模型无法用于许多风险敏感的应用。可解释人工智能(xAI)致力于解释深度学习等 AI 系统的决策。我们扩展了最近的类激活映射(CAM)方法,该方法可视化数据样本中每个特征对分类的贡献。在本文中,我们聚合多个样本的 CAM,为语义结构化数据给出分类的全局解释。这种聚合使分析人员能够提出更复杂的假设,并通过进一步的下钻可视化加以分析。我们为全局 CAM 设计的可视化用包含两个指示量的方形图元刻画每个特征的影响:方块的颜色表示该特征对分类的影响,填充方块的大小描述该影响在不同样本之间的变化程度。对于需要进一步分析的特征,我们提供展示其取值分布的细节视图,并提出一个交互式直方图来筛选样本、细化 CAM,仅显示相关样本。我们的方法使分析人员能够发现高维数据中的重要特征,并基于全局解释可视化得出对 AI 模型的调整。
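A small sketch of the aggregation step behind the proposed global CAM view: per-feature CAM values are pooled across samples into a mean impact (glyph colour) and a variability measure (filled-square size), with a simple mask standing in for the interactive histogram drill-down. Array shapes and the filtering rule are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
cams = rng.normal(size=(200, 8))          # per-sample CAM value for each of 8 features
feature_names = [f"f{i}" for i in range(8)]

mean_impact = cams.mean(axis=0)           # drives the glyph colour (global impact)
variability = cams.std(axis=0)            # drives the filled-square size (spread across samples)

for name, m, s in zip(feature_names, mean_impact, variability):
    print(f"{name}: impact={m:+.3f}  variability={s:.3f}")

# drill-down: keep only samples whose CAM value for one feature falls in a chosen range,
# mirroring the interactive histogram filter described in the abstract
mask = cams[:, 2] > 0.5
refined_cam = cams[mask].mean(axis=0)
```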

The effect of network topologies on fully decentralized learning: a preliminary investigation

  • paper_url: http://arxiv.org/abs/2307.15947
  • repo_url: None
  • paper_authors: Luigi Palmieri, Lorenzo Valerio, Chiara Boldrini, Andrea Passarella
  • for: 这个论文研究了一种分布式机器学习系统中 nodes 之间的网络拓扑对模型的性能影响。
  • methods: 作者研究了节点之间直接协作训练的方式,并考察了不同类型的网络拓扑对“知识传播”的影响(见下方示例代码)。
  • results: 研究发现,即使网络组件之间存在只有弱连接,也可以传输信息;但是,这并不意味着知识可以快速传播。 另外,研究发现,核心节点(hubs)在传播知识方面扮演着更重要的角色,而叶节点(leaves)的影响相对较小。 最后,研究发现,紧密结合的社区会干扰知识传播。
    Abstract In a decentralized machine learning system, data is typically partitioned among multiple devices or nodes, each of which trains a local model using its own data. These local models are then shared and combined to create a global model that can make accurate predictions on new data. In this paper, we start exploring the role of the network topology connecting nodes on the performance of a Machine Learning model trained through direct collaboration between nodes. We investigate how different types of topologies impact the "spreading of knowledge", i.e., the ability of nodes to incorporate in their local model the knowledge derived by learning patterns in data available in other nodes across the networks. Specifically, we highlight the different roles in this process of more or less connected nodes (hubs and leaves), as well as that of macroscopic network properties (primarily, degree distribution and modularity). Among others, we show that, while it is known that even weak connectivity among network components is sufficient for information spread, it may not be sufficient for knowledge spread. More intuitively, we also find that hubs have a more significant role than leaves in spreading knowledge, although this manifests itself not only for heavy-tailed distributions but also when "hubs" have only moderately more connections than leaves. Finally, we show that tightly knit communities severely hinder knowledge spread.
    摘要 在去中心化机器学习系统中,数据通常被划分到多个设备或节点上,每个节点利用自己的数据训练一个本地模型;这些本地模型随后被共享并组合,形成能够对新数据做出准确预测的全局模型。在本文中,我们开始探究连接节点的网络拓扑对通过节点间直接协作训练的机器学习模型性能的影响。我们研究不同类型的拓扑如何影响“知识传播”,即节点将网络中其他节点数据中学到的模式吸收进自身本地模型的能力。具体而言,我们强调了连接程度不同的节点(枢纽节点与叶子节点)在这一过程中的不同作用,以及宏观网络属性(主要是度分布和模块度)的影响。我们的结果表明,虽然已知网络组件之间即使只有微弱的连接也足以传播信息,但这未必足以传播知识。更直观地说,我们还发现枢纽节点在传播知识方面比叶子节点发挥着更重要的作用,而且这种现象不仅出现在重尾度分布中,即使“枢纽”只比叶子节点多出少量连接时也会出现。最后,我们表明紧密结合的社区会严重阻碍知识传播。
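A toy decentralized-averaging loop on a heavy-tailed topology, just to make the setting concrete: each node repeatedly mixes its parameters with its direct neighbours', and the disagreement across nodes is one crude proxy for how far "knowledge" has spread. The gossip rule and mixing weight are assumptions, not the paper's protocol.

```python
import numpy as np
import networkx as nx

rng = np.random.default_rng(0)
G = nx.barabasi_albert_graph(20, 2, seed=0)          # heavy-tailed topology: a few hubs, many leaves
d = 10
params = {v: rng.normal(size=d) for v in G.nodes}    # each node's local model parameters

def gossip_round(G, params, alpha=0.5):
    """One round of decentralized averaging with direct neighbours only."""
    new = {}
    for v in G.nodes:
        neigh_mean = np.mean([params[u] for u in G.neighbors(v)], axis=0)
        new[v] = (1 - alpha) * params[v] + alpha * neigh_mean
    return new

for _ in range(10):
    params = gossip_round(G, params)

spread = np.std(np.stack(list(params.values())), axis=0).mean()
print(f"disagreement after 10 rounds: {spread:.4f}")  # shrinks as knowledge spreads over the topology
```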

PIMbot: Policy and Incentive Manipulation for Multi-Robot Reinforcement Learning in Social Dilemmas

  • paper_url: http://arxiv.org/abs/2307.15944
  • repo_url: None
  • paper_authors: Shahab Nikkhoo, Zexin Li, Aritra Samanta, Yufei Li, Cong Liu
  • for: 该论文旨在探讨如何通过操纵多机器人之间的通信来影响协作结果。
  • methods: 该论文提出了一种新的操纵方法 PIMbot,可在多机器人协作中通过策略操纵和激励操纵两种形式操纵奖励函数,从而影响最终结果。
  • results: 实验结果表明,PIMbot 能够有效地操纵多机器人协作环境,对任务结果既可能产生积极影响,也可能产生消极影响。
    Abstract Recent research has demonstrated the potential of reinforcement learning (RL) in enabling effective multi-robot collaboration, particularly in social dilemmas where robots face a trade-off between self-interests and collective benefits. However, environmental factors such as miscommunication and adversarial robots can impact cooperation, making it crucial to explore how multi-robot communication can be manipulated to achieve different outcomes. This paper presents a novel approach, namely PIMbot, to manipulating the reward function in multi-robot collaboration through two distinct forms of manipulation: policy and incentive manipulation. Our work introduces a new angle for manipulation in recent multi-agent RL social dilemmas that utilize a unique reward function for incentivization. By utilizing our proposed PIMbot mechanisms, a robot is able to manipulate the social dilemma environment effectively. PIMbot has the potential for both positive and negative impacts on the task outcome, where positive impacts lead to faster convergence to the global optimum and maximized rewards for any chosen robot. Conversely, negative impacts can have a detrimental effect on the overall task performance. We present comprehensive experimental results that demonstrate the effectiveness of our proposed methods in the Gazebo-simulated multi-robot environment. Our work provides insights into how inter-robot communication can be manipulated and has implications for various robotic applications. %, including robotics, transportation, and manufacturing.
    摘要 近期研究表明了强化学习(RL)在实现有效多机器人协作方面的潜力,特别是在机器人需要在自身利益与集体利益之间权衡的社会困境中。然而,沟通错误和对抗性机器人等环境因素可能影响合作,因此探究如何操纵多机器人之间的通信以达成不同结果变得至关重要。本文提出了一种新方法 PIMbot,通过两种不同形式的操纵——策略操纵和激励操纵——来操纵多机器人协作中的奖励函数。我们的工作为近期采用独特激励奖励函数的多智能体强化学习社会困境研究提供了一个新的操纵视角。借助所提出的 PIMbot 机制,机器人能够有效地操纵社会困境环境。PIMbot 对任务结果既可能产生积极影响,也可能产生消极影响:积极影响可以使任意选定的机器人更快地收敛到全局最优并最大化奖励,而消极影响则会损害整体任务性能。我们在 Gazebo 仿真的多机器人环境中给出了全面的实验结果,验证了所提方法的有效性。我们的工作为理解机器人间通信如何被操纵提供了洞见,并对多种机器人应用具有启示意义。

Continual Learning in Predictive Autoscaling

  • paper_url: http://arxiv.org/abs/2307.15941
  • repo_url: https://github.com/anonymousaccountx/DMSHM
  • paper_authors: Hongyan Hao, Zhixuan Chu, Shiyi Zhu, Gangwei Jiang, Yan Wang, Caigao Jiang, James Zhang, Wei Jiang, Siqiao Xue, Jun Zhou
  • for: 预测云服务器负载和预备资源以保证服务水平目标 (SLOs) 在动态云环境中。
  • methods: 提出了一种基于重放的持续学习方法,即基于密度的记忆选择与基于提示的网络学习模型(DMSHM),仅使用历史记录的一小部分即可实现准确预测(见下方示例代码)。
  • results: 在公开和工业数据集上进行了实验,证明所提方法在记忆容量和预测精度两方面均优于现有的持续学习方法,并在实际工业应用中表现出显著的实用性。
    Abstract Predictive Autoscaling is used to forecast the workloads of servers and prepare the resources in advance to ensure service level objectives (SLOs) in dynamic cloud environments. However, in practice, its prediction task often suffers from performance degradation under abnormal traffics caused by external events (such as sales promotional activities and applications re-configurations), for which a common solution is to re-train the model with data of a long historical period, but at the expense of high computational and storage costs. To better address this problem, we propose a replay-based continual learning method, i.e., Density-based Memory Selection and Hint-based Network Learning Model (DMSHM), using only a small part of the historical log to achieve accurate predictions. First, we discover the phenomenon of sample overlap when applying replay-based continual learning in prediction tasks. In order to surmount this challenge and effectively integrate new sample distribution, we propose a density-based sample selection strategy that utilizes kernel density estimation to calculate sample density as a reference to compute sample weight, and employs weight sampling to construct a new memory set. Then we implement hint-based network learning based on hint representation to optimize the parameters. Finally, we conduct experiments on public and industrial datasets to demonstrate that our proposed method outperforms state-of-the-art continual learning methods in terms of memory capacity and prediction accuracy. Furthermore, we demonstrate remarkable practicability of DMSHM in real industrial applications.
    摘要 预测式自动扩缩容(Predictive Autoscaling)用于预测服务器的工作负载并提前准备资源,以在动态云环境中保障服务水平目标(SLO)。然而在实践中,其预测任务常常在外部事件(如促销活动和应用重新配置)引发的异常流量下出现性能退化;常见的解决办法是用较长历史时期的数据重新训练模型,但这会带来高昂的计算和存储成本。为更好地解决这一问题,我们提出一种基于重放的持续学习方法,即基于密度的记忆选择与基于提示的网络学习模型(DMSHM),仅使用历史日志中的一小部分即可实现准确预测。首先,我们发现将基于重放的持续学习应用于预测任务时存在样本重叠现象。为克服这一挑战并有效整合新的样本分布,我们提出一种基于密度的样本选择策略:利用核密度估计计算样本密度作为参考来计算样本权重,并采用加权采样构建新的记忆集;随后基于提示表示实现提示式网络学习以优化参数。最后,我们在公开和工业数据集上进行实验,证明所提方法在记忆容量和预测精度方面均优于当前最先进的持续学习方法。此外,我们还展示了 DMSHM 在真实工业应用中出色的实用性。
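A minimal sketch of the density-based memory selection described above: kernel density estimation scores historical samples, densities are turned into sampling weights, and a small memory set is drawn by weighted sampling. The inverse-density weighting is my assumption; the paper defines its own weighting and adds hint-based network learning on top.

```python
import numpy as np
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(0)
history = rng.normal(size=(5000, 6))        # historical workload feature vectors

kde = KernelDensity(kernel="gaussian", bandwidth=0.5).fit(history)
log_density = kde.score_samples(history)    # per-sample log density

# Turn density into sampling weights. (The exact weighting in DMSHM is not spelled out
# in the abstract; favouring low-density samples to keep coverage is an assumption.)
weights = np.exp(-log_density)
weights /= weights.sum()

memory_size = 256
idx = rng.choice(len(history), size=memory_size, replace=False, p=weights)
memory_set = history[idx]                   # the replay memory used for continual training
print(memory_set.shape)                     # (256, 6)
```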

A Theory for Emergence of Complex Skills in Language Models

  • paper_url: http://arxiv.org/abs/2307.15936
  • repo_url: https://github.com/dia2018/What-is-the-Difference-Between-AI-and-Machine-Learning
  • paper_authors: Sanjeev Arora, Anirudh Goyal
  • for: 本研究旨在解释为何当参数规模和训练语料扩大时,语言模型会涌现出新的技能。
  • methods: 本研究使用著名的 LLM 缩放定律(Scaling Laws)和一个简单的统计框架来分析技能涌现。
  • results: 研究发现,缩放定律蕴含一种强大的归纳偏置,使预训练模型能够非常高效地学习;在这种偏置下,执行涉及多个技能组合的任务的能力,与基本技能本身几乎以相同的规模和速度涌现。
    Abstract A major driver of AI products today is the fact that new skills emerge in language models when their parameter set and training corpora are scaled up. This phenomenon is poorly understood, and a mechanistic explanation via mathematical analysis of gradient-based training seems difficult. The current paper takes a different approach, analysing emergence using the famous (and empirical) Scaling Laws of LLMs and a simple statistical framework. Contributions include: (a) A statistical framework that relates cross-entropy loss of LLMs to competence on the basic skills that underlie language tasks. (b) Mathematical analysis showing that the Scaling Laws imply a strong form of inductive bias that allows the pre-trained model to learn very efficiently. We informally call this {\em slingshot generalization} since naively viewed it appears to give competence levels at skills that violate usual generalization theory. (c) A key example of slingshot generalization, that competence at executing tasks involving $k$-tuples of skills emerges essentially at the same scaling and same rate as competence on the elementary skills themselves.
    摘要 当今 AI 产品的一个主要驱动力是:当语言模型的参数规模和训练语料扩大时,会涌现出新的技能。这一现象目前还缺乏深入理解,而通过对基于梯度的训练做数学分析来给出机制性解释似乎十分困难。本文采取了一种不同的思路,利用著名的(经验性的)LLM 缩放定律(Scaling Laws)和一个简单的统计框架来分析技能涌现。本文的贡献包括:(a) 一个统计框架,将 LLM 的交叉熵损失与支撑语言任务的基本技能的胜任程度联系起来;(b) 数学分析表明,缩放定律蕴含一种很强的归纳偏置,使预训练模型能够非常高效地学习——我们非正式地称之为“弹弓式泛化”(slingshot generalization),因为从朴素角度看,它似乎给出了违反通常泛化理论的技能胜任水平;(c) 弹弓式泛化的一个关键例子:执行涉及 $k$ 元技能组合的任务的胜任能力,基本上与基本技能本身在相同的规模、以相同的速度涌现。

A Noisy-Label-Learning Formulation for Immune Repertoire Classification and Disease-Associated Immune Receptor Sequence Identification

  • paper_url: http://arxiv.org/abs/2307.15934
  • repo_url: https://github.com/tencentailabhealthcare/nll-irc
  • paper_authors: Mingcai Chen, Yu Zhao, Zhonghuang Wang, Bing He, Jianhua Yao
  • for: 免疫组库分类是计算生物学的前沿研究课题,对新疫苗和免疫疗法的研发具有变革性意义。
  • methods: 提出了一种噪声标签学习方法,以克服传统实例空间多示例学习(MIL)直接将袋级标签赋给实例所带来的问题。
  • results: 实现了准确的序列级分类和组库级分类,并在 CMV 和癌症数据集上取得了显著的性能提升。
    Abstract Immune repertoire classification, a typical multiple instance learning (MIL) problem, is a frontier research topic in computational biology that makes transformative contributions to new vaccines and immune therapies. However, the traditional instance-space MIL, directly assigning bag-level labels to instances, suffers from the massive amount of noisy labels and extremely low witness rate. In this work, we propose a noisy-label-learning formulation to solve the immune repertoire classification task. To remedy the inaccurate supervision of repertoire-level labels for a sequence-level classifier, we design a robust training strategy: The initial labels are smoothed to be asymmetric and are progressively corrected using the model's predictions throughout the training process. Furthermore, two models with the same architecture but different parameter initialization are co-trained simultaneously to remedy the known "confirmation bias" problem in the self-training-like schema. As a result, we obtain accurate sequence-level classification and, subsequently, repertoire-level classification. Experiments on the Cytomegalovirus (CMV) and Cancer datasets demonstrate our method's effectiveness and superior performance on sequence-level and repertoire-level tasks.
    摘要 免疫组库分类是一个典型的多示例学习(MIL)问题,是计算生物学中的前沿研究课题,对新疫苗和免疫疗法做出了变革性贡献。然而,传统的实例空间 MIL 直接将袋级标签赋给实例,受到大量噪声标签和极低见证率(witness rate)的困扰。在这项工作中,我们提出一种噪声标签学习的形式化来解决免疫组库分类任务。为弥补组库级标签对序列级分类器的不准确监督,我们设计了一种稳健的训练策略:初始标签被平滑为非对称形式,并在训练过程中利用模型的预测逐步加以修正。此外,我们同时协同训练两个结构相同但参数初始化不同的模型,以缓解类自训练框架中已知的“确认偏差”问题。由此,我们获得了准确的序列级分类,进而得到组库级分类。在巨细胞病毒(CMV)和癌症数据集上的实验证明了我们方法的有效性,并在序列级和组库级任务上展现了优越性能。

Language models as master equation solvers

  • paper_url: http://arxiv.org/abs/2308.02514
  • repo_url: https://github.com/Aryia-Behroziuan/References
  • paper_authors: Chuanbo Liu, Jin Wang
  • for: 为复杂随机动力系统求解主方程(master equation)。
  • methods: 采用语言模型式的机器学习方法,通过基于提示的神经网络将速率参数、初始条件和时间值直接映射到状态联合概率分布。
  • results: 在多模块和高维系统的示例中观察到很高的精度和外推能力,表明有可能用单个预训练大模型求解任意主方程。
    Abstract Master equations are of fundamental importance in modeling stochastic dynamical systems.However, solving master equations is challenging due to the exponential increase in the number of possible states or trajectories with the dimension of the state space. In this study, we propose repurposing language models as a machine learning approach to solve master equations. We design a prompt-based neural network to map rate parameters, initial conditions, and time values directly to the state joint probability distribution that exactly matches the input contexts. In this way, we approximate the solution of the master equation in its most general form. We train the network using the policy gradient algorithm within the reinforcement learning framework, with feedback rewards provided by a set of variational autoregressive models. By applying this approach to representative examples, we observe high accuracy for both multi-module and high-dimensional systems. The trained network also exhibits extrapolating ability, extending its predictability to unseen data. Our findings establish the connection between language models and master equations, highlighting the possibility of using a single pretrained large model to solve any master equation.
    摘要 主方程(master equation)在随机动力系统建模中具有根本重要性。然而,由于可能的状态或轨迹数量随状态空间维度呈指数增长,求解主方程极具挑战性。在本研究中,我们提出将语言模型重新用作求解主方程的机器学习方法。我们设计了一个基于提示的神经网络,将速率参数、初始条件和时间值直接映射到与输入上下文精确匹配的状态联合概率分布,从而以最一般的形式近似主方程的解。我们在强化学习框架下使用策略梯度算法训练该网络,反馈奖励由一组变分自回归模型提供。将该方法应用于具有代表性的例子时,我们观察到其对多模块系统和高维系统都具有很高的精度;训练后的网络还表现出外推能力,可以对未见过的数据进行预测。我们的发现建立了语言模型与主方程之间的联系,表明有可能用单个预训练大模型来求解任意主方程。

Dynamic deep-reinforcement-learning algorithm in Partially Observed Markov Decision Processes

  • paper_url: http://arxiv.org/abs/2307.15931
  • repo_url: None
  • paper_authors: Saki Omi, Hyo-Sang Shin, Namhoon Cho, Antonios Tsourdos
  • for: 解决Partially Observable Markov Decision Process (POMDP)中agent的性能难以保持问题。
  • methods: 提出了若干结构和方法,利用 LSTM 网络扩展最新的深度强化学习算法,以提高控制性能对不同类型外部干扰的鲁棒性(见下方示例代码)。
  • results: 研究表明,将动作序列纳入模型输入有助于解决 POMDP 中智能体性能难以保持的问题;所开发的算法对添加到观测中的各类外部干扰表现出更强的控制鲁棒性。
    Abstract Reinforcement learning has been greatly improved in recent studies and an increased interest in real-world implementation has emerged in recent years. In many cases, due to the non-static disturbances, it becomes challenging for the agent to keep the performance. The disturbance results in the environment called Partially Observable Markov Decision Process. In common practice, Partially Observable Markov Decision Process is handled by introducing an additional estimator, or Recurrent Neural Network is utilized in the context of reinforcement learning. Both of the cases require to process sequential information on the trajectory. However, there are only a few studies investigating the effect of information to consider and the network structure to handle them. This study shows the benefit of action sequence inclusion in order to solve Partially Observable Markov Decision Process. Several structures and approaches are proposed to extend one of the latest deep reinforcement learning algorithms with LSTM networks. The developed algorithms showed enhanced robustness of controller performance against different types of external disturbances that are added to observation.
    摘要 近年来,强化学习取得了长足进步,其在真实世界中的应用也受到越来越多的关注。在许多情况下,由于非静态扰动,智能体难以维持其性能。这类扰动使环境成为部分可观测马尔可夫决策过程(POMDP)。通常的做法是引入额外的状态估计器,或在强化学习中使用循环神经网络;两种做法都需要处理轨迹上的序列信息。然而,关于应当考虑哪些信息、以及用何种网络结构来处理这些信息的研究还很少。本研究表明,将动作序列纳入输入有助于求解 POMDP。我们提出了若干结构和方法,利用 LSTM 网络扩展最新的深度强化学习算法之一。所开发的算法对添加到观测中的各类外部扰动表现出更强的控制器鲁棒性。
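A small PyTorch sketch of the key ingredient discussed above: a recurrent policy whose input sequence contains past actions alongside observations, so the LSTM can summarise the trajectory under partial observability. The architecture and sizes are illustrative, not the paper's exact network.

```python
import torch
import torch.nn as nn

class RecurrentPolicy(nn.Module):
    """Policy that conditions on the sequence of past observations *and* past actions,
    the ingredient the paper highlights for coping with partial observability."""
    def __init__(self, obs_dim, n_actions, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim + n_actions, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_actions)

    def forward(self, obs_seq, prev_action_seq, state=None):
        # obs_seq: (B, T, obs_dim); prev_action_seq: (B, T) integer actions
        a_onehot = nn.functional.one_hot(prev_action_seq, self.head.out_features).float()
        x = torch.cat([obs_seq, a_onehot], dim=-1)
        out, state = self.lstm(x, state)
        return self.head(out[:, -1]), state   # logits for the next action, plus LSTM state

policy = RecurrentPolicy(obs_dim=8, n_actions=4)
logits, h = policy(torch.randn(2, 5, 8), torch.randint(0, 4, (2, 5)))
print(logits.shape)   # torch.Size([2, 4])
```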

Opportunistic Air Quality Monitoring and Forecasting with Expandable Graph Neural Networks

  • paper_url: http://arxiv.org/abs/2307.15916
  • repo_url: None
  • paper_authors: Jingwei Zuo, Wenbin Li, Michele Baldo, Hakim Hacid
  • for: 这篇论文主要目的是提出一种可扩展的图注意网络模型(EGAT),用于融合不同空间结构的数据收集,以提高空气质量预测的精度。
  • methods: 该模型使用图注意力网络技术,可将既有和新增基础设施的数据融合,以满足不同个性化场景的需求;此外,该模型还可与现有的空气质量预测模型结合使用,以适应不断变化的空间结构。
  • results: 对实际空气质量数据进行验证,EGAT模型可以提高空气质量预测的精度,并且可以适应不同的空间结构变化。
    Abstract Air Quality Monitoring and Forecasting has been a popular research topic in recent years. Recently, data-driven approaches for air quality forecasting have garnered significant attention, owing to the availability of well-established data collection facilities in urban areas. Fixed infrastructures, typically deployed by national institutes or tech giants, often fall short in meeting the requirements of diverse personalized scenarios, e.g., forecasting in areas without any existing infrastructure. Consequently, smaller institutes or companies with limited budgets are compelled to seek tailored solutions by introducing more flexible infrastructures for data collection. In this paper, we propose an expandable graph attention network (EGAT) model, which digests data collected from existing and newly-added infrastructures, with different spatial structures. Additionally, our proposal can be embedded into any air quality forecasting models, to apply to the scenarios with evolving spatial structures. The proposal is validated over real air quality data from PurpleAir.
    摘要 空气质量监测与预测近年来是热门的研究课题。由于城市地区已具备完善的数据采集设施,数据驱动的空气质量预测方法近来受到了广泛关注。通常由国家机构或科技巨头部署的固定基础设施,往往难以满足多样化的个性化场景需求,例如在没有任何现有基础设施的地区进行预测。因此,预算有限的小型机构或公司不得不引入更灵活的数据采集基础设施来寻求定制化解决方案。在本文中,我们提出一种可扩展图注意力网络(EGAT)模型,能够消化来自既有及新增基础设施、具有不同空间结构的数据。此外,我们的方案可以嵌入到任何空气质量预测模型中,以适应空间结构不断演化的场景。该方案在来自 PurpleAir 的真实空气质量数据上得到了验证。

An Automata-Theoretic Approach to Synthesizing Binarized Neural Networks

  • paper_url: http://arxiv.org/abs/2307.15907
  • repo_url: None
  • paper_authors: Ye Tao, Wanwei Liu, Fu Song, Zhen Liang, Ji Wang, Hongxu Zhu
  • for: 该论文的目的是提出一种自动机理论方法,用于综合(synthesize)满足指定性质的二值神经网络(BNN)。
  • methods: 该方法使用时序逻辑 BLTL 作为规约语言,并将其转换为有限字上的自动机;在综合过程中,利用 SMT 求解器检查满足规约的网络是否存在。
  • results: 该方法能够自动综合 BNN,并在很大程度上保持精度的同时,提高其个体公平性和局部鲁棒性。
    Abstract Deep neural networks, (DNNs, a.k.a. NNs), have been widely used in various tasks and have been proven to be successful. However, the accompanied expensive computing and storage costs make the deployments in resource-constrained devices a significant concern. To solve this issue, quantization has emerged as an effective way to reduce the costs of DNNs with little accuracy degradation by quantizing floating-point numbers to low-width fixed-point representations. Quantized neural networks (QNNs) have been developed, with binarized neural networks (BNNs) restricted to binary values as a special case. Another concern about neural networks is their vulnerability and lack of interpretability. Despite the active research on trustworthy of DNNs, few approaches have been proposed to QNNs. To this end, this paper presents an automata-theoretic approach to synthesizing BNNs that meet designated properties. More specifically, we define a temporal logic, called BLTL, as the specification language. We show that each BLTL formula can be transformed into an automaton on finite words. To deal with the state-explosion problem, we provide a tableau-based approach in real implementation. For the synthesis procedure, we utilize SMT solvers to detect the existence of a model (i.e., a BNN) in the construction process. Notably, synthesis provides a way to determine the hyper-parameters of the network before training.Moreover, we experimentally evaluate our approach and demonstrate its effectiveness in improving the individual fairness and local robustness of BNNs while maintaining accuracy to a great extent.
    摘要 深度神经网络(DNN)已广泛应用于各类任务并被证明是成功的。然而,随之而来的高昂计算和存储开销使其在资源受限设备上的部署成为一个重要问题。为解决这一问题,量化成为一种有效的手段:通过将浮点数量化为低位宽的定点表示来降低 DNN 的开销,同时仅带来很小的精度损失。由此发展出了量化神经网络(QNN),其中取值被限制为二值的二值神经网络(BNN)是一个特例。神经网络的另一个问题是其脆弱性和缺乏可解释性。尽管关于 DNN 可信性的研究十分活跃,但针对 QNN 的方法还很少。为此,本文提出一种自动机理论方法,用于综合满足指定性质的 BNN。更具体地,我们定义了一种时序逻辑 BLTL 作为规约语言,并证明每个 BLTL 公式都可以转换为有限字上的自动机。为应对状态爆炸问题,我们在实际实现中给出了基于 tableau 的方法。在综合过程中,我们利用 SMT 求解器来检测构造过程中是否存在满足要求的模型(即 BNN)。值得注意的是,综合为在训练之前确定网络的超参数提供了一条途径。此外,我们通过实验评估了该方法,结果表明它能够在很大程度上保持精度的同时,提高 BNN 的个体公平性和局部鲁棒性。

Multi-view Sparse Laplacian Eigenmaps for nonlinear Spectral Feature Selection

  • paper_url: http://arxiv.org/abs/2307.15905
  • repo_url: None
  • paper_authors: Gaurav Srivastava, Mahesh Jangid
  • for: Addressing the challenges of high-dimensional datasets in machine learning, such as overfitting and computational complexity, by identifying an informative subset of features.
  • methods: Multi-view Sparse Laplacian Eigenmaps (MSLE) for feature selection, combining multiple views of the data, enforcing sparsity constraints, and using a scalable optimization algorithm to identify a reduced feature set.
  • results: Reduced the feature space by 10 to 90% while maintaining an error rate of 2.72% with Support Vector Machine (SVM), and achieved an accuracy of 96.69% with an 80% reduction in the overall feature space.
    Abstract The complexity of high-dimensional datasets presents significant challenges for machine learning models, including overfitting, computational complexity, and difficulties in interpreting results. To address these challenges, it is essential to identify an informative subset of features that captures the essential structure of the data. In this study, the authors propose Multi-view Sparse Laplacian Eigenmaps (MSLE) for feature selection, which effectively combines multiple views of the data, enforces sparsity constraints, and employs a scalable optimization algorithm to identify a subset of features that capture the fundamental data structure. MSLE is a graph-based approach that leverages multiple views of the data to construct a more robust and informative representation of high-dimensional data. The method applies sparse eigendecomposition to reduce the dimensionality of the data, yielding a reduced feature set. The optimization problem is solved using an iterative algorithm alternating between updating the sparse coefficients and the Laplacian graph matrix. The sparse coefficients are updated using a soft-thresholding operator, while the graph Laplacian matrix is updated using the normalized graph Laplacian. To evaluate the performance of the MSLE technique, the authors conducted experiments on the UCI-HAR dataset, which comprises 561 features, and reduced the feature space by 10 to 90%. Our results demonstrate that even after reducing the feature space by 90%, the Support Vector Machine (SVM) maintains an error rate of 2.72%. Moreover, the authors observe that the SVM exhibits an accuracy of 96.69% with an 80% reduction in the overall feature space.
    摘要 高维数据集的复杂性给机器学习模型带来了显著挑战,包括过拟合、计算复杂度以及结果难以解释。为应对这些挑战,必须识别出能够刻画数据本质结构的、具有信息量的特征子集。本研究提出了用于特征选择的多视图稀疏拉普拉斯特征映射(MSLE),它有效地结合数据的多个视图、施加稀疏约束,并采用可扩展的优化算法来确定能够刻画数据基本结构的特征子集。MSLE 是一种基于图的方法,利用数据的多个视图构建对高维数据更稳健、更具信息量的表示。该方法应用稀疏特征分解来降低数据维度,得到精简后的特征集合。优化问题通过一个在更新稀疏系数与更新图拉普拉斯矩阵之间交替的迭代算法求解:稀疏系数用软阈值算子更新,图拉普拉斯矩阵则用归一化图拉普拉斯进行更新。为评估 MSLE 技术的性能,作者在包含 561 个特征的 UCI-HAR 数据集上进行了实验,并将特征空间削减了 10% 到 90%。结果表明,即使将特征空间削减 90%,支持向量机(SVM)的错误率仍保持在 2.72%;此外,在整体特征空间削减 80% 的情况下,SVM 的准确率达到 96.69%。
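A rough sketch of the ingredients named in the abstract: per-view affinity graphs combined into a normalized Laplacian, a spectral embedding, and soft-thresholded sparse coefficients used to score and select features. This is not the paper's alternating optimization; the kernels, threshold, and scoring rule are all illustrative assumptions.

```python
import numpy as np

def rbf_affinity(X, gamma=0.5):
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def normalized_laplacian(W):
    d_inv_sqrt = np.diag(1.0 / np.sqrt(W.sum(1) + 1e-8))
    return np.eye(len(W)) - d_inv_sqrt @ W @ d_inv_sqrt

def soft_threshold(A, lam):
    return np.sign(A) * np.maximum(np.abs(A) - lam, 0.0)

rng = np.random.default_rng(0)
views = [rng.normal(size=(80, 20)), rng.normal(size=(80, 20))]    # two views, 20 features each

# combined graph Laplacian from the multiple views
L = np.mean([normalized_laplacian(rbf_affinity(V)) for V in views], axis=0)
evals, evecs = np.linalg.eigh(L)
Y = evecs[:, 1:6]                       # low-dimensional spectral embedding (skip trivial eigvec)

# sparse coefficients relating original features to the embedding, then per-feature scores
X = np.hstack(views)                    # (80, 40)
coef = soft_threshold(X.T @ Y, lam=1.0)              # sparsity via a soft-thresholding operator
scores = np.linalg.norm(coef, axis=1)
selected = np.argsort(scores)[::-1][:10]             # keep the 10 highest-scoring features
print(selected)
```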

Online Matching: A Real-time Bandit System for Large-scale Recommendations

  • paper_url: http://arxiv.org/abs/2307.15893
  • repo_url: None
  • paper_authors: Xinyang Yi, Shao-Chuan Wang, Ruining He, Hariharan Chandrasekaran, Charles Wu, Lukasz Heldt, Lichan Hong, Minmin Chen, Ed H. Chi
  • for: 提高大规模推荐系统中的新内容发现和用户兴趣探索能力
  • methods: 采用“离线+在线”混合学习方法,并提出 Diag-LinUCB 算法,以可扩展且及时的方式分布式更新老虎机(bandit)参数(见下方示例代码)。
  • results: 通过实验示例,在YouTube平台上实现了在线学习系统的可扩展性和实时性,提高了新内容发现和用户兴趣探索的能力
    Abstract The last decade has witnessed many successes of deep learning-based models for industry-scale recommender systems. These models are typically trained offline in a batch manner. While being effective in capturing users' past interactions with recommendation platforms, batch learning suffers from long model-update latency and is vulnerable to system biases, making it hard to adapt to distribution shift and explore new items or user interests. Although online learning-based approaches (e.g., multi-armed bandits) have demonstrated promising theoretical results in tackling these challenges, their practical real-time implementation in large-scale recommender systems remains limited. First, the scalability of online approaches in servicing a massive online traffic while ensuring timely updates of bandit parameters poses a significant challenge. Additionally, exploring uncertainty in recommender systems can easily result in unfavorable user experience, highlighting the need for devising intricate strategies that effectively balance the trade-off between exploitation and exploration. In this paper, we introduce Online Matching: a scalable closed-loop bandit system learning from users' direct feedback on items in real time. We present a hybrid "offline + online" approach for constructing this system, accompanied by a comprehensive exposition of the end-to-end system architecture. We propose Diag-LinUCB -- a novel extension of the LinUCB algorithm -- to enable distributed updates of bandits parameter in a scalable and timely manner. We conduct live experiments in YouTube and show that Online Matching is able to enhance the capabilities of fresh content discovery and item exploration in the present platform.
    摘要 过去十年见证了基于深度学习的模型在工业级推荐系统中的诸多成功。这些模型通常以批量方式离线训练。虽然批量学习能够有效捕捉用户在推荐平台上的历史交互,但其模型更新延迟较长,且容易受到系统偏差的影响,难以适应分布漂移以及探索新物品或新的用户兴趣。尽管基于在线学习的方法(例如多臂老虎机)在应对这些挑战方面展示了有希望的理论结果,但它们在大规模推荐系统中的实时落地仍然有限。首先,在服务海量在线流量的同时保证老虎机参数的及时更新,对在线方法的可扩展性提出了巨大挑战;其次,在推荐系统中探索不确定性很容易造成不佳的用户体验,这凸显出需要设计精细的策略来有效权衡利用与探索。在本文中,我们介绍 Online Matching:一个可扩展的闭环老虎机系统,能够实时从用户对物品的直接反馈中学习。我们给出了构建该系统的“离线+在线”混合方法,并对端到端系统架构进行了完整阐述。我们提出了 LinUCB 算法的新扩展 Diag-LinUCB,以可扩展且及时的方式分布式更新老虎机参数。我们在 YouTube 上进行了线上实验,结果表明 Online Matching 能够增强现有平台的新内容发现与物品探索能力。
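A sketch of the kind of bandit update Diag-LinUCB makes cheap to distribute: a LinUCB-style score and update that keeps only the diagonal of the design matrix, so replicas can merge their sufficient statistics by simple addition. The exact Diag-LinUCB update and the merging protocol in the paper are richer; everything below is an illustrative approximation.

```python
import numpy as np

class DiagLinUCB:
    """LinUCB-style scoring that keeps only the diagonal of the design matrix, so the
    sufficient statistics are per-coordinate sums that are cheap to merge across replicas."""
    def __init__(self, dim, alpha=1.0, lam=1.0):
        self.alpha, self.lam = alpha, lam
        self.A_diag = np.full(dim, lam)      # diagonal of lam*I + sum_t x_t x_t^T
        self.b = np.zeros(dim)

    def score(self, x):
        theta = self.b / self.A_diag
        bonus = self.alpha * np.sqrt(np.sum(x * x / self.A_diag))
        return float(x @ theta + bonus)      # exploit + explore

    def update(self, x, reward):
        self.A_diag += x * x
        self.b += reward * x

    def merge(self, other):
        # distributed setting: fold in statistics accumulated on another replica
        self.A_diag += other.A_diag - other.lam
        self.b += other.b

rng = np.random.default_rng(0)
bandit = DiagLinUCB(dim=16)
for _ in range(100):
    candidates = rng.normal(size=(10, 16))                        # candidate item embeddings
    chosen = max(range(10), key=lambda i: bandit.score(candidates[i]))
    bandit.update(candidates[chosen], reward=float(rng.random() < 0.3))
```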

A new Gradient TD Algorithm with only One Step-size: Convergence Rate Analysis using $L$-$λ$ Smoothness

  • paper_url: http://arxiv.org/abs/2307.15892
  • repo_url: None
  • paper_authors: Hengshuai Yao
  • for: 本文研究了一类新的梯度时序差分(GTD)算法,用于解决强化学习中的离策略(off-policy)学习问题。
  • methods: 本文提出了一种真正单时间尺度的 GTD 算法,只含一个步长参数,并利用 $L$-$\lambda$ 平滑性证明了新算法的收敛速率。
  • results: 实验表明,新提出的 Impression GTD 算法在 Random walks、Boyan chain 和 Baird counterexample 等问题上比现有 GTD 算法收敛快得多,且在很大范围的步长取值下都表现良好。
    Abstract Gradient Temporal Difference (GTD) algorithms (Sutton et al., 2008, 2009) are the first $O(d)$ ($d$ is the number features) algorithms that have convergence guarantees for off-policy learning with linear function approximation. Liu et al. (2015) and Dalal et. al. (2018) proved the convergence rates of GTD, GTD2 and TDC are $O(t^{-\alpha/2})$ for some $\alpha \in (0,1)$. This bound is tight (Dalal et al., 2020), and slower than $O(1/\sqrt{t})$. GTD algorithms also have two step-size parameters, which are difficult to tune. In literature, there is a "single-time-scale" formulation of GTD. However, this formulation still has two step-size parameters. This paper presents a truly single-time-scale GTD algorithm for minimizing the Norm of Expected td Update (NEU) objective, and it has only one step-size parameter. We prove that the new algorithm, called Impression GTD, converges at least as fast as $O(1/t)$. Furthermore, based on a generalization of the expected smoothness (Gower et al. 2019), called $L$-$\lambda$ smoothness, we are able to prove that the new GTD converges even faster, in fact, with a linear rate. Our rate actually also improves Gower et al.'s result with a tighter bound under a weaker assumption. Besides Impression GTD, we also prove the rates of three other GTD algorithms, one by Yao and Liu (2008), another called A-transpose-TD (Sutton et al., 2008), and a counterpart of A-transpose-TD. The convergence rates of all the four GTD algorithms are proved in a single generic GTD framework to which $L$-$\lambda$ smoothness applies. Empirical results on Random walks, Boyan chain, and Baird counterexample show that Impression GTD converges much faster than existing GTD algorithms for both on-policy and off-policy learning problems, with well-performing step-sizes in a big range.
    摘要 梯度时序差分(GTD)算法(Sutton et al., 2008, 2009)是第一类复杂度为 $O(d)$($d$ 为特征数)、且在线性函数逼近的离策略学习中具有收敛保证的算法。Liu et al.(2015)和 Dalal et al.(2018)证明了 GTD、GTD2 和 TDC 的收敛速率为 $O(t^{-\alpha/2})$,其中 $\alpha \in (0,1)$。这一界是紧的(Dalal et al., 2020),且慢于 $O(1/\sqrt{t})$。此外,GTD 算法含有两个难以调节的步长参数。文献中虽有 GTD 的“单时间尺度”形式,但仍含有两个步长参数。本文提出了一种真正单时间尺度的 GTD 算法,用于最小化期望 TD 更新的范数(NEU)目标,且只含一个步长参数。我们证明该新算法(称为 Impression GTD)至少以 $O(1/t)$ 的速率收敛。进一步地,基于对期望平滑性(Gower et al., 2019)的一种推广,即 $L$-$\lambda$ 平滑性,我们证明新的 GTD 收敛得更快,事实上达到线性速率;我们的结果还在更弱的假设下以更紧的界改进了 Gower 等人的结论。除 Impression GTD 外,我们还在一个统一的、适用 $L$-$\lambda$ 平滑性的 GTD 框架下证明了另外三种 GTD 算法的收敛速率,包括 Yao 和 Liu(2008)的算法、A-transpose-TD(Sutton et al., 2008)及其对应算法。在 Random walks、Boyan chain 和 Baird counterexample 上的实验表明,Impression GTD 在同策略与离策略学习问题上都比现有 GTD 算法收敛快得多,且在很大范围的步长取值下表现良好。

First-order Policy Optimization for Robust Policy Evaluation

  • paper_url: http://arxiv.org/abs/2307.15890
  • repo_url: https://github.com/jettbrains/-L-
  • paper_authors: Yan Li, Guanghui Lan
  • for: 该论文关注稳健马尔可夫决策过程中的策略评估问题,从策略优化的视角提出了评估策略性能的方法。
  • methods: 该论文提出了一阶策略评估(FRPE)方法,为确定性(离线)和随机(在线)两种设定下的稳健策略评估提供了统一框架,既适用于表格表示,也适用于一般函数逼近。
  • results: 该论文证明了 FRPE 在确定性设定下具有线性收敛性,在随机设定下具有 $\tilde{\mathcal{O}}(1/\epsilon^2)$ 的样本复杂度;此外,FRPE 还可以自然地推广到评估具有 $(\mathrm{s}, \mathrm{a})$-矩形模糊集的稳健状态-动作价值函数。
    Abstract We adopt a policy optimization viewpoint towards policy evaluation for robust Markov decision process with $\mathrm{s}$-rectangular ambiguity sets. The developed method, named first-order policy evaluation (FRPE), provides the first unified framework for robust policy evaluation in both deterministic (offline) and stochastic (online) settings, with either tabular representation or generic function approximation. In particular, we establish linear convergence in the deterministic setting, and $\tilde{\mathcal{O}(1/\epsilon^2)$ sample complexity in the stochastic setting. FRPE also extends naturally to evaluating the robust state-action value function with $(\mathrm{s}, \mathrm{a})$-rectangular ambiguity sets. We discuss the application of the developed results for stochastic policy optimization of large-scale robust MDPs.
    摘要 我们从策略优化的视角出发,研究具有 $\mathrm{s}$-矩形模糊集的稳健马尔可夫决策过程的策略评估问题。所提出的方法称为一阶策略评估(FRPE),首次为确定性(离线)与随机(在线)两种设定下的稳健策略评估提供了统一框架,既适用于表格表示,也适用于一般函数逼近。具体而言,我们在确定性设定下证明了线性收敛,在随机设定下证明了 $\tilde{\mathcal{O}}(1/\epsilon^2)$ 的样本复杂度。FRPE 还可以自然地推广到评估具有 $(\mathrm{s}, \mathrm{a})$-矩形模糊集的稳健状态-动作价值函数。我们还讨论了所得结果在大规模稳健 MDP 的随机策略优化中的应用。

Explaining Full-disk Deep Learning Model for Solar Flare Prediction using Attribution Methods

  • paper_url: http://arxiv.org/abs/2307.15878
  • repo_url: https://bitbucket.org/gsudmlab/explainfdvgg16
  • paper_authors: Chetraj Pandey, Rafal A. Angryk, Berkay Aydin
  • for: 该研究采用深度学习方法预测太阳耀斑,尤其关注长期被忽视的近日面边缘(near-limb)耀斑,并利用归因方法对模型预测给出事后定性解释。
  • methods: 该论文使用基于每小时全日面视向磁图训练的深度学习模型,采用二分类模式预测未来 24 小时内是否发生 M 级及以上耀斑;为应对类别不平衡,结合使用了数据增强与类别加权技术(见下方示例代码)。
  • results: 分析表明,全日面耀斑预测与活动区(AR)相关的特征是一致的。具体而言,该深度学习模型预测未来 24 小时 M 级及以上耀斑的 True Skill Statistics(TSS)与 Heidke Skill Score(HSS)分别为 0.51 和 0.35;模型解释分析表明,模型能够从全日面磁图中提取与活动区相关的特征(包括近日面边缘位置)来做出相应预测。
    Abstract This paper contributes to the growing body of research on deep learning methods for solar flare prediction, primarily focusing on highly overlooked near-limb flares and utilizing the attribution methods to provide a post hoc qualitative explanation of the model's predictions. We present a solar flare prediction model, which is trained using hourly full-disk line-of-sight magnetogram images and employs a binary prediction mode to forecast $\geq$M-class flares that may occur within the following 24-hour period. To address the class imbalance, we employ a fusion of data augmentation and class weighting techniques; and evaluate the overall performance of our model using the true skill statistic (TSS) and Heidke skill score (HSS). Moreover, we applied three attribution methods, namely Guided Gradient-weighted Class Activation Mapping, Integrated Gradients, and Deep Shapley Additive Explanations, to interpret and cross-validate our model's predictions with the explanations. Our analysis revealed that full-disk prediction of solar flares aligns with characteristics related to active regions (ARs). In particular, the key findings of this study are: (1) our deep learning models achieved an average TSS=0.51 and HSS=0.35, and the results further demonstrate a competent capability to predict near-limb solar flares and (2) the qualitative analysis of the model explanation indicates that our model identifies and uses features associated with ARs in central and near-limb locations from full-disk magnetograms to make corresponding predictions. In other words, our models learn the shape and texture-based characteristics of flaring ARs even at near-limb areas, which is a novel and critical capability with significant implications for operational forecasting.
    摘要 本文为日益增多的基于深度学习的太阳耀斑预测研究做出贡献,重点关注长期被忽视的近日面边缘耀斑,并利用归因方法为模型预测提供事后的定性解释。我们提出了一个太阳耀斑预测模型,该模型使用每小时的全日面视向磁图进行训练,并采用二分类预测模式,预测未来 24 小时内可能发生的 M 级及以上耀斑。为解决类别不平衡问题,我们结合使用了数据增强和类别加权技术,并使用真实技能统计量(TSS)和 Heidke 技能评分(HSS)评估模型的整体性能。此外,我们应用了三种归因方法,即引导梯度加权类激活映射(Guided Grad-CAM)、积分梯度(Integrated Gradients)和深度 Shapley 加性解释(Deep SHAP),来解释并交叉验证模型的预测。我们的分析表明,全日面太阳耀斑预测与活动区(AR)相关的特征是一致的。本研究的主要发现是:(1) 我们的深度学习模型取得了平均 TSS=0.51 和 HSS=0.35 的成绩,结果进一步表明其具备预测近日面边缘耀斑的能力;(2) 对模型解释的定性分析表明,模型能够从全日面磁图中识别并利用位于日面中心及边缘位置的活动区相关特征来做出相应预测。换言之,即使在近日面边缘区域,我们的模型也能学习耀斑活动区的形状和纹理特征,这是一种新颖且关键的能力,对业务化预报具有重要意义。
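One of the three attribution methods used above, Integrated Gradients, written out in a generic PyTorch form and applied to a stand-in single-channel classifier (the real model is a full-disk magnetogram CNN). The baseline choice, step count, and toy model are assumptions.

```python
import torch

def integrated_gradients(model, x, target_class, baseline=None, steps=50):
    """Plain Integrated Gradients: average gradients along a straight path from a
    baseline image to the input, then scale by (input - baseline)."""
    if baseline is None:
        baseline = torch.zeros_like(x)
    alphas = torch.linspace(0.0, 1.0, steps).view(-1, 1, 1, 1)
    path = baseline + alphas * (x - baseline)          # (steps, C, H, W) interpolated inputs
    path.requires_grad_(True)
    out = model(path)[:, target_class].sum()
    grads = torch.autograd.grad(out, path)[0]
    return (x - baseline) * grads.mean(dim=0)          # attribution map, same shape as x

# toy classifier standing in for the paper's full-disk model
model = torch.nn.Sequential(torch.nn.Conv2d(1, 4, 3, padding=1), torch.nn.AdaptiveAvgPool2d(1),
                            torch.nn.Flatten(), torch.nn.Linear(4, 2))
x = torch.randn(1, 64, 64)                             # stand-in single-channel "magnetogram"
attr = integrated_gradients(model, x, target_class=1)
print(attr.shape)   # torch.Size([1, 64, 64])
```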

GraphDAC: A Graph-Analytic Approach to Dynamic Airspace Configuration

  • paper_url: http://arxiv.org/abs/2307.15876
  • repo_url: https://github.com/kefenge2022/graphdac
  • paper_authors: Ke Feng, Dahai Liu, Yongxin Liu, Hong Liu, Houbing Song
  • for: 提高空域通行能力和对突发情况的响应能力
  • methods: 将空域构建为约束嵌入图,经维度压缩后使用基于谱聚类的自适应算法生成协作机场组,并在各组间均衡工作负载(见下方示例代码)
  • results: 在不同交通条件下,将工作负载不均衡程度降低了 50%
    Abstract The current National Airspace System (NAS) is reaching capacity due to increased air traffic, and is based on outdated pre-tactical planning. This study proposes a more dynamic airspace configuration (DAC) approach that could increase throughput and accommodate fluctuating traffic, ideal for emergencies. The proposed approach constructs the airspace as a constraints-embedded graph, compresses its dimensions, and applies a spectral clustering-enabled adaptive algorithm to generate collaborative airport groups and evenly distribute workloads among them. Under various traffic conditions, our experiments demonstrate a 50\% reduction in workload imbalances. This research could ultimately form the basis for a recommendation system for optimized airspace configuration. Code available at https://github.com/KeFenge2022/GraphDAC.git
    摘要 由于空中交通量不断增加,现行国家空域系统(NAS)正逼近容量上限,且其仍基于过时的预战术规划。本研究提出一种更具动态性的空域配置(DAC)方法,能够提升吞吐量并适应波动的交通量,尤其适用于突发情况。所提方法将空域构建为约束嵌入图,对其维度进行压缩,并应用基于谱聚类的自适应算法生成协作机场组,在各组之间均衡分配工作负载。在多种交通条件下,我们的实验表明工作负载不均衡程度降低了 50%。这项研究最终有望成为空域配置优化推荐系统的基础。代码可在 https://github.com/KeFenge2022/GraphDAC.git 获取。
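A toy version of the spectral-clustering step in the pipeline above: airports are grouped from a precomputed (constraints-embedded) similarity matrix, and per-group workloads can then be inspected. The similarity matrix, group count, and workload numbers are made up; GraphDAC's adaptive algorithm additionally balances the workloads.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

rng = np.random.default_rng(0)
n_airports = 30
W = rng.random((n_airports, n_airports))        # toy constraints-embedded similarity
W = (W + W.T) / 2                               # symmetrise
np.fill_diagonal(W, 0.0)

labels = SpectralClustering(n_clusters=5, affinity="precomputed",
                            assign_labels="kmeans", random_state=0).fit_predict(W)

workload = rng.integers(10, 100, size=n_airports)   # e.g. hourly operations per airport
for g in range(5):
    print(f"group {g}: airports={np.sum(labels == g)}, total workload={workload[labels == g].sum()}")
```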

Cross-dimensional transfer learning in medical image segmentation with deep learning

  • paper_url: http://arxiv.org/abs/2307.15872
  • repo_url: https://github.com/hic-messaoudi/cross-dimensional-transfer-learning-in-medical-image-segmentation-with-deep-learning
  • paper_authors: Hicham Messaoudi, Ahror Belaid, Douraied Ben Salem, Pierre-Henri Conze
  • for: 这篇论文的目的是将在自然图像上训练的 2D 分类网络的高效性迁移到 2D 与 3D、单模态及多模态医学图像分割任务中,以提升医学图像分割的精度和效率。
  • methods: 论文提出了两个关键原则:其一是权重迁移,通过将 2D 预训练编码器嵌入更高维的 U-Net 来实现;其二是维度迁移,通过将 2D 分割网络扩展到更高维度来实现(见下方示例代码)。
  • results: 实验与定性结果表明,这些方法能够提升 2D 与 3D 多模态医学图像分割的精度和效率。在 CAMUS 挑战中,论文的 2D 网络排名第一,超越了现有最佳方法;在 CHAOS 挑战的 2D/3D MR 与 CT 腹部图像上,该方法在 Dice、RAVD、ASSD 和 MSSD 指标上大幅优于其他基于 2D 的方法,并在在线评测平台上排名第三;在 BraTS 2022 比赛中,3D 网络也取得了良好结果,基于权重(维度)迁移的方法在整体肿瘤、肿瘤核心和增强肿瘤上的平均 Dice 分数分别为 91.69%(91.22%)、83.23%(84.77%)和 81.75%(83.88%)。
    Abstract Over the last decade, convolutional neural networks have emerged and advanced the state-of-the-art in various image analysis and computer vision applications. The performance of 2D image classification networks is constantly improving and being trained on databases made of millions of natural images. However, progress in medical image analysis has been hindered by limited annotated data and acquisition constraints. These limitations are even more pronounced given the volumetry of medical imaging data. In this paper, we introduce an efficient way to transfer the efficiency of a 2D classification network trained on natural images to 2D, 3D uni- and multi-modal medical image segmentation applications. In this direction, we designed novel architectures based on two key principles: weight transfer by embedding a 2D pre-trained encoder into a higher dimensional U-Net, and dimensional transfer by expanding a 2D segmentation network into a higher dimension one. The proposed networks were tested on benchmarks comprising different modalities: MR, CT, and ultrasound images. Our 2D network ranked first on the CAMUS challenge dedicated to echo-cardiographic data segmentation and surpassed the state-of-the-art. Regarding 2D/3D MR and CT abdominal images from the CHAOS challenge, our approach largely outperformed the other 2D-based methods described in the challenge paper on Dice, RAVD, ASSD, and MSSD scores and ranked third on the online evaluation platform. Our 3D network applied to the BraTS 2022 competition also achieved promising results, reaching an average Dice score of 91.69% (91.22%) for the whole tumor, 83.23% (84.77%) for the tumor core, and 81.75% (83.88%) for enhanced tumor using the approach based on weight (dimensional) transfer. Experimental and qualitative results illustrate the effectiveness of our methods for multi-dimensional medical image segmentation.
    摘要 在过去十年中,卷积神经网络在各类图像分析与计算机视觉应用中不断推进最先进水平。2D 图像分类网络的性能持续提升,并在包含数百万张自然图像的数据库上进行训练。然而,医学图像分析的进展受到标注数据有限和采集约束的制约,而医学影像数据的体量使这些限制更加突出。本文提出了一种高效的方法,将基于自然图像训练的 2D 分类网络的效率迁移到 2D、3D 单模态与多模态医学图像分割应用中。为此,我们基于两个关键原则设计了新的网络结构:一是权重迁移,即将 2D 预训练编码器嵌入更高维的 U-Net;二是维度迁移,即将 2D 分割网络扩展到更高维度。所提出的网络在包含 MR、CT 和超声等不同模态的基准数据集上进行了测试。我们的 2D 网络在面向超声心动图数据分割的 CAMUS 挑战中排名第一,超过了现有最佳方法。对于 CHAOS 挑战中的 2D/3D MR 与 CT 腹部图像,我们的方法在 Dice、RAVD、ASSD 和 MSSD 分数上大幅优于挑战论文中描述的其他基于 2D 的方法,并在在线评测平台上排名第三。我们的 3D 网络在 BraTS 2022 比赛中同样取得了可喜的结果:基于权重(维度)迁移的方法在整体肿瘤、肿瘤核心和增强肿瘤上的平均 Dice 分数分别达到 91.69%(91.22%)、83.23%(84.77%)和 81.75%(83.88%)。实验与定性结果展示了我们的方法在多维医学图像分割中的有效性。
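A small illustration of cross-dimensional weight transfer in the spirit described above: a pretrained 2D convolution kernel is inflated into a 3D kernel by replication along the new depth axis. The paper embeds an entire 2D encoder into a higher-dimensional U-Net rather than inflating single kernels, so this is only a sketch of the general trick.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

# take one 2D kernel and "inflate" it into a 3D kernel by replicating it along the new
# depth axis and rescaling, a common way to seed a 3D network from 2D weights
enc2d = resnet18(weights=None)             # swap in weights="IMAGENET1K_V1" for real pretrained weights
w2d = enc2d.conv1.weight.data              # (64, 3, 7, 7)

depth = 3
conv3d = nn.Conv3d(3, 64, kernel_size=(depth, 7, 7), padding=(1, 3, 3), bias=False)
w3d = w2d.unsqueeze(2).repeat(1, 1, depth, 1, 1) / depth   # (64, 3, 3, 7, 7)
conv3d.weight.data.copy_(w3d)

x = torch.randn(1, 3, 8, 224, 224)         # a small 3D volume (D=8)
print(conv3d(x).shape)                     # torch.Size([1, 64, 8, 224, 224])
```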

Efficient Semi-Supervised Federated Learning for Heterogeneous Participants

  • paper_url: http://arxiv.org/abs/2307.15870
  • repo_url: None
  • paper_authors: Zhipeng Sun, Yang Xu, Hongli Xu, Zhiyuan Wang
  • for: This paper proposes a novel system, called Pseudo-Clustering Semi-SFL, for training machine learning models in scenarios where labeled data reside on the server.
  • methods: The proposed system leverages semi-supervised techniques and clustering regularization to improve model performance under data non-IIDness. Additionally, a control algorithm for global updating frequency adaptation is developed to mitigate the training inconsistency.
  • results: The proposed system achieves a 3.3x speed-up in training time and reduces the communication cost by about 80.1% while reaching the target accuracy, and achieves up to 6.9% improvement in accuracy under non-IID scenarios compared to the state-of-the-art.
    Abstract Federated Learning (FL) has emerged to allow multiple clients to collaboratively train machine learning models on their private data. However, training and deploying large models for broader applications is challenging in resource-constrained environments. Fortunately, Split Federated Learning (SFL) offers an excellent solution by alleviating the computation and communication burden on the clients SFL often assumes labeled data for local training on clients, however, it is not the case in practice.Prior works have adopted semi-supervised techniques for leveraging unlabeled data in FL, but data non-IIDness poses another challenge to ensure training efficiency. Herein, we propose Pseudo-Clustering Semi-SFL, a novel system for training models in scenarios where labeled data reside on the server. By introducing Clustering Regularization, model performance under data non-IIDness can be improved. Besides, our theoretical and experimental investigations into model convergence reveal that the inconsistent training processes on labeled and unlabeled data impact the effectiveness of clustering regularization. Upon this, we develop a control algorithm for global updating frequency adaptation, which dynamically adjusts the number of supervised training iterations to mitigate the training inconsistency. Extensive experiments on benchmark models and datasets show that our system provides a 3.3x speed-up in training time and reduces the communication cost by about 80.1% while reaching the target accuracy, and achieves up to 6.9% improvement in accuracy under non-IID scenarios compared to the state-of-the-art.
    摘要 联邦学习(FL)的出现使多个客户端能够在各自的私有数据上协同训练机器学习模型。然而,在资源受限的环境中为更广泛的应用训练和部署大型模型仍然困难。幸运的是,拆分联邦学习(SFL)通过减轻客户端的计算与通信负担提供了很好的解决方案。SFL 通常假设客户端本地训练拥有带标签的数据,但实践中往往并非如此。先前的工作采用半监督技术在 FL 中利用无标签数据,但数据的非独立同分布(non-IID)特性又给训练效率带来了新的挑战。为此,我们提出 Pseudo-Clustering Semi-SFL,一种适用于标签数据位于服务器端场景的模型训练新系统。通过引入聚类正则化,可以改善数据非独立同分布情况下的模型性能。此外,我们对模型收敛性的理论与实验研究表明,有标签数据与无标签数据上不一致的训练过程会影响聚类正则化的效果。据此,我们设计了一种全局更新频率自适应的控制算法,动态调整有监督训练迭代次数以缓解训练不一致性。在基准模型和数据集上的大量实验表明,我们的系统在达到目标精度的同时,将训练时间加速 3.3 倍,通信成本降低约 80.1%,并且在非独立同分布场景下相比现有最佳方法最多提升 6.9% 的精度。

Faster Stochastic Algorithms for Minimax Optimization under Polyak–Łojasiewicz Conditions

  • paper_url: http://arxiv.org/abs/2307.15868
  • repo_url: https://github.com/truenobility303/spider-gda
  • paper_authors: Lesi Chen, Boyuan Yao, Luo Luo
  • for: 本文研究 Polyak-Łojasiewicz(PL)条件下极小极大优化问题的随机一阶算法。
  • methods: 我们提出了 SPIDER-GDA 算法来求解有限和形式的问题 $\min_x \max_y f(x,y)\triangleq \frac{1}{n} \sum_{i=1}^n f_i(x,y)$,其中目标函数 $f(x,y)$ 关于 $x$ 满足 $\mu_x$-PL、关于 $y$ 满足 $\mu_y$-PL,且每个 $f_i(x,y)$ 都是 $L$-平滑的。我们证明 SPIDER-GDA 能在 ${\mathcal O}\left((n + \sqrt{n}\,\kappa_x\kappa_y^2)\log (1/\epsilon)\right)$ 次随机一阶 oracle(SFO)调用内找到 $\epsilon$-最优解,优于现有最佳方法 ${\mathcal O}\big((n + n^{2/3}\kappa_x\kappa_y^2)\log (1/\epsilon)\big)$ 的 SFO 上界(见下方示例代码)。
  • results: 对病态(ill-conditioned)情形,我们给出了进一步降低计算代价的加速算法:当 $\kappa_y \gtrsim \sqrt{n}$ 时,其 SFO 上界为 $\tilde{\mathcal O}\big((n+\sqrt{n}\,\kappa_x\kappa_y)\log^2 (1/\epsilon)\big)$。该思想还可推广到目标函数仅对一个变量满足 PL 条件的更一般情形,实验验证了所提方法的优越性。
    Abstract This paper considers stochastic first-order algorithms for minimax optimization under Polyak--{\L}ojasiewicz (PL) conditions. We propose SPIDER-GDA for solving the finite-sum problem of the form $\min_x \max_y f(x,y)\triangleq \frac{1}{n} \sum_{i=1}^n f_i(x,y)$, where the objective function $f(x,y)$ is $\mu_x$-PL in $x$ and $\mu_y$-PL in $y$; and each $f_i(x,y)$ is $L$-smooth. We prove SPIDER-GDA could find an $\epsilon$-optimal solution within ${\mathcal O}\left((n + \sqrt{n}\,\kappa_x\kappa_y^2)\log (1/\epsilon)\right)$ stochastic first-order oracle (SFO) complexity, which is better than the state-of-the-art method whose SFO upper bound is ${\mathcal O}\big((n + n^{2/3}\kappa_x\kappa_y^2)\log (1/\epsilon)\big)$, where $\kappa_x\triangleq L/\mu_x$ and $\kappa_y\triangleq L/\mu_y$. For the ill-conditioned case, we provide an accelerated algorithm to reduce the computational cost further. It achieves $\tilde{\mathcal O}\big((n+\sqrt{n}\,\kappa_x\kappa_y)\log^2 (1/\epsilon)\big)$ SFO upper bound when $\kappa_y \gtrsim \sqrt{n}$. Our ideas also can be applied to the more general setting that the objective function only satisfies PL condition for one variable. Numerical experiments validate the superiority of proposed methods.
    摘要 本文研究 Polyak-Łojasiewicz(PL)条件下极小极大优化问题的随机一阶算法。我们提出 SPIDER-GDA 算法来求解有限和问题 $\min_x \max_y f(x,y) \triangleq \frac{1}{n} \sum_{i=1}^n f_i(x,y)$,其中目标函数 $f(x,y)$ 关于 $x$ 满足 $\mu_x$-PL、关于 $y$ 满足 $\mu_y$-PL,且每个 $f_i(x,y)$ 都是 $L$-平滑的。我们证明 SPIDER-GDA 可以在 ${\mathcal O}\left((n + \sqrt{n}\,\kappa_x\kappa_y^2)\log (1/\epsilon)\right)$ 的随机一阶 oracle(SFO)复杂度内找到 $\epsilon$-最优解,优于现有最佳方法 ${\mathcal O}\big((n + n^{2/3}\kappa_x\kappa_y^2)\log (1/\epsilon)\big)$ 的 SFO 上界,其中 $\kappa_x\triangleq L/\mu_x$,$\kappa_y\triangleq L/\mu_y$。针对病态情形,我们还给出了进一步降低计算代价的加速算法:当 $\kappa_y \gtrsim \sqrt{n}$ 时,其 SFO 上界为 $\tilde{\mathcal O}\big((n+\sqrt{n}\,\kappa_x\kappa_y)\log^2 (1/\epsilon)\big)$。我们的思想还可应用于目标函数仅对一个变量满足 PL 条件的更一般情形。数值实验验证了所提方法的优越性。
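A toy run of a SPIDER-style variance-reduced gradient descent-ascent loop on a simple strongly-convex-strongly-concave finite sum (a special case of the PL-PL setting): a periodic full gradient is refreshed and then recursively corrected with single-sample differences. The step sizes, epoch length, and the simultaneous update order are assumptions; the paper's SPIDER-GDA schedule differs in detail.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 64, 5
A = rng.normal(size=(n, d))
B = rng.normal(size=(n, d))

def grad_i(i, x, y):
    """Per-component gradients of f_i(x,y) = 0.5||x - A_i||^2 + x.y - 0.5||y - B_i||^2."""
    return (x - A[i]) + y, x - (y - B[i])

def full_grad(x, y):
    gs = [grad_i(i, x, y) for i in range(n)]
    return np.mean([g[0] for g in gs], axis=0), np.mean([g[1] for g in gs], axis=0)

x, y = np.zeros(d), np.zeros(d)
lr_x, lr_y, epoch_len = 0.05, 0.05, 16

vx, vy = full_grad(x, y)
for t in range(200):
    if t % epoch_len == 0:                      # periodic full-gradient refresh
        vx, vy = full_grad(x, y)
    x_new = x - lr_x * vx                       # descent step on x
    y_new = y + lr_y * vy                       # ascent step on y
    i = rng.integers(n)                         # SPIDER recursive correction with one sample
    gx_new, gy_new = grad_i(i, x_new, y_new)
    gx_old, gy_old = grad_i(i, x, y)
    vx, vy = vx + gx_new - gx_old, vy + gy_new - gy_old
    x, y = x_new, y_new

print(np.linalg.norm(full_grad(x, y)[0]))       # should be small near the saddle point
```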

Catching Elusive Depression via Facial Micro-Expression Recognition

  • paper_url: http://arxiv.org/abs/2307.15862
  • repo_url: None
  • paper_authors: Xiaohui Chen, Tie Luo
  • for: 这项研究旨在识别隐藏型抑郁症(Concealed Depression),通过识别面部微表情(Facial Micro-Expressions,FMEs)来检测和识别真正的情感表达。
  • methods: 该研究提出了一种基于面部特征点(facial landmarks)的感兴趣区域(Region-of-Interest,ROI)方法来应对识别 FMEs 的挑战。此外,该研究还提出了一种低成本、保护隐私的解决方案,允许用户在个人环境(如家中)使用便携式移动设备进行自我筛查。
  • results: 研究结果和发现表明,该方法可以有效地识别和检测隐藏型抑郁症。然而,在实际临床设置中,还需要解决一些技术挑战,以确保方法的可靠性和精度。
    Abstract Depression is a common mental health disorder that can cause consequential symptoms with continuously depressed mood that leads to emotional distress. One category of depression is Concealed Depression, where patients intentionally or unintentionally hide their genuine emotions through exterior optimism, thereby complicating and delaying diagnosis and treatment and leading to unexpected suicides. In this paper, we propose to diagnose concealed depression by using facial micro-expressions (FMEs) to detect and recognize underlying true emotions. However, the extremely low intensity and subtle nature of FMEs make their recognition a tough task. We propose a facial landmark-based Region-of-Interest (ROI) approach to address the challenge, and describe a low-cost and privacy-preserving solution that enables self-diagnosis using portable mobile devices in a personal setting (e.g., at home). We present results and findings that validate our method, and discuss other technical challenges and future directions in applying such techniques to real clinical settings.
    摘要 抑郁是一种常见的心理健康障碍,持续的低落情绪会带来严重的情感困扰。其中一类是隐藏型抑郁:患者有意或无意地用表面的乐观掩饰真实情绪,从而使诊断和治疗变得复杂且延误,甚至导致意外的自杀。本文提出利用面部微表情(FMEs)来检测和识别潜在的真实情绪,以诊断隐藏型抑郁。然而,FMEs 强度极低且极为细微,识别难度很大。我们提出一种基于面部特征点的感兴趣区域(ROI)方法来应对这一挑战,并给出一种低成本、保护隐私的解决方案,使用户能够在个人环境(如家中)借助便携式移动设备进行自我筛查。我们给出了验证该方法的结果与发现,并讨论了将此类技术应用于真实临床环境所面临的其他技术挑战与未来方向。

Multi-output Headed Ensembles for Product Item Classification

  • paper_url: http://arxiv.org/abs/2307.15858
  • repo_url: None
  • paper_authors: Hotaka Shiokawa, Pradipto Das, Arthur Toth, Justin Chiu
  • for: The paper is written for the problem of product item classification for large-scale e-commerce catalogs, specifically addressing the issue of poor generalization performance due to the unavailability of sizable curated training sets.
  • methods: The paper proposes an extensible deep learning based classification model framework that combines multiple classifiers and uses metadata features and low-level feature engineering to boost classification performance.
  • results: The paper shows improvements in classification performance against robust industry standard baseline models using hyperparameter optimization, and also proposes a novel way to evaluate model performance using user sessions that provides better insights in addition to traditional measures of precision and recall.
    Abstract In this paper, we revisit the problem of product item classification for large-scale e-commerce catalogs. The taxonomy of e-commerce catalogs consists of thousands of genres to which are assigned items that are uploaded by merchants on a continuous basis. The genre assignments by merchants are often wrong but treated as ground truth labels in automatically generated training sets, thus creating a feedback loop that leads to poorer model quality over time. This problem of taxonomy classification becomes highly pronounced due to the unavailability of sizable curated training sets. Under such a scenario it is common to combine multiple classifiers to combat poor generalization performance from a single classifier. We propose an extensible deep learning based classification model framework that benefits from the simplicity and robustness of averaging ensembles and fusion based classifiers. We are also able to use metadata features and low-level feature engineering to boost classification performance. We show these improvements against robust industry standard baseline models that employ hyperparameter optimization. Additionally, due to continuous insertion, deletion and updates to real-world high-volume e-commerce catalogs, assessing model performance for deployment using A/B testing and/or manual annotation becomes a bottleneck. To this end, we also propose a novel way to evaluate model performance using user sessions that provides better insights in addition to traditional measures of precision and recall.
    摘要 在这篇论文中,我们重新审视大规模电商目录中的商品条目分类问题。电商目录的类目体系包含数千个类别,商户持续上传的商品被分配到这些类别中。商户给出的类别分配常常是错误的,却在自动生成的训练集中被当作真实标签,从而形成反馈回路,使模型质量随时间不断下降。在缺乏大规模精心标注训练集的情况下,这一类目分类问题尤为突出;此时通常会组合多个分类器,以弥补单个分类器泛化性能不足的问题。我们提出了一个可扩展的基于深度学习的分类模型框架,兼具平均集成与融合式分类器的简单性和稳健性,并能利用元数据特征和底层特征工程来提升分类性能。与采用超参数优化的稳健工业基线模型相比,我们展示了这些改进。此外,由于真实的大规模电商目录持续发生插入、删除和更新,使用 A/B 测试和/或人工标注来评估模型的部署表现会成为瓶颈;为此,我们还提出了一种利用用户会话评估模型性能的新方法,除传统的精确率和召回率之外,还能提供更深入的洞察。
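A minimal sketch of an averaging, multi-headed classifier of the kind the framework above combines: a shared encoder feeds several heads whose softmax outputs are averaged. The input features, dimensions, and number of heads are hypothetical.

```python
import torch
import torch.nn as nn

class MultiHeadEnsemble(nn.Module):
    """A shared encoder with several classification heads whose softmax outputs are
    averaged - the averaging-ensemble flavour of the paper's framework, on
    hypothetical item-feature inputs."""
    def __init__(self, in_dim, n_classes, n_heads=3, hidden=128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.heads = nn.ModuleList([nn.Linear(hidden, n_classes) for _ in range(n_heads)])

    def forward(self, x):
        z = self.encoder(x)
        probs = torch.stack([head(z).softmax(dim=-1) for head in self.heads])
        return probs.mean(dim=0)            # averaged ensemble prediction

model = MultiHeadEnsemble(in_dim=300, n_classes=5000)   # e.g. title embedding -> genre
x = torch.randn(4, 300)                                 # a small batch of item features
pred_genre = model(x).argmax(dim=-1)
print(pred_genre.shape)   # torch.Size([4])
```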

Improving Realistic Worst-Case Performance of NVCiM DNN Accelerators through Training with Right-Censored Gaussian Noise

  • paper_url: http://arxiv.org/abs/2307.15853
  • repo_url: None
  • paper_authors: Zheyu Yan, Yifan Qin, Wujie Wen, Xiaobo Sharon Hu, Yiyu Shi
  • for: 提高深度神经网络(DNN)加速器的可靠性和稳定性,适用于安全关键应用场景如自动驾驶车辆。
  • methods: 利用k-th percentile性能(KPP)来刻画DNN模型在存内计算(CiM)加速器上的实际最坏情况性能,并通过形式化分析和噪声注入训练来提升KPP。
  • results: 提出一种自动确定注入右删失(right-censored)高斯噪声超参数的方法,与现有方法相比,KPP最高可提升26%。
    Abstract Compute-in-Memory (CiM), built upon non-volatile memory (NVM) devices, is promising for accelerating deep neural networks (DNNs) owing to its in-situ data processing capability and superior energy efficiency. Unfortunately, the well-trained model parameters, after being mapped to NVM devices, can often exhibit large deviations from their intended values due to device variations, resulting in notable performance degradation in these CiM-based DNN accelerators. There exists a long list of solutions to address this issue. However, they mainly focus on improving the mean performance of CiM DNN accelerators. How to guarantee the worst-case performance under the impact of device variations, which is crucial for many safety-critical applications such as self-driving cars, has been far less explored. In this work, we propose to use the k-th percentile performance (KPP) to capture the realistic worst-case performance of DNN models executing on CiM accelerators. Through a formal analysis of the properties of KPP and the noise injection-based DNN training, we demonstrate that injecting a novel right-censored Gaussian noise, as opposed to the conventional Gaussian noise, significantly improves the KPP of DNNs. We further propose an automated method to determine the optimal hyperparameters for injecting this right-censored Gaussian noise during the training process. Our method achieves up to a 26% improvement in KPP compared to the state-of-the-art methods employed to enhance DNN robustness under the impact of device variations.
    摘要 基于非易失性存储器(NVM)器件的存内计算(CiM)凭借原位数据处理能力和出色的能效,在加速深度神经网络(DNN)方面很有前景。然而,训练好的模型参数在映射到NVM器件后,常常因器件差异而明显偏离预期值,导致这类CiM DNN加速器的性能显著下降。现有大量解决方案主要关注提升CiM DNN加速器的平均性能,而如何在器件差异影响下保证最坏情况性能——这对自动驾驶等许多安全关键应用至关重要——却鲜有研究。在这项工作中,我们提出使用k-th percentile性能(KPP)来刻画DNN模型在CiM加速器上的实际最坏情况性能。通过对KPP性质和基于噪声注入的DNN训练的形式化分析,我们证明注入一种新的右删失(right-censored)高斯噪声(而非常规高斯噪声)可以显著提升DNN的KPP。我们进一步提出一种在训练过程中自动确定该噪声注入最优超参数的方法。与目前用于增强DNN抗器件差异鲁棒性的最先进方法相比,我们的方法可将KPP最高提升26%。
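
    Code sketch (hedged, not the authors' code): one common reading of "right-censored" Gaussian noise is that samples above a censoring point are clipped down to it, so the injected perturbation has no right tail. The NumPy sketch below injects such noise into a weight matrix during a noise-aware forward pass; the censoring point and noise scale are illustrative assumptions.
```python
import numpy as np

def right_censored_gaussian(shape, sigma=0.05, censor_point=0.0, rng=None):
    """Sample Gaussian noise and censor (clip) everything above `censor_point`."""
    rng = rng or np.random.default_rng(0)
    noise = rng.normal(loc=0.0, scale=sigma, size=shape)
    return np.minimum(noise, censor_point)

def noisy_forward(weights, x, sigma=0.05, censor_point=0.0):
    """One noise-injected linear layer: perturb the weights as if they had been
    mapped onto variation-prone NVM devices, then compute a ReLU activation."""
    w_noisy = weights + right_censored_gaussian(weights.shape, sigma, censor_point)
    return np.maximum(w_noisy @ x, 0.0)

w = np.random.default_rng(1).normal(size=(16, 8))
x = np.random.default_rng(2).normal(size=8)
print(noisy_forward(w, x).shape)   # (16,)
```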

Comprehensive Algorithm Portfolio Evaluation using Item Response Theory

  • paper_url: http://arxiv.org/abs/2307.15850
  • repo_url: https://github.com/sevvandi/airt-scripts
  • paper_authors: Sevvandi Kandanaarachchi, Kate Smith-Miles
  • for: 在一个数据集库上评估一组机器学习算法的表现,同时刻画算法的一致性和异常性等特征。
  • methods: 使用修改后的 Item Response Theory(IRT)模型,无需额外的数据集特征计算,即可获得更丰富的算法性能特征。
  • results: 在多个应用领域的算法组合上进行了测试,证明了该方法的广泛适用性和可解释性。
    Abstract Item Response Theory (IRT) has been proposed within the field of Educational Psychometrics to assess student ability as well as test question difficulty and discrimination power. More recently, IRT has been applied to evaluate machine learning algorithm performance on a single classification dataset, where the student is now an algorithm, and the test question is an observation to be classified by the algorithm. In this paper we present a modified IRT-based framework for evaluating a portfolio of algorithms across a repository of datasets, while simultaneously eliciting a richer suite of characteristics - such as algorithm consistency and anomalousness - that describe important aspects of algorithm performance. These characteristics arise from a novel inversion and reinterpretation of the traditional IRT model without requiring additional dataset feature computations. We test this framework on algorithm portfolios for a wide range of applications, demonstrating the broad applicability of this method as an insightful algorithm evaluation tool. Furthermore, the explainable nature of IRT parameters yield an increased understanding of algorithm portfolios.
    摘要 item response theory (IRT) 在教育心理测量领域提出来评估学生能力以及测试题目难度和抗択力。更加最近,IRT 被应用于评估单一分类 dataset 上机器学习算法的性能,其中学生现在是一个算法,测试题目是一个需要被分类的观察。在这篇文章中,我们提出了一种基于 IRT 的修改后的框架,用于评估一个数据库中的算法投资组,同时同时抽取一系列特征,例如算法一致性和异常性,这些特征描述了算法性能中重要的一些方面。这些特征来自于传统 IRT 模型的新的倒推和重新解释,不需要额外的数据特征计算。我们在各种应用领域测试了这种框架, demonstarting its 广泛适用性作为一种深入的算法评估工具。此外,可解释的 IRT 参数带来了对算法投资组的更好的理解。
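
    Code sketch (illustrative only): fitting a standard two-parameter IRT model to a binary algorithm-by-dataset performance matrix by gradient ascent on the Bernoulli log-likelihood, P(correct) = sigmoid(a_j(theta_i - b_j)). The data are synthetic and the hyperparameters are made up; the paper's framework inverts and reinterprets the traditional model rather than fitting it this way.
```python
import numpy as np

rng = np.random.default_rng(0)
Y = rng.integers(0, 2, size=(10, 40)).astype(float)  # algorithms x datasets, 1 = "solved"

theta = np.zeros(10)   # algorithm ability
b = np.zeros(40)       # dataset difficulty
a = np.ones(40)        # dataset discrimination
lr = 0.05

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for _ in range(500):
    logits = a[None, :] * (theta[:, None] - b[None, :])   # 2PL model
    p = sigmoid(logits)
    err = Y - p                                           # d(log-likelihood)/d(logit)
    theta += lr * (err * a[None, :]).sum(axis=1) / Y.shape[1]
    b     -= lr * (err * a[None, :]).sum(axis=0) / Y.shape[0]
    a     += lr * (err * (theta[:, None] - b[None, :])).sum(axis=0) / Y.shape[0]

print("most discriminating datasets:", np.argsort(-a)[:5])
```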

Quantum Kernel Estimation With Neutral Atoms For Supervised Classification: A Gate-Based Approach

  • paper_url: http://arxiv.org/abs/2307.15840
  • repo_url: None
  • paper_authors: Marco Russo, Edoardo Giusto, Bartolomeo Montrucchio
  • for: 本文研究量子核估计(Quantum Kernel Estimation,QKE)技术,即利用量子计算机估计经典难以计算的核函数,再由经典计算机用其训练支持向量机(SVM)。由于实现难以经典模拟的特征映射需要大量2-局部算符,因而需要较高的量子比特连接度,而当前的超导设备难以满足;为此,本文采用允许更自由地排布原子的中性原子量子计算机。
  • methods: 本文提出了一种基于门模型的通用方法:先从激光脉冲推导出单量子比特门和双量子比特门,再在3个量子比特上构造参数化的特征映射序列,用该序列从数据集经验地计算核矩阵,最后训练SVM。文中还说明了如何利用中性原子器件更灵活的原子排布,将这一流程推广到N个量子比特。
  • results: 实验结果表明,尽管数据集较小且可分性较低,基于中性原子量子计算机和门模型的方法仍能使SVM达到较高的准确率。这是首篇既显式推导通用门集合、又给出在中性原子器件上用门模型对一般问题进行量子核估计方法的工作。
    Abstract Quantum Kernel Estimation (QKE) is a technique based on leveraging a quantum computer to estimate a kernel function that is classically difficult to calculate, which is then used by a classical computer for training a Support Vector Machine (SVM). Given the high number of 2-local operators necessary for realizing a feature mapping hard to simulate classically, a high qubit connectivity is needed, which is not currently possible on superconducting devices. For this reason, neutral atom quantum computers can be used, since they allow to arrange the atoms with more freedom. Examples of neutral-atom-based QKE can be found in the literature, but they are focused on graph learning and use the analogue approach. In this paper, a general method based on the gate model is presented. After deriving 1-qubit and 2-qubit gates starting from laser pulses, a parameterized sequence for feature mapping on 3 qubits is realized. This sequence is then used to empirically compute the kernel matrix starting from a dataset, which is finally used to train the SVM. It is also shown that this process can be generalized up to N qubits taking advantage of the more flexible arrangement of atoms that this technology allows. The accuracy is shown to be high despite the small dataset and the low separation. This is the first paper that not only proposes an algorithm for explicitly deriving a universal set of gates but also presents a method of estimating quantum kernels on neutral atom devices for general problems using the gate model.
    摘要 量子核估计(QKE)是一种利用量子计算机估计经典上难以计算的核函数的技术,估计得到的核随后由经典计算机用于训练支持向量机(SVM)。由于实现经典上难以模拟的特征映射需要大量2-局部算符,因而需要较高的量子比特连接度,而目前的超导器件无法满足这一要求。为此可以使用中性原子量子计算机,因为它允许更自由地排布原子。文献中已有基于中性原子的QKE实例,但它们聚焦于图学习并采用模拟(analogue)方式。本文则提出一种基于门模型的通用方法:先从激光脉冲推导出单量子比特门和双量子比特门,再在3个量子比特上实现参数化的特征映射序列;随后用该序列从数据集经验地计算核矩阵,并最终用于训练SVM。文中还表明,借助该技术允许的更灵活的原子排布,这一流程可以推广到N个量子比特。尽管数据集较小且可分性较低,准确率仍然较高。这是第一篇不仅提出显式推导通用门集合的算法,而且给出在中性原子器件上用门模型针对一般问题估计量子核的方法的论文。
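
    Code sketch of the classical half of QKE (hedged): once kernel entries have been estimated on hardware, an SVM can be trained with scikit-learn's precomputed-kernel interface. Here the quantum feature map is stood in for by an arbitrary classical simulation of a fidelity-style overlap; it is not the paper's 3-qubit pulse-derived circuit.
```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(int)

def feature_map(x):
    # Stand-in for a quantum feature map |phi(x)>; on hardware each kernel entry
    # |<phi(x_i)|phi(x_j)>|^2 would instead be estimated from measurement counts.
    angles = np.array([x[0], x[1], x[0] * x[1]])
    state = np.concatenate([np.cos(angles), np.sin(angles)])
    return state / np.linalg.norm(state)

def kernel_matrix(A, B):
    FA = np.stack([feature_map(a) for a in A])
    FB = np.stack([feature_map(b) for b in B])
    return (FA @ FB.T) ** 2        # fidelity-style squared overlap

K_train = kernel_matrix(X, X)
clf = SVC(kernel="precomputed").fit(K_train, y)

X_test = rng.normal(size=(10, 2))
K_test = kernel_matrix(X_test, X)  # rows: test points, columns: training points
print(clf.predict(K_test))
```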

Holistic Survey of Privacy and Fairness in Machine Learning

  • paper_url: http://arxiv.org/abs/2307.15838
  • repo_url: None
  • paper_authors: Sina Shaham, Arash Hajisafi, Minh K Quan, Dinh C Nguyen, Bhaskar Krishnamachari, Charith Peris, Gabriel Ghinita, Cyrus Shahabi, Pubudu N. Pathirana
  • for: 本文旨在探讨负责任人工智能(AI)和可信机器学习(ML)中的隐私与公平问题,以及这两个目标如何被同时整合到ML模型中。
  • methods: 本文系统回顾了隐私与公平在监督学习、无监督学习、半监督学习和强化学习中的研究,以及它们在各应用领域中的相互作用。
  • results: 本文综合现有研究成果,梳理了隐私对公平、公平对隐私的影响,以及在尽量减少效用损失的前提下同时实现两者的方法;同时也指出了在大型语言模型中同时实现隐私与公平所面临的研究挑战。
    Abstract Privacy and fairness are two crucial pillars of responsible Artificial Intelligence (AI) and trustworthy Machine Learning (ML). Each objective has been independently studied in the literature with the aim of reducing utility loss in achieving them. Despite the significant interest attracted from both academia and industry, there remains an immediate demand for more in-depth research to unravel how these two objectives can be simultaneously integrated into ML models. As opposed to well-accepted trade-offs, i.e., privacy-utility and fairness-utility, the interrelation between privacy and fairness is not well-understood. While some works suggest a trade-off between the two objective functions, there are others that demonstrate the alignment of these functions in certain scenarios. To fill this research gap, we provide a thorough review of privacy and fairness in ML, including supervised, unsupervised, semi-supervised, and reinforcement learning. After examining and consolidating the literature on both objectives, we present a holistic survey on the impact of privacy on fairness, the impact of fairness on privacy, existing architectures, their interaction in application domains, and algorithms that aim to achieve both objectives while minimizing the utility sacrificed. Finally, we identify research challenges in achieving privacy and fairness concurrently in ML, particularly focusing on large language models.
    摘要 隐私和公平是负责任人工智能(AI)与可信机器学习(ML)的两大关键支柱。文献中通常分别研究这两个目标,以求在实现它们时尽量减少效用损失。尽管学术界和产业界对此表现出极大兴趣,但如何将这两个目标同时整合到ML模型中,仍亟需更深入的研究。与广为接受的隐私-效用、公平-效用权衡不同,隐私与公平之间的相互关系尚未被充分理解:一些工作表明两者之间存在权衡,另一些工作则表明两者在特定场景下是一致的。为填补这一研究空白,我们对ML中的隐私与公平进行了全面综述,涵盖监督、无监督、半监督和强化学习。在梳理并整合这两方面的文献之后,我们系统综述了隐私对公平的影响、公平对隐私的影响、现有架构、二者在各应用领域中的相互作用,以及在尽量减少效用牺牲的同时实现两个目标的算法。最后,我们指出了在ML中同时实现隐私与公平的研究挑战,并特别关注大型语言模型。

Mean Estimation with User-level Privacy under Data Heterogeneity

  • paper_url: http://arxiv.org/abs/2307.15835
  • repo_url: None
  • paper_authors: Rachel Cummings, Vitaly Feldman, Audra McMillan, Kunal Talwar
  • for: Handle heterogeneous user data with different distribution and quantity of data while preserving user-level differential privacy.
  • methods: Propose a simple model of heterogeneous user data and an estimator that achieves asymptotic optimality with proven lower bounds on error.
  • results: Demonstrate the effectiveness of the proposed method through theoretical analysis and prove the asymptotic optimality and lower bounds on error.
    Abstract A key challenge in many modern data analysis tasks is that user data are heterogeneous. Different users may possess vastly different numbers of data points. More importantly, it cannot be assumed that all users sample from the same underlying distribution. This is true, for example in language data, where different speech styles result in data heterogeneity. In this work we propose a simple model of heterogeneous user data that allows user data to differ in both distribution and quantity of data, and provide a method for estimating the population-level mean while preserving user-level differential privacy. We demonstrate asymptotic optimality of our estimator and also prove general lower bounds on the error achievable in the setting we introduce.
    摘要 现代数据分析任务中的一个关键挑战是用户数据的异质性:不同用户拥有的数据点数量可能差异极大,更重要的是,不能假设所有用户的数据都来自同一个基础分布。语言数据便是一个例子,不同的说话风格会导致数据异质。在这项工作中,我们提出了一个简单的异质用户数据模型,允许用户数据在分布和数据量上都不相同,并给出了一种在保证用户级差分隐私的前提下估计总体均值的方法。我们证明了该估计量的渐近最优性,并给出了在该设定下可达误差的一般下界。
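
    Code sketch (hedged, not the paper's estimator): one standard recipe for user-level DP mean estimation is to compute each user's local mean, clip it to a bounded range, average across users, and add Gaussian noise calibrated to the per-user sensitivity. The clipping bound and privacy parameters below are illustrative; the paper's estimator for heterogeneous data is more refined.
```python
import numpy as np

def user_level_dp_mean(user_data, clip=1.0, epsilon=1.0, delta=1e-5, rng=None):
    """user_data: list of 1-D arrays, one per user (possibly different lengths and
    distributions). Returns a user-level (epsilon, delta)-DP estimate of the mean."""
    rng = rng or np.random.default_rng(0)
    user_means = np.array([np.clip(np.mean(u), -clip, clip) for u in user_data])
    n = len(user_means)
    # Replacing one user's entire data changes the average of clipped means by at
    # most 2*clip/n, so calibrate Gaussian-mechanism noise to that sensitivity.
    sensitivity = 2.0 * clip / n
    sigma = sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon
    return user_means.mean() + rng.normal(0.0, sigma)

rng = np.random.default_rng(1)
users = [rng.normal(loc=0.3, scale=1.0, size=rng.integers(5, 200)) for _ in range(500)]
print(user_level_dp_mean(users))
```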

DeepTSF: Codeless machine learning operations for time series forecasting

  • paper_url: http://arxiv.org/abs/2308.00709
  • repo_url: None
  • paper_authors: Sotiris Pelekis, Evangelos Karakolis, Theodosios Pountridis, George Kormpakis, George Lampropoulos, Spiros Mouzakits, Dimitris Askounis
  • for: 这篇论文旨在提供一个通用的机器学习操作(MLOps)框架,以创新时间序列预测(TS)领域。
  • methods: 这篇论文使用了深度学习(DL)和机器学习(ML)方法,并自动化了运算和模型化的过程,以提高资料科学家和机器学习工程师的生产力和效率。
  • results: 这篇论文在实际应用中已经证明了 DeepTSF 的有效性,并且在电力和能源系统领域中展示了它的重要加值。
    Abstract This paper presents DeepTSF, a comprehensive machine learning operations (MLOps) framework aiming to innovate time series forecasting through workflow automation and codeless modeling. DeepTSF automates key aspects of the ML lifecycle, making it an ideal tool for data scientists and MLops engineers engaged in machine learning (ML) and deep learning (DL)-based forecasting. DeepTSF empowers users with a robust and user-friendly solution, while it is designed to seamlessly integrate with existing data analysis workflows, providing enhanced productivity and compatibility. The framework offers a front-end user interface (UI) suitable for data scientists, as well as other higher-level stakeholders, enabling comprehensive understanding through insightful visualizations and evaluation metrics. DeepTSF also prioritizes security through identity management and access authorization mechanisms. The application of DeepTSF in real-life use cases of the I-NERGY project has already proven DeepTSF's efficacy in DL-based load forecasting, showcasing its significant added value in the electrical power and energy systems domain.
    摘要 DeepTSF provides a robust and user-friendly solution that seamlessly integrates with existing data analysis workflows, enhancing productivity and compatibility. The framework offers a front-end user interface (UI) suitable for data scientists and other higher-level stakeholders, providing insightful visualizations and evaluation metrics for comprehensive understanding.In addition, DeepTSF prioritizes security through identity management and access authorization mechanisms. The application of DeepTSF in real-life use cases of the I-NERGY project has already demonstrated its efficacy in DL-based load forecasting, showcasing its significant added value in the electrical power and energy systems domain.

A Distance Correlation-Based Approach to Characterize the Effectiveness of Recurrent Neural Networks for Time Series Forecasting

  • paper_url: http://arxiv.org/abs/2307.15830
  • repo_url: None
  • paper_authors: Christopher Salazar, Ashis G. Banerjee
  • for: 这个论文主要针对时间序列预测问题,尤其是使用循环神经网络(RNN)模型来解决这个问题。
  • methods: 该论文使用距离相关这一度量,将时间序列的特征与RNN各激活层联系起来,从而解释和说明RNN的性能。
  • results: 研究发现,RNN的激活层能够较好地学习时间序列的滞后结构,但这一信息会在随后几层中逐渐丢失,导致对滞后结构较长的序列预测质量下降;此外,激活层也难以充分刻画移动平均和异方差时间序列过程。
    Abstract Time series forecasting has received a lot of attention with recurrent neural networks (RNNs) being one of the widely used models due to their ability to handle sequential data. Prior studies of RNNs for time series forecasting yield inconsistent results with limited insights as to why the performance varies for different datasets. In this paper, we provide an approach to link the characteristics of time series with the components of RNNs via the versatile metric of distance correlation. This metric allows us to examine the information flow through the RNN activation layers to be able to interpret and explain their performance. We empirically show that the RNN activation layers learn the lag structures of time series well. However, they gradually lose this information over a span of a few consecutive layers, thereby worsening the forecast quality for series with large lag structures. We also show that the activation layers cannot adequately model moving average and heteroskedastic time series processes. Last, we generate heatmaps for visual comparisons of the activation layers for different choices of the network hyperparameters to identify which of them affect the forecast performance. Our findings can, therefore, aid practitioners in assessing the effectiveness of RNNs for given time series data without actually training and evaluating the networks.
    摘要 时间序列预测受到广泛关注,循环神经网络(RNN)因其处理序列数据的能力而成为常用模型之一。以往关于RNN时间序列预测的研究结果并不一致,而且对于性能为何随数据集不同而变化的解释有限。在这篇论文中,我们借助距离相关这一灵活的度量,将时间序列的特征与RNN的组成部分联系起来。该度量使我们能够考察信息在RNN激活层中的流动,从而解释和说明其性能。我们的实验表明,RNN激活层能够较好地学习时间序列的滞后结构,但这一信息会在连续几层之间逐渐丢失,从而使滞后结构较长的序列预测质量变差。我们还表明,激活层无法充分建模移动平均和异方差时间序列过程。最后,我们针对不同的网络超参数选择生成激活层的热力图进行可视化比较,以识别哪些超参数会影响预测性能。因此,我们的发现可以帮助实践者在不实际训练和评估网络的情况下,评估RNN对给定时间序列数据的有效性。
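
    Code sketch (illustrative only): a NumPy implementation of the (biased-sample) distance correlation statistic, which could be used to compare a lagged input series with an RNN's hidden activations. The toy "activations" below are synthetic; this is not the authors' pipeline.
```python
import numpy as np

def _centered_distances(a):
    a = a.reshape(len(a), -1)
    d = np.linalg.norm(a[:, None, :] - a[None, :, :], axis=-1)
    return d - d.mean(axis=0) - d.mean(axis=1, keepdims=True) + d.mean()

def distance_correlation(x, y):
    """Sample distance correlation (biased estimator) between two samples (rows = observations)."""
    A, B = _centered_distances(x), _centered_distances(y)
    dcov2 = (A * B).mean()
    dvar_x, dvar_y = (A * A).mean(), (B * B).mean()
    return np.sqrt(dcov2 / np.sqrt(dvar_x * dvar_y + 1e-12))

# Toy check: hidden activations that retain a noisy copy of the lag-3 input correlate strongly.
rng = np.random.default_rng(0)
series = rng.normal(size=500)
x_past = series[:-3]                                        # input three steps back
h = np.stack([x_past + 0.1 * rng.normal(size=len(x_past)),  # pretend RNN hidden units
              rng.normal(size=len(x_past))], axis=1)
print(distance_correlation(x_past, h))                      # close to 1
```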

RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control

  • paper_url: http://arxiv.org/abs/2307.15818
  • repo_url: None
  • paper_authors: Anthony Brohan, Noah Brown, Justice Carbajal, Yevgen Chebotar, Xi Chen, Krzysztof Choromanski, Tianli Ding, Danny Driess, Avinava Dubey, Chelsea Finn, Pete Florence, Chuyuan Fu, Montse Gonzalez Arenas, Keerthana Gopalakrishnan, Kehang Han, Karol Hausman, Alexander Herzog, Jasmine Hsu, Brian Ichter, Alex Irpan, Nikhil Joshi, Ryan Julian, Dmitry Kalashnikov, Yuheng Kuang, Isabel Leal, Lisa Lee, Tsang-Wei Edward Lee, Sergey Levine, Yao Lu, Henryk Michalewski, Igor Mordatch, Karl Pertsch, Kanishka Rao, Krista Reymann, Michael Ryoo, Grecia Salazar, Pannag Sanketi, Pierre Sermanet, Jaspiar Singh, Anikait Singh, Radu Soricut, Huong Tran, Vincent Vanhoucke, Quan Vuong, Ayzaan Wahid, Stefan Welker, Paul Wohlhart, Jialin Wu, Fei Xia, Ted Xiao, Peng Xu, Sichun Xu, Tianhe Yu, Brianna Zitkovich
  • for: 这篇论文的目标是将视觉-语言模型直接整合进端到端机器人控制,以提升泛化能力并获得涌现的语义推理能力。
  • methods: 作者提出了一种简单而通用的方法:将机器人动作表示为文本token,并像自然语言token一样直接纳入模型的训练集中。
  • results: 该方法得到了高性能的机器人控制策略,并使模型获得一系列涌现能力,例如对新物体的泛化、理解机器人训练数据中未出现的指令(如把物体放到特定数字或图标上),以及对用户指令进行简单推理(如拿起最小或最大的物体,或离另一物体最近的物体)。
    Abstract We study how vision-language models trained on Internet-scale data can be incorporated directly into end-to-end robotic control to boost generalization and enable emergent semantic reasoning. Our goal is to enable a single end-to-end trained model to both learn to map robot observations to actions and enjoy the benefits of large-scale pretraining on language and vision-language data from the web. To this end, we propose to co-fine-tune state-of-the-art vision-language models on both robotic trajectory data and Internet-scale vision-language tasks, such as visual question answering. In contrast to other approaches, we propose a simple, general recipe to achieve this goal: in order to fit both natural language responses and robotic actions into the same format, we express the actions as text tokens and incorporate them directly into the training set of the model in the same way as natural language tokens. We refer to such category of models as vision-language-action models (VLA) and instantiate an example of such a model, which we call RT-2. Our extensive evaluation (6k evaluation trials) shows that our approach leads to performant robotic policies and enables RT-2 to obtain a range of emergent capabilities from Internet-scale training. This includes significantly improved generalization to novel objects, the ability to interpret commands not present in the robot training data (such as placing an object onto a particular number or icon), and the ability to perform rudimentary reasoning in response to user commands (such as picking up the smallest or largest object, or the one closest to another object). We further show that incorporating chain of thought reasoning allows RT-2 to perform multi-stage semantic reasoning, for example figuring out which object to pick up for use as an improvised hammer (a rock), or which type of drink is best suited for someone who is tired (an energy drink).
    摘要 我们研究如何将基于互联网规模数据训练的视觉-语言模型直接整合进端到端机器人控制,以提升泛化能力并获得涌现的语义推理能力。我们的目标是让单个端到端训练的模型既能学习把机器人观测映射为动作,又能受益于对网络上语言和视觉-语言数据的大规模预训练。为此,我们提出在机器人轨迹数据和互联网规模的视觉-语言任务(如视觉问答)上对最先进的视觉-语言模型进行联合微调。与其他方法不同,我们给出了一个简单而通用的方案:为了把自然语言回复和机器人动作放进同一种格式,我们把动作表示为文本token,并像自然语言token一样直接纳入模型的训练集。我们把这类模型称为视觉-语言-动作模型(VLA),并实例化了其中一个模型,称为RT-2。我们的大规模评估(6000次评估试验)表明,该方法能得到高性能的机器人策略,并使RT-2从互联网规模的训练中获得一系列涌现能力,包括对新物体的显著更强的泛化、理解机器人训练数据中未出现的指令(如把物体放到特定数字或图标上),以及针对用户指令进行初步推理(如拿起最小或最大的物体,或离另一物体最近的物体)。我们进一步表明,引入思维链推理后,RT-2能够进行多步语义推理,例如判断哪个物体可以当作临时的锤子(一块石头),或哪种饮料最适合疲惫的人(能量饮料)。

Multi-growth stage plant recognition: a case study of Palmer amaranth (Amaranthus palmeri) in cotton (Gossypium hirsutum)

  • paper_url: http://arxiv.org/abs/2307.15816
  • repo_url: None
  • paper_authors: Guy RY Coleman, Matthew Kutugata, Michael J Walsh, Muthukumar Bagavathiannan
  • for: 这个论文旨在测试不同版本的YOLO框架在识别不同生长阶段的amaranthus palmeri中的性能。
  • methods: 该论文使用了YOLO框架的26种不同变体,并对其进行了测试和比较,以评估它们在识别不同生长阶段的表现。
  • results: 在八类生长阶段识别中,v8-X取得了最高的mAP@[0.5:0.95],为47.34%;将所有生长阶段合并为单一类别后性能提升,最高mAP@[0.5:0.95]达67.05%。按视觉或尺寸特征合并类别同样有助于提升模型性能。
    Abstract Many advanced, image-based precision agricultural technologies for plant breeding, field crop research, and site-specific crop management hinge on the reliable detection and phenotyping of plants across highly variable morphological growth stages. Convolutional neural networks (CNNs) have shown promise for image-based plant phenotyping and weed recognition, but their ability to recognize growth stages, often with stark differences in appearance, is uncertain. Amaranthus palmeri (Palmer amaranth) is a particularly challenging weed plant in cotton (Gossypium hirsutum) production, exhibiting highly variable plant morphology both across growth stages over a growing season, as well as between plants at a given growth stage due to high genetic diversity. In this paper, we investigate eight-class growth stage recognition of A. palmeri in cotton as a challenging model for You Only Look Once (YOLO) architectures. We compare 26 different architecture variants from YOLO v3, v5, v6, v6 3.0, v7, and v8 on an eight-class growth stage dataset of A. palmeri. The highest mAP@[0.5:0.95] for recognition of all growth stage classes was 47.34% achieved by v8-X, with inter-class confusion across visually similar growth stages. With all growth stages grouped as a single class, performance increased, with a maximum mean average precision (mAP@[0.5:0.95]) of 67.05% achieved by v7-Original. Single class recall of up to 81.42% was achieved by v5-X, and precision of up to 89.72% was achieved by v8-X. Class activation maps (CAM) were used to understand model attention on the complex dataset. Fewer classes, grouped by visual or size features improved performance over the ground-truth eight-class dataset. Successful growth stage detection highlights the substantial opportunity for improving plant phenotyping and weed recognition technologies with open-source object detection architectures.
    摘要 许多先进的基于图像的精准农业技术(用于植物育种、大田作物研究和精细化作物管理)都依赖于对形态差异极大的不同生长阶段植物进行可靠的检测和表型分析。卷积神经网络(CNN)在基于图像的植物表型分析和杂草识别中展现出潜力,但它们识别外观差异显著的不同生长阶段的能力仍不确定。Amaranthus palmeri(Palmer amaranth)是棉花(Gossypium hirsutum)生产中极具挑战性的杂草,其植株形态不仅在整个生长季的各生长阶段间变化很大,同一生长阶段的植株之间也因高度的遗传多样性而差异明显。在这篇论文中,我们以棉田中A. palmeri的八类生长阶段识别作为YOLO架构的一个具有挑战性的模型问题,在八类生长阶段数据集上比较了YOLO v3、v5、v6、v6 3.0、v7和v8共26种架构变体。对全部生长阶段类别的识别中,最高的mAP@[0.5:0.95]为47.34%,由v8-X取得,其中视觉上相似的生长阶段之间存在类间混淆。当所有生长阶段合并为单一类别时性能提升,最高mAP@[0.5:0.95]为67.05%,由v7-Original取得;v5-X取得了最高81.42%的单类召回率,v8-X取得了最高89.72%的精确率。我们利用类激活图(CAM)来理解模型在这一复杂数据集上的注意力分布。按视觉或尺寸特征合并而得到的更少类别,相比原始的八类标注取得了更好的性能。生长阶段检测的成功表明,开源目标检测架构在改进植物表型分析和杂草识别技术方面具有巨大潜力。
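
    Code sketch (hedged): one way such an experiment could be set up with the open-source `ultralytics` package, which provides the YOLOv8 ("v8-X") variants discussed above. The dataset config file `palmer_growth_stages.yaml`, the image path, and the training settings are placeholders, not the paper's actual configuration.
```python
from ultralytics import YOLO

# Start from COCO-pretrained weights for the largest v8 variant ("v8-X" in the text).
model = YOLO("yolov8x.pt")

# Hypothetical dataset config listing the eight growth-stage classes
# (or a single merged class for the grouped experiment).
model.train(data="palmer_growth_stages.yaml", epochs=100, imgsz=640)

metrics = model.val()                                  # reports mAP@[0.5:0.95], precision, recall
results = model.predict("field_image.jpg", conf=0.25)  # hypothetical field image
for box in results[0].boxes:
    print(int(box.cls), float(box.conf))
```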

Anomaly Detection in Industrial Machinery using IoT Devices and Machine Learning: a Systematic Mapping

  • paper_url: http://arxiv.org/abs/2307.15807
  • repo_url: None
  • paper_authors: Sérgio F. Chevtchenko, Elisson da Silva Rocha, Monalisa Cristina Moura Dos Santos, Ricardo Lins Mota, Diego Moura Vieira, Ermeson Carneiro de Andrade, Danilo Ricardo Barbosa de Araújo
  • For: This paper is written for researchers and practitioners who are interested in Anomaly Detection for industrial machinery using IoT devices and ML algorithms.
  • Methods: The paper uses a systematic mapping study to evaluate 84 relevant studies from 2016 to 2023, providing an extensive review of Anomaly Detection research in industrial machinery. The study covers the most commonly used algorithms, preprocessing techniques, and sensor types.
  • Results: The paper identifies the application areas and points to future challenges and research opportunities in Anomaly Detection for industrial machinery using IoT devices and ML algorithms.
    Abstract Anomaly detection is critical in the smart industry for preventing equipment failure, reducing downtime, and improving safety. Internet of Things (IoT) has enabled the collection of large volumes of data from industrial machinery, providing a rich source of information for Anomaly Detection. However, the volume and complexity of data generated by the Internet of Things ecosystems make it difficult for humans to detect anomalies manually. Machine learning (ML) algorithms can automate anomaly detection in industrial machinery by analyzing generated data. Besides, each technique has specific strengths and weaknesses based on the data nature and its corresponding systems. However, the current systematic mapping studies on Anomaly Detection primarily focus on addressing network and cybersecurity-related problems, with limited attention given to the industrial sector. Additionally, these studies do not cover the challenges involved in using ML for Anomaly Detection in industrial machinery within the context of the IoT ecosystems. This paper presents a systematic mapping study on Anomaly Detection for industrial machinery using IoT devices and ML algorithms to address this gap. The study comprehensively evaluates 84 relevant studies spanning from 2016 to 2023, providing an extensive review of Anomaly Detection research. Our findings identify the most commonly used algorithms, preprocessing techniques, and sensor types. Additionally, this review identifies application areas and points to future challenges and research opportunities.
    摘要 异常检测是智能工业中预防设备故障、减少停机时间并提高安全性的关键任务。物联网(IoT)使得从工业机械中采集大量数据成为可能,为异常检测提供了丰富的信息来源。然而,物联网生态系统产生的数据量大且复杂,人工检测异常十分困难。机器学习(ML)算法可以通过分析这些数据,自动检测工业机械中的异常;同时,不同技术依数据特性及对应系统各有优劣。然而,目前关于异常检测的系统性映射研究主要集中在网络与网络安全相关问题上,对工业领域关注较少,也未涵盖在物联网生态系统背景下将ML用于工业机械异常检测所面临的挑战。为填补这一空白,本文针对使用IoT设备和ML算法的工业机械异常检测开展了系统性映射研究,全面评估了2016年至2023年间的84篇相关研究,对异常检测研究进行了广泛综述。我们的结果给出了最常用的算法、预处理技术和传感器类型,同时识别了应用领域,并指出了未来的挑战与研究机会。

On Single Index Models beyond Gaussian Data

  • paper_url: http://arxiv.org/abs/2307.15804
  • repo_url: None
  • paper_authors: Joan Bruna, Loucas Pillaud-Vivien, Aaron Zweig
  • for: 本文研究在非高斯数据设定下,使用随机梯度下降(SGD)恢复隐藏方向 $\theta^*$ 的问题。
  • methods: 本文基于 \cite{arous2020online} 的框架,并将其推广到可能不满足稳定性或对称性的非高斯设定。
  • results: 主要结果表明,在已知链接函数的植入设定下,SGD 能在高维情形下有效恢复隐藏方向 $\theta^*$,从而推广了 \cite{yehudai2020learning,wu2022learning} 等先前工作。
    Abstract Sparse high-dimensional functions have arisen as a rich framework to study the behavior of gradient-descent methods using shallow neural networks, showcasing their ability to perform feature learning beyond linear models. Amongst those functions, the simplest are single-index models $f(x) = \phi( x \cdot \theta^*)$, where the labels are generated by an arbitrary non-linear scalar link function $\phi$ applied to an unknown one-dimensional projection $\theta^*$ of the input data. By focusing on Gaussian data, several recent works have built a remarkable picture, where the so-called information exponent (related to the regularity of the link function) controls the required sample complexity. In essence, these tools exploit the stability and spherical symmetry of Gaussian distributions. In this work, building from the framework of \cite{arous2020online}, we explore extensions of this picture beyond the Gaussian setting, where both stability or symmetry might be violated. Focusing on the planted setting where $\phi$ is known, our main results establish that Stochastic Gradient Descent can efficiently recover the unknown direction $\theta^*$ in the high-dimensional regime, under assumptions that extend previous works ~\cite{yehudai2020learning,wu2022learning}.
    摘要 稀疏高维函数已成为研究浅层神经网络梯度下降行为的一个富有成果的框架,展示了它们超越线性模型进行特征学习的能力。在这类函数中,最简单的是单指标模型 $f(x) = \phi(x \cdot \theta^*)$,其标签由一个任意的非线性标量链接函数 $\phi$ 作用于输入数据的未知一维投影 $\theta^*$ 上生成。通过聚焦高斯数据,近来的多项工作构建了一幅引人注目的图景:所谓的信息指数(与链接函数的正则性相关)控制了所需的样本复杂度。本质上,这些工具利用了高斯分布的稳定性和球对称性。在这项工作中,我们基于 \cite{arous2020online} 的框架,探讨在稳定性或对称性可能不再成立的非高斯设定下对这一图景的推广。聚焦于 $\phi$ 已知的植入设定,我们的主要结果表明,在推广了先前工作 \cite{yehudai2020learning,wu2022learning} 假设的条件下,随机梯度下降能够在高维情形下有效地恢复未知方向 $\theta^*$。
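
    Code sketch (toy, not the paper's analysis): online spherical SGD on a planted single-index model y = phi(<x, theta*>) with a known monotone link. For simplicity the inputs are Gaussian here, whereas the paper's focus is the non-Gaussian setting; the dimension, link, and step size are illustrative assumptions.
```python
import numpy as np

rng = np.random.default_rng(0)
d = 100
theta_star = rng.normal(size=d)
theta_star /= np.linalg.norm(theta_star)
phi = np.tanh                                   # known monotone link (illustrative choice)

theta = rng.normal(size=d)
theta /= np.linalg.norm(theta)
lr = 0.01
for _ in range(50_000):                         # online SGD: one fresh sample per step
    x = rng.normal(size=d)
    y = phi(x @ theta_star)
    pred = phi(x @ theta)
    grad = (pred - y) * (1.0 - pred ** 2) * x   # gradient of 0.5 * (pred - y)^2
    theta -= lr * grad
    theta /= np.linalg.norm(theta)              # project back onto the sphere

print("overlap <theta, theta*>:", theta @ theta_star)
```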

SAFE: Saliency-Aware Counterfactual Explanations for DNN-based Automated Driving Systems

  • paper_url: http://arxiv.org/abs/2307.15786
  • repo_url: None
  • paper_authors: Amir Samadi, Amir Shirian, Konstantinos Koufos, Kurt Debattista, Mehrdad Dianati
  • for: 本研究的目的是提出一种新的CF解释方法,即使用saliency map来生成更有用的CF解释。
  • methods: 本研究在现有深度生成CF模型的基础上,提出了一种利用saliency map的CF解释方法,使反事实样本更贴近黑盒模型真正起判别作用的特征与决策边界。
  • results: 研究表明,利用saliency map可以生成信息量更大的CF解释,使其更符合反事实样本应位于决策边界附近的定义,有助于理解黑盒模型的决策过程。
    Abstract A CF explainer identifies the minimum modifications in the input that would alter the model's output to its complement. In other words, a CF explainer computes the minimum modifications required to cross the model's decision boundary. Current deep generative CF models often work with user-selected features rather than focusing on the discriminative features of the black-box model. Consequently, such CF examples may not necessarily lie near the decision boundary, thereby contradicting the definition of CFs. To address this issue, we propose in this paper a novel approach that leverages saliency maps to generate more informative CF explanations. Source codes are available at: https://github.com/Amir-Samadi//Saliency_Aware_CF.
    摘要 CF(反事实)解释器用于找出能使模型输出翻转为相反类别的最小输入修改,换言之,它计算跨越模型决策边界所需的最小改动。然而,现有的深度生成式CF模型往往基于用户选择的特征,而非黑盒模型真正的判别性特征,因此生成的CF样本未必位于决策边界附近,从而违背了CF的定义。为解决这一问题,我们在本文中提出一种利用saliency map生成更具信息量的CF解释的新方法。源代码见:https://github.com/Amir-Samadi//Saliency_Aware_CF.
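
    Code sketch (hedged): the general idea of a saliency-aware counterfactual, illustrated with a direct gradient search on a tiny stand-in classifier rather than the paper's deep generative CF model. An input-gradient saliency mask is computed first, and the counterfactual perturbation is then restricted to the most salient features; the network, thresholds, and step counts are all illustrative.
```python
import torch
import torch.nn as nn

torch.manual_seed(0)
clf = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))  # stand-in "black box"
x = torch.randn(1, 10)
orig = int(clf(x).argmax(dim=1))

# 1) Saliency map: gradient magnitude of the predicted-class logit w.r.t. the input.
x_req = x.clone().requires_grad_(True)
clf(x_req)[0, orig].backward()
saliency = x_req.grad.abs()
mask = (saliency >= saliency.quantile(0.7)).float()     # edit only the top-30% salient features

# 2) Counterfactual search: push the opposite-class logit up, restricted to the mask.
target = 1 - orig
delta = torch.zeros_like(x, requires_grad=True)
opt = torch.optim.Adam([delta], lr=0.05)
for _ in range(300):
    logits = clf(x + mask * delta)
    loss = -logits[0, target] + 0.1 * delta.abs().sum()  # try to flip the decision, keep the edit small
    opt.zero_grad()
    loss.backward()
    opt.step()

x_cf = (x + mask * delta).detach()
print("original class:", orig, "counterfactual class:", int(clf(x_cf).argmax(dim=1)))
```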

Spherical and Hyperbolic Toric Topology-Based Codes On Graph Embedding for Ising MRF Models: Classical and Quantum Topology Machine Learning

  • paper_url: http://arxiv.org/abs/2307.15778
  • repo_url: https://github.com/Lcrypto/Topology-Signal-Processing
  • paper_authors: Vasiliy Usatyuk, Sergey Egorov, Denis Sapozhnikov
  • for: 本研究探讨用信息几何来描述伊辛(Ising)模型的基态。
  • methods: 该方法利用环形与准环形码在环面和球面拓扑上的奇偶校验矩阵以及自同构的概念,建立了机器学习与纠错编码之间的联系。
  • results: 该研究揭示了深度神经网络架构与纠错编码之间的对应关系,提出了基于陷阱集(trapping set)的嵌入与稀疏分解方法,并指出量子近似优化算法(QAOA)与深度神经网络训练之间的类比。
    Abstract The paper introduces the application of information geometry to describe the ground states of Ising models. This is achieved by utilizing parity-check matrices of cyclic and quasi-cyclic codes on toric and spherical topologies. The approach establishes a connection between machine learning and error-correcting coding, specifically in terms of automorphism and the size of the circulant of the quasi-cyclic code. This proposed approach has implications for the development of new embedding methods based on trapping sets. Statistical physics and number geometry are utilized to optimize error-correcting codes, leading to these embedding and sparse factorization methods. The paper establishes a direct connection between DNN architecture and error-correcting coding by demonstrating how state-of-the-art DNN architectures (ChordMixer, Mega, Mega-chunk, CDIL, ...) from the long-range arena can be equivalent to specific types (Cage-graph, Repeat Accumulate) of block and convolutional LDPC codes. QC codes correspond to certain types of chemical elements, with the carbon element being represented by the mixed automorphism Shu-Lin-Fossorier QC-LDPC code. The Quantum Approximate Optimization Algorithm (QAOA) used in the Sherrington-Kirkpatrick Ising model can be seen as analogous to the back-propagation loss function landscape in training DNNs. This similarity creates a comparable problem with TS pseudo-codeword, resembling the belief propagation method. Additionally, the layer depth in QAOA correlates to the number of decoding belief propagation iterations in the Wiberg decoding tree. Overall, this work has the potential to advance multiple fields, from Information Theory, DNN architecture design (sparse and structured prior graph topology), efficient hardware design for Quantum and Classical DPU/TPU (graph, quantize and shift register architect.) to Materials Science and beyond.
    摘要 本文介绍了如何应用信息几何来描述伊辛模型的基态,具体做法是利用环形与准环形码在环面和球面拓扑上的奇偶校验矩阵。该方法建立了机器学习与纠错编码之间的联系,特别是在自同构以及准环形码循环子矩阵大小方面,并对基于陷阱集(trapping set)的新嵌入方法的发展具有启示意义。文中利用统计物理与数论几何来优化纠错码,从而得到相应的嵌入与稀疏分解方法。本文通过展示长序列基准(long-range arena)中最先进的DNN架构(ChordMixer、Mega、Mega-chunk、CDIL 等)可以等价于特定类型(Cage-graph、Repeat Accumulate)的分组与卷积LDPC码,建立了DNN架构与纠错编码之间的直接联系。准环形(QC)码对应于某些化学元素,其中碳元素由混合自同构的Shu-Lin-Fossorier QC-LDPC码表示。用于Sherrington-Kirkpatrick伊辛模型的量子近似优化算法(QAOA)可以类比于训练DNN时的反向传播损失函数地形,这种相似性带来了与TS伪码字(pseudo-codeword)相似的问题,类似于置信传播方法;此外,QAOA的层深与Wiberg译码树中置信传播译码的迭代次数相对应。总体而言,这项工作有望推动多个领域的进展:从信息论、DNN架构设计(稀疏且结构化的先验图拓扑)、面向量子与经典DPU/TPU的高效硬件设计(图、量化与移位寄存器架构),到材料科学及更广泛的领域。

Seeking the Yield Barrier: High-Dimensional SRAM Evaluation Through Optimal Manifold

  • paper_url: http://arxiv.org/abs/2307.15773
  • repo_url: None
  • paper_authors: Yanfang Liu, Guohao Dai, Wei W. Xing
  • for: 该研究旨在提高先进工艺节点下SRAM组件失效概率估计的效率和准确性。
  • methods: 该研究从经典的范数最小化方法出发,将其推广到无穷多个分量,并导出新的最优流形概念,将基于代理模型与重要性采样(IS)的良率估计方法联系起来;随后导出次优流形——最优超球面,得到一种感知失效边界的高效采样方法(洋葱采样),并使用可从样本中学习的神经耦合流作为IS提议分布。
  • results: 结果显示,OPTIMIS方法兼具代理模型与IS方法的优点,性能稳健一致;在高维SRAM评估中,相比最优的SOTA方法,效率最高提升3.5倍,准确性最高提升3倍。
    Abstract Being able to efficiently obtain an accurate estimate of the failure probability of SRAM components has become a central issue as model circuits shrink their scale to submicrometer with advanced technology nodes. In this work, we revisit the classic norm minimization method. We then generalize it with infinite components and derive the novel optimal manifold concept, which bridges the surrogate-based and importance sampling (IS) yield estimation methods. We then derive a sub-optimal manifold, optimal hypersphere, which leads to an efficient sampling method being aware of the failure boundary called onion sampling. Finally, we use a neural coupling flow (which learns from samples like a surrogate model) as the IS proposal distribution. These combinations give rise to a novel yield estimation method, named Optimal Manifold Important Sampling (OPTIMIS), which keeps the advantages of the surrogate and IS methods to deliver state-of-the-art performance with robustness and consistency, with up to 3.5x in efficiency and 3x in accuracy over the best of SOTA methods in High-dimensional SRAM evaluation.
    摘要 随着先进工艺节点下模型电路尺寸缩小到亚微米级,如何高效而准确地估计SRAM组件的失效概率已成为核心问题。在这项工作中,我们重新审视经典的范数最小化方法,将其推广到无穷多个分量,并导出新的"最优流形"概念,从而将基于代理模型与基于重要性采样(IS)的良率估计方法联系起来。我们进一步导出一个次优流形——最优超球面,由此得到一种感知失效边界的高效采样方法,称为洋葱采样(onion sampling)。最后,我们使用神经耦合流(可像代理模型一样从样本中学习)作为IS的提议分布。这些组合构成了一种新的良率估计方法,称为最优流形重要性采样(OPTIMIS),它保留了代理模型方法与IS方法的优点,性能稳健一致,达到当前最先进水平:在高维SRAM评估中,相比最优的SOTA方法,效率最高提升3.5倍,准确性最高提升3倍。
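
    Code sketch (toy): the basic importance-sampling ingredient the framework builds on — shift the proposal toward the failure region and re-weight failing samples by the likelihood ratio. The "SRAM cell" failure indicator, dimensions, and shift are made up; the paper's optimal-manifold, onion-sampling, and normalizing-flow machinery are not reproduced here.
```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
dim, margin = 12, 18.0

def fails(v):
    # Stand-in failure indicator in device-variation space:
    # the cell "fails" when the summed variation exceeds a fixed margin.
    return v.sum(axis=1) > margin

# Plain Monte Carlo under the nominal N(0, I) variation model: failures are too rare to see.
v_mc = rng.normal(size=(200_000, dim))
print("Monte Carlo estimate:", fails(v_mc).mean())

# Importance sampling: shift the proposal toward the failure boundary and re-weight.
shift = np.full(dim, margin / dim)
v_is = rng.normal(loc=shift, size=(200_000, dim))
log_w = stats.norm.logpdf(v_is).sum(axis=1) - stats.norm.logpdf(v_is, loc=shift).sum(axis=1)
p_fail = np.mean(fails(v_is) * np.exp(log_w))
exact = stats.norm.sf(margin / np.sqrt(dim))     # sum ~ N(0, dim), so the exact tail is known
print("IS estimate:", p_fail, "exact:", exact)
```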

Weighted variation spaces and approximation by shallow ReLU networks

  • paper_url: http://arxiv.org/abs/2307.15772
  • repo_url: None
  • paper_authors: Ronald DeVore, Robert D. Nowak, Rahul Parhi, Jonathan W. Siegel
  • for: 本研究探讨在有界区域 $\Omega$ 上用宽度为 $n$ 的单隐层ReLU神经网络逼近函数 $f$ 的问题。
  • methods: 通过引入加权变差空间的概念,给出了依赖于区域本身的新模型类定义。
  • results: 这些新模型类严格大于经典的(与区域无关的)模型类,但仍保持相同的神经网络逼近率,且其逼近率避免了维数灾难。
    Abstract We investigate the approximation of functions $f$ on a bounded domain $\Omega\subset \mathbb{R}^d$ by the outputs of single-hidden-layer ReLU neural networks of width $n$. This form of nonlinear $n$-term dictionary approximation has been intensely studied since it is the simplest case of neural network approximation (NNA). There are several celebrated approximation results for this form of NNA that introduce novel model classes of functions on $\Omega$ whose approximation rates avoid the curse of dimensionality. These novel classes include Barron classes, and classes based on sparsity or variation such as the Radon-domain BV classes. The present paper is concerned with the definition of these novel model classes on domains $\Omega$. The current definition of these model classes does not depend on the domain $\Omega$. A new and more proper definition of model classes on domains is given by introducing the concept of weighted variation spaces. These new model classes are intrinsic to the domain itself. The importance of these new model classes is that they are strictly larger than the classical (domain-independent) classes. Yet, it is shown that they maintain the same NNA rates.
    摘要 我们研究在有界区域 $\Omega\subset \mathbb{R}^d$ 上,用宽度为 $n$ 的单隐层ReLU神经网络的输出来逼近函数 $f$。这种非线性的 $n$ 项字典逼近是神经网络逼近(NNA)中最简单的情形,因而被深入研究。针对这种NNA已有若干著名的逼近结果,它们引入了 $\Omega$ 上新的函数模型类,其逼近率避免了维数灾难,包括Barron类以及基于稀疏性或变差的类(如Radon域BV类)。本文关注这些新模型类在区域 $\Omega$ 上的定义:现有定义并不依赖于区域 $\Omega$ 本身。我们通过引入加权变差空间的概念,给出了一种新的、更恰当的、内在依赖于区域的模型类定义。这些新模型类的重要性在于它们严格大于经典的(与区域无关的)模型类,但可以证明它们保持相同的NNA逼近率。
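
    Code sketch (illustrative only): an n-term shallow ReLU approximation f(x) ≈ Σ a_i ReLU(w_i·x + b_i), fitted by drawing random inner weights and solving a least-squares problem for the outer coefficients. The target function and n are arbitrary; the paper studies approximation rates for such expansions, not this particular fitting procedure.
```python
import numpy as np

rng = np.random.default_rng(0)
d, n, m = 2, 200, 2000                          # input dim, number of ReLU units, sample points

f = lambda X: np.sin(2 * X[:, 0]) * np.exp(-X[:, 1] ** 2)   # toy target on Omega = [-1, 1]^2
X = rng.uniform(-1.0, 1.0, size=(m, d))

W = rng.normal(size=(n, d))
W /= np.linalg.norm(W, axis=1, keepdims=True)   # unit inner directions
b = rng.uniform(-1.5, 1.5, size=n)
Phi = np.maximum(X @ W.T + b, 0.0)              # m x n dictionary of ReLU features

a, *_ = np.linalg.lstsq(Phi, f(X), rcond=None)  # outer coefficients
approx = Phi @ a
print("relative L2 error on samples:",
      np.linalg.norm(approx - f(X)) / np.linalg.norm(f(X)))
```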

The Hydra Effect: Emergent Self-repair in Language Model Computations

  • paper_url: http://arxiv.org/abs/2307.15771
  • repo_url: None
  • paper_authors: Thomas McGrath, Matthew Rahtz, Janos Kramar, Vladimir Mikulik, Shane Legg
  • for: 本研究使用 causal 分析探讨语言模型计算的内部结构。
  • methods: 研究通过消融实验和因果分析来探讨语言模型各层计算的作用。
  • results: 研究发现了两种模式:一种自适应计算现象,即消融某一注意力层后另一层会进行补偿(称为"九头蛇效应",Hydra effect);以及靠后的MLP层对最大似然token的下调(抵消)作用。这些效应即使在不使用dropout训练的语言模型中也存在,且各层之间的耦合相对松散。这些结果对语言模型的电路级归因具有重要意义。
    Abstract We investigate the internal structure of language model computations using causal analysis and demonstrate two motifs: (1) a form of adaptive computation where ablations of one attention layer of a language model cause another layer to compensate (which we term the Hydra effect) and (2) a counterbalancing function of late MLP layers that act to downregulate the maximum-likelihood token. Our ablation studies demonstrate that language model layers are typically relatively loosely coupled (ablations to one layer only affect a small number of downstream layers). Surprisingly, these effects occur even in language models trained without any form of dropout. We analyse these effects in the context of factual recall and consider their implications for circuit-level attribution in language models.
    摘要 我们使用因果分析研究语言模型计算的内部结构,并展示了两种模式:(1)一种自适应计算形式,即消融语言模型的某一注意力层会使另一层进行补偿(我们称之为"九头蛇效应",Hydra effect);(2)靠后的MLP层所起的抵消作用,它们会下调最大似然token。我们的消融研究表明,语言模型各层之间的耦合通常相对松散(消融某一层只会影响少量下游层)。令人惊讶的是,这些效应即使在完全不使用dropout训练的语言模型中也会出现。我们在事实回忆的场景中分析这些效应,并讨论它们对语言模型电路级归因的影响。

Goodness-of-Fit of Attributed Probabilistic Graph Generative Models

  • paper_url: http://arxiv.org/abs/2308.03773
  • repo_url: None
  • paper_authors: Pablo Robles-Granda, Katherine Tsai, Oluwasanmi Koyejo
  • for: 这篇论文主要用于描述如何评估Random Attributed Graph模型的合适性。
  • methods: 该论文使用了 Mean Square Contingency Coefficient 来评估模型的合适性,并提供了一种验证过程来确保模型的结构具有最低的偏差。
  • results: 该论文通过应用这些标准来验证各种流行的图模型的表示能力。
    Abstract Probabilistic generative models of graphs are important tools that enable representation and sampling. Many recent works have created probabilistic models of graphs that are capable of representing not only entity interactions but also their attributes. However, given a generative model of random attributed graph(s), the general conditions that establish goodness of fit are not clear a-priori. In this paper, we define goodness of fit in terms of the mean square contingency coefficient for random binary networks. For this statistic, we outline a procedure for assessing the quality of the structure of a learned attributed graph by ensuring that the discrepancy of the mean square contingency coefficient (constant, or random) is minimal with high probability. We apply these criteria to verify the representation capability of a probabilistic generative model for various popular types of graph models.
    摘要 图的概率生成模型是支持表示与采样的重要工具。许多近期工作构建的图概率模型不仅能表示实体间的交互,还能表示其属性。然而,给定一个带属性随机图的生成模型,事先并不清楚衡量其拟合优度的一般条件。在这篇论文中,我们用随机二值网络的均方列联系数来定义拟合优度。针对这一统计量,我们给出了一个评估所学带属性图结构质量的流程,确保均方列联系数(无论是常数还是随机的)的偏差以高概率达到最小。我们应用这些准则来验证概率生成模型对多种流行图模型的表示能力。
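
    Code sketch (hedged): the mean square contingency coefficient for binary data is the phi coefficient of a 2x2 contingency table. Below it is computed between the edge indicators of an "observed" binary graph and a graph sampled from a (toy) fitted model — one simple way such a statistic might be used for model checking; the paper's actual goodness-of-fit procedure is more formal.
```python
import numpy as np

def phi_coefficient(a, b):
    """Mean square contingency (phi) coefficient between two binary arrays."""
    a, b = a.ravel().astype(bool), b.ravel().astype(bool)
    n11 = np.sum(a & b); n10 = np.sum(a & ~b)
    n01 = np.sum(~a & b); n00 = np.sum(~a & ~b)
    num = n11 * n00 - n10 * n01
    den = np.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return num / den if den > 0 else 0.0

rng = np.random.default_rng(0)
n = 200
A_obs = np.triu(rng.random((n, n)) < 0.05, 1)                   # "observed" sparse graph
A_gen = A_obs ^ np.triu(rng.random((n, n)) < 0.01, 1)           # model sample: mostly agrees
iu = np.triu_indices(n, 1)
print("phi between observed and generated edge indicators:",
      phi_coefficient(A_obs[iu], A_gen[iu]))
```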

Resume Evaluation through Latent Dirichlet Allocation and Natural Language Processing for Effective Candidate Selection

  • paper_url: http://arxiv.org/abs/2307.15752
  • repo_url: None
  • paper_authors: Vidhita Jagwani, Smit Meghani, Krishna Pai, Sudhir Dhage
  • for: 这 paper 是为了提出一种基于 Latent Dirichlet Allocation (LDA) 和 SpaCy 实体检测的简历评分方法。
  • methods: 该方法首先使用 SpaCy 的Named Entity Recognition (NER) 提取简历中的相关实体,例如教育、工作经验和技能。然后,LDA 模型使用这些实体对简历进行评分,并将每个实体分配一个主题概率。
  • results: 我们的提出的系统使用 LDA 分解简历为 latent topics,并提取有意义的 semantic representations。在尝试使用只考虑技能的情况下,我们的模型达到了 77% 的准确率;在考虑所有属性的情况下,我们的模型达到了 82% 的准确率。
    Abstract In this paper, we propose a method for resume rating using Latent Dirichlet Allocation (LDA) and entity detection with SpaCy. The proposed method first extracts relevant entities such as education, experience, and skills from the resume using SpaCy's Named Entity Recognition (NER). The LDA model then uses these entities to rate the resume by assigning topic probabilities to each entity. Furthermore, we conduct a detailed analysis of the entity detection using SpaCy's NER and report its evaluation metrics. Using LDA, our proposed system breaks down resumes into latent topics and extracts meaningful semantic representations. With a vision to define our resume score to be more content-driven rather than a structure and keyword match driven, our model has achieved 77% accuracy with respect to only skills in consideration and an overall 82% accuracy with all attributes in consideration. (like college name, work experience, degree and skills)
    摘要 在这篇论文中,我们提出了一种使用Latent Dirichlet Allocation(LDA)和实体检测(SpaCy)来评分简历的方法。我们的方法首先从简历中提取有关的实体,如教育、经验和技能,使用SpaCy的命名实体识别(NER)。然后,LDA模型使用这些实体来评分简历,并将每个实体分配话题概率。此外,我们还进行了NER的实体检测的详细分析,并发布了评估指标。使用LDA,我们的提案系统将简历分解成了隐藏主题,并提取了有意义的语义表示。我们的模型的目标是通过对简历的内容进行评估,而不是仅仅是结构和关键词匹配,因此我们的模型在只考虑技能方面达到了77%的准确率,而在所有属性方面达到了82%的准确率。(包括学院名、工作经验、学位和技能)
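
    Code sketch (hedged): an off-the-shelf spaCy NER + gensim LDA pipeline of the kind the abstract describes. The resume strings, the small English model, and the way topic probabilities are turned into a score are all assumptions for illustration; it requires `gensim` and a downloaded `en_core_web_sm` model.
```python
import spacy
from gensim import corpora, models

nlp = spacy.load("en_core_web_sm")   # requires: python -m spacy download en_core_web_sm

resumes = [
    "B.Sc. from Stanford University, 3 years at Google as a software engineer, Python and SQL.",
    "MBA from Harvard University, marketing analyst at Unilever, skilled in Excel and Tableau.",
]

# 1) Entity extraction: keep recognized spans (ORG, DATE, ...) as proxies for
#    education / experience / skill entities.
entity_docs = [[ent.text.lower() for ent in nlp(text).ents] for text in resumes]

# 2) LDA over the extracted entities.
dictionary = corpora.Dictionary(entity_docs)
corpus = [dictionary.doc2bow(doc) for doc in entity_docs]
lda = models.LdaModel(corpus, num_topics=2, id2word=dictionary, passes=10, random_state=0)

# 3) A toy "resume score": probability mass the resume puts on a chosen target topic.
target_topic = 0
for text, bow in zip(resumes, corpus):
    topic_probs = dict(lda.get_document_topics(bow, minimum_probability=0.0))
    print(round(float(topic_probs[target_topic]), 3), "-", text[:50])
```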

How regularization affects the geometry of loss functions

  • paper_url: http://arxiv.org/abs/2307.15744
  • repo_url: None
  • paper_authors: Nathaniel Bottman, Y. Cooper, Antonio Lerario
  • for: 研究不同的正则化方法如何影响损失函数的几何,进而影响深度神经网络学到的内容。
  • methods: 分析不同的正则化器(包括权重衰减)如何改变损失函数的几何结构。
  • results: 刻画了在哪些正则化器下,正则化后的损失函数 $L_\epsilon$ 会成为 Morse 函数。
    Abstract What neural networks learn depends fundamentally on the geometry of the underlying loss function. We study how different regularizers affect the geometry of this function. One of the most basic geometric properties of a smooth function is whether it is Morse or not. For nonlinear deep neural networks, the unregularized loss function $L$ is typically not Morse. We consider several different regularizers, including weight decay, and study for which regularizers the regularized function $L_\epsilon$ becomes Morse.
    摘要 神经网络学到什么,从根本上取决于其底层损失函数的几何结构。我们研究不同的正则化器如何影响这一函数的几何。光滑函数最基本的几何性质之一是它是否为 Morse 函数。对于非线性深度神经网络,未正则化的损失函数 $L$ 通常不是 Morse 函数。我们考虑包括权重衰减在内的多种正则化器,并研究在哪些正则化器下,正则化后的函数 $L_\epsilon$ 会成为 Morse 函数。

Quantum-noise-limited optical neural networks operating at a few quanta per activation

  • paper_url: http://arxiv.org/abs/2307.15712
  • repo_url: None
  • paper_authors: Shi-Yuan Ma, Tianyu Wang, Jérémie Laydevant, Logan G. Wright, Peter L. McMahon
  • for: 这篇论文研究光学神经网络在超低功率区间下的性能,特别是某些层仅用单个光子触发神经元激活的情形。
  • methods: 作者采用一种在训练中直接对光电探测的随机行为建模的方法来训练随机光学神经网络,以实现高精度的图像分类。
  • results: 实验表明,该方法能在超低功率区间实现高精度的图像分类(MNIST测试准确率98%),且每次推理所用的光能量比此前最优的低光能量演示少40倍以上。
    Abstract Analog physical neural networks, which hold promise for improved energy efficiency and speed compared to digital electronic neural networks, are nevertheless typically operated in a relatively high-power regime so that the signal-to-noise ratio (SNR) is large (>10). What happens if an analog system is instead operated in an ultra-low-power regime, in which the behavior of the system becomes highly stochastic and the noise is no longer a small perturbation on the signal? In this paper, we study this question in the setting of optical neural networks operated in the limit where some layers use only a single photon to cause a neuron activation. Neuron activations in this limit are dominated by quantum noise from the fundamentally probabilistic nature of single-photon detection of weak optical signals. We show that it is possible to train stochastic optical neural networks to perform deterministic image-classification tasks with high accuracy in spite of the extremely high noise (SNR ~ 1) by using a training procedure that directly models the stochastic behavior of photodetection. We experimentally demonstrated MNIST classification with a test accuracy of 98% using an optical neural network with a hidden layer operating in the single-photon regime; the optical energy used to perform the classification corresponds to 0.008 photons per multiply-accumulate (MAC) operation, which is equivalent to 0.003 attojoules of optical energy per MAC. Our experiment used >40x fewer photons per inference than previous state-of-the-art low-optical-energy demonstrations, to achieve the same accuracy of >90%. Our work shows that some extremely stochastic analog systems, including those operating in the limit where quantum noise dominates, can nevertheless be used as layers in neural networks that deterministically perform classification tasks with high accuracy if they are appropriately trained.
    摘要 模拟物理神经网络相比数字电子神经网络在能效和速度上具有前景,但通常运行在相对高功率的区间,使信噪比(SNR)较大(>10)。如果模拟系统转而运行在超低功率区间,系统行为会变得高度随机,噪声不再只是信号上的微小扰动,那么会发生什么?在本文中,我们在光学神经网络中研究这一问题,其中某些层仅用单个光子来触发神经元激活。在这一极限下,神经元激活主要受量子噪声支配,这源于弱光信号单光子探测在本质上的概率性。我们表明,通过采用直接对光电探测随机行为建模的训练方法,即使噪声极高(SNR约为1),也可以训练随机光学神经网络以高精度完成确定性的图像分类任务。我们在实验中演示了MNIST分类,使用隐藏层工作在单光子区间的光学神经网络取得了98%的测试准确率;完成分类所用的光能量相当于每次乘加(MAC)运算0.008个光子,即每次MAC约0.003阿焦的光能量。与此前最优的低光能量演示相比,我们的实验在达到同样>90%准确率的情况下,每次推理所用光子数减少了40倍以上。我们的工作表明,一些极度随机的模拟系统——包括工作在量子噪声占主导的极限下的系统——只要经过恰当训练,仍可作为神经网络中的层,以高精度确定性地完成分类任务。

Semi-Supervised Object Detection in the Open World

  • paper_url: http://arxiv.org/abs/2307.15710
  • repo_url: None
  • paper_authors: Garvita Allabadi, Ana Lucic, Peter Pao-Huang, Yu-Xiong Wang, Vikram Adve
  • for: 本研究旨在 Addressing the challenges of open-world semi-supervised object detection, where the model must detect out-of-distribution (OOD) samples and learn from both in-distribution (ID) and OOD data.
  • methods: 我们提出了 Open World Semi-supervised Detection 框架 (OWSSD), which combines an OOD detector based on lightweight auto-encoder networks trained only on ID data, along with a semi-supervised learning pipeline that learns from both ID and OOD data.
  • results: 我们通过广泛的评估表明,我们的方法可以与现状最佳的OOD检测算法竞争,同时也可以在开放世界场景下提高 semi-supervised 学习性能。
    Abstract Existing approaches for semi-supervised object detection assume a fixed set of classes present in training and unlabeled datasets, i.e., in-distribution (ID) data. The performance of these techniques significantly degrades when these techniques are deployed in the open-world, due to the fact that the unlabeled and test data may contain objects that were not seen during training, i.e., out-of-distribution (OOD) data. The two key questions that we explore in this paper are: can we detect these OOD samples and if so, can we learn from them? With these considerations in mind, we propose the Open World Semi-supervised Detection framework (OWSSD) that effectively detects OOD data along with a semi-supervised learning pipeline that learns from both ID and OOD data. We introduce an ensemble based OOD detector consisting of lightweight auto-encoder networks trained only on ID data. Through extensive evalulation, we demonstrate that our method performs competitively against state-of-the-art OOD detection algorithms and also significantly boosts the semi-supervised learning performance in open-world scenarios.
    摘要 现有的半监督目标检测方法假设训练集和无标注数据集中出现的类别是固定的,即分布内(ID)数据。当这些技术部署到开放世界时,由于无标注数据和测试数据可能包含训练中未见过的物体,即分布外(OOD)数据,其性能会显著下降。本文探讨的两个关键问题是:我们能否检测这些OOD样本?如果能,我们能否从中学习?基于这些考虑,我们提出了开放世界半监督检测框架(OWSSD),它能有效检测OOD数据,并配合一个同时从ID和OOD数据中学习的半监督学习流程。我们提出了一个由仅在ID数据上训练的轻量级自编码器网络组成的集成OOD检测器。大量评估表明,我们的方法与最先进的OOD检测算法相比具有竞争力,并能在开放世界场景下显著提升半监督学习性能。
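
    Code sketch (hedged, not the authors' detector): the core of the described OOD detector — lightweight autoencoders trained only on in-distribution features, with high reconstruction error flagging OOD inputs. Network sizes, the 95th-percentile threshold, and the synthetic features are illustrative assumptions.
```python
import torch
import torch.nn as nn

torch.manual_seed(0)

def make_autoencoder(dim=64, bottleneck=8):
    return nn.Sequential(nn.Linear(dim, bottleneck), nn.ReLU(), nn.Linear(bottleneck, dim))

# In-distribution (ID) features: synthetic low-rank data standing in for region features.
id_feats = torch.randn(2000, 8) @ torch.randn(8, 64)

# Train an ensemble of small autoencoders on ID features only.
ensemble = [make_autoencoder() for _ in range(3)]
for ae in ensemble:
    opt = torch.optim.Adam(ae.parameters(), lr=1e-3)
    for _ in range(200):
        batch = id_feats[torch.randint(0, len(id_feats), (128,))]
        loss = ((ae(batch) - batch) ** 2).mean()
        opt.zero_grad(); loss.backward(); opt.step()

def ood_score(x):
    """Mean reconstruction error across the ensemble; higher = more likely OOD."""
    with torch.no_grad():
        return torch.stack([((ae(x) - x) ** 2).mean(dim=1) for ae in ensemble]).mean(dim=0)

threshold = ood_score(id_feats).quantile(0.95)        # calibrate on ID data
ood_feats = torch.randn(10, 64) * 3                   # off-distribution features
print("fraction flagged as OOD:", float((ood_score(ood_feats) > threshold).float().mean()))
```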

Uncertainty in Natural Language Generation: From Theory to Applications

  • paper_url: http://arxiv.org/abs/2307.15703
  • repo_url: https://github.com/Rastaman4e/-1
  • paper_authors: Joris Baan, Nico Daheim, Evgenia Ilia, Dennis Ulmer, Haau-Sing Li, Raquel Fernández, Barbara Plank, Rico Sennrich, Chrysoula Zerva, Wilker Aziz
  • for: 本研究旨在提高自然语言生成(NLG)系统的可靠性和可靠性,使其能够更好地满足人们的需求。
  • methods: 本文提出了一种基于理论的不确定处理方法,以提高NLG系统的可靠性和多样性。
  • results: 本研究提出了一种两维分类方法,可以更好地捕捉NLG系统中的不确定性。此外,本文还提出了一些实际应用的研究方向,如推理、自我评估、活动学习等。
    Abstract Recent advances of powerful Language Models have allowed Natural Language Generation (NLG) to emerge as an important technology that can not only perform traditional tasks like summarisation or translation, but also serve as a natural language interface to a variety of applications. As such, it is crucial that NLG systems are trustworthy and reliable, for example by indicating when they are likely to be wrong; and supporting multiple views, backgrounds and writing styles -- reflecting diverse human sub-populations. In this paper, we argue that a principled treatment of uncertainty can assist in creating systems and evaluation protocols better aligned with these goals. We first present the fundamental theory, frameworks and vocabulary required to represent uncertainty. We then characterise the main sources of uncertainty in NLG from a linguistic perspective, and propose a two-dimensional taxonomy that is more informative and faithful than the popular aleatoric/epistemic dichotomy. Finally, we move from theory to applications and highlight exciting research directions that exploit uncertainty to power decoding, controllable generation, self-assessment, selective answering, active learning and more.
    摘要 近年来,强大语言模型的进步使自然语言生成(NLG)成为一项重要技术,它不仅能完成摘要、翻译等传统任务,还能作为多种应用的自然语言接口。因此,NLG系统必须可信、可靠,例如能够指出自己何时可能出错,并支持多种观点、背景和写作风格,以反映多样化的人类群体。在这篇论文中,我们论证:对不确定性进行有原则的处理,有助于构建与这些目标更加一致的系统和评估方案。我们首先介绍表示不确定性所需的基本理论、框架和术语;接着从语言学视角刻画NLG中不确定性的主要来源,并提出一个比流行的偶然性/认知性二分法更有信息量、更贴合实际的二维分类体系。最后,我们从理论走向应用,指出了一系列令人期待的研究方向,即利用不确定性来支撑解码、可控生成、自我评估、选择性作答、主动学习等。

Universal Recurrent Event Memories for Streaming Data

  • paper_url: http://arxiv.org/abs/2307.15694
  • repo_url: None
  • paper_authors: Ran Dou, Jose Principe
  • for: 这个论文提出了一种新的事件记忆架构(MemNet),用于 Recurrent Neural Networks(RNN),可以处理不同类型的时间序列数据,包括标量、多变量和符号时间序列。
  • methods: MemNet 使用键值对来存储信息,将寻址信息与内容信息分离以提升表示能力,同时避免了由模型状态构建的记忆在深度与分辨率之间的折中;它仅使用线性自适应映射函数,却能对输入数据实现非线性运算。
  • results: MemNet 可直接应用于混沌时间序列、符号运算任务和问答任务(bAbI)等多种应用领域,并在所有领域取得最先进的结果;它所需的训练参数比其他外部记忆网络和transformer网络少得多,空间复杂度仅相当于单个自注意力层,大幅提升了注意力机制的效率,为IoT应用打开了大门。
    Abstract In this paper, we propose a new event memory architecture (MemNet) for recurrent neural networks, which is universal for different types of time series data such as scalar, multivariate or symbolic. Unlike other external neural memory architectures, it stores key-value pairs, which separate the information for addressing and for content to improve the representation, as in the digital archetype. Moreover, the key-value pairs also avoid the compromise between memory depth and resolution that applies to memories constructed by the model state. One of the MemNet key characteristics is that it requires only linear adaptive mapping functions while implementing a nonlinear operation on the input data. MemNet architecture can be applied without modifications to scalar time series, logic operators on strings, and also to natural language processing, providing state-of-the-art results in all application domains such as the chaotic time series, the symbolic operation tasks, and the question-answering tasks (bAbI). Finally, controlled by five linear layers, MemNet requires a much smaller number of training parameters than other external memory networks as well as the transformer network. The space complexity of MemNet equals a single self-attention layer. It greatly improves the efficiency of the attention mechanism and opens the door for IoT applications.
    摘要 在这篇论文中,我们提出了一种用于循环神经网络的新型事件记忆架构(MemNet),它对标量、多变量或符号型等不同类型的时间序列数据具有通用性。与其他外部神经记忆架构不同,它存储键值对,将用于寻址的信息与用于内容的信息分离以提升表示能力,如同数字式的原型一样。此外,键值对还避免了由模型状态构建的记忆在深度与分辨率之间的折中。MemNet 的一个关键特点是,它只需要线性自适应映射函数,却能对输入数据实现非线性运算。MemNet 架构无需修改即可应用于标量时间序列、字符串上的逻辑运算,以及自然语言处理,并在混沌时间序列、符号运算任务和问答任务(bAbI)等所有应用领域取得最先进的结果。最后,MemNet 仅由五个线性层控制,所需训练参数远少于其他外部记忆网络以及transformer网络;其空间复杂度相当于单个自注意力层,大幅提升了注意力机制的效率,为IoT应用打开了大门。
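
    Code sketch (hedged): a compact PyTorch illustration of a key-value event memory read of the kind the abstract describes — stored (key, value) pairs, linear maps producing queries/keys/values, softmax addressing over keys, and a weighted sum of values. Dimensions and the simple circular write rule are illustrative; MemNet's full recurrent update is not reproduced.
```python
import torch
import torch.nn as nn

class KeyValueMemory(nn.Module):
    """Separates addressing (keys) from content (values), as in a key-value memory."""
    def __init__(self, input_dim=16, key_dim=32, value_dim=32, slots=128):
        super().__init__()
        self.to_query = nn.Linear(input_dim, key_dim)   # linear adaptive mappings
        self.to_key = nn.Linear(input_dim, key_dim)
        self.to_value = nn.Linear(input_dim, value_dim)
        self.register_buffer("keys", torch.zeros(slots, key_dim))
        self.register_buffer("values", torch.zeros(slots, value_dim))
        self.register_buffer("ptr", torch.zeros((), dtype=torch.long))

    def write(self, x):
        i = int(self.ptr) % self.keys.shape[0]          # overwrite the oldest slot
        self.keys[i] = self.to_key(x).detach()
        self.values[i] = self.to_value(x).detach()
        self.ptr += 1

    def read(self, x):
        scores = self.keys @ self.to_query(x)           # address by key similarity
        weights = torch.softmax(scores, dim=0)          # nonlinear operation on the input
        return weights @ self.values                    # return content from the values

mem = KeyValueMemory()
stream = torch.randn(200, 16)                           # stand-in streaming (windowed) series
for x_t in stream:
    mem.write(x_t)
print(mem.read(stream[-1]).shape)                       # torch.Size([32])
```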

ODTlearn: A Package for Learning Optimal Decision Trees for Prediction and Prescription

  • paper_url: http://arxiv.org/abs/2307.15691
  • repo_url: https://github.com/d3m-research-group/odtlearn
  • paper_authors: Patrick Vossler, Sina Aghaei, Nathan Justin, Nathanael Jo, Andrés Gómez, Phebe Vayanos
  • for: 这个论文针对高风险预测与处方(规划)任务中的最优决策树问题,提供了一个基于混合整数优化(MIO)框架的开源Python包。
  • methods: 论文基于MIO框架及其扩展,提供了最优分类树、最优公平分类树、对分布偏移具有鲁棒性的最优分类树,以及基于观测数据学习最优处方树(prescriptive tree)的实现。
  • results: 论文给出了一个名为ODTLearn的开源Python包,采用面向对象设计,支持商业求解器(Gurobi)与开源求解器(COIN-OR branch and cut),便于维护和扩展。
    Abstract ODTLearn is an open-source Python package that provides methods for learning optimal decision trees for high-stakes predictive and prescriptive tasks based on the mixed-integer optimization (MIO) framework proposed in Aghaei et al. (2019) and several of its extensions. The current version of the package provides implementations for learning optimal classification trees, optimal fair classification trees, optimal classification trees robust to distribution shifts, and optimal prescriptive trees from observational data. We have designed the package to be easy to maintain and extend as new optimal decision tree problem classes, reformulation strategies, and solution algorithms are introduced. To this end, the package follows object-oriented design principles and supports both commercial (Gurobi) and open source (COIN-OR branch and cut) solvers. The package documentation and an extensive user guide can be found at https://d3m-research-group.github.io/odtlearn/. Additionally, users can view the package source code and submit feature requests and bug reports by visiting https://github.com/D3M-Research-Group/odtlearn.
    摘要 ODTLearn 是一个开源的 Python 包,提供了基于 Aghaei et al. (2019) 提出的混合整数优化(MIO)框架及其若干扩展来学习最优决策树的方法,面向高风险的预测与指导任务。当前版本实现了最优分类树、最优公平分类树、对分布偏移具有鲁棒性的分类树,以及基于观察数据的最优指导树的学习。我们在设计上保证该包易于维护和扩展,以便引入新的最优决策树问题类型、重构策略和求解算法。为此,该包遵循面向对象设计原则,并同时支持商业求解器(Gurobi)与开源求解器(COIN-OR branch and cut)。包的文档和详细用户指南见 https://d3m-research-group.github.io/odtlearn/ ;用户还可以在 https://github.com/D3M-Research-Group/odtlearn 查看源代码、提交功能需求和错误报告。

AI for Anticipatory Action: Moving Beyond Climate Forecasting

  • paper_url: http://arxiv.org/abs/2307.15727
  • repo_url: None
  • paper_authors: Benjamin Q. Huynh, Mathew V. Kiang
  • for: 该论文主要旨在探讨气候预测转向预先行动的趋势,以及机器学习模型在气候预测中的应用和挑战。
  • methods: 论文详细介绍了预先行动的概念和实践,并评估了现有机器学习模型在气候预测中的应用。
  • results: 论文指出,机器学习模型在气候预测方面已经非常强大,但在支撑预先行动方面仍存在方法学上的差距,并指明了机器学习能够为最易受气候变化影响的人群改进灾害应对作出独特贡献的方向。
    Abstract Disaster response agencies have been shifting from a paradigm of climate forecasting towards one of anticipatory action: assessing not just what the climate will be, but how it will impact specific populations, thereby enabling proactive response and resource allocation. Machine learning models are becoming exceptionally powerful at climate forecasting, but methodological gaps remain in terms of facilitating anticipatory action. Here we provide an overview of anticipatory action, review relevant applications of machine learning, identify common challenges, and highlight areas where machine learning can uniquely contribute to advancing disaster response for populations most vulnerable to climate change.
    摘要 灾害应对机构正在从气候预测的范式转向预先行动(anticipatory action):不仅评估气候将如何变化,还评估其将如何影响特定人群,从而实现前瞻性的应急响应和资源分配。机器学习模型在气候预测方面已经非常强大,但在支撑预先行动方面仍存在方法学上的空白。本文概述了预先行动,回顾了机器学习的相关应用,归纳了常见挑战,并强调了机器学习能够为最易受气候变化影响的人群推进灾害应对的独特贡献。

Benchmarking Offline Reinforcement Learning on Real-Robot Hardware

  • paper_url: http://arxiv.org/abs/2307.15690
  • repo_url: https://github.com/rr-learning/trifinger_rl_datasets
  • paper_authors: Nico Gürtler, Sebastian Blaes, Pavel Kolev, Felix Widmaier, Manuel Wüthrich, Stefan Bauer, Bernhard Schölkopf, Georg Martius
  • for: 这篇论文旨在为从先前记录的数据中学习策略(离线学习)以完成真实机器人任务提供一个基准。
  • methods: 论文将离线强化学习与大规模多样化数据集相结合,用于解决灵巧操作(dexterous manipulation)问题。
  • results: 论文提供了来自灵巧操作平台、覆盖两个任务的大规模离线学习数据集(由在仿真中训练的高水平 RL 智能体采集),并支持在真实机器人系统和仿真器上执行所学策略以便高效调试;论文还在这些数据集上评估了主流的开源离线强化学习算法。(条目末尾附有一个示意性代码草图。)
    Abstract Learning policies from previously recorded data is a promising direction for real-world robotics tasks, as online learning is often infeasible. Dexterous manipulation in particular remains an open problem in its general form. The combination of offline reinforcement learning with large diverse datasets, however, has the potential to lead to a breakthrough in this challenging domain analogously to the rapid progress made in supervised learning in recent years. To coordinate the efforts of the research community toward tackling this problem, we propose a benchmark including: i) a large collection of data for offline learning from a dexterous manipulation platform on two tasks, obtained with capable RL agents trained in simulation; ii) the option to execute learned policies on a real-world robotic system and a simulation for efficient debugging. We evaluate prominent open-sourced offline reinforcement learning algorithms on the datasets and provide a reproducible experimental setup for offline reinforcement learning on real systems.
    摘要 从先前记录的数据中学习策略是真实世界机器人任务中一个很有前景的方向,因为在线学习往往不可行。尤其是灵巧操作,就其一般形式而言仍是一个悬而未决的问题。然而,将离线强化学习与大规模多样化数据集相结合,有望像近年监督学习的快速进展一样,在这一具有挑战性的领域带来突破。为了协调研究社区共同解决这一问题,我们提出了一个基准,其中包括:i)来自灵巧操作平台、覆盖两个任务的大规模离线学习数据集,由在仿真中训练的高水平 RL 智能体采集;ii)可在真实机器人系统和仿真器上执行所学策略的选项,以便高效调试。我们在这些数据集上评估了主流的开源离线强化学习算法,并为真实系统上的离线强化学习提供了可复现的实验设置。
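A minimal, self-contained sketch of the offline-learning setting the benchmark targets: fitting a policy purely from a logged dataset of (observation, action) pairs, here by simple behavior cloning with a linear least-squares policy on synthetic data. The dataset shapes and the regression-based policy are assumptions for illustration; they are not the benchmark's API or the offline RL algorithms it evaluates.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for a logged dataset of robot transitions:
# observations (e.g. joint angles, object pose) and the logged expert actions.
n_transitions, obs_dim, act_dim = 10_000, 24, 9
observations = rng.normal(size=(n_transitions, obs_dim))
expert_W = rng.normal(scale=0.3, size=(obs_dim, act_dim))
actions = observations @ expert_W + 0.05 * rng.normal(size=(n_transitions, act_dim))

# Offline learning: no environment interaction, only the recorded data.
# Behavior cloning with a linear policy fitted by least squares.
W_policy, *_ = np.linalg.lstsq(observations, actions, rcond=None)

# "Deployment": predict an action for a new observation.
new_obs = rng.normal(size=obs_dim)
predicted_action = new_obs @ W_policy
print(predicted_action.shape)   # shape (9,): one command per actuated joint (assumed)
```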

A supervised hybrid quantum machine learning solution to the emergency escape routing problem

  • paper_url: http://arxiv.org/abs/2307.15682
  • repo_url: None
  • paper_authors: Nathan Haboury, Mo Kordzanganeh, Sebastian Schmitt, Ayush Joshi, Igor Tokarev, Lukas Abdallah, Andrii Kurkin, Basil Kyriacou, Alexey Melnikov
  • methods: 该论文提出了一种新的混合监督学习方法:一个量子特征线性调制(FiLM)神经网络与一个经典 FiLM 网络并行运行,用于在确定性动态图上模仿 Dijkstra 的逐节点最短路径算法。(条目末尾附有一个示意性代码草图。)
  • results: 研究表明,该混合方法将紧急疏散路径预测的准确率比纯经典监督学习方法提高了 7%;量子部分对预测的贡献约为 45.(3)%,且该网络可以在基于离子的量子计算机上执行。
    Abstract Managing the response to natural disasters effectively can considerably mitigate their devastating impact. This work explores the potential of using supervised hybrid quantum machine learning to optimize emergency evacuation plans for cars during natural disasters. The study focuses on earthquake emergencies and models the problem as a dynamic computational graph where an earthquake damages an area of a city. The residents seek to evacuate the city by reaching the exit points where traffic congestion occurs. The situation is modeled as a shortest-path problem on an uncertain and dynamically evolving map. We propose a novel hybrid supervised learning approach and test it on hypothetical situations on a concrete city graph. This approach uses a novel quantum feature-wise linear modulation (FiLM) neural network parallel to a classical FiLM network to imitate Dijkstra's node-wise shortest path algorithm on a deterministic dynamic graph. Adding the quantum neural network in parallel increases the overall model's expressivity by splitting the dataset's harmonic and non-harmonic features between the quantum and classical components. The hybrid supervised learning agent is trained on a dataset of Dijkstra's shortest paths and can successfully learn the navigation task. The hybrid quantum network improves over the purely classical supervised learning approach by 7% in accuracy. We show that the quantum part has a significant contribution of 45.(3)% to the prediction and that the network could be executed on an ion-based quantum computer. The results demonstrate the potential of supervised hybrid quantum machine learning in improving emergency evacuation planning during natural disasters.
    摘要 有效管理自然灾害的应对可以显著减轻其破坏性影响。本工作探讨了使用监督式混合量子机器学习来优化自然灾害期间车辆紧急疏散方案的潜力。研究聚焦于地震紧急情况,将问题建模为一个动态计算图:地震破坏了城市的某个区域,居民试图到达出口点以撤离城市,而出口点处会出现交通拥堵。该情形被建模为一个在不确定且动态变化的地图上的最短路径问题。我们提出了一种新的混合监督学习方法,并在一个具体城市图上的假设情景中进行了测试。该方法使用一个新的量子特征线性调制(FiLM)神经网络,与一个经典 FiLM 网络并行,以模仿 Dijkstra 在确定性动态图上的逐节点最短路径算法。并行加入量子神经网络后,数据集的谐波与非谐波特征分别由量子和经典组件处理,从而提升了整体模型的表达能力。该混合监督学习智能体在 Dijkstra 最短路径数据集上训练,能够成功学习导航任务。混合量子网络的准确率比纯经典监督学习方法高出 7%。我们还表明量子部分对预测的贡献为 45.(3)%,并且该网络可以在基于离子的量子计算机上执行。这些结果展示了监督式混合量子机器学习在改进自然灾害紧急疏散规划方面的潜力。
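For orientation, a minimal NumPy sketch of the classical half of the model: a feature-wise linear modulation (FiLM) layer that rescales and shifts per-channel features using parameters predicted from a conditioning vector. The quantum branch is omitted, and all shapes as well as the "earthquake state" conditioning are illustrative assumptions, not the paper's exact architecture.

```python
import numpy as np

rng = np.random.default_rng(1)

def film_layer(features, conditioning, W_gamma, W_beta):
    """Feature-wise linear modulation (FiLM): scale and shift each feature channel
    with parameters predicted from a conditioning vector. The hybrid model runs a
    quantum and a classical FiLM branch in parallel; only a classical branch is
    sketched here, with hypothetical shapes."""
    gamma = conditioning @ W_gamma     # per-channel scale
    beta = conditioning @ W_beta       # per-channel shift
    return gamma * features + beta

# Toy example: node features of a city graph modulated by an "earthquake state" vector.
n_nodes, feat_dim, cond_dim = 5, 8, 4
node_feats = rng.normal(size=(n_nodes, feat_dim))
quake_state = rng.normal(size=cond_dim)
W_gamma = rng.normal(scale=0.1, size=(cond_dim, feat_dim))
W_beta = rng.normal(scale=0.1, size=(cond_dim, feat_dim))
modulated = film_layer(node_feats, quake_state, W_gamma, W_beta)
print(modulated.shape)  # (5, 8)
```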

Benchmarking Anomaly Detection System on various Jetson Edge Devices

  • paper_url: http://arxiv.org/abs/2307.16834
  • repo_url: None
  • paper_authors: Hoang Viet Pham, Thinh Gia Tran, Chuong Dinh Le, An Dinh Le, Hien Bich Vo
  • for: This paper focuses on developing an end-to-end crime-scene anomaly detection system that combines weakly supervised video anomaly detection, namely Robust Temporal Feature Magnitude Learning (RTFM), with edge computing technology.
  • methods: The system uses TensorRT, NVIDIA's software development kit, for performance enhancement and is deployed and tested directly on multiple Jetson edge devices using Docker. (An illustrative timing-loop sketch follows this entry.)
  • results: The anomaly detection model yields competitive results compared to other state-of-the-art (SOTA) algorithms on available datasets such as UCF-Crime and UIT VNAnomaly, with an inference speed of 47.56 frames per second (FPS) on a Jetson edge device using only 3.11 GB of RAM in total. In addition, the most promising Jetson device achieves 15% better performance than the previous generation while consuming 50% less power.
    Abstract Capturing the abnormal event from surveillance videos enhances the safety and well-being of the citizens. The application of EdgeAI (Edge computing-based Artificial Intelligent ) meets the strict latency requirements for security. In this paper, we apply weakly supervised video anomaly detection called Robust Temporal Feature Magnitude Learning (RTFM) to an end-to-end crime-scene anomaly detection system from the surveillance cameras with the help of edge computing technology. The system is tested directly on multiple Jetson edge devices combined with TensorRT as the software developer kit from NVIDIA for system performance enhancement. The experience of an AI-based system deployment on various Jetson Edge devices with Docker technology is also provided. The anomaly detection model yields competitive results compared to other state-of-the-art (SOTA) algorithms on available datasets such as UCF-Crime and UIT VNAnomaly. The approach system reaches 47.56 frames per second (FPS) inference speed on a Jetson edge device with only 3.11 GB RAM usage total. We also discover the promising Jetson device that the AI system achieves 15% better performance than the previous version of Jetson devices while consuming 50% less energy power.
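A small, self-contained sketch of how inference throughput (FPS) on an edge device can be measured end to end. The `run_inference` callable is a placeholder for the deployed RTFM/TensorRT engine; the frame shape and frame count are assumptions, and no TensorRT-specific API is shown.

```python
import time
import numpy as np

def benchmark_fps(run_inference, frame_shape=(1, 3, 224, 224), n_frames=200):
    """Measure end-to-end inference throughput in frames per second.

    `run_inference` stands in for the deployed anomaly-detection model (e.g. a
    TensorRT engine on a Jetson device); here it is a placeholder so the sketch
    stays self-contained."""
    frames = [np.random.rand(*frame_shape).astype(np.float32) for _ in range(n_frames)]
    start = time.perf_counter()
    for frame in frames:
        run_inference(frame)
    elapsed = time.perf_counter() - start
    return n_frames / elapsed

# Placeholder "model": a fixed random projection, just to exercise the timing loop.
weights = np.random.rand(3 * 224 * 224, 16).astype(np.float32)
fps = benchmark_fps(lambda x: x.reshape(1, -1) @ weights)
print(f"{fps:.1f} FPS")
```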

Dynamic Analysis and an Eigen Initializer for Recurrent Neural Networks

  • paper_url: http://arxiv.org/abs/2307.15679
  • repo_url: None
  • paper_authors: Ran Dou, Jose Principe
  • for: 本研究探讨了深度循环神经网络中隐藏状态的动态行为,尤其是长期依赖问题。
  • methods: 我们采用了一种基于权重矩阵特征分解(eigen decomposition)来分析隐藏状态空间的新视角。我们首先基于线性状态空间模型进行分析,解释了激活函数保持信息的作用,并基于特征分析对长期依赖给出了解释,还发现了特征值在回归任务与分类任务中的不同行为。(条目末尾附有一个示意性代码草图。)
  • results: 我们提出了一种新的初始化方法,可持续提升 vanilla-RNN、LSTM 和 GRU 等循环神经网络的表现。该方法在多个数据集上(如 Tomita Grammars、逐像素 MNIST 和机器翻译数据集 Multi30k)进行了测试,在多个任务中优于 Xavier 初始化器、Kaiming 初始化器,以及 IRNN、sp-RNN 等仅针对 RNN 的初始化器。
    Abstract In recurrent neural networks, learning long-term dependency is the main difficulty due to the vanishing and exploding gradient problem. Many researchers are dedicated to solving this issue and they proposed many algorithms. Although these algorithms have achieved great success, understanding how the information decays remains an open problem. In this paper, we study the dynamics of the hidden state in recurrent neural networks. We propose a new perspective to analyze the hidden state space based on an eigen decomposition of the weight matrix. We start the analysis by linear state space model and explain the function of preserving information in activation functions. We provide an explanation for long-term dependency based on the eigen analysis. We also point out the different behavior of eigenvalues for regression tasks and classification tasks. From the observations on well-trained recurrent neural networks, we proposed a new initialization method for recurrent neural networks, which improves consistently performance. It can be applied to vanilla-RNN, LSTM, and GRU. We test on many datasets, such as Tomita Grammars, pixel-by-pixel MNIST datasets, and machine translation datasets (Multi30k). It outperforms the Xavier initializer and kaiming initializer as well as other RNN-only initializers like IRNN and sp-RNN in several tasks.
    摘要 在循环神经网络中,由于梯度消失和梯度爆炸问题,学习长期依赖是主要难点。许多研究者致力于解决这一问题并提出了多种算法。尽管这些算法取得了很大成功,但信息如何衰减仍是一个未解的问题。在本文中,我们研究了循环神经网络中隐藏状态的动态特性,提出了一种基于权重矩阵特征分解来分析隐藏状态空间的新视角。我们从线性状态空间模型出发进行分析,解释了激活函数保持信息的作用,并基于特征分析对长期依赖给出了解释。我们还指出了特征值在回归任务和分类任务中的不同行为。基于对训练良好的循环神经网络的观察,我们提出了一种新的循环神经网络初始化方法,能够持续提升性能,并可应用于 vanilla-RNN、LSTM 和 GRU。我们在多个数据集上进行了测试,包括 Tomita Grammars、逐像素 MNIST 数据集和机器翻译数据集(Multi30k),在多个任务中超过了 Xavier 初始化器、Kaiming 初始化器以及 IRNN、sp-RNN 等仅针对 RNN 的初始化器。
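The paper's initializer is derived from an eigen analysis of the recurrent weight matrix; the exact construction is not given in the abstract, so the sketch below only illustrates the underlying idea of controlling the eigenvalue spectrum, here by rescaling a random matrix to a chosen spectral radius. Treat it as an assumption-laden stand-in, not the proposed method.

```python
import numpy as np

rng = np.random.default_rng(2)

def spectral_radius_init(hidden_size, radius=1.0):
    """Illustrative recurrent-weight initializer based on an eigen analysis:
    draw a random matrix, compute its eigenvalues, and rescale so that the
    largest eigenvalue magnitude equals the target spectral radius."""
    W = rng.normal(size=(hidden_size, hidden_size)) / np.sqrt(hidden_size)
    eigvals = np.linalg.eigvals(W)
    return W * (radius / np.abs(eigvals).max())

W_hh = spectral_radius_init(128)
print(np.abs(np.linalg.eigvals(W_hh)).max())  # ~1.0: long-term information neither
                                              # explodes nor vanishes too quickly
```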

Case Studies of Causal Discovery from IT Monitoring Time Series

  • paper_url: http://arxiv.org/abs/2307.15678
  • repo_url: https://github.com/Aryia-Behroziuan/References
  • paper_authors: Ali Aït-Bachir, Charles K. Assaad, Christophe de Bignicourt, Emilie Devijver, Simon Ferreira, Eric Gaussier, Hosein Mohanna, Lei Zan
  • for: 这篇论文探讨了将因果发现算法应用于 IT 监控时间序列数据,以降低停机时间、提升系统性能、定位异常与事故的根本原因,并通过历史数据分析主动预测未来问题。
  • methods: 论文在多个 IT 监控数据集上应用因果发现算法,并总结了其中的挑战,如时间序列错位、休眠(长期不变)时间序列、时间戳错误和缺失值等。(条目末尾附有一个示意性代码草图。)
  • results: 该论文通过对不同 IT 监控数据集的应用,显示了 causal discovery 算法的好处,但也描述了当前的挑战和未解决的问题。
    Abstract Information technology (IT) systems are vital for modern businesses, handling data storage, communication, and process automation. Monitoring these systems is crucial for their proper functioning and efficiency, as it allows collecting extensive observational time series data for analysis. The interest in causal discovery is growing in IT monitoring systems as knowing causal relations between different components of the IT system helps in reducing downtime, enhancing system performance and identifying root causes of anomalies and incidents. It also allows proactive prediction of future issues through historical data analysis. Despite its potential benefits, applying causal discovery algorithms on IT monitoring data poses challenges, due to the complexity of the data. For instance, IT monitoring data often contains misaligned time series, sleeping time series, timestamp errors and missing values. This paper presents case studies on applying causal discovery algorithms to different IT monitoring datasets, highlighting benefits and ongoing challenges.
    摘要 信息技术(IT)系统是现代企业的重要组成部分,负责数据存储、通信和流程自动化。对这些系统进行监控对其正常运行和效率至关重要,因为这可以收集大量观测时间序列数据用于分析。在 IT 监控系统中,人们对因果发现的兴趣日益增长,因为了解 IT 系统各组件之间的因果关系有助于减少停机时间、提升系统性能,并定位异常和事故的根本原因;它还可以通过历史数据分析主动预测未来问题。尽管具有这些潜在优势,将因果发现算法应用于 IT 监控数据仍面临挑战,因为这类数据非常复杂:例如,IT 监控数据常常包含错位的时间序列、休眠时间序列、时间戳错误和缺失值。本文给出了在不同 IT 监控数据集上应用因果发现算法的案例研究,重点展示了其优势与仍然存在的挑战。
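Before any causal discovery algorithm can run, the misaligned, irregularly sampled monitoring series mentioned above must be brought onto a common time grid. The sketch below shows one plausible preprocessing step with pandas; the metric names, sampling rates, and the 1-minute grid are illustrative assumptions, not the paper's pipeline.

```python
import pandas as pd
import numpy as np

rng = np.random.default_rng(3)

# Two hypothetical monitoring metrics sampled on misaligned, irregular timestamps.
t_cpu = pd.date_range("2023-07-30 00:00", periods=50, freq="37s")
t_lat = pd.date_range("2023-07-30 00:00:10", periods=40, freq="45s")
cpu = pd.Series(rng.normal(50, 5, len(t_cpu)), index=t_cpu, name="cpu_usage")
latency = pd.Series(rng.normal(200, 20, len(t_lat)), index=t_lat, name="latency_ms")

# Align both series on a common 1-minute grid before running causal discovery;
# forward-fill handles "sleeping" series, interpolation handles remaining gaps.
aligned = pd.concat([cpu, latency], axis=1).resample("60s").mean()
aligned = aligned.ffill().interpolate()
print(aligned.head())
```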

Adversarial training for tabular data with attack propagation

  • paper_url: http://arxiv.org/abs/2307.15677
  • repo_url: None
  • paper_authors: Tiago Leon Melo, João Bravo, Marco O. P. Sampaio, Paolo Romano, Hugo Ferreira, João Tiago Ascensão, Pedro Bizarro
  • for: 防止恶意攻击者误导机器学习模型、将欺诈活动错误地判定为合法,从而减少业务损失并减轻系统维护人员的工作负担。
  • methods: 提出了一种新的对抗训练方法,在训练循环中使攻击在攻击者可操控的原始特征空间与经过特征工程变换的模型特征空间之间传播。(条目末尾附有一个示意性代码草图。)
  • results: 在真实的信用卡欺诈检测数据集上,该方法可以在中等强度攻击下避免约 30% 的性能下降,在极强攻击下则必不可少,而在无攻击情况下的性能损失小于 7%。
    Abstract Adversarial attacks are a major concern in security-centered applications, where malicious actors continuously try to mislead Machine Learning (ML) models into wrongly classifying fraudulent activity as legitimate, whereas system maintainers try to stop them. Adversarially training ML models that are robust against such attacks can prevent business losses and reduce the work load of system maintainers. In such applications data is often tabular and the space available for attackers to manipulate undergoes complex feature engineering transformations, to provide useful signals for model training, to a space attackers cannot access. Thus, we propose a new form of adversarial training where attacks are propagated between the two spaces in the training loop. We then test this method empirically on a real world dataset in the domain of credit card fraud detection. We show that our method can prevent about 30% performance drops under moderate attacks and is essential under very aggressive attacks, with a trade-off loss in performance under no attacks smaller than 7%.
    摘要 对于以安全为核心的应用而言,对抗攻击是一个重大问题:恶意攻击者不断尝试误导机器学习(ML)模型,使其将欺诈活动错误地分类为合法,而系统维护人员则努力阻止他们。通过对抗训练使 ML 模型对此类攻击具有鲁棒性,可以避免业务损失并减轻系统维护人员的工作负担。在这类应用中,数据通常是表格形式的,攻击者可操控的空间会经过复杂的特征工程变换,以便为模型训练提供有用的信号,而变换后的空间是攻击者无法访问的。因此,我们提出了一种新的对抗训练形式,在训练循环中使攻击在这两个空间之间传播。我们在信用卡欺诈检测领域的真实数据集上对该方法进行了实证测试,结果表明该方法可以在中等强度攻击下避免约 30% 的性能下降,在极强攻击下则必不可少,而在无攻击情况下的性能损失小于 7%。
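A compact NumPy sketch of the general idea of propagating an attack between the raw feature space (which an attacker controls) and an engineered model space: the loss gradient is pushed back through a stand-in feature-engineering transform to perturb the raw features, and the model is then trained on the perturbed batch. The linear transform, logistic model, FGSM-style step, and all constants are illustrative assumptions, not the paper's actual attack or training scheme.

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical setup: attackers manipulate raw tabular features x; the model sees
# engineered features z = x @ T (a stand-in for real feature engineering).
n, d_raw, d_eng = 256, 6, 10
T = rng.normal(size=(d_raw, d_eng))
X = rng.normal(size=(n, d_raw))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(float)   # toy fraud labels
w = np.zeros(d_eng)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

lr, eps = 0.1, 0.3
for epoch in range(200):
    # Propagate the attack from model space back to the raw space the attacker
    # controls: the loss gradient w.r.t. raw x flows through the transform T.
    p = sigmoid(X @ T @ w)
    grad_x = np.outer(p - y, T @ w)               # d(loss)/d(x), sign used for an FGSM-style step
    X_adv = X + eps * np.sign(grad_x)
    # Train on the adversarially perturbed batch (standard adversarial training).
    z_adv = X_adv @ T
    p_adv = sigmoid(z_adv @ w)
    w -= lr * z_adv.T @ (p_adv - y) / n

print("train acc on clean data:", ((sigmoid(X @ T @ w) > 0.5) == y).mean())
```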

Bayesian Time-Series Classifier for Decoding Simple Visual Stimuli from Intracranial Neural Activity

  • paper_url: http://arxiv.org/abs/2307.15672
  • repo_url: None
  • paper_authors: Navid Ziaei, Reza Saadatifard, Ali Yousefi, Behzad Nazari, Sydney S. Cash, Angelique C. Paulk
  • for: This paper addresses the need for analytical tools that can handle limited data and the intrinsic stochasticity present in neural data, with the goal of understanding how external stimuli are encoded in distributed neural activity.
  • methods: The proposed Bayesian time series classifier (BTsC) model is used to classify neural data and decode colors in a visual task; it is based on a straightforward approach that maintains a high level of interpretability. (An illustrative sketch follows this entry.)
  • results: The BTsC model exhibits consistent and reliable average performance of 75.55% on a dataset of 4 patients, improving upon state-of-the-art machine learning techniques by about 3.0 percent. In addition to its high classification accuracy, it provides interpretable results, making it a valuable tool for studying neural activity across tasks and categories.
    Abstract Understanding how external stimuli are encoded in distributed neural activity is of significant interest in clinical and basic neuroscience. To address this need, it is essential to develop analytical tools capable of handling limited data and the intrinsic stochasticity present in neural data. In this study, we propose a straightforward Bayesian time series classifier (BTsC) model that tackles these challenges whilst maintaining a high level of interpretability. We demonstrate the classification capabilities of this approach by utilizing neural data to decode colors in a visual task. The model exhibits consistent and reliable average performance of 75.55% on 4 patients' dataset, improving upon state-of-the-art machine learning techniques by about 3.0 percent. In addition to its high classification accuracy, the proposed BTsC model provides interpretable results, making the technique a valuable tool to study neural activity in various tasks and categories. The proposed solution can be applied to neural data recorded in various tasks, where there is a need for interpretable results and accurate classification accuracy.
    摘要 理解外部刺激如何在分布式神经活动中被编码,是临床和基础神经科学中非常重要的问题。为此,需要开发能够处理有限数据以及神经数据内在随机性的分析工具。在本研究中,我们提出了一种简洁的贝叶斯时间序列分类器(BTsC)模型,它在应对这些挑战的同时保持了高度的可解释性。我们利用神经数据在视觉任务中解码颜色,展示了该方法的分类能力。该模型在 4 名患者的数据集上取得了稳定可靠的 75.55% 平均分类准确率,比最先进的机器学习方法高出约 3.0%。除了较高的分类准确率之外,所提出的 BTsC 模型还能给出可解释的结果,使其成为研究各种任务和类别下神经活动的有价值工具。该方案可应用于各类任务中记录的神经数据,适用于需要可解释结果和较高分类准确率的场景。
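A toy NumPy sketch of the flavor of Bayesian time-series classification used here: each class is modeled as a Gaussian over simple trial features (time-averaged channel activity) and a test trial is assigned to the class with the higher likelihood. The feature choice, Gaussian likelihood, channel/trial counts, and the two-color setup are assumptions for illustration; the actual BTsC likelihood is not specified in the abstract.

```python
import numpy as np

rng = np.random.default_rng(5)

# Simulated trials: (n_trials, n_channels, n_time) neural recordings per color class.
n_trials, n_channels, n_time = 40, 8, 100
X_red = rng.normal(0.0, 1.0, size=(n_trials, n_channels, n_time))
X_blue = rng.normal(0.5, 1.0, size=(n_trials, n_channels, n_time))

def fit_class(trials):
    # Per-channel Gaussian over time-averaged activity.
    feats = trials.mean(axis=2)
    return feats.mean(axis=0), feats.std(axis=0) + 1e-6

def log_likelihood(trial, mu, sigma):
    f = trial.mean(axis=1)
    return -0.5 * np.sum(((f - mu) / sigma) ** 2 + 2 * np.log(sigma))

params = {"red": fit_class(X_red), "blue": fit_class(X_blue)}
test_trial = rng.normal(0.5, 1.0, size=(n_channels, n_time))   # drawn like a "blue" trial
scores = {c: log_likelihood(test_trial, *p) for c, p in params.items()}
print(max(scores, key=scores.get))   # most likely color given the Gaussian class models
```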

CoRe Optimizer: An All-in-One Solution for Machine Learning

  • paper_url: http://arxiv.org/abs/2307.15663
  • repo_url: https://github.com/jettbrains/-L-
  • paper_authors: Marco Eckhoff, Markus Reiher
  • for: 训练机器学习模型的优化算法和其超参数可以对训练速度和模型准确率产生重要影响。
  • methods: 本文将 CoRe 优化器与包括 Adam 优化器和弹性反向传播(RPROP)在内的另外 9 种优化算法,在多种机器学习任务上进行了广泛的性能比较,并分析了不同超参数的影响。(条目末尾附有一个示意性代码草图。)
  • results: 研究发现,CoRe优化器在各种机器学习任务中表现最佳或与其他优化器竞争,而只需要根据mini-batch或批处理学习而改变一个超参数。
    Abstract The optimization algorithm and its hyperparameters can significantly affect the training speed and resulting model accuracy in machine learning applications. The wish list for an ideal optimizer includes fast and smooth convergence to low error, low computational demand, and general applicability. Our recently introduced continual resilient (CoRe) optimizer has shown superior performance compared to other state-of-the-art first-order gradient-based optimizers for training lifelong machine learning potentials. In this work we provide an extensive performance comparison of the CoRe optimizer and nine other optimization algorithms including the Adam optimizer and resilient backpropagation (RPROP) for diverse machine learning tasks. We analyze the influence of different hyperparameters and provide generally applicable values. The CoRe optimizer yields best or competitive performance in every investigated application, while only one hyperparameter needs to be changed depending on mini-batch or batch learning.
    摘要 优化算法及其超参数会显著影响机器学习应用中模型的训练速度和最终准确率。理想优化器的愿望清单包括:快速且平滑地收敛到低误差、低计算开销以及广泛的适用性。我们最近提出的连续弹性(CoRe)优化器,在训练终身机器学习势(lifelong machine learning potentials)时表现优于其他最先进的一阶梯度优化器。在这项工作中,我们将 CoRe 优化器与包括 Adam 优化器和弹性反向传播(RPROP)在内的另外 9 种优化算法,在多种机器学习任务上进行了广泛的性能比较。我们分析了不同超参数的影响,并给出了普遍适用的取值。CoRe 优化器在所有被考察的应用中都取得了最佳或有竞争力的表现,而且只需根据小批量学习或全批量学习调整一个超参数。
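A tiny harness of the kind used for such optimizer comparisons, with two well-known update rules (plain SGD and Adam, implemented from scratch) driven by the same toy objective. The CoRe update rule itself is not reproduced here, and the quadratic objective, learning rates, and step counts are illustrative assumptions.

```python
import numpy as np

def quadratic_loss_grad(w):
    # Toy objective: ||w - 3||^2, standing in for a real training loss.
    return 2.0 * (w - 3.0)

def sgd(w, g, state, lr=0.1):
    return w - lr * g, state

def adam(w, g, state, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    m, v, t = state
    m, v, t = b1 * m + (1 - b1) * g, b2 * v + (1 - b2) * g * g, t + 1
    m_hat, v_hat = m / (1 - b1 ** t), v / (1 - b2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), (m, v, t)

# Minimal comparison loop: same start, same gradients, different update rules.
for name, step, state in [("SGD", sgd, None), ("Adam", adam, (0.0, 0.0, 0))]:
    w = np.array(10.0)
    for _ in range(100):
        w, state = step(w, quadratic_loss_grad(w), state)
    print(f"{name}: final w = {w:.4f} (optimum is 3.0)")
```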

Multi-layer Aggregation as a key to feature-based OOD detection

  • paper_url: http://arxiv.org/abs/2307.15647
  • repo_url: https://github.com/benolmbrt/MedicOOD
  • paper_authors: Benjamin Lambert, Florence Forbes, Senan Doyle, Michel Dojat
  • for: 本研究旨在探讨深度学习模型对训练阶段未见过的输入图像变化(即分布外,OOD)的检测能力,尤其是在可能出现的异常范围极广的医学图像分析场景中。
  • methods: 本研究比较了基于已训练模型中间特征的新一类 OOD 检测方法,可分为单层方法(使用在固定的、精心选择的层上获得的特征图)和多层方法(使用模型生成的全部特征图集合)。(条目末尾附有一个示意性代码草图。)
  • results: 在涵盖 20 种异常类型、约 7800 个 3D MRI 的大规模对比中,多层方法始终优于单层方法,后者的表现随异常类型而不稳定;此外,OOD 检测性能在很大程度上取决于底层神经网络的架构。
    Abstract Deep Learning models are easily disturbed by variations in the input images that were not observed during the training stage, resulting in unpredictable predictions. Detecting such Out-of-Distribution (OOD) images is particularly crucial in the context of medical image analysis, where the range of possible abnormalities is extremely wide. Recently, a new category of methods has emerged, based on the analysis of the intermediate features of a trained model. These methods can be divided into 2 groups: single-layer methods that consider the feature map obtained at a fixed, carefully chosen layer, and multi-layer methods that consider the ensemble of the feature maps generated by the model. While promising, a proper comparison of these algorithms is still lacking. In this work, we compared various feature-based OOD detection methods on a large spectra of OOD (20 types), representing approximately 7800 3D MRIs. Our experiments shed the light on two phenomenons. First, multi-layer methods consistently outperform single-layer approaches, which tend to have inconsistent behaviour depending on the type of anomaly. Second, the OOD detection performance highly depends on the architecture of the underlying neural network.
    摘要 深度学习模型容易受到训练阶段未曾观察到的输入图像变化的干扰,从而产生不可预测的预测结果。检测这类分布外(Out-of-Distribution,OOD)图像在医学图像分析中尤为关键,因为可能出现的异常范围极其广泛。最近出现了一类基于已训练模型中间特征分析的新方法,可分为两组:单层方法,考察在固定的、精心选择的层上获得的特征图;多层方法,考察模型生成的全部特征图集合。这些方法虽有前景,但仍缺乏恰当的比较。在这项工作中,我们在涵盖 20 种 OOD 类型、约 7800 个 3D MRI 的大规模数据上比较了多种基于特征的 OOD 检测方法。实验揭示了两个现象:第一,多层方法始终优于单层方法,而单层方法的表现随异常类型而不稳定;第二,OOD 检测性能在很大程度上取决于底层神经网络的架构。
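A minimal NumPy sketch contrasting the two families above: per-layer scores measure how far a sample's intermediate features lie from the training distribution, and the multi-layer variant aggregates those scores across layers instead of trusting one hand-picked layer. The simulated features, standardized-distance score, and mean aggregation are assumptions for illustration; the surveyed methods use their own feature statistics and aggregation rules.

```python
import numpy as np

rng = np.random.default_rng(6)

# Hypothetical per-layer feature vectors for in-distribution training data:
# layer_feats[l] has shape (n_train, d_l). In practice these come from the
# trained network; here they are simulated.
layer_feats = [rng.normal(size=(500, d)) for d in (32, 64, 128)]
layer_stats = [(f.mean(axis=0), f.std(axis=0) + 1e-6) for f in layer_feats]

def layer_score(x, mu, sigma):
    # Simple standardized distance to the training distribution at one layer.
    return np.linalg.norm((x - mu) / sigma) / np.sqrt(len(mu))

def multi_layer_ood_score(per_layer_inputs):
    # Multi-layer aggregation: combine the per-layer scores (here, by averaging)
    # instead of relying on a single carefully chosen layer.
    scores = [layer_score(x, mu, sigma)
              for x, (mu, sigma) in zip(per_layer_inputs, layer_stats)]
    return float(np.mean(scores))

in_dist_sample = [rng.normal(size=d) for d in (32, 64, 128)]
ood_sample = [rng.normal(loc=3.0, size=d) for d in (32, 64, 128)]
print("in-dist score :", multi_layer_ood_score(in_dist_sample))
print("OOD score     :", multi_layer_ood_score(ood_sample))
```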

Scale-aware Test-time Click Adaptation for Pulmonary Nodule and Mass Segmentation

  • paper_url: http://arxiv.org/abs/2307.15645
  • repo_url: https://github.com/splinterli/sattca
  • paper_authors: Zhihao Li, Jiancheng Yang, Yongchao Xu, Li Zhang, Wenhui Dong, Bo Du
  • for: 这篇论文旨在提高肺癌筛查中肺结节与肿块分割的精度,特别是应对不同大小的病变。
  • methods: 论文提出了一种多尺度神经网络,并结合尺度感知的测试时点击自适应(Scale-aware Test-time Click Adaptation)方法,利用易于获取的病变点击作为测试时提示来提升分割性能,尤其是针对大病变。
  • results: 在开源和自有数据集上的大量实验一致表明,该方法优于一些基于 CNN 和 Transformer 的分割方法,并能很好地处理不同大小的病变。
    Abstract Pulmonary nodules and masses are crucial imaging features in lung cancer screening that require careful management in clinical diagnosis. Despite the success of deep learning-based medical image segmentation, the robust performance on various sizes of lesions of nodule and mass is still challenging. In this paper, we propose a multi-scale neural network with scale-aware test-time adaptation to address this challenge. Specifically, we introduce an adaptive Scale-aware Test-time Click Adaptation method based on effortlessly obtainable lesion clicks as test-time cues to enhance segmentation performance, particularly for large lesions. The proposed method can be seamlessly integrated into existing networks. Extensive experiments on both open-source and in-house datasets consistently demonstrate the effectiveness of the proposed method over some CNN and Transformer-based segmentation methods. Our code is available at https://github.com/SplinterLi/SaTTCA
    摘要 肺结节和肿块是肺癌筛查中重要的影像特征,在临床诊断中需要谨慎管理。尽管基于深度学习的医学图像分割已取得成功,但在不同大小的结节与肿块病变上保持稳健的性能仍具挑战。在这篇论文中,我们提出了一种多尺度神经网络,并结合尺度感知的测试时自适应来应对这一挑战。具体而言,我们引入了一种自适应的尺度感知测试时点击自适应方法,利用易于获取的病变点击作为测试时提示来提升分割性能,尤其是针对大病变。该方法可以无缝集成到现有网络中。在开源和自有数据集上的大量实验一致表明,所提方法优于一些基于 CNN 和 Transformer 的分割方法。我们的代码见 https://github.com/SplinterLi/SaTTCA 。

Scaling Data Generation in Vision-and-Language Navigation

  • paper_url: http://arxiv.org/abs/2307.15644
  • repo_url: https://github.com/wz0919/scalevln
  • paper_authors: Zun Wang, Jialu Li, Yicong Hong, Yi Wang, Qi Wu, Mohit Bansal, Stephen Gould, Hao Tan, Yu Qiao
  • for: 提高视觉-语言导航(VLN)智能体的泛化能力和可靠性。
  • methods: 利用 HM3D 和 Gibson 数据集中的 1200 多个照片级真实环境以及网络上完全可获取的资源,合成了 490 万条指令-轨迹对,并研究如何利用这些增强数据对智能体进行预训练和微调。
  • results: 借助这一大规模数据集,现有智能体的性能得到提升,在 R2R 测试集上的单次运行成功率较此前的 SoTA 绝对提高 11%,达到 80%;已见环境与未见环境之间长期存在的泛化差距也缩小到不足 1%(此前最佳方法为 8%);此外,该范式还帮助不同模型在 CVDN、REVERIE 和连续环境下的 R2R 上取得了新的最优导航结果。
    Abstract Recent research in language-guided visual navigation has demonstrated a significant demand for the diversity of traversable environments and the quantity of supervision for training generalizable agents. To tackle the common data scarcity issue in existing vision-and-language navigation datasets, we propose an effective paradigm for generating large-scale data for learning, which applies 1200+ photo-realistic environments from HM3D and Gibson datasets and synthesizes 4.9 million instruction trajectory pairs using fully-accessible resources on the web. Importantly, we investigate the influence of each component in this paradigm on the agent's performance and study how to adequately apply the augmented data to pre-train and fine-tune an agent. Thanks to our large-scale dataset, the performance of an existing agent can be pushed up (+11% absolute with regard to previous SoTA) to a significantly new best of 80% single-run success rate on the R2R test split by simple imitation learning. The long-lasting generalization gap between navigating in seen and unseen environments is also reduced to less than 1% (versus 8% in the previous best method). Moreover, our paradigm also facilitates different models to achieve new state-of-the-art navigation results on CVDN, REVERIE, and R2R in continuous environments.
    摘要