cs.AI - 2023-10-08

Optimizing Solution-Samplers for Combinatorial Problems: The Landscape of Policy-Gradient Methods

  • paper_url: http://arxiv.org/abs/2310.05309
  • repo_url: None
  • paper_authors: Constantine Caramanis, Dimitris Fotakis, Alkis Kalavasis, Vasilis Kontonis, Christos Tzamos
  • for: The paper provides a new theoretical framework for analyzing the effectiveness of Deep Neural Network and Reinforcement Learning methods on challenging combinatorial problems.
  • methods: A deep neural network is used as a solution generator and is trained with gradient-based methods (e.g., policy gradient) to successively obtain better solution distributions.
  • results: The main contribution is a positive answer, showing that such methods can effectively tackle a broad class of combinatorial problems including Max- and Min-Cut, Max-$k$-CSP, Maximum-Weight Bipartite Matching, and the Traveling Salesman Problem. The paper also introduces a novel regularization process over vanilla gradient descent and provides theoretical and experimental evidence that it helps address vanishing gradients and escape bad stationary points.
    Abstract Deep Neural Networks and Reinforcement Learning methods have empirically shown great promise in tackling challenging combinatorial problems. In those methods a deep neural network is used as a solution generator which is then trained by gradient-based methods (e.g., policy gradient) to successively obtain better solution distributions. In this work we introduce a novel theoretical framework for analyzing the effectiveness of such methods. We ask whether there exist generative models that (i) are expressive enough to generate approximately optimal solutions; (ii) have a tractable, i.e., polynomial in the size of the input, number of parameters; (iii) their optimization landscape is benign in the sense that it does not contain sub-optimal stationary points. Our main contribution is a positive answer to this question. Our result holds for a broad class of combinatorial problems including Max- and Min-Cut, Max-$k$-CSP, Maximum-Weight-Bipartite-Matching, and the Traveling Salesman Problem. As a byproduct of our analysis we introduce a novel regularization process over vanilla gradient descent and provide theoretical and experimental evidence that it helps address vanishing-gradient issues and escape bad stationary points.
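As a concrete illustration of the setup analyzed above, the hedged sketch below trains a small neural solution sampler for Max-Cut with a REINFORCE-style policy gradient. It is not the authors' construction; the network, graph size, and sample count are illustrative assumptions.

```python
# Hypothetical sketch: a neural solution sampler for Max-Cut trained by policy gradient.
import torch
import torch.nn as nn

class SolutionSampler(nn.Module):
    """Maps a graph's (flattened) adjacency matrix to per-node Bernoulli probabilities."""
    def __init__(self, n_nodes: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_nodes * n_nodes, hidden), nn.ReLU(),
            nn.Linear(hidden, n_nodes), nn.Sigmoid(),
        )

    def forward(self, adj: torch.Tensor) -> torch.Tensor:
        return self.net(adj.flatten())

def cut_value(adj: torch.Tensor, assignment: torch.Tensor) -> torch.Tensor:
    """Total weight of edges crossing the partition given by a 0/1 node assignment."""
    diff = (assignment.unsqueeze(0) != assignment.unsqueeze(1)).float()
    return 0.5 * (adj * diff).sum()

def policy_gradient_step(sampler, adj, opt, n_samples: int = 32):
    probs = sampler(adj)
    dist = torch.distributions.Bernoulli(probs=probs)
    samples = dist.sample((n_samples,))                        # candidate partitions
    rewards = torch.stack([cut_value(adj, s) for s in samples])
    baseline = rewards.mean()                                  # simple variance reduction
    log_probs = dist.log_prob(samples).sum(dim=-1)
    loss = -((rewards - baseline) * log_probs).mean()          # REINFORCE objective
    opt.zero_grad(); loss.backward(); opt.step()
    return rewards.max().item()

# Toy usage on a random undirected graph
n = 10
adj = (torch.rand(n, n) < 0.3).float()
adj = torch.triu(adj, 1); adj = adj + adj.T
sampler = SolutionSampler(n)
opt = torch.optim.Adam(sampler.parameters(), lr=1e-2)
for _ in range(200):
    best_cut = policy_gradient_step(sampler, adj, opt)
```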

Tailoring Self-Attention for Graph via Rooted Subtrees

  • paper_url: http://arxiv.org/abs/2310.05296
  • repo_url: https://github.com/lumia-group/subtree-attention
  • paper_authors: Siyuan Huang, Yunchong Song, Jiayue Zhou, Zhouhan Lin
  • for: The paper proposes a new multi-hop graph attention mechanism to address the shortcomings of both local and global attention in existing graph attention mechanisms.
  • methods: The proposed mechanism, Subtree Attention (STA), captures both long-range information and fine-grained local information, with a theoretical proof that STA approximates global attention under extreme settings.
  • results: On ten node classification datasets, STA-based models perform strongly, outperforming existing graph Transformers and mainstream GNNs.
    Abstract Attention mechanisms have made significant strides in graph learning, yet they still exhibit notable limitations: local attention faces challenges in capturing long-range information due to the inherent problems of the message-passing scheme, while global attention cannot reflect the hierarchical neighborhood structure and fails to capture fine-grained local information. In this paper, we propose a novel multi-hop graph attention mechanism, named Subtree Attention (STA), to address the aforementioned issues. STA seamlessly bridges the fully-attentional structure and the rooted subtree, with theoretical proof that STA approximates the global attention under extreme settings. By allowing direct computation of attention weights among multi-hop neighbors, STA mitigates the inherent problems in existing graph attention mechanisms. Further we devise an efficient form for STA by employing kernelized softmax, which yields a linear time complexity. Our resulting GNN architecture, the STAGNN, presents a simple yet performant STA-based graph neural network leveraging a hop-aware attention strategy. Comprehensive evaluations on ten node classification datasets demonstrate that STA-based models outperform existing graph transformers and mainstream GNNs. The code is available at https://github.com/LUMIA-Group/SubTree-Attention.
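The linear time complexity of STA rests on kernelized softmax. The hedged sketch below shows generic kernelized (linear) attention under an assumed ELU+1 feature map; it illustrates the O(N) idea rather than the paper's exact kernel or hop-aware design.

```python
# Illustrative linear (kernelized) attention: O(N d^2) instead of O(N^2 d).
import torch

def phi(x: torch.Tensor) -> torch.Tensor:
    """A simple positive feature map (ELU + 1), common in linear-attention variants."""
    return torch.nn.functional.elu(x) + 1.0

def linear_attention(q, k, v):
    """q, k: (N, d) queries/keys; v: (N, d_v) values."""
    q, k = phi(q), phi(k)
    kv = k.T @ v                               # (d, d_v): aggregate keys and values once
    z = q @ k.sum(dim=0, keepdim=True).T       # (N, 1): normalization term
    return (q @ kv) / (z + 1e-6)

# Toy usage
N, d = 1000, 32
q, k, v = torch.randn(N, d), torch.randn(N, d), torch.randn(N, d)
out = linear_attention(q, k, v)                # (N, d)
```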

Generalizable Error Modeling for Search Relevance Data Annotation Tasks

  • paper_url: http://arxiv.org/abs/2310.05286
  • repo_url: None
  • paper_authors: Heinrich Peters, Alireza Hashemi, James Rae
  • for: The paper aims to improve the quality of machine learning and AI systems, specifically by building and evaluating a predictive error model for search relevance annotation tasks.
  • methods: A predictive error model is trained and applied to three industry-scale ML applications (music streaming, video streaming, and mobile apps).
  • results: The error model achieves moderate performance (AUC=0.65-0.75) and generalizes well across applications. The paper also provides model explainability analyses to identify the main drivers of predictive performance, and demonstrates the model's usefulness for auditing, improving the efficiency and quality of the data annotation process.
    Abstract Human data annotation is critical in shaping the quality of machine learning (ML) and artificial intelligence (AI) systems. One significant challenge in this context is posed by annotation errors, as their effects can degrade the performance of ML models. This paper presents a predictive error model trained to detect potential errors in search relevance annotation tasks for three industry-scale ML applications (music streaming, video streaming, and mobile apps) and assesses its potential to enhance the quality and efficiency of the data annotation process. Drawing on real-world data from an extensive search relevance annotation program, we illustrate that errors can be predicted with moderate model performance (AUC=0.65-0.75) and that model performance generalizes well across applications (i.e., a global, task-agnostic model performs on par with task-specific models). We present model explainability analyses to identify which types of features are the main drivers of predictive performance. Additionally, we demonstrate the usefulness of the model in the context of auditing, where prioritizing tasks with high predicted error probabilities considerably increases the amount of corrected annotation errors (e.g., 40% efficiency gains for the music streaming application). These results underscore that automated error detection models can yield considerable improvements in the efficiency and quality of data annotation processes. Thus, our findings reveal critical insights into effective error management in the data annotation process, thereby contributing to the broader field of human-in-the-loop ML.
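To make the auditing use case concrete, here is a hedged sketch of training a task-agnostic error classifier and ranking tasks by predicted error probability. The features, model choice, and synthetic labels are assumptions, not the paper's pipeline.

```python
# Illustrative sketch: train an error classifier, then audit highest-risk tasks first.
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 12))   # e.g. annotator, task, and content features (assumed)
y = (X[:, 0] + 0.5 * X[:, 3] + rng.normal(size=5000) > 1.5).astype(int)  # 1 = annotation error

model = GradientBoostingClassifier().fit(X[:4000], y[:4000])
p_err = model.predict_proba(X[4000:])[:, 1]
print("AUC:", roc_auc_score(y[4000:], p_err))

# Prioritize tasks with the highest predicted error probability for auditing
audit_order = np.argsort(-p_err)
```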

Are Personalized Stochastic Parrots More Dangerous? Evaluating Persona Biases in Dialogue Systems

  • paper_url: http://arxiv.org/abs/2310.05280
  • repo_url: https://github.com/uclanlp/persona-biases
  • paper_authors: Yixin Wan, Jieyu Zhao, Aman Chadha, Nanyun Peng, Kai-Wei Chang
  • for: The study investigates the risks of adopting personas in dialogue systems and how these risks affect system behavior and user experience.
  • methods: Using the UNIVERSALPERSONA dataset, the study benchmarks four different dialogue systems and evaluates persona biases along five metrics.
  • results: The study uncovers significant persona biases in dialogue systems, including disrespectful and harmful responses, which can mislead and harm users.
    Abstract Recent advancements in Large Language Models empower them to follow freeform instructions, including imitating generic or specific demographic personas in conversations. We define generic personas to represent demographic groups, such as "an Asian person", whereas specific personas may take the form of specific popular Asian names like "Yumi". While the adoption of personas enriches user experiences by making dialogue systems more engaging and approachable, it also casts a shadow of potential risk by exacerbating social biases within model responses, thereby causing societal harm through interactions with users. In this paper, we systematically study "persona biases", which we define to be the sensitivity of dialogue models' harmful behaviors contingent upon the personas they adopt. We categorize persona biases into biases in harmful expression and harmful agreement, and establish a comprehensive evaluation framework to measure persona biases in five aspects: Offensiveness, Toxic Continuation, Regard, Stereotype Agreement, and Toxic Agreement. Additionally, we propose to investigate persona biases by experimenting with UNIVERSALPERSONA, a systematically constructed persona dataset encompassing various types of both generic and specific model personas. Through benchmarking on four different models -- including Blender, ChatGPT, Alpaca, and Vicuna -- our study uncovers significant persona biases in dialogue systems. Our findings also underscore the pressing need to revisit the use of personas in dialogue agents to ensure safe application.

Measuring reasoning capabilities of ChatGPT

  • paper_url: http://arxiv.org/abs/2310.05993
  • repo_url: None
  • paper_authors: Adrian Groza
  • for: The paper aims to quantify the logical faults that ChatGPT generates on reasoning tasks.
  • methods: The author uses ChatGPT to solve 144 logical puzzles and verifies the solutions with Prover9 and Mace4.
  • results: ChatGPT solves only 7% of the puzzles correctly, while BARD solves 5%. In addition, the author identifies 67 kinds of logical faults in ChatGPT's solutions, with an average of 7 faults per reasoning task.
    Abstract I shall quantify the logical faults generated by ChatGPT when applied to reasoning tasks. For experiments, I use the 144 puzzles from the library https://users.utcluj.ro/~agroza/puzzles/maloga [groza:fol]. The library contains puzzles of various types, including arithmetic puzzles, logical equations, Sudoku-like puzzles, zebra-like puzzles, truth-telling puzzles, grid puzzles, strange numbers, and self-reference puzzles. The correct solutions for these puzzles were checked using the theorem prover Prover9 [mccune2005release] and the finite-model finder Mace4 [mccune2003mace4], based on human modelling in Equational First-Order Logic. A first output of this study is the benchmark of 100 logical puzzles. On this dataset ChatGPT provided both the correct answer and justification for only 7%, while BARD did so for only 5%. Since the dataset seems challenging, researchers are invited to test it on more advanced or tuned models than ChatGPT-3.5 with more carefully crafted prompts. A second output is the classification of reasoning faults conveyed by ChatGPT. This classification forms a basis for a taxonomy of reasoning faults generated by large language models. I have identified 67 such logical faults, among which: inconsistencies, implications that do not hold, unsupported claims, lack of commonsense, and wrong justifications. The 100 solutions generated by ChatGPT contain 698 logical faults, that is, on average 7 fallacies per reasoning task. A third output is the set of ChatGPT answers annotated with the corresponding logical faults. Each wrong statement within a ChatGPT answer was manually annotated, aiming to quantify the amount of faulty text generated by the language model. On average, 26.03% of the generated text was logically faulty.

Transforming Pixels into a Masterpiece: AI-Powered Art Restoration using a Novel Distributed Denoising CNN (DDCNN)

  • paper_url: http://arxiv.org/abs/2310.05270
  • repo_url: None
  • paper_authors: Sankar B., Mukil Saravanan, Kalaivanan Kumar, Siri Dubbaka
  • for: Restoring deteriorated artworks accurately and efficiently.
  • methods: Using deep learning and computer vision, the authors create a hybrid model based on a Distributed Denoising CNN (DDCNN) that adapts its restoration to different types and levels of damage.
  • results: Experiments show that the method effectively removes distortions while preserving fine detail, improving restoration quality and substantially outperforming traditional approaches.
    Abstract Art restoration is crucial for preserving cultural heritage, but traditional methods have limitations in faithfully reproducing original artworks while addressing issues like fading, staining, and damage. We present an innovative approach using deep learning, specifically Convolutional Neural Networks (CNNs), and Computer Vision techniques to revolutionize art restoration. We start by creating a diverse dataset of deteriorated art images with various distortions and degradation levels. This dataset trains a Distributed Denoising CNN (DDCNN) to remove distortions while preserving intricate details. Our method is adaptable to different distortion types and levels, making it suitable for various deteriorated artworks, including paintings, sketches, and photographs. Extensive experiments demonstrate our approach's efficiency and effectiveness compared to other Denoising CNN models. We achieve a substantial reduction in distortion, transforming deteriorated artworks into masterpieces. Quantitative evaluations confirm our method's superiority over traditional techniques, reshaping the art restoration field and preserving cultural heritage. In summary, our paper introduces an AI-powered solution that combines Computer Vision and deep learning with DDCNN to restore artworks accurately, overcoming limitations and paving the way for future advancements in art restoration.
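A minimal, hedged sketch of the general denoising-CNN setup (degraded input, clean target, residual prediction); the tiny architecture and synthetic data are assumptions and not the paper's DDCNN.

```python
# Illustrative denoising CNN trained on (degraded, clean) image pairs.
import torch
import torch.nn as nn

class TinyDenoiser(nn.Module):
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1),
        )

    def forward(self, x):
        return x - self.body(x)   # predict the residual distortion and subtract it

model = TinyDenoiser()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

clean = torch.rand(8, 3, 64, 64)                                    # placeholder clean patches
degraded = (clean + 0.1 * torch.randn_like(clean)).clamp(0, 1)      # synthetic distortion
loss = loss_fn(model(degraded), clean)
opt.zero_grad(); loss.backward(); opt.step()
```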

Federated Learning: A Cutting-Edge Survey of the Latest Advancements and Applications

  • paper_url: http://arxiv.org/abs/2310.05269
  • repo_url: None
  • paper_authors: Azim Akhtarshenas, Mohammad Ali Vahedifar, Navid Ayoobi, Behrouz Maham, Tohid Alizadeh, Sina Ebrahimi
  • for: The survey examines the potential and challenges of federated learning (FL) for achieving privacy and security in machine learning systems.
  • methods: The survey reviews how FL builds on distributed machine learning, and analyzes and compares recent FL applications in terms of efficiency, accuracy, and privacy protection.
  • results: FL can deliver privacy protection and cost efficiency, but open problems and challenges remain, such as data ownership and security, data distribution, and data privacy protection.
    Abstract In the realm of machine learning (ML) systems featuring client-host connections, the enhancement of privacy security can be effectively achieved through federated learning (FL) as a secure distributed ML methodology. FL effectively integrates cloud infrastructure to transfer ML models onto edge servers using blockchain technology. Through this mechanism, it guarantees the streamlined processing and data storage requirements of both centralized and decentralized systems, with an emphasis on scalability, privacy considerations, and cost-effective communication. In current FL implementations, data owners locally train their models, and subsequently upload the outcomes in the form of weights, gradients, and parameters to the cloud for overall model aggregation. This innovation obviates the necessity of engaging Internet of Things (IoT) clients and participants to communicate raw and potentially confidential data directly with a cloud center. This not only reduces the costs associated with communication networks but also enhances the protection of private data. This survey conducts an analysis and comparison of recent FL applications, aiming to assess their efficiency, accuracy, and privacy protection. However, in light of the complex and evolving nature of FL, it becomes evident that additional research is imperative to address lingering knowledge gaps and effectively confront the forthcoming challenges in this field. In this study, we categorize recent literature into the following clusters: privacy protection, resource allocation, case study analysis, and applications. Furthermore, at the end of each section, we tabulate the open areas and future directions presented in the referenced literature, affording researchers and scholars an insightful view of the evolution of the field.
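For readers new to FL, the following hedged sketch shows the canonical FedAvg-style server aggregation step that underlies the upload-weights-and-aggregate workflow described above; the single-layer model and client sizes are toy assumptions, not a specific surveyed system.

```python
# Minimal FedAvg-style aggregation: average client weights, weighted by local data size.
from typing import Dict, List
import numpy as np

def fed_avg(client_weights: List[Dict[str, np.ndarray]], client_sizes: List[int]) -> Dict[str, np.ndarray]:
    total = float(sum(client_sizes))
    return {
        k: sum(w[k] * (n / total) for w, n in zip(client_weights, client_sizes))
        for k in client_weights[0]
    }

# Toy usage: three clients sharing a single-layer model
clients = [{"w": np.random.randn(4, 2), "b": np.random.randn(2)} for _ in range(3)]
global_model = fed_avg(clients, client_sizes=[100, 250, 50])
```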

A Knowledge Graph-Based Search Engine for Robustly Finding Doctors and Locations in the Healthcare Domain

  • paper_url: http://arxiv.org/abs/2310.05258
  • repo_url: None
  • paper_authors: Mayank Kejriwal, Hamid Haidarian, Min-Hsueh Chiu, Andy Xiang, Deep Shrestha, Faizan Javed
  • for: The paper addresses the search problem of patients finding doctors and locations in the healthcare domain.
  • methods: A knowledge graph (KG) based search engine architecture combines semantic modeling of semi-structured data, natural language processing techniques, and structured query languages such as SPARQL and Cypher.
  • results: Early results show that the approach yields significantly higher coverage for complex queries without degrading quality.
    Abstract Efficiently finding doctors and locations is an important search problem for patients in the healthcare domain, for which traditional information retrieval methods tend not to work optimally. In the last ten years, knowledge graphs (KGs) have emerged as a powerful way to combine the benefits of gleaning insights from semi-structured data using semantic modeling, natural language processing techniques like information extraction, and robust querying using structured query languages like SPARQL and Cypher. In this short paper, we present a KG-based search engine architecture for robustly finding doctors and locations in the healthcare domain. Early results demonstrate that our approach can lead to significantly higher coverage for complex queries without degrading quality.
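As an illustration of the "robust querying" component, the sketch below issues a SPARQL query against a hypothetical healthcare KG. The schema, predicates, and endpoint URL are invented for illustration and are not the paper's.

```python
# Hedged example of the kind of structured query a KG-backed search engine might issue.
from SPARQLWrapper import SPARQLWrapper, JSON  # assumes the SPARQLWrapper package and a running endpoint

query = """
PREFIX ex: <http://example.org/healthcare#>
SELECT ?doctor ?location WHERE {
  ?doctor a ex:Doctor ;
          ex:specialty "cardiology" ;
          ex:practicesAt ?clinic .
  ?clinic ex:locatedIn ?location .
  FILTER(CONTAINS(LCASE(?location), "seattle"))
}
LIMIT 10
"""

endpoint = SPARQLWrapper("http://localhost:3030/healthcare/sparql")  # hypothetical endpoint
endpoint.setQuery(query)
endpoint.setReturnFormat(JSON)
results = endpoint.query().convert()
```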

Persis: A Persian Font Recognition Pipeline Using Convolutional Neural Networks

  • paper_url: http://arxiv.org/abs/2310.05255
  • repo_url: https://github.com/mehrdad-dev/persis
  • paper_authors: Mehrdad Mohammadian, Neda Maleki, Tobias Olsson, Fredrik Ahlgren
  • for: The paper addresses Persian font recognition within visual font recognition (VFR) systems.
  • methods: Convolutional neural networks (CNNs) are trained on newly introduced, publicly available datasets.
  • results: The proposed pipeline reaches 78.0% top-1 accuracy on the new datasets, 89.1% on the IDPL-PFOD dataset, and 94.5% on the KAFD dataset. Additionally, the average time spent in the entire pipeline for one sample of the proposed datasets is 0.54 seconds on CPU and 0.017 seconds on GPU.
    Abstract What happens if we encounter a suitable font for our design work but do not know its name? Visual Font Recognition (VFR) systems are used to identify the font typeface in an image. These systems can assist graphic designers in identifying fonts used in images. A VFR system also aids in improving the speed and accuracy of Optical Character Recognition (OCR) systems. In this paper, we introduce the first publicly available datasets in the field of Persian font recognition and employ Convolutional Neural Networks (CNN) to address this problem. The results show that the proposed pipeline obtained 78.0% top-1 accuracy on our new datasets, 89.1% on the IDPL-PFOD dataset, and 94.5% on the KAFD dataset. Furthermore, the average time spent in the entire pipeline for one sample of our proposed datasets is 0.54 and 0.017 seconds for CPU and GPU, respectively. We conclude that CNN methods can be used to recognize Persian fonts without the need for additional pre-processing steps such as feature extraction, binarization, normalization, etc.

Explainable Claim Verification via Knowledge-Grounded Reasoning with Large Language Models

  • paper_url: http://arxiv.org/abs/2310.05253
  • repo_url: https://github.com/wang2226/folk
  • paper_authors: Haoran Wang, Kai Shu
  • for: Verifying the veracity of claims to combat the spread of misinformation.
  • methods: First-Order-Logic-Guided Knowledge-Grounded (FOLK) reasoning verifies complex claims and generates human-readable explanations without requiring human-annotated data.
  • results: On three different datasets, FOLK outperforms strong baselines and provides clear explanations that help human fact-checkers understand the model's decision-making process.
    Abstract Claim verification plays a crucial role in combating misinformation. While existing works on claim verification have shown promising results, a crucial piece of the puzzle that remains unsolved is to understand how to verify claims without relying on human-annotated data, which is expensive to create at a large scale. Additionally, it is important for models to provide comprehensive explanations that can justify their decisions and assist human fact-checkers. This paper presents First-Order-Logic-Guided Knowledge-Grounded (FOLK) Reasoning that can verify complex claims and generate explanations without the need for annotated evidence using Large Language Models (LLMs). FOLK leverages the in-context learning ability of LLMs to translate the claim into a First-Order-Logic (FOL) clause consisting of predicates, each corresponding to a sub-claim that needs to be verified. Then, FOLK performs FOL-Guided reasoning over a set of knowledge-grounded question-and-answer pairs to make veracity predictions and generate explanations to justify its decision-making process. This process makes our model highly explanatory, providing clear explanations of its reasoning process in human-readable form. Our experiment results indicate that FOLK outperforms strong baselines on three datasets encompassing various claim verification challenges. Our code and data are available.

In-Context Convergence of Transformers

  • paper_url: http://arxiv.org/abs/2310.05249
  • repo_url: None
  • paper_authors: Yu Huang, Yuan Cheng, Yingbin Liang
  • for: The paper studies the learning dynamics of a one-layer transformer trained by gradient descent, toward understanding how models solve unseen tasks in context without further parameter fine-tuning.
  • methods: The analysis considers a one-layer transformer with softmax attention trained via gradient descent to in-context learn linear function classes.
  • results: For data with balanced or imbalanced features, the transformer enjoys finite-time convergence guarantees to near-zero prediction error, reached through different phases of the training dynamics.
    Abstract Transformers have recently revolutionized many domains in modern machine learning and one salient discovery is their remarkable in-context learning capability, where models can solve an unseen task by utilizing task-specific prompts without further parameters fine-tuning. This also inspired recent theoretical studies aiming to understand the in-context learning mechanism of transformers, which however focused only on linear transformers. In this work, we take the first step toward studying the learning dynamics of a one-layer transformer with softmax attention trained via gradient descent in order to in-context learn linear function classes. We consider a structured data model, where each token is randomly sampled from a set of feature vectors in either balanced or imbalanced fashion. For data with balanced features, we establish the finite-time convergence guarantee with near-zero prediction error by navigating our analysis over two phases of the training dynamics of the attention map. More notably, for data with imbalanced features, we show that the learning dynamics take a stage-wise convergence process, where the transformer first converges to a near-zero prediction error for the query tokens of dominant features, and then converges later to a near-zero prediction error for the query tokens of under-represented features, respectively via one and four training phases. Our proof features new techniques for analyzing the competing strengths of two types of attention weights, the change of which determines different training phases.
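A hedged sketch of the in-context linear-regression setting studied: a one-layer softmax-attention model trained by gradient descent to predict the label of a query token from in-context (x, y) pairs. Dimensions and the data model are illustrative assumptions, not the paper's exact construction.

```python
# Toy one-layer softmax attention trained to in-context learn linear functions.
import torch
import torch.nn as nn

d, n_ctx, batch = 8, 16, 64

class OneLayerAttention(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.dim = dim
        self.WQ = nn.Linear(dim, dim, bias=False)
        self.WK = nn.Linear(dim, dim, bias=False)

    def forward(self, ctx_x, ctx_y, query_x):
        # Attend from the query over the context tokens, then read out their labels.
        scores = self.WQ(query_x).unsqueeze(1) @ self.WK(ctx_x).transpose(1, 2)  # (B,1,n_ctx)
        attn = torch.softmax(scores / self.dim ** 0.5, dim=-1)
        return (attn @ ctx_y.unsqueeze(-1)).squeeze(-1).squeeze(-1)              # (B,)

model = OneLayerAttention(d)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
for _ in range(500):
    w = torch.randn(batch, d, 1)                      # a fresh linear task per prompt
    ctx_x = torch.randn(batch, n_ctx, d)
    ctx_y = (ctx_x @ w).squeeze(-1)                   # in-context labels
    query_x = torch.randn(batch, d)
    target = (query_x.unsqueeze(1) @ w).squeeze()     # ground-truth label of the query
    loss = ((model(ctx_x, ctx_y, query_x) - target) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
```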

ChatRadio-Valuer: A Chat Large Language Model for Generalizable Radiology Report Generation Based on Multi-institution and Multi-system Data

  • paper_url: http://arxiv.org/abs/2310.05242
  • repo_url: None
  • paper_authors: Tianyang Zhong, Wei Zhao, Yutong Zhang, Yi Pan, Peixin Dong, Zuowei Jiang, Xiaoyan Kui, Youlan Shang, Li Yang, Yaonai Wei, Longtao Yang, Hao Chen, Huan Zhao, Yuxiao Liu, Ning Zhu, Yiwei Li, Yisong Wang, Jiaqi Yao, Jiaqi Wang, Ying Zeng, Lei He, Chao Zheng, Zhixue Zhang, Ming Li, Zhengliang Liu, Haixing Dai, Zihao Wu, Lu Zhang, Shu Zhang, Xiaoyan Cai, Xintao Hu, Shijie Zhao, Xi Jiang, Xin Zhang, Xiang Li, Dajiang Zhu, Lei Guo, Dinggang Shen, Junwei Han, Tianming Liu, Jun Liu, Tuo Zhang
  • for: This work tackles radiology report generation in medical image analysis, enabling quantitative analysis in the diagnostic process.
  • methods: Building on large language models (LLMs), the authors develop ChatRadio-Valuer, a tailored model for automatic radiology report generation.
  • results: ChatRadio-Valuer diagnoses diseases from radiology reports better than existing models, in particular ChatGPT and GPT-4.
    Abstract Radiology report generation, as a key step in medical image analysis, is critical to the quantitative analysis of clinically informed decision-making levels. However, complex and diverse radiology reports with cross-source heterogeneity pose a huge generalizability challenge to the current methods under massive data volume, mainly because the style and normativity of radiology reports are obviously distinctive among institutions, body regions inspected and radiologists. Recently, the advent of large language models (LLM) offers great potential for recognizing signs of health conditions. To resolve the above problem, we collaborate with the Second Xiangya Hospital in China and propose ChatRadio-Valuer based on the LLM, a tailored model for automatic radiology report generation that learns generalizable representations and provides a basis pattern for model adaptation in sophisticated analysts' cases. Specifically, ChatRadio-Valuer is trained based on the radiology reports from a single institution by means of supervised fine-tuning, and then adapted to disease diagnosis tasks for human multi-system evaluation (i.e., chest, abdomen, muscle-skeleton, head, and maxillofacial & neck) from six different institutions in clinical-level events. The clinical dataset utilized in this study encompasses a remarkable total of 332,673 observations. From the comprehensive results on engineering indicators, clinical efficacy and deployment cost metrics, it can be shown that ChatRadio-Valuer consistently outperforms state-of-the-art models, especially ChatGPT (GPT-3.5-Turbo) and GPT-4 et al., in terms of the diseases diagnosis from radiology reports. ChatRadio-Valuer provides an effective avenue to boost model generalization performance and alleviate the annotation workload of experts to enable the promotion of clinical AI applications in radiology reports.

MindfulDiary: Harnessing Large Language Model to Support Psychiatric Patients’ Journaling

  • paper_url: http://arxiv.org/abs/2310.05231
  • repo_url: None
  • paper_authors: Taewan Kim, Seolyeong Bae, Hyun Ah Kim, Su-woo Lee, Hwajung Hong, Chanmo Yang, Young-Ho Kim
  • for: Helping psychiatric patients document their daily experiences and helping psychiatrists better understand patients' thoughts and daily contexts.
  • methods: A mobile journaling app incorporating a large language model (LLM) lets patients record daily experiences through free-form conversation while safely complying with professional guidelines.
  • results: A four-week field study found that MindfulDiary helped patients keep richer, more consistent daily records and helped psychiatrists better empathize with patients by understanding their thoughts and daily contexts.
    Abstract In the mental health domain, Large Language Models (LLMs) offer promising new opportunities, though their inherent complexity and low controllability have raised questions about their suitability in clinical settings. We present MindfulDiary, a mobile journaling app incorporating an LLM to help psychiatric patients document daily experiences through conversation. Designed in collaboration with mental health professionals (MHPs), MindfulDiary takes a state-based approach to safely comply with the experts' guidelines while carrying on free-form conversations. Through a four-week field study involving 28 patients with major depressive disorder and five psychiatrists, we found that MindfulDiary supported patients in consistently enriching their daily records and helped psychiatrists better empathize with their patients through an understanding of their thoughts and daily contexts. Drawing on these findings, we discuss the implications of leveraging LLMs in the mental health domain, bridging the technical feasibility and their integration into clinical settings.

Physics-aware Machine Learning Revolutionizes Scientific Paradigm for Machine Learning and Process-based Hydrology

  • paper_url: http://arxiv.org/abs/2310.05227
  • repo_url: None
  • paper_authors: Qingsong Xu, Yilei Shi, Jonathan Bamber, Ye Tuo, Ralf Ludwig, Xiao Xiang Zhu
  • for: Physics-aware machine learning (PaML) is introduced as a transformative approach to overcome the barrier between hydrology and machine learning and to revolutionize both fields.
  • methods: The review comprehensively analyzes existing PaML methodologies that integrate prior physical knowledge or physics-based modeling into machine learning, covering physical data-guided ML, physics-informed ML, physics-embedded ML, and physics-aware hybrid learning.
  • results: The review highlights the most promising and challenging directions for different objectives and PaML methods in hydrology, including rainfall-runoff and hydrodynamic processes. Additionally, a new PaML-based hydrology platform, termed HydroPML, is released as a foundation for hydrological applications; it enhances the explainability and causality of machine learning and lays the groundwork for realizing the digital water cycle.
    Abstract Accurate hydrological understanding and water cycle prediction are crucial for addressing scientific and societal challenges associated with the management of water resources, particularly under the dynamic influence of anthropogenic climate change. Existing reviews predominantly concentrate on the development of machine learning (ML) in this field, yet there is a clear distinction between hydrology and ML as separate paradigms. Here, we introduce physics-aware ML as a transformative approach to overcome the perceived barrier and revolutionize both fields. Specifically, we present a comprehensive review of the physics-aware ML methods, building a structured community (PaML) of existing methodologies that integrate prior physical knowledge or physics-based modeling into ML. We systematically analyze these PaML methodologies with respect to four aspects: physical data-guided ML, physics-informed ML, physics-embedded ML, and physics-aware hybrid learning. PaML facilitates ML-aided hypotheses, accelerating insights from big data and fostering scientific discoveries. We first conduct a systematic review of hydrology in PaML, including rainfall-runoff hydrological processes and hydrodynamic processes, and highlight the most promising and challenging directions for different objectives and PaML methods. Finally, a new PaML-based hydrology platform, termed HydroPML, is released as a foundation for hydrological applications. HydroPML enhances the explainability and causality of ML and lays the groundwork for the digital water cycle's realization. The HydroPML platform is publicly available at https://hydropml.github.io/.
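To ground the "physics-informed ML" category named above, here is a hedged sketch of a loss that combines a data term with the residual of a toy water-balance equation dS/dt = P - Q. The equation, network, and data are illustrative assumptions, not a specific method from the review.

```python
# Toy physics-informed loss: data misfit plus a governing-equation residual.
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1))   # S(t): storage over time

def physics_informed_loss(t_obs, s_obs, t_col, precip, runoff):
    data_loss = ((net(t_obs) - s_obs) ** 2).mean()
    t_col = t_col.clone().requires_grad_(True)
    s = net(t_col)
    ds_dt = torch.autograd.grad(s.sum(), t_col, create_graph=True)[0]
    physics_loss = ((ds_dt - (precip - runoff)) ** 2).mean()          # water-balance residual
    return data_loss + physics_loss

# Synthetic observations and collocation points
t_obs, s_obs = torch.rand(20, 1), torch.rand(20, 1)
t_col, precip, runoff = torch.rand(100, 1), torch.rand(100, 1), torch.rand(100, 1)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
loss = physics_informed_loss(t_obs, s_obs, t_col, precip, runoff)
opt.zero_grad(); loss.backward(); opt.step()
```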

Interpretable Semiotics Networks Representing Awareness

  • paper_url: http://arxiv.org/abs/2310.05212
  • repo_url: None
  • paper_authors: David Kupeev, Eyal Nitcany
  • for: The paper describes a computational model that tracks and simulates how humans perceive objects and how those perceptions are represented in communication.
  • methods: The model includes two key components ('observed' and 'seen'), related to the computer-vision terms 'encoding' and 'decoding'. Joined together, these elements form semiotic networks that simulate awareness in object perception and human communication.
  • results: Several experiments demonstrate the model, and on datasets with small training sets the compound network outperforms the standalone classification network. Future work will leverage the model to better understand human communication and personal representations.
    Abstract Humans perceive objects daily and communicate their perceptions using various channels. Here, we describe a computational model that tracks and simulates the perception of objects and their representations as they pass through communication. We describe two key components of our internal representation ('observed' and 'seen') and relate them to familiar computer vision terms (encoding and decoding). These elements, joined together, form semiotic networks, which simulate awareness in object perception and human communication. Nowadays, most neural networks are uninterpretable. On the other hand, our model is free from this disadvantage. We performed several experiments and demonstrated the visibility of our model. We describe how our network may be used as a preprocessing unit for any classification network. In our experiments, the compound network outperforms the classification network on average on datasets with small training data. Future work would leverage our model to gain a better understanding of human communication and personal representations.

TILFA: A Unified Framework for Text, Image, and Layout Fusion in Argument Mining

  • paper_url: http://arxiv.org/abs/2310.05210
  • repo_url: https://github.com/hkust-knowcomp/tilfa
  • paper_authors: Qing Zong, Zhaowei Wang, Baixuan Xu, Tianshi Zheng, Haochen Shi, Weiqi Wang, Yangqiu Song, Ginny Y. Wong, Simon See
  • for: The work targets Argument Mining, i.e., analyzing an author's stance.
  • methods: A new framework, TILFA (A Unified Framework for Text, Image, and Layout Fusion in Argument Mining), handles mixed data (text and images): it understands text while also detecting optical characters and recognizing layout details in images.
  • results: The model significantly outperforms existing baselines on the Argumentative Stance Classification subtask, earning the KnowComp team first place on the leaderboard.
    Abstract A main goal of Argument Mining (AM) is to analyze an author's stance. Unlike previous AM datasets focusing only on text, the shared task at the 10th Workshop on Argument Mining introduces a dataset including both text and images. Importantly, these images contain both visual elements and optical characters. Our new framework, TILFA (A Unified Framework for Text, Image, and Layout Fusion in Argument Mining), is designed to handle this mixed data. It excels at not only understanding text but also detecting optical characters and recognizing layout details in images. Our model significantly outperforms existing baselines, earning our team, KnowComp, the 1st place in the leaderboard of Argumentative Stance Classification subtask in this shared task.

Scaling Laws of RoPE-based Extrapolation

  • paper_url: http://arxiv.org/abs/2310.05209
  • repo_url: https://github.com/OpenLMLab/scaling-rope
  • paper_authors: Xiaoran Liu, Hang Yan, Shuo Zhang, Chenxin An, Xipeng Qiu, Dahua Lin
  • for: The paper studies the extrapolation capability of large language models (LLMs) based on Rotary Position Embedding (RoPE).
  • methods: The authors analyze RoPE-based extrapolation by modifying RoPE's rotary base and fine-tuning on texts of different context lengths.
  • results: With only 16K training length, tuning the rotary base and fine-tuning context length enables extrapolation up to 1,000,000 context length. The paper also proposes scaling laws from a periodic perspective to describe the relationship between extrapolation performance, base value, and tuning context length.
    Abstract The extrapolation capability of Large Language Models (LLMs) based on Rotary Position Embedding is currently a topic of considerable interest. The mainstream approach to addressing extrapolation with LLMs involves modifying RoPE by replacing 10000, the rotary base of $\theta_n={10000}^{-2n/d}$ in the original RoPE, with a larger value and providing longer fine-tuning text. In this work, we first observe that fine-tuning a RoPE-based LLM with either a smaller or larger base in pre-training context length could significantly enhance its extrapolation performance. After that, we propose the Scaling Laws of RoPE-based Extrapolation, a unified framework from the periodic perspective, to describe the relationship between the extrapolation performance and base value as well as tuning context length. In this process, we also explain the origin of the RoPE-based extrapolation issue by the critical dimension for extrapolation. Besides these observations and analyses, we achieve extrapolation up to 1 million context length within only 16K training length on LLaMA2 7B and 13B.
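A minimal NumPy illustration of rotary position embedding with an adjustable rotary base, the quantity the scaling laws relate to extrapolation; the shapes and usage are assumptions, not the paper's code.

```python
# Hedged RoPE sketch: theta_n = base ** (-2n/d), with the base exposed as a knob.
import numpy as np

def rope_rotate(x: np.ndarray, positions: np.ndarray, base: float = 10000.0) -> np.ndarray:
    """Apply RoPE to x of shape (seq_len, d) with even d."""
    seq_len, d = x.shape
    theta = base ** (-2.0 * np.arange(d // 2) / d)      # per-pair rotation frequencies
    angles = positions[:, None] * theta[None, :]        # (seq_len, d/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]                     # pair up dimensions
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

# A larger base slows the rotation frequencies, one of the knobs studied for extrapolation.
x = np.random.randn(16, 8)
pos = np.arange(16, dtype=float)
q_small_base = rope_rotate(x, pos, base=10000.0)
q_large_base = rope_rotate(x, pos, base=1000000.0)
```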

Quantifying Zero-shot Coordination Capability with Behavior Preferring Partners

  • paper_url: http://arxiv.org/abs/2310.05208
  • repo_url: None
  • paper_authors: Xihuai Wang, Shao Zhang, Wenhao Zhang, Wentao Dong, Jingxiao Chen, Ying Wen, Weinan Zhang
  • for: A reliable, comprehensive, and efficient method for evaluating zero-shot coordination (ZSC) capability.
  • methods: The authors propose an evaluation workflow built around ideal 'diversity-complete' evaluation partners, comprising the construction of such partners and a multi-dimensional metric, Best Response Proximity (BR-Prox).
  • results: Re-evaluating strong ZSC methods in the Overcooked environment with the proposed workflow shows that, in some of the most used layouts, the performance of different ZSC methods cannot be distinguished. Moreover, the evaluated ZSC methods need to produce more diverse and higher-performing training partners.
    Abstract Zero-shot coordination (ZSC) is a new challenge focusing on generalizing learned coordination skills to unseen partners. Existing methods train the ego agent with partners from pre-trained or evolving populations. The agent's ZSC capability is typically evaluated with a few evaluation partners, including human and agent, and reported by mean returns. Current evaluation methods for ZSC capability still need to improve in constructing diverse evaluation partners and comprehensively measuring the ZSC capability. We aim to create a reliable, comprehensive, and efficient evaluation method for ZSC capability. We formally define the ideal 'diversity-complete' evaluation partners and propose the best response (BR) diversity, which is the population diversity of the BRs to the partners, to approximate the ideal evaluation partners. We propose an evaluation workflow including 'diversity-complete' evaluation partners construction and a multi-dimensional metric, the Best Response Proximity (BR-Prox) metric. BR-Prox quantifies the ZSC capability as the performance similarity to each evaluation partner's approximate best response, demonstrating generalization capability and improvement potential. We re-evaluate strong ZSC methods in the Overcooked environment using the proposed evaluation workflow. Surprisingly, the results in some of the most used layouts fail to distinguish the performance of different ZSC methods. Moreover, the evaluated ZSC methods must produce more diverse and high-performing training partners. Our proposed evaluation workflow calls for a change in how we efficiently evaluate ZSC methods as a supplement to human evaluation.

Boosting Facial Action Unit Detection Through Jointly Learning Facial Landmark Detection and Domain Separation and Reconstruction

  • paper_url: http://arxiv.org/abs/2310.05207
  • repo_url: None
  • paper_authors: Ziqiao Shang, Li Yu
  • for: The work proposes a new facial action unit (AU) detection framework that introduces large amounts of unlabeled in-the-wild facial images into supervised AU detection.
  • methods: Multi-task learning jointly learns AU domain separation and reconstruction and facial landmark detection by sharing the parameters of homostructural facial extraction modules. A new contrastive-learning-based feature alignment scheme with four additional intermediate supervisors promotes the feature reconstruction process.
  • results: Experiments on two benchmarks demonstrate superiority over state-of-the-art methods for AU detection in the wild.
    Abstract Recently how to introduce large amounts of unlabeled facial images in the wild into supervised Facial Action Unit (AU) detection frameworks has become a challenging problem. In this paper, we propose a new AU detection framework where multi-task learning is introduced to jointly learn AU domain separation and reconstruction and facial landmark detection by sharing the parameters of homostructural facial extraction modules. In addition, we propose a new feature alignment scheme based on contrastive learning by simple projectors and an improved contrastive loss, which adds four additional intermediate supervisors to promote the feature reconstruction process. Experimental results on two benchmarks demonstrate our superiority against the state-of-the-art methods for AU detection in the wild.

GEAR: A GPU-Centric Experience Replay System for Large Reinforcement Learning Models

  • paper_url: http://arxiv.org/abs/2310.05205
  • repo_url: https://github.com/bigrl-team/gear
  • paper_authors: Hanjing Wang, Man-Kit Sit, Congjie He, Ying Wen, Weinan Zhang, Jun Wang, Yaodong Yang, Luo Mai
  • for: The paper develops GEAR, a distributed, GPU-centric experience replay system for scalable reinforcement learning (RL) with large sequence models such as transformers.
  • methods: GEAR optimizes memory efficiency by letting the memory resources of GPU servers (both host and device memory) manage trajectory data, and lets decentralized GPU devices expedite various trajectory selection strategies to avoid computational bottlenecks. GPU kernels collect trajectories using zero-copy access to host memory and remote direct memory access over InfiniBand to improve communication efficiency.
  • results: Cluster experiments show that GEAR achieves performance up to 6x that of Reverb when training state-of-the-art large RL models.
    Abstract This paper introduces a distributed, GPU-centric experience replay system, GEAR, designed to perform scalable reinforcement learning (RL) with large sequence models (such as transformers). With such models, existing systems such as Reverb face considerable bottlenecks in memory, computation, and communication. GEAR, however, optimizes memory efficiency by enabling the memory resources on GPU servers (including host memory and device memory) to manage trajectory data. Furthermore, it facilitates decentralized GPU devices to expedite various trajectory selection strategies, circumventing computational bottlenecks. GEAR is equipped with GPU kernels capable of collecting trajectories using zero-copy access to host memory, along with remote-directed-memory access over InfiniBand, improving communication efficiency. Cluster experiments have shown that GEAR can achieve performance levels up to 6x greater than Reverb when training state-of-the-art large RL models. GEAR is open-sourced at https://github.com/bigrl-team/gear.

GMMFormer: Gaussian-Mixture-Model based Transformer for Efficient Partially Relevant Video Retrieval

  • paper_url: http://arxiv.org/abs/2310.05195
  • repo_url: None
  • paper_authors: Yuting Wang, Jinpeng Wang, Bin Chen, Ziyun Zeng, Shu-Tao Xia
  • for: This paper addresses partially relevant video retrieval (PRVR), which aims to find untrimmed videos containing pertinent moments in a database.
  • methods: The paper proposes GMMFormer, a Gaussian-Mixture-Model (GMM) based Transformer that models clip representations implicitly. It incorporates GMM constraints during frame interactions to focus each frame on its adjacent frames, yielding representations that contain multi-scale clip information.
  • results: Extensive experiments on three large-scale video datasets (TVR, ActivityNet Captions, and Charades-STA) demonstrate the superiority and efficiency of GMMFormer: it outperforms existing PRVR methods while reducing storage overhead and improving the embedding space.
    Abstract Given a text query, partially relevant video retrieval (PRVR) seeks to find untrimmed videos containing pertinent moments in a database. For PRVR, clip modeling is essential to capture the partial relationship between texts and videos. Current PRVR methods adopt scanning-based clip construction to achieve explicit clip modeling, which is information-redundant and requires a large storage overhead. To solve the efficiency problem of PRVR methods, this paper proposes GMMFormer, a Gaussian-Mixture-Model based Transformer which models clip representations implicitly. During frame interactions, we incorporate Gaussian-Mixture-Model constraints to focus each frame on its adjacent frames instead of the whole video. Then generated representations will contain multi-scale clip information, achieving implicit clip modeling. In addition, PRVR methods ignore semantic differences between text queries relevant to the same video, leading to a sparse embedding space. We propose a query diverse loss to distinguish these text queries, making the embedding space more intensive so that it contains more semantic information. Extensive experiments on three large-scale video datasets (i.e., TVR, ActivityNet Captions, and Charades-STA) demonstrate the superiority and efficiency of GMMFormer.
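A hedged sketch of Gaussian-constrained frame attention, the idea behind implicit clip modeling: each frame's attention is biased toward nearby frames at several Gaussian scales. The variances and mixing scheme are assumptions, not GMMFormer's exact design.

```python
# Illustrative Gaussian-biased frame attention producing multi-scale clip information.
import torch

def gaussian_frame_attention(q, k, v, sigmas=(1.0, 4.0, 16.0)):
    """q, k, v: (T, d) frame features. Returns the average of attention outputs over scales."""
    T, d = q.shape
    scores = (q @ k.T) / d ** 0.5                           # (T, T) content scores
    pos = torch.arange(T, dtype=torch.float32)
    dist2 = (pos[:, None] - pos[None, :]) ** 2              # squared frame distance
    outs = []
    for s in sigmas:
        gauss = torch.exp(-dist2 / (2 * s ** 2))            # locality prior at this scale
        attn = torch.softmax(scores + gauss.log(), dim=-1)  # bias attention toward neighbours
        outs.append(attn @ v)
    return torch.stack(outs).mean(dim=0)                    # (T, d) multi-scale representation

frames = torch.randn(32, 64)
clip_repr = gaussian_frame_attention(frames, frames, frames)
```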

Factuality Challenges in the Era of Large Language Models

  • paper_url: http://arxiv.org/abs/2310.05189
  • repo_url: None
  • paper_authors: Isabelle Augenstein, Timothy Baldwin, Meeyoung Cha, Tanmoy Chakraborty, Giovanni Luca Ciampaglia, David Corney, Renee DiResta, Emilio Ferrara, Scott Hale, Alon Halevy, Eduard Hovy, Heng Ji, Filippo Menczer, Ruben Miguez, Preslav Nakov, Dietram Scheufele, Shivam Sharma, Giovanni Zagni
  • for: The work examines the development of generative AI and its impact on society, with a focus on the potential threats and risks of LLM technology.
  • methods: Through review and discussion, the authors survey existing LLM technologies and their application scenarios and analyze their potential threats and risks.
  • results: The study identifies threats and risks of LLMs, including the generation of false information and fake profiles and the malicious use of these technologies to deceive users, and discusses possible remedies such as technological innovations, regulatory reforms, AI literacy initiatives, and further research.
    Abstract The emergence of tools based on Large Language Models (LLMs), such as OpenAI's ChatGPT, Microsoft's Bing Chat, and Google's Bard, has garnered immense public attention. These incredibly useful, natural-sounding tools mark significant advances in natural language generation, yet they exhibit a propensity to generate false, erroneous, or misleading content -- commonly referred to as "hallucinations." Moreover, LLMs can be exploited for malicious applications, such as generating false but credible-sounding content and profiles at scale. This poses a significant challenge to society in terms of the potential deception of users and the increasing dissemination of inaccurate information. In light of these risks, we explore the kinds of technological innovations, regulatory reforms, and AI literacy initiatives needed from fact-checkers, news organizations, and the broader research and policy communities. By identifying the risks, the imminent threats, and some viable solutions, we seek to shed light on navigating various aspects of veracity in the era of generative AI.

Evolutionary Retrosynthetic Route Planning

  • paper_url: http://arxiv.org/abs/2310.05186
  • repo_url: None
  • paper_authors: Yan Zhang, Hao Hao, Xiao He, Shuanhu Gao, Aimin Zhou
  • for: The work proposes an evolutionary-algorithm-based method for multi-step retrosynthetic route planning.
  • methods: The retrosynthesis problem is modeled as an optimization problem, with the search space and operators defined; a parallel strategy is implemented to improve search efficiency.
  • results: Experiments on four case products show that, compared with Monte Carlo tree search, the EA reduces the number of calls to the single-step model by 53.9% on average, reduces the time to find three solutions by 83.9%, and increases the number of feasible search routes by a factor of 5.
    Abstract Molecular retrosynthesis is a significant and complex problem in the field of chemistry, however, traditional manual synthesis methods not only need well-trained experts but also are time-consuming. With the development of big data and machine learning, artificial intelligence (AI) based retrosynthesis is attracting more attention and is becoming a valuable tool for molecular retrosynthesis. At present, Monte Carlo tree search is a mainstream search framework employed to address this problem. Nevertheless, its search efficiency is compromised by its large search space. Therefore, we propose a novel approach for retrosynthetic route planning based on evolutionary optimization, marking the first use of Evolutionary Algorithm (EA) in the field of multi-step retrosynthesis. The proposed method involves modeling the retrosynthetic problem into an optimization problem, defining the search space and operators. Additionally, to improve the search efficiency, a parallel strategy is implemented. The new approach is applied to four case products, and is compared with Monte Carlo tree search. The experimental results show that, in comparison to the Monte Carlo tree search algorithm, EA significantly reduces the number of calling single-step model by an average of 53.9%. The time required to search three solutions decreased by an average of 83.9%, and the number of feasible search routes increases by 5 times.

Text2NKG: Fine-Grained N-ary Relation Extraction for N-ary relational Knowledge Graph Construction

  • paper_url: http://arxiv.org/abs/2310.05185
  • repo_url: https://github.com/lhrlab/text2nkg
  • paper_authors: Haoran Luo, Haihong E, Yuhao Yang, Tianyu Yao, Yikai Guo, Zichen Tang, Wentai Zhang, Kaiyang Wan, Shiyao Peng, Meina Song, Wei Lin
  • for: 这篇论文旨在构建基于文本的n-ary关系知识图(NKG),以便更好地表达现实世界中的多元关系。
  • methods: 本文提出了一种新的细粒度n-ary关系抽取方法,使用 span-tuple classification 和 hetero-ordered merging 技术来实现不同元数下的n-ary关系抽取。
  • results: 实验结果表明,在超关系(hyper-relational)模式下的细粒度n-ary关系抽取基准上,Text2NKG 的 $F_1$ 分数比此前最优模型提高了近20个百分点。
    Abstract Beyond traditional binary relational facts, n-ary relational knowledge graphs (NKGs) are composed of n-ary relational facts containing more than two entities, which are closer to real-world facts with broader applications. However, the construction of NKGs still significantly relies on manual labor, and n-ary relation extraction still remains at a coarse-grained level, which is always confined to a single schema and a fixed arity of entities. To address these restrictions, we propose Text2NKG, a novel fine-grained n-ary relation extraction framework for n-ary relational knowledge graph construction. We introduce a span-tuple classification approach with hetero-ordered merging to accomplish fine-grained n-ary relation extraction in different arities. Furthermore, Text2NKG supports four typical NKG schemas: hyper-relational schema, event-based schema, role-based schema, and hypergraph-based schema, with high flexibility and practicality. Experimental results demonstrate that Text2NKG outperforms the previous state-of-the-art model by nearly 20\% points in the $F_1$ scores on the fine-grained n-ary relation extraction benchmark in the hyper-relational schema. Our code and datasets are publicly available.
    摘要 除传统的二元关系事实外,n-ary关系知识图(NKG)由包含两个以上实体的n-ary关系事实构成,更贴近现实世界中的事实,应用也更为广泛。然而,NKG的构建仍然严重依赖人工,n-ary关系抽取也停留在粗粒度层面,通常局限于单一模式和固定的实体元数。为了解决这些限制,我们提出了Text2NKG,一种面向n-ary关系知识图构建的细粒度n-ary关系抽取框架。我们引入了带有hetero-ordered merging的span-tuple分类方法,以实现不同元数下的细粒度n-ary关系抽取。此外,Text2NKG支持四种典型的NKG模式:超关系模式、基于事件的模式、基于角色的模式和基于超图的模式,具有很高的灵活性和实用性。实验结果表明,在超关系模式下的细粒度n-ary关系抽取基准上,Text2NKG的$F_1$分数比此前最优模型提高了近20个百分点。我们的代码和数据集已公开。

Optimizing Large Language Models to Expedite the Development of Smart Contracts

  • paper_url: http://arxiv.org/abs/2310.05178
  • repo_url: None
  • paper_authors: Nii Osae Osae Dade, Margaret Lartey-Quaye, Emmanuel Teye-Kofi Odonkor, Paul Ammah
  • for: 本研究旨在帮助开发者在区块链网络上构建去中心化应用(dApps),通过引入MazzumaGPT大语言模型来生成智能合约代码并提高开发效率。
  • methods: 本研究使用MazzumaGPT大语言模型,该模型针对智能合约代码生成进行了优化,经过微调后对功能正确性进行了评估。
  • results: 本研究报告了MazzumaGPT在生成智能合约代码和提高开发效率方面的表现,结果显示模型能够生成正确的代码并提高开发效率;论文同时讨论了研究的局限性和更广泛的影响。
    Abstract Programming has always been at the heart of technological innovation in the 21st century. With the advent of blockchain technologies and the proliferation of web3 paradigms of decentralised applications, smart contracts have been very instrumental in enabling developers to build applications that reside on decentralised blockchains. Despite the huge interest and potential of smart contracts, there is still a significant knowledge and skill gap that developers need to cross in order to build web3 applications. In light of this, we introduce MazzumaGPT, a large language model that has been optimised to generate smart contract code and aid developers to scaffold development and improve productivity. As part of this research, we outline the optimisation and fine-tuning parameters, evaluate the model's performance on functional correctness and address the limitations and broader impacts of our research.
    摘要 Programming 一直是现代科技创新的核心在21世纪。随着区块链技术的出现和分布式应用程序的普及,智能合约帮助开发者建立在分布式区块链上的应用程序。虽然智能合约具有巨大的潜在利益和潜力,但开发者仍然需要跨越一定的知识和技能差距来构建Web3应用程序。为了解决这个问题,我们介绍MazzumaGPT,一个优化的大语言模型,可以生成智能合约代码,帮助开发者快速构建和改进开发。在这项研究中,我们详细介绍优化和细调参数,评估模型的性能并讨论我们的研究的局限性和更广泛的影响。

GSLB: The Graph Structure Learning Benchmark

  • paper_url: http://arxiv.org/abs/2310.05174
  • repo_url: https://github.com/gsl-benchmark/gslb
  • paper_authors: Zhixun Li, Liang Wang, Xin Sun, Yifan Luo, Yanqiao Zhu, Dingshuo Chen, Yingtao Luo, Xiangxin Zhou, Qiang Liu, Shu Wu, Liang Wang, Jeffrey Xu Yu
  • for: 本研究的目的是为Graph Structure Learning (GSL)提供一个系统的分析和评估,以便更好地理解GSL在不同情况下的表现。
  • methods: 本研究使用了20种不同的图 dataset和16种不同的 GSL 算法,并进行了系统的性能分析和比较。
  • results: 研究发现,GSL 在 node-level 和 graph-level 任务中表现出色,并且在鲁棒学习和模型复杂度方面也有出色的表现。
    Abstract Graph Structure Learning (GSL) has recently garnered considerable attention due to its ability to optimize both the parameters of Graph Neural Networks (GNNs) and the computation graph structure simultaneously. Despite the proliferation of GSL methods developed in recent years, there is no standard experimental setting or fair comparison for performance evaluation, which creates a great obstacle to understanding the progress in this field. To fill this gap, we systematically analyze the performance of GSL in different scenarios and develop a comprehensive Graph Structure Learning Benchmark (GSLB) curated from 20 diverse graph datasets and 16 distinct GSL algorithms. Specifically, GSLB systematically investigates the characteristics of GSL in terms of three dimensions: effectiveness, robustness, and complexity. We comprehensively evaluate state-of-the-art GSL algorithms in node- and graph-level tasks, and analyze their performance in robust learning and model complexity. Further, to facilitate reproducible research, we have developed an easy-to-use library for training, evaluating, and visualizing different GSL methods. Empirical results of our extensive experiments demonstrate the ability of GSL and reveal its potential benefits on various downstream tasks, offering insights and opportunities for future research. The code of GSLB is available at: https://github.com/GSL-Benchmark/GSLB.
    摘要 “几年前,Graph Structure Learning(GSL)已经吸引了很多注意,因为它可以同时优化Graph Neural Networks(GNNs)的参数和计算图структура。不过,过去几年发展的GSL方法中,没有一个通用的实验设置或公平的比较方法,这导致了理解这个领域的进步受到了很大的阻碍。为了填补这个空白,我们系统地分析了GSL在不同的场景下的表现,并开发了一个全面的Graph Structure Learning Benchmark(GSLB),收集了20个多标的图数据和16种不同的GSL算法。具体来说,GSLB系统地探讨了GSL的特点在三个维度上:有效性、韧性和复杂度。我们对现今的State-of-the-art GSL算法进行了node-和graph-水平的任务,并分析了它们在Robust Learning和模型复杂度上的表现。此外,为了促进可重现性的研究,我们开发了一个容易使用的库,可以用于训练、评估和显示不同的GSL方法。我们的广泛的实验结果显示了GSL的能力,并给出了不同下游任务的可能性和未来研究的方向。GSLB的代码可以在:https://github.com/GSL-Benchmark/GSLB中找到。”

Multi-Ship Tracking by Robust Similarity metric

  • paper_url: http://arxiv.org/abs/2310.05171
  • repo_url: None
  • paper_authors: Hongyu Zhao, Gongming Wei, Yang Xiao, Xianglei Xing
  • for: 提高多船跟踪(MST)技术的应用于海上情况意识和自动船 Navigation System 的发展。
  • methods: 通过在多目标跟踪(MOT)算法中使用最小几何形态的拟合来提高跟踪性能。
  • results: 将TIoU度量集成到DeepSort和ByteTrack等先进目标跟踪框架中后,这些框架的跟踪性能得到了一致的提升。
    Abstract Multi-ship tracking (MST) as a core technology has been proven to be applied to situational awareness at sea and the development of a navigational system for autonomous ships. Despite impressive tracking outcomes achieved by multi-object tracking (MOT) algorithms for pedestrian and vehicle datasets, these models and techniques exhibit poor performance when applied to ship datasets. Intersection of Union (IoU) is the most popular metric for computing similarity used in object tracking. The low frame rates and severe image shake caused by wave turbulence in ship datasets often result in minimal, or even zero, Intersection of Union (IoU) between the predicted and detected bounding boxes. This issue contributes to frequent identity switches of tracked objects, undermining the tracking performance. In this paper, we address the weaknesses of IoU by incorporating the smallest convex shapes that enclose both the predicted and detected bounding boxes. The calculation of the tracking version of IoU (TIoU) metric considers not only the size of the overlapping area between the detection bounding box and the prediction box, but also the similarity of their shapes. Through the integration of the TIoU into state-of-the-art object tracking frameworks, such as DeepSort and ByteTrack, we consistently achieve improvements in the tracking performance of these frameworks.
    摘要 多船跟踪(MST)作为一项核心技术,已被证明可应用于海上态势感知以及自主船舶导航系统的开发。尽管多目标跟踪(MOT)算法在行人和车辆数据集上取得了令人瞩目的跟踪效果,但这些模型和技术在船舶数据集上的表现却不尽如人意。交并比(IoU)是目标跟踪中最常用的相似度度量。然而,船舶数据集中低帧率以及海浪扰动造成的剧烈图像抖动,常常导致预测框与检测框之间的IoU极小甚至为零。这一问题导致被跟踪目标的身份频繁切换,损害了跟踪性能。本文通过引入同时包含预测框和检测框的最小凸形状来弥补IoU的不足。跟踪版IoU(TIoU)度量的计算不仅考虑检测框与预测框重叠区域的大小,还考虑二者形状的相似性。通过将TIoU集成到DeepSort和ByteTrack等先进目标跟踪框架中,我们持续地提升了这些框架的跟踪性能。
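A minimal sketch of the enclosing-shape idea behind TIoU, assuming axis-aligned boxes and using the smallest enclosing axis-aligned box as a simple stand-in for the smallest enclosing convex shape; the paper's exact TIoU formula is not reproduced here.

```python
# Minimal sketch of an enclosing-shape-aware IoU for axis-aligned boxes (x1, y1, x2, y2).
# The paper's TIoU uses the smallest convex shape enclosing both boxes; here the minimal
# enclosing axis-aligned box is used as a simple stand-in, giving a GIoU-style penalty.
def area(b):
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])

def tiou(pred, det):
    # Intersection rectangle.
    ix1, iy1 = max(pred[0], det[0]), max(pred[1], det[1])
    ix2, iy2 = min(pred[2], det[2]), min(pred[3], det[3])
    inter = area((ix1, iy1, ix2, iy2))
    union = area(pred) + area(det) - inter
    iou = inter / union if union > 0 else 0.0

    # Smallest enclosing axis-aligned box of both boxes.
    ex1, ey1 = min(pred[0], det[0]), min(pred[1], det[1])
    ex2, ey2 = max(pred[2], det[2]), max(pred[3], det[3])
    enclose = area((ex1, ey1, ex2, ey2))

    # Penalize the "empty" part of the enclosing shape, so non-overlapping
    # boxes still get a graded (negative) similarity instead of a flat zero.
    return iou - (enclose - union) / enclose if enclose > 0 else 0.0

print(tiou((0, 0, 10, 10), (12, 0, 22, 10)))  # no overlap, but similar shape and offset
```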

DeepQTest: Testing Autonomous Driving Systems with Reinforcement Learning and Real-world Weather Data

  • paper_url: http://arxiv.org/abs/2310.05170
  • repo_url: https://github.com/simula-complex/deepqtest
  • paper_authors: Chengjie Lu, Tao Yue, Man Zhang, Shaukat Ali
  • for: 这个论文的目的是提出一种基于强化学习的自动驾驶系统测试方法,以确保自动驾驶系统的安全性。
  • methods: 这种测试方法使用强化学习的深度Q学习算法,以学习环境配置,并采用了三种安全和舒适度量来构建奖励函数。
  • results: 对于三个比较基线,深度Q测试表现出显著更高的效果,能够更好地激发自动驾驶系统的异常行为,并确保测试场景的现实性。
    Abstract Autonomous driving systems (ADSs) are capable of sensing the environment and making driving decisions autonomously. These systems are safety-critical, and testing them is one of the important approaches to ensure their safety. However, due to the inherent complexity of ADSs and the high dimensionality of their operating environment, the number of possible test scenarios for ADSs is infinite. Besides, the operating environment of ADSs is dynamic, continuously evolving, and full of uncertainties, which requires a testing approach adaptive to the environment. In addition, existing ADS testing techniques have limited effectiveness in ensuring the realism of test scenarios, especially the realism of weather conditions and their changes over time. Recently, reinforcement learning (RL) has demonstrated great potential in addressing challenging problems, especially those requiring constant adaptations to dynamic environments. To this end, we present DeepQTest, a novel ADS testing approach that uses RL to learn environment configurations with a high chance of revealing abnormal ADS behaviors. Specifically, DeepQTest employs Deep Q-Learning and adopts three safety and comfort measures to construct the reward functions. To ensure the realism of generated scenarios, DeepQTest defines a set of realistic constraints and introduces real-world weather conditions into the simulated environment. We employed three comparison baselines, i.e., random, greedy, and a state-of-the-art RL-based approach DeepCOllision, for evaluating DeepQTest on an industrial-scale ADS. Evaluation results show that DeepQTest demonstrated significantly better effectiveness in terms of generating scenarios leading to collisions and ensuring scenario realism compared with the baselines. In addition, among the three reward functions implemented in DeepQTest, Time-To-Collision is recommended as the best design according to our study.
    摘要 自动驾驶系统(ADS)具有感知环境和做出自主驾驶决策的能力。这些系统的安全性至关重要,测试是确保其安全的重要方法之一。然而,由于ADS的内在复杂性和操作环境的高维度,可能的测试场景数量是无限的。此外,ADS的操作环境是动态的、不断演化的,并且充满不确定性,这需要能够适应环境的测试方法。另外,现有的ADS测试技术在保证测试场景真实性方面效果有限,特别是天气状况及其随时间的变化。最近,强化学习(RL)在解决需要不断适应动态环境的复杂问题方面表现出了极大的潜力。为此,我们提出了 DeepQTest,一种利用RL学习环境配置、以高概率暴露ADS异常行为的新型测试方法。具体来说,DeepQTest使用深度Q学习,并采用三种安全和舒适度量来构建奖励函数。为保证生成场景的真实性,DeepQTest定义了一组真实性约束,并将真实世界的天气条件引入模拟环境。我们采用随机、贪婪以及当前最优的基于RL的方法DeepCOllision三种对比基线,在一个工业级ADS上评估DeepQTest。评估结果表明,与基线相比,DeepQTest在生成导致碰撞的场景和保证场景真实性方面表现显著更优。此外,在DeepQTest实现的三种奖励函数中,根据我们的研究,时间到碰撞(Time-To-Collision)被推荐为最佳设计。
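Since the study recommends a Time-To-Collision (TTC) reward, the sketch below shows one simple way such a reward term could be computed, assuming straight-line closing motion between the ego vehicle and an obstacle; the threshold and shaping are illustrative assumptions, not the paper's exact design.

```python
# Minimal sketch of a Time-To-Collision (TTC) based reward term for ADS test generation.
# Assumes straight-line closing motion; the critical threshold below is illustrative only.
def time_to_collision(gap_m, ego_speed_mps, obstacle_speed_mps):
    closing_speed = ego_speed_mps - obstacle_speed_mps
    if closing_speed <= 0:            # not closing in -> no collision expected
        return float("inf")
    return gap_m / closing_speed

def ttc_reward(gap_m, ego_speed_mps, obstacle_speed_mps, critical_ttc=3.0):
    """Higher reward for environment configurations that push the ADS toward
    low-TTC (near-collision) situations, which are what the tester wants to expose."""
    ttc = time_to_collision(gap_m, ego_speed_mps, obstacle_speed_mps)
    if ttc == float("inf"):
        return 0.0
    return min(1.0, critical_ttc / ttc)   # saturates at 1 when TTC <= critical_ttc

print(ttc_reward(gap_m=20.0, ego_speed_mps=15.0, obstacle_speed_mps=5.0))  # TTC = 2 s -> 1.0
```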

Hieros: Hierarchical Imagination on Structured State Space Sequence World Models

  • paper_url: http://arxiv.org/abs/2310.05167
  • repo_url: https://github.com/snagnar/hieros
  • paper_authors: Paul Mattes, Rainer Schlosser, Ralf Herbrich
  • for: 本研究旨在提高现代深度强化学习(DRL)算法的样本效率。
  • methods: 我们提出了一种层次策略(Hieros),该策略使用S5层学习时间抽象的世界表示,并在潜在空间中以多个时间尺度想象轨迹、预测下一个世界状态。
  • results: 我们的方法在Atari 100k基准上的平均和中位数归一化人类分数均超过了现有最优方法。此外,我们提出的世界模型能够非常准确地预测复杂的动力学。我们还发现Hieros具有更强的探索能力。
    Abstract One of the biggest challenges to modern deep reinforcement learning (DRL) algorithms is sample efficiency. Many approaches learn a world model in order to train an agent entirely in imagination, eliminating the need for direct environment interaction during training. However, these methods often suffer from either a lack of imagination accuracy, exploration capabilities, or runtime efficiency. We propose Hieros, a hierarchical policy that learns time abstracted world representations and imagines trajectories at multiple time scales in latent space. Hieros uses an S5 layer-based world model, which predicts next world states in parallel during training and iteratively during environment interaction. Due to the special properties of S5 layers, our method can train in parallel and predict next world states iteratively during imagination. This allows for more efficient training than RNN-based world models and more efficient imagination than Transformer-based world models. We show that our approach outperforms the state of the art in terms of mean and median normalized human score on the Atari 100k benchmark, and that our proposed world model is able to predict complex dynamics very accurately. We also show that Hieros displays superior exploration capabilities compared to existing approaches.
    摘要 现代深度强化学习(DRL)算法面临的最大挑战之一是样本效率。许多方法通过学习世界模型,使智能体完全在想象中训练,从而消除训练过程中与环境直接交互的需要。然而,这些方法往往受到想象准确性不足、探索能力有限或运行效率低下的限制。我们提出了 Hieros,一种层次策略,它学习时间抽象的世界表示,并在潜在空间中以多个时间尺度想象轨迹。Hieros使用基于 S5 层的世界模型,该模型在训练时并行预测下一个世界状态,在与环境交互时则迭代预测。得益于 S5 层的特殊性质,我们的方法可以并行训练,并在想象阶段迭代预测下一个世界状态。这使得我们的方法在训练上比基于 RNN 的世界模型更高效,在想象上比基于 Transformer 的世界模型更高效。我们表明,我们的方法在 Atari 100k 基准上的平均和中位数归一化人类分数均超过当前最优水平,并且我们提出的世界模型能够非常准确地预测复杂的动力学。此外,我们还表明 Hieros 在探索能力方面优于现有方法。

MenatQA: A New Dataset for Testing the Temporal Comprehension and Reasoning Abilities of Large Language Models

  • paper_url: http://arxiv.org/abs/2310.05157
  • repo_url: https://github.com/weiyifan1023/MenatQA
  • paper_authors: Yifan Wei, Yisong Su, Huanhuan Ma, Xiaoyan Yu, Fangyu Lei, Yuanzhe Zhang, Jun Zhao, Kang Liu
  • for: This paper aims to evaluate the time comprehension and reasoning abilities of large language models (LLMs) and investigate potential improvement strategies.
  • methods: The paper constructs a benchmark task called Multiple Sensitive Factors Time QA (MenatQA) that tests LLMs’ performance on three temporal factors (scope factor, order factor, counterfactual factor) with a total of 2,853 samples.
  • results: Most LLMs fall behind smaller temporal reasoning models in terms of performance on the MenatQA task, particularly in handling temporal biases and utilizing external information. The paper also explores potential improvement strategies such as devising specific prompts and leveraging external tools.
    Abstract Large language models (LLMs) have shown nearly saturated performance on many natural language processing (NLP) tasks. As a result, it is natural for people to believe that LLMs have also mastered abilities such as time understanding and reasoning. However, research on the temporal sensitivity of LLMs has been insufficiently emphasized. To fill this gap, this paper constructs Multiple Sensitive Factors Time QA (MenatQA), which encompasses three temporal factors (scope factor, order factor, counterfactual factor) with total 2,853 samples for evaluating the time comprehension and reasoning abilities of LLMs. This paper tests current mainstream LLMs with different parameter sizes, ranging from billions to hundreds of billions. The results show most LLMs fall behind smaller temporal reasoning models with different degree on these factors. In specific, LLMs show a significant vulnerability to temporal biases and depend heavily on the temporal information provided in questions. Furthermore, this paper undertakes a preliminary investigation into potential improvement strategies by devising specific prompts and leveraging external tools. These approaches serve as valuable baselines or references for future research endeavors.
    摘要 大型语言模型(LLM)在许多自然语言处理(NLP)任务上已经表现出接近饱和的性能。因此,人们很自然地认为 LLM 也已掌握了时间理解和推理等能力。然而,对 LLM 时间敏感性的研究尚未得到充分重视。为了填补这一空白,本文构建了多敏感因素时间问答基准(MenatQA),涵盖三类时间因素(范围因素、次序因素、反事实因素),共 2,853 个样本,用于评估 LLM 的时间理解和推理能力。本文测试了参数规模从数十亿到数千亿不等的当前主流 LLM。结果显示,大多数 LLM 在这些因素上不同程度地落后于更小的时间推理模型。具体而言,LLM 对时间偏差表现出明显的脆弱性,并严重依赖问题中提供的时间信息。此外,本文还通过设计特定的提示和利用外部工具,对潜在的改进策略进行了初步探究。这些方法可以作为未来研究的宝贵基线或参考。

Toolink: Linking Toolkit Creation and Using through Chain-of-Solving on Open-Source Model

  • paper_url: http://arxiv.org/abs/2310.05155
  • repo_url: https://github.com/qiancheng0/toolink
  • paper_authors: Cheng Qian, Chenyan Xiong, Zhenghao Liu, Zhiyuan Liu
  • for: The paper aims to develop a comprehensive framework for task-solving using tool-based chain-of-solving (CoS) approach, with the goal of leveraging smaller, open-sourced models for adaptability.
  • methods: The proposed framework, called Toolink, creates a toolkit and integrates planning and calling of tools through a CoS approach. The authors validate the efficacy of Toolink on ChatGPT and curate a CoS dataset (CoS-GPT) for task-solving. They finetune the LLaMA-7B model to create LLaMA-CoS, a powerful open-source model with advanced tool-planning and tool-calling capabilities.
  • results: The evaluation on diverse tasks from BIG-bench shows that LLaMA-CoS matches the CoS ability of ChatGPT while surpassing the chain-of-thought approach in performance. The study also demonstrates the generalization of LLaMA-CoS to unseen tasks and its capability in using toolkits not explicitly tailored for the target task, affirming its robustness in real-world scenarios.
    Abstract Large Language Models (LLMs) have demonstrated remarkable progress in utilizing tools, but their closed-source nature and high inference costs pose limitations on their adaptability, necessitating a valid method that leverages smaller, open-sourced models. In this paper, we introduce Toolink, a comprehensive framework that performs task-solving by first creating a toolkit and then integrating the planning and calling of tools through a chain-of-solving (CoS) approach. We first validate the efficacy of Toolink in harnessing the model's creativity and CoS ability on ChatGPT. Subsequently, we curate CoS-GPT, a chain-of-solving dataset designed for tool-using, and finetune the LLaMA-7B model. It results in LLaMA-CoS, a powerful open-source model with advanced tool-planning and tool-calling capabilities. Evaluation on diverse tasks from BIG-bench demonstrates its CoS ability matches that of ChatGPT while its performance surpasses the chain-of-thought approach. Further studies highlight the generalization of LLaMA-CoS to unseen tasks and showcase its capability in using toolkits not explicitly tailored for the target task, affirming its robustness in real-world scenarios. All codes and data are released.
    摘要 大型语言模型(LLM)在工具使用方面已展示出显著进步,但其闭源性质和高昂的推理成本限制了其适应性,因此需要一种能够利用更小的开源模型的有效方法。在本文中,我们介绍Toolink,一个完整的框架,它首先创建工具包,然后通过链式解决(CoS)方法将工具的规划与调用整合起来完成任务。我们首先在ChatGPT上验证了Toolink激发模型创造力和CoS能力的有效性,随后构建了面向工具使用的链式解决数据集CoS-GPT,并对LLaMA-7B模型进行微调,得到LLaMA-CoS——一个具备先进工具规划和工具调用能力的强大开源模型。在BIG-bench的多个任务上的评估表明,LLaMA-CoS的CoS能力与ChatGPT相当,并且其表现超过了链式思维方法。进一步的研究显示,LLaMA-CoS能够泛化到未见任务,并能使用并非专为目标任务定制的工具包,证明了它在真实场景中的鲁棒性。所有代码和数据均已公开发布。

Large Language Model (LLM) as a System of Multiple Expert Agents: An Approach to solve the Abstraction and Reasoning Corpus (ARC) Challenge

  • paper_url: http://arxiv.org/abs/2310.05146
  • repo_url: https://github.com/tanchongmin/arc-challenge
  • paper_authors: John Chong Min Tan, Mehul Motani
  • for: 解决Abstraction and Reasoning Corpus(ARC)挑战,使用大型自然语言模型(LLM)作为多个专家系统。
  • methods: 利用LLM的灵活性,通过零样本、少样本以及基于上下文的提示让LLM完成多种新任务。首先将输入图像转化为多种合适的基于文本的抽象空间,然后利用LLM的联想能力推导输入-输出关系,并将其映射为工作程序形式的动作,类似于MineCraft中的Voyager / Ghost。此外,还使用迭代的环境反馈来引导LLM解决任务。
  • results: 所提方法仅使用网格、对象和像素三个抽象空间,即可解决111个训练集问题中的50个(45%)。我们认为,通过添加更多抽象空间和可学习的动作,将能够解决更多问题。
    Abstract We attempt to solve the Abstraction and Reasoning Corpus (ARC) Challenge using Large Language Models (LLMs) as a system of multiple expert agents. Using the flexibility of LLMs to be prompted to do various novel tasks using zero-shot, few-shot, context-grounded prompting, we explore the feasibility of using LLMs to solve the ARC Challenge. We firstly convert the input image into multiple suitable text-based abstraction spaces. We then utilise the associative power of LLMs to derive the input-output relationship and map this to actions in the form of a working program, similar to Voyager / Ghost in the MineCraft. In addition, we use iterative environmental feedback in order to guide LLMs to solve the task. Our proposed approach achieves 50 solves out of 111 training set problems (45%) with just three abstraction spaces - grid, object and pixel - and we believe that with more abstraction spaces and learnable actions, we will be able to solve more.
    摘要 我们尝试使用大型自然语言模型(LLM)解决抽象和逻辑 Corpora(ARC)挑战,以多个专家代理系统的形式进行解决。通过使用 LLM 的灵活性,我们可以使其响应各种新任务,使用零上下文、几上下文、上下文固定的提示,探索使用 LLM 解决 ARC 挑战的可能性。首先,我们将输入图像转换为多个适合的文本基于抽象空间。然后,我们利用 LLMS 的协同力来推导输入-输出关系,并将其映射到作为工作程序的动作,类似于 Voyager / Ghost 在 MineCraft 中。此外,我们使用迭代环境反馈,以引导 LLMS 解决任务。我们的提议方法已经实现了 50 个训练集问题(45%)的解决,只使用了三个抽象空间 - 网格、对象和像素 - 并我们认为,通过添加更多的抽象空间和学习动作,我们将能够解决更多的问题。

NeuralFastLAS: Fast Logic-Based Learning from Raw Data

  • paper_url: http://arxiv.org/abs/2310.05145
  • repo_url: None
  • paper_authors: Theo Charalambous, Yaniv Aspis, Alessandra Russo
  • for: 本研究旨在提出一种可扩展和高效的综合方法,即NeuralFastLAS,用于同时训练神经网络和符号学习器。
  • methods: NeuralFastLAS使用一种新的约束优化技术,通过学习一个 posterior distribution 来提高训练稳定性。
  • results: 实验结果表明,NeuralFastLAS可以在算术和逻辑任务中达到最先进的准确率,训练时间比其他联合训练神经网络和符号学习器的方法快最多两个数量级。
    Abstract Symbolic rule learners generate interpretable solutions, however they require the input to be encoded symbolically. Neuro-symbolic approaches overcome this issue by mapping raw data to latent symbolic concepts using a neural network. Training the neural and symbolic components jointly is difficult, due to slow and unstable learning, hence many existing systems rely on hand-engineered rules to train the network. We introduce NeuralFastLAS, a scalable and fast end-to-end approach that trains a neural network jointly with a symbolic learner. For a given task, NeuralFastLAS computes a relevant set of rules, proved to contain an optimal symbolic solution, trains a neural network using these rules, and finally finds an optimal symbolic solution to the task while taking network predictions into account. A key novelty of our approach is learning a posterior distribution on rules while training the neural network to improve stability during training. We provide theoretical results for a sufficient condition on network training to guarantee correctness of the final solution. Experimental results demonstrate that NeuralFastLAS is able to achieve state-of-the-art accuracy in arithmetic and logical tasks, with a training time that is up to two orders of magnitude faster than other jointly trained neuro-symbolic methods.
    摘要 符号规则学习器可以生成可解释的解决方案,但它们要求输入以符号形式编码。神经符号方法通过神经网络将原始数据映射到潜在的符号概念上,从而克服了这一问题。然而,联合训练神经组件和符号组件十分困难,学习过程缓慢且不稳定,因此许多现有系统依赖手工设计的规则来训练网络。我们介绍NeuralFastLAS,一种可扩展且快速的端到端方法,可以联合训练神经网络和符号学习器。对于给定任务,NeuralFastLAS 首先计算一个相关的规则集,并证明其中包含最优的符号解决方案,随后使用这些规则训练神经网络,最终在考虑网络预测的情况下找到该任务的最优符号解。我们方法的一个关键创新在于,在训练神经网络的同时学习规则上的后验分布,以提高训练的稳定性。我们给出了理论结果,给出了保证最终解正确性的网络训练充分条件。实验结果表明,NeuralFastLAS 能够在算术和逻辑任务中达到最先进的准确率,训练时间比其他联合训练的神经符号方法快最多两个数量级。

ZooPFL: Exploring Black-box Foundation Models for Personalized Federated Learning

  • paper_url: http://arxiv.org/abs/2310.05143
  • repo_url: https://github.com/microsoft/personalizedfl
  • paper_authors: Wang Lu, Hao Yu, Jindong Wang, Damien Teney, Haohan Wang, Yiqiang Chen, Qiang Yang, Xing Xie, Xiangyang Ji
  • for: 这篇论文旨在解决个性化 Federated Learning (FL) 中资源有限的问题,包括数据、计算和通信成本,以及访问模型的限制。
  • methods: 该论文提出了一种名为 ZOOPFL 的方法,使用零阶优化学习适应输入而无需直接干预基础模型,从而应对分布偏移问题;并使用简单而有效的线性投影对预测结果进行重映射以实现个性化。此外,它还提出输入手术,结合带有低维、客户端特定嵌入的自编码器,以降低计算成本并增强个性化。
  • results: 广泛的实验表明,ZOOPFL 可以有效地应用于黑盒基模型上的 FL 任务,并且可以提高个性化的精度。
    Abstract When personalized federated learning (FL) meets large foundation models, new challenges arise from various limitations in resources. In addition to typical limitations such as data, computation, and communication costs, access to the models is also often limited. This paper endeavors to solve both the challenges of limited resources and personalization. i.e., distribution shifts between clients. To do so, we propose a method named ZOOPFL that uses Zeroth-Order Optimization for Personalized Federated Learning. ZOOPFL avoids direct interference with the foundation models and instead learns to adapt its inputs through zeroth-order optimization. In addition, we employ simple yet effective linear projections to remap its predictions for personalization. To reduce the computation costs and enhance personalization, we propose input surgery to incorporate an auto-encoder with low-dimensional and client-specific embeddings. We provide theoretical support for ZOOPFL to analyze its convergence. Extensive empirical experiments on computer vision and natural language processing tasks using popular foundation models demonstrate its effectiveness for FL on black-box foundation models.
    摘要 当个性化联合学习(FL)遇到大规模基础模型时,新的挑战出现,包括不同限制的资源。除了典型的限制,如数据、计算和通信成本外,对模型的访问也经常受限。这篇论文旨在解决限制资源和个性化的两个挑战。即分布shift between客户端。为此,我们提出了一种方法名为ZOOPFL,它使用零阶优化进行个性化联合学习。ZOOPFL避免直接干扰基础模型,而是通过零阶优化学习适应输入。此外,我们使用简单 yet有效的线性映射来重新映射其预测。为了减少计算成本并提高个性化,我们提议输入手术,其中包括一个低维度的自动encoder和客户端特定的嵌入。我们提供了对ZOOPFL的理论支持,以分析其相对稳定性。我们对计算机视觉和自然语言处理任务使用了流行的基础模型进行了广泛的实验,以证明ZOOPFL在黑盒基础模型上的有效性。
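The zeroth-order idea is the core mechanism here: gradients of the black-box foundation model are never needed, only loss values from queries. Below is a minimal two-point zeroth-order gradient estimator as a generic illustration; `black_box_loss` is a hypothetical stand-in for querying the remote model, and ZOOPFL's actual update rule, projections, and auto-encoder are not shown.

```python
# Minimal sketch of two-point zeroth-order gradient estimation, the kind of update
# ZOOPFL-style methods rely on when the foundation model is a black box (no backprop).
import numpy as np

def black_box_loss(x):
    # Stand-in for: send adapted input x to the black-box model, get a scalar loss back.
    return np.sum((x - 3.0) ** 2)

def zo_gradient(loss_fn, x, mu=1e-2, num_dirs=8):
    """Estimate grad f(x) by probing random directions: (f(x + mu*u) - f(x)) / mu * u."""
    grad = np.zeros_like(x)
    f0 = loss_fn(x)
    for _ in range(num_dirs):
        u = np.random.randn(*x.shape)
        grad += (loss_fn(x + mu * u) - f0) / mu * u
    return grad / num_dirs

x = np.zeros(4)                      # the learnable input adaptation
for _ in range(200):
    x -= 0.05 * zo_gradient(black_box_loss, x)
print(np.round(x, 2))                # should approach [3, 3, 3, 3] without any gradients
```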

Harnessing the Power of Large Language Models for Empathetic Response Generation: Empirical Investigations and Improvements

  • paper_url: http://arxiv.org/abs/2310.05140
  • repo_url: None
  • paper_authors: Yushan Qian, Wei-Nan Zhang, Ting Liu
  • for: 这个论文主要研究了大语言模型(LLMs)在建立和谐社会关系中的应用效果,以及如何使用LLMs提高对话的同理能力。
  • methods: 本文提出了三种改进方法,包括semantically similar in-context learning、two-stage interactive generation和知识库的组合。
  • results: 广泛的实验表明,LLMs可以在我们提出的方法的帮助下显著提高对话的同理能力,并在自动和人类评价中达到了领先水平。此外,我们还探讨了GPT-4可以模拟人类评价者的可能性。
    Abstract Empathetic dialogue is an indispensable part of building harmonious social relationships and contributes to the development of a helpful AI. Previous approaches are mainly based on fine small-scale language models. With the advent of ChatGPT, the application effect of large language models (LLMs) in this field has attracted great attention. This work empirically investigates the performance of LLMs in generating empathetic responses and proposes three improvement methods of semantically similar in-context learning, two-stage interactive generation, and combination with the knowledge base. Extensive experiments show that LLMs can significantly benefit from our proposed methods and is able to achieve state-of-the-art performance in both automatic and human evaluations. Additionally, we explore the possibility of GPT-4 simulating human evaluators.
    摘要 共情对话是构建和谐社会关系不可或缺的一部分,也有助于发展有益的AI。以前的方法主要基于精调的小规模语言模型。随着ChatGPT的出现,大语言模型(LLM)在这一领域的应用效果吸引了广泛关注。本研究实证考察了LLM生成共情回复的表现,并提出了语义相似的上下文学习、两阶段交互式生成以及与知识库结合这三种改进方法。大量实验表明,LLM可以从我们提出的方法中显著获益,并在自动评测和人工评测中均达到最先进水平。此外,我们还探索了用GPT-4模拟人类评估者的可能性。

Maximizing Utilitarian and Egalitarian Welfare of Fractional Hedonic Games on Tree-like Graphs

  • paper_url: http://arxiv.org/abs/2310.05139
  • repo_url: None
  • paper_authors: Tesshu Hanaka, Airi Ikeyama, Hirotaka Ono
  • for: Fractional hedonic games are coalition formation games where a player’s utility is determined by the average value they assign to the members of their coalition.
  • methods: The paper presents (pseudo)polynomial-time algorithms to compute welfare-maximizing partitions in fractional hedonic games on tree-like graphs, including two types of social welfare measures: utilitarian and egalitarian.
  • results: The paper provides a hardness result, demonstrating that the pseudopolynomial-time solvability is the best possible under the assumption P$\neq$NP.
    Abstract Fractional hedonic games are coalition formation games where a player's utility is determined by the average value they assign to the members of their coalition. These games are a variation of graph hedonic games, which are a class of coalition formation games that can be succinctly represented. Due to their applicability in network clustering and their relationship to graph hedonic games, fractional hedonic games have been extensively studied from various perspectives. However, finding welfare-maximizing partitions in fractional hedonic games is a challenging task due to the nonlinearity of utilities. In fact, it has been proven to be NP-hard and can be solved in polynomial time only for a limited number of graph classes, such as trees. This paper presents (pseudo)polynomial-time algorithms to compute welfare-maximizing partitions in fractional hedonic games on tree-like graphs. We consider two types of social welfare measures: utilitarian and egalitarian. Tree-like graphs refer to graphs with bounded treewidth and block graphs. A hardness result is provided, demonstrating that the pseudopolynomial-time solvability is the best possible under the assumption P$\neq$NP.
    摘要 分数享乐博弈(fractional hedonic games)是一类联盟形成博弈,其中玩家的效用由其对所在联盟成员赋值的平均值决定。这类博弈是图享乐博弈的一种变体,后者是一类可以简洁表示的联盟形成博弈。由于其在网络聚类中的应用及其与图享乐博弈的关系,分数享乐博弈已经从多个角度得到了广泛研究。然而,由于效用的非线性,在分数享乐博弈中寻找福利最大化的划分是一项困难的任务。事实上,该问题已被证明是NP难的,只有在少数图类(如树)上才能在多项式时间内求解。本文给出了在树状图上计算分数享乐博弈福利最大化划分的(伪)多项式时间算法。我们考虑两种社会福利度量:功利主义(utilitarian)和平等主义(egalitarian)。树状图指树宽有界的图和块图。我们还给出了一个困难性结果,表明在 P≠NP 的假设下,伪多项式时间可解性已是最好的可能结果。
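To make the objective concrete, the following brute-force sketch computes the utilitarian-welfare-maximizing partition of a tiny fractional hedonic game by exhaustive enumeration; the toy graph and values are illustrative, and the paper's contribution is precisely avoiding such exponential search by exploiting tree-like structure.

```python
# Brute-force sketch of utilitarian welfare in a fractional hedonic game:
# a player's utility is the average value it assigns to the members of its coalition.
# Exhaustive partition enumeration only works for tiny instances.
from itertools import combinations

def utilitarian_welfare(partition, value):
    total = 0.0
    for coalition in partition:
        for i in coalition:
            total += sum(value[i][j] for j in coalition if j != i) / len(coalition)
    return total

def all_partitions(players):
    if not players:
        yield []
        return
    first, rest = players[0], players[1:]
    for k in range(len(rest) + 1):
        for group in combinations(rest, k):
            remaining = [p for p in rest if p not in group]
            for sub in all_partitions(remaining):
                yield [[first, *group]] + sub

# Symmetric 0/1 values induced by an unweighted graph on 4 players (a path 0-1-2-3).
edges = {(0, 1), (1, 2), (2, 3)}
value = [[1.0 if (i, j) in edges or (j, i) in edges else 0.0 for j in range(4)]
         for i in range(4)]

best = max(all_partitions(list(range(4))), key=lambda p: utilitarian_welfare(p, value))
print(best, utilitarian_welfare(best, value))   # pairs {0,1},{2,3} beat the grand coalition
```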

InstructDET: Diversifying Referring Object Detection with Generalized Instructions

  • paper_url: http://arxiv.org/abs/2310.05136
  • repo_url: https://github.com/jyfenggogo/instructdet
  • paper_authors: Ronghao Dang, Jiangyan Feng, Haodong Zhang, Chongjian Ge, Lin Song, Lijun Gong, Chengju Liu, Qijun Chen, Feng Zhu, Rui Zhao, Yibing Song
  • for: 本文提出了一种数据驱动的对象检测方法(InstructDET),用于基于用户指令(referring expressions,REC)进行对象检测。
  • methods: 本文使用了基于用户指令的数据驱动方法,并利用了新的视觉语言模型(VLM)和大语言模型(LLM)来生成指令和对象 bounding boxes(bbxs)。
  • results: 本文通过使用 InstructDET 方法和自制的 InDET dataset,实现了在标准 REC dataset 和 InDET 测试集上超越现有方法的对象检测性能。
    Abstract We propose InstructDET, a data-centric method for referring object detection (ROD) that localizes target objects based on user instructions. While deriving from referring expressions (REC), the instructions we leverage are greatly diversified to encompass common user intentions related to object detection. For one image, we produce tremendous instructions that refer to every single object and different combinations of multiple objects. Each instruction and its corresponding object bounding boxes (bbxs) constitute one training data pair. In order to encompass common detection expressions, we involve emerging vision-language model (VLM) and large language model (LLM) to generate instructions guided by text prompts and object bbxs, as the generalizations of foundation models are effective to produce human-like expressions (e.g., describing object property, category, and relationship). We name our constructed dataset as InDET. It contains images, bbxs and generalized instructions that are from foundation models. Our InDET is developed from existing REC datasets and object detection datasets, with the expanding potential that any image with object bbxs can be incorporated through using our InstructDET method. By using our InDET dataset, we show that a conventional ROD model surpasses existing methods on standard REC datasets and our InDET test set. Our data-centric method InstructDET, with automatic data expansion by leveraging foundation models, directs a promising field that ROD can be greatly diversified to execute common object detection instructions.
    摘要 我们提出了InstructDET,一种数据驱动的引用物体检测(ROD)方法,它基于用户指令来定位目标对象。而我们所利用的指令不仅来自引用表达(REC),还包括各种用户意图相关的对象检测指令。对于一张图像,我们生成了庞大的指令和对象 bounding box(bbxs),每个指令和对应的bbxs组成一个训练数据对。为了涵盖通用的检测表达,我们利用了趋势感知模型(VLM)和大语言模型(LLM),通过文本提示和对象bbxs来引导生成指令,这些基础模型的泛化效果可以生成人类化表达(例如,描述对象属性、类别和关系)。我们称之为InDET,它包含图像、bbxs和通用指令,这些指令来自基础模型。我们的InDET是基于现有REC dataset和对象检测dataset的扩展,可以通过我们的InstructDET方法将任何图像 WITH object bbxsintegrated。通过使用InDET数据集,我们示出了一个标准ROD模型在标准REC dataset和InDET测试集上的表现比普通方法更高。我们的数据驱动方法InstructDET,通过基于基础模型的自动扩展,指明了一个可能的场景,ROD可以通过各种常见的检测指令执行。

Are Emily and Greg Still More Employable than Lakisha and Jamal? Investigating Algorithmic Hiring Bias in the Era of ChatGPT

  • paper_url: http://arxiv.org/abs/2310.05135
  • repo_url: None
  • paper_authors: Akshaj Kumar Veldanda, Fabian Grob, Shailja Thakur, Hammond Pearce, Benjamin Tan, Ramesh Karri, Siddharth Garg
  • for: 这个研究探讨了大语言模型(LLMs)在算法招聘中的应用,特别是将简历与职业类别相匹配。
  • methods: 研究使用了场景实验来评估大语言模型对保护属性的偏见(如性别、种族和生育状况)的影响。
  • results: 研究发现,LLMs在不同的种族和性别下表现一致,但在孕期状况和政治倾向上存在偏见。使用了开源的LLMs进行对比输入解码来探讨可能的偏见源。
    Abstract Large Language Models (LLMs) such as GPT-3.5, Bard, and Claude exhibit applicability across numerous tasks. One domain of interest is their use in algorithmic hiring, specifically in matching resumes with job categories. Yet, this introduces issues of bias on protected attributes like gender, race and maternity status. The seminal work of Bertrand & Mullainathan (2003) set the gold-standard for identifying hiring bias via field experiments where the response rate for identical resumes that differ only in protected attributes, e.g., racially suggestive names such as Emily or Lakisha, is compared. We replicate this experiment on state-of-art LLMs (GPT-3.5, Bard, Claude and Llama) to evaluate bias (or lack thereof) on gender, race, maternity status, pregnancy status, and political affiliation. We evaluate LLMs on two tasks: (1) matching resumes to job categories; and (2) summarizing resumes with employment relevant information. Overall, LLMs are robust across race and gender. They differ in their performance on pregnancy status and political affiliation. We use contrastive input decoding on open-source LLMs to uncover potential sources of bias.
    摘要 GPT-3.5、Bard和Claude等大型语言模型(LLM)在众多任务中表现出适用性。其中一个受关注的领域是它们在算法招聘中的应用,特别是将简历与职业类别相匹配。然而,这会引入针对性别、种族和生育状况等受保护属性的偏见问题。Bertrand & Mullainathan(2003)的开创性工作通过实地实验确立了识别招聘偏见的黄金标准:比较仅在受保护属性上不同(例如带有种族暗示的名字,如Emily或Lakisha)的相同简历所获得的回复率。我们在最先进的LLM(GPT-3.5、Bard、Claude和Llama)上重复了这一实验,以评估它们在性别、种族、生育状况、怀孕状况和政治倾向方面的偏见(或不存在偏见)。我们在两个任务上评估LLM:(1)将简历与职业类别相匹配;(2)对简历中与雇佣相关的信息进行摘要。总体而言,LLM在种族和性别方面表现稳健,但在怀孕状况和政治倾向方面存在差异。我们使用开源LLM上的对比输入解码来探寻可能的偏见来源。

ed-cec: improving rare word recognition using asr postprocessing based on error detection and context-aware error correction

  • paper_url: http://arxiv.org/abs/2310.05129
  • repo_url: None
  • paper_authors: Jiajun He, Zekun Yang, Tomoki Toda
  • for: 提高自然语言处理(NLP)任务中罕见词的识别精度,以优化下游任务 such as 关键词检测、意图检测和文本概要生成。
  • methods: 提出了一种基于错误检测和上下文感知纠错的ASR后处理方法,仅针对预测出的错误位置优化解码过程,在最大化准确率的同时尽量减少不必要的计算。此外,我们还利用罕见词列表提供额外的上下文知识,以便更好地纠正罕见词。
  • results: 在五个数据集上实验表明,我们的提议方法可以比前一些方法更好地降低单词错误率(WER),同时保持一定的推理速度,并且在不同的ASR系统上表现出良好的鲁棒性。
    Abstract Automatic speech recognition (ASR) systems often encounter difficulties in accurately recognizing rare words, leading to errors that can have a negative impact on downstream tasks such as keyword spotting, intent detection, and text summarization. To address this challenge, we present a novel ASR postprocessing method that focuses on improving the recognition of rare words through error detection and context-aware error correction. Our method optimizes the decoding process by targeting only the predicted error positions, minimizing unnecessary computations. Moreover, we leverage a rare word list to provide additional contextual knowledge, enabling the model to better correct rare words. Experimental results across five datasets demonstrate that our proposed method achieves significantly lower word error rates (WERs) than previous approaches while maintaining a reasonable inference speed. Furthermore, our approach exhibits promising robustness across different ASR systems.
    摘要 自动语音识别(ASR)系统经常难以准确识别罕见词,由此产生的错误会对关键词检测、意图检测和文本摘要等下游任务造成负面影响。为解决这一挑战,我们提出了一种新的ASR后处理方法,通过错误检测和上下文感知纠错来提高罕见词的识别准确率。我们的方法只针对预测出的错误位置优化解码过程,最大限度地减少不必要的计算。此外,我们利用罕见词列表提供额外的上下文知识,使模型能够更好地纠正罕见词。在五个数据集上的实验结果表明,我们提出的方法在保持合理推理速度的同时,取得了显著低于以往方法的词错误率(WER)。此外,我们的方法在不同的ASR系统上也展现出了良好的鲁棒性。

Instances and Labels: Hierarchy-aware Joint Supervised Contrastive Learning for Hierarchical Multi-Label Text Classification

  • paper_url: http://arxiv.org/abs/2310.05128
  • repo_url: https://github.com/simonucl/HJCL
  • paper_authors: Simon Chi Lok U, Jie He, Víctor Gutiérrez-Basulto, Jeff Z. Pan
  • for: 这个研究的目的是解决多个标签分类中的多个标签间的关联性问题。
  • methods: 这个研究使用了对生成的标签类别进行对照学习,以将文本和标签嵌入更加接近。
  • results: 实验结果显示,HJCL取得了优异的结果,并验证了对比学习在层次多标签文本分类上的有效性。
    Abstract Hierarchical multi-label text classification (HMTC) aims at utilizing a label hierarchy in multi-label classification. Recent approaches to HMTC deal with the problem of imposing an over-constrained premise on the output space by using contrastive learning on generated samples in a semi-supervised manner to bring text and label embeddings closer. However, the generation of samples tends to introduce noise as it ignores the correlation between similar samples in the same batch. One solution to this issue is supervised contrastive learning, but it remains an underexplored topic in HMTC due to its complex structured labels. To overcome this challenge, we propose $\textbf{HJCL}$, a $\textbf{H}$ierarchy-aware $\textbf{J}$oint Supervised $\textbf{C}$ontrastive $\textbf{L}$earning method that bridges the gap between supervised contrastive learning and HMTC. Specifically, we employ both instance-wise and label-wise contrastive learning techniques and carefully construct batches to fulfill the contrastive learning objective. Extensive experiments on four multi-path HMTC datasets demonstrate that HJCL achieves promising results and the effectiveness of Contrastive Learning on HMTC.
    摘要 层次多标签文本分类(HMTC)旨在在多标签分类中利用标签层次结构。近期的HMTC方法为了避免对输出空间施加过强的约束,采用半监督方式在生成样本上进行对比学习,以拉近文本和标签的嵌入。然而,样本生成往往会引入噪声,因为它忽略了同一批次中相似样本之间的相关性。解决这一问题的一种途径是有监督对比学习,但由于HMTC的标签结构复杂,这一方向在HMTC中仍未得到充分探索。为弥补这一差距,我们提出了层次感知的联合有监督对比学习方法HJCL。具体而言,我们同时采用实例级和标签级的对比学习技术,并精心构造批次以满足对比学习目标。在四个多路径HMTC数据集上的大量实验表明,HJCL取得了可观的结果,验证了对比学习在HMTC上的有效性。
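As a point of reference for the loss family HJCL builds on, here is a minimal PyTorch sketch of a standard supervised contrastive loss, where positives are in-batch samples sharing a label; HJCL's actual objective adds label-wise terms and hierarchy-aware batch construction, which are not reproduced here.

```python
# Minimal PyTorch sketch of a supervised contrastive loss: for each anchor, the
# positives are the other in-batch samples sharing its label. This is only the generic
# building block, not HJCL's full hierarchy-aware formulation.
import torch
import torch.nn.functional as F

def sup_con_loss(embeddings, labels, temperature=0.1):
    z = F.normalize(embeddings, dim=1)                    # (N, D) unit vectors
    sim = z @ z.t() / temperature                         # (N, N) similarity matrix
    n = z.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=z.device)
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask

    sim = sim.masked_fill(self_mask, -1e9)                # exclude self-comparisons
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)

    pos_count = pos_mask.sum(dim=1).clamp(min=1)          # avoid division by zero
    loss = -(log_prob * pos_mask.float()).sum(dim=1) / pos_count
    return loss[pos_mask.any(dim=1)].mean()               # anchors with >= 1 positive

emb = torch.randn(8, 16, requires_grad=True)
labels = torch.tensor([0, 0, 1, 1, 2, 2, 0, 1])
print(sup_con_loss(emb, labels))
```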

UReader: Universal OCR-free Visually-situated Language Understanding with Multimodal Large Language Model

  • paper_url: http://arxiv.org/abs/2310.05126
  • repo_url: https://github.com/lukeforeveryoung/ureader
  • paper_authors: Jiabo Ye, Anwen Hu, Haiyang Xu, Qinghao Ye, Ming Yan, Guohai Xu, Chenliang Li, Junfeng Tian, Qi Qian, Ji Zhang, Qin Jin, Liang He, Xin Alex Lin, Fei Huang
  • for: 这个研究旨在提出一个universal OCR-free visually-situated language understanding模型,以便在文档、表格、图表、自然图像和网页 screenshot 等多种类型的视觉文本中进行语言理解。
  • methods: 本研究使用 Multimodal Large Language Model (MLLM),并将其训练为可以进行多种类型的视觉文本理解任务,包括文档、表格、图表、自然图像和网页 screenshot 等。此外,研究者还将两个辅助任务添加到模型中,以增强模型的视觉文本和 semantics 理解能力。
  • results: 根据研究结果,这个单一模型无需下游微调,即可在10个视觉语言理解任务中的8个上实现最先进的无OCR性能,涵盖文档、表格、图表、自然图像和网页截图5个领域,并且能够有效处理高分辨率图像。
    Abstract Text is ubiquitous in our visual world, conveying crucial information, such as in documents, websites, and everyday photographs. In this work, we propose UReader, a first exploration of universal OCR-free visually-situated language understanding based on the Multimodal Large Language Model (MLLM). By leveraging the shallow text recognition ability of the MLLM, we only finetuned 1.2% parameters and the training cost is much lower than previous work following domain-specific pretraining and finetuning paradigms. Concretely, UReader is jointly finetuned on a wide range of Visually-situated Language Understanding tasks via a unified instruction format. To enhance the visual text and semantic understanding, we further apply two auxiliary tasks with the same format, namely text reading and key points generation tasks. We design a shape-adaptive cropping module before the encoder-decoder architecture of MLLM to leverage the frozen low-resolution vision encoder for processing high-resolution images. Without downstream finetuning, our single model achieves state-of-the-art ocr-free performance in 8 out of 10 visually-situated language understanding tasks, across 5 domains: documents, tables, charts, natural images, and webpage screenshots. Codes and instruction-tuning datasets will be released.
    摘要 文本在我们的视觉世界中无处不在,传递着重要信息,例如文档、网站和日常照片中的文字。在这项工作中,我们提出了 UReader,首次探索基于多模态大语言模型(MLLM)的通用无OCR视觉语言理解。我们利用 MLLM 的浅层文本识别能力,只需微调 1.2% 的参数,训练成本远低于先前遵循领域特定预训练和微调范式的工作。具体来说,UReader 通过统一的指令格式在多种视觉语言理解任务上进行联合微调。为了增强视觉文本和语义理解,我们还应用了两个采用相同格式的辅助任务,即文本读取和关键点生成任务。我们在 MLLM 的编码器-解码器结构之前设计了形状自适应裁剪模块,以便利用冻结的低分辨率视觉编码器处理高分辨率图像。无需下游微调,我们的单个模型在 10 个视觉语言理解任务中的 8 个上达到最先进的无OCR性能,覆盖 5 个领域:文档、表格、图表、自然图像和网页截屏。我们将发布代码和指令微调数据集。
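A minimal sketch of the shape-adaptive cropping idea: choose a grid whose aspect ratio matches the high-resolution input, then cut it into cells a frozen low-resolution encoder can consume. The candidate grid set and the selection score below are illustrative assumptions, not UReader's exact rule.

```python
# Minimal sketch of shape-adaptive cropping: pick a (rows, cols) grid whose aspect ratio
# best matches the input image, then cut the image into that many cells so a frozen
# low-resolution vision encoder can process each cell separately.
import numpy as np

def choose_grid(height, width, max_cells=9):
    candidates = [(r, c) for r in range(1, max_cells + 1)
                  for c in range(1, max_cells + 1) if r * c <= max_cells]
    img_ratio = width / height
    # Prefer grids whose cell layout matches the image aspect ratio, then more cells.
    return min(candidates, key=lambda rc: (abs(np.log(img_ratio / (rc[1] / rc[0]))),
                                           -rc[0] * rc[1]))

def crop_cells(image, rows, cols):
    h, w = image.shape[:2]
    ys = np.linspace(0, h, rows + 1, dtype=int)
    xs = np.linspace(0, w, cols + 1, dtype=int)
    return [image[ys[i]:ys[i + 1], xs[j]:xs[j + 1]]
            for i in range(rows) for j in range(cols)]

doc_page = np.zeros((2200, 1700, 3), dtype=np.uint8)    # tall document-like image
rows, cols = choose_grid(*doc_page.shape[:2])
cells = crop_cells(doc_page, rows, cols)
print(rows, cols, len(cells), cells[0].shape)
```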

Distribution-Based Trajectory Clustering

  • paper_url: http://arxiv.org/abs/2310.05123
  • repo_url: https://github.com/IsolationKernel/TIDKC
  • paper_authors: Zi Jing Wang, Ye Zhu, Kai Ming Ting
  • for: trajectory clustering, 探索 trajectory 数据中的共同模式
  • methods: 使用 Isolation Distributional Kernel (IDK) 作为主要工具,以实现 trajectory 相似度测量和归类
  • results: 比较传统和深度学习基于距离度量的方法,IDK 能够更好地捕捉 trajectory 中复杂的结构,并且提供了更高效和稳定的归类性能。
    Abstract Trajectory clustering enables the discovery of common patterns in trajectory data. Current methods of trajectory clustering rely on a distance measure between two points in order to measure the dissimilarity between two trajectories. The distance measures employed have two challenges: high computational cost and low fidelity. Independent of the distance measure employed, existing clustering algorithms have another challenge: either effectiveness issues or high time complexity. In this paper, we propose to use a recent Isolation Distributional Kernel (IDK) as the main tool to meet all three challenges. The new IDK-based clustering algorithm, called TIDKC, makes full use of the distributional kernel for trajectory similarity measuring and clustering. TIDKC identifies non-linearly separable clusters with irregular shapes and varied densities in linear time. It does not rely on random initialisation and is robust to outliers. An extensive evaluation on 7 large real-world trajectory datasets confirms that IDK is more effective in capturing complex structures in trajectories than traditional and deep learning-based distance measures. Furthermore, the proposed TIDKC has superior clustering performance and efficiency to existing trajectory clustering algorithms.
    摘要 trajectory clustering可以揭示行程数据中的共同模式。现有的行程 clustering方法都基于两点之间的距离度量来衡量行程之间的不同。现有的距离度量面临两个挑战:高计算成本和低准确性。独立于选择的距离度量,现有的归类算法又面临另一个挑战:效果不佳或高时间复杂度。在本文中,我们提议使用最近的隔离分布 kernel(IDK)作为主要工具,以解决这三个挑战。我们称之为 TIDKC 归类算法。 TIDKC 利用分布 kernel 来衡量行程之间的相似度,并且可以快速地找到非线性分割的弯曲形状和不规则的分布。它不需要随机初始化,并且对异常值有较高的Robustness。我们对 7 个大的实际行程数据集进行了广泛的评估,发现 IDK 可以更好地捕捉行程中的复杂结构,比传统和深度学习基于的距离度量更有效。此外,我们的提议的 TIDKC 归类算法也比现有的行程归类算法有更高的归类性和效率。

Breaking Down Word Semantics from Pre-trained Language Models through Layer-wise Dimension Selection

  • paper_url: http://arxiv.org/abs/2310.05115
  • repo_url: None
  • paper_authors: Nayoung Choi
  • for: 本研究旨在分析BERT中各层中的不同语言知识,以及分离含义的不同方面。
  • methods: 本研究在不更新预训练参数的前提下,对BERT各层的中间输出应用二值掩码(binary mask),以便分离语义。
  • results: 实验结果表明,通过层划分信息可以提高表达效果,而分离含义更进一步提高表达效果。
    Abstract Contextual word embeddings obtained from pre-trained language model (PLM) have proven effective for various natural language processing tasks at the word level. However, interpreting the hidden aspects within embeddings, such as syntax and semantics, remains challenging. Disentangled representation learning has emerged as a promising approach, which separates specific aspects into distinct embeddings. Furthermore, different linguistic knowledge is believed to be stored in different layers of PLM. This paper aims to disentangle semantic sense from BERT by applying a binary mask to middle outputs across the layers, without updating pre-trained parameters. The disentangled embeddings are evaluated through binary classification to determine if the target word in two different sentences has the same meaning. Experiments with cased BERT$_{\texttt{base}}$ show that leveraging layer-wise information is effective and disentangling semantic sense further improves performance.
    摘要 从预训练语言模型(PLM)获得的上下文词嵌入已被证明在词级别的多种自然语言处理任务中十分有效。然而,解释嵌入中隐藏的句法、语义等方面仍然具有挑战性。解耦表示学习作为一种有前景的方法应运而生,它将特定方面分离到不同的嵌入中。此外,不同的语言学知识被认为存储在PLM的不同层中。本文旨在通过对各层的中间输出应用二值掩码(不更新预训练参数),从BERT中分离出语义。通过二分类任务评估解耦后的嵌入,即判断目标词在两个不同句子中是否具有相同含义。在cased BERT$_{\texttt{base}}$上的实验表明,利用层级信息是有效的,而分离语义可进一步提升性能。
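A minimal sketch of the layer-wise masking mechanics with Hugging Face Transformers: hidden states are collected from every BERT layer for a target token and a per-layer binary mask is applied before comparing the word across two sentences. The mask here is random purely for illustration, whereas the paper selects it to isolate semantic sense; the single-WordPiece assumption is also a simplification.

```python
# Minimal sketch of layer-wise binary masking on BERT hidden states. The mask is random
# here just to show the mechanics; in the paper it is chosen per layer to disentangle
# semantic sense, and the frozen BERT parameters are never updated.
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModel.from_pretrained("bert-base-cased").eval()

def masked_word_vector(sentence, word, masks):
    enc = tok(sentence, return_tensors="pt")
    # Assumes the target word maps to a single WordPiece; +1 skips [CLS].
    idx = tok.tokenize(sentence).index(word) + 1
    with torch.no_grad():
        hidden = model(**enc, output_hidden_states=True).hidden_states  # 13 x (1, T, 768)
    layers = [h[0, idx] * m for h, m in zip(hidden[1:], masks)]         # mask each layer
    return torch.stack(layers).mean(dim=0)                              # pooled masked repr.

torch.manual_seed(0)
masks = [torch.randint(0, 2, (model.config.hidden_size,)).float() for _ in range(12)]

v1 = masked_word_vector("He sat on the bank of the river.", "bank", masks)
v2 = masked_word_vector("She deposited cash at the bank.", "bank", masks)
print(torch.cosine_similarity(v1, v2, dim=0))   # same-meaning test reduces to this score
```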

Zero-Shot Detection of Machine-Generated Codes

  • paper_url: http://arxiv.org/abs/2310.05103
  • repo_url: https://github.com/baoguangsheng/fast-detect-gpt
  • paper_authors: Xianjun Yang, Kexun Zhang, Haifeng Chen, Linda Petzold, William Yang Wang, Wei Cheng
  • for: 本研究旨在提出一种不需要训练的方法,用于检测 LLMS 生成的代码,以避免这些代码的不当使用而带来的风险。
  • methods: 我们修改了之前的零批文本检测方法 DetectGPT(Mitchell et al., 2023),使用一个代理白盒模型来估算最右侧的字符的概率,以便识别由语言模型生成的代码片断。
  • results: 我们对 CodeContest 和 APPS 数据集中的 Python 代码进行了广泛的实验,证明我们的方法在 text-davinci-003、GPT-3.5 和 GPT-4 模型上达到了领先的检测结果。此外,我们的方法还能够抵抗修订(revision)攻击,并能泛化到 Java 代码。
    Abstract This work proposes a training-free approach for the detection of LLMs-generated codes, mitigating the risks associated with their indiscriminate usage. To the best of our knowledge, our research is the first to investigate zero-shot detection techniques applied to code generated by advanced black-box LLMs like ChatGPT. Firstly, we find that existing training-based or zero-shot text detectors are ineffective in detecting code, likely due to the unique statistical properties found in code structures. We then modify the previous zero-shot text detection method, DetectGPT (Mitchell et al., 2023) by utilizing a surrogate white-box model to estimate the probability of the rightmost tokens, allowing us to identify code snippets generated by language models. Through extensive experiments conducted on the python codes of the CodeContest and APPS dataset, our approach demonstrates its effectiveness by achieving state-of-the-art detection results on text-davinci-003, GPT-3.5, and GPT-4 models. Moreover, our method exhibits robustness against revision attacks and generalizes well to Java codes. We also find that the smaller code language model like PolyCoder-160M performs as a universal code detector, outperforming the billion-scale counterpart. The codes will be available at https://github.com/Xianjun-Yang/Code_detection.git
    摘要 这个研究提出了一种不需要训练的方法,用于检测 LLM 生成的代码,从而降低这些代码被不加选择地使用所带来的风险。据我们所知,我们的研究是首次将零样本检测技术应用于 ChatGPT 等先进黑盒 LLM 生成的代码。我们发现,现有的基于训练的或零样本的文本检测器都不能有效地检测代码,这很可能是由于代码结构所特有的统计特性。随后,我们对先前的零样本文本检测方法 DetectGPT(Mitchell et al., 2023)进行了修改,利用一个代理白盒模型来估计最右侧若干标记的概率,从而识别由语言模型生成的代码片段。通过对 CodeContest 和 APPS 数据集中的 Python 代码进行的广泛实验,我们的方法在 text-davinci-003、GPT-3.5 和 GPT-4 模型上达到了最先进的检测结果。此外,我们的方法还能够抵抗修订攻击,并能很好地泛化到 Java 代码。我们还发现,较小的代码语言模型 PolyCoder-160M 可以作为一个通用的代码检测器,性能超过了十亿参数级别的对应模型。代码将在 https://github.com/Xianjun-Yang/Code_detection.git 上提供。
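A simplified sketch of the surrogate-model statistic: score a snippet by the average log-probability a white-box causal LM assigns to its rightmost tokens. This shows only the probability computation, not the full DetectGPT-style comparison, and uses generic `gpt2` as the surrogate purely for illustration (a code LM such as PolyCoder would be used in practice).

```python
# Simplified sketch: average log-probability a surrogate causal LM assigns to the
# rightmost tokens of a snippet (higher = more "machine-like"). The paper's detector
# builds a DetectGPT-style statistic on top of such probabilities.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def rightmost_logprob(code, last_k=30):
    ids = tok(code, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits                        # (1, T, vocab)
    logp = torch.log_softmax(logits[0, :-1], dim=-1)      # position t predicts token t+1
    token_logp = logp.gather(1, ids[0, 1:].unsqueeze(1)).squeeze(1)
    return token_logp[-last_k:].mean().item()             # focus on the rightmost tokens

snippet = "def add(a, b):\n    return a + b\n"
print(rightmost_logprob(snippet))
```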

Intelligent DRL-Based Adaptive Region of Interest for Delay-sensitive Telemedicine Applications

  • paper_url: http://arxiv.org/abs/2310.05099
  • repo_url: None
  • paper_authors: Abdulrahman Soliman, Amr Mohamed, Elias Yaacoub, Nikhil V. Navkar, Aiman Erbad
  • for: 本研究旨在提高远程医疗应用的效率和质量,尤其是在 COVID-19 大流行之后。
  • methods: 本研究使用 Deep Reinforcement Learning(DRL)模型,智能调整 ROI 大小和非 ROI 质量,以适应网络带宽变化。
  • results: 对比结果表明,DRL 模型可以将延迟降低 13%,并将总体质量保持在可接受范围内。这些发现是对远程医疗应用的宝贵改进。
    Abstract Telemedicine applications have recently received substantial potential and interest, especially after the COVID-19 pandemic. Remote experience will help people get their complex surgery done or transfer knowledge to local surgeons, without the need to travel abroad. Even with breakthrough improvements in internet speeds, the delay in video streaming is still a hurdle in telemedicine applications. This imposes using image compression and region of interest (ROI) techniques to reduce the data size and transmission needs. This paper proposes a Deep Reinforcement Learning (DRL) model that intelligently adapts the ROI size and non-ROI quality depending on the estimated throughput. The delay and structural similarity index measure (SSIM) comparison are used to assess the DRL model. The comparison findings and the practical application reveal that DRL is capable of reducing the delay by 13% and keeping the overall quality in an acceptable range. Since the latency has been significantly reduced, these findings are a valuable enhancement to telemedicine applications.
    摘要 远程医疗应用近年来受到了极大的关注和重视,尤其是在 COVID-19 大流行之后。远程经验可以帮助人们完成复杂的手术,或将知识传授给本地外科医生,而无需出国。尽管互联网速度有了突破性的改善,但视频流的延迟仍然是远程医疗应用的一大障碍,因此需要使用图像压缩和感兴趣区域(ROI)技术来减少数据量和传输需求。本文提出了一种深度强化学习(DRL)模型,可以根据估算的吞吐量智能调整 ROI 大小和非 ROI 质量。延迟和结构相似性指数(SSIM)比较被用于评估 DRL 模型。对比结果和实际应用表明,DRL 能够将延迟降低 13%,并将总体质量保持在可接受范围内。由于延迟得到了显著降低,这些发现是对远程医疗应用的宝贵改进。

How Reliable Are AI-Generated-Text Detectors? An Assessment Framework Using Evasive Soft Prompts

  • paper_url: http://arxiv.org/abs/2310.05095
  • repo_url: None
  • paper_authors: Tharindu Kumarage, Paras Sheth, Raha Moraffah, Joshua Garland, Huan Liu
  • for: 本研究旨在评估高性能探测器的可靠性,以响应AI生成文本的滥用问题。
  • methods: 我们提出了一种新的应对方法,即通过调整PLM的软提示来导致PLM生成”人类化”的文本,以诱导探测器做出错误判断。我们在两步中实现了universal逃脱提示:首先,我们为特定PLM设计了逃脱软提示,然后通过软提示的传输性来将学习到的逃脱软提示传递到另一个PLM上。
  • results: 我们通过多种PLM在不同写作任务中进行了广泛的实验,并评估了逃脱软提示的效果。结果表明,逃脱软提示能够成功地诱导探测器做出错误判断,并且可以在不同的PLM和写作任务中实现高度的可重复性和稳定性。
    Abstract In recent years, there has been a rapid proliferation of AI-generated text, primarily driven by the release of powerful pre-trained language models (PLMs). To address the issue of misuse associated with AI-generated text, various high-performing detectors have been developed, including the OpenAI detector and the Stanford DetectGPT. In our study, we ask how reliable these detectors are. We answer the question by designing a novel approach that can prompt any PLM to generate text that evades these high-performing detectors. The proposed approach suggests a universal evasive prompt, a novel type of soft prompt, which guides PLMs in producing "human-like" text that can mislead the detectors. The novel universal evasive prompt is achieved in two steps: First, we create an evasive soft prompt tailored to a specific PLM through prompt tuning; and then, we leverage the transferability of soft prompts to transfer the learned evasive soft prompt from one PLM to another. Employing multiple PLMs in various writing tasks, we conduct extensive experiments to evaluate the efficacy of the evasive soft prompts in their evasion of state-of-the-art detectors.
    摘要 近年来,人工智能生成文本的迅速扩散,主要受到强大预训练语言模型(PLM)的释放所驱动。为了解决人工智能生成文本的违规问题,许多高性能的检测器被开发出来,包括OpenAI检测器和斯坦福DetectGPT。在我们的研究中,我们问到这些检测器的可靠性。我们回答这个问题,我们设计了一种新的方法,可以让任何PLM生成文本,以逃脱这些高性能的检测器。我们的方法建议一种通用逃脱提示,一种新的软提示,可以导引PLM生成“人类化”的文本,使检测器受到误导。我们的新通用逃脱提示包括两个步骤:首先,我们通过提示调整制定一个逃脱软提示,适应特定PLM;然后,我们利用软提示的传输性,将学习的逃脱软提示从一个PLM传递到另一个PLM。通过多种PLM在不同的写作任务中使用,我们进行了广泛的实验来评估逃脱软提示的有效性。

Learning Generalizable Agents via Saliency-Guided Features Decorrelation

  • paper_url: http://arxiv.org/abs/2310.05086
  • repo_url: None
  • paper_authors: Sili Huang, Yanchao Sun, Jifeng Hu, Siyuan Guo, Hechang Chen, Yi Chang, Lichao Sun, Bo Yang
  • for: 使基于视觉的强化学习(Reinforcement Learning,RL)智能体能够对训练中未见过的环境变化(状态空间中的变化)实现良好的泛化。
  • methods: 我们提出了 Saliency-Guided Features Decorrelation(SGFD),它包含两项核心技术:Random Fourier Functions(RFF)和显著性图(saliency map)。RFF 用于估计高维图像中复杂的非线性相关性,显著性图则用于识别发生变化的特征。SGFD 通过样本重新加权,降低与变化特征相关的估计相关性,从而实现特征去相关。
  • results: 实验结果显示,SGFD 能够在广泛的测试环境中实现良好的泛化,并在处理任务无关变化和任务相关变化方面显著优于现有方法。
    Abstract In visual-based Reinforcement Learning (RL), agents often struggle to generalize well to environmental variations in the state space that were not observed during training. The variations can arise in both task-irrelevant features, such as background noise, and task-relevant features, such as robot configurations, that are related to the optimal decisions. To achieve generalization in both situations, agents are required to accurately understand the impact of changed features on the decisions, i.e., establishing the true associations between changed features and decisions in the policy model. However, due to the inherent correlations among features in the state space, the associations between features and decisions become entangled, making it difficult for the policy to distinguish them. To this end, we propose Saliency-Guided Features Decorrelation (SGFD) to eliminate these correlations through sample reweighting. Concretely, SGFD consists of two core techniques: Random Fourier Functions (RFF) and the saliency map. RFF is utilized to estimate the complex non-linear correlations in high-dimensional images, while the saliency map is designed to identify the changed features. Under the guidance of the saliency map, SGFD employs sample reweighting to minimize the estimated correlations related to changed features, thereby achieving decorrelation in visual RL tasks. Our experimental results demonstrate that SGFD can generalize well on a wide range of test environments and significantly outperforms state-of-the-art methods in handling both task-irrelevant variations and task-relevant variations.
    摘要 在基于视觉的强化学习(RL)中,智能体经常难以对训练中未见过的状态空间环境变化实现良好的泛化。这些变化既可能来自与任务无关的特征(如背景噪声),也可能来自与最优决策相关的任务相关特征(如机器人配置)。为了在这两种情况下实现泛化,智能体需要准确理解变化特征对决策的影响,即在策略模型中建立变化特征与决策之间的真实关联。然而,由于状态空间中特征之间固有的相关性,特征与决策之间的关联相互纠缠,使策略难以区分它们。为此,我们提出了显著性引导特征去相关(SGFD),通过样本重新加权来消除这些相关性。具体而言,SGFD 包含两项核心技术:Random Fourier Functions(RFF)和显著性图。RFF 用于估计高维图像中的复杂非线性相关性,显著性图则用于识别变化特征。在显著性图的引导下,SGFD 通过样本重新加权最小化与变化特征相关的估计相关性,从而在视觉 RL 任务中实现去相关。实验结果表明,SGFD 能够在各种测试环境中良好泛化,并在处理任务无关变化和任务相关变化方面显著超越当前最先进的方法。
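
The sketch below illustrates the core decorrelation idea on synthetic data: two feature groups are mapped through Random Fourier Features and per-sample weights are adjusted to shrink their weighted cross-moment. The feature split, the simplified gradient (which ignores the normalization term), and the step sizes are assumptions for illustration; they are not the authors' implementation.

```python
# Minimal sketch of saliency-guided decorrelation via sample reweighting (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
n = 400
X = rng.normal(size=(n, 6))
X[:, 3:] += 0.8 * X[:, :3]                      # inject a spurious dependency between the groups

def rff(Z, num_features=32, gamma=1.0, seed=0):
    """Random Fourier Features approximating an RBF kernel feature map."""
    r = np.random.default_rng(seed)
    W = r.normal(scale=np.sqrt(2 * gamma), size=(Z.shape[1], num_features))
    b = r.uniform(0, 2 * np.pi, size=num_features)
    return np.sqrt(2.0 / num_features) * np.cos(Z @ W + b)

A = rff(X[:, :3], seed=1)                        # "changed" (saliency-selected) features
B = rff(X[:, 3:], seed=2)                        # remaining features
A, B = A - A.mean(0), B - B.mean(0)              # center once for simplicity

def dependence(w):
    """Squared Frobenius norm of the weight-averaged cross-moment matrix between A and B."""
    wn = w / w.sum()
    C = (A * wn[:, None]).T @ B
    return (C ** 2).sum()

w = np.ones(n)
for _ in range(300):
    wn = w / w.sum()
    C = (A * wn[:, None]).T @ B
    # d/dw_i ||C||^2 ≈ 2 * a_i^T C b_i / sum(w)  (normalization term ignored for brevity)
    grad = 2.0 * np.einsum('ij,jk,ik->i', A, C, B) / w.sum()
    w = np.clip(w - 20.0 * grad, 0.05, None)      # keep weights strictly positive

print("dependence with uniform weights:", dependence(np.ones(n)))
print("dependence with learned weights:", dependence(w))
```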

FLatS: Principled Out-of-Distribution Detection with Feature-Based Likelihood Ratio Score

  • paper_url: http://arxiv.org/abs/2310.05083
  • repo_url: https://github.com/linhaowei1/flats
  • paper_authors: Haowei Lin, Yuntian Gu
  • for: 本文旨在提出一种有理论支撑的分布外(OOD)样本检测方法,帮助 NLP 模型在实际应用中更好地识别分布外样本。
  • methods: 本文提出的方法基于似然比的思想,通过比较分布外分布 $\mathcal P_{\textit{out}}$ 与分布内分布 $\mathcal P_{\textit{in}}$,来评估测试样本 $\boldsymbol{x}$ 的"分布外程度"。而现有的 SOTA 方法(如 Maha 和 KNN)只估计分布内密度 $p_{\textit{in}}(\boldsymbol{x})$,因此是次优的。
  • results: 实验表明,提出的 FLatS 方法可以在流行的基准上建立新的 SOTA。此外,FLatS 还可以通过引入分布外密度 $p_{\textit{out}}(\boldsymbol{x})$ 的估计来增强其他 OOD 检测方法。
    Abstract Detecting out-of-distribution (OOD) instances is crucial for NLP models in practical applications. Although numerous OOD detection methods exist, most of them are empirical. Backed by theoretical analysis, this paper advocates for the measurement of the "OOD-ness" of a test case $\boldsymbol{x}$ through the likelihood ratio between out-distribution $\mathcal P_{\textit{out}}$ and in-distribution $\mathcal P_{\textit{in}}$. We argue that the state-of-the-art (SOTA) feature-based OOD detection methods, such as Maha and KNN, are suboptimal since they only estimate in-distribution density $p_{\textit{in}}(\boldsymbol{x})$. To address this issue, we propose FLatS, a principled solution for OOD detection based on likelihood ratio. Moreover, we demonstrate that FLatS can serve as a general framework capable of enhancing other OOD detection methods by incorporating out-distribution density $p_{\textit{out}}(\boldsymbol{x})$ estimation. Experiments show that FLatS establishes a new SOTA on popular benchmarks. Our code is publicly available at https://github.com/linhaowei1/FLatS.
    摘要 检测分布外(OOD)样本对 NLP 模型的实际应用至关重要。虽然已有许多 OOD 检测方法,但大多数都是经验性的。本文基于理论分析,提出通过似然比来度量测试样本 $\boldsymbol{x}$ 的"分布外程度",即比较分布外分布 $\mathcal P_{\textit{out}}$ 与分布内分布 $\mathcal P_{\textit{in}}$。我们认为,现有基于特征的 SOTA OOD 检测方法(如 Maha 和 KNN)只估计分布内密度 $p_{\textit{in}}(\boldsymbol{x})$,因此是次优的。为解决这一问题,我们提出了 FLatS,一种基于似然比的、有理论依据的 OOD 检测方法。此外,我们还证明 FLatS 可以作为通用框架,通过引入分布外密度 $p_{\textit{out}}(\boldsymbol{x})$ 的估计来增强其他 OOD 检测方法。实验表明,FLatS 在流行的基准上建立了新的 SOTA。我们的代码已公开在 https://github.com/linhaowei1/FLatS。
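
To make the likelihood-ratio score concrete, the sketch below computes one simple feature-space instantiation: both log p_in(x) and log p_out(x) are approximated with k-NN distance-based density proxies, where an auxiliary/background feature set stands in for the out-distribution. The synthetic features, the choice of k, and this particular density proxy are assumptions for illustration, not the official FLatS code.

```python
# Minimal sketch of a likelihood-ratio OOD score in feature space (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
d, k = 16, 10
feat_in  = rng.normal(loc=0.0, size=(2000, d))               # in-distribution training features
feat_out = rng.normal(loc=0.0, scale=3.0, size=(2000, d))     # proxy corpus for the out-distribution

def knn_log_density(query, bank, k):
    """Log-density proxy: -d * log(distance to the k-th nearest neighbour), up to a constant."""
    dists = np.linalg.norm(bank[None, :, :] - query[:, None, :], axis=-1)
    kth = np.partition(dists, k, axis=1)[:, k]
    return -bank.shape[1] * np.log(kth + 1e-12)

def likelihood_ratio_score(query):
    # Higher score => more in-distribution (log p_in(x) - log p_out(x)).
    return knn_log_density(query, feat_in, k) - knn_log_density(query, feat_out, k)

id_test  = rng.normal(size=(200, d))
ood_test = rng.normal(loc=4.0, size=(200, d))
print("mean score  ID :", likelihood_ratio_score(id_test).mean())
print("mean score  OOD:", likelihood_ratio_score(ood_test).mean())
```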

  • paper_url: http://arxiv.org/abs/2310.18324
  • repo_url: None
  • paper_authors: Ana L. C. Bazzan, Anderson R. Tavares, André G. Pereira, Cláudio R. Jung, Jacob Scharcanski, Joel Luis Carbonera, Luís C. Lamb, Mariana Recamonde-Mendoza, Thiago L. T. da Silveira, Viviane Moreira
  • for: The paper provides an overview of the ever-evolving landscape of Artificial Intelligence (AI) and its applications across sectors of the economy, impacting society and humanity.
  • methods: The paper analyzes the risks that come with rapid technological progress and future trends in AI, as well as the potential for AI to become a general-purpose technology like electricity.
  • results: The paper explores the transformative impact of AI on society, with the potential to revolutionize sectors of the economy and affect humanity in the same way that electricity did in the 19th and 20th centuries.
    Abstract The thought-provoking analogy between AI and electricity, made by computer scientist and entrepreneur Andrew Ng, summarizes the deep transformation that recent advances in Artificial Intelligence (AI) have triggered in the world. This chapter presents an overview of the ever-evolving landscape of AI, written in Portuguese. With no intent to exhaust the subject, we explore the AI applications that are redefining sectors of the economy, impacting society and humanity. We analyze the risks that may come along with rapid technological progress and future trends in AI, an area that is on the path to becoming a general-purpose technology, just like electricity, which revolutionized society in the 19th and 20th centuries. A provocativa comparação entre IA e eletricidade, feita pelo cientista da computação e empreendedor Andrew Ng, resume a profunda transformação que os recentes avanços em Inteligência Artificial (IA) têm desencadeado no mundo. Este capítulo apresenta uma visão geral pela paisagem em constante evolução da IA. Sem pretensões de exaurir o assunto, exploramos as aplicações que estão redefinindo setores da economia, impactando a sociedade e a humanidade. Analisamos os riscos que acompanham o rápido progresso tecnológico e as tendências futuras da IA, área que trilha o caminho para se tornar uma tecnologia de propósito geral, assim como a eletricidade, que revolucionou a sociedade dos séculos XIX e XX.
    摘要 计算机科学家、企业家吴恩达(Andrew Ng)提出的"人工智能与电力"这一发人深省的类比,概括了近期人工智能(AI)进展在世界范围内引发的深刻变革。本章以葡萄牙语撰写,概述了人工智能领域不断演进的图景。我们并不试图穷尽这一主题,而是探讨正在重新定义各经济部门、影响社会与人类的 AI 应用,并分析伴随技术快速进步而来的风险以及 AI 的未来趋势。人工智能正走在成为通用技术的道路上,正如电力在 19 世纪和 20 世纪引发的社会变革一样。

DialCoT Meets PPO: Decomposing and Exploring Reasoning Paths in Smaller Language Models

  • paper_url: http://arxiv.org/abs/2310.05074
  • repo_url: https://github.com/hccngu/dialcot
  • paper_authors: Chengcheng Han, Xiaowei Du, Che Zhang, Yixin Lian, Xiang Li, Ming Gao, Baoyuan Wang
  • for: 提高小语言模型(SLM)的逻辑能力
  • methods: 对 reasoning 任务进行对话指导,并使用 proximal policy optimization(PPO)算法优化逻辑路径选择
  • results: 在四个算术逻辑 dataset 上实现了显著性能提升,比前一代竞争者更好
    Abstract Chain-of-Thought (CoT) prompting has proven to be effective in enhancing the reasoning capabilities of Large Language Models (LLMs) with at least 100 billion parameters. However, it is ineffective or even detrimental when applied to reasoning tasks in Smaller Language Models (SLMs) with less than 10 billion parameters. To address this limitation, we introduce Dialogue-guided Chain-of-Thought (DialCoT) which employs a dialogue format to generate intermediate reasoning steps, guiding the model toward the final answer. Additionally, we optimize the model's reasoning path selection using the Proximal Policy Optimization (PPO) algorithm, further enhancing its reasoning capabilities. Our method offers several advantages compared to previous approaches. Firstly, we transform the process of solving complex reasoning questions by breaking them down into a series of simpler sub-questions, significantly reducing the task difficulty and making it more suitable for SLMs. Secondly, we optimize the model's reasoning path selection through the PPO algorithm. We conduct comprehensive experiments on four arithmetic reasoning datasets, demonstrating that our method achieves significant performance improvements compared to state-of-the-art competitors.
    摘要 思维链(CoT)提示已被证明能有效提升参数量至少达千亿的大型语言模型(LLM)的推理能力,但将其应用于参数量少于100亿的小语言模型(SLM)时,效果会减弱甚至有害。为了解决这一局限,我们提出了对话引导思维链(DialCoT),它使用对话格式生成中间推理步骤,引导模型得出最终答案。此外,我们使用近端策略优化(PPO)算法优化模型的推理路径选择,进一步提升其推理能力。我们的方法具有以下优势:首先,我们将复杂的推理问题分解为一系列更简单的子问题,从而大大降低任务难度,使其更适合 SLM 处理;其次,我们通过 PPO 算法优化模型的推理路径选择。我们在四个算术推理数据集上进行了广泛的实验,结果表明我们的方法相比当前最先进的方法取得了显著的性能提升。

Video-CSR: Complex Video Digest Creation for Visual-Language Models

  • paper_url: http://arxiv.org/abs/2310.05060
  • repo_url: None
  • paper_authors: Tingkai Liu, Yunzhe Tao, Haogeng Liu, Qihang Fan, Ding Zhou, Huaibo Huang, Ran He, Hongxia Yang
  • for: 这个论文是用来评估视频语言模型的captioning、摘要和检索能力的新任务和人工标注数据集。
  • methods: 该数据集包含 4,800 个时长 20-60 秒的 YouTube 视频片段,覆盖广泛的主题和兴趣领域。每个视频片段都配有 5 条独立标注的字幕(1 句)和摘要(3-10 句)。
  • results: 给定数据集中任意选择的视频及其对应的 ASR 信息,我们评估视频语言模型在字幕和摘要生成任务以及基于字幕和摘要的检索任务中的表现。此外,我们还评估了多种现有评价指标与人类偏好的一致性,并提出了一个基线模型,作为 Video-CSR 任务的参考点。
    Abstract We present a novel task and human annotated dataset for evaluating the ability for visual-language models to generate captions and summaries for real-world video clips, which we call Video-CSR (Captioning, Summarization and Retrieval). The dataset contains 4.8K YouTube video clips of 20-60 seconds in duration and covers a wide range of topics and interests. Each video clip corresponds to 5 independently annotated captions (1 sentence) and summaries (3-10 sentences). Given any video selected from the dataset and its corresponding ASR information, we evaluate visual-language models on either caption or summary generation that is grounded in both the visual and auditory content of the video. Additionally, models are also evaluated on caption- and summary-based retrieval tasks, where the summary-based retrieval task requires the identification of a target video given excerpts of a corresponding summary. Given the novel nature of the paragraph-length video summarization task, we perform extensive comparative analyses of different existing evaluation metrics and their alignment with human preferences. Finally, we propose a foundation model with competitive generation and retrieval capabilities that serves as a baseline for the Video-CSR task. We aim for Video-CSR to serve as a useful evaluation set in the age of large language models and complex multi-modal tasks.
    摘要 我们提出了一个新的任务和人工标注数据集,用于评估视觉语言模型对真实视频片段的字幕生成、摘要和检索能力,称之为 Video-CSR(Captioning、Summarization 和 Retrieval)。该数据集包含 4,800 个时长 20-60 秒的 YouTube 视频片段,覆盖广泛的主题和兴趣。每个视频片段对应 5 条独立标注的字幕(1 句)和摘要(3-10 句)。给定数据集中任意选择的视频及其对应的 ASR 信息,我们评估视觉语言模型基于视频的视觉与听觉内容进行字幕或摘要生成的能力。此外,我们还评估模型在基于字幕和基于摘要的检索任务中的能力,其中基于摘要的检索任务需要根据摘要片段识别出目标视频。鉴于段落长度的视频摘要任务较为新颖,我们对多种现有评估指标及其与人类偏好的一致性进行了详尽的比较分析。最后,我们提出了一个生成与检索能力均具竞争力的基础模型,作为 Video-CSR 任务的基线。我们希望 Video-CSR 能在大型语言模型和复杂多模态任务的时代成为有用的评估数据集。

Learning Separable Hidden Unit Contributions for Speaker-Adaptive Lip-Reading

  • paper_url: http://arxiv.org/abs/2310.05058
  • repo_url: None
  • paper_authors: Songtao Luo, Shuang Yang, Shiguang Shan, Xilin Chen
  • for: 本文提出了一种基于两个观察点的新方法,用于 speaker adaptation lip reading,目的是提高 lip reading 的精度和稳定性。
  • methods: 本文使用了 shallow 和 deep 层,将 speaker 的特征分别处理为 two different targets,以便自动学习 separable hidden unit contributions。在 shallow 层中,引入 speaker-adaptive features 来增强 speech content 相关的特征;在 deep 层中,引入 speaker-adaptive features 来抑制 speech content 不相关的噪音。
  • results: 本文的方法在不同设置下进行了广泛的分析和比较,并 consistently 超过了现有方法的性能。此外,本文还发布了一个新的测试集 CAS-VSR-S68h,以进一步评估在只有几个 speaker 的情况下,但涵盖了大量和多样化的 speech content 的情况下的性能。
    Abstract In this paper, we propose a novel method for speaker adaptation in lip reading, motivated by two observations. Firstly, a speaker's own characteristics can always be portrayed well by his/her few facial images or even a single image with shallow networks, while the fine-grained dynamic features associated with speech content expressed by the talking face always need deep sequential networks to represent accurately. Therefore, we treat the shallow and deep layers differently for speaker adaptive lip reading. Secondly, we observe that a speaker's unique characteristics ( e.g. prominent oral cavity and mandible) have varied effects on lip reading performance for different words and pronunciations, necessitating adaptive enhancement or suppression of the features for robust lip reading. Based on these two observations, we propose to take advantage of the speaker's own characteristics to automatically learn separable hidden unit contributions with different targets for shallow layers and deep layers respectively. For shallow layers where features related to the speaker's characteristics are stronger than the speech content related features, we introduce speaker-adaptive features to learn for enhancing the speech content features. For deep layers where both the speaker's features and the speech content features are all expressed well, we introduce the speaker-adaptive features to learn for suppressing the speech content irrelevant noise for robust lip reading. Our approach consistently outperforms existing methods, as confirmed by comprehensive analysis and comparison across different settings. Besides the evaluation on the popular LRW-ID and GRID datasets, we also release a new dataset for evaluation, CAS-VSR-S68h, to further assess the performance in an extreme setting where just a few speakers are available but the speech content covers a large and diversified range.
    摘要 在这篇论文中,我们基于两点观察提出了一种新的说话人自适应唇读方法。首先,说话人自身的特征通过少量(甚至单张)面部图像和浅层网络就能很好地刻画,而与说话内容相关的细粒度动态特征则需要深度序列网络才能准确表示,因此我们对浅层和深层采取不同的处理方式。其次,我们发现说话人的独特特征(例如突出的口腔和下颌)对不同词语和发音的唇读性能影响各异,需要对这些特征进行自适应的增强或抑制,以实现鲁棒的唇读。基于这两点观察,我们提出利用说话人自身的特征,分别针对浅层和深层自动学习可分离的隐藏单元贡献。在浅层中,与说话人特征相关的信息强于与说话内容相关的信息,我们引入说话人自适应特征来增强与说话内容相关的特征;在深层中,说话人特征和说话内容特征都能被很好地表达,我们引入说话人自适应特征来抑制与说话内容无关的噪声,从而实现鲁棒的唇读。经过在不同设置下的全面分析和比较,我们的方法始终优于现有方法。除了在常用的 LRW-ID 和 GRID 数据集上进行评估外,我们还发布了一个新的测试集 CAS-VSR-S68h,用于进一步评估在说话人数量极少、但说话内容覆盖面广且多样化的极端场景下的性能。

FP3O: Enabling Proximal Policy Optimization in Multi-Agent Cooperation with Parameter-Sharing Versatility

  • paper_url: http://arxiv.org/abs/2310.05053
  • repo_url: None
  • paper_authors: Lang Feng, Dong Xing, Junru Zhang, Gang Pan
  • for: 提高多代理人PPO算法的合作多代理人学习(MARL)理论保证性。
  • methods: 基于全管道思想,实现多平行优化管道,通过不同的等价分解方法表示代理人之间的连接。
  • results: FP3O 算法在多智能体 MuJoCo 和 StarCraftII 任务上表现出色,超过了其他强基线,并在不同的参数共享配置下展现出强大的灵活性。
    Abstract Existing multi-agent PPO algorithms lack compatibility with different types of parameter sharing when extending the theoretical guarantee of PPO to cooperative multi-agent reinforcement learning (MARL). In this paper, we propose a novel and versatile multi-agent PPO algorithm for cooperative MARL to overcome this limitation. Our approach is achieved upon the proposed full-pipeline paradigm, which establishes multiple parallel optimization pipelines by employing various equivalent decompositions of the advantage function. This procedure successfully formulates the interconnections among agents in a more general manner, i.e., the interconnections among pipelines, making it compatible with diverse types of parameter sharing. We provide a solid theoretical foundation for policy improvement and subsequently develop a practical algorithm called Full-Pipeline PPO (FP3O) by several approximations. Empirical evaluations on Multi-Agent MuJoCo and StarCraftII tasks demonstrate that FP3O outperforms other strong baselines and exhibits remarkable versatility across various parameter-sharing configurations.
    摘要 现有的多智能体 PPO 算法在将 PPO 的理论保证扩展到合作式多智能体强化学习(MARL)时,无法兼容不同类型的参数共享方式。在这篇文章中,我们提出了一种新颖且通用的多智能体 PPO 算法来克服这一限制。我们的方法建立在所提出的全管道(full-pipeline)范式之上,该范式利用优势函数的多种等价分解来构建多条并行的优化管道。这一过程以更一般的方式刻画了智能体之间的关联(即管道之间的关联),使其能够兼容多种参数共享方式。我们为策略改进提供了坚实的理论基础,并在此基础上通过若干近似开发出实用的 FP3O 算法。在 Multi-Agent MuJoCo 和 StarCraftII 任务上的实验评估表明,FP3O 的表现超过了其他强基线,并在各种参数共享配置下展现出卓越的灵活性。

Learning Intra- and Inter-Cell Differences for Accurate Battery Lifespan Prediction across Diverse Conditions

  • paper_url: http://arxiv.org/abs/2310.05052
  • repo_url: None
  • paper_authors: Han Zhang, Yuqi Li, Shun Zheng, Ziheng Lu, Xiaofan Gui, Wei Xu, Jiang Bian
  • for: 电池寿命预测研究具有重要的实际应用价值,尤其是在电池研发中。现有的数据驱动模型大多依靠特定目标电池的早期电学信号来预测其寿命。然而,这些模型通常是针对特定老化条件开发的,这不仅限制了模型能力,还削弱了其在不同条件下预测退化的效果,因此往往无法充分利用其他条件下积累的丰富历史数据。
  • methods: 我们提出了一种方法,显式刻画目标电池与参照电池之间电学信号的差异,而不论二者的材料和老化条件如何。借助这种跨电池差异,我们不仅扩充了特征空间,还为通用的电池寿命预测框架奠定了基础。结合跨电池(inter-cell)和电池内(intra-cell)差异的模型在多种条件下均表现出很高的效率和准确率,并能利用所有可用的数据集。
  • results: 我们的方法能够有效利用较老电池的数据,使较新的电池得以借鉴过往电池积累的经验。这不仅丰富了电池数据的利用策略,也为未来更智能的电池管理系统奠定了基础。
    Abstract Battery life prediction holds significant practical value for battery research and development. Currently, many data-driven models rely on early electrical signals from specific target batteries to predict their lifespan. A common shortfall is that most existing methods are developed based on specific aging conditions, which not only limits their model's capability but also diminishes their effectiveness in predicting degradation under varied conditions. As a result, these models often miss out on fully benefiting from the rich historical data available under other conditions. Here, to address above, we introduce an approach that explicitly captures differences between electrical signals of a target battery and a reference battery, irrespective of their materials and aging conditions, to forecast the target battery life. Through this inter-cell difference, we not only enhance the feature space but also pave the way for a universal battery life prediction framework. Remarkably, our model that combines the inter- and intra-cell differences shines across diverse conditions, standing out in its efficiency and accuracy using all accessible datasets. An essential application of our approach is its capability to leverage data from older batteries effectively, enabling newer batteries to capitalize on insights gained from past batteries. This work not only enriches the battery data utilization strategy but also sets the stage for smarter battery management system in the future.
    摘要 电池寿命预测对电池研发具有重要的实际价值。目前,许多数据驱动模型依靠特定目标电池的早期电学信号来预测其寿命。一个普遍的不足是,大多数现有方法都是基于特定老化条件开发的,这不仅限制了模型能力,也降低了它们在不同条件下预测退化的效果,因而往往无法充分受益于其他条件下丰富的历史数据。为了解决上述问题,我们提出了一种方法,显式刻画目标电池与参照电池之间电学信号的差异(不论二者的材料和老化条件如何),并据此预测目标电池的寿命。通过这种跨电池差异,我们不仅扩充了特征空间,也为通用的电池寿命预测框架铺平了道路。值得注意的是,结合跨电池与电池内差异的模型在多种条件下均表现出色,在所有可用数据集上兼具高效率与高准确率。该方法的一项重要应用是能够有效利用较老电池的数据,使较新的电池能够借鉴过往电池获得的经验。这项工作不仅丰富了电池数据的利用策略,也为未来更智能的电池管理系统奠定了基础。
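
The sketch below illustrates how inter-cell difference features can be constructed: each cell's early-cycle capacity curve is summarized by a few statistics, the target-minus-reference differences are appended, and a simple regressor is fit on the combined features. The synthetic fade curves, the choice of summary statistics, and the single reference cell are all illustrative assumptions, not the authors' model.

```python
# Minimal sketch of inter-cell-difference features for battery lifespan prediction (synthetic data).
import numpy as np

rng = np.random.default_rng(0)
n_cells, n_cycles = 120, 100

# Synthetic early-life capacity fade curves; true lifespan correlates with the fade slope.
slopes = rng.uniform(1e-4, 1e-3, size=n_cells)
capacity = 1.1 - slopes[:, None] * np.arange(n_cycles) + 0.002 * rng.normal(size=(n_cells, n_cycles))
lifespan = 0.2 / slopes + 50 * rng.normal(size=n_cells)       # cycles until end-of-life (toy)

def cell_stats(curve):
    """Summary statistics of one cell's early-cycle capacity curve."""
    return np.array([curve[:10].mean(), curve[-10:].mean(),
                     curve[-10:].mean() - curve[:10].mean(), np.var(np.diff(curve))])

stats = np.stack([cell_stats(c) for c in capacity])
ref = stats[0]                                                 # a single, fully observed reference cell
X = np.hstack([stats, stats - ref, np.ones((n_cells, 1))])     # intra-cell + inter-cell-difference features

train, test = np.arange(1, 90), np.arange(90, n_cells)
w, *_ = np.linalg.lstsq(X[train], lifespan[train], rcond=None)
mae = np.abs(X[test] @ w - lifespan[test]).mean()
print(f"test MAE: {mae:.1f} cycles")
```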

From Text to Tactic: Evaluating LLMs Playing the Game of Avalon

  • paper_url: http://arxiv.org/abs/2310.05036
  • repo_url: https://github.com/jonathanmli/avalon-llm
  • paper_authors: Jonathan Light, Min Cai, Sheng Shen, Ziniu Hu
  • for: 这篇论文探讨了大语言模型代理人(LLM)在游戏《抵抗avalon》中的潜力。
  • methods: 作者们构建了一个名为 AvalonBench 的游戏环境,用于评估多智能体 LLM。该环境包括 Avalon 游戏环境、作为基线对手的基于规则的 bot,以及为各角色定制提示的 ReAct 风格 LLM 代理。
  • results: 基于 AvalonBench 的评估结果显示,LLM 存在明显的能力差距。例如,ChatGPT 扮演善良角色对阵扮演邪恶角色的规则 bot 时,胜率仅为 22.2%;而在相同设定下,扮演善良角色的规则 bot 胜率为 38.2%。
    Abstract In this paper, we explore the potential of Large Language Models (LLMs) Agents in playing the strategic social deduction game, Resistance Avalon. Players in Avalon are challenged not only to make informed decisions based on dynamically evolving game phases, but also to engage in discussions where they must deceive, deduce, and negotiate with other players. These characteristics make Avalon a compelling test-bed to study the decision-making and language-processing capabilities of LLM Agents. To facilitate research in this line, we introduce AvalonBench - a comprehensive game environment tailored for evaluating multi-agent LLM Agents. This benchmark incorporates: (1) a game environment for Avalon, (2) rule-based bots as baseline opponents, and (3) ReAct-style LLM agents with tailored prompts for each role. Notably, our evaluations based on AvalonBench highlight a clear capability gap. For instance, models like ChatGPT playing good-role got a win rate of 22.2% against rule-based bots playing evil, while good-role bot achieves 38.2% win rate in the same setting. We envision AvalonBench could be a good test-bed for developing more advanced LLMs (with self-playing) and agent frameworks that can effectively model the layered complexities of such game environments.
    摘要 在这篇论文中,我们探讨了大语言模型(LLM)代理在策略性社交推理游戏《抵抗组织:阿瓦隆》(Resistance Avalon)中的潜力。游戏中的玩家不仅需要根据不断演进的游戏阶段做出明智的决策,还需要与其他玩家展开讨论,进行欺骗、推理和谈判。这些特点使阿瓦隆成为研究 LLM 代理决策与语言处理能力的理想测试平台。为了促进这方面的研究,我们提出了 AvalonBench,一个包含三个重要组成部分的综合游戏环境:(1)阿瓦隆游戏环境,(2)作为基线对手的基于规则的 bot,以及(3)为各角色定制提示的 ReAct 风格 LLM 代理。基于 AvalonBench 的评估凸显了明显的能力差距:例如,ChatGPT 扮演善良角色对阵扮演邪恶角色的规则 bot 时胜率为 22.2%,而在相同设定下扮演善良角色的规则 bot 胜率为 38.2%。我们设想 AvalonBench 可以成为开发更先进的 LLM(包含自我博弈)以及能够有效建模此类游戏环境多层复杂性的代理框架的良好测试平台。

Self-Convinced Prompting: Few-Shot Question Answering with Repeated Introspection

  • paper_url: http://arxiv.org/abs/2310.05035
  • repo_url: None
  • paper_authors: Haodi Zhang, Min Cai, Xinhe Zhang, Chen Jason Zhang, Rui Mao, Kaishun Wu
  • for: 提升大语言模型(LLM)在复杂推理和精细知识运用方面的能力
  • methods: 利用预训练语言模型,通过 Normal CoT、Convincer 和 Answerer 三个组件对少样本思维链输出进行评估、审视与迭代改进
  • results: 实验结果验证了 Self-Convince 框架的有效性,与基线相比取得了显著提升
    Abstract While large language models (LLMs) such as ChatGPT and PaLM have demonstrated remarkable performance in various language understanding and generation tasks, their capabilities in complex reasoning and intricate knowledge utilization still fall short of human-level proficiency. Recent studies have established the effectiveness of prompts in steering LLMs towards generating desired outputs. Building on these insights, we introduce a novel framework that harnesses the potential of large-scale pre-trained language models, to iteratively enhance performance of the LLMs. Our framework incorporates three components: \textit{Normal CoT}, a \textit{Convincer}, and an \textit{Answerer}. It processes the output of a typical few-shot chain-of-thought prompt, assesses the correctness of the response, scrutinizes the answer, refines the reasoning, and ultimately produces a new solution. Experimental results on the 7 datasets of miscellaneous problems validate the efficacy of the Self-Convince framework, achieving substantial improvements compared to the baselines. This study contributes to the burgeoning body of research focused on integrating pre-trained language models with tailored prompts and iterative refinement processes to augment their performance in complex tasks.
    摘要 尽管 ChatGPT 和 PaLM 等大型语言模型(LLM)在各种语言理解和生成任务中表现出色,但它们在复杂推理和精细知识运用方面仍未达到人类水平。近期研究表明,提示能够有效引导 LLM 生成期望的输出。基于这些见解,我们提出了一个新的框架,利用大规模预训练语言模型的潜力来迭代提升 LLM 的表现。该框架包含三个组成部分:Normal CoT、Convincer 和 Answerer。它对典型的少样本思维链提示的输出进行处理,评估回答的正确性,审视答案,改进推理,并最终产生新的解答。在 7 个涵盖多类问题的数据集上的实验结果验证了 Self-Convince 框架的有效性,相比基线取得了显著提升。这项研究为将预训练语言模型与定制提示及迭代改进过程相结合、以增强其在复杂任务中的表现这一日益增长的研究方向做出了贡献。
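
The sketch below shows the iterative Normal CoT -> Convincer -> Answerer loop as we read it from the abstract. The component names follow the paper, but the prompt wording is illustrative and `call_llm` is a stub to be wired to whatever model or API is actually used.

```python
# Minimal sketch of the Self-Convince loop (prompts are assumptions; call_llm is a placeholder).
def call_llm(prompt: str) -> str:
    """Stub standing in for a real LLM call; replace with an actual API/model invocation."""
    return "PLACEHOLDER RESPONSE"

def self_convince(question: str, few_shot_cot: str, max_rounds: int = 3) -> str:
    # Step 1: Normal CoT -- draft an initial chain-of-thought answer.
    answer = call_llm(f"{few_shot_cot}\nQ: {question}\nA: Let's think step by step.")
    for _ in range(max_rounds):
        # Step 2: Convincer -- scrutinize the draft and decide whether it is convincing.
        verdict = call_llm(
            f"Question: {question}\nProposed reasoning and answer: {answer}\n"
            "Check each step. Reply 'CONVINCED' if the reasoning and answer are sound, "
            "otherwise point out the first flawed step."
        )
        if verdict.strip().upper().startswith("CONVINCED"):
            break
        # Step 3: Answerer -- refine the reasoning using the critique and produce a new solution.
        answer = call_llm(
            f"Question: {question}\nPrevious attempt: {answer}\nCritique: {verdict}\n"
            "Revise the reasoning to fix the issue and give the final answer."
        )
    return answer

if __name__ == "__main__":
    print(self_convince("If a train travels 60 km in 1.5 hours, what is its average speed?",
                        few_shot_cot="(few-shot chain-of-thought examples go here)"))
```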

Counter Turing Test CT^2: AI-Generated Text Detection is Not as Easy as You May Think – Introducing AI Detectability Index

  • paper_url: http://arxiv.org/abs/2310.05030
  • repo_url: None
  • paper_authors: Megha Chakraborty, S. M Towhidul Islam Tonmoy, S M Mehedi Zaman, Krish Sharma, Niyar R Barman, Chandan Gupta, Shreya Gautam, Tanay Kumar, Vinija Jain, Aman Chadha, Amit P. Sheth, Amitava Das
  • for: 这篇论文主要旨在评估当前的AI生成文本检测技术的robustness,以及评估不同大小的自然语言处理模型(LLMs)在生成文本检测中的可探测性。
  • methods: 这篇论文提出了Counter Turing Test(CT^2)作为一个完整的评估AI生成文本检测技术的标准 benchark。它们还提出了一个名为AI Detectability Index(ADI)的指标,用于评估不同大小的LLMs在生成文本检测中的可探测性。
  • results: 这篇论文的实验结果表明,现有的AI生成文本检测技术在面对CT^2的测试中具有较弱的可探测性。此外,研究发现大型LLMs具有较高的AI Detectability Index(ADI),这意味着它们在生成文本检测中更难被检测。
    Abstract With the rise of prolific ChatGPT, the risk and consequences of AI-generated text has increased alarmingly. To address the inevitable question of ownership attribution for AI-generated artifacts, the US Copyright Office released a statement stating that 'If a work's traditional elements of authorship were produced by a machine, the work lacks human authorship and the Office will not register it'. Furthermore, both the US and the EU governments have recently drafted their initial proposals regarding the regulatory framework for AI. Given this cynosural spotlight on generative AI, AI-generated text detection (AGTD) has emerged as a topic that has already received immediate attention in research, with some initial methods having been proposed, soon followed by emergence of techniques to bypass detection. This paper introduces the Counter Turing Test (CT^2), a benchmark consisting of techniques aiming to offer a comprehensive evaluation of the robustness of existing AGTD techniques. Our empirical findings unequivocally highlight the fragility of the proposed AGTD methods under scrutiny. Amidst the extensive deliberations on policy-making for regulating AI development, it is of utmost importance to assess the detectability of content generated by LLMs. Thus, to establish a quantifiable spectrum facilitating the evaluation and ranking of LLMs according to their detectability levels, we propose the AI Detectability Index (ADI). We conduct a thorough examination of 15 contemporary LLMs, empirically demonstrating that larger LLMs tend to have a higher ADI, indicating they are less detectable compared to smaller LLMs. We firmly believe that ADI holds significant value as a tool for the wider NLP community, with the potential to serve as a rubric in AI-related policy-making.
    摘要 随着 ChatGPT 的广泛使用,AI 生成文本带来的风险和后果急剧增加。为了解决 AI 生成内容的作者归属问题,美国版权局发表声明:如果作品中传统意义上的作者性元素是由机器产生的,则该作品缺乏人类作者,版权局将不予登记。此外,美国和欧盟政府最近都起草了各自关于人工智能监管框架的初步提案。在生成式 AI 备受瞩目的背景下,AI 生成文本检测(AGTD)迅速成为研究热点,一些初步方法已被提出,随后又出现了绕过检测的技术。本文提出了 Counter Turing Test(CT^2),一个旨在全面评估现有 AGTD 技术鲁棒性的基准。我们的实验结果明确显示,所考察的 AGTD 方法相当脆弱。在围绕 AI 发展监管的广泛政策讨论中,评估 LLM 生成内容的可检测性至关重要。因此,我们提出了 AI 可检测性指数(ADI),用于建立一个可量化的谱系,以评估和排序不同 LLM 的可检测性水平。我们对 15 个当代 LLM 进行了全面考察,实验表明规模更大的 LLM 往往具有更高的 ADI,即相比小模型更难被检测。我们坚信 ADI 作为一种工具对更广泛的 NLP 社区具有重要价值,并有潜力在 AI 相关政策制定中发挥作用。

Revisiting Large Language Models as Zero-shot Relation Extractors

  • paper_url: http://arxiv.org/abs/2310.05028
  • repo_url: None
  • paper_authors: Guozheng Li, Peng Wang, Wenjun Ke
  • for: 这篇论文主要研究使用大语言模型(LLM)进行零样本关系抽取(RE)。
  • methods: 本研究使用思维链(CoT)技术和摘要并提问(\textsc{SumAsk})提示法来提升零样本 RE 的性能。
  • results: 研究发现,\textsc{SumAsk} 能够在不同模型规模、基准和设置下持续且显著地提升 LLM 的性能;使用 ChatGPT 的零样本提示可以取得与零样本及全监督方法相当甚至更优的结果;LLM 在抽取重叠关系方面表现出色;不同关系之间的性能差异较大,但与小语言模型不同,LLM 能够较好地处理具有挑战性的 none-of-the-above(NoTA)关系。
    Abstract Relation extraction (RE) consistently involves a certain degree of labeled or unlabeled data even if under zero-shot setting. Recent studies have shown that large language models (LLMs) transfer well to new tasks out-of-the-box simply given a natural language prompt, which provides the possibility of extracting relations from text without any data and parameter tuning. This work focuses on the study of exploring LLMs, such as ChatGPT, as zero-shot relation extractors. On the one hand, we analyze the drawbacks of existing RE prompts and attempt to incorporate recent prompt techniques such as chain-of-thought (CoT) to improve zero-shot RE. We propose the summarize-and-ask (\textsc{SumAsk}) prompting, a simple prompt recursively using LLMs to transform RE inputs to the effective question answering (QA) format. On the other hand, we conduct comprehensive experiments on various benchmarks and settings to investigate the capabilities of LLMs on zero-shot RE. Specifically, we have the following findings: (i) \textsc{SumAsk} consistently and significantly improves LLMs performance on different model sizes, benchmarks and settings; (ii) Zero-shot prompting with ChatGPT achieves competitive or superior results compared with zero-shot and fully supervised methods; (iii) LLMs deliver promising performance in extracting overlapping relations; (iv) The performance varies greatly regarding different relations. Different from small language models, LLMs are effective in handling challenge none-of-the-above (NoTA) relation.
    摘要 关系抽取(RE)即使在零样本设定下,通常也需要一定数量的有标注或无标注数据。近期研究表明,大型语言模型(LLM)只需给定自然语言提示即可很好地迁移到新任务,这为无需任何数据和参数调整即可从文本中抽取关系提供了可能。本文聚焦于将 ChatGPT 等 LLM 用作零样本关系抽取器的研究。一方面,我们分析了现有 RE 提示的不足,并尝试引入思维链(CoT)等最新提示技术来改进零样本 RE,提出了摘要并提问(\textsc{SumAsk})提示法:一种递归调用 LLM、将 RE 输入转化为有效问答(QA)格式的简单提示方式。另一方面,我们在多个基准和设置上开展了全面的实验,以考察 LLM 在零样本 RE 中的能力。具体而言,我们有以下发现:(i) \textsc{SumAsk} 在不同模型规模、基准和设置下均能持续且显著地提升 LLM 的性能;(ii) 使用 ChatGPT 的零样本提示可以取得与零样本及全监督方法相当甚至更优的结果;(iii) LLM 在抽取重叠关系方面表现出色;(iv) 不同关系之间的性能差异很大,与小语言模型不同,LLM 能够有效处理具有挑战性的 none-of-the-above(NoTA)关系。
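
The sketch below shows the summarize-and-ask idea in its simplest form: the sentence is first summarized with respect to the entity pair, and each candidate relation is then posed as a yes/no question over that summary. The prompt wording is illustrative rather than the paper's exact templates, and `call_llm` is a stub for a real model call.

```python
# Minimal sketch of SumAsk-style zero-shot relation extraction (prompts are assumptions).
def call_llm(prompt: str) -> str:
    return "PLACEHOLDER"          # stub; with a real model its answers drive the loop below

def sumask_relation(sentence: str, head: str, tail: str, relation: str) -> str:
    # Step 1: ask the LLM to summarize the sentence with respect to the entity pair.
    summary = call_llm(
        f"Sentence: {sentence}\n"
        f"Summarize, in one sentence, what this text says about '{head}' and '{tail}'."
    )
    # Step 2: turn the candidate relation into a yes/no question over the summary (QA format).
    return call_llm(
        f"Summary: {summary}\n"
        f"Question: Does this summary express the relation '{relation}' "
        f"between '{head}' and '{tail}'? Answer yes or no, then explain briefly."
    )

def zero_shot_re(sentence, head, tail, candidate_relations):
    """Score every candidate relation; return 'no_relation' (NoTA) if none is affirmed."""
    for rel in candidate_relations:
        if sumask_relation(sentence, head, tail, rel).lower().startswith("yes"):
            return rel
    return "no_relation"
```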

Fully Spiking Neural Network for Legged Robots

  • paper_url: http://arxiv.org/abs/2310.05022
  • repo_url: None
  • paper_authors: Xiaoyang Jiang, Qiang Zhang, Jingkai Sun, Renjing Xu
  • for: The paper aims to improve the performance of legged robots using a novel Spiking Neural Network (SNN) to process body perception signals, achieving better speed and energy consumption, and improved biological interpretability.
  • methods: The paper employs a SNN to process legged robots' perception signals, which offers improved biological interpretability and natural advantages in inference speed and energy consumption compared to traditional artificial neural networks.
  • results: The paper achieves outstanding results across a range of simulated terrains, demonstrating the effectiveness of SNN in legged robots.
    Abstract In recent years, legged robots based on deep reinforcement learning have made remarkable progress. Quadruped robots have demonstrated the ability to complete challenging tasks in complex environments and have been deployed in real-world scenarios to assist humans. Simultaneously, bipedal and humanoid robots have achieved breakthroughs in various demanding tasks. Current reinforcement learning methods can utilize diverse robot bodies and historical information to perform actions. However, prior research has not emphasized the speed and energy consumption of network inference, as well as the biological significance of the neural networks themselves. Most of the networks employed are traditional artificial neural networks that utilize multilayer perceptrons (MLP). In this paper, we successfully apply a novel Spiking Neural Network (SNN) to process legged robots, achieving outstanding results across a range of simulated terrains. SNN holds a natural advantage over traditional neural networks in terms of inference speed and energy consumption, and their pulse-form processing of body perception signals offers improved biological interpretability. To the best of our knowledge, this is the first work to implement SNN in legged robots.
    摘要 近年来,基于深度强化学习的足式机器人取得了显著进展。四足机器人已能在复杂环境中完成具有挑战性的任务,并被部署到真实场景中协助人类;双足和人形机器人也在多种高难度任务上取得突破。现有的强化学习方法可以利用多样的机器人本体和历史信息来执行动作,但以往研究并未重视网络推理的速度和能耗,也未关注神经网络本身的生物学意义,所采用的大多是基于多层感知机(MLP)的传统人工神经网络。本文成功地将一种新型脉冲神经网络(SNN)应用于足式机器人,并在多种仿真地形上取得了出色的结果。SNN 在推理速度和能耗方面相比传统神经网络具有天然优势,其对本体感知信号的脉冲式处理也带来了更好的生物学可解释性。据我们所知,这是首个在足式机器人上实现 SNN 的工作。
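
For readers unfamiliar with spiking networks, the sketch below simulates one fully-connected leaky integrate-and-fire (LIF) layer of the kind such policies are typically built from. The layer sizes, time constants, and rate encoding are illustrative assumptions; the paper's actual network is not reproduced here.

```python
# Minimal sketch of a leaky integrate-and-fire (LIF) layer driven by rate-encoded observations.
import numpy as np

def lif_layer(inputs, weights, tau=0.9, v_th=1.0):
    """Simulate one fully-connected LIF layer over T timesteps.
    inputs: (T, n_in) binary spike trains; weights: (n_in, n_out)."""
    T = inputs.shape[0]
    n_out = weights.shape[1]
    v = np.zeros(n_out)                          # membrane potentials
    spikes = np.zeros((T, n_out))
    for t in range(T):
        v = tau * v + inputs[t] @ weights        # leak + integrate the weighted input current
        spikes[t] = (v >= v_th).astype(float)    # fire where the threshold is crossed
        v = np.where(spikes[t] > 0, 0.0, v)      # hard reset after a spike
    return spikes

rng = np.random.default_rng(0)
obs_rate = rng.uniform(0, 0.3, size=8)                       # rate-encoded proprioceptive observation
in_spikes = (rng.random((50, 8)) < obs_rate).astype(float)   # Poisson-like spike trains over 50 steps
out_spikes = lif_layer(in_spikes, rng.normal(scale=0.5, size=(8, 4)))
print("output firing rates (proxy for downstream commands):", out_spikes.mean(axis=0))
```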

Compresso: Structured Pruning with Collaborative Prompting Learns Compact Large Language Models

  • paper_url: http://arxiv.org/abs/2310.05015
  • repo_url: https://github.com/microsoft/moonlit
  • paper_authors: Song Guo, Jiahang Xu, Li Lyna Zhang, Mao Yang
  • for: 这项研究旨在通过结构化剪枝压缩大型语言模型(LLM),使其能够部署在硬件资源受限的环境中。
  • methods: 这项研究提出了一种名为 Compresso 的新范式,通过与 LLM 协作,在训练过程中学习最优的剪枝决策。Compresso 将 LoRA 技术融入 $L_0$ 正则化,在指令微调过程中进行剪枝,并引入协同提示以进一步提升整体性能。
  • results: 实验结果表明,Compresso 可以将 LLaMA-7B 剪枝至 5.4B 参数,保持原有性能,甚至在阅读理解任务上超过 LLaMA-7B。在不同稀疏比例下,Compresso 均优于一次性剪枝(one-shot pruning)基线,在 Commonsense Reasoning、Reading Comprehension、MMLU 和 BBH 基准上分别最高取得 2.21%、11.43%、7.04% 和 4.81% 的提升。
    Abstract Despite the remarkable success of Large Language Models (LLMs), the massive size poses significant deployment challenges, particularly on resource-constrained hardware. While existing LLM compression methods focus on quantization, pruning remains relatively unexplored due to the high cost of training-based approaches and data collection challenges. One-shot pruning methods, although cost-effective and data-free, have become dominant in LLM pruning, but lead to performance decline under the structured pruning setting. In this work, we introduce a new paradigm for structurally pruning LLMs, called Compresso. Our approach, through the collaboration of the proposed resource-efficient pruning algorithm and the LLM itself, learns optimal pruning decisions during the training process. Compresso addresses the challenges of expensive training costs and data collection by incorporating Low-Rank Adaptation (LoRA) into the $L_0$ regularization during the instruction tuning process. Then, we further augment the pruning algorithm by introducing a collaborative prompt that fosters collaboration between the LLM and the pruning algorithm, significantly boosting the overall performance. To this end, Compresso prunes LLaMA-7B to 5.4B, maintaining original performance and even surpassing LLaMA-7B in reading comprehension by 2.62%. Extensive experiments demonstrate that Compresso significantly outperforms one-shot pruning baselines across various sparsity ratios, achieving up to 2.21%, 11.43%, 7.04%, and 4.81% higher scores on the commonsense reasoning, reading comprehension, MMLU, and BBH benchmarks, respectively.
    摘要 尽管大型语言模型(LLM)取得了显著成功,但其庞大的规模给部署带来了巨大挑战,尤其是在硬件资源受限的情况下。现有的 LLM 压缩方法主要集中在量化上,而剪枝由于基于训练的方法成本高昂、数据收集困难,相对而言仍未被充分探索。一次性剪枝方法虽然成本低且无需数据,已成为 LLM 剪枝的主流,但在结构化剪枝设定下会导致性能下降。在这项工作中,我们提出了一种结构化剪枝 LLM 的新范式,称为 Compresso。我们的方法通过所提出的资源高效剪枝算法与 LLM 本身的协作,在训练过程中学习最优的剪枝决策。Compresso 通过在指令微调过程中将低秩适配(LoRA)融入 $L_0$ 正则化,解决了训练成本高和数据收集难的问题。随后,我们进一步引入协同提示来增强剪枝算法,促进 LLM 与剪枝算法之间的协作,显著提升了整体性能。借助这些改进,Compresso 将 LLaMA-7B 剪枝至 5.4B 参数,在保持原有性能的同时,甚至在阅读理解上超过 LLaMA-7B 达 2.62%。大量实验表明,在不同稀疏比例下,Compresso 显著优于一次性剪枝基线,在常识推理、阅读理解、MMLU 和 BBH 基准上分别最高取得 2.21%、11.43%、7.04% 和 4.81% 的提升。
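
As background for the L0-regularized pruning mentioned above, the sketch below shows a common instantiation of differentiable L0 regularization via hard-concrete gates (in the style of Louizos et al.). The exact formulation in Compresso, and the choice of attention heads as the prunable unit, are assumptions here and may differ in detail.

```python
# A common hard-concrete gate for differentiable L0 regularization (illustrative, not Compresso's code).
import numpy as np

rng = np.random.default_rng(0)
beta, gamma, zeta = 2.0 / 3.0, -0.1, 1.1        # standard hard-concrete stretch parameters

def sample_gates(log_alpha):
    """Sample relaxed binary gates z in [0, 1], one per prunable unit (e.g., per attention head)."""
    u = rng.uniform(1e-6, 1 - 1e-6, size=log_alpha.shape)
    s = 1.0 / (1.0 + np.exp(-(np.log(u) - np.log(1 - u) + log_alpha) / beta))
    return np.clip(s * (zeta - gamma) + gamma, 0.0, 1.0)

def expected_l0(log_alpha):
    """Expected number of non-zero gates (the differentiable sparsity penalty)."""
    return 1.0 / (1.0 + np.exp(-(log_alpha - beta * np.log(-gamma / zeta))))

log_alpha = np.zeros(12)                         # one learnable gate parameter per attention head
z = sample_gates(log_alpha)                      # multiply head outputs by z during training
penalty = expected_l0(log_alpha).sum()           # add lambda * penalty to the task loss
print("sampled gates:", np.round(z, 2))
print("expected number of active heads:", round(float(penalty), 2))
```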

The Reinforce Policy Gradient Algorithm Revisited

  • paper_url: http://arxiv.org/abs/2310.05000
  • repo_url: None
  • paper_authors: Shalabh Bhatnagar
  • for: 本文重新审视了 REINFORCE 策略梯度算法,并提出了一项重要改进,使其适用于具有无穷状态和动作空间的系统。
  • methods: 本文借助一类随机搜索方法,通过在扰动后的参数处测量目标函数来估计策略梯度,从而放宽了原本证明收敛所需的部分正则性条件。
  • results: 本文证明了该新算法收敛到局部极小值的邻域,即使对于具有无穷状态和动作空间的系统也成立。In English:
  • for: The paper revisits the Reinforce policy gradient algorithm and proposes a major enhancement for systems with infinite state and action spaces.
  • methods: The paper estimates the policy gradient via function measurements at randomly perturbed parameters, relaxing some of the regularity conditions that would otherwise be needed.
  • results: The paper proves that the new algorithm converges to a neighborhood of a local minimum, even for systems with infinite state and action spaces.
    Abstract We revisit the Reinforce policy gradient algorithm from the literature. Note that this algorithm typically works with cost returns obtained over random length episodes obtained from either termination upon reaching a goal state (as with episodic tasks) or from instants of visit to a prescribed recurrent state (in the case of continuing tasks). We propose a major enhancement to the basic algorithm. We estimate the policy gradient using a function measurement over a perturbed parameter by appealing to a class of random search approaches. This has advantages in the case of systems with infinite state and action spaces as it relax some of the regularity requirements that would otherwise be needed for proving convergence of the Reinforce algorithm. Nonetheless, we observe that even though we estimate the gradient of the performance objective using the performance objective itself (and not via the sample gradient), the algorithm converges to a neighborhood of a local minimum. We also provide a proof of convergence for this new algorithm.
    摘要 我们回顾文献中的 REINFORCE 策略梯度算法。该算法通常使用随机长度回合上得到的成本回报:对于回合式任务,回合在到达目标状态时终止;对于持续性任务,则以访问某个指定常返状态的时刻来划分。我们对基本算法提出了一项重要改进:借助一类随机搜索方法,通过在扰动后的参数处测量目标函数来估计策略梯度。这在状态空间和动作空间无穷的系统中具有优势,因为它放宽了证明 REINFORCE 算法收敛所需的部分正则性条件。尽管我们是利用性能目标本身(而非采样梯度)来估计其梯度,算法仍会收敛到局部极小值的邻域。我们同时给出了该新算法的收敛性证明。
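
The sketch below illustrates the perturbation-based gradient estimate in its simplest form: a smoothed-functional / SPSA-style two-measurement scheme applied to a stand-in objective. The quadratic objective, perturbation scale, and step size are assumptions for illustration; the paper's exact estimator and step-size conditions are not reproduced.

```python
# Minimal sketch of estimating a policy gradient from function measurements at perturbed parameters.
import numpy as np

rng = np.random.default_rng(0)

def episode_return(theta):
    """Placeholder for the return of one random-length episode under policy parameters theta."""
    return -np.sum((theta - 1.5) ** 2) + 0.1 * rng.normal()

theta = np.zeros(4)
c, lr = 0.1, 0.05
for _ in range(3000):
    d = rng.normal(size=theta.shape)                    # random perturbation direction
    j_plus  = episode_return(theta + c * d)
    j_minus = episode_return(theta - c * d)
    grad_est = (j_plus - j_minus) / (2.0 * c) * d       # gradient estimate from two measurements
    theta = theta + lr * grad_est                       # stochastic ascent on the return
print("theta after training:", np.round(theta, 2))      # should approach the optimum near 1.5
```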

Distantly-Supervised Joint Entity and Relation Extraction with Noise-Robust Learning

  • paper_url: http://arxiv.org/abs/2310.04994
  • repo_url: https://github.com/yul091/denrl
  • paper_authors: Yufei Li, Xiao Yu, Yanghong Guo, Yanchi Liu, Haifeng Chen, Cong Liu
  • for: 这篇论文主要研究如何在远监督(distantly-supervised)标注数据上训练实体与关系联合抽取模型,并应对其中的噪声标注问题。
  • methods: 该论文提出了一种新的噪声鲁棒方法:将预训练的 GPT-2 融入序列标注框架以同时检测实体和关系,并采用噪声鲁棒学习框架,其中包含一个新的损失函数(惩罚与重要关系模式及实体-关系依赖不一致的预测)以及一个自适应学习步骤(迭代地选择高质量样本进行训练)。
  • results: 在两个数据集上的实验结果显示,该方法在联合抽取性能和噪声消减效果上均达到或超过当前最先进的方法。
    Abstract Joint entity and relation extraction is a process that identifies entity pairs and their relations using a single model. We focus on the problem of training these models on distantly-labeled data, which is generated by aligning entity mentions in a text corpus with their corresponding entity and relation types in a knowledge base. One key challenge here is the presence of noisy labels, which arises from both entity and relation annotations, and significantly impair the effectiveness of supervised learning applications. However, existing research primarily addresses only one type of noise, thereby limiting the effectiveness of noise reduction. To fill this gap, we introduce a new noise-robust approach, that 1)~incorporates a pre-trained GPT-2 into a sequence tagging scheme for simultaneous entity and relation detection, and 2)~employs a noise-robust learning framework which includes a new loss function that penalizes inconsistency with both significant relation patterns and entity-relation dependencies, as well as a self-adaptive learning step that iteratively selects and trains on high-quality instances. Experiments on two datasets show that our method outperforms the existing state-of-the-art methods in both joint extraction performance and noise reduction effect.
    摘要 共同实体和关系抽取是一个过程,它通过单一模型标识实体对和其关系。我们关注在训练这些模型的远程标注数据上的问题,这些数据是通过文本库中的实体提及与知识库中的实体和关系类型的对应进行对齐的。一个关键挑战是噪声标注,它来自实体和关系注释,并对监督学习应用产生重要影响。然而,现有研究主要只处理一种噪声,因此限制了噪声减少的效iveness。为了填补这个空白,我们介绍了一种新的噪声Robust Approach,它包括以下两个部分:1. 使用预训练的 GPT-2 在序列标记方案中同时检测实体和关系,以提高实体和关系的同时检测能力。2. 使用一种噪声Robust的学习框架,包括一种新的损失函数,该损失函数考虑实体和关系之间的依赖关系和重要关系模式,以及一种自适应学习步骤,该步骤在高质量实例上进行逐步选择和训练。我们在两个数据集上进行了实验,结果表明,我们的方法在同时检测性能和噪声减少效果方面都超过了现有状态的方法。

The Troubling Emergence of Hallucination in Large Language Models – An Extensive Definition, Quantification, and Prescriptive Remediations

  • paper_url: http://arxiv.org/abs/2310.04988
  • repo_url: None
  • paper_authors: Vipula Rawte, Swagata Chakraborty, Agnibh Pathak, Anubhav Sarkar, S. M Towhidul Islam Tonmoy, Aman Chadha, Amit P. Sheth, Amitava Das
  • for: 本研究旨在提供一种细化的幻觉分类方法,以及对幻觉的减轻策略。
  • methods: 本研究使用了15种当代大语言模型生成75,000个样本,并对其进行了人工标注。此外,本研究还提出了一个幻觉敏感指数(HVI),用于评估和排序不同的大语言模型在生成幻觉方面的敏感度。
  • results: 本研究对幻觉进行了细化分类,并提出了两种减轻幻觉的方法。
    Abstract The recent advancements in Large Language Models (LLMs) have garnered widespread acclaim for their remarkable emerging capabilities. However, the issue of hallucination has parallelly emerged as a by-product, posing significant concerns. While some recent endeavors have been made to identify and mitigate different types of hallucination, there has been a limited emphasis on the nuanced categorization of hallucination and associated mitigation methods. To address this gap, we offer a fine-grained discourse on profiling hallucination based on its degree, orientation, and category, along with offering strategies for alleviation. As such, we define two overarching orientations of hallucination: (i) factual mirage (FM) and (ii) silver lining (SL). To provide a more comprehensive understanding, both orientations are further sub-categorized into intrinsic and extrinsic, with three degrees of severity - (i) mild, (ii) moderate, and (iii) alarming. We also meticulously categorize hallucination into six types: (i) acronym ambiguity, (ii) numeric nuisance, (iii) generated golem, (iv) virtual voice, (v) geographic erratum, and (vi) time wrap. Furthermore, we curate HallucInation eLiciTation (HILT), a publicly available dataset comprising of 75,000 samples generated using 15 contemporary LLMs along with human annotations for the aforementioned categories. Finally, to establish a method for quantifying and to offer a comparative spectrum that allows us to evaluate and rank LLMs based on their vulnerability to producing hallucinations, we propose Hallucination Vulnerability Index (HVI). We firmly believe that HVI holds significant value as a tool for the wider NLP community, with the potential to serve as a rubric in AI-related policy-making. In conclusion, we propose two solution strategies for mitigating hallucinations.
    摘要 大型语言模型(LLM)最近的进展因其卓越的新兴能力而广受赞誉,但幻觉问题也随之出现,引发了严重的担忧。尽管近期已有一些工作致力于识别和缓解不同类型的幻觉,但对幻觉的细粒度分类及相应缓解方法的关注仍然有限。为填补这一空白,我们从幻觉的程度、取向和类别出发进行细致刻画,并给出缓解策略。我们定义了幻觉的两大取向:(i)事实幻景(factual mirage,FM)和(ii)一线希望(silver lining,SL),并将二者进一步细分为内在与外在两类,以及(i)轻度、(ii)中度、(iii)严重三个等级。我们还将幻觉细致划分为六种类型:(i)缩写歧义、(ii)数字错误、(iii)生成的"傀儡"(generated golem)、(iv)虚拟之声、(v)地理错误和(vi)时间错乱。此外,我们构建了 HallucInation eLiciTation(HILT)公开数据集,包含由 15 个当代 LLM 生成的 75,000 个样本及上述类别的人工标注。最后,为了建立一个可量化的谱系,用于评估并按产生幻觉的脆弱程度对 LLM 进行排序,我们提出了幻觉脆弱性指数(HVI)。我们坚信 HVI 作为一种工具对更广泛的 NLP 社区具有重要价值,并有潜力在 AI 相关政策制定中作为评估标准。最后,我们提出了两种缓解幻觉的解决策略。

A new economic and financial theory of money

  • paper_url: http://arxiv.org/abs/2310.04986
  • repo_url: None
  • paper_authors: Michael E. Glinsky, Sharon Sievert
  • for: This paper aims to reformulate economic and financial theory to include electronic currencies, and to develop a new view of electronic currency as a transactional equity associated with tangible assets.
  • methods: The paper uses macroeconomic theory and the fundamental equation of monetary policy to value electronic currencies, and employs multi time scale models to capture true risk. The decision-making process is approached using deep reinforcement learning, generative pretrained transformers, and other methods of artificial intelligence.
  • results: The paper develops a new view of electronic currency management firms as entities responsible for coordinated monetary and fiscal policies of a substantial sub-economy, and proposes a system response function and DRL/GPT/AI-based active nonlinear control to stabilize unstable equilibriums in the sub-economy.
    Abstract This paper fundamentally reformulates economic and financial theory to include electronic currencies. The valuation of the electronic currencies will be based on macroeconomic theory and the fundamental equation of monetary policy, not the microeconomic theory of discounted cash flows. The view of electronic currency as a transactional equity associated with tangible assets of a sub-economy will be developed, in contrast to the view of stock as an equity associated mostly with intangible assets of a sub-economy. The view will be developed of the electronic currency management firm as an entity responsible for coordinated monetary (electronic currency supply and value stabilization) and fiscal (investment and operational) policies of a substantial (for liquidity of the electronic currency) sub-economy. The risk model used in the valuations and the decision-making will not be the ubiquitous, yet inappropriate, exponential risk model that leads to discount rates, but will be multi time scale models that capture the true risk. The decision-making will be approached from the perspective of true systems control based on a system response function given by the multi scale risk model and system controllers that utilize the Deep Reinforcement Learning, Generative Pretrained Transformers, and other methods of Artificial Intelligence (DRL/GPT/AI). Finally, the sub-economy will be viewed as a nonlinear complex physical system with both stable equilibriums that are associated with short-term exploitation, and unstable equilibriums that need to be stabilized with active nonlinear control based on the multi scale system response functions and DRL/GPT/AI.
    摘要 本文从根本上重构经济与金融理论,将电子货币纳入其中。电子货币的估值将基于宏观经济理论和货币政策的基本方程,而非基于折现现金流的微观经济理论。本文提出一种新的视角:电子货币是与某一子经济体的有形资产相关联的交易性权益,这与股票主要关联于子经济体无形资产的权益观形成对比。电子货币管理公司被视为负责协调该子经济体(其规模须足以保证电子货币流动性)的货币政策(电子货币供给与价值稳定)和财政政策(投资与运营)的实体。估值与决策所用的风险模型不再是普遍使用却并不合适、会导出折现率的指数风险模型,而是能刻画真实风险的多时间尺度模型。决策将从真正的系统控制视角出发,基于由多尺度风险模型给出的系统响应函数,并由利用深度强化学习、生成式预训练 Transformer 及其他人工智能方法(DRL/GPT/AI)的系统控制器来实现。最后,子经济体被视为一个非线性复杂物理系统,既存在与短期利用相关的稳定均衡,也存在需要基于多尺度系统响应函数和 DRL/GPT/AI 的主动非线性控制来加以稳定的不稳定均衡。

Comparative Analysis of Transfer Learning in Deep Learning Text-to-Speech Models on a Few-Shot, Low-Resource, Customized Dataset

  • paper_url: http://arxiv.org/abs/2310.04982
  • repo_url: None
  • paper_authors: Ze Liu
  • for: 基于深度学习的文本转语音(TTS)合成依赖大量数据才能保证语音质量,而现代 TTS 模型的数据需求日益庞大。因此,本研究聚焦于迁移学习,特别是少样本(few-shot)、低资源和自定义数据集场景。
  • methods: 本研究对当前最先进 TTS 模型的迁移学习能力进行了系统的技术分析,并在受限数据集上开展了实验对比。
  • results: 研究发现,迁移学习可以大幅提升 TTS 模型在小规模数据集上的表现,并且存在在训练效率与合成质量之间取得平衡的最优模型,可在数据稀缺时提供高质量的语音输出。
    Abstract Text-to-Speech (TTS) synthesis using deep learning relies on voice quality. Modern TTS models are advanced, but they need large amount of data. Given the growing computational complexity of these models and the scarcity of large, high-quality datasets, this research focuses on transfer learning, especially on few-shot, low-resource, and customized datasets. In this research, "low-resource" specifically refers to situations where there are limited amounts of training data, such as a small number of audio recordings and corresponding transcriptions for a particular language or dialect. This thesis, is rooted in the pressing need to find TTS models that require less training time, fewer data samples, yet yield high-quality voice output. The research evaluates TTS state-of-the-art model transfer learning capabilities through a thorough technical analysis. It then conducts a hands-on experimental analysis to compare models' performance in a constrained dataset. This study investigates the efficacy of modern TTS systems with transfer learning on specialized datasets and a model that balances training efficiency and synthesis quality. Initial hypotheses suggest that transfer learning could significantly improve TTS models' performance on compact datasets, and an optimal model may exist for such unique conditions. This thesis predicts a rise in transfer learning in TTS as data scarcity increases. In the future, custom TTS applications will favour models optimized for specific datasets over generic, data-intensive ones.
    摘要 基于深度学习的文本转语音(TTS)合成依赖于语音质量。现代 TTS 模型十分先进,但需要大量数据。鉴于这些模型的计算复杂度不断增长,而大规模高质量数据集又较为稀缺,本研究聚焦于迁移学习,特别是少样本、低资源和自定义数据集场景。此处的"低资源"特指训练数据有限的情形,例如某种语言或方言仅有少量音频录音及对应转写。本论文源于一个迫切需求:寻找训练时间更短、所需数据更少、却仍能输出高质量语音的 TTS 模型。研究通过深入的技术分析评估了当前最先进 TTS 模型的迁移学习能力,随后在受限数据集上进行了实验对比,考察现代 TTS 系统结合迁移学习在专门数据集上的效果,以及在训练效率与合成质量之间取得平衡的模型。初步假设认为,迁移学习能显著提升 TTS 模型在小规模数据集上的表现,并且在这类特殊条件下可能存在最优模型。本论文预计,随着数据稀缺性的加剧,迁移学习在 TTS 中的应用将会增多;未来的定制 TTS 应用将更青睐针对特定数据集优化的模型,而非通用的数据密集型模型。

MULTISCRIPT: Multimodal Script Learning for Supporting Open Domain Everyday Tasks

  • paper_url: http://arxiv.org/abs/2310.04965
  • repo_url: None
  • paper_authors: Jingyuan Qi, Minqian Liu, Ying Shen, Zhiyang Xu, Lifu Huang
  • for: 提高AI虚拟助手完成日常任务的自动生成脚本能力,特别是对于不熟悉的任务。
  • methods: 基于多模态视频和文本描述,提出了两个新任务:多模态脚本生成和后续步骤预测。两个任务的输入都是目标任务名和一段完成目标任务的视频示例,输出包括(1)基于视频示例的结构化文本描述,和(2)基于视频示例的后续步骤文本描述。
  • results: 提出了两种基于大语言模型知识的多模态生成框架,并在 MultiScript 挑战任务上实现了显著提升。
    Abstract Automatically generating scripts (i.e. sequences of key steps described in text) from video demonstrations and reasoning about the subsequent steps are crucial to the modern AI virtual assistants to guide humans to complete everyday tasks, especially unfamiliar ones. However, current methods for generative script learning rely heavily on well-structured preceding steps described in text and/or images or are limited to a certain domain, resulting in a disparity with real-world user scenarios. To address these limitations, we present a new benchmark challenge -- MultiScript, with two new tasks on task-oriented multimodal script learning: (1) multimodal script generation, and (2) subsequent step prediction. For both tasks, the input consists of a target task name and a video illustrating what has been done to complete the target task, and the expected output is (1) a sequence of structured step descriptions in text based on the demonstration video, and (2) a single text description for the subsequent step, respectively. Built from WikiHow, MultiScript covers multimodal scripts in videos and text descriptions for over 6,655 human everyday tasks across 19 diverse domains. To establish baseline performance on MultiScript, we propose two knowledge-guided multimodal generative frameworks that incorporate the task-related knowledge prompted from large language models such as Vicuna. Experimental results show that our proposed approaches significantly improve over the competitive baselines.
    摘要 从视频示例中自动生成脚本(即以文本描述的关键步骤序列)并对后续步骤进行推理,是现代 AI 虚拟助手引导人类完成日常任务(尤其是不熟悉的任务)的关键能力。然而,现有的生成式脚本学习方法高度依赖结构化的前置步骤(文本和/或图像),或局限于特定领域,这与真实用户场景存在差距。为了解决这些限制,我们提出了一个新的基准挑战——MultiScript,包含两个面向任务的多模态脚本学习新任务:(1)多模态脚本生成和(2)后续步骤预测。两个任务的输入均为目标任务名和一段演示如何完成该任务的视频,期望的输出分别是(1)基于演示视频的结构化文本步骤描述,和(2)对后续步骤的单条文本描述。MultiScript 基于 WikiHow 构建,覆盖 19 个不同领域、超过 6,655 项人类日常任务的视频与文本多模态脚本。为确定 MultiScript 的基准性能,我们提出了两种融合任务相关知识(由 Vicuna 等大型语言模型提示得到)的知识引导多模态生成框架。实验结果表明,我们提出的方法显著优于有竞争力的基线。

LLM4VV: Developing LLM-Driven Testsuite for Compiler Validation

  • paper_url: http://arxiv.org/abs/2310.04963
  • repo_url: None
  • paper_authors: Christian Munley, Aaron Jarmusch, Sunita Chandrasekaran
  • for: This paper explores the capability of state-of-the-art large language models (LLMs) to automatically generate tests and validate compiler implementations of a directive-based programming paradigm, OpenACC.
  • methods: The paper employs various prompt engineering techniques, including code templates, retrieval-augmented generation (RAG) with code templates, expressive prompts using RAG with code templates, one-shot examples, and RAG with one-shot examples.
  • results: The paper investigates the outcome of LLMs-generated tests and analyzes the capabilities of the latest LLMs for code generation.
    Abstract Large language models (LLMs) are a new and powerful tool for a wide span of applications involving natural language and demonstrate impressive code generation abilities. In this paper, we explore the capabilitity of state-of-the-art LLMs, including closed-source options like OpenAI GPT-4 and open-source alternatives like Meta AI Codellama, to automatically generate tests and use these tests to validate and verify compiler implementations of a directive-based programming paradigm, OpenACC. Our approach entails exploring various prompt engineering techniques including a code template, retrieval-augmented generation (RAG) with code template, expressive prompt using RAG with code template, one-shot example, and RAG with one-shot example. This paper focusses on (a) exploring the capabilities of the latest LLMs for code generation, (b) investigating prompt and fine tuning methods, and (c) analyzing the outcome of LLMs generated tests
    摘要 大型语言模型(LLM)是一种强大的新工具,适用于广泛的自然语言相关应用,并展现出令人印象深刻的代码生成能力。在这篇论文中,我们探讨了当前最先进的 LLM——包括闭源的 OpenAI GPT-4 和开源的 Meta AI Codellama——自动生成测试用例,并利用这些测试来验证与校验基于指令(directive)的编程范式 OpenACC 的编译器实现的能力。我们的方法涵盖多种提示工程技术,包括代码模板、结合代码模板的检索增强生成(RAG)、结合 RAG 与代码模板的表达性提示、单样本(one-shot)示例以及结合单样本示例的 RAG。本文主要关注以下三点:(a)探讨最新 LLM 的代码生成能力;(b)研究提示与微调方法;(c)分析 LLM 生成测试的结果。
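
The sketch below illustrates how a "code template + retrieved example" prompt for an OpenACC validation test might be assembled. The template, the stored example snippets, the lexical-overlap retrieval, and the prompt wording are all illustrative assumptions rather than the paper's exact artifacts, and `call_llm` is a stub for GPT-4, Codellama, or another model.

```python
# Minimal sketch of assembling a RAG-style prompt for compiler-validation test generation.
TEMPLATE = """// Test skeleton
#include <stdio.h>
int main() {
    int err = 0;
    // TODO: exercise the target OpenACC feature and set err on failure
    return err;
}"""

KNOWN_TESTS = {
    "acc parallel loop reduction": "/* existing reduction testcase body ... */",
    "acc data copyin copyout":     "/* existing data clause testcase body ... */",
    "acc atomic update":           "/* existing atomic testcase body ... */",
}

def retrieve(feature: str, k: int = 1):
    """Toy retrieval: rank stored tests by word overlap with the requested feature."""
    words = set(feature.lower().split())
    ranked = sorted(KNOWN_TESTS, key=lambda key: -len(words & set(key.split())))
    return [(key, KNOWN_TESTS[key]) for key in ranked[:k]]

def build_prompt(feature: str) -> str:
    examples = "\n\n".join(f"// Example for '{k}':\n{v}" for k, v in retrieve(feature))
    return (f"Write a C test that validates a compiler's implementation of the OpenACC feature "
            f"'{feature}'. Return 0 on success and a nonzero error code on failure.\n\n"
            f"Fill in this template:\n{TEMPLATE}\n\nRelated existing tests:\n{examples}\n")

def call_llm(prompt: str) -> str:          # stub for an actual LLM call
    return "/* generated test would appear here */"

print(call_llm(build_prompt("acc parallel loop reduction")))
```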

Safe Deep Policy Adaptation

  • paper_url: http://arxiv.org/abs/2310.08602
  • repo_url: None
  • paper_authors: Wenli Xiao, Tairan He, John Dolan, Guanya Shi
  • for: 本研究旨在开发一种能够快速适应动态、不确定环境的自主机器人控制框架,同时保证安全性和鲁棒性。
  • methods: 本研究采用基于强化学习(RL)的策略自适应,并在 RL 策略之上提出了基于控制障碍函数(CBF)的安全过滤器(safety filter),以保证真实世界部署中的安全性。
  • results: 实验结果显示,SafeDPA 在三类环境(倒立摆、Safety Gym 和 RC Car)中均兼具出色的安全性和任务性能;与现有基线相比,SafeDPA 在真实世界未见扰动下的安全率提升达 300%。
    Abstract A critical goal of autonomy and artificial intelligence is enabling autonomous robots to rapidly adapt in dynamic and uncertain environments. Classic adaptive control and safe control provide stability and safety guarantees but are limited to specific system classes. In contrast, policy adaptation based on reinforcement learning (RL) offers versatility and generalizability but presents safety and robustness challenges. We propose SafeDPA, a novel RL and control framework that simultaneously tackles the problems of policy adaptation and safe reinforcement learning. SafeDPA jointly learns adaptive policy and dynamics models in simulation, predicts environment configurations, and fine-tunes dynamics models with few-shot real-world data. A safety filter based on the Control Barrier Function (CBF) on top of the RL policy is introduced to ensure safety during real-world deployment. We provide theoretical safety guarantees of SafeDPA and show the robustness of SafeDPA against learning errors and extra perturbations. Comprehensive experiments on (1) classic control problems (Inverted Pendulum), (2) simulation benchmarks (Safety Gym), and (3) a real-world agile robotics platform (RC Car) demonstrate great superiority of SafeDPA in both safety and task performance, over state-of-the-art baselines. Particularly, SafeDPA demonstrates notable generalizability, achieving a 300% increase in safety rate compared to the baselines, under unseen disturbances in real-world experiments.
    摘要 “一个重要目标是实现自主机器人快速适应动态和不确定环境。类型的适应控制和安全控制可以提供稳定性和安全保证,但是仅对特定系统类型有效。相比之下,基于征得学习(RL)的政策适应则提供了多样性和普遍性,但是产生了安全和可靠性挑战。我们提出了SafeDPA,一个新的RL和控制框架,同时解决政策适应和安全征得学习的问题。SafeDPA在实验中同时学习适应政策和动力学模型,预测环境配置,并将几何数据进行精确化。我们引入了基于控制障碍函数(CBF)的安全筛选器,以保证在真实世界中的安全运行。我们提供了理论上的安全保证,并证明SafeDPA对学习错误和额外干扰具有Robustness。实验结果显示,SafeDPA在三个不同的应用中具有优秀的安全性和任务性能,比基准设定更高。特别是,SafeDPA在未见到的干扰下 demonstrate了特别的多样性,在真实世界中获得300%的安全率提升。”

CodeTransOcean: A Comprehensive Multilingual Benchmark for Code Translation

  • paper_url: http://arxiv.org/abs/2310.04951
  • repo_url: https://github.com/weixiangyan/codetransocean
  • paper_authors: Weixiang Yan, Yuchen Tian, Yunzhe Li, Qian Chen, Wen Wang
  • for: 这个研究旨在提高代码翻译的质量和维护效率,并满足实际应用中的多元化需求。
  • methods: 这项研究构建了 CodeTransOcean 基准,其中包含三个新的多语言数据集(MultilingualTrans、NicheTrans、LLMTrans)和一个跨深度学习框架的数据集 DLTrans,并为代码翻译开发了多语言建模方法。
  • results: 研究发现,多语言建模方法能够提升低资源和高资源语言对的翻译质量,并提高训练效率。此外,研究还提出了新的程序级评估指标 Debugging Success Rate@K,用于评估翻译后代码的可执行性,并在该基准上评估了 ChatGPT 的模糊执行预测潜力。
    Abstract Recent code translation techniques exploit neural machine translation models to translate source code from one programming language to another to satisfy production compatibility or to improve efficiency of codebase maintenance. Most existing code translation datasets only focus on a single pair of popular programming languages. To advance research on code translation and meet diverse requirements of real-world applications, we construct CodeTransOcean, a large-scale comprehensive benchmark that supports the largest variety of programming languages for code translation. CodeTransOcean consists of three novel multilingual datasets, namely, MultilingualTrans supporting translations between multiple popular programming languages, NicheTrans for translating between niche programming languages and popular ones, and LLMTrans for evaluating executability of translated code by large language models (LLMs). CodeTransOcean also includes a novel cross-framework dataset, DLTrans, for translating deep learning code across different frameworks. We develop multilingual modeling approaches for code translation and demonstrate their great potential in improving the translation quality of both low-resource and high-resource language pairs and boosting the training efficiency. We also propose a novel evaluation metric Debugging Success Rate@K for program-level code translation. Last but not least, we evaluate LLM ChatGPT on our datasets and investigate its potential for fuzzy execution predictions. We build baselines for CodeTransOcean and analyze challenges of code translation for guiding future research. The CodeTransOcean datasets and code are publicly available at https://github.com/WeixiangYAN/CodeTransOcean.
    摘要 现代代码翻译技术利用神经机器翻译模型将源代码从一种编程语言翻译到另一种编程语言,以满足生产环境的兼容性需求或提升代码库维护效率。现有的代码翻译数据集大多只关注单一的一对流行编程语言。为了推进代码翻译研究并满足实际应用的多样化需求,我们构建了 CodeTransOcean,一个支持最多编程语言种类的大规模综合代码翻译基准。CodeTransOcean 包含三个新的多语言数据集:MultilingualTrans 支持多种流行编程语言之间的互译,NicheTrans 用于小众编程语言与流行语言之间的翻译,LLMTrans 用于借助大语言模型(LLM)评估翻译后代码的可执行性。CodeTransOcean 还包含一个新的跨框架数据集 DLTrans,用于在不同深度学习框架之间翻译深度学习代码。我们为代码翻译开发了多语言建模方法,并证明其在提升低资源与高资源语言对的翻译质量以及提高训练效率方面具有巨大潜力。我们还提出了一个新的程序级代码翻译评价指标 Debugging Success Rate@K。最后,我们在数据集上评估了 ChatGPT,并探究其在模糊执行预测方面的潜力。我们为 CodeTransOcean 建立了基线,并分析了代码翻译面临的挑战,以指导未来研究。CodeTransOcean 数据集和代码公开于 https://github.com/WeixiangYAN/CodeTransOcean。
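
The sketch below shows one way to compute a Debugging Success Rate@K style metric, as we read it from the abstract: a translated program counts as a success if it executes correctly within K debugging rounds. The exact protocol may differ from the paper's; `run_program` and `llm_debug` are stubs for the real execution sandbox and debugging model.

```python
# Minimal sketch of a Debugging Success Rate@K computation (protocol details are assumptions).
from typing import List, Tuple

def run_program(code: str) -> Tuple[bool, str]:
    """Stub: execute the translated program, return (succeeded, error_message)."""
    return ("BUG" not in code, "runtime error" if "BUG" in code else "")

def llm_debug(code: str, error: str) -> str:
    """Stub: ask an LLM to repair the code given the error message."""
    return code.replace("BUG ", "", 1)       # placeholder "fix" removing one injected bug marker

def debugging_success_rate_at_k(translations: List[str], k: int) -> float:
    successes = 0
    for code in translations:
        for _ in range(k + 1):               # initial run plus up to k debugging rounds
            ok, err = run_program(code)
            if ok:
                successes += 1
                break
            code = llm_debug(code, err)
    return successes / len(translations)

samples = ["print('ok')", "BUG print('needs one fix')", "BUG BUG print('needs two fixes')"]
print("DSR@1 =", debugging_success_rate_at_k(samples, k=1))   # 2 of 3 succeed within one repair
```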