cs.AI - 2023-07-18

CertPri: Certifiable Prioritization for Deep Neural Networks via Movement Cost in Feature Space

  • paper_url: http://arxiv.org/abs/2307.09375
  • repo_url: None
  • paper_authors: Haibin Zheng, Jinyin Chen, Haibo Jin
  • for: Improving the quality of DNN-based software systems, in particular by prioritizing test inputs so that misbehavior in DNNs can be identified and fixed earlier.
  • methods: Proposes CertPri, a test input prioritization technique built on the movement cost of test inputs in the DNN's feature space; it provides a formal robustness guarantee for the movement cost and can be applied across different tasks, data, models, and scenarios (a margin-based prioritization sketch follows this entry).
  • results: Extensive evaluation across two tasks (classification and regression), six data forms, four model structures, and two scenarios (white-box and black-box) shows that CertPri clearly outperforms the baselines, improving prioritization effectiveness by 53.97% on average; its robustness and generalizability are on average 1.41-2.00x and 1.33-3.39x those of the baselines, respectively.
    Abstract Deep neural networks (DNNs) have demonstrated their outperformance in various software systems, but also exhibit misbehavior and even result in irreversible disasters. Therefore, it is crucial to identify the misbehavior of DNN-based software and improve DNNs' quality. Test input prioritization is one of the most appealing ways to guarantee DNNs' quality, which prioritizes test inputs so that more bug-revealing inputs can be identified earlier with limited time and manual labeling efforts. However, the existing prioritization methods are still limited from three aspects: certifiability, effectiveness, and generalizability. To overcome the challenges, we propose CertPri, a test input prioritization technique designed based on a movement cost perspective of test inputs in DNNs' feature space. CertPri differs from previous works in three key aspects: (1) certifiable: it provides a formal robustness guarantee for the movement cost; (2) effective: it leverages formally guaranteed movement costs to identify malicious bug-revealing inputs; and (3) generic: it can be applied to various tasks, data, models, and scenarios. Extensive evaluations across 2 tasks (i.e., classification and regression), 6 data forms, 4 model structures, and 2 scenarios (i.e., white-box and black-box) demonstrate CertPri's superior performance. For instance, it significantly improves 53.97% prioritization effectiveness on average compared with baselines. Its robustness and generalizability are 1.41~2.00 times and 1.33~3.39 times that of baselines on average, respectively.
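CertPri's certified movement cost is beyond the scope of this digest, but the general shape of feature-space prioritization can be illustrated with a simple margin-based proxy: rank test inputs by how little their representation would need to move to flip the model's prediction, approximated here by the gap between the two largest class probabilities. This is only an illustrative baseline, not the paper's algorithm; the model and data are stand-ins.

```python
# Illustrative sketch only: margin-based test-input prioritization (not CertPri itself).
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(500, 16)), rng.integers(0, 3, 500)
X_test = rng.normal(size=(200, 16))

model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_train, y_train)

probs = model.predict_proba(X_test)        # (n_test, n_classes)
top2 = np.sort(probs, axis=1)[:, -2:]      # two largest class probabilities per input
margin = top2[:, 1] - top2[:, 0]           # small margin = cheap to "move" across the decision boundary
priority = np.argsort(margin)              # likely bug-revealing inputs first

print("first 10 inputs to label:", priority[:10])
```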

Local Minima Drive Communications in Cooperative Interaction

  • paper_url: http://arxiv.org/abs/2307.09364
  • repo_url: None
  • paper_authors: Roger K. Moore
  • for: This work investigates when an agent should decide to communicate in human-robot interaction, particularly during cooperative tasks.
  • methods: Builds on Perceptual Control Theory (PCT): as long as the agents share the same goal and their combined actions are sufficient to complete the task, cooperation needs no communication at all; however, when the task contains local minima, the global solution can only be reached through appropriately timed communication (a toy cooperation sketch follows this entry).
  • results: In a computer-based simulation environment, two independent one-dimensional agents cooperated to solve a two-dimensional path-finding task.
    Abstract An important open question in human-robot interaction (HRI) is precisely when an agent should decide to communicate, particularly in a cooperative task. Perceptual Control Theory (PCT) tells us that agents are able to cooperate on a joint task simply by sharing the same 'intention', thereby distributing the effort required to complete the task among the agents. This is even true for agents that do not possess the same abilities, so long as the goal is observable, the combined actions are sufficient to complete the task, and there is no local minimum in the search space. If these conditions hold, then a cooperative task can be accomplished without any communication between the contributing agents. However, for tasks that do contain local minima, the global solution can only be reached if at least one of the agents adapts its intention at the appropriate moments, and this can only be achieved by appropriately timed communication. In other words, it is hypothesised that in cooperative tasks, the function of communication is to coordinate actions in a complex search space that contains local minima. These principles have been verified in a computer-based simulation environment in which two independent one-dimensional agents are obliged to cooperate in order to solve a two-dimensional path-finding task.
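The communication-free case can be sketched in a few lines: two agents each control one dimension of a shared position and independently reduce their own error toward a common, observable goal; with a convex (local-minimum-free) error surface they reach the goal without exchanging any messages. This is a toy illustration, not the paper's simulation environment.

```python
# Toy sketch of PCT-style cooperation: each 1-D agent reduces its own error toward a shared goal.
import numpy as np

goal = np.array([8.0, -3.0])      # shared intention (observable goal)
pos = np.array([0.0, 0.0])        # joint state; agent 0 controls x, agent 1 controls y
gain = 0.2

for step in range(100):
    for agent in (0, 1):                          # no communication between the agents
        error = goal[agent] - pos[agent]          # each agent perceives only its own error
        pos[agent] += gain * error                # and acts to reduce it
    if np.linalg.norm(goal - pos) < 1e-3:
        print(f"goal reached at step {step} without communication")
        break
```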

MOCA: Self-supervised Representation Learning by Predicting Masked Online Codebook Assignments

  • paper_url: http://arxiv.org/abs/2307.09361
  • repo_url: None
  • paper_authors: Spyros Gidaris, Andrei Bursuc, Oriane Simeoni, Antonin Vobecky, Nikos Komodakis, Matthieu Cord, Patrick Pérez
  • for: Reducing Vision Transformers' need for very large fully-annotated datasets by using self-supervised learning.
  • methods: Unifies masked image modeling and contrastive (invariance-based) learning in a single-stage, standalone method, using mask-and-predict objectives defined on high-level features rather than pixel-level details.
  • results: Achieves new state-of-the-art results in low-shot settings and strong results across various evaluation protocols, with training at least 3 times faster than prior methods.
    Abstract Self-supervised learning can be used for mitigating the greedy needs of Vision Transformer networks for very large fully-annotated datasets. Different classes of self-supervised learning offer representations with either good contextual reasoning properties, e.g., using masked image modeling strategies, or invariance to image perturbations, e.g., with contrastive methods. In this work, we propose a single-stage and standalone method, MOCA, which unifies both desired properties using novel mask-and-predict objectives defined with high-level features (instead of pixel-level details). Moreover, we show how to effectively employ both learning paradigms in a synergistic and computation-efficient way. Doing so, we achieve new state-of-the-art results on low-shot settings and strong experimental results in various evaluation protocols with a training that is at least 3 times faster than prior methods.

Learning to Select SAT Encodings for Pseudo-Boolean and Linear Integer Constraints

  • paper_url: http://arxiv.org/abs/2307.09342
  • repo_url: https://github.com/felixvuo/lease-data
  • paper_authors: Felix Ulrich-Oltean, Peter Nightingale, James Alfred Walker
  • for: Solving constraint satisfaction and optimisation problems effectively by encoding them as SAT.
  • methods: Uses a supervised machine learning approach to select suitable SAT encodings for pseudo-Boolean and linear constraints, including a new feature set designed specifically for these constraints (a minimal selection sketch follows this entry).
  • results: Encoding selection compares favourably to AutoFolio when using the same feature set, and good results are obtained even on unseen problem classes.
    Abstract Many constraint satisfaction and optimisation problems can be solved effectively by encoding them as instances of the Boolean Satisfiability problem (SAT). However, even the simplest types of constraints have many encodings in the literature with widely varying performance, and the problem of selecting suitable encodings for a given problem instance is not trivial. We explore the problem of selecting encodings for pseudo-Boolean and linear constraints using a supervised machine learning approach. We show that it is possible to select encodings effectively using a standard set of features for constraint problems; however we obtain better performance with a new set of features specifically designed for the pseudo-Boolean and linear constraints. In fact, we achieve good results when selecting encodings for unseen problem classes. Our results compare favourably to AutoFolio when using the same feature set. We discuss the relative importance of instance features to the task of selecting the best encodings, and compare several variations of the machine learning method.
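The selection step itself is ordinary supervised learning: given a feature vector describing a constraint-problem instance, predict which encoding will perform best. A minimal sketch with scikit-learn; the feature vectors, encoding names, and labels below are invented for illustration.

```python
# Minimal sketch: learn to map instance features -> best SAT encoding (labels are illustrative).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(42)
# Each row: hand-crafted features of a pseudo-Boolean/linear constraint instance
# (e.g. number of terms, coefficient spread, RHS magnitude); synthetic here.
X = rng.normal(size=(300, 8))
encodings = np.array(["tree", "mdd", "gpw", "swc"])           # candidate encodings (illustrative names)
y = encodings[rng.integers(0, len(encodings), size=300)]      # "best encoding" label per instance

clf = RandomForestClassifier(n_estimators=200, random_state=0)
print("CV accuracy:", cross_val_score(clf, X, y, cv=5).mean())

clf.fit(X, y)
new_instance = rng.normal(size=(1, 8))
print("chosen encoding:", clf.predict(new_instance)[0])
```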

Company2Vec – German Company Embeddings based on Corporate Websites

  • paper_url: http://arxiv.org/abs/2307.09332
  • repo_url: None
  • paper_authors: Christopher Gerling
  • for: Proposes a novel application of representation learning: learning fine-grained company embeddings from unstructured corporate website data.
  • methods: Analyzes business activities on company websites using Word2Vec and dimensionality reduction, preserving semantic language structures to create fine-granular industry embeddings.
  • results: Yields efficient company embeddings usable for various banking applications (e.g. top-n words per company, industry prediction, cosine-based company similarity, and k-means industry segmentation), and proposes three peer-firm identification algorithms: firm-centric, industry-centric, and portfolio-centric (a similarity and clustering sketch follows this entry).
    Abstract With Company2Vec, the paper proposes a novel application in representation learning. The model analyzes business activities from unstructured company website data using Word2Vec and dimensionality reduction. Company2Vec maintains semantic language structures and thus creates efficient company embeddings in fine-granular industries. These semantic embeddings can be used for various applications in banking. Direct relations between companies and words allow semantic business analytics (e.g. top-n words for a company). Furthermore, industry prediction is presented as a supervised learning application and evaluation method. The vectorized structure of the embeddings allows measuring companies similarities with the cosine distance. Company2Vec hence offers a more fine-grained comparison of companies than the standard industry labels (NACE). This property is relevant for unsupervised learning tasks, such as clustering. An alternative industry segmentation is shown with k-means clustering on the company embeddings. Finally, this paper proposes three algorithms for (1) firm-centric, (2) industry-centric and (3) portfolio-centric peer-firm identification.
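Two of the downstream operations described above, cosine-based peer comparison and k-means industry segmentation, are straightforward once company embeddings exist. A sketch assuming a matrix of precomputed embeddings (one row per company); the embeddings and identifiers here are placeholders.

```python
# Sketch: peer-firm similarity and industry segmentation on precomputed company embeddings.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import cosine_similarity

rng = np.random.default_rng(0)
company_vecs = rng.normal(size=(1000, 300))       # stand-in for Company2Vec embeddings
company_ids = [f"company_{i}" for i in range(1000)]

# (1) firm-centric peers: most similar companies by cosine similarity
query = 0
sims = cosine_similarity(company_vecs[query:query + 1], company_vecs)[0]
peers = np.argsort(-sims)[1:6]                    # skip the query company itself
print("peers of", company_ids[query], "->", [company_ids[i] for i in peers])

# (2) alternative industry segmentation via k-means on the embeddings
kmeans = KMeans(n_clusters=20, n_init=10, random_state=0).fit(company_vecs)
print("cluster of query company:", kmeans.labels_[query])
```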

Exploiting Field Dependencies for Learning on Categorical Data

  • paper_url: http://arxiv.org/abs/2307.09321
  • repo_url: https://github.com/csiro-robotics/mdl
  • paper_authors: Zhibin Li, Piotr Koniusz, Lu Zhang, Daniel Edward Pagendam, Peyman Moghadam
  • for: Learning on categorical data by exploiting dependencies between fields to improve predictive performance.
  • methods: Learns a global field dependency matrix that captures dependencies between fields, then refines it at the instance level with local dependency modelling; the dependency matrices are refined in the inner loop of a meta-learning procedure without labels, while the outer loop updates the embedding matrix and global dependency matrix in a supervised fashion.
  • results: Outperforms state-of-the-art methods on six popular dataset benchmarks.
    Abstract Traditional approaches for learning on categorical data underexploit the dependencies between columns (\aka fields) in a dataset because they rely on the embedding of data points driven alone by the classification/regression loss. In contrast, we propose a novel method for learning on categorical data with the goal of exploiting dependencies between fields. Instead of modelling statistics of features globally (i.e., by the covariance matrix of features), we learn a global field dependency matrix that captures dependencies between fields and then we refine the global field dependency matrix at the instance-wise level with different weights (so-called local dependency modelling) w.r.t. each field to improve the modelling of the field dependencies. Our algorithm exploits the meta-learning paradigm, i.e., the dependency matrices are refined in the inner loop of the meta-learning algorithm without the use of labels, whereas the outer loop intertwines the updates of the embedding matrix (the matrix performing projection) and global dependency matrix in a supervised fashion (with the use of labels). Our method is simple yet it outperforms several state-of-the-art methods on six popular dataset benchmarks. Detailed ablation studies provide additional insights into our method.

Biomaker CA: a Biome Maker project using Cellular Automata

  • paper_url: http://arxiv.org/abs/2307.09320
  • repo_url: None
  • paper_authors: Ettore Randazzo, Alexander Mordvintsev
  • for: Introduces Biomaker CA, a Biome Maker project that uses Cellular Automata (CA) to simulate complex ecosystems in which small seeds grow into plant-like organisms that survive and reproduce in a nutrient-starved environment.
  • methods: Simulates biomes with CA rules on 2D grids, parallelizing all computation on GPUs via the Python JAX framework; supports different environments and laws of 'physics', as well as different model architectures and mutation strategies (a minimal CA-step sketch follows this entry).
  • results: Plant agents are shown to grow, survive, reproduce, and evolve into stable and unstable biomes; models can be meta-evolved to survive harsh environments, either end-to-end or with the more surgical "Petri dish" meta-evolution, and interactive evolution lets users evolve a plant model by hand before deploying it in a larger environment.
    Abstract We introduce Biomaker CA: a Biome Maker project using Cellular Automata (CA). In Biomaker CA, morphogenesis is a first class citizen and small seeds need to grow into plant-like organisms to survive in a nutrient starved environment and eventually reproduce with variation so that a biome survives for long timelines. We simulate complex biomes by means of CA rules in 2D grids and parallelize all of its computation on GPUs through the Python JAX framework. We show how this project allows for several different kinds of environments and laws of 'physics', alongside different model architectures and mutation strategies. We further analyze some configurations to show how plant agents can grow, survive, reproduce, and evolve, forming stable and unstable biomes. We then demonstrate how one can meta-evolve models to survive in a harsh environment either through end-to-end meta-evolution or by a more surgical and efficient approach, called Petri dish meta-evolution. Finally, we show how to perform interactive evolution, where the user decides how to evolve a plant model interactively and then deploys it in a larger environment. We open source Biomaker CA at: https://tinyurl.com/2x8yu34s .
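Biomaker CA's rules (growth, reproduction, mutation) are far richer than this, but the basic pattern of a vectorized 2-D cellular-automaton step, where each cell is updated from its neighbourhood, looks like the sketch below; the paper implements such updates in JAX and runs them in parallel on GPUs, while this sketch uses plain NumPy and a simple nutrient-diffusion rule as an assumption.

```python
# Minimal sketch of a vectorized 2-D cellular-automaton step (simple nutrient diffusion).
import numpy as np

def ca_step(grid: np.ndarray, diffusion: float = 0.1) -> np.ndarray:
    # Average of the four von Neumann neighbours, computed with array rolls (no Python loops).
    neighbours = (np.roll(grid, 1, axis=0) + np.roll(grid, -1, axis=0) +
                  np.roll(grid, 1, axis=1) + np.roll(grid, -1, axis=1)) / 4.0
    return (1 - diffusion) * grid + diffusion * neighbours

rng = np.random.default_rng(0)
nutrients = rng.random((64, 64))
for _ in range(100):
    nutrients = ca_step(nutrients)
print("mean nutrient level:", nutrients.mean())
```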

Rumor Detection with Diverse Counterfactual Evidence

  • paper_url: http://arxiv.org/abs/2307.09296
  • repo_url: https://github.com/vicinity111/dce-rd
  • paper_authors: Kaiwei Zhang, Junchi Yu, Haichao Shi, Jian Liang, Xiao-Yu Zhang
  • for: Detecting rumors effectively in response to the growing threat that fake news on social media poses to individuals and communities.
  • methods: Uses graph neural networks (GNNs) over rumor propagation graphs and generates diverse counterfactual subgraphs of the event graph, encouraged by a DPP-inspired diversity loss, to serve as multi-view interpretations that are then aggregated.
  • results: Achieves better performance than existing methods while providing diverse counterfactual evidence, improving both the interpretability and the robustness of rumor detection.
    Abstract The growth in social media has exacerbated the threat of fake news to individuals and communities. This draws increasing attention to developing efficient and timely rumor detection methods. The prevailing approaches resort to graph neural networks (GNNs) to exploit the post-propagation patterns of the rumor-spreading process. However, these methods lack inherent interpretation of rumor detection due to the black-box nature of GNNs. Moreover, these methods suffer from less robust results as they employ all the propagation patterns for rumor detection. In this paper, we address the above issues with the proposed Diverse Counterfactual Evidence framework for Rumor Detection (DCE-RD). Our intuition is to exploit the diverse counterfactual evidence of an event graph to serve as multi-view interpretations, which are further aggregated for robust rumor detection results. Specifically, our method first designs a subgraph generation strategy to efficiently generate different subgraphs of the event graph. We constrain the removal of these subgraphs to cause the change in rumor detection results. Thus, these subgraphs naturally serve as counterfactual evidence for rumor detection. To achieve multi-view interpretation, we design a diversity loss inspired by Determinantal Point Processes (DPP) to encourage diversity among the counterfactual evidence. A GNN-based rumor detection model further aggregates the diverse counterfactual evidence discovered by the proposed DCE-RD to achieve interpretable and robust rumor detection results. Extensive experiments on two real-world datasets show the superior performance of our method. Our code is available at https://github.com/Vicinity111/DCE-RD.

The Language Labyrinth: Constructive Critique on the Terminology Used in the AI Discourse

  • paper_url: http://arxiv.org/abs/2307.10292
  • repo_url: None
  • paper_authors: Rainer Rehak
  • for: Examines the terminology problem in AI: prevailing metaphors such as 'training', 'learning', or 'deciding' distort how AI is understood, applied, and held responsible.
  • methods: Draws on critical computer science and philosophy of language to analyze central notions of the AI debate and to propose more fitting terminology.
  • results: Argues that the current terminology skews reflections on responsibility and potential use cases of AI, and that more fitting terms would enable more fruitful debates and more careful deployment for sensitive tasks.
    Abstract In the interdisciplinary field of artificial intelligence (AI) the problem of clear terminology is especially momentous. This paper claims, that AI debates are still characterised by a lack of critical distance to metaphors like 'training', 'learning' or 'deciding'. As consequence, reflections regarding responsibility or potential use-cases are greatly distorted. Yet, if relevant decision-makers are convinced that AI can develop an 'understanding' or properly 'interpret' issues, its regular use for sensitive tasks like deciding about social benefits or judging court cases looms. The chapter argues its claim by analysing central notions of the AI debate and tries to contribute by proposing more fitting terminology and hereby enabling more fruitful debates. It is a conceptual work at the intersection of critical computer science and philosophy of language.

Llama 2: Open Foundation and Fine-Tuned Chat Models

  • paper_url: http://arxiv.org/abs/2307.09288
  • repo_url: https://github.com/facebookresearch/llama
  • paper_authors: Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale, Dan Bikel, Lukas Blecher, Cristian Canton Ferrer, Moya Chen, Guillem Cucurull, David Esiobu, Jude Fernandes, Jeremy Fu, Wenyin Fu, Brian Fuller, Cynthia Gao, Vedanuj Goswami, Naman Goyal, Anthony Hartshorn, Saghar Hosseini, Rui Hou, Hakan Inan, Marcin Kardas, Viktor Kerkez, Madian Khabsa, Isabel Kloumann, Artem Korenev, Punit Singh Koura, Marie-Anne Lachaux, Thibaut Lavril, Jenya Lee, Diana Liskovich, Yinghai Lu, Yuning Mao, Xavier Martinet, Todor Mihaylov, Pushkar Mishra, Igor Molybog, Yixin Nie, Andrew Poulton, Jeremy Reizenstein, Rashi Rungta, Kalyan Saladi, Alan Schelten, Ruan Silva, Eric Michael Smith, Ranjan Subramanian, Xiaoqing Ellen Tan, Binh Tang, Ross Taylor, Adina Williams, Jian Xiang Kuan, Puxin Xu, Zheng Yan, Iliyan Zarov, Yuchen Zhang, Angela Fan, Melanie Kambadur, Sharan Narang, Aurelien Rodriguez, Robert Stojnic, Sergey Edunov, Thomas Scialom
  • for: Develops and releases Llama 2, a collection of pretrained and fine-tuned large language models (LLMs) ranging in scale from 7 billion to 70 billion parameters.
  • methods: Pretrains and fine-tunes the LLMs, with the fine-tuned Llama 2-Chat models optimized for dialogue use cases (a minimal usage sketch follows this entry).
  • results: The models outperform open-source chat models on most benchmarks tested and, based on human evaluations of helpfulness and safety, may be a suitable substitute for closed-source models.
    Abstract In this work, we develop and release Llama 2, a collection of pretrained and fine-tuned large language models (LLMs) ranging in scale from 7 billion to 70 billion parameters. Our fine-tuned LLMs, called Llama 2-Chat, are optimized for dialogue use cases. Our models outperform open-source chat models on most benchmarks we tested, and based on our human evaluations for helpfulness and safety, may be a suitable substitute for closed-source models. We provide a detailed description of our approach to fine-tuning and safety improvements of Llama 2-Chat in order to enable the community to build on our work and contribute to the responsible development of LLMs.
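Since the weights are released, the chat models can be loaded through the Hugging Face transformers library once access to the gated repository has been granted. A minimal usage sketch; the model id, generation settings, and prompt are examples, and the prompt uses the Llama 2 chat [INST] format.

```python
# Minimal usage sketch for Llama 2-Chat via Hugging Face transformers.
# Requires accepting the model license and access to the gated repository.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "[INST] Explain what test input prioritization is in one sentence. [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```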

Improving Text Semantic Similarity Modeling through a 3D Siamese Network

  • paper_url: http://arxiv.org/abs/2307.09274
  • repo_url: None
  • paper_authors: Jianxiang Zang, Hui Liu
  • for: Proposes a 3D Siamese network for text semantic similarity modeling, improving on pooling-based approaches that flatten hierarchical semantic information.
  • methods: Maps semantic information into a higher-dimensional space via three-dimensional semantic tensors that retain more precise spatial and feature-domain information, and adds modules for feature extraction, attention, and feature fusion on top of this 3D framework.
  • results: Extensive experiments on four text semantic similarity benchmarks demonstrate the effectiveness and efficiency of the model.
    Abstract Siamese networks have gained popularity as a method for modeling text semantic similarity. Traditional methods rely on pooling operation to compress the semantic representations from Transformer blocks in encoding, resulting in two-dimensional semantic vectors and the loss of hierarchical semantic information from Transformer blocks. Moreover, this limited structure of semantic vectors is akin to a flattened landscape, which restricts the methods that can be applied in downstream modeling, as they can only navigate this flat terrain. To address this issue, we propose a novel 3D Siamese network for text semantic similarity modeling, which maps semantic information to a higher-dimensional space. The three-dimensional semantic tensors not only retains more precise spatial and feature domain information but also provides the necessary structural condition for comprehensive downstream modeling strategies to capture them. Leveraging this structural advantage, we introduce several modules to reinforce this 3D framework, focusing on three aspects: feature extraction, attention, and feature fusion. Our extensive experiments on four text semantic similarity benchmarks demonstrate the effectiveness and efficiency of our 3D Siamese Network.

UniTabE: Pretraining a Unified Tabular Encoder for Heterogeneous Tabular Data

  • paper_url: http://arxiv.org/abs/2307.09249
  • repo_url: None
  • paper_authors: Yazheng Yang, Yuqi Wang, Guang Liu, Ledell Wu, Qi Liu
  • for: Extending pretraining methodology to tabular data, a traditionally overlooked domain, to learn better semantic representations of heterogeneous tables.
  • methods: Proposes UniTabE, which represents each basic table element with a module called TabUnit, refines the representation with a Transformer encoder, and supports pretraining and finetuning through free-form prompts; pretraining uses a curated corpus of roughly 13 billion tabular samples gathered from Kaggle.
  • results: Experimental results show UniTabE outperforms several baseline models across a multitude of benchmark datasets, indicating that it significantly enhances the semantic representation of tabular data.
    Abstract Recent advancements in Natural Language Processing (NLP) have witnessed the groundbreaking impact of pretrained models, yielding impressive outcomes across various tasks. This study seeks to extend the power of pretraining methodologies to tabular data, a domain traditionally overlooked, yet inherently challenging due to the plethora of table schemas intrinsic to different tasks. The primary research questions underpinning this work revolve around the adaptation to heterogeneous table structures, the establishment of a universal pretraining protocol for tabular data, the generalizability and transferability of learned knowledge across tasks, the adaptation to diverse downstream applications, and the incorporation of incremental columns over time. In response to these challenges, we introduce UniTabE, a pioneering method designed to process tables in a uniform manner, devoid of constraints imposed by specific table structures. UniTabE's core concept relies on representing each basic table element with a module, termed TabUnit. This is subsequently followed by a Transformer encoder to refine the representation. Moreover, our model is designed to facilitate pretraining and finetuning through the utilization of free-form prompts. In order to implement the pretraining phase, we curated an expansive tabular dataset comprising approximately 13 billion samples, meticulously gathered from the Kaggle platform. Rigorous experimental testing and analyses were performed under a myriad of scenarios to validate the effectiveness of our methodology. The experimental results demonstrate UniTabE's superior performance against several baseline models across a multitude of benchmark datasets. This, therefore, underscores UniTabE's potential to significantly enhance the semantic representation of tabular data, thereby marking a significant stride in the field of tabular data analysis.

Towards Sustainable Deep Learning for Multi-Label Classification on NILM

  • paper_url: http://arxiv.org/abs/2307.09244
  • repo_url: None
  • paper_authors: Anže Pirnat, Blaž Bertalanič, Gregor Cerar, Mihael Mohorčič, Carolina Fortuna
  • for: Improving the computational and energy efficiency of deep learning (DL) for multi-label classification in non-intrusive load monitoring (NILM).
  • methods: Introduces a novel DL model for enhanced multi-label NILM classification, together with a testing methodology that compares models on data synthesized from measurement datasets so as to better represent real-world scenarios.
  • results: Compared with the state of the art, the proposed model reduces the carbon footprint by more than 23% and improves performance by roughly 8 percentage points on average when tested on data derived from the REFIT and UK-DALE datasets.
    Abstract Non-intrusive load monitoring (NILM) is the process of obtaining appliance-level data from a single metering point, measuring total electricity consumption of a household or a business. Appliance-level data can be directly used for demand response applications and energy management systems as well as for awareness raising and motivation for improvements in energy efficiency and reduction in the carbon footprint. Recently, classical machine learning and deep learning (DL) techniques became very popular and proved as highly effective for NILM classification, but with the growing complexity these methods are faced with significant computational and energy demands during both their training and operation. In this paper, we introduce a novel DL model aimed at enhanced multi-label classification of NILM with improved computation and energy efficiency. We also propose a testing methodology for comparison of different models using data synthesized from the measurement datasets so as to better represent real-world scenarios. Compared to the state-of-the-art, the proposed model has its carbon footprint reduced by more than 23% while providing on average approximately 8 percentage points in performance improvement when testing on data derived from REFIT and UK-DALE datasets.

De Re and De Dicto Knowledge in Egocentric Setting

  • paper_url: http://arxiv.org/abs/2308.00001
  • repo_url: None
  • paper_authors: Pavel Naumov, Anna Ovchinnikova
  • for: Studies egocentric logical systems, which describe properties of agents rather than properties of possible worlds.
  • methods: Introduces two modalities capturing de re and de dicto knowledge in this egocentric setting.
  • results: Proves that the de re and de dicto modalities are not definable through each other.
    Abstract Prior proposes the term "egocentric" for logical systems that study properties of agents rather than properties of possible worlds. In such a setting, the paper introduces two different modalities capturing de re and de dicto knowledge and proves that these two modalities are not definable through each other.

Human Body Digital Twin: A Master Plan

  • paper_url: http://arxiv.org/abs/2307.09225
  • repo_url: None
  • paper_authors: Chenyu Tang, Shuo Gao, Luigi G. Occhipinti
  • for: Surveys the current status and future prospects of the human body digital twin (DT) and proposes a five-level roadmap to guide its development.
  • methods: Reviews the components involved, including wearable devices, data collection, data analysis, and decision-making systems, together with the support, security, cost, and ethical considerations required for responsible and effective implementation.
  • results: The proposed five-level roadmap provides a framework for guiding future development and a perspective intended to facilitate new interdisciplinary research and innovative solutions in this rapidly evolving field.
    Abstract The human body DT has the potential to revolutionize healthcare and wellness, but its responsible and effective implementation requires consideration of various factors. This article presents a comprehensive overview of the current status and future prospects of the human body DT and proposes a five-level roadmap for its development. The roadmap covers the development of various components, such as wearable devices, data collection, data analysis, and decision-making systems. The article also highlights the necessary support, security, cost, and ethical considerations that must be addressed in order to ensure responsible and effective implementation of the human body DT. The proposed roadmap provides a framework for guiding future development and offers a unique perspective on the future of the human body DT, facilitating new interdisciplinary research and innovative solutions in this rapidly evolving field.

Automated Ableism: An Exploration of Explicit Disability Biases in Sentiment and Toxicity Analysis Models

  • paper_url: http://arxiv.org/abs/2307.09209
  • repo_url: None
  • paper_authors: Pranav Narayanan Venkit, Mukund Srinath, Shomir Wilson
  • for: Detecting explicit disability bias in conversations about people with disability (PWD) on Twitter and Reddit using perturbation sensitivity analysis.
  • methods: Creates the Bias Identification Test in Sentiment (BITS) corpus to quantify explicit disability bias in sentiment analysis and toxicity detection models (a minimal perturbation sketch follows this entry).
  • results: All of the tested models (TextBlob, VADER, Google Cloud Natural Language API, DistilBERT, and two versions of Toxic-BERT) exhibit statistically significant explicit bias against PWD.
    Abstract We analyze sentiment analysis and toxicity detection models to detect the presence of explicit bias against people with disability (PWD). We employ the bias identification framework of Perturbation Sensitivity Analysis to examine conversations related to PWD on social media platforms, specifically Twitter and Reddit, in order to gain insight into how disability bias is disseminated in real-world social settings. We then create the \textit{Bias Identification Test in Sentiment} (BITS) corpus to quantify explicit disability bias in any sentiment analysis and toxicity detection models. Our study utilizes BITS to uncover significant biases in four open AIaaS (AI as a Service) sentiment analysis tools, namely TextBlob, VADER, Google Cloud Natural Language API, DistilBERT and two toxicity detection models, namely two versions of Toxic-BERT. Our findings indicate that all of these models exhibit statistically significant explicit bias against PWD.
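The perturbation-sensitivity idea is easy to reproduce with off-the-shelf sentiment tools: score a neutral template sentence, swap in identity terms related to disability, and measure how the score shifts. A minimal sketch with VADER; the sentences below are illustrative, not taken from the BITS corpus.

```python
# Minimal perturbation-sensitivity sketch with VADER (pip install vaderSentiment).
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()
neutral = "I am a person and I went to the market today."
perturbed = [
    "I am a deaf person and I went to the market today.",
    "I am a blind person and I went to the market today.",
    "I am a person with cerebral palsy and I went to the market today.",
]

baseline = analyzer.polarity_scores(neutral)["compound"]
for sent in perturbed:
    score = analyzer.polarity_scores(sent)["compound"]
    # A systematic negative shift relative to the neutral baseline indicates explicit bias.
    print(f"shift vs. neutral: {score - baseline:+.3f}  <- {sent}")
```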

ESMC: Entire Space Multi-Task Model for Post-Click Conversion Rate via Parameter Constraint

  • paper_url: http://arxiv.org/abs/2307.09193
  • repo_url: None
  • paper_authors: Zhenhao Jiang, Biao Zeng, Hao Feng, Jin Liu, Jicong Fan, Jie Zhang, Jia Jia, Ning Hu, Xingyu Chen, Xuguang Lan
  • for: Estimating Click-Through Rate (CTR) and Post-Click Conversion Rate (CVR) in large-scale online recommender systems.
  • methods: Proposes an entire-space multi-task model that follows the user decision path "exposure -> click -> in-shop action -> purchase" to address sample selection bias and data sparsity, and handles the "in-shop action -> purchase" step with a parameter-constraint strategy to avoid the Probability Space Confusion issue (a numeric factorization sketch follows this entry).
  • results: Experiments in both offline and online environments of a large-scale recommendation system show that the proposed methods significantly outperform state-of-the-art models; the real-world datasets will be released.
    Abstract Large-scale online recommender system spreads all over the Internet being in charge of two basic tasks: Click-Through Rate (CTR) and Post-Click Conversion Rate (CVR) estimations. However, traditional CVR estimators suffer from well-known Sample Selection Bias and Data Sparsity issues. Entire space models were proposed to address the two issues via tracing the decision-making path of "exposure_click_purchase". Further, some researchers observed that there are purchase-related behaviors between click and purchase, which can better draw the user's decision-making intention and improve the recommendation performance. Thus, the decision-making path has been extended to "exposure_click_in-shop action_purchase" and can be modeled with conditional probability approach. Nevertheless, we observe that the chain rule of conditional probability does not always hold. We report Probability Space Confusion (PSC) issue and give a derivation of difference between ground-truth and estimation mathematically. We propose a novel Entire Space Multi-Task Model for Post-Click Conversion Rate via Parameter Constraint (ESMC) and two alternatives: Entire Space Multi-Task Model with Siamese Network (ESMS) and Entire Space Multi-Task Model in Global Domain (ESMG) to address the PSC issue. Specifically, we handle "exposure_click_in-shop action" and "in-shop action_purchase" separately in the light of characteristics of in-shop action. The first path is still treated with conditional probability while the second one is treated with parameter constraint strategy. Experiments on both offline and online environments in a large-scale recommendation system illustrate the superiority of our proposed methods over state-of-the-art models. The real-world datasets will be released.
    摘要 大规模在互联网上的推荐系统遍布全网,负责两个基本任务:点击率(CTR)和后click conversión率(CVR)的估计。然而,传统的CVR估计器受到Well-knownSample Selection Bias和Data Sparsity问题的影响。Entire space模型被提出以解决这两个问题,通过跟踪用户做出决策的“曝光•点击•购买”的决策路径。此外,一些研究人员发现,在点击和购买之间存在购买相关行为,可以更好地捕捉用户做出决策的INTENTION,提高推荐性能。因此,决策路径被扩展为“曝光•点击•在店动作•购买”,可以通过条件概率方法模型。然而,我们发现链式法则不一定成立,我们报告了概率空间混乱(PSC)问题,并给出了数学上的解释。我们提出了一种新的Entire Space Multi-Task Model for Post-Click Conversion Rate via Parameter Constraint(ESMC),以及两种alternative:Entire Space Multi-Task Model with Siamese Network(ESMS)和Entire Space Multi-Task Model in Global Domain(ESMG),以解决PSC问题。具体来说,我们在“曝光•点击•在店动作”和“在店动作•购买”两个路径上处理它们分别,在它们的特点上进行处理。第一个路径仍然采用条件概率方法,第二个路径采用参数约束策略。在大规模推荐系统中进行了实验,我们的提出的方法在对state-of-the-art模型的比较中表现出色。真实的数据集将被发布。

Enhancing Network Slicing Architectures with Machine Learning, Security, Sustainability and Experimental Networks Integration

  • paper_url: http://arxiv.org/abs/2307.09151
  • repo_url: https://github.com/romoreira/sfi2-energy-sustainability
  • paper_authors: Joberto S. B. Martins, Tereza C. Carvalho, Rodrigo Moreira, Cristiano Both, Adnei Donatti, João H. Corrêa, José A. Suruagy, Sand L. Corrêa, Antonio J. G. Abelem, Moisés R. N. Ribeiro, Jose-Marcos Nogueira, Luiz C. S. Magalhães, Juliano Wickboldt, Tiago Ferreto, Ricardo Mello, Rafael Pasquini, Marcos Schwarz, Leobino N. Sampaio, Daniel F. Macedo, José F. de Rezende, Kleber V. Cardoso, Flávio O. Silva
  • for: Network slicing (NS) architectures optimize and customize scarce resources for 5G/6G applications, but existing proposals typically cover only specific sets of domains with commonalities.
  • methods: The SFI2 architecture proposal integrates experimental networks and enhances NS with ML-native optimizations, energy-efficient slicing, and slicing-tailored security functionalities, using the slice-as-a-service paradigm for end-to-end orchestration of resources across multi-domain and multi-technology experimental networks.
  • results: SFI2 reference architecture instantiations enhance multi-domain and multi-technology integrated experimental network deployment with native ML optimization, energy-efficient-aware slicing, and slicing-tailored security functionalities for practical domains.
    Abstract Network Slicing (NS) is an essential technique extensively used in 5G networks computing strategies, mobile edge computing, mobile cloud computing, and verticals like the Internet of Vehicles and industrial IoT, among others. NS is foreseen as one of the leading enablers for 6G futuristic and highly demanding applications since it allows the optimization and customization of scarce and disputed resources among dynamic, demanding clients with highly distinct application requirements. Various standardization organizations, like 3GPP's proposal for new generation networks and state-of-the-art 5G/6G research projects, are proposing new NS architectures. However, new NS architectures have to deal with an extensive range of requirements that inherently result in having NS architecture proposals typically fulfilling the needs of specific sets of domains with commonalities. The Slicing Future Internet Infrastructures (SFI2) architecture proposal explores the gap resulting from the diversity of NS architectures target domains by proposing a new NS reference architecture with a defined focus on integrating experimental networks and enhancing the NS architecture with Machine Learning (ML) native optimizations, energy-efficient slicing, and slicing-tailored security functionalities. The SFI2 architectural main contribution includes the utilization of the slice-as-a-service paradigm for end-to-end orchestration of resources across multi-domains and multi-technology experimental networks. In addition, the SFI2 reference architecture instantiations will enhance the multi-domain and multi-technology integrated experimental network deployment with native ML optimization, energy-efficient aware slicing, and slicing-tailored security functionalities for the practical domain.

Machine Learning for SAT: Restricted Heuristics and New Graph Representations

  • paper_url: http://arxiv.org/abs/2307.09141
  • repo_url: None
  • paper_authors: Mikhail Shirokikh, Ilya Shenbin, Anton Alekseev, Sergey Nikolenko
  • for: Boolean satisfiability (SAT) is a fundamental NP-complete problem underlying applications such as automated planning and scheduling.
  • methods: Uses machine learning (ML) models to improve the branching heuristics of SAT solvers; the proposed strategy takes a few initial steps with a trained ML model and then releases control to classical heuristics, which simplifies the cold start and can decrease both the number of steps and the overall runtime (a hand-off sketch follows this entry). A modification of Graph-Q-SAT is also introduced for SAT instances converted from other domains, such as open shop scheduling problems.
  • results: The feasibility of the approach is validated on random and industrial SAT problems.
    Abstract Boolean satisfiability (SAT) is a fundamental NP-complete problem with many applications, including automated planning and scheduling. To solve large instances, SAT solvers have to rely on heuristics, e.g., choosing a branching variable in DPLL and CDCL solvers. Such heuristics can be improved with machine learning (ML) models; they can reduce the number of steps but usually hinder the running time because useful models are relatively large and slow. We suggest the strategy of making a few initial steps with a trained ML model and then releasing control to classical heuristics; this simplifies cold start for SAT solving and can decrease both the number of steps and overall runtime, but requires a separate decision of when to release control to the solver. Moreover, we introduce a modification of Graph-Q-SAT tailored to SAT problems converted from other domains, e.g., open shop scheduling problems. We validate the feasibility of our approach with random and industrial SAT problems.
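The hand-off strategy, letting the ML model pick the first few branching variables and then falling back to the solver's classical heuristic, can be sketched independently of any particular solver. The model, per-variable features, and activity scores below are placeholders, not tied to a real solver's API.

```python
# Sketch of the hand-off strategy: ML-guided branching for the first K decisions,
# then release control to a classical activity-based heuristic (e.g. VSIDS-like scores).
import numpy as np

def choose_branch_variable(step, var_features, activity, model, k_ml_steps=10):
    if step < k_ml_steps:
        scores = model.predict(var_features)   # learned usefulness score per variable
    else:
        scores = activity                      # classical heuristic takes over afterwards
    return int(np.argmax(scores))

class DummyModel:                              # stand-in for a trained regressor
    def predict(self, feats):
        return feats.sum(axis=1)

rng = np.random.default_rng(0)
feats, act = rng.random((50, 6)), rng.random(50)
for step in (0, 5, 20):
    print(step, choose_branch_variable(step, feats, act, DummyModel()))
```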

DropMix: Reducing Class Dependency in Mixed Sample Data Augmentation

  • paper_url: http://arxiv.org/abs/2307.09136
  • repo_url: None
  • paper_authors: Haeil Lee, Hansang Lee, Junmo Kim
  • for: Mixed sample data augmentation (MSDA) improves performance on a variety of tasks, but its effects are class-dependent: some classes improve while others degrade.
  • methods: Proposes DropMix, which excludes a specific percentage of the data from the MSDA computation so that training sees a combination of MSDA and non-MSDA data (a hedged sketch follows this entry).
  • results: On CIFAR-100 and ImageNet with three MSDA methods (Mixup, CutMix, and PuzzleMix), DropMix improves the performance of classes previously degraded by MSDA and increases overall average accuracy.
    Abstract Mixed sample data augmentation (MSDA) is a widely used technique that has been found to improve performance in a variety of tasks. However, in this paper, we show that the effects of MSDA are class-dependent, with some classes seeing an improvement in performance while others experience a decline. To reduce class dependency, we propose the DropMix method, which excludes a specific percentage of data from the MSDA computation. By training on a combination of MSDA and non-MSDA data, the proposed method not only improves the performance of classes that were previously degraded by MSDA, but also increases overall average accuracy, as shown in experiments on two datasets (CIFAR-100 and ImageNet) using three MSDA methods (Mixup, CutMix and PuzzleMix).
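The core mechanic, excluding a fraction of each batch from the mixed-sample augmentation so the model also sees clean examples, can be sketched on top of standard Mixup. The random exclusion below is an assumption for illustration; the paper's DropMix policy for choosing what to exclude is more deliberate.

```python
# Sketch: Mixup applied to only part of a batch, leaving a fraction of samples unmixed.
import torch

def dropmix_mixup(x, y_onehot, alpha=1.0, drop_ratio=0.25):
    b = x.size(0)
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(b)
    keep_clean = torch.rand(b) < drop_ratio          # these samples skip MSDA entirely
    lam_vec = torch.full((b,), lam)
    lam_vec[keep_clean] = 1.0                        # lambda = 1 leaves a sample unchanged
    lam_x = lam_vec.view(-1, 1, 1, 1)
    x_mixed = lam_x * x + (1 - lam_x) * x[perm]
    y_mixed = lam_vec.view(-1, 1) * y_onehot + (1 - lam_vec.view(-1, 1)) * y_onehot[perm]
    return x_mixed, y_mixed

x = torch.randn(32, 3, 32, 32)
y = torch.nn.functional.one_hot(torch.randint(0, 100, (32,)), 100).float()
xm, ym = dropmix_mixup(x, y)
print(xm.shape, ym.shape)
```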

Cloud-native RStudio on Kubernetes for Hopsworks

  • paper_url: http://arxiv.org/abs/2307.09132
  • repo_url: None
  • paper_authors: Gibson Chikafa, Sina Sheikholeslami, Salman Niazi, Jim Dowling, Vladimir Vlassov
  • for: Provides RStudio Server as a multi-tenant distributed system (Software as a Service) on the Hopsworks data-intensive AI platform, improving the availability and scalability of R development environments in the cloud.
  • methods: Uses cloud-native technologies, Docker and Kubernetes, to address performance isolation, security, and scaling in a multi-tenant environment; also enables secure data sharing between RStudio Server instances for privacy-preserving collaboration, and integrates with Apache Spark for Big Data processing workloads.
  • results: On a Google Cloud Platform cluster with four worker nodes (30GB RAM each), 44 RStudio Server instances with 2GB RAM each ran concurrently; the system can scale out to potentially hundreds of concurrent instances by adding more CPU and RAM to the cluster.
    Abstract In order to fully benefit from cloud computing, services are designed following the "multi-tenant" architectural model, which is aimed at maximizing resource sharing among users. However, multi-tenancy introduces challenges of security, performance isolation, scaling, and customization. RStudio server is an open-source Integrated Development Environment (IDE) accessible over a web browser for the R programming language. We present the design and implementation of a multi-user distributed system on Hopsworks, a data-intensive AI platform, following the multi-tenant model that provides RStudio as Software as a Service (SaaS). We use the most popular cloud-native technologies: Docker and Kubernetes, to solve the problems of performance isolation, security, and scaling that are present in a multi-tenant environment. We further enable secure data sharing in RStudio server instances to provide data privacy and allow collaboration among RStudio users. We integrate our system with Apache Spark, which can scale and handle Big Data processing workloads. Also, we provide a UI where users can provide custom configurations and have full control of their own RStudio server instances. Our system was tested on a Google Cloud Platform cluster with four worker nodes, each with 30GB of RAM allocated to them. The tests on this cluster showed that 44 RStudio servers, each with 2GB of RAM, can be run concurrently. Our system can scale out to potentially support hundreds of concurrently running RStudio servers by adding more resources (CPUs and RAM) to the cluster or system.

BOLD: A Benchmark for Linked Data User Agents and a Simulation Framework for Dynamic Linked Data Environments

  • paper_url: http://arxiv.org/abs/2307.09114
  • repo_url: None
  • paper_authors: Tobias Käfer, Victor Charpenay, Andreas Harth
  • for: Presents the BOLD (Buildings on Linked Data) benchmark for Linked Data agents, together with a framework for simulating dynamic Linked Data environments.
  • methods: Instantiates the framework as a read-write Linked Data interface to a smart building with simulated time, occupancy movement, and sensors and actuators around lighting.
  • results: Agents carry out specified tasks, such as controlling illumination, on the Linked Data representation of this environment; the simulation checks correct task execution and measures agent performance, with measurements reported for agents based on condition-action rules.
    Abstract The paper presents the BOLD (Buildings on Linked Data) benchmark for Linked Data agents, next to the framework to simulate dynamic Linked Data environments, using which we built BOLD. The BOLD benchmark instantiates the BOLD framework by providing a read-write Linked Data interface to a smart building with simulated time, occupancy movement and sensors and actuators around lighting. On the Linked Data representation of this environment, agents carry out several specified tasks, such as controlling illumination. The simulation environment provides means to check for the correct execution of the tasks and to measure the performance of agents. We conduct measurements on Linked Data agents based on condition-action rules.

  • paper_url: http://arxiv.org/abs/2307.09099
  • repo_url: None
  • paper_authors: Seyed Mahdi Shariatzadeh, Mahmood Fathy, Reza Berangi, Mohammad Shahverdy
  • for: Provides an overview of principal and state-of-the-art work in Multi-Objective Neural Architecture Search (MONAS), along with future directions.
  • methods: Presents a well-categorized taxonomy and formulation for NAS, corrects some miscategorizations in previous surveys, and compiles and elaborates the objectives used in MONAS, adding several new ones.
  • results: Analyzes the most important objectives, showing that the stochastic properties of some of them should be treated differently from deterministic ones in the multi-objective optimization procedure, and closes with future directions and open topics in MONAS (a Pareto-front sketch follows this entry).
    Abstract Recently, the expert-crafted neural architectures is increasing overtaken by the utilization of neural architecture search (NAS) and automatic generation (and tuning) of network structures which has a close relation to the Hyperparameter Optimization and Auto Machine Learning (AutoML). After the earlier NAS attempts to optimize only the prediction accuracy, Multi-Objective Neural architecture Search (MONAS) has been attracting attentions which considers more goals such as computational complexity, power consumption, and size of the network for optimization, reaching a trade-off between the accuracy and other features like the computational cost. In this paper, we present an overview of principal and state-of-the-art works in the field of MONAS. Starting from a well-categorized taxonomy and formulation for the NAS, we address and correct some miscategorizations in previous surveys of the NAS field. We also provide a list of all known objectives used and add a number of new ones and elaborate their specifications. We have provides analyses about the most important objectives and shown that the stochastic properties of some the them should be differed from deterministic ones in the multi-objective optimization procedure of NAS. We finalize this paper with a number of future directions and topics in the field of MONAS.
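Multi-objective NAS ultimately reduces to comparing candidate architectures under several objectives at once, and the usual tool for that is Pareto dominance. A small sketch that extracts the Pareto front over two objectives to be minimized (validation error and FLOPs); the candidate values are synthetic.

```python
# Sketch: extract the Pareto front of candidate architectures under two minimization objectives.
import numpy as np

def pareto_front(objectives: np.ndarray) -> np.ndarray:
    """Return indices of non-dominated points; objectives has shape (n, m), all minimized."""
    n = objectives.shape[0]
    keep = np.ones(n, dtype=bool)
    for i in range(n):
        if not keep[i]:
            continue
        # j dominates i if it is no worse in every objective and strictly better in at least one
        dominated_by = np.all(objectives <= objectives[i], axis=1) & np.any(objectives < objectives[i], axis=1)
        if dominated_by.any():
            keep[i] = False
    return np.where(keep)[0]

rng = np.random.default_rng(0)
candidates = np.column_stack([rng.uniform(0.05, 0.40, 50),    # validation error
                              rng.uniform(100, 600, 50)])     # FLOPs (millions)
print("non-dominated architectures:", pareto_front(candidates))
```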

DiTTO: Diffusion-inspired Temporal Transformer Operator

  • paper_url: http://arxiv.org/abs/2307.09072
  • repo_url: None
  • paper_authors: Oded Ovadia, Eli Turkel, Adar Kahana, George Em Karniadakis
  • for: Solving time-dependent partial differential equations (PDEs) with a data-driven, operator-learning approach, continuously in time and without any temporal discretization.
  • methods: Proposes DiTTO, inspired by latent diffusion models: their time-conditioning mechanism is combined with elements of the Transformer architecture to improve its capabilities.
  • results: Achieves state-of-the-art accuracy on a wide variety of PDEs in multiple dimensions, including the 1-D Burgers' equation, the 2-D Navier-Stokes equations, and the acoustic wave equation in 2-D and 3-D; performance can be improved further with fast-sampling concepts from diffusion models, and DiTTO can accurately perform zero-shot super-resolution in time.
    Abstract Solving partial differential equations (PDEs) using a data-driven approach has become increasingly common. The recent development of the operator learning paradigm has enabled the solution of a broader range of PDE-related problems. We propose an operator learning method to solve time-dependent PDEs continuously in time without needing any temporal discretization. The proposed approach, named DiTTO, is inspired by latent diffusion models. While diffusion models are usually used in generative artificial intelligence tasks, their time-conditioning mechanism is extremely useful for PDEs. The diffusion-inspired framework is combined with elements from the Transformer architecture to improve its capabilities. We demonstrate the effectiveness of the new approach on a wide variety of PDEs in multiple dimensions, namely the 1-D Burgers' equation, 2-D Navier-Stokes equations, and the acoustic wave equation in 2-D and 3-D. DiTTO achieves state-of-the-art results in terms of accuracy for these problems. We also present a method to improve the performance of DiTTO by using fast sampling concepts from diffusion models. Finally, we show that DiTTO can accurately perform zero-shot super-resolution in time.

Unleashing the Imagination of Text: A Novel Framework for Text-to-image Person Retrieval via Exploring the Power of Words

  • paper_url: http://arxiv.org/abs/2307.09059
  • repo_url: https://github.com/Delong-liu-bupt/UIT
  • paper_authors: Delong Liu, Haiwen Li
  • for: Text-to-image person retrieval: retrieving person images from a large gallery that match a given textual description.
  • methods: Proposes the Unleash the Imagination of Text (UIT) framework, which uses a pre-trained full CLIP model as a dual encoder for images and texts, a text-guided image restoration auxiliary task that implicitly maps abstract textual entities to specific image regions, a cross-modal triplet loss tailored for hard samples, and a novel text data augmentation technique (a basic triplet-loss sketch follows this entry).
  • results: Achieves state-of-the-art results on three popular benchmark datasets; the source code will be made publicly available shortly.
    Abstract The goal of Text-to-image person retrieval is to retrieve person images from a large gallery that match the given textual descriptions. The main challenge of this task lies in the significant differences in information representation between the visual and textual modalities. The textual modality conveys abstract and precise information through vocabulary and grammatical structures, while the visual modality conveys concrete and intuitive information through images. To fully leverage the expressive power of textual representations, it is essential to accurately map abstract textual descriptions to specific images. To address this issue, we propose a novel framework to Unleash the Imagination of Text (UIT) in text-to-image person retrieval, aiming to fully explore the power of words in sentences. Specifically, the framework employs the pre-trained full CLIP model as a dual encoder for the images and texts , taking advantage of prior cross-modal alignment knowledge. The Text-guided Image Restoration auxiliary task is proposed with the aim of implicitly mapping abstract textual entities to specific image regions, facilitating alignment between textual and visual embeddings. Additionally, we introduce a cross-modal triplet loss tailored for handling hard samples, enhancing the model's ability to distinguish minor differences. To focus the model on the key components within sentences, we propose a novel text data augmentation technique. Our proposed methods achieve state-of-the-art results on three popular benchmark datasets, and the source code will be made publicly available shortly.
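Among the components listed above, the cross-modal triplet objective is the most self-contained: pull a text embedding toward its matching image embedding and push it away from a negative image. A generic sketch of that basic form; the paper's version adds dedicated hard-sample handling on top, and the embeddings here are random stand-ins for CLIP outputs.

```python
# Generic cross-modal triplet loss sketch: anchor text, positive image, negative image.
import torch
import torch.nn.functional as F

def cross_modal_triplet(text_emb, pos_img_emb, neg_img_emb, margin=0.3):
    text_emb = F.normalize(text_emb, dim=-1)
    pos_img_emb = F.normalize(pos_img_emb, dim=-1)
    neg_img_emb = F.normalize(neg_img_emb, dim=-1)
    d_pos = 1 - (text_emb * pos_img_emb).sum(-1)     # cosine distance to the matching image
    d_neg = 1 - (text_emb * neg_img_emb).sum(-1)     # cosine distance to the negative image
    return F.relu(d_pos - d_neg + margin).mean()

t, p, n = torch.randn(16, 512), torch.randn(16, 512), torch.randn(16, 512)
print(cross_modal_triplet(t, p, n))
```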

QMNet: Importance-Aware Message Exchange for Decentralized Multi-Agent Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2307.09051
  • repo_url: None
  • paper_authors: Xiufeng Huang, Sheng Zhou
  • for: Improving multi-agent reinforcement learning performance under wireless resource constraints.
  • methods: Proposes a message importance metric and an importance-aware scheduling policy so that agents spend scarce communication resources on important messages; in the query-message-based QMNet architecture, shared queries help compute message importance, importance is also exploited to handle random access collisions in decentralized systems, and a message prediction mechanism compensates for messages that are not transmitted (a scheduling sketch follows this entry).
  • results: Evaluated in a traffic junction environment where only a fraction of agents can send messages, the system maintains performance even when only 30% of agents can share messages, saves a further 40% of wireless resources via message prediction, and the importance-aware decentralized multi-access mechanism avoids collisions, achieving almost the same performance as centralized scheduling.
    Abstract To improve the performance of multi-agent reinforcement learning under the constraint of wireless resources, we propose a message importance metric and design an importance-aware scheduling policy to effectively exchange messages. The key insight is spending the precious communication resources on important messages. The message importance depends not only on the messages themselves, but also on the needs of agents who receive them. Accordingly, we propose a query-message-based architecture, called QMNet. Agents generate queries and messages with the environment observation. Sharing queries can help calculate message importance. Exchanging messages can help agents cooperate better. Besides, we exploit the message importance to deal with random access collisions in decentralized systems. Furthermore, a message prediction mechanism is proposed to compensate for messages that are not transmitted. Finally, we evaluate the proposed schemes in a traffic junction environment, where only a fraction of agents can send messages due to limited wireless resources. Results show that QMNet can extract valuable information to guarantee the system performance even when only $30\%$ of agents can share messages. By exploiting message prediction, the system can further save $40\%$ of wireless resources. The importance-aware decentralized multi-access mechanism can effectively avoid collisions, achieving almost the same performance as centralized scheduling.
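The scheduling side of the idea is simple to state: each agent attaches an importance score to its message and only the most important fraction is granted the scarce channel. A toy sketch; the scoring itself is where QMNet's query/message networks come in, so a random placeholder score is used here.

```python
# Toy sketch of importance-aware message scheduling under a wireless budget.
import numpy as np

rng = np.random.default_rng(0)
n_agents, budget_ratio = 20, 0.3                   # only 30% of agents may transmit
importance = rng.random(n_agents)                  # placeholder per-message importance scores

n_slots = int(budget_ratio * n_agents)
scheduled = np.argsort(-importance)[:n_slots]      # highest-importance messages get the channel
print("agents allowed to transmit:", sorted(scheduled.tolist()))
```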

R-Cut: Enhancing Explainability in Vision Transformers with Relationship Weighted Out and Cut

  • paper_url: http://arxiv.org/abs/2307.09050
  • repo_url: None
  • paper_authors: Yingjie Niu, Ming Ding, Maoning Ge, Robin Karlsson, Yuxiao Zhang, Kazuya Takeda
  • for: 这paper的目的是提高Transformer模型的可解释性,帮助用户更深入理解模型在图像分类任务中的决策过程。
  • methods: 这篇论文提出了两个模块来提高可解释性:“Relationship Weighted Out”模块和“Cut”模块。“Relationship Weighted Out”模块从中间层提取类别特定的信息以强调相关特征,而“Cut”模块则进行细粒度的特征分解,考虑位置、纹理、颜色等因素。通过整合这两个模块,生成密集的类别特定可解释性图。
  • results: 该论文通过在ImageNet数据集上进行大量的定量和定性实验,证明了其方法相比以往方法有明显提升;并在专为自动驾驶危险警报设计的LRN数据集上进行了大量实验,以评估其方法在复杂背景下的可解释性,结果表明改善显著。此外,论文还进行了消融实验,以验证每个模块各自的贡献,从而确认整体方法的有效性。
    Abstract Transformer-based models have gained popularity in the field of natural language processing (NLP) and are extensively utilized in computer vision tasks and multi-modal models such as GPT4. This paper presents a novel method to enhance the explainability of Transformer-based image classification models. Our method aims to improve trust in classification results and empower users to gain a deeper understanding of the model for downstream tasks by providing visualizations of class-specific maps. We introduce two modules: the ``Relationship Weighted Out" and the ``Cut" modules. The ``Relationship Weighted Out" module focuses on extracting class-specific information from intermediate layers, enabling us to highlight relevant features. Additionally, the ``Cut" module performs fine-grained feature decomposition, taking into account factors such as position, texture, and color. By integrating these modules, we generate dense class-specific visual explainability maps. We validate our method with extensive qualitative and quantitative experiments on the ImageNet dataset. Furthermore, we conduct a large number of experiments on the LRN dataset, specifically designed for automatic driving danger alerts, to evaluate the explainability of our method in complex backgrounds. The results demonstrate a significant improvement over previous methods. Moreover, we conduct ablation experiments to validate the effectiveness of each module. Through these experiments, we are able to confirm the respective contributions of each module, thus solidifying the overall effectiveness of our proposed approach.
    摘要 带有变换器基础的模型在自然语言处理(NLP)领域得到广泛应用,同时也在计算机视觉任务和多模态模型中得到广泛应用,如GPT4。这篇论文提出了一种新的方法来提高变换器基础的图像分类模型的可解释性。我们的方法旨在提高分类结果的信任度和让用户更深入地理解模型,以便在下游任务中获得更多的信息。我们提出了两个模块:“关系权重外”模块和“割”模块。“关系权重外”模块专注于从中间层提取类别特定的信息,使得我们能够高亮相关的特征。而“割”模块对细化特征进行了分析,考虑了位置、xture、颜色等因素。通过将这两个模块集成,我们生成了密集的类别特定的可解释性图。我们验证了我们的方法通过大量的质量和量测试在ImageNet数据集上,并在LRN数据集上进行了大量的实验,以评估我们的方法在复杂背景下的可解释性。结果表明我们的方法与之前的方法相比有了显著的改善。此外,我们还进行了减少模块的实验,以验证每个模块的各自贡献,从而确认整体方法的有效性。

FedDefender: Client-Side Attack-Tolerant Federated Learning

  • paper_url: http://arxiv.org/abs/2307.09048
  • repo_url: https://github.com/deu30303/feddefender
  • paper_authors: Sungwon Park, Sungwon Han, Fangzhao Wu, Sundong Kim, Bin Zhu, Xing Xie, Meeyoung Cha
  • for: 防御模型投毒攻击,增强联邦学习(federated learning)的鲁棒性
  • methods: 客户端侧防御策略,包括攻击容忍的本地元更新和攻击容忍的全局知识蒸馏
  • results: 在多个数据集上模拟真实场景进行评估,结果表明所提方法可以增强联邦学习对模型投毒攻击的鲁棒性。
    Abstract Federated learning enables learning from decentralized data sources without compromising privacy, which makes it a crucial technique. However, it is vulnerable to model poisoning attacks, where malicious clients interfere with the training process. Previous defense mechanisms have focused on the server-side by using careful model aggregation, but this may not be effective when the data is not identically distributed or when attackers can access the information of benign clients. In this paper, we propose a new defense mechanism that focuses on the client-side, called FedDefender, to help benign clients train robust local models and avoid the adverse impact of malicious model updates from attackers, even when a server-side defense cannot identify or remove adversaries. Our method consists of two main components: (1) attack-tolerant local meta update and (2) attack-tolerant global knowledge distillation. These components are used to find noise-resilient model parameters while accurately extracting knowledge from a potentially corrupted global model. Our client-side defense strategy has a flexible structure and can work in conjunction with any existing server-side strategies. Evaluations of real-world scenarios across multiple datasets show that the proposed method enhances the robustness of federated learning against model poisoning attacks.
    摘要 federated learning 可以从分散式数据来源学习而不需要遗失隐私,这使其成为一种重要的技术。然而,它受到模型毒化攻击的威胁,恶意客户端可以在训练过程中干扰。先前的防御机制将重点放在服务器端,使用精确的模型聚合,但这可能无法有效地防止数据不对称或攻击者可以访问正常客户端的资讯。在这篇论文中,我们提出了一个新的防御机制,将重点放在客户端,称为FedDefender,以帮助正常的客户端训练稳定的地方模型,并避免由攻击者发送的错误模型更新对正常客户端的影响。我们的方法包括两个主要的元件:(1)攻击忍耐的地方元更新和(2)攻击忍耐的全球知识传递。这两个元件用于找到防护感知的模型参数,并精确地传递全球模型中的知识。我们的客户端防御策略具有可以与任何现有的服务器端策略结合使用的灵活结构。实际应用数据显示,提案的方法可以增强 Federated Learning 中的模型毒化攻击防御能力。
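The methods bullet mentions attack-tolerant global knowledge distillation on the client side. As a rough illustration of the distillation component only (the meta-update and noise-resilience tricks are omitted), here is a generic client-side training step that mixes the supervised loss with distillation from the global model; the hyperparameters and function names are illustrative, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def client_update_step(local_model, global_model, x, y, optimizer,
                       T=2.0, distill_weight=0.5):
    """One client-side step mixing cross-entropy with knowledge distillation
    from the (possibly corrupted) global model."""
    local_model.train()
    global_model.eval()
    logits = local_model(x)
    ce = F.cross_entropy(logits, y)
    with torch.no_grad():
        teacher = F.softmax(global_model(x) / T, dim=1)   # softened global predictions
    kd = F.kl_div(F.log_softmax(logits / T, dim=1), teacher,
                  reduction='batchmean') * (T * T)
    loss = ce + distill_weight * kd
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```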

Multimodal Machine Learning for Extraction of Theorems and Proofs in the Scientific Literature

  • paper_url: http://arxiv.org/abs/2307.09047
  • repo_url: https://github.com/mv96/mm_extraction
  • paper_authors: Shrey Mishra, Antoine Gauquier, Pierre Senellart
  • for: 本研究旨在提取数学文章中的定理和证明,使用多modal类型的机器学习方法。
  • methods: 本研究使用文本、字体特征和bitmap图像渲染作为多模态特征,并通过晚期融合(late fusion)各单模态分类器的特征来提取定理和证明。文本模态使用预训练的语言模型并在小规模数据上微调;字体特征使用长短期记忆(LSTM)模型处理每个文本块内的字体名称与字号序列;bitmap图像使用EfficientNetv2深度网络处理。最后,使用CRF模型将多模态特征与块序列信息结合。
  • results: 实验结果表明,使用多modal方法比使用单modal方法有较好的性能,并且使用CRF模型可以提高性能。
    Abstract Scholarly articles in mathematical fields feature mathematical statements such as theorems, propositions, etc., as well as their proofs. Extracting them from the PDF representation of the articles requires understanding of scientific text along with visual and font-based indicators. We pose this problem as a multimodal classification problem using text, font features, and bitmap image rendering of the PDF as different modalities. In this paper we propose a multimodal machine learning approach for extraction of theorem-like environments and proofs, based on late fusion of features extracted by individual unimodal classifiers, taking into account the sequential succession of blocks in the document. For the text modality, we pretrain a new language model on a 11 GB scientific corpus; experiments shows similar performance for our task than a model (RoBERTa) pretrained on 160 GB, with faster convergence while requiring much less fine-tuning data. Font-based information relies on training a 128-cell LSTM on the sequence of font names and sizes within each block. Bitmap renderings are dealt with using an EfficientNetv2 deep network tuned to classify each image block. Finally, a simple CRF-based approach uses the features of the multimodal model along with information on block sequences. Experimental results show the benefits of using a multimodal approach vs any single modality, as well as major performance improvements using the CRF modeling of block sequences.
    摘要 学术论文在数学领域中经常包含数学陈述,如定理、命题等,以及其证明。从PDF文档中提取这些陈述和证明需要科学文本的理解以及视觉和字体指示器。我们将这个问题作为多modal分类问题来处理,使用文本、字体特征和Bitmap图像渲染作为不同的Modalities。在这篇论文中,我们提出了一种多modal机器学习方法,用于提取定理-like环境和证明,基于延迟融合多modal特征扩展的方法。文本模式下,我们预训练了一个新的语言模型,使用11GB的科学 corpus;实验表明,我们的任务性能与使用160GB的RoBERTa模型相似,而且具有更快的融合速度和需要的精度训练数据更少。基于字体信息,我们使用128个LSTM单元训练字体名称和大小序列内每个块的模型。Bitmap渲染方面,我们使用EfficientNetv2深度网络,对每个图像块进行分类。最后,我们使用一个简单的CRF模型,使用多modal模型的特征以及块序列信息。实验结果表明,使用多modal方法比使用单一模式有更大的优势,以及使用CRF模型对块序列信息的处理可以提高性能。
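The abstract describes late fusion of features from the text, font-sequence, and image classifiers before a CRF over block sequences. A minimal sketch of the fusion step is shown below; the feature dimensions and class set are assumptions, and the CRF stage is only indicated in a comment.

```python
import torch
import torch.nn as nn

class LateFusionBlockClassifier(nn.Module):
    """Concatenate per-block features from three unimodal encoders and classify
    each block (e.g., theorem-like environment / proof / other). Dimensions are
    illustrative."""
    def __init__(self, d_text=768, d_font=128, d_img=1280, n_classes=3):
        super().__init__()
        self.head = nn.Linear(d_text + d_font + d_img, n_classes)

    def forward(self, text_feat, font_feat, img_feat):
        # each input: (num_blocks, d_modality), features from the unimodal models
        fused = torch.cat([text_feat, font_feat, img_feat], dim=-1)
        return self.head(fused)   # per-block logits, to be fed to a CRF over the block sequence
```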

Emotional Intelligence of Large Language Models

  • paper_url: http://arxiv.org/abs/2307.09042
  • repo_url: None
  • paper_authors: Xuena Wang, Xueting Li, Zi Yin, Yue Wu, Liu Jia
  • for: 这项研究旨在评估大语言模型(LLMs)在情感智能方面的能力,以及这些模型与人类情感和价值观的对齐程度。
  • methods: 研究人员首先开发了一种新的心理测试,旨在评估大语言模型情感理解能力,这种测试包括识别、解释和理解复杂情感的能力。此外,研究人员还使用了一个参照框架, constructed from over 500 adults,来测试不同的主流大语言模型。
  • results: 研究结果表明,大多数大语言模型在情感理解方面的能力强于人类平均水平,其中GPT-4的情感智能指数(EQ)达到了89%的人类参与者水平。此外,一种多变量模式分析发现,一些大语言模型并没有采用人类类似的机制来实现人类水平的表现,其表达模式与人类有所不同。
    Abstract Large Language Models (LLMs) have demonstrated remarkable abilities across numerous disciplines, primarily assessed through tasks in language generation, knowledge utilization, and complex reasoning. However, their alignment with human emotions and values, which is critical for real-world applications, has not been systematically evaluated. Here, we assessed LLMs' Emotional Intelligence (EI), encompassing emotion recognition, interpretation, and understanding, which is necessary for effective communication and social interactions. Specifically, we first developed a novel psychometric assessment focusing on Emotion Understanding (EU), a core component of EI, suitable for both humans and LLMs. This test requires evaluating complex emotions (e.g., surprised, joyful, puzzled, proud) in realistic scenarios (e.g., despite feeling underperformed, John surprisingly achieved a top score). With a reference frame constructed from over 500 adults, we tested a variety of mainstream LLMs. Most achieved above-average EQ scores, with GPT-4 exceeding 89% of human participants with an EQ of 117. Interestingly, a multivariate pattern analysis revealed that some LLMs apparently did not reply on the human-like mechanism to achieve human-level performance, as their representational patterns were qualitatively distinct from humans. In addition, we discussed the impact of factors such as model size, training method, and architecture on LLMs' EQ. In summary, our study presents one of the first psychometric evaluations of the human-like characteristics of LLMs, which may shed light on the future development of LLMs aiming for both high intellectual and emotional intelligence. Project website: https://emotional-intelligence.github.io/
    摘要 大型语言模型(LLMs)已在众多领域展现出卓越能力,这些能力主要通过语言生成、知识利用和复杂推理等任务来评估;然而,它们与人类情感和价值观的对齐程度(这对实际应用至关重要)尚未得到系统评估。在本研究中,我们评估了LLMs的情感智能(EI),包括情感识别、解释和理解,这是有效沟通与社交互动所必需的能力。具体而言,我们首先开发了一种新的心理测量工具,聚焦于EI的核心组成部分之一的情感理解(EU),适用于人类和LLMs。该测试要求在真实情境中评估复杂情感(例如,尽管自觉表现不佳,John却意外取得了最高分)。我们以500多名成年人构建参照框架,测试了多种主流LLMs。大多数模型的EQ分数高于人类平均水平,其中GPT-4的EQ达到117,超过了89%的人类参与者。有趣的是,多变量模式分析表明,一些LLMs并非依赖类人机制来达到人类水平的表现,其表征模式与人类存在质的差异。此外,我们还讨论了模型规模、训练方法和架构等因素对LLMs的EQ的影响。总之,本研究提供了对LLMs类人特性的首批心理测量评估之一,有望为未来兼具高智力与高情感智能的LLMs的发展提供启示。项目网站:https://emotional-intelligence.github.io/
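The abstract reports EQ scores (e.g., GPT-4 at 117) relative to a reference frame of 500+ adults. One common way to place a raw test score on such a scale is an IQ-style standardization (mean 100, SD 15) against the reference distribution; whether the authors use exactly this scaling is an assumption, and the snippet below only illustrates the idea.

```python
import numpy as np

def eq_score(raw, reference_raw_scores, mean=100.0, sd=15.0):
    """Standardize a raw Emotion Understanding score against a human reference
    sample, IQ-style (assumed scaling, for illustration only)."""
    ref = np.asarray(reference_raw_scores, dtype=float)
    z = (raw - ref.mean()) / ref.std(ddof=1)
    score = mean + sd * z
    percentile = 100.0 * (ref < raw).mean()   # share of the reference sample scored below
    return score, percentile
```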

PromptMagician: Interactive Prompt Engineering for Text-to-Image Creation

  • paper_url: http://arxiv.org/abs/2307.09036
  • repo_url: https://github.com/yingchaojiefeng/promptmagician
  • paper_authors: Yingchaojie Feng, Xingbo Wang, Kam Kwai Wong, Sijia Wang, Yuhong Lu, Minfeng Zhu, Baicheng Wang, Wei Chen
  • for: 该论文旨在提供一种可视化分析系统,帮助用户探索与输入提示相关的图像结果,并进行个性化的提示细化。
  • methods: 该系统基于一种提示建议模型,通过对用户提示进行分析,从DiffusionDB中检索相似的提示-图像对,并将特殊提示关键词标出。系统还提供了多级可视化工具,以便用户在跨模态空间中进行交互式的提示细化。
  • results: 两个使用场景、一个用户研究和专家采访表明,PromptMagician系统能够有效地支持用户在生成文本到图像模型中进行提示工程,并且提高了这种创造支持的效果。
    Abstract Generative text-to-image models have gained great popularity among the public for their powerful capability to generate high-quality images based on natural language prompts. However, developing effective prompts for desired images can be challenging due to the complexity and ambiguity of natural language. This research proposes PromptMagician, a visual analysis system that helps users explore the image results and refine the input prompts. The backbone of our system is a prompt recommendation model that takes user prompts as input, retrieves similar prompt-image pairs from DiffusionDB, and identifies special (important and relevant) prompt keywords. To facilitate interactive prompt refinement, PromptMagician introduces a multi-level visualization for the cross-modal embedding of the retrieved images and recommended keywords, and supports users in specifying multiple criteria for personalized exploration. Two usage scenarios, a user study, and expert interviews demonstrate the effectiveness and usability of our system, suggesting it facilitates prompt engineering and improves the creativity support of the generative text-to-image model.
    摘要 文本到图像生成模型因能够根据自然语言提示生成高质量图像而广受欢迎,但由于自然语言的复杂性和歧义性,为期望的图像编写有效提示并不容易。本研究提出了PromptMagician,一个可视分析系统,帮助用户探索生成的图像结果并细化输入提示。系统的核心是一个提示推荐模型,它以用户提示为输入,从DiffusionDB中检索相似的提示-图像对,并识别其中特殊(重要且相关)的提示关键词。为支持交互式的提示细化,PromptMagician对检索到的图像与推荐关键词的跨模态嵌入进行多层级可视化,并支持用户指定多个标准进行个性化探索。两个使用场景、一项用户研究和专家访谈表明,PromptMagician有效且易用,能够促进提示工程并提升生成式文本到图像模型的创作支持。
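PromptMagician retrieves similar prompt-image pairs from DiffusionDB for a user prompt. The retrieval model itself is not specified in the abstract, so the sketch below shows a generic embedding-based nearest-neighbor lookup; the embedding source and variable names are assumptions.

```python
import numpy as np

def retrieve_similar_prompts(query_emb, db_embs, db_prompts, k=5):
    """Return the k DiffusionDB prompts most similar to the user's prompt.

    query_emb: (D,) embedding of the user prompt
    db_embs:   (N, D) precomputed embeddings of database prompts
    """
    q = query_emb / np.linalg.norm(query_emb)
    db = db_embs / np.linalg.norm(db_embs, axis=1, keepdims=True)
    sims = db @ q
    top = np.argsort(sims)[::-1][:k]
    return [(db_prompts[i], float(sims[i])) for i in top]
```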

Exploring acceptance of autonomous vehicle policies using KeyBERT and SNA: Targeting engineering students

  • paper_url: http://arxiv.org/abs/2307.09014
  • repo_url: https://github.com/wangbuera/f230778c-6292-4e4f-97ab-6edac0901476
  • paper_authors: Jinwoo Ha, Dongsoo Kim
  • for: 本研究旨在探讨自动驾驶车(AV)政策的用户接受度,通过提升文本挖掘方法来填补政策制定者未充分考虑的用户需求。
  • methods: 本研究采用了两种文本挖掘方法:一是基于TF-IWF和Dice系数的共现网络分析(CNA),另一是基于KeyBERT关键词提取和双重余弦相似度的上下文语义网络分析(C-SNA)。
  • results: 结果表明,与CNA相比,C-SNA使用更少的节点和特征就能提供理解用户声音所需的信息。基于工程素养和给定文本而预先理解AV政策的用户,揭示了AV事故相关政策的潜在风险。本研究据此提出了管理这些风险的建议,以支持AV在公共道路上的成功部署。
    Abstract This study aims to explore user acceptance of Autonomous Vehicle (AV) policies with improved text-mining methods. Recently, South Korean policymakers have viewed Autonomous Driving Car (ADC) and Autonomous Driving Robot (ADR) as next-generation means of transportation that will reduce the cost of transporting passengers and goods. They support the construction of V2I and V2V communication infrastructures for ADC and recognize that ADR is equivalent to pedestrians to promote its deployment into sidewalks. To fill the gap where end-user acceptance of these policies is not well considered, this study applied two text-mining methods to the comments of graduate students in the fields of Industrial, Mechanical, and Electronics-Electrical-Computer. One is the Co-occurrence Network Analysis (CNA) based on TF-IWF and Dice coefficient, and the other is the Contextual Semantic Network Analysis (C-SNA) based on both KeyBERT, which extracts keywords that contextually represent the comments, and double cosine similarity. The reason for comparing these approaches is to balance interest not only in the implications for the AV policies but also in the need to apply quality text mining to this research domain. Significantly, the limitation of frequency-based text mining, which does not reflect textual context, and the trade-off of adjusting thresholds in Semantic Network Analysis (SNA) were considered. As the results of comparing the two approaches, the C-SNA provided the information necessary to understand users' voices using fewer nodes and features than the CNA. The users who pre-emptively understood the AV policies based on their engineering literacy and the given texts revealed potential risks of the AV accident policies. This study adds suggestions to manage these risks to support the successful deployment of AVs on public roads.
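The methods bullet refers to a co-occurrence network built with TF-IWF weighting and the Dice coefficient. As a small illustration of the Dice part only (the TF-IWF weighting and network construction are omitted), the following computes the Dice coefficient of two keywords over tokenized comments.

```python
def dice_coefficient(docs, w1, w2):
    """Dice coefficient between two keywords over a list of tokenized comments:
    2 * |docs containing both| / (|docs with w1| + |docs with w2|)."""
    a = sum(1 for d in docs if w1 in d)
    b = sum(1 for d in docs if w2 in d)
    both = sum(1 for d in docs if w1 in d and w2 in d)
    return 2 * both / (a + b) if (a + b) else 0.0

comments = [["autonomous", "vehicle", "policy"],
            ["vehicle", "accident", "risk"],
            ["autonomous", "vehicle", "accident"]]
print(dice_coefficient(comments, "vehicle", "accident"))   # 2*2/(3+2) = 0.8
```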

How is ChatGPT’s behavior changing over time?

  • paper_url: http://arxiv.org/abs/2307.09009
  • repo_url: https://github.com/lchen001/llmdrift
  • paper_authors: Lingjiao Chen, Matei Zaharia, James Zou
  • for: The paper evaluates how the performance and behavior of GPT-3.5 and GPT-4 change over time, comparing the March 2023 and June 2023 versions.
  • methods: The authors evaluate the models on several diverse tasks: math problems, sensitive/dangerous questions, opinion surveys, multi-hop knowledge-intensive questions, code generation, US Medical License tests, and visual reasoning.
  • results: The performance and behavior of both GPT-3.5 and GPT-4 vary greatly over time. GPT-4 (June 2023) performed much worse than GPT-4 (March 2023) at identifying prime vs. composite numbers, and became less willing to answer sensitive and opinion-survey questions. GPT-4 improved at multi-hop questions in June while GPT-3.5's performance dropped, and both models made more formatting mistakes in code generation in June than in March.
    Abstract GPT-3.5 and GPT-4 are the two most widely used large language model (LLM) services. However, when and how these models are updated over time is opaque. Here, we evaluate the March 2023 and June 2023 versions of GPT-3.5 and GPT-4 on several diverse tasks: 1) math problems, 2) sensitive/dangerous questions, 3) opinion surveys, 4) multi-hop knowledge-intensive questions, 5) generating code, 6) US Medical License tests, and 7) visual reasoning. We find that the performance and behavior of both GPT-3.5 and GPT-4 can vary greatly over time. For example, GPT-4 (March 2023) was reasonable at identifying prime vs. composite numbers (84% accuracy) but GPT-4 (June 2023) was poor on these same questions (51% accuracy). This is partly explained by a drop in GPT-4's amenity to follow chain-of-thought prompting. Interestingly, GPT-3.5 was much better in June than in March in this task. GPT-4 became less willing to answer sensitive questions and opinion survey questions in June than in March. GPT-4 performed better at multi-hop questions in June than in March, while GPT-3.5's performance dropped on this task. Both GPT-4 and GPT-3.5 had more formatting mistakes in code generation in June than in March. Overall, our findings show that the behavior of the "same" LLM service can change substantially in a relatively short amount of time, highlighting the need for continuous monitoring of LLMs.
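As a concrete illustration of the kind of longitudinal evaluation the paper performs, the snippet below scores stored model answers on the prime-vs-composite task against ground truth and compares two snapshots. The answer lists are placeholders, not the paper's data, and no API calls are made.

```python
from sympy import isprime

def prime_task_accuracy(numbers, answers):
    """answers[i] is the model's 'prime' / 'composite' answer for numbers[i]."""
    correct = 0
    for n, a in zip(numbers, answers):
        truth = "prime" if isprime(n) else "composite"
        correct += (a.strip().lower() == truth)
    return correct / len(numbers)

numbers = [7, 9, 11, 15, 17]
march_answers = ["prime", "composite", "prime", "composite", "prime"]   # placeholder logs
june_answers  = ["prime", "prime", "composite", "composite", "prime"]
print(prime_task_accuracy(numbers, march_answers),   # 1.0
      prime_task_accuracy(numbers, june_answers))    # 0.6
```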

Ord2Seq: Regarding Ordinal Regression as Label Sequence Prediction

  • paper_url: http://arxiv.org/abs/2307.09004
  • repo_url: None
  • paper_authors: Jinhong Wang, Yi Cheng, Jintai Chen, Tingting Chen, Danny Chen, Jian Wu
  • for: 这篇论文主要针对的是ordinal regression问题,即将对象实例分类到ordinal类别中。
  • methods: 这篇论文提出了一种简单的序列预测框架 Ord2Seq,它将每个ordinal类别标签转换成特殊标签序列,从而将ordinal regression任务转化为一个序列预测任务。
  • results: 实验表明,Ord2Seq可以很好地 distinguishing adjacent categories,并且在四个不同的场景中超过了当前state-of-the-art的性能。
    Abstract Ordinal regression refers to classifying object instances into ordinal categories. It has been widely studied in many scenarios, such as medical disease grading, movie rating, etc. Known methods focused only on learning inter-class ordinal relationships, but still incur limitations in distinguishing adjacent categories thus far. In this paper, we propose a simple sequence prediction framework for ordinal regression called Ord2Seq, which, for the first time, transforms each ordinal category label into a special label sequence and thus regards an ordinal regression task as a sequence prediction process. In this way, we decompose an ordinal regression task into a series of recursive binary classification steps, so as to subtly distinguish adjacent categories. Comprehensive experiments show the effectiveness of distinguishing adjacent categories for performance improvement and our new approach exceeds state-of-the-art performances in four different scenarios. Codes are available at https://github.com/wjh892521292/Ord2Seq.
    摘要 ordinal 回归指的是将对象实例分类到ordinal类别中。它在许多场景中受到广泛研究,如医疗疾病等级分类、电影评分等。知名的方法只关注学习间类关系,但然而仍然存在区分邻Category的限制。在这篇论文中,我们提出了一种简单的序列预测框架 для ordinal 回归,称为Ord2Seq,它将每个ordinal类别标签转化为特殊标签序列,从而将ordinal 回归任务转化为一个序列预测过程。这样,我们将ordinal 回归任务分解成一系列的回归binary分类步骤,以便细腻地区分邻Category。经过全面的实验,我们发现可以明显提高区分邻Category的性能,并且我们的新方法在四个不同的场景中超越了当前最佳性能。代码可以在https://github.com/wjh892521292/Ord2Seq中找到。
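Ord2Seq's core idea is to turn each ordinal label into a special label sequence so that ordinal regression becomes a series of recursive binary classifications. The abstract does not spell out the encoding, so the sketch below uses a binary-search-style dichotomy as one plausible instantiation; the paper's exact label sequences may differ.

```python
def ordinal_to_sequence(label, lo, hi):
    """Encode an ordinal label in [lo, hi] as a sequence of binary decisions
    (0 = 'lower half', 1 = 'upper half'), binary-search style. Illustrative
    encoding only."""
    seq = []
    while lo < hi:
        mid = (lo + hi) // 2
        if label <= mid:
            seq.append(0)
            hi = mid
        else:
            seq.append(1)
            lo = mid + 1
    return seq

# 8 ordinal grades 0..7 -> every label becomes a length-3 decision sequence
for k in range(8):
    print(k, ordinal_to_sequence(k, 0, 7))
```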

EVIL: Evidential Inference Learning for Trustworthy Semi-supervised Medical Image Segmentation

  • paper_url: http://arxiv.org/abs/2307.08988
  • repo_url: None
  • paper_authors: Yingyu Chen, Ziyuan Yang, Chenyu Shen, Zhiwen Wang, Yang Qin, Yi Zhang
  • for: 提高 semi-supervised medical image segmentation 的准确性和可靠性,并提供 theoretically guaranteed 的解决方案。
  • methods: 基于 Dempster-Shafer Theory of Evidence (DST) 的 Evidential Inference Learning (EVIL) 方法,通过在一次前进中实现准确性量化和不确定性评估,并采用 consistency regularization-based 的训练方法来提高泛化能力。
  • results: 在公共数据集上实验表明,EVIL 方法与一些状态顶方法相比,具有竞争性的表现,并且可以提供 trustworthy pseudo labels on unlabeled data。
    Abstract Recently, uncertainty-aware methods have attracted increasing attention in semi-supervised medical image segmentation. However, current methods usually suffer from the drawback that it is difficult to balance the computational cost, estimation accuracy, and theoretical support in a unified framework. To alleviate this problem, we introduce the Dempster-Shafer Theory of Evidence (DST) into semi-supervised medical image segmentation, dubbed Evidential Inference Learning (EVIL). EVIL provides a theoretically guaranteed solution to infer accurate uncertainty quantification in a single forward pass. Trustworthy pseudo labels on unlabeled data are generated after uncertainty estimation. The recently proposed consistency regularization-based training paradigm is adopted in our framework, which enforces the consistency on the perturbed predictions to enhance the generalization with few labeled data. Experimental results show that EVIL achieves competitive performance in comparison with several state-of-the-art methods on the public dataset.
    摘要 最近,uncertainty-aware方法在半supervised医学图像分割中受到了越来越多的关注。然而,当前方法通常受到难以平衡计算成本、估计准确性和理论支持的问题。为了解决这个问题,我们在半supervised医学图像分割中引入了Dempster-Shafer理论(DST),称之为Evidential Inference Learning(EVIL)。EVIL提供了一个在单个前进 pass中 theoretically guarantee 的解决方案,以便Quantification of uncertainty accurate。在不经过标注数据的基础上生成可靠的 Pseudolabel 。我们的框架中采用了最近提出的一致Regularization-based training paradigm,以便在具有少量标注数据的情况下进行加强通用性。实验结果显示,EVIL在比较一些国际前沿方法的公共数据集上实现了竞争性的表现。
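EVIL builds on the Dempster-Shafer theory to obtain uncertainty in a single forward pass. A standard evidential-learning formulation (subjective logic) treats non-negative per-class evidence as Dirichlet parameters and reads off belief masses and an uncertainty mass; the sketch below assumes that formulation and is not necessarily the paper's exact parameterization.

```python
import torch
import torch.nn.functional as F

def evidential_outputs(logits):
    """Per-pixel belief masses and uncertainty from raw network outputs,
    following the usual subjective-logic parameterization.

    logits: (B, K, H, W) raw outputs for K classes.
    """
    evidence = F.softplus(logits)            # non-negative evidence
    alpha = evidence + 1.0                   # Dirichlet parameters
    S = alpha.sum(dim=1, keepdim=True)       # Dirichlet strength
    belief = evidence / S                    # per-class belief masses
    uncertainty = logits.shape[1] / S        # u = K / S, in (0, 1]
    prob = alpha / S                         # expected class probabilities
    return belief, uncertainty, prob
```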

AI-assisted Improved Service Provisioning for Low-latency XR over 5G NR

  • paper_url: http://arxiv.org/abs/2307.08987
  • repo_url: None
  • paper_authors: Moyukh Laha, Dibbendu Roy, Sourav Dutta, Goutam Das
  • for: This paper aims to address the challenges of ensuring low latency, high data rate, and reliability in supporting Extended Reality (XR) services in 5G/6G networks.
  • methods: The proposed AI-assisted service provisioning scheme leverages predicted frames for processing, virtually increasing the network delay budget and improving service provisioning.
  • results: The proposed scheme is validated by extensive simulations, demonstrating a multi-fold increase in supported XR users and providing crucial network design insights.
  • for: 本文旨在解决5G/6G网络中支持扩展现实(XR)服务的低延迟、高数据速率和可靠性问题。
  • methods: 该方案利用预测帧进行处理,虚拟增加网络延迟预算,提高服务提供。
  • results: 该方案通过大量仿真验证,可使所支持的XR用户数成倍增加,并提供了重要的网络设计见解。
    Abstract Extended Reality (XR) is one of the most important 5G/6G media applications that will fundamentally transform human interactions. However, ensuring low latency, high data rate, and reliability to support XR services poses significant challenges. This letter presents a novel AI-assisted service provisioning scheme that leverages predicted frames for processing rather than relying solely on actual frames. This method virtually increases the network delay budget and consequently improves service provisioning, albeit at the expense of minor prediction errors. The proposed scheme is validated by extensive simulations demonstrating a multi-fold increase in supported XR users and also provides crucial network design insights.
    摘要 扩展现实(XR)是最重要的5G/6G媒体应用之一,将从根本上改变人类的互动方式。然而,确保低延迟、高数据速率和可靠性以支持XR服务面临巨大挑战。本文提出了一种基于预测帧的AI辅助服务供给方案,处理时不再完全依赖实际帧。这种方法在虚拟上增加了网络延迟预算,从而改善服务供给,但代价是少量的预测误差。所提方案经过大量仿真验证,结果表明可支持的XR用户数成倍增加,并提供了重要的网络设计见解。
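The key accounting behind the scheme is that processing predicted frames effectively stretches the network delay budget by the prediction horizon. A toy calculation of this effect is shown below; the 60 fps frame interval and the numbers are illustrative, not taken from the paper.

```python
def effective_delay_budget(network_budget_ms, frames_ahead, frame_interval_ms=1000 / 60):
    """With frames predicted `frames_ahead` into the future, the scheduler can
    tolerate roughly that much extra latency (illustrative accounting, 60 fps assumed)."""
    return network_budget_ms + frames_ahead * frame_interval_ms

print(effective_delay_budget(10.0, frames_ahead=2))  # a 10 ms budget stretched to ~43.3 ms
```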

PromptCrafter: Crafting Text-to-Image Prompt through Mixed-Initiative Dialogue with LLM

  • paper_url: http://arxiv.org/abs/2307.08985
  • repo_url: None
  • paper_authors: Seungho Baek, Hyerin Im, Jiseung Ryu, Juhyeong Park, Takyeon Lee
  • for: 这篇论文旨在提出一种新的混合式系统,帮助用户efficiently探索模型的能力和创建有效的提示。
  • methods: 该系统使用了步骤式的crafting方法,让用户可以逐步地制定文本到图像提示,并通过答复各种问题来细化意图。
  • results: 该系统可以帮助用户快速探索模型的能力,并帮助用户创建有效的提示,从而提高了用户的使用体验。
    Abstract Text-to-image generation model is able to generate images across a diverse range of subjects and styles based on a single prompt. Recent works have proposed a variety of interaction methods that help users understand the capabilities of models and utilize them. However, how to support users to efficiently explore the model's capability and to create effective prompts are still open-ended research questions. In this paper, we present PromptCrafter, a novel mixed-initiative system that allows step-by-step crafting of text-to-image prompt. Through the iterative process, users can efficiently explore the model's capability, and clarify their intent. PromptCrafter also supports users to refine prompts by answering various responses to clarifying questions generated by a Large Language Model. Lastly, users can revert to a desired step by reviewing the work history. In this workshop paper, we discuss the design process of PromptCrafter and our plans for follow-up studies.
    摘要 文本到图像生成模型能够根据单个提示生成主题和风格多样的图像。近期工作提出了多种交互方法,帮助用户理解并利用模型的能力;然而,如何支持用户高效地探索模型能力并编写有效提示,仍是开放的研究问题。在这篇论文中,我们介绍PromptCrafter,一种新颖的混合主动(mixed-initiative)系统,支持用户逐步构建文本到图像提示。通过迭代过程,用户可以高效地探索模型的能力并澄清自己的意图。PromptCrafter还支持用户通过回答由大语言模型生成的各种澄清问题来细化提示。最后,用户可以通过查看工作历史回退到所需的步骤。在这篇研讨会论文中,我们讨论了PromptCrafter的设计过程以及后续研究计划。

Generative Visual Question Answering

  • paper_url: http://arxiv.org/abs/2307.10405
  • repo_url: https://github.com/chojw/genb
  • paper_authors: Ethan Shen, Scotty Singh, Bhavesh Kumar
  • for: 该论文目的是要研究多 modal 任务中的视觉语言深度学习模型在未来数据分布下的一致性,并提出一个可行的方法来创建一个高级的视觉问题回答(VQA)模型,以便在未来数据分布下获得成功的结果。
  • methods: 该论文使用了七种基eline和进步的VQA模型,并将这些模型应用在一个新的扩展 dataset 上,named GenVQA,这个 dataset 使用了 VQAv2 和 MS-COCO dataset 中的图像和描述来生成新的图像,并使用稳定扩展来测试这些模型的一致性。
  • results: 研究发现,这些成功的 VQA 模型在未来数据分布下的一致性较差,但是通过分析这些模型的架构,发现了一些常见的设计选择,可以帮助这些模型在未来数据分布下优化一致性。
    Abstract Multi-modal tasks involving vision and language in deep learning continue to rise in popularity and are leading to the development of newer models that can generalize beyond the extent of their training data. The current models lack temporal generalization which enables models to adapt to changes in future data. This paper discusses a viable approach to creating an advanced Visual Question Answering (VQA) model which can produce successful results on temporal generalization. We propose a new data set, GenVQA, utilizing images and captions from the VQAv2 and MS-COCO dataset to generate new images through stable diffusion. This augmented dataset is then used to test a combination of seven baseline and cutting edge VQA models. Performance evaluation focuses on questions mirroring the original VQAv2 dataset, with the answers having been adjusted to the new images. This paper's purpose is to investigate the robustness of several successful VQA models to assess their performance on future data distributions. Model architectures are analyzed to identify common stylistic choices that improve generalization under temporal distribution shifts. This research highlights the importance of creating a large-scale future shifted dataset. This data can enhance the robustness of VQA models, allowing their future peers to have improved ability to adapt to temporal distribution shifts.
    摘要 多modal任务涉及视觉和语言在深度学习中继续升温,并且导致 newer模型可以泛化到训练数据之外。现有模型缺乏时间泛化能力,使得模型能够适应未来数据的变化。这篇论文提出了创建高级Visual Question Answering(VQA)模型的可能方法,以便在未来数据分布下取得成功。我们提出了一个新的数据集,GenVQA,使用VQAv2和MS-COCO图像和caption生成新图像,并通过稳定扩散来测试七种基eline和潮流VQA模型。性能评估专注于原VQAv2数据集中的问题,并将答案调整到新图像中。本文的目的是investigate Several successful VQA模型的Robustness,以评估它们在未来数据分布下的性能。模型架构的分析,找到了一些通用的风格选择,可以提高模型在时间分布shift下的泛化能力。这种研究强调了创建大规模未来偏移数据的重要性,以提高VQA模型的未来几代的适应能力。
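GenVQA is built by generating new images from VQAv2/MS-COCO-style captions with stable diffusion. A minimal sketch of such a generation loop with the Hugging Face diffusers pipeline is shown below; the model checkpoint, sampling parameters, and captions are illustrative, and the paper's exact generation setup is not specified in the abstract. A GPU is assumed.

```python
# pip install diffusers transformers torch
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

captions = [
    "A man riding a skateboard down a city street",   # example COCO-style caption
    "Two dogs playing with a frisbee in a park",
]
for i, caption in enumerate(captions):
    image = pipe(caption, num_inference_steps=30, guidance_scale=7.5).images[0]
    image.save(f"genvqa_sample_{i}.png")
```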

Development of the ChatGPT, Generative Artificial Intelligence and Natural Large Language Models for Accountable Reporting and Use (CANGARU) Guidelines

  • paper_url: http://arxiv.org/abs/2307.08974
  • repo_url: None
  • paper_authors: Giovanni E. Cacciamani, Michael B. Eppler, Conner Ganjavi, Asli Pekan, Brett Biedermann, Gary S. Collins, Inderbir S. Gill
  • for: 本研究旨在形成一个跨学科、全球包容的共识,规范学术研究中对生成式人工智能(GAI)/生成式预训练变换器(GPT)/大语言模型(LLM)技术的负责任使用、披露和报告。
  • methods: 本研究对GAI/GPT/LLM的应用进行持续的系统综述,以理解学术研究中的相关概念、发现和报告标准,并据此制定使用与披露指南;同时对提及GAI/GPT/LLM的期刊作者指南进行文献计量分析,评估现有指南、分析其建议之间的差异,并通过Delphi调查就指南条目达成共识。
  • results: 本研究通过系统综述和Delphi调查建立全球包容的共识指南,用以保障GAI/GPT/LLM技术在学术研究中的负责任使用、披露和报告,随后将发布并推广最终指南及其配套的解释与阐述文件。
    Abstract The swift progress and ubiquitous adoption of Generative AI (GAI), Generative Pre-trained Transformers (GPTs), and large language models (LLMs) like ChatGPT, have spurred queries about their ethical application, use, and disclosure in scholarly research and scientific productions. A few publishers and journals have recently created their own sets of rules; however, the absence of a unified approach may lead to a 'Babel Tower Effect,' potentially resulting in confusion rather than desired standardization. In response to this, we present the ChatGPT, Generative Artificial Intelligence, and Natural Large Language Models for Accountable Reporting and Use Guidelines (CANGARU) initiative, with the aim of fostering a cross-disciplinary global inclusive consensus on the ethical use, disclosure, and proper reporting of GAI/GPT/LLM technologies in academia. The present protocol consists of four distinct parts: a) an ongoing systematic review of GAI/GPT/LLM applications to understand the linked ideas, findings, and reporting standards in scholarly research, and to formulate guidelines for its use and disclosure, b) a bibliometric analysis of existing author guidelines in journals that mention GAI/GPT/LLM, with the goal of evaluating existing guidelines, analyzing the disparity in their recommendations, and identifying common rules that can be brought into the Delphi consensus process, c) a Delphi survey to establish agreement on the items for the guidelines, ensuring principled GAI/GPT/LLM use, disclosure, and reporting in academia, and d) the subsequent development and dissemination of the finalized guidelines and their supplementary explanation and elaboration documents.

Landscape Surrogate: Learning Decision Losses for Mathematical Optimization Under Partial Information

  • paper_url: http://arxiv.org/abs/2307.08964
  • repo_url: https://github.com/facebookresearch/lancer
  • paper_authors: Arman Zharmagambetov, Brandon Amos, Aaron Ferber, Taoan Huang, Bistra Dilkina, Yuandong Tian
  • for: 这 paper 是为了解决部分观察或通用优化器无需专家调整时的优化问题而写的。
  • methods: 该 paper 使用学习优化器 $\mathbf{g}$ 来解决这些复杂问题,并使用知识优化解 $\mathbf{g}$ 来监督学习。
  • results: 该 paper 测试了该方法在synthetic问题和实际问题上,并得到了与当前基准相当或更好的目标值,同时减少了 $\mathbf{g}$ 的调用次数。特别是,该方法在高维ensional问题上表现出优于现有方法。
    Abstract Recent works in learning-integrated optimization have shown promise in settings where the optimization problem is only partially observed or where general-purpose optimizers perform poorly without expert tuning. By learning an optimizer $\mathbf{g}$ to tackle these challenging problems with $f$ as the objective, the optimization process can be substantially accelerated by leveraging past experience. The optimizer can be trained with supervision from known optimal solutions or implicitly by optimizing the compound function $f\circ \mathbf{g}$. The implicit approach may not require optimal solutions as labels and is capable of handling problem uncertainty; however, it is slow to train and deploy due to frequent calls to optimizer $\mathbf{g}$ during both training and testing. The training is further challenged by sparse gradients of $\mathbf{g}$, especially for combinatorial solvers. To address these challenges, we propose using a smooth and learnable Landscape Surrogate $M$ as a replacement for $f\circ \mathbf{g}$. This surrogate, learnable by neural networks, can be computed faster than the solver $\mathbf{g}$, provides dense and smooth gradients during training, can generalize to unseen optimization problems, and is efficiently learned via alternating optimization. We test our approach on both synthetic problems, including shortest path and multidimensional knapsack, and real-world problems such as portfolio optimization, achieving comparable or superior objective values compared to state-of-the-art baselines while reducing the number of calls to $\mathbf{g}$. Notably, our approach outperforms existing methods for computationally expensive high-dimensional problems.
    摘要 现代学习整合优化方法已经在部分观察到的优化问题或通用优化器无需专家调整时表现出了搭配性。通过学习一个优化器 $\mathbf{g}$ 以解决这些复杂的问题,优化过程可以得到加速。 $\mathbf{g}$ 可以在知道优化解的情况下进行监督学习,或者通过优化函数 $f\circ \mathbf{g}$ 进行启发式学习。后者可以不需要优化解作为标签,并且能够处理问题不确定性。然而,这种方法的训练和部署可能需要频繁地调用优化器 $\mathbf{g}$,这会导致训练和部署变慢。此外,$\mathbf{g}$ 的迭代次数可能会增加,尤其是在使用分解优化器时。为解决这些挑战,我们提议使用一个可学习的地形函数 $M$ 作为 $\mathbf{g}$ 的替换。这个函数可以通过神经网络学习,在训练中提供稠密和平滑的梯度,可以泛化到未看过的优化问题,并通过交互式优化快速学习。我们在 synthetic 问题和实际问题上进行测试,比如最优投资等,实现了与当前标准基准相当或更高的目标函数值,同时减少了对 $\mathbf{g}$ 的调用次数。尤其是在高维ensional Computationally 昂贵的问题上,我们的方法表现出了优于现有方法。
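The central idea is to replace the expensive, hard-to-differentiate composition f∘g with a smooth learned surrogate M and alternate between fitting M and updating the predictive model through it. The toy sketch below illustrates that alternation; the architectures, dimensions, and training-loop details are assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

surrogate = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 1))
opt_m = torch.optim.Adam(surrogate.parameters(), lr=1e-3)

def fit_surrogate(y_batch, decision_losses):
    """Regress the surrogate M(y) onto decision losses f(g(y)) computed offline
    by the (expensive, non-differentiable) solver."""
    pred = surrogate(y_batch).squeeze(-1)
    loss = nn.functional.mse_loss(pred, decision_losses)
    opt_m.zero_grad(); loss.backward(); opt_m.step()
    return loss.item()

def surrogate_grad_step(predictor, x_batch, opt_pred):
    """Update the predictive model using the surrogate's smooth gradient
    instead of differentiating through the solver g."""
    y_hat = predictor(x_batch)                 # predicted problem parameters
    loss = surrogate(y_hat).mean()             # smooth stand-in for f(g(y_hat))
    opt_pred.zero_grad(); loss.backward(); opt_pred.step()
    return loss.item()
```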

REX: Rapid Exploration and eXploitation for AI Agents

  • paper_url: http://arxiv.org/abs/2307.08962
  • repo_url: None
  • paper_authors: Rithesh Murthy, Shelby Heinecke, Juan Carlos Niebles, Zhiwei Liu, Le Xue, Weiran Yao, Yihao Feng, Zeyuan Chen, Akash Gokul, Devansh Arpit, Ran Xu, Phil Mui, Huan Wang, Caiming Xiong, Silvio Savarese
  • for: REX is proposed to address the limitations of existing AutoGPT-style techniques in decision-making and to improve the efficiency and practicality of AI agent performance.
  • methods: REX introduces an additional layer of rewards and integrates concepts similar to Upper Confidence Bound (UCB) scores to enhance AI agent performance. It also utilizes offline behaviors from logs and does not require any model fine-tuning.
  • results: REX-based methods demonstrate comparable performance with existing methods such as Chain-of-Thoughts (CoT) and Reasoning viA Planning (RAP), in certain cases surpassing them, and exhibit remarkable reductions in execution time, making them more practical and efficient in a diverse range of scenarios.
    Abstract In this paper, we propose an enhanced approach for Rapid Exploration and eXploitation for AI Agents called REX. Existing AutoGPT-style techniques have inherent limitations, such as a heavy reliance on precise descriptions for decision-making, and the lack of a systematic approach to leverage try-and-fail procedures akin to traditional Reinforcement Learning (RL). REX introduces an additional layer of rewards and integrates concepts similar to Upper Confidence Bound (UCB) scores, leading to more robust and efficient AI agent performance. This approach has the advantage of enabling the utilization of offline behaviors from logs and allowing seamless integration with existing foundation models while it does not require any model fine-tuning. Through comparative analysis with existing methods such as Chain-of-Thoughts(CoT) and Reasoning viA Planning(RAP), REX-based methods demonstrate comparable performance and, in certain cases, even surpass the results achieved by these existing techniques. Notably, REX-based methods exhibit remarkable reductions in execution time, enhancing their practical applicability across a diverse set of scenarios.
    摘要 在这篇论文中,我们提出了一种改进后的快速探索和尝试(Rapid Exploration and eXploitation,REX)方法,用于AI代理。现有的AutoGPT类型技术存在着内置的局限性,如决策过程中的精确描述过重,以及缺乏传统强化学习(Reinforcement Learning,RL)中的系统化尝试和失败处理机制。REX增加了一层奖励,并integrated Upper Confidence Bound(UCB)类概念,从而提高AI代理的稳定性和效率。这种方法可以利用日志中的offline行为,并可以与现有基础模型无需任何模型微调进行集成。与现有方法如Chain-of-Thoughts(CoT)和Reasoning viA Planning(RAP)进行比较分析,REX基于方法在相同的场景下达到了相当的表现水平,甚至在某些情况下超越了现有技术的结果。另外,REX基于方法在执行时间方面也表现出了很大的改善,从而提高了其在多样化场景下的实际应用性。
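REX layers additional rewards and UCB-like scores over candidate actions to balance exploration and exploitation. The abstract does not give the exact formula, so the sketch below uses the classic UCB1 score as a stand-in for the flavor of the selection rule.

```python
import math

def ucb_score(total_reward, visits, total_visits, c=1.4):
    """UCB1-style score for a candidate action/plan node; unvisited nodes get
    priority. This mirrors the spirit of REX's exploration bonus, not its exact formula."""
    if visits == 0:
        return float("inf")
    return total_reward / visits + c * math.sqrt(math.log(total_visits) / visits)

def select_action(stats):
    """stats: {action: (total_reward, visits)}"""
    total = sum(v for _, v in stats.values()) or 1
    return max(stats, key=lambda a: ucb_score(stats[a][0], stats[a][1], total))

print(select_action({"expand_plan": (2.0, 3), "retry_tool": (1.0, 1), "new_idea": (0.0, 0)}))
```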

Siamese Networks for Weakly Supervised Human Activity Recognition

  • paper_url: http://arxiv.org/abs/2307.08944
  • repo_url: None
  • paper_authors: Taoran Sheng, Manfred Huber
  • for: 人体活动识别
  • methods: 使用多个siamesenet,无需明确标签数据进行训练
  • results: 可以作为各种不同的聚类算法的度量,并在三个数据集上进行评估,以验证其效果性。
    Abstract Deep learning has been successfully applied to human activity recognition. However, training deep neural networks requires explicitly labeled data which is difficult to acquire. In this paper, we present a model with multiple siamese networks that are trained by using only the information about the similarity between pairs of data samples without knowing the explicit labels. The trained model maps the activity data samples into fixed size representation vectors such that the distance between the vectors in the representation space approximates the similarity of the data samples in the input space. Thus, the trained model can work as a metric for a wide range of different clustering algorithms. The training process minimizes a similarity loss function that forces the distance metric to be small for pairs of samples from the same kind of activity, and large for pairs of samples from different kinds of activities. We evaluate the model on three datasets to verify its effectiveness in segmentation and recognition of continuous human activity sequences.
    摘要 深度学习已经成功应用于人类活动识别。然而,训练深度神经网络需要显式标注数据,这是困难的获得。在这篇论文中,我们提出了一种使用多个对称网络训练的模型,只使用数据对的相似性信息进行训练。训练后的模型将活动数据样本映射到固定大小的表示向量中,使得表示空间中的距离approximate输入空间中的相似性。因此,训练过的模型可以作为各种不同聚类算法的度量。训练过程中的相似损失函数使得距离度量在同类活动样本对应小,不同类活动样本对应大。我们在三个数据集上验证了模型的效果,以确认其在连续人类活动序列的分割和识别中的有效性。
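Training uses only pairwise same/different-activity information, with a loss that pulls same-activity pairs together and pushes different-activity pairs apart. A standard contrastive formulation of such a pairwise similarity loss is sketched below; the margin-based form is an assumption, not necessarily the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def pairwise_similarity_loss(emb_a, emb_b, same_activity, margin=1.0):
    """Contrastive loss on pairs of activity windows.

    emb_a, emb_b:   (B, D) embeddings from the shared (siamese) encoder
    same_activity:  (B,) float tensor, 1 if the two windows come from the same
                    kind of activity, 0 otherwise (no explicit class labels needed).
    """
    d = F.pairwise_distance(emb_a, emb_b)
    pos = same_activity * d.pow(2)                          # pull same-activity pairs together
    neg = (1 - same_activity) * F.relu(margin - d).pow(2)   # push different pairs apart
    return (pos + neg).mean()
```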

IxDRL: A Novel Explainable Deep Reinforcement Learning Toolkit based on Analyses of Interestingness

  • paper_url: http://arxiv.org/abs/2307.08933
  • repo_url: https://github.com/sri-aic/23-xai-ixdrl-data
  • paper_authors: Pedro Sequeira, Melinda Gervasio
  • for: 这个论文旨在解决 Deep Reinforcement Learning (RL) 中存在的可解释性问题,提供人工智能专家在协作人机设置中更加了解RL机器人的能力和局限性,以便更 Informed 的决策。
  • methods: 该论文提出了一种基于吸引力分析的新框架,可以为 RL 机器人提供多种能力探测方法,并且可以Native 支持 popular RLLib 工具包。
  • results: 该论文通过应用该框架,对 RL 机器人的行为模式和能力进行了全面的探测和分析,并且可以帮助人工智能专家更好地理解 RL 机器人的能力和局限性,以便更好地决策。
    Abstract In recent years, advances in deep learning have resulted in a plethora of successes in the use of reinforcement learning (RL) to solve complex sequential decision tasks with high-dimensional inputs. However, existing systems lack the necessary mechanisms to provide humans with a holistic view of their competence, presenting an impediment to their adoption, particularly in critical applications where the decisions an agent makes can have significant consequences. Yet, existing RL-based systems are essentially competency-unaware in that they lack the necessary interpretation mechanisms to allow human operators to have an insightful, holistic view of their competency. Towards more explainable Deep RL (xDRL), we propose a new framework based on analyses of interestingness. Our tool provides various measures of RL agent competence stemming from interestingness analysis and is applicable to a wide range of RL algorithms, natively supporting the popular RLLib toolkit. We showcase the use of our framework by applying the proposed pipeline in a set of scenarios of varying complexity. We empirically assess the capability of the approach in identifying agent behavior patterns and competency-controlling conditions, and the task elements mostly responsible for an agent's competence, based on global and local analyses of interestingness. Overall, we show that our framework can provide agent designers with insights about RL agent competence, both their capabilities and limitations, enabling more informed decisions about interventions, additional training, and other interactions in collaborative human-machine settings.
    摘要 To address this issue, we propose a new framework based on interestingness analysis to provide a more explainable deep reinforcement learning (xDRL) system. Our tool offers various measures of RL agent competence and is applicable to a wide range of RL algorithms, natively supporting the popular RLLib toolkit. We demonstrate the use of our framework in a variety of scenarios of varying complexity, showcasing its ability to identify agent behavior patterns and competency-controlling conditions, as well as the task elements most responsible for an agent's competence, based on both global and local analyses of interestingness.Our framework provides agent designers with valuable insights into RL agent competence, enabling more informed decisions about interventions, additional training, and other interactions in collaborative human-machine settings. With our proposed xDRL framework, we aim to improve the transparency and accountability of RL systems, ultimately leading to more reliable and trustworthy decision-making processes.

Unsupervised Deep Graph Matching Based on Cycle Consistency

  • paper_url: http://arxiv.org/abs/2307.08930
  • repo_url: None
  • paper_authors: Siddharth Tourani, Carsten Rother, Muhammad Haris Khan, Bogdan Savchynskyy
  • for: 图像中键点匹配无监督学习
  • methods: 自我监督法,不需要对准对应关系
  • results: 在无监督图像关键点匹配上达到了新的最先进水平(state of the art)。
  • for: Keypoint matching in images without supervision
  • methods: Self-supervised method that enforces consistency of matchings between images of the same object category
  • results: New state-of-the-art for unsupervised graph matching
    Abstract We contribute to the sparsely populated area of unsupervised deep graph matching with application to keypoint matching in images. Contrary to the standard \emph{supervised} approach, our method does not require ground truth correspondences between keypoint pairs. Instead, it is self-supervised by enforcing consistency of matchings between images of the same object category. As the matching and the consistency loss are discrete, their derivatives cannot be straightforwardly used for learning. We address this issue in a principled way by building our method upon the recent results on black-box differentiation of combinatorial solvers. This makes our method exceptionally flexible, as it is compatible with arbitrary network architectures and combinatorial solvers. Our experimental evaluation suggests that our technique sets a new state-of-the-art for unsupervised graph matching.
    摘要 我们为研究较少的无监督深度图匹配领域做出了贡献,并将其应用于图像中的关键点匹配。与标准的监督方法不同,我们的方法不需要关键点对之间的真实对应标注,而是通过强制同类物体图像之间匹配的一致性来实现自监督。由于匹配和一致性损失都是离散的,其导数无法直接用于学习。我们基于近期关于组合求解器黑盒可微化的研究成果,以有原则的方式解决了这一问题。这使得我们的方法非常灵活,可与任意网络架构和组合求解器结合使用。实验评估表明,我们的技术为无监督图匹配设立了新的最先进水平。
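The self-supervision enforces consistency of matchings across images of the same object category: composing the matchings A→B and B→C should agree with the direct matching A→C. A minimal differentiable sketch of such a cycle-consistency penalty over soft assignment matrices is given below; the paper instead differentiates through black-box combinatorial solvers, so this is only an illustration of the constraint, not their training objective.

```python
import torch

def cycle_consistency_loss(P_ab, P_bc, P_ac):
    """Penalize disagreement between the composed matching A->B->C and the
    direct matching A->C. P_* are (N, N) soft assignment matrices (e.g., from
    a Sinkhorn layer) used here as differentiable stand-ins for the solver."""
    composed = P_ab @ P_bc
    return torch.norm(composed - P_ac, p='fro') ** 2 / P_ac.numel()
```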

Multi-Stage Cable Routing through Hierarchical Imitation Learning

  • paper_url: http://arxiv.org/abs/2307.08927
  • repo_url: None
  • paper_authors: Jianlan Luo, Charles Xu, Xinyang Geng, Gilbert Feng, Kuan Fang, Liam Tan, Stefan Schaal, Sergey Levine
  • for: 本研究旨在解决多阶段机器人操作任务,特别是电缆布线(cable routing)任务:机器人需要将电缆依次穿过一系列线夹。
  • methods: 本研究采用模仿学习(imitation learning)方法,从演示中学习基于视觉的策略,同时覆盖下层(运动控制)和上层(步骤排序)两个层级。
  • results: 研究表明,该方法在极具挑战性的线夹位置变化下表现出色,并且能够从失败中恢复并纠正错误。
  • for: The research aims to solve the problem of multi-stage robotic manipulation tasks, particularly the cable routing task, where the robot needs to route a cable through a series of clips.
  • methods: The study uses imitation learning methods, learning vision-based policies from demonstrations at both the lower (motor control) and upper (sequencing) levels.
  • results: The research shows that the method can perform excellently in very challenging clip placement variations and can recover from failure and correct errors.
    Abstract We study the problem of learning to perform multi-stage robotic manipulation tasks, with applications to cable routing, where the robot must route a cable through a series of clips. This setting presents challenges representative of complex multi-stage robotic manipulation scenarios: handling deformable objects, closing the loop on visual perception, and handling extended behaviors consisting of multiple steps that must be executed successfully to complete the entire task. In such settings, learning individual primitives for each stage that succeed with a high enough rate to perform a complete temporally extended task is impractical: if each stage must be completed successfully and has a non-negligible probability of failure, the likelihood of successful completion of the entire task becomes negligible. Therefore, successful controllers for such multi-stage tasks must be able to recover from failure and compensate for imperfections in low-level controllers by smartly choosing which controllers to trigger at any given time, retrying, or taking corrective action as needed. To this end, we describe an imitation learning system that uses vision-based policies trained from demonstrations at both the lower (motor control) and the upper (sequencing) level, present a system for instantiating this method to learn the cable routing task, and perform evaluations showing great performance in generalizing to very challenging clip placement variations. Supplementary videos, datasets, and code can be found at https://sites.google.com/view/cablerouting.
    摘要 我们研究多阶段机器人操作任务的学习问题,具体是用于电缆 Routing 的情况,机器人需要通过一系列的夹具来 rout 电缆。这个设定提供了复杂的多阶段机器人操作enario,包括处理可扩展物体、关闭视觉感知的关键和处理多步骤的延展 behaviors。在这种情况下,学习单一的基本步骤 для每个阶段是不实用的:如果每个阶段都需要成功完成,并且有一定的失败几率,则完成整个任务的可能性会极低。因此,成功的控制器 для这种多阶段任务必须能够重新启动、补偿低层控制器的缺陷,并在需要时进行订正动作。为此,我们描述了一个模仿学习系统,使用视觉政策来从示例中学习并实现 cable routing 任务,并在评估中表现出非常出色地一致性。请参考 https://sites.google.com/view/cablerouting 获取补充影片、数据和代码。
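At the upper level, a learned policy decides which low-level primitive to run next (for example, route the next clip, regrasp, or retry after a failure). The toy control loop below illustrates that hierarchical structure; the environment and primitive interfaces are hypothetical and not the paper's API.

```python
def run_episode(high_level_policy, primitives, env, max_steps=20):
    """Toy hierarchical control loop: the high-level policy looks at the current
    observation and picks which low-level primitive (e.g., 'route_clip',
    'regrasp', 'recover') to run next; all names are illustrative."""
    obs = env.reset()
    for _ in range(max_steps):
        primitive_name = high_level_policy(obs)           # e.g. "route_clip" or "recover"
        obs, done = primitives[primitive_name](env, obs)  # primitive runs until it terminates
        if done:
            break
    return obs
```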

Federated Large Language Model: A Position Paper

  • paper_url: http://arxiv.org/abs/2307.08925
  • repo_url: None
  • paper_authors: Chaochao Chen, Xiaohua Feng, Jun Zhou, Jianwei Yin, Xiaolin Zheng
  • for: 这篇论文的目的是探讨大规模自然语言模型(LLM)在实际应用中遇到的挑战,以及如何使用联邦学习(FL)技术来解决这些挑战。
  • methods: 这篇论文提出了三个关键组成部分,即联邦LLM预训练、联邦LLM微调和联邦LLM提示工程。对每个组成部分,论文讨论了其相对于传统LLM训练方法的优势,并提出了具体的工程实现策略。
  • results: 这篇论文分析了联邦LLM的新问题和挑战,并评估了现有的解决方案和可能的阻碍因素。
    Abstract Large scale language models (LLM) have received significant attention and found diverse applications across various domains, but their development encounters challenges in real-world scenarios. These challenges arise due to the scarcity of public domain data availability and the need to maintain privacy with respect to private domain data. To address these issues, federated learning (FL) has emerged as a promising technology that enables collaborative training of shared models while preserving decentralized data. We propose the concept of federated LLM, which comprises three key components, i.e., federated LLM pre-training, federated LLM fine-tuning, and federated LLM prompt engineering. For each component, we discuss its advantage over traditional LLM training methods and propose specific engineering strategies for implementation. Furthermore, we explore the novel challenges introduced by the integration of FL and LLM. We analyze existing solutions and identify potential obstacles faced by these solutions within the context of federated LLM.
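For the federated fine-tuning component, the basic aggregation step is a weighted average of the (typically small, e.g., adapter or LoRA) parameter updates returned by clients, as in FedAvg. A minimal sketch is shown below; whether the paper advocates exactly this aggregator is an assumption.

```python
import torch

def fedavg(client_state_dicts, client_weights):
    """Weighted FedAvg over the small parameter dicts (e.g., adapters) returned
    by the clients; a minimal sketch of federated LLM fine-tuning aggregation."""
    total = sum(client_weights)
    avg = {}
    for name in client_state_dicts[0]:
        avg[name] = sum(w * sd[name].float()
                        for sd, w in zip(client_state_dicts, client_weights)) / total
    return avg
```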

Continuous-Time Reinforcement Learning: New Design Algorithms with Theoretical Insights and Performance Guarantees

  • paper_url: http://arxiv.org/abs/2307.08920
  • repo_url: None
  • paper_authors: Brent A. Wallace, Jennie Si
  • for: 这个论文的目的是提出一种新的连续时间非线性优化控制方法,用于控制非线性系统。
  • methods: 这些方法包括分解物理系统为小问题,以提高设计直观性和约化维度。它们还使用新的刺激框架,以提高 persistency of excitation 和数值稳定性性能。
  • results: 这些方法在控制一个不稳定、非最小频响 hypersonic vehicle (HSV) 上得到了证明和实验 guarantees,并且在这个应用中表现出了良好的性能。
    Abstract Continuous-time nonlinear optimal control problems hold great promise in real-world applications. After decades of development, reinforcement learning (RL) has achieved some of the greatest successes as a general nonlinear control design method. However, a recent comprehensive analysis of state-of-the-art continuous-time RL (CT-RL) methods, namely, adaptive dynamic programming (ADP)-based CT-RL algorithms, reveals they face significant design challenges due to their complexity, numerical conditioning, and dimensional scaling issues. Despite advanced theoretical results, existing ADP CT-RL synthesis methods are inadequate in solving even small, academic problems. The goal of this work is thus to introduce a suite of new CT-RL algorithms for control of affine nonlinear systems. Our design approach relies on two important factors. First, our methods are applicable to physical systems that can be partitioned into smaller subproblems. This constructive consideration results in reduced dimensionality and greatly improved intuitiveness of design. Second, we introduce a new excitation framework to improve persistence of excitation (PE) and numerical conditioning performance via classical input/output insights. Such a design-centric approach is the first of its kind in the ADP CT-RL community. In this paper, we progressively introduce a suite of (decentralized) excitable integral reinforcement learning (EIRL) algorithms. We provide convergence and closed-loop stability guarantees, and we demonstrate these guarantees on a significant application problem of controlling an unstable, nonminimum phase hypersonic vehicle (HSV).
    摘要 The goal of this work is to introduce a suite of new CT-RL algorithms for controlling affine nonlinear systems. Our design approach focuses on two key factors:1. Applicability to physical systems that can be partitioned into smaller subproblems, resulting in reduced dimensionality and improved intuitiveness of design.2. Introduction of a new excitation framework to improve persistence of excitation (PE) and numerical conditioning performance via classical input/output insights.This design-centric approach is novel in the ADP CT-RL community. We progressively introduce a suite of decentralized excitable integral reinforcement learning (EIRL) algorithms, providing convergence and closed-loop stability guarantees. We demonstrate these guarantees on a significant application problem of controlling an unstable, nonminimum phase hypersonic vehicle (HSV).

Solving multiphysics-based inverse problems with learned surrogates and constraints

  • paper_url: http://arxiv.org/abs/2307.11099
  • repo_url: None
  • paper_authors: Ziyi Yin, Rafael Orozco, Mathias Louboutin, Felix J. Herrmann
  • for: 这个研究旨在解决对地质碳储监控中的多物理 inverse problem,当multimodal时间径数据贵重和numerical simulation too costly时,我们使用computationally cheap learned surrogates和learned constraints相结合,以获得高精度的permeability数据。
  • methods: 我们使用了一个结合learned surrogates和learned constraints的扩展方法,包括一个训练好的深度神经网(normalizing flow),强制模型迭代维持在distribution中,以保证训练了Fourier neural operator的精度。
  • results: 我们透过试验集中心在地质碳储问题上,使用了时间径数据和时间径地震数据两种不同的数据模式,并评估了这两种数据模式的组合效果。结果显示,这种结合方法可以提供高精度的permeability数据和CO2气泡预测,包括监控井点附近和远 away的地区。
    Abstract Solving multiphysics-based inverse problems for geological carbon storage monitoring can be challenging when multimodal time-lapse data are expensive to collect and costly to simulate numerically. We overcome these challenges by combining computationally cheap learned surrogates with learned constraints. Not only does this combination lead to vastly improved inversions for the important fluid-flow property, permeability, it also provides a natural platform for inverting multimodal data including well measurements and active-source time-lapse seismic data. By adding a learned constraint, we arrive at a computationally feasible inversion approach that remains accurate. This is accomplished by including a trained deep neural network, known as a normalizing flow, which forces the model iterates to remain in-distribution, thereby safeguarding the accuracy of trained Fourier neural operators that act as surrogates for the computationally expensive multiphase flow simulations involving partial differential equation solves. By means of carefully selected experiments, centered around the problem of geological carbon storage, we demonstrate the efficacy of the proposed constrained optimization method on two different data modalities, namely time-lapse well and time-lapse seismic data. While permeability inversions from both these two modalities have their pluses and minuses, their joint inversion benefits from either, yielding valuable superior permeability inversions and CO2 plume predictions near, and far away, from the monitoring wells.
    摘要 解决基于多物理的反向问题可能会存在挑战,特别是当时间延迟数据成本高昂且计算成本高昂时。我们通过将计算成本低的学习模型与学习约束结合,不仅能够大幅提高含液流速度的重要参数含液性,还提供了自然的多模态数据混合 inverse 平台。通过添加学习约束,我们得到一个可行的推算方法,保持精度。这是通过包含训练好的深度神经网络(normalizing flow),让模型迭代器的输出呈现在有效范围内,以保证训练过 Fourier neural operator 的精度。通过选择合适的实验,我们在 geological carbon storage 问题上验证了我们的受限优化方法,并在不同数据模式下进行了两种不同的吞吐量推算。虽然吞吐量推算从两种数据模式中各有优缺点,但是两者的联合推算却能够提供更加优秀的含液性推算和 CO2 气泡预测,尤其是在监测井附近和远离监测井的地方。
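The approach couples a cheap learned surrogate for the multiphase-flow simulator with a trained normalizing flow that keeps the permeability iterates in-distribution. The sketch below shows the general shape of such a constrained, surrogate-based inversion; the `surrogate` and `flow` interfaces, the Gaussian latent penalty, and all hyperparameters are illustrative stand-ins rather than the authors' implementation.

```python
import torch

def invert_with_learned_constraint(surrogate, flow, d_obs, n_steps=200, lr=1e-2, lam=1.0):
    """Minimal sketch of surrogate-based inversion with a normalizing-flow
    constraint: optimize a latent z, map it to a permeability model K = flow.inverse(z),
    match observed data through the learned surrogate, and keep z Gaussian-like
    so K stays in-distribution."""
    z = torch.zeros(flow.latent_dim, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(n_steps):
        K = flow.inverse(z)                       # in-distribution permeability model
        misfit = ((surrogate(K) - d_obs) ** 2).mean()
        loss = misfit + lam * (z ** 2).mean()     # Gaussian prior on the latent
        opt.zero_grad(); loss.backward(); opt.step()
    return flow.inverse(z).detach()
```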

Basal-Bolus Advisor for Type 1 Diabetes (T1D) Patients Using Multi-Agent Reinforcement Learning (RL) Methodology

  • paper_url: http://arxiv.org/abs/2307.08897
  • repo_url: None
  • paper_authors: Mehrad Jaloli, Marzia Cescon
  • for: 这个研究旨在开发一种基于多代理人学习(RL)的个人化血糖控制方法,以改善型1糖尿病(T1D)患者的血糖水平。
  • methods: 这个方法使用一个关闭链系统,包括一个血糖代谢模型和一个多代理人弹性算法对策。
  • results: 研究结果显示,RL基于的基础-胶囊导师可以有效改善血糖控制,减少血糖波动和增加在目标范围(70-180 mg/dL)中的时间。低血糖事件得到有效预防,并减少严重高血糖事件。此外,RL方法也导致了与传统治疗相比的平均每天基础胰岛素剂量的 statistically significant 减少。这些发现显示RL方法在实现更好的血糖控制和减少高血糖的风险方面是有效的。
    Abstract This paper presents a novel multi-agent reinforcement learning (RL) approach for personalized glucose control in individuals with type 1 diabetes (T1D). The method employs a closed-loop system consisting of a blood glucose (BG) metabolic model and a multi-agent soft actor-critic RL model acting as the basal-bolus advisor. Performance evaluation is conducted in three scenarios, comparing the RL agents to conventional therapy. Evaluation metrics include glucose levels (minimum, maximum, and mean), time spent in different BG ranges, and average daily bolus and basal insulin dosages. Results demonstrate that the RL-based basal-bolus advisor significantly improves glucose control, reducing glycemic variability and increasing time spent within the target range (70-180 mg/dL). Hypoglycemia events are effectively prevented, and severe hyperglycemia events are reduced. The RL approach also leads to a statistically significant reduction in average daily basal insulin dosage compared to conventional therapy. These findings highlight the effectiveness of the multi-agent RL approach in achieving better glucose control and mitigating the risk of severe hyperglycemia in individuals with T1D.
    摘要 The results show that the RL-based basal-bolus advisor significantly improves glucose control, reducing glycemic variability and increasing time spent within the target range (70-180 mg/dL). Hypoglycemia events are effectively prevented, and severe hyperglycemia events are reduced. Additionally, the RL approach leads to a statistically significant reduction in average daily basal insulin dosage compared to conventional therapy. These findings demonstrate the effectiveness of the multi-agent RL approach in achieving better glucose control and mitigating the risk of severe hyperglycemia in individuals with T1D.
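The evaluation focuses on time in the 70-180 mg/dL target range and on hypo-/hyperglycemia events. The abstract does not state the reward function, so the snippet below shows one simple per-step reward of the kind often used for such controllers, penalizing hypoglycemia more sharply than hyperglycemia; it is an assumption, not the paper's reward.

```python
def glucose_reward(bg_mg_dl):
    """Illustrative per-step reward for a glucose controller: highest inside the
    70-180 mg/dL target range, with hypoglycemia penalized more sharply than
    hyperglycemia."""
    if 70 <= bg_mg_dl <= 180:
        return 1.0
    if bg_mg_dl < 70:                       # hypoglycemia: steep penalty
        return -1.0 - 0.1 * (70 - bg_mg_dl)
    return -0.5 - 0.01 * (bg_mg_dl - 180)   # hyperglycemia: milder penalty

print([glucose_reward(bg) for bg in (55, 100, 250)])
```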

AI for the Generation and Testing of Ideas Towards an AI Supported Knowledge Development Environment

  • paper_url: http://arxiv.org/abs/2307.08876
  • repo_url: None
  • paper_authors: Ted Selker
  • for: 这篇论文主要是为了探讨如何使用机器学习技术创建大量语言模型,以便在不同的通信形式中预测Sequential information,并且通过Transformers生成文本或视觉输出,以模拟人类的回应。
  • methods: 这篇论文使用的方法主要是基于Transformers的生成AI,它可以在不同的语言模型中生成文本或视觉输出,并且可以模拟人类的回应。
  • results: 这篇论文的结果主要表明,通过将生成AI与互联网源的追溯功能相结合,可以创造出更加有价值的解决方案,并且可以减少人类偏见。此外,论文还提出了一种名为“Generate And Search Test”的系统,可以帮助知识工作者更加快速地创建高质量的解决方案。
    Abstract New systems employ Machine Learning to sift through large knowledge sources, creating flexible Large Language Models. These models discern context and predict sequential information in various communication forms. Generative AI, leveraging Transformers, generates textual or visual outputs mimicking human responses. It proposes one or multiple contextually feasible solutions for a user to contemplate. However, generative AI does not currently support traceability of ideas, a useful feature provided by search engines indicating origin of information. The narrative style of generative AI has gained positive reception. People learn from stories. Yet, early ChatGPT efforts had difficulty with truth, reference, calculations, and aspects like accurate maps. Current capabilities of referencing locations and linking to apps seem to be better catered by the link-centric search methods we've used for two decades. Deploying truly believable solutions extends beyond simulating contextual relevance as done by generative AI. Combining the creativity of generative AI with the provenance of internet sources in hybrid scenarios could enhance internet usage. Generative AI, viewed as drafts, stimulates thinking, offering alternative ideas for final versions or actions. Scenarios for information requests are considered. We discuss how generative AI can boost idea generation by eliminating human bias. We also describe how search can verify facts, logic, and context. The user evaluates these generated ideas for selection and usage. This paper introduces a system for knowledge workers, Generate And Search Test, enabling individuals to efficiently create solutions previously requiring top collaborations of experts.
    摘要 新系统运用机器学习探索大量知识源,创造出灵活的大语言模型。这些模型能够在不同的通讯形式中辨识上下文并预测后续信息。生成式 AI 运用 Transformers 生成文字或视觉输出,模拟人类回应,并提供一个或多个在上下文中可行的解决方案供用户思考。但是,生成式 AI 目前不支持想法的可追溯性,而这是搜索引擎提供的有用功能,可以显示信息的来源。生成式 AI 的叙事风格受到了正面评价,人们习惯从故事中学习;然而,早期的 ChatGPT 在真实性、引用、计算和准确地图等方面遇到了困难。目前在引用位置与连结应用方面的需求,似乎仍由我们使用了二十年的以连结为中心的搜寻方法更好地满足。要部署真正可信的解决方案,不能只停留在生成式 AI 所做的模拟上下文相关性。将生成式 AI 的创造力与互联网来源的可追溯性结合的混合方案,可以提升互联网的使用价值。生成式 AI 可被视为草稿,刺激思考,为最终版本或行动提供替代想法。我们讨论了多种信息请求情境,说明生成式 AI 如何通过消除人类偏见来增强想法生成,也描述了搜寻如何用于验证事实、逻辑和上下文;由用户评估这些生成的想法并加以选择和使用。这篇文章介绍了一个面向知识工作者的系统 Generate And Search Test,让个人能够高效创建以往需要顶尖专家协作才能完成的解决方案。

An Alternative to Variance: Gini Deviation for Risk-averse Policy Gradient

  • paper_url: http://arxiv.org/abs/2307.08873
  • repo_url: None
  • paper_authors: Yudong Luo, Guiliang Liu, Pascal Poupart, Yangchen Pan
  • for: 降低策略回报的方差,以实现风险规避的强化学习(RL),避免高风险结果。
  • methods: 使用一种新的风险度量,即基尼偏差(Gini deviation),来代替直接限制回报的方差。
  • results: 在具体的域中,我们的算法可以避免方差基础的限制,并实现高回报低风险。
    Abstract Restricting the variance of a policy's return is a popular choice in risk-averse Reinforcement Learning (RL) due to its clear mathematical definition and easy interpretability. Traditional methods directly restrict the total return variance. Recent methods restrict the per-step reward variance as a proxy. We thoroughly examine the limitations of these variance-based methods, such as sensitivity to numerical scale and hindering of policy learning, and propose to use an alternative risk measure, Gini deviation, as a substitute. We study various properties of this new risk measure and derive a policy gradient algorithm to minimize it. Empirical evaluation in domains where risk-aversion can be clearly defined, shows that our algorithm can mitigate the limitations of variance-based risk measures and achieves high return with low risk in terms of variance and Gini deviation when others fail to learn a reasonable policy.
    摘要 在风险规避的强化学习(RL)中,限制策略回报的方差是常见的选择,因为它有明确的数学定义且易于解释。传统方法直接限制总回报方差,近期方法则以限制每步奖励方差作为代理。我们仔细检查了这些基于方差的方法的局限性,例如对数值尺度敏感以及阻碍策略学习,并提出以一种替代风险度量,即基尼偏差(Gini deviation),作为替代。我们研究了这一新风险度量的各种性质,并推导出最小化它的策略梯度算法。在可以明确定义风险规避的领域中进行的实证评估显示,我们的算法可以缓解基于方差的风险度量的局限性,并在其他方法无法学习出合理策略时,在方差和基尼偏差两种意义下都达到高回报、低风险。
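As a minimal illustration of the alternative risk measure, the snippet below computes one common formulation of Gini deviation (half the mean absolute difference between pairs of sampled returns) and contrasts it with the return variance; the paper's exact estimator and its policy-gradient form are not reproduced here.

```python
import numpy as np

def gini_deviation(returns):
    """Half the mean absolute difference over all ordered pairs of sampled returns.

    This is one standard definition of Gini deviation; the paper's exact estimator
    and its policy-gradient formulation may differ.
    """
    x = np.asarray(returns, dtype=float)
    diffs = np.abs(x[:, None] - x[None, :])          # pairwise |R_i - R_j|
    n = len(x)
    return 0.5 * diffs.sum() / (n * (n - 1))          # assumes at least two samples

returns = np.array([1.0, 3.0, 2.5, -0.5, 4.0])        # sampled episode returns
print(gini_deviation(returns), returns.var())          # compare the two risk measures
```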

Curriculum Learning for Graph Neural Networks: A Multiview Competence-based Approach

  • paper_url: http://arxiv.org/abs/2307.08859
  • repo_url: https://github.com/CLU-UML/MCCL
  • paper_authors: Nidhi Vakil, Hadi Amiri
  • for: 这篇论文旨在提出一种基于图复杂性形式化(作为难度标准)的课程学习方法,以提高语言应用中图神经网络的训练效果。
  • methods: 本篇论文使用了一种基于图形复杂性和模型能力的课程学习方法,包括一个调度方案,将不同的图形难度和模型能力考虑在训练中。
  • results: 实验结果显示,提案的方法可以在实际的连接预测和节点分类任务上提高图形神经网络的训练效率。
    Abstract A curriculum is a planned sequence of learning materials and an effective one can make learning efficient and effective for both humans and machines. Recent studies developed effective data-driven curriculum learning approaches for training graph neural networks in language applications. However, existing curriculum learning approaches often employ a single criterion of difficulty in their training paradigms. In this paper, we propose a new perspective on curriculum learning by introducing a novel approach that builds on graph complexity formalisms (as difficulty criteria) and model competence during training. The model consists of a scheduling scheme which derives effective curricula by accounting for different views of sample difficulty and model competence during training. The proposed solution advances existing research in curriculum learning for graph neural networks with the ability to incorporate a fine-grained spectrum of graph difficulty criteria in their training paradigms. Experimental results on real-world link prediction and node classification tasks illustrate the effectiveness of the proposed approach.
    摘要 课程是一种经过规划的学习材料序列,有效的课程可以使人类和机器的学习都变得高效而有效。近期研究已经开发了数据驱动的课程学习方法,用于训练语言应用中的图神经网络。然而,现有的课程学习方法通常只使用单一的困难度标准。在这篇论文中,我们提出了一种新的课程学习视角:基于图复杂性形式化(作为困难度标准)和训练过程中的模型能力。我们的方法包括一种调度方案,通过在训练中综合考虑样本困难度的不同视角和模型能力来推导有效的课程。该方案推进了图神经网络课程学习的现有研究,使其训练范式能够纳入细粒度的多种图困难度标准。在真实世界的链接预测和节点分类任务上的实验结果表明了所提方法的有效性。
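A rough sketch of what a competence-based curriculum scheduler of this kind can look like is given below: a competence function grows over training, and only graphs whose (normalized, multi-view averaged) difficulty falls below the current competence are eligible for sampling. The root-p competence schedule, the two difficulty views, and their equal weighting are illustrative assumptions rather than the MCCL implementation.

```python
import random

def competence(step, total_steps, c0=0.1, p=2):
    """Competence grows from c0 to 1 over training (a common root-p schedule)."""
    return min(1.0, (step * (1 - c0 ** p) / total_steps + c0 ** p) ** (1.0 / p))

def normalize(scores):
    lo, hi = min(scores), max(scores)
    return [(s - lo) / (hi - lo + 1e-12) for s in scores]

# Hypothetical per-graph difficulty views, e.g. node count and edge density.
node_counts = [5, 12, 30, 8, 50, 17]
densities   = [0.1, 0.4, 0.8, 0.2, 0.9, 0.5]
difficulty = [0.5 * a + 0.5 * b
              for a, b in zip(normalize(node_counts), normalize(densities))]

total_steps = 100
for step in range(0, total_steps, 20):
    c = competence(step, total_steps)
    eligible = [i for i, d in enumerate(difficulty) if d <= c]
    batch = random.sample(eligible, k=min(2, len(eligible)))
    print(f"step {step}: competence={c:.2f}, eligible graphs={eligible}, batch={batch}")
```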

Autoregressive Diffusion Model for Graph Generation

  • paper_url: http://arxiv.org/abs/2307.08849
  • repo_url: None
  • paper_authors: Lingkai Kong, Jiaming Cui, Haotian Sun, Yuchen Zhuang, B. Aditya Prakash, Chao Zhang
  • for: 本研究旨在提出一种基于扩散的图生成模型,以提高图生成的效果和速度。
  • methods: 我们提出了一种名为“自回归扩散”的模型,它直接在离散图空间进行扩散过程,而不是在去量化的邻接矩阵空间进行。我们还设计了一个“扩散排序网络”和一个“去噪网络”,分别用于前向扩散和反向生成,以实现高效的图生成。
  • results: 我们在六个不同的通用图数据集和两个分子数据集上进行了实验,证明我们的模型可以在图生成中达到更好的或相当的效果,同时具有快速的生成速度。
    Abstract Diffusion-based graph generative models have recently obtained promising results for graph generation. However, existing diffusion-based graph generative models are mostly one-shot generative models that apply Gaussian diffusion in the dequantized adjacency matrix space. Such a strategy can suffer from difficulty in model training, slow sampling speed, and incapability of incorporating constraints. We propose an \emph{autoregressive diffusion} model for graph generation. Unlike existing methods, we define a node-absorbing diffusion process that operates directly in the discrete graph space. For forward diffusion, we design a \emph{diffusion ordering network}, which learns a data-dependent node absorbing ordering from graph topology. For reverse generation, we design a \emph{denoising network} that uses the reverse node ordering to efficiently reconstruct the graph by predicting the node type of the new node and its edges with previously denoised nodes at a time. Based on the permutation invariance of graph, we show that the two networks can be jointly trained by optimizing a simple lower bound of data likelihood. Our experiments on six diverse generic graph datasets and two molecule datasets show that our model achieves better or comparable generation performance with previous state-of-the-art, and meanwhile enjoys fast generation speed.
    摘要 基于扩散的图生成模型最近在图生成任务上取得了可喜的成绩。然而,现有的基于扩散的图生成模型大多是一次性(one-shot)生成模型,在去量化的邻接矩阵空间中应用高斯扩散。这种策略可能导致模型训练困难、采样速度慢,并且难以引入约束。我们提出了一种用于图生成的“自回归扩散”模型。与现有方法不同,我们定义了一个直接作用于离散图空间的节点吸收扩散过程。在前向扩散中,我们设计了一个“扩散排序网络”,从图拓扑中学习依赖于数据的节点吸收顺序;在反向生成中,我们设计了一个“去噪网络”,利用反向的节点顺序,每次预测新节点的类型及其与先前已去噪节点之间的边,从而高效地重建图。基于图的置换不变性,我们证明这两个网络可以通过优化数据似然的一个简单下界来联合训练。我们在六个不同的通用图数据集和两个分子数据集上的实验表明,我们的模型取得了优于或可比于先前最优方法的生成性能,同时具有快速的生成速度。

Towards Accelerating Benders Decomposition via Reinforcement Learning Surrogate Models

  • paper_url: http://arxiv.org/abs/2307.08816
  • repo_url: None
  • paper_authors: Stephen Mak, Kyle Mana, Parisa Zehtabi, Michael Cashmore, Daniele Magazzeni, Manuela Veloso
  • for: 这篇论文是为了解决在不确定性下做出最佳决策的问题。
  • methods: 这篇论文使用的方法是 Benders decomposition(BD),它基于情景独立性将随机优化问题分解为多个更小的子问题。
  • results: 这篇论文提出了一种使用代理模型(替代NP难的整数主问题)来加速BD的方法,并证明了其有效性:与其他加速BD的实现相比,平均收敛速度快30%。
    Abstract Stochastic optimization (SO) attempts to offer optimal decisions in the presence of uncertainty. Often, the classical formulation of these problems becomes intractable due to (a) the number of scenarios required to capture the uncertainty and (b) the discrete nature of real-world planning problems. To overcome these tractability issues, practitioners turn to decomposition methods that divide the problem into smaller, more tractable sub-problems. The focal decomposition method of this paper is Benders decomposition (BD), which decomposes stochastic optimization problems on the basis of scenario independence. In this paper we propose a method of accelerating BD with the aid of a surrogate model in place of an NP-hard integer master problem. Through the acceleration method we observe 30% faster average convergence when compared to other accelerated BD implementations. We introduce a reinforcement learning agent as a surrogate and demonstrate how it can be used to solve a stochastic inventory management problem.
    摘要
  Stochastic optimization seeks optimal decisions in the presence of uncertainty, but its classical formulation often becomes intractable due to (1) the number of scenarios required to capture the uncertainty and (2) the discrete nature of real-world planning problems. To address these tractability issues, practitioners often use decomposition methods that break down the problem into smaller, more manageable sub-problems. One such method is Benders decomposition (BD), which decomposes stochastic optimization problems based on scenario independence. In this paper, we propose a method to accelerate BD using a surrogate model in place of an NP-hard integer master problem. Our approach leads to an average convergence rate 30% faster compared to other accelerated BD implementations. We introduce a reinforcement learning agent as a surrogate and demonstrate its use in solving a stochastic inventory management problem. By leveraging the surrogate model, we can efficiently solve the problem and achieve better performance.
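The schematic below sketches the accelerated Benders loop on a toy two-stage problem (a small newsvendor-style recourse model): scenario subproblems produce optimality cuts, a cheap proposal heuristic named surrogate_propose stands in for the paper's reinforcement-learning surrogate during early iterations, and the exact master problem is solved afterwards to obtain valid bounds. The toy problem, the heuristic, and the switching rule are illustrative assumptions.

```python
# Schematic sketch of Benders decomposition (L-shaped method) on a toy two-stage
# problem: min_x  x + E_s[ 3 * max(0, d_s - x) ],  0 <= x <= 10.
# "surrogate_propose" is a cheap stand-in for the paper's learned RL agent: it
# proposes first-stage decisions early on so cuts can be generated without solving
# the (potentially hard) master problem at every iteration.
import numpy as np
from scipy.optimize import linprog

demands = np.array([2.0, 4.0, 6.0, 8.0])            # equiprobable demand scenarios
probs = np.full(len(demands), 1.0 / len(demands))
C_FIRST, Q_SECOND, X_MAX = 1.0, 3.0, 10.0

def subproblems(x):
    """Solve all scenario subproblems analytically; return expected cost and a cut."""
    shortfall = np.maximum(0.0, demands - x)
    duals = np.where(shortfall > 0, Q_SECOND, 0.0)   # subgradients of the recourse cost
    expected_cost = float(probs @ (Q_SECOND * shortfall))
    E = float(probs @ duals)                          # cut: theta >= e - E * x
    e = float(probs @ (duals * demands))
    return expected_cost, (E, e)

def solve_master(cuts):
    """Exact master over (x, theta): min x + theta s.t. theta + E*x >= e for each cut."""
    A_ub = [[-E, -1.0] for E, _ in cuts]
    b_ub = [-e for _, e in cuts]
    res = linprog(c=[C_FIRST, 1.0], A_ub=A_ub or None, b_ub=b_ub or None,
                  bounds=[(0.0, X_MAX), (0.0, None)], method="highs")
    return res.x[0], res.fun

def surrogate_propose(x_prev, recourse_cost):
    """Toy stand-in for a learned proposal policy: nudge x toward the shortfall."""
    return min(X_MAX, x_prev + 0.5 * recourse_cost / Q_SECOND)

x, cuts, lower = 1.0, [], 0.0
best_upper = np.inf
for it in range(20):
    recourse, cut = subproblems(x)
    cuts.append(cut)
    best_upper = min(best_upper, C_FIRST * x + recourse)
    if it < 3:                                        # cheap surrogate proposals first
        x = surrogate_propose(x, recourse)
    else:                                             # then exact master for valid bounds
        x, lower = solve_master(cuts)
        if best_upper - lower < 1e-6:
            break
print(f"x* = {x:.3f}, objective = {best_upper:.3f} after {it + 1} iterations")
```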

Operator Guidance Informed by AI-Augmented Simulations

  • paper_url: http://arxiv.org/abs/2307.08810
  • repo_url: None
  • paper_authors: Samuel J. Edwards, Michael Levine
  • for: 这篇论文提出了一种多保真度、数据自适应的方法,用于估算船舶响应统计数据。
  • methods: 该方法使用了一个Long Short-Term Memory(LSTM)神经网络,使用了一个快速的低精度工具SimpleCode和一个更高精度的工具LAMP进行估算。
  • results: 经过训练LSTM神经网络以及对SimpleCode和LAMP数据进行比较,研究发现该方法可以准确地估算船舶响应统计数据。
    Abstract This paper will present a multi-fidelity, data-adaptive approach with a Long Short-Term Memory (LSTM) neural network to estimate ship response statistics in bimodal, bidirectional seas. The study will employ a fast low-fidelity, volume-based tool SimpleCode and a higher-fidelity tool known as the Large Amplitude Motion Program (LAMP). SimpleCode and LAMP data were generated by common bi-modal, bi-directional sea conditions in the North Atlantic as training data. After training an LSTM network with LAMP ship motion response data, a sample route was traversed and randomly sampled historical weather was input into SimpleCode and the LSTM network, and compared against the higher fidelity results.
    摘要 本文提出一种多保真度、数据自适应的方法,使用长短期记忆(LSTM)神经网络来估算双峰、双向海况下的船舶响应统计。研究采用一个快速的低保真度体积工具SimpleCode,以及一个更高保真度的工具Large Amplitude Motion Program(LAMP)。SimpleCode与LAMP的训练数据由北大西洋常见的双峰、双向海况生成。在用LAMP船舶运动响应数据训练LSTM网络后,我们沿一条示例航线随机抽取历史天气数据输入SimpleCode和LSTM网络,并与更高保真度的结果进行比较。
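A minimal sketch of the multi-fidelity idea, under the assumption that the network maps a low-fidelity response time series to a few high-fidelity response statistics, is shown below. The FidelityCorrector module, the synthetic tensors standing in for SimpleCode and LAMP outputs, and all shapes are illustrative.

```python
# Hedged sketch: train an LSTM to map a low-fidelity response time series (stand-in
# for SimpleCode output) to high-fidelity response statistics (stand-in for LAMP).
import torch
import torch.nn as nn

class FidelityCorrector(nn.Module):
    def __init__(self, n_features=1, hidden=64, n_stats=3):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_stats)   # e.g. mean, std, extreme of response

    def forward(self, x):                          # x: (batch, time, features)
        _, (h, _) = self.lstm(x)
        return self.head(h[-1])

torch.manual_seed(0)
low_fi = torch.randn(128, 200, 1)                  # synthetic low-fidelity series
high_fi_stats = torch.randn(128, 3)                # synthetic high-fidelity targets

model = FidelityCorrector()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()
for epoch in range(5):
    pred = model(low_fi)
    loss = loss_fn(pred, high_fi_stats)
    opt.zero_grad()
    loss.backward()
    opt.step()
    print(f"epoch {epoch}: mse={loss.item():.4f}")
```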

Local or Global: Selective Knowledge Assimilation for Federated Learning with Limited Labels

  • paper_url: http://arxiv.org/abs/2307.08809
  • repo_url: None
  • paper_authors: Yae Jee Cho, Gauri Joshi, Dimitrios Dimitriadis
  • for: 许多现有的 Federated Learning (FL) 方法假设每个客户端都有完整的标签数据,但在实际情况下,由于标注过程耗时且费力,客户端通常只有有限的标签数据。
  • methods: 我们提出了 FedLabel,一种基于选择性标签的方法,让客户端选择使用本地或全球模型 pseudo-label 其未标的数据,具体来说,客户端根据其数据的特性选择使用哪一个模型,以确保模型对数据的掌握程度最高。我们还使用了全球和本地模型的知识,通过全球-本地一致调整,以降低两个模型在标签未知数据时的差异。
  • results: 在 cross-device 和 cross-silo 设定下,与其他半监督 FL 基线相比,我们获得了 8%-24% 的提升,甚至在仅有 5%-20% 的标签数据时,就能超越标准的完全监督 FL 基线(100% 标签数据)。
    Abstract Many existing FL methods assume clients with fully-labeled data, while in realistic settings, clients have limited labels due to the expensive and laborious process of labeling. Limited labeled local data of the clients often leads to their local model having poor generalization abilities to their larger unlabeled local data, such as having class-distribution mismatch with the unlabeled data. As a result, clients may instead look to benefit from the global model trained across clients to leverage their unlabeled data, but this also becomes difficult due to data heterogeneity across clients. In our work, we propose FedLabel where clients selectively choose the local or global model to pseudo-label their unlabeled data depending on which is more of an expert of the data. We further utilize both the local and global models' knowledge via global-local consistency regularization which minimizes the divergence between the two models' outputs when they have identical pseudo-labels for the unlabeled data. Unlike other semi-supervised FL baselines, our method does not require additional experts other than the local or global model, nor require additional parameters to be communicated. We also do not assume any server-labeled data or fully labeled clients. For both cross-device and cross-silo settings, we show that FedLabel outperforms other semi-supervised FL baselines by $8$-$24\%$, and even outperforms standard fully supervised FL baselines ($100\%$ labeled data) with only $5$-$20\%$ of labeled data.
    摘要 许多现有的FL方法假设客户端拥有完整的标签数据,然而在现实中,由于标注过程成本高且费力,客户端通常只有有限的标签数据。客户端的本地标签数据不足,往往导致其本地模型对规模更大的未标注本地数据泛化能力差,例如与未标注数据存在类别分布不匹配。为此,客户端可能转而希望借助跨客户端训练的全球模型来利用其未标注数据,但由于客户端之间的数据异质性,这同样变得困难。在我们的工作中,我们提出了FedLabel方法,客户端根据哪一个模型更擅长处理其数据,选择本地模型或全球模型来为未标注数据生成伪标签。我们还通过全球-本地一致性正则化同时利用两个模型的知识:当两个模型对未标注数据给出相同伪标签时,最小化两者输出之间的差异。与其他半监督FL基线不同,我们的方法除本地模型和全球模型之外不需要额外的专家模型,也不需要传输额外的参数,且不假设服务器拥有标签数据或存在完全标注的客户端。在跨设备(cross-device)和跨机构(cross-silo)设置下,FedLabel比其他半监督FL基线高出8%-24%,甚至在仅使用5%-20%的标签数据时,就超过了使用100%标签数据的完全监督FL基线。
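The sketch below illustrates one plausible form of the selective pseudo-labeling rule: each unlabeled example is pseudo-labeled by whichever of the local or global model is more confident, low-confidence examples are masked out, and a global-local consistency term is added when the two models agree. The confidence-based selection, the threshold, and the MSE consistency term are assumptions for illustration; FedLabel's exact criterion and regularizer may differ.

```python
# Hedged sketch of selective pseudo-labeling with a global-local consistency penalty.
# The toy logits stand in for local and global model outputs on unlabeled client data.
import torch
import torch.nn.functional as F

def pseudo_label_loss(local_logits, global_logits, threshold=0.7, consistency_weight=1.0):
    local_prob = F.softmax(local_logits, dim=1)
    global_prob = F.softmax(global_logits, dim=1)
    local_conf, local_lbl = local_prob.max(dim=1)
    global_conf, global_lbl = global_prob.max(dim=1)

    use_local = local_conf >= global_conf                 # pick the more confident "expert"
    labels = torch.where(use_local, local_lbl, global_lbl)
    conf = torch.where(use_local, local_conf, global_conf)
    mask = conf >= threshold                               # keep only confident pseudo-labels

    ce = F.cross_entropy(local_logits, labels, reduction="none")
    agree = (local_lbl == global_lbl).float()              # consistency only when labels match
    consistency = F.mse_loss(local_prob, global_prob, reduction="none").sum(dim=1)
    per_example = ce + consistency_weight * agree * consistency
    return (per_example * mask.float()).sum() / mask.float().sum().clamp(min=1.0)

local_logits = torch.randn(8, 5)      # local model outputs on an unlabeled batch
global_logits = torch.randn(8, 5)     # global (server) model outputs on the same batch
print(pseudo_label_loss(local_logits, global_logits))
```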

Non-Stationary Policy Learning for Multi-Timescale Multi-Agent Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2307.08794
  • repo_url: None
  • paper_authors: Patrick Emami, Xiangyu Zhang, David Biagioni, Ahmed S. Zamzam
  • for: 这篇论文旨在学习多时间尺度多代理强化学习(multi-timescale multi-agent reinforcement learning)中的非平稳策略。
  • methods: 论文利用可用的代理时间尺度信息来定义周期性时间编码,并通过周期性多代理策略(periodic multi-agent policy)来学习非平稳策略。
  • results: 论文从理论和实验两方面验证了周期性多代理策略的学习效果,并在 gridworld 和建筑能源管理环境中证明了该方法的有效性。
    Abstract In multi-timescale multi-agent reinforcement learning (MARL), agents interact across different timescales. In general, policies for time-dependent behaviors, such as those induced by multiple timescales, are non-stationary. Learning non-stationary policies is challenging and typically requires sophisticated or inefficient algorithms. Motivated by the prevalence of this control problem in real-world complex systems, we introduce a simple framework for learning non-stationary policies for multi-timescale MARL. Our approach uses available information about agent timescales to define a periodic time encoding. In detail, we theoretically demonstrate that the effects of non-stationarity introduced by multiple timescales can be learned by a periodic multi-agent policy. To learn such policies, we propose a policy gradient algorithm that parameterizes the actor and critic with phase-functioned neural networks, which provide an inductive bias for periodicity. The framework's ability to effectively learn multi-timescale policies is validated on a gridworld and building energy management environment.
    摘要 在多时间尺度多代理强化学习(MARL)中,代理人在不同的时间尺度上进行交互。通常,由多个时间尺度引起的随时间变化的行为所对应的策略是非平稳的。学习非平稳策略十分困难,通常需要复杂或低效的算法。鉴于这一控制问题在现实复杂系统中十分普遍,我们提出了一个简单的框架,用于学习多时间尺度MARL中的非平稳策略。我们利用代理时间尺度的可用信息来定义周期性时间编码。具体地,我们在理论上证明,多时间尺度引入的非平稳效应可以由周期性多代理策略学习得到。为了学习这种策略,我们提出了一种策略梯度算法,用相位函数神经网络(phase-functioned neural networks)来参数化actor和critic,从而提供了对周期性的归纳偏置。该框架在gridworld和建筑能源管理环境中被验证能够有效学习多时间尺度策略。
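A small sketch of the periodic time encoding is given below: for each known agent timescale, the environment step is mapped to sine/cosine phase features that are appended to the observation, which is what lets an otherwise stationary policy express periodic, time-dependent behavior. The specific timescales and observation are toy values, and the phase-functioned actor/critic itself is not shown.

```python
import numpy as np

def periodic_time_encoding(t, timescales):
    """Return [sin(2*pi*t/T), cos(2*pi*t/T)] for each known timescale T."""
    feats = []
    for T in timescales:
        phase = 2.0 * np.pi * (t % T) / T
        feats.extend([np.sin(phase), np.cos(phase)])
    return np.array(feats)

timescales = [4, 24]                   # e.g. a fast agent acts every 4 steps, a slow one every 24
obs = np.array([0.3, -1.2, 0.7])       # some base observation
for t in [0, 1, 4, 23, 24]:
    augmented = np.concatenate([obs, periodic_time_encoding(t, timescales)])
    print(t, np.round(augmented, 3))
```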

On the Real-Time Semantic Segmentation of Aphid Clusters in the Wild

  • paper_url: http://arxiv.org/abs/2307.10267
  • repo_url: None
  • paper_authors: Raiyan Rahman, Christopher Indris, Tianxiao Zhang, Kaidong Li, Brian McCornack, Daniel Flippo, Ajay Sharda, Guanghui Wang
  • for: addresses the urgent need for an intelligent autonomous system to locate and spray aphid infestations in wheat and sorghum fields, reducing pesticide use and environmental impact.
  • methods: uses real-time semantic segmentation models to segment clusters of aphids in complex crop canopies, with a multiscale dataset to allow for learning at different scales.
  • results: compares the segmentation speeds and accuracy of four state-of-the-art real-time semantic segmentation models on an aphid cluster dataset, demonstrating the effectiveness of a real-time solution for pest detection and reducing inefficient pesticide use.
    Abstract Aphid infestations can cause extensive damage to wheat and sorghum fields and spread plant viruses, resulting in significant yield losses in agriculture. To address this issue, farmers often rely on chemical pesticides, which are inefficiently applied over large areas of fields. As a result, a considerable amount of pesticide is wasted on areas without pests, while inadequate amounts are applied to areas with severe infestations. The paper focuses on the urgent need for an intelligent autonomous system that can locate and spray infestations within complex crop canopies, reducing pesticide use and environmental impact. We have collected and labeled a large aphid image dataset in the field, and propose the use of real-time semantic segmentation models to segment clusters of aphids. A multiscale dataset is generated to allow for learning the clusters at different scales. We compare the segmentation speeds and accuracy of four state-of-the-art real-time semantic segmentation models on the aphid cluster dataset, benchmarking them against nonreal-time models. The study results show the effectiveness of a real-time solution, which can reduce inefficient pesticide use and increase crop yields, paving the way towards an autonomous pest detection system.
    摘要 蚜虫侵害会对小麦和高粱田造成广泛的损害,并传播植物病毒,导致农业产量的重大损失。为解决这一问题,农民通常依靠化学杀虫剂,但这些杀虫剂往往在大面积田地上被低效喷洒:大量药剂浪费在没有害虫的区域,而侵害严重的区域却喷洒不足。本文着眼于对一种智能自主系统的迫切需求,该系统能够在复杂的作物冠层中定位并喷洒侵害区域,从而减少杀虫剂用量和对环境的影响。我们在田间收集并标注了一个大型蚜虫图像数据集,并提议使用实时语义分割模型来分割蚜虫群。我们生成了多尺度数据集,以便在不同尺度上学习蚜虫群。我们在该蚜虫群数据集上比较了四种最先进的实时语义分割模型的分割速度和准确率,并以非实时模型作为基准。研究结果表明了实时方案的有效性,它可以减少低效的杀虫剂使用并提高作物产量,为自主害虫检测系统铺平道路。

GEAR: Augmenting Language Models with Generalizable and Efficient Tool Resolution

  • paper_url: http://arxiv.org/abs/2307.08775
  • repo_url: https://github.com/yining610/gear
  • paper_authors: Yining Lu, Haoping Yu, Daniel Khashabi
  • for: 提高大语言模型(LLM)的性能 across 多种任务。
  • methods: 借助外部工具,但之前的工作过于依赖任务特定的工具使用示范,这限制了其通用性,并因频繁调用大规模LLM而带来较高的计算成本。我们介绍了一种名为 GEAR 的计算高效的查询-工具对接(grounding)算法,可以泛化到多种需要使用工具的任务。
  • results: 在 14 个数据集和 6 个下游任务上进行了评估,展示了对新任务、新工具和不同 SLM 的强泛化能力。尽管计算上更高效,GEAR 仍能达到更高的工具对接精度,从而提升下游准确率;例如,经 GEAR 增强的 GPT-J 和 GPT-3 超过了相应的工具增强基线。
    Abstract Augmenting large language models (LLM) to use external tools enhances their performance across a variety of tasks. However, prior works over-rely on task-specific demonstration of tool use that limits their generalizability and computational cost due to making many calls to large-scale LLMs. We introduce GEAR, a computationally efficient query-tool grounding algorithm that is generalizable to various tasks that require tool use while not relying on task-specific demonstrations. GEAR achieves better efficiency by delegating tool grounding and execution to small language models (SLM) and LLM, respectively; while leveraging semantic and pattern-based evaluation at both question and answer levels for generalizable tool grounding. We evaluate GEAR on 14 datasets across 6 downstream tasks, demonstrating its strong generalizability to novel tasks, tools and different SLMs. Despite offering more efficiency, GEAR achieves higher precision in tool grounding compared to prior strategies using LLM prompting, thus improving downstream accuracy at a reduced computational cost. For example, we demonstrate that GEAR-augmented GPT-J and GPT-3 outperform counterpart tool-augmented baselines because of better tool use.
    摘要 加强大语言模型(LLM)使用外部工具可以提高其在多种任务上的表现。然而,先前的工作过于依赖任务特定的工具使用示例,这限制了它们的普遍性和计算成本,因为它们需要访问大规模的 LLM。我们介绍了一种计算效率高的问题工具固定算法(GEAR),可以在多种需要工具使用的任务上实现更好的普遍性,而不需要任务特定的示例。GEAR通过委托工具固定和执行给小语言模型(SLM)和 LLM 分别执行,同时利用 semantic 和 pattern 基于的评估方法来实现更好的工具固定。我们在 14 个数据集上进行了 6 个下游任务的评估,并证明了 GEAR 在新任务、工具和不同的 SLM 上都具有强大的普遍性。尽管 GEAR 提供更高的效率,但它在工具固定精度方面仍然高于先前使用 LLM 提示的策略,因此在减少计算成本的情况下提高了下游精度。例如,我们表明了 GEAR 对 GPT-J 和 GPT-3 的加强可以超过相应的工具加强基eline。

AI empowering research: 10 ways how science can benefit from AI

  • paper_url: http://arxiv.org/abs/2307.10265
  • repo_url: None
  • paper_authors: César França
  • for: 这篇论文探讨人工智能(AI)在科学研究中的转变性影响。
  • methods: 论文介绍了AI在科学研究中的10种应用,包括强大的引用工具、更好地理解研究问题、增强研究问题生成、优化研究设计、数据生成、数据转换、高级数据分析和AI协助报告。
  • results: 论文指出,AI可以帮助科学家增强创造力,但需要考虑偏见、隐私问题和人AI合作。
    Abstract This article explores the transformative impact of artificial intelligence (AI) on scientific research. It highlights ten ways in which AI is revolutionizing the work of scientists, including powerful referencing tools, improved understanding of research problems, enhanced research question generation, optimized research design, stub data generation, data transformation, advanced data analysis, and AI-assisted reporting. While AI offers numerous benefits, challenges such as bias, privacy concerns, and the need for human-AI collaboration must be considered. The article emphasizes that AI can augment human creativity in science but not replace it.
    摘要 这篇文章探讨人工智能(AI)在科学研究中的变革性影响。文章提出了AI正在改变科学家工作方式的10个方面,包括强大的引用工具、更好地理解研究问题、增强研究问题生成、优化研究设计、桩数据(stub data)生成、数据转换、高级数据分析和AI辅助报告。虽然AI具有许多优势,但仍需考虑偏见、隐私问题以及人机协作的必要性。文章强调,AI可以增强人类在科学中的创造力,但不能取代它。

Reflections from the Workshop on AI-Assisted Decision Making for Conservation

  • paper_url: http://arxiv.org/abs/2307.08774
  • repo_url: None
  • paper_authors: Lily Xu, Esther Rolf, Sara Beery, Joseph R. Bennett, Tanya Berger-Wolf, Tanya Birch, Elizabeth Bondi-Kelly, Justin Brashares, Melissa Chapman, Anthony Corso, Andrew Davies, Nikhil Garg, Angela Gaylard, Robert Heilmayr, Hannah Kerner, Konstantin Klemmer, Vipin Kumar, Lester Mackey, Claire Monteleoni, Paul Moorcroft, Jonathan Palmer, Andrew Perrault, David Thau, Milind Tambe
  • for: 这份白皮书总结了在哈佛大学计算社会中心主办的AI助成决策工作坊上的演讲和讨论,旨在提出保护生态系统的开放研究问题,以及需要人工智能解决的保护挑战。
  • methods: 这份白皮书总结了工作坊上的讲座和讨论,并提出了一些开放研究问题,例如资源分配、规划和干预的算法化决策方法,以及如何应用这些方法解决生态系统的保护挑战。
  • results: 白皮书认为,AI助成决策方法可以帮助解决生态系统的保护挑战,但是还需要进一步的研究和开发,以确保这些方法能够应用到实际的保护场景中。
    Abstract In this white paper, we synthesize key points made during presentations and discussions from the AI-Assisted Decision Making for Conservation workshop, hosted by the Center for Research on Computation and Society at Harvard University on October 20-21, 2022. We identify key open research questions in resource allocation, planning, and interventions for biodiversity conservation, highlighting conservation challenges that not only require AI solutions, but also require novel methodological advances. In addition to providing a summary of the workshop talks and discussions, we hope this document serves as a call-to-action to orient the expansion of algorithmic decision-making approaches to prioritize real-world conservation challenges, through collaborative efforts of ecologists, conservation decision-makers, and AI researchers.
    摘要 在这份白皮书中,我们总结了哈佛大学计算与社会研究中心于2022年10月20-21日举办的“AI辅助决策助力生物多样性保护”研讨会上的演讲与讨论要点。我们梳理了生物多样性保护中资源分配、规划和干预方面的开放研究问题,并指出这些保护挑战不仅需要AI解决方案,也需要新的方法学进展。除了总结研讨会内容之外,我们也希望本文成为一份行动倡议,推动生态学家、保护决策者和AI研究人员协作,使算法决策方法的拓展优先面向真实世界的保护挑战。

A mixed policy to improve performance of language models on math problems

  • paper_url: http://arxiv.org/abs/2307.08767
  • repo_url: https://github.com/vividitytech/math_lm_rl
  • paper_authors: Gang Chen
  • for: 解决 math 问题时,语言模型通常采用采样策略来预测下一个词的概率。但这会导致 math 问题的解决结果不准确。为了解决这个问题,我们提出了一种混合策略探索方法,使用 reinforcement learning 来解决 math 问题。
  • methods: 我们提出了一种两级 token 探索策略:第一级进行概率采样,第二级以确定性方式选择得分最高的下一个 token。具体来说,抽象层策略通过概率采样决定下一个 token 是操作符还是操作数,而第二级策略则以贪心方式确定性地选择该类型中得分最高的 token。
  • results: 我们在 GSM8K 数据集上使用 GPT-2 模型进行测试,取得了超过 2% 的性能提升。我们的实现可以在 GitHub 上找到:https://github.com/vividitytech/math_lm_rl。
    Abstract When to solve math problems, most language models take a sampling strategy to predict next word according conditional probabilities. In the math reasoning step, it may generate wrong answer. Considering math problems are deterministic, we propose a mixed policy exploration approach to solve math problems with reinforcement learning. In peculiar, we propose a two level token exploration policy: the abstract level explores next token with probability and the second level is deterministic. Specifically, the abstract level policy will decide whether the token is operator or operand with probability sampling, while the second level is deterministic to select next token with the highest score in a greedy way. We test our method on GSM8K dataset with GPT-2 model, and demonstrate more than $2\%$ performance gain. Our implementation is available at https://github.com/vividitytech/math_lm_rl.
    摘要 在解决数学问题时,大多数语言模型按条件概率采样来预测下一个词,在数学推理步骤中因此可能生成错误答案。考虑到数学问题是确定性的,我们提出了一种混合策略探索方法,用强化学习来解决数学问题。具体来说,我们提出了两级 token 探索策略:抽象层以概率采样决定下一个 token 是操作符还是操作数,第二级则以贪心方式确定性地选择得分最高的下一个 token。我们在 GSM8K 数据集上用 GPT-2 模型测试了该方法,获得了超过 2% 的性能提升。我们的实现可在 https://github.com/vividitytech/math_lm_rl 找到。
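The two-level decoding rule can be sketched as below: the token type (operator vs. operand) is sampled from its aggregated probability mass, and the token within the chosen type is then picked greedily. The toy vocabulary and next-token distribution stand in for a real language model's output distribution.

```python
# Hedged sketch of the two-level decoding policy: sample the token *type*
# (operator vs. operand) stochastically, then choose the highest-probability token
# within that type greedily.
import numpy as np

rng = np.random.default_rng(0)
vocab = ["3", "7", "x", "+", "-", "*"]
operators = {"+", "-", "*"}
next_token_probs = np.array([0.30, 0.25, 0.05, 0.20, 0.15, 0.05])  # toy model output

def mixed_policy_step(probs):
    is_op = np.array([tok in operators for tok in vocab])
    p_operator = probs[is_op].sum()
    choose_operator = rng.random() < p_operator           # level 1: stochastic type choice
    group = is_op if choose_operator else ~is_op
    masked = np.where(group, probs, -np.inf)               # level 2: greedy within the type
    return vocab[int(np.argmax(masked))]

print([mixed_policy_step(next_token_probs) for _ in range(5)])
```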

Quality Assessment of Photoplethysmography Signals For Cardiovascular Biomarkers Monitoring Using Wearable Devices

  • paper_url: http://arxiv.org/abs/2307.08766
  • repo_url: None
  • paper_authors: Felipe M. Dias, Marcelo A. F. Toledo, Diego A. C. Cardenas, Douglas A. Almeida, Filipe A. C. Oliveira, Estela Ribeiro, Jose E. Krieger, Marco A. Gutierrez
  • for: 这项研究用于评估非侵入式光谱学技术(PPG)的可靠性和准确性,以及开发远程、非侵入式和连续式测量设备。
  • methods: 该研究使用了27个统计特征从PPG信号中训练机器学习模型,包括梯度提升(XGBoost和CatBoost)和随机森林(RF)算法。
  • results: 研究发现,使用XGBoost、CatBoost和RF算法可以达到94.4、95.6和95.0的敏感度(Se)、正确预测率(PPV)和F1分数(F1),均高于当前文献报道的值。
    Abstract Photoplethysmography (PPG) is a non-invasive technology that measures changes in blood volume in the microvascular bed of tissue. It is commonly used in medical devices such as pulse oximeters and wrist worn heart rate monitors to monitor cardiovascular hemodynamics. PPG allows for the assessment of parameters (e.g., heart rate, pulse waveform, and peripheral perfusion) that can indicate conditions such as vasoconstriction or vasodilation, and provides information about microvascular blood flow, making it a valuable tool for monitoring cardiovascular health. However, PPG is subject to a number of sources of variations that can impact its accuracy and reliability, especially when using a wearable device for continuous monitoring, such as motion artifacts, skin pigmentation, and vasomotion. In this study, we extracted 27 statistical features from the PPG signal for training machine-learning models based on gradient boosting (XGBoost and CatBoost) and Random Forest (RF) algorithms to assess quality of PPG signals that were labeled as good or poor quality. We used the PPG time series from a publicly available dataset and evaluated the algorithm s performance using Sensitivity (Se), Positive Predicted Value (PPV), and F1-score (F1) metrics. Our model achieved Se, PPV, and F1-score of 94.4, 95.6, and 95.0 for XGBoost, 94.7, 95.9, and 95.3 for CatBoost, and 93.7, 91.3 and 92.5 for RF, respectively. Our findings are comparable to state-of-the-art reported in the literature but using a much simpler model, indicating that ML models are promising for developing remote, non-invasive, and continuous measurement devices.
    摘要 光电容积描记(PPG)是一种非侵入式技术,用于测量组织微血管床中血容量的变化。它常用于脉搏血氧仪和腕戴式心率监测器等医疗设备中,以监测心血管血流动力学。PPG 可以评估心率、脉搏波形和外周灌注等参数,这些参数可能提示血管收缩或血管舒张等状态,并提供微血管血流的信息,因而是监测心血管健康的有用工具。然而,PPG 受到多种变异来源的影响,例如运动伪影、皮肤色素沉着和血管舒缩,这些因素会影响其准确性和可靠性,在使用可穿戴设备进行连续监测时尤为明显。在本研究中,我们从 PPG 信号中提取了 27 个统计特征,用于训练基于梯度提升(XGBoost 和 CatBoost)和随机森林(RF)算法的机器学习模型,以评估被标注为优质或劣质的 PPG 信号质量。我们使用公开数据集中的 PPG 时间序列,并用敏感度(Se)、阳性预测值(PPV)和 F1 分数(F1)指标评估算法性能。我们的模型在 XGBoost、CatBoost 和 RF 上分别取得了 94.4、95.6、95.0,94.7、95.9、95.3,以及 93.7、91.3、92.5 的 Se、PPV 和 F1。我们的结果与文献报道的最新水平相当,但使用的模型简单得多,这表明机器学习模型在开发远程、非侵入式、连续测量设备方面很有前景。
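A reduced sketch of the pipeline is shown below: a handful of statistical features per PPG segment (the paper uses 27), a random forest classifier, and evaluation with sensitivity (recall), PPV (precision), and F1. The synthetic "good" and "poor" signals and the feature choices are illustrative only.

```python
# Hedged sketch: statistical features per PPG segment -> random forest -> Se/PPV/F1.
import numpy as np
from scipy.stats import skew, kurtosis
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import recall_score, precision_score, f1_score

def features(segment):
    return [segment.mean(), segment.std(), skew(segment), kurtosis(segment),
            np.percentile(segment, 95) - np.percentile(segment, 5)]

rng = np.random.default_rng(0)
t = np.linspace(0, 10, 500)
good = [np.sin(2 * np.pi * 1.2 * t) + 0.05 * rng.normal(size=t.size) for _ in range(100)]
poor = [np.sin(2 * np.pi * 1.2 * t) + 1.5 * rng.normal(size=t.size) for _ in range(100)]
X = np.array([features(s) for s in good + poor])
y = np.array([1] * 100 + [0] * 100)          # 1 = good quality, 0 = poor quality

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0, stratify=y)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
pred = clf.predict(X_te)
print(f"Se={recall_score(y_te, pred):.3f}  "
      f"PPV={precision_score(y_te, pred):.3f}  F1={f1_score(y_te, pred):.3f}")
```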

Fast model inference and training on-board of Satellites

  • paper_url: http://arxiv.org/abs/2307.08700
  • repo_url: https://github.com/previtus/ravaen-unibap-dorbit
  • paper_authors: Vít Růžička, Gonzalo Mateo-García, Chris Bridges, Chris Brunskill, Cormac Purcell, Nicolas Longépé, Andrew Markham
  • for: 本研究旨在在 CubeSat 上部署多任务模型,并在卫星上进行机器学习模型的在轨训练。
  • methods: 本研究使用了一个名为 RaVAEn 的轻量级基础模型,它能将小图像块编码为压缩的潜在向量,并可支援多个下游任务。
  • results: 本研究在 CubeSat 上成功部署了 RaVAEn 模型,对 4.8x4.8 km$^2$ 区域的图像块编码时间为 0.110 秒,并利用数据的潜在表示在卫星上完成了快速的少样本(few-shot)训练,从而支持在轨的快速推断与决策。
    Abstract Artificial intelligence onboard satellites has the potential to reduce data transmission requirements, enable real-time decision-making and collaboration within constellations. This study deploys a lightweight foundational model called RaVAEn on D-Orbit's ION SCV004 satellite. RaVAEn is a variational auto-encoder (VAE) that generates compressed latent vectors from small image tiles, enabling several downstream tasks. In this work we demonstrate the reliable use of RaVAEn onboard a satellite, achieving an encoding time of 0.110s for tiles of a 4.8x4.8 km$^2$ area. In addition, we showcase fast few-shot training onboard a satellite using the latent representation of data. We compare the deployment of the model on the on-board CPU and on the available Myriad vision processing unit (VPU) accelerator. To our knowledge, this work shows for the first time the deployment of a multi-task model on-board a CubeSat and the on-board training of a machine learning model.
    摘要 星上人工智能有望减少数据传输需求,并实现星座内的实时决策与协作。本研究在 D-Orbit 的 ION SCV004 卫星上部署了一个名为 RaVAEn 的轻量级基础模型。RaVAEn 是一种变分自编码器(VAE),可从小图像块生成压缩的潜在向量,从而支持多个下游任务。在这项工作中,我们展示了 RaVAEn 在卫星上的可靠运行,对 4.8x4.8 km$^2$ 区域的图像块编码时间为 0.110 秒。此外,我们还展示了利用数据的潜在表示在卫星上进行快速的少样本训练。我们比较了模型部署在星载 CPU 与可用的 Myriad 视觉处理单元(VPU)加速器上的表现。据我们所知,这项工作首次展示了在 CubeSat 上部署多任务模型以及在星上训练机器学习模型。

Pair then Relation: Pair-Net for Panoptic Scene Graph Generation

  • paper_url: http://arxiv.org/abs/2307.08699
  • repo_url: https://github.com/king159/pair-net
  • paper_authors: Jinghao Wang, Zhengyu Wen, Xiangtai Li, Zujin Guo, Jingkang Yang, Ziwei Liu
  • for: 本研究旨在提出一种新的基线方法来解决Scene Graph Generation(SGG)中的Panoptic Scene Graph(PSG)问题,以创造更加全面的场景图表示。
  • methods: 我们首先进行了深入分析,发现当前PSG方法最大的瓶颈在于对象间成对召回率(pair-wise recall)不足。基于这一点,我们提出了一种新的框架:Pair then Relation(Pair-Net),它使用一个Pair Proposal Network(PPN)来学习和筛选主体与客体之间稀疏的成对关系。
  • results: 通过广泛的消融实验和分析,我们证明了该方法可以大幅提升现有基线的性能。尤其是,我们的方法在PSG基准上取得了新的最优结果,相比PSGFormer有超过10%的绝对提升。代码可以在https://github.com/king159/Pair-Net获取。
    Abstract Panoptic Scene Graph (PSG) is a challenging task in Scene Graph Generation (SGG) that aims to create a more comprehensive scene graph representation using panoptic segmentation instead of boxes. Compared to SGG, PSG has several challenging problems: pixel-level segment outputs and full relationship exploration (it also considers thing and stuff relations). Thus, current PSG methods have limited performance, which hinders downstream tasks or applications. The goal of this work is to design a novel and strong baseline for PSG. To achieve that, we first conduct an in-depth analysis to identify the bottleneck of current PSG models, finding that inter-object pair-wise recall is a crucial factor that was ignored by previous PSG methods. Based on this and the recent query-based frameworks, we present a novel framework: Pair then Relation (Pair-Net), which uses a Pair Proposal Network (PPN) to learn and filter sparse pair-wise relationships between subjects and objects. Moreover, we also observed the sparse nature of object pairs. Motivated by this, we design a lightweight Matrix Learner within the PPN, which directly learns pair-wise relationships for pair proposal generation. Through extensive ablation and analysis, our approach significantly improves upon the solid segmenter baseline. Notably, our method achieves new state-of-the-art results on the PSG benchmark, with over 10\% absolute gains compared to PSGFormer. The code of this paper is publicly available at https://github.com/king159/Pair-Net.
    摘要 全景场景图(Panoptic Scene Graph, PSG)是场景图生成(SGG)中一项具有挑战性的任务,旨在利用全景分割而非边界框来构建更全面的场景图表示。与SGG相比,PSG面临若干具有挑战性的问题:像素级的分割输出以及更完整的关系探索(同时考虑thing与stuff之间的关系)。因此,当前PSG方法的性能有限,阻碍了下游任务和应用。本文的目标是为PSG设计一个新的、强大的基线。为此,我们首先进行了深入分析,找出当前PSG模型的瓶颈,发现对象间成对召回率是此前PSG方法忽视的关键因素。基于这一点以及近期的基于查询的框架,我们提出了一个新框架:Pair then Relation(Pair-Net),它使用Pair Proposal Network(PPN)来学习并筛选主体与客体之间稀疏的成对关系。此外,我们还观察到对象成对关系本身是稀疏的,因此在PPN中设计了一个轻量级的矩阵学习器,直接学习成对关系以生成成对提议。通过广泛的消融实验和分析,我们的方法显著超越了坚实的分割器基线。值得注意的是,我们的方法在PSG基准上取得了新的最优结果,相比PSGFormer有超过10%的绝对提升。本文代码公开于 https://github.com/king159/Pair-Net 。

COLLIE: Systematic Construction of Constrained Text Generation Tasks

  • paper_url: http://arxiv.org/abs/2307.08689
  • repo_url: https://github.com/princeton-nlp/Collie
  • paper_authors: Shunyu Yao, Howard Chen, Austin W. Hanjie, Runzhe Yang, Karthik Narasimhan
  • For: This paper is written for those interested in natural language processing and the development of constrained text generation systems.* Methods: The paper presents a grammar-based framework called COLLIE, which allows for the specification of rich and compositional constraints for diverse generation levels and modeling challenges. The framework includes tools for automatic extraction of task instances given a constraint structure and a raw text corpus.* Results: The paper compiles a dataset called COLLIE-v1, which includes 2080 instances with 13 constraint structures, and performs systematic experiments with five state-of-the-art instruction-tuned language models to analyze their performances and reveal shortcomings.
    Abstract Text generation under constraints have seen increasing interests in natural language processing, especially with the rapidly improving capabilities of large language models. However, existing benchmarks for constrained generation usually focus on fixed constraint types (e.g.,generate a sentence containing certain words) that have proved to be easy for state-of-the-art models like GPT-4. We present COLLIE, a grammar-based framework that allows the specification of rich, compositional constraints with diverse generation levels (word, sentence, paragraph, passage) and modeling challenges (e.g.,language understanding, logical reasoning, counting, semantic planning). We also develop tools for automatic extraction of task instances given a constraint structure and a raw text corpus. Using COLLIE, we compile the COLLIE-v1 dataset with 2080 instances comprising 13 constraint structures. We perform systematic experiments across five state-of-the-art instruction-tuned language models and analyze their performances to reveal shortcomings. COLLIE is designed to be extensible and lightweight, and we hope the community finds it useful to develop more complex constraints and evaluations in the future.
    摘要 To address this limitation, we present COLLIE, a grammar-based framework that enables the specification of rich and compositional constraints at various generation levels (word, sentence, paragraph, passage) and with diverse modeling challenges (e.g., language understanding, logical reasoning, counting, semantic planning). We also develop tools for automatically extracting task instances from a constraint structure and a raw text corpus.Using COLLIE, we compiled the COLLIE-v1 dataset consisting of 2080 instances with 13 constraint structures. We conducted systematic experiments on five state-of-the-art instruction-tuned language models and analyzed their performances to reveal their shortcomings.COLLIE is designed to be extensible and lightweight, and we hope that the community will find it useful to develop more complex constraints and evaluations in the future.
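To illustrate what compositional constraints of this kind look like, the sketch below composes a few word-, sentence-, and passage-level predicates into a single verifier for a generated text; COLLIE's actual grammar, constraint types, and extraction tools are not reproduced here.

```python
# Hedged sketch of compositional constraint checking in the spirit of COLLIE:
# small predicates at different generation levels are composed into one verifier.
def contains_words(words):
    return lambda text: all(w in text.lower().split() for w in words)

def sentence_count(n):
    return lambda text: len([s for s in text.split(".") if s.strip()]) == n

def last_word_is(word):
    return lambda text: text.lower().rstrip(". ").split()[-1] == word

def all_of(*constraints):
    return lambda text: all(c(text) for c in constraints)

# "Write exactly two sentences that mention 'graph' and 'model' and end with 'data'."
verifier = all_of(contains_words(["graph", "model"]),
                  sentence_count(2),
                  last_word_is("data"))

candidate = "A graph model can be learned. It is trained on data."
print(verifier(candidate))                             # True
print(verifier("One short sentence about a model."))   # False
```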

Do Models Explain Themselves? Counterfactual Simulatability of Natural Language Explanations

  • paper_url: http://arxiv.org/abs/2307.08678
  • repo_url: None
  • paper_authors: Yanda Chen, Ruiqi Zhong, Narutatsu Ri, Chen Zhao, He He, Jacob Steinhardt, Zhou Yu, Kathleen McKeown
  • for: 本研究旨在评估大语言模型(LLM)的自我解释能力,以及LLM可以帮助人类构建模型处理不同输入的心理模型。
  • methods: 我们提出了两种基于对反事实可靠性的评估指标:准确性和通用性。我们使用自动生成的反事实来评估现状的State-of-the-art LLMs(如GPT-4)在多步骤事实理解和奖励模型任务中的性能。
  • results: 我们发现LLM的解释准确性较低,并且准确性与可能性无关。因此,不仅通过人类批准(如RLHF)优化是不够的。
    Abstract Large language models (LLMs) are trained to imitate humans to explain human decisions. However, do LLMs explain themselves? Can they help humans build mental models of how LLMs process different inputs? To answer these questions, we propose to evaluate $\textbf{counterfactual simulatability}$ of natural language explanations: whether an explanation can enable humans to precisely infer the model's outputs on diverse counterfactuals of the explained input. For example, if a model answers "yes" to the input question "Can eagles fly?" with the explanation "all birds can fly", then humans would infer from the explanation that it would also answer "yes" to the counterfactual input "Can penguins fly?". If the explanation is precise, then the model's answer should match humans' expectations. We implemented two metrics based on counterfactual simulatability: precision and generality. We generated diverse counterfactuals automatically using LLMs. We then used these metrics to evaluate state-of-the-art LLMs (e.g., GPT-4) on two tasks: multi-hop factual reasoning and reward modeling. We found that LLM's explanations have low precision and that precision does not correlate with plausibility. Therefore, naively optimizing human approvals (e.g., RLHF) may not be a sufficient solution.
    摘要 大型语言模型(LLM)通常被训练以模仿人类来解释人类决策。然而,LLM 能解释它们自己吗?它们能帮助人类建立关于 LLM 如何处理不同输入的心智模型吗?为了回答这些问题,我们提议评估自然语言解释的反事实可模拟性(counterfactual simulatability):即一个解释是否能让人类准确推断模型在被解释输入的各种反事实输入上的输出。例如,如果模型对输入问题“鹰会飞吗?”回答“会”,并给出解释“所有鸟类都会飞”,那么人类会从该解释推断模型对反事实输入“企鹅会飞吗?”也会回答“会”。如果解释是精确的,模型的回答就应当符合人类的预期。我们基于反事实可模拟性实现了两个指标:精确度(precision)和通用性(generality),并使用 LLM 自动生成多样的反事实输入。随后,我们用这两个指标在多跳事实推理和奖励建模两个任务上评估了最先进的 LLM(例如 GPT-4)。我们发现 LLM 的解释精确度较低,并且精确度与合理性(plausibility)并不相关。因此,仅仅简单地优化人类认可度(例如 RLHF)可能并不是充分的解决方案。
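A toy sketch of the precision and generality metrics is given below, reusing the eagles/penguins example from the abstract: a stub simulator infers the answer implied by the explanation for each generated counterfactual, precision is the fraction of implied answers that match the model's actual answers, and generality is approximated here by the fraction of counterfactuals the explanation covers. Both stub functions and the generality proxy are assumptions for illustration.

```python
# Hedged sketch of counterfactual-simulatability metrics with stub components.
def simulate_from_explanation(explanation, counterfactual):
    """Toy simulator: the explanation 'all birds can fly' implies 'yes' for any bird."""
    if "all birds can fly" in explanation and "fly" in counterfactual:
        return "yes"
    return None                      # explanation implies nothing about this input

def model_answer(question):
    """Toy stand-in for the explained model's actual behavior."""
    return "no" if "penguin" in question else "yes"

explanation = "all birds can fly"
counterfactuals = ["Can penguins fly?", "Can sparrows fly?", "Can eagles swim?"]

implied = [(q, simulate_from_explanation(explanation, q)) for q in counterfactuals]
implied = [(q, a) for q, a in implied if a is not None]
matches = sum(model_answer(q) == a for q, a in implied)
precision = matches / len(implied)
generality = len(implied) / len(counterfactuals)   # crude proxy: how much the explanation covers
print(f"precision={precision:.2f}  generality={generality:.2f}")
```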

TableGPT: Towards Unifying Tables, Nature Language and Commands into One GPT

  • paper_url: http://arxiv.org/abs/2307.08674
  • repo_url: None
  • paper_authors: Liangyu Zha, Junlin Zhou, Liyao Li, Rui Wang, Qingyi Huang, Saisai Yang, Jing Yuan, Changbao Su, Xiang Li, Aofeng Su, Tao Zhang, Chen Zhou, Kaizhe Shou, Miao Wang, Wufang Zhu, Guoshan Lu, Chao Ye, Yali Ye, Wentao Ye, Yiming Zhang, Xinglong Deng, Jie Xu, Haobo Wang, Gang Chen, Junbo Zhao
  • For: TableGPT is a unified fine-tuned framework that enables large language models (LLMs) to understand and operate on tables using external functional commands, allowing for seamless interaction with tabular data and enabling a wide range of functionalities such as question answering, data manipulation, data visualization, analysis report generation, and automated prediction.* Methods: TableGPT uses a novel concept of global tabular representations, which empowers LLMs to gain a comprehensive understanding of the entire table beyond meta-information. It jointly trains LLMs on both table and text modalities, achieving a deep understanding of tabular data and the ability to perform complex operations on tables through chain-of-command instructions.* Results: TableGPT offers several advantages, including being a self-contained system rather than relying on external API interfaces, supporting efficient data process flow, query rejection (when appropriate), and private deployment, enabling faster domain data fine-tuning and ensuring data privacy. These features enhance the framework’s adaptability to specific use cases.
    Abstract Tables are prevalent in real-world databases, requiring significant time and effort for humans to analyze and manipulate. The advancements in large language models (LLMs) have made it possible to interact with tables using natural language input, bringing this capability closer to reality. In this paper, we present TableGPT, a unified fine-tuned framework that enables LLMs to understand and operate on tables using external functional commands. It introduces the capability to seamlessly interact with tables, enabling a wide range of functionalities such as question answering, data manipulation (e.g., insert, delete, query, and modify operations), data visualization, analysis report generation, and automated prediction. TableGPT aims to provide convenience and accessibility to users by empowering them to effortlessly leverage tabular data. At the core of TableGPT lies the novel concept of global tabular representations, which empowers LLMs to gain a comprehensive understanding of the entire table beyond meta-information. By jointly training LLMs on both table and text modalities, TableGPT achieves a deep understanding of tabular data and the ability to perform complex operations on tables through chain-of-command instructions. Importantly, TableGPT offers the advantage of being a self-contained system rather than relying on external API interfaces. Moreover, it supports efficient data process flow, query rejection (when appropriate) and private deployment, enabling faster domain data fine-tuning and ensuring data privacy, which enhances the framework's adaptability to specific use cases.

Quaternion Convolutional Neural Networks: Current Advances and Future Directions

  • paper_url: http://arxiv.org/abs/2307.08663
  • repo_url: None
  • paper_authors: Gerardo Altamirano-Gomez, Carlos Gershenson
  • for: 本研究的目的是探讨四元数卷积神经网络(Quaternion Convolutional Neural Network, QCNN)的发展与应用。
  • methods: 本研究使用了现有的 QCNN 模型,并进行了系统性的分析和评估。
  • results: 研究发现,QCNN 可以用更少的参数达到与实数值网络相当或更好的性能;此外,四元数表示和 Hamilton 乘积能够捕捉通道间的内在关系,因此有望在多种应用中带来更好的表现。
    Abstract Since their first applications, Convolutional Neural Networks (CNNs) have solved problems that have advanced the state-of-the-art in several domains. CNNs represent information using real numbers. Despite encouraging results, theoretical analysis shows that representations such as hyper-complex numbers can achieve richer representational capacities than real numbers, and that Hamilton products can capture intrinsic interchannel relationships. Moreover, in the last few years, experimental research has shown that Quaternion-Valued CNNs (QCNNs) can achieve similar performance with fewer parameters than their real-valued counterparts. This paper condenses research in the development of QCNNs from its very beginnings. We propose a conceptual organization of current trends and analyze the main building blocks used in the design of QCNN models. Based on this conceptual organization, we propose future directions of research.
    摘要 自首次应用以来,卷积神经网络(CNN)解决了诸多问题,推动了多个领域的最新进展。CNN 使用实数来表示信息。尽管取得了令人鼓舞的结果,理论分析表明,诸如超复数之类的表示可以比实数获得更丰富的表示能力,而 Hamilton 乘积能够捕捉通道间的内在关系。此外,近几年的实验研究表明,四元数值 CNN(QCNN)能够以比实数值网络更少的参数取得相近的性能。本文梳理了 QCNN 从起步至今的发展研究,提出了对当前趋势的概念性梳理,并分析了 QCNN 模型设计中使用的主要构建模块。在此基础上,我们提出了未来的研究方向。
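For readers unfamiliar with the operation at the core of quaternion layers, the worked example below implements the Hamilton product and checks the defining identities i*j = k and j*i = -k; the full quaternion convolution built on top of it is not shown.

```python
# Worked example of the Hamilton product, the operation quaternion layers use to mix
# the four components of their inputs and weights.
import numpy as np

def hamilton_product(q, r):
    """Hamilton product of quaternions q = (w, x, y, z) and r = (w', x', y', z')."""
    w1, x1, y1, z1 = q
    w2, x2, y2, z2 = r
    return np.array([
        w1*w2 - x1*x2 - y1*y2 - z1*z2,
        w1*x2 + x1*w2 + y1*z2 - z1*y2,
        w1*y2 - x1*z2 + y1*w2 + z1*x2,
        w1*z2 + x1*y2 - y1*x2 + z1*w2,
    ])

i = np.array([0, 1, 0, 0])
j = np.array([0, 0, 1, 0])
print(hamilton_product(i, j))   # [0 0 0 1]  -> i*j = k
print(hamilton_product(j, i))   # [0 0 0 -1] -> j*i = -k (non-commutative)
```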

Hyperparameter Tuning Cookbook: A guide for scikit-learn, PyTorch, river, and spotPython

  • paper_url: http://arxiv.org/abs/2307.10262
  • repo_url: https://github.com/sequential-parameter-optimization/spotpython
  • paper_authors: Thomas Bartz-Beielstein
  • for: 本文提供了一份完整的hyperparameter tuning指南,使用spotPython进行scikit-learn、PyTorch和river模型的优化。
  • methods: 本文使用spotPython的代理模型基本优化过程,并将hyperparameter tuning应用于sklearn模型 such as Support Vector Classification、Random Forests、Gradient Boosting (XGB)和K-nearest neighbors (KNN) 等,以及river中的Hoeffding Adaptive Tree Regressor。
  • results: 本文通过实践和步骤解释,为使用Python进行hyperparameter tuning提供了一个实用的开始点。特点包括Tensorboard、PyTorch Lightning、spotPython和river之间的交互,以及PyTorch和PyTorch Lightning训练工作流程的集成。
    Abstract This document provides a comprehensive guide to hyperparameter tuning using spotPython for scikit-learn, PyTorch, and river. The first part introduces spotPython's surrogate model-based optimization process, while the second part focuses on hyperparameter tuning. Several case studies are presented, including hyperparameter tuning for sklearn models such as Support Vector Classification, Random Forests, Gradient Boosting (XGB), and K-nearest neighbors (KNN), as well as a Hoeffding Adaptive Tree Regressor from river. The integration of spotPython into the PyTorch and PyTorch Lightning training workflow is also discussed. With a hands-on approach and step-by-step explanations, this cookbook serves as a practical starting point for anyone interested in hyperparameter tuning with Python. Highlights include the interplay between Tensorboard, PyTorch Lightning, spotPython, and river. This publication is under development, with updates available on the corresponding webpage.
    摘要 这份文档提供了使用 spotPython 进行 scikit-learn、PyTorch 和 river 模型hyperparameter tuning的全面指南。第一部分介绍了 spotPython 的代理模型基于优化过程,而第二部分则专注于 hyperparameter tuning。文档包含了多个案例研究,包括 scikit-learn 模型支持向量分类、Random Forests、Gradient Boosting (XGB) 和 K-nearest neighbors (KNN) 等模型的 hyperparameter tuning,以及来自 river 的 Hoeffding Adaptive Tree Regressor。文档还讲解了将 spotPython integrated into PyTorch 和 PyTorch Lightning 训练工作流程。以一种实践的方式和步骤说明,这本 cookbook 作为 Python 中 hyperparameter tuning 的实践入门点。文档在开发中,更新信息可以在相关 webpage 上获得。

Glamour muscles: why having a body is not what it means to be embodied

  • paper_url: http://arxiv.org/abs/2307.08598
  • repo_url: None
  • paper_authors: Shawn L. Beaulieu, Sam Kriegman
  • for: 提高智能机器的能力
  • methods: 使用embodiment方法
  • results: 生成更高级的智能工具
    Abstract Embodiment has recently enjoyed renewed consideration as a means to amplify the faculties of smart machines. Proponents of embodiment seem to imply that optimizing for movement in physical space promotes something more than the acquisition of niche capabilities for solving problems in physical space. However, there is nothing in principle which should so distinguish the problem of action selection in physical space from the problem of action selection in more abstract spaces, like that of language. Rather, what makes embodiment persuasive as a means toward higher intelligence is that it promises to capture, but does not actually realize, contingent facts about certain bodies (living intelligence) and the patterns of activity associated with them. These include an active resistance to annihilation and revisable constraints on the processes that make the world intelligible. To be theoretically or practically useful beyond the creation of niche tools, we argue that "embodiment" cannot be the trivial fact of a body, nor its movement through space, but the perpetual negotiation of the function, design, and integrity of that body; that is, to participate in what it means to constitute a given body. It follows that computer programs which are strictly incapable of traversing physical space might, under the right conditions, be more embodied than a walking, talking robot.
    摘要 近来,具身性(embodiment)作为增强智能机器能力的途径重新受到重视。具身性的支持者似乎暗示,针对物理空间中运动进行优化所带来的,不只是解决物理空间问题的特定能力。然而,原则上没有任何理由使物理空间中的动作选择问题与更抽象空间(例如语言空间)中的动作选择问题有如此区别。实际上,具身性之所以作为通往更高智能的途径而有说服力,是因为它承诺捕捉(但并未真正实现)关于某些身体(活的智能)及与之相关的活动模式的偶然事实,其中包括对消亡的主动抵抗,以及对使世界可被理解的过程施加的可修订约束。我们认为,若要在创造特定工具之外具有理论或实践价值,“具身性”不能只是拥有身体这一平凡事实,也不能只是身体在空间中的运动,而是对该身体的功能、设计与完整性的持续协商,也就是参与到“构成一个身体”的意义之中。由此可见,在合适的条件下,一个完全无法在物理空间中移动的计算机程序,可能比一个会走路、会说话的机器人更具身。