cs.AI - 2023-08-24

FaceTouch: Detecting hand-to-face touch with supervised contrastive learning to assist in tracing infectious disease

  • paper_url: http://arxiv.org/abs/2308.12840
  • repo_url: None
  • paper_authors: Mohamed R. Ibrahim, Terry Lyons
  • for: This study proposes a deep learning-based computer vision framework for automatically detecting hand-to-face touches in complex urban scenes.
  • methods: The framework comprises two deep sub-models, one to detect humans and one to analyse their actions. FaceTouch detects hand-to-face touches from RGB images and leverages body gestures such as arm movement to cope with partial occlusion of faces.
  • results: Experiments show that FaceTouch accurately detects hand-to-face touches in complex urban scenes and shows strong validation on unseen datasets on which it was not trained.
    Abstract Through our respiratory system, many viruses and diseases frequently spread and pass from one person to another. Covid-19 served as an example of how crucial it is to track down and cut back on contacts to stop its spread. There is a clear gap in finding automatic methods that can detect hand-to-face contact in complex urban scenes or indoors. In this paper, we introduce a computer vision framework, called FaceTouch, based on deep learning. It comprises deep sub-models to detect humans and analyse their actions. FaceTouch seeks to detect hand-to-face touches in the wild, such as through video chats, bus footage, or CCTV feeds. Despite partial occlusion of faces, the introduced system learns to detect face touches from the RGB representation of a given scene by utilising the representation of the body gestures such as arm movement. This has been demonstrated to be useful in complex urban scenarios beyond simply identifying hand movement and its closeness to faces. Relying on Supervised Contrastive Learning, the introduced model is trained on our collected dataset, given the absence of other benchmark datasets. The framework shows a strong validation in unseen datasets which opens the door for potential deployment.
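    Code sketch: FaceTouch is trained with Supervised Contrastive Learning on the authors' collected dataset. The training code is not reproduced here; the following minimal PyTorch sketch of a supervised contrastive (SupCon) loss over a batch of embeddings, assuming L2-normalized features and integer touch/no-touch labels, merely illustrates the objective the paper relies on.
```python
import torch
import torch.nn.functional as F

def supcon_loss(features: torch.Tensor, labels: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    """Supervised contrastive loss over one batch.
    features: (N, D) embeddings; labels: (N,) class labels (e.g. 1 = face touch, 0 = no touch)."""
    features = F.normalize(features, dim=1)
    logits = features @ features.T / temperature                 # pairwise similarities
    n = features.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=features.device)
    logits = logits.masked_fill(self_mask, float("-inf"))        # ignore self-similarity
    positive_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    pos_counts = positive_mask.sum(dim=1)
    valid = pos_counts > 0                                       # anchors with at least one positive
    sum_log_prob_pos = log_prob.masked_fill(~positive_mask, 0.0).sum(dim=1)
    return -(sum_log_prob_pos[valid] / pos_counts[valid]).mean()

# Toy usage: 8 random embeddings with binary touch/no-touch labels.
feats = torch.randn(8, 128)
labels = torch.tensor([0, 1, 0, 1, 1, 0, 1, 0])
print(supcon_loss(feats, labels).item())
```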

Short Run Transit Route Planning Decision Support System Using a Deep Learning-Based Weighted Graph

  • paper_url: http://arxiv.org/abs/2308.12828
  • repo_url: None
  • paper_authors: Nadav Shalit, Michael Fire, Dima Kagan, Eran Ben-Elia
  • for: To improve the efficiency and reliability of public transport services by helping transit planners rapidly identify better routes.
  • methods: A deep learning-based decision support system that processes and models diverse data sources (e.g., GTFS and smart card data), is trained with self-supervision to predict lateness values for road segments, and uses those lateness values as edge weights in a transportation graph for efficient path searching.
  • results: In an evaluation on Tel Aviv, travel times were reduced on more than 9% of the routes, including both intraurban and suburban routes, highlighting the model's versatility and effectiveness in improving public transport services.
    Abstract Public transport routing plays a crucial role in transit network design, ensuring a satisfactory level of service for passengers. However, current routing solutions rely on traditional operational research heuristics, which can be time-consuming to implement and lack the ability to provide quick solutions. Here, we propose a novel deep learning-based methodology for a decision support system that enables public transport (PT) planners to identify short-term route improvements rapidly. By seamlessly adjusting specific sections of routes between two stops during specific times of the day, our method effectively reduces times and enhances PT services. Leveraging diverse data sources such as GTFS and smart card data, we extract features and model the transportation network as a directed graph. Using self-supervision, we train a deep learning model for predicting lateness values for road segments. These lateness values are then utilized as edge weights in the transportation graph, enabling efficient path searching. Through evaluating the method on Tel Aviv, we are able to reduce times on more than 9% of the routes. The improved routes included both intraurban and suburban routes, showcasing the model's versatility. The findings emphasize the potential of our data-driven decision support system to enhance public transport and city logistics, promoting greater efficiency and reliability in PT services.
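    Code sketch: the core mechanism of the method is to use predicted segment lateness as edge weights of a directed transport graph and then search for better paths between two stops. A minimal sketch with networkx, using made-up stop names and hard-coded lateness values in place of the paper's trained self-supervised model, could look like this.
```python
import networkx as nx

# Hypothetical predicted lateness (minutes) for directed segments between stops.
# In the paper these values come from a deep model trained on GTFS and smart card data.
predicted_lateness = {
    ("A", "B"): 2.0, ("B", "C"): 5.5, ("A", "D"): 1.0,
    ("D", "C"): 2.5, ("C", "E"): 0.5, ("B", "E"): 6.0,
}

G = nx.DiGraph()
for (u, v), lateness in predicted_lateness.items():
    G.add_edge(u, v, weight=lateness)

# Search for the least-delayed adjustment of a route section between two stops.
path = nx.shortest_path(G, source="A", target="E", weight="weight")
delay = nx.shortest_path_length(G, source="A", target="E", weight="weight")
print(path, delay)  # ['A', 'D', 'C', 'E'] with total predicted lateness 4.0
```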

Job Shop Scheduling Benchmark: Environments and Instances for Learning and Non-learning Methods

  • paper_url: http://arxiv.org/abs/2308.12794
  • repo_url: https://github.com/ai-for-decision-making-tue/job_shop_scheduling_benchmark
  • paper_authors: Robbert Reijnen, Kjell van Straaten, Zaharah Bukhsh, Yingqian Zhang
  • for: To provide a centralized hub for researchers, practitioners, and enthusiasts tackling machine scheduling problems.
  • methods: An open-source GitHub repository offering comprehensive benchmarks for a wide range of machine scheduling problems, including Job Shop Scheduling (JSP), Flow Shop Scheduling (FSP), Flexible Job Shop Scheduling (FJSP), FJSP with Assembly constraints (FAJSP), FJSP with Sequence-Dependent Setup Times (FJSP-SDST), and the online FJSP.
  • results: The repository gives researchers, practitioners, and enthusiasts a single, centralized place to tackle machine scheduling challenges.
    Abstract We introduce an open-source GitHub repository containing comprehensive benchmarks for a wide range of machine scheduling problems, including Job Shop Scheduling (JSP), Flow Shop Scheduling (FSP), Flexible Job Shop Scheduling (FJSP), FJSP with Assembly constraints (FAJSP), FJSP with Sequence-Dependent Setup Times (FJSP-SDST), and the online FJSP (with online job arrivals). Our primary goal is to provide a centralized hub for researchers, practitioners, and enthusiasts interested in tackling machine scheduling challenges.

Acquiring Qualitative Explainable Graphs for Automated Driving Scene Interpretation

  • paper_url: http://arxiv.org/abs/2308.12755
  • repo_url: None
  • paper_authors: Nassim Belmecheri, Arnaud Gotlieb, Nadjib Lazaar, Helge Spieker
  • for: This paper proposes a novel representation of automated driving scenes to better explain automated driving decisions.
  • methods: The method builds on the Qualitative Constraint Acquisition paradigm to rapidly compute the Qualitative eXplainable Graph (QXG) of an automated driving scene.
  • results: Experiments show that the Qualitative eXplainable Graph of an automated driving scene can be computed in real time with light storage requirements, making it a potentially useful tool for improved and more trustworthy perception and control in automated driving.
    Abstract The future of automated driving (AD) is rooted in the development of robust, fair and explainable artificial intelligence methods. Upon request, automated vehicles must be able to explain their decisions to the driver and the car passengers, to the pedestrians and other vulnerable road users and potentially to external auditors in case of accidents. However, nowadays, most explainable methods still rely on quantitative analysis of the AD scene representations captured by multiple sensors. This paper proposes a novel representation of AD scenes, called Qualitative eXplainable Graph (QXG), dedicated to qualitative spatiotemporal reasoning of long-term scenes. The construction of this graph exploits the recent Qualitative Constraint Acquisition paradigm. Our experimental results on NuScenes, an open real-world multi-modal dataset, show that the qualitative eXplainable graph of an AD scene composed of 40 frames can be computed in real-time and light in space storage which makes it a potentially interesting tool for improved and more trustworthy perception and control processes in AD.

Motion In-Betweening with Phase Manifolds

  • paper_url: http://arxiv.org/abs/2308.12751
  • repo_url: https://github.com/pauzii/phasebetweener
  • paper_authors: Paul Starke, Sebastian Starke, Taku Komura, Frank Steinicke
  • for: This paper introduces a novel data-driven motion in-betweening system to reach target poses of characters.
  • methods: The paper uses a mixture-of-experts neural network model, a Periodic Autoencoder, and a learned bi-directional control scheme to generate smooth and realistic character movements.
  • results: The proposed framework can compete with popular state-of-the-art methods for motion in-betweening in terms of motion quality and generalization, especially in the existence of long transition durations, and can also synthesize more challenging movements beyond locomotion behaviors. Additionally, style control is enabled between given target keyframes.
    Abstract This paper introduces a novel data-driven motion in-betweening system to reach target poses of characters by making use of phases variables learned by a Periodic Autoencoder. Our approach utilizes a mixture-of-experts neural network model, in which the phases cluster movements in both space and time with different expert weights. Each generated set of weights then produces a sequence of poses in an autoregressive manner between the current and target state of the character. In addition, to satisfy poses which are manually modified by the animators or where certain end effectors serve as constraints to be reached by the animation, a learned bi-directional control scheme is implemented to satisfy such constraints. The results demonstrate that using phases for motion in-betweening tasks sharpen the interpolated movements, and furthermore stabilizes the learning process. Moreover, using phases for motion in-betweening tasks can also synthesize more challenging movements beyond locomotion behaviors. Additionally, style control is enabled between given target keyframes. Our proposed framework can compete with popular state-of-the-art methods for motion in-betweening in terms of motion quality and generalization, especially in the existence of long transition durations. Our framework contributes to faster prototyping workflows for creating animated character sequences, which is of enormous interest for the game and film industry.

Separating the Human Touch from AI-Generated Text using Higher Criticism: An Information-Theoretic Approach

  • paper_url: http://arxiv.org/abs/2308.12747
  • repo_url: None
  • paper_authors: Alon Kipnis
  • for: To determine whether an article was written entirely by a generative language model or, alternatively, contains significant edits by a different author, possibly a human.
  • methods: Multiple perplexity tests are applied to the origin of individual sentences, and their results are combined using Higher Criticism (HC). The method simultaneously decides whether the article as a whole came from the language model and identifies sentences suspected of being edited.
  • results: The method is demonstrated on real data, together with an analysis of the factors affecting its success. The analysis raises several interesting open challenges whose resolution may further improve the method's effectiveness.
    Abstract We propose a method to determine whether a given article was entirely written by a generative language model versus an alternative situation in which the article includes some significant edits by a different author, possibly a human. Our process involves many perplexity tests for the origin of individual sentences or other text atoms, combining these multiple tests using Higher Criticism (HC). As a by-product, the method identifies parts suspected to be edited. The method is motivated by the convergence of the log-perplexity to the cross-entropy rate and by a statistical model for edited text saying that sentences are mostly generated by the language model, except perhaps for a few sentences that might have originated via a different mechanism. We demonstrate the effectiveness of our method using real data and analyze the factors affecting its success. This analysis raises several interesting open challenges whose resolution may improve the method's effectiveness.
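    Code sketch: the detection statistic combines many per-sentence tests with Higher Criticism. A small numpy sketch, assuming each sentence has already been turned into a p-value under the null hypothesis "generated by the language model", shows the HC computation.
```python
import numpy as np

def higher_criticism(p_values: np.ndarray, gamma: float = 0.5) -> float:
    """HC statistic over n per-sentence p-values:
    max over the smallest gamma*n sorted p-values of
    sqrt(n) * (i/n - p_(i)) / sqrt(p_(i) * (1 - p_(i)))."""
    p_sorted = np.sort(p_values)
    n = len(p_sorted)
    i = np.arange(1, n + 1)
    denom = np.clip(np.sqrt(p_sorted * (1.0 - p_sorted)), 1e-12, None)
    hc_terms = np.sqrt(n) * (i / n - p_sorted) / denom
    k = max(1, int(gamma * n))
    return float(hc_terms[:k].max())

# Toy example: mostly uniform p-values (model-generated sentences) plus a few very
# small ones (sentences whose perplexity does not fit the model, i.e. suspected edits).
rng = np.random.default_rng(0)
p_model_only = rng.uniform(size=50)
p_with_edits = np.concatenate([p_model_only, np.array([1e-4, 5e-4, 1e-3])])
print(higher_criticism(p_model_only), higher_criticism(p_with_edits))
```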

Human Comprehensible Active Learning of Genome-Scale Metabolic Networks

  • paper_url: http://arxiv.org/abs/2308.12740
  • repo_url: None
  • paper_authors: Lun Ai, Shi-Shun Liang, Wang-Zhou Dai, Liam Hallett, Stephen H. Muggleton, Geoff S. Baldwin
  • for: To improve experimental design and reduce experimental costs in the Design-Build-Test-Learn (DBTL) cycle of synthetic biology host cell systems.
  • methods: A new machine learning framework, ILP-iML1515, based on Inductive Logic Programming (ILP), which performs abductive logical reasoning and updates a genome-scale metabolic model by learning new logical structures from auxotrophic mutant trials.
  • results: ILP-iML1515 allows high-throughput simulations and actively selects experiments, reducing the experimental cost of learning gene functions compared to randomly selected experiments.
    Abstract An important application of Synthetic Biology is the engineering of the host cell system to yield useful products. However, an increase in the scale of the host system leads to huge design space and requires a large number of validation trials with high experimental costs. A comprehensible machine learning approach that efficiently explores the hypothesis space and guides experimental design is urgently needed for the Design-Build-Test-Learn (DBTL) cycle of the host cell system. We introduce a novel machine learning framework ILP-iML1515 based on Inductive Logic Programming (ILP) that performs abductive logical reasoning and actively learns from training examples. In contrast to numerical models, ILP-iML1515 is built on comprehensible logical representations of a genome-scale metabolic model and can update the model by learning new logical structures from auxotrophic mutant trials. The ILP-iML1515 framework 1) allows high-throughput simulations and 2) actively selects experiments that reduce the experimental cost of learning gene functions in comparison to randomly selected experiments.

Asymmetric Co-Training with Explainable Cell Graph Ensembling for Histopathological Image Classification

  • paper_url: http://arxiv.org/abs/2308.12737
  • repo_url: None
  • paper_authors: Ziqi Yang, Zhongyu Li, Chen Liu, Xiangde Luo, Xingguang Wang, Dou Xu, Chaoqun Li, Xiaoying Qin, Meng Yang, Long Jin
  • for: This paper focuses on multi-class histopathological image classification, with the goal of improving explainability and performance.
  • methods: The proposed method combines a deep graph convolutional network and a convolutional neural network, with an asymmetric co-training framework to dynamically integrate pixel-level and cell-level information.
  • results: The proposed method achieves superior performance, explainability, and generalizability in multi-class histopathological image classification, as demonstrated on private and public datasets.
    Abstract Convolutional neural networks excel in histopathological image classification, yet their pixel-level focus hampers explainability. Conversely, emerging graph convolutional networks spotlight cell-level features and medical implications. However, limited by their shallowness and suboptimal use of high-dimensional pixel data, GCNs underperform in multi-class histopathological image classification. To make full use of pixel-level and cell-level features dynamically, we propose an asymmetric co-training framework combining a deep graph convolutional network and a convolutional neural network for multi-class histopathological image classification. To improve the explainability of the entire framework by embedding morphological and topological distribution of cells, we build a 14-layer deep graph convolutional network to handle cell graph data. For the further utilization and dynamic interactions between pixel-level and cell-level information, we also design a co-training strategy to integrate the two asymmetric branches. Notably, we collect a private clinically acquired dataset termed LUAD7C, including seven subtypes of lung adenocarcinoma, which is rare and more challenging. We evaluated our approach on the private LUAD7C and public colorectal cancer datasets, showcasing its superior performance, explainability, and generalizability in multi-class histopathological image classification.

DeepLOC: Deep Learning-based Bone Pathology Localization and Classification in Wrist X-ray Images

  • paper_url: http://arxiv.org/abs/2308.12727
  • repo_url: https://github.com/olegrgv/DeepLOC
  • paper_authors: Razan Dibo, Andrey Galichin, Pavel Astashev, Dmitry V. Dylov, Oleg Y. Rogov
  • for: To help radiologists analyse wrist X-ray images for bone pathologies more accurately and efficiently.
  • methods: The method combines YOLO and the Shifted Window Transformer (Swin) with a newly proposed block to address the two main challenges of wrist X-ray analysis: localization and classification of bone pathologies. YOLO detects and localizes pathologies, leveraging its real-time object detection capabilities, while Swin extracts contextual information from the localized regions of interest for accurate classification.
  • results: The approach accurately localizes and classifies bone pathologies, improving the efficiency and accuracy of radiologists' analyses.
    Abstract In recent years, computer-aided diagnosis systems have shown great potential in assisting radiologists with accurate and efficient medical image analysis. This paper presents a novel approach for bone pathology localization and classification in wrist X-ray images using a combination of YOLO (You Only Look Once) and the Shifted Window Transformer (Swin) with a newly proposed block. The proposed methodology addresses two critical challenges in wrist X-ray analysis: accurate localization of bone pathologies and precise classification of abnormalities. The YOLO framework is employed to detect and localize bone pathologies, leveraging its real-time object detection capabilities. Additionally, the Swin, a transformer-based module, is utilized to extract contextual information from the localized regions of interest (ROIs) for accurate classification.

Continuous Reinforcement Learning-based Dynamic Difficulty Adjustment in a Visual Working Memory Game

  • paper_url: http://arxiv.org/abs/2308.12726
  • repo_url: None
  • paper_authors: Masoud Rahimi, Hadi Moradi, Abdol-hossein Vahabie, Hamed Kebriaei
  • for: To propose a reinforcement learning-based dynamic difficulty adjustment method that improves the player's game experience.
  • methods: A continuous reinforcement learning (RL) approach applied to a visual working memory (VWM) game to handle the complex search space of memorization difficulty; game difficulty is tailored based on the player's score and the difficulty of the previous trial.
  • results: In a within-subject experiment with 52 participants, the method yielded a significantly better game experience in terms of competence, tension, and negative and positive affect. Players also achieved higher scores and win rates, and the difficulty adjustment led to a significantly smaller decline in score over a 20-trial session.
    Abstract Dynamic Difficulty Adjustment (DDA) is a viable approach to enhance a player's experience in video games. Recently, Reinforcement Learning (RL) methods have been employed for DDA in non-competitive games; nevertheless, they rely solely on discrete state-action space with a small search space. In this paper, we propose a continuous RL-based DDA methodology for a visual working memory (VWM) game to handle the complex search space for the difficulty of memorization. The proposed RL-based DDA tailors game difficulty based on the player's score and game difficulty in the last trial. We defined a continuous metric for the difficulty of memorization. Then, we consider the task difficulty and the vector of difficulty-score as the RL's action and state, respectively. We evaluated the proposed method through a within-subject experiment involving 52 subjects. The proposed approach was compared with two rule-based difficulty adjustment methods in terms of player's score and game experience measured by a questionnaire. The proposed RL-based approach resulted in a significantly better game experience in terms of competence, tension, and negative and positive affect. Players also achieved higher scores and win rates. Furthermore, the proposed RL-based DDA led to a significantly less decline in the score in a 20-trial session.
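    Code sketch: the paper's formulation treats the pair (game difficulty, player score) as the RL state and a continuous change of memorization difficulty as the action. The following toy loop, with a placeholder policy and a simulated player standing in for the trained agent and the human participants, only illustrates that state/action interface.
```python
import numpy as np

rng = np.random.default_rng(1)

def simulated_player_score(difficulty: float) -> float:
    """Toy stand-in for a human playing the VWM game: score drops as difficulty rises."""
    return float(np.clip(1.0 - 0.8 * difficulty + rng.normal(0, 0.05), 0.0, 1.0))

def policy(state: np.ndarray) -> float:
    """Placeholder continuous policy: nudge difficulty toward a ~70% score band.
    A trained RL agent (e.g. an actor-critic) would replace this heuristic."""
    _difficulty, score = state
    return float(np.clip(0.5 * (score - 0.7), -0.1, 0.1))

difficulty, score = 0.3, 1.0
for trial in range(20):
    state = np.array([difficulty, score])
    action = policy(state)                            # continuous difficulty adjustment
    difficulty = float(np.clip(difficulty + action, 0.0, 1.0))
    score = simulated_player_score(difficulty)
    reward = 1.0 - abs(score - 0.7)                   # keep the player near a target challenge level
    print(f"trial {trial:2d}  difficulty={difficulty:.2f}  score={score:.2f}  reward={reward:.2f}")
```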

VIGC: Visual Instruction Generation and Correction

  • paper_url: http://arxiv.org/abs/2308.12714
  • repo_url: None
  • paper_authors: Bin Wang, Fan Wu, Xiao Han, Jiahui Peng, Huaping Zhong, Pan Zhang, Xiaoyi Dong, Weijia Li, Wei Li, Jiaqi Wang, Conghui He
  • for: The paper aims to address the challenge of obtaining high-quality instruction-tuning data for vision-language tasks, specifically by utilizing multimodal large language models (MLLMs) to generate such data.
  • methods: The proposed framework, called Visual Instruction Generation and Correction (VIGC), consists of two main components: Visual Instruction Generation (VIG) and Visual Instruction Correction (VIC). VIG guides the vision-language model to generate diverse instruction-tuning data, while VIC corrects any inaccuracies in the generated data through an iterative update mechanism.
  • results: The proposed VIGC framework effectively enhances the quality of instruction-tuning data, as demonstrated by experimental results that show improved benchmark performance compared to language-only data generation methods.
    Abstract The integration of visual encoders and large language models (LLMs) has driven recent progress in multimodal large language models (MLLMs). However, the scarcity of high-quality instruction-tuning data for vision-language tasks remains a challenge. The current leading paradigm, such as LLaVA, relies on language-only GPT-4 to generate data, which requires pre-annotated image captions and detection bounding boxes, suffering from understanding image details. A practical solution to this problem would be to utilize the available multimodal large language models (MLLMs) to generate instruction data for vision-language tasks. However, it's worth noting that the currently accessible MLLMs are not as powerful as their LLM counterparts, as they tend to produce inadequate responses and generate false information. As a solution for addressing the current issue, this paper proposes the Visual Instruction Generation and Correction (VIGC) framework that enables multimodal large language models to generate instruction-tuning data and progressively enhance its quality on-the-fly. Specifically, Visual Instruction Generation (VIG) guides the vision-language model to generate diverse instruction-tuning data. To ensure generation quality, Visual Instruction Correction (VIC) adopts an iterative update mechanism to correct any inaccuracies in data produced by VIG, effectively reducing the risk of hallucination. Leveraging the diverse, high-quality data generated by VIGC, we finetune mainstream models and validate data quality based on various evaluations. Experimental results demonstrate that VIGC not only compensates for the shortcomings of language-only data generation methods, but also effectively enhances the benchmark performance. The models, datasets, and code will be made publicly available.

SayCanPay: Heuristic Planning with Large Language Models using Learnable Domain Knowledge

  • paper_url: http://arxiv.org/abs/2308.12682
  • repo_url: None
  • paper_authors: Rishi Hazra, Pedro Zuidberg Dos Martires, Luc De Raedt
  • for: To combine large language models (LLMs) with heuristic planning so as to produce plans that are both feasible and cost-effective.
  • methods: The proposed method uses an LLM to generate candidate actions (Say), evaluates each action's feasibility (Can) and long-term reward/payoff (Pay) using learnable domain knowledge, and applies heuristic search to select the best sequence of actions.
  • results: In evaluations, the model outperforms other LLM planning approaches, producing more feasible and cost-effective plans.
    Abstract Large Language Models (LLMs) have demonstrated impressive planning abilities due to their vast "world knowledge". Yet, obtaining plans that are both feasible (grounded in affordances) and cost-effective (in plan length), remains a challenge, despite recent progress. This contrasts with heuristic planning methods that employ domain knowledge (formalized in action models such as PDDL) and heuristic search to generate feasible, optimal plans. Inspired by this, we propose to combine the power of LLMs and heuristic planning by leveraging the world knowledge of LLMs and the principles of heuristic search. Our approach, SayCanPay, employs LLMs to generate actions (Say) guided by learnable domain knowledge, that evaluates actions' feasibility (Can) and long-term reward/payoff (Pay), and heuristic search to select the best sequence of actions. Our contributions are (1) a novel framing of the LLM planning problem in the context of heuristic planning, (2) integrating grounding and cost-effective elements into the generated plans, and (3) using heuristic search over actions. Our extensive evaluations show that our model surpasses other LLM planning approaches.
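    Code sketch: SayCanPay scores each candidate action by combining the LLM's proposal probability (Say) with learned feasibility (Can) and payoff (Pay) estimates, then searches over action sequences. The following greedy decoding sketch uses hand-written placeholder scoring functions where the paper uses an LLM and two learned models.
```python
from typing import List

ACTIONS = ["pick up key", "go to door", "unlock door", "open door", "stop"]

def say(history: List[str], action: str) -> float:
    """Placeholder for the LLM's probability of proposing `action` given the plan so far."""
    return 0.9 if action != "stop" or len(history) >= 3 else 0.1

def can(history: List[str], action: str) -> float:
    """Placeholder feasibility model: the door must be unlocked before it can be opened."""
    if action == "open door" and "unlock door" not in history:
        return 0.05
    if action == "unlock door" and "pick up key" not in history:
        return 0.05
    return 0.95

def pay(history: List[str], action: str) -> float:
    """Placeholder long-term payoff estimate toward the goal 'door is open'."""
    return {"pick up key": 0.4, "go to door": 0.5, "unlock door": 0.7,
            "open door": 1.0, "stop": 0.2}[action]

def greedy_plan(max_steps: int = 5) -> List[str]:
    plan: List[str] = []
    for _ in range(max_steps):
        # Combined score: proposal probability weighted by feasibility and payoff.
        scored = [(say(plan, a) * can(plan, a) * pay(plan, a), a)
                  for a in ACTIONS if a not in plan]
        _, best = max(scored)
        if best == "stop":
            break
        plan.append(best)
    return plan

print(greedy_plan())  # ['go to door', 'pick up key', 'unlock door', 'open door']
```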

LR-XFL: Logical Reasoning-based Explainable Federated Learning

  • paper_url: http://arxiv.org/abs/2308.12681
  • repo_url: None
  • paper_authors: Yanci Zhang, Han Yu
  • for: This paper aims to improve the transparency and explainability of federated learning (FL) models by incorporating logic-based explanations into the FL framework.
  • methods: The proposed Logical Reasoning-based eXplainable Federated Learning (LR-XFL) approach involves FL clients creating local logic rules based on their local data and sending them to the FL server, which connects the local logic rules through a proper logical connector without requiring access to the raw data. The server aggregates the local model updates with weight values determined by the quality of the clients’ local data as reflected by their uploaded logic rules.
  • results: The results show that LR-XFL outperforms the most relevant baseline by 1.19%, 5.81% and 5.41% in terms of classification accuracy, rule accuracy and rule fidelity, respectively. The explicit rule evaluation and expression under LR-XFL enable human experts to validate and correct the rules on the server side, hence improving the global FL model’s robustness to errors.
    Abstract Federated learning (FL) is an emerging approach for training machine learning models collaboratively while preserving data privacy. The need for privacy protection makes it difficult for FL models to achieve global transparency and explainability. To address this limitation, we incorporate logic-based explanations into FL by proposing the Logical Reasoning-based eXplainable Federated Learning (LR-XFL) approach. Under LR-XFL, FL clients create local logic rules based on their local data and send them, along with model updates, to the FL server. The FL server connects the local logic rules through a proper logical connector that is derived based on properties of client data, without requiring access to the raw data. In addition, the server also aggregates the local model updates with weight values determined by the quality of the clients' local data as reflected by their uploaded logic rules. The results show that LR-XFL outperforms the most relevant baseline by 1.19%, 5.81% and 5.41% in terms of classification accuracy, rule accuracy and rule fidelity, respectively. The explicit rule evaluation and expression under LR-XFL enable human experts to validate and correct the rules on the server side, hence improving the global FL model's robustness to errors. It has the potential to enhance the transparency of FL models for areas like healthcare and finance where both data privacy and explainability are important.
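    Code sketch: on the server side, LR-XFL weights each client's model update by the quality of the logic rules that client uploads, and connects the rules with a logical connector. The following numpy sketch shows only the weighted aggregation step, with made-up rule-accuracy scores standing in for the rule-quality signal.
```python
import numpy as np

# Hypothetical local model updates (flattened parameter vectors) from three clients.
client_updates = [np.array([0.2, -0.1, 0.5]),
                  np.array([0.3,  0.0, 0.4]),
                  np.array([1.5, -2.0, 3.0])]   # e.g. a client with noisy, low-quality data

# Hypothetical quality of each client's uploaded logic rules (e.g. local rule accuracy).
rule_quality = np.array([0.92, 0.88, 0.35])

# Weight each update by its rule quality, normalized to sum to one.
weights = rule_quality / rule_quality.sum()
global_update = sum(w * u for w, u in zip(weights, client_updates))
print(weights, global_update)

# The server would additionally combine the clients' local rules through a logical
# connector chosen from properties of the client data (not sketched here).
```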

Improving Translation Faithfulness of Large Language Models via Augmenting Instructions

  • paper_url: http://arxiv.org/abs/2308.12674
  • repo_url: https://github.com/pppa2019/swie_overmiss_llm4mt
  • paper_authors: Yijie Chen, Yijin Liu, Fandong Meng, Yufeng Chen, Jinan Xu, Jie Zhou
  • for: To stimulate specialized capabilities of large language models (LLMs), such as machine translation, through low-cost instruction tuning.
  • methods: Proposes Segment-Weighted Instruction Embedding (SWIE) and the OVERMISS instruction-following dataset to counter the local focus of the LLM attention mechanism and the resulting risk of instruction forgetting during decoding.
  • results: Applied to two mainstream open-source LLMs (BLOOM and LLaMA), experiments show that SWIE improves translation performance, especially for zero-shot and long-text translation, while OVERMISS improves both translation performance and faithfulness as measured by word alignment.
    Abstract Large Language Models (LLMs) present strong general capabilities, and a current compelling challenge is stimulating their specialized capabilities, such as machine translation, through low-cost instruction tuning. The standard instruction-following data is sequentially organized as the concatenation of an instruction, an input, and a response. As the attention mechanism of LLMs has limitations on local focus, LLMs tend to focus more on the words or sentences nearby at each position. This leads to a high risk of instruction forgetting during decoding. To alleviate the above issues, We propose SWIE (Segment-Weighted Instruction Embedding) and an instruction-following dataset OVERMISS. SWIE improves the model instruction understanding by adding a global instruction representation on the following input and response representations. OVERMISS improves model faithfulness by comparing over-translation and miss-translation results with the correct translation. We apply our methods to two main-stream open-source LLMs, BLOOM and LLaMA. The experimental results demonstrate significant improvements in translation performance with SWIE based on BLOOMZ-3b, particularly in zero-shot and long text translations due to reduced instruction forgetting risk. Additionally, OVERMISS outperforms the baseline in translation performance (e.g. an increase in BLEU scores from 0.69 to 3.12 and an average improvement of 0.48 percentage comet scores for LLaMA-7b) with further enhancements seen in models combining OVERMISS and SWIE (e.g. the BLUE scores increase up to 0.56 from English to German across three different backbones), and both exhibit improvements in the faithfulness metric based on word alignment.

Don’t Look into the Sun: Adversarial Solarization Attacks on Image Classifiers

  • paper_url: http://arxiv.org/abs/2308.12661
  • repo_url: https://github.com/paulgavrikov/adversarial_solarization
  • paper_authors: Paul Gavrikov, Janis Keuper
  • for: To assess the robustness of deep neural networks against out-of-distribution inputs, which matters in safety-critical domains such as autonomous driving and in safety systems where malicious actors can digitally alter inputs to circumvent safety guards.
  • methods: An attack method based on image solarization that is conceptually straightforward yet does not jeopardize the global structure of natural images, evaluated comprehensively on multiple ImageNet models.
  • results: The attack degrades accuracy significantly unless solarization is included in the training augmentations, and even then no full immunity is achieved. In other settings the attack can often be simplified into a black-box attack with model-independent parameters, and defenses against other corruptions do not consistently extend to it, indicating that robustness evaluation of deep networks remains an open research area.
    Abstract Assessing the robustness of deep neural networks against out-of-distribution inputs is crucial, especially in safety-critical domains like autonomous driving, but also in safety systems where malicious actors can digitally alter inputs to circumvent safety guards. However, designing effective out-of-distribution tests that encompass all possible scenarios while preserving accurate label information is a challenging task. Existing methodologies often entail a compromise between variety and constraint levels for attacks and sometimes even both. In a first step towards a more holistic robustness evaluation of image classification models, we introduce an attack method based on image solarization that is conceptually straightforward yet avoids jeopardizing the global structure of natural images independent of the intensity. Through comprehensive evaluations of multiple ImageNet models, we demonstrate the attack's capacity to degrade accuracy significantly, provided it is not integrated into the training augmentations. Interestingly, even then, no full immunity to accuracy deterioration is achieved. In other settings, the attack can often be simplified into a black-box attack with model-independent parameters. Defenses against other corruptions do not consistently extend to be effective against our specific attack. Project website: https://github.com/paulgavrikov/adversarial_solarization
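    Code sketch: in its simplest form the attack sweeps solarization thresholds and keeps the one that flips the classifier's prediction. The sketch below uses torchvision's solarize and a pretrained ResNet-18 as a stand-in classifier (any ImageNet model would do; the image path is a placeholder, and a recent torchvision with the weights enum API is assumed).
```python
import torch
import torchvision.transforms.functional as TF
from torchvision import models, transforms
from PIL import Image

model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1).eval()
preprocess = transforms.Compose([
    transforms.Resize(256), transforms.CenterCrop(224), transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

img = Image.open("example.jpg").convert("RGB")   # placeholder input image

@torch.no_grad()
def predict(pil_img):
    return model(preprocess(pil_img).unsqueeze(0)).argmax(dim=1).item()

clean_label = predict(img)
# Black-box sweep over solarization thresholds (255 = identity, lower = stronger effect).
for threshold in range(255, -1, -16):
    attacked = TF.solarize(img, threshold)       # invert all pixels above the threshold
    if predict(attacked) != clean_label:
        print(f"Prediction flipped at solarization threshold {threshold}")
        break
else:
    print("No threshold in the sweep changed the prediction")
```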

kTrans: Knowledge-Aware Transformer for Binary Code Embedding

  • paper_url: http://arxiv.org/abs/2308.12659
  • repo_url: https://github.com/learner0x5a/ktrans-release
  • paper_authors: Wenyu Zhu, Hao Wang, Yuchen Zhou, Jiaming Wang, Zihan Sha, Zeyu Gao, Chao Zhang
  • for: To propose kTrans, a Transformer-based knowledge-aware binary code embedding that improves performance on downstream tasks.
  • methods: A Transformer model that takes explicit knowledge as additional inputs and fuses implicit knowledge through a novel pre-training task.
  • results: On three downstream tasks (binary code similarity detection, function type recovery, and indirect call recognition), kTrans generates high-quality binary code embeddings and outperforms state-of-the-art approaches by 5.2%, 6.8%, and 12.6%, respectively.
    Abstract Binary Code Embedding (BCE) has important applications in various reverse engineering tasks such as binary code similarity detection, type recovery, control-flow recovery and data-flow analysis. Recent studies have shown that the Transformer model can comprehend the semantics of binary code to support downstream tasks. However, existing models overlooked the prior knowledge of assembly language. In this paper, we propose a novel Transformer-based approach, namely kTrans, to generate knowledge-aware binary code embedding. By feeding explicit knowledge as additional inputs to the Transformer, and fusing implicit knowledge with a novel pre-training task, kTrans provides a new perspective to incorporating domain knowledge into a Transformer framework. We inspect the generated embeddings with outlier detection and visualization, and also apply kTrans to 3 downstream tasks: Binary Code Similarity Detection (BCSD), Function Type Recovery (FTR) and Indirect Call Recognition (ICR). Evaluation results show that kTrans can generate high-quality binary code embeddings, and outperforms state-of-the-art (SOTA) approaches on downstream tasks by 5.2%, 6.8%, and 12.6% respectively. kTrans is publicly available at: https://github.com/Learner0x5a/kTrans-release

APART: Diverse Skill Discovery using All Pairs with Ascending Reward and DropouT

  • paper_url: http://arxiv.org/abs/2308.12649
  • repo_url: None
  • paper_authors: Hadar Schreiber Galler, Tom Zahavy, Guillaume Desjardins, Alon Cohen
  • for: To discover diverse skills in reward-free environments, aiming to find all possible skills in simple grid-world environments where prior methods have struggled to succeed.
  • methods: A method named APART, which combines a one-vs-one (all pairs) discriminator, a novel ascending intrinsic reward function, and a dropout regularization technique.
  • results: Experiments show that APART discovers all possible skills in grid worlds with remarkably fewer samples than previous works. A simplified algorithm is also proposed that achieves the maximum number of skills by altering VIC, rescaling its intrinsic reward, and tuning the temperature of its softmax discriminator. These findings shed light on the factors underlying the success of skill discovery algorithms in reinforcement learning.
    Abstract We study diverse skill discovery in reward-free environments, aiming to discover all possible skills in simple grid-world environments where prior methods have struggled to succeed. This problem is formulated as mutual training of skills using an intrinsic reward and a discriminator trained to predict a skill given its trajectory. Our initial solution replaces the standard one-vs-all (softmax) discriminator with a one-vs-one (all pairs) discriminator and combines it with a novel intrinsic reward function and a dropout regularization technique. The combined approach is named APART: Diverse Skill Discovery using All Pairs with Ascending Reward and Dropout. We demonstrate that APART discovers all the possible skills in grid worlds with remarkably fewer samples than previous works. Motivated by the empirical success of APART, we further investigate an even simpler algorithm that achieves maximum skills by altering VIC, rescaling its intrinsic reward, and tuning the temperature of its softmax discriminator. We believe our findings shed light on the crucial factors underlying success of skill discovery algorithms in reinforcement learning.

Advancing Hungarian Text Processing with HuSpaCy: Efficient and Accurate NLP Pipelines

  • paper_url: http://arxiv.org/abs/2308.12635
  • repo_url: https://github.com/huspacy/huspacy
  • paper_authors: György Orosz, Gergő Szabó, Péter Berkecz, Zsolt Szántó, Richárd Farkas
  • for: To provide industrial-grade Hungarian text processing models that balance resource efficiency and accuracy while achieving near state-of-the-art performance.
  • methods: The models are implemented in the spaCy framework, extending the HuSpaCy toolkit with several architectural improvements; the pipelines cover all basic text processing steps, including tokenization, sentence-boundary detection, part-of-speech tagging, morphological feature tagging, lemmatization, dependency parsing, and named entity recognition.
  • results: The proposed enhancements are thoroughly evaluated and compared with existing NLP tools for Hungarian, demonstrating the competitive performance of the new models in all text preprocessing steps.
    Abstract This paper presents a set of industrial-grade text processing models for Hungarian that achieve near state-of-the-art performance while balancing resource efficiency and accuracy. Models have been implemented in the spaCy framework, extending the HuSpaCy toolkit with several improvements to its architecture. Compared to existing NLP tools for Hungarian, all of our pipelines feature all basic text processing steps including tokenization, sentence-boundary detection, part-of-speech tagging, morphological feature tagging, lemmatization, dependency parsing and named entity recognition with high accuracy and throughput. We thoroughly evaluated the proposed enhancements, compared the pipelines with state-of-the-art tools and demonstrated the competitive performance of the new models in all text preprocessing steps. All experiments are reproducible and the pipelines are freely available under a permissive license.
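    Code sketch: the pipelines follow the standard spaCy API. A minimal usage example, assuming the huspacy package and its default large model are installed (pip install huspacy; the huspacy.download()/load() helpers and the hu_core_news_lg model name follow the project's README), could look like this.
```python
import huspacy

# Download (first run only) and load the default Hungarian pipeline: tokenizer,
# sentence splitter, PoS tagger, morphological tagger, lemmatizer, parser and NER.
huspacy.download()
nlp = huspacy.load()

doc = nlp("Az Országgyűlés kedden szavazott a törvényjavaslatról Budapesten.")
for token in doc:
    print(token.text, token.lemma_, token.pos_, token.dep_)
for ent in doc.ents:
    print(ent.text, ent.label_)
```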

Towards Hierarchical Regional Transformer-based Multiple Instance Learning

  • paper_url: http://arxiv.org/abs/2308.12634
  • repo_url: None
  • paper_authors: Josef Cersovsky, Sadegh Mohammadi, Dagmar Kainmueller, Johannes Hoehne
  • for: To address the classification of gigapixel histopathology images, a critical task in digital pathology and precision medicine.
  • methods: A Transformer-based multiple instance learning model that replaces the traditional learned attention mechanism with a regional, Vision Transformer-inspired self-attention mechanism. Regional patch information is fused to derive slide-level predictions, and this regional aggregation can be stacked to hierarchically process features at different distance levels.
  • results: Evaluated on two histopathology datasets, the approach significantly outperforms the baseline, especially for datasets with small, local morphological features. A method is also introduced to focus image processing on high-attention regions during inference, further increasing predictive accuracy.
    Abstract The classification of gigapixel histopathology images with deep multiple instance learning models has become a critical task in digital pathology and precision medicine. In this work, we propose a Transformer-based multiple instance learning approach that replaces the traditional learned attention mechanism with a regional, Vision Transformer inspired self-attention mechanism. We present a method that fuses regional patch information to derive slide-level predictions and show how this regional aggregation can be stacked to hierarchically process features on different distance levels. To increase predictive accuracy, especially for datasets with small, local morphological features, we introduce a method to focus the image processing on high attention regions during inference. Our approach is able to significantly improve performance over the baseline on two histopathology datasets and points towards promising directions for further research.

A Greedy Approach for Offering to Telecom Subscribers

  • paper_url: http://arxiv.org/abs/2308.12606
  • repo_url: None
  • paper_authors: Piyush Kanti Bhunre, Tanmay Sen, Arijit Sarkar
  • for: This paper is written for telecom operators to optimize offer campaigns for customer retention and churn prevention.
  • methods: The paper proposes a novel combinatorial algorithm for solving offer optimization under heterogeneous offers by maximizing expected revenue under the scenario of subscriber churn.
  • results: The proposed algorithm is efficient and accurate even for a very large subscriber-base.
    Abstract Customer retention or churn prevention is a challenging task of a telecom operator. One of the effective approaches is to offer some attractive incentive or additional services or money to the subscribers for keeping them engaged and make sure they stay in the operator's network for longer time. Often, operators allocate certain amount of monetary budget to carry out the offer campaign. The difficult part of this campaign is the selection of a set of customers from a large subscriber-base and deciding the amount that should be offered to an individual so that operator's objective is achieved. There may be multiple objectives (e.g., maximizing revenue, minimizing number of churns) for selection of subscriber and selection of an offer to the selected subscriber. Apart from monetary benefit, offers may include additional data, SMS, hots-spot tethering, and many more. This problem is known as offer optimization. In this paper, we propose a novel combinatorial algorithm for solving offer optimization under heterogeneous offers by maximizing expected revenue under the scenario of subscriber churn, which is, in general, seen in telecom domain. The proposed algorithm is efficient and accurate even for a very large subscriber-base.
    摘要 In this paper, we propose a novel combinatorial algorithm to solve offer optimization under heterogeneous offers by maximizing expected revenue under the scenario of subscriber churn, which is common in the telecom domain. The proposed algorithm is efficient and accurate, even for a very large subscriber base.

APLA: Additional Perturbation for Latent Noise with Adversarial Training Enables Consistency

  • paper_url: http://arxiv.org/abs/2308.12605
  • repo_url: None
  • paper_authors: Yupu Yao, Shangqi Deng, Zihan Cao, Harry Zhang, Liang-Jian Deng
  • for: To propose a diffusion-based text-to-video (T2V) generation network structure that addresses the lack of consistent local detail across frames in conventional diffusion models for video generation.
  • methods: A T2V generation structure built on pre-trained stable diffusion networks, requiring only a single video as input. An additional compact network, the Video Generation Transformer (VGT), extracts perturbations from the inherent information contained within the input to refine inconsistent pixels during temporal predictions, and a hybrid architecture of transformers and convolutions compensates for temporal intricacies, improving consistency between frames.
  • results: Experiments demonstrate a noticeable improvement in the consistency of the generated videos, both qualitatively and quantitatively.
    Abstract Diffusion models have exhibited promising progress in video generation. However, they often struggle to retain consistent details within local regions across frames. One underlying cause is that traditional diffusion models approximate Gaussian noise distribution by utilizing predictive noise, without fully accounting for the impact of inherent information within the input itself. Additionally, these models emphasize the distinction between predictions and references, neglecting information intrinsic to the videos. To address this limitation, inspired by the self-attention mechanism, we propose a novel text-to-video (T2V) generation network structure based on diffusion models, dubbed Additional Perturbation for Latent noise with Adversarial training (APLA). Our approach only necessitates a single video as input and builds upon pre-trained stable diffusion networks. Notably, we introduce an additional compact network, known as the Video Generation Transformer (VGT). This auxiliary component is designed to extract perturbations from the inherent information contained within the input, thereby refining inconsistent pixels during temporal predictions. We leverage a hybrid architecture of transformers and convolutions to compensate for temporal intricacies, enhancing consistency between different frames within the video. Experiments demonstrate a noticeable improvement in the consistency of the generated videos both qualitatively and quantitatively.

SICNN: Soft Interference Cancellation Inspired Neural Network Equalizers

  • paper_url: http://arxiv.org/abs/2308.12591
  • repo_url: None
  • paper_authors: Stefan Baumgartner, Oliver Lang, Mario Huemer
  • for: To propose neural network-based equalization approaches that avoid the high computational complexity and approximation-induced performance degradation of model-based equalization methods.
  • methods: Two NN-based equalizers obtained by deep unfolding of an iterative soft interference cancellation (SIC) method: SICNNv1, tailored specifically to single-carrier frequency-domain equalization systems, and SICNNv2, a more universal variant applicable as an equalizer in any communication system with a block-based data transmission scheme.
  • results: SICNNv1 outperforms state-of-the-art model-based and NN-based equalizers in bit error ratio performance. The paper also provides a thorough complexity analysis of the NN-based equalizers and investigates the influence of the training set size on their performance.
    Abstract Equalization is an important task at the receiver side of a digital wireless communication system, which is traditionally conducted with model-based estimation methods. Among the numerous options for model-based equalization, iterative soft interference cancellation (SIC) is a well-performing approach since error propagation caused by hard decision data symbol estimation during the iterative estimation procedure is avoided. However, the model-based method suffers from high computational complexity and performance degradation due to required approximations. In this work, we propose a novel neural network (NN-)based equalization approach, referred to as SICNN, which is designed by deep unfolding of a model-based iterative SIC method, eliminating the main disadvantages of its model-based counterpart. We present different variants of SICNN. SICNNv1 is very similar to the model-based method, and is specifically tailored for single carrier frequency domain equalization systems, which is the communication system we regard in this work. The second variant, SICNNv2, is more universal, and is applicable as an equalizer in any communication system with a block-based data transmission scheme. We highlight the pros and cons of both variants. Moreover, for both SICNNv1 and SICNNv2 we present a version with a highly reduced number of learnable parameters. We compare the achieved bit error ratio performance of the proposed NN-based equalizers with state-of-the-art model-based and NN-based approaches, highlighting the superiority of SICNNv1 over all other methods. Also, we present a thorough complexity analysis of the proposed NN-based equalization approaches, and we investigate the influence of the training set size on the performance of NN-based equalizers.

A Huber Loss Minimization Approach to Byzantine Robust Federated Learning

  • paper_url: http://arxiv.org/abs/2308.12581
  • repo_url: None
  • paper_authors: Puning Zhao, Fei Yu, Zhiguo Wan
  • for: To defend federated learning systems against Byzantine attacks by proposing a novel aggregator based on Huber loss minimization, accompanied by a comprehensive theoretical analysis.
  • methods: Huber loss minimization is used to aggregate client updates robustly, without requiring precise knowledge of the ratio of attacked clients ($\epsilon$).
  • results: Under the independent and identically distributed (i.i.d.) assumption, the method has optimal dependence on $\epsilon$, allows clients to have unequal data sizes, and the analysis extends to non-i.i.d. data where clients have slightly different distributions.
    Abstract Federated learning systems are susceptible to adversarial attacks. To combat this, we introduce a novel aggregator based on Huber loss minimization, and provide a comprehensive theoretical analysis. Under independent and identically distributed (i.i.d) assumption, our approach has several advantages compared to existing methods. Firstly, it has optimal dependence on $\epsilon$, which stands for the ratio of attacked clients. Secondly, our approach does not need precise knowledge of $\epsilon$. Thirdly, it allows different clients to have unequal data sizes. We then broaden our analysis to include non-i.i.d data, such that clients have slightly different distributions.
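    Code sketch: the aggregator returns the vector minimizing the sum of Huber losses of the distances to the client updates, which can be computed with a few iteratively reweighted least-squares steps. A minimal numpy sketch (an illustration under an assumed threshold delta, not the paper's full analysis) follows.
```python
import numpy as np

def huber_aggregate(updates: np.ndarray, delta: float = 1.0, iters: int = 50) -> np.ndarray:
    """Aggregate client updates by minimizing the sum of Huber losses of the
    Euclidean distances to each update, via IRLS iterations.
    updates: (num_clients, dim) array of client model updates."""
    z = updates.mean(axis=0)                      # start from the plain average
    for _ in range(iters):
        dists = np.linalg.norm(updates - z, axis=1)
        # Quadratic regime gets weight 1; linear regime gets weight delta / distance,
        # which caps the influence of far-away (possibly Byzantine) updates.
        weights = np.minimum(1.0, delta / np.maximum(dists, 1e-12))
        z_new = (weights[:, None] * updates).sum(axis=0) / weights.sum()
        if np.linalg.norm(z_new - z) < 1e-9:
            break
        z = z_new
    return z

# Toy example: nine honest clients near the true update plus one Byzantine client.
rng = np.random.default_rng(0)
honest = rng.normal(loc=[1.0, -2.0], scale=0.1, size=(9, 2))
updates = np.vstack([honest, np.array([[100.0, 100.0]])])
print("plain mean:      ", updates.mean(axis=0))
print("Huber aggregate: ", huber_aggregate(updates, delta=0.5))
```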

Mind vs. Mouth: On Measuring Re-judge Inconsistency of Social Bias in Large Language Models

  • paper_url: http://arxiv.org/abs/2308.12578
  • repo_url: None
  • paper_authors: Yachao Zhao, Bo Wang, Dongming Zhao, Kun Huang, Yan Wang, Ruifang He, Yuexian Hou
  • for: To investigate cognitive constructs in large language models (LLMs), in particular explicit and implicit social bias, a two-level cognitive construct from psychology.
  • methods: A two-stage approach: the LLM first automatically completes statements, potentially incorporating implicit social bias, and then re-judges the biased statements it generated itself.
  • results: LLMs exhibit a "re-judge inconsistency": after completing a statement, the model re-judges and contradicts its own output. This inconsistency may parallel the gap between humans' unconscious implicit social bias and their conscious explicit social bias; experiments on ChatGPT and GPT-4 with common gender biases studied in psychology corroborate the highly stable nature of the effect.
    Abstract Recent researches indicate that Pre-trained Large Language Models (LLMs) possess cognitive constructs similar to those observed in humans, prompting researchers to investigate the cognitive aspects of LLMs. This paper focuses on explicit and implicit social bias, a distinctive two-level cognitive construct in psychology. It posits that individuals' explicit social bias, which is their conscious expression of bias in the statements, may differ from their implicit social bias, which represents their unconscious bias. We propose a two-stage approach and discover a parallel phenomenon in LLMs known as "re-judge inconsistency" in social bias. In the initial stage, the LLM is tasked with automatically completing statements, potentially incorporating implicit social bias. However, in the subsequent stage, the same LLM re-judges the biased statement generated by itself but contradicts it. We propose that this re-judge inconsistency can be similar to the inconsistency between human's unaware implicit social bias and their aware explicit social bias. Experimental investigations on ChatGPT and GPT-4 concerning common gender biases examined in psychology corroborate the highly stable nature of the re-judge inconsistency. This finding may suggest that diverse cognitive constructs emerge as LLMs' capabilities strengthen. Consequently, leveraging psychological theories can provide enhanced insights into the underlying mechanisms governing the expressions of explicit and implicit constructs in LLMs.

REB: Reducing Biases in Representation for Industrial Anomaly Detection

  • paper_url: http://arxiv.org/abs/2308.12577
  • repo_url: https://github.com/shuailyu/reb
  • paper_authors: Shuai Lyu, Dongmei Mo, Waikeung Wong
  • for: To improve industrial anomaly detection by reducing domain bias and local density bias in feature representations.
  • methods: The Reducing Biases (REB) approach addresses the domain bias of the pre-trained model through a self-supervised learning task with a defect generation strategy (DefectMaker) imitating natural defects, and proposes a local density KNN (LDKNN) to reduce the local density bias in feature space.
  • results: Achieves 99.5% AUROC on the widely used MVTec AD benchmark and 88.0% AUROC on the challenging MVTec LOCO AD dataset, a 4.7% AUROC improvement over the state of the art. All results are obtained with smaller backbone networks such as Vgg11 and Resnet18, indicating the effectiveness and efficiency of REB for practical industrial applications.
    Abstract Existing K-nearest neighbor (KNN) retrieval-based methods usually conduct industrial anomaly detection in two stages: obtain feature representations with a pre-trained CNN model and perform distance measures for defect detection. However, the features are not fully exploited as they ignore domain bias and the difference of local density in feature space, which limits the detection performance. In this paper, we propose Reducing Biases (REB) in representation by considering the domain bias of the pre-trained model and building a self-supervised learning task for better domain adaptation with a defect generation strategy (DefectMaker) imitating the natural defects. Additionally, we propose a local density KNN (LDKNN) to reduce the local density bias and obtain effective anomaly detection. We achieve a promising result of 99.5% AUROC on the widely used MVTec AD benchmark. We also achieve 88.0% AUROC on the challenging MVTec LOCO AD dataset and bring an improvement of 4.7% AUROC to the state-of-the-art result. All results are obtained with smaller backbone networks such as Vgg11 and Resnet18, which indicates the effectiveness and efficiency of REB for practical industrial applications.
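    Code sketch: at test time the anomaly score of a feature is its kNN distance to the normal-feature memory bank, normalized by the local density of that neighbourhood so that sparse and dense regions of feature space are treated comparably. The numpy sketch below illustrates one such local-density-normalized kNN score; it is a simplification, not the paper's exact LDKNN formulation.
```python
import numpy as np

def ldknn_score(query: np.ndarray, memory: np.ndarray, k: int = 5) -> float:
    """Anomaly score of one query feature against a memory bank of normal features.
    The raw kNN distance is divided by the average kNN distance within the query's
    neighbourhood, a simple local-density correction."""
    dists = np.linalg.norm(memory - query, axis=1)
    nn_idx = np.argsort(dists)[:k]
    knn_dist = dists[nn_idx].mean()

    # Local density estimate: average kNN distance of the query's neighbours,
    # measured within the memory bank itself (skipping the zero self-distance).
    local = []
    for i in nn_idx:
        d_i = np.sort(np.linalg.norm(memory - memory[i], axis=1))[1:k + 1]
        local.append(d_i.mean())
    return float(knn_dist / (np.mean(local) + 1e-12))

# Toy example: normal features form one cluster; the shifted query scores higher.
rng = np.random.default_rng(0)
memory = rng.normal(size=(200, 64))
normal_query = rng.normal(size=64)
outlier_query = rng.normal(size=64) + 3.0
print(ldknn_score(normal_query, memory), ldknn_score(outlier_query, memory))
```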

Exploring the Integration Strategies of Retriever and Large Language Models

  • paper_url: http://arxiv.org/abs/2308.12574
  • repo_url: None
  • paper_authors: Ye Liu, Semih Yavuz, Rui Meng, Meghana Moorthy, Shafiq Joty, Caiming Xiong, Yingbo Zhou
  • for: Improve open-domain question answering by investigating different methods of combining retrieved passages with LLMs to enhance answer generation.
  • methods: Explore four strategies for integrating retrieval results with LLMs, including two single-round methods based on chain-of-thought reasoning and two multi-round strategies that incorporate feedback loops.
  • results: Extensive analyses and experiments yield useful insights on how to effectively leverage retrieved passages to improve the answer generation capability of LLMs.
    Abstract The integration of retrieved passages and large language models (LLMs), such as ChatGPTs, has significantly contributed to improving open-domain question answering. However, there is still a lack of exploration regarding the optimal approach for incorporating retrieved passages into the answer generation process. This paper aims to fill this gap by investigating different methods of combining retrieved passages with LLMs to enhance answer generation. We begin by examining the limitations of a commonly-used concatenation approach. Surprisingly, this approach often results in generating "unknown" outputs, even when the correct document is among the top-k retrieved passages. To address this issue, we explore four alternative strategies for integrating the retrieved passages with the LLMs. These strategies include two single-round methods that utilize chain-of-thought reasoning and two multi-round strategies that incorporate feedback loops. Through comprehensive analyses and experiments, we provide insightful observations on how to effectively leverage retrieved passages to enhance the answer generation capability of LLMs.
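
As a rough illustration of a multi-round, feedback-loop integration strategy of the kind the abstract mentions (not the paper's exact prompts or algorithm), the sketch below assumes generic `llm` and `retrieve` callables:

```python
def answer_with_feedback(question, llm, retrieve, k=5, max_rounds=2):
    """llm: str -> str; retrieve: (query, k) -> list of passage strings."""
    passages = retrieve(question, k)
    for _ in range(max_rounds):
        context = "\n\n".join(passages)
        prompt = (f"Context:\n{context}\n\nQuestion: {question}\n"
                  "Answer with a short span, or say UNKNOWN if the context "
                  "is insufficient.")
        answer = llm(prompt).strip()
        if answer.upper() != "UNKNOWN":
            return answer
        # Feedback round: ask the model what is missing and retrieve again.
        query = llm(f"Question: {question}\nWhat extra information is needed "
                    "to answer it? Reply with a single search query.")
        passages = retrieve(query, k)
    return "UNKNOWN"
```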

Conditional Kernel Imitation Learning for Continuous State Environments

  • paper_url: http://arxiv.org/abs/2308.12573
  • repo_url: None
  • paper_authors: Rishabh Agrawal, Nathan Dahlin, Rahul Jain, Ashutosh Nayyar
  • for: Solve imitation learning from observed behavior alone, without transition dynamics information, a reward structure, or any additional environment interaction data.
  • methods: Build an imitation learning framework on the Markov balance equation and conditional kernel density estimation, and establish its asymptotic consistency.
  • results: Numerical experiments on continuous state environments show that the method consistently beats many state-of-the-art IL algorithms in empirical performance.
    Abstract Imitation Learning (IL) is an important paradigm within the broader reinforcement learning (RL) methodology. Unlike most of RL, it does not assume availability of reward-feedback. Reward inference and shaping are known to be difficult and error-prone methods particularly when the demonstration data comes from human experts. Classical methods such as behavioral cloning and inverse reinforcement learning are highly sensitive to estimation errors, a problem that is particularly acute in continuous state space problems. Meanwhile, state-of-the-art IL algorithms convert behavioral policy learning problems into distribution-matching problems which often require additional online interaction data to be effective. In this paper, we consider the problem of imitation learning in continuous state space environments based solely on observed behavior, without access to transition dynamics information, reward structure, or, most importantly, any additional interactions with the environment. Our approach is based on the Markov balance equation and introduces a novel conditional kernel density estimation-based imitation learning framework. It involves estimating the environment's transition dynamics using conditional kernel density estimators and seeks to satisfy the probabilistic balance equations for the environment. We establish that our estimators satisfy basic asymptotic consistency requirements. Through a series of numerical experiments on continuous state benchmark environments, we show consistently superior empirical performance over many state-of-the-art IL algorithms.
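
A toy Nadaraya-Watson style estimate of the next state given (state, action), in the spirit of the conditional kernel density estimators the abstract describes; the Gaussian kernel and bandwidths are assumptions, and the paper's estimator and balance-equation machinery are richer than this.

```python
import numpy as np

def gaussian_kernel(u, h):
    return np.exp(-0.5 * np.sum((u / h) ** 2, axis=-1))

def predict_next_state(s, a, S, A, S_next, h_s=0.5, h_a=0.5):
    """Kernel-weighted average of demonstrated next states for a query (s, a)."""
    w = gaussian_kernel(S - s, h_s) * gaussian_kernel(A - a, h_a)
    w = w / (w.sum() + 1e-12)
    return (w[:, None] * S_next).sum(axis=0)

# Toy demonstration transitions (S, A, S_next) with made-up dynamics.
S = np.random.randn(500, 3); A = np.random.randn(500, 1)
S_next = S + 0.1 * A
print(predict_next_state(S[0], A[0], S, A, S_next))
```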

A Co-training Approach for Noisy Time Series Learning

  • paper_url: http://arxiv.org/abs/2308.12551
  • repo_url: None
  • paper_authors: Weiqi Zhang, Jianfeng Zhang, Jia Li, Fugee Tsung
  • for: Robust time series representation learning, under the assumption that real-world time series are noisy and that complementary information from different views of the same series plays an important role.
  • methods: Create two views of the input with two different encoders and train them iteratively with co-training based contrastive learning; experiments show this co-training scheme significantly improves representation quality.
  • results: Unsupervised and semi-supervised experiments on four time series benchmarks show that TS-CoT mitigates the impact of data noise and corruption, and the learned representations transfer well to downstream tasks through fine-tuning.
    Abstract In this work, we focus on robust time series representation learning. Our assumption is that real-world time series is noisy and complementary information from different views of the same time series plays an important role while analyzing noisy input. Based on this, we create two views for the input time series through two different encoders. We conduct co-training based contrastive learning iteratively to learn the encoders. Our experiments demonstrate that this co-training approach leads to a significant improvement in performance. Especially, by leveraging the complementary information from different views, our proposed TS-CoT method can mitigate the impact of data noise and corruption. Empirical evaluations on four time series benchmarks in unsupervised and semi-supervised settings reveal that TS-CoT outperforms existing methods. Furthermore, the representations learned by TS-CoT can transfer well to downstream tasks through fine-tuning.
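
The abstract implies a symmetric cross-view contrastive objective between the two encoders; a minimal InfoNCE-style version (the exact loss and temperature in TS-CoT may differ) could look like this:

```python
import torch
import torch.nn.functional as F

def cross_view_infonce(z1, z2, temperature=0.1):
    """z1, z2: (batch, dim) embeddings of the same series from two encoders."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature       # similarity of every cross-view pair
    targets = torch.arange(z1.size(0))       # positives sit on the diagonal
    # Symmetric: each view must identify its counterpart in the other view.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

print(cross_view_infonce(torch.randn(8, 128), torch.randn(8, 128)).item())
```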

Synchronize Feature Extracting and Matching: A Single Branch Framework for 3D Object Tracking

  • paper_url: http://arxiv.org/abs/2308.12549
  • repo_url: None
  • paper_authors: Teli Ma, Mengmeng Wang, Jimin Xiao, Huifeng Wu, Yong Liu
  • for: Propose SyncTrack, a novel single-branch framework for 3D LiDAR object tracking.
  • methods: SyncTrack abandons the conventional Siamese paradigm in favor of a single-branch encoder with a synchronization mechanism, avoiding a second encoder pass over the template and search region and the extra parameters of a matching network, and introduces a novel Attentive Points-Sampling strategy into the Transformer layers.
  • results: Experiments on two benchmark datasets (KITTI and NuScenes) show that SyncTrack achieves state-of-the-art real-time tracking performance.
    Abstract Siamese network has been a de facto benchmark framework for 3D LiDAR object tracking with a shared-parametric encoder extracting features from template and search region, respectively. This paradigm relies heavily on an additional matching network to model the cross-correlation/similarity of the template and search region. In this paper, we forsake the conventional Siamese paradigm and propose a novel single-branch framework, SyncTrack, synchronizing the feature extracting and matching to avoid forwarding encoder twice for template and search region as well as introducing extra parameters of matching network. The synchronization mechanism is based on the dynamic affinity of the Transformer, and an in-depth analysis of the relevance is provided theoretically. Moreover, based on the synchronization, we introduce a novel Attentive Points-Sampling strategy into the Transformer layers (APST), replacing the random/Farthest Points Sampling (FPS) method with sampling under the supervision of attentive relations between the template and search region. It implies connecting point-wise sampling with the feature learning, beneficial to aggregating more distinctive and geometric features for tracking with sparse points. Extensive experiments on two benchmark datasets (KITTI and NuScenes) show that SyncTrack achieves state-of-the-art performance in real-time tracking.

CALM : A Multi-task Benchmark for Comprehensive Assessment of Language Model Bias

  • paper_url: http://arxiv.org/abs/2308.12539
  • repo_url: https://github.com/vipulgupta1011/calm
  • paper_authors: Vipul Gupta, Pranav Narayanan Venkit, Hugo Laurençon, Shomir Wilson, Rebecca J. Passonneau
  • for: Evaluate the sociodemographic bias of language models (LMs) to guard against potentially harmful behavior.
  • methods: Introduce the Comprehensive Assessment of Language Model bias (CALM), a reliable benchmark dataset for quantifying LM bias, built by integrating 16 datasets from different domains such as Wikipedia and news articles, filtering 224 templates, and constructing 78,400 examples.
  • results: Compared with prior datasets, CALM is more diverse and more robust to small template perturbations, and therefore better suited to evaluating LM bias; evaluating 20 large language models shows that, in some model series, larger models are more biased toward certain groups, with the T0 series being the least biased.
    Abstract As language models (LMs) become increasingly powerful, it is important to quantify and compare them for sociodemographic bias with potential for harm. Prior bias measurement datasets are sensitive to perturbations in their manually designed templates, therefore unreliable. To achieve reliability, we introduce the Comprehensive Assessment of Language Model bias (CALM), a benchmark dataset to quantify bias in LMs across three tasks. We integrate 16 existing datasets across different domains, such as Wikipedia and news articles, to filter 224 templates from which we construct a dataset of 78,400 examples. We compare the diversity of CALM with prior datasets on metrics such as average semantic similarity, and variation in template length, and test the sensitivity to small perturbations. We show that our dataset is more diverse and reliable than previous datasets, thus better capture the breadth of linguistic variation required to reliably evaluate model bias. We evaluate 20 large language models including six prominent families of LMs such as Llama-2. In two LM series, OPT and Bloom, we found that larger parameter models are more biased than lower parameter models. We found the T0 series of models to be the least biased. Furthermore, we noticed a tradeoff between gender and racial bias with increasing model size in some model series. The code is available at https://github.com/vipulgupta1011/CALM.

FedSoL: Bridging Global Alignment and Local Generality in Federated Learning

  • paper_url: http://arxiv.org/abs/2308.12532
  • repo_url: None
  • paper_authors: Gihun Lee, Minchan Jeong, Sangmook Kim, Jaehoon Oh, Se-Young Yun
  • for: Improve Federated Learning (FL) performance when client data distributions are heterogeneous.
  • methods: Combine the concepts of global alignment and local generality by having local learning seek a parameter region that is robust against proximal perturbations.
  • results: Experiments show that FedSoL consistently achieves state-of-the-art performance across various setups.
    Abstract Federated Learning (FL) aggregates locally trained models from individual clients to construct a global model. While FL enables learning a model with data privacy, it often suffers from significant performance degradation when client data distributions are heterogeneous. Many previous FL algorithms have addressed this issue by introducing various proximal restrictions. These restrictions aim to encourage global alignment by constraining the deviation of local learning from the global objective. However, they inherently limit local learning by interfering with the original local objectives. Recently, an alternative approach has emerged to improve local learning generality. By obtaining local models within a smooth loss landscape, this approach mitigates conflicts among different local objectives of the clients. Yet, it does not ensure stable global alignment, as local learning does not take the global objective into account. In this study, we propose Federated Stability on Learning (FedSoL), which combines both the concepts of global alignment and local generality. In FedSoL, the local learning seeks a parameter region robust against proximal perturbations. This strategy introduces an implicit proximal restriction effect in local learning while maintaining the original local objective for parameter update. Our experiments show that FedSoL consistently achieves state-of-the-art performance on various setups.
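
Reading the abstract, one way to picture "a parameter region robust against proximal perturbations" is a SAM-like step where the local loss is evaluated at parameters nudged along the client-to-global direction. The sketch below is only that reading, with made-up step sizes; it is not the paper's algorithm:

```python
import torch

def fedsol_like_local_step(model, global_params, batch_loss_fn, rho=0.05, lr=0.01):
    params = [p for p in model.parameters() if p.requires_grad]
    # Direction of disagreement with the global model (proximal direction).
    prox_dir = [p.detach() - g.detach() for p, g in zip(params, global_params)]
    norm = torch.sqrt(sum((d ** 2).sum() for d in prox_dir)) + 1e-12
    with torch.no_grad():                       # perturb along that direction
        for p, d in zip(params, prox_dir):
            p.add_(rho * d / norm)
    loss = batch_loss_fn(model)                 # original local objective, perturbed point
    loss.backward()
    with torch.no_grad():
        for p, d in zip(params, prox_dir):
            p.sub_(rho * d / norm)              # undo the perturbation
            p.sub_(lr * p.grad)                 # SGD step on the unchanged local loss
            p.grad = None
    return loss.item()
```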

Not Only Rewards But Also Constraints: Applications on Legged Robot Locomotion

  • paper_url: http://arxiv.org/abs/2308.12517
  • repo_url: None
  • paper_authors: Yunho Kim, Hyunsik Oh, Jeonghyun Lee, Jinhyeok Choi, Gwanghyeon Ji, Moonkyu Jung, Donghoon Youm, Jemin Hwangbo
  • for: Develop a reinforcement-learning-based controller training framework that achieves natural motion styles and high task performance on complex robotic systems.
  • methods: The framework combines two constraint types with an efficient policy optimization algorithm, letting engineers reflect their intent through constraints and handle them with minimal computational overhead.
  • results: In extensive simulation and real-world experiments, performant controllers can be trained with far less reward engineering, by tuning only a single reward coefficient; because constraints are interpretable and generalizable, the engineering process also becomes more direct and intuitive.
    Abstract Several earlier studies have shown impressive control performance in complex robotic systems by designing the controller using a neural network and training it with model-free reinforcement learning. However, these outstanding controllers with natural motion style and high task performance are developed through extensive reward engineering, which is a highly laborious and time-consuming process of designing numerous reward terms and determining suitable reward coefficients. In this work, we propose a novel reinforcement learning framework for training neural network controllers for complex robotic systems consisting of both rewards and constraints. To let the engineers appropriately reflect their intent to constraints and handle them with minimal computation overhead, two constraint types and an efficient policy optimization algorithm are suggested. The learning framework is applied to train locomotion controllers for several legged robots with different morphology and physical attributes to traverse challenging terrains. Extensive simulation and real-world experiments demonstrate that performant controllers can be trained with significantly less reward engineering, by tuning only a single reward coefficient. Furthermore, a more straightforward and intuitive engineering process can be utilized, thanks to the interpretability and generalizability of constraints. The summary video is available at https://youtu.be/KAlm3yskhvM.
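
For orientation, a textbook Lagrangian recipe for mixing a reward objective with explicit constraints is sketched below; the paper's two constraint types and its policy optimization algorithm are not spelled out in the abstract, so this is the generic idea rather than their method:

```python
import numpy as np

def lagrangian_objective(reward, constraint_costs, limits, multipliers):
    """Reward minus multiplier-weighted constraint violations (to be maximized)."""
    violations = np.maximum(np.asarray(constraint_costs) - np.asarray(limits), 0.0)
    return reward - float(np.dot(multipliers, violations))

def update_multipliers(multipliers, constraint_costs, limits, lr=0.1):
    """Dual ascent: a multiplier grows while its constraint stays violated."""
    violations = np.asarray(constraint_costs) - np.asarray(limits)
    return np.maximum(multipliers + lr * violations, 0.0)

# Toy loop: one reward term, two hypothetical constraints (e.g. torque, slippage).
lam = np.zeros(2)
for _ in range(3):
    obj = lagrangian_objective(1.0, [0.3, 0.8], [0.5, 0.5], lam)
    lam = update_multipliers(lam, [0.3, 0.8], [0.5, 0.5])
```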

I3DOD: Towards Incremental 3D Object Detection via Prompting

  • paper_url: http://arxiv.org/abs/2308.12512
  • repo_url: None
  • paper_authors: Wenqi Liang, Gan Sun, Chenxi Liu, Jiahua Dong, Kangru Wang
  • for: Propose a prompting-guided incremental 3D object detection framework (I3DOD) to address the catastrophic forgetting of old classes in existing class-incremental 3D object detection methods.
  • methods: Introduce a task-shared prompt mechanism that learns the matching relationship between object localization information and category semantic information, plus a reliable distillation strategy consisting of reliable dynamic distillation and a relation feature that captures response relations in feature space.
  • results: Extensive experiments on two benchmark datasets show an improvement of 0.6% - 2.7% mAP@0.25 over state-of-the-art object detection methods.
    Abstract 3D object detection has achieved significant performance in many fields, e.g., robotics system, autonomous driving, and augmented reality. However, most existing methods could cause catastrophic forgetting of old classes when performing on the class-incremental scenarios. Meanwhile, the current class-incremental 3D object detection methods neglect the relationships between the object localization information and category semantic information and assume all the knowledge of old model is reliable. To address the above challenge, we present a novel Incremental 3D Object Detection framework with the guidance of prompting, i.e., I3DOD. Specifically, we propose a task-shared prompts mechanism to learn the matching relationships between the object localization information and category semantic information. After training on the current task, these prompts will be stored in our prompt pool, and perform the relationship of old classes in the next task. Moreover, we design a reliable distillation strategy to transfer knowledge from two aspects: a reliable dynamic distillation is developed to filter out the negative knowledge and transfer the reliable 3D knowledge to new detection model; the relation feature is proposed to capture the responses relation in feature space and protect plasticity of the model when learning novel 3D classes. To the end, we conduct comprehensive experiments on two benchmark datasets and our method outperforms the state-of-the-art object detection methods by 0.6% - 2.7% in terms of mAP@0.25.

Masked Autoencoders are Efficient Class Incremental Learners

  • paper_url: http://arxiv.org/abs/2308.12510
  • repo_url: https://github.com/scok30/mae-cil
  • paper_authors: Jiang-Tian Zhai, Xialei Liu, Andrew D. Bagdanov, Ke Li, Ming-Ming Cheng
  • for: Learn new classes sequentially in class-incremental learning while avoiding catastrophic forgetting of previous knowledge.
  • methods: Use Masked Autoencoders (MAEs) as efficient learners: MAEs learn useful representations through reconstructive unsupervised learning, integrate easily with a supervised classification loss, and allow exemplars from past tasks to be stored efficiently as randomly selected patches; a bilateral MAE framework additionally fuses image-level and embedding-level learning.
  • results: The method outperforms the state of the art on CIFAR-100, ImageNet-Subset, and ImageNet-Full, confirming its effectiveness.
    Abstract Class Incremental Learning (CIL) aims to sequentially learn new classes while avoiding catastrophic forgetting of previous knowledge. We propose to use Masked Autoencoders (MAEs) as efficient learners for CIL. MAEs were originally designed to learn useful representations through reconstructive unsupervised learning, and they can be easily integrated with a supervised loss for classification. Moreover, MAEs can reliably reconstruct original input images from randomly selected patches, which we use to store exemplars from past tasks more efficiently for CIL. We also propose a bilateral MAE framework to learn from image-level and embedding-level fusion, which produces better-quality reconstructed images and more stable representations. Our experiments confirm that our approach performs better than the state-of-the-art on CIFAR-100, ImageNet-Subset, and ImageNet-Full. The code is available at https://github.com/scok30/MAE-CIL .
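
The exemplar-storage idea in the abstract (keep only a few random patches per old-class image and let the MAE reconstruct the rest) can be pictured as follows; the patch size and keep ratio here are illustrative, not the paper's settings:

```python
import numpy as np

def store_exemplar(image, patch=16, keep_ratio=0.25, rng=np.random):
    """Return (kept_patches, kept_indices) for an HxWxC image."""
    H, W, C = image.shape
    patches = image.reshape(H // patch, patch, W // patch, patch, C)
    patches = patches.transpose(0, 2, 1, 3, 4).reshape(-1, patch, patch, C)
    n_keep = max(1, int(keep_ratio * len(patches)))
    idx = rng.choice(len(patches), n_keep, replace=False)
    return patches[idx], idx          # ~4x less exemplar memory at a 25% keep ratio

kept, idx = store_exemplar(np.random.rand(224, 224, 3))
print(kept.shape, idx.shape)          # (49, 16, 16, 3) (49,)
```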

CGMI: Configurable General Multi-Agent Interaction Framework

  • paper_url: http://arxiv.org/abs/2308.12503
  • repo_url: None
  • paper_authors: Shi Jinxin, Zhao Jiabao, Wang Yilei, Wu Xingjiao, Li Jiawen, He Liang
  • for: Provide a large-language-model-based multi-agent system for simulating human interactions and solving domain-specific tasks.
  • methods: Use a tree-structured methodology for assigning, detecting, and maintaining agent personalities, together with a cognitive architecture based on the ACT* model that includes memory, reflection, and planning modules.
  • results: Simulated teacher-student classroom interactions in a virtual environment closely mirror real classroom settings in aspects such as teaching methodology, curriculum, and student performance.
    Abstract Benefiting from the powerful capabilities of large language models (LLMs), agents based on LLMs have shown the potential to address domain-specific tasks and emulate human behaviors. However, the content generated by these agents remains somewhat superficial, owing to their limited domain expertise and the absence of an effective cognitive architecture. To address this, we present the Configurable General Multi-Agent Interaction (CGMI) framework, designed to replicate human interactions in real-world scenarios. Specifically, we propose a tree-structured methodology for the assignment, detection, and maintenance of agent personality. Additionally, we designed a cognitive architecture equipped with a skill library based on the ACT* model, which contains memory, reflection, and planning modules. We have also integrated general agents to augment the virtual environment's realism. Using the CGMI framework, we simulated numerous classroom interactions between teacher and students. The experiments indicate that aspects such as the teaching methodology, curriculum, and student performance closely mirror real classroom settings. We will open source our work.

Source-Free Collaborative Domain Adaptation via Multi-Perspective Feature Enrichment for Functional MRI Analysis

  • paper_url: http://arxiv.org/abs/2308.12495
  • repo_url: https://github.com/yqfang9199/scda
  • paper_authors: Yuqi Fang, Jinjian Wu, Qianqian Wang, Shijun Qiu, Andrea Bozoki, Huaicheng Yan, Mingxia Liu
  • for: Provide a source-free domain adaptation method that mitigates cross-site heterogeneity in functional MRI (fMRI) data and thereby improves the accuracy and reproducibility of brain function studies.
  • methods: Build a multi-perspective feature enrichment (MFE) method with multiple collaborative branches, each containing a data-feeding module, a spatiotemporal feature encoder, and a class predictor, and add a mutual-consistency constraint to learn robust feature representations across domains.
  • results: Experiments on three public datasets and one private dataset demonstrate effectiveness on cross-scanner and cross-study prediction tasks; a model pretrained on large-scale rs-fMRI data is publicly released.
    Abstract Resting-state functional MRI (rs-fMRI) is increasingly employed in multi-site research to aid neurological disorder analysis. Existing studies usually suffer from significant cross-site/domain data heterogeneity caused by site effects such as differences in scanners/protocols. Many methods have been proposed to reduce fMRI heterogeneity between source and target domains, heavily relying on the availability of source data. But acquiring source data is challenging due to privacy concerns and/or data storage burdens in multi-site studies. To this end, we design a source-free collaborative domain adaptation (SCDA) framework for fMRI analysis, where only a pretrained source model and unlabeled target data are accessible. Specifically, a multi-perspective feature enrichment method (MFE) is developed for target fMRI analysis, consisting of multiple collaborative branches to dynamically capture fMRI features of unlabeled target data from multiple views. Each branch has a data-feeding module, a spatiotemporal feature encoder, and a class predictor. A mutual-consistency constraint is designed to encourage pair-wise consistency of latent features of the same input generated from these branches for robust representation learning. To facilitate efficient cross-domain knowledge transfer without source data, we initialize MFE using parameters of a pretrained source model. We also introduce an unsupervised pretraining strategy using 3,806 unlabeled fMRIs from three large-scale auxiliary databases, aiming to obtain a general feature encoder. Experimental results on three public datasets and one private dataset demonstrate the efficacy of our method in cross-scanner and cross-study prediction tasks. The model pretrained on large-scale rs-fMRI data has been released to the public.

GPTEval: A Survey on Assessments of ChatGPT and GPT-4

  • paper_url: http://arxiv.org/abs/2308.12488
  • repo_url: None
  • paper_authors: Rui Mao, Guanyi Chen, Xulang Zhang, Frank Guerin, Erik Cambria
  • for: This survey aims to comprehensively review and analyze the collective assessment findings of ChatGPT and GPT-4 in various tasks and disciplines, focusing on their language and reasoning abilities, scientific knowledge, and ethical considerations.
  • methods: The survey examines prior evaluations of ChatGPT and GPT-4, including their language and reasoning abilities, scientific knowledge, and ethical considerations.
  • results: The survey provides a comprehensive assessment of the collective findings of prior evaluations, offering several recommendations for future research in evaluating large language models.
    Abstract The emergence of ChatGPT has generated much speculation in the press about its potential to disrupt social and economic systems. Its astonishing language ability has aroused strong curiosity among scholars about its performance in different domains. There have been many studies evaluating the ability of ChatGPT and GPT-4 in different tasks and disciplines. However, a comprehensive review summarizing the collective assessment findings is lacking. The objective of this survey is to thoroughly analyze prior assessments of ChatGPT and GPT-4, focusing on its language and reasoning abilities, scientific knowledge, and ethical considerations. Furthermore, an examination of the existing evaluation methods is conducted, offering several recommendations for future research in evaluating large language models.

A Model of Sequential Learning based on Non-Axiomatic Logic

  • paper_url: http://arxiv.org/abs/2308.12486
  • repo_url: None
  • paper_authors: Bowen Xu
  • for: A technical report on the sequential learning capability of an intelligent agent.
  • methods: Interpret the learning procedure through Non-Axiomatic Logic as three steps (hypothesizing, revising, and recycling) that can operate under the Assumption of Insufficient Knowledge and Resources.
  • results: Although the current design has limitations, the model has been shown effective in some simple cases.
    Abstract Sequential learning is a fundamental function of an intelligent agent. This technical report introduces a model of sequential learning, which is interpretable through Non-Axiomatic Logic. The learning procedure includes three steps, hypothesizing, revising, and recycling, and can work under the Assumption of Insufficient Knowledge and Resources. Although there are limitations for the current design, the model has been proven effective in some simple cases.

Attention-Based Acoustic Feature Fusion Network for Depression Detection

  • paper_url: http://arxiv.org/abs/2308.12478
  • repo_url: https://github.com/xuxiaoooo/abafnet
  • paper_authors: Xiao Xu, Yang Wang, Xinru Wei, Fei Wang, Xizhe Zhang
  • for: Propose a new method for detecting depression from acoustic data, aimed at early identification of the disorder.
  • methods: Fuse four different acoustic feature types within an attention-based deep learning model to improve depression detection accuracy.
  • results: Extensive validation on two clinical speech databases shows improved performance over previous methods in both depression detection and subtype classification.
    Abstract Depression, a common mental disorder, significantly influences individuals and imposes considerable societal impacts. The complexity and heterogeneity of the disorder necessitate prompt and effective detection, which nonetheless, poses a difficult challenge. This situation highlights an urgent requirement for improved detection methods. Exploiting auditory data through advanced machine learning paradigms presents promising research directions. Yet, existing techniques mainly rely on single-dimensional feature models, potentially neglecting the abundance of information hidden in various speech characteristics. To rectify this, we present the novel Attention-Based Acoustic Feature Fusion Network (ABAFnet) for depression detection. ABAFnet combines four different acoustic features into a comprehensive deep learning model, thereby effectively integrating and blending multi-tiered features. We present a novel weight adjustment module for late fusion that boosts performance by efficaciously synthesizing these features. The effectiveness of our approach is confirmed via extensive validation on two clinical speech databases, CNRAC and CS-NRAC, thereby outperforming previous methods in depression detection and subtype classification. Further in-depth analysis confirms the key role of each feature and highlights the importance of MFCCrelated features in speech-based depression detection.
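
To give a feel for attention-weighted late fusion of several acoustic streams, here is a small gating module; it is a generic construction under assumed feature dimensions, not ABAFnet's actual weight-adjustment module:

```python
import torch
import torch.nn as nn

class LateFusion(nn.Module):
    def __init__(self, dims, hidden=64, n_classes=2):
        super().__init__()
        self.proj = nn.ModuleList([nn.Linear(d, hidden) for d in dims])
        self.gate = nn.Linear(hidden * len(dims), len(dims))   # one weight per stream
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, feats):                   # feats: list of (batch, d_i) tensors
        h = [torch.relu(p(f)) for p, f in zip(self.proj, feats)]
        w = torch.softmax(self.gate(torch.cat(h, dim=1)), dim=1)
        fused = sum(w[:, i:i + 1] * h[i] for i in range(len(h)))
        return self.head(fused)

# Toy usage with four streams (e.g. MFCC stats, spectrogram stats, prosody, embeddings).
model = LateFusion(dims=[40, 128, 16, 256])
logits = model([torch.randn(8, 40), torch.randn(8, 128),
                torch.randn(8, 16), torch.randn(8, 256)])
```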

Are ChatGPT and GPT-4 Good Poker Players? – A Pre-Flop Analysis

  • paper_url: http://arxiv.org/abs/2308.12466
  • repo_url: None
  • paper_authors: Akshat Gupta
  • for: Evaluate how well ChatGPT and GPT-4 play poker.
  • methods: Run a series of experiments with ChatGPT and GPT-4 to assess their pre-flop poker play.
  • results: Both models possess a basic understanding of poker, but neither is a game theory optimal (GTO) poker player; GPT-4 plays more aggressively than ChatGPT, yet neither model's strategy is GTO.
    Abstract Since the introduction of ChatGPT and GPT-4, these models have been tested across a large number of tasks. Their adeptness across domains is evident, but their aptitude in playing games and specifically their aptitude in the realm of poker has remained unexplored. Poker is a game that requires decision making under uncertainty and incomplete information. In this paper, we put ChatGPT and GPT-4 through the poker test and evaluate their poker skills. Our findings reveal that while both models display an advanced understanding of poker, encompassing concepts like the valuation of starting hands, playing positions and other intricacies of game theory optimal (GTO) poker, both ChatGPT and GPT-4 are NOT game theory optimal poker players. Through a series of experiments, we first discover the characteristics of optimal prompts and model parameters for playing poker with these models. Our observations then unveil the distinct playing personas of the two models. We first conclude that GPT-4 is a more advanced poker player than ChatGPT. This exploration then sheds light on the divergent poker tactics of the two models: ChatGPT's conservativeness juxtaposed against GPT-4's aggression. In poker vernacular, when tasked to play GTO poker, ChatGPT plays like a Nit, which means that it has a propensity to only engage with premium hands and folds a majority of hands. When subjected to the same directive, GPT-4 plays like a maniac, showcasing a loose and aggressive style of play. Both strategies, although relatively advanced, are not game theory optimal.
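
The abstract does not give the authors' prompts, but a pre-flop probe of this kind is easy to picture; everything below (hands, stakes, reply format) is a hypothetical setup for illustration only:

```python
hands = ["AsAd", "7c2d", "KsQs"]        # pocket aces, the worst hand, suited Broadway
positions = ["under the gun", "on the button"]

def preflop_prompt(hand, position):
    return (f"You are playing 6-max no-limit hold'em, 100 big blinds deep. "
            f"You are {position} with {hand} and the action folds to you. "
            "Reply with exactly one of: fold, call, raise <size in big blinds>.")

for hand in hands:
    for pos in positions:
        prompt = preflop_prompt(hand, pos)   # send to ChatGPT / GPT-4 and log the action
```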

PFL-GAN: When Client Heterogeneity Meets Generative Models in Personalized Federated Learning

  • paper_url: http://arxiv.org/abs/2308.12454
  • repo_url: None
  • paper_authors: Achintha Wijesinghe, Songyang Zhang, Zhi Ding
  • for: Improve personalized federated learning (PFL) in scenarios where client data are heterogeneous.
  • methods: Propose a generative adversarial network (GAN) based PFL model that learns the similarity among clients and performs weighted collaborative data aggregation.
  • results: Rigorous experiments on several well-known datasets demonstrate the effectiveness of PFL-GAN.
    Abstract Recent advances of generative learning models are accompanied by the growing interest in federated learning (FL) based on generative adversarial network (GAN) models. In the context of FL, GAN can capture the underlying client data structure, and regenerate samples resembling the original data distribution without compromising the private raw data. Although most existing GAN-based FL works focus on training a global model, Personalized FL (PFL) sometimes can be more effective in view of client data heterogeneity in terms of distinct data sample distributions, feature spaces, and labels. To cope with client heterogeneity in GAN-based FL, we propose a novel GAN sharing and aggregation strategy for PFL. The proposed PFL-GAN addresses the client heterogeneity in different scenarios. More specially, we first learn the similarity among clients and then develop an weighted collaborative data aggregation. The empirical results through the rigorous experimentation on several well-known datasets demonstrate the effectiveness of PFL-GAN.
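
The "weighted collaborative data aggregation" step can be pictured as each client averaging everyone's updates with weights derived from pairwise similarity; the softmax weighting below is an assumption, not the paper's exact rule:

```python
import numpy as np

def personalized_aggregate(client_params, similarity, temperature=1.0):
    """client_params: (n_clients, n_weights); similarity: (n_clients, n_clients)."""
    logits = similarity / temperature
    w = np.exp(logits - logits.max(axis=1, keepdims=True))
    w = w / w.sum(axis=1, keepdims=True)
    return w @ client_params                  # one personalized aggregate per client

sim = np.array([[1.0, 0.9, 0.1],              # clients 0 and 1 hold similar data
                [0.9, 1.0, 0.1],
                [0.1, 0.1, 1.0]])
params = np.random.randn(3, 10)
print(personalized_aggregate(params, sim).shape)   # (3, 10)
```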

Augmenting medical image classifiers with synthetic data from latent diffusion models

  • paper_url: http://arxiv.org/abs/2308.12453
  • repo_url: None
  • paper_authors: Luke W. Sagers, James A. Diao, Luke Melas-Kyriazi, Matthew Groh, Pranav Rajpurkar, Adewole S. Adamson, Veronica Rotemberg, Roxana Daneshjou, Arjun K. Manrai
  • for: Investigate whether synthetic images from latent diffusion models can improve medical AI algorithms, using skin disease as the case study.
  • methods: Generate synthetic skin disease images with latent diffusion models and evaluate whether augmenting model training with these data improves classifier performance.
  • results: Training with synthetic images improves performance in data-limited settings, but the gains saturate at a synthetic-to-real image ratio of 10:1 and are substantially smaller than the gains obtained from adding real images.
    Abstract While hundreds of artificial intelligence (AI) algorithms are now approved or cleared by the US Food and Drugs Administration (FDA), many studies have shown inconsistent generalization or latent bias, particularly for underrepresented populations. Some have proposed that generative AI could reduce the need for real data, but its utility in model development remains unclear. Skin disease serves as a useful case study in synthetic image generation due to the diversity of disease appearance, particularly across the protected attribute of skin tone. Here we show that latent diffusion models can scalably generate images of skin disease and that augmenting model training with these data improves performance in data-limited settings. These performance gains saturate at synthetic-to-real image ratios above 10:1 and are substantially smaller than the gains obtained from adding real images. As part of our analysis, we generate and analyze a new dataset of 458,920 synthetic images produced using several generation strategies. Our results suggest that synthetic data could serve as a force-multiplier for model development, but the collection of diverse real-world data remains the most important step to improve medical AI algorithms.

An Intentional Forgetting-Driven Self-Healing Method For Deep Reinforcement Learning Systems

  • paper_url: http://arxiv.org/abs/2308.12445
  • repo_url: https://github.com/ahmedhajyahmed/drdrl
  • paper_authors: Ahmed Haj Yahmed, Rached Bouchoucha, Houssem Ben Braiek, Foutse Khomh
  • for: Propose an effective self-healing approach for deep reinforcement learning (DRL) systems that face environmental drifts in large-scale production settings.
  • methods: Augment vanilla continual learning (CL) with an intentional forgetting mechanism to overcome CL's main issues, such as catastrophic forgetting, warm-starting failure, and slow convergence.
  • results: Compared with vanilla CL, Dr. DRL reduces the average healing time and fine-tuning episodes, successfully adapts to 19.63% of drifted environments left unsolved by vanilla CL, and maintains or even improves the obtained rewards by up to 45% on drifted environments that both approaches resolve.
    Abstract Deep reinforcement learning (DRL) is increasingly applied in large-scale productions like Netflix and Facebook. As with most data-driven systems, DRL systems can exhibit undesirable behaviors due to environmental drifts, which often occur in constantly-changing production settings. Continual Learning (CL) is the inherent self-healing approach for adapting the DRL agent in response to the environment's conditions shifts. However, successive shifts of considerable magnitude may cause the production environment to drift from its original state. Recent studies have shown that these environmental drifts tend to drive CL into long, or even unsuccessful, healing cycles, which arise from inefficiencies such as catastrophic forgetting, warm-starting failure, and slow convergence. In this paper, we propose Dr. DRL, an effective self-healing approach for DRL systems that integrates a novel mechanism of intentional forgetting into vanilla CL to overcome its main issues. Dr. DRL deliberately erases the DRL system's minor behaviors to systematically prioritize the adaptation of the key problem-solving skills. Using well-established DRL algorithms, Dr. DRL is compared with vanilla CL on various drifted environments. Dr. DRL is able to reduce, on average, the healing time and fine-tuning episodes by, respectively, 18.74% and 17.72%. Dr. DRL successfully helps agents to adapt to 19.63% of drifted environments left unsolved by vanilla CL while maintaining and even enhancing by up to 45% the obtained rewards for drifted environments that are resolved by both approaches.

BaDExpert: Extracting Backdoor Functionality for Accurate Backdoor Input Detection

  • paper_url: http://arxiv.org/abs/2308.12439
  • repo_url: None
  • paper_authors: Tinghao Xie, Xiangyu Qi, Ping He, Yiming Li, Jiachen T. Wang, Prateek Mittal
  • for: Defend deep neural networks (DNNs) against backdoor attacks, in which adversaries covertly implant malicious behaviors (backdoors) into the model.
  • methods: A post-development defense that operates independently of how the model was generated; a novel reverse-engineering approach directly extracts the backdoor functionality of a given backdoored model into a dedicated backdoor expert model.
  • results: The defense filters out backdoor inputs with high accuracy while having only a minor impact on clean utility; it is validated on multiple datasets (CIFAR10, GTSRB, and ImageNet) and model architectures (ResNet, VGG, MobileNetV2, and Vision Transformer).
    Abstract We present a novel defense, against backdoor attacks on Deep Neural Networks (DNNs), wherein adversaries covertly implant malicious behaviors (backdoors) into DNNs. Our defense falls within the category of post-development defenses that operate independently of how the model was generated. The proposed defense is built upon a novel reverse engineering approach that can directly extract backdoor functionality of a given backdoored model to a backdoor expert model. The approach is straightforward -- finetuning the backdoored model over a small set of intentionally mislabeled clean samples, such that it unlearns the normal functionality while still preserving the backdoor functionality, and thus resulting in a model (dubbed a backdoor expert model) that can only recognize backdoor inputs. Based on the extracted backdoor expert model, we show the feasibility of devising highly accurate backdoor input detectors that filter out the backdoor inputs during model inference. Further augmented by an ensemble strategy with a finetuned auxiliary model, our defense, BaDExpert (Backdoor Input Detection with Backdoor Expert), effectively mitigates 16 SOTA backdoor attacks while minimally impacting clean utility. The effectiveness of BaDExpert has been verified on multiple datasets (CIFAR10, GTSRB and ImageNet) across various model architectures (ResNet, VGG, MobileNetV2 and Vision Transformer).
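
The detection rule suggested by the abstract is that the backdoor expert has unlearned normal behavior, so an input on which it still agrees with the original backdoored model is suspicious. A minimal version of that rule (the threshold and the agreement measure are assumptions) could be:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def flag_backdoor_inputs(x, backdoored_model, expert_model, tau=0.5):
    """Return a boolean mask over the batch x marking suspected backdoor inputs."""
    p_full = F.softmax(backdoored_model(x), dim=1)
    p_expert = F.softmax(expert_model(x), dim=1)
    pred = p_full.argmax(dim=1)
    # The expert's confidence in the original model's prediction: high only when
    # the backdoor fires, since the expert no longer recognizes clean inputs.
    agreement = p_expert.gather(1, pred[:, None]).squeeze(1)
    return agreement > tau
```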

Deploying Deep Reinforcement Learning Systems: A Taxonomy of Challenges

  • paper_url: http://arxiv.org/abs/2308.12438
  • repo_url: https://github.com/drldeploymentchallenges-icsme2023/replicationpackage
  • paper_authors: Ahmed Haj Yahmed, Altaf Allah Abbassi, Amin Nikanjam, Heng Li, Foutse Khomh
  • for: This paper aims to understand the challenges that practitioners face when deploying deep reinforcement learning (DRL) systems, and to identify the most common and difficult challenges in deploying DRL to different platforms.
  • methods: The paper uses an empirical study on Stack Overflow (SO), the most popular Q&A forum for developers, to uncover and understand the challenges practitioners faced when deploying DRL systems. The study categorizes relevant SO posts by deployment platforms and investigates the current state and challenges related to deploying DRL systems.
  • results: The study finds that the general interest in DRL deployment is growing, confirming the study's relevance and importance. It also finds that DRL deployment is more difficult than other DRL issues, that RL environment-related challenges are the most popular, and that communication-related challenges are the most difficult among practitioners. The study identifies a taxonomy of 31 unique challenges in deploying DRL to different platforms.
    Abstract Deep reinforcement learning (DRL), leveraging Deep Learning (DL) in reinforcement learning, has shown significant potential in achieving human-level autonomy in a wide range of domains, including robotics, computer vision, and computer games. This potential justifies the enthusiasm and growing interest in DRL in both academia and industry. However, the community currently focuses mostly on the development phase of DRL systems, with little attention devoted to DRL deployment. In this paper, we propose an empirical study on Stack Overflow (SO), the most popular Q&A forum for developers, to uncover and understand the challenges practitioners faced when deploying DRL systems. Specifically, we categorized relevant SO posts by deployment platforms: server/cloud, mobile/embedded system, browser, and game engine. After filtering and manual analysis, we examined 357 SO posts about DRL deployment, investigated the current state, and identified the challenges related to deploying DRL systems. Then, we investigate the prevalence and difficulty of these challenges. Results show that the general interest in DRL deployment is growing, confirming the study's relevance and importance. Results also show that DRL deployment is more difficult than other DRL issues. Additionally, we built a taxonomy of 31 unique challenges in deploying DRL to different platforms. On all platforms, RL environment-related challenges are the most popular, and communication-related challenges are the most difficult among practitioners. We hope our study inspires future research and helps the community overcome the most common and difficult challenges practitioners face when deploying DRL systems.

Reframing the Brain Age Prediction Problem to a More Interpretable and Quantitative Approach

  • paper_url: http://arxiv.org/abs/2308.12416
  • repo_url: None
  • paper_authors: Neha Gianchandani, Mahsa Dibaji, Mariana Bento, Ethan MacDonald, Roberto Souza
  • for: Use deep learning to predict brain age from magnetic resonance (MR) images while providing a more interpretable form of explanation.
  • methods: Reframe the problem as image-to-image regression that estimates the age of every brain voxel in the MR image, and compare voxel-wise prediction models against global age prediction models and their corresponding saliency maps.
  • results: Voxel-wise prediction models are more interpretable, since they provide spatial information about the brain aging process, and they benefit from being quantitative.
    Abstract Deep learning models have achieved state-of-the-art results in estimating brain age, which is an important brain health biomarker, from magnetic resonance (MR) images. However, most of these models only provide a global age prediction, and rely on techniques, such as saliency maps to interpret their results. These saliency maps highlight regions in the input image that were significant for the model's predictions, but they are hard to be interpreted, and saliency map values are not directly comparable across different samples. In this work, we reframe the age prediction problem from MR images to an image-to-image regression problem where we estimate the brain age for each brain voxel in MR images. We compare voxel-wise age prediction models against global age prediction models and their corresponding saliency maps. The results indicate that voxel-wise age prediction models are more interpretable, since they provide spatial information about the brain aging process, and they benefit from being quantitative.
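
One simple way to picture the voxel-wise reformulation is that each brain voxel of a training scan receives the subject's chronological age as its regression target, and errors are measured only inside the brain mask; this is an assumed setup for illustration, not the paper's exact pipeline:

```python
import numpy as np

def voxelwise_target(brain_mask, chronological_age):
    """Age map: the subject's age inside the brain mask, zero elsewhere."""
    target = np.zeros(brain_mask.shape, dtype=np.float32)
    target[brain_mask] = chronological_age
    return target

def masked_mae(pred_age_map, target_age_map, brain_mask):
    """Mean absolute error restricted to brain voxels."""
    return float(np.abs(pred_age_map - target_age_map)[brain_mask].mean())

mask = np.zeros((8, 8, 8), dtype=bool); mask[2:6, 2:6, 2:6] = True
target = voxelwise_target(mask, 54.0)
print(masked_mae(target + 0.1 * np.random.randn(*target.shape), target, mask))
```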

Benchmarking Causal Study to Interpret Large Language Models for Source Code

  • paper_url: http://arxiv.org/abs/2308.12415
  • repo_url: None
  • paper_authors: Daniel Rodriguez-Cardenas, David N. Palacio, Dipin Khati, Henry Burke, Denys Poshyvanyk
  • for: Provide a causal-inference-based evaluation strategy for code generation that helps researchers better interpret the performance of LLMs.
  • methods: Use the Galeras benchmarking strategy, with curated testbeds for three SE tasks (code completion, code summarization, and commit generation), to help explain LLM performance.
  • results: A case study on ChatGPT shows that prompt semantics have a positive causal effect on its generative performance (an average treatment effect of roughly 3%) and that confounders such as prompt size are highly correlated with accuracy metrics (approximately 0.412); the results suggest that causal-inference-based evaluation can reduce confounding bias and yield more reliable assessments of LLM performance.
    Abstract One of the most common solutions adopted by software researchers to address code generation is by training Large Language Models (LLMs) on massive amounts of source code. Although a number of studies have shown that LLMs have been effectively evaluated on popular accuracy metrics (e.g., BLEU, CodeBleu), previous research has largely overlooked the role of Causal Inference as a fundamental component of the interpretability of LLMs' performance. Existing benchmarks and datasets are meant to highlight the difference between the expected and the generated outcome, but do not take into account confounding variables (e.g., lines of code, prompt size) that equally influence the accuracy metrics. The fact remains that, when dealing with generative software tasks by LLMs, no benchmark is available to tell researchers how to quantify neither the causal effect of SE-based treatments nor the correlation of confounders to the model's performance. In an effort to bring statistical rigor to the evaluation of LLMs, this paper introduces a benchmarking strategy named Galeras comprised of curated testbeds for three SE tasks (i.e., code completion, code summarization, and commit generation) to help aid the interpretation of LLMs' performance. We illustrate the insights of our benchmarking strategy by conducting a case study on the performance of ChatGPT under distinct prompt engineering methods. The results of the case study demonstrate the positive causal influence of prompt semantics on ChatGPT's generative performance by an average treatment effect of $\approx 3\%$. Moreover, it was found that confounders such as prompt size are highly correlated with accuracy metrics ($\approx 0.412\%$). The end result of our case study is to showcase causal inference evaluations, in practice, to reduce confounding bias. By reducing the bias, we offer an interpretable solution for the accuracy metric under analysis.

A Theory of Intelligences: Concepts, Models, Implications

  • paper_url: http://arxiv.org/abs/2308.12411
  • repo_url: https://github.com/Aryia-Behroziuan/Other-sources
  • paper_authors: Michael E. Hochberg
  • for: The paper is written to propose a theory of intelligence based on first principles, with the goal of understanding intelligence in humans and machines, and to explain endeavors that do not necessarily affect Darwinian fitness.
  • methods: The paper uses a variety of methods, including discussing key features of intelligence, presenting a framework for a first principles Theory of Intelligence, and proposing a compact mathematical form of surprisal and difficulty.
  • results: The paper presents several conceptual advances, including the prediction that paths to a goal not only function to accurately achieve goals, but also lead to higher probabilities for future attainable goals and increased breadth to enter new goal spaces.
  • for: 本文提出了一种基于第一性原理的智能理论,以便更好地理解人类和机器的智能,并解释那些不一定影响达尔文适应度(Darwinian fitness)的活动。
  • methods: 本文使用了多种方法,包括讨论智能的关键特征、提出基于第一性原理的智能理论框架,以及给出意外度与难度的一种简洁数学表达形式。
  • results: 本文提出了多个概念进展,包括路径效率与目标准确率等关键特征,以及一项预测:通向目标的路径不仅用于准确实现目标,还能提高未来可达目标的概率并拓宽进入新目标空间的广度。
    Abstract Intelligence is a human construct to represent the ability to achieve goals. Given this wide berth, intelligence has been defined countless times, studied in a variety of ways and quantified using numerous measures. Understanding intelligence ultimately requires theory and quantification, both of which are elusive. My main objectives are to identify some of the central elements in and surrounding intelligence, discuss some of its challenges and propose a theory based on first principles. I focus on intelligence as defined by and for humans, frequently in comparison to machines, with the intention of setting the stage for more general characterizations in life, collectives, human designs such as AI and in non-designed physical and chemical systems. I discuss key features of intelligence, including path efficiency and goal accuracy, intelligence as a Black Box, environmental influences, flexibility to deal with surprisal, the regress of intelligence, the relativistic nature of intelligence and difficulty, and temporal changes in intelligence including its evolution. I present a framework for a first principles Theory of IntelligenceS (TIS), based on the quantifiable macro-scale system features of difficulty, surprisal and goal resolution accuracy. The proposed partitioning of uncertainty/solving and accuracy/understanding is particularly novel since it predicts that paths to a goal not only function to accurately achieve goals, but as experimentations leading to higher probabilities for future attainable goals and increased breadth to enter new goal spaces. TIS can therefore explain endeavors that do not necessarily affect Darwinian fitness, such as leisure, politics, games and art. I conclude with several conceptual advances of TIS including a compact mathematical form of surprisal and difficulty, the theoretical basis of TIS, and open questions.
    摘要 人类创造出的智能概念表示能够实现目标的能力。由此出发,智能被无数次定义、以多种方式研究并用众多指标量化。理解智能最终需要理论和量化,而两者都难以把握。我的主要目标是确定智能的中心元素,讨论智能面临的挑战,并提出一种基于第一性原理的理论。我将智能定义为由人类定义、并常与机器对比的智能,以便为生命、集体、人工智能等人类设计以及非设计的物理和化学系统中更一般的刻画奠定基础。我将讨论智能的关键特征,包括路径效率和目标准确率、智能的黑盒特性、环境影响、应对意外的灵活性、智能的回归问题、智能与难度的相对性,以及智能随时间的变化(包括其演化)。我将提出一种基于可量化宏观系统特征(难度、意外度和目标求解准确率)的第一性原理智能理论(TIS)框架。这种对不确定性/求解与准确性/理解的划分尤为新颖,因为它预测通向目标的路径不仅用于准确实现目标,还能提高未来可达目标的概率并拓宽进入新目标空间的广度。因此,TIS可以解释不一定影响达尔文适应度(Darwinian fitness)的活动,如休闲、政治、游戏和艺术。最后,我总结了TIS带来的若干概念进展,包括意外度与难度的紧凑数学表达、TIS的理论基础以及开放问题。
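
The abstract mentions a compact mathematical form of surprisal and difficulty without stating it. As a hedged reminder of the standard information-theoretic definition (which may differ from the exact form used in the paper), the surprisal of an outcome x under a model P, and its expectation, are:

```latex
S(x) = -\log P(x), \qquad \mathbb{E}_{x \sim P}[S(x)] = -\sum_{x} P(x)\log P(x) = H(P)
```

Low-probability (surprising) outcomes carry high surprisal, which is one natural ingredient for quantifying how difficult a goal is to reach.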

Self-Supervised Learning for Endoscopic Video Analysis

  • paper_url: http://arxiv.org/abs/2308.12394
  • repo_url: https://github.com/royhirsch/endossl
  • paper_authors: Roy Hirsch, Mathilde Caron, Regev Cohen, Amir Livne, Ron Shapiro, Tomer Golany, Roman Goldenberg, Daniel Freedman, Ehud Rivlin
  • for: 这篇论文的目的是探讨自监督学习(SSL)在医疗领域中的应用,特别是在结肠镜检查和腹腔镜检查等内窥镜领域。
  • methods: 这篇论文使用了Masked Siamese Networks(MSNs)作为SSL框架,并使用大量的无标注影像资料进行训练。
  • results: 这篇论文在内窥镜基准测试中获得了state-of-the-art的表现,包括腹腔镜手术阶段识别和结肠镜息肉特征分类等。此外,在不影响表现的情况下,标注数据量减少了50%。因此,这篇论文证明了SSL可以大幅减少内窥镜领域对标注数据的需求。
    Abstract Self-supervised learning (SSL) has led to important breakthroughs in computer vision by allowing learning from large amounts of unlabeled data. As such, it might have a pivotal role to play in biomedicine where annotating data requires a highly specialized expertise. Yet, there are many healthcare domains for which SSL has not been extensively explored. One such domain is endoscopy, minimally invasive procedures which are commonly used to detect and treat infections, chronic inflammatory diseases or cancer. In this work, we study the use of a leading SSL framework, namely Masked Siamese Networks (MSNs), for endoscopic video analysis such as colonoscopy and laparoscopy. To fully exploit the power of SSL, we create sizable unlabeled endoscopic video datasets for training MSNs. These strong image representations serve as a foundation for secondary training with limited annotated datasets, resulting in state-of-the-art performance in endoscopic benchmarks like surgical phase recognition during laparoscopy and colonoscopic polyp characterization. Additionally, we achieve a 50% reduction in annotated data size without sacrificing performance. Thus, our work provides evidence that SSL can dramatically reduce the need of annotated data in endoscopy.
    摘要 自监督学习(SSL)已经在计算机视觉领域取得了重要突破,使得从大量无标注数据中学习成为可能。因此,它可能在生物医学领域扮演重要角色,因为标注数据的获取需要高度专业的知识。然而,医疗领域中仍有许多尚未得到广泛探索的方向,内窥镜检查便是其中之一。我们研究了一种主流SSL框架,即遮蔽孪生网络(Masked Siamese Networks,MSNs),用于内窥镜视频分析,如结肠镜检查(colonoscopy)和腹腔镜检查(laparoscopy)。为了充分利用SSL的力量,我们创建了大规模的无标注内窥镜视频数据集用于MSNs的训练。这些强大的图像表示为在有限标注数据上的二次训练奠定了基础,从而在腹腔镜手术阶段识别和结肠镜息肉特征分类等内窥镜基准测试中取得了state-of-the-art的性能。此外,我们在不牺牲性能的情况下将标注数据量减少了50%。因此,我们的工作提供了证据,表明SSL可以显著减少内窥镜领域对标注数据的需求。
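
A minimal sketch of the "secondary training with limited annotated data" step described above: a linear probe fitted on embeddings from a frozen, MSN-pretrained encoder. The feature dimensions, label counts, and random data here are placeholders, not values from the paper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

# Placeholder frame embeddings from a frozen, MSN-pretrained encoder and
# surgical-phase labels for a small annotated subset (7 phases assumed).
rng = np.random.default_rng(0)
train_feats, train_labels = rng.normal(size=(2000, 384)), rng.integers(0, 7, 2000)
test_feats, test_labels = rng.normal(size=(500, 384)), rng.integers(0, 7, 500)

# "Secondary training" on limited labels: a linear probe on frozen features.
probe = LogisticRegression(max_iter=1000)
probe.fit(train_feats, train_labels)
pred = probe.predict(test_feats)
print("macro-F1:", f1_score(test_labels, pred, average="macro"))
```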

With a Little Help from your own Past: Prototypical Memory Networks for Image Captioning

  • paper_url: http://arxiv.org/abs/2308.12383
  • repo_url: https://github.com/aimagelab/pma-net
  • paper_authors: Manuele Barraco, Sara Sarto, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara
  • for: 本研究旨在提高图像描述task中Transformer建模器的表现,特别是利用其他训练样本的信息来提高图像描述的准确率。
  • methods: 本研究提出了一种基于prototypical memory模型的注意力机制,可以在图像描述任务中共享知识,提高模型的表现。
  • results: 实验结果表明,与基线方法和state-of-the-art方法相比,所提出的方案可以提高Encoder-Decoder Transformer模型在COCO数据集上的表现,将CIDEr提高3.7个点。
    Abstract Image captioning, like many tasks involving vision and language, currently relies on Transformer-based architectures for extracting the semantics in an image and translating it into linguistically coherent descriptions. Although successful, the attention operator only considers a weighted summation of projections of the current input sample, therefore ignoring the relevant semantic information which can come from the joint observation of other samples. In this paper, we devise a network which can perform attention over activations obtained while processing other training samples, through a prototypical memory model. Our memory models the distribution of past keys and values through the definition of prototype vectors which are both discriminative and compact. Experimentally, we assess the performance of the proposed model on the COCO dataset, in comparison with carefully designed baselines and state-of-the-art approaches, and by investigating the role of each of the proposed components. We demonstrate that our proposal can increase the performance of an encoder-decoder Transformer by 3.7 CIDEr points both when training in cross-entropy only and when fine-tuning with self-critical sequence training. Source code and trained models are available at: https://github.com/aimagelab/PMA-Net.
    摘要 图像描述与许多视觉语言任务一样,目前都依赖基于Transformer的架构来提取图像中的语义并将其转换成语言上连贯的描述。虽然取得了成功,但注意操作只考虑当前输入样本投影的加权求和,忽略了联合观察其他样本所能提供的相关语义信息。在这篇论文中,我们设计了一个网络,通过一种原型记忆模型,对处理其他训练样本时获得的激活进行注意力计算。我们的记忆模型通过定义既具判别性又紧凑的原型向量,来建模过去的键和值的分布。实验中,我们在COCO数据集上将所提模型与精心设计的基线和state-of-the-art方法进行比较,并考察每个组成部分的作用。我们发现,无论是仅使用交叉熵训练,还是使用自我批判序列训练进行微调,我们的提议都可以将encoder-decoder Transformer的性能提高3.7个CIDEr点。源代码和训练模型可以在:https://github.com/aimagelab/PMA-Net 中找到。
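
A simplified sketch of attention augmented with prototypical memory, in the spirit of the mechanism described above: learned prototype keys and values (summarizing activations from past training samples) are appended to the current sample's keys and values before scaled dot-product attention. Shapes and the single-head formulation are illustrative assumptions, not the exact PMA-Net layer.

```python
import torch
import torch.nn.functional as F

def prototype_attention(q, k, v, proto_k, proto_v):
    """Scaled dot-product attention where learned prototype keys/values are
    appended to the current sample's keys/values (a simplified sketch)."""
    k_all = torch.cat([k, proto_k.expand(k.size(0), -1, -1)], dim=1)
    v_all = torch.cat([v, proto_v.expand(v.size(0), -1, -1)], dim=1)
    scores = q @ k_all.transpose(-2, -1) / (q.size(-1) ** 0.5)
    return F.softmax(scores, dim=-1) @ v_all

B, L, P, D = 2, 10, 16, 64              # batch, sequence length, #prototypes, dim
q, k, v = torch.randn(B, L, D), torch.randn(B, L, D), torch.randn(B, L, D)
proto_k = torch.nn.Parameter(torch.randn(1, P, D))  # compact, discriminative prototypes
proto_v = torch.nn.Parameter(torch.randn(1, P, D))
out = prototype_attention(q, k, v, proto_k, proto_v)
print(out.shape)                        # torch.Size([2, 10, 64])
```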

Open-set Face Recognition with Neural Ensemble, Maximal Entropy Loss and Feature Augmentation

  • paper_url: http://arxiv.org/abs/2308.12371
  • repo_url: None
  • paper_authors: Rafael Henrique Vareto, Manuel Günther, William Robson Schwartz
  • for: 本研究旨在提高人脸识别系统的准确率和安全性,特别是在人脸识别任务中处理未注册人员出现的开集场景。
  • methods: 本研究提出了一种新方法,通过结合紧凑型神经网络 ensemble 和 margin-based cost function 来提高人脸识别的精度和鲁棒性。在训练期间,补充的负样本可以取自外部数据库,或通过一种新的混合特征增强技术在表示层面合成得到。
  • results: 在LFW和IJB-C数据集上进行的实验显示,该方法可以提高闭集和开集识别率。
    Abstract Open-set face recognition refers to a scenario in which biometric systems have incomplete knowledge of all existing subjects. Therefore, they are expected to prevent face samples of unregistered subjects from being identified as previously enrolled identities. This watchlist context adds an arduous requirement that calls for the dismissal of irrelevant faces by focusing mainly on subjects of interest. As a response, this work introduces a novel method that associates an ensemble of compact neural networks with a margin-based cost function that explores additional samples. Supplementary negative samples can be obtained from external databases or synthetically built at the representation level in training time with a new mix-up feature augmentation approach. Deep neural networks pre-trained on large face datasets serve as the preliminary feature extraction module. We carry out experiments on well-known LFW and IJB-C datasets where results show that the approach is able to boost closed and open-set identification rates.
    摘要 开集人脸识别指生物特征识别系统对所有现有对象的知识并不完整的场景,因此系统需要防止未注册对象的人脸样本被识别为已注册的身份。这种观察名单(watchlist)场景提出了更严苛的要求:需要聚焦于关注对象并排除无关人脸。为此,本文提出了一种新方法,将一组紧凑神经网络的集成与一种利用额外样本的基于边界的损失函数相结合。补充的负样本可以来自外部数据库,也可以在训练时通过新的混合特征增强方法在表示层面合成。以大规模人脸数据集预训练的深度神经网络被用作初步特征提取模块。我们在知名的LFW和IJB-C数据集上进行了实验,结果表明该方法能够提升闭集和开集识别率。
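
A hedged sketch of the representation-level mix-up feature augmentation mentioned in the abstract: synthetic negative samples are obtained by convexly mixing pairs of face embeddings. The Beta-distribution parameter and embedding size are illustrative assumptions.

```python
import numpy as np

def mixup_negatives(embeddings, alpha=0.2, rng=None):
    """Synthesize extra negative samples at the representation level by
    convexly mixing pairs of face embeddings (illustrative hyper-parameters)."""
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha, size=(len(embeddings), 1))   # mixing coefficients
    perm = rng.permutation(len(embeddings))                    # random partner per sample
    return lam * embeddings + (1.0 - lam) * embeddings[perm]

feats = np.random.default_rng(0).normal(size=(32, 512))   # hypothetical face embeddings
extra_negatives = mixup_negatives(feats)
print(extra_negatives.shape)   # (32, 512)
```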

SafeAR: Towards Safer Algorithmic Recourse by Risk-Aware Policies

  • paper_url: http://arxiv.org/abs/2308.12367
  • repo_url: None
  • paper_authors: Haochen Wu, Shubham Sharma, Sunandita Patra, Sriram Gopalakrishnan
  • for: This paper focuses on providing recourse for individuals adversely affected by machine learning (ML) models in critical domains like finance and healthcare. The goal is to empower people to choose a recourse based on their risk tolerance, considering the risk of higher costs.
  • methods: The paper proposes a method called Safer Algorithmic Recourse (SafeAR) that computes recourse policies with risk considerations. It connects algorithmic recourse literature with risk-sensitive reinforcement learning and adopts financial measures like Value at Risk and Conditional Value at Risk to summarize risk concisely.
  • results: The paper compares policies with different levels of risk-aversion using risk measures and recourse desiderata (sparsity and proximity) on two real-world datasets. The results show that SafeAR can provide more risk-sensitive recourse recommendations than existing methods, enabling individuals to make more informed decisions based on their risk tolerance.
    Abstract With the growing use of machine learning (ML) models in critical domains such as finance and healthcare, the need to offer recourse for those adversely affected by the decisions of ML models has become more important; individuals ought to be provided with recommendations on actions to take for improving their situation and thus receive a favorable decision. Prior work on sequential algorithmic recourse -- which recommends a series of changes -- focuses on action feasibility and uses the proximity of feature changes to determine action costs. However, the uncertainties of feature changes and the risk of higher than average costs in recourse have not been considered. It is undesirable if a recourse could (with some probability) result in a worse situation from which recovery requires an extremely high cost. It is essential to incorporate risks when computing and evaluating recourse. We call the recourse computed with such risk considerations as Safer Algorithmic Recourse (SafeAR). The objective is to empower people to choose a recourse based on their risk tolerance. In this work, we discuss and show how existing recourse desiderata can fail to capture the risk of higher costs. We present a method to compute recourse policies that consider variability in cost and connect algorithmic recourse literature with risk-sensitive reinforcement learning. We also adopt measures ``Value at Risk'' and ``Conditional Value at Risk'' from the financial literature to summarize risk concisely. We apply our method to two real-world datasets and compare policies with different levels of risk-aversion using risk measures and recourse desiderata (sparsity and proximity).
    摘要 在这项工作中,我们讨论了现有的补救(recourse)评价准则为何可能无法刻画成本升高的风险。我们提出了一种在计算补救策略时考虑成本波动的方法,并将算法补救的研究与风险敏感的强化学习联系起来。我们还借鉴金融文献中的"风险价值(Value at Risk)"和"条件风险价值(Conditional Value at Risk)"等指标来简洁地概括风险。我们将该方法应用于两个真实数据集,并利用风险度量和补救评价准则(稀疏性与邻近性)比较了不同风险厌恶程度下的策略。
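
The risk measures SafeAR borrows from finance are easy to state concretely. The sketch below computes Value at Risk and Conditional Value at Risk from a Monte-Carlo sample of recourse costs; the cost distribution is synthetic and for illustration only.

```python
import numpy as np

def value_at_risk(costs, alpha=0.95):
    """alpha-quantile of the recourse-cost distribution."""
    return np.quantile(costs, alpha)

def conditional_value_at_risk(costs, alpha=0.95):
    """Expected cost in the worst (1 - alpha) tail, i.e. CVaR."""
    var = value_at_risk(costs, alpha)
    return costs[costs >= var].mean()

# Hypothetical Monte-Carlo costs of following one recourse policy under
# uncertain feature changes (numbers are illustrative, not from the paper).
costs = np.random.default_rng(0).lognormal(mean=1.0, sigma=0.6, size=10_000)
print("VaR@95%: ", value_at_risk(costs))
print("CVaR@95%:", conditional_value_at_risk(costs))
```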

CHORUS: Learning Canonicalized 3D Human-Object Spatial Relations from Unbounded Synthesized Images

  • paper_url: http://arxiv.org/abs/2308.12288
  • repo_url: None
  • paper_authors: Sookwan Han, Hanbyul Joo
  • for: 这项研究的目的是教会机器以自监督方式理解并建模人类与物体多样交互中的3D空间常识。
  • methods: 该方法利用一个能根据任意文本提示生成高质量2D图像的生成模型,作为可控且视角多样的"无限"数据生成器,合成从不同视角捕捉人与同类物体交互的多幅图像。
  • results: 研究表明,尽管合成图像的质量不及真实图像,但足以用来学习人与物体之间的3D空间关系。此外,作者还提出了多种策略,包括首次利用生成图像模型学习3D人-物空间关系、通过3D占用推理与姿势标准化(pose canonicalization)以自监督方式从不一致的2D线索中推断3D空间关系、利用语义聚类区分同类物体上的不同交互方式,以及一种评估交互3D空间学习质量的新指标。
    Abstract We present a method for teaching machines to understand and model the underlying spatial common sense of diverse human-object interactions in 3D in a self-supervised way. This is a challenging task, as there exist specific manifolds of the interactions that can be considered human-like and natural, but the human pose and the geometry of objects can vary even for similar interactions. Such diversity makes the annotating task of 3D interactions difficult and hard to scale, which limits the potential to reason about that in a supervised way. One way of learning the 3D spatial relationship between humans and objects during interaction is by showing multiple 2D images captured from different viewpoints when humans interact with the same type of objects. The core idea of our method is to leverage a generative model that produces high-quality 2D images from an arbitrary text prompt input as an "unbounded" data generator with effective controllability and view diversity. Despite its imperfection of the image quality over real images, we demonstrate that the synthesized images are sufficient to learn the 3D human-object spatial relations. We present multiple strategies to leverage the synthesized images, including (1) the first method to leverage a generative image model for 3D human-object spatial relation learning; (2) a framework to reason about the 3D spatial relations from inconsistent 2D cues in a self-supervised manner via 3D occupancy reasoning with pose canonicalization; (3) semantic clustering to disambiguate different types of interactions with the same object types; and (4) a novel metric to assess the quality of 3D spatial learning of interaction. Project Page: https://jellyheadandrew.github.io/projects/chorus
    摘要 我们提出了一种方法,用于教学机器人理解和模型人类和物体之间的底层空间共识,在3D空间中自主监督式地学习。这是一项挑战性的任务,因为人类的姿势和物体的几何结构可以在相似的交互中存在差异,这导致了标注3D交互的困难和缺乏扩展性,限制了我们可以在监督方式下理解的可能性。一种方法是通过显示多个视角 capture的2D图像,以便在人类和物体之间的交互中学习3D空间关系。我们的核心想法是利用一种生成模型,可以生成高质量的2D图像,从不同的视角捕捉到人类和物体之间的交互。尽管生成的图像质量不如实际图像,但我们示出了这些生成的图像足够以学习人类和物体之间的3D空间关系。我们提出了多种使用生成的图像进行3D人物空间关系学习的策略,包括:1. 首次利用生成图像模型来学习3D人物空间关系。2. 通过自主监督的方式,从不一致的2Dcue中理解3D空间关系,并使用pose canonicalization进行姿势标准化。3. 使用语义归一化来综合分类不同的交互方式。4. 提出了一种新的评价指标,用于评估3D人物空间关系的学习质量。更多细节请参考我们的项目页面:https://jellyheadandrew.github.io/projects/chorus
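
A rough, hedged sketch of the "3D occupancy reasoning with pose canonicalization" idea: per-view object points are mapped into a human-canonical frame and accumulated into an occupancy grid. The actual CHORUS pipeline is considerably more involved; the grid size, extent, and identity poses below are placeholders.

```python
import numpy as np

def canonical_occupancy(points_per_view, human_poses, grid=32, extent=2.0):
    """Aggregate per-view object points into a human-canonical occupancy grid.
    points_per_view: list of (N_i, 3) object points in world coordinates.
    human_poses: list of (R, t) mapping world -> canonical human frame."""
    occ = np.zeros((grid, grid, grid))
    for pts, (R, t) in zip(points_per_view, human_poses):
        canon = (R @ pts.T).T + t                        # pose canonicalization
        idx = ((canon + extent) / (2 * extent) * grid).astype(int)
        ok = np.all((idx >= 0) & (idx < grid), axis=1)   # keep in-bound voxels
        np.add.at(occ, tuple(idx[ok].T), 1.0)            # accumulate occupancy votes
    return occ / max(len(points_per_view), 1)

views = [np.random.default_rng(i).normal(scale=0.5, size=(200, 3)) for i in range(4)]
poses = [(np.eye(3), np.zeros(3)) for _ in range(4)]     # identity poses for the demo
print(canonical_occupancy(views, poses).sum())
```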

D4: Improving LLM Pretraining via Document De-Duplication and Diversification

  • paper_url: http://arxiv.org/abs/2308.12284
  • repo_url: None
  • paper_authors: Kushal Tirumala, Daniel Simig, Armen Aghajanyan, Ari S. Morcos
  • for: 该论文主要针对大型语言模型(LLM)的预训练和下游任务的性能提升。
  • methods: 该论文使用预训练模型的嵌入来进行精心的数据选择,以提高预训练和下游任务的性能。
  • results: 实验结果表明,采用精心的数据选择可以提高LLM的预训练速度(20%的效率提升),并在6.7B模型规模下提高16种NLP任务的平均下游性能(最多2%的提升)。此外,论文还表明,智能地重复数据可以超过基线训练,而随机重复数据的表现则不如基线训练。
    Abstract Over recent years, an increasing amount of compute and data has been poured into training large language models (LLMs), usually by doing one-pass learning on as many tokens as possible randomly selected from large-scale web corpora. While training on ever-larger portions of the internet leads to consistent performance improvements, the size of these improvements diminishes with scale, and there has been little work exploring the effect of data selection on pre-training and downstream performance beyond simple de-duplication methods such as MinHash. Here, we show that careful data selection (on top of de-duplicated data) via pre-trained model embeddings can speed up training (20% efficiency gains) and improves average downstream accuracy on 16 NLP tasks (up to 2%) at the 6.7B model scale. Furthermore, we show that repeating data intelligently consistently outperforms baseline training (while repeating random data performs worse than baseline training). Our results indicate that clever data selection can significantly improve LLM pre-training, calls into question the common practice of training for a single epoch on as much data as possible, and demonstrates a path to keep improving our models past the limits of randomly sampling web data.
    摘要 近年来,越来越多的计算资源和数据被投入到大型语言模型(LLM)的训练中,通常是对从大规模网络语料中随机选取的尽可能多的token进行单轮学习。虽然在越来越大的互联网数据上训练能带来持续的性能提升,但提升幅度随规模递减,而且除了MinHash等简单去重方法之外,鲜有工作探索数据选择对预训练和下游性能的影响。在本文中,我们展示了在去重数据的基础上,利用预训练模型嵌入进行细致的数据选择,可以加速训练(效率提升20%),并在6.7B模型规模下将16个NLP任务的平均下游准确率提高最多2%。此外,我们还发现智能地重复数据始终优于基线训练,而随机重复数据的表现则不如基线训练。我们的结果表明,巧妙的数据选择可以显著改进LLM预训练,质疑了"在尽可能多的数据上训练单个epoch"的常见做法,并展示了一条超越随机采样网络数据限制、持续改进模型的路径。
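
A simplified sketch in the spirit of D4's embedding-based data selection: greedy near-duplicate removal by cosine similarity on pretrained-model embeddings, followed by k-means clustering and per-cluster sampling to keep the retained set diverse. Thresholds, cluster counts, and the random embeddings are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np
from sklearn.cluster import KMeans

def dedup_and_diversify(emb, sim_thresh=0.95, n_clusters=8, per_cluster=4, seed=0):
    """Select a de-duplicated, diverse subset of documents from their embeddings."""
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    keep = []
    for i, e in enumerate(emb):                    # greedy near-duplicate removal
        if all(e @ emb[j] < sim_thresh for j in keep):
            keep.append(i)
    kept = np.array(keep)
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(emb[kept])
    rng = np.random.default_rng(seed)
    selected = []
    for c in range(n_clusters):                    # diversify: sample across clusters
        members = kept[labels == c]
        if len(members) == 0:
            continue
        take = min(per_cluster, len(members))
        selected.extend(rng.choice(members, size=take, replace=False))
    return sorted(int(i) for i in selected)

docs_emb = np.random.default_rng(0).normal(size=(200, 64))   # placeholder document embeddings
print(len(dedup_and_diversify(docs_emb)))                    # number of selected documents
```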

Simple is Better and Large is Not Enough: Towards Ensembling of Foundational Language Models

  • paper_url: http://arxiv.org/abs/2308.12272
  • repo_url: None
  • paper_authors: Nancy Tyagi, Aidin Shiri, Surjodeep Sarkar, Abhishek Kumar Umrawal, Manas Gaur
  • for: This paper aims to explore the potential of smaller foundational language models (FLMs) and their ensembling on benchmark and real-world datasets, and to investigate the influence of ensemble on the individualistic attention of FLMs.
  • methods: The authors use three ensemble techniques: Shallow, Semi, and Deep, and introduce a knowledge-guided reinforcement learning approach in the Deep-Ensemble.
  • results: The suggested Deep-Ensemble BERT outperforms its large variation i.e. BERTlarge, by a factor of many times using datasets that show the usefulness of NLP in sensitive fields, such as mental health.
  • for: 本研究旨在探讨小型基础语言模型(FLMs)及其集成在基准和真实数据集上的潜力,并研究集成对FLMs个体注意力的影响。
  • methods: 作者使用三种集成技术:浅层(Shallow)、半深(Semi)和深度(Deep),并在深度集成中引入知识引导的强化学习方法。
  • results: 在体现NLP于敏感领域(例如心理健康)价值的数据集上,所提出的深度集成BERT以数倍优势超越其大型变体BERTlarge。
    Abstract Foundational Language Models (FLMs) have advanced natural language processing (NLP) research. Current researchers are developing larger FLMs (e.g., XLNet, T5) to enable contextualized language representation, classification, and generation. While developing larger FLMs has been of significant advantage, it is also a liability concerning hallucination and predictive uncertainty. Fundamentally, larger FLMs are built on the same foundations as smaller FLMs (e.g., BERT); hence, one must recognize the potential of smaller FLMs which can be realized through an ensemble. In the current research, we perform a reality check on FLMs and their ensemble on benchmark and real-world datasets. We hypothesize that the ensembling of FLMs can influence the individualistic attention of FLMs and unravel the strength of coordination and cooperation of different FLMs. We utilize BERT and define three other ensemble techniques: {Shallow, Semi, and Deep}, wherein the Deep-Ensemble introduces a knowledge-guided reinforcement learning approach. We discovered that the suggested Deep-Ensemble BERT outperforms its large variation i.e. BERTlarge, by a factor of many times using datasets that show the usefulness of NLP in sensitive fields, such as mental health.
    摘要 基础语言模型(FLM)在自然语言处理(NLP)研究中进步了大量。当前研究人员在开发更大的FLM(如XLNet、T5)以实现语言表示、分类和生成上的上下文化化能力。虽然开发更大的FLM有了显著的优势,但也存在幻觉和预测不确定性的问题。基本上,更大的FLM都是基于小型FLM(如BERT)的基础上建立的,因此需要认可小型FLM的潜在能力,并通过集成来实现。在当前的研究中,我们对FLM和其集成的现实检查,并对标准和实际数据集上进行评估。我们假设 ensemble FLM 可以影响它们的个人注意力,并探索不同 FLM 之间的协作和合作的强度。我们使用 BERT 定义三种集成技术:{浅、半、深},其中深度集成还使用了知识导向的强化学习方法。我们发现,我们提议的深度集成 BERT 在使用敏感领域中的 NLP 数据集上,比其大版本 BERTlarge 高得多。
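
A toy sketch of the "Shallow" ensembling scheme named above: class probabilities from independently trained members are simply averaged. Real members would be fine-tuned BERT-style models; the tiny linear classifiers here are stand-ins so the example runs anywhere. The Semi and Deep variants add coordination and, in the Deep case, knowledge-guided reinforcement-learning weighting.

```python
import torch
import torch.nn as nn

class TinyClassifier(nn.Module):
    """Stand-in for a fine-tuned foundational LM classification head."""
    def __init__(self, dim=16, n_classes=2):
        super().__init__()
        self.head = nn.Linear(dim, n_classes)
    def forward(self, x):
        return self.head(x)

def shallow_ensemble(models, x):
    """'Shallow' ensembling: average class probabilities of independent members."""
    with torch.no_grad():
        probs = torch.stack([torch.softmax(m(x), dim=-1) for m in models])
    return probs.mean(dim=0)

members = [TinyClassifier() for _ in range(3)]
features = torch.randn(4, 16)           # placeholder encoder outputs for 4 examples
print(shallow_ensemble(members, features))
```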

Language Reward Modulation for Pretraining Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2308.12270
  • repo_url: https://github.com/ademiadeniji/lamp
  • paper_authors: Ademi Adeniji, Amber Xie, Carmelo Sferrazza, Younggyo Seo, Stephen James, Pieter Abbeel
  • for: 本研究考虑了使用学习奖励函数(LRF)来解决稀疏奖励强化学习(RL)任务,这一思路多年来在任务复杂度上取得了一些稳步进展。但我们质疑当前的LRF是否适合直接替换任务奖励,并提议转而利用LRF作为RL预训练的信号。
  • methods: 我们提出了$\textbf{LA}$nguage Reward $\textbf{M}$odulated $\textbf{P}$retraining(LAMP)方法,它利用冻结的预训练视觉语言模型(VLM),通过计算高度多样的语言指令与智能体在预训练环境中的图像观察之间的对比对齐,可扩展地生成带噪但有形状的探索奖励,并将其作为RL预训练的信号。
  • results: 我们的LAMP方法可以在RLBench的机器人操作任务上以样本高效的方式热启动学习。
    Abstract Using learned reward functions (LRFs) as a means to solve sparse-reward reinforcement learning (RL) tasks has yielded some steady progress in task-complexity through the years. In this work, we question whether today's LRFs are best-suited as a direct replacement for task rewards. Instead, we propose leveraging the capabilities of LRFs as a pretraining signal for RL. Concretely, we propose $\textbf{LA}$nguage Reward $\textbf{M}$odulated $\textbf{P}$retraining (LAMP) which leverages the zero-shot capabilities of Vision-Language Models (VLMs) as a $\textit{pretraining}$ utility for RL as opposed to a downstream task reward. LAMP uses a frozen, pretrained VLM to scalably generate noisy, albeit shaped exploration rewards by computing the contrastive alignment between a highly diverse collection of language instructions and the image observations of an agent in its pretraining environment. LAMP optimizes these rewards in conjunction with standard novelty-seeking exploration rewards with reinforcement learning to acquire a language-conditioned, pretrained policy. Our VLM pretraining approach, which is a departure from previous attempts to use LRFs, can warmstart sample-efficient learning on robot manipulation tasks in RLBench.
    摘要 LAMP利用冻结的预训练VLM,通过计算多样化语言指令集合与智能体在预训练环境中的图像观察之间的对比对齐,生成带噪但有形状的探索奖励。LAMP将这些奖励与标准的求新探索奖励结合,通过强化学习获得一个以语言为条件的预训练策略。与以往使用LRF的尝试不同,我们的VLM预训练方法能够在RLBench的机器人操作任务上热启动样本高效的学习。
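
A hedged sketch of a LAMP-style shaped exploration reward: the similarity between a frozen VLM's embedding of the agent's image observation and the embeddings of a batch of diverse language instructions. Embeddings are assumed to be precomputed by a CLIP-style encoder; the max-over-instructions aggregation is an illustrative choice, not necessarily the paper's.

```python
import torch
import torch.nn.functional as F

def lamp_style_reward(image_emb, text_embs):
    """Shaped exploration reward from a frozen VLM: cosine similarity between
    the current image observation and a set of sampled language instructions
    (precomputed embeddings assumed; illustrative sketch only)."""
    image_emb = F.normalize(image_emb, dim=-1)
    text_embs = F.normalize(text_embs, dim=-1)
    sims = text_embs @ image_emb            # cosine similarity per instruction
    return sims.max().item()                # reward = best-matching instruction

obs_emb = torch.randn(512)                  # hypothetical VLM image embedding
instr_embs = torch.randn(32, 512)           # embeddings of 32 sampled instructions
print(lamp_style_reward(obs_emb, instr_embs))
```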

FECoM: A Step towards Fine-Grained Energy Measurement for Deep Learning

  • paper_url: http://arxiv.org/abs/2308.12264
  • repo_url: None
  • paper_authors: Saurabhsingh Rajput, Tim Widmayer, Ziyuan Shang, Maria Kechagia, Federica Sarro, Tushar Sharma
  • for: 随着深度学习(DL)模型的使用、规模和复杂度不断增加,其能耗问题日益突出。本研究提出了一个精细粒度的DL能耗测量框架(FECoM),以帮助研究人员和开发者更准确地了解DL系统的能耗。
  • methods: FECoM使用静态插桩(static instrumentation)技术,并考虑计算负荷和温度稳定性等多种因素,以应对精细粒度能耗测量的挑战。
  • results: 通过使用FECoM,我们对TensorFlow框架的能耗进行了精细粒度的测量,并研究了参数大小和执行时间对能耗的影响,从而加深了对TensorFlow API能耗特征的理解。此外,我们还讨论了设计和实现精细粒度能耗测量工具时需要考虑的因素和挑战。
    Abstract With the increasing usage, scale, and complexity of Deep Learning (DL) models, their rapidly growing energy consumption has become a critical concern. Promoting green development and energy awareness at different granularities is the need of the hour to limit carbon emissions of DL systems. However, the lack of standard and repeatable tools to accurately measure and optimize energy consumption at a fine granularity (e.g., at method level) hinders progress in this area. In this paper, we introduce FECoM (Fine-grained Energy Consumption Meter), a framework for fine-grained DL energy consumption measurement. Specifically, FECoM provides researchers and developers a mechanism to profile DL APIs. FECoM addresses the challenges of measuring energy consumption at fine-grained level by using static instrumentation and considering various factors, including computational load and temperature stability. We assess FECoM's capability to measure fine-grained energy consumption for one of the most popular open-source DL frameworks, namely TensorFlow. Using FECoM, we also investigate the impact of parameter size and execution time on energy consumption, enriching our understanding of TensorFlow APIs' energy profiles. Furthermore, we elaborate on the considerations, issues, and challenges that one needs to consider while designing and implementing a fine-grained energy consumption measurement tool. We hope this work will facilitate further advances in DL energy measurement and the development of energy-aware practices for DL systems.
    摘要 在本文中,我们提出了精细粒度能耗测量框架FECoM(Fine-grained Energy Consumption Meter),用于以精细粒度测量深度学习的能耗。FECoM为研究人员和开发者提供了一种对DL API进行能耗剖析的机制。它通过静态插桩并考虑计算负荷和温度稳定性等多种因素,来应对精细粒度能耗测量的挑战。我们评估了FECoM对最流行的开源DL框架之一TensorFlow进行精细粒度能耗测量的能力。借助FECoM,我们还研究了参数大小和执行时间对能耗的影响,加深了对TensorFlow API能耗特征的理解。此外,我们讨论了设计和实现精细粒度能耗测量工具时需要考虑的因素、问题和挑战。我们希望这项工作能够推动DL能耗测量的进一步发展,以及DL系统能耗意识实践的形成。
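
A minimal sketch of method-level instrumentation in the spirit of FECoM: a decorator records an energy reading before and after a single framework API call. The energy counter below is a fake placeholder (a real tool would read RAPL or NVML counters and account for temperature stability and idle load), so the printed numbers only illustrate the mechanism.

```python
import time
import functools

def read_energy_joules():
    """Placeholder for a hardware energy counter; faked here so the sketch
    runs anywhere (pretend 10 J per second of wall time)."""
    return time.perf_counter() * 10.0

def measure_energy(fn):
    """Method-level instrumentation: record energy around one API call."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        before = read_energy_joules()
        result = fn(*args, **kwargs)
        after = read_energy_joules()
        print(f"{fn.__name__}: {after - before:.3f} J (approx.)")
        return result
    return wrapper

@measure_energy
def dummy_training_step(n=100_000):
    return sum(i * i for i in range(n))

dummy_training_step()
```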

Multi-Objective Optimization for Sparse Deep Neural Network Training

  • paper_url: http://arxiv.org/abs/2308.12243
  • repo_url: https://github.com/salomonhotegni/mdmtn
  • paper_authors: S. S. Hotegni, S. Peitz, M. Berkemeier
  • for: 这篇论文目的是为了训练深度学习网络(DNNs),并使其能够同时完成多个任务(Multi-Task Learning)。
  • methods: 这篇论文使用了一种改进的加权切比雪夫(Weighted Chebyshev)标量化技术,将多目标问题转化为一系列单目标问题,并使用增广拉格朗日(Augmented Lagrangian)方法来求解。
  • results: 这篇论文的实验结果显示,该方法可以在训练DNNs的过程中自适应地稀疏化网络结构,并且只要对网络权重施加任务特定的调整,就几乎不会影响模型的表现。
    Abstract Different conflicting optimization criteria arise naturally in various Deep Learning scenarios. These can address different main tasks (i.e., in the setting of Multi-Task Learning), but also main and secondary tasks such as loss minimization versus sparsity. The usual approach is a simple weighting of the criteria, which formally only works in the convex setting. In this paper, we present a Multi-Objective Optimization algorithm using a modified Weighted Chebyshev scalarization for training Deep Neural Networks (DNNs) with respect to several tasks. By employing this scalarization technique, the algorithm can identify all optimal solutions of the original problem while reducing its complexity to a sequence of single-objective problems. The simplified problems are then solved using an Augmented Lagrangian method, enabling the use of popular optimization techniques such as Adam and Stochastic Gradient Descent, while efficaciously handling constraints. Our work aims to address the (economical and also ecological) sustainability issue of DNN models, with a particular focus on Deep Multi-Task models, which are typically designed with a very large number of weights to perform equally well on multiple tasks. Through experiments conducted on two Machine Learning datasets, we demonstrate the possibility of adaptively sparsifying the model during training without significantly impacting its performance, if we are willing to apply task-specific adaptations to the network weights. Code is available at https://github.com/salomonhotegni/MDMTN.
    摘要 不同的冲突优化标准在深度学习场景中自然出现。这些标准可以用于不同的主任务(例如在多任务学习设置中),也可以用于主任务和次任务之间的冲突,如损失最小化和稀疏化。通常的方法是将这些标准简单地权衡,但这只能在凸Setting中有效。在这篇论文中,我们提出了一种多目标优化算法,使用修改后的Weighted ChebyshevScalarization来训练深度神经网络(DNNs)对多个任务进行训练。通过使用这种Scalarization技术,算法可以找到原始问题的所有优化解决方案,并将其减少到一个序列中的单个目标问题。这些简化后的问题然后可以使用 Augmented Lagrangian 方法解决,使用流行的优化技术such as Adam和Stochastic Gradient Descent,同时有效地处理约束。我们的工作旨在解决深度神经网络模型的(经济和生态)可持续性问题,尤其是深度多任务模型,这些模型通常具有很多权重,以便在多个任务上具有相同的性能。通过对两个机器学习数据集进行实验,我们示出了在训练过程中适应性减少模型的可能性,只要愿意在任务特定的网络权重上应用适应。代码可以在 https://github.com/salomonhotegni/MDMTN 上获取。
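
For reference, the textbook weighted Chebyshev scalarization that the paper modifies turns a vector of objectives into a single scalar via a weighted max of distances to the ideal (utopia) point; minimizing it over the network parameters can reach Pareto-optimal trade-offs even on non-convex fronts. The sketch below shows only this base form with made-up loss values.

```python
import numpy as np

def weighted_chebyshev(losses, weights, ideal):
    """Base weighted Chebyshev scalarization: max_i w_i * (f_i(x) - z_i*),
    where z* is the ideal (utopia) point. The paper uses a modified variant;
    this shows only the textbook form."""
    losses, weights, ideal = map(np.asarray, (losses, weights, ideal))
    return float(np.max(weights * (losses - ideal)))

# Example trade-off between a task loss and a sparsity objective.
print(weighted_chebyshev(losses=[0.42, 0.15], weights=[1.0, 0.5], ideal=[0.0, 0.0]))
```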

LLMRec: Benchmarking Large Language Models on Recommendation Task

  • paper_url: http://arxiv.org/abs/2308.12241
  • repo_url: https://github.com/williamliujl/llmrec
  • paper_authors: Junling Liu, Chao Liu, Peilin Zhou, Qichen Ye, Dading Chong, Kang Zhou, Yueqi Xie, Yuwei Cao, Shoujin Wang, Chenyu You, Philip S. Yu
  • for: 本研究旨在对大型语言模型(LLM)在推荐领域的应用进行研究。
  • methods: 本研究使用了多种常用的现成(off-the-shelf)LLM,如ChatGPT、LLaMA、ChatGLM,对五种推荐任务进行了基准测试,包括评分预测、序列推荐、直接推荐、解释生成和评论摘要。此外,我们还研究了监督微调对提高LLM指令遵从能力的有效性。
  • results: 结果表明LLM在基于准确性的任务中仅表现出中等水平的能力,但在基于可解释性的任务中与现有方法相当。此外,我们还进行了定性评估,发现LLM能够真正理解所提供的信息,并生成更清晰、更合理的结果。
    Abstract Recently, the fast development of Large Language Models (LLMs) such as ChatGPT has significantly advanced NLP tasks by enhancing the capabilities of conversational models. However, the application of LLMs in the recommendation domain has not been thoroughly investigated. To bridge this gap, we propose LLMRec, a LLM-based recommender system designed for benchmarking LLMs on various recommendation tasks. Specifically, we benchmark several popular off-the-shelf LLMs, such as ChatGPT, LLaMA, ChatGLM, on five recommendation tasks, including rating prediction, sequential recommendation, direct recommendation, explanation generation, and review summarization. Furthermore, we investigate the effectiveness of supervised finetuning to improve LLMs' instruction compliance ability. The benchmark results indicate that LLMs displayed only moderate proficiency in accuracy-based tasks such as sequential and direct recommendation. However, they demonstrated comparable performance to state-of-the-art methods in explainability-based tasks. We also conduct qualitative evaluations to further evaluate the quality of contents generated by different models, and the results show that LLMs can truly understand the provided information and generate clearer and more reasonable results. We aspire that this benchmark will serve as an inspiration for researchers to delve deeper into the potential of LLMs in enhancing recommendation performance. Our codes, processed data and benchmark results are available at https://github.com/williamliujl/LLMRec.
    摘要 近来,大型语言模型(LLM)如ChatGPT的快速发展显著推进了自然语言处理(NLP)任务。然而,LLM在推荐领域的应用还未得到全面的探索。为了填补这一空白,我们提出了LLMRec,一个基于LLM的推荐系统,用于在多种推荐任务上对LLM进行基准测试。具体来说,我们对几种流行的现成LLM,如ChatGPT、LLaMA、ChatGLM进行了五种推荐任务的评估,包括评分预测、序列推荐、直接推荐、解释生成和评论摘要。此外,我们还研究了监督微调能否改进LLM的指令遵从能力。基准结果显示,LLM在基于准确性的任务中仅表现出中等水平的能力,但在基于可解释性的任务中表现与当前领先方法相当。我们还进行了定性评估,以评估不同模型生成的内容质量,结果表明LLM可以真正理解所提供的信息,并生成更清晰、更合理的结果。我们希望这个基准能够激励研究人员更深入地研究LLM在提升推荐性能方面的潜力。我们的代码、处理后的数据和基准结果可以在 https://github.com/williamliujl/LLMRec 中找到。
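
As an illustration of how such a benchmark queries an off-the-shelf LLM, the sketch below builds a zero-shot prompt for the rating-prediction task. The wording and fields are assumptions for illustration, not the exact templates in the LLMRec repository.

```python
def rating_prediction_prompt(user_history, candidate_item):
    """Build a zero-shot rating-prediction prompt of the kind an LLMRec-style
    benchmark would send to an off-the-shelf LLM (illustrative wording)."""
    history = "\n".join(f"- {title}: {stars} stars" for title, stars in user_history)
    return (
        "You are a recommender system. Based on the user's past ratings,\n"
        "predict a rating from 1 to 5 for the candidate item. Answer with a number only.\n\n"
        f"Past ratings:\n{history}\n\nCandidate item: {candidate_item}\nPredicted rating:"
    )

prompt = rating_prediction_prompt(
    [("The Matrix", 5), ("Titanic", 3), ("Inception", 5)], "Interstellar"
)
print(prompt)
```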

Enhancing cardiovascular risk prediction through AI-enabled calcium-omics

  • paper_url: http://arxiv.org/abs/2308.12224
  • repo_url: None
  • paper_authors: Ammar Hoori, Sadeer Al-Kindi, Tao Hu, Yingnan Song, Hao Wu, Juhwan Lee, Nour Tashtish, Pingfu Fu, Robert Gilkeson, Sanjay Rajagopalan, David L. Wilson
  • for: The paper aims to determine if AI methods using detailed calcification features can improve the prediction of major adverse cardiovascular events (MACE).
  • methods: The study uses a Cox model with elastic-net regularization on 2457 CT calcium score (CTCS) exams enriched for MACE events, and employs sampling techniques to enhance model training. The study also investigates Cox models with selected features to identify explainable high-risk characteristics.
  • results: The proposed calcium-omics model with modified synthetic down sampling and up sampling gave higher C-index and two-year AUC compared to the Agatston score. The study found that numbers of calcifications, LAD mass, and diffusivity were important determinants of increased risk, and dense calcification was associated with lower risk. The calcium-omics model reclassified 63% of MACE patients to the high-risk group in a held-out test, with a categorical net-reclassification index of NRI=0.153.
  • for: 这项研究的目的是判断使用细粒度钙化特征的人工智能方法能否改进主要不良心血管事件(MACE)的预测。
  • methods: 这项研究在2457例富含MACE事件的CT钙化评分(CTCS)数据上使用了带弹性网络正则化的Cox模型,并采用采样技术增强模型训练。研究还考察了使用选定特征的Cox模型,以识别可解释的高风险特征。
  • results: 所提出的calcium-omics模型结合改进的合成下采样与上采样,在80:20的训练/测试划分下,其C-index(80.5%/71.6%)和两年AUC(82.4%/74.8%)均高于Agatston评分(C-index 71.3%/70.3%,AUC 71.8%/68.8%)。研究发现,钙化数量、LAD质量和扩散度(一种度量钙化空间分布的指标)是风险升高的重要决定因素,而高密度钙化(>1000HU)则与较低风险相关。calcium-omics模型在留出测试集中将63%的MACE患者重新划分到高风险组。
    Abstract Background. Coronary artery calcium (CAC) is a powerful predictor of major adverse cardiovascular events (MACE). Traditional Agatston score simply sums the calcium, albeit in a non-linear way, leaving room for improved calcification assessments that will more fully capture the extent of disease. Objective. To determine if AI methods using detailed calcification features (i.e., calcium-omics) can improve MACE prediction. Methods. We investigated additional features of calcification including assessment of mass, volume, density, spatial distribution, territory, etc. We used a Cox model with elastic-net regularization on 2457 CT calcium score (CTCS) enriched for MACE events obtained from a large no-cost CLARIFY program (ClinicalTri-als.gov Identifier: NCT04075162). We employed sampling techniques to enhance model training. We also investigated Cox models with selected features to identify explainable high-risk characteristics. Results. Our proposed calcium-omics model with modified synthetic down sampling and up sampling gave C-index (80.5%/71.6%) and two-year AUC (82.4%/74.8%) for (80:20, training/testing), respectively (sampling was applied to the training set only). Results compared favorably to Agatston which gave C-index (71.3%/70.3%) and AUC (71.8%/68.8%), respectively. Among calcium-omics features, numbers of calcifications, LAD mass, and diffusivity (a measure of spatial distribution) were important determinants of increased risk, with dense calcification (>1000HU) associated with lower risk. The calcium-omics model reclassified 63% of MACE patients to the high risk group in a held-out test. The categorical net-reclassification index was NRI=0.153. Conclusions. AI analysis of coronary calcification can lead to improved results as compared to Agatston scoring. Our findings suggest the utility of calcium-omics in improved prediction of risk.
    摘要 背景:冠状动脉钙化(CAC)是主要不良心血管事件(MACE)的强预测因素。传统的Agatston评分只是以非线性的方式对钙化进行简单加总,因此仍有空间发展能更全面刻画疾病程度的钙化评估方法。目标:确定使用细粒度钙化特征(即calcium-omics)的AI方法能否改进MACE预测。方法:我们考察了钙化的更多特征,包括质量、体积、密度、空间分布、所属血管区域等。我们在来自大规模免费CLARIFY项目(ClinicalTrials.gov Identifier: NCT04075162)、富含MACE事件的2457例CT钙化评分(CTCS)数据上,使用了带弹性网络(elastic-net)正则化的Cox模型,并采用采样技术增强模型训练。我们还使用带选定特征的Cox模型来识别可解释的高风险特征。结果:我们提出的calcium-omics模型结合改进的合成下采样与上采样,在(80:20,训练/测试)划分下得到C-index(80.5%/71.6%)和两年AUC(82.4%/74.8%)(采样仅应用于训练集);相比之下,Agatston评分的C-index为(71.3%/70.3%),AUC为(71.8%/68.8%)。在calcium-omics特征中,钙化数量、LAD质量和扩散度(一种度量空间分布的指标)是风险升高的重要决定因素,而高密度钙化(>1000HU)与较低风险相关。calcium-omics模型在留出测试集中将63%的MACE患者重新划分到高风险组,分类净重分类指数NRI=0.153。结论:与Agatston评分相比,对冠状动脉钙化进行AI分析可以带来更好的结果。我们的发现表明calcium-omics有助于改进风险预测。
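
The headline metric above, the C-index, can be computed directly from follow-up times, event indicators, and model risk scores. The plain-Python sketch below implements Harrell's concordance index on a toy cohort; the numbers are fabricated for illustration.

```python
import numpy as np

def concordance_index(times, events, risk_scores):
    """Harrell's C-index: among comparable patient pairs (the patient with the
    earlier time must have had the event), the fraction where the higher
    predicted risk belongs to the earlier event (ties count as 0.5)."""
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue
        for j in range(n):
            if times[i] < times[j]:          # i had the event first
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / comparable

t = np.array([5.0, 3.0, 8.0, 2.0])           # follow-up times (years)
e = np.array([1, 1, 0, 1])                    # 1 = MACE occurred
r = np.array([0.2, 0.6, 0.1, 0.9])            # model risk scores
print(concordance_index(t, e, r))
```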

Critical Learning Periods Emerge Even in Deep Linear Networks

  • paper_url: http://arxiv.org/abs/2308.12221
  • repo_url: None
  • paper_authors: Michael Kleinman, Alessandro Achille, Stefano Soatto
  • for: 本研究探讨了深度网络中的关键学习期(critical learning periods),即在发育或训练早期,暂时的感知缺陷可能对后续行为和习得表征产生永久影响的阶段。
  • methods: 本研究使用了深度线性网络模型,并通过分析和实验揭示了关键学习期的依赖关系。
  • results: 研究发现,关键学习期同样存在于深度线性网络中,并且与模型的深度和数据分布结构有关。此外,研究还发现了多任务学习中预训练的影响,并提出了一种基于来源间竞争的解释。
    Abstract Critical learning periods are periods early in development where temporary sensory deficits can have a permanent effect on behavior and learned representations. Despite the radical differences between biological and artificial networks, critical learning periods have been empirically observed in both systems. This suggests that critical periods may be fundamental to learning and not an accident of biology. Yet, why exactly critical periods emerge in deep networks is still an open question, and in particular it is unclear whether the critical periods observed in both systems depend on particular architectural or optimization details. To isolate the key underlying factors, we focus on deep linear network models, and show that, surprisingly, such networks also display much of the behavior seen in biology and artificial networks, while being amenable to analytical treatment. We show that critical periods depend on the depth of the model and structure of the data distribution. We also show analytically and in simulations that the learning of features is tied to competition between sources. Finally, we extend our analysis to multi-task learning to show that pre-training on certain tasks can damage the transfer performance on new tasks, and show how this depends on the relationship between tasks and the duration of the pre-training stage. To the best of our knowledge, our work provides the first analytically tractable model that sheds light into why critical learning periods emerge in biological and artificial networks.
    摘要 “重要学习期是在发育早期的一些时期,这些时期的暂时感知缺陷可能会导致永久性的行为和学习表征的改变。尽管生物和人工网络之间有很大差异,但critical periods仍然在两种系统中被观察到。这表明critical periods可能是学习的基本特征,而不是生物学意外现象。然而,critical periods在哪里emerge仍然是一个开放的问题,尤其是不知道critical periods在两种系统中是否受到特定的建筑或优化细节的影响。为了孤立关键因素,我们将注重深度Linear Network模型,并显示了这些网络在生物和人工网络中显示了大量的行为,同时可以进行分析处理。我们发现critical periods与模型的深度和数据分布结构有关,并且在分析和 simulations中表明了学习特征的吸引是与多源竞争相关。最后,我们扩展我们的分析到多任务学习,并显示了在某些任务上进行预训练可能会对新任务的传输性能产生负面影响,并且如何这种影响与任务之间的关系和预训练阶段的长度有关。到目前为止,我们的工作提供了第一个可分析的模型,可以解释critical learning periods在生物和人工网络中的出现。”
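
A toy sketch of the kind of experiment the paper analyzes: a deep linear network trained on a regression task whose input is heavily corrupted during an initial window (a temporary "sensory deficit") and clean afterwards, compared with an unimpaired control. Whether and how strongly the early deficit leaves a lasting gap depends on depth, learning rate, and deficit duration; the hyper-parameters below are arbitrary, not the paper's setup.

```python
import torch
import torch.nn as nn

def train_deep_linear(deficit_epochs, total_epochs=300, seed=0):
    """Train a 3-layer deep linear network; corrupt the input for the first
    `deficit_epochs` epochs, then train on clean data (illustrative sketch)."""
    torch.manual_seed(seed)
    teacher = torch.randn(20, 20)
    x = torch.randn(512, 20)
    y = x @ teacher.T
    net = nn.Sequential(nn.Linear(20, 20, bias=False),
                        nn.Linear(20, 20, bias=False),
                        nn.Linear(20, 20, bias=False))
    opt = torch.optim.SGD(net.parameters(), lr=1e-2)
    for epoch in range(total_epochs):
        inp = x + 3.0 * torch.randn_like(x) if epoch < deficit_epochs else x
        loss = ((net(inp) - y) ** 2).mean()
        opt.zero_grad(); loss.backward(); opt.step()
    return ((net(x) - y) ** 2).mean().item()   # final loss on clean data

print("no deficit:    ", train_deep_linear(0))
print("early deficit: ", train_deep_linear(150))
```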

Diffusion Language Models Can Perform Many Tasks with Scaling and Instruction-Finetuning

  • paper_url: http://arxiv.org/abs/2308.12219
  • repo_url: https://github.com/yegcjs/diffusionllm
  • paper_authors: Jiasheng Ye, Zaixiang Zheng, Yu Bao, Lihua Qian, Quanquan Gu
  • for: 这篇论文旨在探讨 diffusion 语言模型是否可以解决通用语言任务,并证明可以通过扩大数据、大小和任务来使 diffusion 语言模型成为强大的语言学习模型。
  • methods: 作者首先通过在海量数据上进行遮蔽语言模型预训练来获取知识,然后通过扩散式适应(diffusive adaptation)将预训练的遮蔽语言模型转化为扩散语言模型,并在不同任务上进行任务特定微调和指令微调,以解锁其多样能力。
  • results: 实验显示,随着扩散语言模型规模的扩大,其在下游语言任务中的表现持续提升;同时,模型具备零样本和少样本的上下文学习能力,可以根据自然语言指令解决许多未见过的任务。
    Abstract The recent surge of generative AI has been fueled by the generative power of diffusion probabilistic models and the scalable capabilities of large language models. Despite their potential, it remains elusive whether diffusion language models can solve general language tasks comparable to their autoregressive counterparts. This paper demonstrates that scaling diffusion models w.r.t. data, sizes, and tasks can effectively make them strong language learners. We build competent diffusion language models at scale by first acquiring knowledge from massive data via masked language modeling pretraining thanks to their intrinsic connections. We then reprogram pretrained masked language models into diffusion language models via diffusive adaptation, wherein task-specific finetuning and instruction finetuning are explored to unlock their versatility in solving general language tasks. Experiments show that scaling diffusion language models consistently improves performance across downstream language tasks. We further discover that instruction finetuning can elicit zero-shot and few-shot in-context learning abilities that help tackle many unseen tasks by following natural language instructions, and show promise in advanced and challenging abilities such as reasoning
    摘要 近期生成式AI的浪潮得益于扩散概率模型的生成能力和大语言模型的可扩展性。尽管潜力巨大,扩散语言模型能否解决与自回归模型相当的通用语言任务仍不明确。本文证明,在数据、规模和任务三个维度上扩展扩散模型,可以有效地使其成为强大的语言学习者。借助两者的内在联系,我们首先通过在海量数据上进行遮蔽语言模型预训练来获取知识,然后通过扩散式适应将预训练的遮蔽语言模型重新编程为扩散语言模型,并探索任务特定微调和指令微调,以解锁其解决通用语言任务的多样能力,从而在大规模下构建出有竞争力的扩散语言模型。实验表明,扩展扩散语言模型可以持续提升各类下游语言任务的表现。我们进一步发现,指令微调能够激发零样本和少样本的上下文学习能力,帮助模型依据自然语言指令处理许多未见过的任务,并在推理等高级且具有挑战性的能力上展现出潜力。
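
To make the connection between masked language modeling and diffusion language models concrete, the sketch below shows the forward corruption step of an absorbing-state ("masking") discrete diffusion: at noise level t, each token is independently replaced by a mask token, and training asks the model to recover the originals. The mask-token id and toy token ids are assumptions; this illustrates the general technique, not the paper's exact formulation.

```python
import torch

MASK_ID = 103   # hypothetical [MASK] token id (BERT-style vocabularies use 103)

def absorbing_diffusion_corrupt(token_ids, t):
    """Forward process of a masking ('absorbing-state') discrete diffusion:
    at noise level t in [0, 1], each token is independently replaced by the
    mask token with probability t; the model is trained to recover the
    original tokens at the masked positions (illustrative sketch only)."""
    noise = torch.rand_like(token_ids, dtype=torch.float)
    corrupted = torch.where(noise < t, torch.full_like(token_ids, MASK_ID), token_ids)
    return corrupted, noise < t          # corrupted ids + positions to predict

ids = torch.tensor([[101, 2023, 2003, 1037, 7099, 102]])   # toy token ids
for t in (0.2, 0.5, 0.9):
    corrupted, target_mask = absorbing_diffusion_corrupt(ids, t)
    print(t, corrupted.tolist())
```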