cs.AI - 2023-09-06

The Role of Communication and Reference Songs in the Mixing Process: Insights from Professional Mix Engineers

  • paper_url: http://arxiv.org/abs/2309.03404
  • repo_url: None
  • paper_authors: Soumya Sai Vanka, Maryam Safi, Jean-Baptiste Rolland, György Fazekas
  • For: To study the communication and feedback process between professional mixing engineers and their clients, in order to better understand collaboration, empathy, and intention in the mixing process.
  • Methods: A two-phase exploratory study: semi-structured interviews in phase one, followed by an online questionnaire in phase two.
  • Results: Collaboration, empathy, and intention are central to the mixing process; these findings can inform the development of smart multi-track mixing systems that better support these practices.
    Abstract Effective music mixing requires technical and creative finesse, but clear communication with the client is crucial. The mixing engineer must grasp the client's expectations and preferences, and collaborate to achieve the desired sound. The tacit agreement for the desired sound of the mix is often established using guides like reference songs and demo mixes exchanged between the artist and the engineer and sometimes verbalised using semantic terms. This paper presents the findings of a two-phased exploratory study aimed at understanding how professional mixing engineers interact with clients and use their feedback to guide the mixing process. For phase one, semi-structured interviews were conducted with five mixing engineers with the aim of gathering insights about their communication strategies, creative processes, and decision-making criteria. Based on the inferences from these interviews, an online questionnaire was designed and administered to a larger group of 22 mixing engineers during the second phase. The results of this study shed light on the importance of collaboration, empathy, and intention in the mixing process, and can inform the development of smart multi-track mixing systems that better support these practices. By highlighting the significance of these findings, this paper contributes to the growing body of research on the collaborative nature of music production and provides actionable recommendations for the design and implementation of innovative mixing tools.

Efficient Baselines for Motion Prediction in Autonomous Driving

  • paper_url: http://arxiv.org/abs/2309.03387
  • repo_url: https://github.com/cram3r95/mapfe4mp
  • paper_authors: Carlos Gómez-Huélamo, Marcos V. Conde, Rafael Barea, Manuel Ocaña, Luis M. Bergasa
  • for: Propose efficient baselines for Motion Prediction (MP) in arbitrarily complex environments, enabling Autonomous Driving Stacks (ADS) to operate there.
  • methods: Existing SOTA techniques, including attention mechanisms and GNNs, combined with a novel preprocessing step based on kinematic constraints to generate plausible multimodal trajectories.
  • results: Experiments on the Argoverse 1 Motion Forecasting benchmark show accuracy on par with other SOTA methods, with fewer operations and parameters and better interpretability.
    Abstract Motion Prediction (MP) of multiple surrounding agents is a crucial task in arbitrarily complex environments, from simple robots to Autonomous Driving Stacks (ADS). Current techniques tackle this problem using end-to-end pipelines, where the input data is usually a rendered top-view of the physical information and the past trajectories of the most relevant agents; leveraging this information is a must to obtain optimal performance. In that sense, a reliable ADS must produce reasonable predictions on time. However, although many approaches use simple ConvNets and LSTMs to obtain the social latent features, State-Of-The-Art (SOTA) models might be too complex for real-time applications when using both sources of information (map and past trajectories), and they offer little interpretability, especially considering the physical information. Moreover, the performance of such models highly depends on the number of available inputs for each particular traffic scenario, which are expensive to obtain, particularly annotated High-Definition (HD) maps. In this work, we propose several efficient baselines for the well-known Argoverse 1 Motion Forecasting Benchmark. We aim to develop compact models using SOTA techniques for MP, including attention mechanisms and GNNs. Our lightweight models use standard social information and interpretable map information, such as points from the driveable area and plausible centerlines, obtained by means of a novel preprocessing step based on kinematic constraints, as opposed to black-box CNN-based or overly complex graph methods for map encoding. They generate plausible multimodal trajectories, achieving accuracy on par with other SOTA methods while using fewer operations and parameters. Our code is publicly available at https://github.com/Cram3r95/mapfe4mp .
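
As a rough illustration of the kind of compact, attention-based baseline the paper advocates, the sketch below encodes each agent's past trajectory with an LSTM, exchanges social context via multi-head attention, and decodes multimodal futures for a focal agent. The architecture and all dimensions are illustrative assumptions, not the authors' model.

```python
# A minimal sketch (not the authors' code) of an attention-based MP baseline.
import torch
import torch.nn as nn

class TinyMPBaseline(nn.Module):
    def __init__(self, hidden=64, num_modes=6, horizon=30):
        super().__init__()
        self.encoder = nn.LSTM(input_size=2, hidden_size=hidden, batch_first=True)
        self.social = nn.MultiheadAttention(hidden, num_heads=4, batch_first=True)
        self.decoder = nn.Linear(hidden, num_modes * horizon * 2)
        self.num_modes, self.horizon = num_modes, horizon

    def forward(self, past):                    # past: (B, A, T, 2) agent XY tracks
        B, A, T, _ = past.shape
        _, (h, _) = self.encoder(past.reshape(B * A, T, 2))
        h = h[-1].reshape(B, A, -1)             # one latent vector per agent
        h, _ = self.social(h, h, h)             # agents attend to each other
        out = self.decoder(h[:, 0])             # decode the focal agent (index 0)
        return out.reshape(B, self.num_modes, self.horizon, 2)

preds = TinyMPBaseline()(torch.randn(4, 8, 20, 2))   # -> (4, 6, 30, 2)
```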

Community-Based Hierarchical Positive-Unlabeled (PU) Model Fusion for Chronic Disease Prediction

  • paper_url: http://arxiv.org/abs/2309.03386
  • repo_url: https://github.com/yangwu001/putree
  • paper_authors: Yang Wu, Xurui Li, Xuhong Zhang, Yangyang Kang, Changlong Sun, Xiaozhong Liu
  • for: Address the chronic disease screening problem with Positive-Unlabeled (PU) learning while accounting for differences across population groups.
  • methods: A novel Positive-Unlabeled Learning Tree (PUtree) algorithm that builds community-based PU models and hierarchically fuses them into a more robust binary classifier.
  • results: On two benchmarks and a new diabetes-prediction dataset, PUtree and its variants outperform state-of-the-art PU learning methods.
    Abstract Positive-Unlabeled (PU) Learning is a challenge presented by binary classification problems where there is an abundance of unlabeled data along with a small number of positive data instances, which can be used to address the chronic disease screening problem. State-of-the-art PU learning methods have resulted in the development of various risk estimators, yet they neglect the differences among distinct populations. To address this issue, we present a novel Positive-Unlabeled Learning Tree (PUtree) algorithm. PUtree is designed to take into account communities, such as different age or income brackets, in tasks of chronic disease prediction. We propose a novel approach for binary decision-making, which hierarchically builds community-based PU models and then aggregates their deliverables. Our method can explicate each PU model on the tree for the optimized non-leaf PU node splitting. Furthermore, a mask-recovery data augmentation strategy enables sufficient training of the model in individual communities. Additionally, the proposed approach includes an adversarial PU risk estimator to capture hierarchical PU-relationships, and a model fusion network that integrates data from each tree path, resulting in robust binary classification results. We demonstrate the superior performance of PUtree as well as its variants on two benchmarks and a new diabetes-prediction dataset.
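
For background on the risk estimators the abstract mentions, here is a sketch of the standard non-negative PU risk estimator (Kiryo et al., 2017), the kind of building block that PUtree's adversarial estimator extends; this is not the authors' implementation.

```python
# Non-negative PU risk with a softplus (logistic) surrogate loss.
import torch
import torch.nn.functional as F

def nn_pu_risk(scores_pos, scores_unl, prior):
    """scores_*: raw model outputs; prior: the assumed class prior P(y = 1)."""
    r_pos = F.softplus(-scores_pos).mean()       # positive risk on labeled positives
    r_neg_unl = F.softplus(scores_unl).mean()    # negative risk on unlabeled data
    r_neg_pos = F.softplus(scores_pos).mean()    # negative risk on positives
    r_neg = r_neg_unl - prior * r_neg_pos        # bias-corrected negative risk
    # Clamping keeps the estimator non-negative, which curbs overfitting.
    return prior * r_pos + torch.clamp(r_neg, min=0.0)
```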

Self-Supervised Masked Digital Elevation Models Encoding for Low-Resource Downstream Tasks

  • paper_url: http://arxiv.org/abs/2309.03367
  • repo_url: None
  • paper_authors: Priyam Mazumdar, Aiman Soliman, Volodymyr Kindratenko, Luigi Marini, Kenton McHenry
  • for: Extract building and road information from Digital Elevation Models (DEMs), which provide a detailed topography of the earth's surface.
  • methods: A Masked Autoencoder pre-trained on ImageNet (despite the large domain gap between ImageNet and DEMs), with a UperNet head for decoding segmentations.
  • results: On building segmentation the model obtains 82.1% Intersection over Union (IoU) with 450 training images and 69.1% IoU with only 50; on the harder road-detection task it obtains 82.7% IoU with 450 images and 73.2% IoU with only 50.
    Abstract The lack of quality labeled data is one of the main bottlenecks for training Deep Learning models. As the task increases in complexity, there is a higher penalty for overfitting and unstable learning. The typical paradigm employed today is Self-Supervised learning, where the model attempts to learn from a large corpus of unstructured and unlabeled data and then transfer that knowledge to the required task. Some notable examples of self-supervision in other modalities are BERT for Large Language Models, Wav2Vec for Speech Recognition, and the Masked AutoEncoder for Vision, which all utilize Transformers to solve a masked prediction task. GeoAI is uniquely poised to take advantage of the self-supervised methodology due to the decades of data collected, little of which is precisely and dependably annotated. Our goal is to extract building and road segmentations from Digital Elevation Models (DEM) that provide a detailed topography of the earth's surface. The proposed architecture is the Masked Autoencoder pre-trained on ImageNet (with the limitation that there is a large domain discrepancy between ImageNet and DEM) with an UperNet Head for decoding segmentations. We tested this model with 450 and 50 training images only, utilizing roughly 5% and 0.5% of the original data respectively. On the building segmentation task, this model obtains an 82.1% Intersection over Union (IoU) with 450 images and 69.1% IoU with only 50 images. On the more challenging road detection task the model obtains an 82.7% IoU with 450 images and 73.2% IoU with only 50 images. Any hand-labeled dataset made today about the earth's surface will be immediately obsolete due to the constantly changing nature of the landscape. This motivates the clear necessity for data-efficient learners that can be used for a wide variety of downstream tasks.
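
Since the results are reported as Intersection over Union, here is a minimal sketch of that metric for binary masks:

```python
import numpy as np

def iou(pred: np.ndarray, target: np.ndarray) -> float:
    """Intersection over Union of two binary masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    union = np.logical_or(pred, target).sum()
    if union == 0:                      # both masks empty: define IoU as 1
        return 1.0
    return float(np.logical_and(pred, target).sum()) / float(union)
```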

ETP: Learning Transferable ECG Representations via ECG-Text Pre-training

  • paper_url: http://arxiv.org/abs/2309.07145
  • repo_url: None
  • paper_authors: Che Liu, Zhongwei Wan, Sibo Cheng, Mi Zhang, Rossella Arcucci
  • for: In cardiovascular healthcare, where the electrocardiogram (ECG) is a non-invasive diagnostic tool, learn ECG feature representations with the help of natural-language reports.
  • methods: A novel framework, ECG-Text Pre-training (ETP), learns cross-modal ECG representations by aligning ECG signals with their corresponding textual reports, using an ECG encoder together with a pre-trained language model.
  • results: ETP excels in linear evaluation and zero-shot classification, as demonstrated on the PTB-XL and CPSC2018 datasets, learning robust and generalizable cross-modal ECG representations.
    Abstract In the domain of cardiovascular healthcare, the Electrocardiogram (ECG) serves as a critical, non-invasive diagnostic tool. Although recent strides in self-supervised learning (SSL) have been promising for ECG representation learning, these techniques often require annotated samples and struggle with classes not present in the fine-tuning stages. To address these limitations, we introduce ECG-Text Pre-training (ETP), an innovative framework designed to learn cross-modal representations that link ECG signals with textual reports. For the first time, this framework leverages the zero-shot classification task in the ECG domain. ETP employs an ECG encoder along with a pre-trained language model to align ECG signals with their corresponding textual reports. The proposed framework excels in both linear evaluation and zero-shot classification tasks, as demonstrated on the PTB-XL and CPSC2018 datasets, showcasing its ability for robust and generalizable cross-modal ECG feature learning.
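
ETP's alignment of ECG signals with textual reports follows the general recipe of CLIP-style contrastive pre-training. A minimal sketch of the symmetric contrastive loss typically used for such alignment, assuming both encoders emit same-dimensional embeddings; this is not necessarily the authors' exact objective.

```python
import torch
import torch.nn.functional as F

def clip_style_loss(ecg_emb, txt_emb, temperature=0.07):
    ecg = F.normalize(ecg_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    logits = ecg @ txt.t() / temperature          # (B, B) similarity matrix
    labels = torch.arange(len(ecg))               # matched pairs lie on the diagonal
    return (F.cross_entropy(logits, labels) +
            F.cross_entropy(logits.t(), labels)) / 2
```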

REBOOT: Reuse Data for Bootstrapping Efficient Real-World Dexterous Manipulation

  • paper_url: http://arxiv.org/abs/2309.03322
  • repo_url: None
  • paper_authors: Zheyuan Hu, Aaron Rovinsky, Jianlan Luo, Vikash Kumar, Abhishek Gupta, Sergey Levine
  • for: Learn dexterous manipulation skills that improve what robotic hands can do in the real world.
  • methods: Leverage recent advances in sample-efficient RL and replay-buffer bootstrapping, reusing data from different tasks or objects as a starting point for new tasks to greatly improve learning efficiency.
  • results: A four-fingered robotic hand quickly learns complex manipulation skills in the real world, completing the training cycle without manual resets or reward engineering.
    Abstract Dexterous manipulation tasks involving contact-rich interactions pose a significant challenge for both model-based control systems and imitation learning algorithms. The complexity arises from the need for multi-fingered robotic hands to dynamically establish and break contacts, balance non-prehensile forces, and control large degrees of freedom. Reinforcement learning (RL) offers a promising approach due to its general applicability and capacity to autonomously acquire optimal manipulation strategies. However, its real-world application is often hindered by the necessity to generate a large number of samples, reset the environment, and obtain reward signals. In this work, we introduce an efficient system for learning dexterous manipulation skills with RL to alleviate these challenges. The main idea of our approach is the integration of recent advances in sample-efficient RL and replay buffer bootstrapping. This combination allows us to utilize data from different tasks or objects as a starting point for training new tasks, significantly improving learning efficiency. Additionally, our system completes the real-world training cycle by incorporating learned resets via an imitation-based pickup policy as well as learned reward functions, eliminating the need for manual resets and reward engineering. We demonstrate the benefits of reusing past data as replay buffer initialization for new tasks, for instance, the fast acquisition of intricate manipulation skills in the real world on a four-fingered robotic hand. (Videos: https://sites.google.com/view/reboot-dexterous)
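
A minimal sketch of the replay-buffer bootstrapping idea: seed a new task's buffer with transitions logged from prior tasks before online training begins. The file layout and transition format are hypothetical, not the authors' code.

```python
import pickle
import random
from collections import deque

def bootstrap_buffer(prior_task_files, capacity=100_000):
    """Seed a new task's replay buffer with transitions from prior tasks."""
    transitions = []
    for path in prior_task_files:                 # e.g. ["grasp_cube.pkl", ...]
        with open(path, "rb") as f:
            transitions.extend(pickle.load(f))    # (obs, action, reward, next_obs, done)
    random.shuffle(transitions)                   # mix tasks before online training
    return deque(transitions, maxlen=capacity)
```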

Fitness Approximation through Machine Learning

  • paper_url: http://arxiv.org/abs/2309.03318
  • repo_url: https://github.com/itaitzruia4/approxml
  • paper_authors: Itai Tzruia, Tomer Halperin, Moshe Sipper, Achiya Elyasaf
  • for: Propose a machine-learning-based fitness approximation method for genetic algorithms (GAs) to make evolutionary runs more efficient.
  • methods: Maintain an ML model of individuals' fitness, continually updated during the evolutionary run from a dataset of sampled individuals and their actual fitness scores.
  • results: Experiments show significantly reduced evolutionary runtimes, with fitness scores identical or only slightly lower than those of the fully run GA.
    Abstract We present a novel approach to performing fitness approximation in genetic algorithms (GAs) using machine-learning (ML) models, focusing on evolutionary agents in Gymnasium (game) simulators -- where fitness computation is costly. Maintaining a dataset of sampled individuals along with their actual fitness scores, we continually update throughout an evolutionary run a fitness-approximation ML model. We compare different methods for: 1) switching between actual and approximate fitness, 2) sampling the population, and 3) weighting the samples. Experimental findings demonstrate significant improvement in evolutionary runtimes, with fitness scores that are either identical or slightly lower than that of the fully run GA -- depending on the ratio of approximate-to-actual-fitness computation. Our approach is generic and can be easily applied to many different domains.
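
A sketch of surrogate-assisted fitness evaluation of the kind the paper studies: a sampled fraction of each population gets the costly true fitness, and the rest is approximated by a model trained on all samples so far. The sampling scheme and regressor are illustrative; the paper compares several switching, sampling, and weighting strategies.

```python
import random
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def evaluate_population(pop, true_fitness, model, dataset, sample_rate=0.2):
    """Score a population: a sampled fraction gets the costly true fitness,
    the rest an ML approximation trained on all samples gathered so far."""
    idx = random.sample(range(len(pop)), max(1, int(sample_rate * len(pop))))
    exact = {i: true_fitness(pop[i]) for i in idx}      # costly evaluations
    dataset.extend((pop[i], exact[i]) for i in idx)
    X, y = map(np.array, zip(*dataset))
    model.fit(X, y)                                     # refresh the surrogate
    scores = model.predict(np.array(pop))               # cheap approximations
    for i, f in exact.items():
        scores[i] = f                                   # keep exact scores where known
    return scores

# Usage: model = RandomForestRegressor(); dataset = []
# scores = evaluate_population(population, fitness_fn, model, dataset)
```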

Comparative Analysis of Deep-Fake Algorithms

  • paper_url: http://arxiv.org/abs/2309.03295
  • repo_url: None
  • paper_authors: Nikhil Sontakke, Sejal Utekar, Shivansh Rastogi, Shriraj Sonawane
  • for: Provide an overview of the current state of deepfake technology, covering both deep-learning-based creation methods and detection techniques.
  • methods: Survey of detection approaches for deepfake videos, including facial recognition, motion analysis, and audio-visual synchronization.
  • results: Current deepfake detection techniques face notable limitations and challenges; further research and development are needed to ensure the integrity of digital visual media.
    Abstract Due to the widespread use of smartphones with high-quality digital cameras and easy access to a wide range of software apps for recording, editing, and sharing videos and images, as well as the deep learning AI platforms, a new phenomenon of 'faking' videos has emerged. Deepfake algorithms can create fake images and videos that are virtually indistinguishable from authentic ones. Therefore, technologies that can detect and assess the integrity of digital visual media are crucial. Deepfakes, also known as deep learning-based fake videos, have become a major concern in recent years due to their ability to manipulate and alter images and videos in a way that is virtually indistinguishable from the original. These deepfake videos can be used for malicious purposes such as spreading misinformation, impersonating individuals, and creating fake news. Deepfake detection technologies use various approaches such as facial recognition, motion analysis, and audio-visual synchronization to identify and flag fake videos. However, the rapid advancement of deepfake technologies has made it increasingly difficult to detect these videos with high accuracy. In this paper, we aim to provide a comprehensive review of the current state of deepfake creation and detection technologies. We examine the various deep learning-based approaches used for creating deepfakes, as well as the techniques used for detecting them. Additionally, we analyze the limitations and challenges of current deepfake detection methods and discuss future research directions in this field. Overall, the paper highlights the importance of continued research and development in deepfake detection technologies in order to combat the negative impact of deepfakes on society and ensure the integrity of digital visual media.

My Art My Choice: Adversarial Protection Against Unruly AI

  • paper_url: http://arxiv.org/abs/2309.03198
  • repo_url: None
  • paper_authors: Anthony Rhodes, Ram Bhagat, Umur Aybars Ciftci, Ilke Demir
  • for: Protect creators' copyright by preventing diffusion models from exploiting their artwork.
  • methods: A UNet-based generator, combining several losses, attacks black-box diffusion models to produce "protected" versions of images that in turn "break" the models; the artist chooses the perturbation amount to balance distortion against protection.
  • results: Experiments on three datasets across several image-to-image tasks evaluate both the protected images and the diffusion outputs in visual, noise, structure, pixel, and generative spaces to validate the claims.
    Abstract Generative AI is on the rise, enabling everyone to produce realistic content via publicly available interfaces. Especially for guided image generation, diffusion models are changing the creator economy by producing high quality low cost content. In parallel, artists are rising against unruly AI, since their artwork are leveraged, distributed, and dissimulated by large generative models. Our approach, My Art My Choice (MAMC), aims to empower content owners by protecting their copyrighted materials from being utilized by diffusion models in an adversarial fashion. MAMC learns to generate adversarially perturbed "protected" versions of images which can in turn "break" diffusion models. The perturbation amount is decided by the artist to balance distortion vs. protection of the content. MAMC is designed with a simple UNet-based generator, attacking black box diffusion models, combining several losses to create adversarial twins of the original artwork. We experiment on three datasets for various image-to-image tasks, with different user control values. Both protected image and diffusion output results are evaluated in visual, noise, structure, pixel, and generative spaces to validate our claims. We believe that MAMC is a crucial step for preserving ownership information for AI generated content in a flawless, based-on-need, and human-centric way.
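
The artist-controlled trade-off between distortion and protection amounts to bounding the adversarial perturbation. A minimal sketch of that budget control follows; MAMC learns the perturbation with a UNet generator, so the projection below illustrates only the constraint, not the method.

```python
import torch

def project_perturbation(original, perturbed, epsilon=8 / 255):
    """Clip the adversarial change to an L-infinity ball of radius epsilon."""
    delta = torch.clamp(perturbed - original, -epsilon, epsilon)
    return torch.clamp(original + delta, 0.0, 1.0)   # stay in valid pixel range
```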

Temporal Inductive Path Neural Network for Temporal Knowledge Graph Reasoning

  • paper_url: http://arxiv.org/abs/2309.03251
  • repo_url: None
  • paper_authors: Hao Dong, Pengyang Wang, Meng Xiao, Zhiyuan Ning, Pengfei Wang, Yuanchun Zhou
  • for: Improve performance on temporal knowledge graph (TKG) reasoning, especially when handling historical information and newly emerging entities.
  • methods: The Temporal Inductive Path Neural Network (TiPNN) models historical information from an entity-independent perspective, defining query-aware temporal paths over a unified history temporal graph to capture path information relevant to each query.
  • results: Experiments show significant performance gains; the model handles inductive settings and can provide reasoning evidence through its history temporal graphs.
    Abstract Temporal Knowledge Graph (TKG) is an extension of traditional Knowledge Graph (KG) that incorporates the dimension of time. Reasoning on TKGs is a crucial task that aims to predict future facts based on historical occurrences. The key challenge lies in uncovering structural dependencies within historical subgraphs and temporal patterns. Most existing approaches model TKGs relying on entity modeling, as nodes in the graph play a crucial role in knowledge representation. However, the real-world scenario often involves an extensive number of entities, with new entities emerging over time. This makes it challenging for entity-dependent methods to cope with extensive volumes of entities, and effectively handling newly emerging entities also becomes a significant challenge. Therefore, we propose Temporal Inductive Path Neural Network (TiPNN), which models historical information in an entity-independent perspective. Specifically, TiPNN adopts a unified graph, namely history temporal graph, to comprehensively capture and encapsulate information from history. Subsequently, we utilize the defined query-aware temporal paths to model historical path information related to queries on history temporal graph for the reasoning. Extensive experiments illustrate that the proposed model not only attains significant performance enhancements but also handles inductive settings, while additionally facilitating the provision of reasoning evidence through history temporal graphs.

Split-Boost Neural Networks

  • paper_url: http://arxiv.org/abs/2309.03167
  • repo_url: https://github.com/Aastha2104/Parkinson-Disease-Prediction
  • paper_authors: Raffaele Giuseppe Cestari, Gabriele Maroni, Loris Cannelli, Dario Piga, Simone Formentin
  • for: Propose a training and calibration procedure for feed-forward neural networks that improves performance and implicitly regularizes without modeling the regularization term explicitly.
  • methods: A novel training strategy called "split-boost" that boosts performance and automatically includes a regularizing behaviour without explicit modeling, reducing the number of hyperparameters.
  • results: Tested on a real-world (anonymized) dataset from a benchmark medical insurance design problem, the strategy improves performance and speeds up the tuning phase.
    Abstract The calibration and training of a neural network is a complex and time-consuming procedure that requires significant computational resources to achieve satisfactory results. Key obstacles are a large number of hyperparameters to select and the onset of overfitting in the face of a small amount of data. In this framework, we propose an innovative training strategy for feed-forward architectures - called split-boost - that improves performance and automatically includes a regularizing behaviour without modeling it explicitly. Such a novel approach ultimately allows us to avoid explicitly modeling the regularization term, decreasing the total number of hyperparameters and speeding up the tuning phase. The proposed strategy is tested on a real-world (anonymized) dataset within a benchmark medical insurance design problem.

J-Guard: Journalism Guided Adversarially Robust Detection of AI-generated News

  • paper_url: http://arxiv.org/abs/2309.03164
  • repo_url: None
  • paper_authors: Tharindu Kumarage, Amrita Bhattacharjee, Djordje Padejski, Kristy Roschke, Dan Gillmor, Scott Ruston, Huan Liu, Joshua Garland
  • for: Reliably detect AI-generated news articles to curb the spread of online misinformation.
  • methods: An interdisciplinary team developed J-Guard, a framework that steers existing supervised AI-text detectors with stylistic cues inspired by journalistic attributes, distinguishing real journalism from AI-generated articles.
  • results: Experiments show J-Guard enhances detection capability while keeping the average performance drop under adversarial attacks as low as 7%.
    Abstract The rapid proliferation of AI-generated text online is profoundly reshaping the information landscape. Among various types of AI-generated text, AI-generated news presents a significant threat as it can be a prominent source of misinformation online. While several recent efforts have focused on detecting AI-generated text in general, these methods require enhanced reliability, given concerns about their vulnerability to simple adversarial attacks. Furthermore, due to the eccentricities of news writing, applying these detection methods for AI-generated news can produce false positives, potentially damaging the reputation of news organizations. To address these challenges, we leverage the expertise of an interdisciplinary team to develop a framework, J-Guard, capable of steering existing supervised AI text detectors for detecting AI-generated news while boosting adversarial robustness. By incorporating stylistic cues inspired by the unique journalistic attributes, J-Guard effectively distinguishes between real-world journalism and AI-generated news articles. Our experiments on news articles generated by a vast array of AI models, including ChatGPT (GPT3.5), demonstrate the effectiveness of J-Guard in enhancing detection capabilities while maintaining an average performance decrease of as low as 7% when faced with adversarial attacks.
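
A toy sketch of journalism-inspired stylistic cues of the general kind J-Guard injects into a detector; these particular features are illustrative assumptions, not the paper's actual feature set.

```python
import re

def journalism_cues(text: str) -> dict:
    """Toy stylistic features: quote density, attribution density, sentence length."""
    sentences = [s for s in re.split(r"[.!?]+\s+", text) if s]
    n = max(len(sentences), 1)
    quotes = re.findall(r"\u201c[^\u201d]+\u201d|\"[^\"]+\"", text)
    attributions = re.findall(r"\b(?:said|told|according to|reported)\b", text, re.I)
    return {
        "quote_density": len(quotes) / n,
        "attribution_density": len(attributions) / n,
        "avg_sentence_length": sum(len(s.split()) for s in sentences) / n,
    }
```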

Risk-reducing design and operations toolkit: 90 strategies for managing risk and uncertainty in decision problems

  • paper_url: http://arxiv.org/abs/2309.03133
  • repo_url: https://github.com/sashagutfraind/uncertainty_strategies
  • paper_authors: Alexander Gutfraind
  • for: Catalog and develop RDOT (Risk-reducing Design and Operations Toolkit), a class of strategies that respond effectively to decision problems under high uncertainty.
  • methods: Classify RDOT strategies into six broad categories and propose a framework for incorporating them into decision theory via multi-objective optimization.
  • results: More than 90 RDOT strategies were identified across diverse fields and disciplines, pointing to an important shared toolkit; because they do not depend on accurate forecasting or estimation, they can make decision problems under high uncertainty far more tractable.
    Abstract Uncertainty is a pervasive challenge in decision analysis, and decision theory recognizes two classes of solutions: probabilistic models and cognitive heuristics. However, engineers, public planners and other decision-makers instead use a third class of strategies that could be called RDOT (Risk-reducing Design and Operations Toolkit). These include incorporating robustness into designs, contingency planning, and others that do not fall into the categories of probabilistic models or cognitive heuristics. Moreover, identical strategies appear in several domains and disciplines, pointing to an important shared toolkit. The focus of this paper is to develop a catalog of such strategies and develop a framework for them. The paper finds more than 90 examples of such strategies falling into six broad categories and argues that they provide an efficient response to decision problems that are seemingly intractable due to high uncertainty. It then proposes a framework to incorporate them into decision theory using multi-objective optimization. Overall, RDOT represents an overlooked class of responses to uncertainty. Because RDOT strategies do not depend on accurate forecasting or estimation, they could be applied fruitfully to certain decision problems affected by high uncertainty and make them much more tractable.

MyoDex: A Generalizable Prior for Dexterous Manipulation

  • paper_url: http://arxiv.org/abs/2309.03130
  • repo_url: None
  • paper_authors: Vittorio Caggiano, Sudeep Dasari, Vikash Kumar
  • for: Develop a control system based on multi-task learning that allows agents to quickly acquire new, previously unattainable behaviors by building on prior experience.
  • methods: Multi-task learning implicitly captures task-agnostic behavioral priors (MyoDex) for human-like dexterity, training agents on MyoHand, a physiologically realistic human hand model.
  • results: Agents leveraging MyoDex show few-shot generalization and positive transfer to many unseen contact-rich manipulation tasks, solving about 3x more tasks, 4x faster, than a distillation baseline; the paradigm also improves dexterity acquisition on the 24-DoF Adroit Hand.
    Abstract Human dexterity is a hallmark of motor control. Our hands can rapidly synthesize new behaviors despite the complexity (multi-articular and multi-joints, with 23 joints controlled by more than 40 muscles) of musculoskeletal sensory-motor circuits. In this work, we take inspiration from how human dexterity builds on a diversity of prior experiences, instead of being acquired through a single task. Motivated by this observation, we set out to develop agents that can build upon their previous experience to quickly acquire new (previously unattainable) behaviors. Specifically, our approach leverages multi-task learning to implicitly capture task-agnostic behavioral priors (MyoDex) for human-like dexterity, using a physiologically realistic human hand model - MyoHand. We demonstrate MyoDex's effectiveness in few-shot generalization as well as positive transfer to a large repertoire of unseen dexterous manipulation tasks. Agents leveraging MyoDex can solve approximately 3x more tasks, and 4x faster in comparison to a distillation baseline. While prior work has synthesized single musculoskeletal control behaviors, MyoDex is the first generalizable manipulation prior that catalyzes the learning of dexterous physiological control across a large variety of contact-rich behaviors. We also demonstrate the effectiveness of our paradigms beyond musculoskeletal control towards the acquisition of dexterity in 24 DoF Adroit Hand. Website: https://sites.google.com/view/myodex

Detecting Manufacturing Defects in PCBs via Data-Centric Machine Learning on Solder Paste Inspection Features

  • paper_url: http://arxiv.org/abs/2309.03113
  • repo_url: None
  • paper_authors: Jubilee Prasad-Rao, Roohollah Heidary, Jesse Williams
  • For: The paper aims to improve the automated detection of defects in Printed Circuit Board (PCB) manufacturing using Solder Paste Inspection (SPI) and Automated Optical Inspection (AOI) machines.
  • Methods: The paper uses a data-centric approach to train Machine Learning (ML) models to detect PCB defects at three stages of PCB manufacturing. The authors use SPI-extracted features of 6 million pins to train the ML models, and combine pin-level SPI features with component and PCB IDs to capture any inter-pin, inter-component, or spatial effects that may not be apparent at the pin level.
  • Results: The paper demonstrates the effectiveness of the proposed approach in detecting PCB defects. The authors use a base extreme gradient boosting (XGBoost) ML model and iterate on the data pre-processing step to improve detection performance. The results show that combining the detection results from different models can identify defective components more accurately.
    Abstract Automated detection of defects in Printed Circuit Board (PCB) manufacturing using Solder Paste Inspection (SPI) and Automated Optical Inspection (AOI) machines can help improve operational efficiency and significantly reduce the need for manual intervention. In this paper, using SPI-extracted features of 6 million pins, we demonstrate a data-centric approach to train Machine Learning (ML) models to detect PCB defects at three stages of PCB manufacturing. The 6 million PCB pins correspond to 2 million components that belong to 15,387 PCBs. Using a base extreme gradient boosting (XGBoost) ML model, we iterate on the data pre-processing step to improve detection performance. Combining pin-level SPI features using component and PCB IDs, we developed training instances also at the component and PCB level. This allows the ML model to capture any inter-pin, inter-component, or spatial effects that may not be apparent at the pin level. Models are trained at the pin, component, and PCB levels, and the detection results from the different models are combined to identify defective components.
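
A sketch of the data-centric aggregation idea: roll pin-level SPI features up to the component level so a model can see inter-pin effects, then fit a base XGBoost classifier. Column names and hyperparameters are illustrative assumptions, not the authors' pipeline.

```python
import pandas as pd
from xgboost import XGBClassifier

def train_component_model(pins: pd.DataFrame) -> XGBClassifier:
    """pins: one row per pin, with SPI feature columns plus
    'component_id' and a binary 'defective' label."""
    feats = [c for c in pins.columns if c not in ("component_id", "defective")]
    comp = pins.groupby("component_id")[feats].agg(["mean", "std", "min", "max"])
    comp.columns = ["_".join(c) for c in comp.columns]        # flatten MultiIndex
    labels = pins.groupby("component_id")["defective"].max()  # any defective pin
    model = XGBClassifier(n_estimators=300, max_depth=6)
    model.fit(comp, labels)
    return model
```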

A Multimodal Analysis of Influencer Content on Twitter

  • paper_url: http://arxiv.org/abs/2309.03064
  • repo_url: https://github.com/danaesavi/micd-influencer-content-twitter
  • paper_authors: Danae Sánchez Villegas, Catalina Goanta, Nikolaos Aletras
  • for: This paper aims to assist in the automatic detection of commercial influencer content on Twitter.
  • methods: The paper uses a new dataset of 15,998 influencer posts, and experiments with a range of predictive models that combine text and visual information, including a proposed cross-attention approach.
  • results: The paper shows that the cross-attention approach outperforms state-of-the-art multimodal models, and provides a thorough analysis of the strengths and limitations of the models. The models are effective in identifying commercial posts and reducing false positives, while capturing relevant context that aids in the discovery of undisclosed commercial posts.
    Abstract Influencer marketing involves a wide range of strategies in which brands collaborate with popular content creators (i.e., influencers) to leverage their reach, trust, and impact on their audience to promote and endorse products or services. Because followers of influencers are more likely to buy a product after receiving an authentic product endorsement rather than an explicit direct product promotion, the line between personal opinions and commercial content promotion is frequently blurred. This makes automatic detection of regulatory compliance breaches related to influencer advertising (e.g., misleading advertising or hidden sponsorships) particularly difficult. In this work, we (1) introduce a new Twitter (now X) dataset consisting of 15,998 influencer posts mapped into commercial and non-commercial categories for assisting in the automatic detection of commercial influencer content; (2) experiment with an extensive set of predictive models that combine text and visual information showing that our proposed cross-attention approach outperforms state-of-the-art multimodal models; and (3) conduct a thorough analysis of strengths and limitations of our models. We show that multimodal modeling is useful for identifying commercial posts, reducing the amount of false positives, and capturing relevant context that aids in the discovery of undisclosed commercial posts.
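
A minimal sketch of a text-image cross-attention block of the general kind the proposed approach uses, with text tokens attending over visual patch features; dimensions and pooling are illustrative, not the authors' implementation.

```python
import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    def __init__(self, dim=768, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.classifier = nn.Linear(dim, 2)          # commercial vs. non-commercial

    def forward(self, text_tokens, image_patches):
        # Text tokens (queries) attend over visual patch features (keys/values).
        fused, _ = self.attn(text_tokens, image_patches, image_patches)
        pooled = self.norm(fused + text_tokens).mean(dim=1)
        return self.classifier(pooled)

logits = CrossAttentionFusion()(torch.randn(2, 32, 768), torch.randn(2, 49, 768))
```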

Framework-Based Qualitative Analysis of Free Responses of Large Language Models: Algorithmic Fidelity

  • paper_url: http://arxiv.org/abs/2309.06364
  • repo_url: None
  • paper_authors: Aliya Amirova, Theodora Fteropoulli, Nafiso Ahmed, Martin R. Cowie, Joel Z. Leibo
  • For: The paper explores the use of large-scale generative language models (LLMs) to simulate free responses to interview questions and whether these artificial "silicon participants" can be studied using qualitative methods to produce insights that generalize to real human populations.
  • Methods: The paper uses an LLM to generate interviews with silicon participants matching specific demographic characteristics one-for-one with a set of human participants. The authors use framework-based qualitative analysis to compare the key themes obtained from both human and silicon participants, and they also analyze the structure and tone of the interviews.
  • Results: The paper finds that while the key themes obtained from both human and silicon participants are strikingly similar, there are significant differences in the structure and tone of the interviews. The authors also find evidence of the hyper-accuracy distortion described by Aher et al. (2023), which suggests that the LLM they tested (GPT-3.5) does not have sufficient algorithmic fidelity to expect research on it to generalize to human populations.
    Abstract Today, using Large-scale generative Language Models (LLMs) it is possible to simulate free responses to interview questions like those traditionally analyzed using qualitative research methods. Qualitative methodology encompasses a broad family of techniques involving manual analysis of open-ended interviews or conversations conducted freely in natural language. Here we consider whether artificial "silicon participants" generated by LLMs may be productively studied using qualitative methods aiming to produce insights that could generalize to real human populations. The key concept in our analysis is algorithmic fidelity, a term introduced by Argyle et al. (2023) capturing the degree to which LLM-generated outputs mirror human sub-populations' beliefs and attitudes. By definition, high algorithmic fidelity suggests latent beliefs elicited from LLMs may generalize to real humans, whereas low algorithmic fidelity renders such research invalid. Here we used an LLM to generate interviews with silicon participants matching specific demographic characteristics one-for-one with a set of human participants. Using framework-based qualitative analysis, we showed the key themes obtained from both human and silicon participants were strikingly similar. However, when we analyzed the structure and tone of the interviews we found even more striking differences. We also found evidence of the hyper-accuracy distortion described by Aher et al. (2023). We conclude that the LLM we tested (GPT-3.5) does not have sufficient algorithmic fidelity to expect research on it to generalize to human populations. However, the rapid pace of LLM research makes it plausible this could change in the future. Thus we stress the need to establish epistemic norms now around how to assess validity of LLM-based qualitative research, especially concerning the need to ensure representation of heterogeneous lived experiences.

Hide and Seek (HaS): A Lightweight Framework for Prompt Privacy Protection

  • paper_url: http://arxiv.org/abs/2309.03057
  • repo_url: https://github.com/alohachen/hide-and-seek
  • paper_authors: Yu Chen, Tingxin Li, Huiming Liu, Yang Yu
  • for: Improve privacy protection for users of large language model (LLM) services, whose prompts are exposed to the model provider.
  • methods: The Hide and Seek (HaS) framework anonymizes private entities in prompts through substitution or masking, then trains a small local model to de-anonymize the LLM's returned results with minimal computational overhead.
  • results: Experiments on translation and classification tasks show that HaS achieves an effective balance between privacy protection and utility.
    Abstract Numerous companies have started offering services based on large language models (LLM), such as ChatGPT, which inevitably raises privacy concerns as users' prompts are exposed to the model provider. Previous research on secure reasoning using multi-party computation (MPC) has proven to be impractical for LLM applications due to its time-consuming and communication-intensive nature. While lightweight anonymization techniques can protect private information in prompts through substitution or masking, they fail to recover sensitive data replaced in the LLM-generated results. In this paper, we expand the application scenarios of anonymization techniques by training a small local model to de-anonymize the LLM's returned results with minimal computational overhead. We introduce the HaS framework, where "H(ide)" and "S(eek)" represent its two core processes: hiding private entities for anonymization and seeking private entities for de-anonymization, respectively. To quantitatively assess HaS's privacy protection performance, we propose both black-box and white-box adversarial models. Furthermore, we conduct experiments to evaluate HaS's usability in translation and classification tasks. The experimental findings demonstrate that the HaS framework achieves an optimal balance between privacy protection and utility.
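
A toy sketch of the hide/seek flow: substitute private entities with placeholders before querying the remote LLM, then restore them in the response. In HaS the "seek" step is a trained small local model; a dictionary lookup stands in for it here, and the example names are hypothetical.

```python
def hide(prompt: str, entities: list) -> tuple:
    """Replace private entities with placeholders before querying a remote LLM."""
    mapping = {}
    for i, ent in enumerate(entities):
        placeholder = f"<ENT{i}>"
        prompt = prompt.replace(ent, placeholder)
        mapping[placeholder] = ent
    return prompt, mapping

def seek(response: str, mapping: dict) -> str:
    """Restore the original entities in the LLM's response."""
    for placeholder, ent in mapping.items():
        response = response.replace(placeholder, ent)
    return response

masked, mapping = hide("Alice Chen flew to Berlin.", ["Alice Chen", "Berlin"])
# send `masked` to the LLM, then: restored = seek(llm_response, mapping)
```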

Combining pre-trained Vision Transformers and CIDER for Out Of Domain Detection

  • paper_url: http://arxiv.org/abs/2309.03047
  • repo_url: None
  • paper_authors: Grégor Jouet, Clément Duhart, Francis Rousseaux, Julio Laborde, Cyril de Runz
  • for: Study the out-of-domain (OOD) detection performance of pre-trained models.
  • methods: Pre-trained Transformers and CNN models, combined with the CIDER refinement method.
  • results: Pre-trained Transformer models achieve strong OOD detection out of the box, and combining pre-trained ViTs and CNNs with CIDER further improves OOD detection performance.
    Abstract Out-of-domain (OOD) detection is a crucial component in industrial applications as it helps identify when a model encounters inputs that are outside the training distribution. Most industrial pipelines rely on pre-trained models for downstream tasks such as CNN or Vision Transformers. This paper investigates the performance of those models on the task of out-of-domain detection. Our experiments demonstrate that pre-trained transformers models achieve higher detection performance out of the box. Furthermore, we show that pre-trained ViT and CNNs can be combined with refinement methods such as CIDER to improve their OOD detection performance even more. Our results suggest that transformers are a promising approach for OOD detection and set a stronger baseline for this task in many contexts
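
A minimal sketch of a simple feature-space OOD score on top of a frozen pre-trained encoder: flag inputs whose embedding is far, in cosine distance, from every class prototype. CIDER additionally shapes the embedding space during training, which is omitted here; this is an illustrative scoring rule, not the paper's exact method.

```python
import torch
import torch.nn.functional as F

def ood_score(feature: torch.Tensor, class_prototypes: torch.Tensor) -> float:
    """feature: (D,) embedding of a test input; class_prototypes: (C, D)."""
    f = F.normalize(feature, dim=-1)
    protos = F.normalize(class_prototypes, dim=-1)
    # Higher score = further from every known class = more likely OOD.
    return 1.0 - (protos @ f).max().item()
```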

A Refutation of Shapley Values for Explainability

  • paper_url: http://arxiv.org/abs/2309.03041
  • repo_url: None
  • paper_authors: Xuanxiang Huang, Joao Marques-Silva
  • for: Refute the use of Shapley values as the theoretical underpinning of feature-attribution methods in rule-based explainability.
  • methods: Earlier work used a brute-force search over Boolean functions on small numbers of features to find instances exhibiting inadequacy-revealing issues; this paper proves such functions exist for any number of features.
  • results: For any number of features, there exist Boolean functions that exhibit one or more inadequacy-revealing issues, providing decisive arguments against Shapley values as a foundation for rule-based explainability.
    Abstract Recent work demonstrated the existence of Boolean functions for which Shapley values provide misleading information about the relative importance of features in rule-based explanations. Such misleading information was broadly categorized into a number of possible issues. Each of those issues relates with features being relevant or irrelevant for a prediction, and all are significant regarding the inadequacy of Shapley values for rule-based explainability. This earlier work devised a brute-force approach to identify Boolean functions, defined on small numbers of features, and also associated instances, which displayed such inadequacy-revealing issues, and so served as evidence to the inadequacy of Shapley values for rule-based explainability. However, an outstanding question is how frequently such inadequacy-revealing issues can occur for Boolean functions with arbitrary large numbers of features. It is plain that a brute-force approach would be unlikely to provide insights on how to tackle this question. This paper answers the above question by proving that, for any number of features, there exist Boolean functions that exhibit one or more inadequacy-revealing issues, thereby contributing decisive arguments against the use of Shapley values as the theoretical underpinning of feature-attribution methods in explainability.
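
For concreteness, here is a self-contained sketch of exact Shapley values for a small Boolean function, assuming a uniform distribution over the unfixed features; this is the kind of computation a brute-force search enumerates. The example function is arbitrary, not one from the paper.

```python
from itertools import combinations
from math import factorial

def shapley(f, instance):
    n = len(instance)
    def value(S):  # expectation of f with features in S fixed, others uniform
        others = [i for i in range(n) if i not in S]
        total, count = 0, 0
        for bits in range(2 ** len(others)):
            x = list(instance)
            for j, i in enumerate(others):
                x[i] = (bits >> j) & 1
            total, count = total + f(x), count + 1
        return total / count
    phi = []
    for i in range(n):
        rest = [j for j in range(n) if j != i]
        s = 0.0
        for k in range(n):
            for S in combinations(rest, k):
                w = factorial(k) * factorial(n - k - 1) / factorial(n)
                s += w * (value(set(S) | {i}) - value(set(S)))
        phi.append(s)
    return phi

print(shapley(lambda x: x[0] and (x[1] or x[2]), [1, 1, 0]))
```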

An Efficient Temporary Deepfake Location Approach Based Embeddings for Partially Spoofed Audio Detection

  • paper_url: http://arxiv.org/abs/2309.03036
  • repo_url: https://github.com/xieyuankun/tdl-add
  • paper_authors: Yuankun Xie, Haonan Cheng, Yutian Wang, Long Ye
  • for: Propose a fine-grained partially spoofed audio detection method, Temporal Deepfake Location (TDL), that accurately locates fake audio at the frame level.
  • methods: Two components: an embedding similarity module that shapes an embedding space separating real from fake frames, and a temporal convolution operation that computes frame-specific similarities among neighboring frames and dynamically selects informative neighbors for convolution.
  • results: Experiments show the method outperforms baseline models on the ASVspoof2019 Partial Spoof dataset and performs well even in cross-dataset scenarios. The code has been released online.
    Abstract Partially spoofed audio detection is a challenging task, lying in the need to accurately locate the authenticity of audio at the frame level. To address this issue, we propose a fine-grained partially spoofed audio detection method, namely Temporal Deepfake Location (TDL), which can effectively capture information of both features and locations. Specifically, our approach involves two novel parts: an embedding similarity module and a temporal convolution operation. To enhance the identification between the real and fake features, the embedding similarity module is designed to generate an embedding space that can separate the real frames from fake frames. To effectively concentrate on the position information, the temporal convolution operation is proposed to calculate the frame-specific similarities among neighboring frames, and dynamically select informative neighbors for convolution. Extensive experiments show that our method outperforms baseline models on the ASVspoof2019 Partial Spoof dataset and demonstrates superior performance even in the cross-dataset scenario. The code is released online.

Synthetic Text Generation using Hypergraph Representations

  • paper_url: http://arxiv.org/abs/2309.06550
  • repo_url: None
  • paper_authors: Natraj Raman, Sameena Shah
  • for: Generate synthetic variants of a document, a task often posed as text-to-text transformation.
  • methods: An LLM-based method first decomposes a document into semantic frames and then generates text from this interim sparse format; the frames are modeled with a hypergraph, allowing frame contents to be perturbed in a principled way, with new hyperedges mined through topological analysis and complex polyadic relationships (including hierarchy and temporal dynamics) accommodated.
  • results: The solution generates documents that are diverse and coherent and that vary in style, sentiment, format, composition, and facts.
    Abstract Generating synthetic variants of a document is often posed as text-to-text transformation. We propose an alternate LLM based method that first decomposes a document into semantic frames and then generates text using this interim sparse format. The frames are modeled using a hypergraph, which allows perturbing the frame contents in a principled manner. Specifically, new hyperedges are mined through topological analysis and complex polyadic relationships including hierarchy and temporal dynamics are accommodated. We show that our solution generates documents that are diverse, coherent and vary in style, sentiment, format, composition and facts.

Universal Preprocessing Operators for Embedding Knowledge Graphs with Literals

  • paper_url: http://arxiv.org/abs/2309.03023
  • repo_url: https://gitlab.com/patryk.preisner/mkga
  • paper_authors: Patryk Preisner, Heiko Paulheim
  • for: Dense numerical representations (embeddings) of entities in a knowledge graph (KG), extended to KGs with literal values.
  • methods: A set of universal preprocessing operators transforms KGs with numeric, temporal, textual, and image literals so that the transformed KGs can be embedded with any method.
  • results: Experiments on the kgbench dataset with three different embedding methods show promising results.
    Abstract Knowledge graph embeddings are dense numerical representations of entities in a knowledge graph (KG). While the majority of approaches concentrate only on relational information, i.e., relations between entities, fewer approaches exist which also take information about literal values (e.g., textual descriptions or numerical information) into account. Those which exist are typically tailored towards a particular modality of literal and a particular embedding method. In this paper, we propose a set of universal preprocessing operators which can be used to transform KGs with literals for numerical, temporal, textual, and image information, so that the transformed KGs can be embedded with any method. The results on the kgbench dataset with three different embedding methods show promising results.
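
As a concrete illustration of what such a preprocessing operator might look like, the sketch below bins numeric literals into interval entities and adds ordering relations between bins, so a purely relational embedding method can consume the result. The relation names and binning rule are assumptions, not the operators defined in the paper.

```python
def bin_numeric_literals(triples, n_bins=5):
    """Replace (entity, relation, numeric_literal) triples with
    (entity, relation, bin_entity) plus ordering triples between the bins,
    so that a purely relational embedding method can use the values."""
    numeric = [(h, r, v) for h, r, v in triples if isinstance(v, (int, float))]
    if not numeric:
        return triples
    values = sorted(v for _, _, v in numeric)
    step = max(1, len(values) // n_bins)
    bounds = values[step::step][: n_bins - 1]          # quantile-style cut points

    def bin_of(v):
        return sum(v >= b for b in bounds)

    out = [t for t in triples if not isinstance(t[2], (int, float))]
    out += [(h, r, f"BIN_{bin_of(v)}") for h, r, v in numeric]
    out += [(f"BIN_{i}", "lessThan", f"BIN_{i+1}") for i in range(n_bins - 1)]
    return out

triples = [("city:Berlin", "population", 3_600_000),
           ("city:Mannheim", "population", 310_000),
           ("city:Berlin", "locatedIn", "country:Germany")]
print(bin_numeric_literals(triples, n_bins=2))
```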

EdgeFL: A Lightweight Decentralized Federated Learning Framework

  • paper_url: http://arxiv.org/abs/2309.02936
  • repo_url: None
  • paper_authors: Hongyi Zhang, Jan Bosch, Helena Holmström Olsson
  • for: This work presents a lightweight decentralized federated learning framework that addresses the centralized-aggregation and scalability limitations of existing FL platforms.
  • methods: An edge-only approach in which model training and aggregation happen entirely on edge nodes, eliminating the central server; aggregation functions are customizable, and integration takes just four lines of code.
  • results: EdgeFL reduces weight-update latency, enables faster model evolution, and improves classification accuracy compared to traditional centralized FL approaches.
    Abstract Federated Learning (FL) has emerged as a promising approach for collaborative machine learning, addressing data privacy concerns. However, existing FL platforms and frameworks often present challenges for software engineers in terms of complexity, limited customization options, and scalability limitations. In this paper, we introduce EdgeFL, an edge-only lightweight decentralized FL framework, designed to overcome the limitations of centralized aggregation and scalability in FL deployments. By adopting an edge-only model training and aggregation approach, EdgeFL eliminates the need for a central server, enabling seamless scalability across diverse use cases. With a straightforward integration process requiring just four lines of code (LOC), software engineers can easily incorporate FL functionalities into their AI products. Furthermore, EdgeFL offers the flexibility to customize aggregation functions, empowering engineers to adapt them to specific needs. Based on the results, we demonstrate that EdgeFL achieves superior performance compared to existing FL platforms/frameworks. Our results show that EdgeFL reduces weights update latency and enables faster model evolution, enhancing the efficiency of edge devices. Moreover, EdgeFL exhibits improved classification accuracy compared to traditional centralized FL approaches. By leveraging EdgeFL, software engineers can harness the benefits of federated learning while overcoming the challenges associated with existing FL platforms/frameworks.
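
A minimal sketch of the edge-only idea: each node trains locally and aggregates by pulling peers' latest weights, with a pluggable aggregation function. This is an illustrative stand-in under assumed names, not the EdgeFL API.

```python
import numpy as np

def fedavg(peer_weights):
    """Element-wise mean over peers; each peer's model is a list of arrays."""
    return [np.mean(np.stack(layers), axis=0) for layers in zip(*peer_weights)]

class EdgeNode:
    def __init__(self, model_weights, aggregate=fedavg):
        self.weights = model_weights
        self.aggregate = aggregate            # customizable aggregation function

    def local_update(self, grads, lr=0.1):
        self.weights = [w - lr * g for w, g in zip(self.weights, grads)]

    def sync(self, peers):
        """Edge-only aggregation: pull peers' latest weights, no central server."""
        self.weights = self.aggregate([self.weights] + [p.weights for p in peers])

nodes = [EdgeNode([np.random.randn(4, 4), np.random.randn(4)]) for _ in range(3)]
for n in nodes:
    n.local_update([np.random.randn(4, 4), np.random.randn(4)])
nodes[0].sync(nodes[1:])
print([w.shape for w in nodes[0].weights])
```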

Estimating irregular water demands with physics-informed machine learning to inform leakage detection

  • paper_url: http://arxiv.org/abs/2309.02935
  • repo_url: https://github.com/swn-group-at-tu-berlin/lila-pinn
  • paper_authors: Ivo Daniel, Andrea Cominola
  • for: This paper aims to develop a physics-informed machine learning algorithm for timely identifying and accurately localizing leakages in drinking water distribution networks.
  • methods: The proposed algorithm uses a fully connected neural network to analyze pressure data and estimate unknown irregular water demands, leveraging the Bernoulli equation to linearize the leakage detection problem.
  • results: The algorithm was tested on data from the L-Town benchmark network and showed good performance in estimating most irregular demands, with R2 values larger than 0.8. The results also demonstrated that the algorithm can improve the identification of leakages under the presence of irregular demands by a factor of 5.3 for abrupt leaks and a factor of 3.0 for incipient leaks compared to disregarding irregular demands.
    Abstract Leakages in drinking water distribution networks pose significant challenges to water utilities, leading to infrastructure failure, operational disruptions, environmental hazards, property damage, and economic losses. The timely identification and accurate localisation of such leakages is paramount for utilities to mitigate these unwanted effects. However, implementation of algorithms for leakage detection is limited in practice by requirements of either hydraulic models or large amounts of training data. Physics-informed machine learning can utilise hydraulic information, thereby circumventing both limitations. In this work, we present a physics-informed machine learning algorithm that analyses pressure data and therefrom estimates unknown irregular water demands via a fully connected neural network, ultimately leveraging the Bernoulli equation and effectively linearising the leakage detection problem. Our algorithm is tested on data from the L-Town benchmark network, and results indicate a good capability for estimating most irregular demands, with R2 larger than 0.8. Identification results for leakages under the presence of irregular demands improved by a factor of 5.3 for abrupt leaks and a factor of 3.0 for incipient leaks when compared to results disregarding irregular demands.
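
As an illustration of the demand-estimation component, the sketch below trains a small fully connected network to map pressure snapshots to irregular-demand estimates. The data here are synthetic stand-ins; the Bernoulli-based linearization and the real L-Town network topology are not reproduced.

```python
import torch
import torch.nn as nn

class DemandEstimator(nn.Module):
    """Maps pressure measurements at sensor nodes to estimates of irregular
    demands; in the paper the physics enters through the Bernoulli-based
    linearization, which the synthetic targets below merely stand in for."""
    def __init__(self, n_sensors, n_demands, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_sensors, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_demands),
        )

    def forward(self, pressures):
        return self.net(pressures)

model = DemandEstimator(n_sensors=33, n_demands=3)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
pressures = torch.randn(256, 33)           # synthetic pressure snapshots
demands = torch.randn(256, 3)              # synthetic irregular demands
for _ in range(100):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(pressures), demands)
    loss.backward()
    opt.step()
```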

On the Challenges of Building Datasets for Hate Speech Detection

  • paper_url: http://arxiv.org/abs/2309.02912
  • repo_url: None
  • paper_authors: Vitthal Bhandari
  • for: This work aims to provide a data-creation pipeline framework so that future hate speech datasets are built following best practices.
  • methods: The paper analyzes the issues surrounding hate speech detection through a data-centric lens and outlines a holistic framework covering seven broad dimensions of the data creation pipeline.
  • results: Practitioners can follow this framework as a form of best practice when creating hate speech datasets, improving their reliability and consistency.
    Abstract Detection of hate speech has been formulated as a standalone application of NLP and different approaches have been adopted for identifying the target groups, obtaining raw data, defining the labeling process, choosing the detection algorithm, and evaluating the performance in the desired setting. However, unlike other downstream tasks, hate speech suffers from the lack of large-sized, carefully curated, generalizable datasets owing to the highly subjective nature of the task. In this paper, we first analyze the issues surrounding hate speech detection through a data-centric lens. We then outline a holistic framework to encapsulate the data creation pipeline across seven broad dimensions by taking the specific example of hate speech towards sexual minorities. We posit that practitioners would benefit from following this framework as a form of best practice when creating hate speech datasets in the future.

DECODE: Data-driven Energy Consumption Prediction leveraging Historical Data and Environmental Factors in Buildings

  • paper_url: http://arxiv.org/abs/2309.02908
  • repo_url: None
  • paper_authors: Aditya Mishra, Haroon R. Lone, Aayush Mishra
  • for: Predicting building energy consumption to enable effective energy management and distribution within the grid.
  • methods: A Long Short-Term Memory (LSTM) model forecasts building energy consumption from historical energy data, occupancy patterns, and weather conditions.
  • results: Compared to established prediction methods, the LSTM model delivers higher accuracy, with an R2 score of 0.97 and a mean absolute error (MAE) of 0.007; it also forecasts efficiently when trained on a limited dataset and generalizes well.
    Abstract Energy prediction in buildings plays a crucial role in effective energy management. Precise predictions are essential for achieving optimal energy consumption and distribution within the grid. This paper introduces a Long Short-Term Memory (LSTM) model designed to forecast building energy consumption using historical energy data, occupancy patterns, and weather conditions. The LSTM model provides accurate short, medium, and long-term energy predictions for residential and commercial buildings compared to existing prediction models. We compare our LSTM model with established prediction methods, including linear regression, decision trees, and random forest. Encouragingly, the proposed LSTM model emerges as the superior performer across all metrics. It demonstrates exceptional prediction accuracy, boasting the highest R2 score of 0.97 and the most favorable mean absolute error (MAE) of 0.007. An additional advantage of our developed model is its capacity to achieve efficient energy consumption forecasts even when trained on a limited dataset. We address concerns about overfitting (variance) and underfitting (bias) through rigorous training and evaluation on real-world data. In summary, our research contributes to energy prediction by offering a robust LSTM model that outperforms alternative methods and operates with remarkable efficiency, generalizability, and reliability.
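
A minimal sketch of such a forecaster: an LSTM over per-step features (energy, occupancy, temperature) whose last hidden state predicts the next interval's consumption. The feature layout, sizes, and training step are assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class EnergyLSTM(nn.Module):
    """Sequence-to-one forecaster: each time step carries
    [energy, occupancy, temperature]; the final hidden state predicts
    the next interval's consumption."""
    def __init__(self, n_features=3, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                  # x: (batch, seq_len, n_features)
        _, (h, _) = self.lstm(x)
        return self.head(h[-1])            # (batch, 1)

model = EnergyLSTM()
x = torch.randn(64, 24, 3)                 # 24 hourly steps of 3 features
y = torch.randn(64, 1)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss = nn.functional.mse_loss(model(x), y)
loss.backward(); opt.step()
```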

A deep Natural Language Inference predictor without language-specific training data

  • paper_url: http://arxiv.org/abs/2309.02887
  • repo_url: https://github.com/Aryia-Behroziuan/neurons
  • paper_authors: Lorenzo Corradi, Alessandro Manenti, Francesca Del Bonifro, Francesco Setti, Dario Del Sorbo
  • for: Solving natural language inference (NLI) between sentence pairs in a target language of choice without a language-specific training dataset.
  • methods: A generic, manually translated dataset is used with two instances of the same pre-trained model: one generates sentence embeddings for the source language, and the other is fine-tuned on the target language to mimic the first. This technique is known as knowledge distillation.
  • results: The architecture is evaluated on the machine-translated Stanford NLI and Multi-Genre NLI test sets and the manually translated RTE3-ITA test set, and further validated on sentiment analysis, aspect-based sentiment analysis, and topic recognition. Knowledge distillation outperforms machine-translation-based approaches even though it was not directly trained on the test data.
    Abstract In this paper we present a technique of NLP to tackle the problem of inference relation (NLI) between pairs of sentences in a target language of choice without a language-specific training dataset. We exploit a generic translation dataset, manually translated, along with two instances of the same pre-trained model - the first to generate sentence embeddings for the source language, and the second fine-tuned over the target language to mimic the first. This technique is known as Knowledge Distillation. The model has been evaluated over machine translated Stanford NLI test dataset, machine translated Multi-Genre NLI test dataset, and manually translated RTE3-ITA test dataset. We also test the proposed architecture over different tasks to empirically demonstrate the generality of the NLI task. The model has been evaluated over the native Italian ABSITA dataset, on the tasks of Sentiment Analysis, Aspect-Based Sentiment Analysis, and Topic Recognition. We emphasise the generality and exploitability of the Knowledge Distillation technique that outperforms other methodologies based on machine translation, even though the former was not directly trained on the data it was tested over.
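
The distillation step can be sketched compactly: on a parallel corpus, a frozen teacher embeds the source sentence and the student is trained to produce the same embedding for its translation. The encoders below are toy stand-ins (mean-pooled embedding bags), not the pre-trained models used in the paper.

```python
import torch
import torch.nn as nn

emb_dim, vocab = 256, 30_000
teacher = nn.EmbeddingBag(vocab, emb_dim)     # frozen source-language encoder (stand-in)
student = nn.EmbeddingBag(vocab, emb_dim)     # target-language encoder to be trained
teacher.requires_grad_(False)

opt = torch.optim.Adam(student.parameters(), lr=1e-4)
src_ids = torch.randint(0, vocab, (32, 20))   # source sentences (token ids)
tgt_ids = torch.randint(0, vocab, (32, 20))   # their translations

with torch.no_grad():
    target_emb = teacher(src_ids)             # teacher embedding of the source
loss = nn.functional.mse_loss(student(tgt_ids), target_emb)
loss.backward(); opt.step()
```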

MAD: Modality Agnostic Distance Measure for Image Registration

  • paper_url: http://arxiv.org/abs/2309.02875
  • repo_url: None
  • paper_authors: Vasiliki Sideri-Lampretsa, Veronika A. Zimmer, Huaqi Qiu, Georgios Kaissis, Daniel Rueckert
  • for: The paper proposes a distance measure for multi-modal image registration, a crucial pre-processing step in many medical applications.
  • methods: Random convolutions are used to learn the inherent geometry of images while remaining robust to the large appearance changes between imaging modalities.
  • results: Experiments show the measure registers multi-modal images successfully and has a larger capture range than traditional measures such as Mutual Information and Normalised Gradient Fields.
    Abstract Multi-modal image registration is a crucial pre-processing step in many medical applications. However, it is a challenging task due to the complex intensity relationships between different imaging modalities, which can result in large discrepancy in image appearance. The success of multi-modal image registration, whether it is conventional or learning based, is predicated upon the choice of an appropriate distance (or similarity) measure. Particularly, deep learning registration algorithms lack in accuracy or even fail completely when attempting to register data from an "unseen" modality. In this work, we present Modality Agnostic Distance (MAD), a deep image distance measure that utilises random convolutions to learn the inherent geometry of the images while being robust to large appearance changes. Random convolutions are geometry-preserving modules which we use to simulate an infinite number of synthetic modalities alleviating the need for aligned paired data during training. We can therefore train MAD on a mono-modal dataset and successfully apply it to a multi-modal dataset. We demonstrate that not only can MAD affinely register multi-modal images successfully, but it has also a larger capture range than traditional measures such as Mutual Information and Normalised Gradient Fields.
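
A minimal sketch of the random-convolution idea: the same randomly initialized, geometry-preserving convolution is applied to both images and the outputs are compared, averaging over several draws to simulate many synthetic modalities. Kernel size, channel count, and the squared-error comparison are illustrative assumptions.

```python
import torch
import torch.nn as nn

def mad_distance(img_a, img_b, n_draws=8, kernel=3):
    """Apply the same randomly initialized convolution to both images and
    compare the results; averaging over draws stands in for the infinitely
    many synthetic modalities while keeping geometry intact."""
    total = 0.0
    for _ in range(n_draws):
        conv = nn.Conv2d(1, 4, kernel, padding=kernel // 2, bias=False)
        with torch.no_grad():
            fa, fb = conv(img_a), conv(img_b)
        total += (fa - fb).pow(2).mean()
    return total / n_draws

a = torch.randn(1, 1, 64, 64)    # e.g. a CT slice
b = torch.randn(1, 1, 64, 64)    # e.g. an MR slice, already resampled
print(mad_distance(a, b))
```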

Rethinking Momentum Knowledge Distillation in Online Continual Learning

  • paper_url: http://arxiv.org/abs/2309.02870
  • repo_url: None
  • paper_authors: Nicolas Michel, Maorong Wang, Ling Xiao, Toshihiko Yamasaki
  • For: addresses the problem of training neural networks on a continuous data stream where multiple classification tasks emerge in sequence.
  • Methods: uses Momentum Knowledge Distillation (MKD) to enhance existing OCL methods.
  • Results: improves existing state-of-the-art accuracy by more than 10 percentage points on ImageNet100 and sheds light on MKD's internal mechanics and impacts during training in OCL.
    Abstract Online Continual Learning (OCL) addresses the problem of training neural networks on a continuous data stream where multiple classification tasks emerge in sequence. In contrast to offline Continual Learning, data can be seen only once in OCL. In this context, replay-based strategies have achieved impressive results and most state-of-the-art approaches are heavily depending on them. While Knowledge Distillation (KD) has been extensively used in offline Continual Learning, it remains under-exploited in OCL, despite its potential. In this paper, we theoretically analyze the challenges in applying KD to OCL. We introduce a direct yet effective methodology for applying Momentum Knowledge Distillation (MKD) to many flagship OCL methods and demonstrate its capabilities to enhance existing approaches. In addition to improving existing state-of-the-arts accuracy by more than $10\%$ points on ImageNet100, we shed light on MKD internal mechanics and impacts during training in OCL. We argue that similar to replay, MKD should be considered a central component of OCL.
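
A minimal sketch of momentum knowledge distillation as commonly formulated: an exponential-moving-average (momentum) copy of the student serves as teacher, and a temperature-scaled KL term is added to the task loss. The model, temperature, and momentum values are assumptions, not the paper's exact recipe.

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

student = nn.Sequential(nn.Flatten(), nn.Linear(784, 10))
teacher = copy.deepcopy(student)           # momentum teacher, never trained directly
for p in teacher.parameters():
    p.requires_grad_(False)

opt = torch.optim.SGD(student.parameters(), lr=0.01)
tau, m = 2.0, 0.999                        # KD temperature and EMA momentum

x = torch.randn(32, 1, 28, 28)             # one batch from the stream
y = torch.randint(0, 10, (32,))

logits = student(x)
with torch.no_grad():
    t_logits = teacher(x)
kd = F.kl_div(F.log_softmax(logits / tau, -1),
              F.softmax(t_logits / tau, -1), reduction="batchmean") * tau ** 2
loss = F.cross_entropy(logits, y) + kd
loss.backward(); opt.step()

with torch.no_grad():                      # EMA update of the teacher
    for tp, sp in zip(teacher.parameters(), student.parameters()):
        tp.mul_(m).add_(sp, alpha=1 - m)
```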

A recommender for the management of chronic pain in patients undergoing spinal cord stimulation

  • paper_url: http://arxiv.org/abs/2309.03918
  • repo_url: None
  • paper_authors: Tigran Tchrakian, Mykhaylo Zayats, Alessandra Pascale, Dat Huynh, Pritish Parida, Carla Agurto Rios, Sergiy Zhuk, Jeffrey L. Rogers, ENVISION Studies Physician Author Group, Boston Scientific Research Scientists Consortium
  • for: The paper targets the management of chronic pain in patients undergoing spinal cord stimulation (SCS).
  • methods: A contextual multi-armed bandit (CMAB) approach is used to build a recommender that suggests SCS settings to patients.
  • results: SCS recommendations produced statistically significant improvement in clinical outcomes (pain and/or quality of life) in 85% of all subjects (N=21). Among subjects in moderate Patient States (N=7) prior to receiving recommendations, 100% showed statistically significant improvements and 5/7 had improved PS dwell time.
    Abstract Spinal cord stimulation (SCS) is a therapeutic approach used for the management of chronic pain. It involves the delivery of electrical impulses to the spinal cord via an implanted device, which when given suitable stimulus parameters can mask or block pain signals. Selection of optimal stimulation parameters usually happens in the clinic under the care of a provider whereas at-home SCS optimization is managed by the patient. In this paper, we propose a recommender system for the management of pain in chronic pain patients undergoing SCS. In particular, we use a contextual multi-armed bandit (CMAB) approach to develop a system that recommends SCS settings to patients with the aim of improving their condition. These recommendations, sent directly to patients through a digital health ecosystem, combined with a patient monitoring system, close the therapeutic loop around a chronic pain patient over their entire patient journey. We evaluated the system in a cohort of SCS-implanted ENVISION study subjects (Clinicaltrials.gov ID: NCT03240588) using a combination of quality of life metrics and Patient States (PS), a novel measure of holistic outcomes. SCS recommendations provided statistically significant improvement in clinical outcomes (pain and/or QoL) in 85% of all subjects (N=21). Among subjects in moderate PS (N=7) prior to receiving recommendations, 100% showed statistically significant improvements and 5/7 had improved PS dwell time. This analysis suggests SCS patients may benefit from SCS recommendations, resulting in additional clinical improvement on top of benefits already received from SCS therapy.
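
As an illustration of the CMAB component, here is a standard LinUCB learner choosing among a few candidate stimulation programs from patient-state features. This is a generic sketch, not the study's actual algorithm; the feature dimension, number of arms, and reward are invented.

```python
import numpy as np

class LinUCB:
    """One ridge-regression model per arm; pick the arm with the highest
    upper confidence bound on predicted reward (e.g. pain relief)."""
    def __init__(self, n_arms, dim, alpha=1.0):
        self.alpha = alpha
        self.A = [np.eye(dim) for _ in range(n_arms)]      # X^T X + I per arm
        self.b = [np.zeros(dim) for _ in range(n_arms)]    # X^T y per arm

    def choose(self, x):
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b
            scores.append(theta @ x + self.alpha * np.sqrt(x @ A_inv @ x))
        return int(np.argmax(scores))

    def update(self, arm, x, reward):
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x

bandit = LinUCB(n_arms=4, dim=6)           # 4 candidate SCS programs
x = np.random.randn(6)                     # patient-state features
arm = bandit.choose(x)
bandit.update(arm, x, reward=0.7)          # observed improvement
```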

Generalised Mutual Information: a Framework for Discriminative Clustering

  • paper_url: http://arxiv.org/abs/2309.02858
  • repo_url: None
  • paper_authors: Louis Ohl, Pierre-Alexandre Mattei, Charles Bouveyron, Warith Harchaoui, Mickaël Leclercq, Arnaud Droit, Frédéric Precioso
  • for: The paper examines recent deep clustering objectives, in particular the use of Mutual Information (MI) as an unsupervised training objective.
  • methods: It first shows that maximizing MI does not necessarily yield satisfying clusters and identifies the Kullback-Leibler divergence as the main cause of this behaviour. It then generalizes MI by changing its core distance, introducing the Generalised Mutual Information (GEMINI): a set of geometry-aware metrics built on distances or kernels in the data space.
  • results: GEMINIs can automatically select a relevant number of clusters, a property little studied in deep discriminative clustering, where the number of clusters is unknown a priori.
    Abstract In the last decade, successes in deep clustering have majorly involved the Mutual Information (MI) as an unsupervised objective for training neural networks with increasing regularisations. While the quality of the regularisations has been largely discussed for improvements, little attention has been dedicated to the relevance of MI as a clustering objective. In this paper, we first highlight how the maximisation of MI does not lead to satisfying clusters. We identified the Kullback-Leibler divergence as the main reason for this behaviour. Hence, we generalise the mutual information by changing its core distance, introducing the Generalised Mutual Information (GEMINI): a set of metrics for unsupervised neural network training. Unlike MI, some GEMINIs do not require regularisations when training as they are geometry-aware thanks to distances or kernels in the data space. Finally, we highlight that GEMINIs can automatically select a relevant number of clusters, a property that has been little studied in the deep discriminative clustering context where the number of clusters is a priori unknown.
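
A plausible formalisation of the generalisation described above (the exact notation may differ from the paper): mutual information is an expected Kullback-Leibler divergence between the conditional and marginal data distributions, and GEMINI swaps the KL term for a geometry-aware discrepancy.

```latex
% MI as an expected KL divergence between conditional and marginal:
\mathcal{I}(X;Y) = \mathbb{E}_{y \sim p(y)}\left[ D_{\mathrm{KL}}\big(p(x \mid y)\,\|\,p(x)\big) \right]
% GEMINI replaces the KL term with a geometry-aware discrepancy D,
% e.g. an MMD or Wasserstein distance defined via a kernel or metric
% in the data space:
\mathcal{I}_{D}(X;Y) = \mathbb{E}_{y \sim p(y)}\left[ D\big(p(x \mid y),\, p(x)\big) \right]
```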

Getting too personal(ized): The importance of feature choice in online adaptive algorithms

  • paper_url: http://arxiv.org/abs/2309.02856
  • repo_url: None
  • paper_authors: ZhaoBin Li, Luna Yee, Nathaniel Sauerberg, Irene Sakson, Joseph Jay Williams, Anna N. Rafferty
  • for: The paper studies whether attempting to discover how to personalize educational technology has a cost, for instance if adapting to personal information delays the adoption of policies that benefit all students.
  • methods: Multi-armed bandit (MAB) algorithms are used to learn which version of an educational technology to present to each student, varying the relation between student characteristics and outcomes and whether the algorithm sees those characteristics.
  • results: Simulations show that including student characteristics for personalization is beneficial when those characteristics are needed to learn the optimal action, but degrades bandit performance in other scenarios; including unneeded characteristics can systematically disadvantage students with less common values for them.
    Abstract Digital educational technologies offer the potential to customize students' experiences and learn what works for which students, enhancing the technology as more students interact with it. We consider whether and when attempting to discover how to personalize has a cost, such as if the adaptation to personal information can delay the adoption of policies that benefit all students. We explore these issues in the context of using multi-armed bandit (MAB) algorithms to learn a policy for what version of an educational technology to present to each student, varying the relation between student characteristics and outcomes and also whether the algorithm is aware of these characteristics. Through simulations, we demonstrate that the inclusion of student characteristics for personalization can be beneficial when those characteristics are needed to learn the optimal action. In other scenarios, this inclusion decreases performance of the bandit algorithm. Moreover, including unneeded student characteristics can systematically disadvantage students with less common values for these characteristics. Our simulations do however suggest that real-time personalization will be helpful in particular real-world scenarios, and we illustrate this through case studies using existing experimental results in ASSISTments. Overall, our simulations show that adaptive personalization in educational technologies can be a double-edged sword: real-time adaptation improves student experiences in some contexts, but the slower adaptation and potentially discriminatory results mean that a more personalized model is not always beneficial.

Promoting Open-domain Dialogue Generation through Learning Pattern Information between Contexts and Responses

  • paper_url: http://arxiv.org/abs/2309.02823
  • repo_url: https://github.com/russellliu0/rad
  • paper_authors: Mengjuan Liu, Chenyang Liu, Yunfan Yang, Jiang Liu, Mohan Jing
  • for: Improving the quality of responses generated by open-domain dialogue models, making them more vivid and informative.
  • methods: An open-domain dialogue model is built on a pre-trained language model (GPT-2). An improved scheduled sampling method lets responses guide generation during the training phase while avoiding the exposure bias problem, and a response-aware mechanism mines implicit pattern information between contexts and responses.
  • results: The proposed model (RAD) is evaluated on the Persona-Chat and DailyDialog datasets; experiments show it outperforms the baselines on most automatic and manual metrics.
    Abstract Recently, utilizing deep neural networks to build open-domain dialogue models has become a hot topic. However, the responses generated by these models suffer from many problems: they are often not contextualized and tend to be generic, lacking information content, which seriously damages the user experience. Therefore, many studies try introducing more information into the dialogue models to make the generated responses more vivid and informative. Unlike them, this paper improves the quality of generated responses by learning the implicit pattern information between contexts and responses in the training samples. In this paper, we first build an open-domain dialogue model based on the pre-trained language model (i.e., GPT-2). Then, an improved scheduled sampling method is proposed for pre-trained models, by which the responses can be used to guide the response generation in the training phase while avoiding the exposure bias problem. More importantly, we design a response-aware mechanism for mining the implicit pattern information between contexts and responses so that the generated replies are more diverse and approximate to human replies. Finally, we evaluate the proposed model (RAD) on the Persona-Chat and DailyDialog datasets; and the experimental results show that our model outperforms the baselines on most automatic and manual metrics.
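
Scheduled sampling itself is easy to sketch: with a probability that decays over training, the decoder is fed the gold token, otherwise its own previous prediction, narrowing the gap between training and inference. The decay schedule and names below are illustrative; the paper's response-guided variant for pre-trained models is more involved.

```python
import math
import random

def scheduled_sampling_inputs(gold_tokens, model_predictions, epoch, k=5.0):
    """With probability p (inverse-sigmoid decay over epochs) feed the gold
    token; otherwise feed the model's own previous prediction, reducing
    exposure bias."""
    p = k / (k + math.exp(epoch / k))
    return [g if random.random() < p else m
            for g, m in zip(gold_tokens, model_predictions)]

gold = ["the", "movie", "was", "great"]
pred = ["the", "film", "was", "good"]
print(scheduled_sampling_inputs(gold, pred, epoch=3))
```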

Roulette: A Semantic Privacy-Preserving Device-Edge Collaborative Inference Framework for Deep Learning Classification Tasks

  • paper_url: http://arxiv.org/abs/2309.02820
  • repo_url: None
  • paper_authors: Jingyi Li, Guocheng Liao, Lin Chen, Xu Chen
  • for: The paper proposes Roulette, a task-oriented semantic privacy-preserving collaborative inference framework for deep learning classifiers, addressing accuracy degradation under non-i.i.d. data and privacy disclosure.
  • methods: Building on split learning, the framework treats the ground truth of the data as private information: the back-end DNN is frozen while the front-end DNN is retrained to be both a feature extractor and an encryptor, with a differential privacy guarantee.
  • results: Extensive evaluations on realistic datasets show that Roulette defends against various attacks while maintaining model accuracy; when the non-i.i.d. setting is severe, it improves inference accuracy by 21% averaged over benchmarks, while driving discrimination attacks down to near random guessing.
    Abstract Deep learning classifiers are crucial in the age of artificial intelligence. The device-edge-based collaborative inference has been widely adopted as an efficient framework for promoting its applications in IoT and 5G/6G networks. However, it suffers from accuracy degradation under non-i.i.d. data distribution and privacy disclosure. For accuracy degradation, direct use of transfer learning and split learning is high cost and privacy issues remain. For privacy disclosure, cryptography-based approaches lead to a huge overhead. Other lightweight methods assume that the ground truth is non-sensitive and can be exposed. But for many applications, the ground truth is the user's crucial privacy-sensitive information. In this paper, we propose a framework of Roulette, which is a task-oriented semantic privacy-preserving collaborative inference framework for deep learning classifiers. More than input data, we treat the ground truth of the data as private information. We develop a novel paradigm of split learning where the back-end DNN is frozen and the front-end DNN is retrained to be both a feature extractor and an encryptor. Moreover, we provide a differential privacy guarantee and analyze the hardness of ground truth inference attacks. To validate the proposed Roulette, we conduct extensive performance evaluations using realistic datasets, which demonstrate that Roulette can effectively defend against various attacks and meanwhile achieve good model accuracy. In a situation where the non-i.i.d. is very severe, Roulette improves the inference accuracy by 21\% averaged over benchmarks, while making the accuracy of discrimination attacks almost equivalent to random guessing.

Combining Thermodynamics-based Model of the Centrifugal Compressors and Active Machine Learning for Enhanced Industrial Design Optimization

  • paper_url: http://arxiv.org/abs/2309.02818
  • repo_url: None
  • paper_authors: Shadi Ghiasi, Guido Pazzi, Concettina Del Grosso, Giovanni De Magistris, Giacomo Veneri
  • for: The paper aims to reduce the computational cost of the optimization process in centrifugal compressor design.
  • methods: The approach combines an in-house thermodynamics-based compressor model with a Gaussian Process-based surrogate model in a deployable active learning (AL) setting.
  • results: The framework improves surrogate modeling via an uncertainty-based query function and, in production, makes compressor design optimization around 46% faster than relying solely on the internal thermodynamics-based simulator, at the same performance.
    Abstract The design process of centrifugal compressors requires applying an optimization process which is computationally expensive due to complex analytical equations underlying the compressor's dynamical equations. Although regression surrogate models could drastically reduce the computational cost of such a process, the major challenge is the scarcity of data for training the surrogate model. Aiming to strategically exploit the labeled samples, we propose the Active-CompDesign framework in which we combine a thermodynamics-based compressor model (i.e., our internal software for compressor design) and a Gaussian Process-based surrogate model within a deployable Active Learning (AL) setting. We first conduct experiments in an offline setting and further extend it to an online AL framework, where real-time interaction with the thermodynamics-based compressor model allows deployment in production. Active-CompDesign shows a significant performance improvement in surrogate modeling by leveraging an uncertainty-based query function for samples within the AL framework, compared to random selection of data points. Moreover, our framework in production has reduced the total computational time of the compressor's design optimization, running around 46% faster than relying on the internal thermodynamics-based simulator while achieving the same performance.
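
A minimal sketch of the uncertainty-driven active-learning loop with a Gaussian Process surrogate; the expensive thermodynamics simulator is replaced by a toy function, and the kernel, candidate pool, and budget are illustrative assumptions.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def simulator(x):                    # stand-in for the thermodynamics-based model
    return np.sin(3 * x).ravel()

rng = np.random.default_rng(0)
X_pool = rng.uniform(0, 2, size=(200, 1))     # candidate design points
X = X_pool[:5].copy(); y = simulator(X)       # small initial design

gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.3))
for _ in range(20):                           # active-learning loop
    gp.fit(X, y)
    _, std = gp.predict(X_pool, return_std=True)
    q = int(np.argmax(std))                   # uncertainty-based query
    X = np.vstack([X, X_pool[q:q + 1]])
    y = np.append(y, simulator(X_pool[q:q + 1]))
print(f"surrogate trained on {len(X)} expensive evaluations")
```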

Near-continuous time Reinforcement Learning for continuous state-action spaces

  • paper_url: http://arxiv.org/abs/2309.02815
  • repo_url: None
  • paper_authors: Lorenzo Croissant, Marc Abeille, Bruno Bouchard
  • for: This paper focuses on the reinforcement learning problem of controlling an unknown dynamical system to maximize the long-term average reward along a single trajectory, with the goal of overcoming the limitations of previous literature, which primarily considers discrete time and state-action spaces.
  • methods: The paper proposes a modelling approach that uses a Poisson clock of frequency $\varepsilon^{-1}$ to capture arbitrary time scales, and considers a generic reward function and state dynamics modelled as a jump process with an arbitrary transition kernel on $\mathbb{R}^d$. The algorithm uses an eluder dimension framework for learning and an approximate planning method based on a diffusive limit approximation of the jump process.
  • results: The paper shows that the celebrated optimism protocol applies when the sub-tasks (learning and planning) can be performed effectively, and the algorithm enjoys a regret of order $\tilde{\mathcal{O}}(\varepsilon^{1/2} T+\sqrt{T})$. As the frequency of interactions blows up, the approximation error $\varepsilon^{1/2} T$ vanishes, showing that $\tilde{\mathcal{O}}(\sqrt{T})$ is attainable in near-continuous time.
    Abstract We consider the Reinforcement Learning problem of controlling an unknown dynamical system to maximise the long-term average reward along a single trajectory. Most of the literature considers system interactions that occur in discrete time and discrete state-action spaces. Although this standpoint is suitable for games, it is often inadequate for mechanical or digital systems in which interactions occur at a high frequency, if not in continuous time, and whose state spaces are large if not inherently continuous. Perhaps the only exception is the Linear Quadratic framework for which results exist both in discrete and continuous time. However, its ability to handle continuous states comes with the drawback of a rigid dynamic and reward structure. This work aims to overcome these shortcomings by modelling interaction times with a Poisson clock of frequency $\varepsilon^{-1}$, which captures arbitrary time scales: from discrete ($\varepsilon=1$) to continuous time ($\varepsilon\downarrow0$). In addition, we consider a generic reward function and model the state dynamics according to a jump process with an arbitrary transition kernel on $\mathbb{R}^d$. We show that the celebrated optimism protocol applies when the sub-tasks (learning and planning) can be performed effectively. We tackle learning within the eluder dimension framework and propose an approximate planning method based on a diffusive limit approximation of the jump process. Overall, our algorithm enjoys a regret of order $\tilde{\mathcal{O}}(\varepsilon^{1/2} T+\sqrt{T})$. As the frequency of interactions blows up, the approximation error $\varepsilon^{1/2} T$ vanishes, showing that $\tilde{\mathcal{O}}(\sqrt{T})$ is attainable in near-continuous time.
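
The Poisson clock of frequency $\varepsilon^{-1}$ amounts to exponential inter-interaction times with mean $\varepsilon$, which a few lines make concrete. The function below only simulates the interaction times, not the control algorithm.

```python
import numpy as np

def interaction_times(horizon, eps, seed=0):
    """Event times of a Poisson clock of frequency 1/eps on [0, horizon]:
    inter-arrival gaps are exponential with mean eps. eps = 1 gives roughly
    unit-spaced (discrete-like) interactions; eps -> 0 approaches
    continuous-time control."""
    rng = np.random.default_rng(seed)
    times, t = [], 0.0
    while t < horizon:
        t += rng.exponential(eps)
        times.append(t)
    return np.array(times[:-1])      # drop the event past the horizon

print(interaction_times(horizon=10.0, eps=0.1).size)   # on the order of 100
```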

Automated Bioinformatics Analysis via AutoBA

  • paper_url: http://arxiv.org/abs/2309.03242
  • repo_url: https://github.com/joshuachou2018/autoba
  • paper_authors: Juexiao Zhou, Bin Zhang, Xiuying Chen, Haoyang Li, Xiaopeng Xu, Siyuan Chen, Xin Gao
  • for: The paper addresses the growing demand for streamlined, adaptable tools to analyze fast-growing and evolving omics data.
  • methods: AutoBA is an autonomous agent based on a large language model, designed explicitly for conventional omics data analysis. It simplifies the analytical process, requires minimal user input, and delivers detailed step-by-step plans for a variety of bioinformatics tasks.
  • results: Expert bioinformaticians validated AutoBA's robustness and adaptability across diverse omics analysis cases, including whole genome sequencing (WGS), RNA-seq, single-cell RNA-seq, ChIP-seq, and spatial transcriptomics. AutoBA can also self-design analysis processes as input data vary and, unlike online bioinformatics services, deploys analyses locally, preserving data privacy.
    Abstract With the fast-growing and evolving omics data, the demand for streamlined and adaptable tools to handle the analysis continues to grow. In response to this need, we introduce Auto Bioinformatics Analysis (AutoBA), an autonomous AI agent based on a large language model designed explicitly for conventional omics data analysis. AutoBA simplifies the analytical process by requiring minimal user input while delivering detailed step-by-step plans for various bioinformatics tasks. Through rigorous validation by expert bioinformaticians, AutoBA's robustness and adaptability are affirmed across a diverse range of omics analysis cases, including whole genome sequencing (WGS), RNA sequencing (RNA-seq), single-cell RNA-seq, ChIP-seq, and spatial transcriptomics. AutoBA's unique capacity to self-design analysis processes based on input data variations further underscores its versatility. Compared with online bioinformatic services, AutoBA deploys the analysis locally, preserving data privacy. Moreover, different from the predefined pipeline, AutoBA has adaptability in sync with emerging bioinformatics tools. Overall, AutoBA represents a convenient tool, offering robustness and adaptability for complex omics data analysis.

Norm Tweaking: High-performance Low-bit Quantization of Large Language Models

  • paper_url: http://arxiv.org/abs/2309.02784
  • repo_url: None
  • paper_authors: Liang Li, Qingyuan Li, Bo Zhang, Xiangxiang Chu
  • for: The paper targets compression of large language models (LLMs) so that low-bit quantization can be deployed without sacrificing accuracy.
  • methods: A technique called norm tweaking is introduced that can be used as a plugin in current post-training quantization (PTQ) methods at low cost. Motivated by the observation that rectifying the quantized activation distribution to match its float counterpart restores accuracy, it updates the weights of the normalization layers, guided by calibration data generation and a channel-wise distance constraint.
  • results: Extensive experiments on several open-source LLMs show significant improvements for both weight-only quantization and joint quantization of weights and activations, surpassing existing PTQ methods; on GLM-130B and OPT-66B, even 2-bit quantization reaches the accuracy of the float models.
    Abstract As the size of large language models (LLMs) continues to grow, model compression without sacrificing accuracy has become a crucial challenge for deployment. While some quantization methods, such as GPTQ, have made progress in achieving acceptable 4-bit weight-only quantization, attempts at lower bit quantization often result in severe performance degradation. In this paper, we introduce a technique called norm tweaking, which can be used as a plugin in current PTQ methods to achieve high precision while being cost-efficient. Our approach is inspired by the observation that rectifying the quantized activation distribution to match its float counterpart can readily restore accuracy for LLMs. To achieve this, we carefully design a tweaking strategy that includes calibration data generation and channel-wise distance constraint to update the weights of normalization layers for better generalization. We conduct extensive experiments on various datasets using several open-sourced LLMs. Our method demonstrates significant improvements in both weight-only quantization and joint quantization of weights and activations, surpassing existing PTQ methods. On GLM-130B and OPT-66B, our method even achieves the same level of accuracy at 2-bit quantization as their float ones. Our simple and effective approach makes it more practical for real-world applications.
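
A minimal sketch of the tweaking idea: a float block produces target activations on calibration data, everything in the quantized block is frozen except the normalization layer's affine parameters, and a channel-wise distance between activations is minimized. The fake quantizer, layer sizes, and loss are assumptions for illustration, not the paper's procedure.

```python
import copy
import torch
import torch.nn as nn

def fake_quant(w, bits=4):                   # crude uniform weight quantizer
    scale = w.abs().max() / (2 ** (bits - 1) - 1)
    return (w / scale).round().clamp(-2 ** (bits - 1), 2 ** (bits - 1) - 1) * scale

float_block = nn.Sequential(nn.Linear(64, 64), nn.LayerNorm(64))
quant_block = copy.deepcopy(float_block)
with torch.no_grad():
    quant_block[0].weight.copy_(fake_quant(quant_block[0].weight))

for p in quant_block.parameters():           # freeze everything ...
    p.requires_grad_(False)
for p in quant_block[1].parameters():        # ... except the LN affine params
    p.requires_grad_(True)

opt = torch.optim.Adam(quant_block[1].parameters(), lr=1e-3)
calib = torch.randn(512, 64)                 # calibration data (generated in the paper)
with torch.no_grad():
    target = float_block(calib)              # float activation distribution
for _ in range(200):
    opt.zero_grad()
    out = quant_block(calib)
    loss = ((out - target) ** 2).mean(dim=0).sum()   # channel-wise distance
    loss.backward()
    opt.step()
```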

Improving diagnosis and prognosis of lung cancer using vision transformers: A scoping review

  • paper_url: http://arxiv.org/abs/2309.02783
  • repo_url: None
  • paper_authors: Hazrat Ali, Farida Mohsen, Zubair Shah
  • for: This paper aims to identify recent developments in vision transformer-based AI methods for lung cancer imaging applications, and to provide insights into their performance and potential for clinical translation.
  • methods: The paper reviews 34 studies published from 2020 to 2022 that use vision transformer-based methods for lung cancer diagnosis and prognosis, including classification of lung cancer types and segmentation of lungs. The studies combine vision transformers with other architectures such as convolutional neural networks or UNet models.
  • results: The review finds that vision transformer-based models are increasingly popular for lung cancer applications, but their computational complexity and clinical relevance are important factors to consider for future research. The studies show promising results in lung cancer diagnosis and prognosis, but lack clear strategies for clinical translation.
    Abstract Vision transformer-based methods are advancing the field of medical artificial intelligence and cancer imaging, including lung cancer applications. Recently, many researchers have developed vision transformer-based AI methods for lung cancer diagnosis and prognosis. This scoping review aims to identify the recent developments on vision transformer-based AI methods for lung cancer imaging applications. It provides key insights into how vision transformers complemented the performance of AI and deep learning methods for lung cancer. Furthermore, the review also identifies the datasets that contributed to advancing the field. Of the 314 retrieved studies, this review included 34 studies published from 2020 to 2022. The most commonly addressed task in these studies was the classification of lung cancer types, such as lung squamous cell carcinoma versus lung adenocarcinoma, and identifying benign versus malignant pulmonary nodules. Other applications included survival prediction of lung cancer patients and segmentation of lungs. The studies lacked clear strategies for clinical transformation. SWIN transformer was a popular choice of the researchers; however, many other architectures were also reported where vision transformer was combined with convolutional neural networks or UNet model. It can be concluded that vision transformer-based models are increasingly in popularity for developing AI methods for lung cancer applications. However, their computational complexity and clinical relevance are important factors to be considered for future research work. This review provides valuable insights for researchers in the field of AI and healthcare to advance the state-of-the-art in lung cancer diagnosis and prognosis. We provide an interactive dashboard on lung-cancer.onrender.com/.

GPT Can Solve Mathematical Problems Without a Calculator

  • paper_url: http://arxiv.org/abs/2309.03241
  • repo_url: https://github.com/thudm/mathglm
  • paper_authors: Zhen Yang, Ming Ding, Qingsong Lv, Zhihuan Jiang, Zehai He, Yuyi Guo, Jinfeng Bai, Jie Tang
  • for: Challenging the misconception that large language models cannot accurately perform multi-digit arithmetic without calculator tools.
  • methods: With sufficient training data, a 2 billion-parameter language model performs multi-digit arithmetic operations with almost 100% accuracy and without data leakage.
  • results: MathGLM, fine-tuned from GLM-10B on a dataset containing multi-step arithmetic operations and math problems described in text, achieves performance similar to GPT-4 on a 5,000-sample Chinese math problem test set.
    Abstract Previous studies have typically assumed that large language models are unable to accurately perform arithmetic operations, particularly multiplication of >8 digits, and operations involving decimals and fractions, without the use of calculator tools. This paper aims to challenge this misconception. With sufficient training data, a 2 billion-parameter language model can accurately perform multi-digit arithmetic operations with almost 100% accuracy without data leakage, significantly surpassing GPT-4 (whose multi-digit multiplication accuracy is only 4.3%). We also demonstrate that our MathGLM, fine-tuned from GLM-10B on a dataset with additional multi-step arithmetic operations and math problems described in text, achieves similar performance to GPT-4 on a 5,000-samples Chinese math problem test set. Our code and data are public at https://github.com/THUDM/MathGLM.

SWAP: Exploiting Second-Ranked Logits for Adversarial Attacks on Time Series

  • paper_url: http://arxiv.org/abs/2309.02752
  • repo_url: None
  • paper_authors: Chang George Dong, Liangwei Nathan Zheng, Weitong Chen, Wei Emma Zhang, Lin Yue
  • for: The paper proposes a new adversarial attack on time series classification (TSC) models, probing their vulnerability and informing their defense.
  • methods: The attack, named SWAP, enhances the confidence of the second-ranked logits while minimizing the manipulation of other logits, which raises the attack success rate while keeping the perturbation small and hard to detect.
  • results: Experiments show SWAP achieves a state-of-the-art attack success rate exceeding 50%, an 18% increase over existing methods.
    Abstract Time series classification (TSC) has emerged as a critical task in various domains, and deep neural models have shown superior performance in TSC tasks. However, these models are vulnerable to adversarial attacks, where subtle perturbations can significantly impact the prediction results. Existing adversarial methods often suffer from over-parameterization or random logit perturbation, hindering their effectiveness. Additionally, increasing the attack success rate (ASR) typically involves generating more noise, making the attack more easily detectable. To address these limitations, we propose SWAP, a novel attacking method for TSC models. SWAP focuses on enhancing the confidence of the second-ranked logits while minimizing the manipulation of other logits. This is achieved by minimizing the Kullback-Leibler divergence between the target logit distribution and the predictive logit distribution. Experimental results demonstrate that SWAP achieves state-of-the-art performance, with an ASR exceeding 50% and an 18% increase compared to existing methods.
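
A minimal sketch of one SWAP-style step: build a target distribution by swapping the top two logits, then take a gradient step that minimizes the KL divergence between that target and the model's predictive distribution. The step size, sign-gradient update, and toy model are assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def swap_attack_step(model, x, step=1e-2):
    """One gradient step pushing the second-ranked class above the first by
    matching a 'swapped' softmax target, leaving other logits largely
    untouched."""
    x = x.clone().detach().requires_grad_(True)
    logits = model(x)                            # (batch, n_classes)
    top2 = logits.topk(2, dim=-1).indices
    target_logits = logits.detach().clone()
    # swap the top-1 and top-2 logits to build the target distribution
    target_logits.scatter_(-1, top2, logits.detach().gather(-1, top2.flip(-1)))
    target = F.softmax(target_logits, dim=-1)
    loss = F.kl_div(F.log_softmax(logits, dim=-1), target, reduction="batchmean")
    loss.backward()
    return (x - step * x.grad.sign()).detach()   # small, hard-to-detect perturbation

model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(100, 5))
x = torch.randn(8, 1, 100)                       # 8 univariate series of length 100
x_adv = swap_attack_step(model, x)
```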

MLN-net: A multi-source medical image segmentation method for clustered microcalcifications using multiple layer normalization

  • paper_url: http://arxiv.org/abs/2309.02742
  • repo_url: https://github.com/yezanting/mln-net-verson1
  • paper_authors: Ke Wang, Zanting Ye, Xiang Xie, Haidong Cui, Tao Chen, Banteng Liu
  • for: Improving the accuracy of segmenting clustered microcalcifications in mammography, which is crucial for breast cancer diagnosis and treatment.
  • methods: The paper proposes MLN-net, a framework that accurately segments multi-source images while being trained only on single-source images. A source-domain augmentation method generates multi-source images for better generalization, the segmentation network is built from multiple layer normalization (LN) layers, and a branch selection strategy measures the similarity between source-domain and target-domain data.
  • results: Extensive experiments show MLN-net accurately segments clustered microcalcifications across domains and surpasses state-of-the-art methods in segmentation accuracy.
    Abstract Accurate segmentation of clustered microcalcifications in mammography is crucial for the diagnosis and treatment of breast cancer. Despite exhibiting expert-level accuracy, recent deep learning advancements in medical image segmentation provide insufficient contribution to practical applications, due to the domain shift resulting from differences in patient postures, individual gland density, and imaging modalities of mammography, etc. In this paper, a novel framework named MLN-net, which can accurately segment multi-source images using only single-source images, is proposed for clustered microcalcification segmentation. We first propose a source domain image augmentation method to generate multi-source images, leading to improved generalization. Then, a structure of multiple layer normalization (LN) layers is used to construct the segmentation network, which proves efficient for clustered microcalcification segmentation in different domains. Additionally, a branch selection strategy is designed for measuring the similarity of the source domain data and the target domain data. To validate the proposed MLN-net, extensive analyses are performed, including ablation experiments and comparisons with 12 baseline methods. Extensive experiments validate the effectiveness of MLN-net in segmenting clustered microcalcifications from different domains, and its segmentation accuracy surpasses state-of-the-art methods. Code will be available at https://github.com/yezanting/MLN-NET-VERSON1.

Rubric-Specific Approach to Automated Essay Scoring with Augmentation Training

  • paper_url: http://arxiv.org/abs/2309.02740
  • repo_url: None
  • paper_authors: Brian Cho, Youngbin Jang, Jaewoong Yoon
  • for: automatic evaluation of subjective responses
  • methods: neural solutions with data augmentation
  • results: state-of-the-art performance on the Automated Student Assessment Prize dataset
    Abstract Neural based approaches to automatic evaluation of subjective responses have shown superior performance and efficiency compared to traditional rule-based and feature engineering oriented solutions. However, it remains unclear whether the suggested neural solutions are sufficient replacements of human raters as we find recent works do not properly account for rubric items that are essential for automated essay scoring during model training and validation. In this paper, we propose a series of data augmentation operations that train and test an automated scoring model to learn features and functions overlooked by previous works while still achieving state-of-the-art performance in the Automated Student Assessment Prize dataset.

HC3 Plus: A Semantic-Invariant Human ChatGPT Comparison Corpus

  • paper_url: http://arxiv.org/abs/2309.02731
  • repo_url: None
  • paper_authors: Zhenpeng Su, Xing Wu, Wei Zhou, Guangyuan Ma, Songlin Hu
  • for: This work aims to improve detection of AI-generated content (AIGC), especially model-generated text on semantic-invariant tasks.
  • methods: A more extensive and comprehensive dataset is introduced that covers more task types than previous work, including semantic-invariant tasks such as summarization, translation, and paraphrasing, together with large-scale task-instruction fine-tuning of Tk-instruct.
  • results: The proposed detector outperforms the previous state-of-the-art RoBERTa-based detector, including on semantic-invariant tasks.
    Abstract ChatGPT has gained significant interest due to its impressive performance, but people are increasingly concerned about its potential risks, particularly around the detection of AI-generated content (AIGC), which is often difficult for untrained humans to identify. Current datasets utilized for detecting ChatGPT-generated text primarily center around question-answering, yet they tend to disregard tasks that possess semantic-invariant properties, such as summarization, translation, and paraphrasing. Our primary studies demonstrate that detecting model-generated text on semantic-invariant tasks is more difficult. To fill this gap, we introduce a more extensive and comprehensive dataset that considers more types of tasks than previous work, including semantic-invariant tasks. In addition, a model fine-tuned on a large number of task instructions shows strong performance. Building on this, we further instruction-tune Tk-instruct and build a more powerful detection system. Experimental results show that our proposed detector outperforms the previous state-of-the-art RoBERTa-based detector.

Stylebook: Content-Dependent Speaking Style Modeling for Any-to-Any Voice Conversion using Only Speech Data

  • paper_url: http://arxiv.org/abs/2309.02730
  • repo_url: None
  • paper_authors: Hyungseob Lim, Kyungguen Byun, Sunkuk Moon, Erik Visser
  • for: Improving any-to-any voice conversion so that the target speaker's speaking style is transferred faithfully, without text transcriptions or speaker labeling.
  • methods: A self-supervised learning (SSL) model collects the target speaker's speaking styles for different phonetic content, represented as a set of embeddings called a stylebook. The stylebook is attended with the source speech's phonetic content to determine a content-dependent target style, and a diffusion-based decoder generates the converted mel-spectrogram from the source content and the target style embeddings.
  • results: Combined with a diffusion-based generative model, the method achieves better speaker similarity in any-to-any voice conversion than baseline models, while the growth in computational complexity for longer utterances stays suppressed.
    Abstract While many recent any-to-any voice conversion models succeed in transferring some target speech's style information to the converted speech, they still lack the ability to faithfully reproduce the speaking style of the target speaker. In this work, we propose a novel method to extract rich style information from target utterances and to efficiently transfer it to source speech content without requiring text transcriptions or speaker labeling. Our proposed approach introduces an attention mechanism utilizing a self-supervised learning (SSL) model to collect the speaking styles of a target speaker each corresponding to the different phonetic content. The styles are represented with a set of embeddings called stylebook. In the next step, the stylebook is attended with the source speech's phonetic content to determine the final target style for each source content. Finally, content information extracted from the source speech and content-dependent target style embeddings are fed into a diffusion-based decoder to generate the converted speech mel-spectrogram. Experiment results show that our proposed method combined with a diffusion-based generative model can achieve better speaker similarity in any-to-any voice conversion tasks when compared to baseline models, while the increase in computational complexity with longer utterances is suppressed.
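
The attention step can be sketched directly: source content frames act as queries over the stylebook (keys and values), yielding a per-frame, content-dependent target style. Dimensions, head count, and the residual combination below are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

d = 192
attn = nn.MultiheadAttention(embed_dim=d, num_heads=4, batch_first=True)

content = torch.randn(1, 120, d)     # source utterance: 120 content frames (e.g. SSL features)
stylebook = torch.randn(1, 32, d)    # 32 style embeddings mined from the target speaker

# each source frame attends over the stylebook -> content-dependent target style
target_style, weights = attn(query=content, key=stylebook, value=stylebook)
decoder_input = content + target_style       # content plus per-frame style, fed to the decoder
print(target_style.shape)                    # torch.Size([1, 120, 192])
```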

Large Language Models for Automated Open-domain Scientific Hypotheses Discovery

  • paper_url: http://arxiv.org/abs/2309.02726
  • repo_url: https://github.com/zongliny/moose
  • paper_authors: Zonglin Yang, Xinya Du, Junxian Li, Jie Zheng, Soujanya Poria, Erik Cambria
  • for: This work aims to build a system that automatically generates valid, novel, and helpful academic hypotheses for the social sciences, using only raw web corpora as observations, so as to assist human researchers.
  • methods: A multi-module framework is developed, together with three different feedback mechanisms that improve the system's performance.
  • results: With these feedback mechanisms, the system achieves high performance under both GPT-4-based evaluation and evaluation by social science experts.
    Abstract Hypothetical induction is recognized as the main reasoning type when scientists make observations about the world and try to propose hypotheses to explain those observations. Past research on hypothetical induction has been limited in that (1) the observation annotations in the datasets are not raw web corpora but manually selected sentences (resulting in a closed-domain setting), and (2) the ground-truth hypothesis annotations are mostly commonsense knowledge, making the task less challenging. In this work, we propose the first NLP dataset for social science academic hypotheses discovery, consisting of 50 recent papers published in top social science journals. Raw web corpora necessary for developing the hypotheses in the published papers are also collected in the dataset, with the final goal of creating a system that automatically generates valid, novel, and helpful (to human researchers) hypotheses given only a pile of raw web corpora. The new dataset tackles the previous problems because it requires systems to (1) use raw web corpora as observations and (2) propose hypotheses that are new even to humanity. A multi-module framework is developed for the task, along with three different feedback mechanisms that empirically show performance gains over the base framework. Finally, our framework exhibits high performance in terms of both GPT-4-based evaluation and social science expert evaluation.

Offensive Hebrew Corpus and Detection using BERT

  • paper_url: http://arxiv.org/abs/2309.02724
  • repo_url: https://github.com/sinalab/offensivehebrew
  • paper_authors: Nagham Hamad, Mustafa Jarrar, Mohammad Khalilia, Nadim Nashif
  • For: The paper targets offensive language detection in Hebrew, a low-resource language.
  • Methods: The authors build a new Hebrew offensive language corpus of 15,881 tweets, each labeled with one or more of five classes (abusive, hate, violence, pornographic, or not offensive) by annotators familiar with Israeli culture, politics, and practices; they then fine-tune two Hebrew BERT models, HeBERT and AlephBERT, on this dataset and on another published dataset (D_OLaH).
  • Results: The new data boosts HeBERT performance by 2% when combined with D_OLaH; fine-tuning AlephBERT on the new data and testing on D_OLaH yields 69% accuracy, while fine-tuning on D_OLaH and testing on the new data yields 57%, which may indicate the generalizability of the data.
    Abstract Offensive language detection has been well studied in many languages, but it lags behind in low-resource languages such as Hebrew. In this paper, we present a new offensive language corpus in Hebrew. A total of 15,881 tweets were retrieved from Twitter. Each was labeled with one or more of five classes (abusive, hate, violence, pornographic, or not offensive) by Arabic-Hebrew bilingual speakers. The annotation process was challenging, as each annotator is expected to be familiar with Israeli culture, politics, and practices in order to understand the context of each tweet. We fine-tuned two Hebrew BERT models, HeBERT and AlephBERT, using our proposed dataset and another published dataset. We observed that our data boosts HeBERT performance by 2% when combined with D_OLaH. Fine-tuning AlephBERT on our data and testing on D_OLaH yields 69% accuracy, while fine-tuning on D_OLaH and testing on our data yields 57% accuracy, which may be an indication of the generalizability our data offers. Our dataset and fine-tuned models are available on GitHub and Huggingface.
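A minimal sketch of the kind of multi-label fine-tuning described above, using Hugging Face Transformers; the checkpoint name, label order, and hyperparameters are assumptions rather than the authors' exact setup.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

LABELS = ["abusive", "hate", "violence", "pornographic", "none"]

# "avichr/heBERT" is one public HeBERT checkpoint; whether it is the
# paper's exact starting point is an assumption here.
tokenizer = AutoTokenizer.from_pretrained("avichr/heBERT")
model = AutoModelForSequenceClassification.from_pretrained(
    "avichr/heBERT",
    num_labels=len(LABELS),
    problem_type="multi_label_classification",  # BCE loss, one logit per class
)

batch = tokenizer(["טקסט לדוגמה"], padding=True, return_tensors="pt")
targets = torch.tensor([[0., 1., 0., 0., 0.]])  # multi-hot target: "hate"
out = model(**batch, labels=targets)
print(out.loss)  # optimize with AdamW over the labeled tweets as usual
```

Because a tweet can carry several labels at once, the multi-label head with per-class sigmoid outputs fits the corpus better than a single softmax over the five classes.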

SlAction: Non-intrusive, Lightweight Obstructive Sleep Apnea Detection using Infrared Video

  • paper_url: http://arxiv.org/abs/2309.02713
  • repo_url: None
  • paper_authors: You Rim Choi, Gyeongseon Eo, Wonhyuck Youn, Hyojin Lee, Haemin Jang, Dongyoon Kim, Hyunwoo Shin, Hyung-Sin Kim
  • for: The paper aims to detect obstructive sleep apnea (OSA) non-intrusively in daily sleep environments, enabling early detection and personalized treatment.
  • methods: Sleep is recorded with infrared video, and a sliding-window analysis with a low frame rate (2.5 FPS), a large window size (60 seconds), and a large step (30 seconds) captures the slow, long-term motions associated with respiratory events; a lightweight deep neural network processes all video streams locally on resource-constrained devices.
  • results: SlAction achieves an average F1 score of 87.6% in detecting OSA across various environments and runs in real time (about 3 seconds per 60-second clip) on an NVIDIA Jetson Nano, indicating its potential for early detection and personalized treatment of OSA.
    Abstract Obstructive sleep apnea (OSA) is a prevalent sleep disorder affecting approximately one billion people world-wide. The current gold standard for diagnosing OSA, Polysomnography (PSG), involves an overnight hospital stay with multiple attached sensors, leading to potential inaccuracies due to the first-night effect. To address this, we present SlAction, a non-intrusive OSA detection system for daily sleep environments using infrared videos. Recognizing that sleep videos exhibit minimal motion, this work investigates the fundamental question: "Are respiratory events adequately reflected in human motions during sleep?" Analyzing the largest sleep video dataset of 5,098 hours, we establish correlations between OSA events and human motions during sleep. Our approach uses a low frame rate (2.5 FPS), a large size (60 seconds) and step (30 seconds) for sliding window analysis to capture slow and long-term motions related to OSA. Furthermore, we utilize a lightweight deep neural network for resource-constrained devices, ensuring all video streams are processed locally without compromising privacy. Evaluations show that SlAction achieves an average F1 score of 87.6% in detecting OSA across various environments. Implementing SlAction on NVIDIA Jetson Nano enables real-time inference (~3 seconds for a 60-second video clip), highlighting its potential for early detection and personalized treatment of OSA.
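A small sketch of the sliding-window sampling described above (2.5 FPS, 60 s windows, 30 s step). Frame-loading details are assumptions; any video reader that yields frames at 2.5 FPS would work.

```python
import numpy as np

FPS = 2.5
WINDOW_S, STEP_S = 60, 30
WIN = int(FPS * WINDOW_S)   # 150 frames per window
STEP = int(FPS * STEP_S)    # 75-frame hop (50% overlap)

def sliding_windows(frames: np.ndarray):
    """Yield overlapping clips of shape (WIN, H, W) from (T, H, W) video."""
    for start in range(0, len(frames) - WIN + 1, STEP):
        yield frames[start:start + WIN]

# Toy example: 10 minutes of synthetic 2.5 FPS infrared frames.
video = np.random.rand(int(FPS * 600), 64, 64)
clips = list(sliding_windows(video))
print(len(clips), clips[0].shape)  # 19 windows of 150 frames each
```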

Unveiling the frontiers of deep learning: innovations shaping diverse domains

  • paper_url: http://arxiv.org/abs/2309.02712
  • repo_url: None
  • paper_authors: Shams Forruque Ahmed, Md. Sakib Bin Alam, Maliha Kabir, Shaila Afrin, Sabiha Jannat Rafa, Aanushka Mehjabin, Amir H. Gandomi
  • for: To survey the applications and challenges of deep learning across major fields of study.
  • methods: Reviews deep learning models that learn, visualize, optimize, refine, and predict data, and discusses architectures suited to each domain.
  • results: Deep learning delivers accurate prediction and analysis across domains, but requires massive amounts of data for effective processing; gated architectures such as LSTMs and GRUs help handle very large datasets.
    Abstract Deep learning (DL) enables the development of computer models that are capable of learning, visualizing, optimizing, refining, and predicting data. In recent years, DL has been applied in a range of fields, including audio-visual data processing, agriculture, transportation prediction, natural language, biomedicine, disaster management, bioinformatics, drug design, genomics, face recognition, and ecology. To explore the current state of deep learning, it is necessary to investigate its latest developments and applications in these disciplines. However, the literature is lacking in exploring the applications of deep learning across all potential sectors. This paper therefore extensively investigates the potential applications of deep learning across all major fields of study, as well as the associated benefits and challenges. As evidenced in the literature, DL exhibits accuracy in prediction and analysis, making it a powerful computational tool with the ability to articulate itself and optimize, which makes it effective in processing data with no prior training. At the same time, deep learning necessitates massive amounts of data for effective analysis and processing. To handle the challenge of compiling huge amounts of medical, scientific, healthcare, and environmental data for use in deep learning, gated architectures like LSTMs and GRUs can be utilized. For multimodal learning, shared neurons in the neural network for all activities and specialized neurons for particular tasks are necessary.

Addressing Imperfect Symmetry: a Novel Symmetry-Learning Actor-Critic Extension

  • paper_url: http://arxiv.org/abs/2309.02711
  • repo_url: https://github.com/m-abr/Adaptive-Symmetry-Learning
  • paper_authors: Miguel Abreu, Luis Paulo Reis, Nuno Lau
  • for: This work aims to capture, through reinforcement learning, the brain's ability to adapt efficiently to symmetrical tasks despite human deviations from perfect symmetry in appearance and cognitive biases (e.g., having a dominant hand).
  • methods: The paper proposes Adaptive Symmetry Learning (ASL), a model-minimization actor-critic extension that adapts during learning to incomplete or inexact symmetry descriptions while enforcing a common symmetric relation across all states; ASL consists of a symmetry-fitting component and a modular loss function.
  • results: In a case study on multidirectional locomotion with a four-legged ant model, ASL performs comparably to or better than existing symmetry-enhanced methods in most scenarios, recovering from large perturbations and generalizing knowledge to hidden symmetric states.
    Abstract Symmetry, a fundamental concept to understand our environment, often oversimplifies reality from a mathematical perspective. Humans are a prime example, deviating from perfect symmetry in terms of appearance and cognitive biases (e.g. having a dominant hand). Nevertheless, our brain can easily overcome these imperfections and efficiently adapt to symmetrical tasks. The driving motivation behind this work lies in capturing this ability through reinforcement learning. To this end, we introduce Adaptive Symmetry Learning (ASL), a model-minimization actor-critic extension that addresses incomplete or inexact symmetry descriptions by adapting itself during the learning process. ASL consists of a symmetry fitting component and a modular loss function that enforces a common symmetric relation across all states while adapting to the learned policy. The performance of ASL is compared to existing symmetry-enhanced methods in a case study involving a four-legged ant model for multidirectional locomotion tasks. The results demonstrate that ASL is capable of recovering from large perturbations and generalizing knowledge to hidden symmetric states. It achieves comparable or better performance than alternative methods in most scenarios, making it a valuable approach for leveraging model symmetry while compensating for inherent perturbations.
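The following is a hedged sketch of the kind of symmetry-consistency loss an approach like ASL can build on: the policy is penalized when acting on a mirrored state differs from mirroring its action. The reflection operators and weighting here are illustrative; ASL itself additionally adapts the symmetry description during learning rather than fixing it.

```python
import torch

def symmetry_loss(policy, states, mirror_state, mirror_action):
    """policy: maps (B, obs) -> (B, act); mirror_*: reflection operators."""
    actions = policy(states)
    mirrored_actions = policy(mirror_state(states))
    # For an exact symmetry, pi(M_s s) == M_a pi(s); the residual is penalized.
    return torch.mean((mirrored_actions - mirror_action(actions)) ** 2)

# Toy usage with a linear policy and sign-flip reflections.
policy = torch.nn.Linear(4, 2)
flip_s = lambda s: s * torch.tensor([1., -1., 1., -1.])
flip_a = lambda a: a * torch.tensor([1., -1.])
loss = symmetry_loss(policy, torch.randn(8, 4), flip_s, flip_a)
print(loss)  # add to the actor loss as an auxiliary regularizer
```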

Certifying LLM Safety against Adversarial Prompting

  • paper_url: http://arxiv.org/abs/2309.02705
  • repo_url: None
  • paper_authors: Aounon Kumar, Chirag Agarwal, Suraj Srinivas, Soheil Feizi, Hima Lakkaraju
  • for: To keep language models from producing harmful content by certifying the safety of input prompts.
  • methods: The "erase-and-check" framework erases tokens from a prompt one by one and inspects every resulting subsequence with a safety filter; a prompt is labeled harmful if the filter flags the prompt itself or any subsequence.
  • results: Erase-and-check provides certified safety guarantees against adversarial suffix, insertion, and infusion attacks while maintaining good performance on safe prompts; for example, against adversarial suffixes of length 20, it certifiably detects 93% of harmful prompts and labels 94% of safe prompts as safe.
    Abstract Large language models (LLMs) released for public use incorporate guardrails to ensure their output is safe, often referred to as "model alignment." An aligned language model should decline a user's request to produce harmful content. However, such safety measures are vulnerable to adversarial prompts, which contain maliciously designed token sequences to circumvent the model's safety guards and cause it to produce harmful content. In this work, we introduce erase-and-check, the first framework to defend against adversarial prompts with verifiable safety guarantees. We erase tokens individually and inspect the resulting subsequences using a safety filter. Our procedure labels the input prompt as harmful if any subsequences or the input prompt are detected as harmful by the filter. This guarantees that any adversarial modification of a harmful prompt up to a certain size is also labeled harmful. We defend against three attack modes: i) adversarial suffix, which appends an adversarial sequence at the end of the prompt; ii) adversarial insertion, where the adversarial sequence is inserted anywhere in the middle of the prompt; and iii) adversarial infusion, where adversarial tokens are inserted at arbitrary positions in the prompt, not necessarily as a contiguous block. Empirical results demonstrate that our technique obtains strong certified safety guarantees on harmful prompts while maintaining good performance on safe prompts. For example, against adversarial suffixes of length 20, it certifiably detects 93% of the harmful prompts and labels 94% of the safe prompts as safe using the open source language model Llama 2 as the safety filter.
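A minimal sketch of erase-and-check for the suffix attack mode described above: erase the last i tokens for every i up to a maximum and run the safety filter on each candidate. The toy filter stands in for the paper's LLM-based filter (Llama 2); the whitespace tokenization and the filter itself are assumptions.

```python
from typing import Callable, List

def erase_and_check_suffix(tokens: List[str],
                           is_harmful: Callable[[List[str]], bool],
                           max_erase: int = 20) -> bool:
    """Return True (harmful) if the prompt or any suffix-erased
    subsequence is flagged by the filter. If a harmful prompt is padded
    with an adversarial suffix of length <= max_erase, one candidate
    equals the original harmful prompt, so it is certifiably caught."""
    for i in range(0, min(max_erase, len(tokens) - 1) + 1):
        candidate = tokens[:len(tokens) - i]  # i == 0 keeps the full prompt
        if is_harmful(candidate):
            return True
    return False

# Toy filter: flags prompts containing a banned word.
toy_filter = lambda toks: "bomb" in toks
prompt = "how to build a bomb xq zq".split()  # harmful prompt + gibberish suffix
print(erase_and_check_suffix(prompt, toy_filter))  # True
```

The certificate follows directly from the enumeration: every adversarial suffix up to the budget is erased at some iteration, so the filter always sees the unmodified harmful prompt.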

Diffusion-EDFs: Bi-equivariant Denoising Generative Modeling on SE(3) for Visual Robotic Manipulation

  • paper_url: http://arxiv.org/abs/2309.02685
  • repo_url: https://github.com/tomato1mule/diffusion_edf
  • paper_authors: Hyunwoo Ryu, Jiwoo Kim, Junwoo Chang, Hyun Seok Ahn, Joohwan Seo, Taehan Kim, Yubin Kim, Jongeun Choi, Roberto Horowitz
  • for: This work aims to improve the data efficiency, generalizability, and robustness of robot learning by integrating spatial roto-translation equivariance (SE(3)-equivariance) into diffusion generative modeling.
  • methods: The proposed Diffusion-EDFs build SE(3)-equivariance into the model architecture, achieving remarkable data efficiency with only 5 to 10 task demonstrations needed for end-to-end training.
  • results: The method shows superior generalizability compared with previous diffusion-based manipulation methods.
    Abstract Recent studies have verified that equivariant methods can significantly improve the data efficiency, generalizability, and robustness in robot learning. Meanwhile, denoising diffusion-based generative modeling has recently gained significant attention as a promising approach for robotic manipulation learning from demonstrations with stochastic behaviors. In this paper, we present Diffusion-EDFs, a novel approach that incorporates spatial roto-translation equivariance, i.e., SE(3)-equivariance, into diffusion generative modeling. By integrating SE(3)-equivariance into our model architectures, we demonstrate that our proposed method exhibits remarkable data efficiency, requiring only 5 to 10 task demonstrations for effective end-to-end training. Furthermore, our approach showcases superior generalizability compared to previous diffusion-based manipulation methods.
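As a small numerical illustration of the SE(3)-equivariance property the abstract relies on (not the Diffusion-EDFs model itself), the sketch below checks that a simple point-cloud function commutes with a roto-translation, i.e., f(Rx + t) = R f(x) + t.

```python
import numpy as np

def centroid(points: np.ndarray) -> np.ndarray:
    """A trivially SE(3)-equivariant map from a point cloud to a point."""
    return points.mean(axis=0)

rng = np.random.default_rng(0)
pts = rng.normal(size=(10, 3))
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0, 0.0, 1.0]])          # rotation about the z-axis
t = np.array([1.0, -2.0, 0.5])           # translation

lhs = centroid(pts @ R.T + t)   # transform the input, then apply f
rhs = centroid(pts) @ R.T + t   # apply f, then transform the output
print(np.allclose(lhs, rhs))    # True: f is SE(3)-equivariant
```

Equivariant architectures enforce this identity for learned maps, which is why a pose predicted from a rotated scene is exactly the rotated pose, and why so few demonstrations suffice.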

Spatio-Temporal Contrastive Self-Supervised Learning for POI-level Crowd Flow Inference

  • paper_url: http://arxiv.org/abs/2309.03239
  • repo_url: None
  • paper_authors: Songyu Ke, Ting Li, Li Song, Yanping Sun, Qintian Sun, Junbo Zhang, Yu Zheng
  • for: To accurately infer crowd flow at Points of Interest (POIs), which underpins traffic management, public services, and urban planning.
  • methods: The problem is recast as self-supervised attributed graph representation learning, and a novel Contrastive Self-learning framework for Spatio-Temporal data (CSST) is introduced to cope with the scarcity of labeled data.
  • results: Experiments on two real-world datasets show that models pre-trained with CSST consistently outperform models trained from scratch.
    Abstract Accurate acquisition of crowd flow at Points of Interest (POIs) is pivotal for effective traffic management, public service, and urban planning. Despite this importance, due to the limitations of urban sensing techniques, the data quality from most sources is inadequate for monitoring crowd flow at each POI. This renders the inference of accurate crowd flow from low-quality data a critical and challenging task. The complexity is heightened by three key factors: 1) The scarcity and rarity of labeled data, 2) The intricate spatio-temporal dependencies among POIs, and 3) The myriad correlations between precise crowd flow and GPS reports. To address these challenges, we recast the crowd flow inference problem as a self-supervised attributed graph representation learning task and introduce a novel Contrastive Self-learning framework for Spatio-Temporal data (CSST). Our approach initiates with the construction of a spatial adjacency graph founded on the POIs and their respective distances. We then employ a contrastive learning technique to exploit large volumes of unlabeled spatio-temporal data. We adopt a swapped prediction approach to anticipate the representation of the target subgraph from similar instances. Following the pre-training phase, the model is fine-tuned with accurate crowd flow data. Our experiments, conducted on two real-world datasets, demonstrate that the CSST pre-trained on extensive noisy data consistently outperforms models trained from scratch.
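A small sketch of the first CSST step as described above, building a spatial adjacency graph over POIs from pairwise distances; the coordinates and the distance threshold are illustrative assumptions.

```python
import numpy as np

def poi_adjacency(coords: np.ndarray, threshold: float) -> np.ndarray:
    """coords: (N, 2) POI locations; returns a binary (N, N) adjacency
    matrix linking POIs closer than `threshold` (self-loops excluded)."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)      # pairwise Euclidean distances
    adj = (dist < threshold).astype(np.float32)
    np.fill_diagonal(adj, 0.0)
    return adj

pois = np.random.rand(5, 2) * 10.0   # five POIs in a 10 x 10 km area
print(poi_adjacency(pois, threshold=3.0))
```

The contrastive pre-training then operates on this graph with unlabeled spatio-temporal features before fine-tuning on the scarce, accurately labeled crowd flow data.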

RLSynC: Offline-Online Reinforcement Learning for Synthon Completion

  • paper_url: http://arxiv.org/abs/2309.02671
  • repo_url: None
  • paper_authors: Frazier N. Baker, Ziqi Chen, Xia Ning
  • for: This paper proposes RLSynC, a new synthon-completion method for semi-template-based retrosynthesis.
  • methods: RLSynC is an offline-online reinforcement learning method that assigns one agent to each synthon; all agents complete their synthons by acting step by step in a synchronized fashion. The policy is learned from both offline training episodes and online interactions, allowing RLSynC to explore new reaction spaces, and a forward synthesis model evaluates how likely the predicted reactants are to synthesize the product, guiding the action search.
  • results: Compared with state-of-the-art retrosynthesis methods, RLSynC improves synthon completion by up to 14.9% and retrosynthesis by up to 14.0%, highlighting its potential in synthesis planning.
    Abstract Retrosynthesis is the process of determining the set of reactant molecules that can react to form a desired product. Semi-template-based retrosynthesis methods, which imitate the reverse logic of synthesis reactions, first predict the reaction centers in the products, and then complete the resulting synthons back into reactants. These methods enable necessary interpretability and high practical utility to inform synthesis planning. We develop a new offline-online reinforcement learning method RLSynC for synthon completion in semi-template-based methods. RLSynC assigns one agent to each synthon, all of which complete the synthons by conducting actions step by step in a synchronized fashion. RLSynC learns the policy from both offline training episodes and online interactions which allow RLSynC to explore new reaction spaces. RLSynC uses a forward synthesis model to evaluate the likelihood of the predicted reactants in synthesizing a product, and thus guides the action search. We compare RLSynC with the state-of-the-art retrosynthesis methods. Our experimental results demonstrate that RLSynC can outperform these methods with improvement as high as 14.9% on synthon completion, and 14.0% on retrosynthesis, highlighting its potential in synthesis planning.
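A schematic sketch of the synchronized completion loop the abstract describes: one agent per synthon, all acting step by step, with a forward-synthesis model scoring the final reactants. Every callable here is a hypothetical stand-in, not the authors' API.

```python
import random
from typing import Callable, List

def complete_synthons(synthons: List[str],
                      step: Callable[[str], str],
                      done: Callable[[str], bool],
                      score: Callable[[List[str]], float],
                      max_steps: int = 10):
    """One agent per synthon; all agents act once per round (synchronized)."""
    states = list(synthons)
    for _ in range(max_steps):
        if all(done(s) for s in states):
            break
        states = [s if done(s) else step(s) for s in states]
    # The forward synthesis model scores the completed reactants as a whole.
    return states, score(states)

# Toy stand-ins: "completing" a synthon appends atoms until a target length.
step = lambda s: s + random.choice("CNO")
done = lambda s: len(s) >= 8
score = lambda reactants: 1.0 / (1 + sum(len(r) for r in reactants))  # dummy reward
print(complete_synthons(["CC(=O)", "OCC"], step, done, score))
```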

Subsethood Measures of Spatial Granules

  • paper_url: http://arxiv.org/abs/2309.02662
  • repo_url: None
  • paper_authors: Liquan Zhao, Yiyu Yao
  • for: The paper studies how the knowledge spaces and knowledge structures of an information system can be described with rough sets and spatial rough granules.
  • methods: Rough set theory and spatial granule models are used to describe knowledge spaces and structures, and subsethood and supsethood are generalized to conditional granularity and conditional fineness.
  • results: The paper proposes twelve axioms of monotone increasing subsethood and twelve corresponding axioms of monotone decreasing supsethood, develops five conditional granularity measures and five conditional fineness measures that each satisfy their corresponding twelve axioms (while the underlying subsethood or supsethood measures hold only one of the two boundary conditions), and further defines five conditional granularity entropies and five conditional fineness entropies.
    Abstract Subsethood, which measures the degree of the set inclusion relation, is predominant in fuzzy set theory. This paper introduces some basic concepts of spatial granules, the coarse-fine relation, and operations such as meet, join, quotient meet, and quotient join. All atomic granules can be hierarchized by the set-inclusion relation, and all granules can be hierarchized by the coarse-fine relation. Viewing an information system from the micro and macro perspectives, we obtain a micro knowledge space and a macro knowledge space, from which a rough set model and a spatial rough granule model are respectively derived. The classical rough set model is a special case of the rough set model induced from the micro knowledge space, while the spatial rough granule model will play a pivotal role in the problem-solving of structures. We discuss twelve axioms of monotone increasing subsethood and twelve corresponding axioms of monotone decreasing supsethood, and generalize subsethood and supsethood to conditional granularity and conditional fineness respectively. We develop five conditional granularity measures and five conditional fineness measures and prove that each conditional granularity or fineness measure satisfies its corresponding twelve axioms, although its subsethood or supsethood measure holds only one of the two boundary conditions. We further define five conditional granularity entropies and five conditional fineness entropies respectively; each entropy satisfies only part of the boundary conditions but all ten monotone conditions.
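For context, a standard fuzzy subsethood measure (Kosko-style) that quantifies the degree of set inclusion is shown below; the paper's conditional granularity and conditional fineness measures generalize this style of degree-of-inclusion under the coarse-fine relation, so this is background rather than the paper's own definition.

```latex
% Degree to which fuzzy set A is included in fuzzy set B over universe X.
\[
  S(A, B) \;=\; \frac{|A \cap B|}{|A|}
  \;=\; \frac{\sum_{x \in X} \min\bigl(\mu_A(x), \mu_B(x)\bigr)}
             {\sum_{x \in X} \mu_A(x)},
  \qquad S(A, B) = 1 \iff A \subseteq B .
\]
```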

TFBEST: Dual-Aspect Transformer with Learnable Positional Encoding for Failure Prediction

  • paper_url: http://arxiv.org/abs/2309.02641
  • repo_url: None
  • paper_authors: Rohan Mohapatra, Saptarshi Sengupta
  • for: To predict hard-drive failures and so avoid catastrophic data loss and damage to stakeholder goodwill.
  • methods: Self-Monitoring, Analysis and Reporting Technology (S.M.A.R.T.) logs are fed to a novel transformer architecture, the Temporal-fusion Bi-encoder Self-attention Transformer (TFBEST), to estimate Remaining Useful Life.
  • results: The method outperforms state-of-the-art RUL prediction approaches and provides a novel confidence-margin statistic that can help manufacturers replace a drive within a given time frame.
    Abstract Hard Disk Drive (HDD) failures in datacenters are costly - from catastrophic data loss to a question of goodwill, stakeholders want to avoid it like the plague. An important tool in proactively monitoring against HDD failure is timely estimation of the Remaining Useful Life (RUL). To this end, the Self-Monitoring, Analysis and Reporting Technology employed within HDDs (S.M.A.R.T.) provide critical logs for long-term maintenance of the security and dependability of these essential data storage devices. Data-driven predictive models in the past have used these S.M.A.R.T. logs and CNN/RNN based architectures heavily. However, they have suffered significantly in providing a confidence interval around the predicted RUL values as well as in processing very long sequences of logs. In addition, some of these approaches, such as those based on LSTMs, are inherently slow to train and have tedious feature engineering overheads. To overcome these challenges, in this work we propose a novel transformer architecture - a Temporal-fusion Bi-encoder Self-attention Transformer (TFBEST) for predicting failures in hard-drives. It is an encoder-decoder based deep learning technique that enhances the context gained from understanding health statistics sequences and predicts a sequence of the number of days remaining before a disk potentially fails. In this paper, we also provide a novel confidence margin statistic that can help manufacturers replace a hard-drive within a time frame. Experiments on Seagate HDD data show that our method significantly outperforms the state-of-the-art RUL prediction methods during testing over the exhaustive 10-year data from Backblaze (2013-present). Although validated on HDD failure prediction, the TFBEST architecture is well-suited for other prognostics applications and may be adapted for allied regression problems.
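A brief sketch of a learnable positional encoding of the kind TFBEST's title refers to: positions index a trainable embedding table that is added to projected S.M.A.R.T. feature sequences. Dimensions and the integration point are illustrative assumptions.

```python
import torch
import torch.nn as nn

class LearnablePositionalEncoding(nn.Module):
    def __init__(self, max_len: int, dim: int):
        super().__init__()
        self.pos = nn.Embedding(max_len, dim)  # trained, unlike fixed sinusoids

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim) sequence of projected S.M.A.R.T. features
        idx = torch.arange(x.size(1), device=x.device)
        return x + self.pos(idx)[None, :, :]   # broadcast over the batch

enc = LearnablePositionalEncoding(max_len=512, dim=64)
smart_seq = torch.randn(2, 30, 64)  # 30 days of health statistics per drive
print(enc(smart_seq).shape)         # torch.Size([2, 30, 64])
```

Letting the encoding be trained allows the model to learn how much recency matters in the log sequence instead of hard-coding it.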

Deep Reinforcement Learning from Hierarchical Weak Preference Feedback

  • paper_url: http://arxiv.org/abs/2309.02632
  • repo_url: https://github.com/abukharin3/heron
  • paper_authors: Alexander Bukharin, Yixiao Li, Pengcheng He, Weizhu Chen, Tuo Zhao
  • For: Practical reinforcement learning (RL) tasks, addressing the challenges of reward engineering and the limitations of reinforcement learning from human feedback (RLHF).
  • Methods: The paper proposes a new RL framework, HERON, which ranks reward factors by importance, compares trajectories with a hierarchical decision tree induced by that ranking, and uses the comparisons to train a preference-based reward model for policy learning.
  • Results: HERON trains high-performing agents on a variety of difficult tasks and provides additional benefits such as improved sample efficiency and robustness; code is available at https://github.com/abukharin3/HERON.
    Abstract Reward design is a fundamental, yet challenging aspect of practical reinforcement learning (RL). For simple tasks, researchers typically handcraft the reward function, e.g., using a linear combination of several reward factors. However, such reward engineering is subject to approximation bias, incurs large tuning cost, and often cannot provide the granularity required for complex tasks. To avoid these difficulties, researchers have turned to reinforcement learning from human feedback (RLHF), which learns a reward function from human preferences between pairs of trajectory sequences. By leveraging preference-based reward modeling, RLHF learns complex rewards that are well aligned with human preferences, allowing RL to tackle increasingly difficult problems. Unfortunately, the applicability of RLHF is limited due to the high cost and difficulty of obtaining human preference data. In light of this cost, we investigate learning reward functions for complex tasks with less human effort; simply by ranking the importance of the reward factors. More specifically, we propose a new RL framework -- HERON, which compares trajectories using a hierarchical decision tree induced by the given ranking. These comparisons are used to train a preference-based reward model, which is then used for policy learning. We find that our framework can not only train high performing agents on a variety of difficult tasks, but also provide additional benefits such as improved sample efficiency and robustness. Our code is available at https://github.com/abukharin3/HERON.
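A hedged sketch of the hierarchical comparison HERON's abstract describes: reward factors are ranked by importance, and two trajectories are compared factor by factor, falling through to the next factor on near-ties. The tolerance, factor names, and tie rule are illustrative assumptions.

```python
from typing import Dict, List

def compare(traj_a: Dict[str, float], traj_b: Dict[str, float],
            ranking: List[str], tol: float = 0.05) -> int:
    """Return +1 if traj_a is preferred, -1 if traj_b, 0 if fully tied.
    Walks the decision tree induced by `ranking`: the most important
    factor decides unless the two trajectories are within `tol`."""
    for factor in ranking:
        diff = traj_a[factor] - traj_b[factor]
        if abs(diff) > tol:
            return 1 if diff > 0 else -1
    return 0

# Toy usage: speed matters most, then energy efficiency.
a = {"speed": 0.90, "efficiency": 0.2}
b = {"speed": 0.88, "efficiency": 0.7}
print(compare(a, b, ranking=["speed", "efficiency"]))  # -1: near-tie on
                                                       # speed, b wins on efficiency
```

The preference labels produced this way can then train a preference-based reward model exactly as human pairwise comparisons would in RLHF, but at the cost of only ranking the factors once.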