cs.CV - 2023-07-09

Histopathology Whole Slide Image Analysis with Heterogeneous Graph Representation Learning

  • paper_url: http://arxiv.org/abs/2307.04189
  • repo_url: https://github.com/hku-medai/wsi-hgnn
  • paper_authors: Tsai Hor Chan, Fernando Julio Cendra, Lan Ma, Guosheng Yin, Lequan Yu
  • for: This work aims to exploit the complex structural relations among different types of nuclei in whole-slide histopathology images (WSIs) through a heterogeneous-graph-based framework for WSI analysis.
  • methods: The WSI is formulated as a heterogeneous graph, a heterogeneous-graph edge attribute transformer (HEAT) performs message aggregation, and a pseudo-label-based semantic-consistent pooling mechanism produces graph-level features.
  • results: Extensive experiments on three public TCGA benchmark datasets show that the framework outperforms state-of-the-art methods by considerable margins on various tasks.
    Abstract Graph-based methods have been extensively applied to whole-slide histopathology image (WSI) analysis due to the advantage of modeling the spatial relationships among different entities. However, most of the existing methods focus on modeling WSIs with homogeneous graphs (e.g., with homogeneous node type). Despite their successes, these works are incapable of mining the complex structural relations between biological entities (e.g., the diverse interaction among different cell types) in the WSI. We propose a novel heterogeneous graph-based framework to leverage the inter-relationships among different types of nuclei for WSI analysis. Specifically, we formulate the WSI as a heterogeneous graph with a "nucleus-type" attribute attached to each node and a semantic similarity attribute attached to each edge. We then present a new heterogeneous-graph edge attribute transformer (HEAT) to take advantage of the edge and node heterogeneity during message aggregation. Further, we design a new pseudo-label-based semantic-consistent pooling mechanism to obtain graph-level features, which can mitigate the over-parameterization issue of conventional cluster-based pooling. Additionally, observing the limitations of existing association-based localization methods, we propose a causal-driven approach attributing the contribution of each node to improve the interpretability of our framework. Extensive experiments on three public TCGA benchmark datasets demonstrate that our framework outperforms the state-of-the-art methods with considerable margins on various tasks. Our codes are available at https://github.com/HKU-MedAI/WSI-HGNN.
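The abstract's graph formulation is concrete enough to sketch. Below is a minimal, hypothetical construction of such a heterogeneous nucleus graph in Python: node features and nucleus types are assumed to come from upstream detection and typing models, edges connect spatial k-nearest neighbors, and each edge carries a cosine-similarity attribute. This illustrates the data structure only, not the authors' code.

```python
import numpy as np

def build_nucleus_graph(coords, feats, types, k=5):
    """coords: (N, 2) nucleus centroids; feats: (N, D) patch features;
    types: (N,) integer nucleus-type labels per node."""
    n = len(coords)
    # Pairwise Euclidean distances between nucleus centroids.
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                # no self-loops
    # Unit-normalize features once so dot products are cosine similarities.
    f = feats / (np.linalg.norm(feats, axis=1, keepdims=True) + 1e-8)
    edges, edge_attr = [], []
    for i in range(n):
        for j in np.argsort(d[i])[:k]:         # k nearest spatial neighbors
            edges.append((i, int(j)))
            edge_attr.append(float(f[i] @ f[j]))   # semantic similarity attribute
    return {"x": feats, "node_type": types,
            "edges": np.array(edges), "edge_attr": np.array(edge_attr)}

# Toy usage with random "nuclei" of four hypothetical types.
rng = np.random.default_rng(0)
g = build_nucleus_graph(rng.random((50, 2)), rng.random((50, 64)),
                        rng.integers(0, 4, 50))
print(g["edges"].shape, g["edge_attr"].shape)  # (250, 2) (250,)
```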

Predictive Coding For Animation-Based Video Compression

  • paper_url: http://arxiv.org/abs/2307.04187
  • repo_url: None
  • paper_authors: Goluck Konuko, Stéphane Lathuilière, Giuseppe Valenzise
  • for: Improve video compression efficiency for conferencing-type applications.
  • methods: A novel predictive coding scheme based on image animation, which uses the animation-based prediction of the target frame and codes the residual.
  • results: Bitrate savings in excess of 70% compared to HEVC and over 30% compared to VVC on a dataset of talking-head videos.
    Abstract We address the problem of efficiently compressing video for conferencing-type applications. We build on recent approaches based on image animation, which can achieve good reconstruction quality at very low bitrate by representing face motions with a compact set of sparse keypoints. However, these methods encode video in a frame-by-frame fashion, i.e., each frame is reconstructed from a reference frame, which limits the reconstruction quality when the bandwidth is larger. Instead, we propose a predictive coding scheme which uses image animation as a predictor, and codes the residual with respect to the actual target frame. The residuals can in turn be coded in a predictive manner, thus efficiently removing temporal dependencies. Our experiments indicate a significant bitrate gain, in excess of 70% compared to the HEVC video standard and over 30% compared to VVC, on a dataset of talking-head videos.
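To make the predictive-coding idea concrete, here is a minimal sketch with a placeholder `animate` function standing in for the keypoint-based animation predictor: the encoder transmits only a quantized residual between the target frame and the animation-based prediction, and the decoder adds it back. The names and the scalar quantizer are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def animate(reference, keypoints):
    # Placeholder: the real predictor warps `reference` using sparse keypoints.
    return reference

def encode_frame(reference, target, keypoints, q_step=8.0):
    prediction = animate(reference, keypoints)
    residual = target - prediction
    return np.round(residual / q_step)            # quantized residual to send

def decode_frame(reference, keypoints, q_residual, q_step=8.0):
    prediction = animate(reference, keypoints)    # same prediction at decoder
    return prediction + q_residual * q_step       # reconstruction

ref = np.zeros((4, 4))
tgt = np.full((4, 4), 10.0)
code = encode_frame(ref, tgt, keypoints=None)
print(np.abs(decode_frame(ref, None, code) - tgt).max())  # quantization error
```

Residuals of successive frames can themselves be predicted from earlier residuals, which is how the scheme removes the remaining temporal dependencies.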

Reducing False Alarms in Video Surveillance by Deep Feature Statistical Modeling

  • paper_url: http://arxiv.org/abs/2307.04159
  • repo_url: None
  • paper_authors: Xavier Bou, Aitor Artola, Thibaud Ehret, Gabriele Facciolo, Jean-Michel Morel, Rafael Grompone von Gioi
  • for: Reduce the number of false alarms in video surveillance.
  • methods: A method-agnostic, weakly supervised a-contrario validation process based on high-dimensional statistical modeling of deep features.
  • results: Evaluations at both pixel and object levels, on six methods and several sequences from different datasets, show that the proposed a-contrario validation can largely reduce the number of false alarms.
    Abstract Detecting relevant changes is a fundamental problem of video surveillance. Because of the high variability of data and the difficulty of properly annotating changes, unsupervised methods dominate the field. Arguably one of the most critical issues to make them practical is to reduce their false alarm rate. In this work, we develop a method-agnostic weakly supervised a-contrario validation process, based on high dimensional statistical modeling of deep features, to reduce the number of false alarms of any change detection algorithm. We also raise the insufficiency of the conventionally used pixel-wise evaluation, as it fails to precisely capture the performance needs of most real applications. For this reason, we complement pixel-wise metrics with object-wise metrics and evaluate the impact of our approach at both pixel and object levels, on six methods and several sequences from different datasets. Experimental results reveal that the proposed a-contrario validation is able to largely reduce the number of false alarms at both pixel and object levels.
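A-contrario validation generally scores each detection by a Number of False Alarms (NFA): the number of tests times the probability of an observation at least as extreme under a background model, keeping only detections whose NFA falls below a threshold. The sketch below assumes a Gaussian background model over deep features for illustration; the paper's actual statistical model may differ.

```python
import numpy as np
from scipy.stats import chi2

def a_contrario_filter(bg_feats, det_feats, n_tests, eps=1.0):
    """Keep detections whose NFA = n_tests * p_value falls below eps."""
    mu = bg_feats.mean(axis=0)
    cov = np.cov(bg_feats, rowvar=False) + 1e-6 * np.eye(bg_feats.shape[1])
    inv = np.linalg.inv(cov)
    diff = det_feats - mu
    # Squared Mahalanobis distance is chi^2(d)-distributed under the model.
    d2 = np.einsum("ni,ij,nj->n", diff, inv, diff)
    p = chi2.sf(d2, df=bg_feats.shape[1])  # prob. of being this extreme by chance
    return n_tests * p < eps               # True = statistically meaningful

rng = np.random.default_rng(0)
background = rng.normal(size=(1000, 8))                # "normal" deep features
detections = np.vstack([rng.normal(size=(5, 8)),       # chance fluctuations
                        rng.normal(5.0, 1.0, (5, 8))])  # genuine changes
print(a_contrario_filter(background, detections, n_tests=10_000))
```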

DIFF-NST: Diffusion Interleaving For deFormable Neural Style Transfer

  • paper_url: http://arxiv.org/abs/2307.04157
  • repo_url: None
  • paper_authors: Dan Ruta, Gemma Canet Tarrés, Andrew Gilbert, Eli Shechtman, Nicholas Kolkin, John Collomosse
  • for: This work studies how neural techniques can modify the artistic appearance of a content image to match the style of a reference style image.
  • methods: It builds on the recently introduced class of diffusion models, such as Stable Diffusion, whose far more powerful image generation capabilities enable new possibilities.
  • results: The proposed method achieves deformable style transfer on top of diffusion models, a capability out of reach for previous models; it also shows that the priors of these models expose new artistic controls at inference time, and documents an exploration of this new direction.
    Abstract Neural Style Transfer (NST) is the field of study applying neural techniques to modify the artistic appearance of a content image to match the style of a reference style image. Traditionally, NST methods have focused on texture-based image edits, affecting mostly low level information and keeping most image structures the same. However, style-based deformation of the content is desirable for some styles, especially in cases where the style is abstract or the primary concept of the style is in its deformed rendition of some content. With the recent introduction of diffusion models, such as Stable Diffusion, we can access far more powerful image generation techniques, enabling new possibilities. In our work, we propose using this new class of models to perform style transfer while enabling deformable style transfer, an elusive capability in previous models. We show how leveraging the priors of these models can expose new artistic controls at inference time, and we document our findings in exploring this new direction for the field of style transfer.

A Survey on Figure Classification Techniques in Scientific Documents

  • paper_url: http://arxiv.org/abs/2307.05694
  • repo_url: None
  • paper_authors: Anurag Dhote, Mohammed Javed, David S Doermann
  • for: This survey systematically categorizes figures into five classes - tables, photos, diagrams, maps, and plots - and reviews existing methodologies and datasets for figure classification.
  • methods: It surveys the Artificial Intelligence and Machine Learning techniques used to extract data directly from figures, covering approaches such as figure classification and recognition.
  • results: It critically assesses existing methodologies and datasets, identifies current research gaps, and proposes possible directions for future research on figure classification.
    Abstract Figures visually represent an essential piece of information and provide an effective means to communicate scientific facts. Recently there have been many efforts toward extracting data directly from figures, specifically from tables, diagrams, and plots, using different Artificial Intelligence and Machine Learning techniques. This is because extracting information from figures could lead to deeper insights into the concepts highlighted in the scientific documents. In this survey paper, we systematically categorize figures into five classes - tables, photos, diagrams, maps, and plots, and subsequently present a critical review of the existing methodologies and data sets that address the problem of figure classification. Finally, we identify the current research gaps and provide possible directions for further research on figure classification.

ECL: Class-Enhancement Contrastive Learning for Long-tailed Skin Lesion Classification

  • paper_url: http://arxiv.org/abs/2307.04136
  • repo_url: https://github.com/zylbuaa/ecl
  • paper_authors: Yilan Zhang, Jianqi Chen, Ke Wang, Fengying Xie
  • for: This work addresses the imbalanced data distribution of skin image datasets, which makes computer-aided skin disease diagnosis more difficult.
  • methods: It proposes class-Enhancement Contrastive Learning (ECL), which enriches the information of minority classes and treats different classes equally. A hybrid-proxy model generates class-dependent proxies, optimized with a cycle update strategy, and a balanced-hybrid-proxy loss exploits the relations between samples and proxies while treating classes equally; a balanced-weighted cross-entropy loss following a curriculum learning schedule further accounts for both imbalanced data and imbalanced diagnosis difficulty.
  • results: The method achieves the best performance on imbalanced skin lesion classification, and evaluations of the learning curves suggest it adapts to different learning settings and remains stable under varying data distributions.
    Abstract Skin image datasets often suffer from imbalanced data distribution, exacerbating the difficulty of computer-aided skin disease diagnosis. Some recent works exploit supervised contrastive learning (SCL) for this long-tailed challenge. Despite achieving significant performance, these SCL-based methods focus more on head classes, yet ignoring the utilization of information in tail classes. In this paper, we propose class-Enhancement Contrastive Learning (ECL), which enriches the information of minority classes and treats different classes equally. For information enhancement, we design a hybrid-proxy model to generate class-dependent proxies and propose a cycle update strategy for parameters optimization. A balanced-hybrid-proxy loss is designed to exploit relations between samples and proxies with different classes treated equally. Taking both "imbalanced data" and "imbalanced diagnosis difficulty" into account, we further present a balanced-weighted cross-entropy loss following curriculum learning schedule. Experimental results on the classification of imbalanced skin lesion data have demonstrated the superiority and effectiveness of our method.

Ultrasonic Image’s Annotation Removal: A Self-supervised Noise2Noise Approach

  • paper_url: http://arxiv.org/abs/2307.04133
  • repo_url: https://github.com/grandarth/ultrasonicimage-n2n-approach
  • paper_authors: Yuanheng Zhang, Nan Jiang, Zhaoheng Xie, Junying Cao, Yueyang Teng
  • for: The goal is to remove annotations from medical ultrasound images automatically, reducing the manual effort of inspecting annotated imaging results.
  • methods: Annotations are treated as noise in a self-supervised pretext task, and a model trained under the Noise2Noise scheme restores the image to a clean state.
  • results: Models trained under the Noise2Noise scheme achieve accurate annotation removal and mostly outperform counterparts trained with noisy-clean data pairs.
    Abstract Accurately annotated ultrasonic images are vital components of a high-quality medical report. Hospitals often have strict guidelines on the types of annotations that should appear on imaging results. However, manually inspecting these images can be a cumbersome task. While a neural network could potentially automate the process, training such a model typically requires a dataset of paired input and target images, which in turn involves significant human labour. This study introduces an automated approach for detecting annotations in images. This is achieved by treating the annotations as noise, creating a self-supervised pretext task and using a model trained under the Noise2Noise scheme to restore the image to a clean state. We tested a variety of model structures on the denoising task against different types of annotation, including body marker annotation, radial line annotation, etc. Our results demonstrate that most models trained under the Noise2Noise scheme outperformed their counterparts trained with noisy-clean data pairs. The customized U-Net yielded the best outcome on the body marker annotation dataset, with high scores on segmentation precision and reconstruction similarity. We released our code at https://github.com/GrandArth/UltrasonicImage-N2N-Approach.
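The Noise2Noise idea here is that two independently "annotated" copies of the same image can serve as input and target, so a clean target is never required. Below is a minimal, hypothetical training loop: `add_random_annotation` is a stand-in for synthesizing overlay markers, and the tiny network is a placeholder for the customized U-Net.

```python
import torch
import torch.nn as nn

def add_random_annotation(img):
    # Paste a bright square at a random location to mimic an overlay marker.
    x = img.clone()
    b, _, h, w = x.shape
    for i in range(b):
        r = torch.randint(0, h - 8, (1,)).item()
        c = torch.randint(0, w - 8, (1,)).item()
        x[i, :, r:r + 8, c:c + 8] = 1.0
    return x

model = nn.Sequential(  # tiny stand-in for the customized U-Net
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 1, 3, padding=1))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

clean = torch.rand(8, 1, 64, 64)                 # unlabeled ultrasound batch
for step in range(10):
    inp = add_random_annotation(clean)           # corrupted input
    tgt = add_random_annotation(clean)           # independently corrupted target
    loss = nn.functional.mse_loss(model(inp), tgt)  # Noise2Noise objective
    opt.zero_grad(); loss.backward(); opt.step()
print(float(loss))
```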

Cross-modal Orthogonal High-rank Augmentation for RGB-Event Transformer-trackers

  • paper_url: http://arxiv.org/abs/2307.04129
  • repo_url: https://github.com/ZHU-Zhiyu/High-Rank_RGB-Event_Tracker
  • paper_authors: Zhiyu Zhu, Junhui Hou, Dapeng Oliver Wu
  • for: This paper addresses cross-modal object tracking from RGB videos and event data.
  • methods: Rather than constructing a complex cross-modal fusion network, it explores the potential of a pre-trained vision Transformer (ViT): plug-and-play training augmentations encourage the ViT to bridge the vast distribution gap between the two modalities, enabling comprehensive cross-modal information interaction. A mask modeling strategy randomly masks the tokens of a specific modality to promote proactive interaction between modalities, and an orthogonal high-rank loss regularizes the attention matrix to mitigate the resulting network oscillations.
  • results: The plug-and-play training augmentations significantly boost state-of-the-art one-stream and two-stream trackers in both tracking precision and success rate; the new perspective and findings may bring insights to cross-modal data modeling. The code will be publicly released.
    Abstract This paper addresses the problem of cross-modal object tracking from RGB videos and event data. Rather than constructing a complex cross-modal fusion network, we explore the great potential of a pre-trained vision Transformer (ViT). Particularly, we delicately investigate plug-and-play training augmentations that encourage the ViT to bridge the vast distribution gap between the two modalities, enabling comprehensive cross-modal information interaction and thus enhancing its ability. Specifically, we propose a mask modeling strategy that randomly masks a specific modality of some tokens to enforce tokens from different modalities to interact proactively. To mitigate network oscillations resulting from the masking strategy and further amplify its positive effect, we then theoretically propose an orthogonal high-rank loss to regularize the attention matrix. Extensive experiments demonstrate that our plug-and-play training augmentation techniques can significantly boost state-of-the-art one-stream and two-stream trackers to a large extent in terms of both tracking precision and success rate. Our new perspective and findings will potentially bring insights to the field of leveraging powerful pre-trained ViTs to model cross-modal data. The code will be publicly available.
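The mask modeling strategy lends itself to a short sketch: per sample, one randomly chosen modality's tokens are zeroed out before the ViT, so attention must route information across modalities. Shapes, probabilities, and the masking value below are illustrative assumptions, not the paper's settings.

```python
import torch

def mask_one_modality(rgb_tokens, event_tokens, p=0.5):
    """rgb_tokens, event_tokens: (B, N, D). Returns the concatenated token
    sequence with one modality randomly masked per sample."""
    b = rgb_tokens.shape[0]
    rgb, ev = rgb_tokens.clone(), event_tokens.clone()
    for i in range(b):
        if torch.rand(1).item() < p:           # mask this sample at all?
            if torch.rand(1).item() < 0.5:
                rgb[i] = 0.0                   # drop the RGB modality
            else:
                ev[i] = 0.0                    # drop the event modality
    return torch.cat([rgb, ev], dim=1)         # (B, 2N, D), fed to the ViT

tokens = mask_one_modality(torch.randn(4, 196, 768), torch.randn(4, 196, 768))
print(tokens.shape)
```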

Marine Debris Detection in Satellite Surveillance using Attention Mechanisms

  • paper_url: http://arxiv.org/abs/2307.04128
  • repo_url: None
  • paper_authors: Ao Shen, Yijie Zhu, Richard Jiang
  • for: This study aims to improve the efficiency and applicability of marine debris localization by combining YOLOv7 instance segmentation with different attention mechanisms.
  • methods: Using a labelled dataset of satellite images containing ocean debris, three attention models are examined: lightweight coordinate attention, CBAM (combining spatial and channel focus), and the bottleneck transformer (based on self-attention).
  • results: In box detection, CBAM achieved the best result (F1 score of 77%), ahead of coordinate attention (F1 score of 71%) and YOLOv7/bottleneck transformer (both F1 scores around 66%). In mask evaluation, CBAM again led with an F1 score of 73%, coordinate attention and YOLOv7 performed comparably (F1 scores around 68%/69%), and the bottleneck transformer lagged at an F1 score of 56%. These results indicate CBAM is the best fit for marine debris detection, although the bottleneck transformer detected some regions missed by manual annotation and showed better mask precision on larger debris pieces, suggesting potentially superior practical performance.
    Abstract Marine debris is an important issue for environmental protection, but current methods for locating marine debris are yet limited. In order to achieve higher efficiency and wider applicability in the localization of Marine debris, this study tries to combine the instance segmentation of YOLOv7 with different attention mechanisms and explores the best model. By utilizing a labelled dataset consisting of satellite images containing ocean debris, we examined three attentional models including lightweight coordinate attention, CBAM (combining spatial and channel focus), and bottleneck transformer (based on self-attention). Box detection assessment revealed that CBAM achieved the best outcome (F1 score of 77%) compared to coordinate attention (F1 score of 71%) and YOLOv7/bottleneck transformer (both F1 scores around 66%). Mask evaluation showed CBAM again leading with an F1 score of 73%, whereas coordinate attention and YOLOv7 had comparable performances (around F1 score of 68%/69%) and bottleneck transformer lagged behind at F1 score of 56%. These findings suggest that CBAM offers optimal suitability for detecting marine debris. However, it should be noted that the bottleneck transformer detected some areas missed by manual annotation and displayed better mask precision for larger debris pieces, signifying potentially superior practical performance.
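For reference, a compact CBAM block (Woo et al., 2018), the attention mechanism this study found most effective, looks roughly like the sketch below: channel attention from average- and max-pooled descriptors through a shared MLP, followed by spatial attention from channel-wise statistics. The reduction ratio and kernel size are common defaults, not necessarily the paper's settings.

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    def __init__(self, channels, reduction=16, kernel_size=7):
        super().__init__()
        self.mlp = nn.Sequential(                 # shared MLP for channel attention
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels))
        self.spatial = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))        # (B, C) from average pooling
        mx = self.mlp(x.amax(dim=(2, 3)))         # (B, C) from max pooling
        x = x * torch.sigmoid(avg + mx).view(b, c, 1, 1)   # channel attention
        s = torch.cat([x.mean(dim=1, keepdim=True),
                       x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s)) # spatial attention map

print(CBAM(64)(torch.randn(2, 64, 32, 32)).shape)
```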

HA-ViD: A Human Assembly Video Dataset for Comprehensive Assembly Knowledge Understanding

  • paper_url: http://arxiv.org/abs/2307.05721
  • repo_url: None
  • paper_authors: Hao Zheng, Regina Lee, Yuqian Lu
  • for: This paper targets comprehensive assembly knowledge understanding from videos, a prerequisite for technological breakthroughs in futuristic ultra-intelligent industry.
  • methods: It introduces HA-ViD, a human assembly video dataset that features representative industrial assembly scenarios, the natural procedural knowledge acquisition process, and consistent human-robot shared annotations.
  • results: Benchmarks on four foundational video understanding tasks analyze how well current methods comprehend assembly progress, process efficiency, task collaboration, skill parameters, and human intention.
    Abstract Understanding comprehensive assembly knowledge from videos is critical for futuristic ultra-intelligent industry. To enable technological breakthrough, we present HA-ViD - the first human assembly video dataset that features representative industrial assembly scenarios, natural procedural knowledge acquisition process, and consistent human-robot shared annotations. Specifically, HA-ViD captures diverse collaboration patterns of real-world assembly, natural human behaviors and learning progression during assembly, and granulate action annotations to subject, action verb, manipulated object, target object, and tool. We provide 3222 multi-view, multi-modality videos (each video contains one assembly task), 1.5M frames, 96K temporal labels and 2M spatial labels. We benchmark four foundational video understanding tasks: action recognition, action segmentation, object detection and multi-object tracking. Importantly, we analyze their performance for comprehending knowledge in assembly progress, process efficiency, task collaboration, skill parameters and human intention. Details of HA-ViD is available at: https://iai-hrc.github.io/ha-vid.

Enhancing Low-Light Images Using Infrared-Encoded Images

  • paper_url: http://arxiv.org/abs/2307.04122
  • repo_url: https://github.com/wyf0912/ELIEI
  • paper_authors: Shulin Tian, Yufei Wang, Renjie Wan, Wenhan Yang, Alex C. Kot, Bihan Wen
  • for: Improve the visibility and detail of images captured in low-light environments.
  • methods: Remove the in-camera infrared (IR) cut-off filter so that more photons are captured, improving the signal-to-noise ratio by including information from the IR spectrum.
  • results: Experimental results on the collected paired dataset demonstrate the effectiveness of the approach, with better quantitative and qualitative performance against long-exposure reference images.
    Abstract Low-light image enhancement task is essential yet challenging as it is ill-posed intrinsically. Previous arts mainly focus on the low-light images captured in the visible spectrum using pixel-wise loss, which limits the capacity of recovering the brightness, contrast, and texture details due to the small number of incoming photons. In this work, we propose a novel approach to increase the visibility of images captured under low-light environments by removing the in-camera infrared (IR) cut-off filter, which allows for the capture of more photons and results in improved signal-to-noise ratio due to the inclusion of information from the IR spectrum. To verify the proposed strategy, we collect a paired dataset of low-light images captured without the IR cut-off filter, with corresponding long-exposure reference images with an external filter. The experimental results on the proposed dataset demonstrate the effectiveness of the proposed method, showing better performance quantitatively and qualitatively. The dataset and code are publicly available at https://wyf0912.github.io/ELIEI/

Mitosis Detection from Partial Annotation by Dataset Generation via Frame-Order Flipping

  • paper_url: http://arxiv.org/abs/2307.04113
  • repo_url: https://github.com/naivete5656/mdpafof
  • paper_authors: Kazuya Nishimura, Ami Katanaya, Shinichiro Chuma, Ryoma Bise
  • for: Improve the accuracy of mitosis detection in biomedical research.
  • methods: Train a deep learning model from partially annotated sequences by generating a fully labeled dataset via frame-order flipping and alpha-blending pasting.
  • results: Tests on four datasets show higher detection accuracy than other comparison methods that use partially labeled sequences.
    Abstract Detection of mitosis events plays an important role in biomedical research. Deep-learning-based mitosis detection methods have achieved outstanding performance with a certain amount of labeled data. However, these methods require annotations for each imaging condition. Collecting labeled data involves time-consuming human labor. In this paper, we propose a mitosis detection method that can be trained with partially annotated sequences. The base idea is to generate a fully labeled dataset from the partial labels and train a mitosis detection model with the generated dataset. First, we generate an image pair not containing mitosis events by frame-order flipping. Then, we paste mitosis events to the image pair by alpha-blending pasting and generate a fully labeled dataset. We demonstrate the performance of our method on four datasets, and we confirm that our method outperforms other comparisons which use partially labeled sequences.
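The dataset-generation trick can be sketched directly: reversing the frame order yields an image pair containing no forward mitosis event, and an annotated mitosis patch is then alpha-blended in at a known position, producing a fully labeled training pair. The array shapes, blending weight, and label encoding below are illustrative, not the paper's exact construction.

```python
import numpy as np

def make_labeled_pair(frame_t, frame_t1, mitosis_patch, pos, alpha=0.6):
    """frame_t, frame_t1: consecutive frames (H, W); mitosis_patch: (h, w)
    crop of an annotated mitosis event; pos: top-left paste coordinate."""
    pair = [frame_t1.copy(), frame_t.copy()]   # frame-order flipping
    r, c = pos
    h, w = mitosis_patch.shape
    for img in pair:                           # alpha-blending paste
        img[r:r + h, c:c + w] = (alpha * mitosis_patch
                                 + (1 - alpha) * img[r:r + h, c:c + w])
    label = np.zeros_like(frame_t)
    label[r + h // 2, c + w // 2] = 1.0        # known mitosis position
    return pair, label

rng = np.random.default_rng(0)
pair, lbl = make_labeled_pair(rng.random((128, 128)), rng.random((128, 128)),
                              rng.random((16, 16)), pos=(40, 60))
print(pair[0].shape, lbl.sum())
```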

Parametric Depth Based Feature Representation Learning for Object Detection and Segmentation in Bird’s Eye View

  • paper_url: http://arxiv.org/abs/2307.04106
  • repo_url: None
  • paper_authors: Jiayu Yang, Enze Xie, Miaomiao Liu, Jose M. Alvarez
  • for: This work aims to improve vision-only perception models for autonomous driving that encode multi-view image features into Bird's-Eye-View (BEV) space.
  • methods: It models the feature transformation with parametric depth distribution modeling: 2D image features are first lifted into the 3D space defined for the ego vehicle via a predicted parametric depth distribution for each pixel in each view, and the 3D feature volume is then aggregated to the BEV frame based on the 3D space occupancy derived from depth.
  • results: The method outperforms existing approaches on object detection and semantic segmentation; in addition, a novel visibility-aware evaluation metric is proposed that mitigates the impact of the hallucination problem.
    Abstract Recent vision-only perception models for autonomous driving achieved promising results by encoding multi-view image features into Bird's-Eye-View (BEV) space. A critical step and the main bottleneck of these methods is transforming image features into the BEV coordinate frame. This paper focuses on leveraging geometry information, such as depth, to model such feature transformation. Existing works rely on non-parametric depth distribution modeling leading to significant memory consumption, or ignore the geometry information to address this problem. In contrast, we propose to use parametric depth distribution modeling for feature transformation. We first lift the 2D image features to the 3D space defined for the ego vehicle via a predicted parametric depth distribution for each pixel in each view. Then, we aggregate the 3D feature volume based on the 3D space occupancy derived from depth to the BEV frame. Finally, we use the transformed features for downstream tasks such as object detection and semantic segmentation. Existing semantic segmentation methods also suffer from a hallucination problem as they do not take visibility information into account. This hallucination can be particularly problematic for subsequent modules such as control and planning. To mitigate the issue, our method provides depth uncertainty and reliable visibility-aware estimations. We further leverage our parametric depth modeling to present a novel visibility-aware evaluation metric that, when taken into account, can mitigate the hallucination problem. Extensive experiments on object detection and semantic segmentation on the nuScenes datasets demonstrate that our method outperforms existing methods on both tasks.
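The core lifting step admits a short sketch: each pixel predicts the parameters of a (here Gaussian) depth distribution, its feature is spread over discrete depth bins with those weights, and the weighted volume is pooled into BEV cells. The geometry below is reduced to a single forward-facing image row for brevity; it illustrates the idea only, not the paper's implementation.

```python
import torch

def lift_to_bev(feat, mu, sigma, depth_bins):
    """feat: (W, D) features for one image row; mu, sigma: (W,) predicted
    Gaussian depth parameters; depth_bins: (Z,) bin centers. Returns (Z, W, D)."""
    z = depth_bins.view(-1, 1)                          # (Z, 1)
    w = torch.exp(-0.5 * ((z - mu) / sigma) ** 2)       # Gaussian depth weights
    w = w / w.sum(dim=0, keepdim=True)                  # normalize per pixel
    # Each column's feature is spread along depth with its predicted weights.
    return w.unsqueeze(-1) * feat.unsqueeze(0)          # (Z, W, D) BEV grid

feat = torch.randn(64, 32)                              # 64 pixels, 32-d features
mu = torch.rand(64) * 40.0                              # predicted mean depths
sigma = torch.full((64,), 2.0)
bev = lift_to_bev(feat, mu, sigma, torch.linspace(1.0, 50.0, 25))
print(bev.shape)  # (25 depth bins, 64 lateral cells, 32 channels)
```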

CA-CentripetalNet: A novel anchor-free deep learning framework for hardhat wearing detection

  • paper_url: http://arxiv.org/abs/2307.04103
  • repo_url: None
  • paper_authors: Zhijian Liu, Nian Cai, Wensheng Ouyang, Chengbin Zhang, Nili Tian, Han Wang
  • for: Strengthen safety management on construction sites and improve worker safety through hardhat wearing detection.
  • methods: The CA-CentripetalNet deep learning framework, with two novel schemes, vertical-horizontal corner pooling and bounding constrained center attention, to improve feature extraction and utilization.
  • results: Experiments show that CA-CentripetalNet achieves a better balance between accuracy and memory consumption, specifically 86.63% mAP, performing especially well on small-scale hardhats and non-worn hardhats, while being faster and lighter than existing deep learning methods.
    Abstract Automatic hardhat wearing detection can strengthen the safety management in construction sites, which is still challenging due to complicated video surveillance scenes. To deal with the poor generalization of previous deep learning based methods, a novel anchor-free deep learning framework called CA-CentripetalNet is proposed for hardhat wearing detection. Two novel schemes are proposed to improve the feature extraction and utilization ability of CA-CentripetalNet, which are vertical-horizontal corner pooling and bounding constrained center attention. The former is designed to realize the comprehensive utilization of marginal features and internal features. The latter is designed to enforce the backbone to pay attention to internal features, which is only used during the training rather than during the detection. Experimental results indicate that the CA-CentripetalNet achieves better performance with the 86.63% mAP (mean Average Precision) with less memory consumption at a reasonable speed than the existing deep learning based methods, especially in case of small-scale hardhats and non-worn-hardhats.

Enhancing Building Semantic Segmentation Accuracy with Super Resolution and Deep Learning: Investigating the Impact of Spatial Resolution on Various Datasets

  • paper_url: http://arxiv.org/abs/2307.04101
  • repo_url: None
  • paper_authors: Zhiling Guo, Xiaodan Shi, Haoran Zhang, Dou Huang, Xiaoya Song, Jinyue Yan, Ryosuke Shibasaki
  • for: This study investigates the impact of spatial resolution on deep learning based building semantic segmentation and provides insights for data selection and preparation.
  • methods: Remote sensing images over three study areas are converted into multiple spatial resolutions via super-resolution and down-sampling; two representative deep learning architectures, UNet and FPN, are then selected for model training and testing.
  • results: The experiments show that spatial resolution greatly influences building segmentation results, with better cost-effectiveness around the 0.3 m level.
    Abstract The development of remote sensing and deep learning techniques has enabled building semantic segmentation with high accuracy and efficiency. Despite their success in different tasks, the discussions on the impact of spatial resolution on deep learning based building semantic segmentation are quite inadequate, which makes choosing a cost-effective data source a big challenge. To address the issue mentioned above, in this study, we resample remote sensing images over three study areas into multiple spatial resolutions by super-resolution and down-sampling. After that, two representative deep learning architectures: UNet and FPN, are selected for model training and testing. The experimental results obtained from three cities with two deep learning models indicate that the spatial resolution greatly influences building segmentation results, and with a better cost-effectiveness around 0.3m, which we believe will be an important insight for data selection and preparation.

Visible and infrared self-supervised fusion trained on a single example

  • paper_url: http://arxiv.org/abs/2307.04100
  • repo_url: None
  • paper_authors: Nati Ofir
  • for: This paper addresses the problem of fusing RGB and Near-Infrared (NIR) images.
  • methods: A Convolutional Neural Network (CNN) is trained by Self-Supervised Learning (SSL) on a single example.
  • results: The method preserves the relevant detail of each spectral channel without relying on a heavy training process; in the experiments it obtains better quantitative and qualitative multispectral fusion results than other recent methods.
    Abstract This paper addresses the problem of visible (RGB) to Near-Infrared (NIR) image fusion. Multispectral imaging is an important task relevant to image processing and computer vision, even more, since the development of the RGBT sensor. While the visible image sees color and suffers from noise, haze, and clouds, the NIR channel captures a clearer picture and it is significantly required by applications such as dehazing or object detection. The proposed approach fuses these two aligned channels by training a Convolutional-Neural-Network (CNN) by a Self-Supervised-Learning (SSL) on a single example. For each such pair, RGB and IR, the network is trained for seconds to deduce the final fusion. The SSL is based on Structural Similarity (SSIM) loss combined with Edge-Preservation (EP) loss. The labels for the SSL are the input channels themselves. This fusion preserves the relevant detail of each spectral channel while not based on a heavy training process. In the experiments section, the proposed approach achieves better qualitative and quantitative multispectral fusion results with respect to other recent methods, that are not based on large dataset training.

GNP Attack: Transferable Adversarial Examples via Gradient Norm Penalty

  • paper_url: http://arxiv.org/abs/2307.04099
  • repo_url: None
  • paper_authors: Tao Wu, Tie Luo, Donald C. Wunsch
  • for: Enhance the cross-model transferability of adversarial examples, enabling practical black-box attacks against diverse target models.
  • methods: Attack with a Gradient Norm Penalty (GNP), which drives the loss-function optimization procedure to converge to a flat region of local optima in the loss landscape, improving the transferability of the generated adversarial examples.
  • results: Attacks on 11 state-of-the-art deep learning models and 6 advanced defense methods empirically show that GNP is very effective at generating adversarial examples with high transferability; GNP can also be combined with other gradient-based methods for stronger transfer-based attacks.
    Abstract Adversarial examples (AE) with good transferability enable practical black-box attacks on diverse target models, where insider knowledge about the target models is not required. Previous methods often generate AE with no or very limited transferability; that is, they easily overfit to the particular architecture and feature representation of the source, white-box model and the generated AE barely work for target, black-box models. In this paper, we propose a novel approach to enhance AE transferability using Gradient Norm Penalty (GNP). It drives the loss function optimization procedure to converge to a flat region of local optima in the loss landscape. By attacking 11 state-of-the-art (SOTA) deep learning models and 6 advanced defense methods, we empirically show that GNP is very effective in generating AE with high transferability. We also demonstrate that it is very flexible in that it can be easily integrated with other gradient based methods for stronger transfer-based attacks.
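The gradient norm penalty can be sketched as an extra term in the attack objective: while ascending the classification loss, the attacker subtracts a penalty on the input-gradient norm, steering the optimization toward flat regions of the loss landscape. The step below is a hypothetical FGSM-style variant; the paper's exact formulation and hyperparameters may differ.

```python
import torch
import torch.nn.functional as F

def gnp_attack_step(model, x_adv, y, alpha=2 / 255, lam=0.1):
    x_adv = x_adv.detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    # Input gradient, kept in the graph so the penalty can be differentiated.
    grad = torch.autograd.grad(loss, x_adv, create_graph=True)[0]
    # Ascend the loss while penalizing its sharpness (gradient norm).
    objective = loss - lam * grad.flatten(1).norm(dim=1).mean()
    g = torch.autograd.grad(objective, x_adv)[0]
    return (x_adv + alpha * g.sign()).clamp(0, 1).detach()  # valid image range

# Toy surrogate model; in practice the step runs on a white-box source model.
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
x, y = torch.rand(4, 3, 32, 32), torch.randint(0, 10, (4,))
print(gnp_attack_step(model, x, y).shape)
```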

CMDFusion: Bidirectional Fusion Network with Cross-modality Knowledge Distillation for LIDAR Semantic Segmentation

  • paper_url: http://arxiv.org/abs/2307.04091
  • repo_url: https://github.com/Jun-CEN/CMDFusion
  • paper_authors: Jun Cen, Shiwei Zhang, Yixuan Pei, Kun Li, Hang Zheng, Maochun Luo, Yingya Zhang, Qifeng Chen
  • for: This work proposes a bidirectional fusion network with cross-modality knowledge distillation (CMDFusion) to address the fusion of 2D RGB and 3D LIDAR information in LIDAR semantic segmentation for autonomous vehicles.
  • methods: CMDFusion makes two contributions: a bidirectional fusion scheme that exploits 2D and 3D information simultaneously, surpassing either single fusion scheme; and knowledge distillation that lets the 3D network generate 2D information, which resolves the problem that RGB images may be unavailable at inference time.
  • results: Tests on the SemanticKITTI and nuScenes datasets show that CMDFusion achieves the best performance among all fusion-based methods.
    Abstract 2D RGB images and 3D LIDAR point clouds provide complementary knowledge for the perception system of autonomous vehicles. Several 2D and 3D fusion methods have been explored for the LIDAR semantic segmentation task, but they suffer from different problems. 2D-to-3D fusion methods require strictly paired data during inference, which may not be available in real-world scenarios, while 3D-to-2D fusion methods cannot explicitly make full use of the 2D information. Therefore, we propose a Bidirectional Fusion Network with Cross-Modality Knowledge Distillation (CMDFusion) in this work. Our method has two contributions. First, our bidirectional fusion scheme explicitly and implicitly enhances the 3D feature via 2D-to-3D fusion and 3D-to-2D fusion, respectively, which surpasses either one of the single fusion schemes. Second, we distillate the 2D knowledge from a 2D network (Camera branch) to a 3D network (2D knowledge branch) so that the 3D network can generate 2D information even for those points not in the FOV (field of view) of the camera. In this way, RGB images are not required during inference anymore since the 2D knowledge branch provides 2D information according to the 3D LIDAR input. We show that our CMDFusion achieves the best performance among all fusion-based methods on SemanticKITTI and nuScenes datasets. The code will be released at https://github.com/Jun-CEN/CMDFusion.

SVIT: Scaling up Visual Instruction Tuning

  • paper_url: http://arxiv.org/abs/2307.04087
  • repo_url: https://github.com/baai-dcai/visual-instruction-tuning
  • paper_authors: Bo Zhao, Boya Wu, Tiejun Huang
  • for: Improve the visual understanding, reasoning, and planning capabilities of multimodal models.
  • methods: Scale up Visual Instruction Tuning (SVIT) by constructing a dataset of 3.2 million visual instruction tuning examples, including 1.6M conversation question-answer (QA) pairs, 1.6M complex reasoning QA pairs, and 106K detailed image descriptions.
  • results: Training multimodal models on SVIT significantly improves multimodal performance in visual perception, reasoning, and planning.
    Abstract Thanks to the emerging of foundation models, the large language and vision models are integrated to acquire the multimodal ability of visual captioning, dialogue, question answering, etc. Although existing multimodal models present impressive performance of visual understanding and reasoning, their limits are still largely under-explored due to the scarcity of high-quality instruction tuning data. To push the limits of multimodal capability, we scale up Visual Instruction Tuning (SVIT) by constructing a dataset of 3.2 million visual instruction tuning data including 1.6M conversation question-answer (QA) pairs and 1.6M complex reasoning QA pairs and 106K detailed image descriptions. Besides the volume, the proposed dataset is also featured by the high quality and rich diversity, which is generated by prompting GPT-4 with the abundant manual annotations of images. We empirically verify that training multimodal models on SVIT can significantly improve the multimodal performance in terms of visual perception, reasoning and planning.

Score-based Conditional Generation with Fewer Labeled Data by Self-calibrating Classifier Guidance

  • paper_url: http://arxiv.org/abs/2307.04081
  • repo_url: None
  • paper_authors: Paul Kuo-Ming Huang, Si-An Chen, Hsuan-Tien Lin
  • for: Improve the conditional generation quality of classifier-guided score-based generative models (SGMs), especially when trained with fewer labeled data.
  • methods: Use principles from energy-based models to view the classifier as another view of the unconditional SGM, then adopt the existing unconditional SGM loss to calibrate the classifier with both labeled and unlabeled data.
  • results: The proposed method improves conditional generation quality across different percentages of labeled data and consistently outperforms other conditional SGMs when fewer labeled data are used.
    Abstract Score-based Generative Models (SGMs) are a popular family of deep generative models that achieves leading image generation quality. Earlier studies have extended SGMs to tackle class-conditional generation by coupling an unconditional SGM with the guidance of a trained classifier. Nevertheless, such classifier-guided SGMs do not always achieve accurate conditional generation, especially when trained with fewer labeled data. We argue that the issue is rooted in unreliable gradients of the classifier and the inability to fully utilize unlabeled data during training. We then propose to improve classifier-guided SGMs by letting the classifier calibrate itself. Our key idea is to use principles from energy-based models to convert the classifier as another view of the unconditional SGM. Then, existing loss for the unconditional SGM can be adopted to calibrate the classifier using both labeled and unlabeled data. Empirical results validate that the proposed approach significantly improves the conditional generation quality across different percentages of labeled data. The improved performance makes the proposed approach consistently superior to other conditional SGMs when using fewer labeled data. The results confirm the potential of the proposed approach for generative modeling with limited labeled data.
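For context, classifier guidance combines the unconditional score with the classifier's input gradient, since grad_x log p(x|y) = grad_x log p(x) + grad_x log p(y|x). The paper's contribution is calibrating the classifier so that the second term is reliable with few labels; the sketch below shows only the standard guidance combination, with toy stand-ins for both networks.

```python
import torch

def guided_score(score_model, classifier, x, t, y, scale=1.0):
    x = x.detach().requires_grad_(True)
    log_probs = torch.log_softmax(classifier(x, t), dim=-1)
    selected = log_probs[torch.arange(len(y)), y].sum()
    cls_grad = torch.autograd.grad(selected, x)[0]   # grad_x log p(y|x)
    return score_model(x, t) + scale * cls_grad      # conditional score

# Toy stand-ins for the unconditional SGM and a (time-aware) classifier.
score_model = lambda x, t: -x                        # score of a standard normal
W = torch.randn(32 * 32, 10)
classifier = lambda x, t: x.flatten(1) @ W
x = torch.randn(4, 1, 32, 32)
y = torch.zeros(4, dtype=torch.long)
print(guided_score(score_model, classifier, x, t=0.5, y=y).shape)
```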

Random Position Adversarial Patch for Vision Transformers

  • paper_url: http://arxiv.org/abs/2307.04066
  • repo_url: None
  • paper_authors: Mingzhen Shao
  • for: This paper proposes a novel method for generating adversarial patches that launch targeted attacks on vision transformers, overcoming the alignment constraint of previous studies.
  • methods: Instead of directly optimizing the patch with gradients, the method employs a GAN-like structure to generate the adversarial patch (G-Patch).
  • results: The generated patch achieves universal attacks on vision transformers in both digital and physical-world scenarios and is robust to brightness restriction, color transfer, and random noise; real-world attack experiments validate its effectiveness even under some very challenging conditions.
    Abstract Previous studies have shown the vulnerability of vision transformers to adversarial patches, but these studies all rely on a critical assumption: the attack patches must be perfectly aligned with the patches used for linear projection in vision transformers. Due to this stringent requirement, deploying adversarial patches for vision transformers in the physical world becomes impractical, unlike their effectiveness on CNNs. This paper proposes a novel method for generating an adversarial patch (G-Patch) that overcomes the alignment constraint, allowing the patch to launch a targeted attack at any position within the field of view. Specifically, instead of directly optimizing the patch using gradients, we employ a GAN-like structure to generate the adversarial patch. Our experiments show the effectiveness of the adversarial patch in achieving universal attacks on vision transformers, both in digital and physical-world scenarios. Additionally, further analysis reveals that the generated adversarial patch exhibits robustness to brightness restriction, color transfer, and random noise. Real-world attack experiments validate the effectiveness of the G-Patch to launch robust attacks even under some very challenging conditions.

Combining transmission speckle photography and convolutional neural network for determination of fat content in cow’s milk – an exercise in classification of parameters of a complex suspension

  • paper_url: http://arxiv.org/abs/2307.15069
  • repo_url: None
  • paper_authors: Kwasi Nyandey, Daniel Jakubczyk
  • for: direct classification and recognition of milk fat content classes
  • methods: combined transmission speckle photography and machine learning
  • results: achieved 100% test and independent classification accuracies
    Abstract We have combined transmission speckle photography and machine learning for direct classification and recognition of milk fat content classes. Our aim hinged on the fact that parameters of scattering particles (and the dispersion medium) can be linked to the intensity distribution (speckle) observed when coherent light is transmitted through a scattering medium. For milk, it is primarily the size distribution and concentration of fat globules, which constitutes the total fat content. Consequently, we trained convolutional neural network to recognise and classify laser speckle from different fat content classes (0.5, 1.5, 2.0 and 3.2%). We investigated four exposure-time protocols and obtained the highest performance for shorter exposure times, in which the intensity histograms are kept similar for all images and the most probable intensity in the speckle pattern is close to zero. Our neural network was able to recognize the milk fat content classes unambiguously and we obtained the highest test and independent classification accuracies of 100 and ~99% respectively. It indicates that the parameters of other complex realistic suspensions could be classified with similar methods.

Deep Unsupervised Learning Using Spike-Timing-Dependent Plasticity

  • paper_url: http://arxiv.org/abs/2307.04054
  • repo_url: None
  • paper_authors: Sen Lu, Abhronil Sengupta
  • for: This paper proposes an STDP-based deep unsupervised learning framework to improve the performance and scalability of SNNs.
  • methods: It combines STDP clustering with deep learning, training a convolutional network in tandem with pseudo-labels generated by the STDP clustering process on the network outputs.
  • results: Compared with a k-means clustering approach, it achieves 24.56% higher accuracy and 3.5x faster convergence at iso-accuracy on a 10-class subset of the Tiny ImageNet dataset.
    Abstract Spike-Timing-Dependent Plasticity (STDP) is an unsupervised learning mechanism for Spiking Neural Networks (SNNs) that has received significant attention from the neuromorphic hardware community. However, scaling such local learning techniques to deeper networks and large-scale tasks has remained elusive. In this work, we investigate a Deep-STDP framework where a convolutional network is trained in tandem with pseudo-labels generated by the STDP clustering process on the network outputs. We achieve $24.56\%$ higher accuracy and $3.5\times$ faster convergence speed at iso-accuracy on a 10-class subset of the Tiny ImageNet dataset in contrast to a $k$-means clustering approach.
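For readers unfamiliar with STDP, a pair-based update rule looks like the sketch below: a synapse is potentiated when the presynaptic spike precedes the postsynaptic spike and depressed otherwise, with magnitude decaying exponentially in the spike-time gap. The constants are illustrative defaults, not the paper's settings.

```python
import numpy as np

def stdp_update(w, t_pre, t_post, a_plus=0.01, a_minus=0.012, tau=20.0):
    """w: (pre, post) weight matrix; t_pre/t_post: last spike times in ms."""
    dt = t_post[None, :] - t_pre[:, None]          # pairwise spike-time gaps
    dw = np.where(dt > 0,
                  a_plus * np.exp(-dt / tau),      # pre before post: potentiation
                  -a_minus * np.exp(dt / tau))     # post before pre: depression
    return np.clip(w + dw, 0.0, 1.0)               # keep weights bounded

rng = np.random.default_rng(0)
w = rng.random((4, 3))
w = stdp_update(w, t_pre=np.array([5.0, 12.0, 30.0, 41.0]),
                t_post=np.array([10.0, 11.0, 35.0]))
print(w.round(3))
```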

Calibration-Aware Margin Loss: Pushing the Accuracy-Calibration Consistency Pareto Frontier for Deep Metric Learning

  • paper_url: http://arxiv.org/abs/2307.04047
  • repo_url: None
  • paper_authors: Qin Zhang, Linghan Xu, Qingming Tang, Jun Fang, Ying Nian Wu, Joe Tighe, Yifan Xing
  • for: This paper proposes a new way to assess the accuracy and calibration consistency of metric learning models, enabling frictionless deployment across different test distributions.
  • methods: It introduces the Operating-Point-Inconsistency-Score (OPIS), a metric measuring the variance in operating characteristics across classes within a target calibration range, and proposes a Calibration-Aware Margin (CAM) regularization that encourages more uniform representation structures across classes during training.
  • results: Experiments show that the CAM regularizer improves calibration consistency while retaining or even enhancing accuracy, outperforming state-of-the-art deep metric learning methods.
    Abstract The ability to use the same distance threshold across different test classes / distributions is highly desired for a frictionless deployment of commercial image retrieval systems. However, state-of-the-art deep metric learning losses often result in highly varied intra-class and inter-class embedding structures, making threshold calibration a non-trivial process in practice. In this paper, we propose a novel metric named Operating-Point-Inconsistency-Score (OPIS) that measures the variance in the operating characteristics across different classes in a target calibration range, and demonstrate that high accuracy of a metric learning embedding model does not guarantee calibration consistency for both seen and unseen classes. We find that, in the high-accuracy regime, there exists a Pareto frontier where accuracy improvement comes at the cost of calibration consistency. To address this, we develop a novel regularization, named Calibration-Aware Margin (CAM) loss, to encourage uniformity in the representation structures across classes during training. Extensive experiments demonstrate CAM's effectiveness in improving calibration-consistency while retaining or even enhancing accuracy, outperforming state-of-the-art deep metric learning methods.

High Fidelity 3D Hand Shape Reconstruction via Scalable Graph Frequency Decomposition

  • paper_url: http://arxiv.org/abs/2307.05541
  • repo_url: https://github.com/tyluann/freqhand
  • paper_authors: Tianyu Luan, Yuanhao Zhai, Jingjing Meng, Zhong Li, Zhang Chen, Yi Xu, Junsong Yuan
  • for: High-fidelity 3D hand model generation.
  • methods: A frequency split network generates the 3D hand mesh coarse-to-fine using different frequency bands at each resolution level, supervised by a novel frequency decomposition loss on each frequency component.
  • results: High-frequency personalized details of the hand are preserved, and the scalable network can stop inference at any resolution level to accommodate hardware with varying computational power.
    Abstract Despite the impressive performance obtained by recent single-image hand modeling techniques, they lack the capability to capture sufficient details of the 3D hand mesh. This deficiency greatly limits their applications when high-fidelity hand modeling is required, e.g., personalized hand modeling. To address this problem, we design a frequency split network to generate 3D hand mesh using different frequency bands in a coarse-to-fine manner. To capture high-frequency personalized details, we transform the 3D mesh into the frequency domain, and propose a novel frequency decomposition loss to supervise each frequency component. By leveraging such a coarse-to-fine scheme, hand details that correspond to the higher frequency domain can be preserved. In addition, the proposed network is scalable, and can stop the inference at any resolution level to accommodate different hardware with varying computational powers. To quantitatively evaluate the performance of our method in terms of recovering personalized shape details, we introduce a new evaluation metric named Mean Signal-to-Noise Ratio (MSNR) to measure the signal-to-noise ratio of each mesh frequency component. Extensive experiments demonstrate that our approach generates fine-grained details for high-fidelity 3D hand reconstruction, and our evaluation metric is more effective for measuring mesh details compared with traditional metrics.

  • paper_url: http://arxiv.org/abs/2307.04014
  • repo_url: None
  • paper_authors: Amirhossein Askari-Farsangi, Ali Sharifi-Zarchi, Mohammad Hossein Rohban
  • for: This study provides a deep-learning-based method for diagnosing Acute Lymphoblastic Leukemia (ALL) that addresses the shortcut learning caused by small medical training datasets.
  • methods: The model is constrained to follow a pipeline inspired by experts' workflow, and since a judgement based on a single image is insufficient, the task is redefined as a multiple-instance learning problem over multiple images.
  • results: The model achieves 96.15% accuracy, a 94.24% F1-score, 97.56% sensitivity, and 90.91% specificity on ALL IDB 1, with acceptable performance on a challenging out-of-distribution dataset.
    Abstract Acute Lymphoblastic Leukemia (ALL) is one of the most common types of childhood blood cancer. The quick start of the treatment process is critical to saving the patient's life, and for this reason, early diagnosis of this disease is essential. Examining the blood smear images of these patients is one of the methods used by expert doctors to diagnose this disease. Deep learning-based methods have numerous applications in medical fields, as they have significantly advanced in recent years. ALL diagnosis is not an exception in this field, and several machine learning-based methods for this problem have been proposed. In previous methods, high diagnostic accuracy was reported, but our work showed that this alone is not sufficient, as it can lead to models taking shortcuts and not making meaningful decisions. This issue arises due to the small size of medical training datasets. To address this, we constrained our model to follow a pipeline inspired by experts' work. We also demonstrated that, since a judgement based on only one image is insufficient, redefining the problem as a multiple-instance learning problem is necessary for achieving a practical result. Our model is the first to provide a solution to this problem in a multiple-instance learning setup. We introduced a novel pipeline for diagnosing ALL that approximates the process used by hematologists, is sensitive to disease biomarkers, and achieves an accuracy of 96.15%, an F1-score of 94.24%, a sensitivity of 97.56%, and a specificity of 90.91% on ALL IDB 1. Our method was further evaluated on an out-of-distribution dataset, which posed a challenging test and had acceptable performance. Notably, our model was trained on a relatively small dataset, highlighting the potential for our approach to be applied to other medical datasets with limited data availability.
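The multiple-instance reformulation can be illustrated with attention-based MIL pooling (in the style of Ilse et al., 2018): several smear images form one bag, instance embeddings are combined with learned attention weights, and a single bag-level diagnosis is predicted. The backbone, feature size, and head below are placeholders, not the authors' architecture.

```python
import torch
import torch.nn as nn

class MILClassifier(nn.Module):
    def __init__(self, feat_dim=128, n_classes=2):
        super().__init__()
        self.attn = nn.Sequential(nn.Linear(feat_dim, 64), nn.Tanh(),
                                  nn.Linear(64, 1))
        self.head = nn.Linear(feat_dim, n_classes)

    def forward(self, instance_feats):          # (n_instances, feat_dim)
        a = torch.softmax(self.attn(instance_feats), dim=0)  # attention weights
        bag = (a * instance_feats).sum(dim=0)   # weighted bag embedding
        return self.head(bag), a.squeeze(-1)    # diagnosis logits + weights

feats = torch.randn(12, 128)                    # 12 images from one patient
logits, weights = MILClassifier()(feats)
print(logits.shape, weights.sum().item())       # (2,), weights sum to 1
```

The attention weights double as a crude indication of which images drove the bag-level decision, which fits the paper's concern with interpretable, meaningful decisions.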

BPNet: Bézier Primitive Segmentation on 3D Point Clouds

  • paper_url: http://arxiv.org/abs/2307.04013
  • repo_url: https://github.com/bizerfr/bpnet
  • paper_authors: Rao Fu, Cheng Wen, Qian Li, Xiao Xiao, Pierre Alliez
  • for: This paper proposes BPNet, a novel end-to-end deep learning framework for learning Bézier primitive segmentation on 3D point clouds. Existing methods treat different primitive types separately and are thus limited to finite shape categories, motivating a generalized primitive segmentation on point clouds.
  • methods: Inspired by Bézier decomposition on NURBS models, the approach transfers it to guide point cloud segmentation independent of primitive types, jointly optimizing Bézier primitive segmentation and geometric fitting on a cascaded architecture. A soft voting regularizer improves primitive segmentation, and an auto-weight embedding module clusters point features, making the network more robust and generic.
  • results: Extensive experiments on the synthetic ABC dataset and real-scan datasets show superior segmentation performance over baseline methods, with a substantially faster inference speed.
    Abstract This paper proposes BPNet, a novel end-to-end deep learning framework to learn B\'ezier primitive segmentation on 3D point clouds. The existing works treat different primitive types separately, thus limiting them to finite shape categories. To address this issue, we seek a generalized primitive segmentation on point clouds. Taking inspiration from B\'ezier decomposition on NURBS models, we transfer it to guide point cloud segmentation casting off primitive types. A joint optimization framework is proposed to learn B\'ezier primitive segmentation and geometric fitting simultaneously on a cascaded architecture. Specifically, we introduce a soft voting regularizer to improve primitive segmentation and propose an auto-weight embedding module to cluster point features, making the network more robust and generic. We also introduce a reconstruction module where we successfully process multiple CAD models with different primitives simultaneously. We conducted extensive experiments on the synthetic ABC dataset and real-scan datasets to validate and compare our approach with different baseline methods. Experiments show superior performance over previous work in terms of segmentation, with a substantially faster inference speed.
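
To illustrate the geometric-fitting half of the pipeline, the sketch below least-squares-fits a single cubic Bézier segment to ordered 3D points. The chord-length parameterization and single-segment setup are simplifying assumptions, not BPNet's learned fitting module.

```python
import numpy as np

def fit_cubic_bezier(points):
    """Least-squares control points of one cubic Bezier through ordered
    points, using chord-length parameterization t in [0, 1]."""
    d = np.linalg.norm(np.diff(points, axis=0), axis=1)
    t = np.concatenate([[0.0], np.cumsum(d)]) / d.sum()
    # Bernstein basis matrix B: points ~= B @ control_points
    B = np.stack([(1 - t) ** 3,
                  3 * t * (1 - t) ** 2,
                  3 * t ** 2 * (1 - t),
                  t ** 3], axis=1)
    ctrl, *_ = np.linalg.lstsq(B, points, rcond=None)
    return ctrl, B

# Noisy samples from a 3D curve.
s = np.linspace(0, 1, 40)
pts = np.stack([s, s ** 2, np.sin(2 * s)], axis=1) + 0.01 * np.random.randn(40, 3)
ctrl, B = fit_cubic_bezier(pts)
print("control points:\n", ctrl.round(3))
print("RMS fit error:", np.sqrt(((B @ ctrl - pts) ** 2).mean()).round(4))
```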

cs.AI - 2023-07-09

On the Challenges of Deploying Privacy-Preserving Synthetic Data in the Enterprise

  • paper_url: http://arxiv.org/abs/2307.04208
  • repo_url: None
  • paper_authors: Lauren Arthur, Jason Costello, Jonathan Hardy, Will O’Brien, James Rea, Gareth Rees, Georgi Ganev
  • for: This work examines the challenges enterprises face when deploying synthetic data, a subfield of generative AI, with a particular focus on privacy concerns raised by vast amounts of personal and highly sensitive data.
  • methods: The study systematizes 40+ challenges into five main groups: generation; infrastructure & architecture; governance; compliance & regulation; and adoption.
  • results: It proposes a strategic and systematic approach that enterprises can employ to address the challenges effectively and to establish trust in the implemented solutions.
    Abstract Generative AI technologies are gaining unprecedented popularity, causing a mix of excitement and apprehension through their remarkable capabilities. In this paper, we study the challenges associated with deploying synthetic data, a subfield of Generative AI. Our focus centers on enterprise deployment, with an emphasis on privacy concerns caused by the vast amount of personal and highly sensitive data. We identify 40+ challenges and systematize them into five main groups -- i) generation, ii) infrastructure & architecture, iii) governance, iv) compliance & regulation, and v) adoption. Additionally, we discuss a strategic and systematic approach that enterprises can employ to effectively address the challenges and achieve their goals by establishing trust in the implemented solutions.

Natural Language Instructions for Intuitive Human Interaction with Robotic Assistants in Field Construction Work

  • paper_url: http://arxiv.org/abs/2307.04195
  • repo_url: None
  • paper_authors: Somin Park, Xi Wang, Carol C. Menassa, Vineet R. Kamat, Joyce Y. Chai
  • for: This paper provides a framework for human workers to interact with construction robots through natural language instructions, enabling intuitive and familiar communication and improving teamwork and supervision in field construction.
  • methods: The proposed method consists of three stages: Natural Language Understanding (NLU), Information Mapping (IM), and Robot Control (RC). The NLU module uses a language model to predict a tag for each word in the input natural language instruction; the IM module combines the NLU output with building component information to generate the final instructional output the robot needs to acknowledge and perform the construction task.
  • results: A case study on drywall installation evaluates the proposed approach; the results highlight the potential of natural language-based interaction to replicate the communication that occurs between human workers within human-robot teams.
    Abstract The introduction of robots is widely considered to have significant potential of alleviating the issues of worker shortage and stagnant productivity that afflict the construction industry. However, it is challenging to use fully automated robots in complex and unstructured construction sites. Human-Robot Collaboration (HRC) has shown promise of combining human workers' flexibility and robot assistants' physical abilities to jointly address the uncertainties inherent in construction work. When introducing HRC in construction, it is critical to recognize the importance of teamwork and supervision in field construction and establish a natural and intuitive communication system for the human workers and robotic assistants. Natural language-based interaction can enable intuitive and familiar communication with robots for human workers who are non-experts in robot programming. However, limited research has been conducted on this topic in construction. This paper proposes a framework to allow human workers to interact with construction robots based on natural language instructions. The proposed method consists of three stages: Natural Language Understanding (NLU), Information Mapping (IM), and Robot Control (RC). Natural language instructions are input to a language model to predict a tag for each word in the NLU module. The IM module uses the result of the NLU module and building component information to generate the final instructional output essential for a robot to acknowledge and perform the construction task. A case study for drywall installation is conducted to evaluate the proposed approach. The obtained results highlight the potential of using natural language-based interaction to replicate the communication that occurs between human workers within the context of human-robot teams.
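
A toy version of the NLU → IM hand-off might look like the sketch below: a rule-based tagger stands in for the paper's language model, and the component lookup table is an invented example rather than real building-information data.

```python
# Hypothetical three-stage pipeline: tag words (NLU), map tags plus
# component data to a command (IM), and emit a robot instruction (RC).
ACTIONS = {"install", "move", "lift"}
COMPONENTS = {"drywall": {"id": "DW-07", "pose": (2.0, 1.5, 0.0)}}

def nlu_tag(instruction):
    """Assign a coarse tag to every word of the instruction."""
    tags = []
    for word in instruction.lower().split():
        if word in ACTIONS:
            tags.append((word, "ACTION"))
        elif word in COMPONENTS:
            tags.append((word, "COMPONENT"))
        else:
            tags.append((word, "O"))
    return tags

def information_mapping(tags):
    """Combine tagged words with component info into a robot command."""
    action = next(w for w, t in tags if t == "ACTION")
    comp = next(w for w, t in tags if t == "COMPONENT")
    info = COMPONENTS[comp]
    return {"action": action, "component_id": info["id"], "target_pose": info["pose"]}

command = information_mapping(nlu_tag("please install the drywall on the north wall"))
print(command)  # {'action': 'install', 'component_id': 'DW-07', 'target_pose': (2.0, 1.5, 0.0)}
```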

SAS Video-QA: Self-Adaptive Sampling for Efficient Video Question-Answering

  • paper_url: http://arxiv.org/abs/2307.04192
  • repo_url: https://github.com/declare-lab/sas-vqa
  • paper_authors: Wei Han, Hui Chen, Min-Yen Kan, Soujanya Poria
  • for: To improve the effectiveness and efficiency of video understanding models, especially in real-time application scenarios.
  • methods: Two frame sampling strategies are proposed, most domain frames (MDF) and most implied frames (MIF), to maximally preserve the frames most likely to be vital to a given question. MDF passively minimizes the risk of omitting key frames in a bootstrap manner, while MIF actively searches for key frames customized to each video-question pair with the help of auxiliary models.
  • results: Experiments with three advanced vision-language models (CLIP, GIT and All-in-one) on three public datasets show that the proposed sampling strategies boost the performance of image-text pretrained models.
    Abstract Video question--answering is a fundamental task in the field of video understanding. Although current vision--language models (VLMs) equipped with Video Transformers have enabled temporal modeling and yielded superior results, they are at the cost of huge computational power and thus too expensive to deploy in real-time application scenarios. An economical workaround only samples a small portion of frames to represent the main content of that video and tune an image--text model on these sampled frames. Recent video understanding models usually randomly sample a set of frames or clips, regardless of internal correlations between their visual contents, nor their relevance to the problem. We argue that such kinds of aimless sampling may omit the key frames from which the correct answer can be deduced, and the situation gets worse when the sampling sparsity increases, which always happens as the video lengths increase. To mitigate this issue, we propose two frame sampling strategies, namely the most domain frames (MDF) and most implied frames (MIF), to maximally preserve those frames that are most likely vital to the given questions. MDF passively minimizes the risk of key frame omission in a bootstrap manner, while MIF actively searches key frames customized for each video--question pair with the assistance of auxiliary models. The experimental results on three public datasets from three advanced VLMs (CLIP, GIT and All-in-one) demonstrate that our proposed strategies can boost the performance for image--text pretrained models. The source codes pertaining to the method proposed in this paper are publicly available at https://github.com/declare-lab/sas-vqa.
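
The sampling idea can be illustrated with a simple greedy scheme: repeatedly pick the frame least similar to those already chosen, using random features as a cheap stand-in for the model's visual features. The greedy rule and the histogram-like features are illustrative assumptions, not the MDF/MIF procedures themselves.

```python
import numpy as np

def greedy_diverse_frames(features, k):
    """Pick k frame indices that greedily minimize redundancy
    (maximum cosine similarity) with the frames already selected."""
    feats = features / np.linalg.norm(features, axis=1, keepdims=True)
    chosen = [0]                                  # seed with the first frame
    while len(chosen) < k:
        sims = feats @ feats[chosen].T            # (n_frames, n_chosen)
        redundancy = sims.max(axis=1)             # worst-case similarity
        redundancy[chosen] = np.inf               # never re-pick a frame
        chosen.append(int(redundancy.argmin()))
    return sorted(chosen)

# 120 synthetic "frames", each summarized by a 48-bin colour histogram.
rng = np.random.default_rng(1)
frames = rng.random((120, 48))
print("sampled frame indices:", greedy_diverse_frames(frames, k=8))
```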

Review of feedback in Automated Essay Scoring

  • paper_url: http://arxiv.org/abs/2307.05553
  • repo_url: None
  • paper_authors: You-Jin Jong, Yong-Jin Kim, Ok-Chol Ri
  • for: This paper surveys the development of automated essay scoring systems and their use as learning tools for improving writing skill.
  • methods: It reviews existing research and the latest case studies, covering different feedback types and essay traits in automated essay scoring.
  • results: The review finds that feedback is the key factor in making an automated essay scoring system useful in real life, helping users improve their writing skill.
    Abstract The first automated essay scoring system was developed 50 years ago. Automated essay scoring systems are developing into systems with richer functions than the previous simple scoring systems. Its purpose is not only to score essays but also as a learning tool to improve the writing skill of users. Feedback is the most important aspect of making an automated essay scoring system useful in real life. The importance of feedback was already emphasized in the first AES system. This paper reviews research on feedback including different feedback types and essay traits on automated essay scoring. We also reviewed the latest case studies of the automated essay scoring system that provides feedback.

Latent Graph Attention for Enhanced Spatial Context

  • paper_url: http://arxiv.org/abs/2307.04149
  • repo_url: None
  • paper_authors: Ayush Singh, Yash Bhambhu, Himanshu Buckchash, Deepak K. Gupta, Dilip K. Prasad
  • for: This paper proposes a computationally inexpensive and stable framework for modeling global context, aimed at improving performance on image-to-image translation tasks.
  • methods: The proposed Latent Graph Attention (LGA) model propagates information spatially through a network of locally connected graphs, with the depth of the graph network controlling how far the contextual spread extends; a novel contrastive loss term helps the LGA module couple well with the original architecture at minimal additional computational load.
  • results: Experiments on three challenging applications (transparent object segmentation, image restoration for dehazing, and optical flow estimation) show that incorporating LGA improves performance and brings small-scale architectures closer to the performance of much larger ones.
    Abstract Global contexts in images are quite valuable in image-to-image translation problems. Conventional attention-based and graph-based models capture the global context to a large extent, however, these are computationally expensive. Moreover, the existing approaches are limited to only learning the pairwise semantic relation between any two points on the image. In this paper, we present Latent Graph Attention (LGA) a computationally inexpensive (linear to the number of nodes) and stable, modular framework for incorporating the global context in the existing architectures, especially empowering small-scale architectures to give performance closer to large size architectures, thus making the light-weight architectures more useful for edge devices with lower compute power and lower energy needs. LGA propagates information spatially using a network of locally connected graphs, thereby facilitating to construct a semantically coherent relation between any two spatially distant points that also takes into account the influence of the intermediate pixels. Moreover, the depth of the graph network can be used to adapt the extent of contextual spread to the target dataset, thereby being able to explicitly control the added computational cost. To enhance the learning mechanism of LGA, we also introduce a novel contrastive loss term that helps our LGA module to couple well with the original architecture at the expense of minimal additional computational load. We show that incorporating LGA improves the performance on three challenging applications, namely transparent object segmentation, image restoration for dehazing and optical flow estimation.
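
A bare-bones version of propagating information over a locally connected graph on an image grid is sketched below; the 4-neighbour adjacency, uniform averaging, and number of propagation steps are all illustrative assumptions, not LGA's learned attention.

```python
import numpy as np

def grid_neighbors(h, w):
    """4-connected neighbour lists for an h x w pixel grid."""
    idx = lambda r, c: r * w + c
    nbrs = [[] for _ in range(h * w)]
    for r in range(h):
        for c in range(w):
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < h and 0 <= cc < w:
                    nbrs[idx(r, c)].append(idx(rr, cc))
    return nbrs

def propagate(features, nbrs, steps=3, alpha=0.5):
    """Each step mixes every node with the mean of its neighbours, so
    information from distant pixels arrives via intermediate ones."""
    x = features.copy()
    for _ in range(steps):
        agg = np.stack([x[n].mean(axis=0) for n in nbrs])
        x = (1 - alpha) * x + alpha * agg
    return x

h, w, d = 8, 8, 16
x = np.random.randn(h * w, d)
out = propagate(x, grid_neighbors(h, w), steps=3)
print(out.shape)  # (64, 16) -- each pixel now carries roughly 3-hop context
```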

A Survey and Approach to Chart Classification

  • paper_url: http://arxiv.org/abs/2307.04147
  • repo_url: None
  • paper_authors: Anurag Dhote, Mohammed Javed, David S Doermann
  • for: This paper addresses automatic chart classification, the first step toward understanding the information charts convey.
  • methods: It surveys the current state-of-the-art chart classification techniques, broadly grouped into traditional ML approaches, CNNs, and transformers, and discusses the available datasets and the chart types they support.
  • results: A vision transformer-based chart classification model is presented, and an extensive comparative performance analysis on the CHARTINFO UB-UNITECH PMC dataset (from the CHART-Infographics competition at ICPR 2022) yields state-of-the-art chart classification results.
    Abstract Charts represent an essential source of visual information in documents and facilitate a deep understanding and interpretation of information typically conveyed numerically. In the scientific literature, there are many charts, each with its stylistic differences. Recently the document understanding community has begun to address the problem of automatic chart understanding, which begins with chart classification. In this paper, we present a survey of the current state-of-the-art techniques for chart classification and discuss the available datasets and their supported chart types. We broadly classify these contributions as traditional approaches based on ML, CNN, and Transformers. Furthermore, we carry out an extensive comparative performance analysis of CNN-based and transformer-based approaches on the recently published CHARTINFO UB-UNITECH PMC dataset for the CHART-Infographics competition at ICPR 2022. The data set includes 15 different chart categories, including 22,923 training images and 13,260 test images. We have implemented a vision-based transformer model that produces state-of-the-art results in chart classification.
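
As a sketch of the transformer-based route, the snippet below swaps a 15-way chart-category head onto a stock torchvision ViT; the backbone choice, head size, and training-free setup are assumptions for illustration, not the authors' model.

```python
import torch
from torch import nn
from torchvision.models import vit_b_16

NUM_CHART_TYPES = 15        # the CHARTINFO set has 15 chart categories

model = vit_b_16(weights=None)                 # or pretrained weights in practice
model.heads = nn.Linear(768, NUM_CHART_TYPES)  # replace the classification head

dummy_charts = torch.randn(4, 3, 224, 224)     # a fake batch of chart images
logits = model(dummy_charts)
print(logits.shape)           # torch.Size([4, 15])
print(logits.argmax(dim=1))   # predicted chart class per image
```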

Emotion Analysis on EEG Signal Using Machine Learning and Neural Network

  • paper_url: http://arxiv.org/abs/2307.05375
  • repo_url: None
  • paper_authors: S. M. Masrur Ahmed, Eshaan Tanzim Sabur
  • for: The goal of this work is to improve the performance of emotion recognition from brain waves.
  • methods: EEG signal processing techniques are combined with machine learning and neural network models, including SVM, KNN, and an RNN trained with LSTM.
  • results: Several emotional states were classified and tested on the DEAP dataset with high recognition accuracy; changes in emotion over time were also examined.
    Abstract Emotion has a significant influence on how one thinks and interacts with others. It serves as a link between how a person feels and the actions one takes, or it could be said that it influences one's life decisions on occasion. Since the patterns of emotions and their reflections vary from person to person, their inquiry must be based on approaches that are effective over a wide range of population regions. To extract features and enhance accuracy, emotion recognition using brain waves or EEG signals requires the implementation of efficient signal processing techniques. Various approaches to human-machine interaction technologies have been ongoing for a long time, and in recent years, researchers have had great success in automatically understanding emotion using brain signals. In our research, several emotional states were classified and tested on EEG signals collected from a well-known publicly available dataset, the DEAP Dataset, using SVM (Support Vector Machine), KNN (K-Nearest Neighbor), and an advanced neural network model, RNN (Recurrent Neural Network), trained with LSTM (Long Short Term Memory). The main purpose of this study is to improve ways to improve emotion recognition performance using brain signals. Emotions, on the other hand, can change with time. As a result, the changes in emotion over time are also examined in our research.
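
A minimal version of the classical pipeline (band-power features → SVM) is sketched below on synthetic data standing in for DEAP recordings; the feature layout, labels, and class-dependent shift are illustrative assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(42)

# Synthetic stand-in for DEAP: 400 trials x (32 channels * 4 bands) band
# powers, binary valence labels with a weak class-dependent shift so that
# there is actually something to learn.
X = rng.normal(size=(400, 32 * 4))
y = rng.integers(0, 2, size=400)
X[y == 1] += 0.3

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
clf.fit(X_tr, y_tr)
print("valence accuracy:", clf.score(X_te, y_te))
```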

A Novel Explainable Artificial Intelligence Model in Image Classification problem

  • paper_url: http://arxiv.org/abs/2307.04137
  • repo_url: None
  • paper_authors: Quoc Hung Cao, Truong Thanh Hung Nguyen, Vo Thanh Khang Nguyen, Xuan Phong Nguyen
  • for: This work aims to provide a new explanation method for image classification models, helping AI scientists and practitioners better understand what is going on inside these models.
  • methods: It builds on existing explanation algorithms such as LIME, CAM, and GradCAM, combining their advantages while overcoming their drawbacks, in particular LIME's long execution time and CAM's sometimes confusing interpretations.
  • results: Evaluated with several image classification models (ResNet50, Inception-v3, and VGG16) on the ILSVRC dataset, the method produced specific explanations in a remarkably concise time while achieving strong results in both accuracy and explanation quality.
    Abstract In recent years, artificial intelligence is increasingly being applied widely in many different fields and has a profound and direct impact on human life. Following this is the need to understand the principles of the model making predictions. Since most of the current high-precision models are black boxes, neither the AI scientist nor the end-user deeply understands what's going on inside these models. Therefore, many algorithms are studied for the purpose of explaining AI models, especially those in the problem of image classification in the field of computer vision such as LIME, CAM, GradCAM. However, these algorithms still have limitations such as LIME's long execution time and CAM's confusing interpretation of concreteness and clarity. Therefore, in this paper, we propose a new method called Segmentation - Class Activation Mapping (SeCAM) that combines the advantages of these algorithms above, while at the same time overcoming their disadvantages. We tested this algorithm with various models, including ResNet50, Inception-v3, VGG16 from ImageNet Large Scale Visual Recognition Challenge (ILSVRC) data set. Outstanding results when the algorithm has met all the requirements for a specific explanation in a remarkably concise time.
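
The core of a segmentation-plus-CAM combination can be sketched in a few lines: average a CAM heatmap within superpixel-like segments so each region receives one importance score. The uniform grid "segments" below stand in for a real segmentation; none of this is the authors' exact SeCAM formulation.

```python
import numpy as np

def segment_cam(cam, segments):
    """Replace every pixel's CAM value with the mean CAM value of its
    segment, yielding region-level importance scores."""
    out = np.zeros_like(cam, dtype=float)
    for seg_id in np.unique(segments):
        mask = segments == seg_id
        out[mask] = cam[mask].mean()
    return out

# Fake 14x14 CAM and a block "segmentation" standing in for superpixels.
rng = np.random.default_rng(3)
cam = rng.random((14, 14))
rows, cols = np.indices((14, 14))
segments = (rows // 7) * 2 + (cols // 7)     # 4 coarse segments

seccam = segment_cam(cam, segments)
print("top segment importance:", seccam.max().round(3))
```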

Reasoning over the Behaviour of Objects in Video-Clips for Adverb-Type Recognition

  • paper_url: http://arxiv.org/abs/2307.04132
  • repo_url: None
  • paper_authors: Amrit Diggavi Seshadri, Alessandra Russo
  • for: This work aims to recognize scene adverbs from raw video clips without assuming knowledge of the clips' underlying action types.
  • methods: A new framework is proposed that reasons over object behaviours extracted from raw video clips to recognize a clip's corresponding adverb-types, comprising a novel pipeline for extracting human-interpretable object-behaviour facts together with symbolic and transformer-based reasoning methods that operate over those facts.
  • results: Experiments show that the proposed methods perform favourably against the previous state-of-the-art; in addition, two new datasets of object-behaviour facts extracted from raw video clips, MSR-VTT-ASP and ActivityNet-ASP, are released to support symbolic video processing.
    Abstract In this work, following the intuition that adverbs describing scene-sequences are best identified by reasoning over high-level concepts of object-behavior, we propose the design of a new framework that reasons over object-behaviours extracted from raw-video-clips to recognize the clip's corresponding adverb-types. Importantly, while previous works for general scene adverb-recognition assume knowledge of the clips underlying action-types, our method is directly applicable in the more general problem setting where the action-type of a video-clip is unknown. Specifically, we propose a novel pipeline that extracts human-interpretable object-behaviour-facts from raw video clips and propose novel symbolic and transformer based reasoning methods that operate over these extracted facts to identify adverb-types. Experiment results demonstrate that our proposed methods perform favourably against the previous state-of-the-art. Additionally, to support efforts in symbolic video-processing, we release two new datasets of object-behaviour-facts extracted from raw video clips - the MSR-VTT-ASP and ActivityNet-ASP datasets.

Carbon-Efficient Neural Architecture Search

  • paper_url: http://arxiv.org/abs/2307.04131
  • repo_url: None
  • paper_authors: Yiyang Zhao, Tian Guo
  • for: To reduce the energy costs and carbon footprint of the neural network design process.
  • methods: A carbon-efficient neural architecture search framework (CE-NAS) is proposed, consisting of NAS evaluation algorithms with different energy requirements, a multi-objective optimizer, and a heuristic GPU allocation strategy.
  • results: Trace-driven simulations using a recent NAS benchmark dataset and two carbon traces show that CE-NAS achieves better carbon and search efficiency than the three baselines.
    Abstract This work presents a novel approach to neural architecture search (NAS) that aims to reduce energy costs and increase carbon efficiency during the model design process. The proposed framework, called carbon-efficient NAS (CE-NAS), consists of NAS evaluation algorithms with different energy requirements, a multi-objective optimizer, and a heuristic GPU allocation strategy. CE-NAS dynamically balances energy-efficient sampling and energy-consuming evaluation tasks based on current carbon emissions. Using a recent NAS benchmark dataset and two carbon traces, our trace-driven simulations demonstrate that CE-NAS achieves better carbon and search efficiency than the three baselines.
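
The scheduling intuition, spend GPUs on cheap sampling when carbon intensity is high and on expensive evaluation when it is low, can be sketched as below; the threshold rule and the numbers are invented for illustration and are not CE-NAS's optimizer.

```python
# Hypothetical carbon-aware allocator: given a forecast of grid carbon
# intensity (gCO2/kWh) per hour, decide how many of N GPUs to devote to
# low-energy sampling vs. high-energy candidate evaluation.
def allocate_gpus(carbon_trace, n_gpus=8, low=100, high=300):
    plan = []
    for intensity in carbon_trace:
        if intensity >= high:        # dirty grid: mostly cheap sampling
            eval_gpus = 1
        elif intensity <= low:       # clean grid: mostly costly evaluation
            eval_gpus = n_gpus - 1
        else:                        # interpolate in between
            frac = (high - intensity) / (high - low)
            eval_gpus = 1 + round(frac * (n_gpus - 2))
        plan.append({"eval": eval_gpus, "sample": n_gpus - eval_gpus})
    return plan

trace = [420, 380, 250, 160, 90, 80, 140, 310]   # toy 8-hour carbon forecast
for hour, slot in enumerate(allocate_gpus(trace)):
    print(f"hour {hour}: {slot}")
```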

FILM: How can Few-Shot Image Classification Benefit from Pre-Trained Language Models?

  • paper_url: http://arxiv.org/abs/2307.04114
  • repo_url: None
  • paper_authors: Zihao Jiang, Yunkai Dang, Dong Pang, Huishuai Zhang, Weiran Huang
  • for: To enhance few-shot learning so that models can generalize to novel classes from only a few samples.
  • methods: Pre-trained language models based on contrastive learning are used to extract semantic information, and a metric module generalizing cosine similarity aligns visual features with textual embeddings; the metric module adapts to different few-shot tasks, and the model is trained via bi-level optimization with MAML.
  • results: Extensive experiments on multiple benchmarks demonstrate the effectiveness of the method.
    Abstract Few-shot learning aims to train models that can be generalized to novel classes with only a few samples. Recently, a line of works are proposed to enhance few-shot learning with accessible semantic information from class names. However, these works focus on improving existing modules such as visual prototypes and feature extractors of the standard few-shot learning framework. This limits the full potential use of semantic information. In this paper, we propose a novel few-shot learning framework that uses pre-trained language models based on contrastive learning. To address the challenge of alignment between visual features and textual embeddings obtained from text-based pre-trained language model, we carefully design the textual branch of our framework and introduce a metric module to generalize the cosine similarity. For better transferability, we let the metric module adapt to different few-shot tasks and adopt MAML to train the model via bi-level optimization. Moreover, we conduct extensive experiments on multiple benchmarks to demonstrate the effectiveness of our method.
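
The alignment step can be pictured as a temperature-scaled cosine similarity between image features and class-name embeddings, as in the sketch below; the random features and the single fixed temperature are stand-ins for the paper's trained encoders and adaptive metric module.

```python
import numpy as np

def cosine_logits(img_feats, text_embeds, temperature=0.07):
    """Classify images by scaled cosine similarity to class-name embeddings."""
    img = img_feats / np.linalg.norm(img_feats, axis=1, keepdims=True)
    txt = text_embeds / np.linalg.norm(text_embeds, axis=1, keepdims=True)
    return (img @ txt.T) / temperature        # (n_images, n_classes)

rng = np.random.default_rng(7)
class_names = ["malamute", "lion", "ant", "vase", "school bus"]  # 5-way task
text_embeds = rng.normal(size=(5, 128))       # stand-in for language-model output
img_feats = text_embeds[[2, 0, 4]] + 0.5 * rng.normal(size=(3, 128))  # 3 queries

logits = cosine_logits(img_feats, text_embeds)
print([class_names[i] for i in logits.argmax(axis=1)])  # ['ant', 'malamute', 'school bus']
```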

A User Study on Explainable Online Reinforcement Learning for Adaptive Systems

  • paper_url: http://arxiv.org/abs/2307.04098
  • repo_url: None
  • paper_authors: Andreas Metzger, Jan Laufer, Felix Feit, Klaus Pohl
  • for: The paper evaluates the effectiveness and usability of an explainable reinforcement learning technique (XRL-DINE) that helps software engineers understand and debug adaptive systems.
  • methods: An empirical user study involving 54 software engineers assesses their performance when performing different tasks using XRL-DINE, as well as the perceived usefulness and ease of use of XRL-DINE.
  • results: The study finds that XRL-DINE provides visual insights into why certain decisions were made at important time points, and that software engineers perceive XRL-DINE as useful and easy to use.
    Abstract Online reinforcement learning (RL) is increasingly used for realizing adaptive systems in the presence of design time uncertainty. Online RL facilitates learning from actual operational data and thereby leverages feedback only available at runtime. However, Online RL requires the definition of an effective and correct reward function, which quantifies the feedback to the RL algorithm and thereby guides learning. With Deep RL gaining interest, the learned knowledge is no longer explicitly represented, but is represented as a neural network. For a human, it becomes practically impossible to relate the parametrization of the neural network to concrete RL decisions. Deep RL thus essentially appears as a black box, which severely limits the debugging of adaptive systems. We previously introduced the explainable RL technique XRL-DINE, which provides visual insights into why certain decisions were made at important time points. Here, we introduce an empirical user study involving 54 software engineers from academia and industry to assess (1) the performance of software engineers when performing different tasks using XRL-DINE and (2) the perceived usefulness and ease of use of XRL-DINE.

DebateKG: Automatic Policy Debate Case Creation with Semantic Knowledge Graphs

  • paper_url: http://arxiv.org/abs/2307.04090
  • repo_url: https://github.com/hellisotherpeople/debatekg
  • paper_authors: Allen Roush
  • for: The goal of this work is to apply natural language processing to problems in competitive debate, in particular the construction of high-quality debate cases.
  • methods: This is done via constrained shortest path traversals over argumentative semantic knowledge graphs.
  • results: In the context of American Policy Debate, the authors significantly extend the DebateSum dataset with 53,180 new examples and further useful metadata, contribute 9 semantic knowledge graphs built with the txtai toolchain, and create a new method for evaluating which knowledge graphs are better for producing policy debate cases.
    Abstract Recent work within the Argument Mining community has shown the applicability of Natural Language Processing systems for solving problems found within competitive debate. One of the most important tasks within competitive debate is for debaters to create high quality debate cases. We show that effective debate cases can be constructed using constrained shortest path traversals on Argumentative Semantic Knowledge Graphs. We study this potential in the context of a type of American Competitive Debate, called Policy Debate, which already has a large scale dataset targeting it called DebateSum. We significantly improve upon DebateSum by introducing 53180 new examples, as well as further useful metadata for every example, to the dataset. We leverage the txtai semantic search and knowledge graph toolchain to produce and contribute 9 semantic knowledge graphs built on this dataset. We create a unique method for evaluating which knowledge graphs are better in the context of producing policy debate cases. A demo which automatically generates debate cases, along with all other code and the Knowledge Graphs, are open-sourced and made available to the public here: https://github.com/Hellisotherpeople/DebateKG
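
A stripped-down flavour of building a case by path traversal is shown below: arguments are nodes, semantic-similarity edges carry weights, and a shortest weighted path from a topic node to an impact node strings arguments together. The tiny graph and the "banned node" constraint are invented for illustration; the real construction lives in the DebateKG repository.

```python
import networkx as nx

G = nx.Graph()
edges = [
    ("fossil subsidies", "emissions rise", 0.2),
    ("emissions rise", "warming accelerates", 0.1),
    ("warming accelerates", "crop failure", 0.3),
    ("fossil subsidies", "budget strain", 0.6),
    ("budget strain", "crop failure", 0.9),
]
for u, v, w in edges:
    G.add_edge(u, v, weight=w)   # lower weight = stronger semantic link

def build_case(graph, start, goal, banned=()):
    """Constrained traversal: shortest weighted argument chain that
    avoids any banned nodes."""
    sub = graph.subgraph(n for n in graph if n not in set(banned))
    return nx.shortest_path(sub, start, goal, weight="weight")

print(build_case(G, "fossil subsidies", "crop failure"))
print(build_case(G, "fossil subsidies", "crop failure",
                 banned=["warming accelerates"]))
```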

Semi Supervised Meta Learning for Spatiotemporal Learning

  • paper_url: http://arxiv.org/abs/2308.01916
  • repo_url: None
  • paper_authors: Faraz Waseem, Pratyush Muthukumar
  • for: The goal of this work is to apply meta-learning to self-supervised masked autoencoders for spatiotemporal learning.
  • methods: Meta-learning is integrated into existing state-of-the-art architectures in three steps using the Memory Augmented Neural Network (MANN) architecture: first, a pre-trained MAE is fine-tuned on a small-scale spatiotemporal dataset for video reconstruction; next, an MAE encoder is trained with a classification head for action classification; finally, a pre-trained MAE is fine-tuned with a MANN backbone for action classification.
  • results: The experiments indicate that applying meta-learning on top of existing state-of-the-art representation learning architectures can improve spatiotemporal learning performance.
    Abstract We approached the goal of applying meta-learning to self-supervised masked autoencoders for spatiotemporal learning in three steps. Broadly, we seek to understand the impact of applying meta-learning to existing state-of-the-art representation learning architectures. Thus, we test spatiotemporal learning through: a meta-learning architecture only, a representation learning architecture only, and an architecture applying representation learning alongside a meta learning architecture. We utilize the Memory Augmented Neural Network (MANN) architecture to apply meta-learning to our framework. Specifically, we first experiment with applying a pre-trained MAE and fine-tuning on our small-scale spatiotemporal dataset for video reconstruction tasks. Next, we experiment with training an MAE encoder and applying a classification head for action classification tasks. Finally, we experiment with applying a pre-trained MAE and fine-tune with MANN backbone for action classification tasks.

Disentangling Societal Inequality from Model Biases: Gender Inequality in Divorce Court Proceedings

  • paper_url: http://arxiv.org/abs/2307.10200
  • repo_url: None
  • paper_authors: Sujan Dutta, Parth Srivastava, Vaishnavi Solunke, Swaprava Nath, Ashiqur R. KhudaBukhsh
  • for: This paper uses a large corpus of court proceedings to investigate gender inequality in the context of divorce in India.
  • methods: The authors use natural language processing (NLP) techniques to analyze the court proceedings and quantify societal inequalities, modifying existing NLP resources to better suit their research goals.
  • results: While a large number of cases may suggest changing norms in India, with more women challenging patriarchy, the analyses indicate striking gender inequality in the context of divorce, with women often subjected to domestic violence.
    Abstract Divorce is the legal dissolution of a marriage by a court. Since this is usually an unpleasant outcome of a marital union, each party may have reasons to call the decision to quit which is generally documented in detail in the court proceedings. Via a substantial corpus of 17,306 court proceedings, this paper investigates gender inequality through the lens of divorce court proceedings. While emerging data sources (e.g., public court records) on sensitive societal issues hold promise in aiding social science research, biases present in cutting-edge natural language processing (NLP) methods may interfere with or affect such studies. We thus require a thorough analysis of potential gaps and limitations present in extant NLP resources. In this paper, on the methodological side, we demonstrate that existing NLP resources required several non-trivial modifications to quantify societal inequalities. On the substantive side, we find that while a large number of court cases perhaps suggest changing norms in India where women are increasingly challenging patriarchy, AI-powered analyses of these court proceedings indicate striking gender inequality with women often subjected to domestic violence.

A Personalized Reinforcement Learning Summarization Service for Learning Structure from Unstructured Data

  • paper_url: http://arxiv.org/abs/2307.05696
  • repo_url: None
  • paper_authors: Samira Ghodratnama, Amin Beheshti, Mehrdad Zakershahrak
  • for: To provide a personalized summarization service that helps users extract meaningful insights from large document collections.
  • methods: A reinforcement learning algorithm generates personalized summaries and synthesizes documents into a concise hierarchical concept map, actively learning and adapting to user preferences.
  • results: The framework enhances comprehension, enables effective navigation, and empowers users to extract meaningful insights from documents.
    Abstract The exponential growth of textual data has created a crucial need for tools that assist users in extracting meaningful insights. Traditional document summarization approaches often fail to meet individual user requirements and lack structure for efficient information processing. To address these limitations, we propose Summation, a hierarchical personalized concept-based summarization approach. It synthesizes documents into a concise hierarchical concept map and actively engages users by learning and adapting to their preferences. Using a Reinforcement Learning algorithm, Summation generates personalized summaries for unseen documents on specific topics. This framework enhances comprehension, enables effective navigation, and empowers users to extract meaningful insights from large document collections aligned with their unique requirements.

Multi-Head Attention Mechanism Learning for Cancer New Subtypes and Treatment Based on Cancer Multi-Omics Data

  • paper_url: http://arxiv.org/abs/2307.04075
  • repo_url: None
  • paper_authors: Liangrui Pan, Dazhen Liu, Yutao Dou, Lian Wang, Zhichao Feng, Pengfei Rong, Liwen Xu, Shaoliang Peng
  • for: The paper aims to identify and characterize cancer subtypes using unsupervised contrastive learning on multi-omics data.
  • methods: The proposed method uses a generalization framework based on attention mechanisms for unsupervised contrastive learning (AMUCL), which includes a decoupled contrastive learning model (DMACL) based on a multi-head attention mechanism to deeply extract multi-omics data features and identify new cancer subtypes.
  • results: The DMACL model achieved the most reliable cancer subtype clustering results, with a C-index of 0.002, a Silhouette score of 0.801, and a Davies Bouldin Score of 0.38 on a single-cell multi-omics dataset, and a C-index of 0.016, a Silhouette score of 0.688, and a Davies Bouldin Score of 0.46 on a cancer multi-omics dataset. The analysis also revealed six cancer subtypes of AML, validated through GO functional enrichment, subtype-specific biological functions, and GSEA.
    Abstract Due to the high heterogeneity and clinical characteristics of cancer, there are significant differences in multi-omics data and clinical features among subtypes of different cancers. Therefore, the identification and discovery of cancer subtypes are crucial for the diagnosis, treatment, and prognosis of cancer. In this study, we proposed a generalization framework based on attention mechanisms for unsupervised contrastive learning (AMUCL) to analyze cancer multi-omics data for the identification and characterization of cancer subtypes. AMUCL framework includes a unsupervised multi-head attention mechanism, which deeply extracts multi-omics data features. Importantly, a decoupled contrastive learning model (DMACL) based on a multi-head attention mechanism is proposed to learn multi-omics data features and clusters and identify new cancer subtypes. This unsupervised contrastive learning method clusters subtypes by calculating the similarity between samples in the feature space and sample space of multi-omics data. Compared to 11 other deep learning models, the DMACL model achieved a C-index of 0.002, a Silhouette score of 0.801, and a Davies Bouldin Score of 0.38 on a single-cell multi-omics dataset. On a cancer multi-omics dataset, the DMACL model obtained a C-index of 0.016, a Silhouette score of 0.688, and a Davies Bouldin Score of 0.46, and obtained the most reliable cancer subtype clustering results for each type of cancer. Finally, we used the DMACL model in the AMUCL framework to reveal six cancer subtypes of AML. By analyzing the GO functional enrichment, subtype-specific biological functions, and GSEA of AML, we further enhanced the interpretability of cancer subtype analysis based on the generalizable AMUCL framework.
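
As a rough illustration of the contrastive ingredient, the sketch below computes an NT-Xent-style loss over paired multi-omics "views" of the same samples; the two random views and the temperature are illustrative assumptions, not the DMACL objective.

```python
import numpy as np

def nt_xent(z1, z2, tau=0.5):
    """NT-Xent contrastive loss: each sample's two views attract,
    all other samples in the batch repel."""
    z = np.concatenate([z1, z2])                     # (2N, d)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = z @ z.T / tau
    np.fill_diagonal(sim, -np.inf)                   # exclude self-pairs
    n = len(z1)
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])  # each view's partner
    log_prob = sim[np.arange(2 * n), pos] - np.log(np.exp(sim).sum(axis=1))
    return -log_prob.mean()

rng = np.random.default_rng(5)
base = rng.normal(size=(16, 32))                     # 16 samples, 32-d features
view1 = base + 0.1 * rng.normal(size=base.shape)     # e.g. expression-derived view
view2 = base + 0.1 * rng.normal(size=base.shape)     # e.g. methylation-derived view
print("contrastive loss:", round(float(nt_xent(view1, view2)), 3))
```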

Contextual Dynamic Pricing with Strategic Buyers

  • paper_url: http://arxiv.org/abs/2307.04055
  • repo_url: None
  • paper_authors: Pangpang Liu, Zhuoran Yang, Zhaoran Wang, Will Wei Sun
  • for: This work addresses the problem that, under personalized pricing, buyers can strategically manipulate their feature data to obtain a lower price at a certain manipulation cost, which hurts the seller's revenue.
  • methods: It studies contextual dynamic pricing where the seller observes only a manipulated feature rather than the buyer's true feature, and only a binary response indicating whether a sale happened rather than the buyer's valuation of the product.
  • results: The authors propose a strategic dynamic pricing policy that incorporates buyers' strategic behavior to maximize the seller's cumulative revenue. It achieves a sublinear O(√T) regret, whereas non-strategic policies that ignore buyers' strategic behavior incur linear Ω(T) regret, and it also accommodates an unknown manipulation cost. Experiments show superior performance over pricing policies that are unaware of strategic behavior.
    Abstract Personalized pricing, which involves tailoring prices based on individual characteristics, is commonly used by firms to implement a consumer-specific pricing policy. In this process, buyers can also strategically manipulate their feature data to obtain a lower price, incurring certain manipulation costs. Such strategic behavior can hinder firms from maximizing their profits. In this paper, we study the contextual dynamic pricing problem with strategic buyers. The seller does not observe the buyer's true feature, but a manipulated feature according to buyers' strategic behavior. In addition, the seller does not observe the buyers' valuation of the product, but only a binary response indicating whether a sale happens or not. Recognizing these challenges, we propose a strategic dynamic pricing policy that incorporates the buyers' strategic behavior into the online learning to maximize the seller's cumulative revenue. We first prove that existing non-strategic pricing policies that neglect the buyers' strategic behavior result in a linear $\Omega(T)$ regret with $T$ the total time horizon, indicating that these policies are not better than a random pricing policy. We then establish that our proposed policy achieves a sublinear regret upper bound of $O(\sqrt{T})$. Importantly, our policy is not a mere amalgamation of existing dynamic pricing policies and strategic behavior handling algorithms. Our policy can also accommodate the scenario when the marginal cost of manipulation is unknown in advance. To account for it, we simultaneously estimate the valuation parameter and the cost parameter in the online pricing policy, which is shown to also achieve an $O(\sqrt{T})$ regret bound. Extensive experiments support our theoretical developments and demonstrate the superior performance of our policy compared to other pricing policies that are unaware of the strategic behaviors.
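
To make the setting concrete, the toy simulation below has a buyer shade the reported feature whenever the per-unit price saving exceeds a per-unit manipulation cost, so aggressive markups trigger manipulation and destroy revenue. The linear valuation model and the candidate grid are simplifications, not the paper's policy.

```python
import numpy as np

theta = 2.0              # valuation = theta * true feature (unknown to the seller)
manip_cost = 1.0         # buyer's cost per unit of feature shading

def buyer_report(x_true, markup):
    """Strategic buyer: shade the reported feature while the per-unit price
    saving (markup) exceeds the per-unit manipulation cost."""
    candidates = np.linspace(0.0, x_true, 100)
    payoff = markup * (x_true - candidates) - manip_cost * (x_true - candidates)
    return candidates[payoff.argmax()]

x_true = 2.0
for markup in (0.6, 0.9, 1.3, 1.7):
    x_rep = buyer_report(x_true, markup)
    price = markup * x_rep
    sold = price <= theta * x_true          # the seller only observes this bit
    print(f"markup {markup:.1f}: reported feature {x_rep:.2f}, "
          f"price {price:.2f}, sold={sold}")
```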

A Physics-Informed Low-Shot Learning For sEMG-Based Estimation of Muscle Force and Joint Kinematics

  • paper_url: http://arxiv.org/abs/2307.05361
  • repo_url: None
  • paper_authors: Yue Shi, Shuhao Ma, Yihui Zhao, Zhiqiang Zhang
  • for: This paper aims to improve the estimation of muscle force and joint kinematics from surface electromyography (sEMG) data using a physics-informed low-shot learning method.
  • methods: The proposed method integrates Lagrange's equation of motion and an inverse dynamic muscle model into a generative adversarial network (GAN) framework for structured feature decoding and extrapolated estimation from small sample data.
  • results: The proposed method outperforms selected benchmark methods, including a physics-informed convolution neural network (PI-CNN), a vanilla GAN, and a multi-layer extreme learning machine (ML-ELM), in estimating muscle forces and joint kinematics, and its estimations are unbiased compared to physics-based inverse dynamics.
    Abstract Muscle force and joint kinematics estimation from surface electromyography (sEMG) are essential for real-time biomechanical analysis of the dynamic interplay among neural muscle stimulation, muscle dynamics, and kinetics. Recent advances in deep neural networks (DNNs) have shown the potential to improve biomechanical analysis in a fully automated and reproducible manner. However, the small sample nature and physical interpretability of biomechanical analysis limit the applications of DNNs. This paper presents a novel physics-informed low-shot learning method for sEMG-based estimation of muscle force and joint kinematics. This method seamlessly integrates Lagrange's equation of motion and inverse dynamic muscle model into the generative adversarial network (GAN) framework for structured feature decoding and extrapolated estimation from the small sample data. Specifically, Lagrange's equation of motion is introduced into the generative model to restrain the structured decoding of the high-level features following the laws of physics. And a physics-informed policy gradient is designed to improve the adversarial learning efficiency by rewarding the consistent physical representation of the extrapolated estimations and the physical references. Experimental validations are conducted on two scenarios (i.e. the walking trials and wrist motion trials). Results indicate that the estimations of the muscle forces and joint kinematics are unbiased compared to the physics-based inverse dynamics, which outperforms the selected benchmark methods, including physics-informed convolution neural network (PI-CNN), vanilla generative adversarial network (GAN), and multi-layer extreme learning machine (ML-ELM).

Optimization-based Learning for Dynamic Load Planning in Trucking Service Networks

  • paper_url: http://arxiv.org/abs/2307.04050
  • repo_url: None
  • paper_authors: Ritesh Ojha, Wenbo Chen, Hanyu Zhang, Reem Khir, Alan Erera, Pascal Van Hentenryck
  • for: This paper aims to develop a decision-support tool for parcel carriers to optimize their service network design and load planning.
  • methods: The Dynamic Load Planning Problem (DLPP) is formulated as a Mixed-Integer Programming (MIP) model; a Goal-Directed Optimization eliminates symmetries and improves solution quality, and an optimization proxy combining a machine learning model with a feasibility restoration model addresses the computational challenges.
  • results: On industrial instances, the proposed approach is around 10 times faster than a commercial solver at obtaining solutions of the same quality and orders of magnitude faster at generating mutually consistent solutions, while also demonstrating the benefits of load consolidation and the significant savings obtained from combining machine learning and optimization.
    Abstract The load planning problem is a critical challenge in service network design for parcel carriers: it decides how many trailers (or loads) to assign for dispatch over time between pairs of terminals. Another key challenge is to determine a flow plan, which specifies how parcel volumes are assigned to planned loads. This paper considers the Dynamic Load Planning Problem (DLPP) that considers both flow and load planning challenges jointly to adjust loads and flows as the demand forecast changes over time before the day of operations. The paper aims at developing a decision-support tool to inform planners making these decisions at terminals across the network. The paper formulates the DLPP as a MIP and shows that it admits a large number of symmetries in a network where each commodity can be routed through primary and alternate paths. As a result, an optimization solver may return fundamentally different solutions to closely related problems, confusing planners and reducing trust in optimization. To remedy this limitation, the paper proposes a Goal-Directed Optimization that eliminates those symmetries by generating optimal solutions staying close to a reference plan. The paper also proposes an optimization proxy to address the computational challenges of the optimization models. The proxy combines a machine learning model and a feasibility restoration model and finds solutions that satisfy real-time constraints imposed by planners-in-the-loop. An extensive computational study on industrial instances shows that the optimization proxy is around 10 times faster than the commercial solver in obtaining the same quality solutions and orders of magnitude faster for generating solutions that are consistent with each other. The proposed approach also demonstrates the benefits of the DLPP for load consolidation, and the significant savings obtained from combining machine learning and optimization.
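
A toy flavour of the load-planning MIP, choosing integer trailer counts per lane to cover forecast volumes at minimum cost, is sketched below with PuLP; the two-lane instance and the numbers are invented, and the real DLPP couples flows across a network with primary and alternate paths.

```python
from pulp import LpProblem, LpMinimize, LpVariable, lpSum, value

# Toy instance: parcel volume (units) and per-trailer cost on two lanes.
lanes = {"A->B": {"volume": 2300, "cost": 500},
         "A->C": {"volume": 900, "cost": 650}}
CAPACITY = 1000                       # units per trailer

prob = LpProblem("toy_load_planning", LpMinimize)
trailers = {l: LpVariable(f"trailers_{i}", lowBound=0, cat="Integer")
            for i, l in enumerate(lanes)}

prob += lpSum(lanes[l]["cost"] * trailers[l] for l in lanes)      # total cost
for l in lanes:
    prob += CAPACITY * trailers[l] >= lanes[l]["volume"]          # cover volume

prob.solve()
for l in lanes:
    print(l, "->", int(value(trailers[l])), "trailers")
```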

The Value of Chess Squares

  • paper_url: http://arxiv.org/abs/2307.05330
  • repo_url: https://github.com/Dpay123/chess
  • paper_authors: Aditya Gupta, Shiva Maharaj, Nicholas Polson, Vadim Sokolov
  • for: The goal of this study is to value chess pieces and squares and to assess positions on the board more accurately.
  • methods: The study introduces marginal valuations for both pieces and squares, extending the conventional approach of fixed piece values.
  • results: The method yields better assessments of piece placement, demonstrated on the positioning of Knights and Bishops, and provides valuable insights into the valuation of pawns and pawn structure.
    Abstract Valuing chess squares and determining the placement of pieces on the board are the main objectives of our study. With the emergence of chess AI, it has become possible to accurately assess the worth of positions in a game of chess. The conventional approach assigns fixed values to pieces $(\symking=\infty, \symqueen=9, \symrook=5, \symbishop=3, \symknight=3, \sympawn=1)$. We enhance this analysis by introducing marginal valuations for both pieces and squares. We demonstrate our method by examining the positioning of Knights and Bishops, and also provide valuable insights into the valuation of pawns. Notably, Nimzowitsch was among the pioneers in advocating for the significance of Pawn structure and valuation. Finally, we conclude by suggesting potential avenues for future research.
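
The idea of a marginal square value on top of fixed piece values can be sketched with a toy evaluation: each piece contributes its base value plus a square bonus. The knight table below (central squares worth more) is a standard illustrative heuristic, not the valuations estimated in the paper.

```python
# Fixed base values plus a marginal per-square bonus for knights:
# central squares get a bonus, rim squares a penalty ("knight on the rim...").
BASE = {"K": float("inf"), "Q": 9, "R": 5, "B": 3, "N": 3, "P": 1}

def knight_square_bonus(square):
    """Marginal value of a knight on e.g. 'f3': closer to the centre is better."""
    file = ord(square[0]) - ord("a")          # 0..7
    rank = int(square[1]) - 1                 # 0..7
    dist = max(abs(file - 3.5), abs(rank - 3.5))
    return round(0.5 - 0.2 * (dist - 0.5), 2) # +0.5 centre, negative on the rim

for sq in ("d4", "f3", "a1", "h5"):
    print(f"N on {sq}: {BASE['N'] + knight_square_bonus(sq):.2f}")
```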

Designing a Direct Feedback Loop between Humans and Convolutional Neural Networks through Local Explanations

  • paper_url: http://arxiv.org/abs/2307.04036
  • repo_url: https://github.com/tongstevensun/deepfuse
  • paper_authors: Tong Steven Sun, Yuyang Gao, Shubham Khaladkar, Sijia Liu, Liang Zhao, Young-Ho Kim, Sungsoo Ray Hong
  • for: This work provides a direct feedback loop between users and Convolutional Neural Networks (CNNs) for diagnosing and revising a CNN's vulnerabilities using local explanations.
  • methods: Local explanations provide visually straightforward heatmaps that help ML engineers understand a CNN's outputs; the proposed DeepFuse lets users systematically search for "unreasonable" local explanations, annotate new boundaries for them in a labor-efficient manner, and steer the model based on those annotations so it does not introduce similar mistakes.
  • results: In a two-day study (S2) with 12 experienced CNN engineers, participants using DeepFuse created a more accurate and "reasonable" model than the current state-of-the-art, and found that its case-based guidance can practically improve their current practice.
    Abstract The local explanation provides heatmaps on images to explain how Convolutional Neural Networks (CNNs) derive their output. Due to its visual straightforwardness, the method has been one of the most popular explainable AI (XAI) methods for diagnosing CNNs. Through our formative study (S1), however, we captured ML engineers' ambivalent perspective about the local explanation as a valuable and indispensable envision in building CNNs versus the process that exhausts them due to the heuristic nature of detecting vulnerability. Moreover, steering the CNNs based on the vulnerability learned from the diagnosis seemed highly challenging. To mitigate the gap, we designed DeepFuse, the first interactive design that realizes the direct feedback loop between a user and CNNs in diagnosing and revising CNN's vulnerability using local explanations. DeepFuse helps CNN engineers to systemically search "unreasonable" local explanations and annotate the new boundaries for those identified as unreasonable in a labor-efficient manner. Next, it steers the model based on the given annotation such that the model doesn't introduce similar mistakes. We conducted a two-day study (S2) with 12 experienced CNN engineers. Using DeepFuse, participants made a more accurate and "reasonable" model than the current state-of-the-art. Also, participants found the way DeepFuse guides case-based reasoning can practically improve their current practice. We provide implications for design that explain how future HCI-driven design can move our practice forward to make XAI-driven insights more actionable.

Learning Variational Neighbor Labels for Test-Time Domain Generalization

  • paper_url: http://arxiv.org/abs/2307.04033
  • repo_url: None
  • paper_authors: Sameer Ambekar, Zehao Xiao, Jiayi Shen, Xiantong Zhen, Cees G. M. Snoek
  • for: This work pursues domain generalization, where models are trained exclusively on source domains and deployed on unseen target domains; it keeps the strict separation of source training and target testing, but exploits the unlabeled target data itself during inference.
  • methods: Three contributions are proposed: probabilistic pseudo-labeling of target samples to generalize the source-trained model to the target domain at test time, with pseudo labels modeled as distributions to account for uncertainty; variational neighbor labels that incorporate information from neighboring target samples to generate more robust pseudo labels; and a meta-generalization stage during training that simulates the generalization procedure.
  • results: Experiments on six widely-used datasets demonstrate the benefits, abilities, and effectiveness of the proposal.
    Abstract This paper strives for domain generalization, where models are trained exclusively on source domains before being deployed at unseen target domains. We follow the strict separation of source training and target testing but exploit the value of the unlabeled target data itself during inference. We make three contributions. First, we propose probabilistic pseudo-labeling of target samples to generalize the source-trained model to the target domain at test time. We formulate the generalization at test time as a variational inference problem by modeling pseudo labels as distributions to consider the uncertainty during generalization and alleviate the misleading signal of inaccurate pseudo labels. Second, we learn variational neighbor labels that incorporate the information of neighboring target samples to generate more robust pseudo labels. Third, to learn the ability to incorporate more representative target information and generate more precise and robust variational neighbor labels, we introduce a meta-generalization stage during training to simulate the generalization procedure. Experiments on six widely-used datasets demonstrate the benefits, abilities, and effectiveness of our proposal.
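
A simplified numpy flavour of neighbor-informed pseudo-labels is given below: each target sample's softmax prediction is averaged with its nearest neighbors' predictions before being used as a soft pseudo label. The plain k-NN averaging stands in for the paper's variational formulation.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def neighbor_pseudo_labels(feats, logits, k=5, alpha=0.5):
    """Soften each sample's pseudo label with its k nearest neighbours'
    average prediction in feature space."""
    probs = softmax(logits)
    f = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    sim = f @ f.T
    np.fill_diagonal(sim, -np.inf)                 # exclude the sample itself
    nbrs = np.argsort(sim, axis=1)[:, -k:]         # indices of k nearest
    nbr_mean = probs[nbrs].mean(axis=1)            # (n, n_classes)
    return (1 - alpha) * probs + alpha * nbr_mean

rng = np.random.default_rng(9)
feats = rng.normal(size=(100, 32))                 # unlabeled target features
logits = rng.normal(size=(100, 7))                 # source model's predictions
pl = neighbor_pseudo_labels(feats, logits)
print(pl.shape, pl.sum(axis=1)[:3])                # rows still sum to 1
```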

On “Indifference” and Backward Induction in Games with Perfect Information

  • paper_url: http://arxiv.org/abs/2307.04029
  • repo_url: None
  • paper_authors: Nimrod Megiddo
  • for: This paper examines why a player's indifference between two distinct outcomes of a game cannot be handled by small perturbations.
  • methods: It argues that ties among rational choices can be resolved by refinements of the concept of rationality based on the utilities of the other players.
  • results: One such refinement, the concept of Tit-for-Tat, resolves the indifference by taking the other players' payoffs into account.
    Abstract Indifference of a player with respect to two distinct outcomes of a game cannot be handled by small perturbations, because the actual choice may have significant impact on other players, and cause them to act in a way that has significant impact of the indifferent player. It is argued that ties among rational choices can be resolved by refinements of the concept of rationality based on the utilities of other players. One such refinement is the concept of Tit-for-Tat.
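
Tit-for-Tat itself is easy to state in code; the sketch below plays it in an iterated prisoner's dilemma against an always-defecting opponent, as a minimal illustration of a tie-breaking rule keyed to the other player's behaviour (the payoff matrix is the standard textbook one, not from the paper).

```python
# Standard prisoner's dilemma payoffs: (my payoff, their payoff).
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def tit_for_tat(history):
    """Cooperate first; afterwards mirror the opponent's last move."""
    return "C" if not history else history[-1][1]

def always_defect(history):
    return "D"

history, my_total = [], 0
for _ in range(6):
    mine = tit_for_tat(history)
    theirs = always_defect([(b, a) for a, b in history])  # opponent's view
    my_total += PAYOFF[(mine, theirs)][0]
    history.append((mine, theirs))

print("moves:", history)               # [('C','D'), ('D','D'), ...]
print("tit-for-tat total:", my_total)  # 0 + five rounds of mutual defection = 5
```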

Measuring the Success of Diffusion Models at Imitating Human Artists

  • paper_url: http://arxiv.org/abs/2307.04028
  • repo_url: None
  • paper_authors: Stephen Casper, Zifan Guo, Shreya Mogulothu, Zachary Marinov, Chinmay Deshpande, Rui-Jie Yew, Zheng Dai, Dylan Hadfield-Menell
  • for: This paper studies whether modern diffusion models can imitate the work of human artists.
  • methods: Contrastive Language-Image Pretrained (CLIP) encoders are used to test, in a zero-shot fashion, whether a model's imitation of a specific artist can be classified back to that artist or the artist's work.
  • results: When Stable Diffusion is prompted to imitate one of 70 professional digital artists, the artist can be identified from the imitation with an average accuracy of 81.0%, and samples of the artists' work can be matched to the imitation images with a high degree of statistical reliability.
    Abstract Modern diffusion models have set the state-of-the-art in AI image generation. Their success is due, in part, to training on Internet-scale data which often includes copyrighted work. This prompts questions about the extent to which these models learn from, imitate, or copy the work of human artists. This work suggests that tying copyright liability to the capabilities of the model may be useful given the evolving ecosystem of generative models. Specifically, much of the legal analysis of copyright and generative systems focuses on the use of protected data for training. As a result, the connections between data, training, and the system are often obscured. In our approach, we consider simple image classification techniques to measure a model's ability to imitate specific artists. Specifically, we use Contrastive Language-Image Pretrained (CLIP) encoders to classify images in a zero-shot fashion. Our process first prompts a model to imitate a specific artist. Then, we test whether CLIP can be used to reclassify the artist (or the artist's work) from the imitation. If these tests match the imitation back to the original artist, this suggests the model can imitate that artist's expression. Our approach is simple and quantitative. Furthermore, it uses standard techniques and does not require additional training. We demonstrate our approach with an audit of Stable Diffusion's capacity to imitate 70 professional digital artists with copyrighted work online. When Stable Diffusion is prompted to imitate an artist from this set, we find that the artist can be identified from the imitation with an average accuracy of 81.0%. Finally, we also show that a sample of the artist's work can be matched to these imitation images with a high degree of statistical reliability. Overall, these results suggest that Stable Diffusion is broadly successful at imitating individual human artists.
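
As a rough illustration of the zero-shot reclassification step described above, the sketch below scores an imitation image against a set of candidate artist prompts with Hugging Face's CLIP implementation. The artist names, prompt template, and image path are hypothetical stand-ins, not the paper's exact setup.

```python
# Hedged sketch: zero-shot artist reclassification with CLIP.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

artists = ["Artist A", "Artist B", "Artist C"]  # hypothetical candidate set
prompts = [f"artwork in the style of {name}" for name in artists]

image = Image.open("imitation.png")  # placeholder: an image the diffusion model produced
inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
logits = model(**inputs).logits_per_image  # shape: (1, num_artists)
probs = logits.softmax(dim=-1)

# If the top-scoring artist matches the one the diffusion model was prompted
# to imitate, the imitation counts as successfully matched back.
print(artists[probs.argmax(dim=-1).item()], probs.tolist())
```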

GP-guided MPPI for Efficient Navigation in Complex Unknown Cluttered Environments

  • paper_url: http://arxiv.org/abs/2307.04019
  • repo_url: None
  • paper_authors: Ihab S. Mohamed, Mahmoud Ali, Lantao Liu
  • for: The paper addresses robotic navigation in unknown, cluttered environments with limited sensing capabilities.
  • methods: It uses local trajectory optimization, specifically Model Predictive Path Integral (MPPI) control, integrated with a local perception model based on a Sparse Gaussian Process (SGP) that learns the navigable space around the robot and identifies suggested subgoals for the MPPI planner.
  • results: The proposed control strategy, called GP-MPPI, is validated through both simulated and real-world 2D autonomous navigation tasks in complex unknown environments, demonstrating its efficiency and robustness in guiding the robot safely toward its goal while avoiding obstacles and escaping entrapment in local minima.
    Abstract Robotic navigation in unknown, cluttered environments with limited sensing capabilities poses significant challenges in robotics. Local trajectory optimization methods, such as Model Predictive Path Integral (MPPI), are a promising solution to this challenge. However, global guidance is required to ensure effective navigation, especially when encountering challenging environmental conditions or navigating beyond the planning horizon. This study presents the GP-MPPI, an online learning-based control strategy that integrates MPPI with a local perception model based on Sparse Gaussian Process (SGP). The key idea is to leverage the learning capability of SGP to construct a variance (uncertainty) surface, which enables the robot to learn about the navigable space surrounding it, identify a set of suggested subgoals, and ultimately recommend the optimal subgoal that minimizes a predefined cost function to the local MPPI planner. Afterward, MPPI computes the optimal control sequence that satisfies the robot and collision avoidance constraints. Such an approach eliminates the necessity of a global map of the environment or an offline training process. We validate the efficiency and robustness of our proposed control strategy through both simulated and real-world experiments of 2D autonomous navigation tasks in complex unknown environments, demonstrating its superiority in guiding the robot safely towards its desired goal while avoiding obstacles and escaping entrapment in local minima. The GPU implementation of GP-MPPI, including the supplementary video, is available at https://github.com/IhabMohamed/GP-MPPI.
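
The SGP variance surface and subgoal cost are specific to the paper; the toy numpy sketch below conveys the general idea under stated assumptions: fit a GP to local occupancy observations, evaluate the posterior variance at candidate subgoals, and hand a cost-minimizing subgoal to the local MPPI planner. The RBF kernel settings, the synthetic data, and the cost weighting are all assumptions.

```python
import numpy as np

def rbf_kernel(A, B, length_scale=1.0, variance=1.0):
    """Squared-exponential kernel between point sets of shape (n, 2) and (m, 2)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * d2 / length_scale**2)

rng = np.random.default_rng(0)
# Synthetic local observations: 2D points with occupancy (0 = free, 1 = occupied).
# The GP posterior variance below depends only on the observation locations.
X_obs = rng.uniform(-5, 5, size=(200, 2))
y_obs = rng.binomial(1, 0.3, size=200).astype(float)

# GP posterior variance at candidate subgoal locations.
candidates = rng.uniform(-5, 5, size=(50, 2))
K = rbf_kernel(X_obs, X_obs) + 1e-2 * np.eye(len(X_obs))
K_star = rbf_kernel(candidates, X_obs)
solve = np.linalg.solve(K, K_star.T)
var = rbf_kernel(candidates, candidates).diagonal() - (K_star * solve.T).sum(-1)

# Assumed subgoal cost: progress toward the goal plus an uncertainty term
# (the paper's actual cost function may combine these differently).
goal = np.array([4.0, 4.0])
cost = np.linalg.norm(candidates - goal, axis=1) + 0.5 * var
subgoal = candidates[cost.argmin()]  # recommended to the local MPPI planner
print(subgoal)
```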

Proceedings Nineteenth conference on Theoretical Aspects of Rationality and Knowledge

Abstract The TARK conference (Theoretical Aspects of Rationality and Knowledge) is a conference that aims to bring together researchers from a wide variety of fields, including computer science, artificial intelligence, game theory, decision theory, philosophy, logic, linguistics, and cognitive science. Its goal is to further our understanding of interdisciplinary issues involving reasoning about rationality and knowledge. Previous conferences have been held biennially around the world since 1986, on the initiative of Joe Halpern (Cornell University). Topics of interest include, but are not limited to, semantic models for knowledge, belief, awareness and uncertainty, bounded rationality and resource-bounded reasoning, commonsense epistemic reasoning, epistemic logic, epistemic game theory, knowledge and action, applications of reasoning about knowledge and other mental states, belief revision, computational social choice, algorithmic game theory, and foundations of multi-agent systems. Information about TARK, including conference proceedings, is available at http://www.tark.org/ These proceedings contain the papers that have been accepted for presentation at the Nineteenth Conference on Theoretical Aspects of Rationality and Knowledge (TARK 2023), held between June 28 and June 30, 2023, at the University of Oxford, United Kingdom. The conference website can be found at https://sites.google.com/view/tark-2023

cs.CL - 2023-07-09

Can Generative Large Language Models Perform ASR Error Correction?

  • paper_url: http://arxiv.org/abs/2307.04172
  • repo_url: None
  • paper_authors: Rao Ma, Mengjie Qian, Potsawee Manakul, Mark Gales, Kate Knill
  • for: This paper aims to improve ASR system performance by applying the large language model ChatGPT to ASR error correction in zero-shot and 1-shot settings.
  • methods: It takes the ASR N-best list as model input and proposes both unconstrained and N-best-constrained error correction methods.
  • results: Experiments show that error correction with ChatGPT can largely improve ASR system performance, particularly in the 1-shot setting.
    Abstract ASR error correction continues to serve as an important part of post-processing for speech recognition systems. Traditionally, these models are trained with supervised training using the decoding results of the underlying ASR system and the reference text. This approach is computationally intensive and the model needs to be re-trained when switching the underlying ASR model. Recent years have seen the development of large language models and their ability to perform natural language processing tasks in a zero-shot manner. In this paper, we take ChatGPT as an example to examine its ability to perform ASR error correction in the zero-shot or 1-shot settings. We use the ASR N-best list as model input and propose unconstrained error correction and N-best constrained error correction methods. Results on a Conformer-Transducer model and the pre-trained Whisper model show that we can largely improve the ASR system performance with error correction using the powerful ChatGPT model.
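
A minimal sketch of what a 1-shot, N-best-constrained correction prompt might look like, assuming the OpenAI chat API as the LLM backend; the prompt wording, the example pair, and the model name are illustrative assumptions rather than the paper's exact configuration.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def correct_from_nbest(nbest, example_nbest, example_output):
    """Build a 1-shot prompt over an ASR N-best list and ask the LLM to
    produce the corrected transcription (N-best constrained)."""
    def fmt(hyps):
        return "\n".join(f"{i + 1}. {h}" for i, h in enumerate(hyps))
    prompt = (
        "Below are N-best hypotheses from a speech recognizer. Output the "
        "corrected transcription, choosing only from the listed hypotheses.\n\n"
        f"Hypotheses:\n{fmt(example_nbest)}\nCorrected: {example_output}\n\n"
        f"Hypotheses:\n{fmt(nbest)}\nCorrected:"
    )
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",  # assumed model; the paper uses ChatGPT
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content.strip()

print(correct_from_nbest(
    nbest=["i scream for ice cream", "i scream four ice cream"],
    example_nbest=["the whether is nice", "the weather is nice"],
    example_output="the weather is nice",
))
```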

Dream Content Discovery from Reddit with an Unsupervised Mixed-Method Approach

  • paper_url: http://arxiv.org/abs/2307.04167
  • repo_url: None
  • paper_authors: Anubhab Das, Sanja Šćepanović, Luca Maria Aiello, Remington Mallett, Deirdre Barrett, Daniele Quercia
  • for: This paper aims to develop a new, data-driven approach for analyzing dream reports and understanding the topics and themes that appear in dreams.
  • methods: The authors use natural language processing techniques to identify topics in free-form dream reports and group them into larger themes. They also compare their results to the Hall and van de Castle scale to validate their findings.
  • results: The authors analyze 44,213 dream reports from Reddit’s r/Dreams subreddit and identify 217 topics, grouped into 22 larger themes. They also show how their method can be used to understand changes in collective dream experiences over time and around major events like the COVID-19 pandemic and the Russo-Ukrainian war.
    Abstract Dreaming is a fundamental but not fully understood part of human experience that can shed light on our thought patterns. Traditional dream analysis practices, while popular and aided by over 130 unique scales and rating systems, have limitations. Mostly based on retrospective surveys or lab studies, they struggle to be applied on a large scale or to show the importance and connections between different dream themes. To overcome these issues, we developed a new, data-driven mixed-method approach for identifying topics in free-form dream reports through natural language processing. We tested this method on 44,213 dream reports from Reddit's r/Dreams subreddit, where we found 217 topics, grouped into 22 larger themes: the most extensive collection of dream topics to date. We validated our topics by comparing it to the widely-used Hall and van de Castle scale. Going beyond traditional scales, our method can find unique patterns in different dream types (like nightmares or recurring dreams), understand topic importance and connections, and observe changes in collective dream experiences over time and around major events, like the COVID-19 pandemic and the recent Russo-Ukrainian war. We envision that the applications of our method will provide valuable insights into the intricate nature of dreaming.
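
The paper's exact mixed-method pipeline is not reproduced in this listing; purely as a stand-in for the unsupervised topic-discovery step, here is a TF-IDF + NMF sketch with scikit-learn. The toy reports and the topic count are placeholders (the paper finds 217 topics, grouped into 22 themes, on 44,213 reports).

```python
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import TfidfVectorizer

reports = [  # placeholder free-form dream reports
    "I was flying over my childhood home",
    "Being chased through an endless hallway",
    "Teeth falling out right before an exam",
]

vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(reports)

nmf = NMF(n_components=3, random_state=0)  # toy topic count, not the paper's 217
doc_topic = nmf.fit_transform(X)  # report-topic weights
terms = vectorizer.get_feature_names_out()

for k, row in enumerate(nmf.components_):  # topic-term weights
    top = [terms[i] for i in row.argsort()[::-1][:3]]
    print(f"topic {k}: {', '.join(top)}")
```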

Towards cross-language prosody transfer for dialog

  • paper_url: http://arxiv.org/abs/2307.04123
  • repo_url: https://github.com/joneavila/dral
  • paper_authors: Jonathan E. Avila, Nigel G. Ward
  • for: This paper addresses the inadequacy of today's speech-to-speech translation systems for dialog purposes, where nuances of speaker intent and stance can be lost through improper prosody transfer.
  • methods: The authors develop a data collection protocol in which bilingual speakers re-enact utterances from an earlier conversation in their other language, yielding an English-Spanish corpus that so far comprises 1871 matched utterance pairs, along with a simple prosodic dissimilarity metric based on Euclidean distance over a broad set of prosodic features.
  • results: Their findings can inform future research on cross-language prosody and the design of speech-to-speech translation systems capable of effective prosody transfer.
    Abstract Speech-to-speech translation systems today do not adequately support use for dialog purposes. In particular, nuances of speaker intent and stance can be lost due to improper prosody transfer. We present an exploration of what needs to be done to overcome this. First, we developed a data collection protocol in which bilingual speakers re-enact utterances from an earlier conversation in their other language, and used this to collect an English-Spanish corpus, so far comprising 1871 matched utterance pairs. Second, we developed a simple prosodic dissimilarity metric based on Euclidean distance over a broad set of prosodic features. We then used these to investigate cross-language prosodic differences, measure the likely utility of three simple baseline models, and identify phenomena which will require more powerful modeling. Our findings should inform future research on cross-language prosody and the design of speech-to-speech translation systems capable of effective prosody transfer.
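
The feature extraction itself is toolkit-dependent; the dissimilarity metric the abstract describes reduces to a Euclidean distance over normalized prosodic feature vectors. The sketch below uses random placeholder features, and the z-scoring across the corpus and the particular feature set are our assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
# Placeholder corpus: one row per utterance, one column per prosodic feature
# (e.g., pitch mean/range, intensity, speaking rate, pause fraction; assumed set).
corpus = rng.normal(size=(1871, 12))

mu, sigma = corpus.mean(axis=0), corpus.std(axis=0) + 1e-8  # z-score statistics

def prosodic_dissimilarity(f_a, f_b):
    """Euclidean distance between two z-scored prosodic feature vectors."""
    a = (f_a - mu) / sigma
    b = (f_b - mu) / sigma
    return float(np.linalg.norm(a - b))

# e.g., compare an English utterance with its Spanish re-enactment:
print(prosodic_dissimilarity(corpus[0], corpus[1]))
```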

Optimal Transport Posterior Alignment for Cross-lingual Semantic Parsing

  • paper_url: http://arxiv.org/abs/2307.04096
  • repo_url: None
  • paper_authors: Tom Sherborne, Tom Hosking, Mirella Lapata
  • for: This paper studies cross-lingual semantic parsing, which transfers parsing capability from a high-resource language (e.g., English) to low-resource languages with scarce training data.
  • methods: It proposes a new approach that explicitly minimizes cross-lingual divergence between probabilistic latent variables using Optimal Transport, improving parsing from natural language with fewer examples and less training.
  • results: Evaluated on two datasets, MTOP and MultiATIS++SQL, the method establishes state-of-the-art results under a few-shot cross-lingual regime.
    Abstract Cross-lingual semantic parsing transfers parsing capability from a high-resource language (e.g., English) to low-resource languages with scarce training data. Previous work has primarily considered silver-standard data augmentation or zero-shot methods, however, exploiting few-shot gold data is comparatively unexplored. We propose a new approach to cross-lingual semantic parsing by explicitly minimizing cross-lingual divergence between probabilistic latent variables using Optimal Transport. We demonstrate how this direct guidance improves parsing from natural languages using fewer examples and less training. We evaluate our method on two datasets, MTOP and MultiATIS++SQL, establishing state-of-the-art results under a few-shot cross-lingual regime. Ablation studies further reveal that our method improves performance even without parallel input translations. In addition, we show that our model better captures cross-lingual structure in the latent space to improve semantic representation similarity.

Bidirectional Attention as a Mixture of Continuous Word Experts

  • paper_url: http://arxiv.org/abs/2307.04057
  • repo_url: https://github.com/yixinw-lab/attention-uai
  • paper_authors: Kevin Christian Wibisono, Yixin Wang
  • for: This paper studies the statistical underpinnings of bidirectional attention, i.e., what statistical model it implicitly fits and how it handles heterogeneous data.
  • methods: It analyzes single-layer single-head bidirectional attention, showing that after reparameterization it is equivalent to a continuous bag of words (CBOW) model with mixture-of-experts (MoE) weights, with multiple heads and layers corresponding to stacked MoEs and a mixture of MoEs, respectively.
  • results: An extension of this view to categorical tabular data outperforms existing tabular extensions of transformers in out-of-distribution generalization, and the analysis theoretically characterizes when linear word analogies are present in the learned word embeddings.
    Abstract Bidirectional attention – composed of self-attention with positional encodings and the masked language model (MLM) objective – has emerged as a key component of modern large language models (LLMs). Despite its empirical success, few studies have examined its statistical underpinnings: What statistical model is bidirectional attention implicitly fitting? What sets it apart from its non-attention predecessors? We explore these questions in this paper. The key observation is that fitting a single-layer single-head bidirectional attention, upon reparameterization, is equivalent to fitting a continuous bag of words (CBOW) model with mixture-of-experts (MoE) weights. Further, bidirectional attention with multiple heads and multiple layers is equivalent to stacked MoEs and a mixture of MoEs, respectively. This statistical viewpoint reveals the distinct use of MoE in bidirectional attention, which aligns with its practical effectiveness in handling heterogeneous data. It also suggests an immediate extension to categorical tabular data, if we view each word location in a sentence as a tabular feature. Across empirical studies, we find that this extension outperforms existing tabular extensions of transformers in out-of-distribution (OOD) generalization. Finally, this statistical perspective of bidirectional attention enables us to theoretically characterize when linear word analogies are present in its word embeddings. These analyses show that bidirectional attention can require much stronger assumptions to exhibit linear word analogies than its non-attention predecessors.
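
The mixture view can be stated compactly (notation ours, with $d$ the key dimension): a single attention head writes each output as a convex combination of per-position value vectors, so the softmax scores play the role of input-dependent MoE gating weights.

```latex
\[
  o_i \;=\; \sum_{j}
  \underbrace{\operatorname{softmax}_j\!\left(\frac{(x_i W_Q)(x_j W_K)^{\top}}{\sqrt{d}}\right)}_{\text{gating weight of expert } j}
  \;\underbrace{x_j W_V}_{\text{expert } j\text{'s output}}
\]
```

After reparameterizing $W_Q W_K^{\top}$ and $W_V$, this matches a CBOW-style predictor whose context contributions are mixed by learned, input-dependent weights, which is the equivalence the paper establishes.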

How is Fatherhood Framed Online in Singapore?

  • paper_url: http://arxiv.org/abs/2307.04053
  • repo_url: None
  • paper_authors: Tran Hien Van, Abhay Goyal, Muhammad Siddique, Lam Yin Cheung, Nimay Parekh, Jonathan Y Huang, Keri McCrickerd, Edson C Tandoc Jr., Gerard Chung, Navin Kumar
  • for: This paper studies how fatherhood is framed online in Singapore, to aid policy-making around fatherhood.
  • methods: It uses NLP techniques to analyze the framing of fatherhood across a range of Singaporean online platforms, including news outlets, parenting forums, and Twitter.
  • results: While fatherhood was framed in a range of ways in the Singaporean online environment, fathers did not seem to be framed as central to the Singaporean family unit.
    Abstract The proliferation of discussion about fatherhood in Singapore attests to its significance, indicating the need for an exploration of how fatherhood is framed, aiding policy-making around fatherhood in Singapore. Sound and holistic policy around fatherhood in Singapore may reduce stigma and apprehension around being a parent, critical to improving the nations flagging birth rate. We analyzed 15,705 articles and 56,221 posts to study how fatherhood is framed in Singapore across a range of online platforms (news outlets, parenting forums, Twitter). We used NLP techniques to understand these differences. While fatherhood was framed in a range of ways on the Singaporean online environment, it did not seem that fathers were framed as central to the Singaporean family unit. A strength of our work is how the different techniques we have applied validate each other.

Can LLMs be Good Financial Advisors?: An Initial Study in Personal Decision Making for Optimized Outcomes

  • paper_url: http://arxiv.org/abs/2307.07422
  • repo_url: None
  • paper_authors: Kausik Lakkaraju, Sai Krishna Revanth Vuruma, Vishal Pallagani, Bharath Muppasani, Biplav Srivastava
  • for: This study investigates how LLM-based chatbots perform in the personal finance domain, where financial inclusion has been an overarching stated aim of banks for decades.
  • methods: The authors asked 13 questions representing banking products in personal finance (bank accounts, credit cards, and certificates of deposit), their inter-product interactions, and decisions related to high-value purchases, payment of bank dues, and investment advice, posed in different dialects and languages (English, African American Vernacular English, and Telugu).
  • results: Although the chatbots' outputs are fluent and plausible, there are still critical gaps in providing accurate and reliable financial information.
    Abstract Increasingly powerful Large Language Model (LLM) based chatbots, like ChatGPT and Bard, are becoming available to users that have the potential to revolutionize the quality of decision-making achieved by the public. In this context, we set out to investigate how such systems perform in the personal finance domain, where financial inclusion has been an overarching stated aim of banks for decades. We asked 13 questions representing banking products in personal finance: bank account, credit card, and certificate of deposits and their inter-product interactions, and decisions related to high-value purchases, payment of bank dues, and investment advice, and in different dialects and languages (English, African American Vernacular English, and Telugu). We find that although the outputs of the chatbots are fluent and plausible, there are still critical gaps in providing accurate and reliable financial information using LLM-based chatbots.

Revisiting Cross-Lingual Summarization: A Corpus-based Study and A New Benchmark with Improved Annotation

  • paper_url: http://arxiv.org/abs/2307.04018
  • repo_url: https://github.com/cylnlp/convsumx
  • paper_authors: Yulong Chen, Huajian Zhang, Yijie Zhou, Xuefeng Bai, Yueguan Wang, Ming Zhong, Jianhao Yan, Yafu Li, Judy Li, Michael Zhu, Yue Zhang
  • for: This paper proposes a new cross-lingual summarization benchmark with improved annotation, along with a 2-Step method that simulates the human annotation process.
  • methods: It introduces ConvSumX, built with an annotation schema that explicitly considers source input context, and a 2-Step method that takes both the conversation and the summary as input.
  • results: Experiments show that the 2-Step method surpasses strong baselines on ConvSumX under both automatic and human evaluation, and analysis shows that both the source input text and the summary are crucial for modeling cross-lingual summaries.
    Abstract Most existing cross-lingual summarization (CLS) work constructs CLS corpora by simply and directly translating pre-annotated summaries from one language to another, which can contain errors from both summarization and translation processes. To address this issue, we propose ConvSumX, a cross-lingual conversation summarization benchmark, through a new annotation schema that explicitly considers source input context. ConvSumX consists of 2 sub-tasks under different real-world scenarios, with each covering 3 language directions. We conduct thorough analysis on ConvSumX and 3 widely-used manually annotated CLS corpora and empirically find that ConvSumX is more faithful towards input text. Additionally, based on the same intuition, we propose a 2-Step method, which takes both conversation and summary as input to simulate human annotation process. Experimental results show that 2-Step method surpasses strong baselines on ConvSumX under both automatic and human evaluation. Analysis shows that both source input text and summary are crucial for modeling cross-lingual summaries.

Toward Interactive Dictation

  • paper_url: http://arxiv.org/abs/2307.04008
  • repo_url: None
  • paper_authors: Belinda Z. Li, Jason Eisner, Adam Pauls, Sam Thomson
  • for: This paper studies letting users interrupt their dictation with spoken editing commands in open-ended natural language.
  • methods: To support this flexibility in real time, a system must incrementally segment and classify spans of speech as either dictation or command and interpret the command spans; the authors introduce a new task and dataset, TERTiUS, and experiment with large pre-trained language models that either predict the edited text directly or predict a small text-editing program.
  • results: Experiments show a natural trade-off between model accuracy and latency: a smaller model achieves 30% end-state accuracy with 1.3 seconds of latency, while a larger model achieves 55% end-state accuracy with 7 seconds of latency.
    Abstract Voice dictation is an increasingly important text input modality. Existing systems that allow both dictation and editing-by-voice restrict their command language to flat templates invoked by trigger words. In this work, we study the feasibility of allowing users to interrupt their dictation with spoken editing commands in open-ended natural language. We introduce a new task and dataset, TERTiUS, to experiment with such systems. To support this flexibility in real-time, a system must incrementally segment and classify spans of speech as either dictation or command, and interpret the spans that are commands. We experiment with using large pre-trained language models to predict the edited text, or alternatively, to predict a small text-editing program. Experiments show a natural trade-off between model accuracy and latency: a smaller model achieves 30% end-state accuracy with 1.3 seconds of latency, while a larger model achieves 55% end-state accuracy with 7 seconds of latency.

cs.LG - 2023-07-09

Investigating the Edge of Stability Phenomenon in Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2307.04210
  • repo_url: None
  • paper_authors: Rares Iordan, Marc Peter Deisenroth, Mihaela Rosca
  • for: This paper studies optimization dynamics in deep reinforcement learning, specifically whether the edge of stability phenomenon appears in off-policy Q-learning algorithms across a variety of data regimes, from offline to online RL.
  • methods: It examines full-batch gradient descent with momentum and compares behavior under different loss functions.
  • results: Despite significant differences to supervised learning, such as non-stationarity of the data distribution and the use of bootstrapping, the edge of stability phenomenon can be present in off-policy deep RL; DQN, which uses a Huber loss, shows a strong edge of stability effect that is not observed with C51, which uses a cross entropy loss.
    Abstract Recent progress has been made in understanding optimisation dynamics in neural networks trained with full-batch gradient descent with momentum with the uncovering of the edge of stability phenomenon in supervised learning. The edge of stability phenomenon occurs as the leading eigenvalue of the Hessian reaches the divergence threshold of the underlying optimisation algorithm for a quadratic loss, after which it starts oscillating around the threshold, and the loss starts to exhibit local instability but decreases over long time frames. In this work, we explore the edge of stability phenomenon in reinforcement learning (RL), specifically off-policy Q-learning algorithms across a variety of data regimes, from offline to online RL. Our experiments reveal that, despite significant differences to supervised learning, such as non-stationarity of the data distribution and the use of bootstrapping, the edge of stability phenomenon can be present in off-policy deep RL. Unlike supervised learning, however, we observe strong differences depending on the underlying loss, with DQN -- using a Huber loss -- showing a strong edge of stability effect that we do not observe with C51 -- using a cross entropy loss. Our results suggest that, while neural network structure can lead to optimisation dynamics that transfer between problem domains, certain aspects of deep RL optimisation can differentiate it from domains such as supervised learning.
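
To make the quantity under study concrete: the sharpness tracked in edge-of-stability experiments is the leading eigenvalue of the loss Hessian, which can be estimated without forming the Hessian via power iteration on Hessian-vector products. The model, loss, and iteration count below are placeholders.

```python
import torch

def sharpness(loss, params, iters=50):
    """Estimate the leading Hessian eigenvalue via power iteration on HVPs."""
    flat_grads = torch.cat([
        g.reshape(-1)
        for g in torch.autograd.grad(loss, params, create_graph=True)
    ])
    v = torch.randn(flat_grads.numel())
    v /= v.norm()
    lam = 0.0
    for _ in range(iters):
        hv = torch.autograd.grad((flat_grads * v).sum(), params, retain_graph=True)
        hv = torch.cat([h.reshape(-1) for h in hv])  # Hessian-vector product Hv
        lam = float(v @ hv)                          # Rayleigh quotient estimate
        v = hv / (hv.norm() + 1e-12)
    return lam

model = torch.nn.Linear(10, 1)
x, y = torch.randn(32, 10), torch.randn(32, 1)
loss = torch.nn.functional.mse_loss(model(x), y)
print(sharpness(loss, list(model.parameters())))
# At the edge of stability, this value hovers near 2 / step_size.
```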

On the Challenges of Deploying Privacy-Preserving Synthetic Data in the Enterprise

  • paper_url: http://arxiv.org/abs/2307.04208
  • repo_url: None
  • paper_authors: Lauren Arthur, Jason Costello, Jonathan Hardy, Will O’Brien, James Rea, Gareth Rees, Georgi Ganev
  • for: This paper studies the challenges of deploying synthetic data, a subfield of Generative AI, in the enterprise, with an emphasis on privacy concerns caused by the vast amount of personal and highly sensitive data.
  • methods: It identifies 40+ challenges and systematizes them into five main groups: generation, infrastructure & architecture, governance, compliance & regulation, and adoption.
  • results: It proposes a strategic and systematic approach that enterprises can employ to effectively address the challenges and achieve their goals by establishing trust in the implemented solutions.
    Abstract Generative AI technologies are gaining unprecedented popularity, causing a mix of excitement and apprehension through their remarkable capabilities. In this paper, we study the challenges associated with deploying synthetic data, a subfield of Generative AI. Our focus centers on enterprise deployment, with an emphasis on privacy concerns caused by the vast amount of personal and highly sensitive data. We identify 40+ challenges and systematize them into five main groups -- i) generation, ii) infrastructure & architecture, iii) governance, iv) compliance & regulation, and v) adoption. Additionally, we discuss a strategic and systematic approach that enterprises can employ to effectively address the challenges and achieve their goals by establishing trust in the implemented solutions.

Extending the Forward Forward Algorithm

  • paper_url: http://arxiv.org/abs/2307.04205
  • repo_url: https://github.com/ads-cmu/forwardforward
  • paper_authors: Saumya Gandhi, Ritu Gala, Jonah Kornberg, Advaith Sridhar
  • for: The paper proposes and experiments with the Forward Forward algorithm, a novel method for training neural networks as an alternative to backpropagation.
  • methods: It replicates Hinton's experiments on the MNIST dataset and extends the method with two significant contributions: establishing a baseline performance on the IMDb movie reviews dataset for sentiment analysis, and introducing a novel pyramidal optimization strategy for the loss threshold.
  • results: The Forward Forward network performs well on the sentiment analysis task, a good thresholding strategy causes a difference of up to 8% in test error, and visualizations of the trained parameters reveal several significant insights, such as a notably larger (10-20x) mean and variance in the weights acquired by the Forward Forward network.
    Abstract The Forward Forward algorithm, proposed by Geoffrey Hinton in November 2022, is a novel method for training neural networks as an alternative to backpropagation. In this project, we replicate Hinton's experiments on the MNIST dataset, and subsequently extend the scope of the method with two significant contributions. First, we establish a baseline performance for the Forward Forward network on the IMDb movie reviews dataset. As far as we know, our results on this sentiment analysis task marks the first instance of the algorithm's extension beyond computer vision. Second, we introduce a novel pyramidal optimization strategy for the loss threshold - a hyperparameter specific to the Forward Forward method. Our pyramidal approach shows that a good thresholding strategy causes a difference of up to 8% in test error. Lastly, we perform visualizations of the trained parameters and derived several significant insights, such as a notably larger (10-20x) mean and variance in the weights acquired by the Forward Forward network. Repository: https://github.com/Ads-cmu/ForwardForward
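
For context, here is a minimal sketch of the layer-local Forward Forward objective as we read Hinton's formulation: "goodness" is the sum of squared activations, pushed above a threshold for positive data and below it for negative data. Layer sizes, the threshold value, and the random stand-in data are assumptions; the paper's pyramidal strategy, which tunes the threshold, is not reproduced here.

```python
import torch
import torch.nn.functional as F

layer = torch.nn.Linear(784, 500)  # placeholder layer sizes
opt = torch.optim.Adam(layer.parameters(), lr=1e-3)
theta = 2.0  # goodness threshold (the hyperparameter the paper optimizes)

def goodness(x):
    h = F.relu(layer(x))
    return h.pow(2).sum(dim=1)  # one goodness score per example

x_pos = torch.randn(64, 784)  # real data (with label embedded, in Hinton's setup)
x_neg = torch.randn(64, 784)  # corrupted / negative data

# P(positive) = sigmoid(goodness - theta); the resulting NLL is
# softplus(theta - g_pos) for positives and softplus(g_neg - theta) for negatives.
loss = (F.softplus(theta - goodness(x_pos)) + F.softplus(goodness(x_neg) - theta)).mean()
opt.zero_grad()
loss.backward()  # gradients stay local to this layer: no backprop through depth
opt.step()
```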

Trajectory Alignment: Understanding the Edge of Stability Phenomenon via Bifurcation Theory

  • paper_url: http://arxiv.org/abs/2307.04204
  • repo_url: None
  • paper_authors: Minhak Song, Chulhee Yun
  • for: This paper studies the evolution of the largest eigenvalue of the loss Hessian (sharpness) during gradient descent (GD) training and the Edge of Stability (EoS) phenomenon.
  • methods: It combines empirical studies with rigorous proofs to demonstrate a trajectory alignment phenomenon: when EoS occurs, different GD trajectories (after a proper reparameterization) align on a specific bifurcation diagram, independent of initialization.
  • results: The sharpness increases in the early phase of training (progressive sharpening) and eventually saturates close to the threshold of $2 / \text{(step size)}$; the trajectory alignment phenomenon is rigorously established for a two-layer fully-connected linear network and a single-neuron nonlinear network trained with a single data point.
    Abstract Cohen et al. (2021) empirically study the evolution of the largest eigenvalue of the loss Hessian, also known as sharpness, along the gradient descent (GD) trajectory and observe a phenomenon called the Edge of Stability (EoS). The sharpness increases at the early phase of training (referred to as progressive sharpening), and eventually saturates close to the threshold of $2 / \text{(step size)}$. In this paper, we start by demonstrating through empirical studies that when the EoS phenomenon occurs, different GD trajectories (after a proper reparameterization) align on a specific bifurcation diagram independent of initialization. We then rigorously prove this trajectory alignment phenomenon for a two-layer fully-connected linear network and a single-neuron nonlinear network trained with a single data point. Our trajectory alignment analysis establishes both progressive sharpening and EoS phenomena, encompassing and extending recent findings in the literature.

On the sample complexity of estimation in logistic regression

  • paper_url: http://arxiv.org/abs/2307.04191
  • repo_url: None
  • paper_authors: Daniel Hsu, Arya Mazumdar
  • for: This paper studies the sample complexity of parameter estimation in the logistic regression model for noisy binary classification problems.
  • methods: It considers standard normal covariates and analyzes the sample complexity of estimating the model parameters up to a given $\ell_2$ error, in terms of the dimension and the inverse temperature, which controls the signal-to-noise ratio of the data generation process.
  • results: The sample complexity curve has two change-points (or critical points) in terms of the inverse temperature, clearly separating the low, moderate, and high temperature regimes.
    Abstract The logistic regression model is one of the most popular data generation model in noisy binary classification problems. In this work, we study the sample complexity of estimating the parameters of the logistic regression model up to a given $\ell_2$ error, in terms of the dimension and the inverse temperature, with standard normal covariates. The inverse temperature controls the signal-to-noise ratio of the data generation process. While both generalization bounds and asymptotic performance of the maximum-likelihood estimator for logistic regression are well-studied, the non-asymptotic sample complexity that shows the dependence on error and the inverse temperature for parameter estimation is absent from previous analyses. We show that the sample complexity curve has two change-points (or critical points) in terms of the inverse temperature, clearly separating the low, moderate, and high temperature regimes.
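
For concreteness, the data-generation model under study can be written as follows (notation ours), with the inverse temperature $\beta > 0$ setting the signal-to-noise ratio of the labels:

```latex
\[
  x_i \sim \mathcal{N}(0, I_d), \qquad
  \Pr\left(y_i = 1 \mid x_i\right) = \frac{1}{1 + e^{-\beta \langle \theta^*, x_i \rangle}} .
\]
```

The sample complexity question is then how many draws $(x_i, y_i)$ are needed to estimate $\theta^*$ up to a given $\ell_2$ error; the paper shows the answer changes regime at two critical values of $\beta$.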

Review of feedback in Automated Essay Scoring

  • paper_url: http://arxiv.org/abs/2307.05553
  • repo_url: None
  • paper_authors: You-Jin Jong, Yong-Jin Kim, Ok-Chol Ri
  • for: This paper reviews the development of automated essay scoring systems and their use as learning tools to improve users' writing skill.
  • methods: It reviews research on feedback in automated essay scoring, covering different feedback types and essay traits, along with the latest case studies of systems that provide feedback.
  • results: The review underscores that feedback is the most important aspect of making an automated essay scoring system useful in real life.
    Abstract The first automated essay scoring system was developed 50 years ago. Automated essay scoring systems are developing into systems with richer functions than the previous simple scoring systems. Its purpose is not only to score essays but also as a learning tool to improve the writing skill of users. Feedback is the most important aspect of making an automated essay scoring system useful in real life. The importance of feedback was already emphasized in the first AES system. This paper reviews research on feedback including different feedback types and essay traits on automated essay scoring. We also reviewed the latest case studies of the automated essay scoring system that provides feedback.

Latent Graph Attention for Enhanced Spatial Context

  • paper_url: http://arxiv.org/abs/2307.04149
  • repo_url: None
  • paper_authors: Ayush Singh, Yash Bhambhu, Himanshu Buckchash, Deepak K. Gupta, Dilip K. Prasad
  • for: The paper aims to bring the performance of small-scale architectures closer to that of large architectures, making light-weight models more useful for edge devices with lower compute power and energy needs.
  • methods: It proposes Latent Graph Attention (LGA), which propagates information spatially through a network of locally connected graphs to build semantically coherent relations between spatially distant points, together with a novel contrastive loss term that helps the LGA module couple well with the original architecture at minimal additional computational load.
  • results: Incorporating LGA improves performance on three challenging applications: transparent object segmentation, image restoration for dehazing, and optical flow estimation.
    Abstract Global contexts in images are quite valuable in image-to-image translation problems. Conventional attention-based and graph-based models capture the global context to a large extent, however, these are computationally expensive. Moreover, the existing approaches are limited to only learning the pairwise semantic relation between any two points on the image. In this paper, we present Latent Graph Attention (LGA) a computationally inexpensive (linear to the number of nodes) and stable, modular framework for incorporating the global context in the existing architectures, especially empowering small-scale architectures to give performance closer to large size architectures, thus making the light-weight architectures more useful for edge devices with lower compute power and lower energy needs. LGA propagates information spatially using a network of locally connected graphs, thereby facilitating to construct a semantically coherent relation between any two spatially distant points that also takes into account the influence of the intermediate pixels. Moreover, the depth of the graph network can be used to adapt the extent of contextual spread to the target dataset, thereby being able to explicitly control the added computational cost. To enhance the learning mechanism of LGA, we also introduce a novel contrastive loss term that helps our LGA module to couple well with the original architecture at the expense of minimal additional computational load. We show that incorporating LGA improves the performance on three challenging applications, namely transparent object segmentation, image restoration for dehazing and optical flow estimation.

A Survey on Figure Classification Techniques in Scientific Documents

  • paper_url: http://arxiv.org/abs/2307.05694
  • repo_url: None
  • paper_authors: Anurag Dhote, Mohammed Javed, David S Doermann
  • for: This paper is written to provide a systematic review of existing methodologies and data sets for figure classification, with the goal of identifying current research gaps and providing possible directions for future research.
  • methods: The paper uses a categorization framework to classify figures into five classes - tables, photos, diagrams, maps, and plots - and presents a critical review of existing methodologies and data sets for figure classification.
  • results: The paper identifies current research gaps in figure classification and provides possible directions for future research, including the need for more diverse and annotated data sets, the development of more sophisticated machine learning algorithms, and the integration of figure classification with other NLP tasks.
    Abstract Figures visually represent an essential piece of information and provide an effective means to communicate scientific facts. Recently there have been many efforts toward extracting data directly from figures, specifically from tables, diagrams, and plots, using different Artificial Intelligence and Machine Learning techniques. This is because removing information from figures could lead to deeper insights into the concepts highlighted in the scientific documents. In this survey paper, we systematically categorize figures into five classes - tables, photos, diagrams, maps, and plots, and subsequently present a critical review of the existing methodologies and data sets that address the problem of figure classification. Finally, we identify the current research gaps and provide possible directions for further research on figure classification.

A Survey and Approach to Chart Classification

  • paper_url: http://arxiv.org/abs/2307.04147
  • repo_url: None
  • paper_authors: Anurag Dhote, Mohammed Javed, David S Doermann
  • for: This paper addresses automatic chart understanding, which begins with classifying charts by type.
  • methods: It surveys traditional ML-based, CNN-based, and transformer-based approaches to chart classification and carries out an extensive comparative performance analysis of CNN-based and transformer-based methods.
  • results: On the CHARTINFO UB-UNITECH PMC dataset from the CHART-Infographics competition at ICPR 2022, a vision-based transformer model achieves state-of-the-art results in chart classification.
    Abstract Charts represent an essential source of visual information in documents and facilitate a deep understanding and interpretation of information typically conveyed numerically. In the scientific literature, there are many charts, each with its stylistic differences. Recently the document understanding community has begun to address the problem of automatic chart understanding, which begins with chart classification. In this paper, we present a survey of the current state-of-the-art techniques for chart classification and discuss the available datasets and their supported chart types. We broadly classify these contributions as traditional approaches based on ML, CNN, and Transformers. Furthermore, we carry out an extensive comparative performance analysis of CNN-based and transformer-based approaches on the recently published CHARTINFO UB-UNITECH PMC dataset for the CHART-Infographics competition at ICPR 2022. The data set includes 15 different chart categories, including 22,923 training images and 13,260 test images. We have implemented a vision-based transformer model that produces state-of-the-art results in chart classification.

On The Impact of Machine Learning Randomness on Group Fairness

  • paper_url: http://arxiv.org/abs/2307.04138
  • repo_url: https://github.com/Aryia-Behroziuan/neurons
  • paper_authors: Prakhar Ganesh, Hongyan Chang, Martin Strobel, Reza Shokri
  • for: This paper examines group fairness measures in machine learning and their high variance across different training instances, which makes them unreliable for empirical evaluation of fairness.
  • methods: The authors investigate how different sources of randomness in training neural networks affect group fairness measures.
  • results: The variance in group fairness measures is rooted in the high volatility of the learning process on under-represented groups, with the stochasticity of data order during training as the dominant source of randomness; changing the data order for a single epoch can control group-level accuracy with high efficiency and negligible impact on the model's overall performance.
    Abstract Statistical measures for group fairness in machine learning reflect the gap in performance of algorithms across different groups. These measures, however, exhibit a high variance between different training instances, which makes them unreliable for empirical evaluation of fairness. What causes this high variance? We investigate the impact on group fairness of different sources of randomness in training neural networks. We show that the variance in group fairness measures is rooted in the high volatility of the learning process on under-represented groups. Further, we recognize the dominant source of randomness as the stochasticity of data order during training. Based on these findings, we show how one can control group-level accuracy (i.e., model fairness), with high efficiency and negligible impact on the model's overall performance, by simply changing the data order for a single epoch.
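
The intervention the results point to is easy to state in code: hold everything fixed and reseed only the shuffling generator for a single epoch. The dataset, batch size, and the epoch chosen for the intervention below are placeholders.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.randn(1000, 20), torch.randint(0, 2, (1000,)))

def loader_for_epoch(epoch, special_epoch=3, special_seed=12345):
    g = torch.Generator()
    # Every epoch follows a fixed seed schedule except one, whose order we vary;
    # the paper reports this alone can shift group-level accuracy.
    g.manual_seed(special_seed if epoch == special_epoch else epoch)
    return DataLoader(dataset, batch_size=32, shuffle=True, generator=g)

for epoch in range(5):
    for xb, yb in loader_for_epoch(epoch):
        pass  # usual training step goes here
```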

Graph Neural Network-enabled Terahertz-based Flow-guided Nanoscale Localization

  • paper_url: http://arxiv.org/abs/2307.05551
  • repo_url: None
  • paper_authors: Gerard Calvo Bartra, Filip Lemic, Sergi Abadal, Xavier Costa Perez
  • for: This paper proposes a Graph Neural Network (GNN)-enabled flow-guided localization method to improve the accuracy and coverage of locating events of diagnostic interest in the cardiovascular system.
  • methods: It uses GNNs to localize events from data gathered by passively flowing, continuously sensing nanoscale devices.
  • results: The approach improves localization accuracy and coverage over existing state-of-the-art approaches, and the evaluation yields several design guidelines for GNN-enabled flow-guided localization.
    Abstract Scientific advancements in nanotechnology and advanced materials are paving the way toward nanoscale devices for in-body precision medicine; comprising integrated sensing, computing, communication, data and energy storage capabilities. In the human cardiovascular system, such devices are envisioned to be passively flowing and continuously sensing for detecting events of diagnostic interest. The diagnostic value of detecting such events can be enhanced by assigning to them their physical locations (e.g., body region), which is the main proposition of flow-guided localization. Current flow-guided localization approaches suffer from low localization accuracy and they are by-design unable to localize events within the entire cardiovascular system. Toward addressing this issue, we propose the utilization of Graph Neural Networks (GNNs) for this purpose, and demonstrate localization accuracy and coverage enhancements of our proposal over the existing State of the Art (SotA) approaches. Based on our evaluation, we provide several design guidelines for GNN-enabled flow-guided localization.

Carbon-Efficient Neural Architecture Search

  • paper_url: http://arxiv.org/abs/2307.04131
  • repo_url: None
  • paper_authors: Yiyang Zhao, Tian Guo
  • for: The paper aims to reduce energy costs and increase carbon efficiency during the model design process.
  • methods: It proposes carbon-efficient NAS (CE-NAS), consisting of NAS evaluation algorithms with different energy requirements, a multi-objective optimizer, and a heuristic GPU allocation strategy that dynamically balances energy-efficient sampling and energy-consuming evaluation tasks based on current carbon emissions.
  • results: Trace-driven simulations using a recent NAS benchmark dataset and two carbon traces show that CE-NAS achieves better carbon and search efficiency than three baselines.
    Abstract This work presents a novel approach to neural architecture search (NAS) that aims to reduce energy costs and increase carbon efficiency during the model design process. The proposed framework, called carbon-efficient NAS (CE-NAS), consists of NAS evaluation algorithms with different energy requirements, a multi-objective optimizer, and a heuristic GPU allocation strategy. CE-NAS dynamically balances energy-efficient sampling and energy-consuming evaluation tasks based on current carbon emissions. Using a recent NAS benchmark dataset and two carbon traces, our trace-driven simulations demonstrate that CE-NAS achieves better carbon and search efficiency than the three baselines.

A Deep Learning Framework for Solving Hyperbolic Partial Differential Equations: Part I

  • paper_url: http://arxiv.org/abs/2307.04121
  • repo_url: None
  • paper_authors: Rajat Arora
  • for: This research develops a physics-informed deep learning framework to accurately approximate solutions to nonlinear hyperbolic partial differential equations (PDEs) that can develop shocks or discontinuities, without a-priori knowledge of the solution or the location of the discontinuities.
  • methods: The framework takes motivation from the finite element method, solving for solution values at nodes in the discretized domain, and builds on the rigorous mathematical foundations of the discontinuous Galerkin method.
  • results: Numerical experiments and validation against analytical solutions demonstrate the accuracy, robustness, and effectiveness of the framework, which naturally handles boundary conditions (Neumann/Dirichlet), entropy conditions, and regularity requirements.
    Abstract Physics informed neural networks (PINNs) have emerged as a powerful tool to provide robust and accurate approximations of solutions to partial differential equations (PDEs). However, PINNs face serious difficulties and challenges when trying to approximate PDEs with dominant hyperbolic character. This research focuses on the development of a physics informed deep learning framework to approximate solutions to nonlinear PDEs that can develop shocks or discontinuities without any a-priori knowledge of the solution or the location of the discontinuities. The work takes motivation from finite element method that solves for solution values at nodes in the discretized domain and use these nodal values to obtain a globally defined solution field. Built on the rigorous mathematical foundations of the discontinuous Galerkin method, the framework naturally handles imposition of boundary conditions (Neumann/Dirichlet), entropy conditions, and regularity requirements. Several numerical experiments and validation with analytical solutions demonstrate the accuracy, robustness, and effectiveness of the proposed framework.

FILM: How can Few-Shot Image Classification Benefit from Pre-Trained Language Models?

  • paper_url: http://arxiv.org/abs/2307.04114
  • repo_url: None
  • paper_authors: Zihao Jiang, Yunkai Dang, Dong Pang, Huishuai Zhang, Weiran Huang
  • for: The paper aims to enhance few-shot learning, where models must generalize to novel classes with only a few samples.
  • methods: It uses pre-trained language models based on contrastive learning, with a carefully designed textual branch, a metric module that generalizes cosine similarity and adapts to different few-shot tasks, and MAML training via bi-level optimization for better transferability.
  • results: The proposed framework makes fuller use of semantic information, and extensive experiments on multiple benchmarks demonstrate its effectiveness.
    Abstract Few-shot learning aims to train models that can be generalized to novel classes with only a few samples. Recently, a line of works are proposed to enhance few-shot learning with accessible semantic information from class names. However, these works focus on improving existing modules such as visual prototypes and feature extractors of the standard few-shot learning framework. This limits the full potential use of semantic information. In this paper, we propose a novel few-shot learning framework that uses pre-trained language models based on contrastive learning. To address the challenge of alignment between visual features and textual embeddings obtained from text-based pre-trained language model, we carefully design the textual branch of our framework and introduce a metric module to generalize the cosine similarity. For better transferability, we let the metric module adapt to different few-shot tasks and adopt MAML to train the model via bi-level optimization. Moreover, we conduct extensive experiments on multiple benchmarks to demonstrate the effectiveness of our method.

Learning Space-Time Continuous Neural PDEs from Partially Observed States

  • paper_url: http://arxiv.org/abs/2307.04110
  • repo_url: None
  • paper_authors: Valerii Iakovlev, Markus Heinonen, Harri Lähdesmäki
  • for: This paper addresses learning partial differential equation (PDE) models from noisy and partial observations on irregular spatiotemporal grids.
  • methods: It proposes a space-time continuous latent neural PDE model with an efficient probabilistic framework and a novel encoder design for improved data efficiency and grid independence, using amortized variational inference and a multiple shooting technique for training speed and stability.
  • results: The model achieves state-of-the-art performance on complex synthetic and real-world datasets, overcoming limitations of previous approaches and effectively handling partially-observed data, showing its potential to advance data-driven PDE modeling and enable robust, grid-independent modeling of complex partially-observed dynamic processes.
    Abstract We introduce a novel grid-independent model for learning partial differential equations (PDEs) from noisy and partial observations on irregular spatiotemporal grids. We propose a space-time continuous latent neural PDE model with an efficient probabilistic framework and a novel encoder design for improved data efficiency and grid independence. The latent state dynamics are governed by a PDE model that combines the collocation method and the method of lines. We employ amortized variational inference for approximate posterior estimation and utilize a multiple shooting technique for enhanced training speed and stability. Our model demonstrates state-of-the-art performance on complex synthetic and real-world datasets, overcoming limitations of previous approaches and effectively handling partially-observed data. The proposed model outperforms recent methods, showing its potential to advance data-driven PDE modeling and enabling robust, grid-independent modeling of complex partially-observed dynamic processes.

Towards Assumption-free Bias Mitigation

  • paper_url: http://arxiv.org/abs/2307.04105
  • repo_url: None
  • paper_authors: Chia-Yuan Chang, Yu-Neng Chuang, Kwei-Herng Lai, Xiaotian Han, Xia Hu, Na Zou
  • for: The paper aims to mitigate unfair prediction behaviors of machine learning models.
  • methods: It proposes an assumption-free framework that automatically detects bias-related attributes by modeling feature interactions, without requiring sensitive attributes or strong assumptions about their correlation with non-sensitive attributes.
  • results: Experiments on four real-world datasets show that the framework can significantly alleviate unfair prediction behaviors by considering biased feature interactions.
    Abstract Despite the impressive prediction ability, machine learning models show discrimination towards certain demographics and suffer from unfair prediction behaviors. To alleviate the discrimination, extensive studies focus on eliminating the unequal distribution of sensitive attributes via multiple approaches. However, due to privacy concerns, sensitive attributes are often either unavailable or missing in real-world scenarios. Therefore, several existing works alleviate the bias without sensitive attributes. Those studies face challenges, either in inaccurate predictions of sensitive attributes or the need to mitigate unequal distribution of manually defined non-sensitive attributes related to bias. The latter requires strong assumptions about the correlation between sensitive and non-sensitive attributes. As data distribution and task goals vary, the strong assumption on non-sensitive attributes may not be valid and require domain expertise. In this work, we propose an assumption-free framework to detect the related attributes automatically by modeling feature interaction for bias mitigation. The proposed framework aims to mitigate the unfair impact of identified biased feature interactions. Experimental results on four real-world datasets demonstrate that our proposed framework can significantly alleviate unfair prediction behaviors by considering biased feature interactions.
    摘要 尽管机器学习模型表现出色,但它们仍然存在对某些人群的歧视行为,并且受到不公正预测行为的影响。为了解决这问题,广泛的研究努力于消除敏感特征的不均衡分布,使用多种方法。然而,由于隐私问题,敏感特征在实际场景中 oftentimes 缺失或未知。因此,现有的研究往往需要快速假设敏感特征的替代方案,以减少不公正的预测行为。然而,这些研究面临着两个挑战:一是不准确预测敏感特征,二是需要调整不公正的非敏感特征,这需要强制假设敏感特征和非敏感特征之间的相关性。由于数据分布和任务目标因素的变化,这些假设可能不正确,需要培训领域专家。在这种情况下,我们提出一种假设自由框架,通过模型特性互动来检测相关特征,以消除不公正的影响。我们的提议框架可以减少不公正预测行为,并且实际试验结果表明,在四个真实的数据集上,我们的框架可以明显减少不公正预测行为。

A generative flow for conditional sampling via optimal transport

  • paper_url: http://arxiv.org/abs/2307.04102
  • repo_url: https://github.com/giuliotrigila/moonexample
  • paper_authors: Jason Alfonso, Ricardo Baptista, Anupam Bhakta, Noam Gal, Alfin Hou, Isa Lyubimova, Daniel Pocklington, Josef Sajonz, Giulio Trigila, Ryan Tsai
  • for: 这个论文是用来解决Conditional sampling的问题的,即在bayesian inference和density estimation中对于非正态分布的问题。
  • methods: 这个论文使用了一种非 Parametric generative model,它通过iteratively mapping reference samples to the target distribution来描述conditionals。这个模型使用了块三角形的交通地图,其中每个块的组件都可以描述目标分布中的conditionals。这个地图由解决一个最优交通问题来获得,其中cost函数是一个weighted $L^2$ cost function。
  • results: 这个论文的实验结果表明,这种方法可以成功地描述许多非正态的问题,并且比传统的正态流和生成敌对网络更加稳定和可靠。
    Abstract Sampling conditional distributions is a fundamental task for Bayesian inference and density estimation. Generative models, such as normalizing flows and generative adversarial networks, characterize conditional distributions by learning a transport map that pushes forward a simple reference (e.g., a standard Gaussian) to a target distribution. While these approaches successfully describe many non-Gaussian problems, their performance is often limited by parametric bias and the reliability of gradient-based (adversarial) optimizers to learn these transformations. This work proposes a non-parametric generative model that iteratively maps reference samples to the target. The model uses block-triangular transport maps, whose components are shown to characterize conditionals of the target distribution. These maps arise from solving an optimal transport problem with a weighted $L^2$ cost function, thereby extending the data-driven approach in [Trigila and Tabak, 2016] for conditional sampling. The proposed approach is demonstrated on a two dimensional example and on a parameter inference problem involving nonlinear ODEs.
    摘要 采样 conditional distributions 是 bayesian inference 和 density estimation 的基本任务之一。生成模型,如 нормализацион流和生成对抗网络,可以通过学习一个传输图来描述 conditional distributions,其中传输图将一个简单的参考(例如标准正态)推进到目标分布中。although these approaches have been successful in solving many non-Gaussian problems, their performance is often limited by parametric bias and the reliability of gradient-based (adversarial) optimizers to learn these transformations.本文提出了一种非Parametric生成模型,可以逐步将参考样本映射到目标分布中。该模型使用块三角形的传输图,其中每个组件可以表示目标分布的 conditionals。这些传输图来自于解一个最优运输问题,其中的cost函数是 weighted $L^2$ 的,从而扩展了 [Trigila and Tabak, 2016] 中的数据驱动方法。提出的方法在二维示例和非线性 ODE 参数推断中进行了示例。

GNP Attack: Transferable Adversarial Examples via Gradient Norm Penalty

  • paper_url: http://arxiv.org/abs/2307.04099
  • repo_url: None
  • paper_authors: Tao Wu, Tie Luo, Donald C. Wunsch
  • for: 这篇论文的目的是提高敌意例的传输性,使其可以对各种目标模型进行实际的黑盒攻击,不需要内部知识。
  • methods: 这篇论文提出了一种新的方法,即Gradient Norm Penalty(GNP),用于提高敌意例的传输性。GNP使得优化过程 converges to a flat region of local optima in the loss landscape,从而提高了敌意例的通用性。
  • results: 通过对11种state-of-the-art深度学习模型和6种高级防御方法进行实验,这篇论文证明了GNP的高效性和灵活性。GNP可以轻松地与其他梯度基本方法结合使用,以实现更强大的传输基本攻击。
    Abstract Adversarial examples (AE) with good transferability enable practical black-box attacks on diverse target models, where insider knowledge about the target models is not required. Previous methods often generate AE with no or very limited transferability; that is, they easily overfit to the particular architecture and feature representation of the source, white-box model and the generated AE barely work for target, black-box models. In this paper, we propose a novel approach to enhance AE transferability using Gradient Norm Penalty (GNP). It drives the loss function optimization procedure to converge to a flat region of local optima in the loss landscape. By attacking 11 state-of-the-art (SOTA) deep learning models and 6 advanced defense methods, we empirically show that GNP is very effective in generating AE with high transferability. We also demonstrate that it is very flexible in that it can be easily integrated with other gradient based methods for stronger transfer-based attacks.
    摘要 “敌对例”(AE) WITH 良好的传播能力允许实际的黑盒攻击多种目标模型,不需要内部知识 About the target models. Previous methods often generate AE with no or very limited transferability; that is, they easily overfit to the particular architecture and feature representation of the source, white-box model, and the generated AE barely work for target, black-box models. In this paper, we propose a novel approach to enhance AE transferability using Gradient Norm Penalty (GNP). It drives the loss function optimization procedure to converge to a flat region of local optima in the loss landscape. By attacking 11 state-of-the-art (SOTA) deep learning models and 6 advanced defense methods, we empirically show that GNP is very effective in generating AE with high transferability. We also demonstrate that it is very flexible in that it can be easily integrated with other gradient-based methods for stronger transfer-based attacks.Note: Please note that the translation is in Simplified Chinese, which is one of the two standardized Chinese writing systems.

SpreadNUTS – Moderate Dynamic Extension of Paths for No-U-Turn Sampling & Partitioning Visited Regions

  • paper_url: http://arxiv.org/abs/2307.06279
  • repo_url: None
  • paper_authors: Fareed Sheriff
  • for: This paper aims to improve the efficiency and speed of convergence of Hamiltonian Monte Carlo (HMC) methods for sampling distributions.
  • methods: The paper introduces modifications to the no-U-turn sampler (NUTS) algorithm to explore the sample space faster and achieve faster convergence to the true distribution.
  • results: The modified NUTS algorithm is shown to have faster convergence to the true distribution than the original NUTS algorithm.
    Abstract Markov chain Monte Carlo (MCMC) methods have existed for a long time and the field is well-explored. The purpose of MCMC methods is to approximate a distribution through repeated sampling; most MCMC algorithms exhibit asymptotically optimal behavior in that they converge to the true distribution at the limit. However, what differentiates these algorithms are their practical convergence guarantees and efficiency. While a sampler may eventually approximate a distribution well, because it is used in the real world it is necessary that the point at which the sampler yields a good estimate of the distribution is reachable in a reasonable amount of time. Similarly, if it is computationally difficult or intractable to produce good samples from a distribution for use in estimation, then there is no real-world utility afforded by the sampler. Thus, most MCMC methods these days focus on improving efficiency and speeding up convergence. However, many MCMC algorithms suffer from random walk behavior and often only mitigate such behavior as outright erasing random walks is difficult. Hamiltonian Monte Carlo (HMC) is a class of MCMC methods that theoretically exhibit no random walk behavior because of properties related to Hamiltonian dynamics. This paper introduces modifications to a specific HMC algorithm known as the no-U-turn sampler (NUTS) that aims to explore the sample space faster than NUTS, yielding a sampler that has faster convergence to the true distribution than NUTS.
    摘要 Hamiltonian Monte Carlo (HMC) 是一种 MCMC 方法,它们在理论上不会受到随机步行行为的影响,因为它们具有相关的 Hamiltonian dynamics 性质。这篇文章介绍了一种对 NUTS 算法(No-U-turn sampler)进行修改,以实现更快地探索样本空间,并实现一个更快速地趋向真实分布的抽样器。

Restricted Generative Projection for One-Class Classification and Anomaly Detection

  • paper_url: http://arxiv.org/abs/2307.04097
  • repo_url: None
  • paper_authors: Feng Xiao, Ruoyu Sun, Jicong Fan
  • for: 一篇关于一类分类和异常检测的论文,核心思想是将未知训练数据的分布映射到一个已知目标分布上。
  • methods: 我们提出使用截断 Gaussian、均匀在卷积体、均匀在卷积体或均匀 между卷积体作为目标分布。然后我们寻找将数据分布映射到目标分布的最佳方法,以保持原始数据的重建错误小。
  • results: 对多个基准数据集进行比较研究,我们的方法与基准方法相比,显示更高的效果。
    Abstract We present a simple framework for one-class classification and anomaly detection. The core idea is to learn a mapping to transform the unknown distribution of training (normal) data to a known target distribution. Crucially, the target distribution should be sufficiently simple, compact, and informative. The simplicity is to ensure that we can sample from the distribution easily, the compactness is to ensure that the decision boundary between normal data and abnormal data is clear and reliable, and the informativeness is to ensure that the transformed data preserve the important information of the original data. Therefore, we propose to use truncated Gaussian, uniform in hypersphere, uniform on hypersphere, or uniform between hyperspheres, as the target distribution. We then minimize the distance between the transformed data distribution and the target distribution while keeping the reconstruction error for the original data small enough. Comparative studies on multiple benchmark datasets verify the effectiveness of our methods in comparison to baselines.
    摘要 我们提出了一个简单的框架 для一类分类和异常检测。核心思想是学习将未知的训练数据分布映射到已知的目标分布上。重要的是,目标分布应该够简单、够 компакт、够有信息。简单是以便我们可以轻松地采样分布, compactness 是以便决策边界 между 正常数据和异常数据清晰可靠,有信息是以便保留原始数据的重要信息。因此,我们提议使用 truncated Gaussian, uniform in hypersphere, uniform on hypersphere,或 uniform between hyperspheres 作为目标分布。然后我们尝试将数据分布与目标分布的距离降为最小化,保持原始数据的重建错误小 enough。多种比较研究在多个 benchmark 数据集上验证了我们的方法的有效性,与基eline 相比。

Class-Incremental Mixture of Gaussians for Deep Continual Learning

  • paper_url: http://arxiv.org/abs/2307.04094
  • repo_url: None
  • paper_authors: Lukasz Korycki, Bartosz Krawczyk
  • for: 这篇论文专门针对静态数据的连续学习模型,旨在学习和保持来自顺序推出的概念。
  • methods: 该论文提出了一种基于中心点驱动方法的综合含气球模型,并在连续学习框架中进行了端到端的整合。
  • results: 实验表明,该模型在内存免除的场景下可以有效地学习,并与现有的连续学习基eline相比较竞争力强。
    Abstract Continual learning models for stationary data focus on learning and retaining concepts coming to them in a sequential manner. In the most generic class-incremental environment, we have to be ready to deal with classes coming one by one, without any higher-level grouping. This requirement invalidates many previously proposed methods and forces researchers to look for more flexible alternative approaches. In this work, we follow the idea of centroid-driven methods and propose end-to-end incorporation of the mixture of Gaussians model into the continual learning framework. By employing the gradient-based approach and designing losses capable of learning discriminative features while avoiding degenerate solutions, we successfully combine the mixture model with a deep feature extractor allowing for joint optimization and adjustments in the latent space. Additionally, we show that our model can effectively learn in memory-free scenarios with fixed extractors. In the conducted experiments, we empirically demonstrate the effectiveness of the proposed solutions and exhibit the competitiveness of our model when compared with state-of-the-art continual learning baselines evaluated in the context of image classification problems.
    摘要

Properly Learning Decision Trees with Queries Is NP-Hard

  • paper_url: http://arxiv.org/abs/2307.04093
  • repo_url: None
  • paper_authors: Caleb Koch, Carmen Strassle, Li-Yang Tan
  • for: 本研究证明了在使用查询学习时,正确地学习决策树是NP困难的,解决了长期存在的开放问题在学习理论中(Bshouty 1993;Guijarro-Lavin-Raghavan 1999;Mehta-Raghavan 2002;Feldman 2016)。
  • methods: 我们引入了一种called hardness distillation,用于研究决策树复杂性的函数。我们的技术可以应用于任何复杂度度量,并可以排除常量错误的查询学习器。
  • results: 我们的结果,与最近的几乎多项式时间查询算法(Blanc-Lange-Qiao-Tan 2022)一起,表明了分布假设对问题的影响。
    Abstract We prove that it is NP-hard to properly PAC learn decision trees with queries, resolving a longstanding open problem in learning theory (Bshouty 1993; Guijarro-Lavin-Raghavan 1999; Mehta-Raghavan 2002; Feldman 2016). While there has been a long line of work, dating back to (Pitt-Valiant 1988), establishing the hardness of properly learning decision trees from random examples, the more challenging setting of query learners necessitates different techniques and there were no previous lower bounds. En route to our main result, we simplify and strengthen the best known lower bounds for a different problem of Decision Tree Minimization (Zantema-Bodlaender 2000; Sieling 2003). On a technical level, we introduce the notion of hardness distillation, which we study for decision tree complexity but can be considered for any complexity measure: for a function that requires large decision trees, we give a general method for identifying a small set of inputs that is responsible for its complexity. Our technique even rules out query learners that are allowed constant error. This contrasts with existing lower bounds for the setting of random examples which only hold for inverse-polynomial error. Our result, taken together with a recent almost-polynomial time query algorithm for properly learning decision trees under the uniform distribution (Blanc-Lange-Qiao-Tan 2022), demonstrates the dramatic impact of distributional assumptions on the problem.
    摘要 我们证明了对问题进行PAC学习是NP困难的,解决了学习理论中长期存在的开问题(Bshouty 1993;Guijarro-Lavin-Raghavan 1999;Mehta-Raghavan 2002;Feldman 2016)。尽管在过去的工作中(Pitt-Valiant 1988)已经证明了从随机例子学习决策树的困难性,但是问题的设定是查询学习需要不同的技术,没有过去的下界。我们在证明过程中简化了和加强了最好的下界,并将其应用于决策树问题。在技术上,我们引入了困难炼煮的概念,它可以应用于任何复杂度度量。对于需要大型决策树的函数,我们提供了一个通用的方法,可以从小批量的输入中获得问题的复杂性。我们的技术甚至可以排除查询学习器,即允许常数错误。这与以往的下界,仅适用于随机例子中的问题,不同之处。我们的结果,加上最近的几乎多项时间查询算法(Blanc-Lange-Qiao-Tan 2022),显示出分布假设的影响。

DebateKG: Automatic Policy Debate Case Creation with Semantic Knowledge Graphs

  • paper_url: http://arxiv.org/abs/2307.04090
  • repo_url: https://github.com/hellisotherpeople/debatekg
  • paper_authors: Allen Roush
  • for: 这篇论文主要针对竞议辩护中的问题,即如何使用自然语言处理系统解决竞议辩护中的问题。
  • methods: 该论文使用了受限短路征服算法在 Argumentative Semantic Knowledge Graphs 上进行搜索,以构建高质量的辩护案例。
  • results: 该论文在 Policy Debate 中的一种美国竞议辩护中,使用了53180个新的例子和更多的有用metadata,并使用了 txtai semantic search 和知识图工具链生成了9个Semantic Knowledge Graphs。这些知识图可以评估哪些知识图在生成政策辩护案例方面更好。
    Abstract Recent work within the Argument Mining community has shown the applicability of Natural Language Processing systems for solving problems found within competitive debate. One of the most important tasks within competitive debate is for debaters to create high quality debate cases. We show that effective debate cases can be constructed using constrained shortest path traversals on Argumentative Semantic Knowledge Graphs. We study this potential in the context of a type of American Competitive Debate, called Policy Debate, which already has a large scale dataset targeting it called DebateSum. We significantly improve upon DebateSum by introducing 53180 new examples, as well as further useful metadata for every example, to the dataset. We leverage the txtai semantic search and knowledge graph toolchain to produce and contribute 9 semantic knowledge graphs built on this dataset. We create a unique method for evaluating which knowledge graphs are better in the context of producing policy debate cases. A demo which automatically generates debate cases, along with all other code and the Knowledge Graphs, are open-sourced and made available to the public here: https://github.com/Hellisotherpeople/DebateKG
    摘要 最近在辩论挖掘社区中的工作表明,自然语言处理系统可以解决竞赛辩论中的问题。辩论中最重要的任务之一是创建高质量辩论案例。我们表明,可以使用受限短路径搜索在辩论Semantic Knowledge Graphs中构建高效的辩论案例。我们在美国竞赛辩论(Policy Debate)的上下文中研究这一潜力,使用DebateSum数据集进行研究。我们在DebateSum数据集上进行了大规模的扩展和补充,增加了53180个新的示例,以及每个示例的更多有用的元数据。我们使用txtai的semantic搜索和知识图工具链生成和提交了9个基于这些数据集的semantic知识图。我们还创建了一种用于评估这些知识图在生成政策辩论案例中的优劣的评价方法。一个自动生成辩论案例的demo,以及所有代码和知识图,都是公开开源的,可以在以下链接中获取:https://github.com/Hellisotherpeople/DebateKG。

Semi Supervised Meta Learning for Spatiotemporal Learning

  • paper_url: http://arxiv.org/abs/2308.01916
  • repo_url: None
  • paper_authors: Faraz Waseem, Pratyush Muthukumar
  • for: 这个论文的目的是应用元学习到自我超visedMasked autoencoders中进行空间时间学习。
  • methods: 这个论文使用的方法包括:使用Memory Augmented Neural Network(MANN)架构应用元学习到我们的小规模空间时间 dataset上进行视频重建任务,以及在MAE encoder上进行动作分类任务。
  • results: 这个论文的结果显示,通过应用元学习到MAE架构中,可以提高视频重建和动作分类的性能。
    Abstract We approached the goal of applying meta-learning to self-supervised masked autoencoders for spatiotemporal learning in three steps. Broadly, we seek to understand the impact of applying meta-learning to existing state-of-the-art representation learning architectures. Thus, we test spatiotemporal learning through: a meta-learning architecture only, a representation learning architecture only, and an architecture applying representation learning alongside a meta learning architecture. We utilize the Memory Augmented Neural Network (MANN) architecture to apply meta-learning to our framework. Specifically, we first experiment with applying a pre-trained MAE and fine-tuning on our small-scale spatiotemporal dataset for video reconstruction tasks. Next, we experiment with training an MAE encoder and applying a classification head for action classification tasks. Finally, we experiment with applying a pre-trained MAE and fine-tune with MANN backbone for action classification tasks.
    摘要 我们在三个步骤中尝试了应用元学习到自动编码器中进行空间时间学习。我们的目标是理解将元学习应用到现有的状态艺术表示学习架构中的影响。因此,我们通过以下三种方法进行测试:一种仅使用元学习架构,一种仅使用表示学习架构,以及一种同时使用表示学习和元学习架构。我们使用带有记忆增强的神经网络(MANN)架构来应用元学习到我们的小规模空间时间数据集中。首先,我们试验了使用预训练的MAE(自动编码器)并在我们的小规模空间时间数据集中细化 reconstruction 任务。然后,我们试验了训练MAE编码器并应用一个分类头进行动作分类任务。最后,我们试验了使用预训练的MAE并在Mann架构中进行细化,以便进行动作分类任务。

Disentangling Societal Inequality from Model Biases: Gender Inequality in Divorce Court Proceedings

  • paper_url: http://arxiv.org/abs/2307.10200
  • repo_url: None
  • paper_authors: Sujan Dutta, Parth Srivastava, Vaishnavi Solunke, Swaprava Nath, Ashiqur R. KhudaBukhsh
  • for: This paper uses court proceedings to investigate gender inequality in divorce cases in India.
  • methods: The paper uses natural language processing (NLP) techniques to analyze the court proceedings, but also acknowledges the limitations and biases present in these methods.
  • results: The paper finds that while there may be changing social norms in India, with more women challenging patriarchy, the court proceedings reveal striking gender inequality, with women often experiencing domestic violence.Here’s the same information in Simplified Chinese text:
  • for: 这篇论文通过印度离婚法院记录来研究妇女不平等。
  • methods: 这篇论文使用自然语言处理(NLP)技术分析法院记录,但也承认这些方法中存在限制和偏见。
  • results: 这篇论文发现,虽然印度社会规范可能在变化,但法院记录表明,妇女经常遭受家庭暴力。
    Abstract Divorce is the legal dissolution of a marriage by a court. Since this is usually an unpleasant outcome of a marital union, each party may have reasons to call the decision to quit which is generally documented in detail in the court proceedings. Via a substantial corpus of 17,306 court proceedings, this paper investigates gender inequality through the lens of divorce court proceedings. While emerging data sources (e.g., public court records) on sensitive societal issues hold promise in aiding social science research, biases present in cutting-edge natural language processing (NLP) methods may interfere with or affect such studies. We thus require a thorough analysis of potential gaps and limitations present in extant NLP resources. In this paper, on the methodological side, we demonstrate that existing NLP resources required several non-trivial modifications to quantify societal inequalities. On the substantive side, we find that while a large number of court cases perhaps suggest changing norms in India where women are increasingly challenging patriarchy, AI-powered analyses of these court proceedings indicate striking gender inequality with women often subjected to domestic violence.
    摘要 决aroof是法律解除婚姻的法院程序。由于这通常是婚姻合作的不愉快结果,每个方可能有理由来称呼这个决定,通常会在法律程序中详细记录。通过17,306起法律程序的庞大资料库,这篇论文研究了妇女不平等问题,通过婚姻法律程序的视角。虽然新的数据源(如公共法律记录)在社会问题上具有潜在的研究价值,但是现有的自然语言处理(NLP)技术可能会对或affect这些研究。因此,我们需要进行深入的分析,检查现有NLP资源中的潜在差距和局限性。在方法ológico side,我们示出了现有NLP资源的修改,以便量化社会不平等。在 substantiál side,我们发现,尽管有很多法律案件,但AI对这些法律案件的分析表明,妇女在婚姻中 frequently subjected to domestic violence。

Score-based Conditional Generation with Fewer Labeled Data by Self-calibrating Classifier Guidance

  • paper_url: http://arxiv.org/abs/2307.04081
  • repo_url: None
  • paper_authors: Paul Kuo-Ming Huang, Si-An Chen, Hsuan-Tien Lin
  • for: 提高基于标签数据的少量数据下的生成模型质量
  • methods: 使用能量基模型改进类别导向的深度生成模型,并通过使用标签和未标签数据进行准确的均衡
  • results: 提高了基于少量标签数据的生成质量,并在使用不同百分比的标签数据时表现出优异性,证明了提案的方法在生成模型化中具有广泛的应用前景
    Abstract Score-based Generative Models (SGMs) are a popular family of deep generative models that achieves leading image generation quality. Earlier studies have extended SGMs to tackle class-conditional generation by coupling an unconditional SGM with the guidance of a trained classifier. Nevertheless, such classifier-guided SGMs do not always achieve accurate conditional generation, especially when trained with fewer labeled data. We argue that the issue is rooted in unreliable gradients of the classifier and the inability to fully utilize unlabeled data during training. We then propose to improve classifier-guided SGMs by letting the classifier calibrate itself. Our key idea is to use principles from energy-based models to convert the classifier as another view of the unconditional SGM. Then, existing loss for the unconditional SGM can be adopted to calibrate the classifier using both labeled and unlabeled data. Empirical results validate that the proposed approach significantly improves the conditional generation quality across different percentages of labeled data. The improved performance makes the proposed approach consistently superior to other conditional SGMs when using fewer labeled data. The results confirm the potential of the proposed approach for generative modeling with limited labeled data.
    摘要 Score-based生成模型(SGM)是一种深度生成模型,其可以实现领先的图像生成质量。 Earlier studies have extended SGMs to tackle class-conditional generation by coupling an unconditional SGM with the guidance of a trained classifier. However, such classifier-guided SGMs do not always achieve accurate conditional generation, especially when trained with fewer labeled data. We argue that the issue is rooted in unreliable gradients of the classifier and the inability to fully utilize unlabeled data during training. We then propose to improve classifier-guided SGMs by letting the classifier calibrate itself. Our key idea is to use principles from energy-based models to convert the classifier as another view of the unconditional SGM. Then, existing loss for the unconditional SGM can be adopted to calibrate the classifier using both labeled and unlabeled data. Empirical results validate that the proposed approach significantly improves the conditional generation quality across different percentages of labeled data. The improved performance makes the proposed approach consistently superior to other conditional SGMs when using fewer labeled data. The results confirm the potential of the proposed approach for generative modeling with limited labeled data.

Towards Fast and Scalable Private Inference

  • paper_url: http://arxiv.org/abs/2307.04077
  • repo_url: None
  • paper_authors: Jianqiao Mo, Karthik Garimella, Negar Neda, Austin Ebel, Brandon Reagen
  • for: 本文旨在探讨如何通过隐私保护计算(Privacy-Preserving Computation,PPC)技术来提高用户数据的安全性和隐私性。
  • methods: 本文使用了许多现有的隐私保护计算技术,包括同知加密(Homomorphic Encryption,HE)、秘密分享(Secret Sharing,SS)、卷积阵列(Garbled Circuits,GCs)和无知传输(Oblivious Transfer,OT)等。
  • results: 本文对这些技术的使用 overhead 进行了Characterization,并提出了一些加速GCs和HE加速器的解决方案,包括HAAC和RPU。最后,本文还讨论了未来工作的需要,以减少PPC的开销。
    Abstract Privacy and security have rapidly emerged as first order design constraints. Users now demand more protection over who can see their data (confidentiality) as well as how it is used (control). Here, existing cryptographic techniques for security fall short: they secure data when stored or communicated but must decrypt it for computation. Fortunately, a new paradigm of computing exists, which we refer to as privacy-preserving computation (PPC). Emerging PPC technologies can be leveraged for secure outsourced computation or to enable two parties to compute without revealing either users' secret data. Despite their phenomenal potential to revolutionize user protection in the digital age, the realization has been limited due to exorbitant computational, communication, and storage overheads. This paper reviews recent efforts on addressing various PPC overheads using private inference (PI) in neural network as a motivating application. First, the problem and various technologies, including homomorphic encryption (HE), secret sharing (SS), garbled circuits (GCs), and oblivious transfer (OT), are introduced. Next, a characterization of their overheads when used to implement PI is covered. The characterization motivates the need for both GCs and HE accelerators. Then two solutions are presented: HAAC for accelerating GCs and RPU for accelerating HE. To conclude, results and effects are shown with a discussion on what future work is needed to overcome the remaining overheads of PI.
    摘要 <>转换文本到简化中文。<>隐私和安全已经迅速emerge为首要的设计约束。用户现在要求更多的保护,包括谁可以看到他们的数据(保密)以及如何使用它们(控制)。在这里,现有的 криптографические技术 для安全 fallen short:它们可以保护数据在存储或传输时,但必须解密它们进行计算。幸运的是,一种新的计算模式存在,我们称之为隐私保持计算(PPC)。这些技术可以用于安全的外部计算或让两个方面计算而不把用户的秘密数据泄露出来。尽管它们在数字时代中保护用户的潜在潜力很大,但实现却受到了极高的计算、通信和存储开销的限制。这篇文章介绍了在实现隐私保持计算时不同技术的开销。首先,问题和不同技术,包括同质加密(HE)、分 sharing(SS)、拟合圈(GCs)和无意识传输(OT),是介绍的。接着,对这些技术在实现隐私保持计算时的开销进行了描述。这种描述驱动了GCs和HE加速器的需求。然后,文章介绍了两个解决方案:HAAC用于加速GCs,和RPU用于加速HE。最后,文章显示了结果和影响,并进行了未来工作的讨论,以便继续减少隐私保持计算的开销。

Multi-Head Attention Mechanism Learning for Cancer New Subtypes and Treatment Based on Cancer Multi-Omics Data

  • paper_url: http://arxiv.org/abs/2307.04075
  • repo_url: None
  • paper_authors: Liangrui Pan, Dazhen Liu, Yutao Dou, Lian Wang, Zhichao Feng, Pengfei Rong, Liwen Xu, Shaoliang Peng
    for:* 这个研究旨在透过非监督学习的对照学习方法来分类不同种类的肝癌,以提高肝癌诊断、治疗和预后预测。methods:* 这个研究使用了一个普遍化 Framework 基于注意力机制(AMUCL),并提出了一个基于注意力机制的多头注意力对照学习模型(DMACL),以深入探索肝癌多种数据的特点和分类。results:* 相比11个深度学习模型,DMACL 模型在单细胞多种数据集上取得了 C-指数0.002、Silhouette 分数0.801和Davies Bouldin 分数0.38的最佳结果,并在肝癌多种数据集上取得了最可靠的肝癌乱型分类结果。
    Abstract Due to the high heterogeneity and clinical characteristics of cancer, there are significant differences in multi-omics data and clinical features among subtypes of different cancers. Therefore, the identification and discovery of cancer subtypes are crucial for the diagnosis, treatment, and prognosis of cancer. In this study, we proposed a generalization framework based on attention mechanisms for unsupervised contrastive learning (AMUCL) to analyze cancer multi-omics data for the identification and characterization of cancer subtypes. AMUCL framework includes a unsupervised multi-head attention mechanism, which deeply extracts multi-omics data features. Importantly, a decoupled contrastive learning model (DMACL) based on a multi-head attention mechanism is proposed to learn multi-omics data features and clusters and identify new cancer subtypes. This unsupervised contrastive learning method clusters subtypes by calculating the similarity between samples in the feature space and sample space of multi-omics data. Compared to 11 other deep learning models, the DMACL model achieved a C-index of 0.002, a Silhouette score of 0.801, and a Davies Bouldin Score of 0.38 on a single-cell multi-omics dataset. On a cancer multi-omics dataset, the DMACL model obtained a C-index of 0.016, a Silhouette score of 0.688, and a Davies Bouldin Score of 0.46, and obtained the most reliable cancer subtype clustering results for each type of cancer. Finally, we used the DMACL model in the AMUCL framework to reveal six cancer subtypes of AML. By analyzing the GO functional enrichment, subtype-specific biological functions, and GSEA of AML, we further enhanced the interpretability of cancer subtype analysis based on the generalizable AMUCL framework.
    摘要 due to the high heterogeneity and clinical characteristics of cancer, there are significant differences in multi-omics data and clinical features among subtypes of different cancers. therefore, the identification and discovery of cancer subtypes are crucial for the diagnosis, treatment, and prognosis of cancer. in this study, we proposed a generalization framework based on attention mechanisms for unsupervised contrastive learning (AMUCL) to analyze cancer multi-omics data for the identification and characterization of cancer subtypes. AMUCL framework includes a unsupervised multi-head attention mechanism, which deeply extracts multi-omics data features. importantly, a decoupled contrastive learning model (DMACL) based on a multi-head attention mechanism is proposed to learn multi-omics data features and clusters and identify new cancer subtypes. this unsupervised contrastive learning method clusters subtypes by calculating the similarity between samples in the feature space and sample space of multi-omics data. compared to 11 other deep learning models, the DMACL model achieved a C-index of 0.002, a silhouette score of 0.801, and a davies bouldin score of 0.38 on a single-cell multi-omics dataset. on a cancer multi-omics dataset, the DMACL model obtained a C-index of 0.016, a silhouette score of 0.688, and a davies bouldin score of 0.46, and obtained the most reliable cancer subtype clustering results for each type of cancer. finally, we used the DMACL model in the AMUCL framework to reveal six cancer subtypes of AML. by analyzing the go functional enrichment, subtype-specific biological functions, and gsea of AML, we further enhanced the interpretability of cancer subtype analysis based on the generalizable AMUCL framework.

Large-scale global optimization of ultra-high dimensional non-convex landscapes based on generative neural networks

  • paper_url: http://arxiv.org/abs/2307.04065
  • repo_url: None
  • paper_authors: Jiaqi Jiang, Jonathan A. Fan
  • For: 这个论文是关于非凸优化问题的一种基于深度生成网络的算法 мета希略,用于在维度非常高的搜索空间中寻找优化答案。* Methods: 该算法使用了一个特制的损失函数和一个适应性的深度网络 architecture,通过训练这个网络来进行搜索和优化。* Results: 在一些标准的优化问题中,该算法能够比对State-of-the-art algorithm benchmarks更好地性能,并且需要 fewer function evaluations。I hope this helps! Let me know if you have any other questions.
    Abstract We present a non-convex optimization algorithm metaheuristic, based on the training of a deep generative network, which enables effective searching within continuous, ultra-high dimensional landscapes. During network training, populations of sampled local gradients are utilized within a customized loss function to evolve the network output distribution function towards one peak at high-performing optima. The deep network architecture is tailored to support progressive growth over the course of training, which allows the algorithm to manage the curse of dimensionality characteristic of high-dimensional landscapes. We apply our concept to a range of standard optimization problems with dimensions as high as one thousand and show that our method performs better with fewer function evaluations compared to state-of-the-art algorithm benchmarks. We also discuss the role of deep network over-parameterization, loss function engineering, and proper network architecture selection in optimization, and why the required batch size of sampled local gradients is independent of problem dimension. These concepts form the foundation for a new class of algorithms that utilize customizable and expressive deep generative networks to solve non-convex optimization problems.
    摘要 我们提出了一种基于深度生成网络的非 convex 优化算法metaheuristic,可以有效地在维度非常高的连续空间中寻找优点。在网络训练中,通过自定义损失函数来使用射程数据集的人工 popula-tion,逐渐提高网络输出分布函数的优化。我们的网络架构是通过训练进程来支持不断增长,以适应高维度空间的特点。我们在一些标准的优化问题中使用了这种方法,并证明我们的方法可以在 fewer function evaluations 下比 benchmark 更好的性能。我们还讨论了深度网络过parameterization、损失函数工程和网络架构选择对优化的影响,以及批处理大小是独立于问题维度的。这些概念形成了一种新的类型的算法,可以使用可定制和表达ive的深度生成网络来解决非 convex 优化问题。

Bidirectional Attention as a Mixture of Continuous Word Experts

  • paper_url: http://arxiv.org/abs/2307.04057
  • repo_url: https://github.com/yixinw-lab/attention-uai
  • paper_authors: Kevin Christian Wibisono, Yixin Wang
  • for: This paper aims to examine the statistical underpinnings of bidirectional attention in large language models (LLMs), specifically exploring the relationship between bidirectional attention and mixture-of-experts (MoE) weights.
  • methods: The paper uses a combination of theoretical analysis and empirical studies to investigate the statistical properties of bidirectional attention. The authors reparameterize bidirectional attention as a continuous bag of words (CBOW) model with MoE weights, and show that this allows for a deeper understanding of the model’s behavior.
  • results: The paper finds that bidirectional attention with multiple heads and multiple layers is equivalent to stacked MoEs and a mixture of MoEs, respectively. Additionally, the authors extend the model to categorical tabular data and find that it outperforms existing tabular extensions of transformers in out-of-distribution (OOD) generalization. Finally, the paper theoretically characterizes when linear word analogies are present in the word embeddings of bidirectional attention.
    Abstract Bidirectional attention $\unicode{x2013}$ composed of self-attention with positional encodings and the masked language model (MLM) objective $\unicode{x2013}$ has emerged as a key component of modern large language models (LLMs). Despite its empirical success, few studies have examined its statistical underpinnings: What statistical model is bidirectional attention implicitly fitting? What sets it apart from its non-attention predecessors? We explore these questions in this paper. The key observation is that fitting a single-layer single-head bidirectional attention, upon reparameterization, is equivalent to fitting a continuous bag of words (CBOW) model with mixture-of-experts (MoE) weights. Further, bidirectional attention with multiple heads and multiple layers is equivalent to stacked MoEs and a mixture of MoEs, respectively. This statistical viewpoint reveals the distinct use of MoE in bidirectional attention, which aligns with its practical effectiveness in handling heterogeneous data. It also suggests an immediate extension to categorical tabular data, if we view each word location in a sentence as a tabular feature. Across empirical studies, we find that this extension outperforms existing tabular extensions of transformers in out-of-distribution (OOD) generalization. Finally, this statistical perspective of bidirectional attention enables us to theoretically characterize when linear word analogies are present in its word embeddings. These analyses show that bidirectional attention can require much stronger assumptions to exhibit linear word analogies than its non-attention predecessors.
    摘要 bidirectional attention $\unicode{x2013}$ 由自我注意和位置编码组成,并且与屏蔽语言模型(MLM)目标相结合,已经成为现代大语言模型(LLM)的关键组成部分。Despite its empirical success, few studies have examined its statistical underpinnings: What statistical model is bidirectional attention implicitly fitting? What sets it apart from its non-attention predecessors? We explore these questions in this paper. The key observation is that fitting a single-layer single-head bidirectional attention, upon reparameterization, is equivalent to fitting a continuous bag of words (CBOW) model with mixture-of-experts (MoE) weights. Further, bidirectional attention with multiple heads and multiple layers is equivalent to stacked MoEs and a mixture of MoEs, respectively. This statistical viewpoint reveals the distinct use of MoE in bidirectional attention, which aligns with its practical effectiveness in handling heterogeneous data. It also suggests an immediate extension to categorical tabular data, if we view each word location in a sentence as a tabular feature. Across empirical studies, we find that this extension outperforms existing tabular extensions of transformers in out-of-distribution (OOD) generalization. Finally, this statistical perspective of bidirectional attention enables us to theoretically characterize when linear word analogies are present in its word embeddings. These analyses show that bidirectional attention can require much stronger assumptions to exhibit linear word analogies than its non-attention predecessors.

Manifold Filter-Combine Networks

  • paper_url: http://arxiv.org/abs/2307.04056
  • repo_url: https://github.com/krishnaswamylab/mfcn
  • paper_authors: Joyce Chew, Edward De Brouwer, Smita Krishnaswamy, Deanna Needell, Michael Perlmutter
  • for: 本研究旨在更深入理解 manifold neural networks (MNNs), analogous to how the aggregate-combine framework helps with the understanding of graph neural networks (GNNs)。
  • methods: 该类含有多种 subclass,可以看作 manifold 的 аналоги。furthermore, the authors propose a method for implementing such networks when one does not have global knowledge of the manifold, but merely has access to finitely many sample points.
  • results: the authors provide sufficient conditions for the network to provably converge to its continuum limit as the number of sample points tends to infinity. unlike previous work (which focused on specific graph constructions), the rate of convergence does not directly depend on the number of filters used, and exhibits linear dependence on the depth of the network rather than the exponential dependence obtained previously. additionally, the authors provide several examples of interesting subclasses of MFCNs and of the rates of convergence that are obtained under specific graph constructions.
    Abstract We introduce a class of manifold neural networks (MNNs) that we call Manifold Filter-Combine Networks (MFCNs), that aims to further our understanding of MNNs, analogous to how the aggregate-combine framework helps with the understanding of graph neural networks (GNNs). This class includes a wide variety of subclasses that can be thought of as the manifold analog of various popular GNNs. We then consider a method, based on building a data-driven graph, for implementing such networks when one does not have global knowledge of the manifold, but merely has access to finitely many sample points. We provide sufficient conditions for the network to provably converge to its continuum limit as the number of sample points tends to infinity. Unlike previous work (which focused on specific graph constructions), our rate of convergence does not directly depend on the number of filters used. Moreover, it exhibits linear dependence on the depth of the network rather than the exponential dependence obtained previously. Additionally, we provide several examples of interesting subclasses of MFCNs and of the rates of convergence that are obtained under specific graph constructions.
    摘要 我们引入一种类型的拟合神经网络(MNN),称之为拟合筛合网络(MFCN),以深入理解MNN,类似于如何使用集合合并框架理解图神经网络(GNN)。这个类型包括许多子类,可以看作拟合 manifold 中的各种受欢迎 GNN 的拟合。我们 THEN 考虑一种基于构建数据驱动图的方法,实现这些网络,只有 finite 数据点的知识,而不是全局 manifold 的知识。我们提供了足够的条件,使网络可靠地趋向于维度随着数据点的数量增加而减少。与前一个工作不同,我们的速度不直接取决于使用的筛刷数量。此外,它展现出线性取决于网络的深度,而不是之前所获得的对数靠渐增长。我们还提供了一些有趣的 MFCN 的 subclass 和特定图构造下的速度减少率。

Contextual Dynamic Pricing with Strategic Buyers

  • paper_url: http://arxiv.org/abs/2307.04055
  • repo_url: None
  • paper_authors: Pangpang Liu, Zhuoran Yang, Zhaoran Wang, Will Wei Sun
  • For: This paper is written to study the contextual dynamic pricing problem with strategic buyers, where buyers can manipulate their feature data to obtain a lower price.* Methods: The paper proposes a strategic dynamic pricing policy that incorporates the buyers’ strategic behavior into the online learning to maximize the seller’s cumulative revenue. The policy uses a combination of dynamic pricing and strategic behavior handling algorithms to account for the buyers’ manipulation of their feature data.* Results: The paper achieves a sublinear regret upper bound of $O(\sqrt{T})$ and is shown to be superior to other pricing policies that are unaware of the strategic behaviors through extensive experiments.Here’s the Chinese translation of the three points:* For: 这篇论文是研究Contextual Dynamic Pricing问题,在这个问题中,买家可以通过操纵自己的特征数据来获得更低的价格。* Methods: 论文提出了一种战略性动态价格策略,它将买家的战略行为包含在在线学习中,以最大化卖家的累积收益。这种策略使用了动态价格和战略行为处理算法的组合来考虑买家的操纵行为。* Results: 论文实现了一个下限为$O(\sqrt{T})$的非线性 regret bound,并通过广泛的实验证明了其在其他不知情的价格策略比较优秀。
    Abstract Personalized pricing, which involves tailoring prices based on individual characteristics, is commonly used by firms to implement a consumer-specific pricing policy. In this process, buyers can also strategically manipulate their feature data to obtain a lower price, incurring certain manipulation costs. Such strategic behavior can hinder firms from maximizing their profits. In this paper, we study the contextual dynamic pricing problem with strategic buyers. The seller does not observe the buyer's true feature, but a manipulated feature according to buyers' strategic behavior. In addition, the seller does not observe the buyers' valuation of the product, but only a binary response indicating whether a sale happens or not. Recognizing these challenges, we propose a strategic dynamic pricing policy that incorporates the buyers' strategic behavior into the online learning to maximize the seller's cumulative revenue. We first prove that existing non-strategic pricing policies that neglect the buyers' strategic behavior result in a linear $\Omega(T)$ regret with $T$ the total time horizon, indicating that these policies are not better than a random pricing policy. We then establish that our proposed policy achieves a sublinear regret upper bound of $O(\sqrt{T})$. Importantly, our policy is not a mere amalgamation of existing dynamic pricing policies and strategic behavior handling algorithms. Our policy can also accommodate the scenario when the marginal cost of manipulation is unknown in advance. To account for it, we simultaneously estimate the valuation parameter and the cost parameter in the online pricing policy, which is shown to also achieve an $O(\sqrt{T})$ regret bound. Extensive experiments support our theoretical developments and demonstrate the superior performance of our policy compared to other pricing policies that are unaware of the strategic behaviors.
    摘要 企业通常采用个性化价格策略,根据消费者特点来定价。在这个过程中,消费者可以通过滥用自己的特征数据来获得较低的价格,并且会付出一定的操作成本。这种策略性行为可能会妨碍企业实现最高利润。本文研究了上述情境下的动态价格问题,其中买方不仅不见商家的实际特征,而且也不知道自己对产品的评估价值。为了应对这些挑战,我们提出了一种战略性动态价格策略,该策略将买方的策略行为纳入在线学习中,以最大化卖家的总收入。我们首先证明了忽略买方策略行为的非战略价格策略会在总时间 horizon $T$ 上导致一个线性的 $\Omega(T)$ 违和,这表明这些策略不比随机价格策略更好。然后,我们证明了我们提出的策略可以达到一个幂函数Bound $O(\sqrt{T})$ 的违和上限,这表明我们的策略比其他不考虑买方策略行为的价格策略更高效。另外,我们的策略还能够考虑到 manipulate 成本未知的情况,通过在线学习来同时估算买方评估价值和 manipulate 成本参数,并且也可以达到 $O(\sqrt{T})$ 的违和上限。实际实验支持我们的理论发展,并证明了我们的策略在与其他不考虑买方策略行为的价格策略进行比较时具有更高的性能。

A Physics-Informed Low-Shot Learning For sEMG-Based Estimation of Muscle Force and Joint Kinematics

  • paper_url: http://arxiv.org/abs/2307.05361
  • repo_url: None
  • paper_authors: Yue Shi, Shuhao Ma, Yihui Zhao, Zhiqiang Zhang
  • for: 这篇论文旨在提出一种基于surface electromyography(sEMG)的肌力和关节动态分析方法,以便在实时生物机器学中进行无需人工干预的生物机器分析。
  • methods: 该方法基于深度神经网络(DNNs),并将拉格朗日方程和反向动态肌肉模型纳入生成对抗网络(GAN)框架中,以实现结构化特征编码和距离抽象估计。
  • results: 实验结果表明,与物理反向动态肌肉模型相比,该方法的估计结果具有较小的偏差,并且在两个试验(走行和手部运动)中都有较高的准确率。
    Abstract Muscle force and joint kinematics estimation from surface electromyography (sEMG) are essential for real-time biomechanical analysis of the dynamic interplay among neural muscle stimulation, muscle dynamics, and kinetics. Recent advances in deep neural networks (DNNs) have shown the potential to improve biomechanical analysis in a fully automated and reproducible manner. However, the small sample nature and physical interpretability of biomechanical analysis limit the applications of DNNs. This paper presents a novel physics-informed low-shot learning method for sEMG-based estimation of muscle force and joint kinematics. This method seamlessly integrates Lagrange's equation of motion and inverse dynamic muscle model into the generative adversarial network (GAN) framework for structured feature decoding and extrapolated estimation from the small sample data. Specifically, Lagrange's equation of motion is introduced into the generative model to restrain the structured decoding of the high-level features following the laws of physics. And a physics-informed policy gradient is designed to improve the adversarial learning efficiency by rewarding the consistent physical representation of the extrapolated estimations and the physical references. Experimental validations are conducted on two scenarios (i.e. the walking trials and wrist motion trials). Results indicate that the estimations of the muscle forces and joint kinematics are unbiased compared to the physics-based inverse dynamics, which outperforms the selected benchmark methods, including physics-informed convolution neural network (PI-CNN), vallina generative adversarial network (GAN), and multi-layer extreme learning machine (ML-ELM).
    摘要 muscle force和关节动态分析从表面电omyography(sEMG)是生物机器学分析中的关键因素,它们描述了神经肌肉刺激、肌肉动态和动力学的 dynamically interplay。 recent advances in deep neural networks (DNNs) have shown the potential to improve biomechanical analysis in a fully automated and reproducible manner. However, the small sample nature and physical interpretability of biomechanical analysis limit the applications of DNNs. This paper presents a novel physics-informed low-shot learning method for sEMG-based estimation of muscle force and joint kinematics. This method seamlessly integrates Lagrange's equation of motion and inverse dynamic muscle model into the generative adversarial network (GAN) framework for structured feature decoding and extrapolated estimation from the small sample data. Specifically, Lagrange's equation of motion is introduced into the generative model to restrain the structured decoding of the high-level features following the laws of physics. And a physics-informed policy gradient is designed to improve the adversarial learning efficiency by rewarding the consistent physical representation of the extrapolated estimations and the physical references. Experimental validations are conducted on two scenarios (i.e. the walking trials and wrist motion trials). Results indicate that the estimations of the muscle forces and joint kinematics are unbiased compared to the physics-based inverse dynamics, which outperforms the selected benchmark methods, including physics-informed convolution neural network (PI-CNN), vallina generative adversarial network (GAN), and multi-layer extreme learning machine (ML-ELM).

Learning to Group Auxiliary Datasets for Molecule

  • paper_url: http://arxiv.org/abs/2307.04052
  • repo_url: None
  • paper_authors: Tinglin Huang, Ziniu Hu, Rex Ying
  • for: 提高小分子数据集中机器学习模型的性能,因为小分子数据集的数据有限制,通常采用与其他小分子数据集合作,但不一定能够提高模型性能。
  • methods: 提出了一种基于路径优化和二级优化框架的方法,即MolGroup,可以预测每个目标数据集的最佳auxiliary数据集组合,并通过路径优化和meta gradient来优化路径。
  • results: 通过广泛的实验,显示MolGroup可以提高GIN和Graphormer在11个目标数据集上的性能,增加4.41%和3.47%。
    Abstract The limited availability of annotations in small molecule datasets presents a challenge to machine learning models. To address this, one common strategy is to collaborate with additional auxiliary datasets. However, having more data does not always guarantee improvements. Negative transfer can occur when the knowledge in the target dataset differs or contradicts that of the auxiliary molecule datasets. In light of this, identifying the auxiliary molecule datasets that can benefit the target dataset when jointly trained remains a critical and unresolved problem. Through an empirical analysis, we observe that combining graph structure similarity and task similarity can serve as a more reliable indicator for identifying high-affinity auxiliary datasets. Motivated by this insight, we propose MolGroup, which separates the dataset affinity into task and structure affinity to predict the potential benefits of each auxiliary molecule dataset. MolGroup achieves this by utilizing a routing mechanism optimized through a bi-level optimization framework. Empowered by the meta gradient, the routing mechanism is optimized toward maximizing the target dataset's performance and quantifies the affinity as the gating score. As a result, MolGroup is capable of predicting the optimal combination of auxiliary datasets for each target dataset. Our extensive experiments demonstrate the efficiency and effectiveness of MolGroup, showing an average improvement of 4.41%/3.47% for GIN/Graphormer trained with the group of molecule datasets selected by MolGroup on 11 target molecule datasets.
    摘要 因为小分子数据集的标注有限,机器学习模型受到挑战。为了解决这个问题,一种常见的策略是与其他辅助数据集合作。然而,更多的数据不总是能带来改善。当目标数据集和辅助分子数据集之间存在知识差异或矛盾时,可能会出现负面传递。因此,确定可以帮助目标数据集的辅助分子数据集仍是一个关键和未解决的问题。经验分析发现,将分子结构相似性和任务相似性组合起来可以 servir como一个可靠的指标来选择高价值的辅助分子数据集。鼓动了这一点,我们提出了MolGroup,它将数据集的亲和力分解为任务亲和力和结构亲和力,以预测每个目标数据集的最佳辅助分子数据集。MolGroup使用优化的 Routing 机制和meta gradient来实现这一点,并通过最大化目标数据集的性能来衡量亲和力。因此,MolGroup可以预测每个目标数据集的优化辅助分子数据集。我们的广泛实验表明,MolGroup可以提高 GIN 和 Graphormer 在 11 个目标分子数据集上的性能,平均提高4.41%/3.47%。

Parallel Algorithms Align with Neural Execution

  • paper_url: http://arxiv.org/abs/2307.04049
  • repo_url: None
  • paper_authors: Valerie Engelmayer, Dobrik Georgiev, Petar Veličković
  • for: 学习并使用并行算法,以优化神经算法逻辑推理器的性能。
  • methods: 使用并行算法,例如搜索、排序和寻找强连接组件,来利用神经算法逻辑推理器的完全计算能力,从而减少训练时间和提高预测性能。
  • results: 相比顺序实现,并行实现可以减少训练时间,并且在大多数情况下,并行版本可以达到更高的预测性能。
    Abstract Neural algorithmic reasoners are parallel processors. Teaching them sequential algorithms contradicts this nature, rendering a significant share of their computations redundant. Parallel algorithms however may exploit their full computational power, therefore requiring fewer layers to be executed. This drastically reduces training times, as we observe when comparing parallel implementations of searching, sorting and finding strongly connected components to their sequential counterparts on the CLRS framework. Additionally, parallel versions achieve strongly superior predictive performance in most cases.
    摘要 Note:* "Neural algorithmic reasoners" is translated as "神经算法推理器" (shénxīn xiàngxíng suǒyì) in Simplified Chinese.* "Parallel processors" is translated as "并行处理器" (héxìng chùxíng) in Simplified Chinese.* "Sequential algorithms" is translated as "顺序算法" (shùxìng suāfāng) in Simplified Chinese.* "Parallel algorithms" is translated as "并行算法" (héxìng suāfāng) in Simplified Chinese.* "CLRS framework" is translated as "CLRS框架" (C-L-R-S kuàiwā) in Simplified Chinese.

Optimization-based Learning for Dynamic Load Planning in Trucking Service Networks

  • paper_url: http://arxiv.org/abs/2307.04050
  • repo_url: None
  • paper_authors: Ritesh Ojha, Wenbo Chen, Hanyu Zhang, Reem Khir, Alan Erera, Pascal Van Hentenryck
  • for: 这种论文是为了解决货物配送服务网络设计中的货物规划问题,即如何在不同的终端之间分配货物和流量,以及如何在不同的时间点进行货物和流量的划分。
  • methods: 这篇论文使用了动态货物规划问题(DLPP)来 JOINTLY 考虑货物和流量规划问题,并提出了一种决策支持工具来帮助终端执行这些决策。论文还提出了一种目标指导优化方法,以消除网络中每个商品可以通过主要和备用路径的Symmetry,从而提高优化的可靠性和可行性。
  • results: 论文的计算研究表明,使用这种目标指导优化方法可以在约10倍速度下获得同质性的优化解决方案,并且可以在几个数量级快的速度下生成与实际终端决策兼容的解决方案。此外,论文还表明,通过结合机器学习和优化,可以获得货物整合的重要经济效益。
    Abstract The load planning problem is a critical challenge in service network design for parcel carriers: it decides how many trailers (or loads) to assign for dispatch over time between pairs of terminals. Another key challenge is to determine a flow plan, which specifies how parcel volumes are assigned to planned loads. This paper considers the Dynamic Load Planning Problem (DLPP) that considers both flow and load planning challenges jointly to adjust loads and flows as the demand forecast changes over time before the day of operations. The paper aims at developing a decision-support tool to inform planners making these decisions at terminals across the network. The paper formulates the DLPP as a MIP and shows that it admits a large number of symmetries in a network where each commodity can be routed through primary and alternate paths. As a result, an optimization solver may return fundamentally different solutions to closely related problems, confusing planners and reducing trust in optimization. To remedy this limitation, the paper proposes a Goal-Directed Optimization that eliminates those symmetries by generating optimal solutions staying close to a reference plan. The paper also proposes an optimization proxy to address the computational challenges of the optimization models. The proxy combines a machine learning model and a feasibility restoration model and finds solutions that satisfy real-time constraints imposed by planners-in-the-loop. An extensive computational study on industrial instances shows that the optimization proxy is around 10 times faster than the commercial solver in obtaining the same quality solutions and orders of magnitude faster for generating solutions that are consistent with each other. The proposed approach also demonstrates the benefits of the DLPP for load consolidation, and the significant savings obtained from combining machine learning and optimization.
    摘要 服务网络设计中的负载观念问题是货运公司的核心挑战:它决定在时间上将多少货车(或货量)分配给发送。另一个关键挑战是决定流程计划,它决定了各种货物的分配量。这篇文章考虑了时间统计负载观念问题(DLPP),考虑了流程和负载观念问题的 JOINT 解决方案,以适应变化的需求预测。文章的目标是为终端站规划人创建一个决策支持工具,以帮助他们在网络中做出这些决策。文章使用了数理统计方法来形式化 DLPP,并证明了网络中每个商品可以通过主要和备用路径进行 routed。这导致优化 solver 可能会返回对应的解,导致计划师和优化模型之间的不一致,从而减少优化的信任度。为解决这个限制,文章提出了目标导向优化方法,删除网络中的 symmetries,从而确保优化模型返回的解是固定的。文章还提出了一个优化代理,将数理统计方法与可行性修复模型结合,以确保解决方案满足了现场实际的时间限制。一系列的 Computational Study 表明,该优化代理比商业 solver 更快,可以在获得相同质量解决方案的情况下节省大量时间。文章还证明了 DLPP 的负载整合和机器学习优化的好处,并获得了负载观念问题的答案。

Sup-Norm Convergence of Deep Neural Network Estimator for Nonparametric Regression by Adversarial Training

  • paper_url: http://arxiv.org/abs/2307.04042
  • repo_url: None
  • paper_authors: Masaaki Imaizumi
  • for: 这个论文主要探讨了深度神经网络估计器的sup-norm收敛性,并提出了一种新的对抗训练方案来实现这一目标。
  • methods: 作者使用了深度神经网络模型来解决非参数统计问题,并通过对抗训练方案来实现sup-norm收敛性。
  • results: 研究发现,通过对抗训练方案,深度神经网络估计器可以在$L2$-norm上达到更好的性能,而且可以在sup-norm上收敛。此外,作者还扩展了对抗训练方案到更通用的损失函数和数据生成函数上。实验结果支持了理论发现。
    Abstract We show the sup-norm convergence of deep neural network estimators with a novel adversarial training scheme. For the nonparametric regression problem, it has been shown that an estimator using deep neural networks can achieve better performances in the sense of the $L2$-norm. In contrast, it is difficult for the neural estimator with least-squares to achieve the sup-norm convergence, due to the deep structure of neural network models. In this study, we develop an adversarial training scheme and investigate the sup-norm convergence of deep neural network estimators. First, we find that ordinary adversarial training makes neural estimators inconsistent. Second, we show that a deep neural network estimator achieves the optimal rate in the sup-norm sense by the proposed adversarial training with correction. We extend our adversarial training to general setups of a loss function and a data-generating function. Our experiments support the theoretical findings.
    摘要 我们展示了深度神经网络估计器的sup-norm收敛性,使用一个新的反抗训练方案。在非 Parametric 回授问题中,已经证明了使用深度神经网络可以实现更好的性能在 $L2$-norm 的意义上。然而,使用最小二乘训练的神经网络估计器很难实现sup-norm收敛性,因为神经网络模型的深度结构。在这个研究中,我们开发了一个反抗训练方案,并调查深度神经网络估计器的sup-norm收敛性。首先,我们发现了普通的反抗训练会使神经网络估计器不一致。其次,我们显示了一个深度神经网络估计器可以通过我们的反抗训练方案和修正得到最佳的sup-norm收敛性。我们将我们的反抗训练扩展到普通的损失函数和数据生成函数的情况下。我们的实验支持了理论的结论。

The Value of Chess Squares

  • paper_url: http://arxiv.org/abs/2307.05330
  • repo_url: https://github.com/Dpay123/chess
  • paper_authors: Aditya Gupta, Shiva Maharaj, Nicholas Polson, Vadim Sokolov
    for: 研究棋盘上棋子的分布和评估棋盘的价值。methods: 引入杂入valuation的方法,包括对棋子和棋盘进行评估。results: 研究发现, Knight和Bishop的位置有着重要的影响,而Pawn的价值也需要考虑棋盘结构。同时,研究还提供了有价值的 Pawn 评估方法。
    Abstract Valuing chess squares and determining the placement of pieces on the board are the main objectives of our study. With the emergence of chess AI, it has become possible to accurately assess the worth of positions in a game of chess. The conventional approach assigns fixed values to pieces $(\symking=\infty, \symqueen=9, \symrook=5, \symbishop=3, \symknight=3, \sympawn=1)$. We enhance this analysis by introducing marginal valuations for both pieces and squares. We demonstrate our method by examining the positioning of Knights and Bishops, and also provide valuable insights into the valuation of pawns. Notably, Nimzowitsch was among the pioneers in advocating for the significance of Pawn structure and valuation. Finally, we conclude by suggesting potential avenues for future research.
    摘要 我们的研究目标是评估棋盘上的棋子和棋子的位置。随着棋盘AI的出现,我们可以准确评估棋盘上的位置价值。传统方法将棋子分别赋予固定价值(♔∞、♕9、♖5、♗3、♘3、♟1)。我们增强这种分析方法,引入棋子和棋盘的边缘价值。我们通过审查夜报和主教的位置,提供有价值的反馈,并探讨了它们的价值。值得注意的是,尼莫雷维茨(Nimzowitsch)是棋盘结构和价值评估的先驱者之一。最后,我们提出了未来研究的可能性。Note that the piece names in Simplified Chinese are:♔ 皇后 (queen)♕ 王后 (rook)♖ bishop♗ night♘ knight♟ Pawn

Designing a Direct Feedback Loop between Humans and Convolutional Neural Networks through Local Explanations

  • paper_url: http://arxiv.org/abs/2307.04036
  • repo_url: https://github.com/tongstevensun/deepfuse
  • paper_authors: Tong Steven Sun, Yuyang Gao, Shubham Khaladkar, Sijia Liu, Liang Zhao, Young-Ho Kim, Sungsoo Ray Hong
  • for: This paper aims to improve the explainability of Convolutional Neural Networks (CNNs) by providing a more interactive and labor-efficient design for diagnosing and revising CNN vulnerabilities using local explanations.
  • methods: The paper proposes an interactive design called DeepFuse, which realizes a direct feedback loop between a user and CNNs in diagnosing and revising CNN vulnerabilities using local explanations.
  • results: The paper reports the results of a two-day study with 12 experienced CNN engineers, showing that DeepFuse helps participants create more accurate and "reasonable" models than the current state-of-the-art, and that participants found the way DeepFuse guides case-based reasoning to be practically useful for their current practice.
  • for: 这篇论文旨在提高卷积神经网络(CNN)的可解释性,提供一种更具交互性、更节省人力的设计,帮助用户利用局部解释诊断并修复 CNN 的缺陷。
  • methods: 论文提出名为 DeepFuse 的交互式设计,在用户与 CNN 之间建立直接反馈循环,借助局部解释诊断并修复 CNN 的缺陷。
  • results: 论文报告了一项为期两天、12 名资深 CNN 工程师参与的研究:使用 DeepFuse 的参与者构建出比当前最先进方法更准确、更"合理"的模型,且参与者认为 DeepFuse 引导的案例推理方式对其现有实践切实有用。
    Abstract The local explanation provides heatmaps on images to explain how Convolutional Neural Networks (CNNs) derive their output. Due to its visual straightforwardness, the method has been one of the most popular explainable AI (XAI) methods for diagnosing CNNs. Through our formative study (S1), however, we captured ML engineers' ambivalent perspective about the local explanation as a valuable and indispensable envision in building CNNs versus the process that exhausts them due to the heuristic nature of detecting vulnerability. Moreover, steering the CNNs based on the vulnerability learned from the diagnosis seemed highly challenging. To mitigate the gap, we designed DeepFuse, the first interactive design that realizes the direct feedback loop between a user and CNNs in diagnosing and revising CNN's vulnerability using local explanations. DeepFuse helps CNN engineers to systemically search "unreasonable" local explanations and annotate the new boundaries for those identified as unreasonable in a labor-efficient manner. Next, it steers the model based on the given annotation such that the model doesn't introduce similar mistakes. We conducted a two-day study (S2) with 12 experienced CNN engineers. Using DeepFuse, participants made a more accurate and "reasonable" model than the current state-of-the-art. Also, participants found the way DeepFuse guides case-based reasoning can practically improve their current practice. We provide implications for design that explain how future HCI-driven design can move our practice forward to make XAI-driven insights more actionable.
    摘要 局部解释通过在图像上叠加热图来说明卷积神经网络(CNN)如何得出其输出。由于直观可视,该方法已成为诊断 CNN 时最流行的可解释 AI(XAI)方法之一。然而,我们在形成性研究(S1)中发现,机器学习工程师对局部解释持矛盾态度:一方面,他们视其为构建 CNN 时宝贵且不可或缺的手段;另一方面,由于检测缺陷依赖启发式过程,这一流程令他们疲惫不堪。此外,基于诊断所得的缺陷来引导 CNN 也显得非常困难。为弥合这一差距,我们设计了 DeepFuse,这是首个在用户与 CNN 之间建立直接反馈循环、利用局部解释诊断并修复 CNN 缺陷的交互式设计。DeepFuse 帮助 CNN 工程师以省力的方式系统地搜索"不合理"的局部解释,并为被判定为不合理的解释标注新的边界;随后,它依据给定的标注引导模型,使模型不再犯类似错误。我们与 12 名资深 CNN 工程师开展了为期两天的研究(S2)。借助 DeepFuse,参与者构建出比当前最先进方法更准确、更"合理"的模型,并认为 DeepFuse 引导的案例推理方式能切实改进其现有实践。我们还给出了设计启示,说明未来以 HCI 为驱动的设计如何推动实践,使 XAI 洞察更具可操作性。

Learning Variational Neighbor Labels for Test-Time Domain Generalization

  • paper_url: http://arxiv.org/abs/2307.04033
  • repo_url: None
  • paper_authors: Sameer Ambekar, Zehao Xiao, Jiayi Shen, Xiantong Zhen, Cees G. M. Snoek
  • for: 本研究探讨域间泛化问题,即将模型在不同域面上进行训练后在未知目标域上进行测试。
  • methods: 我们提出了三种贡献:首先,我们提出了 probabilistic pseudo-labeling 方法,通过模型在测试时对目标样本的推测来对源域训练过的模型进行泛化。其次,我们学习了variational neighbor labels,将邻域目标样本的信息integrated到pseudo标签中,以生成更加稳定和robust的pseudo标签。最后,我们引入了元泛化阶段,通过模拟泛化过程来帮助模型更好地integrate更多的目标信息和生成更加精准和robust的variational neighbor labels。
  • results: 我们在六个常用的数据集上进行了实验,结果表明我们的提议可以提高模型的泛化能力,并且可以在不同的域面上进行更好的泛化。
    Abstract This paper strives for domain generalization, where models are trained exclusively on source domains before being deployed at unseen target domains. We follow the strict separation of source training and target testing but exploit the value of the unlabeled target data itself during inference. We make three contributions. First, we propose probabilistic pseudo-labeling of target samples to generalize the source-trained model to the target domain at test time. We formulate the generalization at test time as a variational inference problem by modeling pseudo labels as distributions to consider the uncertainty during generalization and alleviate the misleading signal of inaccurate pseudo labels. Second, we learn variational neighbor labels that incorporate the information of neighboring target samples to generate more robust pseudo labels. Third, to learn the ability to incorporate more representative target information and generate more precise and robust variational neighbor labels, we introduce a meta-generalization stage during training to simulate the generalization procedure. Experiments on six widely-used datasets demonstrate the benefits, abilities, and effectiveness of our proposal.
    摘要 这篇论文努力实现领域总结,即模型在训练后在未见目标领域中进行部署。我们遵循严格的源训练和目标测试的分离,但是利用目标数据本身的价值进行推断。我们提出了三项贡献:一、在测试时对目标样本进行 probabilistic pseudo-标签,以使源训练模型在目标领域进行普适化。我们将总结问题形式为变量推理问题,模型 pseudo labels 为分布来考虑不确定性,以避免 pseudo labels 的不准确信号。二、我们学习了变量邻域标签,以包含邻近目标样本的信息生成更加稳健的 pseudo labels。三、为了学习更好地包含更多的目标信息并生成更精准和稳定的变量邻域标签,我们引入了元总结阶段在训练中进行模拟总结过程。我们在六种广泛使用的数据集上进行了实验,并证明了我们的提议的优点、能力和有效性。
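
下面用几行 PyTorch 勾勒"邻域伪标签"的一种可能形态:将每个目标样本的类别分布与其在特征空间中 k 个近邻的分布做平均,得到更平滑、更稳健的伪标签分布。这只是对论文思想的确定性近似:论文实际把伪标签建模为变分推断得到的分布,此处的函数名与 0.5/0.5 融合权重均为示意假设。

```python
import torch
import torch.nn.functional as F

def neighbor_pseudo_labels(feats, logits, k=5, tau=1.0):
    """Fuse each sample's class distribution with those of its k nearest
    neighbors in feature space, yielding softened pseudo-label distributions
    instead of hard argmax labels."""
    probs = F.softmax(logits / tau, dim=1)                # per-sample distributions
    z = F.normalize(feats, dim=1)
    idx = (z @ z.T).topk(k + 1, dim=1).indices[:, 1:]     # drop the self-match
    return 0.5 * probs + 0.5 * probs[idx].mean(dim=1)     # self + neighbor fusion

feats, logits = torch.randn(100, 64), torch.randn(100, 10)
soft_labels = neighbor_pseudo_labels(feats, logits)       # (100, 10), rows sum to 1
```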

Measuring the Success of Diffusion Models at Imitating Human Artists

  • paper_url: http://arxiv.org/abs/2307.04028
  • repo_url: None
  • paper_authors: Stephen Casper, Zifan Guo, Shreya Mogulothu, Zachary Marinov, Chinmay Deshpande, Rui-Jie Yew, Zheng Dai, Dylan Hadfield-Menell
  • for: 这个论文旨在检查生成模型是否可以模仿人类艺术家的作品。
  • methods: 这个论文使用了语义映射学习模型CLIP来测试生成模型是否可以模仿特定艺术家的作品。
  • results: 研究发现,当生成模型被训练时,可以很准确地模仿70名艺术家的作品,并且可以在图像上匹配这些作品的特征。
    Abstract Modern diffusion models have set the state-of-the-art in AI image generation. Their success is due, in part, to training on Internet-scale data which often includes copyrighted work. This prompts questions about the extent to which these models learn from, imitate, or copy the work of human artists. This work suggests that tying copyright liability to the capabilities of the model may be useful given the evolving ecosystem of generative models. Specifically, much of the legal analysis of copyright and generative systems focuses on the use of protected data for training. As a result, the connections between data, training, and the system are often obscured. In our approach, we consider simple image classification techniques to measure a model's ability to imitate specific artists. Specifically, we use Contrastive Language-Image Pretrained (CLIP) encoders to classify images in a zero-shot fashion. Our process first prompts a model to imitate a specific artist. Then, we test whether CLIP can be used to reclassify the artist (or the artist's work) from the imitation. If these tests match the imitation back to the original artist, this suggests the model can imitate that artist's expression. Our approach is simple and quantitative. Furthermore, it uses standard techniques and does not require additional training. We demonstrate our approach with an audit of Stable Diffusion's capacity to imitate 70 professional digital artists with copyrighted work online. When Stable Diffusion is prompted to imitate an artist from this set, we find that the artist can be identified from the imitation with an average accuracy of 81.0%. Finally, we also show that a sample of the artist's work can be matched to these imitation images with a high degree of statistical reliability. Overall, these results suggest that Stable Diffusion is broadly successful at imitating individual human artists.
    摘要 现代扩散模型已在 AI 图像生成领域树立了最先进水平。其成功部分归功于在互联网规模的数据上训练,而这些数据常常包含受版权保护的作品。这引发了此类模型在多大程度上学习、模仿乃至复制人类艺术家作品的问题。本研究认为,鉴于生成模型生态的不断演化,将版权责任与模型能力挂钩可能是有益的。目前针对版权与生成系统的法律分析大多聚焦于受保护数据在训练中的使用,使得数据、训练与系统之间的联系常被掩盖。在我们的方法中,我们采用简单的图像分类技术来度量模型模仿特定艺术家的能力:使用对比语言-图像预训练(CLIP)编码器进行零样本分类。流程是先提示模型模仿某位艺术家,再检验 CLIP 能否从模仿图像中重新识别出该艺术家(或其作品);若测试能把模仿匹配回原艺术家,则说明模型能够模仿该艺术家的表达。该方法简单、可量化,采用标准技术且无需额外训练。我们以此审核了 Stable Diffusion 模仿 70 位在网上拥有版权作品的职业数字艺术家的能力:当提示 Stable Diffusion 模仿集合中的某位艺术家时,可从模仿图像中识别出该艺术家,平均准确率为 81.0%。此外,艺术家作品样本也能以很高的统计可靠性与这些模仿图像相匹配。总体而言,这些结果表明 Stable Diffusion 能够广泛成功地模仿个体人类艺术家。
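
论文的核心测量流程可用 Hugging Face transformers 的 CLIP 接口简要复现:对一张模仿图像,在一组 "art by <artist>" 文本提示上做零样本分类,取相似度最高者作为识别出的艺术家。草图中的模型名、艺术家占位名与图像文件路径均为假设。

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

artists = ["Artist A", "Artist B", "Artist C"]        # placeholder names
prompts = [f"art by {name}" for name in artists]
image = Image.open("imitation.png")                   # an image generated to imitate one artist

inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    logits = model(**inputs).logits_per_image         # image-text similarity scores
print("predicted artist:", artists[logits.argmax().item()])
```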

Robust Ranking Explanations

  • paper_url: http://arxiv.org/abs/2307.04024
  • repo_url: None
  • paper_authors: Chao Chen, Chenghua Guo, Guixiang Ma, Ming Zeng, Xi Zhang, Sihong Xie
  • for: 提高机器学习模型的解释可靠性,以便建立人类对模型的信任。
  • methods: 使用 explanation thickness 度量显著特征排名的稳定性,并推导可处理的代理界(tractable surrogate bounds),据此设计 \textit{R2ET} 算法,高效地提升显著特征排名的鲁棒性。
  • results: 对各种网络架构和数据模式,包括大脑网络,实验表明 R2ET 可以在恶意攻击下增强解释鲁棒性,保持准确性。
    Abstract Robust explanations of machine learning models are critical to establish human trust in the models. Due to limited cognition capability, most humans can only interpret the top few salient features. It is critical to make top salient features robust to adversarial attacks, especially those against the more vulnerable gradient-based explanations. Existing defense measures robustness using $\ell_p$-norms, which have weaker protection power. We define explanation thickness for measuring salient features ranking stability, and derive tractable surrogate bounds of the thickness to design the \textit{R2ET} algorithm to efficiently maximize the thickness and anchor top salient features. Theoretically, we prove a connection between R2ET and adversarial training. Experiments with a wide spectrum of network architectures and data modalities, including brain networks, demonstrate that R2ET attains higher explanation robustness under stealthy attacks while retaining accuracy.
    摘要 机器学习模型的鲁棒解释是建立人类信任的关键。由于认知能力有限,多数人只能理解排名最靠前的少数显著特征。因此,使排名靠前的显著特征在对抗攻击下保持鲁棒至关重要,尤其是针对更脆弱的基于梯度的解释的攻击。现有防御措施通过 $\ell_p$-范数度量鲁棒性,但其保护能力较弱。我们定义解释厚度(explanation thickness)来度量显著特征排名的稳定性,并推导出厚度的可处理代理界,据此设计 \textit{R2ET} 算法,高效地最大化厚度并锚定排名靠前的显著特征。理论上,我们证明了 R2ET 与对抗训练之间的联系。在包括脑网络在内的多种网络架构与数据模态上的实验表明,R2ET 在隐蔽攻击下取得更高的解释鲁棒性,同时保持准确率。
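
作为对"解释厚度"思想的直观化,下面的 PyTorch 草图实现了一个简单的排名间隔代理损失:惩罚第 k 大显著性得分与其后继得分之间过小的间隔,使 top-k 特征排名更难被扰动翻转。这只是与论文描述方向一致的一种可能代理形式,并非论文推导的精确界;`k` 与 `margin` 为示意参数。

```python
import torch

def ranking_gap_loss(saliency, k=10, margin=0.1):
    """Surrogate for ranking stability: enlarge the gap between the k-th
    largest saliency score and every score ranked below it."""
    top, _ = saliency.sort(dim=1, descending=True)
    gap = top[:, k - 1:k] - top[:, k:]        # gaps between rank k and the rest
    return torch.relu(margin - gap).mean()

loss = ranking_gap_loss(torch.randn(4, 100))  # saliency maps for a batch of inputs
```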

Robust Learning-Based Incipient Slip Detection using the PapillArray Optical Tactile Sensor for Improved Robotic Gripping

  • paper_url: http://arxiv.org/abs/2307.04011
  • repo_url: None
  • paper_authors: Qiang Wang, Pablo Martinez Ulloa, Robert Burke, David Cordova Bulens, Stephen J. Redmond
  • for: This paper aims to detect incipient slip in robotic gripping tasks using a learning-based approach with the PapillArray tactile sensor.
  • methods: The proposed approach uses a machine learning model to identify patterns associated with incipient slip, and the model is trained using data augmentation techniques to enhance its robustness.
  • results: The proposed approach achieved a high detection success rate of 95.6% when tested with an offline dataset, and maintained robust performance with a success rate of 96.8% when transferred to a robotic gripping environment distinct from where the training data was collected.
  • for: 本研究旨在使用学习基本法探测机器人抓取任务中的incipient slip。
  • methods: 提议的方法使用机器学习模型识别incipient slip的模式,并通过数据增强技术提高模型的可靠性。
  • results: 提议的方法在测试数据集上达到了95.6%的检测成功率,并在将训练数据集 transferred to robotic gripping环境中保持了96.8%的成功率。
    Abstract The ability to detect slip, particularly incipient slip, enables robotic systems to take corrective measures to prevent a grasped object from being dropped. Therefore, slip detection can enhance the overall security of robotic gripping. However, accurately detecting incipient slip remains a significant challenge. In this paper, we propose a novel learning-based approach to detect incipient slip using the PapillArray (Contactile, Australia) tactile sensor. The resulting model is highly effective in identifying patterns associated with incipient slip, achieving a detection success rate of 95.6% when tested with an offline dataset. Furthermore, we introduce several data augmentation methods to enhance the robustness of our model. When transferring the trained model to a robotic gripping environment distinct from where the training data was collected, our model maintained robust performance, with a success rate of 96.8%, providing timely feedback for stabilizing several practical gripping tasks. Our project website: https://sites.google.com/view/incipient-slip-detection.
    摘要 “感知滑动,特别是潜在滑动,可以让机器人系统采取正确的措施以防止握住的物品被掉落。因此,滑动检测可以提高机器人握住的安全性。但是,准确地检测潜在滑动仍然是一项重要的挑战。在这篇论文中,我们提出了一种基于学习的方法来检测潜在滑动,使用Contactile(澳大利亚)的PapillArray感知器。我们的模型可以高效地识别潜在滑动的模式,在测试集上达到95.6%的检测成功率。此外,我们还提出了一些数据增强方法来提高我们的模型的稳定性。当将训练模型应用于机器人握住环境中,与训练数据集不同的环境下,我们的模型保持了96.8%的成功率,提供了实时反馈,以稳定许多实际的握住任务。更多信息请访问我们的项目网站:https://sites.google.com/view/incipient-slip-detection。”
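
论文强调用数据增强提升模型在新环境下的稳健性。下面是针对 (时间, 通道) 触觉序列的一个极简 numpy 增强草图(加性抖动、幅度缩放、时间平移);增强种类与参数均为假设,论文实际采用的增强方案见原文。

```python
import numpy as np

rng = np.random.default_rng(0)

def augment_tactile(x, sigma=0.01, scale_range=(0.9, 1.1), max_shift=5):
    """Augment a (time, channels) tactile sequence with sensor-like noise,
    per-channel gain variation, and a small circular time shift."""
    x = x + rng.normal(0, sigma, x.shape)                      # additive jitter
    x = x * rng.uniform(*scale_range, size=(1, x.shape[1]))    # gain variation
    return np.roll(x, rng.integers(-max_shift, max_shift + 1), axis=0)

aug = augment_tactile(np.zeros((200, 9)))   # e.g., 200 frames x 9 sensor channels
```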

Understanding the Efficacy of U-Net & Vision Transformer for Groundwater Numerical Modelling

  • paper_url: http://arxiv.org/abs/2307.04010
  • repo_url: None
  • paper_authors: Maria Luisa Taccari, Oded Ovadia, He Wang, Adar Kahana, Xiaohui Chen, Peter K. Jimack
  • for: 本研究比较了不同机器学习模型(U-Net、U-Net+ViT和FNO)在地下水系统中的时间依赖前向模型。
  • methods: 本研究在合成(synthetic)数据上进行测试,证明 U-Net 与 U-Net+ViT 模型在精度和效率方面优于 FNO,尤其是在数据稀疏的情况下。
  • results: 研究发现,U-Net和U-Net+ViT模型在缺少数据情况下的准确率和效率较高,这些结果表明U-Net基于模型在实际应用中的地下水模拟中具有潜力。
    Abstract This paper presents a comprehensive comparison of various machine learning models, namely U-Net, U-Net integrated with Vision Transformers (ViT), and Fourier Neural Operator (FNO), for time-dependent forward modelling in groundwater systems. Through testing on synthetic datasets, it is demonstrated that U-Net and U-Net + ViT models outperform FNO in accuracy and efficiency, especially in sparse data scenarios. These findings underscore the potential of U-Net-based models for groundwater modelling in real-world applications where data scarcity is prevalent.
    摘要 本文对 U-Net、结合视觉 Transformer(ViT)的 U-Net 以及傅里叶神经算子(FNO)等多种机器学习模型在地下水系统时间依赖正演建模中的表现进行了全面比较。在合成数据集上的测试表明,U-Net 与 U-Net + ViT 模型在精度和效率上均优于 FNO,尤其是在数据稀疏的情形下。这些发现凸显了基于 U-Net 的模型在数据匮乏普遍存在的实际地下水建模应用中的潜力。

Polynomial Width is Sufficient for Set Representation with High-dimensional Features

  • paper_url: http://arxiv.org/abs/2307.04001
  • repo_url: None
  • paper_authors: Peihao Wang, Shenghao Yang, Shu Li, Zhangyang Wang, Pan Li
  • for: 研究深度学习中集合表示的维度$L$的影响,以确定最小的$L$值可以实现足够的表达力。
  • methods: 使用两种集合元素嵌入层:(a) 线性+指数活化(LP)和(b) 线性+指数活化(LE),并证明$L$可以是$N$和$D$的乘积。
  • results: 表示$L$可以是$N$和$D$的乘积,并提供了对LP嵌入层的下界。此外,我们还扩展了结果到卷积Equivariant集函数和复数域。
    Abstract Set representation has become ubiquitous in deep learning for modeling the inductive bias of neural networks that are insensitive to the input order. DeepSets is the most widely used neural network architecture for set representation. It involves embedding each set element into a latent space with dimension $L$, followed by a sum pooling to obtain a whole-set embedding, and finally mapping the whole-set embedding to the output. In this work, we investigate the impact of the dimension $L$ on the expressive power of DeepSets. Previous analyses either oversimplified high-dimensional features to be one-dimensional features or were limited to analytic activations, thereby diverging from practical use or resulting in $L$ that grows exponentially with the set size $N$ and feature dimension $D$. To investigate the minimal value of $L$ that achieves sufficient expressive power, we present two set-element embedding layers: (a) linear + power activation (LP) and (b) linear + exponential activations (LE). We demonstrate that $L$ being poly$(N, D)$ is sufficient for set representation using both embedding layers. We also provide a lower bound of $L$ for the LP embedding layer. Furthermore, we extend our results to permutation-equivariant set functions and the complex field.
    摘要 集合表示(set representation)已在深度学习中广泛用于建模对输入顺序不敏感的归纳偏置。DeepSets 是最常用的集合表示神经网络架构:它先将每个集合元素嵌入到维度为 $L$ 的隐空间,再经求和池化得到整集嵌入,最后将整集嵌入映射到输出。在这项工作中,我们研究维度 $L$ 对 DeepSets 表达能力的影响。以往的分析要么把高维特征过度简化为一维特征,要么仅限于解析激活函数,从而偏离实际使用,或导致 $L$ 随集合大小 $N$ 与特征维度 $D$ 呈指数增长。为探究达到足够表达能力所需的最小 $L$,我们提出两种集合元素嵌入层:(a) 线性+幂激活(LP)与 (b) 线性+指数激活(LE)。我们证明,对这两种嵌入层,$L$ 取 poly$(N, D)$ 即足以实现集合表示;同时给出了 LP 嵌入层的下界。此外,我们还将结果推广到置换等变集合函数与复数域。
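
下面的 PyTorch 草图给出"线性+幂激活(LP)嵌入 + 求和池化"的 DeepSets 最小实现,并验证其置换不变性。对线性输出先取绝对值再取幂是为保证数值合法而加入的假设;宽度 L、幂次 p 与输出头均为示意。

```python
import torch
import torch.nn as nn

class DeepSetsLP(nn.Module):
    """DeepSets with a linear + power-activation (LP) element embedding of
    width L, sum pooling, and a small output head."""
    def __init__(self, d_in, L, p=2.0):
        super().__init__()
        self.embed = nn.Linear(d_in, L)
        self.p = p
        self.head = nn.Sequential(nn.Linear(L, L), nn.ReLU(), nn.Linear(L, 1))

    def forward(self, x):                     # x: (batch, N, d_in)
        z = self.embed(x).abs() ** self.p     # elementwise power activation
        return self.head(z.sum(dim=1))        # sum pooling => permutation invariance

net = DeepSetsLP(d_in=4, L=32)
x = torch.randn(8, 10, 4)
x_perm = x[:, torch.randperm(10)]             # reorder the set elements
print(torch.allclose(net(x), net(x_perm), atol=1e-5))   # True
```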

eess.IV - 2023-07-09

Ultrasonic Image’s Annotation Removal: A Self-supervised Noise2Noise Approach

  • paper_url: http://arxiv.org/abs/2307.04133
  • repo_url: https://github.com/grandarth/ultrasonicimage-n2n-approach
  • paper_authors: Yuanheng Zhang, Nan Jiang, Zhaoheng Xie, Junying Cao, Yueyang Teng
  • for: 提高医疗报告质量的高级医疗图像标注数据集的自动化处理。
  • methods: 将标注视为噪声,构建自监督 pretext 任务,并基于 Noise2Noise 方案训练模型,将图像恢复到干净状态。
  • results: 对不同类型的标注数据进行测试,大多数基于Noise2Noise scheme的模型在噪声恢复任务中表现出色,特别是使用自定义U-Net结构在Body marker标注数据集上得到了最佳效果,具有高精度和高重建相似性。
    Abstract Accurately annotated ultrasonic images are vital components of a high-quality medical report. Hospitals often have strict guidelines on the types of annotations that should appear on imaging results. However, manually inspecting these images can be a cumbersome task. While a neural network could potentially automate the process, training such a model typically requires a dataset of paired input and target images, which in turn involves significant human labour. This study introduces an automated approach for detecting annotations in images. This is achieved by treating the annotations as noise, creating a self-supervised pretext task and using a model trained under the Noise2Noise scheme to restore the image to a clean state. We tested a variety of model structures on the denoising task against different types of annotation, including body marker annotation, radial line annotation, etc. Our results demonstrate that most models trained under the Noise2Noise scheme outperformed their counterparts trained with noisy-clean data pairs. The customized U-Net yielded the most optimal outcome on the body marker annotation dataset, with high scores on segmentation precision and reconstruction similarity. We released our code at https://github.com/GrandArth/UltrasonicImage-N2N-Approach.
    摘要 准确标注的超声图像是高质量医疗报告的关键组成部分。医院通常对影像结果上应出现的标注类型有严格的规范,但人工检查这些图像非常繁琐。神经网络有望自动化这一过程,然而训练此类模型通常需要成对的输入-目标图像数据集,这又需要大量人工劳动。本研究提出一种自动检测图像中标注的方法:将标注视为噪声,构建自监督 pretext 任务,并使用按 Noise2Noise 方案训练的模型将图像还原为干净状态。我们针对体表标记标注、径向线标注等不同类型的标注,测试了多种模型结构在去噪任务上的表现。结果表明,按 Noise2Noise 方案训练的大多数模型优于使用噪声-干净数据对训练的对应模型;其中定制的 U-Net 在体表标记标注数据集上取得最优结果,分割精度与重建相似度均很高。代码见 https://github.com/GrandArth/UltrasonicImage-N2N-Approach 。
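
Noise2Noise 的关键在于:输入与目标是同一干净图像的两份独立带噪版本,此处"噪声"即叠加的标注。下面的 PyTorch 草图用随机亮线粗略模拟烧录进图像的标注,并展示训练循环的骨架;标注模拟方式、网络结构与数据均为示意假设。

```python
import torch
import torch.nn as nn

def add_random_annotation(img):
    """Stand-in for annotation 'noise': burn a random bright horizontal bar
    into the image, loosely mimicking a body-marker overlay."""
    out = img.clone()
    row = torch.randint(0, img.shape[-2], (1,)).item()
    out[..., row, :] = 1.0
    return out

model = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
                      nn.Conv2d(16, 1, 3, padding=1))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
clean = torch.rand(16, 1, 64, 64)               # placeholder ultrasound batch

for step in range(100):
    inp = add_random_annotation(clean)          # noisy input ...
    tgt = add_random_annotation(clean)          # ... independently noisy target
    loss = ((model(inp) - tgt) ** 2).mean()     # Noise2Noise: no clean target needed
    opt.zero_grad()
    loss.backward()
    opt.step()
```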

Enhancing Low-Light Images Using Infrared-Encoded Images

  • paper_url: http://arxiv.org/abs/2307.04122
  • repo_url: https://github.com/wyf0912/ELIEI
  • paper_authors: Shulin Tian, Yufei Wang, Renjie Wan, Wenhan Yang, Alex C. Kot, Bihan Wen
  • for: 提高低光照图像的可见度和细节表示
  • methods: 移除卡口内的折射光镜Filter,使用更多的照明信息从近infraredpectrum中获取更高的信噪比
  • results: 对比 referencetest dataset,提出的方法能够更好地提高低光照图像的可见度和细节表示,并且量化和 каче地比较好
    Abstract Low-light image enhancement task is essential yet challenging as it is ill-posed intrinsically. Previous arts mainly focus on the low-light images captured in the visible spectrum using pixel-wise loss, which limits the capacity of recovering the brightness, contrast, and texture details due to the small number of income photons. In this work, we propose a novel approach to increase the visibility of images captured under low-light environments by removing the in-camera infrared (IR) cut-off filter, which allows for the capture of more photons and results in improved signal-to-noise ratio due to the inclusion of information from the IR spectrum. To verify the proposed strategy, we collect a paired dataset of low-light images captured without the IR cut-off filter, with corresponding long-exposure reference images with an external filter. The experimental results on the proposed dataset demonstrate the effectiveness of the proposed method, showing better performance quantitatively and qualitatively. The dataset and code are publicly available at https://wyf0912.github.io/ELIEI/
    摘要 低光照图像提升任务是必备又挑战的,因为它是一个内在不定的问题。先前的艺术主要通过像素精度损失来处理低光照图像,这限制了恢复图像的亮度、对比度和细节的能力,因为可得到的光子数很少。在这个工作中,我们提议一种新的方法,利用取消相机内置的红外(IR)剔除filter,以获取更多的光子数,从而提高信号噪声比。为验证提议的策略,我们收集了一个对应的低光照图像集和长曝光参照图像集,使用外部滤镜获取。实验结果表明,提议的方法具有更好的数量和质量性能。数据集和代码在https://wyf0912.github.io/ELIEI/上公开。

Enhancing Building Semantic Segmentation Accuracy with Super Resolution and Deep Learning: Investigating the Impact of Spatial Resolution on Various Datasets

  • paper_url: http://arxiv.org/abs/2307.04101
  • repo_url: None
  • paper_authors: Zhiling Guo, Xiaodan Shi, Haoran Zhang, Dou Huang, Xiaoya Song, Jinyue Yan, Ryosuke Shibasaki
  • for: 本研究旨在探讨深度学习基于远程感知技术的建筑 semantics 分割中 spatial resolution 的影响。
  • methods: 本研究使用了 super-resolution 和 down-sampling 技术将 remote sensing 图像转化为多个空间分辨率,然后选择了 UNet 和 FPN 两种深度学习模型进行训练和测试。
  • results: 实验结果显示,建筑 semantics 分割结果受到空间分辨率的影响,并且在约 0.3m 的空间分辨率下达到最佳成本效果。
    Abstract The development of remote sensing and deep learning techniques has enabled building semantic segmentation with high accuracy and efficiency. Despite their success in different tasks, the discussions on the impact of spatial resolution on deep learning based building semantic segmentation are quite inadequate, which makes choosing a higher cost-effective data source a big challenge. To address the issue mentioned above, in this study, we create remote sensing images among three study areas into multiple spatial resolutions by super-resolution and down-sampling. After that, two representative deep learning architectures: UNet and FPN, are selected for model training and testing. The experimental results obtained from three cities with two deep learning models indicate that the spatial resolution greatly influences building segmentation results, and with a better cost-effectiveness around 0.3m, which we believe will be an important insight for data selection and preparation.
    摘要 发展远程感知和深度学习技术,使得建筑 semantic segmentation 的准确率和效率得到了大幅提高。然而,关于深度学习基于建筑 semantic segmentation 的空间分辨率影响的讨论,尚不充分,这使得选择更高效果和经济的数据源成为一大挑战。为解决上述问题,本研究中将Remote sensing 图像在三个研究区域中进行多个空间分辨率的创建,通过超分辨和降采样。然后,选择了两种代表性的深度学习架构:UNet 和 FPN。通过三座城市的两个深度学习模型的实验结果显示,空间分辨率对建筑 segmentation 结果产生了极大的影响,并且在约0.3米的成本效果上表现较好。我们认为这将成为数据选择和准备的重要意见。
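
多分辨率数据的构造本身只需按目标地面采样距离(GSD)对原图降采样。下面的 PIL 草图演示这一步;`base_gsd` 与 `target_gsds` 的取值均为假设,论文中还结合超分辨率生成更精细的版本。

```python
from PIL import Image

def make_resolutions(path, base_gsd=0.1, target_gsds=(0.1, 0.3, 0.6, 1.2)):
    """Simulate several ground-sample distances (m/pixel) from one image
    by bicubic down-sampling; coarser GSD means fewer pixels."""
    img = Image.open(path)
    w, h = img.size
    out = {}
    for gsd in target_gsds:
        s = base_gsd / gsd
        out[gsd] = img.resize((int(w * s), int(h * s)), Image.BICUBIC)
    return out
```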

Combining transmission speckle photography and convolutional neural network for determination of fat content in cow’s milk – an exercise in classification of parameters of a complex suspension

  • paper_url: http://arxiv.org/abs/2307.15069
  • repo_url: None
  • paper_authors: Kwasi Nyandey, Daniel Jakubczyk
  • for: direct classification and recognition of milk fat content classes
  • methods: combined transmission speckle photography and machine learning
  • results: unambiguous recognition of milk fat content classes with high accuracy (100% and ~99%)
    Abstract We have combined transmission speckle photography and machine learning for direct classification and recognition of milk fat content classes. Our aim was hinged on the fact that parameters of scattering particles (and the dispersion medium) can be linked to the intensity distribution (speckle) observed when coherent light is transmitted through a scattering medium. For milk, it is primarily the size distribution and concentration of fat globules, which constitutes the total fat content. Consequently, we trained convolutional neural network to recognise and classify laser speckle from different fat content classes (0.5, 1.5, 2.0 and 3.2%). We investigated four exposure-time protocols and obtained the highest performance for shorter exposure times, in which the intensity histograms are kept similar for all images and the most probable intensity in the speckle pattern is close to zero. Our neural network was able to recognize the milk fat content classes unambiguously and we obtained the highest test and independent classification accuracies of 100 and ~99% respectively. It indicates that the parameters of other complex realistic suspensions could be classified with similar methods.
    摘要 我们将透射散斑摄影与机器学习相结合,用于直接分类和识别牛奶脂肪含量类别。我们的出发点是:散射粒子(及分散介质)的参数与相干光透过散射介质时所观察到的强度分布(散斑)之间存在关联。对牛奶而言,决定总脂肪含量的主要是脂肪球的尺寸分布和浓度。因此,我们训练卷积神经网络来识别和分类不同脂肪含量类别(0.5%、1.5%、2.0%和3.2%)的激光散斑。我们研究了四种曝光时间协议,并在较短的曝光时间下获得最高性能;此时所有图像的强度直方图保持相似,散斑图样中最可几的强度接近于零。我们的神经网络能够无歧义地识别牛奶脂肪含量类别,测试与独立分类准确率分别高达 100% 和约 99%。这表明可以用类似方法对其他复杂的真实悬浮液的参数进行分类。

cs.SD - 2023-07-08

Anti-noise window: Subjective perception of active noise reduction and effect of informational masking

  • paper_url: http://arxiv.org/abs/2307.05533
  • repo_url: https://github.com/ntudsp/SPANR
  • paper_authors: Bhan Lam, Kelvin Chee Quan Lim, Kenneth Ooi, Zhen-Ting Ong, Dongyuan Shi, Woon-Seng Gan
  • for: 这个论文旨在探讨在室内隔热中使用活动噪音控制技术以提高室内噪音舒适性。
  • methods: 这个论文使用了一种名为”反噪窗”(ANW)的活动噪音控制技术,并与信息掩蔽(IM)相结合,以evaluate其在模拟床间中的效果。
  • results: 研究发现,仅使用 ANW 即可显著降低感知烦恼度(PAY)与感知响度(PLN),并提升 ISO 愉悦度,但流水掩蔽可能会提高 PLN。此外,将 ANC 与掩蔽声相结合可获得交互效应:与仅使用 ANC 相比,两种掩蔽声均能显著降低 PAY。
    Abstract Reviving natural ventilation (NV) for urban sustainability presents challenges for indoor acoustic comfort. Active control and interference-based noise mitigation strategies, such as the use of loudspeakers, offer potential solutions to achieve acoustic comfort while maintaining NV. However, these approaches are not commonly integrated or evaluated from a perceptual standpoint. This study examines the perceptual and objective aspects of an active-noise-control (ANC)-based "anti-noise" window (ANW) and its integration with informational masking (IM) in a model bedroom. Forty participants assessed the ANW in a three-way interaction involving noise types (traffic, train, and aircraft), maskers (bird, water), and ANC (on, off). The evaluation focused on perceived annoyance (PAY; ISO/TS 15666), perceived affective quality (ISO/TS 12913-2), loudness (PLN), and included an open-ended qualitative assessment. Despite minimal objective reduction in decibel-based indicators and a slight increase in psychoacoustic sharpness, the ANW alone demonstrated significant reductions in PAY and PLN, as well as an improvement in ISO pleasantness across all noise types. The addition of maskers generally enhanced overall acoustic comfort, although water masking led to increased PLN. Furthermore, the combination of ANC with maskers showed interaction effects, with both maskers significantly reducing PAY compared to ANC alone.
    摘要 为城市可持续性恢复自然通风(NV)给室内声舒适带来挑战。主动控制与基于干涉的降噪策略(如使用扬声器)有望在保持自然通风的同时实现声舒适,但这些方法很少被整合或从感知角度加以评估。本研究在模型卧室中考察了基于主动降噪(ANC)的"反噪窗"(ANW)及其与信息掩蔽(IM)相结合时的感知与客观效果。40 名被试在噪声类型(交通、火车、飞机)、掩蔽声(鸟鸣、流水)与 ANC(开、关)的三因素交互设计下对 ANW 进行评估,评估内容包括感知烦恼度(PAY;ISO/TS 15666)、感知情感品质(ISO/TS 12913-2)、感知响度(PLN),并辅以开放式定性评估。尽管分贝类指标的客观降低幅度很小、心理声学锐度略有上升,仅 ANW 本身即可在所有噪声类型下显著降低 PAY 与 PLN,并提升 ISO 愉悦度。加入掩蔽声总体上进一步提升了声舒适,但流水掩蔽会提高 PLN。此外,ANC 与掩蔽声的组合表现出交互效应:与仅使用 ANC 相比,两种掩蔽声均能显著降低 PAY。

On decoder-only architecture for speech-to-text and large language model integration

  • paper_url: http://arxiv.org/abs/2307.03917
  • repo_url: None
  • paper_authors: Jian Wu, Yashesh Gaur, Zhuo Chen, Long Zhou, Yimeng Zhu, Tianrui Wang, Jinyu Li, Shujie Liu, Bo Ren, Linquan Liu, Yu Wu
  • for: 这个研究旨在探讨如何将语音信号纳入文本大语言模型中,以提高人机交互的自然语言处理能力。
  • methods: 该方法使用Connectionist Temporal Classification和简单的音频编码器将压缩语音特征映射到大语言模型中的连续semantic空间。
  • results: 在多语言 speech-to-text 翻译任务上,该方法相比强基线表现更优,这表明 decoder-only 模型在 speech-to-text 转换中可能具有优势。
    Abstract Large language models (LLMs) have achieved remarkable success in the field of natural language processing, enabling better human-computer interaction using natural language. However, the seamless integration of speech signals into LLMs has not been explored well. The "decoder-only" architecture has also not been well studied for speech processing tasks. In this research, we introduce Speech-LLaMA, a novel approach that effectively incorporates acoustic information into text-based large language models. Our method leverages Connectionist Temporal Classification and a simple audio encoder to map the compressed acoustic features to the continuous semantic space of the LLM. In addition, we further probe the decoder-only architecture for speech-to-text tasks by training a smaller scale randomly initialized speech-LLaMA model from speech-text paired data alone. We conduct experiments on multilingual speech-to-text translation tasks and demonstrate a significant improvement over strong baselines, highlighting the potential advantages of decoder-only models for speech-to-text conversion.
    摘要 大型语言模型(LLM)在自然语言处理领域取得了巨大成功,使人机交互可以更自然地使用自然语言。然而,将语音信号无缝整合进 LLM 的研究尚不充分,"decoder-only"架构在语音处理任务上也缺乏深入研究。在本研究中,我们提出一种名为 Speech-LLaMA 的新方法,可有效地将声学信息纳入基于文本的大型语言模型。该方法利用 Connectionist Temporal Classification 和一个简单的音频编码器,将压缩后的声学特征映射到 LLM 的连续语义空间中。此外,我们仅用语音-文本配对数据从头训练了一个较小规模、随机初始化的 Speech-LLaMA 模型,进一步考察 decoder-only 架构在语音转文本任务上的可行性。我们在多语言语音转文本翻译任务上进行了实验,相较强基线取得显著提升,凸显了 decoder-only 模型在语音转文本转换中的潜在优势。
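
Speech-LLaMA 的接口思想可以用几行 PyTorch 勾勒:将帧级声学特征压缩(论文借助 CTC,此处以平均池化粗略替代)后线性投影到 LLM 的嵌入维度,再拼接在文本词嵌入之前,交给 decoder-only 模型。模块名、各维度与压缩方式均为示意假设。

```python
import torch
import torch.nn as nn

class SpeechPrefix(nn.Module):
    """Map frame-level acoustic features into the LLM embedding space so
    they can be prepended to text-token embeddings of a decoder-only model."""
    def __init__(self, d_audio=512, d_model=4096, stride=4):
        super().__init__()
        self.pool = nn.AvgPool1d(stride, stride)   # crude stand-in for CTC compression
        self.proj = nn.Linear(d_audio, d_model)

    def forward(self, feats):                      # feats: (batch, T, d_audio)
        x = self.pool(feats.transpose(1, 2)).transpose(1, 2)
        return self.proj(x)                        # (batch, T/stride, d_model)

prefix = SpeechPrefix()(torch.randn(2, 160, 512))     # (2, 40, 4096)
text_emb = torch.randn(2, 20, 4096)                   # embeddings of the text tokens
decoder_input = torch.cat([prefix, text_emb], dim=1)  # fed to the decoder-only LLM
```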

LaunchpadGPT: Language Model as Music Visualization Designer on Launchpad

  • paper_url: http://arxiv.org/abs/2307.04827
  • repo_url: https://github.com/yunlong10/launchpadgpt
  • paper_authors: Siting Xu, Yunlong Tang, Feng Zheng
  • for: The paper is written for those who want to create music visualization designs for the Launchpad musical instrument.
  • methods: The paper proposes a method called LaunchpadGPT, which uses a language model to generate music visualization designs on the Launchpad automatically.
  • results: The proposed method can create better music visualization than random generation methods and has the potential for a broader range of music visualization applications.
  • for: 本研究旨在帮助设计 Launchpad 的音乐可视化,并为初学者提供一种更易上手的音乐可视化创作方法。
  • methods: 本研究提出了名为 LaunchpadGPT 的方法,它使用语言模型自动生成 Launchpad 上的音乐可视化设计。
  • results: 实验结果显示,所提方法可生成优于随机生成方法的音乐可视化,并具有更广泛的应用前景。
    Abstract Launchpad is a musical instrument that allows users to create and perform music by pressing illuminated buttons. To assist and inspire the design of the Launchpad light effect, and provide a more accessible approach for beginners to create music visualization with this instrument, we proposed the LaunchpadGPT model to generate music visualization designs on Launchpad automatically. Based on the language model with excellent generation ability, our proposed LaunchpadGPT takes an audio piece of music as input and outputs the lighting effects of Launchpad-playing in the form of a video (Launchpad-playing video). We collect Launchpad-playing videos and process them to obtain music and corresponding video frame of Launchpad-playing as prompt-completion pairs, to train the language model. The experiment result shows the proposed method can create better music visualization than random generation methods and hold the potential for a broader range of music visualization applications. Our code is available at https://github.com/yunlong10/LaunchpadGPT/.
    摘要 Launchpad是一种音乐 instrumente,允许用户通过键盘按钮来创作和演奏音乐。为了帮助设计Launchpad的光效和激励beginner创作音乐视觉,我们提出了LaunchpadGPT模型,自动生成Launchpad演奏的光效设计。基于优秀的语言模型,我们的LaunchpadGPT接受音乐作品作为输入,并输出Launchpad演奏的视频形式的光效设计。我们收集了大量Launchpad演奏视频,并对其进行处理,以获取音乐和对应的视频帧作为提示完成对的对。通过训练语言模型,我们实现了更好的音乐视觉创作。实验结果表明,我们的方法可以创造出比随机生成方法更好的音乐视觉,并拥有更广泛的音乐视觉应用前景。我们的代码可以在https://github.com/yunlong10/LaunchpadGPT/查看。

cs.CV - 2023-07-08

Lightweight Improved Residual Network for Efficient Inverse Tone Mapping

  • paper_url: http://arxiv.org/abs/2307.03998
  • repo_url: None
  • paper_authors: Liqi Xue, Tianyi Xu, Yongbao Song, Yan Liu, Lei Zhang, Xiantong Zhen, Jun Xu
  • for: 用于高动态范围图像的倒计时间映射(ITM)任务。
  • methods: 提出了一种基于增强的待遇块的轻量级Improved Residual Network(IRNet),用于高效地完成ITM任务。
  • results: 在三个标准数据集上进行实验,得到了最佳性能在ITM和joint SR-ITM任务中。
    Abstract The display devices like HDR10 televisions are increasingly prevalent in our daily life for visualizing high dynamic range (HDR) images. But the majority of media images on the internet remain in 8-bit standard dynamic range (SDR) format. Therefore, converting SDR images to HDR ones by inverse tone mapping (ITM) is crucial to unlock the full potential of abundant media images. However, existing ITM methods are usually developed with complex network architectures requiring huge computational costs. In this paper, we propose a lightweight Improved Residual Network (IRNet) by enhancing the power of popular residual block for efficient ITM. Specifically, we propose a new Improved Residual Block (IRB) to extract and fuse multi-layer features for fine-grained HDR image reconstruction. Experiments on three benchmark datasets demonstrate that our IRNet achieves state-of-the-art performance on both the ITM and joint SR-ITM tasks. The code, models and data will be publicly available at https://github.com/ThisisVikki/ITM-baseline.
    摘要 如今,HDR10 电视等显示设备在日常生活中日益普及,用于显示高动态范围(HDR)图像。然而,互联网上的媒体图像大多仍是 8 比特标准动态范围(SDR)格式。因此,通过逆色调映射(ITM)将 SDR 图像转换为 HDR 图像,对充分发挥海量媒体图像的潜力非常重要。然而,现有的 ITM 方法通常采用复杂的网络架构,计算代价巨大。在这篇论文中,我们提出一种轻量级的改进残差网络(IRNet),通过增强常用残差块的能力来高效完成 ITM。具体来说,我们提出一种新的改进残差块(IRB),用于提取并融合多层特征,实现细腻的 HDR 图像重建。实验表明,我们的 IRNet 在 ITM 与联合 SR-ITM 任务上均达到最先进的性能。代码、模型和数据将在 https://github.com/ThisisVikki/ITM-baseline 上公开。
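
下面给出一个"先融合深浅两层特征、再做残差连接"的轻量残差块 PyTorch 草图,作为对 IRB 思想的猜测性解读;论文中 IRB 的确切结构可能不同,此处的卷积配置仅为示意。

```python
import torch
import torch.nn as nn

class ImprovedResidualBlock(nn.Module):
    """A residual block that concatenates shallow and deep features and
    fuses them with a 1x1 conv before the skip connection."""
    def __init__(self, ch):
        super().__init__()
        self.conv1 = nn.Conv2d(ch, ch, 3, padding=1)
        self.conv2 = nn.Conv2d(ch, ch, 3, padding=1)
        self.fuse = nn.Conv2d(2 * ch, ch, 1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        f1 = self.act(self.conv1(x))
        f2 = self.conv2(f1)
        return x + self.fuse(torch.cat([f1, f2], dim=1))

y = ImprovedResidualBlock(32)(torch.randn(1, 32, 64, 64))
print(y.shape)   # torch.Size([1, 32, 64, 64])
```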

Stimulating the Diffusion Model for Image Denoising via Adaptive Embedding and Ensembling

  • paper_url: http://arxiv.org/abs/2307.03992
  • repo_url: None
  • paper_authors: Tong Li, Hansen Feng, Lizhi Wang, Zhiwei Xiong, Hua Huang
  • for: 提高图像噪声去除的高质量感知性和低扭曲性。
  • methods: 提出了一种新的策略called Diffusion Model for Image Denoising (DMID),包括适应嵌入方法和适应 ensemble方法,以解决噪声模型和图像去噪的关键问题。
  • results: 实现了对所有噪声基数和感知指标的状态艺术性和低扭曲性表现,包括对真实世界图像的去噪。
    Abstract Image denoising is a fundamental problem in computational photography, where achieving high-quality perceptual performance with low distortion is highly demanding. Current methods either struggle with perceptual performance or suffer from significant distortion. Recently, the emerging diffusion model achieves state-of-the-art performance in various tasks, and its denoising mechanism demonstrates great potential for image denoising. However, stimulating diffusion models for image denoising is not straightforward and requires solving several critical problems. On the one hand, the input inconsistency hinders the connection of diffusion models and image denoising. On the other hand, the content inconsistency between the generated image and the desired denoised image introduces additional distortion. To tackle these problems, we present a novel strategy called Diffusion Model for Image Denoising (DMID) by understanding and rethinking the diffusion model from a denoising perspective. Our DMID strategy includes an adaptive embedding method that embeds the noisy image into a pre-trained diffusion model, and an adaptive ensembling method that reduces distortion in the denoised image. Our DMID strategy achieves state-of-the-art performance on all distortion-based and perceptual metrics, for both Gaussian and real-world image denoising.
    摘要 图像去噪是计算摄影学中的基本问题,要求在低失真的前提下取得高质量的感知性能。目前的方法要么感知性能不足,要么存在明显失真。近来出现的扩散模型在多种任务中达到了最先进的性能,其去噪机制对图像去噪表现出巨大潜力。然而,激发扩散模型用于图像去噪并不直接,需要解决若干关键问题:一方面,输入不一致阻碍了扩散模型与图像去噪的衔接;另一方面,生成图像与期望去噪图像之间的内容不一致会引入额外失真。为了解决这些问题,我们从去噪的角度理解并重新思考扩散模型,提出一种称为 DMID(Diffusion Model for Image Denoising)的新策略。我们的 DMID 策略包括一种自适应嵌入方法,将噪声图像嵌入预训练的扩散模型中,以及一种自适应集成方法,以减少去噪图像中的失真。在高斯噪声与真实世界图像去噪上,我们的 DMID 策略在所有基于失真和感知的指标上均达到最先进的性能。
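
DMID 的两个组件可以用如下 PyTorch 草图直观化:把带噪图像视为扩散过程在时间步 t_star 的状态(自适应嵌入),从该步起运行带随机性的反向采样,并对多次运行取平均(自适应集成)。DDIM 式更新、噪声调度与玩具去噪器均为示意假设,t_star 的自适应选取方式见论文。

```python
import torch

@torch.no_grad()
def dmid_denoise(noisy, denoiser, alphas_cumprod, t_star, n_runs=4, eta=1.0):
    """Treat the noisy photo as the diffusion state at step t_star, run a
    stochastic reverse process from there, and average several runs."""
    outs = []
    for _ in range(n_runs):
        x = noisy.clone()
        for t in range(t_star, -1, -1):
            eps = denoiser(x, t)                       # predicted noise at step t
            a_t = alphas_cumprod[t]
            x0 = (x - (1 - a_t).sqrt() * eps) / a_t.sqrt()
            if t == 0:
                x = x0
                break
            a_prev = alphas_cumprod[t - 1]
            sigma = eta * ((1 - a_prev) / (1 - a_t)).sqrt() * (1 - a_t / a_prev).sqrt()
            x = (a_prev.sqrt() * x0
                 + (1 - a_prev - sigma ** 2).clamp(min=0).sqrt() * eps
                 + sigma * torch.randn_like(x))
        outs.append(x)
    return torch.stack(outs).mean(0)                   # ensemble average

# toy usage with a linear schedule and a trivial stand-in denoiser
ac = torch.cumprod(1 - torch.linspace(1e-4, 0.02, 100), dim=0)
out = dmid_denoise(torch.randn(1, 3, 32, 32), lambda x, t: torch.zeros_like(x), ac, t_star=30)
```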

FTFDNet: Learning to Detect Talking Face Video Manipulation with Tri-Modality Interaction

  • paper_url: http://arxiv.org/abs/2307.03990
  • repo_url: None
  • paper_authors: Ganglai Wang, Peng Zhang, Junwen Xiong, Feihan Yang, Wei Huang, Yufei Zha
  • for: 防止深伪视频攻击公共媒体安全,特别是利用语音和视频流进行数字面部伪造,以至于识别人脸特征困难。
  • methods: 提出了一种基于视觉、音频和运动特征的检测网络(FTFDNet),并使用高效的跨模态融合(CMF)模块将这些特征融合。此外,还提出了一种新的音频视频注意机制(AVAM),可以找到更多的有用特征。
  • results: FTFDNet在已知的深伪面部检测数据集(FTFDD)以及深伪视频检测数据集(DFDC和DF-TIMIT)上达到了与其他当前领先的深伪视频检测方法相比较好的检测性能。
    Abstract DeepFake based digital facial forgery is threatening public media security, especially when lip manipulation has been used in talking face generation, and the difficulty of fake video detection is further improved. By only changing lip shape to match the given speech, the facial features of identity are hard to be discriminated in such fake talking face videos. Together with the lack of attention on audio stream as the prior knowledge, the detection failure of fake talking face videos also becomes inevitable. It's found that the optical flow of the fake talking face video is disordered especially in the lip region while the optical flow of the real video changes regularly, which means the motion feature from optical flow is useful to capture manipulation cues. In this study, a fake talking face detection network (FTFDNet) is proposed by incorporating visual, audio and motion features using an efficient cross-modal fusion (CMF) module. Furthermore, a novel audio-visual attention mechanism (AVAM) is proposed to discover more informative features, which can be seamlessly integrated into any audio-visual CNN architecture by modularization. With the additional AVAM, the proposed FTFDNet is able to achieve a better detection performance than other state-of-the-art DeepFake video detection methods not only on the established fake talking face detection dataset (FTFDD) but also on the DeepFake video detection datasets (DFDC and DF-TIMIT).
    摘要 基于 DeepFake 的数字面部伪造正威胁公共媒体安全,尤其是在说话人脸生成中使用唇形操纵后,伪造视频检测的难度进一步提高。此类伪造只需改变唇形以匹配给定语音,身份面部特征因此难以区分;加之先前工作缺乏对音频流这一先验信息的关注,对伪造说话人脸视频的检测失败在所难免。研究发现,伪造说话人脸视频的光流尤其在唇部区域呈现紊乱,而真实视频的光流变化规律,这说明源自光流的运动特征有助于捕捉操纵线索。本研究提出一种伪造说话人脸检测网络(FTFDNet),通过高效的跨模态融合(CMF)模块整合视觉、音频与运动特征,并提出一种新的视听注意力机制(AVAM)以发掘更具信息量的特征,该机制可通过模块化无缝集成到任意视听 CNN 架构中。借助 AVAM,FTFDNet 不仅在已建立的伪造说话人脸检测数据集(FTFDD)上,也在 DeepFake 视频检测数据集(DFDC 与 DF-TIMIT)上取得了优于其他最先进方法的检测性能。

TractGeoNet: A geometric deep learning framework for pointwise analysis of tract microstructure to predict language assessment performance

  • paper_url: http://arxiv.org/abs/2307.03982
  • repo_url: None
  • paper_authors: Yuqian Chen, Leo R. Zekelman, Chaoyi Zhang, Tengfei Xue, Yang Song, Nikos Makris, Yogesh Rathi, Alexandra J. Golby, Weidong Cai, Fan Zhang, Lauren J. O’Donnell
  • for: 这个研究旨在开发一种基于几何深度学习的推论框架,以用于使用扩散磁共振成像(dMRI)轨脉图像和相关的点粒子微结构测量进行回归。
  • methods: 该方法使用点云表示,可以直接利用轨脉图像中所有点的位势信息和细胞微结构信息进行推论。此外,我们提出了一种新的损失函数,即对称对比推论损失函数,以促进模型准确地预测轨脉图像中点粒子微结构之间的差异。
  • results: 我们通过使用TractGeoNet进行回归预测,并评估了20个白 matter轨脉图像的语言功能表现。结果显示,TractGeoNet比许多流行的回归模型表现更优异。此外,我们发现左弯曲脉幕轨脉是语言功能表现的最重要预测因素之一。本地化的关键区域分布在两个半球的白 matter轨脉中,包括耳延盘、前叶、上部和下部的脑区域,这些脑区域被认为是语言功能的重要组成部分。总的来说,TractGeoNet表明几何深度学习可以增强脑白 matter轨脉的研究和语言功能之间的关系。
    Abstract We propose a geometric deep-learning-based framework, TractGeoNet, for performing regression using diffusion magnetic resonance imaging (dMRI) tractography and associated pointwise tissue microstructure measurements. By employing a point cloud representation, TractGeoNet can directly utilize pointwise tissue microstructure and positional information from all points within a fiber tract. To improve regression performance, we propose a novel loss function, the Paired-Siamese Regression loss, which encourages the model to focus on accurately predicting the relative differences between regression label scores rather than just their absolute values. In addition, we propose a Critical Region Localization algorithm to identify highly predictive anatomical regions within the white matter fiber tracts for the regression task. We evaluate the effectiveness of the proposed method by predicting individual performance on two neuropsychological assessments of language using a dataset of 20 association white matter fiber tracts from 806 subjects from the Human Connectome Project. The results demonstrate superior prediction performance of TractGeoNet compared to several popular regression models. Of the twenty tracts studied, we find that the left arcuate fasciculus tract is the most highly predictive of the two studied language performance assessments. The localized critical regions are widespread and distributed across both hemispheres and all cerebral lobes, including areas of the brain considered important for language function such as superior and anterior temporal regions, pars opercularis, and precentral gyrus. Overall, TractGeoNet demonstrates the potential of geometric deep learning to enhance the study of the brain's white matter fiber tracts and to relate their structure to human traits such as language performance.
    摘要 我们提出了一个几何深度学习基于扩散磁共振成像(dMRI)的推论框架,称为TractGeoNet,用于进行回归。该框架利用点云表示,可以直接利用所有纤维股区域的点wise组织和位域信息。为了提高回归性能,我们提出了一种新的损失函数,即对称对拼减损失函数,该函数鼓励模型对准精确地预测相对差值而不仅仅是绝对值。此外,我们提出了一种关键区域定位算法,用于在白 matter纤维股中预测性能。我们通过使用dMRI数据集的20个相关纤维股,从806名参与者中预测语言性能,证明了TractGeoNet的效果比较出色。其中,左弯曲 fasiculus 纤维股被证明为语言性能预测的最高度相关。关键区域广泛分布在两个半球和所有脑叶,包括被认为是语言功能重要的脑区域,如上侧 temporalis、anterior temporalis、pars opercularis 和 precentral gyrus。总之,TractGeoNet表明几何深度学习可以提高白 matter纤维股的研究和语言性能之间的关系。
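
按摘要的描述,Paired-Siamese Regression 损失的一种合理形式如下:在一对被试上,除各自的绝对误差项外,再额外惩罚"预测差值"偏离"真实分数差值"。这只是按描述写出的可能实现,权重 `lam` 为假设参数。

```python
import torch

def paired_siamese_regression_loss(pred_a, pred_b, y_a, y_b, lam=1.0):
    """Absolute terms keep predictions calibrated; the pairwise term pushes
    the predicted score *difference* toward the true difference."""
    absolute = (pred_a - y_a) ** 2 + (pred_b - y_b) ** 2
    relative = ((pred_a - pred_b) - (y_a - y_b)) ** 2
    return (absolute + lam * relative).mean()

loss = paired_siamese_regression_loss(torch.randn(8), torch.randn(8),
                                      torch.randn(8), torch.randn(8))
```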

Building and Road Segmentation Using EffUNet and Transfer Learning Approach

  • paper_url: http://arxiv.org/abs/2307.03980
  • repo_url: None
  • paper_authors: Sahil Gangurde
  • for: 本论文旨在用深度学习方法实现城市规划中的建筑物和路径分割,以提高城市规划决策的效果。
  • methods: 本论文提出了一种基于Google新提出的EfficientNetV2的Encoder-UNet Decoder架构,用于feature提取和分割地图中的建筑物和路径。
  • results: 使用该方法, authors在麻省建筑物和路径 dataset上达到了 benchmark 分割精度(mIOU)的0.8365和0.9153。
    Abstract In city, information about urban objects such as water supply, railway lines, power lines, buildings, roads, etc., is necessary for city planning. In particular, information about the spread of these objects, locations and capacity is needed for the policymakers to make impactful decisions. This thesis aims to segment the building and roads from the aerial image captured by the satellites and UAVs. Many different architectures have been proposed for the semantic segmentation task and UNet being one of them. In this thesis, we propose a novel architecture based on Google's newly proposed EfficientNetV2 as an encoder for feature extraction with UNet decoder for constructing the segmentation map. Using this approach we achieved a benchmark score for the Massachusetts Building and Road dataset with an mIOU of 0.8365 and 0.9153 respectively.
    摘要 在城市中,供水、铁路线、电力线、建筑物、道路等城市对象的信息是城市规划的必要基础。特别是,决策者需要了解这些对象的分布、位置与容量,才能做出有效决策。本论文旨在从卫星与无人机拍摄的航空影像中分割出建筑物与道路。针对语义分割任务已有多种架构被提出,UNet 是其中之一。本文提出一种新架构,以 Google 新近提出的 EfficientNetV2 作为特征提取编码器,配合 UNet 解码器构建分割图。借助该方法,我们在 Massachusetts 建筑物与道路数据集上分别取得 0.8365 与 0.9153 的 mIOU 基准分数。

End-to-End Supervised Multilabel Contrastive Learning

  • paper_url: http://arxiv.org/abs/2307.03967
  • repo_url: https://github.com/mahdihosseini/kmcl
  • paper_authors: Ahmad Sajedi, Samir Khaki, Konstantinos N. Plataniotis, Mahdi S. Hosseini
  • for: 本研究旨在提出一种新的综合训练框架,以解决多标签表示学习中的挑战,包括标签相关性和数据相关性。
  • methods: 本研究使用了kernel-based多标签对比学习(KMCL)方法,该方法首先将嵌入特征转化为一种混合 exponential kernel 的 Gaussian RKHS 中,然后使用一个包含 reconstruction loss、非对称分类损失和对比损失的目标函数进行编码。
  • results: 大量实验表明,KMCL 可在图像分类任务中实现稳健的性能提升,并保持较低的计算复杂度。
    Abstract Multilabel representation learning is recognized as a challenging problem that can be associated with either label dependencies between object categories or data-related issues such as the inherent imbalance of positive/negative samples. Recent advances address these challenges from model- and data-centric viewpoints. In model-centric, the label correlation is obtained by an external model designs (e.g., graph CNN) to incorporate an inductive bias for training. However, they fail to design an end-to-end training framework, leading to high computational complexity. On the contrary, in data-centric, the realistic nature of the dataset is considered for improving the classification while ignoring the label dependencies. In this paper, we propose a new end-to-end training framework -- dubbed KMCL (Kernel-based Mutlilabel Contrastive Learning) -- to address the shortcomings of both model- and data-centric designs. The KMCL first transforms the embedded features into a mixture of exponential kernels in Gaussian RKHS. It is then followed by encoding an objective loss that is comprised of (a) reconstruction loss to reconstruct kernel representation, (b) asymmetric classification loss to address the inherent imbalance problem, and (c) contrastive loss to capture label correlation. The KMCL models the uncertainty of the feature encoder while maintaining a low computational footprint. Extensive experiments are conducted on image classification tasks to showcase the consistent improvements of KMCL over the SOTA methods. PyTorch implementation is provided in \url{https://github.com/mahdihosseini/KMCL}.
    摘要 多标签表示学习被认为是一个具有挑战性的问题,它既与对象类别之间的标签相关性有关,也与正负样本固有不平衡等数据层面的问题有关。现有进展分别从模型中心和数据中心两个视角应对这些挑战。模型中心的方法通过外部模型设计(例如图 CNN)获得标签相关性,为训练引入归纳偏置,但它们无法构建端到端训练框架,导致计算复杂度偏高;数据中心的方法则利用数据集的真实特性来改进分类,却忽略了标签相关性。在这篇论文中,我们提出一种新的端到端训练框架,即基于核的多标签对比学习(KMCL),以克服模型中心与数据中心设计的不足。KMCL 首先将嵌入特征变换为高斯 RKHS 中的指数核混合表示,随后编码一个目标损失,由 (a) 重建核表示的重建损失、(b) 处理固有不平衡问题的非对称分类损失,以及 (c) 捕捉标签相关性的对比损失组成。KMCL 在建模特征编码器不确定性的同时保持较低的计算开销。我们在图像分类任务上进行了大量实验,证明 KMCL 相比最先进方法取得一致的提升。PyTorch 实现见 \url{https://github.com/mahdihosseini/KMCL}。
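
"把嵌入特征变换为高斯 RKHS 中的指数核混合"可以这样最小化地理解:为每个类别学习一个原型向量,用高斯核衡量图像嵌入与各原型的相似度作为类别得分。以下 PyTorch 草图即此思路;原型的参数化方式与带宽处理均为示意假设。

```python
import torch

def gaussian_kernel_scores(features, prototypes, log_sigma):
    """Score each class with an exponential (Gaussian RKHS) kernel between
    the image embedding and a learnable class prototype."""
    d2 = torch.cdist(features, prototypes) ** 2        # (batch, n_classes)
    return torch.exp(-d2 / (2 * torch.exp(log_sigma) ** 2))

feats = torch.randn(4, 128)                 # image embeddings
protos = torch.randn(10, 128)               # one learnable prototype per label
scores = gaussian_kernel_scores(feats, protos, torch.tensor(0.0))  # in (0, 1]
```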

Reading Between the Lanes: Text VideoQA on the Road

  • paper_url: http://arxiv.org/abs/2307.03948
  • repo_url: None
  • paper_authors: George Tom, Minesh Mathew, Sergi Garcia, Dimosthenis Karatzas, C. V. Jawahar
  • for: 这个研究是为了提高车辆驾驶员的安全驾驶和情感知识,通过捕捉车辆前方环境中的文字信息,并将其转换为驾驶员可以理解的形式。
  • methods: 这个研究使用了视觉数据流进行文字识别,并将文字识别结果与时间进行推理,以提高驾驶员的情感知识和驾驶安全性。
  • results: 这个研究发现,现有的VideoQA模型在这个领域中仍有很大的提升空间,并且显示了这个 dataset 的可用性和重要性在进一步推进驾驶员支持系统和文字敏感多modal问答的研究中。
    Abstract Text and signs around roads provide crucial information for drivers, vital for safe navigation and situational awareness. Scene text recognition in motion is a challenging problem, while textual cues typically appear for a short time span, and early detection at a distance is necessary. Systems that exploit such information to assist the driver should not only extract and incorporate visual and textual cues from the video stream but also reason over time. To address this issue, we introduce RoadTextVQA, a new dataset for the task of video question answering (VideoQA) in the context of driver assistance. RoadTextVQA consists of $3,222$ driving videos collected from multiple countries, annotated with $10,500$ questions, all based on text or road signs present in the driving videos. We assess the performance of state-of-the-art video question answering models on our RoadTextVQA dataset, highlighting the significant potential for improvement in this domain and the usefulness of the dataset in advancing research on in-vehicle support systems and text-aware multimodal question answering. The dataset is available at http://cvit.iiit.ac.in/research/projects/cvit-projects/roadtextvqa
    摘要 道路周边的文字和路标为驾驶员提供了安全导航与情境感知所需的关键信息。运动场景中的文字识别是一个具有挑战性的问题:文字线索通常只出现很短的时间,需要在较远距离及早检测。利用这类信息辅助驾驶员的系统,不仅要从视频流中提取并整合视觉与文字线索,还需进行时间维度上的推理。为此,我们提出 RoadTextVQA,一个面向驾驶辅助场景的视频问答(VideoQA)新数据集。RoadTextVQA 包含采集自多个国家的 3,222 段驾驶视频,标注了 10,500 个问题,全部基于驾驶视频中出现的文字或路标。我们在 RoadTextVQA 上评估了最先进的视频问答模型的表现,结果凸显了该领域仍有巨大的改进空间,以及该数据集对推进车载辅助系统与文本感知多模态问答研究的价值。数据集见 http://cvit.iiit.ac.in/research/projects/cvit-projects/roadtextvqa 。

Camouflaged Object Detection with Feature Grafting and Distractor Aware

  • paper_url: http://arxiv.org/abs/2307.03943
  • repo_url: https://github.com/syxvision/fdnet
  • paper_authors: Yuxuan Song, Xinyue Li, Lin Qi
  • for: 本研究旨在提高掩蔽物检测的精度,使得能够准确地检测掩蔽在环境中的目标。
  • methods: 本文提出了一种新的Feature Grafting and Distractor Aware网络(FDNet)来解决掩蔽物检测问题。特别是,我们使用了CNN和Transformer来并行地编码多尺度图像。为了更好地利用两个编码器的优势,我们设计了一个cross-attention-based Feature GraftingModule,将Transformer分支中提取的特征融合到CNN分支中,然后在Feature Fusion Module中进行粘合。此外,我们还设计了一个Explicitly Modeling Distractors模块,以便更加精确地模拟掩蔽物检测中的两种可能的干扰因素。
  • results: 我们的方法在四个常用的 benchmark datasets 以及ACOD2K dataset上进行了广泛的实验,结果显示,我们的方法与其他状态之前的方法相比,有显著的提高。代码和ACOD2K dataset将在https://github.com/syxvision/FDNet上公开。
    Abstract The task of Camouflaged Object Detection (COD) aims to accurately segment camouflaged objects that integrated into the environment, which is more challenging than ordinary detection as the texture between the target and background is visually indistinguishable. In this paper, we proposed a novel Feature Grafting and Distractor Aware network (FDNet) to handle the COD task. Specifically, we use CNN and Transformer to encode multi-scale images in parallel. In order to better explore the advantages of the two encoders, we design a cross-attention-based Feature Grafting Module to graft features extracted from Transformer branch into CNN branch, after which the features are aggregated in the Feature Fusion Module. A Distractor Aware Module is designed to explicitly model the two possible distractors in the COD task to refine the coarse camouflage map. We also proposed the largest artificial camouflaged object dataset which contains 2000 images with annotations, named ACOD2K. We conducted extensive experiments on four widely used benchmark datasets and the ACOD2K dataset. The results show that our method significantly outperforms other state-of-the-art methods. The code and the ACOD2K will be available at https://github.com/syxvision/FDNet.
    摘要 “Camouflaged Object Detection(COD)任务的目标是精准地找到融入环境中的掩蔽物,这比普通的检测更加具有挑战性,因为标的和背景的文字特征无法辨识。在这篇论文中,我们提出了一个新的特色插入和干扰识别网络(FDNet)来处理COD任务。具体来说,我们使用CNN和Transformer并行地实现多个标本的像素网络。为了更好地利用两个网络的优点,我们设计了一个交互式特色插入模组,将Transformer分支中提取的特色插入到CNN分支中,然后将特色在特色聚合模组中聚合。此外,我们设计了一个干扰识别模组,以Explicitly模型COD任务中的两种干扰因素,以改善粗糙的掩蔽地图。我们还提出了2000幅掩蔽物标注图像集合,名为ACOD2K。我们实现了广泛的实验,包括四个通用的 benchmark 测试集和 ACOD2K 标注图像集合。结果显示,我们的方法在其他状态顶专门方法之上得到了很好的表现。我们将代码和ACOD2K数据集存储在 GitHub 上,请遵循 https://github.com/syxvision/FDNet 来访问。”

Ariadne’s Thread:Using Text Prompts to Improve Segmentation of Infected Areas from Chest X-ray images

  • paper_url: http://arxiv.org/abs/2307.03942
  • repo_url: https://github.com/junelin2333/languidemedseg-miccai2023
  • paper_authors: Yi Zhong, Mengqiu Xu, Kongming Liang, Kaixin Chen, Ming Wu
  • for: 该研究旨在提高肺病评估的精度,提供更准确的肺病诊断和治疗方案。
  • methods: 该研究使用语言驱动的图像分割方法,通过文本提示来改进图像分割结果。
  • results: 实验结果表明,该方法与单模方法相比,提高了QaTa-COV19数据集的 dice分数6.09%以上。此外,研究还表明,多模式方法在文本粒度和训练数据大小方面具有显著的优势。
    Abstract Segmentation of the infected areas of the lung is essential for quantifying the severity of lung disease like pulmonary infections. Existing medical image segmentation methods are almost uni-modal methods based on image. However, these image-only methods tend to produce inaccurate results unless trained with large amounts of annotated data. To overcome this challenge, we propose a language-driven segmentation method that uses text prompt to improve to the segmentation result. Experiments on the QaTa-COV19 dataset indicate that our method improves the Dice score by 6.09% at least compared to the uni-modal methods. Besides, our extended study reveals the flexibility of multi-modal methods in terms of the information granularity of text and demonstrates that multi-modal methods have a significant advantage over image-only methods in terms of the size of training data required.
    摘要 对肺部感染区域进行分割对于评估肺炎等肺部疾病的严重程度至关重要。现有医学图像分割方法大多是仅基于图像的单模态方法,若缺乏大量标注数据,往往产生不准确的结果。为应对这一挑战,我们提出一种语言驱动的分割方法,利用文本提示提升分割精度。在 QaTa-COV19 数据集上的实验表明,与单模态方法相比,我们的方法将 Dice 分数至少提升 6.09%。此外,扩展研究展示了多模态方法在文本信息粒度方面的灵活性,并表明多模态方法所需的训练数据量显著少于仅图像方法。
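
文本提示引导分割的一种最简融合方式是通道门控:把句子嵌入映射为各通道的 sigmoid 门,用以调制解码器特征。下面的 PyTorch 草图即为此;论文实际的融合机制可能不同,各维度均为假设。

```python
import torch
import torch.nn as nn

class TextGate(nn.Module):
    """Modulate a decoder feature map channel-wise with a gate computed
    from a sentence embedding of the text prompt."""
    def __init__(self, d_text, ch):
        super().__init__()
        self.to_gate = nn.Sequential(nn.Linear(d_text, ch), nn.Sigmoid())

    def forward(self, feat, text_emb):              # feat: (B, C, H, W)
        gate = self.to_gate(text_emb)[:, :, None, None]
        return feat * gate

out = TextGate(512, 64)(torch.randn(2, 64, 32, 32), torch.randn(2, 512))
```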

Face Image Quality Enhancement Study for Face Recognition

  • paper_url: http://arxiv.org/abs/2307.05534
  • repo_url: https://github.com/djdprogramming/adfa2
  • paper_authors: Iqbal Nouyed, Na Zhang
  • for: 本研究探讨低质量face图像 face recognition的问题,尝试提高低质量face图像的识别精度。
  • methods: 使用当前最佳的人脸图像增强方法进行图像增强,并开发了一种新的识别协议,以避免实验偏差。
  • results: 实验结果表明,使用优化后的face图像可以提高识别精度,但也存在一些挑战性的问题。
    Abstract Unconstrained face recognition is an active research area among computer vision and biometric researchers for many years now. Still the problem of face recognition in low quality photos has not been well-studied so far. In this paper, we explore the face recognition performance on low quality photos, and we try to improve the accuracy in dealing with low quality face images. We assemble a large database with low quality photos, and examine the performance of face recognition algorithms for three different quality sets. Using state-of-the-art facial image enhancement approaches, we explore the face recognition performance for the enhanced face images. To perform this without experimental bias, we have developed a new protocol for recognition with low quality face photos and validate the performance experimentally. Our designed protocol for face recognition with low quality face images can be useful to other researchers. Moreover, experiment results show some of the challenging aspects of this problem.
    摘要 多年来,无约束人脸识别一直是计算机视觉与生物特征识别领域的活跃研究方向,但低质量照片中的人脸识别问题至今仍未得到充分研究。本文探究低质量照片上的人脸识别性能,并尝试提高对低质量人脸图像的识别准确率。我们构建了一个包含低质量照片的大型数据库,并在三个不同质量子集上考察人脸识别算法的表现。借助最先进的人脸图像增强方法,我们进一步考察了增强后人脸图像的识别性能。为避免实验偏差,我们设计了一个针对低质量人脸照片识别的新协议,并通过实验验证其性能。该协议可为其他研究者提供参考。此外,实验结果也揭示了该问题的一些挑战性方面。

Edge-Aware Mirror Network for Camouflaged Object Detection

  • paper_url: http://arxiv.org/abs/2307.03932
  • repo_url: https://github.com/sdy1999/eamnet
  • paper_authors: Dongyue Sun, Shiyao Jiang, Lin Qi
  • for: 提高伪装目标检测(COD)的精度
  • methods: 提出了一种新的边缘感知镜像网络(EAMNet),将边缘检测与伪装目标分割建模为交叉精炼过程,包括分割引导的边缘聚合模块、边缘引导的完整性聚合模块,以及引导残差通道注意力模块
  • results: 在三个常用的伪装目标检测数据集上进行定量与定性实验,结果表明 EAMNet 优于现有最先进基线方法
    Abstract Existing edge-aware camouflaged object detection (COD) methods normally output the edge prediction in the early stage. However, edges are important and fundamental factors in the following segmentation task. Due to the high visual similarity between camouflaged targets and the surroundings, edge prior predicted in early stage usually introduces erroneous foreground-background and contaminates features for segmentation. To tackle this problem, we propose a novel Edge-aware Mirror Network (EAMNet), which models edge detection and camouflaged object segmentation as a cross refinement process. More specifically, EAMNet has a two-branch architecture, where a segmentation-induced edge aggregation module and an edge-induced integrity aggregation module are designed to cross-guide the segmentation branch and edge detection branch. A guided-residual channel attention module which leverages the residual connection and gated convolution finally better extracts structural details from low-level features. Quantitative and qualitative experiment results show that EAMNet outperforms existing cutting-edge baselines on three widely used COD datasets. Codes are available at https://github.com/sdy1999/EAMNet.
    摘要 现有的边缘感知伪装目标检测(COD)方法通常在早期阶段输出边缘预测。然而,边缘是后续分割任务中重要且基础的因素。由于伪装目标与周围环境的视觉相似度很高,早期预测的边缘先验往往引入错误的前景-背景划分,污染用于分割的特征。为解决这一问题,我们提出了一种新的边缘感知镜像网络(EAMNet),将边缘检测与伪装目标分割建模为交叉精炼过程。具体而言,EAMNet 采用双分支架构,其中分割引导的边缘聚合模块与边缘引导的完整性聚合模块用于交叉引导分割分支和边缘检测分支;最后,一个引导残差通道注意力模块利用残差连接和门控卷积,更好地从低层特征中提取结构细节。定量与定性实验结果表明,EAMNet 在三个广泛使用的 COD 数据集上优于现有最先进基线。代码可在 https://github.com/sdy1999/EAMNet 获取。
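To make the cross-refinement idea concrete, below is a minimal, hedged PyTorch sketch of a two-branch block in which edge attention gates the segmentation features and segmentation context refines the edge features in turn. The module names, channel sizes, and gating scheme are illustrative assumptions, not the authors' EAMNet implementation (see the linked repo for that).

```python
import torch
import torch.nn as nn

class CrossRefine(nn.Module):
    """Minimal two-branch cross-refinement block (illustrative only)."""
    def __init__(self, ch=64):
        super().__init__()
        self.seg_conv = nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU())
        self.edge_conv = nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU())
        self.seg_head = nn.Conv2d(ch, 1, 1)   # camouflaged-object mask logits
        self.edge_head = nn.Conv2d(ch, 1, 1)  # edge map logits

    def forward(self, feat):
        seg = self.seg_conv(feat)
        edge = self.edge_conv(feat)
        # edge-induced guidance: edge attention sharpens segmentation features
        seg = seg * torch.sigmoid(self.edge_head(edge)) + seg
        # segmentation-induced guidance: object context refines edge features
        edge = edge * torch.sigmoid(self.seg_head(seg)) + edge
        return self.seg_head(seg), self.edge_head(edge)

x = torch.randn(2, 64, 88, 88)                   # dummy backbone features
mask_logits, edge_logits = CrossRefine()(x)
print(mask_logits.shape, edge_logits.shape)      # both torch.Size([2, 1, 88, 88])
```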

VS-TransGRU: A Novel Transformer-GRU-based Framework Enhanced by Visual-Semantic Fusion for Egocentric Action Anticipation

  • paper_url: http://arxiv.org/abs/2307.03918
  • repo_url: None
  • paper_authors: Congqi Cao, Ze Sun, Qinyi Lv, Lingtong Min, Yanning Zhang
  • for: This paper aims to improve the performance of egocentric action anticipation by proposing a novel visual-semantic fusion enhanced and Transformer GRU-based action anticipation framework.
  • methods: The proposed method introduces high-level semantic information to improve action anticipation performance, uses an effective visual-semantic fusion module, and employs a Transformer based encoder and GRU-based decoder to model long-term sequential and flexible iteration decoding.
  • results: The proposed method achieves new state-of-the-art performance on two large-scale first-person view datasets, outperforming previous approaches by a large margin.
    Abstract Egocentric action anticipation is a challenging task that aims to make advanced predictions of future actions from current and historical observations in the first-person view. Most existing methods focus on improving the model architecture and loss function based on the visual input and recurrent neural network to boost the anticipation performance. However, these methods, which merely consider visual information and rely on a single network architecture, gradually reach a performance plateau. In order to fully understand what has been observed and capture the dependencies between current observations and future actions well enough, we propose a novel visual-semantic fusion enhanced and Transformer GRU-based action anticipation framework in this paper. Firstly, high-level semantic information is introduced to improve the performance of action anticipation for the first time. We propose to use the semantic features generated based on the class labels or directly from the visual observations to augment the original visual features. Secondly, an effective visual-semantic fusion module is proposed to make up for the semantic gap and fully utilize the complementarity of different modalities. Thirdly, to take advantage of both the parallel and autoregressive models, we design a Transformer based encoder for long-term sequential modeling and a GRU-based decoder for flexible iteration decoding. Extensive experiments on two large-scale first-person view datasets, i.e., EPIC-Kitchens and EGTEA Gaze+, validate the effectiveness of our proposed method, which achieves new state-of-the-art performance, outperforming previous approaches by a large margin.
    摘要 自我中心(第一人称)动作预测是一项具有挑战性的任务,旨在基于当前与历史观测对未来动作做出预测。现有方法大多基于视觉输入和循环神经网络,通过改进模型架构和损失函数来提升预测性能。然而,这些仅考虑视觉信息、依赖单一网络架构的方法很快便会到达性能瓶颈。为了充分理解已观测到的内容,并足够好地捕捉当前观测与未来动作之间的依赖关系,我们在本文中提出了一种新的视觉-语义融合增强、基于 Transformer 与 GRU 的动作预测框架。首先,我们首次将高层语义信息引入动作预测以提升性能:利用基于类别标签或直接从视觉观测生成的语义特征来增强原始视觉特征。其次,我们提出了一种有效的视觉-语义融合模块,以弥合语义鸿沟并充分利用不同模态的互补性。最后,为了兼顾并行与自回归模型的优势,我们设计了基于 Transformer 的编码器进行长时序建模,以及基于 GRU 的解码器进行灵活的迭代解码。在 EPIC-Kitchens 和 EGTEA Gaze+ 两个大规模第一人称视角数据集上的大量实验验证了所提方法的有效性:该方法取得了新的最先进性能,并大幅超越了之前的方法。
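The encoder-decoder split described above can be illustrated with a toy PyTorch sketch: a Transformer encoder consumes fused visual-semantic features, and a GRU decoder iteratively rolls out future action logits. The naive concat-and-project fusion, the dimensions, and the rollout scheme are assumptions for illustration, not the paper's actual modules.

```python
import torch
import torch.nn as nn

class VisualSemanticAnticipator(nn.Module):
    """Toy Transformer-encoder / GRU-decoder anticipation model (illustrative)."""
    def __init__(self, vis_dim=512, sem_dim=300, d_model=256, n_classes=100):
        super().__init__()
        self.fuse = nn.Linear(vis_dim + sem_dim, d_model)  # naive fusion: concat+proj
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.decoder = nn.GRU(d_model, d_model, batch_first=True)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, vis, sem, steps=4):
        h = self.encoder(self.fuse(torch.cat([vis, sem], dim=-1)))  # (B, T, D)
        state = h[:, -1:].transpose(0, 1).contiguous()              # (1, B, D)
        inp, outs = h[:, -1:], []
        for _ in range(steps):                   # flexible iterative decoding
            inp, state = self.decoder(inp, state)
            outs.append(self.head(inp[:, -1]))
        return torch.stack(outs, dim=1)          # (B, steps, n_classes)

vis = torch.randn(2, 8, 512)   # 8 observed visual features
sem = torch.randn(2, 8, 300)   # matching semantic embeddings (e.g., from labels)
print(VisualSemanticAnticipator()(vis, sem).shape)  # torch.Size([2, 4, 100])
```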

Adversarial Self-Attack Defense and Spatial-Temporal Relation Mining for Visible-Infrared Video Person Re-Identification

  • paper_url: http://arxiv.org/abs/2307.03903
  • repo_url: None
  • paper_authors: Huafeng Li, Le Xu, Yafei Zhang, Dapeng Tao, Zhengtao Yu
  • for: 这篇论文旨在解决可见光-红外视频行人重识别中的跨模态行人身份匹配问题,提出了一种融合对抗自攻击防御与空间-时间关系挖掘的新方法。
  • methods: 该方法利用对抗自攻击来防御由视角、姿态、背景和模态差异变化引起的扰动,并使用空间-时间信息引导的特征表示网络从视频序列中提取鲁棒特征。
  • results: 该方法在大规模跨模态视频数据集上表现出色。
    Abstract In visible-infrared video person re-identification (re-ID), extracting features not affected by complex scenes (such as modality, camera views, pedestrian pose, background, etc.) changes, and mining and utilizing motion information are the keys to solving cross-modal pedestrian identity matching. To this end, the paper proposes a new visible-infrared video person re-ID method from a novel perspective, i.e., adversarial self-attack defense and spatial-temporal relation mining. In this work, the changes of views, posture, background and modal discrepancy are considered as the main factors that cause the perturbations of person identity features. Such interference information contained in the training samples is used as an adversarial perturbation. It performs adversarial attacks on the re-ID model during the training to make the model more robust to these unfavorable factors. The attack from the adversarial perturbation is introduced by activating the interference information contained in the input samples without generating adversarial samples, and it can be thus called adversarial self-attack. This design allows adversarial attack and defense to be integrated into one framework. This paper further proposes a spatial-temporal information-guided feature representation network to use the information in video sequences. The network cannot only extract the information contained in the video-frame sequences but also use the relation of the local information in space to guide the network to extract more robust features. The proposed method exhibits compelling performance on large-scale cross-modality video datasets. The source code of the proposed method will be released at https://github.com/lhf12278/xxx.
    摘要 在可见光-红外视频行人重识别(re-ID)中,提取不受复杂场景变化(如模态、摄像头视角、行人姿态、背景等)影响的特征,并挖掘和利用运动信息,是解决跨模态行人身份匹配的关键。为此,本文从一个新的视角提出了一种可见光-红外视频行人重识别方法,即对抗自攻击防御与空间-时间关系挖掘。在该方法中,视角、姿态、背景和模态差异被视为导致身份特征扰动的主要因素;训练样本中包含的这类干扰信息被用作对抗扰动,在训练过程中对 re-ID 模型实施对抗攻击,使模型对这些不利因素更加鲁棒。该对抗攻击通过激活输入样本中已有的干扰信息来引入,而无需生成对抗样本,因此可称为对抗自攻击。这一设计使对抗攻击与防御得以集成在同一框架中。本文进一步提出了一种空间-时间信息引导的特征表示网络,以利用视频序列中的信息:该网络不仅能提取视频帧序列中的信息,还能利用空间局部信息之间的关系引导网络提取更鲁棒的特征。所提方法在大规模跨模态视频数据集上表现出色。源代码将发布于 https://github.com/lhf12278/xxx 。

StyleGAN3: Generative Networks for Improving the Equivariance of Translation and Rotation

  • paper_url: http://arxiv.org/abs/2307.03898
  • repo_url: None
  • paper_authors: Tianlei Zhu, Junqi Chen, Renzhe Zhu, Gaurav Gupta
  • for: 本研究的目的是对StyleGAN进行改进,以提高其对等变换的能力。
  • methods: 本研究使用了StyleGAN2和两个修改后的StyleGAN3版本,并使用FFHQ dataset进行评估。
  • results: 研究发现,StyleGAN3 版本是一种更好的生成网络,可以提高等变性。这些发现有助于动画和视频的创作。
    Abstract StyleGAN can use style to affect facial posture and identity features, and noise to affect hair, wrinkles, skin color and other details. Among these, the outcomes of the picture processing will vary slightly between different versions of styleGAN. As a result, the comparison of performance differences between styleGAN2 and the two modified versions of styleGAN3 will be the main focus of this study. We used the FFHQ dataset as the dataset and FID, EQ-T, and EQ-R were used to be the assessment of the model. In the end, we discovered that Stylegan3 version is a better generative network to improve the equivariance. Our findings have a positive impact on the creation of animation and videos.
    摘要 StyleGAN 可以通过风格(style)来影响面部姿态和身份特征,并通过噪声(noise)来影响头发、皱纹、肤色等细节。其中,不同版本的 StyleGAN 在图像处理结果上会略有差异。因此,本研究的重点是比较 StyleGAN2 与两个改进版 StyleGAN3 之间的性能差异。我们使用 FFHQ 作为数据集,并采用 FID、EQ-T 和 EQ-R 作为模型的评估指标。最终我们发现,StyleGAN3 版本是一种在等变性方面更优的生成网络。我们的发现对动画和视频的制作具有积极意义。

HUMS2023 Data Challenge Result Submission

  • paper_url: http://arxiv.org/abs/2307.03871
  • repo_url: None
  • paper_authors: Dhiraj Neupane, Lakpa Dorje Tamang, Ngoc Dung Huynh, Mohamed Reda Bouadjenek, Sunil Aryal
  • for: 这项研究的目的是提出一种早期检测方法。
  • methods: 研究通过绘制信号并分析由连续小波变换(CWT)生成的尺度谱(scalogram)图像,并计算每个信号的平均值、标准差(STD)和峰峰值(P2P)来辅助检测;此外还使用自回归积分滑动平均(ARIMA)方法跟踪故障进程。
  • results: 研究获得了一些有用的结果,包括检测到的故障迹象以及 ARIMA 方法的预测结果。
    Abstract We implemented a simple method for early detection in this research. The implemented methods are plotting the given mat files and analyzing scalogram images generated by performing Continuous Wavelet Transform (CWT) on the samples. Also, finding the mean, standard deviation (STD), and peak-to-peak (P2P) values from each signal also helped detect faulty signs. We have implemented the autoregressive integrated moving average (ARIMA) method to track the progression.
    摘要 在本研究中,我们实现了一种简单的早期故障检测方法。具体做法是绘制给定的 mat 文件,并分析对样本执行连续小波变换(CWT)所生成的尺度谱(scalogram)图像;同时计算每个信号的平均值、标准差(STD)和峰峰值(P2P),以辅助发现故障迹象。此外,我们还使用自回归积分滑动平均(ARIMA)方法来跟踪故障的发展进程。
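A minimal sketch of the described pipeline, assuming the pywt and statsmodels libraries and a placeholder signal in place of the challenge's mat files: compute the summary statistics, a Morlet-wavelet scalogram, and an ARIMA forecast over a windowed indicator. The window count and ARIMA order are illustrative choices.

```python
import numpy as np
import pywt
from statsmodels.tsa.arima.model import ARIMA

sig = np.random.randn(2048)                      # placeholder vibration signal

# Summary statistics used as simple fault indicators
mean, std, p2p = sig.mean(), sig.std(), sig.max() - sig.min()

# Scalogram: continuous wavelet transform with a Morlet wavelet
scales = np.arange(1, 128)
coeffs, freqs = pywt.cwt(sig, scales, 'morl')
scalogram = np.abs(coeffs) ** 2                  # |CWT|^2, typically plotted as an image

# Track the progression of an indicator (windowed peak-to-peak) with ARIMA
p2p_series = [np.ptp(w) for w in np.array_split(sig, 32)]
model = ARIMA(p2p_series, order=(1, 1, 1)).fit()
print(mean, std, p2p, model.forecast(steps=3))
```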

Sketch-A-Shape: Zero-Shot Sketch-to-3D Shape Generation

  • paper_url: http://arxiv.org/abs/2307.03869
  • repo_url: None
  • paper_authors: Aditya Sanghi, Pradeep Kumar Jayaraman, Arianna Rampini, Joseph Lambourne, Hooman Shayani, Evan Atherton, Saeid Asgari Taghanaki
  • for: 本研究旨在探索如何利用大型预训练模型从草图生成 3D 形状;由于草图-形状配对数据有限且草图抽象程度不一,这一直是一个尚未解决的难题。
  • methods: 我们使用了一种简单的方法,即在训练时利用大型预训练视觉模型对合成渲染图提取的特征来条件化 3D 生成模型,从而在推理时能够从草图生成 3D 形状。
  • results: 我们的实验表明,大型预训练视觉模型的特征携带对域偏移具有鲁棒性的语义信号,使我们能够在推理时从不同抽象程度的草图生成 3D 形状;该方法还支持为每张输入草图生成多个 3D 形状。
    Abstract Significant progress has recently been made in creative applications of large pre-trained models for downstream tasks in 3D vision, such as text-to-shape generation. This motivates our investigation of how these pre-trained models can be used effectively to generate 3D shapes from sketches, which has largely remained an open challenge due to the limited sketch-shape paired datasets and the varying level of abstraction in the sketches. We discover that conditioning a 3D generative model on the features (obtained from a frozen large pre-trained vision model) of synthetic renderings during training enables us to effectively generate 3D shapes from sketches at inference time. This suggests that the large pre-trained vision model features carry semantic signals that are resilient to domain shifts, i.e., allowing us to use only RGB renderings, but generalizing to sketches at inference time. We conduct a comprehensive set of experiments investigating different design factors and demonstrate the effectiveness of our straightforward approach for generation of multiple 3D shapes per each input sketch regardless of their level of abstraction without requiring any paired datasets during training.
    摘要 近年来,大型预训练模型在 3D 视觉下游任务(如文本生成形状)中的创造性应用取得了显著进展。这促使我们研究如何有效利用这些预训练模型从草图生成 3D 形状——由于草图-形状配对数据集有限、且草图的抽象程度各不相同,这一问题长期悬而未决。我们发现,在训练时以冻结的大型预训练视觉模型对合成渲染图提取的特征来条件化 3D 生成模型,便能在推理时有效地从草图生成 3D 形状。这表明大型预训练视觉模型的特征携带着对域偏移具有鲁棒性的语义信号:训练时仅使用 RGB 渲染图,推理时即可泛化到草图。我们开展了一系列全面的实验来考察不同的设计因素,并证明了这种简洁方法的有效性——无需任何配对数据集,即可针对每张输入草图(无论其抽象程度如何)生成多个 3D 形状。

Novel Categories Discovery from probability matrix perspective

  • paper_url: http://arxiv.org/abs/2307.03856
  • repo_url: https://github.com/mxahan/nev-ncd
  • paper_authors: Zahid Hasan, Abu Zaher Md Faridee, Masud Ahmed, Sanjay Purushotham, Heesung Kwon, Hyungtae Lee, Nirmalya Roy
  • for: 本研究旨在解决开放世界问题:利用部分类别空间的标注数据,对已知类别进行分类,并基于类别语义对新类别进行聚类。
  • methods: 我们从新类数据概率矩阵的角度研究 NCD,利用新类数据采样与给定的新类 Multinoulli(类别)分布之间的联系,假设通过学习其类别分布即可隐式实现基于语义的新类数据聚类。我们对概率矩阵特征的一阶(均值)和二阶(协方差)统计量提出了新的约束,并施加实例级信息约束。
  • results: 我们的简单方法在给定已标注与未标注类别之间语义相似性的前提下,成功实现了基于语义的新类数据聚类。我们在图像和视频模态上展示了方法的判别能力,并开展了关于数据、网络和框架组件的大量消融研究以提供更深入的理解。我们的方法在 Cifar10、UCF101 和 MPSC-ARL 数据集上保持约 94%、93% 和 85% 的已标注类分类精度,同时对新类别分别取得约 90%、84% 和 72% 的聚类精度,在不依赖任何外部聚类的情况下与最先进方法相当。
    Abstract Novel Categories Discovery (NCD) tackles the open-world problem of classifying known and clustering novel categories based on the class semantics using partial class space annotated data. Unlike traditional pseudo-label and retraining, we investigate NCD from the novel data probability matrix perspective. We leverage the connection between NCD novel data sampling with provided novel class Multinoulli (categorical) distribution and hypothesize to implicitly achieve semantic-based novel data clustering by learning their class distribution. We propose novel constraints on first-order (mean) and second-order (covariance) statistics of probability matrix features while applying instance-wise information constraints. In particular, we align the neuron distribution (activation patterns) under a large batch of Monte-Carlo novel data sampling by matching their empirical features mean and covariance with the provided Multinoulli-distribution. Simultaneously, we minimize entropy and enforce prediction consistency for each instance. Our simple approach successfully realizes semantic-based novel data clustering provided the semantic similarity between label-unlabeled classes. We demonstrate the discriminative capacity of our approaches in image and video modalities. Moreover, we perform extensive ablation studies regarding data, networks, and our framework components to provide better insights. Our approach maintains ~94%, ~93%, and ~85%, classification accuracy in labeled data while achieving ~90%, ~84%, and ~72% clustering accuracy for novel categories for Cifar10, UCF101, and MPSC-ARL datasets that matches state-of-the-art approaches without any external clustering.
    摘要 新类别发现(NCD)处理开放世界问题:利用部分类别空间的标注数据,基于类别语义对已知类别进行分类、对新类别进行聚类。与传统的伪标签和再训练不同,我们从新类数据概率矩阵的角度研究 NCD。我们利用新类数据采样与给定的新类 Multinoulli(类别)分布之间的联系,假设通过学习其类别分布即可隐式实现基于语义的新类数据聚类。我们对概率矩阵特征的一阶(均值)和二阶(协方差)统计量提出了新的约束,并施加实例级信息约束。具体而言,我们在大批量蒙特卡洛新类数据采样下对齐神经元分布(激活模式),使其经验特征的均值和协方差与给定的 Multinoulli 分布相匹配;同时,我们最小化熵并对每个实例强制预测一致性。在给定已标注与未标注类别之间语义相似性的前提下,这一简单方法成功实现了基于语义的新类数据聚类。我们在图像和视频模态上展示了方法的判别能力,并开展了关于数据、网络和框架组件的大量消融研究。我们的方法在 Cifar10、UCF101 和 MPSC-ARL 数据集上保持约 94%、93% 和 85% 的已标注类分类精度,同时对新类别分别取得约 90%、84% 和 72% 的聚类精度,在不依赖任何外部聚类的情况下与最先进方法相当。
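The statistic-matching constraints described above can be sketched directly: the empirical mean of the softmax probabilities over a large batch is matched to the provided categorical distribution, the empirical covariance to the Multinoulli covariance diag(p) − ppᵀ, and per-instance entropy is minimized. The equal loss weighting and function names are assumptions; the paper's exact formulation may differ.

```python
import torch
import torch.nn.functional as F

def ncd_constraints(logits, target_p, eps=1e-8):
    """Illustrative first/second-order statistic matching for novel-class logits.

    logits:   (B, K) novel-head outputs for a large Monte-Carlo batch
    target_p: (K,) provided Multinoulli (categorical) class distribution
    """
    probs = logits.softmax(dim=1)
    # First-order: empirical class mean should match the target distribution
    mean_loss = F.mse_loss(probs.mean(dim=0), target_p)
    # Second-order: match the Multinoulli covariance diag(p) - p p^T
    centered = probs - probs.mean(dim=0, keepdim=True)
    emp_cov = centered.T @ centered / (probs.shape[0] - 1)
    tgt_cov = torch.diag(target_p) - torch.outer(target_p, target_p)
    cov_loss = F.mse_loss(emp_cov, tgt_cov)
    # Instance-wise: low entropy encourages confident cluster assignments
    ent_loss = -(probs * (probs + eps).log()).sum(dim=1).mean()
    return mean_loss + cov_loss + ent_loss

loss = ncd_constraints(torch.randn(256, 10), torch.full((10,), 0.1))
print(loss)
```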

TBSS++: A novel computational method for Tract-Based Spatial Statistics

  • paper_url: http://arxiv.org/abs/2307.05387
  • repo_url: None
  • paper_authors: Davood Karimi, Hamza Kebiri, Ali Gholipour
  • for: 这篇论文旨在提高弥散加权磁共振成像(dMRI)中大脑白质评估的精度与可靠性。
  • methods: 论文提出了一种新的计算框架,通过对纤维束的精确分割以及不同被试数据之间的精确配准,克服现有方法的缺陷和局限。
  • results: 与 TBSS 方法相比,论文所提方法表现出更高的可复现性以及对数据扰动的鲁棒性。
    Abstract Diffusion-weighted magnetic resonance imaging (dMRI) is widely used to assess the brain white matter. One of the most common computations in dMRI involves cross-subject tract-specific analysis, whereby dMRI-derived biomarkers are compared between cohorts of subjects. The accuracy and reliability of these studies hinges on the ability to compare precisely the same white matter tracts across subjects. This is an intricate and error-prone computation. Existing computational methods such as Tract-Based Spatial Statistics (TBSS) suffer from a host of shortcomings and limitations that can seriously undermine the validity of the results. We present a new computational framework that overcomes the limitations of existing methods via (i) accurate segmentation of the tracts, and (ii) precise registration of data from different subjects/scans. The registration is based on fiber orientation distributions. To further improve the alignment of cross-subject data, we create detailed atlases of white matter tracts. These atlases serve as an unbiased reference space where the data from all subjects is registered for comparison. Extensive evaluations show that, compared with TBSS, our proposed framework offers significantly higher reproducibility and robustness to data perturbations. Our method promises a drastic improvement in accuracy and reproducibility of cross-subject dMRI studies that are routinely used in neuroscience and medical research.
    摘要 弥散加权磁共振成像(dMRI)被广泛用于评估大脑白质。dMRI 中最常见的计算之一是跨被试的特定纤维束分析,即在不同被试队列之间比较由 dMRI 导出的生物标志物。这类研究的准确性和可靠性取决于能否在被试之间精确地比较相同的白质纤维束,而这是一项复杂且易出错的计算。现有的计算方法(如基于纤维束的空间统计,TBSS)存在诸多缺陷和局限,可能严重损害结果的有效性。我们提出了一种新的计算框架,通过以下两点克服现有方法的局限:(i)对纤维束进行精确分割;(ii)基于纤维方向分布,对来自不同被试/扫描的数据进行精确配准。为进一步改善跨被试数据的对齐,我们构建了详细的白质纤维束图谱,作为无偏参考空间,将所有被试的数据配准到其中进行比较。大量评估表明,与 TBSS 相比,我们提出的框架具有显著更高的可复现性以及对数据扰动的鲁棒性。我们的方法有望大幅提升神经科学和医学研究中常用的跨被试 dMRI 研究的准确性与可复现性。

Blocks2World: Controlling Realistic Scenes with Editable Primitives

  • paper_url: http://arxiv.org/abs/2307.03847
  • repo_url: None
  • paper_authors: Vaibhav Vavilala, Seemandhar Jain, Rahul Vasanth, Anand Bhattad, David Forsyth
  • for: 本论文旨在解决 3D 场景渲染与编辑问题
  • methods: 使用图像的凸分解与条件生成
  • results: 提供一种高度可定制的场景渲染流程,能够对新场景与编辑场景的合成进行出色的控制
    Abstract We present Blocks2World, a novel method for 3D scene rendering and editing that leverages a two-step process: convex decomposition of images and conditioned synthesis. Our technique begins by extracting 3D parallelepipeds from various objects in a given scene using convex decomposition, thus obtaining a primitive representation of the scene. These primitives are then utilized to generate paired data through simple ray-traced depth maps. The next stage involves training a conditioned model that learns to generate images from the 2D-rendered convex primitives. This step establishes a direct mapping between the 3D model and its 2D representation, effectively learning the transition from a 3D model to an image. Once the model is fully trained, it offers remarkable control over the synthesis of novel and edited scenes. This is achieved by manipulating the primitives at test time, including translating or adding them, thereby enabling a highly customizable scene rendering process. Our method provides a fresh perspective on 3D scene rendering and editing, offering control and flexibility. It opens up new avenues for research and applications in the field, including authoring and data augmentation.
    摘要 我们提出了 Blocks2World,一种新颖的 3D 场景渲染与编辑方法,其核心是两步流程:图像的凸分解与条件生成。该技术首先通过凸分解从给定场景中的各类物体提取 3D 平行六面体,从而获得场景的基元表示;随后利用这些基元,通过简单的光线追踪深度图生成配对数据。下一阶段训练一个条件模型,学习从 2D 渲染的凸基元生成图像。这一步在 3D 模型与其 2D 表示之间建立了直接映射,实际上学习了从 3D 模型到图像的转换。模型训练完成后,即可对新场景与编辑场景的合成进行出色的控制:在测试时对基元进行操作(包括平移或添加),从而实现高度可定制的场景渲染流程。我们的方法为 3D 场景渲染与编辑提供了全新的视角,兼具控制力与灵活性,并为该领域的研究与应用(包括创作与数据增强)开辟了新的途径。

Invariant Scattering Transform for Medical Imaging

  • paper_url: http://arxiv.org/abs/2307.04771
  • repo_url: None
  • paper_authors: Nafisa Labiba Ishrat Huda, Angona Biswas, MD Abdullah Al Nasim, Md. Fahim Rahman, Shoaib Ahmed
  • for: 本论文主要研究将散射变换与深度学习相结合用于医学图像分析的方法。
  • methods: 论文使用了散射变换——一种基于小波的信号处理技术,可为图像分类构建有用的信号表示。
  • results: 研究表明,散射变换有助于提高医学图像分类的精度和效率。
    Abstract Invariant scattering transform introduces new area of research that merges the signal processing with deep learning for computer vision. Nowadays, Deep Learning algorithms are able to solve a variety of problems in medical sector. Medical images are used to detect diseases brain cancer or tumor, Alzheimer's disease, breast cancer, Parkinson's disease and many others. During pandemic back in 2020, machine learning and deep learning has played a critical role to detect COVID-19 which included mutation analysis, prediction, diagnosis and decision making. Medical images like X-ray, MRI known as magnetic resonance imaging, CT scans are used for detecting diseases. There is another method in deep learning for medical imaging which is scattering transform. It builds useful signal representation for image classification. It is a wavelet technique; which is impactful for medical image classification problems. This research article discusses scattering transform as the efficient system for medical image analysis where it's figured by scattering the signal information implemented in a deep convolutional network. A step by step case study is manifested at this research work.
    摘要 不变散射变换开辟了一个将信号处理与深度学习相结合用于计算机视觉的新研究领域。如今,深度学习算法已能解决医疗领域的多种问题:医学图像被用于检测脑癌或肿瘤、阿尔茨海默病、乳腺癌、帕金森病等多种疾病。在 2020 年疫情期间,机器学习和深度学习在 COVID-19 检测中发挥了关键作用,包括变异分析、预测、诊断和决策。X 光、MRI(磁共振成像)、CT 扫描等医学图像被用于疾病检测。散射变换是深度学习医学成像中的另一种方法:它是一种小波技术,能为图像分类构建有用的信号表示,对医学图像分类问题颇具影响。本文探讨将散射变换作为医学图像分析的高效系统,其实现方式是在深度卷积网络中对信号信息进行散射。文中还给出了逐步的案例研究。
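As a hedged illustration of using scattering coefficients as features for medical image classification, the sketch below uses the kymatio library's Scattering2D (a common implementation of the scattering transform, not necessarily the one used by the authors) followed by a linear classifier; the patch size, scale parameter J, and class count are placeholders.

```python
import torch
import torch.nn as nn
from kymatio.torch import Scattering2D

# 2-scale scattering on 32x32 grayscale image patches (toy sizes)
scattering = Scattering2D(J=2, shape=(32, 32))
x = torch.randn(8, 1, 32, 32)        # batch of image patches
Sx = scattering(x)                   # (8, 1, C, 8, 8) scattering coefficients
feats = Sx.flatten(start_dim=1)      # translation-stable feature vector

clf = nn.Linear(feats.shape[1], 2)   # e.g., disease vs. healthy
print(clf(feats).shape)              # torch.Size([8, 2])
```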

Thoracic Cartilage Ultrasound-CT Registration using Dense Skeleton Graph

  • paper_url: http://arxiv.org/abs/2307.03800
  • repo_url: None
  • paper_authors: Zhongliang Jiang, Chenyang Li, Xuesong Li, Nassir Navab
  • for: 提高自主超声(US)成像的精度与效率,尤其是皮下存在高声阻抗骨骼结构的胸部应用。
  • methods: 使用基于图的非刚性配准方法,显式考虑皮下骨骼表面特征:利用模板匹配分割胸骨与软骨分支以辅助粗对齐,基于 CT 模板生成有向图,并先后两次执行自组织映射以提取 CT 与 US 点云的最优图表示。
  • results: 在五名不同患者的软骨点云上进行了评估,结果表明所提出的基于图的配准方法能有效地将 CT 中的轨迹映射到当前设置;非刚性配准的豪斯多夫距离(均值±标准差)为 9.48±0.27 mm,路径迁移误差(欧氏距离)为 2.21±1.11 mm。
    Abstract Autonomous ultrasound (US) imaging has gained increased interest recently, and it has been seen as a potential solution to overcome the limitations of free-hand US examinations, such as inter-operator variations. However, it is still challenging to accurately map planned paths from a generic atlas to individual patients, particularly for thoracic applications with high acoustic-impedance bone structures under the skin. To address this challenge, a graph-based non-rigid registration is proposed to enable transferring planned paths from the atlas to the current setup by explicitly considering subcutaneous bone surface features instead of the skin surface. To this end, the sternum and cartilage branches are segmented using a template matching to assist coarse alignment of US and CT point clouds. Afterward, a directed graph is generated based on the CT template. Then, the self-organizing map using geographical distance is successively performed twice to extract the optimal graph representations for CT and US point clouds, individually. To evaluate the proposed approach, five cartilage point clouds from distinct patients are employed. The results demonstrate that the proposed graph-based registration can effectively map trajectories from CT to the current setup for displaying US views through limited intercostal space. The non-rigid registration results in terms of Hausdorff distance (Mean$\pm$SD) is 9.48$\pm$0.27 mm and the path transferring error in terms of Euclidean distance is 2.21$\pm$1.11 mm.
    摘要 自主超声(US)成像近来受到越来越多的关注,被视为克服徒手超声检查局限(如操作者间差异)的一种潜在方案。然而,将规划路径从通用图谱精确映射到个体患者仍然具有挑战性,尤其是在皮下存在高声阻抗骨骼结构的胸部应用中。为解决这一问题,本文提出了一种基于图的非刚性配准方法,通过显式考虑皮下骨骼表面特征(而非皮肤表面),将规划路径从图谱迁移到当前设置。为此,首先利用模板匹配分割出胸骨和软骨分支,以辅助 US 与 CT 点云的粗对齐;随后基于 CT 模板生成有向图;接着分别对 CT 和 US 点云先后两次执行基于地理距离的自组织映射,以提取最优的图表示。为验证所提方法,我们采用了五名不同患者的软骨点云。结果表明,该基于图的配准能够有效地将 CT 中的轨迹映射到当前设置,以便在有限的肋间隙中显示 US 视图;非刚性配准的豪斯多夫距离(均值±标准差)为 9.48±0.27 mm,路径迁移误差(欧氏距离)为 2.21±1.11 mm。

Exploring the Lottery Ticket Hypothesis with Explainability Methods: Insights into Sparse Network Performance

  • paper_url: http://arxiv.org/abs/2307.13698
  • repo_url: None
  • paper_authors: Shantanu Ghosh, Kayhan Batmanghelich
  • for: 这篇论文旨在在大规模神经网络中找到高性能的稀疏子网络,以便部署到存储受限的设备(如手机)上;同时,模型可解释性对建立对 AI 的信任至关重要。
  • methods: 论文利用彩票假设(LTH)在深度网络中寻找性能相当或更优的子网络,但针对 LTH 在可解释性方面成败的研究十分有限。论文研究了剪枝后网络性能逐渐升降的原因,并分别使用 Grad-CAM 和事后概念瓶颈模型(PCBM)从像素和高层概念两个层面考察剪枝网络的可解释性。
  • results: 研究发现,随着更多参数被剪枝,网络性能逐渐下降;剪枝网络所发现的概念和像素与原始网络不一致,这可能是性能下降的原因。
    Abstract Discovering a high-performing sparse network within a massive neural network is advantageous for deploying them on devices with limited storage, such as mobile phones. Additionally, model explainability is essential to fostering trust in AI. The Lottery Ticket Hypothesis (LTH) finds a network within a deep network with comparable or superior performance to the original model. However, limited study has been conducted on the success or failure of LTH in terms of explainability. In this work, we examine why the performance of the pruned networks gradually increases or decreases. Using Grad-CAM and Post-hoc concept bottleneck models (PCBMs), respectively, we investigate the explainability of pruned networks in terms of pixels and high-level concepts. We perform extensive experiments across vision and medical imaging datasets. As more weights are pruned, the performance of the network degrades. The discovered concepts and pixels from the pruned networks are inconsistent with the original network -- a possible reason for the drop in performance.
    摘要 在大规模神经网络中发现高性能的稀疏子网络,有利于将其部署到存储受限的设备(如手机)上;此外,模型可解释性对于建立对 AI 的信任至关重要。彩票假设(LTH)能在深度网络中找到性能与原模型相当或更优的子网络,然而针对 LTH 在可解释性方面成败的研究十分有限。在这项工作中,我们考察了剪枝网络性能逐渐升降的原因。我们分别使用 Grad-CAM 和事后概念瓶颈模型(PCBM),从像素和高层概念两个层面研究剪枝网络的可解释性,并在视觉和医学影像数据集上开展了大量实验。随着更多权重被剪枝,网络性能逐渐下降;剪枝网络所发现的概念和像素与原始网络不一致——这可能是性能下降的原因。
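For readers unfamiliar with the LTH side of this study, the sketch below shows generic iterative magnitude pruning with weight rewinding using torch.nn.utils.prune; the model, pruning rate, and round count are illustrative, and this is not the authors' exact training setup (their explainability analysis with Grad-CAM/PCBMs is applied on top of such pruned networks).

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

model = nn.Sequential(nn.Linear(784, 300), nn.ReLU(), nn.Linear(300, 10))
to_prune = [(m, 'weight') for m in model if isinstance(m, nn.Linear)]
init_w = {id(m): m.weight.detach().clone() for m, _ in to_prune}  # rewind targets

def train_one_round(model):
    """Placeholder for the usual supervised training loop."""
    pass

for _ in range(5):                                   # iterative magnitude pruning
    train_one_round(model)
    prune.global_unstructured(to_prune, pruning_method=prune.L1Unstructured,
                              amount=0.2)            # drop 20% of remaining weights
    with torch.no_grad():                            # rewind survivors to init values
        for m, _ in to_prune:
            m.weight_orig.copy_(init_w[id(m)])

zeros = sum((m.weight == 0).sum().item() for m, _ in to_prune)
total = sum(m.weight.numel() for m, _ in to_prune)
print(f"global sparsity: {zeros / total:.1%}")
```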

Synthesizing Forestry Images Conditioned on Plant Phenotype Using a Generative Adversarial Network

  • paper_url: http://arxiv.org/abs/2307.03789
  • repo_url: None
  • paper_authors: Debasmita Pal, Arun Ross
  • for: 这种研究的目的是开发一种基于生成对抗网络(GAN)的方法,用于生成符合某特定地区植被特征的人工森林图像,以提高农业生产力。
  • methods: 这种方法使用了自动化的数字相机图像,提供由国家生态观测网络(NEON),并由phenocam网络处理。它还使用了生成对抗网络(GAN)来生成符合植被特征的人工图像。
  • results: 这种方法可以准确地生成符合植被特征的人工图像,并且可以用来预测另一种植被特征:植物的红色度。这种方法的可重复性和扩展性也被证明。
    Abstract Plant phenology and phenotype prediction using remote sensing data is increasingly gaining the attention of the plant science community to improve agricultural productivity. In this work, we generate synthetic forestry images that satisfy certain phenotypic attributes, viz. canopy greenness. The greenness index of plants describes a particular vegetation type in a mixed forest. Our objective is to develop a Generative Adversarial Network (GAN) to synthesize forestry images conditioned on this continuous attribute, i.e., greenness of vegetation, over a specific region of interest. The training data is based on the automated digital camera imagery provided by the National Ecological Observatory Network (NEON) and processed by the PhenoCam Network. The synthetic images generated by our method are also used to predict another phenotypic attribute, viz., redness of plants. The Structural SIMilarity (SSIM) index is utilized to assess the quality of the synthetic images. The greenness and redness indices of the generated synthetic images are compared against that of the original images using Root Mean Squared Error (RMSE) in order to evaluate their accuracy and integrity. Moreover, the generalizability and scalability of our proposed GAN model is determined by effectively transforming it to generate synthetic images for other forest sites and vegetation types.
    摘要 利用遥感数据进行植物物候与表型预测,正日益受到植物科学界的关注,以提升农业生产力。在这项工作中,我们生成满足特定表型属性(即冠层绿度)的合成林业图像。植物的绿度指数描述了混交林中的一种特定植被类型。我们的目标是开发一个生成对抗网络(GAN),在特定感兴趣区域上以植被绿度这一连续属性为条件合成林业图像。训练数据基于美国国家生态观测网络(NEON)提供、并由 PhenoCam 网络处理的自动数码相机影像。我们生成的合成图像还被用于预测另一项表型属性:植物的红度。我们采用结构相似性(SSIM)指数评估合成图像的质量,并使用均方根误差(RMSE)将合成图像的绿度和红度指数与原始图像进行比较,以评估其准确性与完整性。此外,通过将所提出的 GAN 模型有效迁移到其他林地与植被类型的合成图像生成,我们验证了模型的泛化能力与可扩展性。
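To illustrate conditioning a generator on a continuous phenotypic attribute such as canopy greenness, here is a deliberately tiny conditional-generator sketch; a matching discriminator would receive the same scalar. The architecture, sizes, and conditioning-by-concatenation are assumptions, not the paper's GAN.

```python
import torch
import torch.nn as nn

class GreennessConditionedG(nn.Module):
    """Toy generator conditioned on a continuous greenness index in [0, 1]."""
    def __init__(self, z_dim=100):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(z_dim + 1, 256), nn.ReLU(),
            nn.Linear(256, 3 * 64 * 64), nn.Tanh())

    def forward(self, z, greenness):
        # Concatenate noise with the scalar attribute, as in simple cGANs
        x = torch.cat([z, greenness.unsqueeze(1)], dim=1)
        return self.net(x).view(-1, 3, 64, 64)

G = GreennessConditionedG()
imgs = G(torch.randn(4, 100), torch.tensor([0.2, 0.4, 0.6, 0.8]))
print(imgs.shape)  # torch.Size([4, 3, 64, 64])
```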

Context-aware Pedestrian Trajectory Prediction with Multimodal Transformer

  • paper_url: http://arxiv.org/abs/2307.03786
  • repo_url: None
  • paper_authors: Haleh Damirchi, Michael Greenspan, Ali Etemad
  • for: 预测未来行人轨迹
  • methods: 使用多模态编码器-解码器 Transformer 架构,输入包括行人位置和自车速度;解码器单次(single-pass)预测整个未来轨迹,而非一步步预测,因而适合嵌入式边缘部署
  • results: 在 PIE 和 JAAD 两个数据集上,在 0.5、1.0 和 1.5 秒三个时间范围内均取得最低误差,优于当前最先进方法,且推理速度显著更快;消融实验验证了多模态配置的关键作用
    Abstract We propose a novel solution for predicting future trajectories of pedestrians. Our method uses a multimodal encoder-decoder transformer architecture, which takes as input both pedestrian locations and ego-vehicle speeds. Notably, our decoder predicts the entire future trajectory in a single-pass and does not perform one-step-ahead prediction, which makes the method effective for embedded edge deployment. We perform detailed experiments and evaluate our method on two popular datasets, PIE and JAAD. Quantitative results demonstrate the superiority of our proposed model over the current state-of-the-art, which consistently achieves the lowest error for 3 time horizons of 0.5, 1.0 and 1.5 seconds. Moreover, the proposed method is significantly faster than the state-of-the-art for the two datasets of PIE and JAAD. Lastly, ablation experiments demonstrate the impact of the key multimodal configuration of our method.
    摘要 我们提出了一种预测行人未来轨迹的新方案。该方法采用多模态编码器-解码器 Transformer 架构,以行人位置和自车速度作为输入。值得注意的是,我们的解码器单次(single-pass)预测整个未来轨迹,而非进行一步步预测,这使该方法适合嵌入式边缘部署。我们在 PIE 和 JAAD 两个流行数据集上进行了详细的实验与评估。定量结果表明所提模型优于当前最先进方法,在 0.5、1.0 和 1.5 秒三个时间范围内始终取得最低误差;同时,所提方法在这两个数据集上的运行速度也显著快于现有最先进方法。最后,消融实验展示了方法中关键多模态配置的作用。
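A hedged sketch of the single-pass idea: a Transformer encoder fuses per-frame pedestrian boxes with ego-vehicle speed, and one linear head emits the whole future trajectory at once instead of decoding step by step. Token fusion by addition, the mean pooling, and the 45-step horizon (1.5 s at 30 fps) are illustrative assumptions.

```python
import torch
import torch.nn as nn

class TrajectoryPredictor(nn.Module):
    """Toy multimodal encoder that predicts the full future path in one pass."""
    def __init__(self, d_model=128, horizon=45):
        super().__init__()
        self.embed_pos = nn.Linear(4, d_model)       # pedestrian bbox per frame
        self.embed_spd = nn.Linear(1, d_model)       # ego-vehicle speed per frame
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=3)
        self.head = nn.Linear(d_model, horizon * 4)  # all future boxes at once
        self.horizon = horizon

    def forward(self, boxes, speed):
        tokens = self.embed_pos(boxes) + self.embed_spd(speed)  # (B, T, D)
        h = self.encoder(tokens).mean(dim=1)                    # pooled context
        return self.head(h).view(-1, self.horizon, 4)           # (B, horizon, 4)

pred = TrajectoryPredictor()(torch.randn(2, 15, 4), torch.randn(2, 15, 1))
print(pred.shape)  # torch.Size([2, 45, 4])
```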

Unsupervised 3D out-of-distribution detection with latent diffusion models

  • paper_url: http://arxiv.org/abs/2307.03777
  • repo_url: https://github.com/marksgraham/ddpm-ood
  • paper_authors: Mark S. Graham, Walter Hugo Lopez Pinaya, Paul Wright, Petru-Daniel Tudosiu, Yee H. Mah, James T. Teo, H. Rolf Jäger, David Werring, Parashkev Nachev, Sebastien Ourselin, M. Jorge Cardoso
  • for: 使用潜在扩散模型(LDM)检测 3D 医学数据中的分布外(OOD)样本。
  • methods: 提出利用 LDM 将去噪扩散概率模型(DDPM)扩展到高分辨率 3D 医学数据,并与近期提出的基于潜在 Transformer 模型(LTM)的 3D 方法进行比较。
  • results: 基于 LDM 的方法在统计上显著优于基于 LTM 的方法,对底层潜在表示的敏感度更低,内存随规模增长更为友好,并能生成更好的空间异常图。
    Abstract Methods for out-of-distribution (OOD) detection that scale to 3D data are crucial components of any real-world clinical deep learning system. Classic denoising diffusion probabilistic models (DDPMs) have been recently proposed as a robust way to perform reconstruction-based OOD detection on 2D datasets, but do not trivially scale to 3D data. In this work, we propose to use Latent Diffusion Models (LDMs), which enable the scaling of DDPMs to high-resolution 3D medical data. We validate the proposed approach on near- and far-OOD datasets and compare it to a recently proposed, 3D-enabled approach using Latent Transformer Models (LTMs). Not only does the proposed LDM-based approach achieve statistically significant better performance, it also shows less sensitivity to the underlying latent representation, more favourable memory scaling, and produces better spatial anomaly maps. Code is available at https://github.com/marksgraham/ddpm-ood
    摘要 能够扩展到 3D 数据的分布外(OOD)检测方法,是任何真实临床深度学习系统的关键组成部分。经典的去噪扩散概率模型(DDPM)最近被提出作为在 2D 数据集上进行基于重建的 OOD 检测的一种稳健方法,但它并不能直接扩展到 3D 数据。在这项工作中,我们提出使用潜在扩散模型(LDM),使 DDPM 能够扩展到高分辨率的 3D 医学数据。我们在近分布外和远分布外数据集上验证了所提方法,并与近期提出的基于潜在 Transformer 模型(LTM)的 3D 方法进行了比较。所提出的基于 LDM 的方法不仅在统计上显著更优,还对底层潜在表示的敏感度更低、内存随规模增长更为友好,并能生成更好的空间异常图。代码可在 https://github.com/marksgraham/ddpm-ood 获取。

AutoDecoding Latent 3D Diffusion Models

  • paper_url: http://arxiv.org/abs/2307.05445
  • repo_url: https://github.com/snap-research/3dvader
  • paper_authors: Evangelos Ntavelis, Aliaksandr Siarohin, Kyle Olszewski, Chaoyang Wang, Luc Van Gool, Sergey Tulyakov
  • for: 提出一种生成静态与铰接式 3D 资产的新方法,其核心是 3D 自解码器框架,用于捕捉视图一致的外观与几何结构。
  • methods: 将从目标数据集中学到的属性嵌入潜在空间,再解码为用于渲染的体积表示;通过稳健的归一化与反归一化操作,从 2D 图像或刚性/铰接物体的单目视频中学习 3D 扩散。
  • results: 在多视角图像数据集、真实野外人物运动视频以及大规模真实静态物体视频数据集上,生成结果优于最先进方法。
    Abstract We present a novel approach to the generation of static and articulated 3D assets that has a 3D autodecoder at its core. The 3D autodecoder framework embeds properties learned from the target dataset in the latent space, which can then be decoded into a volumetric representation for rendering view-consistent appearance and geometry. We then identify the appropriate intermediate volumetric latent space, and introduce robust normalization and de-normalization operations to learn a 3D diffusion from 2D images or monocular videos of rigid or articulated objects. Our approach is flexible enough to use either existing camera supervision or no camera information at all -- instead efficiently learning it during training. Our evaluations demonstrate that our generation results outperform state-of-the-art alternatives on various benchmark datasets and metrics, including multi-view image datasets of synthetic objects, real in-the-wild videos of moving people, and a large-scale, real video dataset of static objects.
    摘要 我们提出了一种生成静态与铰接式 3D 资产的新方法,其核心是一个 3D 自解码器框架。该框架将从目标数据集中学到的属性嵌入潜在空间,随后可解码为体积表示,以渲染视图一致的外观与几何结构。我们进而确定了合适的中间体积潜在空间,并引入稳健的归一化与反归一化操作,从刚性或铰接物体的 2D 图像或单目视频中学习 3D 扩散。我们的方法足够灵活,既可以利用现有的相机监督,也可以在完全没有相机信息的情况下于训练中高效地学习相机参数。评估表明,我们的生成结果在多种基准数据集和指标上超越了最先进方法,包括合成物体的多视角图像数据集、真实野外人物运动视频,以及大规模真实静态物体视频数据集。

Training Ensembles with Inliers and Outliers for Semi-supervised Active Learning

  • paper_url: http://arxiv.org/abs/2307.03741
  • repo_url: https://github.com/vladan-stojnic/active-outliers
  • paper_authors: Vladan Stojnić, Zakaria Laskar, Giorgos Tolias
  • for: 本文研究存在离群样本时的深度主动学习问题。
  • methods: 本文提出了三个高度协同的组件:内点与离群点的联合分类器训练、基于伪标签的半监督学习,以及模型集成。
  • results: 实验结果表明,三者结合使用可以提高伪标签的准确率和数据获取的质量。特别地,联合训练能够正确处理离群样本,而无需显式的离群点检测。此外,该方法虽然简单,性能却优于其他方法。
    Abstract Deep active learning in the presence of outlier examples poses a realistic yet challenging scenario. Acquiring unlabeled data for annotation requires a delicate balance between avoiding outliers to conserve the annotation budget and prioritizing useful inlier examples for effective training. In this work, we present an approach that leverages three highly synergistic components, which are identified as key ingredients: joint classifier training with inliers and outliers, semi-supervised learning through pseudo-labeling, and model ensembling. Our work demonstrates that ensembling significantly enhances the accuracy of pseudo-labeling and improves the quality of data acquisition. By enabling semi-supervision through the joint training process, where outliers are properly handled, we observe a substantial boost in classifier accuracy through the use of all available unlabeled examples. Notably, we reveal that the integration of joint training renders explicit outlier detection unnecessary; a conventional component for acquisition in prior work. The three key components align seamlessly with numerous existing approaches. Through empirical evaluations, we showcase that their combined use leads to a performance increase. Remarkably, despite its simplicity, our proposed approach outperforms all other methods in terms of performance. Code: https://github.com/vladan-stojnic/active-outliers
    摘要 在存在离群样本的情况下进行深度主动学习,是一个现实而又具有挑战性的场景:获取无标注数据用于标注时,需要在避开离群点以节省标注预算与优先选择有用的内点以实现有效训练之间取得微妙的平衡。在这项工作中,我们提出了一种方法,其关键在于三个高度协同的组件:内点与离群点的联合分类器训练、通过伪标签实现的半监督学习,以及模型集成。我们的工作表明,模型集成能显著提升伪标签的准确率,并改善数据获取的质量。通过在联合训练过程中妥善处理离群点以实现半监督学习,我们观察到利用全部可用无标注样本可大幅提升分类器精度。值得注意的是,我们发现引入联合训练后,先前工作中用于数据获取的显式离群点检测组件便不再必要。这三个关键组件能与众多现有方法无缝结合;实证评估表明,三者结合使用可带来性能提升。尤为突出的是,尽管方法简单,其性能却优于所有其他方法。代码:https://github.com/vladan-stojnic/active-outliers
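The ensembling-for-pseudo-labels component can be sketched in a few lines: average the softmax outputs of the ensemble members on unlabeled data and keep only confident predictions as pseudo-labels for the joint training step. The 0.9 threshold is an illustrative assumption.

```python
import torch

@torch.no_grad()
def ensemble_pseudo_labels(models, x, threshold=0.9):
    """Average softmax over ensemble members; keep only confident predictions."""
    probs = torch.stack([m(x).softmax(dim=1) for m in models]).mean(dim=0)
    conf, labels = probs.max(dim=1)
    keep = conf >= threshold          # low-confidence samples (possible outliers)
    return labels[keep], keep         # pseudo-labels and the selection mask

# usage: pseudo_y, mask = ensemble_pseudo_labels(models, unlabeled_batch)
# unlabeled_batch[mask] with pseudo_y then joins the next joint-training round
```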

Equivariant Single View Pose Prediction Via Induced and Restricted Representations

  • paper_url: http://arxiv.org/abs/2307.03704
  • repo_url: None
  • paper_authors: Owen Howell, David Klee, Ondrej Biza, Linfeng Zhao, Robin Walters
  • for: 本研究旨在解决计算机视觉中的一个基本问题:从二维图像中学习三维世界。
  • methods: 我们提出从二维图像学习三维表示的算法必须满足的几何一致性性质,并将其表述为 SO(2) 等变约束;利用 SO(2) 在 SO(3) 上的诱导表示与限制表示来构造并分类满足这些约束的网络架构。
  • results: 我们提出了一种新算法,它是此前方法的可学习推广;在 PASCAL3D+ 和 SYMSOL 两个姿态估计任务上均取得了最先进的精度。
    Abstract Learning about the three-dimensional world from two-dimensional images is a fundamental problem in computer vision. An ideal neural network architecture for such tasks would leverage the fact that objects can be rotated and translated in three dimensions to make predictions about novel images. However, imposing SO(3)-equivariance on two-dimensional inputs is difficult because the group of three-dimensional rotations does not have a natural action on the two-dimensional plane. Specifically, it is possible that an element of SO(3) will rotate an image out of plane. We show that an algorithm that learns a three-dimensional representation of the world from two dimensional images must satisfy certain geometric consistency properties which we formulate as SO(2)-equivariance constraints. We use the induced and restricted representations of SO(2) on SO(3) to construct and classify architectures which satisfy these geometric consistency constraints. We prove that any architecture which respects said consistency constraints can be realized as an instance of our construction. We show that three previously proposed neural architectures for 3D pose prediction are special cases of our construction. We propose a new algorithm that is a learnable generalization of previously considered methods. We test our architecture on three pose predictions task and achieve SOTA results on both the PASCAL3D+ and SYMSOL pose estimation tasks.
    摘要 从二维图像学习三维世界是计算机视觉的一个基本问题。针对此类任务,理想的神经网络架构应当利用物体可以在三维空间中旋转和平移这一事实来对新图像进行预测。然而,在二维输入上强加 SO(3) 等变性是困难的,因为三维旋转群在二维平面上没有自然的作用——具体而言,SO(3) 中的某个元素可能会把图像旋转出平面。我们证明,从二维图像学习三维世界表示的算法必须满足特定的几何一致性性质,我们将其表述为 SO(2) 等变约束。我们利用 SO(2) 在 SO(3) 上的诱导表示与限制表示,构造并分类满足这些几何一致性约束的架构,并证明任何满足这些约束的架构都可以作为我们构造的一个实例来实现。我们还证明,此前提出的三种 3D 姿态预测神经网络架构都是我们构造的特例。我们提出了一种新算法,它是此前方法的可学习推广。我们在三个姿态预测任务上测试了该架构,并在 PASCAL3D+ 和 SYMSOL 姿态估计任务上取得了最先进的结果。

Motion Magnification in Robotic Sonography: Enabling Pulsation-Aware Artery Segmentation

  • paper_url: http://arxiv.org/abs/2307.03698
  • repo_url: https://github.com/dianyehuang/robpmepasnn
  • paper_authors: Dianye Huang, Yuan Bi, Nassir Navab, Zhongliang Jiang
  • for: 该研究旨在利用心脏搏动引起的运动信息提高超声图像中动脉分割的精度与稳定性,以辅助临床医生诊断和监测动脉疾病。
  • methods: 该研究提出了一种新的搏动辅助分割神经网络(PAS-NN),利用心脏搏动引起的运动信息帮助定位动脉,并使用运动放大技术增强目标频带内的搏动信号。
  • results: 实验结果表明,PAS-NN 在颈动脉上可与最先进方法相当,并能有效提升小血管(桡动脉)的分割性能。
    Abstract Ultrasound (US) imaging is widely used for diagnosing and monitoring arterial diseases, mainly due to the advantages of being non-invasive, radiation-free, and real-time. In order to provide additional information to assist clinicians in diagnosis, the tubular structures are often segmented from US images. To improve the artery segmentation accuracy and stability during scans, this work presents a novel pulsation-assisted segmentation neural network (PAS-NN) by explicitly taking advantage of the cardiac-induced motions. Motion magnification techniques are employed to amplify the subtle motion within the frequency band of interest to extract the pulsation signals from sequential US images. The extracted real-time pulsation information can help to locate the arteries on cross-section US images; therefore, we explicitly integrated the pulsation into the proposed PAS-NN as attention guidance. Notably, a robotic arm is necessary to provide stable movement during US imaging since magnifying the target motions from the US images captured along a scan path is not manually feasible due to the hand tremor. To validate the proposed robotic US system for imaging arteries, experiments are carried out on volunteers' carotid and radial arteries. The results demonstrated that the PAS-NN could achieve comparable results as state-of-the-art on carotid and can effectively improve the segmentation performance for small vessels (radial artery).
    摘要 超声(US)成像因其非侵入、无辐射且实时的优点,被广泛用于动脉疾病的诊断和监测。为了向临床医生提供更多辅助诊断信息,通常需要从超声图像中分割出管状结构。为了提高扫描过程中动脉分割的精度与稳定性,本工作提出了一种新颖的搏动辅助分割神经网络(PAS-NN),显式地利用心脏引起的运动:采用运动放大技术放大目标频带内的细微运动,从连续超声图像中提取搏动信号。提取的实时搏动信息有助于在横断面超声图像上定位动脉,因此我们将搏动显式地作为注意力引导集成到所提出的 PAS-NN 中。值得注意的是,由于手部震颤,人工放大沿扫描路径采集的超声图像中的目标运动并不可行,因此需要机械臂在超声成像过程中提供稳定的运动。为验证所提出的机器人超声系统对动脉成像的有效性,我们在志愿者的颈动脉和桡动脉上开展了实验。结果表明,PAS-NN 在颈动脉上可取得与最先进方法相当的结果,并能有效提升小血管(桡动脉)的分割性能。
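The motion-magnification ingredient can be approximated with a simple linear Eulerian-style temporal bandpass, sketched below with SciPy: each pixel's intensity over time is bandpass-filtered around plausible cardiac frequencies and amplified. The band limits, filter order, and gain are assumptions, and this is only a caricature of the magnification used in the paper.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def magnify_pulsation(frames, fps, lo=0.8, hi=3.0, alpha=10.0):
    """Amplify temporal intensity variation in the cardiac band (~48-180 bpm).

    frames: (T, H, W) ultrasound image sequence as a float array
    """
    b, a = butter(2, [lo, hi], btype='band', fs=fps)
    pulsation = filtfilt(b, a, frames, axis=0)    # per-pixel temporal bandpass
    return frames + alpha * pulsation, pulsation  # magnified frames, pulsation map

frames = np.random.rand(120, 64, 64)              # 4 s of dummy 30-fps US frames
magnified, pulse = magnify_pulsation(frames, fps=30)
print(magnified.shape, np.abs(pulse).mean())
```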

cs.AI - 2023-07-08

PCG-based Static Underground Garage Scenario Generation

  • paper_url: http://arxiv.org/abs/2307.03988
  • repo_url: None
  • paper_authors: Wenjin Li, Kai Li
  • for: 本研究旨在使用 Sarsa 算法解决地下停车场静态场景仿真中的程序化内容生成(PCG)问题。
  • methods: 本文使用 Sarsa 算法进行 PCG,以生成细节充分的地下停车场场景。
  • results: 本研究实现了基于 Sarsa 算法的 PCG 方法,可以生成高质量的地下停车场场景,为自动驾驶模型的训练提供了更多数据支持。
    Abstract Autonomous driving technology has five levels, from L0 to L5. Currently, only the L2 level (partial automation) can be achieved, and there is a long way to go before reaching the final level of L5 (full automation). The key to crossing these levels lies in training the autonomous driving model. However, relying solely on real-world road data to train the model is far from enough and consumes a great deal of resources. Although there are already examples of training autonomous driving models through simulators that simulate real-world scenarios, these scenarios require complete manual construction. Directly converting 3D scenes from road network formats will lack a large amount of detail and cannot be used as training sets. Underground parking garage static scenario simulation is regarded as a procedural content generation (PCG) problem. This paper will use the Sarsa algorithm to solve procedural content generation on underground garage structures.
    摘要 自动驾驶技术分为 L0 到 L5 五个等级。目前只能实现 L2 级(部分自动化),距离最终的 L5 级(完全自动化)还有很长的路要走。跨越这些等级的关键在于自动驾驶模型的训练。然而,仅依靠真实道路数据来训练模型远远不够,且会消耗大量资源。尽管已有通过模拟器模拟真实场景来训练自动驾驶模型的先例,但这些场景需要完全手工构建;而直接从路网格式转换 3D 场景会缺失大量细节,无法用作训练集。地下停车场静态场景仿真可被视为一个程序化内容生成(PCG)问题。本文使用 Sarsa 算法来解决地下停车场结构上的程序化内容生成问题。
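For reference, tabular Sarsa itself is compact; in this garage-generation setting the environment's states would encode partial garage layouts and its actions the placement choices, with reward reflecting layout quality. The env interface below (reset/step, n_states/n_actions) is an assumed abstraction, not the paper's environment.

```python
import numpy as np

def sarsa(env, episodes=500, alpha=0.1, gamma=0.95, eps=0.1):
    """Tabular Sarsa. `env` exposes n_states, n_actions,
    reset() -> s, and step(a) -> (s_next, reward, done)."""
    Q = np.zeros((env.n_states, env.n_actions))

    def policy(s):  # epsilon-greedy behaviour policy
        if np.random.rand() < eps:
            return np.random.randint(env.n_actions)
        return int(Q[s].argmax())

    for _ in range(episodes):
        s = env.reset()
        a = policy(s)
        done = False
        while not done:
            s2, r, done = env.step(a)
            a2 = policy(s2)
            # on-policy TD update: the target uses the action actually taken next
            Q[s, a] += alpha * (r + gamma * Q[s2, a2] * (not done) - Q[s, a])
            s, a = s2, a2
    return Q
```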

Integrating Curricula with Replays: Its Effects on Continual Learning

  • paper_url: http://arxiv.org/abs/2307.05747
  • repo_url: https://github.com/zhanglab-deepneurocoglab/integrating-curricula-with-replays
  • paper_authors: Ren Jie Tee, Mengmi Zhang
  • for: 本研究旨在探讨在持续学习中将课程与重放方法相结合,以提升知识保持并促进学习迁移。
  • methods: 研究考察了课程设计的三个方面对重放式持续学习的影响:重放样本与训练数据的交错频率、样本重放的顺序,以及样本进入重放缓冲区的选择策略。
  • results: 研究发现,这些课程设计能有效缓解灾难性遗忘并增强正向知识迁移,表明课程有望推动持续学习方法的发展。
    Abstract Humans engage in learning and reviewing processes with curricula when acquiring new skills or knowledge. This human learning behavior has inspired the integration of curricula with replay methods in continual learning agents. The goal is to emulate the human learning process, thereby improving knowledge retention and facilitating learning transfer. Existing replay methods in continual learning agents involve the random selection and ordering of data from previous tasks, which has shown to be effective. However, limited research has explored the integration of different curricula with replay methods to enhance continual learning. Our study takes initial steps in examining the impact of integrating curricula with replay methods on continual learning in three specific aspects: the interleaved frequency of replayed exemplars with training data, the sequence in which exemplars are replayed, and the strategy for selecting exemplars into the replay buffer. These aspects of curricula design align with cognitive psychology principles and leverage the benefits of interleaved practice during replays, easy-to-hard rehearsal, and exemplar selection strategy involving exemplars from a uniform distribution of difficulties. Based on our results, these three curricula effectively mitigated catastrophic forgetting and enhanced positive knowledge transfer, demonstrating the potential of curricula in advancing continual learning methodologies. Our code and data are available: https://github.com/ZhangLab-DeepNeuroCogLab/Integrating-Curricula-with-Replays
    摘要 人类在学习新技能或知识时,会按照课程进行学习与复习。这种人类学习行为启发了在持续学习智能体中将课程与重放方法相结合,其目标是模拟人类学习过程,从而提升知识保持并促进学习迁移。持续学习智能体中现有的重放方法是对先前任务的数据进行随机选择与排序,已被证明是有效的;然而,针对将不同课程与重放方法相结合以增强持续学习的研究仍然有限。我们的研究迈出了初步一步,从三个具体方面考察课程与重放方法结合对持续学习的影响:重放样本与训练数据的交错频率、样本重放的顺序,以及样本进入重放缓冲区的选择策略。这些课程设计与认知心理学原理相一致,利用了重放过程中交错练习、由易到难的复习,以及从均匀难度分布中选择样本的策略所带来的益处。结果表明,这三种课程设计有效缓解了灾难性遗忘并增强了正向知识迁移,展示了课程在推动持续学习方法发展方面的潜力。我们的代码和数据可在 GitHub 上获取:https://github.com/ZhangLab-DeepNeuroCogLab/Integrating-Curricula-with-Replays
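A toy sketch of a replay buffer exposing the three knobs studied above: what gets stored, the easy-to-hard order of rehearsal, and (via how often sample() is called during training) the interleaving frequency. The difficulty scores and the linear easy-to-hard schedule are illustrative assumptions, not the paper's exact curricula.

```python
import random

class CurriculumReplayBuffer:
    """Replay buffer that rehearses stored exemplars from easy to hard."""
    def __init__(self, capacity=1000):
        self.items = []          # (example, label, difficulty) triples
        self.capacity = capacity

    def add(self, example, label, difficulty):
        # keep a spread of difficulties by evicting at random when full
        if len(self.items) >= self.capacity:
            self.items.pop(random.randrange(len(self.items)))
        self.items.append((example, label, difficulty))

    def sample(self, k, progress):
        """progress in [0, 1]: early training rehearses only the easiest items."""
        pool = sorted(self.items, key=lambda t: t[2])   # ascending difficulty
        take = min(k, len(pool))
        cutoff = max(take, int(len(pool) * progress))   # widen the pool over time
        return random.sample(pool[:cutoff], take)
```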

Autonomy 2.0: The Quest for Economies of Scale

  • paper_url: http://arxiv.org/abs/2307.03973
  • repo_url: None
  • paper_authors: Shuang Wu, Bo Yu, Shaoshan Liu, Yuhao Zhu
  • for: 本文主要探讨自主机器(autonomous machines)领域的技术挑战与经济影响。
  • methods: 本文结合技术分析与经济分析,探讨自主机器领域的可扩展性及其经济潜力。
  • results: 本文认为可扩展性是自主机器领域的关键因素,但现有的开发范式(Autonomy 1.0)无法充分利用计算成本下降与数据资源的规模经济效益;在解决关键瓶颈之后,新的开发范式(Autonomy 2.0)可以大幅提升该领域的可扩展性与经济潜力。
    Abstract With the advancement of robotics and AI technologies in the past decade, we have now entered the age of autonomous machines. In this new age of information technology, autonomous machines, such as service robots, autonomous drones, delivery robots, and autonomous vehicles, rather than humans, will provide services. In this article, through examining the technical challenges and economic impact of the digital economy, we argue that scalability is both highly necessary from a technical perspective and significantly advantageous from an economic perspective, thus is the key for the autonomy industry to achieve its full potential. Nonetheless, the current development paradigm, dubbed Autonomy 1.0, scales with the number of engineers, instead of with the amount of data or compute resources, hence preventing the autonomy industry to fully benefit from the economies of scale, especially the exponentially cheapening compute cost and the explosion of available data. We further analyze the key scalability blockers and explain how a new development paradigm, dubbed Autonomy 2.0, can address these problems to greatly boost the autonomy industry.
    摘要 随着过去十年机器人和人工智能技术的发展,我们已经进入自主机器的时代。在这个新的信息技术时代,提供服务的将不再是人类,而是服务机器人、自主无人机、配送机器人和自动驾驶汽车等自主机器。本文通过分析数字经济的技术挑战与经济影响,论证了可扩展性从技术角度看十分必要、从经济角度看极具优势,因而是自主行业充分释放潜力的关键。然而,当前的开发范式(Autonomy 1.0)随工程师数量扩展,而非随数据量或计算资源扩展,这使自主行业无法充分受益于规模经济,尤其是呈指数级下降的计算成本和爆炸式增长的可用数据。我们进一步分析了阻碍可扩展性的关键因素,并阐述新的开发范式(Autonomy 2.0)如何解决这些问题,从而大幅推动自主行业的发展。

Multi-Intent Detection in User Provided Annotations for Programming by Examples Systems

  • paper_url: http://arxiv.org/abs/2307.03966
  • repo_url: None
  • paper_authors: Nischal Ashok Kumar, Nitin Gupta, Shanmukha Guttula, Hima Patel
  • for: 本论文旨在解决集成开发中的数据映射问题,尤其是在应用程序缺乏命名标准、且字段结构存在嵌套的情况下。
  • methods: 论文使用编程示例(PBE)技术,从用户提供的输入输出样本中学习正确意图,自动生成数据转换程序。
  • results: 论文提出了一种基于深度神经网络的歧义预测模型,可以分析输入输出字符串并将其映射到导致多重意图的不同属性集,以缓解 PBE 系统中的多重意图问题。
    Abstract In mapping enterprise applications, data mapping remains a fundamental part of integration development, but its time consuming. An increasing number of applications lack naming standards, and nested field structures further add complexity for the integration developers. Once the mapping is done, data transformation is the next challenge for the users since each application expects data to be in a certain format. Also, while building integration flow, developers need to understand the format of the source and target data field and come up with transformation program that can change data from source to target format. The problem of automatic generation of a transformation program through program synthesis paradigm from some specifications has been studied since the early days of Artificial Intelligence (AI). Programming by Example (PBE) is one such kind of technique that targets automatic inferencing of a computer program to accomplish a format or string conversion task from user-provided input and output samples. To learn the correct intent, a diverse set of samples from the user is required. However, there is a possibility that the user fails to provide a diverse set of samples. This can lead to multiple intents or ambiguity in the input and output samples. Hence, PBE systems can get confused in generating the correct intent program. In this paper, we propose a deep neural network based ambiguity prediction model, which analyzes the input-output strings and maps them to a different set of properties responsible for multiple intent. Users can analyze these properties and accordingly can provide new samples or modify existing samples which can help in building a better PBE system for mapping enterprise applications.
    摘要 在企业应用集成中,数据映射仍然是集成开发的基础环节,但十分耗时。越来越多的应用缺乏命名标准,而嵌套的字段结构进一步增加了集成开发人员的工作复杂度。映射完成后,数据转换是用户面临的下一项挑战,因为每个应用都要求数据符合特定格式。在构建集成流程时,开发人员需要理解源数据字段和目标数据字段的格式,并编写能将数据从源格式转换为目标格式的转换程序。自人工智能(AI)早期以来,人们便开始研究通过程序合成范式从某些规约自动生成转换程序的问题。编程示例(PBE)就是其中一类技术,旨在根据用户提供的输入和输出样本,自动推断出完成格式或字符串转换任务的计算机程序。为了学习正确的意图,需要用户提供多样化的样本;然而用户有可能无法做到这一点,从而导致输入输出样本中存在多重意图或歧义,使 PBE 系统在生成符合正确意图的程序时产生混淆。在本文中,我们提出了一种基于深度神经网络的歧义预测模型,它分析输入输出字符串,并将其映射到导致多重意图的不同属性集。用户可以分析这些属性,并据此提供新样本或修改已有样本,从而帮助构建更好的 PBE 系统,用于企业应用的数据映射。

Right to be Forgotten in the Era of Large Language Models: Implications, Challenges, and Solutions

  • paper_url: http://arxiv.org/abs/2307.03941
  • repo_url: None
  • paper_authors: Dawen Zhang, Pamela Finckenberg-Broman, Thong Hoang, Shidong Pan, Zhenchang Xing, Mark Staples, Xiwei Xu
  • for: 本文探讨在大语言模型(LLM)中落实“被遗忘权”(RTBF)所面临的挑战,并就如何实现相应的技术方案提出见解。
  • methods: 论文讨论了机器遗忘(machine unlearning)、模型编辑和提示工程等作为 LLM 中实现 RTBF 的潜在方案。
  • results: 论文阐明了在 LLM 中落实 RTBF 的挑战,并提出了满足 RTBF 合规要求的潜在解决方案。
    Abstract The Right to be Forgotten (RTBF) was first established as the result of the ruling of Google Spain SL, Google Inc. v AEPD, Mario Costeja González, and was later included as the Right to Erasure under the General Data Protection Regulation (GDPR) of the European Union to allow individuals the right to request personal data be deleted by organizations. Specifically for search engines, individuals can send requests to organizations to exclude their information from the query results. With the recent development of Large Language Models (LLMs) and their use in chatbots, LLM-enabled software systems have become popular. But they are not excluded from the RTBF. Compared with the indexing approach used by search engines, LLMs store and process information in a completely different way. This poses new challenges for compliance with the RTBF. In this paper, we explore these challenges and provide our insights on how to implement technical solutions for the RTBF, including the use of machine unlearning, model editing, and prompt engineering.

Copilot for Xcode: Exploring AI-Assisted Programming by Prompting Cloud-based Large Language Models

  • paper_url: http://arxiv.org/abs/2307.14349
  • repo_url: None
  • paper_authors: Chee Wei Tan, Shangxin Guo, Man Fai Wong, Ching Nam Hang
  • for: An AI-assisted programming tool, Copilot for Xcode, that supports human software developers in program composition and design within the Apple software ecosystem (e.g., iOS apps, macOS).
  • methods: Integrates cloud-based Large Language Models (LLMs) with Apple's local development environment, Xcode, using natural language processing (NLP) over source code tokens and patterns to enable code generation, autocompletion, documentation, and error detection (the general prompting pattern is sketched after the abstract).
  • results: Developers can query the tool and make "small" program-composition decisions, some simultaneously, through prompt engineering in a chat interface; simple case studies show the effectiveness of prompting LLM services such as OpenAI ChatGPT from Xcode.
    Abstract This paper presents an AI-assisted programming tool called Copilot for Xcode for program composition and design to support human software developers. By seamlessly integrating cloud-based Large Language Models (LLM) with Apple's local development environment, Xcode, this tool enhances productivity and unleashes creativity for software development in Apple software ecosystem (e.g., iOS apps, macOS). Leveraging advanced natural language processing (NLP) techniques, Copilot for Xcode effectively processes source code tokens and patterns within code repositories, enabling features such as code generation, autocompletion, documentation, and error detection. Software developers can also query and make "small" decisions for program composition, some of which can be made simultaneously, and this is facilitated through prompt engineering in a chat interface of Copilot for Xcode. Finally, we present simple case studies as evidence of the effectiveness of utilizing NLP in Xcode to prompt popular LLM services like OpenAI ChatGPT for program composition and design.
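As a rough illustration of the underlying pattern — an IDE extension forwarding a developer query to a cloud LLM — here is a minimal sketch. It is not Copilot for Xcode's actual implementation; the endpoint and payload follow OpenAI's public chat completions API, and the API key is assumed to live in the OPENAI_API_KEY environment variable.

```python
# Minimal sketch of the prompt-a-cloud-LLM pattern such a tool relies on.
import os
import requests

def ask_llm(prompt: str) -> str:
    resp = requests.post(
        "https://api.openai.com/v1/chat/completions",
        headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
        json={
            "model": "gpt-3.5-turbo",
            "messages": [
                {"role": "system", "content": "You are a Swift coding assistant."},
                {"role": "user", "content": prompt},
            ],
        },
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

# e.g. documentation generation for a snippet selected in the editor:
print(ask_llm("Write a doc comment for: func area(r: Double) -> Double { .pi * r * r }"))
```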

Inductive Meta-path Learning for Schema-complex Heterogeneous Information Networks

  • paper_url: http://arxiv.org/abs/2307.03937
  • repo_url: None
  • paper_authors: Shixuan Liu, Changjun Fan, Kewei Cheng, Yunfei Wang, Peng Cui, Yizhou Sun, Zhong Liu
  • for: Tackles meta-path learning on schema-complex Heterogeneous Information Networks (HINs), such as knowledge bases with hundreds of entity and relation types, where exhaustive meta-path enumeration is computationally infeasible.
  • methods: Proposes SchemaWalk, an inductive meta-path learning framework that scores meta-paths with schema-level representations across varying relations, plus a reinforcement-learning-based path-finding agent that navigates the schema graph to establish meta-paths with high coverage and confidence for multiple relations (a toy coverage/confidence computation follows the abstract).
  • results: Extensive experiments on real data sets demonstrate the effectiveness of the proposed paradigm.
    Abstract Heterogeneous Information Networks (HINs) are information networks with multiple types of nodes and edges. The concept of meta-path, i.e., a sequence of entity types and relation types connecting two entities, is proposed to provide the meta-level explainable semantics for various HIN tasks. Traditionally, meta-paths are primarily used for schema-simple HINs, e.g., bibliographic networks with only a few entity types, where meta-paths are often enumerated with domain knowledge. However, the adoption of meta-paths for schema-complex HINs, such as knowledge bases (KBs) with hundreds of entity and relation types, has been limited due to the computational complexity associated with meta-path enumeration. Additionally, effectively assessing meta-paths requires enumerating relevant path instances, which adds further complexity to the meta-path learning process. To address these challenges, we propose SchemaWalk, an inductive meta-path learning framework for schema-complex HINs. We represent meta-paths with schema-level representations to support the learning of the scores of meta-paths for varying relations, mitigating the need of exhaustive path instance enumeration for each relation. Further, we design a reinforcement-learning based path-finding agent, which directly navigates the network schema (i.e., schema graph) to learn policies for establishing meta-paths with high coverage and confidence for multiple relations. Extensive experiments on real data sets demonstrate the effectiveness of our proposed paradigm.
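To make "coverage and confidence" of a meta-path concrete, here is a toy computation in the spirit of rule mining; the definitions below are illustrative assumptions, not the paper's exact objective.

```python
# Score a candidate meta-path for a target relation on a tiny knowledge base:
# coverage  = fraction of target pairs the path connects
# confidence = fraction of path-connected pairs that satisfy the target relation
from itertools import product

edges = {  # relation -> set of (head, tail) pairs
    "works_at": {("alice", "lab1"), ("bob", "lab2")},
    "located_in": {("lab1", "paris"), ("lab2", "berlin")},
    "lives_in": {("alice", "paris"), ("bob", "berlin"), ("carol", "rome")},
}

def follow(pairs_a, pairs_b):  # compose two relations
    return {(h, t2) for (h, t1), (t1b, t2) in product(pairs_a, pairs_b) if t1 == t1b}

target = edges["lives_in"]
path_pairs = follow(edges["works_at"], edges["located_in"])  # works_at -> located_in

coverage = len(path_pairs & target) / len(target)
confidence = len(path_pairs & target) / len(path_pairs)
print(f"coverage={coverage:.2f} confidence={confidence:.2f}")  # 0.67 and 1.00
```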

Towards Efficient In-memory Computing Hardware for Quantized Neural Networks: State-of-the-art, Open Challenges and Perspectives

  • paper_url: http://arxiv.org/abs/2307.03936
  • repo_url: None
  • paper_authors: Olga Krestinskaya, Li Zhang, Khaled Nabil Salama
  • for: Surveys how to run quantized neural networks efficiently at the edge, where devices have limited energy and computational resources.
  • methods: Reviews In-memory Computing (IMC) hardware together with Quantized Neural Networks (QNNs), linking software-based quantization approaches to IMC hardware implementations (a minimal quantization example follows the abstract).
  • results: Provides a comprehensive review of IMC-based QNNs, along with open challenges, design requirements, recommendations, perspectives, and an IMC-based QNN hardware roadmap.
    Abstract The amount of data processed in the cloud, the development of Internet-of-Things (IoT) applications, and growing data privacy concerns force the transition from cloud-based to edge-based processing. Limited energy and computational resources on edge push the transition from traditional von Neumann architectures to In-memory Computing (IMC), especially for machine learning and neural network applications. Network compression techniques are applied to implement a neural network on limited hardware resources. Quantization is one of the most efficient network compression techniques allowing to reduce the memory footprint, latency, and energy consumption. This paper provides a comprehensive review of IMC-based Quantized Neural Networks (QNN) and links software-based quantization approaches to IMC hardware implementation. Moreover, open challenges, QNN design requirements, recommendations, and perspectives along with an IMC-based QNN hardware roadmap are provided.
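As background for the quantization techniques the survey covers, here is a minimal symmetric uniform weight quantizer (illustrative only, not tied to any particular IMC hardware):

```python
import numpy as np

def quantize(w: np.ndarray, bits: int):
    """Symmetric per-tensor uniform quantization (valid for bits <= 8 here)."""
    qmax = 2 ** (bits - 1) - 1            # e.g. 127 for int8
    scale = np.abs(w).max() / qmax        # per-tensor scale factor
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int8)
    return q, scale                        # dequantize with q * scale

w = np.random.randn(4, 4).astype(np.float32)
q, scale = quantize(w, bits=8)
print("max abs error:", np.abs(w - q * scale).max())  # bounded by ~scale/2
```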

Bounding data reconstruction attacks with the hypothesis testing interpretation of differential privacy

  • paper_url: http://arxiv.org/abs/2307.03928
  • repo_url: None
  • paper_authors: Georgios Kaissis, Jamie Hayes, Alexander Ziller, Daniel Rueckert
  • for: Studies Reconstruction Robustness (ReRo), an upper bound on the success of data reconstruction attacks against machine learning models.
  • methods: Prior work only gave asymptotic Monte Carlo estimates of a tight ReRo bound for specific DP mechanisms. This paper connects the hypothesis-testing interpretation of differential privacy to ReRo and derives closed-form, analytic, or numerical ReRo bounds for the Laplace and Gaussian mechanisms and their subsampled variants (the hypothesis-testing view is sketched after the abstract).
  • results: Directly computable ReRo bounds for general DP mechanisms, useful for assessing the success of reconstruction attacks and for choosing a DP mechanism.
    Abstract We explore Reconstruction Robustness (ReRo), which was recently proposed as an upper bound on the success of data reconstruction attacks against machine learning models. Previous research has demonstrated that differential privacy (DP) mechanisms also provide ReRo, but so far, only asymptotic Monte Carlo estimates of a tight ReRo bound have been shown. Directly computable ReRo bounds for general DP mechanisms are thus desirable. In this work, we establish a connection between hypothesis testing DP and ReRo and derive closed-form, analytic or numerical ReRo bounds for the Laplace and Gaussian mechanisms and their subsampled variants.
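The hypothesis-testing interpretation of DP can be made concrete for the Gaussian mechanism. The sketch below computes the standard trade-off curve between type-I and type-II errors for distinguishing neighboring datasets; this is well-known f-DP background, not the paper's ReRo bounds themselves, which are derived from this viewpoint.

```python
# For sensitivity-Delta queries with N(0, sigma^2) noise, distinguishing
# neighboring datasets reduces to testing N(0,1) vs N(mu,1) with mu = Delta/sigma.
from scipy.stats import norm

def gaussian_tradeoff(alpha: float, mu: float) -> float:
    """Minimal type-II error beta(alpha) for the mu-Gaussian DP trade-off curve."""
    return norm.cdf(norm.ppf(1.0 - alpha) - mu)

mu = 1.0 / 0.5  # Delta = 1, sigma = 0.5
for alpha in (0.01, 0.05, 0.1):
    print(alpha, round(gaussian_tradeoff(alpha, mu), 4))
# A curve close to beta = 1 - alpha means an attacker can barely beat random
# guessing, which is what upper-bounds reconstruction success.
```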

Applying human-centered AI in developing effective human-AI teaming: A perspective of human-AI joint cognitive systems

  • paper_url: http://arxiv.org/abs/2307.03913
  • repo_url: None
  • paper_authors: Wei Xu, Zaifeng Gao
  • for: Focuses on human-AI teaming (HAT) as a new paradigm for developing AI systems, and on the challenges and limitations of each member in human-AI collaboration.
  • methods: Proposes a conceptual framework of human-AI joint cognitive systems (HAIJCS) to represent and implement HAT for developing effective human-AI teaming.
  • results: Discusses the implications and future work for HAIJCS, arguing that it can support adopting HAT while remaining consistent with human-centered AI (HCAI).
    Abstract Research and application have used human-AI teaming (HAT) as a new paradigm to develop AI systems. HAT recognizes that AI will function as a teammate instead of simply a tool in collaboration with humans. Effective human-AI teams need to be capable of taking advantage of the unique abilities of both humans and AI while overcoming the known challenges and limitations of each member, augmenting human capabilities, and raising joint performance beyond that of either entity. The National AI Research and Strategic Plan 2023 update has recognized that research programs focusing primarily on the independent performance of AI systems generally fail to consider the functionality that AI must provide within the context of dynamic, adaptive, and collaborative teams and calls for further research on human-AI teaming and collaboration. However, there has been debate about whether AI can work as a teammate with humans. The primary concern is that adopting the "teaming" paradigm contradicts the human-centered AI (HCAI) approach, resulting in humans losing control of AI systems. This article further analyzes the HAT paradigm and the debates. Specifically, we elaborate on our proposed conceptual framework of human-AI joint cognitive systems (HAIJCS) and apply it to represent HAT under the HCAI umbrella. We believe that HAIJCS may help adopt HAI while enabling HCAI. The implications and future work for HAIJCS are also discussed. Insights: AI has led to the emergence of a new form of human-machine relationship: human-AI teaming (HAT), a paradigmatic shift in human-AI systems; We must follow a human-centered AI (HCAI) approach when applying HAT as a new design paradigm; We propose a conceptual framework of human-AI joint cognitive systems (HAIJCS) to represent and implement HAT for developing effective human-AI teaming

ScriptWorld: Text Based Environment For Learning Procedural Knowledge

  • paper_url: http://arxiv.org/abs/2307.03906
  • repo_url: https://github.com/exploration-lab/scriptworld
  • paper_authors: Abhinav Joshi, Areeb Ahmad, Umang Pandey, Ashutosh Modi
  • for: Develops a text-based reinforcement learning environment that imparts commonsense knowledge about real-world daily activities and exercises natural language understanding.
  • methods: Introduces ScriptWorld, an interactive text-based gaming framework built from a scripts dataset and covering 10 daily activities, with a detailed analysis of the environment; RL-based baseline agents play the games, leveraging features from pre-trained language models (a hypothetical mini-environment is sketched after the abstract).
  • results: Experiments show that prior knowledge from a pre-trained language model helps agents solve these real-world text-based gaming environments.
    Abstract Text-based games provide a framework for developing natural language understanding and commonsense knowledge about the world in reinforcement learning based agents. Existing text-based environments often rely on fictional situations and characters to create a gaming framework and are far from real-world scenarios. In this paper, we introduce ScriptWorld: a text-based environment for teaching agents about real-world daily chores and hence imparting commonsense knowledge. To the best of our knowledge, it is the first interactive text-based gaming framework that consists of daily real-world human activities designed using a scripts dataset. We provide gaming environments for 10 daily activities and perform a detailed analysis of the proposed environment. We develop RL-based baseline models/agents to play the games in Scriptworld. To understand the role of language models in such environments, we leverage features obtained from pre-trained language models in the RL agents. Our experiments show that prior knowledge obtained from a pre-trained language model helps to solve real-world text-based gaming environments. We release the environment via Github: https://github.com/Exploration-Lab/ScriptWorld
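To convey the interaction style of such an environment, here is a hypothetical mini-environment plus a random baseline agent; the real ScriptWorld API lives in the linked repository, and all names below are illustrative.

```python
# Hypothetical mini text environment in the spirit of ScriptWorld: the agent
# must pick the next step of a daily activity from shuffled candidate actions.
import random

SCRIPT = ["enter kitchen", "fill kettle", "boil water", "pour tea", "drink tea"]

class MiniScriptEnv:
    def reset(self):
        self.t = 0
        return self._obs()

    def _obs(self):
        return {"done_so_far": SCRIPT[:self.t],
                "options": random.sample(SCRIPT, len(SCRIPT))}

    def step(self, action: str):
        reward = 1.0 if action == SCRIPT[self.t] else -1.0
        if reward > 0:
            self.t += 1
        done = self.t == len(SCRIPT)
        return self._obs(), reward, done

env = MiniScriptEnv()
obs, done, total = env.reset(), False, 0.0
while not done:  # random baseline agent
    obs, r, done = env.step(random.choice(obs["options"]))
    total += r
print("episode return:", total)
```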

Improving Prototypical Part Networks with Reward Reweighing, Reselection, and Retraining

  • paper_url: http://arxiv.org/abs/2307.03887
  • repo_url: None
  • paper_authors: Robin Netzorg, Jiaxun Li, Bin Yu
  • for: Improves deep interpretable image classification methods that attribute a model's output to meaningful parts of the input.
  • methods: Builds on the prototypical part network (ProtoPNet), which often classifies from spurious or inconsistent image parts. Taking inspiration from Reinforcement Learning with Human Feedback (RLHF), the authors collect 1-5 scale human annotations of prototype quality on CUB-200-2011 and fit a reward model that identifies non-spurious prototypes (a minimal reward-model sketch follows the abstract).
  • results: The resulting R3-ProtoPNet adds reward-based reweighting, reselection, and retraining to the ProtoPNet training loop, improving prototype consistency and meaningfulness; used alone it lowers test accuracy slightly, but an ensemble of R3-ProtoPNets improves test performance while maintaining interpretability.
    Abstract In recent years, work has gone into developing deep interpretable methods for image classification that clearly attribute a model's output to specific features of the data. One such method is the prototypical part network (ProtoPNet), which attempts to classify images based on meaningful parts of the input. While this method results in interpretable classifications, this method often learns to classify from spurious or inconsistent parts of the image. Hoping to remedy this, we take inspiration from the recent developments in Reinforcement Learning with Human Feedback (RLHF) to fine-tune these prototypes. By collecting human annotations of prototypes quality via a 1-5 scale on the CUB-200-2011 dataset, we construct a reward model that learns to identify non-spurious prototypes. In place of a full RL update, we propose the reweighted, reselected, and retrained prototypical part network (R3-ProtoPNet), which adds an additional three steps to the ProtoPNet training loop. The first two steps are reward-based reweighting and reselection, which align prototypes with human feedback. The final step is retraining to realign the model's features with the updated prototypes. We find that R3-ProtoPNet improves the overall consistency and meaningfulness of the prototypes, but lowers the test predictive accuracy when used independently. When multiple R3-ProtoPNets are incorporated into an ensemble, we find an increase in test predictive performance while maintaining interpretability.
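A minimal sketch of the reward-modeling idea, under simplifying assumptions (random toy features stand in for prototype representations; the paper's actual architecture and its reweighting/retraining steps are richer):

```python
# Fit a reward model on 1-5 human ratings of prototype quality, then keep only
# prototypes whose predicted reward clears a threshold (the "reselection" idea).
import torch
import torch.nn as nn

feats = torch.randn(200, 64)                     # toy prototype feature vectors
ratings = torch.randint(1, 6, (200, 1)).float()  # toy human 1-5 quality ratings

reward_model = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 1))
opt = torch.optim.Adam(reward_model.parameters(), lr=1e-3)
for _ in range(200):
    opt.zero_grad()
    loss = nn.functional.mse_loss(reward_model(feats), ratings)
    loss.backward()
    opt.step()

with torch.no_grad():
    keep = reward_model(feats).squeeze(1) >= 3.0  # reselect non-spurious prototypes
print(f"kept {int(keep.sum())} / {len(feats)} prototypes")
```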

Designing Mixed-Initiative Video Games

  • paper_url: http://arxiv.org/abs/2307.03877
  • repo_url: None
  • paper_authors: Daijin Yang
  • for: Explores gamification of mixed-initiative co-creation to make human-AI interactions accessible and fun.
  • methods: The author prototyped Snake Story, a mixed-initiative game in which players select AI-generated texts to write a snake's story while playing a "Snake"-like game; a controlled experiment compared player-AI interactions with and without the game component.
  • results: In a study with 11 players, players used different strategies in the two versions; game mechanics significantly affected the output stories, players' creative processes, and role perceptions, and players with different backgrounds preferred different versions.
    Abstract The development of Artificial Intelligence (AI) enables humans to co-create content with machines. The unexpectedness of AI-generated content can bring inspiration and entertainment to users. However, the co-creation interactions are always designed for content creators and have poor accessibility. To explore gamification of mixed-initiative co-creation and make human-AI interactions accessible and fun for players, I prototyped Snake Story, a mixed-initiative game where players can select AI-generated texts to write a story of a snake by playing a "Snake" like game. A controlled experiment was conducted to investigate the dynamics of player-AI interactions with and without the game component in the designed interface. As a result of a study with 11 players (n=11), I found that players utilized different strategies when playing with the two versions, game mechanics significantly affected the output stories, players' creative process, as well as role perceptions, and players with different backgrounds showed different preferences for the two versions. Based on these results, I further discussed considerations for mixed-initiative game design. This work aims to inspire the design of engaging co-creation experiences.

Large Language Models for Supply Chain Optimization

  • paper_url: http://arxiv.org/abs/2307.03875
  • repo_url: https://github.com/jettbrains/-L-
  • paper_authors: Beibin Li, Konstantina Mellou, Bo Zhang, Jeevan Pathuri, Ishai Menache
  • for: Uses large language models (LLMs) to make supply chain automation easier to understand and trust.
  • methods: Proposes OptiGuide, a framework that accepts plain-text queries and returns insights about the underlying optimization outcomes. It keeps state-of-the-art combinatorial optimization in the loop to answer what-if questions quantitatively (e.g., how would the cost change if supplier B replaced supplier A for a given demand?) without sending proprietary data to the LLM (the what-if pattern is sketched after the abstract).
  • results: Demonstrated on a real server placement scenario within Microsoft's cloud supply chain, together with a general benchmark for evaluating the accuracy of LLM output in other scenarios.
    Abstract Supply chain operations traditionally involve a variety of complex decision making problems. Over the last few decades, supply chains greatly benefited from advances in computation, which allowed the transition from manual processing to automation and cost-effective optimization. Nonetheless, business operators still need to spend substantial efforts in explaining and interpreting the optimization outcomes to stakeholders. Motivated by the recent advances in Large Language Models (LLMs), we study how this disruptive technology can help bridge the gap between supply chain automation and human comprehension and trust thereof. We design OptiGuide -- a framework that accepts as input queries in plain text, and outputs insights about the underlying optimization outcomes. Our framework does not forgo the state-of-the-art combinatorial optimization technology, but rather leverages it to quantitatively answer what-if scenarios (e.g., how would the cost change if we used supplier B instead of supplier A for a given demand?). Importantly, our design does not require sending proprietary data over to LLMs, which can be a privacy concern in some circumstances. We demonstrate the effectiveness of our framework on a real server placement scenario within Microsoft's cloud supply chain. Along the way, we develop a general evaluation benchmark, which can be used to evaluate the accuracy of the LLM output in other scenarios.
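The what-if pattern — re-solving the underlying optimization with a constraint toggled and reporting the delta — can be sketched on a toy sourcing LP (illustrative only, not OptiGuide itself):

```python
from scipy.optimize import linprog

cost = [4.0, 5.0]   # unit cost for supplier A, supplier B
demand = 100.0

def solve(force_b: bool) -> float:
    # minimize cost subject to x_A + x_B = demand, x >= 0;
    # forcing supplier B means pinning x_A to zero
    bounds = [(0, 0), (0, None)] if force_b else [(0, None), (0, None)]
    res = linprog(cost, A_eq=[[1, 1]], b_eq=[demand], bounds=bounds)
    return res.fun

base, what_if = solve(False), solve(True)
print(f"baseline cost={base:.0f}, supplier B only={what_if:.0f}, "
      f"delta={what_if - base:+.0f}")
# An LLM front end would translate "what if we used supplier B instead?" into
# the constraint change above and narrate the resulting delta.
```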

Domain Adaptation using Silver Standard Labels for Ki-67 Scoring in Digital Pathology: A Step Closer to Widescale Deployment

  • paper_url: http://arxiv.org/abs/2307.03872
  • repo_url: None
  • paper_authors: Amanda Dy, Ngoc-Nhu Jennifer Nguyen, Seyed Hossein Mirjahanmardi, Melanie Dawe, Anthony Fyles, Wei Shi, Fei-Fei Liu, Dimitrios Androutsos, Susan Done, April Khademi
  • for: Improves the objectivity, efficiency, and cross-centre robustness of Ki-67 proliferation index (PI) scoring in digital pathology.
  • methods: An unsupervised framework generates silver standard (SS, pseudo) labels in the target domain, which augment gold standard (GS) source-domain labels for training; five training regimes were tested on two validated Ki-67 architectures (UV-Net and piNET). A pseudo-labeling sketch follows the abstract.
  • results: The SS+GS regime (trained on SS labels, fine-tuned on GS labels) gave the highest PI accuracy (95.9%) and more consistent results on target data; t-SNE analysis shows its features are better aligned across source and target data, improving generalization.
    Abstract Deep learning systems have been proposed to improve the objectivity and efficiency of Ki- 67 PI scoring. The challenge is that while very accurate, deep learning techniques suffer from reduced performance when applied to out-of-domain data. This is a critical challenge for clinical translation, as models are typically trained using data available to the vendor, which is not from the target domain. To address this challenge, this study proposes a domain adaptation pipeline that employs an unsupervised framework to generate silver standard (pseudo) labels in the target domain, which is used to augment the gold standard (GS) source domain data. Five training regimes were tested on two validated Ki-67 scoring architectures (UV-Net and piNET), (1) SS Only: trained on target silver standard (SS) labels, (2) GS Only: trained on source GS labels, (3) Mixed: trained on target SS and source GS labels, (4) GS+SS: trained on source GS labels and fine-tuned on target SS labels, and our proposed method (5) SS+GS: trained on source SS labels and fine-tuned on source GS labels. The SS+GS method yielded significantly (p < 0.05) higher PI accuracy (95.9%) and more consistent results compared to the GS Only model on target data. Analysis of t-SNE plots showed features learned by the SS+GS models are more aligned for source and target data, resulting in improved generalization. The proposed pipeline provides an efficient method for learning the target distribution without manual annotations, which are time-consuming and costly to generate for medical images. This framework can be applied to any target site as a per-laboratory calibration method, for widescale deployment.
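A minimal pseudo-labeling sketch of the silver-standard idea, with a toy segmenter and random tensors standing in for target-domain images (illustrative; the paper's pipeline, architectures, and training order are more involved):

```python
# Step 1: a source-trained model mines silver-standard labels on unlabeled
# target images. Step 2: train on those labels. Step 3 (not shown): fine-tune
# on gold-standard labels, per the SS+GS regime.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
                      nn.Conv2d(8, 2, 1))          # toy 2-class segmenter
target_imgs = torch.randn(16, 1, 64, 64)           # unlabeled target-domain data

with torch.no_grad():                               # step 1: mine silver labels
    silver = model(target_imgs).argmax(dim=1)

opt = torch.optim.Adam(model.parameters(), lr=1e-4)  # step 2: train on them
for _ in range(5):
    opt.zero_grad()
    loss = nn.functional.cross_entropy(model(target_imgs), silver)
    loss.backward()
    opt.step()
```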

Personalized Resource Allocation in Wireless Networks: An AI-Enabled and Big Data-Driven Multi-Objective Optimization

  • paper_url: http://arxiv.org/abs/2307.03867
  • repo_url: None
  • paper_authors: Rawan Alkurd, Ibrahim Abualhaol, Halim Yanikomeroglu
  • for: Describes how artificial intelligence (AI) can drive wireless network design and optimization as complexity grows in the 5G era and beyond.
  • methods: Uses AI to enable user-level personalization, supported by an intelligent, big-data-driven layer that micro-manages scarce network resources in real time.
  • results: Personalized networks can decide, in real time, the service quality needed to reach each user's target satisfaction level, improving the two traditionally conflicting objectives of saving resources and raising user satisfaction.
    Abstract The design and optimization of wireless networks have mostly been based on strong mathematical and theoretical modeling. Nonetheless, as novel applications emerge in the era of 5G and beyond, unprecedented levels of complexity will be encountered in the design and optimization of the network. As a result, the use of Artificial Intelligence (AI) is envisioned for wireless network design and optimization due to the flexibility and adaptability it offers in solving extremely complex problems in real-time. One of the main future applications of AI is enabling user-level personalization for numerous use cases. AI will revolutionize the way we interact with computers in which computers will be able to sense commands and emotions from humans in a non-intrusive manner, making the entire process transparent to users. By leveraging this capability, and accelerated by the advances in computing technologies, wireless networks can be redesigned to enable the personalization of network services to the user level in real-time. While current wireless networks are being optimized to achieve a predefined set of quality requirements, the personalization technology advocated in this article is supported by an intelligent big data-driven layer designed to micro-manage the scarce network resources. This layer provides the intelligence required to decide the necessary service quality that achieves the target satisfaction level for each user. Due to its dynamic and flexible design, personalized networks are expected to achieve unprecedented improvements in optimizing two contradicting objectives in wireless networks: saving resources and improving user satisfaction levels.

Reinforcement and Deep Reinforcement Learning-based Solutions for Machine Maintenance Planning, Scheduling Policies, and Optimization

  • paper_url: http://arxiv.org/abs/2307.03860
  • repo_url: None
  • paper_authors: Oluwaseyi Ogunfowora, Homayoun Najjaran
  • for: Reviews the literature on applying reinforcement learning to maintenance planning and optimization problems.
  • methods: Surveys reinforcement learning and deep reinforcement learning as data-driven decision-making algorithms for building dynamic maintenance plans from condition-monitoring data; taxonomies are developed to classify and summarize the reviewed publications (a toy maintenance MDP with tabular Q-learning follows the abstract).
  • results: Summarizes common themes, methodologies, and findings in graphical and tabular form, and highlights research gaps, key insights, and directions for future work.
    Abstract Systems and machines undergo various failure modes that result in machine health degradation, so maintenance actions are required to restore them to a state where they can perform their expected functions. Since maintenance tasks are inevitable, maintenance planning is essential to ensure the smooth operations of the production system and other industries at large. Maintenance planning is a decision-making problem that aims at developing optimum maintenance policies and plans that help reduce maintenance costs, extend asset life, maximize their availability, and ultimately ensure workplace safety. Reinforcement learning is a data-driven decision-making algorithm that has been increasingly applied to develop dynamic maintenance plans while leveraging the continuous information from condition monitoring of the system and machine states. By leveraging the condition monitoring data of systems and machines with reinforcement learning, smart maintenance planners can be developed, which is a precursor to achieving a smart factory. This paper presents a literature review on the applications of reinforcement and deep reinforcement learning for maintenance planning and optimization problems. To capture the common ideas without losing touch with the uniqueness of each publication, taxonomies used to categorize the systems were developed, and reviewed publications were highlighted, classified, and summarized based on these taxonomies. Adopted methodologies, findings, and well-defined interpretations of the reviewed studies were summarized in graphical and tabular representations to maximize the utility of the work for both researchers and practitioners. This work also highlights the research gaps, key insights from the literature, and areas for future work.
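As a concrete instance of the RL formulation the review surveys, here is a toy maintenance MDP solved with tabular Q-learning (illustrative assumptions throughout: states are wear levels, actions are run vs. maintain):

```python
import random

STATES, ACTIONS = 5, ("run", "maintain")   # wear levels 0..4; 4 = failed
Q = {(s, a): 0.0 for s in range(STATES) for a in ACTIONS}
alpha, gamma, eps = 0.1, 0.95, 0.1

def step(s, a):
    if a == "maintain":
        return 0, -3.0                      # pay maintenance cost, reset wear
    if s == STATES - 1:
        return 0, -20.0                     # failure: big cost, forced repair
    return s + (random.random() < 0.5), 1.0  # produce; wear may increase

s = 0
for _ in range(20000):
    a = random.choice(ACTIONS) if random.random() < eps \
        else max(ACTIONS, key=lambda act: Q[(s, act)])
    s2, r = step(s, a)
    Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, b)] for b in ACTIONS) - Q[(s, a)])
    s = s2

policy = {s: max(ACTIONS, key=lambda act: Q[(s, act)]) for s in range(STATES)}
print(policy)  # typically: run at low wear, maintain near failure
```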

Teach Me How to Learn: A Perspective Review towards User-centered Neuro-symbolic Learning for Robotic Surgical Systems

  • paper_url: http://arxiv.org/abs/2307.03853
  • repo_url: None
  • paper_authors: Amr Gomaa, Bilal Mahdy, Niko Kleer, Michael Feld, Frank Kirchner, Antonio Krüger
  • for: Develops a human-in-the-loop learning paradigm for teaching robots on both the perceptual (non-symbolic) and conceptual (symbolic) levels, targeted at robotic surgical systems.
  • methods: Proposes a user-centered hybrid neuro-symbolic learning concept with expert feedback, and surveys related research on human-in-the-loop surgical robotic systems.
  • results: The review highlights the most prominent solutions for autonomous surgical robots and the challenges surgeons face when interacting with them, and envisions online apprenticeship learning from implicit and explicit expert feedback as a way to address these challenges.
    Abstract Recent advances in machine learning models allowed robots to identify objects on a perceptual nonsymbolic level (e.g., through sensor fusion and natural language understanding). However, these primarily black-box learning models still lack interpretation and transferability and require high data and computational demand. An alternative solution is to teach a robot on both perceptual nonsymbolic and conceptual symbolic levels through hybrid neurosymbolic learning approaches with expert feedback (i.e., human-in-the-loop learning). This work proposes a concept for this user-centered hybrid learning paradigm that focuses on robotic surgical situations. While most recent research focused on hybrid learning for non-robotic and some generic robotic domains, little work focuses on surgical robotics. We survey this related research while focusing on human-in-the-loop surgical robotic systems. This evaluation highlights the most prominent solutions for autonomous surgical robots and the challenges surgeons face when interacting with these systems. Finally, we envision possible ways to address these challenges using online apprenticeship learning based on implicit and explicit feedback from expert surgeons.

Optimal Learners for Realizable Regression: PAC Learning and Online Learning

  • paper_url: http://arxiv.org/abs/2307.03848
  • repo_url: None
  • paper_authors: Idan Attias, Steve Hanneke, Alkis Kalavasis, Amin Karbasi, Grigoris Velegkas
  • for: Characterizes the statistical complexity of realizable regression in both the PAC learning setting and the online learning setting.
  • methods: Introduces a minimax instance-optimal learner for realizable regression and a novel dimension that qualitatively and quantitatively characterizes which classes of real-valued predictors are learnable; also identifies a combinatorial dimension related to the Graph dimension that characterizes ERM learnability (the classical fat-shattering dimension this line of work builds on is recalled after the abstract).
  • results: Establishes a necessary condition for learnability based on a combinatorial dimension related to the DS dimension, conjectured to also be sufficient, and resolves an open question of Daskalakis and Golowich (STOC '22) by designing an optimal online learner for realizable regression.
    Abstract In this work, we aim to characterize the statistical complexity of realizable regression both in the PAC learning setting and the online learning setting. Previous work had established the sufficiency of finiteness of the fat shattering dimension for PAC learnability and the necessity of finiteness of the scaled Natarajan dimension, but little progress had been made towards a more complete characterization since the work of Simon 1997 (SICOMP '97). To this end, we first introduce a minimax instance optimal learner for realizable regression and propose a novel dimension that both qualitatively and quantitatively characterizes which classes of real-valued predictors are learnable. We then identify a combinatorial dimension related to the Graph dimension that characterizes ERM learnability in the realizable setting. Finally, we establish a necessary condition for learnability based on a combinatorial dimension related to the DS dimension, and conjecture that it may also be sufficient in this context. Additionally, in the context of online learning we provide a dimension that characterizes the minimax instance optimal cumulative loss up to a constant factor and design an optimal online learner for realizable regression, thus resolving an open question raised by Daskalakis and Golowich in STOC '22.
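For readers new to the area, the scale-sensitive dimension mentioned in the abstract is the fat-shattering dimension; the following is standard background, not the paper's new dimension.

```latex
A set $\{x_1,\dots,x_d\}$ is \emph{$\gamma$-shattered} by a class
$\mathcal{F} \subseteq [0,1]^{\mathcal{X}}$ if there exist witnesses
$r_1,\dots,r_d \in [0,1]$ such that for every $b \in \{0,1\}^d$ there is
$f_b \in \mathcal{F}$ with
\[
  f_b(x_i) \ge r_i + \gamma \ \text{if } b_i = 1,
  \qquad
  f_b(x_i) \le r_i - \gamma \ \text{if } b_i = 0 .
\]
The fat-shattering dimension $\mathrm{fat}_\gamma(\mathcal{F})$ is the largest
such $d$; finiteness at every scale $\gamma > 0$ suffices for PAC learnability
of realizable regression, which is the prior result the paper refines.
```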

RADAR: Robust AI-Text Detection via Adversarial Learning

  • paper_url: http://arxiv.org/abs/2307.03838
  • repo_url: None
  • paper_authors: Xiaomeng Hu, Pin-Yu Chen, Tsung-Yi Ho
  • for: Proposes RADAR, a framework for robustly detecting AI-generated text, including text that has been paraphrased by LLMs to evade detection.
  • methods: RADAR jointly trains a robust AI-text detector and a paraphraser via adversarial learning: the paraphraser generates realistic content to evade detection, and each component is updated with feedback from the other (a toy version of the loop follows the abstract).
  • results: Across 8 LLMs (Pythia, Dolly 2.0, Palmyra, Camel, GPT-J, Dolly 1.0, LLaMA, and Vicuna) and 4 datasets, RADAR significantly outperforms existing AI-text detection methods, especially under paraphrasing; it transfers well from instruction-tuned LLMs to other LLMs and improves further via GPT-3.5.
    Abstract Recent advances in large language models (LLMs) and the intensifying popularity of ChatGPT-like applications have blurred the boundary of high-quality text generation between humans and machines. However, in addition to the anticipated revolutionary changes to our technology and society, the difficulty of distinguishing LLM-generated texts (AI-text) from human-generated texts poses new challenges of misuse and fairness, such as fake content generation, plagiarism, and false accusation of innocent writers. While existing works show that current AI-text detectors are not robust to LLM-based paraphrasing, this paper aims to bridge this gap by proposing a new framework called RADAR, which jointly trains a Robust AI-text Detector via Adversarial leaRning. RADAR is based on adversarial training of a paraphraser and a detector. The paraphraser's goal is to generate realistic contents to evade AI-text detection. RADAR uses the feedback from the detector to update the paraphraser, and vice versa. Evaluated with 8 different LLMs (Pythia, Dolly 2.0, Palmyra, Camel, GPT-J, Dolly 1.0, LLaMA, and Vicuna) across 4 datasets, experimental results show that RADAR significantly outperforms existing AI-text detection methods, especially when paraphrasing is in place. We also identify the strong transferability of RADAR from instruction-tuned LLMs to other LLMs, and evaluate the improved capability of RADAR via GPT-3.5.
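A toy version of the adversarial loop, with small linear models and random features standing in for LLM-scale components (a sketch of the training dynamic, not RADAR's implementation):

```python
import torch
import torch.nn as nn

detector = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1))
paraphraser = nn.Linear(16, 16)          # stand-in for a seq2seq paraphraser
opt_d = torch.optim.Adam(detector.parameters(), lr=1e-3)
opt_p = torch.optim.Adam(paraphraser.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

for _ in range(500):
    human = torch.randn(64, 16)          # toy "human text" features
    ai = torch.randn(64, 16) + 1.0       # toy "AI text" features
    # detector step: classify human=0, paraphrased-AI=1
    para = paraphraser(ai).detach()
    d_loss = bce(detector(human), torch.zeros(64, 1)) + \
             bce(detector(para), torch.ones(64, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    # paraphraser step: fool the detector into predicting "human"
    p_loss = bce(detector(paraphraser(ai)), torch.zeros(64, 1))
    opt_p.zero_grad(); p_loss.backward(); opt_p.step()
```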

Back to Optimization: Diffusion-based Zero-Shot 3D Human Pose Estimation

  • paper_url: http://arxiv.org/abs/2307.03833
  • repo_url: https://github.com/ipl-uw/ZeDO-Release
  • paper_authors: Zhongyu Jiang, Zhuoran Zhou, Lei Li, Wenhao Chai, Cheng-Yen Yang, Jenq-Neng Hwang
  • for: 3D human pose estimation (HPE) tasks in the wild, where traditional optimization-based methods have limited performance and learning-based methods have difficulty generalizing to new domains and scenarios.
  • methods: Zero-shot Diffusion-based Optimization (ZeDO) pipeline, which combines the advantages of optimization-based and learning-based methods by using a diffusion process to refine the pose estimates and a multi-hypothesis framework to handle cross-domain and in-the-wild variations.
  • results: state-of-the-art (SOTA) performance on Human3.6M and 3DPW datasets, with minMPJPE $51.4$mm and PA-MPJPE $42.6$mm, respectively, without requiring any 2D-3D or image-3D pairs for training.
    Abstract Learning-based methods have dominated the 3D human pose estimation (HPE) tasks with significantly better performance in most benchmarks than traditional optimization-based methods. Nonetheless, 3D HPE in the wild is still the biggest challenge of learning-based models, whether with 2D-3D lifting, image-to-3D, or diffusion-based methods, since the trained networks implicitly learn camera intrinsic parameters and domain-based 3D human pose distributions and estimate poses by statistical average. On the other hand, the optimization-based methods estimate results case-by-case, which can predict more diverse and sophisticated human poses in the wild. By combining the advantages of optimization-based and learning-based methods, we propose the Zero-shot Diffusion-based Optimization (ZeDO) pipeline for 3D HPE to solve the problem of cross-domain and in-the-wild 3D HPE. Our multi-hypothesis ZeDO achieves state-of-the-art (SOTA) performance on Human3.6M as minMPJPE $51.4$mm without training with any 2D-3D or image-3D pairs. Moreover, our single-hypothesis ZeDO achieves SOTA performance on 3DPW dataset with PA-MPJPE $42.6$mm on cross-dataset evaluation, which even outperforms learning-based methods trained on 3DPW.

Effect of Intensity Standardization on Deep Learning for WML Segmentation in Multi-Centre FLAIR MRI

  • paper_url: http://arxiv.org/abs/2307.03827
  • repo_url: None
  • paper_authors: Abdollah Ghazvanchahi, Pejman Jahbedar Maralani, Alan R. Moody, April Khademi
  • for: Improves white matter lesion (WML) segmentation in multi-centre FLAIR MRI, where DL models lose performance on out-of-distribution scanners and centres.
  • methods: Evaluates several intensity standardization methods as preprocessing, including IAMLAB (developed specifically for FLAIR MRI) and popular techniques such as White-stripe, Nyul, and Z-score (sketched after the abstract), plus an Ensemble combining their predictions; a skip-connection UNet is trained on the standardized images.
  • results: IAMLAB and the Ensemble yield the highest WML segmentation performance and more consistent results than models trained on original data or other normalizations, on both in-distribution and clinical out-of-distribution data.
    Abstract Deep learning (DL) methods for white matter lesion (WML) segmentation in MRI suffer a reduction in performance when applied on data from a scanner or centre that is out-of-distribution (OOD) from the training data. This is critical for translation and widescale adoption, since current models cannot be readily applied to data from new institutions. In this work, we evaluate several intensity standardization methods for MRI as a preprocessing step for WML segmentation in multi-centre Fluid-Attenuated Inversion Recovery (FLAIR) MRI. We evaluate a method specifically developed for FLAIR MRI called IAMLAB along with other popular normalization techniques such as White-strip, Nyul and Z-score. We proposed an Ensemble model that combines predictions from each of these models. A skip-connection UNet (SC UNet) was trained on the standardized images, as well as the original data and segmentation performance was evaluated over several dimensions. The training (in-distribution) data consists of a single study, of 60 volumes, and the test (OOD) data is 128 unseen volumes from three clinical cohorts. Results show IAMLAB and Ensemble provide higher WML segmentation performance compared to models from original data or other normalization methods. IAMLAB & Ensemble have the highest dice similarity coefficient (DSC) on the in-distribution data (0.78 & 0.80) and on clinical OOD data. DSC was significantly higher for IAMLAB compared to the original data (p<0.05) for all lesion categories (LL>25mL: 0.77 vs. 0.71; 10mL<= LL<25mL: 0.66 vs. 0.61; LL<10mL: 0.53 vs. 0.52). The IAMLAB and Ensemble normalization methods are mitigating MRI domain shift and are optimal for DL-based WML segmentation in unseen FLAIR data.
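Of the compared methods, Z-score standardization is the simplest to state precisely; a minimal sketch follows (IAMLAB and White-stripe are more involved and are described in the paper):

```python
# Z-score intensity standardization: normalize each volume by the mean and
# standard deviation of intensities inside a foreground/brain mask.
import numpy as np

def zscore_normalize(volume: np.ndarray, mask: np.ndarray) -> np.ndarray:
    fg = volume[mask > 0]
    return (volume - fg.mean()) / (fg.std() + 1e-8)

vol = np.abs(np.random.randn(32, 32, 16)) * 200.0  # toy FLAIR volume
mask = vol > 50.0                                   # crude foreground mask
print(zscore_normalize(vol, mask)[mask > 0].std())  # ~1.0 inside the mask
```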

How does AI chat change search behaviors?

  • paper_url: http://arxiv.org/abs/2307.03826
  • repo_url: None
  • paper_authors: Robert Capra, Jaime Arguello
  • for: A preliminary investigation of how people use a generative AI chat system (referred to simply as chat) as part of a search process, and how combining chat with existing search tools may affect search behaviors and strategies.
  • methods: An exploratory user study with 10 participants using a combined Chat+Search system built on the OpenAI GPT-3.5 API and the Bing Web Search v5 API; participants completed three search tasks.
  • results: This pre-print reports how users integrated AI chat into their search process, what they liked and disliked about the chat system, their trust in its responses, and their mental models of how it generated responses.
    Abstract Generative AI tools such as chatGPT are poised to change the way people engage with online information. Recently, Microsoft announced their "new Bing" search system which incorporates chat and generative AI technology from OpenAI. Google has announced plans to deploy search interfaces that incorporate similar types of technology. These new technologies will transform how people can search for information. The research presented here is an early investigation into how people make use of a generative AI chat system (referred to simply as chat from here on) as part of a search process, and how the incorporation of chat systems with existing search tools may effect users search behaviors and strategies. We report on an exploratory user study with 10 participants who used a combined Chat+Search system that utilized the OpenAI GPT-3.5 API and the Bing Web Search v5 API. Participants completed three search tasks. In this pre-print paper of preliminary results, we report on ways that users integrated AI chat into their search process, things they liked and disliked about the chat system, their trust in the chat responses, and their mental models of how the chat system generated responses.

Exploring and Characterizing Large Language Models For Embedded System Development and Debugging

  • paper_url: http://arxiv.org/abs/2307.03817
  • repo_url: None
  • paper_authors: Zachary Englhardt, Richard Li, Dilini Nissanka, Zhihan Zhang, Girish Narayanswamy, Joseph Breda, Xin Liu, Shwetak Patel, Vikram Iyer
  • for: Systematically evaluates leading large language models (GPT-3.5, GPT-4, PaLM 2) for embedded system development, studies how human programmers interact with these tools, and develops an AI-based software engineering workflow for building embedded systems.
  • methods: An end-to-end hardware-in-the-loop (HIL) evaluation platform verifies LLM-generated programs using sensor-actuator pairs; the three models are compared over N=450 experiments.
  • results: GPT-4 in particular shows exceptional cross-domain understanding and reasoning, in some cases generating fully correct programs from a single prompt; in N=50 trials it produced functional I2C interfaces 66% of the time, and it generated register-level drivers, LoRa communication code, and context-specific power optimizations that cut an nRF52 program's current draw by over 740x, to 12.2 uA. The workflow, evaluated with 15 novice and expert users, improved productivity for all users and raised the success rate of building a LoRa environmental sensor from 25% to 100%.
    Abstract Large language models (LLMs) have shown remarkable abilities to generate code, however their ability to develop software for embedded systems, which requires cross-domain knowledge of hardware and software has not been studied. In this paper we systematically evaluate leading LLMs (GPT-3.5, GPT-4, PaLM 2) to assess their performance for embedded system development, study how human programmers interact with these tools, and develop an AI-based software engineering workflow for building embedded systems. We develop an end-to-end hardware-in-the-loop evaluation platform for verifying LLM generated programs using sensor actuator pairs. We compare all three models with N=450 experiments and find surprisingly that GPT-4 especially shows an exceptional level of cross-domain understanding and reasoning, in some cases generating fully correct programs from a single prompt. In N=50 trials, GPT-4 produces functional I2C interfaces 66% of the time. GPT-4 also produces register-level drivers, code for LoRa communication, and context-specific power optimizations for an nRF52 program resulting in over 740x current reduction to 12.2 uA. We also characterize the models' limitations to develop a generalizable workflow for using LLMs in embedded system development. We evaluate the workflow with 15 users including novice and expert programmers. We find that our workflow improves productivity for all users and increases the success rate for building a LoRa environmental sensor from 25% to 100%, including for users with zero hardware or C/C++ experience.

For Women, Life, Freedom: A Participatory AI-Based Social Web Analysis of a Watershed Moment in Iran’s Gender Struggles

  • paper_url: http://arxiv.org/abs/2307.03764
  • repo_url: None
  • paper_authors: Adel Khorramrouz, Sujan Dutta, Ashiqur R. KhudaBukhsh
  • for: A computational analysis of Persian-language Twitter discourse to estimate the shift in stance toward gender equality following the death of Mahsa Amini in police custody.
  • methods: An ensemble active learning pipeline trains a stance classifier, with Iranian women in an active annotator role: besides labels, they contribute valuable keywords for more meaningful corpus creation and short example documents for a guided sampling step.
  • results: Mahsa Amini's death triggered polarized Persian-language discourse, with both negative and positive tweets toward gender equality increasing; the increase in positive tweets was slightly greater. With respect to account creation time, pro-protest accounts resemble baseline Persian Twitter activity more closely than state-aligned accounts do.
    Abstract In this paper, we present a computational analysis of the Persian language Twitter discourse with the aim to estimate the shift in stance toward gender equality following the death of Mahsa Amini in police custody. We present an ensemble active learning pipeline to train a stance classifier. Our novelty lies in the involvement of Iranian women in an active role as annotators in building this AI system. Our annotators not only provide labels, but they also suggest valuable keywords for more meaningful corpus creation as well as provide short example documents for a guided sampling step. Our analyses indicate that Mahsa Amini's death triggered polarized Persian language discourse where both fractions of negative and positive tweets toward gender equality increased. The increase in positive tweets was slightly greater than the increase in negative tweets. We also observe that with respect to account creation time, between the state-aligned Twitter accounts and pro-protest Twitter accounts, pro-protest accounts are more similar to baseline Persian Twitter activity.

URL: A Representation Learning Benchmark for Transferable Uncertainty Estimates

  • paper_url: http://arxiv.org/abs/2307.03810
  • repo_url: https://github.com/mkirchhof/url
  • paper_authors: Michael Kirchhof, Bálint Mucsányi, Seong Joon Oh, Enkelejda Kasneci
  • for: Proposes the Uncertainty-aware Representation Learning (URL) benchmark for evaluating pretrained models whose uncertainty estimates, not just embeddings, transfer to new datasets.
  • methods: Eleven uncertainty quantifiers pretrained on ImageNet are transferred to eight downstream datasets; besides representation transferability, a novel metric measures zero-shot transferability of the uncertainty estimate (a generic proxy for such scoring is sketched after the abstract).
  • results: Approaches that focus on the uncertainty of the representation itself, or that estimate prediction risk directly, outperform those based on upstream class probabilities; transferable uncertainty quantification remains an open challenge.
    Abstract Representation learning has significantly driven the field to develop pretrained models that can act as a valuable starting point when transferring to new datasets. With the rising demand for reliable machine learning and uncertainty quantification, there is a need for pretrained models that not only provide embeddings but also transferable uncertainty estimates. To guide the development of such models, we propose the Uncertainty-aware Representation Learning (URL) benchmark. Besides the transferability of the representations, it also measures the zero-shot transferability of the uncertainty estimate using a novel metric. We apply URL to evaluate eleven uncertainty quantifiers that are pretrained on ImageNet and transferred to eight downstream datasets. We find that approaches that focus on the uncertainty of the representation itself or estimate the prediction risk directly outperform those that are based on the probabilities of upstream classes. Yet, achieving transferable uncertainty quantification remains an open challenge. Our findings indicate that it is not necessarily in conflict with traditional representation learning goals. Code is provided under https://github.com/mkirchhof/url .
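The benchmark's transferability metric is novel, but the flavor of scoring an uncertainty estimate on a downstream set can be conveyed with a common proxy: the AUROC of uncertainty as a predictor of mistakes (an illustrative stand-in, not the URL metric).

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
is_error = rng.integers(0, 2, 1000)                 # 1 = model was wrong (toy)
# a useful uncertainty estimate should be higher on errors:
uncertainty = rng.normal(0, 1, 1000) + 0.8 * is_error
print("error-detection AUROC:", round(roc_auc_score(is_error, uncertainty), 3))
```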

CLIPMasterPrints: Fooling Contrastive Language-Image Pre-training Using Latent Variable Evolution

  • paper_url: http://arxiv.org/abs/2307.03798
  • repo_url: https://github.com/matfrei/clipmasterprints
  • paper_authors: Matthias Freiberger, Peter Kun, Anders Sundnes Løvlie, Sebastian Risi
  • for: Demonstrates that Contrastive Language-Image Pre-training (CLIP) models are vulnerable to "fooling master images" that maximize the model's confidence score across a wide range of prompts while remaining unrecognizable to humans.
  • methods: Fooling master images are mined by searching the latent space of generative models with evolution strategies or stochastic gradient descent (a simplified pixel-space variant is sketched after the abstract); images mined from a small number of captions can generalize to a much larger number of semantically related captions.
  • results: Two possible mitigation strategies are evaluated; vulnerability to fooling master examples is closely related to the modality gap in contrastive pre-trained multi-modal networks, motivating mitigation of modality gaps in CLIP and related multi-modal approaches.
    Abstract Models leveraging both visual and textual data such as Contrastive Language-Image Pre-training (CLIP), are increasingly gaining importance. In this work, we show that despite their versatility, such models are vulnerable to what we refer to as fooling master images. Fooling master images are capable of maximizing the confidence score of a CLIP model for a significant number of widely varying prompts, while being unrecognizable for humans. We demonstrate how fooling master images can be mined by searching the latent space of generative models by means of an evolution strategy or stochastic gradient descent. We investigate the properties of the mined fooling master images, and find that images trained on a small number of image captions potentially generalize to a much larger number of semantically related captions. Further, we evaluate two possible mitigation strategies and find that vulnerability to fooling master examples is closely related to a modality gap in contrastive pre-trained multi-modal networks. From the perspective of vulnerability to off-manifold attacks, we therefore argue for the mitigation of modality gaps in CLIP and related multi-modal approaches. Source code and mined CLIPMasterPrints are available at https://github.com/matfrei/CLIPMasterPrints.
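A deliberately simplified sketch of mining a fooling image: the paper searches a generative model's latent space with evolution strategies or SGD, whereas the toy below optimizes pixels directly. It assumes OpenAI's open-source clip package (github.com/openai/CLIP) and a downloadable ViT-B/32 checkpoint.

```python
import torch
import clip

device = "cpu"  # keep the sketch in float32; the same loop works on CUDA
model, _ = clip.load("ViT-B/32", device=device)
prompts = clip.tokenize(["a photo of a dog", "a photo of a car",
                         "a bowl of soup", "a mountain landscape"]).to(device)
with torch.no_grad():
    text_emb = model.encode_text(prompts).float()
    text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)

img = torch.rand(1, 3, 224, 224, device=device, requires_grad=True)
opt = torch.optim.Adam([img], lr=0.01)
for step in range(200):
    emb = model.encode_image(img).float()
    emb = emb / emb.norm(dim=-1, keepdim=True)
    loss = -(emb @ text_emb.T).mean()   # maximize similarity to ALL prompts
    opt.zero_grad()
    loss.backward()
    opt.step()
    with torch.no_grad():
        img.clamp_(0, 1)
print("mean image-text similarity:", -loss.item())
```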

When does the ID algorithm fail?

  • paper_url: http://arxiv.org/abs/2307.03750
  • repo_url: https://github.com/SOYJUN/Implement-ODR-protocol
  • paper_authors: Ilya Shpitser
  • for: Revisits the ID algorithm, which solves identification of interventional distributions of the form p(Y | do(a)) in graphical causal models.
  • methods: Outlines the modern presentation of the ID algorithm and gives a simple counterexample to Corollary 3 of [9], the so-called "hedge criterion", which is incorrect as stated.
  • results: Confirms the ID algorithm is sound (it outputs the correct functional of the observed data distribution whenever p(Y | do(a)) is identified) and complete (it explicitly flags failure whenever the distribution is not identified), and provides several graphical characterizations of when it fails (a classic non-identifiable example is recalled after the abstract).
    Abstract The ID algorithm solves the problem of identification of interventional distributions of the form p(Y | do(a)) in graphical causal models, and has been formulated in a number of ways [12, 9, 6]. The ID algorithm is sound (outputs the correct functional of the observed data distribution whenever p(Y | do(a)) is identified in the causal model represented by the input graph), and complete (explicitly flags as a failure any input p(Y | do(a)) whenever this distribution is not identified in the causal model represented by the input graph). The reference [9] provides a result, the so called "hedge criterion" (Corollary 3), which aims to give a graphical characterization of situations when the ID algorithm fails to identify its input in terms of a structure in the input graph called the hedge. While the ID algorithm is, indeed, a sound and complete algorithm, and the hedge structure does arise whenever the input distribution is not identified, Corollary 3 presented in [9] is incorrect as stated. In this note, I outline the modern presentation of the ID algorithm, discuss a simple counterexample to Corollary 3, and provide a number of graphical characterizations of the ID algorithm failing to identify its input distribution.
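For context, the smallest graph on which identification fails is standard background worth recalling; this is not the paper's counterexample to Corollary 3.

```latex
% The "bow arc" graph: a directed edge and a bidirected (confounding) edge
% between the same pair of variables.
In the graph with $X \to Y$ and $X \leftrightarrow Y$ (an unobserved common
cause of $X$ and $Y$), the interventional distribution
$p(y \mid \mathrm{do}(x))$ is not identified from $p(x, y)$: distinct causal
models that agree on $p(x, y)$ can disagree on $p(y \mid \mathrm{do}(x))$.
The ID algorithm correctly flags this input as a failure.
```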

AI and the EU Digital Markets Act: Addressing the Risks of Bigness in Generative AI

  • paper_url: http://arxiv.org/abs/2308.02033
  • repo_url: None
  • paper_authors: Ayse Gizem Yasar, Andrew Chong, Evan Dong, Thomas Krendl Gilbert, Sarah Hladikova, Roland Maio, Carlos Mougan, Xudong Shen, Shubham Singh, Ana-Andreea Stoica, Savannah Thais, Miri Zilka
  • for: This paper addresses the risks of bigness in digital markets, particularly in relation to generative AI systems.
  • methods: The authors argue for integrating certain AI software as core platform services and for classifying certain developers as gatekeepers under the EU's Digital Markets Act (DMA).
  • results: The paper proposes an assessment of gatekeeper obligations to ensure they cover generative AI services, offering insights towards diversity and openness in generative AI as the EU considers generative-AI-specific rules and possible DMA amendments.
    Abstract As AI technology advances rapidly, concerns over the risks of bigness in digital markets are also growing. The EU's Digital Markets Act (DMA) aims to address these risks. Still, the current framework may not adequately cover generative AI systems that could become gateways for AI-based services. This paper argues for integrating certain AI software as core platform services and classifying certain developers as gatekeepers under the DMA. We also propose an assessment of gatekeeper obligations to ensure they cover generative AI services. As the EU considers generative AI-specific rules and possible DMA amendments, this paper provides insights towards diversity and openness in generative AI services.

Intelligent Robotic Sonographer: Mutual Information-based Disentangled Reward Learning from Few Demonstrations

  • paper_url: http://arxiv.org/abs/2307.03705
  • repo_url: None
  • paper_authors: Zhongliang Jiang, Yuan Bi, Mingchuan Zhou, Ying Hu, Michael Burke, Nassir Navab
  • for: The paper proposes an intelligent robotic sonographer that autonomously explores target anatomies and navigates a US probe to a relevant 2D plane by learning from experts.
  • methods: The approach combines a neural reward function learned from ranked pairwise image comparisons, mutual information estimation to disentangle task-related and domain features, and a Gaussian distribution-based filter that evaluates the quality of the expert's demonstrations.
  • results: Representative experiments for a "line" target (vascular phantom) and a "point" target (two ex-vivo animal organ phantoms) show that the framework works robustly on different kinds of known and unseen phantoms.
    Abstract Ultrasound (US) imaging is widely used for biometric measurement and diagnosis of internal organs due to the advantages of being real-time and radiation-free. However, due to high inter-operator variability, the resulting images highly depend on operators' experience. In this work, an intelligent robotic sonographer is proposed to autonomously "explore" target anatomies and navigate a US probe to a relevant 2D plane by learning from experts. The underlying high-level physiological knowledge from experts is inferred by a neural reward function, using a ranked pairwise image comparison approach in a self-supervised fashion. This process can be referred to as understanding the "language of sonography". Considering the generalization capability needed to overcome inter-patient variations, mutual information is estimated by a network to explicitly extract the task-related and domain features in latent space. In addition, a Gaussian distribution-based filter is developed to automatically evaluate and take into account the quality of the expert's demonstrations. The robotic localization is carried out in coarse-to-fine mode based on the predicted reward associated with B-mode images. To demonstrate the performance of the proposed approach, representative experiments for the "line" target and "point" target are performed on a vascular phantom and two ex-vivo animal organ phantoms (chicken heart and lamb kidney), respectively. The results demonstrate that the proposed framework can robustly work on different kinds of known and unseen phantoms.

Unveiling the Potential of Knowledge-Prompted ChatGPT for Enhancing Drug Trafficking Detection on Social Media

  • paper_url: http://arxiv.org/abs/2307.03699
  • repo_url: None
  • paper_authors: Chuanbo Hu, Bin Liu, Xin Li, Yanfang Ye
  • for: The paper aims to detect illicit drug trafficking activities on social media, specifically on platforms like Instagram and Twitter.
  • methods: The authors use large language models (LLMs), such as ChatGPT, with an analytical framework of knowledge-informed, scenario-based prompts, plus a Monte Carlo dropout based prompt optimization method.
  • results: The proposed framework outperforms baseline language models in drug trafficking detection accuracy, with a remarkable improvement of nearly 12%, and remains effective even when drug dealers use deceptive language and euphemisms to evade detection.
    Abstract Social media platforms such as Instagram and Twitter have emerged as critical channels for drug marketing and illegal sale. Detecting and labeling online illicit drug trafficking activities becomes important in addressing this issue. However, the effectiveness of conventional supervised learning methods in detecting drug trafficking heavily relies on having access to substantial amounts of labeled data, while data annotation is time-consuming and resource-intensive. Furthermore, these models often face challenges in accurately identifying trafficking activities when drug dealers use deceptive language and euphemisms to avoid detection. To overcome this limitation, we conduct the first systematic study on leveraging large language models (LLMs), such as ChatGPT, to detect illicit drug trafficking activities on social media. We propose an analytical framework to compose \emph{knowledge-informed prompts}, which serve as the interface that humans can interact with and use LLMs to perform the detection task. Additionally, we design a Monte Carlo dropout based prompt optimization method to further to improve performance and interpretability. Our experimental findings demonstrate that the proposed framework outperforms other baseline language models in terms of drug trafficking detection accuracy, showing a remarkable improvement of nearly 12\%. By integrating prior knowledge and the proposed prompts, ChatGPT can effectively identify and label drug trafficking activities on social networks, even in the presence of deceptive language and euphemisms used by drug dealers to evade detection. The implications of our research extend to social networks, emphasizing the importance of incorporating prior knowledge and scenario-based prompts into analytical tools to improve online security and public safety.

Scalable Membership Inference Attacks via Quantile Regression

  • paper_url: http://arxiv.org/abs/2307.03694
  • repo_url: None
  • paper_authors: Martin Bertran, Shuai Tang, Michael Kearns, Jamie Morgenstern, Aaron Roth, Zhiwei Steven Wu
  • for: This work studies attacks that use only black-box access to a trained model to determine whether a particular example was used in training.
  • methods: The attack performs quantile regression on the distribution of confidence scores the attacked model induces on points not used in training, and requires no knowledge of the attacked model's architecture.
  • results: Experiments show the method is competitive with state-of-the-art shadow-model attacks while training only a single model, in contrast to prior attacks that must train many shadow models.
    Abstract Membership inference attacks are designed to determine, using black box access to trained models, whether a particular example was used in training or not. Membership inference can be formalized as a hypothesis testing problem. The most effective existing attacks estimate the distribution of some test statistic (usually the model's confidence on the true label) on points that were (and were not) used in training by training many \emph{shadow models} -- i.e. models of the same architecture as the model being attacked, trained on a random subsample of data. While effective, these attacks are extremely computationally expensive, especially when the model under attack is large. We introduce a new class of attacks based on performing quantile regression on the distribution of confidence scores induced by the model under attack on points that are not used in training. We show that our method is competitive with state-of-the-art shadow model attacks, while requiring substantially less compute because our attack requires training only a single model. Moreover, unlike shadow model attacks, our proposed attack does not require any knowledge of the architecture of the model under attack and is therefore truly ``black-box". We show the efficacy of this approach in an extensive series of experiments on various datasets and model architectures.
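    The core of the attack can be sketched in a few lines: fit a conditional quantile of the confidence scores the attacked model assigns to known non-members, then flag a point as a member when its confidence exceeds its predicted quantile. The synthetic features and confidences below are placeholders; in practice `conf_pub` would come from querying the attacked model on public, non-member data.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X_pub = rng.normal(size=(5000, 16))        # features of known non-members
conf_pub = rng.beta(2, 5, size=5000)       # attacked model's confidence on them

# Fit the conditional 95th percentile of non-member confidence
q95 = GradientBoostingRegressor(loss="quantile", alpha=0.95)
q95.fit(X_pub, conf_pub)

def is_member(x, conf):
    # A point whose confidence exceeds the non-member quantile predicted
    # *for that point* is flagged as a training-set member.
    return conf > q95.predict(x.reshape(1, -1))[0]

print(is_member(rng.normal(size=16), conf=0.99))
```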

cs.CL - 2023-07-08

Advancements in Scientific Controllable Text Generation Methods

  • paper_url: http://arxiv.org/abs/2307.05538
  • repo_url: None
  • paper_authors: Arnav Goel, Medha Hira, Avinash Anand, Siddhesh Bangar, Dr. Rajiv Ratn Shah
  • for: This survey organizes previous work on controllable text generation for scientific literature.
  • methods: A new schema with seven components, each crucial to the generation process, is introduced, along with the modulation strategies used to control each component; combinations of these components suggest new architectures.
  • results: The paper offers a theoretical study and qualitative examination of these methods; future research will compare them empirically to learn more about their strengths and utility.
    Abstract The previous work on controllable text generation is organized using a new schema we provide in this study. Seven components make up the schema, and each one is crucial to the creation process. To accomplish controlled generation for scientific literature, we describe the various modulation strategies utilised to modulate each of the seven components. We also offer a theoretical study and qualitative examination of these methods. This insight makes possible new architectures based on combinations of these components. Future research will compare these methods empirically to learn more about their strengths and utility.

A Stitch in Time Saves Nine: Detecting and Mitigating Hallucinations of LLMs by Validating Low-Confidence Generation

  • paper_url: http://arxiv.org/abs/2307.03987
  • repo_url: None
  • paper_authors: Neeraj Varshney, Wenlin Yao, Hongming Zhang, Jianshu Chen, Dong Yu
  • for: The goal of this paper is to improve the reliability and trustworthiness of large language models.
  • methods: The paper proposes active detection and mitigation of hallucinations during generation: candidate hallucinations are first identified using the model's logit output values, their correctness is checked through a validation procedure, detected hallucinations are mitigated, and generation then continues.
  • results: Extensive experiments show the approach reduces GPT-3.5's hallucination rate from 47.5% to 14.5% on average; detection achieves a recall of about 88%, mitigation fixes 57.6% of correctly detected hallucinations without introducing new ones, and the approach also works on other question types (multi-hop and false-premise) and on another LLM (Vicuna).
    Abstract Recently developed large language models have achieved remarkable success in generating fluent and coherent text. However, these models often tend to 'hallucinate' which critically hampers their reliability. In this work, we address this crucial problem and propose an approach that actively detects and mitigates hallucinations during the generation process. Specifically, we first identify the candidates of potential hallucination leveraging the model's logit output values, check their correctness through a validation procedure, mitigate the detected hallucinations, and then continue with the generation process. Through extensive experiments with GPT-3.5 (text-davinci-003) on the 'article generation task', we first demonstrate the individual efficacy of our detection and mitigation techniques. Specifically, the detection technique achieves a recall of ~88% and the mitigation technique successfully mitigates 57.6% of the correctly detected hallucinations. Importantly, our mitigation technique does not introduce new hallucinations even in the case of incorrectly detected hallucinations, i.e., false positives. Then, we show that the proposed active detection and mitigation approach successfully reduces the hallucinations of the GPT-3.5 model from 47.5% to 14.5% on average. We further demonstrate the effectiveness and wide applicability of our approach through additional studies including performance on different types of questions (multi-hop and false premise questions) and with another LLM from a different model family (Vicuna). In summary, our work contributes to improving the reliability and trustworthiness of large language models, a crucial step en route to enabling their widespread adoption in real-world applications.
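    A minimal sketch of the detection step, using an open model in place of GPT-3.5 (the paper works at the level of sentences and flagged concepts; flagging individual low-probability tokens and the 0.3 threshold are simplifying assumptions here): generate greedily, read off each chosen token's probability from the logits, and collect low-confidence spans as candidates for the validation procedure.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

prompt = "The first person to walk on the moon was"
inputs = tok(prompt, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=20, do_sample=False,
                     output_scores=True, return_dict_in_generate=True)

gen_ids = out.sequences[0, inputs["input_ids"].shape[1]:]
candidates = []
for tok_id, step_scores in zip(gen_ids, out.scores):
    p = torch.softmax(step_scores[0], dim=-1)[tok_id].item()
    if p < 0.3:  # low-confidence threshold (a tunable hyperparameter)
        candidates.append((tok.decode([int(tok_id)]), round(p, 3)))
print(candidates)  # these spans would be validated, e.g. against retrieval
```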

Evaluating the Capability of Large-scale Language Models on Chinese Grammatical Error Correction Task

  • paper_url: http://arxiv.org/abs/2307.03972
  • repo_url: None
  • paper_authors: Fanyi Qu, Yunfang Wu
  • for: This report explores how large language models (LLMs) perform on the Chinese grammatical error correction (GEC) task and offers guidance for future work.
  • methods: Experiments are run with 3 LLMs of different model scales on 4 Chinese GEC datasets.
  • results: LLM performance on automatic evaluation metrics falls short of previous state-of-the-art models, largely because of over-correction, and varies notably across data distributions; further investigation is needed before applying LLMs to Chinese GEC.
    Abstract Large-scale language models (LLMs) have shown remarkable capability in a variety of Natural Language Processing (NLP) tasks and have attracted much attention recently. However, some studies indicate that large language models fail to achieve promising results beyond the state-of-the-art models in English grammatical error correction (GEC) tasks. In this report, we aim to explore how large language models perform on Chinese grammatical error correction tasks and provide guidance for future work. We conduct experiments with 3 LLMs of different model scales on 4 Chinese GEC datasets. Our experimental results indicate that the performance of LLMs on automatic evaluation metrics falls short of the previous sota models because of the problem of over-correction. Furthermore, we also discover notable variations in the performance of LLMs when evaluated on different data distributions. Our findings demonstrate that further investigation is required for the application of LLMs on the Chinese GEC task.

Is ChatGPT a Good Personality Recognizer? A Preliminary Study

  • paper_url: http://arxiv.org/abs/2307.03952
  • repo_url: None
  • paper_authors: Yu Ji, Wen Wu, Hong Zheng, Yi Hu, Xi Chen, Liang He
  • for: This study evaluates ChatGPT's ability on the text-based personality recognition task, with a view to generating effective personality data.
  • methods: A variety of prompting strategies are tested, including zero-shot chain-of-thought prompting and a purpose-built level-oriented prompting strategy that guides ChatGPT to analyze text at a specified level.
  • results: On two real-world datasets, zero-shot chain-of-thought prompting gives ChatGPT impressive personality recognition ability with natural-language explanations, and the level-oriented strategy narrows the gap to the state-of-the-art model further. However, ChatGPT shows unfairness towards sensitive demographic attributes such as gender and age; eliciting its personality recognition ability also improves downstream tasks such as sentiment classification and stress prediction.
    Abstract In recent years, personality has been regarded as a valuable personal factor being incorporated into numerous tasks such as sentiment analysis and product recommendation. This has led to widespread attention to text-based personality recognition task, which aims to identify an individual's personality based on given text. Considering that ChatGPT has recently exhibited remarkable abilities on various natural language processing tasks, we provide a preliminary evaluation of ChatGPT on text-based personality recognition task for generating effective personality data. Concretely, we employ a variety of prompting strategies to explore ChatGPT's ability in recognizing personality from given text, especially the level-oriented prompting strategy we designed for guiding ChatGPT in analyzing given text at a specified level. The experimental results on two representative real-world datasets reveal that ChatGPT with zero-shot chain-of-thought prompting exhibits impressive personality recognition ability and is capable to provide natural language explanations through text-based logical reasoning. Furthermore, by employing the level-oriented prompting strategy to optimize zero-shot chain-of-thought prompting, the performance gap between ChatGPT and corresponding state-of-the-art model has been narrowed even more. However, we observe that ChatGPT shows unfairness towards certain sensitive demographic attributes such as gender and age. Additionally, we discover that eliciting the personality recognition ability of ChatGPT helps improve its performance on personality-related downstream tasks such as sentiment classification and stress prediction.

Opening up ChatGPT: Tracking openness, transparency, and accountability in instruction-tuned text generators

  • paper_url: http://arxiv.org/abs/2307.05532
  • repo_url: https://github.com/opening-up-chatgpt/opening-up-chatgpt.github.io
  • paper_authors: Andreas Liesenfeld, Alianda Lopez, Mark Dingemanse
  • for: This work examines the use of instruction-tuned large language models in conversational interfaces, reviewing the risks of relying on proprietary software and surveying open-source alternatives.
  • methods: Open-source projects of comparable architecture and functionality (LLM+RLHF) are evaluated for openness of code, training data, model weights, RLHF data, licensing, scientific documentation, and access methods.
  • results: Many projects billing themselves as "open source" inherit undocumented data of dubious legality, few share the all-important instruction-tuning data, and careful scientific documentation is exceedingly rare.
    Abstract Large language models that exhibit instruction-following behaviour represent one of the biggest recent upheavals in conversational interfaces, a trend in large part fuelled by the release of OpenAI's ChatGPT, a proprietary large language model for text generation fine-tuned through reinforcement learning from human feedback (LLM+RLHF). We review the risks of relying on proprietary software and survey the first crop of open-source projects of comparable architecture and functionality. The main contribution of this paper is to show that openness is differentiated, and to offer scientific documentation of degrees of openness in this fast-moving field. We evaluate projects in terms of openness of code, training data, model weights, RLHF data, licensing, scientific documentation, and access methods. We find that while there is a fast-growing list of projects billing themselves as 'open source', many inherit undocumented data of dubious legality, few share the all-important instruction-tuning (a key site where human annotation labour is involved), and careful scientific documentation is exceedingly rare. Degrees of openness are relevant to fairness and accountability at all points, from data collection and curation to model architecture, and from training and fine-tuning to release and deployment.

On decoder-only architecture for speech-to-text and large language model integration

  • paper_url: http://arxiv.org/abs/2307.03917
  • repo_url: None
  • paper_authors: Jian Wu, Yashesh Gaur, Zhuo Chen, Long Zhou, Yimeng Zhu, Tianrui Wang, Jinyu Li, Shujie Liu, Bo Ren, Linquan Liu, Yu Wu
  • for: The study explores integrating speech signals into text-based large language models to enable better human-computer interaction through natural language.
  • methods: Speech-LLaMA uses Connectionist Temporal Classification and a simple audio encoder to map compressed acoustic features into the continuous semantic space of the LLM; a smaller, randomly initialized speech-LLaMA is also trained from speech-text paired data alone to probe the decoder-only architecture.
  • results: Experiments on multilingual speech-to-text translation tasks show a significant improvement over strong baselines, highlighting the potential advantages of decoder-only models for speech-to-text conversion.
    Abstract Large language models (LLMs) have achieved remarkable success in the field of natural language processing, enabling better human-computer interaction using natural language. However, the seamless integration of speech signals into LLMs has not been explored well. The "decoder-only" architecture has also not been well studied for speech processing tasks. In this research, we introduce Speech-LLaMA, a novel approach that effectively incorporates acoustic information into text-based large language models. Our method leverages Connectionist Temporal Classification and a simple audio encoder to map the compressed acoustic features to the continuous semantic space of the LLM. In addition, we further probe the decoder-only architecture for speech-to-text tasks by training a smaller scale randomly initialized speech-LLaMA model from speech-text paired data alone. We conduct experiments on multilingual speech-to-text translation tasks and demonstrate a significant improvement over strong baselines, highlighting the potential advantages of decoder-only models for speech-to-text conversion.
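    The integration idea can be sketched as follows: an audio encoder maps compressed acoustic features into the LLM's embedding space, and the resulting vectors are prepended to the embedded text prompt of the decoder-only model. All module sizes below are illustrative guesses, not the paper's configuration, and the CTC-based compression is assumed to have happened upstream.

```python
import torch
import torch.nn as nn

class SpeechPrefix(nn.Module):
    """Project (CTC-compressed) acoustic features into an LLM's embedding space."""
    def __init__(self, feat_dim=512, llm_dim=4096, layers=2):
        super().__init__()
        block = nn.TransformerEncoderLayer(d_model=feat_dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(block, num_layers=layers)
        self.proj = nn.Linear(feat_dim, llm_dim)

    def forward(self, feats):                    # feats: (B, T, feat_dim)
        return self.proj(self.encoder(feats))    # (B, T, llm_dim)

speech = SpeechPrefix()(torch.randn(1, 50, 512))   # 50 compressed frames
text_emb = torch.randn(1, 10, 4096)                # embedded prompt tokens
llm_inputs = torch.cat([speech, text_emb], dim=1)  # fed to the LLM via inputs_embeds
```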

Answering Ambiguous Questions via Iterative Prompting

  • paper_url: http://arxiv.org/abs/2307.03897
  • repo_url: https://github.com/sunnweiwei/ambigprompt
  • paper_authors: Weiwei Sun, Hengyi Cai, Hongshen Chen, Pengjie Ren, Zhumin Chen, Maarten de Rijke, Zhaochun Ren
  • for: answering ambiguous questions
  • methods: integrates an answering model with a prompting model in an iterative manner, together with a task-specific post-pretraining approach
  • results: achieves state-of-the-art or competitive results while using less memory and incurring lower inference latency than competing approaches, and performs well in low-resource settings
    Abstract In open-domain question answering, due to the ambiguity of questions, multiple plausible answers may exist. To provide feasible answers to an ambiguous question, one approach is to directly predict all valid answers, but this can struggle with balancing relevance and diversity. An alternative is to gather candidate answers and aggregate them, but this method can be computationally costly and may neglect dependencies among answers. In this paper, we present AmbigPrompt to address the imperfections of existing approaches to answering ambiguous questions. Specifically, we integrate an answering model with a prompting model in an iterative manner. The prompting model adaptively tracks the reading process and progressively triggers the answering model to compose distinct and relevant answers. Additionally, we develop a task-specific post-pretraining approach for both the answering model and the prompting model, which greatly improves the performance of our framework. Empirical studies on two commonly-used open benchmarks show that AmbigPrompt achieves state-of-the-art or competitive results while using less memory and having a lower inference latency than competing approaches. Additionally, AmbigPrompt also performs well in low-resource settings. The code are available at: https://github.com/sunnweiwei/AmbigPrompt.
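    The iterative interplay between the two models reduces to a simple loop; `answer_model` and `prompting_model` below are hypothetical callables standing in for the paper's trained components, and the stopping rule is a guess.

```python
def answer_ambiguous(question, answer_model, prompting_model, max_answers=5):
    """Alternate between composing an answer and re-prompting on the
    answers produced so far (a structural sketch of the iterative scheme)."""
    answers = []
    prompt = question
    for _ in range(max_answers):
        ans = answer_model(prompt)            # compose one candidate answer
        if not ans or ans in answers:
            break                             # stop when nothing new appears
        answers.append(ans)
        prompt = prompting_model(question, answers)  # track progress, trigger next
    return answers
```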

Incomplete Utterance Rewriting as Sequential Greedy Tagging

  • paper_url: http://arxiv.org/abs/2307.06337
  • repo_url: None
  • paper_authors: Yunshan Chen
  • for: rewriting incomplete utterances so that omitted information is restored
  • methods: proposes a sequence tagging-based model that is more adept at extracting information from the dialogue context, plus speaker-aware embeddings to model speaker variation
  • results: on multiple public datasets the model achieves optimal results on all nine restoration scores, with other metric scores comparable to previous state-of-the-art models; thanks to its simplicity, it also outperforms most previous models on inference speed
    Abstract The task of incomplete utterance rewriting has recently gotten much attention. Previous models struggled to extract information from the dialogue context, as evidenced by the low restoration scores. To address this issue, we propose a novel sequence tagging-based model, which is more adept at extracting information from context. Meanwhile, we introduce speaker-aware embedding to model speaker variation. Experiments on multiple public datasets show that our model achieves optimal results on all nine restoration scores while having other metric scores comparable to previous state-of-the-art models. Furthermore, benefitting from the model's simplicity, our approach outperforms most previous models on inference speed.

Embedding Mental Health Discourse for Community Recommendation

  • paper_url: http://arxiv.org/abs/2307.03892
  • repo_url: None
  • paper_authors: Hy Dang, Bang Nguyen, Noah Ziems, Meng Jiang
  • for: investigate the use of discourse embedding techniques to develop a community recommendation system for mental health support groups on social media
  • methods: use content-based and collaborative filtering techniques to enhance the performance of the recommendation system
  • results: the proposed approach outperforms the use of each technique separately and provides interpretability in the recommendation process
    Abstract Our paper investigates the use of discourse embedding techniques to develop a community recommendation system that focuses on mental health support groups on social media. Social media platforms provide a means for users to anonymously connect with communities that cater to their specific interests. However, with the vast number of online communities available, users may face difficulties in identifying relevant groups to address their mental health concerns. To address this challenge, we explore the integration of discourse information from various subreddit communities using embedding techniques to develop an effective recommendation system. Our approach involves the use of content-based and collaborative filtering techniques to enhance the performance of the recommendation system. Our findings indicate that the proposed approach outperforms the use of each technique separately and provides interpretability in the recommendation process.
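    A bare-bones version of the hybrid scoring: cosine similarity between a user's discourse embedding and each community's embedding (the content-based part), blended with precomputed collaborative-filtering scores. The linear blend and the weight `alpha` are illustrative assumptions, not the paper's exact design.

```python
import numpy as np

def hybrid_scores(user_vec, community_embs, cf_scores, alpha=0.5):
    # Content-based part: cosine similarity of discourse embeddings
    sims = community_embs @ user_vec / (
        np.linalg.norm(community_embs, axis=1) * np.linalg.norm(user_vec) + 1e-9)
    # Blend with collaborative-filtering scores (assumed precomputed)
    return alpha * sims + (1 - alpha) * cf_scores

user = np.random.randn(64)                # user's discourse embedding
communities = np.random.randn(10, 64)     # one embedding per support group
cf = np.random.rand(10)                   # collaborative-filtering scores
print(np.argsort(hybrid_scores(user, communities, cf))[::-1][:3])  # top-3 groups
```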

MDACE: MIMIC Documents Annotated with Code Evidence

  • paper_url: http://arxiv.org/abs/2307.03859
  • repo_url: https://github.com/3mcloud/MDACE
  • paper_authors: Hua Cheng, Rana Jafari, April Russell, Russell Klopfer, Edmond Lu, Benjamin Striner, Matthew R. Gormley
  • for: This paper introduces a dataset for evidence/rationale extraction in Computer-Assisted Coding (CAC), where systems must supply textual evidence justifying the billing codes they predict.
  • methods: Several evidence extraction methods based on the EffectiveCAN model (Liu et al., 2021) are implemented to establish baseline performance on the dataset.
  • results: MDACE, the first publicly available code evidence dataset, is built on a subset of MIMIC-III clinical records and annotated by professional medical coders; it comprises 302 Inpatient charts with 3,934 evidence spans and 52 Profee charts with 5,563 evidence spans.
    Abstract We introduce a dataset for evidence/rationale extraction on an extreme multi-label classification task over long medical documents. One such task is Computer-Assisted Coding (CAC) which has improved significantly in recent years, thanks to advances in machine learning technologies. Yet simply predicting a set of final codes for a patient encounter is insufficient as CAC systems are required to provide supporting textual evidence to justify the billing codes. A model able to produce accurate and reliable supporting evidence for each code would be a tremendous benefit. However, a human annotated code evidence corpus is extremely difficult to create because it requires specialized knowledge. In this paper, we introduce MDACE, the first publicly available code evidence dataset, which is built on a subset of the MIMIC-III clinical records. The dataset -- annotated by professional medical coders -- consists of 302 Inpatient charts with 3,934 evidence spans and 52 Profee charts with 5,563 evidence spans. We implemented several evidence extraction methods based on the EffectiveCAN model (Liu et al., 2021) to establish baseline performance on this dataset. MDACE can be used to evaluate code evidence extraction methods for CAC systems, as well as the accuracy and interpretability of deep learning models for multi-label classification. We believe that the release of MDACE will greatly improve the understanding and application of deep learning technologies for medical coding and document classification.

Subjective Crowd Disagreements for Subjective Data: Uncovering Meaningful CrowdOpinion with Population-level Learning

  • paper_url: http://arxiv.org/abs/2307.10189
  • repo_url: None
  • paper_authors: Tharindu Cyril Weerasooriya, Sarah Luger, Saloni Poddar, Ashiqur R. KhudaBukhsh, Christopher M. Homan
  • for: This paper targets the fairness of AI systems built on human-annotated data, treating annotator disagreement as pervasive and meaningful rather than noise, especially where minority views risk being disregarded.
  • methods: CrowdOpinion, an unsupervised learning based approach, uses language features and label distributions to pool similar items into larger samples of label distributions, experimenting with four generative and one density-based clustering methods over five linear combinations of label distributions and features.
  • results: Experiments on five public social media benchmarks (Twitter, Gab, and Reddit) with varying levels of annotator disagreement, plus an in-the-wild Facebook dataset where annotations come from user reactions, evaluate the approach both as label distribution prediction (KL-divergence) and as a single-label problem (accuracy).
    Abstract Human-annotated data plays a critical role in the fairness of AI systems, including those that deal with life-altering decisions or moderating human-created web/social media content. Conventionally, annotator disagreements are resolved before any learning takes place. However, researchers are increasingly identifying annotator disagreement as pervasive and meaningful. They also question the performance of a system when annotators disagree. Particularly when minority views are disregarded, especially among groups that may already be underrepresented in the annotator population. In this paper, we introduce \emph{CrowdOpinion}\footnote{Accepted for publication at ACL 2023}, an unsupervised learning based approach that uses language features and label distributions to pool similar items into larger samples of label distributions. We experiment with four generative and one density-based clustering method, applied to five linear combinations of label distributions and features. We use five publicly available benchmark datasets (with varying levels of annotator disagreements) from social media (Twitter, Gab, and Reddit). We also experiment in the wild using a dataset from Facebook, where annotations come from the platform itself by users reacting to posts. We evaluate \emph{CrowdOpinion} as a label distribution prediction task using KL-divergence and a single-label problem using accuracy measures.
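    The label-distribution evaluation is straightforward to reproduce: average the KL divergence between each item's empirical annotator distribution and the predicted one (the toy distributions below are made up).

```python
import numpy as np
from scipy.stats import entropy

true_dist = np.array([[0.7, 0.2, 0.1],   # annotators' label distribution per item
                      [0.1, 0.6, 0.3]])
pred_dist = np.array([[0.6, 0.3, 0.1],
                      [0.2, 0.5, 0.3]])

# entropy(p, q) computes KL(p || q); lower mean KL = better-matched predictions
mean_kl = np.mean([entropy(t, p) for t, p in zip(true_dist, pred_dist)])
print(mean_kl)
```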

Linguistic representations for fewer-shot relation extraction across domains

  • paper_url: http://arxiv.org/abs/2307.03823
  • repo_url: None
  • paper_authors: Sireesh Gururaja, Ritam Dutt, Tinglong Liao, Carolyn Rose
  • for: These experiments examine whether incorporating linguistic representations improves cross-domain performance in a few-shot transfer setting, i.e., whether they provide features that act as cross-domain pivots.
  • methods: A popular transformer-based architecture is augmented by alternately incorporating syntactic and semantic graphs built with freely available off-the-shelf tools, for relation extraction on three datasets of procedural text in two domains (cooking and materials science).
  • results: Including the graphs yields significantly higher performance in few-shot transfer, but the two types of graph display roughly equivalent utility.
    Abstract Recent work has demonstrated the positive impact of incorporating linguistic representations as additional context and scaffolding on the in-domain performance of several NLP tasks. We extend this work by exploring the impact of linguistic representations on cross-domain performance in a few-shot transfer setting. An important question is whether linguistic representations enhance generalizability by providing features that function as cross-domain pivots. We focus on the task of relation extraction on three datasets of procedural text in two domains, cooking and materials science. Our approach augments a popular transformer-based architecture by alternately incorporating syntactic and semantic graphs constructed by freely available off-the-shelf tools. We examine their utility for enhancing generalization, and investigate whether earlier findings, e.g. that semantic representations can be more helpful than syntactic ones, extend to relation extraction in multiple domains. We find that while the inclusion of these graphs results in significantly higher performance in few-shot transfer, both types of graph exhibit roughly equivalent utility.

On the Efficacy of Sampling Adapters

  • paper_url: http://arxiv.org/abs/2307.03749
  • repo_url: https://github.com/rycolab/sampling-adapters
  • paper_authors: Clara Meister, Tiago Pimentel, Luca Malagutti, Ethan G. Wilcox, Ryan Cotterell
  • for: This paper seeks to explain how modifications to a language model's sampling distribution, such as nucleus or top-k sampling, improve the quality of generated text.
  • methods: A unified framework, sampling adapters, is introduced for understanding these techniques and analyzing how they change the (sub)word-level distributions of language generation models.
  • results: The shift sampling adapters enforce can be viewed as a precision-recall trade-off: the model loses the ability to produce certain strings but places more probability on desirable text. Several precision-emphasizing measures indicate the resulting distributions align better with the true distribution, and these measures correlate with higher sequence-level quality scores, specifically Mauve.
    Abstract Sampling is a common strategy for generating text from probabilistic models, yet standard ancestral sampling often results in text that is incoherent or ungrammatical. To alleviate this issue, various modifications to a model's sampling distribution, such as nucleus or top-k sampling, have been introduced and are now ubiquitously used in language generation systems. We propose a unified framework for understanding these techniques, which we term sampling adapters. Sampling adapters often lead to qualitatively better text, which raises the question: From a formal perspective, how are they changing the (sub)word-level distributions of language generation models? And why do these local changes lead to higher-quality text? We argue that the shift they enforce can be viewed as a trade-off between precision and recall: while the model loses its ability to produce certain strings, its precision rate on desirable text increases. While this trade-off is not reflected in standard metrics of distribution quality (such as perplexity), we find that several precision-emphasizing measures indeed indicate that sampling adapters can lead to probability distributions more aligned with the true distribution. Further, these measures correlate with higher sequence-level quality scores, specifically, Mauve.
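    Two of the adapters the paper analyzes are easy to state in code; both truncate the distribution, trading recall (strings the model can still produce) for precision (mass on desirable continuations). This is a generic reference implementation, not the authors' repository code.

```python
import torch

def top_k_adapter(logits: torch.Tensor, k: int = 50) -> torch.Tensor:
    # Keep only the k highest-scoring tokens; the rest get zero probability.
    kth_best = torch.topk(logits, k).values[-1]
    return logits.masked_fill(logits < kth_best, float("-inf"))

def nucleus_adapter(logits: torch.Tensor, p: float = 0.9) -> torch.Tensor:
    # Keep the smallest prefix of tokens whose cumulative probability reaches p.
    sorted_logits, sorted_idx = torch.sort(logits, descending=True)
    cum = torch.softmax(sorted_logits, dim=-1).cumsum(dim=-1)
    remove = cum > p
    remove[1:] = remove[:-1].clone()   # shift so the top token is always kept
    remove[0] = False
    sorted_logits[remove] = float("-inf")
    out = torch.empty_like(logits)
    out[sorted_idx] = sorted_logits    # undo the sort
    return out

logits = torch.randn(32000)            # one decoding step of a language model
probs = torch.softmax(nucleus_adapter(logits, p=0.9), dim=-1)
token = torch.multinomial(probs, 1)    # sample from the adapted distribution
```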

QIGen: Generating Efficient Kernels for Quantized Inference on Large Language Models

  • paper_url: http://arxiv.org/abs/2307.03738
  • repo_url: https://github.com/ist-daslab/qigen
  • paper_authors: Tommaso Pegolotti, Elias Frantar, Dan Alistarh, Markus Püschel
  • for: An automatic code generation approach that supports quantized generative inference on LLMs such as LLaMA or OPT on off-the-shelf CPUs.
  • methods: The approach is informed by the target architecture and a performance model covering both hardware characteristics and method-specific accuracy constraints.
  • results: On CPU-based inference for LLaMA models, the generated kernels deliver high performance and high accuracy, comparing favorably to the best existing open-source solution.
    Abstract We present ongoing work on a new automatic code generation approach for supporting quantized generative inference on LLMs such as LLaMA or OPT on off-the-shelf CPUs. Our approach is informed by the target architecture and a performance model, including both hardware characteristics and method-specific accuracy constraints. Results on CPU-based inference for LLaMA models show that our approach can lead to high performance and high accuracy, comparing favorably to the best existing open-source solution. A preliminary implementation is available at https://github.com/IST-DASLab/QIGen.
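    The data layout such kernels operate on can be illustrated with symmetric 4-bit group quantization (a NumPy sketch of the idea only; the actual generated kernels are architecture-specific native code with packed weights).

```python
import numpy as np

def quantize_4bit(w, group_size=64):
    # One scale per group of 64 weights; int4 values lie in [-8, 7].
    w = w.reshape(-1, group_size)
    scale = np.abs(w).max(axis=1, keepdims=True) / 7.0
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return (q.astype(np.float32) * scale).reshape(-1)

w = np.random.randn(1024).astype(np.float32)
q, s = quantize_4bit(w)
print(np.abs(w - dequantize(q, s)).max())  # per-weight quantization error
```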

Improving Automatic Quotation Attribution in Literary Novels

  • paper_url: http://arxiv.org/abs/2307.03734
  • repo_url: None
  • paper_authors: Krishnapriya Vishnubhotla, Frank Rudzicz, Graeme Hirst, Adam Hammond
  • for: This work addresses quotation attribution in literary novels, where the available information varies between settings.
  • methods: Quotation attribution is decomposed into four interconnected sub-tasks: character identification, coreference resolution, quotation identification, and speaker attribution; state-of-the-art models are benchmarked on each sub-task independently using the Project Dialogism Novel Corpus.
  • results: For speaker attribution in particular, a simple sequential prediction model achieves accuracy scores on par with state-of-the-art models.
    Abstract Current models for quotation attribution in literary novels assume varying levels of available information in their training and test data, which poses a challenge for in-the-wild inference. Here, we approach quotation attribution as a set of four interconnected sub-tasks: character identification, coreference resolution, quotation identification, and speaker attribution. We benchmark state-of-the-art models on each of these sub-tasks independently, using a large dataset of annotated coreferences and quotations in literary novels (the Project Dialogism Novel Corpus). We also train and evaluate models for the speaker attribution task in particular, showing that a simple sequential prediction model achieves accuracy scores on par with state-of-the-art models.

INT-FP-QSim: Mixed Precision and Formats For Large Language Models and Vision Transformers

  • paper_url: http://arxiv.org/abs/2307.03712
  • repo_url: https://github.com/lightmatter-ai/int-fp-qsim
  • paper_authors: Lakshmi Nair, Mikhail Bernadskiy, Arulselvan Madhavan, Craig Chan, Ayon Basumallik, Darius Bunandar
  • for: To support resource-constrained deployment and the democratization of large language models, the paper presents INT-FP-QSim, an open-source simulator for flexibly evaluating LLMs and vision transformers at various numerical precisions and formats.
  • methods: INT-FP-QSim combines existing open-source repositories such as TensorRT, QPytorch and AIMET into a single simulator supporting various floating point and integer formats.
  • results: Using the simulator, the authors survey the impact of different numerical formats on LLMs and vision transformers at 4-bit weights and 4-bit or 8-bit activations, and compare recently proposed methods such as Adaptive Block Floating Point, SmoothQuant, GPTQ and RPTQ.
    Abstract The recent rise of large language models (LLMs) has resulted in increased efforts towards running LLMs at reduced precision. Running LLMs at lower precision supports resource constraints and furthers their democratization, enabling users to run billion-parameter LLMs on their personal devices. To supplement this ongoing effort, we propose INT-FP-QSim: an open-source simulator that enables flexible evaluation of LLMs and vision transformers at various numerical precisions and formats. INT-FP-QSim leverages existing open-source repositories such as TensorRT, QPytorch and AIMET for a combined simulator that supports various floating point and integer formats. With the help of our simulator, we survey the impact of different numerical formats on the performance of LLMs and vision transformers at 4-bit weights and 4-bit or 8-bit activations. We also compare recently proposed methods like Adaptive Block Floating Point, SmoothQuant, GPTQ and RPTQ on the model performances. We hope INT-FP-QSim will enable researchers to flexibly simulate models at various precisions to support further research in quantization of LLMs and vision transformers.
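    The simulation principle behind such tools is fake quantization: values are quantized and immediately dequantized, so the surrounding float pipeline "sees" low-precision numerics without custom hardware. Below is a minimal uniform-integer version (illustrative only; INT-FP-QSim additionally covers floating-point formats via the libraries it wraps).

```python
import torch

def fake_quant(x: torch.Tensor, bits: int = 8) -> torch.Tensor:
    # Symmetric uniform quantize-then-dequantize with a per-tensor scale.
    qmax = 2 ** (bits - 1) - 1
    scale = x.abs().amax() / qmax
    return torch.clamp(torch.round(x / scale), -qmax - 1, qmax) * scale

x = torch.randn(4, 16)
for bits in (8, 4):
    err = (x - fake_quant(x, bits)).abs().mean().item()
    print(f"{bits}-bit mean abs error: {err:.4f}")  # error grows as bits shrink
```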

LaunchpadGPT: Language Model as Music Visualization Designer on Launchpad

  • paper_url: http://arxiv.org/abs/2307.04827
  • repo_url: https://github.com/yunlong10/launchpadgpt
  • paper_authors: Siting Xu, Yunlong Tang, Feng Zheng
  • for: This work assists and inspires the design of Launchpad light effects and gives beginners a more accessible way to create music visualizations with the instrument.
  • methods: LaunchpadGPT, built on a language model with strong generation ability, takes a piece of music as input and outputs the lighting effects of Launchpad-playing as a video; Launchpad-playing videos are collected and processed into music/video-frame prompt-completion pairs to train the model.
  • results: Experiments show the method creates better music visualizations than random generation methods and holds potential for a broader range of music visualization applications.
    Abstract Launchpad is a musical instrument that allows users to create and perform music by pressing illuminated buttons. To assist and inspire the design of the Launchpad light effect, and provide a more accessible approach for beginners to create music visualization with this instrument, we proposed the LaunchpadGPT model to generate music visualization designs on Launchpad automatically. Based on the language model with excellent generation ability, our proposed LaunchpadGPT takes an audio piece of music as input and outputs the lighting effects of Launchpad-playing in the form of a video (Launchpad-playing video). We collect Launchpad-playing videos and process them to obtain music and corresponding video frame of Launchpad-playing as prompt-completion pairs, to train the language model. The experiment result shows the proposed method can create better music visualization than random generation methods and hold the potential for a broader range of music visualization applications. Our code is available at https://github.com/yunlong10/LaunchpadGPT/.

cs.LG - 2023-07-08

Efficient Model-Free Exploration in Low-Rank MDPs

  • paper_url: http://arxiv.org/abs/2307.03997
  • repo_url: None
  • paper_authors: Zakaria Mhammedi, Adam Block, Dylan J. Foster, Alexander Rakhlin
  • for: This paper develops a practical, sample-efficient algorithm for exploration in high-dimensional domains where generalization and function approximation are required.
  • methods: The proposed VoX algorithm targets Low-Rank MDPs, whose transition probabilities admit a low-rank factorization based on an unknown feature embedding; it uses a generalized optimal design for the feature embedding as an efficiently computable basis for exploration, interleaving representation learning with policy optimization, and is computationally efficient and model-free.
  • results: The analysis shows VoX is the first provably sample-efficient exploration algorithm for Low-Rank MDPs that is both computationally efficient and model-free, requiring no latent variable structure, model-based function approximation, or reachability assumptions.
    Abstract A major challenge in reinforcement learning is to develop practical, sample-efficient algorithms for exploration in high-dimensional domains where generalization and function approximation is required. Low-Rank Markov Decision Processes -- where transition probabilities admit a low-rank factorization based on an unknown feature embedding -- offer a simple, yet expressive framework for RL with function approximation, but existing algorithms are either (1) computationally intractable, or (2) reliant upon restrictive statistical assumptions such as latent variable structure, access to model-based function approximation, or reachability. In this work, we propose the first provably sample-efficient algorithm for exploration in Low-Rank MDPs that is both computationally efficient and model-free, allowing for general function approximation and requiring no additional structural assumptions. Our algorithm, VoX, uses the notion of a generalized optimal design for the feature embedding as an efficiently computable basis for exploration, performing efficient optimal design computation by interleaving representation learning and policy optimization. Our analysis -- which is appealingly simple and modular -- carefully combines several techniques, including a new reduction from optimal design computation to policy optimization based on the Frank-Wolfe method, and an improved analysis of a certain minimax representation learning objective found in prior work.

NLP Meets RNA: Unsupervised Embedding Learning for Ribozymes with Word2Vec

  • paper_url: http://arxiv.org/abs/2307.05537
  • repo_url: None
  • paper_authors: Andrew Kean Gao
  • for: This study applies Word2Vec to deepen our understanding of ribozymes and to explore better ways of classifying them.
  • methods: Ribo2Vec was trained on over 9,000 diverse ribozymes, learning to map sequences into 128- and 256-dimensional vector spaces; embeddings were computed for five ribozyme classes (hatchet, pistol, hairpin, hovlinc, and twister sister).
  • results: Principal component analysis shows the embeddings separate ribozyme classes, and a simple SVM trained on them classifies ribozyme types with promising accuracy; 256-dimensional embeddings behave similarly to 128-dimensional ones, suggesting a lower-dimensional space suffices to capture ribozyme features.
    Abstract Ribozymes, RNA molecules with distinct 3D structures and catalytic activity, have widespread applications in synthetic biology and therapeutics. However, relatively little research has focused on leveraging deep learning to enhance our understanding of ribozymes. This study implements Word2Vec, an unsupervised learning technique for natural language processing, to learn ribozyme embeddings. Ribo2Vec was trained on over 9,000 diverse ribozymes, learning to map sequences to 128 and 256-dimensional vector spaces. Using Ribo2Vec, sequence embeddings for five classes of ribozymes (hatchet, pistol, hairpin, hovlinc, and twister sister) were calculated. Principal component analysis demonstrated the ability of these embeddings to distinguish between ribozyme classes. Furthermore, a simple SVM classifier trained on ribozyme embeddings showed promising results in accurately classifying ribozyme types. Our results suggest that the embedding vectors contained meaningful information about ribozymes. Interestingly, 256-dimensional embeddings behaved similarly to 128-dimensional embeddings, suggesting that a lower dimension vector space is generally sufficient to capture ribozyme features. This approach demonstrates the potential of Word2Vec for bioinformatics, opening new avenues for ribozyme research. Future research includes using a Transformer-based method to learn RNA embeddings, which can capture long-range interactions between nucleotides.
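    The pipeline is a direct reuse of gensim's Word2Vec once sequences are tokenized; the abstract does not say how sequences are split into "words", so the overlapping k-mer tokenization below (with k=4) is an assumption.

```python
import numpy as np
from gensim.models import Word2Vec

def kmers(seq, k=4):
    # Treat overlapping k-mers as "words" and a sequence as a "sentence"
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

seqs = ["GGAUCCGAUGGUAGUGUGGGGUC", "GCCAGUUAGCUAGGCAACCG"]  # toy ribozymes
model = Word2Vec([kmers(s) for s in seqs],
                 vector_size=128, window=5, min_count=1, sg=1)

def embed(seq):
    # Average k-mer vectors into a fixed-size sequence embedding,
    # ready for PCA or an SVM classifier
    return np.mean([model.wv[w] for w in kmers(seq)], axis=0)

print(embed(seqs[0]).shape)  # (128,)
```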

Integrating Curricula with Replays: Its Effects on Continual Learning

  • paper_url: http://arxiv.org/abs/2307.05747
  • repo_url: https://github.com/zhanglab-deepneurocoglab/integrating-curricula-with-replays
  • paper_authors: Ren Jie Tee, Mengmi Zhang
  • for: This study examines how integrating curricula with replay methods affects continual learning, aiming to improve knowledge retention and learning transfer.
  • methods: Three aspects of curriculum design are studied: the interleaved frequency of replayed exemplars with training data, the sequence in which exemplars are replayed, and the strategy for selecting exemplars into the replay buffer (drawing exemplars from a uniform distribution of difficulties), in line with cognitive psychology principles of interleaved practice and easy-to-hard rehearsal.
  • results: All three curricula effectively mitigated catastrophic forgetting and enhanced positive knowledge transfer, demonstrating the potential of curricula for advancing continual learning methods.
    Abstract Humans engage in learning and reviewing processes with curricula when acquiring new skills or knowledge. This human learning behavior has inspired the integration of curricula with replay methods in continual learning agents. The goal is to emulate the human learning process, thereby improving knowledge retention and facilitating learning transfer. Existing replay methods in continual learning agents involve the random selection and ordering of data from previous tasks, which has shown to be effective. However, limited research has explored the integration of different curricula with replay methods to enhance continual learning. Our study takes initial steps in examining the impact of integrating curricula with replay methods on continual learning in three specific aspects: the interleaved frequency of replayed exemplars with training data, the sequence in which exemplars are replayed, and the strategy for selecting exemplars into the replay buffer. These aspects of curricula design align with cognitive psychology principles and leverage the benefits of interleaved practice during replays, easy-to-hard rehearsal, and exemplar selection strategy involving exemplars from a uniform distribution of difficulties. Based on our results, these three curricula effectively mitigated catastrophic forgetting and enhanced positive knowledge transfer, demonstrating the potential of curricula in advancing continual learning methodologies. Our code and data are available: https://github.com/ZhangLab-DeepNeuroCogLab/Integrating-Curricula-with-Replays
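    The three curriculum knobs map naturally onto a replay buffer. The sketch below assumes each exemplar already carries a scalar difficulty score (e.g., loss-based), which the paper's setup provides but this snippet fakes.

```python
import random

class CurriculumReplayBuffer:
    def __init__(self, capacity=500):
        self.items = []          # list of (example, difficulty)
        self.capacity = capacity

    def select(self, examples):
        # Knob 3, selection: keep exemplars spread uniformly over difficulty.
        ordered = sorted(examples, key=lambda e: e[1])
        step = max(1, len(ordered) // self.capacity)
        self.items = ordered[::step][: self.capacity]

    def replay_batch(self, n, easy_to_hard=True):
        # Knob 2, order: rehearse easy-to-hard (or the reverse).
        ordered = sorted(self.items, key=lambda e: e[1], reverse=not easy_to_hard)
        return [x for x, _ in ordered[:n]]

def interleave(new_batches, buffer, every=2):
    # Knob 1, frequency: one replay batch after every `every` new batches.
    for i, batch in enumerate(new_batches, 1):
        yield ("new", batch)
        if i % every == 0:
            yield ("replay", buffer.replay_batch(len(batch)))

buf = CurriculumReplayBuffer(capacity=4)
buf.select([(f"x{i}", random.random()) for i in range(20)])
for kind, b in interleave([["a", "b"], ["c", "d"], ["e", "f"]], buf):
    print(kind, b)
```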

Building and Road Segmentation Using EffUNet and Transfer Learning Approach

  • paper_url: http://arxiv.org/abs/2307.03980
  • repo_url: None
  • paper_authors: Sahil Gangurde
  • for: This thesis targets semantic segmentation of buildings and roads from aerial images captured by satellites and UAVs.
  • methods: A novel architecture uses Google's newly proposed EfficientNetV2 as the encoder for feature extraction, combined with a UNet decoder that constructs the segmentation map.
  • results: The approach achieves benchmark scores on the Massachusetts Buildings and Roads datasets, with mIOU of 0.8365 and 0.9153 respectively.
    Abstract In cities, information about urban objects such as water supply, railway lines, power lines, buildings, roads, etc., is necessary for city planning. In particular, policymakers need information about the spread, locations and capacity of these objects to make impactful decisions. This thesis aims to segment buildings and roads from aerial images captured by satellites and UAVs. Many different architectures have been proposed for the semantic segmentation task, UNet being one of them. In this thesis, we propose a novel architecture based on Google's newly proposed EfficientNetV2 as an encoder for feature extraction with a UNet decoder for constructing the segmentation map. Using this approach we achieved benchmark scores on the Massachusetts Buildings and Roads dataset, with an mIOU of 0.8365 and 0.9153 respectively.
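
A minimal PyTorch sketch of this encoder-decoder pairing, assuming the timm library's multi-scale feature-extraction interface for EfficientNetV2; the decoder widths and block layout are illustrative guesses rather than the thesis's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import timm

class EffUNet(nn.Module):
    """EfficientNetV2 encoder + UNet-style decoder (a sketch; the thesis's
    exact decoder design is not specified here)."""
    def __init__(self, n_classes=1):
        super().__init__()
        # Multi-scale encoder features act as the UNet skip connections.
        self.encoder = timm.create_model(
            "tf_efficientnetv2_s", features_only=True, pretrained=False)
        chs = self.encoder.feature_info.channels()  # channels per feature stage
        self.decoders = nn.ModuleList()
        prev = chs[-1]
        for skip_ch in reversed(chs[:-1]):
            self.decoders.append(nn.Sequential(
                nn.Conv2d(prev + skip_ch, skip_ch, 3, padding=1),
                nn.BatchNorm2d(skip_ch),
                nn.ReLU(inplace=True)))
            prev = skip_ch
        self.head = nn.Conv2d(prev, n_classes, 1)

    def forward(self, x):
        feats = self.encoder(x)
        out = feats[-1]
        for dec, skip in zip(self.decoders, reversed(feats[:-1])):
            out = F.interpolate(out, size=skip.shape[-2:], mode="bilinear",
                                align_corners=False)
            out = dec(torch.cat([out, skip], dim=1))
        # Restore full input resolution for the per-pixel building/road mask.
        out = F.interpolate(out, scale_factor=2, mode="bilinear",
                            align_corners=False)
        return self.head(out)

mask_logits = EffUNet(n_classes=2)(torch.randn(1, 3, 256, 256))
```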

Digital Twins for Patient Care via Knowledge Graphs and Closed-Form Continuous-Time Liquid Neural Networks

  • paper_url: http://arxiv.org/abs/2307.04772
  • repo_url: None
  • paper_authors: Logan Nye
  • for: This paper examines how digital twin technology can enable personalized medicine and support, earlier diagnoses, simulated treatment outcomes, and optimized surgical planning in healthcare.
  • methods: The paper proposes a novel framework that structures patient health data as a knowledge graph and uses closed-form continuous-time liquid neural networks for real-time analytics, addressing the computational costs and modeling complexities that have held back clinical twin modeling.
  • results: The proposed framework provides a comprehensive, adaptable view of patient health with real-time insights, supporting personalized medicine, early diagnosis and intervention, and optimal surgical planning, and paving the way for digital twin simulations in healthcare.
    Abstract Digital twin technology is anticipated to transform healthcare, enabling personalized medicines and support, earlier diagnoses, simulated treatment outcomes, and optimized surgical plans. Digital twins are readily gaining traction in industries like manufacturing, supply chain logistics, and civil infrastructure. Not in patient care, however. The challenge of modeling complex diseases with multimodal patient data and the computational complexities of analyzing it have stifled digital twin adoption in the biomedical vertical. Yet, these major obstacles can potentially be handled by approaching these models in a different way. This paper proposes a novel framework for addressing the barriers to clinical twin modeling created by computational costs and modeling complexities. We propose structuring patient health data as a knowledge graph and using closed-form continuous-time liquid neural networks, for real-time analytics. By synthesizing multimodal patient data and leveraging the flexibility and efficiency of closed form continuous time networks and knowledge graph ontologies, our approach enables real time insights, personalized medicine, early diagnosis and intervention, and optimal surgical planning. This novel approach provides a comprehensive and adaptable view of patient health along with real-time analytics, paving the way for digital twin simulations and other anticipated benefits in healthcare.

Fault Monitoring in Passive Optical Networks using Machine Learning Techniques

  • paper_url: http://arxiv.org/abs/2307.03945
  • repo_url: None
  • paper_authors: Khouloud Abdelli, Carsten Tropschug, Helmut Griesser, Stephan Pachnicke
  • for: Improving the reliability of passive optical network (PON) systems and reducing the financial losses that service providers or operators face from service interruptions.
  • methods: Machine learning (ML) approaches for fault monitoring in PON systems, validated using experimental optical time domain reflectometry (OTDR) data.
  • results: The ML-based approaches enable fault monitoring in PON systems, including identification of faulty ONUs that are hard to distinguish when branch terminations are nearly equidistant, improving monitoring reliability as networks grow.
    Abstract Passive optical network (PON) systems are vulnerable to a variety of failures, including fiber cuts and optical network unit (ONU) transmitter/receiver failures. Any service interruption caused by a fiber cut can result in huge financial losses for service providers or operators. Identifying the faulty ONU becomes difficult in the case of nearly equidistant branch terminations because the reflections from the branches overlap, making it difficult to distinguish the faulty branch given the global backscattering signal. With increasing network size, the complexity of fault monitoring in PON systems increases, resulting in less reliable monitoring. To address these challenges, we propose in this paper various machine learning (ML) approaches for fault monitoring in PON systems, and we validate them using experimental optical time domain reflectometry (OTDR) data.
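
As a toy illustration of the supervised formulation, the sketch below classifies which branch is faulty from an OTDR trace; the trace data and label scheme are synthetic stand-ins, not the paper's experimental setup.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Hypothetical setup: each row is a sampled OTDR backscatter trace from the
# PON feeder; the label says which branch (ONU) is faulty, with 0 = no fault.
rng = np.random.default_rng(0)
traces = rng.normal(size=(1000, 256))      # stand-in for measured OTDR traces
labels = rng.integers(0, 9, size=1000)     # 8 branches + "no fault"

X_train, X_test, y_train, y_test = train_test_split(
    traces, labels, test_size=0.2, random_state=0)
clf = RandomForestClassifier(random_state=0).fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))
```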

Rosko: Row Skipping Outer Products for Sparse Matrix Multiplication Kernels

  • paper_url: http://arxiv.org/abs/2307.03930
  • repo_url: https://github.com/vnatesh/rosko
  • paper_authors: Vikas Natesh, Andrew Sabot, H. T. Kung, Mark Ting
  • for: Reducing the computation and memory-access requirements of deep neural network (DNN) inference.
  • methods: The paper derives sparse matrix multiplication (SpMM) kernels using row skipping outer products (Rosko), which skip entire row computations during program execution with low sparsity-management overhead; the kernels adapt analytically to hardware characteristics without auto-tuning, and Rosko's packing format lets other outer-product scheduling methods skip unnecessary computation as well (see the sketch after the abstract).
  • results: Rosko kernels outperform existing auto-tuning and search-based solutions as well as state-of-the-art vendor-optimized libraries on real hardware across a variety of neural network workloads, achieving up to a 6.5x runtime reduction on Intel and ARM CPUs for matrices with 65% to 99.8% sparsity.
    Abstract We propose Rosko -- row skipping outer products -- for deriving sparse matrix multiplication (SpMM) kernels in reducing computation and memory access requirements of deep neural networks (DNNs). Rosko allows skipping of entire row computations during program execution with low sparsity-management overheads. We analytically derive sparse CPU kernels that adapt to given hardware characteristics to effectively utilize processor cores and minimize data movement without the need for auto-tuning or search space exploration. Rosko can be integrated with other outer product scheduling methods, allowing them to leverage row skipping by using Rosko's packing format to skip unnecessary computation. Rosko kernels outperform existing auto-tuning and search-based solutions as well as state-of-the-art vendor-optimized libraries on real hardware across a variety of neural network workloads. For matrices with sparsities ranging from 65% to 99.8% typically found in machine learning, Rosko kernels achieve up to a 6.5x runtime reduction on Intel and ARM CPUs.
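
The core idea can be sketched in NumPy: compute C = A @ B as a sum of outer products and skip an outer product entirely when its contributing slice is all zero. This is a simplified dense-storage illustration; the actual Rosko kernels operate on packed sparse formats with CPU-specific tiling.

```python
import numpy as np

def rosko_style_spmm(A, B):
    """C = A @ B as a sum of outer products over k, skipping outer product k
    entirely when column k of the sparse operand A is empty -- the dense row
    B[k, :] is then never touched (the skipped "row computation")."""
    m, k = A.shape
    _, n = B.shape
    C = np.zeros((m, n))
    for kk in range(k):
        col = A[:, kk]
        if not col.any():          # skip: no work for an all-zero column
            continue
        C += np.outer(col, B[kk, :])
    return C

rng = np.random.default_rng(0)
A = rng.random((64, 64))
A[:, rng.random(64) < 0.7] = 0     # structured sparsity: many empty columns
B = rng.random((64, 32))
assert np.allclose(rosko_style_spmm(A, B), A @ B)
```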

Fairness-Aware Graph Neural Networks: A Survey

  • paper_url: http://arxiv.org/abs/2307.03929
  • repo_url: None
  • paper_authors: April Chen, Ryan A. Rossi, Namyong Park, Puja Trivedi, Yu Wang, Tong Yu, Sungchul Kim, Franck Dernoncourt, Nesreen K. Ahmed
  • for: This paper focuses on improving the fairness of Graph Neural Networks (GNNs) by examining and categorizing fairness techniques for GNNs.
  • methods: The paper discusses previous work on fair GNN models and techniques, grouped by whether they improve fairness during preprocessing, training, or post-processing, and introduces an intuitive taxonomy of fairness evaluation metrics.
  • results: The paper highlights the advantages and intuition of fairness techniques in GNNs, summarizes graph datasets useful for benchmarking the fairness of GNN models, and identifies key open problems and challenges that remain to be addressed.
    Abstract Graph Neural Networks (GNNs) have become increasingly important due to their representational power and state-of-the-art predictive performance on many fundamental learning tasks. Despite this success, GNNs suffer from fairness issues that arise as a result of the underlying graph data and the fundamental aggregation mechanism that lies at the heart of the large class of GNN models. In this article, we examine and categorize fairness techniques for improving the fairness of GNNs. Previous work on fair GNN models and techniques are discussed in terms of whether they focus on improving fairness during a preprocessing step, during training, or in a post-processing phase. Furthermore, we discuss how such techniques can be used together whenever appropriate, and highlight the advantages and intuition as well. We also introduce an intuitive taxonomy for fairness evaluation metrics including graph-level fairness, neighborhood-level fairness, embedding-level fairness, and prediction-level fairness metrics. In addition, graph datasets that are useful for benchmarking the fairness of GNN models are summarized succinctly. Finally, we highlight key open problems and challenges that remain to be addressed.
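
As one concrete example from the prediction-level category of such a taxonomy, a common group-fairness metric can be computed directly from node predictions; this sketch shows statistical parity difference and is an illustration, not code from the survey.

```python
import numpy as np

def statistical_parity_difference(y_pred, sensitive):
    """Prediction-level group fairness: the gap in positive-prediction rates
    between the two sensitive groups (0 means perfectly parity-fair)."""
    y_pred, sensitive = np.asarray(y_pred), np.asarray(sensitive)
    rate_a = y_pred[sensitive == 0].mean()
    rate_b = y_pred[sensitive == 1].mean()
    return abs(rate_a - rate_b)

# Toy node-classification output: 0/1 predictions and a binary sensitive attribute.
print(statistical_parity_difference([1, 0, 1, 1, 0, 0], [0, 0, 0, 1, 1, 1]))
```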

Fast Empirical Scenarios

  • paper_url: http://arxiv.org/abs/2307.03927
  • repo_url: None
  • paper_authors: Michael Multerer, Paul Schneider, Rohan Sen
  • for: Extracting a small number of representative scenarios from large, high-dimensional panel data that remain consistent with sample moments, enabling reliable scenario-based modeling and high-dimensional numerical integration.
  • methods: Two novel algorithms: the first identifies scenarios that have not been observed before and yields a scenario-based representation of covariance matrices; the second picks important data points from already-realized states of the world that are consistent with higher-order sample moment information.
  • results: Both algorithms are efficient to compute, and extensive numerical benchmarking studies together with an application in portfolio optimization favor them over existing alternatives.
    Abstract We seek to extract a small number of representative scenarios from large and high-dimensional panel data that are consistent with sample moments. Among two novel algorithms, the first identifies scenarios that have not been observed before, and comes with a scenario-based representation of covariance matrices. The second proposal picks important data points from states of the world that have already realized, and are consistent with higher-order sample moment information. Both algorithms are efficient to compute, and lend themselves to consistent scenario-based modeling and high-dimensional numerical integration. Extensive numerical benchmarking studies and an application in portfolio optimization favor the proposed algorithms.
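
A greedy sketch of the second idea, selecting realized data points whose sample moments track those of the full panel; the objective below uses only the first two moments and is an illustrative simplification of the paper's algorithms.

```python
import numpy as np

def greedy_moment_matching(X, k):
    """Pick k realized rows of X whose mean and covariance best match those
    of the full panel (greedy, first two moments only; the paper also uses
    higher-order moment information)."""
    target_mean = X.mean(axis=0)
    target_cov = np.cov(X, rowvar=False)
    chosen = []
    for _ in range(k):
        best, best_err = None, np.inf
        for i in range(len(X)):
            if i in chosen:
                continue
            S = X[chosen + [i]]
            err = np.linalg.norm(S.mean(axis=0) - target_mean)
            if len(S) > 1:   # covariance needs at least two observations
                err += np.linalg.norm(np.cov(S, rowvar=False) - target_cov)
            if err < best_err:
                best, best_err = i, err
        chosen.append(best)
    return X[chosen]

X = np.random.default_rng(1).normal(size=(500, 5))
scenarios = greedy_moment_matching(X, 10)
```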

Training Physics-Informed Neural Networks via Multi-Task Optimization for Traffic Density Prediction

  • paper_url: http://arxiv.org/abs/2307.03920
  • repo_url: None
  • paper_authors: Bo Wang, A. K. Qin, Sajjad Shafiei, Hussein Dia, Adriana-Simona Mihaita, Hanna Grzybowska
  • for: Predicting traffic density.
  • methods: A multi-task optimization (MTO) training framework for physics-informed neural networks (PINNs), in which multiple auxiliary tasks are created and solved together with the main task, adaptively transferring useful knowledge between tasks.
  • results: Experimental results show the proposed training framework significantly improves PINN performance compared to conventional training, particularly under limited training data.
    Abstract Physics-informed neural networks (PINNs) are a newly emerging research frontier in machine learning, which incorporate certain physical laws that govern a given data set, e.g., those described by partial differential equations (PDEs), into the training of the neural network (NN) based on such a data set. In PINNs, the NN acts as the solution approximator for the PDE while the PDE acts as the prior knowledge to guide the NN training, leading to the desired generalization performance of the NN when facing the limited availability of training data. However, training PINNs is a non-trivial task largely due to the complexity of the loss composed of both NN and physical law parts. In this work, we propose a new PINN training framework based on the multi-task optimization (MTO) paradigm. Under this framework, multiple auxiliary tasks are created and solved together with the given (main) task, where the useful knowledge from solving one task is transferred in an adaptive mode to assist in solving some other tasks, aiming to uplift the performance of solving the main task. We implement the proposed framework and apply it to train the PINN for addressing the traffic density prediction problem. Experimental results demonstrate that our proposed training framework leads to significant performance improvement in comparison to the traditional way of training the PINN.
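
A minimal sketch of the PINN ingredients such a framework optimizes: a data-fitting loss (the main task) and a PDE-residual loss that could serve as one auxiliary task. The LWR conservation law with a Greenshields flux is an assumed example PDE, not necessarily the one used in the paper.

```python
import torch
import torch.nn as nn

# rho(x, t): a small network mapping space-time coordinates to traffic density.
net = nn.Sequential(nn.Linear(2, 64), nn.Tanh(),
                    nn.Linear(64, 64), nn.Tanh(),
                    nn.Linear(64, 1))

def pinn_losses(xt_data, rho_data, xt_coll):
    """Main task: fit observed densities. Physics task: penalize the residual
    of an assumed LWR conservation law rho_t + d/dx[rho*(1-rho)] = 0
    (Greenshields flux, chosen only for illustration)."""
    data_loss = ((net(xt_data) - rho_data) ** 2).mean()

    xt = xt_coll.clone().requires_grad_(True)
    rho = net(xt)
    grads = torch.autograd.grad(rho.sum(), xt, create_graph=True)[0]
    rho_x, rho_t = grads[:, 0:1], grads[:, 1:2]
    residual = rho_t + (1 - 2 * rho) * rho_x   # chain rule on rho*(1-rho)
    return data_loss, (residual ** 2).mean()

# In an MTO-style setup these losses (and further auxiliary variants) would be
# solved together, with knowledge transferred adaptively between the tasks.
data_loss, pde_loss = pinn_losses(
    torch.rand(32, 2), torch.rand(32, 1), torch.rand(128, 2))
(data_loss + pde_loss).backward()
```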

Incorporating Deep Q – Network with Multiclass Classification Algorithms

  • paper_url: http://arxiv.org/abs/2307.03908
  • repo_url: None
  • paper_authors: Noopur Zambare, Ravindranath Sawane
  • for: This study explores how a Deep Q-Network (DQN) can improve the functionality of multiclass classification algorithms, with a particular focus on predicting financial distress in companies.
  • methods: The study builds a framework that incorporates DQN with existing supervised multiclass classification algorithms, using a benchmark dataset from Kaggle.
  • results: The findings give insight into how deep reinforcement learning strategies can increase multiclass classification accuracy, with relevance to fields such as image recognition, natural language processing, and bioinformatics, as well as financial risk management.
    Abstract In this study, we explore how Deep Q-Network (DQN) might improve the functionality of multiclass classification algorithms. We will use a benchmark dataset from Kaggle to create a framework incorporating DQN with existing supervised multiclass classification algorithms. The findings of this study will bring insight into how deep reinforcement learning strategies may be used to increase multiclass classification accuracy. They have been used in a number of fields, including image recognition, natural language processing, and bioinformatics. This study is focused on the prediction of financial distress in companies in addition to the wider application of Deep Q-Network in multiclass classification. Identifying businesses that are likely to experience financial distress is a crucial task in the fields of finance and risk management. Whenever a business experiences serious challenges keeping its operations going and meeting its financial responsibilities, it is said to be in financial distress. It commonly happens when a company has a sharp and sustained recession in profitability, cash flow issues, or an unsustainable level of debt.
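
One common way to cast classification as a DQN problem, shown below as a hedged sketch: states are feature vectors, actions are class labels, and episodes last one step, so the Q-target reduces to the immediate reward. The paper's exact framing may differ.

```python
import numpy as np
import torch
import torch.nn as nn

n_features, n_classes = 20, 4
q_net = nn.Sequential(nn.Linear(n_features, 64), nn.ReLU(),
                      nn.Linear(64, n_classes))
opt = torch.optim.Adam(q_net.parameters(), lr=1e-3)

def dqn_step(x, y, eps=0.1):
    """One training step: epsilon-greedy action over class labels, reward +1
    for a correct prediction and -1 otherwise; with one-step episodes the
    Q-target has no bootstrapping term."""
    q = q_net(x)
    action = np.random.randint(n_classes) if np.random.rand() < eps \
        else int(q.argmax())
    reward = 1.0 if action == int(y) else -1.0
    loss = (q[action] - reward) ** 2
    opt.zero_grad()
    loss.backward()
    opt.step()
    return reward

x, y = torch.randn(n_features), torch.tensor(2)
dqn_step(x, y)
```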

ScriptWorld: Text Based Environment For Learning Procedural Knowledge

  • paper_url: http://arxiv.org/abs/2307.03906
  • repo_url: https://github.com/exploration-lab/scriptworld
  • paper_authors: Abhinav Joshi, Areeb Ahmad, Umang Pandey, Ashutosh Modi
  • for: Teaching RL agents commonsense knowledge about real-world daily chores together with natural language understanding.
  • methods: The ScriptWorld environment, a text-based gaming framework built from a scripts dataset covering daily real-world human activities; RL-based baseline agents play the games, leveraging features from pre-trained language models to tackle the text-based environments.
  • results: Experiments show that prior knowledge from a pre-trained language model helps agents solve real-world text-based gaming environments.
    Abstract Text-based games provide a framework for developing natural language understanding and commonsense knowledge about the world in reinforcement learning based agents. Existing text-based environments often rely on fictional situations and characters to create a gaming framework and are far from real-world scenarios. In this paper, we introduce ScriptWorld: a text-based environment for teaching agents about real-world daily chores and hence imparting commonsense knowledge. To the best of our knowledge, it is the first interactive text-based gaming framework that consists of daily real-world human activities designed using scripts dataset. We provide gaming environments for 10 daily activities and perform a detailed analysis of the proposed environment. We develop RL-based baseline models/agents to play the games in Scriptworld. To understand the role of language models in such environments, we leverage features obtained from pre-trained language models in the RL agents. Our experiments show that prior knowledge obtained from a pre-trained language model helps to solve real-world text-based gaming environments. We release the environment via Github: https://github.com/Exploration-Lab/ScriptWorld
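
A toy environment in the spirit of ScriptWorld, where the agent must choose the correct next step of a daily-activity script among distractors; the real environment is built from a scripts dataset and is considerably richer.

```python
import random

class ScriptEnv:
    """Minimal text environment: the observation lists the steps done so far
    and a set of candidate next steps; the agent is rewarded for picking the
    script's true next step (an illustrative sketch, not the ScriptWorld API)."""
    def __init__(self, script, n_options=3):
        self.script, self.n_options = script, n_options

    def reset(self):
        self.pos = 0
        return self._observe()

    def _observe(self):
        correct = self.script[self.pos]
        distractors = random.sample(
            [s for s in self.script if s != correct], self.n_options - 1)
        self._options = distractors + [correct]
        random.shuffle(self._options)
        return {"done_so_far": self.script[:self.pos], "options": self._options}

    def step(self, action):
        reward = 1.0 if self._options[action] == self.script[self.pos] else -1.0
        if reward > 0:
            self.pos += 1
        done = self.pos == len(self.script)
        return (None if done else self._observe()), reward, done

env = ScriptEnv(["enter kitchen", "fill kettle", "boil water", "pour coffee"])
obs = env.reset()
```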

Feature selection simultaneously preserving both class and cluster structures

  • paper_url: http://arxiv.org/abs/2307.03902
  • repo_url: None
  • paper_authors: Suchismita Das, Nikhil R. Pal
  • for: Proposing a feature selection method that considers class discrimination and cluster structure preservation simultaneously, improving both classification and clustering performance.
  • methods: A neural network-based feature selection method that integrates class discrimination and structure preservation in a single objective.
  • results: Experiments, including band selection in hyperspectral images, show the proposed feature/band selection can pick a subset of features that is good for both classification and clustering.
    Abstract When a data set has significant differences in its class and cluster structure, selecting features aiming only at the discrimination of classes would lead to poor clustering performance, and similarly, feature selection aiming only at preserving cluster structures would lead to poor classification performance. To the best of our knowledge, a feature selection method that simultaneously considers class discrimination and cluster structure preservation is not available in the literature. In this paper, we have tried to bridge this gap by proposing a neural network-based feature selection method that focuses both on class discrimination and structure preservation in an integrated manner. In addition to assessing typical classification problems, we have investigated its effectiveness on band selection in hyperspectral images. Based on the results of the experiments, we may claim that the proposed feature/band selection can select a subset of features that is good for both classification and clustering.
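
A sketch of one way to realize such a joint objective with a neural network: learnable per-feature gates trained with a classification loss plus a term that preserves pairwise geometry (and hence cluster structure) in the gated space. The specific losses are assumptions for illustration, not the paper's formulation.

```python
import torch
import torch.nn as nn

class GatedSelector(nn.Module):
    """Soft feature selection via sigmoid gates feeding a linear classifier."""
    def __init__(self, d, n_classes):
        super().__init__()
        self.gate_logits = nn.Parameter(torch.zeros(d))
        self.clf = nn.Linear(d, n_classes)

    def forward(self, x):
        g = torch.sigmoid(self.gate_logits)   # per-feature gate in (0, 1)
        return self.clf(x * g), g

def joint_loss(model, x, y, lam=0.1):
    logits, g = model(x)
    cls_loss = nn.functional.cross_entropy(logits, y)     # class discrimination
    d_full = torch.cdist(x, x)
    d_sel = torch.cdist(x * g, x * g)
    structure_loss = ((d_full - d_sel) ** 2).mean()       # keep cluster geometry
    sparsity = g.mean()                                   # encourage few features
    return cls_loss + lam * structure_loss + lam * sparsity

model = GatedSelector(d=16, n_classes=3)
loss = joint_loss(model, torch.randn(64, 16), torch.randint(0, 3, (64,)))
loss.backward()
```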

Active Learning in Physics: From 101, to Progress, and Perspective

  • paper_url: http://arxiv.org/abs/2307.03899
  • repo_url: None
  • paper_authors: Yongcheng Ding, José D. Martín-Guerrero, Yolanda Vives-Gilabert, Xi Chen
  • for: This paper presents a comprehensive and accessible introduction to active learning (AL), covering its theory and the latest advancements across various domains.
  • methods: AL iteratively selects unlabeled samples to be annotated by an expert, prioritizing the most informative samples; this protocol can yield better model performance than training on all labeled samples.
  • results: The paper also explores the potential integration of AL with quantum machine learning, envisioning a synergistic fusion of the two fields rather than treating AL as a mere extension of classical ML into the quantum realm.
    Abstract Active Learning (AL) is a family of machine learning (ML) algorithms that predates the current era of artificial intelligence. Unlike traditional approaches that require labeled samples for training, AL iteratively selects unlabeled samples to be annotated by an expert. This protocol aims to prioritize the most informative samples, leading to improved model performance compared to training with all labeled samples. In recent years, AL has gained increasing attention, particularly in the field of physics. This paper presents a comprehensive and accessible introduction to the theory of AL reviewing the latest advancements across various domains. Additionally, we explore the potential integration of AL with quantum ML, envisioning a synergistic fusion of these two fields rather than viewing AL as a mere extension of classical ML into the quantum realm.
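
For readers new to the protocol, a minimal pool-based uncertainty-sampling loop looks like this; `oracle` stands in for the human annotator, and the model choice is arbitrary.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def uncertainty_sampling(X_pool, X_init, y_init, oracle, rounds=10, batch=5):
    """Pool-based AL sketch: repeatedly train, score unlabeled samples by
    predictive confidence, and send the least-confident ones to the expert."""
    X_lab, y_lab = X_init, y_init
    unlabeled = np.arange(len(X_pool))
    for _ in range(rounds):
        clf = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)
        proba = clf.predict_proba(X_pool[unlabeled])
        confidence = proba.max(axis=1)
        ask = unlabeled[np.argsort(confidence)[:batch]]  # least confident first
        X_lab = np.vstack([X_lab, X_pool[ask]])
        y_lab = np.concatenate([y_lab, oracle(ask)])     # expert annotates
        unlabeled = np.setdiff1d(unlabeled, ask)
    return clf
```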

Incomplete Utterance Rewriting as Sequential Greedy Tagging

  • paper_url: http://arxiv.org/abs/2307.06337
  • repo_url: None
  • paper_authors: Yunshan Chen
  • for: This paper addresses incomplete utterance rewriting, i.e., recovering information omitted from an utterance using the dialogue context.
  • methods: A sequence tagging-based model that is more adept at extracting information from context, together with speaker-aware embeddings to model speaker variation.
  • results: On multiple public datasets, the model achieves the best results on all nine restoration scores while keeping other metric scores comparable to previous state-of-the-art models; thanks to its simplicity, it also outperforms most previous models on inference speed.
    Abstract The task of incomplete utterance rewriting has recently gotten much attention. Previous models struggled to extract information from the dialogue context, as evidenced by the low restoration scores. To address this issue, we propose a novel sequence tagging-based model, which is more adept at extracting information from context. Meanwhile, we introduce speaker-aware embedding to model speaker variation. Experiments on multiple public datasets show that our model achieves optimal results on all nine restoration scores while having other metric scores comparable to previous state-of-the-art models. Furthermore, benefitting from the model's simplicity, our approach outperforms most previous models on inference speed.

Improving Prototypical Part Networks with Reward Reweighing, Reselection, and Retraining

  • paper_url: http://arxiv.org/abs/2307.03887
  • repo_url: None
  • paper_authors: Robin Netzorg, Jiaxun Li, Bin Yu
  • for: The paper aims to improve the interpretability of prototype-based deep learning models for image classification by using human feedback to fine-tune the prototypes.
  • methods: The paper proposes R3-ProtoPNet, which adds reward-based reweighting, reselection, and retraining steps to the ProtoPNet training loop, realigning the model's features with the updated prototypes.
  • results: R3-ProtoPNet improves the overall consistency and meaningfulness of the prototypes but lowers test predictive accuracy when used independently; incorporating multiple R3-ProtoPNets into an ensemble increases test predictive performance while maintaining interpretability.
    Abstract In recent years, work has gone into developing deep interpretable methods for image classification that clearly attributes a model's output to specific features of the data. One such of these methods is the prototypical part network (ProtoPNet), which attempts to classify images based on meaningful parts of the input. While this method results in interpretable classifications, this method often learns to classify from spurious or inconsistent parts of the image. Hoping to remedy this, we take inspiration from the recent developments in Reinforcement Learning with Human Feedback (RLHF) to fine-tune these prototypes. By collecting human annotations of prototypes quality via a 1-5 scale on the CUB-200-2011 dataset, we construct a reward model that learns to identify non-spurious prototypes. In place of a full RL update, we propose the reweighted, reselected, and retrained prototypical part network (R3-ProtoPNet), which adds an additional three steps to the ProtoPNet training loop. The first two steps are reward-based reweighting and reselection, which align prototypes with human feedback. The final step is retraining to realign the model's features with the updated prototypes. We find that R3-ProtoPNet improves the overall consistency and meaningfulness of the prototypes, but lower the test predictive accuracy when used independently. When multiple R3-ProtoPNets are incorporated into an ensemble, we find an increase in test predictive performance while maintaining interpretability.

On Regularization and Inference with Label Constraints

  • paper_url: http://arxiv.org/abs/2307.03886
  • repo_url: None
  • paper_authors: Kaifu Wang, Hangfeng He, Tin D. Nguyen, Piyush Kumar, Dan Roth
  • for: The paper studies constraints in machine learning, specifically how label constraints are expressed and enforced in structured prediction problems.
  • methods: The paper compares two common strategies for encoding label constraints, regularization with constraints and constrained inference, quantifying their impact across the machine learning pipeline.
  • results: Regularization narrows the generalization gap by precluding models inconsistent with the constraints, but its preference for small violations biases it toward a suboptimal model; constrained inference instead reduces the population risk by correcting a model's violations, turning them into an advantage. The paper therefore explores using the two together and gives conditions under which constrained inference compensates for the bias introduced by regularization.
    Abstract Prior knowledge and symbolic rules in machine learning are often expressed in the form of label constraints, especially in structured prediction problems. In this work, we compare two common strategies for encoding label constraints in a machine learning pipeline, regularization with constraints and constrained inference, by quantifying their impact on model performance. For regularization, we show that it narrows the generalization gap by precluding models that are inconsistent with the constraints. However, its preference for small violations introduces a bias toward a suboptimal model. For constrained inference, we show that it reduces the population risk by correcting a model's violation, and hence turns the violation into an advantage. Given these differences, we further explore the use of two approaches together and propose conditions for constrained inference to compensate for the bias introduced by regularization, aiming to improve both the model complexity and optimal risk.
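
A tiny worked example of the two strategies on a toy constraint "label a implies label b": regularization adds a violation penalty to the training loss, while constrained inference restricts decoding to feasible assignments. The scores and penalty weight are made up for illustration.

```python
import numpy as np

# Toy structured task: two binary labels (a, b) with the constraint "a implies b".
# Scores are per-label log-odds from some trained model.
scores = {"a": 2.0, "b": -1.5}    # unconstrained argmax: a=1, b=0 -- a violation

# Regularization (training time): penalize the expected violation, nudging the
# model toward consistency while still tolerating small violations.
p = {k: 1.0 / (1.0 + np.exp(-v)) for k, v in scores.items()}
violation = p["a"] * (1 - p["b"])  # Pr(a=1, b=0), assuming independent labels
reg_term = 0.5 * violation         # lambda * violation, added to the training loss

# Constrained inference (test time): decode only over feasible assignments,
# turning a learned violation into a correction.
feasible = [(0, 0), (0, 1), (1, 1)]                  # (1, 0) excluded outright
score = lambda ab: ab[0] * scores["a"] + ab[1] * scores["b"]
print(max(feasible, key=score))                      # -> (1, 1) here
```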

Noisy Tensor Ring approximation for computing gradients of Variational Quantum Eigensolver for Combinatorial Optimization

  • paper_url: http://arxiv.org/abs/2307.03884
  • repo_url: None
  • paper_authors: Dheeraj Peddireddy, Utkarsh Priyam, Vaneet Aggarwal
  • for: Improving the scalability of variational quantum algorithms such as Quantum Approximate Optimization and the Variational Quantum Eigensolver (VQE), which is limited by classically intractable gradients.
  • methods: A classical gradient computation method that uses the parameter-shift rule but computes the expected values from the circuits via a tensor ring approximation; two-qubit rotations are evaluated by truncating singular values, preserving the tensor-ring structure and reducing computational complexity.
  • results: The complexity of this matrix-product-state variant grows linearly in the number of qubits and two-qubit gates, rather than exponentially as in exact classical simulation, enabling faster gradient evaluation on classical simulators.
    Abstract Variational Quantum algorithms, especially Quantum Approximate Optimization and Variational Quantum Eigensolver (VQE) have established their potential to provide computational advantage in the realm of combinatorial optimization. However, these algorithms suffer from classically intractable gradients limiting the scalability. This work addresses the scalability challenge for VQE by proposing a classical gradient computation method which utilizes the parameter shift rule but computes the expected values from the circuits using a tensor ring approximation. The parametrized gates from the circuit transform the tensor ring by contracting the matrix along the free edges of the tensor ring. While the single qubit gates do not alter the ring structure, the state transformations from the two qubit rotations are evaluated by truncating the singular values thereby preserving the structure of the tensor ring and reducing the computational complexity. This variation of the Matrix product state approximation grows linearly in number of qubits and the number of two qubit gates as opposed to the exponential growth in the classical simulations, allowing for a faster evaluation of the gradients on classical simulators.
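
The classical gradient evaluation rests on the standard parameter-shift rule; for a gate generated by a Pauli operator it reads as follows, with each shifted expectation computed here from the tensor-ring approximation instead of on quantum hardware.

```latex
% Parameter-shift rule for a gate U_i(\theta_i) = e^{-i \theta_i P_i / 2}
% generated by a Pauli operator P_i. With
% E(\boldsymbol{\theta}) = \langle \psi(\boldsymbol{\theta})| H |\psi(\boldsymbol{\theta}) \rangle,
% the gradient is an exact difference of two shifted circuit evaluations:
\[
  \frac{\partial E}{\partial \theta_i}
  = \frac{1}{2}\left[
      E\!\left(\boldsymbol{\theta} + \tfrac{\pi}{2}\,\mathbf{e}_i\right)
    - E\!\left(\boldsymbol{\theta} - \tfrac{\pi}{2}\,\mathbf{e}_i\right)
    \right].
\]
```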

Large Language Models for Supply Chain Optimization

  • paper_url: http://arxiv.org/abs/2307.03875
  • repo_url: https://github.com/jettbrains/-L-
  • paper_authors: Beibin Li, Konstantina Mellou, Bo Zhang, Jeevan Pathuri, Ishai Menache
  • for: The paper is written for supply chain operators and managers who need to interpret and explain the outcomes of optimization algorithms to stakeholders.
  • methods: The paper proposes a framework called OptiGuide that leverages Large Language Models (LLMs) to provide insights into the underlying optimization outcomes. The framework accepts queries in plain text and outputs explanations of the optimization results without requiring the transfer of proprietary data to the LLM.
  • results: The paper demonstrates the effectiveness of OptiGuide on a real server placement scenario within Microsoft’s cloud supply chain. The results show that OptiGuide can provide accurate explanations of the optimization outcomes, and the proposed evaluation benchmark can be used to evaluate the accuracy of the LLM output in other scenarios.
    Abstract Supply chain operations traditionally involve a variety of complex decision making problems. Over the last few decades, supply chains greatly benefited from advances in computation, which allowed the transition from manual processing to automation and cost-effective optimization. Nonetheless, business operators still need to spend substantial efforts in explaining and interpreting the optimization outcomes to stakeholders. Motivated by the recent advances in Large Language Models (LLMs), we study how this disruptive technology can help bridge the gap between supply chain automation and human comprehension and trust thereof. We design OptiGuide -- a framework that accepts as input queries in plain text, and outputs insights about the underlying optimization outcomes. Our framework does not forgo the state-of-the-art combinatorial optimization technology, but rather leverages it to quantitatively answer what-if scenarios (e.g., how would the cost change if we used supplier B instead of supplier A for a given demand?). Importantly, our design does not require sending proprietary data over to LLMs, which can be a privacy concern in some circumstances. We demonstrate the effectiveness of our framework on a real server placement scenario within Microsoft's cloud supply chain. Along the way, we develop a general evaluation benchmark, which can be used to evaluate the accuracy of the LLM output in other scenarios.

Domain Adaptation using Silver Standard Labels for Ki-67 Scoring in Digital Pathology: A Step Closer to Widescale Deployment

  • paper_url: http://arxiv.org/abs/2307.03872
  • repo_url: None
  • paper_authors: Amanda Dy, Ngoc-Nhu Jennifer Nguyen, Seyed Hossein Mirjahanmardi, Melanie Dawe, Anthony Fyles, Wei Shi, Fei-Fei Liu, Dimitrios Androutsos, Susan Done, April Khademi
  • for: This study aims to improve the objectivity and efficiency of Ki-67 proliferation index (PI) scoring using deep learning systems.
  • methods: The study proposes a domain adaptation pipeline that uses an unsupervised framework to generate silver standard (pseudo) labels in the target domain, which augment the gold standard (GS) source-domain data; the approach is validated on experimental data.
  • results: Among five training regimes (SS Only, GS Only, Mixed, GS+SS, and the proposed SS+GS), the SS+GS method achieved significantly higher PI accuracy (95.9%) and more consistent results on target data than the GS Only model (p < 0.05).
    Abstract Deep learning systems have been proposed to improve the objectivity and efficiency of Ki- 67 PI scoring. The challenge is that while very accurate, deep learning techniques suffer from reduced performance when applied to out-of-domain data. This is a critical challenge for clinical translation, as models are typically trained using data available to the vendor, which is not from the target domain. To address this challenge, this study proposes a domain adaptation pipeline that employs an unsupervised framework to generate silver standard (pseudo) labels in the target domain, which is used to augment the gold standard (GS) source domain data. Five training regimes were tested on two validated Ki-67 scoring architectures (UV-Net and piNET), (1) SS Only: trained on target silver standard (SS) labels, (2) GS Only: trained on source GS labels, (3) Mixed: trained on target SS and source GS labels, (4) GS+SS: trained on source GS labels and fine-tuned on target SS labels, and our proposed method (5) SS+GS: trained on source SS labels and fine-tuned on source GS labels. The SS+GS method yielded significantly (p < 0.05) higher PI accuracy (95.9%) and more consistent results compared to the GS Only model on target data. Analysis of t-SNE plots showed features learned by the SS+GS models are more aligned for source and target data, resulting in improved generalization. The proposed pipeline provides an efficient method for learning the target distribution without manual annotations, which are time-consuming and costly to generate for medical images. This framework can be applied to any target site as a per-laboratory calibration method, for widescale deployment.
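
The SS+GS regime can be sketched with any incrementally trainable classifier; here is a toy per-pixel stand-in with scikit-learn, where the domain gap is simulated by an intensity shift. Data shapes and the model are illustrative assumptions, not the paper's pipeline.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
# Toy per-pixel stand-ins: a labeled source site and an unlabeled target site
# whose feature distribution is shifted (the domain gap).
X_src = rng.normal(0.0, 1.0, (2000, 8))
y_src = (X_src[:, 0] > 0).astype(int)
X_tgt = rng.normal(0.7, 1.2, (2000, 8))

# Step 1: a source-trained model produces silver-standard (pseudo) labels for
# the target domain -- no manual annotation required.
src_model = SGDClassifier(loss="log_loss", random_state=0).fit(X_src, y_src)
y_tgt_ss = src_model.predict(X_tgt)

# Step 2 (SS+GS): train on the silver-standard labels first ...
model = SGDClassifier(loss="log_loss", random_state=0)
model.partial_fit(X_tgt, y_tgt_ss, classes=np.array([0, 1]))
# ... then fine-tune on gold-standard labels to stay anchored to expert truth.
model.partial_fit(X_src, y_src)
```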

When Do Transformers Shine in RL? Decoupling Memory from Credit Assignment

  • paper_url: http://arxiv.org/abs/2307.03864
  • repo_url: https://github.com/twni2016/memory-rl
  • paper_authors: Tianwei Ni, Michel Ma, Benjamin Eysenbach, Pierre-Luc Bacon
  • for: This study seeks to explain why the Transformer architecture succeeds in RL and to highlight important areas for future research and benchmark design.
  • methods: The study introduces formal definitions of memory length and credit assignment length, and designs simple configurable tasks that measure these distinct quantities in Transformer-based RL methods.
  • results: Transformers can enhance the memory capacity of RL algorithms, scaling to tasks that require memorizing observations from 1,500 steps earlier; however, they do not improve long-term credit assignment.
    Abstract Reinforcement learning (RL) algorithms face two distinct challenges: learning effective representations of past and present observations, and determining how actions influence future returns. Both challenges involve modeling long-term dependencies. The transformer architecture has been very successful to solve problems that involve long-term dependencies, including in the RL domain. However, the underlying reason for the strong performance of Transformer-based RL methods remains unclear: is it because they learn effective memory, or because they perform effective credit assignment? After introducing formal definitions of memory length and credit assignment length, we design simple configurable tasks to measure these distinct quantities. Our empirical results reveal that Transformers can enhance the memory capacity of RL algorithms, scaling up to tasks that require memorizing observations $1500$ steps ago. However, Transformers do not improve long-term credit assignment. In summary, our results provide an explanation for the success of Transformers in RL, while also highlighting an important area for future research and benchmark design.
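
A toy task in the spirit of the paper's configurable benchmarks, with long memory length but short credit assignment length: the rewarded final action depends only on the very first observation.

```python
import numpy as np

class MemoryLengthEnv:
    """The first observation holds a bit the agent must repeat at the final
    step, so solving it requires a memory of length T while only the last
    action is rewarded (an illustrative sketch, not the paper's tasks)."""
    def __init__(self, T=1500):
        self.T = T

    def reset(self):
        self.t = 0
        self.cue = np.random.randint(2)
        return np.array([self.cue, 0.0])        # cue visible only at t = 0

    def step(self, action):
        self.t += 1
        done = self.t >= self.T
        obs = np.array([0.0, self.t / self.T])  # later observations hide the cue
        reward = float(done and action == self.cue)
        return obs, reward, done
```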

Memory-Immersed Collaborative Digitization for Area-Efficient Compute-in-Memory Deep Learning

  • paper_url: http://arxiv.org/abs/2307.03863
  • repo_url: None
  • paper_authors: Shamma Nasrin, Maeesha Binte Hashem, Nastaran Darabi, Benjamin Parpillon, Farah Fahim, Wilfred Gomes, Amit Ranjan Trivedi
  • for: This work aims to improve the area efficiency of compute-in-memory (CiM) deep learning inference by minimizing the area overhead of conventional analog-to-digital converters (ADCs), so that more CiM arrays fit in a limited footprint, improving parallelism and reducing external memory accesses.
  • methods: CiM arrays exploit their parasitic bit lines to form a within-memory capacitive digital-to-analog converter (DAC) that enables area-efficient successive approximation (SA) digitization; arrays collaborate, with a proximal array digitizing the analog-domain product-sums while another computes the scalar product of inputs and weights. Flash, SA, and hybrid digitization steps can all be implemented in the proposed memory-immersed scheme.
  • results: Demonstrated on a 65 nm CMOS test chip, the design requires ~25x less area and ~1.4x less energy than a 40 nm-node 5-bit SAR ADC, and ~51x less area and ~13x less energy than a 40 nm-node 5-bit Flash ADC.
    Abstract This work discusses memory-immersed collaborative digitization among compute-in-memory (CiM) arrays to minimize the area overheads of a conventional analog-to-digital converter (ADC) for deep learning inference. Thereby, using the proposed scheme, significantly more CiM arrays can be accommodated within limited footprint designs to improve parallelism and minimize external memory accesses. Under the digitization scheme, CiM arrays exploit their parasitic bit lines to form a within-memory capacitive digital-to-analog converter (DAC) that facilitates area-efficient successive approximation (SA) digitization. CiM arrays collaborate where a proximal array digitizes the analog-domain product-sums when an array computes the scalar product of input and weights. We discuss various networking configurations among CiM arrays where Flash, SA, and their hybrid digitization steps can be efficiently implemented using the proposed memory-immersed scheme. The results are demonstrated using a 65 nm CMOS test chip. Compared to a 40 nm-node 5-bit SAR ADC, our 65 nm design requires $\sim$25$\times$ less area and $\sim$1.4$\times$ less energy by leveraging in-memory computing structures. Compared to a 40 nm-node 5-bit Flash ADC, our design requires $\sim$51$\times$ less area and $\sim$13$\times$ less energy.

A Natural Language Processing Approach to Malware Classification

  • paper_url: http://arxiv.org/abs/2307.11032
  • repo_url: None
  • paper_authors: Ritik Mehta, Olha Jurečková, Mark Stamp
  • for: This research proposes a hybrid model in which hidden Markov models (HMMs) are trained on opcode sequences and the resulting hidden state sequences serve as feature vectors for various classifiers.
  • methods: The research combines HMMs with machine learning and deep learning classifiers such as Random Forests, treating the extraction of HMM hidden state sequences as a form of feature engineering analogous to techniques common in natural language processing (NLP).
  • results: The NLP-based approach outperforms other popular techniques on a challenging malware dataset, with the HMM-Random Forest model yielding the best results.
    Abstract Many different machine learning and deep learning techniques have been successfully employed for malware detection and classification. Examples of popular learning techniques in the malware domain include Hidden Markov Models (HMM), Random Forests (RF), Convolutional Neural Networks (CNN), Support Vector Machines (SVM), and Recurrent Neural Networks (RNN) such as Long Short-Term Memory (LSTM) networks. In this research, we consider a hybrid architecture, where HMMs are trained on opcode sequences, and the resulting hidden states of these trained HMMs are used as feature vectors in various classifiers. In this context, extracting the HMM hidden state sequences can be viewed as a form of feature engineering that is somewhat analogous to techniques that are commonly employed in Natural Language Processing (NLP). We find that this NLP-based approach outperforms other popular techniques on a challenging malware dataset, with an HMM-Random Forrest model yielding the best results.
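
A sketch of the hybrid pipeline under stated assumptions: the hmmlearn library for the HMM, synthetic opcode ids in place of real disassembly, and a state-occupancy histogram as one simple way to turn hidden-state sequences into fixed-length feature vectors (the paper uses the hidden state sequences themselves).

```python
import numpy as np
from hmmlearn import hmm                        # assumed dependency
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
# Toy stand-ins for opcode sequences: each sample is a sequence of opcode ids.
seqs = [rng.integers(0, 30, size=200) for _ in range(100)]
families = rng.integers(0, 5, size=100)         # malware family labels

# Train one HMM over the concatenated opcode streams.
X = np.concatenate(seqs).reshape(-1, 1)
lengths = [len(s) for s in seqs]
model = hmm.CategoricalHMM(n_components=4, random_state=0).fit(X, lengths)

def state_histogram(seq):
    """Summarize a sample's hidden-state sequence as state-occupancy rates."""
    states = model.predict(seq.reshape(-1, 1))
    return np.bincount(states, minlength=4) / len(states)

features = np.stack([state_histogram(s) for s in seqs])
clf = RandomForestClassifier(random_state=0).fit(features, families)
```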

Keystroke Dynamics for User Identification

  • paper_url: http://arxiv.org/abs/2307.05529
  • repo_url: https://github.com/andreArtelt/KeystrokeDynamicsForUserIdentification
  • paper_authors: Atharva Sharma, Martin Jureček, Mark Stamp
  • for: This research tackles the multiclass user identification problem based on free-text keystroke data.
  • methods: The research uses a complex image-like feature previously shown to achieve state-of-the-art authentication results on free-text data, fed to multiclass convolutional neural networks.
  • results: The CNN attains a classification (i.e., identification) accuracy of 0.78 over a set of 148 users, while a Random Forest classifier trained on a slightly modified version of the same feature reaches 0.93.
    Abstract In previous research, keystroke dynamics has shown promise for user authentication, based on both fixed-text and free-text data. In this research, we consider the more challenging multiclass user identification problem, based on free-text data. We experiment with a complex image-like feature that has previously been used to achieve state-of-the-art authentication results over free-text data. Using this image-like feature and multiclass Convolutional Neural Networks, we are able to obtain a classification (i.e., identification) accuracy of 0.78 over a set of 148 users. However, we find that a Random Forest classifier trained on a slightly modified version of this same feature yields an accuracy of 0.93.

Reinforcement and Deep Reinforcement Learning-based Solutions for Machine Maintenance Planning, Scheduling Policies, and Optimization

  • paper_url: http://arxiv.org/abs/2307.03860
  • repo_url: None
  • paper_authors: Oluwaseyi Ogunfowora, Homayoun Najjaran
  • for: This paper reviews the applications of reinforcement and deep reinforcement learning for maintenance planning and optimization problems.
  • methods: The paper uses a literature review to identify and categorize existing research on reinforcement learning for maintenance planning, and provides graphical and tabular representations of the adopted methodologies, findings, and interpretations.
  • results: The paper highlights research gaps, key insights from the literature, and areas for future work in the field of reinforcement learning for maintenance planning.
    Abstract Systems and machines undergo various failure modes that result in machine health degradation, so maintenance actions are required to restore them back to a state where they can perform their expected functions. Since maintenance tasks are inevitable, maintenance planning is essential to ensure the smooth operations of the production system and other industries at large. Maintenance planning is a decision-making problem that aims at developing optimum maintenance policies and plans that help reduces maintenance costs, extend asset life, maximize their availability, and ultimately ensure workplace safety. Reinforcement learning is a data-driven decision-making algorithm that has been increasingly applied to develop dynamic maintenance plans while leveraging the continuous information from condition monitoring of the system and machine states. By leveraging the condition monitoring data of systems and machines with reinforcement learning, smart maintenance planners can be developed, which is a precursor to achieving a smart factory. This paper presents a literature review on the applications of reinforcement and deep reinforcement learning for maintenance planning and optimization problems. To capture the common ideas without losing touch with the uniqueness of each publication, taxonomies used to categorize the systems were developed, and reviewed publications were highlighted, classified, and summarized based on these taxonomies. Adopted methodologies, findings, and well-defined interpretations of the reviewed studies were summarized in graphical and tabular representations to maximize the utility of the work for both researchers and practitioners. This work also highlights the research gaps, key insights from the literature, and areas for future work.

The Ethical Implications of Generative Audio Models: A Systematic Literature Review

  • paper_url: http://arxiv.org/abs/2307.05527
  • repo_url: None
  • paper_authors: Julia Barnett
  • for: This paper conducts a systematic literature review of generative audio models to quantify how much researchers in the field consider potential negative impacts and to identify the ethical implications they need to consider.
  • methods: The review analyzes 884 papers in the area of generative audio models, assessing the degree to which each discusses potential negative impacts.
  • results: Although 65% of generative audio papers note positive potential impacts of their work, fewer than 10% discuss any negative impacts; the few that do raise serious concerns such as fraud, deep-fakes, and copyright infringement. By quantifying this lack of ethical consideration and identifying key areas of potential harm, the paper lays the groundwork to guide more conscientious research as the field progresses.
    Abstract Generative audio models typically focus their applications in music and speech generation, with recent models having human-like quality in their audio output. This paper conducts a systematic literature review of 884 papers in the area of generative audio models in order to both quantify the degree to which researchers in the field are considering potential negative impacts and identify the types of ethical implications researchers in this area need to consider. Though 65% of generative audio research papers note positive potential impacts of their work, less than 10% discuss any negative impacts. This jarringly small percentage of papers considering negative impact is particularly worrying because the issues brought to light by the few papers doing so are raising serious ethical implications and concerns relevant to the broader field such as the potential for fraud, deep-fakes, and copyright infringement. By quantifying this lack of ethical consideration in generative audio research and identifying key areas of potential harm, this paper lays the groundwork for future work in the field at a critical point in time in order to guide more conscientious research as this field progresses.

inTformer: A Time-Embedded Attention-Based Transformer for Crash Likelihood Prediction at Intersections Using Connected Vehicle Data

  • paper_url: http://arxiv.org/abs/2307.03854
  • repo_url: None
  • paper_authors: B. M. Tazbiul Hassan Anik, Zubayer Islam, Mohamed Abdel-Aty
  • for: A real-time model for predicting crash likelihood at intersections, supporting proactive traffic safety management.
  • methods: A Transformer-based model that processes data sequences through attention mechanisms and can handle all sequence elements in parallel during training; the proposed time-embedded attention-based model, inTformer, is trained on connected vehicle data extracted from INRIX and the CATT Lab's Signal Analytics Platform.
  • results: Nine inTformer models were built by formatting and stacking the data at different timesteps; the best inTformer achieved a sensitivity of 73%, outperforming earlier intersection crash-likelihood studies and several established deep learning models trained on the same connected vehicle dataset.
    Abstract The real-time crash likelihood prediction model is an essential component of the proactive traffic safety management system. Over the years, numerous studies have attempted to construct a crash likelihood prediction model in order to enhance traffic safety, but mostly on freeways. In the majority of the existing studies, researchers have primarily employed a deep learning-based framework to identify crash potential. Lately, Transformer has emerged as a potential deep neural network that fundamentally operates through attention-based mechanisms. Transformer has several functional benefits over extant deep learning models such as Long Short-Term Memory (LSTM), Convolution Neural Network (CNN), etc. Firstly, Transformer can readily handle long-term dependencies in a data sequence. Secondly, Transformers can parallelly process all elements in a data sequence during training. Finally, a Transformer does not have the vanishing gradient issue. Realizing the immense possibility of Transformers, this paper proposes inTersection-Transformer (inTformer), a time-embedded attention-based Transformer model that can effectively predict intersection crash likelihood in real-time. The proposed model was evaluated using connected vehicle data extracted from INRIX and Center for Advanced Transportation Technology (CATT) Lab's Signal Analytics Platform. The data was parallelly formatted and stacked at different timesteps to develop nine inTformer models. The best inTformer model achieved a sensitivity of 73%. This model was also compared to earlier studies on crash likelihood prediction at intersections and with several established deep learning models trained on the same connected vehicle dataset. In every scenario, this inTformer outperformed the benchmark models confirming the viability of the proposed inTformer architecture.
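
A hedged sketch of a time-embedded attention classifier of this general shape; the dimensions, the learned timestep embedding, and the pooling are illustrative assumptions rather than the exact inTformer architecture.

```python
import torch
import torch.nn as nn

class TimeEmbeddedTransformer(nn.Module):
    """Connected-vehicle features for each timestep are projected, a learned
    embedding of the timestep index is added, and a Transformer encoder feeds
    a crash/no-crash head."""
    def __init__(self, n_feats=16, d_model=64, n_steps=12):
        super().__init__()
        self.proj = nn.Linear(n_feats, d_model)
        self.time_emb = nn.Embedding(n_steps, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, 1)

    def forward(self, x):                        # x: (batch, n_steps, n_feats)
        t = torch.arange(x.size(1), device=x.device)
        h = self.proj(x) + self.time_emb(t)      # inject timestep information
        h = self.encoder(h)
        return torch.sigmoid(self.head(h.mean(dim=1)))  # crash likelihood

crash_prob = TimeEmbeddedTransformer()(torch.randn(8, 12, 16))
```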

Optimal Learners for Realizable Regression: PAC Learning and Online Learning

  • paper_url: http://arxiv.org/abs/2307.03848
  • repo_url: None
  • paper_authors: Idan Attias, Steve Hanneke, Alkis Kalavasis, Amin Karbasi, Grigoris Velegkas
  • for: The paper characterizes the statistical complexity of realizable regression in both the PAC learning setting and the online learning setting.
  • methods: The paper introduces a minimax instance optimal learner for realizable regression and proposes a novel dimension that qualitatively and quantitatively characterizes which classes of real-valued predictors are learnable; it also identifies a combinatorial dimension related to the Graph dimension that characterizes ERM learnability.
  • results: The paper establishes a necessary condition for learnability based on a combinatorial dimension related to the DS dimension and conjectures that it may also be sufficient; in the online setting, it gives a dimension characterizing the minimax instance optimal cumulative loss up to a constant factor and designs an optimal online learner, resolving an open question of Daskalakis and Golowich (STOC '22).
    Abstract In this work, we aim to characterize the statistical complexity of realizable regression both in the PAC learning setting and the online learning setting. Previous work had established the sufficiency of finiteness of the fat shattering dimension for PAC learnability and the necessity of finiteness of the scaled Natarajan dimension, but little progress had been made towards a more complete characterization since the work of Simon 1997 (SICOMP '97). To this end, we first introduce a minimax instance optimal learner for realizable regression and propose a novel dimension that both qualitatively and quantitatively characterizes which classes of real-valued predictors are learnable. We then identify a combinatorial dimension related to the Graph dimension that characterizes ERM learnability in the realizable setting. Finally, we establish a necessary condition for learnability based on a combinatorial dimension related to the DS dimension, and conjecture that it may also be sufficient in this context. Additionally, in the context of online learning we provide a dimension that characterizes the minimax instance optimal cumulative loss up to a constant factor and design an optimal online learner for realizable regression, thus resolving an open question raised by Daskalakis and Golowich in STOC '22.
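
For context, the fat-shattering dimension referenced above (whose finiteness was previously known to suffice for PAC learnability) is defined as follows.

```latex
% A class \mathcal{F} \subseteq [0,1]^{\mathcal{X}} \gamma-shatters points
% x_1, \dots, x_d if there exist witnesses r_1, \dots, r_d \in [0,1] with
\[
  \forall\, b \in \{-1, +1\}^d \;\; \exists f \in \mathcal{F} :\quad
  b_i \,\bigl(f(x_i) - r_i\bigr) \ge \gamma \quad \text{for all } i \in [d].
\]
% The fat-shattering dimension \mathrm{fat}_\gamma(\mathcal{F}) is the largest
% such d; the paper's new dimensions refine this picture for both PAC and
% online realizable regression.
```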

RADAR: Robust AI-Text Detection via Adversarial Learning

  • paper_url: http://arxiv.org/abs/2307.03838
  • repo_url: None
  • paper_authors: Xiaomeng Hu, Pin-Yu Chen, Tsung-Yi Ho
  • for: The study proposes a new AI-text detection framework so that, as LLMs advance and ChatGPT-like applications proliferate, machine-generated text can still be distinguished from human-generated text.
  • methods: The adversarial-learning framework RADAR jointly trains a robust AI-text detector and a paraphraser: the paraphraser generates realistic content to evade AI-text detection, RADAR uses the detector's feedback to update the paraphraser, and vice versa.
  • results: Evaluated with 8 LLMs (Pythia, Dolly 2.0, Palmyra, Camel, GPT-J, Dolly 1.0, LLaMA, Vicuna) across 4 datasets, RADAR significantly outperforms existing AI-text detection methods, especially when paraphrasing is in place; RADAR also transfers strongly from instruction-tuned LLMs to other LLMs, and its capability improves further via GPT-3.5.
    Abstract Recent advances in large language models (LLMs) and the intensifying popularity of ChatGPT-like applications have blurred the boundary of high-quality text generation between humans and machines. However, in addition to the anticipated revolutionary changes to our technology and society, the difficulty of distinguishing LLM-generated texts (AI-text) from human-generated texts poses new challenges of misuse and fairness, such as fake content generation, plagiarism, and false accusation of innocent writers. While existing works show that current AI-text detectors are not robust to LLM-based paraphrasing, this paper aims to bridge this gap by proposing a new framework called RADAR, which jointly trains a Robust AI-text Detector via Adversarial leaRning. RADAR is based on adversarial training of a paraphraser and a detector. The paraphraser's goal is to generate realistic contents to evade AI-text detection. RADAR uses the feedback from the detector to update the paraphraser, and vice versa. Evaluated with 8 different LLMs (Pythia, Dolly 2.0, Palmyra, Camel, GPT-J, Dolly 1.0, LLaMA, and Vicuna) across 4 datasets, experimental results show that RADAR significantly outperforms existing AI-text detection methods, especially when paraphrasing is in place. We also identify the strong transferability of RADAR from instruction-tuned LLMs to other LLMs, and evaluate the improved capability of RADAR via GPT-3.5.
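
To make the alternating update concrete, below is a minimal runnable sketch of RADAR-style adversarial training. The real system operates on text, with an LLM paraphraser typically updated from a reward signal; here both players are stand-in MLPs over fixed-length feature vectors, and the differentiable shortcut for the paraphraser update is an assumption for illustration only.

```python
# Toy sketch of RADAR's alternating adversarial updates. Texts are represented
# by feature vectors; detector and paraphraser are small stand-in networks.
import torch
import torch.nn as nn

dim = 32
detector = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, 1))
paraphraser = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, dim))
opt_d = torch.optim.Adam(detector.parameters(), lr=1e-3)
opt_p = torch.optim.Adam(paraphraser.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

for step in range(200):
    human = torch.randn(64, dim)           # stand-in for human-written text
    ai = torch.randn(64, dim) + 1.0        # stand-in for AI-generated text
    para = ai + 0.1 * paraphraser(ai)      # paraphrased AI text

    # 1) detector step: human = 0, AI and paraphrased AI = 1
    logits = detector(torch.cat([human, ai, para.detach()]))
    labels = torch.cat([torch.zeros(64, 1), torch.ones(128, 1)])
    loss_d = bce(logits, labels)
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # 2) paraphraser step: reward = detector scoring the paraphrase as human
    loss_p = bce(detector(para), torch.zeros(64, 1))
    opt_p.zero_grad(); loss_p.backward(); opt_p.step()
```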

Effect of Intensity Standardization on Deep Learning for WML Segmentation in Multi-Centre FLAIR MRI

  • paper_url: http://arxiv.org/abs/2307.03827
  • repo_url: None
  • paper_authors: Abdollah Ghazvanchahi, Pejman Jahbedar Maralani, Alan R. Moody, April Khademi
  • for: Improving white matter lesion (WML) segmentation performance in multi-centre FLAIR MRI, where deep learning models degrade on out-of-distribution data.
  • methods: Evaluates several intensity standardization methods as a preprocessing step for MRI data, including IAMLAB (developed specifically for FLAIR MRI) and other popular normalization techniques such as White-strip, Nyul, and Z-score, plus an Ensemble that combines their predictions; a skip-connection UNet is trained on the standardized images.
  • results: IAMLAB and the Ensemble provide higher WML segmentation performance than models trained on the original data or other normalization methods, with the highest Dice similarity coefficient (DSC) both in-distribution and on clinical out-of-distribution data across all lesion categories. These methods mitigate MRI domain shift and are optimal for DL-based WML segmentation on unseen FLAIR data.
    Abstract Deep learning (DL) methods for white matter lesion (WML) segmentation in MRI suffer a reduction in performance when applied on data from a scanner or centre that is out-of-distribution (OOD) from the training data. This is critical for translation and widescale adoption, since current models cannot be readily applied to data from new institutions. In this work, we evaluate several intensity standardization methods for MRI as a preprocessing step for WML segmentation in multi-centre Fluid-Attenuated Inversion Recovery (FLAIR) MRI. We evaluate a method specifically developed for FLAIR MRI called IAMLAB along with other popular normalization techniques such as White-strip, Nyul and Z-score. We proposed an Ensemble model that combines predictions from each of these models. A skip-connection UNet (SC UNet) was trained on the standardized images, as well as the original data and segmentation performance was evaluated over several dimensions. The training (in-distribution) data consists of a single study, of 60 volumes, and the test (OOD) data is 128 unseen volumes from three clinical cohorts. Results show IAMLAB and Ensemble provide higher WML segmentation performance compared to models from original data or other normalization methods. IAMLAB & Ensemble have the highest dice similarity coefficient (DSC) on the in-distribution data (0.78 & 0.80) and on clinical OOD data. DSC was significantly higher for IAMLAB compared to the original data (p<0.05) for all lesion categories (LL>25mL: 0.77 vs. 0.71; 10mL<= LL<25mL: 0.66 vs. 0.61; LL<10mL: 0.53 vs. 0.52). The IAMLAB and Ensemble normalization methods are mitigating MRI domain shift and are optimal for DL-based WML segmentation in unseen FLAIR data.
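
Of the evaluated techniques, Z-score standardization is the simplest to state; here is a minimal sketch (the array shapes and brain mask are illustrative assumptions, not the paper's pipeline):

```python
# Z-score intensity standardization of a FLAIR volume within a brain mask.
import numpy as np

def zscore_standardize(volume: np.ndarray, brain_mask: np.ndarray) -> np.ndarray:
    """Map brain intensities to zero mean, unit variance; background stays 0."""
    voxels = volume[brain_mask > 0]
    out = np.zeros_like(volume, dtype=np.float32)
    out[brain_mask > 0] = (voxels - voxels.mean()) / (voxels.std() + 1e-8)
    return out

flair = np.random.rand(182, 218, 182).astype(np.float32)  # placeholder volume
mask = (flair > 0.2).astype(np.uint8)                     # placeholder mask
standardized = zscore_standardize(flair, mask)
```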

A Combinatorial Characterization of Online Learning Games with Bounded Losses

  • paper_url: http://arxiv.org/abs/2307.03816
  • repo_url: None
  • paper_authors: Vinod Raman, Unique Subedi, Ambuj Tewari
  • for: The online learnability of hypothesis classes with respect to arbitrary, but bounded, loss functions.
  • methods: A new scale-sensitive combinatorial dimension, the sequential Minimax dimension, which gives a tight quantitative characterization of online learnability.
  • results: The first quantitative characterizations of online learnability for two natural learning settings: vector-valued regression and multilabel classification.
    Abstract We study the online learnability of hypothesis classes with respect to arbitrary, but bounded, loss functions. We give a new scale-sensitive combinatorial dimension, named the sequential Minimax dimension, and show that it gives a tight quantitative characterization of online learnability. As applications, we give the first quantitative characterization of online learnability for two natural learning settings: vector-valued regression and multilabel classification.
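
For background, one standard way to formalize the quantity such dimensions aim to characterize -- the minimax cumulative loss of a $T$-round realizable online game over a class $\mathcal{H}$ -- is the following (generic notation, not necessarily the paper's):

```latex
\[
\mathcal{V}_T(\mathcal{H})
  \;=\; \sup_{x_1}\,\inf_{\hat{y}_1}\,\sup_{y_1}\;\cdots\;
        \sup_{x_T}\,\inf_{\hat{y}_T}\,\sup_{y_T}\;
        \sum_{t=1}^{T} \ell\big(\hat{y}_t, y_t\big),
\qquad
\text{s.t. } \exists\, h \in \mathcal{H}:\; y_t = h(x_t)\ \text{for all } t \le T .
\]
```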

Controlling Chaotic Maps using Next-Generation Reservoir Computing

  • paper_url: http://arxiv.org/abs/2307.03813
  • repo_url: None
  • paper_authors: Robert M. Kent, Wendson A. S. Barbosa, Daniel J. Gauthier
  • for: Combining nonlinear system control techniques with next-generation reservoir computing, a best-in-class machine learning approach for predicting the behavior of dynamical systems.
  • methods: Uses a next-generation reservoir model learned from data to predict and control the dynamics of the chaotic Hénon map.
  • results: The controller succeeds at a series of control tasks, including moving the system between unstable fixed points, stabilizing it to higher-order periodic orbits, and driving it to an arbitrary desired state. It requires only 10 data points for training, can control the system to a desired trajectory in a single iteration, and is robust to noise and modeling error.
    Abstract In this work, we combine nonlinear system control techniques with next-generation reservoir computing, a best-in-class machine learning approach for predicting the behavior of dynamical systems. We demonstrate the performance of the controller in a series of control tasks for the chaotic H\'enon map, including controlling the system between unstable fixed-points, stabilizing the system to higher order periodic orbits, and to an arbitrary desired state. We show that our controller succeeds in these tasks, requires only 10 data points for training, can control the system to a desired trajectory in a single iteration, and is robust to noise and modeling error.
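
As a flavor of the modeling step, the sketch below fits a next-generation reservoir (NVAR-style) one-step predictor to the Hénon map using linear and quadratic features of two delayed states and ridge regression; the control layer the paper builds on top of such a model is omitted, and all hyperparameters are illustrative.

```python
# NVAR-style one-step predictor for the Hénon map, fit by ridge regression.
import numpy as np

a, b = 1.4, 0.3
x = np.zeros(300); y = np.zeros(300)
x[0], y[0] = 0.1, 0.1
for n in range(299):                          # simulate the Hénon map
    x[n + 1] = 1 - a * x[n] ** 2 + y[n]
    y[n + 1] = b * x[n]

def features(s_now, s_prev):
    """Constant + linear + unique quadratic monomials of two delayed states."""
    lin = np.concatenate([s_now, s_prev])
    quad = np.array([lin[i] * lin[j] for i in range(4) for j in range(i, 4)])
    return np.concatenate([[1.0], lin, quad])

S = np.stack([x, y], axis=1)
Phi = np.stack([features(S[t], S[t - 1]) for t in range(1, 299)])
Y = S[2:300]                                  # next-state targets
W = np.linalg.solve(Phi.T @ Phi + 1e-6 * np.eye(Phi.shape[1]), Phi.T @ Y)
pred = Phi @ W
print("one-step RMSE:", np.sqrt(((pred - Y) ** 2).mean()))
```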

For Women, Life, Freedom: A Participatory AI-Based Social Web Analysis of a Watershed Moment in Iran’s Gender Struggles

  • paper_url: http://arxiv.org/abs/2307.03764
  • repo_url: None
  • paper_authors: Adel Khorramrouz, Sujan Dutta, Ashiqur R. KhudaBukhsh
  • for: A computational analysis of Persian-language Twitter discourse to estimate the shift in stance toward gender equality following the death of Mahsa Amini in police custody.
  • methods: An ensemble active learning pipeline to train a stance classifier, distinguished by Iranian women participating in an active role as annotators: they not only provide labels but also suggest valuable keywords for more meaningful corpus creation and supply short example documents for a guided sampling step.
  • results: Mahsa Amini's death triggered a polarized Persian-language discourse in which both negative and positive tweets toward gender equality increased, with the increase in positive tweets slightly greater. With respect to account creation time, pro-protest accounts are more similar to baseline Persian Twitter activity than state-aligned accounts are.
    Abstract In this paper, we present a computational analysis of the Persian language Twitter discourse with the aim to estimate the shift in stance toward gender equality following the death of Mahsa Amini in police custody. We present an ensemble active learning pipeline to train a stance classifier. Our novelty lies in the involvement of Iranian women in an active role as annotators in building this AI system. Our annotators not only provide labels, but they also suggest valuable keywords for more meaningful corpus creation as well as provide short example documents for a guided sampling step. Our analyses indicate that Mahsa Amini's death triggered polarized Persian language discourse where both fractions of negative and positive tweets toward gender equality increased. The increase in positive tweets was slightly greater than the increase in negative tweets. We also observe that with respect to account creation time, between the state-aligned Twitter accounts and pro-protest Twitter accounts, pro-protest accounts are more similar to baseline Persian Twitter activity.

Predicting Outcomes in Long COVID Patients with Spatiotemporal Attention

  • paper_url: http://arxiv.org/abs/2307.04770
  • repo_url: None
  • paper_authors: Degan Hao, Mohammadreza Negahdar
  • for: Predicting outcome severity in patients with long COVID.
  • methods: A Local-LSTM that restricts learning of short-term dependencies to local windows, combined with a joint spatiotemporal attention mechanism that weighs feature importance across both the temporal dimension and the feature space for long-term dependencies.
  • results: On a hard-to-acquire clinical dataset of long COVID patients, the method outperforms related approaches in outcome prediction and provides a clinical tool for severity assessment.
    Abstract Long COVID is a general term of post-acute sequelae of COVID-19. Patients with long COVID can endure long-lasting symptoms including fatigue, headache, dyspnea and anosmia, etc. Identifying the cohorts with severe long-term complications in COVID-19 could benefit the treatment planning and resource arrangement. However, due to the heterogeneous phenotype presented in long COVID patients, it is difficult to predict their outcomes from their longitudinal data. In this study, we proposed a spatiotemporal attention mechanism to weigh feature importance jointly from the temporal dimension and feature space. Considering that medical examinations can have interchangeable orders in adjacent time points, we restricted the learning of short-term dependency with a Local-LSTM and the learning of long-term dependency with the joint spatiotemporal attention. We also compared the proposed method with several state-of-the-art methods and a method in clinical practice. The methods are evaluated on a hard-to-acquire clinical dataset of patients with long COVID. Experimental results show the Local-LSTM with joint spatiotemporal attention outperformed related methods in outcome prediction. The proposed method provides a clinical tool for the severity assessment of long COVID.
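
A hedged sketch of the two ingredients -- an LSTM run over short local windows and attention weights computed jointly over time steps and features -- is below; the layer sizes, window length, and exact attention parameterization are assumptions, not the paper's configuration.

```python
# Local-LSTM with a joint spatiotemporal attention map, as a minimal sketch.
import torch
import torch.nn as nn

class LocalLSTMWithSTAttention(nn.Module):
    def __init__(self, n_feats: int, hidden: int = 32, window: int = 3):
        super().__init__()
        self.window = window
        self.score = nn.Linear(1, 1)          # scalar score per (time, feature) cell
        self.lstm = nn.LSTM(n_feats, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                     # x: (batch, T, F)
        B, T, F = x.shape
        # joint spatiotemporal attention: one weight per (time, feature) cell
        w = torch.softmax(self.score(x.reshape(B, T * F, 1)).reshape(B, T * F), -1)
        x = x * w.reshape(B, T, F) * (T * F)  # rescale so weights average to 1
        # Local-LSTM: run over short overlapping windows, keep each last state
        outs = []
        for t0 in range(0, T - self.window + 1):
            h, _ = self.lstm(x[:, t0:t0 + self.window])
            outs.append(h[:, -1])
        return self.head(torch.stack(outs, 1).mean(1))  # severity logit

model = LocalLSTMWithSTAttention(n_feats=8)
print(model(torch.randn(4, 12, 8)).shape)     # torch.Size([4, 1])
```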

Formulation Graphs for Mapping Structure-Composition of Battery Electrolytes to Device Performance

  • paper_url: http://arxiv.org/abs/2307.03811
  • repo_url: None
  • paper_authors: Vidushi Sharma, Maxwell Giammona, Dmitry Zubarev, Andy Tek, Khanh Nugyuen, Linda Sundberg, Daniele Congiu, Young-Hye La
  • for: Researchers and developers working on the discovery and development of new combinatorial materials, particularly battery electrolytes.
  • methods: A deep learning model, the Formulation Graph Convolution Network (F-GCN), that predicts the properties of liquid formulations as a whole from the structure-composition relationship of their individual components.
  • results: Demonstrated on two exemplary battery-electrolyte datasets, achieving low errors in predicting performance metrics such as Coulombic Efficiency (CE) and specific capacity; the best-performing F-GCN model uses molecular-graph descriptors informed with HOMO-LUMO and electric moment properties of the molecules via a knowledge transfer technique.
    Abstract Advanced computational methods are being actively sought for addressing the challenges associated with discovery and development of new combinatorial material such as formulations. A widely adopted approach involves domain informed high-throughput screening of individual components that can be combined into a formulation. This manages to accelerate the discovery of new compounds for a target application but still leave the process of identifying the right 'formulation' from the shortlisted chemical space largely a laboratory experiment-driven process. We report a deep learning model, Formulation Graph Convolution Network (F-GCN), that can map structure-composition relationship of the individual components to the property of liquid formulation as whole. Multiple GCNs are assembled in parallel that featurize formulation constituents domain-intuitively on the fly. The resulting molecular descriptors are scaled based on respective constituent's molar percentage in the formulation, followed by formalizing into a combined descriptor that represents a complete formulation to an external learning architecture. The use case of proposed formulation learning model is demonstrated for battery electrolytes by training and testing it on two exemplary datasets representing electrolyte formulations vs battery performance -- one dataset is sourced from literature about Li/Cu half-cells, while the other is obtained by lab-experiments related to lithium-iodide full-cell chemistry. The model is shown to predict the performance metrics like Coulombic Efficiency (CE) and specific capacity of new electrolyte formulations with lowest reported errors. The best performing F-GCN model uses molecular descriptors derived from molecular graphs that are informed with HOMO-LUMO and electric moment properties of the molecules using a knowledge transfer technique.
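
The molar-percentage scaling step is easy to illustrate. In the sketch below, random vectors stand in for per-component GCN embeddings, and the weighted sum yields the formulation-level descriptor that would be passed to an external regressor.

```python
# Combine per-component descriptors into one formulation descriptor,
# scaled by each component's molar fraction (dimensions are assumptions).
import numpy as np

def formulation_descriptor(component_embs: np.ndarray, molar_frac: np.ndarray) -> np.ndarray:
    """component_embs: (n_components, d) GCN outputs; molar_frac sums to 1."""
    return (molar_frac[:, None] * component_embs).sum(axis=0)

embs = np.random.randn(3, 16)            # e.g. solvent, co-solvent, salt
frac = np.array([0.6, 0.3, 0.1])         # molar percentages
phi = formulation_descriptor(embs, frac) # (16,) -> external learning architecture
print(phi.shape)
```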

URL: A Representation Learning Benchmark for Transferable Uncertainty Estimates

  • paper_url: http://arxiv.org/abs/2307.03810
  • repo_url: https://github.com/mkirchhof/url
  • paper_authors: Michael Kirchhof, Bálint Mucsányi, Seong Joon Oh, Enkelejda Kasneci
  • for: Developing pretrained models that provide not only embeddings but also uncertainty estimates that transfer reliably to new datasets.
  • methods: The Uncertainty-aware Representation Learning (URL) benchmark, which measures the transferability of representations and, via a novel metric, the zero-shot transferability of the uncertainty estimate; it is used to evaluate eleven uncertainty quantifiers pretrained on ImageNet and transferred to eight downstream datasets.
  • results: Approaches that focus on the uncertainty of the representation itself, or that estimate the prediction risk directly, outperform those based on the probabilities of upstream classes; achieving transferable uncertainty quantification nonetheless remains an open challenge.
    Abstract Representation learning has significantly driven the field to develop pretrained models that can act as a valuable starting point when transferring to new datasets. With the rising demand for reliable machine learning and uncertainty quantification, there is a need for pretrained models that not only provide embeddings but also transferable uncertainty estimates. To guide the development of such models, we propose the Uncertainty-aware Representation Learning (URL) benchmark. Besides the transferability of the representations, it also measures the zero-shot transferability of the uncertainty estimate using a novel metric. We apply URL to evaluate eleven uncertainty quantifiers that are pretrained on ImageNet and transferred to eight downstream datasets. We find that approaches that focus on the uncertainty of the representation itself or estimate the prediction risk directly outperform those that are based on the probabilities of upstream classes. Yet, achieving transferable uncertainty quantification remains an open challenge. Our findings indicate that it is not necessarily in conflict with traditional representation learning goals. Code is provided under https://github.com/mkirchhof/url .
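
As a simple proxy for what the benchmark measures (URL's own zero-shot transferability metric is more involved), one can check how well an uncertainty score separates wrong from correct downstream predictions via AUROC; the inputs below are synthetic stand-ins.

```python
# AUROC of an uncertainty score for misprediction detection on a downstream set.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
correct = rng.integers(0, 2, size=1000)               # 1 = prediction correct
uncertainty = rng.random(1000) + 0.5 * (1 - correct)  # wrong -> higher uncertainty
print("misprediction AUROC:", roc_auc_score(1 - correct, uncertainty))
```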

A Theoretical Perspective on Subnetwork Contributions to Adversarial Robustness

  • paper_url: http://arxiv.org/abs/2307.03803
  • repo_url: None
  • paper_authors: Jovon Craig, Josh Andle, Theodore S. Nowak, Salimeh Yasaei Sekeh
  • for: Studying the robustness of deep neural networks (DNNs) against adversarial attacks, both to better understand how deep learning models converge and to ensure model security in safety-critical applications.
  • methods: Adversarial training strengthens DNNs against attacks but requires applying computationally expensive training to the entire model; to enable more efficient adversarial training, the paper develops a novel theoretical framework for how the adversarial robustness of a subnetwork contributes to the robustness of the entire network.
  • results: Introduces semirobustness, a measure of a subnetwork's adversarial robustness, and proves that if a subnetwork is semirobust and there is sufficient dependency between it and each subsequent layer, then the remaining layers are also guaranteed to be robust. Experiments across multiple DNN architectures, datasets, and attacks confirm that a robust subnetwork can promote full-network robustness and investigate the layer-wise dependencies required.
    Abstract The robustness of deep neural networks (DNNs) against adversarial attacks has been studied extensively in hopes of both better understanding how deep learning models converge and in order to ensure the security of these models in safety-critical applications. Adversarial training is one approach to strengthening DNNs against adversarial attacks, and has been shown to offer a means for doing so at the cost of applying computationally expensive training methods to the entire model. To better understand these attacks and facilitate more efficient adversarial training, in this paper we develop a novel theoretical framework that investigates how the adversarial robustness of a subnetwork contributes to the robustness of the entire network. To do so we first introduce the concept of semirobustness, which is a measure of the adversarial robustness of a subnetwork. Building on this concept, we then provide a theoretical analysis to show that if a subnetwork is semirobust and there is a sufficient dependency between it and each subsequent layer in the network, then the remaining layers are also guaranteed to be robust. We validate these findings empirically across multiple DNN architectures, datasets, and adversarial attacks. Experiments show the ability of a robust subnetwork to promote full-network robustness, and investigate the layer-wise dependencies required for this full-network robustness to be achieved.

CLIPMasterPrints: Fooling Contrastive Language-Image Pre-training Using Latent Variable Evolution

  • paper_url: http://arxiv.org/abs/2307.03798
  • repo_url: https://github.com/matfrei/clipmasterprints
  • paper_authors: Matthias Freiberger, Peter Kun, Anders Sundnes Løvlie, Sebastian Risi
  • for: Probing the vulnerability of Contrastive Language-Image Pre-training (CLIP) models to what the authors call "fooling master images".
  • methods: Mines fooling master images by searching the latent space of generative models with an evolution strategy or stochastic gradient descent, then investigates the properties and generalization of the mined images.
  • results: Images mined against a small number of image captions potentially generalize to a much larger number of semantically related captions while attaining high confidence from CLIP; the vulnerability is closely related to the modality gap in contrastive pre-trained multi-modal networks, motivating its mitigation.
    Abstract Models leveraging both visual and textual data such as Contrastive Language-Image Pre-training (CLIP), are increasingly gaining importance. In this work, we show that despite their versatility, such models are vulnerable to what we refer to as fooling master images. Fooling master images are capable of maximizing the confidence score of a CLIP model for a significant number of widely varying prompts, while being unrecognizable for humans. We demonstrate how fooling master images can be mined by searching the latent space of generative models by means of an evolution strategy or stochastic gradient descent. We investigate the properties of the mined fooling master images, and find that images trained on a small number of image captions potentially generalize to a much larger number of semantically related captions. Further, we evaluate two possible mitigation strategies and find that vulnerability to fooling master examples is closely related to a modality gap in contrastive pre-trained multi-modal networks. From the perspective of vulnerability to off-manifold attacks, we therefore argue for the mitigation of modality gaps in CLIP and related multi-modal approaches. Source code and mined CLIPMasterPrints are available at https://github.com/matfrei/CLIPMasterPrints.
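
The latent-space search is straightforward to sketch: a (1+λ) evolution strategy over a generator's latent code, maximizing a mean CLIP-style score across prompts. The scoring function below is a toy stand-in (a quadratic) so the loop actually runs; in the paper's setting it would decode the latent into an image and score it against many captions.

```python
# (1+lambda) evolution strategy over a latent code; clip_score_fn is a stub.
import numpy as np

def clip_score_fn(z: np.ndarray) -> float:
    """Placeholder for the mean CLIP confidence of decode(z) over many prompts."""
    return -np.sum((z - 1.5) ** 2)

rng = np.random.default_rng(0)
z = rng.standard_normal(64)               # latent code of the generator
sigma, n_offspring = 0.5, 16
for gen in range(200):
    offspring = z + sigma * rng.standard_normal((n_offspring, z.size))
    scores = np.array([clip_score_fn(c) for c in offspring])
    best = offspring[scores.argmax()]
    if clip_score_fn(best) > clip_score_fn(z):   # elitist: keep parent otherwise
        z = best
print("final score:", clip_score_fn(z))
```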

Exploring the Lottery Ticket Hypothesis with Explainability Methods: Insights into Sparse Network Performance

  • paper_url: http://arxiv.org/abs/2307.13698
  • repo_url: None
  • paper_authors: Shantanu Ghosh, Kayhan Batmanghelich
  • for: Understanding why the performance of pruned (lottery-ticket) networks gradually increases or decreases, and explaining high-performing sparse networks.
  • methods: Uses Grad-CAM and post-hoc concept bottleneck models (PCBMs) to investigate the explainability of pruned networks in terms of pixels and high-level concepts, respectively.
  • results: As more weights are pruned, network performance degrades, and the concepts and pixels discovered by the pruned networks become inconsistent with the original network -- a possible reason for the drop in performance.
    Abstract Discovering a high-performing sparse network within a massive neural network is advantageous for deploying them on devices with limited storage, such as mobile phones. Additionally, model explainability is essential to fostering trust in AI. The Lottery Ticket Hypothesis (LTH) finds a network within a deep network with comparable or superior performance to the original model. However, limited study has been conducted on the success or failure of LTH in terms of explainability. In this work, we examine why the performance of the pruned networks gradually increases or decreases. Using Grad-CAM and Post-hoc concept bottleneck models (PCBMs), respectively, we investigate the explainability of pruned networks in terms of pixels and high-level concepts. We perform extensive experiments across vision and medical imaging datasets. As more weights are pruned, the performance of the network degrades. The discovered concepts and pixels from the pruned networks are inconsistent with the original network -- a possible reason for the drop in performance.
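
For context, the sparse subnetworks being probed come from lottery-ticket-style magnitude pruning; a minimal sketch of one global pruning round in PyTorch follows (training and weight rewinding are omitted, and the architecture is a placeholder).

```python
# One round of global magnitude pruning: zero the smallest weights everywhere.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(784, 300), nn.ReLU(), nn.Linear(300, 10))

def global_magnitude_masks(model: nn.Module, sparsity: float):
    weights = [m.weight for m in model.modules() if isinstance(m, nn.Linear)]
    all_w = torch.cat([w.detach().abs().flatten() for w in weights])
    threshold = torch.quantile(all_w, sparsity)
    return [(w.detach().abs() > threshold).float() for w in weights]

masks = global_magnitude_masks(model, sparsity=0.8)   # prune 80% of weights
with torch.no_grad():
    for m, mask in zip([l for l in model if isinstance(l, nn.Linear)], masks):
        m.weight.mul_(mask)
print("kept weights:", sum(int(m.sum()) for m in masks))
```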

Neural Abstraction-Based Controller Synthesis and Deployment

  • paper_url: http://arxiv.org/abs/2307.03783
  • repo_url: https://github.com/msalamati/neural-representation
  • paper_authors: Rupak Majumdar, Mahmoud Salamati, Sadegh Soudjani
  • for: Reducing the high memory demands of abstraction-based controller synthesis, both during synthesis and in deployment.
  • methods: Memory-efficient methods based on neural network representations: an on-the-fly synthesis algorithm for reach-avoid specifications that relies on compressed neural representations of the system's forward and backward dynamics (with outputs corrected to remain sound with respect to the finite abstraction), and a training algorithm that represents the synthesized controller as a neural network combined with a small look-up table for deployment.
  • results: On the selected benchmarks, the approach reduces the memory requirements for synthesis and deployment by factors of 1.31×10^5 and 7.13×10^3 on average, and up to 7.54×10^5 and 3.18×10^4, at the cost of increased offline computation that is fully parallelizable.
    Abstract Abstraction-based techniques are an attractive approach for synthesizing correct-by-construction controllers to satisfy high-level temporal requirements. A main bottleneck for successful application of these techniques is the memory requirement, both during controller synthesis and in controller deployment. We propose memory-efficient methods for mitigating the high memory demands of the abstraction-based techniques using neural network representations. To perform synthesis for reach-avoid specifications, we propose an on-the-fly algorithm that relies on compressed neural network representations of the forward and backward dynamics of the system. In contrast to usual applications of neural representations, our technique maintains soundness of the end-to-end process. To ensure this, we correct the output of the trained neural network such that the corrected output representations are sound with respect to the finite abstraction. For deployment, we provide a novel training algorithm to find a neural network representation of the synthesized controller and experimentally show that the controller can be correctly represented as a combination of a neural network and a look-up table that requires a substantially smaller memory. We demonstrate experimentally that our approach significantly reduces the memory requirements of abstraction-based methods. For the selected benchmarks, our approach reduces the memory requirements respectively for the synthesis and deployment by a factor of $1.31\times 10^5$ and $7.13\times 10^3$ on average, and up to $7.54\times 10^5$ and $3.18\times 10^4$. Although this reduction is at the cost of increased off-line computations to train the neural networks, all the steps of our approach are parallelizable and can be implemented on machines with higher number of processing units to reduce the required computational time.
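
The deployment idea -- a network plus a small residual look-up table that together reproduce the synthesized controller exactly -- can be sketched on a toy 1-D state space (the table, the stand-in predictor, and all sizes below are assumptions):

```python
# Distill a lookup-table controller into a predictor, keep a fallback table
# only for the abstract states the predictor gets wrong.
import numpy as np

states = np.arange(1000)                       # abstract state indices
table = np.digitize(states, [250, 500, 750])   # synthesized controller, 4 actions

# stand-in predictor: nearest action centroid over the state index
centroids = np.array([states[table == a].mean() for a in range(4)])
def predict(s: int) -> int:
    return int(np.argmin(np.abs(s - centroids)))

fallback = {int(s): int(table[s]) for s in states if predict(s) != table[s]}

def controller(s: int) -> int:                 # deployed hybrid controller
    return fallback.get(s, predict(s))

print("residual lookup entries needed:", len(fallback))
```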

When does the ID algorithm fail?

  • paper_url: http://arxiv.org/abs/2307.03750
  • repo_url: https://github.com/SOYJUN/Implement-ODR-protocol
  • paper_authors: Ilya Shpitser
  • for: The problem of identifying interventional distributions of the form p(Y | do(a)) in graphical causal models via the ID algorithm.
  • methods: A modern presentation of the ID algorithm together with an analysis of its soundness and completeness.
  • results: Shows that the "hedge criterion" (Corollary 3 of [9]) is incorrect as stated, gives a simple counterexample, and provides several graphical characterizations of when the ID algorithm fails to identify its input distribution.
    Abstract The ID algorithm solves the problem of identification of interventional distributions of the form p(Y | do(a)) in graphical causal models, and has been formulated in a number of ways [12, 9, 6]. The ID algorithm is sound (outputs the correct functional of the observed data distribution whenever p(Y | do(a)) is identified in the causal model represented by the input graph), and complete (explicitly flags as a failure any input p(Y | do(a)) whenever this distribution is not identified in the causal model represented by the input graph). The reference [9] provides a result, the so called "hedge criterion" (Corollary 3), which aims to give a graphical characterization of situations when the ID algorithm fails to identify its input in terms of a structure in the input graph called the hedge. While the ID algorithm is, indeed, a sound and complete algorithm, and the hedge structure does arise whenever the input distribution is not identified, Corollary 3 presented in [9] is incorrect as stated. In this note, I outline the modern presentation of the ID algorithm, discuss a simple counterexample to Corollary 3, and provide a number of graphical characterizations of the ID algorithm failing to identify its input distribution.

Incentive-Theoretic Bayesian Inference for Collaborative Science

  • paper_url: http://arxiv.org/abs/2307.03748
  • repo_url: None
  • paper_authors: Stephen Bates, Michael I. Jordan, Michael Sklar, Jake A. Soloff
  • for: Modeling contemporary science as a distributed, collaborative endeavor in which researchers, regulators, funding agencies, commercial partners, and scientific bodies interact under differing incentives.
  • methods: A hypothesis-testing setup in which an agent (e.g., a researcher or a pharmaceutical company) with a private prior about an unknown parameter chooses whether to run a statistical trial, and a principal (e.g., a policymaker or regulator) uses the trial's outcome to reach a decision.
  • results: The principal can conduct statistical inference that leverages the information revealed by the agent's strategic behavior, designing a policy that elicits partial information about the agent's private prior and using it to control the posterior probability of the null. One implication is a simple guideline for the significance threshold in clinical trials: the type-I error level should be set strictly below the cost of the trial divided by the firm's profit if the trial is successful.
    Abstract Contemporary scientific research is a distributed, collaborative endeavor, carried out by teams of researchers, regulatory institutions, funding agencies, commercial partners, and scientific bodies, all interacting with each other and facing different incentives. To maintain scientific rigor, statistical methods should acknowledge this state of affairs. To this end, we study hypothesis testing when there is an agent (e.g., a researcher or a pharmaceutical company) with a private prior about an unknown parameter and a principal (e.g., a policymaker or regulator) who wishes to make decisions based on the parameter value. The agent chooses whether to run a statistical trial based on their private prior and then the result of the trial is used by the principal to reach a decision. We show how the principal can conduct statistical inference that leverages the information that is revealed by an agent's strategic behavior -- their choice to run a trial or not. In particular, we show how the principal can design a policy to elucidate partial information about the agent's private prior beliefs and use this to control the posterior probability of the null. One implication is a simple guideline for the choice of significance threshold in clinical trials: the type-I error level should be set to be strictly less than the cost of the trial divided by the firm's profit if the trial is successful.
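
As a concrete reading of that guideline (with illustrative numbers, not the paper's): if running the trial costs the firm $c = \$2\text{M}$ and a successful trial yields profit $\pi = \$40\text{M}$, the regulator should set

```latex
\[
\alpha \;<\; \frac{c}{\pi} \;=\; \frac{2}{40} \;=\; 0.05,
\]
```

since otherwise even a firm whose private prior puts all its weight on the null would expect $\alpha \pi \ge c$ in profit from running the trial, and the trial's occurrence would reveal nothing about the firm's beliefs.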

QIGen: Generating Efficient Kernels for Quantized Inference on Large Language Models

  • paper_url: http://arxiv.org/abs/2307.03738
  • repo_url: https://github.com/ist-daslab/qigen
  • paper_authors: Tommaso Pegolotti, Elias Frantar, Dan Alistarh, Markus Püschel
  • for: An automatic code generation approach for supporting quantized generative inference on LLMs such as LLaMA or OPT on off-the-shelf CPUs.
  • methods: Code generation informed by the target architecture and a performance model that covers both hardware characteristics and method-specific accuracy constraints.
  • results: Results on CPU-based inference for LLaMA models show that the approach achieves high performance and high accuracy, comparing favorably to the best existing open-source solution.
    Abstract We present ongoing work on a new automatic code generation approach for supporting quantized generative inference on LLMs such as LLaMA or OPT on off-the-shelf CPUs. Our approach is informed by the target architecture and a performance model, including both hardware characteristics and method-specific accuracy constraints. Results on CPU-based inference for LLaMA models show that our approach can lead to high performance and high accuracy, comparing favorably to the best existing open-source solution. A preliminary implementation is available at https://github.com/IST-DASLab/QIGen.
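
A plain-NumPy sketch of the kind of computation such generated kernels perform -- group-wise 4-bit weight quantization with per-group scales, dequantized inside a matrix-vector product -- is below; the group size and shapes are illustrative, and real kernels of course avoid materializing the dequantized matrix.

```python
# Group-wise int4 weight quantization and a dequantize-then-multiply matvec.
import numpy as np

def quantize_int4(W: np.ndarray, group: int = 64):
    Wg = W.reshape(-1, group)                            # rows of `group` weights
    scale = np.abs(Wg).max(axis=1, keepdims=True) / 7.0  # int4 range is [-8, 7]
    q = np.clip(np.round(Wg / scale), -8, 7).astype(np.int8)
    return q, scale

def int4_matvec(q, scale, shape, x):
    W_hat = (q.astype(np.float32) * scale).reshape(shape)  # dequantize
    return W_hat @ x

W = np.random.randn(256, 256).astype(np.float32)
q, s = quantize_int4(W)
x = np.random.randn(256).astype(np.float32)
err = np.linalg.norm(int4_matvec(q, s, W.shape, x) - W @ x) / np.linalg.norm(W @ x)
print(f"relative error from 4-bit weights: {err:.3f}")
```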

Polybot: Training One Policy Across Robots While Embracing Variability

  • paper_url: http://arxiv.org/abs/2307.03719
  • repo_url: None
  • paper_authors: Jonathan Yang, Dorsa Sadigh, Chelsea Finn
  • for: Cross-platform transfer of vision-based manipulation skills across multiple robotic platforms.
  • methods: Aligns the policy's observation and action spaces across embodiments using wrist cameras and a unified but modular codebase, and aligns the policy's internal representations across embodiments through contrastive learning.
  • results: On a dataset of 60 hours spanning 6 tasks and 3 robots with varying joint configurations and sizes, the approach yields significant improvements in success rate and sample efficiency, validating the proposed design decisions.
    Abstract Reusing large datasets is crucial to scale vision-based robotic manipulators to everyday scenarios due to the high cost of collecting robotic datasets. However, robotic platforms possess varying control schemes, camera viewpoints, kinematic configurations, and end-effector morphologies, posing significant challenges when transferring manipulation skills from one platform to another. To tackle this problem, we propose a set of key design decisions to train a single policy for deployment on multiple robotic platforms. Our framework first aligns the observation and action spaces of our policy across embodiments via utilizing wrist cameras and a unified, but modular codebase. To bridge the remaining domain shift, we align our policy's internal representations across embodiments through contrastive learning. We evaluate our method on a dataset collected over 60 hours spanning 6 tasks and 3 robots with varying joint configurations and sizes: the WidowX 250S, the Franka Emika Panda, and the Sawyer. Our results demonstrate significant improvements in success rate and sample efficiency for our policy when using new task data collected on a different robot, validating our proposed design decisions. More details and videos can be found on our anonymized project website: https://sites.google.com/view/polybot-multirobot
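
The representation-alignment step can be sketched with a standard InfoNCE objective that pulls together internal features of paired cross-robot observations and pushes apart mismatched pairs (the batch and feature sizes below are assumptions):

```python
# InfoNCE loss over paired representations from two embodiments.
import torch
import torch.nn.functional as F

def infonce(z_a: torch.Tensor, z_b: torch.Tensor, tau: float = 0.1):
    """z_a, z_b: (batch, d) representations of paired cross-robot observations."""
    z_a, z_b = F.normalize(z_a, dim=1), F.normalize(z_b, dim=1)
    logits = z_a @ z_b.T / tau               # row i should match column i
    targets = torch.arange(z_a.size(0))
    return F.cross_entropy(logits, targets)

loss = infonce(torch.randn(32, 128), torch.randn(32, 128))
print(loss.item())
```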

SAR: Generalization of Physiological Agility and Dexterity via Synergistic Action Representation

  • paper_url: http://arxiv.org/abs/2307.03716
  • repo_url: None
  • paper_authors: Cameron Berg, Vittorio Caggiano, Vikash Kumar
  • for: Learning effective continuous control policies in high-dimensional systems, including musculoskeletal agents, remains a significant challenge; over biological evolution, organisms developed robust mechanisms for learning highly sophisticated motor control, raising the question of what accounts for this behavioral flexibility.
  • methods: Modular control via muscle synergies (coordinated muscle co-contractions) is one putative mechanism that lets organisms learn muscle control in a simplified, generalizable action space. Drawing on this strategy, the paper uses physiologically accurate human hand and leg models as a testbed for determining how far a Synergistic Action Representation (SAR) acquired from simpler tasks facilitates learning more complex ones.
  • results: SAR-exploiting policies significantly outperform end-to-end reinforcement learning: robust locomotion over a wide set of terrains with high sample efficiency where baselines fail to learn meaningful behavior, and >70% success on multi-object manipulation versus <20% for baselines. Both SAR policies generalize zero-shot to out-of-domain conditions, while policies without SAR fail to generalize; generality is further established on a robotic manipulation task set and a full-body humanoid locomotion task.
    Abstract Learning effective continuous control policies in high-dimensional systems, including musculoskeletal agents, remains a significant challenge. Over the course of biological evolution, organisms have developed robust mechanisms for overcoming this complexity to learn highly sophisticated strategies for motor control. What accounts for this robust behavioral flexibility? Modular control via muscle synergies, i.e. coordinated muscle co-contractions, is considered to be one putative mechanism that enables organisms to learn muscle control in a simplified and generalizable action space. Drawing inspiration from this evolved motor control strategy, we use physiologically accurate human hand and leg models as a testbed for determining the extent to which a Synergistic Action Representation (SAR) acquired from simpler tasks facilitates learning more complex tasks. We find in both cases that SAR-exploiting policies significantly outperform end-to-end reinforcement learning. Policies trained with SAR were able to achieve robust locomotion on a wide set of terrains with high sample efficiency, while baseline approaches failed to learn meaningful behaviors. Additionally, policies trained with SAR on a multiobject manipulation task significantly outperformed (>70% success) baseline approaches (<20% success). Both of these SAR-exploiting policies were also found to generalize zero-shot to out-of-domain environmental conditions, while policies that did not adopt SAR failed to generalize. Finally, we establish the generality of SAR on broader high-dimensional control problems using a robotic manipulation task set and a full-body humanoid locomotion task. To the best of our knowledge, this investigation is the first of its kind to present an end-to-end pipeline for discovering synergies and using this representation to learn high-dimensional continuous control across a wide diversity of tasks.
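
One common way to extract such a synergistic representation -- not necessarily the paper's exact recipe -- is non-negative matrix factorization of muscle activations, after which a policy can act in the low-dimensional synergy space and expand to full muscle commands:

```python
# Extract muscle synergies with NMF and map synergy-space actions to muscles.
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
true_syn = rng.random((5, 39))                  # 5 synergies over 39 muscles
activations = rng.random((2000, 5)) @ true_syn  # synthetic activation data

nmf = NMF(n_components=5, init="nndsvda", max_iter=500)
Z = nmf.fit_transform(activations)              # per-timestep synergy weights
S = nmf.components_                             # learned action representation

z = rng.random(5)                               # a policy action in synergy space
muscle_command = z @ S                          # expand to full muscle space
print(muscle_command.shape)                     # (39,)
```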

INT-FP-QSim: Mixed Precision and Formats For Large Language Models and Vision Transformers

  • paper_url: http://arxiv.org/abs/2307.03712
  • repo_url: https://github.com/lightmatter-ai/int-fp-qsim
  • paper_authors: Lakshmi Nair, Mikhail Bernadskiy, Arulselvan Madhavan, Craig Chan, Ayon Basumallik, Darius Bunandar
  • for: An open-source simulator enabling flexible evaluation of large language models (LLMs) and vision transformers at various numerical precisions and formats.
  • methods: Combines existing open-source repositories such as TensorRT, QPyTorch, and AIMET into a single simulator supporting a variety of floating-point and integer formats.
  • results: Surveys the impact of different numerical formats on LLMs and vision transformers at 4-bit weights and 4-bit or 8-bit activations, and compares recently proposed methods such as Adaptive Block Floating Point, SmoothQuant, GPTQ, and RPTQ on model performance.
    Abstract The recent rise of large language models (LLMs) has resulted in increased efforts towards running LLMs at reduced precision. Running LLMs at lower precision supports resource constraints and furthers their democratization, enabling users to run billion-parameter LLMs on their personal devices. To supplement this ongoing effort, we propose INT-FP-QSim: an open-source simulator that enables flexible evaluation of LLMs and vision transformers at various numerical precisions and formats. INT-FP-QSim leverages existing open-source repositories such as TensorRT, QPytorch and AIMET for a combined simulator that supports various floating point and integer formats. With the help of our simulator, we survey the impact of different numerical formats on the performance of LLMs and vision transformers at 4-bit weights and 4-bit or 8-bit activations. We also compare recently proposed methods like Adaptive Block Floating Point, SmoothQuant, GPTQ and RPTQ on the model performances. We hope INT-FP-QSim will enable researchers to flexibly simulate models at various precisions to support further research in quantization of LLMs and vision transformers.

Fermat Distances: Metric Approximation, Spectral Convergence, and Clustering Algorithms

  • paper_url: http://arxiv.org/abs/2307.05750
  • repo_url: None
  • paper_authors: Nicolás García Trillos, Anna Little, Daniel McKenzie, James M. Murphy
  • for: Analyzing the convergence properties of Fermat distances, a family of density-driven metrics defined on Riemannian manifolds equipped with a probability measure, with applications to clustering.
  • methods: Novel geometric and statistical arguments in percolation theory that accommodate non-uniform densities and curved domains.
  • results: Discrete, sample-based Fermat distances converge to their continuum analogues in small neighborhoods at a precise rate depending on the intrinsic dimensionality of the data and the density-weighting parameter; graph Laplacians built from sample Fermat distances converge to the corresponding continuum operators, with eigenvalues and eigenvectors converging at a dimension-dependent rate, yielding new clustering algorithms and insights into density-driven spectral clustering.
    Abstract We analyze the convergence properties of Fermat distances, a family of density-driven metrics defined on Riemannian manifolds with an associated probability measure. Fermat distances may be defined either on discrete samples from the underlying measure, in which case they are random, or in the continuum setting, in which they are induced by geodesics under a density-distorted Riemannian metric. We prove that discrete, sample-based Fermat distances converge to their continuum analogues in small neighborhoods with a precise rate that depends on the intrinsic dimensionality of the data and the parameter governing the extent of density weighting in Fermat distances. This is done by leveraging novel geometric and statistical arguments in percolation theory that allow for non-uniform densities and curved domains. Our results are then used to prove that discrete graph Laplacians based on discrete, sample-driven Fermat distances converge to corresponding continuum operators. In particular, we show the discrete eigenvalues and eigenvectors converge to their continuum analogues at a dimension-dependent rate, which allows us to interpret the efficacy of discrete spectral clustering using Fermat distances in terms of the resulting continuum limit. The perspective afforded by our discrete-to-continuum Fermat distance analysis leads to new clustering algorithms for data and related insights into efficient computations associated to density-driven spectral clustering. Our theoretical analysis is supported with numerical simulations and experiments on synthetic and real image data.
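
The sample Fermat distance is easy to approximate in code: weight the edges of a kNN graph by Euclidean length raised to a power p > 1 and take shortest paths. (The kNN restriction and the parameters below are illustrative simplifications; the definition allows paths through any sample points.)

```python
# Approximate sample Fermat distances via shortest paths on a kNN graph.
import numpy as np
from scipy.sparse.csgraph import shortest_path
from sklearn.neighbors import kneighbors_graph

rng = np.random.default_rng(0)
X = rng.standard_normal((300, 2))
p = 3.0
G = kneighbors_graph(X, n_neighbors=10, mode="distance")
G.data = G.data ** p                               # edge cost = |x_i - x_j|^p
D = shortest_path(G, method="D", directed=False)   # all-pairs Fermat distances
print(D[0, :5])
```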

Equivariant Single View Pose Prediction Via Induced and Restricted Representations

  • paper_url: http://arxiv.org/abs/2307.03704
  • repo_url: None
  • paper_authors: Owen Howell, David Klee, Ondrej Biza, Linfeng Zhao, Robin Walters
  • for: The fundamental computer-vision problem of learning about the three-dimensional world from two-dimensional images.
  • methods: Since the group of 3D rotations SO(3) has no natural action on the image plane (an element of SO(3) can rotate an image out of plane), the paper formulates SO(2)-equivariance constraints that any algorithm learning a 3D representation from 2D images must satisfy, and uses the induced and restricted representations of SO(2) on SO(3) to construct and classify architectures satisfying these geometric consistency constraints.
  • results: Proves that any architecture respecting the consistency constraints is an instance of the construction, shows that three previously proposed 3D pose-prediction architectures are special cases, proposes a new learnable generalization of previous methods, and achieves SOTA results on the PASCAL3D+ and SYMSOL pose estimation tasks.
    Abstract Learning about the three-dimensional world from two-dimensional images is a fundamental problem in computer vision. An ideal neural network architecture for such tasks would leverage the fact that objects can be rotated and translated in three dimensions to make predictions about novel images. However, imposing SO(3)-equivariance on two-dimensional inputs is difficult because the group of three-dimensional rotations does not have a natural action on the two-dimensional plane. Specifically, it is possible that an element of SO(3) will rotate an image out of plane. We show that an algorithm that learns a three-dimensional representation of the world from two dimensional images must satisfy certain geometric consistency properties which we formulate as SO(2)-equivariance constraints. We use the induced and restricted representations of SO(2) on SO(3) to construct and classify architectures which satisfy these geometric consistency constraints. We prove that any architecture which respects said consistency constraints can be realized as an instance of our construction. We show that three previously proposed neural architectures for 3D pose prediction are special cases of our construction. We propose a new algorithm that is a learnable generalization of previously considered methods. We test our architecture on three pose predictions task and achieve SOTA results on both the PASCAL3D+ and SYMSOL pose estimation tasks.

Scalable Membership Inference Attacks via Quantile Regression

  • paper_url: http://arxiv.org/abs/2307.03694
  • repo_url: None
  • paper_authors: Martin Bertran, Shuai Tang, Michael Kearns, Jamie Morgenstern, Aaron Roth, Zhiwei Steven Wu
  • for: A new class of membership inference attacks that is competitive with state-of-the-art shadow model attacks while requiring substantially less compute.
  • methods: Quantile regression on the distribution of confidence scores that the model under attack induces on points not used in training.
  • results: An extensive series of experiments on various datasets and model architectures shows the attack is competitive with state-of-the-art shadow model attacks while requiring less compute and remaining truly "black-box".
    Abstract Membership inference attacks are designed to determine, using black box access to trained models, whether a particular example was used in training or not. Membership inference can be formalized as a hypothesis testing problem. The most effective existing attacks estimate the distribution of some test statistic (usually the model's confidence on the true label) on points that were (and were not) used in training by training many \emph{shadow models} -- i.e. models of the same architecture as the model being attacked, trained on a random subsample of data. While effective, these attacks are extremely computationally expensive, especially when the model under attack is large. We introduce a new class of attacks based on performing quantile regression on the distribution of confidence scores induced by the model under attack on points that are not used in training. We show that our method is competitive with state-of-the-art shadow model attacks, while requiring substantially less compute because our attack requires training only a single model. Moreover, unlike shadow model attacks, our proposed attack does not require any knowledge of the architecture of the model under attack and is therefore truly ``black-box". We show the efficacy of this approach in an extensive series of experiments on various datasets and model architectures.
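
A minimal sketch of the attack (with synthetic stand-ins for the target model's confidences): fit a quantile regressor on the confidences the model assigns to known non-members, then flag a point as a training member when its observed confidence exceeds the predicted conditional quantile.

```python
# Quantile-regression membership inference with synthetic confidences.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X_nonmem = rng.standard_normal((2000, 8))              # public non-member points
conf_nonmem = 1 / (1 + np.exp(-X_nonmem[:, 0] - rng.normal(0, 0.5, 2000)))

qr = GradientBoostingRegressor(loss="quantile", alpha=0.95)
qr.fit(X_nonmem, conf_nonmem)                          # 95th-percentile model

X_test = rng.standard_normal((5, 8))
conf_test = np.array([0.99, 0.42, 0.97, 0.10, 0.88])   # target-model confidences
is_member = conf_test > qr.predict(X_test)             # exceed quantile -> member
print(is_member)
```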