cs.AI - 2023-09-13

Efficient quantum recurrent reinforcement learning via quantum reservoir computing

  • paper_url: http://arxiv.org/abs/2309.07339
  • repo_url: None
  • paper_authors: Samuel Yen-Chi Chen
  • for: This paper aims to address the challenge of inefficient training in quantum reinforcement learning (QRL) models incorporating quantum recurrent neural networks (QRNNs).
  • methods: The proposed approach utilizes QLSTM-based reservoirs, with randomly initialized and fixed parameters, and trains the model using the asynchronous advantage actor-critic (A3C) algorithm.
  • results: Numerical simulations demonstrate the efficacy of the proposed QLSTM-Reservoir RL framework, achieving comparable results to a fully trained QLSTM RL model with the same architecture and training settings.
    Abstract Quantum reinforcement learning (QRL) has emerged as a framework to solve sequential decision-making tasks, showcasing empirical quantum advantages. A notable development is through quantum recurrent neural networks (QRNNs) for memory-intensive tasks such as partially observable environments. However, QRL models incorporating QRNNs suffer from inefficient training, since the computation of gradients in QRNNs is both computationally expensive and time-consuming. This work presents a novel approach to address this challenge by constructing QRL agents utilizing QRNN-based reservoirs, specifically employing quantum long short-term memory (QLSTM). QLSTM parameters are randomly initialized and fixed without training. The model is trained using the asynchronous advantage actor-critic (A3C) algorithm. Through numerical simulations, we validate the efficacy of our QLSTM-Reservoir RL framework. Its performance is assessed on standard benchmarks, demonstrating comparable results to a fully trained QLSTM RL model with identical architecture and training settings.
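
The reservoir-computing recipe the abstract describes, a recurrent module with randomly initialized, frozen parameters whose outputs feed small trainable actor and critic heads, can be sketched classically. The snippet below is a minimal illustration that uses a classical torch.nn.LSTM as a stand-in for the QLSTM reservoir (the quantum circuits themselves are not reproduced); only the linear heads would receive gradients from an A3C-style loss. All dimensions and names are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ReservoirActorCritic(nn.Module):
    """Frozen recurrent reservoir + trainable actor-critic heads (classical sketch)."""
    def __init__(self, obs_dim, hidden_dim, n_actions):
        super().__init__()
        # Stand-in for the QLSTM reservoir: randomly initialized and never trained.
        self.reservoir = nn.LSTM(obs_dim, hidden_dim, batch_first=True)
        for p in self.reservoir.parameters():
            p.requires_grad = False          # fixed, as in reservoir computing
        self.actor = nn.Linear(hidden_dim, n_actions)   # trained (e.g., by A3C)
        self.critic = nn.Linear(hidden_dim, 1)          # trained (e.g., by A3C)

    def forward(self, obs_seq, state=None):
        # obs_seq: (batch, time, obs_dim); only the heads receive gradients.
        features, state = self.reservoir(obs_seq, state)
        last = features[:, -1]               # reservoir summary of the history
        return self.actor(last), self.critic(last), state

# Example: one forward pass over a short observation history.
model = ReservoirActorCritic(obs_dim=4, hidden_dim=16, n_actions=2)
logits, value, _ = model(torch.randn(1, 8, 4))
print(logits.shape, value.shape)  # torch.Size([1, 2]) torch.Size([1, 1])
```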

Learning from Auxiliary Sources in Argumentative Revision Classification

  • paper_url: http://arxiv.org/abs/2309.07334
  • repo_url: None
  • paper_authors: Tazin Afrin, Diane Litman
  • for: This paper develops models to classify desirable reasoning revisions in argumentative writing.
  • methods: Two approaches, multi-task learning and transfer learning, are explored to take advantage of auxiliary revision data from similar tasks and improve classifier performance.
  • results: Intrinsic and extrinsic evaluations show that both multi-task learning and transfer learning improve classifier performance over baselines, with transfer learning better representing the relationship between the data sources.
    Abstract We develop models to classify desirable reasoning revisions in argumentative writing. We explore two approaches -- multi-task learning and transfer learning -- to take advantage of auxiliary sources of revision data for similar tasks. Results of intrinsic and extrinsic evaluations show that both approaches can indeed improve classifier performance over baselines. While multi-task learning shows that training on different sources of data at the same time may improve performance, transfer learning better represents the relationship between the data.

Reliability-based cleaning of noisy training labels with inductive conformal prediction in multi-modal biomedical data mining

  • paper_url: http://arxiv.org/abs/2309.07332
  • repo_url: https://github.com/xzhan96-stf/icp_train_clean
  • paper_authors: Xianghao Zhan, Qinmei Xu, Yuanning Zheng, Guangming Lu, Olivier Gevaert
  • for: Improving the accuracy of biomedical data labeling, addressing the tendency of traditional semi-supervised learning methods to under-utilize the available data.
  • methods: A reliability-based training-data cleaning method that uses reliability metrics computed with inductive conformal prediction (ICP) to rectify mislabeled samples and outliers within large quantities of noisy training data.
  • results: Validated on three classification tasks in distinct modalities: filtering drug-induced liver injury (DILI) literature using titles and abstracts, predicting ICU admission of COVID-19 patients from CT radiomics and electronic health records, and subtyping breast cancer from RNA-sequencing data. The method significantly improves classification performance, including gains in accuracy, AUROC, and AUPRC.
    Abstract Accurately labeling biomedical data presents a challenge. Traditional semi-supervised learning methods often under-utilize available unlabeled data. To address this, we propose a novel reliability-based training data cleaning method employing inductive conformal prediction (ICP). This method capitalizes on a small set of accurately labeled training data and leverages ICP-calculated reliability metrics to rectify mislabeled data and outliers within vast quantities of noisy training data. The efficacy of the method is validated across three classification tasks within distinct modalities: filtering drug-induced-liver-injury (DILI) literature with title and abstract, predicting ICU admission of COVID-19 patients through CT radiomics and electronic health records, and subtyping breast cancer using RNA-sequencing data. Varying levels of noise to the training labels were introduced through label permutation. Results show significant enhancements in classification performance: accuracy enhancement in 86 out of 96 DILI experiments (up to 11.4%), AUROC and AUPRC enhancements in all 48 COVID-19 experiments (up to 23.8% and 69.8%), and accuracy and macro-average F1 score improvements in 47 out of 48 RNA-sequencing experiments (up to 74.6% and 89.0%). Our method offers the potential to substantially boost classification performance in multi-modal biomedical machine learning tasks. Importantly, it accomplishes this without necessitating an excessive volume of meticulously curated training data.
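
A minimal sketch of the reliability idea: a classifier fit on a small, accurately labeled set yields nonconformity scores, an inductive conformal predictor converts them into per-label p-values using a calibration set, and noisy training labels whose p-values fall below a threshold are flagged and relabeled. The scikit-learn classifier, the 0.05 threshold, and the synthetic data are illustrative assumptions, not the paper's exact pipeline.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Small accurately labeled set, split into proper-training and calibration parts.
X, y = make_classification(n_samples=600, n_informative=5, random_state=0)
X_clean, X_noisy, y_clean, y_noisy = train_test_split(X, y, test_size=0.5, random_state=0)
X_train, X_cal, y_train, y_cal = train_test_split(X_clean, y_clean, test_size=0.4, random_state=0)
rng = np.random.default_rng(0)
flip = rng.random(len(y_noisy)) < 0.3            # simulate 30% label noise
y_noisy_obs = np.where(flip, 1 - y_noisy, y_noisy)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

def nonconformity(X, labels):
    # Nonconformity score: 1 - predicted probability of the asserted label.
    proba = clf.predict_proba(X)
    return 1.0 - proba[np.arange(len(labels)), labels]

cal_scores = nonconformity(X_cal, y_cal)

def icp_p_values(X, labels):
    # Inductive conformal p-value of each (sample, asserted label) pair.
    scores = nonconformity(X, labels)
    return (1 + (cal_scores[None, :] >= scores[:, None]).sum(1)) / (len(cal_scores) + 1)

# Flag noisy-set samples whose observed label is unreliable; relabel to the other class.
p_obs = icp_p_values(X_noisy, y_noisy_obs)
suspect = p_obs < 0.05
y_cleaned = y_noisy_obs.copy()
y_cleaned[suspect] = 1 - y_noisy_obs[suspect]    # binary task: switch to the alternative
print(f"flagged {suspect.sum()} of {len(y_noisy_obs)} labels; "
      f"label accuracy {np.mean(y_noisy_obs == y_noisy):.2f} -> {np.mean(y_cleaned == y_noisy):.2f}")
```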

Racing Control Variable Genetic Programming for Symbolic Regression

  • paper_url: http://arxiv.org/abs/2309.07934
  • repo_url: None
  • paper_authors: Nan Jiang, Yexiang Xue
  • for: This paper aims to improve the efficiency and accuracy of symbolic regression, so that governing equations can be discovered from experimental data more quickly.
  • methods: Builds on Control Variable Genetic Programming (CVGP), which accelerates regression by discovering equations from designed control variable experiments, and proposes Racing Control Variable Genetic Programming (Racing-CVGP), which runs multiple experiment schedules simultaneously and keeps the most promising ones to improve efficiency.
  • results: Evaluated on several synthetic and real-world datasets, Racing-CVGP discovers governing equations more efficiently and accurately than CVGP and a series of symbolic regressors that learn from fixed datasets.
    Abstract Symbolic regression, as one of the most crucial tasks in AI for science, discovers governing equations from experimental data. Popular approaches based on genetic programming, Monte Carlo tree search, or deep reinforcement learning learn symbolic regression from a fixed dataset. They require massive datasets and long training time, especially when learning complex equations involving many variables. Recently, Control Variable Genetic Programming (CVGP) has been introduced, which accelerates the regression process by discovering equations from designed control variable experiments. However, the set of experiments is fixed a priori in CVGP, and we observe that sub-optimal selection of experiment schedules delays the discovery process significantly. To overcome this limitation, we propose Racing Control Variable Genetic Programming (Racing-CVGP), which carries out multiple experiment schedules simultaneously. A selection scheme similar to that used in selecting good symbolic equations in the genetic programming process is implemented to ensure that promising experiment schedules eventually win over the average ones. The unfavorable schedules are terminated early to save time for the promising ones. We evaluate Racing-CVGP on several synthetic and real-world datasets corresponding to true physics laws. We demonstrate that Racing-CVGP outperforms CVGP and a series of symbolic regressors which discover equations from fixed datasets.
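
The racing mechanism itself, running several candidate experiment schedules in parallel, periodically scoring them, and terminating the unfavorable ones early so effort concentrates on the promising schedules, can be sketched independently of the genetic-programming details. In the sketch below, evaluate_schedule is a hypothetical stand-in for one round of control variable experiments plus fitness evaluation.

```python
import random

def evaluate_schedule(schedule, round_idx):
    """Hypothetical stand-in: run one round of control-variable experiments for a
    schedule and return a fitness value (higher is better)."""
    random.seed(hash((schedule, round_idx)))
    return random.random() + 0.1 * schedule   # placeholder score

def racing(schedules, n_rounds=10, keep_fraction=0.5):
    """Race multiple experiment schedules; drop the worst half every few rounds."""
    fitness = {s: 0.0 for s in schedules}
    for r in range(n_rounds):
        for s in list(fitness):
            fitness[s] += evaluate_schedule(s, r)          # accumulate evidence
        if (r + 1) % 3 == 0 and len(fitness) > 1:
            ranked = sorted(fitness, key=fitness.get, reverse=True)
            survivors = ranked[: max(1, int(len(ranked) * keep_fraction))]
            fitness = {s: fitness[s] for s in survivors}   # early-terminate the rest
    return max(fitness, key=fitness.get)

best = racing(schedules=list(range(8)))
print("most promising experiment schedule:", best)
```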

Traveling Words: A Geometric Interpretation of Transformers

  • paper_url: http://arxiv.org/abs/2309.07315
  • repo_url: https://github.com/santiag0m/traveling-words
  • paper_authors: Raul Molina
  • for: This work introduces a geometric perspective on the inner mechanisms of transformers, to better understand how they handle natural language processing tasks.
  • methods: Analyzes layer normalization and the attention mechanism: layer normalization confines the latent features to a hyper-sphere, which in turn enables attention to mold the semantic representation of words on that surface.
  • results: Probing a pre-trained GPT-2 model reveals clear query-key attention patterns in early layers and builds on prior observations about the subject-specific nature of attention heads in deeper layers. The results support the usefulness of the geometric viewpoint and provide an intuitive picture of transformers as modeling the trajectories of word particles on the hyper-sphere.
    Abstract Transformers have significantly advanced the field of natural language processing, but comprehending their internal mechanisms remains a challenge. In this paper, we introduce a novel geometric perspective that elucidates the inner mechanisms of transformer operations. Our primary contribution is illustrating how layer normalization confines the latent features to a hyper-sphere, subsequently enabling attention to mold the semantic representation of words on this surface. This geometric viewpoint seamlessly connects established properties such as iterative refinement and contextual embeddings. We validate our insights by probing a pre-trained 124M parameter GPT-2 model. Our findings reveal clear query-key attention patterns in early layers and build upon prior observations regarding the subject-specific nature of attention heads at deeper layers. Harnessing these geometric insights, we present an intuitive understanding of transformers, depicting them as processes that model the trajectory of word particles along the hyper-sphere.
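
The central geometric observation, that layer normalization (without its affine rescaling) places every latent vector on a hyper-sphere of radius sqrt(d), is easy to verify numerically. The snippet below is a small sanity check of that fact, not the paper's probing code.

```python
import torch

d = 768                                   # e.g., GPT-2 small hidden size
x = torch.randn(5, d) * 3.0 + 1.5         # arbitrary latent features
ln = torch.nn.LayerNorm(d, elementwise_affine=False)

y = ln(x)                                 # zero mean, unit variance per vector
norms = y.norm(dim=-1)
print(norms)                              # all approximately sqrt(768) ~ 27.7
print(torch.allclose(norms, torch.full_like(norms, d ** 0.5), rtol=1e-3))  # True
```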

AudioSR: Versatile Audio Super-resolution at Scale

  • paper_url: http://arxiv.org/abs/2309.07314
  • repo_url: None
  • paper_authors: Haohe Liu, Ke Chen, Qiao Tian, Wenwu Wang, Mark D. Plumbley
  • for: Enhancing the quality of low-resolution audio.
  • methods: A diffusion-based generative model for audio super-resolution.
  • results: Improves audio quality across versatile audio types and bandwidth settings, including sound effects, music, and speech.
    Abstract Audio super-resolution is a fundamental task that predicts high-frequency components for low-resolution audio, enhancing audio quality in digital applications. Previous methods have limitations such as the limited scope of audio types (e.g., music, speech) and specific bandwidth settings they can handle (e.g., 4kHz to 8kHz). In this paper, we introduce a diffusion-based generative model, AudioSR, that is capable of performing robust audio super-resolution on versatile audio types, including sound effects, music, and speech. Specifically, AudioSR can upsample any input audio signal within the bandwidth range of 2kHz to 16kHz to a high-resolution audio signal at 24kHz bandwidth with a sampling rate of 48kHz. Extensive objective evaluation on various audio super-resolution benchmarks demonstrates the strong results achieved by the proposed model. In addition, our subjective evaluation shows that AudioSR can act as a plug-and-play module to enhance the generation quality of a wide range of audio generative models, including AudioLDM, Fastspeech2, and MusicGen. Our code and demo are available at https://audioldm.github.io/audiosr.

Pretraining on the Test Set Is All You Need

  • paper_url: http://arxiv.org/abs/2309.08632
  • repo_url: https://github.com/molyswu/hand_detection
  • paper_authors: Rylan Schaeffer
  • for: This paper examines and evaluates small transformer-based language models pretrained on carefully curated data.
  • methods: Pretrains a 1 million parameter transformer-based language model on a novel dataset mixture of fewer than 100 thousand tokens, curated solely from evaluation benchmarks.
  • results: The model achieves perfect results across diverse academic benchmarks, outperforming all known foundation models, and accurately predicts downstream evaluation benchmarks' canaries.
    Abstract Inspired by recent work demonstrating the promise of smaller Transformer-based language models pretrained on carefully curated data, we supercharge such approaches by investing heavily in curating a novel, high quality, non-synthetic data mixture based solely on evaluation benchmarks. Using our novel dataset mixture consisting of less than 100 thousand tokens, we pretrain a 1 million parameter transformer-based LLM \textbf{phi-CTNL} (pronounced ``fictional") that achieves perfect results across diverse academic benchmarks, strictly outperforming all known foundation models. \textbf{phi-CTNL} also beats power-law scaling and exhibits a never-before-seen grokking-like ability to accurately predict downstream evaluation benchmarks' canaries.

Language-Conditioned Observation Models for Visual Object Search

  • paper_url: http://arxiv.org/abs/2309.07276
  • repo_url: None
  • paper_authors: Thao Nguyen, Vladislav Hrosinkov, Eric Rosen, Stefanie Tellex
  • for: This work targets realistic object search, enabling a robot to find objects in its environment from natural language descriptions.
  • methods: Poses object search as a partially observable Markov decision process (POMDP) in which the object detector and the visual sensor noise model are determined by a single deep neural network conditioned on the language description.
  • results: Compared with a state-of-the-art object search algorithm in simulation, the proposed method achieves a significantly higher average task completion rate (from 0.46 to 0.66) and faster, more efficient object search; it is also demonstrated on a Boston Dynamics Spot robot handling complex natural language descriptions in a room-scale environment.
    Abstract Object search is a challenging task because when given complex language descriptions (e.g., "find the white cup on the table"), the robot must move its camera through the environment and recognize the described object. Previous works map language descriptions to a set of fixed object detectors with predetermined noise models, but these approaches are challenging to scale because new detectors need to be made for each object. In this work, we bridge the gap in realistic object search by posing the search problem as a partially observable Markov decision process (POMDP) where the object detector and visual sensor noise in the observation model is determined by a single Deep Neural Network conditioned on complex language descriptions. We incorporate the neural network's outputs into our language-conditioned observation model (LCOM) to represent dynamically changing sensor noise. With an LCOM, any language description of an object can be used to generate an appropriate object detector and noise model, and training an LCOM only requires readily available supervised image-caption datasets. We empirically evaluate our method by comparing against a state-of-the-art object search algorithm in simulation, and demonstrate that planning with our observation model yields a significantly higher average task completion rate (from 0.46 to 0.66) and more efficient and quicker object search than with a fixed-noise model. We demonstrate our method on a Boston Dynamics Spot robot, enabling it to handle complex natural language object descriptions and efficiently find objects in a room-scale environment.
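
At the core of the approach is a Bayes-filter belief update in which the detection and false-positive rates of the observation model come from a network conditioned on the language description rather than from a fixed, hand-specified noise model. The sketch below shows such an update over a small grid of candidate object locations; lcom_detection_probs is a hypothetical stand-in for the paper's language-conditioned network.

```python
import numpy as np

N_CELLS = 10  # candidate object locations the robot can point its camera at

def lcom_detection_probs(description: str):
    """Hypothetical stand-in for a language-conditioned observation model: returns
    (true-positive rate, false-positive rate) for detecting the described object."""
    hard = "small" in description or "white" in description
    return (0.7, 0.10) if hard else (0.9, 0.05)

def belief_update(belief, looked_at, detected, description):
    """Bayes-filter update of P(object in cell) after looking at one cell."""
    tp, fp = lcom_detection_probs(description)
    likelihood = np.full(N_CELLS, fp if detected else 1 - fp)
    likelihood[looked_at] = tp if detected else 1 - tp
    posterior = likelihood * belief
    return posterior / posterior.sum()

belief = np.full(N_CELLS, 1.0 / N_CELLS)          # uniform prior over locations
belief = belief_update(belief, looked_at=3, detected=False,
                       description="find the white cup on the table")
belief = belief_update(belief, looked_at=7, detected=True,
                       description="find the white cup on the table")
print(np.round(belief, 3))   # mass shifts away from cell 3 and toward cell 7
```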

Safe and Accelerated Deep Reinforcement Learning-based O-RAN Slicing: A Hybrid Transfer Learning Approach

  • paper_url: http://arxiv.org/abs/2309.07265
  • repo_url: https://github.com/ahmadnagib/tl-aided-drl
  • paper_authors: Ahmad M. Nagib, Hatem Abou-Zeid, Hossam S. Hassanein
  • for: This paper proposes a deep reinforcement learning (DRL)-based closed-loop control approach for optimizing radio access network (RAN) functions in the O-RAN architecture.
  • methods: Transfer learning (TL) is used as a core component of the training and deployment workflows; a hybrid TL-aided approach combines policy reuse and distillation TL methods to provide safe and accelerated convergence.
  • results: With realistic VR gaming traffic, the proposed hybrid approach improves the average initial reward value and the percentage of converged scenarios by at least 7.7% and 20.7%, respectively, and reduces reward variance by 64.6%, while maintaining fast convergence and improving generalizability.
    Abstract The open radio access network (O-RAN) architecture supports intelligent network control algorithms as one of its core capabilities. Data-driven applications incorporate such algorithms to optimize radio access network (RAN) functions via RAN intelligent controllers (RICs). Deep reinforcement learning (DRL) algorithms are among the main approaches adopted in the O-RAN literature to solve dynamic radio resource management problems. However, despite the benefits introduced by the O-RAN RICs, the practical adoption of DRL algorithms in real network deployments falls behind. This is primarily due to the slow convergence and unstable performance exhibited by DRL agents upon deployment and when encountering previously unseen network conditions. In this paper, we address these challenges by proposing transfer learning (TL) as a core component of the training and deployment workflows for the DRL-based closed-loop control of O-RAN functionalities. To this end, we propose and design a hybrid TL-aided approach that leverages the advantages of both policy reuse and distillation TL methods to provide safe and accelerated convergence in DRL-based O-RAN slicing. We conduct a thorough experiment that accommodates multiple services, including real VR gaming traffic to reflect practical scenarios of O-RAN slicing. We also propose and implement policy reuse and distillation-aided DRL and non-TL-aided DRL as three separate baselines. The proposed hybrid approach shows at least: 7.7% and 20.7% improvements in the average initial reward value and the percentage of converged scenarios, and a 64.6% decrease in reward variance while maintaining fast convergence and enhancing the generalizability compared with the baselines.
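
The hybrid idea combines two classic transfer learning mechanisms: policy reuse (warm-starting the learner from a policy trained on a source slicing scenario) and policy distillation (a KL term that keeps the learner close to the expert teacher early in training). The sketch below shows how such a combined loss could be wired around an ordinary policy-gradient term; the networks, coefficient schedule, and dimensions are illustrative assumptions rather than the paper's configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def make_policy(obs_dim=12, n_actions=5):
    return nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))

teacher = make_policy()            # assumed pre-trained on a source O-RAN slicing scenario
student = make_policy()
student.load_state_dict(teacher.state_dict())   # (1) policy reuse: warm start

optimizer = torch.optim.Adam(student.parameters(), lr=3e-4)

def training_step(obs, actions, advantages, step, total_steps=10_000):
    logits = student(obs)
    log_probs = F.log_softmax(logits, dim=-1)
    # Ordinary policy-gradient term (placeholder for the full DRL loss).
    pg_loss = -(advantages * log_probs.gather(1, actions[:, None]).squeeze(1)).mean()
    # (2) distillation: KL(teacher || student), annealed away as training proceeds.
    with torch.no_grad():
        teacher_probs = F.softmax(teacher(obs), dim=-1)
    distill = F.kl_div(log_probs, teacher_probs, reduction="batchmean")
    beta = max(0.0, 1.0 - step / (0.3 * total_steps))   # guidance fades after ~30% of steps
    loss = pg_loss + beta * distill
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

obs = torch.randn(32, 12)
actions = torch.randint(0, 5, (32,))
advantages = torch.randn(32)
print(training_step(obs, actions, advantages, step=0))
```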

Autotuning Apache TVM-based Scientific Applications Using Bayesian Optimization

  • paper_url: http://arxiv.org/abs/2309.07235
  • repo_url: None
  • paper_authors: Xingfu Wu, Praveen Paramasivam, Valerie Taylor
  • for: Optimizing the performance of dense matrix factorizations (LU decomposition, Cholesky decomposition, and the 3mm kernel) on GPUs and AI accelerators.
  • methods: An autotuning framework based on Bayesian optimization, with the linear algebra kernels implemented in the TVM tensor expression language.
  • results: The scientific computation kernels are evaluated on Swing, a GPU cluster at Argonne National Laboratory, and the framework is compared with the AutoTVM autotuning framework using four tuners; the proposed framework performs better in most cases.
    Abstract Apache TVM (Tensor Virtual Machine), an open source machine learning compiler framework designed to optimize computations across various hardware platforms, provides an opportunity to improve the performance of dense matrix factorizations such as LU (Lower Upper) decomposition and Cholesky decomposition on GPUs and AI (Artificial Intelligence) accelerators. In this paper, we propose a new TVM autotuning framework using Bayesian Optimization and use the TVM tensor expression language to implement linear algebra kernels such as LU, Cholesky, and 3mm. We use these scientific computation kernels to evaluate the effectiveness of our methods on a GPU cluster, called Swing, at Argonne National Laboratory. We compare the proposed autotuning framework with the TVM autotuning framework AutoTVM with four tuners and find that our framework outperforms AutoTVM in most cases.

Sight Beyond Text: Multi-Modal Training Enhances LLMs in Truthfulness and Ethics

  • paper_url: http://arxiv.org/abs/2309.07120
  • repo_url: https://github.com/ucsc-vlaa/sight-beyond-text
  • paper_authors: Haoqin Tu, Bingchen Zhao, Chen Wei, Cihang Xie
  • for: This study investigates the capabilities of multi-modal large language models (MLLMs), in particular their often-untested pure NLP abilities.
  • methods: Uses visual instruction tuning, a prevailing strategy for transitioning large language models (LLMs) into MLLMs.
  • results: Visual instruction tuning unexpectedly helps models attain improved truthfulness and ethical alignment in the pure NLP context; for example, a visual-instruction-tuned LLaMA2 7B model surpasses the LLaMA2-chat 7B model, which was fine-tuned with over one million human annotations, on the TruthfulQA-mc and Ethics benchmarks.
    Abstract Multi-modal large language models (MLLMs) are trained based on large language models (LLM), with an enhanced capability to comprehend multi-modal inputs and generate textual responses. While they excel in multi-modal tasks, the pure NLP abilities of MLLMs are often underestimated and left untested. In this study, we get out of the box and unveil an intriguing characteristic of MLLMs -- our preliminary results suggest that visual instruction tuning, a prevailing strategy for transitioning LLMs into MLLMs, unexpectedly and interestingly helps models attain both improved truthfulness and ethical alignment in the pure NLP context. For example, a visual-instruction-tuned LLaMA2 7B model surpasses the performance of the LLaMA2-chat 7B model, fine-tuned with over one million human annotations, on TruthfulQA-mc and Ethics benchmarks. Further analysis reveals that the improved alignment can be attributed to the superior instruction quality inherent to visual-text data. In releasing our code at github.com/UCSC-VLAA/Sight-Beyond-Text, we aspire to foster further exploration into the intrinsic value of visual-text synergies and, in a broader scope, multi-modal interactions in alignment research.

Characterizing Speed Performance of Multi-Agent Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2309.07108
  • repo_url: None
  • paper_authors: Samuel Wiggins, Yuan Meng, Rajgopal Kannan, Viktor Prasanna
  • for: This work examines the speed performance bottlenecks of multi-agent reinforcement learning (MARL) algorithms used in large-scale AI systems and big-data applications.
  • methods: Introduces a taxonomy of MARL algorithms from an acceleration perspective, categorized by training scheme and communication method, and systematically analyzes the performance bottlenecks of three target algorithms (MADDPG, ToM2C, and NeurComm) on a homogeneous multi-core CPU platform.
  • results: The bottlenecks stem mainly from the compute and memory demands of training and inter-agent communication; the analysis motivates latency-bounded throughput as a key performance metric and identifies opportunities for parallelization and acceleration.
    Abstract Multi-Agent Reinforcement Learning (MARL) has achieved significant success in large-scale AI systems and big-data applications such as smart grids, surveillance, etc. Existing advancements in MARL algorithms focus on improving the rewards obtained by introducing various mechanisms for inter-agent cooperation. However, these optimizations are usually compute- and memory-intensive, thus leading to suboptimal speed performance in end-to-end training time. In this work, we analyze the speed performance (i.e., latency-bounded throughput) as the key metric in MARL implementations. Specifically, we first introduce a taxonomy of MARL algorithms from an acceleration perspective categorized by (1) training scheme and (2) communication method. Using our taxonomy, we identify three state-of-the-art MARL algorithms - Multi-Agent Deep Deterministic Policy Gradient (MADDPG), Target-oriented Multi-agent Communication and Cooperation (ToM2C), and Networked Multi-Agent RL (NeurComm) - as target benchmark algorithms, and provide a systematic analysis of their performance bottlenecks on a homogeneous multi-core CPU platform. We justify the need for MARL latency-bounded throughput to be a key performance metric in future literature while also addressing opportunities for parallelization and acceleration.

Mitigating Group Bias in Federated Learning for Heterogeneous Devices

  • paper_url: http://arxiv.org/abs/2309.07085
  • repo_url: None
  • paper_authors: Khotso Selialia, Yasra Chandio, Fatima M. Anwar
  • for: This paper proposes a privacy-preserving federated learning framework for model training in distributed edge applications that handles data heterogeneity across deployments without producing biased global models.
  • methods: Cross-domain group importance weights are computed from average conditional probabilities over heterogeneous training data, and the worst-performing group is optimized with a modified multiplicative weights update method; regularization techniques and a thresholding mechanism reduce the gap between the worst- and best-performing groups while balancing bias reduction against group performance degradation.
  • results: Evaluations on human emotion recognition and image classification benchmarks show that the framework achieves fair decision-making while preserving privacy and avoiding resource utilization overhead in realistic heterogeneous settings.
    Abstract Federated Learning is emerging as a privacy-preserving model training approach in distributed edge applications. As such, most edge deployments are heterogeneous in nature i.e., their sensing capabilities and environments vary across deployments. This edge heterogeneity violates the independence and identical distribution (IID) property of local data across clients and produces biased global models i.e. models that contribute to unfair decision-making and discrimination against a particular community or a group. Existing bias mitigation techniques only focus on bias generated from label heterogeneity in non-IID data without accounting for domain variations due to feature heterogeneity and do not address global group-fairness property. Our work proposes a group-fair FL framework that minimizes group-bias while preserving privacy and without resource utilization overhead. Our main idea is to leverage average conditional probabilities to compute a cross-domain group \textit{importance weights} derived from heterogeneous training data to optimize the performance of the worst-performing group using a modified multiplicative weights update method. Additionally, we propose regularization techniques to minimize the difference between the worst and best-performing groups while making sure through our thresholding mechanism to strike a balance between bias reduction and group performance degradation. Our evaluation of human emotion recognition and image classification benchmarks assesses the fair decision-making of our framework in real-world heterogeneous settings.
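
The server-side mechanism can be sketched as a multiplicative weights update over group importance weights: groups with higher loss are exponentially up-weighted, the weights are renormalized, and a cap plays the role of a thresholding mechanism that limits how quickly the best-performing groups are de-emphasized. The learning rate, cap, and per-group losses below are illustrative assumptions.

```python
import numpy as np

def multiplicative_weights_update(weights, group_losses, eta=0.5, cap=5.0):
    """One round of a (modified) multiplicative weights update over group weights.

    Groups with larger loss are exponentially up-weighted; a cap on per-round growth
    acts as a simple thresholding mechanism against degrading the best group too much.
    """
    w = weights * np.exp(eta * group_losses)      # emphasize poorly served groups
    w = w / w.sum()                               # renormalize to a distribution
    w = np.minimum(w, cap * weights)              # threshold: limit per-round growth
    return w / w.sum()

# Example: three demographic groups, group 2 is currently worst served.
weights = np.full(3, 1 / 3)
for rnd in range(5):
    group_losses = np.array([0.20, 0.25, 0.60])   # placeholder per-group losses
    weights = multiplicative_weights_update(weights, group_losses)
    weighted_objective = float(weights @ group_losses)
    print(f"round {rnd}: weights={np.round(weights, 3)}, weighted loss={weighted_objective:.3f}")
```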

A Comprehensive Analysis of the Role of Artificial Intelligence and Machine Learning in Modern Digital Forensics and Incident Response

  • paper_url: http://arxiv.org/abs/2309.07064
  • repo_url: None
  • paper_authors: Dipo Dunsin, Mohamed C. Ghanem, Karim Ouazzane, Vassil Vassilev
  • for: This paper aims to provide a comprehensive analysis of the use of Artificial Intelligence (AI) and Machine Learning (ML) in digital forensics and incident response, exploring cutting-edge research initiatives and their applications in various facets of digital forensics practice.
  • methods: The paper employs a thorough and in-depth analysis, including a review of existing research, to examine the use of AI and ML techniques in digital forensics, including data collection and recovery, cybercrime timeline reconstruction, big data analysis, pattern recognition, chain of custody safeguarding, and responsive strategies to hacking incidents.
  • results: The study highlights the potential and limitations of AI and ML techniques in digital forensics, including their contributions, limitations, and gaps in the existing research. It also underscores the significance of strategic planning, continual research, and development to unlock AI’s full potential in digital forensics and incident response, and offers insights into their benefits, drawbacks, and broader implications for tackling modern cyber threats.
    Abstract In the dynamic landscape of digital forensics, the integration of Artificial Intelligence (AI) and Machine Learning (ML) stands as a transformative technology, poised to amplify the efficiency and precision of digital forensics investigations. However, the use of ML and AI in digital forensics is still in its nascent stages. As a result, this paper gives a thorough and in-depth analysis that goes beyond a simple survey and review. The goal is to look closely at how AI and ML techniques are used in digital forensics and incident response. This research explores cutting-edge research initiatives that cross domains such as data collection and recovery, the intricate reconstruction of cybercrime timelines, robust big data analysis, pattern recognition, safeguarding the chain of custody, and orchestrating responsive strategies to hacking incidents. This endeavour digs far beneath the surface to unearth the intricate ways AI-driven methodologies are shaping these crucial facets of digital forensics practice. While the promise of AI in digital forensics is evident, the challenges arising from increasing database sizes and evolving criminal tactics necessitate ongoing collaborative research and refinement within the digital forensics profession. This study examines the contributions, limitations, and gaps in the existing research, shedding light on the potential and limitations of AI and ML techniques. By exploring these different research areas, we highlight the critical need for strategic planning, continual research, and development to unlock AI's full potential in digital forensics and incident response. Ultimately, this paper underscores the significance of AI and ML integration in digital forensics, offering insights into their benefits, drawbacks, and broader implications for tackling modern cyber threats.

Deep Quantum Graph Dreaming: Deciphering Neural Network Insights into Quantum Experiments

  • paper_url: http://arxiv.org/abs/2309.07056
  • repo_url: None
  • paper_authors: Tareq Jaouni, Sören Arlt, Carlos Ruiz-Gonzalez, Ebrahim Karimi, Xuemei Gu, Mario Krenn
  • for: This work uses an explainable-AI (XAI) technique to interpret what neural networks learn about quantum optics experiments.
  • methods: A deep neural network is trained on properties of quantum systems and then inverted with the inception (deep dreaming) technique, asking how the network would continuously modify a quantum system to change a given property.
  • results: The network can shift the initial distribution of properties of the quantum system, and its learned strategies can be conceptualized: early layers identify simple properties, while deeper layers identify complex quantum structures and even quantum entanglement, mirroring properties long understood in computer vision. The approach could support the development of more interpretable AI-based scientific discovery techniques.
    Abstract Despite their promise to facilitate new scientific discoveries, the opaqueness of neural networks presents a challenge in interpreting the logic behind their findings. Here, we use an eXplainable-AI (XAI) technique called $inception$ or $deep$ $dreaming$, which was invented in machine learning for computer vision. We use this technique to explore what neural networks learn about quantum optics experiments. Our story begins by training a deep neural network on the properties of quantum systems. Once trained, we "invert" the neural network -- effectively asking how it imagines a quantum system with a specific property, and how it would continuously modify the quantum system to change a property. We find that the network can shift the initial distribution of properties of the quantum system, and we can conceptualize the learned strategies of the neural network. Interestingly, we find that, in the first layers, the neural network identifies simple properties, while in the deeper ones, it can identify complex quantum structures and even quantum entanglement. This is reminiscent of long-understood properties known in computer vision, which we now identify in a complex natural science task. Our approach could be useful for developing more interpretable AI-based scientific discovery techniques in quantum physics.
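
The inception / deep dreaming procedure itself is straightforward: freeze the trained network, treat the input as the optimization variable, and follow the gradient of a chosen output (here a predicted property) with respect to the input. The sketch below runs that inversion loop on a toy regression network standing in for the quantum-experiment property predictor; the architecture and step size are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Toy stand-in for a network trained to predict a property of a quantum setup
# from its parameter vector (the real model is trained on quantum optics data).
property_net = nn.Sequential(nn.Linear(10, 64), nn.Tanh(), nn.Linear(64, 1))
for p in property_net.parameters():
    p.requires_grad = False      # the trained network is frozen during dreaming

def dream(x0, steps=200, lr=0.05):
    """Gradient ascent on the input: ask how the network would modify the
    system parameters to increase the predicted property."""
    x = x0.clone().requires_grad_(True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = -property_net(x).sum()     # maximize the predicted property
        loss.backward()
        opt.step()
    return x.detach()

x0 = torch.randn(1, 10)
x_dreamed = dream(x0)
print("property before:", property_net(x0).item())
print("property after: ", property_net(x_dreamed).item())
```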

Pearl’s and Jeffrey’s Update as Modes of Learning in Probabilistic Programming

  • paper_url: http://arxiv.org/abs/2309.07053
  • repo_url: None
  • paper_authors: Bart Jacobs, Dario Stein
  • for: This paper addresses the problem of updating a probability distribution in the light of new evidence.
  • methods: The relationship between Pearl's and Jeffrey's update rules is clarified via separate descriptions as probabilistic programs with sampling semantics, and via different notions of likelihood (for Pearl and for Jeffrey).
  • results: Jeffrey's update rule is shown to arise via variational inference; in categorical probability theory, the situation is analyzed in terms of the multiset functor extended to the Kleisli category of the distribution monad.
    Abstract The concept of updating a probability distribution in the light of new evidence lies at the heart of statistics and machine learning. Pearl's and Jeffrey's rules are two natural update mechanisms which lead to different outcomes, yet the similarities and differences remain mysterious. This paper clarifies their relationship in several ways: via separate descriptions of the two update mechanisms in terms of probabilistic programs and sampling semantics, and via different notions of likelihood (for Pearl and for Jeffrey). Moreover, it is shown that Jeffrey's update rule arises via variational inference. In terms of categorical probability theory, this amounts to an analysis of the situation in terms of the behaviour of the multiset functor, extended to the Kleisli category of the distribution monad.
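
A small numeric example makes the difference between the two rules concrete. With a prior over a binary hypothesis H, an evidence variable E, and uncertain evidence "E=1 with probability 0.8", Pearl's (virtual evidence) update weights the joint by likelihoods on E and renormalizes, whereas Jeffrey's rule forces the posterior marginal of E to equal the given distribution while keeping the conditionals P(H|E). The numbers below are an illustrative assumption, not taken from the paper.

```python
import numpy as np

# Joint model: binary hypothesis H, binary evidence variable E.
prior_H = np.array([0.7, 0.3])                 # P(H=0), P(H=1)
lik_E_given_H = np.array([[0.8, 0.2],          # P(E=0|H), P(E=1|H) for H=0
                          [0.1, 0.9]])         # ... for H=1
joint = prior_H[:, None] * lik_E_given_H       # P(H, E), shape (2, 2)

# Uncertain evidence about E: "we are 80% sure the evidence variable is 1".
soft_evidence = np.array([0.2, 0.8])

# Pearl's update (virtual evidence): weight the joint by likelihoods on E, renormalize.
pearl_joint = joint * soft_evidence[None, :]
pearl_H = pearl_joint.sum(axis=1) / pearl_joint.sum()

# Jeffrey's update: force the marginal of E to equal soft_evidence, keep P(H|E).
post_H_given_E = joint / joint.sum(axis=0, keepdims=True)   # P(H|E), column-wise
jeffrey_H = post_H_given_E @ soft_evidence

print("Pearl  :", np.round(pearl_H, 3))    # [0.502 0.498]
print("Jeffrey:", np.round(jeffrey_H, 3))  # [0.463 0.537]
```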

UnifiedGesture: A Unified Gesture Synthesis Model for Multiple Skeletons

  • paper_url: http://arxiv.org/abs/2309.07051
  • repo_url: https://github.com/youngseng/unifiedgesture
  • paper_authors: Sicheng Yang, Zilin Wang, Zhiyong Wu, Minglei Li, Zhensong Zhang, Qiaochu Huang, Lei Hao, Songcen Xu, Xiaofei Wu, changpeng yang, Zonghong Dai
  • for: This paper proposes a novel diffusion-model-based, speech-driven gesture synthesis approach that overcomes the need to design networks for individual datasets with different motion capture standards.
  • methods: A retargeting network learns latent homeomorphic graphs for different motion capture standards, unifying gesture representations across datasets; a diffusion model with cross-local attention and self-attention captures the correlation between speech and gestures to generate better speech-matched, realistic gestures; reinforcement learning on discrete gesture units with a learned reward function further aligns speech and gesture and increases diversity.
  • results: Experiments show that the method outperforms recent speech-driven gesture generation approaches in terms of CCA, FGD, and human-likeness.
    Abstract The automatic co-speech gesture generation draws much attention in computer animation. Previous works designed network structures on individual datasets, which resulted in a lack of data volume and generalizability across different motion capture standards. In addition, it is a challenging task due to the weak correlation between speech and gestures. To address these problems, we present UnifiedGesture, a novel diffusion model-based speech-driven gesture synthesis approach, trained on multiple gesture datasets with different skeletons. Specifically, we first present a retargeting network to learn latent homeomorphic graphs for different motion capture standards, unifying the representations of various gestures while extending the dataset. We then capture the correlation between speech and gestures based on a diffusion model architecture using cross-local attention and self-attention to generate better speech-matched and realistic gestures. To further align speech and gesture and increase diversity, we incorporate reinforcement learning on the discrete gesture units with a learned reward function. Extensive experiments show that UnifiedGesture outperforms recent approaches on speech-driven gesture generation in terms of CCA, FGD, and human-likeness. All code, pre-trained models, databases, and demos are available to the public at https://github.com/YoungSeng/UnifiedGesture.

Latent Representation and Simulation of Markov Processes via Time-Lagged Information Bottleneck

  • paper_url: http://arxiv.org/abs/2309.07200
  • repo_url: None
  • paper_authors: Marco Federici, Patrick Forré, Ryota Tomioka, Bastiaan S. Veeling
  • for: This paper proposes an information-theoretic, time-lagged dimensionality reduction method for accurately simulating the dynamics of large-scale systems over long time horizons.
  • methods: The Time-lagged Information Bottleneck (T-IB), a principled objective rooted in information theory, maps complex systems into a simplified representational space and models large jumps in time while discarding high-frequency information.
  • results: Experiments show that T-IB learns information-optimal representations that accurately capture the statistical properties and dynamics of the original process at a selected time lag, outperforming existing time-lagged dimensionality reduction methods.
    Abstract Markov processes are widely used mathematical models for describing dynamic systems in various fields. However, accurately simulating large-scale systems at long time scales is computationally expensive due to the short time steps required for accurate integration. In this paper, we introduce an inference process that maps complex systems into a simplified representational space and models large jumps in time. To achieve this, we propose Time-lagged Information Bottleneck (T-IB), a principled objective rooted in information theory, which aims to capture relevant temporal features while discarding high-frequency information to simplify the simulation task and minimize the inference error. Our experiments demonstrate that T-IB learns information-optimal representations for accurately modeling the statistical properties and dynamics of the original process at a selected time lag, outperforming existing time-lagged dimensionality reduction methods.

Efficient Reinforcement Learning for Jumping Monopods

  • paper_url: http://arxiv.org/abs/2309.07038
  • repo_url: https://github.com/mfocchi/jump_rl
  • paper_authors: Riccardo Bussola, Michele Focchi, Andrea Del Prete, Daniele Fontanelli, Luigi Palopoli
  • for: This work addresses the control problem of making a monopod reach a target point with a jump, possibly over uneven terrain.
  • methods: The learning process is guided within a reinforcement learning (RL) framework by injecting physical knowledge, which allows the controller to be learned quickly.
  • results: The approach is shown to be more efficient than both optimization-based methods and end-to-end RL approaches.
    Abstract In this work, we consider the complex control problem of making a monopod reach a target with a jump. The monopod can jump in any direction and the terrain underneath its foot can be uneven. This is a template of a much larger class of problems, which are extremely challenging and computationally expensive to solve using standard optimisation-based techniques. Reinforcement Learning (RL) could be an interesting alternative, but the application of an end-to-end approach in which the controller must learn everything from scratch is impractical. The solution advocated in this paper is to guide the learning process within an RL framework by injecting physical knowledge. This expedient brings widespread benefits, such as a drastic reduction of the learning time, and the ability to learn and compensate for possible errors in the low-level controller executing the motion. We demonstrate the advantage of our approach with respect to both optimization-based and end-to-end RL approaches.

How (Not) to Use Sociodemographic Information for Subjective NLP Tasks

  • paper_url: http://arxiv.org/abs/2309.07034
  • repo_url: https://github.com/ukplab/arxiv2023-sociodemographic-prompting
  • paper_authors: Tilman Beck, Hendrik Schuff, Anne Lauscher, Iryna Gurevych
  • for: This study examines how annotators' sociodemographic backgrounds influence their decisions on subjective NLP tasks, and whether sociodemographic prompting can model this variation.
  • methods: Several sociodemographic prompt formulations are evaluated across seven datasets and six instruction-tuned model families.
  • results: Sociodemographic prompting can improve zero-shot learning on subjective NLP tasks, but its outcomes vary considerably across model types, sizes, datasets, and prompt formulations; the authors therefore propose using it to identify ambiguous instances rather than as a proxy for annotation by a sociodemographically heterogeneous group.
    Abstract Annotators' sociodemographic backgrounds (i.e., the individual compositions of their gender, age, educational background, etc.) have a strong impact on their decisions when working on subjective NLP tasks, such as hate speech detection. Often, heterogeneous backgrounds result in high disagreements. To model this variation, recent work has explored sociodemographic prompting, a technique which steers the output of prompt-based models towards answers that humans with specific sociodemographic profiles would give. However, the available NLP literature disagrees on the efficacy of this technique -- it remains unclear for which tasks and scenarios it can help, and evaluations are limited to specific tasks only. We address this research gap by presenting the largest and most comprehensive study of sociodemographic prompting today. Concretely, we evaluate several prompt formulations across seven datasets and six instruction-tuned model families. We find that (1) while sociodemographic prompting can be beneficial for improving zero-shot learning in subjective NLP tasks, (2) its outcomes largely vary for different model types, sizes, and datasets, and (3) are subject to large variance with regard to prompt formulations. Thus, sociodemographic prompting is not a reliable proxy for traditional data annotation with a sociodemographically heterogeneous group of annotators. Instead, we propose (4) to use it for identifying ambiguous instances resulting in more informed annotation efforts.
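
Sociodemographic prompting simply prepends a persona description of an annotator profile to the task instruction before querying the model. The template below is one illustrative formulation (the paper compares several); the profile fields and wording are assumptions, not the exact prompts used in the study.

```python
PROMPT_TEMPLATE = (
    "Imagine you are a {age}-year-old {gender} annotator with a {education} education.\n"
    "Task: Decide whether the following text contains hate speech. "
    "Answer with 'yes' or 'no'.\n"
    "Text: {text}\n"
    "Answer:"
)

def sociodemographic_prompt(profile: dict, text: str) -> str:
    """Fill the persona template with one sociodemographic profile."""
    return PROMPT_TEMPLATE.format(**profile, text=text)

profiles = [
    {"age": 23, "gender": "female", "education": "high school"},
    {"age": 61, "gender": "male", "education": "university"},
]
for p in profiles:
    print(sociodemographic_prompt(p, "Example input sentence."))
    print("---")
```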

Résumé Parsing as Hierarchical Sequence Labeling: An Empirical Study

  • paper_url: http://arxiv.org/abs/2309.07015
  • repo_url: https://github.com/federetyk/resume-parsing
  • paper_authors: Federico Retyk, Hermenegildo Fabregat, Juan Aizpuru, Mariana Taglio, Rabih Zbib
  • for: This work proposes a model for extracting information from résumés by labeling lines and tokens simultaneously.
  • methods: Résumé parsing is cast as sequence labeling at two levels, lines and tokens, and model architectures that solve both tasks simultaneously are studied.
  • results: Experiments on résumé parsing corpora in English, French, Chinese, Spanish, German, Portuguese, and Swedish show that the proposed models outperform previous approaches for the information extraction task.
    Abstract Extracting information from résumés is typically formulated as a two-stage problem, where the document is first segmented into sections and then each section is processed individually to extract the target entities. Instead, we cast the whole problem as sequence labeling in two levels -- lines and tokens -- and study model architectures for solving both tasks simultaneously. We build high-quality résumé parsing corpora in English, French, Chinese, Spanish, German, Portuguese, and Swedish. Based on these corpora, we present experimental results that demonstrate the effectiveness of the proposed models for the information extraction task, outperforming approaches introduced in previous work. We conduct an ablation study of the proposed architectures. We also analyze both model performance and resource efficiency, and describe the trade-offs for model deployment in the context of a production environment.
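
The two-level formulation can be sketched as two tagging heads over a shared encoder: tokens within each line are encoded, a pooled line vector feeds a line-level tagger (e.g., section labels), and the token states feed a token-level tagger (e.g., BIO entity labels), with both heads trained jointly. The tiny architecture below is an illustrative sketch, not the paper's model.

```python
import torch
import torch.nn as nn

class HierarchicalResumeTagger(nn.Module):
    """Joint line-level and token-level sequence labeling (illustrative sketch)."""
    def __init__(self, vocab=5000, emb=64, hid=64, n_line_labels=6, n_token_labels=9):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.token_enc = nn.LSTM(emb, hid, batch_first=True, bidirectional=True)
        self.line_enc = nn.LSTM(2 * hid, hid, batch_first=True, bidirectional=True)
        self.line_head = nn.Linear(2 * hid, n_line_labels)    # e.g., section tags
        self.token_head = nn.Linear(2 * hid, n_token_labels)  # e.g., BIO entity tags

    def forward(self, doc):
        # doc: (n_lines, n_tokens) of token ids for one resume.
        tok_states, _ = self.token_enc(self.embed(doc))        # (lines, tokens, 2*hid)
        line_vecs = tok_states.mean(dim=1)                     # pool tokens per line
        line_states, _ = self.line_enc(line_vecs.unsqueeze(0)) # context across lines
        line_logits = self.line_head(line_states.squeeze(0))   # (lines, n_line_labels)
        token_logits = self.token_head(tok_states)             # (lines, tokens, n_token_labels)
        return line_logits, token_logits

model = HierarchicalResumeTagger()
doc = torch.randint(0, 5000, (12, 20))        # 12 lines of 20 tokens each
line_logits, token_logits = model(doc)
print(line_logits.shape, token_logits.shape)  # (12, 6) and (12, 20, 9)
```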

  • paper_url: http://arxiv.org/abs/2309.07001
  • repo_url: None
  • paper_authors: Ziyuan Xia, Anchen Sun, Xiaodong Cai, Saixing Zeng
  • for: This study aims to map the changing landscape of ESG topics within firms in the global market.
  • methods: A dynamic framework is developed to analyze corporate ESG strategic management for individual classes, across multiple classes, and in alignment with a specific sustainability index.
  • results: Incorporating analytical keywords into the framework and applying it to a rich collection of 21st-century ESG reports from technology companies reveals the concurrent evolution of ESG topics over recent years.
    Abstract Environmental, social, and governance (ESG) reports are globally recognized as a keystone in sustainable enterprise development. This study aims to map the changing landscape of ESG topics within firms in the global market. A dynamic framework is developed to analyze ESG strategic management for individual classes, across multiple classes, and in alignment with a specific sustainability index. The output of these analytical processes forms the foundation of an ESG strategic model. Utilizing a rich collection of 21st-century ESG reports from technology companies, our experiment elucidates the changes in ESG perspectives by incorporating analytical keywords into the proposed framework. This work thus provides an empirical method that reveals the concurrent evolution of ESG topics over recent years.

MASTERKEY: Practical Backdoor Attack Against Speaker Verification Systems

  • paper_url: http://arxiv.org/abs/2309.06981
  • repo_url: None
  • paper_authors: Hanqing Guo, Xun Chen, Junfeng Guo, Li Xiao, Qiben Yan
  • for: The paper proposes MASTERKEY, a backdoor attack that compromises speaker verification (SV) models in mobile systems under a practical setting where the attacker has no knowledge of the intended victim.
  • methods: The attack investigates the limitations of existing poisoning attacks against unseen targets, optimizes a universal backdoor capable of attacking arbitrary targets, embeds speaker characteristics and semantic information into the backdoor to make it imperceptible, and estimates channel distortion to integrate it into the backdoor.
  • results: Validated on 6 popular SV models: 53 models are poisoned and the trigger attacks 16,430 enrolled speakers (310 target speakers enrolled in the 53 poisoned models), achieving a 100% attack success rate at a 15% poison rate and about 50% at a 3% poison rate; the attack is demonstrated in 3 real-world scenarios, including over-the-air and over-the-telephony-line settings.
    Abstract Speaker Verification (SV) is widely deployed in mobile systems to authenticate legitimate users by using their voice traits. In this work, we propose a backdoor attack MASTERKEY, to compromise the SV models. Different from previous attacks, we focus on a real-world practical setting where the attacker possesses no knowledge of the intended victim. To design MASTERKEY, we investigate the limitation of existing poisoning attacks against unseen targets. Then, we optimize a universal backdoor that is capable of attacking arbitrary targets. Next, we embed the speaker's characteristics and semantics information into the backdoor, making it imperceptible. Finally, we estimate the channel distortion and integrate it into the backdoor. We validate our attack on 6 popular SV models. Specifically, we poison a total of 53 models and use our trigger to attack 16,430 enrolled speakers, composed of 310 target speakers enrolled in 53 poisoned models. Our attack achieves 100% attack success rate with a 15% poison rate. By decreasing the poison rate to 3%, the attack success rate remains around 50%. We validate our attack in 3 real-world scenarios and successfully demonstrate the attack through both over-the-air and over-the-telephony-line scenarios.

DNNShifter: An Efficient DNN Pruning System for Edge Computing

  • paper_url: http://arxiv.org/abs/2309.06973
  • repo_url: https://github.com/blessonvar/dnnshifter
  • paper_authors: Bailey J. Eccles, Philip Rodgers, Peter Kilpatrick, Ivor Spence, Blesson Varghese
  • for: This work addresses the challenge of deploying deep neural network (DNN) models on mobile and embedded devices with limited computational and memory resources.
  • methods: A novel structured pruning methodology rapidly derives suitable model variants that maintain the accuracy of the original model, and the variants can be swapped quickly as system and network conditions change.
  • results: Compared to conventional training methods, DNNShifter produces pruned model variants up to 93x faster, with a 1.67x inference latency speedup and up to 3.8x lower memory utilization; it also has up to 11.9x lower overhead for switching models.
    Abstract Deep neural networks (DNNs) underpin many machine learning applications. Production quality DNN models achieve high inference accuracy by training millions of DNN parameters which has a significant resource footprint. This presents a challenge for resources operating at the extreme edge of the network, such as mobile and embedded devices that have limited computational and memory resources. To address this, models are pruned to create lightweight, more suitable variants for these devices. Existing pruning methods are unable to provide similar quality models compared to their unpruned counterparts without significant time costs and overheads or are limited to offline use cases. Our work rapidly derives suitable model variants while maintaining the accuracy of the original model. The model variants can be swapped quickly when system and network conditions change to match workload demand. This paper presents DNNShifter, an end-to-end DNN training, spatial pruning, and model switching system that addresses the challenges mentioned above. At the heart of DNNShifter is a novel methodology that prunes sparse models using structured pruning. The pruned model variants generated by DNNShifter are smaller in size and thus faster than dense and sparse model predecessors, making them suitable for inference at the edge while retaining near similar accuracy as of the original dense model. DNNShifter generates a portfolio of model variants that can be swiftly interchanged depending on operational conditions. DNNShifter produces pruned model variants up to 93x faster than conventional training methods. Compared to sparse models, the pruned model variants are up to 5.14x smaller and have a 1.67x inference latency speedup, with no compromise to sparse model accuracy. In addition, DNNShifter has up to 11.9x lower overhead for switching models and up to 3.8x lower memory utilisation than existing approaches.
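
Structured pruning removes whole convolutional filters rather than individual weights, so the resulting variant is genuinely smaller and faster without sparse-kernel support. The snippet below sketches a common L1-norm filter-ranking variant of this idea on a tiny two-layer example; the ranking criterion and ratio are assumptions for illustration, and the paper's system additionally handles training, variant portfolios, and model switching.

```python
import torch
import torch.nn as nn

def prune_conv_filters(conv, next_conv, keep_ratio=0.5):
    """Drop the lowest-L1-norm filters of `conv` and the matching input channels
    of `next_conv`, returning two smaller (dense) layers."""
    n_keep = max(1, int(conv.out_channels * keep_ratio))
    importance = conv.weight.detach().abs().sum(dim=(1, 2, 3))   # L1 norm per filter
    keep = torch.argsort(importance, descending=True)[:n_keep]
    new_conv = nn.Conv2d(conv.in_channels, n_keep, conv.kernel_size,
                         conv.stride, conv.padding, bias=conv.bias is not None)
    new_next = nn.Conv2d(n_keep, next_conv.out_channels, next_conv.kernel_size,
                         next_conv.stride, next_conv.padding,
                         bias=next_conv.bias is not None)
    with torch.no_grad():
        new_conv.weight.copy_(conv.weight[keep])
        if conv.bias is not None:
            new_conv.bias.copy_(conv.bias[keep])
        new_next.weight.copy_(next_conv.weight[:, keep])
        if next_conv.bias is not None:
            new_next.bias.copy_(next_conv.bias)
    return new_conv, new_next

conv1, conv2 = nn.Conv2d(3, 32, 3, padding=1), nn.Conv2d(32, 64, 3, padding=1)
p1, p2 = prune_conv_filters(conv1, conv2, keep_ratio=0.25)
x = torch.randn(1, 3, 32, 32)
print(p2(p1(x)).shape)          # same output shape, 75% of conv1's filters removed
print(sum(p.numel() for p in (*p1.parameters(), *p2.parameters())),
      "params vs", sum(p.numel() for p in (*conv1.parameters(), *conv2.parameters())))
```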

Setting the Right Expectations: Algorithmic Recourse Over Time

  • paper_url: http://arxiv.org/abs/2309.06969
  • repo_url: None
  • paper_authors: Joao Fonseca, Andrew Bell, Carlo Abrate, Francesco Bonchi, Julia Stoyanovich
  • for: This paper focuses on high-stakes algorithmic decision making and the reliability of algorithmic recourse in a continuously changing environment.
  • methods: An agent-based simulation framework is proposed to study how a changing context, including competition among agents acting on recourse and competition from new agents entering the environment, affects the reliability of recourse.
  • results: Only a small set of specific parameterizations yields algorithmic recourse that remains reliable for agents over time, indicating that substantial additional work is needed to understand and ensure recourse reliability.
    Abstract Algorithmic systems are often called upon to assist in high-stakes decision making. In light of this, algorithmic recourse, the principle wherein individuals should be able to take action against an undesirable outcome made by an algorithmic system, is receiving growing attention. The bulk of the literature on algorithmic recourse to-date focuses primarily on how to provide recourse to a single individual, overlooking a critical element: the effects of a continuously changing context. Disregarding these effects on recourse is a significant oversight, since, in almost all cases, recourse consists of an individual making a first, unfavorable attempt, and then being given an opportunity to make one or several attempts at a later date - when the context might have changed. This can create false expectations, as initial recourse recommendations may become less reliable over time due to model drift and competition for access to the favorable outcome between individuals. In this work we propose an agent-based simulation framework for studying the effects of a continuously changing environment on algorithmic recourse. In particular, we identify two main effects that can alter the reliability of recourse for individuals represented by the agents: (1) competition with other agents acting upon recourse, and (2) competition with new agents entering the environment. Our findings highlight that only a small set of specific parameterizations result in algorithmic recourse that is reliable for agents over time. Consequently, we argue that substantial additional work is needed to understand recourse reliability over time, and to develop recourse methods that reward agents' effort.
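
The simulation idea can be sketched with a population of agents scored by a classifier: at each step, rejected agents receive a recommendation to move just past the current decision boundary, some act on it, new agents enter, and the model is refit on the shifted population. Measuring how many previously issued recommendations still clear the final boundary captures recourse reliability under drift and competition. The 1-D logistic model and all parameters below are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def new_agents(n):
    """Agents described by a single score; higher scores tend to get the favorable outcome."""
    x = rng.normal(0.0, 1.0, size=(n, 1))
    y = (x[:, 0] + rng.normal(0, 0.3, n) > 0).astype(int)
    return x, y

def boundary(model):
    """Smallest score the 1-D logistic model currently accepts."""
    return -model.intercept_[0] / model.coef_[0, 0]

X, y = new_agents(500)
model = LogisticRegression().fit(X, y)
issued = []                                   # recourse recommendations handed out

for step in range(10):
    b = boundary(model)
    rejected = X[model.predict(X) == 0]
    issued.extend([b + 0.05] * len(rejected))            # "raise your score just above b"
    # Competition: some rejected agents act on their recourse, and new agents enter.
    movers = np.full((len(rejected) // 3, 1), b + 0.05)
    X_new, y_new = new_agents(100)
    X = np.vstack([X, movers, X_new])
    y = np.concatenate([y, np.ones(len(movers), dtype=int), y_new])
    model = LogisticRegression().fit(X, y)               # retraining causes model drift

still_valid = np.mean(np.array(issued) >= boundary(model))
print(f"fraction of issued recourse still valid after drift: {still_valid:.2f}")
```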

Attention-based Dynamic Graph Convolutional Recurrent Neural Network for Traffic Flow Prediction in Highway Transportation

  • paper_url: http://arxiv.org/abs/2309.07196
  • repo_url: None
  • paper_authors: Tianpu Zhang, Weilong Ding, Mengda Xing
  • for: traffic flow prediction in highway transportation
  • methods: Attention-based Dynamic Graph Convolutional Recurrent Neural Network (ADGCRNN) with self-attention, multi-dynamic graphs, and dedicated gated kernel for graph convolution operations
  • results: better performance than state-of-the-art baselines and practical benefit in highway transportation, as proven by experiments on two public datasets and case studies of a real Web system.
    Abstract As one of the important tools for spatial feature extraction, graph convolution has been applied in a wide range of fields such as traffic flow prediction. However, current popular works of graph convolution cannot guarantee spatio-temporal consistency in a long period. The ignorance of correlational dynamics, convolutional locality and temporal comprehensiveness would limit predictive accuracy. In this paper, a novel Attention-based Dynamic Graph Convolutional Recurrent Neural Network (ADGCRNN) is proposed to improve traffic flow prediction in highway transportation. Three temporal resolutions of data sequence are effectively integrated by self-attention to extract characteristics; multi-dynamic graphs and their weights are dynamically created to compliantly combine the varying characteristics; a dedicated gated kernel emphasizing highly relative nodes is introduced on these complete graphs to reduce overfitting for graph convolution operations. Experiments on two public datasets show our work better than state-of-the-art baselines, and case studies of a real Web system prove practical benefit in highway transportation.

Towards Reliable Dermatology Evaluation Benchmarks

  • paper_url: http://arxiv.org/abs/2309.06961
  • repo_url: https://github.com/digital-dermatology/selfclean-revised-benchmarks
  • paper_authors: Fabian Gröger, Simone Lionetti, Philippe Gottfrois, Alvaro Gonzalez-Jimenez, Matthew Groh, Roxana Daneshjou, Labelling Consortium, Alexander A. Navarini, Marc Pouly
  • for: Making performance evaluation of digital dermatology models more trustworthy.
  • methods: A resource-efficient data-cleaning protocol that builds on an existing algorithmic cleaning strategy and is followed by confirmation from multiple dermatologists, used to remove irrelevant samples and near duplicates and to estimate the percentage of label errors in each dermatology image dataset (a near-duplicate flagging sketch follows this entry).
  • results: Cleaning and confirmation across six dermatology image datasets revealed inaccurate labels and yielded per-dataset estimates of the label-error rate; revised file lists for each dataset are released so that future model evaluation is more reliable.
    Abstract Benchmark datasets for digital dermatology unwittingly contain inaccuracies that reduce trust in model performance estimates. We propose a resource-efficient data cleaning protocol to identify issues that escaped previous curation. The protocol leverages an existing algorithmic cleaning strategy and is followed by a confirmation process terminated by an intuitive stopping criterion. Based on confirmation by multiple dermatologists, we remove irrelevant samples and near duplicates and estimate the percentage of label errors in six dermatology image datasets for model evaluation promoted by the International Skin Imaging Collaboration. Along with this paper, we publish revised file lists for each dataset which should be used for model evaluation. Our work paves the way for more trustworthy performance assessment in digital dermatology.
    摘要 数字皮肤病学的基准数据集中不可避免地含有错误,这会降低对模型性能估计的信任度。我们提出了一种资源高效的数据清洗协议,用于发现此前整理工作遗漏的问题。该协议基于现有的算法清洗策略,并辅以由直观停止准则终止的确认流程。在多名皮肤科医生确认的基础上,我们移除了无关样本和近似重复样本,并估计了国际皮肤影像协作组织推广用于模型评估的六个皮肤病学图像数据集中标签错误的百分比。随论文一同发布的还有每个数据集的修订文件列表,模型评估应使用这些列表。我们的工作为数字皮肤病学中更可信的性能评估奠定了基础。
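The cleaning protocol relies on an existing algorithmic strategy plus dermatologist confirmation; the sketch below only illustrates the near-duplicate flagging step with a generic image embedding. The untrained backbone, random stand-in images, and similarity threshold are assumptions, not the paper's pipeline.

```python
import numpy as np
import torch
from torchvision import models, transforms
from PIL import Image

# Three stand-in images; in practice these would be the benchmark's image files.
rng = np.random.default_rng(0)
images = [Image.fromarray((rng.random((224, 224, 3)) * 255).astype("uint8")) for _ in range(3)]

prep = transforms.ToTensor()
backbone = models.resnet18(weights=None)     # untrained, just to keep the sketch self-contained;
backbone.fc = torch.nn.Identity()            # a pretrained or dermatology-specific encoder would be used
backbone.eval()

with torch.no_grad():
    feats = torch.stack([backbone(prep(im).unsqueeze(0)).squeeze(0) for im in images])
feats = torch.nn.functional.normalize(feats, dim=1)
sim = feats @ feats.T                        # cosine similarity between every pair of images

THRESHOLD = 0.95                             # assumed cut-off for flagging candidates
pairs = torch.triu((sim > THRESHOLD).int(), diagonal=1).nonzero()
for i, j in pairs.tolist():
    print(f"candidate near-duplicate for dermatologist review: image {i} <-> image {j} "
          f"(cosine {sim[i, j].item():.3f})")
```

Flagged pairs would then go through the confirmation process until the stopping criterion is met.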

PhantomSound: Black-Box, Query-Efficient Audio Adversarial Attack via Split-Second Phoneme Injection

  • paper_url: http://arxiv.org/abs/2309.06960
  • repo_url: None
  • paper_authors: Hanqing Guo, Guangjing Wang, Yuanda Wang, Bocheng Chen, Qiben Yan, Li Xiao
  • for: A black-box adversarial attack against voice assistants.
  • methods: A decision-based attack that avoids relying on intermediate model outputs, with optimized gradient estimation to reduce the number of queries (a query-loop sketch follows this entry).
  • results: Successfully attacks 5 commercial voice-controllable devices over the air under 3 real-world scenarios, bypasses 3 liveness detection mechanisms with a success rate above 95%, and can generate adversarial examples and launch the attack within minutes.
    Abstract In this paper, we propose PhantomSound, a query-efficient black-box attack toward voice assistants. Existing black-box adversarial attacks on voice assistants either apply substitution models or leverage the intermediate model output to estimate the gradients for crafting adversarial audio samples. However, these attack approaches require a significant amount of queries with a lengthy training stage. PhantomSound leverages the decision-based attack to produce effective adversarial audios, and reduces the number of queries by optimizing the gradient estimation. In the experiments, we perform our attack against 4 different speech-to-text APIs under 3 real-world scenarios to demonstrate the real-time attack impact. The results show that PhantomSound is practical and robust in attacking 5 popular commercial voice controllable devices over the air, and is able to bypass 3 liveness detection mechanisms with >95% success rate. The benchmark result shows that PhantomSound can generate adversarial examples and launch the attack in a few minutes. We significantly enhance the query efficiency and reduce the cost of a successful untargeted and targeted adversarial attack by 93.1% and 65.5% compared with the state-of-the-art black-box attacks, using merely ~300 queries (~5 minutes) and ~1,500 queries (~25 minutes), respectively.
    摘要 在这篇论文中,我们提出了PhantomSound,一种高效的黑盒攻击方法 toward voice assistants。现有的黑盒对voice assistants的攻击方法 Either apply substitution models or leverage the intermediate model output to estimate the gradients for crafting adversarial audio samples.然而,这些攻击方法需要较长的训练时间和较多的查询数。PhantomSound leverages the decision-based attack to produce effective adversarial audios, and reduces the number of queries by optimizing the gradient estimation.在实验中,我们对4种speech-to-text API进行了3个实际场景的攻击,以示出实时攻击的影响。结果显示,PhantomSound可以快速和可靠地攻击5种流行的语音控制设备,并可以绕过3种生活体验检测机制,成功率高于95%。 benchmark结果表明,PhantomSound可以在几分钟内生成攻击示例和发动攻击。我们在查询效率和成功攻击成本方面做出了重要提高,相比之前的黑盒攻击方法,PhantomSound可以在merely ~300 queries(~5分钟)和~1,500 queries(~25分钟)内发动成功无目标和目标攻击,减少了93.1%和65.5%的成本。
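PhantomSound is described as a decision-based attack with optimized gradient estimation. The sketch below shows a generic hard-label (decision-based) gradient estimate and query loop against a toy oracle; it is not the paper's algorithm, and the oracle, audio length, step sizes, and query counts are illustrative assumptions. A real attack would replace `decision` with calls to a commercial speech-to-text API, which is exactly what the query budget counts.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1000                                     # toy audio length (a real clip would be far longer)

def decision(audio):
    """Toy stand-in for a hard-label speech-to-text API: True when the (hypothetical) target
    phrase would be transcribed. A real attack would send `audio` to the cloud service here."""
    return audio.mean() > 2e-3

def estimate_gradient(audio, n_probes=25, sigma=0.1):
    """Zeroth-order gradient estimate built from hard-label decisions only; every probe costs
    one query, which is what a black-box query budget counts."""
    grad = np.zeros_like(audio)
    for _ in range(n_probes):
        u = rng.normal(size=audio.shape)
        grad += float(decision(audio + sigma * u)) * u
    return grad / (n_probes * sigma)

audio, queries = np.zeros(N), 0
for _ in range(200):
    g = estimate_gradient(audio)
    queries += 25
    if np.linalg.norm(g) > 0:
        audio = audio + 3e-2 * g / np.linalg.norm(g)   # small step guided by the estimate
    queries += 1
    if decision(audio):
        print(f"decision flipped after {queries} queries")
        break
```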

Implicit Neural Multiple Description for DNA-based data storage

  • paper_url: http://arxiv.org/abs/2309.06956
  • repo_url: None
  • paper_authors: Trung Hieu Le, Xavier Pic, Jeremy Mateos, Marc Antonini
  • for: Explores DNA as a data-storage medium and techniques for handling the errors introduced by storage and biological manipulation.
  • methods: A novel compression scheme and a Multiple Description Coding (MDC) technique based on neural networks are used to encode data into DNA in a way designed to withstand errors.
  • results: Experiments show the approach competes favorably with the latest DNA data-storage methods, offering higher compression rates and stronger noise resilience.
    Abstract DNA exhibits remarkable potential as a data storage solution due to its impressive storage density and long-term stability, stemming from its inherent biomolecular structure. However, developing this novel medium comes with its own set of challenges, particularly in addressing errors arising from storage and biological manipulations. These challenges are further conditioned by the structural constraints of DNA sequences and cost considerations. In response to these limitations, we have pioneered a novel compression scheme and a cutting-edge Multiple Description Coding (MDC) technique utilizing neural networks for DNA data storage. Our MDC method introduces an innovative approach to encoding data into DNA, specifically designed to withstand errors effectively. Notably, our new compression scheme overperforms classic image compression methods for DNA-data storage. Furthermore, our approach exhibits superiority over conventional MDC methods reliant on auto-encoders. Its distinctive strengths lie in its ability to bypass the need for extensive model training and its enhanced adaptability for fine-tuning redundancy levels. Experimental results demonstrate that our solution competes favorably with the latest DNA data storage methods in the field, offering superior compression rates and robust noise resilience.

DEFormer: DCT-driven Enhancement Transformer for Low-light Image and Dark Vision

  • paper_url: http://arxiv.org/abs/2309.06941
  • repo_url: None
  • paper_authors: Xiangchen Yin, Zhenda Yu, Xin Gao, Ran Ju, Xiao Sun, Xinyu Zhang
  • for: Restoring the color and detail of low-light images to support high-level vision tasks in autonomous driving.
  • methods: Introduces frequency as a new clue and proposes the DCT-driven enhancement transformer (DEFormer), comprising a learnable frequency branch (LFB) with DCT processing and curvature-based frequency enhancement, plus a cross-domain fusion (CDF) module that reduces the gap between the RGB and frequency domains.
  • results: Used as a preprocessing step for dark detection, DEFormer improves detector performance by 2.1% and 3.4% mAP on the ExDark and DARK FACE datasets, respectively.
    Abstract The goal of low-light image enhancement is to restore the color and details of the image and is of great significance for high-level visual tasks in autonomous driving. However, it is difficult to restore the lost details in the dark area by relying only on the RGB domain. In this paper we introduce frequency as a new clue into the network and propose a novel DCT-driven enhancement transformer (DEFormer). First, we propose a learnable frequency branch (LFB) for frequency enhancement contains DCT processing and curvature-based frequency enhancement (CFE). CFE calculates the curvature of each channel to represent the detail richness of different frequency bands, then we divides the frequency features, which focuses on frequency bands with richer textures. In addition, we propose a cross domain fusion (CDF) for reducing the differences between the RGB domain and the frequency domain. We also adopt DEFormer as a preprocessing in dark detection, DEFormer effectively improves the performance of the detector, bringing 2.1% and 3.4% improvement in ExDark and DARK FACE datasets on mAP respectively.

Collectionless Artificial Intelligence

  • paper_url: http://arxiv.org/abs/2309.06938
  • repo_url: None
  • paper_authors: Marco Gori, Stefano Melacci
  • for: The paper advocates for the development of new machine learning protocols that prioritize human-like cognitive skills and environmental interactions, rather than relying on centralized data collections.
  • methods: The proposed approach is based on the collectionless principle, which restricts the learning process to processing data acquired from the environment at each time instant, without storing temporal information, thereby promoting self-organized memorization skills and dynamic information organization.
  • results: The authors suggest that this approach could lead to AI technologies better suited to addressing privacy issues, control, and customizability, and that it could reduce the concentration of power in companies and governments by promoting massively distributed computation.
    Abstract By and large, the professional handling of huge data collections is regarded as a fundamental ingredient of the progress of machine learning and of its spectacular results in related disciplines, with a growing agreement on risks connected to the centralization of such data collections. This paper sustains the position that the time has come for thinking of new learning protocols where machines conquer cognitive skills in a truly human-like context centered on environmental interactions. This comes with specific restrictions on the learning protocol according to the collectionless principle, which states that, at each time instant, data acquired from the environment is processed with the purpose of contributing to update the current internal representation of the environment, and that the agent is not given the privilege of recording the temporal stream. Basically, there is neither permission to store the temporal information coming from the sensors, thus promoting the development of self-organized memorization skills at a more abstract level, instead of relying on bare storage to simulate learning dynamics that are typical of offline learning algorithms. This purposely extreme position is intended to stimulate the development of machines that learn to dynamically organize the information by following human-based schemes. The proposition of this challenge suggests developing new foundations on computational processes of learning and reasoning that might open the doors to a truly orthogonal competitive track on AI technologies that avoid data accumulation by design, thus offering a framework which is better suited concerning privacy issues, control and customizability. Finally, pushing towards massively distributed computation, the collectionless approach to AI will likely reduce the concentration of power in companies and governments, thus better facing geopolitical issues.
    摘要 通常来说,对巨大数据集的专业处理被视为机器学习进步及其在相关学科取得显著成果的基本组成部分,同时人们也一致认为数据集中化存在风险。这篇文章主张,现在应当思考新的学习协议,让机器在以环境交互为中心、真正类似人类的情境中习得认知能力,并按照"无收集"原则对学习协议加以限制:在每个时刻,从环境获取的数据仅用于更新当前对环境的内部表征,不允许机器记录时间流。这种刻意极端的立场旨在促使机器按照类人方式动态地组织信息,而不是依靠存储来模拟离线学习算法的学习动态。这一挑战的提出意味着需要为学习和推理的计算过程建立新的基础,从设计上避免数据积累,从而在隐私、控制和可定制性方面提供更合适的框架。最后,通过推动大规模分布式计算,"无收集"式人工智能有望减少权力在公司和政府中的集中,从而更好地应对地缘政治问题。

Continual Learning with Dirichlet Generative-based Rehearsal

  • paper_url: http://arxiv.org/abs/2309.06917
  • repo_url: None
  • paper_authors: Min Zeng, Wei Xue, Qifeng Liu, Yike Guo
  • for: Improving continual learning for data-driven task-oriented dialogue systems (ToDs), addressing the computational constraints and time consumption of retraining.
  • methods: Dirichlet Continual Learning (DCL), a generative-based rehearsal strategy that models the latent variable with a Dirichlet distribution to capture sentence-level features of previous tasks and generate accurate pseudo samples; Jensen-Shannon Knowledge Distillation (JSKD), a robust logit-based distillation method, strengthens knowledge transfer during pseudo-sample generation (a JS-distillation sketch follows this entry).
  • results: The method performs strongly on intent detection and slot-filling tasks, outperforming existing approaches.
    Abstract Recent advancements in data-driven task-oriented dialogue systems (ToDs) struggle with incremental learning due to computational constraints and time-consuming issues. Continual Learning (CL) attempts to solve this by avoiding intensive pre-training, but it faces the problem of catastrophic forgetting (CF). While generative-based rehearsal CL methods have made significant strides, generating pseudo samples that accurately reflect the underlying task-specific distribution is still a challenge. In this paper, we present Dirichlet Continual Learning (DCL), a novel generative-based rehearsal strategy for CL. Unlike the traditionally used Gaussian latent variable in the Conditional Variational Autoencoder (CVAE), DCL leverages the flexibility and versatility of the Dirichlet distribution to model the latent prior variable. This enables it to efficiently capture sentence-level features of previous tasks and effectively guide the generation of pseudo samples. In addition, we introduce Jensen-Shannon Knowledge Distillation (JSKD), a robust logit-based knowledge distillation method that enhances knowledge transfer during pseudo sample generation. Our experiments confirm the efficacy of our approach in both intent detection and slot-filling tasks, outperforming state-of-the-art methods.
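JSKD is described as a logit-based knowledge distillation. A minimal sketch of a Jensen-Shannon distillation term is given below; the temperature, the Hinton-style temperature-squared scaling, and how the term is combined with the task loss are assumptions rather than the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def js_distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Symmetric Jensen-Shannon divergence between softened teacher and student distributions,
    usable as a knowledge-distillation term during pseudo-sample replay."""
    p = F.softmax(student_logits / temperature, dim=-1)
    q = F.softmax(teacher_logits / temperature, dim=-1)
    m = 0.5 * (p + q)
    # JS(p, q) = 0.5 * KL(p || m) + 0.5 * KL(q || m); kl_div takes log-probs as its first argument.
    kl_pm = F.kl_div(m.log(), p, reduction="batchmean")
    kl_qm = F.kl_div(m.log(), q, reduction="batchmean")
    return 0.5 * (kl_pm + kl_qm) * (temperature ** 2)

# Toy usage: distill a frozen "teacher" intent classifier into the current student.
student_logits = torch.randn(8, 5, requires_grad=True)
teacher_logits = torch.randn(8, 5)
loss = js_distillation_loss(student_logits, teacher_logits)
loss.backward()
print(float(loss))
```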

Towards the TopMost: A Topic Modeling System Toolkit

  • paper_url: http://arxiv.org/abs/2309.06908
  • repo_url: https://github.com/bobxwu/topmost
  • paper_authors: Xiaobao Wu, Fengjun Pan, Anh Tuan Luu
  • for: The paper proposes a Topic Modeling System Toolkit (TopMost) that covers the complete topic-modeling lifecycle, including dataset pre-processing, model training, testing, and evaluation.
  • methods: TopMost uses a highly cohesive and decoupled modular design, allowing quick use, fair comparisons, and flexible extensions of different topic models.
  • results: TopMost covers a wider range of topic modeling scenarios than existing toolkits, spanning complete lifecycles from dataset pre-processing through evaluation.
    Abstract Topic models have been proposed for decades with various applications and recently refreshed by the neural variational inference. However, these topic models adopt totally distinct dataset, implementation, and evaluation settings, which hinders their quick utilization and fair comparisons. This greatly hinders the research progress of topic models. To address these issues, in this paper we propose a Topic Modeling System Toolkit (TopMost). Compared to existing toolkits, TopMost stands out by covering a wider range of topic modeling scenarios including complete lifecycles with dataset pre-processing, model training, testing, and evaluations. The highly cohesive and decoupled modular design of TopMost enables quick utilization, fair comparisons, and flexible extensions of different topic models. This can facilitate the research and applications of topic models. Our code, tutorials, and documentation are available at https://github.com/bobxwu/topmost.
    摘要 主题模型已经提出了数十年,应用广泛,近来又因神经变分推断而重新受到关注。然而,这些主题模型采用完全不同的数据集、实现和评估设置,这极大地阻碍了它们的快速使用和公平比较,进而阻碍了主题模型的研究进展。为了解决这些问题,本文提出了一个主题模型系统工具集(TopMost)。与现有工具集相比,TopMost覆盖了更广泛的主题建模场景,包括从数据预处理、模型训练、测试到评估的完整生命周期。TopMost高度内聚且解耦的模块化设计,使得快速使用、公平比较和灵活扩展不同的主题模型更加容易,从而促进主题模型的研究和应用。我们的代码、教程和文档可在 https://github.com/bobxwu/topmost 获取。

OWL Reasoners still useable in 2023

  • paper_url: http://arxiv.org/abs/2309.06888
  • repo_url: https://github.com/k00ni/owl-reasoner-list
  • paper_authors: Konrad Abicht
  • for: 这篇论文的目的是对OWL理解器进行系统性的文献和软件综述,以确定它们在2023年是否仍然可用。
  • methods: 本论文使用了对100个OWL理解器/系统的分析,以及对每个item的项目页面、源代码仓库和相关文档的收集。
  • results: 研究得到了95个独立的OWL理解器和使用OWL理解器的系统,并提供了相关的原始研究数据,以便任何人使用。
    Abstract In a systematic literature and software review over 100 OWL reasoners/systems were analyzed to see if they would still be usable in 2023. This has never been done in this capacity. OWL reasoners still play an important role in knowledge organisation and management, but the last comprehensive surveys/studies are more than 8 years old. The result of this work is a comprehensive list of 95 standalone OWL reasoners and systems using an OWL reasoner. For each item, information on project pages, source code repositories and related documentation was gathered. The raw research data is provided in a Github repository for anyone to use.
    摘要 在一项系统性的文献和软件综述中,对超过100个OWL理解器/系统进行了分析,以确定它们在2023年仍然可以使用。这是历史上没有done before。OWL理解器仍然在知识组织和管理中发挥着重要作用,但最后一次全面的调查/研究已经超过8年了。这项工作的结果是一份包含95个独立OWL理解器和使用OWL理解器的系统的完整列表。对每个项目,我们收集了项目页面、源代码存储库和相关文档的信息。 raw research data 被提供到GitHub存储库中,任何人都可以使用。

Gpachov at CheckThat! 2023: A Diverse Multi-Approach Ensemble for Subjectivity Detection in News Articles

  • paper_url: http://arxiv.org/abs/2309.06844
  • repo_url: None
  • paper_authors: Georgi Pachov, Dimitar Dimitrov, Ivan Koychev, Preslav Nakov
  • for: Improving the objectivity and quality of information on social media by detecting subjectivity in news articles.
  • methods: Three research directions are explored: fine-tuning a sentence-embedding encoder with dimensionality reduction, a sample-efficient few-shot learning model, and fine-tuning a multilingual transformer on an altered dataset with data from multiple languages; the three are combined by simple majority voting (a voting sketch follows this entry).
  • results: The ensemble reaches 0.77 macro F1 on the CheckThat! lab Task 2 test set, placing second on the English subtask.
    Abstract The wide-spread use of social networks has given rise to subjective, misleading, and even false information on the Internet. Thus, subjectivity detection can play an important role in ensuring the objectiveness and the quality of a piece of information. This paper presents the solution built by the Gpachov team for the CLEF-2023 CheckThat! lab Task~2 on subjectivity detection. Three different research directions are explored. The first one is based on fine-tuning a sentence embeddings encoder model and dimensionality reduction. The second one explores a sample-efficient few-shot learning model. The third one evaluates fine-tuning a multilingual transformer on an altered dataset, using data from multiple languages. Finally, the three approaches are combined in a simple majority voting ensemble, resulting in 0.77 macro F1 on the test set and achieving 2nd place on the English subtask.
    摘要 因为社交媒体的广泛使用,互联网上的信息中有很多主观、误导和False信息。因此,主观检测可以为确保信息的 объекivity 和质量做出重要的贡献。这篇文章介绍了Gpachov 团队为 CLEF-2023 CheckThat! 实验室任务2的主观检测解决方案。文章 explore 了三个不同的研究方向:一是基于精细调整句子嵌入模型和维度减少;二是探索一种高效的少量样本学习模型;三是使用多种语言的多语言 transformer 模型在修改后的数据集上进行了调整。最后,这三种方法被组合成了简单的多数投票ensemble,在测试集上获得了0.77宏折衔率,并在英语子任务上获得了第二名。
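The three approaches are combined by simple majority voting. A minimal sketch follows; the label names, the per-sentence predictions, and the tie-breaking rule are assumptions, not the team's code.

```python
from collections import Counter

def majority_vote(predictions):
    """Combine per-approach labels ('SUBJ'/'OBJ') by simple majority; ties fall back to the
    first approach, which is an assumption -- the paper does not specify tie handling."""
    top, count = Counter(predictions).most_common(1)[0]
    return top if count > len(predictions) // 2 else predictions[0]

# Hypothetical outputs of the three approaches for four sentences.
sentence_embeddings = ["SUBJ", "OBJ", "SUBJ", "OBJ"]
few_shot            = ["SUBJ", "SUBJ", "OBJ", "OBJ"]
multilingual        = ["SUBJ", "OBJ", "OBJ", "OBJ"]

ensemble = [majority_vote(p) for p in zip(sentence_embeddings, few_shot, multilingual)]
print(ensemble)   # ['SUBJ', 'OBJ', 'OBJ', 'OBJ']
```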

On the Local Quadratic Stability of T-S Fuzzy Systems in the Vicinity of the Origin

  • paper_url: http://arxiv.org/abs/2309.06841
  • repo_url: None
  • paper_authors: Donghwan Lee, Do Wan Kim
  • for: Proposes new local stability conditions for continuous-time Takagi-Sugeno (T-S) fuzzy systems.
  • methods: The conditions are expressed as linear matrix inequalities (LMIs) combined with quadratic Lyapunov functions; they exploit the linear structure of the underlying nonlinear system near the origin and incorporate information about the membership functions at the origin (see the reference LMIs after this entry).
  • results: The proposed conditions are less conservative than existing fuzzy Lyapunov approaches and are shown to be necessary and sufficient for local exponential stability of T-S fuzzy systems.
    Abstract The main goal of this paper is to introduce new local stability conditions for continuous-time Takagi-Sugeno (T-S) fuzzy systems. These stability conditions are based on linear matrix inequalities (LMIs) in combination with quadratic Lyapunov functions. Moreover, they integrate information on the membership functions at the origin and effectively leverage the linear structure of the underlying nonlinear system in the vicinity of the origin. As a result, the proposed conditions are proved to be less conservative compared to existing methods using fuzzy Lyapunov functions in the literature. Moreover, we establish that the proposed methods offer necessary and sufficient conditions for the local exponential stability of T-S fuzzy systems. The paper also includes discussions on the inherent limitations associated with fuzzy Lyapunov approaches. To demonstrate the theoretical results, we provide comprehensive examples that elucidate the core concepts and validate the efficacy of the proposed conditions.
    摘要 本文的主要目标是提出连续时间 Takagi-Sugeno(T-S)模糊系统的新的局部稳定条件。这些条件基于线性矩阵不等式(LMIs),并与二次 Lyapunov 函数相结合;它们利用了底层非线性系统在原点附近的线性结构,并考虑了隶属函数在原点处的信息。因此,所提条件被证明比现有文献中基于模糊 Lyapunov 函数的方法更不保守。此外,我们还证明了所提方法给出了 T-S 模糊系统局部指数稳定的充要条件。文章还讨论了模糊 Lyapunov 方法的内在局限性。为验证理论结果,我们提供了详尽的示例,以阐明核心概念并验证所提条件的有效性。
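The paper's local conditions are not reproduced here. As a reference point, the classical global quadratic-stability LMIs for a T-S fuzzy system, which the proposed local conditions are designed to be less conservative than, read as follows.

```latex
% T-S fuzzy system: \dot{x}(t) = \sum_{i=1}^{r} h_i(z(t))\, A_i\, x(t), \quad h_i \ge 0,\ \sum_i h_i = 1.
% Classical (global) quadratic stability with V(x) = x^\top P x:
\begin{align}
  \exists\, P = P^\top \succ 0 :\qquad A_i^\top P + P A_i \prec 0, \qquad i = 1, \dots, r.
\end{align}
% The paper exploits the membership functions near the origin to obtain
% less conservative, local LMI conditions around x = 0.
```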

SAMUS: Adapting Segment Anything Model for Clinically-Friendly and Generalizable Ultrasound Image Segmentation

  • paper_url: http://arxiv.org/abs/2309.06824
  • repo_url: https://github.com/xianlin7/samus
  • paper_authors: Xian Lin, Yangyang Xiang, Li Zhang, Xin Yang, Zengqiang Yan, Li Yu
  • for: Proposes SAMUS, a universal model tailored for ultrasound image segmentation that improves on the Segment Anything Model (SAM) for medical images.
  • methods: Building on SAM, a parallel CNN branch injects local features into the ViT encoder through cross-branch attention, while a position adapter and a feature adapter adapt SAM from natural to medical images and from large 1024x1024 inputs to clinically friendlier 256x256 inputs (a cross-branch attention sketch follows this entry).
  • results: Comparative evaluations show clear advantages over both task-specific and foundation models, and SAMUS can be deployed on entry-level GPUs; code, data, and models are to be released on GitHub.
    Abstract Segment anything model (SAM), an eminent universal image segmentation model, has recently gathered considerable attention within the domain of medical image segmentation. Despite the remarkable performance of SAM on natural images, it grapples with significant performance degradation and limited generalization when confronted with medical images, particularly with those involving objects of low contrast, faint boundaries, intricate shapes, and diminutive sizes. In this paper, we propose SAMUS, a universal model tailored for ultrasound image segmentation. In contrast to previous SAM-based universal models, SAMUS pursues not only better generalization but also lower deployment cost, rendering it more suitable for clinical applications. Specifically, based on SAM, a parallel CNN branch is introduced to inject local features into the ViT encoder through cross-branch attention for better medical image segmentation. Then, a position adapter and a feature adapter are developed to adapt SAM from natural to medical domains and from requiring large-size inputs (1024x1024) to small-size inputs (256x256) for more clinical-friendly deployment. A comprehensive ultrasound dataset, comprising about 30k images and 69k masks and covering six object categories, is collected for verification. Extensive comparison experiments demonstrate SAMUS's superiority against the state-of-the-art task-specific models and universal foundation models under both task-specific evaluation and generalization evaluation. Moreover, SAMUS is deployable on entry-level GPUs, as it has been liberated from the constraints of long sequence encoding. The code, data, and models will be released at https://github.com/xianlin7/SAMUS.
    摘要 segments anything model (SAM),一种受欢迎的通用图像分割模型,在医疗图像分割领域已经吸引了广泛的关注。尽管SAM在自然图像上表现出色,但在医疗图像上却面临较大的性能下降和限制,特别是对于低对比度、柔和的边界、复杂形态和小型对象的分割。在本文中,我们提出了SAMUS,一种适用于ultrasound图像分割的通用模型。与前一代基于SAM的通用模型不同,SAMUS不仅增加了更好的泛化性,还减少了部署成本,使其更适合临床应用。具体来说,基于SAM,我们在ViT编码器中引入了并行的CNN分支,通过交叉分支注意力注入本地特征,以改进医疗图像分割。然后,我们开发了位置适应器和特征适应器,以适应SAM从自然到医疗领域的转换,并从需要大型输入(1024x1024)降低到小型输入(256x256),以便更加临床友好的部署。我们收集了一个涵盖6类对象的完整ultrasound数据集,包括约30k张图像和69k个Mask,并进行了广泛的比较试验。结果表明SAMUS在比较当今任务特定模型和基础模型下显示出优异性,同时也可以在泛化评价中达到优秀的表现。此外,SAMUS可以在入门级GPU上部署,因为它已经脱离了长序编码的限制。代码、数据和模型将在https://github.com/xianlin7/SAMUS上公开。
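SAMUS injects local CNN features into the ViT encoder through cross-branch attention. The module below is a minimal PyTorch sketch of one plausible form of such a block, not the released SAMUS code; the dimensions, head count, and direction of injection are assumptions.

```python
import torch
import torch.nn as nn

class CrossBranchAttention(nn.Module):
    """ViT tokens attend to local CNN features so that local detail is injected into the
    transformer encoder. Dimensions and head count are illustrative, not the paper's."""
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm_q = nn.LayerNorm(dim)
        self.norm_kv = nn.LayerNorm(dim)

    def forward(self, vit_tokens, cnn_tokens):
        # vit_tokens: (B, N, dim) from the (frozen) SAM image encoder
        # cnn_tokens: (B, M, dim) from the parallel CNN branch (flattened feature map)
        q = self.norm_q(vit_tokens)
        kv = self.norm_kv(cnn_tokens)
        fused, _ = self.attn(q, kv, kv)
        return vit_tokens + fused          # residual injection of local features

B, N, M, dim = 2, 196, 196, 256            # assumed token-grid sizes for 256x256 inputs
out = CrossBranchAttention(dim)(torch.randn(B, N, dim), torch.randn(B, M, dim))
print(out.shape)                            # torch.Size([2, 196, 256])
```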

Comparative Analysis of Contextual Relation Extraction based on Deep Learning Models

  • paper_url: http://arxiv.org/abs/2309.06814
  • repo_url: None
  • paper_authors: R. Priyadharshini, G. Jeyakodi, P. Shanthi Bala
  • for: Building domain knowledge graphs with the help of an ontology, to support semantic search, query answering, and textual entailment.
  • methods: Deep learning techniques are used to extract complex relations from sentences containing more than two relations or unspecified entities, which existing machine learning and NLP techniques cannot predict efficiently.
  • results: Hybrid deep learning models are shown to extract relations from complex sentences effectively, overcoming the limitation of machine learning models that handle only binary relations.
    Abstract Contextual Relation Extraction (CRE) is mainly used for constructing a knowledge graph with a help of ontology. It performs various tasks such as semantic search, query answering, and textual entailment. Relation extraction identifies the entities from raw texts and the relations among them. An efficient and accurate CRE system is essential for creating domain knowledge in the biomedical industry. Existing Machine Learning and Natural Language Processing (NLP) techniques are not suitable to predict complex relations from sentences that consist of more than two relations and unspecified entities efficiently. In this work, deep learning techniques have been used to identify the appropriate semantic relation based on the context from multiple sentences. Even though various machine learning models have been used for relation extraction, they provide better results only for binary relations, i.e., relations occurred exactly between the two entities in a sentence. Machine learning models are not suited for complex sentences that consist of the words that have various meanings. To address these issues, hybrid deep learning models have been used to extract the relations from complex sentence effectively. This paper explores the analysis of various deep learning models that are used for relation extraction.
    摘要 Contextual Relation Extraction (CRE) 主要用于构建知识图库,帮助了 Ontology。它执行多种任务,如semantic search、查询回答和文本推理。relation extraction 可以从 raw text 中提取实体和关系。一个有效和准确的 CRE 系统对生物医学领域的Domain Knowledge 创造至关重要。现有的机器学习和自然语言处理(NLP)技术不适用于从多个句子中提取复杂关系。在这种情况下,深度学习技术被使用,以基于上下文从多个句子中提取适当的semantic关系。尽管许多机器学习模型已经用于relation extraction,但它们只能提供二分关系(即sentence中两个实体之间的关系)的更好结果。机器学习模型不适用于含有多义词的复杂句子。为解决这些问题,hybrid深度学习模型被用来从复杂句子中提取关系。本文探讨了不同的深度学习模型在relation extraction中的分析。

Leveraging SE(3) Equivariance for Learning 3D Geometric Shape Assembly

  • paper_url: http://arxiv.org/abs/2309.06810
  • repo_url: https://github.com/crtie/leveraging-se-3-equivariance-for-learning-3d-geometric-shape-assembly
  • paper_authors: Ruihai Wu, Chenrui Tie, Yushi Du, Yan Zhao, Hao Dong
  • for: Addresses geometric shape assembly, the task of reassembling fractured parts (e.g., bowl fragments) into a complete object; unlike semantic part assembly (e.g., assembling a chair from its legs), this emerging vision and robotics task relies on geometric rather than semantic information.
  • methods: Leverages SE(3) equivariance to disentangle shape and pose in part representations and, going beyond prior work that considers SE(3) equivariance only for single-object representations, uses equivariant representations that account for multi-part correlations.
  • results: Experiments demonstrate the importance of SE(3) equivariance and of the proposed method for geometric shape assembly, boosting multi-part assembly performance. Project page: https://crtie.github.io/SE-3-part-assembly/
    Abstract Shape assembly aims to reassemble parts (or fragments) into a complete object, which is a common task in our daily life. Different from the semantic part assembly (e.g., assembling a chair's semantic parts like legs into a whole chair), geometric part assembly (e.g., assembling bowl fragments into a complete bowl) is an emerging task in computer vision and robotics. Instead of semantic information, this task focuses on geometric information of parts. As the both geometric and pose space of fractured parts are exceptionally large, shape pose disentanglement of part representations is beneficial to geometric shape assembly. In our paper, we propose to leverage SE(3) equivariance for such shape pose disentanglement. Moreover, while previous works in vision and robotics only consider SE(3) equivariance for the representations of single objects, we move a step forward and propose leveraging SE(3) equivariance for representations considering multi-part correlations, which further boosts the performance of the multi-part assembly. Experiments demonstrate the significance of SE(3) equivariance and our proposed method for geometric shape assembly. Project page: https://crtie.github.io/SE-3-part-assembly/

Bayesian uncertainty-weighted loss for improved generalisability on polyp segmentation task

  • paper_url: http://arxiv.org/abs/2309.06807
  • repo_url: None
  • paper_authors: Rebecca S. Stone, Pedro E. Chavarrias-Solano, Andrew J. Bulpitt, David C. Hogg, Sharib Ali
  • for: Improving the consistency of polyp segmentation models across centers, endoscopic instruments, and acquisition conditions, reducing model bias.
  • methods: An uncertainty-weighted loss based on Bayesian epistemic uncertainty encourages the model to focus on underrepresented sample regions during training (an uncertainty-weighting sketch follows this entry).
  • results: On the multi-center, multi-modality polyp segmentation dataset PolypGen, the approach improves generalisability without sacrificing state-of-the-art performance.
    Abstract While several previous studies have devised methods for segmentation of polyps, most of these methods are not rigorously assessed on multi-center datasets. Variability due to appearance of polyps from one center to another, difference in endoscopic instrument grades, and acquisition quality result in methods with good performance on in-distribution test data, and poor performance on out-of-distribution or underrepresented samples. Unfair models have serious implications and pose a critical challenge to clinical applications. We adapt an implicit bias mitigation method which leverages Bayesian epistemic uncertainties during training to encourage the model to focus on underrepresented sample regions. We demonstrate the potential of this approach to improve generalisability without sacrificing state-of-the-art performance on a challenging multi-center polyp segmentation dataset (PolypGen) with different centers and image modalities.
    摘要 先前的研究已经提出了多种息肉分割方法,但大多数方法没有在多中心数据集上得到严格评估。不同中心息肉外观的差异、内镜设备级别的不同以及图像采集质量的变化,会导致模型在分布内测试数据上表现良好,而在分布外或代表性不足的样本上表现较差。这种不公平的模型会带来严重后果,对临床应用构成关键挑战。我们采用了一种隐式偏差缓解方法,在训练中利用贝叶斯认知不确定性,促使模型关注代表性不足的样本区域。我们在一个具有不同中心和成像模态、具有挑战性的多中心息肉分割数据集(PolypGen)上展示了该方法在不牺牲最先进性能的前提下提升泛化能力的潜力。
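The abstract states that Bayesian epistemic uncertainty is used during training to emphasize underrepresented regions. One common way to do this is to weight the per-pixel loss by MC-dropout uncertainty, sketched below; the weighting scheme, hyperparameters, and toy model are assumptions, not necessarily the paper's.

```python
import torch
import torch.nn.functional as F

def mc_dropout_uncertainty(model, x, n_samples=8):
    """Epistemic uncertainty via MC dropout: keep dropout active and take the variance of the
    predicted foreground probability across stochastic forward passes."""
    model.train()                               # keeps nn.Dropout layers stochastic
    with torch.no_grad():
        probs = torch.stack([torch.sigmoid(model(x)) for _ in range(n_samples)])
    return probs.var(dim=0)                     # (B, 1, H, W)

def uncertainty_weighted_bce(model, x, y, beta=1.0):
    """Per-pixel BCE, up-weighted where the model is epistemically uncertain (one plausible
    weighting; the paper's exact scheme may differ)."""
    weight = 1.0 + beta * mc_dropout_uncertainty(model, x)
    return F.binary_cross_entropy_with_logits(model(x), y, weight=weight)

# Toy segmentation model and batch, just to show the call pattern.
model = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3, padding=1), torch.nn.Dropout2d(0.2),
                            torch.nn.Conv2d(8, 1, 1))
x, y = torch.randn(2, 3, 64, 64), torch.randint(0, 2, (2, 1, 64, 64)).float()
print(float(uncertainty_weighted_bce(model, x, y)))
```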

FedDIP: Federated Learning with Extreme Dynamic Pruning and Incremental Regularization

  • paper_url: http://arxiv.org/abs/2309.06805
  • repo_url: https://github.com/ericloong/feddip
  • paper_authors: Qianyu Long, Christos Anagnostopoulos, Shameem Puthiya Parambath, Daning Bi
  • for: Proposes a new federated learning framework for distributed training and inference of large-scale DNNs that keeps parameter exchange and memory under control.
  • methods: Uses dynamic model pruning with error feedback to eliminate redundant information exchange, together with incremental regularization to reach extreme model sparsity (a pruning-with-error-feedback sketch follows this entry).
  • results: FedDIP controls model sparsity while achieving similar or better performance than other pruning methods that adopt incremental regularization during distributed training.
    Abstract Federated Learning (FL) has been successfully adopted for distributed training and inference of large-scale Deep Neural Networks (DNNs). However, DNNs are characterized by an extremely large number of parameters, thus, yielding significant challenges in exchanging these parameters among distributed nodes and managing the memory. Although recent DNN compression methods (e.g., sparsification, pruning) tackle such challenges, they do not holistically consider an adaptively controlled reduction of parameter exchange while maintaining high accuracy levels. We, therefore, contribute with a novel FL framework (coined FedDIP), which combines (i) dynamic model pruning with error feedback to eliminate redundant information exchange, which contributes to significant performance improvement, with (ii) incremental regularization that can achieve \textit{extreme} sparsity of models. We provide convergence analysis of FedDIP and report on a comprehensive performance and comparative assessment against state-of-the-art methods using benchmark data sets and DNN models. Our results showcase that FedDIP not only controls the model sparsity but efficiently achieves similar or better performance compared to other model pruning methods adopting incremental regularization during distributed model training. The code is available at: https://github.com/EricLoong/feddip.
    摘要 联合学习(Federated Learning,FL)已经成功地应用于分布式训练和推导大规模深度神经网络(Deep Neural Networks,DNNs)。然而,DNNs 具有极高的参数数量,因此在分布式节点之间交换这些参数具有很大的挑战。 although recent DNN compression methods (e.g., sparsification, pruning) 可以解决这些挑战,但它们不会全面考虑适当地控制参数交换的减少,以维持高精度水平。我们因此提出了一个新的 FL 框架(称为 FedDIP),它结合了(i)动态模型剔除与错误反馈,以消除重复的资讯交换,从而获得了显著的性能改善,以及(ii)增量调整,可以实现极端简化的模型。我们提供了 FedDIP 的凝聚分析和比较性评估,使用了标准的 benchmark 数据集和 DNN 模型。我们的结果显示,FedDIP 不仅可以控制模型的简化,而且可以实现类似或更好的性能,与其他适用增量调整的模型剔除方法相比。代码可以在:https://github.com/EricLoong/feddip 中找到。
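FedDIP combines dynamic pruning with error feedback inside a federated loop. The single-node sketch below shows magnitude pruning with error feedback only; the federated aggregation and incremental regularization are omitted, and the sparsity level is an assumption.

```python
import torch

def prune_with_error_feedback(weight, residual, sparsity=0.9):
    """Keep only the largest-magnitude entries of (weight + accumulated error); the mass that is
    pruned away is fed back into the residual so no update is silently lost."""
    corrected = weight + residual
    k = int(corrected.numel() * (1.0 - sparsity))                   # number of entries to keep
    threshold = corrected.abs().flatten().kthvalue(corrected.numel() - k).values
    mask = corrected.abs() > threshold
    pruned = corrected * mask
    return pruned, corrected - pruned                               # (sparse tensor, new residual)

w = torch.randn(256, 256)
residual = torch.zeros_like(w)
for communication_round in range(3):
    sparse_w, residual = prune_with_error_feedback(w, residual, sparsity=0.9)
    print(communication_round, float((sparse_w != 0).float().mean()))   # density stays near 0.10
```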

Defensive Alliances in Signed Networks

  • paper_url: http://arxiv.org/abs/2309.06801
  • repo_url: None
  • paper_authors: Emmanuel Arrighi, Zhidan Feng, Henning Fernau, Kevin Mann, Xingqin Qi, Petra Wolf
  • for: Finding groups of agents that can work together to achieve a common goal.
  • methods: Models liking and disliking between agents with signed networks and introduces a new notion of defensive alliance in that setting (an illustrative membership check follows this entry).
  • results: Answers several natural algorithmic questions, presents combinatorial findings connecting the notion to correlation clustering and to a new structural parameter, signed neighborhood diversity (snd), and gives a parameterized algorithm for finding a smallest defensive alliance.
    Abstract The analysis of (social) networks and multi-agent systems is a central theme in Artificial Intelligence. Some line of research deals with finding groups of agents that could work together to achieve a certain goal. To this end, different notions of so-called clusters or communities have been introduced in the literature of graphs and networks. Among these, defensive alliance is a kind of quantitative group structure. However, all studies on the alliance so for have ignored one aspect that is central to the formation of alliances on a very intuitive level, assuming that the agents are preconditioned concerning their attitude towards other agents: they prefer to be in some group (alliance) together with the agents they like, so that they are happy to help each other towards their common aim, possibly then working against the agents outside of their group that they dislike. Signed networks were introduced in the psychology literature to model liking and disliking between agents, generalizing graphs in a natural way. Hence, we propose the novel notion of a defensive alliance in the context of signed networks. We then investigate several natural algorithmic questions related to this notion. These, and also combinatorial findings, connect our notion to that of correlation clustering, which is a well-established idea of finding groups of agents within a signed network. Also, we introduce a new structural parameter for signed graphs, signed neighborhood diversity snd, and exhibit a parameterized algorithm that finds a smallest defensive alliance in a signed graph.
    摘要 social networks 和多代理系统的分析是人工智能的中心主题。一些研究的目标是找到一些代理工作 вместе以实现某个目标。为此,图和网络的不同的定义叫做集群或社区在文献中出现。其中,防御联盟是一种量化的群体结构。然而,所有关于联盟的研究都忽略了一个非常直观的因素:代理在与其他代理之间有互恩的情感,因此偏好与自己喜欢的代理一起组成一个群体,以便帮助彼此实现共同目标,可能然后与外部的代理进行冲突。 signed networks 是在心理学文献中引入的,以模型代理之间的喜欢和讨厌关系,自然地扩展了图的概念。因此,我们提出了在 signed networks 上的新的防御联盟概念。我们然后调查了这个概念相关的自然的算法问题,以及 combinatorial 发现。这些问题与另一个已知的idea——相关均衡 clustering——相连。此外,我们引入了一个新的 signed 图结构参数,signed neighborhood diversity snd,并提供了一个 parameterized 算法,用于在 signed 图上找到最小的防御联盟。
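The paper's exact definition of a defensive alliance in a signed network is not given in the abstract. The check below uses one plausible reading, purely as an illustration of the kind of object being sought: each member should have at least as many defenders (itself plus positive neighbours inside the alliance) as attackers.

```python
import networkx as nx

def is_defensive_alliance(G, S):
    """Check a candidate alliance S in a signed graph (edge attribute 'sign' in {+1, -1}).
    Assumed rule, for illustration only: every member v must have at least as many defenders
    (v itself plus positive neighbours inside S) as attackers (negative neighbours, plus
    any neighbours outside S)."""
    S = set(S)
    for v in S:
        defenders = 1 + sum(1 for u in G.neighbors(v) if u in S and G[v][u]["sign"] > 0)
        attackers = sum(1 for u in G.neighbors(v) if G[v][u]["sign"] < 0 or u not in S)
        if defenders < attackers:
            return False
    return True

G = nx.Graph()
G.add_edges_from([(1, 2, {"sign": +1}), (2, 3, {"sign": +1}), (1, 3, {"sign": +1}),
                  (3, 4, {"sign": -1}), (2, 5, {"sign": -1})])
print(is_defensive_alliance(G, {1, 2, 3}))   # True under the assumed rule
```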

Uncertainty-aware Traffic Prediction under Missing Data

  • paper_url: http://arxiv.org/abs/2309.06800
  • repo_url: None
  • paper_authors: Hao Mei, Junxian Li, Zhiming Liang, Guanjie Zheng, Bin Shi, Hua Wei
  • for: Addresses a key problem in traffic prediction: making predictions at locations that have no historical records.
  • methods: An uncertainty-aware framework based on an inductive graph neural network that 1) extends prediction to locations without historical records, significantly widening spatial coverage while reducing sensor deployment, and 2) produces probabilistic predictions with uncertainty quantification to support risk management and decision making.
  • results: Extensive experiments on real-life data show promising prediction results, with uncertainty estimates that correlate strongly with whether a location has historical data; the method can also guide sensor deployment so that limited sensor budgets yield more accurate predictions.
    Abstract Traffic prediction is a crucial topic because of its broad scope of applications in the transportation domain. Recently, various studies have achieved promising results. However, most studies assume the prediction locations have complete or at least partial historical records and cannot be extended to non-historical recorded locations. In real-life scenarios, the deployment of sensors could be limited due to budget limitations and installation availability, which makes most current models not applicable. Though few pieces of literature tried to impute traffic states at the missing locations, these methods need the data simultaneously observed at the locations with sensors, making them not applicable to prediction tasks. Another drawback is the lack of measurement of uncertainty in prediction, making prior works unsuitable for risk-sensitive tasks or involving decision-making. To fill the gap, inspired by the previous inductive graph neural network, this work proposed an uncertainty-aware framework with the ability to 1) extend prediction to missing locations with no historical records and significantly extend spatial coverage of prediction locations while reducing deployment of sensors and 2) generate probabilistic prediction with uncertainty quantification to help the management of risk and decision making in the down-stream tasks. Through extensive experiments on real-life datasets, the result shows our method achieved promising results on prediction tasks, and the uncertainty quantification gives consistent results which highly correlated with the locations with and without historical data. We also show that our model could help support sensor deployment tasks in the transportation field to achieve higher accuracy with a limited sensor deployment budget.
    摘要 traffic prediction 是一个重要的话题,因为它在交通领域有广泛的应用。在最近的研究中,许多研究已经获得了有前途的结果。然而,大多数研究假设预测位置有完整或至少部分的历史记录,并不能扩展到无历史记录的位置。在实际的场景中,投入设备的限制和安装可用性可能会限制投入设备的数量,使现有的模型无法适用。虽然一些文献尝试了填充交通状态的缺失位置,但这些方法需要同时观测的数据在投入设备上,使得它们无法适用于预测任务。另外,现有的方法缺乏测量预测结果的不确定性,使得前一代的研究不适用于风险敏感任务或决策任务。为了填补这个空白,我们提出了一种不确定性意识框架,能够1)扩展预测至缺失位置,大幅减少投入设备数量,同时提高预测精度,2)生成probabilistic预测,并对预测结果进行不确定性评估,以帮助管理风险和决策任务。经过广泛的实验研究,我们的方法在预测任务中获得了优秀的结果,并且不确定性评估与位置有 historia record 的相关性很高。我们还证明了我们的模型可以帮助交通领域中的投入设备部署任务实现更高的准确率,使用有限的投入设备预算。

When Geoscience Meets Foundation Models: Towards General Geoscience Artificial Intelligence System

  • paper_url: http://arxiv.org/abs/2309.06799
  • repo_url: None
  • paper_authors: Hao Zhang, Jin-Jian Xu
  • for: 这篇论文旨在探讨地球系统的动态模型,以探索地球系统的演化和发展。
  • methods: 这篇论文使用了跨学科数据集合来模拟地球系统的动态,并使用人工智能技术来探索这些数据的关系。
  • results: 这篇论文获得了一些关于地球系统的预测和模拟结果,并提供了一些关于地球系统的未来发展的探访。
    Abstract Geoscience foundation models represent a revolutionary approach in the field of Earth sciences by integrating massive cross-disciplinary data to simulate and understand the Earth systems dynamics. As a data-centric artificial intelligence (AI) paradigm, they uncover insights from petabytes of structured and unstructured data. Flexible task specification, diverse inputs and outputs and multi-modal knowledge representation enable comprehensive analysis infeasible with individual data sources. Critically, the scalability and generalizability of geoscience models allow for tackling diverse prediction, simulation, and decision challenges related to Earth systems interactions. Collaboration between domain experts and computer scientists leads to innovations in these invaluable tools for understanding the past, present, and future of our planet. However, challenges remain in validation and verification, scale, interpretability, knowledge representation, and social bias. Going forward, enhancing model integration, resolution, accuracy, and equity through cross-disciplinary teamwork is key. Despite current limitations, geoscience foundation models show promise for providing critical insights into pressing issues including climate change, natural hazards, and sustainability through their ability to probe scenarios and quantify uncertainties. Their continued evolution toward integrated, data-driven modeling holds paradigm-shifting potential for Earth science.
    摘要 地球科学基础模型代表了地球科学领域的一种革命性的方法,通过融合各个领务的大量交叉学科数据来模拟和理解地球系统的动态。作为一种数据驱动人工智能(AI)范例,它们揭示了数据中的新发现,从 petabytes 的结构化和无结构化数据中提取了有价值的信息。灵活的任务规定、多样化的输入和输出以及多Modal 的知识表示使得全面的分析变得可能,而各个数据源之间的集成分析则是不可能的。重要的是,地球科学基础模型的可扩展性和通用性使得可以解决多种预测、模拟和决策问题 related to Earth systems interactions。通过域专家和计算机科学家之间的合作,这些无价的工具为我们理解地球的过去、当前和未来带来了创新。然而,验证和验证、缺失、知识表示和社会偏见仍然是挑战。未来,通过跨学科团队的努力,地球科学基础模型的集成、分辨率、准确性和公平性将得到改进。尽管当前存在限制,但地球科学基础模型仍然拥有提供关键的洞察和量化不确定性的潜力,它们的持续演化将对地球科学产生 Paradigm-shifting 的影响。

Cognitive Mirage: A Review of Hallucinations in Large Language Models

  • paper_url: http://arxiv.org/abs/2309.06794
  • repo_url: https://github.com/hongbinye/cognitive-mirage-hallucinations-in-llms
  • paper_authors: Hongbin Ye, Tong Liu, Aijia Zhang, Wei Hua, Weiqiang Jia
  • for: Surveys the phenomenon of hallucination in large language models, including hallucination types, detection methods, and improvement strategies.
  • methods: Presents a taxonomy of hallucinations across text generation tasks, together with theoretical analyses, detection methods, and directions for improvement.
  • results: Provides a complete taxonomy of hallucinations, summarizes existing detection and improvement methods, and proposes future research directions.
    Abstract As large language models continue to develop in the field of AI, text generation systems are susceptible to a worrisome phenomenon known as hallucination. In this study, we summarize recent compelling insights into hallucinations in LLMs. We present a novel taxonomy of hallucinations from various text generation tasks, thus provide theoretical insights, detection methods and improvement approaches. Based on this, future research directions are proposed. Our contribution are threefold: (1) We provide a detailed and complete taxonomy for hallucinations appearing in text generation tasks; (2) We provide theoretical analyses of hallucinations in LLMs and provide existing detection and improvement methods; (3) We propose several research directions that can be developed in the future. As hallucinations garner significant attention from the community, we will maintain updates on relevant research progress.
    摘要 Large language models (LLMs) 在人工智能领域的发展中,文本生成系统受到一种有害现象的威胁,称为投影。在这项研究中,我们summarize了最近的有力证明,描述了不同的文本生成任务中的投影现象,并提供了理论分析、检测方法和改进方法。根据这些成果,我们提出了未来研究的方向。我们的贡献包括以下三个方面:1. 我们提供了文本生成任务中投影现象的完整和详细的分类体系。2. 我们提供了LLMs中投影现象的理论分析,并提供了现有的检测和改进方法。3. 我们提出了未来研究的多个方向,以探索投影现象的原因和解决方案。由于投影现象在社区中受到广泛关注,我们将继续更新相关的研究进展。

Predicting Survival Time of Ball Bearings in the Presence of Censoring

  • paper_url: http://arxiv.org/abs/2309.07188
  • repo_url: https://github.com/thecml/ball-bearing-survival
  • paper_authors: Christian Marius Lillelund, Fernando Pannullo, Morten Opprud Jakobsen, Christian Fischer Pedersen
  • for: Proposes a survival-analysis approach to predicting the time to failure of ball bearings in the presence of censoring.
  • methods: Analyzes bearing data in the time and frequency domains, annotates failures by comparing the Kullback-Leibler divergence and standard deviation between break-in and break-out frequency bins, and fits several survival models on time-domain covariates to estimate time to failure (a small sketch follows this entry).
  • results: On XJTU the best result is a 0.70 concordance index and 0.21 integrated Brier score; on PRONOSTIA, 0.76 and 0.19, respectively.
    Abstract Ball bearings find widespread use in various manufacturing and mechanical domains, and methods based on machine learning have been widely adopted in the field to monitor wear and spot defects before they lead to failures. Few studies, however, have addressed the problem of censored data, in which failure is not observed. In this paper, we propose a novel approach to predict the time to failure in ball bearings using survival analysis. First, we analyze bearing data in the frequency domain and annotate when a bearing fails by comparing the Kullback-Leibler divergence and the standard deviation between its break-in frequency bins and its break-out frequency bins. Second, we train several survival models to estimate the time to failure based on the annotated data and covariates extracted from the time domain, such as skewness, kurtosis and entropy. The models give a probabilistic prediction of risk over time and allow us to compare the survival function between groups of bearings. We demonstrate our approach on the XJTU and PRONOSTIA datasets. On XJTU, the best result is a 0.70 concordance-index and 0.21 integrated Brier score. On PRONOSTIA, the best is a 0.76 concordance-index and 0.19 integrated Brier score. Our work motivates further work on incorporating censored data in models for predictive maintenance.
    摘要 滚球支持件在不同的生产和机械领域中广泛使用,而基于机器学习技术的监测方法也在这一领域得到了广泛应用。然而,有很少的研究专门关注缺失数据(失败不观察)问题。在这篇论文中,我们提出了一种新的方法,用于预测滚球支持件的时间到失败使用存生分析。首先,我们分析了滚球数据在频域中,并在break-in和break-out频率分布之间进行了比较,以确定缺失数据。然后,我们使用了多种存生模型来估算时间到失败的风险,并使用了时域中的特征,如方差、泊松指数和自 entropy。这些模型可以为不同的滚球支持件提供可比较的生存函数,并允许我们对缺失数据进行处理。我们在XJTU和PRONOSTIA数据集上进行了实践,最佳结果为XJTU的0.70 concordance-index和0.21 integral Brier score,PRONOSTIA的0.76 concordance-index和0.19 integral Brier score。我们的工作鼓励了更多关于缺失数据的包含在预测维护模型中的进一步研究。
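The pipeline annotates failure onset from frequency-domain divergence and then fits survival models on time-domain covariates. The sketch below illustrates both steps on synthetic data; the binning, thresholds, and the use of lifelines' Cox model are assumptions standing in for whichever survival models the authors fit.

```python
import numpy as np
import pandas as pd
from scipy.stats import entropy
from lifelines import CoxPHFitter

rng = np.random.default_rng(0)

def spectrum_bins(signal, n_bins=32):
    """Magnitude spectrum aggregated into coarse frequency bins, normalised to a distribution."""
    mag = np.abs(np.fft.rfft(signal))
    p = np.array([b.sum() for b in np.array_split(mag, n_bins)]) + 1e-12
    return p / p.sum()

break_in = spectrum_bins(rng.normal(size=4096))                                   # healthy reference
later = spectrum_bins(rng.normal(size=4096) + 0.5 * np.sin(np.arange(4096)))      # degraded window
print(f"KL(break-out || break-in) = {entropy(later, break_in):.3f}")              # annotation signal

# Time-domain covariates plus (possibly censored) survival times, then a Cox model.
df = pd.DataFrame({
    "skewness": rng.normal(size=40),
    "kurtosis": rng.normal(size=40),
    "entropy":  rng.random(size=40),
    "duration": rng.exponential(100.0, size=40),
    "failed":   rng.integers(0, 2, size=40),    # 0 = censored, 1 = observed failure
})
cph = CoxPHFitter().fit(df, duration_col="duration", event_col="failed")
print(f"concordance index: {cph.concordance_index_:.2f}")
```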

Generative AI

  • paper_url: http://arxiv.org/abs/2309.07930
  • repo_url: https://github.com/StanGirard/quivr
  • paper_authors: Stefan Feuerriegel, Jochen Hartmann, Christian Janiesch, Patrick Zschech
  • for: 本研究旨在探讨Generative AI在信息系统中的应用和发展,以及其对BISE领域的影响和挑战。
  • methods: 本研究使用概率模型、深度学习和自然语言处理等技术,并提供了一些实际应用例如Dall-E 2、GPT-4和Copilot等。
  • results: 本研究发现当前Generative AI技术存在一些限制和挑战,如数据质量、隐私和安全等问题,并提出了BISE领域的研究论点和方向。
    Abstract The term "generative AI" refers to computational techniques that are capable of generating seemingly new, meaningful content such as text, images, or audio from training data. The widespread diffusion of this technology with examples such as Dall-E 2, GPT-4, and Copilot is currently revolutionizing the way we work and communicate with each other. In this article, we provide a conceptualization of generative AI as an entity in socio-technical systems and provide examples of models, systems, and applications. Based on that, we introduce limitations of current generative AI and provide an agenda for Business & Information Systems Engineering (BISE) research. Different from previous works, we focus on generative AI in the context of information systems, and, to this end, we discuss several opportunities and challenges that are unique to the BISE community and make suggestions for impactful directions for BISE research.
    摘要 “生成AI”是指通过训练数据生成新的、有意义的内容,如文本、图像或音频等。现在,这种技术的广泛散布,如达尔-E2、GPT-4和副手等,正在改变我们工作和交流方式。本文将生成AI视为社会技术系统中的一个实体,并提供模型、系统和应用的示例。然后,我们介绍当前生成AI的限制,并为商业信息系统工程(BISE)研究提供研究议程。与之前的研究不同,我们在信息系统上下文中强调生成AI,并讨论了BISE社区独特的机遇和挑战,并提供了影响力的BISE研究方向。

Fundamental Limits of Deep Learning-Based Binary Classifiers Trained with Hinge Loss

  • paper_url: http://arxiv.org/abs/2309.06774
  • repo_url: None
  • paper_authors: Tilahun M. Getu, Georges Kaddoum
  • for: Aims at explaining why deep learning (DL) has been empirically successful across many disciplines and at addressing this fundamental open problem.
  • methods: Builds on innovations in optimization, generalization, and approximation toward a unified theory of DL.
  • results: Derives novel asymptotic testing performance limits for binary classifiers based on deep ReLU feedforward neural networks (FNNs) and on deep FNNs with ReLU and Tanh activations, validated by extensive computer experiments (the hinge loss these classifiers are trained with is given after this entry).
    Abstract Although deep learning (DL) has led to several breakthroughs in many disciplines as diverse as chemistry, computer science, electrical engineering, mathematics, medicine, neuroscience, and physics, a comprehensive understanding of why and how DL is empirically successful remains fundamentally elusive. To attack this fundamental problem and unravel the mysteries behind DL's empirical successes, significant innovations toward a unified theory of DL have been made. These innovations encompass nearly fundamental advances in optimization, generalization, and approximation. Despite these advances, however, no work to date has offered a way to quantify the testing performance of a DL-based algorithm employed to solve a pattern classification problem. To overcome this fundamental challenge in part, this paper exposes the fundamental testing performance limits of DL-based binary classifiers trained with hinge loss. For binary classifiers that are based on deep rectified linear unit (ReLU) feedforward neural networks (FNNs) and ones that are based on deep FNNs with ReLU and Tanh activation, we derive their respective novel asymptotic testing performance limits. The derived testing performance limits are validated by extensive computer experiments.
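For reference, the hinge loss that the analyzed binary classifiers are trained with is the standard one below; the paper's asymptotic performance limits themselves are not reproduced here.

```latex
% Hinge loss for a binary classifier f(x) with labels y \in \{-1, +1\}, and its empirical risk:
\ell\bigl(y, f(x)\bigr) = \max\bigl(0,\; 1 - y\, f(x)\bigr),
\qquad
\widehat{R}_n(f) = \frac{1}{n} \sum_{i=1}^{n} \max\bigl(0,\; 1 - y_i f(x_i)\bigr).
```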

Enhancing Keyphrase Generation by BART Finetuning with Splitting and Shuffling

  • paper_url: http://arxiv.org/abs/2309.06726
  • repo_url: None
  • paper_authors: Bin Chen, Mizuho Iwaihara
  • for: Improving keyphrase generation, in particular absent keyphrase generation, by proposing a keyphrase-focused BART model.
  • methods: Uses sequence-to-sequence models and fine-tunes two separate BART models, one for present and one for absent keyphrases, combined with keyphrase shuffling and candidate keyphrase ranking.
  • results: For absent keyphrases, Keyphrase-Focused BART achieves new state-of-the-art F1@5 scores on two of the five keyphrase generation benchmark datasets (an F1@5 sketch follows this entry).
    Abstract Keyphrase generation is a task of identifying a set of phrases that best represent the main topics or themes of a given text. Keyphrases are divided into present and absent keyphrases. Recent approaches utilizing sequence-to-sequence models show effectiveness on absent keyphrase generation. However, the performance is still limited due to the hardness of finding absent keyphrases. In this paper, we propose Keyphrase-Focused BART, which exploits the differences between present and absent keyphrase generations, and performs fine-tuning of two separate BART models for present and absent keyphrases. We further show effective approaches of shuffling keyphrases and candidate keyphrase ranking. For absent keyphrases, our Keyphrase-Focused BART achieved new state-of-the-art score on F1@5 in two out of five keyphrase generation benchmark datasets.
    摘要 关键短语生成任务旨在识别最能代表给定文本主要主题的一组短语。关键短语可分为存在型和缺失型两类。最新的方法使用序列到序列模型,在缺失关键短语生成上表现良好。然而,由于缺失关键短语难以发现,性能仍然有限。在本文中,我们提出了 Keyphrase-Focused BART,利用存在型与缺失型关键短语生成之间的差异,分别微调两个独立的 BART 模型。我们还展示了关键短语洗牌和候选关键短语排序的有效方法。对于缺失关键短语,我们的 Keyphrase-Focused BART 在五个关键短语生成基准数据集中的两个上取得了新的最优 F1@5 成绩。
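F1@5 is the metric reported for absent keyphrase generation. A minimal sketch of its computation follows; real evaluations usually also apply stemming and deduplication, which are omitted, and the example phrases are hypothetical.

```python
def f1_at_k(predicted, gold, k=5):
    """F1@k for keyphrase generation: compare the top-k predictions against the gold set after
    simple lower-casing (stemming, as used by most benchmarks, is omitted here)."""
    preds = [p.lower().strip() for p in predicted[:k]]
    golds = {g.lower().strip() for g in gold}
    tp = sum(1 for p in preds if p in golds)
    precision = tp / max(len(preds), 1)
    recall = tp / max(len(golds), 1)
    return 0.0 if tp == 0 else 2 * precision * recall / (precision + recall)

# Hypothetical absent-keyphrase predictions for one document.
predicted = ["graph attention", "keyphrase generation", "bart finetuning", "topic drift", "shuffling"]
gold = ["keyphrase generation", "bart finetuning", "sequence-to-sequence"]
print(f"F1@5 = {f1_at_k(predicted, gold):.3f}")   # 0.500
```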

Dynamic Spectrum Mixer for Visual Recognition

  • paper_url: http://arxiv.org/abs/2309.06721
  • repo_url: None
  • paper_authors: Zhiqiang Hu, Tao Yu
  • for: Improving performance across visual recognition tasks, including image classification, object detection, and semantic segmentation.
  • methods: Uses the Discrete Cosine Transform to represent token interactions in the frequency domain, and proposes a dynamic spectrum weight generation layer that selects and emphasizes informative frequency bands.
  • results: Achieves strong performance across visual recognition tasks, e.g., 83.8% top-1 accuracy on ImageNet and 49.9% mIoU on ADE20K.
    Abstract Recently, MLP-based vision backbones have achieved promising performance in several visual recognition tasks. However, the existing MLP-based methods directly aggregate tokens with static weights, leaving the adaptability to different images untouched. Moreover, recent research demonstrates that MLP-Transformer is great at creating long-range dependencies but ineffective at catching high frequencies that primarily transmit local information, which prevents it from applying to the downstream dense prediction tasks, such as semantic segmentation. To address these challenges, we propose a content-adaptive yet computationally efficient structure, dubbed Dynamic Spectrum Mixer (DSM). The DSM represents token interactions in the frequency domain by employing the Discrete Cosine Transform, which can learn long-term spatial dependencies with log-linear complexity. Furthermore, a dynamic spectrum weight generation layer is proposed as the spectrum bands selector, which could emphasize the informative frequency bands while diminishing others. To this end, the technique can efficiently learn detailed features from visual input that contains both high- and low-frequency information. Extensive experiments show that DSM is a powerful and adaptable backbone for a range of visual recognition tasks. Particularly, DSM outperforms previous transformer-based and MLP-based models on image classification, object detection, and semantic segmentation tasks, such as 83.8% top-1 accuracy on ImageNet, and 49.9% mIoU on ADE20K.
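The frequency-domain mixing idea can be illustrated with a rough sketch: apply a DCT along the token axis, reweight the frequency bands with content-dependent gates, and transform back. The gate-generation rule and shapes below are assumptions for illustration and do not reproduce the actual DSM layer.

```python
# Rough sketch of the idea (not the paper's DSM implementation): mix tokens in
# the frequency domain with a DCT, reweight frequency bands with weights that
# depend on the input content, then transform back.
import numpy as np
from scipy.fft import dct, idct

rng = np.random.default_rng(0)
N, C = 49, 32                      # tokens x channels (e.g. a 7x7 feature map)
tokens = rng.normal(size=(N, C))

# Content-dependent "spectrum weights": a tiny linear map from the mean token
# to one gate per frequency band, squashed into (0, 1).  Purely illustrative.
W_gate = rng.normal(scale=0.1, size=(C, N))
gates = 1.0 / (1.0 + np.exp(-(tokens.mean(axis=0) @ W_gate)))   # shape (N,)

spec = dct(tokens, type=2, axis=0, norm="ortho")   # token-axis DCT, shape (N, C)
spec_weighted = spec * gates[:, None]              # emphasize informative bands
mixed = idct(spec_weighted, type=2, axis=0, norm="ortho")

print(mixed.shape)   # (49, 32): same shape, but tokens are now globally mixed
```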

TrafficGPT: Viewing, Processing and Interacting with Traffic Foundation Models

  • paper_url: http://arxiv.org/abs/2309.06719
  • repo_url: https://github.com/lijlansg/trafficgpt
  • paper_authors: Siyao Zhang, Daocheng Fu, Zhao Zhang, Bin Yu, Pinlong Cai
  • for: Enhancing urban traffic management and control, particularly the handling of numerical data and interaction with simulations.
  • methods: Combines ChatGPT with specialized traffic foundation models to strengthen the capacity for solving complex traffic problems.
  • results: Provides a novel approach to leveraging AI for urban traffic problems and offers useful decision support.
    Abstract With the promotion of ChatGPT to the public, large language models (LLMs) indeed showcase remarkable common sense, reasoning, and planning skills, frequently providing insightful guidance. These capabilities hold significant promise for their application in urban traffic management and control. However, LLMs struggle with addressing traffic issues, especially processing numerical data and interacting with simulations, limiting their potential in solving traffic-related challenges. In parallel, specialized traffic foundation models exist but are typically designed for specific tasks with limited input-output interactions. Combining these models with LLMs presents an opportunity to enhance their capacity for tackling complex traffic-related problems and providing insightful suggestions. To bridge this gap, we present TrafficGPT, a fusion of ChatGPT and traffic foundation models. This integration yields the following key enhancements: 1) empowering ChatGPT with the capacity to view, analyze, process traffic data, and provide insightful decision support for urban transportation system management; 2) facilitating the intelligent deconstruction of broad and complex tasks and sequential utilization of traffic foundation models for their gradual completion; 3) aiding human decision-making in traffic control through natural language dialogues; and 4) enabling interactive feedback and solicitation of revised outcomes. By seamlessly intertwining large language model and traffic expertise, TrafficGPT not only advances traffic management but also offers a novel approach to leveraging AI capabilities in this domain. The TrafficGPT demo can be found in https://github.com/lijlansg/TrafficGPT.git.
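The orchestration pattern the abstract describes, an LLM decomposing a request and sequentially invoking traffic foundation models, can be sketched schematically as a tool dispatcher. All tool names, signatures, and the keyword-based planner below are hypothetical placeholders, not TrafficGPT's actual API.

```python
# Schematic only: an LLM-style planner decomposes a request into ordered calls
# to traffic "foundation model" tools.  Tool names and the trivial keyword
# router are hypothetical stand-ins, not TrafficGPT's implementation.
from typing import Callable

def simulate_intersection(args: str) -> str:      # placeholder foundation model
    return f"simulation finished for {args}"

def query_flow_statistics(args: str) -> str:      # placeholder foundation model
    return f"average flow on {args}: 1200 veh/h"

TOOLS: dict[str, Callable[[str], str]] = {
    "statistics": query_flow_statistics,
    "simulate": simulate_intersection,
}

def plan_with_llm(user_request: str) -> list[tuple[str, str]]:
    """Stand-in for the LLM planner: map a request to an ordered tool plan."""
    plan = []
    if "flow" in user_request:
        plan.append(("statistics", "Main St corridor"))
    if "signal" in user_request or "simulate" in user_request:
        plan.append(("simulate", "Main St & 5th Ave signal plan"))
    return plan

request = "Check current flow and simulate a new signal plan for Main St."
for tool_name, tool_args in plan_with_llm(request):
    print(tool_name, "->", TOOLS[tool_name](tool_args))
```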

Tackling the Non-IID Issue in Heterogeneous Federated Learning by Gradient Harmonization

  • paper_url: http://arxiv.org/abs/2309.06692
  • repo_url: None
  • paper_authors: Xinyu Zhang, Weiyu Sun, Ying Chen
  • for: Improving federated learning (FL) performance under non-independent and identically distributed (non-IID) data and device heterogeneity.
  • methods: Analyzes gradient conflicts among clients on the server side and proposes FedGH, a simple yet effective Gradient Harmonization technique that mitigates local drifts by projecting one gradient onto the orthogonal plane of the other within conflicting client pairs.
  • results: FedGH consistently improves multiple state-of-the-art FL baselines across benchmarks and non-IID scenarios, with larger gains under stronger heterogeneity, and it can be plugged into any FL framework without hyperparameter tuning.
    Abstract Federated learning (FL) is a privacy-preserving paradigm for collaboratively training a global model from decentralized clients. However, the performance of FL is hindered by non-independent and identically distributed (non-IID) data and device heterogeneity. In this work, we revisit this key challenge through the lens of gradient conflicts on the server side. Specifically, we first investigate the gradient conflict phenomenon among multiple clients and reveal that stronger heterogeneity leads to more severe gradient conflicts. To tackle this issue, we propose FedGH, a simple yet effective method that mitigates local drifts through Gradient Harmonization. This technique projects one gradient vector onto the orthogonal plane of the other within conflicting client pairs. Extensive experiments demonstrate that FedGH consistently enhances multiple state-of-the-art FL baselines across diverse benchmarks and non-IID scenarios. Notably, FedGH yields more significant improvements in scenarios with stronger heterogeneity. As a plug-and-play module, FedGH can be seamlessly integrated into any FL framework without requiring hyperparameter tuning.
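The harmonization step itself is simple to state: when two client gradients conflict (negative inner product), one is projected onto the plane orthogonal to the other. Below is a minimal sketch of that projection; the pairing order and the final averaging are illustrative assumptions rather than the exact FedGH procedure.

```python
# Minimal sketch of the server-side harmonization step described in the
# abstract: if two client gradients conflict (negative inner product), project
# one onto the plane orthogonal to the other.
import numpy as np

def harmonize(g_i: np.ndarray, g_j: np.ndarray) -> np.ndarray:
    """Return g_i with its conflicting component against g_j removed."""
    dot = float(g_i @ g_j)
    if dot >= 0:                             # no conflict: leave it untouched
        return g_i
    return g_i - dot / (g_j @ g_j) * g_j     # orthogonal projection

rng = np.random.default_rng(0)
client_grads = [rng.normal(size=10) for _ in range(4)]

harmonized = []
for i, g in enumerate(client_grads):
    g_new = g.copy()
    for j, other in enumerate(client_grads):
        if i != j:                            # illustrative all-pairs sweep
            g_new = harmonize(g_new, other)
    harmonized.append(g_new)

server_update = np.mean(harmonized, axis=0)   # aggregate as in FedAvg
print(server_update.shape)
```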

Self-Refined Large Language Model as Automated Reward Function Designer for Deep Reinforcement Learning in Robotics

  • paper_url: http://arxiv.org/abs/2309.06687
  • repo_url: https://github.com/zhehuazhou/llm_reward_design
  • paper_authors: Jiayang Song, Zhehua Zhou, Jiawei Liu, Chunrong Fang, Zhan Shu, Lei Ma
  • for: Automated reward function design for deep reinforcement learning (DRL) in robotics.
  • methods: A large language model (LLM) framework with a self-refinement mechanism: the LLM drafts an initial reward function from natural language input, its performance is evaluated, and the results are fed back to guide iterative revision.
  • results: Across a variety of continuous control tasks on three different robotic systems, LLM-designed reward functions rival or surpass manually designed ones, demonstrating the efficacy and applicability of the approach.
    Abstract Although Deep Reinforcement Learning (DRL) has achieved notable success in numerous robotic applications, designing a high-performing reward function remains a challenging task that often requires substantial manual input. Recently, Large Language Models (LLMs) have been extensively adopted to address tasks demanding in-depth common-sense knowledge, such as reasoning and planning. Recognizing that reward function design is also inherently linked to such knowledge, LLM offers a promising potential in this context. Motivated by this, we propose in this work a novel LLM framework with a self-refinement mechanism for automated reward function design. The framework commences with the LLM formulating an initial reward function based on natural language inputs. Then, the performance of the reward function is assessed, and the results are presented back to the LLM for guiding its self-refinement process. We examine the performance of our proposed framework through a variety of continuous robotic control tasks across three diverse robotic systems. The results indicate that our LLM-designed reward functions are able to rival or even surpass manually designed reward functions, highlighting the efficacy and applicability of our approach.
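The generate-evaluate-refine loop can be sketched as follows; `query_llm` and `evaluate_reward` are hypothetical stand-ins for an LLM call and a DRL training run, not the authors' implementation.

```python
# Schematic of the self-refinement loop from the abstract: the LLM proposes a
# reward function, the function is evaluated, and the score plus feedback are
# fed back to guide revision.  All functions below are placeholder stand-ins.
def query_llm(prompt: str) -> str:
    """Placeholder for an LLM call that returns reward-function source code."""
    return "def reward(state, action): return -abs(state['pole_angle'])"

def evaluate_reward(reward_src: str) -> float:
    """Placeholder: train a DRL agent with this reward and return task success."""
    return 0.7

task_description = "Balance the pole upright while minimizing cart movement."
prompt = f"Write a Python reward function for: {task_description}"

best_src, best_score = None, float("-inf")
for iteration in range(3):                    # a few refinement rounds
    reward_src = query_llm(prompt)
    score = evaluate_reward(reward_src)
    if score > best_score:
        best_src, best_score = reward_src, score
    # Feed the evaluation result back to guide the next revision.
    prompt = (f"Task: {task_description}\nPrevious reward function:\n{reward_src}\n"
              f"It achieved success rate {score:.2f}. Improve it.")

print(best_score, best_src)
```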

Attention Loss Adjusted Prioritized Experience Replay

  • paper_url: http://arxiv.org/abs/2309.06684
  • repo_url: None
  • paper_authors: Zhuoying Chen, Huiping Li, Rizhong Wang
  • for: Improving the training efficiency of deep reinforcement learning.
  • methods: An improved self-attention network combined with a double-sampling mechanism fits the hyperparameter that regulates the importance-sampling weights, eliminating the estimation error introduced by prioritized experience replay (PER).
  • results: Tested with value-function-based, policy-gradient-based, and multi-agent reinforcement learning algorithms in OpenAI Gym; comparison studies confirm the advantage and efficiency of the proposed training framework.
    Abstract Prioritized Experience Replay (PER) is a technique in deep reinforcement learning that selects experience samples carrying more knowledge in order to improve the training rate of neural networks. However, the non-uniform sampling used in PER inevitably shifts the state-action space distribution and brings estimation error to the Q-value function. In this paper, an Attention Loss Adjusted Prioritized (ALAP) Experience Replay algorithm is proposed, which integrates an improved self-attention network with a double-sampling mechanism to fit the hyperparameter that regulates the importance sampling weights, eliminating the estimation error caused by PER. In order to verify the effectiveness and generality of the algorithm, ALAP is tested with value-function-based, policy-gradient-based and multi-agent reinforcement learning algorithms in OpenAI Gym, and comparison studies verify the advantage and efficiency of the proposed training framework.
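As background for the estimation error the paper targets, the sketch below shows standard PER sampling with importance-sampling (IS) weights. ALAP's attention-based weight adjustment and double-sampling mechanism are not reproduced here, and the alpha/beta values are illustrative.

```python
# Background sketch: standard prioritized experience replay sampling with
# importance-sampling weights -- the source of the estimation error that
# ALAP is designed to correct.
import numpy as np

rng = np.random.default_rng(0)
td_errors = np.abs(rng.normal(size=1000)) + 1e-6   # |TD error| per stored transition

alpha, beta = 0.6, 0.4                              # illustrative exponents
priorities = td_errors ** alpha
probs = priorities / priorities.sum()

batch_idx = rng.choice(len(td_errors), size=32, p=probs, replace=False)

# IS weights correct for the non-uniform sampling; normalizing by the maximum
# weight only scales updates down, as is common in PER implementations.
weights = (len(td_errors) * probs[batch_idx]) ** (-beta)
weights /= weights.max()

print(batch_idx[:5], weights[:5])
```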

A plug-and-play synthetic data deep learning for undersampled magnetic resonance image reconstruction

  • paper_url: http://arxiv.org/abs/2309.06681
  • repo_url: None
  • paper_authors: Min Xiao, Zi Wang, Jiefeng Guo, Xiaobo Qu
  • for: Accelerating magnetic resonance imaging (MRI) in modern medical diagnosis by shortening scan time.
  • methods: A deep plug-and-play approach: a deep denoiser is trained on synthetic data to remove general white Gaussian noise and then plugged into an iterative reconstruction algorithm, so the method adapts to different undersampling settings without retraining.
  • results: On in vivo data, the proposed plug-and-play method delivers robust accelerated image reconstruction across different undersampling patterns and sampling rates, both visually and quantitatively.
    Abstract Magnetic resonance imaging (MRI) plays an important role in modern medical diagnostics but suffers from prolonged scan time. Current deep learning methods for undersampled MRI reconstruction exhibit good performance in image de-aliasing which can be tailored to the specific k-space undersampling scenario. However, it is troublesome to configure different deep networks when the sampling setting changes. In this work, we propose a deep plug-and-play method for undersampled MRI reconstruction, which effectively adapts to different sampling settings. Specifically, the image de-aliasing prior is first learned by a deep denoiser trained to remove general white Gaussian noise from synthetic data. Then the learned deep denoiser is plugged into an iterative algorithm for image reconstruction. Results on in vivo data demonstrate that the proposed method provides robust accelerated image reconstruction performance under different undersampling patterns and sampling rates, both visually and quantitatively.
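The plug-and-play recipe, alternating a data-consistency step in k-space with a denoising step, can be illustrated with a toy sketch in which a simple Gaussian filter stands in for the learned deep denoiser.

```python
# Toy sketch of the plug-and-play idea (not the paper's network): alternate a
# k-space data-consistency step with a denoising step, where Gaussian
# smoothing is a stand-in for the learned deep denoiser.
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(0)
image = np.zeros((64, 64)); image[20:44, 24:40] = 1.0        # toy "anatomy"

mask = rng.random((64, 64)) < 0.33                           # random undersampling
kspace = np.fft.fft2(image) * mask                           # undersampled measurements

x = np.abs(np.fft.ifft2(kspace))                             # zero-filled start
for _ in range(20):
    x = gaussian_filter(x, sigma=1.0)                        # "denoiser" prior step
    k = np.fft.fft2(x)
    k[mask] = kspace[mask]                                   # enforce data consistency
    x = np.abs(np.fft.ifft2(k))

print(f"relative reconstruction error: {np.linalg.norm(x - image) / np.linalg.norm(image):.3f}")
```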

SHARM: Segmented Head Anatomical Reference Models

  • paper_url: http://arxiv.org/abs/2309.06677
  • repo_url: None
  • paper_authors: Essam A. Rashed, Mohammad al-Shatouri, Ilkka Laakso, Akimasa Hirata
  • for: To provide open-access Segmented Head Anatomical Reference Models (SHARM) of the human head, characterizing the distribution of its different tissues.
  • methods: Head segmentation is performed on the open-access IXI MRI dataset using a convolutional neural network architecture named ForkNet+.
  • results: The statistical characteristics of the tissue distributions across ages are highly consistent with real measurements; SHARM is expected to serve as a useful benchmark for electromagnetic dosimetry studies as well as other human head segmentation applications.
    Abstract Reliable segmentation of anatomical tissues of the human head is a major step in several clinical applications such as brain mapping, surgery planning and associated computational simulation studies. Segmentation is based on identifying different anatomical structures by labeling different tissues in medical imaging modalities. The segmentation of brain structures is commonly feasible, with several remarkable contributions mainly from a medical perspective; however, non-brain tissues are of less interest due to their anatomical complexity and the difficulty of observing them using standard medical imaging protocols. The lack of whole-head segmentation methods and the unavailability of large segmented human head datasets limit variability studies, especially in the computational evaluation of electrical brain stimulation (neuromodulation), human protection from electromagnetic fields, and electroencephalography, where non-brain tissues are of great importance. To fill this gap, this study provides open-access Segmented Head Anatomical Reference Models (SHARM) consisting of 196 subjects. These models are segmented into 15 different tissues: skin, fat, muscle, skull cancellous bone, skull cortical bone, brain white matter, brain gray matter, cerebellum white matter, cerebellum gray matter, cerebrospinal fluid, dura, vitreous humor, lens, mucous tissue and blood vessels. The segmented head models are generated from the open-access IXI MRI dataset using a convolutional neural network structure named ForkNet+. Results indicate high consistency between the statistical characteristics of the different tissue distributions across ages and real measurements. SHARM is expected to be a useful benchmark not only for electromagnetic dosimetry studies but also for different human head segmentation applications.

Large Language Models Can Infer Psychological Dispositions of Social Media Users

  • paper_url: http://arxiv.org/abs/2309.08631
  • repo_url: None
  • paper_authors: Heinrich Peters, Sandra Matz
  • for: To investigate the capabilities and inherent biases of large language models (LLMs), given their increasingly human-like performance on natural language processing tasks.
  • methods: GPT-3.5 and GPT-4 are used to infer users' Big Five personality traits from their Facebook status updates in a zero-shot learning scenario.
  • results: LLM-inferred trait scores correlate with self-reports at r = .29 on average (range = [.22, .33]); the inferences show gender- and age-related biases, with smaller errors for women and younger individuals on several traits.
    Abstract As Large Language Models (LLMs) demonstrate increasingly human-like abilities in various natural language processing (NLP) tasks that are bound to become integral to personalized technologies, understanding their capabilities and inherent biases is crucial. Our study investigates the potential of LLMs like ChatGPT to infer psychological dispositions of individuals from their digital footprints. Specifically, we assess the ability of GPT-3.5 and GPT-4 to derive the Big Five personality traits from users' Facebook status updates in a zero-shot learning scenario. Our results show an average correlation of r = .29 (range = [.22, .33]) between LLM-inferred and self-reported trait scores. Furthermore, our findings suggest biases in personality inferences with regard to gender and age: inferred scores demonstrated smaller errors for women and younger individuals on several traits, suggesting a potential systematic bias stemming from the underlying training data or differences in online self-expression.
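The evaluation side of such a study can be sketched as a per-trait Pearson correlation between LLM-inferred and self-reported Big Five scores. The scores below are synthetic stand-ins generated to roughly match the reported effect size; the study's actual data are not reproduced here.

```python
# Evaluation sketch only: per-trait Pearson correlation between LLM-inferred
# and self-reported Big Five scores, with synthetic data standing in for the
# study's (non-public) measurements.
import numpy as np

rng = np.random.default_rng(0)
traits = ["openness", "conscientiousness", "extraversion", "agreeableness", "neuroticism"]

n_users = 500
self_reported = rng.normal(size=(n_users, 5))
# Simulate weakly correlated model inferences (a true effect plus noise).
inferred = 0.3 * self_reported + rng.normal(scale=1.0, size=(n_users, 5))

for t, trait in enumerate(traits):
    r = np.corrcoef(self_reported[:, t], inferred[:, t])[0, 1]
    print(f"{trait:>18}: r = {r:.2f}")
```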

Offline Prompt Evaluation and Optimization with Inverse Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2309.06553
  • repo_url: https://github.com/holarissun/Prompt-OIRL
  • paper_authors: Hao Sun
  • for: To make large language models (LLMs) more effective and affordable to deploy by evaluating and optimizing prompts offline.
  • methods: Offline inverse reinforcement learning (Inverse-RL) on existing expert evaluation data is used to derive a reward model for offline, query-dependent prompt evaluation.
  • results: Prompt-OIRL predicts prompt performance, is cost-efficient, produces human-readable results, and efficiently explores the prompt space.
    Abstract The recent advances in the development of Large Language Models (LLMs) like ChatGPT have achieved remarkable performance by leveraging human expertise. Yet, fully eliciting LLMs' potential for complex tasks requires navigating the vast search space of natural language prompts. While prompt engineering has shown promise, the requisite human-crafted prompts in trial-and-error attempts and the associated costs pose significant challenges. Crucially, the efficiency of prompt optimization hinges on the costly procedure of prompt evaluation. This work introduces Prompt-OIRL, an approach rooted in offline inverse reinforcement learning that seeks to bridge the gap between effective prompt evaluation and affordability. Our method draws on offline datasets from expert evaluations, employing Inverse-RL to derive a reward model for offline, query-dependent prompt evaluations. The advantages of Prompt-OIRL are manifold: it predicts prompt performance, is cost-efficient, produces human-readable results, and efficiently navigates the prompt space. We validate our method across four LLMs and three arithmetic datasets, highlighting its potential as a robust and effective tool for offline prompt evaluation and optimization. Our code and the offline datasets are released, and we note that Prompt-OIRL can be reproduced within a few hours on a single laptop using only a CPU.
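Conceptually, the offline evaluation works by fitting a query-dependent reward model on logged (query, prompt, outcome) records and then ranking candidate prompts for a new query without fresh LLM calls. Below is a simplified sketch under assumed features and a logistic reward model; it is not the released Prompt-OIRL code.

```python
# Conceptual sketch (not the Prompt-OIRL implementation): learn a
# query-dependent reward model from offline records and use it to rank
# candidate prompts for a new query without calling the LLM.
import numpy as np

rng = np.random.default_rng(0)

# Offline dataset: each row concatenates query features and prompt features.
n, dq, dp = 2000, 8, 4
X = rng.normal(size=(n, dq + dp))
true_w = rng.normal(size=dq + dp)
y = (1 / (1 + np.exp(-X @ true_w)) > rng.random(n)).astype(float)   # 1 = prompt succeeded

# Fit a logistic reward model by gradient descent.
w = np.zeros(dq + dp)
for _ in range(500):
    p = 1 / (1 + np.exp(-X @ w))
    w -= 0.1 * X.T @ (p - y) / n

# Offline prompt selection: score candidate prompts for one held-out query.
query = rng.normal(size=dq)
candidate_prompts = rng.normal(size=(5, dp))
scores = [1 / (1 + np.exp(-np.concatenate([query, c]) @ w)) for c in candidate_prompts]
print("best candidate prompt:", int(np.argmax(scores)))
```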

A Health Monitoring System Based on Flexible Triboelectric Sensors for Intelligence Medical Internet of Things and its Applications in Virtual Reality

  • paper_url: http://arxiv.org/abs/2309.07185
  • repo_url: None
  • paper_authors: Junqi Mao, Puen Zhou, Xiaoyao Wang, Hongbo Yao, Liuyang Liang, Yiqiao Zhao, Jiawei Zhang, Dayan Ban, Haiwu Zheng
  • for: To build a reliable and intelligent Internet of Medical Things (IoMT) system that meets the needs of precision medicine, intelligent healthcare, and telemedicine in the era of digitalization and intelligence.
  • methods: Flexible wearable triboelectric sensors embedded in a wristband are combined with deep learning-assisted data analytics to build an intelligent healthcare monitoring system that tracks and analyzes the limb movements of patients with Parkinson's disease.
  • results: The resulting system is cost-effective, easily fabricated, highly sensitive, and intelligent; it accurately captures and analyzes the subtle movements and fine motor activity of Parkinson's patients, providing insightful feedback and comprehensive assessment of their conditions and underscoring the potential of human body sensing technology in a Health 4.0 society.
    Abstract The Internet of Medical Things (IoMT) is a platform that combines Internet of Things (IoT) technology with medical applications, enabling the realization of precision medicine, intelligent healthcare, and telemedicine in the era of digitalization and intelligence. However, the IoMT faces various challenges, including sustainable power supply, the human adaptability of sensors and the intelligence of sensors. In this study, we designed a robust and intelligent IoMT system through the synergistic integration of flexible wearable triboelectric sensors and deep learning-assisted data analytics. We embedded four triboelectric sensors into a wristband to detect and analyze limb movements in patients suffering from Parkinson's Disease (PD). By further integrating deep learning-assisted data analytics, we actualized an intelligent healthcare monitoring system for the surveillance of and interaction with PD patients, which includes location/trajectory tracking, heart monitoring and identity recognition. This innovative approach enabled us to accurately capture and scrutinize the subtle movements and fine motor activity of PD patients, thus providing insightful feedback and a comprehensive assessment of the patients' conditions. This monitoring system is cost-effective, easily fabricated, highly sensitive, and intelligent, which underscores the immense potential of human body sensing technology in a Health 4.0 society.