cs.AI - 2023-08-08

A Lightweight and Accurate Face Detection Algorithm Based on Retinaface

  • paper_url: http://arxiv.org/abs/2308.04340
  • repo_url: None
  • paper_authors: Baozhu Liu, Hewei Yu
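  • for: This paper proposes LAFD (Light and Accurate Face Detection), a lightweight and accurate face detection algorithm based on Retinaface.
  • methods: The backbone is a modified MobileNetV3 that adjusts the convolution kernel size, the channel expansion multiplier of the inverted residual blocks, and the use of the SE attention mechanism. A deformable convolution network (DCN) is introduced in the context module, and focal loss replaces cross-entropy as the classification loss.
  • results: On the WIDERFACE dataset, LAFD reaches an average accuracy of 94.1%, 92.2% and 82.1% on the "easy", "medium" and "hard" validation subsets, an improvement of 3.4%, 4.0% and 8.3% over Retinaface and of 3.1%, 4.1% and 4.1% over the lightweight model LFFD. When the input image is pre-processed and scaled to 1560px in length or 1200px in width, the model reaches 86.2% on the "hard" subset. The model size is only 10.2MB.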
    Abstract In this paper, we propose a lightweight and accurate face detection algorithm LAFD (Light and accurate face detection) based on Retinaface. The backbone network in the algorithm is a modified MobileNetV3 network which adjusts the size of the convolution kernel, the channel expansion multiplier of the inverted residuals block and the use of the SE attention mechanism. A deformable convolution network (DCN) is introduced in the context module, and the algorithm uses the focal loss function instead of the cross-entropy loss function as the classification loss function of the model. The test results on the WIDERFACE dataset indicate that the average accuracy of LAFD is 94.1%, 92.2% and 82.1% for the "easy", "medium" and "hard" validation subsets respectively, an improvement of 3.4%, 4.0% and 8.3% compared to Retinaface and 3.1%, 4.1% and 4.1% higher than the well-performing lightweight model, LFFD. If the input image is pre-processed and scaled to 1560px in length or 1200px in width, the model achieves an average accuracy of 86.2% on the "hard" validation subset. The model is lightweight, with a size of only 10.2MB.
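
The switch from cross-entropy to focal loss described above can be illustrated with the standard binary focal loss, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t). The sketch below is a minimal pure-Python version for a single prediction; the values gamma = 2 and alpha = 0.25 are common defaults assumed here for illustration, not settings reported for LAFD.

```python
import math


def binary_cross_entropy(p, y):
    """Cross-entropy for one prediction p = P(class 1) and label y in {0, 1}."""
    p_t = p if y == 1 else 1.0 - p
    return -math.log(p_t)


def binary_focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Focal loss for one prediction (gamma and alpha are assumed defaults)."""
    p_t = p if y == 1 else 1.0 - p              # probability of the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha  # class-balancing weight
    # (1 - p_t)^gamma shrinks the loss of well-classified examples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# Relative weight that focal loss gives an easy and a hard positive example.
easy_ratio = binary_focal_loss(0.9, 1) / binary_cross_entropy(0.9, 1)
hard_ratio = binary_focal_loss(0.1, 1) / binary_cross_entropy(0.1, 1)
```

With gamma = 0 and alpha = 1 the positive-class loss reduces to plain cross-entropy, which makes the modulating factor easy to check in isolation.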

Pengembangan Model untuk Mendeteksi Kerusakan pada Terumbu Karang dengan Klasifikasi Citra (Development of a Model for Detecting Coral Reef Damage with Image Classification)

  • paper_url: http://arxiv.org/abs/2308.04337
  • repo_url: None
  • paper_authors: Fadhil Muhammad, Alif Bintang Elfandra, Iqbal Pahlevi Amin, Alfan Farizki Wicaksono
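  • for: This study aims to develop an accurate classification model that distinguishes healthy corals from bleached corals.
  • methods: The study uses machine learning models, in particular convolutional neural networks (CNNs), to recognize and differentiate the visual patterns of healthy and bleached corals.
  • results: A ResNet model trained from scratch can outperform pretrained models in precision and accuracy. Such models can help researchers and marine biologists understand coral reef health and monitor changes in the reef environment, supporting conservation and ecosystem restoration.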
    Abstract The abundant biodiversity of coral reefs in Indonesian waters is a valuable asset that needs to be preserved. Rapid climate change and uncontrolled human activities have led to the degradation of coral reef ecosystems, including coral bleaching, which is a critical indicator of coral health conditions. Therefore, this research aims to develop an accurate classification model to distinguish between healthy corals and corals experiencing bleaching. This study utilizes a specialized dataset consisting of 923 images collected from Flickr using the Flickr API. The dataset comprises two distinct classes: healthy corals (438 images) and bleached corals (485 images). These images have been resized to a maximum of 300 pixels in width or height, whichever is larger, to maintain consistent sizes across the dataset. The method employed in this research involves the use of machine learning models, particularly convolutional neural networks (CNN), to recognize and differentiate visual patterns associated with healthy and bleached corals. In this context, the dataset can be used to train and test various classification models to achieve optimal results. By leveraging the ResNet model, it was found that a from-scratch ResNet model can outperform pretrained models in terms of precision and accuracy. The success in developing accurate classification models will greatly benefit researchers and marine biologists in gaining a better understanding of coral reef health. These models can also be employed to monitor changes in the coral reef environment, thereby making a significant contribution to conservation and ecosystem restoration efforts that have far-reaching impacts on life.
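
The resizing rule in the abstract (the larger of width and height capped at 300 pixels) can be sketched as a small helper. This is an illustration only: the function name `fit_within` is hypothetical, and it assumes the aspect ratio is preserved and that smaller images are left untouched, neither of which the abstract states explicitly.

```python
def fit_within(width, height, max_side=300):
    """Return (width, height) scaled so the longer side is at most max_side.

    Assumes aspect ratio is preserved and images are never upscaled.
    """
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    scale = max_side / longest
    # Round to whole pixels and never collapse a side to zero.
    return max(1, round(width * scale)), max(1, round(height * scale))
```

For example, a 1200x800 photo would map to 300x200 under these assumptions, while a 200x100 thumbnail would be returned unchanged.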

Cooperative Multi-agent Bandits: Distributed Algorithms with Optimal Individual Regret and Constant Communication Costs

  • paper_url: http://arxiv.org/abs/2308.04314
  • repo_url: None
  • paper_authors: Lin Yang, Xuchuang Wang, Mohammad Hajiesmaili, Lijun Zhang, John C. S. Lui, Don Towsley
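  • for: This paper studies cooperative multi-agent multi-armed bandits, where distributed agents play the same bandit game, aiming for optimal group and individual regret with low communication between agents.
  • methods: The paper designs a simple communication policy and integrates it into a learning algorithm for cooperative bandits, building on the leader-follower and fully distributed paradigms used in prior work.
  • results: The algorithm achieves optimal individual regret and constant communication costs, combining the strengths of leader-follower and fully distributed algorithms.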
    Abstract Recently, there has been extensive study of cooperative multi-agent multi-armed bandits where a set of distributed agents cooperatively play the same multi-armed bandit game. The goal is to develop bandit algorithms with the optimal group and individual regrets and low communication between agents. The prior work tackled this problem using two paradigms: leader-follower and fully distributed algorithms. Prior algorithms in both paradigms achieve the optimal group regret. The leader-follower algorithms achieve constant communication costs but fail to achieve optimal individual regrets. The state-of-the-art fully distributed algorithms achieve optimal individual regrets but fail to achieve constant communication costs. This paper presents a simple yet effective communication policy and integrates it into a learning algorithm for cooperative bandits. Our algorithm achieves the best of both paradigms: optimal individual regret and constant communication costs.

A Comparative Study of Code Generation using ChatGPT 3.5 across 10 Programming Languages

  • paper_url: http://arxiv.org/abs/2308.04477
  • repo_url: https://github.com/abuscemi02/A-Comparative-Study-of-Code-Generation-using-ChatGPT-3.5-across-10-Programming-Languages
  • paper_authors: Alessio Buscemi
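  • for: This study evaluates the code generation ability of ChatGPT 3.5, the language model released by OpenAI in November 2022.
  • methods: The model's skill in creating code snippets is evaluated across 10 programming languages and 4 software domains.
  • results: The study identifies major unexpected behaviors and limitations of the model and examines the implications of automated code generation for the evolution of programming languages and for the tech industry.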
    Abstract Large Language Models (LLMs) are advanced Artificial Intelligence (AI) systems that have undergone extensive training using large datasets in order to understand and produce language that closely resembles that of humans. These models have reached a level of proficiency where they are capable of successfully completing university exams across several disciplines and generating functional code to handle novel problems. This research investigates the coding proficiency of ChatGPT 3.5, a LLM released by OpenAI in November 2022, which has gained significant recognition for its impressive text generating and code creation capabilities. The skill of the model in creating code snippets is evaluated across 10 various programming languages and 4 different software domains. Based on the findings derived from this research, major unexpected behaviors and limitations of the model have been identified. This study aims to identify potential areas for development and examine the ramifications of automated code generation on the evolution of programming languages and on the tech industry.

Apple Vision Pro for Healthcare: “The Ultimate Display”? – Entering the Wonderland of Precision

  • paper_url: http://arxiv.org/abs/2308.04313
  • repo_url: None
  • paper_authors: Jan Egger, Christina Gsaxner, Xiaojun Chen, Jiang Bian, Jens Kleesiek, Behrus Puladi
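  • for: This perspective paper examines the Apple Vision Pro, a Mixed Reality (MR) headset: a Virtual Reality (VR) device whose Video See-Through (VST) capability also makes it an Augmented Reality (AR) device.
  • methods: The paper reviews the headset's features. AR is enabled by streaming the real world via cameras to the VR screens in front of the user's eyes, and a button on top called the "Digital Crown" lets the user blend digital content with the physical space by turning it.
  • results: The paper discusses whether the Vision Pro could overcome clinical challenges that AR still faces in the medical domain and support clinicians in essential tasks so that they can spend more time with their patients.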
    Abstract At the Worldwide Developers Conference (WWDC) in June 2023, Apple introduced the Vision Pro. The Vision Pro is a Mixed Reality (MR) headset, more specifically it is a Virtual Reality (VR) device with an additional Video See-Through (VST) capability. The VST capability turns the Vision Pro also into an Augmented Reality (AR) device. The AR feature is enabled by streaming the real world via cameras to the (VR) screens in front of the user's eyes. This is of course not unique and similar to other devices, like the Varjo XR-3. Nevertheless, the Vision Pro has some interesting features, like an inside-out screen that can show the headset wearers' eyes to "outsiders" or a button on the top, called "Digital Crown", that allows you to seamlessly blend digital content with your physical space by turning it. In addition, it is untethered, except for the cable to the battery, which makes the headset more agile, compared to the Varjo XR-3. This could actually come closer to the "Ultimate Display", which Ivan Sutherland had already sketched in 1965. Not available to the public yet, like the Ultimate Display, we want to take a look into the crystal ball in this perspective to see if it can overcome some clinical challenges that - especially - AR still faces in the medical domain, but also go beyond and discuss if the Vision Pro could support clinicians in essential tasks to spend more time with their patients.

Interpretable Goal-Based model for Vehicle Trajectory Prediction in Interactive Scenarios

  • paper_url: http://arxiv.org/abs/2308.04312
  • repo_url: None
  • paper_authors: Amina Ghoul, Itheri Yahiaoui, Anne Verroust-Blondet, Fawzi Nashashibi
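  • for: Predicting vehicle trajectories in interactive urban scenarios to improve road safety in autonomous driving.
  • methods: The model combines the interpretability of a discrete choice model with the high accuracy of a neural network-based model.
  • results: Implemented and evaluated on the INTERACTION dataset, the proposed architecture explains its predictions without compromising accuracy.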
    Abstract The abilities to understand the social interaction behaviors between a vehicle and its surroundings while predicting its trajectory in an urban environment are critical for road safety in autonomous driving. Social interactions are hard to explain because of their uncertainty. In recent years, neural network-based methods have been widely used for trajectory prediction and have been shown to outperform hand-crafted methods. However, these methods suffer from their lack of interpretability. In order to overcome this limitation, we combine the interpretability of a discrete choice model with the high accuracy of a neural network-based model for the task of vehicle trajectory prediction in an interactive environment. We implement and evaluate our model using the INTERACTION dataset and demonstrate the effectiveness of our proposed architecture to explain its predictions without compromising the accuracy.
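
The interpretability of a discrete choice model comes from its explicit utilities. The sketch below shows a generic multinomial logit, P(i) = exp(u_i) / sum_j exp(u_j), over candidate goals with linear utilities. The feature names and weights are hypothetical and chosen only for illustration; this is not the architecture from the paper.

```python
import math


def choice_probabilities(utilities):
    """Multinomial logit: softmax over the utilities of the alternatives."""
    m = max(utilities)  # subtract the max for numerical stability
    exps = [math.exp(u - m) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]


def linear_utility(features, weights):
    """Utility as a weighted sum of named features (readable coefficients)."""
    return sum(weights[name] * value for name, value in features.items())


# Hypothetical weights: farther goals and sharper turns are less attractive.
weights = {"distance_m": -0.05, "heading_change_rad": -1.0}
goals = [
    {"distance_m": 20.0, "heading_change_rad": 0.0},  # straight ahead
    {"distance_m": 20.0, "heading_change_rad": 1.5},  # sharp turn
]
probs = choice_probabilities([linear_utility(g, weights) for g in goals])
```

Because each weight multiplies a named feature, the sign and size of a coefficient can be read directly as the effect of that feature on the predicted choice.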

Vehicle Motion Forecasting using Prior Information and Semantic-assisted Occupancy Grid Maps

  • paper_url: http://arxiv.org/abs/2308.04303
  • repo_url: None
  • paper_authors: Rabbia Asghar, Manuel Diaz-Zapata, Lukas Rummelhard, Anne Spalanzani, Christian Laugier
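  • for: This paper addresses motion prediction for autonomous vehicles, which is difficult because of sensor uncertainty, the non-deterministic nature of the future, and the complex behavior of agents.
  • methods: The scene is represented as dynamic occupancy grid maps (DOGMs), with semantic labels associated with occupied cells and map information incorporated. A framework combining deep-learning-based spatio-temporal and probabilistic approaches predicts vehicle behaviors.
  • results: On the real-world NuScenes dataset, evaluated against ground truth annotations, the model predicts both static and dynamic vehicles better than OGM predictions. An ablation study assesses the role of the semantic labels and the map in the architecture.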
    Abstract Motion prediction is a challenging task for autonomous vehicles due to uncertainty in the sensor data, the non-deterministic nature of the future, and the complex behavior of agents. In this paper, we tackle this problem by representing the scene as dynamic occupancy grid maps (DOGMs), associating semantic labels to the occupied cells and incorporating map information. We propose a novel framework that combines deep-learning-based spatio-temporal and probabilistic approaches to predict vehicle behaviors. Contrary to the conventional OGM prediction methods, evaluation of our work is conducted against the ground truth annotations. We experiment and validate our results on the real-world NuScenes dataset and show that our model shows superior ability to predict both static and dynamic vehicles compared to OGM predictions. Furthermore, we perform an ablation study and assess the role of semantic labels and map in the architecture.

Actor-Critic with variable time discretization via sustained actions

  • paper_url: http://arxiv.org/abs/2308.04299
  • repo_url: None
  • paper_authors: Jakub Łyskawa, Paweł Wawrzyński
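  • for: This work applies reinforcement learning (RL) to inherently continuous problems such as robotic control and studies how the choice of time discretization affects training and final performance.
  • methods: The paper proposes SusACER, an off-policy RL algorithm that starts with sparse time discretization and gradually switches to a fine one, combining the advantages of both settings.
  • results: In the robotic control environments Ant, HalfCheetah, Hopper, and Walker2D, SusACER outperforms the state of the art.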
    Abstract Reinforcement learning (RL) methods work in discrete time. In order to apply RL to inherently continuous problems like robotic control, a specific time discretization needs to be defined. This is a choice between sparse time control, which may be easier to train, and finer time control, which may allow for better ultimate performance. In this work, we propose SusACER, an off-policy RL algorithm that combines the advantages of different time discretization settings. Initially, it operates with sparse time discretization and gradually switches to a fine one. We analyze the effects of the changing time discretization in robotic control environments: Ant, HalfCheetah, Hopper, and Walker2D. In all cases our proposed algorithm outperforms state of the art.
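
The idea of moving from sparse to fine time control can be sketched by holding each chosen action for several environment steps and shrinking that hold length during training. The linear schedule, the start value of 8, and the function names below are assumptions made for illustration; the abstract does not specify how SusACER schedules the change.

```python
def sustain_length(step, total_steps, start=8, end=1):
    """Number of environment steps each action is held at a training step.

    Assumed schedule: linear anneal from `start` (sparse control)
    down to `end` (fine control).
    """
    frac = min(max(step / total_steps, 0.0), 1.0)
    return max(end, round(start + frac * (end - start)))


def sustain(decisions, hold):
    """Expand a list of decisions into per-step actions, repeating each `hold` times."""
    return [action for action in decisions for _ in range(hold)]
```

Early in training, `sustain(["left", "right"], sustain_length(0, 100))` would hold each action for 8 steps; at the end of the schedule every environment step gets its own decision.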

Engineering LaCAM$^\ast$: Towards Real-Time, Large-Scale, and Near-Optimal Multi-Agent Pathfinding

  • paper_url: http://arxiv.org/abs/2308.04292
  • repo_url: None
  • paper_authors: Keisuke Okumura
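  • for: This paper addresses real-time, large-scale, and near-optimal multi-agent pathfinding (MAPF) through enhancements to the recently proposed LaCAM* algorithm.
  • methods: The paper introduces several improvement techniques, partly inspired by other MAPF methods, to improve the initial solution quality and convergence speed of LaCAM*.
  • results: Empirical evidence shows that combining these techniques significantly improves the solution quality of LaCAM*, further pushing the boundaries of MAPF algorithms.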
    Abstract This paper addresses the challenges of real-time, large-scale, and near-optimal multi-agent pathfinding (MAPF) through enhancements to the recently proposed LaCAM* algorithm. LaCAM* is a scalable search-based algorithm that guarantees the eventual finding of optimal solutions for cumulative transition costs. While it has demonstrated remarkable planning success rates, surpassing various state-of-the-art MAPF methods, its initial solution quality is far from optimal, and its convergence speed to the optimum is slow. To overcome these limitations, this paper introduces several improvement techniques, partly drawing inspiration from other MAPF methods. We provide empirical evidence that the fusion of these techniques significantly improves the solution quality of LaCAM*, thus further pushing the boundaries of MAPF algorithms.

In-Context Alignment: Chat with Vanilla Language Models Before Fine-Tuning

  • paper_url: http://arxiv.org/abs/2308.04275
  • repo_url: https://github.com/xhan77/in-context-alignment
  • paper_authors: Xiaochuang Han
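  • for: This note explores inference-time alignment through in-context learning.
  • methods: It takes the vanilla pretrained language model Llama-2, before any fine-tuning, and retrieves an average of 9 demonstration alignment examples when the model is prompted to follow chat-style instructions.
  • results: Compared to direct prompting, in-context alignment without changing model weights leads to a 7x increase in win-rate w.r.t. OpenAI's text-davinci-003, making the vanilla language model comparable to strong baselines with alignment fine-tuning.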
    Abstract In this note, we explore inference-time alignment through in-context learning. We consider a vanilla pretrained language model Llama-2 before any fine-tuning and retrieve an average of 9 demonstration alignment examples when the model is prompted to follow chat-style instructions. Compared to direct prompting, the in-context alignment without changing model weights leads to a 7x increase in win-rate w.r.t. the text-davinci-003 model from OpenAI, making the vanilla language model comparable to strong baselines with alignment fine-tuning.
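
The retrieve-then-prompt idea can be sketched as follows. The retriever here is a simple word-overlap (Jaccard) ranking, the prompt template is made up, and the demonstrations are toy data; all three are assumptions for illustration, since the abstract does not describe the retriever or template that the note uses.

```python
def jaccard(a, b):
    """Word-overlap similarity between two strings (stand-in retriever score)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


def retrieve(query, demos, k):
    """Return the k demonstrations whose instructions best match the query."""
    ranked = sorted(demos, key=lambda d: jaccard(query, d["instruction"]), reverse=True)
    return ranked[:k]


def build_prompt(query, demos, k=2):
    """Prepend retrieved demonstrations to the query; model weights are untouched."""
    blocks = [
        f"Instruction: {d['instruction']}\nResponse: {d['response']}"
        for d in retrieve(query, demos, k)
    ]
    blocks.append(f"Instruction: {query}\nResponse:")
    return "\n\n".join(blocks)


# Toy demonstration pool (hypothetical data).
demos = [
    {"instruction": "How do I reverse a list in Python?", "response": "Use reversed(xs) or xs[::-1]."},
    {"instruction": "Suggest a name for a bakery.", "response": "How about 'Rise and Shine'?"},
]
prompt = build_prompt("How do I sort a list in Python?", demos, k=1)
```

The resulting prompt would be passed to the pretrained model as-is; the note reports retrieving about 9 demonstrations on average, whereas `k` here is just a parameter of the sketch.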

Lossy and Lossless (L$^2$) Post-training Model Size Compression

  • paper_url: http://arxiv.org/abs/2308.04269
  • repo_url: https://github.com/modeltc/l2_compression
  • paper_authors: Yumeng Shi, Shihao Bai, Xiuying Wei, Ruihao Gong, Jianlei Yang
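  • for: This work aims to make deep neural networks easier to transmit and store by combining lossy and lossless compression.
  • methods: It proposes a post-training model size compression method built on a unified parametric weight transformation, which lets different lossy compression methods be performed jointly in a post-training manner. A dedicated differentiable counter guides the optimization of lossy compression toward a point better suited to the later lossless compression.
  • results: The method achieves a stable $10\times$ compression ratio without sacrificing accuracy and a $20\times$ compression ratio with minor accuracy loss in a short time. Code is available at https://github.com/ModelTC/L2_Compression.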
    Abstract Deep neural networks have delivered remarkable performance and have been widely used in various visual tasks. However, their huge size causes significant inconvenience for transmission and storage. Many previous studies have explored model size compression. However, these studies often approach various lossy and lossless compression methods in isolation, leading to challenges in achieving high compression ratios efficiently. This work proposes a post-training model size compression method that combines lossy and lossless compression in a unified way. We first propose a unified parametric weight transformation, which ensures different lossy compression methods can be performed jointly in a post-training manner. Then, a dedicated differentiable counter is introduced to guide the optimization of lossy compression to arrive at a more suitable point for later lossless compression. Additionally, our method can easily control a desired global compression ratio and allocate adaptive ratios for different layers. Finally, our method can achieve a stable $10\times$ compression ratio without sacrificing accuracy and a $20\times$ compression ratio with minor accuracy loss in a short time. Our code is available at https://github.com/ModelTC/L2_Compression .
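
Chaining a lossy step with a lossless step can be shown with a deliberately simple pipeline: uniform 8-bit quantization followed by DEFLATE from Python's standard `zlib` module. This is a generic sketch under those assumptions, with synthetic Gaussian weights; it does not implement the paper's unified weight transformation or its differentiable counter.

```python
import random
import struct
import zlib


def quantize_int8(weights):
    """Lossy step: map floats to integer codes in [-127, 127] with one shared scale."""
    scale = max(abs(w) for w in weights) / 127.0
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def compress(weights):
    """Lossy quantization followed by lossless DEFLATE coding of the codes."""
    codes, scale = quantize_int8(weights)
    payload = zlib.compress(struct.pack(f"{len(codes)}b", *codes), 9)
    return payload, scale


def decompress(payload, scale):
    """Invert the lossless step exactly, then dequantize."""
    raw = zlib.decompress(payload)
    return [c * scale for c in struct.unpack(f"{len(raw)}b", raw)]


random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(4096)]  # synthetic layer
payload, scale = compress(weights)
restored = decompress(payload, scale)
ratio = (4 * len(weights)) / len(payload)  # relative to float32 storage
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

In this toy setup the only error comes from the quantization step, bounded by half a quantization step, while the entropy-coding step is exactly reversible. How well the lossless coder does depends on the distribution of the quantized codes, which is the interaction the paper's joint optimization targets.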

Teacher-Student Architecture for Knowledge Distillation: A Survey

  • paper_url: http://arxiv.org/abs/2308.04268
  • repo_url: None
  • paper_authors: Chengming Hu, Xuan Li, Dan Liu, Haolun Wu, Xi Chen, Ju Wang, Xue Liu
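  • for: This survey addresses the difficulty of deploying deep neural networks (DNNs) in real-world systems because of their large number of parameters.
  • methods: It reviews Teacher-Student architectures, in which simple student networks with few parameters can achieve performance comparable to deep teacher networks with many parameters, covering knowledge representations, optimization objectives, learning algorithms, and distillation schemes.
  • results: The survey organizes Teacher-Student architectures across multiple distillation objectives (knowledge compression, expansion, adaptation, and enhancement), summarizes applications in classification, recognition, generation, ranking, and regression, and outlines research directions in architecture design, knowledge quality, and theoretical studies of regression-based learning.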
    Abstract Although Deep neural networks (DNNs) have shown a strong capacity to solve large-scale problems in many areas, such DNNs are hard to be deployed in real-world systems due to their voluminous parameters. To tackle this issue, Teacher-Student architectures were proposed, where simple student networks with a few parameters can achieve comparable performance to deep teacher networks with many parameters. Recently, Teacher-Student architectures have been effectively and widely embraced on various knowledge distillation (KD) objectives, including knowledge compression, knowledge expansion, knowledge adaptation, and knowledge enhancement. With the help of Teacher-Student architectures, current studies are able to achieve multiple distillation objectives through lightweight and generalized student networks. Different from existing KD surveys that primarily focus on knowledge compression, this survey first explores Teacher-Student architectures across multiple distillation objectives. This survey presents an introduction to various knowledge representations and their corresponding optimization objectives. Additionally, we provide a systematic overview of Teacher-Student architectures with representative learning algorithms and effective distillation schemes. This survey also summarizes recent applications of Teacher-Student architectures across multiple purposes, including classification, recognition, generation, ranking, and regression. Lastly, potential research directions in KD are investigated, focusing on architecture design, knowledge quality, and theoretical studies of regression-based learning, respectively. Through this comprehensive survey, industry practitioners and the academic community can gain valuable insights and guidelines for effectively designing, learning, and applying Teacher-Student architectures on various distillation objectives.
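
One widely used optimization objective for knowledge compression is the soft-target loss: the student matches the teacher's temperature-softened output distribution. The sketch below computes T^2 * KL(teacher || student) for a single example in pure Python. The temperature of 4 is an assumed value, and this is one classic objective rather than a summary of every scheme the survey covers.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; higher temperature gives softer targets."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """T^2 * KL(teacher || student) on temperature-softened distributions."""
    p = softmax(teacher_logits, temperature)  # teacher soft targets
    q = softmax(student_logits, temperature)  # student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)
    return temperature ** 2 * kl
```

The loss is zero when the student reproduces the teacher's logits and grows as the two distributions diverge; in practice it is usually mixed with the ordinary supervised loss on ground-truth labels.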

FLIRT: Feedback Loop In-context Red Teaming

  • paper_url: http://arxiv.org/abs/2308.04265
  • repo_url: None
  • paper_authors: Ninareh Mehrabi, Palash Goyal, Christophe Dupuy, Qian Hu, Shalini Ghosh, Richard Zemel, Kai-Wei Chang, Aram Galstyan, Rahul Gupta
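  • for: This paper aims to test and analyze vulnerabilities of generative models as they become available for public use.
  • methods: The paper proposes an automatic red teaming framework that evaluates a given model and exposes its vulnerabilities to unsafe and inappropriate content generation. The framework uses in-context learning in a feedback loop, with different in-context attack strategies that automatically learn effective and diverse adversarial prompts for text-to-image models.
  • results: Compared with baseline approaches, the proposed strategy is significantly more effective at exposing vulnerabilities in the Stable Diffusion (SD) model, even when it is enhanced with safety features. The framework is also effective for red teaming text-to-text models, yielding a significantly higher toxic response generation rate than previously reported numbers.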
    Abstract Warning: this paper contains content that may be inappropriate or offensive. As generative models become available for public use in various applications, testing and analyzing vulnerabilities of these models has become a priority. Here we propose an automatic red teaming framework that evaluates a given model and exposes its vulnerabilities against unsafe and inappropriate content generation. Our framework uses in-context learning in a feedback loop to red team models and trigger them into unsafe content generation. We propose different in-context attack strategies to automatically learn effective and diverse adversarial prompts for text-to-image models. Our experiments demonstrate that compared to baseline approaches, our proposed strategy is significantly more effective in exposing vulnerabilities in Stable Diffusion (SD) model, even when the latter is enhanced with safety features. Furthermore, we demonstrate that the proposed framework is effective for red teaming text-to-text models, resulting in significantly higher toxic response generation rate compared to previously reported numbers.
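
The feedback-loop structure can be sketched in an abstract form: a generator proposes a new test prompt from the current in-context exemplars, an evaluator scores the outcome, and the exemplar set is updated when the candidate scores higher. The update rule (replace the lowest-scoring exemplar) and the callables `generate` and `score` are placeholders assumed for illustration; the abstract does not specify the paper's strategies at this level of detail.

```python
def red_team_loop(seed_prompts, generate, score, rounds=5):
    """Generic in-context feedback loop.

    generate(exemplars) -> candidate prompt   (placeholder for the red-team model)
    score(prompt)       -> number             (placeholder for the safety evaluator)
    """
    exemplars = [(score(p), p) for p in seed_prompts]
    for _ in range(rounds):
        candidate = generate([p for _, p in exemplars])
        candidate_score = score(candidate)
        weakest = min(range(len(exemplars)), key=lambda i: exemplars[i][0])
        # Feedback step: keep the candidate only if it beats the weakest exemplar.
        if candidate_score > exemplars[weakest][0]:
            exemplars[weakest] = (candidate_score, candidate)
    return sorted(exemplars, reverse=True)


# Harmless stand-ins so the loop can be run end to end.
result = red_team_loop(
    ["a", "bb"],
    generate=lambda exemplars: max(exemplars, key=len) + "!",
    score=len,
    rounds=3,
)
```

In an actual evaluation the two stand-ins would be a language model that writes prompts and a classifier that rates the target model's output, and the findings would feed back into safety mitigations.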

PokerKit: A Comprehensive Python Library for Fine-Grained Multi-Variant Poker Game Simulations

  • paper_url: http://arxiv.org/abs/2308.07327
  • repo_url: None
  • paper_authors: Juho Kim
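  • for: This paper describes PokerKit, an open-source Python library that overcomes the restrictions of existing poker game simulation and hand evaluation tools by supporting an extensive array of poker variants and user-defined custom games.
  • methods: The paper details the design and implementation of PokerKit, including its intuitive programmatic API, multi-variant game support, and a unified hand evaluation suite across different hand types.
  • results: PokerKit's reliability has been established through static type checking, extensive doctests, and unit tests, achieving 97% code coverage. The library is a significant contribution to computer poker, fostering future research and advanced AI development for a wide variety of poker games.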
    Abstract PokerKit is an open-source Python library designed to overcome the restrictions of existing poker game simulation and hand evaluation tools, which typically support only a handful of poker variants and lack flexibility in game state control. In contrast, PokerKit significantly expands this scope by supporting an extensive array of poker variants and it provides a flexible architecture for users to define their custom games. This paper details the design and implementation of PokerKit, including its intuitive programmatic API, multi-variant game support, and a unified hand evaluation suite across different hand types. The flexibility of PokerKit allows for applications in diverse areas, such as poker AI development, tool creation, and online poker casino implementation. PokerKit's reliability has been established through static type checking, extensive doctests, and unit tests, achieving 97% code coverage. The introduction of PokerKit represents a significant contribution to the field of computer poker, fostering future research and advanced AI development for a wide variety of poker games.
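
To give a feel for what a hand evaluator does, the sketch below classifies a five-card hand under the standard high-hand ranking using only the Python standard library. It does not use or reproduce PokerKit's API; the function name `classify_five` and the two-character card encoding (for example "As" for the ace of spades) are assumptions made for this illustration.

```python
from collections import Counter

RANKS = "23456789TJQKA"


def classify_five(cards):
    """Classify five cards such as ["As", "Kd", "Qh", "Jc", "Ts"]."""
    ranks = sorted(RANKS.index(card[0]) for card in cards)
    suits = {card[1] for card in cards}
    counts = sorted(Counter(ranks).values(), reverse=True)
    flush = len(suits) == 1
    distinct = len(set(ranks)) == 5
    # The ace also plays low in the five-high straight A-2-3-4-5.
    straight = distinct and (ranks[-1] - ranks[0] == 4 or ranks == [0, 1, 2, 3, 12])
    if straight and flush:
        return "straight flush"
    if counts[0] == 4:
        return "four of a kind"
    if counts == [3, 2]:
        return "full house"
    if flush:
        return "flush"
    if straight:
        return "straight"
    if counts[0] == 3:
        return "three of a kind"
    if counts == [2, 2, 1]:
        return "two pair"
    if counts[0] == 2:
        return "one pair"
    return "high card"
```

A multi-variant library has to go further than this, for instance by comparing hands within a category and supporting low-hand and short-deck rankings, which is the kind of breadth the abstract attributes to PokerKit.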

MindDiffuser: Controlled Image Reconstruction from Human Brain Activity with Semantic and Structural Diffusion

  • paper_url: http://arxiv.org/abs/2308.04249
  • repo_url: https://github.com/reedonepeck/minddiffuser
  • paper_authors: Yizhuo Lu, Changde Du, Qiongyi zhou, Dianpeng Wang, Huiguang He
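  • for: This paper proposes a two-stage image reconstruction model to achieve precise and controllable reconstruction of visual stimuli from brain recordings, a goal relevant to brain-computer interfaces.
  • methods: In Stage 1, VQ-VAE latent representations and CLIP text embeddings decoded from fMRI are fed into Stable Diffusion to produce a preliminary image containing semantic information. In Stage 2, the CLIP visual feature decoded from fMRI serves as supervision, and the two feature vectors from Stage 1 are adjusted through backpropagation to align the structural information.
  • results: Qualitative and quantitative analyses show that the model surpasses current state-of-the-art models on the Natural Scenes Dataset (NSD). Further experiments support its neurobiological plausibility: the multimodal features used are interpretable and align with the corresponding brain responses.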
    Abstract Reconstructing visual stimuli from brain recordings has been a meaningful and challenging task. Especially, the achievement of precise and controllable image reconstruction bears great significance in propelling the progress and utilization of brain-computer interfaces. Despite the advancements in complex image reconstruction techniques, the challenge persists in achieving a cohesive alignment of both semantic (concepts and objects) and structure (position, orientation, and size) with the image stimuli. To address the aforementioned issue, we propose a two-stage image reconstruction model called MindDiffuser. In Stage 1, the VQ-VAE latent representations and the CLIP text embeddings decoded from fMRI are put into Stable Diffusion, which yields a preliminary image that contains semantic information. In Stage 2, we utilize the CLIP visual feature decoded from fMRI as supervisory information, and continually adjust the two feature vectors decoded in Stage 1 through backpropagation to align the structural information. The results of both qualitative and quantitative analyses demonstrate that our model has surpassed the current state-of-the-art models on Natural Scenes Dataset (NSD). The subsequent experimental findings corroborate the neurobiological plausibility of the model, as evidenced by the interpretability of the multimodal feature employed, which align with the corresponding brain responses.

Gloss Alignment Using Word Embeddings

  • paper_url: http://arxiv.org/abs/2308.04248
  • repo_url: None
  • paper_authors: Harry Walsh, Ozge Mercanoglu Sincan, Ben Saunders, Richard Bowden
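  • for: This work aims to improve the alignment between automatically spotted signs and their spoken-language subtitles, so that broadcast data can be used to train unconstrained Sign Language Translation (SLT) models.
  • methods: Sign spottings are aligned with their corresponding subtitles using large spoken language models. Because only a single modality is used, the method is computationally inexpensive and can be combined with existing alignment techniques.
  • results: The method is evaluated quantitatively on the Meine DGS (mDGS) and BBC-Oxford British Sign Language (BOBSL) datasets, recovering up to a 33.22 BLEU-1 score in word alignment.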
    Abstract Capturing and annotating Sign language datasets is a time consuming and costly process. Current datasets are orders of magnitude too small to successfully train unconstrained Sign Language Translation (SLT) models. As a result, research has turned to TV broadcast content as a source of large-scale training data, consisting of both the sign language interpreter and the associated audio subtitle. However, lack of sign language annotation limits the usability of this data and has led to the development of automatic annotation techniques such as sign spotting. These spottings are aligned to the video rather than the subtitle, which often results in a misalignment between the subtitle and spotted signs. In this paper we propose a method for aligning spottings with their corresponding subtitles using large spoken language models. Using a single modality means our method is computationally inexpensive and can be utilized in conjunction with existing alignment techniques. We quantitatively demonstrate the effectiveness of our method on the Meine DGS (mDGS) and BBC-Oxford British Sign Language (BOBSL) datasets, recovering up to a 33.22 BLEU-1 score in word alignment.
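    As a rough illustration of aligning spotted glosses to subtitles with word embeddings, the sketch below assigns each gloss to the subtitle containing its most similar word. The 2-D vectors in `EMB`, the gloss names, and the function `best_subtitle` are all hypothetical; the paper uses embeddings from large spoken language models, and its actual alignment procedure may differ.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical toy embeddings; a real system would query a language model.
EMB = {
    "RAIN": [0.9, 0.1], "rain": [1.0, 0.0], "weather": [0.8, 0.3],
    "DOG": [0.1, 0.9], "dog": [0.0, 1.0], "barks": [0.2, 0.8],
}


def best_subtitle(gloss, subtitles):
    """Index of the subtitle whose best-matching word is closest to the gloss."""
    def score(words):
        sims = [cosine(EMB[gloss], EMB[w]) for w in words if w in EMB]
        return max(sims, default=-1.0)  # subtitles with no known word score lowest

    scores = [score(words) for words in subtitles]
    return scores.index(max(scores))


subtitles = [["the", "dog", "barks"], ["rain", "weather"]]
alignment = {gloss: best_subtitle(gloss, subtitles) for gloss in ["RAIN", "DOG"]}
```

    Here `alignment` maps each spotted gloss to a subtitle index using text similarity alone, which is what makes a single-modality approach cheap.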

AutoPCF: Efficient Product Carbon Footprint Accounting with Large Language Models

  • paper_url: http://arxiv.org/abs/2308.04241
  • repo_url: None
  • paper_authors: Zhu Deng, Jinjie Liu, Biao Luo, Can Yuan, Qingrun Yang, Lei Xiao, Wenwen Zhou, Zhu Liu
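  • for: This study aims to develop an automated framework for product carbon footprint (PCF) accounting that quickly models and estimates the emissions across a product's life cycle.
  • methods: The study tests and compares the emergent ability of five large language models (LLMs) to model "cradle-to-gate" product life cycles and generate inventory data of inputs and outputs. The proposed AutoPCF framework additionally applies deep learning algorithms to automatically match calculation parameters and then calculates the PCF.
  • results: Estimating the carbon footprint of three case products with AutoPCF shows its potential for automatic PCF modeling and estimation, reducing modeling time from days to minutes.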
    Abstract The product carbon footprint (PCF) is crucial for decarbonizing the supply chain, as it measures the direct and indirect greenhouse gas emissions caused by all activities during the product's life cycle. However, PCF accounting often requires expert knowledge and significant time to construct life cycle models. In this study, we test and compare the emergent ability of five large language models (LLMs) in modeling the 'cradle-to-gate' life cycles of products and generating the inventory data of inputs and outputs, revealing their limitations as a generalized PCF knowledge database. By utilizing LLMs, we propose an automatic AI-driven PCF accounting framework, called AutoPCF, which also applies deep learning algorithms to automatically match calculation parameters, and ultimately calculate the PCF. The results of estimating the carbon footprint for three case products using the AutoPCF framework demonstrate its potential in achieving automatic modeling and estimation of PCF with a large reduction in modeling time from days to minutes.
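    The final calculation step can be pictured with the usual activity-based formula, in which each inventory amount is multiplied by a matched emission factor and the products are summed. The sketch below assumes that formula; the item names and numeric values are placeholders invented for illustration, not data from the paper.

```python
def product_carbon_footprint(inventory, emission_factors):
    """Sum of (amount * emission factor) over all inventory items.

    Raises KeyError if an item has no matched emission factor, so that a
    missing match is surfaced rather than silently counted as zero.
    """
    total = 0.0
    for item, amount in inventory.items():
        total += amount * emission_factors[item]
    return total


# Placeholder inventory (amount per product) and factors (kg CO2e per unit).
inventory = {"steel_kg": 2.0, "electricity_kwh": 5.0}
emission_factors = {"steel_kg": 2.0, "electricity_kwh": 0.5}

pcf = product_carbon_footprint(inventory, emission_factors)  # 2*2.0 + 5*0.5 = 6.5
```

    In the framework described above, the inventory would come from an LLM and the factor matching from a deep learning model; both are replaced by hand-written dictionaries here.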

Federated Inference with Reliable Uncertainty Quantification over Wireless Channels via Conformal Prediction

  • paper_url: http://arxiv.org/abs/2308.04237
  • repo_url: None
  • paper_authors: Meiyi Zhu, Matteo Zecchin, Sangwoo Park, Caili Guo, Chunyan Feng, Osvaldo Simeone
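  • for: This paper studies whether, when devices and a server share a pre-trained model, communication from the devices to the server over a shared wireless channel can improve the reliability of the server's inference decision.
  • methods: The paper proposes a protocol called wireless federated conformal prediction (WFCP), which builds on type-based multiple access (TBMA) and a novel quantile correction strategy. WFCP provides formal reliability guarantees on the coverage of the predicted set produced by the server.
  • results: Numerical results show that WFCP has significant advantages over digital implementations of existing federated CP schemes, especially with limited communication resources and/or a large number of devices.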
    Abstract Consider a setting in which devices and a server share a pre-trained model. The server wishes to make an inference on a new input given the model. Devices have access to data, previously not used for training, and can communicate to the server over a common wireless channel. If the devices have no access to the new input, can communication from devices to the server enhance the quality of the inference decision at the server? Recent work has introduced federated conformal prediction (CP), which leverages devices-to-server communication to improve the reliability of the server's decision. With federated CP, devices communicate to the server information about the loss accrued by the shared pre-trained model on the local data, and the server leverages this information to calibrate a decision interval, or set, so that it is guaranteed to contain the correct answer with a pre-defined target reliability level. Previous work assumed noise-free communication, whereby devices can communicate a single real number to the server. In this paper, we study for the first time federated CP in a wireless setting. We introduce a novel protocol, termed wireless federated conformal prediction (WFCP), which builds on type-based multiple access (TBMA) and on a novel quantile correction strategy. WFCP is proved to provide formal reliability guarantees in terms of coverage of the predicted set produced by the server. Using numerical results, we demonstrate the significant advantages of WFCP against digital implementations of existing federated CP schemes, especially in regimes with limited communication resources and/or large number of devices.
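    For readers unfamiliar with conformal prediction, the calibration step that the abstract refers to can be sketched in its standard centralized, noise-free form. This is not WFCP itself: there is no wireless channel, TBMA, or quantile correction here, and the function names and scores are assumptions made for the example.

```python
import math


def conformal_threshold(cal_scores, alpha):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest calibration score.

    If that rank exceeds n, the threshold is infinite and every label is kept.
    """
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf
    return sorted(cal_scores)[k - 1]


def prediction_set(label_scores, threshold):
    """Keep every candidate label whose score (e.g. a loss) is within the threshold."""
    return {label for label, s in label_scores.items() if s <= threshold}


# Losses of the shared model on held-out calibration data (toy values).
cal_scores = [0.3, 0.7, 0.1, 0.6, 0.2, 0.5, 0.4]
q = conformal_threshold(cal_scores, alpha=0.25)

# Scores of each candidate label for a new input (toy values).
pred = prediction_set({"cat": 0.3, "dog": 0.65, "bird": 0.55}, q)
```

    Under exchangeability of calibration and test data, the resulting set contains the correct label with probability at least 1 - alpha; the abstract's contribution is keeping such a guarantee when the scores must travel over a noisy shared channel.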

Semantic Interpretation and Validation of Graph Attention-based Explanations for GNN Models

  • paper_url: http://arxiv.org/abs/2308.04220
  • repo_url: None
  • paper_authors: Efimia Panagiotaki, Daniele De Martini, Lars Kunze
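  • for: The methodology aims to improve the explainability of Graph Neural Network (GNN)-based models by using semantic attention to characterize feature importance.
  • methods: The method introduces semantically-informed perturbations and uses attention weights as importance indicators for semantically sorted feature sets, analysing how the distribution of predicted attention weights correlates with model accuracy.
  • results: Applied to a lidar pointcloud estimation model, the method identifies the key semantic classes that contribute to enhanced performance and generates reliable post-hoc semantic explanations.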
    Abstract In this work, we propose a methodology for investigating the application of semantic attention to enhance the explainability of Graph Neural Network (GNN)-based models, introducing semantically-informed perturbations and establishing a correlation between predicted feature-importance weights and model accuracy. Graph Deep Learning (GDL) has emerged as a promising field for tasks like scene interpretation, leveraging flexible graph structures to concisely describe complex features and relationships. As traditional explainability methods used in eXplainable AI (XAI) cannot be directly applied to such structures, graph-specific approaches are introduced. Attention mechanisms have demonstrated their efficacy in estimating the importance of input features in deep learning models and thus have been previously employed to provide feature-based explanations for GNN predictions. Building upon these insights, we extend existing attention-based graph-explainability methods investigating the use of attention weights as importance indicators of semantically sorted feature sets. Through analysing the behaviour of predicted attention-weights distribution in correlation with model accuracy, we gain valuable insights into feature importance with respect to the behaviour of the GNN model. We apply our methodology to a lidar pointcloud estimation model successfully identifying key semantic classes that contribute to enhanced performance effectively generating reliable post-hoc semantic explanations.

Hybrid Retrieval-Augmented Generation for Real-time Composition Assistance

  • paper_url: http://arxiv.org/abs/2308.04215
  • repo_url: None
  • paper_authors: Xuchao Zhang, Menglin Xia, Camille Couturier, Guoqing Zheng, Saravan Rajmohan, Victor Ruhle
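  • for: Improving language models' contextual understanding and integration of private data, and reducing hallucination, while still meeting the real-time requirements of tasks such as composition assistance.
  • methods: The Hybrid Retrieval-Augmented Generation (HybridRAG) framework combines client and cloud models. A large language model (LLM) in the cloud asynchronously generates retrieval-augmented memory, which the client model integrates.
  • results: On Wikitext and Pile subsets, HybridRAG achieves lower latency than a cloud-based retrieval-augmented LLM while outperforming client-only models in utility.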
    Abstract Retrieval augmented models show promise in enhancing traditional language models by improving their contextual understanding, integrating private data, and reducing hallucination. However, the processing time required for retrieval augmented large language models poses a challenge when applying them to tasks that require real-time responses, such as composition assistance. To overcome this limitation, we propose the Hybrid Retrieval-Augmented Generation (HybridRAG) framework that leverages a hybrid setting that combines both client and cloud models. HybridRAG incorporates retrieval-augmented memory generated asynchronously by a Large Language Model (LLM) in the cloud. By integrating this retrieval augmented memory, the client model acquires the capability to generate highly effective responses, benefiting from the LLM's capabilities. Furthermore, through asynchronous memory integration, the client model is capable of delivering real-time responses to user requests without the need to wait for memory synchronization from the cloud. Our experiments on Wikitext and Pile subsets show that HybridRAG achieves lower latency than a cloud-based retrieval-augmented LLM, while outperforming client-only models in utility.
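    The asynchronous memory integration described in the abstract can be illustrated with a very small sketch: the client answers immediately with whatever memory it currently holds, while a background worker refreshes that memory. Everything here is hypothetical (`ClientModel`, `cloud_worker`, and the string-concatenation "generation"); it shows only the non-blocking pattern, not the HybridRAG models.

```python
import threading


class ClientModel:
    """Toy client that completes text using the latest memory it has received."""

    def __init__(self):
        self._memory = []               # retrieval-augmented memory from the cloud
        self._lock = threading.Lock()   # protects the memory during swaps

    def update_memory(self, entries):
        with self._lock:
            self._memory = list(entries)

    def complete(self, prompt):
        # Never waits for the cloud: reads whatever memory is available now.
        with self._lock:
            memory = list(self._memory)
        hint = memory[0] if memory else ""
        return (prompt + " " + hint).strip()


def cloud_worker(client, prompt):
    # Stand-in for retrieval plus LLM memory generation in the cloud.
    client.update_memory([f"[memory for: {prompt}]"])


client = ClientModel()
first = client.complete("Dear team,")    # answered before any memory arrives

worker = threading.Thread(target=cloud_worker, args=(client, "Dear team,"))
worker.start()
worker.join()                            # joined only to make the demo deterministic

second = client.complete("Dear team,")   # now benefits from the cloud memory
```

    In a real deployment the client would not join the worker thread; later requests would simply pick up the memory once it has arrived.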

Adding Why to What? Analyses of an Everyday Explanation

  • paper_url: http://arxiv.org/abs/2308.04187
  • repo_url: None
  • paper_authors: Lutz Terfloth, Michael Schaffer, Heike M. Buhl, Carsten Schulte
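  • for: The paper studies how everyday explanations of technological artifacts are structured for laypeople, as groundwork for XAI explanations where common expertise cannot be assumed.
  • methods: The paper uses dual nature theory, a techno-philosophical approach, as an analytical framework for 20 game explanations, supplemented by video recalls in which explainers justified their explanations.
  • results: Explainers first focused on the physical aspects of the game (Architecture) and only later on aspects of Relevance. In the video recalls, explainers regarded starting with the basic components as important for structuring the explanation before moving on to more complex, intangible aspects. Shifts between the two sides were justified by explanation goals, emerging misunderstandings, and the knowledge needs of the explainee.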
    Abstract In XAI it is important to consider that, in contrast to explanations for professional audiences, one cannot assume common expertise when explaining for laypeople. But such explanations between humans vary greatly, making it difficult to research commonalities across explanations. We used the dual nature theory, a techno-philosophical approach, to cope with these challenges. According to it, one can explain, for example, an XAI's decision by addressing its dual nature: by focusing on the Architecture (e.g., the logic of its algorithms) or the Relevance (e.g., the severity of a decision, the implications of a recommendation). We investigated 20 game explanations using the theory as an analytical framework. We elaborate how we used the theory to quickly structure and compare explanations of technological artifacts. We supplemented results from analyzing the explanation contents with results from a video recall to explore how explainers justified their explanation. We found that explainers were focusing on the physical aspects of the game first (Architecture) and only later on aspects of the Relevance. Reasoning in the video recalls indicated that EX regarded the focus on the Architecture as important for structuring the explanation initially by explaining the basic components before focusing on more complex, intangible aspects. Shifting between addressing the two sides was justified by explanation goals, emerging misunderstandings, and the knowledge needs of the explainee. We discovered several commonalities that inspire future research questions which, if further generalizable, provide first ideas for the construction of synthetic explanations.

Assistive Chatbots for healthcare: a succinct review

  • paper_url: http://arxiv.org/abs/2308.04178
  • repo_url: None
  • paper_authors: Basabdatta Sen Bhattacharya, Vibhav Sinai Pissurlenkar
  • for: The paper reviews the state of the art in AI-enabled Chatbots in healthcare proposed during the last 10 years (2013-2023).
  • methods: The paper reviews commercial Chatbots that are being used for patient support and non-commercial ones that are in clinical trial phases. It also discusses the need for thorough and rigorous checks to ensure patient safety and medical ethics.
  • results: The paper highlights a lack of trust in AI-enabled Chatbots among healthcare workers, patients, and the wider community, as well as dissatisfaction with the NLP skills of the Chatbots. It suggests that, to enable deployment and integration of AI-enabled Chatbots in public health services, the technology needs to be simple and safe to use, and confidence in it needs to be built among the medical community through focused training and among patients and the wider community through outreach.
    Abstract Artificial Intelligence (AI) for supporting healthcare services has never been more necessitated than by the recent global pandemic. Here, we review the state-of-the-art in AI-enabled Chatbots in healthcare proposed during the last 10 years (2013-2023). The focus on AI-enabled technology is because of its potential for enhancing the quality of human-machine interaction via Chatbots, reducing dependence on human-human interaction and saving man-hours. Our review indicates that there are a handful of (commercial) Chatbots that are being used for patient support, while there are others (non-commercial) that are in the clinical trial phases. However, there is a lack of trust on this technology regarding patient safety and data protection, as well as a lack of wider awareness on its benefits among the healthcare workers and professionals. Also, patients have expressed dissatisfaction with Natural Language Processing (NLP) skills of the Chatbots in comparison to humans. Notwithstanding the recent introduction of ChatGPT that has raised the bar for the NLP technology, this Chatbot cannot be trusted with patient safety and medical ethics without thorough and rigorous checks to serve in the `narrow' domain of assistive healthcare. Our review suggests that to enable deployment and integration of AI-enabled Chatbots in public health services, the need of the hour is: to build technology that is simple and safe to use; to build confidence on the technology among: (a) the medical community by focussed training and development; (b) the patients and wider community through outreach.

Predicting Drug-Drug Interactions Using Knowledge Graphs

  • paper_url: http://arxiv.org/abs/2308.04172
  • repo_url: None
  • paper_authors: Lizzy Farrugia, Lilian M. Azzopardi, Jeremy Debattista, Charlie Abela
  • for: The paper aims to predict unknown Drug-Drug Interactions (DDIs) by incorporating Knowledge Graphs (KGs) and various drug features from public drug repositories.
  • methods: The medicX end-to-end framework uses a combination of translation, factorisation, and Neural Network (NN) based KG Embedding (KGE) methods to integrate drug features and predict unknown DDIs. The best performing combination was the ComplEx embedding method with a Long Short-Term Memory (LSTM) network.
  • results: The ComplEx embedding method with an LSTM network achieved an F1-score of 95.19% on a dataset based on the DDIs found in DrugBank version 5.1.8, outperforming the state-of-the-art model DeepDDI by 5.61%. Additionally, a graph auto-encoder model using a Graph Neural Network (GNN) achieved an F1-score of 91.94%.
    Abstract In the last decades, people have been consuming and combining more drugs than before, increasing the number of Drug-Drug Interactions (DDIs). To predict unknown DDIs, recently, studies started incorporating Knowledge Graphs (KGs) since they are able to capture the relationships among entities providing better drug representations than using a single drug property. In this paper, we propose the medicX end-to-end framework that integrates several drug features from public drug repositories into a KG and embeds the nodes in the graph using various translation, factorisation and Neural Network (NN) based KG Embedding (KGE) methods. Ultimately, we use a Machine Learning (ML) algorithm that predicts unknown DDIs. Among the different translation and factorisation-based KGE models, we found that the best performing combination was the ComplEx embedding method with a Long Short-Term Memory (LSTM) network, which obtained an F1-score of 95.19% on a dataset based on the DDIs found in DrugBank version 5.1.8. This score is 5.61% better than the state-of-the-art model DeepDDI. Additionally, we also developed a graph auto-encoder model that uses a Graph Neural Network (GNN), which achieved an F1-score of 91.94%. Consequently, GNNs have demonstrated a stronger ability to mine the underlying semantics of the KG than the ComplEx model, and thus using higher dimension embeddings within the GNN can lead to state-of-the-art performance.
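    ComplEx, the embedding method named in the abstract, scores a triple (subject, relation, object) as the real part of a trilinear product of complex vectors, with the object embedding conjugated. The sketch below shows only that scoring function on hand-picked toy vectors; the drug names and embedding values are invented, and the LSTM classifier from the paper is not included.

```python
def complex_score(subj, rel, obj):
    """ComplEx score: Re( sum_k subj[k] * rel[k] * conj(obj[k]) )."""
    return sum(s * r * o.conjugate() for s, r, o in zip(subj, rel, obj)).real


# Toy 2-dimensional complex embeddings (illustrative values only).
drug_a = [1 + 1j, 0.5 - 0.5j]
drug_b = [1 - 1j, 0.5 + 0.5j]
relation = [0 + 1j, 0 - 1j]   # purely imaginary, so the score is antisymmetric

forward = complex_score(drug_a, relation, drug_b)    # -2.5
backward = complex_score(drug_b, relation, drug_a)   #  2.5
```

    Because the object is conjugated, swapping subject and object can change the score, which lets the model represent directed relations; a relation embedding with only real entries would instead give a symmetric score.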

Current and Future Challenges in Knowledge Representation and Reasoning

  • paper_url: http://arxiv.org/abs/2308.04161
  • repo_url: https://github.com/Aryia-Behroziuan/Other-sources
  • paper_authors: James P. Delgrande, Birte Glimm, Thomas Meyer, Miroslaw Truszczynski, Frank Wolter
  • for: The paper discusses the current state of the art in Knowledge Representation and Reasoning, including its relation to other areas such as machine learning and reasoning under uncertainty, and provides recommendations for future progress.
  • methods: The paper is based on the presentations, panels, working groups, and discussions that took place at a Dagstuhl Perspectives workshop on Knowledge Representation and Reasoning in July 2022.
  • results: The paper is a manifesto declaring the authors' views on Knowledge Representation: its origins, goals, milestones, and current foci, as well as its challenges and key priorities for the next decade.
    Abstract Knowledge Representation and Reasoning is a central, longstanding, and active area of Artificial Intelligence. Over the years it has evolved significantly; more recently it has been challenged and complemented by research in areas such as machine learning and reasoning under uncertainty. In July 2022 a Dagstuhl Perspectives workshop was held on Knowledge Representation and Reasoning. The goal of the workshop was to describe the state of the art in the field, including its relation with other areas, its shortcomings and strengths, together with recommendations for future progress. We developed this manifesto based on the presentations, panels, working groups, and discussions that took place at the Dagstuhl Workshop. It is a declaration of our views on Knowledge Representation: its origins, goals, milestones, and current foci; its relation to other disciplines, especially to Artificial Intelligence; and on its challenges, along with key priorities for the next decade.

Correlating Medi-Claim Service by Deep Learning Neural Networks

  • paper_url: http://arxiv.org/abs/2308.04469
  • repo_url: None
  • paper_authors: Jayanthi Vajiram, Negha Senthil, Nean Adhith. P
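  • for: Detecting fraud in medical insurance claims, which involves patients, physicians, diagnostic centers, and insurance providers and harms the financial growth of both insured people and health insurance companies.
  • methods: A Convolution Neural Network architecture detects fraudulent claims through a correlation study of regression models over claims submitted by different providers. Supervised and unsupervised classifiers are used to separate fraud from non-fraud claims.
  • results: According to the abstract, the correlation study helps to detect fraudulent claims, including money laundering across claims from different providers; no quantitative results are stated.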
    Abstract Medical insurance claims are of organized crimes related to patients, physicians, diagnostic centers, and insurance providers, forming a chain reaction that must be monitored constantly. These kinds of frauds affect the financial growth of both insured people and health insurance companies. The Convolution Neural Network architecture is used to detect fraudulent claims through a correlation study of regression models, which helps to detect money laundering on different claims given by different providers. Supervised and unsupervised classifiers are used to detect fraud and non-fraud claims.

Heterogeneous 360 Degree Videos in Metaverse: Differentiated Reinforcement Learning Approaches

  • paper_url: http://arxiv.org/abs/2308.04083
  • repo_url: None
  • paper_authors: Wenhan Yu, Jun Zhao
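  • for: The paper presents a Quality of Service model for heterogeneous 360-degree videos (non-VR and VR) in the Metaverse, which have different requirements for frame rates and cybersickness.
  • methods: The paper proposes a frame-slotted structure and performs frame-wise optimization using self-designed differentiated deep reinforcement learning algorithms. Two structures are designed for the heterogeneous scenario: Separate Input Differentiated Output (SIDO) and Merged Input Differentiated Output (MIDO).
  • results: Comprehensive experiments demonstrate the effectiveness of both structures.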
    Abstract Advanced video technologies are driving the development of the futuristic Metaverse, which aims to connect users from anywhere and anytime. As such, the use cases for users will be much more diverse, leading to a mix of 360-degree videos with two types: non-VR and VR 360-degree videos. This paper presents a novel Quality of Service model for heterogeneous 360-degree videos with different requirements for frame rates and cybersickness. We propose a frame-slotted structure and conduct frame-wise optimization using self-designed differentiated deep reinforcement learning algorithms. Specifically, we design two structures, Separate Input Differentiated Output (SIDO) and Merged Input Differentiated Output (MIDO), for this heterogeneous scenario. We also conduct comprehensive experiments to demonstrate their effectiveness.

Federated Zeroth-Order Optimization using Trajectory-Informed Surrogate Gradients

  • paper_url: http://arxiv.org/abs/2308.04077
  • repo_url: None
  • paper_authors: Yao Shu, Xiaoqiang Lin, Zhongxiang Dai, Bryan Kian Hsiang Low
  • for: Federated zeroth-order optimization (ZOO) algorithms, which are used for query- and communication-efficient optimization in applications such as federated learning.
  • methods: Trajectory-informed gradient surrogates and adaptive gradient correction techniques, which are used to improve the accuracy and efficiency of federated ZOO.
  • results: The proposed FZooS algorithm achieves theoretical improvements over existing approaches and is supported by real-world experiments in federated black-box adversarial attack and federated non-differentiable metric optimization.
    Abstract Federated optimization, an emerging paradigm which finds wide real-world applications such as federated learning, enables multiple clients (e.g., edge devices) to collaboratively optimize a global function. The clients do not share their local datasets and typically only share their local gradients. However, the gradient information is not available in many applications of federated optimization, which hence gives rise to the paradigm of federated zeroth-order optimization (ZOO). Existing federated ZOO algorithms suffer from the limitations of query and communication inefficiency, which can be attributed to (a) their reliance on a substantial number of function queries for gradient estimation and (b) the significant disparity between their realized local updates and the intended global updates. To this end, we (a) introduce trajectory-informed gradient surrogates which is able to use the history of function queries during optimization for accurate and query-efficient gradient estimation, and (b) develop the technique of adaptive gradient correction using these gradient surrogates to mitigate the aforementioned disparity. Based on these, we propose the federated zeroth-order optimization using trajectory-informed surrogate gradients (FZooS) algorithm for query- and communication-efficient federated ZOO. Our FZooS achieves theoretical improvements over the existing approaches, which is supported by our real-world experiments such as federated black-box adversarial attack and federated non-differentiable metric optimization.
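    To make the query-cost issue concrete, the sketch below shows a plain finite-difference zeroth-order optimizer on a single client. It is a standard baseline, not FZooS: there are no trajectory-informed surrogates, no gradient correction, and no federation, and `black_box`, `zo_gradient`, and `zo_descent` are names chosen for this example.

```python
def zo_gradient(f, x, mu=1e-4):
    """Central-difference gradient estimate using 2 * len(x) function queries."""
    grad = []
    for i in range(len(x)):
        plus = list(x)
        plus[i] += mu
        minus = list(x)
        minus[i] -= mu
        grad.append((f(plus) - f(minus)) / (2 * mu))
    return grad


def zo_descent(f, x, lr=0.1, steps=100):
    """Gradient descent that only ever evaluates f, never its derivative."""
    x = list(x)
    for _ in range(steps):
        g = zo_gradient(f, x)
        x = [xi - lr * gi for xi, gi in zip(x, g)]
    return x


def black_box(x):
    """Toy objective whose gradient is treated as unavailable."""
    return (x[0] - 3.0) ** 2 + (x[1] + 1.0) ** 2


x_min = zo_descent(black_box, [0.0, 0.0])   # approaches [3.0, -1.0]
```

    Each step here costs 2d queries for a d-dimensional input, which is the kind of query burden the abstract attributes to existing methods and that reusing the history of past queries is meant to reduce.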

Path Signatures for Diversity in Probabilistic Trajectory Optimisation

  • paper_url: http://arxiv.org/abs/2308.04071
  • repo_url: None
  • paper_authors: Lucas Barcelos, Tin Lai, Rafael Oliveira, Paulo Borges, Fabio Ramos
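  • for: The paper proposes an algorithm for parallel trajectory optimisation that promotes diversity among solutions, avoiding mode collapse and achieving better global properties.
  • methods: The algorithm leverages recent advances in the theory of rough paths. It builds on path signatures and Hilbert space representations of trajectories, and connects parallel variational inference for trajectory estimation with diversity-promoting kernels.
  • results: Experiments show that the strategy achieves lower average costs than competing alternatives on a range of problems, from 2D navigation to robotic manipulators operating in cluttered environments.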
    Abstract Motion planning can be cast as a trajectory optimisation problem where a cost is minimised as a function of the trajectory being generated. In complex environments with several obstacles and complicated geometry, this optimisation problem is usually difficult to solve and prone to local minima. However, recent advancements in computing hardware allow for parallel trajectory optimisation where multiple solutions are obtained simultaneously, each initialised from a different starting point. Unfortunately, without a strategy preventing two solutions to collapse on each other, naive parallel optimisation can suffer from mode collapse diminishing the efficiency of the approach and the likelihood of finding a global solution. In this paper we leverage on recent advances in the theory of rough paths to devise an algorithm for parallel trajectory optimisation that promotes diversity over the range of solutions, therefore avoiding mode collapses and achieving better global properties. Our approach builds on path signatures and Hilbert space representations of trajectories, and connects parallel variational inference for trajectory estimation with diversity promoting kernels. We empirically demonstrate that this strategy achieves lower average costs than competing alternatives on a range of problems, from 2D navigation to robotic manipulators operating in cluttered environments.
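    A path signature is a sequence of iterated integrals of a path. For a piecewise-linear trajectory, the first two levels can be accumulated segment by segment with Chen's identity, as in the sketch below. The function name and the two example paths are assumptions for illustration; the paper's kernels and optimiser are not reproduced.

```python
def signature_depth2(points):
    """Levels 1 and 2 of the signature of a piecewise-linear path.

    Level 1 is the total displacement. Level 2 holds the iterated integrals
    S[i][j] = integral of (x_i - x_i(0)) dx_j along the path.
    """
    d = len(points[0])
    level1 = [0.0] * d
    level2 = [[0.0] * d for _ in range(d)]
    for start, end in zip(points, points[1:]):
        delta = [b - a for a, b in zip(start, end)]
        # Chen's identity: cross term with the path so far, plus the
        # level-2 term of a straight segment (delta x delta / 2).
        for i in range(d):
            for j in range(d):
                level2[i][j] += level1[i] * delta[j] + 0.5 * delta[i] * delta[j]
        level1 = [s + dl for s, dl in zip(level1, delta)]
    return level1, level2


# Two trajectories with the same endpoints but different routes.
right_then_up = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]
up_then_right = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0)]

s1_a, s2_a = signature_depth2(right_then_up)   # s2_a == [[0.5, 1.0], [0.0, 0.5]]
s1_b, s2_b = signature_depth2(up_then_right)   # s2_b == [[0.5, 0.0], [1.0, 0.5]]
```

    Both paths share the same level-1 term, yet their level-2 terms differ, which is why signatures can tell apart trajectories that a start-to-goal displacement alone cannot and so can feed a diversity-promoting kernel.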
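    Summary Motion planning can be cast as a trajectory optimisation problem in which a cost is minimised as a function of the trajectory. In complex environments, finding a globally optimal solution is difficult because the optimisation tends to get stuck in local minima. Advances in computing hardware now allow parallel trajectory optimisation, in which multiple solutions are obtained simultaneously from different starting points. Without a strategy that keeps solutions from collapsing onto each other, however, naive parallel optimisation can suffer from mode collapse, which reduces its efficiency and the likelihood of finding a global solution. This paper draws on recent advances in rough path theory to design a parallel trajectory optimisation algorithm that promotes diversity among solutions, thereby avoiding mode collapse and achieving better global properties. The approach builds on path signatures and Hilbert space representations of trajectories, and connects parallel variational inference with diversity-promoting kernels. Empirically, the strategy achieves lower average costs on a range of problems, from 2D navigation to robotic manipulators in cluttered environments.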
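
    Illustration (not from the paper): a minimal Python sketch of the first two levels of a path signature for a piecewise-linear trajectory, to show what kind of feature the approach builds on. The function names `signature_depth2` and `levy_area` and the two toy paths are assumptions made for the example. How the paper turns signatures into a diversity-promoting kernel is not reproduced here.

```python
def signature_depth2(points):
    """Depth-2 signature of the piecewise-linear path through `points`.

    Returns (level1, level2): level1[i] is the total increment in
    coordinate i, and level2[i][j] is the iterated integral of dx^i dx^j.
    Segments are combined with Chen's identity.
    """
    dim = len(points[0])
    level1 = [0.0] * dim
    level2 = [[0.0] * dim for _ in range(dim)]
    for start, end in zip(points, points[1:]):
        delta = [b - a for a, b in zip(start, end)]
        for i in range(dim):
            for j in range(dim):
                # Cross term with the path so far, plus the segment's own
                # second-level term (delta_i * delta_j / 2 for a straight line).
                level2[i][j] += level1[i] * delta[j] + 0.5 * delta[i] * delta[j]
        for i in range(dim):
            level1[i] += delta[i]
    return level1, level2


def levy_area(level2):
    """Signed area term for a 2D path: antisymmetric part of level 2."""
    return 0.5 * (level2[0][1] - level2[1][0])


if __name__ == "__main__":
    # Two paths with the same endpoints but different routes.
    right_then_up = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]
    up_then_right = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
    for path in (right_then_up, up_then_right):
        s1, s2 = signature_depth2(path)
        print(s1, levy_area(s2))
```

    Both toy paths have the same level-1 terms because they share endpoints, while the level-2 terms differ in sign. This is the property that lets signature features tell apart trajectories that a comparison of endpoints alone would treat as identical.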

Enhancing Adversarial Robustness in Low-Label Regime via Adaptively Weighted Regularization and Knowledge Distillation

  • paper_url: http://arxiv.org/abs/2308.04061
  • repo_url: None
  • paper_authors: Dongyoon Yang, Insung Kong, Yongdai Kim
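  • for: This work studies semi-supervised adversarial training when only a small amount of labeled data is available.
  • methods: The authors derive two upper bounds for the robust risk and propose a regularization term for unlabeled data motivated by these bounds. They then develop a semi-supervised adversarial training algorithm that combines this regularization term with knowledge distillation from a semi-supervised teacher (i.e., a teacher model trained with a semi-supervised learning algorithm).
  • results: Experiments show that the proposed algorithm achieves state-of-the-art performance by significant margins over existing algorithms. Even with very little labeled data, its performance is not much worse than that of supervised learning. For example, with only 8% labeled data it is comparable to supervised adversarial training that uses all labels, in both standard and robust accuracy on CIFAR-10.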
    Abstract Adversarial robustness is a research area that has recently received a lot of attention in the quest for trustworthy artificial intelligence. However, recent works on adversarial robustness have focused on supervised learning where it is assumed that labeled data is plentiful. In this paper, we investigate semi-supervised adversarial training where labeled data is scarce. We derive two upper bounds for the robust risk and propose a regularization term for unlabeled data motivated by these two upper bounds. Then, we develop a semi-supervised adversarial training algorithm that combines the proposed regularization term with knowledge distillation using a semi-supervised teacher (i.e., a teacher model trained using a semi-supervised learning algorithm). Our experiments show that our proposed algorithm achieves state-of-the-art performance with significant margins compared to existing algorithms. In particular, compared to supervised learning algorithms, performance of our proposed algorithm is not much worse even when the amount of labeled data is very small. For example, our algorithm with only 8\% labeled data is comparable to supervised adversarial training algorithms that use all labeled data, both in terms of standard and robust accuracies on CIFAR-10.
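    Summary Adversarial robustness has recently received a lot of attention as part of the effort to make artificial intelligence trustworthy. Existing work usually assumes that labeled data is plentiful, whereas this paper investigates semi-supervised adversarial training where labeled data is scarce. The authors derive two upper bounds for the robust risk and propose a regularization term for unlabeled data based on them. They then develop a semi-supervised adversarial training algorithm that combines this regularization term with knowledge distillation from a semi-supervised teacher (i.e., a teacher model trained with a semi-supervised learning algorithm). Experiments show that the algorithm achieves state-of-the-art performance, and that its performance is not much worse than that of supervised methods even when labeled data is very scarce. For example, with only 8% labeled data it is comparable to fully supervised adversarial training in both standard and robust accuracy on CIFAR-10.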

SODFormer: Streaming Object Detection with Transformer Using Events and Frames

  • paper_url: http://arxiv.org/abs/2308.04047
  • repo_url: https://github.com/dianzl/sodformer
  • paper_authors: Dianze Li, Jianing Li, Yonghong Tian
  • for: Improve the accuracy and efficiency of object detection, particularly under high-speed motion and low-light conditions.
  • methods: A Transformer-based architecture integrates events and frames to detect objects continuously in an asynchronous manner, and an asynchronous attention-based fusion module integrates the two heterogeneous sensing modalities.
  • results: SODFormer outperforms four state-of-the-art methods and eight baselines by a significant margin. The method also works well in cases where a conventional frame-based camera fails, such as high-speed motion and low-light conditions.
    Abstract DAVIS camera, streaming two complementary sensing modalities of asynchronous events and frames, has gradually been used to address major object detection challenges (e.g., fast motion blur and low-light). However, how to effectively leverage rich temporal cues and fuse two heterogeneous visual streams remains a challenging endeavor. To address this challenge, we propose a novel streaming object detector with Transformer, namely SODFormer, which first integrates events and frames to continuously detect objects in an asynchronous manner. Technically, we first build a large-scale multimodal neuromorphic object detection dataset (i.e., PKU-DAVIS-SOD) over 1080.1k manual labels. Then, we design a spatiotemporal Transformer architecture to detect objects via an end-to-end sequence prediction problem, where the novel temporal Transformer module leverages rich temporal cues from two visual streams to improve the detection performance. Finally, an asynchronous attention-based fusion module is proposed to integrate two heterogeneous sensing modalities and take complementary advantages from each end, which can be queried at any time to locate objects and break through the limited output frequency from synchronized frame-based fusion strategies. The results show that the proposed SODFormer outperforms four state-of-the-art methods and our eight baselines by a significant margin. We also show that our unifying framework works well even in cases where the conventional frame-based camera fails, e.g., high-speed motion and low-light conditions. Our dataset and code can be available at https://github.com/dianzl/SODFormer.
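    Summary The DAVIS camera, which streams two complementary sensing modalities (asynchronous events and frames), has gradually been adopted for major object detection challenges such as fast motion blur and low light. How to exploit rich temporal cues and fuse the two heterogeneous visual streams remains challenging. To address this, the authors propose SODFormer, a streaming object detector with Transformer that integrates events and frames to detect objects continuously in an asynchronous manner. They first build a large-scale multimodal neuromorphic object detection dataset (PKU-DAVIS-SOD) with 1080.1k manual labels. They then design a spatiotemporal Transformer architecture that detects objects through end-to-end sequence prediction, in which a new temporal Transformer module uses temporal cues from both visual streams to improve detection. Finally, an asynchronous attention-based fusion module integrates the two heterogeneous modalities. It can be queried at any time to locate objects, breaking through the limited output frequency of synchronized frame-based fusion strategies. SODFormer outperforms four state-of-the-art methods and the authors' eight baselines by a significant margin, and the unified framework also works well under high-speed motion and low-light conditions. The dataset and code are available at https://github.com/dianzl/SODFormer.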

Non-Intrusive Electric Load Monitoring Approach Based on Current Feature Visualization for Smart Energy Management

  • paper_url: http://arxiv.org/abs/2308.11627
  • repo_url: None
  • paper_authors: Yiwen Xu, Dengfeng Liu, Liangtao Huang, Zhiquan Lin, Tiesong Zhao, Sam Kwong
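  • for: This work proposes a non-intrusive electric load monitoring method to support economical and sustainable energy management in smart cities.
  • methods: The paper applies popular computer vision techniques. Signal transforms (wavelet transform and discrete Fourier transform) and the Gramian Angular Field (GAF) method map one-dimensional current signals onto two-dimensional color feature images. A U-shaped deep neural network with multi-scale feature extraction and an attention mechanism then recognizes all electric loads.
  • results: Experiments on both public and private datasets show that the proposed method outperforms its peers, supporting efficient energy management over large-scale Internet of Things (IoT) deployments.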
    Abstract The state-of-the-art smart city has been calling for an economic but efficient energy management over large-scale network, especially for the electric power system. It is a critical issue to monitor, analyze and control electric loads of all users in system. In this paper, we employ the popular computer vision techniques of AI to design a non-invasive load monitoring method for smart electric energy management. First of all, we utilize both signal transforms (including wavelet transform and discrete Fourier transform) and Gramian Angular Field (GAF) methods to map one-dimensional current signals onto two-dimensional color feature images. Second, we propose to recognize all electric loads from color feature images using a U-shape deep neural network with multi-scale feature extraction and attention mechanism. Third, we design our method as a cloud-based, non-invasive monitoring of all users, thereby saving energy cost during electric power system control. Experimental results on both public and our private datasets have demonstrated our method achieves superior performances than its peers, and thus supports efficient energy management over large-scale Internet of Things (IoT).
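    Summary Modern smart cities call for economical and efficient energy management, especially for the electric power system. Monitoring, analyzing, and controlling the electric loads of all users is a critical issue. This paper uses popular AI computer vision techniques to design a non-invasive load monitoring method. First, signal transforms (including wavelet transform and discrete Fourier transform) and the Gramian Angular Field (GAF) method map one-dimensional current signals onto two-dimensional color feature images. Second, a U-shaped deep neural network with multi-scale feature extraction and an attention mechanism recognizes all electric loads from these images. Third, the method is designed as cloud-based, non-invasive monitoring of all users, which saves energy cost during electric power system control. Experiments on public and private datasets show that the method outperforms its peers and therefore supports efficient energy management over large-scale Internet of Things (IoT).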
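
    Illustration (not from the paper): a minimal Python sketch of how a Gramian Angular Field turns a 1D signal into a 2D array. It assumes the summation variant of GAF and produces a single-channel matrix. The paper does not say here which variant it uses or how it maps values to colors, and the function name `gramian_angular_field` is hypothetical.

```python
import math


def gramian_angular_field(signal):
    """Gramian Angular Summation Field of a 1D signal.

    The signal is rescaled to [-1, 1], each value is encoded as an angle
    phi = arccos(value), and entry (i, j) of the image is cos(phi_i + phi_j).
    """
    lo, hi = min(signal), max(signal)
    if hi == lo:
        raise ValueError("a constant signal cannot be rescaled to [-1, 1]")
    scaled = [(2.0 * v - hi - lo) / (hi - lo) for v in signal]
    # Clamp to guard against rounding slightly outside [-1, 1].
    angles = [math.acos(max(-1.0, min(1.0, v))) for v in scaled]
    return [[math.cos(a + b) for b in angles] for a in angles]


if __name__ == "__main__":
    # Toy current samples; a real input would be a window of measured current.
    for row in gramian_angular_field([0.0, 1.0, 2.0]):
        print([round(v, 3) for v in row])
```

    An input of length n gives an n-by-n symmetric matrix, so the window length of the current signal fixes the size of the image that the downstream network sees.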

InfeRE: Step-by-Step Regex Generation via Chain of Inference

  • paper_url: http://arxiv.org/abs/2308.04041
  • repo_url: https://github.com/smallqqqq/infere
  • paper_authors: Shuai Zhang, Xiaodong Gu, Yuting Chen, Beijun Shen
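  • for: This paper proposes InfeRE, a new paradigm for generating regular expressions (regexes) from natural language, aimed at making regex generation by neural language models more effective and interpretable.
  • methods: The generation of a regex is decomposed into chains of step-by-step inference. To improve robustness, a self-consistency decoding mechanism ensembles multiple outputs sampled from different models.
  • results: On two public datasets (NL-RX-Turk and KB13), InfeRE substantially outperforms previous baselines, improving DFA@5 accuracy by 16.3% and 14.7% respectively. Compared with the popular tree-based generation approach TRANX, it improves DFA@5 accuracy by 18.1% and 11.3% respectively.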
    Abstract Automatically generating regular expressions (abbrev. regexes) from natural language description (NL2RE) has been an emerging research area. Prior studies treat regex as a linear sequence of tokens and generate the final expressions autoregressively in a single pass. They did not take into account the step-by-step internal text-matching processes behind the final results. This significantly hinders the efficacy and interpretability of regex generation by neural language models. In this paper, we propose a new paradigm called InfeRE, which decomposes the generation of regexes into chains of step-by-step inference. To enhance the robustness, we introduce a self-consistency decoding mechanism that ensembles multiple outputs sampled from different models. We evaluate InfeRE on two publicly available datasets, NL-RX-Turk and KB13, and compare the results with state-of-the-art approaches and the popular tree-based generation approach TRANX. Experimental results show that InfeRE substantially outperforms previous baselines, yielding 16.3% and 14.7% improvement in DFA@5 accuracy on two datasets, respectively. Particularly, InfeRE outperforms the popular tree-based generation approach by 18.1% and 11.3% on both datasets, respectively, in terms of DFA@5 accuracy.
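    Summary Automatically generating regular expressions (regexes) from natural language descriptions (NL2RE) is an emerging research area. Prior studies treat a regex as a linear sequence of tokens and generate the final expression autoregressively in a single pass. They do not take into account the step-by-step text-matching process behind the final result, which limits the efficacy and interpretability of regex generation by neural language models. This paper proposes a new paradigm called InfeRE, which decomposes regex generation into chains of step-by-step inference. To improve robustness, it introduces a self-consistency decoding mechanism that ensembles multiple outputs sampled from different models. InfeRE is evaluated on two public datasets, NL-RX-Turk and KB13, and compared with state-of-the-art approaches and a tree-based generation approach. InfeRE substantially outperforms previous baselines, improving DFA@5 accuracy by 16.3% and 14.7% on the two datasets. Compared with the tree-based generation approach, it improves DFA@5 accuracy by 18.1% and 11.3%.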
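
    Illustration (not from the paper): one simple way to ensemble several sampled outputs is a majority vote, sketched below in Python. The paper's actual self-consistency decoding may group or weight candidates differently. The function name `self_consistent_choice` and the sample regexes are hypothetical.

```python
from collections import Counter


def self_consistent_choice(candidates):
    """Return the candidate that occurs most often among sampled outputs."""
    if not candidates:
        raise ValueError("need at least one candidate")
    counts = Counter(candidates)
    winner, _ = counts.most_common(1)[0]
    return winner


if __name__ == "__main__":
    # Hypothetical outputs sampled from several models for one description.
    samples = [r"[0-9]+", r"\d+", r"[0-9]+", r"[a-z]+"]
    print(self_consistent_choice(samples))
```

    This sketch votes on exact strings, so two differently written regexes that match the same strings are counted as different candidates. A vote over semantically equivalent regexes would need an equivalence check, which is omitted here.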

Adapting Foundation Models for Information Synthesis of Wireless Communication Specifications

  • paper_url: http://arxiv.org/abs/2308.04033
  • repo_url: None
  • paper_authors: Manikanta Kotaru
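  • for: This work presents an AI-based tool for synthesizing information from wireless communication specifications, helping users find relevant information quickly.
  • methods: The tool builds on existing foundation models and adds three key components: a domain-specific database, a context extractor, and a feedback mechanism. User queries are augmented with concise, query-dependent context extracted from technical specifications.
  • results: On a benchmark dataset, the tool gives more relevant and accurate answers, with an average BLEU score of 0.37 and a BERTScore F1-measure of 0.79, compared with 0.07 and 0.59 for state-of-the-art tools like ChatGPT.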
    Abstract Existing approaches to understanding, developing and researching modern wireless communication technologies involves time-intensive and arduous process of sifting through numerous webpages and technical specification documents, gathering the required information and synthesizing it. This paper presents NextGen Communications Copilot, a conversational artificial intelligence tool for information synthesis of wireless communication specifications. The system builds on top of recent advancements in foundation models and consists of three key additional components: a domain-specific database, a context extractor, and a feedback mechanism. The system appends user queries with concise and query-dependent contextual information extracted from a database of wireless technical specifications and incorporates tools for expert feedback and data contributions. On evaluation using a benchmark dataset of queries and reference responses created by subject matter experts, the system demonstrated more relevant and accurate answers with an average BLEU score and BERTScore F1-measure of 0.37 and 0.79 respectively compared to the corresponding values of 0.07 and 0.59 achieved by state-of-the-art tools like ChatGPT.
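    Summary Existing ways of understanding, developing, and researching modern wireless communication technologies are time-consuming and arduous: they involve sifting through numerous webpages and technical specification documents, gathering the required information, and synthesizing it. This paper presents NextGen Communications Copilot, a conversational AI tool for information synthesis of wireless communication specifications. The system builds on recent foundation models and adds three key components: a domain-specific database, a context extractor, and a feedback mechanism. It appends to each user query concise, query-dependent context extracted from a database of wireless technical specifications, and includes tools for expert feedback and data contributions. Evaluated on a benchmark dataset of queries and reference responses created by subject matter experts, the system gave more relevant and accurate answers than state-of-the-art tools like ChatGPT, with an average BLEU score of 0.37 and a BERTScore F1-measure of 0.79.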

Measure of Uncertainty in Human Emotions

  • paper_url: http://arxiv.org/abs/2308.04032
  • repo_url: https://github.com/Aryia-Behroziuan/Other-sources
  • paper_authors: Etienne Naude, Henry Gann, Balaram Panda, Lance Zhang, Raina Song, Yuwei Shen
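  • for: This research investigates how displaying uncertainty information about computer-generated emotion classifications affects human decision making.
  • methods: An experiment compares different displays of uncertainty information and their effect on the human decision-making process.
  • results: Displaying more uncertainty information can help users feel more confident when making decisions.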
    Abstract Many research explore how well computers are able to examine emotions displayed by humans and use that data to perform different tasks. However, there have been very few research which evaluate the computers ability to generate emotion classification information in an attempt to help the user make decisions or perform tasks. This is a crucial area to explore as it is paramount to the two way communication between humans and computers. This research conducted an experiment to investigate the impact of different uncertainty information displays of emotion classification on the human decision making process. Results show that displaying more uncertainty information can help users to be more confident when making decisions.
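    Summary Many studies explore how well computers can recognize emotions displayed by humans and use that data to perform different tasks. Very few, however, evaluate the computer's ability to generate emotion classification information that helps the user make decisions or perform tasks. This is a crucial area because it is central to two-way communication between humans and computers. This research conducted an experiment to investigate how different displays of uncertainty information affect the human decision-making process. The results show that displaying more uncertainty information can help users feel more confident when making decisions.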

Gentopia: A Collaborative Platform for Tool-Augmented LLMs

  • paper_url: http://arxiv.org/abs/2308.04030
  • repo_url: https://github.com/gentopia-ai/gentopia
  • paper_authors: Binfeng Xu, Xukun Liu, Hua Shen, Zeyu Han, Yuhan Li, Murong Yue, Zhiyuan Peng, Yuchen Liu, Ziyu Yao, Dongkuan Xu
  • for: The paper aims to provide a flexible and customizable framework for Augmented Language Models (ALMs) that enables the use of various language models, task formats, prompting modules, and plugins.
  • methods: The paper proposes a new framework called gentopia, which allows users to customize their ALMs through simple configurations and integrates various language models, task formats, prompting modules, and plugins into a unified paradigm.
  • results: The paper establishes gentpool, a public platform for registering and sharing user-customized agents, and gentbench, an integral component of gentpool that evaluates user-customized agents across diverse aspects such as safety, robustness, and efficiency.
    Abstract Augmented Language Models (ALMs) empower large language models with the ability to use tools, transforming them into intelligent agents for real-world interactions. However, most existing frameworks for ALMs, to varying degrees, are deficient in the following critical features: flexible customization, collaborative democratization, and holistic evaluation. We present gentopia, an ALM framework enabling flexible customization of agents through simple configurations, seamlessly integrating various language models, task formats, prompting modules, and plugins into a unified paradigm. Furthermore, we establish gentpool, a public platform enabling the registration and sharing of user-customized agents. Agents registered in gentpool are composable such that they can be assembled together for agent collaboration, advancing the democratization of artificial intelligence. To ensure high-quality agents, gentbench, an integral component of gentpool, is designed to thoroughly evaluate user-customized agents across diverse aspects such as safety, robustness, efficiency, etc. We release gentopia on Github and will continuously move forward.
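    Summary Augmented Language Models (ALMs) give large language models the ability to use tools, turning them into intelligent agents for real-world interaction. Most existing ALM frameworks, however, fall short to varying degrees on flexible customization, collaborative democratization, and holistic evaluation. The authors present gentopia, a framework that lets users customize agents through simple configurations and integrates different language models, task formats, prompting modules, and plugins into a unified paradigm. They also establish gentpool, a public platform where users can register and share customized agents. Agents registered in gentpool are composable and can be assembled for agent collaboration, advancing the democratization of artificial intelligence. To ensure high-quality agents, gentbench, an integral component of gentpool, evaluates user-customized agents across aspects such as safety, robustness, and efficiency. gentopia is released on GitHub and will continue to be developed.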

Top K Relevant Passage Retrieval for Biomedical Question Answering

  • paper_url: http://arxiv.org/abs/2308.04028
  • repo_url: https://github.com/shashank140195/Biomedical_QA_Model
  • paper_authors: Shashank Gupta
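  • for: This paper aims to improve the accuracy of biomedical question answering systems so that they answer biomedical questions more correctly.
  • methods: The paper fine-tunes the existing Dense Passage Retrieval (DPR) framework for the biomedical domain and retrieves answers to medical questions from PubMed articles.
  • results: The fine-tuned dense retriever achieves an F1 score of 0.81 on a BioASQ QA dataset.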
    Abstract Question answering is a task that answers factoid questions using a large collection of documents. It aims to provide precise answers in response to the user's questions in natural language. Question answering relies on efficient passage retrieval to select candidate contexts, where traditional sparse vector space models, such as TF-IDF or BM25, are the de facto method. On the web, there is no single article that could provide all the possible answers available on the internet to the question of the problem asked by the user. The existing Dense Passage Retrieval model has been trained on Wikipedia dump from Dec. 20, 2018, as the source documents for answering questions. Question answering (QA) has made big strides with several open-domain and machine comprehension systems built using large-scale annotated datasets. However, in the clinical domain, this problem remains relatively unexplored. According to multiple surveys, Biomedical Questions cannot be answered correctly from Wikipedia Articles. In this work, we work on the existing DPR framework for the biomedical domain and retrieve answers from the Pubmed articles which is a reliable source to answer medical questions. When evaluated on a BioASQ QA dataset, our fine-tuned dense retriever results in a 0.81 F1 score.
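    Summary Question answering is the task of answering questions using a large collection of documents, with the aim of giving precise answers in natural language. It relies on efficient passage retrieval to select candidate contexts, for which traditional sparse vector space models such as TF-IDF or BM25 are the de facto method. On the web, no single article provides all the possible answers to a user's question. The existing Dense Passage Retrieval model was trained on the Wikipedia dump from Dec. 20, 2018 as its source documents. Question answering (QA) has made great progress in open-domain and machine comprehension settings, but the problem remains relatively unexplored in the clinical domain. According to multiple surveys, biomedical questions cannot be answered correctly from Wikipedia articles. This work adapts the existing DPR framework to the biomedical domain and retrieves answers from PubMed articles, a reliable source. Evaluated on a BioASQ QA dataset, the fine-tuned dense retriever achieves an F1 score of 0.81.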
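
    Illustration (not from the paper): dense passage retrieval ranks passages by the inner product between a query embedding and passage embeddings. The Python sketch below shows only that top-k ranking step, with hand-written toy vectors standing in for encoder outputs. The function name `top_k_passages` and all vectors are hypothetical.

```python
import heapq


def top_k_passages(query_vec, passage_vecs, k):
    """Return [(passage_index, score)] for the k highest inner products."""
    scored = [
        (sum(q * p for q, p in zip(query_vec, vec)), idx)
        for idx, vec in enumerate(passage_vecs)
    ]
    best = heapq.nlargest(k, scored)  # highest score first
    return [(idx, score) for score, idx in best]


if __name__ == "__main__":
    # Toy 2D embeddings; a real system would encode text with trained encoders.
    query = [1.0, 0.0]
    passages = [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]]
    print(top_k_passages(query, passages, k=2))
```

    At the scale of a full PubMed index an exhaustive scan like this would be slow, so practical systems usually use an approximate nearest-neighbor index. The ranking criterion stays the same.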

AgentSims: An Open-Source Sandbox for Large Language Model Evaluation

  • paper_url: http://arxiv.org/abs/2308.04026
  • repo_url: None
  • paper_authors: Jiaju Lin, Haoran Zhao, Aochi Zhang, Yiting Wu, Huqiuyue Ping, Qin Chen
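  • for: How to evaluate the ability of large language models (LLMs) is an open question. Existing evaluation methods suffer from (1) constrained evaluation abilities, (2) vulnerable benchmarks, and (3) unobjective metrics.
  • methods: The authors suggest task-based evaluation, in which LLM agents complete tasks in a simulated environment, as a one-for-all solution to these problems. They present AgentSims, an easy-to-use infrastructure that lets researchers from different disciplines test the specific capacities they are interested in. Researchers can build evaluation tasks by adding agents and buildings on an interactive GUI, or deploy and test new support mechanisms such as memory, planning, and tool-use systems with a few lines of code.
  • results: A demo is available at https://agentsims.com.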
    Abstract With ChatGPT-like large language models (LLM) prevailing in the community, how to evaluate the ability of LLMs is an open question. Existing evaluation methods suffer from following shortcomings: (1) constrained evaluation abilities, (2) vulnerable benchmarks, (3) unobjective metrics. We suggest that task-based evaluation, where LLM agents complete tasks in a simulated environment, is a one-for-all solution to solve above problems. We present AgentSims, an easy-to-use infrastructure for researchers from all disciplines to test the specific capacities they are interested in. Researchers can build their evaluation tasks by adding agents and buildings on an interactive GUI or deploy and test new support mechanisms, i.e. memory, planning and tool-use systems, by a few lines of codes. Our demo is available at https://agentsims.com .
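    Summary With ChatGPT-like large language models (LLMs) prevailing in the community, how to evaluate their ability is an open question. Existing evaluation methods suffer from (1) constrained evaluation abilities, (2) vulnerable benchmarks, and (3) unobjective metrics. The authors suggest task-based evaluation, in which LLM agents complete tasks in a simulated environment, as a one-for-all solution. They present AgentSims, an easy-to-use infrastructure that lets researchers from all disciplines test the specific capacities they are interested in. Researchers can build evaluation tasks by adding agents and buildings on an interactive GUI, or deploy and test new support mechanisms such as memory, planning, and tool-use systems with a few lines of code. The demo is available at https://agentsims.com.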

MSAC: Multiple Speech Attribute Control Method for Speech Emotion Recognition

  • paper_url: http://arxiv.org/abs/2308.04025
  • repo_url: None
  • paper_authors: Yu Pan
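  • for: This work explores the reliability of speech emotion recognition (SER) methods and studies how to model speech emotion from the perspective of data distribution across various speech attributes.
  • methods: The authors build a new CNN-based SER model that adopts additive margin softmax loss to expand the distance between features of different classes, which improves discrimination. They also propose MSAC, a multiple speech attribute control method that explicitly controls speech attributes so that the model is less affected by emotion-agnostic attributes and captures more fine-grained emotion-related features.
  • results: In both single-corpus and cross-corpus SER scenarios, the proposed SER workflow outperforms the baseline in recognition, generalization, and reliability. In single-corpus SER, it achieves a WAR of 72.97% and a UAR of 71.76% on the IEMOCAP corpus.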
    Abstract Despite significant progress, speech emotion recognition (SER) remains challenging due to inherent complexity and ambiguity of the emotion attribute, particularly in wild world. Whereas current studies primarily focus on recognition and generalization capabilities, this work pioneers an exploration into the reliability of SER methods and investigates how to model the speech emotion from the aspect of data distribution across various speech attributes. Specifically, we first build a novel CNN-based SER model which adopts additive margin softmax loss to expand the distance between features of different classes, thereby enhancing their discrimination. Second, a novel multiple speech attribute control method MSAC is proposed to explicitly control speech attributes, enabling the model to be less affected by emotion-agnostic attributes and capture more fine-grained emotion-related features. Third, we make a first attempt to test and analyze the reliability of the proposed SER workflow using the out-of-distribution detection method. Extensive experiments on both single and cross-corpus SER scenarios show that our proposed unified SER workflow consistently outperforms the baseline in terms of recognition, generalization, and reliability performance. Besides, in single-corpus SER, the proposed SER workflow achieves superior recognition results with a WAR of 72.97\% and a UAR of 71.76\% on the IEMOCAP corpus.
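    Summary Despite significant progress, speech emotion recognition (SER) remains challenging because of the inherent complexity and ambiguity of the emotion attribute, particularly in the wild. Whereas current studies focus mainly on recognition and generalization, this work explores the reliability of SER methods and investigates how to model speech emotion from the perspective of data distribution. First, the authors build a CNN-based SER model that adopts additive margin softmax loss to expand the distance between features of different classes and so improve discrimination. Second, they propose a Multiple Speech Attribute Control (MSAC) method that controls speech attributes, making the model less affected by emotion-agnostic attributes and able to capture more fine-grained emotion-related features. Finally, they make a first attempt to test and analyze the reliability of the proposed SER workflow, with extensive experiments in single-corpus and cross-corpus settings. The workflow consistently outperforms the baseline in recognition, generalization, and reliability. In the single-corpus setting, it achieves a WAR of 72.97% and a UAR of 71.76% on the IEMOCAP corpus.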
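
    Illustration (not from the paper): a minimal Python sketch of additive margin softmax loss for a single sample, assuming the standard formulation in which a margin is subtracted from the target-class cosine before scaling. The function name `am_softmax_loss` and the default `scale` and `margin` values are illustrative assumptions, not the paper's settings.

```python
import math


def am_softmax_loss(cosines, target, scale=30.0, margin=0.35):
    """Additive margin softmax loss for one sample.

    `cosines[j]` is the cosine similarity between the sample's feature and
    the weight vector of class j. The target class has `margin` subtracted
    before all logits are multiplied by `scale`.
    """
    logits = [
        scale * (c - margin) if j == target else scale * c
        for j, c in enumerate(cosines)
    ]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_norm - logits[target]


if __name__ == "__main__":
    cosines = [0.8, 0.3, 0.1]  # toy similarities to three emotion classes
    print(am_softmax_loss(cosines, target=0))
    print(am_softmax_loss(cosines, target=0, margin=0.0))
```

    With the margin in place, the same cosines give a larger loss than without it, so training has to push the target-class cosine further above the others. This is the mechanism behind the wider separation between classes that the abstract describes.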

Scope Loss for Imbalanced Classification and RL Exploration

  • paper_url: http://arxiv.org/abs/2308.04024
  • repo_url: None
  • paper_authors: Hasham Burhani, Xiao Qi Shi, Jonathan Jaegerman, Daniel Balicki
  • for: This paper aims to address the exploration-exploitation trade-off in reinforcement learning and the dataset imbalance problem in supervised classification.
  • methods: The paper equates the two problems and derives a novel loss function called Scope Loss, which adjusts gradients to prevent performance losses from over-exploitation and dataset imbalances without the need for tuning.
  • results: The paper shows that Scope Loss outperforms state-of-the-art loss functions over a basket of benchmark reinforcement learning tasks and a skewed classification dataset.
    Abstract We demonstrate equivalence between the reinforcement learning problem and the supervised classification problem. We consequently equate the exploration exploitation trade-off in reinforcement learning to the dataset imbalance problem in supervised classification, and find similarities in how they are addressed. From our analysis of the aforementioned problems we derive a novel loss function for reinforcement learning and supervised classification. Scope Loss, our new loss function, adjusts gradients to prevent performance losses from over-exploitation and dataset imbalances, without the need for any tuning. We test Scope Loss against SOTA loss functions over a basket of benchmark reinforcement learning tasks and a skewed classification dataset, and show that Scope Loss outperforms other loss functions.
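    Summary The authors demonstrate an equivalence between the reinforcement learning problem and the supervised classification problem. They therefore equate the exploration-exploitation trade-off in reinforcement learning with the dataset imbalance problem in supervised classification, and find similarities in how the two are addressed. From this analysis they derive a new loss function, Scope Loss, which adjusts gradients to prevent performance losses from over-exploitation and dataset imbalance without any tuning. Tested on a set of benchmark reinforcement learning tasks and a skewed classification dataset, Scope Loss outperforms state-of-the-art loss functions.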

Improving Performance of Semi-Supervised Learning by Adversarial Attacks

  • paper_url: http://arxiv.org/abs/2308.04018
  • repo_url: None
  • paper_authors: Dongyoon Yang, Kunwoong Kim, Yongdai Kim
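  • for: Improve the performance of recent semi-supervised learning (SSL) algorithms.
  • methods: Adversarial attacks on pre-trained models are used to select high-confidence unlabeled data to be labeled with the current predictions.
  • results: On CIFAR10, three recent SSL algorithms combined with SCAR show significantly improved image classification.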
    Abstract Semi-supervised learning (SSL) algorithm is a setup built upon a realistic assumption that access to a large amount of labeled data is tough. In this study, we present a generalized framework, named SCAR, standing for Selecting Clean samples with Adversarial Robustness, for improving the performance of recent SSL algorithms. By adversarially attacking pre-trained models with semi-supervision, our framework shows substantial advances in classifying images. We introduce how adversarial attacks successfully select high-confident unlabeled data to be labeled with current predictions. On CIFAR10, three recent SSL algorithms with SCAR result in significantly improved image classification.
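    Summary Semi-supervised learning (SSL) rests on the realistic assumption that a large amount of labeled data is hard to obtain. This study presents a generalized framework named SCAR (Selecting Clean samples with Adversarial Robustness) for improving the performance of recent SSL algorithms. By adversarially attacking pre-trained models, the framework selects high-confidence unlabeled samples to be labeled with the current predictions. On CIFAR10, three recent SSL algorithms combined with SCAR show significantly improved image classification.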

Multi-Granularity Attention Model for Group Recommendation

  • paper_url: http://arxiv.org/abs/2308.04017
  • repo_url: None
  • paper_authors: Jianye Ji, Jiayan Pei, Shaochuan Lin, Taotao Zhou, Hengxu He, Jia Jia, Ning Hu
  • for: Provide personalized recommendations to a group of users based on their shared interests, preferences, and characteristics.
  • methods: The method uses multiple levels of granularity (i.e., subsets, groups, and supersets) to uncover group members' latent preferences and mitigate recommendation noise. It includes a Subset Preference Extraction module, a Group Preference Extraction module, and a Superset Preference Extraction module.
  • results: The method reduces recommendation noise across multiple granularities and comprehensively learns individual interests. Extensive offline and online experiments demonstrate its superior performance.
    Abstract Group recommendation provides personalized recommendations to a group of users based on their shared interests, preferences, and characteristics. Current studies have explored different methods for integrating individual preferences and making collective decisions that benefit the group as a whole. However, most of them heavily rely on users with rich behavior and ignore latent preferences of users with relatively sparse behavior, leading to insufficient learning of individual interests. To address this challenge, we present the Multi-Granularity Attention Model (MGAM), a novel approach that utilizes multiple levels of granularity (i.e., subsets, groups, and supersets) to uncover group members' latent preferences and mitigate recommendation noise. Specially, we propose a Subset Preference Extraction module that enhances the representation of users' latent subset-level preferences by incorporating their previous interactions with items and utilizing a hierarchical mechanism. Additionally, our method introduces a Group Preference Extraction module and a Superset Preference Extraction module, which explore users' latent preferences on two levels: the group-level, which maintains users' original preferences, and the superset-level, which includes group-group exterior information. By incorporating the subset-level embedding, group-level embedding, and superset-level embedding, our proposed method effectively reduces group recommendation noise across multiple granularities and comprehensively learns individual interests. Extensive offline and online experiments have demonstrated the superiority of our method in terms of performance.
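    Summary Group recommendation provides personalized recommendations to a group of users based on their shared interests, preferences, and characteristics. Existing studies have explored different ways to integrate individual preferences and make collective decisions for the group, but most of them ignore users' latent preferences, which leads to insufficient learning of individual interests. To address this, the authors propose the Multi-Granularity Attention Model (MGAM), which uses multiple levels of granularity (i.e., subsets, groups, and supersets) to uncover group members' latent preferences and reduce recommendation noise. A Subset Preference Extraction module strengthens the representation of users' latent subset-level preferences using their previous interactions with items and a hierarchical mechanism. A Group Preference Extraction module and a Superset Preference Extraction module explore users' group-level and superset-level preferences respectively. By combining subset-level, group-level, and superset-level embeddings, the method effectively reduces group recommendation noise and comprehensively learns individual interests. Extensive online and offline experiments show a clear performance advantage.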

Understanding CNN Hidden Neuron Activations Using Structured Background Knowledge and Deductive Reasoning

  • paper_url: http://arxiv.org/abs/2308.03999
  • repo_url: https://github.com/abhilekha-dalal/xai-using-wikidataAndEcii
  • paper_authors: Abhilekha Dalal, Md Kamruzzaman Sarker, Adrita Barua, Eugene Vasserman, Pascal Hitzler
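  • for: This paper aims to interpret the activations of hidden neurons in deep learning systems, giving insight into what the system has internally detected as relevant in the input and reducing the black-box character of deep learning.
  • methods: The paper uses large-scale background knowledge (approximately 2 million classes) together with a symbolic reasoning approach called Concept Induction, which is based on description logics and was originally developed for the Semantic Web field. A hypothesis and verification process links the background knowledge to individual neurons in the dense layer of a convolutional neural network and yields meaningful labels.
  • results: The method automatically attaches meaningful labels from the background knowledge to individual neurons in the dense layer of a convolutional neural network. These labels help explain hidden neuron activations.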
    Abstract A major challenge in Explainable AI is in correctly interpreting activations of hidden neurons: accurate interpretations would provide insights into the question of what a deep learning system has internally detected as relevant on the input, demystifying the otherwise black-box character of deep learning systems. The state of the art indicates that hidden node activations can, in some cases, be interpretable in a way that makes sense to humans, but systematic automated methods that would be able to hypothesize and verify interpretations of hidden neuron activations are underexplored. In this paper, we provide such a method and demonstrate that it provides meaningful interpretations. Our approach is based on using large-scale background knowledge approximately 2 million classes curated from the Wikipedia concept hierarchy together with a symbolic reasoning approach called Concept Induction based on description logics, originally developed for applications in the Semantic Web field. Our results show that we can automatically attach meaningful labels from the background knowledge to individual neurons in the dense layer of a Convolutional Neural Network through a hypothesis and verification process.
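    Summary A major challenge in Explainable AI is correctly interpreting the activations of hidden neurons: accurate interpretations would give insight into what a deep learning system has internally detected as relevant in the input, removing the black-box character of such systems. The state of the art indicates that hidden node activations can, in some cases, be interpreted in a way that makes sense to humans, but systematic automated methods that can hypothesize and verify such interpretations are underexplored. This paper provides such a method and shows that it yields meaningful interpretations. The approach uses large-scale background knowledge (approximately 2 million classes) from the Wikipedia concept hierarchy together with a symbolic reasoning approach called Concept Induction, which is based on description logics and was originally developed for the Semantic Web field. The results show that meaningful labels from the background knowledge can be attached automatically to individual neurons in the dense layer of a convolutional neural network through a hypothesis and verification process.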

Cooperative Multi-Type Multi-Agent Deep Reinforcement Learning for Resource Management in Space-Air-Ground Integrated Networks

  • paper_url: http://arxiv.org/abs/2308.03995
  • repo_url: None
  • paper_authors: Hengxi Zhang, Huaze Tang, Wenbo Ding, Xiao-Ping Zhang
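  • for: This paper develops a Space-Air-Ground Integrated Network (SAGIN) system that encompasses several communication links, and addresses its resource management problem with cooperative multi-type multi-agent deep reinforcement learning (CMT-MARL).
  • methods: The system models five distinct communication links, and an efficient CMT-MARL method manages their resources.
  • results: Experiments demonstrate the efficacy of CMT-MARL on key performance indicators such as the overall transmission rate and the transmission success rate. These results underscore the potential value and feasibility of SAGIN.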
    Abstract The Space-Air-Ground Integrated Network (SAGIN), integrating heterogeneous devices including low earth orbit (LEO) satellites, unmanned aerial vehicles (UAVs), and ground users (GUs), holds significant promise for advancing smart city applications. However, resource management of the SAGIN is a challenge requiring urgent study in that inappropriate resource management will cause poor data transmission, and hence affect the services in smart cities. In this paper, we develop a comprehensive SAGIN system that encompasses five distinct communication links and propose an efficient cooperative multi-type multi-agent deep reinforcement learning (CMT-MARL) method to address the resource management issue. The experimental results highlight the efficacy of the proposed CMT-MARL, as evidenced by key performance indicators such as the overall transmission rate and transmission success rate. These results underscore the potential value and feasibility of future implementation of the SAGIN.
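    Summary The Space-Air-Ground Integrated Network (SAGIN), which integrates heterogeneous devices including low earth orbit (LEO) satellites, unmanned aerial vehicles (UAVs), and ground users (GUs), holds significant promise for advancing smart city applications. Resource management in the SAGIN is a challenge that needs urgent study, because inappropriate resource management causes poor data transmission and therefore degrades smart city services. This paper develops a comprehensive SAGIN system with five distinct communication links and proposes an efficient cooperative multi-type multi-agent deep reinforcement learning (CMT-MARL) method for the resource management problem. Experimental results show the efficacy of CMT-MARL on key performance indicators such as the overall transmission rate and the transmission success rate, underscoring the potential value and feasibility of the SAGIN.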

AI Chatbots as Multi-Role Pedagogical Agents: Transforming Engagement in CS Education

  • paper_url: http://arxiv.org/abs/2308.03992
  • repo_url: None
  • paper_authors: Cassie Chen Cao, Zijian Ding, Jionghao Lin, Frank Hopfgartner
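  • for: The study investigates AI-powered, multi-role chatbots as a way to enhance learning experiences and engagement in computer science education.
  • methods: Following a design-based research approach, the authors develop, implement, and evaluate a learning environment with four chatbot roles (Instructor Bot, Peer Bot, Career Advising Bot, and Emotional Supporter Bot). The roles are designed around Self-Determination Theory and cater to learners' three innate psychological needs: competence, autonomy, and relatedness.
  • results: The system is tested in a higher education context for one month with 200 students and compared against conditions with a human tutor and with a single chatbot. The mixed-methods evaluation combines quantitative measures such as chat log sequence analysis with qualitative methods including surveys and focus group interviews, and uses topic modelling and sentiment analysis to study the system's impact on learner engagement, motivation, and inquiry-based learning.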
    Abstract This study investigates the use of Artificial Intelligence (AI)-powered, multi-role chatbots as a means to enhance learning experiences and foster engagement in computer science education. Leveraging a design-based research approach, we develop, implement, and evaluate a novel learning environment enriched with four distinct chatbot roles: Instructor Bot, Peer Bot, Career Advising Bot, and Emotional Supporter Bot. These roles, designed around the tenets of Self-Determination Theory, cater to the three innate psychological needs of learners - competence, autonomy, and relatedness. Additionally, the system embraces an inquiry-based learning paradigm, encouraging students to ask questions, seek solutions, and explore their curiosities. We test this system in a higher education context over a period of one month with 200 participating students, comparing outcomes with conditions involving a human tutor and a single chatbot. Our research utilizes a mixed-methods approach, encompassing quantitative measures such as chat log sequence analysis, and qualitative methods including surveys and focus group interviews. By integrating cutting-edge Natural Language Processing techniques such as topic modelling and sentiment analysis, we offer an in-depth understanding of the system's impact on learner engagement, motivation, and inquiry-based learning. This study, through its rigorous design and innovative approach, provides significant insights into the potential of AI-empowered, multi-role chatbots in reshaping the landscape of computer science education and fostering an engaging, supportive, and motivating learning environment.
    Summary The system is tested in a higher education context for one month with 200 participating students, and outcomes are compared with conditions involving a human tutor and a single chatbot. The research combines quantitative measures such as chat log sequence analysis with qualitative methods such as surveys and focus group interviews, and applies Natural Language Processing techniques such as topic modelling and sentiment analysis to understand the system's impact on learner engagement, motivation, and inquiry-based learning. The study offers insights into the potential of AI-empowered, multi-role chatbots to reshape computer science education and to create an engaging, supportive, and motivating learning environment.

NEOLAF, an LLM-powered neural-symbolic cognitive architecture

  • paper_url: http://arxiv.org/abs/2308.03990
  • repo_url: None
  • paper_authors: Richard Jiarui Tong, Cassie Chen Cao, Timothy Xueqian Lee, Guodong Zhao, Ray Wan, Feiyue Wang, Xiangen Hu, Robin Schmucker, Jinsheng Pan, Julian Quevedo, Yu Lu
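  • for: The paper presents a cognitive architecture for modelling and constructing intelligent agents, demonstrated with an agent that solves complex math problems.
  • methods: It proposes NEOLAF (Never Ending Open Learning Adaptive Framework), an integrated neural-symbolic cognitive architecture. The authors argue for it over pure connectionist and pure symbolic approaches on the grounds of explainability, incremental learning, efficiency, collaborative and distributed learning, human-in-the-loop enablement, and self-improvement.
  • results: In an experiment on the open-source MATH dataset, a NEOLAF problem-solving agent is reported to show strong learning capability, which the authors see as promising for cognitive architectures and self-improving adaptive instructional systems.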
    Abstract This paper presents the Never Ending Open Learning Adaptive Framework (NEOLAF), an integrated neural-symbolic cognitive architecture that models and constructs intelligent agents. The NEOLAF framework is a superior approach to constructing intelligent agents than both the pure connectionist and pure symbolic approaches due to its explainability, incremental learning, efficiency, collaborative and distributed learning, human-in-the-loop enablement, and self-improvement. The paper further presents a compelling experiment where a NEOLAF agent, built as a problem-solving agent, is fed with complex math problems from the open-source MATH dataset. The results demonstrate NEOLAF's superior learning capability and its potential to revolutionize the field of cognitive architectures and self-improving adaptive instructional systems.

SimplyRetrieve: A Private and Lightweight Retrieval-Centric Generative AI Tool

  • paper_url: http://arxiv.org/abs/2308.03983
  • repo_url: https://github.com/rcgai/simplyretrieve
  • paper_authors: Youyang Ng, Daisuke Miyashita, Yasuto Hoshi, Yasuhiro Morioka, Osamu Torii, Tomoya Kodama, Jun Deguchi
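  • for: The paper explores how a knowledge retrieval architecture can integrate private data into publicly available generative AI systems built on pre-trained LLMs, without additional model fine-tuning.
  • methods: It adopts the Retrieval-Centric Generation (RCG) approach, which explicitly separates the roles of LLMs and retrievers in context interpretation and knowledge memorization and may lead to more efficient implementations.
  • results: It introduces SimplyRetrieve, an open-source, GUI- and API-based RCG platform with a Private Knowledge Base Constructor and a Retrieval Tuning Module, intended to give the machine learning community a localized, lightweight, and user-friendly interface to these techniques.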
    Abstract Large Language Model (LLM) based Generative AI systems have seen significant progress in recent years. Integrating a knowledge retrieval architecture allows for seamless integration of private data into publicly available Generative AI systems using pre-trained LLM without requiring additional model fine-tuning. Moreover, Retrieval-Centric Generation (RCG) approach, a promising future research direction that explicitly separates roles of LLMs and retrievers in context interpretation and knowledge memorization, potentially leads to more efficient implementation. SimplyRetrieve is an open-source tool with the goal of providing a localized, lightweight, and user-friendly interface to these sophisticated advancements to the machine learning community. SimplyRetrieve features a GUI and API based RCG platform, assisted by a Private Knowledge Base Constructor and a Retrieval Tuning Module. By leveraging these capabilities, users can explore the potential of RCG for improving generative AI performance while maintaining privacy standards. The tool is available at https://github.com/RCGAI/SimplyRetrieve with an MIT license.

CheXFusion: Effective Fusion of Multi-View Features using Transformers for Long-Tailed Chest X-Ray Classification

  • paper_url: http://arxiv.org/abs/2308.03968
  • repo_url: https://github.com/dongkyuk/CXR-LT-public-solution
  • paper_authors: Dongkyun Kim
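  • for: The paper addresses three challenges in medical image classification: the long-tailed distribution of diseases, the co-occurrence of diagnostic findings, and the multiple views available for each study or patient.
  • methods: It proposes CheXFusion, a transformer-based fusion module that uses self-attention and cross-attention to aggregate multi-view features while taking label co-occurrence into account. It also explores data balancing and self-training to improve model performance.
  • results: The solution achieves 0.372 mAP on the MIMIC-CXR test set and takes 1st place in the ICCV CVAMD 2023 CXR-LT shared task, underscoring the importance of multi-view settings, class imbalance, and label co-occurrence (code: https://github.com/dongkyuk/CXR-LT-public-solution)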
    Abstract Medical image classification poses unique challenges due to the long-tailed distribution of diseases, the co-occurrence of diagnostic findings, and the multiple views available for each study or patient. This paper introduces our solution to the ICCV CVAMD 2023 Shared Task on CXR-LT: Multi-Label Long-Tailed Classification on Chest X-Rays. Our approach introduces CheXFusion, a transformer-based fusion module incorporating multi-view images. The fusion module, guided by self-attention and cross-attention mechanisms, efficiently aggregates multi-view features while considering label co-occurrence. Furthermore, we explore data balancing and self-training methods to optimize the model's performance. Our solution achieves state-of-the-art results with 0.372 mAP in the MIMIC-CXR test set, securing 1st place in the competition. Our success in the task underscores the significance of considering multi-view settings, class imbalance, and label co-occurrence in medical image classification. Public code is available at https://github.com/dongkyuk/CXR-LT-public-solution
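    Summary Medical image classification poses unique challenges: the long-tailed distribution of diseases, the co-occurrence of diagnostic findings, and the multiple views available for each study or patient. This paper presents the authors' solution to the ICCV CVAMD 2023 Shared Task on CXR-LT: Multi-Label Long-Tailed Classification on Chest X-Rays. The approach introduces CheXFusion, a transformer-based fusion module that uses self-attention and cross-attention to aggregate multi-view features efficiently while considering label co-occurrence. The authors also explore data balancing and self-training to optimize model performance. The solution achieves 0.372 mAP on the MIMIC-CXR test set and secures 1st place in the competition, which underscores the importance of multi-view settings, class imbalance, and label co-occurrence in medical image classification. Code is available at https://github.com/dongkyuk/CXR-LT-public-solution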

ALFA – Leveraging All Levels of Feature Abstraction for Enhancing the Generalization of Histopathology Image Classification Across Unseen Hospitals

  • paper_url: http://arxiv.org/abs/2308.03936
  • repo_url: None
  • paper_authors: Milad Sikaroudi, Maryam Hosseini, Shahryar Rahnamayan, H. R. Tizhoosh
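  • for: Improving the generalization of histopathology image classification so that models perform well on images from hospitals not seen during training.
  • methods: Augmentation-based self-supervision, with common histopathology distribution shifts as the pretext task, derives invariant features from training images without labels. A domain alignment module extracts further invariant features across training hospitals, a separate encoder trained on hospital labels captures hospital-specific features, and the resulting features are disentangled to reduce redundancy.
  • results: Experiments on PACS, a synthetic dataset built by applying histopathology-specific jitters to MHIST, and a Renal Cell Carcinoma dataset from four TCGA image repositories indicate improved generalization to out-of-distribution hospital images.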
    Abstract We propose an exhaustive methodology that leverages all levels of feature abstraction, targeting an enhancement in the generalizability of image classification to unobserved hospitals. Our approach incorporates augmentation-based self-supervision with common distribution shifts in histopathology scenarios serving as the pretext task. This enables us to derive invariant features from training images without relying on training labels, thereby covering different abstraction levels. Moving onto the subsequent abstraction level, we employ a domain alignment module to facilitate further extraction of invariant features across varying training hospitals. To represent the highly specific features of participating hospitals, an encoder is trained to classify hospital labels, independent of their diagnostic labels. The features from each of these encoders are subsequently disentangled to minimize redundancy and segregate the features. This representation, which spans a broad spectrum of semantic information, enables the development of a model demonstrating increased robustness to unseen images from disparate distributions. Experimental results from the PACS dataset (a domain generalization benchmark), a synthetic dataset created by applying histopathology-specific jitters to the MHIST dataset (defining different domains with varied distribution shifts), and a Renal Cell Carcinoma dataset derived from four image repositories from TCGA, collectively indicate that our proposed model is adept at managing varying levels of image granularity. Thus, it shows improved generalizability when faced with new, out-of-distribution hospital images.
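    Summary The paper proposes a methodology that leverages all levels of feature abstraction to improve how well image classification generalizes to unseen hospitals. Augmentation-based self-supervision, with common distribution shifts in histopathology as the pretext task, derives invariant features from training images without relying on training labels. At the next abstraction level, a domain alignment module extracts further invariant features across the training hospitals. To represent the highly specific features of participating hospitals, an encoder is trained to classify hospital labels independently of the diagnostic labels. The features from these encoders are then disentangled to minimize redundancy and keep them separate. This representation spans a broad range of semantic information and supports a model that is more robust to unseen images from different distributions. Experiments on the PACS dataset (a domain generalization benchmark), a synthetic dataset created by applying histopathology-specific jitters to MHIST, and a Renal Cell Carcinoma dataset from TCGA indicate that the model generalizes better across varying levels of image granularity.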

  • paper_url: http://arxiv.org/abs/2308.03929
  • repo_url: None
  • paper_authors: Ahmed Abdeen Hamed, Alessandro Crimi, Magdalena M. Misiak, Byung Suk Lee
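  • for: The study builds ontology-based knowledge graphs from medical literature and from AI-generated content in order to distinguish factual information from unverified data.
  • methods: Knowledge graphs are built with the Disease Ontology (DOID) and the Symptom Ontology (SYMP). Fact-checking algorithms and network centrality metrics are then used for a GPT disease-symptom link analysis that quantifies the accuracy of factual knowledge.
  • results: Comparing ChatGPT knowledge graphs with their PubMed counterparts, some ChatGPT graphs have more connections than the PubMed graphs, and some have higher centrality scores, especially for overlapping nodes. The factual link ratio between any two graphs peaks at 60%. The authors read this as untapped, still unverified knowledge in AI-generated content.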
    Abstract Methods: Through an innovative approach, we construct ontology-based knowledge graphs from authentic medical literature and AI-generated content. Our goal is to distinguish factual information from unverified data. We compiled two datasets: one from biomedical literature using a "human disease and symptoms" query, and another generated by ChatGPT, simulating articles. With these datasets (PubMed and ChatGPT), we curated 10 sets of 250 abstracts each, selected randomly with a specific seed. Our method focuses on utilizing disease ontology (DOID) and symptom ontology (SYMP) to build knowledge graphs, robust mathematical models that facilitate unbiased comparisons. By employing our fact-checking algorithms and network centrality metrics, we conducted GPT disease-symptoms link analysis to quantify the accuracy of factual knowledge amid noise, hypotheses, and significant findings. Results: The findings obtained from the comparison of diverse ChatGPT knowledge graphs with their PubMed counterparts revealed some interesting observations. While PubMed knowledge graphs exhibit a wealth of disease-symptom terms, it is surprising to observe that some ChatGPT graphs surpass them in the number of connections. Furthermore, some GPT graphs are demonstrating supremacy of the centrality scores, especially for the overlapping nodes. This striking contrast indicates the untapped potential of knowledge that can be derived from AI-generated content, awaiting verification. Out of all the graphs, the factual link ratio between any two graphs reached its peak at 60%. Conclusions: An intriguing insight from our findings was the striking number of links among terms in the knowledge graph generated from ChatGPT datasets, surpassing some of those in its PubMed counterpart. This early discovery has prompted further investigation using universal network metrics to unveil the new knowledge the links may hold.
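    Summary Methods: The authors construct ontology-based knowledge graphs from authentic medical literature and from AI-generated content, with the goal of distinguishing factual information from unverified data. Two datasets were compiled: one from biomedical literature using a "human disease and symptoms" query, and one generated by ChatGPT to simulate articles. From these datasets (PubMed and ChatGPT), 10 sets of 250 abstracts each were selected randomly with a specific seed. The knowledge graphs are built with the Disease Ontology (DOID) and the Symptom Ontology (SYMP), and fact-checking algorithms together with network centrality metrics are used for a GPT disease-symptom link analysis that quantifies the accuracy of factual knowledge amid noise, hypotheses, and significant findings. Results: Comparing the ChatGPT knowledge graphs with their PubMed counterparts revealed some interesting observations. PubMed knowledge graphs contain a wealth of disease-symptom terms, yet some ChatGPT graphs surpass them in the number of connections. Some GPT graphs also have higher centrality scores, especially for the overlapping nodes. This contrast points to knowledge in AI-generated content that has potential value but still awaits verification. Across all graphs, the factual link ratio between any two graphs peaked at 60%. Conclusions: The knowledge graphs generated from the ChatGPT datasets contain a striking number of links among terms, in some cases more than their PubMed counterparts. This early finding has prompted further investigation using universal network metrics to uncover the new knowledge these links may hold.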
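
The abstract reports a "factual link ratio" between graphs but does not define it here. The sketch below is one plausible reading, offered as an assumption rather than the authors' method: the share of links in a candidate graph (for example, one built from ChatGPT text) that also appear in a reference graph (for example, one built from PubMed). The function name and the toy disease-symptom pairs are invented for illustration.

```python
def factual_link_ratio(candidate_edges, reference_edges):
    """Share of candidate links that also appear in the reference graph.

    Assumed definition, not taken from the paper.
    """
    # Treat links as undirected: (disease, symptom) equals (symptom, disease).
    def normalize(edges):
        return {frozenset(edge) for edge in edges}

    candidate = normalize(candidate_edges)
    reference = normalize(reference_edges)
    if not candidate:
        return 0.0
    return len(candidate & reference) / len(candidate)


# Toy data, invented for illustration only.
pubmed_edges = [("influenza", "fever"), ("influenza", "cough"), ("asthma", "wheezing")]
chatgpt_edges = [
    ("fever", "influenza"),
    ("asthma", "wheezing"),
    ("asthma", "fever"),
    ("influenza", "fatigue"),
    ("migraine", "nausea"),
]

ratio = factual_link_ratio(chatgpt_edges, pubmed_edges)
print(f"factual link ratio: {ratio:.2f}")
```

In this toy case two of the five candidate links are confirmed by the reference graph. Links that are not confirmed are not necessarily false; they are simply unverified, which matches the framing in the abstract.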

ViLP: Knowledge Exploration using Vision, Language, and Pose Embeddings for Video Action Recognition

  • paper_url: http://arxiv.org/abs/2308.03908
  • repo_url: None
  • paper_authors: Soumyabrata Chaudhuri, Saumik Bhattacharya
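  • for: The paper proposes a multi-modal learning approach to video action recognition (VAR) that aims to improve recognition accuracy.
  • methods: It presents a pose-augmented vision-language model (VLM) that combines pose, visual information (RGB), and text attributes.
  • results: Without any video data pre-training, the method reaches 92.81% accuracy on UCF-101 and 73.02% on HMDB-51; after Kinetics pre-training, accuracy rises to 96.11% and 75.75%, respectively.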
    Abstract Video Action Recognition (VAR) is a challenging task due to its inherent complexities. Though different approaches have been explored in the literature, designing a unified framework to recognize a large number of human actions is still a challenging problem. Recently, Multi-Modal Learning (MML) has demonstrated promising results in this domain. In literature, 2D skeleton or pose modality has often been used for this task, either independently or in conjunction with the visual information (RGB modality) present in videos. However, the combination of pose, visual information, and text attributes has not been explored yet, though text and pose attributes independently have been proven to be effective in numerous computer vision tasks. In this paper, we present the first pose augmented Vision-language model (VLM) for VAR. Notably, our scheme achieves an accuracy of 92.81% and 73.02% on two popular human video action recognition benchmark datasets, UCF-101 and HMDB-51, respectively, even without any video data pre-training, and an accuracy of 96.11% and 75.75% after kinetics pre-training.
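    Summary Video action recognition (VAR) is a challenging task because of its inherent complexities. Different approaches have been explored in the literature, but designing a unified framework that recognizes a large number of human actions remains difficult. Multi-modal learning (MML) has recently shown promising results in this domain. The 2D skeleton or pose modality has often been used for the task, either on its own or together with the visual information (RGB modality) in videos. The combination of pose, visual information, and text attributes has not yet been explored, although text and pose attributes have each proven effective in many computer vision tasks. This paper presents the first pose-augmented vision-language model (VLM) for VAR. It reaches 92.81% and 73.02% accuracy on the UCF-101 and HMDB-51 benchmarks, respectively, without any video data pre-training, and 96.11% and 75.75% after Kinetics pre-training.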

Intelligent Assistant Language Understanding On Device

  • paper_url: http://arxiv.org/abs/2308.03905
  • repo_url: https://github.com/Aryia-Behroziuan/References
  • paper_authors: Cecilia Aas, Hisham Abdelsalam, Irina Belousova, Shruti Bhargava, Jianpeng Cheng, Robert Daland, Joris Driesen, Federico Flego, Tristan Guigue, Anders Johannsen, Partha Lal, Jiarui Lu, Joel Ruben Antony Moniz, Nathan Perkins, Dhivya Piraviperumal, Stephen Pulman, Diarmuid Ó Séaghdha, David Q. Sun, John Torr, Marco Del Vecchio, Jay Wacker, Jason D. Williams, Hong Yu
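  • for: The paper describes the design of a natural language understanding system that runs on personal devices and is, compared with a server-based assistant, more private, more reliable, faster, more expressive, and more accurate.
  • methods: It explains what led to key choices about architecture and technologies, including the observation that some approaches from the dialog systems literature are difficult to maintain over time in a deployment setting.
  • results: The authors share lessons from practical experience with an on-device system, in the hope of informing future work in the research community.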
    Abstract It has recently become feasible to run personal digital assistants on phones and other personal devices. In this paper we describe a design for a natural language understanding system that runs on device. In comparison to a server-based assistant, this system is more private, more reliable, faster, more expressive, and more accurate. We describe what led to key choices about architecture and technologies. For example, some approaches in the dialog systems literature are difficult to maintain over time in a deployment setting. We hope that sharing learnings from our practical experiences may help inform future work in the research community.
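    Summary It has recently become feasible to run personal digital assistants on phones and other personal devices. This paper describes the design of a natural language understanding system that runs on device. Compared with a server-based assistant, the system is more private, more reliable, faster, more expressive, and more accurate. The authors explain what led to key choices about architecture and technologies; for example, some approaches in the dialog systems literature are difficult to maintain over time in a deployment setting. They hope that sharing lessons from practical experience will help inform future research.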

FLIPS: Federated Learning using Intelligent Participant Selection

  • paper_url: http://arxiv.org/abs/2308.03901
  • repo_url: None
  • paper_authors: Rahul Atul Bhope, K. R. Jayaram, Nalini Venkatasubramanian, Ashish Verma, Gegi Thomas
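  • for: The paper addresses data and participant heterogeneity in federated learning (FL) training workloads, and in particular how label distribution clustering benefits participant selection.
  • methods: FLIPS is a middleware system that clusters the parties in an FL job by the label distribution of their data beforehand and, during training, ensures that each cluster is equitably represented among the selected participants. It supports common FL algorithms (FedAvg, FedProx, FedDyn, FedOpt, and FedYogi), includes a straggler management mechanism to handle platform heterogeneity and dynamic resource availability, and uses a trusted execution environment (TEE) to protect the privacy of label distributions, clustering, and participant selection.
  • results: In an evaluation on two real-world datasets, FLIPS is compared with random selection and with two other "smart" selection mechanisms, Oort and gradient clustering. It improves convergence, achieving 17-20% higher accuracy with 20-60% lower communication costs, and these benefits persist in the presence of straggler participants.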
    Abstract This paper presents the design and implementation of FLIPS, a middleware system to manage data and participant heterogeneity in federated learning (FL) training workloads. In particular, we examine the benefits of label distribution clustering on participant selection in federated learning. FLIPS clusters parties involved in an FL training job based on the label distribution of their data apriori, and during FL training, ensures that each cluster is equitably represented in the participants selected. FLIPS can support the most common FL algorithms, including FedAvg, FedProx, FedDyn, FedOpt and FedYogi. To manage platform heterogeneity and dynamic resource availability, FLIPS incorporates a straggler management mechanism to handle changing capacities in distributed, smart community applications. Privacy of label distributions, clustering and participant selection is ensured through a trusted execution environment (TEE). Our comprehensive empirical evaluation compares FLIPS with random participant selection, as well as two other "smart" selection mechanisms - Oort and gradient clustering using two real-world datasets, two different non-IID distributions and three common FL algorithms (FedYogi, FedProx and FedAvg). We demonstrate that FLIPS significantly improves convergence, achieving higher accuracy by 17 - 20 % with 20 - 60 % lower communication costs, and these benefits endure in the presence of straggler participants.
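    Summary The paper presents FLIPS, a middleware system that manages data and participant heterogeneity in federated learning (FL) training workloads. FLIPS clusters the parties by the label distribution of their data before training and, during training, ensures that each cluster is equitably represented among the selected participants. It supports common FL algorithms, handles platform heterogeneity and dynamic resource availability, and uses a trusted execution environment (TEE) to protect the privacy of label distributions, clustering, and participant selection. Experiments show that FLIPS significantly improves convergence, achieving 17-20% higher accuracy with 20-60% lower communication costs, and that these benefits persist in the presence of straggler participants.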
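
The two steps described above, clustering parties by label distribution and then drawing participants evenly across clusters, can be sketched as follows. This is a toy illustration under stated assumptions, not the FLIPS implementation: the clustering is a small hand-written k-means with a deterministic start, the round-robin selection is one simple way to get equitable representation, and all names and numbers are invented. Straggler handling and the TEE are left out.

```python
import random


def label_distribution(counts):
    """Turn per-label sample counts into a probability vector."""
    total = sum(counts)
    return [c / total for c in counts]


def kmeans(points, k, iters=20):
    """Tiny k-means; the first k points serve as initial centroids (a simplification)."""
    centroids = [list(p) for p in points[:k]]
    assign = [0] * len(points)
    for _ in range(iters):
        for i, p in enumerate(points):
            assign[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        for c in range(k):
            members = [points[i] for i in range(len(points)) if assign[i] == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def select_participants(assign, n, rng):
    """Pick n parties round-robin over clusters so each cluster is represented evenly."""
    clusters = {}
    for party, cluster in enumerate(assign):
        clusters.setdefault(cluster, []).append(party)
    # Shuffle inside each cluster so the pick within a cluster is random.
    pools = [rng.sample(members, len(members)) for _, members in sorted(clusters.items())]
    chosen = []
    while len(chosen) < n and any(pools):
        for pool in pools:
            if pool and len(chosen) < n:
                chosen.append(pool.pop())
    return chosen


# Six hypothetical parties with two labels each; even-indexed parties hold mostly label 0.
party_counts = [[90, 10], [10, 90], [80, 20], [20, 80], [95, 5], [5, 95]]
points = [label_distribution(c) for c in party_counts]
assign = kmeans(points, k=2)
chosen = select_participants(assign, n=4, rng=random.Random(0))
print(assign, chosen)
```

With purely random selection, a round could by chance contain only parties dominated by one label; the round-robin draw rules that out whenever enough clusters have members left.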

Exploiting Generalization in Offline Reinforcement Learning via Unseen State Augmentations

  • paper_url: http://arxiv.org/abs/2308.03882
  • repo_url: None
  • paper_authors: Nirbhay Modhe, Qiaozi Gao, Ashwin Kalyan, Dhruv Batra, Govind Thattai, Gaurav Sukhatme
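  • for: The paper studies how offline reinforcement learning (RL) methods can exploit unseen states, including states far from the available offline data.
  • methods: Offline RL methods rely on conservative value estimation: model-free methods penalize values at all unseen actions, while model-based methods can exploit unseen states through model rollouts. Existing methods are limited by very short rollout horizons and by rollouts that start only from states observed in the offline data. The paper relaxes the second restriction with an unseen state augmentation strategy that generates unseen states by value-informed perturbations of seen states and filters them by epistemic uncertainty.
  • results: Performance improves on several offline RL tasks, and the augmentation strategy consistently yields lower average dataset Q-value estimates, that is, more conservative estimates than the baseline.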
    Abstract Offline reinforcement learning (RL) methods strike a balance between exploration and exploitation by conservative value estimation -- penalizing values of unseen states and actions. Model-free methods penalize values at all unseen actions, while model-based methods are able to further exploit unseen states via model rollouts. However, such methods are handicapped in their ability to find unseen states far away from the available offline data due to two factors -- (a) very short rollout horizons in models due to cascading model errors, and (b) model rollouts originating solely from states observed in offline data. We relax the second assumption and present a novel unseen state augmentation strategy to allow exploitation of unseen states where the learned model and value estimates generalize. Our strategy finds unseen states by value-informed perturbations of seen states followed by filtering out states with epistemic uncertainty estimates too high (high error) or too low (too similar to seen data). We observe improved performance in several offline RL tasks and find that our augmentation strategy consistently leads to overall lower average dataset Q-value estimates i.e. more conservative Q-value estimates than a baseline.
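    Summary Offline reinforcement learning (RL) methods strike a balance between exploration and exploitation through conservative value estimation, which penalizes the values of unseen states and actions. Model-free methods penalize values at all unseen actions, while model-based methods can further exploit unseen states via model rollouts. These methods are nevertheless limited in finding unseen states far from the available offline data, for two reasons: (a) rollout horizons are very short because of cascading model errors, and (b) rollouts originate only from states observed in the offline data. The paper relaxes the second assumption and presents an unseen state augmentation strategy that allows exploitation of unseen states where the learned model and value estimates generalize. The strategy finds unseen states by value-informed perturbations of seen states and then filters out states whose epistemic uncertainty estimates are too high (high error) or too low (too similar to seen data). Performance improves on several offline RL tasks, and the strategy consistently leads to lower average dataset Q-value estimates, that is, more conservative Q-value estimates than a baseline.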
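
The perturb-then-filter idea can be illustrated on a one-dimensional toy problem. Everything in this sketch is an assumption made for illustration: the value function is a fixed quadratic standing in for a learned estimate, the perturbation follows the sign of a finite-difference value gradient, and the distance to the nearest seen state is used as a crude proxy for epistemic uncertainty. The paper's actual estimators are not described here.

```python
def value(state):
    """Hypothetical learned value estimate, highest at state 3.0."""
    return -(state - 3.0) ** 2


def uncertainty(state, seen_states):
    """Crude proxy for epistemic uncertainty: distance to the nearest seen state."""
    return min(abs(state - s) for s in seen_states)


def augment(seen_states, step, low, high, eps=1e-4):
    """Perturb each seen state toward higher value, then keep only candidates
    whose uncertainty lies strictly between `low` and `high`."""
    kept = []
    for s in seen_states:
        # Finite-difference estimate of the value gradient at s.
        grad = (value(s + eps) - value(s - eps)) / (2 * eps)
        direction = 1.0 if grad > 0 else -1.0
        candidate = s + step * direction
        u = uncertainty(candidate, seen_states)
        if low < u < high:
            kept.append(candidate)
    return kept


seen = [0.0, 0.25, 1.0]
new_states = augment(seen, step=0.75, low=0.1, high=0.5)
print(new_states)
```

In this run the three candidates are 0.75, 1.0, and 1.75. The second coincides with a seen state (uncertainty too low) and the third lies too far from the data (uncertainty too high), so only 0.75 survives the filter.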

Guarding the Guardians: Automated Analysis of Online Child Sexual Abuse

  • paper_url: http://arxiv.org/abs/2308.03880
  • repo_url: None
  • paper_authors: Juanita Puentes, Angela Castillo, Wilmar Osejo, Yuly Calderón, Viviana Quintero, Lina Saldarriaga, Diana Agudelo, Pablo Arbeláez
  • for: The paper addresses the urgent need to analyze children's sexual abuse reports comprehensively while reducing analysts' exposure to harmful content.
  • methods: It proposes an automated tool that categorizes reports along three dimensions: Subject, Degree of Criminality, and Damage. It also introduces a new approach to annotating the collected data, which enables a more in-depth analysis of the reports.
  • results: According to the authors, the approach significantly reduces analysts' risk of exposure to harmful content and improves the comprehension of fundamental patterns and trends in the reports, which helps law enforcement agencies and policymakers create focused strategies.
    Abstract Online violence against children has increased globally recently, demanding urgent attention. Competent authorities manually analyze abuse complaints to comprehend crime dynamics and identify patterns. However, the manual analysis of these complaints presents a challenge because it exposes analysts to harmful content during the review process. Given these challenges, we present a novel solution, an automated tool designed to analyze children's sexual abuse reports comprehensively. By automating the analysis process, our tool significantly reduces the risk of exposure to harmful content by categorizing the reports on three dimensions: Subject, Degree of Criminality, and Damage. Furthermore, leveraging our multidisciplinary team's expertise, we introduce a novel approach to annotate the collected data, enabling a more in-depth analysis of the reports. This approach improves the comprehension of fundamental patterns and trends, enabling law enforcement agencies and policymakers to create focused strategies in the fight against children's violence.
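    Summary Online violence against children has recently increased worldwide and demands urgent attention. Competent authorities analyze abuse complaints manually to understand crime dynamics and identify patterns, but manual analysis exposes analysts to harmful content during review. In response, the paper presents an automated tool that analyzes children's sexual abuse reports comprehensively. By automating the analysis, the tool significantly reduces the risk of exposure to harmful content, and it categorizes reports along three dimensions: Subject, Degree of Criminality, and Damage. Drawing on a multidisciplinary team's expertise, the authors also introduce a new approach to annotating the collected data, which enables a more in-depth analysis of the reports. This improves the comprehension of fundamental patterns and trends and helps law enforcement agencies and policymakers create focused strategies against violence toward children.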

Trusting Language Models in Education

  • paper_url: http://arxiv.org/abs/2308.03866
  • repo_url: https://github.com/jettbrains/-L-
  • paper_authors: Jogi Suda Neto, Li Deng, Thejaswi Raya, Reza Shahbazi, Nick Liu, Adhitya Venkatesh, Miral Shah, Neeru Khosla, Rodrigo Capobianco Guido
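  • for: The paper aims to calibrate the confidence of language models used in education, so that students are not misled by wrong answers.
  • methods: It proposes an XGBoost model on top of BERT that outputs corrected probabilities, using features based on the attention mechanism.
  • results: The working hypothesis is that the level of uncertainty contained in the flow of attention is related to the quality of the model's response; the corrected confidence is intended to keep wrong answers from being shown.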
    Abstract Language Models are being widely used in Education. Even though modern deep learning models achieve very good performance on question-answering tasks, sometimes they make errors. To avoid misleading students by showing wrong answers, it is important to calibrate the confidence - that is, the prediction probability - of these models. In our work, we propose to use an XGBoost on top of BERT to output the corrected probabilities, using features based on the attention mechanism. Our hypothesis is that the level of uncertainty contained in the flow of attention is related to the quality of the model's response itself.
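    Summary Language models are widely used in education. Although modern deep learning models perform very well on question-answering tasks, they sometimes make errors. To avoid misleading students with wrong answers, it is important to calibrate the confidence, that is, the prediction probability, of these models. The authors propose an XGBoost model on top of BERT that outputs corrected probabilities, using features based on the attention mechanism. Their hypothesis is that the level of uncertainty contained in the flow of attention is related to the quality of the model's response.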
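
The abstract does not say which attention-based features are fed to the calibrator. One natural candidate, shown here purely as an assumption, is the entropy of each attention distribution: a head that spreads its weight evenly over many tokens is "less certain" than one that focuses on a single token. The sketch computes such features from plain nested lists; in a real pipeline the attention weights would come from the BERT model, and the features would be passed to a separate classifier such as a gradient-boosted tree model.

```python
import math


def entropy(distribution):
    """Shannon entropy (in nats) of one attention row; zero weights contribute nothing."""
    return -sum(p * math.log(p) for p in distribution if p > 0)


def attention_features(attention):
    """Summarize attention uncertainty.

    `attention` is a list of heads; each head is a list of rows, and each row
    is a probability distribution over input tokens. The feature names are
    illustrative, not taken from the paper.
    """
    per_head = [sum(entropy(row) for row in head) / len(head) for head in attention]
    return {
        "mean_entropy": sum(per_head) / len(per_head),
        "max_entropy": max(per_head),
    }


# Two toy cases with one head and one query token over four input tokens.
focused = [[[1.0, 0.0, 0.0, 0.0]]]
diffuse = [[[0.25, 0.25, 0.25, 0.25]]]

focused_features = attention_features(focused)
diffuse_features = attention_features(diffuse)
print(focused_features, diffuse_features)
```

Under the paper's hypothesis, inputs whose attention looks like the diffuse case would tend to receive a lower corrected probability than the raw model output suggests.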

AI Text-to-Behavior: A Study In Steerability

  • paper_url: http://arxiv.org/abs/2308.07326
  • repo_url: None
  • paper_authors: David Noever, Sam Hyams
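  • for: The study explores the steerability of large language models (LLMs), particularly OpenAI's ChatGPT iterations.
  • methods: It uses the behavioral psychology framework OCEAN (Openness, Conscientiousness, Extroversion, Agreeableness, Neuroticism) to gauge quantitatively how the model responds to tailored prompts.
  • results: "Openness" presented linguistic ambiguity, "conscientiousness" and "neuroticism" were distinctly evoked, and "extroversion" and "agreeableness" overlapped notably while remaining separate from the other traits. The findings underscore GPT's versatility and its ability to discern and adapt to nuanced instructions.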
    Abstract The research explores the steerability of Large Language Models (LLMs), particularly OpenAI's ChatGPT iterations. By employing a behavioral psychology framework called OCEAN (Openness, Conscientiousness, Extroversion, Agreeableness, Neuroticism), we quantitatively gauged the model's responsiveness to tailored prompts. When asked to generate text mimicking an extroverted personality, OCEAN scored the language alignment to that behavioral trait. In our analysis, while "openness" presented linguistic ambiguity, "conscientiousness" and "neuroticism" were distinctly evoked in the OCEAN framework, with "extroversion" and "agreeableness" showcasing a notable overlap yet distinct separation from other traits. Our findings underscore GPT's versatility and ability to discern and adapt to nuanced instructions. Furthermore, historical figure simulations highlighted the LLM's capacity to internalize and project instructible personas, precisely replicating their philosophies and dialogic styles. However, the rapid advancements in LLM capabilities and the opaque nature of some training techniques make metric proposals degrade rapidly. Our research emphasizes a quantitative role to describe steerability in LLMs, presenting both its promise and areas for further refinement in aligning its progress to human intentions.
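    Summary The research explores the steerability of large language models (LLMs), particularly OpenAI's ChatGPT iterations. Using the behavioral psychology framework OCEAN (Openness, Conscientiousness, Extroversion, Agreeableness, Neuroticism), the authors quantitatively gauge the model's responsiveness to tailored prompts. When the model is asked to generate text mimicking an extroverted personality, OCEAN scores how well the language aligns with that trait. In the analysis, "openness" presented linguistic ambiguity, "conscientiousness" and "neuroticism" were distinctly evoked, and "extroversion" and "agreeableness" showed a notable overlap yet a distinct separation from the other traits. The findings underscore GPT's versatility and its ability to discern and adapt to nuanced instructions. Simulations of historical figures further highlight the LLM's capacity to internalize and project instructible personas, replicating their philosophies and dialogic styles. However, rapid advances in LLM capabilities and the opaque nature of some training techniques make proposed metrics degrade quickly. The research emphasizes a quantitative role in describing LLM steerability and presents both its promise and areas for further refinement in aligning progress with human intentions.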

Mobile Supply: The Last Piece of Jigsaw of Recommender System

  • paper_url: http://arxiv.org/abs/2308.03855
  • repo_url: None
  • paper_authors: Zhenhao Jiang, Biao Zeng, Hao Feng, Jin Liu, Jie Zhang, Jia Jia, Ning Hu
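  • for: Improving the performance of edge-side recommender systems and the user experience.
  • methods: A new pipeline module, Mobile Supply, is proposed to address the pagination trigger mechanism problem. It uses a point-wise paradigm to approximate list-wise estimation of list value, together with a device-aware mobile ranking approach.
  • results: Offline and online experiments show that the method further improves edge-side recommender system performance and user experience. It has been deployed on the homepage of a large-scale online food platform, where the authors report considerable profits.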
    Abstract Recommendation is a fundamental functionality of online platforms. With the growth in computing power of mobile phones, some researchers have deployed recommendation algorithms on users' mobile devices to address the problems of data transmission delay and the pagination trigger mechanism. However, the existing edge-side mobile rankings cannot completely solve the pagination trigger problem. The mobile ranking can only sort the items on the current page, and the fixed set of candidate items limits its performance. Besides, after viewing the items of interest on the current page, the user refreshes to get a new page of items. This harms the user's immersive experience because the user is not satisfied with the items left on the current page. To address the pagination trigger problem, we propose a completely new module in the recommender system pipeline named Mobile Supply. The pipeline is extended to "retrieval->pre-ranking->ranking->re-ranking->Mobile Supply->mobile ranking". Specifically, we introduce the concept of list value and use a point-wise paradigm to approximate list-wise estimation to calculate the maximum revenue that can be achieved by mobile ranking for the current page. We also design a new mobile ranking approach named device-aware mobile ranking, which considers the differences between mobile devices and is tailored to the new pipeline. Extensive offline and online experiments show the superiority of our proposed method and prove that Mobile Supply can further improve the performance of edge-side recommender systems and the user experience. Mobile Supply has been deployed on the homepage of a large-scale online food platform and has yielded considerable profits in our business.
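    Summary Recommendation is a fundamental functionality of online platforms. As the computing power of mobile devices has grown, some researchers have deployed recommendation algorithms on users' devices to address data transmission delay and the pagination trigger mechanism. Existing edge-side mobile rankings cannot fully solve the pagination trigger problem: they can only sort the items on the current page, and the fixed set of candidate items limits their performance. In addition, once the user has viewed the items of interest on the current page, the user refreshes to get a new page, which harms the immersive experience because the user is not satisfied with the items left on the page. To address this, the paper proposes a new pipeline module named Mobile Supply and extends the recommender pipeline to "retrieval->pre-ranking->ranking->re-ranking->Mobile Supply->mobile ranking". It introduces the concept of list value and uses a point-wise paradigm to approximate list-wise estimation of the maximum revenue that mobile ranking can achieve for the current page. It also designs a device-aware mobile ranking approach that accounts for differences between mobile devices. Mobile Supply has been deployed on the homepage of a large-scale online food platform, where the authors report considerable profits.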

Revisiting Prompt Engineering via Declarative Crowdsourcing

  • paper_url: http://arxiv.org/abs/2308.03854
  • repo_url: None
  • paper_authors: Aditya G. Parameswaran, Shreya Shankar, Parth Asawa, Naman Jain, Yujie Wang
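  • for: The paper sets out a vision for declarative prompt engineering, aimed at optimizing the quality of LLM (large language model) data processing workflows while keeping cost bounded.
  • methods: The approach treats LLMs like crowd workers and borrows ideas from the declarative crowdsourcing literature, including multiple prompting strategies, internal consistency checks, and hybrid LLM/non-LLM approaches.
  • results: Preliminary case studies on sorting, entity resolution, and imputation suggest that declarative prompt engineering can improve the quality of LLM data processing workflows while keeping cost bounded.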
    Abstract Large language models (LLMs) are incredibly powerful at comprehending and generating data in the form of text, but are brittle and error-prone. There has been an advent of toolkits and recipes centered around so-called prompt engineering, the process of asking an LLM to do something via a series of prompts. However, for LLM-powered data processing workflows, in particular, optimizing for quality, while keeping cost bounded, is a tedious, manual process. We put forth a vision for declarative prompt engineering. We view LLMs like crowd workers and leverage ideas from the declarative crowdsourcing literature (including leveraging multiple prompting strategies, ensuring internal consistency, and exploring hybrid-LLM-non-LLM approaches) to make prompt engineering a more principled process. Preliminary case studies on sorting, entity resolution, and imputation demonstrate the promise of our approach.
    Summary Large language models (LLMs) are extremely capable at understanding and generating text, but they are brittle and error-prone. Toolkits and recipes have sprung up around so-called prompt engineering, the process of asking an LLM to do something through a series of prompts. For LLM-powered data processing workflows, however, optimizing for quality while keeping cost under control is a tedious, manual process. The authors set out a vision for declarative prompt engineering: they treat LLMs like crowd workers and draw on ideas from declarative crowdsourcing, including multiple prompting strategies, internal consistency, and hybrid LLM/non-LLM approaches, to make prompt engineering more principled. Preliminary case studies on sorting, entity resolution, and imputation show the promise of the approach.
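    One of the crowdsourcing ideas mentioned above, combining several prompting strategies, can be sketched as a majority vote over prompt templates for an entity resolution question. This is a minimal illustration only, not the authors' implementation: the `LLM` callable type, the function `same_entity_by_vote`, the templates, and the offline stand-in `fake_llm` are all hypothetical names introduced here.

```python
from collections import Counter
from typing import Callable, Sequence

# Hypothetical type: any function that sends a prompt to an LLM and returns its reply.
LLM = Callable[[str], str]


def same_entity_by_vote(a: str, b: str, llm: LLM, templates: Sequence[str]) -> bool:
    """Ask the same yes/no question with several templates and take the majority."""
    votes = Counter()
    for template in templates:
        reply = llm(template.format(a=a, b=b)).strip().lower()
        votes["yes" if reply.startswith("yes") else "no"] += 1
    return votes["yes"] > votes["no"]


# Illustrative templates (assumptions, not taken from the paper).
TEMPLATES = [
    "Do '{a}' and '{b}' refer to the same entity? Answer yes or no.",
    "Record 1: {a}\nRecord 2: {b}\nSame real-world entity? Answer yes or no.",
    "Answer yes or no: is '{a}' a duplicate of '{b}'?",
]


def fake_llm(prompt: str) -> str:
    # Stand-in for a real model call so the sketch runs offline:
    # it disagrees with itself on one of the three templates.
    return "No" if prompt.startswith("Record 1") else "Yes"
```

    With the stand-in model, two of the three templates answer "yes", so the vote resolves the disagreement in favour of a match. Each extra template costs one more model call, which is the quality-versus-cost trade-off the abstract refers to.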

Recurrent Multi-scale Transformer for High-Resolution Salient Object Detection

  • paper_url: http://arxiv.org/abs/2308.03826
  • repo_url: https://github.com/DrowsyMon/RMFormer
  • paper_authors: Xinhao Deng, Pingping Zhang, Wei Liu, Huchuan Lu
  • for: This paper aims to improve the performance of high-resolution salient object detection (HRSOD) by proposing a new dataset and a novel Recurrent Multi-scale Transformer (RMFormer) method.
  • methods: The proposed RMFormer method utilizes shared Transformers and multi-scale refinement architectures to generate high-resolution saliency maps, guided by lower-resolution predictions.
  • results: Extensive experiments on both high-resolution and low-resolution benchmarks demonstrate the effectiveness and superiority of the proposed framework, with the RMFormer method achieving state-of-the-art performance on the newly proposed HRS10K dataset.
    Abstract Salient Object Detection (SOD) aims to identify and segment the most conspicuous objects in an image or video. As an important pre-processing step, it has many potential applications in multimedia and vision tasks. With the advance of imaging devices, SOD with high-resolution images is of great demand, recently. However, traditional SOD methods are largely limited to low-resolution images, making them difficult to adapt to the development of High-Resolution SOD (HRSOD). Although some HRSOD methods emerge, there are no large enough datasets for training and evaluating. Besides, current HRSOD methods generally produce incomplete object regions and irregular object boundaries. To address above issues, in this work, we first propose a new HRS10K dataset, which contains 10,500 high-quality annotated images at 2K-8K resolution. As far as we know, it is the largest dataset for the HRSOD task, which will significantly help future works in training and evaluating models. Furthermore, to improve the HRSOD performance, we propose a novel Recurrent Multi-scale Transformer (RMFormer), which recurrently utilizes shared Transformers and multi-scale refinement architectures. Thus, high-resolution saliency maps can be generated with the guidance of lower-resolution predictions. Extensive experiments on both high-resolution and low-resolution benchmarks show the effectiveness and superiority of the proposed framework. The source code and dataset are released at: https://github.com/DrowsyMon/RMFormer.
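    Summary Salient object detection (SOD) aims to identify and segment the most conspicuous objects in an image or video. As a pre-processing step, it has promising applications in multimedia and vision tasks. With advances in imaging devices, demand for high-resolution SOD (HRSOD) is growing. Traditional SOD methods, however, mainly target low-resolution images and adapt poorly to HRSOD. Although some HRSOD methods have appeared, no sufficiently large dataset exists for training and evaluation, and current HRSOD methods tend to produce incomplete object regions and irregular object boundaries. To address these problems, the authors first propose a new HRSOD dataset, HRS10K, containing 10,500 high-quality annotated images at 2K-8K resolution. To their knowledge it is the largest dataset for the HRSOD task and should help future work on training and evaluating models. To improve HRSOD performance, they also propose a Recurrent Multi-scale Transformer (RMFormer), which recurrently uses shared Transformers and multi-scale refinement architectures, so that high-resolution saliency maps are generated under the guidance of lower-resolution predictions. Extensive experiments demonstrate the effectiveness and superiority of the framework. The dataset and code are available at https://github.com/DrowsyMon/RMFormer.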

A Cost Analysis of Generative Language Models and Influence Operations

  • paper_url: http://arxiv.org/abs/2308.03740
  • repo_url: https://github.com/georgetown-cset/disinfo-costs
  • paper_authors: Micah Musser
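  • for: The study examines the costs and benefits to propagandists of using large language models (LLMs).
  • methods: It builds a cost model for content generation at scale and analyzes propagandists' optimal strategy when choosing among LLMs.
  • results: LLMs need only produce usable outputs with fairly low reliability (roughly 25%) to save propagandists money. Monitoring controls on API-accessible LLMs impose sharply limited costs when open source alternatives are available, and nation-states, even those running many large-scale influence operations, are unlikely to benefit economically from training custom LLMs for influence operations.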
    Abstract Despite speculation that recent large language models (LLMs) are likely to be used maliciously to improve the quality or scale of influence operations, uncertainty persists regarding the economic value that LLMs offer propagandists. This research constructs a model of costs facing propagandists for content generation at scale and analyzes (1) the potential savings that LLMs could offer propagandists, (2) the potential deterrent effect of monitoring controls on API-accessible LLMs, and (3) the optimal strategy for propagandists choosing between multiple private and/or open source LLMs when conducting influence operations. Primary results suggest that LLMs need only produce usable outputs with relatively low reliability (roughly 25%) to offer cost savings to propagandists, that the potential reduction in content generation costs can be quite high (up to 70% for a highly reliable model), and that monitoring capabilities have sharply limited cost imposition effects when alternative open source models are available. In addition, these results suggest that nation-states -- even those conducting many large-scale influence operations per year -- are unlikely to benefit economically from training custom LLMs specifically for use in influence operations.
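    Summary Despite speculation that recent large language models (LLMs) may be used to improve the quality or scale of influence operations, the economic value of LLMs to propagandists remains uncertain. The study builds a model of the costs propagandists face when generating content at scale and analyzes (1) the potential savings LLMs could offer propagandists, (2) the potential deterrent effect of monitoring controls on API-accessible LLMs, and (3) the optimal strategy for propagandists choosing among multiple private and/or open source LLMs. The main results suggest that LLMs need only produce usable outputs with roughly 25% reliability to save propagandists money, and that monitoring controls impose very limited costs when alternative open source models are available. Finally, the results suggest that even nation-states conducting large-scale influence operations are unlikely to benefit economically from training their own LLMs specifically for influence operations.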
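    To see why a fairly unreliable model can still save money, consider a deliberately simplified break-even calculation. This is not the paper's cost model. It assumes, purely for illustration, that every LLM output must be reviewed by a human at a fixed cost, that only a fraction of outputs (the reliability) is usable, and that the alternative is a human writing each item from scratch. All function names and numbers below are hypothetical.

```python
def cost_per_usable_output(review_cost: float, generation_cost: float,
                           reliability: float) -> float:
    """Expected spend per usable item when only a fraction of outputs is usable."""
    if not 0.0 < reliability <= 1.0:
        raise ValueError("reliability must be in (0, 1]")
    # On average 1 / reliability outputs are generated and reviewed per usable one.
    return (generation_cost + review_cost) / reliability


def savings_fraction(human_cost: float, review_cost: float,
                     generation_cost: float, reliability: float) -> float:
    """Fraction of the human-only cost saved (negative means the LLM costs more)."""
    llm_cost = cost_per_usable_output(review_cost, generation_cost, reliability)
    return 1.0 - llm_cost / human_cost


def break_even_reliability(human_cost: float, review_cost: float,
                           generation_cost: float) -> float:
    """Reliability at which LLM-assisted and human-only costs are equal."""
    return (generation_cost + review_cost) / human_cost
```

    Under this toy model, if reviewing an output costs a quarter of what writing one costs and generation is treated as free, the break-even reliability is 0.25. Those inputs were picked here only so the sketch lands near the figure quoted in the abstract; the paper's own parameters and functional form may differ.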

SurvBeX: An explanation method of the machine learning survival models based on the Beran estimator

  • paper_url: http://arxiv.org/abs/2308.03730
  • repo_url: https://github.com/danilaeremenko/survbex
  • paper_authors: Lev V. Utkin, Danila Y. Eremenko, Andrei V. Konstantinov
  • for: The paper proposes a new explanation method called SurvBeX for interpreting predictions of machine learning survival black-box models.
  • methods: The method uses a modified Beran estimator as the surrogate explanation model, and generates many points in a local area around an example of interest to compute the survival function of the black-box model and the Beran estimator.
  • results: The paper demonstrates the efficiency of SurvBeX through numerical experiments with synthetic and real survival data, and compares the method with SurvLIME and SurvSHAP. The code implementing SurvBeX is available online.
    Abstract An explanation method called SurvBeX is proposed to interpret predictions of the machine learning survival black-box models. The main idea behind the method is to use the modified Beran estimator as the surrogate explanation model. Coefficients, incorporated into Beran estimator, can be regarded as values of the feature impacts on the black-box model prediction. Following the well-known LIME method, many points are generated in a local area around an example of interest. For every generated example, the survival function of the black-box model is computed, and the survival function of the surrogate model (the Beran estimator) is constructed as a function of the explanation coefficients. In order to find the explanation coefficients, it is proposed to minimize the mean distance between the survival functions of the black-box model and the Beran estimator produced by the generated examples. Many numerical experiments with synthetic and real survival data demonstrate the SurvBeX efficiency and compare the method with the well-known method SurvLIME. The method is also compared with the method SurvSHAP. The code implementing SurvBeX is available at: https://github.com/DanilaEremenko/SurvBeX
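    Summary An explanation method called SurvBeX is proposed for interpreting the predictions of machine learning survival black-box models. The main idea is to use a modified Beran estimator as the surrogate explanation model; the coefficients incorporated into the Beran estimator can be read as the impacts of the features on the black-box prediction. As in the well-known LIME method, many points are generated in a local area around an example of interest. For each generated example, the survival function of the black-box model is computed, and the survival function of the Beran estimator is constructed as a function of the explanation coefficients. The explanation coefficients are found by minimizing the mean distance between the survival functions of the black-box model and the Beran estimator over the generated examples. Numerical experiments with synthetic and real survival data demonstrate the efficiency of SurvBeX and compare it with SurvLIME and SurvSHAP. The code implementing SurvBeX is available at https://github.com/DanilaEremenko/SurvBeX.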
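    For readers unfamiliar with the Beran estimator, the sketch below computes a conditional survival function S(t | x) as a kernel-weighted product-limit estimate. It is a minimal illustration, not the authors' code: the Gaussian kernel, the `bandwidth` parameter, and the way the optional `coef` vector rescales each feature inside the kernel are assumptions made here to suggest how per-feature coefficients could enter the estimator.

```python
import math


def beran_survival(x, train_x, times, events, t, bandwidth=1.0, coef=None):
    """Estimate S(t | x) with a Beran (kernel-weighted product-limit) estimator.

    x        -- feature vector of the point being explained
    train_x  -- list of training feature vectors
    times    -- observed times (event or censoring) for the training points
    events   -- 1 if the event was observed, 0 if the observation was censored
    coef     -- optional per-feature weights inside the kernel (an assumption here)
    """
    coef = coef or [1.0] * len(x)
    # Gaussian kernel weights, normalised to sum to one.
    raw = [
        math.exp(-sum((c * (a - b)) ** 2 for c, a, b in zip(coef, x, xi)) / bandwidth)
        for xi in train_x
    ]
    total = sum(raw)
    weights = [r / total for r in raw]

    survival, used = 1.0, 0.0
    for i in sorted(range(len(times)), key=lambda k: times[k]):
        if times[i] > t:
            break
        remaining = 1.0 - used  # kernel mass still "at risk" just before times[i]
        if events[i] and remaining > 0.0:
            survival *= max(0.0, 1.0 - weights[i] / remaining)
        used += weights[i]
    return survival
```

    When every training point receives the same weight, the product reduces to the Kaplan-Meier estimator, which gives a quick sanity check for the sketch.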

Tiny LVLM-eHub: Early Multimodal Experiments with Bard

  • paper_url: http://arxiv.org/abs/2308.03729
  • repo_url: https://github.com/opengvlab/multi-modality-arena
  • paper_authors: Wenqi Shao, Yutao Hu, Peng Gao, Meng Lei, Kaipeng Zhang, Fanqing Meng, Peng Xu, Siyuan Huang, Hongsheng Li, Yu Qiao, Ping Luo
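  • for: The study evaluates the multimodal abilities of large vision-language models (LVLMs), with a particular focus on Google's Bard, and proposes a lightweight variant of LVLM-eHub.
  • methods: It systematically assesses six categories of multimodal capability (visual perception, visual knowledge acquisition, visual reasoning, visual commonsense, object hallucination, and embodied intelligence) on 42 standard text-related visual benchmarks, and analyzes predictions with the ChatGPT Ensemble Evaluation (CEE).
  • results: Bard outperforms previous LVLMs in most multimodal capabilities except object hallucination, to which it is still susceptible. CEE aligns better with human evaluation than word matching, and Tiny LVLM-eHub makes it easy for practitioners to evaluate their own offline LVLMs.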
    Abstract Recent advancements in Large Vision-Language Models (LVLMs) have demonstrated significant progress in tackling complex multimodal tasks. Among these cutting-edge developments, Google's Bard stands out for its remarkable multimodal capabilities, promoting comprehensive comprehension and reasoning across various domains. This work presents an early and holistic evaluation of LVLMs' multimodal abilities, with a particular focus on Bard, by proposing a lightweight variant of LVLM-eHub, named Tiny LVLM-eHub. In comparison to the vanilla version, Tiny LVLM-eHub possesses several appealing properties. Firstly, it provides a systematic assessment of six categories of multimodal capabilities, including visual perception, visual knowledge acquisition, visual reasoning, visual commonsense, object hallucination, and embodied intelligence, through quantitative evaluation of 42 standard text-related visual benchmarks. Secondly, it conducts an in-depth analysis of LVLMs' predictions using the ChatGPT Ensemble Evaluation (CEE), which leads to a robust and accurate evaluation and exhibits improved alignment with human evaluation compared to the word matching approach. Thirdly, it comprises a mere 2.1K image-text pairs, facilitating ease of use for practitioners to evaluate their own offline LVLMs. Through extensive experimental analysis, this study demonstrates that Bard outperforms previous LVLMs in most multimodal capabilities except object hallucination, to which Bard is still susceptible. Tiny LVLM-eHub serves as a baseline evaluation for various LVLMs and encourages innovative strategies aimed at advancing multimodal techniques. Our project is publicly available at https://github.com/OpenGVLab/Multi-Modality-Arena.
    Summary Recent advances in large vision-language models (LVLMs) have brought significant progress on complex multimodal tasks. Among them, Google's Bard stands out for its multimodal capabilities, supporting comprehensive understanding and reasoning across many domains. This work proposes a lightweight variant of LVLM-eHub, named Tiny LVLM-eHub, which has several advantages over the original version. First, it systematically assesses six categories of multimodal capability (visual perception, visual knowledge acquisition, visual reasoning, visual commonsense, object hallucination, and embodied intelligence) through quantitative evaluation on 42 standard text-related visual benchmarks. Second, it analyzes predictions in depth with the ChatGPT Ensemble Evaluation (CEE), which yields more robust and accurate results that agree better with human evaluation. Third, it uses only 2.1K image-text pairs, which makes evaluation quick and convenient. Extensive experiments show that Bard outperforms previous LVLMs in most multimodal capabilities, with object hallucination the exception. Tiny LVLM-eHub can serve as a baseline evaluation for a range of LVLMs and encourages new multimodal techniques. The project is publicly available at https://github.com/OpenGVLab/Multi-Modality-Arena.

Dimensionality Reduction for Improving Out-of-Distribution Detection in Medical Image Segmentation

  • paper_url: http://arxiv.org/abs/2308.03723
  • repo_url: https://github.com/mckellwoodland/dimen_reduce_mahal
  • paper_authors: McKell Woodland, Nihil Patel, Mais Al Taie, Joshua P. Yung, Tucker J. Netherton, Ankit B. Patel, Kristy K. Brock
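  • for: Detecting out-of-distribution (OOD) images at inference time for clinically deployed segmentation models.
  • methods: The Mahalanobis distance is applied post hoc to the bottleneck features of a Swin UNETR liver segmentation model, after the feature dimensions are reduced with principal component analysis.
  • results: OOD images were detected with high performance and minimal computational load.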
    Abstract Clinically deployed segmentation models are known to fail on data outside of their training distribution. As these models perform well on most cases, it is imperative to detect out-of-distribution (OOD) images at inference to protect against automation bias. This work applies the Mahalanobis distance post hoc to the bottleneck features of a Swin UNETR model that segments the liver on T1-weighted magnetic resonance imaging. By reducing the dimensions of the bottleneck features with principal component analysis, OOD images were detected with high performance and minimal computational load.
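    The pipeline described above (reduce the features with PCA, then score each image by its Mahalanobis distance to the training distribution) can be sketched in a few lines of NumPy. This is an illustrative sketch rather than the authors' code: the function names are hypothetical, and any `(n_images, n_features)` array can stand in for the flattened Swin UNETR bottleneck features.

```python
import numpy as np


def fit_pca_mahalanobis(train_feats: np.ndarray, n_components: int):
    """Fit PCA on in-distribution features and a Gaussian in the reduced space."""
    mean = train_feats.mean(axis=0)
    centered = train_feats - mean
    # Principal directions are the right singular vectors of the centred data.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    reduced = centered @ components.T
    cov = np.cov(reduced, rowvar=False).reshape(n_components, n_components)
    return mean, components, np.linalg.pinv(cov)


def mahalanobis_scores(feats: np.ndarray, mean, components, cov_inv) -> np.ndarray:
    """Distance of each row to the training distribution; larger means more OOD."""
    reduced = (feats - mean) @ components.T
    squared = np.einsum("ij,jk,ik->i", reduced, cov_inv, reduced)
    return np.sqrt(np.maximum(squared, 0.0))
```

    In use, one would fit on features from the training images, pick a threshold from held-out in-distribution scores, and flag any inference image whose score exceeds it. Working in the reduced space keeps the covariance matrix small, which is where the computational saving comes from.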

SEM-GAT: Explainable Semantic Pose Estimation using Learned Graph Attention

  • paper_url: http://arxiv.org/abs/2308.03718
  • repo_url: None
  • paper_authors: Efimia Panagiotaki, Daniele De Martini, Georgi Pramatarov, Matthew Gadd, Lars Kunze
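  • for: The paper proposes a GNN-based method that exploits semantics and local geometry to guide the identification of reliable pointcloud registration candidates.
  • methods: Semantic and morphological features of the environment serve as key reference points for registration, enabling accurate lidar-based pose estimation. A novel lightweight static graph structure identifies semantic instance-based relationships, which reduces the computational burden of pointcloud registration.
  • results: On the KITTI odometry dataset the method achieves accuracy competitive with benchmark methods, with higher track smoothness and significantly fewer network parameters.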
    Abstract This paper proposes a GNN-based method for exploiting semantics and local geometry to guide the identification of reliable pointcloud registration candidates. Semantic and morphological features of the environment serve as key reference points for registration, enabling accurate lidar-based pose estimation. Our novel lightweight static graph structure informs our attention-based keypoint node aggregation GNN network by identifying semantic instance-based relationships, acting as inductive bias to significantly reduce the computational burden of pointcloud registration. By connecting candidate nodes and exploiting cross-graph attention, we identify confidence scores for all potential registration correspondences, estimating the displacement between pointcloud scans. Our pipeline enables introspective analysis of the model's performance by correlating it with the individual contributions of local structures in the environment, providing valuable insights into the system's behaviour. We test our method on the KITTI odometry dataset, achieving competitive accuracy compared to benchmark methods and a higher track smoothness while relying on significantly fewer network parameters.
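    Summary This paper proposes a GNN-based method that uses semantics and local geometry to guide the identification of reliable pointcloud registration candidates. Semantic and morphological features of the environment serve as reference points for registration, enabling accurate lidar-based pose estimation. A new lightweight static graph structure informs the attention-based keypoint node aggregation GNN by identifying relationships between semantic instances, acting as an inductive bias that reduces the computational cost of pointcloud registration. By connecting candidate nodes and exploiting cross-graph attention, the method computes confidence scores for all potential registration correspondences and estimates the displacement between pointcloud scans. The pipeline allows introspective analysis of the model's performance by relating it to the individual contributions of local structures in the environment, giving insight into the system's behaviour. On the KITTI odometry dataset, the method achieves competitive accuracy compared with benchmark methods and higher track smoothness, while using far fewer network parameters.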

Safe Multimodal Communication in Human-Robot Collaboration

  • paper_url: http://arxiv.org/abs/2308.03690
  • repo_url: None
  • paper_authors: Davide Ferrari, Andrea Pupa, Alberto Signoretti, Cristian Secchi
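  • for: The work supports collaboration between humans and robots working in close proximity in new industrial settings, which requires attention to many aspects.
  • methods: The authors propose a framework for natural and efficient human-robot communication based on multimodal fusion of voice and gesture commands, while always respecting safety regulations.
  • results: A comparative experiment shows that multimodal communication lets the robot extract valuable information for performing the required task, and that the safety layer lets it scale its speed to ensure the operator's safety.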
    Abstract The new industrial settings are characterized by the presence of humans and robots that work in close proximity, cooperating in performing the required job. Such a collaboration, however, requires paying attention to many aspects. Firstly, it is crucial to enable communication between these two actors that is natural and efficient. Secondly, the robot behavior must always be compliant with the safety regulations, always ensuring a safe collaboration. In this paper, we propose a framework that enables multi-channel communication between humans and robots by leveraging multimodal fusion of voice and gesture commands while always respecting safety regulations. The framework is validated through a comparative experiment, demonstrating that, thanks to multimodal communication, the robot can extract valuable information for performing the required task and additionally, with the safety layer, the robot can scale its speed to ensure the operator's safety.
    Summary New industrial settings are characterized by humans and robots working in close proximity and cooperating to perform the required job. Such collaboration requires attention to many aspects. First, communication between the two actors must be natural and efficient. Second, the robot's behavior must always comply with safety regulations so that the collaboration stays safe. The authors propose a framework that enables multi-channel communication between humans and robots through multimodal fusion of voice and gesture commands while always respecting safety regulations. A comparative experiment validates the framework: multimodal communication lets the robot extract valuable information for performing the required task, and the safety layer lets the robot scale its speed to ensure the operator's safety.

AgentBench: Evaluating LLMs as Agents

  • paper_url: http://arxiv.org/abs/2308.03688
  • repo_url: https://github.com/thudm/agentbench
  • paper_authors: Xiao Liu, Hao Yu, Hanchen Zhang, Yifan Xu, Xuanyu Lei, Hanyu Lai, Yu Gu, Hangliang Ding, Kaiwen Men, Kejuan Yang, Shudan Zhang, Xiang Deng, Aohan Zeng, Zhengxiao Du, Chenhui Zhang, Sheng Shen, Tianjun Zhang, Yu Su, Huan Sun, Minlie Huang, Yuxiao Dong, Jie Tang
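  • for: The paper evaluates large language models (LLMs) as agents, assessing their reasoning and decision-making abilities in interactive environments.
  • methods: It presents AgentBench, a multi-dimensional evolving benchmark that currently consists of 8 distinct environments for assessing LLMs' reasoning and decision-making in a multi-turn open-ended generation setting.
  • results: Tests over 25 LLMs show that top commercial LLMs act strongly as agents in complex environments, but that there is a significant performance gap between them and open-source competitors.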
    Abstract Large Language Models (LLMs) are becoming increasingly smart and autonomous, targeting real-world pragmatic missions beyond traditional NLP tasks. As a result, there has been an urgent need to evaluate LLMs as agents on challenging tasks in interactive environments. We present AgentBench, a multi-dimensional evolving benchmark that currently consists of 8 distinct environments to assess LLM-as-Agent's reasoning and decision-making abilities in a multi-turn open-ended generation setting. Our extensive test over 25 LLMs (including APIs and open-sourced models) shows that, while top commercial LLMs present a strong ability of acting as agents in complex environments, there is a significant disparity in performance between them and open-sourced competitors. It also serves as a component of an ongoing project with wider coverage and deeper consideration towards systematic LLM evaluation. Datasets, environments, and an integrated evaluation package for AgentBench are released at https://github.com/THUDM/AgentBench