cs.AI - 2023-07-02

RH20T: A Robotic Dataset for Learning Diverse Skills in One-Shot

  • paper_url: http://arxiv.org/abs/2307.00595
  • repo_url: None
  • paper_authors: Hao-Shu Fang, Hongjie Fang, Zhenyu Tang, Jirong Liu, Junbo Wang, Haoyi Zhu, Cewu Lu
  • For: The paper aims to enable robots to acquire diverse and generalizable skills in open domains using one-shot imitation learning with multi-modal perception.
  • Methods: The paper uses a large-scale dataset of contact-rich robot manipulation sequences collected in the real world, with visual, force, audio, and action information, along with human demonstration videos. The dataset is calibrated and made publicly available.
  • Results: The paper aims to unlock the potential for an agent to generalize to hundreds of real-world skills with multi-modal perception.
    Abstract A key challenge in robotic manipulation in open domains is how to acquire diverse and generalizable skills for robots. Recent research in one-shot imitation learning has shown promise in transferring trained policies to new tasks based on demonstrations. This feature is attractive for enabling robots to acquire new skills and improving task and motion planning. However, due to limitations in the training dataset, the current focus of the community has mainly been on simple cases, such as push or pick-place tasks, relying solely on visual guidance. In reality, there are many complex skills, some of which may even require both visual and tactile perception to solve. This paper aims to unlock the potential for an agent to generalize to hundreds of real-world skills with multi-modal perception. To achieve this, we have collected a dataset comprising over 110,000 contact-rich robot manipulation sequences across diverse skills, contexts, robots, and camera viewpoints, all collected in the real world. Each sequence in the dataset includes visual, force, audio, and action information, along with a corresponding human demonstration video. We have invested significant efforts in calibrating all the sensors and ensuring a high-quality dataset. The dataset is made publicly available at rh20t.github.io.

BioCPT: Contrastive Pre-trained Transformers with Large-scale PubMed Search Logs for Zero-shot Biomedical Information Retrieval

  • paper_url: http://arxiv.org/abs/2307.00589
  • repo_url: https://github.com/ncbi/biocpt
  • paper_authors: Qiao Jin, Won Kim, Qingyu Chen, Donald C. Comeau, Lana Yeganova, John Wilbur, Zhiyong Lu
  • for: This paper aims to improve the performance of biomedical information retrieval (IR) systems by introducing a new Contrastively Pre-trained Transformer (BioCPT) model.
  • methods: The authors use contrastive learning to train a pair of closely-integrated retriever and re-ranker using an unprecedented scale of 255 million user click logs from PubMed.
  • results: BioCPT sets new state-of-the-art performance on five biomedical IR tasks, outperforming various baselines including much larger models such as GPT-3-sized cpt-text-XL. Additionally, BioCPT generates better biomedical article and sentence representations for semantic evaluations.
    Abstract Information retrieval (IR) is essential in biomedical knowledge acquisition and clinical decision support. While recent progress has shown that language model encoders perform better semantic retrieval, training such models requires abundant query-article annotations that are difficult to obtain in biomedicine. As a result, most biomedical IR systems only conduct lexical matching. In response, we introduce BioCPT, a first-of-its-kind Contrastively Pre-trained Transformer model for zero-shot biomedical IR. To train BioCPT, we collected an unprecedented scale of 255 million user click logs from PubMed. With such data, we use contrastive learning to train a pair of closely-integrated retriever and re-ranker. Experimental results show that BioCPT sets new state-of-the-art performance on five biomedical IR tasks, outperforming various baselines including much larger models such as GPT-3-sized cpt-text-XL. In addition, BioCPT also generates better biomedical article and sentence representations for semantic evaluations. As such, BioCPT can be readily applied to various real-world biomedical IR tasks. BioCPT API and code are publicly available at https://github.com/ncbi/BioCPT.
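
The retriever half of this recipe is commonly trained with an in-batch contrastive (InfoNCE) objective over clicked query-article pairs. The sketch below illustrates that objective in PyTorch; the embedding dimension, temperature, and random inputs are illustrative stand-ins, not BioCPT's actual configuration (see https://github.com/ncbi/biocpt for the real code).

```python
import torch
import torch.nn.functional as F

def in_batch_contrastive_loss(query_emb, article_emb, temperature=0.05):
    """query_emb, article_emb: (B, d) embeddings of B clicked (query, article) pairs."""
    query_emb = F.normalize(query_emb, dim=-1)
    article_emb = F.normalize(article_emb, dim=-1)
    # Every query scored against every article; the diagonal holds the clicked
    # (positive) pairs and the off-diagonals act as in-batch negatives.
    logits = query_emb @ article_emb.T / temperature
    labels = torch.arange(logits.size(0), device=logits.device)
    return F.cross_entropy(logits, labels)

# Toy usage with random vectors standing in for encoder outputs.
q, a = torch.randn(8, 768), torch.randn(8, 768)
print(in_batch_contrastive_loss(q, a).item())
```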

Protecting the Future: Neonatal Seizure Detection with Spatial-Temporal Modeling

  • paper_url: http://arxiv.org/abs/2307.05382
  • repo_url: None
  • paper_authors: Ziyue Li, Yuchen Fang, You Li, Kan Ren, Yansen Wang, Xufang Luo, Juanyong Duan, Congrui Huang, Dongsheng Li, Lili Qiu
  • for: This paper proposes a deep learning framework to support seizure detection in neonatal electroencephalogram (EEG) monitoring, reducing the human effort required for real-time monitoring in the Neonatal Intensive Care Unit (NICU).
  • methods: The framework, STATENet, addresses the challenges specific to neonates (dynamic seizure onset location, different montages, and large distribution shift among subjects) through designs at the temporal, spatial, and model levels.
  • results: Experiments on a real-world large-scale neonatal EEG dataset show significantly better seizure detection performance.
    Abstract A timely detection of seizures for newborn infants with electroencephalogram (EEG) has been a common yet life-saving practice in the Neonatal Intensive Care Unit (NICU). However, it requires great human efforts for real-time monitoring, which calls for automated solutions to neonatal seizure detection. Moreover, the current automated methods focusing on adult epilepsy monitoring often fail due to (i) dynamic seizure onset location in human brains; (ii) different montages on neonates and (iii) huge distribution shift among different subjects. In this paper, we propose a deep learning framework, namely STATENet, to address the exclusive challenges with exquisite designs at the temporal, spatial and model levels. The experiments over the real-world large-scale neonatal EEG dataset illustrate that our framework achieves significantly better seizure detection performance.

Filter Bubbles in Recommender Systems: Fact or Fallacy – A Systematic Review

  • paper_url: http://arxiv.org/abs/2307.01221
  • repo_url: None
  • paper_authors: Qazi Mohammad Areeb, Mohammad Nadeem, Shahab Saquib Sohail, Raza Imam, Faiyaz Doctor, Yassine Himeur, Amir Hussain, Abbes Amira
  • for: This paper aims to investigate the impact of filter bubbles in recommender systems and propose an integrated approach to mitigate their effects.
  • methods: The authors conduct a systematic literature review on the topic of filter bubbles in recommender systems, analyzing and classifying the reviewed articles to provide valuable insights.
  • results: The authors identify evidence of filter bubbles in recommendation systems, highlighting several biases that contribute to their existence. They also propose mechanisms to mitigate the impact of filter bubbles and demonstrate that incorporating diversity into recommendations can potentially help alleviate this issue.
    Abstract A filter bubble refers to the phenomenon where Internet customization effectively isolates individuals from diverse opinions or materials, resulting in their exposure to only a select set of content. This can lead to the reinforcement of existing attitudes, beliefs, or conditions. In this study, our primary focus is to investigate the impact of filter bubbles in recommender systems. This pioneering research aims to uncover the reasons behind this problem, explore potential solutions, and propose an integrated tool to help users avoid filter bubbles in recommender systems. To achieve this objective, we conduct a systematic literature review on the topic of filter bubbles in recommender systems. The reviewed articles are carefully analyzed and classified, providing valuable insights that inform the development of an integrated approach. Notably, our review reveals evidence of filter bubbles in recommendation systems, highlighting several biases that contribute to their existence. Moreover, we propose mechanisms to mitigate the impact of filter bubbles and demonstrate that incorporating diversity into recommendations can potentially help alleviate this issue. The findings of this timely review will serve as a benchmark for researchers working in interdisciplinary fields such as privacy, artificial intelligence ethics, and recommendation systems. Furthermore, it will open new avenues for future research in related domains, prompting further exploration and advancement in this critical area.
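
As one concrete example of the kind of diversity mechanism the review points to, the sketch below implements maximal marginal relevance (MMR) re-ranking in Python, which trades off an item's relevance against its redundancy with items already selected. The scoring function and trade-off weight are generic illustrations, not a method taken from any particular surveyed paper.

```python
import numpy as np

def mmr(relevance, sim, k, lam=0.7):
    """relevance: (n,) scores; sim: (n, n) item-item similarity; pick k items."""
    selected, candidates = [], list(range(len(relevance)))
    while candidates and len(selected) < k:
        def score(i):
            redundancy = max((sim[i, j] for j in selected), default=0.0)
            return lam * relevance[i] - (1 - lam) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected

rng = np.random.default_rng(0)
rel = rng.random(10)
S = rng.random((10, 10))
S = (S + S.T) / 2   # symmetrize the toy similarity matrix
print(mmr(rel, S, k=3))
```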

Adaptive reinforcement learning of multi-agent ethically-aligned behaviours: the QSOM and QDSOM algorithms

  • paper_url: http://arxiv.org/abs/2307.00552
  • repo_url: None
  • paper_authors: Rémy Chaput, Olivier Boissier, Mathieu Guillermin
  • for: This paper addresses the problem of aligning artificial intelligence systems with our ethical considerations, which evolve over time.
  • methods: The two algorithms (QSOM and QDSOM) associate the Q-Table with (Dynamic) Self-Organizing Maps so that agents can adapt to changes in the environment and in the reward function, which encodes the ethical considerations.
  • results: On a multi-agent energy repartition problem within a small Smart Grid neighborhood, both algorithms adapt well and outperform baseline Reinforcement Learning algorithms.
    Abstract The numerous deployed Artificial Intelligence systems need to be aligned with our ethical considerations. However, such ethical considerations might change as time passes: our society is not fixed, and our social mores evolve. This makes it difficult for these AI systems; in the Machine Ethics field especially, it has remained an under-studied challenge. In this paper, we present two algorithms, named QSOM and QDSOM, which are able to adapt to changes in the environment, and especially in the reward function, which represents the ethical considerations that we want these systems to be aligned with. They associate the well-known Q-Table to (Dynamic) Self-Organizing Maps to handle the continuous and multi-dimensional state and action spaces. We evaluate them on a use-case of multi-agent energy repartition within a small Smart Grid neighborhood, and prove their ability to adapt, and their higher performance compared to baseline Reinforcement Learning algorithms.
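
A minimal sketch of the QSOM coupling described above: a Self-Organizing Map discretizes the continuous state space, a Q-Table is indexed by the map's best-matching unit, and both structures adapt online. The toy dynamics, reward, and all hyperparameters are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
n_units, state_dim, n_actions = 16, 3, 4
som = rng.normal(size=(n_units, state_dim))   # SOM unit prototypes
q_table = np.zeros((n_units, n_actions))      # one Q-row per SOM unit

def bmu(state):
    """Best-matching unit: the SOM prototype closest to the state."""
    return int(np.argmin(np.linalg.norm(som - state, axis=1)))

def step(state, alpha=0.1, gamma=0.95, eps=0.1, eta=0.05):
    u = bmu(state)
    a = rng.integers(n_actions) if rng.random() < eps else int(np.argmax(q_table[u]))
    next_state = state + rng.normal(scale=0.1, size=state_dim)  # toy dynamics
    reward = -np.linalg.norm(next_state)                        # toy reward signal
    # Standard Q-learning update over the SOM-discretized states...
    target = reward + gamma * q_table[bmu(next_state)].max()
    q_table[u, a] += alpha * (target - q_table[u, a])
    # ...while the winning unit drifts toward the observed state (online adaptation).
    som[u] += eta * (state - som[u])
    return next_state

s = rng.normal(size=state_dim)
for _ in range(100):
    s = step(s)
print(q_table.round(2))
```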

Defending Against Malicious Behaviors in Federated Learning with Blockchain

  • paper_url: http://arxiv.org/abs/2307.00543
  • repo_url: None
  • paper_authors: Nanqing Dong, Zhipeng Wang, Jiahao Sun, Michael Kampffmeyer, Yizhe Wen, Shuoying Zhang, William Knottenbelt, Eric Xing
  • for: Proposes a secure and reliable federated learning system based on blockchain and distributed ledger technology, addressing the single-point-of-failure risk of existing federated learning approaches.
  • methods: The system uses a peer-to-peer voting mechanism and a reward-and-slash mechanism, powered by on-chain smart contracts, to detect and deter malicious client behaviors.
  • results: Theoretical and empirical analyses show that the framework is robust against malicious client-side behaviors, improving the security and reliability of federated learning.
    Abstract In the era of deep learning, federated learning (FL) presents a promising approach that allows multi-institutional data owners, or clients, to collaboratively train machine learning models without compromising data privacy. However, most existing FL approaches rely on a centralized server for global model aggregation, leading to a single point of failure. This makes the system vulnerable to malicious attacks when dealing with dishonest clients. In this work, we address this problem by proposing a secure and reliable FL system based on blockchain and distributed ledger technology. Our system incorporates a peer-to-peer voting mechanism and a reward-and-slash mechanism, which are powered by on-chain smart contracts, to detect and deter malicious behaviors. Both theoretical and empirical analyses are presented to demonstrate the effectiveness of the proposed approach, showing that our framework is robust against malicious client-side behaviors.
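
The reward-and-slash bookkeeping can be sketched off-chain as below; in the proposed system this logic would live in on-chain smart contracts, and the thresholds, stake amounts, and vote format here are illustrative assumptions.

```python
from collections import defaultdict

def tally_votes(votes, stakes, reward=1.0, slash=5.0, threshold=0.5):
    """votes: voter -> set of client ids that voter flags as malicious."""
    flags = defaultdict(int)
    for flagged in votes.values():
        for client in flagged:
            flags[client] += 1
    n_voters = len(votes)
    for client in stakes:
        if flags[client] / n_voters > threshold:
            stakes[client] -= slash    # slash clients a majority flagged
        else:
            stakes[client] += reward   # reward clients that passed the vote
    return stakes

stakes = {"c1": 10.0, "c2": 10.0, "c3": 10.0}
votes = {"c1": {"c3"}, "c2": {"c3"}, "c3": set()}
print(tally_votes(votes, stakes))      # c3 slashed; c1 and c2 rewarded
```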

Enhancing Super-Resolution Networks through Realistic Thick-Slice CT Simulation

  • paper_url: http://arxiv.org/abs/2307.10182
  • repo_url: None
  • paper_authors: Zeyu Tang, Xiaodan Xing, Guang Yang
  • for: The paper aims to develop and evaluate an innovative simulation algorithm for generating thick-slice CT images that closely resemble actual images.
  • methods: The proposed method uses a novel simulation algorithm to generate thick-slice CT images, which are evaluated using Peak Signal-to-Noise Ratio (PSNR) and Root Mean Square Error (RMSE) metrics.
  • results: The proposed method demonstrated substantial enhancements in both PSNR and RMSE over other simulation methods, obtaining the highest PSNR values and the lowest RMSE. The generated images were then used to train four distinct super-resolution (SR) models, which exhibited enhanced performance when trained with data produced by the proposed algorithm.
    Abstract This study aims to develop and evaluate an innovative simulation algorithm for generating thick-slice CT images that closely resemble actual images in the AAPM-Mayo's 2016 Low Dose CT Grand Challenge dataset. The proposed method was evaluated using Peak Signal-to-Noise Ratio (PSNR) and Root Mean Square Error (RMSE) metrics, with the hypothesis that our simulation would produce images more congruent with their real counterparts. Our proposed method demonstrated substantial enhancements in terms of both PSNR and RMSE over other simulation methods. The highest PSNR values were obtained with the proposed method, yielding 49.7369 ± 2.5223 and 48.5801 ± 7.3271 for D45 and B30 reconstruction kernels, respectively. The proposed method also registered the lowest RMSE with values of 0.0068 ± 0.0020 and 0.0108 ± 0.0099 for D45 and B30, respectively, indicating a distribution more closely aligned with the authentic thick-slice image. Further validation of the proposed simulation algorithm was conducted using the TCIA LDCT-and-Projection-data dataset. The generated images were then leveraged to train four distinct super-resolution (SR) models, which were subsequently evaluated using the real thick-slice images from the 2016 Low Dose CT Grand Challenge dataset. When trained with data produced by our novel algorithm, all four SR models exhibited enhanced performance.
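
For reference, the two reported metrics are straightforward to compute; the sketch below assumes images normalized to [0, 1] (i.e., data_range = 1.0), which is an assumption rather than the paper's stated preprocessing.

```python
import numpy as np

def rmse(pred, target):
    return float(np.sqrt(np.mean((pred - target) ** 2)))

def psnr(pred, target, data_range=1.0):
    err = rmse(pred, target)
    return float("inf") if err == 0 else 20 * np.log10(data_range / err)

rng = np.random.default_rng(0)
real = rng.random((64, 64))                                   # stand-in real image
sim = np.clip(real + 0.01 * rng.normal(size=(64, 64)), 0, 1)  # stand-in simulation
print(f"PSNR = {psnr(sim, real):.2f} dB, RMSE = {rmse(sim, real):.4f}")
```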

Collaborative Policy Learning for Dynamic Scheduling Tasks in Cloud-Edge-Terminal IoT Networks Using Federated Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2307.00541
  • repo_url: None
  • paper_authors: Do-Yup Kim, Da-Eun Lee, Ji-Wan Kim, Hyun-Suk Lee
  • for: This paper studies cloud-edge-terminal IoT networks in which edges undertake a range of typical dynamic scheduling tasks.
  • methods: The paper proposes a collaborative policy learning framework for dynamic scheduling tasks based on federated reinforcement learning, exploiting the hierarchical architecture of the IoT network to collaboratively learn a central policy for each task at the cloud server.
  • results: Simulations show the framework's clear advantages on dynamic scheduling tasks, including accelerated policy learning and easier adaptation of newly arrived edges to their tasks.
    Abstract In this paper, we examine cloud-edge-terminal IoT networks, where edges undertake a range of typical dynamic scheduling tasks. In these IoT networks, a central policy for each task can be constructed at a cloud server. The central policy can be then used by the edges conducting the task, thereby mitigating the need for them to learn their own policy from scratch. Furthermore, this central policy can be collaboratively learned at the cloud server by aggregating local experiences from the edges, thanks to the hierarchical architecture of the IoT networks. To this end, we propose a novel collaborative policy learning framework for dynamic scheduling tasks using federated reinforcement learning. For effective learning, our framework adaptively selects the tasks for collaborative learning in each round, taking into account the need for fairness among tasks. In addition, as a key enabler of the framework, we propose an edge-agnostic policy structure that enables the aggregation of local policies from different edges. We then provide the convergence analysis of the framework. Through simulations, we demonstrate that our proposed framework significantly outperforms the approaches without collaborative policy learning. Notably, it accelerates the learning speed of the policies and allows newly arrived edges to adapt to their tasks more easily.
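
At the cloud server, the central policy for a task can be formed by aggregating the edge policies FedAvg-style, which the edge-agnostic policy structure makes possible. The sketch below shows such parameter averaging; the tiny linear policy and the experience-count weighting are illustrative assumptions.

```python
import torch
import torch.nn as nn

def aggregate_policies(edge_policies, weights):
    """Weighted FedAvg of structurally identical (edge-agnostic) policies."""
    total = sum(weights)
    global_state = {
        key: sum((w / total) * p.state_dict()[key]
                 for p, w in zip(edge_policies, weights))
        for key in edge_policies[0].state_dict()
    }
    central = nn.Linear(8, 4)   # same edge-agnostic policy structure
    central.load_state_dict(global_state)
    return central

edges = [nn.Linear(8, 4) for _ in range(3)]          # stand-in edge policies
central = aggregate_policies(edges, weights=[100, 50, 25])
print(sum(p.numel() for p in central.parameters()))  # 36 parameters
```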

Graph Neural Network based Log Anomaly Detection and Explanation

  • paper_url: http://arxiv.org/abs/2307.00527
  • repo_url: None
  • paper_authors: Zhong Li, Jiayang Shi, Matthijs van Leeuwen
  • for: This work aims to improve the accuracy of log anomaly detection for monitoring high-tech systems by using graph structure to capture anomalies in logs.
  • methods: The method detects log anomalies with graph neural networks: event logs are first converted into attributed, directed, and weighted graphs, and a One-Class Digraph Inception Convolutional Networks (OCDiGCN) model then detects graph-level anomalies.
  • results: Experiments on five benchmark datasets show that Logs2Graphs performs at least on par with state-of-the-art methods on simple datasets and largely outperforms them on complicated ones. For each detected anomaly, a small subset of crucial nodes is additionally provided as an explanation, offering valuable cues for subsequent root cause diagnosis.
    Abstract Event logs are widely used to record the status of high-tech systems, making log anomaly detection important for monitoring those systems. Most existing log anomaly detection methods take a log event count matrix or log event sequences as input, exploiting quantitative and/or sequential relationships between log events to detect anomalies. Unfortunately, only considering quantitative or sequential relationships may result in many false positives and/or false negatives. To alleviate this problem, we propose a graph-based method for unsupervised log anomaly detection, dubbed Logs2Graphs, which first converts event logs into attributed, directed, and weighted graphs, and then leverages graph neural networks to perform graph-level anomaly detection. Specifically, we introduce One-Class Digraph Inception Convolutional Networks, abbreviated as OCDiGCN, a novel graph neural network model for detecting graph-level anomalies in a collection of attributed, directed, and weighted graphs. By coupling the graph representation and anomaly detection steps, OCDiGCN can learn a representation that is especially suited for anomaly detection, resulting in a high detection accuracy. Importantly, for each identified anomaly, we additionally provide a small subset of nodes that play a crucial role in OCDiGCN's prediction as explanations, which can offer valuable cues for subsequent root cause diagnosis. Experiments on five benchmark datasets show that Logs2Graphs performs at least on par state-of-the-art log anomaly detection methods on simple datasets while largely outperforming state-of-the-art log anomaly detection methods on complicated datasets.
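
The first step, turning an event log into an attributed, directed, and weighted graph, can be sketched as below with NetworkX: nodes are event templates, edge weights count observed transitions, and a simple frequency attribute stands in for the paper's richer node features.

```python
import networkx as nx

# A toy event-template sequence standing in for a parsed log.
log = ["start", "open", "read", "read", "close", "open", "read", "close"]

G = nx.DiGraph()
for a, b in zip(log, log[1:]):
    if G.has_edge(a, b):
        G[a][b]["weight"] += 1      # edge weight = transition count
    else:
        G.add_edge(a, b, weight=1)
for node in G.nodes:
    G.nodes[node]["count"] = log.count(node)   # simple node attribute

print(list(G.edges(data=True)))
```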

LEDITS: Real Image Editing with DDPM Inversion and Semantic Guidance

  • paper_url: http://arxiv.org/abs/2307.00522
  • repo_url: https://github.com/adham-elarabawy/ledits
  • paper_authors: Linoy Tsaban, Apolinário Passos
  • for: This paper proposes a lightweight approach for real-image editing driven by text.
  • methods: The method combines the Edit Friendly DDPM inversion technique with Semantic Guidance, extending Semantic Guidance to real-image editing while harnessing the editing capabilities of DDPM inversion.
  • results: The method achieves versatile edits, from subtle to extensive modifications, including changes to composition and style, without requiring any modification of the model architecture.
    Abstract Recent large-scale text-guided diffusion models provide powerful image-generation capabilities. Currently, a significant effort is given to enable the modification of these images using text only as means to offer intuitive and versatile editing. However, editing proves to be difficult for these generative models due to the inherent nature of editing techniques, which involves preserving certain content from the original image. Conversely, in text-based models, even minor modifications to the text prompt frequently result in an entirely distinct result, making attaining one-shot generation that accurately corresponds to the users intent exceedingly challenging. In addition, to edit a real image using these state-of-the-art tools, one must first invert the image into the pre-trained models domain - adding another factor affecting the edit quality, as well as latency. In this exploratory report, we propose LEDITS - a combined lightweight approach for real-image editing, incorporating the Edit Friendly DDPM inversion technique with Semantic Guidance, thus extending Semantic Guidance to real image editing, while harnessing the editing capabilities of DDPM inversion as well. This approach achieves versatile edits, both subtle and extensive as well as alterations in composition and style, while requiring no optimization nor extensions to the architecture.

DSTCGCN: Learning Dynamic Spatial-Temporal Cross Dependencies for Traffic Forecasting

  • paper_url: http://arxiv.org/abs/2307.00518
  • repo_url: https://github.com/water-wbq/dstcgcn
  • paper_authors: Binqing Wu, Ling Chen
  • for: Traffic forecasting is a key task in intelligent transportation systems, but because of the complexity of road networks and their spatial-temporal relationships, existing methods usually learn spatial and temporal dependencies separately, ignoring the dependencies crossing the two dimensions. This paper proposes DSTCGCN, a dynamic spatial-temporal cross graph convolution network that learns both jointly for traffic forecasting.
  • methods: The paper proposes a fast Fourier transform (FFT) based attentive selector that chooses relevant time steps for each time step from time-varying traffic data, and introduces a dynamic cross graph construction module, consisting of spatial graph construction, temporal connection graph construction, and fusion modules, to learn dynamic spatial-temporal cross dependencies without pre-defined priors.
  • results: Extensive experiments on six real-world datasets show that DSTCGCN achieves state-of-the-art performance.
    Abstract Traffic forecasting is essential to intelligent transportation systems, which is challenging due to the complicated spatial and temporal dependencies within a road network. Existing works usually learn spatial and temporal dependencies separately, ignoring the dependencies crossing spatial and temporal dimensions. In this paper, we propose DSTCGCN, a dynamic spatial-temporal cross graph convolution network to learn dynamic spatial and temporal dependencies jointly via graphs for traffic forecasting. Specifically, we introduce a fast Fourier transform (FFT) based attentive selector to choose relevant time steps for each time step based on time-varying traffic data. Given the selected time steps, we introduce a dynamic cross graph construction module, consisting of the spatial graph construction, temporal connection graph construction, and fusion modules, to learn dynamic spatial-temporal cross dependencies without pre-defined priors. Extensive experiments on six real-world datasets demonstrate that DSTCGCN achieves the state-of-the-art performance.
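
The FFT-based selection idea can be loosely illustrated as below: using the Wiener-Khinchin identity, the autocorrelation of a traffic series is computed cheaply in the frequency domain, and the top-k lags serve as the "relevant time steps". This is a hedged stand-in for the paper's attentive selector, which is more elaborate.

```python
import torch

def topk_relevant_lags(series, k=3):
    """series: (T,) signal; return the k lags with the highest autocorrelation."""
    x = series - series.mean()
    spec = torch.fft.rfft(x)
    # Wiener-Khinchin: inverse FFT of the power spectrum = autocorrelation.
    autocorr = torch.fft.irfft(spec * spec.conj(), n=x.numel())
    autocorr[0] = -float("inf")            # ignore the trivial zero lag
    return torch.topk(autocorr, k).indices

t = torch.arange(288, dtype=torch.float32)
series = torch.sin(2 * torch.pi * t / 24) + 0.1 * torch.randn(288)  # daily period
print(topk_relevant_lags(series))          # lags near multiples of 24 dominate
```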

HeGeL: A Novel Dataset for Geo-Location from Hebrew Text

  • paper_url: http://arxiv.org/abs/2307.00509
  • repo_url: https://github.com/onlplab/hegel
  • paper_authors: Tzuf Paz-Argaman, Tal Bauman, Itai Mondshine, Itzhak Omer, Sagi Dalyot, Reut Tsarfaty
  • for: This paper aims to collect and analyze literal Hebrew place descriptions to study lingual geospatial reasoning and improve textual geolocation.
  • methods: The paper uses crowdsourcing to collect 5,649 literal Hebrew place descriptions in three cities in Israel, and employs qualitative and empirical analysis to examine the data's geospatial reasoning and the need for a novel environmental representation.
  • results: The study finds that the data exhibits abundant use of geospatial reasoning, indicating the importance of a novel environmental representation for textual geolocation in morphologically rich and resource-poor languages like Hebrew.
    Abstract The task of textual geolocation - retrieving the coordinates of a place based on a free-form language description - calls for not only grounding but also natural language understanding and geospatial reasoning. Even though there are quite a few datasets in English used for geolocation, they are currently based on open-source data (Wikipedia and Twitter), where the location of the described place is mostly implicit, such that the location retrieval resolution is limited. Furthermore, there are no datasets available for addressing the problem of textual geolocation in morphologically rich and resource-poor languages, such as Hebrew. In this paper, we present the Hebrew Geo-Location (HeGeL) corpus, designed to collect literal place descriptions and analyze lingual geospatial reasoning. We crowdsourced 5,649 literal Hebrew place descriptions of various place types in three cities in Israel. Qualitative and empirical analysis show that the data exhibits abundant use of geospatial reasoning and requires a novel environmental representation.

Deep Cross-Modal Steganography Using Neural Representations

  • paper_url: http://arxiv.org/abs/2307.08671
  • repo_url: None
  • paper_authors: Gyojin Han, Dong-Jae Lee, Jiwan Hur, Jaehyun Choi, Junmo Kim
  • for: This paper proposes a deep-learning-based cross-modal steganography framework for hiding secret data of various formats in cover images.
  • methods: The framework uses Implicit Neural Representations (INRs) to represent the secret data, which can handle data of various modalities and resolutions.
  • results: Experimental results show that the proposed method is expandable to diverse secret datasets and can accommodate different modalities and resolutions.
    Abstract Steganography is the process of embedding secret data into another message or data, in such a way that it is not easily noticeable. With the advancement of deep learning, Deep Neural Networks (DNNs) have recently been utilized in steganography. However, existing deep steganography techniques are limited in scope, as they focus on specific data types and are not effective for cross-modal steganography. Therefore, We propose a deep cross-modal steganography framework using Implicit Neural Representations (INRs) to hide secret data of various formats in cover images. The proposed framework employs INRs to represent the secret data, which can handle data of various modalities and resolutions. Experiments on various secret datasets of diverse types demonstrate that the proposed approach is expandable and capable of accommodating different modalities.
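
The core representational trick, an implicit neural representation, fits a small network that maps coordinates to signal values, so data of any modality or resolution becomes a set of network weights. The sketch below fits an INR to a toy 1-D "secret" signal; the architecture and training schedule are illustrative assumptions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
coords = torch.linspace(-1, 1, 256).unsqueeze(1)   # sampling coordinates
signal = torch.sin(4 * torch.pi * coords)          # toy "secret" signal

# Small coordinate -> value MLP; its weights become the secret's representation.
inr = nn.Sequential(
    nn.Linear(1, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 1),
)
opt = torch.optim.Adam(inr.parameters(), lr=1e-2)
for _ in range(1000):
    opt.zero_grad()
    loss = ((inr(coords) - signal) ** 2).mean()
    loss.backward()
    opt.step()
print(f"fit MSE: {loss.item():.5f}")   # the INR weights now encode the secret
```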

Cloud Ensemble Learning for Fault Diagnosis of Rolling Bearings with Stochastic Configuration Networks

  • paper_url: http://arxiv.org/abs/2307.00507
  • repo_url: None
  • paper_authors: Wei Dai, Jiang Liu, Lanhao Wang
  • for: This work targets fault diagnosis of rolling bearings, particularly when only a few samples are available.
  • methods: The paper develops Stochastic Configuration Network (SCN) based cloud ensemble learning, including a cloud feature extraction method, a cloud sampling method that generates cloud droplets, and an ensemble model that comprehensively characterizes the uncertainty of fault information.
  • results: Experimental results show that the proposed method improves the accuracy and consistency of rolling bearing fault diagnosis, especially in few-shot scenarios.
    Abstract Fault diagnosis of rolling bearings is of great significance for post-maintenance in rotating machinery, but it is a challenging work to diagnose faults efficiently with a few samples. Additionally, faults commonly occur with randomness and fuzziness due to the complexity of the external environment and the structure of rolling bearings, hindering effective mining of fault characteristics and eventually restricting accuracy of fault diagnosis. To overcome these problems, stochastic configuration network (SCN) based cloud ensemble learning, called SCN-CEL, is developed in this work. Concretely, a cloud feature extraction method is first developed by using a backward cloud generator of normal cloud model to mine the uncertainty of fault information. Then, a cloud sampling method, which generates enough cloud droplets using bidirectional cloud generator, is proposed to extend the cloud feature samples. Finally, an ensemble model with SCNs is developed to comprehensively characterize the uncertainty of fault information and advance the generalization performance of fault diagnosis machine. Experimental results demonstrate that the proposed method indeed performs favorably for distinguishing fault categories of rolling bearings in the few shot scenarios.
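
The cloud-model machinery can be sketched as follows: a backward cloud generator estimates the numerical characteristics (Ex, En, He) from a few fault-feature samples, and a forward normal cloud generator then synthesizes extra "cloud droplets" to extend the sample set. The feature values below are synthetic stand-ins for real bearing features.

```python
import numpy as np

rng = np.random.default_rng(0)

def backward_cloud(samples):
    """Estimate (Ex, En, He) of a normal cloud model from samples."""
    ex = samples.mean()
    en = np.sqrt(np.pi / 2) * np.abs(samples - ex).mean()
    he = np.sqrt(max(samples.var(ddof=1) - en**2, 0.0))
    return ex, en, he

def forward_cloud(ex, en, he, n):
    """Generate n cloud droplets from the estimated characteristics."""
    en_prime = rng.normal(en, he, size=n)     # per-droplet entropy
    return rng.normal(ex, np.abs(en_prime))   # the droplets themselves

features = rng.normal(1.5, 0.3, size=20)      # few-shot fault-feature stand-ins
ex, en, he = backward_cloud(features)
augmented = forward_cloud(ex, en, he, n=200)  # extended sample set
print(round(ex, 3), round(en, 3), round(he, 3), augmented.shape)
```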

On efficient computation in active inference

  • paper_url: http://arxiv.org/abs/2307.00504
  • repo_url: https://github.com/aswinpaul/dpefe_2023
  • paper_authors: Aswin Paul, Noor Sajid, Lancelot Da Costa, Adeel Razi
  • for: Improving the computational efficiency of active inference and simplifying the specification of target distributions.
  • methods: The paper proposes two solutions that work in concert: a novel planning algorithm for finite temporal horizons with drastically lower computational complexity, and a Z-learning-inspired method from the control theory literature for setting appropriate target distributions in new and existing active inference planning schemes.
  • results: The effectiveness and feasibility of these methods are demonstrated through simulations on standard grid-world tasks, creating new application opportunities.
    Abstract Despite being recognized as neurobiologically plausible, active inference faces difficulties when employed to simulate intelligent behaviour in complex environments due to its computational cost and the difficulty of specifying an appropriate target distribution for the agent. This paper introduces two solutions that work in concert to address these limitations. First, we present a novel planning algorithm for finite temporal horizons with drastically lower computational complexity. Second, inspired by Z-learning from control theory literature, we simplify the process of setting an appropriate target distribution for new and existing active inference planning schemes. Our first approach leverages the dynamic programming algorithm, known for its computational efficiency, to minimize the cost function used in planning through the Bellman-optimality principle. Accordingly, our algorithm recursively assesses the expected free energy of actions in the reverse temporal order. This improves computational efficiency by orders of magnitude and allows precise model learning and planning, even under uncertain conditions. Our method simplifies the planning process and shows meaningful behaviour even when specifying only the agent's final goal state. The proposed solutions make defining a target distribution from a goal state straightforward compared to the more complicated task of defining a temporally informed target distribution. The effectiveness of these methods is tested and demonstrated through simulations in standard grid-world tasks. These advances create new opportunities for various applications.
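
The dynamic-programming idea can be sketched as a Bellman-style backward recursion over expected free energy (EFE). In the toy version below, a per-state cost stands in for the true EFE and a random chain MDP stands in for the generative model; both are illustrative assumptions, not the paper's formulation.

```python
import numpy as np

n_states, n_actions, horizon = 5, 2, 4
rng = np.random.default_rng(1)
# P[s, a] is a distribution over next states in a random toy MDP.
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
cost = np.linspace(1.0, 0.0, n_states)   # stand-in per-state EFE; last state = goal

G = np.zeros((horizon + 1, n_states, n_actions))   # G[horizon] = 0 boundary
for t in reversed(range(horizon)):
    best_to_go = G[t + 1].min(axis=1)    # best achievable EFE from each next state
    # Bellman-style recursion: immediate EFE plus expected best EFE-to-go.
    G[t] = cost[:, None] + P @ best_to_go

print(G[0].argmin(axis=1))   # greedy (EFE-minimizing) action per state at t = 0
```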

Don’t Memorize; Mimic The Past: Federated Class Incremental Learning Without Episodic Memory

  • paper_url: http://arxiv.org/abs/2307.00497
  • repo_url: None
  • paper_authors: Sara Babakniya, Zalan Fabian, Chaoyang He, Mahdi Soltanolkotabi, Salman Avestimehr
  • for: This paper addresses the problem of deep learning models forgetting previously learned information when trained on new data, in the federated learning setting.
  • methods: The paper uses a generative model to synthesize samples from past distributions, allowing clients to mitigate catastrophic forgetting locally without storing past data.
  • results: Experiments on the CIFAR-100 dataset show significant improvements over existing baselines.
    Abstract Deep learning models are prone to forgetting information learned in the past when trained on new data. This problem becomes even more pronounced in the context of federated learning (FL), where data is decentralized and subject to independent changes for each user. Continual Learning (CL) studies this so-called \textit{catastrophic forgetting} phenomenon primarily in centralized settings, where the learner has direct access to the complete training dataset. However, applying CL techniques to FL is not straightforward due to privacy concerns and resource limitations. This paper presents a framework for federated class incremental learning that utilizes a generative model to synthesize samples from past distributions instead of storing part of past data. Then, clients can leverage the generative model to mitigate catastrophic forgetting locally. The generative model is trained on the server using data-free methods at the end of each task without requesting data from clients. Therefore, it reduces the risk of data leakage as opposed to training it on the client's private data. We demonstrate significant improvements for the CIFAR-100 dataset compared to existing baselines.
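
On the client side, the generative-replay step amounts to mixing synthesized "past" samples into each local batch so no past data needs to be stored. The sketch below uses a toy generator and linear classifier as stand-ins for the deployed models; all sizes are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
latent_dim, n_past_classes, n_classes, d = 8, 3, 5, 16

class ToyGenerator(nn.Module):
    """Stand-in for the server-trained, data-free generator of past classes."""
    def __init__(self):
        super().__init__()
        self.net = nn.Linear(latent_dim + n_past_classes, d)
    def forward(self, z, y):
        y_onehot = F.one_hot(y, n_past_classes).float()
        return self.net(torch.cat([z, y_onehot], dim=1))

model = nn.Linear(d, n_classes)        # stand-in client classifier
generator = ToyGenerator()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# One local step: new-task batch (classes 3-4) plus replayed "past" samples.
new_x, new_y = torch.randn(32, d), torch.randint(3, n_classes, (32,))
with torch.no_grad():
    old_y = torch.randint(0, n_past_classes, (32,))
    old_x = generator(torch.randn(32, latent_dim), old_y)
loss = F.cross_entropy(model(torch.cat([new_x, old_x])),
                       torch.cat([new_y, old_y]))
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(loss.item())
```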

STG4Traffic: A Survey and Benchmark of Spatial-Temporal Graph Neural Networks for Traffic Prediction

  • paper_url: http://arxiv.org/abs/2307.00495
  • repo_url: https://github.com/trainingl/stg4traffic
  • paper_authors: Xunlian Luo, Chunjiang Zhu, Detian Zhang, Qing Li
  • for: This paper provides a systematic review of graph learning strategies and commonly used graph convolution algorithms, together with a comprehensive analysis of recently proposed spatial-temporal graph network models for traffic prediction.
  • methods: The paper builds STG4Traffic, a standardized and scalable benchmark implemented with the PyTorch deep learning framework and evaluated on two types of traffic datasets.
  • results: The study shows that STG4Traffic attains strong prediction accuracy on both types of traffic datasets and supports personalized model settings with uniform evaluation metrics across datasets and models.
    Abstract Traffic prediction has been an active research topic in the domain of spatial-temporal data mining. Accurate real-time traffic prediction is essential to improve the safety, stability, and versatility of smart city systems, i.e., traffic control and optimal routing. The complex and highly dynamic spatial-temporal dependencies make effective predictions still face many challenges. Recent studies have shown that spatial-temporal graph neural networks exhibit great potential applied to traffic prediction, which combines sequential models with graph convolutional networks to jointly model temporal and spatial correlations. However, a survey study of graph learning, spatial-temporal graph models for traffic, as well as a fair comparison of baseline models are pending and unavoidable issues. In this paper, we first provide a systematic review of graph learning strategies and commonly used graph convolution algorithms. Then we conduct a comprehensive analysis of the strengths and weaknesses of recently proposed spatial-temporal graph network models. Furthermore, we build a study called STG4Traffic using the deep learning framework PyTorch to establish a standardized and scalable benchmark on two types of traffic datasets. We can evaluate their performance by personalizing the model settings with uniform metrics. Finally, we point out some problems in the current study and discuss future directions. Source codes are available at https://github.com/trainingl/STG4Traffic.

Fourier-Mixed Window Attention: Accelerating Informer for Long Sequence Time-Series Forecasting

  • paper_url: http://arxiv.org/abs/2307.00493
  • repo_url: https://github.com/nhatthanhtran/fwin2023
  • paper_authors: Nhat Thanh Tran, Jack Xin
  • for: Accelerating Informer for long sequence time-series forecasting.
  • methods: The paper uses a local-global window-based attention method to accelerate Informer without relying on the query sparsity hypothesis or the empirical approximation underlying its ProbSparse attention.
  • results: FWin transformers improve the overall prediction accuracy of Informer while accelerating its inference speed by 40-50% on univariate and multivariate datasets. The paper also shows that a learned FWin-type attention approaches or even outperforms softmax full attention based on key vectors extracted from an Informer model's full attention layer on time series data.
    Abstract We study a fast local-global window-based attention method to accelerate Informer for long sequence time-series forecasting. While window attention is local and a considerable computational saving, it lacks the ability to capture global token information which is compensated by a subsequent Fourier transform block. Our method, named FWin, does not rely on query sparsity hypothesis and an empirical approximation underlying the ProbSparse attention of Informer. Through experiments on univariate and multivariate datasets, we show that FWin transformers improve the overall prediction accuracies of Informer while accelerating its inference speeds by 40 to 50 %. We also show in a nonlinear regression model that a learned FWin type attention approaches or even outperforms softmax full attention based on key vectors extracted from an Informer model's full attention layer acting on time series data.
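
The two ingredients can be sketched minimally: attention restricted to non-overlapping local windows, followed by a Fourier-transform block that mixes global token information. Single-head attention and the tiny dimensions below are simplifications of the actual architecture (see https://github.com/nhatthanhtran/fwin2023 for the real code).

```python
import torch
import torch.nn.functional as F

def window_attention(x, window=8):
    """x: (B, T, d) with T divisible by `window`; attention within each window."""
    B, T, d = x.shape
    w = x.view(B * T // window, window, d)
    attn = F.softmax(w @ w.transpose(1, 2) / d**0.5, dim=-1)
    return (attn @ w).view(B, T, d)

def fourier_mix(x):
    """Global token mixing via an FFT along the sequence axis (real part kept)."""
    return torch.fft.fft(x, dim=1).real

x = torch.randn(2, 64, 32)
y = fourier_mix(window_attention(x))   # local attention, then global mixing
print(y.shape)                         # torch.Size([2, 64, 32])
```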

PatternGPT :A Pattern-Driven Framework for Large Language Model Text Generation

  • paper_url: http://arxiv.org/abs/2307.00470
  • repo_url: None
  • paper_authors: Le Xiao, Xin Shan
  • For: This paper aims to improve the text generation capability of large language models (LLMs) by proposing a pattern-driven text generation framework called PatternGPT.
  • Methods: The framework uses the extraction capability of LLMs to generate rich and diversified structured and formalized patterns, which are then used to guide the generation of models. The framework also utilizes federated learning to share patterns among multiple agents and optimize the search for high-quality patterns.
  • Results: The proposed framework has several advantages, including generating diversified patterns, protecting data privacy, combining external knowledge, and improving the quality of generation. The framework provides an effective method to optimize the text generation capability of LLMs and apply them to the field of intelligent dialogue and content generation.
    Abstract Large language models (LLMs) have shown excellent text generation capabilities, generating fluent, human-like responses for many downstream tasks. However, applying large language models to real-world critical tasks remains challenging due to their susceptibility to hallucinations and their inability to directly use external knowledge. To cope with these challenges, this paper proposes PatternGPT, a pattern-driven text generation framework for Large Language Models. First, the framework utilizes the extraction capability of Large Language Models to generate rich and diversified structured and formalized patterns, which facilitates the introduction of external knowledge for computation. It then draws on the idea of federated learning, using multiple agents to share patterns and thereby obtain more diversified patterns. Finally, it uses judgment criteria and optimization algorithms to search for high-quality patterns, which guide the model's generation. This framework has the advantages of generating diversified patterns, protecting data privacy, combining external knowledge, and improving the quality of generation, which provides an effective method to optimize the text generation capability of large language models and apply them to the field of intelligent dialogue and content generation.

FedDefender: Backdoor Attack Defense in Federated Learning

  • paper_url: http://arxiv.org/abs/2307.08672
  • repo_url: https://github.com/warisgill/FedDefender
  • paper_authors: Waris Gill, Ali Anwar, Muhammad Ali Gulzar
  • for: Defending against targeted poisoning attacks in Federated Learning (FL), protecting client models from attack while preserving global model performance.
  • methods: The method uses differential testing to fingerprint the neuron activations of clients' models on the same input, identifying potentially malicious clients.
  • results: On the MNIST and FashionMNIST datasets, FedDefender effectively mitigates targeted poisoning attacks, reducing the attack success rate (ASR) to 10% without degrading the global model's performance.
    Abstract Federated Learning (FL) is a privacy-preserving distributed machine learning technique that enables individual clients (e.g., user participants, edge devices, or organizations) to train a model on their local data in a secure environment and then share the trained model with an aggregator to build a global model collaboratively. In this work, we propose FedDefender, a defense mechanism against targeted poisoning attacks in FL by leveraging differential testing. Our proposed method fingerprints the neuron activations of clients' models on the same input and uses differential testing to identify a potentially malicious client containing a backdoor. We evaluate FedDefender using MNIST and FashionMNIST datasets with 20 and 30 clients, and our results demonstrate that FedDefender effectively mitigates such attacks, reducing the attack success rate (ASR) to 10\% without deteriorating the global model performance.
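
A hedged sketch of the fingerprinting idea: run the same probe inputs through every client model, treat a layer's activations as the client's fingerprint, and flag the client whose fingerprint deviates most from the consensus. The distance-to-median outlier rule below is an illustrative stand-in for the paper's differential-testing procedure.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
probe = torch.randn(16, 10)                      # shared probe inputs
clients = [nn.Linear(10, 4) for _ in range(5)]   # stand-in client models
with torch.no_grad():
    clients[3].weight.add_(2.0)                  # plant one anomalous client

with torch.no_grad():
    # Each client's fingerprint: its (flattened) activations on the probe set.
    prints = torch.stack([m(probe).flatten() for m in clients])
median = prints.median(dim=0).values
dist = (prints - median).norm(dim=1)             # deviation from the consensus
print("suspect client:", int(dist.argmax()))     # expected: 3
```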

Human-to-Human Interaction Detection

  • paper_url: http://arxiv.org/abs/2307.00464
  • repo_url: https://github.com/kakaobrain/hotr
  • paper_authors: Zhenhua Wang, Kaining Ying, Jiajun Meng, Jifeng Ning
  • for: This paper studies human-to-human interactions such as queuing, handshaking, fighting, and chasing, to support video surveillance for public security in regions like campuses, squares, and parks.
  • methods: The paper introduces a new task, human-to-human interaction detection (HID), which detects subjects, recognizes person-wise actions, and groups people according to their interactive relations in a single model.
  • results: The authors build a new HID benchmark, AVA-Interaction (AVA-I), from the AVA dataset, and propose SaMFormer, a Transformer-based method for the HID task. Extensive experiments on AVA-I demonstrate its effectiveness.
    Abstract A comprehensive understanding of interested human-to-human interactions in video streams, such as queuing, handshaking, fighting and chasing, is of immense importance to the surveillance of public security in regions like campuses, squares and parks. Different from conventional human interaction recognition, which uses choreographed videos as inputs, neglects concurrent interactive groups, and performs detection and recognition in separate stages, we introduce a new task named human-to-human interaction detection (HID). HID devotes to detecting subjects, recognizing person-wise actions, and grouping people according to their interactive relations, in one model. First, based on the popular AVA dataset created for action detection, we establish a new HID benchmark, termed AVA-Interaction (AVA-I), by adding annotations on interactive relations in a frame-by-frame manner. AVA-I consists of 85,254 frames and 86,338 interactive groups, and each image includes up to 4 concurrent interactive groups. Second, we present a novel baseline approach SaMFormer for HID, containing a visual feature extractor, a split stage which leverages a Transformer-based model to decode action instances and interactive groups, and a merging stage which reconstructs the relationship between instances and groups. All SaMFormer components are jointly trained in an end-to-end manner. Extensive experiments on AVA-I validate the superiority of SaMFormer over representative methods. The dataset and code will be made public to encourage more follow-up studies.

Conformer LLMs – Convolution Augmented Large Language Models

  • paper_url: http://arxiv.org/abs/2307.00461
  • repo_url: None
  • paper_authors: Prateek Verma
  • for: This paper develops a causal training setup that combines convolutional layers and Transformers for large language models (LLMs).
  • methods: The paper adapts non-causal conformers, used ubiquitously in automatic speech recognition, to a causal setup for training LLMs.
  • results: The approach achieves significant performance gains, demonstrating a robust architecture that performs well for large-scale language modeling beyond speech applications.
    Abstract This work builds together two popular blocks of neural architecture, namely convolutional layers and Transformers, for large language models (LLMs). Non-causal conformers are used ubiquitously in automatic speech recognition. This work aims to adapt these architectures in a causal setup for training LLMs. Transformers decoders effectively capture long-range dependencies over several modalities and form a core backbone of modern advancements in machine learning. Convolutional architectures have been popular in extracting features in domains such as raw 1-D signals, speech, and images, to name a few. In this paper, by combining local and global dependencies over latent representations using causal convolutional filters and Transformer, we achieve significant gains in performance. This work showcases a robust speech architecture that can be integrated and adapted in a causal setup beyond speech applications for large-scale language modeling.
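
Making a convolution usable in a causal LM reduces to left-padding so position t never sees positions beyond t, after which it can be interleaved with masked self-attention. The sketch below shows one such block in PyTorch; the layer sizes and the single conv/attention pairing are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalConvBlock(nn.Module):
    """Residual 1-D convolution made causal via left-only padding."""
    def __init__(self, d_model=64, kernel=5):
        super().__init__()
        self.kernel = kernel
        self.conv = nn.Conv1d(d_model, d_model, kernel)

    def forward(self, x):                      # x: (B, T, d)
        h = x.transpose(1, 2)                  # (B, d, T) for Conv1d
        h = F.pad(h, (self.kernel - 1, 0))     # pad only the left: no future leak
        return x + self.conv(h).transpose(1, 2)

d, T = 64, 16
block = CausalConvBlock(d)
attn = nn.TransformerEncoderLayer(d, nhead=4, batch_first=True)
mask = nn.Transformer.generate_square_subsequent_mask(T)
x = torch.randn(2, T, d)
y = attn(block(x), src_mask=mask)   # causal conv, then masked self-attention
print(y.shape)                      # torch.Size([2, 16, 64])
```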

GenRec: Large Language Model for Generative Recommendation

  • paper_url: http://arxiv.org/abs/2307.00457
  • repo_url: https://github.com/rutgerswiselab/genrec
  • paper_authors: Jianchao Ji, Zelong Li, Shuyuan Xu, Wenyue Hua, Yingqiang Ge, Juntao Tan, Yongfeng Zhang
  • for: This paper explores the potential of large language models (LLMs) for recommendation under the generative recommendation paradigm.
  • methods: The paper proposes GenRec, an LLM-based generative recommendation method that leverages the understanding ability of LLMs to interpret context, learn user preferences, and generate relevant recommendations directly.
  • results: Experiments show that GenRec performs significantly better on large datasets; compared with traditional discriminative recommendation, it better captures user preferences and adapts to changing user needs.
    Abstract In recent years, large language models (LLM) have emerged as powerful tools for diverse natural language processing tasks. However, their potential for recommender systems under the generative recommendation paradigm remains relatively unexplored. This paper presents an innovative approach to recommendation systems using large language models (LLMs) based on text data. In this paper, we present a novel LLM for generative recommendation (GenRec) that utilized the expressive power of LLM to directly generate the target item to recommend, rather than calculating ranking score for each candidate item one by one as in traditional discriminative recommendation. GenRec uses LLM's understanding ability to interpret context, learn user preferences, and generate relevant recommendation. Our proposed approach leverages the vast knowledge encoded in large language models to accomplish recommendation tasks. We first we formulate specialized prompts to enhance the ability of LLM to comprehend recommendation tasks. Subsequently, we use these prompts to fine-tune the LLaMA backbone LLM on a dataset of user-item interactions, represented by textual data, to capture user preferences and item characteristics. Our research underscores the potential of LLM-based generative recommendation in revolutionizing the domain of recommendation systems and offers a foundational framework for future explorations in this field. We conduct extensive experiments on benchmark datasets, and the experiments shows that our GenRec has significant better results on large dataset.
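
The prompt-formulation step can be illustrated as below: a user's textual interaction history is rendered into a generation prompt, and the fine-tuned LLM's output is parsed as the recommendation. The exact template wording is an assumption, not the paper's actual prompt.

```python
def build_genrec_prompt(history, k=1):
    """Render a user's interaction history into a generation prompt."""
    items = "\n".join(f"- {title}" for title in history)
    return (
        "A user has interacted with the following items:\n"
        f"{items}\n"
        f"Recommend {k} item(s) this user is most likely to enjoy next:"
    )

prompt = build_genrec_prompt(["The Matrix", "Blade Runner", "Ghost in the Shell"])
print(prompt)
# The prompt would be fed to the fine-tuned LLaMA backbone, and its generated
# text parsed as the recommended item(s).
```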

3D-IDS: Doubly Disentangled Dynamic Intrusion Detection

  • paper_url: http://arxiv.org/abs/2307.11079
  • repo_url: None
  • paper_authors: Chenyang Qiu, Yingsheng Geng, Junrui Lu, Kaida Chen, Shitong Zhu, Ya Su, Guoshun Nan, Can Zhang, Junsong Fu, Qimei Cui, Xiaofeng Tao
  • For: 3D-IDS is proposed to tackle the inconsistent performance of existing NIDS methods in detecting various unknown and known attacks, especially in encrypted traffic. The method aims to disentangle traffic features and highlight attack-specific features for effective identification of attacks, and to improve the explainability of NIDS.
  • Methods: Two-step feature disentanglements differentiate the complex features of various attacks: a non-parameterized optimization based on mutual information automatically disentangles traffic features, a memory model generates representations of the disentangled features, and a novel graph diffusion method dynamically fuses the network topology for spatial-temporal aggregation in evolving data streams.
  • Results: The proposed 3D-IDS outperforms existing NIDS methods in detecting various attacks, including unknown threats and known ones that are not easily detected. Experiments show the superiority of the method, and the two-step feature disentanglements benefit the explainability of NIDS.
    Abstract Network-based intrusion detection system (NIDS) monitors network traffic for malicious activities, forming the frontline defense against increasing attacks over information infrastructures. Although promising, our quantitative analysis shows that existing methods perform inconsistently in declaring various unknown attacks (e.g., 9% and 35% F1 respectively for two distinct unknown threats for an SVM-based method) or detecting diverse known attacks (e.g., 31% F1 for the Backdoor and 93% F1 for DDoS by a GCN-based state-of-the-art method), and reveals that the underlying cause is entangled distributions of flow features. This motivates us to propose 3D-IDS, a novel method that aims to tackle the above issues through two-step feature disentanglements and a dynamic graph diffusion scheme. Specifically, we first disentangle traffic features by a non-parameterized optimization based on mutual information, automatically differentiating tens and hundreds of complex features of various attacks. Such differentiated features will be fed into a memory model to generate representations, which are further disentangled to highlight the attack-specific features. Finally, we use a novel graph diffusion method that dynamically fuses the network topology for spatial-temporal aggregation in evolving data streams. By doing so, we can effectively identify various attacks in encrypted traffic, including unknown threats and known ones that are not easily detected. Experiments show the superiority of our 3D-IDS. We also demonstrate that our two-step feature disentanglements benefit the explainability of NIDS.
    摘要 We propose 3D-IDS, which tackles entangled flow-feature distributions through a two-step feature disentanglement and a dynamic graph diffusion scheme. First, we disentangle traffic features using a non-parameterized optimization based on mutual information, which automatically differentiates tens and hundreds of complex features of various attacks. These differentiated features are then fed into a memory model to generate representations, which are further disentangled to highlight attack-specific features. Finally, we use a novel graph diffusion method that dynamically fuses the network topology for spatial-temporal aggregation in evolving data streams. This allows us to effectively identify various attacks in encrypted traffic, including unknown threats and known ones that are not easily detected. Experiments show the superiority of our 3D-IDS. We also demonstrate that our two-step feature disentanglement benefits the explainability of NIDS.
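As a loose illustration of the quantity the first disentanglement step builds on, the sketch below scores statistical dependence between synthetic flow features with mutual information. The toy data and the pairwise-matrix formulation are assumptions; the paper's non-parameterized optimization is considerably more involved.

```python
# Sketch: flag entangled flow features via pairwise mutual information.
import numpy as np
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(0)
flows = rng.normal(size=(1000, 5))                    # 1000 flows, 5 features
flows[:, 1] = 0.8 * flows[:, 0] + 0.2 * flows[:, 1]   # entangle features 0, 1

# High off-diagonal entries mark feature pairs with entangled distributions.
mi = np.column_stack(
    [mutual_info_regression(flows, flows[:, j], random_state=0) for j in range(5)]
)
print(np.round(mi, 2))
```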

WaveMixSR: A Resource-efficient Neural Network for Image Super-resolution

  • paper_url: http://arxiv.org/abs/2307.00430
  • repo_url: https://github.com/pranavphoenix/WaveMixSR
  • paper_authors: Pranav Jeevan, Akella Srinidhi, Pasunuri Prathiba, Amit Sethi
  • For: The paper proposes WaveMixSR, a new neural network for image super-resolution that uses a 2D-discrete wavelet transform for spatial token-mixing.
  • Methods: The network builds on the WaveMix architecture, combining the inductive bias of convolutions with the lossless token-mixing property of the wavelet transform; unlike transformer-based models, it does not unroll the image as a sequence of pixels/patches.
  • Results: WaveMixSR achieves competitive performance on all evaluated datasets and state-of-the-art performance on the BSD100 dataset across multiple super-resolution tasks, while using less training data and fewer computational resources and maintaining high parameter efficiency compared to current state-of-the-art models.
    Abstract Image super-resolution research has recently been dominated by transformer models, which need higher computational resources than CNNs due to the quadratic complexity of self-attention. We propose a new neural network -- WaveMixSR -- for image super-resolution based on the WaveMix architecture, which uses a 2D-discrete wavelet transform for spatial token-mixing. Unlike transformer-based models, WaveMixSR does not unroll the image as a sequence of pixels/patches. It uses the inductive bias of convolutions along with the lossless token-mixing property of the wavelet transform to achieve higher performance while requiring fewer resources and less training data. We compare the performance of our network with other state-of-the-art methods for image super-resolution. Our experiments show that WaveMixSR achieves competitive performance in all datasets and reaches state-of-the-art performance in the BSD100 dataset on multiple super-resolution tasks. Our model is able to achieve this performance using less training data and computational resources while maintaining high parameter efficiency compared to current state-of-the-art models.
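A minimal illustration of the lossless 2D-DWT token-mixing property the abstract refers to, using the PyWavelets library with a Haar wavelet on random data; this is a toy sketch, not the authors' implementation.

```python
# Sketch: one level of 2D-DWT splits an image into four half-resolution
# subbands that a convolution can then mix as channels.
import numpy as np
import pywt

img = np.random.rand(64, 64)
LL, (LH, HL, HH) = pywt.dwt2(img, "haar")      # four 32x32 subbands
tokens = np.stack([LL, LH, HL, HH], axis=0)    # (4, 32, 32): channels to mix
print(tokens.shape)

# The transform is lossless: the image is exactly recoverable, which is the
# property WaveMixSR relies on for token mixing without information loss.
recon = pywt.idwt2((LL, (LH, HL, HH)), "haar")
assert np.allclose(img, recon)
```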

Sparsity-aware generalization theory for deep neural networks

  • paper_url: http://arxiv.org/abs/2307.00426
  • repo_url: None
  • paper_authors: Ramchandran Muthukumar, Jeremias Sulam
  • for: This paper aims to explain the generalization ability of deep artificial neural networks.
  • methods: The paper presents a new approach to analyzing the generalization of deep feed-forward ReLU networks that exploits the degree of sparsity in the hidden-layer activations to reduce the effective model size.
  • results: The paper establishes fundamental trade-offs between sparsity and generalization without making strong assumptions about the degree of sparsity achieved by the model. Numerical computations validate the results and yield non-vacuous bounds in specific settings.
    Abstract Deep artificial neural networks achieve surprising generalization abilities that remain poorly understood. In this paper, we present a new approach to analyzing generalization for deep feed-forward ReLU networks that takes advantage of the degree of sparsity that is achieved in the hidden layer activations. By developing a framework that accounts for this reduced effective model size for each input sample, we are able to show fundamental trade-offs between sparsity and generalization. Importantly, our results make no strong assumptions about the degree of sparsity achieved by the model, and it improves over recent norm-based approaches. We illustrate our results numerically, demonstrating non-vacuous bounds when coupled with data-dependent priors in specific settings, even in over-parametrized models.
    摘要 Deep artificial neural networks achieve surprising generalization abilities that remain poorly understood. In this paper, we present a new approach to analyzing generalization that exploits the degree of sparsity in the hidden-layer activations. We develop a framework that accounts for the resulting reduced effective model size for each input sample, showing fundamental trade-offs between sparsity and generalization. These results make no strong assumptions about the degree of sparsity achieved by the model and improve over recent norm-based approaches. Numerical illustrations demonstrate that, when coupled with data-dependent priors in specific settings, the bounds are non-vacuous even for over-parametrized models.
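Since the bounds hinge on how many hidden units are active per input, the following sketch measures that per-sample activation sparsity for a small ReLU network; the architecture and sizes are arbitrary illustrations.

```python
# Sketch: fraction of active ReLU units per input sample. Fewer active units
# means a smaller effective model for that sample, the quantity the paper's
# generalization analysis exploits.
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 10))
x = torch.randn(8, 20)

hidden = net[1](net[0](x))                  # post-ReLU activations, (8, 64)
active_frac = (hidden > 0).float().mean(dim=1)
print(active_frac)                          # per-sample fraction of active units
```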

Understanding Counterspeech for Online Harm Mitigation

  • paper_url: http://arxiv.org/abs/2307.04761
  • repo_url: None
  • paper_authors: Yi-Ling Chung, Gavin Abercrombie, Florence Enock, Jonathan Bright, Verena Rieser
  • for: This study examines counterspeech against hateful speech to help inform effective hate mitigation strategies.
  • methods: The study systematically reviews and compares counterspeech research from the social sciences with computer science work on automatic counterspeech generation, to identify the most effective counterspeech methods and the optimal conditions for implementation.
  • results: The review finds that effective counterspeech methods include direct rebuttal, presenting opposing views, and providing counter-evidence, and that favorable implementation conditions include intervening on social media platforms, using positive language, and emphasizing solidarity and support.
    Abstract Counterspeech offers direct rebuttals to hateful speech by challenging perpetrators of hate and showing support to targets of abuse. It provides a promising alternative to more contentious measures, such as content moderation and deplatforming, by contributing a greater amount of positive online speech rather than attempting to mitigate harmful content through removal. Advances in the development of large language models mean that the process of producing counterspeech could be made more efficient by automating its generation, which would enable large-scale online campaigns. However, we currently lack a systematic understanding of several important factors relating to the efficacy of counterspeech for hate mitigation, such as which types of counterspeech are most effective, what are the optimal conditions for implementation, and which specific effects of hate it can best ameliorate. This paper aims to fill this gap by systematically reviewing counterspeech research in the social sciences and comparing methodologies and findings with computer science efforts in automatic counterspeech generation. By taking this multi-disciplinary view, we identify promising future directions in both fields.
    摘要 Counterspeech directly rebuts hateful speech, challenging perpetrators of hate and expressing support for targets of abuse. It offers a promising alternative to mitigating harmful content through moderation and removal. With the development of large language models, the generation of counterspeech can be automated and accelerated, making large-scale online campaigns possible. However, we currently lack a systematic understanding of the efficacy of counterspeech for hate mitigation, such as which types are most effective, under what conditions it should be deployed, and which specific effects of hate it can best ameliorate. This paper aims to fill these gaps by systematically comparing social science research on counterspeech with computer science work on automatic counterspeech generation. Through this multi-disciplinary view, we identify promising future directions for both fields.
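For the automation angle, a hedged sketch of the kind of prompt an automatic counterspeech generator might use is shown below; the template is an assumption for illustration, not something taken from the paper.

```python
# Sketch: a prompt template for LLM-generated counterspeech. The instructions
# encode common strategies from the counterspeech literature (non-aggressive
# tone, rebuttal, support for targets); the exact wording is an assumption.
def counterspeech_prompt(post: str) -> str:
    return (
        "Write a short, non-aggressive reply to the post below that "
        "challenges its hateful claim and expresses support for the "
        "targeted group. Do not repeat slurs.\n\n"
        f"Post: {post}\nReply:"
    )

print(counterspeech_prompt("<example hateful post>"))
# Any instruction-tuned LLM could complete this prompt; the review surveys
# which strategies such generations should aim to reproduce.
```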

WavePaint: Resource-efficient Token-mixer for Self-supervised Inpainting

  • paper_url: http://arxiv.org/abs/2307.00407
  • repo_url: https://github.com/pranavphoenix/WavePaint
  • paper_authors: Pranav Jeevan, Dharshan Sampath Kumar, Amit Sethi
  • for: Image inpainting, i.e., reconstructing occluded or degraded image regions, which also serves as a pretext task for self-supervision.
  • methods: A computationally-efficient WaveMix-based fully convolutional architecture, WavePaint, which uses a 2D-discrete wavelet transform (DWT) for spatial and multi-resolution token-mixing along with convolutional layers.
  • results: WavePaint outperforms current state-of-the-art models in reconstruction quality while using less than half the parameter count and considerably lower training and evaluation times; on the CelebA-HQ dataset it even surpasses current GAN-based architectures without using an adversarially trainable discriminator.
    Abstract Image inpainting, which refers to the synthesis of missing regions in an image, can help restore occluded or degraded areas and also serve as a precursor task for self-supervision. The current state-of-the-art models for image inpainting are computationally heavy as they are based on transformer or CNN backbones that are trained in adversarial or diffusion settings. This paper diverges from vision transformers by using a computationally-efficient WaveMix-based fully convolutional architecture -- WavePaint. It uses a 2D-discrete wavelet transform (DWT) for spatial and multi-resolution token-mixing along with convolutional layers. The proposed model outperforms the current state-of-the-art models for image inpainting on reconstruction quality while also using less than half the parameter count and considerably lower training and evaluation times. Our model even outperforms current GAN-based architectures in CelebA-HQ dataset without using an adversarially trainable discriminator. Our work suggests that neural architectures that are modeled after natural image priors require fewer parameters and computations to achieve generalization comparable to transformers.
    摘要 Image inpainting refers to synthesizing the missing regions of an image to restore occluded or degraded areas; it can also serve as a pretext task for self-supervision. Current state-of-the-art models for image inpainting are computationally heavy, being based on transformer or CNN backbones trained in adversarial or diffusion settings. Departing from vision transformers, this paper uses a computationally-efficient WaveMix-based fully convolutional architecture, WavePaint, which employs a 2D-discrete wavelet transform (DWT) for spatial and multi-resolution token-mixing along with convolutional layers. The proposed model surpasses current state-of-the-art models in reconstruction quality while using less than half the parameter count and considerably shorter training and evaluation times. It even outperforms current GAN-based architectures on the CelebA-HQ dataset without requiring an adversarially trainable discriminator. This work suggests that neural architectures modeled after natural-image priors need fewer parameters and computations to achieve generalization comparable to transformers.
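To show the self-supervised setup in miniature, the sketch below corrupts an image with a mask and computes a reconstruction loss on the hole; the masking scheme, loss, and the identity stand-in for the WaveMix network are all assumptions kept only to make the sketch runnable.

```python
# Sketch of a generic masked-reconstruction inpainting objective.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
img = torch.rand(1, 3, 64, 64)              # stand-in training image

mask = torch.ones(1, 1, 64, 64)
mask[:, :, 16:32, 16:32] = 0.0              # occlude a 16x16 patch
corrupted = img * mask

# The real model is a WaveMix-based fully convolutional network; an identity
# module keeps this sketch self-contained.
model = torch.nn.Identity()
pred = model(corrupted)

loss = F.l1_loss(pred * (1 - mask), img * (1 - mask))  # penalize only the hole
print(loss.item())
```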

ProbVLM: Probabilistic Adapter for Frozen Vision-Language Models

  • paper_url: http://arxiv.org/abs/2307.00398
  • repo_url: https://github.com/ExplainableML/ProbVLM
  • paper_authors: Uddeshya Upadhyay, Shyamgopal Karthik, Massimiliano Mancini, Zeynep Akata
  • for: This paper addresses the deterministic-embedding problem in large-scale vision-language models (VLMs), enabling models to better capture the inherent ambiguity in the correspondence between images and text.
  • methods: The paper proposes ProbVLM, a probabilistic adapter that estimates probability distributions for the embeddings of pre-trained VLMs via inter/intra-modal alignment in a post-hoc manner, without requiring large-scale datasets or heavy computation.
  • results: On four challenging datasets (COCO, Flickr, CUB, and Oxford-flowers), the paper estimates the embedding uncertainties of two VLMs (CLIP and BLIP), quantifies the calibration of these uncertainties in retrieval tasks, and shows that ProbVLM outperforms other methods. The paper further demonstrates the usefulness of embedding uncertainty in two real-world downstream tasks, active learning and model selection, and presents a novel technique for visualizing embedding distributions using a large-scale pre-trained latent diffusion model.
    Abstract Large-scale vision-language models (VLMs) like CLIP successfully find correspondences between images and text. Through the standard deterministic mapping process, an image or a text sample is mapped to a single vector in the embedding space. This is problematic: as multiple samples (images or text) can abstract the same concept in the physical world, deterministic embeddings do not reflect the inherent ambiguity in the embedding space. We propose ProbVLM, a probabilistic adapter that estimates probability distributions for the embeddings of pre-trained VLMs via inter/intra-modal alignment in a post-hoc manner without needing large-scale datasets or computing. On four challenging datasets, i.e., COCO, Flickr, CUB, and Oxford-flowers, we estimate the multi-modal embedding uncertainties for two VLMs, i.e., CLIP and BLIP, quantify the calibration of embedding uncertainties in retrieval tasks and show that ProbVLM outperforms other methods. Furthermore, we propose active learning and model selection as two real-world downstream tasks for VLMs and show that the estimated uncertainty aids both tasks. Lastly, we present a novel technique for visualizing the embedding distributions using a large-scale pre-trained latent diffusion model.
    摘要 Large-scale vision-language models (VLMs) such as CLIP successfully find correspondences between images and text. Through the standard deterministic mapping process, an image or text sample is mapped to a single vector in the embedding space. This is problematic: since multiple samples (images or text) can abstract the same concept in the physical world, deterministic embeddings fail to reflect the inherent ambiguity of the embedding space. We propose ProbVLM, a probabilistic adapter that estimates probability distributions for the embeddings of pre-trained VLMs via inter/intra-modal alignment in a post-hoc manner, without large-scale datasets or heavy computation. On four challenging datasets (COCO, Flickr, CUB, and Oxford-flowers), we estimate the multi-modal embedding uncertainties of two VLMs (CLIP and BLIP), quantify the calibration of the embedding uncertainties in retrieval tasks, and show that ProbVLM outperforms other methods. We further propose active learning and model selection as two real-world downstream tasks for VLMs and show that the estimated uncertainty aids both. Finally, we present a novel technique for visualizing embedding distributions using a large-scale pre-trained latent diffusion model.
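A minimal sketch of the adapter idea follows: a small head maps a frozen embedding to the parameters of a distribution rather than a point. The diagonal Gaussian, layer sizes, and uncertainty summary are simplifying assumptions; the paper uses a generalized Gaussian trained with inter/intra-modal alignment objectives.

```python
# Sketch: a probabilistic head over frozen VLM embeddings.
import torch
import torch.nn as nn

class ProbAdapter(nn.Module):
    def __init__(self, dim: int = 512):
        super().__init__()
        self.mu = nn.Linear(dim, dim)        # predicted mean
        self.log_var = nn.Linear(dim, dim)   # predicted log-variance

    def forward(self, z: torch.Tensor):
        return self.mu(z), self.log_var(z).exp()

frozen = torch.randn(4, 512)                 # e.g. frozen CLIP embeddings
mu, var = ProbAdapter()(frozen)
print(var.mean(dim=1))                       # one uncertainty score per sample
```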

CasTGAN: Cascaded Generative Adversarial Network for Realistic Tabular Data Synthesis

  • paper_url: http://arxiv.org/abs/2307.00384
  • repo_url: https://github.com/abedshantti/castgan
  • paper_authors: Abdallah Alshantti, Damiano Varagnolo, Adil Rasheed, Aria Rahmati, Frank Westad
  • for: This paper aims to generate realistic tabular data with a specific focus on validity, addressing the limitations of traditional generative models.
  • methods: The proposed method, CasTGAN, uses a cascaded tabular GAN architecture, where a dedicated generator samples each feature, resulting in more representative synthetic output.
  • results: The experimental results show that CasTGAN well captures the constraints and correlations between features of real data, especially for high-dimensional datasets. Additionally, the model demonstrates robustness against white-box privacy attacks with perturbations applied to the auxiliary learners.
    Abstract Generative adversarial networks (GANs) have drawn considerable attention in recent years for their proven capability in generating synthetic data which can be utilized for multiple purposes. While GANs have demonstrated tremendous successes in producing synthetic data samples that replicate the dynamics of the original datasets, the validity of the synthetic data and the underlying privacy concerns represent major challenges which are not sufficiently addressed. In this work, we design a cascaded tabular GAN framework (CasTGAN) for generating realistic tabular data with a specific focus on the validity of the output. In this context, validity refers to the dependency between features that can be found in the real data, but is typically misrepresented by traditional generative models. Our key idea is that by employing a cascaded architecture in which a dedicated generator samples each feature, the synthetic output becomes more representative of the real data. Our experimental results demonstrate that our model captures well the constraints and the correlations between the features of the real data, especially for high-dimensional datasets. Furthermore, we evaluate the risk of white-box privacy attacks on our model and subsequently show that applying some perturbations to the auxiliary learners in CasTGAN increases the overall robustness of our model against targeted attacks.
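The cascaded-generator idea can be sketched as one small generator per feature, each conditioned on the noise vector plus all previously generated features so that later features can respect dependencies on earlier ones. Layer sizes below are arbitrary, and the auxiliary learners and discriminators of the full model are omitted.

```python
# Sketch: per-feature generators chained so feature i sees features 0..i-1.
import torch
import torch.nn as nn

class CascadedGenerator(nn.Module):
    def __init__(self, n_features: int, noise_dim: int = 16):
        super().__init__()
        self.noise_dim = noise_dim
        self.gens = nn.ModuleList(
            nn.Sequential(nn.Linear(noise_dim + i, 32), nn.ReLU(), nn.Linear(32, 1))
            for i in range(n_features)
        )

    def forward(self, batch_size: int) -> torch.Tensor:
        z = torch.randn(batch_size, self.noise_dim)
        feats = []
        for gen in self.gens:                # sample one feature at a time
            ctx = torch.cat([z] + feats, dim=1)
            feats.append(gen(ctx))
        return torch.cat(feats, dim=1)       # (batch, n_features)

print(CascadedGenerator(n_features=5)(batch_size=3).shape)
```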