cs.AI - 2023-10-28

AI for Open Science: A Multi-Agent Perspective for Ethically Translating Data to Knowledge

  • paper_url: http://arxiv.org/abs/2310.18852
  • repo_url: None
  • paper_authors: Chase Yakaboski, Gregory Hyde, Clement Nyanhongo, Eugene Santos Jr
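  • for: This paper introduces the concept of "AI for Open Science" (AI4OS), which treats openness of scientific discovery as a core principle, with the aim of increasing openness in self-driving scientific labs.
  • methods: The paper uses the principles of Knowledge Discovery and Data Mining (KDD) to formalize a language around AI4OS, and details three key stages of knowledge translation in AI4Science systems along with the specific points at which openness can be applied.
  • results: The paper formulates a theoretical metric for assessing AI4OS and gives a supporting ethical argument for its importance. The authors hope that drawing attention to AI4OS will ensure that self-driving labs benefit not only their developers but society as a whole.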
    Abstract AI for Science (AI4Science), particularly in the form of self-driving labs, has the potential to sideline human involvement and hinder scientific discovery within the broader community. While prior research has focused on ensuring the responsible deployment of AI applications, enhancing security, and ensuring interpretability, we also propose that promoting openness in AI4Science discoveries should be carefully considered. In this paper, we introduce the concept of AI for Open Science (AI4OS) as a multi-agent extension of AI4Science with the core principle of maximizing open knowledge translation throughout the scientific enterprise rather than a single organizational unit. We use the established principles of Knowledge Discovery and Data Mining (KDD) to formalize a language around AI4OS. We then discuss three principle stages of knowledge translation embedded in AI4Science systems and detail specific points where openness can be applied to yield an AI4OS alternative. Lastly, we formulate a theoretical metric to assess AI4OS with a supporting ethical argument highlighting its importance. Our goal is that by drawing attention to AI4OS we can ensure the natural consequence of AI4Science (e.g., self-driving labs) is a benefit not only for its developers but for society as a whole.
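    Summary AI for Science (AI4Science), particularly in the form of self-driving labs, may sideline human involvement and hinder scientific discovery in the broader community. Prior research has focused on responsible deployment of AI applications, security, and interpretability; the authors additionally argue that openness in AI4Science discoveries deserves careful consideration. They introduce AI for Open Science (AI4OS), a multi-agent extension of AI4Science whose core principle is to maximize open knowledge translation across the whole scientific enterprise rather than within a single organizational unit. They formalize a language around AI4OS using established Knowledge Discovery and Data Mining (KDD) principles, discuss three principal stages of knowledge translation in AI4Science systems, and detail the points at which openness can be applied to yield an AI4OS alternative. Finally, they formulate a theoretical metric for assessing AI4OS, supported by an ethical argument for its importance. The goal is to ensure that the natural consequence of AI4Science (e.g., self-driving labs) benefits society as a whole and not only its developers.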

Exploring Data Augmentations on Self-/Semi-/Fully- Supervised Pre-trained Models

  • paper_url: http://arxiv.org/abs/2310.18850
  • repo_url: None
  • paper_authors: Shentong Mo, Zhun Sun, Chao Li
  • for: investigate the effectiveness of data augmentation techniques in vision pre-trained models
  • methods: apply 4 types of data augmentations (Random Erasing, CutOut, CutMix, and MixUp) to self-/semi-/fully-supervised pre-trained models
  • results: observe that masking regions of images decreases invariance but increases diversity, while the MixUp approach improves diversity with only a marginal decrease in invariance.
    Abstract Data augmentation has become a standard component of vision pre-trained models to capture the invariance between augmented views. In practice, augmentation techniques that mask regions of a sample with zero/mean values or patches from other samples are commonly employed in pre-trained models with self-/semi-/fully-supervised contrastive losses. However, the underlying mechanism behind the effectiveness of these augmentation techniques remains poorly explored. To investigate the problems, we conduct an empirical study to quantify how data augmentation affects performance. Concretely, we apply 4 types of data augmentations termed with Random Erasing, CutOut, CutMix and MixUp to a series of self-/semi-/fully- supervised pre-trained models. We report their performance on vision tasks such as image classification, object detection, instance segmentation, and semantic segmentation. We then explicitly evaluate the invariance and diversity of the feature embedding. We observe that: 1) Masking regions of the images decreases the invariance of the learned feature embedding while providing a more considerable diversity. 2) Manual annotations do not change the invariance or diversity of the learned feature embedding. 3) The MixUp approach improves the diversity significantly, with only a marginal decrease in terms of the invariance.
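    Summary Data augmentation has become a standard component of vision pre-trained models for capturing the invariance between augmented views. In practice, augmentations that mask regions of a sample with zero or mean values, or with patches from other samples, are commonly used in models pre-trained with self-/semi-/fully-supervised contrastive losses. However, the mechanism behind their effectiveness remains poorly explored. The authors conduct an empirical study that applies four augmentations (Random Erasing, CutOut, CutMix, and MixUp) to a series of self-/semi-/fully-supervised pre-trained models, and report performance on image classification, object detection, instance segmentation, and semantic segmentation. They then explicitly evaluate the invariance and diversity of the feature embedding and observe that: 1) masking regions of images decreases the invariance of the learned feature embedding while providing considerably more diversity; 2) manual annotations do not change the invariance or diversity of the learned feature embedding; 3) MixUp improves diversity significantly, with only a marginal decrease in invariance.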
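    The mask-based and mix-based augmentations named in this entry can be illustrated with a minimal sketch. It assumes images are plain 2D lists of floats and labels are one-hot lists; the function names and the fixed patch arguments are illustrative choices, not the paper's code. In practice the patch location and the mixing weight are sampled at random, and Random Erasing (not shown) is similar to CutOut with a randomly sized rectangle.

```python
def cutout(image, top, left, size, fill=0.0):
    """Return a copy of a 2D image with a square patch overwritten by `fill`."""
    out = [row[:] for row in image]
    for r in range(top, min(top + size, len(out))):
        for c in range(left, min(left + size, len(out[r]))):
            out[r][c] = fill
    return out


def mixup(image_a, image_b, label_a, label_b, lam):
    """Blend two images and their labels pixel-wise with weight lam in [0, 1]."""
    mixed = [
        [lam * a + (1 - lam) * b for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(image_a, image_b)
    ]
    label = [lam * a + (1 - lam) * b for a, b in zip(label_a, label_b)]
    return mixed, label


def cutmix(image_a, image_b, label_a, label_b, top, left, size):
    """Paste a square patch of image_b into image_a; mix labels by kept area."""
    out = [row[:] for row in image_a]
    pasted = 0
    for r in range(top, min(top + size, len(out))):
        for c in range(left, min(left + size, len(out[r]))):
            out[r][c] = image_b[r][c]
            pasted += 1
    total = sum(len(row) for row in out)
    lam = 1 - pasted / total  # fraction of image_a that survives
    label = [lam * a + (1 - lam) * b for a, b in zip(label_a, label_b)]
    return out, label
```

    CutOut and CutMix overwrite a region outright, whereas MixUp keeps every pixel of both inputs at reduced weight, which is one intuitive way to read the entry's contrast between masking and mixing.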

BanditPAM++: Faster $k$-medoids Clustering

  • paper_url: http://arxiv.org/abs/2310.18844
  • repo_url: https://github.com/thrungroup/banditpam_plusplus_experiments
  • paper_authors: Mo Tiwari, Ryan Kang, Donghyun Lee, Sebastian Thrun, Chris Piech, Ilan Shomorony, Martin Jinye Zhang
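  • for: This paper focuses on making $k$-medoids clustering faster while returning the same clustering solutions as BanditPAM.
  • methods: The paper proposes two algorithmic improvements: reusing clustering information within each iteration, and reusing information across different iterations.
  • results: Experiments show that BanditPAM++ returns the same clustering solutions as BanditPAM but runs substantially faster; on the CIFAR10 dataset, for example, it runs over 10 times faster.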
    Abstract Clustering is a fundamental task in data science with wide-ranging applications. In $k$-medoids clustering, cluster centers must be actual datapoints and arbitrary distance metrics may be used; these features allow for greater interpretability of the cluster centers and the clustering of exotic objects in $k$-medoids clustering, respectively. $k$-medoids clustering has recently grown in popularity due to the discovery of more efficient $k$-medoids algorithms. In particular, recent research has proposed BanditPAM, a randomized $k$-medoids algorithm with state-of-the-art complexity and clustering accuracy. In this paper, we present BanditPAM++, which accelerates BanditPAM via two algorithmic improvements, and is $O(k)$ faster than BanditPAM in complexity and substantially faster than BanditPAM in wall-clock runtime. First, we demonstrate that BanditPAM has a special structure that allows the reuse of clustering information $\textit{within}$ each iteration. Second, we demonstrate that BanditPAM has additional structure that permits the reuse of information $\textit{across}$ different iterations. These observations inspire our proposed algorithm, BanditPAM++, which returns the same clustering solutions as BanditPAM but often several times faster. For example, on the CIFAR10 dataset, BanditPAM++ returns the same results as BanditPAM but runs over 10$\times$ faster. Finally, we provide a high-performance C++ implementation of BanditPAM++, callable from Python and R, that may be of interest to practitioners at https://github.com/motiwari/BanditPAM. Auxiliary code to reproduce all of our experiments via a one-line script is available at https://github.com/ThrunGroup/BanditPAM_plusplus_experiments.
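    Summary Clustering is a fundamental task in data science with wide-ranging applications. In $k$-medoids clustering, cluster centers must be actual datapoints and arbitrary distance metrics may be used, which makes the cluster centers more interpretable and allows exotic objects to be clustered. $k$-medoids clustering has recently grown in popularity thanks to more efficient algorithms, in particular BanditPAM, a randomized $k$-medoids algorithm with state-of-the-art complexity and clustering accuracy. This paper presents BanditPAM++, which accelerates BanditPAM via two algorithmic improvements and is $O(k)$ faster than BanditPAM in complexity and substantially faster in wall-clock runtime. First, the authors show that BanditPAM has a special structure that allows clustering information to be reused within each iteration. Second, they show that it has additional structure that permits information to be reused across iterations. BanditPAM++ returns the same clustering solutions as BanditPAM but often several times faster; on CIFAR10, for example, it runs over 10$\times$ faster. A high-performance C++ implementation callable from Python and R is available at https://github.com/motiwari/BanditPAM, and auxiliary code to reproduce all experiments is available at https://github.com/ThrunGroup/BanditPAM_plusplus_experiments.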
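    For readers unfamiliar with the $k$-medoids objective, the following is a minimal sketch of one exhaustive PAM-style swap step. It is the classical baseline that bandit-based methods aim to speed up, not BanditPAM or BanditPAM++ themselves; the function names and the Manhattan distance are illustrative assumptions.

```python
from itertools import product


def manhattan(a, b):
    """L1 distance; any dissimilarity function could be substituted."""
    return sum(abs(x - y) for x, y in zip(a, b))


def total_loss(points, medoids, dist):
    """k-medoids objective: each point pays the distance to its nearest medoid."""
    return sum(min(dist(p, points[m]) for m in medoids) for p in points)


def pam_swap_step(points, medoids, dist):
    """Try every (medoid, non-medoid) swap and keep the one with the lowest loss.

    `medoids` is a list of indices into `points`. The exhaustive loss
    re-evaluation for every candidate swap is what makes plain PAM expensive.
    """
    best, best_loss = list(medoids), total_loss(points, medoids, dist)
    for slot, candidate in product(range(len(medoids)), range(len(points))):
        if candidate in medoids:
            continue
        trial = list(medoids)
        trial[slot] = candidate
        loss = total_loss(points, trial, dist)
        if loss < best_loss:
            best, best_loss = trial, loss
    return best, best_loss
```

    Because medoids are indices of actual datapoints and `dist` is an arbitrary callable, the sketch reflects the two properties the abstract highlights: interpretable centers and support for arbitrary distance metrics.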

Automating the Correctness Assessment of AI-generated Code for Security Contexts

  • paper_url: http://arxiv.org/abs/2310.18834
  • repo_url: None
  • paper_authors: Domenico Cotroneo, Alessio Foggia, Cristina Improta, Pietro Liguori, Roberto Natella
  • for: This paper aims to evaluate the correctness of AI-generated code for security purposes using a fully automated method.
  • methods: The proposed method, named ACCA, uses symbolic execution to assess whether the AI-generated code behaves as a reference implementation.
  • results: The proposed method outperforms baseline solutions and shows a strong correlation with human evaluation, with an average time of ~0.17s per code snippet, much faster than manual inspection.
    Abstract In this paper, we propose a fully automated method, named ACCA, to evaluate the correctness of AI-generated code for security purposes. The method uses symbolic execution to assess whether the AI-generated code behaves as a reference implementation. We use ACCA to assess four state-of-the-art models trained to generate security-oriented assembly code and compare the results of the evaluation with different baseline solutions, including output similarity metrics, widely used in the field, and the well-known ChatGPT, the AI-powered language model developed by OpenAI. Our experiments show that our method outperforms the baseline solutions and assesses the correctness of the AI-generated code similar to the human-based evaluation, which is considered the ground truth for the assessment in the field. Moreover, ACCA has a very strong correlation with human evaluation (Pearson's correlation coefficient r=0.84 on average). Finally, since it is a fully automated solution that does not require any human intervention, the proposed method performs the assessment of every code snippet in ~0.17s on average, which is definitely lower than the average time required by human analysts to manually inspect the code, based on our experience.
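    Summary This paper proposes ACCA, a fully automated method for evaluating the correctness of AI-generated code for security purposes. The method uses symbolic execution to assess whether the AI-generated code behaves as a reference implementation. The authors use ACCA to assess four state-of-the-art models trained to generate security-oriented assembly code, and compare the results against baselines including the output similarity metrics widely used in the field and ChatGPT, the language model developed by OpenAI. The experiments show that ACCA outperforms the baselines and assesses correctness similarly to human evaluation, which is considered the ground truth in the field; its correlation with human evaluation is very strong (Pearson's correlation coefficient r=0.84 on average). Because it requires no human intervention, ACCA assesses each code snippet in ~0.17s on average, which, in the authors' experience, is well below the time human analysts need to inspect the code manually.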
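    The agreement figure quoted in this entry is a Pearson correlation between automated and human scores. As a minimal sketch of how such a coefficient is computed (the function name is illustrative, and no data from the paper is reproduced here):

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)
```

    A value near 1 means the automated scores rise and fall with the human scores; it does not by itself show that the two agree on absolute values.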

Responsible AI (RAI) Games and Ensembles

  • paper_url: http://arxiv.org/abs/2310.18832
  • repo_url: https://github.com/yashgupta-7/rai-games
  • paper_authors: Yash Gupta, Runtian Zhai, Arun Suggala, Pradeep Ravikumar
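  • for: This study addresses the societal effects of AI, including issues such as fairness, robustness, and safety.
  • methods: The study introduces a general framework, Responsible AI (RAI) games, for studying these problems, and provides two classes of algorithms for solving the games: game-play based algorithms motivated by online learning and game theory, and greedy stagewise estimation algorithms motivated by the classical statistical literature on boosting and regression.
  • results: The study empirically demonstrates the applicability and competitive performance of these techniques on several RAI problems, particularly around subpopulation shift.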
    Abstract Several recent works have studied the societal effects of AI; these include issues such as fairness, robustness, and safety. In many of these objectives, a learner seeks to minimize its worst-case loss over a set of predefined distributions (known as uncertainty sets), with usual examples being perturbed versions of the empirical distribution. In other words, aforementioned problems can be written as min-max problems over these uncertainty sets. In this work, we provide a general framework for studying these problems, which we refer to as Responsible AI (RAI) games. We provide two classes of algorithms for solving these games: (a) game-play based algorithms, and (b) greedy stagewise estimation algorithms. The former class is motivated by online learning and game theory, whereas the latter class is motivated by the classical statistical literature on boosting, and regression. We empirically demonstrate the applicability and competitive performance of our techniques for solving several RAI problems, particularly around subpopulation shift.
    Summary Recent research has focused on the societal impact of AI, including issues such as fairness, robustness, and safety. In many cases, the goal is to minimize the worst-case loss over a set of predefined distributions (known as uncertainty sets), such as perturbed versions of the empirical distribution. These problems can be formulated as min-max problems over the uncertainty sets. In this study, the authors propose a general framework for addressing these issues, which they refer to as Responsible AI (RAI) games. They present two classes of algorithms for solving these games: (a) game-play based algorithms, and (b) greedy stagewise estimation algorithms. The former class is inspired by online learning and game theory, while the latter is based on the classical statistical literature on boosting and regression. They empirically demonstrate the applicability and competitive performance of their techniques for solving several RAI problems, particularly in the context of subpopulation shift.
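    The min-max structure described in this entry can be illustrated with a toy two-player game. The sketch below is an assumption-laden illustration, not the paper's algorithm: the uncertainty set is a set of mixture weights over two groups, the learner predicts a single constant under squared loss, and the adversary uses a standard multiplicative-weights (Hedge) update. All names are hypothetical.

```python
import math


def group_losses(prediction, targets):
    """Squared loss of a constant prediction on each group's target."""
    return [(prediction - t) ** 2 for t in targets]


def learner_best_response(weights, targets):
    """Minimizer of the weighted squared loss: the weighted mean of targets."""
    return sum(w * t for w, t in zip(weights, targets))


def hedge_update(weights, losses, step=1.0):
    """Adversary step: shift probability mass toward groups with high loss."""
    scaled = [w * math.exp(step * l) for w, l in zip(weights, losses)]
    total = sum(scaled)
    return [s / total for s in scaled]


def solve_toy_game(targets, weights, rounds=100):
    """Alternate learner best responses and adversary updates."""
    prediction = learner_best_response(weights, targets)
    for _ in range(rounds):
        weights = hedge_update(weights, group_losses(prediction, targets))
        prediction = learner_best_response(weights, targets)
    return prediction, max(group_losses(prediction, targets))
```

    With targets 0 and 1, the prediction that minimizes the worst group loss is 0.5, and the alternating play above moves toward it even from lopsided initial weights. Convergence of the last iterate is a property of this toy setup and should not be assumed in general.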

All Things Considered: Detecting Partisan Events from News Media with Cross-Article Comparison

  • paper_url: http://arxiv.org/abs/2310.18827
  • repo_url: None
  • paper_authors: Yujian Liu, Xinliang Frederick Zhang, Kaijian Zou, Ruihong Huang, Nick Beauchamp, Lu Wang
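  • for: This study examines how news media shape public opinion through the strategic inclusion or omission of partisan events.
  • methods: The study uses a latent variable-based framework that predicts the ideology of news articles by comparing multiple articles on the same story.
  • results: Experiments show that media can shape opinion through the selective reporting of events, and that this bias is present even among mainstream media with strong norms of objectivity and nonpartisanship.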
    Abstract Public opinion is shaped by the information news media provide, and that information in turn may be shaped by the ideological preferences of media outlets. But while much attention has been devoted to media bias via overt ideological language or topic selection, a more unobtrusive way in which the media shape opinion is via the strategic inclusion or omission of partisan events that may support one side or the other. We develop a latent variable-based framework to predict the ideology of news articles by comparing multiple articles on the same story and identifying partisan events whose inclusion or omission reveals ideology. Our experiments first validate the existence of partisan event selection, and then show that article alignment and cross-document comparison detect partisan events and article ideology better than competitive baselines. Our results reveal the high-level form of media bias, which is present even among mainstream media with strong norms of objectivity and nonpartisanship. Our codebase and dataset are available at https://github.com/launchnlp/ATC.
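    Summary Public opinion is shaped by the information that news media provide, and that information may in turn be shaped by the ideological preferences of media outlets. Much attention has been devoted to media bias expressed through overt ideological language or topic selection, but a less obtrusive way in which media shape opinion is the strategic inclusion or omission of partisan events that may support one side or the other. The authors develop a latent variable-based framework that predicts the ideology of news articles by comparing multiple articles on the same story and identifying partisan events whose inclusion or omission reveals ideology. Their experiments first validate the existence of partisan event selection, and then show that article alignment and cross-document comparison detect partisan events and article ideology better than competitive baselines. The results reveal a high-level form of media bias that is present even among mainstream media with strong norms of objectivity and nonpartisanship. The codebase and dataset are available at https://github.com/launchnlp/ATC.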

A Fuzzy Time Series-Based Model Using Particle Swarm Optimization and Weighted Rules

  • paper_url: http://arxiv.org/abs/2310.18825
  • repo_url: None
  • paper_authors: Daniel Ortiz-Arroyo
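  • for: Improve the accuracy and reliability of high-order fuzzy time series models.
  • methods: Combine particle swarm optimization (PSO) and weighted summation to address the limitations of high-order fuzzy time series models.
  • results: The approach models time series more accurately than previous methods.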
    Abstract During the last decades, a myriad of fuzzy time series models have been proposed in scientific literature. Among the most accurate models found in fuzzy time series, the high-order ones are the most accurate. The research described in this paper tackles three potential limitations associated with the application of high-order fuzzy time series models. To begin with, the adequacy of forecast rules lacks consistency. Secondly, as the model's order increases, data utilization diminishes. Thirdly, the uniformity of forecast rules proves to be highly contingent on the chosen interval partitions. To address these likely drawbacks, we introduce a novel model based on fuzzy time series that amalgamates the principles of particle swarm optimization (PSO) and weighted summation. Our results show that our approach models accurately the time series in comparison with previous methods.
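    Summary Over the last decades, a myriad of fuzzy time series models have been proposed in the scientific literature, with high-order models being the most accurate. This paper tackles three potential limitations of high-order fuzzy time series models: first, the adequacy of forecast rules lacks consistency; second, data utilization diminishes as the model's order increases; third, the uniformity of forecast rules is highly contingent on the chosen interval partitions. To address these drawbacks, the authors introduce a novel fuzzy time series model that combines particle swarm optimization (PSO) and weighted summation. Their results show that the approach models the time series accurately in comparison with previous methods.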
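    Particle swarm optimization, the search procedure this entry relies on, can be sketched generically as follows. This is a textbook-style minimal PSO under assumed hyperparameters (inertia 0.7, acceleration coefficients 1.5); what the paper actually optimizes with PSO is not stated in the abstract, so the objective is left as a caller-supplied function.

```python
import random


def pso_minimize(objective, dim, bounds, n_particles=20, n_iters=200,
                 inertia=0.7, c_personal=1.5, c_global=1.5, seed=0):
    """Minimize `objective` over a box with a basic global-best PSO."""
    rng = random.Random(seed)
    low, high = bounds
    positions = [[rng.uniform(low, high) for _ in range(dim)]
                 for _ in range(n_particles)]
    velocities = [[0.0] * dim for _ in range(n_particles)]
    personal_best = [p[:] for p in positions]
    personal_val = [objective(p) for p in positions]
    g = min(range(n_particles), key=personal_val.__getitem__)
    global_best, global_val = personal_best[g][:], personal_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity blends momentum, pull toward the particle's own
                # best position, and pull toward the swarm's best position.
                velocities[i][d] = (
                    inertia * velocities[i][d]
                    + c_personal * r1 * (personal_best[i][d] - positions[i][d])
                    + c_global * r2 * (global_best[d] - positions[i][d])
                )
                moved = positions[i][d] + velocities[i][d]
                positions[i][d] = min(high, max(low, moved))  # stay in the box
            val = objective(positions[i])
            if val < personal_val[i]:
                personal_best[i], personal_val[i] = positions[i][:], val
                if val < global_val:
                    global_best, global_val = positions[i][:], val
    return global_best, global_val
```

    In a forecasting setting, `objective` would typically be a forecast error evaluated for a candidate parameter vector; that mapping is an assumption here and would need to follow the paper's own definitions.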

Rethinking Semi-Supervised Federated Learning: How to co-train fully-labeled and fully-unlabeled client imaging data

  • paper_url: http://arxiv.org/abs/2310.18815
  • repo_url: None
  • paper_authors: Pramit Saha, Divyanshu Mishra, J. Alison Noble
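  • for: This study addresses the semi-supervised federated learning (SSFL) setting in which a few clients have fully labeled data while the others have fully unlabeled data, which is particularly common in healthcare, where collaborating partners (typically hospitals) may have images but not annotations.
  • methods: The authors propose a new learning scheme, Isolated Federated Learning (IsoFed), which avoids simple averaging of supervised and semi-supervised models together. The training approach has two parts: (a) isolated aggregation of labeled and unlabeled client models, and (b) local self-supervised pretraining of the isolated global models in all clients.
  • results: The method is evaluated on medical image datasets of four different modalities from the MedMNIST benchmark, with the proportion of labeled clients and the degree of heterogeneity varied to demonstrate its effectiveness under different experimental settings.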
    Abstract The most challenging, yet practical, setting of semi-supervised federated learning (SSFL) is where a few clients have fully labeled data whereas the other clients have fully unlabeled data. This is particularly common in healthcare settings where collaborating partners (typically hospitals) may have images but not annotations. The bottleneck in this setting is the joint training of labeled and unlabeled clients as the objective function for each client varies based on the availability of labels. This paper investigates an alternative way for effective training with labeled and unlabeled clients in a federated setting. We propose a novel learning scheme specifically designed for SSFL which we call Isolated Federated Learning (IsoFed) that circumvents the problem by avoiding simple averaging of supervised and semi-supervised models together. In particular, our training approach consists of two parts - (a) isolated aggregation of labeled and unlabeled client models, and (b) local self-supervised pretraining of isolated global models in all clients. We evaluate our model performance on medical image datasets of four different modalities publicly available within the biomedical image classification benchmark MedMNIST. We further vary the proportion of labeled clients and the degree of heterogeneity to demonstrate the effectiveness of the proposed method under varied experimental settings.
    Summary The most challenging, yet practical, setting of semi-supervised federated learning (SSFL) is one where a few clients have fully labeled data and the other clients have fully unlabeled data. This is particularly common in healthcare, where collaborating partners (typically hospitals) may have images but not annotations. The bottleneck in this setting is the joint training of labeled and unlabeled clients, because each client's objective function varies with the availability of labels. This paper investigates an alternative way to train effectively with labeled and unlabeled clients in a federated setting. The authors propose a novel learning scheme designed specifically for SSFL, called Isolated Federated Learning (IsoFed), which circumvents the problem by avoiding simple averaging of supervised and semi-supervised models together. The training approach consists of two parts: (a) isolated aggregation of labeled and unlabeled client models, and (b) local self-supervised pretraining of isolated global models in all clients. The model is evaluated on medical image datasets of four different modalities publicly available within the biomedical image classification benchmark MedMNIST. The proportion of labeled clients and the degree of heterogeneity are further varied to demonstrate the effectiveness of the method under varied experimental settings.
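    The idea of isolated aggregation, step (a) in this entry, can be sketched as follows. This is a minimal illustration that assumes each client model is a flat list of floats and that plain unweighted averaging is used within each group; the function names are hypothetical and the paper's actual aggregation rule may differ.

```python
def average_weights(client_weights):
    """Element-wise mean of equally shaped weight vectors (FedAvg-style)."""
    if not client_weights:
        raise ValueError("cannot average an empty group of clients")
    n = len(client_weights)
    return [sum(values) / n for values in zip(*client_weights)]


def isolated_aggregate(clients):
    """Aggregate labeled and unlabeled clients into two separate global models.

    `clients` is a list of (weights, is_labeled) pairs. The two groups are
    never averaged together, so supervised and unsupervised updates do not
    dilute each other.
    """
    labeled = [w for w, is_labeled in clients if is_labeled]
    unlabeled = [w for w, is_labeled in clients if not is_labeled]
    return average_weights(labeled), average_weights(unlabeled)
```

    Conventional federated averaging would instead pool all clients into one mean, which is the "simple averaging of supervised and semi-supervised models together" that the abstract says IsoFed avoids.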

Hierarchical Framework for Interpretable and Probabilistic Model-Based Safe Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2310.18811
  • repo_url: None
  • paper_authors: Ammar N. Abbas, Georgios C. Chasparis, John D. Kelleher
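  • for: This paper proposes an approach for using deep reinforcement learning in safety-critical systems while providing interpretability of the resulting actions.
  • methods: The approach combines probabilistic modeling with deep reinforcement learning and works in collaboration and synchronization with conventional decision-making strategies. The agent, BC-SRLA, is activated only in situations identified autonomously from the fused information of the probabilistic model and reinforcement learning, and is initialized with a baseline policy using policy cloning.
  • results: In a maintenance case study on turbofan engines, BC-SRLA shows superior performance to the prior art and other baselines.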
    Abstract The difficulty of identifying the physical model of complex systems has led to exploring methods that do not rely on such complex modeling of the systems. Deep reinforcement learning has been the pioneer for solving this problem without the need for relying on the physical model of complex systems by just interacting with it. However, it uses a black-box learning approach that makes it difficult to be applied within real-world and safety-critical systems without providing explanations of the actions derived by the model. Furthermore, an open research question in deep reinforcement learning is how to focus the policy learning of critical decisions within a sparse domain. This paper proposes a novel approach for the use of deep reinforcement learning in safety-critical systems. It combines the advantages of probabilistic modeling and reinforcement learning with the added benefits of interpretability and works in collaboration and synchronization with conventional decision-making strategies. The BC-SRLA is activated in specific situations which are identified autonomously through the fused information of probabilistic model and reinforcement learning, such as abnormal conditions or when the system is near-to-failure. Further, it is initialized with a baseline policy using policy cloning to allow minimum interactions with the environment to address the challenges associated with using RL in safety-critical industries. The effectiveness of the BC-SRLA is demonstrated through a case study in maintenance applied to turbofan engines, where it shows superior performance to the prior art and other baselines.
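    Summary The difficulty of identifying the physical model of complex systems has led to the exploration of methods that do not rely on such modeling. Deep reinforcement learning has pioneered this direction by learning solely through interaction with the system. However, its black-box learning approach makes it difficult to apply in real-world and safety-critical systems without explanations of the actions the model derives. A further open research question is how to focus policy learning on critical decisions within a sparse domain. This paper proposes a novel approach for using deep reinforcement learning in safety-critical systems. It combines the advantages of probabilistic modeling and reinforcement learning with the added benefit of interpretability, and works in collaboration and synchronization with conventional decision-making strategies. The BC-SRLA is activated in specific situations that are identified autonomously through the fused information of the probabilistic model and reinforcement learning, such as abnormal conditions or when the system is near failure. It is initialized with a baseline policy using policy cloning, which keeps interactions with the environment to a minimum and addresses the challenges of using RL in safety-critical industries. The effectiveness of the BC-SRLA is demonstrated through a maintenance case study on turbofan engines, where it shows superior performance to the prior art and other baselines.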

OC-NMN: Object-centric Compositional Neural Module Network for Generative Visual Analogical Reasoning

  • paper_url: http://arxiv.org/abs/2310.18807
  • repo_url: None
  • paper_authors: Rim Assouel, Pau Rodriguez, Perouz Taslakian, David Vazquez, Yoshua Bengio
  • for: This paper aims to improve the ability of machine learning systems to imagine and compose learned concepts in novel ways, specifically in the context of visual reasoning.
  • methods: The paper proposes a modular data augmentation framework called Object-centric Compositional Neural Module Network (OC-NMN), which decomposes visual generative reasoning tasks into a series of primitives applied to objects.
  • results: The paper shows that the proposed modular architectural choices can be used to generate new training tasks that lead to better out-of-distribution generalization, and compares the model to existing and new baselines in a proposed visual reasoning benchmark.
    Abstract A key aspect of human intelligence is the ability to imagine -- composing learned concepts in novel ways -- to make sense of new scenarios. Such capacity is not yet attained for machine learning systems. In this work, in the context of visual reasoning, we show how modularity can be leveraged to derive a compositional data augmentation framework inspired by imagination. Our method, denoted Object-centric Compositional Neural Module Network (OC-NMN), decomposes visual generative reasoning tasks into a series of primitives applied to objects without using a domain-specific language. We show that our modular architectural choices can be used to generate new training tasks that lead to better out-of-distribution generalization. We compare our model to existing and new baselines in proposed visual reasoning benchmark that consists of applying arithmetic operations to MNIST digits.
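    Summary A key aspect of human intelligence is the ability to imagine, that is, to compose learned concepts in novel ways in order to make sense of new scenarios. Machine learning systems have not yet attained this capacity. In the context of visual reasoning, this work shows how modularity can be leveraged to derive a compositional data augmentation framework inspired by imagination. The method, Object-centric Compositional Neural Module Network (OC-NMN), decomposes visual generative reasoning tasks into a series of primitives applied to objects, without using a domain-specific language. The authors show that their modular architectural choices can be used to generate new training tasks that lead to better out-of-distribution generalization. They compare the model to existing and new baselines on a proposed visual reasoning benchmark that consists of applying arithmetic operations to MNIST digits.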

Open Visual Knowledge Extraction via Relation-Oriented Multimodality Model Prompting

  • paper_url: http://arxiv.org/abs/2310.18804
  • repo_url: None
  • paper_authors: Hejie Cui, Xinyu Fang, Zihan Zhang, Ran Xu, Xuan Kan, Xin Liu, Yue Yu, Manling Li, Yangqiu Song, Carl Yang
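  • for: This paper explores a new paradigm of open visual knowledge extraction, with the aim of improving machines' ability to understand the world.
  • methods: The method, OpenVik, uses an open relational region detector and a large multimodality model to extract format-free visual knowledge from images.
  • results: Experiments show that OpenVik generates open visual knowledge that is correct and unique, and that integrating it into various visual reasoning applications yields consistent improvements.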
    Abstract Images contain rich relational knowledge that can help machines understand the world. Existing methods on visual knowledge extraction often rely on the pre-defined format (e.g., sub-verb-obj tuples) or vocabulary (e.g., relation types), restricting the expressiveness of the extracted knowledge. In this work, we take a first exploration to a new paradigm of open visual knowledge extraction. To achieve this, we present OpenVik which consists of an open relational region detector to detect regions potentially containing relational knowledge and a visual knowledge generator that generates format-free knowledge by prompting the large multimodality model with the detected region of interest. We also explore two data enhancement techniques for diversifying the generated format-free visual knowledge. Extensive knowledge quality evaluations highlight the correctness and uniqueness of the extracted open visual knowledge by OpenVik. Moreover, integrating our extracted knowledge across various visual reasoning applications shows consistent improvements, indicating the real-world applicability of OpenVik.
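    Summary Images contain rich relational knowledge that can help machines understand the world. Existing visual knowledge extraction methods often rely on a pre-defined format (e.g., sub-verb-obj tuples) or vocabulary (e.g., relation types), which restricts the expressiveness of the extracted knowledge. This work takes a first step toward a new paradigm of open visual knowledge extraction. The authors present OpenVik, which consists of an open relational region detector that detects regions potentially containing relational knowledge, and a visual knowledge generator that produces format-free knowledge by prompting a large multimodality model with the detected region of interest. They also explore two data enhancement techniques for diversifying the generated format-free visual knowledge. Extensive knowledge quality evaluations highlight the correctness and uniqueness of the open visual knowledge extracted by OpenVik. Moreover, integrating the extracted knowledge into various visual reasoning applications shows consistent improvements, indicating the real-world applicability of OpenVik.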

Sequence-Level Certainty Reduces Hallucination In Knowledge-Grounded Dialogue Generation

  • paper_url: http://arxiv.org/abs/2310.18794
  • repo_url: None
  • paper_authors: Yixin Wan, Fanyou Wu, Weijie Xu, Srinivasan H. Sengamedu
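  • for: This study examines sequence-level certainty as a common theme over model hallucination in Natural Language Generation (NLG), and proposes certainty-based response ranking to reduce hallucination.
  • methods: The study divides sequence-level certainty into two aspects, probabilistic certainty and semantic certainty, and measures their correlation with the level of hallucination in model responses through experiments on the Knowledge-Grounded Dialogue Generation (KGDG) task.
  • results: Higher levels of both probabilistic and semantic certainty in model responses are significantly correlated with lower levels of hallucination. The paper also provides theoretical proof and analysis that semantic certainty is a good estimator of probabilistic certainty, making it a potential alternative in black-box scenarios. Building on these findings, it proposes Certainty-based Response Ranking (CRR), a decoding-time method for mitigating hallucination in NLG, in two variants: Probabilistic CRR (P-CRR), which ranks individually sampled responses by the arithmetic mean log-probability of the entire sequence, and Semantic CRR (S-CRR), which ranks response candidates by semantic certainty estimated with an entailment-based Agreement Score (AS). Extensive experiments across 3 KGDG datasets, 3 decoding methods, and 4 models validate the effectiveness of both CRR methods.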
    Abstract Model hallucination has been a crucial interest of research in Natural Language Generation (NLG). In this work, we propose sequence-level certainty as a common theme over hallucination in NLG, and explore the correlation between sequence-level certainty and the level of hallucination in model responses. We categorize sequence-level certainty into two aspects: probabilistic certainty and semantic certainty, and reveal through experiments on Knowledge-Grounded Dialogue Generation (KGDG) task that both a higher level of probabilistic certainty and a higher level of semantic certainty in model responses are significantly correlated with a lower level of hallucination. What's more, we provide theoretical proof and analysis to show that semantic certainty is a good estimator of probabilistic certainty, and therefore has the potential as an alternative to probability-based certainty estimation in black-box scenarios. Based on the observation on the relationship between certainty and hallucination, we further propose Certainty-based Response Ranking (CRR), a decoding-time method for mitigating hallucination in NLG. Based on our categorization of sequence-level certainty, we propose 2 types of CRR approach: Probabilistic CRR (P-CRR) and Semantic CRR (S-CRR). P-CRR ranks individually sampled model responses using their arithmetic mean log-probability of the entire sequence. S-CRR approaches certainty estimation from meaning-space, and ranks a number of model response candidates based on their semantic certainty level, which is estimated by the entailment-based Agreement Score (AS). Through extensive experiments across 3 KGDG datasets, 3 decoding methods, and on 4 different models, we validate the effectiveness of our 2 proposed CRR methods to reduce model hallucination.
    Summary Model hallucination has been a crucial research interest in Natural Language Generation (NLG). This work proposes sequence-level certainty as a common theme over hallucination in NLG, and explores the correlation between sequence-level certainty and the level of hallucination in model responses. The authors divide sequence-level certainty into two aspects, probabilistic certainty and semantic certainty, and show through experiments on the Knowledge-Grounded Dialogue Generation (KGDG) task that higher levels of both are significantly correlated with a lower level of hallucination. They also provide theoretical proof and analysis showing that semantic certainty is a good estimator of probabilistic certainty, and therefore a potential alternative to probability-based certainty estimation in black-box scenarios. Based on the observed relationship between certainty and hallucination, they propose Certainty-based Response Ranking (CRR), a decoding-time method for mitigating hallucination in NLG, with two variants: Probabilistic CRR (P-CRR) and Semantic CRR (S-CRR). P-CRR ranks individually sampled model responses by the arithmetic mean log-probability of the entire sequence. S-CRR approaches certainty estimation from meaning-space and ranks a number of response candidates by their semantic certainty level, which is estimated with the entailment-based Agreement Score (AS). Extensive experiments across three KGDG datasets, three decoding methods, and four different models validate the effectiveness of the two CRR methods in reducing model hallucination.
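    The two ranking variants described in this entry can be sketched as follows. The P-CRR part follows the abstract's description directly (mean token log-probability). The S-CRR part is only a structural sketch: `agree` is a hypothetical caller-supplied scorer standing in for an entailment model, and averaging agreement with the other candidates is an assumption, not the paper's exact Agreement Score.

```python
def mean_log_prob(token_log_probs):
    """Arithmetic mean of per-token log-probabilities for one response."""
    return sum(token_log_probs) / len(token_log_probs)


def p_crr_select(candidates):
    """P-CRR sketch: pick the response with the highest mean log-probability.

    `candidates` is a list of (text, token_log_probs) pairs.
    """
    return max(candidates, key=lambda c: mean_log_prob(c[1]))[0]


def s_crr_select(texts, agree):
    """S-CRR sketch: pick the response that agrees most with the others.

    `agree(a, b)` returns a score in [0, 1]; in the paper this role is played
    by an entailment-based score, here it is a placeholder argument.
    """
    def certainty(i):
        others = [agree(texts[i], t) for j, t in enumerate(texts) if j != i]
        return sum(others) / len(others)

    return texts[max(range(len(texts)), key=certainty)]


def word_overlap(a, b):
    """Toy agreement scorer (Jaccard overlap of words), for demonstration only."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    return len(set_a & set_b) / len(set_a | set_b)
```

    Taking the mean rather than the sum of log-probabilities avoids systematically favoring short responses. S-CRR needs only sampled texts, which is why the abstract positions it for black-box settings where token probabilities are unavailable.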

“Do it my way!”: Impact of Customizations on Trust perceptions in Human-Robot Collaboration

  • paper_url: http://arxiv.org/abs/2310.18791
  • repo_url: None
  • paper_authors: Parv Kapoor, Simon Chu, Angela Chen
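  • for: This study investigates how customization of assistive robots affects users' experience and trust perceptions.
  • methods: The study uses a within-subjects user study (N=17) that compares different levels of customization possibilities over baseline autonomous robot behavior.
  • results: Increased levels of customization were associated with higher trust and comfort perceptions. These findings can inform the design of trustworthy and customized assistive robots.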
    Abstract Trust has been shown to be a key factor in effective human-robot collaboration. In the context of assistive robotics, the effect of trust factors on human experience is further pronounced. Personalization of assistive robots is an orthogonal factor positively correlated with robot adoption and user perceptions. In this work, we investigate the relationship between these factors through a within-subjects study (N=17). We provide different levels of customization possibilities over baseline autonomous robot behavior and investigate its impact on trust. Our findings indicate that increased levels of customization was associated with higher trust and comfort perceptions. The assistive robot design process can benefit significantly from our insights for designing trustworthy and customized robots.
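    Summary Trust has been shown to be a key factor in human-robot collaboration. In assistive robotics, the effect of trust factors on human experience is even more pronounced. Personalization of assistive robots is an orthogonal factor that is positively correlated with robot adoption and user perceptions. Through a within-subjects study (N=17), we investigate how different levels of customization over baseline autonomous robot behavior affect trust. We find that increased levels of customization are associated with higher trust and comfort perceptions. These findings can inform the design of trustworthy and customized assistive robots.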

Laughing Hyena Distillery: Extracting Compact Recurrences From Convolutions

  • paper_url: http://arxiv.org/abs/2310.18780
  • repo_url: None
  • paper_authors: Stefano Massaroli, Michael Poli, Daniel Y. Fu, Hermann Kumbong, Rom N. Parnichkun, Aman Timalsina, David W. Romero, Quinn McIntyre, Beidi Chen, Atri Rudra, Ce Zhang, Christopher Re, Stefano Ermon, Yoshua Bengio
  • for: Reducing memory footprint and increasing throughput during generation.
  • methods: Rational interpolation and model-order reduction techniques, together with weight-tying the filters across channels into heads.
  • results: 10x higher throughput than Transformers and 1.5x higher than Hyena at 1.3B parameters, without any loss in quality after distillation.
    Abstract Recent advances in attention-free sequence models rely on convolutions as alternatives to the attention operator at the core of Transformers. In particular, long convolution sequence models have achieved state-of-the-art performance in many domains, but incur a significant cost during auto-regressive inference workloads -- naively requiring a full pass (or caching of activations) over the input sequence for each generated token -- similarly to attention-based models. In this paper, we seek to enable $\mathcal O(1)$ compute and memory cost per token in any pre-trained long convolution architecture to reduce memory footprint and increase throughput during generation. Concretely, our methods consist in extracting low-dimensional linear state-space models from each convolution layer, building upon rational interpolation and model-order reduction techniques. We further introduce architectural improvements to convolution-based layers such as Hyena: by weight-tying the filters across channels into heads, we achieve higher pre-training quality and reduce the number of filters to be distilled. The resulting model achieves 10x higher throughput than Transformers and 1.5x higher than Hyena at 1.3B parameters, without any loss in quality after distillation.
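    Summary Recent advances in attention-free sequence models use convolutions as an alternative to the attention operator at the core of Transformers. Long convolution sequence models in particular have achieved state-of-the-art performance in many domains, but they incur a significant cost during auto-regressive inference: naively, each generated token requires a full pass over the input sequence (or caching of activations), much as in attention-based models. In this paper, we seek $\mathcal O(1)$ compute and memory cost per token for any pre-trained long convolution architecture, to reduce memory footprint and increase throughput during generation. Concretely, our method extracts a low-dimensional linear state-space model from each convolution layer, building on rational interpolation and model-order reduction techniques. We also introduce architectural improvements to convolution-based layers such as Hyena: weight-tying the filters across channels into heads yields higher pre-training quality and reduces the number of filters to be distilled. The resulting model achieves 10x higher throughput than Transformers and 1.5x higher than Hyena at 1.3B parameters, without any loss in quality after distillation.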
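To see why a state-space form gives constant cost per generated token, the sketch below compares a causal convolution with the equivalent linear recurrence. It is an illustration under simplifying assumptions, not the paper's distillation procedure: the state matrix is diagonal, the parameters `a`, `b`, `c` are hypothetical, and the sketch starts from a known state-space model rather than fitting one to a pre-trained filter.

```python
def ssm_kernel(a, b, c, length):
    """Convolution kernel h_k = sum_i c_i * a_i**k * b_i of a diagonal SSM."""
    return [
        sum(ci * (ai ** k) * bi for ai, bi, ci in zip(a, b, c))
        for k in range(length)
    ]


def causal_conv(kernel, u):
    """y_t = sum_{k<=t} h_k * u_{t-k}; work per token grows with t."""
    return [
        sum(kernel[k] * u[t - k] for k in range(t + 1))
        for t in range(len(u))
    ]


def recurrent_generate(a, b, c, u):
    """Same outputs via x_t = a*x_{t-1} + b*u_t, y_t = c.x_t; fixed work per token."""
    state = [0.0] * len(a)
    outputs = []
    for u_t in u:
        state = [ai * xi + bi * u_t for ai, xi, bi in zip(a, state, b)]
        outputs.append(sum(ci * xi for ci, xi in zip(c, state)))
    return outputs


# Hypothetical two-dimensional state and a short input sequence.
a, b, c = [0.9, 0.5], [1.0, 0.3], [0.7, -0.2]
u = [1.0, 0.0, 2.0, -1.0, 0.5]
y_conv = causal_conv(ssm_kernel(a, b, c, len(u)), u)
y_rec = recurrent_generate(a, b, c, u)
```

The recurrence only keeps the current state in memory, so its per-token cost depends on the state dimension and not on how many tokens have been generated so far.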

Improving Compositional Generalization Using Iterated Learning and Simplicial Embeddings

  • paper_url: http://arxiv.org/abs/2310.18777
  • repo_url: None
  • paper_authors: Yi Ren, Samuel Lavoie, Mikhail Galkin, Danica J. Sutherland, Aaron Courville
  • for: The paper aims to improve the compositional generalization of deep neural networks, which is the ability to generalize to unseen combinations of latent factors.
  • methods: The paper proposes using iterated learning on models with simplicial embeddings to improve compositional generalization. This approach is motivated by an analysis of compositionality based on Kolmogorov complexity.
  • results: The paper demonstrates improvements in compositional generalization over other approaches, using both vision tasks with well-understood latent factors and real molecular graph prediction tasks where the latent structure is unknown.
    Abstract Compositional generalization, the ability of an agent to generalize to unseen combinations of latent factors, is easy for humans but hard for deep neural networks. A line of research in cognitive science has hypothesized a process, ``iterated learning,'' to help explain how human language developed this ability; the theory rests on simultaneous pressures towards compressibility (when an ignorant agent learns from an informed one) and expressivity (when it uses the representation for downstream tasks). Inspired by this process, we propose to improve the compositional generalization of deep networks by using iterated learning on models with simplicial embeddings, which can approximately discretize representations. This approach is further motivated by an analysis of compositionality based on Kolmogorov complexity. We show that this combination of changes improves compositional generalization over other approaches, demonstrating these improvements both on vision tasks with well-understood latent factors and on real molecular graph prediction tasks where the latent structure is unknown.
    Summary Compositional generalization, the ability to generalize to unseen combinations of latent factors, is easy for humans but hard for deep neural networks. A line of research in cognitive science has hypothesized a process, "iterated learning," to help explain how human language developed this ability; the theory rests on simultaneous pressures towards compressibility (when an ignorant agent learns from an informed one) and expressivity (when it uses the representation for downstream tasks). Inspired by this process, we propose to improve the compositional generalization of deep networks by using iterated learning on models with simplicial embeddings, which can approximately discretize representations. This approach is further motivated by an analysis of compositionality based on Kolmogorov complexity. We show that this combination of changes improves compositional generalization over other approaches, demonstrating these improvements both on vision tasks with well-understood latent factors and on real molecular graph prediction tasks where the latent structure is unknown.
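One way to picture how an embedding can "approximately discretize" a representation is a group-wise softmax: the vector is split into groups and each group is pushed onto a probability simplex. The sketch below assumes this form of simplicial embedding; the function names, the group count, and the temperature values are illustrative assumptions, not details taken from the paper.

```python
import math


def softmax(values, temperature=1.0):
    """Numerically stable softmax with a temperature."""
    scaled = [v / temperature for v in values]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def simplicial_embedding(z, n_groups, temperature=1.0):
    """Split z into n_groups equal chunks and apply a softmax to each chunk."""
    if len(z) % n_groups != 0:
        raise ValueError("len(z) must be divisible by n_groups")
    size = len(z) // n_groups
    out = []
    for g in range(n_groups):
        out.extend(softmax(z[g * size:(g + 1) * size], temperature))
    return out


# Hypothetical 6-dimensional representation split into 2 groups of 3.
z = [2.0, 0.0, -1.0, 0.5, 0.5, 3.0]
soft = simplicial_embedding(z, n_groups=2, temperature=1.0)
sharp = simplicial_embedding(z, n_groups=2, temperature=0.1)
```

Lowering the temperature moves each group closer to a one-hot vector, which is the sense in which the representation becomes approximately discrete.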

Linear Mode Connectivity in Sparse Neural Networks

  • paper_url: http://arxiv.org/abs/2310.18769
  • repo_url: None
  • paper_authors: Luke McDermott, Daniel Cummings
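  • for: The paper studies neural network pruning with synthetic data and the training properties of the resulting sparse networks on real data.
  • methods: Iterative Magnitude Pruning (IMP) paired with distilled data, a synthetic summarization of the real data; the analysis uses linear interpolation, loss landscape visualizations, and the diagonal of the Hessian.
  • results: Distilled data with IMP yields a class of sparse networks that are more stable to SGD noise on the real data than either the dense model or subnetworks found by IMP on real data, and that match the performance of traditional IMP with up to 150x fewer training points in settings where distilled data applies.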
    Abstract With the rise in interest of sparse neural networks, we study how neural network pruning with synthetic data leads to sparse networks with unique training properties. We find that distilled data, a synthetic summarization of the real data, paired with Iterative Magnitude Pruning (IMP) unveils a new class of sparse networks that are more stable to SGD noise on the real data, than either the dense model, or subnetworks found with real data in IMP. That is, synthetically chosen subnetworks often train to the same minima, or exhibit linear mode connectivity. We study this through linear interpolation, loss landscape visualizations, and measuring the diagonal of the hessian. While dataset distillation as a field is still young, we find that these properties lead to synthetic subnetworks matching the performance of traditional IMP with up to 150x less training points in settings where distilled data applies.
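    Summary With growing interest in sparse neural networks, we study how pruning with synthetic data leads to sparse networks with unique training properties. We find that distilled data, a synthetic summarization of the real data, paired with Iterative Magnitude Pruning (IMP) reveals a new class of sparse networks that are more stable to SGD noise on the real data than either the dense model or the subnetworks found by IMP on real data. That is, synthetically chosen subnetworks often train to the same minima, or exhibit linear mode connectivity. We study this through linear interpolation, loss landscape visualizations, and measurements of the diagonal of the Hessian. Although dataset distillation is still a young field, we find that these properties let synthetic subnetworks match the performance of traditional IMP with up to 150x fewer training points in settings where distilled data applies.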
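Linear mode connectivity is usually checked by interpolating between two trained weight vectors and looking for a bump in the loss along the path. The sketch below is a toy illustration, not the paper's experimental code: the two loss functions are hypothetical stand-ins for a network's loss, and the barrier definition (loss on the path minus the linear interpolation of the endpoint losses) is one common convention assumed here.

```python
def interpolate(theta_a, theta_b, alpha):
    """Point on the straight line between two weight vectors."""
    return [(1 - alpha) * a + alpha * b for a, b in zip(theta_a, theta_b)]


def loss_barrier(loss, theta_a, theta_b, steps=21):
    """Largest excess of the path loss over the interpolated endpoint losses."""
    loss_a, loss_b = loss(theta_a), loss(theta_b)
    barrier = 0.0
    for i in range(steps):
        alpha = i / (steps - 1)
        on_path = loss(interpolate(theta_a, theta_b, alpha))
        baseline = (1 - alpha) * loss_a + alpha * loss_b
        barrier = max(barrier, on_path - baseline)
    return barrier


def double_well(theta):
    """Toy non-convex loss with two separate minima at (-1, 0) and (1, 0)."""
    return (theta[0] ** 2 - 1) ** 2 + theta[1] ** 2


def bowl(theta):
    """Toy convex loss: any two points are linearly connected."""
    return theta[0] ** 2 + theta[1] ** 2


barrier_separated = loss_barrier(double_well, [-1.0, 0.0], [1.0, 0.0])
barrier_connected = loss_barrier(bowl, [1.0, 0.0], [0.0, 1.0])
```

A barrier near zero means the two solutions sit in the same linearly connected region; a large barrier means the straight path between them crosses a region of high loss.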

Reboost Large Language Model-based Text-to-SQL, Text-to-Python, and Text-to-Function – with Real Applications in Traffic Domain

  • paper_url: http://arxiv.org/abs/2310.18752
  • repo_url: None
  • paper_authors: Guanghu Sui, Zhishuai Li, Ziyue Li, Sun Yang, Jingqing Ruan, Hangyu Mao, Rui Zhao
  • for: Improving Text-to-SQL execution accuracy.
  • methods: An improved prompting method built mainly on query rewriting and SQL boosting, with column comments, value types, and value samples included in the database description.
  • results: On the business dataset, execution accuracy rises from 21.05 for the previous SOTA method to 65.79 for the proposed approach, even when using a less capable pre-trained language model. The paper also explores Text-to-Python and Text-to-Function options and analyzes their pros and cons.
    Abstract The previous state-of-the-art (SOTA) method achieved a remarkable execution accuracy on the Spider dataset, which is one of the largest and most diverse datasets in the Text-to-SQL domain. However, during our reproduction of the business dataset, we observed a significant drop in performance. We examined the differences in dataset complexity, as well as the clarity of questions' intentions, and assessed how those differences could impact the performance of prompting methods. Subsequently, We develop a more adaptable and more general prompting method, involving mainly query rewriting and SQL boosting, which respectively transform vague information into exact and precise information and enhance the SQL itself by incorporating execution feedback and the query results from the database content. In order to prevent information gaps, we include the comments, value types, and value samples for columns as part of the database description in the prompt. Our experiments with Large Language Models (LLMs) illustrate the significant performance improvement on the business dataset and prove the substantial potential of our method. In terms of execution accuracy on the business dataset, the SOTA method scored 21.05, while our approach scored 65.79. As a result, our approach achieved a notable performance improvement even when using a less capable pre-trained language model. Last but not least, we also explore the Text-to-Python and Text-to-Function options, and we deeply analyze the pros and cons among them, offering valuable insights to the community.
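    Summary The previous state-of-the-art (SOTA) method achieved remarkable execution accuracy on the Spider dataset, one of the largest and most diverse datasets in the Text-to-SQL domain. However, when reproducing it on a business dataset, we observed a significant drop in performance. We examined the differences in dataset complexity and in the clarity of the questions' intentions, and assessed how those differences affect the performance of prompting methods. We then developed a more adaptable and more general prompting method, mainly involving query rewriting and SQL boosting: the former turns vague information into exact and precise information, and the latter enhances the SQL itself by incorporating execution feedback and query results from the database content. To prevent information gaps, we include the comments, value types, and value samples for columns as part of the database description in the prompt. Our experiments with Large Language Models (LLMs) show a significant performance improvement on the business dataset and demonstrate the substantial potential of the method. In terms of execution accuracy on the business dataset, the SOTA method scored 21.05, while our approach scored 65.79, a notable improvement even when using a less capable pre-trained language model. Finally, we explore the Text-to-Python and Text-to-Function options and analyze their pros and cons in depth.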
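The idea of putting column comments, value types, and value samples into the database description can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration, not the paper's prompt format: the table `road_segment`, the `comments` dictionary, the helper `describe_table`, and the prompt wording are all hypothetical. SQLite does not store column comments, so they are passed in separately here.

```python
import sqlite3


def describe_table(conn, table, comments, n_samples=3):
    """Build a text description with type, comment, and sample values per column.

    Table and column names are interpolated into SQL, so they must come from a
    trusted schema, never from user input.
    """
    lines = [f"Table {table}:"]
    columns = conn.execute(f"PRAGMA table_info({table})").fetchall()
    for _cid, name, col_type, *_rest in columns:
        rows = conn.execute(
            f"SELECT DISTINCT {name} FROM {table} "
            f"WHERE {name} IS NOT NULL LIMIT {n_samples}"
        ).fetchall()
        samples = ", ".join(repr(row[0]) for row in rows)
        comment = comments.get(name, "no comment")
        lines.append(f"  - {name} ({col_type}): {comment}; e.g. {samples}")
    return "\n".join(lines)


# Hypothetical traffic-domain table used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE road_segment "
    "(segment_id INTEGER, road_name TEXT, speed_limit INTEGER)"
)
conn.executemany(
    "INSERT INTO road_segment VALUES (?, ?, ?)",
    [(1, "Ring Road", 60), (2, "Main Street", 40)],
)
comments = {"speed_limit": "posted limit in km/h"}
description = describe_table(conn, "road_segment", comments)
prompt = (
    f"{description}\n\n"
    "Question: Which segments have a speed limit above 50?\nSQL:"
)
```

Sample values give the model a concrete hint about how a column is encoded (units, casing, identifiers), which is the kind of information gap the description is meant to close.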

On Training Implicit Meta-Learning With Applications to Inductive Weighing in Consistency Regularization

  • paper_url: http://arxiv.org/abs/2310.18741
  • repo_url: None
  • paper_authors: Fady Rezk
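  • for: The paper compares Hessian approximation methods for implicit meta-learning (IML) in terms of compute cost, stability, generalization of the solution found, and estimation accuracy.
  • methods: A systematic comparative analysis of several Hessian and inverse Hessian-vector product approximation methods when incorporated into IML training routines.
  • results: The study identifies situations where IML exhibits catastrophic forgetting, explains them by the approximations' inability to estimate curvature at convergence points, and demonstrates and remedies sources of training instability. It also proposes a semi-supervised learning algorithm that learns to inductively weigh consistency regularization losses through a "Confidence Network"; results outperform the baseline FixMatch performance.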
    Abstract Meta-learning that uses implicit gradients has provided an exciting alternative to standard techniques which depend on the trajectory of the inner loop training. Implicit meta-learning (IML), however, requires computing $2^{nd}$ order gradients, particularly the Hessian, which is impractical to compute for modern deep learning models. Various approximations for the Hessian were proposed, but a systematic comparison of their compute cost, stability, generalization of the solution found, and estimation accuracy was largely overlooked. In this study, we start by conducting a systematic comparative analysis of the various approximation methods and their effect when incorporated into IML training routines. We establish situations where catastrophic forgetting is exhibited in IML and explain their cause in terms of the inability of the approximations to estimate the curvature at convergence points. Sources of IML training instability are demonstrated and remedied. A detailed analysis of the efficiency of various inverse Hessian-vector product approximation methods is also provided. Subsequently, we use the insights gained to propose and evaluate a novel semi-supervised learning algorithm that learns to inductively weigh consistency regularization losses. We show how training a "Confidence Network" to extract domain specific features can learn to up-weigh useful images and down-weigh out-of-distribution samples. Results outperform the baseline FixMatch performance.
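    Summary Meta-learning with implicit gradients offers an interesting alternative to standard techniques that depend on the trajectory of inner-loop training. However, implicit meta-learning (IML) requires second-order gradients, in particular the Hessian, which is impractical to compute for modern deep learning models. Various approximations of the Hessian have been proposed, but a systematic comparison of their compute cost, stability, generalization of the solution found, and estimation accuracy has been largely overlooked. In this study, we begin with a systematic comparative analysis of the approximation methods and their effect when incorporated into IML training routines. We establish situations where IML exhibits catastrophic forgetting and explain the cause as the approximations' inability to estimate the curvature at convergence points. We also demonstrate and remedy sources of IML training instability, and provide a detailed analysis of the efficiency of various inverse Hessian-vector product approximation methods. We then use these insights to propose and evaluate a new semi-supervised learning algorithm that learns to inductively weigh consistency regularization losses. We show that training a "Confidence Network" to extract domain-specific features can learn to up-weigh useful images and down-weigh out-of-distribution samples. Results outperform the baseline FixMatch performance.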
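One widely used way to avoid forming or inverting the Hessian is a truncated Neumann series, which needs only Hessian-vector products: $H^{-1}v \approx \alpha \sum_{j=0}^{K} (I - \alpha H)^j v$. The sketch below illustrates that idea on a small explicit matrix. It is not claimed to be the exact variant studied in the paper; the matrix `H`, the step size `alpha`, and the function names are assumptions for illustration, and the series only converges when the eigenvalues of `alpha * H` lie strictly between 0 and 2.

```python
def neumann_inverse_hvp(hvp, v, alpha, n_terms):
    """Approximate H^{-1} v using only Hessian-vector products.

    Accumulates alpha * sum_{j=0}^{n_terms} (I - alpha*H)^j v.
    """
    term = list(v)    # current (I - alpha*H)^j v, starting at j = 0
    total = list(v)   # running sum of the series
    for _ in range(n_terms):
        h_term = hvp(term)
        term = [t - alpha * h for t, h in zip(term, h_term)]
        total = [s + t for s, t in zip(total, term)]
    return [alpha * s for s in total]


# Hypothetical 2x2 symmetric positive-definite Hessian, for illustration only.
H = [[2.0, 1.0], [1.0, 3.0]]


def hvp(vec):
    """Hessian-vector product; a real model would get this from autodiff."""
    return [sum(h_ij * v_j for h_ij, v_j in zip(row, vec)) for row in H]


approx = neumann_inverse_hvp(hvp, [1.0, 1.0], alpha=0.2, n_terms=200)
# Exact answer for comparison: H^{-1} [1, 1] = [0.4, 0.2].
```

Truncating the series early trades accuracy for compute, which is one reason the estimated curvature can be poor in flat or ill-conditioned regions.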

Pre-training with Random Orthogonal Projection Image Modeling

  • paper_url: http://arxiv.org/abs/2310.18737
  • repo_url: None
  • paper_authors: Maryam Haghighat, Peyman Moghadam, Shaheer Mohamed, Piotr Koniusz
  • for: The paper proposes Random Orthogonal Projection Image Modeling (ROPIM), a self-supervised framework for visual pre-training without labels.
  • methods: Instead of binary masking, ROPIM applies a random orthogonal projection, which can be viewed as masking the entire spatial image area with locally varying masking degrees; the complement of the projection subspace is used during unmasking to promote recovery of the removed information.
  • results: Random orthogonal projection leads to superior performance compared to crop-based masking, with state-of-the-art results on several popular benchmarks.
    Abstract Masked Image Modeling (MIM) is a powerful self-supervised strategy for visual pre-training without the use of labels. MIM applies random crops to input images, processes them with an encoder, and then recovers the masked inputs with a decoder, which encourages the network to capture and learn structural information about objects and scenes. The intermediate feature representations obtained from MIM are suitable for fine-tuning on downstream tasks. In this paper, we propose an Image Modeling framework based on random orthogonal projection instead of binary masking as in MIM. Our proposed Random Orthogonal Projection Image Modeling (ROPIM) reduces spatially-wise token information under guaranteed bound on the noise variance and can be considered as masking entire spatial image area under locally varying masking degrees. Since ROPIM uses a random subspace for the projection that realizes the masking step, the readily available complement of the subspace can be used during unmasking to promote recovery of removed information. In this paper, we show that using random orthogonal projection leads to superior performance compared to crop-based masking. We demonstrate state-of-the-art results on several popular benchmarks.
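    Summary Masked Image Modeling (MIM) is a powerful self-supervised strategy for visual pre-training without labels. MIM applies random crops to input images, processes them with an encoder, and then recovers the masked inputs with a decoder, which encourages the network to capture and learn structural information about objects and scenes. The intermediate feature representations obtained from MIM are suitable for fine-tuning on downstream tasks. In this paper, we propose an image modeling framework based on random orthogonal projection instead of binary masking. Random Orthogonal Projection Image Modeling (ROPIM) reduces spatial token information under a guaranteed bound on the noise variance, and can be seen as masking the entire spatial image area with locally varying masking degrees. Because ROPIM uses a random subspace for the projection that realizes the masking step, the readily available complement of that subspace can be used during unmasking to promote recovery of the removed information. We show that random orthogonal projection leads to superior performance compared to crop-based masking, and we demonstrate state-of-the-art results on several popular benchmarks.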
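The relationship between a random subspace and its complement can be illustrated with plain vectors. The sketch below is a simplified picture, not ROPIM itself: it builds a random orthonormal basis by Gram-Schmidt, projects a vector onto that subspace (the part that is kept), and takes the remainder as the component in the complementary subspace (the part that was removed). The dimensions, the seed, and the function names are hypothetical, and the paper's noise-variance guarantees are not reproduced.

```python
import random


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def random_orthonormal_basis(dim, k, seed=0):
    """k orthonormal vectors in R^dim from Gaussian samples via Gram-Schmidt."""
    rng = random.Random(seed)
    basis = []
    while len(basis) < k:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        for q in basis:
            coeff = dot(q, v)
            v = [vi - coeff * qi for vi, qi in zip(v, q)]
        norm = dot(v, v) ** 0.5
        if norm > 1e-12:  # skip the (very unlikely) degenerate sample
            basis.append([vi / norm for vi in v])
    return basis


def project(basis, x):
    """Orthogonal projection of x onto the span of the basis."""
    out = [0.0] * len(x)
    for q in basis:
        coeff = dot(q, x)
        out = [oi + coeff * qi for oi, qi in zip(out, q)]
    return out


# Hypothetical 6-dimensional token feature, projected onto a 3-dim subspace.
x = [1.0, -2.0, 0.5, 3.0, 0.0, 1.5]
basis = random_orthonormal_basis(dim=6, k=3, seed=0)
kept = project(basis, x)
removed = [xi - ki for xi, ki in zip(x, kept)]  # lies in the complement
```

Because the two components are orthogonal and sum to the input, knowing the subspace tells a decoder exactly where the missing information lives, which is the intuition behind using the complement during unmasking.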

  • paper_url: http://arxiv.org/abs/2310.18729
  • repo_url: None
  • paper_authors: Jakub Drápal, Hannes Westermann, Jaromir Savelka
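  • for: This study explores how a large language model (LLM) and a legal expert can collaborate on thematic analysis in empirical legal studies.
  • methods: A novel framework in which the LLM works with the legal expert on generating initial codes (phase 2 of thematic analysis), searching for themes (phase 3), and classifying the data in terms of the themes (to kick-start phase 4), applied to a dataset (n=785) of facts descriptions from criminal court opinions regarding thefts.
  • results: The LLM (OpenAI's GPT-4) generated reasonable initial codes and improved them based on expert feedback. The model also performed well in zero-shot classification of facts descriptions in terms of the themes, and the themes it discovered autonomously map fairly well to those arrived at by legal experts. These findings can help legal researchers make more informed decisions when integrating LLMs into thematic analysis.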
    Abstract Thematic analysis and other variants of inductive coding are widely used qualitative analytic methods within empirical legal studies (ELS). We propose a novel framework facilitating effective collaboration of a legal expert with a large language model (LLM) for generating initial codes (phase 2 of thematic analysis), searching for themes (phase 3), and classifying the data in terms of the themes (to kick-start phase 4). We employed the framework for an analysis of a dataset (n=785) of facts descriptions from criminal court opinions regarding thefts. The goal of the analysis was to discover classes of typical thefts. Our results show that the LLM, namely OpenAI's GPT-4, generated reasonable initial codes, and it was capable of improving the quality of the codes based on expert feedback. They also suggest that the model performed well in zero-shot classification of facts descriptions in terms of the themes. Finally, the themes autonomously discovered by the LLM appear to map fairly well to the themes arrived at by legal experts. These findings can be leveraged by legal researchers to guide their decisions in integrating LLMs into their thematic analyses, as well as other inductive coding projects.
    Summary Thematic analysis and its variants are qualitative analytic methods widely used in empirical legal studies (ELS). We propose a novel framework for effective collaboration between a legal expert and a large language model (LLM) in thematic analysis, including generating initial codes (phase 2), searching for themes (phase 3), and classifying the data in terms of themes (to kick-start phase 4). We applied the framework to a dataset (n=785) of fact descriptions from criminal court opinions on thefts, aiming to discover typical theft classes. Our results show that the LLM, OpenAI's GPT-4, generated reasonable initial codes and improved code quality based on expert feedback. Additionally, the model performed well in zero-shot classification of fact descriptions in terms of themes. The themes autonomously discovered by the LLM map fairly well to the themes identified by legal experts, providing guidance for legal researchers integrating LLMs into their thematic analyses and other inductive coding projects.

The Evolution of the Interplay Between Input Distributions and Linear Regions in Networks

  • paper_url: http://arxiv.org/abs/2310.18725
  • repo_url: None
  • paper_authors: Xuan Qi, Yi Wei
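  • for: This study examines the expressivity of deep neural networks, specifically networks with ReLU activations.
  • methods: The number of linear convex regions is counted as a measure of expressivity, together with an analysis of how ReLU networks evolve during training.
  • results: For any one-dimensional input, there exists a minimum threshold on the number of neurons required to express it. For the same network, intricate inputs hinder its capacity to express linear regions. During training, the decision boundaries of ReLU networks go through an iterative refinement process.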
    Abstract It is commonly recognized that the expressiveness of deep neural networks is contingent upon a range of factors, encompassing their depth, width, and other relevant considerations. Currently, the practical performance of the majority of deep neural networks remains uncertain. For ReLU (Rectified Linear Unit) networks with piecewise linear activations, the number of linear convex regions serves as a natural metric to gauge the network's expressivity. In this paper, we count the number of linear convex regions in deep neural networks based on ReLU. In particular, we prove that for any one-dimensional input, there exists a minimum threshold for the number of neurons required to express it. We also empirically observe that for the same network, intricate inputs hinder its capacity to express linear regions. Furthermore, we unveil the iterative refinement process of decision boundaries in ReLU networks during training. We aspire for our research to serve as an inspiration for network optimization endeavors and aids in the exploration and analysis of the behaviors exhibited by deep networks.
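    Summary It is commonly recognized that the expressiveness of deep neural networks depends on a range of factors, including their depth and width. At present, the practical performance of most deep neural networks remains uncertain. For ReLU (Rectified Linear Unit) networks with piecewise linear activations, the number of linear convex regions is a natural metric for the network's expressivity. In this paper, we count the number of linear convex regions in deep ReLU networks. In particular, we prove that for any one-dimensional input, there exists a minimum threshold on the number of neurons required to express it. We also observe empirically that, for the same network, intricate inputs hinder its capacity to express linear regions. Furthermore, we reveal the iterative refinement process that decision boundaries in ReLU networks undergo during training. We hope this research will inspire work on network optimization and aid the exploration and analysis of the behavior of deep networks.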
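For a one-dimensional input, linear regions can be counted by scanning the input range and watching where the on/off pattern of the hidden units changes. The sketch below does this for a single hidden layer. It is a simple empirical counter, not the paper's counting argument: the weights are hypothetical, the grid resolution limits what can be detected, and distinct activation patterns only give an upper bound on the number of distinct affine pieces of the network's output.

```python
def activation_pattern(x, weights, biases):
    """Which hidden ReLU units are active (pre-activation > 0) at input x."""
    return tuple(w * x + b > 0 for w, b in zip(weights, biases))


def count_linear_regions(weights, biases, lo=-5.0, hi=5.0, steps=10001):
    """Count maximal intervals of [lo, hi] with a constant activation pattern."""
    regions = 0
    previous = None
    for i in range(steps):
        x = lo + (hi - lo) * i / (steps - 1)
        pattern = activation_pattern(x, weights, biases)
        if pattern != previous:
            regions += 1
            previous = pattern
    return regions


# Hypothetical hidden layer with breakpoints at x = 0, 1 and 2.
weights = [1.0, 1.0, -1.0]
biases = [0.0, -1.0, 2.0]
n_regions = count_linear_regions(weights, biases)
n_single = count_linear_regions([1.0], [0.0])
```

With n hidden units and distinct breakpoints inside the scanned range, a single hidden layer on a one-dimensional input yields at most n + 1 regions, which matches the count for the three-unit example.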

WCLD: Curated Large Dataset of Criminal Cases from Wisconsin Circuit Courts

  • paper_url: http://arxiv.org/abs/2310.18724
  • repo_url: None
  • paper_authors: Elliott Ash, Naman Goel, Nianyun Li, Claudia Marangon, Peiyao Sun
  • for: This paper provides a large dataset of criminal cases to support research on machine learning decision-support tools in criminal justice systems, with a focus on fairness and systemic issues.
  • methods: The dataset is constructed using reliable public data from 1970 to 2020, including information on prior criminal counts, recidivism outcomes, and various other attributes such as neighborhood characteristics, charge severity, and case decisions.
  • results: The dataset contains a large number of samples from five racial groups and provides researchers with a more comprehensive and rigorous platform for studying algorithmic fairness in the context of criminal justice.
    Abstract Machine learning based decision-support tools in criminal justice systems are subjects of intense discussions and academic research. There are important open questions about the utility and fairness of such tools. Academic researchers often rely on a few small datasets that are not sufficient to empirically study various real-world aspects of these questions. In this paper, we contribute WCLD, a curated large dataset of 1.5 million criminal cases from circuit courts in the U.S. state of Wisconsin. We used reliable public data from 1970 to 2020 to curate attributes like prior criminal counts and recidivism outcomes. The dataset contains large number of samples from five racial groups, in addition to information like sex and age (at judgment and first offense). Other attributes in this dataset include neighborhood characteristics obtained from census data, detailed types of offense, charge severity, case decisions, sentence lengths, year of filing etc. We also provide pseudo-identifiers for judge, county and zipcode. The dataset will not only enable researchers to more rigorously study algorithmic fairness in the context of criminal justice, but also relate algorithmic challenges with various systemic issues. We also discuss in detail the process of constructing the dataset and provide a datasheet. The WCLD dataset is available at \url{https://clezdata.github.io/wcld/}.
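    Summary Machine learning based decision-support tools in criminal justice systems are the subject of intense discussion and academic research, with important open questions about their utility and fairness. Academic researchers often rely on a few small datasets that are not sufficient to study real-world aspects of these questions empirically. In this paper, we contribute WCLD, a curated large dataset of 1.5 million criminal cases from circuit courts in the U.S. state of Wisconsin. We used reliable public data from 1970 to 2020 to curate attributes such as prior criminal counts and recidivism outcomes. The dataset contains a large number of samples from five racial groups, along with information such as sex and age (at judgment and at first offense). Other attributes include neighborhood characteristics obtained from census data, detailed types of offense, charge severity, case decisions, sentence lengths, and year of filing. We also provide pseudo-identifiers for judge, county, and zipcode. The dataset will not only enable researchers to study algorithmic fairness in criminal justice more rigorously, but also to relate algorithmic challenges to various systemic issues. We describe the dataset construction process in detail and provide a datasheet. The WCLD dataset is available at \url{https://clezdata.github.io/wcld/}.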

Robust Offline Policy Evaluation and Optimization with Heavy-Tailed Rewards

  • paper_url: http://arxiv.org/abs/2310.18715
  • repo_url: None
  • paper_authors: Jin Zhu, Runzhe Wan, Zhengling Qi, Shikai Luo, Chengchun Shi
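  • for: Improving the robustness of offline reinforcement learning (RL) under heavy-tailed rewards, which are common in real-world applications.
  • methods: Two algorithmic frameworks, ROAM and ROOM, for robust off-policy evaluation (OPE) and offline policy optimization (OPO), respectively. Both combine the median-of-means method with offline RL, which gives straightforward uncertainty estimation for the value function estimator. This adheres to the principle of pessimism in OPO and also handles heavy-tailed rewards.
  • results: Theoretical results and extensive experiments show that both frameworks outperform existing methods on logged datasets with heavy-tailed reward distributions.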
    Abstract This paper endeavors to augment the robustness of offline reinforcement learning (RL) in scenarios laden with heavy-tailed rewards, a prevalent circumstance in real-world applications. We propose two algorithmic frameworks, ROAM and ROOM, for robust off-policy evaluation (OPE) and offline policy optimization (OPO), respectively. Central to our frameworks is the strategic incorporation of the median-of-means method with offline RL, enabling straightforward uncertainty estimation for the value function estimator. This not only adheres to the principle of pessimism in OPO but also adeptly manages heavy-tailed rewards. Theoretical results and extensive experiments demonstrate that our two frameworks outperform existing methods when the logged dataset exhibits heavy-tailed reward distributions.
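The median-of-means estimator at the core of this approach is easy to state: split the samples into blocks, average each block, and take the median of the block averages. The sketch below shows the basic estimator on raw rewards only; it is not ROAM or ROOM, which apply the idea inside value-function estimation. The function name, the block count, and the reward values are illustrative assumptions.

```python
import random
from statistics import fmean, median


def median_of_means(values, n_blocks, seed=0):
    """Shuffle, split into n_blocks groups, and return the median group mean."""
    values = list(values)
    if not 1 <= n_blocks <= len(values):
        raise ValueError("n_blocks must be between 1 and len(values)")
    random.Random(seed).shuffle(values)
    blocks = [values[i::n_blocks] for i in range(n_blocks)]
    return median(fmean(block) for block in blocks)


# Hypothetical rewards: 99 ordinary values and a single extreme outlier.
rewards = [1.0] * 99 + [1000.0]
plain_mean = fmean(rewards)                   # pulled far away by the outlier
robust_mean = median_of_means(rewards, n_blocks=5)
```

A single extreme reward can corrupt at most one block, so the median over blocks stays close to the typical value. The spread of the block means also gives a simple handle on uncertainty.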

An Investigation of Darwiche and Pearl’s Postulates for Iterated Belief Update

  • paper_url: http://arxiv.org/abs/2310.18714
  • repo_url: None
  • paper_authors: Quanlong Guan, Tong Zhu, Liangda Fang, Junming Qiu, Zhao-Rong Lai, Weiqi Luo
  • for: This paper focuses on belief revision and update, two types of belief change, and how an agent can modify her beliefs in the presence of new information.
  • methods: The paper uses the AGM and KM postulates to capture rational belief revision and update, respectively, but notes that these postulates are too permissive and can lead to unreasonable changes in the iteration.
  • results: The paper presents a modification of the original KM postulates based on belief states, and migrates several well-known postulates for iterated belief revision to iterated belief update. The paper also provides exact semantic characterizations based on partial preorders for each of the proposed postulates, and analyzes the compatibility between the iterated postulates and the KM postulates for belief update.
    Abstract Belief revision and update, two significant types of belief change, both focus on how an agent modifies her beliefs in the presence of new information. The most striking difference between them is that the former studies the change of beliefs in a static world while the latter concentrates on a dynamically-changing world. The famous AGM and KM postulates were proposed to capture rational belief revision and update, respectively. However, both of them are too permissive to exclude some unreasonable changes in the iteration. In response to this weakness, the DP postulates and their extensions for iterated belief revision were presented. Furthermore, Rodrigues integrated these postulates in belief update. Unfortunately, his approach does not meet the basic requirement of iterated belief update. This paper is intended to solve this problem of Rodrigues's approach. Firstly, we present a modification of the original KM postulates based on belief states. Subsequently, we migrate several well-known postulates for iterated belief revision to iterated belief update. Moreover, we provide the exact semantic characterizations based on partial preorders for each of the proposed postulates. Finally, we analyze the compatibility between the above iterated postulates and the KM postulates for belief update.
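    Summary Belief revision and belief update, two important types of belief change, both concern how an agent modifies her beliefs in the presence of new information. The most striking difference is that the former studies belief change in a static world, while the latter concentrates on a dynamically changing world. The well-known AGM and KM postulates were proposed to capture rational belief revision and update, respectively. However, both are too permissive to exclude some unreasonable changes under iteration. In response to this weakness, the DP postulates and their extensions were proposed for iterated belief revision. Rodrigues later integrated these postulates into belief update, but his approach does not meet the basic requirement of iterated belief update. This paper aims to solve that problem. First, we present a modification of the original KM postulates based on belief states. Next, we migrate several well-known postulates for iterated belief revision to iterated belief update. We also provide exact semantic characterizations, based on partial preorders, for each of the proposed postulates. Finally, we analyze the compatibility between these iterated postulates and the KM postulates for belief update.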

Probing LLMs for Joint Encoding of Linguistic Categories

  • paper_url: http://arxiv.org/abs/2310.18696
  • repo_url: https://github.com/thesofakillers/infoshare
  • paper_authors: Giulio Starace, Konstantinos Papakostas, Rochelle Choenni, Apostolos Panagiotopoulos, Matteo Rosati, Alina Leidinger, Ekaterina Shutova
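  • for: The paper investigates how encodings of different linguistic phenomena interact within large language models (LLMs), and to what extent related categories rely on shared model representations.
  • methods: A framework for testing the joint encoding of linguistic categories in LLMs, applied to syntax: related part-of-speech (POS) classes at the same level of the linguistic hierarchy, and POS classes with related syntactic dependency relations at different levels.
  • results: The experiments find evidence of joint encoding at both the same and different levels of the linguistic hierarchy, and cross-lingual experiments show that the same patterns hold across languages in multilingual LLMs.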
    Abstract Large Language Models (LLMs) exhibit impressive performance on a range of NLP tasks, due to the general-purpose linguistic knowledge acquired during pretraining. Existing model interpretability research (Tenney et al., 2019) suggests that a linguistic hierarchy emerges in the LLM layers, with lower layers better suited to solving syntactic tasks and higher layers employed for semantic processing. Yet, little is known about how encodings of different linguistic phenomena interact within the models and to what extent processing of linguistically-related categories relies on the same, shared model representations. In this paper, we propose a framework for testing the joint encoding of linguistic categories in LLMs. Focusing on syntax, we find evidence of joint encoding both at the same (related part-of-speech (POS) classes) and different (POS classes and related syntactic dependency relations) levels of linguistic hierarchy. Our cross-lingual experiments show that the same patterns hold across languages in multilingual LLMs.
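    Summary Large Language Models (LLMs) perform impressively on a range of NLP tasks thanks to the general-purpose linguistic knowledge acquired during pretraining. Existing model interpretability research (Tenney et al., 2019) suggests that a linguistic hierarchy emerges in the LLM layers, with lower layers better suited to syntactic tasks and higher layers used for semantic processing. Yet little is known about how encodings of different linguistic phenomena interact within the models, and to what extent the processing of linguistically related categories relies on the same shared representations. In this paper, we propose a framework for testing the joint encoding of linguistic categories in LLMs. Focusing on syntax, we find evidence of joint encoding both at the same level of the linguistic hierarchy (related part-of-speech (POS) classes) and at different levels (POS classes and related syntactic dependency relations). Our cross-lingual experiments show that the same patterns hold across languages in multilingual LLMs.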

Unsupervised Behavior Extraction via Random Intent Priors

  • paper_url: http://arxiv.org/abs/2310.18687
  • repo_url: None
  • paper_authors: Hao Hu, Yiqin Yang, Jianing Ye, Ziqing Mai, Chongjie Zhang
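  • for: Making better use of the prior knowledge of human behavior contained in reward-free data, which offline reinforcement learning (RL) algorithms do not exploit well.
  • methods: UBER, an unsupervised approach that assigns different pseudo-rewards, sampled from a given prior distribution, to different agents in order to extract a diverse set of behaviors, and reuses them as candidate policies for learning new tasks.
  • results: Empirical and theoretical evidence shows that rewards generated from random neural networks suffice to extract diverse and useful behaviors, some even close to expert ones. Experiments on multiple benchmarks show that UBER learns effective and diverse behavior sets that enhance sample efficiency for online RL, outperforming existing baselines.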
    Abstract Reward-free data is abundant and contains rich prior knowledge of human behaviors, but it is not well exploited by offline reinforcement learning (RL) algorithms. In this paper, we propose UBER, an unsupervised approach to extract useful behaviors from offline reward-free datasets via diversified rewards. UBER assigns different pseudo-rewards sampled from a given prior distribution to different agents to extract a diverse set of behaviors, and reuse them as candidate policies to facilitate the learning of new tasks. Perhaps surprisingly, we show that rewards generated from random neural networks are sufficient to extract diverse and useful behaviors, some even close to expert ones. We provide both empirical and theoretical evidence to justify the use of random priors for the reward function. Experiments on multiple benchmarks showcase UBER's ability to learn effective and diverse behavior sets that enhance sample efficiency for online RL, outperforming existing baselines. By reducing reliance on human supervision, UBER broadens the applicability of RL to real-world scenarios with abundant reward-free data.
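    Summary Reward-free data is abundant and contains rich prior knowledge of human behavior, but offline reinforcement learning (RL) algorithms do not exploit it well. In this paper, we propose UBER, an unsupervised approach that extracts useful behaviors from offline reward-free datasets via diversified rewards. UBER assigns different pseudo-rewards, sampled from a given prior distribution, to different agents to extract a diverse set of behaviors, and reuses them as candidate policies to facilitate the learning of new tasks. Perhaps surprisingly, rewards generated from random neural networks are sufficient to extract diverse and useful behaviors, some even close to expert ones. We provide both empirical and theoretical evidence to justify the use of random priors for the reward function. Experiments on multiple benchmarks show that UBER learns effective and diverse behavior sets that enhance sample efficiency for online RL, outperforming existing baselines. By reducing reliance on human supervision, UBER broadens the applicability of RL to real-world scenarios with abundant reward-free data.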
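The step of labelling the same reward-free data with several random pseudo-rewards can be sketched as follows. This is an illustration of the idea only, not UBER's implementation: the network shape, the choice to score states rather than state-action pairs, the transition format, and the function names are all assumptions. Training one offline RL agent per relabeled dataset is left out.

```python
import math
import random


def make_random_reward(state_dim, hidden=8, seed=0):
    """Return a fixed, randomly initialized one-hidden-layer reward function."""
    rng = random.Random(seed)
    w1 = [[rng.gauss(0.0, 1.0) for _ in range(state_dim)] for _ in range(hidden)]
    w2 = [rng.gauss(0.0, 1.0) for _ in range(hidden)]

    def reward(state):
        hidden_out = [
            math.tanh(sum(w * s for w, s in zip(row, state))) for row in w1
        ]
        return sum(w * h for w, h in zip(w2, hidden_out))

    return reward


def relabel(transitions, reward_fn):
    """Attach a pseudo-reward to each (state, action, next_state) transition."""
    return [(s, a, reward_fn(s), s_next) for s, a, s_next in transitions]


# Hypothetical reward-free transitions: (state, action, next_state).
transitions = [
    ([0.0, 1.0], 0, [0.1, 0.9]),
    ([0.1, 0.9], 1, [0.3, 0.7]),
]
# One relabeled copy of the dataset per random reward, i.e. per agent.
relabeled = [
    relabel(transitions, make_random_reward(state_dim=2, seed=k))
    for k in range(3)
]
```

Each seed defines a different "intent", so agents trained on the different copies are pushed toward different behaviors even though the underlying transitions are identical.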

N-Critics: Self-Refinement of Large Language Models with Ensemble of Critics

  • paper_url: http://arxiv.org/abs/2310.18679
  • repo_url: None
  • paper_authors: Sajad Mousavi, Ricardo Luna Gutiérrez, Desik Rengarajan, Vineet Gundecha, Ashwin Ramesh Babu, Avisek Naug, Antonio Guillen, Soumyendu Sarkar
  • for: Improving the reliability and accuracy of LLMs by mitigating toxicity and fact hallucination.
  • methods: Model outputs are refined using feedback from an ensemble of critics and from the model itself, drawing inspiration from human self-reflection and input-seeking behavior.
  • results: Consistent performance improvements in reducing toxicity and correcting factual errors.
    Abstract We propose a self-correction mechanism for Large Language Models (LLMs) to mitigate issues such as toxicity and fact hallucination. This method involves refining model outputs through an ensemble of critics and the model's own feedback. Drawing inspiration from human behavior, we explore whether LLMs can emulate the self-correction process observed in humans who often engage in self-reflection and seek input from others to refine their understanding of complex topics. Our approach is model-agnostic and can be applied across various domains to enhance trustworthiness by addressing fairness, bias, and robustness concerns. We consistently observe performance improvements in LLMs for reducing toxicity and correcting factual errors.
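    Summary The paper proposes a self-correction mechanism for Large Language Models (LLMs) that mitigates problems such as toxicity and fact hallucination. Model outputs are refined using an ensemble of critics together with the model's own feedback. Taking inspiration from human behavior, the authors ask whether LLMs can emulate the self-correction that humans achieve through self-reflection and seeking input from others. The approach is model-agnostic and applies across domains, improving trustworthiness by addressing fairness, bias, and robustness concerns. The authors consistently observe improved LLM performance in reducing toxicity and correcting factual errors.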
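
    The refinement loop described here can be sketched as follows. Everything in the block is a stand-in: `toy_generate` replaces an LLM call, the two critics replace learned or prompted critics, and the stopping rule (stop when no critic objects, or after `max_rounds`) is an assumption rather than the paper's procedure.

```python
def refine(prompt, generate, critics, max_rounds=3):
    """Regenerate until no critic returns feedback or the round budget is spent."""
    output = generate(prompt, [])
    for _ in range(max_rounds):
        feedback = [f for f in (critic(output) for critic in critics) if f is not None]
        if not feedback:
            break
        output = generate(prompt, feedback)
    return output

# Toy stand-ins for the model and for two critics (all hypothetical).
def toy_generate(prompt, feedback):
    text = "Paris is in Spain, stupid."
    if "remove insult" in feedback:
        text = text.replace(", stupid", "")
    if "fix country" in feedback:
        text = text.replace("Spain", "France")
    return text

def toxicity_critic(text):
    return "remove insult" if "stupid" in text else None

def fact_critic(text):
    return "fix country" if "Spain" in text else None

result = refine("Where is Paris?", toy_generate, [toxicity_critic, fact_critic])
```

    The model's own self-critique could be added as one more entry in `critics`, since a critic is just a callable that returns feedback or `None`.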

GalliformeSpectra: A Hen Breed Dataset

  • paper_url: http://arxiv.org/abs/2310.19830
  • repo_url: None
  • paper_authors: Galib Muhammad Shahriar Himel, Md Masudul Islam
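  • for: Provides a dataset covering ten distinct hen breeds that captures the unique characteristics and traits of each breed.
  • methods: A total of 1010 original JPG images were collected from various regions, showing the physical attributes, feather patterns, and distinctive features of each breed. The images were then standardized, resized, and converted to PNG format for consistency within the dataset.
  • results: The dataset is a resource for poultry science, genetics, and agricultural studies. It lets researchers explore and analyze unique characteristics and genetic traits across hen breeds, supporting poultry breeding, farming, and genetic research.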
    Abstract This article presents a comprehensive dataset featuring ten distinct hen breeds, sourced from various regions, capturing the unique characteristics and traits of each breed. The dataset encompasses Bielefeld, Blackorpington, Brahma, Buckeye, Fayoumi, Leghorn, Newhampshire, Plymouthrock, Sussex, and Turken breeds, offering a diverse representation of poultry commonly bred worldwide. A total of 1010 original JPG images were meticulously collected, showcasing the physical attributes, feather patterns, and distinctive features of each hen breed. These images were subsequently standardized, resized, and converted to PNG format for consistency within the dataset. The compilation, although unevenly distributed across the breeds, provides a rich resource, serving as a foundation for research and applications in poultry science, genetics, and agricultural studies. This dataset holds significant potential to contribute to various fields by enabling the exploration and analysis of unique characteristics and genetic traits across different hen breeds, thereby supporting advancements in poultry breeding, farming, and genetic research.

FinBTech: Blockchain-Based Video and Voice Authentication System for Enhanced Security in Financial Transactions Utilizing FaceNet512 and Gaussian Mixture Models

  • paper_url: http://arxiv.org/abs/2310.18668
  • repo_url: None
  • paper_authors: Prof N. Jeenath Laila, Dr G. Tamilpavai
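  • for: Improving the security and reliability of financial transactions.
  • methods: Combines smart contracts, blockchain technology, FaceNet512 face recognition, and GMM speech authentication to perform video and audio verification.
  • results: A multi-factor biometric authentication system that the authors present as a strong defense against identity theft and unauthorized access.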
    Abstract In the digital age, it is crucial to make sure that financial transactions are as secure and reliable as possible. This abstract offers a ground-breaking method that combines smart contracts, blockchain technology, FaceNet512 for improved face recognition, and Gaussian Mixture Models (GMM) for speech authentication to create a system for video and audio verification that is unmatched. Smart contracts and the immutable ledger of the blockchain are combined to offer a safe and open environment for financial transactions. FaceNet512 and GMM offer multi-factor biometric authentication simultaneously, enhancing security to new heights. By combining cutting-edge technology, this system offers a strong defense against identity theft and illegal access, establishing a new benchmark for safe financial transactions.
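    Summary In the digital age, keeping financial transactions secure and reliable is crucial. The paper presents a method that combines smart contracts, blockchain technology, FaceNet512 for face recognition, and Gaussian Mixture Models (GMM) for speech authentication into a video and audio verification system. Smart contracts and the blockchain's immutable ledger provide a safe and open environment for financial transactions, while FaceNet512 and GMM together supply multi-factor biometric authentication. The authors present the system as a strong defense against identity theft and unauthorized access and as a new benchmark for secure financial transactions.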

From Indeterminacy to Determinacy: Augmenting Logical Reasoning Capabilities with Large Language Models

  • paper_url: http://arxiv.org/abs/2310.18659
  • repo_url: None
  • paper_authors: Hongda Sun, Weikai Xu, Wei Liu, Jian Luan, Bin Wang, Shuo Shang, Ji-Rong Wen, Rui Yan
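  • for: Improving the logical reasoning ability of LLMs so that they better emulate human reasoning.
  • methods: Proposes DetermLR, a reasoning framework that formulates reasoning as a process that starts from indeterminate premises and incrementally accumulates determinate ones, bringing the conclusion progressively closer to clarity. DetermLR has three components: 1) Premise identification: premises are categorized as determinate or indeterminate, which lets the LLM adapt the reasoning structure to the complexity of the task. 2) Premise prioritization and exploration: quantitative measurements assess each premise's relevance to the target, and more relevant premises are explored first. 3) Iterative process with reasoning memory: a reasoning memory module automatically stores and extracts available premises and reasoning paths, preserving historical reasoning details for more accurate premise prioritization.
  • results: Extensive experiments on four challenging logical reasoning tasks (LogiQA, ProofWriter, FOLIO, and LogicalDeduction) show that DetermLR outperforms all baselines while requiring fewer visited states.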
    Abstract Recent advances in LLMs have revolutionized the landscape of reasoning tasks. To enhance the capabilities of LLMs to emulate human reasoning, prior works focus on modeling reasoning steps using specific thought structures like chains, trees, or graphs. However, LLM-based reasoning continues to encounter three challenges: 1) Selecting appropriate reasoning structures for various tasks; 2) Exploiting known conditions sufficiently and efficiently to deduce new insights; 3) Considering the impact of historical reasoning experience. To address these challenges, we propose DetermLR, a novel reasoning framework that formulates the reasoning process as a transformational journey from indeterminate premises to determinate ones. This process is marked by the incremental accumulation of determinate premises, making the conclusion progressively closer to clarity. DetermLR includes three essential components: 1) Premise identification: We categorize premises into two distinct types: determinate and indeterminate. This empowers LLMs to customize reasoning structures to match the specific task complexities. 2) Premise prioritization and exploration: We leverage quantitative measurements to assess the relevance of each premise to the target, prioritizing more relevant premises for exploring new insights. 3) Iterative process with reasoning memory: We introduce a reasoning memory module to automate storage and extraction of available premises and reasoning paths, preserving historical reasoning details for more accurate premise prioritization. Comprehensive experimental results show that DetermLR outperforms all baselines on four challenging logical reasoning tasks: LogiQA, ProofWriter, FOLIO, and LogicalDeduction. DetermLR can achieve better reasoning performance while requiring fewer visited states, highlighting its superior efficiency and effectiveness in tackling logical reasoning tasks.
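
    The three components listed for DetermLR can be mimicked with a toy forward-chaining loop. This is only an analogy under loud assumptions: known facts stand in for determinate premises, if-then rules for indeterminate ones, a crude sort key replaces the paper's quantitative relevance measure, and no LLM is involved. The names `solve`, `facts`, and `rules` are invented for this sketch.

```python
def solve(facts, rules, target, max_steps=10):
    """Accumulate determinate premises until the target is settled."""
    determinate = set(facts)   # premises already known to hold
    pending = list(rules)      # indeterminate premises: (conditions, conclusion)
    memory = []                # reasoning memory: the path of derivations taken
    for _ in range(max_steps):
        if target in determinate:
            break
        ready = [r for r in pending if set(r[0]) <= determinate]
        if not ready:
            break
        # Prioritise rules that conclude the target, then rules with fewer conditions.
        ready.sort(key=lambda r: (r[1] != target, len(r[0])))
        conditions, conclusion = ready[0]
        determinate.add(conclusion)
        memory.append((conditions, conclusion))
        pending.remove(ready[0])
    return target in determinate, memory

rules = [(("a",), "b"), (("a",), "c"), (("b",), "d")]
solved, path = solve({"a"}, rules, target="d")
```

    In this example the prioritisation skips the irrelevant rule `a -> c`, so the target is reached after two derivations instead of three, which is the kind of saving in visited states the entry refers to.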

EHRXQA: A Multi-Modal Question Answering Dataset for Electronic Health Records with Chest X-ray Images

  • paper_url: http://arxiv.org/abs/2310.18652
  • repo_url: https://github.com/baeseongsu/ehrxqa
  • paper_authors: Seongsu Bae, Daeun Kyung, Jaehee Ryu, Eunbyeol Cho, Gyubok Lee, Sunjun Kweon, Jungwoo Oh, Lei Ji, Eric I-Chao Chang, Tackeun Kim, Edward Choi
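  • for: Develops EHRXQA, a multi-modal question answering dataset over electronic health records (EHRs), to promote joint reasoning across imaging and table modalities, which current EHR QA systems leave underexplored.
  • methods: Builds two uni-modal resources: 1) MIMIC-CXR-VQA, a newly created medical visual question answering (VQA) benchmark designed to augment the imaging modality in EHR QA; 2) EHRSQL (MIMIC-IV), a refashioned version of an existing table-based EHR QA dataset. Integrating the two yields a multi-modal EHR QA dataset.
  • results: Proposes a NeuralSQL-based strategy equipped with an external VQA API to handle the particular challenges of multi-modal EHR questions. The authors believe the dataset can catalyze advances in real-world medical scenarios such as clinical decision-making and research.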
    Abstract Electronic Health Records (EHRs), which contain patients' medical histories in various multi-modal formats, often overlook the potential for joint reasoning across imaging and table modalities underexplored in current EHR Question Answering (QA) systems. In this paper, we introduce EHRXQA, a novel multi-modal question answering dataset combining structured EHRs and chest X-ray images. To develop our dataset, we first construct two uni-modal resources: 1) The MIMIC-CXR-VQA dataset, our newly created medical visual question answering (VQA) benchmark, specifically designed to augment the imaging modality in EHR QA, and 2) EHRSQL (MIMIC-IV), a refashioned version of a previously established table-based EHR QA dataset. By integrating these two uni-modal resources, we successfully construct a multi-modal EHR QA dataset that necessitates both uni-modal and cross-modal reasoning. To address the unique challenges of multi-modal questions within EHRs, we propose a NeuralSQL-based strategy equipped with an external VQA API. This pioneering endeavor enhances engagement with multi-modal EHR sources and we believe that our dataset can catalyze advances in real-world medical scenarios such as clinical decision-making and research. EHRXQA is available at https://github.com/baeseongsu/ehrxqa.
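    Summary Electronic Health Records (EHRs) contain patients' medical histories in various multi-modal formats, yet joint reasoning across the imaging and table modalities remains underexplored in current EHR question answering (QA) systems. The paper introduces EHRXQA, a multi-modal QA dataset that combines structured EHRs with chest X-ray images. To build it, the authors first construct two uni-modal resources: 1) MIMIC-CXR-VQA, a newly created medical visual question answering (VQA) benchmark that augments the imaging modality in EHR QA, and 2) EHRSQL (MIMIC-IV), a refashioned version of an existing table-based EHR QA dataset. Integrating the two yields a multi-modal EHR QA dataset that requires both uni-modal and cross-modal reasoning. To handle multi-modal questions within EHRs, the authors propose a NeuralSQL-based strategy equipped with an external VQA API. They believe the dataset can catalyze advances in real-world medical scenarios such as clinical decision-making and research. EHRXQA is available at https://github.com/baeseongsu/ehrxqa.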
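
    One way to picture a SQL query that calls out to a VQA model is a user-defined SQL function, sketched below with Python's built-in `sqlite3`. This is an illustration of the general idea only, not the EHRXQA implementation: the table `studies`, its columns, the function name `func_vqa`, and the lookup-table "model" in `stub_vqa` are all assumptions made for the example.

```python
import sqlite3

def stub_vqa(question, image_path):
    """Stand-in for an external VQA model: answers come from a tiny lookup table."""
    answers = {
        ("is there cardiomegaly?", "img/1.jpg"): 1,
        ("is there cardiomegaly?", "img/2.jpg"): 0,
    }
    return answers.get((question.lower(), image_path), 0)

conn = sqlite3.connect(":memory:")
conn.create_function("func_vqa", 2, stub_vqa)  # expose the stub to SQL
conn.executescript("""
    CREATE TABLE studies (patient_id INTEGER, age INTEGER, image_path TEXT);
    INSERT INTO studies VALUES (10, 71, 'img/1.jpg'), (11, 45, 'img/2.jpg');
""")

# Cross-modal question: patients over 60 whose X-ray shows cardiomegaly.
rows = conn.execute(
    "SELECT patient_id FROM studies "
    "WHERE age > 60 AND func_vqa('Is there cardiomegaly?', image_path) = 1"
).fetchall()
```

    The table condition (`age > 60`) and the image condition are evaluated in one query, which is the sense in which such a question needs both uni-modal and cross-modal reasoning.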

Sleep Deprivation in the Forward-Forward Algorithm

  • paper_url: http://arxiv.org/abs/2310.18647
  • repo_url: https://github.com/mirceatlx/ff
  • paper_authors: Mircea-Tudor Lică, David Dinucu-Jianu
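  • for: Explores, from a biological perspective and in the context of sleep, the separation of the two forward passes in the Forward-Forward algorithm.
  • methods: Trains with the Forward-Forward algorithm while varying the size of the gap between the sleep and awake phases.
  • results: The size of the gap between the sleep and awake phases influences the algorithm's learning capabilities, and negative data helps diminish the effects of sleep deprivation.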
    Abstract This paper aims to explore the separation of the two forward passes in the Forward-Forward algorithm from a biological perspective in the context of sleep. We show the size of the gap between the sleep and awake phase influences the learning capabilities of the algorithm and highlight the importance of negative data in diminishing the devastating effects of sleep deprivation.
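
    For readers unfamiliar with the Forward-Forward algorithm, the sketch below shows its per-layer objective and one possible way to separate the two passes in time. The goodness measure (sum of squared activations compared against a threshold) follows the usual description of the algorithm; the threshold value, the softplus form of the loss, and the block-alternating `phase_schedule` are assumptions of this sketch and are not taken from the paper.

```python
import math

def goodness(activations):
    """Layer goodness: sum of squared activations."""
    return sum(a * a for a in activations)

def ff_loss(activations, positive, theta=2.0):
    """Softplus loss: goodness above theta for positive data, below it for negative data."""
    g = goodness(activations)
    margin = g - theta if positive else theta - g
    return math.log1p(math.exp(-margin))

def phase_schedule(total_steps, gap):
    """Alternate blocks of `gap` awake (positive) steps and `gap` sleep (negative) steps."""
    return ["awake" if (t // gap) % 2 == 0 else "sleep" for t in range(total_steps)]

schedule = phase_schedule(6, 2)
```

    A larger `gap` means more consecutive positive-only updates before any negative data is seen, which is one simple way to model the sleep deprivation the entry describes.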

Predicting Agricultural Commodities Prices with Machine Learning: A Review of Current Research

  • paper_url: http://arxiv.org/abs/2310.18646
  • repo_url: None
  • paper_authors: Nhat-Quang Tran, Anna Felipe, Thanh Nguyen Ngoc, Tom Huynh, Quang Tran, Arthur Tang, Thuy Nguyen
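  • for: Reviews recent research on machine learning algorithms for agricultural price prediction.
  • methods: Analyzes recent work that applies machine learning techniques to agricultural price prediction and discusses the strengths and weaknesses of each technique.
  • results: Concludes that machine learning can improve the accuracy, real-time prediction, customization, and integration of agricultural price prediction, but that further research is needed to address the limitations and challenges of the approach.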
    Abstract Agricultural price prediction is crucial for farmers, policymakers, and other stakeholders in the agricultural sector. However, it is a challenging task due to the complex and dynamic nature of agricultural markets. Machine learning algorithms have the potential to revolutionize agricultural price prediction by improving accuracy, real-time prediction, customization, and integration. This paper reviews recent research on machine learning algorithms for agricultural price prediction. We discuss the importance of agriculture in developing countries and the problems associated with crop price falls. We then identify the challenges of predicting agricultural prices and highlight how machine learning algorithms can support better prediction. Next, we present a comprehensive analysis of recent research, discussing the strengths and weaknesses of various machine learning techniques. We conclude that machine learning has the potential to revolutionize agricultural price prediction, but further research is essential to address the limitations and challenges associated with this approach.
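    Summary Agricultural price prediction matters to farmers, policymakers, and other stakeholders in the agricultural sector, but it is challenging because agricultural markets are complex and dynamic. Machine learning algorithms could substantially improve it in terms of accuracy, real-time prediction, customization, and integration. This paper reviews recent research on machine learning algorithms for agricultural price prediction. It discusses the importance of agriculture in developing countries and the problems caused by falling crop prices, identifies the challenges of predicting agricultural prices, and analyzes the strengths and weaknesses of various machine learning techniques. It concludes that machine learning has the potential to revolutionize agricultural price prediction, but that further research is essential to address the limitations and challenges of the approach.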

One-shot Localization and Segmentation of Medical Images with Foundation Models

  • paper_url: http://arxiv.org/abs/2310.18642
  • repo_url: None
  • paper_authors: Deepa Anand, Gurunath Reddy M, Vanika Singhal, Dattesh D. Shanbhag, Shriram KS, Uday Patil, Chitresh Bhushan, Kavitha Manickam, Dawei Gui, Rakesh Mullick, Avinash Gopal, Parminder Bhatia, Taha Kass-Hout
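  • for: Examines whether Vision Transformer (ViT) and Stable Diffusion (SD) models pre-trained only on natural images can solve correspondence problems on medical images.
  • methods: Applies a variety of pre-trained ViT (DINO, DINOv2, SAM, CLIP) and SD models to correspondence tasks on medical images, then uses the correspondence with a template image to prompt a Segment Anything (SAM) model for single-shot segmentation.
  • results: Models trained on natural images perform well on medical images across modalities (CT, MR, ultrasound), anatomical regions (brain, thorax, abdomen, extremities), and a wide variety of tasks. Single-shot segmentation with one reference image achieves a Dice range of 62%-90% across tasks and outperforms the recently proposed few-shot method UniverSeg (Dice range 47%-80%) on six of seven semantic segmentation tasks.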
    Abstract Recent advances in Vision Transformers (ViT) and Stable Diffusion (SD) models with their ability to capture rich semantic features of the image have been used for image correspondence tasks on natural images. In this paper, we examine the ability of a variety of pre-trained ViT (DINO, DINOv2, SAM, CLIP) and SD models, trained exclusively on natural images, for solving the correspondence problems on medical images. While many works have made a case for in-domain training, we show that the models trained on natural images can offer good performance on medical images across different modalities (CT, MR, Ultrasound) sourced from various manufacturers, over multiple anatomical regions (brain, thorax, abdomen, extremities), and on wide variety of tasks. Further, we leverage the correspondence with respect to a template image to prompt a Segment Anything (SAM) model to arrive at single shot segmentation, achieving dice range of 62%-90% across tasks, using just one image as reference. We also show that our single-shot method outperforms the recently proposed few-shot segmentation method - UniverSeg (Dice range 47%-80%) on most of the semantic segmentation tasks (six out of seven) across medical imaging modalities.
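    Summary Vision Transformer (ViT) and Stable Diffusion (SD) models capture rich semantic image features and have been used for image correspondence on natural images. The paper examines whether a variety of pre-trained ViT (DINO, DINOv2, SAM, CLIP) and SD models, trained exclusively on natural images, can solve correspondence problems on medical images. Although many works argue for in-domain training, the authors show that these models perform well on medical images across modalities (CT, MR, ultrasound) from various manufacturers, across anatomical regions (brain, thorax, abdomen, extremities), and on a wide variety of tasks. They also use the correspondence with a template image to prompt a Segment Anything (SAM) model for single-shot segmentation, reaching a Dice range of 62%-90% across tasks with a single reference image. The single-shot method outperforms the recently proposed few-shot segmentation method UniverSeg (Dice range 47%-80%) on six of seven semantic segmentation tasks across medical imaging modalities.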
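
    Two ingredients mentioned in this entry, feature correspondence and the Dice score, are easy to state in code. The sketch below is a simplified stand-in: real features would come from a pre-trained ViT or SD model, whereas here they are hand-written 2-dimensional vectors, and the names `best_match` and `dice` are chosen for this example. The matched location would then serve as a point prompt for a segmentation model (not shown).

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def best_match(template_feature, target_features):
    """Return the target location whose feature is most similar to the template landmark."""
    return max(target_features, key=lambda pos: cosine(template_feature, target_features[pos]))

def dice(pred, truth):
    """Dice coefficient between two sets of foreground pixel coordinates."""
    if not pred and not truth:
        return 1.0
    return 2 * len(pred & truth) / (len(pred) + len(truth))

# Hypothetical per-location features of a target image.
target_features = {(0, 0): [1.0, 0.0], (0, 1): [0.0, 1.0], (1, 1): [0.7, 0.7]}
prompt_point = best_match([0.1, 0.9], target_features)
score = dice({(0, 0), (0, 1), (1, 1)}, {(0, 1), (1, 1), (1, 0)})
```

    A Dice score of 1.0 means the predicted and reference masks coincide, so the reported 62%-90% range corresponds to values between 0.62 and 0.90 on this scale.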

Electrical Impedance Tomography: A Fair Comparative Study on Deep Learning and Analytic-based Approaches

  • paper_url: http://arxiv.org/abs/2310.18636
  • repo_url: https://github.com/dericknganyu/eit_dataset_generation
  • paper_authors: Derick Nganyu Tanyu, Jianfeng Ning, Andreas Hauptmann, Bangti Jin, Peter Maass
  • for: This paper focuses on the Electrical Impedance Tomography (EIT) inverse problem, which is the challenge of inferring the internal conductivity distribution of an object from measurements taken on its boundary. The paper explores techniques for solving this problem, particularly the interplay between deep learning-based strategies and classical analytic-based methods.
  • methods: The paper examines four state-of-the-art deep learning algorithms for solving the EIT inverse problem, including their representational capabilities and strengths. In addition, two analytic-based methods are dissected for their limitations and strengths. The paper also employs various numerical experiments to evaluate the efficacy of these methods.
  • results: The paper provides a nuanced understanding of the methods' ability to capture essential features and delineate complex conductivity patterns. The incorporation of variable conductivity scenarios allows for exploring the robustness and adaptability of each method. The results demonstrate the potential of deep learning-based methods for solving the EIT inverse problem, particularly in the presence of complex conductivity patterns.
    Abstract Electrical Impedance Tomography (EIT) is a powerful imaging technique with diverse applications, e.g., medical diagnosis, industrial monitoring, and environmental studies. The EIT inverse problem is about inferring the internal conductivity distribution of an object from measurements taken on its boundary. It is severely ill-posed, necessitating advanced computational methods for accurate image reconstructions. Recent years have witnessed significant progress, driven by innovations in analytic-based approaches and deep learning. This review explores techniques for solving the EIT inverse problem, focusing on the interplay between contemporary deep learning-based strategies and classical analytic-based methods. Four state-of-the-art deep learning algorithms are rigorously examined, harnessing the representational capabilities of deep neural networks to reconstruct intricate conductivity distributions. In parallel, two analytic-based methods, rooted in mathematical formulations and regularisation techniques, are dissected for their strengths and limitations. These methodologies are evaluated through various numerical experiments, encompassing diverse scenarios that reflect real-world complexities. A suite of performance metrics is employed to assess the efficacy of these methods. These metrics collectively provide a nuanced understanding of the methods' ability to capture essential features and delineate complex conductivity patterns. One novel feature of the study is the incorporation of variable conductivity scenarios, introducing a level of heterogeneity that mimics textured inclusions. This departure from uniform conductivity assumptions mimics realistic scenarios where tissues or materials exhibit spatially varying electrical properties. Exploring how each method responds to such variable conductivity scenarios opens avenues for understanding their robustness and adaptability.
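    Summary Electrical Impedance Tomography (EIT) is a powerful imaging technique with applications in medical diagnosis, industrial monitoring, and environmental studies. The EIT inverse problem is to infer an object's internal conductivity distribution from measurements on its boundary. It is severely ill-posed, so accurate image reconstruction needs advanced computational methods. Recent years have seen significant progress driven by innovations in analytic-based approaches and deep learning. This review explores techniques for solving the EIT inverse problem, focusing on the interplay between deep learning-based strategies and classical analytic-based methods. Four state-of-the-art deep learning algorithms are rigorously examined, alongside two analytic-based methods rooted in mathematical formulations and regularisation techniques. The methods are evaluated in numerical experiments covering diverse scenarios, using a suite of performance metrics that measure how well each method captures essential features and delineates complex conductivity patterns. A novel feature of the study is the use of variable conductivity scenarios that mimic textured inclusions, departing from uniform-conductivity assumptions to reflect tissues or materials with spatially varying electrical properties. How each method responds to these scenarios sheds light on its robustness and adaptability.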
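
    For orientation, the inverse problem described in this entry is often written in the following generic form. This is a textbook-style statement, not the specific formulation of either analytic method in the paper; the symbols $\sigma$ (conductivity), $u$ (potential), $j$ (injected boundary current), $V$ (measured boundary voltages), $F$ (forward map), $R$ (regulariser), and $\alpha$ (regularisation weight) are all introduced here for illustration.

```latex
% Forward model (continuum form): potential u for conductivity \sigma and boundary current j
\nabla \cdot (\sigma \nabla u) = 0 \quad \text{in } \Omega,
\qquad
\sigma \frac{\partial u}{\partial n} = j \quad \text{on } \partial\Omega .

% Regularised reconstruction from boundary voltages V, with forward map F(\sigma) = u|_{\partial\Omega}
\hat{\sigma} = \arg\min_{\sigma} \; \tfrac{1}{2} \, \| F(\sigma) - V \|_2^2 + \alpha \, R(\sigma),
\qquad \alpha > 0 .
```

    The regulariser $R$ is what counters the ill-posedness: without it, small measurement noise in $V$ can produce large changes in the reconstructed $\hat{\sigma}$.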

Setting the Trap: Capturing and Defeating Backdoors in Pretrained Language Models through Honeypots

  • paper_url: http://arxiv.org/abs/2310.18633
  • repo_url: None
  • paper_authors: Ruixiang Tang, Jiayi Yuan, Yiming Li, Zirui Liu, Rui Chen, Xia Hu
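  • for: Defending pretrained language models (PLMs) against backdoor attacks.
  • methods: Integrates a honeypot module into the original PLM to absorb backdoor information, and imposes penalties on the information acquired by the honeypot module to inhibit backdoor creation while fine-tuning the stem network.
  • results: Reduces the attack success rate by 10% to 40% compared with prior state-of-the-art methods.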
    Abstract In the field of natural language processing, the prevalent approach involves fine-tuning pretrained language models (PLMs) using local samples. Recent research has exposed the susceptibility of PLMs to backdoor attacks, wherein the adversaries can embed malicious prediction behaviors by manipulating a few training samples. In this study, our objective is to develop a backdoor-resistant tuning procedure that yields a backdoor-free model, no matter whether the fine-tuning dataset contains poisoned samples. To this end, we propose and integrate a honeypot module into the original PLM, specifically designed to absorb backdoor information exclusively. Our design is motivated by the observation that lower-layer representations in PLMs carry sufficient backdoor features while carrying minimal information about the original tasks. Consequently, we can impose penalties on the information acquired by the honeypot module to inhibit backdoor creation during the fine-tuning process of the stem network. Comprehensive experiments conducted on benchmark datasets substantiate the effectiveness and robustness of our defensive strategy. Notably, these results indicate a substantial reduction in the attack success rate ranging from 10% to 40% when compared to prior state-of-the-art methods.
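
    One plausible reading of "imposing penalties on the information acquired by the honeypot" is to down-weight, in the stem network's loss, the samples that the honeypot fits unusually easily. The sketch below illustrates that reading only. The sigmoid weighting, the parameters `sharpness` and `threshold`, and both function names are assumptions of this example and should not be taken as the paper's exact formulation.

```python
import math

def sample_weights(honeypot_losses, sharpness=10.0, threshold=0.5):
    """Weight near 0 for samples the honeypot fits easily (suspected poison), near 1 otherwise."""
    mean = sum(honeypot_losses) / len(honeypot_losses)
    return [1 / (1 + math.exp(-sharpness * (loss / mean - threshold)))
            for loss in honeypot_losses]

def weighted_loss(stem_losses, weights):
    """Stem-network training loss with suspected-poison samples suppressed."""
    return sum(w * l for w, l in zip(weights, stem_losses)) / len(stem_losses)

# Hypothetical per-sample losses; the first sample is learned suspiciously fast by the honeypot.
honeypot_losses = [0.01, 1.2, 0.9, 1.1]
stem_losses = [2.0, 0.3, 0.4, 0.2]
weights = sample_weights(honeypot_losses)
loss = weighted_loss(stem_losses, weights)
```

    The design rests on the abstract's observation that lower layers pick up backdoor features while carrying little task information, so a very low honeypot loss is treated as a warning sign.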

Benchmark Generation Framework with Customizable Distortions for Image Classifier Robustness

  • paper_url: http://arxiv.org/abs/2310.18626
  • repo_url: None
  • paper_authors: Soumyendu Sarkar, Ashwin Ramesh Babu, Sajad Mousavi, Zachariah Carmichael, Vineet Gundecha, Sahand Ghorbanpour, Ricardo Luna Gutierrez, Antonio Guillen, Avisek Naug
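  • for: Provides a framework for generating adversarial benchmarks that evaluate the robustness of image classification models.
  • methods: Uses a model-based reinforcement learning agent. Users choose the types of distortion relevant to their deployment, and the framework generates datasets at various distortion levels to assess the robustness of different image classifiers.
  • results: The generated adversarial samples are effective and transfer to other models, causing failures in classifiers such as ResNet-50, Inception-V3, and VGG-16. They rely on simple distortions such as Gaussian noise, without introducing unnatural artifacts or color bleeds.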
    Abstract We present a novel framework for generating adversarial benchmarks to evaluate the robustness of image classification models. Our framework allows users to customize the types of distortions to be optimally applied to images, which helps address the specific distortions relevant to their deployment. The benchmark can generate datasets at various distortion levels to assess the robustness of different image classifiers. Our results show that the adversarial samples generated by our framework with any of the image classification models, like ResNet-50, Inception-V3, and VGG-16, are effective and transferable to other models causing them to fail. These failures happen even when these models are adversarially retrained using state-of-the-art techniques, demonstrating the generalizability of our adversarial samples. We achieve competitive performance in terms of net $L_2$ distortion compared to state-of-the-art benchmark techniques on CIFAR-10 and ImageNet; however, we demonstrate our framework achieves such results with simple distortions like Gaussian noise without introducing unnatural artifacts or color bleeds. This is made possible by a model-based reinforcement learning (RL) agent and a technique that reduces a deep tree search of the image for model sensitivity to perturbations, to a one-level analysis and action. The flexibility of choosing distortions and setting classification probability thresholds for multiple classes makes our framework suitable for algorithmic audits.
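    Summary The paper presents a framework for generating adversarial benchmarks that evaluate the robustness of image classification models. Users can customize the types of distortion to be optimally applied to images, so the benchmark addresses the distortions relevant to their deployment. The framework can generate datasets at various distortion levels to assess different image classifiers. Adversarial samples generated with any of the image classification models, such as ResNet-50, Inception-V3, and VGG-16, are effective and transfer to other models, causing them to fail even after adversarial retraining with state-of-the-art techniques. On CIFAR-10 and ImageNet the framework is competitive with state-of-the-art benchmark techniques in net $L_2$ distortion, while using simple distortions such as Gaussian noise that introduce no unnatural artifacts or color bleeds. This is made possible by a model-based reinforcement learning (RL) agent and a technique that reduces a deep tree search of the image for model sensitivity to perturbations to a one-level analysis and action. The flexibility to choose distortions and set classification probability thresholds for multiple classes makes the framework suitable for algorithmic audits.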
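
    The distortion and the $L_2$ measure mentioned here can be illustrated without any learning component. In the sketch below the patch location is fixed by hand, whereas in the paper an RL agent decides where and how to distort; the grayscale list-of-lists image, the function names, and the parameter values are all assumptions of this example.

```python
import math
import random

def add_gaussian_patch(image, top, left, size, sigma, rng):
    """Add zero-mean Gaussian noise to one square patch, clipping pixels to [0, 1]."""
    out = [row[:] for row in image]
    for r in range(top, top + size):
        for c in range(left, left + size):
            out[r][c] = min(1.0, max(0.0, out[r][c] + rng.gauss(0.0, sigma)))
    return out

def l2_distortion(a, b):
    """Euclidean distance between two images of equal shape."""
    return math.sqrt(sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)))

image = [[0.5] * 4 for _ in range(4)]  # hypothetical 4x4 grayscale image
distorted = add_gaussian_patch(image, top=1, left=1, size=2, sigma=0.1, rng=random.Random(0))
budget_used = l2_distortion(image, distorted)
```

    Restricting the noise to a patch keeps the total $L_2$ distortion small, which is the quantity the entry compares against other benchmark techniques.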

Arbitrarily Scalable Environment Generators via Neural Cellular Automata

  • paper_url: http://arxiv.org/abs/2310.18622
  • repo_url: https://github.com/lunjohnzhang/warehouse_env_gen_nca_public
  • paper_authors: Yulun Zhang, Matthew C. Fontaine, Varun Bhatt, Stefanos Nikolaidis, Jiaoyang Li
  • for: Improving the throughput of multi-robot systems.
  • methods: Uses Quality Diversity (QD) algorithms to optimize Neural Cellular Automata (NCA) environment generators.
  • results: Generates arbitrarily large environments that maintain consistent, regularized patterns, improving the scalability of multi-robot systems.
    Abstract We study the problem of generating arbitrarily large environments to improve the throughput of multi-robot systems. Prior work proposes Quality Diversity (QD) algorithms as an effective method for optimizing the environments of automated warehouses. However, these approaches optimize only relatively small environments, falling short when it comes to replicating real-world warehouse sizes. The challenge arises from the exponential increase in the search space as the environment size increases. Additionally, the previous methods have only been tested with up to 350 robots in simulations, while practical warehouses could host thousands of robots. In this paper, instead of optimizing environments, we propose to optimize Neural Cellular Automata (NCA) environment generators via QD algorithms. We train a collection of NCA generators with QD algorithms in small environments and then generate arbitrarily large environments from the generators at test time. We show that NCA environment generators maintain consistent, regularized patterns regardless of environment size, significantly enhancing the scalability of multi-robot systems in two different domains with up to 2,350 robots. Additionally, we demonstrate that our method scales a single-agent reinforcement learning policy to arbitrarily large environments with similar patterns. We include the source code at https://github.com/lunjohnzhang/warehouse_env_gen_nca_public.
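    Summary The paper studies the problem of generating arbitrarily large environments to improve the throughput of multi-robot systems. Prior work proposed Quality Diversity (QD) algorithms for optimizing the environments of automated warehouses, but those approaches optimize only relatively small environments and fall short of real-world warehouse sizes. The difficulty comes from the exponential growth of the search space with environment size, and the earlier methods were tested in simulation with at most 350 robots, whereas practical warehouses could host thousands. Instead of optimizing environments directly, the authors optimize Neural Cellular Automata (NCA) environment generators with QD algorithms. They train a collection of NCA generators in small environments and then generate arbitrarily large environments from them at test time. The NCA generators maintain consistent, regularized patterns regardless of environment size, significantly enhancing the scalability of multi-robot systems in two domains with up to 2,350 robots. The method also scales a single-agent reinforcement learning policy to arbitrarily large environments with similar patterns. Source code is available at https://github.com/lunjohnzhang/warehouse_env_gen_nca_public.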
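
    The reason a cellular automaton scales to any grid size is that its update rule only looks at a fixed local neighbourhood. The toy below shows this with a hand-written rule in place of a trained network: the kernel weights, the bias, the binary cell states (1 for a shelf, 0 for a free cell), and the seed column are all assumptions of this sketch, whereas a real NCA would learn its weights and typically use continuous multi-channel states.

```python
def nca_step(grid, kernel, bias):
    """Apply the same thresholded local rule at every cell (zero padding at borders)."""
    rows, cols = len(grid), len(grid[0])
    new = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            total = bias
            for (dr, dc), weight in kernel.items():
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    total += weight * grid[rr][cc]
            new[r][c] = 1 if total > 0 else 0
    return new

def generate(rows, cols, kernel, bias, steps):
    """Grow a layout from a seed column of shelves by iterating the local rule."""
    grid = [[1 if c == 0 else 0 for c in range(cols)] for _ in range(rows)]
    for _ in range(steps):
        grid = nca_step(grid, kernel, bias)
    return grid

# A cell turns on if it, or the cell two columns to its left, is on.
KERNEL = {(0, 0): 1.0, (0, -2): 1.0}
small = generate(3, 6, KERNEL, bias=-0.5, steps=6)
large = generate(8, 40, KERNEL, bias=-0.5, steps=40)  # same rule, larger grid, more steps
```

    Both grids end up with the same alternating shelf-and-aisle columns, which mirrors the claim that a generator tuned on small environments keeps its pattern when run on larger ones.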

Dense Retrieval as Indirect Supervision for Large-space Decision Making

  • paper_url: http://arxiv.org/abs/2310.18619
  • repo_url: https://github.com/luka-group/ddr
  • paper_authors: Nan Xu, Fei Wang, Mingtao Dong, Muhao Chen
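  • for: Improving accuracy and generalization on discriminative tasks with large label spaces.
  • methods: Reformulates large-space discriminative tasks as a learning-to-retrieve task using dense retrieval, with a dual-encoder architecture that learns to predict by retrieving from a decision thesaurus.
  • results: DDR outperforms strong baselines by 27.54% in P@1 on two extreme multi-label classification tasks, 1.17% in F1 score on ultra-fine entity typing, and 1.26% in accuracy on average over three few-shot intent classification tasks.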
    Abstract Many discriminative natural language understanding (NLU) tasks have large label spaces. Learning such a process of large-space decision making is particularly challenging due to the lack of training instances per label and the difficulty of selection among many fine-grained labels. Inspired by dense retrieval methods for passage finding in open-domain QA, we propose a reformulation of large-space discriminative NLU tasks as a learning-to-retrieve task, leading to a novel solution named Dense Decision Retrieval (DDR). Instead of predicting fine-grained decisions as logits, DDR adopts a dual-encoder architecture that learns to predict by retrieving from a decision thesaurus. This approach not only leverages rich indirect supervision signals from easy-to-consume learning resources for dense retrieval, it also leads to enhanced prediction generalizability with a semantically meaningful representation of the large decision space. When evaluated on tasks with decision spaces ranging from hundreds to hundred-thousand scales, DDR outperforms strong baselines greatly by 27.54% in P@1 on two extreme multi-label classification tasks, 1.17% in F1 score on ultra-fine entity typing, and 1.26% in accuracy on three few-shot intent classification tasks on average. Code and resources are available at https://github.com/luka-group/DDR
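    Summary Many discriminative natural language understanding (NLU) tasks have large label spaces. Learning to make decisions in such large spaces is hard because there are few training instances per label and many fine-grained labels to choose among. Inspired by dense retrieval methods for passage finding in open-domain QA, the authors reformulate large-space discriminative NLU tasks as a learning-to-retrieve task, giving a solution named Dense Decision Retrieval (DDR). Instead of predicting fine-grained decisions as logits, DDR uses a dual-encoder architecture that learns to predict by retrieving from a decision thesaurus. This exploits rich indirect supervision signals from easy-to-consume learning resources for dense retrieval and improves generalization through a semantically meaningful representation of the large decision space. On tasks with decision spaces ranging from hundreds to hundreds of thousands of labels, DDR outperforms strong baselines by 27.54% in P@1 on two extreme multi-label classification tasks, 1.17% in F1 score on ultra-fine entity typing, and 1.26% in accuracy on average over three few-shot intent classification tasks. Code and resources are available at https://github.com/luka-group/DDR.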
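
    Prediction by retrieval can be sketched with a shared encoder that embeds both the input and each label description, followed by a similarity ranking. The block below is a deliberately crude stand-in: a normalised bag-of-words vector replaces the trained dense encoders, and the thesaurus entries, label names, and function names are invented for this example.

```python
import math
from collections import Counter

def encode(text):
    """Toy encoder: L2-normalised bag-of-words vector stored as a dict."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(v * v for v in counts.values())) or 1.0
    return {token: v / norm for token, v in counts.items()}

def score(query_vec, label_vec):
    """Dot product between two sparse vectors."""
    return sum(w * label_vec.get(token, 0.0) for token, w in query_vec.items())

def retrieve(query, thesaurus, k=1):
    """Rank labels by similarity between the query and each label description."""
    q = encode(query)
    ranked = sorted(thesaurus, key=lambda label: score(q, encode(thesaurus[label])), reverse=True)
    return ranked[:k]

# Hypothetical decision thesaurus: label -> natural-language description.
thesaurus = {
    "book_flight": "reserve a plane ticket or flight",
    "play_music": "play a song or music track",
    "check_weather": "weather forecast rain or sun",
}
prediction = retrieve("please play this song", thesaurus)
```

    Because labels are represented by descriptions rather than by output logits, a new label can be added to the thesaurus without changing the size of any output layer.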

Hierarchical Mutual Information Analysis: Towards Multi-view Clustering in The Wild

  • paper_url: http://arxiv.org/abs/2310.18614
  • repo_url: None
  • paper_authors: Jiatai Wang, Zhiwei Xu, Xuewen Yang, Xin Wang
  • for: This paper focuses on addressing the challenges of missing and unaligned data in multi-view clustering, which is a common problem in practical computer vision applications.
  • methods: The proposed method uses a deep framework that combines data recovery and alignment in a hierarchically consistent way, leveraging dual prediction and contrastive reconstruction to achieve instance-level and class-level alignment.
  • results: The proposed method significantly outperforms state-of-the-art methods on multi-view clustering even in the cases of view missing and unalignment, as demonstrated by extensive experiments on public datasets.
    Abstract Multi-view clustering (MVC) can explore common semantics from unsupervised views generated by different sources, and thus has been extensively used in applications of practical computer vision. Due to the spatio-temporal asynchronism, multi-view data often suffer from view missing and are unaligned in real-world applications, which makes it difficult to learn consistent representations. To address the above issues, this work proposes a deep MVC framework where data recovery and alignment are fused in a hierarchically consistent way to maximize the mutual information among different views and ensure the consistency of their latent spaces. More specifically, we first leverage dual prediction to fill in missing views while achieving the instance-level alignment, and then take the contrastive reconstruction to achieve the class-level alignment. To the best of our knowledge, this could be the first successful attempt to handle the missing and unaligned data problem separately with different learning paradigms. Extensive experiments on public datasets demonstrate that our method significantly outperforms state-of-the-art methods on multi-view clustering even in the cases of view missing and unalignment.

Embedding in Recommender Systems: A Survey

  • paper_url: http://arxiv.org/abs/2310.18608
  • repo_url: None
  • paper_authors: Xiangyu Zhao, Maolin Wang, Xinjian Zhao, Jiansheng Li, Shucheng Zhou, Dawei Yin, Qing Li, Jiliang Tang, Ruocheng Guo
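  • for: This paper surveys recent research on embedding techniques in recommender systems.
  • methods: The survey covers several families of embedding methods, including collaborative filtering, self-supervised learning, and graph-based techniques.
  • results: The survey reviews directions that aim to improve recommendation performance and reduce computational complexity, including AutoML, hashing techniques, and quantization techniques.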
    Abstract Recommender systems have become an essential component of many online platforms, providing personalized recommendations to users. A crucial aspect is the use of embedding techniques, which convert high-dimensional discrete features, such as user and item IDs, into low-dimensional continuous vectors and can enhance recommendation performance. Applying embedding techniques captures complex entity relationships and has spurred substantial research. In this survey, we provide an overview of the recent literature on embedding techniques in recommender systems. This survey covers embedding methods like collaborative filtering, self-supervised learning, and graph-based techniques. Collaborative filtering generates embeddings capturing user-item preferences, excelling in sparse data. Self-supervised methods leverage contrastive or generative learning for various tasks. Graph-based techniques like node2vec exploit complex relationships in network-rich environments. Addressing the scalability challenges inherent to embedding methods, our survey delves into innovative directions within the field of recommendation systems. These directions aim to enhance performance and reduce computational complexity, paving the way for improved recommender systems. Among these innovative approaches, we will introduce Auto Machine Learning (AutoML), hash techniques, and quantization techniques in this survey. We discuss various architectures and techniques and highlight the challenges and future directions in these aspects. This survey aims to provide a comprehensive overview of the state-of-the-art in this rapidly evolving field and serve as a useful resource for researchers and practitioners working in the area of recommender systems.
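    The abstract names hash techniques and quantization as ways to reduce the cost of embedding tables. The sketch below illustrates both ideas in their simplest textbook form (the hashing trick and uniform 8-bit quantization). It is not taken from the survey. The class name `HashedEmbedding`, the bucket count, and the helper functions are hypothetical.

```python
import random
import zlib


class HashedEmbedding:
    """Embedding table shared by all IDs that hash to the same bucket."""

    def __init__(self, num_buckets, dim, seed=0):
        rng = random.Random(seed)
        self.num_buckets = num_buckets
        self.table = [
            [rng.uniform(-0.1, 0.1) for _ in range(dim)]
            for _ in range(num_buckets)
        ]

    def bucket(self, raw_id):
        # crc32 is deterministic across runs, unlike Python's built-in hash() for strings.
        return zlib.crc32(str(raw_id).encode("utf-8")) % self.num_buckets

    def lookup(self, raw_id):
        return self.table[self.bucket(raw_id)]


def quantize_uint8(vector):
    """Map floats to integer codes in [0, 255] with a shared offset and scale."""
    lo, hi = min(vector), max(vector)
    scale = (hi - lo) / 255 or 1.0  # fall back to 1.0 for a constant vector
    codes = [round((x - lo) / scale) for x in vector]
    return codes, lo, scale


def dequantize(codes, lo, scale):
    """Approximate reconstruction of the original floats."""
    return [lo + c * scale for c in codes]


emb = HashedEmbedding(num_buckets=16, dim=4)
print(emb.bucket("user_123"), emb.lookup("user_123"))

codes, lo, scale = quantize_uint8([-0.5, 0.0, 0.25, 1.0])
print(codes, dequantize(codes, lo, scale))
```

    Hashing bounds the table size at the cost of collisions between IDs, while quantization shrinks each stored vector at the cost of a reconstruction error of at most half a quantization step per entry.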

  • paper_url: http://arxiv.org/abs/2310.18600
  • repo_url: https://github.com/law-ai/mildsum
  • paper_authors: Debtanu Datta, Shubham Soni, Rajdeep Mukherjee, Saptarshi Ghosh
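  • for: This study targets cross-lingual summarization of English legal documents into Hindi, to support more equitable access to justice in the Indian legal system.
  • methods: The study benchmarks several diverse summarization approaches on its corpus to evaluate their performance in the legal domain.
  • results: The benchmark indicates that cross-lingual summarization in the legal domain still needs further research to improve the accuracy and readability of the summaries.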
    Abstract Automatic summarization of legal case judgments is a practically important problem that has attracted substantial research efforts in many countries. In the context of the Indian judiciary, there is an additional complexity -- Indian legal case judgments are mostly written in complex English, but a significant portion of India's population lacks command of the English language. Hence, it is crucial to summarize the legal documents in Indian languages to ensure equitable access to justice. While prior research primarily focuses on summarizing legal case judgments in their source languages, this study presents a pioneering effort toward cross-lingual summarization of English legal documents into Hindi, the most frequently spoken Indian language. We construct the first high-quality legal corpus comprising of 3,122 case judgments from prominent Indian courts in English, along with their summaries in both English and Hindi, drafted by legal practitioners. We benchmark the performance of several diverse summarization approaches on our corpus and demonstrate the need for further research in cross-lingual summarization in the legal domain.

Using Early Readouts to Mediate Featural Bias in Distillation

  • paper_url: http://arxiv.org/abs/2310.18590
  • repo_url: None
  • paper_authors: Rishabh Tiwari, Durga Sivasubramanian, Anmol Mekala, Ganesh Ramakrishnan, Pradeep Shenoy
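  • for: This work aims to mitigate the spurious feature-label correlations that deep networks learn in real-world supervised learning tasks, particularly in distillation, where the student model may have less representational capacity than the corresponding teacher model.
  • methods: The authors propose a novel early readout mechanism that predicts the label from the representations of earlier network layers. These early readouts automatically identify problem instances or groups in the form of confident but incorrect predictions.
  • results: Using the early readouts as instance-level signals to modulate the distillation loss improves both group fairness measures and the overall accuracy of the student model across benchmark datasets. Secondary analyses give insight into the role of feature learning in supervision and distillation.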
    Abstract Deep networks tend to learn spurious feature-label correlations in real-world supervised learning tasks. This vulnerability is aggravated in distillation, where a student model may have lesser representational capacity than the corresponding teacher model. Often, knowledge of specific spurious correlations is used to reweight instances & rebalance the learning process. We propose a novel early readout mechanism whereby we attempt to predict the label using representations from earlier network layers. We show that these early readouts automatically identify problem instances or groups in the form of confident, incorrect predictions. Leveraging these signals to modulate the distillation loss on an instance level allows us to substantially improve not only group fairness measures across benchmark datasets, but also overall accuracy of the student model. We also provide secondary analyses that bring insight into the role of feature learning in supervision and distillation.
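    The abstract says that confident, incorrect early-readout predictions are used to modulate the distillation loss per instance, without stating the formula. The sketch below shows one possible weighting under stated assumptions. The rule `1 + alpha * confidence`, the direction of the reweighting, and all function names are hypothetical and should not be read as the paper's method.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, shifted by the max for stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of the same length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def instance_weight(early_logits, label, alpha=1.0):
    """Assumed rule: up-weight instances the early readout gets confidently wrong."""
    probs = softmax(early_logits)
    predicted = max(range(len(probs)), key=probs.__getitem__)
    if predicted != label:
        return 1.0 + alpha * probs[predicted]
    return 1.0


def weighted_distillation_loss(teacher_logits, student_logits, early_logits,
                               label, temperature=2.0, alpha=1.0):
    """Per-instance distillation term scaled by the early-readout weight."""
    teacher = softmax(teacher_logits, temperature)
    student = softmax(student_logits, temperature)
    return instance_weight(early_logits, label, alpha) * kl_divergence(teacher, student)


# The early readout is confidently wrong here (predicts class 0, label is 1).
print(weighted_distillation_loss([1.0, 2.0], [2.0, 1.0], [3.0, 0.0], label=1))
```

    The only part grounded in the abstract is the signal itself: confidence of an early-layer prediction combined with its correctness. How that signal scales the loss is a design choice left open here.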

Visual Explanations via Iterated Integrated Attributions

  • paper_url: http://arxiv.org/abs/2310.18585
  • repo_url: None
  • paper_authors: Oren Barkan, Yehonatan Elisha, Yuval Asher, Amit Eshel, Noam Koenigstein
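  • for: This paper addresses the problem of explaining the predictions of vision models.
  • methods: The paper uses Iterated Integrated Attributions (IIA), which iteratively integrates across the input image, the model's internal representations, and their gradients to produce precise and focused explanation maps.
  • results: Experiments show that IIA produces accurate explanation maps across various tasks, datasets, and network architectures, outperforming other state-of-the-art explanation techniques.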
    Abstract We introduce Iterated Integrated Attributions (IIA) - a generic method for explaining the predictions of vision models. IIA employs iterative integration across the input image, the internal representations generated by the model, and their gradients, yielding precise and focused explanation maps. We demonstrate the effectiveness of IIA through comprehensive evaluations across various tasks, datasets, and network architectures. Our results showcase that IIA produces accurate explanation maps, outperforming other state-of-the-art explanation techniques.
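    IIA integrates gradients along paths through the input and the internal representations. As background only, the sketch below implements plain Integrated Gradients over the input with a Riemann sum. It is the simpler, well-known technique and not IIA itself. The toy model, the finite-difference gradient, and the step count are assumptions chosen so the example runs without a deep learning library.

```python
def numerical_gradient(f, x, eps=1e-5):
    """Central-difference gradient of a scalar function f at point x."""
    grad = []
    for i in range(len(x)):
        up = list(x)
        down = list(x)
        up[i] += eps
        down[i] -= eps
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad


def integrated_gradients(f, x, baseline, steps=200):
    """Attribution_i = (x_i - baseline_i) * average gradient_i along the straight path."""
    n = len(x)
    avg_grad = [0.0] * n
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint rule on [0, 1]
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        grad = numerical_gradient(f, point)
        for i in range(n):
            avg_grad[i] += grad[i] / steps
    return [(xi - b) * g for xi, b, g in zip(x, baseline, avg_grad)]


def toy_model(x):
    """Hypothetical stand-in for a network output score."""
    return x[0] * x[1] + x[2] ** 2


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
attributions = integrated_gradients(toy_model, x, baseline)
print(attributions, sum(attributions), toy_model(x) - toy_model(baseline))
```

    The attributions sum to the difference between the model output at the input and at the baseline (the completeness property), which gives a quick sanity check for any implementation of this family of methods.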

Breaking the Trilemma of Privacy, Utility, Efficiency via Controllable Machine Unlearning

  • paper_url: http://arxiv.org/abs/2310.18574
  • repo_url: https://github.com/guangyaodou/conmu
  • paper_authors: Zheyuan Liu, Guangyao Dou, Yijun Tian, Chunhui Zhang, Eli Chien, Ziwei Zhu
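  • for: The paper targets data privacy in machine learning models, specifically machine unlearning with a controllable trade-off among privacy, utility, and efficiency.
  • methods: The paper proposes Controllable Machine Unlearning (ConMU), a framework with three modules: an important data selection module, a progressive Gaussian mechanism module, and an unlearning proxy module. Together these modules control the privacy-utility-efficiency trade-off.
  • results: Experiments on various benchmark datasets show that the ConMU control mechanism adapts robustly, outperforms established unlearning methods, and lets practitioners account for different real-world privacy regulations.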
    Abstract Machine Unlearning (MU) algorithms have become increasingly critical due to the imperative adherence to data privacy regulations. The primary objective of MU is to erase the influence of specific data samples on a given model without the need to retrain it from scratch. Accordingly, existing methods focus on maximizing user privacy protection. However, there are different degrees of privacy regulations for each real-world web-based application. Exploring the full spectrum of trade-offs between privacy, model utility, and runtime efficiency is critical for practical unlearning scenarios. Furthermore, designing the MU algorithm with simple control of the aforementioned trade-off is desirable but challenging due to the inherent complex interaction. To address the challenges, we present Controllable Machine Unlearning (ConMU), a novel framework designed to facilitate the calibration of MU. The ConMU framework contains three integral modules: an important data selection module that reconciles the runtime efficiency and model generalization, a progressive Gaussian mechanism module that balances privacy and model generalization, and an unlearning proxy that controls the trade-offs between privacy and runtime efficiency. Comprehensive experiments on various benchmark datasets have demonstrated the robust adaptability of our control mechanism and its superiority over established unlearning methods. ConMU explores the full spectrum of the Privacy-Utility-Efficiency trade-off and allows practitioners to account for different real-world regulations. Source code available at: https://github.com/guangyaodou/ConMU.

A General Framework for Robust G-Invariance in G-Equivariant Networks

  • paper_url: http://arxiv.org/abs/2310.18564
  • repo_url: https://github.com/gtc-invariance/gtc-invariance
  • paper_authors: Sophia Sanborn, Nina Miolane
  • for: The paper proposes the G-triple-correlation (G-TC) layer, a method for achieving robust group-invariance in group-equivariant convolutional neural networks (G-CNNs).
  • methods: The G-TC layer leverages the theory of the triple correlation on groups, which is the unique, lowest-degree polynomial invariant map that is also complete.
  • results: The G-TC layer yields measurable improvements in classification accuracy over standard Max G-Pooling in G-CNN architectures and is resistant to invariance-based adversarial attacks. The method is demonstrated on several groups acting on $\mathbb{R}^2$ and $\mathbb{R}^3$ on the G-MNIST and G-ModelNet10 datasets.
    Abstract We introduce a general method for achieving robust group-invariance in group-equivariant convolutional neural networks ($G$-CNNs), which we call the $G$-triple-correlation ($G$-TC) layer. The approach leverages the theory of the triple-correlation on groups, which is the unique, lowest-degree polynomial invariant map that is also complete. Many commonly used invariant maps - such as the max - are incomplete: they remove both group and signal structure. A complete invariant, by contrast, removes only the variation due to the actions of the group, while preserving all information about the structure of the signal. The completeness of the triple correlation endows the $G$-TC layer with strong robustness, which can be observed in its resistance to invariance-based adversarial attacks. In addition, we observe that it yields measurable improvements in classification accuracy over standard Max $G$-Pooling in $G$-CNN architectures. We provide a general and efficient implementation of the method for any discretized group, which requires only a table defining the group's product structure. We demonstrate the benefits of this method for $G$-CNNs defined on both commutative and non-commutative groups - $SO(2)$, $O(2)$, $SO(3)$, and $O(3)$ (discretized as the cyclic $C8$, dihedral $D16$, chiral octahedral $O$ and full octahedral $O_h$ groups) - acting on $\mathbb{R}^2$ and $\mathbb{R}^3$ on both $G$-MNIST and $G$-ModelNet10 datasets.
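    The abstract notes that the method needs only a table defining the group's product structure. The sketch below computes the triple correlation of a real-valued signal on a finite group from such a table, using $T(g_1, g_2) = \sum_g f(g)\, f(g g_1)\, f(g g_2)$, and checks it on the cyclic group $C_8$. This is an illustration of the invariant under stated assumptions, not the authors' layer. The function names and the toy signals are hypothetical, and complex conjugation is omitted because the signal is real.

```python
def cyclic_table(n):
    """Product (Cayley) table of the cyclic group C_n: table[a][b] = a * b."""
    return [[(a + b) % n for b in range(n)] for a in range(n)]


def triple_correlation(signal, table):
    """T[g1][g2] = sum over g of f(g) * f(g g1) * f(g g2)."""
    n = len(signal)
    return [
        [
            sum(signal[g] * signal[table[g][g1]] * signal[table[g][g2]]
                for g in range(n))
            for g2 in range(n)
        ]
        for g1 in range(n)
    ]


def act(signal, table, h):
    """Group action on a signal: the value at g moves to h g."""
    shifted = [0] * len(signal)
    for g in range(len(signal)):
        shifted[table[h][g]] = signal[g]
    return shifted


table = cyclic_table(8)
f = [1, 2, 0, 3, 0, 0, 1, 0]
shifted = act(f, table, 3)

# Invariance: the triple correlation ignores the group action.
print(triple_correlation(shifted, table) == triple_correlation(f, table))

# Max pooling is also invariant but incomplete: h is not a shift of f,
# yet both have the same maximum. The triple correlation tells them apart.
h = [3, 0, 0, 0, 0, 0, 0, 0]
print(max(h) == max(f), triple_correlation(h, table) == triple_correlation(f, table))
```

    Integer signals are used so the invariance check is an exact equality. The output has size quadratic in the group order, which is the price of retaining the signal structure that max pooling discards.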

Optimization-Free Test-Time Adaptation for Cross-Person Activity Recognition

  • paper_url: http://arxiv.org/abs/2310.18562
  • repo_url: https://github.com/Claydon-Wang/OFTTA
  • paper_authors: Shuoyuan Wang, Jindong Wang, HuaJun Xi, Bob Zhang, Lei Zhang, Hongxin Wei
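  • for: This paper addresses the performance degradation of human activity recognition (HAR) models in real-world applications and how test-time adaptation (TTA) on the test stream can mitigate it.
  • methods: The paper proposes an Optimization-Free Test-Time Adaptation (OFTTA) framework for handling domain shift during real-time inference. OFTTA replaces conventional batch normalization (CBN) layers with Exponential Decay Test-time Normalization (EDTN) and adjusts the classifier using distances between features and prototypes computed from a maintained support set.
  • results: On three public cross-person HAR datasets and two different TTA settings, OFTTA outperforms state-of-the-art TTA approaches in both classification performance and computational efficiency. The authors also verify OFTTA on edge devices, indicating possible deployment in real applications.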
    Abstract Human Activity Recognition (HAR) models often suffer from performance degradation in real-world applications due to distribution shifts in activity patterns across individuals. Test-Time Adaptation (TTA) is an emerging learning paradigm that aims to utilize the test stream to adjust predictions in real-time inference, which has not been explored in HAR before. However, the high computational cost of optimization-based TTA algorithms makes it intractable to run on resource-constrained edge devices. In this paper, we propose an Optimization-Free Test-Time Adaptation (OFTTA) framework for sensor-based HAR. OFTTA adjusts the feature extractor and linear classifier simultaneously in an optimization-free manner. For the feature extractor, we propose Exponential Decay Test-time Normalization (EDTN) to replace the conventional batch normalization (CBN) layers. EDTN combines CBN and Test-time batch Normalization (TBN) to extract reliable features against domain shifts with TBN's influence decreasing exponentially in deeper layers. For the classifier, we adjust the prediction by computing the distance between the feature and the prototype, which is calculated by a maintained support set. In addition, the update of the support set is based on the pseudo label, which can benefit from reliable features extracted by EDTN. Extensive experiments on three public cross-person HAR datasets and two different TTA settings demonstrate that OFTTA outperforms the state-of-the-art TTA approaches in both classification performance and computational efficiency. Finally, we verify the superiority of our proposed OFTTA on edge devices, indicating possible deployment in real applications. Our code is available at https://github.com/Claydon-Wang/OFTTA.
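    The abstract describes two optimization-free pieces: a normalization that blends source (CBN) and test-batch (TBN) statistics with the test-batch influence decaying exponentially with depth, and a classifier that compares features with class prototypes built from a support set. The sketch below is one way to read that description. The blending rule `decay ** layer_index`, the scalar-feature simplification, and all names are assumptions, not the authors' implementation.

```python
import math


def batch_stats(values):
    """Mean and (biased) variance of a batch of scalar activations."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, var


def edtn_normalize(batch, source_mean, source_var, layer_index, decay=0.5, eps=1e-5):
    """Normalize with a mix of test-batch and source statistics.

    Assumed schedule: the test-batch weight is decay ** layer_index, so it is
    1.0 at the first layer and shrinks exponentially in deeper layers.
    """
    weight = decay ** layer_index
    test_mean, test_var = batch_stats(batch)
    mean = weight * test_mean + (1.0 - weight) * source_mean
    var = weight * test_var + (1.0 - weight) * source_var
    return [(v - mean) / math.sqrt(var + eps) for v in batch]


def nearest_prototype(feature, support_set):
    """Predict the label whose prototype (mean support feature) is closest."""
    best_label, best_dist = None, float("inf")
    for label, features in support_set.items():
        prototype = [sum(col) / len(features) for col in zip(*features)]
        dist = math.dist(feature, prototype)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


batch = [2.0, 4.0, 6.0, 8.0]
shallow = edtn_normalize(batch, source_mean=0.0, source_var=1.0, layer_index=0)
deep = edtn_normalize(batch, source_mean=0.0, source_var=1.0, layer_index=60)
print(shallow, deep)

support = {"walk": [[0.0, 0.0], [0.2, 0.0]], "run": [[1.0, 1.0], [1.2, 1.0]]}
print(nearest_prototype([0.9, 1.1], support))
```

    At layer 0 the sketch reduces to pure test-batch normalization, and at large depth it approaches conventional normalization with source statistics. Neither step requires gradients, which matches the optimization-free framing in the abstract.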

Deep Intrinsic Decomposition with Adversarial Learning for Hyperspectral Image Classification

  • paper_url: http://arxiv.org/abs/2310.18549
  • repo_url: None
  • paper_authors: Zhiqiang Gong, Xian Zhou, Wen Yao
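  • for: To improve hyperspectral image classification performance under the influence of interfering environmental factors.
  • methods: The paper uses deep intrinsic decomposition with adversarial learning (AdverDecom): a generative network (HyperNet) extracts environment-related and category-related features, a discriminative network distinguishes environmental categories, and a joint environmental and category loss drives the adversarial learning.
  • results: Experiments on three commonly used real-world datasets, with comparisons against other methods, show that the proposed method improves hyperspectral image classification performance.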
    Abstract Convolutional neural networks (CNNs) have demonstrated a powerful ability to extract discriminative features for hyperspectral image classification. However, general deep learning methods for CNNs ignore the influence of complex environmental factors, which enlarge the intra-class variance and decrease the inter-class variance. This makes discriminative features harder to extract. To overcome this problem, this work develops a novel deep intrinsic decomposition with adversarial learning, namely AdverDecom, for hyperspectral image classification to mitigate the negative impact of environmental factors on classification performance. First, we develop a generative network for hyperspectral image (HyperNet) to extract the environmental-related feature and category-related feature from the image. Then, a discriminative network is constructed to distinguish different environmental categories. Finally, an environmental and category joint learning loss is developed for adversarial learning to make the deep model learn discriminative features. Experiments are conducted over three commonly used real-world datasets and the comparison results show the superiority of the proposed method. The implementation of the proposed method and other compared methods could be accessed at https://github.com/shendu-sw/Adversarial Learning Intrinsic Decomposition for the sake of reproducibility.

ReConTab: Regularized Contrastive Representation Learning for Tabular Data

  • paper_url: http://arxiv.org/abs/2310.18541
  • repo_url: None
  • paper_authors: Suiyao Chen, Jing Wu, Naira Hovakimyan, Handong Yao
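  • for: This work proposes a deep automatic representation learning framework that reduces the manual feature engineering and selection required for tabular data.
  • methods: The framework builds an asymmetric autoencoder on the same raw features as the model inputs and applies regularization techniques for raw feature selection. It also uses contrastive learning to distill the most pertinent information for downstream tasks.
  • results: Experiments show substantial and robust performance improvements on extensive real-world datasets, and the pre-trained embeddings integrate easily with traditional methods such as XGBoost and Random Forest.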
    Abstract Representation learning stands as one of the critical machine learning techniques across various domains. Through the acquisition of high-quality features, pre-trained embeddings significantly reduce input space redundancy, benefiting downstream pattern recognition tasks such as classification, regression, or detection. Nonetheless, in the domain of tabular data, feature engineering and selection still heavily rely on manual intervention, leading to time-consuming processes and necessitating domain expertise. In response to this challenge, we introduce ReConTab, a deep automatic representation learning framework with regularized contrastive learning. Agnostic to any type of modeling task, ReConTab constructs an asymmetric autoencoder based on the same raw features from model inputs, producing low-dimensional representative embeddings. Specifically, regularization techniques are applied for raw feature selection. Meanwhile, ReConTab leverages contrastive learning to distill the most pertinent information for downstream tasks. Experiments conducted on extensive real-world datasets substantiate the framework's capacity to yield substantial and robust performance improvements. Furthermore, we empirically demonstrate that pre-trained embeddings can seamlessly integrate as easily adaptable features, enhancing the performance of various traditional methods such as XGBoost and Random Forest.