cs.LG - 2023-10-16

Active Learning Framework for Cost-Effective TCR-Epitope Binding Affinity Prediction

  • paper_url: http://arxiv.org/abs/2310.10893
  • repo_url: https://github.com/lee-cbg/activetcr
  • paper_authors: Pengfei Zhang, Seojin Bang, Heewook Lee
  • for: This study aims to improve prediction of binding affinity between TCRs and epitope sequences, i.e. how well TCRs recognize presented epitopes, while keeping the cost of wet-lab annotation low.
  • methods: ActiveTCR couples active learning with TCR-epitope binding affinity prediction models: starting from a small labeled training set, it iteratively queries the unlabeled TCR-epitope pairs that are most "worth" annotating (a minimal query-loop sketch follows this entry).
  • results: Active learning reduces annotation costs by approximately 40%; in addition, providing ground-truth labels of TCR-epitope pairs to the query strategies helps identify and remove more than 40% redundancy among already annotated pairs without increasing the training data or compromising model performance.
    Abstract T cell receptors (TCRs) are critical components of adaptive immune systems, responsible for responding to threats by recognizing epitope sequences presented on host cell surface. Computational prediction of binding affinity between TCRs and epitope sequences using machine/deep learning has attracted intense attention recently. However, its success is hindered by the lack of large collections of annotated TCR-epitope pairs. Annotating their binding affinity requires expensive and time-consuming wet-lab evaluation. To reduce annotation cost, we present ActiveTCR, a framework that incorporates active learning and TCR-epitope binding affinity prediction models. Starting with a small set of labeled training pairs, ActiveTCR iteratively searches for unlabeled TCR-epitope pairs that are ''worth'' for annotation. It aims to maximize performance gains while minimizing the cost of annotation. We compared four query strategies with a random sampling baseline and demonstrated that ActiveTCR reduces annotation costs by approximately 40%. Furthermore, we showed that providing ground truth labels of TCR-epitope pairs to query strategies can help identify and reduce more than 40% redundancy among already annotated pairs without compromising model performance, enabling users to train equally powerful prediction models with less training data. Our work is the first systematic investigation of data optimization for TCR-epitope binding affinity prediction.
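
The listing does not reproduce ActiveTCR's query strategies, but the general active-learning loop it builds on can be sketched as follows. This is a minimal, hypothetical illustration using uncertainty sampling with a scikit-learn classifier on synthetic pair features; the feature dimensions, seed-set size, and query budget are placeholder assumptions, not the authors' code.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic stand-in for featurized TCR-epitope pairs (binding = 1, non-binding = 0).
X = rng.normal(size=(5000, 32))
y = (X[:, :4].sum(axis=1) + 0.5 * rng.normal(size=5000) > 0).astype(int)

labeled = list(rng.choice(len(X), size=100, replace=False))   # small seed set
unlabeled = [i for i in range(len(X)) if i not in set(labeled)]

model = LogisticRegression(max_iter=1000)
for round_ in range(5):                       # each round = one "annotation" batch
    model.fit(X[labeled], y[labeled])
    # Uncertainty sampling: query pairs whose predicted binding probability is closest to 0.5.
    probs = model.predict_proba(X[unlabeled])[:, 1]
    uncertainty = -np.abs(probs - 0.5)
    query = [unlabeled[i] for i in np.argsort(uncertainty)[-200:]]
    labeled += query                          # labels for queried pairs are revealed ("wet lab")
    unlabeled = [i for i in unlabeled if i not in set(query)]
    print(f"round {round_}: labeled={len(labeled)}, accuracy={model.score(X, y):.3f}")
```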

The Calysto Scheme Project

  • paper_url: http://arxiv.org/abs/2310.10886
  • repo_url: https://github.com/calysto/calysto_scheme
  • paper_authors: Douglas S. Blank, James B. Marshall
  • for: This paper introduces Calysto Scheme, a Scheme implementation built on top of Python, and describes the design choices that make it practical and easy to adopt.
  • methods: Calysto Scheme is written in Scheme in Continuation-Passing Style and converted into Python through a series of correctness-preserving program transformations. It supports standard Scheme functionality, including call/cc, plus syntactic extensions, a nondeterministic operator for automatic backtracking, and Python interoperation.
  • results: Calysto Scheme is simple to use and install, integrates with the Jupyter Notebook ecosystem, and has been used in the classroom to teach introductory Programming Languages with some interesting and unique twists; its Python foundation also makes modern Python libraries, such as those for machine learning, available in teaching and practical settings.
    Abstract Calysto Scheme is written in Scheme in Continuation-Passing Style, and converted through a series of correctness-preserving program transformations into Python. It has support for standard Scheme functionality, including call/cc, as well as syntactic extensions, a nondeterministic operator for automatic backtracking, and many extensions to allow Python interoperation. Because of its Python foundation, it can take advantage of modern Python libraries, including those for machine learning and other pedagogical contexts. Although Calysto Scheme was developed with educational purposes in mind, it has proven to be generally useful due to its simplicity and ease of installation. It has been integrated into the Jupyter Notebook ecosystem and used in the classroom to teach introductory Programming Languages with some interesting and unique twists.

BLoad: Enhancing Neural Network Training with Efficient Sequential Data Handling

  • paper_url: http://arxiv.org/abs/2310.10879
  • repo_url: None
  • paper_authors: Raphael Ruschel, A. S. M. Iftekhar, B. S. Manjunath, Suya You
  • for: Improve training efficiency for modern deep learning models on ever-larger datasets of variable-length sequences.
  • methods: Proposes a new training scheme for distributed data-parallel training that handles sequences of different lengths with minimal overhead and without adding excessive padding (a greedy packing sketch follows this entry).
  • results: The scheme reduces the amount of padding by more than 100x without deleting a single frame, improving both training time and Recall in the experiments.
    Abstract The increasing complexity of modern deep neural network models and the expanding sizes of datasets necessitate the development of optimized and scalable training methods. In this white paper, we addressed the challenge of efficiently training neural network models using sequences of varying sizes. To address this challenge, we propose a novel training scheme that enables efficient distributed data-parallel training on sequences of different sizes with minimal overhead. By using this scheme we were able to reduce the padding amount by more than 100$x$ while not deleting a single frame, resulting in an overall increased performance on both training time and Recall in our experiments.
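
BLoad's exact scheme is not published in this listing, but the padding problem it targets can be illustrated with a simple greedy bin-packing of variable-length sequences into fixed-size blocks. The function name, block length, and synthetic lengths below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def greedy_pack(lengths, block_size):
    """Greedily pack variable-length sequences into blocks of at most `block_size` frames.

    Padding is then only needed to fill the residual space of each block, instead of
    padding every sequence up to the global maximum length.
    """
    order = np.argsort(lengths)[::-1]          # longest sequences first
    blocks, residuals = [], []
    for idx in order:
        placed = False
        for b, free in enumerate(residuals):
            if lengths[idx] <= free:           # first block with enough room
                blocks[b].append(int(idx))
                residuals[b] -= lengths[idx]
                placed = True
                break
        if not placed:
            blocks.append([int(idx)])
            residuals.append(block_size - lengths[idx])
    return blocks, residuals

lengths = np.random.default_rng(0).integers(10, 300, size=64)
blocks, residuals = greedy_pack(lengths, block_size=300)
naive_padding = (lengths.max() - lengths).sum()     # pad every sequence to max length
packed_padding = int(sum(residuals))                # only pad block remainders
print(f"naive padding: {naive_padding} frames, packed padding: {packed_padding} frames")
```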

Eco-Driving Control of Connected and Automated Vehicles using Neural Network based Rollout

  • paper_url: http://arxiv.org/abs/2310.10878
  • repo_url: None
  • paper_authors: Jacob Paugh, Zhaoxuan Zhu, Shobhit Gupta, Marcello Canova, Stephanie Stockar
  • for: Reduce the energy consumption of connected and automated vehicles by optimizing vehicle velocity and powertrain dynamics en route using Vehicle-to-Everything information.
  • methods: A hierarchical multi-horizon optimization framework: a neural network learns a full-route value function, which is then used to approximate the terminal cost in a receding-horizon optimization.
  • results: In simulations over real-world routes, the proposed approach matches the performance of a stochastic optimization solution obtained via reinforcement learning, while requiring no sophisticated training paradigm and negligible on-board memory.
    Abstract Connected and autonomous vehicles have the potential to minimize energy consumption by optimizing the vehicle velocity and powertrain dynamics with Vehicle-to-Everything info en route. Existing deterministic and stochastic methods created to solve the eco-driving problem generally suffer from high computational and memory requirements, which makes online implementation challenging. This work proposes a hierarchical multi-horizon optimization framework implemented via a neural network. The neural network learns a full-route value function to account for the variability in route information and is then used to approximate the terminal cost in a receding horizon optimization. Simulations over real-world routes demonstrate that the proposed approach achieves comparable performance to a stochastic optimization solution obtained via reinforcement learning, while requiring no sophisticated training paradigm and negligible on-board memory.

Religious Affiliation in the Twenty-First Century: A Machine Learning Perspective on the World Value Survey

  • paper_url: http://arxiv.org/abs/2310.10874
  • repo_url: None
  • paper_authors: Elaheh Jafarigol, William Keely, Tess Hartog, Tom Welborn, Peyman Hekmatpour, Theodore B. Trafalis
  • for: A quantitative analysis of globally collected World Value Survey data, studying trajectories of change in individuals' religious beliefs, values, and behaviors across societies.
  • methods: Random forests are used to identify the key factors of religiosity and to classify survey respondents as religious or non-religious using country-level data; resampling techniques balance the data and improve imbalanced-learning performance metrics (a sketch follows this entry).
  • results: Variable importance analysis shows that Age and Income are the most important variables in the majority of countries, a finding discussed in relation to fundamental sociological theories of religion and human behavior.
    Abstract This paper is a quantitative analysis of the data collected globally by the World Value Survey. The data is used to study the trajectories of change in individuals' religious beliefs, values, and behaviors in societies. Utilizing random forest, we aim to identify the key factors of religiosity and classify respondents of the survey as religious and non religious using country level data. We use resampling techniques to balance the data and improve imbalanced learning performance metrics. The results of the variable importance analysis suggest that Age and Income are the most important variables in the majority of countries. The results are discussed with fundamental sociological theories regarding religion and human behavior. This study is an application of machine learning in identifying the underlying patterns in the data of 30 countries participating in the World Value Survey. The results from variable importance analysis and classification of imbalanced data provide valuable insights beneficial to theoreticians and researchers of social sciences.
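
The study's pipeline (random forest plus resampling for imbalanced classes) is generic enough to sketch with scikit-learn. The snippet below uses synthetic data and plain random oversampling of the minority class; the column meanings, thresholds, and hyperparameters are placeholders, not the WVS setup.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import balanced_accuracy_score

rng = np.random.default_rng(0)

# Synthetic stand-in for survey features (e.g., age, income, ...) and a skewed
# religious / non-religious label; real WVS columns are not reproduced here.
X = rng.normal(size=(4000, 10))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=2.0, size=4000) > 2.0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0, stratify=y)

# Random oversampling of the minority class to balance the training data.
minority = np.flatnonzero(y_tr == 1)
extra = rng.choice(minority, size=(y_tr == 0).sum() - minority.size, replace=True)
X_bal = np.vstack([X_tr, X_tr[extra]])
y_bal = np.concatenate([y_tr, y_tr[extra]])

clf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_bal, y_bal)
print("balanced accuracy:", round(balanced_accuracy_score(y_te, clf.predict(X_te)), 3))
print("top feature importances:", np.round(np.sort(clf.feature_importances_)[::-1][:3], 3))
```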

Joint Optimization of Traffic Signal Control and Vehicle Routing in Signalized Road Networks using Multi-Agent Deep Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2310.10856
  • repo_url: None
  • paper_authors: Xianyue Peng, Hang Gao, Gengyue Han, Hao Wang, Michael Zhang
  • for: Alleviate urban traffic congestion in modern road networks and improve traffic efficiency.
  • methods: A joint optimization approach that simultaneously controls signal timings and vehicle route choices using Multi-Agent Deep Reinforcement Learning (MADRL): signal control agents set timings at intersections, vehicle routing agents select routes, and the agents share observations and rewards under a Multi-Agent Advantage Actor-Critic algorithm.
  • results: Numerical experiments on the modified Sioux network show that the integrated strategy improves traffic efficiency more than controlling signal timings or vehicle routes alone.
    Abstract Urban traffic congestion is a critical predicament that plagues modern road networks. To alleviate this issue and enhance traffic efficiency, traffic signal control and vehicle routing have proven to be effective measures. In this paper, we propose a joint optimization approach for traffic signal control and vehicle routing in signalized road networks. The objective is to enhance network performance by simultaneously controlling signal timings and route choices using Multi-Agent Deep Reinforcement Learning (MADRL). Signal control agents (SAs) are employed to establish signal timings at intersections, whereas vehicle routing agents (RAs) are responsible for selecting vehicle routes. By establishing relevance between agents and enabling them to share observations and rewards, interaction and cooperation among agents are fostered, which enhances individual training. The Multi-Agent Advantage Actor-Critic algorithm is used to handle multi-agent environments, and Deep Neural Network (DNN) structures are designed to facilitate the algorithm's convergence. Notably, our work is the first to utilize MADRL in determining the optimal joint policy for signal control and vehicle routing. Numerical experiments conducted on the modified Sioux network demonstrate that our integration of signal control and vehicle routing outperforms controlling signal timings or vehicles' routes alone in enhancing traffic efficiency.

Probabilistic Classification by Density Estimation Using Gaussian Mixture Model and Masked Autoregressive Flow

  • paper_url: http://arxiv.org/abs/2310.10843
  • repo_url: https://github.com/bghojogh/density-based-classifiers
  • paper_authors: Benyamin Ghojogh, Milad Amir Toutounchian
  • for: This paper uses density estimation for classification, even though density estimators are usually applied to modeling data distributions rather than to classification.
  • methods: Two density estimators are used: the Gaussian Mixture Model (GMM), fitted by expectation maximization, and the Masked Autoregressive Flow (MAF), a generative model based on normalizing flows and autoregressive networks; class-conditional likelihoods are modeled with these estimators (a GMM-based sketch follows this entry).
  • results: The proposed classifiers outperform simpler classifiers such as linear discriminant analysis, which model the likelihood with only a single Gaussian, and the work opens a research direction toward other probabilistic classifiers based on joint density estimation.
    Abstract Density estimation, which estimates the distribution of data, is an important category of probabilistic machine learning. A family of density estimators is mixture models, such as Gaussian Mixture Model (GMM) by expectation maximization. Another family of density estimators is the generative models which generate data from input latent variables. One of the generative models is the Masked Autoregressive Flow (MAF) which makes use of normalizing flows and autoregressive networks. In this paper, we use the density estimators for classification, although they are often used for estimating the distribution of data. We model the likelihood of classes of data by density estimation, specifically using GMM and MAF. The proposed classifiers outperform simpler classifiers such as linear discriminant analysis which model the likelihood using only a single Gaussian distribution. This work opens the research door for proposing other probabilistic classifiers based on joint density estimation.
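
A minimal version of the GMM-based classifier described above can be written with scikit-learn: fit one mixture per class and classify by the largest class-conditional log-likelihood plus log-prior. The MAF variant is omitted here because it would need a normalizing-flow library; the dataset and number of components are illustrative.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import train_test_split

X, y = make_moons(n_samples=2000, noise=0.2, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

classes = np.unique(y_tr)
gmms, log_priors = [], []
for c in classes:
    Xc = X_tr[y_tr == c]
    gmms.append(GaussianMixture(n_components=4, random_state=0).fit(Xc))  # p(x | y = c)
    log_priors.append(np.log(len(Xc) / len(X_tr)))                        # log p(y = c)

# Classify by the largest joint log-density log p(x | y = c) + log p(y = c).
scores = np.column_stack([g.score_samples(X_te) + lp for g, lp in zip(gmms, log_priors)])
y_hat = classes[scores.argmax(axis=1)]
print("GMM density classifier accuracy:", round((y_hat == y_te).mean(), 3))
```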

A Machine Learning-based Algorithm for Automated Detection of Frequency-based Events in Recorded Time Series of Sensor Data

  • paper_url: http://arxiv.org/abs/2310.10841
  • repo_url: None
  • paper_authors: Bahareh Medghalchi, Andreas Vogel
  • for: This work proposes a new automated method for detecting frequency-based events in recorded time series of sensor data.
  • methods: The time series is mapped to a time-frequency representation (a scalogram); the scalograms are filtered to enhance the relevant parts of the signal, and an object detection model is trained to detect the desired event objects in them, with detections mapped back to time intervals in the original series (a scalogram sketch follows this entry).
  • results: Evaluated on unseen datasets, the method achieves a precision of 0.97 in event detection and yields sharp time-interval boundaries that are difficult to indicate accurately by human visual inspection, improving the accuracy and reliability of automated event detection.
    Abstract Automated event detection has emerged as one of the fundamental practices to monitor the behavior of technical systems by means of sensor data. In the automotive industry, these methods are in high demand for tracing events in time series data. For assessing the active vehicle safety systems, a diverse range of driving scenarios is conducted. These scenarios involve the recording of the vehicle's behavior using external sensors, enabling the evaluation of operational performance. In such setting, automated detection methods not only accelerate but also standardize and objectify the evaluation by avoiding subjective, human-based appraisals in the data inspection. This work proposes a novel event detection method that allows to identify frequency-based events in time series data. To this aim, the time series data is mapped to representations in the time-frequency domain, known as scalograms. After filtering scalograms to enhance relevant parts of the signal, an object detection model is trained to detect the desired event objects in the scalograms. For the analysis of unseen time series data, events can be detected in their scalograms with the trained object detection model and are thereafter mapped back to the time series data to mark the corresponding time interval. The algorithm, evaluated on unseen datasets, achieves a precision rate of 0.97 in event detection, providing sharp time interval boundaries whose accurate indication by human visual inspection is challenging. Incorporating this method into the vehicle development process enhances the accuracy and reliability of event detection, which holds major importance for rapid testing analysis.
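
The object-detection stage is beyond a short snippet, but the first step of the pipeline, mapping a 1-D sensor signal to a time-frequency scalogram with a Morlet wavelet, can be sketched in plain NumPy. The wavelet widths, sampling rate, and burst test signal below are illustrative assumptions.

```python
import numpy as np

def morlet_scalogram(x, widths, w0=5.0):
    """Continuous wavelet transform magnitude of a real signal `x` using a Morlet wavelet.

    Returns an array of shape (len(widths), len(x)); larger widths capture lower frequencies.
    """
    out = np.empty((len(widths), len(x)))
    for i, s in enumerate(widths):
        t = np.arange(-4 * s, 4 * s + 1)
        wavelet = np.exp(1j * w0 * t / s) * np.exp(-0.5 * (t / s) ** 2) / np.sqrt(s)
        out[i] = np.abs(np.convolve(x, wavelet, mode="same"))
    return out

# Test signal: a 25 Hz burst embedded in noise -- the kind of frequency-based "event"
# that appears as a localized blob in the scalogram.
fs, T = 500.0, 4.0
t = np.arange(0, T, 1 / fs)
x = 0.3 * np.random.default_rng(0).normal(size=t.size)
burst = (t > 1.5) & (t < 2.0)
x[burst] += np.sin(2 * np.pi * 25.0 * t[burst])

scal = morlet_scalogram(x, widths=np.arange(2, 40))
row, col = np.unravel_index(scal.argmax(), scal.shape)
print(f"strongest response at t = {t[col]:.2f} s (scale index {row})")
```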

Approximating Two-Layer Feedforward Networks for Efficient Transformers

  • paper_url: http://arxiv.org/abs/2310.10837
  • repo_url: https://github.com/robertcsordas/moe
  • paper_authors: Róbert Csordás, Kazuki Irie, Jürgen Schmidhuber
  • for: Reduce the compute and memory requirements of neural networks (NNs) without sacrificing performance.
  • methods: Sparse Mixtures of Experts (MoEs) are used to build resource-efficient large language models (LMs); the paper presents a general framework that unifies methods for approximating two-layer feedforward blocks, including product-key memories (PKMs), and proposes improvements to both MoEs and PKMs (a generic top-k MoE sketch follows this entry).
  • results: Under a parameter-equal comparison, the MoEs are competitive with the dense Transformer-XL on the WikiText-103 and enwik8 datasets at two different scales while being far more resource efficient, showing that MoEs matter not only for extremely large LMs but for resource-efficient LMs at any scale.
    Abstract How to reduce compute and memory requirements of neural networks (NNs) without sacrificing performance? Many recent works use sparse Mixtures of Experts (MoEs) to build resource-efficient large language models (LMs). Here we introduce several novel perspectives on MoEs, presenting a general framework that unifies various methods to approximate two-layer NNs (e.g., feedforward blocks of Transformers), including product-key memories (PKMs). Leveraging insights from this framework, we propose methods to improve both MoEs and PKMs. Unlike prior work that compares MoEs with dense baselines under the compute-equal condition, our evaluation condition is parameter-equal, which is crucial to properly evaluate LMs. We show that our MoEs are competitive with the dense Transformer-XL on both the WikiText-103 and enwiki8 datasets at two different scales, while being much more resource efficient. This demonstrates that MoEs are relevant not only to extremely large LMs but also to any-scale resource-efficient LMs. Our code is public.
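
The paper's exact MoE parametrization lives in the linked repository; the PyTorch sketch below shows only the generic idea of replacing one wide two-layer feedforward block with several small experts selected by top-k routing. Dimensions, the number of experts, and the routing details are arbitrary assumptions, not the authors' design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """A sparse mixture of two-layer feedforward experts with top-k routing per token."""

    def __init__(self, d_model=64, d_ff=128, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):                                    # x: (tokens, d_model)
        gates = F.softmax(self.router(x), dim=-1)            # routing probabilities
        topv, topi = gates.topk(self.k, dim=-1)              # keep only k experts per token
        topv = topv / topv.sum(dim=-1, keepdim=True)         # renormalize the kept gates
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            for slot in range(self.k):
                mask = topi[:, slot] == e                    # tokens routed to expert e
                if mask.any():
                    out[mask] += topv[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

x = torch.randn(16, 64)
print(TopKMoE()(x).shape)   # torch.Size([16, 64])
```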

Gaussian processes based data augmentation and expected signature for time series classification

  • paper_url: http://arxiv.org/abs/2310.10836
  • repo_url: None
  • paper_authors: Marco Romito, Francesco Triggiano
  • for: This paper proposes a feature extraction model for time series built upon the expected signature, which gives a statistical description of the law of stochastic processes.
  • methods: The expected signature is computed through a Gaussian-process-based data augmentation of the input paths, and the feature extraction is trained within a supervised task.
  • results: The model learns an optimal feature extraction through the supervised task that uses it.
    Abstract The signature is a fundamental object that describes paths (that is, continuous functions from an interval to a Euclidean space). Likewise, the expected signature provides a statistical description of the law of stochastic processes. We propose a feature extraction model for time series built upon the expected signature. This is computed through a Gaussian processes based data augmentation. One of the main features is that an optimal feature extraction is learnt through the supervised task that uses the model.

Accurate Data-Driven Surrogates of Dynamical Systems for Forward Propagation of Uncertainty

  • paper_url: http://arxiv.org/abs/2310.10831
  • repo_url: None
  • paper_authors: Saibal De, Reese E. Jones, Hemanth Kolla
  • for: This work proposes a new non-intrusive approach for constructing surrogate models of dynamical systems for forward propagation of uncertainty.
  • methods: Stochastic collocation (SC) is combined with the data-driven sparse identification of nonlinear dynamics (SINDy) framework: the SC approximation is applied to the dynamics of the model rather than to the solution, and the resulting dynamics surrogates are integrated through time to build surrogate solutions (a SINDy-style identification sketch follows this entry).
  • results: The SC-over-dynamics framework yields smaller errors than full-field SC applied directly to the solutions, both for the approximated system trajectories and for the model state distributions, on three test problems: a chaotic ordinary differential equation and two partial differential equations from solid mechanics.
    Abstract Stochastic collocation (SC) is a well-known non-intrusive method of constructing surrogate models for uncertainty quantification. In dynamical systems, SC is especially suited for full-field uncertainty propagation that characterizes the distributions of the high-dimensional primary solution fields of a model with stochastic input parameters. However, due to the highly nonlinear nature of the parameter-to-solution map in even the simplest dynamical systems, the constructed SC surrogates are often inaccurate. This work presents an alternative approach, where we apply the SC approximation over the dynamics of the model, rather than the solution. By combining the data-driven sparse identification of nonlinear dynamics (SINDy) framework with SC, we construct dynamics surrogates and integrate them through time to construct the surrogate solutions. We demonstrate that the SC-over-dynamics framework leads to smaller errors, both in terms of the approximated system trajectories as well as the model state distributions, when compared against full-field SC applied to the solutions directly. We present numerical evidence of this improvement using three test problems: a chaotic ordinary differential equation, and two partial differential equations from solid mechanics.
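
The identification step underlying SINDy, fitting a sparse linear combination of candidate library terms to observed time derivatives, can be sketched with NumPy. The library, thresholding rule, and test system (a damped linear oscillator) below are simplified assumptions and do not reproduce the paper's SC-over-dynamics pipeline.

```python
import numpy as np

# Simulate a damped linear oscillator x'' = -x - 0.1 x' as data (states: x, v).
dt, n = 0.01, 5000
X = np.zeros((n, 2))
X[0] = [1.0, 0.0]
for i in range(n - 1):
    x, v = X[i]
    X[i + 1] = [x + dt * v, v + dt * (-x - 0.1 * v)]

dXdt = np.gradient(X, dt, axis=0)               # numerical time derivatives

# Candidate library Theta(X): [1, x, v, x^2, x*v, v^2].
x, v = X[:, 0], X[:, 1]
Theta = np.column_stack([np.ones(n), x, v, x**2, x * v, v**2])

# Sequentially thresholded least squares (the standard SINDy solver).
Xi = np.linalg.lstsq(Theta, dXdt, rcond=None)[0]
for _ in range(10):
    Xi[np.abs(Xi) < 0.05] = 0.0                 # drop small coefficients
    for j in range(Xi.shape[1]):                # refit the surviving terms per equation
        big = np.abs(Xi[:, j]) >= 0.05
        if big.any():
            Xi[big, j] = np.linalg.lstsq(Theta[:, big], dXdt[:, j], rcond=None)[0]
print(np.round(Xi, 3))   # expect dx/dt ≈ v and dv/dt ≈ -x - 0.1 v
```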

Uncertainty-aware transfer across tasks using hybrid model-based successor feature reinforcement learning

  • paper_url: http://arxiv.org/abs/2310.10818
  • repo_url: None
  • paper_authors: Parvin Malekzadeh, Ming Hou, Konstantinos N. Plataniotis
  • For: The paper targets sample efficiency in practical reinforcement learning (RL) for complex, large-scale decision-making problems.
  • Methods: It combines model-based (MB) methods with successor feature (SF) algorithms and performs uncertainty-aware exploration by approximating the uncertainty of each action's value with a Kalman filter (KF)-based multiple-model adaptive estimation that treats model parameters as random variables.
  • Results: Experiments show that the algorithm generalizes its knowledge across different transition dynamics, learns downstream tasks with significantly fewer samples than starting from scratch, and outperforms recent SF and MB baselines.
    Abstract Sample efficiency is central to developing practical reinforcement learning (RL) for complex and large-scale decision-making problems. The ability to transfer and generalize knowledge gained from previous experiences to downstream tasks can significantly improve sample efficiency. Recent research indicates that successor feature (SF) RL algorithms enable knowledge generalization between tasks with different rewards but identical transition dynamics. It has recently been hypothesized that combining model-based (MB) methods with SF algorithms can alleviate the limitation of fixed transition dynamics. Furthermore, uncertainty-aware exploration is widely recognized as another appealing approach for improving sample efficiency. Putting together two ideas of hybrid model-based successor feature (MB-SF) and uncertainty leads to an approach to the problem of sample efficient uncertainty-aware knowledge transfer across tasks with different transition dynamics or/and reward functions. In this paper, the uncertainty of the value of each action is approximated by a Kalman filter (KF)-based multiple-model adaptive estimation. This KF-based framework treats the parameters of a model as random variables. To the best of our knowledge, this is the first attempt at formulating a hybrid MB-SF algorithm capable of generalizing knowledge across large or continuous state space tasks with various transition dynamics while requiring less computation at decision time than MB methods. The number of samples required to learn the tasks was compared to recent SF and MB baselines. The results show that our algorithm generalizes its knowledge across different transition dynamics, learns downstream tasks with significantly fewer samples than starting from scratch, and outperforms existing approaches.

Robust Multi-Agent Reinforcement Learning via Adversarial Regularization: Theoretical Foundation and Stable Algorithms

  • paper_url: http://arxiv.org/abs/2310.10810
  • repo_url: https://github.com/abukharin3/ernie
  • paper_authors: Alexander Bukharin, Yan Li, Yue Yu, Qingru Zhang, Zhehui Chen, Simiao Zuo, Chao Zhang, Songan Zhang, Tuo Zhao
  • For: The paper aims to improve the robustness of multi-agent reinforcement learning (MARL) policies by controlling the Lipschitz constant of the policies and using adversarial regularization to promote continuity with respect to state observations and actions.
  • Methods: The proposed framework, called ERNIE, uses adversarial regularization to promote the Lipschitz continuity of policies, and the authors reformulate adversarial regularization as a Stackelberg game to reduce training instability (a minimal regularizer sketch follows this entry).
  • Results: The authors demonstrate the effectiveness of the proposed framework with extensive experiments in traffic light control and particle environments, and show that ERNIE provides robustness against noisy observations, changing transition dynamics, and malicious actions of agents. Additionally, they extend ERNIE to mean-field MARL with a formulation based on distributionally robust optimization that outperforms its non-robust counterpart and is of independent interest.
    Abstract Multi-Agent Reinforcement Learning (MARL) has shown promising results across several domains. Despite this promise, MARL policies often lack robustness and are therefore sensitive to small changes in their environment. This presents a serious concern for the real world deployment of MARL algorithms, where the testing environment may slightly differ from the training environment. In this work we show that we can gain robustness by controlling a policy's Lipschitz constant, and under mild conditions, establish the existence of a Lipschitz and close-to-optimal policy. Based on these insights, we propose a new robust MARL framework, ERNIE, that promotes the Lipschitz continuity of the policies with respect to the state observations and actions by adversarial regularization. The ERNIE framework provides robustness against noisy observations, changing transition dynamics, and malicious actions of agents. However, ERNIE's adversarial regularization may introduce some training instability. To reduce this instability, we reformulate adversarial regularization as a Stackelberg game. We demonstrate the effectiveness of the proposed framework with extensive experiments in traffic light control and particle environments. In addition, we extend ERNIE to mean-field MARL with a formulation based on distributionally robust optimization that outperforms its non-robust counterpart and is of independent interest. Our code is available at https://github.com/abukharin3/ERNIE.
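
ERNIE's full Stackelberg formulation is in the linked repository; the basic adversarial-regularization idea, penalizing how much the policy's output changes under a small worst-case perturbation of the state observation, can be sketched in PyTorch as below. The one-step sign-gradient attack, the KL penalty, and the radius `eps` are simplifying assumptions, not the authors' exact procedure.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

policy = nn.Sequential(nn.Linear(8, 64), nn.Tanh(), nn.Linear(64, 4))  # logits over 4 actions

def lipschitz_regularizer(policy, obs, eps=0.05):
    """One-step adversarial estimate of the policy's sensitivity to observation noise.

    Finds a perturbation of size `eps` (sign-gradient direction) that increases the KL
    divergence between perturbed and clean action distributions, and returns that KL
    as a penalty to be added to the usual RL loss.
    """
    clean = F.log_softmax(policy(obs), dim=-1).detach()
    delta = torch.zeros_like(obs, requires_grad=True)
    kl = F.kl_div(F.log_softmax(policy(obs + delta), dim=-1), clean,
                  log_target=True, reduction="batchmean")
    grad, = torch.autograd.grad(kl, delta)
    adv_obs = obs + eps * grad.sign()                     # approximate worst-case observation
    return F.kl_div(F.log_softmax(policy(adv_obs), dim=-1), clean,
                    log_target=True, reduction="batchmean")

obs = torch.randn(32, 8)
penalty = lipschitz_regularizer(policy, obs)
print(float(penalty))   # would be added to the actor loss with some weight
```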

Regularization properties of adversarially-trained linear regression

  • paper_url: http://arxiv.org/abs/2310.10807
  • repo_url: https://github.com/antonior92/advtrain-linreg
  • paper_authors: Antônio H. Ribeiro, Dave Zachariah, Francis Bach, Thomas B. Schön
  • For: The paper studies the effectiveness of adversarial training in defending against input perturbations in linear models, and explores the relationship between adversarial training and other regularization methods.
  • Methods: It uses the min-max formulation of adversarial training, which searches for the best solution when the training data are corrupted by worst-case attacks, and compares the resulting solution with other regularization methods such as ridge regression and Lasso (a sketch of the adversarial loss follows this entry).
  • Results: Adversarial training yields the minimum-norm interpolating solution in the overparameterized regime; it can be equivalent to parameter-shrinking methods in the underparametrized region; and for $\ell_\infty$-adversarial training the choice of adversarial radius for optimal bounds does not depend on the additive noise variance. The theoretical findings are confirmed with numerical examples.
    Abstract State-of-the-art machine learning models can be vulnerable to very small input perturbations that are adversarially constructed. Adversarial training is an effective approach to defend against it. Formulated as a min-max problem, it searches for the best solution when the training data were corrupted by the worst-case attacks. Linear models are among the simple models where vulnerabilities can be observed and are the focus of our study. In this case, adversarial training leads to a convex optimization problem which can be formulated as the minimization of a finite sum. We provide a comparative analysis between the solution of adversarial training in linear regression and other regularization methods. Our main findings are that: (A) Adversarial training yields the minimum-norm interpolating solution in the overparameterized regime (more parameters than data), as long as the maximum disturbance radius is smaller than a threshold. And, conversely, the minimum-norm interpolator is the solution to adversarial training with a given radius. (B) Adversarial training can be equivalent to parameter shrinking methods (ridge regression and Lasso). This happens in the underparametrized region, for an appropriate choice of adversarial radius and zero-mean symmetrically distributed covariates. (C) For $\ell_\infty$-adversarial training -- as in square-root Lasso -- the choice of adversarial radius for optimal bounds does not depend on the additive noise variance. We confirm our theoretical findings with numerical examples.
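
For linear regression with $\ell_\infty$-bounded input perturbations of radius $\delta$, the inner maximization of the min-max adversarial training problem has a closed form: the worst-case residual is $|y - x^\top w| + \delta \lVert w\rVert_1$ (the dual norm of $\ell_\infty$). The NumPy sketch below minimizes this loss by (sub)gradient descent on synthetic data; it illustrates the formulation studied in the paper, with $\delta$, the step size, and the data being arbitrary choices rather than the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, delta = 200, 50, 0.1
X = rng.normal(size=(n, d))
w_true = np.zeros(d)
w_true[:5] = rng.normal(size=5)
y = X @ w_true + 0.1 * rng.normal(size=n)

def adv_loss_and_grad(w):
    """Adversarial squared loss for ell_inf perturbations of radius delta.

    The inner max has the closed form (|y_i - x_i^T w| + delta * ||w||_1)^2 per sample.
    """
    r = y - X @ w
    m = np.abs(r) + delta * np.abs(w).sum()
    loss = np.mean(m ** 2)
    # Subgradient of mean((|r| + delta * ||w||_1)^2) with respect to w.
    grad = (2.0 / n) * (-(X.T @ (m * np.sign(r))) + delta * np.sum(m) * np.sign(w))
    return loss, grad

w = np.zeros(d)
for step in range(2000):
    loss, g = adv_loss_and_grad(w)
    w -= 1e-2 * g
print(f"adversarial loss: {loss:.4f}, nonzeros in w: {(np.abs(w) > 1e-3).sum()}")
```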

Neural Tangent Kernels Motivate Graph Neural Networks with Cross-Covariance Graphs

  • paper_url: http://arxiv.org/abs/2310.10791
  • repo_url: None
  • paper_authors: Shervin Khalafi, Saurabh Sihag, Alejandro Ribeiro
  • for: This paper studies the learning and generalization behavior of over-parametrized neural networks, in particular graph neural networks (GNNs).
  • methods: The analysis uses neural tangent kernels (NTKs) and the alignment between the NTK eigenvectors and the data, showing that optimizing alignment translates to optimizing the graph representation or the graph shift operator in a GNN.
  • results: The alignment is provably optimal for a two-layer GNN when the graph shift operator is a function of the cross-covariance between the input and output data; experiments on a multivariate time series prediction task show that GNNs using the cross-covariance as the graph shift operator outperform those operating only on the covariance matrix of the input data.
    Abstract Neural tangent kernels (NTKs) provide a theoretical regime to analyze the learning and generalization behavior of over-parametrized neural networks. For a supervised learning task, the association between the eigenvectors of the NTK kernel and given data (a concept referred to as alignment in this paper) can govern the rate of convergence of gradient descent, as well as generalization to unseen data. Building upon this concept, we investigate NTKs and alignment in the context of graph neural networks (GNNs), where our analysis reveals that optimizing alignment translates to optimizing the graph representation or the graph shift operator in a GNN. Our results further establish the theoretical guarantees on the optimality of the alignment for a two-layer GNN and these guarantees are characterized by the graph shift operator being a function of the cross-covariance between the input and the output data. The theoretical insights drawn from the analysis of NTKs are validated by our experiments focused on a multi-variate time series prediction task for a publicly available dataset. Specifically, they demonstrate that GNNs with cross-covariance as the graph shift operator indeed outperform those that operate on the covariance matrix from only the input data.

Correcting model misspecification in physics-informed neural networks (PINNs)

  • paper_url: http://arxiv.org/abs/2310.10776
  • repo_url: None
  • paper_authors: Zongren Zou, Xuhui Meng, George Em Karniadakis
  • methods: The approach encodes the assumed (possibly misspecified) physical models in physics-informed neural networks (PINNs) and employs additional deep neural networks (DNNs) to model the discrepancy between the imperfect models and the observational data.
  • results: The added DNNs reduce the computational errors caused by model misspecification, enabling the use of PINNs in complex systems where the physical processes are not exactly known; Bayesian PINNs (B-PINNs) and/or ensemble PINNs are further used to quantify uncertainties arising from noisy and/or gappy data in the discovered governing equations.
    Abstract Data-driven discovery of governing equations in computational science has emerged as a new paradigm for obtaining accurate physical models and as a possible alternative to theoretical derivations. The recently developed physics-informed neural networks (PINNs) have also been employed to learn governing equations given data across diverse scientific disciplines. Despite the effectiveness of PINNs for discovering governing equations, the physical models encoded in PINNs may be misspecified in complex systems as some of the physical processes may not be fully understood, leading to the poor accuracy of PINN predictions. In this work, we present a general approach to correct the misspecified physical models in PINNs for discovering governing equations, given some sparse and/or noisy data. Specifically, we first encode the assumed physical models, which may be misspecified, then employ other deep neural networks (DNNs) to model the discrepancy between the imperfect models and the observational data. Due to the expressivity of DNNs, the proposed method is capable of reducing the computational errors caused by the model misspecification and thus enables the applications of PINNs in complex systems where the physical processes are not exactly known. Furthermore, we utilize the Bayesian PINNs (B-PINNs) and/or ensemble PINNs to quantify uncertainties arising from noisy and/or gappy data in the discovered governing equations. A series of numerical examples including non-Newtonian channel and cavity flows demonstrate that the added DNNs are capable of correcting the model misspecification in PINNs and thus reduce the discrepancy between the physical models and the observational data. We envision that the proposed approach will extend the applications of PINNs for discovering governing equations in problems where the physico-chemical or biological processes are not well understood.

Gotta be SAFE: A New Framework for Molecular Design

  • paper_url: http://arxiv.org/abs/2310.10773
  • repo_url: None
  • paper_authors: Emmanuel Noutahi, Cristian Gabellini, Michael Craig, Jonathan S. C Lim, Prudencio Tossou
  • for: AI-driven molecular design.
  • methods: The paper introduces Sequential Attachment-based Fragment Embedding (SAFE), a new line notation for chemical structures that reimagines SMILES strings as an unordered sequence of interconnected fragment blocks while remaining fully compatible with existing SMILES parsers.
  • results: An 87-million-parameter GPT2-like model trained on 1.1 billion SAFE representations shows versatile and robust optimization performance, opening new avenues for rapid exploration of chemical space under various constraints in AI-driven molecular design.
    Abstract Traditional molecular string representations, such as SMILES, often pose challenges for AI-driven molecular design due to their non-sequential depiction of molecular substructures. To address this issue, we introduce Sequential Attachment-based Fragment Embedding (SAFE), a novel line notation for chemical structures. SAFE reimagines SMILES strings as an unordered sequence of interconnected fragment blocks while maintaining full compatibility with existing SMILES parsers. It streamlines complex generative tasks, including scaffold decoration, fragment linking, polymer generation, and scaffold hopping, while facilitating autoregressive generation for fragment-constrained design, thereby eliminating the need for intricate decoding or graph-based models. We demonstrate the effectiveness of SAFE by training an 87-million-parameter GPT2-like model on a dataset containing 1.1 billion SAFE representations. Through extensive experimentation, we show that our SAFE-GPT model exhibits versatile and robust optimization performance. SAFE opens up new avenues for the rapid exploration of chemical space under various constraints, promising breakthroughs in AI-driven molecular design.

Unsupervised Lead Sheet Generation via Semantic Compression

  • paper_url: http://arxiv.org/abs/2310.10772
  • repo_url: https://github.com/zacharynovack/lead-ae
  • paper_authors: Zachary Novack, Nikita Srivatsan, Taylor Berg-Kirkpatrick, Julian McAuley
  • for: This work aims to improve the quality of generated lead sheets so that they accurately reflect their orchestrated (full-score) counterparts, framed as conditional lead sheet generation.
  • methods: The proposed model, Lead-AE, treats the lead sheet as a discrete subselection of the original sequence and uses a differentiable top-k operator to impose controllable local sparsity constraints, casting the task as unsupervised music compression (a top-k sketch follows this entry).
  • results: Across both automatic proxy tasks and direct human evaluations, the method improves on the established deterministic baseline and produces coherent reductions of large multitrack scores.
    Abstract Lead sheets have become commonplace in generative music research, being used as an initial compressed representation for downstream tasks like multitrack music generation and automatic arrangement. Despite this, researchers have often fallen back on deterministic reduction methods (such as the skyline algorithm) to generate lead sheets when seeking paired lead sheets and full scores, with little attention being paid toward the quality of the lead sheets themselves and how they accurately reflect their orchestrated counterparts. To address these issues, we propose the problem of conditional lead sheet generation (i.e. generating a lead sheet given its full score version), and show that this task can be formulated as an unsupervised music compression task, where the lead sheet represents a compressed latent version of the score. We introduce a novel model, called Lead-AE, that models the lead sheets as a discrete subselection of the original sequence, using a differentiable top-k operator to allow for controllable local sparsity constraints. Across both automatic proxy tasks and direct human evaluations, we find that our method improves upon the established deterministic baseline and produces coherent reductions of large multitrack scores.
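
Lead-AE's central ingredient is a differentiable top-k operator for selecting a sparse subset of notes. One common way to make hard top-k selection trainable is a straight-through estimator: use the hard 0/1 mask in the forward pass and a soft surrogate's gradient in the backward pass. The sketch below shows that generic trick in PyTorch; it is an assumption about the mechanism, not the authors' exact operator.

```python
import torch

def straight_through_topk(scores, k):
    """Hard top-k mask in the forward pass, soft (sigmoid) gradient in the backward pass."""
    soft = torch.sigmoid(scores)
    hard = torch.zeros_like(scores)
    hard.scatter_(-1, scores.topk(k, dim=-1).indices, 1.0)
    return hard + soft - soft.detach()          # forward: hard mask; backward: d soft / d scores

scores = torch.randn(2, 10, requires_grad=True)
mask = straight_through_topk(scores, k=3)       # (2, 10) with exactly three 1s per row
selected = mask * torch.randn(2, 10)            # e.g., keep only the selected events
selected.sum().backward()
print(mask[0], scores.grad is not None)
```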

Wide Neural Networks as Gaussian Processes: Lessons from Deep Equilibrium Models

  • paper_url: http://arxiv.org/abs/2310.10767
  • repo_url: None
  • paper_authors: Tianxiang Gao, Xiaokai Huo, Hailiang Liu, Hongyang Gao
  • for: This work studies the generalization and training properties of wide neural networks with infinite depth, specifically the deep equilibrium model (DEQ).
  • methods: The analysis considers the DEQ, an infinite-depth neural network with weight matrices shared across layers, in the infinite-width limit, relating it to Gaussian processes via the Neural Network and Gaussian Process (NNGP) correspondence (a fixed-point DEQ layer sketch follows this entry).
  • results: As the width of DEQ layers approaches infinity, the DEQ converges to a Gaussian process, and this convergence holds even when the limits of depth and width are interchanged, unlike typical infinite-depth MLPs; moreover, the associated Gaussian vector remains non-degenerate for any pairwise distinct inputs, ensuring a strictly positive smallest eigenvalue of the NNGP kernel matrix. These findings lay groundwork for studying the training and generalization of DEQs.
    Abstract Neural networks with wide layers have attracted significant attention due to their equivalence to Gaussian processes, enabling perfect fitting of training data while maintaining generalization performance, known as benign overfitting. However, existing results mainly focus on shallow or finite-depth networks, necessitating a comprehensive analysis of wide neural networks with infinite-depth layers, such as neural ordinary differential equations (ODEs) and deep equilibrium models (DEQs). In this paper, we specifically investigate the deep equilibrium model (DEQ), an infinite-depth neural network with shared weight matrices across layers. Our analysis reveals that as the width of DEQ layers approaches infinity, it converges to a Gaussian process, establishing what is known as the Neural Network and Gaussian Process (NNGP) correspondence. Remarkably, this convergence holds even when the limits of depth and width are interchanged, which is not observed in typical infinite-depth Multilayer Perceptron (MLP) networks. Furthermore, we demonstrate that the associated Gaussian vector remains non-degenerate for any pairwise distinct input data, ensuring a strictly positive smallest eigenvalue of the corresponding kernel matrix using the NNGP kernel. These findings serve as fundamental elements for studying the training and generalization of DEQs, laying the groundwork for future research in this area.
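
To make the construction concrete: a deep equilibrium layer shares one weight matrix across an effectively infinite stack and outputs the fixed point z* = φ(W z* + U x). The minimal fixed-point-iteration version below (without implicit-function-theorem backpropagation) is only an illustration; the widths, activation, and contraction rescaling are assumptions, not the paper's setup.

```python
import torch
import torch.nn as nn

class SimpleDEQ(nn.Module):
    """Deep equilibrium layer: iterate z <- tanh(W z + U x) toward a fixed point.

    All "layers" share the same weights, which is what makes the depth effectively infinite.
    """

    def __init__(self, d_in, d_hidden):
        super().__init__()
        self.W = nn.Linear(d_hidden, d_hidden, bias=False)
        self.U = nn.Linear(d_in, d_hidden)

    def forward(self, x, n_iter=50, tol=1e-5):
        z = torch.zeros(x.shape[0], self.W.in_features, device=x.device)
        for _ in range(n_iter):
            z_next = torch.tanh(self.W(z) + self.U(x))
            if (z_next - z).norm() < tol:
                break
            z = z_next
        return z_next

deq = SimpleDEQ(d_in=16, d_hidden=128)
with torch.no_grad():
    deq.W.weight *= 0.5 / deq.W.weight.norm()   # keep the map contractive so the iteration converges
x = torch.randn(8, 16)
print(deq(x).shape)   # torch.Size([8, 128])
```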

Exploring hyperelastic material model discovery for human brain cortex: multivariate analysis vs. artificial neural network approaches

  • paper_url: http://arxiv.org/abs/2310.10762
  • repo_url: None
  • paper_authors: Jixin Hou, Nicholas Filla, Xianyan Chen, Mir Jalil Razavi, Tianming Liu, Xianqiao Wang
  • for: The goal of this study is to identify the most favorable constitutive material model for human brain cortex tissue.
  • methods: Artificial neural networks and multiple regression are applied to a generalization of widely accepted classic hyperelastic models to discover suitable constitutive models automatically, with consistent setups across both approaches apart from how potential overfitting is prevented.
  • results: Artificial neural networks can automatically identify accurate constitutive models from admissible estimators, but the five-term and two-term neural network models trained under single-mode and multi-mode loading scenarios were suboptimal and could be simplified further into two-term and single-term models, respectively, with higher accuracy using multiple regression. The findings highlight the importance of hyperparameters for artificial neural networks and the necessity of detailed cross-validation of regularization parameters to ensure optimal selection at a global level.
    Abstract Traditional computational methods, such as the finite element analysis, have provided valuable insights into uncovering the underlying mechanisms of brain physical behaviors. However, precise predictions of brain physics require effective constitutive models to represent the intricate mechanical properties of brain tissue. In this study, we aimed to identify the most favorable constitutive material model for human brain tissue. To achieve this, we applied artificial neural network and multiple regression methods to a generalization of widely accepted classic models, and compared the results obtained from these two approaches. To evaluate the applicability and efficacy of the model, all setups were kept consistent across both methods, except for the approach to prevent potential overfitting. Our results demonstrate that artificial neural networks are capable of automatically identifying accurate constitutive models from given admissible estimators. Nonetheless, the five-term and two-term neural network models trained under single-mode and multi-mode loading scenarios, were found to be suboptimal and could be further simplified into two-term and single-term, respectively, with higher accuracy using multiple regression. Our findings highlight the importance of hyperparameters for the artificial neural network and emphasize the necessity for detailed cross-validations of regularization parameters to ensure optimal selection at a global level in the development of material constitutive models. This study validates the applicability and accuracy of artificial neural network to automatically discover constitutive material models with proper regularization as well as the benefits in model simplification without compromising accuracy for traditional multivariable regression.

Statistical Barriers to Affine-equivariant Estimation

  • paper_url: http://arxiv.org/abs/2310.10758
  • repo_url: None
  • paper_authors: Zihao Chen, Yeshwanth Cherapanamjeri
  • for: Robust mean estimation in high-dimensional datasets with affine-invariant properties.
  • methods: Affine-equivariant estimators, lower bounds, and a new estimator based on a high-dimensional median.
  • results: Strict degradation in recovery error with quantitative rates degrading by a factor of $\sqrt{d}$ under two outlier models, and a new affine-equivariant estimator that nearly matches the lower bound.
    Abstract We investigate the quantitative performance of affine-equivariant estimators for robust mean estimation. As a natural stability requirement, the construction of such affine-equivariant estimators has been extensively studied in the statistics literature. We quantitatively evaluate these estimators under two outlier models which have been the subject of much recent work: the heavy-tailed and adversarial corruption settings. We establish lower bounds which show that affine-equivariance induces a strict degradation in recovery error with quantitative rates degrading by a factor of $\sqrt{d}$ in both settings. We find that classical estimators such as the Tukey median (Tukey '75) and Stahel-Donoho estimator (Stahel '81 and Donoho '82) are either quantitatively sub-optimal even within the class of affine-equivariant estimators or lack any quantitative guarantees. On the other hand, recent estimators with strong quantitative guarantees are not affine-equivariant or require additional distributional assumptions to achieve it. We remedy this by constructing a new affine-equivariant estimator which nearly matches our lower bound. Our estimator is based on a novel notion of a high-dimensional median which may be of independent interest. Notably, our results are applicable more broadly to any estimator whose performance is evaluated in the Mahalanobis norm which, for affine-equivariant estimators, corresponds to an evaluation in Euclidean norm on isotropic distributions.

Mori-Zwanzig latent space Koopman closure for nonlinear autoencoder

  • paper_url: http://arxiv.org/abs/2310.10745
  • repo_url: None
  • paper_authors: Priyam Gupta, Peter J. Schmid, Denis Sipp, Taraneh Sayadi, Georgios Rigas
  • for: The study aims to improve the accuracy and stability of data-driven approximations of the Koopman operator, and thus the understanding and prediction of complex nonlinear dynamics.
  • methods: The proposed Mori-Zwanzig autoencoder (MZ-AE) uses a nonlinear autoencoder to extract key observables that approximate a finite invariant Koopman subspace, and adds a non-Markovian correction mechanism based on the Mori-Zwanzig formalism, yielding a closed representation of the dynamics in the latent manifold.
  • results: The method captures regime transitions in the flow around a circular cylinder and provides a low-dimensional approximation of the chaotic Kuramoto-Sivashinsky system with promising short-term predictability and robust long-term statistical performance.
    Abstract The Koopman operator presents an attractive approach to achieve global linearization of nonlinear systems, making it a valuable method for simplifying the understanding of complex dynamics. While data-driven methodologies have exhibited promise in approximating finite Koopman operators, they grapple with various challenges, such as the judicious selection of observables, dimensionality reduction, and the ability to predict complex system behaviours accurately. This study presents a novel approach termed Mori-Zwanzig autoencoder (MZ-AE) to robustly approximate the Koopman operator in low-dimensional spaces. The proposed method leverages a nonlinear autoencoder to extract key observables for approximating a finite invariant Koopman subspace and integrates a non-Markovian correction mechanism using the Mori-Zwanzig formalism. Consequently, this approach yields a closed representation of dynamics within the latent manifold of the nonlinear autoencoder, thereby enhancing the precision and stability of the Koopman operator approximation. Demonstrations showcase the technique's ability to capture regime transitions in the flow around a circular cylinder. It also provided a low dimensional approximation for chaotic Kuramoto-Sivashinsky with promising short-term predictability and robust long-term statistical performance. By bridging the gap between data-driven techniques and the mathematical foundations of Koopman theory, MZ-AE offers a promising avenue for improved understanding and prediction of complex nonlinear dynamics.

Fast Adversarial Label-Flipping Attack on Tabular Data

  • paper_url: http://arxiv.org/abs/2310.10744
  • repo_url: None
  • paper_authors: Xinglong Chang, Gillian Dobbie, Jörg Wicker
  • for: This work highlights the threat that adversarial label-flipping attacks pose to machine learning models in domains requiring high reliability, such as cybersecurity, and especially on tabular data where verifying true labels requires expertise.
  • methods: The paper proposes FALFA (Fast Adversarial Label-Flipping Attack), an efficient attack for crafting adversarial labels; FALFA transforms the adversary's objective and employs linear programming to reduce computational complexity (a heavily simplified poisoning sketch follows this entry).
  • results: Across ten real-world tabular datasets, FALFA demonstrates superior attack potential, highlighting the need for robust defenses against such threats.
    Abstract Machine learning models are increasingly used in fields that require high reliability such as cybersecurity. However, these models remain vulnerable to various attacks, among which the adversarial label-flipping attack poses significant threats. In label-flipping attacks, the adversary maliciously flips a portion of training labels to compromise the machine learning model. This paper raises significant concerns as these attacks can camouflage a highly skewed dataset as an easily solvable classification problem, often misleading machine learning practitioners into lower defenses and miscalculations of potential risks. This concern amplifies in tabular data settings, where identifying true labels requires expertise, allowing malicious label-flipping attacks to easily slip under the radar. To demonstrate this risk is inherited in the adversary's objective, we propose FALFA (Fast Adversarial Label-Flipping Attack), a novel efficient attack for crafting adversarial labels. FALFA is based on transforming the adversary's objective and employs linear programming to reduce computational complexity. Using ten real-world tabular datasets, we demonstrate FALFA's superior attack potential, highlighting the need for robust defenses against such threats.
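
FALFA's linear-programming formulation is not reproduced here. As a heavily simplified stand-in, the sketch below flips the labels of the training points a surrogate model is most confident about, which is one cheap heuristic for degrading a downstream classifier; it only illustrates the threat model (poisoning a fraction of training labels), not the paper's algorithm.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=3000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

def flip_labels(X_tr, y_tr, budget=0.1):
    """Flip the labels of the `budget` fraction of points a surrogate model is most sure about."""
    surrogate = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    conf = surrogate.predict_proba(X_tr)[np.arange(len(y_tr)), y_tr]   # confidence in true label
    n_flip = int(budget * len(y_tr))
    idx = np.argsort(conf)[-n_flip:]                                   # most confidently correct points
    y_poisoned = y_tr.copy()
    y_poisoned[idx] = 1 - y_poisoned[idx]
    return y_poisoned

clean = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).score(X_te, y_te)
poisoned = LogisticRegression(max_iter=1000).fit(X_tr, flip_labels(X_tr, y_tr)).score(X_te, y_te)
print(f"test accuracy clean: {clean:.3f}, after 10% label flipping: {poisoned:.3f}")
```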

MOFDiff: Coarse-grained Diffusion for Metal-Organic Framework Design

  • paper_url: http://arxiv.org/abs/2310.10732
  • repo_url: None
  • paper_authors: Xiang Fu, Tian Xie, Andrew S. Rosen, Tommi Jaakkola, Jake Smith
  • for: This work aims to generate metal-organic framework (MOF) structures with a diffusion model, targeting high-performing MOF materials for carbon capture applications.
  • methods: MOFDiff is a coarse-grained diffusion model that generates CG MOF structures through a denoising diffusion process over the coordinates and identities of molecular building blocks; equivariant graph neural networks respect the permutational and roto-translational symmetries, and the all-atom MOF structure is then determined with a novel assembly algorithm.
  • results: Molecular simulations show that the model generates valid and novel MOF structures and is effective at designing outstanding MOF materials for carbon capture.
    Abstract Metal-organic frameworks (MOFs) are of immense interest in applications such as gas storage and carbon capture due to their exceptional porosity and tunable chemistry. Their modular nature has enabled the use of template-based methods to generate hypothetical MOFs by combining molecular building blocks in accordance with known network topologies. However, the ability of these methods to identify top-performing MOFs is often hindered by the limited diversity of the resulting chemical space. In this work, we propose MOFDiff: a coarse-grained (CG) diffusion model that generates CG MOF structures through a denoising diffusion process over the coordinates and identities of the building blocks. The all-atom MOF structure is then determined through a novel assembly algorithm. Equivariant graph neural networks are used for the diffusion model to respect the permutational and roto-translational symmetries. We comprehensively evaluate our model's capability to generate valid and novel MOF structures and its effectiveness in designing outstanding MOF materials for carbon capture applications with molecular simulations.

A representation learning approach to probe for dynamical dark energy in matter power spectra

  • paper_url: http://arxiv.org/abs/2310.10717
  • repo_url: None
  • paper_authors: Davide Piras, Lucas Lombriser
  • for: searched for a compressed representation of dynamical dark energy models in observational studies of the cosmic large-scale structure.
  • methods: trained a variational autoencoder (VAE) architecture, DE-VAE, on matter power spectra boosts generated at different redshift values and wavenumbers, and used a neural network to compress and reconstruct the boosts.
  • results: found that a single latent parameter is sufficient to predict 95% (99%) of DE power spectra within $1\sigma$ ($2\sigma$) of a Gaussian error, and the three variables can be linked together with an explicit equation through symbolic regression.
    Abstract We present DE-VAE, a variational autoencoder (VAE) architecture to search for a compressed representation of dynamical dark energy (DE) models in observational studies of the cosmic large-scale structure. DE-VAE is trained on matter power spectra boosts generated at wavenumbers $k\in(0.01-2.5) \ h/\rm{Mpc}$ and at four redshift values $z\in(0.1,0.48,0.78,1.5)$ for the most typical dynamical DE parametrization with two extra parameters describing an evolving DE equation of state. The boosts are compressed to a lower-dimensional representation, which is concatenated with standard cold dark matter (CDM) parameters and then mapped back to reconstructed boosts; both the compression and the reconstruction components are parametrized as neural networks. Remarkably, we find that a single latent parameter is sufficient to predict 95% (99%) of DE power spectra generated over a broad range of cosmological parameters within $1\sigma$ ($2\sigma$) of a Gaussian error which includes cosmic variance, shot noise and systematic effects for a Stage IV-like survey. This single parameter shows a high mutual information with the two DE parameters, and these three variables can be linked together with an explicit equation through symbolic regression. Considering a model with two latent variables only marginally improves the accuracy of the predictions, and adding a third latent variable has no significant impact on the model's performance. We discuss how the DE-VAE architecture can be extended from a proof of concept to a general framework to be employed in the search for a common lower-dimensional parametrization of a wide range of beyond-$\Lambda$CDM models and for different cosmological datasets. Such a framework could then both inform the development of cosmological surveys by targeting optimal probes, and provide theoretical insight into the common phenomenological aspects of beyond-$\Lambda$CDM models.
    摘要 我们提出了DE-VAE,一种变分自编码器(VAE)架构,用于在宇宙大尺度结构的观测研究中寻找动力学暗能量(DE)模型的压缩表示。DE-VAE在波数 $k\in(0.01-2.5) \ h/\rm{Mpc}$ 和四个红移值 $z\in(0.1,0.48,0.78,1.5)$ 上生成的物质功率谱增益(boost)上训练,采用最典型的动力学DE参数化,即以两个额外参数描述随时间演化的DE状态方程。这些增益被压缩到低维表示,与标准冷暗物质(CDM)参数拼接后再映射回重建的增益;压缩与重建部分均由神经网络参数化。值得注意的是,我们发现仅一个潜变量就足以在包含宇宙学方差、散粒噪声和系统效应(对应Stage IV类巡天)的高斯误差的 $1\sigma$($2\sigma$)范围内,预测在广泛宇宙学参数下生成的95%(99%)的DE功率谱。该潜变量与两个DE参数之间具有较高的互信息,三者可以通过符号回归用显式方程联系起来。使用两个潜变量仅带来边际的精度提升,加入第三个潜变量则对模型性能没有显著影响。我们还讨论了如何将DE-VAE从概念验证扩展为通用框架,用于为多种超越$\Lambda$CDM的模型和不同宇宙学数据集寻找共同的低维参数化;这一框架既可以通过确定最优探针来指导宇宙学巡天的设计,也能为超越$\Lambda$CDM模型的共同唯象特征提供理论洞见。
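
A minimal sketch of the compress-and-reconstruct structure described in the abstract, assuming a toy setup: an encoder maps a power-spectrum boost vector to a single latent variable, which is concatenated with the CDM parameters before decoding. All layer sizes, names, and the random data are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class ToyDEVAE(nn.Module):
    """Illustrative VAE-style compressor: boost -> 1 latent -> (latent + CDM params) -> boost."""
    def __init__(self, n_bins=100, n_cdm=5, n_latent=1):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_bins, 64), nn.ReLU())
        self.mu = nn.Linear(64, n_latent)        # latent mean
        self.logvar = nn.Linear(64, n_latent)    # latent log-variance
        self.decoder = nn.Sequential(
            nn.Linear(n_latent + n_cdm, 64), nn.ReLU(), nn.Linear(64, n_bins))

    def forward(self, boost, cdm):
        h = self.encoder(boost)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization trick
        recon = self.decoder(torch.cat([z, cdm], dim=-1))          # condition decoder on CDM params
        return recon, mu, logvar

# toy usage: 8 fake boosts over 100 (k, z) bins, 5 CDM parameters each
model = ToyDEVAE()
boost, cdm = torch.randn(8, 100), torch.randn(8, 5)
recon, mu, logvar = model(boost, cdm)
kl = -0.5 * torch.sum(1 + logvar - mu**2 - logvar.exp(), dim=-1).mean()
loss = nn.functional.mse_loss(recon, boost) + 1e-3 * kl  # beta-weighted ELBO (weight is an assumption)
```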

A Computational Framework for Solving Wasserstein Lagrangian Flows

  • paper_url: http://arxiv.org/abs/2310.10649
  • repo_url: https://github.com/necludov/wl-mechanics
  • paper_authors: Kirill Neklyudov, Rob Brekelmans, Alexander Tong, Lazar Atanackovic, Qiang Liu, Alireza Makhzani
  • for: 这篇论文主要针对单元细胞动力学问题进行优化运输问题的扩展,包括不同的可能性空间(kinetic energy)和权重函数(potential energy)的组合,以及这些组合所导致的多种优化运输问题,如契德桥、不均习运输、物理约束等。
  • methods: 该论文提出了一种基于深度学习的框架,可以处理这些优化运输问题的复杂计算。该框架不需要直接 simulate 或 backpropagate learned dynamics,也不需要优化couplings。
  • results: 作者们在单细胞轨迹推断问题中展示了该框架的灵活性和高效性,并在需要将先验知识纳入动力学的场景下取得了优于先前方法的结果。
    Abstract The dynamical formulation of the optimal transport can be extended through various choices of the underlying geometry ($\textit{kinetic energy}$), and the regularization of density paths ($\textit{potential energy}$). These combinations yield different variational problems ($\textit{Lagrangians}$), encompassing many variations of the optimal transport problem such as the Schr\"odinger bridge, unbalanced optimal transport, and optimal transport with physical constraints, among others. In general, the optimal density path is unknown, and solving these variational problems can be computationally challenging. Leveraging the dual formulation of the Lagrangians, we propose a novel deep learning based framework approaching all of these problems from a unified perspective. Our method does not require simulating or backpropagating through the trajectories of the learned dynamics, and does not need access to optimal couplings. We showcase the versatility of the proposed framework by outperforming previous approaches for the single-cell trajectory inference, where incorporating prior knowledge into the dynamics is crucial for correct predictions.
    摘要 “Optimal transport问题的动力学表述可以通过不同的下层结构(动能)和扩散函数(potential energy)的选择扩展。这些组合导致了不同的变量问题(Lagrangians),包括舒得桥、不均衡优化运输、物理约束优化运输等。总的来说,优化的扩散路径未知,解决这些变量问题可能会 computationally challenging。我们基于对准形式的方法提出了一种新的深度学习框架,该框架不需要 simulate或backpropagate通过学习的动力学轨迹,也不需要对优化的扩散函数进行访问。我们展示了该框架的多样性,在单个细胞轨迹推断中超过了先前的方法。”Note: The translation is in Simplified Chinese, which is the standard form of Chinese used in mainland China and Singapore. If you need Traditional Chinese, please let me know.

Efficacy of Dual-Encoders for Extreme Multi-Label Classification

  • paper_url: http://arxiv.org/abs/2310.10636
  • repo_url: None
  • paper_authors: Nilesh Gupta, Devvrit Khatri, Ankit S Rawat, Srinadh Bhojanapalli, Prateek Jain, Inderjit S Dhillon
  • for: 这个论文主要针对的是多类分类问题(Extreme Multi-Label Classification,XMC),具体来说是使用 dual-encoder 模型来解决这类问题。
  • methods: 这篇论文使用了 dual-encoder 模型,并且提出了一种新的损失函数来优化 Recall@k 纪录。
  • results: 论文的实验结果表明,使用标准的 dual-encoder 模型可以与现有的 SOTA 多类分类方法匹配或超越,即使是在最大的 XMC 数据集上。此外,论文还提出了一种可微的 topk 错误基于损失函数,可以专门优化 Recall@k 纪录。
    Abstract Dual-encoder models have demonstrated significant success in dense retrieval tasks for open-domain question answering that mostly involves zero-shot and few-shot scenarios. However, their performance in many-shot retrieval problems where training data is abundant, such as extreme multi-label classification (XMC), remains under-explored. Existing empirical evidence suggests that, for such problems, the dual-encoder method's accuracies lag behind the performance of state-of-the-art (SOTA) extreme classification methods that grow the number of learnable parameters linearly with the number of classes. As a result, some recent extreme classification techniques use a combination of dual-encoders and a learnable classification head for each class to excel on these tasks. In this paper, we investigate the potential of "pure" DE models in XMC tasks. Our findings reveal that when trained correctly standard dual-encoders can match or outperform SOTA extreme classification methods by up to 2% at Precision@1 even on the largest XMC datasets while being 20x smaller in terms of the number of trainable parameters. We further propose a differentiable topk error-based loss function, which can be used to specifically optimize for Recall@k metrics. We include our PyTorch implementation along with other resources for reproducing the results in the supplementary material.
    摘要 dual-encoder 模型在开放问题 answering 中的 dense retrieval 任务中表现出了重要的成功,特别是在零shot 和几shot 场景下。然而,它们在有很多training data的 many-shot retrieval 问题中,如极多标签分类 (XMC),的性能还未得到了充分的探索。现有的实际证据表明,对于这些任务, dual-encoder 方法的准确率落后于 state-of-the-art (SOTA) 极分类方法的性能,后者通过将学习参数的数量与类数直线上增加来提高性能。因此,一些最新的极分类技术使用了 dual-encoder 和每个类别上的学习权重的组合来进行优化。在这篇论文中,我们调查了 "纯" dual-encoder 模型在 XMC 任务中的潜力。我们发现,当正确地训练标准 dual-encoder 模型时,它们可以与 SOTA 极分类方法相当或超越它们,在最大 XMC 数据集上的精度@1 指标上提高至2%,而且只需20倍的学习参数数量。我们还提出了一种可微的 topk 错误基于损失函数,可以专门优化 Recall@k 指标。我们在辅料中包含了 PyTorch 实现以及其他用于重现结果的资源。
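
The paper's exact differentiable top-k loss is not reproduced here; the sketch below shows one generic way a smooth surrogate for Recall@k could be built from dual-encoder scores, replacing the hard top-k selection with a temperature-controlled sigmoid. The function name, temperature, and toy data are assumptions.

```python
import torch

def soft_recall_at_k(scores, labels, k, tau=0.1):
    """Differentiable surrogate for 1 - Recall@k.
    scores: (batch, n_labels) dual-encoder similarities
    labels: (batch, n_labels) multi-hot ground truth"""
    kth = scores.topk(k, dim=-1).values[:, -1:]        # k-th largest score per row
    soft_topk = torch.sigmoid((scores - kth) / tau)    # ~1 for entries in the (soft) top-k
    recall = (soft_topk * labels).sum(-1) / labels.sum(-1).clamp(min=1)
    return 1.0 - recall.mean()

# toy usage with fabricated scores and 3 positives per example
scores = torch.randn(4, 1000, requires_grad=True)
labels = torch.zeros(4, 1000)
labels[:, :3] = 1.0
loss = soft_recall_at_k(scores, labels, k=5)
loss.backward()   # gradients flow back to the dual-encoder scores
```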

Certainty In, Certainty Out: REVQCs for Quantum Machine Learning

  • paper_url: http://arxiv.org/abs/2310.10629
  • repo_url: None
  • paper_authors: Hannah Helgesen, Michael Felsberg, Jan-Åke Larsson
  • for: 这篇论文的目的是提出高单个样本准确率作为主要目标,并提出一种逆向培训方法以实现此目标。
  • methods: 该论文使用统计理论和反向培训方法来实现高准确率单个样本推断。
  • results: 论文通过在随机二进制子集上评估多种有效的变分量子电路(VQC),在单样本推断准确率上实现了10-15%的提升。
    Abstract The field of Quantum Machine Learning (QML) has emerged recently in the hopes of finding new machine learning protocols or exponential speedups for classical ones. Apart from problems with vanishing gradients and efficient encoding methods, these speedups are hard to find because the sampling nature of quantum computers promotes either simulating computations classically or running them many times on quantum computers in order to use approximate expectation values in gradient calculations. In this paper, we make a case for setting high single-sample accuracy as a primary goal. We discuss the statistical theory which enables highly accurate and precise sample inference, and propose a method of reversed training towards this end. We show the effectiveness of this training method by assessing several effective variational quantum circuits (VQCs), trained in both the standard and reversed directions, on random binary subsets of the MNIST and MNIST Fashion datasets, on which our method provides an increase of $10-15\%$ in single-sample inference accuracy.
    摘要 量子机器学习(QML)领域近年兴起,旨在寻找新的机器学习协议,或为经典协议带来指数级加速。除了梯度消失和高效编码方法等问题之外,这类加速之所以难以获得,还因为量子计算机的采样特性:要么只能在经典端模拟计算,要么需要在量子计算机上多次运行,以便在梯度计算中使用近似期望值。本文主张将高单样本准确率作为首要目标。我们讨论了使高准确率、高精度单样本推断成为可能的统计理论,并为此提出一种反向训练方法。我们在MNIST与MNIST Fashion的随机二分类子集上评估了多种有效的变分量子电路(VQC),分别按标准方向与反向方向训练,结果显示该训练方法将单样本推断准确率提升了10-15%。

How Do Transformers Learn In-Context Beyond Simple Functions? A Case Study on Learning with Representations

  • paper_url: http://arxiv.org/abs/2310.10616
  • repo_url: None
  • paper_authors: Tianyu Guo, Wei Hu, Song Mei, Huan Wang, Caiming Xiong, Silvio Savarese, Yu Bai
  • for: 这篇论文探讨了基于Transformer架构的大语言模型在更复杂情形下的上下文学习(ICL)能力,以及这种能力的理论与机制。
  • methods: 作者构造了一系列基于复合表示函数的合成ICL学习问题,并证明存在深度和规模适中的Transformer可以近似实现相应的最优算法。实验中,作者发现训练后的Transformer能够在这些设定下达到近似最优的ICL性能,并呈现出层次分工:下层对数据集做表示变换,上层执行线性ICL。
  • results: 作者通过广泛的探索和一种新的粘贴实验发现了许多在训练过的转换器中的机制,如输入和表示的具体复制行为,上层线性ICL能力,以及在更加复杂的混合 Setting中的后ICL表示选择机制。这些观察到的机制与作者的理论相符,可能有助于理解转换器在更真实的场景中的ICL能力。
    Abstract While large language models based on the transformer architecture have demonstrated remarkable in-context learning (ICL) capabilities, understandings of such capabilities are still in an early stage, where existing theory and mechanistic understanding focus mostly on simple scenarios such as learning simple function classes. This paper takes initial steps on understanding ICL in more complex scenarios, by studying learning with representations. Concretely, we construct synthetic in-context learning problems with a compositional structure, where the label depends on the input through a possibly complex but fixed representation function, composed with a linear function that differs in each instance. By construction, the optimal ICL algorithm first transforms the inputs by the representation function, and then performs linear ICL on top of the transformed dataset. We show theoretically the existence of transformers that approximately implement such algorithms with mild depth and size. Empirically, we find trained transformers consistently achieve near-optimal ICL performance in this setting, and exhibit the desired dissection where lower layers transforms the dataset and upper layers perform linear ICL. Through extensive probing and a new pasting experiment, we further reveal several mechanisms within the trained transformers, such as concrete copying behaviors on both the inputs and the representations, linear ICL capability of the upper layers alone, and a post-ICL representation selection mechanism in a harder mixture setting. These observed mechanisms align well with our theory and may shed light on how transformers perform ICL in more realistic scenarios.
    摘要 大型语言模型基于变换器架构已经展示了很出色的上下文学习(ICL)能力,但对这些能力的理解仍然处于早期阶段,现有的理论和机制理解主要集中在简单的情景下,如学习简单的函数类。这篇论文从更复杂的情景出发,研究学习表示法。具体来说,我们构造了一些具有复合结构的培成式上下文学习问题,其中标签取决于输入的表示函数,这个函数可能是复杂的但固定的。因此,最佳的ICL算法首先将输入转化为表示函数的输出,然后在这些输出上进行线性ICL。我们证明了在某种程度上,存在可以近似实现这种算法的变换器,只需要较少的深度和大小。Empirically,我们发现训练后的变换器在这种设定下具有近乎最佳的ICL性能,并且展现出了预期的分割,下层层次将输入数据转化,而上层层次进行线性ICL。通过广泛的探索和一种新的粘贴实验,我们还发现了许多内部机制,如输入和表示的具体复制行为,上层层次的线性ICL能力,以及在更复杂的混合 Setting下的后ICL表示选择机制。这些观察到的机制与我们的理论相吻合,可能为ICL在更实际的情景中的研究提供了灵感。
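
The compositional data-generating process described above can be summarized compactly (the notation below is assumed here, not taken from the paper):

$$
y_i \;=\; \langle w,\ \Phi(x_i)\rangle + \varepsilon_i,\qquad i=1,\dots,N,
$$

where the representation $\Phi$ is fixed across prompts, the linear head $w$ is resampled per prompt, and the optimal in-context learner first applies $\Phi$ and then performs linear ICL (e.g., ridge regression) on the transformed pairs $\{(\Phi(x_i), y_i)\}$.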

IW-GAE: Importance weighted group accuracy estimation for improved calibration and model selection in unsupervised domain adaptation

  • paper_url: http://arxiv.org/abs/2310.10611
  • repo_url: None
  • paper_authors: Taejong Joo, Diego Klabjan
  • for: 这篇论文旨在解决机器学习中的分布偏移问题,即在不具备标签的情况下,在分布偏移后的预测 зада务中保持高度的准确性。
  • methods: 该论文提出了一种重要性加权的组准确率估计器,通过构造一个优化问题来寻找能够在分布偏移后准确估计分组准确率的重要性权重,并给出了相应的理论分析。
  • results: 大量实验表明,该重要性加权组准确率估计器能够有效改善无监督域适应中的模型校准与模型选择,并提供了一个与提升准确率可迁移性正交的改进方向。
    Abstract Reasoning about a model's accuracy on a test sample from its confidence is a central problem in machine learning, being connected to important applications such as uncertainty representation, model selection, and exploration. While these connections have been well-studied in the i.i.d. settings, distribution shifts pose significant challenges to the traditional methods. Therefore, model calibration and model selection remain challenging in the unsupervised domain adaptation problem--a scenario where the goal is to perform well in a distribution shifted domain without labels. In this work, we tackle difficulties coming from distribution shifts by developing a novel importance weighted group accuracy estimator. Specifically, we formulate an optimization problem for finding an importance weight that leads to an accurate group accuracy estimation in the distribution shifted domain with theoretical analyses. Extensive experiments show the effectiveness of group accuracy estimation on model calibration and model selection. Our results emphasize the significance of group accuracy estimation for addressing challenges in unsupervised domain adaptation, as an orthogonal improvement direction with improving transferability of accuracy.
    摘要 在机器学习中,依据模型的置信度推断其在测试样本上的准确性是一个核心问题,它与不确定性表示、模型选择和探索等重要应用密切相关。然而,分布偏移给传统方法带来了显著挑战,因此在无监督域适应问题(即目标是在无标签的分布偏移域中取得良好表现)中,模型校准与模型选择仍然十分困难。本文通过开发一种新的重要性加权组准确率估计器来应对分布偏移带来的困难:具体而言,我们构造一个优化问题,以寻找能够在分布偏移域中给出准确组准确率估计的重要性权重,并进行了理论分析。大量实验表明,组准确率估计对模型校准和模型选择具有显著作用,并且是与提升准确率可迁移性正交的一个改进方向。
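
As a reading aid only (the notation below is assumed, not the paper's), an importance-weighted estimate of the accuracy of a group $g$ of examples, computed from labelled source data $S$, has the generic form

$$
\widehat{\mathrm{Acc}}_T(g) \;=\; \frac{\sum_{(x,y)\in S,\; x\in g} w(x)\,\mathbf{1}\{\hat f(x)=y\}}{\sum_{(x,y)\in S,\; x\in g} w(x)},
\qquad w(x)\approx \frac{p_T(x)}{p_S(x)},
$$

and the paper's contribution lies in the optimization problem that chooses the weights $w$ so that this group-level estimate stays accurate under distribution shift.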

BayRnTune: Adaptive Bayesian Domain Randomization via Strategic Fine-tuning

  • paper_url: http://arxiv.org/abs/2310.10606
  • repo_url: None
  • paper_authors: Tianle Huang, Nitish Sontakke, K. Niranjan Kumar, Irfan Essa, Stefanos Nikolaidis, Dennis W. Hong, Sehoon Ha
  • for: The paper addresses the need for careful tuning of randomization parameters in domain randomization (DR) and proposes Adaptive Bayesian Domain Randomization via Strategic Fine-tuning (BayRnTune), which accelerates learning by fine-tuning from previously learned policies instead of training from scratch every iteration.
  • methods: BayRnTune inherits the spirit of Bayesian DR but strategically fine-tunes the previous policy when adapting to new environments; four fine-tuning strategies are compared against baselines in five simulated environments, ranging from simple benchmark tasks to more complex legged-robot environments.
  • results: BayRnTune yields better rewards than vanilla domain randomization or Bayesian DR within the same number of timesteps.
    Abstract Domain randomization (DR), which entails training a policy with randomized dynamics, has proven to be a simple yet effective algorithm for reducing the gap between simulation and the real world. However, DR often requires careful tuning of randomization parameters. Methods like Bayesian Domain Randomization (Bayesian DR) and Active Domain Randomization (Adaptive DR) address this issue by automating parameter range selection using real-world experience. While effective, these algorithms often require long computation time, as a new policy is trained from scratch every iteration. In this work, we propose Adaptive Bayesian Domain Randomization via Strategic Fine-tuning (BayRnTune), which inherits the spirit of BayRn but aims to significantly accelerate the learning processes by fine-tuning from previously learned policy. This idea leads to a critical question: which previous policy should we use as a prior during fine-tuning? We investigated four different fine-tuning strategies and compared them against baseline algorithms in five simulated environments, ranging from simple benchmark tasks to more complex legged robot environments. Our analysis demonstrates that our method yields better rewards in the same amount of timesteps compared to vanilla domain randomization or Bayesian DR.
    摘要 域随机化(DR),即在训练策略时随机使用不同的动力学,已经证明是一种简单 yet effective的算法,可以减少实际世界和模拟之间的差距。然而,DR通常需要仔细调整随机参数。例如, bayesian domain randomization(Bayesian DR)和活动域随机化(Adaptive DR)可以自动选择随机参数的范围,使用实际世界经验。尽管有效,这些算法通常需要长时间的计算,因为每轮训练都需要从头开始训练一个新的策略。在这种情况下,我们提出了 adaptive bayesian domain randomization via strategic fine-tuning(BayRnTune),它继承了 BayRn 的精神,但是强调快速学习过程,通过对之前学习的策略进行细化来加速学习。这个想法引出了一个关键的问题:我们在细化过程中应该使用哪个先前学习的策略作为先前?我们 investigate了四种不同的细化策略,并与基线算法进行比较在五个模拟环境中,这些环境从简单的benchmark任务到更复杂的四肢机器人环境。我们的分析表明,我们的方法可以在同样的时间步骤内达到更高的奖励。

Pareto Optimization to Accelerate Multi-Objective Virtual Screening

  • paper_url: http://arxiv.org/abs/2310.10598
  • repo_url: None
  • paper_authors: Jenna C. Fromer, David E. Graff, Connor W. Coley
  • for: 本研究旨在透过多属性算法来快速找到具有强烈结合性、最小化副作用和适当的药物性特性的药物分子。
  • methods: 本研究使用多属性贝叶斯搜寻来减少虚拟实验成本,并运用这种方法在确定蛋白质和副标的对应的选择性抑制剂中找到适当的药物分子。
  • results: 本研究发现,使用多属性贝叶斯搜寻可以快速找到具有强烈结合性、最小化副作用和适当的药物性特性的药物分子,并且可以实现对虚拟实验中的药物分子库进行高效的搜寻和范畴化。
    Abstract The discovery of therapeutic molecules is fundamentally a multi-objective optimization problem. One formulation of the problem is to identify molecules that simultaneously exhibit strong binding affinity for a target protein, minimal off-target interactions, and suitable pharmacokinetic properties. Inspired by prior work that uses active learning to accelerate the identification of strong binders, we implement multi-objective Bayesian optimization to reduce the computational cost of multi-property virtual screening and apply it to the identification of ligands predicted to be selective based on docking scores to on- and off-targets. We demonstrate the superiority of Pareto optimization over scalarization across three case studies. Further, we use the developed optimization tool to search a virtual library of over 4M molecules for those predicted to be selective dual inhibitors of EGFR and IGF1R, acquiring 100% of the molecules that form the library's Pareto front after exploring only 8% of the library. This workflow and associated open source software can reduce the screening burden of molecular design projects and is complementary to research aiming to improve the accuracy of binding predictions and other molecular properties.
    摘要 发现治疗分子是一个多目标优化问题的基本问题。一种形ulation的问题是通过同时具有高绑定亲和力、最小的偶折受影响和合适的药物生物学性 Properties 来认定分子。取得了先前工作使用活动学习加速绑定分子的识别的灵感,我们实现了多属性权重优化来降低虚拟屏选中计算成本,并应用于预测绑定分子的药物设计中。我们在三个案例中证明了对比权重优化的优势,并使用开发的优化工具来搜索虚拟库中的可选性双抑制剂。通过探索虚拟库的8% only,我们收获了虚拟库的极值 front 上的100%分子。这种工作流和相关的开源软件可以减轻分子设计项目的屏选负担,并且与尝试提高绑定预测和其他分子性质的研究相 complementary。
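
The selection step behind the term "Pareto front" can be made concrete with a small non-dominated-sorting helper. The sketch below assumes every column of the objective matrix is a cost to be minimized (objectives where larger is better would be negated first); the data are fabricated placeholders, not docking results.

```python
import numpy as np

def pareto_front(costs):
    """Return a boolean mask marking non-dominated rows of `costs`.
    Every column is treated as a cost to minimize."""
    n = costs.shape[0]
    on_front = np.ones(n, dtype=bool)
    for i in range(n):
        # rows that are <= point i in every objective and strictly < in at least one dominate it
        dominators = np.all(costs <= costs[i], axis=1) & np.any(costs < costs[i], axis=1)
        if dominators.any():
            on_front[i] = False
    return on_front

# toy example: two objectives, e.g. (on-target docking score, negated selectivity margin)
rng = np.random.default_rng(0)
costs = rng.normal(size=(1000, 2))
mask = pareto_front(costs)
print(f"{mask.sum()} of {len(costs)} candidates are non-dominated")
```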

HelmSim: Learning Helmholtz Dynamics for Interpretable Fluid Simulation

  • paper_url: http://arxiv.org/abs/2310.10565
  • repo_url: None
  • paper_authors: Lanxiang Xing, Haixu Wu, Yuezhou Ma, Jianmin Wang, Mingsheng Long
  • for: 这个论文旨在提出一种准确且可解释的流体模拟器,即HelmSim,以解决流体动力学的长期挑战。
  • methods: 该论文提出了基于Helmholtz定理的HelmDynamic模块,将流体动力学分解为更易求解的无旋(curl-free)和无散(divergence-free)分量,物理上分别对应流体的势函数与流函数。该模块被嵌入多尺度积分网络中,在时间维度上对多个空间尺度的Helmholtz动力学进行积分。
  • results: 与以往直接估计速度场的方法相比,HelmSim忠实地源于Helmholtz定理并具有物理上可解释的依据,在数值模拟与真实观测基准上均取得一致的最优表现,即使在复杂边界条件下亦然。
    Abstract Fluid simulation is a long-standing challenge due to the intrinsic high-dimensional non-linear dynamics. Previous methods usually utilize the non-linear modeling capability of deep models to directly estimate velocity fields for future prediction. However, skipping over inherent physical properties but directly learning superficial velocity fields will overwhelm the model from generating precise or physics-reliable results. In this paper, we propose the HelmSim toward an accurate and interpretable simulator for fluid. Inspired by the Helmholtz theorem, we design a HelmDynamic block to learn the Helmholtz dynamics, which decomposes fluid dynamics into more solvable curl-free and divergence-free parts, physically corresponding to potential and stream functions of fluid. By embedding the HelmDynamic block into a Multiscale Integration Network, HelmSim can integrate learned Helmholtz dynamics along temporal dimension in multiple spatial scales to yield future fluid. Comparing with previous velocity estimating methods, HelmSim is faithfully derived from Helmholtz theorem and ravels out complex fluid dynamics with physically interpretable evidence. Experimentally, our proposed HelmSim achieves the consistent state-of-the-art in both numerical simulated and real-world observed benchmarks, even for scenarios with complex boundaries.
    摘要 流体模拟因其内在的高维非线性动力学特性而成为长期挑战。以往方法通常利用深度模型的非线性建模能力直接估计未来的速度场;然而,跳过内在的物理属性、只学习表层速度场,会使模型难以给出精确或物理可靠的结果。本文提出HelmSim,一种准确且可解释的流体模拟器。受Helmholtz定理启发,我们设计了HelmDynamic模块来学习Helmholtz动力学,将流体动力学分解为更易求解的无旋与无散分量,物理上分别对应流体的势函数与流函数。通过将HelmDynamic模块嵌入多尺度积分网络,HelmSim可以在多个空间尺度上沿时间维度积分所学的Helmholtz动力学,从而生成未来的流体状态。与以往的速度估计方法相比,HelmSim忠实地源自Helmholtz定理,并以物理可解释的依据揭示复杂的流体动力学。实验表明,HelmSim在数值模拟与真实观测基准上均取得一致的最优表现,即使在具有复杂边界的场景中亦是如此。
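
The decomposition underlying the HelmDynamic block is the classical Helmholtz theorem; in standard notation (not the paper's symbols),

$$
\mathbf{u} \;=\; \nabla\varphi \;+\; \nabla\times\mathbf{A},
\qquad \nabla\times(\nabla\varphi)=\mathbf{0},\quad \nabla\cdot(\nabla\times\mathbf{A})=0,
$$

so the curl-free part $\nabla\varphi$ corresponds to the potential-function component and the divergence-free part $\nabla\times\mathbf{A}$ to the stream-function component that the network learns and integrates over time.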

Causal Dynamic Variational Autoencoder for Counterfactual Regression in Longitudinal Data

  • paper_url: http://arxiv.org/abs/2310.10559
  • repo_url: None
  • paper_authors: Mouad El Bouchattaoui, Myriam Tami, Benoit Lepetit, Paul-Henry Cournède
  • for: 这篇论文的目的是用来估计治疗效果的变化趋势,特别是在精准医学、epidemiology、经济和市场营销等领域。
  • methods: 这篇论文使用了一种新的方法,即假设存在不观察到的风险因素(也称为调整变量),这些风险因素只影响短期内的结果。
  • results: 论文的实验结果显示,这种新方法可以准确地估计个体治疗效果,并能够捕捉长期内治疗响应中的不观察到风险因素。
    Abstract Estimating treatment effects over time is relevant in many real-world applications, such as precision medicine, epidemiology, economy, and marketing. Many state-of-the-art methods either assume the observations of all confounders or seek to infer the unobserved ones. We take a different perspective by assuming unobserved risk factors, i.e., adjustment variables that affect only the sequence of outcomes. Under unconfoundedness, we target the Individual Treatment Effect (ITE) estimation with unobserved heterogeneity in the treatment response due to missing risk factors. We address the challenges posed by time-varying effects and unobserved adjustment variables. Led by theoretical results over the validity of the learned adjustment variables and generalization bounds over the treatment effect, we devise Causal DVAE (CDVAE). This model combines a Dynamic Variational Autoencoder (DVAE) framework with a weighting strategy using propensity scores to estimate counterfactual responses. The CDVAE model allows for accurate estimation of ITE and captures the underlying heterogeneity in longitudinal data. Evaluations of our model show superior performance over state-of-the-art models.
    摘要 在许多实际应用中,如精准医学、 Epidemiology、经济和市场营销中,估计治疗效果的演化是非常重要的。许多现代方法都是假设所有干扰因素的观察,或者尝试推断未观察到的干扰因素。我们采取了一种不同的视角,假设存在未观察到的风险因素,即调整变量,这些变量只影响结果序列。在干扰性下,我们target个人治疗效果(ITE)估计,带有未观察到的多变性。我们解决了时间变化的效果和未观察到的调整变量的挑战。通过理论结果的有效性和权重分配策略使用可能性分数来估计对应响应,我们设计了 causal DVAE(CDVAE)模型。这个模型结合了动态变量自动编码器(DVAE)框架和一种利用可能性分数进行权重分配的策略,以估计对应响应。 CDVAE 模型允许精准地估计 ITE,并捕捉了长期数据中的下降多变性。我们对我们的模型进行评估,并证明它们在现有模型中表现出色。

Sample Complexity of Preference-Based Nonparametric Off-Policy Evaluation with Deep Networks

  • paper_url: http://arxiv.org/abs/2310.10556
  • repo_url: None
  • paper_authors: Zihao Li, Xiang Ji, Minshuo Chen, Mengdi Wang
  • for: solves reinforcement learning problems with human preference data
  • methods: uses actor-critic methods and fitted-Q-evaluation with a deep neural network
  • results: establishes a sample-efficient estimator for off-policy evaluation with high reward smoothness, and almost aligns with classical OPE results with observable reward data.
    Abstract A recently popular approach to solving reinforcement learning is with data from human preferences. In fact, human preference data are now used with classic reinforcement learning algorithms such as actor-critic methods, which involve evaluating an intermediate policy over a reward learned from human preference data with distribution shift, known as off-policy evaluation (OPE). Such algorithm includes (i) learning reward function from human preference dataset, and (ii) learning expected cumulative reward of a target policy. Despite the huge empirical success, existing OPE methods with preference data often lack theoretical understanding and rely heavily on heuristics. In this paper, we study the sample efficiency of OPE with human preference and establish a statistical guarantee for it. Specifically, we approach OPE by learning the value function by fitted-Q-evaluation with a deep neural network. By appropriately selecting the size of a ReLU network, we show that one can leverage any low-dimensional manifold structure in the Markov decision process and obtain a sample-efficient estimator without suffering from the curse of high data ambient dimensionality. Under the assumption of high reward smoothness, our results \textit{almost align with the classical OPE results with observable reward data}. To the best of our knowledge, this is the first result that establishes a \textit{provably efficient} guarantee for off-policy evaluation with RLHF.
    摘要 近来一种流行的做法是利用人类偏好数据来求解强化学习问题。实际上,人类偏好数据已被用于经典强化学习算法(如actor-critic方法),其流程包括:(i)从人类偏好数据集中学习奖励函数;(ii)在分布偏移下基于所学奖励对中间策略做离策略评估(OPE),估计目标策略的期望累积奖励。尽管取得了巨大的经验成功,现有基于偏好数据的OPE方法往往缺乏理论理解,并严重依赖启发式做法。本文研究了基于人类偏好的OPE的样本效率,并为其建立统计保证。具体而言,我们通过用深度神经网络做拟合Q评估(fitted-Q-evaluation)来学习价值函数;通过恰当选择ReLU网络的规模,可以利用马尔可夫决策过程中潜在的低维流形结构,获得样本高效的估计器,而不会受到数据环境维度过高的诅咒。在奖励高度光滑的假设下,我们的结果几乎与具有可观测奖励数据的经典OPE结果一致。据我们所知,这是首个为RLHF下的离策略评估建立可证明有效保证的结果。
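
For readers unfamiliar with fitted-Q-evaluation, the iteration underlying the analysis can be written as a sequence of regression problems over a function class $\mathcal{F}$ (generic notation, assumed here):

$$
\widehat{Q}_{k+1} \;\in\; \arg\min_{f\in\mathcal{F}} \;\sum_{(s,a,r,s')\in\mathcal{D}} \Big(f(s,a) - r - \gamma\,\mathbb{E}_{a'\sim\pi(\cdot\mid s')}\big[\widehat{Q}_k(s',a')\big]\Big)^2,
$$

with the target policy $\pi$ fixed and the reward $r$ learned from preference data rather than observed directly.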

Population-based wind farm monitoring based on a spatial autoregressive approach

  • paper_url: http://arxiv.org/abs/2310.10555
  • repo_url: None
  • paper_authors: W. Lin, K. Worden, E. J. Cross
  • for: 降低风力电站运行和维护成本
  • methods: 使用人口基于的结构健康监测系统,并利用多个结构(i.e.~风机)共享数据来提高结构行为预测
  • results: 提出了一种基于 Gaussian process 的空间自回归模型(GP-SPARX 模型),可以正确捕捉风机群的空间和时间相关性,并且可以用于健康监测系统的实现。
    Abstract An important challenge faced by wind farm operators is to reduce operation and maintenance cost. Structural health monitoring provides a means of cost reduction through minimising unnecessary maintenance trips as well as prolonging turbine service life. Population-based structural health monitoring can further reduce the cost of health monitoring systems by implementing one system for multiple structures (i.e.~turbines). At the same time, shared data within a population of structures may improve the predictions of structural behaviour. To monitor turbine performance at a population/farm level, an important initial step is to construct a model that describes the behaviour of all turbines under normal conditions. This paper proposes a population-level model that explicitly captures the spatial and temporal correlations (between turbines) induced by the wake effect. The proposed model is a Gaussian process-based spatial autoregressive model, named here a GP-SPARX model. This approach is developed since (a) it reflects our physical understanding of the wake effect, and (b) it benefits from a stochastic data-based learner. A case study is provided to demonstrate the capability of the GP-SPARX model in capturing spatial and temporal variations as well as its potential applicability in a health monitoring system.
    摘要 风电场运营方面临的一个重要挑战是降低运维成本。结构健康监测可以通过减少不必要的维护出行并延长风机服役寿命来降低成本;基于机群的结构健康监测则可用同一套系统监测多台结构(即风机),进一步降低监测系统的成本。同时,机群内多台结构之间共享数据也有助于提高对结构行为的预测。要在机群/风场层面监测风机性能,一个重要的初始步骤是建立描述所有风机在正常工况下行为的模型。本文提出一种机群级模型,即基于高斯过程的空间自回归模型(GP-SPARX),它显式刻画了尾流效应引起的风机之间的空间与时间相关性。之所以采用这一方法,是因为(a)它反映了我们对尾流效应的物理理解,(b)它受益于基于数据的随机学习。案例研究表明,GP-SPARX模型能够捕捉空间与时间变化,并具有应用于健康监测系统的潜力。
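
Informally, a GP-SPARX model is a Gaussian-process regression whose regressors include the turbine's own lagged response together with exogenous inputs and (wake-delayed) signals from neighbouring turbines. The feature construction, kernel, lag, and synthetic data below are placeholder assumptions meant only to illustrate that structure, not the authors' specification.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(1)
T = 500
wind = rng.uniform(4, 14, size=T)                                   # exogenous input (wind speed)
upstream = np.sin(0.05 * np.arange(T)) + 0.1 * rng.normal(size=T)   # neighbouring-turbine signal
power = 0.6 * wind + 0.8 * np.roll(upstream, 3) + 0.2 * rng.normal(size=T)  # toy target response

lag = 3
X = np.column_stack([
    power[lag - 1:T - 1],   # autoregressive term y_{t-1}
    wind[lag:T],            # exogenous term u_t
    upstream[:T - lag],     # spatial term from the neighbour, delayed by the wake travel time
])
y = power[lag:T]

gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0) + WhiteKernel(), normalize_y=True)
gp.fit(X[:400], y[:400])
mean, std = gp.predict(X[400:], return_std=True)   # predictive mean and uncertainty for monitoring
```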

TacticAI: an AI assistant for football tactics

  • paper_url: http://arxiv.org/abs/2310.10553
  • repo_url: None
  • paper_authors: Zhe Wang, Petar Veličković, Daniel Hennes, Nenad Tomašev, Laurel Prince, Michael Kaisers, Yoram Bachrach, Romuald Elie, Li Kevin Wenliang, Federico Piccinini, William Spearman, Ian Graham, Jerome Connor, Yi Yang, Adrià Recasens, Mina Khan, Nathalie Beauguerlange, Pablo Sprechmann, Pol Moreno, Nicolas Heess, Michael Bowling, Demis Hassabis, Karl Tuyls
  • for: 这篇论文是为了开发一种基于人工智能的足球战术助手(TacticAI),帮助教练分析对手队伍的战术模式,并提供有效的回应策略。
  • methods: 这篇论文使用了预测和生成两部分的算法,允许教练通过样本和探索不同的球员布局来评估不同的角球模式,并选择最有可能性 succeed 的设置。
  • results: 研究人员通过对一些有关的 benchmark task 进行验证,证明 TacticAI 的模型建议不仅与实际战术无法分辨,而且在 90% 的时间上超过现有战术。 另外,TacticAI 还提供了一个有效的角球检索系统。
    Abstract Identifying key patterns of tactics implemented by rival teams, and developing effective responses, lies at the heart of modern football. However, doing so algorithmically remains an open research challenge. To address this unmet need, we propose TacticAI, an AI football tactics assistant developed and evaluated in close collaboration with domain experts from Liverpool FC. We focus on analysing corner kicks, as they offer coaches the most direct opportunities for interventions and improvements. TacticAI incorporates both a predictive and a generative component, allowing the coaches to effectively sample and explore alternative player setups for each corner kick routine and to select those with the highest predicted likelihood of success. We validate TacticAI on a number of relevant benchmark tasks: predicting receivers and shot attempts and recommending player position adjustments. The utility of TacticAI is validated by a qualitative study conducted with football domain experts at Liverpool FC. We show that TacticAI's model suggestions are not only indistinguishable from real tactics, but also favoured over existing tactics 90% of the time, and that TacticAI offers an effective corner kick retrieval system. TacticAI achieves these results despite the limited availability of gold-standard data, achieving data efficiency through geometric deep learning.
    摘要 现代足球中,认识对手队伍实施的战术模式,并开发有效应对策略,是核心问题。然而,这种算法化研究仍然是一个开放的研究挑战。为解决这个需求,我们提出了TacticAI,一个基于人工智能的足球战术助手。我们与足球领域专家合作开发并评估了TacticAI,专注于分析角球机会,因为这些机会提供了教练最直接的改进和优化机会。TacticAI包含预测和生成两个组成部分,允许教练通过采样和探索不同的玩家设置来寻找最有可能成功的角球机会。我们在多个相关的 bencmark任务上验证了TacticAI:预测接收者和射击尝试,并建议玩家位置调整。我们通过对足球领域专家进行质量调研,证明TacticAI的模型建议与实际战术无法分辨,并且90%的时间 prefer TacticAI的建议。此外,TacticAI还提供了有效的角球检索系统。TacticAI达到了这些结果,尽管数据的可用性受限,通过几何深度学习实现了数据效率。

Optimal vintage factor analysis with deflation varimax

  • paper_url: http://arxiv.org/abs/2310.10545
  • repo_url: None
  • paper_authors: Xin Bing, Dian Jin, Yuqian Zhang
  • for: This paper proposes a new method for vintage factor analysis, which aims to find a low-dimensional representation of the original data and then seek a rotation that is scientifically meaningful.
  • methods: The proposed method uses a deflation varimax procedure that solves each row of an orthogonal matrix sequentially, which has a net computational gain and flexibility.
  • results: The paper establishes theoretical guarantees for the proposed procedure in a broad context and shows it to be optimal across SNR regimes. The theory holds for finite samples and allows the number of latent factors to grow with the sample size.
    Abstract Vintage factor analysis is one important type of factor analysis that aims to first find a low-dimensional representation of the original data, and then to seek a rotation such that the rotated low-dimensional representation is scientifically meaningful. Perhaps the most widely used vintage factor analysis is the Principal Component Analysis (PCA) followed by the varimax rotation. Despite its popularity, little theoretical guarantee can be provided mainly because varimax rotation requires to solve a non-convex optimization over the set of orthogonal matrices. In this paper, we propose a deflation varimax procedure that solves each row of an orthogonal matrix sequentially. In addition to its net computational gain and flexibility, we are able to fully establish theoretical guarantees for the proposed procedure in a broad context. Adopting this new varimax approach as the second step after PCA, we further analyze this two step procedure under a general class of factor models. Our results show that it estimates the factor loading matrix in the optimal rate when the signal-to-noise-ratio (SNR) is moderate or large. In the low SNR regime, we offer possible improvement over using PCA and the deflation procedure when the additive noise under the factor model is structured. The modified procedure is shown to be optimal in all SNR regimes. Our theory is valid for finite sample and allows the number of the latent factors to grow with the sample size as well as the ambient dimension to grow with, or even exceed, the sample size. Extensive simulation and real data analysis further corroborate our theoretical findings.
    摘要 古典因子分析是一类重要的因子分析方法,旨在先找到原始数据的低维表示,再寻找一种旋转,使旋转后的低维表示具有科学意义。最广泛使用的做法是先做主成分分析(PCA),再做varimax旋转。尽管它十分流行,但由于varimax旋转需要在正交矩阵集合上求解一个非凸优化问题,目前能给出的理论保证很少。本文提出一种逐行求解正交矩阵的降阶(deflation)varimax过程。除了计算上的收益与灵活性之外,我们还能在相当一般的设定下为该过程建立完整的理论保证。将这种新的varimax方法作为PCA之后的第二步,我们进一步在一类一般的因子模型下分析了这一两步过程。结果表明,当信噪比(SNR)中等或较大时,它能以最优速率估计因子载荷矩阵;在低SNR情形下,若因子模型中的加性噪声具有结构,我们给出了相对于PCA加降阶过程的可能改进,修正后的过程在所有SNR情形下都是最优的。我们的理论对有限样本成立,并允许潜在因子个数随样本量增长,环境维度亦可随样本量增长甚至超过样本量。大量模拟与真实数据分析进一步印证了我们的理论发现。
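
For reference, the varimax criterion that the deflation procedure optimizes one orthogonal direction at a time can be stated in standard notation (assumed here, not the paper's):

$$
R^{\star} \;=\; \arg\max_{R\in\mathcal{O}(K)} \;\sum_{j=1}^{K}\left[\frac{1}{p}\sum_{i=1}^{p} (AR)_{ij}^{4} \;-\; \left(\frac{1}{p}\sum_{i=1}^{p} (AR)_{ij}^{2}\right)^{2}\right],
$$

i.e., it maximizes the summed variance of the squared rotated loadings; deflation varimax replaces the joint search over $\mathcal{O}(K)$ by solving for one row/column of $R$ at a time.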

Comparing Comparators in Generalization Bounds

  • paper_url: http://arxiv.org/abs/2310.10534
  • repo_url: None
  • paper_authors: Fredrik Hellström, Benjamin Guedj
  • for: 本研究的目的是提出一类基于信息论与PAC-贝叶斯框架的通用泛化界,用于刻画机器学习模型的泛化性能。
  • methods: 本文利用信息论与PAC-贝叶斯方法推导了一类泛化界,并证明其中最紧的界在比较函数取为界定分布CGF的凸共轭(Cramér函数)时获得。
  • results: 本文的研究结果表明,使用这些泛化约束可以获得更加优化的泛化性能,并且可以在不同的维度上进行泛化。
    Abstract We derive generic information-theoretic and PAC-Bayesian generalization bounds involving an arbitrary convex comparator function, which measures the discrepancy between the training and population loss. The bounds hold under the assumption that the cumulant-generating function (CGF) of the comparator is upper-bounded by the corresponding CGF within a family of bounding distributions. We show that the tightest possible bound is obtained with the comparator being the convex conjugate of the CGF of the bounding distribution, also known as the Cram\'er function. This conclusion applies more broadly to generalization bounds with a similar structure. This confirms the near-optimality of known bounds for bounded and sub-Gaussian losses and leads to novel bounds under other bounding distributions.
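
To make the comparator statement concrete: if the bounding family has cumulant-generating function $\psi_Q$, its convex conjugate (the Cramér function) is $\psi_Q^{*}(x)=\sup_{\lambda}\{\lambda x-\psi_Q(\lambda)\}$, and the tightest comparator in the family is $\psi_Q^{*}$ itself. A standard illustrative special case (not taken from the paper) is the sub-Gaussian choice $\psi_Q(\lambda)=\lambda^{2}\sigma^{2}/2$, for which

$$
\psi_Q^{*}(x) \;=\; \frac{x^{2}}{2\sigma^{2}},
$$

recovering the familiar squared-distance comparator of classical sub-Gaussian generalization bounds.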

Learning optimal integration of spatial and temporal information in noisy chemotaxis

  • paper_url: http://arxiv.org/abs/2310.10531
  • repo_url: https://github.com/kirkegaardlab/chemoxrl
  • paper_authors: Albert Alonso, Julius B. Kirkegaard
  • for: 研究 chemotaxis 驱动 by spatial 和 temporal 估计的边界
  • methods: 使用 deep reinforcement learning 研究可以在不受限制的方式集成 spatial 和 temporal 信息
  • results: 发现两种机制之间的转变是连续的;组合策略在过渡区域表现更好,并且所学策略依赖的空间与时间梯度信息是一种非平凡的组合。
    Abstract We investigate the boundary between chemotaxis driven by spatial estimation of gradients and chemotaxis driven by temporal estimation. While it is well known that spatial chemotaxis becomes disadvantageous for small organisms at high noise levels, it is unclear whether there is a discontinuous switch of optimal strategies or a continuous transition exists. Here, we employ deep reinforcement learning to study the possible integration of spatial and temporal information in an a priori unconstrained manner. We parameterize such a combined chemotactic policy by a recurrent neural network and evaluate it using a minimal theoretical model of a chemotactic cell. By comparing with constrained variants of the policy, we show that it converges to purely temporal and spatial strategies at small and large cell sizes, respectively. We find that the transition between the regimes is continuous, with the combined strategy outperforming in the transition region both the constrained variants as well as models that explicitly integrate spatial and temporal information. Finally, by utilizing the attribution method of integrated gradients, we show that the policy relies on a non-trivial combination of spatially and temporally derived gradient information in a ratio that varies dynamically during the chemotactic trajectories.
    摘要 我们研究了由空间梯度估计驱动的趋化与由时间估计驱动的趋化之间的边界。众所周知,在高噪声水平下,空间趋化策略对小型生物体不再有利;但最优策略是发生不连续的切换,还是存在连续的过渡,目前尚不清楚。我们使用深度强化学习,在不加先验限制的情况下研究空间与时间信息的可能整合方式:用循环神经网络参数化这种组合趋化策略,并在一个极简的趋化细胞理论模型上进行评估。与受限版本的策略对比表明,该策略在细胞尺寸较小和较大时分别收敛于纯时间策略和纯空间策略;两种机制之间的过渡是连续的,而组合策略在过渡区域内既优于受限变体,也优于显式整合空间与时间信息的模型。最后,借助积分梯度(integrated gradients)归因方法,我们发现该策略依赖于空间与时间梯度信息的非平凡组合,且二者的比例在趋化轨迹中动态变化。

From Spectral Theorem to Statistical Independence with Application to System Identification

  • paper_url: http://arxiv.org/abs/2310.10523
  • repo_url: None
  • paper_authors: Muhammad Abdullah Naeem, Amir Khazraei, Miroslav Pajic
  • for: 这个论文是关于高维Random Dynamical Systems的研究,具体来说是研究这些系统的identification问题。
  • methods: 作者使用spectral theorem for non-Hermitian operators来研究系统的特征向量,并通过分析eigenvalues和eigenvectors来描述系统的特性。
  • results: 作者发现,当系统是稳定的时,系统的特征向量可以分解为多个lower dimensional Random Dynamical Systems,这些系统之间是独立的。此外,作者还发现,在这种情况下,covariates可能会受到维度的干扰,导致error的增加。
    Abstract High dimensional random dynamical systems are ubiquitous, including -- but not limited to -- cyber-physical systems, daily return on different stocks of S&P 1500 and velocity profile of interacting particle systems around McKeanVlasov limit. Mathematically, underlying phenomenon can be captured via a stable $n$-dimensional linear transformation `$A$' and additive randomness. System identification aims at extracting useful information about underlying dynamical system, given a length $N$ trajectory from it (corresponds to an $n \times N$ dimensional data matrix). We use spectral theorem for non-Hermitian operators to show that spatio-temperal correlations are dictated by the discrepancy between algebraic and geometric multiplicity of distinct eigenvalues corresponding to state transition matrix. Small discrepancies imply that original trajectory essentially comprises of multiple lower dimensional random dynamical systems living on $A$ invariant subspaces and are statistically independent of each other. In the process, we provide first quantitative handle on decay rate of finite powers of state transition matrix $\|A^{k}\|$ . It is shown that when a stable dynamical system has only one distinct eigenvalue and discrepancy of $n-1$: $\|A\|$ has a dependence on $n$, resulting dynamics are spatially inseparable and consequently there exist at least one row with covariates of typical size $\Theta\big(\sqrt{N-n+1}$ $e^{n}\big)$ i.e., even under stability assumption, covariates can suffer from curse of dimensionality. In the light of these findings we set the stage for non-asymptotic error analysis in estimation of state transition matrix $A$ via least squares regression on observed trajectory by showing that element-wise error is essentially a variant of well-know Littlewood-Offord problem.
    摘要 高维Random动力系统广泛存在,包括但不限于Cyber-Physical Systems、每天不同股票S&P 1500的回报和Interacting Particle Systems around McKeanVlasov limit的速度 Profile。数学上,下面的现象可以通过一个稳定的$n$-维线性变换'$A$'和随机性来捕捉。系统识别目标是从这个系统中提取有用的信息,了解下面的动力系统。我们使用非 hermitian 算子的特征定理来证明,在空间-时间 correlations 中,存在一些独特的多个低维Random dynamical systems 在 $A$ invariable subspaces 中生活,这些系统是独立的。在这个过程中,我们提供了第一个量化的把握,以及 $\|A^{k}\|$ 的衰减率。当一个稳定的动力系统只有一个独特的征值,并且差值为 $n-1$,则 $\|A\|$ 具有对 $n$ 的依赖关系,结果的动力系统是无法分离的。因此,存在至少一行具有特点大小 $\Theta\big(\sqrt{N-n+1}$ $e^{n}\big)$ 的covariates,即,even under stability assumption,covariates 可能会受到维度约束。在这些发现的基础上,我们设置了非对数学术的错误分析在 $A$ 的最小二乘回归中,并证明了元素级别的错误是一种变种的 Littlewood-Offord 问题。
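
The estimation problem that the non-asymptotic error analysis targets is ordinary least squares on a single trajectory; in standard notation (assumed here):

$$
x_{t+1} = A x_t + w_t,\qquad
\widehat{A} \;=\; \arg\min_{A}\;\sum_{t=0}^{N-1}\bigl\|x_{t+1}-A x_t\bigr\|_2^{2}
\;=\;\Bigl(\sum_t x_{t+1}x_t^{\top}\Bigr)\Bigl(\sum_t x_t x_t^{\top}\Bigr)^{-1},
$$

and the paper's point is that the element-wise error of $\widehat{A}$ behaves like a Littlewood-Offord-type quantity governed by the gap between algebraic and geometric multiplicities of the eigenvalues of $A$.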

Reproducing Bayesian Posterior Distributions for Exoplanet Atmospheric Parameter Retrievals with a Machine Learning Surrogate Model

  • paper_url: http://arxiv.org/abs/2310.10521
  • repo_url: None
  • paper_authors: Eyup B. Unlu, Roy T. Forestano, Konstantin T. Matchev, Katia Matcheva
  • for: 这个论文是为了实现基于机器学习的 posterior 分布模型,用于重现通过掩蔽行星的谱 spectra 获得的外层星球大气参数的 Bayesian posterior distributions。
  • methods: 该模型使用了适应性学习和半监督学习,以便利用大量的无标注训练数据。它还进行了领域适应的特征处理,以提高模型性能。
  • results: 该模型在2023年 Ariel 机器学习数据挑战中获得了优胜解决方案。
    Abstract We describe a machine-learning-based surrogate model for reproducing the Bayesian posterior distributions for exoplanet atmospheric parameters derived from transmission spectra of transiting planets with typical retrieval software such as TauRex. The model is trained on ground truth distributions for seven parameters: the planet radius, the atmospheric temperature, and the mixing ratios for five common absorbers: $H_2O$, $CH_4$, $NH_3$, $CO$ and $CO_2$. The model performance is enhanced by domain-inspired preprocessing of the features and the use of semi-supervised learning in order to leverage the large amount of unlabelled training data available. The model was among the winning solutions in the 2023 Ariel Machine Learning Data Challenge.
    摘要 我们描述了一种基于机器学习的代理模型,用于重现由凌星行星透射光谱(经TauRex等常用反演软件处理)得到的系外行星大气参数的贝叶斯后验分布。该模型在七个参数的真实后验分布上训练:行星半径、大气温度,以及五种常见吸收体($H_2O$、$CH_4$、$NH_3$、$CO$、$CO_2$)的混合比。通过面向该领域的特征预处理以及利用大量无标注训练数据的半监督学习,模型性能得到提升。该模型是2023年Ariel机器学习数据挑战的获胜方案之一。

ReMax: A Simple, Effective, and Efficient Reinforcement Learning Method for Aligning Large Language Models

  • paper_url: http://arxiv.org/abs/2310.10505
  • repo_url: https://github.com/liziniu/ReMax
  • paper_authors: Ziniu Li, Tian Xu, Yushun Zhang, Yang Yu, Ruoyu Sun, Zhi-Quan Luo
  • for: 本研究旨在提高RLHF任务中的训练效率,并解决PPO算法的计算效率问题。
  • methods: 本研究提出了一种新的RLHF算法 called ReMax,基于REINFORCE算法,并具有一种新的减少方差技术。
  • results: ReMax比PPO具有三大优点:首先,ReMax简单实现,消除了多个超参数,减少了训练时间和精度优化的努力。其次,ReMax减少了50%的内存使用量,可以在8xA100-40GB GPU上训练Llama2(7B)模型。最后,ReMax比PPO快2倍,不降低性能。
    Abstract Alignment is of critical importance for training large language models (LLMs). The predominant strategy to address this is through Reinforcement Learning from Human Feedback (RLHF), where PPO serves as the de-facto algorithm. Yet, PPO is known to suffer from computational inefficiency, which is a challenge that this paper aims to address. We identify three important properties in RLHF tasks: fast simulation, deterministic transitions, and trajectory-level rewards, which are not leveraged in PPO. Based on such observations, we develop a new algorithm tailored for RLHF, called ReMax. The algorithm design of ReMax is built on a celebrated algorithm REINFORCE but is equipped with a new variance-reduction technique. Our method has three-fold advantages over PPO: first, ReMax is simple to implement and removes many hyper-parameters in PPO, which are scale-sensitive and laborious to tune. Second, ReMax saves about 50% memory usage in principle. As a result, PPO runs out-of-memory when fine-tuning a Llama2 (7B) model on 8xA100-40GB GPUs, whereas ReMax can afford training. This memory improvement is achieved by removing the value model in PPO. Third, based on our calculations, we find that even assuming PPO can afford the training of Llama2 (7B), it would still run about 2x slower than ReMax. This is due to the computational overhead of the value model, which does not exist in ReMax. Importantly, the above computational improvements do not sacrifice the performance. We hypothesize these advantages can be maintained in larger-scaled models. Our implementation of ReMax is available at https://github.com/liziniu/ReMax
    摘要 对齐(Alignment)对训练大语言模型(LLM)至关重要。目前主流做法是基于人类反馈的强化学习(RLHF),其中PPO是事实上的标准算法;然而,PPO存在计算效率低的问题,这正是本文要解决的挑战。我们指出RLHF任务具有三个重要性质:模拟速度快、转移确定,以及轨迹级奖励,而PPO并未利用这些性质。基于这些观察,我们为RLHF设计了新算法ReMax:其算法设计建立在经典的REINFORCE之上,并配备了一种新的方差缩减技术。相比PPO,ReMax有三方面优势:其一,ReMax实现简单,去掉了PPO中许多对尺度敏感、调参费力的超参数;其二,ReMax原则上可节省约50%的内存——PPO在8块A100-40GB GPU上微调Llama2(7B)时会内存不足,而ReMax可以完成训练,这一改进来自于去掉了PPO中的价值模型;其三,根据我们的计算,即便PPO能够训练Llama2(7B),其速度也会比ReMax慢约2倍,原因在于价值模型带来的计算开销在ReMax中并不存在。重要的是,上述计算上的改进并不以牺牲性能为代价。我们推测这些优势在更大规模的模型上同样可以保持。ReMax的实现见 https://github.com/liziniu/ReMax 。
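
The abstract describes ReMax as REINFORCE equipped with a new variance-reduction technique that removes the value model; one commonly cited form of this idea takes the reward of a greedy (deterministic) generation as the baseline. The toy bandit-style sketch below illustrates that gradient estimator only, under that assumption; it is a reading aid, not the authors' implementation.

```python
import torch

torch.manual_seed(0)
n_actions = 6
logits = torch.zeros(n_actions, requires_grad=True)            # toy "policy" over candidate responses
true_reward = torch.tensor([0.1, 0.2, 0.9, 0.3, 0.0, 0.4])      # stands in for a learned reward model
opt = torch.optim.Adam([logits], lr=0.1)

for step in range(200):
    probs = torch.softmax(logits, dim=-1)
    dist = torch.distributions.Categorical(probs=probs)
    a = dist.sample()                                            # sampled response
    a_greedy = probs.argmax()                                    # greedy response; no extra value network
    advantage = true_reward[a] - true_reward[a_greedy]           # reward-difference baseline (assumed ReMax-style)
    loss = -advantage.detach() * dist.log_prob(a)                # REINFORCE with the baseline subtracted
    opt.zero_grad(); loss.backward(); opt.step()

print(torch.softmax(logits, dim=-1))                             # probability mass concentrates on action 2
```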

Few-Shot Learning Patterns in Financial Time-Series for Trend-Following Strategies

  • paper_url: http://arxiv.org/abs/2310.10500
  • repo_url: None
  • paper_authors: Kieran Wood, Samuel Kessler, Stephen J. Roberts, Stefan Zohren
  • for: 这个论文是为了提出一种能够快速适应金融市场变化的时间序列趋势预测模型,以避免在金融市场突然变化时出现的损失。
  • methods: 该模型利用了深度学习的最新进展,特别是小样本学习(few-shot learning),并结合时间序列趋势预测模型。
  • results: 在2018-2023年的动荡市场期间,该模型相比神经预测器将夏普比率提升18.9%,相比传统时间序列动量策略提升10倍,并且在COVID-19回撤后以两倍的速度恢复。此外,该模型还能对未见过的新金融资产进行零样本建仓,同期夏普比率相比神经时间序列趋势预测器提升5倍。
    Abstract Forecasting models for systematic trading strategies do not adapt quickly when financial market conditions change, as was seen in the advent of the COVID-19 pandemic in 2020, when market conditions changed dramatically causing many forecasting models to take loss-making positions. To deal with such situations, we propose a novel time-series trend-following forecaster that is able to quickly adapt to new market conditions, referred to as regimes. We leverage recent developments from the deep learning community and use few-shot learning. We propose the Cross Attentive Time-Series Trend Network - X-Trend - which takes positions attending over a context set of financial time-series regimes. X-Trend transfers trends from similar patterns in the context set to make predictions and take positions for a new distinct target regime. X-Trend is able to quickly adapt to new financial regimes with a Sharpe ratio increase of 18.9% over a neural forecaster and 10-fold over a conventional Time-series Momentum strategy during the turbulent market period from 2018 to 2023. Our strategy recovers twice as quickly from the COVID-19 drawdown compared to the neural-forecaster. X-Trend can also take zero-shot positions on novel unseen financial assets obtaining a 5-fold Sharpe ratio increase versus a neural time-series trend forecaster over the same period. X-Trend both forecasts next-day prices and outputs a trading signal. Furthermore, the cross-attention mechanism allows us to interpret the relationship between forecasts and patterns in the context set.
    摘要 预测模型 для系统性交易策略不快适应金融市场条件变化,例如2020年COVID-19大流行期间,市场条件快速变化,许多预测模型亏损。为解决这种情况,我们提出了一种新的时间序列趋势预测器,可以快速适应新的市场条件,称为“ régime”。我们利用了最新的深度学习社区的进展,并使用几何学学习。我们提出了跨注意力时间序列趋势网络(X-Trend),它在一个上下文集中注意力分配位置,并将趋势从类似的模式传递到新目标 régime 中进行预测和交易。X-Trend 能快速适应新的金融 régime,其肖特比(Sharpe ratio)提高18.9%于神经预测器和10倍于传统时间序列势力策略在2018-2023年的混乱市场期间。我们的策略在COVID-19下滑期间复制两倍于神经预测器。X-Trend 还可以在未看到的金融资产上出现零shot位置,其肖特比提高5倍于神经时间序列趋势预测器在同一时间期。X-Trend 同时预测下一天的价格和输出交易信号。此外,跨注意力机制允许我们解释预测和上下文集中的模式之间的关系。

Passive Inference Attacks on Split Learning via Adversarial Regularization

  • paper_url: http://arxiv.org/abs/2310.10483
  • repo_url: None
  • paper_authors: Xiaochen Zhu, Xinjian Luo, Yuncheng Wu, Yangfan Jiang, Xiaokui Xiao, Beng Chin Ooi
  • for: 这个研究旨在攻击 Split Learning (SL) 的实际和有效替代方案。
  • methods: 这个研究引入了一个名为 SDAR 的攻击框架,这个框架使用辅助数据和敌对调整来学习一个可以实时重建客户端私人模型的可靠模拟器。
  • results: 实验结果显示,在现实且具挑战性的攻击场景中,SDAR 能够有效重建客户端私有特征,并在 U 形 SL 中同时重建特征与标签。在 CIFAR-10 上、分割深度为 7 时,SDAR 在标准 SL 与 U 形 SL 中均能将私有特征重建的均方误差控制在 0.025 以下,并在 U 形 SL 中取得高于 98% 的标签推断准确率。
    Abstract Split Learning (SL) has emerged as a practical and efficient alternative to traditional federated learning. While previous attempts to attack SL have often relied on overly strong assumptions or targeted easily exploitable models, we seek to develop more practical attacks. We introduce SDAR, a novel attack framework against SL with an honest-but-curious server. SDAR leverages auxiliary data and adversarial regularization to learn a decodable simulator of the client's private model, which can effectively infer the client's private features under the vanilla SL, and both features and labels under the U-shaped SL. We perform extensive experiments in both configurations to validate the effectiveness of our proposed attacks. Notably, in challenging but practical scenarios where existing passive attacks struggle to reconstruct the client's private data effectively, SDAR consistently achieves attack performance comparable to active attacks. On CIFAR-10, at the deep split level of 7, SDAR achieves private feature reconstruction with less than 0.025 mean squared error in both the vanilla and the U-shaped SL, and attains a label inference accuracy of over 98% in the U-shaped setting, while existing attacks fail to produce non-trivial results.

Adaptive Neural Ranking Framework: Toward Maximized Business Goal for Cascade Ranking Systems

  • paper_url: http://arxiv.org/abs/2310.10462
  • repo_url: None
  • paper_authors: Yunli Wang, Zhiqiang Wang, Jian Yang, Shiyang Wen, Dongying Kong, Han Li, Kun Gai
  • for: 这个论文主要针对大规模top-k选择问题中的涨幅排序系统优化,具体来说是通过学习排序来优化模型。
  • methods: 该论文提出了一种基于多任务学习框架的 Adaptive Neural Ranking Framework,通过将relaxed和完整的目标优化并 combinely,使得优化目标适应不同数据复杂度和模型能力。
  • results: 实验结果表明,该方法在4个公共和商业benchmark上表现出色,并且在线上实验中具有显著的应用价值。
    Abstract Cascade ranking is widely used for large-scale top-k selection problems in online advertising and recommendation systems, and learning-to-rank is an important way to optimize the models in cascade ranking systems. Previous works on learning-to-rank usually focus on letting the model learn the complete order or pay more attention to the order of top materials, and adopt the corresponding rank metrics as optimization targets. However, these optimization targets can not adapt to various cascade ranking scenarios with varying data complexities and model capabilities; and the existing metric-driven methods such as the Lambda framework can only optimize a rough upper bound of the metric, potentially resulting in performance misalignment. To address these issues, we first propose a novel perspective on optimizing cascade ranking systems by highlighting the adaptability of optimization targets to data complexities and model capabilities. Concretely, we employ multi-task learning framework to adaptively combine the optimization of relaxed and full targets, which refers to metrics Recall@m@k and OAP respectively. Then we introduce a permutation matrix to represent the rank metrics and employ differentiable sorting techniques to obtain a relaxed permutation matrix with controllable approximate error bound. This enables us to optimize both the relaxed and full targets directly and more appropriately using the proposed surrogate losses within the deep learning framework. We named this method as Adaptive Neural Ranking Framework. We use the NeuralSort method to obtain the relaxed permutation matrix and draw on the uncertainty weight method in multi-task learning to optimize the proposed losses jointly. Experiments on a total of 4 public and industrial benchmarks show the effectiveness and generalization of our method, and online experiment shows that our method has significant application value.
    摘要 级联排序广泛应用于在线广告和推荐系统中的大规模top-k选择问题,而learning-to-rank是优化级联排序系统中模型的重要手段。以往的learning-to-rank工作通常让模型学习完整的序,或更关注头部物料的序,并采用相应的排序指标作为优化目标。然而,这些优化目标无法适应数据复杂度和模型能力各异的级联排序场景;而现有的指标驱动方法(如Lambda框架)只能优化指标的一个粗略上界,可能导致性能错配。为解决这些问题,我们首先提出一种新的视角,强调优化目标应当适应数据复杂度与模型能力。具体而言,我们采用多任务学习框架,自适应地组合松弛目标与完整目标的优化,二者分别对应Recall@m@k与OAP指标。随后,我们引入置换矩阵来表示排序指标,并借助可微排序技术得到具有可控近似误差界的松弛置换矩阵,从而可以在深度学习框架内用所提出的代理损失直接且更恰当地同时优化松弛目标与完整目标。我们将该方法命名为Adaptive Neural Ranking Framework。我们使用NeuralSort方法获得松弛置换矩阵,并借鉴多任务学习中的不确定性加权方法对所提损失进行联合优化。在共四个公开与工业基准上的实验表明了方法的有效性与泛化性,线上实验也显示其具有显著的应用价值。
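
The relaxed permutation matrix mentioned above can be produced with the NeuralSort operator; the sketch below follows the standard published relaxation (a row-stochastic matrix controlled by a temperature $\tau$), but how it is combined with this paper's surrogate losses is not reproduced here.

```python
import torch

def neural_sort(scores, tau=1.0):
    """Continuous relaxation of the descending-sort permutation matrix (NeuralSort).
    scores: (batch, n); returns (batch, n, n) row-stochastic matrices."""
    n = scores.size(-1)
    s = scores.unsqueeze(-1)                                    # (batch, n, 1)
    A = (s - s.transpose(-1, -2)).abs()                         # pairwise |s_i - s_j|
    B = A.sum(dim=-1, keepdim=True).transpose(-1, -2)           # column j carries sum_k |s_j - s_k|
    scaling = n + 1 - 2 * torch.arange(1, n + 1, device=scores.device, dtype=scores.dtype)
    C = scaling.view(1, n, 1) * s.transpose(-1, -2)             # entry (i, j) = (n + 1 - 2i) * s_j
    return torch.softmax((C - B) / tau, dim=-1)

scores = torch.tensor([[0.1, 2.0, -1.0, 0.7]])
P = neural_sort(scores, tau=0.1)
print(P.argmax(dim=-1))   # near-hard permutation: item indices in descending score order
```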

A Geometric Insight into Equivariant Message Passing Neural Networks on Riemannian Manifolds

  • paper_url: http://arxiv.org/abs/2310.10448
  • repo_url: None
  • paper_authors: Ilyes Batatia
  • for: 本文提出了一种 geometric 的思路,用于解释 equivariant message passing 在 Riemannian manifold 上的实现。
  • methods: 作者使用 coordinate-independent feature fields 表示数据的 numerical features,并将其映射到主bundle 上的 equivariant embedding 中。然后,他们提出一种优化 Polyakov action 的方法,以确保 embedding 中的 metric 与原始 metric 相似。
  • results: 作者提出了一种基于 equivariant diffusion process 的 message passing scheme,可以在 manifold 上实现。此外,他们还提出了一种基于高阶 equivariant diffusion process 的新的一般化 GNN 模型,可以扩展 ACE 和 MACE formalism 到 Riemannian manifold 上的数据。
    Abstract This work proposes a geometric insight into equivariant message passing on Riemannian manifolds. As previously proposed, numerical features on Riemannian manifolds are represented as coordinate-independent feature fields on the manifold. To any coordinate-independent feature field on a manifold comes attached an equivariant embedding of the principal bundle to the space of numerical features. We argue that the metric this embedding induces on the numerical feature space should optimally preserve the principal bundle's original metric. This optimality criterion leads to the minimization of a twisted form of the Polyakov action with respect to the graph of this embedding, yielding an equivariant diffusion process on the associated vector bundle. We obtain a message passing scheme on the manifold by discretizing the diffusion equation flow for a fixed time step. We propose a higher-order equivariant diffusion process equivalent to diffusion on the cartesian product of the base manifold. The discretization of the higher-order diffusion process on a graph yields a new general class of equivariant GNN, generalizing the ACE and MACE formalism to data on Riemannian manifolds.
    摘要 本工作为黎曼流形上的等变消息传递提供了一种几何视角。按照此前的做法,黎曼流形上的数值特征被表示为流形上与坐标无关的特征场;对任意这样的特征场,都伴随着一个从主丛到数值特征空间的等变嵌入。我们主张,该嵌入在数值特征空间上诱导的度量应当尽可能保持主丛原有的度量。这一最优性准则等价于对该嵌入图像的一个扭曲形式的Polyakov作用量做最小化,由此在相应的向量丛上得到一个等变扩散过程。将该扩散方程按固定时间步离散,即可得到流形上的消息传递方案。我们进一步提出一种更高阶的等变扩散过程,它等价于在基流形的笛卡尔积上做扩散;在图上离散这一高阶扩散过程,便得到一类新的一般等变GNN,将ACE与MACE形式体系推广到黎曼流形上的数据。

Taming the Sigmoid Bottleneck: Provably Argmaxable Sparse Multi-Label Classification

  • paper_url: http://arxiv.org/abs/2310.10443
  • repo_url: https://github.com/andreasgrv/sigmoid-bottleneck
  • paper_authors: Andreas Grivas, Antonio Vergari, Adam Lopez
  • for: 这篇论文是关于多标签分类任务中的sigmoid输出层,其中每个输入可以获得多个标签。
  • methods: 这篇论文提出使用离散傅里叶变换(DFT)输出层,以确保所有至多含$k$个激活标签的稀疏标签组合都是可取argmax的(argmaxable)。
  • results: 论文表明,sigmoid输出层在多标签分类任务中会导致无法argmax的输出,并且可以通过使用DFT输出层来避免这种情况。DFT输出层比sigmoid输出层更快速地训练,并且具有更好的参数效率。
    Abstract Sigmoid output layers are widely used in multi-label classification (MLC) tasks, in which multiple labels can be assigned to any input. In many practical MLC tasks, the number of possible labels is in the thousands, often exceeding the number of input features and resulting in a low-rank output layer. In multi-class classification, it is known that such a low-rank output layer is a bottleneck that can result in unargmaxable classes: classes which cannot be predicted for any input. In this paper, we show that for MLC tasks, the analogous sigmoid bottleneck results in exponentially many unargmaxable label combinations. We explain how to detect these unargmaxable outputs and demonstrate their presence in three widely used MLC datasets. We then show that they can be prevented in practice by introducing a Discrete Fourier Transform (DFT) output layer, which guarantees that all sparse label combinations with up to $k$ active labels are argmaxable. Our DFT layer trains faster and is more parameter efficient, matching the F1@k score of a sigmoid layer while using up to 50% fewer trainable parameters. Our code is publicly available at https://github.com/andreasgrv/sigmoid-bottleneck.
    摘要 希格迪输出层在多标签分类(MLC)任务中广泛使用,在任务中任何输入都可以获得多个标签。在实际应用中,可能有数천个可能的标签,常常超过输入特征的数量,导致输出层的低级排名。在多类分类中,这种低级输出层会导致不可预测的类:无法预测的类。在这篇论文中,我们表明MLC任务中的希格迪瓶颈会导致无数多个不可预测的标签组合。我们解释了如何检测这些不可预测的输出和三个常用的MLC数据集中其存在。然后我们表明可以通过引入离散傅里叶变换(DFT)输出层来避免这些不可预测的输出。我们的DFT层在训练时更快,并且使用更少的可训练参数,与希格迪层的F1@k分数相同,而使用的参数数量可以减少到50%。我们的代码可以在https://github.com/andreasgrv/sigmoid-bottleneck上获取。

Equivariant Matrix Function Neural Networks

  • paper_url: http://arxiv.org/abs/2310.10434
  • repo_url: None
  • paper_authors: Ilyes Batatia, Lars L. Schaaf, Huajie Chen, Gábor Csányi, Christoph Ortner, Felix A. Faber
  • for: This paper aims to address the challenges of modeling non-local interactions in systems such as large conjugated molecules, metals, or amorphous materials using Graph Neural Networks (GNNs) and traditional neural networks.
  • methods: The paper introduces a novel architecture called Matrix Function Neural Networks (MFNs), which parameterizes non-local interactions through analytic matrix equivariant functions. The MFN architecture uses resolvent expansions for a straightforward implementation and the potential for linear scaling with system size.
  • results: The MFN architecture achieves state-of-the-art performance in standard graph benchmarks, such as the ZINC and TU datasets, and is able to capture intricate non-local interactions in quantum systems, paving the way to new state-of-the-art force fields.
    Abstract Graph Neural Networks (GNNs), especially message-passing neural networks (MPNNs), have emerged as powerful architectures for learning on graphs in diverse applications. However, MPNNs face challenges when modeling non-local interactions in systems such as large conjugated molecules, metals, or amorphous materials. Although Spectral GNNs and traditional neural networks such as recurrent neural networks and transformers mitigate these challenges, they often lack extensivity, adaptability, generalizability, computational efficiency, or fail to capture detailed structural relationships or symmetries in the data. To address these concerns, we introduce Matrix Function Neural Networks (MFNs), a novel architecture that parameterizes non-local interactions through analytic matrix equivariant functions. Employing resolvent expansions offers a straightforward implementation and the potential for linear scaling with system size. The MFN architecture achieves state-of-the-art performance in standard graph benchmarks, such as the ZINC and TU datasets, and is able to capture intricate non-local interactions in quantum systems, paving the way to new state-of-the-art force fields.
    摘要 图神经网络(GNNs),特别是消息传递神经网络(MPNNs),已成为在各类应用中进行图上学习的强大架构。然而,在大型共轭分子、金属或非晶材料等系统中,MPNNs难以建模非局域相互作用。谱GNN以及循环神经网络、Transformer等传统神经网络虽然可以缓解这些问题,但往往缺乏可扩展性、适应性、泛化性或计算效率,或者无法捕捉数据中的细致结构关系与对称性。为解决这些问题,我们提出矩阵函数神经网络(MFNs),一种通过解析的矩阵等变函数来参数化非局域相互作用的新架构。采用预解式展开(resolvent expansions)可以得到简洁的实现,并有望实现随系统规模线性扩展。MFN架构在ZINC和TU等标准图基准上取得了最先进的性能,并能够捕捉量子系统中复杂的非局域相互作用,为新一代最先进力场铺平了道路。
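
下面用一段 LaTeX 说明摘要中提到的"预解式展开"背后的一个标准恒等式(仅作背景说明,并不代表论文的具体参数化方式):解析的矩阵函数可以写成围道积分,再离散为若干预解式之和,每一项只需要求解线性方程组,这也是"随系统规模线性扩展"潜力的来源。

```latex
% Matrix function of a (symmetric) operator H via its resolvent: for f analytic
% inside a contour C that encloses the spectrum of H,
f(H) \;=\; \frac{1}{2\pi i}\oint_{\mathcal C} f(z)\,(zI - H)^{-1}\, dz
\;\approx\; \sum_{k=1}^{K} w_k\,(z_k I - H)^{-1},
% where (z_k, w_k) are quadrature nodes and weights; each resolvent term only
% requires linear solves, which is what suggests (near-)linear scaling in system size.
```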

Continuously Adapting Random Sampling (CARS) for Power Electronics Parameter Design

  • paper_url: http://arxiv.org/abs/2310.10425
  • repo_url: None
  • paper_authors: Dominik Happel, Philipp Brendel, Andreas Rosskopf, Stefan Ditze
  • for: 这篇论文针对电力电子参数设计任务的优化问题。此类任务通常要么采用基于精细仿真的详细优化方法,要么采用基于极快仿真的暴力网格搜索。
  • methods: 论文提出了一种名为"连续自适应随机采样"(CARS)的新方法,介于详细优化与暴力网格搜索之间:既支持非常快速和/或大规模的仿真,又能逐步聚焦于最有希望的参数范围。该方法借鉴多臂老虎机的研究思想,在一个高维参数张量中对子域进行优先采样。
  • results: 论文在三个典型的电力电子应用案例上进行了评估,所得设计与遗传算法相比具有竞争力,同时仿真可高度并行,并能在探索与利用之间连续过渡。
    Abstract To date, power electronics parameter design tasks are usually tackled using detailed optimization approaches with detailed simulations or using brute force grid search with very fast simulations. A new method, named "Continuously Adapting Random Sampling" (CARS) is proposed, which provides a continuous method in between. This allows for very fast, and / or large amounts of simulations, but increasingly focuses on the most promising parameter ranges. Inspirations are drawn from multi-armed bandit research and lead to prioritized sampling of sub-domains in one high-dimensional parameter tensor. Performance has been evaluated on three exemplary power electronic use-cases, where resulting designs appear competitive to genetic algorithms, but additionally allow for highly parallelizable simulation, as well as continuous progression between explorative and exploitative settings.
    摘要 迄今为止,电力电子参数设计任务通常要么采用基于精细仿真的详细优化方法,要么采用基于极快仿真的暴力网格搜索。本文提出一种名为"连续自适应随机采样"(CARS)的新方法,它介于两者之间,既支持非常快速和/或大规模的仿真,又能逐步聚焦于最有希望的参数范围。该方法借鉴了多臂老虎机的研究思想,在一个高维参数张量中对子域进行优先采样。我们在三个典型的电力电子应用案例上进行了评估,所得设计与遗传算法相比具有竞争力,同时仿真可高度并行,并能在探索与利用之间连续过渡。
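
下面给出一个极简的 Python 草图,示意"对高维参数张量中的子域做优先采样"这一思想;其中的函数名与权重更新规则均为假设性的示例,并非论文中 CARS 的原始算法。

```python
import numpy as np

def cars_like_search(evaluate, bounds, n_rounds=20, batch=256, n_bins=8, rng=None):
    """Toy sketch of a CARS-style search: split each parameter range into
    sub-domains (bins), then sample more heavily from bins whose samples
    scored best so far (bandit-style prioritisation of sub-domains)."""
    rng = np.random.default_rng(0) if rng is None else rng
    dims = len(bounds)
    weights = np.ones((dims, n_bins))            # one weight vector per dimension
    lo = np.array([b[0] for b in bounds])
    hi = np.array([b[1] for b in bounds])
    width = (hi - lo) / n_bins
    best_x, best_y = None, np.inf
    for _ in range(n_rounds):
        probs = weights / weights.sum(axis=1, keepdims=True)
        # pick a bin per dimension, then a uniform point inside that bin
        bins = np.stack([rng.choice(n_bins, size=batch, p=probs[d]) for d in range(dims)], axis=1)
        x = lo + (bins + rng.random((batch, dims))) * width
        y = np.array([evaluate(xi) for xi in x])  # many fast simulations
        # credit bins that produced good (low-cost) samples
        for d in range(dims):
            for k in range(n_bins):
                mask = bins[:, d] == k
                if mask.any():
                    weights[d, k] += np.exp(-y[mask].min())
        if y.min() < best_y:
            best_y, best_x = y.min(), x[y.argmin()]
    return best_x, best_y
```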

Towards Fair and Calibrated Models

  • paper_url: http://arxiv.org/abs/2310.10399
  • repo_url: None
  • paper_authors: Anand Brahmbhatt, Vipul Rathore, Mausam, Parag Singla
  • for: 构建既公平(对给定属性无偏)又校准(模型置信度与预测准确率一致)的机器学习模型
  • methods: 采用一种特定的公平性定义(在该定义下贝叶斯最优分类器具有最大公平性),证明针对敏感属性的分组校准即可得到公平模型,并据此提出基于温度缩放的简单后处理技术,以及对现有校准损失的分组化修改
  • results: 在多样化的数据集上进行了大量实验,表明这些技术能够得到公平且校准的模型,并对所得解的帕累托最优性给出了分析
    Abstract Recent literature has seen a significant focus on building machine learning models with specific properties such as fairness, i.e., being non-biased with respect to a given set of attributes, calibration i.e., model confidence being aligned with its predictive accuracy, and explainability, i.e., ability to be understandable to humans. While there has been work focusing on each of these aspects individually, researchers have shied away from simultaneously addressing more than one of these dimensions. In this work, we address the problem of building models which are both fair and calibrated. We work with a specific definition of fairness, which closely matches [Biswas et. al. 2019], and has the nice property that Bayes optimal classifier has the maximum possible fairness under our definition. We show that an existing negative result towards achieving a fair and calibrated model [Kleinberg et. al. 2017] does not hold for our definition of fairness. Further, we show that ensuring group-wise calibration with respect to the sensitive attributes automatically results in a fair model under our definition. Using this result, we provide a first cut approach for achieving fair and calibrated models, via a simple post-processing technique based on temperature scaling. We then propose modifications of existing calibration losses to perform group-wise calibration, as a way of achieving fair and calibrated models in a variety of settings. Finally, we perform extensive experimentation of these techniques on a diverse benchmark of datasets, and present insights on the pareto-optimality of the resulting solutions.
    摘要 近年来,大量研究关注构建具有特定性质的机器学习模型,例如公平性(对给定属性不产生偏见)、校准性(模型置信度与其预测准确率一致)以及可解释性。已有工作通常只单独处理其中一个维度,很少同时兼顾多个维度。本文研究如何构建既公平又校准的模型。我们采用一种与 [Biswas et al. 2019] 接近的公平性定义,在该定义下贝叶斯最优分类器具有最大可能的公平性。我们证明,已有的关于公平与校准难以兼得的负面结论 [Kleinberg et al. 2017] 在该定义下并不成立;并且,针对敏感属性进行分组校准即可自动得到该定义下的公平模型。基于这一结论,我们提出了一种基于温度缩放的简单后处理方法,并进一步修改现有校准损失以实现分组校准,从而在多种场景下得到公平且校准的模型。我们在多样化的基准数据集上进行了大量实验,并对所得解的帕累托最优性给出了分析。
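
作为摘要中"基于温度缩放的后处理"思路的一个示意,下面给出按敏感属性分组拟合温度的 Python 草图(假设二分类、使用 scipy;具体损失形式与训练流程以论文为准)。

```python
import numpy as np
from scipy.optimize import minimize_scalar

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_group_temperatures(logits, labels, groups):
    """Fit one temperature per sensitive group by minimizing the binary NLL on
    held-out logits, so that each group becomes (approximately) calibrated."""
    temps = {}
    for g in np.unique(groups):
        z, y = logits[groups == g], labels[groups == g]
        def nll(t):
            p = np.clip(sigmoid(z / t), 1e-12, 1 - 1e-12)
            return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
        temps[g] = minimize_scalar(nll, bounds=(0.05, 20.0), method="bounded").x
    return temps

def calibrated_probs(logits, groups, temps):
    """Apply the per-group temperature before the sigmoid."""
    return sigmoid(logits / np.array([temps[g] for g in groups]))
```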

Revisiting Logistic-softmax Likelihood in Bayesian Meta-Learning for Few-Shot Classification

  • paper_url: http://arxiv.org/abs/2310.10379
  • repo_url: https://github.com/keanson/revisit-logistic-softmax
  • paper_authors: Tianjun Ke, Haoqun Cao, Zenan Ling, Feng Zhou
  • for: 这篇论文研究如何改进logistic-softmax似然,以提升少样本分类(FSC)中的不确定性估计和性能。
  • methods: 论文采用贝叶斯方法刻画FSC中的不确定性,并重新设计logistic-softmax似然,通过温度参数控制先验置信水平;在此基础上将数据增强技术整合进基于深度核的高斯过程元学习框架。
  • results: 论文从理论和实验两方面表明,改进后的logistic-softmax能够给出良好校准的不确定性估计,并在标准基准数据集上取得相当或更优的结果。
    Abstract Meta-learning has demonstrated promising results in few-shot classification (FSC) by learning to solve new problems using prior knowledge. Bayesian methods are effective at characterizing uncertainty in FSC, which is crucial in high-risk fields. In this context, the logistic-softmax likelihood is often employed as an alternative to the softmax likelihood in multi-class Gaussian process classification due to its conditional conjugacy property. However, the theoretical property of logistic-softmax is not clear and previous research indicated that the inherent uncertainty of logistic-softmax leads to suboptimal performance. To mitigate these issues, we revisit and redesign the logistic-softmax likelihood, which enables control of the \textit{a priori} confidence level through a temperature parameter. Furthermore, we theoretically and empirically show that softmax can be viewed as a special case of logistic-softmax and logistic-softmax induces a larger family of data distribution than softmax. Utilizing modified logistic-softmax, we integrate the data augmentation technique into the deep kernel based Gaussian process meta-learning framework, and derive an analytical mean-field approximation for task-specific updates. Our approach yields well-calibrated uncertainty estimates and achieves comparable or superior results on standard benchmark datasets. Code is publicly available at \url{https://github.com/keanson/revisit-logistic-softmax}.
    摘要 元学习通过利用先验知识求解新问题,在少样本分类(FSC)中展现出良好效果。贝叶斯方法能够有效刻画FSC中的不确定性,这在高风险领域尤为重要。在多类高斯过程分类中,由于具有条件共轭性质,logistic-softmax似然常被用来替代softmax似然。然而,logistic-softmax的理论性质尚不清楚,此前的研究表明其内在的不确定性会导致次优性能。为缓解这些问题,我们重新审视并重新设计了logistic-softmax似然,通过温度参数来控制先验置信水平。我们从理论和实验两方面证明,softmax可以视为logistic-softmax的特例,且logistic-softmax能刻画比softmax更大的一族数据分布。基于改进的logistic-softmax,我们将数据增强技术整合进基于深度核的高斯过程元学习框架,并推导出针对任务特定更新的解析平均场近似。我们的方法给出良好校准的不确定性估计,并在标准基准数据集上取得相当或更优的结果。代码公开于 https://github.com/keanson/revisit-logistic-softmax 。
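
下面用 LaTeX 对比标准 softmax 与 logistic-softmax 似然的形式,以帮助理解摘要中"softmax 可视为 logistic-softmax 的特例"以及温度参数的作用;温度的具体引入方式与参数化以论文为准。

```latex
% Softmax vs. logistic-softmax likelihood for class c given latent functions f_1..f_C:
p_{\mathrm{softmax}}(y=c \mid \mathbf f) = \frac{\exp(f_c)}{\sum_{k} \exp(f_k)},
\qquad
p_{\mathrm{ls}}(y=c \mid \mathbf f) = \frac{\sigma(f_c)}{\sum_{k} \sigma(f_k)},
\quad \sigma(t)=\frac{1}{1+e^{-t}} .
% The paper's redesign introduces a temperature-style parameter that rescales the
% latent functions (e.g. f_k / \tau) so that the a-priori confidence of the induced
% class probabilities can be controlled; see the paper for the exact form.
```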

Multi-Factor Spatio-Temporal Prediction based on Graph Decomposition Learning

  • paper_url: http://arxiv.org/abs/2310.10374
  • repo_url: None
  • paper_authors: Jiahao Ji, Jingyuan Wang, Yu Mou, Cheng Long
  • for: 本文提出了一种多因素空间时间预测任务,用于预测不同因素的空间时间数据的发展趋势。
  • methods: 本文提出了理论上有效的分解预测策略,并从信息熵理论的角度证明其有效性;在此基础上实现了一个名为时空图分解学习(STGDL)的模型无关框架。STGDL 包括两个主要组成部分:自动图分解模块和分解学习网络。
  • results: 对四个真实时空数据集的大量实验表明,该框架可将多种时空模型的预测误差平均降低9.41%(最高35.36%)。案例研究还展示了该方法在可解释性方面的潜力。
    Abstract Spatio-temporal (ST) prediction is an important and widely used technique in data mining and analytics, especially for ST data in urban systems such as transportation data. In practice, the ST data generation is usually influenced by various latent factors tied to natural phenomena or human socioeconomic activities, impacting specific spatial areas selectively. However, existing ST prediction methods usually do not refine the impacts of different factors, but directly model the entangled impacts of multiple factors. This amplifies the modeling complexity of ST data and compromises model interpretability. To this end, we propose a multi-factor ST prediction task that predicts partial ST data evolution under different factors, and combines them for a final prediction. We make two contributions to this task: an effective theoretical solution and a portable instantiation framework. Specifically, we first propose a theoretical solution called decomposed prediction strategy and prove its effectiveness from the perspective of information entropy theory. On top of that, we instantiate a novel model-agnostic framework, named spatio-temporal graph decomposition learning (STGDL), for multi-factor ST prediction. The framework consists of two main components: an automatic graph decomposition module that decomposes the original graph structure inherent in ST data into subgraphs corresponding to different factors, and a decomposed learning network that learns the partial ST data on each subgraph separately and integrates them for the final prediction. We conduct extensive experiments on four real-world ST datasets of two types of graphs, i.e., grid graph and network graph. Results show that our framework significantly reduces prediction errors of various ST models by 9.41% on average (35.36% at most). Furthermore, a case study reveals the interpretability potential of our framework.
    摘要 时空(ST)预测是数据挖掘与分析中一项重要且应用广泛的技术,尤其适用于交通等城市系统中的时空数据。在实践中,时空数据的生成通常受到与自然现象或人类社会经济活动相关的多种潜在因素影响,这些因素有选择性地作用于特定的空间区域。然而,现有的时空预测方法通常不区分不同因素的影响,而是直接对多种因素纠缠在一起的影响进行建模,这既增加了建模复杂度,也损害了模型的可解释性。为此,我们提出多因素时空预测任务:分别预测不同因素下的部分时空数据演化,再将其组合得到最终预测。我们为该任务做出两项贡献:一个有效的理论解决方案和一个可移植的实例化框架。具体而言,我们首先提出称为分解预测策略的理论方案,并从信息熵理论的角度证明其有效性;在此基础上,我们实例化了一个新的模型无关框架——时空图分解学习(STGDL)。该框架包含两个主要组件:自动图分解模块,将时空数据内在的原始图结构分解为对应不同因素的子图;以及分解学习网络,在每个子图上分别学习部分时空数据,再整合得到最终预测。我们在两类图(网格图与网络图)的四个真实时空数据集上进行了大量实验,结果表明该框架可将多种时空模型的预测误差平均降低9.41%(最高35.36%)。案例研究进一步展示了该框架的可解释潜力。

Machine learning in physics: a short guide

  • paper_url: http://arxiv.org/abs/2310.10368
  • repo_url: https://github.com/franciscorodrigues-usp/MLP
  • paper_authors: Francisco A. Rodrigues
  • for: 面向物理学领域,综述机器学习在物理中的应用。
  • methods: 介绍机器学习的主要概念,包括监督学习、无监督学习和强化学习,以及因果推断、符号回归和深度学习等更专门的主题。
  • results: 概述了机器学习在物理学中的主要应用,并讨论了相关的挑战与前景。
    Abstract Machine learning is a rapidly growing field with the potential to revolutionize many areas of science, including physics. This review provides a brief overview of machine learning in physics, covering the main concepts of supervised, unsupervised, and reinforcement learning, as well as more specialized topics such as causal inference, symbolic regression, and deep learning. We present some of the principal applications of machine learning in physics and discuss the associated challenges and perspectives.
    摘要 机器学习是一个迅速发展的领域,有望变革包括物理学在内的诸多科学领域。本文简要综述了机器学习在物理学中的应用,涵盖监督学习、无监督学习和强化学习等主要概念,以及因果推断、符号回归和深度学习等更专门的主题。我们介绍了机器学习在物理学中的若干主要应用,并讨论了相关的挑战与展望。

Advantages of Machine Learning in Bus Transport Analysis

  • paper_url: http://arxiv.org/abs/2310.19810
  • repo_url: None
  • paper_authors: Amirsadegh Roshanzamir
  • for: 这项研究旨在使用监督学习算法分析德黑兰BRT公交系统的准时性。
  • methods: 研究采用多种监督学习算法,借助Python的Sci Kit Learn和Stats Models库,构建能够预测某条公交线路在任意一天能否达到准时标准的模型。
  • results: 通过考察各算法的决策过程,研究找出了对公交线路效果影响最大的关键特征,为改进公交系统性能提供了有价值的洞察。
    Abstract Supervised Machine Learning is an innovative method that aims to mimic human learning by using past experiences. In this study, we utilize supervised machine learning algorithms to analyze the factors that contribute to the punctuality of Tehran BRT bus system. We gather publicly available datasets of 2020 to 2022 from Municipality of Tehran to train and test our models. By employing various algorithms and leveraging Python's Sci Kit Learn and Stats Models libraries, we construct accurate models capable of predicting whether a bus route will meet the prescribed standards for on-time performance on any given day. Furthermore, we delve deeper into the decision-making process of each algorithm to determine the most influential factor it considers. This investigation allows us to uncover the key feature that significantly impacts the effectiveness of bus routes, providing valuable insights for improving their performance.
    摘要 监督式机器学习是一种旨在借助过往经验模拟人类学习的创新方法。本研究利用监督学习算法分析影响德黑兰BRT公交系统准时性的因素。我们收集了德黑兰市政府2020至2022年的公开数据集来训练和测试模型。通过采用多种算法并借助Python的Sci Kit Learn和Stats Models库,我们构建了能够预测某条公交线路在任意一天能否达到准时标准的准确模型。此外,我们深入考察了各算法的决策过程,以确定其最看重的影响因素。这一分析帮助我们揭示了显著影响公交线路效果的关键特征,为改进其性能提供了有价值的洞察。

MgNO: Efficient Parameterization of Linear Operators via Multigrid

  • paper_url: http://arxiv.org/abs/2310.19809
  • repo_url: None
  • paper_authors: Juncai He, Xinliang Liu, Jinchao Xu
  • for: 这篇论文旨在提出一种简洁的神经算子架构,用于算子学习。
  • methods: 该方法将非线性算子层中第 $i$ 个神经元的输出定义为 $\mathcal O_i(u) = \sigma\left( \sum_j \mathcal W_{ij} u + \mathcal B_{ij}\right)$,其中 $\mathcal W_{ij}$ 是连接第 $j$ 个输入神经元与第 $i$ 个输出神经元的有界线性算子,偏置 $\mathcal B_{ij}$ 是一个函数而非标量;MgNO 利用多重网格结构来参数化这些线性算子。
  • results: 该方法在多类偏微分方程(PDEs)上达到了最先进的精度,并且比其他基于CNN的模型更易训练、比谱方法类神经算子更不易过拟合。
    Abstract In this work, we propose a concise neural operator architecture for operator learning. Drawing an analogy with a conventional fully connected neural network, we define the neural operator as follows: the output of the $i$-th neuron in a nonlinear operator layer is defined by $\mathcal O_i(u) = \sigma\left( \sum_j \mathcal W_{ij} u + \mathcal B_{ij}\right)$. Here, $\mathcal W_{ij}$ denotes the bounded linear operator connecting $j$-th input neuron to $i$-th output neuron, and the bias $\mathcal B_{ij}$ takes the form of a function rather than a scalar. Given its new universal approximation property, the efficient parameterization of the bounded linear operators between two neurons (Banach spaces) plays a critical role. As a result, we introduce MgNO, utilizing multigrid structures to parameterize these linear operators between neurons. This approach offers both mathematical rigor and practical expressivity. Additionally, MgNO obviates the need for conventional lifting and projecting operators typically required in previous neural operators. Moreover, it seamlessly accommodates diverse boundary conditions. Our empirical observations reveal that MgNO exhibits superior ease of training compared to other CNN-based models, while also displaying a reduced susceptibility to overfitting when contrasted with spectral-type neural operators. We demonstrate the efficiency and accuracy of our method with consistently state-of-the-art performance on different types of partial differential equations (PDEs).
    摘要 在这项工作中,我们提出了一种简洁的神经算子架构用于算子学习。类比于传统的全连接神经网络,我们将神经算子定义为:非线性算子层中第 $i$ 个神经元的输出为 $\mathcal O_i(u) = \sigma\left(\sum_j \mathcal W_{ij} u + \mathcal B_{ij}\right)$,其中 $\mathcal W_{ij}$ 表示连接第 $j$ 个输入神经元与第 $i$ 个输出神经元的有界线性算子,偏置 $\mathcal B_{ij}$ 是一个函数而非标量。鉴于其新的万能逼近性质,如何高效参数化两个神经元(Banach空间)之间的有界线性算子至关重要。为此,我们引入MgNO,利用多重网格结构来参数化神经元之间的线性算子。该方法兼具数学上的严谨性与实际表达能力,并且无需以往神经算子通常需要的提升(lifting)与投影算子,还能自然地适配多种边界条件。实验表明,MgNO比其他基于CNN的模型更易训练,且与谱方法类神经算子相比更不易过拟合。我们在多类偏微分方程(PDEs)上验证了该方法的效率与精度,取得了持续领先的性能。
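
下面是一个示意性的 PyTorch 草图,仅用于说明上述算子层中"权重是线性算子、偏置是函数"的含义;这里用稠密矩阵代替 MgNO 实际采用的多重网格参数化,属于玩具实现而非论文代码。

```python
import torch
import torch.nn as nn

class OperatorLayer(nn.Module):
    """Sketch of one nonlinear operator layer O_i(u) = sigma(sum_j (W_ij u + B_ij)).
    Each W_ij is a discretised linear operator acting on functions sampled on an
    n-point grid, and each bias B_ij is itself a function on that grid (a learnable
    vector), not a scalar. Dense matrices stand in for general bounded linear
    operators; MgNO parameterises them with multigrid cycles instead."""
    def __init__(self, in_channels, out_channels, n_grid):
        super().__init__()
        self.W = nn.Parameter(torch.randn(out_channels, in_channels, n_grid, n_grid) * n_grid**-0.5)
        self.B = nn.Parameter(torch.zeros(out_channels, in_channels, n_grid))

    def forward(self, u):                                   # u: (batch, in_channels, n_grid)
        Wu = torch.einsum("oinm,bim->boin", self.W, u)      # apply each W_ij to u_j
        return torch.nn.functional.gelu((Wu + self.B).sum(dim=2))   # sum over j, then nonlinearity
```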

An Anytime Algorithm for Good Arm Identification

  • paper_url: http://arxiv.org/abs/2310.10359
  • repo_url: None
  • paper_authors: Marc Jourdan, Clémence Réda
  • for: 这篇论文研究固定预算设定与任意时刻(anytime)设定下的好臂识别问题(GAI)。
  • methods: 论文提出了一种无参数、任意时刻可用的采样规则,称为 APGAI,可直接用于固定置信度和固定预算两种设定。
  • results: 作者给出了 APGAI 在任意时刻的错误概率上界,以及与停止规则结合时在任意置信水平下成立的期望采样复杂度上界;在合成数据和真实数据上的实验表明 APGAI 具有良好表现。
    Abstract In good arm identification (GAI), the goal is to identify one arm whose average performance exceeds a given threshold, referred to as good arm, if it exists. Few works have studied GAI in the fixed-budget setting, when the sampling budget is fixed beforehand, or the anytime setting, when a recommendation can be asked at any time. We propose APGAI, an anytime and parameter-free sampling rule for GAI in stochastic bandits. APGAI can be straightforwardly used in fixed-confidence and fixed-budget settings. First, we derive upper bounds on its probability of error at any time. They show that adaptive strategies are more efficient in detecting the absence of good arms than uniform sampling. Second, when APGAI is combined with a stopping rule, we prove upper bounds on the expected sampling complexity, holding at any confidence level. Finally, we show good empirical performance of APGAI on synthetic and real-world data. Our work offers an extensive overview of the GAI problem in all settings.
    摘要 在好臂识别(GAI)问题中,目标是在存在的情况下找出平均表现超过给定阈值的臂(即"好臂")。很少有工作研究固定预算设定(采样预算事先固定)或任意时刻设定(可在任何时刻要求给出推荐)下的GAI问题。我们提出APGAI,一种面向随机多臂老虎机的任意时刻、无参数采样规则,可直接用于固定置信度和固定预算两种设定。首先,我们推导了其在任意时刻的错误概率上界,结果表明在检测好臂不存在时,自适应策略比均匀采样更高效。其次,当APGAI与停止规则结合时,我们证明了在任意置信水平下成立的期望采样复杂度上界。最后,我们在合成数据和真实数据上展示了APGAI的良好实证表现。我们的工作对各种设定下的GAI问题给出了较为全面的概述。

Hamming Encoder: Mining Discriminative k-mers for Discrete Sequence Classification

  • paper_url: http://arxiv.org/abs/2310.10321
  • repo_url: None
  • paper_authors: Junjie Dong, Mudi Jiang, Lianyu Hu, Zengyou He
  • for: 该论文的目的是提出一种新的序列分类方法,以解决现有方法中的一些挑战,如缺乏特征组合的探索和精度下降。
  • methods: 该方法基于二值化的一维卷积神经网络(1DCNN)架构,并采用基于汉明距离的相似度度量,以保证特征挖掘与分类过程的一致性。具体而言,该方法先为序列数据训练一个可解释的CNN编码器,再通过基于梯度的搜索寻找具有判别力的k-mer组合。
  • results: 实验结果表明,该方法在分类精度方面优于现有的最先进方法。
    Abstract Sequence classification has numerous applications in various fields. Despite extensive studies in the last decades, many challenges still exist, particularly in pattern-based methods. Existing pattern-based methods measure the discriminative power of each feature individually during the mining process, leading to the result of missing some combinations of features with discriminative power. Furthermore, it is difficult to ensure the overall discriminative performance after converting sequences into feature vectors. To address these challenges, we propose a novel approach called Hamming Encoder, which utilizes a binarized 1D-convolutional neural network (1DCNN) architecture to mine discriminative k-mer sets. In particular, we adopt a Hamming distance-based similarity measure to ensure consistency in the feature mining and classification procedure. Our method involves training an interpretable CNN encoder for sequential data and performing a gradient-based search for discriminative k-mer combinations. Experiments show that the Hamming Encoder method proposed in this paper outperforms existing state-of-the-art methods in terms of classification accuracy.
    摘要 序列分类在众多领域有广泛应用。尽管过去几十年已有大量研究,基于模式的方法仍面临诸多挑战:现有方法在挖掘过程中单独衡量每个特征的判别力,从而遗漏了一些具有判别力的特征组合;此外,将序列转化为特征向量后也难以保证整体判别性能。为解决这些问题,我们提出了一种称为 Hamming Encoder 的新方法,利用二值化的一维卷积神经网络(1DCNN)架构挖掘具有判别力的 k-mer 集合。特别地,我们采用基于汉明距离的相似度度量,以保证特征挖掘与分类过程的一致性。该方法先为序列数据训练一个可解释的 CNN 编码器,再通过基于梯度的搜索寻找具有判别力的 k-mer 组合。实验表明,本文提出的 Hamming Encoder 方法在分类精度上优于现有的最先进方法。
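
作为摘要中"基于汉明距离的相似度度量"的一个最小示例,下面的 Python 片段计算两个二值化 k-mer 编码之间的汉明距离(仅为概念说明,与论文实现无关)。

```python
import numpy as np

def hamming_distance(a, b):
    """Hamming distance between two binary codes (e.g. binarised outputs of a
    1D-CNN encoder for k-mers): the number of positions where the bits differ."""
    a, b = np.asarray(a, dtype=bool), np.asarray(b, dtype=bool)
    return int(np.count_nonzero(a != b))

# toy example: two 8-bit binarised k-mer codes differing in two positions
print(hamming_distance([1, 0, 1, 1, 0, 0, 1, 0], [1, 1, 1, 0, 0, 0, 1, 0]))  # -> 2
```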

  • paper_url: http://arxiv.org/abs/2310.10315
  • repo_url: None
  • paper_authors: Kamila Zaman, Alberto Marchisio, Muhammad Abdullah Hanif, Muhammad Shafique
  • for: 这篇论文主要是为了提供一个全面的Quantum Machine Learning(QML)领域的审视,并对不同的QML算法、量子数据集、硬件技术、软件工具、模拟器和应用场景进行了详细的介绍。
  • methods: 本论文使用了多种方法,包括阐述基础概念、与经典计算进行比较、介绍不同的QML算法及其适用领域、描述量子数据集和硬件技术的发展,以及介绍软件工具和模拟器。
  • results: 本论文提供了大量有价值的信息和资源,可以帮助读者快速入门到当前QML领域的state-of-the-art技术。
    Abstract Quantum Computing (QC) claims to improve the efficiency of solving complex problems, compared to classical computing. When QC is applied to Machine Learning (ML) applications, it forms a Quantum Machine Learning (QML) system. After discussing the basic concepts of QC and its advantages over classical computing, this paper reviews the key aspects of QML in a comprehensive manner. We discuss different QML algorithms and their domain applicability, quantum datasets, hardware technologies, software tools, simulators, and applications. In this survey, we provide valuable information and resources for readers to jumpstart into the current state-of-the-art techniques in the QML field.
    摘要 量子计算(QC)宣称可以提高解决复杂问题的效率,相比于经典计算。当QC应用于机器学习(ML)应用时,它形成了量子机器学习(QML)系统。本文详细介绍了QML的关键方面,包括不同的QML算法和它们的领域应用、量子数据集、硬件技术、软件工具、模拟器和应用。本文提供了读者们进入现有技术领域的价值信息和资源,以便他们可以快速掌握当前领域的最新技术。

Transparent Anomaly Detection via Concept-based Explanations

  • paper_url: http://arxiv.org/abs/2310.10702
  • repo_url: None
  • paper_authors: Laya Rafiee Sevyeri, Ivaxi Sheth, Farhood Farahnak, Shirin Abbasinejad Enger
  • for: 本文提出了一种可解释的异常检测方法,以提高异常检测的可读性和人类可解释性。
  • methods: 本文使用了一种基于概念学习的异常检测方法,可以提供人类可解释的概念解释。此外,本文还提出了一种可与其他分类型异常检测方法集成的概念学习方法。
  • results: 本文通过三个实际数据集的实验表明,ACE方法可以取得高于或与黑盒模型相当的准确率,同时具有人类可解释的优势。
    Abstract Advancements in deep learning techniques have given a boost to the performance of anomaly detection. However, real-world and safety-critical applications demand a level of transparency and reasoning beyond accuracy. The task of anomaly detection (AD) focuses on finding whether a given sample follows the learned distribution. Existing methods lack the ability to reason with clear explanations for their outcomes. Hence to overcome this challenge, we propose Transparent Anomaly Detection Concept Explanations (ACE). ACE is able to provide human interpretable explanations in the form of concepts along with anomaly prediction. To the best of our knowledge, this is the first paper that proposes interpretable by-design anomaly detection. In addition to promoting transparency in AD, it allows for effective human-model interaction. Our proposed model shows either higher or comparable results to black-box uninterpretable models. We validate the performance of ACE across three realistic datasets - bird classification on CUB-200-2011, challenging histopathology slide image classification on TIL-WSI-TCGA, and gender classification on CelebA. We further demonstrate that our concept learning paradigm can be seamlessly integrated with other classification-based AD methods.
    摘要 深度学习技术的进步提升了异常检测的性能。然而,现实世界和安全关键应用需要的不仅是精度,还包括透明度和推理能力。异常检测(AD)的任务是判断给定样本是否服从已学习的分布,而现有方法缺乏为其结果给出清晰解释的能力。为此,我们提出了透明异常检测概念解释(ACE)。ACE能够在给出异常预测的同时,以概念的形式提供人类可理解的解释。据我们所知,这是第一篇提出"设计即可解释"的异常检测方法的论文。除了提升异常检测的透明度外,它还支持有效的人机交互。我们提出的模型取得了高于或与黑盒不可解释模型相当的结果。我们在三个真实数据集上验证了ACE的性能:CUB-200-2011鸟类分类、具有挑战性的TIL-WSI-TCGA组织病理切片图像分类以及CelebA性别分类。我们还进一步证明,所提出的概念学习范式可以无缝集成到其他基于分类的异常检测方法中。

Time integration schemes based on neural networks for solving partial differential equations on coarse grids

  • paper_url: http://arxiv.org/abs/2310.10308
  • repo_url: None
  • paper_authors: Xinxin Yan, Zhideng Zhou, Xiaohan Cheng, Xiaolei Yang
  • for: 本研究旨在基于神经网络学习时间积分格式,使其满足不同的数学约束条件。
  • methods: 研究使用神经网络学习三步线性多步法,并将其应用于三个模型问题:一维热方程、一维波方程和一维Burgers方程。
  • results: 结果显示,学习得到的完全约束格式的预测误差与Runge-Kutta方法和Adams-Bashforth方法相近;与传统方法相比,学习得到的无约束和半约束格式在粗网格上显著降低了预测误差,尤其是波方程的相位预测有明显改善。在比参考网格粗4倍的网格上,部分热方程算例的均方误差可降低近一个数量级;在粗32倍的网格上,Burgers方程的均方误差可降低35%至40%。
    Abstract The accuracy of solving partial differential equations (PDEs) on coarse grids is greatly affected by the choice of discretization schemes. In this work, we propose to learn time integration schemes based on neural networks which satisfy three distinct sets of mathematical constraints, i.e., unconstrained, semi-constrained with the root condition, and fully-constrained with both root and consistency conditions. We focus on the learning of 3-step linear multistep methods, which we subsequently applied to solve three model PDEs, i.e., the one-dimensional heat equation, the one-dimensional wave equation, and the one-dimensional Burgers' equation. The results show that the prediction error of the learned fully-constrained scheme is close to that of the Runge-Kutta method and Adams-Bashforth method. Compared to the traditional methods, the learned unconstrained and semi-constrained schemes significantly reduce the prediction error on coarse grids. On a grid that is 4 times coarser than the reference grid, the mean square error shows a reduction of up to an order of magnitude for some of the heat equation cases, and a substantial improvement in phase prediction for the wave equation. On a 32 times coarser grid, the mean square error for the Burgers' equation can be reduced by up to 35% to 40%.
    摘要 在粗网格上求解偏微分方程(PDEs)时,离散格式的选择对精度有很大影响。本文提出基于神经网络学习时间积分格式,所学格式满足三组不同的数学约束:无约束、满足根条件的半约束,以及同时满足根条件和一致性条件的完全约束。我们着重学习三步线性多步法,并将其用于求解三个模型方程:一维热方程、一维波方程和一维Burgers方程。结果表明,学习得到的完全约束格式的预测误差与Runge-Kutta方法和Adams-Bashforth方法相近;与传统方法相比,学习得到的无约束和半约束格式在粗网格上显著降低了预测误差。在比参考网格粗4倍的网格上,部分热方程算例的均方误差可降低近一个数量级,波方程的相位预测也有明显改善;在粗32倍的网格上,Burgers方程的均方误差可降低35%至40%。
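
下面用 LaTeX 给出一个一般形式的三步线性多步法,以及摘要中提到的根条件与一致性条件的含义(此处写成显式格式只是举例,论文中学习的系数形式以原文为准)。

```latex
% A generic (explicit) 3-step linear multistep scheme for u' = f(u):
u^{n+3} + a_2 u^{n+2} + a_1 u^{n+1} + a_0 u^{n}
   = \Delta t\,\bigl(b_2 f^{n+2} + b_1 f^{n+1} + b_0 f^{n}\bigr).
% With \rho(\zeta)=\zeta^3+a_2\zeta^2+a_1\zeta+a_0 and \sigma(\zeta)=b_2\zeta^2+b_1\zeta+b_0:
%   root condition (zero-stability): all roots of \rho lie in |\zeta|\le 1,
%                                    and any root on the unit circle is simple;
%   consistency:                     \rho(1)=0 and \rho'(1)=\sigma(1).
% "Semi-constrained" schemes impose only the root condition on the learned
% coefficients, while "fully-constrained" schemes impose both sets of conditions.
```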

Mimicking the Maestro: Exploring the Efficacy of a Virtual AI Teacher in Fine Motor Skill Acquisition

  • paper_url: http://arxiv.org/abs/2310.10280
  • repo_url: None
  • paper_authors: Hadar Mulian, Segev Shlomov, Lior Limonad
  • for: 这项研究旨在探讨人工智能教师模型在促进细动技能学习中的潜在优势,以提高学习效率和学习结果的一致性。
  • methods: 该研究采用了人工智能学习和仿真学习方法,通过模拟教师与学生之间的互动来评估人工智能教师模型的效果。
  • results: 研究发现,使用人工智能教师模型可以提高学习效率和学习结果的一致性,并且可以适应不同的学生和学习环境。
    Abstract Motor skills, especially fine motor skills like handwriting, play an essential role in academic pursuits and everyday life. Traditional methods to teach these skills, although effective, can be time-consuming and inconsistent. With the rise of advanced technologies like robotics and artificial intelligence, there is increasing interest in automating such teaching processes using these technologies, via human-robot and human-computer interactions. In this study, we examine the potential of a virtual AI teacher in emulating the techniques of human educators for motor skill acquisition. We introduce an AI teacher model that captures the distinct characteristics of human instructors. Using a Reinforcement Learning environment tailored to mimic teacher-learner interactions, we tested our AI model against four guiding hypotheses, emphasizing improved learner performance, enhanced rate of skill acquisition, and reduced variability in learning outcomes. Our findings, validated on synthetic learners, revealed significant improvements across all tested hypotheses. Notably, our model showcased robustness across different learners and settings and demonstrated adaptability to handwriting. This research underscores the potential of integrating Reinforcement Learning and Imitation Learning models with robotics in revolutionizing the teaching of critical motor skills.
    摘要 运动技能,尤其是书写等精细运动技能,在学业和日常生活中至关重要。传统的教学方法虽然有效,但耗时且不稳定。随着机器人和人工智能等先进技术的发展,人们越来越希望通过人机交互将这类教学过程自动化。本研究考察了虚拟AI教师在模拟人类教育者的教学技巧以促进运动技能习得方面的潜力。我们提出了一个刻画人类教师特点的AI教师模型,并在一个模拟师生互动的强化学习环境中,围绕学习者表现提升、技能习得速度加快以及学习结果差异缩小等四项假设对该模型进行了测试。在合成学习者上的实验结果显示,所有被测假设均得到了显著验证;该模型对不同学习者和环境均表现稳健,并能适应书写任务。这项研究凸显了将强化学习与模仿学习模型和机器人技术相结合、革新关键运动技能教学的潜力。

Leveraging heterogeneous spillover effects in maximizing contextual bandit rewards

  • paper_url: http://arxiv.org/abs/2310.10259
  • repo_url: None
  • paper_authors: Ahmed Sayeed Faruk, Elena Zheleva
  • for: 提高个性化推荐的相关性和准确性
  • methods: 在情境多臂老虎机框架中引入异质溢出效应,使算法在为每位用户选择最优臂时考虑用户之间的相互影响
  • results: 在多个真实数据集上,所得累积奖励显著高于忽略溢出效应的现有方法,能更好地满足用户的需求与期望
    Abstract Recommender systems relying on contextual multi-armed bandits continuously improve relevant item recommendations by taking into account the contextual information. The objective of these bandit algorithms is to learn the best arm (i.e., best item to recommend) for each user and thus maximize the cumulative rewards from user engagement with the recommendations. However, current approaches ignore potential spillover between interacting users, where the action of one user can impact the actions and rewards of other users. Moreover, spillover may vary for different people based on their preferences and the closeness of ties to other users. This leads to heterogeneity in the spillover effects, i.e., the extent to which the action of one user can impact the action of another. Here, we propose a framework that allows contextual multi-armed bandits to account for such heterogeneous spillovers when choosing the best arm for each user. By experimenting on several real-world datasets using prominent linear and non-linear contextual bandit algorithms, we observe that our proposed method leads to significantly higher rewards than existing solutions that ignore spillover.
    摘要 基于情境多臂老虎机的推荐系统利用情境信息不断改进相关物品的推荐。这类老虎机算法的目标是为每位用户学习最优的臂(即最佳推荐物品),从而最大化用户与推荐交互所产生的累积奖励。然而,现有方法忽略了相互关联的用户之间可能存在的溢出效应:一位用户的行为可能影响其他用户的行为与奖励。此外,溢出效应会因用户偏好和用户间关系的紧密程度而异,即溢出效应具有异质性。本文提出了一个框架,使情境多臂老虎机在为每位用户选择最优臂时能够考虑这种异质溢出效应。我们在多个真实数据集上使用主流的线性与非线性情境老虎机算法进行实验,结果表明所提方法的奖励显著高于忽略溢出效应的现有方案。

Leveraging Topological Maps in Deep Reinforcement Learning for Multi-Object Navigation

  • paper_url: http://arxiv.org/abs/2310.10250
  • repo_url: None
  • paper_authors: Simon Hakenes, Tobias Glasmachers
  • for: 解决大规模空间中稀疏奖励下的导航问题
  • methods: 利用拓扑地图将基本动作提升为面向对象的宏动作
  • results: 使简单的DQN智能体能够解决原本几乎无法求解的环境
    Abstract This work addresses the challenge of navigating expansive spaces with sparse rewards through Reinforcement Learning (RL). Using topological maps, we elevate elementary actions to object-oriented macro actions, enabling a simple Deep Q-Network (DQN) agent to solve otherwise practically impossible environments.
    摘要 本工作通过强化学习(RL)应对在大规模空间中、奖励稀疏条件下的导航难题。借助拓扑地图,我们将基本动作提升为面向对象的宏动作,使简单的深度Q网络(DQN)智能体能够求解原本在实践中几乎无法求解的环境。

The Mixtures and the Neural Critics: On the Pointwise Mutual Information Profiles of Fine Distributions

  • paper_url: http://arxiv.org/abs/2310.10240
  • repo_url: https://github.com/cbg-ethz/bmi
  • paper_authors: Paweł Czyż, Frederic Grabowski, Julia E. Vogt, Niko Beerenwinkel, Alexander Marx
  • for: 这篇论文研究逐点互信息谱(pointwise mutual information profile),它是互信息的一种推广,在微分同胚变换下保持不变。
  • methods: 论文解析刻画了多元正态分布的谱,并引入"精细分布"(fine distributions)族,其谱可以用蒙特卡洛方法精确近似;借此研究现有互信息估计器的局限,以及变分估计器中神经判别器(neural critics)的行为。
  • results: 论文表明精细分布可用于研究互信息估计器的局限、神经判别器的行为以及实验离群点的影响,并可用于获得基于模型的贝叶斯互信息估计,适用于具备领域专业知识且需要不确定性量化的问题。
    Abstract Mutual information quantifies the dependence between two random variables and remains invariant under diffeomorphisms. In this paper, we explore the pointwise mutual information profile, an extension of mutual information that maintains this invariance. We analytically describe the profiles of multivariate normal distributions and introduce the family of fine distributions, for which the profile can be accurately approximated using Monte Carlo methods. We then show how fine distributions can be used to study the limitations of existing mutual information estimators, investigate the behavior of neural critics used in variational estimators, and understand the effect of experimental outliers on mutual information estimation. Finally, we show how fine distributions can be used to obtain model-based Bayesian estimates of mutual information, suitable for problems with available domain expertise in which uncertainty quantification is necessary.
    摘要 互信息量化了两个随机变量之间的依赖关系,并在微分同胚变换下保持不变。本文研究逐点互信息谱(pointwise mutual information profile),它是互信息的一种保持该不变性的推广。我们解析地刻画了多元正态分布的谱,并引入"精细分布"(fine distributions)这一族分布,其谱可以用蒙特卡洛方法精确近似。随后我们展示了如何利用精细分布研究现有互信息估计器的局限、考察变分估计器中神经判别器(neural critics)的行为,以及理解实验离群点对互信息估计的影响。最后,我们说明了如何利用精细分布得到基于模型的贝叶斯互信息估计,适用于具备领域专业知识且需要不确定性量化的问题。
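
下面用 LaTeX 给出逐点互信息及其"谱"的定义,以及用蒙特卡洛近似互信息的方式,作为阅读摘要的背景(记号为常用写法,非论文原文)。

```latex
% Pointwise mutual information of a pair (x, y) under the joint density p_{XY}:
\operatorname{pmi}(x, y) = \log \frac{p_{XY}(x, y)}{p_X(x)\, p_Y(y)} .
% The "profile" is the distribution of pmi(X, Y) when (X, Y) ~ p_{XY};
% mutual information is its mean, which Monte Carlo sampling approximates:
I(X; Y) = \mathbb{E}_{(X,Y)\sim p_{XY}}\bigl[\operatorname{pmi}(X, Y)\bigr]
       \approx \frac{1}{N}\sum_{n=1}^{N} \operatorname{pmi}(x_n, y_n).
% For families where pmi can be evaluated in closed form (e.g. the paper's fine
% distributions), the whole profile can be approximated the same way.
```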

Structural transfer learning of non-Gaussian DAG

  • paper_url: http://arxiv.org/abs/2310.10239
  • repo_url: None
  • paper_authors: Mingyang Ren, Xin He, Junhui Wang
  • for: 利用来自多个相关研究的异质数据,改进目标研究中有向无环图(DAG)节点间方向性关系的重建。
  • methods: 提出了一组新的DAG结构相似度度量,并给出一个迁移DAG学习框架,有效利用不同相似程度的辅助DAG所携带的信息。
  • results: 即使没有任何辅助DAG与目标DAG整体相似,目标研究中的DAG重建仍能获得显著提升;这一结论得到了合成数据和多站点脑功能连接网络数据上大量数值实验的支持。
    Abstract Directed acyclic graph (DAG) has been widely employed to represent directional relationships among a set of collected nodes. Yet, the available data in one single study is often limited for accurate DAG reconstruction, whereas heterogeneous data may be collected from multiple relevant studies. It remains an open question how to pool the heterogeneous data together for better DAG structure reconstruction in the target study. In this paper, we first introduce a novel set of structural similarity measures for DAG and then present a transfer DAG learning framework by effectively leveraging information from auxiliary DAGs of different levels of similarities. Our theoretical analysis shows substantial improvement in terms of DAG reconstruction in the target study, even when no auxiliary DAG is overall similar to the target DAG, which is in sharp contrast to most existing transfer learning methods. The advantage of the proposed transfer DAG learning is also supported by extensive numerical experiments on both synthetic data and multi-site brain functional connectivity network data.
    摘要 有向无环图(DAG)被广泛用于刻画一组节点之间的方向性关系。然而,单一研究中可用的数据往往不足以准确重建DAG,而多个相关研究可能提供异质数据。如何汇集这些异质数据以改进目标研究中的DAG结构重建仍是一个开放问题。本文首先提出了一组新的DAG结构相似度度量,随后给出一个迁移DAG学习框架,有效利用不同相似程度的辅助DAG所携带的信息。理论分析表明,即使没有任何辅助DAG与目标DAG整体相似,目标研究中的DAG重建仍能获得显著提升,这与大多数现有迁移学习方法形成鲜明对比。在合成数据和多站点脑功能连接网络数据上的大量数值实验也支持了所提迁移DAG学习方法的优势。

GEVO-ML: Optimizing Machine Learning Code with Evolutionary Computation

  • paper_url: http://arxiv.org/abs/2310.10211
  • repo_url: None
  • paper_authors: Jhe-Yu Liou, Stephanie Forrest, Carole-Jean Wu
  • for: 本研究旨在提升大规模机器学习(ML)应用中并行加速器(如GPU)上的性能。ML 模型开发者通常缺乏对底层系统架构的详细了解,而系统程序员通常也不具备对运行在特定系统上的 ML 模型的高层理解。
  • methods: 本研究提出了 GEVO-ML,一种自动发现优化机会并调优 ML kernels 性能的工具,其中 ML 模型和训练/预测过程统一用单一的高层中间表示(Multiple-Layer Intermediate Representation,MLIR)来表达。GEVO-ML 使用多目标进化搜索来发现对 MLIR 代码的修改(突变),在保留必要功能的同时提升目标指标上的性能。
  • results: 在模型训练和预测两类 ML 工作负载上进行了评估。GEVO-ML 在这两个模型上发现了显著的帕累托改进:当允许模型精度放宽 2%(从 91.2% 降至 89.3%)时,性能提升 90.43%;在训练负载上,模型精度从 91% 提升到 96%,且不牺牲训练或测试速度。对关键突变的分析表明,GEVO-ML 所做的代码修改虽然可能对人类开发者而言并不常见,但其效果类似于人类开发者改进模型设计的方式,例如调整学习率或剪除非必要的层参数。
    Abstract Parallel accelerators, such as GPUs, are key enablers for large-scale Machine Learning (ML) applications. However, ML model developers often lack detailed knowledge of the underlying system architectures, while system programmers usually do not have a high-level understanding of the ML model that runs on the specific system. To mitigate this gap between two relevant aspects of domain knowledge, this paper proposes GEVO-ML, a tool for automatically discovering optimization opportunities and tuning the performance of ML kernels, where the model and training/prediction processes are uniformly represented in a single intermediate language, the Multiple-Layer Intermediate Representation (MLIR). GEVO-ML uses multi-objective evolutionary search to find edits (mutations) to MLIR code that ultimately runs on GPUs, improving performance on desired criteria while retaining required functionality. We demonstrate GEVO-ML on two different ML workloads for both model training and prediction. GEVO-ML finds significant Pareto improvements for these models, achieving 90.43% performance improvement when model accuracy is relaxed by 2%, from 91.2% to 89.3%. For the training workloads, GEVO-ML finds a 4.88% improvement in model accuracy, from 91% to 96%, without sacrificing training or testing speed. Our analysis of key GEVO-ML mutations reveals diverse code modifications, while might be foreign to human developers, achieving similar effects with how human developers improve model design, for example, by changing learning rates or pruning non-essential layer parameters.
    摘要 高级加速器,如图形处理器(GPU),是大规模机器学习(ML)应用的关键驱动器。然而,ML模型开发者经常缺乏深入的系统架构知识,而系统编程者通常没有高级的ML模型的具体知识。为了 bridge这两个领域的知识差距,这篇论文提出了 GEVO-ML,一种自动发现优化机会并调整 ML kernels的工具。GEVO-ML 使用多目标进化搜索来找到 MLIR 代码中的修改(突变),以提高 Desired 特性的性能,保留必要的功能。我们在两个不同的 ML 任务上运行 GEVO-ML,包括模型训练和预测。GEVO-ML 在这些模型上发现了显著的 pareto 改进,将模型精度从 91.2% 下降到 89.3%,同时提高了性能。对于训练任务,GEVO-ML 提高了模型精度从 91% 到 96%,而无需牺牲训练或测试速度。我们分析了 GEVO-ML 中关键的突变,发现这些突变可能 foreign 于人类开发者,但具有类似的效果,例如更改学习率或减少不必要的层参数。

Bongard-OpenWorld: Few-Shot Reasoning for Free-form Visual Concepts in the Real World

  • paper_url: http://arxiv.org/abs/2310.10207
  • repo_url: https://github.com/joyjayng/Bongard-OpenWorld
  • paper_authors: Rujie Wu, Xiaojian Ma, Qing Li, Wei Wang, Zhenliang Zhang, Song-Chun Zhu, Yizhou Wang
  • for: 评估机器视觉在真实世界中的少样本(few-shot)推理能力,即模型需从少量示例图像中归纳视觉概念,并据此对新图像进行判断。
  • methods: 以经典的Bongard问题(BP)为基础,并加入两项新挑战:1)开放世界的自由形式概念,即视觉概念由开放词表中的物体类别、抽象视觉属性和常识知识组合而成;2)使用真实世界图像而非合成示意图。
  • results: 研究发现,当前的少样本推理算法面临显著挑战;即便直接测试VLM、或在交互式推理方案中将VLM与LLM结合,也远未达到人类的求解水平(最佳模型准确率为64%,而人类参与者可达91%)。
    Abstract We introduce Bongard-OpenWorld, a new benchmark for evaluating real-world few-shot reasoning for machine vision. It originates from the classical Bongard Problems (BPs): Given two sets of images (positive and negative), the model needs to identify the set that query images belong to by inducing the visual concepts, which is exclusively depicted by images from the positive set. Our benchmark inherits the few-shot concept induction of the original BPs while adding the two novel layers of challenge: 1) open-world free-form concepts, as the visual concepts in Bongard-OpenWorld are unique compositions of terms from an open vocabulary, ranging from object categories to abstract visual attributes and commonsense factual knowledge; 2) real-world images, as opposed to the synthetic diagrams used by many counterparts. In our exploration, Bongard-OpenWorld already imposes a significant challenge to current few-shot reasoning algorithms. We further investigate to which extent the recently introduced Large Language Models (LLMs) and Vision-Language Models (VLMs) can solve our task, by directly probing VLMs, and combining VLMs and LLMs in an interactive reasoning scheme. We even designed a neuro-symbolic reasoning approach that reconciles LLMs & VLMs with logical reasoning to emulate the human problem-solving process for Bongard Problems. However, none of these approaches manage to close the human-machine gap, as the best learner achieves 64% accuracy while human participants easily reach 91%. We hope Bongard-OpenWorld can help us better understand the limitations of current visual intelligence and facilitate future research on visual agents with stronger few-shot visual reasoning capabilities.
    摘要 我们提出了Bongard-OpenWorld,一个用于评估机器视觉在真实世界中少样本推理能力的新基准。它源于经典的Bongard问题(BP):给定正、负两组图像,模型需要通过归纳仅由正例图像体现的视觉概念,判断查询图像属于哪一组。我们的基准继承了原BP的少样本概念归纳,并增加了两层新的挑战:1)开放世界的自由形式概念,Bongard-OpenWorld中的视觉概念是来自开放词表的术语的独特组合,范围涵盖物体类别、抽象视觉属性以及常识性事实知识;2)使用真实世界图像,而非许多同类基准所用的合成示意图。在我们的探索中,Bongard-OpenWorld已经对当前的少样本推理算法构成了显著挑战。我们进一步考察了最新的大语言模型(LLM)与视觉语言模型(VLM)在该任务上的表现,包括直接测试VLM,以及在交互式推理方案中将VLM与LLM结合;我们甚至设计了一种将LLM、VLM与逻辑推理相结合的神经符号推理方法,以模拟人类求解Bongard问题的过程。然而,这些方法都未能弥合人机差距:最佳模型的准确率仅为64%,而人类参与者可以轻松达到91%。我们希望Bongard-OpenWorld能帮助我们更好地理解当前视觉智能的局限,并推动未来在具有更强少样本视觉推理能力的视觉智能体方面的研究。

Interpretable Predictive Models to Understand Risk Factors for Maternal and Fetal Outcomes

  • paper_url: http://arxiv.org/abs/2310.10203
  • repo_url: None
  • paper_authors: Tomas M. Bosschieter, Zifei Xu, Hui Lan, Benjamin J. Lengerich, Harsha Nori, Ian Painter, Vivienne Souter, Rich Caruana
  • for: 这篇论文旨在改善母婴健康:通过更好地理解风险因素、加强对高危患者的监测并及时采取有效干预,帮助产科医生提供更好的照护。
  • methods: 论文使用可解释提升机(EBM)进行预测和重要风险因素识别。EBM兼具高精度与可解释性,并通过外部验证和稳健性分析证明了其可靠性。
  • results: 研究发现,EBM模型能够准确预测四类母婴并发症,并给出有价值的风险因素;例如,产妇身高是肩难产的第二大风险因素。这些结果表明EBM在预测和预防严重妊娠并发症方面兼具优秀的性能与可解释性。
    Abstract Although most pregnancies result in a good outcome, complications are not uncommon and can be associated with serious implications for mothers and babies. Predictive modeling has the potential to improve outcomes through better understanding of risk factors, heightened surveillance for high risk patients, and more timely and appropriate interventions, thereby helping obstetricians deliver better care. We identify and study the most important risk factors for four types of pregnancy complications: (i) severe maternal morbidity, (ii) shoulder dystocia, (iii) preterm preeclampsia, and (iv) antepartum stillbirth. We use an Explainable Boosting Machine (EBM), a high-accuracy glass-box learning method, for prediction and identification of important risk factors. We undertake external validation and perform an extensive robustness analysis of the EBM models. EBMs match the accuracy of other black-box ML methods such as deep neural networks and random forests, and outperform logistic regression, while being more interpretable. EBMs prove to be robust. The interpretability of the EBM models reveals surprising insights into the features contributing to risk (e.g. maternal height is the second most important feature for shoulder dystocia) and may have potential for clinical application in the prediction and prevention of serious complications in pregnancy.
    摘要 尽管大多数妊娠结局良好,并发症并不少见,且可能对母婴造成严重后果。预测建模有助于更好地理解风险因素、加强对高危患者的监测并及时采取恰当干预,从而帮助产科医生提供更好的照护。我们针对四类妊娠并发症——严重孕产妇并发症、肩难产、早产型子痫前期和产前死胎——识别并研究了最重要的风险因素。我们使用可解释提升机(EBM)这一高精度的玻璃盒学习方法进行预测和重要风险因素识别,并进行了外部验证和大量稳健性分析。EBM的精度可与深度神经网络、随机森林等黑盒方法相当,优于逻辑回归,同时更具可解释性且表现稳健。模型的可解释性揭示了一些出人意料的风险因素(例如产妇身高是肩难产的第二大风险因素),在预测和预防严重妊娠并发症方面具有潜在的临床应用价值。
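
作为可解释提升机(EBM)用法的一个示意,下面给出一个基于 interpret 包的 Python 草图;文件名、特征列与结局变量均为假设的占位符,并非论文所用队列或代码。

```python
# A minimal sketch, assuming the interpret package's glass-box API;
# the CSV file and column names below are hypothetical placeholders.
import pandas as pd
from interpret.glassbox import ExplainableBoostingClassifier

df = pd.read_csv("deliveries.csv")                       # hypothetical dataset
X = df[["maternal_age", "maternal_height_cm", "bmi", "parity", "gestational_age_wk"]]
y = df["shoulder_dystocia"]                              # one of the four outcomes

ebm = ExplainableBoostingClassifier(random_state=0)
ebm.fit(X, y)

# per-feature shape functions give global, human-readable risk contributions
global_explanation = ebm.explain_global()
print(ebm.predict_proba(X.head()))
```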

An Interpretable Deep-Learning Framework for Predicting Hospital Readmissions From Electronic Health Records

  • paper_url: http://arxiv.org/abs/2310.10187
  • repo_url: None
  • paper_authors: Fabio Azzalini, Tommaso Dolci, Marco Vagaggini
  • for: 预测计划外再入院的风险,以降低医疗成本并改善患者健康状况。
  • methods: 提出了一种新的、可解释的深度学习框架,结合自然语言处理中的词嵌入技术与ConvLSTM神经网络模型,以更好地处理时间序列数据。
  • results: 在30天和180天再入院两项预测任务上进行了验证,并引入一种依赖于模型的技术,使结果更易被医务人员理解。该方案比传统的机器学习模型性能更好,同时给出了更具可解释性的结果。
    Abstract With the increasing availability of patients' data, modern medicine is shifting towards prospective healthcare. Electronic health records contain a variety of information useful for clinical patient description and can be exploited for the construction of predictive models, given that similar medical histories will likely lead to similar progressions. One example is unplanned hospital readmission prediction, an essential task for reducing hospital costs and improving patient health. Despite predictive models showing very good performances especially with deep-learning models, they are often criticized for the poor interpretability of their results, a fundamental characteristic in the medical field, where incorrect predictions might have serious consequences for the patient health. In this paper we propose a novel, interpretable deep-learning framework for predicting unplanned hospital readmissions, supported by NLP findings on word embeddings and by neural-network models (ConvLSTM) for better handling temporal data. We validate our system on the two predictive tasks of hospital readmission within 30 and 180 days, using real-world data. In addition, we introduce and test a model-dependent technique to make the representation of results easily interpretable by the medical staff. Our solution achieves better performances compared to traditional models based on machine learning, while providing at the same time more interpretable results.
    摘要 随着患者数据可得性的提高,现代医学正转向前瞻性医疗。电子健康记录包含丰富的临床信息,可用于构建预测模型,因为相似的病史往往带来相似的病程。计划外再入院预测就是其中一项重要任务,有助于降低医疗成本并改善患者健康。尽管预测模型(尤其是深度学习模型)表现优异,但其结果常因可解释性差而受到批评,而在医疗领域,错误预测可能对患者健康造成严重后果。本文提出了一种新的、可解释的深度学习框架用于预测计划外再入院,结合了自然语言处理中的词嵌入技术和更适合处理时间序列数据的ConvLSTM模型。我们在30天和180天再入院两项预测任务上使用真实数据验证了该系统,并引入一种依赖于模型的技术,使结果易于被医务人员理解。与传统机器学习模型相比,该方案取得了更好的性能,同时给出了更具可解释性的结果。

Hypergraph Echo State Network

  • paper_url: http://arxiv.org/abs/2310.10177
  • repo_url: None
  • paper_authors: Justin Lien
  • for: 这篇文章旨在提出一种能够高效处理超图结构数据的网络模型,即基于超图的回声状态网络(HypergraphESN)。
  • methods: 文章给出了HypergraphESN的算法,并推导了其收敛条件,同时讨论了它相较于GraphESN的通用性。
  • results: 数值实验显示,在处理超图结构数据时,HypergraphESN的分类精度与GraphESN相当或更优;当网络中识别出更多高阶交互时,精度进一步提升。
    Abstract A hypergraph as a generalization of graphs records higher-order interactions among nodes, yields a more flexible network model, and allows non-linear features for a group of nodes. In this article, we propose a hypergraph echo state network (HypergraphESN) as a generalization of graph echo state network (GraphESN) designed for efficient processing of hypergraph-structured data, derive convergence conditions for the algorithm, and discuss its versatility in comparison to GraphESN. The numerical experiments on the binary classification tasks demonstrate that HypergraphESN exhibits comparable or superior accuracy performance to GraphESN for hypergraph-structured data, and accuracy increases if more higher-order interactions in a network are identified.
    摘要 超图作为图的推广,能够记录节点之间的高阶交互,提供更灵活的网络模型,并允许对一组节点刻画非线性特征。在本文中,我们提出超图回声状态网络(HypergraphESN),作为图回声状态网络(GraphESN)的推广,用于高效处理超图结构数据;我们推导了算法的收敛条件,并讨论了其相较GraphESN的通用性。二分类任务上的数值实验表明,对于超图结构数据,HypergraphESN的精度与GraphESN相当或更优,并且当网络中识别出的高阶交互越多时,精度会进一步提升。
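
下面给出标准回声状态网络(ESN)状态更新的 Python 草图,用于说明该类模型的基本机制;HypergraphESN 如何由超图结构构造耦合矩阵,此处仅在注释中示意,具体构造以论文为准。

```python
import numpy as np

def esn_states(U, W_in, W, leak=0.3):
    """Standard echo state network reservoir update with leaky integration:
        x_{t+1} = (1 - leak) * x_t + leak * tanh(W_in u_{t+1} + W x_t).
    A HypergraphESN would replace the fixed coupling W by one built from a
    hypergraph incidence structure, so nodes sharing a hyperedge interact jointly."""
    n = W.shape[0]
    x = np.zeros(n)
    states = []
    for u in U:                                  # U: sequence of input vectors
        x = (1 - leak) * x + leak * np.tanh(W_in @ u + W @ x)
        states.append(x.copy())
    return np.array(states)                      # collected states feed a linear readout
```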

On permutation symmetries in Bayesian neural network posteriors: a variational perspective

  • paper_url: http://arxiv.org/abs/2310.10171
  • repo_url: None
  • paper_authors: Simone Rossi, Ankit Singh, Thomas Hannagan
  • for: 这项研究旨在理解神经网络中梯度下降优化的困难性, 以及 Bayesian neural networks(BNNs)中 approximate inference 的问题。
  • methods: 这篇论文使用了 marginalized loss barrier 和 solution interpolation 的扩展 formalism, 以及一种匹配算法来搜索线性连接的解。
  • results: 实验结果表明, 对于多种架构和数据集, linearly connected solutions 的 marginalized loss barrier 几乎为零。
    Abstract The elusive nature of gradient-based optimization in neural networks is tied to their loss landscape geometry, which is poorly understood. However recent work has brought solid evidence that there is essentially no loss barrier between the local solutions of gradient descent, once accounting for weight-permutations that leave the network's computation unchanged. This raises questions for approximate inference in Bayesian neural networks (BNNs), where we are interested in marginalizing over multiple points in the loss landscape. In this work, we first extend the formalism of marginalized loss barrier and solution interpolation to BNNs, before proposing a matching algorithm to search for linearly connected solutions. This is achieved by aligning the distributions of two independent approximate Bayesian solutions with respect to permutation matrices. We build on the results of Ainsworth et al. (2023), reframing the problem as a combinatorial optimization one, using an approximation to the sum of bilinear assignment problem. We then experiment on a variety of architectures and datasets, finding nearly zero marginalized loss barriers for linearly connected solutions.
    摘要 神经网络中基于梯度的优化之所以难以理解,与其损失函数的几何形态密切相关,而后者目前仍知之甚少。然而,最近的研究给出了有力证据:在考虑不改变网络计算的权重置换之后,梯度下降得到的各个局部解之间基本不存在损失壁垒。这给贝叶斯神经网络(BNN)中的近似推断带来了新的问题,因为我们希望对损失地形中的多个点进行边缘化。在这项工作中,我们首先将边缘化损失壁垒和解插值的形式化框架推广到BNN,随后提出一种匹配算法来搜索线性连通的解:通过置换矩阵将两个独立的近似贝叶斯解的分布对齐。我们在 Ainsworth et al. (2023) 的结果基础上,将该问题重新表述为一个组合优化问题,并使用对双线性指派问题之和的近似进行求解。我们在多种架构和数据集上进行了实验,发现线性连通解的边缘化损失壁垒几乎为零。
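
作为"通过置换对齐两个解"这一思想的一个最小示例,下面的 Python 草图用线性指派对齐两层权重的隐藏单元;这只是点估计意义下的权重匹配示意,论文的变分设定中对齐的是近似贝叶斯解的分布。

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_hidden_units(W_a, W_b):
    """Toy weight-matching step: find the permutation of layer-b hidden units that
    best aligns them with layer-a units (maximum total inner product), solved as a
    linear assignment problem over the unit-to-unit similarity matrix."""
    cost = -(W_a @ W_b.T)                  # rows: units of A, columns: units of B
    rows, cols = linear_sum_assignment(cost)
    return cols                            # cols[i] is the B-unit matched to A-unit i

# toy example with 4 hidden units and 3 inputs per layer
rng = np.random.default_rng(0)
W_a, W_b = rng.standard_normal((4, 3)), rng.standard_normal((4, 3))
print(match_hidden_units(W_a, W_b))
```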

An Empirical Study of Simplicial Representation Learning with Wasserstein Distance

  • paper_url: http://arxiv.org/abs/2310.10143
  • repo_url: None
  • paper_authors: Makoto Yamada, Yuki Takezawa, Guillaume Houry, Kira Michaela Dusterwald, Deborah Sulem, Han Zhao, Yao-Hung Hubert Tsai
  • for: 本研究探讨了使用树结构上的1- Wasserstein距离(Tree-Wasserstein distance,TWD)来学习 simplicial 表示,TWD 是两个树嵌入向量之间的L1距离。
  • methods: 本研究采用基于SimCLR的自监督学习方法,以负TWD作为相似度度量,并提出了一种简单而有效的基于Jeffrey散度的正则化方法来稳定优化。
  • results: 通过在STL10、CIFAR10、CIFAR100和SVHN等数据集上的实验,研究发现softmax函数与TWD的简单组合得到的结果显著低于标准SimCLR;模型性能取决于TWD与单纯形模型的组合,而Jeffrey散度正则化通常有助于稳定训练。最终,选择合适的TWD与单纯形模型组合能够超越基于余弦相似度的表示学习。
    Abstract In this paper, we delve into the problem of simplicial representation learning utilizing the 1-Wasserstein distance on a tree structure (a.k.a., Tree-Wasserstein distance (TWD)), where TWD is defined as the L1 distance between two tree-embedded vectors. Specifically, we consider a framework for simplicial representation estimation employing a self-supervised learning approach based on SimCLR with a negative TWD as a similarity measure. In SimCLR, the cosine similarity with real-vector embeddings is often utilized; however, it has not been well studied utilizing L1-based measures with simplicial embeddings. A key challenge is that training the L1 distance is numerically challenging and often yields unsatisfactory outcomes, and there are numerous choices for probability models. Thus, this study empirically investigates a strategy for optimizing self-supervised learning with TWD and find a stable training procedure. More specifically, we evaluate the combination of two types of TWD (total variation and ClusterTree) and several simplicial models including the softmax function, the ArcFace probability model, and simplicial embedding. Moreover, we propose a simple yet effective Jeffrey divergence-based regularization method to stabilize the optimization. Through empirical experiments on STL10, CIFAR10, CIFAR100, and SVHN, we first found that the simple combination of softmax function and TWD can obtain significantly lower results than the standard SimCLR (non-simplicial model and cosine similarity). We found that the model performance depends on the combination of TWD and the simplicial model, and the Jeffrey divergence regularization usually helps model training. Finally, we inferred that the appropriate choice of combination of TWD and simplicial models outperformed cosine similarity based representation learning.
    摘要 在这篇论文中,我们研究了使用树结构上的1-Wasserstein距离(树Wasserstein距离,TWD)来学习单纯形(simplicial)表示,其中TWD定义为两个树嵌入向量之间的L1距离。我们考虑一种基于SimCLR的自监督学习框架来估计单纯形表示,并以负TWD作为相似度度量。SimCLR中通常对实值向量嵌入使用余弦相似度,而将基于L1的度量与单纯形嵌入结合使用尚未得到充分研究。一个关键挑战在于,训练L1距离在数值上较为困难,往往得到不理想的结果,而且概率模型的选择也有很多可能。因此,本研究以实证方式探讨了用TWD优化自监督学习的策略,并寻找稳定的训练流程。具体而言,我们评估了两类TWD(全变差与ClusterTree)与若干单纯形模型(softmax函数、ArcFace概率模型以及单纯形嵌入)的组合,并提出了一种简单而有效的基于Jeffrey散度的正则化方法来稳定优化。通过在STL10、CIFAR10、CIFAR100和SVHN上的实验,我们首先发现softmax函数与TWD的简单组合得到的结果显著低于标准SimCLR(非单纯形模型加余弦相似度);模型性能取决于TWD与单纯形模型的组合,而Jeffrey散度正则化通常有助于模型训练;最后,选择合适的TWD与单纯形模型组合能够超越基于余弦相似度的表示学习。
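
下面用 LaTeX 给出树 Wasserstein 距离(TWD)的一个常见等价写法,以及正则化中用到的 Jeffrey 散度的定义,作为阅读摘要的背景(记号为通用写法,非论文原文)。

```latex
% Tree-Wasserstein distance between two probability vectors \mu, \nu supported on
% the leaves of a tree T with edge weights w_e; \Gamma(v_e) is the set of leaves
% below edge e:
\mathrm{TWD}(\mu, \nu) = \sum_{e \in T} w_e \,\bigl|\, \mu(\Gamma(v_e)) - \nu(\Gamma(v_e)) \,\bigr|,
% i.e. a weighted L1 distance between the vectors of subtree masses.
% The Jeffrey divergence used for regularisation is the symmetrised KL divergence:
J(p, q) = \mathrm{KL}(p \,\|\, q) + \mathrm{KL}(q \,\|\, p).
```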

A Comprehensive Study of Privacy Risks in Curriculum Learning

  • paper_url: http://arxiv.org/abs/2310.10124
  • repo_url: None
  • paper_authors: Joann Qiongna Chen, Xinlei He, Zheng Li, Yang Zhang, Zhou Li
  • for: 本研究旨在探讨curriculum learning(CL)对机器学习的隐私影响,以填补现有的知识空白。
  • methods: 我们使用了membership inference attack(MIA)和attribute inference attack(AIA)两种方法来衡量CL对隐私的泄露。
  • results: 我们的评估结果显示,应用 CL 后 MIA 的效果略有提升,且这种影响在被排序为“困难”的训练样本子集上尤为明显;与 MIA 相比,在 CL 下训练的模型在 AIA 攻击下的脆弱性更低;现有防御技术(如 DP-SGD、MemGuard 和 MixupMMD)在 CL 下仍然有效,但 DP-SGD 会明显影响目标模型的准确率。此外,我们基于难度分数提出了一种新的 MIA 方法 Diff-Cali,它利用难度分数对攻击结果进行校准,对所有 CL 方法和常规训练方法均有效。
    Abstract Training a machine learning model with data following a meaningful order, i.e., from easy to hard, has been proven to be effective in accelerating the training process and achieving better model performance. The key enabling technique is curriculum learning (CL), which has seen great success and has been deployed in areas like image and text classification. Yet, how CL affects the privacy of machine learning is unclear. Given that CL changes the way a model memorizes the training data, its influence on data privacy needs to be thoroughly evaluated. To fill this knowledge gap, we perform the first study and leverage membership inference attack (MIA) and attribute inference attack (AIA) as two vectors to quantify the privacy leakage caused by CL. Our evaluation of nine real-world datasets with attack methods (NN-based, metric-based, label-only MIA, and NN-based AIA) revealed new insights about CL. First, MIA becomes slightly more effective when CL is applied, but the impact is much more prominent to a subset of training samples ranked as difficult. Second, a model trained under CL is less vulnerable under AIA, compared to MIA. Third, the existing defense techniques like DP-SGD, MemGuard, and MixupMMD are still effective under CL, though DP-SGD has a significant impact on target model accuracy. Finally, based on our insights into CL, we propose a new MIA, termed Diff-Cali, which exploits the difficulty scores for result calibration and is demonstrated to be effective against all CL methods and the normal training method. With this study, we hope to draw the community's attention to the unintended privacy risks of emerging machine-learning techniques and develop new attack benchmarks and defense solutions.
    摘要 通过训练机器学习模型使用meaningful order的数据,即从易到难,已经证明可以加速训练过程并提高模型性能。关键技术是curriculum learning(CL),已经在图像和文本分类等领域取得了很大成功。然而,CL对机器学习的隐私影响是不清楚。因为CL改变了模型对训练数据的记忆方式,因此其对隐私的影响需要进行仔细评估。为了填补这个知识空白,我们进行了第一个研究,并利用成员推理攻击(MIA)和特征推理攻击(AIA)作为两种量度CL对隐私的泄露的方法。我们对九个实际 datasets进行了评估,并使用NN-based、metric-based、label-only MIA和NN-based AIA等方法进行攻击。我们发现了以下新的发现:1. MIA在CL应用后变得略微更加有效,但对于一些训练样本 ranked as difficult 的影响更加明显。2. 一个CL训练的模型对AIA更加抵触,相比于MIA。3. 现有的防御技术如DP-SGD、MemGuard和MixupMMD仍然有效于CL,尽管DP-SGD对目标模型准确率有显著影响。4. 基于我们对CL的发现,我们提出了一种新的MIA,称为Diff-Cali,它利用难度分数进行结果准确性的调整,并证明可以有效地对CL方法和常规训练方法进行攻击。通过这项研究,我们希望能吸引社区关注机器学习领域的意外隐私风险,并开发新的攻击 benchmark和防御解决方案。
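
For context on the attack vectors above, a generic metric-based membership inference attack (MIA) thresholds the target model's confidence on the true label; the difficulty-calibrated variant below only conveys the spirit of Diff-Cali and is a simplified assumption, not the paper's procedure.

```python
# Sketch: confidence-threshold MIA and a hypothetical difficulty-calibrated variant.
import numpy as np

def confidence_mia(probs, labels, threshold=0.9):
    """probs: [n, n_classes] softmax outputs of the target model; labels: [n].
    Returns a boolean membership guess per example."""
    conf_true = probs[np.arange(len(labels)), labels]
    return conf_true >= threshold

def difficulty_calibrated_mia(probs, labels, difficulty, base_threshold=0.9, alpha=0.1):
    """Loosen the threshold for examples the curriculum ranks as difficult, since even
    members tend to have lower confidence on them (an illustrative heuristic only)."""
    conf_true = probs[np.arange(len(labels)), labels]
    thresholds = base_threshold - alpha * difficulty       # difficulty assumed in [0, 1]
    return conf_true >= thresholds

# Example with random stand-ins for model outputs
rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(10), size=100)
labels = rng.integers(0, 10, size=100)
difficulty = rng.random(100)
print(confidence_mia(probs, labels).mean(),
      difficulty_calibrated_mia(probs, labels, difficulty).mean())
```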

A proximal augmented Lagrangian based algorithm for federated learning with global and local convex conic constraints

  • paper_url: http://arxiv.org/abs/2310.10117
  • repo_url: None
  • paper_authors: Chuan He, Le Peng, Ju Sun
  • for: 本研究针对带约束的联邦学习 (federated learning, FL):中央服务器与所有本地客户端在不将本地数据移出客户端的前提下,共同最小化凸的局部目标函数之和,并同时满足全局与局部的凸锥约束。
  • methods: 本研究提出了一个基于 proximal augmented Lagrangian (AL) 的联邦学习框架,其中产生的子问题由联邦形式的非精确 ADMM(交替方向乘子法)求解;在局部 Lipschitz 条件和较弱的假设下,给出了求取近似 KKT 解的最坏情况复杂度界。
  • results: 据作者所知,这是第一个同时处理全局与局部约束的联邦学习算法;数值实验展示了该方法在 Neyman-Pearson 分类和提升模型公平性方面的实际优势。
    Abstract This paper considers federated learning (FL) with constraints, where the central server and all local clients collectively minimize a sum of convex local objective functions subject to global and local convex conic constraints. To train the model without moving local data from clients to the central server, we propose an FL framework in which each local client performs multiple updates using the local objective and local constraint, while the central server handles the global constraint and performs aggregation based on the updated local models. In particular, we develop a proximal augmented Lagrangian (AL) based algorithm for FL with global and local convex conic constraints. The subproblems arising in this algorithm are solved by an inexact alternating direction method of multipliers (ADMM) in a federated fashion. Under a local Lipschitz condition and mild assumptions, we establish the worst-case complexity bounds of the proposed algorithm for finding an approximate KKT solution. To the best of our knowledge, this work proposes the first algorithm for FL with global and local constraints. Our numerical experiments demonstrate the practical advantages of our algorithm in performing Neyman-Pearson classification and enhancing model fairness in the context of FL.
    摘要 本文研究带约束的联邦学习 (FL):中央服务器与所有本地客户端共同最小化凸的局部目标函数之和,并满足全局与局部的凸锥约束。为了在训练过程中不将本地数据从客户端移动到中央服务器,我们提出了一个联邦学习框架:每个本地客户端利用局部目标函数和局部约束执行多次更新,中央服务器则负责全局约束,并基于更新后的本地模型进行聚合。特别地,我们针对具有全局与局部凸锥约束的联邦学习问题,设计了一种基于 proximal augmented Lagrangian (AL) 的算法,其中出现的子问题由联邦形式的非精确交替方向乘子法 (ADMM) 求解。在局部 Lipschitz 条件和较弱的假设下,我们给出了该算法求取近似 KKT 解的最坏情况复杂度界。据我们所知,这是第一个同时处理全局与局部约束的联邦学习算法。数值实验表明,该算法在 Neyman-Pearson 分类以及提升联邦学习场景下的模型公平性方面具有实际优势。
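
A schematic of the proximal augmented Lagrangian loop described above, written in centralized form with a plain gradient inner solver; in the paper the objective is split across clients and the subproblems are solved by an inexact ADMM, which is not reproduced here.

```python
# Sketch: proximal augmented Lagrangian for min f(x) s.t. g(x) <= 0 (centralized form).
import numpy as np

def proximal_al(f_grad, g, g_grad, x0, rho=10.0, beta=1.0,
                outer_iters=50, inner_iters=100, lr=1e-2):
    x, lam = x0.copy(), 0.0
    for _ in range(outer_iters):
        x_prev = x.copy()
        for _ in range(inner_iters):
            # grad of f(x) + (rho/2)*max(0, g(x)+lam/rho)^2 + (beta/2)*||x - x_prev||^2
            slack = max(0.0, g(x) + lam / rho)
            grad = f_grad(x) + rho * slack * g_grad(x) + beta * (x - x_prev)
            x -= lr * grad
        lam = max(0.0, lam + rho * g(x))      # multiplier (dual) update
    return x, lam

# Toy example: minimize ||x - c||^2 subject to sum(x) <= 1 (optimum is (0.5, 0.5))
c = np.array([1.0, 1.0])
x_opt, lam_opt = proximal_al(lambda x: 2 * (x - c),
                             lambda x: x.sum() - 1.0,
                             lambda x: np.ones_like(x),
                             x0=np.zeros(2))
print(x_opt, lam_opt)
```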

PAC Learning Linear Thresholds from Label Proportions

  • paper_url: http://arxiv.org/abs/2310.10098
  • repo_url: None
  • paper_authors: Anand Brahmbhatt, Rishi Saket, Aravindan Raghuveer
  • for: 本研究关注从标签比例中学习 (learning from label proportions, LLP),即仅给出每个袋 (bag) 的特征向量集合及其平均标签时,训练实例级分类器的问题。
  • methods: 本研究针对特征向量在给定标签下服从高斯分布 $N(\mathbf{\mu}, \mathbf{\Sigma})$ 的随机袋,利用有放回与无放回采样得到的特征向量差的协方差所构成的矩阵:经过一个变换后,该矩阵的主成分方向即为线性阈值函数 (LTF) 的法向量方向;算法通过次高斯集中不等式估计均值与协方差矩阵,从而近似该法向方向。
  • results: 本研究表明在上述分布假设下可以高效地用 LTF 学习 LTF,并给出了袋设置下新的泛化误差界;对于 $N(\mathbf{0}, \mathbf{I})$ 的特殊情形还提供了更简单的基于均值估计的算法。实验评估将本方法与 [Saket'21, Saket'22] 以及随机 LTF 进行了比较,验证了方法的有效性。
    Abstract Learning from label proportions (LLP) is a generalization of supervised learning in which the training data is available as sets or bags of feature-vectors (instances) along with the average instance-label of each bag. The goal is to train a good instance classifier. While most previous works on LLP have focused on training models on such training data, computational learnability of LLP was only recently explored by [Saket'21, Saket'22] who showed worst case intractability of properly learning linear threshold functions (LTFs) from label proportions. However, their work did not rule out efficient algorithms for this problem on natural distributions. In this work we show that it is indeed possible to efficiently learn LTFs using LTFs when given access to random bags of some label proportion in which feature-vectors are, conditioned on their labels, independently sampled from a Gaussian distribution $N(\mathbf{\mu}, \mathbf{\Sigma})$. Our work shows that a certain matrix -- formed using covariances of the differences of feature-vectors sampled from the bags with and without replacement -- necessarily has its principal component, after a transformation, in the direction of the normal vector of the LTF. Our algorithm estimates the means and covariance matrices using subgaussian concentration bounds which we show can be applied to efficiently sample bags for approximating the normal direction. Using this in conjunction with novel generalization error bounds in the bag setting, we show that a low error hypothesis LTF can be identified. For some special cases of the $N(\mathbf{0}, \mathbf{I})$ distribution we provide a simpler mean estimation based algorithm. We include an experimental evaluation of our learning algorithms along with a comparison with those of [Saket'21, Saket'22] and random LTFs, demonstrating the effectiveness of our techniques.
    摘要 从标签比例中学习 (LLP) 是监督学习的一种推广:训练数据以特征向量袋的形式给出,且只提供每个袋的平均实例标签,目标是训练一个好的实例级分类器。以往大多数 LLP 工作关注如何在这类数据上训练模型,而 LLP 的计算可学习性直到最近才由 [Saket'21, Saket'22] 研究,他们证明了从标签比例中正确学习线性阈值函数 (LTF) 在最坏情况下是困难的;但这并不排除在自然分布下存在高效算法。在本工作中,我们证明:当特征向量在给定标签的条件下独立地服从高斯分布 $N(\mathbf{\mu}, \mathbf{\Sigma})$,并且可以获取具有某一标签比例的随机袋时,确实可以高效地用 LTF 学习 LTF。我们证明,由袋中有放回与无放回采样得到的特征向量差的协方差构成的某个矩阵,经过变换后,其主成分必然落在 LTF 的法向量方向上。我们的算法利用次高斯集中不等式估计均值和协方差矩阵,并据此高效地采样袋来近似法向方向;结合袋设置下新的泛化误差界,即可识别出低误差的 LTF 假设。对于 $N(\mathbf{0}, \mathbf{I})$ 分布的某些特殊情形,我们还给出了更简单的基于均值估计的算法。我们对学习算法进行了实验评估,并与 [Saket'21, Saket'22] 的方法以及随机 LTF 进行比较,验证了所提技术的有效性。
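
A simplified illustration of the idea for the special $N(\mathbf{0}, \mathbf{I})$ case: the empirical covariance between each bag's mean feature vector and its label proportion points along the LTF's normal vector. The paper's general algorithm, based on covariances of with/without-replacement differences, is not reproduced here.

```python
# Sketch: recovering an LTF's normal direction from bag-level label proportions (N(0, I) case).
import numpy as np

rng = np.random.default_rng(0)
d, k, n_bags = 10, 50, 2000
w_true = rng.normal(size=d)
w_true /= np.linalg.norm(w_true)

bag_means, bag_props = [], []
for _ in range(n_bags):
    X = rng.normal(size=(k, d))                  # one bag of k iid N(0, I) samples
    y = (X @ w_true > 0.3).astype(float)         # hidden LTF labels (threshold 0.3)
    bag_means.append(X.mean(axis=0))
    bag_props.append(y.mean())                   # only the label proportion is observed

bag_means, bag_props = np.array(bag_means), np.array(bag_props)
# covariance between label proportion and bag mean points along w (up to sign/scale)
w_hat = (bag_props - bag_props.mean()) @ (bag_means - bag_means.mean(axis=0))
w_hat /= np.linalg.norm(w_hat)
print("cosine similarity to true normal:", float(w_hat @ w_true))
```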

LLP-Bench: A Large Scale Tabular Benchmark for Learning from Label Proportions

  • paper_url: http://arxiv.org/abs/2310.10096
  • repo_url: None
  • paper_authors: Anand Brahmbhatt, Mohith Pokala, Rishi Saket, Aravindan Raghuveer
  • for: 本文针对从标签比例中学习 (LLP) 缺乏公开、大规模表格型基准的问题,提出了大规模的 LLP 基准数据集 LLP-Bench。
  • methods: LLP-Bench 包含 56 个 LLP 数据集(52 个特征袋数据集和 4 个随机袋数据集),它们均由包含 4500 万个实例的 Criteo CTR 预测数据集构建而成;此外,本文还提出了四种刻画 LLP 数据集难度的度量。
  • results: 本文利用这四种度量对 56 个数据集进行了深入分析,并在全部 56 个数据集上测试了 9 种最新且常用的表格型 LLP 技术的性能;据作者所述,这是文献中对表格型 LLP 技术最为广泛的评测。
    Abstract In the task of Learning from Label Proportions (LLP), a model is trained on groups (a.k.a bags) of instances and their corresponding label proportions to predict labels for individual instances. LLP has been applied pre-dominantly on two types of datasets - image and tabular. In image LLP, bags of fixed size are created by randomly sampling instances from an underlying dataset. Bags created via this methodology are called random bags. Experimentation on Image LLP has been mostly on random bags on CIFAR-* and MNIST datasets. Despite being a very crucial task in privacy sensitive applications, tabular LLP does not yet have a open, large scale LLP benchmark. One of the unique properties of tabular LLP is the ability to create feature bags where all the instances in a bag have the same value for a given feature. It has been shown in prior research that feature bags are very common in practical, real world applications [Chen et. al '23, Saket et. al. '22]. In this paper, we address the lack of a open, large scale tabular benchmark. First we propose LLP-Bench, a suite of 56 LLP datasets (52 feature bag and 4 random bag datasets) created from the Criteo CTR prediction dataset consisting of 45 million instances. The 56 datasets represent diverse ways in which bags can be constructed from underlying tabular data. To the best of our knowledge, LLP-Bench is the first large scale tabular LLP benchmark with an extensive diversity in constituent datasets. Second, we propose four metrics that characterize and quantify the hardness of a LLP dataset. Using these four metrics we present deep analysis of the 56 datasets in LLP-Bench. Finally we present the performance of 9 SOTA and popular tabular LLP techniques on all the 56 datasets. To the best of our knowledge, our study consisting of more than 2500 experiments is the most extensive study of popular tabular LLP techniques in literature.
    摘要 在从标签比例中学习 (LLP) 任务中,模型在实例组(即“袋”)及其对应的标签比例上训练,以预测单个实例的标签。LLP 主要被应用于图像和表格两类数据。在图像 LLP 中,通常通过从底层数据集中随机抽样实例来构建固定大小的袋,这种方式构建的袋称为随机袋;已有的图像 LLP 实验大多在 CIFAR-* 和 MNIST 数据集的随机袋上进行。尽管表格 LLP 在隐私敏感应用中是一个非常关键的任务,但它至今还没有公开的大规模基准。表格 LLP 的一个独特性质是可以构建特征袋,即同一袋中的所有实例在某个给定特征上取值相同;先前研究 [Chen et al. '23, Saket et al. '22] 表明特征袋在实际应用中非常常见。本文旨在弥补公开、大规模表格基准的缺失。首先,我们提出了 LLP-Bench,一个由包含 4500 万个实例的 Criteo CTR 预测数据集构建的 56 个 LLP 数据集(52 个特征袋数据集和 4 个随机袋数据集)组成的基准套件,这 56 个数据集体现了从底层表格数据构建袋的多种方式;据我们所知,LLP-Bench 是第一个具有广泛数据集多样性的大规模表格 LLP 基准。其次,我们提出了四种刻画并量化 LLP 数据集难度的度量,并利用它们对 LLP-Bench 中的 56 个数据集进行了深入分析。最后,我们在全部 56 个数据集上报告了 9 种最新且常用的表格 LLP 技术的性能;据我们所知,这项包含超过 2500 次实验的研究是文献中对常用表格 LLP 技术最为广泛的评测。
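
A minimal sketch of how feature bags of the kind used in LLP-Bench can be built from tabular data: rows sharing a value of the grouping column(s) form a bag, and only the bag-level label proportion is retained. Column names are placeholders, not the Criteo schema.

```python
# Sketch: constructing feature bags and their label proportions from a tabular dataset.
import pandas as pd

def make_feature_bags(df, group_cols, label_col, min_size=2, max_size=2048):
    bags = []
    for key, grp in df.groupby(group_cols):
        if min_size <= len(grp) <= max_size:          # drop degenerate / oversized bags
            bags.append({
                "bag_key": key,
                "size": len(grp),
                "label_proportion": grp[label_col].mean(),
                "features": grp.drop(columns=[label_col]).to_numpy(),
            })
    return bags

# Toy usage with placeholder columns
df = pd.DataFrame({"C1": [1, 1, 2, 2, 2], "I1": [0.3, 0.1, 0.5, 0.9, 0.2],
                   "click": [1, 0, 1, 1, 0]})
bags = make_feature_bags(df, group_cols=["C1"], label_col="click")
print([(b["bag_key"], b["size"], b["label_proportion"]) for b in bags])
```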

Label Differential Privacy via Aggregation

  • paper_url: http://arxiv.org/abs/2310.10092
  • repo_url: None
  • paper_authors: Anand Brahmbhatt, Rishi Saket, Shreyas Havaldar, Anshul Nasery, Aravindan Raghuveer
  • for: 保护敏感训练标签的隐私
  • methods: 使用 randomly weighted aggregation 和 additive noise 保护隐私
  • results: 可以在不添加或仅添加少量噪声的情况下实现 label-DP 保证,同时保持训练任务的效用。
    Abstract In many real-world applications, in particular due to recent developments in the privacy landscape, training data may be aggregated to preserve the privacy of sensitive training labels. In the learning from label proportions (LLP) framework, the dataset is partitioned into bags of feature-vectors which are available only with the sum of the labels per bag. A further restriction, which we call learning from bag aggregates (LBA) is where instead of individual feature-vectors, only the (possibly weighted) sum of the feature-vectors per bag is available. We study whether such aggregation techniques can provide privacy guarantees under the notion of label differential privacy (label-DP) previously studied in for e.g. [Chaudhuri-Hsu'11, Ghazi et al.'21, Esfandiari et al.'22]. It is easily seen that naive LBA and LLP do not provide label-DP. Our main result however, shows that weighted LBA using iid Gaussian weights with $m$ randomly sampled disjoint $k$-sized bags is in fact $(\varepsilon, \delta)$-label-DP for any $\varepsilon > 0$ with $\delta \approx \exp(-\Omega(\sqrt{k}))$ assuming a lower bound on the linear-mse regression loss. Further, this preserves the optimum over linear mse-regressors of bounded norm to within $(1 \pm o(1))$-factor w.p. $\approx 1 - \exp(-\Omega(m))$. We emphasize that no additive label noise is required. The analogous weighted-LLP does not however admit label-DP. Nevertheless, we show that if additive $N(0, 1)$ noise can be added to any constant fraction of the instance labels, then the noisy weighted-LLP admits similar label-DP guarantees without assumptions on the dataset, while preserving the utility of Lipschitz-bounded neural mse-regression tasks. Our work is the first to demonstrate that label-DP can be achieved by randomly weighted aggregation for regression tasks, using no or little additive noise.
    摘要 在许多实际应用中,特别是due to recent developments in privacy landscape,training data可能会被聚合以保护敏感训练标签的隐私。在学习从标签聚合(LLP)框架中,数据集被分解成具有特征向量的袋子,但这些特征向量只有每个袋子的标签总和。我们称这种约束为学习从袋子聚合(LBA)。我们研究了这种聚合技术是否可以提供隐私保证,并且我们发现这种保证是可行的。我们的主要结果表明,使用独立的 Gaussian 权重,将 $m$ 个不同大小的 $k$-个袋子Randomly sampled,并使用加权 LBA,可以实现 $( \varepsilon, \delta)$-标签隐私(label-DP),其中 $\delta \approx \exp(-\Omega(\sqrt{k}))$。此外,这种方法可以保持最佳的线性mse回归损失,即 $(1 \pm o(1))$-factor w.p. $\approx 1 - \exp(-\Omega(m))$.这意味着没有添加标签噪声。虽然加权 LLP 不能实现标签隐私,但我们发现,如果将任意一部分的实例标签添加 $N(0, 1)$ 噪声,那么这种噪声化的加权 LLP 可以实现类似的标签隐私保证,不需要对数据集进行任何假设。此外,这种方法可以保持 Lipschitz-bounded 神经网络 mse-regression 任务的实用性。我们的工作是首次示出,通过Randomly weighted aggregation可以实现标签隐私,无需或只需少量的添加噪声。
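
A sketch of the weighted aggregation mechanism described above: labels are released only as iid-Gaussian-weighted sums over randomly sampled disjoint bags of size $k$. Parameter choices are illustrative; the differential-privacy accounting itself is in the paper, not in this snippet.

```python
# Sketch: releasing Gaussian-weighted label aggregates over disjoint bags.
import numpy as np

def weighted_bag_aggregates(X, y, k, n_bags, rng=None):
    rng = rng or np.random.default_rng()
    n = len(y)
    idx = rng.permutation(n)[: n_bags * k].reshape(n_bags, k)   # disjoint bags of size k
    agg = []
    for bag in idx:
        w = rng.normal(size=k)                                   # iid N(0, 1) weights
        agg.append({
            "indices": bag,
            "weights": w,
            "weighted_label_sum": float(w @ y[bag]),             # only label info released
        })
    return agg

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.1 * rng.normal(size=1000)
bags = weighted_bag_aggregates(X, y, k=25, n_bags=40, rng=rng)
print(len(bags), bags[0]["weighted_label_sum"])
```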

Over-the-Air Federated Learning and Optimization

  • paper_url: http://arxiv.org/abs/2310.10089
  • repo_url: None
  • paper_authors: Jingyang Zhu, Yuanming Shi, Yong Zhou, Chunxiao Jiang, Wei Chen, Khaled B. Letaief
  • for: 这篇论文关注联邦学习 (FL) 中的空中计算 (over-the-air computation, AirComp):它可以降低无线网络上的通信开销,但信道衰落与噪声带来的模型聚合误差会损害学习性能。
  • methods: 论文首先对基于 AirComp 的 FedAvg(AirFedAvg)算法进行了系统研究,包括在强凸与非凸设定下、采用常数或递减学习率、并存在数据异构时的收敛性分析。
  • results: 通过收敛性与渐近分析,论文刻画了聚合误差对收敛界的影响,并为带收敛保证的系统设计提供了依据;同时推导了 AirFedAvg 在强凸与非凸目标下的收敛速率,指出在 AirFedAvg 中传输本地模型(而非梯度或模型差)可能导致训练发散,并进一步将收敛分析扩展到由更实际的信号处理方案引起的不同形式的聚合误差。
    Abstract Federated learning (FL), as an emerging distributed machine learning paradigm, allows a mass of edge devices to collaboratively train a global model while preserving privacy. In this tutorial, we focus on FL via over-the-air computation (AirComp), which is proposed to reduce the communication overhead for FL over wireless networks at the cost of compromising in the learning performance due to model aggregation error arising from channel fading and noise. We first provide a comprehensive study on the convergence of AirComp-based FedAvg (AirFedAvg) algorithms under both strongly convex and non-convex settings with constant and diminishing learning rates in the presence of data heterogeneity. Through convergence and asymptotic analysis, we characterize the impact of aggregation error on the convergence bound and provide insights for system design with convergence guarantees. Then we derive convergence rates for AirFedAvg algorithms for strongly convex and non-convex objectives. For different types of local updates that can be transmitted by edge devices (i.e., local model, gradient, and model difference), we reveal that transmitting local model in AirFedAvg may cause divergence in the training procedure. In addition, we consider more practical signal processing schemes to improve the communication efficiency and further extend the convergence analysis to different forms of model aggregation error caused by these signal processing schemes. Extensive simulation results under different settings of objective functions, transmitted local information, and communication schemes verify the theoretical conclusions.
    摘要 联邦学习 (FL) 是一种新兴的分布式机器学习范式,允许大量边缘设备在保护隐私的前提下协同训练全局模型。本教程关注基于空中计算 (AirComp) 的联邦学习:AirComp 可以降低无线网络上联邦学习的通信开销,但信道衰落和噪声引起的模型聚合误差会损害学习性能。我们首先对基于 AirComp 的 FedAvg(AirFedAvg)算法进行了系统研究,包括在强凸与非凸设定下、采用常数或递减学习率、并存在数据异构时的收敛性分析;通过收敛性与渐近分析,我们刻画了聚合误差对收敛界的影响,并为带收敛保证的系统设计提供了依据。随后,我们推导了 AirFedAvg 在强凸与非凸目标下的收敛速率。针对边缘设备可传输的不同类型本地更新(本地模型、梯度和模型差),我们发现在 AirFedAvg 中传输本地模型可能导致训练过程发散。此外,我们考虑了更实际的信号处理方案以提升通信效率,并将收敛分析进一步扩展到这些方案引起的不同形式的模型聚合误差。在不同目标函数、传输的本地信息和通信方案设置下的大量仿真结果验证了理论结论。
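
A toy simulation of the over-the-air aggregation step: clients' model updates are superposed through a fading channel with additive noise, and the server treats the (rescaled) received signal as the FedAvg average. The channel and receiver models are simplistic assumptions made for illustration.

```python
# Sketch: AirComp-style model aggregation with Rayleigh fading and additive noise.
import numpy as np

def aircomp_aggregate(local_models, noise_std=0.05, rng=None):
    rng = rng or np.random.default_rng()
    stacked = np.stack(local_models)                     # [n_clients, dim]
    fading = rng.rayleigh(scale=1.0, size=(len(local_models), 1))
    received = (fading * stacked).sum(axis=0)            # superposed analog transmission
    received += rng.normal(scale=noise_std, size=stacked.shape[1])
    # naive receiver: rescale by the sum of (assumed known) channel gains
    return received / fading.sum()

rng = np.random.default_rng(0)
clients = [rng.normal(loc=1.0, scale=0.1, size=8) for _ in range(10)]
ideal = np.mean(clients, axis=0)
noisy = aircomp_aggregate(clients, rng=rng)
print("aggregation error:", float(np.linalg.norm(noisy - ideal)))
```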

A simple uniformly optimal method without line search for convex optimization

  • paper_url: http://arxiv.org/abs/2310.10082
  • repo_url: None
  • paper_authors: Tianjiao Li, Guanghui Lan
  • for: solves convex optimization problems with unknown problem parameters (e.g., Lipschitz constant) without the need for line search procedures.
  • methods: presents a novel accelerated gradient descent type algorithm called auto-conditioned fast gradient method (AC-FGM) that achieves an optimal $\mathcal{O}(1/k^2)$ rate of convergence for smooth convex optimization without requiring the estimate of a global Lipschitz constant.
  • results: demonstrates the advantages of AC-FGM over the previously developed parameter-free methods for convex optimization through numerical results.
    Abstract Line search (or backtracking) procedures have been widely employed into first-order methods for solving convex optimization problems, especially those with unknown problem parameters (e.g., Lipschitz constant). In this paper, we show that line search is superfluous in attaining the optimal rate of convergence for solving a convex optimization problem whose parameters are not given a priori. In particular, we present a novel accelerated gradient descent type algorithm called auto-conditioned fast gradient method (AC-FGM) that can achieve an optimal $\mathcal{O}(1/k^2)$ rate of convergence for smooth convex optimization without requiring the estimate of a global Lipschitz constant or the employment of line search procedures. We then extend AC-FGM to solve convex optimization problems with H\"{o}lder continuous gradients and show that it automatically achieves the optimal rates of convergence uniformly for all problem classes with the desired accuracy of the solution as the only input. Finally, we report some encouraging numerical results that demonstrate the advantages of AC-FGM over the previously developed parameter-free methods for convex optimization.
    摘要 线搜索(或回溯)过程被广泛用于求解凸优化问题的一阶方法中,特别是在问题参数(如 Lipschitz 常数)未知的情况下。本文证明,在问题参数未事先给定时,线搜索对于达到最优收敛速率并非必要。具体地,我们提出了一种新的加速梯度下降型算法——自条件快速梯度法 (auto-conditioned fast gradient method, AC-FGM),它无需估计全局 Lipschitz 常数、也无需线搜索,就能对光滑凸优化问题达到最优的 $\mathcal{O}(1/k^2)$ 收敛速率。随后,我们将 AC-FGM 推广到梯度满足 Hölder 连续性的凸优化问题,并证明它仅以解的目标精度为输入,就能在所有问题类别上自动达到一致最优的收敛速率。最后,我们报告了一些令人鼓舞的数值结果,展示了 AC-FGM 相比以往的无参数凸优化方法的优势。

SoTTA: Robust Test-Time Adaptation on Noisy Data Streams

  • paper_url: http://arxiv.org/abs/2310.10074
  • repo_url: https://github.com/taeckyung/SoTTA
  • paper_authors: Taesik Gong, Yewon Kim, Taeckyung Lee, Sorn Chottananurak, Sung-Ju Lee
  • for: 这个论文旨在仅利用未标注的测试数据流进行持续模型自适应,以应对训练数据与测试数据之间的分布偏移 (test-time adaptation, TTA)。
  • methods: 这个方法使用 two-fold enablers: (i) input-wise robustness via high-confidence uniform-class sampling, and (ii) parameter-wise robustness via entropy-sharpness minimization.
  • results: 比较先前的TTA方法,这个方法在存在噪音样本的情况下实现了比较好的性能,并且在没有噪音样本的情况下实现了相当的性能。
    Abstract Test-time adaptation (TTA) aims to address distributional shifts between training and testing data using only unlabeled test data streams for continual model adaptation. However, most TTA methods assume benign test streams, while test samples could be unexpectedly diverse in the wild. For instance, an unseen object or noise could appear in autonomous driving. This leads to a new threat to existing TTA algorithms; we found that prior TTA algorithms suffer from those noisy test samples as they blindly adapt to incoming samples. To address this problem, we present Screening-out Test-Time Adaptation (SoTTA), a novel TTA algorithm that is robust to noisy samples. The key enabler of SoTTA is two-fold: (i) input-wise robustness via high-confidence uniform-class sampling that effectively filters out the impact of noisy samples and (ii) parameter-wise robustness via entropy-sharpness minimization that improves the robustness of model parameters against large gradients from noisy samples. Our evaluation with standard TTA benchmarks with various noisy scenarios shows that our method outperforms state-of-the-art TTA methods under the presence of noisy samples and achieves comparable accuracy to those methods without noisy samples. The source code is available at https://github.com/taeckyung/SoTTA .
    摘要 测试时自适应 (TTA) 旨在仅利用未标注的测试数据流进行持续模型自适应,以应对训练与测试数据之间的分布偏移。然而,大多数 TTA 方法假设测试流是“干净”的,而真实环境中的测试样本可能出乎意料地多样,例如自动驾驶中出现的未见过的物体或噪声。我们发现,先前的 TTA 算法会盲目地适应到来的样本,因而会受这些噪声测试样本的影响。为了解决这个问题,我们提出了对噪声样本鲁棒的 TTA 算法 SoTTA (Screening-out Test-Time Adaptation)。其关键在于两方面:(i) 输入层面的鲁棒性,通过高置信度的类均匀采样有效过滤噪声样本的影响;(ii) 参数层面的鲁棒性,通过熵-锐度最小化提升模型参数对噪声样本产生的大梯度的鲁棒性。在标准 TTA 基准和多种噪声场景下的评估表明,该方法在存在噪声样本时优于最新的 TTA 方法,在无噪声样本时也能取得相当的准确率。源代码见 https://github.com/taeckyung/SoTTA 。
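
A sketch of SoTTA's input-wise enabler: a small memory bank that keeps only high-confidence test samples, balanced uniformly across predicted classes, on which the model is then adapted. The entropy-sharpness (SAM-style) parameter update is omitted; thresholds and capacities are illustrative.

```python
# Sketch: high-confidence, class-uniform sample filtering for test-time adaptation.
import torch
import torch.nn.functional as F

class HighConfidenceUniformMemory:
    def __init__(self, num_classes, per_class_capacity=4, conf_threshold=0.9):
        self.cap = per_class_capacity
        self.thr = conf_threshold
        self.bank = {c: [] for c in range(num_classes)}

    @torch.no_grad()
    def update(self, model, x):
        probs = F.softmax(model(x), dim=-1)
        conf, pred = probs.max(dim=-1)
        for xi, ci, pi in zip(x, conf, pred):
            if ci.item() >= self.thr:                      # filter likely-noisy samples
                slot = self.bank[int(pi)]
                slot.append(xi)
                if len(slot) > self.cap:                   # per-class FIFO => uniform classes
                    slot.pop(0)

    def batch(self):
        kept = [xi for slot in self.bank.values() for xi in slot]
        return torch.stack(kept) if kept else None
```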

Data Augmentation for Time-Series Classification: An Extensive Empirical Study and Comprehensive Survey

  • paper_url: http://arxiv.org/abs/2310.10060
  • repo_url: None
  • paper_authors: Zijun Gao, Lingbo Li, Tianhua Xu
  • for: The paper aims to provide a comprehensive review of data augmentation (DA) techniques for time-series classification (TSC) and to develop a novel taxonomy for categorizing these techniques.
  • methods: An extensive literature review and rigorous analysis of over 100 scholarly articles identify and categorize more than 60 unique DA techniques for TSC; an all-encompassing empirical assessment evaluates various DA strategies using 8 UCR time-series datasets and ResNet.
  • results: A benchmark accuracy of 88.94 +- 11.83% is reported under a multi-faceted evaluation paradigm that includes Accuracy, Method Ranking, and Residual Analysis; the study highlights the inconsistent efficacies of DA techniques for TSC and underscores the need for a robust navigational aid for scholars to select appropriate methods.
    Abstract Data Augmentation (DA) has emerged as an indispensable strategy in Time Series Classification (TSC), primarily due to its capacity to amplify training samples, thereby bolstering model robustness, diversifying datasets, and curtailing overfitting. However, the current landscape of DA in TSC is plagued with fragmented literature reviews, nebulous methodological taxonomies, inadequate evaluative measures, and a dearth of accessible, user-oriented tools. In light of these challenges, this study embarks on an exhaustive dissection of DA methodologies within the TSC realm. Our initial approach involved an extensive literature review spanning a decade, revealing that contemporary surveys scarcely capture the breadth of advancements in DA for TSC, prompting us to meticulously analyze over 100 scholarly articles to distill more than 60 unique DA techniques. This rigorous analysis precipitated the formulation of a novel taxonomy, purpose-built for the intricacies of DA in TSC, categorizing techniques into five principal echelons: Transformation-Based, Pattern-Based, Generative, Decomposition-Based, and Automated Data Augmentation. Our taxonomy promises to serve as a robust navigational aid for scholars, offering clarity and direction in method selection. Addressing the conspicuous absence of holistic evaluations for prevalent DA techniques, we executed an all-encompassing empirical assessment, wherein upwards of 15 DA strategies were subjected to scrutiny across 8 UCR time-series datasets, employing ResNet and a multi-faceted evaluation paradigm encompassing Accuracy, Method Ranking, and Residual Analysis, yielding a benchmark accuracy of 88.94 +- 11.83%. Our investigation underscored the inconsistent efficacies of DA techniques, with...
    摘要 数据增强 (DA) 已成为时间序列分类 (TSC) 中不可或缺的策略,主要因为它能够扩充训练样本,从而增强模型鲁棒性、丰富数据集并抑制过拟合。然而,目前 TSC 领域的 DA 研究存在文献综述碎片化、方法分类模糊、评价手段不足以及缺乏面向用户的易用工具等问题。针对这些挑战,本研究对 TSC 领域的 DA 方法进行了全面剖析。我们首先进行了跨越十年的广泛文献回顾,发现现有综述远未覆盖 TSC 中 DA 的全部进展,因此我们仔细分析了 100 余篇学术论文,提炼出 60 余种不同的 DA 技术。在此基础上,我们提出了一个专为 TSC 中的 DA 设计的新分类体系,将这些技术划分为五大类:基于变换、基于模式、生成式、基于分解以及自动化数据增强。该分类体系有望为研究者在方法选择上提供清晰的导航。针对常用 DA 技术缺乏整体评估的问题,我们开展了一项全面的实证评测:在 8 个 UCR 时间序列数据集上,使用 ResNet 和涵盖准确率、方法排名与残差分析的多维评价体系,对 15 种以上 DA 策略进行了考察,得到 88.94 +- 11.83% 的基准准确率。我们的研究表明,各 DA 技术的效果并不一致,...
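
Three transformation-based augmentations of the kind such surveys cover (jittering, magnitude scaling, window slicing), with illustrative default parameters rather than the settings benchmarked in the paper.

```python
# Sketch: simple transformation-based time-series augmentations.
import numpy as np

def jitter(x, sigma=0.03, rng=None):
    rng = rng or np.random.default_rng()
    return x + rng.normal(scale=sigma, size=x.shape)          # additive Gaussian noise

def scale(x, sigma=0.1, rng=None):
    rng = rng or np.random.default_rng()
    return x * rng.normal(loc=1.0, scale=sigma)               # random magnitude scaling

def window_slice(x, ratio=0.9, rng=None):
    rng = rng or np.random.default_rng()
    n = len(x)
    win = int(n * ratio)
    start = rng.integers(0, n - win + 1)
    sliced = x[start:start + win]
    # resample back to the original length by linear interpolation
    return np.interp(np.linspace(0, win - 1, n), np.arange(win), sliced)

series = np.sin(np.linspace(0, 6 * np.pi, 128))
augmented = [jitter(series), scale(series), window_slice(series)]
print([a.shape for a in augmented])
```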

Latent Conservative Objective Models for Data-Driven Crystal Structure Prediction

  • paper_url: http://arxiv.org/abs/2310.10056
  • repo_url: None
  • paper_authors: Han Qi, Xinyang Geng, Stefano Rando, Iku Ohama, Aviral Kumar, Sergey Levine
  • for: 这篇论文旨在提出一种数据驱动的晶体结构预测方法,以降低寻找稳定晶体结构的计算成本。
  • methods: 论文提出了名为 LCOMs (latent conservative objective models) 的方法:利用最先进的图扩散自编码器 (CD-VAE) 将晶体结构映射到向量化的潜在搜索空间,再在该表示之上训练并优化一个保守的晶体形成能代理模型。
  • results: 实验表明,LCOMs 在结构预测成功率上与当前最优方法相当,同时大幅降低了计算成本。
    Abstract In computational chemistry, crystal structure prediction (CSP) is an optimization problem that involves discovering the lowest energy stable crystal structure for a given chemical formula. This problem is challenging as it requires discovering globally optimal designs with the lowest energies on complex manifolds. One approach to tackle this problem involves building simulators based on density functional theory (DFT) followed by running search in simulation, but these simulators are painfully slow. In this paper, we study present and study an alternate, data-driven approach to crystal structure prediction: instead of directly searching for the most stable structures in simulation, we train a surrogate model of the crystal formation energy from a database of existing crystal structures, and then optimize this model with respect to the parameters of the crystal structure. This surrogate model is trained to be conservative so as to prevent exploitation of its errors by the optimizer. To handle optimization in the non-Euclidean space of crystal structures, we first utilize a state-of-the-art graph diffusion auto-encoder (CD-VAE) to convert a crystal structure into a vector-based search space and then optimize a conservative surrogate model of the crystal energy, trained on top of this vector representation. We show that our approach, dubbed LCOMs (latent conservative objective models), performs comparably to the best current approaches in terms of success rate of structure prediction, while also drastically reducing computational cost.
    摘要 在计算化学中,晶体结构预测 (CSP) 是一个优化问题:为给定的化学式寻找能量最低的稳定晶体结构。该问题的挑战在于需要在复杂的流形上找到能量最低的全局最优设计。一种思路是基于密度泛函理论 (DFT) 构建模拟器并在模拟中搜索,但这类模拟器非常缓慢。在本文中,我们研究另一种数据驱动的晶体结构预测方法:不直接在模拟中搜索最稳定的结构,而是利用已有晶体结构数据库训练一个晶体形成能的代理模型,然后针对晶体结构的参数优化该模型。该代理模型被训练为保守的,以防止优化器利用其误差。为了处理晶体结构这一非欧空间中的优化,我们首先使用最先进的图扩散自编码器 (CD-VAE) 将晶体结构转换为向量化的搜索空间,然后在该向量表示之上优化一个保守的晶体能量代理模型。结果表明,我们的方法 LCOMs (latent conservative objective models) 在结构预测成功率上与当前最优方法相当,同时大幅降低了计算成本。
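
A schematic of the conservative-surrogate idea in latent space: the surrogate energy model is trained with an extra penalty that raises its predictions on points found by the latent-space optimizer, so the optimizer cannot exploit surrogate errors. The encoder, data, and loss weights are stand-ins; this is not the paper's exact objective.

```python
# Sketch: training a conservative surrogate energy model over latent codes.
import torch
import torch.nn as nn

latent_dim = 16
surrogate = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(surrogate.parameters(), lr=1e-3)

def adversarial_latents(z_init, steps=10, lr=0.1):
    """Latent points the optimizer would move toward under the current surrogate."""
    z = z_init.clone().requires_grad_(True)
    for _ in range(steps):
        energy = surrogate(z).sum()
        (grad,) = torch.autograd.grad(energy, z)
        z = (z - lr * grad).detach().requires_grad_(True)    # descend predicted energy
    return z.detach()

z_data = torch.randn(256, latent_dim)                        # stand-in latent codes of known crystals
e_data = (z_data ** 2).sum(dim=1, keepdim=True)              # stand-in formation energies

for step in range(200):
    idx = torch.randint(0, len(z_data), (64,))
    z_batch, e_batch = z_data[idx], e_data[idx]
    z_adv = adversarial_latents(z_batch)
    pred_loss = ((surrogate(z_batch) - e_batch) ** 2).mean()
    # conservative penalty: raise predicted energy on optimizer-found points,
    # keep it tight on real data points
    conservative = (surrogate(z_batch) - surrogate(z_adv)).mean()
    loss = pred_loss + 0.5 * conservative
    opt.zero_grad(); loss.backward(); opt.step()
```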

Symmetrical SyncMap for Imbalanced General Chunking Problems

  • paper_url: http://arxiv.org/abs/2310.10045
  • repo_url: None
  • paper_authors: Heng Zhang, Danilo Vasconcellos Vargas
  • for: 本研究旨在学习从序列中检索复杂结构,并适应任何结构变化。
  • methods: 本研究仅使用受神经元群体行为启发的非线性动力学方程,而不使用损失函数;在原有 SyncMap 的基础上,通过对称激活对正、负反馈回路施加等量更新,并引入记忆窗口以允许更多的正向更新。
  • results: 我们的算法在 12 个不均衡的 CGCP(含动态变化的情形)上均超过或持平其他无监督的最新基线方法;在更贴近实际的场景中,该方法在 4 个经典结构学习问题中的 3 个上明显优于其他方法,表明对称激活对于揭示时间数据中编码的拓扑结构乃至层级结构具有关键作用。
    Abstract Recently, SyncMap pioneered an approach to learn complex structures from sequences as well as adapt to any changes in underlying structures. This is achieved by using only nonlinear dynamical equations inspired by neuron group behaviors, i.e., without loss functions. Here we propose Symmetrical SyncMap that goes beyond the original work to show how to create dynamical equations and attractor-repeller points which are stable over the long run, even dealing with imbalanced continual general chunking problems (CGCPs). The main idea is to apply equal updates from negative and positive feedback loops by symmetrical activation. We then introduce the concept of memory window to allow for more positive updates. Our algorithm surpasses or ties other unsupervised state-of-the-art baselines in all 12 imbalanced CGCPs with various difficulties, including dynamically changing ones. To verify its performance in real-world scenarios, we conduct experiments on several well-studied structure learning problems. The proposed method surpasses substantially other methods in 3 out of 4 scenarios, suggesting that symmetrical activation plays a critical role in uncovering topological structures and even hierarchies encoded in temporal data.
    摘要 最近,SyncMap 开创了一种从序列中学习复杂结构、并能适应底层结构变化的方法:它只使用受神经元群体行为启发的非线性动力学方程,而不依赖损失函数。在本工作中,我们提出对称 SyncMap (Symmetrical SyncMap),在原有工作的基础上展示如何构造长期稳定的动力学方程与吸引子-排斥子点,从而应对不均衡的持续一般分块问题 (CGCP)。其核心思想是通过对称激活,对负反馈与正反馈回路施加等量的更新;我们还引入记忆窗口的概念,以允许更多的正向更新。我们的算法在 12 个难度各异(包括动态变化)的不均衡 CGCP 上均超过或持平其他无监督的最新基线方法。为了验证其在真实场景中的表现,我们在若干经典的结构学习问题上进行了实验:所提方法在 4 个场景中的 3 个上大幅优于其他方法,表明对称激活在揭示时间数据中编码的拓扑结构乃至层级结构方面起着关键作用。

TpopT: Efficient Trainable Template Optimization on Low-Dimensional Manifolds

  • paper_url: http://arxiv.org/abs/2310.10039
  • repo_url: None
  • paper_authors: Jingkai Yan, Shiyu Wang, Xinyu Rain Wei, Jimmy Wang, Zsuzsanna Márka, Szabolcs Márka, John Wright
  • for: 检测低维度信号家族
  • methods: 使用 TemPlate OPTimization 框架,combined with embedding和kernel interpolation,提高计算效率
  • results: 在 gravitational wave detection 和手写数据上显示了明显的性能改善,并且可以替换现有的 matched filtering 方法
    Abstract In scientific and engineering scenarios, a recurring task is the detection of low-dimensional families of signals or patterns. A classic family of approaches, exemplified by template matching, aims to cover the search space with a dense template bank. While simple and highly interpretable, it suffers from poor computational efficiency due to unfavorable scaling in the signal space dimensionality. In this work, we study TpopT (TemPlate OPTimization) as an alternative scalable framework for detecting low-dimensional families of signals which maintains high interpretability. We provide a theoretical analysis of the convergence of Riemannian gradient descent for TpopT, and prove that it has a superior dimension scaling to covering. We also propose a practical TpopT framework for nonparametric signal sets, which incorporates techniques of embedding and kernel interpolation, and is further configurable into a trainable network architecture by unrolled optimization. The proposed trainable TpopT exhibits significantly improved efficiency-accuracy tradeoffs for gravitational wave detection, where matched filtering is currently a method of choice. We further illustrate the general applicability of this approach with experiments on handwritten digit data.
    摘要 在科学和工程应用中,检测低维度信号家族是一项常复现的任务。经典的方法之一是模板匹配,它在搜索空间使用密集的模板银行,但它的计算效率受到信号空间维度的不利影响。在这项工作中,我们研究TpopT(模板优化)作为一种可扩展的搜索框架,可以快速检测低维度信号家族,同时保持高度可读性。我们提供了TpopT的理论分析,证明它在维度上有更好的缩放性。此外,我们还提出了一种实用的TpopT框架,用于非参数式信号集,该框架包括投影和核函数 interpolate 技术,并可以通过不断的优化来转化为可训练的网络结构。我们的可训练TpopT在探测 gravitational wave 方面表现出了明显的效率-准确性融合优势,现在matched filtering 是选择的方法。此外,我们还通过对手写数据进行实验,证明了这种方法的通用性。

Unraveling Fundamental Properties of Power System Resilience Curves using Unsupervised Machine Learning

  • paper_url: http://arxiv.org/abs/2310.10030
  • repo_url: None
  • paper_authors: Bo Li, Ali Mostafavi
  • for: 这个研究旨在刻画和量化基础设施韧性曲线的基本性质。
  • methods: 研究使用无监督机器学习,分析了与三次极端天气事件相关的 200 余条停电韧性曲线。
  • results: 研究发现了两类主要的电力系统韧性曲线原型:三角形曲线和梯形曲线。三角形曲线由三个要素刻画:功能损失的临界阈值、临界功能恢复速率以及恢复转折点;梯形曲线则由持续功能损失的时长和恒定恢复速率刻画,且持续功能损失的时间越长,恒定恢复速率越慢。这些发现有助于更好地理解和预测电力系统基础设施的韧性表现。
    Abstract The standard model of infrastructure resilience, the resilience triangle, has been the primary way of characterizing and quantifying infrastructure resilience. However, the theoretical model merely provides a one-size-fits-all framework for all infrastructure systems. Most of the existing studies examine the characteristics of infrastructure resilience curves based on analytical models constructed upon simulated system performance. Limited empirical studies hindered our ability to fully understand and predict resilience characteristics in infrastructure systems. To address this gap, this study examined over 200 resilience curves related to power outages in three major extreme weather events. Using unsupervised machine learning, we examined different curve archetypes, as well as the fundamental properties of each resilience curve archetype. The results show two primary archetypes for power system resilience curves, triangular, and trapezoidal curves. Triangular curves characterize resilience behavior based on 1. critical functionality threshold, 2. critical functionality recovery rate, and 3. recovery pivot point. Trapezoidal archetypes explain resilience curves based on 1. duration of sustained function loss and 2. constant recovery rate. The longer the duration of sustained function loss, the slower the constant rate of recovery. The findings of this study provide novel perspectives enabling better understanding and prediction of resilience performance of power system infrastructures.
    摘要 基础设施韧性的标准模型——韧性三角形,一直是刻画和量化基础设施韧性的主要方式。然而,这一理论模型只是为所有基础设施系统提供了一个“一刀切”的通用框架。现有研究大多基于对模拟系统性能构建的解析模型来考察基础设施韧性曲线的特征,实证研究的不足限制了我们充分理解和预测基础设施系统韧性特征的能力。为弥补这一空白,本研究考察了与三次重大极端天气事件中的停电相关的 200 余条韧性曲线。借助无监督机器学习,我们考察了不同的曲线原型及每类原型的基本性质。结果显示电力系统韧性曲线主要有两类原型:三角形曲线和梯形曲线。三角形曲线由以下三个要素刻画:1. 功能损失的临界阈值;2. 临界功能恢复速率;3. 恢复转折点。梯形曲线则由 1. 持续功能损失的时长和 2. 恒定恢复速率来解释;持续功能损失的时间越长,恒定恢复速率越慢。本研究的发现为更好地理解和预测电力系统基础设施的韧性表现提供了新的视角。
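
A small example of the kind of unsupervised analysis described above: each outage curve is summarized by a few shape features and clustered, one simple way triangular vs. trapezoidal archetypes can emerge. The synthetic curves and feature choices are illustrative, not the study's data or pipeline.

```python
# Sketch: clustering synthetic power-outage resilience curves by shape features.
import numpy as np
from sklearn.cluster import KMeans

def curve_features(t, f):
    """f(t) = fraction of customers with power; features capture depth, duration, recovery."""
    loss = 1.0 - f
    drop = loss.max()
    below = t[loss > 0.05]
    duration = below[-1] - below[0] if len(below) else 0.0
    t_min = t[np.argmax(loss)]
    plateau = np.mean(loss > 0.9 * drop)             # fraction of time near maximum loss
    recovery_rate = drop / max(t[-1] - t_min, 1e-6)
    return [drop, duration, plateau, recovery_rate]

rng = np.random.default_rng(0)
t = np.linspace(0, 100, 200)
curves = []
for _ in range(100):
    drop, t0, hold, rec = rng.uniform(0.2, 0.9), 20, rng.uniform(0, 40), rng.uniform(10, 40)
    f = np.ones_like(t)
    f[(t > t0) & (t <= t0 + hold)] = 1 - drop                        # sustained loss (trapezoid)
    ramp = (t > t0 + hold) & (t <= t0 + hold + rec)
    f[ramp] = 1 - drop + drop * (t[ramp] - t0 - hold) / rec          # linear recovery
    curves.append(f)

X = np.array([curve_features(t, f) for f in curves])
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(np.bincount(labels))
```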

Data-Driven Score-Based Models for Generating Stable Structures with Adaptive Crystal Cells

  • paper_url: http://arxiv.org/abs/2310.10695
  • repo_url: https://github.com/findooshka/diffusion-atoms
  • paper_authors: Arsen Sultanov, Jean-Claude Crivello, Tabea Rebafka, Nataliya Sokolovska
  • for: 本研究旨在通过机器学习生成模型,找到新的功能性和稳定性的材料。
  • methods: 该研究将基于退火 Langevin 动力学的得分 (score-based) 概率生成模型适配到晶体生成任务:训练时从数据中学习晶胞的晶格,采样新化学结构时则并行运行两个去噪过程,同时生成晶格与原子位置;并引入符合对称性约束的多重图 (multigraph) 晶体表示。
  • results: 研究人员通过对不同化学系统和晶体群进行比较,表明了他们的模型能够在不需要额外训练的情况下,生成新的候选结构。
    Abstract The discovery of new functional and stable materials is a big challenge due to its complexity. This work aims at the generation of new crystal structures with desired properties, such as chemical stability and specified chemical composition, by using machine learning generative models. Compared to the generation of molecules, crystal structures pose new difficulties arising from the periodic nature of the crystal and from the specific symmetry constraints related to the space group. In this work, score-based probabilistic models based on annealed Langevin dynamics, which have shown excellent performance in various applications, are adapted to the task of crystal generation. The novelty of the presented approach resides in the fact that the lattice of the crystal cell is not fixed. During the training of the model, the lattice is learned from the available data, whereas during the sampling of a new chemical structure, two denoising processes are used in parallel to generate the lattice along the generation of the atomic positions. A multigraph crystal representation is introduced that respects symmetry constraints, yielding computational advantages and a better quality of the sampled structures. We show that our model is capable of generating new candidate structures in any chosen chemical system and crystal group without any additional training. To illustrate the functionality of the proposed method, a comparison of our model to other recent generative models, based on descriptor-based metrics, is provided.
    摘要 由于问题本身的复杂性,发现新的功能性、稳定性材料是一项巨大挑战。本工作旨在利用机器学习生成模型,生成具有期望性质(如化学稳定性和指定化学组成)的新晶体结构。与分子生成相比,晶体结构生成面临新的困难:晶体具有周期性,并且受与空间群相关的特定对称性约束。本工作将基于退火 Langevin 动力学的得分 (score-based) 概率模型——其已在多种应用中表现出色——适配到晶体生成任务。所提方法的新颖之处在于晶胞的晶格不是固定的:模型训练时从数据中学习晶格,而在采样新化学结构时,两个去噪过程并行进行,在生成原子位置的同时生成晶格。我们还引入了符合对称性约束的多重图 (multigraph) 晶体表示,它带来了计算上的优势和更高质量的采样结构。我们展示了该模型无需额外训练即可在任意选定的化学体系和晶体群中生成新的候选结构。为说明所提方法的能力,我们基于描述符度量将该模型与其他近期的生成模型进行了比较。
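
A generic annealed Langevin dynamics sampler of the kind underlying such score-based generators, shown on a toy Gaussian target. The score network, noise schedule, and the paper's parallel denoising of lattice and atomic positions are not reproduced; everything here is a placeholder.

```python
# Sketch: annealed Langevin dynamics sampling with a known toy score function.
import math
import torch

def annealed_langevin(score_fn, shape, sigmas, steps_per_sigma=100, step_scale=2e-5):
    x = torch.randn(shape)                                   # start from pure noise
    for sigma in sigmas:                                     # sigmas ordered high -> low
        alpha = step_scale * (sigma / sigmas[-1]) ** 2       # per-level step size
        for _ in range(steps_per_sigma):
            noise = torch.randn_like(x)
            x = x + 0.5 * alpha * score_fn(x, sigma) + (alpha ** 0.5) * noise
    return x

# Toy score: for a standard Gaussian target perturbed at noise level sigma,
# the score of N(0, (1 + sigma^2) I) is -x / (1 + sigma^2).
def toy_score(x, sigma):
    return -x / (1.0 + sigma ** 2)

sigmas = torch.exp(torch.linspace(math.log(10.0), math.log(0.01), 10))
sample = annealed_langevin(toy_score, shape=(8, 3), sigmas=sigmas)
print(sample.mean().item(), sample.std().item())   # should be roughly 0 and 1
```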

Riemannian Residual Neural Networks

  • paper_url: http://arxiv.org/abs/2310.10013
  • repo_url: None
  • paper_authors: Isay Katsman, Eric Ming Chen, Sidhanth Holalkere, Anna Asch, Aaron Lou, Ser-Nam Lim, Christopher De Sa
  • for: 这项研究旨在将欧几里得空间中的残差神经网络 (ResNet) 以几何上合理的方式推广到一般的黎曼流形,以便对自然科学中常见的流形值数据进行学习。
  • methods: 论文给出了残差网络在一般黎曼流形上的构造;这类推广此前只针对少数特定流形完成。
  • results: 实验表明,与现有的为双曲空间和对称正定矩阵流形设计的流形神经网络相比,黎曼 ResNet 在相关测试指标和训练动态上均表现更好。
    Abstract Recent methods in geometric deep learning have introduced various neural networks to operate over data that lie on Riemannian manifolds. Such networks are often necessary to learn well over graphs with a hierarchical structure or to learn over manifold-valued data encountered in the natural sciences. These networks are often inspired by and directly generalize standard Euclidean neural networks. However, extending Euclidean networks is difficult and has only been done for a select few manifolds. In this work, we examine the residual neural network (ResNet) and show how to extend this construction to general Riemannian manifolds in a geometrically principled manner. Originally introduced to help solve the vanishing gradient problem, ResNets have become ubiquitous in machine learning due to their beneficial learning properties, excellent empirical results, and easy-to-incorporate nature when building varied neural networks. We find that our Riemannian ResNets mirror these desirable properties: when compared to existing manifold neural networks designed to learn over hyperbolic space and the manifold of symmetric positive definite matrices, we outperform both kinds of networks in terms of relevant testing metrics and training dynamics.
    摘要 几何深度学习的最新方法引入了多种在黎曼流形上的数据之上运行的神经网络。这类网络常用于在具有层级结构的图上学习,或用于学习自然科学中出现的流形值数据。它们通常受标准欧几里得神经网络启发并直接对其进行推广;然而,这种推广并不容易,目前仅针对少数几种流形完成。在本工作中,我们研究残差神经网络 (ResNet),并展示如何以几何上合理的方式将其构造推广到一般的黎曼流形。ResNet 最初是为缓解梯度消失问题而提出的,因其良好的学习性质、出色的实验效果以及易于嵌入各种网络结构而被广泛使用。我们发现黎曼 ResNet 同样具备这些优点:与现有的为双曲空间和对称正定矩阵流形设计的流形神经网络相比,我们的网络在相关测试指标和训练动态上均更优。

Ring-A-Bell! How Reliable are Concept Removal Methods for Diffusion Models?

  • paper_url: http://arxiv.org/abs/2310.10012
  • repo_url: None
  • paper_authors: Yu-Lin Tsai, Chia-Yi Hsu, Chulin Xie, Chih-Hsun Lin, Jia-You Chen, Bo Li, Pin-Yu Chen, Chia-Mu Yu, Chun-Ying Huang
  • for: 本研究旨在调查 diffusion models 的安全机制,以确保它们不会生成不适或有害内容。
  • methods: 我们提出了一种新的概念检索算法,可以评估 diffusion models 的安全性。该算法首先提取敏感或不适的概念,然后使用这些概念来自动标识 diffusion models 中可能生成不适内容的提问。
  • results: 我们的研究表明, Ring-A-Bell 可以 manipulate 安全提问 benchmarks,使得原本被视为安全的提问可以逃脱现有的安全机制,并生成不适或有害内容。这表明现有的安全机制并不够,需要进一步改进。
    Abstract Diffusion models for text-to-image (T2I) synthesis, such as Stable Diffusion (SD), have recently demonstrated exceptional capabilities for generating high-quality content. However, this progress has raised several concerns of potential misuse, particularly in creating copyrighted, prohibited, and restricted content, or NSFW (not safe for work) images. While efforts have been made to mitigate such problems, either by implementing a safety filter at the evaluation stage or by fine-tuning models to eliminate undesirable concepts or styles, the effectiveness of these safety measures in dealing with a wide range of prompts remains largely unexplored. In this work, we aim to investigate these safety mechanisms by proposing one novel concept retrieval algorithm for evaluation. We introduce Ring-A-Bell, a model-agnostic red-teaming tool for T2I diffusion models, where the whole evaluation can be prepared in advance without prior knowledge of the target model. Specifically, Ring-A-Bell first performs concept extraction to obtain holistic representations for sensitive and inappropriate concepts. Subsequently, by leveraging the extracted concept, Ring-A-Bell automatically identifies problematic prompts for diffusion models with the corresponding generation of inappropriate content, allowing the user to assess the reliability of deployed safety mechanisms. Finally, we empirically validate our method by testing online services such as Midjourney and various methods of concept removal. Our results show that Ring-A-Bell, by manipulating safe prompting benchmarks, can transform prompts that were originally regarded as safe to evade existing safety mechanisms, thus revealing the defects of the so-called safety mechanisms which could practically lead to the generation of harmful contents.
    摘要 用于文本到图像 (T2I) 合成的扩散模型(如 Stable Diffusion)近来展示出生成高质量内容的卓越能力。然而,这一进展也引发了对潜在滥用的担忧,特别是生成受版权保护、被禁止或受限的内容,以及 NSFW(不适宜工作场合)图像。尽管已有工作尝试缓解此类问题——或在评估阶段加入安全过滤器,或通过微调模型来消除不希望的概念或风格——这些安全措施在面对各种提示词时的有效性仍缺乏系统研究。在本工作中,我们提出一种新的概念检索算法来评估这些安全机制。我们提出 Ring-A-Bell,一个与模型无关的 T2I 扩散模型红队测试工具,整个评估可以在不了解目标模型的情况下提前准备。具体而言,Ring-A-Bell 首先进行概念提取,获得敏感或不当概念的整体表示;随后利用提取到的概念,自动识别会使扩散模型生成不当内容的问题提示词,从而让用户评估已部署安全机制的可靠性。最后,我们通过测试 Midjourney 等在线服务以及多种概念移除方法,对所提方法进行了实证验证。结果表明,Ring-A-Bell 通过操纵安全提示词基准,可以将原本被认为安全的提示词改造为能够绕过现有安全机制的提示词,从而揭示这些所谓安全机制的缺陷,它们在实践中可能导致有害内容的生成。

Implicit regularization via soft ascent-descent

  • paper_url: http://arxiv.org/abs/2310.10006
  • repo_url: https://github.com/feedbackward/bdd-flood
  • paper_authors: Matthew J. Holland, Kosuke Nakatani
  • for: 在尽量减少反复试错的前提下,提升机器学习流程的样本外 (off-sample) 泛化性能。
  • methods: 以 Flooding 中使用的硬阈值“上升-下降”切换装置为出发点,提出一种柔化的、逐点的梯度正则化机制 SoftAD:降低处于阈值边缘的样本点的权重、限制离群点的影响,同时保留上升-下降效应。
  • results: 与 SAM 和 Flooding 相比,SoftAD 能取得相近的分类精度,同时损失泛化差距和模型范数都明显更小。
    Abstract As models grow larger and more complex, achieving better off-sample generalization with minimal trial-and-error is critical to the reliability and economy of machine learning workflows. As a proxy for the well-studied heuristic of seeking "flat" local minima, gradient regularization is a natural avenue, and first-order approximations such as Flooding and sharpness-aware minimization (SAM) have received significant attention, but their performance depends critically on hyperparameters (flood threshold and neighborhood radius, respectively) that are non-trivial to specify in advance. In order to develop a procedure which is more resilient to misspecified hyperparameters, with the hard-threshold "ascent-descent" switching device used in Flooding as motivation, we propose a softened, pointwise mechanism called SoftAD that downweights points on the borderline, limits the effects of outliers, and retains the ascent-descent effect. We contrast formal stationarity guarantees with those for Flooding, and empirically demonstrate how SoftAD can realize classification accuracy competitive with SAM and Flooding while maintaining a much smaller loss generalization gap and model norm. Our empirical tests range from simple binary classification on the plane to image classification using neural networks with millions of parameters; the key trends are observed across all datasets and models studied, and suggest a potential new approach to implicit regularization.
    摘要 随着模型越来越大、越来越复杂,在尽量减少反复试错的前提下取得更好的样本外泛化,对机器学习流程的可靠性与经济性至关重要。作为寻找“平坦”局部极小点这一经典启发式的替代,梯度正则化是一条自然的途径;Flooding 与 sharpness-aware minimization (SAM) 等一阶近似方法受到了广泛关注,但它们的性能严重依赖于事先难以设定的超参数(分别是 flood 阈值与邻域半径)。为了得到一种对超参数设定不当更具韧性的方法,我们以 Flooding 中的硬阈值“上升-下降”切换装置为出发点,提出一种柔化的、逐点的机制 SoftAD:它降低处于阈值边缘的样本点的权重、限制离群点的影响,并保留上升-下降效应。我们将其形式化的稳定点保证与 Flooding 进行对比,并通过实验表明,SoftAD 能取得与 SAM 和 Flooding 相当的分类精度,同时损失泛化差距与模型范数都明显更小。我们的实验涵盖从平面上的简单二分类到使用数百万参数神经网络的图像分类;在所研究的全部数据集和模型上都能观察到一致的趋势,提示这可能是一种新的隐式正则化途径。
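
The loss transformations discussed above, made concrete: Flooding applies a hard ascent-descent switch at the batch level ($|L - b| + b$), while the pointwise soft switch below conveys the spirit of SoftAD with a smooth weighting; the latter is an illustrative stand-in, not the authors' exact formulation.

```python
# Sketch: Flooding loss and a hypothetical softened, pointwise ascent-descent variant.
import torch
import torch.nn.functional as F

def flooding_loss(per_example_loss, flood_level=0.1):
    mean_loss = per_example_loss.mean()
    return (mean_loss - flood_level).abs() + flood_level     # gradient ascent below b

def soft_ascent_descent_loss(per_example_loss, threshold=0.1, temperature=0.05):
    # smooth, pointwise sign: tanh((L_i - b)/T) in (-1, 1) instead of a hard +/-1 switch,
    # so borderline points get small weight and the switch saturates on outliers
    switch = torch.tanh((per_example_loss - threshold) / temperature)
    return (switch * per_example_loss).mean() + threshold

logits = torch.randn(32, 10, requires_grad=True)
targets = torch.randint(0, 10, (32,))
losses = F.cross_entropy(logits, targets, reduction="none")
print(flooding_loss(losses).item(), soft_ascent_descent_loss(losses).item())
```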

Conformal Contextual Robust Optimization

  • paper_url: http://arxiv.org/abs/2310.10003
  • repo_url: None
  • paper_authors: Yash Patel, Sahana Rayan, Ambuj Tewari
  • for: 这个论文是为了解决决策问题,具体来说是使用数据驱动方法来避免对不确定性范围的误差,从而提高决策的优化。
  • methods: 该论文基于条件生成模型,在高维空间上构造信息量高、非凸的 conformal 预测区域,这些区域具有无分布假设的覆盖率保证。
  • results: 研究人员通过在一系列的 simulations-based inference benchmark tasks和基于气象预测的交通路径规划问题来展示 CPO 框架的效果,并提供了semantically meaningful的视觉总结来解释决策的优化。
    Abstract Data-driven approaches to predict-then-optimize decision-making problems seek to mitigate the risk of uncertainty region misspecification in safety-critical settings. Current approaches, however, suffer from considering overly conservative uncertainty regions, often resulting in suboptimal decisionmaking. To this end, we propose Conformal-Predict-Then-Optimize (CPO), a framework for leveraging highly informative, nonconvex conformal prediction regions over high-dimensional spaces based on conditional generative models, which have the desired distribution-free coverage guarantees. Despite guaranteeing robustness, such black-box optimization procedures alone inspire little confidence owing to the lack of explanation of why a particular decision was found to be optimal. We, therefore, augment CPO to additionally provide semantically meaningful visual summaries of the uncertainty regions to give qualitative intuition for the optimal decision. We highlight the CPO framework by demonstrating results on a suite of simulation-based inference benchmark tasks and a vehicle routing task based on probabilistic weather prediction.
    摘要 面向“先预测、再优化”决策问题的数据驱动方法,旨在降低安全关键场景中不确定性区域设定不当所带来的风险。然而,现有方法往往采用过于保守的不确定性区域,常常导致次优的决策。为此,我们提出 Conformal-Predict-Then-Optimize (CPO) 框架:它利用条件生成模型在高维空间上构造信息量高、可能非凸的 conformal 预测区域,并享有无分布假设的覆盖率保证。尽管具有鲁棒性,这类黑箱优化过程本身难以令人信服,因为它无法解释为何某个决策是最优的;因此,我们进一步为 CPO 增加了对不确定性区域的、具有语义含义的可视化摘要,为最优决策提供定性直观解释。我们在一组基于模拟推断的基准任务以及基于概率天气预测的车辆路径规划任务上展示了 CPO 框架的效果。
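
A minimal split conformal calibration step of the kind CPO builds on: a score threshold is calibrated on held-out data to give distribution-free $1-\alpha$ coverage. The conditional generative model, non-convex region construction, and downstream robust optimization are omitted; scores here are plain residuals for illustration.

```python
# Sketch: split conformal prediction with a finite-sample-corrected quantile.
import numpy as np

def conformal_quantile(cal_scores, alpha=0.1):
    n = len(cal_scores)
    # corrected quantile level that yields (1 - alpha) marginal coverage
    level = np.ceil((n + 1) * (1 - alpha)) / n
    return np.quantile(cal_scores, min(level, 1.0), method="higher")

rng = np.random.default_rng(0)
x_cal = rng.normal(size=500)
y_cal = 2.0 * x_cal + rng.normal(scale=0.5, size=500)
predict = lambda x: 2.0 * x                       # stand-in point predictor
scores = np.abs(y_cal - predict(x_cal))           # nonconformity scores on calibration split
q = conformal_quantile(scores, alpha=0.1)

x_test = rng.normal(size=1000)
y_test = 2.0 * x_test + rng.normal(scale=0.5, size=1000)
covered = np.abs(y_test - predict(x_test)) <= q   # region = [prediction - q, prediction + q]
print("target coverage 0.9, empirical:", covered.mean())
```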

Outlier Detection Using Generative Models with Theoretical Performance Guarantees

  • paper_url: http://arxiv.org/abs/2310.09999
  • repo_url: None
  • paper_authors: Jirong Yi, Jingchao Gao, Tianming Wang, Xiaodong Wu, Weiyu Xu
  • for: 这篇论文考虑了模拟器模型中的信号恢复问题,具体来说是在线性测量中受到稀疏异常的情况下恢复原始信号。
  • methods: 我们提出了一种异常检测方法,可以在模拟器模型下恢复原始信号,并且我们提供了有关信号恢复的理论保证。
  • results: 我们的实验结果表明,使用我们的方法可以成功恢复信号,即使在稀疏异常的情况下。我们的方法比传统的lasso和平方$\ell_2$最小化方法更高效。
    Abstract This paper considers the problem of recovering signals modeled by generative models from linear measurements contaminated with sparse outliers. We propose an outlier detection approach for reconstructing the ground-truth signals modeled by generative models under sparse outliers. We establish theoretical recovery guarantees for reconstruction of signals using generative models in the presence of outliers, giving lower bounds on the number of correctable outliers. Our results are applicable to both linear generator neural networks and the nonlinear generator neural networks with an arbitrary number of layers. We propose an iterative alternating direction method of multipliers (ADMM) algorithm for solving the outlier detection problem via $\ell_1$ norm minimization, and a gradient descent algorithm for solving the outlier detection problem via squared $\ell_1$ norm minimization. We conduct extensive experiments using variational auto-encoder and deep convolutional generative adversarial networks, and the experimental results show that the signals can be successfully reconstructed under outliers using our approach. Our approach outperforms the traditional Lasso and $\ell_2$ minimization approach.
    摘要 本文研究在线性测量受到稀疏离群值污染的情形下,恢复由生成模型建模的信号的问题。我们提出了一种离群值检测方法,用于在稀疏离群值存在时重建由生成模型建模的真实信号,并给出了此情形下信号重建的理论保证,包括可纠正离群值数量的下界;结果同时适用于线性生成网络和具有任意层数的非线性生成网络。我们提出了两种算法:一种通过 $\ell_1$ 范数最小化求解离群值检测问题的交替方向乘子法 (ADMM) 迭代算法,以及一种通过平方 $\ell_1$ 范数最小化求解该问题的梯度下降算法。我们使用变分自编码器和深度卷积生成对抗网络进行了大量实验,结果表明即使存在离群值,信号也能被成功重建;所提方法优于传统的 Lasso 和 $\ell_2$ 最小化方法。

  • paper_url: http://arxiv.org/abs/2310.09991
  • repo_url: None
  • paper_authors: Thanh Tung Khuat, Robert Bassett, Ellen Otte, Alistair Grevis-James, Bogdan Gabrys
  • for: 本研究旨在提供一个全面的机器学习(ML)解决方案在生物医药领域的应用现状,包括生物产品设计、监测、控制和优化的过程中的应用。
  • methods: 本研究使用的方法包括机器学习模型的采用,以提高生物医药生产过程中的分析、监测和控制能力。
  • results: 本研究结果表明,机器学习模型在生物医药生产过程中的应用可以提高生产效率、产品质量和生产可靠性等方面的表现。同时,本研究还揭示了生物医药过程数据的复杂性和多维性,以及机器学习模型在生物医药过程中的挑战和限制。
    Abstract While machine learning (ML) has made significant contributions to the biopharmaceutical field, its applications are still in the early stages in terms of providing direct support for quality-by-design based development and manufacturing of biopharmaceuticals, hindering the enormous potential for bioprocesses automation from their development to manufacturing. However, the adoption of ML-based models instead of conventional multivariate data analysis methods is significantly increasing due to the accumulation of large-scale production data. This trend is primarily driven by the real-time monitoring of process variables and quality attributes of biopharmaceutical products through the implementation of advanced process analytical technologies. Given the complexity and multidimensionality of a bioproduct design, bioprocess development, and product manufacturing data, ML-based approaches are increasingly being employed to achieve accurate, flexible, and high-performing predictive models to address the problems of analytics, monitoring, and control within the biopharma field. This paper aims to provide a comprehensive review of the current applications of ML solutions in a bioproduct design, monitoring, control, and optimisation of upstream, downstream, and product formulation processes. Finally, this paper thoroughly discusses the main challenges related to the bioprocesses themselves, process data, and the use of machine learning models in biopharmaceutical process development and manufacturing. Moreover, it offers further insights into the adoption of innovative machine learning methods and novel trends in the development of new digital biopharma solutions.
    摘要 机器学习(ML)在生物医药领域已经做出了重要贡献,但是其应用还处于初期阶段,对生物医药生产的质量设计和生产进行直接支持的应用还尚未发挥出大量潜力。然而,由于生产数据的积累,ML模型的应用正在不断增加,取代传统的多变量数据分析方法。这种趋势主要归功于实时监测生产过程中变量和产品质量特征的实施,以及高级进程分析技术的普及。由于生物产品设计、生产和加工数据的复杂性和多维性,ML方法在解决生物过程数据分析、监测和控制方面提供了高精度、灵活性和高性能的预测模型。本文旨在为读者提供生物产品设计、监测、控制和优化过程中机器学习解决方案的全面审视。此外,本文还详细讨论了生物过程本身、数据和机器学习模型在生物医药过程开发和生产中的主要挑战,以及采用创新的机器学习方法和新趋势在生物医药领域的发展。

Personalization of CTC-based End-to-End Speech Recognition Using Pronunciation-Driven Subword Tokenization

  • paper_url: http://arxiv.org/abs/2310.09988
  • repo_url: None
  • paper_authors: Zhihong Lei, Ernest Pusateri, Shiyi Han, Leo Liu, Mingbin Xu, Tim Ng, Ruchir Travadi, Youyuan Zhang, Mirko Hannemann, Man-Hung Siu, Zhen Huang
  • for: 这个论文旨在提高端到端语音识别系统的个性化性,使其能够更准确地识别个人内容,如联系人姓名。
  • methods: 该论文基于连接主义时间分类的技术,提出了一种生成个人实体唤起的新的子词tokenization方法。此外,该论文还使用了两种已知技术:上下文偏移和词段均衡。
  • results: 根据论文的表述,使用这些技术组合后,个人名实体识别精度与一个竞争性hybrid系统相当。
    Abstract Recent advances in deep learning and automatic speech recognition have improved the accuracy of end-to-end speech recognition systems, but recognition of personal content such as contact names remains a challenge. In this work, we describe our personalization solution for an end-to-end speech recognition system based on connectionist temporal classification. Building on previous work, we present a novel method for generating additional subword tokenizations for personal entities from their pronunciations. We show that using this technique in combination with two established techniques, contextual biasing and wordpiece prior normalization, we are able to achieve personal named entity accuracy on par with a competitive hybrid system.
    摘要 深度学习和自动语音识别的最新进展提高了端到端语音识别系统的准确率,但对联系人姓名等个人内容的识别仍然是一个挑战。在这项工作中,我们描述了针对基于连接时序分类 (CTC) 的端到端语音识别系统的个性化方案。在先前工作的基础上,我们提出了一种新方法,根据个人实体的发音为其生成额外的子词切分。实验表明,将该技术与上下文偏置 (contextual biasing) 和 wordpiece 先验归一化这两种已有技术结合使用,可以使个人命名实体的识别准确率达到与有竞争力的混合系统相当的水平。

eess.IV - 2023-10-16

Overcoming the Rayleigh limit in extremely low SNR

  • paper_url: http://arxiv.org/abs/2310.10633
  • repo_url: None
  • paper_authors: Hyunsoo Choi, Seungman Choi, Peter Menart, Angshuman Deka, Zubin Jacob
  • for: 这个论文的目的是开发一种新的随机亚瑞利成像算法 (Stochastic Sub-Rayleigh Imaging, SSRI),用于在极低信噪比 (SNR) 且处于衍射极限内的条件下对点源进行定位,并估计其位置、亮度和数量。
  • methods: 该算法只需使用常规成像设备,便于在实际应用中部署。
  • results: 在极低 SNR、较大相对亮度比等多种挑战性场景下,SSRI 算法均优于 Richardson-Lucy 去卷积和 CLEAN 等已有算法;在 SNR 低于 1.2 且点源间距小于瑞利极限的实验图像中,SSRI 估计点源数量的成功率为 40%-80%,平均位置误差小于 2.5 像素。
    Abstract Overcoming the diffraction limit and addressing low Signal-to-Noise Ratio (SNR) scenarios have posed significant challenges to optical imaging systems in applications such as medical diagnosis, remote sensing, and astronomical observations. In this study, we introduce a novel Stochastic Sub-Rayleigh Imaging (SSRI) algorithm capable of localizing point sources and estimating their positions, brightness, and number in low SNR conditions and within the diffraction limit. The SSRI algorithm utilizes conventional imaging devices, facilitating practical and adaptable solutions for real-world applications. Through extensive experimentation, we demonstrate that our proposed method outperforms established algorithms, such as Richardson-Lucy deconvolution and CLEAN, in various challenging scenarios, including extremely low SNR conditions and large relative brightness ratios. We achieved between 40% and 80% success rate in estimating the number of point sources in experimental images with SNR less than 1.2 and sub-Rayleigh separations, with mean position errors less than 2.5 pixels. In the same conditions, the Richardson-Lucy and CLEAN algorithms correctly estimated the number of sources between 0% and 10% of the time, with mean position errors greater than 5 pixels. Notably, SSRI consistently performs well even in the sub-Rayleigh region, offering a benchmark for assessing future quantum superresolution techniques. In conclusion, the SSRI algorithm presents a significant advance in overcoming diffraction limitations in optical imaging systems, particularly under low SNR conditions, with potential widespread impact across multiple fields like biomedical microscopy and astronomical imaging.
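
The SSRI algorithm itself is not spelled out in the abstract; for context, the Richardson-Lucy deconvolution baseline it is compared against can be sketched in a few lines of NumPy. This is a minimal illustration of the baseline only, not the authors' code; skimage.restoration.richardson_lucy offers an equivalent reference implementation.

```python
import numpy as np
from scipy.signal import fftconvolve

def richardson_lucy(observed, psf, n_iter=50, eps=1e-12):
    """Classic Richardson-Lucy deconvolution, one of the baselines cited above."""
    estimate = np.full_like(observed, observed.mean(), dtype=float)
    psf_mirror = psf[::-1, ::-1]
    for _ in range(n_iter):
        blurred = fftconvolve(estimate, psf, mode="same")
        ratio = observed / (blurred + eps)              # relative blame for the mismatch
        estimate *= fftconvolve(ratio, psf_mirror, mode="same")
    return estimate
```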

NeuroQuantify – An Image Analysis Software for Detection and Quantification of Neurons and Neurites using Deep Learning

  • paper_url: http://arxiv.org/abs/2310.10978
  • repo_url: https://github.com/StanleyZ0528/neural-image-segmentation
  • paper_authors: Ka My Dang, Yi Jia Zhang, Tianchen Zhang, Chao Wang, Anton Sinner, Piero Coronica, Joyce K. S. Poon
  • for: Studying the development of neuronal networks and obtaining quantitative information about neuron growth.
  • methods: Uses deep learning to automatically segment cells and neurites.
  • results: Segments cells and neurites quickly and efficiently, and provides quantitative measurements of neurite length and orientation.
    Abstract The segmentation of cells and neurites in microscopy images of neuronal networks provides valuable quantitative information about neuron growth and neuronal differentiation, including the number of cells, neurites, neurite length and neurite orientation. This information is essential for assessing the development of neuronal networks in response to extracellular stimuli, which is useful for studying neuronal structures, for example, the study of neurodegenerative diseases and pharmaceuticals. However, automatic and accurate analysis of neuronal structures from phase contrast images has remained challenging. To address this, we have developed NeuroQuantify, an open-source software that uses deep learning to efficiently and quickly segment cells and neurites in phase contrast microscopy images. NeuroQuantify offers several key features: (i) automatic detection of cells and neurites; (ii) post-processing of the images for the quantitative neurite length measurement based on segmentation of phase contrast microscopy images, and (iii) identification of neurite orientations. The user-friendly NeuroQuantify software can be installed and freely downloaded from GitHub https://github.com/StanleyZ0528/neural-image-segmentation.
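
NeuroQuantify's post-processing pipeline is not detailed in the abstract; the sketch below shows one plausible way to derive neurite length and orientation from a predicted binary neurite mask using scikit-image. The pixel-count length estimate and the per-region orientation are illustrative assumptions, not the software's actual implementation.

```python
import numpy as np
from skimage.measure import label, regionprops
from skimage.morphology import skeletonize

def quantify_neurites(neurite_mask, pixel_size_um=1.0):
    """Rough neurite length and orientation estimates from a binary segmentation mask."""
    skeleton = skeletonize(neurite_mask.astype(bool))
    # Crude length estimate: number of skeleton pixels times the pixel size
    # (ignores diagonal steps; a real pipeline would measure along the skeleton graph).
    total_length_um = float(skeleton.sum()) * pixel_size_um
    orientations_deg = [
        np.degrees(region.orientation)      # principal-axis angle of each connected neurite
        for region in regionprops(label(skeleton))
    ]
    return total_length_um, orientations_deg
```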

Impact of Data Synthesis Strategies for the Classification of Craniosynostosis

  • paper_url: http://arxiv.org/abs/2310.10199
  • repo_url: https://github.com/kit-ibt/craniosource-gan-pca-ssm
  • paper_authors: Matthias Schaufelberger, Reinald Peter Kühle, Andreas Wachter, Frederic Weichel, Niclas Hagen, Friedemann Ringwald, Urs Eisenmann, Jürgen Hoffmann, Michael Engel, Christian Freudlsperger, Werner Nahm
  • for: Assessing and classifying craniosynostosis from photogrammetric surface scans.
  • methods: Uses three different synthetic data sources for a convolutional neural network (CNN)-based classification of craniosynostosis: a statistical shape model (SSM), a generative adversarial network (GAN), and image-based principal component analysis. The CNN is trained only on synthetic data but validated and tested on clinical data.
  • results: The combination of the SSM and the GAN achieves an accuracy above 0.96 and an F1-score above 0.95 on the unseen test set, with a difference of less than 0.01 compared to training on clinical data; including a second image modality further improves classification performance.
    Abstract Introduction: Photogrammetric surface scans provide a radiation-free option to assess and classify craniosynostosis. Due to the low prevalence of craniosynostosis and high patient restrictions, clinical data is rare. Synthetic data could support or even replace clinical data for the classification of craniosynostosis, but this has never been studied systematically. Methods: We test the combinations of three different synthetic data sources: a statistical shape model (SSM), a generative adversarial network (GAN), and image-based principal component analysis for a convolutional neural network (CNN)-based classification of craniosynostosis. The CNN is trained only on synthetic data, but validated and tested on clinical data. Results: The combination of a SSM and a GAN achieved an accuracy of more than 0.96 and a F1-score of more than 0.95 on the unseen test set. The difference to training on clinical data was smaller than 0.01. Including a second image modality improved classification performance for all data sources. Conclusion: Without a single clinical training sample, a CNN was able to classify head deformities as accurate as if it was trained on clinical data. Using multiple data sources was key for a good classification based on synthetic data alone. Synthetic data might play an important future role in the assessment of craniosynostosis.
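
As background for one of the three synthesis strategies, a statistical shape model can be built with plain PCA over flattened training shapes and then sampled to generate synthetic head geometries. The sketch below shows the general idea only; the random training shapes and the sampling range are placeholders, not the authors' pipeline.

```python
import numpy as np

def fit_ssm(shapes, n_modes=10):
    """Fit a PCA-based statistical shape model to flattened training shapes of shape (N, D)."""
    mean = shapes.mean(axis=0)
    _, s, vt = np.linalg.svd(shapes - mean, full_matrices=False)
    std = s[:n_modes] / np.sqrt(len(shapes) - 1)      # per-mode standard deviation
    return mean, vt[:n_modes], std

def sample_ssm(mean, modes, std, n_samples=100, scale=2.0, seed=0):
    """Draw synthetic shapes by sampling mode coefficients within +/- scale standard deviations."""
    rng = np.random.default_rng(seed)
    coeffs = rng.uniform(-scale, scale, size=(n_samples, len(std))) * std
    return mean + coeffs @ modes

shapes = np.random.default_rng(0).normal(size=(40, 3 * 500))   # 40 meshes, 500 vertices each
mean, modes, std = fit_ssm(shapes)
print(sample_ssm(mean, modes, std, n_samples=5).shape)          # (5, 1500)
```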

eess.SP - 2023-10-16

Rapid Non-cartesian Reconstruction Using an Implicit Representation of GROG Kernels

  • paper_url: http://arxiv.org/abs/2310.10823
  • repo_url: None
  • paper_authors: Daniel Abraham, Mark Nishimura, Xiaozhi Cao, Congyu Liao, Kawin Setsompop
  • for: Making MR image reconstruction faster and more efficient so that non-Cartesian sampling can be used more widely.
  • methods: Uses the iGROG approach to transform non-Cartesian data into Cartesian data, enabling simpler and faster reconstruction.
  • results: Improves the speed and efficiency of MR imaging and better mitigates motion artifacts.
    Abstract MRI data is acquired in Fourier space. Data acquisition is typically performed on a Cartesian grid in this space to enable the use of a fast Fourier transform algorithm to achieve fast and efficient reconstruction. However, it has been shown that for multiple applications, non-Cartesian data acquisition can improve the performance of MR imaging by providing fast and more efficient data acquisition, and improving motion robustness. Nonetheless, the image reconstruction process of non-Cartesian data is more involved and can be time-consuming, even through the use of efficient algorithms such as non-uniform FFT (NUFFT). This work provides an efficient approach (iGROG) to transform the non-Cartesian data into Cartesian data, to achieve simpler and faster reconstruction which should help enable non-Cartesian data sampling to be performed more widely in MRI.

Constant Modulus Waveform Design with Block-Level Interference Exploitation for DFRC Systems

  • paper_url: http://arxiv.org/abs/2310.10804
  • repo_url: None
  • paper_authors: Byunghyun Lee, Anindya Bijoy Das, David J. Love, Christopher G. Brinton, James V. Krogmeier
  • for: This paper designs constant modulus waveforms for dual-functional radar-communication (DFRC) systems.
  • methods: The paper uses constructive interference-based block-level precoding (CI-BLP) to exploit the distortion caused by multi-user and radar transmission, and proposes a majorization-minimization (MM)-based solution with an improved majorizing function that exploits a novel diagonal matrix structure.
  • results: Rigorous simulations demonstrate the effectiveness of the proposed method and the proposed majorizer.
    Abstract Dual-functional radar-communication (DFRC) is a promising technology where radar and communication functions operate on the same spectrum and hardware. In this paper, we propose an algorithm for designing constant modulus waveforms for DFRC systems. Particularly, we jointly optimize the correlation properties and the spatial beam pattern. For communication, we employ constructive interference-based block-level precoding (CI-BLP) to exploit distortion due to multi-user and radar transmission. We propose a majorization-minimization (MM)-based solution to the formulated problem. To accelerate convergence, we propose an improved majorizing function that leverages a novel diagonal matrix structure. We then evaluate the performance of the proposed algorithm through rigorous simulations. Simulation results demonstrate the effectiveness of the proposed approach and the proposed majorizer.
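
The paper's objective and majorizer are specific to the DFRC waveform design problem and are not reproduced in the abstract; as a generic illustration of the majorization-minimization machinery, the sketch below applies MM, in the form of iteratively reweighted least squares, to an L1 regression problem. The toy problem is an assumption chosen purely to show the surrogate-then-minimize loop.

```python
import numpy as np

def irls_l1(A, b, n_iter=100, eps=1e-8):
    """Minimize ||Ax - b||_1 by majorization-minimization (iteratively reweighted least squares).
    At each iterate, |r| is majorized by r^2 / (2|r_t|) + |r_t| / 2, which touches it at r_t,
    so minimizing the quadratic surrogate monotonically decreases the original objective."""
    x = np.linalg.lstsq(A, b, rcond=None)[0]
    for _ in range(n_iter):
        w = 1.0 / np.maximum(np.abs(A @ x - b), eps)   # weights from the current residuals
        Aw = A * w[:, None]
        x = np.linalg.solve(A.T @ Aw, Aw.T @ b)        # exact minimizer of the surrogate
    return x

rng = np.random.default_rng(0)
A = rng.normal(size=(200, 5))
x_true = np.arange(1.0, 6.0)
b = A @ x_true + rng.laplace(scale=0.1, size=200)
print(np.round(irls_l1(A, b), 2))                      # close to [1. 2. 3. 4. 5.]
```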

Neuromorphic Place Cells

  • paper_url: http://arxiv.org/abs/2310.10790
  • repo_url: None
  • paper_authors: Zhaoqi Chen, Ralph Etienne-Cummings
  • for: A neuromorphic SLAM system that could potentially be implemented more efficiently than its traditional counterpart.
  • methods: Implements mixed-mode spatial encoding neurons, including theta cells, vector cells, and place cells, which together form a biologically plausible network that reproduces the localization functionality of place cells.
  • results: Experiments validate the robustness of the model under variations of the analog circuits; the work provides a foundation for implementing dynamic neuromorphic SLAM systems and offers inspiration for how spatial cells form in biology.
    Abstract A neuromorphic SLAM system shows potential for more efficient implementation than its traditional counterpart. We demonstrate a mixed-mode implementation for spatial encoding neurons including theta cells, vector cells and place cells. Together, they form a biologically plausible network that could reproduce the localization functionality of place cells. Experimental results validate the robustness of our model when suffering from variations of analog circuits. We provide a foundation for implementing dynamic neuromorphic SLAM systems and inspirations for the formation of spatial cells in biology.

Indoor Wireless Signal Modeling with Smooth Surface Diffraction Effects

  • paper_url: http://arxiv.org/abs/2310.10578
  • repo_url: None
  • paper_authors: Ruichen Wang, Samuel Audia, Dinesh Manocha
  • for: Improving the accuracy of indoor electromagnetic field simulations by including surface diffraction effects.
  • methods: Uses Uniform Geometrical Theory of Diffraction (UTD) surface diffraction, adding smooth-surface UTD to the ray tracing simulator together with techniques for efficiently computing the ray paths.
  • results: Improves predicted powers in shadow regions by about 5 dB, captures nuanced field effects beyond shadow boundaries, and runs about 60% faster than WinProp across different indoor scenes.
    Abstract We present a novel algorithm that enhances the accuracy of electromagnetic field simulations in indoor environments by incorporating the Uniform Geometrical Theory of Diffraction (UTD) for surface diffraction. This additional diffraction phenomenology is important for the design of modern wireless systems and allows us to capture the effects of more complex scene geometries. Central to our methodology is the Dynamic Coherence-Based EM Ray Tracing Simulator (DCEM), and we augment that formulation with smooth surface UTD and present techniques to efficiently compute the ray paths. We validate our additions by comparing them to analytical solutions of a sphere, method of moments solutions from FEKO, and ray-traced indoor scenes from WinProp. Our algorithm improves shadow region predicted powers by about 5dB compared to our previous work, and captures nuanced field effects beyond shadow boundaries. We highlight the performance on different indoor scenes and observe 60% faster computation time over WinProp.

Applications of Distributed Machine Learning for the Internet-of-Things: A Comprehensive Survey

  • paper_url: http://arxiv.org/abs/2310.10549
  • repo_url: None
  • paper_authors: Mai Le, Thien Huynh-The, Tan Do-Duy, Thai-Hoc Vu, Won-Joo Hwang, Quoc-Viet Pham
  • for: Improving the quality of services and applications in emerging wireless networks (e.g., beyond 5G and 6G) by using artificial intelligence (AI) in the Internet of Things (IoT).
  • methods: Distributed machine learning (distributed learning) approaches, including federated learning, multi-agent reinforcement learning, and distributed inference.
  • results: Significant benefits for key IoT services and applications, including data sharing and computation offloading, localization, mobile crowdsensing, and security and privacy.
    Abstract The emergence of new services and applications in emerging wireless networks (e.g., beyond 5G and 6G) has shown a growing demand for the usage of artificial intelligence (AI) in the Internet of Things (IoT). However, the proliferation of massive IoT connections and the availability of computing resources distributed across future IoT systems have strongly demanded the development of distributed AI for better IoT services and applications. Therefore, existing AI-enabled IoT systems can be enhanced by implementing distributed machine learning (aka distributed learning) approaches. This work aims to provide a comprehensive survey on distributed learning for IoT services and applications in emerging networks. In particular, we first provide a background of machine learning and present a preliminary to typical distributed learning approaches, such as federated learning, multi-agent reinforcement learning, and distributed inference. Then, we provide an extensive review of distributed learning for critical IoT services (e.g., data sharing and computation offloading, localization, mobile crowdsensing, and security and privacy) and IoT applications (e.g., smart healthcare, smart grid, autonomous vehicle, aerial IoT networks, and smart industry). From the reviewed literature, we also present critical challenges of distributed learning for IoT and propose several promising solutions and research directions in this emerging area.
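
As a concrete instance of one of the surveyed distributed learning approaches, federated averaging can be sketched in a few lines of NumPy: each client runs local updates on its own data and the server aggregates the returned models weighted by client data size. The linear model and hyper-parameters below are toy assumptions, not tied to any particular IoT deployment.

```python
import numpy as np

def local_update(weights, local_data, lr=0.1, epochs=1):
    """One client's local gradient steps on a linear least-squares model (toy stand-in)."""
    X, y = local_data
    w = weights.copy()
    for _ in range(epochs):
        w -= lr * X.T @ (X @ w - y) / len(y)
    return w

def fedavg_round(global_weights, clients):
    """Server round: broadcast the global model, average returned updates weighted by client size."""
    sizes = np.array([len(data[1]) for data in clients], dtype=float)
    updates = np.stack([local_update(global_weights, data) for data in clients])
    return (sizes[:, None] * updates).sum(axis=0) / sizes.sum()

rng = np.random.default_rng(0)
w_true = np.array([2.0, -1.0])
clients = []
for _ in range(5):
    X = rng.normal(size=(50, 2))
    clients.append((X, X @ w_true + 0.01 * rng.normal(size=50)))

w = np.zeros(2)
for _ in range(100):
    w = fedavg_round(w, clients)
print(np.round(w, 2))   # approaches [ 2. -1.]
```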

A Tutorial on Chirp Spread Spectrum for LoRaWAN: Basics and Key Advances

  • paper_url: http://arxiv.org/abs/2310.10503
  • repo_url: None
  • paper_authors: Alireza Maleki, Ha H. Nguyen, Ebrahim Bedeer, Robert Barton
  • for: This work provides a comprehensive tutorial on CSS modulation in LoRaWAN applications, covering signal generation, detection, error performance, and spectral characteristics.
  • methods: It examines and analyzes LoRa's CSS modulation in depth, together with its application in IoT networks.
  • results: The study shows that CSS modulation offers good error performance and spectral characteristics for LoRaWAN, and surveys recent techniques and algorithms for applying CSS modulation in IoT networks.
    Abstract Chirps spread spectrum (CSS) modulation is the heart of long-range (LoRa) modulation used in the context of long-range wide area network (LoRaWAN) in internet of things (IoT) scenarios. Despite being a proprietary technology owned by Semtech Corp., LoRa modulation has drawn much attention from the research and industry communities in recent years. However, to the best of our knowledge, a comprehensive tutorial, investigating the CSS modulation in the LoRaWAN application, is missing in the literature. Therefore, in the first part of this paper, we provide a thorough analysis and tutorial of CSS modulation modified by LoRa specifications, discussing various aspects such as signal generation, detection, error performance, and spectral characteristics. Moreover, a summary of key recent advances in the context of CSS modulation applications in IoT networks is presented in the second part of this paper under four main categories of transceiver configuration and design, data rate improvement, interference modeling, and synchronization algorithms.
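
A minimal discrete-time sketch of CSS modulation and the standard dechirp-and-FFT detector helps make the tutorial's subject concrete. The formulation below is a common textbook model of a LoRa-style chirp, not code from the paper.

```python
import numpy as np

def css_modulate(symbol, sf=7):
    """Discrete-time LoRa-style up-chirp whose starting frequency encodes the symbol value."""
    N = 2 ** sf
    n = np.arange(N)
    return np.exp(1j * 2 * np.pi * (n * n / (2 * N) + symbol * n / N))

def css_demodulate(rx, sf=7):
    """Dechirp with the conjugate base chirp, then pick the FFT peak bin."""
    N = 2 ** sf
    n = np.arange(N)
    dechirped = rx * np.exp(-1j * 2 * np.pi * n * n / (2 * N))
    return int(np.argmax(np.abs(np.fft.fft(dechirped))))

rng = np.random.default_rng(0)
tx = css_modulate(42, sf=7)
rx = tx + 0.5 * (rng.normal(size=tx.size) + 1j * rng.normal(size=tx.size))
print(css_demodulate(rx, sf=7))   # 42; the FFT despreading gain makes this robust at low SNR
```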

Performance Analysis of a Low-Complexity OTFS Integrated Sensing and Communication System

  • paper_url: http://arxiv.org/abs/2310.10476
  • repo_url: None
  • paper_authors: Tommaso Bacchielli, Lorenzo Pucci, Enrico Paolini, Andrea Giorgetti
  • for: The paper proposes a low-complexity estimation approach for an OTFS-based integrated sensing and communication (ISAC) system.
  • methods: We first define four low-dimensional matrices used to compute the channel matrix through simple algebraic manipulations. We then establish an analytical criterion, independent of system parameters, to identify the most informative elements within these derived matrices by exploiting the properties of the Dirichlet kernel. This allows the matrices to be distilled, keeping only the entries essential for detection and yielding an efficient, low-complexity sensing receiver.
  • results: Numerical results show that the proposed approximation technique effectively preserves sensing performance, measured in terms of the root mean square error (RMSE) of the range and velocity estimation, while greatly reducing the computational effort.
    Abstract This work proposes a low-complexity estimation approach for an orthogonal time frequency space (OTFS)-based integrated sensing and communication (ISAC) system. In particular, we first define four low-dimensional matrices used to compute the channel matrix through simple algebraic manipulations. Secondly, we establish an analytical criterion, independent of system parameters, to identify the most informative elements within these derived matrices, leveraging the properties of the Dirichlet kernel. This allows the distilling of such matrices, keeping only those entries that are essential for detection, resulting in an efficient, low-complexity implementation of the sensing receiver. Numerical results, which refer to a vehicular scenario, demonstrate that the proposed approximation technique effectively preserves the sensing performance, evaluated in terms of root mean square error (RMSE) of the range and velocity estimation, while concurrently reducing the computational effort enormously.

Flag Sequence Set Design for Low-Complexity Delay-Doppler Estimation

  • paper_url: http://arxiv.org/abs/2310.10457
  • repo_url: None
  • paper_authors: Lingsheng Meng, Yong Liang Guan, Yao Ge, Zilong Liu
  • for: This paper studies Flag sequences for low-complexity delay-Doppler estimation by exploiting their distinctive peak-curtain ambiguity functions (AFs). Unlike existing Flag sequence designs, which are limited to prime lengths and periodic auto-AFs, our design targets Flag sequence sets of arbitrary lengths with low (nontrivial) periodic/aperiodic auto- and cross-AFs.
  • methods: We first investigate the algebraic design of zone-based Curtain sequence sets of arbitrary lengths. The proposed design yields new Curtain sequence sets with ideal curtain auto-AFs and low/zero cross-AFs within the delay-Doppler zone of interest. Using these Curtain sequence sets, two optimization problems are formulated to minimize the summed customized weighted integrated sidelobe level (SCWISL) of the Flag sequence set, and an accelerated Parallel Partially Majorization-Minimization Algorithm is proposed to jointly optimize the transmit Flag sequences and the matched/mismatched reference sequences stored in the receiver.
  • results: Simulations show that the proposed Flag sequences achieve better SCWISL and customized peak-to-max-sidelobe ratio than existing Flag sequences. Moreover, under the Flag method, their Mean Squared Errors approach the Cramer-Rao Lower Bound and the Sampling Bound at high signal-to-noise power ratios.
    Abstract This paper studies Flag sequences for low-complexity delay-Doppler estimation by exploiting their distinctive peak-curtain ambiguity functions (AFs). Unlike the existing Flag sequence designs that are limited to prime lengths and periodic auto-AFs, we aim to design Flag sequence sets of arbitrary lengths and with low (nontrivial) periodic/aperiodic auto- and cross-AFs. Since every Flag sequence consists of a Curtain sequence and a Peak sequence, we first investigate the algebraic design of zone-based Curtain sequence sets of arbitrary lengths. Our proposed design gives rise to novel Curtain sequence sets with ideal curtain auto-AFs and low/zero cross-AFs within the delay-Doppler zone of interest. Leveraging these Curtain sequence sets, two optimization problems are formulated to minimize the summed customized weighted integrated sidelobe level (SCWISL) of the Flag sequence set. Accelerated Parallel Partially Majorization-Minimization Algorithms are proposed to jointly optimize the transmit Flag sequences and matched/mismatched reference sequences stored in the receiver. Simulations demonstrate that our proposed Flag sequences lead to improved SCWISL and customized peak-to-max-sidelobe ratio compared with the existing Flag sequences. Additionally, our Flag sequences under Flag method exhibit Mean Squared Errors that approach the Cramer-Rao Lower Bound and the Sampling Bound at high signal-to-noise power ratios.
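
The design criteria above revolve around the periodic ambiguity function; the NumPy utility below evaluates it for any unimodular sequence, which may help readers reproduce the kind of AF structure discussed. It is a generic helper, not the proposed design algorithm, and the Zadoff-Chu example is only there to sanity-check the code.

```python
import numpy as np

def periodic_ambiguity(x):
    """Periodic AF: A[tau, nu] = sum_n x[n] * conj(x[(n + tau) mod N]) * exp(j 2 pi nu n / N)."""
    N = len(x)
    A = np.empty((N, N), dtype=complex)
    for tau in range(N):
        prod = x * np.conj(np.roll(x, -tau))   # x[n] * conj(x[(n + tau) mod N])
        A[tau] = N * np.fft.ifft(prod)         # one inverse FFT gives every Doppler bin nu
    return A

# Sanity check with a Zadoff-Chu sequence, which has an ideal zero-Doppler cut.
N, root = 63, 5
n = np.arange(N)
zc = np.exp(-1j * np.pi * root * n * (n + 1) / N)
A = np.abs(periodic_ambiguity(zc))
print(round(A[0, 0], 3), round(A[1, 0], 3))    # ~63.0 at the origin, ~0.0 off-peak at zero Doppler
```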

Soft Demodulator for Symbol-Level Precoding in Coded Multiuser MISO Systems

  • paper_url: http://arxiv.org/abs/2310.10296
  • repo_url: None
  • paper_authors: Yafei Wang, Hongwei Hou, Wenjin Wang, Xinping Yi, Shi Jin
  • for: This paper studies symbol-level precoding (SLP) in channel-coded multiuser multi-input single-output (MISO) systems.
  • methods: It proposes new soft demodulator designs for SLP signals whose distributions deviate from the Gaussian assumption, enabling accurate log-likelihood ratio (LLR) computation.
  • results: Experimental results show that the proposed soft demodulators improve the throughput of existing SLP schemes while reducing communication overhead and computational complexity.
    Abstract In this paper, we consider symbol-level precoding (SLP) in channel-coded multiuser multi-input single-output (MISO) systems. It is observed that the received SLP signals do not always follow Gaussian distribution, rendering the conventional soft demodulation with the Gaussian assumption unsuitable for the coded SLP systems. It, therefore, calls for novel soft demodulator designs for non-Gaussian distributed SLP signals with accurate log-likelihood ratio (LLR) calculation. To this end, we first investigate the non-Gaussian characteristics of both phase-shift keying (PSK) and quadrature amplitude modulation (QAM) received signals with existing SLP schemes and categorize the signals into two distinct types. The first type exhibits an approximate-Gaussian distribution with the outliers extending along the constructive interference region (CIR). In contrast, the second type follows some distribution that significantly deviates from the Gaussian distribution. To obtain accurate LLR, we propose the modified Gaussian soft demodulator and Gaussian mixture model (GMM) soft demodulators to deal with two types of signals respectively. Subsequently, to further reduce the computational complexity and pilot overhead, we put forward a novel neural soft demodulator, named pilot feature extraction network (PFEN), leveraging the transformer mechanism in deep learning. Simulation results show that the proposed soft demodulators dramatically improve the throughput of existing SLPs for both PSK and QAM transmission in coded systems.
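
The paper's demodulators are tailored to the statistics of SLP signals, which the abstract does not fully specify; as a generic illustration of computing bit LLRs under a Gaussian mixture observation model, one could fit one mixture per bit value on pilot symbols and subtract log-likelihoods, as in the toy BPSK sketch below (scikit-learn, all parameters assumed).

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

def make_rx(bits, n):
    """Toy received samples: BPSK symbols pushed into a non-Gaussian (bimodal) cloud."""
    s = 2.0 * bits - 1.0
    return s + rng.choice([-0.4, 0.4], size=n) + 0.2 * rng.normal(size=n)

# Fit one mixture per bit value on pilot observations with known bits.
pilot_bits = rng.integers(0, 2, 2000)
rx_pilots = make_rx(pilot_bits, 2000)
gmm0 = GaussianMixture(n_components=2, random_state=0).fit(rx_pilots[pilot_bits == 0][:, None])
gmm1 = GaussianMixture(n_components=2, random_state=0).fit(rx_pilots[pilot_bits == 1][:, None])

# LLR(y) = log p(y | bit = 1) - log p(y | bit = 0), which would be fed to the channel decoder.
data_bits = rng.integers(0, 2, 1000)
rx = make_rx(data_bits, 1000)
llr = gmm1.score_samples(rx[:, None]) - gmm0.score_samples(rx[:, None])
print(np.mean((llr > 0).astype(int) == data_bits))   # hard-decision accuracy, close to 1.0
```
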

  • paper_url: http://arxiv.org/abs/2310.10276
  • repo_url: None
  • paper_authors: Pavankumar Ganjimala, Subrahmanyam Mula
  • for: Modeling memoryless nonlinear systems.
  • methods: Uses a block-oriented functional link adaptive filter (BO-FLAF), specifically a Hammerstein BO trigonometric FLAF (HBO-TFLAF).
  • results: Compared with the original TFLAF, the HBO-TFLAF requires 47% fewer multiplications for a filter order of 1024, converges faster, and achieves 3-5 dB lower steady-state mean square error (MSE).
    Abstract The high computation complexity of nonlinear adaptive filtering algorithms poses significant challenges at the hardware implementation level. In order to tackle the computational complexity problem, this paper proposes a novel block-oriented functional link adaptive filter (BO-FLAF) to model memoryless nonlinear systems. Through theoretical complexity analysis, we show that the proposed Hammerstein BO trigonometric FLAF (HBO-TFLAF) has 47% lesser multiplications than the original TFLAF for a filter order of 1024. Moreover, the HBO-TFLAF exhibits a faster convergence rate and achieved 3-5 dB lesser steady-state mean square error (MSE) compared to the original TFLAF for a memoryless nonlinear system identification task.

Hierarchical MTC User Activity Detection and Channel Estimation with Unknown Spatial Covariance

  • paper_url: http://arxiv.org/abs/2310.10204
  • repo_url: None
  • paper_authors: Hamza Djelouat, Mikko J. Sillanpää, Markus Leinonen, Markku Juntti
  • for: This paper addresses the joint user identification and channel estimation (JUICE) problem in machine-type communications under a practical spatially correlated channel model with unknown covariance matrices.
  • methods: The authors first leverage the concept of strong priors and propose a hierarchical sparsity-inducing spike-and-slab prior to model the structured sparse activity pattern. They then derive a Bayesian inference scheme that couples the expectation propagation (EP) algorithm with the expectation maximization (EM) framework.
  • results: The authors also reformulate JUICE as a maximum a posteriori (MAP) estimation problem and propose a computationally efficient solution based on the alternating direction method of multipliers (ADMM). Numerical results show significant performance gains and robustness against different assumptions on the users' sparse activity behaviour.
    Abstract This paper addresses the joint user identification and channel estimation (JUICE) problem in machine-type communications under the practical spatially correlated channels model with unknown covariance matrices. Furthermore, we consider an MTC network with hierarchical user activity patterns following an event-triggered traffic mode. Therein the users are distributed over clusters with a structured sporadic activity behaviour that exhibits both cluster-level and intra-cluster sparsity patterns. To solve the JUICE problem, we first leverage the concept of strong priors and propose a hierarchical-sparsity-inducing spike-and-slab prior to model the structured sparse activity pattern. Subsequently, we derive a Bayesian inference scheme by coupling the expectation propagation (EP) algorithm with the expectation maximization (EM) framework. Second, we reformulate the JUICE as a maximum a posteriori (MAP) estimation problem and propose a computationally-efficient solution based on the alternating direction method of multipliers (ADMM). More precisely, we relax the strong spike-and-slab prior with a cluster-sparsity-promoting prior based on the long-sum penalty. We then derive an ADMM algorithm that solves the MAP problem through a sequence of closed-form updates. Numerical results highlight the significant performance significant gains obtained by the proposed algorithms, as well as their robustness against various assumptions on the users sparse activity behaviour.
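
The exact MAP objective with the long-sum penalty is not reproduced in the abstract; as a generic illustration of the ADMM machinery the solution relies on, the sketch below solves a standard LASSO problem with a closed-form x-update, a soft-thresholding z-update, and a dual ascent step. The problem and parameters are placeholders, not the paper's formulation.

```python
import numpy as np

def soft_threshold(v, kappa):
    return np.sign(v) * np.maximum(np.abs(v) - kappa, 0.0)

def admm_lasso(A, b, lam=0.1, rho=1.0, n_iter=200):
    """Solve min_x 0.5 * ||Ax - b||^2 + lam * ||x||_1 via ADMM with the splitting x = z."""
    n = A.shape[1]
    x = np.zeros(n); z = np.zeros(n); u = np.zeros(n)
    L = np.linalg.cholesky(A.T @ A + rho * np.eye(n))   # factor once, reuse every iteration
    Atb = A.T @ b
    for _ in range(n_iter):
        x = np.linalg.solve(L.T, np.linalg.solve(L, Atb + rho * (z - u)))   # x-update
        z = soft_threshold(x + u, lam / rho)             # z-update (prox of lam * ||.||_1)
        u = u + x - z                                     # scaled dual update
    return z

rng = np.random.default_rng(0)
A = rng.normal(size=(100, 30))
x_true = np.zeros(30)
x_true[:3] = [3.0, -2.0, 1.5]
b = A @ x_true + 0.01 * rng.normal(size=100)
print(np.round(admm_lasso(A, b, lam=0.5), 1)[:5])   # sparse estimate close to x_true
```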

cs.CV - 2023-10-15

AP$n$P: A Less-constrained P$n$P Solver for Pose Estimation with Unknown Anisotropic Scaling or Focal Lengths

  • paper_url: http://arxiv.org/abs/2310.09982
  • repo_url: https://github.com/goldoak/APnP
  • paper_authors: Jiaxin Wei, Stefan Leutenegger, Laurent Kneip
  • for: Proposes a new approach to the P$n$P pose estimation problem with relaxed constraints, removing the need for precise 3D coordinates or complete calibration data.
  • methods: Through algebraic manipulations and a novel parametrization, both cases (unknown anisotropic scaling of the 3D coordinates, or two distinct focal lengths) are brought into similar forms and reduced to an identical polynomial problem, which is solved using the Gr\"obner basis approach.
  • results: Experiments on simulated and real data show that AP$n$P provides a more flexible and practical solution to several pose estimation tasks.
    Abstract Perspective-$n$-Point (P$n$P) stands as a fundamental algorithm for pose estimation in various applications. In this paper, we present a new approach to the P$n$P problem with relaxed constraints, eliminating the need for precise 3D coordinates or complete calibration data. We refer to it as AP$n$P due to its ability to handle unknown anisotropic scaling factors of 3D coordinates or alternatively two distinct focal lengths in addition to the conventional rigid pose. Through algebraic manipulations and a novel parametrization, both cases are brought into similar forms that distinguish themselves primarily by the order of a rotation and an anisotropic scaling operation. AP$n$P furthermore brings down both cases to an identical polynomial problem, which is solved using the Gr\"obner basis approach. Experimental results on both simulated and real datasets demonstrate the effectiveness of AP$n$P, providing a more flexible and practical solution to several pose estimation tasks. Code: https://github.com/goldoak/APnP.

Class-Specific Data Augmentation: Bridging the Imbalance in Multiclass Breast Cancer Classification

  • paper_url: http://arxiv.org/abs/2310.09981
  • repo_url: None
  • paper_authors: Kanan Mahammadli, Abdullah Burkan Bereketoglu, Ayse Gul Kabakci
  • for: Improving the accuracy of breast cancer image classification, specifically for the undersampled classes, by employing class-level data augmentation and a transformer-based ViTNet architecture.
  • methods: The paper uses class-level data augmentation on top of structure-preserving stain normalization for hematoxylin and eosin-stained images, together with a transformer-based ViTNet architecture trained via transfer learning for multiclass classification of breast cancer images.
  • results: The approach increases the precision of classification on undersampled classes, which is intended to help lower mortality rates associated with breast cancer; it categorizes breast cancer images as either benign or one of four distinct malignant subtypes with high accuracy.
    Abstract Breast Cancer is the most common cancer among women, which is also visible in men, and accounts for more than 1 in 10 new cancer diagnoses each year. It is also the second most common cause of women who die from cancer. Hence, it necessitates early detection and tailored treatment. Early detection can provide appropriate and patient-based therapeutic schedules. Moreover, early detection can also provide the type of cyst. This paper employs class-level data augmentation, addressing the undersampled classes and raising their detection rate. This approach suggests two key components: class-level data augmentation on structure-preserving stain normalization techniques to hematoxylin and eosin-stained images and transformer-based ViTNet architecture via transfer learning for multiclass classification of breast cancer images. This merger enables categorizing breast cancer images with advanced image processing and deep learning as either benign or as one of four distinct malignant subtypes by focusing on class-level augmentation and catering to unique characteristics of each class with increasing precision of classification on undersampled classes, which leads to lower mortality rates associated with breast cancer. The paper aims to ease the duties of the medical specialist by operating multiclass classification and categorizing the image into benign or one of four different malignant types of breast cancers.

ProteusNeRF: Fast Lightweight NeRF Editing using 3D-Aware Image Context

  • paper_url: http://arxiv.org/abs/2310.09965
  • repo_url: None
  • paper_authors: Binglun Wang, Niladri Shekhar Dutt, Niloy J. Mitra
  • for: This paper proposes a simple yet effective neural network architecture for interactive NeRF editing that is fast, efficient, and has a low memory footprint.
  • methods: The architecture achieves view-consistent image editing through semantic feature distillation and a local 3D-aware image context, and can be incrementally guided through user-friendly image-based edits that are then distilled into fine-tuned NeRFs via geometric and appearance adjustments.
  • results: The authors demonstrate appearance and geometric edits on a variety of examples and report a 10-30x speedup over concurrent work; video results are available at https://proteusnerf.github.io.
    Abstract Neural Radiance Fields (NeRFs) have recently emerged as a popular option for photo-realistic object capture due to their ability to faithfully capture high-fidelity volumetric content even from handheld video input. Although much research has been devoted to efficient optimization leading to real-time training and rendering, options for interactive editing NeRFs remain limited. We present a very simple but effective neural network architecture that is fast and efficient while maintaining a low memory footprint. This architecture can be incrementally guided through user-friendly image-based edits. Our representation allows straightforward object selection via semantic feature distillation at the training stage. More importantly, we propose a local 3D-aware image context to facilitate view-consistent image editing that can then be distilled into fine-tuned NeRFs, via geometric and appearance adjustments. We evaluate our setup on a variety of examples to demonstrate appearance and geometric edits and report 10-30x speedup over concurrent work focusing on text-guided NeRF editing. Video results can be seen on our project webpage at https://proteusnerf.github.io.

Tabletop Transparent Scene Reconstruction via Epipolar-Guided Optical Flow with Monocular Depth Completion Prior

  • paper_url: http://arxiv.org/abs/2310.09956
  • repo_url: None
  • paper_authors: Xiaotong Chen, Zheming Zhou, Zhuo Deng, Omid Ghasemalizadeh, Min Sun, Cheng-Hao Kuo, Arnie Sen
  • for: reconstruction of transparent objects using affordable RGB-D cameras
  • methods: leveraging monocular object segmentation and depth completion networks, Epipolar-guided Optical Flow (EOF)
  • results: significantly improved 3D reconstruction quality compared to baseline methods, paving the way for more adept robotic perception and interaction with transparent objects.
    Abstract Reconstructing transparent objects using affordable RGB-D cameras is a persistent challenge in robotic perception due to inconsistent appearances across views in the RGB domain and inaccurate depth readings in each single-view. We introduce a two-stage pipeline for reconstructing transparent objects tailored for mobile platforms. In the first stage, off-the-shelf monocular object segmentation and depth completion networks are leveraged to predict the depth of transparent objects, furnishing single-view shape prior. Subsequently, we propose Epipolar-guided Optical Flow (EOF) to fuse several single-view shape priors from the first stage to a cross-view consistent 3D reconstruction given camera poses estimated from opaque part of the scene. Our key innovation lies in EOF which employs boundary-sensitive sampling and epipolar-line constraints into optical flow to accurately establish 2D correspondences across multiple views on transparent objects. Quantitative evaluations demonstrate that our pipeline significantly outperforms baseline methods in 3D reconstruction quality, paving the way for more adept robotic perception and interaction with transparent objects.

Evaluating Robustness of Visual Representations for Object Assembly Task Requiring Spatio-Geometrical Reasoning

  • paper_url: http://arxiv.org/abs/2310.09943
  • repo_url: None
  • paper_authors: Chahyon Ku, Carl Winge, Ryan Diaz, Wentao Yuan, Karthik Desingh
  • for: This paper focuses on evaluating and benchmarking the robustness of visual representations for object assembly tasks.
  • methods: We adopt a general visuomotor policy learning framework that uses visual pretraining models as vision encoders.
  • results: Our quantitative analysis shows that existing pretrained models fail to capture the essential visual features required by this task, whereas a visual encoder trained from scratch consistently performs better; we also propose rotation representations and associated loss functions that substantially improve policy learning.
    Abstract This paper primarily focuses on evaluating and benchmarking the robustness of visual representations in the context of object assembly tasks. Specifically, it investigates the alignment and insertion of objects with geometrical extrusions and intrusions, commonly referred to as a peg-in-hole task. The accuracy required to detect and orient the peg and the hole geometry in SE(3) space for successful assembly poses significant challenges. Addressing this, we employ a general framework in visuomotor policy learning that utilizes visual pretraining models as vision encoders. Our study investigates the robustness of this framework when applied to a dual-arm manipulation setup, specifically to the grasp variations. Our quantitative analysis shows that existing pretrained models fail to capture the essential visual features necessary for this task. However, a visual encoder trained from scratch consistently outperforms the frozen pretrained models. Moreover, we discuss rotation representations and associated loss functions that substantially improve policy learning. We present a novel task scenario designed to evaluate the progress in visuomotor policy learning, with a specific focus on improving the robustness of intricate assembly tasks that require both geometrical and spatial reasoning. Videos, additional experiments, dataset, and code are available at https://bit.ly/geometric-peg-in-hole .
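
The abstract mentions rotation representations without naming one; a widely used choice in policy learning is the continuous 6D representation, which is mapped back to a rotation matrix by Gram-Schmidt orthonormalization. The sketch below shows that mapping as an assumption-labeled example, not necessarily the representation used in the paper.

```python
import numpy as np

def rotation_from_6d(rep6d):
    """Map a 6D vector (two stacked 3D columns) to a rotation matrix via Gram-Schmidt.
    This continuous representation avoids the discontinuities of Euler angles and quaternions,
    which is one reason it is popular for regression and policy learning."""
    a1, a2 = rep6d[:3], rep6d[3:]
    b1 = a1 / np.linalg.norm(a1)
    a2 = a2 - np.dot(b1, a2) * b1              # remove the component of a2 along b1
    b2 = a2 / np.linalg.norm(a2)
    b3 = np.cross(b1, b2)
    return np.stack([b1, b2, b3], axis=1)      # columns form an orthonormal, right-handed frame

R = rotation_from_6d(np.array([1.0, 0.1, 0.0, 0.0, 1.0, 0.2]))
print(np.allclose(R.T @ R, np.eye(3)), round(float(np.linalg.det(R)), 3))   # True 1.0
```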

Unsupervised Discovery of Interpretable Directions in h-space of Pre-trained Diffusion Models

  • paper_url: http://arxiv.org/abs/2310.09912
  • repo_url: None
  • paper_authors: Zijian Zhang, Luping Liu, Zhijie Lin, Yichen Zhu, Zhou Zhao
  • for: This work develops an unsupervised, learning-based method for discovering interpretable directions in the h-space of pre-trained diffusion models.
  • methods: The method builds on an existing GAN latent-space technique and consists of a shift control module and a reconstructor; jointly optimizing them lets the model discover interpretable directions in a pre-trained diffusion model, while a discriminator maintains the fidelity of shifted samples to avoid discovering meaningless or destructive directions.
  • results: The method efficiently discovers interpretable directions through the iterative generative process without requiring other complicated procedures; experimental results demonstrate its effectiveness.
    Abstract We propose the first unsupervised and learning-based method to identify interpretable directions in the h-space of pre-trained diffusion models. Our method is derived from an existing technique that operates on the GAN latent space. In a nutshell, we employ a shift control module for pre-trained diffusion models to manipulate a sample into a shifted version of itself, followed by a reconstructor to reproduce both the type and the strength of the manipulation. By jointly optimizing them, the model will spontaneously discover disentangled and interpretable directions. To prevent the discovery of meaningless and destructive directions, we employ a discriminator to maintain the fidelity of shifted sample. Due to the iterative generative process of diffusion models, our training requires a substantial amount of GPU VRAM to store numerous intermediate tensors for back-propagating gradient. To address this issue, we first propose a general VRAM-efficient training algorithm based on gradient checkpointing technique to back-propagate any gradient through the whole generative process, with acceptable occupancy of VRAM and sacrifice of training efficiency. Compared with existing related works on diffusion models, our method inherently identifies global and scalable directions, without necessitating any other complicated procedures. Extensive experiments on various datasets demonstrate the effectiveness of our method.

Zero-Shot Object Goal Visual Navigation With Class-Independent Relationship Network

  • paper_url: http://arxiv.org/abs/2310.09883
  • repo_url: None
  • paper_authors: Xinting Li, Shizhou Zhang, Yue LU, Kerry Dan, Lingyan Ran, Peng Wang, Yanning Zhang
  • for: This paper investigates the zero-shot object goal visual navigation problem.
  • methods: We propose the Class-Independent Relationship Network (CIRN), which combines target detection information with the relative semantic similarity between observed objects and the navigation target and constructs a new state representation based on similarity ranking; this representation contains neither target features nor environment features, effectively decoupling the agent's navigation ability from target features. A graph convolutional network (GCN) is used to learn the relationships between different objects based on their similarities.
  • results: Extensive experiments in the AI2-THOR virtual environment show strong generalization in zero-shot navigation tasks with different targets and environments; further experiments in more challenging cross-target and cross-scene settings validate the robustness and generalization ability of our method.
    Abstract This paper investigates the zero-shot object goal visual navigation problem. In the object goal visual navigation task, the agent needs to locate navigation targets from its egocentric visual input. "Zero-shot" means that the target the agent needs to find is not trained during the training phase. To address the issue of coupling navigation ability with target features during training, we propose the Class-Independent Relationship Network (CIRN). This method combines target detection information with the relative semantic similarity between the target and the navigation target, and constructs a brand new state representation based on similarity ranking, this state representation does not include target feature or environment feature, effectively decoupling the agent's navigation ability from target features. And a Graph Convolutional Network (GCN) is employed to learn the relationships between different objects based on their similarities. During testing, our approach demonstrates strong generalization capabilities, including zero-shot navigation tasks with different targets and environments. Through extensive experiments in the AI2-THOR virtual environment, our method outperforms the current state-of-the-art approaches in the zero-shot object goal visual navigation task. Furthermore, we conducted experiments in more challenging cross-target and cross-scene settings, which further validate the robustness and generalization ability of our method. Our code is available at: https://github.com/SmartAndCleverRobot/ICRA-CIRN.

Top-K Pooling with Patch Contrastive Learning for Weakly-Supervised Semantic Segmentation

  • paper_url: http://arxiv.org/abs/2310.09828
  • repo_url: None
  • paper_authors: Wangyu Wu, Tianhong Dai, Xiaowei Huang, Fei Ma, Jimin Xiao
  • for: Achieving cost-effective weakly supervised semantic segmentation (WSSS).
  • methods: Uses a Vision Transformer (ViT)-based approach without class activation maps (CAM), and introduces a top-K pooling layer and a patch contrastive error (PCE).
  • results: Experimental results show that the method is highly effective and outperforms other state-of-the-art WSSS methods on the PASCAL VOC 2012 dataset.
    Abstract Weakly Supervised Semantic Segmentation (WSSS) using only image-level labels has gained significant attention due to cost-effectiveness. Recently, Vision Transformer (ViT) based methods without class activation map (CAM) have shown greater capability in generating reliable pseudo labels than previous methods using CAM. However, the current ViT-based methods utilize max pooling to select the patch with the highest prediction score to map the patch-level classification to the image-level one, which may affect the quality of pseudo labels due to the inaccurate classification of the patches. In this paper, we introduce a novel ViT-based WSSS method named top-K pooling with patch contrastive learning (TKP-PCL), which employs a top-K pooling layer to alleviate the limitations of previous max pooling selection. A patch contrastive error (PCE) is also proposed to enhance the patch embeddings to further improve the final results. The experimental results show that our approach is very efficient and outperforms other state-of-the-art WSSS methods on the PASCAL VOC 2012 dataset.
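
The exact pooling configuration belongs to the paper; the sketch below only illustrates the general idea of top-K pooling for WSSS, i.e., averaging the K highest patch scores per class instead of taking a single maximum. The tensor shapes and K value are assumptions.

```python
import torch

def topk_pool(patch_logits, k=8):
    """patch_logits: (B, P, C) per-patch class scores from a ViT.
    Averaging the k highest patch scores per class makes the image-level prediction depend on
    several confident patches rather than a single (possibly misclassified) one."""
    k = min(k, patch_logits.size(1))
    topk_scores, _ = patch_logits.topk(k=k, dim=1)
    return topk_scores.mean(dim=1)              # (B, C) image-level logits

patch_logits = torch.randn(2, 196, 20)          # 2 images, 14 x 14 patches, 20 classes
print(topk_pool(patch_logits).shape)            # torch.Size([2, 20])
```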

Turn Passive to Active: A Survey on Active Intellectual Property Protection of Deep Learning Models

  • paper_url: http://arxiv.org/abs/2310.09822
  • repo_url: None
  • paper_authors: Mingfu Xue, Leo Yu Zhang, Yushu Zhang, Weiqiang Liu
  • for: This paper introduces and discusses intellectual property protection methods for deep learning (DL) models, focusing on active copyright protection techniques.
  • methods: The paper is a literature review: it surveys existing intellectual property protection methods and articulates the needs and challenges of the new active copyright protection methods.
  • results: The paper systematically describes the connotation, attributes, and requirements of active DNN copyright protection, provides evaluation methods and metrics, reviews and analyzes existing work, discusses potential attacks, and outlines challenges and future directions.
    Abstract The intellectual property protection of deep learning (DL) models has attracted increasing serious concerns. Many works on intellectual property protection for Deep Neural Networks (DNN) models have been proposed. The vast majority of existing work uses DNN watermarking to verify the ownership of the model after piracy occurs, which is referred to as passive verification. On the contrary, we focus on a new type of intellectual property protection method named active copyright protection, which refers to active authorization control and user identity management of the DNN model. As of now, there is relatively limited research in the field of active DNN copyright protection. In this review, we attempt to clearly elaborate on the connotation, attributes, and requirements of active DNN copyright protection, provide evaluation methods and metrics for active copyright protection, review and analyze existing work on active DL model intellectual property protection, discuss potential attacks that active DL model copyright protection techniques may face, and provide challenges and future directions for active DL model intellectual property protection. This review is helpful to systematically introduce the new field of active DNN copyright protection and provide reference and foundation for subsequent work.

LICO: Explainable Models with Language-Image Consistency

  • paper_url: http://arxiv.org/abs/2310.09821
  • repo_url: https://github.com/ymleifdu/lico
  • paper_authors: Yiming Lei, Zilong Li, Yangyang Li, Junping Zhang, Hongming Shan
  • for: The goal of this paper is to explain the decision process of deep learning models for image classification.
  • methods: The paper proposes LICO, a language-image consistency method for explainable image classification that correlates learnable linguistic prompts with the corresponding visual features to produce more explainable attention maps.
  • results: Experiments show that LICO can be combined with existing interpretation methods and improves the explainability of image classification models without adding computational overhead during inference.
    Abstract Interpreting the decisions of deep learning models has been actively studied since the explosion of deep neural networks. One of the most convincing interpretation approaches is salience-based visual interpretation, such as Grad-CAM, where the generation of attention maps depends merely on categorical labels. Although existing interpretation methods can provide explainable decision clues, they often yield partial correspondence between image and saliency maps due to the limited discriminative information from one-hot labels. This paper develops a Language-Image COnsistency model for explainable image classification, termed LICO, by correlating learnable linguistic prompts with corresponding visual features in a coarse-to-fine manner. Specifically, we first establish a coarse global manifold structure alignment by minimizing the distance between the distributions of image and language features. We then achieve fine-grained saliency maps by applying optimal transport (OT) theory to assign local feature maps with class-specific prompts. Extensive experimental results on eight benchmark datasets demonstrate that the proposed LICO achieves a significant improvement in generating more explainable attention maps in conjunction with existing interpretation methods such as Grad-CAM. Remarkably, LICO improves the classification performance of existing models without introducing any computational overhead during inference. Source code is made available at https://github.com/ymLeiFDU/LICO.

OAAFormer: Robust and Efficient Point Cloud Registration Through Overlapping-Aware Attention in Transformer

  • paper_url: http://arxiv.org/abs/2310.09817
  • repo_url: None
  • paper_authors: Junjie Gao, Qiujie Dong, Ruian Wang, Shuangmin Chen, Shiqing Xin, Changhe Tu, Wenping Wang
  • for: The paper focuses on improving correspondence quality in point cloud registration using a coarse-to-fine feature matching paradigm.
  • methods: The proposed method, called OAAFormer, introduces a soft matching mechanism, an overlapping region detection module, and a region-wise attention module to enhance correspondence quality.
  • results: The proposed method achieves a substantial increase of about 7% in the inlier ratio and an enhancement of 2-4% in registration recall on the challenging 3DLoMatch benchmark.
    Abstract In the domain of point cloud registration, the coarse-to-fine feature matching paradigm has received substantial attention owing to its impressive performance. This paradigm involves a two-step process: first, the extraction of multi-level features, and subsequently, the propagation of correspondences from coarse to fine levels. Nonetheless, this paradigm exhibits two notable limitations. Firstly, the utilization of the Dual Softmax operation has the potential to promote one-to-one correspondences between superpoints, inadvertently excluding valuable correspondences. This propensity arises from the fact that a source superpoint typically maintains associations with multiple target superpoints. Secondly, it is imperative to closely examine the overlapping areas between point clouds, as only correspondences within these regions decisively determine the actual transformation. Based on these considerations, we propose {\em OAAFormer} to enhance correspondence quality. On one hand, we introduce a soft matching mechanism, facilitating the propagation of potentially valuable correspondences from coarse to fine levels. Additionally, we integrate an overlapping region detection module to minimize mismatches to the greatest extent possible. Furthermore, we introduce a region-wise attention module with linear complexity during the fine-level matching phase, designed to enhance the discriminative capabilities of the extracted features. Tests on the challenging 3DLoMatch benchmark demonstrate that our approach leads to a substantial increase of about 7\% in the inlier ratio, as well as an enhancement of 2-4\% in registration recall.
    摘要 在点云注册领域,粗到细的特征匹配方法受到了广泛关注,这种方法包括两个步骤:首先提取多级特征,然后将匹配从粗层传递到细层。然而,这种方法存在两个显著的限制。首先,使用Dual Softmax操作可能会促进超点之间的一对一匹配,从而无意中排除有价值的匹配。这种倾向来自于源超点通常与多个目标超点保持关联。其次,需要仔细检查点云之间的重叠区域,只有这些区域中的匹配才能决定实际变换。基于这些考虑,我们提出了{\em OAAFormer}来提高匹配质量。一方面,我们引入了软匹配机制,以便将潜在有价值的匹配从粗层传递到细层。另一方面,我们集成了重叠区域检测模块,以最大限度避免错误匹配。此外,我们引入了线性复杂度的区域级注意力模块,用于在细粒度匹配阶段提高提取特征的判别能力。在3DLoMatch benchmark上的测试表明,我们的方法可以将内点率提高约7%,同时将注册召回率提高约2-4%。
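
To make the limitation of one-to-one matching concrete, the contrast between Dual Softmax and a soft, one-to-many matching step can be illustrated on a toy similarity matrix (a generic sketch with made-up values and a hypothetical threshold, not OAAFormer's actual matching module):

```python
import torch

torch.manual_seed(0)
sim = torch.randn(5, 7)  # similarity between 5 source and 7 target superpoints

# Dual Softmax: normalizing over both rows and columns sharpens the matrix toward
# one-to-one matches, which can drop a source superpoint's secondary partners.
dual_softmax = torch.softmax(sim, dim=0) * torch.softmax(sim, dim=1)
hard_matches = dual_softmax.argmax(dim=1)   # at most one target kept per source

# Soft matching: keep every target whose (row-normalized) score clears a threshold,
# so potentially valuable one-to-many correspondences survive to the fine level.
row_probs = torch.softmax(sim, dim=1)
keep = row_probs > 0.15                     # hypothetical threshold
soft_matches = [torch.nonzero(k).flatten().tolist() for k in keep]

print("hard one-to-one matches:", hard_matches.tolist())
print("soft one-to-many matches:", soft_matches)
```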

Can LSH (Locality-Sensitive Hashing) Be Replaced by Neural Network?

  • paper_url: http://arxiv.org/abs/2310.09806
  • repo_url: None
  • paper_authors: Renyang Liu, Jun Zhao, Xing Chu, Yu Liang, Wei Zhou, Jing He
  • for: 提高信息搜索性能
  • methods: 使用深度神经网络学习locality-sensitive hashing
  • results: 提高查询精度、减少时间和内存消耗
    Abstract With the rapid development of GPU (Graphics Processing Unit) technologies and neural networks, we can explore more appropriate data structures and algorithms. Recent progress shows that neural networks can partly replace traditional data structures. In this paper, we proposed a novel DNN (Deep Neural Network)-based learned locality-sensitive hashing, called LLSH, to efficiently and flexibly map high-dimensional data to low-dimensional space. LLSH replaces the traditional LSH (Locality-sensitive Hashing) function families with parallel multi-layer neural networks, which reduces the time and memory consumption and guarantees query accuracy simultaneously. The proposed LLSH demonstrate the feasibility of replacing the hash index with learning-based neural networks and open a new door for developers to design and configure data organization more accurately to improve information-searching performance. Extensive experiments on different types of datasets show the superiority of the proposed method in query accuracy, time consumption, and memory usage.
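
The general idea of replacing hand-crafted LSH function families with a learned mapping can be sketched as a small network whose thresholded outputs serve as hash bits (a toy illustration under assumed dimensions and a simplified locality-preserving loss, not the LLSH architecture from the paper):

```python
import torch
import torch.nn as nn

class LearnedHash(nn.Module):
    """A small MLP whose thresholded outputs act as binary hash bits."""
    def __init__(self, dim_in: int, n_bits: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim_in, 64), nn.ReLU(), nn.Linear(64, n_bits))

    def forward(self, x):           # real-valued scores used during training
        return torch.tanh(self.net(x))

    def codes(self, x):             # binary codes used for bucketing at query time
        return (self.net(x) > 0).to(torch.uint8)

# Toy training: pull codes of neighboring points together and push others apart,
# a simplified locality-preserving objective (not the paper's loss).
hasher = LearnedHash(32, 16)
opt = torch.optim.Adam(hasher.parameters(), lr=1e-3)
x = torch.randn(256, 32)
pos = x + 0.05 * torch.randn_like(x)        # perturbed copies = "near" neighbors
neg = x[torch.randperm(256)]                # random points = "far" items
for _ in range(200):
    a, p, n = hasher(x), hasher(pos), hasher(neg)
    loss = (a - p).pow(2).mean() - 0.1 * (a - n).pow(2).mean()
    opt.zero_grad(); loss.backward(); opt.step()

# Query: compare Hamming distance between learned codes.
q_code, db_codes = hasher.codes(x[:1]), hasher.codes(x)
hamming = (q_code ^ db_codes).sum(dim=1)
print("nearest bucket indices:", hamming.topk(5, largest=False).indices.tolist())
```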

Model Inversion Attacks on Homogeneous and Heterogeneous Graph Neural Networks

  • paper_url: http://arxiv.org/abs/2310.09800
  • repo_url: None
  • paper_authors: Renyang Liu, Wei Zhou, Jinhong Zhang, Xiaoyuan Liu, Peiyuan Si, Haoran Li
  • for: 这个研究旨在提出一种新的模型反向攻击方法,以对Homogeneous Graph Neural Networks (HomoGNNs)和Heterogeneous Graph Neural Networks (HeteGNNs)进行模型反向攻击。
  • methods: 该方法是基于Gradient Descent的优化方法,目的是重建目标GNN的敏感训练图,从而实现模型反向攻击。
  • results: 实验结果显示,所提方法可以在多个 benchmark 上取得更好的性能,并且这是首次在HeteGNNs上进行模型反向攻击的尝试。
    Abstract Recently, Graph Neural Networks (GNNs), including Homogeneous Graph Neural Networks (HomoGNNs) and Heterogeneous Graph Neural Networks (HeteGNNs), have made remarkable progress in many physical scenarios, especially in communication applications. Despite achieving great success, the privacy issue of such models has also received considerable attention. Previous studies have shown that given a well-fitted target GNN, the attacker can reconstruct the sensitive training graph of this model via model inversion attacks, leading to significant privacy worries for the AI service provider. We advocate that the vulnerability comes from the target GNN itself and the prior knowledge about the shared properties in real-world graphs. Inspired by this, we propose a novel model inversion attack method on HomoGNNs and HeteGNNs, namely HomoGMI and HeteGMI. Specifically, HomoGMI and HeteGMI are gradient-descent-based optimization methods that aim to maximize the cross-entropy loss on the target GNN and the $1^{st}$ and $2^{nd}$-order proximities on the reconstructed graph. Notably, to the best of our knowledge, HeteGMI is the first attempt to perform model inversion attacks on HeteGNNs. Extensive experiments on multiple benchmarks demonstrate that the proposed method can achieve better performance than the competitors.
    摘要 最近,图神经网络(GNNs),包括同质图神经网络(HomoGNNs)和异质图神经网络(HeteGNNs),在许多物理场景中取得了很大的进步,特别是在通信应用场景中。尽管取得了很大的成功,但其隐私问题也得到了广泛的关注。先前的研究表明,给定一个拟合良好的目标GNN,攻击者可以通过模型反向攻击来重建其敏感的训练图,从而给AI服务提供商带来严重的隐私担忧。我们认为,这种脆弱性来自于目标GNN自身以及关于真实图中共享特性的先验知识。受此启发,我们提出了一种针对HomoGNNs和HeteGNNs的新型模型反向攻击方法,称为HomoGMI和HeteGMI。具体来说,HomoGMI和HeteGMI都是基于梯度下降的优化方法,旨在最大化目标GNN上的交叉熵损失以及重建图上的一阶和二阶邻近性。需要注意的是,据我们所知,HeteGMI是首次对HeteGNNs进行模型反向攻击的尝试。我们在多个标准 benchmark 上进行了广泛的实验,结果表明,我们的方法可以取得比竞争方法更好的性能。
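
A generic gradient-based graph reconstruction loop, in the spirit of model inversion on a homogeneous GNN, might look like the following sketch; the two-layer GCN, random weights, node features, and smoothness regularizer are all toy stand-ins, and the actual HomoGMI/HeteGMI objectives and proximity terms differ:

```python
import torch

def gcn_forward(A_soft, X, W1, W2):
    """Frozen 2-layer GCN standing in for the target model."""
    A_hat = A_soft + torch.eye(A_soft.size(0))        # add self-loops
    deg = A_hat.sum(1)
    D_inv_sqrt = torch.diag(deg.pow(-0.5))
    A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt          # symmetric normalization
    H = torch.relu(A_norm @ X @ W1)
    return A_norm @ H @ W2                            # node logits

n, d, h, c = 20, 8, 16, 3
X = torch.randn(n, d)                                 # node features (assumed known)
W1, W2 = torch.randn(d, h), torch.randn(h, c)         # stand-ins for the target GNN weights
y = torch.randint(0, c, (n,))                         # labels observed from the target model

# Learnable edge logits; sigmoid gives a relaxed adjacency in [0, 1].
edge_logits = torch.zeros(n, n, requires_grad=True)
opt = torch.optim.Adam([edge_logits], lr=0.1)

for step in range(200):
    A_soft = torch.sigmoid(edge_logits)
    A_soft = (A_soft + A_soft.t()) / 2                # keep the graph undirected
    logits = gcn_forward(A_soft, X, W1, W2)
    ce = torch.nn.functional.cross_entropy(logits, y)
    # first-order proximity surrogate: connected nodes should have similar features
    smooth = (A_soft * torch.cdist(X, X).pow(2)).mean()
    loss = ce + 0.01 * smooth
    opt.zero_grad(); loss.backward(); opt.step()

A_rec = (torch.sigmoid(edge_logits) > 0.5).float()    # thresholded reconstructed graph
print("reconstructed edges:", int(A_rec.sum().item()))
```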

AFLOW: Developing Adversarial Examples under Extremely Noise-limited Settings

  • paper_url: http://arxiv.org/abs/2310.09795
  • repo_url: None
  • paper_authors: Renyang Liu, Jinhong Zhang, Haoran Li, Jin Zhang, Yuanyu Wang, Wei Zhou
  • for: 本研究旨在提出一种隐蔽的对抗例生成方法,以揭示深度神经网络(DNNs)的漏洞,并帮助提高对抗例的耐久性。
  • methods: 本研究提出了一种基于Normalize Flow的端到端攻击框架,称为AFLOW,以直接干扰隐藏表示的图像来生成恶意对抗例。与先前的方法不同,AFLOW不添加噪音,而是直接对图像的隐藏表示进行修改。
  • results: 对三个标准数据集进行了广泛的实验,结果表明,AFLOW可以生成更隐蔽、更高质量的对抗例,并在一些耐久性较高的模型上仍然达到更高的攻击成功率。
    Abstract Extensive studies have demonstrated that deep neural networks (DNNs) are vulnerable to adversarial attacks. Despite the significant progress in the attack success rate that has been made recently, the adversarial noise generated by most of the existing attack methods is still too conspicuous to the human eyes and proved to be easily detected by defense mechanisms. Resulting that these malicious examples cannot contribute to exploring the vulnerabilities of existing DNNs sufficiently. Thus, to better reveal the defects of DNNs and further help enhance their robustness under noise-limited situations, a new inconspicuous adversarial examples generation method is exactly needed to be proposed. To bridge this gap, we propose a novel Normalize Flow-based end-to-end attack framework, called AFLOW, to synthesize imperceptible adversarial examples under strict constraints. Specifically, rather than the noise-adding manner, AFLOW directly perturbs the hidden representation of the corresponding image to craft the desired adversarial examples. Compared with existing methods, extensive experiments on three benchmark datasets show that the adversarial examples built by AFLOW exhibit superiority in imperceptibility, image quality and attack capability. Even on robust models, AFLOW can still achieve higher attack results than previous methods.
    摘要 广泛的研究表明,深度神经网络(DNNs)容易受到对抗攻击。尽管最近在攻击成功率方面取得了很大进步,但现有攻击方法生成的对抗噪声仍然过于醒目,容易被防御机制检测。这意味着这些恶意示例无法充分探索现有DNNs的漏洞,也无法在噪声受限的情况下帮助增强其鲁棒性。因此,为了更好地揭示DNNs的缺陷并进一步提高其鲁棒性,需要提出一种更隐蔽的对抗样本生成方法。为了填补这个空白,我们提出了一种基于Normalizing Flow的端到端攻击框架,称为AFLOW,可以在严格的约束下生成难以察觉的对抗样本。具体来说,AFLOW不采用添加噪声的方式,而是直接扰动图像的隐藏表示来构造所需的对抗样本。在三个 benchmark 数据集上的广泛实验表明,由AFLOW生成的对抗样本在不可察觉性、图像质量和攻击能力方面均更优。即使面对鲁棒模型,AFLOW仍然可以达到比先前方法更高的攻击成功率。

Automated Detection of Cat Facial Landmarks

  • paper_url: http://arxiv.org/abs/2310.09793
  • repo_url: None
  • paper_authors: George Martvel, Ilan Shimshoni, Anna Zamansky
  • for: 该论文主要用于提供一个高质量、全面的猫脸表情数据集,以及一种基于猫脸部分描述的面部坐标检测模型。
  • methods: 该论文使用了一种基于 convolutional neural network 的面部坐标检测模型,其中包括一种使用缩放ensemble方法的面部坐标检测模型,可以在猫脸上显示出极高的性能。
  • results: 该论文通过使用猫脸数据集和面部坐标检测模型,实现了人类和猫脸的面部坐标检测 task 的混合性能,并且可以在猫脸上显示出极高的准确率。
    Abstract The field of animal affective computing is rapidly emerging, and analysis of facial expressions is a crucial aspect. One of the most significant challenges that researchers in the field currently face is the scarcity of high-quality, comprehensive datasets that allow the development of models for facial expressions analysis. One of the possible approaches is the utilisation of facial landmarks, which has been shown for humans and animals. In this paper we present a novel dataset of cat facial images annotated with bounding boxes and 48 facial landmarks grounded in cat facial anatomy. We also introduce a landmark detection convolution neural network-based model which uses a magnifying ensembe method. Our model shows excellent performance on cat faces and is generalizable to human facial landmark detection.
    摘要 动物情感计算领域正在快速发展,其中面部表情分析是一个关键环节。研究人员目前面临的一个主要挑战是缺乏高质量、全面的数据集,难以发展面部表情分析模型。一种可行的途径是利用面部特征点,这一方法已在人类和动物上得到验证。本文提出了一个新的猫脸图像数据集,其中标注了边界框以及48个基于猫面部解剖结构的面部特征点。我们还提出了一种基于卷积神经网络、采用放大集成方法的特征点检测模型。该模型在猫脸上表现出色,并且可以推广到人类面部特征点检测。

SCME: A Self-Contrastive Method for Data-free and Query-Limited Model Extraction Attack

  • paper_url: http://arxiv.org/abs/2310.09792
  • repo_url: None
  • paper_authors: Renyang Liu, Jinhong Zhang, Kwok-Yan Lam, Jun Zhao, Wei Zhou
  • for: 本研究旨在提高模型提取攻击的效果,尤其是在有限查询(query-limited)的情况下。
  • methods: 提出了一种新的数据自由(data-free)模型提取方法(SCME),通过同时考虑类内多样性和类间多样性来生成多样化的假数据,并通过Mixup操作进一步增强对目标模型决策边界的探索能力。
  • results: 在多种攻击场景下,SCME方法在有限查询情况下平均提升11.43%,特别是对于非定向攻击,SCME超过了当前最佳方法。
    Abstract Previous studies have revealed that artificial intelligence (AI) systems are vulnerable to adversarial attacks. Among them, model extraction attacks fool the target model by generating adversarial examples on a substitute model. The core of such an attack is training a substitute model as similar to the target model as possible, where the simulation process can be categorized in a data-dependent and data-free manner. Compared with the data-dependent method, the data-free one has been proven to be more practical in the real world since it trains the substitute model with synthesized data. However, the distribution of these fake data lacks diversity and cannot detect the decision boundary of the target model well, resulting in the dissatisfactory simulation effect. Besides, these data-free techniques need a vast number of queries to train the substitute model, increasing the time and computing consumption and the risk of exposure. To solve the aforementioned problems, in this paper, we propose a novel data-free model extraction method named SCME (Self-Contrastive Model Extraction), which considers both the inter- and intra-class diversity in synthesizing fake data. In addition, SCME introduces the Mixup operation to augment the fake data, which can explore the target model's decision boundary effectively and improve the simulating capacity. Extensive experiments show that the proposed method can yield diversified fake data. Moreover, our method has shown superiority in many different attack settings under the query-limited scenario, especially for untargeted attacks, the SCME outperforms SOTA methods by 11.43\% on average for five baseline datasets.
    摘要 To address these issues, we propose a novel data-free model extraction method called SCME (Self-Contrastive Model Extraction). Our method considers both inter- and intra-class diversity when synthesizing fake data, and introduces the Mixup operation to augment the fake data, allowing us to effectively explore the target model's decision boundary and improve the simulating capacity. Extensive experiments show that the proposed method can generate diversified fake data, and our method has shown superiority in many different attack settings under the query-limited scenario, especially for untargeted attacks. On average, our method outperforms state-of-the-art methods by 11.43% for five baseline datasets.
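
The Mixup operation mentioned above is standard; applied to synthesized query data it is just a Beta-weighted blend of sample pairs (minimal sketch; the data generator, victim-model querying, and SCME's diversity losses are omitted):

```python
import numpy as np

def mixup(x1, x2, alpha: float = 1.0):
    """Classic Mixup: convex combination of two inputs with a Beta-sampled weight."""
    lam = np.random.beta(alpha, alpha)
    return lam * x1 + (1.0 - lam) * x2, lam

# Toy usage: blend pairs of synthesized images before querying the victim model,
# pushing samples toward decision boundaries between the blended classes.
rng = np.random.default_rng(0)
fake_batch = rng.standard_normal((8, 3, 32, 32))
perm = rng.permutation(len(fake_batch))
mixed, lam = mixup(fake_batch, fake_batch[perm])
print(mixed.shape, round(float(lam), 3))
```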

CBARF: Cascaded Bundle-Adjusting Neural Radiance Fields from Imperfect Camera Poses

  • paper_url: http://arxiv.org/abs/2310.09776
  • repo_url: None
  • paper_authors: Hongyu Fu, Xin Yu, Lincheng Li, Li Zhang
  • for: 本文提出了一种新的3D重建框架,用于同时优化摄像头姿势,以提高novel view的synthesizing质量。
  • methods: 该框架采用了分层bundle-adjustment(BA)模块,通过粗化-细化的方式进行摄像头姿势的优化,并采用了一种邻居替换策略来进一步优化BA的结果。
  • results: 实验结果表明,CBARF模型在摄像头姿势优化和novel view synthesis中表现出了state-of-the-art的性能,特别是在大量摄像头姿势噪声的情况下。
    Abstract Existing volumetric neural rendering techniques, such as Neural Radiance Fields (NeRF), face limitations in synthesizing high-quality novel views when the camera poses of input images are imperfect. To address this issue, we propose a novel 3D reconstruction framework that enables simultaneous optimization of camera poses, dubbed CBARF (Cascaded Bundle-Adjusting NeRF).In a nutshell, our framework optimizes camera poses in a coarse-to-fine manner and then reconstructs scenes based on the rectified poses. It is observed that the initialization of camera poses has a significant impact on the performance of bundle-adjustment (BA). Therefore, we cascade multiple BA modules at different scales to progressively improve the camera poses. Meanwhile, we develop a neighbor-replacement strategy to further optimize the results of BA in each stage. In this step, we introduce a novel criterion to effectively identify poorly estimated camera poses. Then we replace them with the poses of neighboring cameras, thus further eliminating the impact of inaccurate camera poses. Once camera poses have been optimized, we employ a density voxel grid to generate high-quality 3D reconstructed scenes and images in novel views. Experimental results demonstrate that our CBARF model achieves state-of-the-art performance in both pose optimization and novel view synthesis, especially in the existence of large camera pose noise.
    摘要 现有的体积神经渲染技术,如神经辐射场(NeRF),在输入图像的相机位姿不精确时难以合成高质量的新视图。为解决这个问题,我们提出了一种能够同时优化相机位姿的新型三维重建框架,称为CBARF(级联束调整NeRF)。简而言之,我们的框架以从粗到细的方式优化相机位姿,然后基于修正后的位姿重建场景。我们观察到相机位姿的初始化对束调整(BA)的性能有很大影响。因此,我们在不同尺度上级联多个BA模块,以逐步改进相机位姿。同时,我们开发了一种邻居替换策略,以进一步优化每个阶段BA的结果。在这一步中,我们引入了一种新的准则,以有效识别估计不准确的相机位姿,并将其替换为邻近相机的位姿,从而进一步消除不准确位姿的影响。一旦相机位姿被优化,我们就使用密度体素网格来生成高质量的3D重建场景和新视图图像。实验结果表明,我们的CBARF模型在相机位姿优化和新视图合成方面均达到了最先进的性能,特别是在相机位姿噪声较大的情况下。

Image Augmentation with Controlled Diffusion for Weakly-Supervised Semantic Segmentation

  • paper_url: http://arxiv.org/abs/2310.09760
  • repo_url: None
  • paper_authors: Wangyu Wu, Tianhong Dai, Xiaowei Huang, Fei Ma, Jimin Xiao
  • for: trains semantic segmentation models solely using image-level labels, and aims to improve the quality of pseudo labels when the size of available dataset is limited.
  • methods: introduces a novel approach called Image Augmentation with Controlled Diffusion (IACD), which effectively augments existing labeled datasets by generating diverse images through controlled diffusion, and proposes a high-quality image selection strategy to mitigate the potential noise introduced by the randomness of diffusion models.
  • results: clearly surpasses existing state-of-the-art methods, and the effect is more obvious when the amount of available data is small, demonstrating the effectiveness of the proposed IACD approach.
    Abstract Weakly-supervised semantic segmentation (WSSS), which aims to train segmentation models solely using image-level labels, has achieved significant attention. Existing methods primarily focus on generating high-quality pseudo labels using available images and their image-level labels. However, the quality of pseudo labels degrades significantly when the size of available dataset is limited. Thus, in this paper, we tackle this problem from a different view by introducing a novel approach called Image Augmentation with Controlled Diffusion (IACD). This framework effectively augments existing labeled datasets by generating diverse images through controlled diffusion, where the available images and image-level labels are served as the controlling information. Moreover, we also propose a high-quality image selection strategy to mitigate the potential noise introduced by the randomness of diffusion models. In the experiments, our proposed IACD approach clearly surpasses existing state-of-the-art methods. This effect is more obvious when the amount of available data is small, demonstrating the effectiveness of our method.
    摘要 弱监督语义分割(WSSS)旨在仅使用图像级别标签来训练分割模型,已经吸引了广泛的关注。现有方法主要集中在利用可用的图像及其图像级别标签来生成高质量伪标签。然而,当可用数据集规模有限时,伪标签的质量会显著下降。因此,在这篇论文中,我们从另一个角度解决这个问题,提出了一种名为 Image Augmentation with Controlled Diffusion(IACD)的新方法。该框架以可用图像和图像级别标签作为控制信息,通过受控扩散生成多样化的图像,从而有效地扩充现有的标注数据集。此外,我们还提出了一种高质量图像选择策略,以减轻扩散模型随机性可能引入的噪声。实验中,我们提出的 IACD 方法明显超越了现有的最先进方法;当可用数据量较小时,这一效果更为明显,表明了我们方法的有效性。

Prototype-oriented Unsupervised Change Detection for Disaster Management

  • paper_url: http://arxiv.org/abs/2310.09759
  • repo_url: None
  • paper_authors: Youngtack Oh, Minseok Seo, Doyi Kim, Junghoon Seo
  • For: 本研究旨在提出一种不需要标注的自然灾害监测方法,以应对气候变化导致的自然灾害的频发。
  • Methods: 本研究提出了一种名为 Prototype-oriented Unsupervised Change Detection for Disaster Management(PUCD)的方法,该方法通过比较预事件图像、后事件图像和基础模型生成的面向原型的变化合成图像的特征来检测变化,并使用Segment Anything Model(SAM)进行精细化。
  • Results: 本研究在LEVIR-Extension数据集和灾害数据集上评估了PUCD方法,并与其他方法进行比较,结果显示PUCD方法在LEVIR-Extension数据集上达到了最先进的性能。
    Abstract Climate change has led to an increased frequency of natural disasters such as floods and cyclones. This emphasizes the importance of effective disaster monitoring. In response, the remote sensing community has explored change detection methods. These methods are primarily categorized into supervised techniques, which yield precise results but come with high labeling costs, and unsupervised techniques, which eliminate the need for labeling but involve intricate hyperparameter tuning. To address these challenges, we propose a novel unsupervised change detection method named Prototype-oriented Unsupervised Change Detection for Disaster Management (PUCD). PUCD captures changes by comparing features from pre-event, post-event, and prototype-oriented change synthesis images via a foundational model, and refines results using the Segment Anything Model (SAM). Although PUCD is an unsupervised change detection, it does not require complex hyperparameter tuning. We evaluate PUCD framework on the LEVIR-Extension dataset and the disaster dataset and it achieves state-of-the-art performance compared to other methods on the LEVIR-Extension dataset.
    摘要 气候变化导致洪水、台风等自然灾害的频率增加,这突显了有效灾害监测的重要性。因此,遥感领域已经探索了多种变化检测方法。这些方法主要分为监督技术和无监督技术:前者可以提供精确的结果,但标注成本很高;后者无需标注,但往往涉及复杂的超参数调整。为解决这些挑战,我们提出了一种面向原型的无监督变化检测方法,名为 Prototype-oriented Unsupervised Change Detection for Disaster Management(PUCD)。PUCD 通过基础模型比较预事件图像、后事件图像和面向原型的变化合成图像的特征来捕捉变化,并使用 Segment Anything Model(SAM)对结果进行细化。尽管 PUCD 是无监督的变化检测方法,但它不需要复杂的超参数调整。我们在 LEVIR-Extension 数据集和灾害数据集上评估了 PUCD 框架,其在 LEVIR-Extension 数据集上取得了优于其他方法的最先进性能。
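
At its core, unsupervised change detection of this kind compares pre- and post-event features from a frozen backbone; a per-pixel cosine-distance change map is a minimal version of that comparison (toy features and threshold; the prototype-oriented synthesis and SAM refinement used by PUCD are not reproduced here):

```python
import torch
import torch.nn.functional as F

def change_map(feat_pre: torch.Tensor, feat_post: torch.Tensor, thresh: float = 0.3):
    """feat_*: (C, H, W) feature maps from the same frozen backbone.
    Returns a binary change mask where feature similarity drops."""
    sim = F.cosine_similarity(feat_pre.unsqueeze(0), feat_post.unsqueeze(0), dim=1)[0]
    dist = 1.0 - sim                 # high where content changed between the two dates
    return (dist > thresh).float(), dist

# Toy features standing in for backbone outputs of pre/post-disaster images.
pre = torch.randn(64, 32, 32)
post = pre.clone()
post[:, 8:16, 8:16] += 3.0           # simulate a localized change (e.g., a damaged area)
mask, dist = change_map(pre, post)
print("changed pixels:", int(mask.sum().item()))
```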

MoEmo Vision Transformer: Integrating Cross-Attention and Movement Vectors in 3D Pose Estimation for HRI Emotion Detection

  • paper_url: http://arxiv.org/abs/2310.09757
  • repo_url: None
  • paper_authors: David C. Jeong, Tianma Shen, Hongji Liu, Raghav Kapoor, Casey Nguyen, Song Liu, Christopher A. Kitts
  • for: 这 paper 的目的是提出一种基于人体姿势估计的人工智能人机交互(HRI)中的情绪检测方法。
  • methods: 这 paper 使用了cross-attention视觉变换器(ViT)和自然语言处理技术,将人体姿势估计与环境上下文进行交互,以实现更高精度的情绪检测。
  • results: 对比现有方法,这 paper 的方法可以更好地利用人体姿势和环境上下文之间的微妙关系,从而提高情绪检测的准确率。
    Abstract Emotion detection presents challenges to intelligent human-robot interaction (HRI). Foundational deep learning techniques used in emotion detection are limited by information-constrained datasets or models that lack the necessary complexity to learn interactions between input data elements, such as the the variance of human emotions across different contexts. In the current effort, we introduce 1) MoEmo (Motion to Emotion), a cross-attention vision transformer (ViT) for human emotion detection within robotics systems based on 3D human pose estimations across various contexts, and 2) a data set that offers full-body videos of human movement and corresponding emotion labels based on human gestures and environmental contexts. Compared to existing approaches, our method effectively leverages the subtle connections between movement vectors of gestures and environmental contexts through the use of cross-attention on the extracted movement vectors of full-body human gestures/poses and feature maps of environmental contexts. We implement a cross-attention fusion model to combine movement vectors and environment contexts into a joint representation to derive emotion estimation. Leveraging our Naturalistic Motion Database, we train the MoEmo system to jointly analyze motion and context, yielding emotion detection that outperforms the current state-of-the-art.
    摘要 人工智能human-robot交互(HRI)中的情绪检测受到挑战。基础的深度学习技术在情绪检测方面受到数据约束或模型缺乏足够复杂性来学习输入数据元素之间的交互,如人类情绪在不同情境下的变化。在当前努力中,我们介绍了以下两个方法:1. MoEmo(动作到情绪):基于机器人系统的人体pose估计中的3D人体动作,采用视Transformer(ViT)来检测人类情绪。2. 一个包含全身动作和对应情绪标签的完整数据集。与现有方法相比,我们的方法可以充分利用人体动作vector和环境上下文的关系,通过跨注意力 fusion模型将动作vector和环境上下文融合为共同表示,从而实现情绪估计。我们使用自然主义人体动作数据库来训练MoEmo系统,以同时分析动作和环境,实现情绪检测的改进。
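
The fusion step, where pose-derived movement tokens attend to environmental context features, can be sketched with a standard cross-attention layer (assumed token shapes and embedding size; a generic illustration rather than the MoEmo architecture):

```python
import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    """Movement tokens attend to context tokens; the fused tokens are pooled into emotion logits."""
    def __init__(self, dim: int = 128, n_heads: int = 4, n_emotions: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.head = nn.Linear(dim, n_emotions)

    def forward(self, movement_tokens, context_tokens):
        # queries come from 3D-pose movement vectors, keys/values from scene context
        fused, _ = self.attn(movement_tokens, context_tokens, context_tokens)
        return self.head(fused.mean(dim=1))    # pool over time steps

model = CrossAttentionFusion()
movement = torch.randn(2, 16, 128)   # (batch, frames, embed) pose-derived tokens
context = torch.randn(2, 49, 128)    # (batch, patches, embed) environment features
print(model(movement, context).shape)  # -> torch.Size([2, 4])
```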

New Benchmarks for Asian Facial Recognition Tasks: Face Classification with Large Foundation Models

  • paper_url: http://arxiv.org/abs/2310.09756
  • repo_url: https://github.com/dukong1/koin_benchmark_dataset
  • paper_authors: Jinwoo Seo, Soora Choi, Eungyeom Ha, Beomjune Kim, Dongbin Na
  • for: 这个论文是为了开发一个大规模的韩国Influencer分类系统而写的。
  • methods: 这篇论文使用了大量的韩国明星照片,并在这些照片中添加了各种环境,如舞台照明、后台舞者和背景物体。这些照片可以用于训练分类模型,以便正确地识别韩国Influencer。
  • results: 论文提出了一个名为KoIn的大规模韩国Influencer数据集,包含100,000多张韩国明星照片,并提供了一些困难的样例图像,如人脸图像包含面具和帽子。论文还进行了多种实验,包括使用现有的基础模型来证明KoIn数据集的有效性。
    Abstract The face classification system is an important tool for recognizing personal identity properly. This paper introduces a new Large-Scale Korean Influencer Dataset named KoIn. Our presented dataset contains many real-world photos of Korean celebrities in various environments that might contain stage lighting, backup dancers, and background objects. These various images can be useful for training classification models classifying K-influencers. Most of the images in our proposed dataset have been collected from social network services (SNS) such as Instagram. Our dataset, KoIn, contains over 100,000 K-influencer photos from over 100 Korean celebrity classes. Moreover, our dataset provides additional hard case samples such as images including human faces with masks and hats. We note that the hard case samples are greatly useful in evaluating the robustness of the classification systems. We have extensively conducted several experiments utilizing various classification models to validate the effectiveness of our proposed dataset. Specifically, we demonstrate that recent state-of-the-art (SOTA) foundation architectures show decent classification performance when trained on our proposed dataset. In this paper, we also analyze the robustness performance against hard case samples of large-scale foundation models when we fine-tune the foundation models on the normal cases of the proposed dataset, KoIn. Our presented dataset and codes will be publicly available at https://github.com/dukong1/KoIn_Benchmark_Dataset.
    摘要 人脸分类系统是正确识别个人身份的重要工具。本文介绍了一个新的大规模韩国 influencer 数据集,名为 KoIn。该数据集包含了大量真实环境下的韩国明星照片,其中可能包含舞台灯光、伴舞者和背景物品等。这些多样的图像可用于训练能够正确分类 K-influencer 的分类模型。数据集中的大多数图像来自 Instagram 等社交网络服务(SNS)。我们的数据集 KoIn 包含来自 100 多个韩国明星类别的超过 100,000 张照片。此外,数据集还提供了一些困难样本,例如戴口罩或帽子的人脸图像。我们注意到,这些困难样本在评估分类系统鲁棒性时非常有用。我们进行了多种实验,利用不同的分类模型验证所提数据集的有效性。具体来说,我们表明最新的最先进(SOTA)基础架构在我们的数据集上训练后能取得不错的分类性能。本文还分析了在所提数据集 KoIn 的常规样本上微调大规模基础模型后,其针对困难样本的鲁棒性表现。我们的数据集和代码将在 https://github.com/dukong1/KoIn_Benchmark_Dataset 上公开。

Staged Depthwise Correlation and Feature Fusion for Siamese Object Tracking

  • paper_url: http://arxiv.org/abs/2310.09747
  • repo_url: None
  • paper_authors: Dianbo Ma, Jianqiang Xiao, Ziyan Gao, Satoshi Yamane
  • for: 提高视觉跟踪的特征提取效果
  • methods: 提出了一种新的多Stage深度相关和特征融合网络(DCFFNet),利用多级层次特征和多通道semantics来学习对象特征的优化权重
  • results: 对多个大规模数据集进行了端到端协调训练,实现了模型的稳定训练和高性能,并在多个标准测试集上达到了多种跟踪器的竞争性表现
    Abstract In this work, we propose a novel staged depthwise correlation and feature fusion network, named DCFFNet, to further optimize the feature extraction for visual tracking. We build our deep tracker upon a siamese network architecture, which is offline trained from scratch on multiple large-scale datasets in an end-to-end manner. The model contains a core component, that is, depthwise correlation and feature fusion module (correlation-fusion module), which facilitates model to learn a set of optimal weights for a specific object by utilizing ensembles of multi-level features from lower and higher layers and multi-channel semantics on the same layer. We combine the modified ResNet-50 with the proposed correlation-fusion layer to constitute the feature extractor of our model. In training process, we find the training of model become more stable, that benifits from the correlation-fusion module. For comprehensive evaluations of performance, we implement our tracker on the popular benchmarks, including OTB100, VOT2018 and LaSOT. Extensive experiment results demonstrate that our proposed method achieves favorably competitive performance against many leading trackers in terms of accuracy and precision, while satisfying the real-time requirements of applications.
    摘要 在这项工作中,我们提出了一种新的阶段化深度相关和特征融合网络(DCFFNet),用于进一步优化视觉跟踪的特征提取。我们基于Siamese网络架构构建深度跟踪器,并在多个大规模数据集上以端到端的方式从头开始进行离线训练。模型的核心组件是深度相关和特征融合模块(相关融合模块),它利用来自低层和高层的多级特征以及同一层的多通道语义,使模型能够为特定对象学习一组最优权重。我们将修改后的ResNet-50与所提的相关融合层组合,构成模型的特征提取器。在训练过程中,我们发现相关融合模块使模型的训练变得更加稳定。为进行全面的性能评估,我们在知名的benchmark上测试了我们的跟踪器,包括OTB100、VOT2018和LaSOT。大量实验结果表明,所提方法在准确率和精确度方面与许多领先的跟踪器相比具有很强的竞争力,同时满足应用的实时性要求。
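
The depthwise correlation at the core of the correlation-fusion module builds on the depthwise cross-correlation commonly used in Siamese trackers, which can be written with a grouped convolution (a generic sketch with assumed feature shapes, not the paper's full staged module):

```python
import torch
import torch.nn.functional as F

def depthwise_xcorr(search: torch.Tensor, template: torch.Tensor) -> torch.Tensor:
    """search: (B, C, Hs, Ws) features of the search region,
    template: (B, C, Ht, Wt) features of the target template.
    Each channel of the template is correlated with the matching channel of the search map."""
    b, c, h, w = search.shape
    search = search.reshape(1, b * c, h, w)
    kernel = template.reshape(b * c, 1, template.size(2), template.size(3))
    out = F.conv2d(search, kernel, groups=b * c)   # per-channel correlation
    return out.reshape(b, c, out.size(2), out.size(3))

search_feat = torch.randn(2, 256, 31, 31)
template_feat = torch.randn(2, 256, 7, 7)
print(depthwise_xcorr(search_feat, template_feat).shape)  # -> torch.Size([2, 256, 25, 25])
```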

Explore the Effect of Data Selection on Poison Efficiency in Backdoor Attacks

  • paper_url: http://arxiv.org/abs/2310.09744
  • repo_url: None
  • paper_authors: Ziqiang Li, Pengfei Xia, Hong Sun, Yueqi Zeng, Wei Zhang, Bin Li
  • for: 在将训练数据收集外包以降低成本的背景下,从样本选择的角度提高针对深度神经网络 (DNNs) 的后门攻击的投毒效率。
  • methods: 提出了一种基于遗忘事件 (forgetting events) 和损失曲率 (curvature) 的样本选择策略,可以提高投毒效率。
  • results: 在多个领域 (CIFAR-10、CIFAR-100、ImageNet-10、AG News、ESC-50、Facial Age) 的实验结果表明,所提方法可以在相同的投毒比例下提高攻击性能。
    Abstract As the number of parameters in Deep Neural Networks (DNNs) scales, the thirst for training data also increases. To save costs, it has become common for users and enterprises to delegate time-consuming data collection to third parties. Unfortunately, recent research has shown that this practice raises the risk of DNNs being exposed to backdoor attacks. Specifically, an attacker can maliciously control the behavior of a trained model by poisoning a small portion of the training data. In this study, we focus on improving the poisoning efficiency of backdoor attacks from the sample selection perspective. The existing attack methods construct such poisoned samples by randomly selecting some clean data from the benign set and then embedding a trigger into them. However, this random selection strategy ignores that each sample may contribute differently to the backdoor injection, thereby reducing the poisoning efficiency. To address the above problem, a new selection strategy named Improved Filtering and Updating Strategy (FUS++) is proposed. Specifically, we adopt the forgetting events of the samples to indicate the contribution of different poisoned samples and use the curvature of the loss surface to analyses the effectiveness of this phenomenon. Accordingly, we combine forgetting events and curvature of different samples to conduct a simple yet efficient sample selection strategy. The experimental results on image classification (CIFAR-10, CIFAR-100, ImageNet-10), text classification (AG News), audio classification (ESC-50), and age regression (Facial Age) consistently demonstrate the effectiveness of the proposed strategy: the attack performance using FUS++ is significantly higher than that using random selection for the same poisoning ratio.
    摘要 Existing attack methods create poisoned samples by randomly selecting some clean data from the benign set and embedding a trigger into them. However, this random selection strategy overlooks the fact that each sample may contribute differently to the backdoor injection, thereby reducing the poisoning efficiency.To address this issue, we propose a new sample selection strategy called Improved Filtering and Updating Strategy (FUS++). We utilize the forgetting events of the samples to indicate their contribution to the backdoor injection and analyze the effectiveness of this phenomenon using the curvature of the loss surface. By combining forgetting events and curvature of different samples, we develop a simple yet efficient sample selection strategy.Our experimental results on image classification (CIFAR-10, CIFAR-100, ImageNet-10), text classification (AG News), audio classification (ESC-50), and age regression (Facial Age) consistently show that the proposed strategy outperforms random selection for the same poisoning ratio. The attack performance using FUS++ is significantly higher, indicating the effectiveness of our proposed strategy in enhancing the poisoning efficiency of backdoor attacks.
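
A simplified version of selection by forgetting events, combined with a second per-sample score standing in for the curvature term, looks like this (toy data and an assumed linear combination; FUS++'s iterative filtering-and-updating loop is not reproduced):

```python
import numpy as np

def standardize(v: np.ndarray) -> np.ndarray:
    return (v - v.mean()) / (v.std() + 1e-8)

def select_poison_indices(correct_history: np.ndarray, curvature_score: np.ndarray, k: int,
                          w_forget: float = 1.0, w_curv: float = 1.0) -> np.ndarray:
    """correct_history: (epochs, n_samples) 0/1 matrix recording whether each candidate was
    classified correctly at each epoch; curvature_score: (n_samples,) any per-sample sharpness
    proxy. Returns the k samples ranked highest by the combined score (the weighting and sign
    conventions here are modeling choices, not the paper's exact criterion)."""
    # a forgetting event = correct at epoch t, wrong at epoch t+1
    forgetting = np.maximum(correct_history[:-1] - correct_history[1:], 0).sum(axis=0)
    score = w_forget * standardize(forgetting) + w_curv * standardize(curvature_score)
    return np.argsort(-score)[:k]

rng = np.random.default_rng(0)
history = rng.integers(0, 2, size=(10, 500))     # toy training dynamics over 10 epochs
curv = rng.random(500)                           # toy curvature proxy
print(select_poison_indices(history, curv, k=25)[:10])
```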

AugUndo: Scaling Up Augmentations for Unsupervised Depth Completion

  • paper_url: http://arxiv.org/abs/2310.09739
  • repo_url: None
  • paper_authors: Yangchao Wu, Tian Yu Liu, Hyoungseob Park, Stefano Soatto, Dong Lao, Alex Wong
  • for: 提高无监督深度完成任务的性能(improve the performance of unsupervised depth completion tasks)
  • methods: 使用“undo”操作来解除各种几何变换对深度图的影响,从而计算恢复损失使用原始图像和稀疏深度图,从而缩大数据增强的可能性(use “undo” operation to eliminate the impact of various geometric transformations on the depth map, and compute the reconstruction loss using the original images and sparse depth maps, thus expanding the possibility of data augmentation)
  • results: 在indoor(VOID)和outdoor(KITTI)数据集上,与三种现有方法进行比较,平均提高了10.4%(compared to three existing methods on the indoor (VOID) and outdoor (KITTI) datasets, with an average improvement of 10.4%)
    Abstract Unsupervised depth completion methods are trained by minimizing sparse depth and image reconstruction error. Block artifacts from resampling, intensity saturation, and occlusions are amongst the many undesirable by-products of common data augmentation schemes that affect image reconstruction quality, and thus the training signal. Hence, typical augmentations on images that are viewed as essential to training pipelines in other vision tasks have seen limited use beyond small image intensity changes and flipping. The sparse depth modality have seen even less as intensity transformations alter the scale of the 3D scene, and geometric transformations may decimate the sparse points during resampling. We propose a method that unlocks a wide range of previously-infeasible geometric augmentations for unsupervised depth completion. This is achieved by reversing, or "undo"-ing, geometric transformations to the coordinates of the output depth, warping the depth map back to the original reference frame. This enables computing the reconstruction losses using the original images and sparse depth maps, eliminating the pitfalls of naive loss computation on the augmented inputs. This simple yet effective strategy allows us to scale up augmentations to boost performance. We demonstrate our method on indoor (VOID) and outdoor (KITTI) datasets where we improve upon three existing methods by an average of 10.4\% across both datasets.
    摘要 无监督深度补全方法通过最小化稀疏深度误差和图像重建误差来训练。重采样产生的块状伪影、强度饱和与遮挡等,都是常见数据增强方案的副产物,它们会影响图像重建质量,从而影响训练信号。因此,在其他视觉任务中被视为必不可少的图像增强,在这里的使用通常局限于小幅度的强度变化和翻转。稀疏深度模态能用的增强更少:强度变换会改变3D场景的尺度,而几何变换可能在重采样时丢失稀疏点。我们提出了一种方法,可以为无监督深度补全解锁大量此前不可行的几何增强。其做法是对输出深度的坐标进行逆向("撤销")几何变换,将深度图扭回到原始参考帧。这样就可以使用原始图像和稀疏深度图来计算重建损失,避免了在增强输入上直接计算损失的弊端。这一简单而有效的策略使我们能够扩大增强规模以提升性能。我们在室内(VOID)和室外(KITTI)数据集上验证了该方法,相比三种现有方法平均提升10.4%。
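
For the simplest geometric augmentation, a horizontal flip, the "undo" step is just flipping the prediction back before computing losses in the original reference frame; the tiny network and sparse-depth loss below are placeholders for a real unsupervised depth-completion pipeline:

```python
import torch
import torch.nn as nn

# Tiny stand-in for a depth-completion network (image + sparse depth -> dense depth).
depth_net = nn.Sequential(nn.Conv2d(4, 16, 3, padding=1), nn.ReLU(), nn.Conv2d(16, 1, 3, padding=1))

image = torch.rand(1, 3, 64, 64)
sparse_depth = torch.rand(1, 1, 64, 64) * (torch.rand(1, 1, 64, 64) > 0.95)

# Augment: a horizontal flip is its own inverse, so "undo" is simply flipping back.
image_aug = torch.flip(image, dims=[-1])
sparse_aug = torch.flip(sparse_depth, dims=[-1])

pred_aug = depth_net(torch.cat([image_aug, sparse_aug], dim=1))
pred = torch.flip(pred_aug, dims=[-1])           # undo the geometric transform

# The loss is computed in the original reference frame against the *unaugmented* inputs,
# so resampling artifacts in the augmented view never enter the training signal.
valid = (sparse_depth > 0).float()
sparse_loss = ((pred - sparse_depth).abs() * valid).sum() / valid.sum().clamp(min=1)
print(float(sparse_loss))
```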

FuseSR: Super Resolution for Real-time Rendering through Efficient Multi-resolution Fusion

  • paper_url: http://arxiv.org/abs/2310.09726
  • repo_url: https://github.com/Isaac-Paradox/FuseSR
  • paper_authors: Zhihua Zhong, Jingsen Zhu, Yuxin Dai, Chuankun Zheng, Yuchi Huo, Guanlin Chen, Hujun Bao, Rui Wang
  • for: 提高实时渲染的效率和质量,满足高分辨率、高刷新率和高实зм的需求。
  • methods: 利用低分辨率输入图像,并以低成本的高分辨率auxiliary G-Buffer作为额外输入来提高渲染的精度和效率;同时提出一种高效的H-Net架构,以解决多分辨率层级上特征对齐与融合的问题。
  • results: 实现4K分辨率的实时渲染,并在$4 \times 4$乃至$8 \times 8$的upsampling情形下提供高质量、高性能的渲染结果,与现有方法相比在质量上有明显提升并带来显著的性能提升。
    Abstract The workload of real-time rendering is steeply increasing as the demand for high resolution, high refresh rates, and high realism rises, overwhelming most graphics cards. To mitigate this problem, one of the most popular solutions is to render images at a low resolution to reduce rendering overhead, and then manage to accurately upsample the low-resolution rendered image to the target resolution, a.k.a. super-resolution techniques. Most existing methods focus on exploiting information from low-resolution inputs, such as historical frames. The absence of high frequency details in those LR inputs makes them hard to recover fine details in their high-resolution predictions. In this paper, we propose an efficient and effective super-resolution method that predicts high-quality upsampled reconstructions utilizing low-cost high-resolution auxiliary G-Buffers as additional input. With LR images and HR G-buffers as input, the network requires to align and fuse features at multi resolution levels. We introduce an efficient and effective H-Net architecture to solve this problem and significantly reduce rendering overhead without noticeable quality deterioration. Experiments show that our method is able to produce temporally consistent reconstructions in $4 \times 4$ and even challenging $8 \times 8$ upsampling cases at 4K resolution with real-time performance, with substantially improved quality and significant performance boost compared to existing works.
    摘要 随着对高分辨率、高刷新率和高真实感需求的增长,实时渲染的工作负载急剧增加,使大多数图形卡不堪重负。为缓解这一问题,最流行的解决方案之一是以低分辨率渲染图像以降低渲染开销,然后设法将低分辨率渲染结果精确地上采样到目标分辨率,即超分辨率技术。大多数现有方法侧重于利用低分辨率输入中的信息,如历史帧,但低分辨率输入缺乏高频细节,使其难以在高分辨率预测中恢复精细细节。在这篇论文中,我们提出了一种高效且有效的超分辨率方法,利用低成本的高分辨率auxiliary G-Buffers作为额外输入来预测高质量的上采样重建结果。以LR图像和HR G-Buffers作为输入,网络需要在多个分辨率层级上对齐并融合特征;我们提出了一种高效的H-Net架构来解决这一问题,在不产生明显质量下降的情况下显著降低了渲染开销。实验表明,我们的方法可以在4K分辨率下以实时性能完成$4 \times 4$乃至更具挑战性的$8 \times 8$上采样,与现有工作相比在质量上有显著提升,并带来可观的性能提升。

Efficient and Effective Multi-View Subspace Clustering for Large-scale Data

  • paper_url: http://arxiv.org/abs/2310.09718
  • repo_url: None
  • paper_authors: Yuxiu Lin, Hui Liu, Ren Wang, Gongguan Chen, Caiming Zhang
  • For: 提高大规模多视图数据集上的聚类性能,解决现有方法中FC层参数规模带来的时间与内存成本问题。
  • Methods: 提出了一种新的深度框架E$^2$LMVSC,通过在多视图数据上施加软聚类分配相似性约束来提高统一表示的质量,并利用信息瓶颈理论获得最小充分的统一特征表示。
  • Results: 在大规模多视图数据集上进行了广泛的实验,结果表明E$^2$LMVSC与现有方法性能相当,并在大规模数据集上取得了最先进的聚类性能。
    Abstract Recent multi-view subspace clustering achieves impressive results utilizing deep networks, where the self-expressive correlation is typically modeled by a fully connected (FC) layer. However, they still suffer from two limitations: i) it is under-explored to extract a unified representation from multiple views that simultaneously satisfy minimal sufficiency and discriminability. ii) the parameter scale of the FC layer is quadratic to the number of samples, resulting in high time and memory costs that significantly degrade their feasibility in large-scale datasets. In light of this, we propose a novel deep framework termed Efficient and Effective Large-scale Multi-View Subspace Clustering (E$^2$LMVSC). Specifically, to enhance the quality of the unified representation, a soft clustering assignment similarity constraint is devised for explicitly decoupling consistent, complementary, and superfluous information across multi-view data. Then, following information bottleneck theory, a sufficient yet minimal unified feature representation is obtained. Moreover, E$^2$LMVSC employs the maximal coding rate reduction principle to promote intra-cluster aggregation and inter-cluster separability within the unified representation. Finally, the self-expressive coefficients are learned by a Relation-Metric Net instead of a parameterized FC layer for greater efficiency. Extensive experiments show that E$^2$LMVSC yields comparable results to existing methods and achieves state-of-the-art clustering performance in large-scale multi-view datasets.
    摘要 最近的多视图子空间聚类方法利用深度网络取得了很好的成果,其中自表达相关性通常由全连接(FC)层来建模。然而,它们仍然存在两点限制:一是尚未充分探索如何从多视图数据中提取同时满足最小充分性和判别性的统一表示;二是FC层的参数规模与样本数成平方关系,导致时间和内存成本很高,严重降低了其在大规模数据集上的可行性。为此,我们提出了一种新的深度框架,称为高效且有效的大规模多视图子空间聚类(E$^2$LMVSC)。具体而言,E$^2$LMVSC 设计了软聚类分配相似性约束,以显式解耦多视图数据中的一致信息、互补信息和冗余信息;然后,依据信息瓶颈理论,获得充分而最小的统一特征表示。此外,E$^2$LMVSC 采用最大编码率缩减原理,在统一表示中促进簇内聚合和簇间可分性。最后,自表达系数由 Relation-Metric Net 而非参数化的FC层来学习,以提高效率。大量实验表明,E$^2$LMVSC 与现有方法性能相当,并在大规模多视图数据集上取得了最先进的聚类性能。

LOVECon: Text-driven Training-Free Long Video Editing with ControlNet

  • paper_url: http://arxiv.org/abs/2310.09711
  • repo_url: https://github.com/zhijie-group/lovecon
  • paper_authors: Zhenyi Liao, Zhijie Deng
  • for: 这 paper targets 长视频编辑 без需要训练,以满足电影制作、广告等领域的需求。
  • methods: 我们基于 ControlNet 建立了一个简单而有效的基线,通过将长视频分成窗口,并开发了一种跨窗口注意力机制来保证全局风格的一致性和最大化窗口之间的平滑性。 我们还利用 DDIM 逆转来提取源视频中的信息,并将其集成到生成过程中的秘密状态中。
  • results: 我们的方法在不同场景中(包括Attributes改变、风格传输和背景替换等)都显示出了超越基eline的效果,能够编辑长达 128 帧的视频 according to 用户需求。
    Abstract Leveraging pre-trained conditional diffusion models for video editing without further tuning has gained increasing attention due to its promise in film production, advertising, etc. Yet, seminal works in this line fall short in generation length, temporal coherence, or fidelity to the source video. This paper aims to bridge the gap, establishing a simple and effective baseline for training-free diffusion model-based long video editing. As suggested by prior arts, we build the pipeline upon ControlNet, which excels at various image editing tasks based on text prompts. To break down the length constraints caused by limited computational memory, we split the long video into consecutive windows and develop a novel cross-window attention mechanism to ensure the consistency of global style and maximize the smoothness among windows. To achieve more accurate control, we extract the information from the source video via DDIM inversion and integrate the outcomes into the latent states of the generations. We also incorporate a video frame interpolation model to mitigate the frame-level flickering issue. Extensive empirical studies verify the superior efficacy of our method over competing baselines across scenarios, including the replacement of the attributes of foreground objects, style transfer, and background replacement. In particular, our method manages to edit videos with up to 128 frames according to user requirements. Code is available at https://github.com/zhijie-group/LOVECon.
    摘要 利用预训练的条件扩散模型进行视频编辑而无需进一步调参,因其在电影制作、广告等领域的前景受到越来越多的关注。然而,该方向的先前工作在生成长度、时间连贯性或对源视频的忠实度方面仍有不足。本文旨在填补这一空白,为无需训练的基于扩散模型的长视频编辑建立一个简单而有效的基线。与先前工作一致,我们将流水线建立在擅长基于文本提示进行各类图像编辑的ControlNet之上。为了突破有限计算内存导致的长度约束,我们将长视频拆分成连续的窗口,并开发了一种新的跨窗口注意力机制,以保证全局风格的一致性并最大化窗口之间的平滑性。此外,我们通过DDIM反演从源视频中提取信息,并将其整合到生成过程的潜变量状态中;我们还引入了一个视频帧插值模型,以缓解帧级闪烁问题。大量实证研究表明,我们的方法在替换前景对象属性、风格迁移和背景替换等不同场景中均优于竞争基线,并能根据用户需求编辑长达128帧的视频。代码可在 https://github.com/zhijie-group/LOVECon 获取。

cs.AI - 2023-10-15

On Statistical Learning of Branch and Bound for Vehicle Routing Optimization

  • paper_url: http://arxiv.org/abs/2310.09986
  • repo_url: https://github.com/isotlaboratory/ml4vrp
  • paper_authors: Andrew Naguib, Waleed A. Yousef, Issa Traoré, Mohammad Mamun
  • for: solve the capacitated vehicle routing problem (CVRP) using machine learning
  • methods: utilize and compare the performance of three neural networks (GCNN, GraphSAGE, and GAT) to emulate the Strong Branching strategy
  • results: match or improve upon the performance of the branch and bound algorithm with significantly less computational time
    Abstract Recently, machine learning of the branch and bound algorithm has shown promise in approximating competent solutions to NP-hard problems. In this paper, we utilize and comprehensively compare the outcomes of three neural networks--graph convolutional neural network (GCNN), GraphSAGE, and graph attention network (GAT)--to solve the capacitated vehicle routing problem. We train these neural networks to emulate the decision-making process of the computationally expensive Strong Branching strategy. The neural networks are trained on six instances with distinct topologies from the CVRPLIB and evaluated on eight additional instances. Moreover, we reduced the minimum number of vehicles required to solve a CVRP instance to a bin-packing problem, which was addressed in a similar manner. Through rigorous experimentation, we found that this approach can match or improve upon the performance of the branch and bound algorithm with the Strong Branching strategy while requiring significantly less computational time. The source code that corresponds to our research findings and methodology is readily accessible and available for reference at the following web address: https://isotlaboratory.github.io/ml4vrp
    摘要 近来,对分支定界(branch and bound)算法进行机器学习在近似求解NP难问题方面展现出了潜力。在这篇论文中,我们利用并全面比较了三种神经网络--图卷积神经网络(GCNN)、GraphSAGE和图注意力网络(GAT)--来求解带容量约束的车辆路径问题(CVRP)。我们训练这些神经网络来模拟计算代价高昂的强分支(Strong Branching)策略的决策过程。这些神经网络在来自CVRPLIB的六个拓扑各异的实例上训练,并在另外八个实例上进行评估。此外,我们将求解CVRP实例所需的最小车辆数问题归约为一个装箱问题,并以类似的方式加以处理。经过严格的实验,我们发现这种方法能够在性能上与采用强分支策略的分支定界算法相当或更优,而所需计算时间显著减少。与研究发现和方法相对应的源代码可在以下网址获取:https://isotlaboratory.github.io/ml4vrp
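
Learning to emulate Strong Branching is essentially imitation learning over branching candidates; a minimal training step with a plain MLP scorer standing in for the GCNN/GraphSAGE/GAT encoders might look like this (toy candidate features and an assumed feature dimension):

```python
import torch
import torch.nn as nn

# Imitation-learning sketch: score branching candidates and train with cross-entropy
# against the variable Strong Branching would have picked.
scorer = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(scorer.parameters(), lr=1e-3)

def training_step(cand_feats: torch.Tensor, expert_choice: torch.Tensor) -> float:
    """cand_feats: (n_candidates, 10) features of fractional variables at one B&B node,
    expert_choice: scalar index selected by Strong Branching on that node."""
    logits = scorer(cand_feats).squeeze(-1)            # one score per candidate variable
    loss = nn.functional.cross_entropy(logits.unsqueeze(0), expert_choice.unsqueeze(0))
    opt.zero_grad(); loss.backward(); opt.step()
    return float(loss)

# Toy node: 12 candidate variables, the expert picked index 3.
feats = torch.randn(12, 10)
print(training_step(feats, torch.tensor(3)))
```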

Farzi Data: Autoregressive Data Distillation

  • paper_url: http://arxiv.org/abs/2310.09983
  • repo_url: None
  • paper_authors: Noveen Sachdeva, Zexue He, Wang-Cheng Kang, Jianmo Ni, Derek Zhiyuan Cheng, Julian McAuley
  • for: 本研究旨在为自动逆进机器学习任务提供数据减混技术,以便在训练大型模型时采用更小的数据量。
  • methods: 我们提出了一种名为“Farzi”的方法,它可以将输入和输出之间的紧密左右 causal 结构转化为一小批 synthetic 序列(Farzi Data),以保持或提高模型性能。 Farzi 在内部使用了高效的反向模板导数和积分产品来实现内存灵活的数据减混。
  • results: 我们在测试sequential recommendation和语言模型任务中,可以使用 Farzi Data 的0.1%到原始数据大小的比例来训练现代模型,并达到98-120%的下游全数据性能。这表明可以通过减少数据量来训练更好的模型,并开启了将来大型自动逆进机器学习模型的设计和数据量的扩展的新机遇。
    Abstract We study data distillation for auto-regressive machine learning tasks, where the input and output have a strict left-to-right causal structure. More specifically, we propose Farzi, which summarizes an event sequence dataset into a small number of synthetic sequences -- Farzi Data -- which are optimized to maintain (if not improve) model performance compared to training on the full dataset. Under the hood, Farzi conducts memory-efficient data distillation by (i) deriving efficient reverse-mode differentiation of the Adam optimizer by leveraging Hessian-Vector Products; and (ii) factorizing the high-dimensional discrete event-space into a latent-space which provably promotes implicit regularization. Empirically, for sequential recommendation and language modeling tasks, we are able to achieve 98-120% of downstream full-data performance when training state-of-the-art models on Farzi Data of size as little as 0.1% of the original dataset. Notably, being able to train better models with significantly less data sheds light on the design of future large auto-regressive models, and opens up new opportunities to further scale up model and data sizes.
    摘要 我们研究面向自回归机器学习任务的数据蒸馏技术,此类任务的输入和输出具有严格的从左到右因果结构。我们提出了 Farzi,它将事件序列数据集总结为少量合成序列 -- Farzi 数据 -- 这些序列经过优化,使得在其上训练的模型性能相对于在完整数据集上训练保持不变甚至更好。在实现上,Farzi 通过以下两种方式完成内存高效的数据蒸馏:(i) 利用 Hessian-Vector Products 高效地推导 Adam 优化器的反向模式微分;(ii) 将高维离散事件空间分解到一个潜在空间,可证明地促进隐式正则化。实验表明,在序列推荐和语言建模任务中,仅用相当于原始数据集 0.1% 大小的 Farzi 数据训练最先进的模型,即可达到完整数据下游性能的 98-120%。能够用显著更少的数据训练出更好的模型,为未来大型自回归模型的设计提供了新的思路,并为进一步扩大模型和数据规模开辟了新的机遇。
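
The Hessian-vector products that make reverse-mode differentiation through the optimizer memory-efficient can be computed in PyTorch with a double backward pass, without ever materializing the Hessian (a generic HVP sketch on a toy least-squares loss, not Farzi's full meta-gradient pipeline):

```python
import torch

# Small model and loss so the example runs instantly.
w = torch.randn(5, requires_grad=True)
x, y = torch.randn(100, 5), torch.randn(100)

def loss_fn(w):
    return ((x @ w - y) ** 2).mean()

loss = loss_fn(w)
(grad,) = torch.autograd.grad(loss, w, create_graph=True)   # keep graph for second backward

v = torch.randn(5)                         # the vector to multiply the Hessian with
(hvp,) = torch.autograd.grad(grad @ v, w)  # H v, without forming the (potentially huge) Hessian

print(hvp)
```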

Chinese Painting Style Transfer Using Deep Generative Models

  • paper_url: http://arxiv.org/abs/2310.09978
  • repo_url: https://github.com/yanyangbaobeiisemma/chinsepaintingstyletransfer
  • paper_authors: Weijian Ma, Yanyang Kong
  • for: 本研究旨在将传统中国画风 transferred to modern images like nature objects, portraits and landscapes.
  • methods: 我们将使用 state-of-the-art deep generative models for Chinese painting style transfer, 并评估其表现 both qualitatively and quantitatively. 此外,我们还提出了一种 combining several style transfer models for our task.
  • results: 我们将在本研究中评估和比较不同的深度生成模型在传统中国画风转移 task 中的表现, 并提出一种新的方法 combination 多种风格转移模型。
    Abstract Artistic style transfer aims to modify the style of the image while preserving its content. Style transfer using deep learning models has been widely studied since 2015, and most of the applications are focused on specific artists like Van Gogh, Monet, Cezanne. There are few researches and applications on traditional Chinese painting style transfer. In this paper, we will study and leverage different state-of-the-art deep generative models for Chinese painting style transfer and evaluate the performance both qualitatively and quantitatively. In addition, we propose our own algorithm that combines several style transfer models for our task. Specifically, we will transfer two main types of traditional Chinese painting style, known as "Gong-bi" and "Shui-mo" (to modern images like nature objects, portraits and landscapes.
    摘要 艺术风格迁移的目的是在保留图像内容的同时修改其风格。自2015年以来,基于深度学习模型的风格迁移得到了广泛研究,但大多数应用都专注于梵高、莫奈、塞尚等特定艺术家,针对传统中国画风格迁移的研究和应用很少。在这篇论文中,我们将研究并利用多种最先进的深度生成模型进行中国画风格迁移,并从定性和定量两方面评估其性能。此外,我们还提出了自己的算法,将多种风格迁移模型结合起来用于该任务。具体来说,我们将把“工笔”和“水墨”这两种主要的传统中国画风格迁移到自然景物、人像和风景等现代图像上。

Specialized Deep Residual Policy Safe Reinforcement Learning-Based Controller for Complex and Continuous State-Action Spaces

  • paper_url: http://arxiv.org/abs/2310.14788
  • repo_url: https://github.com/ammar-n-abbas/CoL-SDRPRL
  • paper_authors: Ammar N. Abbas, Georgios C. Chasparis, John D. Kelleher
  • For: The paper is written to address the limitations of traditional controllers in safety-critical environments, and to propose a specialized deep reinforcement learning approach for complex and continuous state-action spaces.
  • Methods: The paper proposes a cycle of learning approach that combines residual policy learning with expert trajectory guidance, and specializes the policy through an input-output hidden Markov model to optimize the policy within the region of interest.
  • Results: The proposed solution is validated on the Tennessee Eastman process control, and the results show that the hybrid control architecture combining the reinforcement learning agent with the conventional controller can improve control performance and adapt to abnormal situations.
    Abstract Traditional controllers have limitations as they rely on prior knowledge about the physics of the problem, require modeling of dynamics, and struggle to adapt to abnormal situations. Deep reinforcement learning has the potential to address these problems by learning optimal control policies through exploration in an environment. For safety-critical environments, it is impractical to explore randomly, and replacing conventional controllers with black-box models is also undesirable. Also, it is expensive in continuous state and action spaces, unless the search space is constrained. To address these challenges we propose a specialized deep residual policy safe reinforcement learning with a cycle of learning approach adapted for complex and continuous state-action spaces. Residual policy learning allows learning a hybrid control architecture where the reinforcement learning agent acts in synchronous collaboration with the conventional controller. The cycle of learning initiates the policy through the expert trajectory and guides the exploration around it. Further, the specialization through the input-output hidden Markov model helps to optimize policy that lies within the region of interest (such as abnormality), where the reinforcement learning agent is required and is activated. The proposed solution is validated on the Tennessee Eastman process control.
    摘要 传统控制器有其局限性:它们依赖于对问题物理特性的先验知识,需要对动力学建模,并且难以适应异常情况。深度强化学习有可能解决这些问题,通过在环境中的探索来学习最优控制策略。但是,在安全关键环境下,随机探索是不现实的,而用黑盒模型完全替换传统控制器也不可取;此外,除非对搜索空间加以约束,在连续状态和动作空间中探索的代价也很高。为了解决这些挑战,我们提出了一种面向复杂连续状态-动作空间的专门化深度残差策略安全强化学习方法,并采用循环学习(cycle of learning)策略:由专家轨迹初始化策略,并引导在其附近进行探索。残差策略学习允许构建一种混合控制架构,使强化学习代理与传统控制器同步协作;同时,通过输入-输出隐马尔可夫模型进行专门化,使强化学习代理只在需要它的感兴趣区域(如异常情况)内被激活并优化策略。我们的解决方案在田纳西伊士曼(Tennessee Eastman)过程控制上得到验证。
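
The hybrid control idea, a conventional controller providing the base action and an RL agent adding a residual correction only inside the region of interest, can be sketched as follows (the proportional-controller stand-in, the placeholder agent, and the abnormality flag are all hypothetical):

```python
import numpy as np

def base_controller(state: np.ndarray) -> np.ndarray:
    """Stand-in for the conventional controller (e.g., a tuned PI loop)."""
    return -0.5 * state

class ResidualAgent:
    """Hypothetical RL agent that outputs a small correction; only activated when an
    abnormality detector flags the state as lying in the region of interest."""
    def act(self, state: np.ndarray) -> np.ndarray:
        return 0.1 * np.tanh(state)          # placeholder for the learned residual policy

def hybrid_action(state, agent, abnormal: bool, residual_scale: float = 1.0):
    base = base_controller(state)
    if not abnormal:                         # nominal operation: conventional control only
        return base
    return base + residual_scale * agent.act(state)   # residual refines the base action

state = np.array([0.8, -0.3])
print(hybrid_action(state, ResidualAgent(), abnormal=True))
```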

Seeking Next Layer Neurons’ Attention for Error-Backpropagation-Like Training in a Multi-Agent Network Framework

  • paper_url: http://arxiv.org/abs/2310.09952
  • repo_url: None
  • paper_authors: Arshia Soltani Moakhar, Mohammad Azizmalayeri, Hossein Mirzaei, Mohammad Taghi Manzuri, Mohammad Hossein Rohban
  • for: 这 paper 的目的是提出一种基于 local objective 的多智能体神经网络训练方法,以提高神经网络在实际问题中的应用性。
  • methods: 该 paper 研究了一个由分散、自利的神经元组成的神经网络模型,其中每个神经元通过最大化其自身的局部目标(来自下一层神经元的注意力)来参与网络的训练,并分析了所得策略与误差反向传播之间的关系。
  • results: 该 paper 通过三个数据集上的实验展示了这种多智能体神经网络的学习能力,并在灾难性遗忘基准测试中超过了 error-backpropagation。
    Abstract Despite considerable theoretical progress in the training of neural networks viewed as a multi-agent system of neurons, particularly concerning biological plausibility and decentralized training, their applicability to real-world problems remains limited due to scalability issues. In contrast, error-backpropagation has demonstrated its effectiveness for training deep networks in practice. In this study, we propose a local objective for neurons that, when pursued by neurons individually, align them to exhibit similarities to error-backpropagation in terms of efficiency and scalability during training. For this purpose, we examine a neural network comprising decentralized, self-interested neurons seeking to maximize their local objective -- attention from subsequent layer neurons -- and identify the optimal strategy for neurons. We also analyze the relationship between this strategy and backpropagation, establishing conditions under which the derived strategy is equivalent to error-backpropagation. Lastly, we demonstrate the learning capacity of these multi-agent neural networks through experiments on three datasets and showcase their superior performance relative to error-backpropagation in a catastrophic forgetting benchmark.
    摘要 尽管将神经网络视为由神经元构成的多智能体系统的训练在理论上取得了可观进展,特别是在生物合理性和去中心化训练方面,其在实际问题上的适用性仍因可扩展性问题而受限。相比之下,误差反向传播在实践中已证明了其训练深度网络的有效性。在这项研究中,我们为神经元提出了一个局部目标,使得各神经元在各自追求该目标时,能在训练效率和可扩展性方面表现出与误差反向传播相似的特性。为此,我们分析了一个由去中心化、自利的神经元组成的神经网络,这些神经元试图最大化其局部目标(来自下一层神经元的注意力),并找出了神经元的最优策略。我们还分析了该策略与误差反向传播之间的关系,给出了使所得策略等价于误差反向传播的条件。最后,我们通过在三个数据集上的实验展示了这类多智能体神经网络的学习能力,并在灾难性遗忘基准测试中展示了其相对于误差反向传播的优越性能。

Chameleon: a Heterogeneous and Disaggregated Accelerator System for Retrieval-Augmented Language Models

  • paper_url: http://arxiv.org/abs/2310.09949
  • repo_url: None
  • paper_authors: Wenqi Jiang, Marco Zeller, Roger Waleffe, Torsten Hoefler, Gustavo Alonso
  • methods: 该研究使用一种将检索增强语言模型(RALM)中的语言模型(LM)推理与检索分别交由异构加速器处理、并可独立扩展的分解式加速器系统,以提高RALM的执行效率。
  • results: 研究发现,与基于CPU和CPU-GPU的向量搜索系统相比,Chameleon系统可实现最高23.72倍的加速和26.2倍的能效提升;在不同的RALM配置下,相比混合CPU-GPU架构,Chameleon可将延迟降低至多2.16倍,并将吞吐量提升至多3.18倍。
    Abstract A Retrieval-Augmented Language Model (RALM) augments a generative language model by retrieving context-specific knowledge from an external database. This strategy facilitates impressive text generation quality even with smaller models, thus reducing orders of magnitude of computational demands. However, RALMs introduce unique system design challenges due to (a) the diverse workload characteristics between LM inference and retrieval and (b) the various system requirements and bottlenecks for different RALM configurations such as model sizes, database sizes, and retrieval frequencies. We propose Chameleon, a heterogeneous accelerator system that integrates both LM and retrieval accelerators in a disaggregated architecture. The heterogeneity ensures efficient acceleration of both LM inference and retrieval, while the accelerator disaggregation enables the system to independently scale both types of accelerators to fulfill diverse RALM requirements. Our Chameleon prototype implements retrieval accelerators on FPGAs and assigns LM inference to GPUs, with a CPU server orchestrating these accelerators over the network. Compared to CPU-based and CPU-GPU vector search systems, Chameleon achieves up to 23.72x speedup and 26.2x energy efficiency. Evaluated on various RALMs, Chameleon exhibits up to 2.16x reduction in latency and 3.18x speedup in throughput compared to the hybrid CPU-GPU architecture. These promising results pave the way for bringing accelerator heterogeneity and disaggregation into future RALM systems.
    摘要 检索增强语言模型(RALM)通过从外部数据库检索上下文相关的知识来增强生成式语言模型。这种策略使较小的模型也能达到出色的文本生成质量,从而将计算需求降低若干数量级。然而,RALM带来了一些独特的系统设计挑战:(a)语言模型推理与检索之间的工作负载特性差异很大;(b)不同的RALM配置(如模型规模、数据库规模和检索频率)带来不同的系统需求与瓶颈。我们提出了Chameleon,一种将语言模型加速器与检索加速器整合在解耦架构中的异构加速器系统。异构性保证了语言模型推理和检索都能被高效加速,而加速器解耦使系统可以独立扩展两类加速器,以满足多样的RALM需求。我们的Chameleon原型在FPGA上实现检索加速器,将语言模型推理交给GPU,并由CPU服务器通过网络协调这些加速器。与基于CPU以及CPU-GPU的向量搜索系统相比,Chameleon最高可获得23.72倍的加速和26.2倍的能效提升。在多种RALM上的评估表明,相比混合CPU-GPU架构,Chameleon可将延迟降低至多2.16倍、吞吐量提升至多3.18倍。这些结果为在未来RALM系统中引入加速器异构性与解耦铺平了道路。

“Reading Between the Heat”: Co-Teaching Body Thermal Signatures for Non-intrusive Stress Detection

  • paper_url: http://arxiv.org/abs/2310.09932
  • repo_url: None
  • paper_authors: Yi Xiao, Harshit Sharma, Zhongyang Zhang, Dessa Bergen-Cico, Tauhidur Rahman, Asif Salekin
  • for: 本文旨在开发一种可靠、非接触的室内压力监测系统,用于工作场所生产力评估、智能家居和个性化心理健康监测。
  • methods: 本文提出了 ThermaStrain,一种协同教学(co-teaching)框架,将可穿戴皮肤电活动(EDA)传感器与非接触热成像感知相结合,以提高非接触压力监测的精度。
  • results: 实验结果表明,ThermaStrain 能在不同距离和压力情境下实现高精度的压力分类,并在实时执行、边缘计算和多人感知方面表现出色。
    Abstract Stress impacts our physical and mental health as well as our social life. A passive and contactless indoor stress monitoring system can unlock numerous important applications such as workplace productivity assessment, smart homes, and personalized mental health monitoring. While the thermal signatures from a user's body captured by a thermal camera can provide important information about the "fight-flight" response of the sympathetic and parasympathetic nervous system, relying solely on thermal imaging for training a stress prediction model often lead to overfitting and consequently a suboptimal performance. This paper addresses this challenge by introducing ThermaStrain, a novel co-teaching framework that achieves high-stress prediction performance by transferring knowledge from the wearable modality to the contactless thermal modality. During training, ThermaStrain incorporates a wearable electrodermal activity (EDA) sensor to generate stress-indicative representations from thermal videos, emulating stress-indicative representations from a wearable EDA sensor. During testing, only thermal sensing is used, and stress-indicative patterns from thermal data and emulated EDA representations are extracted to improve stress assessment. The study collected a comprehensive dataset with thermal video and EDA data under various stress conditions and distances. ThermaStrain achieves an F1 score of 0.8293 in binary stress classification, outperforming the thermal-only baseline approach by over 9%. Extensive evaluations highlight ThermaStrain's effectiveness in recognizing stress-indicative attributes, its adaptability across distances and stress scenarios, real-time executability on edge platforms, its applicability to multi-individual sensing, ability to function on limited visibility and unfamiliar conditions, and the advantages of its co-teaching approach.
    摘要 压力会影响我们的身心健康以及社交生活。一个被动、非接触的室内压力监测系统可以支撑许多重要应用,如工作场所生产力评估、智能家居和个性化心理健康监测。热像仪捕捉到的人体热特征可以提供交感与副交感神经系统"战斗或逃跑"反应的重要信息,但仅依赖热成像来训练压力预测模型往往会导致过拟合,从而影响性能。本文通过提出ThermaStrain来应对这一挑战:这是一种新颖的协同教学(co-teaching)框架,通过将可穿戴模态的知识迁移到非接触的热成像模态来获得较高的压力预测性能。训练时,ThermaStrain借助可穿戴皮肤电活动(EDA)传感器,使热成像视频生成类似于EDA传感器的压力指示表示;测试时只使用热成像,从热数据及其仿真的EDA表示中提取压力指示模式以改进压力评估。研究在多种压力条件和距离下采集了包含热成像视频与EDA数据的综合数据集。ThermaStrain在二分类压力识别中取得0.8293的F1分数,比仅用热成像的基线方法高出9%以上。大量评估还表明ThermaStrain能够识别压力相关特征,适应不同距离与压力场景,可在边缘平台上实时运行,支持多人感知,并能在低能见度和陌生条件下工作,充分体现了协同教学方法的优势。
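
To make the co-teaching idea above concrete, the following is a minimal PyTorch sketch: during training a thermal encoder is pulled toward EDA-derived stress embeddings while a classifier is trained on the thermal branch, and at test time only thermal input is used. The module sizes, names, and the MSE emulation loss are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class ThermalEncoder(nn.Module):
    """Maps a thermal feature vector to a stress-indicative embedding."""
    def __init__(self, in_dim=256, emb_dim=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(), nn.Linear(128, emb_dim))
    def forward(self, x):
        return self.net(x)

class EDAEncoder(nn.Module):
    """Teacher branch: maps wearable EDA features into the same embedding space."""
    def __init__(self, in_dim=32, emb_dim=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, emb_dim))
    def forward(self, x):
        return self.net(x)

thermal_enc, eda_enc = ThermalEncoder(), EDAEncoder()
classifier = nn.Linear(64, 2)  # binary stress / no-stress
params = list(thermal_enc.parameters()) + list(eda_enc.parameters()) + list(classifier.parameters())
opt = torch.optim.Adam(params, lr=1e-3)

def training_step(thermal_x, eda_x, label, alpha=0.5):
    z_thermal = thermal_enc(thermal_x)
    z_eda = eda_enc(eda_x)
    # Supervised stress loss on the thermal branch.
    cls_loss = nn.functional.cross_entropy(classifier(z_thermal), label)
    # Co-teaching term: the thermal embedding emulates the EDA-derived one.
    emulate_loss = nn.functional.mse_loss(z_thermal, z_eda.detach())
    loss = cls_loss + alpha * emulate_loss
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

def predict(thermal_x):
    # At test time only contactless thermal sensing is available.
    with torch.no_grad():
        return classifier(thermal_enc(thermal_x)).argmax(dim=-1)

# toy batch
thermal_x, eda_x = torch.randn(8, 256), torch.randn(8, 32)
label = torch.randint(0, 2, (8,))
print(training_step(thermal_x, eda_x, label))
print(predict(thermal_x))
```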

Estimating Uncertainty in Multimodal Foundation Models using Public Internet Data

  • paper_url: http://arxiv.org/abs/2310.09926
  • repo_url: https://github.com/AlaaLab/WebCP
  • paper_authors: Shiladitya Dutta, Hongbo Wei, Lars van der Laan, Ahmed M. Alaa
  • for: 本文旨在量化零样本预测中的不确定性。
  • methods: 该方法在测试时用CLIP风格模型进行零样本分类,将提示模板同时用作搜索查询从公开网络获取校准数据,并引入一种新的符合性分数(conformity score)以考虑网络数据中可能的错误。
  • results: 初步结果表明,基于网络校准数据构建的共形预测集能在多种生物医学数据集上达到目标覆盖率,且效率令人满意。
    Abstract Foundation models are trained on vast amounts of data at scale using self-supervised learning, enabling adaptation to a wide range of downstream tasks. At test time, these models exhibit zero-shot capabilities through which they can classify previously unseen (user-specified) categories. In this paper, we address the problem of quantifying uncertainty in these zero-shot predictions. We propose a heuristic approach for uncertainty estimation in zero-shot settings using conformal prediction with web data. Given a set of classes at test time, we conduct zero-shot classification with CLIP-style models using a prompt template, e.g., "an image of a ", and use the same template as a search query to source calibration data from the open web. Given a web-based calibration set, we apply conformal prediction with a novel conformity score that accounts for potential errors in retrieved web data. We evaluate the utility of our proposed method in Biomedical foundation models; our preliminary results show that web-based conformal prediction sets achieve the target coverage with satisfactory efficiency on a variety of biomedical datasets.
    摘要 基础模型通过自监督学习在大规模数据上训练,能够适应各种下游任务。在测试时,这些模型具备零样本能力,可以对用户指定的、此前未见过的类别进行分类。本文研究如何量化这些零样本预测中的不确定性。我们提出了一种启发式方法,利用网络数据和共形预测(conformal prediction)在零样本设定下估计不确定性。给定测试时的一组类别,我们用CLIP风格的模型和提示模板(例如"一张<类别>的图像")进行零样本分类,并将同一模板用作搜索查询,从公开网络上获取校准数据。基于这一网络校准集,我们应用共形预测,并引入一种新的符合性分数来考虑检索到的网络数据中可能存在的错误。我们在生物医学基础模型上评估了该方法;初步结果表明,基于网络数据的共形预测集在多种生物医学数据集上能以令人满意的效率达到目标覆盖率。
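
Below is a minimal sketch of split conformal prediction with a web-mined calibration set, the general recipe the paper builds on. It uses the plain 1 - softmax nonconformity score rather than the paper's noise-aware conformity score, and the query-implied labels, array shapes, and toy data are assumptions for illustration.

```python
import numpy as np

def split_conformal_sets(cal_probs, cal_labels, test_probs, alpha=0.1):
    """
    cal_probs:  (n, K) zero-shot class probabilities for web-mined calibration images
    cal_labels: (n,)   labels implied by the search query used to retrieve each image
    test_probs: (m, K) zero-shot probabilities for the test images
    Returns a boolean (m, K) matrix: True where a class is in the prediction set.
    """
    n = len(cal_labels)
    # Nonconformity score: one minus the probability assigned to the (query) label.
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Finite-sample-corrected quantile.
    q_level = np.ceil((n + 1) * (1 - alpha)) / n
    qhat = np.quantile(scores, min(q_level, 1.0), method="higher")
    return (1.0 - test_probs) <= qhat

# toy usage with random probability vectors
rng = np.random.default_rng(0)
cal_probs = rng.dirichlet(np.ones(5), size=200)
cal_labels = rng.integers(0, 5, size=200)
test_probs = rng.dirichlet(np.ones(5), size=3)
print(split_conformal_sets(cal_probs, cal_labels, test_probs))
```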

Homophone Disambiguation Reveals Patterns of Context Mixing in Speech Transformers

  • paper_url: http://arxiv.org/abs/2310.09925
  • repo_url: https://github.com/hmohebbi/ContextMixingASR
  • paper_authors: Hosein Mohebbi, Grzegorz Chrupała, Willem Zuidema, Afra Alishahi
  • for: This paper aims to investigate how measures of ‘context-mixing’ developed for text models can be adapted and applied to models of spoken language, specifically in the case of homophony in French.
  • methods: The authors use a series of controlled experiments and probing analyses on Transformer-based speech models to explore how representations in encoder-only models and encoder-decoder models incorporate syntactic cues to identify the correct transcription.
  • results: The authors find that representations in encoder-only models effectively incorporate these cues, while encoders in encoder-decoder models mainly relegate the task of capturing contextual dependencies to decoder modules.
    Abstract Transformers have become a key architecture in speech processing, but our understanding of how they build up representations of acoustic and linguistic structure is limited. In this study, we address this gap by investigating how measures of 'context-mixing' developed for text models can be adapted and applied to models of spoken language. We identify a linguistic phenomenon that is ideal for such a case study: homophony in French (e.g. livre vs livres), where a speech recognition model has to attend to syntactic cues such as determiners and pronouns in order to disambiguate spoken words with identical pronunciations and transcribe them while respecting grammatical agreement. We perform a series of controlled experiments and probing analyses on Transformer-based speech models. Our findings reveal that representations in encoder-only models effectively incorporate these cues to identify the correct transcription, whereas encoders in encoder-decoder models mainly relegate the task of capturing contextual dependencies to decoder modules.
    摘要 听说模型已成为语音处理中关键的建筑,但我们对它们如何建立语音和文本结构的表示还是有限的。在这项研究中,我们尝试将文本模型中的'上下文混合'度量应用到语音模型中,以更好地理解它们如何建立表示。我们选择了一种语言现象,即法语中的同音异义(例如,"livre" vs "livres"),这种现象需要语音识别模型通过 determiners 和 Pronouns 等语法提示来纠正 spoken 词的意思,并且将其转录为句子中的正确形式。我们进行了一系列控制的实验和探索分析,发现encoder-only模型中的表示能够有效地捕捉这些语法提示,而encoder-decoder模型中的encoder模块主要通过decoder模块来捕捉上下文关系。

Predictive Maintenance Model Based on Anomaly Detection in Induction Motors: A Machine Learning Approach Using Real-Time IoT Data

  • paper_url: http://arxiv.org/abs/2310.14949
  • repo_url: None
  • paper_authors: Sergio F. Chevtchenko, Monalisa C. M. dos Santos, Diego M. Vieira, Ricardo L. Mota, Elisson Rocha, Bruna V. Cruz, Danilo Araújo, Ermeson Andrade
  • for: 本研究旨在通过物联网(IoT)设备采集设备退化数据,并运用数据驱动模型对工业设备进行异常检测。
  • methods: 本研究结合低计算开销的预处理技术与机器学习(ML)模型,包括快速傅里叶变换(FFT)、小波变换(WT)和分箱(binning)来提取特征,并通过多目标优化与分析,在异常检测率、误报率和推理速度之间取得最佳平衡。
  • results: 本研究获得了一系列的实验结果,证明了融合预处理技术和 ML 模型可以实现高精度异常检测,并且可以在不同的工业上适用。
    Abstract With the support of Internet of Things (IoT) devices, it is possible to acquire data from degradation phenomena and design data-driven models to perform anomaly detection in industrial equipment. This approach not only identifies potential anomalies but can also serve as a first step toward building predictive maintenance policies. In this work, we demonstrate a novel anomaly detection system on induction motors used in pumps, compressors, fans, and other industrial machines. This work evaluates a combination of pre-processing techniques and machine learning (ML) models with a low computational cost. We use a combination of pre-processing techniques such as Fast Fourier Transform (FFT), Wavelet Transform (WT), and binning, which are well-known approaches for extracting features from raw data. We also aim to guarantee an optimal balance between multiple conflicting parameters, such as anomaly detection rate, false positive rate, and inference speed of the solution. To this end, multiobjective optimization and analysis are performed on the evaluated models. Pareto-optimal solutions are presented to select which models have the best results regarding classification metrics and computational effort. Differently from most works in this field that use publicly available datasets to validate their models, we propose an end-to-end solution combining low-cost and readily available IoT sensors. The approach is validated by acquiring a custom dataset from induction motors. Also, we fuse vibration, temperature, and noise data from these sensors as the input to the proposed ML model. Therefore, we aim to propose a methodology general enough to be applied in different industrial contexts in the future.
    摘要 借助物联网(IoT)设备,可以采集设备退化过程中的数据,并设计数据驱动模型对工业设备进行异常检测。这种方法不仅能识别潜在异常,还可以作为建立预测性维护策略的第一步。在这项工作中,我们针对泵、压缩机、风机等工业机械中使用的感应电动机,展示了一个新颖的异常检测系统。该工作评估了一组低计算开销的预处理技术与机器学习(ML)模型的组合,其中预处理包括快速傅里叶变换(FFT)、小波变换(WT)和分箱等常用的特征提取方法。我们还希望在异常检测率、误报率和推理速度等相互冲突的指标之间取得最优平衡,为此对所评估的模型进行了多目标优化与分析,并给出帕累托最优解,以便在分类指标与计算开销之间选择最合适的模型。与该领域大多数使用公开数据集验证模型的工作不同,我们提出了一个结合低成本、易获取IoT传感器的端到端方案,并通过从感应电动机上采集自建数据集加以验证;我们将这些传感器的振动、温度和噪声数据融合后作为所提ML模型的输入。因此,我们希望提出的方法足够通用,未来可应用于不同的工业场景。
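
As a rough illustration of the feature-extraction-plus-ML pipeline described above, here is a sketch that bins the FFT magnitude spectrum of a vibration window into coarse bands and feeds the features to an off-the-shelf anomaly detector. The band count, synthetic fault signal, and the choice of IsolationForest are assumptions; the paper evaluates its own set of models and additionally uses wavelet features, sensor fusion, and multiobjective tuning.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

def fft_band_features(signal, n_bands=16):
    """Magnitude spectrum of a vibration window, binned into coarse frequency bands."""
    spectrum = np.abs(np.fft.rfft(signal * np.hanning(len(signal))))
    bands = np.array_split(spectrum, n_bands)
    return np.array([band.mean() for band in bands])

# Toy data: 'healthy' windows are 50 Hz-dominated; faults add a high-frequency component.
rng = np.random.default_rng(0)
t = np.arange(2048) / 10_000
healthy = [np.sin(2 * np.pi * 50 * t) + 0.1 * rng.standard_normal(t.size) for _ in range(200)]
faulty = [np.sin(2 * np.pi * 50 * t) + 0.8 * np.sin(2 * np.pi * 1200 * t)
          + 0.1 * rng.standard_normal(t.size) for _ in range(20)]

X_train = np.stack([fft_band_features(w) for w in healthy])
X_test = np.stack([fft_band_features(w) for w in healthy[:10] + faulty[:10]])

detector = IsolationForest(random_state=0).fit(X_train)
print(detector.predict(X_test))  # +1 = predicted normal, -1 = predicted anomaly
```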

Lifelong Sequence Generation with Dynamic Module Expansion and Adaptation

  • paper_url: http://arxiv.org/abs/2310.09886
  • repo_url: None
  • paper_authors: Chengwei Qin, Chen Chen, Shafiq Joty
  • for: 解决 continual learning 中的 Life-long Sequence Generation (LSG) 问题,即在不断训练模型的同时,总结出来的新生成模式,而不是忘记之前的知识。
  • methods: 我们提出了 Dynamic Module Expansion and Adaptation (DMEA) 方法,即在任务相似性的基础上动态决定模型需要的架构,并选择最相似的先前任务来促进新任务的适应性。同时,我们还提出了动态梯度缩放,以保持当前任务和先前任务的学习平衡。
  • results: 通过广泛的实验,我们示出了 DMEA 可以在不同的 LSG 设定下表现出色,常常超越现有的方法。
    Abstract Lifelong sequence generation (LSG), a problem in continual learning, aims to continually train a model on a sequence of generation tasks to learn constantly emerging new generation patterns while avoiding the forgetting of previous knowledge. Existing LSG methods mainly focus on maintaining old knowledge while paying little attention to knowledge transfer across tasks. In contrast, humans can better learn new tasks by leveraging previously acquired knowledge from similar tasks. Inspired by the learning paradigm of humans, we propose Dynamic Module Expansion and Adaptation (DMEA), which enables the model to dynamically determine the architecture for acquiring new knowledge based on task correlation and select the most similar previous tasks to facilitate adaptation to new tasks. In addition, as the learning process can easily be biased towards the current task which might cause more severe forgetting of previously learned knowledge, we propose dynamic gradient scaling to balance the learning of the current task and replayed tasks. With extensive experiments, we demonstrate that DMEA can consistently outperform existing methods in different LSG settings.
    摘要 这是一个生命长序列生成(LSG)问题,它是一种持续学习的问题,旨在不断训练一个模型,以学习不断出现的新生成模式,而且避免遗传知识的忘记。现有的LSG方法主要是维护古代知识,对任务之间的知识传递甚少关注。然而,人类在学习新任务时,可以更好地利用先前所获得的知识,以便更好地适应新任务。受人类学习模式启发,我们提出了动态模组扩展和适应(DMEA)方法,让模型在任务相似度和先前任务之间进行动态决定模组架构,并选择最相似的先前任务来促进新任务的适应。此外,当学习过程可能会偏向现在任务,导致更严重的知识忘记,我们提出了动态GradientScaling来均衡现在任务和重复任务的学习。经过广泛的实验,我们证明了DMEA可以在不同的LSG设定中具有优秀的表现。

In-Context Learning with Iterative Demonstration Selection

  • paper_url: http://arxiv.org/abs/2310.09881
  • repo_url: None
  • paper_authors: Chengwei Qin, Aston Zhang, Anirudh Dagar, Wenming Ye
  • for: 提高大语言模型(LLM)在几个示例下学习中的表现。
  • methods: 迭代示例选择(Iterative Demonstration Selection, IDS)方法,利用零样本思维链推理(Zero-shot-CoT)选择示例,并在多轮迭代中选出最合适的示例。
  • results: 在多个任务上,包括通用理解、问答、话题分类和情感分析,IDS方法可以一直 exceed 现有的ICL示例选择方法。
    Abstract Spurred by advancements in scale, large language models (LLMs) have demonstrated strong few-shot learning ability via in-context learning (ICL). However, the performance of ICL has been shown to be highly sensitive to the selection of few-shot demonstrations. Selecting the most suitable examples as context remains an ongoing challenge and an open problem. Existing literature has highlighted the importance of selecting examples that are diverse or semantically similar to the test sample while ignoring the fact that the optimal selection dimension, i.e., diversity or similarity, is task-specific. Leveraging the merits of both dimensions, we propose Iterative Demonstration Selection (IDS). Using zero-shot chain-of-thought reasoning (Zero-shot-CoT), IDS iteratively selects examples that are diverse but still strongly correlated with the test sample as ICL demonstrations. Specifically, IDS applies Zero-shot-CoT to the test sample before demonstration selection. The output reasoning path is then used to choose demonstrations that are prepended to the test sample for inference. The generated answer is accompanied by its corresponding reasoning path for extracting a new set of demonstrations in the next iteration. After several iterations, IDS adopts majority voting to obtain the final result. Through extensive experiments on tasks including commonsense reasoning, question answering, topic classification, and sentiment analysis, we demonstrate that IDS can consistently outperform existing ICL demonstration selection methods.
    摘要 促进了规模的进步,大语言模型(LLM)在内容学习(ICL)中表现出了强大的几个示例学习能力。然而,ICL表现的选择示例仍然是一个持续的挑战和开放问题。现有的文献强调选择测试样本中的多样化或semantic相似的示例,而忽略了任务特定的最佳选择维度。基于这两个维度的优点,我们提出了迭代示例选择(IDS)。IDS使用零实例链条思维(Zero-shot-CoT)来选择示例,其中逻辑路径是在测试样本之前应用于测试样本。然后,选择的示例将被附加到测试样本中进行INF的推理。生成的答案将被 accompanied by its corresponding reasoning path,以提取新的示例集。经过多轮迭代,IDS采用多数投票方式获得最终结果。我们通过对常识推理、问答、话题分类和情感分析等任务进行广泛的实验,证明IDS可以一直性能高于现有的ICL示例选择方法。
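
The iterative loop described in the abstract can be sketched as follows. The `llm_generate` client and the similarity-based `select_demos` retrieval are placeholders (assumptions, not the paper's code); a real implementation would embed the reasoning path and the candidate pool to pick demonstrations that are both diverse and correlated with the test sample.

```python
import random
from collections import Counter

def llm_generate(prompt: str) -> str:
    """Placeholder for an LLM call (e.g. an API client); assumed, not part of the paper."""
    raise NotImplementedError

def zero_shot_cot(question: str) -> str:
    # Elicit a reasoning path without any demonstrations.
    return llm_generate(f"{question}\nLet's think step by step.")

def select_demos(reasoning: str, pool: list, k: int = 4) -> list:
    """Pick k pool examples related to the reasoning path; here only the interface is
    sketched -- random choice stands in for embedding-based similarity retrieval."""
    return random.sample(pool, min(k, len(pool)))

def ids_answer(question: str, pool: list, iterations: int = 3) -> str:
    answers, reasoning = [], zero_shot_cot(question)
    for _ in range(iterations):
        demos = select_demos(reasoning, pool)
        prompt = "".join(f"Q: {d['q']}\nA: {d['a']}\n\n" for d in demos)
        prompt += f"Q: {question}\nA: Let's think step by step."
        output = llm_generate(prompt)
        answers.append(output.strip().splitlines()[-1])  # final line as the answer
        reasoning = output                               # reuse the new reasoning path
    return Counter(answers).most_common(1)[0][0]         # majority vote over iterations
```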

Statistical inference using machine learning and classical techniques based on accumulated local effects (ALE)

  • paper_url: http://arxiv.org/abs/2310.09877
  • repo_url: None
  • paper_authors: Chitu Okoli
  • for: 这篇论文主要是为了提出一种model-agnostic的方法来进行黑盒机器学习(ML)算法的全面解释。
  • methods: 这篇论文基于模型无关的累积局部效应(ALE)方法进行解释,提出了针对小样本问题的统计推断工具,并引入了直观刻画变量整体效应的ALE效应量度量。
  • results: 论文给出了一系列实用方案,包括保证ALE分析可靠性的自助法置信区间,以及直观反映变量整体效应的效应量度量;这些工具有助于基于ML的数据分析与统计推断,并已在R的'ale'包中实现。
    Abstract Accumulated Local Effects (ALE) is a model-agnostic approach for global explanations of the results of black-box machine learning (ML) algorithms. There are at least three challenges with conducting statistical inference based on ALE: ensuring the reliability of ALE analyses, especially in the context of small datasets; intuitively characterizing a variable's overall effect in ML; and making robust inferences from ML data analysis. In response, we introduce innovative tools and techniques for statistical inference using ALE, establishing bootstrapped confidence intervals tailored to dataset size and introducing ALE effect size measures that intuitively indicate effects on both the outcome variable scale and a normalized scale. Furthermore, we demonstrate how to use these tools to draw reliable statistical inferences, reflecting the flexible patterns ALE adeptly highlights, with implementations available in the 'ale' package in R. This work propels the discourse on ALE and its applicability in ML and statistical analysis forward, offering practical solutions to prevailing challenges in the field.
    摘要 集成本地效应(ALE)是一种模型不依赖的方法,用于全面解释黑盒机器学习(ML)算法的结果。在进行统计推断基于ALE时,存在至少三个挑战:确保ALE分析的可靠性,特别是在小数据集中;Intuitively characterize a variable's overall effect in ML;和从ML数据分析中获得可靠的推断。为此,我们介绍了新的工具和技术,用于基于ALE的统计推断,包括适应 dataset 大小的 bootstrap 信任区间和 ALE 效果大小度量,这些度量可以直观地反映变量对结果变量的影响和Normalized 比例。此外,我们示例了如何使用这些工具来提取可靠的统计推断,反映 ALE 灵活地高亮的各种模式,R 中的 'ale' 包提供了实现。这项工作推动了 ALE 在 ML 和统计分析领域的应用前进,提供了实用的解决方案,用于解决领域中的挑战。
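
For readers unfamiliar with ALE, the sketch below computes a first-order ALE curve for one feature of a fitted model and wraps it in a percentile bootstrap to obtain a confidence band, which is the general idea behind the paper's size-tailored intervals. It is written in Python for illustration only; the paper's implementation is the 'ale' package in R, and the bin count, model, and bootstrap size here are arbitrary choices.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def ale_1d(model, X, feature, edges):
    """First-order accumulated local effects (ALE) of one feature for a fitted model."""
    x = X[:, feature]
    idx = np.clip(np.digitize(x, edges[1:-1], right=True), 0, len(edges) - 2)
    effects, counts = [], []
    for k in range(len(edges) - 1):
        rows = X[idx == k]
        if len(rows) == 0:
            effects.append(0.0); counts.append(0); continue
        lo, hi = rows.copy(), rows.copy()
        lo[:, feature], hi[:, feature] = edges[k], edges[k + 1]
        effects.append(float(np.mean(model.predict(hi) - model.predict(lo))))
        counts.append(len(rows))
    ale = np.cumsum(effects)
    return ale - np.average(ale, weights=np.array(counts) + 1e-12)  # center the curve

def bootstrap_ale_band(X, y, feature, n_bins=10, n_boot=30, seed=0):
    """Percentile-bootstrap confidence band for the ALE curve (refit on each resample)."""
    edges = np.unique(np.quantile(X[:, feature], np.linspace(0, 1, n_bins + 1)))
    rng = np.random.default_rng(seed)
    curves = []
    for _ in range(n_boot):
        take = rng.integers(0, len(X), len(X))
        model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X[take], y[take])
        curves.append(ale_1d(model, X[take], feature, edges))
    return edges[1:], np.percentile(curves, [2.5, 97.5], axis=0)

# toy data with a nonlinear effect of feature 0
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = np.sin(2 * X[:, 0]) + 0.5 * X[:, 1] + 0.1 * rng.normal(size=500)
grid, (lower, upper) = bootstrap_ale_band(X, y, feature=0)
print(np.round(lower, 2)); print(np.round(upper, 2))
```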

Federated Multi-Objective Learning

  • paper_url: http://arxiv.org/abs/2310.09866
  • repo_url: https://github.com/Zakaria-Dahi/Multi-Objective_Optimiser_For_Federated_Learning
  • paper_authors: Haibo Yang, Zhuqing Liu, Jia Liu, Chaosheng Dong, Michinari Momma
  • for: Multi-agent multi-task learning applications with distributed nature and data privacy needs.
  • methods: Federated multi-objective learning (FMOL) framework with multiple clients distributively and collaboratively solving an MOO problem while keeping their training data private.
  • results: Proposed two new federated multi-objective optimization (FMOO) algorithms called federated multi-gradient descent averaging (FMGDA) and federated stochastic multi-gradient descent averaging (FSMGDA), which allow local updates to significantly reduce communication costs, while achieving the same convergence rates as those of their algorithmic counterparts in the single-objective federated learning.
    Abstract In recent years, multi-objective optimization (MOO) emerges as a foundational problem underpinning many multi-agent multi-task learning applications. However, existing algorithms in MOO literature remain limited to centralized learning settings, which do not satisfy the distributed nature and data privacy needs of such multi-agent multi-task learning applications. This motivates us to propose a new federated multi-objective learning (FMOL) framework with multiple clients distributively and collaboratively solving an MOO problem while keeping their training data private. Notably, our FMOL framework allows a different set of objective functions across different clients to support a wide range of applications, which advances and generalizes the MOO formulation to the federated learning paradigm for the first time. For this FMOL framework, we propose two new federated multi-objective optimization (FMOO) algorithms called federated multi-gradient descent averaging (FMGDA) and federated stochastic multi-gradient descent averaging (FSMGDA). Both algorithms allow local updates to significantly reduce communication costs, while achieving the {\em same} convergence rates as those of their algorithmic counterparts in the single-objective federated learning. Our extensive experiments also corroborate the efficacy of our proposed FMOO algorithms.
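
A highly simplified sketch of an FMGDA-style update follows: each client reports one gradient per objective, the server averages them per objective, and a min-norm (MGDA) combination yields a common descent direction. The two-objective closed form, single local step, and toy quadratic objectives are simplifying assumptions; the paper's algorithms additionally use multiple local updates to cut communication (and stochastic gradients in FSMGDA).

```python
import numpy as np

def min_norm_two(g1, g2):
    """Closed-form MGDA weight for two objectives: the min-norm convex combination."""
    diff = g1 - g2
    denom = diff @ diff
    lam = 0.5 if denom == 0 else np.clip((g2 - g1) @ g2 / denom, 0.0, 1.0)
    return lam * g1 + (1 - lam) * g2

def fmgda_round(w, clients, grad_fns, lr=0.1):
    """One round (sketch): every client sends one gradient per objective, the server
    averages them per objective, then steps along a common descent direction."""
    g = [np.mean([grad_fn(w, c) for c in clients], axis=0) for grad_fn in grad_fns]
    return w - lr * min_norm_two(g[0], g[1])

# Toy problem: objective 1 pulls w toward each client's a_i, objective 2 toward b_i.
rng = np.random.default_rng(0)
clients = [{"a": rng.normal(size=3), "b": rng.normal(size=3) + 2.0} for _ in range(5)]
grad_fns = [lambda w, c: w - c["a"], lambda w, c: w - c["b"]]
w = np.zeros(3)
for _ in range(100):
    w = fmgda_round(w, clients, grad_fns)
print(w)  # settles between the two objectives' minimizers (a Pareto-stationary point)
```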

Federated Reinforcement Learning for Resource Allocation in V2X Networks

  • paper_url: http://arxiv.org/abs/2310.09858
  • repo_url: None
  • paper_authors: Kaidi Xu, Shenglong Zhou, Geoffrey Ye Li
  • for: 本论文研究车联网(V2X, vehicle-to-everything)网络中资源分配的优化方法。
  • methods: 本论文在联邦强化学习(FRL)框架下,采用非精确交替方向乘子法(ADMM)求解资源分配问题,其子问题用策略梯度近似求解,并以自适应步长加速。
  • results: 结果表明,所提出的PASM算法在温和条件下收敛,并在V2X资源分配问题上较若干基线方法取得更好的数值表现。
    Abstract Resource allocation significantly impacts the performance of vehicle-to-everything (V2X) networks. Most existing algorithms for resource allocation are based on optimization or machine learning (e.g., reinforcement learning). In this paper, we explore resource allocation in a V2X network under the framework of federated reinforcement learning (FRL). On one hand, the usage of RL overcomes many challenges from the model-based optimization schemes. On the other hand, federated learning (FL) enables agents to deal with a number of practical issues, such as privacy, communication overhead, and exploration efficiency. The framework of FRL is then implemented by the inexact alternative direction method of multipliers (ADMM), where subproblems are solved approximately using policy gradients and accelerated by an adaptive step size calculated from their second moments. The developed algorithm, PASM, is proven to be convergent under mild conditions and has a nice numerical performance compared with some baseline methods for solving the resource allocation problem in a V2X network.
    摘要 资源分配对车联网(V2X)网络的性能有重要影响。现有的资源分配算法大多基于优化或机器学习(例如强化学习)。本文在联邦强化学习(FRL)框架下研究V2X网络中的资源分配问题。一方面,使用强化学习可以避免基于模型的优化方案所面临的许多挑战;另一方面,联邦学习(FL)使各智能体能够应对隐私、通信开销和探索效率等一系列实际问题。随后,我们用非精确交替方向乘子法(ADMM)实现FRL框架,其中子问题通过策略梯度近似求解,并利用由其二阶矩计算得到的自适应步长加速。所提出的PASM算法在温和条件下被证明是收敛的,并且与求解V2X网络资源分配问题的若干基线方法相比具有良好的数值性能。

MERTech: Instrument Playing Technique Detection Using Self-Supervised Pretrained Model With Multi-Task Finetuning

  • paper_url: http://arxiv.org/abs/2310.09853
  • repo_url: None
  • paper_authors: Dichucheng Li, Yinghao Ma, Weixing Wei, Qiuqiang Kong, Yulun Wu, Mingjin Che, Fan Xia, Emmanouil Benetos, Wei Li
  • for: 本研究旨在提出一种自动检测乐器演奏技巧(IPT)的方法,以解决数据稀缺和类别不均匀问题。
  • methods: 该方法先在大规模无标注音乐数据上对自监督模型进行预训练,再在IPT检测任务上进行微调;此外还研究了以音高检测和起始点(onset)检测为辅助任务的多任务微调,并采用基于onset输出的事件级后处理。
  • results: 该方法在多个IPT标准测试集上比过去的方法表现出色,在 Frame-level和事件-level度量中均显示出优异性。此外,多任务融合finetuning也能够提高每个IPT类别的准确率。
    Abstract Instrument playing techniques (IPTs) constitute a pivotal component of musical expression. However, the development of automatic IPT detection methods suffers from limited labeled data and inherent class imbalance issues. In this paper, we propose to apply a self-supervised learning model pre-trained on large-scale unlabeled music data and finetune it on IPT detection tasks. This approach addresses data scarcity and class imbalance challenges. Recognizing the significance of pitch in capturing the nuances of IPTs and the importance of onset in locating IPT events, we investigate multi-task finetuning with pitch and onset detection as auxiliary tasks. Additionally, we apply a post-processing approach for event-level prediction, where an IPT activation initiates an event only if the onset output confirms an onset in that frame. Our method outperforms prior approaches in both frame-level and event-level metrics across multiple IPT benchmark datasets. Further experiments demonstrate the efficacy of multi-task finetuning on each IPT class.
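
The event-level post-processing rule ("an IPT activation initiates an event only if the onset output confirms an onset in that frame") can be sketched directly; the thresholds and toy probability curves below are illustrative assumptions.

```python
import numpy as np

def decode_ipt_events(ipt_prob, onset_prob, ipt_thr=0.5, onset_thr=0.5):
    """Turn frame-wise IPT activations into events, opening an event only when the
    onset head also fires in the frame where the activation starts.
    Returns (start_frame, end_frame) pairs."""
    active = ipt_prob >= ipt_thr
    events, start = [], None
    for t, a in enumerate(active):
        if a and start is None:
            if onset_prob[t] >= onset_thr:   # onset must confirm the event start
                start = t
        elif not a and start is not None:
            events.append((start, t))
            start = None
    if start is not None:
        events.append((start, len(active)))
    return events

ipt = np.array([0.1, 0.7, 0.8, 0.2, 0.9, 0.9, 0.1])
onset = np.array([0.0, 0.9, 0.1, 0.0, 0.2, 0.1, 0.0])
print(decode_ipt_events(ipt, onset))  # [(1, 3)] -- the second activation lacks an onset
```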

ACES: Generating Diverse Programming Puzzles with Autotelic Language Models and Semantic Descriptors

  • paper_url: http://arxiv.org/abs/2310.10692
  • repo_url: None
  • paper_authors: Julien Pourcel, Cédric Colas, Pierre-Yves Oudeyer, Laetitia Teodorescu
  • For: studying automated problem generation in the context of python programming puzzles, with a focus on interesting diversity optimization.
  • Methods: using semantic descriptors produced by a large language model (LLM) to directly optimize for interesting diversity, as well as few-shot-based generation.
  • Results: discovering a richer diversity of puzzles than existing diversity-maximizing algorithms, as measured across a range of diversity metrics.
    Abstract Finding and selecting new and interesting problems to solve is at the heart of curiosity, science and innovation. We here study automated problem generation in the context of the open-ended space of python programming puzzles. Existing generative models often aim at modeling a reference distribution without any explicit diversity optimization. Other methods explicitly optimizing for diversity do so either in limited hand-coded representation spaces or in uninterpretable learned embedding spaces that may not align with human perceptions of interesting variations. With ACES (Autotelic Code Exploration via Semantic descriptors), we introduce a new autotelic generation method that leverages semantic descriptors produced by a large language model (LLM) to directly optimize for interesting diversity, as well as few-shot-based generation. Each puzzle is labeled along 10 dimensions, each capturing a programming skill required to solve it. ACES generates and pursues novel and feasible goals to explore that abstract semantic space, slowly discovering a diversity of solvable programming puzzles in any given run. Across a set of experiments, we show that ACES discovers a richer diversity of puzzles than existing diversity-maximizing algorithms as measured across a range of diversity metrics. We further study whether and in which conditions this diversity can translate into the successful training of puzzle solving models.
    摘要 寻找和选择新领域的问题是感知、科学和创新的核心。我们在python编程练习中的开放式空间中研究自动生成问题。现有的生成模型通常是模型参考分布而不是直接优化多样性。其他方法通过手动编码的表示空间或学习的嵌入空间来显式地优化多样性,但这些方法可能并不与人类的意义变化相匹配。我们在ACES(自动telic代码探索 via 语义描述符)中引入了一种新的自动telic生成方法,利用大语言模型生成的语义描述符直接优化有趣的多样性,以及几招学习。每个练习都被标记了10个维度,每个维度捕捉一个需要解决它的编程技能。ACES生成和追求新的可行目标,慢慢发现任务抽象 semantic空间中的多样性,在任务执行中逐渐发现可解决的编程练习。在一系列实验中,我们发现ACES在多样性度量上比现有的多样性最大化算法更加丰富。我们进一步研究是否和在哪些条件下,这种多样性可以导致练习解决模型的成功培训。

CoCoFormer: A controllable feature-rich polyphonic music generation method

  • paper_url: http://arxiv.org/abs/2310.09843
  • repo_url: None
  • paper_authors: Jiuyang Zhou, Tengfei Niu, Hong Zhu, Xingping Wang
  • for: 本研究探讨复调音乐序列的建模方法,尤其是使用 Transformer 模型进行可控音乐生成。
  • methods: 本研究提出了 Condition Choir Transformer(CoCoFormer)模型,通过在细粒度上控制和弦与节奏输入来控制模型输出,并结合自监督方法改进损失函数,进行条件输入与无条件输入的联合训练。
  • results: 实验表明,在指定复调音乐织体的前提下,CoCoFormer 可以以多种方式生成同一旋律,性能优于现有模型。
    Abstract This paper explores the modeling method of polyphonic music sequence. Due to the great potential of Transformer models in music generation, controllable music generation is receiving more attention. In the task of polyphonic music, current controllable generation research focuses on controlling the generation of chords, but lacks precise adjustment for the controllable generation of choral music textures. This paper proposed Condition Choir Transformer (CoCoFormer) which controls the output of the model by controlling the chord and rhythm inputs at a fine-grained level. In this paper, the self-supervised method improves the loss function and performs joint training through conditional control input and unconditional input training. In order to alleviate the lack of diversity on generated samples caused by the teacher forcing training, this paper added an adversarial training method. CoCoFormer enhances model performance with explicit and implicit inputs to chords and rhythms. In this paper, the experiments proves that CoCoFormer has reached the current better level than current models. On the premise of specifying the polyphonic music texture, the same melody can also be generated in a variety of ways.

Explaining How a Neural Network Play the Go Game and Let People Learn

  • paper_url: http://arxiv.org/abs/2310.09838
  • repo_url: None
  • paper_authors: Huilin Zhou, Huijie Tang, Mingjie Li, Hao Zhang, Zhenyu Liu, Quanshi Zhang
  • for: 本研究的目的是解释Go游戏中AI模型所编码的知识,并使用这些知识来教育人类玩家。
  • methods: 本研究使用了Value网络来提取Go游戏中石头之间的交互 primitives,以便人类可以从Value网络中学习准确和可靠的知识。
  • results: 实验表明,我们的方法可以有效地提取Go游戏中AI模型所编码的知识,并帮助人类玩家更好地理解和掌握Go游戏。
    Abstract The AI model has surpassed human players in the game of Go, and it is widely believed that the AI model has encoded new knowledge about the Go game beyond human players. In this way, explaining the knowledge encoded by the AI model and using it to teach human players represent a promising-yet-challenging issue in explainable AI. To this end, mathematical supports are required to ensure that human players can learn accurate and verifiable knowledge, rather than specious intuitive analysis. Thus, in this paper, we extract interaction primitives between stones encoded by the value network for the Go game, so as to enable people to learn from the value network. Experiments show the effectiveness of our method.
    摘要 人工智能模型已经在围棋游戏中超越人类玩家,而且广泛认为该模型已经编码了人类玩家之外的新知识。因此,解释AI模型所编码的知识并使用其教育人类玩家是一项有前途又挑战的问题。为此,我们需要有数学支持,以确保人类玩家可以学习准确和可靠的知识,而不是基于假设的直觉分析。在本文中,我们提取了围棋中石头之间的互动基本原理,以便让人类玩家从值网络中学习。实验表明我们的方法的效果。

MIR2: Towards Provably Robust Multi-Agent Reinforcement Learning by Mutual Information Regularization

  • paper_url: http://arxiv.org/abs/2310.09833
  • repo_url: None
  • paper_authors: Simin Li, Ruixiao Xu, Jun Guo, Pu Feng, Jiakai Wang, Aishan Liu, Yaodong Yang, Xianglong Liu, Weifeng Lv
  • for: 本文旨在提出一种鲁棒的多智能体强化学习(MARL)方法,以增强对未知队友不确定或最坏情况行为的抗性。
  • methods: 该方法在常规场景中训练策略,并将历史与动作之间的互信息最小化作为鲁棒正则项,以避免对最坏情况的过度悲观。
  • results: 在 StarCraft II、Multi-agent MuJoCo 和 rendezvous 等场景中,MIR2 表现出比最大-最小优化更强的鲁棒性,并在真实世界机器人集群控制场景中同样表现优异。
    Abstract Robust multi-agent reinforcement learning (MARL) necessitates resilience to uncertain or worst-case actions by unknown allies. Existing max-min optimization techniques in robust MARL seek to enhance resilience by training agents against worst-case adversaries, but this becomes intractable as the number of agents grows, leading to exponentially increasing worst-case scenarios. Attempts to simplify this complexity often yield overly pessimistic policies, inadequate robustness across scenarios and high computational demands. Unlike these approaches, humans naturally learn adaptive and resilient behaviors without the necessity of preparing for every conceivable worst-case scenario. Motivated by this, we propose MIR2, which trains policy in routine scenarios and minimize Mutual Information as Robust Regularization. Theoretically, we frame robustness as an inference problem and prove that minimizing mutual information between histories and actions implicitly maximizes a lower bound on robustness under certain assumptions. Further analysis reveals that our proposed approach prevents agents from overreacting to others through an information bottleneck and aligns the policy with a robust action prior. Empirically, our MIR2 displays even greater resilience against worst-case adversaries than max-min optimization in StarCraft II, Multi-agent Mujoco and rendezvous. Our superiority is consistent when deployed in challenging real-world robot swarm control scenario. See code and demo videos in Supplementary Materials.
    摘要 鲁棒的多智能体强化学习(MARL)要求智能体能够应对未知队友的不确定或最坏情况行为。现有鲁棒MARL中的最大-最小优化技术通过让智能体与最坏情况对手对抗训练来增强鲁棒性,但随着智能体数量增加,最坏情况的数量呈指数增长,使该方法难以处理;简化这种复杂性的尝试往往得到过于悲观的策略、跨场景鲁棒性不足以及高昂的计算开销。与这些方法不同,人类能够自然地学习适应性强、有韧性的行为,而无需为每一种可能的最坏情况做准备。受此启发,我们提出了MIR2:在常规场景中训练策略,并将互信息最小化作为鲁棒正则项。在理论上,我们将鲁棒性表述为一个推断问题,并证明在一定假设下,最小化历史与动作之间的互信息会隐式地最大化鲁棒性的一个下界。进一步的分析表明,所提方法通过信息瓶颈防止智能体对其他智能体的行为过度反应,并使策略与一个鲁棒的动作先验保持一致。实验表明,在星际争霸II、Multi-agent MuJoCo和rendezvous等环境中,MIR2面对最坏情况对手的鲁棒性甚至优于最大-最小优化方法;在具有挑战性的真实世界机器人集群控制场景中,这一优势同样成立。补充材料中提供了代码与演示视频。
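
A minimal sketch of mutual-information regularization inside a policy-gradient loss is shown below. It relies on the standard variational bound I(history; action) <= E_h[KL(pi(.|h) || r(.))] for any action prior r, so penalizing that KL term pushes the policy toward less history-dependent, less over-reactive behavior. The uniform prior, coefficient, and loss form are assumptions rather than the authors' exact estimator.

```python
import torch
import torch.nn.functional as F

def mi_regularized_loss(logits, actions, advantages, prior_logits, beta=0.05):
    """Policy-gradient loss plus a KL penalty to an action prior, which upper-bounds
    the mutual information between histories and actions (a sketch, not MIR2 itself)."""
    log_probs = F.log_softmax(logits, dim=-1)
    pg_loss = -(advantages * log_probs.gather(-1, actions.unsqueeze(-1)).squeeze(-1)).mean()
    kl = (log_probs.exp() * (log_probs - F.log_softmax(prior_logits, dim=-1))).sum(-1).mean()
    return pg_loss + beta * kl

# toy tensors: a batch of 4 decisions over 3 actions
logits = torch.randn(4, 3, requires_grad=True)
actions = torch.tensor([0, 2, 1, 1])
advantages = torch.tensor([1.0, -0.5, 0.3, 0.8])
prior = torch.zeros(4, 3)  # uniform action prior
loss = mi_regularized_loss(logits, actions, advantages, prior)
loss.backward()
```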

Large Language Models for In-Context Student Modeling: Synthesizing Student’s Behavior in Visual Programming from One-Shot Observation

  • paper_url: http://arxiv.org/abs/2310.10690
  • repo_url: None
  • paper_authors: Manh Hung Nguyen, Sebastian Tschiatschek, Adish Singla
  • for: This paper is written for researchers and practitioners in the field of educational technology, particularly those interested in student modeling and personalized learning.
  • methods: The paper explores the use of Large Language Models (LLMs) for in-context student modeling in open-ended learning environments. The proposed framework, LLM-SS, leverages LLMs to synthesize a student’s behavior based on their solving attempts on a reference task. The authors fine-tune LLMs using domain-specific expertise to improve their understanding of domain background and student behaviors.
  • results: The paper reports significant improvements in student behavior synthesis compared to baseline methods included in the StudentSyn benchmark. Specifically, the method using the fine-tuned Llama2-70B model improves noticeably compared to using the base model and becomes on par with using the state-of-the-art GPT-4 model.
    Abstract Student modeling is central to many educational technologies as it enables the prediction of future learning outcomes and targeted instructional strategies. However, open-ended learning environments pose challenges for accurately modeling students due to the diverse behaviors exhibited by students and the absence of a well-defined set of learning skills. To approach these challenges, we explore the application of Large Language Models (LLMs) for in-context student modeling in open-ended learning environments. We introduce a novel framework, LLM-SS, that leverages LLMs for synthesizing student's behavior. More concretely, given a particular student's solving attempt on a reference task as observation, the goal is to synthesize the student's attempt on a target task. Our framework can be combined with different LLMs; moreover, we fine-tune LLMs using domain-specific expertise to boost their understanding of domain background and student behaviors. We evaluate several concrete methods based on LLM-SS using the StudentSyn benchmark, an existing student's attempt synthesis benchmark in visual programming. Experimental results show a significant improvement compared to baseline methods included in the StudentSyn benchmark. Furthermore, our method using the fine-tuned Llama2-70B model improves noticeably compared to using the base model and becomes on par with using the state-of-the-art GPT-4 model.

Optimizing K-means for Big Data: A Comparative Study

  • paper_url: http://arxiv.org/abs/2310.09819
  • repo_url: None
  • paper_authors: Ravil Mussabayev, Rustam Mussabayev
  • for: 这篇论文旨在比较不同优化技术对K-means算法的应用在大数据场景中的影响。
  • methods: 论文描述了不同的优化技术,包括并行、简化、采样等方法,以解决K-means算法在大数据场景中的缺乏扩展性问题。
  • results: 作者通过对各种标准数据集进行比较,发现不同的技术在不同的数据集上的表现不同,并提供了关于速度和准确性之间的负担平衡的理解。
    Abstract This paper presents a comparative analysis of different optimization techniques for the K-means algorithm in the context of big data. K-means is a widely used clustering algorithm, but it can suffer from scalability issues when dealing with large datasets. The paper explores different approaches to overcome these issues, including parallelization, approximation, and sampling methods. The authors evaluate the performance of these techniques on various benchmark datasets and compare them in terms of speed, quality of clustering, and scalability according to the LIMA dominance criterion. The results show that different techniques are more suitable for different types of datasets and provide insights into the trade-offs between speed and accuracy in K-means clustering for big data. Overall, the paper offers a comprehensive guide for practitioners and researchers on how to optimize K-means for big data applications.
    摘要 本文对大数据场景下K-means算法的不同优化技术进行了比较分析。K-means是一种广泛使用的聚类算法,但在处理大规模数据集时可能面临可扩展性问题。本文探讨了克服这些问题的不同途径,包括并行化、近似和采样方法。作者在多个基准数据集上评估了这些技术,并依据LIMA支配准则从速度、聚类质量和可扩展性三个方面进行比较。结果显示,不同技术适用于不同类型的数据集,并揭示了大数据K-means聚类中速度与精度之间的权衡。总体而言,本文为从业者和研究者提供了一份在大数据应用中优化K-means的全面指南。
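
One family of techniques such a comparison covers is sampling-based acceleration; the sketch below contrasts full-batch k-means with scikit-learn's mini-batch variant on synthetic data. The dataset size and parameters are arbitrary assumptions, and the paper's LIMA dominance comparison is not reproduced here.

```python
import numpy as np
from sklearn.cluster import KMeans, MiniBatchKMeans
from sklearn.datasets import make_blobs
from time import perf_counter

X, _ = make_blobs(n_samples=200_000, centers=10, n_features=16, random_state=0)

for name, model in [("full-batch", KMeans(n_clusters=10, n_init=3, random_state=0)),
                    ("mini-batch", MiniBatchKMeans(n_clusters=10, batch_size=4096,
                                                   n_init=3, random_state=0))]:
    t0 = perf_counter()
    model.fit(X)
    # Compare clustering quality (inertia) against wall-clock training time.
    print(f"{name:10s}  inertia={model.inertia_:.3e}  time={perf_counter() - t0:.2f}s")
```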

Negative Sampling with Adaptive Denoising Mixup for Knowledge Graph Embedding

  • paper_url: http://arxiv.org/abs/2310.09781
  • repo_url: https://github.com/DeMix2023/Demix
  • paper_authors: Xiangnan Chen, Wen Zhang, Zhen Yao, Mingyang Chen, Siliang Tang
  • for: 本研究旨在提高知识图(KG)中entity和relation embedding的质量,通过减少负样本中的噪声。
  • methods: 提议使用一种混合策略,通过自我supervised的方式来更新负样本,从而提高KGE的训练效果。
  • results: 实验结果表明,提议的DeMix方法可以更好地减少负样本中的噪声,使KGE更快地训练到更好的链接预测结果。
    Abstract Knowledge graph embedding (KGE) aims to map entities and relations of a knowledge graph (KG) into a low-dimensional and dense vector space via contrasting the positive and negative triples. In the training process of KGEs, negative sampling is essential to find high-quality negative triples since KGs only contain positive triples. Most existing negative sampling methods assume that non-existent triples with high scores are high-quality negative triples. However, negative triples sampled by these methods are likely to contain noise. Specifically, they ignore that non-existent triples with high scores might also be true facts due to the incompleteness of KGs, which are usually called false negative triples. To alleviate the above issue, we propose an easily pluggable denoising mixup method called DeMix, which generates high-quality triples by refining sampled negative triples in a self-supervised manner. Given a sampled unlabeled triple, DeMix firstly classifies it into a marginal pseudo-negative triple or a negative triple based on the judgment of the KGE model itself. Secondly, it selects an appropriate mixup partner for the current triple to synthesize a partially positive or a harder negative triple. Experimental results on the knowledge graph completion task show that the proposed DeMix is superior to other negative sampling techniques, ensuring corresponding KGEs a faster convergence and better link prediction results.
    摘要 知识图谱嵌入(KGE)旨在通过对比正负三元组,将知识图谱(KG)中的实体和关系映射到低维稠密的向量空间。由于KG中只包含正三元组,负采样对于KGE训练至关重要,需要从中找到高质量的负三元组。现有的负采样方法大多假设得分高的不存在三元组就是高质量负样本,但这样采到的负样本可能含有噪声:由于KG本身不完整,得分高的不存在三元组也可能是真实事实,通常被称为假负三元组。为缓解这一问题,我们提出了一种易于插拔的去噪混合方法DeMix,以自监督的方式修正采样得到的负三元组,从而生成高质量的三元组。对于一个采样得到的未标注三元组,DeMix首先根据KGE模型自身的判断,将其归为边缘伪负三元组或负三元组;然后为当前三元组选择合适的混合伙伴,合成部分为正的三元组或更难的负三元组。在知识图谱补全任务上的实验结果表明,DeMix优于其他负采样技术,使相应的KGE收敛更快并取得更好的链接预测结果。

Notes on Applicability of Explainable AI Methods to Machine Learning Models Using Features Extracted by Persistent Homology

  • paper_url: http://arxiv.org/abs/2310.09780
  • repo_url: https://github.com/naofumihama/xai_ph_ml
  • paper_authors: Naofumi Hama
  • For: The paper explores the potential application of explainable AI methodologies to the persistent homology (PH)-machine learning (ML) pipeline for predicting gas adsorption in metal-organic frameworks.
  • Methods: The paper uses the PH-ML pipeline to extract features via topological data analysis and applies explainable AI methodologies to improve the interpretability of the results.
  • Results: The paper demonstrates suggestive results for predicting gas adsorption in metal-organic frameworks using the PH-ML pipeline with explainable AI methodologies. The code to reproduce the results is available on GitHub.
    Abstract Data analysis that uses the output of topological data analysis as input for machine learning algorithms has been the subject of extensive research. This approach offers a means of capturing the global structure of data. Persistent homology (PH), a common methodology within the field of TDA, has found wide-ranging applications in machine learning. One of the key reasons for the success of the PH-ML pipeline lies in the deterministic nature of feature extraction conducted through PH. The ability to achieve satisfactory levels of accuracy with relatively simple downstream machine learning models, when processing these extracted features, underlines the pipeline's superior interpretability. However, it must be noted that this interpretation has encountered issues. Specifically, it fails to accurately reflect the feasible parameter region in the data generation process, and the physical or chemical constraints that restrict this process. Against this backdrop, we explore the potential application of explainable AI methodologies to this PH-ML pipeline. We apply this approach to the specific problem of predicting gas adsorption in metal-organic frameworks and demonstrate that it can yield suggestive results. The codes to reproduce our results are available at https://github.com/naofumihama/xai_ph_ml
    摘要 研究使用 topological data analysis(TDA)的输出作为机器学习算法的输入的数据分析方法已经得到了广泛的研究。这种方法可以捕捉数据的全局结构。 persistent homology(PH)是TDA领域中常用的方法ологи,在机器学习领域也有广泛的应用。PH-ML管道的成功一个关键原因在于PH的干扰特征,这使得可以使用简单的下游机器学习模型达到高度的准确性。然而,这种解释存在一些问题,它无法准确地反映数据生成过程中可行的参数范围和物理或化学约束。为了解决这些问题,我们研究了使用可解释AI方法ologies来解释PH-ML管道。我们在预测金属组分材料中的气体吸附问题中应用了这种方法,并证明了它可以提供有价值的结果。codes可以在https://github.com/naofumihama/xai_ph_ml中找到。

Worst-Case Analysis is Maximum-A-Posteriori Estimation

  • paper_url: http://arxiv.org/abs/2310.09774
  • repo_url: None
  • paper_authors: Hongjun Wu, Di Wang
  • for: 估计程序的最坏情况资源使用,服务于性能优化和算法复杂度漏洞发现等软件工程任务。
  • methods: 使用一种通用、自适应且可靠(有收敛保证)的模糊测试框架DSE-SMC来估计最坏情况资源使用,该框架将序贯蒙特卡罗(SMC)与自适应进化式模糊测试相结合。
  • results: 在 Java 应用程序上的实验评估表明,DSE-SMC 在最坏情况分析上显著优于现有的黑盒模糊测试方法。
    Abstract The worst-case resource usage of a program can provide useful information for many software-engineering tasks, such as performance optimization and algorithmic-complexity-vulnerability discovery. This paper presents a generic, adaptive, and sound fuzzing framework, called DSE-SMC, for estimating worst-case resource usage. DSE-SMC is generic because it is black-box as long as the user provides an interface for retrieving resource-usage information on a given input; adaptive because it automatically balances between exploration and exploitation of candidate inputs; and sound because it is guaranteed to converge to the true resource-usage distribution of the analyzed program. DSE-SMC is built upon a key observation: resource accumulation in a program is isomorphic to the soft-conditioning mechanism in Bayesian probabilistic programming; thus, worst-case resource analysis is isomorphic to the maximum-a-posteriori-estimation problem of Bayesian statistics. DSE-SMC incorporates sequential Monte Carlo (SMC) -- a generic framework for Bayesian inference -- with adaptive evolutionary fuzzing algorithms, in a sound manner, i.e., DSE-SMC asymptotically converges to the posterior distribution induced by resource-usage behavior of the analyzed program. Experimental evaluation on Java applications demonstrates that DSE-SMC is significantly more effective than existing black-box fuzzing methods for worst-case analysis.
    摘要 程序的最坏情况资源使用可以为性能优化、算法复杂度漏洞发现等许多软件工程任务提供有用信息。本文提出了一个通用、自适应且可靠的模糊测试框架DSE-SMC,用于估计最坏情况资源使用。DSE-SMC是通用的,因为只要用户提供在给定输入上获取资源使用信息的接口,它就可以以黑盒方式工作;它是自适应的,因为它会自动在候选输入的探索与利用之间取得平衡;它是可靠的,因为它保证收敛到被分析程序真实的资源使用分布。DSE-SMC基于一个关键观察:程序中的资源累积与贝叶斯概率编程中的软条件机制同构,因此最坏情况资源分析与贝叶斯统计中的最大后验估计问题同构。DSE-SMC以可靠的方式将贝叶斯推断的通用框架——序贯蒙特卡罗(SMC)——与自适应进化式模糊测试算法相结合,即DSE-SMC渐近收敛到由被分析程序的资源使用行为所诱导的后验分布。在Java应用程序上的实验评估表明,DSE-SMC在最坏情况分析上显著优于现有的黑盒模糊测试方法。

A Critical Survey on Fairness Benefits of XAI

  • paper_url: http://arxiv.org/abs/2310.13007
  • repo_url: None
  • paper_authors: Luca Deck, Jakob Schoeffer, Maria De-Arteaga, Niklas Kühl
  • for: 这些研究旨在探讨可解释人工智能(XAI)与公平性之间的关系,并寻找XAI如何实现公平性的方法。
  • methods: 这些研究使用系统性的文献复查和后续的质量分析,找到了175篇关于XAI是如何提供公平性的纷争性的论文。
  • results: 研究发现了7种典型的声索,即XAI可以帮助实现多种公平性标准。但是,研究还发现了这些声索的一些重要的限制和困难。
    Abstract In this critical survey, we analyze typical claims on the relationship between explainable AI (XAI) and fairness to disentangle the multidimensional relationship between these two concepts. Based on a systematic literature review and a subsequent qualitative content analysis, we identify seven archetypal claims from 175 papers on the alleged fairness benefits of XAI. We present crucial caveats with respect to these claims and provide an entry point for future discussions around the potentials and limitations of XAI for specific fairness desiderata. While the literature often suggests XAI to be an enabler for several fairness desiderata, we notice a misalignment between these desiderata and the capabilities of XAI. We encourage to conceive XAI as one of many tools to approach the multidimensional, sociotechnical challenge of algorithmic fairness and to be more specific about how exactly what kind of XAI method enables whom to address which fairness desideratum.
    摘要 在这份重要的调查中,我们分析了通用Explainable AI(XAI)和公平之间的关系,以彻底分离这两个概念之间的多维关系。通过系统性文献综述和 subsequential 资料分析,我们确定了175篇文章中对XAI的公平 benefittest的七种典型声明。我们提出了关于这些声明的重要警告和限制,并为将来关于XAI在特定公平要求上的潜在优势和局限性的讨论提供入口点。尽管文献 часто表明XAI是许多公平要求的激活器,但我们注意到了XAI的能力与这些要求的不一致。我们建议视XAI为一种用于多维、社技挑战的算法公平的工具,并更 preciselly 说明XAI方法可以为谁 Address 哪些公平要求。

VLIS: Unimodal Language Models Guide Multimodal Language Generation

  • paper_url: http://arxiv.org/abs/2310.09767
  • repo_url: https://github.com/jiwanchung/vlis
  • paper_authors: Jiwan Chung, Youngjae Yu
  • for: 提高多Modal语言生成的复杂语言理解能力
  • methods: combines the visual conditioning capability of vision-language models with the language understanding of unimodal text-only language models, without further training
  • results: 在多种任务上(包括CommonSense理解、复杂文本生成等),VLIS可以提高视觉语言模型的性能
    Abstract Multimodal language generation, which leverages the synergy of language and vision, is a rapidly expanding field. However, existing vision-language models face challenges in tasks that require complex linguistic understanding. To address this issue, we introduce Visual-Language models as Importance Sampling weights (VLIS), a novel framework that combines the visual conditioning capability of vision-language models with the language understanding of unimodal text-only language models without further training. It extracts pointwise mutual information of each image and text from a visual-language model and uses the value as an importance sampling weight to adjust the token likelihood from a text-only model. VLIS improves vision-language models on diverse tasks, including commonsense understanding (WHOOPS, OK-VQA, and ScienceQA) and complex text generation (Concadia, Image Paragraph Captioning, and ROCStories). Our results suggest that VLIS represents a promising new direction for multimodal language generation.
    摘要 多模态语言生成,利用语言和视觉之间的共同作用,是一个快速发展的领域。然而,现有的视觉语言模型在需要复杂的语言理解任务时会遇到挑战。为解决这个问题,我们介绍了视觉语言模型作为重要抽象权重(VLIS),这是一种将视觉语言模型的视觉条件能力与单模式文本Only语言模型的语言理解能力结合在一起的新框架。它从视觉语言模型中提取每个图像和文本的点对 Mutual Information,并将其用作重要抽象权重,以调整文本Only模型的单词概率。VLIS改进了多种任务,包括宽泛理解(WHOOPS、OK-VQA和科学问答)和复杂文本生成(Concadia、图像段落描述和ROCStories)。我们的结果表明,VLIS代表了一个有前途的新方向 для多模态语言生成。
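
The reweighting idea can be sketched at the level of next-token scores: the text-only LM supplies the base log-probabilities, and the VLM contributes only the pointwise mutual information between the image and each candidate token. The toy logits and the exact combination rule below are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def vlis_style_next_token(text_logits, vlm_cond_logits, vlm_uncond_logits, weight=1.0):
    """Combine a text-only LM with a vision-language model at decoding time (a sketch).
    The VLM contributes the PMI between the image and each candidate token;
    fluency comes from the text-only LM's own distribution."""
    pmi = F.log_softmax(vlm_cond_logits, -1) - F.log_softmax(vlm_uncond_logits, -1)
    scores = F.log_softmax(text_logits, -1) + weight * pmi
    return int(scores.argmax())

# toy vocabulary of 5 tokens; in practice these come from the two models' forward passes
text_logits = torch.tensor([2.0, 1.5, 0.3, -1.0, 0.0])  # text-only LM
vlm_cond    = torch.tensor([0.2, 2.5, 0.1, -0.5, 0.0])  # VLM given the image
vlm_uncond  = torch.tensor([0.2, 0.4, 0.1, -0.5, 0.0])  # VLM without the image
print(vlis_style_next_token(text_logits, vlm_cond, vlm_uncond))  # token 1: visually grounded
```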

  • paper_url: http://arxiv.org/abs/2310.09765
  • repo_url: None
  • paper_authors: Sayan Mahapatra, Debtanu Datta, Shubham Soni, Adrijit Goswami, Saptarshi Ghosh
  • For: The paper aims to make legal text in the Indian judiciary more accessible to the general population, who are not comfortable with reading English.
  • Methods: The authors construct a high-quality legal parallel corpus containing aligned text units in English and nine Indian languages, and benchmark the performance of various Machine Translation (MT) systems over this corpus.
  • Results: The authors survey Law practitioners to evaluate the quality of the translations produced by the MT systems, and compare the results with automatic MT evaluation metrics.
    Abstract Most legal text in the Indian judiciary is written in complex English due to historical reasons. However, only about 10% of the Indian population is comfortable in reading English. Hence legal text needs to be made available in various Indian languages, possibly by translating the available legal text from English. Though there has been a lot of research on translation to and between Indian languages, to our knowledge, there has not been much prior work on such translation in the legal domain. In this work, we construct the first high-quality legal parallel corpus containing aligned text units in English and nine Indian languages, that includes several low-resource languages. We also benchmark the performance of a wide variety of Machine Translation (MT) systems over this corpus, including commercial MT systems, open-source MT systems and Large Language Models. Through a comprehensive survey by Law practitioners, we check how satisfied they are with the translations by some of these MT systems, and how well automatic MT evaluation metrics agree with the opinions of Law practitioners.
    摘要 由于历史原因,印度司法系统中的大多数法律文本都是以复杂的英语书写的。然而,只有约10%的印度人口能够自如地阅读英语,因此法律文本需要以各种印度语言提供,可能的途径是将现有的英语法律文本翻译成印度语言。尽管已有大量关于印度语言之间及与英语互译的研究,但据我们所知,此前在法律领域开展的此类翻译工作并不多。在这项工作中,我们构建了首个高质量的法律平行语料库,包含英语与九种印度语言(其中包括若干低资源语言)之间对齐的文本单元。我们还在该语料库上对多种机器翻译(MT)系统进行了基准测试,包括商业MT系统、开源MT系统和大语言模型。通过对法律从业者的全面调查,我们考察了他们对部分MT系统译文的满意程度,以及自动MT评价指标与法律从业者意见的一致程度。

DropMix: Better Graph Contrastive Learning with Harder Negative Samples

  • paper_url: http://arxiv.org/abs/2310.09764
  • repo_url: https://github.com/Mayueq/DropMix-Code
  • paper_authors: Yueqi Ma, Minjie Chen, Xiang Li
  • for: 提高图像对比学习中的负样本质量
  • methods: DropMix方法包括两个主要步骤:首先选择图像中的困难负样本,然后只在部分表示维度上进行混合,以生成更困难的负样本
  • results: 对六个基准数据集进行了广泛的实验,结果表明 DropMix 方法可以提高对比学习性能
    Abstract While generating better negative samples for contrastive learning has been widely studied in the areas of CV and NLP, very few work has focused on graph-structured data. Recently, Mixup has been introduced to synthesize hard negative samples in graph contrastive learning (GCL). However, due to the unsupervised learning nature of GCL, without the help of soft labels, directly mixing representations of samples could inadvertently lead to the information loss of the original hard negative and further adversely affect the quality of the newly generated harder negative. To address the problem, in this paper, we propose a novel method DropMix to synthesize harder negative samples, which consists of two main steps. Specifically, we first select some hard negative samples by measuring their hardness from both local and global views in the graph simultaneously. After that, we mix hard negatives only on partial representation dimensions to generate harder ones and decrease the information loss caused by Mixup. We conduct extensive experiments to verify the effectiveness of DropMix on six benchmark datasets. Our results show that our method can lead to better GCL performance. Our data and codes are publicly available at https://github.com/Mayueq/DropMix-Code.
    摘要 “对待于图structured数据的异构学习中,生成更好的负样本已经广泛研究在CV和NLP领域,但很少有研究在图结构数据上。近期,Mixup方法在图相关学习(GCL)中被引入,以生成困难的负样本。然而,由于GCL是无监督学习的,没有软标签的帮助,直接混合样本表示可能会导致原始困难的负样本中的信息损失,从而降低新生成的更困难负样本的质量。为解决这个问题,在本文中,我们提出了一种新的方法DropMix,它包括两个主要步骤。具体来说,我们首先从图中选择一些困难的负样本,并测量它们的困难程度从本地和全局视图同时。然后,我们只在部分表示维度上混合困难负样本,以生成更困难的负样本和减少Mixup导致的信息损失。我们对六个标准 benchmark dataset进行了广泛的实验,结果显示,我们的方法可以提高GCL性能。我们的数据和代码在https://github.com/Mayueq/DropMix-Code上公开。”
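
A minimal sketch of the partial-dimension mixup step is given below: the negatives most similar to the anchor are treated as hard, and pairs of them are mixed only on a random subset of embedding dimensions, limiting the information lost by full-vector Mixup. Hardness-by-cosine-similarity, the mixing ratio, and the dimension fraction are illustrative assumptions; the paper measures hardness from both local and global graph views.

```python
import torch

def dropmix_negatives(anchor, negatives, k_hard=8, mix_ratio=0.5, dim_frac=0.3):
    """Synthesize harder negatives (a sketch of the idea above): pick the negatives most
    similar to the anchor, then mix pairs of them on a random subset of dimensions."""
    sims = torch.nn.functional.cosine_similarity(anchor.unsqueeze(0), negatives, dim=-1)
    hard = negatives[sims.topk(min(k_hard, len(negatives))).indices]
    partner = hard[torch.randperm(len(hard))]          # random mixup partner
    mask = (torch.rand_like(hard) < dim_frac).float()  # dimensions selected for mixing
    # On masked dims: hard*mix_ratio + partner*(1-mix_ratio); elsewhere: keep hard as-is.
    mixed = hard * (1 - mask * (1 - mix_ratio)) + partner * mask * (1 - mix_ratio)
    return torch.cat([hard, mixed], dim=0)

# toy usage: one anchor embedding and a pool of 64 candidate negatives (dim 32)
anchor = torch.randn(32)
pool = torch.randn(64, 32)
print(dropmix_negatives(anchor, pool).shape)  # torch.Size([16, 32])
```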

Diversifying the Mixture-of-Experts Representation for Language Models with Orthogonal Optimizer

  • paper_url: http://arxiv.org/abs/2310.09762
  • repo_url: None
  • paper_authors: Boan Liu, Liang Ding, Li Shen, Keqin Peng, Yu Cao, Dazhao Cheng, Dacheng Tao
  • for: 提高MoE模型的表现和多样性
  • methods: 提出了一种简单 yet高效的解决方案——对采用MoE结构的模型进行非对称专家优化,并 introduce了一种 alternate training strategy to encourage each expert to update in a direction orthogonal to the subspace spanned by other experts。
  • results: 通过广泛的实验,证明了我们提出的优化算法可以显著提高MoE模型在GLUE、SuperGLUE、问答任务和名词识别任务的表现。
    Abstract The Mixture of Experts (MoE) has emerged as a highly successful technique in deep learning, based on the principle of divide-and-conquer to maximize model capacity without significant additional computational cost. Even in the era of large-scale language models (LLMs), MoE continues to play a crucial role, as some researchers have indicated that GPT-4 adopts the MoE structure to ensure diverse inference results. However, MoE is susceptible to performance degeneracy, particularly evident in the issues of imbalance and homogeneous representation among experts. While previous studies have extensively addressed the problem of imbalance, the challenge of homogeneous representation remains unresolved. In this study, we shed light on the homogeneous representation problem, wherein experts in the MoE fail to specialize and lack diversity, leading to frustratingly high similarities in their representations (up to 99% in a well-performed MoE model). This problem restricts the expressive power of the MoE and, we argue, contradicts its original intention. To tackle this issue, we propose a straightforward yet highly effective solution: OMoE, an orthogonal expert optimizer. Additionally, we introduce an alternating training strategy that encourages each expert to update in a direction orthogonal to the subspace spanned by other experts. Our algorithm facilitates MoE training in two key ways: firstly, it explicitly enhances representation diversity, and secondly, it implicitly fosters interaction between experts during orthogonal weights computation. Through extensive experiments, we demonstrate that our proposed optimization algorithm significantly improves the performance of fine-tuning the MoE model on the GLUE benchmark, SuperGLUE benchmark, question-answering task, and name entity recognition tasks.
    摘要 《粗粒化专家(MoE)》技术在深度学习中得到了广泛应用,基于分治分 conquering的原则,以提高模型容量而不增加显著的计算成本。即使在大规模语言模型(LLM)时代,MoE仍然扮演着关键的角色,一些研究人员表示GPT-4采用了MoE结构以确保多样化的推理结果。然而,MoE受到性能异常化的问题困扰,特别是专家之间的不均衡和同质化表现问题。虽然以前的研究已经广泛地解决了不均衡问题,但同质化表现问题仍然未得到解决。在这项研究中,我们 shed light on the homogeneous representation problem,专家在MoE中失去特化和多样性,导致其表达相似度达99%以上(在一个良好的MoE模型中)。这个问题限制了MoE的表达力,我们认为这与MoE的原意相抵触。为解决这个问题,我们提出了一种简单 yet highly effective的解决方案:OMoE,一种ortogonal expert optimizer。此外,我们还提出了一种 alternate training strategy,鼓励每个专家在归一化方向上更新其 weights。我们的算法可以在两个关键方面帮助MoE训练:首先,它明确提高了表达多样性;其次,它 implicit地促进了专家之间的交互在 ortogonal weights 计算中。通过广泛的实验,我们证明了我们的提出的优化算法可以显著提高 fine-tuning MoE 模型在 GLUE Benchmark、SuperGLUE Benchmark、问题回答任务和名词识别任务上的性能。
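The orthogonality constraint can be illustrated with a small PyTorch sketch (a simplification, not the paper's optimizer): after backpropagation, each expert's gradient is projected onto the orthogonal complement of the subspace spanned by the other experts' flattened weights.

```python
import torch

def project_expert_grads(experts):
    """experts: list of nn.Modules with identical architectures and populated .grad."""
    flat_w = [torch.cat([p.detach().reshape(-1) for p in e.parameters()]) for e in experts]
    for i, expert in enumerate(experts):
        others = torch.stack([w for j, w in enumerate(flat_w) if j != i], dim=1)
        q, _ = torch.linalg.qr(others)                     # orthonormal basis of the others' span
        g = torch.cat([p.grad.reshape(-1) for p in expert.parameters()])
        g = g - q @ (q.T @ g)                              # keep only the orthogonal component
        offset = 0
        for p in expert.parameters():                      # write the projected gradient back
            p.grad.copy_(g[offset:offset + p.numel()].view_as(p))
            offset += p.numel()
```

In the paper this is combined with an alternating training schedule over experts; the sketch only shows the projection step.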

CAPro: Webly Supervised Learning with Cross-Modality Aligned Prototypes

  • paper_url: http://arxiv.org/abs/2310.09761
  • repo_url: https://github.com/yuleiqin/capro
  • paper_authors: Yulei Qin, Xingyu Chen, Yunhang Shen, Chaoyou Fu, Yun Gu, Ke Li, Xing Sun, Rongrong Ji
  • for: 这篇论文旨在提出一种基于文本与图像跨模态对齐的视觉表示学习方法,以应对现实世界网络数据中噪声的挑战。
  • methods: 该方法利用文本原型选择干净的图像,并通过文本匹配来消解视觉原型形成过程中的歧义;此外,它还借助视觉特征空间来补全和增强单条文本描述,并使用集体自举(collective bootstrapping)来获得更好的标签参考。
  • results: 实验表明,CAPro能够有效处理现实世界中的噪声,在单标签和多标签场景下均达到新的最先进性能,并对开集识别表现出良好的鲁棒性。代码可在https://github.com/yuleiqin/capro获取。
    Abstract Webly supervised learning has attracted increasing attention for its effectiveness in exploring publicly accessible data at scale without manual annotation. However, most existing methods of learning with web datasets are faced with challenges from label noise, and they have limited assumptions on clean samples under various noise. For instance, web images retrieved with queries of tiger cat (a cat species) and drumstick (a musical instrument) are almost dominated by images of tigers and chickens, which exacerbates the challenge of fine-grained visual concept learning. In this case, exploiting both web images and their associated texts is a requisite solution to combat real-world noise. In this paper, we propose Cross-modality Aligned Prototypes (CAPro), a unified prototypical contrastive learning framework to learn visual representations with correct semantics. For one thing, we leverage textual prototypes, which stem from the distinct concept definition of classes, to select clean images by text matching and thus disambiguate the formation of visual prototypes. For another, to handle missing and mismatched noisy texts, we resort to the visual feature space to complete and enhance individual texts and thereafter improve text matching. Such semantically aligned visual prototypes are further polished up with high-quality samples, and engaged in both cluster regularization and noise removal. Besides, we propose collective bootstrapping to encourage smoother and wiser label reference from appearance-similar instances in a manner of dictionary look-up. Extensive experiments on WebVision1k and NUS-WIDE (Web) demonstrate that CAPro well handles realistic noise under both single-label and multi-label scenarios. CAPro achieves new state-of-the-art performance and exhibits robustness to open-set recognition. Codes are available at https://github.com/yuleiqin/capro.
    摘要 网络监督学习(webly supervised learning)因能在无需人工标注的情况下大规模利用公开可得的数据而受到越来越多的关注。然而,大多数利用网络数据进行学习的现有方法都面临标签噪声的挑战,并且对各种噪声下干净样本的假设较为有限。例如,用“tiger cat”(一种猫)和“drumstick”(一种乐器)作为查询检索到的网络图像几乎被老虎和鸡的图像占据,这加剧了细粒度视觉概念学习的难度。在这种情况下,同时利用网络图像及其关联文本是对抗真实世界噪声的必要手段。本文提出跨模态对齐原型(CAPro),一个统一的原型对比学习框架,用于学习语义正确的视觉表示。一方面,我们利用源自类别概念定义的文本原型,通过文本匹配来选择干净图像,从而消除视觉原型形成过程中的歧义;另一方面,为了应对缺失和不匹配的噪声文本,我们借助视觉特征空间来补全和增强单条文本,进而改进文本匹配。经语义对齐的视觉原型会进一步用高质量样本加以打磨,并同时用于聚类正则化和噪声去除。此外,我们提出集体自举(collective bootstrapping),以字典查询的方式从外观相似的实例中获得更平滑、更明智的标签参考。在WebVision1k和NUS-WIDE(Web)上的大量实验表明,CAPro能够很好地处理单标签和多标签场景下的真实噪声,取得新的最先进性能,并对开集识别表现出鲁棒性。代码见 https://github.com/yuleiqin/capro。

EX-FEVER: A Dataset for Multi-hop Explainable Fact Verification

  • paper_url: http://arxiv.org/abs/2310.09754
  • repo_url: https://github.com/dependentsign/EX-FEVER
  • paper_authors: Huanhuan Ma, Weizhi Xu, Yifan Wei, Liuji Chen, Liang Wang, Qiang Liu, Shu Wu, Liang Wang
  • for: 这个论文的目的是构建一个可解释的事实验证系统,以便在复杂多层扩展中实现自动化的真实检查。
  • methods: 该论文使用了一种新的基于Wikipedia文档的数据集,并提出了一种基于这些数据集的基线系统。该基线系统包括文档检索、解释生成和CLAIM验证三个部分。
  • results: 该论文通过对EX-FEVER数据集进行实验,发现现有的事实验证模型在这个数据集上表现不佳,而Large Language Models在这个任务中具有潜在的应用前景。
    Abstract Fact verification aims to automatically probe the veracity of a claim based on several pieces of evidence. Existing works are always engaging in the accuracy improvement, let alone the explainability, a critical capability of fact verification system. Constructing an explainable fact verification system in a complex multi-hop scenario is consistently impeded by the absence of a relevant high-quality dataset. Previous dataset either suffer from excessive simplification or fail to incorporate essential considerations for explainability. To address this, we present EX-FEVER, a pioneering dataset for multi-hop explainable fact verification. With over 60,000 claims involving 2-hop and 3-hop reasoning, each is created by summarizing and modifying information from hyperlinked Wikipedia documents. Each instance is accompanied by a veracity label and an explanation that outlines the reasoning path supporting the veracity classification. Additionally, we demonstrate a novel baseline system on our EX-FEVER dataset, showcasing document retrieval, explanation generation, and claim verification and observe that existing fact verification models trained on previous datasets struggle to perform well on our dataset. Furthermore, we highlight the potential of utilizing Large Language Models in the fact verification task. We hope our dataset could make a significant contribution by providing ample opportunities to explore the integration of natural language explanations in the domain of fact verification.
    摘要 事实验证旨在基于多条证据自动检验声明的真实性。现有工作大多致力于提高准确率,而忽视了可解释性这一事实验证系统的关键能力。在复杂多跳场景中构建可解释的事实验证系统,始终受到缺乏相关高质量数据集的阻碍:现有数据集要么过度简化,要么缺乏对可解释性的关键考虑。为此,我们提出了EX-FEVER数据集,包含需要2跳和3跳推理的60,000余条声明,每条声明都由超链接的维基百科文档经摘要和修改构建而成,并附有真实性标签和描述支撑该判断的推理路径的解释。此外,我们还提出了一个基于EX-FEVER数据集的基线系统,包括文档检索、解释生成和声明验证,并观察到在既有数据集上训练的事实验证模型在该数据集上表现不佳。我们还强调了利用大语言模型完成事实验证任务的潜力。我们希望该数据集能为研究自然语言解释在事实验证领域的融合提供丰富的机会。

Beyond Segmentation: Road Network Generation with Multi-Modal LLMs

  • paper_url: http://arxiv.org/abs/2310.09755
  • repo_url: None
  • paper_authors: Sumedh Rasal, Sanjay Kumar Boddhu
  • for: 本研究旨在提供一种创新的路网生成方法,利用多modal的大语言模型(LLM)来生成细致、可行驾驶的路网。
  • methods: 我们的模型使用了BLIP-2架构 arXiv:2301.12597,利用预先冻结的图像编码器和大语言模型来创造一种多modal LLM。
  • results: 我们的实验结果表明,使用我们的方法可以准确地生成路网,并且不需要生成二进制分割mask。这种方法可以增强自主驾驶系统,特别是在路网场景中,准确的导航是非常重要的。
    Abstract This paper introduces an innovative approach to road network generation through the utilization of a multi-modal Large Language Model (LLM). Our model is specifically designed to process aerial images of road layouts and produce detailed, navigable road networks within the input images. The core innovation of our system lies in the unique training methodology employed for the large language model to generate road networks as its output. This approach draws inspiration from the BLIP-2 architecture arXiv:2301.12597, leveraging pre-trained frozen image encoders and large language models to create a versatile multi-modal LLM. Our work also offers an alternative to the reasoning segmentation method proposed in the LISA paper arXiv:2308.00692. By training the large language model with our approach, the necessity for generating binary segmentation masks, as suggested in the LISA paper arXiv:2308.00692, is effectively eliminated. Experimental results underscore the efficacy of our multi-modal LLM in providing precise and valuable navigational guidance. This research represents a significant stride in bolstering autonomous navigation systems, especially in road network scenarios, where accurate guidance is of paramount importance.
    摘要 本文提出了一种利用多模态大语言模型(LLM)生成路网的创新方法。我们的模型专门用于处理道路布局的航拍图像,并在输入图像中生成细致、可导航的路网。该系统的核心创新在于让大语言模型输出路网的独特训练方法,其灵感来自BLIP-2架构(arXiv:2301.12597),利用预训练的冻结图像编码器和大语言模型构建多模态LLM。我们的工作也为LISA论文(arXiv:2308.00692)中提出的推理分割方法提供了替代方案:按本方法训练大语言模型后,无需再生成二值分割掩码。实验结果证明了该多模态LLM在提供精确且有价值的导航指引方面的有效性。这项研究是对自主导航系统、尤其是对精确导航至关重要的路网场景的重要推进。

When can transformers reason with abstract symbols?

  • paper_url: http://arxiv.org/abs/2310.09753
  • repo_url: https://github.com/eboix/relational-reasoning
  • paper_authors: Enric Boix-Adsera, Omid Saremi, Emmanuel Abbe, Samy Bengio, Etai Littwin, Joshua Susskind
  • for: 这个研究探讨了基于抽象符号的关系理解任务中 transformer大语言模型(LLMs)的能力。
  • methods: 这些任务使用了许多年来在 neuroscience 文献中研究的基本建构物,包括程序编程、数学和语言理解。
  • results: 研究发现,对于回归任务, transformer 可以通过训练而泛化,但需要很大量的训练数据;对于下一个符号预测任务, transformer 的表示维度增加会导致泛化失败,但可以通过添加两个可调参数来降低数据量。
    Abstract We investigate the capabilities of transformer large language models (LLMs) on relational reasoning tasks involving abstract symbols. Such tasks have long been studied in the neuroscience literature as fundamental building blocks for more complex abilities in programming, mathematics, and verbal reasoning. For (i) regression tasks, we prove that transformers generalize when trained, but require astonishingly large quantities of training data. For (ii) next-token-prediction tasks with symbolic labels, we show an "inverse scaling law": transformers fail to generalize as their embedding dimension increases. For both settings (i) and (ii), we propose subtle transformer modifications which can reduce the amount of data needed by adding two trainable parameters per head.
    摘要 我们研究transformer大语言模型(LLM)在涉及抽象符号的关系推理任务上的能力。此类任务长期以来在神经科学文献中被视为编程、数学和语言推理等更复杂能力的基本构件。对于(i)回归任务,我们证明transformer经训练后能够泛化,但需要非常大量的训练数据;对于(ii)带符号标签的下一词预测任务,我们展示了一种“逆缩放定律”:随着嵌入维度的增大,transformer反而无法泛化。针对(i)和(ii)两种设定,我们提出了细微的transformer修改,通过在每个注意力头上添加两个可训练参数来减少所需的数据量。

Domain-Specific Language Model Post-Training for Indonesian Financial NLP

  • paper_url: http://arxiv.org/abs/2310.09736
  • repo_url: https://github.com/intanq/indonesian-financial-domain-lm
  • paper_authors: Ni Putu Intan Maharani, Yoga Yustiawan, Fauzy Caesar Rochim, Ayu Purwarianti
  • for: 这 paper 是关于金融领域的自然语言处理(NLP)任务中BERT和IndoBERT的应用和调整。
  • methods: 本文使用预训练的IndoBERT,在小规模的印尼语金融语料上进行了后训练。同时,我们还构建了印尼语金融情感分析数据集和主题分类数据集,并发布了一系列面向金融NLP的BERT模型。
  • results: 我们的实验结果表明,对特定领域下的下游任务进行适应性训练可以提高语言模型的效果。
    Abstract BERT and IndoBERT have achieved impressive performance in several NLP tasks. There has been several investigation on its adaption in specialized domains especially for English language. We focus on financial domain and Indonesian language, where we perform post-training on pre-trained IndoBERT for financial domain using a small scale of Indonesian financial corpus. In this paper, we construct an Indonesian self-supervised financial corpus, Indonesian financial sentiment analysis dataset, Indonesian financial topic classification dataset, and release a family of BERT models for financial NLP. We also evaluate the effectiveness of domain-specific post-training on sentiment analysis and topic classification tasks. Our findings indicate that the post-training increases the effectiveness of a language model when it is fine-tuned to domain-specific downstream tasks.
    摘要 BERT和IndoBERT在多个自然语言处理任务中表现出色,针对它们在特定领域(尤其是英语)的适配已有不少研究。我们聚焦金融领域和印尼语,使用小规模的印尼语金融语料对预训练的IndoBERT进行后训练。在这篇论文中,我们构建了一个印尼语自监督金融语料库、印尼语金融情感分析数据集和印尼语金融主题分类数据集,并发布了一系列面向金融NLP的BERT模型。我们还评估了领域特定后训练在情感分析和主题分类任务上的效果。我们的发现表明,后训练能够提高语言模型在领域特定下游任务上微调后的效果。
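A typical way to reproduce this kind of domain post-training with Hugging Face tooling is continued masked-language-model training on the in-domain corpus; the checkpoint name, corpus path and hyperparameters below are placeholders, not the paper's exact setup.

```python
from datasets import load_dataset
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

ckpt = "indobenchmark/indobert-base-p1"        # assumed IndoBERT checkpoint
tok = AutoTokenizer.from_pretrained(ckpt)
model = AutoModelForMaskedLM.from_pretrained(ckpt)

# one Indonesian financial sentence/paragraph per line (placeholder file)
corpus = load_dataset("text", data_files={"train": "id_financial_corpus.txt"})["train"]
corpus = corpus.map(lambda b: tok(b["text"], truncation=True, max_length=128),
                    batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments("indobert-financial", num_train_epochs=3,
                           per_device_train_batch_size=32, learning_rate=5e-5),
    train_dataset=corpus,
    data_collator=DataCollatorForLanguageModeling(tok, mlm_probability=0.15),
)
trainer.train()   # the post-trained encoder is then fine-tuned on sentiment / topic tasks
```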

Enhancing Conversational Search: Large Language Model-Aided Informative Query Rewriting

  • paper_url: http://arxiv.org/abs/2310.09716
  • repo_url: None
  • paper_authors: Fanghua Ye, Meng Fang, Shenghui Li, Emine Yilmaz
  • for: 提高对话搜索的会话搜索性能,使用语言模型来重写用户查询。
  • methods: 使用大型语言模型(LLM)来重写查询,通过设计良好的指令来生成有用的重写。
  • results: 在QReCC数据集上的实验显示,信息充分的查询重写可以提升检索性能,尤其是在使用稀疏检索器时。
    Abstract Query rewriting plays a vital role in enhancing conversational search by transforming context-dependent user queries into standalone forms. Existing approaches primarily leverage human-rewritten queries as labels to train query rewriting models. However, human rewrites may lack sufficient information for optimal retrieval performance. To overcome this limitation, we propose utilizing large language models (LLMs) as query rewriters, enabling the generation of informative query rewrites through well-designed instructions. We define four essential properties for well-formed rewrites and incorporate all of them into the instruction. In addition, we introduce the role of rewrite editors for LLMs when initial query rewrites are available, forming a "rewrite-then-edit" process. Furthermore, we propose distilling the rewriting capabilities of LLMs into smaller models to reduce rewriting latency. Our experimental evaluation on the QReCC dataset demonstrates that informative query rewrites can yield substantially improved retrieval performance compared to human rewrites, especially with sparse retrievers.
    摘要 查询重写通过将依赖上下文的用户查询改写为独立完整的形式,在增强对话式搜索中发挥着重要作用。现有方法主要以人工改写的查询作为标签来训练查询重写模型,但人工改写可能缺乏获得最佳检索性能所需的信息。为克服这一局限,我们提出利用大语言模型(LLM)作为查询重写器,通过精心设计的指令生成信息充分的查询重写。我们定义了良构重写应满足的四个基本性质,并将其全部纳入指令。此外,当已有初始查询重写时,我们引入LLM作为重写编辑器的角色,形成“先重写、后编辑”的流程。我们还提出将LLM的重写能力蒸馏到更小的模型中,以降低重写延迟。在QReCC数据集上的实验评估表明,与人工重写相比,信息充分的查询重写能显著提升检索性能,尤其是在使用稀疏检索器时。
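The "rewrite-then-edit" loop can be sketched as two prompt templates around a generic llm(prompt) -> str call; the wording of the instructions and of the four properties below is a paraphrased placeholder, not the exact prompts from the paper.

```python
REWRITE_PROMPT = """Rewrite the last question into a single self-contained query that is
clear, informative, fluent, and free of unresolved references.

Conversation so far:
{history}
Last question: {question}
Rewrite:"""

EDIT_PROMPT = """Edit the candidate rewrite so that it keeps every detail needed to retrieve
relevant passages, without changing the user's intent.

Conversation so far:
{history}
Last question: {question}
Candidate rewrite: {draft}
Edited rewrite:"""

def rewrite_then_edit(llm, history, question, draft=None):
    """llm(prompt) -> str stands in for whichever LLM API is used."""
    if draft is None:                                  # no initial rewrite available
        draft = llm(REWRITE_PROMPT.format(history=history, question=question))
    return llm(EDIT_PROMPT.format(history=history, question=question, draft=draft))
```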

New Advances in Body Composition Assessment with ShapedNet: A Single Image Deep Regression Approach

  • paper_url: http://arxiv.org/abs/2310.09709
  • repo_url: None
  • paper_authors: Navar Medeiros M. Nascimento, Pedro Cavalcante de Sousa Junior, Pedro Yuri Rodrigues Nunes, Suane Pires Pinheiro da Silva, Luiz Lannes Loureiro, Victor Zaban Bittencourt, Valden Luis Matos Capistrano Junior, Pedro Pedrosa Rebouças Filho
  • for: 增强体重分析方法
  • methods: 使用深度神经网络进行身体脂肪百分比(BFP)估算、个体识别和位置确定,只需单张照片
  • results: 以金标准方法双能X射线吸收法(DXA)为对照,在1273名不同年龄、性别和BFP水平的健康成人上进行验证,结果表明ShapedNet比此前最优的计算机视觉方法提升19.5%,MAPE为4.91%,MAE为1.42,且性别无关(gender-neutral)方法表现更优。
    Abstract We introduce a novel technique called ShapedNet to enhance body composition assessment. This method employs a deep neural network capable of estimating Body Fat Percentage (BFP), performing individual identification, and enabling localization using a single photograph. The accuracy of ShapedNet is validated through comprehensive comparisons against the gold standard method, Dual-Energy X-ray Absorptiometry (DXA), utilizing 1273 healthy adults spanning various ages, sexes, and BFP levels. The results demonstrate that ShapedNet outperforms in 19.5% state of the art computer vision-based approaches for body fat estimation, achieving a Mean Absolute Percentage Error (MAPE) of 4.91% and Mean Absolute Error (MAE) of 1.42. The study evaluates both gender-based and Gender-neutral approaches, with the latter showcasing superior performance. The method estimates BFP with 95% confidence within an error margin of 4.01% to 5.81%. This research advances multi-task learning and body composition assessment theory through ShapedNet.
    摘要 我们介绍了一种新的技术called ShapedNet,用于提高身体组分评估。这种方法利用深度神经网络,能够估算身体脂肪百分比(BFP),进行个体识别,并使用单张图像进行地图化。我们 validate了ShapedNet的准确性,通过对杰基标方法(DXA)的1273名健康成人进行比较,这些成人来自不同的年龄、性别和BFP水平。结果表明,ShapedNet在19.5%的state of the art计算机视觉基础上进行身体脂肪估计方法中,表现出色,其 Mean Absolute Percentage Error(MAPE)为4.91%, Mean Absolute Error(MAE)为1.42。我们也评估了不同的性别和无性别方法,其中后者表现更出色。ShapedNet可以在95%的信息内,对BFP进行4.01%至5.81%的估计,这对身体组分评估理论和多任务学习做出了重要贡献。

AdaptSSR: Pre-training User Model with Augmentation-Adaptive Self-Supervised Ranking

  • paper_url: http://arxiv.org/abs/2310.09706
  • repo_url: https://github.com/yflyl613/AdaptSSR
  • paper_authors: Yang Yu, Qi Liu, Kai Zhang, Yuren Zhang, Chao Song, Min Hou, Yuqing Yuan, Zhihao Ye, Zaixi Zhang, Sanshi Lei Yu
  • for: 用于提高用户模型的泛化能力和数据稀缺性问题。
  • methods: 在海量用户行为序列上预训练用户模型,并用增强自适应的自监督排序(AdaptSSR)任务取代对比学习任务,以改进用户模型。
  • results: 在公开和工业数据集上的大量实验证明,该方法能够提升用户模型的性能,并缓解数据稀缺问题。
    Abstract User modeling, which aims to capture users' characteristics or interests, heavily relies on task-specific labeled data and suffers from the data sparsity issue. Several recent studies tackled this problem by pre-training the user model on massive user behavior sequences with a contrastive learning task. Generally, these methods assume different views of the same behavior sequence constructed via data augmentation are semantically consistent, i.e., reflecting similar characteristics or interests of the user, and thus maximizing their agreement in the feature space. However, due to the diverse interests and heavy noise in user behaviors, existing augmentation methods tend to lose certain characteristics of the user or introduce noisy behaviors. Thus, forcing the user model to directly maximize the similarity between the augmented views may result in a negative transfer. To this end, we propose to replace the contrastive learning task with a new pretext task: Augmentation-Adaptive SelfSupervised Ranking (AdaptSSR), which alleviates the requirement of semantic consistency between the augmented views while pre-training a discriminative user model. Specifically, we adopt a multiple pairwise ranking loss which trains the user model to capture the similarity orders between the implicitly augmented view, the explicitly augmented view, and views from other users. We further employ an in-batch hard negative sampling strategy to facilitate model training. Moreover, considering the distinct impacts of data augmentation on different behavior sequences, we design an augmentation-adaptive fusion mechanism to automatically adjust the similarity order constraint applied to each sample based on the estimated similarity between the augmented views. Extensive experiments on both public and industrial datasets with six downstream tasks verify the effectiveness of AdaptSSR.
    摘要 用户模型化,它目标是捕捉用户特点或兴趣,受到任务特定的标注数据的缺乏问题困扰。一些最近的研究解决了这个问题,通过在大量用户行为序列上进行预训练,并使用对偶学习任务。通常,这些方法假设不同的视图的同一个行为序列,通过数据扩展生成的方法是具有相同特征或兴趣的用户,并且尽量在特征空间中增加它们之间的一致性。然而,由于用户的兴趣和行为噪声的多样性,现有的扩展方法通常会消失用户的特征或引入噪声行为。因此,直接在扩展视图之间寻求最大的一致性可能会导致负面传播。为此,我们提议将对偶学习任务改为一种新的预文任务:增强自监 Ranking(AdaptSSR),这种任务可以降低对扩展视图的 semantic consistency 要求,而在预训练用户模型时,capture用户的相似性序列。具体来说,我们采用多对多对比损失函数,训练用户模型,捕捉扩展视图、显式扩展视图和其他用户视图之间的相似性序列。此外,考虑不同的数据扩展对不同的行为序列的不同影响,我们设计了数据扩展适应机制,自动调整每个样本所应用的相似性序列约束,基于每个扩展视图之间的估计相似性。我们在公共和工业数据集上进行了六个下游任务的广泛实验,并证明了 AdaptSSR 的效果。
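One simple way to instantiate the similarity-order idea is a margin-based pairwise ranking loss over the implicitly augmented view, the explicitly augmented view and in-batch users; the PyTorch sketch below only covers the ordering constraint, not the paper's augmentation-adaptive fusion mechanism.

```python
import torch
import torch.nn.functional as F

def ssr_loss(u, u_implicit, u_explicit, u_others, margin=0.1):
    """u, u_implicit, u_explicit: (B, d) user representations;
    u_others: (B, N, d) representations of other users used as in-batch negatives."""
    s_imp = F.cosine_similarity(u, u_implicit, dim=-1)                          # (B,)
    s_exp = F.cosine_similarity(u, u_explicit, dim=-1)                          # (B,)
    s_neg = F.cosine_similarity(u.unsqueeze(1), u_others, dim=-1).max(1).values  # hardest negative
    rank1 = F.relu(margin - (s_imp - s_exp))   # implicit view should rank above explicit view
    rank2 = F.relu(margin - (s_exp - s_neg))   # explicit view should rank above other users
    return (rank1 + rank2).mean()
```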

Progressive Evidence Refinement for Open-domain Multimodal Retrieval Question Answering

  • paper_url: http://arxiv.org/abs/2310.09696
  • repo_url: None
  • paper_authors: Shuwen Yang, Anran Wu, Xingjiao Wu, Luwei Xiao, Tianlong Ma, Cheng Jin, Liang He
  • for: 提高 retrieval-based question answering 模型的表现,解决现有模型在使用压缩证据特征时丢失细节信息,以及Question和证据之间的特征提取差距。
  • methods: 提出了一种两阶段框架,包括进行逐步证据筛选、使用 semi-supervised contrastive learning 训练策略、多次询问回答等方法来解决这两个问题。
  • results: 通过广泛的实验证明,该模型在 WebQA 和 MultimodelQA 测试上达到了出色的表现。
    Abstract Pre-trained multimodal models have achieved significant success in retrieval-based question answering. However, current multimodal retrieval question-answering models face two main challenges. Firstly, utilizing compressed evidence features as input to the model results in the loss of fine-grained information within the evidence. Secondly, a gap exists between the feature extraction of evidence and the question, which hinders the model from effectively extracting critical features from the evidence based on the given question. We propose a two-stage framework for evidence retrieval and question-answering to alleviate these issues. First and foremost, we propose a progressive evidence refinement strategy for selecting crucial evidence. This strategy employs an iterative evidence retrieval approach to uncover the logical sequence among the evidence pieces. It incorporates two rounds of filtering to optimize the solution space, thus further ensuring temporal efficiency. Subsequently, we introduce a semi-supervised contrastive learning training strategy based on negative samples to expand the scope of the question domain, allowing for a more thorough exploration of latent knowledge within known samples. Finally, in order to mitigate the loss of fine-grained information, we devise a multi-turn retrieval and question-answering strategy to handle multimodal inputs. This strategy involves incorporating multimodal evidence directly into the model as part of the historical dialogue and question. Meanwhile, we leverage a cross-modal attention mechanism to capture the underlying connections between the evidence and the question, and the answer is generated through a decoding generation approach. We validate the model's effectiveness through extensive experiments, achieving outstanding performance on WebQA and MultimodelQA benchmark tests.
    摘要 先进多模态模型已经在回答问题中取得了显著成功。然而,当前的多模态回答问题模型面临两个主要挑战。首先,使用压缩证据特征作为模型输入会导致证据中细详信息的损失。其次,证据和问题之间的特征EXTRACTING存在差距,这使得模型从证据中EXTRACTING答案相关的关键特征变得困难。我们提出了一个两个阶段框架,用于增强证据检索和回答问题。首先,我们提出了一种进步的证据精细化策略,用于选择重要的证据。这种策略使用迭代的证据检索方法,找到证据归并的逻辑顺序。它使用两轮的筛选来优化解决空间,从而更加确保时间效率。其次,我们引入了一种半监督对比学习训练策略,以扩展问题领域。这种策略基于负样本,通过对已知样本进行更多的探索,扩大问题领域的范围。 finally,为了减少细详信息的损失,我们提出了一种多turn检索和回答策略,用于处理多模态输入。这种策略将多模态证据直接 integrate into the model 中的历史对话和问题。同时,我们利用交叉模式注意力机制,捕捉证据和问题之间的下面连接。通过解码生成方法,我们生成答案。我们通过广泛的实验 validate the model's effectiveness, achieved outstanding performance on WebQA and MultimodelQA benchmark tests.

Spike-based Neuromorphic Computing for Next-Generation Computer Vision

  • paper_url: http://arxiv.org/abs/2310.09692
  • repo_url: None
  • paper_authors: Md Sakib Hasan, Catherine D. Schuman, Zhongyang Zhang, Tauhidur Rahman, Garrett S. Rose
  • for: 这篇论文旨在探讨 neuromorphic computing 技术的应用在计算机视觉领域。
  • methods: 论文使用了不同层次设计(设备、电路和算法)的示例来介绍 neuromorphic computing 技术。
  • results: 论文 conclude 了一些可能的应用和未来研究方向,例如用于 edge device 中的视觉任务。
    Abstract Neuromorphic Computing promises orders of magnitude improvement in energy efficiency compared to traditional von Neumann computing paradigm. The goal is to develop an adaptive, fault-tolerant, low-footprint, fast, low-energy intelligent system by learning and emulating brain functionality which can be realized through innovation in different abstraction layers including material, device, circuit, architecture and algorithm. As the energy consumption in complex vision tasks keep increasing exponentially due to larger data set and resource-constrained edge devices become increasingly ubiquitous, spike-based neuromorphic computing approaches can be viable alternative to deep convolutional neural network that is dominating the vision field today. In this book chapter, we introduce neuromorphic computing, outline a few representative examples from different layers of the design stack (devices, circuits and algorithms) and conclude with a few exciting applications and future research directions that seem promising for computer vision in the near future.
    摘要 神经形态计算有望在能效方面比传统的冯·诺依曼计算范式提升多个数量级。其目标是通过学习和模拟大脑功能,开发自适应、容错、占用空间小、速度快、能耗低的智能系统,这需要在材料、器件、电路、架构和算法等不同抽象层次上进行创新。随着数据规模增大,复杂视觉任务的能耗持续呈指数式增长,而资源受限的边缘设备日益普及,基于脉冲的神经形态计算方法有望成为当今视觉领域占主导地位的深度卷积神经网络的可行替代方案。在这一章节中,我们介绍了神经形态计算,给出了设计栈不同层次(器件、电路和算法)上的若干代表性示例,并以几个在不久的将来对计算机视觉颇有前景的应用和研究方向作结。

Configuration Validation with Large Language Models

  • paper_url: http://arxiv.org/abs/2310.09690
  • repo_url: https://github.com/ciri4conf/ciri
  • paper_authors: Xinyu Lian, Yinfang Chen, Runxiang Cheng, Jie Huang, Parth Thakkar, Tianyin Xu
  • for: 这个论文主要是为了探讨使用自然语言处理(NLP)和机器学习(ML)进行配置验证的可能性和效果。
  • methods: 该论文使用了大量的配置数据和不同的大语言模型(LLMs)进行验证,并开发了一个通用的 LLM-based 验证框架(Ciri)。该框架使用了小量的示例数据和几何学学习来设计有效的提示,并将多个 LLMs 的输出 validate 并聚合成验证结果。
  • results: 该论文的分析表明,使用 LLMs 进行配置验证是可能的,并且可以采用提示工程学习和几何学学习来设计有效的提示。但是,该论文还发现了一些问题,例如某些类型的错误配置不能准确地被检测出来,以及 LLMs 的偏见对一些常见的配置参数产生影响。
    Abstract Misconfigurations are the major causes of software failures. Existing configuration validation techniques rely on manually written rules or test cases, which are expensive to implement and maintain, and are hard to be comprehensive. Leveraging machine learning (ML) and natural language processing (NLP) for configuration validation is considered a promising direction, but has been facing challenges such as the need of not only large-scale configuration data, but also system-specific features and models which are hard to generalize. Recent advances in Large Language Models (LLMs) show the promises to address some of the long-lasting limitations of ML/NLP-based configuration validation techniques. In this paper, we present an exploratory analysis on the feasibility and effectiveness of using LLMs like GPT and Codex for configuration validation. Specifically, we take a first step to empirically evaluate LLMs as configuration validators without additional fine-tuning or code generation. We develop a generic LLM-based validation framework, named Ciri, which integrates different LLMs. Ciri devises effective prompt engineering with few-shot learning based on both valid configuration and misconfiguration data. Ciri also validates and aggregates the outputs of LLMs to generate validation results, coping with known hallucination and nondeterminism of LLMs. We evaluate the validation effectiveness of Ciri on five popular LLMs using configuration data of six mature, widely deployed open-source systems. Our analysis (1) confirms the potential of using LLMs for configuration validation, (2) understands the design space of LLMbased validators like Ciri, especially in terms of prompt engineering with few-shot learning, and (3) reveals open challenges such as ineffectiveness in detecting certain types of misconfigurations and biases to popular configuration parameters.
    摘要 软件故障的主要原因是配置错误。现有的配置验证技术依赖于手动编写的规则或测试用例,实施和维护成本高,难以全面验证。使用机器学习(ML)和自然语言处理(NLP)进行配置验证是一个有前途的方向,但它面临着大规模配置数据和系统特有的特征和模型难以普适化的挑战。近年来,大型自然语言模型(LLMs)的进步表明可以解决一些长期存在的ML/NLP基于配置验证技术的局限性。在这篇论文中,我们提出了一种使用LLMs like GPT和Codex进行配置验证的探索性分析。 Specifically,我们不需要额外 fine-tuning或代码生成,就可以使用LLMs来验证配置。我们开发了一个通用的LLM-based validation框架,名为Ciri。Ciri使用几种LLMs,并开发了有效的提示工程学和少量学习技术,以适应不同的配置数据。Ciri还可以将LLMs的输出验证和聚合,以生成验证结果,并处理知道的投影和非决定性。我们对五种流行的LLMs进行了配置数据的六种广泛部署的开源系统的验证。我们的分析表明:(1)使用LLMs进行配置验证是有潜力的;(2)LLM-based validator如Ciri在提示工程学和少量学习方面存在设计空间,特别是在针对有效配置和错误配置数据进行少量学习;(3)存在一些未解决的挑战,例如对某些类型的配置错误不够有效。
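A stripped-down version of this kind of validator pairs a few-shot prompt with repeated sampling and voting; llm(prompt) -> str is a placeholder for any of the evaluated models, and the example configurations below are invented for illustration.

```python
import json
from collections import Counter

FEW_SHOT = """You are a configuration validator. Reply with JSON of the form
{"valid": true|false, "parameter": <name or null>, "reason": <short reason>}.

Config: {"max_heap_size": "-1GB"}
Answer: {"valid": false, "parameter": "max_heap_size", "reason": "size must be positive"}

Config: {"port": 8080}
Answer: {"valid": true, "parameter": null, "reason": "port is within the valid range"}

Config: <CONFIG>
Answer:"""

def validate(llm, config, n_samples=3):
    """Sample several generations and majority-vote on the 'valid' field to
    smooth over hallucination and nondeterminism of the underlying LLM."""
    votes = []
    for _ in range(n_samples):
        raw = llm(FEW_SHOT.replace("<CONFIG>", json.dumps(config)))
        try:
            votes.append(bool(json.loads(raw)["valid"]))
        except (json.JSONDecodeError, KeyError, TypeError):
            continue                                   # skip malformed generations
    return Counter(votes).most_common(1)[0][0] if votes else None
```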

  • paper_url: http://arxiv.org/abs/2310.09689
  • repo_url: https://github.com/anindyasarkariith/psrl_vas
  • paper_authors: Anindya Sarkar, Nathan Jacobs, Yevgeniy Vorobeychik
  • for: 这篇论文研究“视觉主动搜索”(Visual Active Search,VAS)框架,利用视觉信号引导探索,以在大范围地理空间中找到感兴趣区域。
  • methods: 这篇论文将深度强化学习(Deep Reinforcement Learning,DRL)与传统主动搜索(Active Search)两种方法的优势结合起来。
  • results: 论文的实验结果显示,该方法可以对现有的DRL框架进行改进,并且在多个问题领域中表现出色。
    Abstract Visual active search (VAS) has been proposed as a modeling framework in which visual cues are used to guide exploration, with the goal of identifying regions of interest in a large geospatial area. Its potential applications include identifying hot spots of rare wildlife poaching activity, search-and-rescue scenarios, identifying illegal trafficking of weapons, drugs, or people, and many others. State of the art approaches to VAS include applications of deep reinforcement learning (DRL), which yield end-to-end search policies, and traditional active search, which combines predictions with custom algorithmic approaches. While the DRL framework has been shown to greatly outperform traditional active search in such domains, its end-to-end nature does not make full use of supervised information attained either during training, or during actual search, a significant limitation if search tasks differ significantly from those in the training distribution. We propose an approach that combines the strength of both DRL and conventional active search by decomposing the search policy into a prediction module, which produces a geospatial distribution of regions of interest based on task embedding and search history, and a search module, which takes the predictions and search history as input and outputs the search distribution. We develop a novel meta-learning approach for jointly learning the resulting combined policy that can make effective use of supervised information obtained both at training and decision time. Our extensive experiments demonstrate that the proposed representation and meta-learning frameworks significantly outperform state of the art in visual active search on several problem domains.
    摘要 视觉主动搜索(VAS)作为一种建模框架被提出,它利用视觉线索引导探索,目标是在大范围地理空间中识别感兴趣区域。其潜在应用包括识别珍稀野生动物盗猎活动热点、搜救场景、识别武器、毒品或人口的非法贩运等。目前最先进的VAS方法包括基于深度强化学习(DRL)得到端到端搜索策略的方法,以及将预测与定制算法相结合的传统主动搜索方法。虽然DRL框架在这些领域已大幅超越传统主动搜索,但其端到端的结构无法充分利用训练和实际搜索过程中获得的监督信息,当搜索任务与训练分布差异较大时,这是一个显著的局限。我们提出一种兼具DRL与传统主动搜索优势的方法,将搜索策略分解为一个预测模块和一个搜索模块:预测模块根据任务嵌入和搜索历史生成感兴趣区域的地理空间分布,搜索模块则以预测结果和搜索历史为输入,输出搜索分布。我们还提出了一种新的元学习方法来联合学习这一组合策略,使其能够有效利用训练和决策时获得的监督信息。大量实验表明,所提出的表示与元学习框架在多个问题领域上显著优于现有最优的视觉主动搜索方法。
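The prediction/search decomposition can be sketched as two small networks; the shapes and layer sizes below are arbitrary illustrations rather than the architecture used in the paper.

```python
import torch
import torch.nn as nn

class DecomposedVASPolicy(nn.Module):
    def __init__(self, feat_dim, task_dim, n_cells):
        super().__init__()
        # prediction module: per-cell score for containing a region of interest
        self.predict = nn.Sequential(nn.Linear(feat_dim + task_dim, 128),
                                     nn.ReLU(), nn.Linear(128, 1))
        # search module: maps predictions + search history to a query distribution
        self.search = nn.Sequential(nn.Linear(2 * n_cells, 256),
                                    nn.ReLU(), nn.Linear(256, n_cells))

    def forward(self, cell_feats, task_emb, history):
        # cell_feats: (n_cells, feat_dim), task_emb: (task_dim,), history: (n_cells,) floats in {0, 1}
        x = torch.cat([cell_feats, task_emb.expand(cell_feats.size(0), -1)], dim=-1)
        scores = self.predict(x).squeeze(-1)                   # supervisable per-cell predictions
        logits = self.search(torch.cat([scores, history]))
        return torch.softmax(logits.masked_fill(history.bool(), -1e9), dim=-1)
```

Because the prediction module has its own per-cell output, it can be supervised directly with labels collected at training or decision time, which is the information a purely end-to-end DRL policy would not exploit.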

Recursively-Constrained Partially Observable Markov Decision Processes

  • paper_url: http://arxiv.org/abs/2310.09688
  • repo_url: None
  • paper_authors: Qi Heng Ho, Tyler Becker, Ben Kraske, Zakariya Laouar, Martin Feather, Federico Rossi, Morteza Lahijanian, Zachary N. Sunberg
  • for: 本研究旨在解决受到转移不确定性和部分可见性限制的优化目标函数问题。
  • methods: 本研究使用了受到约束的部分 observable Markov Decision Process (C-POMDP) 模型,并提出了一种新的形式ulation,即 Recursively-Constrained POMDP (RC-POMDP),以解决优化目标函数问题中的缺陷。
  • results: 研究发现,对于 C-POMDPs,优化策略可能会违反贝尔曼的优化原则,导致不良行为。而 RC-POMDPs 中的优化策略总是具有确定性,并且遵循贝尔曼的优化原则。研究还提出了一种基于点的动态计划算法,可以Synthesize RC-POMDPs 中的优化策略。在一系列 benchmark 问题中,研究发现 RC-POMDPs 中的策略比 C-POMDPs 中的策略更为愉悦,并且 demonstrate 了算法的可靠性。
    Abstract In many problems, it is desirable to optimize an objective function while imposing constraints on some other aspect of the problem. A Constrained Partially Observable Markov Decision Process (C-POMDP) allows modelling of such problems while subject to transition uncertainty and partial observability. Typically, the constraints in C-POMDPs enforce a threshold on expected cumulative costs starting from an initial state distribution. In this work, we first show that optimal C-POMDP policies may violate Bellman's principle of optimality and thus may exhibit pathological behaviors, which can be undesirable for many applications. To address this drawback, we introduce a new formulation, the Recursively-Constrained POMDP (RC-POMDP), that imposes additional history dependent cost constraints on the C-POMDP. We show that, unlike C-POMDPs, RC-POMDPs always have deterministic optimal policies, and that optimal policies obey Bellman's principle of optimality. We also present a point-based dynamic programming algorithm that synthesizes optimal policies for RC-POMDPs. In our evaluations, we show that policies for RC-POMDPs produce more desirable behavior than policies for C-POMDPs and demonstrate the efficacy of our algorithm across a set of benchmark problems.
    摘要 在许多问题中,我们希望在优化目标函数的同时,对问题的其他方面施加约束。受约束的部分可观测马尔可夫决策过程(C-POMDP)可以在转移不确定性和部分可观测条件下对这类问题进行建模。通常,C-POMDP中的约束是对从初始状态分布出发的期望累计成本设定阈值。在这项工作中,我们首先证明C-POMDP的最优策略可能违反贝尔曼最优性原理,从而表现出病态行为,这对许多应用来说是不可取的。为解决这一缺陷,我们提出了一种新的形式——递归约束POMDP(RC-POMDP),它在C-POMDP的基础上施加了额外的依赖于历史的成本约束。我们证明,与C-POMDP不同,RC-POMDP总是存在确定性的最优策略,并且其最优策略满足贝尔曼最优性原理。我们还提出了一种基于点的动态规划算法,用于求解RC-POMDP的最优策略。在评估中,我们表明RC-POMDP的策略比C-POMDP的策略产生更符合预期的行为,并在一组基准问题上展示了算法的有效性。

Generative artificial intelligence for de novo protein design

  • paper_url: http://arxiv.org/abs/2310.09685
  • repo_url: None
  • paper_authors: Adam Winnifrith, Carlos Outeiral, Brian Hie
  • for: 这些论文的目的是探讨人工智能在蛋白质设计中的应用,以扩展我们对蛋白质的工程能力。
  • methods: 这些论文使用了生成型架构,如语言模型和扩散过程,生成 novel yet realistic 的蛋白质,以实现预先定义的功能和性能。
  • results: 现代设计协议的实验成功率已经接近 20%,从而扩大了蛋白质设计的可能性。 despite extensive progress, there are still challenges in the field, such as determining the best in silico metrics to prioritize designs for experimental testing, and designing proteins that can undergo large conformational changes or be regulated by post-translational modifications and other cellular processes.
    Abstract Engineering new molecules with desirable functions and properties has the potential to extend our ability to engineer proteins beyond what nature has so far evolved. Advances in the so-called "de novo" design problem have recently been brought forward by developments in artificial intelligence. Generative architectures, such as language models and diffusion processes, seem adept at generating novel, yet realistic proteins that display desirable properties and perform specified functions. State-of-the-art design protocols now achieve experimental success rates nearing 20%, thus widening the access to de novo designed proteins. Despite extensive progress, there are clear field-wide challenges, for example in determining the best in silico metrics to prioritise designs for experimental testing, and in designing proteins that can undergo large conformational changes or be regulated by post-translational modifications and other cellular processes. With an increase in the number of models being developed, this review provides a framework to understand how these tools fit into the overall process of de novo protein design. Throughout, we highlight the power of incorporating biochemical knowledge to improve performance and interpretability.
    摘要 设计具有理想功能和性质的新分子,有望将我们改造蛋白质的能力拓展到自然演化尚未到达的范围。人工智能的发展近来推动了所谓“从头设计”(de novo design)问题的进步:语言模型、扩散过程等生成式架构擅长生成新颖而逼真、具备理想性质并能执行指定功能的蛋白质。目前最先进的设计流程的实验成功率已接近20%,使更多人能够获得从头设计的蛋白质。尽管进展显著,该领域仍面临共性挑战,例如如何确定最佳的计算指标以筛选进入实验验证的设计,以及如何设计能够发生大幅构象变化、或可受翻译后修饰及其他细胞过程调控的蛋白质。随着模型数量的不断增加,本综述提供了一个框架,用以理解这些工具如何嵌入从头蛋白质设计的整体流程。全文中,我们强调融入生物化学知识对提升性能和可解释性的作用。

cs.CL - 2023-10-15

UvA-MT’s Participation in the WMT23 General Translation Shared Task

  • paper_url: http://arxiv.org/abs/2310.09946
  • repo_url: None
  • paper_authors: Di Wu, Shaomu Tan, David Stap, Ali Araabi, Christof Monz
  • for: 这个研究报告描述了阿姆斯特丹大学的自然语言处理实验室(UvA-MT)在2023年世界机器翻译大会(WMT)共享任务中的参加。他们在英文<->希伯来两个方向的受限Track中参加竞赛,并显示了使用一个模型处理对向任务时,可以达到相似的结果,比较 Traditional的双语翻译。
  • methods: 这个研究使用了一些有效的策略,如回译(back-translation)、重参数化的嵌入表和面向任务的微调,以提高自动评估中的最终结果。
  • results: 在自动评估中,他们在英文->希伯来和希伯来->英文两个方向中都获得了竞争性的结果。
    Abstract This paper describes the UvA-MT's submission to the WMT 2023 shared task on general machine translation. We participate in the constrained track in two directions: English <-> Hebrew. In this competition, we show that by using one model to handle bidirectional tasks, as a minimal setting of Multilingual Machine Translation (MMT), it is possible to achieve comparable results with that of traditional bilingual translation for both directions. By including effective strategies, like back-translation, re-parameterized embedding table, and task-oriented fine-tuning, we obtained competitive final results in the automatic evaluation for both English -> Hebrew and Hebrew -> English directions.
    摘要 这篇论文描述了UvA-MT在WMT 2023通用机器翻译共享任务中的提交。我们参加了英语<->希伯来语两个方向的受限赛道。在这次竞赛中,我们表明,通过使用一个模型处理双向任务,作为多语言机器翻译(MMT)的最小设置,可以在两个方向上取得与传统双语翻译相当的结果。通过引入回译、重参数化的嵌入表和面向任务的微调等有效策略,我们在英语->希伯来语和希伯来语->英语两个方向的自动评估中均获得了具有竞争力的最终结果。

FiLM: Fill-in Language Models for Any-Order Generation

  • paper_url: http://arxiv.org/abs/2310.09930
  • repo_url: https://github.com/shentianxiao/film
  • paper_authors: Tianxiao Shen, Hao Peng, Ruoqi Shen, Yao Fu, Zaid Harchaoui, Yejin Choi
  • for: 填充语言模型 (Fill-in Language Model, FiLM) 的目的是提供一种可以在任意位置进行灵活生成的语言模型,以便在填充文本中使用双向文本上下文。
  • methods: FiLM 使用了一种新的语言模型方法,即采用 beta 分布中的变化掩码概率来提高 FiLM 的生成能力。在推理过程中,FiLM 可以顺利地插入缺失的句子、段落或整个文本,以确保输出的文本流畅、与周围上下文一致。
  • results: 在自动和人工评估中,FiLM 表现出色,超过了基于左到右语言模型的填充方法。FiLM 可以轻松地在不同的文本长度和难度水平上进行调整,并且可以在不同的语言模型大小上进行训练和 fine-tuning。
    Abstract Language models have become the backbone of today's AI systems. However, their predominant left-to-right generation limits the use of bidirectional context, which is essential for tasks that involve filling text in the middle. We propose the Fill-in Language Model (FiLM), a new language modeling approach that allows for flexible generation at any position without adhering to a specific generation order. Its training extends the masked language modeling objective by adopting varying mask probabilities sampled from the Beta distribution to enhance the generative capabilities of FiLM. During inference, FiLM can seamlessly insert missing phrases, sentences, or paragraphs, ensuring that the outputs are fluent and are coherent with the surrounding context. In both automatic and human evaluations, FiLM outperforms existing infilling methods that rely on left-to-right language models trained on rearranged text segments. FiLM is easy to implement and can be either trained from scratch or fine-tuned from a left-to-right language model. Notably, as the model size grows, FiLM's perplexity approaches that of strong left-to-right language models of similar sizes, indicating FiLM's scalability and potential as a large language model.
    摘要 现代人工智能系统中,语言模型已成为背景模型。然而,这些主要左往右生成的语言模型限制了使用对向文本填充的 bidirectional 上下文,这是装备填充文本的任务中非常重要。我们提出了填充语言模型(FiLM),一种新的语言模型化方法,可以在任何位置进行 flexible 生成,不受特定生成顺序的限制。它的训练将推广遮盾语言模型的对话预设,透过对应排版的 beta 分布来增强FiLM的生成能力。在推断中,FiLM可以顺利地插入缺失的句子、句末或段落,以确保输出的流畅和与周围上下文一致。在自动和人工评估中,FiLM比靠左往右的语言模型训练在重新排序的文本段落上的填充方法表现出色,并且可以轻松地从头部训练或精革左往右语言模型。值得一提的是,当模型的大小增加时,FiLM的误差接近强左往右语言模型相似大小的误差,这表明FiLM在大型模型中的可扩展性和潜力。
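The key training trick—drawing a fresh mask probability from a Beta distribution so that the model sees mask rates ranging from near 0 to near 1—can be sketched as follows; the Beta parameters here are illustrative placeholders, not the values used in the paper.

```python
import torch

def film_corrupt(input_ids, mask_token_id, alpha=2.0, beta=2.0, special_mask=None):
    """Mask a Beta-sampled fraction of positions so training covers mask rates
    from near 0 (left-to-right-like) up to near 1 (full infilling)."""
    ratio = torch.distributions.Beta(alpha, beta).sample()        # per-batch mask rate
    probs = torch.full(input_ids.shape, float(ratio))
    if special_mask is not None:                                  # never mask special tokens
        probs = probs.masked_fill(special_mask, 0.0)
    is_masked = torch.bernoulli(probs).bool()
    labels = input_ids.masked_fill(~is_masked, -100)              # loss only on masked slots
    corrupted = input_ids.masked_fill(is_masked, mask_token_id)
    return corrupted, labels
```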

Prompting Scientific Names for Zero-Shot Species Recognition

  • paper_url: http://arxiv.org/abs/2310.09929
  • repo_url: None
  • paper_authors: Shubham Parashar, Zhiqiu Lin, Yanan Li, Shu Kong
  • for: 本研究旨在使用CLIP进行零shot认知高级生物物种,包括鸟类、植物和动物的species recognition。
  • methods: 本研究使用CLIP进行零shot认知,并使用大语言模型(LLM)生成描述(例如物种颜色和形状)以提高性能。
  • results: 研究发现,使用common名称(例如mountain hare)而不是学名(例如Lepus Timidus)在prompt中可以提高CLIP的认知精度,并且可以达到2∼5倍的提升。
    Abstract Trained on web-scale image-text pairs, Vision-Language Models (VLMs) such as CLIP can recognize images of common objects in a zero-shot fashion. However, it is underexplored how to use CLIP for zero-shot recognition of highly specialized concepts, e.g., species of birds, plants, and animals, for which their scientific names are written in Latin or Greek. Indeed, CLIP performs poorly for zero-shot species recognition with prompts that use scientific names, e.g., "a photo of Lepus Timidus" (which is a scientific name in Latin). Because these names are usually not included in CLIP's training set. To improve performance, prior works propose to use large-language models (LLMs) to generate descriptions (e.g., of species color and shape) and additionally use them in prompts. We find that they bring only marginal gains. Differently, we are motivated to translate scientific names (e.g., Lepus Timidus) to common English names (e.g., mountain hare) and use such in the prompts. We find that common names are more likely to be included in CLIP's training set, and prompting them achieves 2$\sim$5 times higher accuracy on benchmarking datasets of fine-grained species recognition.
    摘要 使用 web 级别的图片文本对,视觉语言模型(VLM)如 CLIP 可以不经过训练就识别通用对象的图片。但是,对于高度专业化的概念,如鸟类、植物和动物的种类,它们的科学名称通常是拉丁文或希腊文。CLIP 在无需训练的情况下识别这些种类的图片表现不佳,因为这些名称没有包含在 CLIP 的训练集中。以前的研究提议使用大型自然语言模型(LLM)生成描述(例如,种类颜色和形状),并将其添加到提示中。我们发现它们只提供了有限的改进。与此不同,我们强调将科学名称翻译成通用英文名称(例如,山兔),并使用这些名称作为提示。我们发现这样可以提高 CLIP 的准确率,在benchmarking数据集上实现2-5倍的提高。
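The prompting change itself is a one-line difference in a standard CLIP zero-shot pipeline; the sketch below uses a public Hugging Face CLIP checkpoint, a toy scientific-to-common name map, and a placeholder image path.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# scientific name -> common English name (could come from a taxonomy or an LLM)
names = {"Lepus timidus": "mountain hare", "Parus major": "great tit"}
prompts = [f"a photo of a {common}." for common in names.values()]   # not the Latin names

image = Image.open("query.jpg")                                      # placeholder image
inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    probs = model(**inputs).logits_per_image.softmax(dim=-1)[0]
print({sci: round(p.item(), 3) for sci, p in zip(names, probs)})
```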

Empirical study of pretrained multilingual language models for zero-shot cross-lingual generation

  • paper_url: http://arxiv.org/abs/2310.09917
  • repo_url: None
  • paper_authors: Nadezhda Chirkova, Sheng Liang, Vassilina Nikoulina
  • for: 这个论文旨在研究零shot cross-语言生成技术,即使finetuning多语言预训练语言模型(mPLM)在一种语言上的一个生成任务,然后用其来预测这个任务在其他语言上的结果。
  • methods: 这篇论文测试了一些替代的mPLM模型,包括mBART和NLLB,并考虑了全 Parameters 的 fine-tuning 和 parameter-efficient fine-tuning with adapters。
  • results: 研究发现,mBART with adapters 与 mT5 相似,NLLB 可以在一些情况下与 mT5 竞争。 此外,研究发现训练学习率对 fine-tuning 的调整可以减轻生成错误语言的问题。
    Abstract Zero-shot cross-lingual generation assumes finetuning the multilingual pretrained language model (mPLM) on a generation task in one language and then using it to make predictions for this task in other languages. Previous works notice a frequent problem of generation in a wrong language and propose approaches to address it, usually using mT5 as a backbone model. In this work, we test alternative mPLMs, such as mBART and NLLB, considering full finetuning and parameter-efficient finetuning with adapters. We find that mBART with adapters performs similarly to mT5 of the same size, and NLLB can be competitive in some cases. We also underline the importance of tuning learning rate used for finetuning, which helps to alleviate the problem of generation in the wrong language.
    摘要 zero-shot 跨语言生成假设通过质量化多语言预训练语言模型(mPLM)的 Fine-tuning 进行一种语言的生成任务,然后用其来预测这个任务的其他语言。 previous works 发现生成 incorrect language 的问题,并提出了解决方案,通常使用 mT5 作为基础模型。 在这个工作中,我们测试了不同的 mPLM,如 mBART 和 NLLB,包括全部 Fine-tuning 和参数有效的 Fine-tuning WITH 适配器。我们发现 mBART WITH 适配器 和 mT5 的同等大小下表现相似,而 NLLB 在一些情况下可以达到竞争水平。我们还强调了在 Fine-tuning 中调整学习率的重要性,可以减轻生成 incorrect language 的问题。

Can GPT-4V(ision) Serve Medical Applications? Case Studies on GPT-4V for Multimodal Medical Diagnosis

  • paper_url: http://arxiv.org/abs/2310.09909
  • repo_url: https://github.com/chaoyi-wu/gpt-4v_medical_evaluation
  • paper_authors: Chaoyi Wu, Jiayu Lei, Qiaoyu Zheng, Weike Zhao, Weixiong Lin, Xiaoman Zhang, Xiao Zhou, Ziheng Zhao, Ya Zhang, Yanfeng Wang, Weidi Xie
  • for: 这项研究用于评估OpenAI的GPT-4V模型在多模态医学诊断中的表现,包括分辨医疗影像模式和解剖结构等能力。
  • methods: 这项评估使用17个人体系统和8种医疗影像模式,有或无患者历史提供,以探索GPT-4V在多种临床任务上的能力,包括影像模式和解剖结构识别、疾病诊断、报告生成等。
  • results: 研究发现,虽然GPT-4V在分辨医疗影像模式和解剖结构方面表现出色,但在疾病诊断和生成全面报告方面受到了重大挑战,表明大型多模态模型在实际医疗应用和临床决策中仍有很大的发展空间。
    Abstract Driven by the large foundation models, the development of artificial intelligence has witnessed tremendous progress lately, leading to a surge of general interest from the public. In this study, we aim to assess the performance of OpenAI's newest model, GPT-4V(ision), specifically in the realm of multimodal medical diagnosis. Our evaluation encompasses 17 human body systems, including Central Nervous System, Head and Neck, Cardiac, Chest, Hematology, Hepatobiliary, Gastrointestinal, Urogenital, Gynecology, Obstetrics, Breast, Musculoskeletal, Spine, Vascular, Oncology, Trauma, Pediatrics, with images taken from 8 modalities used in daily clinic routine, e.g., X-ray, Computed Tomography (CT), Magnetic Resonance Imaging (MRI), Positron Emission Tomography (PET), Digital Subtraction Angiography (DSA), Mammography, Ultrasound, and Pathology. We probe the GPT-4V's ability on multiple clinical tasks with or without patent history provided, including imaging modality and anatomy recognition, disease diagnosis, report generation, disease localisation. Our observation shows that, while GPT-4V demonstrates proficiency in distinguishing between medical image modalities and anatomy, it faces significant challenges in disease diagnosis and generating comprehensive reports. These findings underscore that while large multimodal models have made significant advancements in computer vision and natural language processing, it remains far from being used to effectively support real-world medical applications and clinical decision-making. All images used in this report can be found in https://github.com/chaoyi-wu/GPT-4V_Medical_Evaluation.
    摘要 在大型基础模型的推动下,人工智能的发展近来取得了巨大进步,引起了公众的广泛关注。在这项研究中,我们评估OpenAI最新的GPT-4V(ision)模型在多模态医学诊断领域的表现。我们的评估覆盖17个人体系统,包括中枢神经系统、头颈部、心脏、胸部、血液、肝胆、胃肠、泌尿生殖、妇科、产科、乳腺、肌肉骨骼、脊柱、血管、肿瘤、创伤和儿科,图像来自日常临床常用的8种模态,如X射线、计算机断层扫描(CT)、磁共振成像(MRI)、正电子发射断层扫描(PET)、数字减影血管造影(DSA)、乳腺钼靶、超声和病理。我们在提供或不提供患者病史的条件下,考察GPT-4V在多种临床任务上的能力,包括影像模态与解剖结构识别、疾病诊断、报告生成和病灶定位。我们的观察表明,尽管GPT-4V能够较好地区分医学影像模态和解剖结构,但在疾病诊断和生成全面报告方面仍面临重大挑战。这些发现说明,虽然大型多模态模型在计算机视觉和自然语言处理方面取得了显著进展,但距离有效支撑真实世界的医疗应用和临床决策仍有很大差距。本报告所用的全部图像可在 https://github.com/chaoyi-wu/GPT-4V_Medical_Evaluation 找到。

Reformulating NLP tasks to Capture Longitudinal Manifestation of Language Disorders in People with Dementia

  • paper_url: http://arxiv.org/abs/2310.09897
  • repo_url: None
  • paper_authors: Dimitris Gkoumas, Matthew Purver, Maria Liakata
  • for: 这个研究是为了 automatization 语言障碍模式,以便更好地识别和评估词语障碍。
  • methods: 这个研究使用了一个已经训练过的自然语言处理(NLP)模型,并对其进行了修改,以便在NLP任务中强制使用语言模式。然后,他们使用了这些任务的概率估计来构建数字语言标记,用于评估语言交流质量和语言障碍的严重程度。
  • results: 研究发现,所提出的沟通标记能够稳健、可靠地刻画痴呆症患者的语言,优于现有的语言学方法,并与临床行为标记显著相关。此外,所提出的语言障碍标记还为疾病进展过程中逐渐加重的语言损伤提供了有用的洞见。
    Abstract Dementia is associated with language disorders which impede communication. Here, we automatically learn linguistic disorder patterns by making use of a moderately-sized pre-trained language model and forcing it to focus on reformulated natural language processing (NLP) tasks and associated linguistic patterns. Our experiments show that NLP tasks that encapsulate contextual information and enhance the gradient signal with linguistic patterns benefit performance. We then use the probability estimates from the best model to construct digital linguistic markers measuring the overall quality in communication and the intensity of a variety of language disorders. We investigate how the digital markers characterize dementia speech from a longitudinal perspective. We find that our proposed communication marker is able to robustly and reliably characterize the language of people with dementia, outperforming existing linguistic approaches; and shows external validity via significant correlation with clinical markers of behaviour. Finally, our proposed linguistic disorder markers provide useful insights into gradual language impairment associated with disease progression.
    摘要 痴呆症常伴随妨碍交流的语言障碍。在本研究中,我们借助一个中等规模的预训练语言模型,迫使其专注于改写后的自然语言处理(NLP)任务及相关的语言模式,从而自动学习语言障碍模式。我们的实验显示,封装上下文信息并以语言模式增强梯度信号的NLP任务有利于提升表现。随后,我们利用最佳模型的概率估计构建数字语言标记,用于衡量整体沟通质量以及多种语言障碍的严重程度。我们从纵向角度研究这些数字标记如何刻画痴呆症患者的言语,发现所提出的沟通标记能够稳健、可靠地刻画痴呆症患者的语言,优于现有的语言学方法,并与临床行为标记显著相关,具备外部效度。最后,所提出的语言障碍标记为疾病进展过程中逐渐加重的语言损伤提供了有用的洞见。

Bounding and Filling: A Fast and Flexible Framework for Image Captioning

  • paper_url: http://arxiv.org/abs/2310.09876
  • repo_url: https://github.com/changxinwang/boficap
  • paper_authors: Zheng Ma, Changxin Wang, Bo Huang, Zixuan Zhu, Jianbing Zhang
  • for: 这篇论文目的是提出一种快速和灵活的图像描述模型,以解决现有的描述模型具有 significiant inference latency 问题。
  • methods: 该模型使用 bounding 和 filling 技术,将图像分割成多个区域,然后采用 two-generation 方式填充每个区域。
  • results: 在非自回归方式下,该模型在 MS-COCO 基准上取得了任务指标 CIDEr 的最优性能(125.6),并且比采用自回归方式的基线模型快 9.22 倍;在半自回归方式下,该方法达到 128.4 的 CIDEr,同时获得 3.69 倍的加速。
    Abstract Most image captioning models following an autoregressive manner suffer from significant inference latency. Several models adopted a non-autoregressive manner to speed up the process. However, the vanilla non-autoregressive manner results in subpar performance, since it generates all words simultaneously, which fails to capture the relationships between words in a description. The semi-autoregressive manner employs a partially parallel method to preserve performance, but it sacrifices inference speed. In this paper, we introduce a fast and flexible framework for image captioning called BoFiCap based on bounding and filling techniques. The BoFiCap model leverages the inherent characteristics of image captioning tasks to pre-define bounding boxes for image regions and their relationships. Subsequently, the BoFiCap model fills corresponding words in each box using two-generation manners. Leveraging the box hints, our filling process allows each word to better perceive other words. Additionally, our model offers flexible image description generation: 1) by employing different generation manners based on speed or performance requirements, 2) producing varied sentences based on user-specified boxes. Experimental evaluations on the MS-COCO benchmark dataset demonstrate that our framework in a non-autoregressive manner achieves the state-of-the-art on task-specific metric CIDEr (125.6) while speeding up 9.22x than the baseline model with an autoregressive manner; in a semi-autoregressive manner, our method reaches 128.4 on CIDEr while a 3.69x speedup. Our code and data is available at https://github.com/ChangxinWang/BoFiCap.
    摘要 大多数采用自回归方式的图像描述模型存在显著的推理延迟。一些模型采用非自回归方式来加速过程,但朴素的非自回归方式会同时生成所有词,无法捕捉描述中词与词之间的关系,导致性能欠佳;半自回归方式则采用部分并行的方法来保持性能,但牺牲了推理速度。本文提出一种基于框定与填充(bounding and filling)技术的快速且灵活的图像描述框架BoFiCap。BoFiCap利用图像描述任务的内在特点,预先为图像区域及其关系定义边界框,随后以两种生成方式在每个框中填充相应的词。借助框的提示,填充过程使每个词能够更好地感知其他词。此外,我们的模型支持灵活的图像描述生成:1)根据速度或性能需求选用不同的生成方式;2)根据用户指定的框生成多样化的句子。在MS-COCO基准数据集上的实验评估表明,我们的框架在非自回归方式下于任务指标CIDEr上达到最优(125.6),同时比采用自回归方式的基线模型加速9.22倍;在半自回归方式下,我们的方法达到128.4的CIDEr,并获得3.69倍的加速。我们的代码和数据见 https://github.com/ChangxinWang/BoFiCap。

Enhancing Stance Classification with Quantified Moral Foundations

  • paper_url: http://arxiv.org/abs/2310.09848
  • repo_url: None
  • paper_authors: Hong Zhang, Prasanta Bhattacharya, Wei Gao, Liang Ze Wong, Brandon Siyuan Loh, Joseph J. P. Simons, Jisun An
  • for: 这篇论文旨在通过纳入更深层的心理属性(特别是个体的道德基础)来增强社交媒体上的立场检测。
  • methods: 这篇论文从文本中提取道德基础特征,并结合消息语义特征,在消息层面和用户层面对多种目标和模型进行立场分类。
  • results: 初步结果表明,编码道德基础可以提升立场检测任务的性能,并有助于揭示特定道德基础与针对目标话题的在线立场之间的关联。结果凸显了在立场分析中考虑更深层心理属性的重要性,并强调了道德基础在引导在线社交行为中的作用。
    Abstract This study enhances stance detection on social media by incorporating deeper psychological attributes, specifically individuals' moral foundations. These theoretically-derived dimensions aim to provide a comprehensive profile of an individual's moral concerns which, in recent work, has been linked to behaviour in a range of domains, including society, politics, health, and the environment. In this paper, we investigate how moral foundation dimensions can contribute to predicting an individual's stance on a given target. Specifically we incorporate moral foundation features extracted from text, along with message semantic features, to classify stances at both message- and user-levels across a range of targets and models. Our preliminary results suggest that encoding moral foundations can enhance the performance of stance detection tasks and help illuminate the associations between specific moral foundations and online stances on target topics. The results highlight the importance of considering deeper psychological attributes in stance analysis and underscores the role of moral foundations in guiding online social behavior.
    摘要 本研究通过纳入更深层的心理属性——特别是个体的道德基础——来增强社交媒体上的立场检测。这些源自理论的维度旨在全面刻画个体的道德关切,而近期研究已表明其与社会、政治、健康和环境等多个领域的行为相关。在本文中,我们考察道德基础维度如何有助于预测个体对给定目标的立场。具体而言,我们将从文本中提取的道德基础特征与消息语义特征相结合,在多种目标和模型上于消息层面和用户层面进行立场分类。初步结果表明,编码道德基础能够提升立场检测任务的性能,并有助于揭示特定道德基础与针对目标话题的在线立场之间的关联。这些结果凸显了在立场分析中考虑更深层心理属性的重要性,并强调了道德基础在引导在线社交行为中的作用。
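A simple baseline along these lines concatenates per-foundation lexicon-match rates with sentence embeddings before a standard classifier; the lexicon lookup below is a rough stand-in for the quantified moral-foundation scores used in the paper, and embed() is any sentence encoder you choose.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

FOUNDATIONS = ["care", "fairness", "loyalty", "authority", "purity"]

def moral_scores(text, lexicon):
    """lexicon: {foundation: set of cue words}; returns per-foundation match rates."""
    toks = text.lower().split()
    return [sum(t in lexicon[f] for t in toks) / max(len(toks), 1) for f in FOUNDATIONS]

def features(texts, embed, lexicon):
    """embed(texts) -> (n, d) message semantic embeddings from any sentence encoder."""
    sem = np.asarray(embed(texts))
    mf = np.array([moral_scores(t, lexicon) for t in texts])
    return np.hstack([sem, mf])                     # semantics + moral-foundation signal

stance_clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
# stance_clf.fit(features(train_texts, embed, lexicon), train_labels)
```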

Merging Experts into One: Improving Computational Efficiency of Mixture of Experts

  • paper_url: http://arxiv.org/abs/2310.09832
  • repo_url: https://github.com/shwai-he/meo
  • paper_authors: Shwai He, Run-Ze Fan, Liang Ding, Li Shen, Tianyi Zhou, Dacheng Tao
  • for: Scaling up language models usually improves NLP performance but increases computational cost; a sparse Mixture of Experts (MoE) can reduce the cost, yet computation grows quickly as more experts are activated, limiting practical use. This paper proposes Merging Experts into One (MEO), a computation-efficient approach that keeps the benefit of adding experts without substantially increasing cost.
  • methods: The authors first demonstrate the superiority of selecting multiple experts and then propose MEO, which merges the selected experts so that the computation cost is reduced to that of a single expert; a token-level attention block is further introduced to improve the efficiency and performance of token-level MEO.
  • results: Extensive experiments show that MEO significantly improves computational efficiency, e.g., FLOPS drops from 72.0G (vanilla MoE) to 28.6G (MEO), and the token-level variant reaches an average GLUE score of 83.3% versus 82.6% for vanilla MoE.
    Abstract Scaling the size of language models usually leads to remarkable advancements in NLP tasks, but it often comes at the price of growing computational cost. Although a sparse Mixture of Experts (MoE) can reduce the cost by activating a small subset of parameters (e.g., one expert) for each input, its computation escalates significantly if the number of activated experts increases, limiting its practical utility. Can we retain the advantages of adding more experts without substantially increasing the computational costs? In this paper, we first demonstrate the superiority of selecting multiple experts and then propose a computation-efficient approach called \textbf{\texttt{Merging Experts into One}} (MEO), which reduces the computation cost to that of a single expert. Extensive experiments show that MEO significantly improves computational efficiency, e.g., FLOPS drops from 72.0G of vanilla MoE to 28.6G (MEO). Moreover, we propose a token-level attention block that further enhances the efficiency and performance of token-level MEO, e.g., 83.3\% (MEO) vs. 82.6\% (vanilla MoE) average score on the GLUE benchmark. Our code will be released upon acceptance at: \url{https://github.com/Shwai-He/MEO}.
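The core "merge, then compute once" idea can be sketched as below, assuming each expert is a single linear layer and a simple softmax router produces the merge weights. The paper's routing granularity, normalization, and token-level attention block are not reproduced here.

```python
# Minimal sketch of merging expert parameters before one forward pass,
# so the per-input cost equals a single expert regardless of expert count.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MergedExpertsLayer(nn.Module):
    def __init__(self, d_in, d_out, n_experts):
        super().__init__()
        self.experts_w = nn.Parameter(torch.randn(n_experts, d_out, d_in) * 0.02)
        self.experts_b = nn.Parameter(torch.zeros(n_experts, d_out))
        self.router = nn.Linear(d_in, n_experts)

    def forward(self, x):                                    # x: (batch, d_in)
        gate = F.softmax(self.router(x.mean(dim=0)), dim=-1)  # one gate per batch (assumption)
        # Merge expert weights/biases into one, then apply a single linear op.
        w = torch.einsum("e,eoi->oi", gate, self.experts_w)
        b = torch.einsum("e,eo->o", gate, self.experts_b)
        return F.linear(x, w, b)

layer = MergedExpertsLayer(d_in=16, d_out=32, n_experts=4)
print(layer(torch.randn(8, 16)).shape)                       # torch.Size([8, 32])
```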

Assessing the Reliability of Large Language Model Knowledge

  • paper_url: http://arxiv.org/abs/2310.09820
  • repo_url: None
  • paper_authors: Weixuan Wang, Barry Haddow, Alexandra Birch, Wei Peng
  • for: Assessing the factual reliability of the knowledge encoded in large language models (LLMs).
  • methods: Proposes the MOdel kNowledge relIabiliTy scORe (MONITOR), a novel metric designed to directly measure LLMs' factual reliability.
  • results: Experiments on 12 LLMs demonstrate the effectiveness of MONITOR at a low computational overhead; the Factual Knowledge Test Corpus (FKTC) is also released to foster further research.
    Abstract Large language models (LLMs) have been treated as knowledge bases due to their strong performance in knowledge probing tasks. LLMs are typically evaluated using accuracy, yet this metric does not capture the vulnerability of LLMs to hallucination-inducing factors like prompt and context variability. How do we evaluate the capabilities of LLMs to consistently produce factually correct answers? In this paper, we propose MOdel kNowledge relIabiliTy scORe (MONITOR), a novel metric designed to directly measure LLMs' factual reliability. MONITOR computes the distance between the probability distributions of a valid output and its counterparts produced by the same LLM probing the same fact using different styles of prompts and contexts.Experiments on a comprehensive range of 12 LLMs demonstrate the effectiveness of MONITOR in evaluating the factual reliability of LLMs while maintaining a low computational overhead. In addition, we release the FKTC (Factual Knowledge Test Corpus) test set, containing 210,158 prompts in total to foster research along this line (https://github.com/Vicky-Wil/MONITOR).
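The intuition behind measuring factual reliability across prompt and context variation can be sketched as below. This uses a mean pairwise total-variation distance over candidate-answer distributions as an illustrative stand-in; MONITOR's exact distance and probing setup should be taken from the paper and repository.

```python
# Illustrative only: how stable is a model's answer distribution for the same
# fact when the prompt style changes? Higher score = more consistent.
import numpy as np
from itertools import combinations

def consistency_score(prob_rows):
    """prob_rows: (n_prompts, n_candidates) answer distributions, one per prompt style."""
    p = np.asarray(prob_rows, dtype=float)
    p = p / p.sum(axis=1, keepdims=True)
    dists = [0.5 * np.abs(a - b).sum() for a, b in combinations(p, 2)]
    return 1.0 - float(np.mean(dists))

# Distributions over the same three candidate answers under three prompt styles.
probs = [[0.7, 0.2, 0.1],
         [0.6, 0.3, 0.1],
         [0.2, 0.7, 0.1]]   # the third prompt flips the prediction
print(round(consistency_score(probs), 3))
```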

RSVP: Customer Intent Detection via Agent Response Contrastive and Generative Pre-Training

  • paper_url: http://arxiv.org/abs/2310.09773
  • repo_url: https://github.com/tommytyc/rsvp
  • paper_authors: Yu-Chien Tang, Wei-Yao Wang, An-Zi Yen, Wen-Chih Peng
  • for: Providing users with precise answers and round-the-clock support in task-oriented customer-service dialogues.
  • methods: Detects customer intent from utterances with a self-supervised framework that pre-trains on agent responses in two stages: response retrieval and response generation.
  • results: Outperforms state-of-the-art baselines by 4.95% in accuracy, 3.4% in MRR@3, and 2.75% in MRR@5 on average.
    Abstract The dialogue systems in customer services have been developed with neural models to provide users with precise answers and round-the-clock support in task-oriented conversations by detecting customer intents based on their utterances. Existing intent detection approaches have highly relied on adaptively pre-training language models with large-scale datasets, yet the predominant cost of data collection may hinder their superiority. In addition, they neglect the information within the conversational responses of the agents, which have a lower collection cost, but are significant to customer intent as agents must tailor their replies based on the customers' intent. In this paper, we propose RSVP, a self-supervised framework dedicated to task-oriented dialogues, which utilizes agent responses for pre-training in a two-stage manner. Specifically, we introduce two pre-training tasks to incorporate the relations of utterance-response pairs: 1) Response Retrieval by selecting a correct response from a batch of candidates, and 2) Response Generation by mimicking agents to generate the response to a given utterance. Our benchmark results for two real-world customer service datasets show that RSVP significantly outperforms the state-of-the-art baselines by 4.95% for accuracy, 3.4% for MRR@3, and 2.75% for MRR@5 on average. Extensive case studies are investigated to show the validity of incorporating agent responses into the pre-training stage.
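The response-retrieval pre-training stage ("select a correct response from a batch of candidates") can be sketched as an in-batch contrastive objective between utterance and response embeddings. The encoders below are toy linear layers standing in for the real sentence encoders, and the response-generation stage is omitted.

```python
# Sketch of stage-1 response retrieval: in-batch contrastive matching of
# customer utterances to their paired agent responses.
import torch
import torch.nn as nn
import torch.nn.functional as F

utt_encoder = nn.Linear(64, 32)     # stand-in for a real utterance encoder
resp_encoder = nn.Linear(64, 32)    # stand-in for a real response encoder

utt = torch.randn(8, 64)            # batch of utterance features
resp = torch.randn(8, 64)           # the paired agent-response features

u = F.normalize(utt_encoder(utt), dim=-1)
r = F.normalize(resp_encoder(resp), dim=-1)
logits = u @ r.t() / 0.07           # similarity of every utterance to every response
labels = torch.arange(len(u))       # the diagonal holds the true pairs
loss = F.cross_entropy(logits, labels)
loss.backward()
print(float(loss))
```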

Revisiting Graph Meaning Representations through Decoupling Contextual Representation Learning and Structural Information Propagation

  • paper_url: http://arxiv.org/abs/2310.09772
  • repo_url: None
  • paper_authors: Li Zhou, Wenyu Chen, Dingyi Zeng, Hong Qu, Daniel Hershcovich
  • for: Investigating the precise influence of graph meaning representations (GMRs) on relation extraction tasks.
  • methods: Proposes DAGNN-plus, a simple and parameter-efficient neural architecture that decouples contextual representation learning from structural information propagation.
  • results: Across two English and two Chinese datasets, GMRs improve performance on three of the four datasets, favoring English thanks to highly accurate parsers, while being less effective in literary-domain than general-domain data; these findings can guide better design of GMRs and parsers for relation classification.
    Abstract In the field of natural language understanding, the intersection of neural models and graph meaning representations (GMRs) remains a compelling area of research. Despite the growing interest, a critical gap persists in understanding the exact influence of GMRs, particularly concerning relation extraction tasks. Addressing this, we introduce DAGNN-plus, a simple and parameter-efficient neural architecture designed to decouple contextual representation learning from structural information propagation. Coupled with various sequence encoders and GMRs, this architecture provides a foundation for systematic experimentation on two English and two Chinese datasets. Our empirical analysis utilizes four different graph formalisms and nine parsers. The results yield a nuanced understanding of GMRs, showing improvements in three out of the four datasets, particularly favoring English over Chinese due to highly accurate parsers. Interestingly, GMRs appear less effective in literary-domain datasets compared to general-domain datasets. These findings lay the groundwork for better-informed design of GMRs and parsers to improve relation classification, which is expected to tangibly impact the future trajectory of natural language understanding research.

Large Language Model-Aware In-Context Learning for Code Generation

  • paper_url: http://arxiv.org/abs/2310.09748
  • repo_url: None
  • paper_authors: Jia Li, Ge Li, Chongyang Tao, Jia Li, Huangzhao Zhang, Fang Liu, Zhi Jin
  • for: This paper proposes a learning-based example selection approach to improve the in-context learning of LLMs for code generation.
  • methods: Candidate examples are scored by the LLM itself via the generation probability of the ground-truth program given the requirement and the example; candidates are then labeled positive or negative from this probability feedback, and a retriever is trained with a contrastive learning objective to capture the LLM's preferences in code generation.
  • results: LAIL improves over state-of-the-art baselines by 11.58%, 6.89%, and 5.07% on CodeGen, and by 4.38%, 2.85%, and 2.74% on GPT-3.5.
    Abstract Large language models (LLMs) have shown impressive in-context learning (ICL) ability in code generation. LLMs take a prompt consisting of requirement-code examples and a new requirement as input, and output new programs. Existing studies have found that ICL is highly dominated by the examples, which has given rise to research on example selection. However, existing approaches randomly select examples or only consider the textual similarity of requirements to retrieve, leading to sub-optimal performance. In this paper, we propose a novel learning-based selection approach named LAIL (LLM-Aware In-context Learning) for code generation. Given a candidate example, we exploit LLMs themselves to estimate it by considering the generation probabilities of ground-truth programs given a requirement and the example. We then label candidate examples as positive or negative through the probability feedback. Based on the labeled data, we import a contrastive learning objective to train an effective retriever that acquires the preference of LLMs in code generation. We apply LAIL to three LLMs and evaluate it on three representative datasets (e.g., MBJP, MBPP, and MBCPP). LAIL outperforms the state-of-the-art baselines by 11.58%, 6.89%, and 5.07% on CodeGen, and 4.38%, 2.85%, and 2.74% on GPT-3.5 in terms of Pass@1, respectively.
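The example-scoring step can be sketched as below: a candidate demonstration is scored by the LLM's log-likelihood of the ground-truth program given (demonstration + requirement), and candidates above a threshold would be labeled positive for retriever training. The model name, prompt format, and threshold are placeholders, token-boundary handling is simplified, and the contrastive retriever training itself is not shown.

```python
# Hedged sketch of scoring a candidate demonstration with a causal LM.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")          # placeholder small model
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def score_example(demo, requirement, solution):
    prompt = f"{demo}\n{requirement}\n"
    prompt_len = tok(prompt, return_tensors="pt").input_ids.shape[1]
    full_ids = tok(prompt + solution, return_tensors="pt").input_ids
    labels = full_ids.clone()
    labels[:, :prompt_len] = -100                    # only score the solution tokens
    with torch.no_grad():
        out = model(full_ids, labels=labels)
    return -out.loss.item()                          # higher = solution judged likelier

demo = "# Task: add two numbers\ndef add(a, b):\n    return a + b"
req = "# Task: multiply two numbers"
sol = "def mul(a, b):\n    return a * b"
print(score_example(demo, req, sol))
```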

Overview of ImageArg-2023: The First Shared Task in Multimodal Argument Mining

  • paper_url: http://arxiv.org/abs/2310.12172
  • repo_url: None
  • paper_authors: Zhexiong Liu, Mohamed Elaraby, Yang Zhong, Diane Litman
  • for: This paper gives an overview of the ImageArg shared task, the first shared task in multimodal argument mining, co-located with the 10th Workshop on Argument Mining at EMNLP 2023.
  • methods: The task comprises two classification subtasks: (1) Argument Stance Classification, determining whether a tweet containing an image and text supports or opposes a controversial topic (e.g., gun control and abortion); (2) Image Persuasiveness Classification, determining whether the image makes the tweet text more persuasive.
  • results: The shared task received 31 submissions for Subtask-A and 21 for Subtask-B from 9 teams across 6 countries; the best Subtask-A submission achieved an F1-score of 0.8647 and the best Subtask-B submission an F1-score of 0.5561.
    Abstract This paper presents an overview of the ImageArg shared task, the first multimodal Argument Mining shared task co-located with the 10th Workshop on Argument Mining at EMNLP 2023. The shared task comprises two classification subtasks - (1) Subtask-A: Argument Stance Classification; (2) Subtask-B: Image Persuasiveness Classification. The former determines the stance of a tweet containing an image and a piece of text toward a controversial topic (e.g., gun control and abortion). The latter determines whether the image makes the tweet text more persuasive. The shared task received 31 submissions for Subtask-A and 21 submissions for Subtask-B from 9 different teams across 6 countries. The top submission in Subtask-A achieved an F1-score of 0.8647 while the best submission in Subtask-B achieved an F1-score of 0.5561.

KGQuiz: Evaluating the Generalization of Encoded Knowledge in Large Language Models

  • paper_url: http://arxiv.org/abs/2310.09725
  • repo_url: https://github.com/leopoldwhite/kgquiz
  • paper_authors: Yuyang Bai, Shangbin Feng, Vidhisha Balachandran, Zhaoxuan Tan, Shiqi Lou, Tianxing He, Yulia Tsvetkov
  • for: This paper studies how well the knowledge encoded in large language models (LLMs) generalizes across knowledge domains and progressively complex task formats.
  • methods: Proposes KGQuiz, a knowledge-intensive benchmark built from triplet-based knowledge that covers three knowledge domains and five tasks of increasing complexity: true-or-false, multiple-choice QA, blank filling, factual editing, and open-ended knowledge generation.
  • results: Extensive experiments on 10 open-source and black-box LLMs show strong performance on straightforward knowledge QA, while settings requiring more complex reasoning or domain-specific facts remain challenging; KGQuiz thus serves as a testbed for analyzing such variations across domains and task formats.
    Abstract Large language models (LLMs) demonstrate remarkable performance on knowledge-intensive tasks, suggesting that real-world knowledge is encoded in their model parameters. However, besides explorations on a few probing tasks in limited knowledge domains, it is not well understood how to evaluate LLMs' knowledge systematically and how well their knowledge abilities generalize, across a spectrum of knowledge domains and progressively complex task formats. To this end, we propose KGQuiz, a knowledge-intensive benchmark to comprehensively investigate the knowledge generalization abilities of LLMs. KGQuiz is a scalable framework constructed from triplet-based knowledge, which covers three knowledge domains and consists of five tasks with increasing complexity: true-or-false, multiple-choice QA, blank filling, factual editing, and open-ended knowledge generation. To gain a better understanding of LLMs' knowledge abilities and their generalization, we evaluate 10 open-source and black-box LLMs on the KGQuiz benchmark across the five knowledge-intensive tasks and knowledge domains. Extensive experiments demonstrate that LLMs achieve impressive performance in straightforward knowledge QA tasks, while settings and contexts requiring more complex reasoning or employing domain-specific facts still present significant challenges. We envision KGQuiz as a testbed to analyze such nuanced variations in performance across domains and task formats, and ultimately to understand, evaluate, and improve LLMs' knowledge abilities across a wide spectrum of knowledge domains and tasks.

HiCL: Hierarchical Contrastive Learning of Unsupervised Sentence Embeddings

  • paper_url: http://arxiv.org/abs/2310.09720
  • repo_url: None
  • paper_authors: Zhuofeng Wu, Chaowei Xiao, VG Vinod Vydiswaran
  • for: Improving the efficiency and effectiveness of unsupervised sentence representation learning by modeling both local segment-level and global sequence-level relationships.
  • methods: Proposes HiCL, a hierarchical contrastive learning framework that divides a sequence into several segments and applies both local and global contrastive learning; training efficiency is improved by first encoding short segments and then aggregating them into the sequence representation.
  • results: HiCL improves the prior top-performing SNCSE model across seven widely evaluated STS tasks, with average gains of +0.2% on BERT-large and +0.44% on RoBERTa-large.
    Abstract In this paper, we propose a hierarchical contrastive learning framework, HiCL, which considers local segment-level and global sequence-level relationships to improve training efficiency and effectiveness. Traditional methods typically encode a sequence in its entirety for contrast with others, often neglecting local representation learning, leading to challenges in generalizing to shorter texts. Conversely, HiCL improves its effectiveness by dividing the sequence into several segments and employing both local and global contrastive learning to model segment-level and sequence-level relationships. Further, considering the quadratic time complexity of transformers over input tokens, HiCL boosts training efficiency by first encoding short segments and then aggregating them to obtain the sequence representation. Extensive experiments show that HiCL enhances the prior top-performing SNCSE model across seven extensively evaluated STS tasks, with an average increase of +0.2% observed on BERT-large and +0.44% on RoBERTa-large.
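The segment-then-aggregate encoding can be sketched as below, with a toy GRU standing in for BERT/RoBERTa and only a simplified global contrastive term; HiCL's actual encoders, augmentations, and local segment-level objective are not reproduced.

```python
# Minimal sketch of hierarchical encoding: encode short segments, aggregate
# into a sequence embedding, then apply a sequence-level contrastive loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.GRU(input_size=16, hidden_size=32, batch_first=True)  # stand-in encoder

def encode_hierarchical(tokens, seg_len=8):
    """tokens: (batch, seq_len, dim). Encode segments, then aggregate."""
    segs = tokens.split(seg_len, dim=1)
    seg_embs = torch.stack([encoder(s)[1].squeeze(0) for s in segs], dim=1)
    return seg_embs, seg_embs.mean(dim=1)            # segment-level, sequence-level

x = torch.randn(4, 32, 16)
_, seq_emb = encode_hierarchical(x)

# Global contrastive term against a noisy second "view" of the same batch;
# segment-level terms would be formed analogously from seg_embs.
_, seq_emb2 = encode_hierarchical(x + 0.01 * torch.randn_like(x))
sim = F.normalize(seq_emb, dim=-1) @ F.normalize(seq_emb2, dim=-1).t()
loss = F.cross_entropy(sim / 0.05, torch.arange(len(x)))
print(float(loss))
```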

cs.LG - 2023-10-15

AMAGO: Scalable In-Context Reinforcement Learning for Adaptive Agents

  • paper_url: http://arxiv.org/abs/2310.09971
  • repo_url: https://github.com/ut-austin-rpl/amago
  • paper_authors: Jake Grigsby, Linxi Fan, Yuke Zhu
  • for: The paper is written to tackle the challenges of generalization, long-term memory, and meta-learning in in-context Reinforcement Learning (RL) agents.
  • methods: The paper proposes a new in-context RL agent called AMAGO, which uses sequence models and off-policy learning to overcome the limitations of previous approaches.
  • results: The paper demonstrates the strong performance of AMAGO in meta-RL and long-term memory domains, and shows that it can solve goal-conditioned problems with challenging exploration. Additionally, the paper introduces a novel hindsight relabeling scheme that allows AMAGO to solve open-world domains.
    Abstract We introduce AMAGO, an in-context Reinforcement Learning (RL) agent that uses sequence models to tackle the challenges of generalization, long-term memory, and meta-learning. Recent works have shown that off-policy learning can make in-context RL with recurrent policies viable. Nonetheless, these approaches require extensive tuning and limit scalability by creating key bottlenecks in agents' memory capacity, planning horizon, and model size. AMAGO revisits and redesigns the off-policy in-context approach to successfully train long-sequence Transformers over entire rollouts in parallel with end-to-end RL. Our agent is uniquely scalable and applicable to a wide range of problems. We demonstrate its strong performance empirically in meta-RL and long-term memory domains. AMAGO's focus on sparse rewards and off-policy data also allows in-context learning to extend to goal-conditioned problems with challenging exploration. When combined with a novel hindsight relabeling scheme, AMAGO can solve a previously difficult category of open-world domains, where agents complete many possible instructions in procedurally generated environments. We evaluate our agent on three goal-conditioned domains and study how its individual improvements connect to create a generalist policy.

Theoretical Evaluation of Asymmetric Shapley Values for Root-Cause Analysis

  • paper_url: http://arxiv.org/abs/2310.09961
  • repo_url: None
  • paper_authors: Domokos M. Kelen, Mihály Petreczky, Péter Kersch, András A. Benczúr
  • for: This work examines Asymmetric Shapley Values (ASV), a variant of the SHAP additive local explanation method that incorporates known causal relations between variables and has also been considered as a way to test for unfair discrimination in model predictions.
  • methods: The analysis relates local contributions to global contributions of variance reduction, proves several theoretical results about the method, and compares asymmetric attributions on multiple real-world datasets using gradient boosting and deep learning models.
  • results: In several cases ASV yields counter-intuitive attributions, arguably producing incorrect results for root-cause analysis; generalized additive models (GAMs) are identified as a restricted class for which ASV exhibits desirable properties.
    Abstract In this work, we examine Asymmetric Shapley Values (ASV), a variant of the popular SHAP additive local explanation method. ASV proposes a way to improve model explanations incorporating known causal relations between variables, and is also considered as a way to test for unfair discrimination in model predictions. Unexplored in previous literature, relaxing symmetry in Shapley values can have counter-intuitive consequences for model explanation. To better understand the method, we first show how local contributions correspond to global contributions of variance reduction. Using variance, we demonstrate multiple cases where ASV yields counter-intuitive attributions, arguably producing incorrect results for root-cause analysis. Second, we identify generalized additive models (GAM) as a restricted class for which ASV exhibits desirable properties. We support our arguments by proving multiple theoretical results about the method. Finally, we demonstrate the use of asymmetric attributions on multiple real-world datasets, comparing the results with and without restricted model families using gradient boosting and deep learning models.
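A small worked sketch of the asymmetric idea: instead of averaging marginal contributions over all feature orderings (symmetric Shapley), average only over orderings consistent with a known causal precedence. The background-mean value function below is one common convention; the paper's exact value function may differ.

```python
# Toy brute-force asymmetric Shapley values with feature 0 causally preceding 1.
import itertools
import numpy as np

def value(model, x, background, subset):
    z = background.copy()
    z[list(subset)] = x[list(subset)]        # revealed features take their true values
    return model(z)

def asymmetric_shapley(model, x, background, causal_order):
    n = len(x)
    perms = [p for p in itertools.permutations(range(n))
             if all(p.index(a) < p.index(b) for a, b in causal_order)]
    phi = np.zeros(n)
    for p in perms:
        revealed = []
        for j in p:
            gain = (value(model, x, background, revealed + [j])
                    - value(model, x, background, revealed))
            phi[j] += gain / len(perms)
            revealed.append(j)
    return phi

model = lambda z: 2.0 * z[0] + z[0] * z[1]   # toy model with an interaction
x, background = np.array([1.0, 1.0]), np.array([0.0, 0.0])
print(asymmetric_shapley(model, x, background, causal_order=[(0, 1)]))  # [2. 1.]
```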

Deep Reinforcement Learning with Explicit Context Representation

  • paper_url: http://arxiv.org/abs/2310.09924
  • repo_url: None
  • paper_authors: Francisco Munguia-Galeano, Ah-Hwee Tan, Ze Ji
  • for: solves complex computational problems with contextual information
  • methods: uses Iota explicit context representation (IECR) framework with contextual key frames (CKFs) and two loss functions
  • results: significantly outperforms state-of-the-art equivalents in five discrete environments with contextual information
    Abstract Reinforcement learning (RL) has shown an outstanding capability for solving complex computational problems. However, most RL algorithms lack an explicit method that would allow learning from contextual information. Humans use context to identify patterns and relations among elements in the environment, along with how to avoid making wrong actions. On the other hand, what may seem like an obviously wrong decision from a human perspective could take hundreds of steps for an RL agent to learn to avoid. This paper proposes a framework for discrete environments called Iota explicit context representation (IECR). The framework involves representing each state using contextual key frames (CKFs), which can then be used to extract a function that represents the affordances of the state; in addition, two loss functions are introduced with respect to the affordances of the state. The novelty of the IECR framework lies in its capacity to extract contextual information from the environment and learn from the CKFs' representation. We validate the framework by developing four new algorithms that learn using context: Iota deep Q-network (IDQN), Iota double deep Q-network (IDDQN), Iota dueling deep Q-network (IDuDQN), and Iota dueling double deep Q-network (IDDDQN). Furthermore, we evaluate the framework and the new algorithms in five discrete environments. We show that all the algorithms, which use contextual information, converge in around 40,000 training steps of the neural networks, significantly outperforming their state-of-the-art equivalents.

BONES: Near-Optimal Neural-Enhanced Video Streaming

  • paper_url: http://arxiv.org/abs/2310.09920
  • repo_url: None
  • paper_authors: Lingdong Wang, Simran Singh, Jacob Chakareski, Mohammad Hajiesmaili, Ramesh K. Sitaraman
  • for: Improving users' quality of experience (QoE) in video streaming.
  • methods: Neural-enhanced video streaming controlled by online Lyapunov optimization, which jointly manages network and computational resources to enhance low-quality video segments.
  • results: BONES increases QoE by 4% to 13% over state-of-the-art algorithms, demonstrating its potential to enhance the video streaming experience.
    Abstract Accessing high-quality video content can be challenging due to insufficient and unstable network bandwidth. Recent advances in neural enhancement have shown promising results in improving the quality of degraded videos through deep learning. Neural-Enhanced Streaming (NES) incorporates this new approach into video streaming, allowing users to download low-quality video segments and then enhance them to obtain high-quality content without violating the playback of the video stream. We introduce BONES, an NES control algorithm that jointly manages the network and computational resources to maximize the quality of experience (QoE) of the user. BONES formulates NES as a Lyapunov optimization problem and solves it in an online manner with near-optimal performance, making it the first NES algorithm to provide a theoretical performance guarantee. Our comprehensive experimental results indicate that BONES increases QoE by 4% to 13% over state-of-the-art algorithms, demonstrating its potential to enhance the video streaming experience for users. Our code and data will be released to the public.

Evaluation of feature selection performance for identification of best effective technical indicators on stock market price prediction

  • paper_url: http://arxiv.org/abs/2310.09903
  • repo_url: None
  • paper_authors: Fatemeh Moodi, Amir Jahangard-Rafsanjani
  • for: The goal of this study is to select, via feature selection, the combination of technical indicators that predicts stock market prices with the least error.
  • methods: Wrapper feature selection methods (SFS and SBS) are evaluated with 10 estimators and 123 technical indicators on the last 13 years of Apple stock data; a 3-day time window converts the series into suitable inputs for regression.
  • results: Each wrapper method yields different results with different machine learning methods and correlates with a specific set of indicators; the Ridge and LR estimators, alone and combined with the wrapper methods, achieved the best market forecasts under all evaluation criteria.
    Abstract Due to the influence of many factors, including technical indicators on stock market prediction, feature selection is important to choose the best indicators. One of the feature selection methods that consider the performance of models during feature selection is the wrapper feature selection method. The aim of this research is to identify a combination of the best stock market indicators through feature selection to predict the stock market price with the least error. In order to evaluate the impact of wrapper feature selection techniques on stock market prediction, in this paper SFS and SBS with 10 estimators and 123 technical indicators have been examined on the last 13 years of Apple Company. Also, by the proposed method, the data created by the 3-day time window were converted to the appropriate input for regression methods. Based on the results observed: (1) Each wrapper feature selection method has different results with different machine learning methods, and each method is more correlated with a specific set of technical indicators of the stock market. (2) Ridge and LR estimates alone, and with two methods of the wrapper feature selection, namely SFS and SBS; They had the best results with all assessment criteria for market forecast. (3)The Ridge and LR method with all the R2, MSE, RMSE, MAE and MAPE have the best stock market prediction results. Also, the MLP Regression Method, along with the Sequential Forwards Selection and the MSE, had the best performance. SVR regression, along with the SFS and the MSE, has improved greatly compared to the SVR regression with all indicators. (4) It was also observed that different features are selected by different ML methods with different evaluation parameters. (5) Most ML methods have used the Squeeze_pro, Percentage Price Oscillator, Thermo, Decay, Archer On-Balance Volume, Bollinger Bands, Squeeze and Ichimoku indicator.
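The wrapper selection described above can be sketched with scikit-learn's sequential selector around a Ridge estimator, running forward (SFS) and backward (SBS) passes. The 123 technical indicators and the 3-day windowing are replaced by random toy data here.

```python
# Hedged sketch of SFS/SBS wrapper feature selection with a Ridge estimator.
import numpy as np
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 20))                          # stand-in indicator features
y = X[:, 3] - 0.5 * X[:, 7] + rng.normal(scale=0.1, size=300)  # toy "price" target

for direction in ("forward", "backward"):               # SFS and SBS respectively
    sel = SequentialFeatureSelector(Ridge(), n_features_to_select=5,
                                    direction=direction, cv=5,
                                    scoring="neg_mean_squared_error")
    sel.fit(X, y)
    mse = -cross_val_score(Ridge(), sel.transform(X), y, cv=5,
                           scoring="neg_mean_squared_error").mean()
    print(direction, np.flatnonzero(sel.get_support()), round(mse, 4))
```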

Towards Deep Learning Models Resistant to Transfer-based Adversarial Attacks via Data-centric Robust Learning

  • paper_url: http://arxiv.org/abs/2310.09891
  • repo_url: None
  • paper_authors: Yulong Yang, Chenhao Lin, Xiang Ji, Qiwei Tian, Qian Li, Hongshan Yang, Zhibo Wang, Chao Shen
  • for: Defending against transfer-based adversarial attacks with a new paradigm called Data-centric Robust Learning (DRL).
  • methods: DRL performs a one-shot adversarial augmentation prior to training instead of optimizing adversarial examples throughout the entire training process.
  • results: DRL outperforms widely-used adversarial training techniques (e.g., PGD-AT, TRADES, EAT, and FAT) in black-box robustness, and can be combined with diverse data augmentations and loss regularizations to further strengthen the defense.
    Abstract Transfer-based adversarial attacks raise a severe threat to real-world deep learning systems since they do not require access to target models. Adversarial training (AT), which is recognized as the strongest defense against white-box attacks, has also guaranteed high robustness to (black-box) transfer-based attacks. However, AT suffers from heavy computational overhead since it optimizes the adversarial examples during the whole training process. In this paper, we demonstrate that such heavy optimization is not necessary for AT against transfer-based attacks. Instead, a one-shot adversarial augmentation prior to training is sufficient, and we name this new defense paradigm Data-centric Robust Learning (DRL). Our experimental results show that DRL outperforms widely-used AT techniques (e.g., PGD-AT, TRADES, EAT, and FAT) in terms of black-box robustness and even surpasses the top-1 defense on RobustBench when combined with diverse data augmentations and loss regularizations. We also identify other benefits of DRL, for instance, the model generalization capability and robust fairness.
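The data-centric idea can be sketched as below: craft adversarial examples once (here with single-step FGSM from a reference model), add them to the training set, and then train normally, instead of regenerating adversarial examples at every training step. DRL's actual augmentation strength, source models, and regularizers may differ.

```python
# One-shot adversarial augmentation followed by a plain training loop.
import torch
import torch.nn as nn
import torch.nn.functional as F

def fgsm(model, x, y, eps=0.03):
    x = x.clone().requires_grad_(True)
    F.cross_entropy(model(x), y).backward()
    return (x + eps * x.grad.sign()).clamp(0, 1).detach()

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
x = torch.rand(64, 1, 28, 28)
y = torch.randint(0, 10, (64,))

x_adv = fgsm(model, x, y)                 # augmentation done once, before training
x_train, y_train = torch.cat([x, x_adv]), torch.cat([y, y])

opt = torch.optim.SGD(model.parameters(), lr=0.1)
for _ in range(5):                        # ordinary training afterwards
    opt.zero_grad()
    F.cross_entropy(model(x_train), y_train).backward()
    opt.step()
print("done")
```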

Score-Based Methods for Discrete Optimization in Deep Learning

  • paper_url: http://arxiv.org/abs/2310.09890
  • repo_url: https://github.com/Aryia-Behroziuan/neurons
  • paper_authors: Eric Lei, Arman Adibi, Hamed Hassani
  • for: This paper addresses discrete optimization problems that arise in deep learning tasks.
  • methods: A score-based approximation framework that uses a score function as a proxy for the marginal gain of the objective, leveraging embeddings of the discrete variables and auto-differentiation frameworks to compute backward passes in parallel.
  • results: In adversarial set classification tasks, the method achieves a superior trade-off between speed and solution quality compared to heuristic methods.
    Abstract Discrete optimization problems often arise in deep learning tasks, despite the fact that neural networks typically operate on continuous data. One class of these problems involve objective functions which depend on neural networks, but optimization variables which are discrete. Although the discrete optimization literature provides efficient algorithms, they are still impractical in these settings due to the high cost of an objective function evaluation, which involves a neural network forward-pass. In particular, they require $O(n)$ complexity per iteration, but real data such as point clouds have values of $n$ in thousands or more. In this paper, we investigate a score-based approximation framework to solve such problems. This framework uses a score function as a proxy for the marginal gain of the objective, leveraging embeddings of the discrete variables and speed of auto-differentiation frameworks to compute backward-passes in parallel. We experimentally demonstrate, in adversarial set classification tasks, that our method achieves a superior trade-off in terms of speed and solution quality compared to heuristic methods.

Empower Text-Attributed Graphs Learning with Large Language Models (LLMs)

  • paper_url: http://arxiv.org/abs/2310.09872
  • repo_url: None
  • paper_authors: Jianxiang Yu, Yuxiang Ren, Chenghua Gong, Jiaqi Tan, Xiang Li, Xuecang Zhang
  • for: Improving the performance of node classification on text-attributed graphs in few-shot scenarios.
  • methods: Large language models (LLMs) extract semantic information from the labels and generate exemplar samples for these categories; an edge predictor then captures the structural information of the raw dataset and integrates the newly generated samples into the original graph.
  • results: Extensive experiments on the ogbn-arxiv dataset show a 76% improvement over the baseline model in the 1-shot setting.
    Abstract Text-attributed graphs have recently garnered significant attention due to their wide range of applications in web domains. Existing methodologies employ word embedding models for acquiring text representations as node features, which are subsequently fed into Graph Neural Networks (GNNs) for training. Recently, the advent of Large Language Models (LLMs) has introduced their powerful capabilities in information retrieval and text generation, which can greatly enhance the text attributes of graph data. Furthermore, the acquisition and labeling of extensive datasets are both costly and time-consuming endeavors. Consequently, few-shot learning has emerged as a crucial problem in the context of graph learning tasks. In order to tackle this challenge, we propose a lightweight paradigm called ENG, which adopts a plug-and-play approach to empower text-attributed graphs through node generation using LLMs. Specifically, we utilize LLMs to extract semantic information from the labels and generate samples that belong to these categories as exemplars. Subsequently, we employ an edge predictor to capture the structural information inherent in the raw dataset and integrate the newly generated samples into the original graph. This approach harnesses LLMs for enhancing class-level information and seamlessly introduces labeled nodes and edges without modifying the raw dataset, thereby facilitating the node classification task in few-shot scenarios. Extensive experiments demonstrate the outstanding performance of our proposed paradigm, particularly in low-shot scenarios. For instance, in the 1-shot setting of the ogbn-arxiv dataset, ENG achieves a 76% improvement over the baseline model.

Alpha Elimination: Using Deep Reinforcement Learning to Reduce Fill-In during Sparse Matrix Decomposition

  • paper_url: http://arxiv.org/abs/2310.09852
  • repo_url: None
  • paper_authors: Arpan Dasgupta, Pawan Kumar
  • for: Reordering sparse matrices before LU decomposition to reduce fill-in and thereby the computational cost and memory requirements of both factorization and the solve phase.
  • methods: A reinforcement learning approach, alphaElimination, that formulates sparse matrix reordering as a single-player game and uses Monte-Carlo tree search combined with a neural network to select the best move.
  • results: alphaElimination produces significantly fewer non-zeros in the LU decomposition than state-of-the-art heuristic reordering algorithms, with little to no increase in overall running time.
    Abstract A large number of computational and scientific methods commonly require decomposing a sparse matrix into triangular factors as LU decomposition. A common problem faced during this decomposition is that even though the given matrix may be very sparse, the decomposition may lead to a denser triangular factors due to fill-in. A significant fill-in may lead to prohibitively larger computational costs and memory requirement during decomposition as well as during the solve phase. To this end, several heuristic sparse matrix reordering methods have been proposed to reduce fill-in before the decomposition. However, finding an optimal reordering algorithm that leads to minimal fill-in during such decomposition is known to be a NP-hard problem. A reinforcement learning based approach is proposed for this problem. The sparse matrix reordering problem is formulated as a single player game. More specifically, Monte-Carlo tree search in combination with neural network is used as a decision making algorithm to search for the best move in our game. The proposed method, alphaElimination is found to produce significantly lesser non-zeros in the LU decomposition as compared to existing state-of-the-art heuristic algorithms with little to no increase in overall running time of the algorithm. The code for the project will be publicly available here\footnote{\url{https://github.com/misterpawan/alphaEliminationPaper}.
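The quantity the reordering agent ultimately cares about, the fill-in induced by a candidate ordering, can be sketched with SciPy by pinning pivoting to the natural order and counting non-zeros in the resulting L and U factors. The MCTS/neural-network search itself is not shown, and the matrix below is a random toy instance.

```python
# Counting the fill-in produced by a candidate elimination ordering.
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import splu

def fill_in(A, perm):
    P = A.tocsr()[perm, :][:, perm].tocsc()
    lu = splu(P, permc_spec="NATURAL", diag_pivot_thresh=0.0)  # respect the given order
    return lu.L.nnz + lu.U.nnz

rng = np.random.default_rng(0)
A = sp.random(60, 60, density=0.05, random_state=0) + 60 * sp.eye(60)

natural = np.arange(60)
shuffled = rng.permutation(60)
print("natural ordering fill:", fill_in(A, natural))
print("random ordering fill :", fill_in(A, shuffled))
```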

Enhancing ML model accuracy for Digital VLSI circuits using diffusion models: A study on synthetic data generation

  • paper_url: http://arxiv.org/abs/2310.10691
  • repo_url: None
  • paper_authors: Prasha Srivastava, Pawan Kumar, Zia Abbas
  • for: This study uses diffusion models to generate synthetic training data for electronic circuits, improving the accuracy of downstream machine learning models for performance assessment, design, and testing when training data is scarce.
  • methods: Simulations in the HSPICE design environment with 22nm CMOS technology nodes provide representative real training data for the proposed diffusion model.
  • results: The synthetic data generated by the diffusion model closely resembles the real data; the authors validate the quality of the generated data and show that data augmentation is effective for predictive analysis of digital VLSI design.
    Abstract Generative AI has seen remarkable growth over the past few years, with diffusion models being state-of-the-art for image generation. This study investigates the use of diffusion models in generating artificial data generation for electronic circuits for enhancing the accuracy of subsequent machine learning models in tasks such as performance assessment, design, and testing when training data is usually known to be very limited. We utilize simulations in the HSPICE design environment with 22nm CMOS technology nodes to obtain representative real training data for our proposed diffusion model. Our results demonstrate the close resemblance of synthetic data using diffusion model to real data. We validate the quality of generated data, and demonstrate that data augmentation certainly effective in predictive analysis of VLSI design for digital circuits.

XRMDN: A Recurrent Mixture Density Networks-based Architecture for Short-Term Probabilistic Demand Forecasting in Mobility-on-Demand Systems with High Volatility

  • paper_url: http://arxiv.org/abs/2310.09847
  • repo_url: None
  • paper_authors: Xiaoming Li, Hubert Normandin-Taillon, Chun Wang, Xiao Huang
  • for: This work targets short-term probabilistic demand forecasting in real Mobility-on-Demand (MoD) systems, where demand is highly volatile and hard to predict with conventional time-series methods.
  • methods: Proposes an extended recurrent mixture density network (XRMDN) that extends the weight and mean networks to recurrent neural networks, so the recurrent neurons for mean and variance can capture trends in the historical data series.
  • results: XRMDN outperforms statistical, machine learning, and deep learning benchmark models on three evaluation metrics, especially under strong demand volatility, and the probabilistic forecasts can also benefit other MoD optimization problems such as optimization under uncertainty.
    Abstract In real Mobility-on-Demand (MoD) systems, demand is subject to high and dynamic volatility, which is difficult to predict by conventional time-series forecasting approaches. Most existing forecasting approaches yield the point value as the prediction result, which ignores the uncertainty that exists in the forecasting result. This will lead to the forecasting result severely deviating from the true demand value due to the high volatility existing in demand. To fill the gap, we propose an extended recurrent mixture density network (XRMDN), which extends the weight and mean neural networks to recurrent neural networks. The recurrent neurons for mean and variance can capture the trend of the historical data-series data, which enables a better forecasting result in dynamic and high volatility. We conduct comprehensive experiments on one taxi trip record and one bike-sharing real MoD data set to validate the performance of XRMDN. Specifically, we compare our model to three types of benchmark models, including statistical, machine learning, and deep learning models on three evaluation metrics. The validation results show that XRMDN outperforms the three groups of benchmark models in terms of the evaluation metrics. Most importantly, XRMDN substantially improves the forecasting accuracy with the demands in strong volatility. Last but not least, this probabilistic demand forecasting model contributes not only to the demand prediction in MoD systems but also to other optimization application problems, especially optimization under uncertainty, in MoD applications.
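A recurrent mixture density head in the spirit of XRMDN can be sketched as below: a GRU consumes the recent demand series and emits mixture weights, means, and variances for the next-step demand distribution, trained by negative log-likelihood. Sizes and the exact decomposition into weight/mean/variance recurrences are simplified assumptions.

```python
# Minimal recurrent mixture density network for probabilistic demand forecasting.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RecurrentMDN(nn.Module):
    def __init__(self, n_components=3, hidden=32):
        super().__init__()
        self.rnn = nn.GRU(input_size=1, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 3 * n_components)   # logits, means, log-variances

    def forward(self, series):                            # series: (batch, T, 1)
        _, h = self.rnn(series)
        logits, mu, log_var = self.head(h.squeeze(0)).chunk(3, dim=-1)
        return F.softmax(logits, dim=-1), mu, log_var.exp()

def mdn_nll(pi, mu, var, target):                         # target: (batch, 1)
    comp = torch.distributions.Normal(mu, var.sqrt())
    log_prob = comp.log_prob(target) + pi.clamp_min(1e-8).log()
    return -torch.logsumexp(log_prob, dim=-1).mean()

model = RecurrentMDN()
series = torch.rand(16, 24, 1)        # 24 past demand observations per region
target = torch.rand(16, 1)            # next-step demand
pi, mu, var = model(series)
print(float(mdn_nll(pi, mu, var, target)))
```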

Secure and Robust Communications for Cislunar Space Networks

  • paper_url: http://arxiv.org/abs/2310.09835
  • repo_url: None
  • paper_authors: Selen Gecgel Cetin, Gunes Karabulut Kurt, Angeles Vazquez-Castro
  • for: This study aims to provide a machine learning-based cislunar space domain awareness (SDA) capability that enables robust and secure communications between the Moon and Earth.
  • methods: Proposes a detailed channel model for selected cislunar scenarios and two interference models for anomalies that may occur in cislunar space and are so far only partially understood.
  • results: The proposed cislunar SDA, working alongside the spacecraft communication system, detects the interference models with over 96% accuracy, demonstrating promising performance for secure and robust cislunar communication.
    Abstract There is no doubt that the Moon has become the center of interest for commercial and international actors. Over the past decade, the number of planned long-term missions has increased dramatically. This makes the establishment of cislunar space networks (CSNs) crucial to orchestrate uninterrupted communications between the Moon and Earth. However, there are numerous challenges, unknowns, and uncertainties associated with cislunar communications that may pose various risks to lunar missions. In this study, we aim to address these challenges for cislunar communications by proposing a machine learning-based cislunar space domain awareness (SDA) capability that enables robust and secure communications. To this end, we first propose a detailed channel model for selected cislunar scenarios. Secondly, we propose two types of interference that could model anomalies that occur in cislunar space and are so far known only to a limited extent. Finally, we discuss our cislunar SDA to work in conjunction with the spacecraft communication system. Our proposed cislunar SDA, involving heuristic learning capabilities with machine learning algorithms, detects interference models with over 96% accuracy. The results demonstrate the promising performance of our cislunar SDA approach for secure and robust cislunar communication.

MAGIC: Detecting Advanced Persistent Threats via Masked Graph Representation Learning

  • paper_url: http://arxiv.org/abs/2310.09831
  • repo_url: https://github.com/fdudsde/magic
  • paper_authors: Zian Jia, Yun Xiong, Yuhong Nan, Yao Zhang, Jinjing Zhao, Mi Wen
  • for: This paper addresses the detection of Advanced Persistent Threats (APTs) and proposes MAGIC, a novel self-supervised detection approach that performs multi-granularity detection under different levels of supervision.
  • methods: Masked graph representation learning models benign system entities and behaviors on provenance graphs, performing efficient deep feature extraction and structure abstraction; anomalous system behaviors are then identified with outlier detection methods.
  • results: Evaluated on three widely-used datasets covering both real-world and simulated attacks, MAGIC achieves promising detection results in all scenarios and shows an enormous advantage over state-of-the-art APT detection approaches in performance overhead.
    Abstract Advanced Persistent Threats (APTs), adopted by most delicate attackers, are becoming increasingly common and pose a great threat to various enterprises and institutions. Data provenance analysis on provenance graphs has emerged as a common approach in APT detection. However, previous works have exhibited several shortcomings: (1) requiring attack-containing data and a priori knowledge of APTs, (2) failing to extract the rich contextual information buried within provenance graphs and (3) becoming impracticable due to their prohibitive computation overhead and memory consumption. In this paper, we introduce MAGIC, a novel and flexible self-supervised APT detection approach capable of performing multi-granularity detection under different levels of supervision. MAGIC leverages masked graph representation learning to model benign system entities and behaviors, performing efficient deep feature extraction and structure abstraction on provenance graphs. By ferreting out anomalous system behaviors via outlier detection methods, MAGIC is able to perform both system entity level and batched log level APT detection. MAGIC is specially designed to handle concept drift with a model adaptation mechanism and successfully applies to universal conditions and detection scenarios. We evaluate MAGIC on three widely-used datasets, including both real-world and simulated attacks. Evaluation results indicate that MAGIC achieves promising detection results in all scenarios and shows an enormous advantage over state-of-the-art APT detection approaches in performance overhead.

VFLAIR: A Research Library and Benchmark for Vertical Federated Learning

  • paper_url: http://arxiv.org/abs/2310.09827
  • repo_url: https://github.com/flair-thu/vflair
  • paper_authors: Tianyuan Zou, Zixuan Gu, Yu He, Hideaki Takahashi, Yang Liu, Guangnan Ye, Ya-Qin Zhang
  • for: This paper surveys the applications and research landscape of Vertical Federated Learning (VFL), including defenses against various kinds of data inference and backdoor attacks.
  • methods: Presents VFLAIR, an extensible and lightweight framework that supports VFL training with a variety of models, datasets, and protocols, along with standardized modules for comprehensive evaluation of attacks and defense strategies.
  • results: Benchmarks 11 attacks and 8 defenses under different communication and model partition settings, and draws concrete insights and recommendations on the choice of defense strategies for practical VFL deployment scenarios.
    Abstract Vertical Federated Learning (VFL) has emerged as a collaborative training paradigm that allows participants with different features of the same group of users to accomplish cooperative training without exposing their raw data or model parameters. VFL has gained significant attention for its research potential and real-world applications in recent years, but still faces substantial challenges, such as in defending various kinds of data inference and backdoor attacks. Moreover, most of existing VFL projects are industry-facing and not easily used for keeping track of the current research progress. To address this need, we present an extensible and lightweight VFL framework VFLAIR (available at https://github.com/FLAIR-THU/VFLAIR), which supports VFL training with a variety of models, datasets and protocols, along with standardized modules for comprehensive evaluations of attacks and defense strategies. We also benchmark 11 attacks and 8 defenses performance under different communication and model partition settings and draw concrete insights and recommendations on the choice of defense strategies for different practical VFL deployment scenario.
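The model-partition setting benchmarked above can be sketched with a minimal vertical-FL training step: two parties hold disjoint feature columns of the same samples, each runs a bottom model, and the active party combines the intermediate embeddings in a top model. Only embeddings cross the party boundary. VFLAIR's actual protocols, attacks, and defenses are far richer than this toy split.

```python
# Toy vertical FL step: feature-partitioned bottom models + active-party top model.
import torch
import torch.nn as nn
import torch.nn.functional as F

bottom_a = nn.Linear(10, 8)        # party A's local model over its 10 features
bottom_b = nn.Linear(6, 8)         # party B's local model over its 6 features
top = nn.Linear(16, 2)             # active party's top model over both embeddings

opt = torch.optim.SGD([*bottom_a.parameters(), *bottom_b.parameters(),
                       *top.parameters()], lr=0.1)

x_a, x_b = torch.randn(32, 10), torch.randn(32, 6)   # same 32 samples, split by feature
y = torch.randint(0, 2, (32,))                        # labels held by the active party

# Each party computes its embedding; gradients flow back through the embeddings only.
z = torch.cat([bottom_a(x_a), bottom_b(x_b)], dim=1)
loss = F.cross_entropy(top(z), y)
opt.zero_grad()
loss.backward()
opt.step()
print(float(loss))
```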

Communication Compression for Byzantine Robust Learning: New Efficient Algorithms and Improved Rates

  • paper_url: http://arxiv.org/abs/2310.09804
  • repo_url: None
  • paper_authors: Ahmad Rammal, Kaja Gruntkowska, Nikita Fedin, Eduard Gorbunov, Peter Richtárik
  • for: This work studies Byzantine-robust algorithm design with communication compression for collaborative/federated learning.
  • methods: Two new Byzantine-robust methods are proposed: Byz-DASHA-PAGE, which achieves a better convergence rate (for non-convex and Polyak-Lojasiewicz smooth problems), a smaller neighborhood size in the heterogeneous case, and tolerance of more Byzantine workers under over-parametrization than the previous method with state-of-the-art guarantees (Byz-VR-MARINA); and Byz-EF21, the first Byzantine-robust method with communication compression and error feedback, together with its bidirectional compression version Byz-EF21-BC.
  • results: Convergence rates are derived for the non-convex and Polyak-Lojasiewicz smooth cases, and numerical experiments confirm the theoretical findings.
    Abstract Byzantine robustness is an essential feature of algorithms for certain distributed optimization problems, typically encountered in collaborative/federated learning. These problems are usually huge-scale, implying that communication compression is also imperative for their resolution. These factors have spurred recent algorithmic and theoretical developments in the literature of Byzantine-robust learning with compression. In this paper, we contribute to this research area in two main directions. First, we propose a new Byzantine-robust method with compression -- Byz-DASHA-PAGE -- and prove that the new method has better convergence rate (for non-convex and Polyak-Lojasiewicz smooth optimization problems), smaller neighborhood size in the heterogeneous case, and tolerates more Byzantine workers under over-parametrization than the previous method with SOTA theoretical convergence guarantees (Byz-VR-MARINA). Secondly, we develop the first Byzantine-robust method with communication compression and error feedback -- Byz-EF21 -- along with its bidirectional compression version -- Byz-EF21-BC -- and derive the convergence rates for these methods for non-convex and Polyak-Lojasiewicz smooth case. We test the proposed methods and illustrate our theoretical findings in the numerical experiments.
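An illustrative sketch of the two ingredients being combined, not the paper's algorithms: top-k sparsification as a communication compressor and a coordinate-wise median as a simple Byzantine-robust aggregation rule. Byz-DASHA-PAGE and Byz-EF21 build momentum-variance reduction and error feedback on top of such components, which are omitted here.

```python
# Compressed worker updates + robust vs. naive aggregation under Byzantine workers.
import numpy as np

def top_k(v, k):
    """Keep the k largest-magnitude coordinates (a standard sparsifying compressor)."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

rng = np.random.default_rng(0)
d, n_honest, n_byz = 20, 7, 3
true_grad = np.ones(d)

honest = [top_k(true_grad + rng.normal(scale=0.1, size=d), k=15) for _ in range(n_honest)]
byzantine = [rng.normal(loc=-50.0, size=d) for _ in range(n_byz)]   # arbitrary garbage
received = honest + byzantine

mean_agg = np.mean(received, axis=0)
median_agg = np.median(received, axis=0)
print("mean   aggregate error:", round(float(np.linalg.norm(mean_agg - true_grad)), 2))
print("median aggregate error:", round(float(np.linalg.norm(median_agg - true_grad)), 2))
```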

FLrce: Efficient Federated Learning with Relationship-based Client Selection and Early-Stopping Strategy

  • paper_url: http://arxiv.org/abs/2310.09789
  • repo_url: None
  • paper_authors: Ziru Niu, Hai Dong, A. Kai Qin, Tao Gu
  • for: Improving the communication and computation efficiency of Federated Learning (FL) while providing intelligent services that preserve data privacy.
  • methods: FLrce combines a relationship-based client selection strategy, which prioritizes clients with more significant effects, with an early-stopping mechanism that terminates FL in advance to save communication and computation resources.
  • results: FLrce reaches competitive accuracy in fewer rounds, increasing communication efficiency by 6% to 73.9% and computation efficiency by 20% to 79.5%.
    Abstract Federated learning (FL) achieves great popularity in broad areas as a powerful interface to offer intelligent services to customers while maintaining data privacy. Nevertheless, FL faces communication and computation bottlenecks due to limited bandwidth and resource constraints of edge devices. To comprehensively address the bottlenecks, the technique of dropout is introduced, where resource-constrained edge devices are allowed to collaboratively train a subset of the global model parameters. However, dropout impedes the learning efficiency of FL under unbalanced local data distributions. As a result, FL requires more rounds to achieve appropriate accuracy, consuming more communication and computation resources. In this paper, we present FLrce, an efficient FL framework with a relationship-based client selection and early-stopping strategy. FLrce accelerates the FL process by selecting clients with more significant effects, enabling the global model to converge to a high accuracy in fewer rounds. FLrce also leverages an early stopping mechanism to terminate FL in advance to save communication and computation resources. Experiment results show that FLrce increases the communication and computation efficiency by 6% to 73.9% and 20% to 79.5%, respectively, while maintaining competitive accuracy.

  • paper_url: http://arxiv.org/abs/2310.09787
  • repo_url: None
  • paper_authors: Xiaobo Zhu, Yan Wu, Qinhu Zhang, Zhanheng Chen, Ying He
  • for: 这篇论文的目的是提出一个基于 meta-learning 原则的模型,用于预测新的节点之间的连接。这种预测问题在实际应用中非常重要,例如在推荐系统中给予新的用户有关的项目推荐,以及在社交平台上给予新用户有关的内容推荐。
  • methods: 这篇论文使用了一个称为 temporal encoder 的专门模型,并且使用了一个称为 predictor 的模型来预测新节点是否会产生连接。这两个模型在 meta-learning 原则下进行学习,以便在预测新节点之间的连接时能够更好地适应。
  • results: 在三个公开的数据集上进行了实验,结果显示了这个模型的表现比以前的方法更好。具体来说,这个模型可以更好地预测新节点之间的连接,并且可以在几乎无预警情况下进行预测。
    Abstract Modelling temporal networks for dynamic link prediction of new nodes has many real-world applications, such as providing relevant item recommendations to new customers in recommender systems and suggesting appropriate posts to new users on social platforms. Unlike old nodes, new nodes have few historical links, which poses a challenge for the dynamic link prediction task. Most existing dynamic models treat all nodes equally and are not specialized for new nodes, resulting in suboptimal performances. In this paper, we consider dynamic link prediction of new nodes as a few-shot problem and propose a novel model based on the meta-learning principle to effectively mitigate this problem. Specifically, we develop a temporal encoder with a node-level span memory to obtain a new node embedding, and then we use a predictor to determine whether the new node generates a link. To overcome the few-shot challenge, we incorporate the encoder-predictor into the meta-learning paradigm, which can learn two types of implicit information during the formation of the temporal network through span adaptation and node adaptation. The acquired implicit information can serve as model initialisation and facilitate rapid adaptation to new nodes through a fine-tuning process on just a few links. Experiments on three publicly available datasets demonstrate the superior performance of our model compared to existing state-of-the-art methods.
    摘要 为新节点的动态链接预测建模时序网络有许多实际应用,例如在推荐系统中为新用户推荐相关物品,以及在社交平台上为新用户推荐合适的帖子。与老节点不同,新节点的历史链接很少,这给动态链接预测任务带来挑战。现有的大多数动态模型对所有节点一视同仁,并未针对新节点做专门设计,导致性能欠佳。本文将新节点的动态链接预测视为小样本(few-shot)问题,并基于元学习原则提出一种新模型。具体地,我们设计了带有节点级跨度记忆(span memory)的时序编码器以获得新节点嵌入,再用预测器判断新节点是否会产生链接。为克服小样本难题,我们将编码器-预测器纳入元学习范式,使其能在时序网络形成过程中通过跨度自适应和节点自适应学习两类隐式信息。所获得的隐式信息可作为模型初始化,只需在少量链接上微调即可快速适应新节点。在三个公开数据集上的实验表明,我们的模型优于现有最先进方法。

Pseudo-Bayesian Optimization

  • paper_url: http://arxiv.org/abs/2310.09766
  • repo_url: None
  • paper_authors: Haoxian Chen, Henry Lam
  • for: 这篇论文旨在提出一种可靠的黑盒函数优化方法,在保证收敛的同时于实际应用中取得良好性能。
  • methods: 论文提出名为“Pseudo-Bayesian Optimization”的方法:它建立在一个刻画黑盒优化收敛最小要求的公理化框架之上,并利用简单的局部回归与“随机先验(randomized prior)”构造来量化优化过程中的不确定性。
  • results: 实验结果表明,Pseudo-Bayesian Optimization 不仅保证收敛,还在高维合成实验、超参数调优和机器人应用中持续优于现有最先进基准。
    Abstract Bayesian Optimization is a popular approach for optimizing expensive black-box functions. Its key idea is to use a surrogate model to approximate the objective and, importantly, quantify the associated uncertainty that allows a sequential search of query points that balance exploitation-exploration. Gaussian process (GP) has been a primary candidate for the surrogate model, thanks to its Bayesian-principled uncertainty quantification power and modeling flexibility. However, its challenges have also spurred an array of alternatives whose convergence properties could be more opaque. Motivated by these, we study in this paper an axiomatic framework that elicits the minimal requirements to guarantee black-box optimization convergence that could apply beyond GP-related methods. Moreover, we leverage the design freedom in our framework, which we call Pseudo-Bayesian Optimization, to construct empirically superior algorithms. In particular, we show how using simple local regression, and a suitable "randomized prior" construction to quantify uncertainty, not only guarantees convergence but also consistently outperforms state-of-the-art benchmarks in examples ranging from high-dimensional synthetic experiments to realistic hyperparameter tuning and robotic applications.
    摘要 贝叶斯优化是一种被广泛使用的昂贵黑盒函数优化方法。其核心思想是用代理模型近似目标函数,并量化相应的不确定性,从而在序贯搜索中平衡利用与探索。高斯过程(GP)凭借其贝叶斯式的不确定性量化能力和建模灵活性一直是代理模型的首选,但其自身的局限也催生了大量替代方法,而这些方法的收敛性质往往不够透明。受此启发,本文研究一个公理化框架,给出保证黑盒优化收敛所需的最小条件,其适用范围不限于GP相关方法。此外,我们利用该框架(称为 Pseudo-Bayesian Optimization)的设计自由度构造出经验上更优的算法:通过简单的局部回归和适当的“随机先验(randomized prior)”构造来量化不确定性,不仅保证收敛,还在从高维合成实验到实际超参数调优与机器人应用的多种场景中持续超越现有最先进基准。
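
The "local regression plus randomized prior" recipe sketched in the abstract can be prototyped in a few lines. In the snippet below the surrogate mean is a k-nearest-neighbour average and the uncertainty is the spread over a handful of randomly perturbed refits; both choices, as well as the UCB-style acquisition and all constants, are simplifying assumptions rather than the authors' exact estimator.

```python
import numpy as np
rng = np.random.default_rng(0)

def surrogate(X, y, x_query, k=5, n_priors=8, prior_scale=1.0):
    """k-NN local regression for the mean; spread of perturbed refits as uncertainty."""
    d = np.linalg.norm(X - x_query, axis=1)
    idx = np.argsort(d)[:min(k, len(X))]
    mean = y[idx].mean()
    draws = [(y[idx] + prior_scale * rng.normal(size=idx.size)).mean() for _ in range(n_priors)]
    return mean, np.std(draws) + 1e-9

def pseudo_bo(f, bounds, n_init=5, n_iter=30, beta=2.0, n_cand=256):
    lo, hi = bounds
    X = rng.uniform(lo, hi, size=(n_init, lo.size))
    y = np.array([f(x) for x in X])
    for _ in range(n_iter):
        cand = rng.uniform(lo, hi, size=(n_cand, lo.size))
        scores = [(-m + beta * s) for m, s in (surrogate(X, y, c) for c in cand)]  # minimize f
        x_next = cand[int(np.argmax(scores))]
        X = np.vstack([X, x_next])
        y = np.append(y, f(x_next))
    return X[np.argmin(y)], y.min()

# Toy usage on a 2-D quadratic with optimum at (0.3, 0.3)
best_x, best_y = pseudo_bo(lambda x: np.sum((x - 0.3) ** 2),
                           (np.zeros(2), np.ones(2)))
print(best_x, best_y)
```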

UniTime: A Language-Empowered Unified Model for Cross-Domain Time Series Forecasting

  • paper_url: http://arxiv.org/abs/2310.09751
  • repo_url: None
  • paper_authors: Xu Liu, Junfeng Hu, Yuan Li, Shizhe Diao, Yuxuan Liang, Bryan Hooi, Roger Zimmermann
  • for: 这个研究旨在提出一个横跨多个时间序列应用领域的统一模型架构,以扩大现有模型的应用范围。
  • methods: 本研究提出了UniTime模型,可以灵活地适应不同数据特性,并使用语言-时间缩排变换器和封页来实现多模式识别和时间序列数据的搭配。
  • results: 实验结果显示UniTime模型能够提高现有模型的预测性能和零学习转移性能。
    Abstract Multivariate time series forecasting plays a pivotal role in contemporary web technologies. In contrast to conventional methods that involve creating dedicated models for specific time series application domains, this research advocates for a unified model paradigm that transcends domain boundaries. However, learning an effective cross-domain model presents the following challenges. First, various domains exhibit disparities in data characteristics, e.g., the number of variables, posing hurdles for existing models that impose inflexible constraints on these factors. Second, the model may encounter difficulties in distinguishing data from various domains, leading to suboptimal performance in our assessments. Third, the diverse convergence rates of time series domains can also result in compromised empirical performance. To address these issues, we propose UniTime for effective cross-domain time series learning. Concretely, UniTime can flexibly adapt to data with varying characteristics. It also uses domain instructions and a Language-TS Transformer to offer identification information and align two modalities. In addition, UniTime employs masking to alleviate domain convergence speed imbalance issues. Our extensive experiments demonstrate the effectiveness of UniTime in advancing state-of-the-art forecasting performance and zero-shot transferability.
    摘要 多变量时间序列预测在当代网络技术中扮演着关键角色。与传统方法不同,这项研究提出了跨领域模型的统一模型架构,跨越领域边界。然而,学习有效的跨领域模型存在以下挑战:首先,不同领域的数据特征存在差异,例如变量数量,这会限制现有模型的灵活性。其次,模型可能很难分辨不同领域的数据,导致预测性能下降。最后,时间序列领域的多样化速度也可能导致实际性能下降。为解决这些问题,我们提出了UniTime,一种可靠地适应数据特征变化的模型。具体来说,UniTime可以适应数据中的变量数量变化,并使用领域指令和语言-TS transformer来提供标识信息和对两种模式进行对应。此外,UniTime还使用屏蔽来缓解领域融合速度不平衡问题。我们的广泛实验表明UniTime可以提高预测性能和零代负荷传递性。

Private Synthetic Data Meets Ensemble Learning

  • paper_url: http://arxiv.org/abs/2310.09729
  • repo_url: None
  • paper_authors: Haoyuan Sun, Navid Azizan, Akash Srivastava, Hao Wang
  • for: 这篇论文旨在缓解机器学习模型在合成数据上训练、再部署到真实数据时,因分布偏移而出现的性能下降问题。
  • methods: 论文提出一种新的集成策略:并行地多次运行差分隐私(DP)机制以生成多个合成数据集,再将分别在这些数据集上训练的下游模型进行集成。虽然单个合成数据集可能偏离真实数据分布更远,但它们共同增加了样本多样性,有望提升下游模型对分布偏移的鲁棒性。
  • results: 实验结果表明,对于由边际类(marginal-based)或工作负载类(workload-based)DP机制生成的合成数据,集成并不会提升下游模型的性能;而对于由基于GAN的DP机制生成的合成数据,所提集成策略能同时提升下游模型的准确率与校准(calibration)。
    Abstract When machine learning models are trained on synthetic data and then deployed on real data, there is often a performance drop due to the distribution shift between synthetic and real data. In this paper, we introduce a new ensemble strategy for training downstream models, with the goal of enhancing their performance when used on real data. We generate multiple synthetic datasets by applying a differential privacy (DP) mechanism several times in parallel and then ensemble the downstream models trained on these datasets. While each synthetic dataset might deviate more from the real data distribution, they collectively increase sample diversity. This may enhance the robustness of downstream models against distribution shifts. Our extensive experiments reveal that while ensembling does not enhance downstream performance (compared with training a single model) for models trained on synthetic data generated by marginal-based or workload-based DP mechanisms, our proposed ensemble strategy does improve the performance for models trained using GAN-based DP mechanisms in terms of both accuracy and calibration of downstream models.
    摘要 当机器学习模型在合成数据上训练、再部署到真实数据时,由于合成数据与真实数据之间的分布偏移,性能往往会下降。本文提出一种新的集成策略来训练下游模型,以提升其在真实数据上的表现:我们并行地多次运行差分隐私(DP)机制生成多个合成数据集,然后对在这些数据集上训练得到的下游模型进行集成。尽管每个合成数据集可能偏离真实数据分布更远,但它们共同增加了样本多样性,从而可能增强下游模型对分布偏移的鲁棒性。大量实验表明:对于由边际类或工作负载类DP机制生成的合成数据,集成并不能带来优于单模型训练的下游性能;而对于由基于GAN的DP机制生成的合成数据,所提集成策略能够在准确率与校准两方面同时提升下游模型。
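
The ensembling pattern itself is simple to express in code. Below, `toy_dp_synthesize` is a crude stand-in for a GAN-based DP generator and is *not* actually differentially private; the point of the sketch is only the "run the mechanism several times in parallel, train one downstream model per synthetic dataset, soft-vote the predictions" pattern.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification
import numpy as np

def toy_dp_synthesize(X, y, epsilon, seed):
    """Crude stand-in for a DP generator: resample and add Gaussian noise.
    (Illustrative only -- this does NOT satisfy differential privacy.)"""
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, len(X), size=len(X))
    return X[idx] + rng.normal(scale=1.0 / epsilon, size=X[idx].shape), y[idx]

def train_dp_ensemble(X, y, synthesize, n_members=5, epsilon=1.0):
    members = []
    for seed in range(n_members):                 # run the mechanism several times in parallel
        syn_X, syn_y = synthesize(X, y, epsilon, seed)
        members.append(LogisticRegression(max_iter=1000).fit(syn_X, syn_y))
    return members

def ensemble_predict_proba(members, X):
    """Average the members' predicted probabilities (soft voting)."""
    return np.mean([m.predict_proba(X) for m in members], axis=0)

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
models = train_dp_ensemble(X, y, toy_dp_synthesize)
print(ensemble_predict_proba(models, X[:3]))
```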

SVM based Multiclass Classifier for Gait phase Classification using Shank IMU Sensor

  • paper_url: http://arxiv.org/abs/2310.09728
  • repo_url: None
  • paper_authors: Aswadh Khumar G S, Barath Kumar JK
  • for: 本研究旨在开发一种基于SVM多类分类的步态相位分类方法,以高精度识别支撑相与摆动相,并进一步细分为七个子相位。
  • methods: 该方法使用单个IMU传感器数据,如小腿加速度X、Y、Z、小腿陀螺仪X以及膝关节角度,作为分类特征。
  • results: 该方法能够以约90.3%的准确率对不同的步态相位进行分类。
    Abstract In this study, a gait phase classification method based on SVM multiclass classification is introduced, with a focus on the precise identification of the stance and swing phases, which are further subdivided into seven phases. Data from individual IMU sensors, such as Shank Acceleration X, Y, Z, Shank Gyro X, and Knee Angles, are used as features in this classification model. The suggested technique successfully classifies the various gait phases with a significant accuracy of about 90.3%. Gait phase classification is crucial, especially in the domains of exoskeletons and prosthetics, where accurate identification of gait phases enables seamless integration with assistive equipment, improving mobility, stability, and energy economy. This study extends the study of gait and offers an effective method for correctly identifying gait phases from Shank IMU sensor data, with potential applications in biomechanical research, exoskeletons, rehabilitation, and prosthetics.
    摘要 本研究提出一种基于SVM多类分类的步态相位分类方法,重点在于精确识别支撑相与摆动相,并进一步细分为七个子相位。分类模型以单个IMU传感器数据为特征,包括小腿加速度X、Y、Z、小腿陀螺仪X以及膝关节角度。所提方法能够以约90.3%的准确率对各步态相位进行分类。步态相位分类十分关键,尤其是在外骨骼与假肢领域:准确识别步态相位有助于与辅助设备无缝配合,从而改善行动能力、稳定性与能耗经济性。本研究拓展了步态研究,提供了一种从小腿IMU传感器数据中正确识别步态相位的有效方法,可望应用于生物力学研究、外骨骼、康复以及假肢等领域。
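
A straightforward way to reproduce this kind of pipeline with scikit-learn is sketched below; the random arrays stand in for windowed IMU features (shank acceleration X/Y/Z, shank gyro X, knee angle) and the 7-phase labels, and the RBF kernel and C value are assumptions rather than the paper's reported configuration.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# X: per-sample features [shank_acc_x, shank_acc_y, shank_acc_z, shank_gyro_x, knee_angle]
# y: integer gait-phase label in {0..6}
X = np.random.randn(2000, 5)             # placeholder for real IMU features
y = np.random.randint(0, 7, size=2000)   # placeholder for annotated phases

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = make_pipeline(StandardScaler(),
                    SVC(kernel="rbf", C=10.0, decision_function_shape="ovr"))
clf.fit(X_tr, y_tr)
print("accuracy:", accuracy_score(y_te, clf.predict(X_te)))
```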

Provably Fast Convergence of Independent Natural Policy Gradient for Markov Potential Games

  • paper_url: http://arxiv.org/abs/2310.09727
  • repo_url: https://github.com/sundave1998/independent-npg-mpg
  • paper_authors: Youbang Sun, Tao Liu, Ruida Zhou, P. R. Kumar, Shahin Shahrampour
  • for: 本研究针对马尔可夫势博弈中的多智能体强化学习问题,分析了一种独立自然策略梯度(NPG)算法。
  • methods: 该独立NPG方法借助可提供精确策略评估的oracle,在较弱的技术假设并引入次优差距(suboptimality gap)的条件下,可在 $\mathcal{O}(1/\epsilon)$ 次迭代内达到 $\epsilon$-纳什均衡(NE)。
  • results: 这一结果优于此前最好的 $\mathcal{O}(1/\epsilon^2)$ 迭代复杂度,并与单智能体情形可达到的阶数一致;论文在合成势博弈与拥塞博弈上的实验验证了理论界。
    Abstract This work studies an independent natural policy gradient (NPG) algorithm for the multi-agent reinforcement learning problem in Markov potential games. It is shown that, under mild technical assumptions and the introduction of the \textit{suboptimality gap}, the independent NPG method with an oracle providing exact policy evaluation asymptotically reaches an $\epsilon$-Nash Equilibrium (NE) within $\mathcal{O}(1/\epsilon)$ iterations. This improves upon the previous best result of $\mathcal{O}(1/\epsilon^2)$ iterations and is of the same order, $\mathcal{O}(1/\epsilon)$, that is achievable for the single-agent case. Empirical results for a synthetic potential game and a congestion game are presented to verify the theoretical bounds.
    摘要 本文研究了马尔可夫势博弈中多智能体强化学习问题的独立自然策略梯度(NPG)算法。在较弱的技术假设并引入次优差距(suboptimality gap)后,证明了配备精确策略评估oracle的独立NPG方法可在 $\mathcal{O}(1/\epsilon)$ 次迭代内渐近达到 $\epsilon$-纳什均衡(NE)。这优于此前最好的 $\mathcal{O}(1/\epsilon^2)$ 迭代结果,并与单智能体情形可达到的 $\mathcal{O}(1/\epsilon)$ 阶数一致。文中给出了在合成势博弈与拥塞博弈上的实验结果以验证理论界。
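
For intuition, independent natural policy gradient with softmax policies reduces to a multiplicative-weights update on exact Q-values. The toy script below runs it on a 2-player identical-interest (potential) matrix game, with exact evaluation playing the role of the oracle; the game, step size and iteration count are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

# payoff[a1, a2] = (reward to player 1, reward to player 2); identical-interest here
payoff = np.array([[[1.0, 1.0], [0.0, 0.0]],
                   [[0.0, 0.0], [2.0, 2.0]]])

def npg_step(pi_i, q_i, eta):
    """Softmax NPG == multiplicative-weights update on the exact Q-values."""
    new = pi_i * np.exp(eta * q_i)
    return new / new.sum()

pi = [np.array([0.5, 0.5]), np.array([0.5, 0.5])]
eta = 0.5
for _ in range(200):
    # exact policy evaluation: expected payoff of each own action vs. the other's policy
    q0 = payoff[:, :, 0] @ pi[1]          # player 1
    q1 = pi[0] @ payoff[:, :, 1]          # player 2
    pi = [npg_step(pi[0], q0, eta), npg_step(pi[1], q1, eta)]

print("approximate Nash policies:", pi)   # converges to the (b, b) equilibrium here
```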

SGA: A Graph Augmentation Method for Signed Graph Neural Networks

  • paper_url: http://arxiv.org/abs/2310.09705
  • repo_url: None
  • paper_authors: Zeyu Zhang, Shuyan Wan, Sijie Wang, Xianda Zheng, Xinrui Zhang, Kaiqi Zhao, Jiamou Liu, Dong Hao
  • for: This paper is written for analyzing complex patterns in real-world signed graphs, and addressing three key challenges in SGNN-based signed graph representation learning: sparsity, unbalanced triangles, and lack of supplementary information.
  • methods: The paper proposes a novel Signed Graph Augmentation framework (SGA) that includes three main components: (1) using an SGNN model to encode the signed graph and extract latent structural information for candidate augmentation structures, (2) evaluating and selecting the most beneficial candidate samples for modifying the original training set, and (3) a novel augmentation perspective that assigns varying training difficulty to training samples.
  • results: The paper demonstrates significant improvements in performance across multiple benchmarks using the proposed SGA method, outperforming baselines by up to 22.2% in AUC, 33.3% in F1-binary, 48.8% in F1-micro, and 36.3% in F1-macro on six real-world datasets.
    Abstract Signed Graph Neural Networks (SGNNs) are vital for analyzing complex patterns in real-world signed graphs containing positive and negative links. However, three key challenges hinder current SGNN-based signed graph representation learning: sparsity in signed graphs leaves latent structures undiscovered, unbalanced triangles pose representation difficulties for SGNN models, and real-world signed graph datasets often lack supplementary information like node labels and features. These constraints limit the potential of SGNN-based representation learning. We address these issues with data augmentation techniques. Despite many graph data augmentation methods existing for unsigned graphs, none are tailored for signed graphs. Our paper introduces the novel Signed Graph Augmentation framework (SGA), comprising three main components. First, we employ the SGNN model to encode the signed graph, extracting latent structural information for candidate augmentation structures. Second, we evaluate these candidate samples (edges) and select the most beneficial ones for modifying the original training set. Third, we propose a novel augmentation perspective that assigns varying training difficulty to training samples, enabling the design of a new training strategy. Extensive experiments on six real-world datasets (Bitcoin-alpha, Bitcoin-otc, Epinions, Slashdot, Wiki-elec, and Wiki-RfA) demonstrate that SGA significantly improves performance across multiple benchmarks. Our method outperforms baselines by up to 22.2% in AUC for SGCN on Wiki-RfA, 33.3% in F1-binary, 48.8% in F1-micro, and 36.3% in F1-macro for GAT on Bitcoin-alpha in link sign prediction.
    摘要 Signed Graph Neural Networks (SGNNs) 是对实际中带有正负链接的签名图进行分析复杂模式的关键工具。然而,现有的SGNN模型在签名图表示学习中存在三大挑战:签名图中的稀疏性使得潜在结构未被发现,不均衡的triangle对SGNN模型进行表示带来挑战,而且现实中的签名图数据往往缺乏节点标签和特征信息。这些限制使SGNN-基于表示学习的潜力受限。我们通过数据扩充技术来解决这些问题。虽然现有许多对 unsigned 图进行数据扩充的方法,但是这些方法并没有适应签名图。我们的论文提出了一种新的签名图扩充框架(SGA),包括以下三个主要组成部分:1. 我们使用 SGNN 模型来编码签名图,提取签名图中的潜在结构信息作为候选扩充结构。2. 我们评估这些候选样本(边),并选择对原始训练集进行最有利的修改。3. 我们提出了一种新的增强训练方法,对各种训练样本分配不同的训练难度,以便设计更好的训练策略。我们在六个真实世界数据集(Bitcoin-alpha、Bitcoin-otc、Epinions、Slashdot、Wiki-elec和Wiki-RfA)进行了广泛的实验,结果表明,SGA 可以在多个 bench 上显著提高性能。我们的方法在 Wiki-RfA 上的 AUC 上比基eline 提高了22.2%,在 Bitcoin-alpha 上的 F1-binary、F1-micro 和 F1-macro 上提高了33.3%、48.8% 和 36.3%。

When Collaborative Filtering is not Collaborative: Unfairness of PCA for Recommendations

  • paper_url: http://arxiv.org/abs/2310.09687
  • repo_url: None
  • paper_authors: David Liu, Jackie Baek, Tina Eliassi-Rad
  • for: 这篇论文主要研究推荐系统中降维方法的公平性。
  • methods: 论文聚焦于主成分分析(PCA):它从高维数据中识别潜在成分,保留前若干主成分、舍弃尾部成分,从而得到低秩近似。
  • results: 论文揭示了PCA导致推荐不公平的两个内在机制,并据此提出改进算法 Item-Weighted PCA,在目标函数中为不同物品赋予特定权重。在一类结构化矩阵上,论文证明采用特定权重的 Item-Weighted PCA 能最小化按流行度归一化的误差度量;在真实数据上,Item-Weighted PCA 不仅提升整体推荐质量(物品级 AUC-ROC 最高提升 0.1),还同时改善了热门与冷门物品的表现。
    Abstract We study the fairness of dimensionality reduction methods for recommendations. We focus on the established method of principal component analysis (PCA), which identifies latent components and produces a low-rank approximation via the leading components while discarding the trailing components. Prior works have defined notions of "fair PCA"; however, these definitions do not answer the following question: what makes PCA unfair? We identify two underlying mechanisms of PCA that induce unfairness at the item level. The first negatively impacts less popular items, due to the fact that less popular items rely on trailing latent components to recover their values. The second negatively impacts the highly popular items, since the leading PCA components specialize in individual popular items instead of capturing similarities between items. To address these issues, we develop a polynomial-time algorithm, Item-Weighted PCA, a modification of PCA that uses item-specific weights in the objective. On a stylized class of matrices, we prove that Item-Weighted PCA using a specific set of weights minimizes a popularity-normalized error metric. Our evaluations on real-world datasets show that Item-Weighted PCA not only improves overall recommendation quality by up to $0.1$ item-level AUC-ROC but also improves on both popular and less popular items.
    摘要 我们研究面向推荐的降维方法的公平性,重点关注经典的主成分分析(PCA):它识别潜在成分,保留前若干主成分、舍弃尾部成分,从而得到低秩近似。已有工作定义了多种“公平PCA”的概念,但这些定义并未回答一个问题:PCA究竟为何不公平?我们找出了PCA在物品层面引发不公平的两个内在机制。其一不利于冷门物品,因为冷门物品需要依靠尾部潜在成分来恢复其取值;其二不利于热门物品,因为前几个主成分往往专注于个别热门物品,而非捕捉物品之间的相似性。为解决这些问题,我们提出了多项式时间算法 Item-Weighted PCA,它是在目标函数中引入物品特定权重的PCA改进版。在一类结构化矩阵上,我们证明采用特定权重的 Item-Weighted PCA 能最小化按流行度归一化的误差度量。在真实数据集上的评估表明,Item-Weighted PCA 不仅将整体推荐质量最高提升 0.1 的物品级 AUC-ROC,还同时改善了热门与冷门物品的表现。
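
The item-weighted objective is easy to prototype: giving item j weight w_j in the reconstruction error is equivalent to scaling column j by sqrt(w_j) before a rank-k SVD and unscaling afterwards. The inverse-popularity weights in the sketch below are an illustrative choice, not necessarily the weighting the paper proves optimal.

```python
import numpy as np

def item_weighted_pca(X, w, k):
    """Rank-k approximation minimizing sum_j w_j * ||X[:, j] - Xhat[:, j]||^2."""
    s = np.sqrt(w)                                   # scale items by sqrt of their weights
    U, S, Vt = np.linalg.svd(X * s, full_matrices=False)
    Xhat_scaled = (U[:, :k] * S[:k]) @ Vt[:k]        # ordinary truncated SVD in scaled space
    return Xhat_scaled / s                           # undo the scaling

# Toy usage: down-weight very popular items so the leading components are not dominated by them
rng = np.random.default_rng(1)
X = rng.poisson(1.0, size=(500, 40)).astype(float)   # user-item interaction counts
popularity = X.sum(axis=0) + 1.0
w = 1.0 / popularity                                  # inverse-popularity weights (assumption)
X_lowrank = item_weighted_pca(X, w, k=8)
print(X_lowrank.shape)
```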

Enhancing Column Generation by Reinforcement Learning-Based Hyper-Heuristic for Vehicle Routing and Scheduling Problems

  • paper_url: http://arxiv.org/abs/2310.09686
  • repo_url: None
  • paper_authors: Kuan Xu, Li Shen, Lindong Liu
  • for: 提高大规模问题的解决效率和解得质量
  • methods: 利用强化学习法 Hyper-heuristic 框架 RLHH 加速Column Generation 方法,并在每个 CG 迭代中选择最佳低级别规划算法
  • results: 在 Vehicle Routing Problem with Time Windows 和 Bus Driver Scheduling Problem 两个典型的 combinatorial optimization 问题中,可以提高解得质量,最高减少总成本达 27.9% 和 15.4%,在相同或更少的计算时间内减少计算时间。
    Abstract Column generation (CG) is a vital method to solve large-scale problems by dynamically generating variables. It has extensive applications in common combinatorial optimization, such as vehicle routing and scheduling problems, where each iteration step requires solving an NP-hard constrained shortest path problem. Although some heuristic methods for acceleration already exist, they are not versatile enough to solve different problems. In this work, we propose a reinforcement learning-based hyper-heuristic framework, dubbed RLHH, to enhance the performance of CG. RLHH is a selection module embedded in CG to accelerate convergence and get better integer solutions. In each CG iteration, the RL agent selects a low-level heuristic to construct a reduced network only containing the edges with a greater chance of being part of the optimal solution. In addition, we specify RLHH to solve two typical combinatorial optimization problems: Vehicle Routing Problem with Time Windows (VRPTW) and Bus Driver Scheduling Problem (BDSP). The total cost can be reduced by up to 27.9\% in VRPTW and 15.4\% in BDSP compared to the best lower-level heuristic in our tested scenarios, within equivalent or even less computational time. The proposed RLHH is the first RL-based CG method that outperforms traditional approaches in terms of solution quality, which can promote the application of CG in combinatorial optimization.
    摘要 column generation (CG) 是一种重要的方法,用于解决大规模问题,通过动态生成变量。它在各种常见的 combinatorial optimization 中有广泛的应用,如车辆 Routing 和调度问题,每个迭代步骤都需要解决一个 NP-hard 约束短路问题。虽然一些启发法已经存在,但它们并不够 versatile enough 解决不同的问题。在这项工作中,我们提出了一种基于强化学习的 hyper-heuristic 框架,称为 RLHH,以提高 CG 的性能。RLHH 是 CG 中的一个选择模块,用于加速迭代和获得更好的整数解。在每个 CG 迭代中,RL Agent 将选择一个低级别启发,用于构建一个只包含有更高可能性成为优解的最佳解的减少网络。此外,我们将 RLHH 应用于两种典型的 combinatorial optimization 问题:车辆 Routing 问题 with Time Windows (VRPTW) 和 Bus Driver Scheduling 问题 (BDSP)。我们在测试场景中发现,RLHH 可以将总成本降低到 27.9% 以下,相对于最佳下级别启发,并且在相同或更少的计算时间内达成。我们的提案的 RLHH 是首个通过强化学习来超越传统方法的 CG 方法,可以提高 CG 在 combinatorial optimization 中的应用。
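
A highly simplified view of the hyper-heuristic loop is sketched below: an epsilon-greedy selector (standing in for the paper's RL agent) picks which low-level heuristic prices new columns at each CG iteration and is rewarded by the improvement of the restricted master objective. `solve_master` and the heuristics are left as abstract callables, so this is a schematic of the control flow rather than a runnable VRPTW/BDSP solver.

```python
import numpy as np

def rl_column_generation(solve_master, heuristics, max_iter=50, eps=0.2, lr=0.5):
    """solve_master(columns) -> (duals, objective); heuristics[h](duals) -> list of new columns."""
    rng = np.random.default_rng(0)
    q = np.zeros(len(heuristics))                   # running value estimate per low-level heuristic
    columns = []
    duals, obj = solve_master(columns)              # restricted master problem
    for _ in range(max_iter):
        h = rng.integers(len(heuristics)) if rng.random() < eps else int(np.argmax(q))
        new_cols = heuristics[h](duals)             # chosen heuristic builds the reduced network
        if not new_cols:                            # no column with negative reduced cost found
            break
        columns.extend(new_cols)
        duals, new_obj = solve_master(columns)
        q[h] += lr * (max(obj - new_obj, 0.0) - q[h])   # reward = master objective improvement
        obj = new_obj
    return columns, obj
```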

eess.IV - 2023-10-15

Joint Sparse Representations and Coupled Dictionary Learning in Multi-Source Heterogeneous Image Pseudo-color Fusion

  • paper_url: http://arxiv.org/abs/2310.09937
  • repo_url: None
  • paper_authors: Long Bai, Shilong Yao, Kun Gao, Yanjun Huang, Ruijie Tang, Hong Yan, Max Q. -H. Meng, Hongliang Ren
  • for: 提出一种基于 Coupled Dictionary Learning (CDL) 方法的 Synthetic Aperture Radar (SAR) 和多spectral pseudo-color合并方法,以实现高质量的合并图像。
  • methods: 首先使用传统的 Brovey 变换对配对的SAR与多光谱图像进行预处理;随后利用CDL,基于由源图像经强制联合稀疏编码生成的字典,捕捉预处理后图像对之间的相关性;最后利用字典对中的联合稀疏表示计算重建误差以构建图像掩模,并据此生成最终的融合图像。
  • results: 利用 Sentinel-1 卫星的SAR图像与 Landsat-8 卫星的多光谱图像进行的实验验证表明,所提方法可获得优异的视觉效果,并在光谱失真、相关系数、MSE、NIQE、BRISQUE 和 PIQE 等指标上取得出色的量化性能。
    Abstract Considering that Coupled Dictionary Learning (CDL) method can obtain a reasonable linear mathematical relationship between resource images, we propose a novel CDL-based Synthetic Aperture Radar (SAR) and multispectral pseudo-color fusion method. Firstly, the traditional Brovey transform is employed as a pre-processing method on the paired SAR and multispectral images. Then, CDL is used to capture the correlation between the pre-processed image pairs based on the dictionaries generated from the source images via enforced joint sparse coding. Afterward, the joint sparse representation in the pair of dictionaries is utilized to construct an image mask via calculating the reconstruction errors, and therefore generate the final fusion image. The experimental verification results of the SAR images from the Sentinel-1 satellite and the multispectral images from the Landsat-8 satellite show that the proposed method can achieve superior visual effects, and excellent quantitative performance in terms of spectral distortion, correlation coefficient, MSE, NIQE, BRISQUE, and PIQE.
    摘要 基于coupled dictionary learning(CDL)方法,我们提出了一种新的Synthetic Aperture Radar(SAR)和多spectral pseudo-color融合方法。首先,我们使用传统的Brovey变换作为预处理方法,对paired SAR和多spectral图像进行预处理。然后,我们使用CDL来捕捉paired图像对的相关性,基于源图像生成的字典via强制联合稀热编码。接着,我们利用对的字典中的联合稀热表示来构建图像掩码,通过计算重建错误来生成最终融合图像。实验Result of SAR图像来自Sentinel-1卫星和多spectral图像来自Landsat-8卫星表明,提出的方法可以实现优秀的视觉效果,且在 spectral distortion、相关系数、MSE、NIQE、BRISQUE和PIQE等方面具有出色的量化表现。
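
The Brovey-transform pre-processing step mentioned above has a compact closed form: each multispectral band is rescaled by the ratio of the high-resolution intensity (here the SAR image) to the sum of the bands. The sketch below shows that step only; the coupled-dictionary learning and joint sparse-coding stages are omitted, and the array shapes are assumptions.

```python
import numpy as np

def brovey_fuse(ms, pan, eps=1e-6):
    """ms: (H, W, B) multispectral bands, pan: (H, W) high-resolution intensity."""
    total = ms.sum(axis=2, keepdims=True) + eps
    return ms * (pan[..., None] / total)   # each band rescaled by pan / sum of bands

# Toy usage with random data standing in for co-registered Landsat-8 / Sentinel-1 patches
ms = np.random.rand(256, 256, 3)
sar = np.random.rand(256, 256)
fused = brovey_fuse(ms, sar)
print(fused.shape)
```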

Segment Anything Model for Pedestrian Infrastructure Inventory: Assessing Zero-Shot Segmentation on Multi-Mode Geospatial Data

  • paper_url: http://arxiv.org/abs/2310.09918
  • repo_url: None
  • paper_authors: Jiahao Xia, Gavin Gong, Jiawei Liu, Zhigang Zhu, Hao Tang
  • for: 这个论文旨在设计和优化基于Segment Anything Model(SAM)的人行道基础设施分割工作流程,能够有效处理多源地ospatial数据,包括LiDAR数据和卫星图像数据。
  • methods: 本论文使用扩展的人行道基础设施清单,包括通常被传统定义中排除的街区用品对象。我们的贡献在于生成必要的知识,回答以下两个问题:首先,哪种数据表示可以使SAM实现零批处理基础设施对象?其次,SAM如何在分割人行道基础设施对象方面表现?
  • results: 我们的发现表明,将来自移动LiDAR点云数据生成的街景图像与卫星图像数据结合使用,可以与SAM高效地创建可扩展的人行道基础设施清单,具有立即的利用价值,对于GIS专业人员、城市管理者、交通所有者和残疾人旅行者都具有重要意义。
    Abstract In this paper, a Segment Anything Model (SAM)-based pedestrian infrastructure segmentation workflow is designed and optimized, which is capable of efficiently processing multi-sourced geospatial data including LiDAR data and satellite imagery data. We used an expanded definition of pedestrian infrastructure inventory which goes beyond the traditional transportation elements to include street furniture objects often omitted from the traditional definition. Our contributions lie in producing the necessary knowledge to answer the following two questions. First, which data representation can facilitate zero-shot segmentation of infrastructure objects with SAM? Second, how well does the SAM-based method perform on segmenting pedestrian infrastructure objects? Our findings indicate that street view images generated from mobile LiDAR point cloud data, when paired along with satellite imagery data, can work efficiently with SAM to create a scalable pedestrian infrastructure inventory approach with immediate benefits to GIS professionals, city managers, transportation owners, and walkers, especially those with travel-limiting disabilities.
    摘要 本文设计并优化了一套基于Segment Anything Model(SAM)的人行道基础设施分割工作流程,能够高效处理包括LiDAR数据与卫星影像在内的多源地理空间数据。我们采用扩展的人行道基础设施清单定义,在传统交通要素之外纳入通常被忽略的街道家具对象。我们的贡献在于回答以下两个问题:第一,哪种数据表示形式能让SAM对基础设施对象实现零样本分割?第二,基于SAM的方法在分割人行道基础设施对象上表现如何?研究结果表明,将移动LiDAR点云生成的街景图像与卫星影像数据结合使用,可以让SAM高效地构建可扩展的人行道基础设施清单方法,对GIS专业人员、城市管理者、交通部门以及行人(尤其是出行受限的残障人士)都具有直接价值。

eess.SP - 2023-10-15

Distributed Estimation with Partially Accessible Information: An IMAT Approach to LMS Diffusion

  • paper_url: http://arxiv.org/abs/2310.09970
  • repo_url: None
  • paper_authors: Mahdi Shamsi, Farokh Marvasti
  • for: 提高分布式算法的可见性和稳定性
  • methods: 基于信号流分析的组合策略分析框架和阈值算法
  • results: 在时域和变换域中存在缺失信息的情况下,提出了一种基于阈值算法的支持向量识别和利用策略,并在两种组合enario中进行了示范
    Abstract Distributed algorithms, particularly Diffusion Least Mean Square, are widely favored for their reliability, robustness, and fast convergence in various industries. However, limited observability of the target can compromise the integrity of the algorithm. To address this issue, this paper proposes a framework for analyzing combination strategies by drawing inspiration from signal flow analysis. A thresholding-based algorithm is also presented to identify and utilize the support vector in scenarios with missing information about the target vector's support. The proposed approach is demonstrated in two combination scenarios, showcasing the effectiveness of the algorithm in situations characterized by sparse observations in the time and transform domains.
    摘要 分布式算法,尤其是扩散最小均方(Diffusion LMS)算法,因其可靠性、鲁棒性与快速收敛性而在各行业得到广泛青睐。然而,目标的有限可观测性可能损害算法的完整性。为解决这一问题,本文借鉴信号流分析的思想,提出了一个用于分析组合策略的框架,并给出一种基于阈值的算法,用于在目标向量支撑信息缺失的情况下识别并利用其支撑。所提方法在两种组合场景中得到了验证,展示了该算法在时域与变换域观测稀疏情形下的有效性。
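
A minimal adapt-then-combine diffusion LMS loop with a decaying hard threshold gives a feel for how a sparsity-promoting (IMAT-style) operator can be slotted into diffusion adaptation. The fully connected uniform combination matrix, threshold schedule and noise level below are simplifying assumptions, not the paper's configuration.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, T = 10, 20, 3000                  # nodes, filter length, iterations
w_true = np.zeros(M)
w_true[rng.choice(M, 3, replace=False)] = rng.normal(size=3)   # sparse target vector
A = np.full((N, N), 1.0 / N)            # uniform (doubly stochastic) combination matrix

w = np.zeros((N, M))
mu, thr = 0.01, 0.05
for t in range(T):
    psi = np.empty_like(w)
    for k in range(N):                   # adapt: local LMS step at every node
        u = rng.normal(size=M)
        d = u @ w_true + 0.05 * rng.normal()
        psi[k] = w[k] + mu * (d - u @ w[k]) * u
    w = A @ psi                          # combine: average neighbours' intermediate estimates
    w[np.abs(w) < thr * np.exp(-t / T)] = 0.0   # iterative thresholding toward a sparse support

print("max node error:", np.max(np.linalg.norm(w - w_true, axis=1)))
```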

Semi-Supervised End-to-End Learning for Integrated Sensing and Communications

  • paper_url: http://arxiv.org/abs/2310.09940
  • repo_url: https://github.com/josemateosramos/sslisac
  • paper_authors: José Miguel Mateos-Ramos, Baptiste Chatelier, Christian Häger, Musa Furkan Keskin, Luc Le Magoarou, Henk Wymeersch
  • for: 本文针对 ISAC 混合感知通信系统的问题进行研究,旨在提高硬件、频率和能源效率。
  • methods: 本文采用可微的基于模型的学习方法,以端到端方式同时实现单目标检测与位置估计,以及多输入单输出(MISO)通信。
  • results: 结果显示,所提半监督学习策略在所需标签数据减少98.8%的情况下,即可达到与监督学习相近的性能。
    Abstract Integrated sensing and communications (ISAC) is envisioned as one of the key enablers of next-generation wireless systems, offering improved hardware, spectral, and energy efficiencies. In this paper, we consider an ISAC transceiver with an impaired uniform linear array that performs single-target detection and position estimation, and multiple-input single-output communications. A differentiable model-based learning approach is considered, which optimizes both the transmitter and the sensing receiver in an end-to-end manner. An unsupervised loss function that enables impairment compensation without the need for labeled data is proposed. Semi-supervised learning strategies are also proposed, which use a combination of small amounts of labeled data and unlabeled data. Our results show that semi-supervised learning can achieve similar performance to supervised learning with 98.8% less required labeled data.
    摘要 集成感知通信(ISAC)被视为下一代无线系统的关键使能技术之一,可提升硬件、频谱与能量效率。本文研究一种均匀线阵存在损伤的ISAC收发机,它同时执行单目标检测与位置估计以及多输入单输出通信。我们采用可微的基于模型的学习方法,以端到端方式联合优化发射机与感知接收机,并提出一种无需标签数据即可补偿硬件损伤的无监督损失函数,以及结合少量标签数据与无标签数据的半监督学习策略。结果表明,半监督学习在所需标签数据减少98.8%的情况下即可达到与监督学习相近的性能。

Enhance Security of Time-Modulated Array-Enabled Directional Modulation by Introducing Symbol Ambiguity

  • paper_url: http://arxiv.org/abs/2310.09922
  • repo_url: None
  • paper_authors: Zhihao Tao, Zhaoyi Xu, Athina Petropulu
  • for: 这个论文研究了时间模拟数组(TMA)启用的方向性模拟(DM)通信系统是否可以破解。
  • methods: 论文首先展示了利用网格搜索可以成功找到由TMA生成的唯一且真实的混合矩阵;随后提出向TMA中引入符号模糊以抵御网格搜索破解,并设计了构建符号模糊的两条原则:混合矩阵的秩亏缺,以及ON-OFF切换模式的非唯一性,同时给出了可行的实现机制。
  • results: 所提原则与机制不仅为今后从理论上设计更安全的TMA DM系统提供了思路,其有效性还通过误码率测量得到了验证。
    Abstract In this paper, if the time-modulated array (TMA)-enabled directional modulation (DM) communication system can be cracked is investigated and the answer is YES! We first demonstrate that the scrambling data received at the eavesdropper can be defied by using grid search to successfully find the only and actual mixing matrix generated by TMA. Then, we propose introducing symbol ambiguity to TMA to defend the defying of grid search, and design two principles for the TMA mixing matrix, i.e., rank deficiency and non-uniqueness of the ON-OFF switching pattern, that can be used to construct the symbol ambiguity. Also, we present a feasible mechanism to implement these two principles. Our proposed principles and mechanism not only shed light on how to design a more secure TMA DM system theoretically in the future, but also have been validated to be effective by bit error rate measurements.
    摘要 在这篇论文中,我们调查了使用时间模拟数组(TMA)启用方向性模式(DM)通信系统是否可以被破解。我们首先表明了使用格点搜索可以成功地找到由TMA生成的唯一和实际混合矩阵。然后,我们提议引入符号模糊性来防止格点搜索的推断,并设计了两种原则来构建符号模糊性,即缺陷行列和非唯一性的ON-OFF切换模式。此外,我们还提出了可行的实现机制。我们的提出的原则和机制不仅帮助我们在未来理论上设计更安全的TMA DM系统,而且已经被验证了通过比特错误率测量。

Stacked Intelligent Metasurface Performs a 2D DFT in the Wave Domain for DOA Estimation

  • paper_url: http://arxiv.org/abs/2310.09861
  • repo_url: None
  • paper_authors: Jiancheng An, Chau Yuen, Marco Di Renzo, Merouane Debbah, H. Vincent Poor, Lajos Hanzo
  • for: 本论文旨在提出一种基于堆叠智能超表面(SIM)的技术,用于实现二维波达方向(DOA)估计。
  • methods: 该技术采用一种先进的SIM,使入射波在穿过SIM传播的过程中自动完成二维离散傅里叶变换(DFT)。为使SIM完成这一任务,我们设计了一种梯度下降算法,迭代更新每个超原子(meta-atom)的相移,以最小化SIM响应与二维DFT矩阵之间的拟合误差。
  • results: 数值仿真结果表明,经过充分训练的SIM能够高精度地完成二维DFT:以光学速度完成计算的SIM在DOA估计中可达到 $10^{-4}$ 的均方误差(MSE)。
    Abstract Stacked intelligent metasurface (SIM) based techniques are developed to perform two-dimensional (2D) direction-of-arrival (DOA) estimation. In contrast to the conventional designs, an advanced SIM in front of the receiving array automatically performs the 2D discrete Fourier transform (DFT) as the incident waves propagate through it. To arrange for the SIM to carry out this task, we design a gradient descent algorithm for iteratively updating the phase shift of each meta-atom in the SIM to minimize the fitting error between the SIM's response and the 2D DFT matrix. To further improve the DOA estimation accuracy, we configure the phase shifts in the input layer of the SIM to generate a set of 2D DFT matrices having orthogonal spatial frequency bins. Extensive numerical simulations verify the capability of a well-trained SIM to perform the 2D DFT. Specifically, it is demonstrated that a SIM having an optical computational speed achieves an MSE of $10^{-4}$ in 2D DOA estimation.
    摘要 本文提出了基于堆叠智能超表面(SIM)的二维波达方向(DOA)估计技术。与传统设计不同,置于接收阵列前方的先进SIM会在入射波穿过时自动完成二维离散傅里叶变换(DFT)。为使SIM承担这一任务,我们设计了一种梯度下降算法,迭代更新每个超原子的相移,以最小化SIM响应与二维DFT矩阵之间的拟合误差。为进一步提升DOA估计精度,我们对SIM输入层的相移进行配置,生成一组空间频率子带相互正交的二维DFT矩阵。大量数值仿真验证了经过良好训练的SIM执行二维DFT的能力:具备光学计算速度的SIM在二维DOA估计中可达到 $10^{-4}$ 的均方误差(MSE)。
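
The gradient-descent fitting of per-atom phase shifts can be illustrated with a single-layer toy model: fix two random propagation matrices and descend on the phases so that the end-to-end response approximates the 2-D DFT operator. With one layer the fit is necessarily approximate (the paper stacks several layers); the random propagation matrices, learning rate and sizes below are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
Nx, Ny = 4, 4
N = Nx * Ny
# Target 2-D DFT operator (Kronecker product of two 1-D DFT matrices), unitary scaling
F = np.kron(np.fft.fft(np.eye(Nx)), np.fft.fft(np.eye(Ny))) / np.sqrt(N)

W1 = (rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N))) / np.sqrt(N)  # propagation in
W2 = (rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N))) / np.sqrt(N)  # propagation out
theta = rng.uniform(0, 2 * np.pi, N)                                        # per-atom phases

lr = 0.05
for _ in range(2000):
    D = np.diag(np.exp(1j * theta))
    R = W2 @ D @ W1 - F                                # fitting residual
    # analytic gradient of ||R||_F^2 w.r.t. each phase shift theta_k
    grad = 2 * np.real(1j * np.exp(1j * theta) * np.diag(W1 @ R.conj().T @ W2))
    theta -= lr * grad

print("final fitting error:",
      np.linalg.norm(W2 @ np.diag(np.exp(1j * theta)) @ W1 - F))
```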

Towards Structural Sparse Precoding: Dynamic Time, Frequency, Space, and Power Multistage Resource Programming

  • paper_url: http://arxiv.org/abs/2310.09840
  • repo_url: None
  • paper_authors: Zhongxiang Wei, Ping Wang, Qingjiang Shi, Xu Zhu, Christos Masouros
  • for: 这篇论文主要针对 fifth-generation 通信系统中的即时传输应用需求,尤其是在时间维度上的质量要求。
  • methods: 本论文使用 multistage 优化方法,包括时间、频率、空间和能量领域资源的共同优化。
  • results: 本论文的设计可以实现高性能的传输系统,并且具有快速的数据测试速率。
    Abstract In last decades, dynamic resource programming in partial resource domains has been extensively investigated for single time slot optimizations. However, with the emerging real-time media applications in fifth-generation communications, their new quality of service requirements are often measured in temporal dimension. This requires multistage optimization for full resource domain dynamic programming. Taking experience rate as a typical temporal multistage metric, we jointly optimize time, frequency, space and power domains resource for multistage optimization. To strike a good tradeoff between system performance and computational complexity, we first transform the formulated mixed integer non-linear constraints into equivalent convex second order cone constraints, by exploiting the coupling effect among the resources. Leveraging the concept of structural sparsity, the objective of max-min experience rate is given as a weighted 1-norm term associated with the precoding matrix. Finally, a low-complexity iterative algorithm is proposed for full resource domain programming, aided by another simple conic optimization for obtaining its feasible initial result. Simulation verifies that our design significantly outperform the benchmarks while maintaining a fast convergence rate, shedding light on full domain dynamic resource programming of multistage optimizations.
    摘要 过去几十年,针对部分资源域的动态资源规划在单时隙优化方面得到了广泛研究。然而,随着第五代通信中实时媒体应用的兴起,新的服务质量需求往往以时间维度来衡量,这就要求对全资源域进行多阶段优化。本文以体验速率这一典型的时间多阶段指标为目标,对时间、频率、空间与功率域资源进行联合优化。为在系统性能与计算复杂度之间取得良好折中,我们首先利用各资源之间的耦合效应,将所建立的混合整数非线性约束转化为等价的二阶锥凸约束;随后,借助结构稀疏性的思想,将最大化最小体验速率的目标表示为与预编码矩阵相关联的加权1-范数项;最后,提出一种低复杂度迭代算法用于全资源域规划,并辅以另一个简单的锥优化问题来获得可行的初始解。仿真结果表明,所提设计在保持较快收敛速度的同时显著优于基准方案,为多阶段优化的全域动态资源规划提供了参考。

Cell-Free Massive MIMO Surveillance Systems

  • paper_url: http://arxiv.org/abs/2310.09769
  • repo_url: None
  • paper_authors: Zahra Mobini, Hien Quoc Ngo, Michail Matthaiou, Lajos Hanzo
  • for: 本研究旨在提高国家安全性,通过使用无线监测系统来监测不可信通信链接。
  • methods: 本研究提出了一种新的无蜂窝大规模MIMO(CF-mMIMO)无线监测系统,其中大量分布式的多天线合法监测节点(MN)对不可信的通信链路进行观测或干扰。
  • results: 我们分析了CF-mMIMO无线监测系统的性能,推导了监测成功概率的闭式表达式;提出了用于在观测与干扰两种模式间分配MN的贪婪算法,以及基于长期信道状态信息、最大化最小监测成功概率的干扰发射功率分配算法。结果表明,所提CF-mMIMO系统相比基准方案能显著提升MN性能:在MN数量中等的情形下,最小监测成功概率相比共址mMIMO基准提升11倍。
    Abstract Wireless surveillance, in which untrusted communications links are proactively monitored by legitimate agencies, has started to garner a lot of interest for enhancing the national security. In this paper, we propose a new cell-free massive multiple-input multiple-output (CF-mMIMO) wireless surveillance system, where a large number of distributed multi-antenna aided legitimate monitoring nodes (MNs) embark on either observing or jamming untrusted communication links. To facilitate concurrent observing and jamming, a subset of the MNs is selected for monitoring the untrusted transmitters (UTs), while the remaining MNs are selected for jamming the untrusted receivers (URs). We analyze the performance of CF-mMIMO wireless surveillance and derive a closed-form expression for the monitoring success probability of MNs. We then propose a greedy algorithm for the observing vs, jamming mode assignment of MNs, followed by the conception of a jamming transmit power allocation algorithm for maximizing the minimum monitoring success probability concerning all the UT and UR pairs based on the associated long-term channel state information knowledge. In conclusion, our proposed CF-mMIMO system is capable of significantly improving the performance of the MNs compared to that of the state-of-the-art baseline. In scenarios of a mediocre number of MNs, our proposed scheme provides an 11-fold improvement in the minimum monitoring success probability compared to its co-located mMIMO benchmarker.
    摘要 无线监测是指合法机构对不可信通信链路进行主动监听,其在提升国家安全方面正受到越来越多的关注。本文提出一种新的无蜂窝大规模MIMO(CF-mMIMO)无线监测系统,其中大量分布式的多天线合法监测节点(MN)对不可信通信链路进行观测或干扰:为实现同时观测与干扰,一部分MN被选择用于监听不可信发射机(UT),其余MN则用于干扰不可信接收机(UR)。我们分析了CF-mMIMO无线监测系统的性能,推导了MN监测成功概率的闭式表达式;随后提出一种用于MN观测/干扰模式分配的贪婪算法,并基于相应的长期信道状态信息,设计了最大化所有UT-UR对最小监测成功概率的干扰发射功率分配算法。总体而言,所提CF-mMIMO系统相比现有基准能显著提升MN的监测性能:在MN数量中等的情形下,最小监测成功概率相比共址mMIMO基准提升11倍。

Assessing Smart Algorithms for Gait Phases Detection in Lower Limb Prosthesis: A Comprehensive Review

  • paper_url: http://arxiv.org/abs/2310.09735
  • repo_url: None
  • paper_authors: Barath Kumar JK, Aswadh Khumar G S
  • for: 这些研究旨在提高步态相位划分的精度,以便应用于步行康复系统。
  • methods: 这些研究使用了多种感知器,包括佩戴式和非佩戴式的感知器,以获取步态数据。
  • results: 研究发现了多种感知器和感知器组合,可以在日常环境中分析步态模式。这些感知器的选择因素包括感知器的精度、可靠性和成本等。
    Abstract Over the past few years, the division of gait phases has emerged as a complex area of research that carries significant importance for various applications in the field of gait technologies. The accurate partitioning of gait phases plays a crucial role in advancing these applications. Researchers have been exploring a range of sensors that can be employed to provide data for algorithms involved in gait phase partitioning. These sensors can be broadly categorized into two types: wearable and non-wearable, each offering unique advantages and capabilities. In our study aimed at examining the current approaches to gait analysis and detection specifically designed for implementation in ambulatory rehabilitation systems, we conducted a comprehensive meta-analysis of existing research studies. Our analysis revealed a diverse range of sensors and sensor combinations that demonstrate the ability to analyze gait patterns in ambulatory settings. These sensor options vary from basic force-based binary switches to more intricate setups incorporating multiple inertial sensors and sophisticated algorithms. The findings highlight the wide spectrum of available technologies and methodologies used in gait analysis for ambulatory applications. To conduct an extensive review, we systematically examined two prominent databases, IEEE and Scopus, with the aim of identifying relevant studies pertaining to gait analysis. The search criteria were limited to 189 papers published between 1999 and 2023. From this pool, we identified and included five papers that specifically focused on various techniques including Thresholding, Quasi-static method, adaptive classifier, and SVM-based approaches. These selected papers provided valuable insights for our review.
    摘要 过去几年,走势阶段的分类已经成为一个复杂的研究领域,具有重要的应用意义在走势技术领域。精确地分类走势阶段是提高这些应用的关键因素。研究人员正在探索一种以下的仪器来提供资料 для走势阶段分类的算法:抽象和非抽象的仪器,每一种都有各自的优点和能力。在我们的研究中,我们对于数位训练系统中的走势分析和检测进行了广泛的meta分析。我们发现了一些不同的仪器和仪器组合,可以在行动 Setting中分析走势模式。这些仪器选择自基本的力矩基于的二进制变数到更复杂的设备和复杂的算法。我们的发现显示了走势分析在行动应用中的广泛技术和方法。为了进行广泛的评审,我们对IEEE和Scopus两个著名的数据库进行了系统性的搜寻,并将搜寻结果限定为1999年至2023年发表的189篇文献。从这个池中,我们选择和包括了不同的技术,例如阈值分类、静止方法、适应分类和SVM基本方法的五篇文献。这些选择的文献给我们提供了宝贵的启示。

A generalization of the achievable rate of a MISO system using Bode-Fano wideband matching theory

  • paper_url: http://arxiv.org/abs/2310.09723
  • repo_url: None
  • paper_authors: Nitish Deshpande, Miguel R. Castellanos, Saeed R. Khosravirad, Jinfeng Du, Harish Viswanathan, Robert W. Heath Jr
  • for: 本研究旨在在 Bode-Fano 宽带匹配约束下最大化多输入单输出(MISO)系统的信息论可实现速率。
  • methods: 本研究采用多端口电路理论方法,利用频率选择性散射参数,并将 Bode-Fano 宽带匹配理论纳入MISO可实现速率的优化框架。
  • results: 结果表明,可用最优传输系数以及与 Bode-Fano 不等式约束对应的拉格朗日参数来刻画可实现速率优化问题的解;与忽略匹配约束得到的理想可实现速率以及共轭匹配、频率平坦传输等次优匹配策略相比,优化匹配网络能带来更高的可实现速率。研究还提出了一种实用方法,利用最优传输系数并借助ADS软件综合出可物理实现的匹配网络,从而逼近可实现速率上界。
    Abstract Impedance-matching networks affect power transfer from the radio frequency (RF) chains to the antennas. Their design impacts the signal to noise ratio (SNR) and the achievable rate. In this paper, we maximize the information-theoretic achievable rate of a multiple-input-single-output (MISO) system with wideband matching constraints. Using a multiport circuit theory approach with frequency-selective scattering parameters, we propose a general framework for optimizing the MISO achievable rate that incorporates Bode-Fano wideband matching theory. We express the solution to the achievable rate optimization problem in terms of the optimized transmission coefficient and the Lagrangian parameters corresponding to the Bode-Fano inequality constraints. We apply this framework to a single electric Chu's antenna and an array of two electric Chu's antennas. We compare the optimized achievable rate obtained numerically with other benchmarks like the ideal achievable rate computed by disregarding matching constraints and the achievable rate obtained by using sub-optimal matching strategies like conjugate matching and frequency-flat transmission. We also propose a practical methodology to approximate the achievable rate bound by using the optimal transmission coefficient to derive a physically realizable matching network through the ADS software.
    摘要 阻抗匹配网络影响射频(RF)链路向天线的功率传输,其设计直接关系到信噪比(SNR)与可实现速率。本文在宽带匹配约束下最大化多输入单输出(MISO)系统的信息论可实现速率。基于多端口电路理论并利用频率选择性散射参数,我们提出了一个将 Bode-Fano 宽带匹配理论纳入其中的MISO可实现速率优化通用框架,并用最优传输系数以及与 Bode-Fano 不等式约束对应的拉格朗日参数刻画可实现速率优化问题的解。我们将该框架应用于单个电 Chu 天线以及由两个电 Chu 天线组成的阵列,并将数值求得的最优可实现速率与忽略匹配约束的理想可实现速率、以及共轭匹配和频率平坦传输等次优匹配策略所得的可实现速率进行比较。我们还提出了一种实用方法:利用最优传输系数,借助ADS软件推导出可物理实现的匹配网络,以逼近可实现速率上界。

Two Enhanced-rate Power Allocation Strategies for Active IRS-assisted Wireless Network

  • paper_url: http://arxiv.org/abs/2310.09721
  • repo_url: None
  • paper_authors: Qiankun Cheng, Rongen Dong, Wenlong Cai, Ruiqi Liu, Feng Shu, Jiangzhou Wang
  • for: 在总功率约束下研究有源IRS辅助网络的功率分配与速率性能。
  • methods: 在基站(BS)与IRS之间调整功率分配,并结合BS端的发射波束成形与IRS端的反射波束成形。
  • results: 提出两种高性能功率分配(PA)策略——增强型多随机初始化牛顿法(EMRIN)与泰勒多项式近似法(TPA)——用于最大化SNR;二者在速率方面均明显优于固定功率分配,且随着IRS反射单元数量增大,其性能逼近穷举搜索。
    摘要 有源智能反射面(IRS)因能够克服双重衰落效应的影响而受到广泛关注。与无源IRS不同,有源IRS需要额外供电,因此在基站(BS)与IRS之间如何分配功率会直接影响系统的速率性能。本文对总功率约束下的有源IRS辅助网络进行建模,使其能够在BS与IRS之间调整功率。在给定BS端发射波束成形与IRS端反射波束成形的条件下,我们将SNR表示为功率分配(PA)因子的函数,并给出最大化SNR的优化问题。随后提出两种高性能PA策略:增强型多随机初始化牛顿法(EMRIN)与泰勒多项式近似法(TPA)。前者通过多次随机初始化改进经典牛顿法的速率性能,避免陷入局部最优点;后者利用对原SNR函数的一阶泰勒多项式近似给出闭式解,从而降低计算复杂度——实际上,借助TPA,原优化问题被转化为求解一个三次多项式的根。仿真结果表明:SNR的一阶TPA近似与其精确表达式吻合良好;所提两种PA方法在速率方面明显优于固定PA,并且随着IRS反射单元数量增大,其性能逼近穷举搜索。

cs.SD - 2023-10-14

Dynamic Prediction of Full-Ocean Depth SSP by Hierarchical LSTM: An Experimental Result

  • paper_url: http://arxiv.org/abs/2310.09522
  • repo_url: None
  • paper_authors: Jiajun Lu, Wei Huang, Hao Zhang
  • for: 用于预测未来水声速度分布,提高海上定位、导航和时间测量(PNT)精度。
  • methods: 提出使用层次长短期记忆(H-LSTM)神经网络预测未来声速分布,挖掘声速分布在时间维度上的变化规律。
  • results: 通过仿真与真实海上实验对所提方法进行了验证,结果显示其准确性优于现有方法。
    Abstract SSP distribution is an important parameter for underwater positioning, navigation and timing (PNT) because it affects the propagation mode of underwater acoustic signals. To accurately predict the future sound speed distribution, we propose a hierarchical long short-term memory (H-LSTM) neural network, which explores the distribution pattern of sound velocity in the time dimension. To verify the feasibility and effectiveness of the approach, we conducted both simulations and a real experiment; the ocean experiment was held in the South China Sea in April 2023. Results show that the accuracy of the proposed method outperforms state-of-the-art methods.
    摘要 声速剖面(SSP)分布是水下定位、导航与授时(PNT)的重要参数,因为它影响水声信号的传播模式。为准确预测未来的声速分布,我们提出一种层次长短期记忆(H-LSTM)神经网络,挖掘声速分布在时间维度上的变化规律。为验证方法的可行性与有效性,我们进行了仿真与真实海上实验;海上实验于2023年4月在南海进行。结果表明,所提方法的精度优于现有方法。
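
As a rough illustration of the forecasting setup, the PyTorch sketch below trains a stacked two-layer LSTM to map a window of past sound speed profiles to the next profile; the true hierarchical structure of H-LSTM, the depth binning and all dimensions here are assumptions for illustration only.

```python
import torch
import torch.nn as nn

class SSPForecaster(nn.Module):
    def __init__(self, depth_bins=50, hidden=128, layers=2):
        super().__init__()
        self.lstm = nn.LSTM(depth_bins, hidden, num_layers=layers, batch_first=True)
        self.head = nn.Linear(hidden, depth_bins)

    def forward(self, x):                 # x: (batch, time, depth_bins)
        out, _ = self.lstm(x)
        return self.head(out[:, -1])      # predict the next profile from the last state

# Toy training loop on random data standing in for historical SSP measurements
model = SSPForecaster()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.randn(32, 12, 50)              # 12 past profiles, 50 depth bins each
y = torch.randn(32, 50)                  # the next profile
for _ in range(10):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    opt.step()
print("final training loss:", loss.item())
```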