cs.LG - 2023-08-24

Easy attention: A simple self-attention mechanism for Transformers

  • paper_url: http://arxiv.org/abs/2308.12874
  • repo_url: None
  • paper_authors: Marcial Sanchis-Agudo, Yuning Wang, Karthik Duraisamy, Ricardo Vinuesa
  • for: Predicting the temporal dynamics of chaotic systems
  • methods: Proposes an attention mechanism called "easy attention" that, informed by a singular-value decomposition (SVD) of the softmax attention scores, treats the attention scores as learnable parameters to better capture long-term dependencies (see the sketch below)
  • results: Compared with self attention and LSTM networks, the method shows higher robustness and lower complexity, and achieves excellent results in reconstructing and predicting the temporal dynamics of chaotic systems
    Abstract To improve the robustness of transformer neural networks used for temporal-dynamics prediction of chaotic systems, we propose a novel attention mechanism called easy attention. Due to the fact that self attention only makes usage of the inner product of queries and keys, it is demonstrated that the keys, queries and softmax are not necessary for obtaining the attention score required to capture long-term dependencies in temporal sequences. Through implementing singular-value decomposition (SVD) on the softmax attention score, we further observe that the self attention compresses contribution from both queries and keys in the spanned space of the attention score. Therefore, our proposed easy-attention method directly treats the attention scores as learnable parameters. This approach produces excellent results when reconstructing and predicting the temporal dynamics of chaotic systems exhibiting more robustness and less complexity than the self attention or the widely-used long short-term memory (LSTM) network. Our results show great potential for applications in more complex high-dimensional dynamical systems.
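A minimal PyTorch sketch of the easy-attention idea described above: the attention scores are stored as a learnable matrix, so no queries, keys, or softmax are needed. The class name, initialization, and layout are illustrative assumptions, not taken from the paper.

```python
import torch
import torch.nn as nn

class EasyAttention(nn.Module):
    """Attention scores as a learnable matrix (hypothetical layout)."""
    def __init__(self, seq_len: int, d_model: int):
        super().__init__()
        # One learnable score per (output step, input step) pair.
        self.scores = nn.Parameter(torch.eye(seq_len))
        self.value = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        v = self.value(x)           # value projection, as in standard attention
        return self.scores @ v      # fixed, learned mixing over time steps
```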

IPA: Inference Pipeline Adaptation to Achieve High Accuracy and Cost-Efficiency

  • paper_url: http://arxiv.org/abs/2308.12871
  • repo_url: None
  • paper_authors: Saeid Ghafouri, Kamran Razavi, Mehran Salmani, Alireza Sanaee, Tania Lorido-Botran, Lin Wang, Joseph Doyle, Pooyan Jamshidi
  • for: Fast, accurate, and cost-effective inference in deep-learning inference systems
  • methods: Uses Integer Programming to dynamically configure batch size, replication, and model variants, optimizing accuracy, minimizing cost, and meeting user-defined latency SLAs (see the sketch below)
  • results: Extensive experiments on a Kubernetes implementation with five real-world inference pipelines show that IPA improves normalized accuracy by up to 35% with a cost increase of less than 5%
    Abstract Efficiently optimizing multi-model inference pipelines for fast, accurate, and cost-effective inference is a crucial challenge in ML production systems, given their tight end-to-end latency requirements. To simplify the exploration of the vast and intricate trade-off space of accuracy and cost in inference pipelines, providers frequently opt to consider one of them. However, the challenge lies in reconciling accuracy and cost trade-offs. To address this challenge and propose a solution to efficiently manage model variants in inference pipelines, we present IPA, an online deep-learning Inference Pipeline Adaptation system that efficiently leverages model variants for each deep learning task. Model variants are different versions of pre-trained models for the same deep learning task with variations in resource requirements, latency, and accuracy. IPA dynamically configures batch size, replication, and model variants to optimize accuracy, minimize costs, and meet user-defined latency SLAs using Integer Programming. It supports multi-objective settings for achieving different trade-offs between accuracy and cost objectives while remaining adaptable to varying workloads and dynamic traffic patterns. Extensive experiments on a Kubernetes implementation with five real-world inference pipelines demonstrate that IPA improves normalized accuracy by up to 35% with a minimal cost increase of less than 5%.
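The paper formulates configuration selection as an Integer Program; the sketch below replaces the solver with a brute-force search over a tiny hypothetical configuration space, only to illustrate the accuracy/cost trade-off under a latency SLA. All numbers, variant names, and the latency model are made up.

```python
from itertools import product

# Hypothetical model variants: (accuracy, cost per replica, latency in ms at batch 1)
variants = {"small": (0.72, 1.0, 20.0), "base": (0.80, 2.5, 45.0), "large": (0.86, 6.0, 90.0)}

def pick_config(latency_sla_ms: float, max_cost: float):
    """Enumerate (variant, replicas, batch size) and keep the most accurate feasible one."""
    best = None
    for name, replicas, batch in product(variants, range(1, 5), (1, 2, 4, 8)):
        acc, unit_cost, base_latency = variants[name]
        latency = base_latency * batch          # crude latency model (assumption)
        cost = unit_cost * replicas
        if latency <= latency_sla_ms and cost <= max_cost:
            if best is None or acc > best[0]:
                best = (acc, name, replicas, batch, cost)
    return best

print(pick_config(latency_sla_ms=100.0, max_cost=10.0))
```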

Auto-weighted Bayesian Physics-Informed Neural Networks and robust estimations for multitask inverse problems in pore-scale imaging of dissolution

  • paper_url: http://arxiv.org/abs/2308.12864
  • repo_url: None
  • paper_authors: Sarah Perez, Philippe Poncet
  • for: A new data assimilation strategy for reactive inverse problems with uncertainty quantification, providing reliable calibration of pore-scale modeling
  • methods: A multitask formulation of reactive inverse problems combining data-driven and physics-informed techniques to assess morphological uncertainty and the kinetics of calcite dissolution
  • results: Reliable uncertainty quantification and estimation of reactive-parameter ranges, with successful Bayesian inference in 1D+time and 2D+time calcite dissolution
    Abstract In this article, we present a novel data assimilation strategy in pore-scale imaging and demonstrate that this makes it possible to robustly address reactive inverse problems incorporating Uncertainty Quantification (UQ). Pore-scale modeling of reactive flow offers a valuable opportunity to investigate the evolution of macro-scale properties subject to dynamic processes. Yet, they suffer from imaging limitations arising from the associated X-ray microtomography (X-ray microCT) process, which induces discrepancies in the properties estimates. Assessment of the kinetic parameters also raises challenges, as reactive coefficients are critical parameters that can cover a wide range of values. We account for these two issues and ensure reliable calibration of pore-scale modeling, based on dynamical microCT images, by integrating uncertainty quantification in the workflow. The present method is based on a multitasking formulation of reactive inverse problems combining data-driven and physics-informed techniques in calcite dissolution. This allows quantifying morphological uncertainties on the porosity field and estimating reactive parameter ranges through prescribed PDE models with a latent concentration field and dynamical microCT. The data assimilation strategy relies on sequential reinforcement incorporating successively additional PDE constraints. We guarantee robust and unbiased uncertainty quantification by straightforward adaptive weighting of Bayesian Physics-Informed Neural Networks (BPINNs), ensuring reliable micro-porosity changes during geochemical transformations. We demonstrate successful Bayesian Inference in 1D+Time and 2D+Time calcite dissolution based on synthetic microCT images with meaningful posterior distribution on the reactive parameters and dimensionless numbers.

Towards Automated Animal Density Estimation with Acoustic Spatial Capture-Recapture

  • paper_url: http://arxiv.org/abs/2308.12859
  • repo_url: None
  • paper_authors: Yuheng Wang, Juan Ye, David L. Borchers
  • for: Monitoring wildlife populations, especially species that are difficult to survey visually
  • methods: Uses machine learning to identify vocalisations and incorporates individual-level confidence measures for each vocalisation into capture-recapture inference
  • results: Estimates wildlife abundance more accurately than existing methods while accounting for the uncertainty of individual vocalisation identifications
    Abstract Passive acoustic monitoring can be an effective way of monitoring wildlife populations that are acoustically active but difficult to survey visually. Digital recorders allow surveyors to gather large volumes of data at low cost, but identifying target species vocalisations in these data is non-trivial. Machine learning (ML) methods are often used to do the identification. They can process large volumes of data quickly, but they do not detect all vocalisations and they do generate some false positives (vocalisations that are not from the target species). Existing wildlife abundance survey methods have been designed specifically to deal with the first of these mistakes, but current methods of dealing with false positives are not well-developed. They do not take account of features of individual vocalisations, some of which are more likely to be false positives than others. We propose three methods for acoustic spatial capture-recapture inference that integrate individual-level measures of confidence from ML vocalisation identification into the likelihood and hence integrate ML uncertainty into inference. The methods include a mixture model in which species identity is a latent variable. We test the methods by simulation and find that in a scenario based on acoustic data from Hainan gibbons, in which ignoring false positives results in 17% positive bias, our methods give negligible bias and coverage probabilities that are close to the nominal 95% level.

Fast Adversarial Training with Smooth Convergence

  • paper_url: http://arxiv.org/abs/2308.12857
  • repo_url: https://github.com/fat-cs/convergesmooth
  • paper_authors: Mengnan Zhao, Lihe Zhang, Yuqiu Kong, Baocai Yin
  • for: Improving the adversarial robustness of neural networks
  • methods: Proposes a novel oscillatory constraint (ConvergeSmooth) that keeps the loss convergence process stable and smooth, thereby avoiding catastrophic overfitting (see the sketch below)
  • results: Extensive experiments on popular datasets show that the proposed method efficiently avoids catastrophic overfitting and outperforms all previous FAT methods
    Abstract Fast adversarial training (FAT) is beneficial for improving the adversarial robustness of neural networks. However, previous FAT work has encountered a significant issue known as catastrophic overfitting when dealing with large perturbation budgets, \ie the adversarial robustness of models declines to near zero during training. To address this, we analyze the training process of prior FAT work and observe that catastrophic overfitting is accompanied by the appearance of loss convergence outliers. Therefore, we argue a moderately smooth loss convergence process will be a stable FAT process that solves catastrophic overfitting. To obtain a smooth loss convergence process, we propose a novel oscillatory constraint (dubbed ConvergeSmooth) to limit the loss difference between adjacent epochs. The convergence stride of ConvergeSmooth is introduced to balance convergence and smoothing. Likewise, we design weight centralization without introducing additional hyperparameters other than the loss balance coefficient. Our proposed methods are attack-agnostic and thus can improve the training stability of various FAT techniques. Extensive experiments on popular datasets show that the proposed methods efficiently avoid catastrophic overfitting and outperform all previous FAT methods. Code is available at \url{https://github.com/FAT-CS/ConvergeSmooth}.
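A simplified stand-in for the idea of bounding the loss difference between adjacent epochs: penalize jumps that exceed an allowed stride. The paper's exact balancing scheme, loss weights, and weight centralization are not reproduced here; this is only a sketch of the constraint's shape.

```python
def convergence_penalty(loss_now: float, loss_prev: float, stride: float, weight: float = 1.0) -> float:
    """Penalize epoch-to-epoch loss jumps larger than the allowed convergence stride."""
    excess = abs(loss_now - loss_prev) - stride
    return weight * max(excess, 0.0)

# Illustrative use inside a training loop (names are hypothetical):
# total_loss = adv_loss + convergence_penalty(float(adv_loss), prev_epoch_loss, stride=0.05)
```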

Probabilistic load forecasting with Reservoir Computing

  • paper_url: http://arxiv.org/abs/2308.12844
  • repo_url: https://github.com/MicheleUIT/Probabilistic-load-forecasting-with-Reservoir-Computing
  • paper_authors: Michele Guerra, Simone Scardapane, Filippo Maria Bianchi
  • for: Accurate electric-load forecasting with uncertainty quantification
  • methods: Uses reservoir computing as the core time-series forecasting method and evaluates the compatibility and effectiveness of uncertainty quantification approaches in this setting (see the sketch below)
  • results: Compares several Bayesian and deterministic uncertainty quantification methods on carefully chosen performance metrics to identify the most suitable approach
    Abstract Some applications of deep learning require not only to provide accurate results but also to quantify the amount of confidence in their prediction. The management of an electric power grid is one of these cases: to avoid risky scenarios, decision-makers need both precise and reliable forecasts of, for example, power loads. For this reason, point forecasts are not enough hence it is necessary to adopt methods that provide an uncertainty quantification. This work focuses on reservoir computing as the core time series forecasting method, due to its computational efficiency and effectiveness in predicting time series. While the RC literature mostly focused on point forecasting, this work explores the compatibility of some popular uncertainty quantification methods with the reservoir setting. Both Bayesian and deterministic approaches to uncertainty assessment are evaluated and compared in terms of their prediction accuracy, computational resource efficiency and reliability of the estimated uncertainty, based on a set of carefully chosen performance metrics.
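A minimal echo state network (reservoir computing) sketch in NumPy, illustrating the kind of forecaster the paper builds on. The hyperparameters are arbitrary, the readout is only mentioned in a comment, and the uncertainty quantification layer is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def run_reservoir(u, n_res=200, spectral_radius=0.9, leak=0.3):
    """Drive a fixed random reservoir with a 1-D input series u and return its states."""
    w_in = rng.uniform(-0.5, 0.5, (n_res, 1))
    w = rng.uniform(-0.5, 0.5, (n_res, n_res))
    w *= spectral_radius / np.abs(np.linalg.eigvals(w)).max()  # rescale spectral radius
    x = np.zeros(n_res)
    states = []
    for u_t in u:
        pre = np.tanh(w_in @ np.array([u_t]) + w @ x)
        x = (1 - leak) * x + leak * pre                        # leaky state update
        states.append(x.copy())
    return np.array(states)

# A linear readout (e.g. ridge regression from states to the next-step load) completes the forecaster.
```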

Actuator Trajectory Planning for UAVs with Overhead Manipulator using Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2308.12843
  • repo_url: None
  • paper_authors: Hazim Alzorgan, Abolfazl Razi, Ata Jahangir Moshayedi
  • for: An aerial manipulator system, i.e. a UAV equipped with a controllable two-degree-of-freedom arm, to carry out actuation tasks on the fly
  • methods: A Q-learning scheme controls the trajectory of the manipulator's end-effector, and a Time-To-Collision (TTC) based motion-planning model lets the quadrotor UAV navigate around obstacles while ensuring the manipulator's reachability (see the sketch below)
  • results: Enables actuation tasks such as high-altitude welding, structural monitoring and repair, battery replacement, gutter cleaning, skyscraper cleaning, and power-line maintenance in hard-to-reach and risky environments while retaining compatibility with flight-control firmware; the RL-based controller handles uncertainties in the UAV's motion and achieves 92% accuracy in terms of average displacement error (the mean distance between target and obtained trajectory points) using Q-learning with 15,000 episodes
    Abstract In this paper, we investigate the operation of an aerial manipulator system, namely an Unmanned Aerial Vehicle (UAV) equipped with a controllable arm with two degrees of freedom to carry out actuation tasks on the fly. Our solution is based on employing a Q-learning method to control the trajectory of the tip of the arm, also called \textit{end-effector}. More specifically, we develop a motion planning model based on Time To Collision (TTC), which enables a quadrotor UAV to navigate around obstacles while ensuring the manipulator's reachability. Additionally, we utilize a model-based Q-learning model to independently track and control the desired trajectory of the manipulator's end-effector, given an arbitrary baseline trajectory for the UAV platform. Such a combination enables a variety of actuation tasks such as high-altitude welding, structural monitoring and repair, battery replacement, gutter cleaning, sky scrapper cleaning, and power line maintenance in hard-to-reach and risky environments while retaining compatibility with flight control firmware. Our RL-based control mechanism results in a robust control strategy that can handle uncertainties in the motion of the UAV, offering promising performance. Specifically, our method achieves 92\% accuracy in terms of average displacement error (i.e. the mean distance between the target and obtained trajectory points) using Q-learning with 15,000 episodes
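A standard tabular Q-learning update, shown only to make the control scheme concrete; the paper uses a model-based variant with a continuous end-effector state, so the discrete table and sizes below are purely illustrative.

```python
import numpy as np

n_states, n_actions = 100, 4
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.1, 0.95, 0.1
rng = np.random.default_rng(0)

def q_update(s, a, r, s_next, done):
    """One Q-learning step: move Q(s, a) toward the bootstrapped target."""
    target = r if done else r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (target - Q[s, a])

def act(s):
    """Epsilon-greedy action selection over the current Q-table."""
    return int(rng.integers(n_actions)) if rng.random() < eps else int(Q[s].argmax())
```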

Short Run Transit Route Planning Decision Support System Using a Deep Learning-Based Weighted Graph

  • paper_url: http://arxiv.org/abs/2308.12828
  • repo_url: None
  • paper_authors: Nadav Shalit, Michael Fire, Dima Kagan, Eran Ben-Elia
  • for: Improving the efficiency and reliability of public transport services
  • methods: A deep learning approach that leverages diverse data sources (e.g. GTFS and smart card data) to predict segment lateness, which is then used as edge weights in a transportation graph to rapidly identify short-term route improvements (see the sketch below)
  • results: Evaluated on Tel Aviv, the method reduces travel times on more than 9% of the routes, covering both intraurban and suburban routes and demonstrating the model's versatility
    Abstract Public transport routing plays a crucial role in transit network design, ensuring a satisfactory level of service for passengers. However, current routing solutions rely on traditional operational research heuristics, which can be time-consuming to implement and lack the ability to provide quick solutions. Here, we propose a novel deep learning-based methodology for a decision support system that enables public transport (PT) planners to identify short-term route improvements rapidly. By seamlessly adjusting specific sections of routes between two stops during specific times of the day, our method effectively reduces times and enhances PT services. Leveraging diverse data sources such as GTFS and smart card data, we extract features and model the transportation network as a directed graph. Using self-supervision, we train a deep learning model for predicting lateness values for road segments. These lateness values are then utilized as edge weights in the transportation graph, enabling efficient path searching. Through evaluating the method on Tel Aviv, we are able to reduce times on more than 9\% of the routes. The improved routes included both intraurban and suburban routes showcasing a fact highlighting the model's versatility. The findings emphasize the potential of our data-driven decision support system to enhance public transport and city logistics, promoting greater efficiency and reliability in PT services.
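A sketch of the graph-search step described above: predicted lateness values become edge weights of a directed stop graph, and an off-the-shelf shortest-path query proposes an improved route section. The `predict_lateness` function and the stop names are hypothetical placeholders for the trained model and real network.

```python
import networkx as nx

def predict_lateness(u: str, v: str) -> float:
    """Placeholder for the deep learning model's lateness prediction for segment (u, v)."""
    return abs(hash((u, v))) % 10 / 10.0

stops = ["A", "B", "C", "D"]
G = nx.DiGraph()
for u in stops:
    for v in stops:
        if u != v:
            G.add_edge(u, v, weight=predict_lateness(u, v))

# Best in-between section between two fixed stops given predicted lateness.
print(nx.shortest_path(G, source="A", target="D", weight="weight"))
```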

Prediction without Preclusion: Recourse Verification with Reachable Sets

  • paper_url: http://arxiv.org/abs/2308.12820
  • repo_url: None
  • paper_authors: Avni Kothari, Bogdan Kulynych, Tsui-Wei Weng, Berk Ustun
  • for: Proposes a formal testing procedure, recourse verification, to flag machine learning models that assign fixed predictions, i.e. predictions that decision subjects cannot change through any feasible action
  • methods: Develops machinery to reliably determine whether a given model can provide recourse to its decision subjects under a set of user-specified actionability constraints (see the sketch below)
  • results: A study on real-world lending datasets shows that models can inadvertently assign fixed predictions that permanently bar access, and the paper provides tools to design algorithms that account for actionability
    Abstract Machine learning models are often used to decide who will receive a loan, a job interview, or a public benefit. Standard techniques to build these models use features about people but overlook their actionability. In turn, models can assign predictions that are fixed, meaning that consumers who are denied loans, interviews, or benefits may be permanently locked out from access to credit, employment, or assistance. In this work, we introduce a formal testing procedure to flag models that assign fixed predictions that we call recourse verification. We develop machinery to reliably determine if a given model can provide recourse to its decision subjects from a set of user-specified actionability constraints. We demonstrate how our tools can ensure recourse and adversarial robustness in real-world datasets and use them to study the infeasibility of recourse in real-world lending datasets. Our results highlight how models can inadvertently assign fixed predictions that permanently bar access, and we provide tools to design algorithms that account for actionability when developing models.
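A brute-force sketch of the verification idea: enumerate every feature vector reachable under simple actionability constraints and check whether any of them flips the model's decision. The features, constraints, and classifier are toy assumptions; the paper's reachable-set machinery is far more general.

```python
from itertools import product

def predict(x):
    """Toy linear classifier standing in for the audited model (1 = approve)."""
    return 1 if 2 * x["income"] + x["savings"] - x["debt"] >= 5 else 0

def has_recourse(x, actionable_deltas):
    """True if some combination of allowed feature changes flips a denial to an approval."""
    keys = list(actionable_deltas)
    for deltas in product(*(actionable_deltas[k] for k in keys)):
        candidate = dict(x)
        for k, d in zip(keys, deltas):
            candidate[k] = x[k] + d
        if predict(candidate) == 1:
            return True
    return False

applicant = {"income": 1, "savings": 1, "debt": 2}
constraints = {"income": [0, 1, 2], "savings": [0, 1]}   # debt is treated as non-actionable
print(predict(applicant), has_recourse(applicant, constraints))
```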

Job Shop Scheduling Benchmark: Environments and Instances for Learning and Non-learning Methods

  • paper_url: http://arxiv.org/abs/2308.12794
  • repo_url: https://github.com/ai-for-decision-making-tue/job_shop_scheduling_benchmark
  • paper_authors: Robbert Reijnen, Kjell van Straaten, Zaharah Bukhsh, Yingqian Zhang
  • for: Providing a centralized hub for researchers, practitioners, and enthusiasts interested in tackling machine scheduling problems
  • methods: Benchmarks covering a wide range of machine scheduling problems, including Job Shop Scheduling (JSP), Flow Shop Scheduling (FSP), Flexible Job Shop Scheduling (FJSP), FJSP with Assembly constraints (FAJSP), FJSP with Sequence-Dependent Setup Times (FJSP-SDST), and the online FJSP
  • results: An open-source GitHub repository containing comprehensive benchmark environments and instances for learning and non-learning methods
    Abstract We introduce an open-source GitHub repository containing comprehensive benchmarks for a wide range of machine scheduling problems, including Job Shop Scheduling (JSP), Flow Shop Scheduling (FSP), Flexible Job Shop Scheduling (FJSP), FJSP with Assembly constraints (FAJSP), FJSP with Sequence-Dependent Setup Times (FJSP-SDST), and the online FJSP (with online job arrivals). Our primary goal is to provide a centralized hub for researchers, practitioners, and enthusiasts interested in tackling machine scheduling challenges.

Single-shot Bayesian approximation for neural networks

  • paper_url: http://arxiv.org/abs/2308.12785
  • repo_url: https://github.com/kaibrach/Moment-Propagation
  • paper_authors: Kai Brach, Beate Sick, Oliver Dürr
  • for: Developing a single-shot Monte Carlo (MC) dropout approximation for Bayesian neural networks (BNNs) that provides uncertainty measures and high prediction performance while being as fast as standard neural networks (NNs)
  • methods: Based on moment propagation (MP), the approach analytically approximates the expected value and variance of the MC dropout signal for commonly used layers; it converts a dropout-trained NN into a BNN for uncertainty estimation without re-training (see the sketch below)
  • results: The single-shot MC dropout approximation resembles the point estimate and uncertainty estimate of the MC predictive distribution while being fast enough for real-time deployment, and combining the MP approach with deep ensembles further improves the uncertainty measures
    Abstract Deep neural networks (NNs) are known for their high-prediction performances. However, NNs are prone to yield unreliable predictions when encountering completely new situations without indicating their uncertainty. Bayesian variants of NNs (BNNs), such as Monte Carlo (MC) dropout BNNs, do provide uncertainty measures and simultaneously increase the prediction performance. The only disadvantage of BNNs is their higher computation time during test time because they rely on a sampling approach. Here we present a single-shot MC dropout approximation that preserves the advantages of BNNs while being as fast as NNs. Our approach is based on moment propagation (MP) and allows to analytically approximate the expected value and the variance of the MC dropout signal for commonly used layers in NNs, i.e. convolution, max pooling, dense, softmax, and dropout layers. The MP approach can convert an NN into a BNN without re-training given the NN has been trained with standard dropout. We evaluate our approach on different benchmark datasets and a simulated toy example in a classification and regression setting. We demonstrate that our single-shot MC dropout approximation resembles the point estimate and the uncertainty estimate of the predictive distribution that is achieved with an MC approach, while being fast enough for real-time deployments of BNNs. We show that using part of the saved time to combine our MP approach with deep ensemble techniques does further improve the uncertainty measures.
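A minimal moment propagation sketch for two common layers, assuming independent inputs and an inverted-dropout mask. Layer types, shapes, and the independence assumptions are simplifications relative to the paper.

```python
import numpy as np

def mp_dense(mean, var, W, b):
    """Propagate mean and variance through a dense layer, assuming independent inputs."""
    return W @ mean + b, (W ** 2) @ var

def mp_dropout(mean, var, p):
    """Propagate moments through inverted MC dropout with drop rate p:
    out = x * Bernoulli(1 - p) / (1 - p), with the mask independent of the input."""
    q = 1.0 - p
    return mean, (var + mean ** 2) / q - mean ** 2
```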

Intentionally-underestimated Value Function at Terminal State for Temporal-difference Learning with Mis-designed Reward

  • paper_url: http://arxiv.org/abs/2308.12772
  • repo_url: None
  • paper_authors: Taisuke Kobayashi
  • for: Addressing unintentional overestimation at episode termination in reinforcement learning, specifically in temporal-difference (TD) learning with mis-designed rewards
  • methods: Intentionally underestimates the value after termination to avoid learning failures caused by unintentional overestimation, with the degree of underestimation adjusted according to the degree of stationarity at termination to prevent excessive exploration
  • results: Simulations and real robot experiments show that the proposed method stably obtains the optimal policies for various tasks and reward designs
    Abstract Robot control using reinforcement learning has become popular, but its learning process generally terminates halfway through an episode for safety and time-saving reasons. This study addresses the problem of the most popular exception handling that temporal-difference (TD) learning performs at such termination. That is, by forcibly assuming zero value after termination, unintentionally implicit underestimation or overestimation occurs, depending on the reward design in the normal states. When the episode is terminated due to task failure, the failure may be highly valued with the unintentional overestimation, and the wrong policy may be acquired. Although this problem can be avoided by paying attention to the reward design, it is essential in practical use of TD learning to review the exception handling at termination. This paper therefore proposes a method to intentionally underestimate the value after termination to avoid learning failures due to the unintentional overestimation. In addition, the degree of underestimation is adjusted according to the degree of stationarity at termination, thereby preventing excessive exploration due to the intentional underestimation. Simulations and real robot experiments showed that the proposed method can stably obtain the optimal policies for various tasks and reward designs. https://youtu.be/AxXr8uFOe7M

On the Consistency of Average Embeddings for Item Recommendation

  • paper_url: http://arxiv.org/abs/2308.12767
  • repo_url: https://github.com/deezer/consistency
  • paper_authors: Walid Bendada, Guillaume Salha-Galvan, Romain Hennequin, Thomas Bouabça, Tristan Cazenave
  • for: Investigating whether the common practice of averaging item embeddings to represent users or higher-level concepts is justified
  • methods: Proposes an expected precision score that measures the consistency of an average embedding relative to the items used for its construction (see the sketch below)
  • results: Experiments on music streaming data show that real-world average embeddings are less consistent for recommendation, laying the groundwork for future research to better align real-world embeddings with theoretical assumptions
    Abstract A prevalent practice in recommender systems consists of averaging item embeddings to represent users or higher-level concepts in the same embedding space. This paper investigates the relevance of such a practice. For this purpose, we propose an expected precision score, designed to measure the consistency of an average embedding relative to the items used for its construction. We subsequently analyze the mathematical expression of this score in a theoretical setting with specific assumptions, as well as its empirical behavior on real-world data from music streaming services. Our results emphasize that real-world averages are less consistent for recommendation, which paves the way for future research to better align real-world embeddings with assumptions from our theoretical setting.
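A small sketch of one way to probe the consistency the paper studies: average a set of item embeddings and check what fraction of the average's nearest neighbours come from that set. This empirical precision is only an illustration, not the paper's exact expected precision score.

```python
import numpy as np

def avg_embedding_precision(catalog, member_ids, k=10):
    """Fraction of the average embedding's top-k cosine neighbours that belong to the averaged set.
    catalog: (n, d) item embeddings; member_ids: indices of the items being averaged."""
    centroid = catalog[member_ids].mean(axis=0)
    sims = catalog @ centroid / (
        np.linalg.norm(catalog, axis=1) * np.linalg.norm(centroid) + 1e-12
    )
    top_k = np.argsort(-sims)[:k]
    return len(set(top_k.tolist()) & set(member_ids)) / k
```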

IP-UNet: Intensity Projection UNet Architecture for 3D Medical Volume Segmentation

  • paper_url: http://arxiv.org/abs/2308.12761
  • repo_url: None
  • paper_authors: Nyothiri Aung, Tahar Kechadi, Liming Chen, Sahraoui Dhelim
  • for: A deep learning method for multi-class segmentation that can be trained with limited memory capacity without losing the resolution of the original 3D images
  • methods: IP-UNet, a UNet-based model that performs segmentation on Intensity Projections (IP) of the 3D volumetric data, merging Maximum, Closest Vessel, and Average Intensity Projections, instead of operating on the memory-consuming 3D volumes (see the sketch below)
  • results: Experiments show that IP-UNet achieves segmentation accuracy similar to 3D-UNet with much better performance, reducing training time by 70% and memory consumption by 92%
    Abstract CNNs have been widely applied for medical image analysis. However, limited memory capacity is one of the most common drawbacks of processing high-resolution 3D volumetric data. 3D volumes are usually cropped or downsized first before processing, which can result in a loss of resolution, increase class imbalance, and affect the performance of the segmentation algorithms. In this paper, we propose an end-to-end deep learning approach called IP-UNet. IP-UNet is a UNet-based model that performs multi-class segmentation on Intensity Projection (IP) of 3D volumetric data instead of the memory-consuming 3D volumes. IP-UNet uses limited memory capability for training without losing the original 3D image resolution. We compare the performance of three models in terms of segmentation accuracy and computational cost: 1) Slice-by-slice 2D segmentation of the CT scan images using a conventional 2D UNet model. 2) IP-UNet that operates on data obtained by merging the extracted Maximum Intensity Projection (MIP), Closest Vessel Projection (CVP), and Average Intensity Projection (AvgIP) representations of the source 3D volumes, then applying the UNet model on the output IP images. 3) 3D-UNet model directly reads the 3D volumes constructed from a series of CT scan images and outputs the 3D volume of the predicted segmentation. We test the performance of these methods on 3D volumetric images for automatic breast calcification detection. Experimental results show that IP-Unet can achieve similar segmentation accuracy with 3D-Unet but with much better performance. It reduces the training time by 70\% and memory consumption by 92\%.
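A NumPy sketch of the projection preprocessing the model operates on: maximum and average intensity projections of a 3D volume collapsed along one axis. The closest-vessel projection and the UNet itself are omitted, and the axis choice is an assumption.

```python
import numpy as np

def intensity_projections(volume: np.ndarray, axis: int = 0):
    """Collapse a 3D CT volume (z, y, x) into 2D projection images."""
    mip = volume.max(axis=axis)     # Maximum Intensity Projection
    avg = volume.mean(axis=axis)    # Average Intensity Projection
    return mip, avg

vol = np.random.rand(64, 256, 256)           # stand-in for a stacked CT scan
mip, avg = intensity_projections(vol)
print(mip.shape, avg.shape)                  # (256, 256) (256, 256)
```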

Motion In-Betweening with Phase Manifolds

  • paper_url: http://arxiv.org/abs/2308.12751
  • repo_url: https://github.com/pauzii/phasebetweener
  • paper_authors: Paul Starke, Sebastian Starke, Taku Komura, Frank Steinicke
  • for: A data-driven motion in-betweening system that generates character animation toward target poses
  • methods: Phase variables learned by a Periodic Autoencoder drive a mixture-of-experts neural network that autoregressively produces pose sequences between the current and target state of the character; a learned bi-directional control scheme satisfies constraints such as manually modified poses or end-effector targets
  • results: Using phases sharpens the interpolated movements and stabilizes learning, even for long transition durations; the system can synthesize movements beyond locomotion behaviors, enables style control between target keyframes, and is competitive with state-of-the-art motion in-betweening methods in motion quality and generalization
    Abstract This paper introduces a novel data-driven motion in-betweening system to reach target poses of characters by making use of phases variables learned by a Periodic Autoencoder. Our approach utilizes a mixture-of-experts neural network model, in which the phases cluster movements in both space and time with different expert weights. Each generated set of weights then produces a sequence of poses in an autoregressive manner between the current and target state of the character. In addition, to satisfy poses which are manually modified by the animators or where certain end effectors serve as constraints to be reached by the animation, a learned bi-directional control scheme is implemented to satisfy such constraints. The results demonstrate that using phases for motion in-betweening tasks sharpen the interpolated movements, and furthermore stabilizes the learning process. Moreover, using phases for motion in-betweening tasks can also synthesize more challenging movements beyond locomotion behaviors. Additionally, style control is enabled between given target keyframes. Our proposed framework can compete with popular state-of-the-art methods for motion in-betweening in terms of motion quality and generalization, especially in the existence of long transition durations. Our framework contributes to faster prototyping workflows for creating animated character sequences, which is of enormous interest for the game and film industry.

Human Comprehensible Active Learning of Genome-Scale Metabolic Networks

  • paper_url: http://arxiv.org/abs/2308.12740
  • repo_url: None
  • paper_authors: Lun Ai, Shi-Shun Liang, Wang-Zhou Dai, Liam Hallett, Stephen H. Muggleton, Geoff S. Baldwin
  • for: engineering of host cell systems to yield useful products
  • methods: Inductive Logic Programming (ILP) and active learning from training examples
  • results: high-throughput simulations and reduced experimental cost of learning gene functions compared to randomly selected experiments
    Abstract An important application of Synthetic Biology is the engineering of the host cell system to yield useful products. However, an increase in the scale of the host system leads to huge design space and requires a large number of validation trials with high experimental costs. A comprehensible machine learning approach that efficiently explores the hypothesis space and guides experimental design is urgently needed for the Design-Build-Test-Learn (DBTL) cycle of the host cell system. We introduce a novel machine learning framework ILP-iML1515 based on Inductive Logic Programming (ILP) that performs abductive logical reasoning and actively learns from training examples. In contrast to numerical models, ILP-iML1515 is built on comprehensible logical representations of a genome-scale metabolic model and can update the model by learning new logical structures from auxotrophic mutant trials. The ILP-iML1515 framework 1) allows high-throughput simulations and 2) actively selects experiments that reduce the experimental cost of learning gene functions in comparison to randomly selected experiments.

Real-time Detection of AI-Generated Speech for DeepFake Voice Conversion

  • paper_url: http://arxiv.org/abs/2308.12734
  • repo_url: None
  • paper_authors: Jordan J. Bird, Ahmad Lotfi
  • for: Detecting AI-generated speech in real time to guard against DeepFake voice conversion
  • methods: Generates the DEEP-VOICE dataset with Retrieval-based Voice Conversion and frames detection as binary classification, analysing temporal acoustic features with t-tests (see the sketch below)
  • results: After 10-fold cross-validation of 208 individual machine learning models, the Extreme Gradient Boosting model achieves an average classification accuracy of 99.3% and classifies one second of speech in around 0.004 milliseconds
    Abstract There are growing implications surrounding generative AI in the speech domain that enable voice cloning and real-time voice conversion from one individual to another. This technology poses a significant ethical threat and could lead to breaches of privacy and misrepresentation, thus there is an urgent need for real-time detection of AI-generated speech for DeepFake Voice Conversion. To address the above emerging issues, the DEEP-VOICE dataset is generated in this study, comprised of real human speech from eight well-known figures and their speech converted to one another using Retrieval-based Voice Conversion. Presenting as a binary classification problem of whether the speech is real or AI-generated, statistical analysis of temporal audio features through t-testing reveals that there are significantly different distributions. Hyperparameter optimisation is implemented for machine learning models to identify the source of speech. Following the training of 208 individual machine learning models over 10-fold cross validation, it is found that the Extreme Gradient Boosting model can achieve an average classification accuracy of 99.3% and can classify speech in real-time, at around 0.004 milliseconds given one second of speech. All data generated for this study is released publicly for future research on AI speech detection.
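A sketch of a feature-based detector in the spirit of the paper: summary statistics of short-time audio features feed a gradient-boosted classifier. The MFCC features, sample rate, and labels are stand-ins; the paper's exact temporal feature set is not reproduced.

```python
import numpy as np
import librosa
from xgboost import XGBClassifier

def featurize(path: str) -> np.ndarray:
    """Mean and std of MFCCs over a clip (a simple stand-in for temporal acoustic features)."""
    y, sr = librosa.load(path, sr=16000, mono=True)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

# X: stacked feature vectors, y: 1 = AI-generated, 0 = real (hypothetical data)
# clf = XGBClassifier(n_estimators=200).fit(X, y)
# clf.predict(featurize("clip.wav").reshape(1, -1))
```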

Out of the Box Thinking: Improving Customer Lifetime Value Modelling via Expert Routing and Game Whale Detection

  • paper_url: http://arxiv.org/abs/2308.12729
  • repo_url: None
  • paper_authors: Shijie Zhang, Xin Yan, Xuejiao Yang, Binfeng Jia, Shuangyang Wang
  • for: Proposes a unified multi-task framework that performs customer lifetime value (LTV) prediction and game whale detection simultaneously
  • methods: A deep neural network-based game whale detector infers the intrinsic order in monetary value and precisely separates high spenders (game whales) from low spenders; the detector then acts as a gating network that decides the mixture patterns of LTV experts, so shared and scenario-specific information (game whale modelling and low spender modelling) are both exploited, while a shared purchase-rate estimator preserves the inner task relationships
  • results: Experiments on three industrial datasets show that ExpLTV outperforms state-of-the-art baselines, helping game publishers optimize the advertising investment for each user acquisition and improving the accuracy of LTV prediction
    Abstract Customer lifetime value (LTV) prediction is essential for mobile game publishers trying to optimize the advertising investment for each user acquisition based on the estimated worth. In mobile games, deploying microtransactions is a simple yet effective monetization strategy, which attracts a tiny group of game whales who splurge on in-game purchases. The presence of such game whales may impede the practicality of existing LTV prediction models, since game whales' purchase behaviours always exhibit varied distribution from general users. Consequently, identifying game whales can open up new opportunities to improve the accuracy of LTV prediction models. However, little attention has been paid to applying game whale detection in LTV prediction, and existing works are mainly specialized for the long-term LTV prediction with the assumption that the high-quality user features are available, which is not applicable in the UA stage. In this paper, we propose ExpLTV, a novel multi-task framework to perform LTV prediction and game whale detection in a unified way. In ExpLTV, we first innovatively design a deep neural network-based game whale detector that can not only infer the intrinsic order in accordance with monetary value, but also precisely identify high spenders (i.e., game whales) and low spenders. Then, by treating the game whale detector as a gating network to decide the different mixture patterns of LTV experts assembling, we can thoroughly leverage the shared information and scenario-specific information (i.e., game whales modelling and low spenders modelling). Finally, instead of separately designing a purchase rate estimator for two tasks, we design a shared estimator that can preserve the inner task relationships. The superiority of ExpLTV is further validated via extensive experiments on three industrial datasets.

Continuous Reinforcement Learning-based Dynamic Difficulty Adjustment in a Visual Working Memory Game

  • paper_url: http://arxiv.org/abs/2308.12726
  • repo_url: None
  • paper_authors: Masoud Rahimi, Hadi Moradi, Abdol-hossein Vahabie, Hamed Kebriaei
  • for: Enhancing the player's game experience
  • methods: A continuous reinforcement learning (RL) approach that adjusts the difficulty of a visual working memory game based on the player's score and the difficulty of the last trial
  • results: Compared with two rule-based difficulty adjustment methods, the RL-based approach yields a significantly better game experience (competence, tension, and negative and positive affect), higher scores and win rates, and a significantly smaller score decline over a 20-trial session
    Abstract Dynamic Difficulty Adjustment (DDA) is a viable approach to enhance a player's experience in video games. Recently, Reinforcement Learning (RL) methods have been employed for DDA in non-competitive games; nevertheless, they rely solely on discrete state-action space with a small search space. In this paper, we propose a continuous RL-based DDA methodology for a visual working memory (VWM) game to handle the complex search space for the difficulty of memorization. The proposed RL-based DDA tailors game difficulty based on the player's score and game difficulty in the last trial. We defined a continuous metric for the difficulty of memorization. Then, we consider the task difficulty and the vector of difficulty-score as the RL's action and state, respectively. We evaluated the proposed method through a within-subject experiment involving 52 subjects. The proposed approach was compared with two rule-based difficulty adjustment methods in terms of player's score and game experience measured by a questionnaire. The proposed RL-based approach resulted in a significantly better game experience in terms of competence, tension, and negative and positive affect. Players also achieved higher scores and win rates. Furthermore, the proposed RL-based DDA led to a significantly less decline in the score in a 20-trial session.

Solving Forward and Inverse Problems of Contact Mechanics using Physics-Informed Neural Networks

  • paper_url: http://arxiv.org/abs/2308.12716
  • repo_url: None
  • paper_authors: T. Sahin, M. von Danwitz, A. Popp
  • for: Solving forward and inverse problems of contact mechanics for small-deformation elasticity with physics-informed neural networks (PINNs)
  • methods: A mixed-variable formulation enhanced by output transformation enforces Dirichlet and Neumann boundary conditions as hard constraints, while the inequality (Karush-Kuhn-Tucker) conditions of contact are enforced as soft constraints in the loss, e.g. via the Fischer-Burmeister complementarity function (see the sketch below)
  • results: PINNs can serve as pure PDE solvers, data-enhanced forward models, inverse solvers for parameter identification, and fast-to-evaluate surrogate models
    Abstract This paper explores the ability of physics-informed neural networks (PINNs) to solve forward and inverse problems of contact mechanics for small deformation elasticity. We deploy PINNs in a mixed-variable formulation enhanced by output transformation to enforce Dirichlet and Neumann boundary conditions as hard constraints. Inequality constraints of contact problems, namely Karush-Kuhn-Tucker (KKT) type conditions, are enforced as soft constraints by incorporating them into the loss function during network training. To formulate the loss function contribution of KKT constraints, existing approaches applied to elastoplasticity problems are investigated and we explore a nonlinear complementarity problem (NCP) function, namely Fischer-Burmeister, which possesses advantageous characteristics in terms of optimization. Based on the Hertzian contact problem, we show that PINNs can serve as pure partial differential equation (PDE) solver, as data-enhanced forward model, as inverse solver for parameter identification, and as fast-to-evaluate surrogate model. Furthermore, we demonstrate the importance of choosing proper hyperparameters, e.g. loss weights, and a combination of Adam and L-BFGS-B optimizers aiming for better results in terms of accuracy and training time.
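The Fischer-Burmeister function mentioned in the abstract turns the complementarity conditions a ≥ 0, b ≥ 0, ab = 0 into a single residual that vanishes exactly when they hold, which makes it convenient as a soft-constraint loss term. A minimal sketch, assuming the contact gap and pressure are already available as tensors (sign conventions may differ from the paper):

```python
import torch

def fischer_burmeister(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """phi(a, b) = a + b - sqrt(a^2 + b^2); zero iff a >= 0, b >= 0 and a*b = 0.
    The small epsilon keeps the square root differentiable at the origin."""
    return a + b - torch.sqrt(a ** 2 + b ** 2 + 1e-12)

def kkt_loss(gap: torch.Tensor, pressure: torch.Tensor) -> torch.Tensor:
    """Soft-constraint loss for the KKT-type contact conditions."""
    return fischer_burmeister(gap, pressure).pow(2).mean()
```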

Disentanglement Learning via Topology

  • paper_url: http://arxiv.org/abs/2308.12696
  • repo_url: None
  • paper_authors: Nikita Balabin, Daria Voronkova, Ilya Trofimov, Evgeny Burnaev, Serguei Barannikov
  • for: Learning disentangled representations, a crucial property for the explainability and robustness of deep learning models and a step towards high-level cognition
  • methods: Adds a multi-scale topological loss term that optimizes the topological similarity of data-manifold traversals
  • results: The proposed topological loss improves disentanglement metrics (MIG, FactorVAE score, SAP score, and DCI disentanglement score) over state-of-the-art results; the method works in an unsupervised manner, so it applies to problems without labeled factors of variation, and the loss can also be used to find disentangled directions in a trained GAN
    Abstract We propose TopDis (Topological Disentanglement), a method for learning disentangled representations via adding multi-scale topological loss term. Disentanglement is a crucial property of data representations substantial for the explainability and robustness of deep learning models and a step towards high-level cognition. The state-of-the-art method based on VAE minimizes the total correlation of the joint distribution of latent variables. We take a different perspective on disentanglement by analyzing topological properties of data manifolds. In particular, we optimize the topological similarity for data manifolds traversals. To the best of our knowledge, our paper is the first one to propose a differentiable topological loss for disentanglement. Our experiments have shown that the proposed topological loss improves disentanglement scores such as MIG, FactorVAE score, SAP score and DCI disentanglement score with respect to state-of-the-art results. Our method works in an unsupervised manner, permitting to apply it for problems without labeled factors of variation. Additionally, we show how to use the proposed topological loss to find disentangled directions in a trained GAN.

An Efficient Data Analysis Method for Big Data using Multiple-Model Linear Regression

  • paper_url: http://arxiv.org/abs/2308.12691
  • repo_url: https://github.com/Aryia-Behroziuan/neurons
  • paper_authors: Bohan Lyu, Jianzhong Li
  • for: A new data analysis method for big data based on a newly defined multiple-model linear regression (MMLR) model, which splits input datasets into subsets and builds local linear regression models on them
  • methods: An approximate algorithm constructs MMLR models based on an (ε, δ)-estimator, with mathematical proofs of the correctness and efficiency of the MMLR algorithm, whose time complexity is linear in the size of the input datasets (see the sketch below)
  • results: Experiments on synthetic and real-world datasets show that the MMLR algorithm performs comparably to existing regression methods in many cases while taking almost the shortest time to reach high prediction accuracy
    Abstract This paper introduces a new data analysis method for big data using a newly defined regression model named multiple model linear regression(MMLR), which separates input datasets into subsets and construct local linear regression models of them. The proposed data analysis method is shown to be more efficient and flexible than other regression based methods. This paper also proposes an approximate algorithm to construct MMLR models based on $(\epsilon,\delta)$-estimator, and gives mathematical proofs of the correctness and efficiency of MMLR algorithm, of which the time complexity is linear with respect to the size of input datasets. This paper also empirically implements the method on both synthetic and real-world datasets, the algorithm shows to have comparable performance to existing regression methods in many cases, while it takes almost the shortest time to provide a high prediction accuracy.
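A toy sketch of the multiple-model idea: partition the input space and fit one linear regression per partition. The k-means partitioning here is an assumption for illustration; the paper's (ε, δ)-estimator based construction works differently.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression

def fit_mmlr(X, y, n_models=3, seed=0):
    """Fit one local linear model per cluster of the input space (illustrative only)."""
    km = KMeans(n_clusters=n_models, n_init=10, random_state=seed).fit(X)
    models = {c: LinearRegression().fit(X[km.labels_ == c], y[km.labels_ == c])
              for c in range(n_models)}
    return km, models

def predict_mmlr(km, models, X):
    """Route each sample to its cluster's local linear model."""
    labels = km.predict(X)
    return np.array([models[int(c)].predict(x.reshape(1, -1))[0] for c, x in zip(labels, X)])
```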

Match-And-Deform: Time Series Domain Adaptation through Optimal Transport and Temporal Alignment

  • paper_url: http://arxiv.org/abs/2308.12686
  • repo_url: https://github.com/rtavenar/MatchAndDeform
  • paper_authors: François Painblanc, Laetitia Chapel, Nicolas Courty, Chloé Friguet, Charlotte Pelletier, Romain Tavenard
  • for: Exploiting labels from a source domain to classify data from a related but different target domain, while handling the temporal shifts that arise in time series
  • methods: Match-And-Deform (MAD) finds correspondences between source and target time series while allowing temporal distortions, jointly aligning the series with an optimal transport loss and the timestamps through dynamic time warping (see the sketch below)
  • results: Experiments on benchmark datasets and remote sensing data show meaningful sample-to-sample pairing and time-shift estimation, reaching similar or better classification performance than state-of-the-art deep time-series domain adaptation strategies
    Abstract While large volumes of unlabeled data are usually available, associated labels are often scarce. The unsupervised domain adaptation problem aims at exploiting labels from a source domain to classify data from a related, yet different, target domain. When time series are at stake, new difficulties arise as temporal shifts may appear in addition to the standard feature distribution shift. In this paper, we introduce the Match-And-Deform (MAD) approach that aims at finding correspondences between the source and target time series while allowing temporal distortions. The associated optimization problem simultaneously aligns the series thanks to an optimal transport loss and the time stamps through dynamic time warping. When embedded into a deep neural network, MAD helps learning new representations of time series that both align the domains and maximize the discriminative power of the network. Empirical studies on benchmark datasets and remote sensing data demonstrate that MAD makes meaningful sample-to-sample pairing and time shift estimation, reaching similar or better classification performance than state-of-the-art deep time series domain adaptation strategies.
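A minimal dynamic time warping (DTW) sketch, the temporal-alignment building block named above; the optimal transport matching and the joint deep training are not shown.

```python
import numpy as np

def dtw(a: np.ndarray, b: np.ndarray) -> float:
    """Classic O(len(a) * len(b)) DTW distance between two 1-D series."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

print(dtw(np.array([0., 1., 2., 3.]), np.array([0., 0., 1., 2., 3.])))
```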

LR-XFL: Logical Reasoning-based Explainable Federated Learning

  • paper_url: http://arxiv.org/abs/2308.12681
  • repo_url: None
  • paper_authors: Yanci Zhang, Han Yu
  • for: Preserving data privacy in federated learning while improving the transparency and explainability of the global model
  • methods: Logical Reasoning-based eXplainable Federated Learning (LR-XFL): clients create local logic rules from their local data and send them, along with model updates, to the server, which connects the rules through a logical connector derived from properties of the client data (without accessing raw data) and aggregates model updates weighted by the quality of each client's rules
  • results: LR-XFL outperforms the most relevant baseline by 1.19%, 5.81%, and 5.41% in classification accuracy, rule accuracy, and rule fidelity, respectively; explicit rule evaluation lets human experts validate and correct rules on the server side, improving the robustness and transparency of the global FL model for domains such as healthcare and finance where both data privacy and explainability are important
    Abstract Federated learning (FL) is an emerging approach for training machine learning models collaboratively while preserving data privacy. The need for privacy protection makes it difficult for FL models to achieve global transparency and explainability. To address this limitation, we incorporate logic-based explanations into FL by proposing the Logical Reasoning-based eXplainable Federated Learning (LR-XFL) approach. Under LR-XFL, FL clients create local logic rules based on their local data and send them, along with model updates, to the FL server. The FL server connects the local logic rules through a proper logical connector that is derived based on properties of client data, without requiring access to the raw data. In addition, the server also aggregates the local model updates with weight values determined by the quality of the clients' local data as reflected by their uploaded logic rules. The results show that LR-XFL outperforms the most relevant baseline by 1.19%, 5.81% and 5.41% in terms of classification accuracy, rule accuracy and rule fidelity, respectively. The explicit rule evaluation and expression under LR-XFL enable human experts to validate and correct the rules on the server side, hence improving the global FL model's robustness to errors. It has the potential to enhance the transparency of FL models for areas like healthcare and finance where both data privacy and explainability are important.
    摘要 federated learning(FL)是一种emerging approach дляtrain machine learning模型在协同下保持数据隐私。由于保护隐私的需求,FL模型具有global transparency和解释性的限制。为了解决这个限制,我们将逻辑based explanations incorporated into FL by proposing the Logical Reasoning-based eXplainable Federated Learning(LR-XFL)approach。在LR-XFL中,FL客户端创建基于本地数据的本地逻辑规则,并将其发送到FL服务器。FL服务器通过基于客户端数据的性质 derivation proper logical connector,连接本地逻辑规则,而无需访问原始数据。此外,服务器还将本地模型更新与基于客户端数据质量的weight值相连接。结果显示,LR-XFL比最相关的基eline 高1.19%,5.81%和5.41%在分类精度,规则精度和规则忠实度方面。LR-XFL中的explicit rule evaluation和表达使得人工专家可以在服务器端验证和修正规则,因此提高了全球FL模型的鲁棒性。这有可能提高FL模型在医疗和金融等领域的透明度,这些领域都是数据隐私和解释性重要。
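A rough sketch of the server-side step described above: client updates are averaged with weights derived from a per-client rule-quality score, and the local rules are joined by a single logical connector. The scoring values, the choice of connector and all names are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def aggregate(client_updates, rule_scores, client_rules, connector="AND"):
    """Weight client updates by rule quality and join local logic rules.

    client_updates: list of 1-D parameter vectors
    rule_scores:    list of scores in [0, 1] reflecting each client's rule quality
    client_rules:   list of human-readable rule strings
    """
    w = np.asarray(rule_scores, dtype=float)
    w = w / w.sum()                                   # normalize weights
    global_update = sum(wi * ui for wi, ui in zip(w, client_updates))
    joined = f" {connector} ".join(f"({r})" for r in client_rules)
    return global_update, joined

updates = [np.array([0.1, 0.2]), np.array([0.3, 0.0])]
scores = [0.9, 0.5]                                   # hypothetical rule-accuracy scores
rules = ["petal_len > 2.0 -> class_1", "sepal_w < 3.0 -> class_1"]
print(aggregate(updates, scores, rules))
```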

Master-slave Deep Architecture for Top-K Multi-armed Bandits with Non-linear Bandit Feedback and Diversity Constraints

  • paper_url: http://arxiv.org/abs/2308.12680
  • repo_url: https://github.com/huanghanchi/master-slave-algorithm-for-top-k-bandits
  • paper_authors: Hanchi Huang, Li Shen, Deheng Ye, Wei Liu
  • for: Solving the top-$K$ combinatorial multi-armed bandit problem with non-linear bandit feedback and diversity constraints; to the authors' knowledge, this is the first combinatorial bandit setting to consider diversity constraints under bandit feedback.
  • methods: Six slave models with distinct merits are introduced to generate diversified samples that balance rewards, constraints and efficiency; teacher-learning-based optimization and a policy co-training technique further boost the slave models, while a master model selects among their elite samples with a neural contextual UCB-based network that trades off exploration and exploitation.
  • results: The approach outperforms existing state-of-the-art algorithms on both synthetic and real-world datasets for recommendation tasks.
    Abstract We propose a novel master-slave architecture to solve the top-$K$ combinatorial multi-armed bandits problem with non-linear bandit feedback and diversity constraints, which, to the best of our knowledge, is the first combinatorial bandits setting considering diversity constraints under bandit feedback. Specifically, to efficiently explore the combinatorial and constrained action space, we introduce six slave models with distinguished merits to generate diversified samples well balancing rewards and constraints as well as efficiency. Moreover, we propose teacher learning based optimization and the policy co-training technique to boost the performance of the multiple slave models. The master model then collects the elite samples provided by the slave models and selects the best sample estimated by a neural contextual UCB-based network to make a decision with a trade-off between exploration and exploitation. Thanks to the elaborate design of slave models, the co-training mechanism among slave models, and the novel interactions between the master and slave models, our approach significantly surpasses existing state-of-the-art algorithms in both synthetic and real datasets for recommendation tasks. The code is available at: \url{https://github.com/huanghanchi/Master-slave-Algorithm-for-Top-K-Bandits}.
    摘要 我们提出了一种新的主奴 Architecture来解决top-$K$ combinatorial多臂弓炮问题,这是我们所知道的首次对 combinatorial bandits 设定下的多臂弓炮问题中考虑了多样性约束。 Specifically, 为了有效地探索 combinatorial 和受限的动作空间,我们引入了六个奴隶模型,每个模型具有特殊的优点,以生成均衡奖励和约束的多样化样本。此外,我们提出了教师学习基于优化和策略合作技术来提高多个奴隶模型的性能。 master model 然后收集奴隶模型提供的最佳样本,并使用神经上下文ual UCB 网络来选择最佳样本,以实现exploration 和 exploitation 的平衡。 благо于奴隶模型的精心设计、奴隶模型之间的合作机制以及主奴模型与奴隶模型之间的新型互动,我们的方法在 synthetic 和实际数据集上 для推荐任务上显著超过了现有的状态足算法。代码可以在以下链接中找到: \url{https://github.com/huanghanchi/Master-slave-Algorithm-for-Top-K-Bandits}.

A Continual Learning Approach for Cross-Domain White Blood Cell Classification

  • paper_url: http://arxiv.org/abs/2308.12679
  • repo_url: None
  • paper_authors: Ario Sadafi, Raheleh Salehi, Armin Gruber, Sayedali Shetab Boushehri, Pascal Giehr, Nassir Navab, Carsten Marr
  • for: Accurate classification of white blood cells in peripheral blood is key to diagnosing hematological diseases, so classification models must be updated regularly as clinical settings, data sources and disease taxonomies evolve.
  • methods: A rehearsal-based continual-learning approach is proposed for class-incremental and domain-incremental white blood cell classification. Exemplar sets are selected from the model's predictions, keeping both the most confident samples and the most challenging ones identified via uncertainty estimation, so that previously learned knowledge is not forgotten.
  • results: Evaluated on three white blood cell datasets that differ in color, resolution and class composition, including scenarios that introduce new domains or new classes with each task, the method outperforms established continual-learning baselines such as iCaRL and EWC in cross-domain settings.
    Abstract Accurate classification of white blood cells in peripheral blood is essential for diagnosing hematological diseases. Due to constantly evolving clinical settings, data sources, and disease classifications, it is necessary to update machine learning classification models regularly for practical real-world use. Such models significantly benefit from sequentially learning from incoming data streams without forgetting previously acquired knowledge. However, models can suffer from catastrophic forgetting, causing a drop in performance on previous tasks when fine-tuned on new data. Here, we propose a rehearsal-based continual learning approach for class incremental and domain incremental scenarios in white blood cell classification. To choose representative samples from previous tasks, we employ exemplar set selection based on the model's predictions. This involves selecting the most confident samples and the most challenging samples identified through uncertainty estimation of the model. We thoroughly evaluated our proposed approach on three white blood cell classification datasets that differ in color, resolution, and class composition, including scenarios where new domains or new classes are introduced to the model with every task. We also test a long class incremental experiment with both new domains and new classes. Our results demonstrate that our approach outperforms established baselines in continual learning, including existing iCaRL and EWC methods for classifying white blood cells in cross-domain environments.
    摘要 准确分类白血球在周围血液中是诊断血液疾病的关键。由于临床设定不断变化,数据来源和疾病分类不断更新,因此需要定期更新机器学习分类模型以适应实际世界中的应用。但是,模型可能会出现悬峰现象,导致在新数据上练习后表现下降。为此,我们提出了基于复习的不间断学习方法,适用于白血球分类的类增量和领域增量场景。我们使用模型预测结果来选择先前任务中的表现最好和最难的样本,以增强模型的稳定性和鲁棒性。我们对三个不同的白血球分类数据集进行了全面的评估,包括新领域和新类的引入。我们还进行了长期类增量实验,以测试我们的方法在跨领域环境中的表现。结果表明,我们的方法在不间断学习中超过了现有的iCaRL和EWC方法,用于分类白血球。
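The exemplar-selection idea, keeping both the most confident and the most uncertain samples under the current model for rehearsal, can be sketched as follows. Predictive entropy is used here as the uncertainty estimate and the 50/50 split of the budget is an assumption; the paper's exact criterion may differ.

```python
import numpy as np

def select_exemplars(probs, n_exemplars):
    """Pick rehearsal exemplars from softmax outputs `probs` (n_samples x n_classes).

    Half of the budget goes to the most confident samples, half to the most
    uncertain ones (highest predictive entropy). Overlaps are deduplicated.
    """
    confidence = probs.max(axis=1)
    entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)
    k = n_exemplars // 2
    confident_idx = np.argsort(-confidence)[:k]
    uncertain_idx = np.argsort(-entropy)[:n_exemplars - k]
    return np.unique(np.concatenate([confident_idx, uncertain_idx]))

rng = np.random.default_rng(0)
logits = rng.normal(size=(100, 5))
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
print(select_exemplars(probs, 10))
```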

Masked Feature Modelling: Feature Masking for the Unsupervised Pre-training of a Graph Attention Network Block for Bottom-up Video Event Recognition

  • paper_url: http://arxiv.org/abs/2308.12673
  • repo_url: None
  • paper_authors: Dimitrios Daskalakis, Nikolaos Gkalelis, Vasileios Mezaris
  • for: 这 paper 是为了提高视频事件识别性能而设计的。
  • methods: 这 paper 使用了一种新的隐藏特征模型(Masked Feature Modelling,MFM),该模型利用预训练的视觉化器来重建视频中对象的隐藏特征,并将这些特征与一个已经预训练的 Graph Attention Network(GAT)块结合使用。
  • results: 实验表明,使用 MFM 可以提高视频事件识别性能。
    Abstract In this paper, we introduce Masked Feature Modelling (MFM), a novel approach for the unsupervised pre-training of a Graph Attention Network (GAT) block. MFM utilizes a pretrained Visual Tokenizer to reconstruct masked features of objects within a video, leveraging the MiniKinetics dataset. We then incorporate the pre-trained GAT block into a state-of-the-art bottom-up supervised video-event recognition architecture, ViGAT, to improve the model's starting point and overall accuracy. Experimental evaluations on the YLI-MED dataset demonstrate the effectiveness of MFM in improving event recognition performance.
    摘要 在这篇论文中,我们介绍了一种新的无监督预训练方法,即Masked Feature Modelling(MFM),用于提高视频事件识别性能。MFM利用一个预训练的视觉词法器来重建视频中对象的做了遮盲特征,基于MiniKinetics dataset。然后,我们将预训练的GAT块integrated到了现有的顶部向下推导视频事件识别架构ViGAT中,以提高模型的起点和总性能。实验评估在YLI-MED数据集上,证明MFM可以提高事件识别性能。

Optimal data pooling for shared learning in maintenance operations

  • paper_url: http://arxiv.org/abs/2308.12670
  • repo_url: None
  • paper_authors: Collin Drent, Melvin Drent, Geert-Jan van Houtum
  • for: This study examines the benefits of pooling data for shared learning in maintenance operations.
  • methods: A decomposition result reduces the high-dimensional Markov decision process (MDP) arising from systems coupled through an a-priori unknown degradation rate to two-dimensional MDPs, enabling structural analysis and tractable computation.
  • results: The analysis shows that pooling data can lead to significant cost reductions compared with not pooling.
    Abstract This paper addresses the benefits of pooling data for shared learning in maintenance operations. We consider a set of systems subject to Poisson degradation that are coupled through an a-priori unknown rate. Decision problems involving these systems are high-dimensional Markov decision processes (MDPs). We present a decomposition result that reduces such an MDP to two-dimensional MDPs, enabling structural analyses and computations. We leverage this decomposition to demonstrate that pooling data can lead to significant cost reductions compared to not pooling.
    摘要

Geodesic Mode Connectivity

  • paper_url: http://arxiv.org/abs/2308.12666
  • repo_url: https://github.com/char-tan/geodesic-mode-connectivity
  • paper_authors: Charlie Tan, Theodore Long, Sarah Zhao, Rudolf Laine
  • for: Studying mode connectivity, the phenomenon that trained models are connected by paths of low loss.
  • methods: Neural networks are studied through information geometry, as spaces of parameterized distributions with curved geometry; the authors hypothesize that geodesics (shortest paths) in these spaces correspond to mode-connecting paths in the loss landscape, and propose an algorithm to approximate geodesics.
  • results: The approximated geodesics are shown to achieve mode connectivity.
    Abstract Mode connectivity is a phenomenon where trained models are connected by a path of low loss. We reframe this in the context of Information Geometry, where neural networks are studied as spaces of parameterized distributions with curved geometry. We hypothesize that shortest paths in these spaces, known as geodesics, correspond to mode-connecting paths in the loss landscape. We propose an algorithm to approximate geodesics and demonstrate that they achieve mode connectivity.
    摘要 模式连接性是一种现象,即训练好的模型之间由一条低损失的路径相连。我们将这一现象置于信息几何的框架中重新审视:神经网络被视为具有弯曲几何结构的参数化分布空间。我们假设这些空间中的最短路径(即测地线)对应于损失地形中的模式连接路径。我们提出了一种近似测地线的算法,并证明它们能够实现模式连接。

Don’t Look into the Sun: Adversarial Solarization Attacks on Image Classifiers

  • paper_url: http://arxiv.org/abs/2308.12661
  • repo_url: https://github.com/paulgavrikov/adversarial_solarization
  • paper_authors: Paul Gavrikov, Janis Keuper
  • for: This paper aims to evaluate the robustness of deep neural networks against out-of-distribution inputs, specifically through the use of image solarization attacks.
  • methods: The paper introduces a new attack method based on image solarization, which is a straightforward yet effective way to degrade the accuracy of image classification models.
  • results: The authors demonstrate the attack’s capacity to significantly degrade accuracy, and show that existing defenses are not consistently effective against this specific attack.
    Abstract Assessing the robustness of deep neural networks against out-of-distribution inputs is crucial, especially in safety-critical domains like autonomous driving, but also in safety systems where malicious actors can digitally alter inputs to circumvent safety guards. However, designing effective out-of-distribution tests that encompass all possible scenarios while preserving accurate label information is a challenging task. Existing methodologies often entail a compromise between variety and constraint levels for attacks and sometimes even both. In a first step towards a more holistic robustness evaluation of image classification models, we introduce an attack method based on image solarization that is conceptually straightforward yet avoids jeopardizing the global structure of natural images independent of the intensity. Through comprehensive evaluations of multiple ImageNet models, we demonstrate the attack's capacity to degrade accuracy significantly, provided it is not integrated into the training augmentations. Interestingly, even then, no full immunity to accuracy deterioration is achieved. In other settings, the attack can often be simplified into a black-box attack with model-independent parameters. Defenses against other corruptions do not consistently extend to be effective against our specific attack. Project website: https://github.com/paulgavrikov/adversarial_solarization
    摘要 要评估深度神经网络对于非标准输入的Robustness特别是在自动驾驶和安全系统中,因为攻击者可以通过修改输入来绕过安全措施。然而,设计全面的非标准测试方法,同时保持准确性信息是一项复杂的任务。现有的方法ologies often involve a compromise between variety and constraint levels for attacks, and sometimes even both.为了更好地评估图像分类模型的Robustness,我们介绍了一种基于图像曝光的攻击方法,这种方法是概念简单,但不会影响自然图像的全球结构独立于强度。通过对多个ImageNet模型进行全面的评估,我们示出了这种攻击的能力在准确性上带来了显著的下降,只要它不包括在训练增强中。即使如此,也没有完全免疫于准确性下降。在其他设置下,这种攻击可以大多数情况下简化为黑盒攻击,参数独立于模型。防御其他损害不一定能延伸到对我们的特定攻击有效。项目网站:https://github.com/paulgavrikov/adversarial_solarization
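Solarization itself is a one-line corruption: pixels at or above a threshold are inverted. The sketch below applies it and does a simple black-box sweep over thresholds to find the most damaging setting for a given classifier; `classifier_accuracy` is a placeholder for whatever evaluation routine is available, and the dummy classifier is purely illustrative.

```python
import numpy as np

def solarize(images, threshold):
    """Invert all pixel values at or above `threshold` (images in [0, 255])."""
    return np.where(images >= threshold, 255 - images, images)

def worst_case_threshold(images, labels, classifier_accuracy, thresholds=range(0, 256, 16)):
    """Black-box search for the solarization threshold that hurts accuracy most.

    `classifier_accuracy(images, labels)` is a user-supplied evaluation function.
    """
    scores = {t: classifier_accuracy(solarize(images, t), labels) for t in thresholds}
    return min(scores, key=scores.get), scores

# Example with a dummy "classifier" that just checks mean brightness:
rng = np.random.default_rng(0)
imgs = rng.integers(0, 256, size=(8, 32, 32, 3))
labels = rng.integers(0, 2, size=8)
dummy_acc = lambda x, y: float(np.mean((x.mean(axis=(1, 2, 3)) > 128) == y))
print(worst_case_threshold(imgs, labels, dummy_acc))
```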

APART: Diverse Skill Discovery using All Pairs with Ascending Reward and DropouT

  • paper_url: http://arxiv.org/abs/2308.12649
  • repo_url: None
  • paper_authors: Hadar Schreiber Galler, Tom Zahavy, Guillaume Desjardins, Alon Cohen
  • for: Studying diverse skill discovery in reward-free environments, aiming to discover all possible skills in simple grid-world environments where prior methods have struggled.
  • methods: The APART method replaces the standard one-vs-all (softmax) discriminator with a one-vs-one (all pairs) discriminator and combines it with a novel intrinsic reward function and dropout regularization.
  • results: APART discovers all possible skills in grid worlds with far fewer samples than previous works; a simpler variant that alters VIC, rescales its intrinsic reward and tunes the softmax temperature also reaches the maximum number of skills.
    Abstract We study diverse skill discovery in reward-free environments, aiming to discover all possible skills in simple grid-world environments where prior methods have struggled to succeed. This problem is formulated as mutual training of skills using an intrinsic reward and a discriminator trained to predict a skill given its trajectory. Our initial solution replaces the standard one-vs-all (softmax) discriminator with a one-vs-one (all pairs) discriminator and combines it with a novel intrinsic reward function and a dropout regularization technique. The combined approach is named APART: Diverse Skill Discovery using All Pairs with Ascending Reward and Dropout. We demonstrate that APART discovers all the possible skills in grid worlds with remarkably fewer samples than previous works. Motivated by the empirical success of APART, we further investigate an even simpler algorithm that achieves maximum skills by altering VIC, rescaling its intrinsic reward, and tuning the temperature of its softmax discriminator. We believe our findings shed light on the crucial factors underlying success of skill discovery algorithms in reinforcement learning.
    摘要 我们研究了无奖环境中多样化技能发现,旨在发现所有可能的技能在简单的格子世界环境中。这个问题被形式化为用内生奖和预测技能的探测器进行互训练技能。我们的初始解决方案是将标准的一对一(所有对)探测器取代一个一对多(softmax)探测器,并将其与一种新的内生奖函数和掉帽正则化技术相结合。这种结合方法被称为APART:多样化技能发现使用所有对with Ascending奖和掉帽。我们示出了APART在格子世界中可以很快地发现所有可能的技能,远远少于之前的实验成果。受APART的实验成功的激发,我们进一步调查了一种最简单的算法,通过修改VIC、调整其内生奖、并调整探测器的温度来实现最大技能。我们认为我们的发现可以透视到涉及到奖励学习算法成功的关键因素。

The GENEA Challenge 2023: A large scale evaluation of gesture generation models in monadic and dyadic settings

  • paper_url: http://arxiv.org/abs/2308.12646
  • repo_url: None
  • paper_authors: Taras Kucherenko, Rajmund Nagy, Youngwoo Yoon, Jieyeon Woo, Teodor Nikolov, Mihail Tsakov, Gustav Eje Henter
  • for: This report describes the GENEA Challenge 2023, in which participating teams built speech-driven gesture-generation systems on a common dataset, followed by a joint evaluation.
  • methods: All teams used the same speech and motion data from both sides of a dyadic interaction, and 12 submissions plus 2 baselines were compared with held-out motion-capture data in large-scale user studies covering human-likeness and appropriateness.
  • results: Human-likeness varied widely across submissions, with only a few systems rated close to motion-capture data; appropriateness remains far from solved, with most systems scoring only slightly above chance, and the effect of the interlocutor was even more subtle.
    Abstract This paper reports on the GENEA Challenge 2023, in which participating teams built speech-driven gesture-generation systems using the same speech and motion dataset, followed by a joint evaluation. This year's challenge provided data on both sides of a dyadic interaction, allowing teams to generate full-body motion for an agent given its speech (text and audio) and the speech and motion of the interlocutor. We evaluated 12 submissions and 2 baselines together with held-out motion-capture data in several large-scale user studies. The studies focused on three aspects: 1) the human-likeness of the motion, 2) the appropriateness of the motion for the agent's own speech whilst controlling for the human-likeness of the motion, and 3) the appropriateness of the motion for the behaviour of the interlocutor in the interaction, using a setup that controls for both the human-likeness of the motion and the agent's own speech. We found a large span in human-likeness between challenge submissions, with a few systems rated close to human mocap. Appropriateness seems far from being solved, with most submissions performing in a narrow range slightly above chance, far behind natural motion. The effect of the interlocutor is even more subtle, with submitted systems at best performing barely above chance. Interestingly, a dyadic system being highly appropriate for agent speech does not necessarily imply high appropriateness for the interlocutor. Additional material is available via the project website at https://svito-zar.github.io/GENEAchallenge2023/ .
    摘要

Towards Hierarchical Regional Transformer-based Multiple Instance Learning

  • paper_url: http://arxiv.org/abs/2308.12634
  • repo_url: None
  • paper_authors: Josef Cersovsky, Sadegh Mohammadi, Dagmar Kainmueller, Johannes Hoehne
  • for: This work aims to improve the classification of gigapixel histopathology images with deep multiple instance learning.
  • methods: The approach replaces the traditional learned attention mechanism with a regional, Vision Transformer-inspired self-attention mechanism; regional patch information is fused to derive slide-level predictions, this regional aggregation can be stacked to process features hierarchically at different distance levels, and inference can be focused on high-attention regions.
  • results: The method significantly improves performance over the baseline on two histopathology datasets, especially for datasets with small, local morphological features.
    Abstract The classification of gigapixel histopathology images with deep multiple instance learning models has become a critical task in digital pathology and precision medicine. In this work, we propose a Transformer-based multiple instance learning approach that replaces the traditional learned attention mechanism with a regional, Vision Transformer inspired self-attention mechanism. We present a method that fuses regional patch information to derive slide-level predictions and show how this regional aggregation can be stacked to hierarchically process features on different distance levels. To increase predictive accuracy, especially for datasets with small, local morphological features, we introduce a method to focus the image processing on high attention regions during inference. Our approach is able to significantly improve performance over the baseline on two histopathology datasets and points towards promising directions for further research.
    摘要 “ digitization pathology 和精度医学中,分类 gigapixel 组织学像(histopathology images)已成为一个重要的任务。在这个工作中,我们提出了一个基于 Transformer 的多个实例学习方法,取代传统的学习注意力机制。我们提出了一种使用区域 Vision Transformer 灵活注意力机制来聚合地方资讯,以 deriv 标本水平预测。此外,我们还引入了一种以高注意区域进行推理时的对应方法,以提高预测精度,特别是在小型、地方 morphological 特征的数据集上。我们的方法能够在两个 histopathology 数据集上实现明显的性能提升,这点显示了我们的方法具有推进性。”

Uncertainty and Explainable Analysis of Machine Learning Model for Reconstruction of Sonic Slowness Logs

  • paper_url: http://arxiv.org/abs/2308.12625
  • repo_url: None
  • paper_authors: Hua Wang, Yuqiong Wu, Yushun Zhang, Fuqiang Lai, Zhou Feng, Bing Xie, Ailin Zhao
  • for: Using machine learning to reconstruct compressional- and shear-wave slowness logs that are often missing in horizontal or old wells, a common problem in field applications.
  • methods: Using data from the 2020 SPWLA machine-learning competition, an ensemble model based on the NGBoost algorithm predicts the missing slowness logs together with their uncertainty, and the SHAP method is applied to investigate the interpretability of the model.
  • results: The NGBoost model performs well on the test set and provides a probability distribution for each prediction, whose variance can be used to judge the quality of the reconstructed log. The model tends to predict larger slowness when neutron porosity and gamma ray are large, consistent with petrophysical models, and it also captures the complex influence of changing borehole caliper on slowness, in line with the physics of borehole acoustics.
    Abstract Logs are valuable information for oil and gas fields as they help to determine the lithology of the formations surrounding the borehole and the location and reserves of subsurface oil and gas reservoirs. However, important logs are often missing in horizontal or old wells, which poses a challenge in field applications. In this paper, we utilize data from the 2020 machine learning competition of the SPWLA, which aims to predict the missing compressional wave slowness and shear wave slowness logs using other logs in the same borehole. We employ the NGBoost algorithm to construct an Ensemble Learning model that can predicate the results as well as their uncertainty. Furthermore, we combine the SHAP method to investigate the interpretability of the machine learning model. We compare the performance of the NGBosst model with four other commonly used Ensemble Learning methods, including Random Forest, GBDT, XGBoost, LightGBM. The results show that the NGBoost model performs well in the testing set and can provide a probability distribution for the prediction results. In addition, the variance of the probability distribution of the predicted log can be used to justify the quality of the constructed log. Using the SHAP explainable machine learning model, we calculate the importance of each input log to the predicted results as well as the coupling relationship among input logs. Our findings reveal that the NGBoost model tends to provide greater slowness prediction results when the neutron porosity and gamma ray are large, which is consistent with the cognition of petrophysical models. Furthermore, the machine learning model can capture the influence of the changing borehole caliper on slowness, where the influence of borehole caliper on slowness is complex and not easy to establish a direct relationship. These findings are in line with the physical principle of borehole acoustics.
    摘要 批处是钻井场中的重要信息,它们可以帮助确定附近钻井的地层学特性和油气储量。然而,有些重要的批处在水平或老钻井中缺失,这会对钻井场的应用带来挑战。在这篇论文中,我们使用2020年机器学习竞赛的SPWLA数据,以预测缺失的压缩波慢速和剪切波慢速批处。我们使用NGBoost算法构建了一个ensemble学习模型,可以预测结果以及其不确定性。此外,我们使用SHAP方法来调查机器学习模型的可解释性。我们与四种常用的ensemble学习方法进行比较,包括Random Forest、GBDT、XGBoost和LightGBM。结果显示,NGBoost模型在测试集中表现良好,并可以提供预测结果的概率分布。此外,预测结果的不确定性的方差可以用来评估模型的质量。使用SHAP可解释机器学习模型,我们计算了每个输入批处对预测结果的重要性以及输入批处之间的相互关系。我们的发现表明,NGBoost模型在大于钻井内 neutron气压和γ射线时表现出更好的慢速预测结果,这与岩石物理模型的认知一致。此外,机器学习模型可以捕捉随着钻井压力的变化,钻井压力与批处之间的复杂关系。这与物理原理的钻井声学有关。

Try with Simpler – An Evaluation of Improved Principal Component Analysis in Log-based Anomaly Detection

  • paper_url: http://arxiv.org/abs/2308.12612
  • repo_url: None
  • paper_authors: Lin Yang, Junjie Chen, Zhihao Gong, Shutao Gao, Hongyu Zhang, Yue Kang, Huaan Li
  • for: The goal is to enhance traditional machine-learning and data-mining techniques so that log-based anomaly detection can match the effectiveness of deep learning while remaining practical.
  • methods: Unsupervised PCA (principal component analysis) is optimized by incorporating a lightweight semantic-based log representation, which addresses log events unseen in the training data.
  • results: Across public and industrial datasets, the optimized PCA technique achieves effectiveness similar to advanced supervised/semi-supervised deep-learning methods while being more stable with limited training data and more resource-efficient.
    Abstract The rapid growth of deep learning (DL) has spurred interest in enhancing log-based anomaly detection. This approach aims to extract meaning from log events (log message templates) and develop advanced DL models for anomaly detection. However, these DL methods face challenges like heavy reliance on training data, labels, and computational resources due to model complexity. In contrast, traditional machine learning and data mining techniques are less data-dependent and more efficient but less effective than DL. To make log-based anomaly detection more practical, the goal is to enhance traditional techniques to match DL's effectiveness. Previous research in a different domain (linking questions on Stack Overflow) suggests that optimized traditional techniques can rival state-of-the-art DL methods. Drawing inspiration from this concept, we conducted an empirical study. We optimized the unsupervised PCA (Principal Component Analysis), a traditional technique, by incorporating lightweight semantic-based log representation. This addresses the issue of unseen log events in training data, enhancing log representation. Our study compared seven log-based anomaly detection methods, including four DL-based, two traditional, and the optimized PCA technique, using public and industrial datasets. Results indicate that the optimized unsupervised PCA technique achieves similar effectiveness to advanced supervised/semi-supervised DL methods while being more stable with limited training data and resource-efficient. This demonstrates the adaptability and strength of traditional techniques through small yet impactful adaptations.
    摘要 深度学习(DL)的快速发展激发了对日志基本异常检测的改进。这种方法的目标是从日志事件模板中提取意义并开发高级DL模型进行异常检测。然而,这些DL方法面临着强依赖于训练数据、标签和计算资源的挑战,因为模型的复杂性。与此相反,传统的机器学习和数据挖掘技术更加不依赖于数据,更加高效,但也更加效率。为了让日志基本异常检测更加实用,目标是提高传统技术,使其与DL的效果相匹配。前一个研究(在Stack Overflow上的问题链接)表明,优化传统技术可以与当前DL方法相当有效。以这个概念为发想,我们进行了一个实验研究。我们对不带标签的PCA(主成分分析)进行了优化,通过 incorporating lightweight semantic-based log representation来解决训练数据中未见的日志事件问题。我们对公共和工业 dataset 进行了七种日志基本异常检测方法的比较,其中包括四种DL基本、两种传统、优化PCA技术。结果表明,优化的无标签PCA技术与高级指导/半指导DL方法相当有效,同时更加稳定,资源更加有效。这种示例展示了传统技术的适应性和强大性,通过小 yet 有影响的改进。
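A bare-bones version of PCA-based log anomaly detection looks like the following: fit PCA on vector representations of normal log sequences, then flag sequences whose reconstruction error exceeds a threshold. The semantic log representation from the paper is replaced by random vectors here, and the 3-sigma threshold is a common convention rather than the paper's exact choice.

```python
import numpy as np
from sklearn.decomposition import PCA

def fit_detector(train_vectors, n_components=5):
    """Fit PCA on normal log vectors and derive a reconstruction-error threshold."""
    pca = PCA(n_components=n_components).fit(train_vectors)
    recon = pca.inverse_transform(pca.transform(train_vectors))
    errors = np.linalg.norm(train_vectors - recon, axis=1)
    threshold = errors.mean() + 3 * errors.std()      # simple 3-sigma rule
    return pca, threshold

def is_anomalous(pca, threshold, vectors):
    recon = pca.inverse_transform(pca.transform(vectors))
    return np.linalg.norm(vectors - recon, axis=1) > threshold

rng = np.random.default_rng(0)
normal_logs = rng.normal(size=(500, 20))              # stand-in for semantic log vectors
pca, thr = fit_detector(normal_logs)
test = np.vstack([rng.normal(size=(5, 20)), rng.normal(5.0, 1.0, size=(5, 20))])
print(is_anomalous(pca, thr, test))
```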

A Greedy Approach for Offering to Telecom Subscribers

  • paper_url: http://arxiv.org/abs/2308.12606
  • repo_url: None
  • paper_authors: Piyush Kanti Bhunre, Tanmay Sen, Arijit Sarkar
  • for: The paper is written for telecom operators to optimize offer campaigns that retain subscribers and prevent churn.
  • methods: It proposes a novel combinatorial algorithm for offer optimization under heterogeneous offers, maximizing expected revenue under the scenario of subscriber churn.
  • results: The proposed algorithm is efficient and accurate even for a very large subscriber base.
    Abstract Customer retention or churn prevention is a challenging task of a telecom operator. One of the effective approaches is to offer some attractive incentive or additional services or money to the subscribers for keeping them engaged and make sure they stay in the operator's network for longer time. Often, operators allocate certain amount of monetary budget to carry out the offer campaign. The difficult part of this campaign is the selection of a set of customers from a large subscriber-base and deciding the amount that should be offered to an individual so that operator's objective is achieved. There may be multiple objectives (e.g., maximizing revenue, minimizing number of churns) for selection of subscriber and selection of an offer to the selected subscriber. Apart from monetary benefit, offers may include additional data, SMS, hots-spot tethering, and many more. This problem is known as offer optimization. In this paper, we propose a novel combinatorial algorithm for solving offer optimization under heterogeneous offers by maximizing expected revenue under the scenario of subscriber churn, which is, in general, seen in telecom domain. The proposed algorithm is efficient and accurate even for a very large subscriber-base.
    摘要 客户退订或防退是电信运营商面临的挑战之一。一种有效的方法是向用户提供吸引人的折扣或附加服务,以保持用户的兴趣和使他们尽量长时间留在运营商的网络中。经常,运营商会分配一定的财务预算来实施优惠活动。选择一个来自大量用户基数的 subset 并决定每个用户所需的金额是Operator的目标实现的困难部分。有多个目标(例如,最大化收入,最小化退订数)可以用来选择用户和选择优惠给选择的用户。除了金钱的利益外,优惠可能包括额外数据、SMS、热点终端等多种服务。这个问题被称为优惠优化。在这篇论文中,我们提出了一种新的 combinatorial 算法,用于在不同类型的优惠下对客户优惠进行优化,以达到预期收入的最大化,这是在通信领域中一般存在的退订问题。提议的算法是高效和准确,即使用户基数非常大。
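A generic greedy heuristic for budgeted offer allocation picks, at each step, the subscriber/offer pair with the highest expected gain per unit cost until the budget runs out. This is only a sketch of the problem setting with made-up numbers, not the paper's combinatorial algorithm.

```python
def greedy_offers(candidates, budget):
    """Greedily assign offers under a monetary budget.

    `candidates` is a list of dicts with keys:
      subscriber, offer, cost, expected_gain  (expected retained revenue if offered)
    Each subscriber receives at most one offer.
    """
    ranked = sorted(candidates, key=lambda c: c["expected_gain"] / c["cost"], reverse=True)
    chosen, spent, served = [], 0.0, set()
    for c in ranked:
        if c["subscriber"] in served or spent + c["cost"] > budget:
            continue
        chosen.append(c)
        spent += c["cost"]
        served.add(c["subscriber"])
    return chosen, spent

candidates = [
    {"subscriber": "A", "offer": "1GB data", "cost": 2.0, "expected_gain": 9.0},
    {"subscriber": "A", "offer": "cashback", "cost": 5.0, "expected_gain": 12.0},
    {"subscriber": "B", "offer": "1GB data", "cost": 2.0, "expected_gain": 3.0},
]
print(greedy_offers(candidates, budget=4.0))
```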

Exploiting Time-Frequency Conformers for Music Audio Enhancement

  • paper_url: http://arxiv.org/abs/2308.12599
  • repo_url: None
  • paper_authors: Yunkee Chae, Junghyun Koo, Sungho Lee, Kyogu Lee
  • for: 提高网络视频平台上音乐表演录音质量
  • methods: 基于Conformer架构,利用注意机制进行音乐提升
  • results: 实现单音轨提升和多轨混合音乐提升,达到领先水平
    Abstract With the proliferation of video platforms on the internet, recording musical performances by mobile devices has become commonplace. However, these recordings often suffer from degradation such as noise and reverberation, which negatively impact the listening experience. Consequently, the necessity for music audio enhancement (referred to as music enhancement from this point onward), involving the transformation of degraded audio recordings into pristine high-quality music, has surged to augment the auditory experience. To address this issue, we propose a music enhancement system based on the Conformer architecture that has demonstrated outstanding performance in speech enhancement tasks. Our approach explores the attention mechanisms of the Conformer and examines their performance to discover the best approach for the music enhancement task. Our experimental results show that our proposed model achieves state-of-the-art performance on single-stem music enhancement. Furthermore, our system can perform general music enhancement with multi-track mixtures, which has not been examined in previous work.
    摘要

LORD: Leveraging Open-Set Recognition with Unknown Data

  • paper_url: http://arxiv.org/abs/2308.12584
  • repo_url: None
  • paper_authors: Tobias Koch, Christian Riess, Thomas Köhler
  • for: The goal is to improve classifiers' ability to recognize entirely unknown data, i.e., open-set recognition (OSR).
  • methods: The LORD framework (Leveraging Open-set Recognition by exploiting unknown Data) explicitly models open space during classifier training and provides a systematic evaluation protocol, with three model-agnostic training strategies that exploit background data applied to well-established classifiers.
  • results: LORD consistently improves recognition of unknown data while reducing dependence on large and costly background datasets; mixup is shown to be an effective off-the-shelf substitute for background data, and lightweight constraints on mixup synthesis further improve OSR performance.
    Abstract Handling entirely unknown data is a challenge for any deployed classifier. Classification models are typically trained on a static pre-defined dataset and are kept in the dark for the open unassigned feature space. As a result, they struggle to deal with out-of-distribution data during inference. Addressing this task on the class-level is termed open-set recognition (OSR). However, most OSR methods are inherently limited, as they train closed-set classifiers and only adapt the downstream predictions to OSR. This work presents LORD, a framework to Leverage Open-set Recognition by exploiting unknown Data. LORD explicitly models open space during classifier training and provides a systematic evaluation for such approaches. We identify three model-agnostic training strategies that exploit background data and applied them to well-established classifiers. Due to LORD's extensive evaluation protocol, we consistently demonstrate improved recognition of unknown data. The benchmarks facilitate in-depth analysis across various requirement levels. To mitigate dependency on extensive and costly background datasets, we explore mixup as an off-the-shelf data generation technique. Our experiments highlight mixup's effectiveness as a substitute for background datasets. Lightweight constraints on mixup synthesis further improve OSR performance.
    摘要 处理完全未知数据是任何部署分类器的挑战。分类模型通常在静态预先定义的数据集上训练,因此在推理时难以处理不同步数据。为解决这个问题,我们提出了开放集 recognition(OSR)技术。然而,大多数OSR方法都受限于它们只是在closed-set分类器上进行适应,而不是直接训练开放集分类器。本文介绍了LORD框架,它可以在分类器训练过程中显式地模型开放空间,并提供了一种系统的评估方法。我们确定了三种模型不依赖的训练策略,并应用于已成熟的分类器。由于LORD的广泛评估协议,我们在不同的需求水平上 consistently 示出了对未知数据的更好的识别。这些标准化的协议使得可以进行深入的分析。为了减少依赖于费时和成本高的背景数据集,我们探索了mixup作为一种可用的数据生成技术。我们的实验表明,mixup可以作为背景数据集的替代品。进一步的轻量级约束可以进一步提高OSR性能。
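Mixup synthesis of pseudo-unknown samples, mentioned above as a substitute for background data, can be sketched in a few lines: convex combinations of inputs from different known classes are treated as "unknown" examples during open-set training. The Beta parameter and the cross-class restriction are illustrative assumptions.

```python
import numpy as np

def mixup_unknowns(x, y, n_samples, alpha=0.4, rng=None):
    """Create pseudo-unknown samples by mixing pairs of inputs from different classes."""
    rng = rng or np.random.default_rng()
    mixed = []
    while len(mixed) < n_samples:
        i, j = rng.integers(0, len(x), size=2)
        if y[i] == y[j]:
            continue                      # cross-class mixes look least like any known class
        lam = rng.beta(alpha, alpha)
        mixed.append(lam * x[i] + (1 - lam) * x[j])
    return np.stack(mixed)

rng = np.random.default_rng(0)
x = rng.normal(size=(50, 3, 8, 8))        # toy images
y = rng.integers(0, 4, size=50)
unknowns = mixup_unknowns(x, y, n_samples=16, rng=rng)
print(unknowns.shape)                      # (16, 3, 8, 8)
```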

Persistent learning signals and working memory without continuous attractors

  • paper_url: http://arxiv.org/abs/2308.12585
  • repo_url: None
  • paper_authors: Il Memming Park, Ábel Ságodi, Piotr Aleksander Sokół
  • for: This paper asks whether neural dynamical systems with stable attractor structures, such as point attractors and continuous attractors, can support useful learning signals for adapting to changes in the temporal structure of the environment.
  • methods: The authors show that periodic and quasi-periodic attractors can also support learning arbitrarily long temporal relationships; unlike continuous attractors, quasi-periodic attractors do not suffer from the fine-tuning problem, making them well suited to learning temporally structured behavior.
  • results: Based on this theory, a new initialization scheme for artificial recurrent neural networks outperforms standard methods on tasks that require learning temporal dynamics, and a robust recurrent memory mechanism is proposed for integrating and maintaining head direction without a ring attractor.
    Abstract Neural dynamical systems with stable attractor structures, such as point attractors and continuous attractors, are hypothesized to underlie meaningful temporal behavior that requires working memory. However, working memory may not support useful learning signals necessary to adapt to changes in the temporal structure of the environment. We show that in addition to the continuous attractors that are widely implicated, periodic and quasi-periodic attractors can also support learning arbitrarily long temporal relationships. Unlike the continuous attractors that suffer from the fine-tuning problem, the less explored quasi-periodic attractors are uniquely qualified for learning to produce temporally structured behavior. Our theory has broad implications for the design of artificial learning systems and makes predictions about observable signatures of biological neural dynamics that can support temporal dependence learning and working memory. Based on our theory, we developed a new initialization scheme for artificial recurrent neural networks that outperforms standard methods for tasks that require learning temporal dynamics. Moreover, we propose a robust recurrent memory mechanism for integrating and maintaining head direction without a ring attractor.
    摘要 神经动力系统with稳定吸引结构,如点吸引器和连续吸引器,被假设在有用的时间行为中存在。然而,工作记忆可能无法提供有用的学习信号,以适应环境中的时间结构变化。我们表明,除了广泛被推荐的连续吸引器之外, periodic和 quasi-periodic吸引器也可以支持学习无限长的时间关系。与连续吸引器相比,quasi-periodic吸引器具有独特优势,可以学习生成时间结构化的行为。我们的理论具有广泛的应用前景,可以设计人工学习系统,并预测生物神经动力学中可以支持时间依赖学习和工作记忆的可观察特征。根据我们的理论,我们开发了一种新的初始化方案,可以超过标准方法在需要学习时间动力学任务中表现更好。此外,我们提出了一种可靠的回忆机制,可以融合和维护方向指向,而不需要环形吸引器。

A Huber Loss Minimization Approach to Byzantine Robust Federated Learning

  • paper_url: http://arxiv.org/abs/2308.12581
  • repo_url: None
  • paper_authors: Puning Zhao, Fei Yu, Zhiguo Wan
  • for: Federated learning systems are vulnerable to adversarial (Byzantine) attacks; the paper proposes an aggregator based on Huber loss minimization and provides a comprehensive theoretical analysis.
  • methods: Under the i.i.d. assumption the approach has several advantages: optimal dependence on $\epsilon$, the fraction of attacked clients; no need for precise knowledge of $\epsilon$; and support for clients with unequal data sizes.
  • results: The analysis is further extended to non-i.i.d. data, where clients have slightly different distributions.
    Abstract Federated learning systems are susceptible to adversarial attacks. To combat this, we introduce a novel aggregator based on Huber loss minimization, and provide a comprehensive theoretical analysis. Under independent and identically distributed (i.i.d) assumption, our approach has several advantages compared to existing methods. Firstly, it has optimal dependence on $\epsilon$, which stands for the ratio of attacked clients. Secondly, our approach does not need precise knowledge of $\epsilon$. Thirdly, it allows different clients to have unequal data sizes. We then broaden our analysis to include non-i.i.d data, such that clients have slightly different distributions.
    摘要
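One way to picture a Huber-loss-based aggregator is as a smoothed geometric median of the client updates: the aggregate minimizes the sum of Huber losses of its distances to each update, which behaves like the mean for benign clients while limiting the pull of outliers. The iterative reweighting below is a generic sketch under that interpretation, not the paper's exact estimator.

```python
import numpy as np

def huber_aggregate(updates, delta=1.0, n_iter=100, tol=1e-8):
    """Aggregate client parameter vectors by minimizing sum_i Huber(||z - u_i||).

    Implemented as iteratively reweighted averaging: clients whose update is
    farther than `delta` from the current aggregate get down-weighted.
    """
    u = np.asarray(updates, dtype=float)
    z = u.mean(axis=0)
    for _ in range(n_iter):
        d = np.linalg.norm(u - z, axis=1)
        w = np.where(d <= delta, 1.0, delta / np.maximum(d, 1e-12))
        z_new = (w[:, None] * u).sum(axis=0) / w.sum()
        if np.linalg.norm(z_new - z) < tol:
            break
        z = z_new
    return z

honest = [np.array([1.0, 1.0]) + 0.05 * np.random.default_rng(i).normal(size=2) for i in range(8)]
byzantine = [np.array([50.0, -50.0])] * 2          # attacked clients
print(huber_aggregate(honest + byzantine))          # stays near (1, 1)
```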

Hypergraph Convolutional Networks for Fine-grained ICU Patient Similarity Analysis and Risk Prediction

  • paper_url: http://arxiv.org/abs/2308.12575
  • repo_url: None
  • paper_authors: Yuxi Liu, Zhenhao Zhang, Shaowen Qin, Flora D. Salim, Antonio Jimeno Yepes, Jun Shen
  • for: Predicting patient mortality risk in the ICU.
  • methods: A Hypergraph Convolutional Network represents non-pairwise relationships (e.g., among diagnosis codes) in a hypergraph to capture hidden feature structures and compute fine-grained patient similarity.
  • results: Evaluated on the publicly available eICU Collaborative Research Database, the method outperforms state-of-the-art models on mortality risk prediction, and several case studies show that the graph networks provide good transparency and robustness in decision-making.
    Abstract The Intensive Care Unit (ICU) is one of the most important parts of a hospital, which admits critically ill patients and provides continuous monitoring and treatment. Various patient outcome prediction methods have been attempted to assist healthcare professionals in clinical decision-making. Existing methods focus on measuring the similarity between patients using deep neural networks to capture the hidden feature structures. However, the higher-order relationships are ignored, such as patient characteristics (e.g., diagnosis codes) and their causal effects on downstream clinical predictions. In this paper, we propose a novel Hypergraph Convolutional Network that allows the representation of non-pairwise relationships among diagnosis codes in a hypergraph to capture the hidden feature structures so that fine-grained patient similarity can be calculated for personalized mortality risk prediction. Evaluation using a publicly available eICU Collaborative Research Database indicates that our method achieves superior performance over the state-of-the-art models on mortality risk prediction. Moreover, the results of several case studies demonstrated the effectiveness of constructing graph networks in providing good transparency and robustness in decision-making.
    摘要 医院重症监护室(ICU)是医院中最重要的部分之一, admit 重症病人并提供连续监测和治疗。不同的患者结果预测方法已经被尝试以协助医疗专业人员进行临床决策。现有方法主要是通过深度神经网络捕捉患者特征结构的隐藏关系,但是忽略了患者特征(例如诊断代码)和其影响下游临床预测的 causal 关系。在本文中,我们提出了一种新的 Hypergraph Convolutional Network,允许在幂图中表示诊断代码之间的非对比关系,以捕捉隐藏特征结构,从而计算出细致的患者相似性,用于个性化死亡风险预测。经过使用公共可用的 eICU Collaborative Research Database 评估,我们的方法在死亡风险预测中超过了当前状态的模型性能。此外,多个案例研究表明,在做出决策时,建立图网络可以提供良好的透明度和可靠性。
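The core propagation rule of a hypergraph convolution layer can be stated compactly: with incidence matrix H (patients x hyperedges, where a hyperedge could group all patients sharing a diagnosis code), node and edge degree matrices D_v and D_e, and edge weights W, features propagate as X' = sigma(D_v^-1/2 H W D_e^-1 H^T D_v^-1/2 X Theta). The NumPy sketch below implements that standard HGNN-style operator as a single generic layer; it is not the paper's full architecture, and the toy incidence matrix is made up.

```python
import numpy as np

def hypergraph_conv(X, H, Theta, edge_w=None):
    """One hypergraph convolution layer:
    X' = relu(Dv^-1/2 H W De^-1 H^T Dv^-1/2 X Theta)."""
    n_nodes, n_edges = H.shape
    w = np.ones(n_edges) if edge_w is None else np.asarray(edge_w, dtype=float)
    dv = H @ w                              # weighted node degrees
    de = H.sum(axis=0)                      # hyperedge degrees
    Dv_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(dv, 1e-12)))
    De_inv = np.diag(1.0 / np.maximum(de, 1e-12))
    A = Dv_inv_sqrt @ H @ np.diag(w) @ De_inv @ H.T @ Dv_inv_sqrt
    return np.maximum(A @ X @ Theta, 0.0)   # ReLU nonlinearity

# Toy example: 4 patients, 2 hyperedges (e.g., two shared diagnosis codes)
H = np.array([[1, 0],
              [1, 1],
              [0, 1],
              [1, 0]], dtype=float)
X = np.random.default_rng(0).normal(size=(4, 3))    # patient features
Theta = np.random.default_rng(1).normal(size=(3, 2))
print(hypergraph_conv(X, H, Theta).shape)            # (4, 2)
```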

Conditional Kernel Imitation Learning for Continuous State Environments

  • paper_url: http://arxiv.org/abs/2308.12573
  • repo_url: None
  • paper_authors: Rishabh Agrawal, Nathan Dahlin, Rahul Jain, Ashutosh Nayyar
  • for: Imitation learning in continuous state-space environments based solely on observed behavior, without transition-dynamics information, reward structure, or any additional interaction with the environment.
  • methods: The approach builds on the Markov balance equation: the environment's transition dynamics are estimated with conditional kernel density estimators, and the method seeks to satisfy the probabilistic balance equations of the environment.
  • results: Numerical experiments on continuous-state benchmark environments show consistently superior empirical performance over many state-of-the-art imitation learning algorithms.
    Abstract Imitation Learning (IL) is an important paradigm within the broader reinforcement learning (RL) methodology. Unlike most of RL, it does not assume availability of reward-feedback. Reward inference and shaping are known to be difficult and error-prone methods particularly when the demonstration data comes from human experts. Classical methods such as behavioral cloning and inverse reinforcement learning are highly sensitive to estimation errors, a problem that is particularly acute in continuous state space problems. Meanwhile, state-of-the-art IL algorithms convert behavioral policy learning problems into distribution-matching problems which often require additional online interaction data to be effective. In this paper, we consider the problem of imitation learning in continuous state space environments based solely on observed behavior, without access to transition dynamics information, reward structure, or, most importantly, any additional interactions with the environment. Our approach is based on the Markov balance equation and introduces a novel conditional kernel density estimation-based imitation learning framework. It involves estimating the environment's transition dynamics using conditional kernel density estimators and seeks to satisfy the probabilistic balance equations for the environment. We establish that our estimators satisfy basic asymptotic consistency requirements. Through a series of numerical experiments on continuous state benchmark environments, we show consistently superior empirical performance over many state-of-the-art IL algorithms.
    摘要 欢迎来到我们的实验室!今天我们将谈论一种重要的学习方法——仿制学习(Imitation Learning,IL)。不同于大多数激励学习(Reinforcement Learning,RL)方法,IL不假设环境会提供奖励反馈。奖励推断和形成是一个难题,尤其是当示例数据来自人类专家时。经典方法如行为做为模式(Behavioral Cloning)和反向激励学习(Inverse Reinforcement Learning)都具有高度敏感性,特别是在连续状态空间问题上。而现代IL算法通常将行为政策学习问题转化为分布匹配问题,这些问题经常需要在线互动数据来实现效果。在这篇论文中,我们研究了在连续状态空间环境中的IL问题,无需访问转移动力学信息、奖励结构或任何额外的环境互动数据。我们的方法基于马可夫平衡方程,并提出了一种基于条件kernel density参数估计的IL框架。我们使用条件kernel density参数估计来估计环境的转移动力学,并寻求满足环境的 probabilistic 平衡方程。我们证明了我们的估计符合基本的 asymptotic 一致性要求。通过对连续状态 benchmark 环境进行 série 的实验,我们显示了在许多现代IL算法中出现的超过 empirical 性能。

Multivariate Time-Series Anomaly Detection with Contaminated Data: Application to Physiological Signals

  • paper_url: http://arxiv.org/abs/2308.12563
  • repo_url: None
  • paper_authors: Thi Kieu Khanh Ho, Narges Armanfard
  • for: Proposing a practical unsupervised time-series anomaly detection (TSAD) method that can be trained on data contaminated with anomalies.
  • methods: The method has three modules: a Decontaminator that rectifies anomalies (i.e., noise) in the training data; a Variable Dependency Modeling module that captures long-term intra- and inter-variable dependencies in the decontaminated data, treated as a surrogate for pure normal data; and an Anomaly Scoring module that detects anomalies.
  • results: Extensive experiments on three widely used physiological datasets show that the approach surpasses existing methods, establishing a new state of the art.
    Abstract Mainstream unsupervised anomaly detection algorithms often excel in academic datasets, yet their real-world performance is restricted due to the controlled experimental conditions involving clean training data. Addressing the challenge of training with noise, a prevalent issue in practical anomaly detection, is frequently overlooked. In a pioneering endeavor, this study delves into the realm of label-level noise within sensory time-series anomaly detection (TSAD). This paper presents a novel and practical end-to-end unsupervised TSAD when the training data are contaminated with anomalies. The introduced approach, called TSAD-C, is devoid of access to abnormality labels during the training phase. TSAD-C encompasses three modules: a Decontaminator to rectify the abnormalities (aka noise) present in the training data, a Variable Dependency Modeling module to capture both long-term intra- and inter-variable dependencies within the decontaminated data that can be considered as a surrogate of the pure normal data, and an Anomaly Scoring module to detect anomalies. Our extensive experiments conducted on three widely used physiological datasets conclusively demonstrate that our approach surpasses existing methodologies, thus establishing a new state-of-the-art performance in the field.
    摘要 主流无监督异常检测算法在学术数据集中表现出色,然而在实际应用中它们的性能受到干扰。因为干扰是实际异常检测中的一个普遍存在的问题,而这种干扰在训练数据中的存在 frequently overlooked。本研究探索了标签水平干扰在感知时序异常检测(TSAD)中的挑战。本文提出了一种新的实用无监督TSAD方法,称为TSAD-C,它在训练数据中存在异常时不具备对异常标签的访问。TSAD-C包括三个模块:一个Rectifier来修正训练数据中的异常(即干扰),一个变量依赖模型来捕捉训练数据中的长期内部和间部变量关系,以及一个异常检测模块来检测异常。我们对三个广泛使用的生理数据集进行了广泛的实验,结果表明,我们的方法超越了现有方法,因此在该领域成立了新的状态态-of-the-art表现。

Variational Information Pursuit with Large Language and Multimodal Models for Interpretable Predictions

  • paper_url: http://arxiv.org/abs/2308.12562
  • repo_url: None
  • paper_authors: Kwan Ho Ryan Chan, Aditya Chattopadhyay, Benjamin David Haeffele, Rene Vidal
  • for: Extending the Variational Information Pursuit (V-IP) framework so that interpretable-by-design prediction scales to larger tasks.
  • methods: A two-step process first uses large language models (LLMs) to generate a sufficiently large set of task-relevant, interpretable concepts, then uses large multimodal models to annotate each data sample by semantic similarity with each concept in the generated set.
  • results: FM+V-IP with LLM-generated concepts can achieve better test performance than V-IP with human-annotated concepts, and compared with other interpretable-by-design frameworks such as Concept Bottleneck Models (CBMs) it reaches competitive test performance using fewer concepts/queries, with or without concept filtering.
    Abstract Variational Information Pursuit (V-IP) is a framework for making interpretable predictions by design by sequentially selecting a short chain of task-relevant, user-defined and interpretable queries about the data that are most informative for the task. While this allows for built-in interpretability in predictive models, applying V-IP to any task requires data samples with dense concept-labeling by domain experts, limiting the application of V-IP to small-scale tasks where manual data annotation is feasible. In this work, we extend the V-IP framework with Foundational Models (FMs) to address this limitation. More specifically, we use a two-step process, by first leveraging Large Language Models (LLMs) to generate a sufficiently large candidate set of task-relevant interpretable concepts, then using Large Multimodal Models to annotate each data sample by semantic similarity with each concept in the generated concept set. While other interpretable-by-design frameworks such as Concept Bottleneck Models (CBMs) require an additional step of removing repetitive and non-discriminative concepts to have good interpretability and test performance, we mathematically and empirically justify that, with a sufficiently informative and task-relevant query (concept) set, the proposed FM+V-IP method does not require any type of concept filtering. In addition, we show that FM+V-IP with LLM generated concepts can achieve better test performance than V-IP with human annotated concepts, demonstrating the effectiveness of LLMs at generating efficient query sets. Finally, when compared to other interpretable-by-design frameworks such as CBMs, FM+V-IP can achieve competitive test performance using fewer number of concepts/queries in both cases with filtered or unfiltered concept sets.
    摘要 Variational Information Pursuit (V-IP) 是一个框架,用于做出可解释的预测,通过顺序选择一串任务相关、用户定义且可解释的问题,以获取最有用的信息。这允许预测模型内置可解释性,但是实际应用 V-IP 到任何任务时需要具有充足的数据样本,并且需要专家 manually 标注数据,限制了 V-IP 的应用范围仅对小规模任务进行数据标注是可能的。在这个工作中,我们将 V-IP 框架扩展为基础模型(FM),以解决这个限制。更 Specifically,我们使用一个 two-step 程序,首先利用大型自然语言模型(LLM)生成一个足够大的候选者集,然后使用大型多modal模型来标注每个数据样本,以semantic similarity with each concept in the generated concept set。相比其他可解释的设计框架,如概念瓶颈模型(CBM),我们不需要进行额外的排除重复和无用的概念,以确保好的解释性和试验性。具体来说,我们 mathematically 和实验显示,只要概念集足够有用和任务相关,则 FM+V-IP 方法不需要任何型的概念范 filtering。此外,我们显示 FM+V-IP 使用 LLM 生成的概念可以 achieve better test performance than V-IP 使用人工标注的概念,这表明 LLM 可以实现更有效率的查询集生成。最后,在与其他可解释的设计框架,如 CBM,进行比较时,FM+V-IP 可以 achieve 竞争性的试验性,使用更少的概念/询问。

Deep Reinforcement Learning-driven Cross-Community Energy Interaction Optimal Scheduling

  • paper_url: http://arxiv.org/abs/2308.12554
  • repo_url: None
  • paper_authors: Yang Li, Fanjin Bu, Zhen Yang, Bin Wang, Meng Han
  • for: 这篇论文是为了协调不同社区之间的能源交互和多种能源互补系统内部的能源转换,以及在不确定条件下实现整体能源系统的优化和调度。
  • methods: 该论文提出了一种基于多智能深度学习算法的全面调度模型,利用不同社区的负荷特征来做出决策。在该模型中,整体能源系统的调度问题被转化为一个Markov决策过程,并使用数据驱动的深度学习算法来解决。这种方法不需要模拟复杂的能源协同关系 между多个社区和多种能源互补系统。
  • results: 对于实验结果,提出的方法能够准确捕捉不同社区的负荷特征,并利用这些特征进行合理的能源交互协调。这导致风力浪费率从16.3%降至0%,并将总运行成本降低为5445.6元,表现出了明显的经济和环保效益。
    Abstract In order to coordinate energy interactions among various communities and energy conversions among multi-energy subsystems within the multi-community integrated energy system under uncertain conditions, and achieve overall optimization and scheduling of the comprehensive energy system, this paper proposes a comprehensive scheduling model that utilizes a multi-agent deep reinforcement learning algorithm to learn load characteristics of different communities and make decisions based on this knowledge. In this model, the scheduling problem of the integrated energy system is transformed into a Markov decision process and solved using a data-driven deep reinforcement learning algorithm, which avoids the need for modeling complex energy coupling relationships between multi-communities and multi-energy subsystems. The simulation results show that the proposed method effectively captures the load characteristics of different communities and utilizes their complementary features to coordinate reasonable energy interactions among them. This leads to a reduction in wind curtailment rate from 16.3% to 0% and lowers the overall operating cost by 5445.6 Yuan, demonstrating significant economic and environmental benefits.
    摘要 为了协调不同社区之间的能源互动和多种能源互系统内部的能源转换,并在不确定条件下实现整体能源系统的优化和调度,这篇论文提出了一种涵盖式调度模型,利用多代理人深度强化学习算法来学习不同社区的荷载特点,并根据这些知识来做出决策。在这个模型中,集成能源系统的调度问题被转化为马可夫决策过程,并使用数据驱动的深度强化学习算法来解决,这避免了模拟复杂的能源协同关系between多个社区和多种能源互系统的需要。 simulation results show that the proposed method effectively captures the load characteristics of different communities and utilizes their complementary features to coordinate reasonable energy interactions among them. This leads to a reduction in wind curtailment rate from 16.3% to 0% and lowers the overall operating cost by 5445.6 Yuan, demonstrating significant economic and environmental benefits.

Don’t blame Dataset Shift! Shortcut Learning due to Gradients and Cross Entropy

  • paper_url: http://arxiv.org/abs/2308.12553
  • repo_url: None
  • paper_authors: Aahlad Puli, Lily Zhang, Yoav Wald, Rajesh Ranganath
  • for: This paper studies why default-ERM (gradient-based optimization of cross-entropy) exhibits shortcut learning in perception tasks, and how changing the inductive bias can avoid it.
  • methods: Using a linear perception task, the authors show that default-ERM's preference for maximizing the margin makes models depend more on the shortcut than on the stable feature, even without overparameterization.
  • results: They develop loss functions that encourage uniform-margin solutions, called margin control (MARG-CTRL), which mitigate shortcut learning on a variety of vision and language tasks and remove the need for expensive two-stage shortcut-mitigation methods.
    Abstract Common explanations for shortcut learning assume that the shortcut improves prediction under the training distribution but not in the test distribution. Thus, models trained via the typical gradient-based optimization of cross-entropy, which we call default-ERM, utilize the shortcut. However, even when the stable feature determines the label in the training distribution and the shortcut does not provide any additional information, like in perception tasks, default-ERM still exhibits shortcut learning. Why are such solutions preferred when the loss for default-ERM can be driven to zero using the stable feature alone? By studying a linear perception task, we show that default-ERM's preference for maximizing the margin leads to models that depend more on the shortcut than the stable feature, even without overparameterization. This insight suggests that default-ERM's implicit inductive bias towards max-margin is unsuitable for perception tasks. Instead, we develop an inductive bias toward uniform margins and show that this bias guarantees dependence only on the perfect stable feature in the linear perception task. We develop loss functions that encourage uniform-margin solutions, called margin control (MARG-CTRL). MARG-CTRL mitigates shortcut learning on a variety of vision and language tasks, showing that better inductive biases can remove the need for expensive two-stage shortcut-mitigating methods in perception tasks.
    摘要 通常的解释是,快捷学习短cut减少预测误差在训练分布下,但在测试分布下不减少误差。因此,通过typical的梯度基于cross-entropy的优化,我们称之为default-ERM,这些模型使用快捷。然而,even when the stable feature determines the label in the training distribution and the shortcut does not provide any additional information, like in perception tasks, default-ERM still exhibits shortcut learning。Why are such solutions preferred when the loss for default-ERM can be driven to zero using the stable feature alone?By studying a linear perception task, we show that default-ERM's preference for maximizing the margin leads to models that depend more on the shortcut than the stable feature, even without overparameterization。This insight suggests that default-ERM's implicit inductive bias towards max-margin is unsuitable for perception tasks。Instead, we develop an inductive bias toward uniform margins and show that this bias guarantees dependence only on the perfect stable feature in the linear perception task。We develop loss functions that encourage uniform-margin solutions, called margin control (MARG-CTRL)。MARG-CTRL mitigates shortcut learning on a variety of vision and language tasks, showing that better inductive biases can remove the need for expensive two-stage shortcut-mitigating methods in perception tasks。

A Co-training Approach for Noisy Time Series Learning

  • paper_url: http://arxiv.org/abs/2308.12551
  • repo_url: None
  • paper_authors: Weiqi Zhang, Jianfeng Zhang, Jia Li, Fugee Tsung
  • for: This work focuses on robust representation learning for noisy time series.
  • methods: Two different encoders create two views of the input time series, and the encoders are learned iteratively through co-training-based contrastive learning.
  • results: On four time-series benchmarks in unsupervised and semi-supervised settings, TS-CoT mitigates the impact of data noise and corruption, outperforms existing methods, and its representations transfer well to downstream tasks via fine-tuning.
    Abstract In this work, we focus on robust time series representation learning. Our assumption is that real-world time series is noisy and complementary information from different views of the same time series plays an important role while analyzing noisy input. Based on this, we create two views for the input time series through two different encoders. We conduct co-training based contrastive learning iteratively to learn the encoders. Our experiments demonstrate that this co-training approach leads to a significant improvement in performance. Especially, by leveraging the complementary information from different views, our proposed TS-CoT method can mitigate the impact of data noise and corruption. Empirical evaluations on four time series benchmarks in unsupervised and semi-supervised settings reveal that TS-CoT outperforms existing methods. Furthermore, the representations learned by TS-CoT can transfer well to downstream tasks through fine-tuning.
    摘要 在这项工作中,我们关注着鲁棒时序序表示学习。我们假设真实世界中的时序序列是噪音的,并且不同视图中的相同时序序列信息在分析噪音输入时发挥重要作用。基于这个假设,我们创建了两个视图对输入时序序列进行编码。我们通过轮替彩色学习来启动这两个编码器。我们的实验表明,这种轮替彩色学习方法可以提高性能。尤其是通过利用不同视图中的补充信息,我们的提案的TS-CoT方法可以减轻数据噪音和损害的影响。我们在四个时序序 benchmark 上进行了无监督和半监督的实验,结果表明TS-CoT方法在性能上超过了现有方法。此外,TS-CoT方法学习的表示可以通过精度调整来传递到下游任务中。
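The cross-view contrastive objective can be illustrated with an InfoNCE-style loss between embeddings of the two views of the same series: each sample's view-1 embedding should match its own view-2 embedding among all pairs in the batch. Only the loss is sketched below; the two encoders, the augmentations and the iterative co-training schedule from the paper are omitted, and the temperature value is arbitrary.

```python
import numpy as np

def cross_view_contrastive_loss(z1, z2, temperature=0.1):
    """InfoNCE-style loss aligning two views (z1, z2: batch x dim)."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = (z1 @ z2.T) / temperature           # similarity of every view-1/view-2 pair
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))          # positives are on the diagonal

rng = np.random.default_rng(0)
shared = rng.normal(size=(16, 32))
z_view1 = shared + 0.05 * rng.normal(size=(16, 32))   # two noisy views of the same series
z_view2 = shared + 0.05 * rng.normal(size=(16, 32))
print(cross_view_contrastive_loss(z_view1, z_view2))
```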

CALM : A Multi-task Benchmark for Comprehensive Assessment of Language Model Bias

  • paper_url: http://arxiv.org/abs/2308.12539
  • repo_url: https://github.com/vipulgupta1011/calm
  • paper_authors: Vipul Gupta, Pranav Narayanan Venkit, Hugo Laurençon, Shomir Wilson, Rebecca J. Passonneau
  • for: Quantifying and comparing sociodemographic bias in language models (LMs) and its potential for harm.
  • methods: The Comprehensive Assessment of Language Model bias (CALM) benchmark integrates 16 existing datasets across domains such as Wikipedia and news articles, filters 224 templates from them, and constructs a dataset of 78,400 examples for quantifying LM bias across three tasks.
  • results: CALM is more diverse and more reliable (less sensitive to small template perturbations) than prior datasets. Evaluating 20 large language models, the authors find that in the OPT and Bloom series larger models are more biased than smaller ones, that the T0 series is the least biased, and that some model series show a trade-off between gender and racial bias as model size grows.
    Abstract As language models (LMs) become increasingly powerful, it is important to quantify and compare them for sociodemographic bias with potential for harm. Prior bias measurement datasets are sensitive to perturbations in their manually designed templates, therefore unreliable. To achieve reliability, we introduce the Comprehensive Assessment of Language Model bias (CALM), a benchmark dataset to quantify bias in LMs across three tasks. We integrate 16 existing datasets across different domains, such as Wikipedia and news articles, to filter 224 templates from which we construct a dataset of 78,400 examples. We compare the diversity of CALM with prior datasets on metrics such as average semantic similarity, and variation in template length, and test the sensitivity to small perturbations. We show that our dataset is more diverse and reliable than previous datasets, thus better capture the breadth of linguistic variation required to reliably evaluate model bias. We evaluate 20 large language models including six prominent families of LMs such as Llama-2. In two LM series, OPT and Bloom, we found that larger parameter models are more biased than lower parameter models. We found the T0 series of models to be the least biased. Furthermore, we noticed a tradeoff between gender and racial bias with increasing model size in some model series. The code is available at https://github.com/vipulgupta1011/CALM.
    摘要

FedSoL: Bridging Global Alignment and Local Generality in Federated Learning

  • paper_url: http://arxiv.org/abs/2308.12532
  • repo_url: None
  • paper_authors: Gihun Lee, Minchan Jeong, Sangmook Kim, Jaehoon Oh, Se-Young Yun
  • for: 提高 Federated Learning 性能在不同客户端数据分布情况下
  • methods: combining global alignment和本地通用性,通过在本地学习中寻找Parameter region robust against proximal perturbations
  • results: Experiments show that FedSoL consistently achieves state-of-the-art performance on various setups.
    Abstract Federated Learning (FL) aggregates locally trained models from individual clients to construct a global model. While FL enables learning a model with data privacy, it often suffers from significant performance degradation when client data distributions are heterogeneous. Many previous FL algorithms have addressed this issue by introducing various proximal restrictions. These restrictions aim to encourage global alignment by constraining the deviation of local learning from the global objective. However, they inherently limit local learning by interfering with the original local objectives. Recently, an alternative approach has emerged to improve local learning generality. By obtaining local models within a smooth loss landscape, this approach mitigates conflicts among different local objectives of the clients. Yet, it does not ensure stable global alignment, as local learning does not take the global objective into account. In this study, we propose Federated Stability on Learning (FedSoL), which combines both the concepts of global alignment and local generality. In FedSoL, the local learning seeks a parameter region robust against proximal perturbations. This strategy introduces an implicit proximal restriction effect in local learning while maintaining the original local objective for parameter update. Our experiments show that FedSoL consistently achieves state-of-the-art performance on various setups.
    摘要 联合学习（FL）将个别客户的本地训练模型集成成全域模型。FL 可以在保持资料隐私的同时学习模型，但当客户资料分布不均时，它经常出现明显的性能下降。许多前一代 FL 算法通过引入各种近端（proximal）限制来解决这个问题。这些限制的目的是通过约束本地学习偏离全球目标的程度来鼓励全球对齐，但它们会干扰原始的本地目标，从而限制本地学习。最近出现了一种替代方法，用于提高本地学习的通用性：通过在平滑的损失地形中获得本地模型，这种方法可以缓解不同客户本地目标之间的冲突。然而，由于本地学习不考虑全球目标，这种方法不能保证稳定的全球对齐。在这个研究中，我们提出了联合学习稳定（FedSoL），它结合了全球对齐和本地通用性两个概念。在 FedSoL 中，本地学习寻找一个对近端扰动具有鲁棒性的参数区域。这一策略在本地学习中引入了一种隐式的近端限制效应，同时保留原始本地目标用于参数更新。我们的实验显示，FedSoL 能够在不同的设置中始终达到最先进的性能。
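
A minimal PyTorch sketch of the kind of local update FedSoL describes: the parameters are first perturbed toward the proximal direction (the offset from the global model), the original local loss is evaluated at the perturbed point, and the resulting gradient is applied from the unperturbed weights. The perturbation rule, the radius `rho`, and the helper names are illustrative assumptions, not the authors' exact formulation.

```python
# Hypothetical FedSoL-style local step; a sketch under stated assumptions.
import torch

def fedsol_local_step(model, global_model, batch, loss_fn, optimizer, rho=0.05):
    x, y = batch
    # 1) Perturb parameters along the proximal direction (w - w_global),
    #    i.e. the direction a proximal term ||w - w_global||^2 cares about.
    prox_dirs = []
    with torch.no_grad():
        for p, g in zip(model.parameters(), global_model.parameters()):
            prox_dirs.append(p - g)
        norm = torch.sqrt(sum((d ** 2).sum() for d in prox_dirs)) + 1e-12
        for p, d in zip(model.parameters(), prox_dirs):
            p.add_(rho * d / norm)          # move to the perturbed point

    # 2) Evaluate the *original* local objective at the perturbed point.
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()

    # 3) Undo the perturbation, then apply the gradient computed above,
    #    which implicitly favors regions robust to proximal perturbations.
    with torch.no_grad():
        for p, d in zip(model.parameters(), prox_dirs):
            p.sub_(rho * d / norm)
    optimizer.step()
    return loss.item()
```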

SieveNet: Selecting Point-Based Features for Mesh Networks

  • paper_url: http://arxiv.org/abs/2308.12530
  • repo_url: https://github.com/sievenet/sievenet.github.io
  • paper_authors: Shengchao Yuan, Yishun Dou, Rui Shi, Bingbing Ni, Zhong Zheng
  • for: This paper aims to address the challenges of applying mesh neural networks to existing architectures due to the irregular topology of meshes.
  • methods: The proposed method, SieveNet, utilizes both the regular topology from remeshing and accurate geometric information from distortion-aware point sampling on the surface of the original mesh.
  • results: The proposed method achieves effective and superior performance on classification and segmentation tasks, eliminating the need for hand-crafted feature engineering and leveraging off-the-shelf network architectures such as the vision transformer.
    Abstract Meshes are widely used in 3D computer vision and graphics, but their irregular topology poses challenges in applying them to existing neural network architectures. Recent advances in mesh neural networks turn to remeshing and push the boundary of pioneer methods that solely take the raw meshes as input. Although the remeshing offers a regular topology that significantly facilitates the design of mesh network architectures, features extracted from such remeshed proxies may struggle to retain the underlying geometry faithfully, limiting the subsequent neural network's capacity. To address this issue, we propose SieveNet, a novel paradigm that takes into account both the regular topology and the exact geometry. Specifically, this method utilizes structured mesh topology from remeshing and accurate geometric information from distortion-aware point sampling on the surface of the original mesh. Furthermore, our method eliminates the need for hand-crafted feature engineering and can leverage off-the-shelf network architectures such as the vision transformer. Comprehensive experimental results on classification and segmentation tasks well demonstrate the effectiveness and superiority of our method.
    摘要 mesh 是计算机视觉和图形领域广泛使用的，但它们的不规则 topology 使得应用于现有的神经网络架构带来挑战。 recent advances in mesh neural networks 推动了重新拼接和推界的方法，这些方法可以让神经网络架构设计变得更加容易。 although remeshing 提供了规则的 topology，但是从这些拼接的代理中提取出来的特征可能会产生不准确地表示原始的几何结构，这限制了后续神经网络的能力。 To address this issue, we propose SieveNet, a novel paradigm that takes into account both the regular topology and the exact geometry. Specifically, this method utilizes structured mesh topology from remeshing and accurate geometric information from distortion-aware point sampling on the surface of the original mesh. Furthermore, our method eliminates the need for hand-crafted feature engineering and can leverage off-the-shelf network architectures such as the vision transformer. Comprehensive experimental results on classification and segmentation tasks well demonstrate the effectiveness and superiority of our method.
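
SieveNet relies on point sampling on the original mesh surface; below is a small sketch of area-weighted barycentric sampling on a triangle mesh. The paper's sampling is described as distortion-aware, so this uniform scheme is only a simplified stand-in, and the function names and array shapes are assumptions.

```python
# Uniform, area-weighted sampling of points on a triangle mesh (simplified stand-in).
import numpy as np

def sample_surface_points(vertices, faces, n_points, rng=None):
    """vertices: (V, 3) float array; faces: (F, 3) int array of vertex indices."""
    rng = rng or np.random.default_rng()
    v0, v1, v2 = (vertices[faces[:, i]] for i in range(3))
    # Face areas from the cross product of two edge vectors.
    areas = 0.5 * np.linalg.norm(np.cross(v1 - v0, v2 - v0), axis=1)
    probs = areas / areas.sum()
    face_idx = rng.choice(len(faces), size=n_points, p=probs)
    # Uniform barycentric coordinates (square-root trick avoids corner bias).
    r1, r2 = rng.random(n_points), rng.random(n_points)
    s1 = np.sqrt(r1)
    w0, w1, w2 = 1.0 - s1, s1 * (1.0 - r2), s1 * r2
    pts = (w0[:, None] * v0[face_idx] +
           w1[:, None] * v1[face_idx] +
           w2[:, None] * v2[face_idx])
    return pts, face_idx
```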

UNISOUND System for VoxCeleb Speaker Recognition Challenge 2023

  • paper_url: http://arxiv.org/abs/2308.12526
  • repo_url: None
  • paper_authors: Yu Zheng, Yajun Zhang, Chuanying Niu, Yibin Zhan, Yanhua Long, Dongxing Xu
  • for: 本文是对VoxCeleb Speaker Recognition Challenge 2023(VoxSRC 2023)的论文提交,包括Track 1和Track 2。
  • methods: 该系统使用了大规模ResNet和RepVGG架构,并提出了一种稳定性 aware的分数均衡方法(CMF),以提高对话音频印痕的稳定性。
  • results: 该系统通过将六个模型进行融合，在 VoxSRC 2023 中获得了 Track 1 的第一名和 Track 2 的第二名，其 minDCF 为 0.0855，EER 为 1.5880%。
    Abstract This report describes the UNISOUND submission for Track1 and Track2 of VoxCeleb Speaker Recognition Challenge 2023 (VoxSRC 2023). We submit the same system on Track 1 and Track 2, which is trained with only VoxCeleb2-dev. Large-scale ResNet and RepVGG architectures are developed for the challenge. We propose a consistency-aware score calibration method, which leverages the stability of audio voiceprints in similarity score by a Consistency Measure Factor (CMF). CMF brings a huge performance boost in this challenge. Our final system is a fusion of six models and achieves the first place in Track 1 and second place in Track 2 of VoxSRC 2023. The minDCF of our submission is 0.0855 and the EER is 1.5880%.
    摘要 这份报告描述了我们在 VoxCeleb Speaker Recognition Challenge 2023（VoxSRC 2023）中的 UNISOUND 提交，包括 Track 1 和 Track 2。我们仅使用 VoxCeleb2-dev 进行训练，并开发了大规模 ResNet 和 RepVGG 架构。我们提出了一种一致性感知的分数校准方法，通过一致性度量因子（Consistency Measure Factor，CMF）利用声纹在相似度分数中的稳定性；CMF 在本次比赛中带来了巨大的性能提升。我们的最终系统是六个模型的融合，在 Track 1 中获得了第一名，在 Track 2 中获得了第二名。我们提交系统的 minDCF 为 0.0855，EER 为 1.5880%。

Not Only Rewards But Also Constraints: Applications on Legged Robot Locomotion

  • paper_url: http://arxiv.org/abs/2308.12517
  • repo_url: None
  • paper_authors: Yunho Kim, Hyunsik Oh, Jeonghyun Lee, Jinhyeok Choi, Gwanghyeon Ji, Moonkyu Jung, Donghoon Youm, Jemin Hwangbo
  • for: 这个论文的目的是提出一种新的强化学习框架,用于训练神经网络控制器,以实现复杂的机器人系统的高性能控制。
  • methods: 这种框架使用了两种约束类型和一种高效的政策优化算法,以便让工程师在极少的计算开销下,准确地反映他们的意图和处理约束。
  • results: 在 simulate 和实际实验中,这种学习框架可以让控制器在不同的四足机器人系统中提供高性能和自然的运动样式,并且只需要调整单个奖励系数,可以减少奖励工程的努力和时间。
    Abstract Several earlier studies have shown impressive control performance in complex robotic systems by designing the controller using a neural network and training it with model-free reinforcement learning. However, these outstanding controllers with natural motion style and high task performance are developed through extensive reward engineering, which is a highly laborious and time-consuming process of designing numerous reward terms and determining suitable reward coefficients. In this work, we propose a novel reinforcement learning framework for training neural network controllers for complex robotic systems consisting of both rewards and constraints. To let the engineers appropriately reflect their intent to constraints and handle them with minimal computation overhead, two constraint types and an efficient policy optimization algorithm are suggested. The learning framework is applied to train locomotion controllers for several legged robots with different morphology and physical attributes to traverse challenging terrains. Extensive simulation and real-world experiments demonstrate that performant controllers can be trained with significantly less reward engineering, by tuning only a single reward coefficient. Furthermore, a more straightforward and intuitive engineering process can be utilized, thanks to the interpretability and generalizability of constraints. The summary video is available at https://youtu.be/KAlm3yskhvM.
    摘要 前些研究已经表明使用神经网络设计控制器和无模型奖励学习可以实现复杂机器人系统中的出色控制性能。然而,这些出色的控制器通过大量的奖励工程来实现自然的运动风格和高任务性能,这是一个非常劳动ious和时间consuming的过程。在这项工作中,我们提出了一种基于奖励学习的控制器训练框架,以便让工程师适当地反映约束并处理其减少计算开销。我们建议了两种约束类型和一种高效的政策优化算法。我们在许多模拟和实际实验中证明了,可以通过调整单个奖励系数来训练高性能的控制器,而不需要大量的奖励工程。此外,由于约束的可读性和普遍性,可以使用更直观和直接的工程过程。关于这个研究的摘要视频可以在以下链接中找到:https://youtu.be/KAlm3yskhvM。

Masked Autoencoders are Efficient Class Incremental Learners

  • paper_url: http://arxiv.org/abs/2308.12510
  • repo_url: https://github.com/scok30/mae-cil
  • paper_authors: Jiang-Tian Zhai, Xialei Liu, Andrew D. Bagdanov, Ke Li, Ming-Ming Cheng
  • for: 这篇论文旨在Sequential Learning new classes while avoiding catastrophic forgetting of previous knowledge.
  • methods: 使用Masked Autoencoders (MAEs) as efficient learners for CIL, 并且通过组合supervised loss for classification.
  • results: 实验结果显示,我们的方法比顶对照方法在CIFAR-100, ImageNet-Subset, 和 ImageNet-Full 的表现更好.
    Abstract Class Incremental Learning (CIL) aims to sequentially learn new classes while avoiding catastrophic forgetting of previous knowledge. We propose to use Masked Autoencoders (MAEs) as efficient learners for CIL. MAEs were originally designed to learn useful representations through reconstructive unsupervised learning, and they can be easily integrated with a supervised loss for classification. Moreover, MAEs can reliably reconstruct original input images from randomly selected patches, which we use to store exemplars from past tasks more efficiently for CIL. We also propose a bilateral MAE framework to learn from image-level and embedding-level fusion, which produces better-quality reconstructed images and more stable representations. Our experiments confirm that our approach performs better than the state-of-the-art on CIFAR-100, ImageNet-Subset, and ImageNet-Full. The code is available at https://github.com/scok30/MAE-CIL .
    摘要 classe 增量学习 (CIL) 目的是逐步学习新的类,而不导致之前的知识减弱。我们提议使用假设抑制器 (MAE) 作为高效的学习器,MAE 原本是用于通过无监督学习获得有用的表示,它可以轻松地与分类的监督损失结合。此外,MAE 可靠地从随机选择的 patches 中重建原始输入图像,我们使用这些 patches 来更高效地存储过去任务中的 exemplars。我们还提出了双向 MAE 框架,用于学习图像级和嵌入级的混合,这会生成更高质量的重建图像和更稳定的表示。我们的实验表明,我们的方法比当前状态的更高效于 CIFAR-100、ImageNet-Subset 和 ImageNet-Full。代码可以在 上获取。
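
A short sketch of the two ingredients described above: MAE-style random patch masking, and storing only a few visible patches per image as exemplars of past classes. Patch size, mask ratio, and the exemplar budget are illustrative assumptions, not the paper's settings.

```python
# Random patch masking and a tiny exemplar memory; a sketch under stated assumptions.
import numpy as np

def patchify(img, patch=16):
    """img: (H, W, C) -> (N, patch*patch*C) array of flattened patches."""
    H, W, C = img.shape
    gh, gw = H // patch, W // patch
    x = img[:gh * patch, :gw * patch].reshape(gh, patch, gw, patch, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(gh * gw, -1)

def random_mask(patches, mask_ratio=0.75, rng=None):
    rng = rng or np.random.default_rng()
    n = len(patches)
    keep = rng.permutation(n)[: int(n * (1 - mask_ratio))]
    return patches[keep], keep          # visible patches and their positions

def store_exemplar(memory, label, img, budget=5):
    # Keep only the visible patches (and positions) of a few images per old class;
    # an MAE decoder can later reconstruct full images from them.
    vis, idx = random_mask(patchify(img))
    memory.setdefault(label, [])
    if len(memory[label]) < budget:
        memory[label].append((vis, idx))
```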

False Information, Bots and Malicious Campaigns: Demystifying Elements of Social Media Manipulations

  • paper_url: http://arxiv.org/abs/2308.12497
  • repo_url: None
  • paper_authors: Mohammad Majid Akhtar, Rahat Masood, Muhammad Ikram, Salil S. Kanhere
  • for: This paper aims to provide a comprehensive analysis of the manipulation landscape on online social networks (OSNs), including false information, bots, and malicious campaigns.
  • methods: The paper synthesizes insights from various disciplines and integrates primary elements of social media manipulation (SMM) to extensively examine each SMM element.
  • results: The findings highlight the urgent need for interdisciplinary research to effectively combat social media manipulations, and provide valuable insights for OSN providers to ensure the safety and integrity of their platforms.
    Abstract The rapid spread of false information and persistent manipulation attacks on online social networks (OSNs), often for political, ideological, or financial gain, has affected the openness of OSNs. While researchers from various disciplines have investigated different manipulation-triggering elements of OSNs (such as understanding information diffusion on OSNs or detecting automated behavior of accounts), these works have not been consolidated to present a comprehensive overview of the interconnections among these elements. Notably, user psychology, the prevalence of bots, and their tactics in relation to false information detection have been overlooked in previous research. To address this research gap, this paper synthesizes insights from various disciplines to provide a comprehensive analysis of the manipulation landscape. By integrating the primary elements of social media manipulation (SMM), including false information, bots, and malicious campaigns, we extensively examine each SMM element. Through a systematic investigation of prior research, we identify commonalities, highlight existing gaps, and extract valuable insights in the field. Our findings underscore the urgent need for interdisciplinary research to effectively combat social media manipulations, and our systematization can guide future research efforts and assist OSN providers in ensuring the safety and integrity of their platforms.
    摘要 在社交媒体网络(OSN)上,快速传播的假信息和持续的操纵攻击已经影响了OSN的开放性。虽然来自不同领域的研究人员已经调查了OSN上的不同攻击触发元素(如了解信息传播在OSN上或检测账户的自动行为),但这些研究没有被集成来提供全面的概念概述。特别是用户心理学、灵活的机器人和它们与假信息检测之间的关系尚未得到过去研究的关注。为了解决这个研究漏洞,本文将从不同领域的视角 integrates 社交媒体攻击(SMM)的主要元素,包括假信息、机器人和恶意运动。我们对每个SMM元素进行了系统性的调查,并通过对先前研究的系统性分析,找到了共同点、突出了现有的漏洞、提取了有价值的发现。我们的发现表明,需要跨学科研究,以有效地抗击社交媒体攻击,而我们的系统化分析可以引导未来的研究努力和帮助OSN提供者保持平台的安全和完整性。

Optimizing Neural Network Scale for ECG Classification

  • paper_url: http://arxiv.org/abs/2308.12492
  • repo_url: None
  • paper_authors: Byeong Tak Lee, Yong-Yeon Jo, Joon-Myoung Kwon
  • for: 这个论文旨在研究用于分析电cardiogram(ECG)的卷积神经网络(CNN),特指Residual神经网络(ResNet)。
  • methods: 该论文使用了CNN模型,并对不同参数进行了探索和分析,以优化网络缩放。
  • results: 研究发现,采用更浅的网络结构、更多的通道数和更小的核心大小可以提高ECG分类的性能。结果表明,针对不同目标任务,可以根据我们的发现来获得更高效和准确的模型,即使使用更少的计算资源或时间。在实践中,我们示例了一种基于我们发现的窄搜索空间可以提高性能。
    Abstract We study scaling convolutional neural networks (CNNs), specifically targeting Residual neural networks (ResNet), for analyzing electrocardiograms (ECGs). Although ECG signals are time-series data, CNN-based models have been shown to outperform other neural networks with different architectures in ECG analysis. However, most previous studies in ECG analysis have overlooked the importance of network scaling optimization, which significantly improves performance. We explored and demonstrated an efficient approach to scale ResNet by examining the effects of crucial parameters, including layer depth, the number of channels, and the convolution kernel size. Through extensive experiments, we found that a shallower network, a larger number of channels, and smaller kernel sizes result in better performance for ECG classifications. The optimal network scale might differ depending on the target task, but our findings provide insight into obtaining more efficient and accurate models with fewer computing resources or less time. In practice, we demonstrate that a narrower search space based on our findings leads to higher performance.
    摘要 我们研究了卷积神经网络(CNN)的扩大,特指幂值神经网络(ResNet),用于分析电心律ogram(ECG)信号。虽然ECG信号是时序数据,但CNN模型在ECG分析中表现更好,而其他不同结构的神经网络则被忽略了。然而,大多数之前的ECG分析研究忽略了网络扩大优化的重要性,这对性能有着显著的提高效果。我们探讨了和分析了关键参数的效果,包括层深度、通道数和卷积核大小。经过广泛的实验,我们发现,更浅的网络、更多的通道和更小的卷积核大小会对ECG分类提供更好的表现。不同目标任务的优化策略可能会不同,但我们的发现可以帮助您更快地获得更高性能和更有效的模型,使用更少的计算资源或更少的时间。在实践中,我们示出了基于我们发现的更窄的搜索空间可以提高性能。
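
A small PyTorch sketch of a 1-D residual network for ECG windows that exposes the three scaling knobs studied here: layer depth, channel width, and kernel size. The exact blocks used in the paper may differ; the configuration shown at the end only follows the "shallower, wider, smaller-kernel" direction the study reports.

```python
# Configurable 1-D ResNet for ECG classification; an illustrative sketch.
import torch
import torch.nn as nn

class ResBlock1d(nn.Module):
    def __init__(self, channels, kernel_size=3):
        super().__init__()
        pad = kernel_size // 2
        self.net = nn.Sequential(
            nn.Conv1d(channels, channels, kernel_size, padding=pad),
            nn.BatchNorm1d(channels), nn.ReLU(),
            nn.Conv1d(channels, channels, kernel_size, padding=pad),
            nn.BatchNorm1d(channels),
        )
        self.act = nn.ReLU()

    def forward(self, x):
        return self.act(x + self.net(x))

def build_ecg_resnet(depth=4, channels=128, kernel_size=3, n_leads=12, n_classes=5):
    layers = [nn.Conv1d(n_leads, channels, kernel_size, padding=kernel_size // 2)]
    layers += [ResBlock1d(channels, kernel_size) for _ in range(depth)]
    layers += [nn.AdaptiveAvgPool1d(1), nn.Flatten(), nn.Linear(channels, n_classes)]
    return nn.Sequential(*layers)

# A shallower, wider, small-kernel configuration in the spirit of the findings:
model = build_ecg_resnet(depth=2, channels=256, kernel_size=3)
```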

Fall Detection using Knowledge Distillation Based Long short-term memory for Offline Embedded and Low Power Devices

  • paper_url: http://arxiv.org/abs/2308.12481
  • repo_url: None
  • paper_authors: Hannah Zhou, Allison Chen, Celine Buer, Emily Chen, Kayleen Tang, Lauryn Gong, Zhiqi Liu, Jianbin Tang
  • for: 这篇论文旨在提出一种低功耗、成本效果的滑落探测方法,通过知识传授基于LSTM模型优化精确性。
  • methods: 本论文使用时间序列数据集合发展知识传授基于LSTM模型,并评估不同传感器的滑落探测模型精确性。此外, authors 还使用知识传授技术优化模型的精确性,以降低功耗消耗。
  • results: 本论文的结果显示,这种基于LSTM模型的滑落探测方法可以实现实时探测,并且可以提高滑落探测精确性。此外, authors 发现知识传授技术可以优化模型的精确性,并降低功耗消耗。
    Abstract This paper presents a cost-effective, low-power approach to unintentional fall detection using knowledge distillation-based LSTM (Long Short-Term Memory) models to significantly improve accuracy. With a primary focus on analyzing time-series data collected from various sensors, the solution offers real-time detection capabilities, ensuring prompt and reliable identification of falls. The authors investigate fall detection models that are based on different sensors, comparing their accuracy rates and performance. Furthermore, they employ the technique of knowledge distillation to enhance the models' precision, resulting in refined accurate configurations that consume lower power. As a result, this proposed solution presents a compelling avenue for the development of energy-efficient fall detection systems for future advancements in this critical domain.
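
A hedged sketch of distilling a larger teacher into a compact LSTM student on windows of sensor time series, as described above. The temperature, loss weighting, feature count, and model sizes are assumptions for illustration, not the paper's settings.

```python
# Knowledge distillation into a small fall-detection LSTM; a sketch under stated assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FallLSTM(nn.Module):
    def __init__(self, n_features=6, hidden=32, n_classes=2):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):                      # x: (batch, time, features)
        out, _ = self.lstm(x)
        return self.head(out[:, -1])           # logits from the last time step

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    # Soft targets from the teacher (temperature-scaled) plus the usual hard loss.
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                    F.softmax(teacher_logits / T, dim=-1),
                    reduction="batchmean") * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```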

Zero-delay Consistent Signal Reconstruction from Streamed Multivariate Time Series

  • paper_url: http://arxiv.org/abs/2308.12459
  • repo_url: None
  • paper_authors: Emilio Ruiz-Moreno, Luis Miguel López-Ramos, Baltasar Beferull-Lozano
  • for: 本研究旨在提出一种能够在数据流中逐步重建数字化的实时分析信号的方法,以实现零延迟响应。
  • methods: 本方法基于循环神经网络学习多变量时间序列的空间时间相关性,以降低重建过程中的干扰。
  • results: 实验结果显示,提议的方法可以在采样率增加的情况下实现逐步重建,并且与非一致重建相比,实现较好的误差衰减。
    Abstract Digitalizing real-world analog signals typically involves sampling in time and discretizing in amplitude. Subsequent signal reconstructions inevitably incur an error that depends on the amplitude resolution and the temporal density of the acquired samples. From an implementation viewpoint, consistent signal reconstruction methods have proven a profitable error-rate decay as the sampling rate increases. Despite that, these results are obtained under offline settings. Therefore, a research gap exists regarding methods for consistent signal reconstruction from data streams. This paper presents a method that consistently reconstructs streamed multivariate time series of quantization intervals under a zero-delay response requirement. On the other hand, previous work has shown that the temporal dependencies within univariate time series can be exploited to reduce the roughness of zero-delay signal reconstructions. This work shows that the spatiotemporal dependencies within multivariate time series can also be exploited to achieve improved results. Specifically, the spatiotemporal dependencies of the multivariate time series are learned, with the assistance of a recurrent neural network, to reduce the roughness of the signal reconstruction on average while ensuring consistency. Our experiments show that our proposed method achieves a favorable error-rate decay with the sampling rate compared to a similar but non-consistent reconstruction.
    摘要 将现实世界的模拟信号数字化，通常涉及时间上的采样和幅度上的离散化。随后的信号重建必然会产生误差，该误差取决于幅度分辨率和采样的时间密度。从实现角度来看，一致的信号重建方法在采样率增加时表现出可观的误差率衰减。然而，这些结果是在离线设置下获得的。因此，关于从数据流中一致地重建信号的方法仍存在研究空白。这篇文章提出了一种方法，可以在零延迟响应要求下一致地重建流式多变量时间序列。此外，先前的工作表明单变量时间序列内部的时间相关性可以被利用来降低零延迟重建的粗糙度；本文则表明多变量时间序列中的时空相关性同样可以被利用，以获得更好的结果。具体来说，我们借助循环神经网络学习多变量时间序列的时空相关性，在保证一致性的同时平均降低重建信号的粗糙程度。我们的实验表明，与类似但非一致的重建方法相比，所提方法随采样率增加可以获得更有利的误差率衰减。

PFL-GAN: When Client Heterogeneity Meets Generative Models in Personalized Federated Learning

  • paper_url: http://arxiv.org/abs/2308.12454
  • repo_url: None
  • paper_authors: Achintha Wijesinghe, Songyang Zhang, Zhi Ding
  • for: 强调在多 Client 环境下实现对应的 Federated Learning (FL) 案例,特别是在客户数据不同性下实现更好的学习效果。
  • methods: 基于 Generative Adversarial Network (GAN) 模型,提出了一个 novel GAN sharing and aggregation strategy for Personalized Federated Learning (PFL),包括客户相似性学习和权重联合数据聚合。
  • results: 透过严谨的实验评估在多个知名数据集上,证明 PFL-GAN 能够在不同客户数据不同性下实现更好的学习效果。
    Abstract Recent advances of generative learning models are accompanied by the growing interest in federated learning (FL) based on generative adversarial network (GAN) models. In the context of FL, GAN can capture the underlying client data structure, and regenerate samples resembling the original data distribution without compromising the private raw data. Although most existing GAN-based FL works focus on training a global model, Personalized FL (PFL) sometimes can be more effective in view of client data heterogeneity in terms of distinct data sample distributions, feature spaces, and labels. To cope with client heterogeneity in GAN-based FL, we propose a novel GAN sharing and aggregation strategy for PFL. The proposed PFL-GAN addresses the client heterogeneity in different scenarios. More specially, we first learn the similarity among clients and then develop an weighted collaborative data aggregation. The empirical results through the rigorous experimentation on several well-known datasets demonstrate the effectiveness of PFL-GAN.
    摘要 近期生成学模型的进步引起了基于联合学习(Federated Learning,FL)的生成对抗网络(Generative Adversarial Network,GAN)模型的增加兴趣。在FL中,GAN可以捕捉客户端数据结构的下面,并生成符合原始数据分布的样本,而无需让客户端披露私人原始数据。虽然大多数现有的GAN基于FL工作集中在全球模型的训练上,但在视 Client数据多样性的情况下,个性化FL(Personalized FL,PFL)可能更有效。为了处理客户端多样性在GAN基于FL中,我们提出了一种新的GAN共享和汇聚策略。我们首先学习客户端之间的相似性,然后开发一种权重协同数据汇聚。实际结果通过对severalwell-known数据集进行严谨的实验证明了PFL-GAN的有效性。
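
A simplified sketch of the similarity-weighted aggregation idea: each client's personalized model is a convex combination of client models, weighted by how similar the clients appear. Similarity here is plain cosine similarity between per-client summary vectors (for example, statistics of GAN-generated features); the actual similarity learning and GAN sharing in PFL-GAN are more involved.

```python
# Similarity-weighted personalized aggregation; an illustrative sketch.
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def personalized_aggregate(client_weights, client_summaries, target):
    """client_weights: list of flattened parameter vectors (np arrays);
    client_summaries: list of per-client summary vectors;
    target: index of the client whose personalized model is being built."""
    sims = np.array([cosine(client_summaries[target], s) for s in client_summaries])
    sims = np.clip(sims, 0.0, None)            # ignore dissimilar (negative) clients
    alphas = sims / sims.sum()
    return sum(a * w for a, w in zip(alphas, client_weights))
```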

Augmenting medical image classifiers with synthetic data from latent diffusion models

  • paper_url: http://arxiv.org/abs/2308.12453
  • repo_url: None
  • paper_authors: Luke W. Sagers, James A. Diao, Luke Melas-Kyriazi, Matthew Groh, Pranav Rajpurkar, Adewole S. Adamson, Veronica Rotemberg, Roxana Daneshjou, Arjun K. Manrai
  • for: 这个研究旨在测试generative AI可以帮助医疗人员开发更好的医疗AI算法,特别是在资料有限的情况下。
  • methods: 研究使用了latent diffusion模型,并与现实影像进行混合训练,以提高模型的表现。
  • results: 研究发现,使用生成的影像可以帮助提高医疗AI模型的表现,但是这些表现 improvements尚未到达显著的水平。另外,研究发现了一个新的数据集,包含458,920帧生成的影像。
    Abstract While hundreds of artificial intelligence (AI) algorithms are now approved or cleared by the US Food and Drugs Administration (FDA), many studies have shown inconsistent generalization or latent bias, particularly for underrepresented populations. Some have proposed that generative AI could reduce the need for real data, but its utility in model development remains unclear. Skin disease serves as a useful case study in synthetic image generation due to the diversity of disease appearance, particularly across the protected attribute of skin tone. Here we show that latent diffusion models can scalably generate images of skin disease and that augmenting model training with these data improves performance in data-limited settings. These performance gains saturate at synthetic-to-real image ratios above 10:1 and are substantially smaller than the gains obtained from adding real images. As part of our analysis, we generate and analyze a new dataset of 458,920 synthetic images produced using several generation strategies. Our results suggest that synthetic data could serve as a force-multiplier for model development, but the collection of diverse real-world data remains the most important step to improve medical AI algorithms.
    摘要 美国食品和药品管理局(FDA)已批准或核可了数百种人工智能(AI)算法,但许多研究表明这些算法在不同人群中存在不一致的泛化或隐藏偏见,尤其是对于受保护属性的人口。一些人提议用生成AI降低实际数据的需求,但其在模型开发中的用途仍未得到清楚的回答。皮肤病 serves as a useful case study in synthetic image generation due to the diversity of disease appearance, particularly across the protected attribute of skin tone. 在本研究中,我们显示了潜在扩散模型可以可扩展地生成皮肤病图像,并且在数据有限的情况下,通过这些数据进行模型训练可以提高表现。这些表现提升随synthetic-to-real image ratio的增加而增加,但是与使用真实图像相比,这些提升的效果远远小于。在我们的分析中,我们生成了458,920个synthetic图像,并对其进行分析。我们的结果表明,生成的数据可以作为模型开发中的力量multiplier,但是收集真实世界数据仍然是改进医疗AI算法的最重要步骤。
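
A small sketch of mixing latent-diffusion samples into a limited real training set at a fixed synthetic-to-real ratio, which is the knob the study varies. The item format and the ratio cap are illustrative assumptions; the paper reports the gains saturating above roughly 10:1.

```python
# Mix real and synthetic training items at a chosen ratio; a minimal sketch.
import random

def build_training_set(real_items, synthetic_items, synth_to_real_ratio=2.0, seed=0):
    """Each item is assumed to be an (image_path, label) pair. Returns a shuffled list."""
    rng = random.Random(seed)
    n_synth = min(len(synthetic_items), int(len(real_items) * synth_to_real_ratio))
    mixed = list(real_items) + rng.sample(list(synthetic_items), n_synth)
    rng.shuffle(mixed)
    return mixed
```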

An Intentional Forgetting-Driven Self-Healing Method For Deep Reinforcement Learning Systems

  • paper_url: http://arxiv.org/abs/2308.12445
  • repo_url: https://github.com/ahmedhajyahmed/drdrl
  • paper_authors: Ahmed Haj Yahmed, Rached Bouchoucha, Houssem Ben Braiek, Foutse Khomh
  • for: 这篇论文是针对大规模生产中的深度强化学习(DRL)系统进行应用,并解决DRL系统在环境变化中导致的不适用行为问题。
  • methods: 这篇论文提出了一种具有自我疗愈能力的DRL系统,称为Dr. DRL,它通过新的忘记机制来解决传统的CL潜在问题,例如 catastrophic forgetting、warm-starting failure 和 slow convergence。
  • results: 相比传统CL,Dr. DRL能够将疗愈时间和精灵化集合数量降低,平均降低18.74%和17.72%。此外,Dr. DRL能够在19.63%的推移环境中帮助代理人适应,并在已经解决的环境中保持和提高回报率。
    Abstract Deep reinforcement learning (DRL) is increasingly applied in large-scale productions like Netflix and Facebook. As with most data-driven systems, DRL systems can exhibit undesirable behaviors due to environmental drifts, which often occur in constantly-changing production settings. Continual Learning (CL) is the inherent self-healing approach for adapting the DRL agent in response to the environment's conditions shifts. However, successive shifts of considerable magnitude may cause the production environment to drift from its original state. Recent studies have shown that these environmental drifts tend to drive CL into long, or even unsuccessful, healing cycles, which arise from inefficiencies such as catastrophic forgetting, warm-starting failure, and slow convergence. In this paper, we propose Dr. DRL, an effective self-healing approach for DRL systems that integrates a novel mechanism of intentional forgetting into vanilla CL to overcome its main issues. Dr. DRL deliberately erases the DRL system's minor behaviors to systematically prioritize the adaptation of the key problem-solving skills. Using well-established DRL algorithms, Dr. DRL is compared with vanilla CL on various drifted environments. Dr. DRL is able to reduce, on average, the healing time and fine-tuning episodes by, respectively, 18.74% and 17.72%. Dr. DRL successfully helps agents to adapt to 19.63% of drifted environments left unsolved by vanilla CL while maintaining and even enhancing by up to 45% the obtained rewards for drifted environments that are resolved by both approaches.
    摘要 深度强化学习(DRL)在大规模生产中越来越普遍应用,如Netflix和Facebook。然而,与大多数数据驱动系统一样,DRL系统可能会出现不жела的行为,即因环境变化而导致的环境漂移。Continual Learning(CL)是DRLagent的自适应方法,但继续的大规模变化可能会让生产环境偏离原始状态。现有研究表明,这些环境变化可能会让CL进入长时间的或者无法成功的恢复循环,这些循环由多种不足所致,如恐慌忘记、温启失败和慢 converges。在这篇论文中,我们提出了Dr. DRL,一种有效的自适应方法,用于解决DRL系统中的主要问题。Dr. DRL通过novel的意图忘记机制,系统地优先级掌握DRL系统的关键问题解决技能。使用已知的DRL算法,Dr. DRL与vanilla CL进行比较,在多个漂移环境中显示出了更好的表现。Dr. DRL能够降低,在 average,恢复时间和精度调整集数量,分别降低18.74%和17.72%。Dr. DRL成功地帮助代理人适应了vanilla CL无法解决的19.63%漂移环境,同时保持和提高了对漂移环境的解决得到的奖励。

TAI-GAN: Temporally and Anatomically Informed GAN for early-to-late frame conversion in dynamic cardiac PET motion correction

  • paper_url: http://arxiv.org/abs/2308.12443
  • repo_url: https://github.com/gxq1998/tai-gan
  • paper_authors: Xueqi Guo, Luyao Shi, Xiongchao Chen, Bo Zhou, Qiong Liu, Huidong Xie, Yi-Hwa Liu, Richard Palyo, Edward J. Miller, Albert J. Sinusas, Bruce Spottiswoode, Chi Liu, Nicha C. Dvornek
  • for: 这个论文主要关注动态心脏正电子发射断层扫描（PET）图像序列中示踪剂分布变化迅速、跨帧差异大的问题，尤其是在早期帧中，常见的基于强度的图像配准技术并不适用。
  • methods: 作者提出使用生成方法来处理示踪剂分布的变化，以帮助现有的配准方法进行帧间匹配。具体而言，作者提出了一种 Temporally and Anatomically Informed Generative Adversarial Network（TAI-GAN），通过一个 all-to-one 映射将早期帧转换成参照帧风格的图像。
  • results: 作者在临床 $^{82}$Rb PET 数据集上验证了所提方法，发现 TAI-GAN 可以生成高质量的转换图像，与参照帧的真实图像相近；经过 TAI-GAN 转换后，运动估计精度和临床心肌血流（MBF）量化相比使用原始帧均有所改善。
    Abstract The rapid tracer kinetics of rubidium-82 ($^{82}$Rb) and high variation of cross-frame distribution in dynamic cardiac positron emission tomography (PET) raise significant challenges for inter-frame motion correction, particularly for the early frames where conventional intensity-based image registration techniques are not applicable. Alternatively, a promising approach utilizes generative methods to handle the tracer distribution changes to assist existing registration methods. To improve frame-wise registration and parametric quantification, we propose a Temporally and Anatomically Informed Generative Adversarial Network (TAI-GAN) to transform the early frames into the late reference frame using an all-to-one mapping. Specifically, a feature-wise linear modulation layer encodes channel-wise parameters generated from temporal tracer kinetics information, and rough cardiac segmentations with local shifts serve as the anatomical information. We validated our proposed method on a clinical $^{82}$Rb PET dataset and found that our TAI-GAN can produce converted early frames with high image quality, comparable to the real reference frames. After TAI-GAN conversion, motion estimation accuracy and clinical myocardial blood flow (MBF) quantification were improved compared to using the original frames. Our code is published at https://github.com/gxq1998/TAI-GAN.
    摘要 rubidium-82（$^{82}$Rb）的快速示踪剂动力学以及动态心脏正电子发射断层扫描（PET）中帧间示踪剂分布的高度变化 pose significant challenges for inter-frame motion correction, especially for early frames where conventional intensity-based image registration techniques are not applicable. Alternatively, a promising approach utilizes generative methods to handle tracer distribution changes to assist existing registration methods. To improve frame-wise registration and parametric quantification, we propose a Temporally and Anatomically Informed Generative Adversarial Network (TAI-GAN) to transform early frames into a late reference frame using an all-to-one mapping. Specifically, a feature-wise linear modulation layer encodes channel-wise parameters generated from temporal tracer kinetics information, and rough cardiac segmentations with local shifts serve as anatomical information. We validated our proposed method on a clinical $^{82}$Rb PET dataset and found that our TAI-GAN can produce converted early frames with high image quality, comparable to the real reference frames. After TAI-GAN conversion, motion estimation accuracy and clinical myocardial blood flow (MBF) quantification were improved compared to using the original frames. Our code is published at https://github.com/gxq1998/TAI-GAN.
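
The abstract mentions a feature-wise linear modulation (FiLM) layer conditioned on temporal tracer-kinetics information; below is a minimal PyTorch version of such a layer. The conditioning dimension, the use of 2-D feature maps, and the tensor shapes are assumptions for illustration, not the paper's exact design.

```python
# Feature-wise linear modulation (FiLM) conditioned on an auxiliary code; a sketch.
import torch
import torch.nn as nn

class FiLM(nn.Module):
    def __init__(self, cond_dim, n_channels):
        super().__init__()
        self.to_gamma = nn.Linear(cond_dim, n_channels)
        self.to_beta = nn.Linear(cond_dim, n_channels)

    def forward(self, feat, cond):
        # feat: (B, C, H, W) feature maps; cond: (B, cond_dim) conditioning vector,
        # e.g. a code derived from tracer kinetics.
        gamma = self.to_gamma(cond)[:, :, None, None]
        beta = self.to_beta(cond)[:, :, None, None]
        return gamma * feat + beta

film = FiLM(cond_dim=8, n_channels=64)
feat = torch.randn(2, 64, 32, 32)
cond = torch.randn(2, 8)
out = film(feat, cond)          # same shape as feat
```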

BaDExpert: Extracting Backdoor Functionality for Accurate Backdoor Input Detection

  • paper_url: http://arxiv.org/abs/2308.12439
  • repo_url: None
  • paper_authors: Tinghao Xie, Xiangyu Qi, Ping He, Yiming Li, Jiachen T. Wang, Prateek Mittal
  • For: 防止深度神经网络（DNNs）中的后门攻击（backdoor attacks）。
  • Methods: 基于反工程技术，从 backdoored 模型中提取出后门功能，并将其转化为高精度的后门输入检测器。
  • Results: 对16种State-of-the-Art（SOTA）后门攻击进行了有效防御，而无需干扰清洁功能。验证在多个数据集（CIFAR10、GTSRB和ImageNet）和不同的模型架构（ResNet、VGG、MobileNetV2和Vision Transformer）上。
    Abstract We present a novel defense, against backdoor attacks on Deep Neural Networks (DNNs), wherein adversaries covertly implant malicious behaviors (backdoors) into DNNs. Our defense falls within the category of post-development defenses that operate independently of how the model was generated. The proposed defense is built upon a novel reverse engineering approach that can directly extract backdoor functionality of a given backdoored model to a backdoor expert model. The approach is straightforward -- finetuning the backdoored model over a small set of intentionally mislabeled clean samples, such that it unlearns the normal functionality while still preserving the backdoor functionality, and thus resulting in a model (dubbed a backdoor expert model) that can only recognize backdoor inputs. Based on the extracted backdoor expert model, we show the feasibility of devising highly accurate backdoor input detectors that filter out the backdoor inputs during model inference. Further augmented by an ensemble strategy with a finetuned auxiliary model, our defense, BaDExpert (Backdoor Input Detection with Backdoor Expert), effectively mitigates 16 SOTA backdoor attacks while minimally impacting clean utility. The effectiveness of BaDExpert has been verified on multiple datasets (CIFAR10, GTSRB and ImageNet) across various model architectures (ResNet, VGG, MobileNetV2 and Vision Transformer).
    摘要 我们提出了一种新的防御机制,对深度神经网络(DNN)中的后门攻击。在这种攻击中,敌人将附加了黑客程式码到DNN中。我们的防御方法属于 poste development 防御,即在模型生成后进行防御。我们的防御方法基于一种新的反向工程方法,可以直接将黑客模型中的黑客功能扩展到一个名为“黑客专家模型”(Backdoor Expert Model)中。这种方法简单易行,只需要使用一小批 INTENTIONALLY 误 Labelled 的清洁样本进行调整,以让模型忘记正常功能,但保留黑客功能,从而产生了一个仅能识别黑客输入的模型。基于这个黑客专家模型,我们显示了如何设计高准确度的黑客输入检测器,以筛选掉黑客输入 durante 模型推导。另外,我们还提出了一个 ensemble 策略,将调整后的专家模型与一个调整后的副模型 ensemble together,从而产生了一个高效的防御机制,名为 BaDExpert(黑客输入检测器)。我们在多个数据集(CIFAR10、GTSRB 和 ImageNet)和多种模型架构(ResNet、VGG、MobileNetV2 和 Vision Transformer)上验证了 BaDExpert 的效果,发现它能够有效地抵销16种 SOTA 黑客攻击,而且对于清洁输入的影响相对轻微。
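
A hedged sketch of the extraction step described above: fine-tune a copy of the backdoored model on a small clean set whose labels are deliberately wrong, so normal functionality is unlearned while the backdoor shortcut tends to survive. Learning rate, step count, and the label-shuffling rule are assumptions, not the paper's recipe.

```python
# Extracting a "backdoor expert" by unlearning on mislabeled clean data; a sketch.
import copy
import torch
import torch.nn.functional as F

def extract_backdoor_expert(backdoored_model, clean_loader, n_classes,
                            lr=1e-4, steps=200, device="cpu"):
    expert = copy.deepcopy(backdoored_model).to(device).train()
    opt = torch.optim.SGD(expert.parameters(), lr=lr)
    it = iter(clean_loader)
    for _ in range(steps):
        try:
            x, y = next(it)
        except StopIteration:
            it = iter(clean_loader)
            x, y = next(it)
        x = x.to(device)
        # Intentionally mislabel: shift every label to a different class.
        wrong_y = (y + torch.randint(1, n_classes, y.shape)) % n_classes
        loss = F.cross_entropy(expert(x), wrong_y.to(device))
        opt.zero_grad()
        loss.backward()
        opt.step()
    # The expert now mostly recognizes backdoor inputs; agreement with the
    # original model at inference time can flag suspicious inputs.
    return expert.eval()
```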

Deploying Deep Reinforcement Learning Systems: A Taxonomy of Challenges

  • paper_url: http://arxiv.org/abs/2308.12438
  • repo_url: https://github.com/drldeploymentchallenges-icsme2023/replicationpackage
  • paper_authors: Ahmed Haj Yahmed, Altaf Allah Abbassi, Amin Nikanjam, Heng Li, Foutse Khomh
  • for: This paper investigates, through Stack Overflow (SO), the most popular Q&A forum for developers, the challenges practitioners face when deploying deep reinforcement learning (DRL) systems.
  • methods: The authors conducted an empirical study on SO to identify and understand the challenges related to deploying DRL systems. They categorized relevant SO posts by deployment platform and manually analyzed 357 posts to investigate the current state and prevalence of these challenges.
  • results: The study found that general interest in DRL deployment is growing and that DRL deployment is more difficult than other DRL issues. The authors also built a taxonomy of 31 unique challenges in deploying DRL to different platforms, with RL environment-related challenges being the most common and communication-related challenges the most difficult for practitioners.
    Abstract Deep reinforcement learning (DRL), leveraging Deep Learning (DL) in reinforcement learning, has shown significant potential in achieving human-level autonomy in a wide range of domains, including robotics, computer vision, and computer games. This potential justifies the enthusiasm and growing interest in DRL in both academia and industry. However, the community currently focuses mostly on the development phase of DRL systems, with little attention devoted to DRL deployment. In this paper, we propose an empirical study on Stack Overflow (SO), the most popular Q&A forum for developers, to uncover and understand the challenges practitioners faced when deploying DRL systems. Specifically, we categorized relevant SO posts by deployment platforms: server/cloud, mobile/embedded system, browser, and game engine. After filtering and manual analysis, we examined 357 SO posts about DRL deployment, investigated the current state, and identified the challenges related to deploying DRL systems. Then, we investigate the prevalence and difficulty of these challenges. Results show that the general interest in DRL deployment is growing, confirming the study's relevance and importance. Results also show that DRL deployment is more difficult than other DRL issues. Additionally, we built a taxonomy of 31 unique challenges in deploying DRL to different platforms. On all platforms, RL environment-related challenges are the most popular, and communication-related challenges are the most difficult among practitioners. We hope our study inspires future research and helps the community overcome the most common and difficult challenges practitioners face when deploying DRL systems.
    摘要 以下是我们的研究结果：1. DRL 部署的总体兴趣在增长，这证明了这项研究的重要性与相关性。2. DRL 部署比其他 DRL 问题更加困难。3. 在不同的平台上部署 DRL 系统时，RL 环境相关的挑战最为常见，而通信相关的挑战最为困难。4. 我们建立了一个包含 31 个独特挑战的 DRL 部署分类体系，这些挑战分布在不同的平台上。我们希望这项研究能够激发未来的研究，并帮助社区解决实践者在部署 DRL 系统时遇到的最常见和最困难的挑战。

Evolution of ESG-focused DLT Research: An NLP Analysis of the Literature

  • paper_url: http://arxiv.org/abs/2308.12420
  • repo_url: None
  • paper_authors: Walter Hernandez, Kamil Tylinski, Alastair Moore, Niall Roche, Nikhil Vadgama, Horst Treiblmaier, Jiangbo Shangguan, Paolo Tasca, Jiahua Xu
  • For: 本研究的目的是提供一种基于机器学习的系统性文献综述方法，用于探讨分布式账本技术（DLT）领域中环境、可持续发展和治理（ESG）方面的多个组成部分。
  • Methods: 本研究使用107篇种子论文建立了包含63,083个参考文献的引用网络，并将其缩减到24,539篇文献进行分析。然后，对46篇论文中的命名实体按12个顶层类别进行标注，并细化了DLT的ESG元素。随后基于转换器的自然语言处理模型，针对命名实体识别（NER）任务进行了微调。
  • Results: 本研究利用微调后的NER模型将论文库缩减到505篇关键论文，并通过命名实体和时间图分析，对DLT在ESG背景下的发展进行了系统性的文献综述。本研究的贡献包括一种用于DLT领域的机器学习驱动的系统性文献综述方法，以及一个专门面向DLT与ESG探索的NER数据集，包含54,808个命名实体。
    Abstract Distributed Ledger Technologies (DLTs) have rapidly evolved, necessitating comprehensive insights into their diverse components. However, a systematic literature review that emphasizes the Environmental, Sustainability, and Governance (ESG) components of DLT remains lacking. To bridge this gap, we selected 107 seed papers to build a citation network of 63,083 references and refined it to a corpus of 24,539 publications for analysis. Then, we labeled the named entities in 46 papers according to twelve top-level categories derived from an established technology taxonomy and enhanced the taxonomy by pinpointing DLT's ESG elements. Leveraging transformer-based language models, we fine-tuned a pre-trained language model for a Named Entity Recognition (NER) task using our labeled dataset. We used our fine-tuned language model to distill the corpus to 505 key papers, facilitating a literature review via named entities and temporal graph analysis on DLT evolution in the context of ESG. Our contributions are a methodology to conduct a machine learning-driven systematic literature review in the DLT field, placing a special emphasis on ESG aspects. Furthermore, we present a first-of-its-kind NER dataset, composed of 54,808 named entities, designed for DLT and ESG-related explorations.
    摘要 分布式记录技术(DLT)在短时间内快速发展,需要全面的检视其多样化组件。然而,一篇系统性的文献评论,强调环境、可持续发展和管理(ESG)方面的DLT组件,仍然缺失。为填补这一漏洞,我们选择了107个种子论文,建立了63,083个参考文献的公共网络,并将其缩小至24,539篇文献进行分析。然后,我们对46篇论文中的名称实体进行了12种顶级分类,基于已有的技术分类,并将DLT的ESG元素归类。通过使用基于转换器的自然语言处理模型,我们对这些名称实体进行了微调,并使用我们的微调模型来进行名称实体识别(NER)任务。我们使用了这些微调模型来缩小 corpus 到505个关键论文,以便通过名称实体和时间图分析来探讨DLT在ESG CONTEXT中的发展。我们的贡献包括一种在DLT领域进行机器学习驱动的系统性文献评论方法,以及一个由我们微调的NER数据集,包含54,808个名称实体,适用于DLT和ESG相关的探索。

Machine learning in parameter estimation of nonlinear systems

  • paper_url: http://arxiv.org/abs/2308.12393
  • repo_url: None
  • paper_authors: Kaushal Kumar
  • for: 这篇论文旨在探讨一种基于神经网络的参数估测方法,用于处理复杂的非线性系统。
  • methods: 这篇论文使用一个具有哈伯损失函数的神经网络,来探索非线性系统中参数的潜在行为。
  • results: 这篇论文透过训练神经网络使用噪音时间序数据,实现了参数的精确估测,并证明了这种方法的稳定性和灵活性。
    Abstract Accurately estimating parameters in complex nonlinear systems is crucial across scientific and engineering fields. We present a novel approach for parameter estimation using a neural network with the Huber loss function. This method taps into deep learning's abilities to uncover parameters governing intricate behaviors in nonlinear equations. We validate our approach using synthetic data and predefined functions that model system dynamics. By training the neural network with noisy time series data, it fine-tunes the Huber loss function to converge to accurate parameters. We apply our method to damped oscillators, Van der Pol oscillators, Lotka-Volterra systems, and Lorenz systems under multiplicative noise. The trained neural network accurately estimates parameters, evident from closely matching latent dynamics. Comparing true and estimated trajectories visually reinforces our method's precision and robustness. Our study underscores the Huber loss-guided neural network as a versatile tool for parameter estimation, effectively uncovering complex relationships in nonlinear systems. The method navigates noise and uncertainty adeptly, showcasing its adaptability to real-world challenges.
    摘要 估算复杂非线性系统中参数的精度是科学和工程领域的关键。我们提出了一种使用神经网络和战棘损失函数来进行参数估算的新方法。这种方法利用深度学习的能力来探索非线性方程中的参数。我们使用模拟数据和预定的函数来验证我们的方法。通过训练神经网络,它可以根据噪声时间序列数据来细化战棘损失函数,以达到准确的参数。我们在振荡者、范德波振荡器、洛特卡-沃尔特拉系统和洛兹系统下进行了multiplicative噪声的应用。训练神经网络可以准确地估算参数,可以从相似的潜在动力来证明。比较真实和估算的轨迹可以视觉地证明我们的方法的精度和可靠性。我们的研究证明了使用战棘损失函数导航神经网络为参数估算的方法,可以快速和稳定地探索复杂的非线性系统,并在噪声和不确定性中 navigation adaptively。
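
The paper trains a neural network under a Huber loss; the sketch below keeps only the core ingredient and fits the damping and frequency of a damped oscillator x(t) = exp(-gamma*t)*cos(omega*t) to noisy observations with PyTorch's Huber loss and plain gradient descent. The true parameter values, noise level, and initial guesses are made up for illustration.

```python
# Huber-loss parameter estimation for a damped oscillator; a simplified sketch.
import torch

torch.manual_seed(0)
t = torch.linspace(0, 10, 200)
true_gamma, true_omega = 0.3, 2.0
x_noisy = torch.exp(-true_gamma * t) * torch.cos(true_omega * t) \
          + 0.05 * torch.randn_like(t)

gamma = torch.tensor(0.1, requires_grad=True)   # initial guesses
omega = torch.tensor(1.8, requires_grad=True)
opt = torch.optim.Adam([gamma, omega], lr=0.05)
huber = torch.nn.HuberLoss(delta=0.1)

for step in range(2000):
    pred = torch.exp(-gamma * t) * torch.cos(omega * t)
    loss = huber(pred, x_noisy)
    opt.zero_grad()
    loss.backward()
    opt.step()

print(float(gamma), float(omega))   # estimates should move toward 0.3 and 2.0
```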

FOSA: Full Information Maximum Likelihood (FIML) Optimized Self-Attention Imputation for Missing Data

  • paper_url: http://arxiv.org/abs/2308.12388
  • repo_url: https://github.com/oudeng/fosa
  • paper_authors: Ou Deng, Qun Jin
  • for: 填充缺失数据,特别是复杂的数据集中的缺失值。
  • methods: 融合FIML估计和自注意力神经网络的FIML优化自注意力(FOSA)框架。
  • results: FOSA的实验表明,它在模拟数据和实际数据集上具有优于传统FIML技术的优势,包括准确性、计算效率和数据结构的适应性。即使SEM可能不准确,FOSA的自注意力结构仍能修复和优化投入值。FOSA在40%随机缺失情况下也能提供优秀的预测结果,证明其 Robustness 和广泛应用的潜力。
    Abstract In data imputation, effectively addressing missing values is pivotal, especially in intricate datasets. This paper delves into the FIML Optimized Self-attention (FOSA) framework, an innovative approach that amalgamates the strengths of Full Information Maximum Likelihood (FIML) estimation with the capabilities of self-attention neural networks. Our methodology commences with an initial estimation of missing values via FIML, subsequently refining these estimates by leveraging the self-attention mechanism. Our comprehensive experiments on both simulated and real-world datasets underscore FOSA's pronounced advantages over traditional FIML techniques, encapsulating facets of accuracy, computational efficiency, and adaptability to diverse data structures. Intriguingly, even in scenarios where the Structural Equation Model (SEM) might be mis-specified, leading to suboptimal FIML estimates, the robust architecture of FOSA's self-attention component adeptly rectifies and optimizes the imputation outcomes. Our empirical tests reveal that FOSA consistently delivers commendable predictions, even in the face of up to 40% random missingness, highlighting its robustness and potential for wide-scale applications in data imputation.
    摘要 在数据填充中,有效地处理缺失值是非常重要,特别是在复杂的数据集中。这篇论文探讨了FIML优化自注意(FOSA)框架,这是一种将FIML估计的优点与自注意神经网络的能力相结合的创新方法。我们的方法开始于初步估计缺失值via FIML,然后通过自注意机制来进一步改进这些估计。我们的全面实验表明,FOSA在 simulate 和实际数据集上具有明显的优势,包括精度、计算效率和适应不同数据结构。即使SEM可能是错误的,导致FIML估计不佳,FOSA的自注意结构仍能够正确地修正和优化填充结果。我们的实验表明,FOSA在40%的随机缺失情况下仍然能够提供优秀的预测结果,这表明其Robustness和广泛应用的潜力。

Open-set Face Recognition with Neural Ensemble, Maximal Entropy Loss and Feature Augmentation

  • paper_url: http://arxiv.org/abs/2308.12371
  • repo_url: None
  • paper_authors: Rafael Henrique Vareto, Manuel Günther, William Robson Schwartz
  • for: 开放集face认证问题中,生物 metric系统缺乏所有已注册的主体的完整知识,因此需要避免未注册的主体的面征amples被识别为先前注册的标识体。
  • methods: 该研究提出一种新的方法,即将 ensemble of 精简神经网络与边缘基于的成本函数结合,通过采用外部数据库或在训练时间使用新的混合特征生成方法来获取补充的负样本。
  • results: 研究在well-known LFW和IJB-C数据集上进行了实验,结果显示该方法能够提高closed和开放集标识率。
    Abstract Open-set face recognition refers to a scenario in which biometric systems have incomplete knowledge of all existing subjects. Therefore, they are expected to prevent face samples of unregistered subjects from being identified as previously enrolled identities. This watchlist context adds an arduous requirement that calls for the dismissal of irrelevant faces by focusing mainly on subjects of interest. As a response, this work introduces a novel method that associates an ensemble of compact neural networks with a margin-based cost function that explores additional samples. Supplementary negative samples can be obtained from external databases or synthetically built at the representation level in training time with a new mix-up feature augmentation approach. Deep neural networks pre-trained on large face datasets serve as the preliminary feature extraction module. We carry out experiments on well-known LFW and IJB-C datasets where results show that the approach is able to boost closed and open-set identification rates.
    摘要 开放集face认识指的是一种情况,在生物认证系统中存在部分知情的人员。因此,它们需要避免已经注册的人员面部样本被识别为未注册的人员。这个观察者上下文添加了一项艰辛的要求,即排除不相关的面部样本,主要关注关注点。为回应这个问题,本研究提出了一种新的方法,它将一组紧凑型神经网络 ensemble 与一种基于margin的成本函数相结合,并利用外部数据库或者在训练时期synthesize constructed 的新混合特征增强方法来获得补充性质样本。启用大面库 dataset 进行预处理的深度神经网络 serving 作为先期特征提取模块。我们在well-known LFW和IJB-C datasets上进行了实验,结果显示该方法能够提高closed和open-set认证率。
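
A small sketch of building synthetic negative samples at the representation level by mixing embeddings of two different identities, in the spirit of the mix-up feature augmentation mentioned above. The Beta(alpha, alpha) coefficient and the cross-identity rule are assumptions, not the paper's exact recipe.

```python
# Feature-level mix-up to synthesize "unknown" negatives; an illustrative sketch.
import numpy as np

def mixup_negative_features(feats, labels, alpha=0.2, rng=None):
    """feats: (N, D) embeddings from a pre-trained face model; labels: (N,) identity ids.
    Returns synthetic features meant to be treated as open-set negatives."""
    rng = rng or np.random.default_rng()
    idx_a = rng.permutation(len(feats))
    idx_b = rng.permutation(len(feats))
    keep = labels[idx_a] != labels[idx_b]          # only mix across identities
    lam = rng.beta(alpha, alpha, size=int(keep.sum()))[:, None]
    mixed = lam * feats[idx_a[keep]] + (1 - lam) * feats[idx_b[keep]]
    return mixed
```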

SafeAR: Towards Safer Algorithmic Recourse by Risk-Aware Policies

  • paper_url: http://arxiv.org/abs/2308.12367
  • repo_url: None
  • paper_authors: Haochen Wu, Shubham Sharma, Sunandita Patra, Sriram Gopalakrishnan
  • for: 提供了一种基于机器学习模型的决策中的抗议和改善机制,以便在决策中帮助人们更好地处理不利的结果。
  • methods: 使用了sequential algorithmic recourse的方法,考虑了变量的不确定性和风险,并使用了金融领域的Value at Risk和Conditional Value at Risk等风险措施来衡量风险。
  • results: 通过应用该方法于两个实际数据集,发现了不同风险偏好的策略之间的区别,并且在使用不同的抗议措施时,可以更好地满足用户的需求。
    Abstract With the growing use of machine learning (ML) models in critical domains such as finance and healthcare, the need to offer recourse for those adversely affected by the decisions of ML models has become more important; individuals ought to be provided with recommendations on actions to take for improving their situation and thus receive a favorable decision. Prior work on sequential algorithmic recourse -- which recommends a series of changes -- focuses on action feasibility and uses the proximity of feature changes to determine action costs. However, the uncertainties of feature changes and the risk of higher than average costs in recourse have not been considered. It is undesirable if a recourse could (with some probability) result in a worse situation from which recovery requires an extremely high cost. It is essential to incorporate risks when computing and evaluating recourse. We call the recourse computed with such risk considerations as Safer Algorithmic Recourse (SafeAR). The objective is to empower people to choose a recourse based on their risk tolerance. In this work, we discuss and show how existing recourse desiderata can fail to capture the risk of higher costs. We present a method to compute recourse policies that consider variability in cost and connect algorithmic recourse literature with risk-sensitive reinforcement learning. We also adopt measures ``Value at Risk'' and ``Conditional Value at Risk'' from the financial literature to summarize risk concisely. We apply our method to two real-world datasets and compare policies with different levels of risk-aversion using risk measures and recourse desiderata (sparsity and proximity).
    摘要 随着机器学习(ML)模型在金融和医疗领域的使用的增长,为那些由ML模型决策所受到的不良影响者提供了救济的需求变得更加重要。人们应该被提供改善其情况的建议,并从而获得有利的决策。先前的序列算法救济工作(sequential algorithmic recourse),关注行动可行性,使用特征变化的邻近来确定行动成本。然而,特征变化的不确定性和救济成本的风险没有被考虑。如果救济可能(有一定的概率)导致情况更加糟糕,从而需要极高的成本来恢复,这是不жела的。因此,在计算和评估救济时需要考虑风险。我们称之为考虑风险的救济为Safer Algorithmic Recourse(SafeAR)。SafeAR的目标是让人们根据其风险忍耐度选择救济。在这种工作中,我们讨论了现有的救济需求可能无法捕捉成本变化的风险。我们提出了一种计算救济策略的方法,该方法考虑特征变化的变化和连接算法救济文献与风险敏感的再增强学习。我们还采用金融文献中的“Value at Risk”和“Conditional Value at Risk”等度量准确地描述风险。我们应用我们的方法于两个实际数据集,并与不同风险偏好的策略进行比较,使用风险度量和救济需求(简洁和邻近)。
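
The two financial risk measures adopted here are straightforward to compute from a sample of recourse costs; the sketch below uses a made-up cost distribution in place of costs obtained by rolling out a recourse policy.

```python
# Value at Risk and Conditional Value at Risk over recourse costs; a minimal sketch.
import numpy as np

def value_at_risk(costs, level=0.95):
    """Cost threshold exceeded with probability at most 1 - level."""
    return float(np.quantile(costs, level))

def conditional_value_at_risk(costs, level=0.95):
    """Expected cost within the worst (1 - level) tail."""
    var = value_at_risk(costs, level)
    return float(costs[costs >= var].mean())

costs = np.random.default_rng(0).gamma(shape=2.0, scale=1.5, size=10_000)
print(value_at_risk(costs), conditional_value_at_risk(costs))
```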

Renormalizing Diffusion Models

  • paper_url: http://arxiv.org/abs/2308.12355
  • repo_url: None
  • paper_authors: Jordan Cotler, Semon Rezchikov
  • for: 学习 inverse renormalization group flows of statistical and quantum field theories
  • methods: 使用 diffusion models 学习 inverse 过程,并将其与物理的 renormalization group schemes 结合起来
  • results: 提出了一种基于机器学习的方法来研究场理论，并实现了在 lattice field theory 中使用 adaptive bridge（或平行温度）抽样器。
    Abstract We explain how to use diffusion models to learn inverse renormalization group flows of statistical and quantum field theories. Diffusion models are a class of machine learning models which have been used to generate samples from complex distributions, such as the distribution of natural images, by learning the inverse process to a diffusion process which adds noise to the data until the distribution of the data is pure noise. Nonperturbative renormalization group schemes can naturally be written as diffusion processes in the space of fields. We combine these observations in a concrete framework for building ML-based models for studying field theories, in which the models learn the inverse process to an explicitly-specified renormalization group scheme. We detail how these models define a class of adaptive bridge (or parallel tempering) samplers for lattice field theory. Because renormalization group schemes have a physical meaning, we provide explicit prescriptions for how to compare results derived from models associated to several different renormalization group schemes of interest. We also explain how to use diffusion models in a variational method to find ground states of quantum systems. We apply some of our methods to numerically find RG flows of interacting statistical field theories. From the perspective of machine learning, our work provides an interpretation of multiscale diffusion models, and gives physically-inspired suggestions for diffusion models which should have novel properties.
    摘要 我们说明如何使用扩散模型来学习统计场论与量子场论的逆重整化群流。扩散模型是一类机器学习模型，它通过学习“向数据不断加入噪声、直到数据分布变为纯噪声”这一扩散过程的逆过程，来从复杂分布（如自然图像分布）中生成样本。非微扰的重整化群方案可以自然地写成场空间中的扩散过程。我们把这些观察结合成一个具体框架，用机器学习模型研究场论：这些模型学习一个明确指定的重整化群方案的逆过程。我们详细说明这些模型如何为格点场论定义一类自适应桥（或平行回火）采样器。由于重整化群方案具有物理意义，我们给出了明确的做法，用于比较来自不同重整化群方案的模型所得到的结果。我们还说明如何在变分方法中使用扩散模型来寻找量子系统的基态。我们将部分方法应用于数值求解相互作用统计场论的重整化群流。从机器学习的角度来看，我们的工作为多尺度扩散模型提供了一种解释，并给出了具有新性质的扩散模型的物理启发式建议。

Improving Generative Model-based Unfolding with Schrödinger Bridges

  • paper_url: http://arxiv.org/abs/2308.12351
  • repo_url: None
  • paper_authors: Sascha Diefenbacher, Guan-Horng Liu, Vinicius Mikuni, Benjamin Nachman, Weili Nie
  • for: 这个论文是为了探讨基于机器学习的 unfolding 方法，用于实现无分箱、高维的微分截面测量。
  • methods: 这个论文使用了 Schroedinger Bridges 和 diffusion models 等方法。
  • results: 论文表明,SBUnfold 方法可以在 Synthetic Z+jets 数据集上达到优秀的性能。
    Abstract Machine learning-based unfolding has enabled unbinned and high-dimensional differential cross section measurements. Two main approaches have emerged in this research area: one based on discriminative models and one based on generative models. The main advantage of discriminative models is that they learn a small correction to a starting simulation while generative models scale better to regions of phase space with little data. We propose to use Schroedinger Bridges and diffusion models to create SBUnfold, an unfolding approach that combines the strengths of both discriminative and generative models. The key feature of SBUnfold is that its generative model maps one set of events into another without having to go through a known probability density as is the case for normalizing flows and standard diffusion models. We show that SBUnfold achieves excellent performance compared to state of the art methods on a synthetic Z+jets dataset.
    摘要 基于机器学习的 unfolding 技术已经实现了无分箱和高维的微分截面测量。这一研究领域中出现了两种主要方法：一种基于判别模型，另一种基于生成模型。判别模型的主要优点是它们只需学习对初始模拟的小幅修正，而生成模型在数据稀缺的相空间区域中扩展性更好。我们提议使用 Schrödinger Bridges 和扩散模型来构建 SBUnfold，一种结合了判别模型和生成模型优点的 unfolding 方法。SBUnfold 的关键特点是其生成模型可以将一组事件映射为另一组事件，而无需像 normalizing flows 和标准扩散模型那样经过一个已知的概率密度。我们展示了 SBUnfold 在一个 synthetic Z+jets 数据集上的表现优于最先进的方法。

D4: Improving LLM Pretraining via Document De-Duplication and Diversification

  • paper_url: http://arxiv.org/abs/2308.12284
  • repo_url: None
  • paper_authors: Kushal Tirumala, Daniel Simig, Armen Aghajanyan, Ari S. Morcos
  • for: 这篇论文的目的是探讨大型自然语言模型(LLM)的预训练和下游性能如何受到数据选择的影响。
  • methods: 这篇论文使用了预训练模型embeddings进行数据选择,并证明了智能重复数据可以提高预训练速度(20%的效率提升)和下游任务的均值准确率(最高达2%)。
  • results: 这篇论文的结果表明,智能数据选择可以显著提高LLM预训练的性能,并质疑了 randomly sampling web data 的常见做法。
    Abstract Over recent years, an increasing amount of compute and data has been poured into training large language models (LLMs), usually by doing one-pass learning on as many tokens as possible randomly selected from large-scale web corpora. While training on ever-larger portions of the internet leads to consistent performance improvements, the size of these improvements diminishes with scale, and there has been little work exploring the effect of data selection on pre-training and downstream performance beyond simple de-duplication methods such as MinHash. Here, we show that careful data selection (on top of de-duplicated data) via pre-trained model embeddings can speed up training (20% efficiency gains) and improves average downstream accuracy on 16 NLP tasks (up to 2%) at the 6.7B model scale. Furthermore, we show that repeating data intelligently consistently outperforms baseline training (while repeating random data performs worse than baseline training). Our results indicate that clever data selection can significantly improve LLM pre-training, calls into question the common practice of training for a single epoch on as much data as possible, and demonstrates a path to keep improving our models past the limits of randomly sampling web data.
    摘要 在最近几年,越来越多的计算和数据被投入到训练大型语言模型(LLM)中,通常是通过单 passes 学习大量网络资料中的随机选择进行。虽然在越来越大的规模上进行训练会导致性能提高,但这些提高的大小随着规模减少,而且有少量的研究探讨了数据选择对预训练和下游性能的影响 beyond 简单的去重方法such as MinHash。我们显示,通过预训练模型 embedding 进行精心的数据选择(以上下文为了减少数据重复)可以提高训练效率(20%效率提升)和提高16种 NLP 任务的平均下游准确率(最高达2%)。此外,我们还显示,通过智能重复数据可以不断超过基eline训练(而Random data 重复的情况下则比基eline训练更差)。我们的结果表明,精心的数据选择可以显著提高 LLM 预训练,质疑了训练一次性处理大量网络数据的常见做法,并提出了继续改进我们的模型的路径。
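
A hedged sketch of embedding-based data selection: drop near-duplicate documents by cosine similarity of pre-trained embeddings, then greedily keep a diverse subset with farthest-point selection. The threshold and the greedy rule are illustrative and not the exact D4 procedure; the quadratic similarity matrix also limits this to small corpora.

```python
# De-duplication plus diversification over document embeddings; an illustrative sketch.
import numpy as np

def select_documents(embs, keep_n, dup_threshold=0.95):
    """embs: (N, D) document embeddings from a pre-trained model. Returns kept indices."""
    embs = embs / np.linalg.norm(embs, axis=1, keepdims=True)
    sims = embs @ embs.T
    # 1) Semantic de-duplication: greedily drop docs too close to an earlier survivor.
    kept = []
    for i in range(len(embs)):
        if all(sims[i, j] < dup_threshold for j in kept):
            kept.append(i)
    # 2) Diversification: farthest-point sampling among the survivors.
    selected = [kept[0]]
    while len(selected) < min(keep_n, len(kept)):
        dists = np.min(1.0 - sims[np.ix_(kept, selected)], axis=1)
        selected.append(kept[int(np.argmax(dists))])
    return selected
```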

Extended Linear Regression: A Kalman Filter Approach for Minimizing Loss via Area Under the Curve

  • paper_url: http://arxiv.org/abs/2308.12280
  • repo_url: None
  • paper_authors: Gokulprasath R
  • for: 增强线性回归模型,使用kalman filter和分析曲线面积来降低损失。
  • methods: 使用随机梯度下降(SGD)更新参数,并使用kalman filter来预测下一个融合参数。
  • results: 实现了一个优化的线性回归方程,并且可以避免常量参数更新和使用完整数据集。但需要考虑计算复杂性。
    Abstract This research enhances linear regression models by integrating a Kalman filter and analysing curve areas to minimize loss. The goal is to develop an optimal linear regression equation using stochastic gradient descent (SGD) for weight updating. Our approach involves a stepwise process, starting with user-defined parameters. The linear regression model is trained using SGD, tracking weights and loss separately and zipping them finally. A Kalman filter is then trained based on weight and loss arrays to predict the next consolidated weights. Predictions result from multiplying input averages with weights, evaluated for loss to form a weight-versus-loss curve. The curve's equation is derived using the two-point formula, and area under the curve is calculated via integration. The linear regression equation with minimum area becomes the optimal curve for prediction. Benefits include avoiding constant weight updates via gradient descent and working with partial datasets, unlike methods needing the entire set. However, computational complexity should be considered. The Kalman filter's accuracy might diminish beyond a certain prediction range.
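
A loose sketch of the pipeline as this abstract describes it: SGD on a linear model while logging (weight, loss) pairs, a scalar Kalman filter smoothing the weight trajectory, and the area under the weight-versus-loss curve obtained by numerical integration. The one-weight setup and the filter's noise parameters are simplifying assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, 200)
y = 3.0 * x + rng.normal(scale=0.1, size=200)     # ground-truth slope 3.0

# 1) Train y = w * x with SGD, logging weight and loss at every step.
w, lr, weights, losses = 0.0, 0.05, [], []
for xi, yi in zip(x, y):
    err = w * xi - yi
    w -= lr * err * xi
    weights.append(w)
    losses.append(0.5 * err ** 2)

# 2) Scalar Kalman filter (random-walk model) over the logged weights,
#    giving a smoothed estimate of the "consolidated" weight.
est, P, Q, R = weights[0], 1.0, 1e-4, 1e-2
for z in weights[1:]:
    P += Q                      # predict
    K = P / (P + R)             # Kalman gain
    est += K * (z - est)        # update with the observed weight
    P *= 1.0 - K
print("Kalman-smoothed weight:", round(est, 3))

# 3) Area under the weight-versus-loss curve (trapezoidal rule); a smaller
#    area is taken as indicating the better regression equation.
order = np.argsort(weights)
w_sorted = np.array(weights)[order]
l_sorted = np.array(losses)[order]
area = float(np.sum(0.5 * (l_sorted[1:] + l_sorted[:-1]) * np.diff(w_sorted)))
print("area under weight-vs-loss curve:", round(area, 4))
```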

On-Manifold Projected Gradient Descent

  • paper_url: http://arxiv.org/abs/2308.12279
  • repo_url: https://github.com/JonasGrabbe/GradientDecentOnManifolds
  • paper_authors: Aaron Mahler, Tyrus Berry, Tom Stephens, Harbir Antil, Michael Merritt, Jeanie Schreiber, Ioannis Kevrekidis
  • for: The goal is a computable, direct, and mathematically rigorous approximation of the differential geometry of class manifolds for high-dimensional data, together with nonlinear projections from input space onto these class manifolds.
  • methods: Class manifolds are approximated with conformally invariant diffusion maps (CIDM), a Nyström projection is developed to map new points onto them, and the spectral exterior calculus (SEC) is used to obtain geometric quantities such as tangent vectors.
  • results: The tools yield adversarial examples that lie on a class manifold yet fool a classifier, and the resulting misclassifications become explainable as human-understandable manipulations expressed in the manifold's semantic basis.
    Abstract This work provides a computable, direct, and mathematically rigorous approximation to the differential geometry of class manifolds for high-dimensional data, along with nonlinear projections from input space onto these class manifolds. The tools are applied to the setting of neural network image classifiers, where we generate novel, on-manifold data samples, and implement a projected gradient descent algorithm for on-manifold adversarial training. The susceptibility of neural networks (NNs) to adversarial attack highlights the brittle nature of NN decision boundaries in input space. Introducing adversarial examples during training has been shown to reduce the susceptibility of NNs to adversarial attack; however, it has also been shown to reduce the accuracy of the classifier if the examples are not valid examples for that class. Realistic "on-manifold" examples have been previously generated from class manifolds in the latent of an autoencoder. Our work explores these phenomena in a geometric and computational setting that is much closer to the raw, high-dimensional input space than can be provided by VAE or other black box dimensionality reductions. We employ conformally invariant diffusion maps (CIDM) to approximate class manifolds in diffusion coordinates, and develop the Nystr\"{o}m projection to project novel points onto class manifolds in this setting. On top of the manifold approximation, we leverage the spectral exterior calculus (SEC) to determine geometric quantities such as tangent vectors of the manifold. We use these tools to obtain adversarial examples that reside on a class manifold, yet fool a classifier. These misclassifications then become explainable in terms of human-understandable manipulations within the data, by expressing the on-manifold adversary in the semantic basis on the manifold.
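
A toy sketch of on-manifold projected gradient descent as the abstract frames it: each adversarial ascent step is followed by a projection back onto an approximation of the class manifold. A PCA subspace and a linear classifier stand in for the paper's CIDM/Nyström machinery and neural network; both are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

# Class data lying near a 2-D subspace of a 10-D input space.
basis = np.linalg.qr(rng.normal(size=(10, 2)))[0]          # orthonormal 10x2
data = rng.normal(size=(300, 2)) @ basis.T

# Placeholder linear classifier: score > 0 means "other class".
w_clf = rng.normal(size=10)

mean = data.mean(axis=0)
_, _, vt = np.linalg.svd(data - mean, full_matrices=False)
components = vt[:2].T                                      # top-2 directions

def project_onto_manifold(x):
    """Project a point onto the affine PCA subspace (manifold stand-in)."""
    return mean + (x - mean) @ components @ components.T

# On-manifold PGD: ascend the classifier score, project back every step.
x = data[0].copy()
step = 0.1
for _ in range(50):
    x = x + step * w_clf / np.linalg.norm(w_clf)            # gradient of w.x
    x = project_onto_manifold(x)                            # stay on the manifold

print("classifier score before:", round(float(w_clf @ data[0]), 3))
print("classifier score after :", round(float(w_clf @ x), 3))
```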

LCANets++: Robust Audio Classification using Multi-layer Neural Networks with Lateral Competition

  • paper_url: http://arxiv.org/abs/2308.12882
  • repo_url: None
  • paper_authors: Sayanton V. Dibbo, Juston S. Moore, Garrett T. Kenyon, Michael A. Teti
  • for: Audio classification aims to recognize audio signals, including speech commands or sound events, but current audio classifiers are vulnerable to perturbations and adversarial attacks.
  • methods: To address these challenges, the paper introduces LCANets++, which are CNNs that perform sparse coding in multiple layers via the Locally Competitive Algorithm (LCA).
  • results: LCANets++ are more robust than standard CNNs and LCANets against perturbations and adversarial attacks, such as background noise and black-box and white-box attacks.
    Abstract Audio classification aims at recognizing audio signals, including speech commands or sound events. However, current audio classifiers are susceptible to perturbations and adversarial attacks. In addition, real-world audio classification tasks often suffer from limited labeled data. To help bridge these gaps, previous work developed neuro-inspired convolutional neural networks (CNNs) with sparse coding via the Locally Competitive Algorithm (LCA) in the first layer (i.e., LCANets) for computer vision. LCANets learn in a combination of supervised and unsupervised learning, reducing dependency on labeled samples. Motivated by the fact that auditory cortex is also sparse, we extend LCANets to audio recognition tasks and introduce LCANets++, which are CNNs that perform sparse coding in multiple layers via LCA. We demonstrate that LCANets++ are more robust than standard CNNs and LCANets against perturbations, e.g., background noise, as well as black-box and white-box attacks, e.g., evasion and fast gradient sign (FGSM) attacks.
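
A minimal numpy sketch of sparse coding with the Locally Competitive Algorithm (LCA), the mechanism that LCANets++ apply in several layers. The random dictionary, threshold, and time constant below are illustrative; an actual LCANet layer would use learned convolutional dictionaries over audio features.

```python
import numpy as np

def lca_sparse_code(x, Phi, lam=0.1, tau=0.1, n_steps=200):
    """Sparse code a with x ~= Phi @ a via Locally Competitive Algorithm dynamics.

    Neurons are driven toward Phi.T @ x while laterally inhibiting each other
    through the Gram matrix; a soft threshold maps membrane potentials u to
    sparse activations a.
    """
    b = Phi.T @ x                                   # feed-forward drive
    G = Phi.T @ Phi - np.eye(Phi.shape[1])          # lateral inhibition (no self term)
    u = np.zeros(Phi.shape[1])                      # membrane potentials
    soft = lambda v: np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)
    for _ in range(n_steps):
        u += tau * (b - u - G @ soft(u))
    return soft(u)

rng = np.random.default_rng(3)
Phi = rng.normal(size=(64, 128))
Phi /= np.linalg.norm(Phi, axis=0)                  # unit-norm dictionary atoms
a_true = np.zeros(128); a_true[[5, 40, 99]] = [1.0, -0.7, 0.5]
x = Phi @ a_true
a_hat = lca_sparse_code(x, Phi)
print("non-zero coefficients recovered:", int(np.sum(np.abs(a_hat) > 1e-3)))
```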

Language Reward Modulation for Pretraining Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2308.12270
  • repo_url: https://github.com/ademiadeniji/lamp
  • paper_authors: Ademi Adeniji, Amber Xie, Carmelo Sferrazza, Younggyo Seo, Stephen James, Pieter Abbeel
  • for: The paper examines the use of learned reward functions (LRFs) for solving sparse-reward reinforcement learning (RL) tasks.
  • methods: It proposes using vision-language models (VLMs) as a pretraining signal for RL: a frozen VLM generates noisy but shaped exploration rewards from the contrastive alignment between diverse language instructions and the agent's image observations.
  • results: The approach can warm-start sample-efficient RL on robot manipulation tasks in RLBench.
    Abstract Using learned reward functions (LRFs) as a means to solve sparse-reward reinforcement learning (RL) tasks has yielded some steady progress in task-complexity through the years. In this work, we question whether today's LRFs are best-suited as a direct replacement for task rewards. Instead, we propose leveraging the capabilities of LRFs as a pretraining signal for RL. Concretely, we propose LAnguage Reward Modulated Pretraining (LAMP) which leverages the zero-shot capabilities of Vision-Language Models (VLMs) as a pretraining utility for RL as opposed to a downstream task reward. LAMP uses a frozen, pretrained VLM to scalably generate noisy, albeit shaped exploration rewards by computing the contrastive alignment between a highly diverse collection of language instructions and the image observations of an agent in its pretraining environment. LAMP optimizes these rewards in conjunction with standard novelty-seeking exploration rewards with reinforcement learning to acquire a language-conditioned, pretrained policy. Our VLM pretraining approach, which is a departure from previous attempts to use LRFs, can warmstart sample-efficient learning on robot manipulation tasks in RLBench.
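
A sketch of the reward computation the abstract describes: the exploration reward is the alignment between a frozen VLM's embedding of the current image observation and embeddings of sampled language instructions. The embedding functions are random placeholders standing in for a CLIP-style model, and averaging over instructions is an assumption.

```python
import numpy as np

rng = np.random.default_rng(4)
EMB_DIM = 64

def embed_image(obs):
    """Placeholder for a frozen VLM image encoder (e.g. a CLIP-style model)."""
    return rng.normal(size=EMB_DIM)

def embed_text(instruction):
    """Placeholder for the matching frozen text encoder."""
    return rng.normal(size=EMB_DIM)

def vlm_alignment_reward(obs, instructions):
    """Mean cosine alignment between the image and a batch of instructions."""
    img = embed_image(obs)
    img = img / np.linalg.norm(img)
    sims = []
    for ins in instructions:
        txt = embed_text(ins)
        sims.append(float(img @ (txt / np.linalg.norm(txt))))
    return float(np.mean(sims))

instructions = ["pick up the red block", "open the drawer", "push the button"]
obs = np.zeros((84, 84, 3))                   # dummy image observation
print("pretraining reward:", round(vlm_alignment_reward(obs, instructions), 3))
```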

FECoM: A Step towards Fine-Grained Energy Measurement for Deep Learning

  • paper_url: http://arxiv.org/abs/2308.12264
  • repo_url: None
  • paper_authors: Saurabhsingh Rajput, Tim Widmayer, Ziyuan Shang, Maria Kechagia, Federica Sarro, Tushar Sharma
  • for: The paper aims to improve the measurement, and in turn the optimization, of the energy consumption of deep learning (DL) models.
  • methods: It introduces FECoM (Fine-grained Energy Consumption Meter), which measures energy at a fine granularity via static instrumentation while accounting for factors such as computational load and temperature stability.
  • results: FECoM is used to profile the energy consumption of TensorFlow APIs and to investigate how parameter size and execution time affect energy consumption.
    Abstract With the increasing usage, scale, and complexity of Deep Learning (DL) models, their rapidly growing energy consumption has become a critical concern. Promoting green development and energy awareness at different granularities is the need of the hour to limit carbon emissions of DL systems. However, the lack of standard and repeatable tools to accurately measure and optimize energy consumption at a fine granularity (e.g., at method level) hinders progress in this area. In this paper, we introduce FECoM (Fine-grained Energy Consumption Meter), a framework for fine-grained DL energy consumption measurement. Specifically, FECoM provides researchers and developers a mechanism to profile DL APIs. FECoM addresses the challenges of measuring energy consumption at fine-grained level by using static instrumentation and considering various factors, including computational load and temperature stability. We assess FECoM's capability to measure fine-grained energy consumption for one of the most popular open-source DL frameworks, namely TensorFlow. Using FECoM, we also investigate the impact of parameter size and execution time on energy consumption, enriching our understanding of TensorFlow APIs' energy profiles. Furthermore, we elaborate on the considerations, issues, and challenges that one needs to consider while designing and implementing a fine-grained energy consumption measurement tool. We hope this work will facilitate further advances in DL energy measurement and the development of energy-aware practices for DL systems.
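
A sketch of the instrumentation idea behind fine-grained energy measurement: wrap an individual API call, sample power around it, and report energy as average power times duration. The power probe and the settling step are placeholders; a real tool would read hardware counters (for example GPU or CPU power sensors) and check temperature stability properly.

```python
import time
from contextlib import contextmanager

def read_power_watts():
    """Placeholder power probe; a real meter would query GPU/CPU power sensors."""
    return 75.0  # pretend the device draws a constant 75 W

@contextmanager
def measure_energy(api_name, settle_seconds=0.0):
    """Report duration and approximate energy for the wrapped call."""
    time.sleep(settle_seconds)            # crude stand-in for temperature settling
    p_before = read_power_watts()
    t_start = time.perf_counter()
    yield
    duration = time.perf_counter() - t_start
    p_after = read_power_watts()
    energy_joules = 0.5 * (p_before + p_after) * duration  # two-sample trapezoid
    print(f"{api_name}: {duration:.4f} s, ~{energy_joules:.2f} J")

# Example: profiling one call at method-level granularity.
with measure_energy("toy_workload"):
    sum(i * i for i in range(1_000_000))
```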

Learning from Negative User Feedback and Measuring Responsiveness for Sequential Recommenders

  • paper_url: http://arxiv.org/abs/2308.12256
  • repo_url: None
  • paper_authors: Yueqi Wang, Yoni Halpern, Shuo Chang, Jingchen Feng, Elaine Ya Le, Longfei Li, Xujian Liang, Min-Cheng Huang, Shane Li, Alex Beutel, Yaping Zhang, Shuchao Bi
  • for: The paper focuses on learning from, and responding to, negative user feedback in sequential retrieval models.
  • methods: Explicit and implicit negative feedback is incorporated into the training objective through a "not-to-recommend" loss that optimizes the log-likelihood of not recommending items with negative feedback.
  • results: Live experiments show improved handling of negative feedback, and a counterfactual simulation framework comparing recommender responses across different user actions confirms improved responsiveness.
    Abstract Sequential recommenders have been widely used in industry due to their strength in modeling user preferences. While these models excel at learning a user's positive interests, less attention has been paid to learning from negative user feedback. Negative user feedback is an important lever of user control, and comes with an expectation that recommenders should respond quickly and reduce similar recommendations to the user. However, negative feedback signals are often ignored in the training objective of sequential retrieval models, which primarily aim at predicting positive user interactions. In this work, we incorporate explicit and implicit negative user feedback into the training objective of sequential recommenders in the retrieval stage using a "not-to-recommend" loss function that optimizes for the log-likelihood of not recommending items with negative feedback. We demonstrate the effectiveness of this approach using live experiments on a large-scale industrial recommender system. Furthermore, we address a challenge in measuring recommender responsiveness to negative feedback by developing a counterfactual simulation framework to compare recommender responses between different user actions, showing improved responsiveness from the modeling change.
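
A sketch of the "not-to-recommend" objective as described above: besides the usual softmax likelihood of the positively engaged item, items with negative feedback contribute the log-likelihood of not being recommended. The combination weight is an assumption.

```python
import numpy as np

def not_to_recommend_loss(logits, pos_item, neg_items, neg_weight=1.0):
    """Softmax retrieval loss plus a term for items with negative feedback."""
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    pos_term = -np.log(probs[pos_item] + 1e-12)        # do recommend the positive
    neg_term = -sum(np.log(1.0 - probs[i] + 1e-12)     # do NOT recommend these
                    for i in neg_items)
    return pos_term + neg_weight * neg_term

rng = np.random.default_rng(5)
logits = rng.normal(size=1000)                         # scores over the item corpus
print(round(not_to_recommend_loss(logits, pos_item=42, neg_items=[7, 512]), 4))
```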

How Safe Am I Given What I See? Calibrated Prediction of Safety Chances for Image-Controlled Autonomy

  • paper_url: http://arxiv.org/abs/2308.12252
  • repo_url: https://github.com/maozj6/hsai-predictor
  • paper_authors: Zhenjiang Mao, Carson Sobolewski, Ivan Ruchkin
  • for: The paper addresses safety assurance for autonomous systems, focusing on online safety prediction.
  • methods: It proposes a configurable family of learning pipelines based on generative world models that require no low-dimensional state, handles prediction-induced distribution shift and missing safety labels, and provides statistical calibration guarantees on safety-chance predictions via conformal prediction.
  • results: In case studies on two image-controlled systems, the proposed pipelines deliver calibrated predictions of safety chances.
    Abstract End-to-end learning has emerged as a major paradigm for developing autonomous systems. Unfortunately, with its performance and convenience comes an even greater challenge of safety assurance. A key factor of this challenge is the absence of the notion of a low-dimensional and interpretable dynamical state, around which traditional assurance methods revolve. Focusing on the online safety prediction problem, this paper proposes a configurable family of learning pipelines based on generative world models, which do not require low-dimensional states. To implement these pipelines, we overcome the challenges of learning safety-informed latent representations and missing safety labels under prediction-induced distribution shift. These pipelines come with statistical calibration guarantees on their safety chance predictions based on conformal prediction. We perform an extensive evaluation of the proposed learning pipelines on two case studies of image-controlled systems: a racing car and a cartpole.
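
A sketch of the split-conformal calibration step behind the statistical guarantee mentioned in the abstract: nonconformity scores on a held-out calibration set determine a quantile that turns raw safety-chance predictions into calibrated prediction sets. The logistic predictor is a dummy; in the paper the predictor comes from a generative world model.

```python
import numpy as np

def safety_model(x):
    """Dummy predictor: probability that an episode stays safe, from one feature."""
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(6)

# Calibration set: feature and binary outcome (1 = the episode stayed safe).
x_cal = rng.normal(size=500)
y_cal = (x_cal + 0.3 * rng.normal(size=500) > 0).astype(int)

# Nonconformity score: one minus the predicted probability of the true label.
p_safe = safety_model(x_cal)
scores = 1.0 - np.where(y_cal == 1, p_safe, 1.0 - p_safe)

alpha = 0.1                                   # target 90% coverage
n = len(scores)
q = np.quantile(scores, np.ceil((n + 1) * (1 - alpha)) / n, method="higher")

def prediction_set(x_new):
    """All labels whose nonconformity score is within the calibrated quantile."""
    p = safety_model(x_new)
    return [lbl for lbl, prob in ((1, p), (0, 1.0 - p)) if 1.0 - prob <= q]

print("calibrated quantile:", round(float(q), 3))
print("prediction set for a borderline state:", prediction_set(0.05))
```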

How to Protect Copyright Data in Optimization of Large Language Models?

  • paper_url: http://arxiv.org/abs/2308.12247
  • repo_url: None
  • paper_authors: Timothy Chu, Zhao Song, Chiwun Yang
  • for: The paper addresses whether large language models (LLMs) generate copyrighted data during training and optimization.
  • methods: LLM training and optimization are framed as a softmax regression problem, and a method is proposed for performing softmax regression efficiently in a way that prevents the regression function from generating copyrighted data.
  • results: This yields a theoretical approach for training large language models while avoiding the generation of copyrighted data.
    Abstract Large language models (LLMs) and generative AI have played a transformative role in computer research and applications. Controversy has arisen as to whether these models output copyrighted data, which can occur if the data the models are trained on is copyrighted. LLMs are built on the transformer neural network architecture, which in turn relies on a mathematical computation called Attention that uses the softmax function. In this paper, we show that large language model training and optimization can be seen as a softmax regression problem. We then establish a method of efficiently performing softmax regression, in a way that prevents the regression function from generating copyright data. This establishes a theoretical method of training large language models in a way that avoids generating copyright data.
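
A sketch of the softmax regression framing the abstract refers to: fit x so that softmax(Ax) matches a target distribution b, here by plain gradient descent. This only illustrates the problem formulation; the paper's contribution, solving it in a way that avoids reproducing copyrighted data, is not captured here.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def softmax_regression(A, b, lr=0.1, n_steps=5000):
    """Minimize ||softmax(A x) - b||^2 over x by gradient descent."""
    x = np.zeros(A.shape[1])
    for _ in range(n_steps):
        p = softmax(A @ x)
        J = np.diag(p) - np.outer(p, p)          # Jacobian of softmax w.r.t. A x
        x -= lr * (A.T @ (J @ (2.0 * (p - b))))  # chain rule through A
    return x

rng = np.random.default_rng(7)
A = rng.normal(size=(8, 4))
b_target = softmax(rng.normal(size=8))           # a valid probability vector
x_fit = softmax_regression(A, b_target)
print("residual:", round(float(np.linalg.norm(softmax(A @ x_fit) - b_target)), 4))
```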

Multi-Objective Optimization for Sparse Deep Neural Network Training

  • paper_url: http://arxiv.org/abs/2308.12243
  • repo_url: https://github.com/salomonhotegni/mdmtn
  • paper_authors: S. S. Hotegni, S. Peitz, M. Berkemeier
  • for: The paper proposes a multi-objective optimization algorithm for training deep multi-task models.
  • methods: A modified weighted Chebyshev scalarization reduces the multi-objective problem to a sequence of single-objective problems, which are then solved with an augmented Lagrangian method.
  • results: Experiments show that the model can be adaptively sparsified during training, reducing computational cost without significantly impacting performance.
    Abstract Different conflicting optimization criteria arise naturally in various Deep Learning scenarios. These can address different main tasks (i.e., in the setting of Multi-Task Learning), but also main and secondary tasks such as loss minimization versus sparsity. The usual approach is a simple weighting of the criteria, which formally only works in the convex setting. In this paper, we present a Multi-Objective Optimization algorithm using a modified Weighted Chebyshev scalarization for training Deep Neural Networks (DNNs) with respect to several tasks. By employing this scalarization technique, the algorithm can identify all optimal solutions of the original problem while reducing its complexity to a sequence of single-objective problems. The simplified problems are then solved using an Augmented Lagrangian method, enabling the use of popular optimization techniques such as Adam and Stochastic Gradient Descent, while efficaciously handling constraints. Our work aims to address the (economical and also ecological) sustainability issue of DNN models, with a particular focus on Deep Multi-Task models, which are typically designed with a very large number of weights to perform equally well on multiple tasks. Through experiments conducted on two Machine Learning datasets, we demonstrate the possibility of adaptively sparsifying the model during training without significantly impacting its performance, if we are willing to apply task-specific adaptations to the network weights. Code is available at https://github.com/salomonhotegni/MDMTN.
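
A toy sketch of weighted Chebyshev scalarization on a two-objective problem (fit error versus an L1 sparsity proxy): each choice of weights turns the multi-objective problem into a single-objective one, minimizing max_i w_i (f_i(x) - z_i*). The objectives, the candidate set, and the grid search are illustrative stand-ins for the paper's DNN setting and augmented Lagrangian solver.

```python
import numpy as np

rng = np.random.default_rng(8)
X = rng.normal(size=(100, 20))
true_w = np.zeros(20); true_w[:3] = [2.0, -1.0, 0.5]        # sparse ground truth
y = X @ true_w + 0.05 * rng.normal(size=100)

def objectives(w):
    """f1: mean squared fit error, f2: L1 norm as a sparsity proxy."""
    return np.array([np.mean((X @ w - y) ** 2), np.abs(w).sum()])

def chebyshev(f, weights, z_star):
    """Weighted Chebyshev scalarization of an objective vector f."""
    return float(np.max(weights * (f - z_star)))

z_star = np.array([0.0, 0.0])                                 # ideal (utopia) point
candidates = [true_w * s for s in np.linspace(0.0, 1.5, 31)]  # toy candidate set

for weights in ([0.9, 0.1], [0.5, 0.5], [0.1, 0.9]):
    w_best = min(candidates,
                 key=lambda w: chebyshev(objectives(w), np.array(weights), z_star))
    f1, f2 = objectives(w_best)
    print(f"weights {weights}: fit error {f1:.3f}, L1 norm {f2:.3f}")
```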

Critical Learning Periods Emerge Even in Deep Linear Networks

  • paper_url: http://arxiv.org/abs/2308.12221
  • repo_url: None
  • paper_authors: Michael Kleinman, Alessandro Achille, Stefano Soatto
  • for: The paper investigates critical learning periods in deep networks and which factors cause them to emerge.
  • methods: Deep linear network models are studied both analytically and in simulations.
  • results: Critical periods depend on the depth of the model and the structure of the data distribution, and feature learning is tied to competition between sources; in multi-task learning, pre-training on certain tasks can hurt transfer to new tasks, depending on the relationship between tasks and the duration of pre-training.
    Abstract Critical learning periods are periods early in development where temporary sensory deficits can have a permanent effect on behavior and learned representations. Despite the radical differences between biological and artificial networks, critical learning periods have been empirically observed in both systems. This suggests that critical periods may be fundamental to learning and not an accident of biology. Yet, why exactly critical periods emerge in deep networks is still an open question, and in particular it is unclear whether the critical periods observed in both systems depend on particular architectural or optimization details. To isolate the key underlying factors, we focus on deep linear network models, and show that, surprisingly, such networks also display much of the behavior seen in biology and artificial networks, while being amenable to analytical treatment. We show that critical periods depend on the depth of the model and structure of the data distribution. We also show analytically and in simulations that the learning of features is tied to competition between sources. Finally, we extend our analysis to multi-task learning to show that pre-training on certain tasks can damage the transfer performance on new tasks, and show how this depends on the relationship between tasks and the duration of the pre-training stage. To the best of our knowledge, our work provides the first analytically tractable model that sheds light into why critical learning periods emerge in biological and artificial networks.
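
A toy illustration of the kind of experiment the abstract describes: a two-layer deep linear network is trained while the inputs are corrupted (a temporary "deficit") for the first part of training, and the final loss is compared across deficit durations. Architecture, noise model, and hyperparameters are illustrative choices, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(9)
d, n = 20, 500
X = rng.normal(size=(n, d))
T = rng.normal(size=(d, d))                      # target linear map
Y = X @ T

def train(deficit_steps, total_steps=1000, lr=0.02, noise=1.5):
    """Two-layer linear net Y ~= X @ W1 @ W2; inputs are noisy early on."""
    W1 = 0.2 * rng.normal(size=(d, d))
    W2 = 0.2 * rng.normal(size=(d, d))
    for step in range(total_steps):
        Xb = X + noise * rng.normal(size=X.shape) if step < deficit_steps else X
        E = Xb @ W1 @ W2 - Y                     # residual
        g2 = (Xb @ W1).T @ E / n                 # dL/dW2 for L = ||E||^2 / (2n)
        g1 = Xb.T @ (E @ W2.T) / n               # dL/dW1
        W1 -= lr * g1
        W2 -= lr * g2
    return float(np.mean((X @ W1 @ W2 - Y) ** 2))  # loss on clean inputs

for deficit in [0, 250, 500, 750]:
    print(f"deficit for first {deficit:3d} steps -> final loss {train(deficit):.3f}")
```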

Diffusion Language Models Can Perform Many Tasks with Scaling and Instruction-Finetuning

  • paper_url: http://arxiv.org/abs/2308.12219
  • repo_url: https://github.com/yegcjs/diffusionllm
  • paper_authors: Jiasheng Ye, Zaixiang Zheng, Yu Bao, Lihua Qian, Quanquan Gu
  • for: The paper asks whether diffusion probabilistic models can solve general language tasks, and shows that scaling data, model size, and tasks makes diffusion models strong language learners.
  • methods: Pretrained masked language models are reprogrammed into diffusion language models via diffusive adaptation, with task-specific finetuning and instruction finetuning used to unlock their versatility on general language tasks.
  • results: Scaling diffusion language models consistently improves downstream performance, and instruction finetuning elicits zero-shot and few-shot in-context learning, with promise on advanced and challenging abilities such as reasoning.
    Abstract The recent surge of generative AI has been fueled by the generative power of diffusion probabilistic models and the scalable capabilities of large language models. Despite their potential, it remains elusive whether diffusion language models can solve general language tasks comparable to their autoregressive counterparts. This paper demonstrates that scaling diffusion models w.r.t. data, sizes, and tasks can effectively make them strong language learners. We build competent diffusion language models at scale by first acquiring knowledge from massive data via masked language modeling pretraining thanks to their intrinsic connections. We then reprogram pretrained masked language models into diffusion language models via diffusive adaptation, wherein task-specific finetuning and instruction finetuning are explored to unlock their versatility in solving general language tasks. Experiments show that scaling diffusion language models consistently improves performance across downstream language tasks. We further discover that instruction finetuning can elicit zero-shot and few-shot in-context learning abilities that help tackle many unseen tasks by following natural language instructions, and show promise in advanced and challenging abilities such as reasoning
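
A sketch of the reverse process of an absorbing-state ("mask") discrete diffusion over tokens, the kind of generative process a masked language model can be adapted into: generation starts fully masked and unmasks a growing fraction of positions each step. The per-position predictor is a random placeholder for a pretrained masked LM, and the unmasking schedule is an assumption.

```python
import numpy as np

rng = np.random.default_rng(11)
VOCAB, MASK_ID, SEQ_LEN = 50, 0, 16            # token 0 acts as [MASK]

def masked_lm_logits(tokens):
    """Placeholder for a pretrained masked LM: per-position logits over the vocab."""
    return rng.normal(size=(len(tokens), VOCAB))

def diffusion_generate(n_steps=8):
    tokens = np.full(SEQ_LEN, MASK_ID)           # fully masked start
    for step in range(n_steps, 0, -1):
        logits = masked_lm_logits(tokens)
        # Propose a token for every still-masked position (never [MASK] itself) ...
        proposal = np.where(tokens == MASK_ID,
                            logits[:, 1:].argmax(axis=1) + 1, tokens)
        # ... then re-mask a shrinking fraction of the masked positions, so the
        # sequence becomes fully unmasked by the last step.
        stay_masked = rng.random(SEQ_LEN) < (step - 1) / n_steps
        tokens = np.where(stay_masked & (tokens == MASK_ID), MASK_ID, proposal)
    return tokens

print(diffusion_generate())
```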