cs.LG - 2023-07-28

TriadNet: Sampling-free predictive intervals for lesional volume in 3D brain MR images

  • paper_url: http://arxiv.org/abs/2307.15638
  • repo_url: https://github.com/benolmbrt/TriadNet
  • paper_authors: Benjamin Lambert, Florence Forbes, Senan Doyle, Michel Dojat
  • for: To make lesion-volume segmentation tools more reliable and informative, so that they are more readily adopted in clinical practice.
  • methods: A multi-head Convolutional Neural Network (CNN) architecture that outputs lesion volumes together with their associated predictive intervals simultaneously (see the sketch below).
  • results: The approach outperforms competing solutions on the BraTS 2021 dataset.
    Abstract The volume of a brain lesion (e.g. infarct or tumor) is a powerful indicator of patient prognosis and can be used to guide the therapeutic strategy. Lesional volume estimation is usually performed by segmentation with deep convolutional neural networks (CNN), currently the state-of-the-art approach. However, to date, few work has been done to equip volume segmentation tools with adequate quantitative predictive intervals, which can hinder their usefulness and acceptation in clinical practice. In this work, we propose TriadNet, a segmentation approach relying on a multi-head CNN architecture, which provides both the lesion volumes and the associated predictive intervals simultaneously, in less than a second. We demonstrate its superiority over other solutions on BraTS 2021, a large-scale MRI glioblastoma image database.
    Note: "BraTS" is an abbreviation for the Brain Tumor Segmentation challenge.
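
A minimal, hypothetical sketch of the multi-head idea summarized above (not the authors' architecture): a shared 3D encoder with three segmentation heads whose per-voxel probability sums give a lower bound, point estimate, and upper bound on the lesion volume.

```python
import torch
import torch.nn as nn

class MultiHeadVolumeNet(nn.Module):
    """Toy 3D segmentation net with three heads: lower / mean / upper masks."""
    def __init__(self, in_ch=4, feat=16):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv3d(in_ch, feat, 3, padding=1), nn.ReLU(),
            nn.Conv3d(feat, feat, 3, padding=1), nn.ReLU(),
        )
        # One 1x1x1 conv head per bound; each outputs a per-voxel foreground probability.
        self.heads = nn.ModuleList([nn.Conv3d(feat, 1, 1) for _ in range(3)])

    def forward(self, x, voxel_volume_ml=0.001):
        z = self.encoder(x)
        masks = [torch.sigmoid(h(z)) for h in self.heads]           # lower, mean, upper masks
        volumes = [m.sum(dim=(1, 2, 3, 4)) * voxel_volume_ml for m in masks]
        return masks, volumes                                        # volumes: three tensors of shape (batch,)

if __name__ == "__main__":
    net = MultiHeadVolumeNet()
    mri = torch.randn(1, 4, 32, 32, 32)                              # e.g. four MR sequences
    _, (v_low, v_mean, v_up) = net(mri)
    print(f"volume interval: [{v_low.item():.2f}, {v_up.item():.2f}] ml, point {v_mean.item():.2f} ml")
```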

A Comparative Analysis of Machine Learning Methods for Lane Change Intention Recognition Using Vehicle Trajectory Data

  • paper_url: http://arxiv.org/abs/2307.15625
  • repo_url: None
  • paper_authors: Renteng Yuan
  • for: To recognize lane change (LC) intention from high-dimensional time-series data with machine learning, helping autonomous vehicles better understand their surroundings, identify potential safety hazards, and improve traffic safety.
  • methods: Compares the performance of different machine learning methods for LC intention recognition, including the XGBoost and LightGBM algorithms (see the sketch below).
  • results: Ensemble methods reduce the impact of Type II and Type III classification errors, and LightGBM improves model training efficiency without sacrificing recognition accuracy.
    Abstract Accurately detecting and predicting lane change (LC)processes can help autonomous vehicles better understand their surrounding environment, recognize potential safety hazards, and improve traffic safety. This paper focuses on LC processes and compares different machine learning methods' performance to recognize LC intention from high-dimensionality time series data. To validate the performance of the proposed models, a total number of 1023 vehicle trajectories is extracted from the CitySim dataset. For LC intention recognition issues, the results indicate that with ninety-eight percent of classification accuracy, ensemble methods reduce the impact of Type II and Type III classification errors. Without sacrificing recognition accuracy, the LightGBM demonstrates a sixfold improvement in model training efficiency than the XGBoost algorithm.
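
A hedged illustration of the kind of comparison described above, using synthetic stand-ins for trajectory-derived features; the dataset, hyperparameters, and feature construction are placeholders, not the paper's setup.

```python
import time
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from lightgbm import LGBMClassifier
from xgboost import XGBClassifier

# Synthetic stand-in for trajectory-derived features; 3 classes: keep lane / LC left / LC right.
X, y = make_classification(n_samples=5000, n_features=30, n_informative=15,
                           n_classes=3, n_clusters_per_class=2, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

for name, model in [("XGBoost", XGBClassifier(n_estimators=200, max_depth=6)),
                    ("LightGBM", LGBMClassifier(n_estimators=200, max_depth=6))]:
    t0 = time.time()
    model.fit(X_tr, y_tr)
    acc = accuracy_score(y_te, model.predict(X_te))
    print(f"{name}: accuracy={acc:.3f}, fit time={time.time() - t0:.2f}s")
```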

  • paper_url: http://arxiv.org/abs/2307.15621
  • repo_url: https://github.com/awesomelemon/pbt-nas
  • paper_authors: Alexander Chebykin, Arkadiy Dushatskiy, Tanja Alderliesten, Peter A. N. Bosman
  • for: To perform Neural Architecture Search (NAS) during training itself, rather than as a separate outer loop.
  • methods: Networks are trained and mixed simultaneously, reusing partially trained weights for efficient search. The proposed PBT-NAS adapts Population Based Training (PBT) to NAS: during training, poorly performing networks in the population are replaced by the result of mixing well-performing ones, with weights inherited via the shrink-perturb technique (see the sketch below).
  • results: On challenging tasks (image generation and reinforcement learning), PBT-NAS achieves superior performance compared to baselines (random search and mutation-based PBT).
    Abstract In this work, we show that simultaneously training and mixing neural networks is a promising way to conduct Neural Architecture Search (NAS). For hyperparameter optimization, reusing the partially trained weights allows for efficient search, as was previously demonstrated by the Population Based Training (PBT) algorithm. We propose PBT-NAS, an adaptation of PBT to NAS where architectures are improved during training by replacing poorly-performing networks in a population with the result of mixing well-performing ones and inheriting the weights using the shrink-perturb technique. After PBT-NAS terminates, the created networks can be directly used without retraining. PBT-NAS is highly parallelizable and effective: on challenging tasks (image generation and reinforcement learning) PBT-NAS achieves superior performance compared to baselines (random search and mutation-based PBT).
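
A minimal sketch of the shrink-perturb weight inheritance mentioned above: the inherited weights are shrunk toward zero and perturbed with a freshly initialized copy. The coefficients and the simple normal re-initialization are illustrative, not the authors' exact settings.

```python
import copy
import torch
import torch.nn as nn

def shrink_perturb(inherited: nn.Module, shrink: float = 0.4, perturb: float = 0.1) -> nn.Module:
    """Return a copy of `inherited` whose weights are shrunk toward zero and
    perturbed with a freshly (re-)initialized network of the same architecture."""
    child = copy.deepcopy(inherited)
    fresh = copy.deepcopy(inherited)
    for p in fresh.parameters():
        nn.init.normal_(p, std=0.02)        # stand-in for the architecture's own initializer
    with torch.no_grad():
        for p_child, p_fresh in zip(child.parameters(), fresh.parameters()):
            p_child.mul_(shrink).add_(perturb * p_fresh)
    return child

# usage: child = shrink_perturb(well_performing_net)
```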

Robust Distortion-free Watermarks for Language Models

  • paper_url: http://arxiv.org/abs/2307.15593
  • repo_url: https://github.com/jthickstun/watermark
  • paper_authors: Rohith Kuditipudi, John Thickstun, Tatsunori Hashimoto, Percy Liang
  • for: To plant watermarks in text generated by a language model, so that machine-generated text can be detected and misuse and scraping deterred.
  • methods: Watermarked text is generated by mapping a sequence of random numbers, computed from a randomized watermark key, to samples from the language model; two sampling schemes, inverse transform sampling and exponential minimum sampling, instantiate the watermark (see the sketch below).
  • results: Experiments show that watermarked text can be detected reliably even after substantial corruption of the tokens and under paraphrasing attacks.
    Abstract We propose a methodology for planting watermarks in text from an autoregressive language model that are robust to perturbations without changing the distribution over text up to a certain maximum generation budget. We generate watermarked text by mapping a sequence of random numbers -- which we compute using a randomized watermark key -- to a sample from the language model. To detect watermarked text, any party who knows the key can align the text to the random number sequence. We instantiate our watermark methodology with two sampling schemes: inverse transform sampling and exponential minimum sampling. We apply these watermarks to three language models -- OPT-1.3B, LLaMA-7B and Alpaca-7B -- to experimentally validate their statistical power and robustness to various paraphrasing attacks. Notably, for both the OPT-1.3B and LLaMA-7B models, we find we can reliably detect watermarked text ($p \leq 0.01$) from $35$ tokens even after corrupting between $40$-$50$\% of the tokens via random edits (i.e., substitutions, insertions or deletions). For the Alpaca-7B model, we conduct a case study on the feasibility of watermarking responses to typical user instructions. Due to the lower entropy of the responses, detection is more difficult: around $25\%$ of the responses -- whose median length is around $100$ tokens -- are detectable with $p \leq 0.01$, and the watermark is also less robust to certain automated paraphrasing attacks we implement.
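
A toy sketch of the exponential-minimum sampling idea named in the abstract: key-derived randomness at each position is combined with the model's next-token probabilities, so that anyone holding the key can realign text with the same random sequence. The key handling and the detector below are simplified illustrations, not the authors' implementation.

```python
import numpy as np

def watermarked_token(probs: np.ndarray, key: int, position: int) -> int:
    """Pick the next token from `probs` with key-derived randomness:
    argmin_i  Exp_i / p_i,  where Exp_i are standard exponentials."""
    rng = np.random.default_rng(hash((key, position)) % (2**32))
    u = rng.random(len(probs))
    scores = -np.log(u) / np.maximum(probs, 1e-12)
    return int(np.argmin(scores))

def detection_score(tokens, vocab_size: int, key: int) -> float:
    """Toy detector: regenerate the key-derived randomness and check alignment.
    Watermarked tokens tend to have u[token] close to 1, inflating the score."""
    score = 0.0
    for t, tok in enumerate(tokens):
        rng = np.random.default_rng(hash((key, t)) % (2**32))
        u = rng.random(vocab_size)
        score += -np.log(1.0 - u[tok] + 1e-12)
    return score / max(len(tokens), 1)
```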

Evaluating the structure of cognitive tasks with transfer learning

  • paper_url: http://arxiv.org/abs/2308.02408
  • repo_url: None
  • paper_authors: Bruno Aristimunha, Raphael Y. de Camargo, Walter H. Lopez Pinaya, Sylvain Chevallier, Alexandre Gramfort, Cedric Rommel
  • for: To investigate the transferability of deep learning representations across EEG decoding tasks, in order to address the challenge of limited labelled data.
  • methods: State-of-the-art decoding models are evaluated on two recently released EEG datasets, ERP CORE and M$^3$CV, covering over 140 subjects and 11 cognitive tasks; transferability is measured by pre-training deep neural networks on one task and assessing how well they decode subsequent tasks (linear probing is sketched below).
  • results: Even with linear probing transfer, decoding performance improves significantly, with gains of up to 28% over a purely supervised approach; furthermore, certain decoding paradigms elicit specific and narrow brain activities, while others benefit from pre-training on a broad range of representations.
    Abstract Electroencephalography (EEG) decoding is a challenging task due to the limited availability of labelled data. While transfer learning is a promising technique to address this challenge, it assumes that transferable data domains and task are known, which is not the case in this setting. This study investigates the transferability of deep learning representations between different EEG decoding tasks. We conduct extensive experiments using state-of-the-art decoding models on two recently released EEG datasets, ERP CORE and M$^3$CV, containing over 140 subjects and 11 distinct cognitive tasks. We measure the transferability of learned representations by pre-training deep neural networks on one task and assessing their ability to decode subsequent tasks. Our experiments demonstrate that, even with linear probing transfer, significant improvements in decoding performance can be obtained, with gains of up to 28% compare with the pure supervised approach. Additionally, we discover evidence that certain decoding paradigms elicit specific and narrow brain activities, while others benefit from pre-training on a broad range of representations. By revealing which tasks transfer well and demonstrating the benefits of transfer learning for EEG decoding, our findings have practical implications for mitigating data scarcity in this setting. The transfer maps generated also provide insights into the hierarchical relations between cognitive tasks, hence enhancing our understanding of how these tasks are connected from a neuroscientific standpoint.
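
A generic sketch of linear probing as used in the results above: the pre-trained encoder is frozen and only a linear classifier is fit on the target task. The encoder, data loader, and dimensions are placeholders.

```python
import torch
import torch.nn as nn

def linear_probe(pretrained_encoder: nn.Module, feat_dim: int, n_classes: int,
                 loader, epochs: int = 10, lr: float = 1e-3) -> nn.Linear:
    """Freeze the encoder and train only a linear head on the target task."""
    for p in pretrained_encoder.parameters():
        p.requires_grad = False
    pretrained_encoder.eval()

    head = nn.Linear(feat_dim, n_classes)
    opt = torch.optim.Adam(head.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()

    for _ in range(epochs):
        for x, y in loader:                 # x: EEG windows, y: labels for the new task
            with torch.no_grad():
                z = pretrained_encoder(x)   # frozen features
            loss = loss_fn(head(z), y)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return head
```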

Dynamic algorithms for k-center on graphs

  • paper_url: http://arxiv.org/abs/2307.15557
  • repo_url: https://github.com/swati1024/torrents
  • paper_authors: Emilio Cruciani, Sebastian Forster, Gramoz Goranci, Yasamin Nazari, Antonis Skarlatos
  • for: To give the first efficient algorithms for the $k$-center problem on dynamic graphs undergoing edge updates.
  • methods: A deterministic decremental $(2+\epsilon)$-approximation algorithm and a randomized incremental $(4+\epsilon)$-approximation algorithm, both with amortized update time $kn^{o(1)}$ for weighted graphs (the classical static greedy is sketched below for reference).
  • results: A reduction yields a fully dynamic $(2+\epsilon)$-approximation algorithm for the $k$-center problem, with worst-case update time within a factor $k$ of the state-of-the-art upper bound for maintaining $(1+\epsilon)$-approximate single-source distances in graphs.
    Abstract In this paper we give the first efficient algorithms for the $k$-center problem on dynamic graphs undergoing edge updates. In this problem, the goal is to partition the input into $k$ sets by choosing $k$ centers such that the maximum distance from any data point to the closest center is minimized. It is known that it is NP-hard to get a better than $2$ approximation for this problem. While in many applications the input may naturally be modeled as a graph, all prior works on $k$-center problem in dynamic settings are on metrics. In this paper, we give a deterministic decremental $(2+\epsilon)$-approximation algorithm and a randomized incremental $(4+\epsilon)$-approximation algorithm, both with amortized update time $kn^{o(1)}$ for weighted graphs. Moreover, we show a reduction that leads to a fully dynamic $(2+\epsilon)$-approximation algorithm for the $k$-center problem, with worst-case update time that is within a factor $k$ of the state-of-the-art upper bound for maintaining $(1+\epsilon)$-approximate single-source distances in graphs. Matching this bound is a natural goalpost because the approximate distances of each vertex to its center can be used to maintain a $(2+\epsilon)$-approximation of the graph diameter and the fastest known algorithms for such a diameter approximation also rely on maintaining approximate single-source distances.
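
For background, the classical farthest-first (Gonzalez) greedy, the standard static 2-approximation that dynamic k-center algorithms are measured against; sketched over a precomputed metric distance matrix. This is textbook material, not the paper's dynamic algorithm.

```python
import numpy as np

def greedy_k_center(dist: np.ndarray, k: int, first: int = 0) -> list[int]:
    """Farthest-first traversal: a classical 2-approximation for static k-center.
    `dist` is an (n, n) metric distance matrix."""
    centers = [first]
    d_to_centers = dist[first].copy()        # distance of each point to its closest chosen center
    for _ in range(k - 1):
        nxt = int(np.argmax(d_to_centers))   # farthest point becomes the next center
        centers.append(nxt)
        d_to_centers = np.minimum(d_to_centers, dist[nxt])
    return centers

# objective value (radius) = d_to_centers.max() after the loop
```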

On the Trade-off Between Efficiency and Precision of Neural Abstraction

  • paper_url: http://arxiv.org/abs/2307.15546
  • repo_url: None
  • paper_authors: Alec Edwards, Mirco Giacobbe, Alessandro Abate
  • for: To study neural abstractions, i.e. formal approximations of complex nonlinear dynamical models, and to examine which abstraction shapes suit which use cases.
  • methods: Formal inductive synthesis procedures generate neural abstractions of alternative shapes: piecewise constant, piecewise affine (ReLU), and nonlinear non-polynomial (sigmoidal) activations.
  • results: The different abstraction templates trade off precision against synthesis time and safety-verification (reachability) time, so the appropriate template depends on the application scenario.
    Abstract Neural abstractions have been recently introduced as formal approximations of complex, nonlinear dynamical models. They comprise a neural ODE and a certified upper bound on the error between the abstract neural network and the concrete dynamical model. So far neural abstractions have exclusively been obtained as neural networks consisting entirely of $ReLU$ activation functions, resulting in neural ODE models that have piecewise affine dynamics, and which can be equivalently interpreted as linear hybrid automata. In this work, we observe that the utility of an abstraction depends on its use: some scenarios might require coarse abstractions that are easier to analyse, whereas others might require more complex, refined abstractions. We therefore consider neural abstractions of alternative shapes, namely either piecewise constant or nonlinear non-polynomial (specifically, obtained via sigmoidal activations). We employ formal inductive synthesis procedures to generate neural abstractions that result in dynamical models with these semantics. Empirically, we demonstrate the trade-off that these different neural abstraction templates have vis-a-vis their precision and synthesis time, as well as the time required for their safety verification (done via reachability computation). We improve existing synthesis techniques to enable abstraction of higher-dimensional models, and additionally discuss the abstraction of complex neural ODEs to improve the efficiency of reachability analysis for these models.

Beating Backdoor Attack at Its Own Game

  • paper_url: http://arxiv.org/abs/2307.15539
  • repo_url: https://github.com/damianliumin/non-adversarial_backdoor
  • paper_authors: Min Liu, Alberto Sangiovanni-Vincentelli, Xiangyu Yue
  • for: To defend deep neural networks (DNNs) against backdoor attacks while preserving the model's behavior on clean data.
  • methods: A non-adversarial backdoor targeting poisoned samples: a small set of suspected samples is detected and then deliberately poisoned, so that the injected backdoor, once triggered, suppresses the attacker's backdoor.
  • results: Extensive experiments on multiple benchmarks and representative attacks show state-of-the-art defense effectiveness with by far the lowest performance drop on clean data.
    Abstract Deep neural networks (DNNs) are vulnerable to backdoor attack, which does not affect the network's performance on clean data but would manipulate the network behavior once a trigger pattern is added. Existing defense methods have greatly reduced attack success rate, but their prediction accuracy on clean data still lags behind a clean model by a large margin. Inspired by the stealthiness and effectiveness of backdoor attack, we propose a simple but highly effective defense framework which injects non-adversarial backdoors targeting poisoned samples. Following the general steps in backdoor attack, we detect a small set of suspected samples and then apply a poisoning strategy to them. The non-adversarial backdoor, once triggered, suppresses the attacker's backdoor on poisoned data, but has limited influence on clean data. The defense can be carried out during data preprocessing, without any modification to the standard end-to-end training pipeline. We conduct extensive experiments on multiple benchmarks with different architectures and representative attacks. Results demonstrate that our method achieves state-of-the-art defense effectiveness with by far the lowest performance drop on clean data. Considering the surprising defense ability displayed by our framework, we call for more attention to utilizing backdoor for backdoor defense. Code is available at https://github.com/damianliumin/non-adversarial_backdoor.

RFID-Assisted Indoor Localization Using Hybrid Wireless Data Fusion

  • paper_url: http://arxiv.org/abs/2308.02410
  • repo_url: None
  • paper_authors: Abouzar Ghavami, Ali Abedi
  • for: To propose a hybrid, section-based indoor localization method combining a developed RFID tracking device with multiple IoT wireless technologies, in order to reduce the cost of RFID tags.
  • methods: RFID tags are installed only on the borders of each section; the RFID tracking device identifies the section, and linear location estimates from IoT wireless technologies (Bluetooth, WiFi, and ZigBee) locate the object within the section (see the sketch below).
  • results: Experiments with the developed RFID tracking device and RSSI-based localization verify the analytical results, showing that the hybrid method reduces the number of RFID tags without degrading localization accuracy.
    Abstract Wireless localization is essential for tracking objects in indoor environments. Internet of Things (IoT) enables localization through its diverse wireless communication protocols. In this paper, a hybrid section-based indoor localization method using a developed Radio Frequency Identification (RFID) tracking device and multiple IoT wireless technologies is proposed. In order to reduce the cost of the RFID tags, the tags are installed only on the borders of each section. The RFID tracking device identifies the section, and the proposed wireless hybrid method finds the location of the object inside the section. The proposed hybrid method is analytically driven by linear location estimates obtained from different IoT wireless technologies. The experimental results using developed RFID tracking device and RSSI-based localization for Bluetooth, WiFi and ZigBee technologies verifies the analytical results.
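
For illustration, a standard log-distance path-loss conversion from RSSI to range plus a least-squares trilateration step, the kind of building blocks RSSI-based localization typically combines; the constants are placeholders and the paper's exact linear estimators are not reproduced.

```python
import numpy as np

def rssi_to_distance(rssi_dbm: float, rssi_at_1m: float = -40.0, path_loss_exp: float = 2.0) -> float:
    """Log-distance path-loss model: d = 10 ** ((P0 - RSSI) / (10 * n)) metres."""
    return 10 ** ((rssi_at_1m - rssi_dbm) / (10.0 * path_loss_exp))

def trilaterate(anchors: np.ndarray, distances: np.ndarray) -> np.ndarray:
    """Least-squares 2D position from >= 3 anchor positions (x, y) and ranges."""
    x0, d0 = anchors[0], distances[0]
    A = 2 * (anchors[1:] - x0)
    b = d0**2 - distances[1:]**2 + np.sum(anchors[1:]**2, axis=1) - np.sum(x0**2)
    pos, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pos
```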

The Applicability of Federated Learning to Official Statistics

  • paper_url: http://arxiv.org/abs/2307.15503
  • repo_url: None
  • paper_authors: Joshua Stock, Oliver Hauke, Julius Weißmann, Hannes Federrath
  • for: To investigate the potential of Federated Learning (FL) for official statistics and to show that FL models can keep up with centralized learning methods; by safeguarding the privacy of data holders, FL can broaden data access and ultimately enhance official statistics.
  • methods: Three use cases are simulated, based on a medical insurance data set, a fine dust pollution data set, and a mobile radio coverage data set, all from domains close to official statistics; centralized and FL algorithm performances are compared in detail (the FedAvg aggregation step is sketched below). In all three use cases, FL models reach performance very close to the centralized benchmarks.
  • results: Key observations and their implications for transferring the simulations into practice are summarized; the conclusion is that FL has the potential to become a pivotal technology for future use cases of official statistics.
    Abstract This work investigates the potential of Federated Learning (FL) for official statistics and shows how well the performance of FL models can keep up with centralized learning methods. At the same time, its utilization can safeguard the privacy of data holders, thus facilitating access to a broader range of data and ultimately enhancing official statistics. By simulating three different use cases, important insights on the applicability of the technology are gained. The use cases are based on a medical insurance data set, a fine dust pollution data set and a mobile radio coverage data set - all of which are from domains close to official statistics. We provide a detailed analysis of the results, including a comparison of centralized and FL algorithm performances for each simulation. In all three use cases, we were able to train models via FL which reach a performance very close to the centralized model benchmarks. Our key observations and their implications for transferring the simulations into practice are summarized. We arrive at the conclusion that FL has the potential to emerge as a pivotal technology in future use cases of official statistics.
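
A minimal sketch of federated averaging (FedAvg), the canonical FL aggregation step that such simulations typically build on; the paper's concrete models and use-case setups are not reproduced here.

```python
import copy
import torch
import torch.nn as nn

def fed_avg(global_model: nn.Module, client_states: list[dict], client_sizes: list[int]) -> nn.Module:
    """Weighted average of client state_dicts, weights proportional to local data size."""
    total = float(sum(client_sizes))
    new_state = copy.deepcopy(client_states[0])
    for key in new_state:
        new_state[key] = sum(s[key] * (n / total) for s, n in zip(client_states, client_sizes))
    aggregated = copy.deepcopy(global_model)
    aggregated.load_state_dict(new_state)
    return aggregated

# One FL round (schematically):
#   1. the server sends the global weights to the clients
#   2. each client trains locally on its private data and returns its state_dict
#   3. the server calls fed_avg(...) and repeats
```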

AbDiffuser: Full-Atom Generation of In-Vitro Functioning Antibodies

  • paper_url: http://arxiv.org/abs/2308.05027
  • repo_url: None
  • paper_authors: Karolis Martinkus, Jan Ludwiczak, Kyunghyun Cho, Wei-Ching Liang, Julien Lafrance-Vanasse, Isidro Hotzel, Arvind Rajpal, Yan Wu, Richard Bonneau, Vladimir Gligorijevic, Andreas Loukas
  • for: To propose an equivariant, physics-informed diffusion model for the joint generation of antibody 3D structures and sequences.
  • methods: The model is built on a new representation of protein structure, relies on a novel architecture for aligned proteins, and uses strong diffusion priors to improve the denoising process; it handles sequence-length changes and reduces memory complexity by an order of magnitude, enabling backbone and side-chain generation.
  • results: Numerical experiments show that AbDiffuser generates antibodies that closely track the sequence and structural properties of a reference set; laboratory experiments confirm that all 16 HER2 antibodies discovered were expressed at high levels and that 57.1% of selected designs were tight binders.
    Abstract We introduce AbDiffuser, an equivariant and physics-informed diffusion model for the joint generation of antibody 3D structures and sequences. AbDiffuser is built on top of a new representation of protein structure, relies on a novel architecture for aligned proteins, and utilizes strong diffusion priors to improve the denoising process. Our approach improves protein diffusion by taking advantage of domain knowledge and physics-based constraints; handles sequence-length changes; and reduces memory complexity by an order of magnitude enabling backbone and side chain generation. We validate AbDiffuser in silico and in vitro. Numerical experiments showcase the ability of AbDiffuser to generate antibodies that closely track the sequence and structural properties of a reference set. Laboratory experiments confirm that all 16 HER2 antibodies discovered were expressed at high levels and that 57.1% of selected designs were tight binders.

Curiosity-Driven Reinforcement Learning based Low-Level Flight Control

  • paper_url: http://arxiv.org/abs/2307.15724
  • repo_url: https://github.com/a-ramezani/cdrl-l2fc_u_hcm
  • paper_authors: Amir Ramezani Dooraki, Alexandros Iosifidis
  • for: To study curiosity-driven learning and propose a surprise-based autonomous low-level flight control algorithm that lets a quadcopter pass through obstacles while steering its yaw toward the desired location and maximizing reward.
  • methods: A reinforcement-learning controller that generates motor speeds from odometry data, combined with a new curiosity approach based on prediction error (sketched below); the evolving exploration patterns are visualized for evaluation.
  • results: Tests with on-policy, off-policy, on-policy plus curiosity, and the proposed algorithm show that the proposed algorithm learns an optimal policy and maximizes reward where the other algorithms fail to do so.
    Abstract Curiosity is one of the main motives in many of the natural creatures with measurable levels of intelligence for exploration and, as a result, more efficient learning. It makes it possible for humans and many animals to explore efficiently by searching for being in states that make them surprised with the goal of learning more about what they do not know. As a result, while being curious, they learn better. In the machine learning literature, curiosity is mostly combined with reinforcement learning-based algorithms as an intrinsic reward. This work proposes an algorithm based on the drive of curiosity for autonomous learning to control by generating proper motor speeds from odometry data. The quadcopter controlled by our proposed algorithm can pass through obstacles while controlling the Yaw direction of the quad-copter toward the desired location. To achieve that, we also propose a new curiosity approach based on prediction error. We ran tests using on-policy, off-policy, on-policy plus curiosity, and the proposed algorithm and visualized the effect of curiosity in evolving exploration patterns. Results show the capability of the proposed algorithm to learn optimal policy and maximize reward where other algorithms fail to do so.
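
A generic sketch of a prediction-error curiosity bonus of the kind described above: a learned forward model predicts the next observation and its error is added to the extrinsic reward. The model size and scaling factor are illustrative, not the paper's.

```python
import torch
import torch.nn as nn

class ForwardModel(nn.Module):
    """Predicts the next observation from (observation, action)."""
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, obs_dim))

    def forward(self, obs, act):
        return self.net(torch.cat([obs, act], dim=-1))

def curiosity_reward(model: ForwardModel, obs, act, next_obs, scale: float = 0.1) -> torch.Tensor:
    """Intrinsic reward = scaled prediction error of the forward model."""
    with torch.no_grad():
        pred = model(obs, act)
    return scale * ((pred - next_obs) ** 2).mean(dim=-1)

# total_reward = extrinsic_reward + curiosity_reward(model, obs, act, next_obs)
# (the forward model itself is trained to minimize the same prediction error)
```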

From continuous-time formulations to discretization schemes: tensor trains and robust regression for BSDEs and parabolic PDEs

  • paper_url: http://arxiv.org/abs/2307.15496
  • repo_url: https://github.com/lorenzrichter/PDE-backward-solver
  • paper_authors: Lorenz Richter, Leon Sallandt, Nikolas Nüsken
  • for: To numerically approximate high-dimensional partial differential equations (PDEs), where classical grid-based methods suffer from the curse of dimensionality.
  • methods: Building on Monte Carlo methods and variational formulations (where neural networks are typically used for function approximation), the approach combines reformulations in terms of backward stochastic differential equations with regression-type methods in a tensor-train framework, exploiting latent low-rank structure for both compression and efficient computation.
  • results: Iterative schemes developed from a continuous-time viewpoint achieve, both theoretically and numerically, a favorable trade-off between accuracy and computational efficiency, often combining both where previous methods achieved only one.
    Abstract The numerical approximation of partial differential equations (PDEs) poses formidable challenges in high dimensions since classical grid-based methods suffer from the so-called curse of dimensionality. Recent attempts rely on a combination of Monte Carlo methods and variational formulations, using neural networks for function approximation. Extending previous work (Richter et al., 2021), we argue that tensor trains provide an appealing framework for parabolic PDEs: The combination of reformulations in terms of backward stochastic differential equations and regression-type methods holds the promise of leveraging latent low-rank structures, enabling both compression and efficient computation. Emphasizing a continuous-time viewpoint, we develop iterative schemes, which differ in terms of computational efficiency and robustness. We demonstrate both theoretically and numerically that our methods can achieve a favorable trade-off between accuracy and computational efficiency. While previous methods have been either accurate or fast, we have identified a novel numerical strategy that can often combine both of these aspects.

FeedbackLogs: Recording and Incorporating Stakeholder Feedback into Machine Learning Pipelines

  • paper_url: http://arxiv.org/abs/2307.15475
  • repo_url: None
  • paper_authors: Matthew Barker, Emma Kallina, Dhananjay Ashok, Katherine M. Collins, Ashley Casovan, Adrian Weller, Ameet Talwalkar, Valerie Chen, Umang Bhatt
  • for: To improve the traceability and trustworthiness of machine learning (ML) pipelines by recording how stakeholder input is collected and incorporated.
  • methods: FeedbackLogs, addenda to the existing documentation of ML pipelines, track the input of multiple stakeholders; each log records important details about the feedback collection process, the feedback itself, and how the feedback is used to update the pipeline.
  • results: A process for collecting a FeedbackLog is introduced and formalised, together with concrete use cases in which FeedbackLogs serve as evidence for algorithmic auditing and as a tool to record updates based on stakeholder feedback.
    Abstract Even though machine learning (ML) pipelines affect an increasing array of stakeholders, there is little work on how input from stakeholders is recorded and incorporated. We propose FeedbackLogs, addenda to existing documentation of ML pipelines, to track the input of multiple stakeholders. Each log records important details about the feedback collection process, the feedback itself, and how the feedback is used to update the ML pipeline. In this paper, we introduce and formalise a process for collecting a FeedbackLog. We also provide concrete use cases where FeedbackLogs can be employed as evidence for algorithmic auditing and as a tool to record updates based on stakeholder feedback.

Rethinking Noisy Label Learning in Real-world Annotation Scenarios from the Noise-type Perspective

  • paper_url: http://arxiv.org/abs/2307.16889
  • repo_url: https://github.com/fuxiailab/protosemi
  • paper_authors: Renyu Zhu, Haoyu Liu, Runze Wu, Minmin Lin, Tangjie Lv, Changjie Fan, Haobo Wang
  • for: To study learning with noisy labels in real-world annotation scenarios, where noise falls into two types: factual noise and ambiguity noise.
  • methods: A sample-selection-based approach, Proto-semi: all samples are first split into confident and unconfident sets via warm-up; prototype vectors capturing class characteristics are built from the confident set; distances between unconfident samples and the prototypes are then used to classify the noise and to correct or retain labels, refining both sets (see the sketch below); finally, a semi-supervised learning method enhances training.
  • results: Experiments on a real-world annotated dataset demonstrate that Proto-semi handles learning from noisy labels robustly, and the prototype-based repartitioning strategy mitigates the adverse impact of label noise. Code and data are available at https://github.com/fuxiAIlab/ProtoSemi.
    Abstract In this paper, we investigate the problem of learning with noisy labels in real-world annotation scenarios, where noise can be categorized into two types: factual noise and ambiguity noise. To better distinguish these noise types and utilize their semantics, we propose a novel sample selection-based approach for noisy label learning, called Proto-semi. Proto-semi initially divides all samples into the confident and unconfident datasets via warm-up. By leveraging the confident dataset, prototype vectors are constructed to capture class characteristics. Subsequently, the distances between the unconfident samples and the prototype vectors are calculated to facilitate noise classification. Based on these distances, the labels are either corrected or retained, resulting in the refinement of the confident and unconfident datasets. Finally, we introduce a semi-supervised learning method to enhance training. Empirical evaluations on a real-world annotated dataset substantiate the robustness of Proto-semi in handling the problem of learning from noisy labels. Meanwhile, the prototype-based repartitioning strategy is shown to be effective in mitigating the adverse impact of label noise. Our code and data are available at https://github.com/fuxiAIlab/ProtoSemi.
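
A simplified sketch of the prototype step summarized above: class prototypes are mean embeddings of confident samples, and unconfident samples are corrected by their distance to the nearest prototype. The margin rule is an illustrative stand-in for the paper's noise-classification criterion, and the semi-supervised stage is omitted.

```python
import numpy as np

def build_prototypes(conf_embeddings: np.ndarray, conf_labels: np.ndarray, n_classes: int) -> np.ndarray:
    """Prototype of each class = mean embedding of its confident samples."""
    return np.stack([conf_embeddings[conf_labels == c].mean(axis=0) for c in range(n_classes)])

def relabel_unconfident(unconf_embeddings: np.ndarray, unconf_labels: np.ndarray,
                        prototypes: np.ndarray, margin: float = 0.5):
    """Correct a noisy label when the nearest prototype disagrees by a clear margin;
    otherwise keep the given label (a simplification of the paper's rule)."""
    dists = np.linalg.norm(unconf_embeddings[:, None, :] - prototypes[None, :, :], axis=-1)
    nearest = dists.argmin(axis=1)
    given_dist = dists[np.arange(len(unconf_labels)), unconf_labels]
    best_dist = dists[np.arange(len(unconf_labels)), nearest]
    corrected = np.where(given_dist - best_dist > margin, nearest, unconf_labels)
    return corrected, dists
```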

Testing the Depth of ChatGPT’s Comprehension via Cross-Modal Tasks Based on ASCII-Art: GPT3.5’s Abilities in Regard to Recognizing and Generating ASCII-Art Are Not Totally Lacking

  • paper_url: http://arxiv.org/abs/2307.16806
  • repo_url: None
  • paper_authors: David Bayani
  • for: To probe GPT3.5's abilities on visual tasks in which the inputs are provided as ASCII art, without distilling them into a lingual summary.
  • methods: Experiments analyze the model's performance on image recognition tasks after transforms typical in visual settings, trials investigating knowledge of image parts, and tasks covering image generation.
  • results: GPT3.5 shows some ability to recognize parts of ASCII-art images and to generate ASCII art; as the title states, these abilities are not totally lacking.
    Abstract Over the eight months since its release, ChatGPT and its underlying model, GPT3.5, have garnered massive attention, due to their potent mix of capability and accessibility. While a niche-industry of papers have emerged examining the scope of capabilities these models possess, the information fed to and extracted from these networks has been either natural language text or stylized, code-like language. Drawing inspiration from the prowess we expect a truly human-level intelligent agent to have across multiple signal modalities, in this work we examine GPT3.5's aptitude for visual tasks, where the inputs feature content provided as ASCII-art without overt distillation into a lingual summary. We conduct experiments analyzing the model's performance on image recognition tasks after various transforms typical in visual settings, trials investigating knowledge of image parts, and tasks covering image generation.

LUCID-GAN: Conditional Generative Models to Locate Unfairness

  • paper_url: http://arxiv.org/abs/2307.15466
  • repo_url: https://github.com/integrated-intelligence-lab/canonical_sets
  • paper_authors: Andres Algaba, Carmen Mazijn, Carina Prunkl, Jan Danckaert, Vincent Ginis
  • for: To locate unethical biases in black-box models.
  • methods: LUCID-GAN generates canonical inputs via a conditional generative model instead of gradient-based inverse design, exposing potential unethical biases in a model's internal logic; it applies to non-differentiable models, ensures that canonical sets consist of realistic inputs, and allows assessing proxy and intersectional discrimination.
  • results: Empirical evaluation on the UCI Adult and COMPAS data sets shows that LUCID-GAN detects unethical biases in black-box models without requiring access to the training data.
    Abstract Most group fairness notions detect unethical biases by computing statistical parity metrics on a model's output. However, this approach suffers from several shortcomings, such as philosophical disagreement, mutual incompatibility, and lack of interpretability. These shortcomings have spurred the research on complementary bias detection methods that offer additional transparency into the sources of discrimination and are agnostic towards an a priori decision on the definition of fairness and choice of protected features. A recent proposal in this direction is LUCID (Locating Unfairness through Canonical Inverse Design), where canonical sets are generated by performing gradient descent on the input space, revealing a model's desired input given a preferred output. This information about the model's mechanisms, i.e., which feature values are essential to obtain specific outputs, allows exposing potential unethical biases in its internal logic. Here, we present LUCID-GAN, which generates canonical inputs via a conditional generative model instead of gradient-based inverse design. LUCID-GAN has several benefits, including that it applies to non-differentiable models, ensures that canonical sets consist of realistic inputs, and allows to assess proxy and intersectional discrimination. We empirically evaluate LUCID-GAN on the UCI Adult and COMPAS data sets and show that it allows for detecting unethical biases in black-box models without requiring access to the training data.

Unsupervised machine-learning shock-capturing technique for high-order solvers

  • paper_url: http://arxiv.org/abs/2308.00086
  • repo_url: None
  • paper_authors: Andrés Mateo-Gabín, Kenza Tlales, Eusebio Valero, Esteban Ferrer, Gonzalo Rubio
  • for: To improve the robustness and efficiency of CFD codes for complex geometries and varied flow configurations.
  • methods: An unsupervised machine-learning shock-capturing sensor based on Gaussian Mixture Models (GMMs) that needs no parameter tuning yet remains accurate and robust (see the sketch below); it is integrated into a high-order compressible discontinuous Galerkin solver in which artificial viscosity is modulated to capture shocks.
  • results: On supersonic test cases, including high Reynolds numbers, the GMM sensor matches the effectiveness of fine-tuned state-of-the-art sensors; its adaptive nature and lack of training-data requirements make it suitable for complex geometries and varied flow configurations.
    Abstract We present a novel unsupervised machine learning shock capturing algorithm based on Gaussian Mixture Models (GMMs). The proposed GMM sensor demonstrates remarkable accuracy in detecting shocks and is robust across diverse test cases without the need for parameter tuning. We compare the GMM-based sensor with state-of-the-art alternatives. All methods are integrated into a high-order compressible discontinuous Galerkin solver where artificial viscosity can be modulated to capture shocks. Supersonic test cases, including high Reynolds numbers, showcase the sensor's performance, demonstrating the same effectiveness as fine-tuned state-of-the-art sensors. %The nodal DG aproach allows for potential applications in sub-cell flux-differencing formulations, supersonic feature detection, and mesh refinement. The adaptive nature and ability to function without extensive training datasets make this GMM-based sensor suitable for complex geometries and varied flow configurations. Our study reveals the potential of unsupervised machine learning methods, exemplified by the GMM sensor, to improve the robustness and efficiency of advanced CFD codes.
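
A schematic of the GMM-based sensing idea using scikit-learn: fit a two-component Gaussian mixture to a per-element smoothness feature and flag elements assigned to the component with the larger mean as troubled. The feature itself is a placeholder, not the paper's indicator.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def gmm_shock_sensor(feature: np.ndarray) -> np.ndarray:
    """`feature`: one smoothness indicator per element (e.g., a modal-energy ratio).
    Returns a boolean mask of elements flagged as troubled (shock-containing)."""
    gmm = GaussianMixture(n_components=2, random_state=0).fit(feature.reshape(-1, 1))
    labels = gmm.predict(feature.reshape(-1, 1))
    rough_component = int(np.argmax(gmm.means_.ravel()))   # component with the larger mean feature
    return labels == rough_component

# troubled = gmm_shock_sensor(per_element_feature)
# artificial viscosity would then be activated only on the flagged elements
```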

Worrisome Properties of Neural Network Controllers and Their Symbolic Representations

  • paper_url: http://arxiv.org/abs/2307.15456
  • repo_url: https://github.com/mimuw-rl/worrisome-nn
  • paper_authors: Jacek Cyranka, Kevin E M Church, Jean-Philippe Lessard
  • for: To study the robustness of controllers for simple reinforcement learning benchmark problems.
  • methods: Neural network controllers and their low-neuron and symbolic abstractions are analyzed, with a systematic robustness-study algorithm and computer-assisted proofs.
  • results: Typical controllers that reach high mean return still generate an abundance of persistent low-return solutions, a highly undesirable property that an adversary can easily exploit; simpler controllers admit more persistent bad solutions. The existence of persistent solutions and, in some cases, periodic orbits is proven using a computer-assisted proof methodology.
    Abstract We raise concerns about controllers' robustness in simple reinforcement learning benchmark problems. We focus on neural network controllers and their low neuron and symbolic abstractions. A typical controller reaching high mean return values still generates an abundance of persistent low-return solutions, which is a highly undesirable property, easily exploitable by an adversary. We find that the simpler controllers admit more persistent bad solutions. We provide an algorithm for a systematic robustness study and prove existence of persistent solutions and, in some cases, periodic orbits, using a computer-assisted proof methodology.

Autonomous Payload Thermal Control

  • paper_url: http://arxiv.org/abs/2307.15438
  • repo_url: None
  • paper_authors: Alejandro D. Mousist
  • for: To address thermal control in small satellites, where the limited room for heat-control equipment, scientific instruments, and electronics makes power dissipation difficult and risks shortening component lifetime and degrading mission performance.
  • methods: A deep reinforcement learning framework based on the Soft Actor-Critic algorithm learns the onboard thermal control policy.
  • results: Evaluated both in a simple simulated environment and on a real space edge processing computer that will fly on the future IMAGIN-e mission hosted on the ISS, the framework learns to control the payload processing power so as to keep temperatures within operational ranges, complementing traditional thermal control systems.
    Abstract In small satellites there is less room for heat control equipment, scientific instruments, and electronic components. Furthermore, the near proximity of the electronics makes power dissipation difficult, with the risk of not being able to control the temperature appropriately, reducing component lifetime and mission performance. To address this challenge, taking advantage of the advent of increasing intelligence on board satellites, a deep reinforcement learning based framework that uses Soft Actor-Critic algorithm is proposed for learning the thermal control policy onboard. The framework is evaluated both in a naive simulated environment and in a real space edge processing computer that will be shipped in the future IMAGIN-e mission and hosted in the ISS. The experiment results show that the proposed framework is able to learn to control the payload processing power to maintain the temperature under operational ranges, complementing traditional thermal control systems.

Improvable Gap Balancing for Multi-Task Learning

  • paper_url: http://arxiv.org/abs/2307.15429
  • repo_url: https://github.com/yanqidai/igb4mtl
  • paper_authors: Yanqi Dai, Nanyi Fei, Zhiwu Lu
  • for: To improve multi-task learning (MTL) performance by balancing the improvable gap, defined as the distance between a task's current training progress and its desired final training progress.
  • methods: Two novel improvable gap balancing (IGB) algorithms: one uses a simple heuristic, while the other, for the first time, deploys deep reinforcement learning for MTL; instead of directly balancing the losses, both dynamically assign task weights to balance the improvable gaps, and IGB is further combined with gradient balancing.
  • results: Extensive experiments on two benchmark datasets show that the IGB algorithms achieve the best results for MTL via loss balancing and obtain further improvements when combined with gradient balancing.
    Abstract In multi-task learning (MTL), gradient balancing has recently attracted more research interest than loss balancing since it often leads to better performance. However, loss balancing is much more efficient than gradient balancing, and thus it is still worth further exploration in MTL. Note that prior studies typically ignore that there exist varying improvable gaps across multiple tasks, where the improvable gap per task is defined as the distance between the current training progress and desired final training progress. Therefore, after loss balancing, the performance imbalance still arises in many cases. In this paper, following the loss balancing framework, we propose two novel improvable gap balancing (IGB) algorithms for MTL: one takes a simple heuristic, and the other (for the first time) deploys deep reinforcement learning for MTL. Particularly, instead of directly balancing the losses in MTL, both algorithms choose to dynamically assign task weights for improvable gap balancing. Moreover, we combine IGB and gradient balancing to show the complementarity between the two types of algorithms. Extensive experiments on two benchmark datasets demonstrate that our IGB algorithms lead to the best results in MTL via loss balancing and achieve further improvements when combined with gradient balancing. Code is available at https://github.com/YanqiDai/IGB4MTL.

Implicit neural representation for change detection

  • paper_url: http://arxiv.org/abs/2307.15428
  • repo_url: None
  • paper_authors: Peter Naylor, Diego Di Carlo, Arianna Traviglia, Makoto Yamada, Marco Fiorucci
  • for: To detect changes between pairs of 3D airborne LiDAR point clouds acquired at two different times over the same area, despite unmatched spatial supports and acquisition-system noise.
  • methods: An unsupervised method with two components: a Neural Field (NF) for continuous shape reconstruction and a Gaussian Mixture Model (GMM) for categorising changes. The NF offers a grid-agnostic representation of the bi-temporal point clouds that can be regularised to increase high-frequency detail and reduce noise, and the reconstructions are compared at arbitrary spatial scales, increasing detection capability.
  • results: On a benchmark of simulated LiDAR point clouds for urban sprawl, with varying resolutions, input modalities and noise levels, the method beats the previous state of the art by a 10% margin in the intersection-over-union metric; applied to a real-world scenario of illegal excavation (looting) of archaeological sites, its detections match findings from field experts.
    Abstract Detecting changes that occurred in a pair of 3D airborne LiDAR point clouds, acquired at two different times over the same geographical area, is a challenging task because of unmatching spatial supports and acquisition system noise. Most recent attempts to detect changes on point clouds are based on supervised methods, which require large labelled data unavailable in real-world applications. To address these issues, we propose an unsupervised approach that comprises two components: Neural Field (NF) for continuous shape reconstruction and a Gaussian Mixture Model for categorising changes. NF offer a grid-agnostic representation to encode bi-temporal point clouds with unmatched spatial support that can be regularised to increase high-frequency details and reduce noise. The reconstructions at each timestamp are compared at arbitrary spatial scales, leading to a significant increase in detection capabilities. We apply our method to a benchmark dataset of simulated LiDAR point clouds for urban sprawling. The dataset offers different challenging scenarios with different resolutions, input modalities and noise levels, allowing a multi-scenario comparison of our method with the current state-of-the-art. We boast the previous methods on this dataset by a 10% margin in intersection over union metric. In addition, we apply our methods to a real-world scenario to identify illegal excavation (looting) of archaeological sites and confirm that they match findings from field experts.

Deep Generative Models, Synthetic Tabular Data, and Differential Privacy: An Overview and Synthesis

  • paper_url: http://arxiv.org/abs/2307.15424
  • repo_url: None
  • paper_authors: Conor Hassan, Robert Salomone, Kerrie Mengersen
  • for: To provide a comprehensive synthesis of recent developments in synthetic data generation via deep generative models, focusing on tabular datasets and on privacy-sensitive data.
  • methods: The review explains the underlying concepts, including unsupervised learning, neural networks, and generative models, and highlights the advantages of deep generative models over other methods for generating tabular data.
  • results: The challenges and considerations involved in using deep generative models for tabular data, such as data normalization, privacy concerns, and model evaluation, are discussed in detail.
    Abstract This article provides a comprehensive synthesis of the recent developments in synthetic data generation via deep generative models, focusing on tabular datasets. We specifically outline the importance of synthetic data generation in the context of privacy-sensitive data. Additionally, we highlight the advantages of using deep generative models over other methods and provide a detailed explanation of the underlying concepts, including unsupervised learning, neural networks, and generative models. The paper covers the challenges and considerations involved in using deep generative models for tabular datasets, such as data normalization, privacy concerns, and model evaluation. This review provides a valuable resource for researchers and practitioners interested in synthetic data generation and its applications.

Is One Epoch All You Need For Multi-Fidelity Hyperparameter Optimization?

  • paper_url: http://arxiv.org/abs/2307.15422
  • repo_url: https://github.com/deephyper/benchmark
  • paper_authors: Romain Egele, Isabelle Guyon, Yixuan Sun, Prasanna Balaprakash
  • for: To reduce the computational cost of hyperparameter optimization (HPO) for machine learning models.
  • methods: Multi-fidelity HPO (MF-HPO) exploits intermediate accuracy levels during training to discard low-performing models early; representative MF-HPO methods are compared against a simple baseline that, after training every candidate for only one epoch, keeps only the Top-K models and trains them further to select the best one (see the sketch below).
  • results: The baseline achieves results similar to its MF-HPO counterparts while requiring an order of magnitude less computation; analysis of the learning curves reveals a few dominant curves that explain this success, suggesting that benchmarks should always include this baseline and should be broadened to cover more complex cases.
    Abstract Hyperparameter optimization (HPO) is crucial for fine-tuning machine learning models but can be computationally expensive. To reduce costs, Multi-fidelity HPO (MF-HPO) leverages intermediate accuracy levels in the learning process and discards low-performing models early on. We compared various representative MF-HPO methods against a simple baseline on classical benchmark data. The baseline involved discarding all models except the Top-K after training for only one epoch, followed by further training to select the best model. Surprisingly, this baseline achieved similar results to its counterparts, while requiring an order of magnitude less computation. Upon analyzing the learning curves of the benchmark data, we observed a few dominant learning curves, which explained the success of our baseline. This suggests that researchers should (1) always use the suggested baseline in benchmarks and (2) broaden the diversity of MF-HPO benchmarks to include more complex cases.
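
A pseudocode-style sketch of the one-epoch Top-K baseline described above; `sample_config`, `train_for`, and `validate` are hypothetical helpers, not an existing API.

```python
def one_epoch_topk_baseline(n_configs: int, k: int, full_epochs: int,
                            sample_config, train_for, validate):
    """1) train every sampled config for a single epoch,
       2) keep the Top-K by validation score,
       3) train the survivors to completion and return the best one."""
    candidates = [sample_config() for _ in range(n_configs)]
    models = [train_for(cfg, epochs=1) for cfg in candidates]
    scored = sorted(zip(models, candidates), key=lambda mc: validate(mc[0]), reverse=True)

    best_model, best_score = None, float("-inf")
    for model, cfg in scored[:k]:
        model = train_for(cfg, epochs=full_epochs, warm_start=model)
        score = validate(model)
        if score > best_score:
            best_model, best_score = model, score
    return best_model, best_score
```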

The Initial Screening Order Problem

  • paper_url: http://arxiv.org/abs/2307.15398
  • repo_url: None
  • paper_authors: Jose M. Alvarez, Salvatore Ruggieri
  • for: To address the initial screening order problem, a crucial step within candidate screening in which a human-like screener aims to find the first k suitable candidates, rather than the best k candidates, in a candidate pool arranged according to an initial screening order.
  • methods: Formal analysis of how the initial screening order affects the selected set of k candidates, in particular under unbalanced candidate pools (e.g., more male than female candidates), based on a collaboration with a large company to better understand its hiring process for potential automation.
  • results: Under an unbalanced pool, the human-like screener can suffer from uneven efforts that hinder its decision-making over the protected, under-represented group relative to the non-protected, over-represented group; further fairness results are proven for the human-like screener.
    Abstract In this paper we present the initial screening order problem, a crucial step within candidate screening. It involves a human-like screener with an objective to find the first k suitable candidates rather than the best k suitable candidates in a candidate pool given an initial screening order. The initial screening order represents the way in which the human-like screener arranges the candidate pool prior to screening. The choice of initial screening order has considerable effects on the selected set of k candidates. We prove that under an unbalanced candidate pool (e.g., having more male than female candidates), the human-like screener can suffer from uneven efforts that hinder its decision-making over the protected, under-represented group relative to the non-protected, over-represented group. Other fairness results are proven under the human-like screener. This research is based on a collaboration with a large company to better understand its hiring process for potential automation. Our main contribution is the formalization of the initial screening order problem which, we argue, opens the path for future extensions of the current works on ranking algorithms, fairness, and automation for screening procedures.

Noisy Interpolation Learning with Shallow Univariate ReLU Networks

  • paper_url: http://arxiv.org/abs/2307.15396
  • repo_url: None
  • paper_authors: Nirmit Joshi, Gal Vardi, Nathan Srebro
  • for: To study the asymptotic overfitting behavior of two-layer ReLU networks that interpolate noisy univariate regression data with minimum norm (the $\ell_2$ norm of the weights); the interpolant is written out below.
  • methods: The minimum-norm interpolant is analyzed under different test losses ($L_p$ losses).
  • results: Overfitting is tempered for the $L_1$ loss, and for any $L_p$ loss with $p<2$, but is catastrophic for $p\geq 2$.
    Abstract We study the asymptotic overfitting behavior of interpolation with minimum norm ($\ell_2$ of the weights) two-layer ReLU networks for noisy univariate regression. We show that overfitting is tempered for the $L_1$ loss, and any $L_p$ loss for $p<2$, but catastrophic for $p\geq 2$.
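
In symbols, the object of study is the minimum-norm interpolant over two-layer ReLU networks, which (as a standard formulation consistent with the abstract) can be written as

$$
\hat{\theta} \in \operatorname*{arg\,min}_{\theta}\ \|\theta\|_2
\quad \text{s.t.} \quad f_\theta(x_i) = y_i \ \ \text{for all } i,
\qquad
f_\theta(x) = \sum_{j=1}^{m} a_j\, \mathrm{ReLU}(w_j x + b_j),
$$

with overfitting then measured through the $L_p$ test risk of $f_{\hat{\theta}}$ under label noise.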

Does Full Waveform Inversion Benefit from Big Data?

  • paper_url: http://arxiv.org/abs/2307.15388
  • repo_url: None
  • paper_authors: Peng Jin, Yinan Feng, Shihang Feng, Hanchen Wang, Yinpeng Chen, Benjamin Consolvo, Zicheng Liu, Youzuo Lin
  • for: To investigate the impact of big data on deep learning models for full waveform inversion (FWI).
  • methods: An empirical study using OpenFWI, a recently released collection of large-scale, multi-structural datasets; FWI models are trained and evaluated on a combination of 10 2D subsets containing 470K data pairs in total.
  • results: Larger datasets lead to better performance and generalization of deep learning models for FWI, and model capacity needs to scale with data size for optimal improvement.
    Abstract This paper investigates the impact of big data on deep learning models for full waveform inversion (FWI). While it is well known that big data can boost the performance of deep learning models in many tasks, its effectiveness has not been validated for FWI. To address this gap, we present an empirical study that investigates how deep learning models in FWI behave when trained on OpenFWI, a collection of large-scale, multi-structural datasets published recently. Particularly, we train and evaluate the FWI models on a combination of 10 2D subsets in OpenFWI that contain 470K data pairs in total. Our experiments demonstrate that larger datasets lead to better performance and generalization of deep learning models for FWI. We further demonstrate that model capacity needs to scale in accordance with data size for optimal improvement.

Co-attention Graph Pooling for Efficient Pairwise Graph Interaction Learning

  • paper_url: http://arxiv.org/abs/2307.15377
  • repo_url: https://github.com/leejunhyun/coattentiongraphpooling
  • paper_authors: Junhyun Lee, Bumsoo Kim, Minji Jeon, Jaewoo Kang
  • for: To make graph neural networks (GNNs) efficient at pairwise graph interaction learning, which many real-world applications require (e.g., scene graph matching, code searching, and drug-drug interaction prediction).
  • methods: Co-Attention Graph Pooling (CAGPool) extracts interaction representations at the graph level using co-attention within graph pooling, instead of modelling interactions at the node level, which lowers computational complexity.
  • results: CAGPool exhibits performance competitive with existing methods on both classification and regression tasks using real-world datasets, while maintaining lower computational complexity.
    Abstract Graph Neural Networks (GNNs) have proven to be effective in processing and learning from graph-structured data. However, previous works mainly focused on understanding single graph inputs while many real-world applications require pair-wise analysis for graph-structured data (e.g., scene graph matching, code searching, and drug-drug interaction prediction). To this end, recent works have shifted their focus to learning the interaction between pairs of graphs. Despite their improved performance, these works were still limited in that the interactions were considered at the node-level, resulting in high computational costs and suboptimal performance. To address this issue, we propose a novel and efficient graph-level approach for extracting interaction representations using co-attention in graph pooling. Our method, Co-Attention Graph Pooling (CAGPool), exhibits competitive performance relative to existing methods in both classification and regression tasks using real-world datasets, while maintaining lower computational complexity.
    摘要 图神经网络(GNNs)已被证明能够有效地处理和学习具有图结构的数据。然而,先前的工作主要集中于理解单个图输入,而实际应用中许多场景需要对图结构数据进行成对分析(例如,场景图匹配、代码搜索和药物相互作用预测)。为此,近期的工作将注意力转移到学习图对之间的交互。虽然性能有所改善,但这些方法仍在节点级别考虑交互,从而导致计算成本高、性能欠佳。为了解决这个问题,我们提出了一种新颖且高效的图级别方法,即协同注意力图池化(CAGPool),借助图池化中的协同注意力提取交互表示。我们的方法在真实数据集上的分类和回归任务中均展现出竞争性的性能,同时保持较低的计算复杂度。
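
For readers who want a concrete picture of graph-level co-attention pooling, a minimal PyTorch sketch is given below. The layer shown here (a single-head affinity matrix and softmax-weighted sums) is an illustrative simplification under stated assumptions, not the authors' exact CAGPool architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CoAttentionPool(nn.Module):
    """Minimal co-attention pooling for a pair of graphs.

    Nodes in each graph are scored by their affinity to the other graph, and
    each graph is pooled into a single vector by an affinity-weighted sum.
    """
    def __init__(self, dim):
        super().__init__()
        self.W = nn.Parameter(torch.empty(dim, dim))
        nn.init.xavier_uniform_(self.W)

    def forward(self, x1, x2):
        # x1: (n1, dim) node features of graph 1; x2: (n2, dim) of graph 2
        affinity = x1 @ self.W @ x2.t()               # (n1, n2) pairwise affinity
        a1 = F.softmax(affinity.sum(dim=1), dim=0)    # node importance in graph 1
        a2 = F.softmax(affinity.sum(dim=0), dim=0)    # node importance in graph 2
        g1 = (a1.unsqueeze(1) * x1).sum(dim=0)        # pooled interaction vector, graph 1
        g2 = (a2.unsqueeze(1) * x2).sum(dim=0)        # pooled interaction vector, graph 2
        return torch.cat([g1, g2], dim=-1)

# Usage: pool = CoAttentionPool(64); z = pool(torch.randn(10, 64), torch.randn(7, 64))
```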

Conflict-free joint decision by lag and zero-lag synchronization in laser network

  • paper_url: http://arxiv.org/abs/2307.15373
  • repo_url: None
  • paper_authors: Hisako Ito, Takatomo Mihana, Ryoichi Horisaki, Makoto Naruse
  • for: 这篇论文是关于应用激光网络作为光学加速器求解竞争性多臂赌博机(multi-armed bandit)问题的研究。
  • methods: 该研究使用由四个半导体激光器组成的激光网络,通过延迟同步实现协同决策,并通过零延迟同步实现碰撞避免功能。
  • results: 实验表明,该系统可以实现低碰撞率和高奖励,并且可扩展到更复杂的场景。
    Abstract With the end of Moore's Law and the increasing demand for computing, photonic accelerators are garnering considerable attention. This is due to the physical characteristics of light, such as high bandwidth and multiplicity, and the various synchronization phenomena that emerge in the realm of laser physics. These factors come into play as computer performance approaches its limits. In this study, we explore the application of a laser network, acting as a photonic accelerator, to the competitive multi-armed bandit problem. In this context, conflict avoidance is key to maximizing environmental rewards. We experimentally demonstrate cooperative decision-making using zero-lag and lag synchronization within a network of four semiconductor lasers. Lag synchronization of chaos realizes effective decision-making and zero-delay synchronization is responsible for the realization of the collision avoidance function. We experimentally verified a low collision rate and high reward in a fundamental 2-player, 2-slot scenario, and showed the scalability of this system. This system architecture opens up new possibilities for intelligent functionalities in laser dynamics.
    摘要 随着摩尔定律的终结和计算需求的增长,光子加速器正受到广泛关注。这是因为光具有高带宽和多重性等物理特性,且激光物理领域存在多种同步现象;当计算机性能接近极限时,这些因素变得尤为重要。在这项研究中,我们探讨了将激光网络用作光子加速器来求解竞争性多臂赌博机问题。在这一场景下,避免冲突是最大化环境奖励的关键。我们在由四个半导体激光器组成的网络中实验演示了基于零延迟与延迟同步的协同决策:混沌的延迟同步实现了有效决策,而零延迟同步负责实现碰撞避免功能。我们在基础的2玩家、2槽位场景中实验验证了低碰撞率和高奖励,并展示了该系统的可扩展性。这一系统架构为激光动力学中的智能功能开启了新的可能性。

Toward Transparent Sequence Models with Model-Based Tree Markov Model

  • paper_url: http://arxiv.org/abs/2307.15367
  • repo_url: None
  • paper_authors: Chan Hsu, Wei-Chun Huang, Jun-Ting Wu, Chih-Yuan Li, Yihuang Kang
  • for: 该研究目标是解决复杂黑盒机器学习模型在序列数据上的解释性问题。
  • methods: 该研究提出了基于模型的树隐藏半马尔可夫模型(MOB-HSMM),该模型可以检测高死亡风险事件,并发现ICU中与死亡风险相关的隐藏模式。该模型利用从深度神经网络(DNN)中蒸馏的知识来提高预测性能,同时提供明确的解释。
  • results: 实验结果表明,通过使用LSTM学习序列模式并将其迁移到MOB树,MOB树的性能得到了改进。将MOB树与隐藏半马尔可夫模型(HSMM)相结合,能够利用可用信息发现潜在且可解释的序列。
    Abstract In this study, we address the interpretability issue in complex, black-box Machine Learning models applied to sequence data. We introduce the Model-Based tree Hidden Semi-Markov Model (MOB-HSMM), an inherently interpretable model aimed at detecting high mortality risk events and discovering hidden patterns associated with the mortality risk in Intensive Care Units (ICU). This model leverages knowledge distilled from Deep Neural Networks (DNN) to enhance predictive performance while offering clear explanations. Our experimental results indicate the improved performance of Model-Based trees (MOB trees) via employing LSTM for learning sequential patterns, which are then transferred to MOB trees. Integrating MOB trees with the Hidden Semi-Markov Model (HSMM) in the MOB-HSMM enables uncovering potential and explainable sequences using available information.
    摘要 在这项研究中,我们解决了应用于序列数据的复杂黑盒机器学习模型的可解释性问题。我们提出了基于模型的树隐藏半马尔可夫模型(MOB-HSMM),这是一种本质上可解释的模型,用于检测高死亡风险事件并发现重症监护病房(ICU)中与死亡风险相关的隐藏模式。该模型利用从深度神经网络(DNN)中蒸馏的知识来提高预测性能,同时提供明确的解释。实验结果表明,通过使用LSTM学习序列模式并将其迁移到MOB树,MOB树的性能得到了改进。将MOB树与隐藏半马尔可夫模型(HSMM)结合为MOB-HSMM,能够利用可用信息发现潜在且可解释的序列。

Confident Feature Ranking

  • paper_url: http://arxiv.org/abs/2307.15361
  • repo_url: None
  • paper_authors: Bitya Neuhof, Yuval Benjamini
  • for: 本研究旨在提出一种基于对比比较的后处方法,以确定特征重要性值的稳定排名。
  • methods: 该方法基于特征重要性值的成对比较,同时给出稳定的排名及相应的置信区间。
  • results: 研究表明,该方法能以高概率包含“真实”(无限样本)排名,并支持选择 top-k 集合。
    Abstract Interpretation of feature importance values often relies on the relative order of the features rather than on the value itself, referred to as ranking. However, the order may be unstable due to the small sample sizes used in calculating the importance values. We propose that post-hoc importance methods produce a ranking and simultaneous confident intervals for the rankings. Based on pairwise comparisons of the feature importance values, our method is guaranteed to include the ``true'' (infinite sample) ranking with high probability and allows for selecting top-k sets.
    摘要 特征重要性值的解读往往依赖于特征之间的相对顺序(即排名)而非数值本身。然而,由于计算重要性值时样本量有限,排名顺序可能并不稳定。我们提出,事后重要性方法应同时给出排名及其联合置信区间。基于特征重要性值的成对比较,我们的方法能以高概率包含“真实”(无限样本)排名,并允许选择 top-k 集合。Note: "top-k" refers to the top k features in the dataset, where k is a positive integer.
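
A minimal sketch of how pairwise comparisons of importance values can be turned into rank intervals is shown below. The choice of a paired t-test with a Bonferroni correction is an assumption made for illustration; the paper's actual test and multiplicity control may differ.

```python
import numpy as np
from scipy import stats

def rank_confidence_intervals(importances, alpha=0.05):
    """Rank intervals from pairwise comparisons of feature importances.

    `importances` is an (R, p) array of importance values over R repetitions
    (e.g. bootstrap resamples).  Feature j's best possible rank is 1 plus the
    number of features significantly larger than j; its worst possible rank is
    p minus the number of features j is significantly larger than.
    """
    R, p = importances.shape
    thresh = alpha / (p * (p - 1) / 2)          # Bonferroni over all pairs
    greater = np.zeros((p, p), dtype=bool)      # greater[i, j]: i significantly > j
    for i in range(p):
        for j in range(p):
            if i == j:
                continue
            diff = importances[:, i] - importances[:, j]
            res = stats.ttest_1samp(diff, 0.0)
            if res.pvalue < thresh and diff.mean() > 0:
                greater[i, j] = True
    lower = 1 + greater.sum(axis=0)             # rank 1 = most important
    upper = p - greater.sum(axis=1)
    return list(zip(lower, upper))              # (best rank, worst rank) per feature

# Example: ranks = rank_confidence_intervals(np.random.rand(30, 5))
```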

Med-HALT: Medical Domain Hallucination Test for Large Language Models

  • paper_url: http://arxiv.org/abs/2307.15343
  • repo_url: None
  • paper_authors: Logesh Kumar Umapathi, Ankit Pal, Malaikannan Sankarasubbu
  • for: 这种研究旨在解决大语言模型(LLM)中的幻觉问题,尤其在医疗领域。幻觉可能会在医疗应用中产生严重的后果。
  • methods: 我们提出了一个新的基准和数据集,即医疗领域幻觉测试(Med-HALT),用于评估和减少幻觉。Med-HALT 包含来自多个国家医学考试的多样化国际数据集,并包括多种创新的测试方式。
  • results: 我们对主要的 LLM 进行了评估,包括 Text Davinci、GPT-3.5、LlaMa-2、MPT 和 Falcon,发现了这些模型在幻觉方面的显著差异。本文提供了详细的数据集描述,促进了透明度和重复性。通过这项工作,我们希望为医疗领域中更安全和可靠的语言模型的开发作出贡献。
    Abstract This research paper focuses on the challenges posed by hallucinations in large language models (LLMs), particularly in the context of the medical domain. Hallucination, wherein these models generate plausible yet unverified or incorrect information, can have serious consequences in healthcare applications. We propose a new benchmark and dataset, Med-HALT (Medical Domain Hallucination Test), designed specifically to evaluate and reduce hallucinations. Med-HALT provides a diverse multinational dataset derived from medical examinations across various countries and includes multiple innovative testing modalities. Med-HALT includes two categories of tests reasoning and memory-based hallucination tests, designed to assess LLMs's problem-solving and information retrieval abilities. Our study evaluated leading LLMs, including Text Davinci, GPT-3.5, LlaMa-2, MPT, and Falcon, revealing significant differences in their performance. The paper provides detailed insights into the dataset, promoting transparency and reproducibility. Through this work, we aim to contribute to the development of safer and more reliable language models in healthcare. Our benchmark can be found at medhalt.github.io
    摘要 本研究关注大型语言模型(LLM)中的幻觉问题,特别是在医疗领域的背景下。幻觉指模型生成看似合理但未经验证或不正确的信息,这在医疗应用中可能造成严重后果。我们提出了一个新的基准和数据集 Med-HALT(Medical Domain Hallucination Test),专门用于评估和减少幻觉。Med-HALT 提供了来自多个国家医学考试的多样化国际数据集,并包含多种创新的测试方式,分为推理类与基于记忆的幻觉测试两大类,用于评估 LLM 的问题求解与信息检索能力。我们评估了 Text Davinci、GPT-3.5、LlaMa-2、MPT 和 Falcon 等主流 LLM,发现它们的表现存在显著差异。论文对数据集做了详细说明,以促进透明性和可复现性。我们希望通过这项工作,为医疗领域更安全、更可靠的语言模型的发展做出贡献。基准可在 medhalt.github.io 获取。

The Radon Signed Cumulative Distribution Transform and its applications in classification of Signed Images

  • paper_url: http://arxiv.org/abs/2307.15339
  • repo_url: https://github.com/rohdelab/PyTransKit
  • paper_authors: Le Gong, Shiying Li, Naqib Sad Pathan, Mohammad Shifat-E-Rabbi, Gustavo K. Rohde, Abu Hasnat Mohammad Rubaiyat, Sumati Thareja
  • for: 这个论文的目的是提出一种基于运输和最优运输的新图像表示技术。
  • methods: 这种新的图像表示方法结合了广泛使用的 Radon 变换和近期提出的信号表示方法 Signed Cumulative Distribution Transform(带符号累积分布变换)。
  • results: 这种新的图像表示方法能更好地刻画带符号图像的信息内容,与现有的运输类变换方法和基于深度学习的分类方法相比,可获得更高的分类精度。
    Abstract Here we describe a new image representation technique based on the mathematics of transport and optimal transport. The method relies on the combination of the well-known Radon transform for images and a recent signal representation method called the Signed Cumulative Distribution Transform. The newly proposed method generalizes previous transport-related image representation methods to arbitrary functions (images), and thus can be used in more applications. We describe the new transform, and some of its mathematical properties and demonstrate its ability to partition image classes with real and simulated data. In comparison to existing transport transform methods, as well as deep learning-based classification methods, the new transform more accurately represents the information content of signed images, and thus can be used to obtain higher classification accuracies. The implementation of the proposed method in Python language is integrated as a part of the software package PyTransKit, available on Github.
    摘要 我们介绍一种基于运输与最优运输数学的新图像表示技术。该方法将广为人知的 Radon 变换与近期提出的带符号累积分布变换(Signed Cumulative Distribution Transform)相结合。新方法将以往与运输相关的图像表示方法推广到任意函数(图像),因而可用于更多的应用场景。我们描述了这一新变换及其部分数学性质,并在真实与模拟数据上展示了它划分图像类别的能力。与现有的运输类变换方法以及基于深度学习的分类方法相比,新变换能更准确地表示带符号图像的信息内容,从而获得更高的分类精度。该方法的 Python 实现已集成到软件包 PyTransKit 中,可在 Github 上获取。
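
To make the transport-based representation concrete, here is a heavily simplified sketch that combines scikit-image's Radon transform with an approximate Cumulative Distribution Transform per projection. The handling of signed images (splitting into positive and negative parts, as in the Signed CDT) is omitted here; the full implementation lives in PyTransKit.

```python
import numpy as np
from skimage.transform import radon

def radon_cdt(image, n_angles=60, n_quantiles=64):
    """Simplified Radon + CDT feature map for a non-negative image."""
    image = np.clip(image, 0, None)                    # drop the signed part in this sketch
    theta = np.linspace(0.0, 180.0, n_angles, endpoint=False)
    sinogram = radon(image, theta=theta, circle=False) # (n_detectors, n_angles)
    q = np.linspace(0.0, 1.0, n_quantiles)
    feats = []
    for k in range(sinogram.shape[1]):
        proj = sinogram[:, k] + 1e-12                  # avoid zero-mass projections
        cdf = np.cumsum(proj) / proj.sum()
        x = np.arange(len(proj))
        feats.append(np.interp(q, cdf, x))             # inverse CDF on a uniform grid
    return np.stack(feats)                             # (n_angles, n_quantiles)

# Example: rep = radon_cdt(np.random.rand(64, 64))
```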

Staging E-Commerce Products for Online Advertising using Retrieval Assisted Image Generation

  • paper_url: http://arxiv.org/abs/2307.15326
  • repo_url: None
  • paper_authors: Yueh-Ning Ku, Mikhail Kuznetsov, Shaunak Mishra, Paloma de Juan
  • For: 提高动态产品广告(DPA)图像的吸引力和真实感,使用户更容易点击广告。
  • Methods: 使用生成对抗网络(GAN)和检索辅助GAN(Retrieval Assisted GANs)为产品图像生成精心布置的背景,以增强其吸引力和真实感。
  • Results: 通过离线指标和人工评估,证明了我们的复制粘贴布景(copy-paste staging)方法可以提高DPA图像的吸引力和真实感,并且可以进一步由产品图片生成动态的产品视频广告。
    Abstract Online ads showing e-commerce products typically rely on the product images in a catalog sent to the advertising platform by an e-commerce platform. In the broader ads industry such ads are called dynamic product ads (DPA). It is common for DPA catalogs to be in the scale of millions (corresponding to the scale of products which can be bought from the e-commerce platform). However, not all product images in the catalog may be appealing when directly re-purposed as an ad image, and this may lead to lower click-through rates (CTRs). In particular, products just placed against a solid background may not be as enticing and realistic as a product staged in a natural environment. To address such shortcomings of DPA images at scale, we propose a generative adversarial network (GAN) based approach to generate staged backgrounds for un-staged product images. Generating the entire staged background is a challenging task susceptible to hallucinations. To get around this, we introduce a simpler approach called copy-paste staging using retrieval assisted GANs. In copy paste staging, we first retrieve (from the catalog) staged products similar to the un-staged input product, and then copy-paste the background of the retrieved product in the input image. A GAN based in-painting model is used to fill the holes left after this copy-paste operation. We show the efficacy of our copy-paste staging method via offline metrics, and human evaluation. In addition, we show how our staging approach can enable animations of moving products leading to a video ad from a product image.
    摘要 展示电商产品的在线广告通常依赖电商平台发送给广告平台的商品目录图片。在更广泛的广告行业中,这类广告被称为动态产品广告(DPA)。DPA目录的规模通常达到数百万(对应电商平台上可购买的商品数量)。然而,并非所有目录中的产品图片直接用作广告图片时都足够吸引人,这可能导致较低的点击率(CTR)。特别是,仅放置在纯色背景前的产品,可能不如置于自然环境中的产品那样吸引人和真实。为了大规模解决DPA图片的这些不足,我们提出了一种基于生成对抗网络(GAN)的方法,为未布景的产品图片生成布景背景。直接生成完整的布景背景是一项容易产生幻觉的挑战性任务。为绕开这一问题,我们引入了一种更简单的方法,称为基于检索辅助GAN的复制粘贴布景(copy-paste staging):首先从目录中检索与输入的未布景产品相似的已布景产品,然后将检索到的产品的背景复制粘贴到输入图片中,再使用基于GAN的图像修补(in-painting)模型填补复制粘贴后留下的空洞。我们通过离线指标和人工评估展示了复制粘贴布景方法的有效性。此外,我们还展示了这种布景方法如何支持产品移动的动画,从而由产品图片生成视频广告。
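
The copy-paste staging pipeline can be illustrated with a few lines of Python. The precomputed image embeddings and the use of OpenCV's inpainting as a stand-in for the paper's GAN-based in-painting model are assumptions made for a self-contained sketch.

```python
import numpy as np
import cv2

def copy_paste_stage(product_img, product_mask, catalog_imgs, catalog_feats, query_feat):
    """Retrieval-assisted copy-paste staging (simplified).

    product_img: HxWx3 uint8; product_mask: HxW uint8 (1 inside the product);
    catalog_imgs: list of HxWx3 uint8 staged images (assumed pre-resized);
    catalog_feats / query_feat: precomputed embeddings for retrieval.
    """
    # 1) retrieve the most similar staged catalog image by cosine similarity
    sims = catalog_feats @ query_feat / (
        np.linalg.norm(catalog_feats, axis=1) * np.linalg.norm(query_feat) + 1e-8)
    background = catalog_imgs[int(np.argmax(sims))]

    # 2) copy-paste: keep product pixels, take the rest from the retrieved background
    mask3 = np.repeat(product_mask[..., None], 3, axis=2).astype(bool)
    staged = np.where(mask3, product_img, background).astype(np.uint8)

    # 3) inpaint a thin band around the product boundary to hide the paste seam
    kernel = np.ones((7, 7), np.uint8)
    seam = cv2.dilate(product_mask, kernel) - cv2.erode(product_mask, kernel)
    return cv2.inpaint(staged, seam.astype(np.uint8), 5, cv2.INPAINT_TELEA)
```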

Partial observations, coarse graining and equivariance in Koopman operator theory for large-scale dynamical systems

  • paper_url: http://arxiv.org/abs/2307.15325
  • repo_url: None
  • paper_authors: Sebastian Peitz, Hans Harder, Feliks Nüske, Friedrich Philipp, Manuel Schaller, Karl Worthmann
  • for: 这篇论文旨在解决大规模系统数据驱动分析、预测和控制中的一个问题,即在只有部分观测数据时,经典的EDMD算法并不会自动给出底层系统Koopman算子的近似。
  • methods: 这篇论文使用 Koopman 算子研究大规模系统的非线性动力学,并提出了一种将系统动力学中的对称性传递到 Koopman 算子上的新方法,从而大幅提高模型效率。
  • results: 数字实验表明,这种新方法可以减少数据量,同时保持模型的准确性,并且可以与域分解技术相结合以提高效率。
    Abstract The Koopman operator has become an essential tool for data-driven analysis, prediction and control of complex systems, the main reason being the enormous potential of identifying linear function space representations of nonlinear dynamics from measurements. Until now, the situation where for large-scale systems, we (i) only have access to partial observations (i.e., measurements, as is very common for experimental data) or (ii) deliberately perform coarse graining (for efficiency reasons) has not been treated to its full extent. In this paper, we address the pitfall associated with this situation, that the classical EDMD algorithm does not automatically provide a Koopman operator approximation for the underlying system if we do not carefully select the number of observables. Moreover, we show that symmetries in the system dynamics can be carried over to the Koopman operator, which allows us to massively increase the model efficiency. We also briefly draw a connection to domain decomposition techniques for partial differential equations and present numerical evidence using the Kuramoto--Sivashinsky equation.
    摘要 Koopman 算子已成为复杂系统数据驱动分析、预测和控制的重要工具,其主要原因在于可以从测量数据中辨识非线性动力学的线性函数空间表示。然而,对于大规模系统,(i)我们往往只能获得部分观测(即测量数据,这在实验数据中十分常见),或者(ii)出于效率考虑而有意进行粗粒化,这类情形此前尚未得到充分处理。在本文中,我们指出了与此相关的一个陷阱:若不仔细选择可观测量的数量,经典的EDMD算法并不会自动给出底层系统Koopman算子的近似。此外,我们还表明系统动力学中的对称性可以传递到 Koopman 算子上,这使我们能够大幅提高模型效率。我们还简要建立了与偏微分方程区域分解技术的联系,并利用 Kuramoto-Sivashinsky 方程给出了数值证据。
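
For reference, the classical EDMD estimator that the abstract discusses can be written in a few lines of NumPy; the paper's contributions (partial observations, coarse graining, equivariance) build on top of this basic algorithm.

```python
import numpy as np

def edmd(X, Y, dictionary):
    """Classical EDMD: finite-dimensional Koopman approximation.

    X, Y are (m, n) snapshot pairs (row y_i is the successor of x_i) and
    `dictionary` maps an (m, n) state array to an (m, N) array of observables.
    The returned K satisfies psi(y) ~= psi(x) @ K in the least-squares sense.
    """
    Psi_X = dictionary(X)
    Psi_Y = dictionary(Y)
    K, *_ = np.linalg.lstsq(Psi_X, Psi_Y, rcond=None)
    return K                                   # (N, N)

# Example dictionary: monomials up to degree 2 for a 1-D state.
def monomials(X):
    x = X[:, 0]
    return np.stack([np.ones_like(x), x, x ** 2], axis=1)

# Toy linear system x_{k+1} = 0.9 x_k; K should be close to diag(1, 0.9, 0.81).
x = np.random.randn(200, 1)
K = edmd(x, 0.9 * x, monomials)
```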

Robust Visual Sim-to-Real Transfer for Robotic Manipulation

  • paper_url: http://arxiv.org/abs/2307.15320
  • repo_url: None
  • paper_authors: Ricardo Garcia, Robin Strudel, Shizhe Chen, Etienne Arlaud, Ivan Laptev, Cordelia Schmid
  • for: 论文关注如何利用域随机化(DR)方法弥合机器人操作任务中仿真到现实(sim-to-real)的视觉差距;作者旨在评估DR方法在具有挑战性的机器人操作任务中的效果,并提出一种系统性的DR参数选择方法。
  • methods: 作者提出一个离线代理任务,即立方体定位(cube localization),用于选择纹理随机化、光照随机化、物体颜色变化和相机参数等DR参数;随后使用离线优化得到的DR参数在仿真中训练视觉运动(visuomotor)策略,并将这些策略直接部署到真实机器人上。
  • results: 该方法在一组多样且具有挑战性的操作任务上取得了平均93%的成功率;仿真训练得到的策略在应对真实场景中的视觉变化时,优于使用真实但数量有限的数据学习得到的策略。
    Abstract Learning visuomotor policies in simulation is much safer and cheaper than in the real world. However, due to discrepancies between the simulated and real data, simulator-trained policies often fail when transferred to real robots. One common approach to bridge the visual sim-to-real domain gap is domain randomization (DR). While previous work mainly evaluates DR for disembodied tasks, such as pose estimation and object detection, here we systematically explore visual domain randomization methods and benchmark them on a rich set of challenging robotic manipulation tasks. In particular, we propose an off-line proxy task of cube localization to select DR parameters for texture randomization, lighting randomization, variations of object colors and camera parameters. Notably, we demonstrate that DR parameters have similar impact on our off-line proxy task and on-line policies. We, hence, use off-line optimized DR parameters to train visuomotor policies in simulation and directly apply such policies to a real robot. Our approach achieves 93% success rate on average when tested on a diverse set of challenging manipulation tasks. Moreover, we evaluate the robustness of policies to visual variations in real scenes and show that our simulator-trained policies outperform policies learned using real but limited data. Code, simulation environment, real robot datasets and trained models are available at https://www.di.ens.fr/willow/research/robust_s2r/.
    摘要 顺序训练在模拟环境中的视听动作策略比实际世界更安全和更便宜。然而,由于模拟和实际数据之间的差异,模拟训练的策略通常在转移到实际机器人上失败。一种常见的方法是域随机化(DR),以 bridge the visual sim-to-real domain gap。在这里,我们系统地探讨视听域随机化方法,并对其进行了丰富的机器人 manipulate 任务的benchmark。具体来说,我们提出了一个离线代理任务——立方体localization,用于选择DR参数的Texture randomization、lighting randomization、物体颜色变换和摄像头参数。值得一提的是,我们示出了DR参数对我们的离线代理任务和在线策略具有相似的影响。因此,我们使用离线优化的DR参数来在模拟环境中训练视听动作策略,然后直接将其应用到实际机器人上。我们的方法在多种挑战性的机器人 manipulate 任务上得到了93%的成功率的平均值。此外,我们还评估了模拟训练的策略在实际场景中对视觉变化的Robustness,并发现我们在模拟环境中训练的策略在实际数据中的表现更佳。可以在https://www.di.ens.fr/willow/research/robust_s2r/获取我们的代码、模拟环境、实际机器人数据和训练模型。
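
As a small illustration of what "domain randomization parameters" means in practice, the sketch below samples a set of visual randomization parameters per simulated episode. The parameter names and ranges are illustrative assumptions, not the values selected by the paper's offline cube-localization proxy task.

```python
import random
from dataclasses import dataclass

@dataclass
class DRParams:
    """One sample of visual domain-randomization parameters (illustrative)."""
    texture_id: int          # which random texture to apply to the scene
    light_intensity: float   # relative light strength
    object_hue_shift: float  # hue jitter applied to object colors
    camera_fov_deg: float    # camera field of view
    camera_pos_noise: float  # std of camera position perturbation (meters)

def sample_dr_params(n_textures=200):
    return DRParams(
        texture_id=random.randrange(n_textures),
        light_intensity=random.uniform(0.3, 3.0),
        object_hue_shift=random.uniform(-0.1, 0.1),
        camera_fov_deg=random.uniform(45.0, 70.0),
        camera_pos_noise=random.uniform(0.0, 0.03),
    )

# A fresh DRParams sample would typically be drawn for each training episode in simulation.
params = sample_dr_params()
```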

SAP-sLDA: An Interpretable Interface for Exploring Unstructured Text

  • paper_url: http://arxiv.org/abs/2308.01420
  • repo_url: None
  • paper_authors: Charumathi Badrinath, Weiwei Pan, Finale Doshi-Velez
  • for: explore text corpora and learn topics that preserve semantically meaningful relationships between documents
  • methods: semi-supervised human-in-the-loop LDA-based method
  • results: more interpretable projections than baseline methods with only a fraction of labels provided, qualitatively similar results on a real corpus
    Abstract A common way to explore text corpora is through low-dimensional projections of the documents, where one hopes that thematically similar documents will be clustered together in the projected space. However, popular algorithms for dimensionality reduction of text corpora, like Latent Dirichlet Allocation (LDA), often produce projections that do not capture human notions of document similarity. We propose a semi-supervised human-in-the-loop LDA-based method for learning topics that preserve semantically meaningful relationships between documents in low-dimensional projections. On synthetic corpora, our method yields more interpretable projections than baseline methods with only a fraction of labels provided. On a real corpus, we obtain qualitatively similar results.
    摘要 探索文本语料库的一种常见方式是对文档进行低维投影,期望主题相似的文档在投影空间中聚集在一起。然而,常用的文本语料库降维算法(如潜在狄利克雷分配,LDA)往往生成无法体现人类文档相似性概念的投影。我们提出一种半监督、人机协同(human-in-the-loop)的基于LDA的方法,在低维投影中学习能够保留文档间语义关系的主题。在合成语料库上,我们的方法仅需提供一小部分标签即可得到比基线方法更可解释的投影;在真实语料库上,我们获得了性质相似的结果。

DiffKendall: A Novel Approach for Few-Shot Learning with Differentiable Kendall’s Rank Correlation

  • paper_url: http://arxiv.org/abs/2307.15317
  • repo_url: None
  • paper_authors: Kaipeng Zheng, Huishuai Zhang, Weiran Huang
  • for: 提高few-shot learning的性能,尤其是在不同领域的数据集上。
  • methods: 在推理阶段使用Kendall排名相关系数代替几何相似度度量,并为meta-training提出一个精心设计的可微损失,以解决其不可导问题。
  • results: 在多个不同领域的数据集上提升了few-shot learning性能;所提出的基于排名相关性的方法显著增强了few-shot learning的表现。
    Abstract Few-shot learning aims to adapt models trained on the base dataset to novel tasks where the categories are not seen by the model before. This often leads to a relatively uniform distribution of feature values across channels on novel classes, posing challenges in determining channel importance for novel tasks. Standard few-shot learning methods employ geometric similarity metrics such as cosine similarity and negative Euclidean distance to gauge the semantic relatedness between two features. However, features with high geometric similarities may carry distinct semantics, especially in the context of few-shot learning. In this paper, we demonstrate that the importance ranking of feature channels is a more reliable indicator for few-shot learning than geometric similarity metrics. We observe that replacing the geometric similarity metric with Kendall's rank correlation only during inference is able to improve the performance of few-shot learning across a wide range of datasets with different domains. Furthermore, we propose a carefully designed differentiable loss for meta-training to address the non-differentiability issue of Kendall's rank correlation. Extensive experiments demonstrate that the proposed rank-correlation-based approach substantially enhances few-shot learning performance.
    摘要 少样本学习(few-shot learning)旨在让在基础数据集上训练的模型适应到模型此前未见过类别的新任务。这通常导致新类别上各通道的特征值分布相对均匀,从而增加了确定通道对新任务重要性的难度。标准的少样本学习方法使用几何相似度度量(如余弦相似度和负欧氏距离)来衡量两个特征之间的语义相关性。然而,几何相似度高的特征也可能表达不同的语义,在少样本学习场景中尤其如此。在本文中,我们表明特征通道的重要性排名比几何相似度度量更可靠地指示少样本学习。我们发现,仅在推理阶段将几何相似度度量替换为Kendall排名相关系数,即可在多个不同领域的数据集上提升少样本学习性能。此外,我们为meta-training提出了一个精心设计的可微损失,以解决Kendall排名相关系数不可导的问题。大量实验表明,所提出的基于排名相关性的方法能显著提升少样本学习性能。
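
A common way to make Kendall's rank correlation differentiable is to replace the sign of pairwise differences with a tanh; the sketch below shows this relaxation for comparing a query feature with a class prototype. The exact meta-training loss proposed in the paper may differ from this generic smoothing.

```python
import torch

def soft_kendall(u, v, temperature=0.1):
    """Differentiable surrogate of Kendall's rank correlation between two vectors.

    The hard sign of pairwise differences is replaced by tanh; as the
    temperature goes to zero, this approaches the usual Kendall tau.
    """
    du = u.unsqueeze(0) - u.unsqueeze(1)          # (d, d) pairwise differences
    dv = v.unsqueeze(0) - v.unsqueeze(1)
    concordance = torch.tanh(du / temperature) * torch.tanh(dv / temperature)
    d = u.numel()
    # average over the d*(d-1) ordered off-diagonal pairs (diagonal terms are zero)
    return concordance.sum() / (d * (d - 1))

# Inference-time similarity between a query feature and a class prototype:
q, proto = torch.randn(64), torch.randn(64)
similarity = soft_kendall(q, proto)
```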

Differential Evolution Algorithm based Hyper-Parameters Selection of Transformer Neural Network Model for Load Forecasting

  • paper_url: http://arxiv.org/abs/2307.15299
  • repo_url: https://github.com/anuvabsen1/meta-transformer
  • paper_authors: Anuvab Sen, Arul Rhik Mazumder, Udayon Sen
  • for: 预测电网负荷,减少能源浪费和提高供电稳定性。
  • methods: 使用时间序列模型(ARIMA)和深度学习模型(ANN、LSTM、GRU等),并应用多种metaheuristics(如differential evolution)来寻找最佳的模型超参数。
  • results: 研究表明,通过metaheuristics对Transformer模型进行优化,可以提高预测精度,并且提供了不同metaheuristics对模型性能的比较。
    Abstract Accurate load forecasting plays a vital role in numerous sectors, but accurately capturing the complex dynamics of dynamic power systems remains a challenge for traditional statistical models. For these reasons, time-series models (ARIMA) and deep-learning models (ANN, LSTM, GRU, etc.) are commonly deployed and often experience higher success. In this paper, we analyze the efficacy of the recently developed Transformer-based Neural Network model in Load forecasting. Transformer models have the potential to improve Load forecasting because of their ability to learn long-range dependencies derived from their Attention Mechanism. We apply several metaheuristics namely Differential Evolution to find the optimal hyperparameters of the Transformer-based Neural Network to produce accurate forecasts. Differential Evolution provides scalable, robust, global solutions to non-differentiable, multi-objective, or constrained optimization problems. Our work compares the proposed Transformer based Neural Network model integrated with different metaheuristic algorithms by their performance in Load forecasting based on numerical metrics such as Mean Squared Error (MSE) and Mean Absolute Percentage Error (MAPE). Our findings demonstrate the potential of metaheuristic-enhanced Transformer-based Neural Network models in Load forecasting accuracy and provide optimal hyperparameters for each model.
    摘要 准确的负荷预测在多个领域发挥重要作用,但是准确地捕捉动态能源系统的复杂动态还是传统统计模型的挑战。为此,时间序列模型(ARIMA)和深度学习模型(ANN、LSTM、GRU等)通常被部署,并经常得到更高的成功。在这篇论文中,我们分析了将最近发展的Transformer基于神经网络模型在负荷预测中的效果。Transformer模型具有学习长距离依赖关系的能力,因此它们在负荷预测中具有潜在的优势。我们使用多种metaheuristics,包括差分演化,来找出最佳的神经网络模型参数,以便生成高精度的预测结果。我们的工作比较了不同metaheuristicsAlgorithm和Transformer基于神经网络模型的性能,并通过数学统计指标(如 Mean Squared Error 和 Mean Absolute Percentage Error)来评估其表现。我们的发现表明metaheuristic增强的Transformer基于神经网络模型在负荷预测精度方面具有潜在的优势,并且可以为每个模型提供优化参数。
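
The hyperparameter search itself is easy to reproduce with SciPy's differential evolution; the sketch below shows the search loop with a placeholder objective. In the real setting the objective would train the Transformer forecaster with the candidate hyperparameters and return its validation MSE; the bounds and the dummy error surface here are assumptions for illustration.

```python
import numpy as np
from scipy.optimize import differential_evolution

def validation_mse(hparams):
    """Placeholder objective: train and evaluate the forecaster for these hyperparameters."""
    lr, dropout, n_heads, d_model = hparams
    n_heads, d_model = int(round(n_heads)), int(round(d_model))
    # ... train model(lr, dropout, n_heads, d_model) on the load data and score it ...
    return (np.log10(lr) + 3) ** 2 + (dropout - 0.1) ** 2 + 0.01 * n_heads  # dummy surface

bounds = [
    (1e-5, 1e-2),   # learning rate
    (0.0, 0.5),     # dropout
    (1, 8),         # attention heads (rounded to an integer inside the objective)
    (32, 256),      # model width
]

result = differential_evolution(validation_mse, bounds, maxiter=20, popsize=10, seed=0)
print(result.x, result.fun)
```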

Learning Nonlinear Projections for Reduced-Order Modeling of Dynamical Systems using Constrained Autoencoders

  • paper_url: http://arxiv.org/abs/2307.15288
  • repo_url: https://github.com/grmacchio/romnet_chaos2023
  • paper_authors: Samuel E. Otto, Gregory R. Macchio, Clarence W. Rowley
  • for: 这类新发展的降阶建模技术用于在低维流形上近似非线性动力系统。
  • methods: 我们使用受约束的自编码神经网络,从数据中同时学习流形及相应的投影纤维,其中潜在表示的维度低于原系统的维度。
  • results: 我们提出了一种新的非线性投影方法,并给出了若干面向高维系统的高效降阶建模技术,包括一种新的稀疏性促进惩罚项。
    Abstract Recently developed reduced-order modeling techniques aim to approximate nonlinear dynamical systems on low-dimensional manifolds learned from data. This is an effective approach for modeling dynamics in a post-transient regime where the effects of initial conditions and other disturbances have decayed. However, modeling transient dynamics near an underlying manifold, as needed for real-time control and forecasting applications, is complicated by the effects of fast dynamics and nonnormal sensitivity mechanisms. To begin to address these issues, we introduce a parametric class of nonlinear projections described by constrained autoencoder neural networks in which both the manifold and the projection fibers are learned from data. Our architecture uses invertible activation functions and biorthogonal weight matrices to ensure that the encoder is a left inverse of the decoder. We also introduce new dynamics-aware cost functions that promote learning of oblique projection fibers that account for fast dynamics and nonnormality. To demonstrate these methods and the specific challenges they address, we provide a detailed case study of a three-state model of vortex shedding in the wake of a bluff body immersed in a fluid, which has a two-dimensional slow manifold that can be computed analytically. In anticipation of future applications to high-dimensional systems, we also propose several techniques for constructing computationally efficient reduced-order models using our proposed nonlinear projection framework. This includes a novel sparsity-promoting penalty for the encoder that avoids detrimental weight matrix shrinkage via computation on the Grassmann manifold.
    摘要 最近发展的降阶建模技术旨在在从数据中学习到的低维流形上近似非线性动力系统。这种方法适合建模过渡期之后的动力学,此时初始条件和其他扰动的影响已经衰减。然而,在底层流形附近建模暂态动力学(这正是实时控制与预测应用所需要的)会因快动力学和非正规敏感机制的影响而变得复杂。为着手解决这些问题,我们引入一类由受约束自编码神经网络描述的参数化非线性投影,其中流形和投影纤维都从数据中学习得到。我们的架构使用可逆激活函数和双正交权重矩阵,以确保编码器是解码器的左逆。我们还引入新的、考虑动力学的代价函数,鼓励学习能够兼顾快动力学和非正规性的斜投影纤维。为了展示这些方法及其针对的具体挑战,我们给出了一个三状态模型的详细案例研究,该模型描述浸没于流体中的钝体尾流中的涡脱落,其二维慢流形可以解析计算。考虑到未来在高维系统上的应用,我们还基于所提出的非线性投影框架提出了若干构造计算高效降阶模型的技术,其中包括一种新的面向编码器的稀疏性促进惩罚项,通过在Grassmann流形上进行计算来避免权重矩阵的有害收缩。

Optimal Approximation of Zonoids and Uniform Approximation by Shallow Neural Networks

  • paper_url: http://arxiv.org/abs/2307.15285
  • repo_url: None
  • paper_authors: Jonathan W. Siegel
  • for: 本文解决了两个相关的问题:第一个是确定 $\mathbb{R}^{d+1}$ 中任意 zonoid 在 Hausdorff 距离下被 $n$ 条线段之和逼近所能达到的误差;第二个是确定浅层 ReLU$^k$ 神经网络在其 variation space 上的最优逼近率。
  • methods: 本文使用新的技术解决了第一个问题,并显著改进了第二个问题已有的逼近率,能够一致逼近目标函数及其导数。
  • results: 本文在所有维度下完整解决了第一个问题,并对第二个问题给出了改进的逼近率,可以一致逼近目标函数及其导数。
    Abstract We study the following two related problems. The first is to determine to what error an arbitrary zonoid in $\mathbb{R}^{d+1}$ can be approximated in the Hausdorff distance by a sum of $n$ line segments. The second is to determine optimal approximation rates in the uniform norm for shallow ReLU$^k$ neural networks on their variation spaces. The first of these problems has been solved for $d\neq 2,3$, but when $d=2,3$ a logarithmic gap between the best upper and lower bounds remains. We close this gap, which completes the solution in all dimensions. For the second problem, our techniques significantly improve upon existing approximation rates when $k\geq 1$, and enable uniform approximation of both the target function and its derivatives.
    摘要 我们研究以下两个相关的问题。第一个问题是:$\mathbb{R}^{d+1}$ 中任意 zonoid 在 Hausdorff 距离下被 $n$ 条线段之和逼近时,误差可以达到多小。第二个问题是:浅层 ReLU$^k$ 神经网络在其 variation space 上以一致范数度量时的最优逼近率。第一个问题在 $d\neq 2,3$ 时已被解决,但当 $d=2,3$ 时,最优上界与下界之间仍存在一个对数级的差距。我们消除了这一差距,从而在所有维度下完成了该问题的求解。对于第二个问题,当 $k\geq 1$ 时,我们的技术显著改进了已有的逼近率,并能够同时一致逼近目标函数及其导数。

VeriGen: A Large Language Model for Verilog Code Generation

  • paper_url: http://arxiv.org/abs/2308.00708
  • repo_url: None
  • paper_authors: Shailja Thakur, Baleegh Ahmad, Hammond Pearce, Benjamin Tan, Brendan Dolan-Gavitt, Ramesh Karri, Siddharth Garg
  • for: 这个研究探讨了大语言模型(LLMs)是否能够自动设计硬件,通过生成高质量的Verilog代码。
  • methods: 研究人员在这项研究中对已有的LLMs进行了微调,所用的Verilog数据集整理自GitHub和Verilog教材。他们使用了一个专门设计的测试套件,包括自定义问题集和测试平台(testing benches)。
  • results: 研究人员发现,他们微调后的开源CodeGen-16B模型表现出色,相比商业最先进的GPT-3.5-turbo模型取得了1.1%的总体提升。在更多样、更复杂的问题集上,微调模型与最先进模型相比具有竞争力,并在某些场景下更优。特别是,与其预训练版本相比,它在各类问题上生成语法正确的Verilog代码的比例提高了41%,凸显了小型、自有LLM在硬件设计自动化中的潜力。
    Abstract In this study, we explore the capability of Large Language Models (LLMs) to automate hardware design by generating high-quality Verilog code, a common language for designing and modeling digital systems. We fine-tune pre-existing LLMs on Verilog datasets compiled from GitHub and Verilog textbooks. We evaluate the functional correctness of the generated Verilog code using a specially designed test suite, featuring a custom problem set and testing benches. Here, our fine-tuned open-source CodeGen-16B model outperforms the commercial state-of-the-art GPT-3.5-turbo model with a 1.1% overall increase. Upon testing with a more diverse and complex problem set, we find that the fine-tuned model shows competitive performance against state-of-the-art gpt-3.5-turbo, excelling in certain scenarios. Notably, it demonstrates a 41% improvement in generating syntactically correct Verilog code across various problem categories compared to its pre-trained counterpart, highlighting the potential of smaller, in-house LLMs in hardware design automation.
    摘要 在这项研究中,我们探索了大型自然语言模型(LLM)可以自动设计硬件的能力,通过生成高质量的Verilog代码,这是数字系统设计和模型的通用语言。我们对已有的LLM进行了微调,使用从GitHub和Verilog教材中编译的Verilog数据集。我们使用自定义测试环境和测试架构来评估生成的Verilog代码的功能正确性。在我们的微调open-source CodeGen-16B模型与商业现代GPT-3.5-turbo模型进行比较时,我们发现了1.1%的总提高。在测试更多和更复杂的问题集时,我们发现微调后的模型在某些场景下表现竞争力强,并且在某些场景下超过了state-of-the-art gpt-3.5-turbo模型。具有41%的提高在不同类型问题集中生成符合语法规则的Verilog代码的能力,这highlights了小型、内部LLM在硬件设计自动化中的潜力。

Recovering high-quality FODs from a reduced number of diffusion-weighted images using a model-driven deep learning architecture

  • paper_url: http://arxiv.org/abs/2307.15273
  • repo_url: https://github.com/jbartlett6/sdnet
  • paper_authors: J Bartlett, C E Davey, L A Johnston, J Duan
  • for: 该 paper 旨在提出一种基于深度学习的纤维取向分布(FOD)重建方法,能够从数量更少的弥散加权图像(DWI)中重建高质量FOD,从而缩短总成像时间。
  • methods: 该方法使用深度学习网络,并以diffusion acquisition invariant representations作为输入,以确保网络能够灵活应用于具有不同b-vectors和b-values的数据;网络采用球面反卷积(spherical deconvolution)结构,以保证网络输出的FOD与输入DWI信号保持一致;损失函数中还加入了fixel分类惩罚项。
  • results: 与现有的FOD超分辨网络FOD-Net相比,该方法具有竞争力的性能,并且可以通过调整fixel分类惩罚项提升下游基于fixel的分析表现;代码可在 https://github.com/Jbartlett6/SDNet 获取。
    Abstract Fibre orientation distribution (FOD) reconstruction using deep learning has the potential to produce accurate FODs from a reduced number of diffusion-weighted images (DWIs), decreasing total imaging time. Diffusion acquisition invariant representations of the DWI signals are typically used as input to these methods to ensure that they can be applied flexibly to data with different b-vectors and b-values; however, this means the network cannot condition its output directly on the DWI signal. In this work, we propose a spherical deconvolution network, a model-driven deep learning FOD reconstruction architecture, that ensures intermediate and output FODs produced by the network are consistent with the input DWI signals. Furthermore, we implement a fixel classification penalty within our loss function, encouraging the network to produce FODs that can subsequently be segmented into the correct number of fixels and improve downstream fixel-based analysis. Our results show that the model-based deep learning architecture achieves competitive performance compared to a state-of-the-art FOD super-resolution network, FOD-Net. Moreover, we show that the fixel classification penalty can be tuned to offer improved performance with respect to metrics that rely on accurately segmented of FODs. Our code is publicly available at https://github.com/Jbartlett6/SDNet .
    摘要 利用深度学习进行纤维取向分布(FOD)重建,有潜力从数量更少的弥散加权图像(DWI)中得到准确的FOD,从而缩短总成像时间。这类方法通常以对弥散采集方案不变的DWI信号表示作为输入,以确保其可以灵活应用于具有不同b-vectors和b-values的数据;但这也意味着网络无法直接以DWI信号为条件生成输出。在本工作中,我们提出了一种球面反卷积网络,即一种模型驱动的深度学习FOD重建架构,确保网络生成的中间FOD和输出FOD与输入DWI信号保持一致。此外,我们在损失函数中加入fixel分类惩罚项,鼓励网络生成能够被正确分割为相应数量fixel的FOD,从而改进下游基于fixel的分析。结果表明,这种基于模型的深度学习架构相对于最先进的FOD超分辨网络FOD-Net具有竞争力的性能。我们还表明,可以调节fixel分类惩罚项,在依赖于FOD准确分割的指标上获得更好的表现。我们的代码已公开在 https://github.com/Jbartlett6/SDNet 。

An Overview Of Temporal Commonsense Reasoning and Acquisition

  • paper_url: http://arxiv.org/abs/2308.00002
  • repo_url: None
  • paper_authors: Georg Wenzel, Adam Jatowt
  • for: 本研究旨在提高语言模型在时间常识逻辑 reasoning 方面的性能,特别是通过多种扩充和评估方法来提高模型的逻辑能力。
  • methods: 本研究使用了多种扩充方法,包括随机扩充、逻辑扩充和知识扩充,以提高语言模型的时间常识逻辑能力。
  • results: despite the use of these augmentations, the models still struggle to approach human performance on reasoning tasks over temporal common sense properties, such as the typical occurrence times, orderings, or durations of events.
    Abstract Temporal commonsense reasoning refers to the ability to understand the typical temporal context of phrases, actions, and events, and use it to reason over problems requiring such knowledge. This trait is essential in temporal natural language processing tasks, with possible applications such as timeline summarization, temporal question answering, and temporal natural language inference. Recent research on the performance of large language models suggests that, although they are adept at generating syntactically correct sentences and solving classification tasks, they often take shortcuts in their reasoning and fall prey to simple linguistic traps. This article provides an overview of research in the domain of temporal commonsense reasoning, particularly focusing on enhancing language model performance through a variety of augmentations and their evaluation across a growing number of datasets. However, these augmented models still struggle to approach human performance on reasoning tasks over temporal common sense properties, such as the typical occurrence times, orderings, or durations of events. We further emphasize the need for careful interpretation of research to guard against overpromising evaluation results in light of the shallow reasoning present in transformers. This can be achieved by appropriately preparing datasets and suitable evaluation metrics.
    摘要 时间常识推理(temporal commonsense reasoning)指理解短语、动作和事件的典型时间背景,并利用这些知识对需要此类知识的问题进行推理的能力。这一能力是时间相关自然语言处理任务的基础,可能的应用包括时间线摘要、时间问答和时间自然语言推理。近期关于大语言模型表现的研究表明,尽管它们擅长生成语法正确的句子和解决分类任务,但在推理中常常走捷径,容易落入简单的语言陷阱。本文概述了时间常识推理领域的研究,尤其关注通过多种增强手段提升语言模型的表现,以及在不断增多的数据集上对其进行评估。然而,这些增强后的模型在事件的典型发生时间、顺序或持续时长等时间常识属性的推理任务上,仍难以接近人类水平。我们进一步强调,鉴于transformer模型中存在的浅层推理,需要谨慎解读研究结果,以防对评估结果做出过度乐观的判断;这可以通过恰当地构建数据集和选择合适的评估指标来实现。

Is this model reliable for everyone? Testing for strong calibration

  • paper_url: http://arxiv.org/abs/2307.15247
  • repo_url: https://github.com/jjfeng/testing_strong_calibration
  • paper_authors: Jean Feng, Alexej Gossmann, Romain Pirracchio, Nicholas Petrick, Gene Pennello, Berkman Sahiner
  • for: The paper is written for auditing a risk prediction model for strong calibration, particularly for machine learning algorithms, and for identifying poorly calibrated subgroups.
  • methods: The paper proposes a new testing procedure based on the insight that if observations can be reordered by their expected residuals, there should be a change in the association between the predicted and observed residuals if a poorly calibrated subgroup exists. The procedure uses a sample-splitting method, cross-validation, and a score-based cumulative sum (CUSUM) test to detect changes in the association.
  • results: The proposed procedure consistently achieved higher power in simulation studies and more than doubled the power when auditing a mortality risk prediction model compared to existing methods.
    Abstract In a well-calibrated risk prediction model, the average predicted probability is close to the true event rate for any given subgroup. Such models are reliable across heterogeneous populations and satisfy strong notions of algorithmic fairness. However, the task of auditing a model for strong calibration is well-known to be difficult -- particularly for machine learning (ML) algorithms -- due to the sheer number of potential subgroups. As such, common practice is to only assess calibration with respect to a few predefined subgroups. Recent developments in goodness-of-fit testing offer potential solutions but are not designed for settings with weak signal or where the poorly calibrated subgroup is small, as they either overly subdivide the data or fail to divide the data at all. We introduce a new testing procedure based on the following insight: if we can reorder observations by their expected residuals, there should be a change in the association between the predicted and observed residuals along this sequence if a poorly calibrated subgroup exists. This lets us reframe the problem of calibration testing into one of changepoint detection, for which powerful methods already exist. We begin with introducing a sample-splitting procedure where a portion of the data is used to train a suite of candidate models for predicting the residual, and the remaining data are used to perform a score-based cumulative sum (CUSUM) test. To further improve power, we then extend this adaptive CUSUM test to incorporate cross-validation, while maintaining Type I error control under minimal assumptions. Compared to existing methods, the proposed procedure consistently achieved higher power in simulation studies and more than doubled the power when auditing a mortality risk prediction model.
    摘要 在一个校准良好的风险预测模型中,任意子组的平均预测概率都应接近该子组的真实事件率。这类模型在异质人群中是可靠的,并满足较强的算法公平性要求。然而,审核模型是否具备这种强校准是公认的难题,对机器学习(ML)算法尤其如此,因为潜在子组的数量极其庞大。因此,通常的做法是仅针对少数预先定义的子组评估校准。拟合优度检验方面的最新进展提供了潜在的解决方案,但它们并不适用于信号较弱或校准不佳的子组规模较小的情形,因为它们要么对数据划分过细,要么完全不划分。我们提出了一种新的检验流程,其核心洞察是:如果能按期望残差对观测重新排序,那么当存在校准不佳的子组时,沿这一顺序预测残差与观测残差之间的关联应出现变化。这使我们可以把校准检验问题重述为变化点检测问题,而后者已有强大的方法可用。我们首先引入一种样本切分流程:一部分数据用于训练一组预测残差的候选模型,其余数据用于执行基于得分的累积和(CUSUM)检验。为进一步提高检验力,我们将这一自适应CUSUM检验扩展为结合交叉验证,同时在极少假设下保持第一类错误控制。与现有方法相比,所提流程在模拟研究中始终取得更高的检验力,并在审核一个死亡风险预测模型时将检验力提高了一倍以上。
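
A simplified version of the residual-ordering CUSUM idea can be sketched as follows. The permutation-based null and the single candidate residual model are simplifications of the paper's sample-splitting and cross-validated procedure, so treat this as an illustration rather than the published test.

```python
import numpy as np

def cusum_calibration_test(y, p_hat, expected_resid, n_perm=500, seed=0):
    """Max-CUSUM test on residuals ordered by a candidate model's expected residuals.

    y: binary outcomes; p_hat: predicted probabilities on held-out data;
    expected_resid: predictions of the residual y - p_hat from a model fit on a
    separate split.  A poorly calibrated subgroup shows up as a change in mean
    along the ordering, which the max-CUSUM statistic detects.
    """
    rng = np.random.default_rng(seed)
    order = np.argsort(-expected_resid)
    resid = (y - p_hat)[order]

    def max_cusum(r):
        centered = r - r.mean()
        return np.max(np.abs(np.cumsum(centered))) / (r.std() * np.sqrt(len(r)) + 1e-12)

    observed = max_cusum(resid)
    null = np.array([max_cusum(rng.permutation(resid)) for _ in range(n_perm)])
    p_value = (1 + np.sum(null >= observed)) / (n_perm + 1)
    return observed, p_value
```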

A Practical Recipe for Federated Learning Under Statistical Heterogeneity Experimental Design

  • paper_url: http://arxiv.org/abs/2307.15245
  • repo_url: https://github.com/mmorafah/fedzoo-bench
  • paper_authors: Mahdi Morafah, Weijia Wang, Bill Lin
  • for: 本研究旨在 investigate Federated Learning (FL) 在数据不同性的情况下的成功性, 并提供一个系统性的研究结果和实践建议。
  • methods: 本研究使用了多种 FL-specific experimental variables, 包括 client-side and server-side techniques, 以及不同的数据准备和评价方法。
  • results: 本研究发现了一些关键的实验变量对 FL 性能的影响, 并提供了一些实践建议和标准化的实验设置。 我们还发布了 FedZoo-Bench,一个基于 PyTorch 的开源库,包含 22 种 state-of-the-art 方法的实现,可以在 https://github.com/MMorafah/FedZoo-Bench 上下载。
    Abstract Federated Learning (FL) has been an area of active research in recent years. There have been numerous studies in FL to make it more successful in the presence of data heterogeneity. However, despite the existence of many publications, the state of progress in the field is unknown. Many of the works use inconsistent experimental settings and there are no comprehensive studies on the effect of FL-specific experimental variables on the results and practical insights for a more comparable and consistent FL experimental setup. Furthermore, the existence of several benchmarks and confounding variables has further complicated the issue of inconsistency and ambiguity. In this work, we present the first comprehensive study on the effect of FL-specific experimental variables in relation to each other and performance results, bringing several insights and recommendations for designing a meaningful and well-incentivized FL experimental setup. We further aid the community by releasing FedZoo-Bench, an open-source library based on PyTorch with pre-implementation of 22 state-of-the-art methods, and a broad set of standardized and customizable features available at https://github.com/MMorafah/FedZoo-Bench. We also provide a comprehensive comparison of several state-of-the-art (SOTA) methods to better understand the current state of the field and existing limitations.
    摘要 《联合学习(Federated Learning,FL)》在过去几年中得到了广泛的研究。有很多研究旨在使FL在数据不同性的情况下更加成功。然而,尽管有很多论文,但现状的进步还不清楚。许多研究使用不一致的实验设置,而且没有系统的研究表现FL特有的实验变量对结果的影响和实践建议。此外,存在多个标准和干扰变量,使得问题变得更加复杂和模糊。在这篇研究中,我们提供了FL特有的实验变量对之间的首次全面研究,从而提供了许多新的视角和建议,以设计一个有意义和奖励性的FL实验设置。此外,我们还提供了FedZoo-Bench,一个基于PyTorch的开源库,包含22种当前领先的方法的预实现,以及一系列标准化和可定制的特性。我们还对多种当前领先方法进行了全面比较,以更好地理解当前领域的状况和存在的限制。

Sustainable Transparency in Recommender Systems: Bayesian Ranking of Images for Explainability

  • paper_url: http://arxiv.org/abs/2308.01196
  • repo_url: None
  • paper_authors: Jorge Paz-Ruza, Amparo Alonso-Betanzos, Berta Guijarro-Berdiñas, Brais Cancela, Carlos Eiras-Franco
  • for: 提高推荐系统的透明度和用户信任度
  • methods: 使用用户创建的视觉内容生成个性化解释
  • results: 比前一代模型具有更高的性能和效率,减少了75%的CO${_2}$排放和模型尺寸。
    Abstract Recommender Systems have become crucial in the modern world, commonly guiding users towards relevant content or products, and having a large influence over the decisions of users and citizens. However, ensuring transparency and user trust in these systems remains a challenge; personalized explanations have emerged as a solution, offering justifications for recommendations. Among the existing approaches for generating personalized explanations, using visual content created by the users is one particularly promising option, showing a potential to maximize transparency and user trust. Existing models for explaining recommendations in this context face limitations: sustainability has been a critical concern, as they often require substantial computational resources, leading to significant carbon emissions comparable to the Recommender Systems where they would be integrated. Moreover, most models employ surrogate learning goals that do not align with the objective of ranking the most effective personalized explanations for a given recommendation, leading to a suboptimal learning process and larger model sizes. To address these limitations, we present BRIE, a novel model designed to tackle the existing challenges by adopting a more adequate learning goal based on Bayesian Pairwise Ranking, enabling it to achieve consistently superior performance than state-of-the-art models in six real-world datasets, while exhibiting remarkable efficiency, emitting up to 75% less CO${_2}$ during training and inference with a model up to 64 times smaller than previous approaches.
    摘要 推荐系统在现代社会中已变得至关重要,它们通常引导用户获取相关内容或产品,并对用户和公民的决策产生重大影响。然而,保障这些系统的透明度和用户信任仍是一项挑战;个性化解释作为一种解决方案应运而生,为推荐结果提供依据。在现有的个性化解释生成方法中,利用用户自己创作的视觉内容是一条特别有前景的路径,有望最大化透明度和用户信任。现有的在此情境下解释推荐的模型存在局限:可持续性一直是关键问题,因为它们往往需要大量计算资源,产生的碳排放甚至可与其所集成的推荐系统相当;此外,多数模型采用的替代学习目标并不与"为给定推荐排序出最有效的个性化解释"这一目标一致,导致学习过程欠优且模型规模更大。为克服这些局限,我们提出了BRIE,一种基于贝叶斯成对排序(Bayesian Pairwise Ranking)这一更恰当学习目标的新模型。它在六个真实数据集上稳定地取得优于最先进模型的性能,同时表现出显著的高效性:训练和推理过程中的CO${_2}$排放最多减少75%,模型体积最多缩小至先前方法的1/64。
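
The Bayesian Pairwise Ranking objective referenced in the abstract is the standard BPR loss; a minimal PyTorch version is shown below. The scores would come from BRIE's user/item/image scoring model, which is not reproduced here.

```python
import torch
import torch.nn.functional as F

def bpr_loss(pos_scores, neg_scores):
    # Bayesian Pairwise Ranking: a relevant (positive) explanation image should
    # score higher than a sampled non-relevant (negative) image for the same
    # user-item pair.
    return -F.logsigmoid(pos_scores - neg_scores).mean()

# Random tensors stand in for model scores in this sketch.
loss = bpr_loss(torch.randn(32), torch.randn(32))
```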

Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback

  • paper_url: http://arxiv.org/abs/2307.15217
  • repo_url: None
  • paper_authors: Stephen Casper, Xander Davies, Claudia Shi, Thomas Krendl Gilbert, Jérémy Scheurer, Javier Rando, Rachel Freedman, Tomasz Korbak, David Lindner, Pedro Freire, Tony Wang, Samuel Marks, Charbel-Raphaël Segerie, Micah Carroll, Andi Peng, Phillip Christoffersen, Mehul Damani, Stewart Slocum, Usman Anwar, Anand Siththaranjan, Max Nadeau, Eric J. Michaud, Jacob Pfau, Dmitrii Krasheninnikov, Xin Chen, Lauro Langosco, Peter Hase, Erdem Bıyık, Anca Dragan, David Krueger, Dorsa Sadigh, Dylan Hadfield-Menell
  • for: 这个论文旨在探讨人工智能系统如何与人类目标相对应,以及RLHF方法在实践中的问题和局限性。
  • methods: 这篇论文使用了RLHF方法来训练大语言模型,并提出了一些实践中的技巧来改进RLHF方法。
  • results: 这篇论文认为RLHF方法存在一些潜在的问题和局限性,并提出了一些审核和公布标准来提高社会监管RLHF系统的能力。
    Abstract Reinforcement learning from human feedback (RLHF) is a technique for training AI systems to align with human goals. RLHF has emerged as the central method used to finetune state-of-the-art large language models (LLMs). Despite this popularity, there has been relatively little public work systematizing its flaws. In this paper, we (1) survey open problems and fundamental limitations of RLHF and related methods; (2) overview techniques to understand, improve, and complement RLHF in practice; and (3) propose auditing and disclosure standards to improve societal oversight of RLHF systems. Our work emphasizes the limitations of RLHF and highlights the importance of a multi-faceted approach to the development of safer AI systems.
    摘要 人类反馈强化学习(RLHF)是一种训练人工智能系统使其与人类目标保持一致的技术。RLHF已经成为微调最先进大语言模型(LLM)的核心方法。尽管应用广泛,但系统性梳理其缺陷的公开工作相对较少。在这篇论文中,我们(1)综述RLHF及相关方法的开放问题和基本局限;(2)概述在实践中理解、改进和补充RLHF的技术;(3)提出审核和披露标准,以提升社会对RLHF系统的监督能力。我们的工作强调RLHF的局限性,并凸显了采用多管齐下的方式开发更安全AI系统的重要性。

PromptStyler: Prompt-driven Style Generation for Source-free Domain Generalization

  • paper_url: http://arxiv.org/abs/2307.15199
  • repo_url: None
  • paper_authors: Junhyeong Cho, Gilhyun Nam, Sungyeon Kim, Hunmin Yang, Suha Kwak
  • for: 这个研究旨在提升无源域泛化(source-free domain generalization)的表现,且无需使用任何图像。
  • methods: 提出的方法PromptStyler利用提示在联合视觉-语言空间中模拟多种分布偏移,并通过可学习的风格词向量(style word vectors)生成多样的风格特征。
  • results: 该方法在PACS、VLCS、OfficeHome和DomainNet等基准上取得了最先进的成绩,且训练过程中不需要任何图像。
    Abstract In a joint vision-language space, a text feature (e.g., from "a photo of a dog") could effectively represent its relevant image features (e.g., from dog photos). Also, a recent study has demonstrated the cross-modal transferability phenomenon of this joint space. From these observations, we propose PromptStyler which simulates various distribution shifts in the joint space by synthesizing diverse styles via prompts without using any images to deal with source-free domain generalization. The proposed method learns to generate a variety of style features (from "a S* style of a") via learnable style word vectors for pseudo-words S*. To ensure that learned styles do not distort content information, we force style-content features (from "a S* style of a [class]") to be located nearby their corresponding content features (from "[class]") in the joint vision-language space. After learning style word vectors, we train a linear classifier using synthesized style-content features. PromptStyler achieves the state of the art on PACS, VLCS, OfficeHome and DomainNet, even though it does not require any images for training.
    摘要 在联合视觉-语言空间中,一个文本特征(例如来自"一张狗的照片")可以有效地表示与之相关的图像特征(例如来自狗的照片)。此外,近期研究已经证明了该联合空间中的跨模态可迁移现象。基于这些观察,我们提出了PromptStyler,它通过提示在联合空间中合成多样的风格,以模拟各种分布偏移,从而在不使用任何图像的情况下应对无源域泛化问题。该方法通过可学习的风格词向量(对应伪词S*)学习生成多种风格特征(来自"一种S*风格的")。为确保学习到的风格不会扭曲内容信息,我们强制风格-内容特征(来自"一种S*风格的[类别]")在联合视觉-语言空间中位于其对应内容特征(来自"[类别]")附近。在学习完风格词向量后,我们使用合成的风格-内容特征训练一个线性分类器。PromptStyler在PACS、VLCS、OfficeHome和DomainNet上取得了最先进的性能,尽管它不需要任何图像进行训练。

Identifying acute illness phenotypes via deep temporal interpolation and clustering network on physiologic signatures

  • paper_url: http://arxiv.org/abs/2307.15719
  • repo_url: None
  • paper_authors: Yuanfang Ren, Yanjun Li, Tyler J. Loftus, Jeremy Balch, Kenneth L. Abbott, Shounak Datta, Matthew M. Ruppert, Ziyuan Guan, Benjamin Shickel, Parisa Rashidi, Tezcan Ozrazgat-Baslanti, Azra Bihorac
  • for: 这份研究用于发现初期医院接受时间对临床走向的影响,并且通过数据缺乏的情况下提供早期临床决策的支持。
  • methods: 这份研究使用了深度时间 interpolating和 clustering 网络,将稀疏、不规则的生命征象数据中提取出潜在表示,并从训练集(n=41,502)中提取出明显的患者型别。
  • results: 研究发现了4个患者型别,每个型别都有不同的疾病和结果。型别A(18%)有最多的复合疾病,高率的呼吸不足、肾衰竭、 septic shock 和三年后的死亡率。型别B(33%)和C(31%)有普遍的轻度器官衰竭,但型别B 有最好的短期结果,而型别C 有最好的临床结果。型别D(17%)有早期/持续的低血压、高率的早期手术和许多血液标记的inflammation,但三年后的死亡率较低。
    Abstract Initial hours of hospital admission impact clinical trajectory, but early clinical decisions often suffer due to data paucity. With clustering analysis for vital signs within six hours of admission, patient phenotypes with distinct pathophysiological signatures and outcomes may support early clinical decisions. We created a single-center, longitudinal EHR dataset for 75,762 adults admitted to a tertiary care center for 6+ hours. We proposed a deep temporal interpolation and clustering network to extract latent representations from sparse, irregularly sampled vital sign data and derived distinct patient phenotypes in a training cohort (n=41,502). Model and hyper-parameters were chosen based on a validation cohort (n=17,415). Test cohort (n=16,845) was used to analyze reproducibility and correlation with biomarkers. The training, validation, and testing cohorts had similar distributions of age (54-55 yrs), sex (55% female), race, comorbidities, and illness severity. Four clusters were identified. Phenotype A (18%) had most comorbid disease with higher rate of prolonged respiratory insufficiency, acute kidney injury, sepsis, and three-year mortality. Phenotypes B (33%) and C (31%) had diffuse patterns of mild organ dysfunction. Phenotype B had favorable short-term outcomes but second-highest three-year mortality. Phenotype C had favorable clinical outcomes. Phenotype D (17%) had early/persistent hypotension, high rate of early surgery, and substantial biomarker rate of inflammation but second-lowest three-year mortality. After comparing phenotypes' SOFA scores, clustering results did not simply repeat other acuity assessments. In a heterogeneous cohort, four phenotypes with distinct categories of disease and outcomes were identified by a deep temporal interpolation and clustering network. This tool may impact triage decisions and clinical decision-support under time constraints.
    摘要 入院后的最初数小时对临床轨迹有重要影响,但早期临床决策往往因数据匮乏而受限。通过对入院六小时内生命体征进行聚类分析,识别具有不同病理生理特征和结局的患者表型,或可为早期临床决策提供支持。我们构建了一个单中心纵向电子病历数据集,涵盖在一所三级医疗中心住院6小时以上的75,762名成人。我们提出了一种深度时间插值与聚类网络,从稀疏、不规则采样的生命体征数据中提取潜在表示,并在训练队列(n=41,502)中得到了不同的患者表型。模型与超参数基于验证队列(n=17,415)选择,测试队列(n=16,845)用于分析可重复性以及与生物标志物的相关性。训练、验证和测试队列在年龄(54-55岁)、性别(55%为女性)、种族、合并症和疾病严重程度方面分布相似。共识别出四个表型:表型A(18%)合并症最多,长期呼吸功能不全、急性肾损伤、脓毒症和三年死亡率的发生率较高;表型B(33%)和表型C(31%)表现为弥漫性轻度器官功能障碍,其中表型B短期结局良好但三年死亡率居第二位,表型C临床结局良好;表型D(17%)存在早期/持续低血压、早期手术率高以及大量炎症相关生物标志物,但三年死亡率为第二低。比较各表型的SOFA评分后发现,聚类结果并非简单重复其他病情严重程度评估。在这一异质队列中,深度时间插值与聚类网络识别出了疾病类别和结局各异的四种表型。该工具有望在时间受限的情况下影响分诊决策和临床决策支持。

The Marginal Value of Momentum for Small Learning Rate SGD

  • paper_url: http://arxiv.org/abs/2307.15196
  • repo_url: None
  • paper_authors: Runzhe Wang, Sadhika Malladi, Tianhao Wang, Kaifeng Lyu, Zhiyuan Li
  • for: 这篇论文的目的是解释 momentum 在随机梯度下降(SGD)中的作用,特别是在学习率较小、梯度噪声为主要不稳定来源的情形下。
  • methods: 这 paper 使用了 theoretical analysis 和实验来研究 momentum 的效果。
  • results: 研究发现,在实际训练场景下, momentum 对优化和泛化都没有明显的提升,尤其是当学习率不够大时。
    Abstract Momentum is known to accelerate the convergence of gradient descent in strongly convex settings without stochastic gradient noise. In stochastic optimization, such as training neural networks, folklore suggests that momentum may help deep learning optimization by reducing the variance of the stochastic gradient update, but previous theoretical analyses do not find momentum to offer any provable acceleration. Theoretical results in this paper clarify the role of momentum in stochastic settings where the learning rate is small and gradient noise is the dominant source of instability, suggesting that SGD with and without momentum behave similarly in the short and long time horizons. Experiments show that momentum indeed has limited benefits for both optimization and generalization in practical training regimes where the optimal learning rate is not very large, including small- to medium-batch training from scratch on ImageNet and fine-tuning language models on downstream tasks.
    摘要 在没有随机梯度噪声的强凸设定下,动量(momentum)可以加速梯度下降的收敛。在随机优化(例如神经网络训练)中,通常的说法是动量可以通过降低随机梯度更新的方差来帮助深度学习优化,但此前的理论分析并未发现动量能带来任何可证明的加速。本文的理论结果厘清了在学习率较小、梯度噪声是主要不稳定来源的随机设定中动量的作用,表明带动量与不带动量的SGD在短期和长期时间尺度上的行为相似。实验表明,在最优学习率并不很大的实际训练场景中(包括在ImageNet上从零开始的小到中等批量训练,以及在下游任务上微调语言模型),动量对优化和泛化的收益确实有限。

Learning in Repeated Multi-Unit Pay-As-Bid Auctions

  • paper_url: http://arxiv.org/abs/2307.15193
  • repo_url: None
  • paper_authors: Rigel Galgana, Negin Golrezaei
  • for: 这篇论文关注的是在重复的多单位付款拍卖中学习如何出价,以实现最大化利润。
  • methods: 作者使用动态计划(DP)算法来解决这个问题,并在全信息和强化反馈下进行了线上学习算法的设计。
  • results: 作者证明了在线上学习算法的时间复杂度为乘方时间复杂度,并且在实际实验中,当所有投标者遵循作者提出的无 regret学习算法时,市场动态会向一个最大化利润的均衡点转化。此外,作者还发现在多单位付款拍卖中,付款拍卖可以带来较高的收益,比其受欢迎的替代方案——固定价格拍卖。
    Abstract Motivated by Carbon Emissions Trading Schemes, Treasury Auctions, and Procurement Auctions, which all involve the auctioning of homogeneous multiple units, we consider the problem of learning how to bid in repeated multi-unit pay-as-bid auctions. In each of these auctions, a large number of (identical) items are to be allocated to the largest submitted bids, where the price of each of the winning bids is equal to the bid itself. The problem of learning how to bid in pay-as-bid auctions is challenging due to the combinatorial nature of the action space. We overcome this challenge by focusing on the offline setting, where the bidder optimizes their vector of bids while only having access to the past submitted bids by other bidders. We show that the optimal solution to the offline problem can be obtained using a polynomial time dynamic programming (DP) scheme. We leverage the structure of the DP scheme to design online learning algorithms with polynomial time and space complexity under full information and bandit feedback settings. We achieve an upper bound on regret of $O(M\sqrt{T\log |\mathcal{B}|})$ and $O(M\sqrt{|\mathcal{B}|T\log |\mathcal{B}|})$ respectively, where $M$ is the number of units demanded by the bidder, $T$ is the total number of auctions, and $|\mathcal{B}|$ is the size of the discretized bid space. We accompany these results with a regret lower bound, which match the linear dependency in $M$. Our numerical results suggest that when all agents behave according to our proposed no regret learning algorithms, the resulting market dynamics mainly converge to a welfare maximizing equilibrium where bidders submit uniform bids. Lastly, our experiments demonstrate that the pay-as-bid auction consistently generates significantly higher revenue compared to its popular alternative, the uniform price auction.
    摘要 受碳排放交易机制、国债拍卖和采购拍卖(它们都涉及同质多单位商品的拍卖)的启发,我们研究如何在重复的多单位按报价支付(pay-as-bid)拍卖中学习出价。在每场拍卖中,大量(相同的)物品被分配给出价最高的若干投标,且每个中标者支付的价格即为其自身的报价。由于动作空间具有组合性质,学习如何在按报价支付拍卖中出价是一个困难问题。我们通过关注离线设定来克服这一挑战:投标者在仅能获取其他投标者历史报价的情况下优化自己的报价向量。我们证明,离线问题的最优解可以通过一个多项式时间的动态规划(DP)方案求得。我们利用该DP方案的结构,设计了在全信息和赌博机反馈设定下时间与空间复杂度均为多项式的在线学习算法,分别达到 $O(M\sqrt{T\log |\mathcal{B}|})$ 和 $O(M\sqrt{|\mathcal{B}|T\log |\mathcal{B}|})$ 的遗憾上界,其中 $M$ 为投标者需求的单位数,$T$ 为拍卖总场数,$|\mathcal{B}|$ 为离散化报价空间的大小。我们还给出了与 $M$ 的线性依赖相匹配的遗憾下界。数值结果表明,当所有参与者都遵循我们提出的无遗憾学习算法时,市场动态主要收敛到一个投标者提交均匀报价的福利最大化均衡。最后,实验表明,与其流行的替代方案(统一价格拍卖)相比,按报价支付拍卖能够持续带来显著更高的收益。

f-Divergence Minimization for Sequence-Level Knowledge Distillation

  • paper_url: http://arxiv.org/abs/2307.15190
  • repo_url: https://github.com/manga-uofa/fdistill
  • paper_authors: Yuqiao Wen, Zichao Li, Wenyu Du, Lili Mou
  • for: 本研究旨在提出一个名为f-DISTILL的框架,用于实现语言模型知识传递。
  • methods: 本研究使用一个通过最小化一个通用f-分配函数来实现序列级知识传递的方法,并提出了四种知识传递变种。
  • results: 实验结果表明, compared to现有的SeqKD和ENGINE方法,我们的f-DISTILL方法在四个数据集上表现更好,而我们的对称的知识传递损失可以更好地让学生学习教师分布。
    Abstract Knowledge distillation (KD) is the process of transferring knowledge from a large model to a small one. It has gained increasing attention in the natural language processing community, driven by the demands of compressing ever-growing language models. In this work, we propose an f-DISTILL framework, which formulates sequence-level knowledge distillation as minimizing a generalized f-divergence function. We propose four distilling variants under our framework and show that existing SeqKD and ENGINE approaches are approximations of our f-DISTILL methods. We further derive step-wise decomposition for our f-DISTILL, reducing intractable sequence-level divergence to word-level losses that can be computed in a tractable manner. Experiments across four datasets show that our methods outperform existing KD approaches, and that our symmetric distilling losses can better force the student to learn from the teacher distribution.
    摘要 知识蒸馏(KD)是将知识从大模型迁移到小模型的过程。随着语言模型规模的不断增长,压缩模型的需求使KD在自然语言处理领域得到越来越多的关注。在这项工作中,我们提出了f-DISTILL框架,将序列级知识蒸馏表述为最小化一个广义f-散度函数。我们在该框架下提出了四种蒸馏变体,并证明现有的SeqKD和ENGINE方法是我们f-DISTILL方法的近似。我们进一步推导了f-DISTILL的逐步分解,将难以处理的序列级散度转化为可高效计算的词级损失。在四个数据集上的实验表明,我们的方法优于现有的KD方法,且对称的蒸馏损失能更好地促使学生学习教师分布。
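
As an illustration of the word-level decomposition, the sketch below computes a word-level KL term between teacher and student token distributions. KL is one member of the f-divergence family; the paper's f-DISTILL variants (including its symmetric losses) generalize terms of this kind, so this is a reference point rather than the full method.

```python
import torch
import torch.nn.functional as F

def word_level_kl(student_logits, teacher_logits, mask):
    # KL(teacher || student) per position, averaged over non-padding tokens.
    log_p_student = F.log_softmax(student_logits, dim=-1)
    p_teacher = F.softmax(teacher_logits, dim=-1)
    log_p_teacher = F.log_softmax(teacher_logits, dim=-1)
    kl = (p_teacher * (log_p_teacher - log_p_student)).sum(dim=-1)  # (batch, seq_len)
    return (kl * mask).sum() / mask.sum()

# Toy shapes: batch of 2 sequences, length 5, vocabulary of 100 tokens.
loss = word_level_kl(torch.randn(2, 5, 100), torch.randn(2, 5, 100), torch.ones(2, 5))
```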

Rotation-Invariant Random Features Provide a Strong Baseline for Machine Learning on 3D Point Clouds

  • paper_url: http://arxiv.org/abs/2308.06271
  • repo_url: https://github.com/meliao/rotation-invariant-random-features
  • paper_authors: Owen Melia, Eric Jonas, Rebecca Willett
  • for: 这篇论文是为了探讨三维点 cloud 数据的 rotation-invariant 机器学习方法,以及这种方法在分子性质预测和3D shape 分类 зада务中的表现。
  • methods: 本论文使用了一种简单且通用的随机特征方法,将三维点 cloud 数据转换为 rotation-invariant 的特征,并且显示了这种方法在标准分子性质预测 benchmark 资料集 QM7 和 QM9 上匹配或超越了一般的 rotation-invariant neural network 的性能。
  • results: 本论文显示了这种方法在分子性质预测和3D shape 分类 зада务中的一般化和高效性,并且与一般的 rotation-invariant neural network 相比,预测时间仅有一个数量级的差异。
    Abstract Rotational invariance is a popular inductive bias used by many fields in machine learning, such as computer vision and machine learning for quantum chemistry. Rotation-invariant machine learning methods set the state of the art for many tasks, including molecular property prediction and 3D shape classification. These methods generally either rely on task-specific rotation-invariant features, or they use general-purpose deep neural networks which are complicated to design and train. However, it is unclear whether the success of these methods is primarily due to the rotation invariance or the deep neural networks. To address this question, we suggest a simple and general-purpose method for learning rotation-invariant functions of three-dimensional point cloud data using a random features approach. Specifically, we extend the random features method of Rahimi & Recht 2007 by deriving a version that is invariant to three-dimensional rotations and showing that it is fast to evaluate on point cloud data. We show through experiments that our method matches or outperforms the performance of general-purpose rotation-invariant neural networks on standard molecular property prediction benchmark datasets QM7 and QM9. We also show that our method is general-purpose and provides a rotation-invariant baseline on the ModelNet40 shape classification task. Finally, we show that our method has an order of magnitude smaller prediction latency than competing kernel methods.
    摘要 旋转不变性是机器学习中广泛使用的归纳偏置,例如在计算机视觉和量子化学机器学习中。旋转不变的机器学习方法在许多任务上达到了最先进水平,包括分子性质预测和三维形状分类。这些方法通常要么依赖任务特定的旋转不变特征,要么使用设计和训练都较为复杂的通用深度神经网络。然而,这些方法的成功究竟主要归功于旋转不变性还是深度神经网络,目前尚不清楚。为了解答这一问题,我们提出了一种简单且通用的方法,利用随机特征来学习三维点云数据上的旋转不变函数。具体来说,我们将 Rahimi & Recht(2007)的随机特征方法推广为对三维旋转不变的版本,并证明其在点云数据上可以快速计算。实验表明,在标准分子性质预测基准数据集 QM7 和 QM9 上,我们的方法达到或超过通用旋转不变神经网络的性能。我们还证明该方法具有通用性,并在 ModelNet40 形状分类任务上提供了一个旋转不变的基线。最后,我们的方法的预测延迟比同类核方法低一个数量级。
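The snippet below illustrates the random-features baseline idea in a simplified form. It is not the paper's analytically rotation-invariant construction: it first maps each point cloud to a hand-made invariant descriptor (a pairwise-distance histogram, an assumption of this sketch) and then applies Rahimi–Recht random Fourier features on top, which a linear model could consume.

```python
import numpy as np

def pairwise_distance_histogram(points, bins=32, r_max=4.0):
    """Rotation- and translation-invariant point-cloud descriptor: a normalized
    histogram of pairwise distances (a crude stand-in for the paper's
    analytically rotation-invariant features)."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    d = d[np.triu_indices(len(points), k=1)]
    hist, _ = np.histogram(d, bins=bins, range=(0.0, r_max), density=True)
    return hist

def random_fourier_features(X, n_features=256, gamma=1.0, seed=0):
    """Rahimi & Recht (2007) random features approximating an RBF kernel."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(X.shape[1], n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

# Sanity check: the descriptor is unchanged under a random 3D rotation.
rng = np.random.default_rng(1)
cloud = rng.normal(size=(64, 3))
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))        # random orthogonal matrix
h1 = pairwise_distance_histogram(cloud)
h2 = pairwise_distance_histogram(cloud @ Q.T)
assert np.allclose(h1, h2)
features = random_fourier_features(np.stack([h1, h2]))  # inputs for a linear model
print(features.shape)
```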

RCT Rejection Sampling for Causal Estimation Evaluation

  • paper_url: http://arxiv.org/abs/2307.15176
  • repo_url: https://github.com/kakeith/rct_rejection_sampling
  • paper_authors: Katherine A. Keith, Sergey Feldman, David Jurgens, Jonathan Bragg, Rohit Bhattacharya
  • for: The paper is written to address the challenge of confounding in observational data, specifically in high-dimensional settings such as text data, genomics, or the behavioral social sciences.
  • methods: The paper proposes a new method called RCT rejection sampling, which uses subsampling of randomized controlled trials (RCTs) to create confounded observational datasets, and provides theoretical guarantees for causal identification.
  • results: The paper shows that the proposed algorithm results in low bias when evaluated on synthetic data, and highlights several finite data considerations for evaluation designers who plan to use RCT rejection sampling on their own datasets. Additionally, the paper provides a proof of concept using a novel, real-world RCT consisting of approximately 70k observations and text data as high-dimensional covariates.
    Abstract Confounding is a significant obstacle to unbiased estimation of causal effects from observational data. For settings with high-dimensional covariates -- such as text data, genomics, or the behavioral social sciences -- researchers have proposed methods to adjust for confounding by adapting machine learning methods to the goal of causal estimation. However, empirical evaluation of these adjustment methods has been challenging and limited. In this work, we build on a promising empirical evaluation strategy that simplifies evaluation design and uses real data: subsampling randomized controlled trials (RCTs) to create confounded observational datasets while using the average causal effects from the RCTs as ground-truth. We contribute a new sampling algorithm, which we call RCT rejection sampling, and provide theoretical guarantees that causal identification holds in the observational data to allow for valid comparisons to the ground-truth RCT. Using synthetic data, we show our algorithm indeed results in low bias when oracle estimators are evaluated on the confounded samples, which is not always the case for a previously proposed algorithm. In addition to this identification result, we highlight several finite data considerations for evaluation designers who plan to use RCT rejection sampling on their own datasets. As a proof of concept, we implement an example evaluation pipeline and walk through these finite data considerations with a novel, real-world RCT -- which we release publicly -- consisting of approximately 70k observations and text data as high-dimensional covariates. Together, these contributions build towards a broader agenda of improved empirical evaluation for causal estimation.
    摘要 混杂(confounding)是从观察数据中无偏估计因果效应的主要障碍。在高维协变量的设定下,例如文本数据、基因组学或行为社会科学,研究者们提出了通过将机器学习方法适配到因果估计目标来校正混杂的方法。然而,对这些校正方法的实证评估一直充满挑战且较为有限。在这项工作中,我们基于一种有前景的实证评估策略展开研究,该策略简化了评估设计并使用真实数据:对随机对照试验(RCT)进行子抽样,构造出带有混杂的观察数据集,同时将 RCT 的平均因果效应作为真实参照。我们提出了一种新的抽样算法,称为 RCT rejection sampling,并给出理论保证,证明因果可识别性在所得观察数据中成立,从而可以与 RCT 真值进行有效比较。在合成数据上,我们展示了当在混杂样本上评估 oracle 估计器时,我们的算法确实产生低偏差,而此前提出的一种算法并不总是如此。除了这一可识别性结果外,我们还为计划在自己数据上使用 RCT rejection sampling 的评估设计者指出了若干有限数据方面的考虑。作为概念验证,我们实现了一个示例评估管道,并结合一个我们公开发布的、包含约 7 万条观测且以文本作为高维协变量的新的真实世界 RCT,逐一讨论这些有限数据考虑。总之,这些贡献共同推进了改进因果估计实证评估这一更广泛的议程。
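As a schematic of the evaluation strategy, the sketch below subsamples a synthetic RCT so that treatment becomes associated with a covariate, making the naive difference-in-means biased while the RCT's average effect remains the ground truth. The acceptance rule and the `propensity` function are illustrative; the paper's RCT rejection sampling algorithm has its own rule with formal identification guarantees.

```python
import numpy as np

def confounded_subsample(X, T, Y, propensity, seed=0):
    """Subsample a randomized trial so that, in the retained data, treatment
    probability depends on the covariate X, i.e. X becomes a confounder.

    Illustrative acceptance rule only; the paper's RCT rejection sampling uses
    a specific rule that comes with theoretical identification guarantees.
    """
    rng = np.random.default_rng(seed)
    e = propensity(X)
    accept = np.where(T == 1, e, 1.0 - e)        # keep treated w.p. e(X), controls w.p. 1-e(X)
    keep = rng.uniform(size=len(T)) < accept
    return X[keep], T[keep], Y[keep]

# Tiny demo on a synthetic RCT whose true average treatment effect is 2.0.
rng = np.random.default_rng(42)
n = 50_000
X = rng.normal(size=n)
T = rng.integers(0, 2, size=n)                   # randomized treatment
Y = 2.0 * T + 1.5 * X + rng.normal(size=n)       # X also drives the outcome
Xo, To, Yo = confounded_subsample(X, T, Y, lambda x: 1.0 / (1.0 + np.exp(-2.0 * x)))
naive = Yo[To == 1].mean() - Yo[To == 0].mean()  # biased difference-in-means
print(f"RCT ground truth = 2.0, naive estimate on confounded subsample = {naive:.2f}")
```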

Causative Cyberattacks on Online Learning-based Automated Demand Response Systems

  • paper_url: http://arxiv.org/abs/2307.15175
  • repo_url: None
  • paper_authors: Samrat Acharya, Yury Dvorkin, Ramesh Karri
  • for: 本研究旨在探讨人工智能(AI)在供应侧热卷较小的电力loads中的应用,以及这些loads的数据集被用来验证攻击者可能会利用的攻击方法。
  • methods: 本研究使用了人工智能学习方法,包括机器学习和深度学习,以分析用户的能源使用模式并设计优化的奖励策略。
  • results: 研究发现,通过在DR客户端上执行攻击,可以让DR客户端错误地响应DR奖励,从而导致DR客户端的能源消耗增加,并且可以通过控制DR客户端的响应来 manipulate DR market。
    Abstract Power utilities are adopting Automated Demand Response (ADR) to replace the costly fuel-fired generators and to preempt congestion during peak electricity demand. Similarly, third-party Demand Response (DR) aggregators are leveraging controllable small-scale electrical loads to provide on-demand grid support services to the utilities. Some aggregators and utilities have started employing Artificial Intelligence (AI) to learn the energy usage patterns of electricity consumers and use this knowledge to design optimal DR incentives. Such AI frameworks use open communication channels between the utility/aggregator and the DR customers, which are vulnerable to \textit{causative} data integrity cyberattacks. This paper explores vulnerabilities of AI-based DR learning and designs a data-driven attack strategy informed by DR data collected from the New York University (NYU) campus buildings. The case study demonstrates the feasibility and effects of maliciously tampering with (i) real-time DR incentives, (ii) DR event data sent to DR customers, and (iii) responses of DR customers to the DR incentives.
    摘要 各种能源供应商正在采用自动化需求应答(ADR),以取代昂贵的燃料燃烧机和预防峰值电力需求压力。同时,第三方需求应答(DR)聚合者也在利用可控小规模电力负荷来提供实时电网支持服务。一些聚合者和供应商已经开始使用人工智能(AI)来学习电力消耗者的能源使用模式,并使用这些知识来设计优化的DR激励计划。这些AI框架使用公开的通信频道 между供应商/聚合者和DR客户,这些通信频道受到了 causative 数据完整性攻击的威胁。本文探讨了 AI-based DR 学习的漏洞,并设计了一种基于 DR 数据的数据驱动攻击策略。案例研究表明了在 NYU 校园建筑物上收集的 DR 数据可以用于设计和实现这种攻击策略。

PredictChain: Empowering Collaboration and Data Accessibility for AI in a Decentralized Blockchain-based Marketplace

  • paper_url: http://arxiv.org/abs/2307.15168
  • repo_url: https://github.com/ai-and-blockchain/s23_predictchain
  • paper_authors: Matthew T. Pisano, Connor J. Patterson, Oshani Seneviratne
  • for: 该论文旨在提供一个基于区块链的市场平台,帮助用户上传数据集用于预测机器学习模型训练,或者请求已上传数据集的模型训练,或者提交查询到已训练模型。
  • methods: 该论文提出了一个基于区块链的机制,通过各个节点的可用计算资源来运行多种不同特征的预测机器学习模型,包括成本、速度、简洁、能力和成本效果等。
  • results: 该论文通过实现一个分布式的预测机器学习模型市场平台,推动了数据分享和中央云服务器的减少,并且为用户提供了一个可靠、安全、可控的机制来训练和使用预测机器学习模型。
    Abstract Limited access to computing resources and training data poses significant challenges for individuals and groups aiming to train and utilize predictive machine learning models. Although numerous publicly available machine learning models exist, they are often unhosted, necessitating end-users to establish their computational infrastructure. Alternatively, these models may only be accessible through paid cloud-based mechanisms, which can prove costly for general public utilization. Moreover, model and data providers require a more streamlined approach to track resource usage and capitalize on subsequent usage by others, both financially and otherwise. An effective mechanism is also lacking to contribute high-quality data for improving model performance. We propose a blockchain-based marketplace called "PredictChain" for predictive machine-learning models to address these issues. This marketplace enables users to upload datasets for training predictive machine learning models, request model training on previously uploaded datasets, or submit queries to trained models. Nodes within the blockchain network, equipped with available computing resources, will operate these models, offering a range of archetype machine learning models with varying characteristics, such as cost, speed, simplicity, power, and cost-effectiveness. This decentralized approach empowers users to develop improved models accessible to the public, promotes data sharing, and reduces reliance on centralized cloud providers.
    摘要 有限的计算资源与训练数据,使个人和团体在训练与使用预测性机器学习模型时面临重大挑战。尽管已有大量公开可用的机器学习模型,但它们往往没有托管服务,终端用户必须自行搭建计算基础设施;或者这些模型只能通过付费的云端机制访问,对普通公众而言成本高昂。此外,模型和数据提供者也需要一种更便捷的方式来跟踪资源使用情况,并从他人的后续使用中(经济上或其他方面)获得回报;同时,目前还缺乏贡献高质量数据以提升模型性能的有效机制。为了解决这些问题,我们提出了一个名为"PredictChain"的基于区块链的预测性机器学习模型市场。该市场允许用户上传数据集以训练预测性机器学习模型、请求在先前上传的数据集上训练模型,或向已训练的模型提交查询。区块链网络中拥有可用计算资源的节点将运行这些模型,提供在成本、速度、简洁性、能力与性价比等特性上各不相同的多种典型机器学习模型。这种去中心化方式使用户能够开发可供公众使用的更优模型、促进数据共享,并减少对中心化云服务商的依赖。

VISU at WASSA 2023 Shared Task: Detecting Emotions in Reaction to News Stories Leveraging BERT and Stacked Embeddings

  • paper_url: http://arxiv.org/abs/2307.15164
  • repo_url: None
  • paper_authors: Vivek Kumar, Sushmita Singh, Prayag Tiwari
  • for: 这项研究旨在开发深度学习模型,用于从新闻文章中推断情感表达。
  • methods: 研究使用word embedding表示法,并采用了适应性的预处理策略,以捕捉情感表达的细节。试验使用了静态和上下文嵌入(个体和堆叠),并与BiLSTM和Transformer模型进行了比较。
  • results: 研究在WASSA 2023共享任务中的情感分类任务中取得了第十名,其中Macro F1得分为0.2717,证明了实施的方法的有效性,尤其是在小样本和不均衡的数据集上。
    Abstract Our system, VISU, participated in the WASSA 2023 Shared Task (3) of Emotion Classification from essays written in reaction to news articles. Emotion detection from complex dialogues is challenging and often requires context/domain understanding. Therefore in this research, we have focused on developing deep learning (DL) models using the combination of word embedding representations with tailored prepossessing strategies to capture the nuances of emotions expressed. Our experiments used static and contextual embeddings (individual and stacked) with Bidirectional Long short-term memory (BiLSTM) and Transformer based models. We occupied rank tenth in the emotion detection task by scoring a Macro F1-Score of 0.2717, validating the efficacy of our implemented approaches for small and imbalanced datasets with mixed categories of target emotions.
    摘要 我们的系统 VISU 参加了 WASSA 2023 共享任务(任务 3),对阅读新闻文章后写下的反应短文进行情感分类。从复杂语篇中检测情感具有挑战性,通常需要语境/领域理解。因此,在本研究中,我们专注于构建深度学习(DL)模型,结合词嵌入表示与定制的预处理策略,以捕捉所表达情感的细微差别。我们的实验使用了静态和上下文嵌入(单独及堆叠),并分别配合 BiLSTM 与基于 Transformer 的模型。我们在情感检测任务中以 0.2717 的 Macro F1 分数位列第十,验证了所实现方法在类别混杂、小规模且不均衡的数据集上的有效性。

R-LPIPS: An Adversarially Robust Perceptual Similarity Metric

  • paper_url: http://arxiv.org/abs/2307.15157
  • repo_url: https://github.com/saraghazanfari/r-lpips
  • paper_authors: Sara Ghazanfari, Siddharth Garg, Prashanth Krishnamurthy, Farshad Khorrami, Alexandre Araujo
  • for: The paper aims to address the security concerns of the Learned Perceptual Image Patch Similarity (LPIPS) metric by proposing a new metric called Robust Learned Perceptual Image Patch Similarity (R-LPIPS) that is more robust to adversarial examples.
  • methods: The R-LPIPS metric leverages adversarially trained deep features to improve its robustness against adversarial examples.
  • results: The paper demonstrates the superiority of R-LPIPS compared to the classical LPIPS metric through a comprehensive set of experiments.
    Abstract Similarity metrics have played a significant role in computer vision to capture the underlying semantics of images. In recent years, advanced similarity metrics, such as the Learned Perceptual Image Patch Similarity (LPIPS), have emerged. These metrics leverage deep features extracted from trained neural networks and have demonstrated a remarkable ability to closely align with human perception when evaluating relative image similarity. However, it is now well-known that neural networks are susceptible to adversarial examples, i.e., small perturbations invisible to humans crafted to deliberately mislead the model. Consequently, the LPIPS metric is also sensitive to such adversarial examples. This susceptibility introduces significant security concerns, especially considering the widespread adoption of LPIPS in large-scale applications. In this paper, we propose the Robust Learned Perceptual Image Patch Similarity (R-LPIPS) metric, a new metric that leverages adversarially trained deep features. Through a comprehensive set of experiments, we demonstrate the superiority of R-LPIPS compared to the classical LPIPS metric. The code is available at https://github.com/SaraGhazanfari/R-LPIPS.
    摘要 相似度度量在计算机视觉中发挥着重要作用,用于捕捉图像的底层语义。近年来出现了更先进的相似度度量,例如学习感知图像块相似度(LPIPS)。这些度量利用训练好的神经网络提取的深度特征,在评估图像相对相似度时表现出与人类感知高度一致的能力。然而,众所周知,神经网络容易受到对抗样本的影响,即人眼不可见、却被刻意构造用来误导模型的微小扰动;因此 LPIPS 度量同样对这类对抗样本敏感。考虑到 LPIPS 已在大规模应用中被广泛采用,这种敏感性带来了显著的安全隐患。在这篇论文中,我们提出了鲁棒学习感知图像块相似度(R-LPIPS)度量,这一新度量利用经过对抗训练的深度特征。通过一组全面的实验,我们证明了 R-LPIPS 相比经典 LPIPS 度量的优越性。代码可在 https://github.com/SaraGhazanfari/R-LPIPS 获取。

A/B Testing and Best-arm Identification for Linear Bandits with Robustness to Non-stationarity

  • paper_url: http://arxiv.org/abs/2307.15154
  • repo_url: None
  • paper_authors: Zhihan Xiong, Romain Camilleri, Maryam Fazel, Lalit Jain, Kevin Jamieson
  • for: The paper is written for identifying the best arm in a linear bandit problem with a non-stationary environment.
  • methods: The paper proposes a novel algorithm called $\mathsf{P1}$-$\mathsf{RAGE}$ that combines the advantages of both stationary and non-stationary algorithms to achieve robustness and fast identification.
  • results: The paper shows that the proposed algorithm achieves a lower error probability than existing algorithms in the non-stationary setting, while also performing well in benign settings.
    Abstract We investigate the fixed-budget best-arm identification (BAI) problem for linear bandits in a potentially non-stationary environment. Given a finite arm set $\mathcal{X}\subset\mathbb{R}^d$, a fixed budget $T$, and an unpredictable sequence of parameters $\left\lbrace\theta_t\right\rbrace_{t=1}^{T}$, an algorithm will aim to correctly identify the best arm $x^* := \arg\max_{x\in\mathcal{X}} x^\top\sum_{t=1}^{T}\theta_t$ with probability as high as possible. Prior work has addressed the stationary setting where $\theta_t = \theta_1$ for all $t$ and demonstrated that the error probability decreases as $\exp(-T /\rho^*)$ for a problem-dependent constant $\rho^*$. But in many real-world $A/B/n$ multivariate testing scenarios that motivate our work, the environment is non-stationary and an algorithm expecting a stationary setting can easily fail. For robust identification, it is well-known that if arms are chosen randomly and non-adaptively from a G-optimal design over $\mathcal{X}$ at each time then the error probability decreases as $\exp(-T\Delta^2_{(1)}/d)$, where $\Delta_{(1)} = \min_{x \neq x^*} (x^* - x)^\top \frac{1}{T}\sum_{t=1}^T \theta_t$. As there exist environments where $\Delta_{(1)}^2/ d \ll 1/ \rho^*$, we are motivated to propose a novel algorithm $\mathsf{P1}$-$\mathsf{RAGE}$ that aims to obtain the best of both worlds: robustness to non-stationarity and fast rates of identification in benign settings. We characterize the error probability of $\mathsf{P1}$-$\mathsf{RAGE}$ and demonstrate empirically that the algorithm indeed never performs worse than G-optimal design but compares favorably to the best algorithms in the stationary setting.
    摘要 我们研究在潜在非平稳环境下线性赌博机的固定预算最佳臂识别(BAI)问题。给定一个有限臂集 $\mathcal{X}\subset\mathbb{R}^d$、固定预算 $T$,以及一列无法预测的参数 $\left\lbrace\theta_t\right\rbrace_{t=1}^{T}$,算法的目标是以尽可能高的概率正确识别最佳臂 $x^*:=\arg\max_{x\in\mathcal{X}} x^\top\sum_{t=1}^{T}\theta_t$。先前的工作研究了平稳设定(即对所有 $t$ 均有 $\theta_t = \theta_1$),并证明错误概率按 $\exp(-T/\rho^*)$ 衰减,其中 $\rho^*$ 为与问题相关的常数。然而,在许多启发本工作的实际 $A/B/n$ 多变量测试场景中,环境是非平稳的,假设平稳设定的算法很容易失效。为实现鲁棒的识别,众所周知,若每个时刻都从 $\mathcal{X}$ 上的 G-最优设计中随机、非自适应地选择臂,则错误概率按 $\exp(-T\Delta^2_{(1)}/d)$ 衰减,其中 $\Delta_{(1)} = \min_{x \neq x^*} (x^* - x)^\top \frac{1}{T}\sum_{t=1}^T \theta_t$。由于存在 $\Delta_{(1)}^2/ d \ll 1/ \rho^*$ 的环境,我们受此启发提出了一种新算法 $\mathsf{P1}$-$\mathsf{RAGE}$,力求兼得两方面的优点:对非平稳性的鲁棒性,以及在良性设定下的快速识别速率。我们刻画了 $\mathsf{P1}$-$\mathsf{RAGE}$ 的错误概率,并通过实验表明该算法从不逊于 G-最优设计,同时在平稳设定下与最佳算法相比也具有竞争力。
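For intuition, here is a minimal non-adaptive baseline in the spirit of the abstract's discussion, not the $\mathsf{P1}$-$\mathsf{RAGE}$ algorithm itself: arms are pulled at random (uniform sampling stands in for a G-optimal design), the average parameter $\frac{1}{T}\sum_t \theta_t$ is estimated by regularized least squares, and the arm maximizing $x^\top\hat\theta$ is reported. All problem sizes and the drift model are invented for the demo.

```python
import numpy as np

def identify_best_arm(arms, reward_fn, T, seed=0):
    """Non-adaptive baseline for fixed-budget BAI in a (possibly non-stationary)
    linear bandit: pull randomly chosen arms, estimate the average parameter
    (1/T) * sum_t theta_t by regularized least squares, and report the arm
    maximizing x^T theta_hat.  Uniform sampling stands in for a G-optimal design."""
    rng = np.random.default_rng(seed)
    d = arms.shape[1]
    A, b = np.zeros((d, d)), np.zeros(d)
    for t in range(T):
        x = arms[rng.integers(len(arms))]
        r = reward_fn(x, t)                      # noisy reward x^T theta_t + noise
        A += np.outer(x, x)
        b += r * x
    theta_hat = np.linalg.solve(A + 1e-6 * np.eye(d), b)
    return int(np.argmax(arms @ theta_hat)), theta_hat

# Demo with a slowly drifting theta_t; "best" is defined w.r.t. the average parameter.
rng = np.random.default_rng(3)
arms = rng.normal(size=(20, 5))
T = 5_000
thetas = 1.0 + np.cumsum(rng.normal(scale=0.01, size=(T, 5)), axis=0)
best_true = int(np.argmax(arms @ thetas.mean(axis=0)))
best_hat, _ = identify_best_arm(arms, lambda x, t: x @ thetas[t] + rng.normal(), T)
print("true best arm:", best_true, " identified arm:", best_hat)
```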

R-Block: Regularized Block of Dropout for convolutional networks

  • paper_url: http://arxiv.org/abs/2307.15150
  • repo_url: None
  • paper_authors: Liqi Wang, Qiya Hu
  • for: 这篇论文主要针对于 konvolutional Neural Networks (CNNs) 中的批处理层REGULARIZATION技术,即 Dropout 技术。
  • methods: 该论文提出了一种名为 R-Block 的互助学习训练策略,该策略在 konvolutional Neural Networks (CNNs) 中使用两个不同的 Dropout 区域来强制两个生成的差分最大化子模型的输出分布相互一致。
  • results: 该论文的实验结果表明,R-Block 可以比其他已有的结构化 Dropout 变体实现更好的性能。此外,作者还证明了他们的子模型构建方法超越了其他方法。
    Abstract Dropout as a regularization technique is widely used in fully connected layers while is less effective in convolutional layers. Therefore more structured forms of dropout have been proposed to regularize convolutional networks. The disadvantage of these methods is that the randomness introduced causes inconsistency between training and inference. In this paper, we apply a mutual learning training strategy for convolutional layer regularization, namely R-Block, which forces two outputs of the generated difference maximizing sub models to be consistent with each other. Concretely, R-Block minimizes the losses between the output distributions of two sub models with different drop regions for each sample in the training dataset. We design two approaches to construct such sub models. Our experiments demonstrate that R-Block achieves better performance than other existing structured dropout variants. We also demonstrate that our approaches to construct sub models outperforms others.
    摘要 Dropout 作为一种常见的正则化技术,广泛用于全连接层,但在卷积层中的效果较差。因此,人们提出了若干更加结构化的 dropout 方法来正则化卷积网络。然而,这些方法引入的随机性会导致训练与推理之间的不一致。在这篇论文中,我们为卷积层正则化引入了一种互学习(mutual learning)训练策略,即 R-Block,它强制两个所生成的、差异最大化的子模型的输出相互保持一致。具体来说,R-Block 对训练集中的每个样本,最小化两个具有不同丢弃区域的子模型输出分布之间的损失。我们设计了两种构建此类子模型的方法。实验表明,R-Block 的性能优于其他现有的结构化 dropout 变体;同时,我们的子模型构建方法也优于其他构建方式。
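A minimal PyTorch sketch of the mutual-learning step described above: two forward passes through the same convolutional network with independent dropout masks, trained on the task loss plus a symmetric KL consistency term. Ordinary `Dropout2d` stands in for R-Block's difference-maximizing drop-region construction, and the architecture and `alpha` weight are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Sequential(                       # stand-in CNN; R-Block targets conv nets
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Dropout2d(p=0.2),                     # plain structured dropout as a stand-in
    nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 10),
)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

def mutual_learning_step(x, y, alpha=1.0):
    """Task loss for two dropout sub-models plus a symmetric KL term pulling
    their output distributions toward each other (the R-Block idea, roughly)."""
    model.train()
    logits_a, logits_b = model(x), model(x)  # two passes -> two random drop regions
    ce = F.cross_entropy(logits_a, y) + F.cross_entropy(logits_b, y)
    log_pa = F.log_softmax(logits_a, dim=1)
    log_pb = F.log_softmax(logits_b, dim=1)
    kl = F.kl_div(log_pa, log_pb, log_target=True, reduction="batchmean") \
       + F.kl_div(log_pb, log_pa, log_target=True, reduction="batchmean")
    loss = ce + alpha * kl
    opt.zero_grad()
    loss.backward()
    opt.step()
    return float(loss)

x, y = torch.randn(8, 3, 32, 32), torch.randint(0, 10, (8,))
print(mutual_learning_step(x, y))
```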

Distilled Feature Fields Enable Few-Shot Language-Guided Manipulation

  • paper_url: http://arxiv.org/abs/2308.07931
  • repo_url: None
  • paper_authors: William Shen, Ge Yang, Alan Yu, Jansen Wong, Leslie Pack Kaelbling, Phillip Isola
  • for: bridges the 2D-to-3D gap for robotic manipulation
  • methods: leverages distilled feature fields to combine accurate 3D geometry with rich semantics from 2D foundation models
  • results: achieves in-the-wild generalization to unseen objects using few-shot learning method for 6-DOF grasping and placing
    Abstract Self-supervised and language-supervised image models contain rich knowledge of the world that is important for generalization. Many robotic tasks, however, require a detailed understanding of 3D geometry, which is often lacking in 2D image features. This work bridges this 2D-to-3D gap for robotic manipulation by leveraging distilled feature fields to combine accurate 3D geometry with rich semantics from 2D foundation models. We present a few-shot learning method for 6-DOF grasping and placing that harnesses these strong spatial and semantic priors to achieve in-the-wild generalization to unseen objects. Using features distilled from a vision-language model, CLIP, we present a way to designate novel objects for manipulation via free-text natural language, and demonstrate its ability to generalize to unseen expressions and novel categories of objects.
    摘要 自监督和语言监督的图像模型蕴含着对泛化十分重要的丰富世界知识。然而,许多机器人任务需要对三维几何的细致理解,而这往往是二维图像特征所缺乏的。本工作通过蒸馏特征场,将精确的三维几何与来自二维基础模型的丰富语义结合起来,弥合了机器人操作中的二维到三维鸿沟。我们提出了一种用于六自由度抓取与放置的小样本学习方法,它利用这些强大的空间与语义先验,实现了对未见过物体的真实场景泛化。利用从视觉语言模型 CLIP 中蒸馏出的特征,我们展示了一种通过自由文本自然语言指定待操作新物体的方式,并证明其能够泛化到未见过的表达以及新类别的物体。

On (Normalised) Discounted Cumulative Gain as an Offline Evaluation Metric for Top-$n$ Recommendation

  • paper_url: http://arxiv.org/abs/2307.15053
  • repo_url: None
  • paper_authors: Olivier Jeunen, Ivan Potapov, Aleksei Ustimenko
  • for: 这种研究是为了检验推荐系统的评价方法,特别是使用 Discounted Cumulative Gain (DCG) metric 的正确性。
  • methods: 这篇论文使用了一种 Critical Look 的方法来检验 DCG metric,包括对 DCG 的不准确性和不一致性的分析,以及一种基于实际数据的拟合方法来补做 DCG 的缺陷。
  • results: 研究发现,不正确地使用 DCG metric 可能会导致推荐系统的评价结果偏差,而且在某些情况下,正常化 DCG metric 可能会导致推荐系统的评价结果与实际情况相反。
    Abstract Approaches to recommendation are typically evaluated in one of two ways: (1) via a (simulated) online experiment, often seen as the gold standard, or (2) via some offline evaluation procedure, where the goal is to approximate the outcome of an online experiment. Several offline evaluation metrics have been adopted in the literature, inspired by ranking metrics prevalent in the field of Information Retrieval. (Normalised) Discounted Cumulative Gain (nDCG) is one such metric that has seen widespread adoption in empirical studies, and higher (n)DCG values have been used to present new methods as the state-of-the-art in top-$n$ recommendation for many years. Our work takes a critical look at this approach, and investigates when we can expect such metrics to approximate the gold standard outcome of an online experiment. We formally present the assumptions that are necessary to consider DCG an unbiased estimator of online reward and provide a derivation for this metric from first principles, highlighting where we deviate from its traditional uses in IR. Importantly, we show that normalising the metric renders it inconsistent, in that even when DCG is unbiased, ranking competing methods by their normalised DCG can invert their relative order. Through a correlation analysis between off- and on-line experiments conducted on a large-scale recommendation platform, we show that our unbiased DCG estimates strongly correlate with online reward, even when some of the metric's inherent assumptions are violated. This statement no longer holds for its normalised variant, suggesting that nDCG's practical utility may be limited.
    摘要 推荐方法通常以两种方式之一进行评估:(1)通过(模拟的)在线实验,通常被视为金标准;或(2)通过某种离线评估流程,其目标是近似在线实验的结果。文献中已采用了多种离线评估指标,它们大多借鉴了信息检索领域常用的排名指标。(归一化)折扣累积增益(nDCG)就是其中之一,它在实证研究中被广泛使用,多年来更高的 (n)DCG 值一直被用来宣称新方法在 top-$n$ 推荐中达到最先进水平。我们的工作对这一做法进行了批判性审视,并研究在什么条件下可以期望此类指标近似在线实验的金标准结果。我们正式给出了将 DCG 视为在线奖励无偏估计所需的假设,并从第一性原理推导出该指标,指出了它与信息检索中传统用法的差异。重要的是,我们证明对该指标进行归一化会使其失去一致性:即使 DCG 本身无偏,按归一化 DCG 对竞争方法排序也可能颠倒它们的相对顺序。通过在一个大规模推荐平台上进行的离线与在线实验的相关性分析,我们表明即使该指标的一些内在假设被违反,我们的无偏 DCG 估计仍与在线奖励高度相关;但这一结论对其归一化变体不再成立,这表明 nDCG 的实际效用可能有限。
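The following toy computation, with made-up relevance numbers, illustrates the normalization point: two rankers evaluated on the same two users can be ordered one way by mean DCG and the opposite way by mean nDCG.

```python
import numpy as np

def dcg(rel):
    """DCG with the standard log2 position discount."""
    rel = np.asarray(rel, dtype=float)
    return float(np.sum(rel / np.log2(np.arange(2, len(rel) + 2))))

def ndcg(ranked_rel, all_rel):
    """Normalize by the ideal DCG over the user's full candidate set."""
    ideal = dcg(sorted(all_rel, reverse=True)[: len(ranked_rel)])
    return dcg(ranked_rel) / ideal if ideal > 0 else 0.0

# Made-up relevance data: user 1 has several relevant items, user 2 has one.
candidates = {"u1": [3, 3, 2, 0, 0], "u2": [1, 0, 0, 0, 0]}
top3 = {                                  # relevance of each ranker's top-3 list
    "ranker A": {"u1": [3, 3, 2], "u2": [0, 0, 0]},
    "ranker B": {"u1": [3, 2, 0], "u2": [1, 0, 0]},
}
for name, runs in top3.items():
    mean_dcg = np.mean([dcg(runs[u]) for u in candidates])
    mean_ndcg = np.mean([ndcg(runs[u], candidates[u]) for u in candidates])
    print(f"{name}: mean DCG = {mean_dcg:.2f}, mean nDCG = {mean_ndcg:.2f}")
# Ranker A is ahead on unnormalised DCG, ranker B is ahead on nDCG:
# normalisation can invert the relative order of two systems.
```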

A Transformer-based Approach for Arabic Offline Handwritten Text Recognition

  • paper_url: http://arxiv.org/abs/2307.15045
  • repo_url: None
  • paper_authors: Saleh Momeni, Bagher BabaAli
  • For: 本研究强调特定问题是recognizing offline Arabic handwritten text,它在pattern recognition和机器学习领域中是一个挑战性的问题,具有广泛的应用领域。* Methods: 我们提出了两种新的架构方法,即Transformer Transducer和标准sequence-to-sequence Transformer,以提高recognizing offline Arabic handwritten text的准确率和速度。我们的方法可以模型语言依赖关系,并且只需要使用注意机制,因此更加平行化和简单化。* Results: 我们的方法在Arabic KHATT数据集上进行评估,与现有状态的方法相比,我们的方法可以提高recognizing offline Arabic handwritten text的准确率。
    Abstract Handwriting recognition is a challenging and critical problem in the fields of pattern recognition and machine learning, with applications spanning a wide range of domains. In this paper, we focus on the specific issue of recognizing offline Arabic handwritten text. Existing approaches typically utilize a combination of convolutional neural networks for image feature extraction and recurrent neural networks for temporal modeling, with connectionist temporal classification used for text generation. However, these methods suffer from a lack of parallelization due to the sequential nature of recurrent neural networks. Furthermore, these models cannot account for linguistic rules, necessitating the use of an external language model in the post-processing stage to boost accuracy. To overcome these issues, we introduce two alternative architectures, namely the Transformer Transducer and the standard sequence-to-sequence Transformer, and compare their performance in terms of accuracy and speed. Our approach can model language dependencies and relies only on the attention mechanism, thereby making it more parallelizable and less complex. We employ pre-trained Transformers for both image understanding and language modeling. Our evaluation on the Arabic KHATT dataset demonstrates that our proposed method outperforms the current state-of-the-art approaches for recognizing offline Arabic handwritten text.
    摘要 手写文本识别是 Pattern recognition 和机器学习领域中的一个挑战性和重要问题,其应用范围广泛。在这篇论文中,我们专注于特定的问题是识别离线阿拉伯文本。现有的方法通常使用 convolutional neural networks 提取图像特征和 recurrent neural networks 模拟时间序列,并使用 connectionist temporal classification 生成文本。然而,这些方法受到缺乏并行化的限制,以及无法考虑语言规则的缺点。为了解决这些问题,我们介绍了两种alternative architecture,即 Transformer Transducer 和标准sequence-to-sequence Transformer,并比较其性能。我们的方法可以模型语言依赖关系,只需要使用注意机制,从而使其更加并行化和简单。我们使用预训练的 Transformer 来进行图像理解和语言模型化。我们的评估表明,我们提出的方法在阿拉伯文本KHATT 数据集上的识别性能比现有状态的方法高。

Universal and Transferable Adversarial Attacks on Aligned Language Models

  • paper_url: http://arxiv.org/abs/2307.15043
  • repo_url: https://github.com/llm-attacks/llm-attacks
  • paper_authors: Andy Zou, Zifan Wang, J. Zico Kolter, Matt Fredrikson
  • for: 这个论文是为了攻击已经Alignment的语言模型,使其生成不良内容。
  • methods: 该论文使用了 suffix 的搜索技术,自动生成了攻击性的提问。
  • results: 该论文在多个模型和多个黑盒模型上实现了攻击,并且发现了这些攻击的 suffix 可以在不同的情况下传递。
    Abstract Because "out-of-the-box" large language models are capable of generating a great deal of objectionable content, recent work has focused on aligning these models in an attempt to prevent undesirable generation. While there has been some success at circumventing these measures -- so-called "jailbreaks" against LLMs -- these attacks have required significant human ingenuity and are brittle in practice. In this paper, we propose a simple and effective attack method that causes aligned language models to generate objectionable behaviors. Specifically, our approach finds a suffix that, when attached to a wide range of queries for an LLM to produce objectionable content, aims to maximize the probability that the model produces an affirmative response (rather than refusing to answer). However, instead of relying on manual engineering, our approach automatically produces these adversarial suffixes by a combination of greedy and gradient-based search techniques, and also improves over past automatic prompt generation methods. Surprisingly, we find that the adversarial prompts generated by our approach are quite transferable, including to black-box, publicly released LLMs. Specifically, we train an adversarial attack suffix on multiple prompts (i.e., queries asking for many different types of objectionable content), as well as multiple models (in our case, Vicuna-7B and 13B). When doing so, the resulting attack suffix is able to induce objectionable content in the public interfaces to ChatGPT, Bard, and Claude, as well as open source LLMs such as LLaMA-2-Chat, Pythia, Falcon, and others. In total, this work significantly advances the state-of-the-art in adversarial attacks against aligned language models, raising important questions about how such systems can be prevented from producing objectionable information. Code is available at github.com/llm-attacks/llm-attacks.
    摘要 由于"开箱即用"的大型语言模型能够生成大量令人反感的内容,近期工作集中在对这些模型进行对齐,以防止此类不良生成。虽然已有一些绕过这些防护措施的做法,即针对 LLM 的所谓"越狱"(jailbreak)攻击,但这些攻击往往需要大量的人工巧思,并且在实践中较为脆弱。在这篇论文中,我们提出了一种简单而有效的攻击方法,能够诱使对齐后的语言模型产生令人反感的行为。具体来说,我们的方法寻找一个后缀(suffix):当它被附加到各种要求 LLM 生成不良内容的查询之后时,其目标是最大化模型给出肯定回应(而非拒绝回答)的概率。与人工设计不同,我们的方法结合贪心搜索与基于梯度的搜索技术来自动生成这些对抗后缀,并改进了以往的自动提示生成方法。令人惊讶的是,我们发现这样生成的对抗提示具有很强的可迁移性,甚至可以迁移到黑盒的、公开发布的 LLM。具体而言,我们在多个提示(即请求多种不同类型不良内容的查询)以及多个模型(在我们的实验中为 Vicuna-7B 和 13B)上训练对抗攻击后缀;所得后缀能够在 ChatGPT、Bard 和 Claude 的公共接口,以及 LLaMA-2-Chat、Pythia、Falcon 等开源 LLM 上诱导出不良内容。总之,这项工作显著推进了针对对齐语言模型的对抗攻击的最新水平,并提出了如何防止此类系统产生不良信息的重要问题。代码见 github.com/llm-attacks/llm-attacks。

Detecting Morphing Attacks via Continual Incremental Training

  • paper_url: http://arxiv.org/abs/2307.15105
  • repo_url: None
  • paper_authors: Lorenzo Pellegrini, Guido Borghi, Annalisa Franco, Davide Maltoni
  • for: 实现增量训练,当资料传输和储存有限制时,使得对多个数据来源进行批量训练具有挑战性。
  • methods: 采用不同的Continual Learning方法来实现增量训练,包括Learning without Forgetting(LwF)等方法。
  • results: LwF方法在这个方案中表现良好,并且在具有变化大小的数据批量中进行增量训练时,能够实现好的表现。
    Abstract Scenarios in which restrictions in data transfer and storage limit the possibility to compose a single dataset -- also exploiting different data sources -- to perform a batch-based training procedure, make the development of robust models particularly challenging. We hypothesize that the recent Continual Learning (CL) paradigm may represent an effective solution to enable incremental training, even through multiple sites. Indeed, a basic assumption of CL is that once a model has been trained, old data can no longer be used in successive training iterations and in principle can be deleted. Therefore, in this paper, we investigate the performance of different Continual Learning methods in this scenario, simulating a learning model that is updated every time a new chunk of data, even of variable size, is available. Experimental results reveal that a particular CL method, namely Learning without Forgetting (LwF), is one of the best-performing algorithms. Then, we investigate its usage and parametrization in Morphing Attack Detection and Object Classification tasks, specifically with respect to the amount of new training data that became available.
    摘要 当数据传输和存储受限、难以(哪怕利用多个数据源)汇集成单一数据集来执行批量式训练时,开发鲁棒模型尤其具有挑战性。我们假设近来的持续学习(Continual Learning,CL)范式可能是一个有效的解决方案,使得即使跨多个站点也能进行增量训练。事实上,CL 的一个基本假设是:模型一旦训练完成,旧数据便不再用于后续训练迭代,原则上可以删除。因此,在这篇论文中,我们研究了不同持续学习方法在这一场景下的性能,模拟一个每当新数据块(大小可变)到达时就进行更新的学习模型。实验结果表明,一种特定的 CL 方法,即 Learning without Forgetting(LwF),是表现最好的算法之一。随后,我们进一步考察了它在 Morphing Attack Detection 与 Object Classification 任务中的使用与参数设置,特别是针对可用新训练数据量的不同情况。
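A minimal PyTorch sketch of one Learning-without-Forgetting update on a newly arrived chunk, matching the incremental setting above: the model is trained with cross-entropy on the new chunk plus a distillation term toward a frozen copy of itself taken before the update. The temperature, loss weight, stand-in classifier, and binary morphed/bona-fide labels are illustrative choices, not the paper's configuration.

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

def lwf_update_on_chunk(model, chunk_loader, lr=1e-3, temp=2.0, lam=1.0):
    """One Learning-without-Forgetting pass over a newly arrived data chunk.

    A frozen copy of the model taken before the update provides soft targets;
    the distillation term discourages forgetting of earlier chunks whose raw
    data is assumed to be no longer available."""
    old_model = copy.deepcopy(model).eval()
    for p in old_model.parameters():
        p.requires_grad_(False)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    model.train()
    for x, y in chunk_loader:
        with torch.no_grad():
            old_probs = F.softmax(old_model(x) / temp, dim=1)
        logits = model(x)
        ce = F.cross_entropy(logits, y)                       # new-chunk task loss
        distill = F.kl_div(F.log_softmax(logits / temp, dim=1),
                           old_probs, reduction="batchmean") * temp ** 2
        loss = ce + lam * distill
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model

# Toy usage with a stand-in classifier (e.g. bona fide vs. morphed face).
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 64), nn.ReLU(),
                      nn.Linear(64, 2))
chunk = [(torch.randn(16, 3, 32, 32), torch.randint(0, 2, (16,)))
         for _ in range(4)]
model = lwf_update_on_chunk(model, chunk)
```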

Speeding up Fourier Neural Operators via Mixed Precision

  • paper_url: http://arxiv.org/abs/2307.15034
  • repo_url: https://github.com/neuraloperator/neuraloperator
  • paper_authors: Colin White, Renbo Tu, Jean Kossaifi, Gennady Pekhimenko, Kamyar Azizzadenesheli, Anima Anandkumar
  • for: solving partial differential equation (PDE) solutions using the Fourier neural operator (FNO)
  • methods: mixed-precision training of FNO, with a focus on reducing memory usage and training time
  • results: up to 34% reduction in training time and memory usage, with little or no reduction in accuracy, on the Navier-Stokes and Darcy flow equations.
    Abstract The Fourier neural operator (FNO) is a powerful technique for learning surrogate maps for partial differential equation (PDE) solution operators. For many real-world applications, which often require high-resolution data points, training time and memory usage are significant bottlenecks. While there are mixed-precision training techniques for standard neural networks, those work for real-valued datatypes on finite dimensions and therefore cannot be directly applied to FNO, which crucially operates in the (complex-valued) Fourier domain and in function spaces. On the other hand, since the Fourier transform is already an approximation (due to discretization error), we do not need to perform the operation at full precision. In this work, we (i) profile memory and runtime for FNO with full and mixed-precision training, (ii) conduct a study on the numerical stability of mixed-precision training of FNO, and (iii) devise a training routine which substantially decreases training time and memory usage (up to 34%), with little or no reduction in accuracy, on the Navier-Stokes and Darcy flow equations. Combined with the recently proposed tensorized FNO (Kossaifi et al., 2023), the resulting model has far better performance while also being significantly faster than the original FNO.
    摘要 傅里叶神经算子(FNO)是一种用于学习偏微分方程(PDE)解算子代理映射的强大技术。在许多通常需要高分辨率数据点的实际应用中,训练时间和内存占用是主要瓶颈。虽然已有针对标准神经网络的混合精度训练技术,但它们面向有限维上的实值数据类型,无法直接应用于 FNO,因为 FNO 的关键运算发生在(复值的)傅里叶域以及函数空间中。另一方面,由于傅里叶变换本身(因离散化误差)就是一种近似,我们并不需要以全精度执行该运算。在这项工作中,我们(i)分析了 FNO 在全精度与混合精度训练下的内存与运行时间开销,(ii)研究了 FNO 混合精度训练的数值稳定性,并(iii)设计了一套训练流程,在 Navier-Stokes 和 Darcy 流方程上将训练时间和内存占用降低最多 34%,而精度几乎没有损失。结合最近提出的张量化 FNO(Kossaifi et al., 2023),所得模型不仅性能远超原始 FNO,速度也显著更快。
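For reference, here is a generic PyTorch automatic-mixed-precision loop (autocast plus GradScaler) around a stand-in spectral-convolution layer, to show the kind of training routine being profiled. This sketch keeps the FFT itself in full precision and does not reproduce the paper's numerically stabilized half-precision Fourier-domain scheme; the layer, sizes, and loss are illustrative assumptions.

```python
import torch
import torch.nn as nn

class TinySpectralConv(nn.Module):
    """Stand-in FNO-style layer: keep a few Fourier modes and mix channels there."""
    def __init__(self, channels=8, modes=8):
        super().__init__()
        self.modes = modes
        self.w_re = nn.Parameter(torch.randn(channels, channels, modes, modes) / channels)
        self.w_im = nn.Parameter(torch.randn(channels, channels, modes, modes) / channels)

    def forward(self, x):                       # x: (batch, channels, h, w)
        weight = torch.complex(self.w_re, self.w_im)
        x_ft = torch.fft.rfft2(x.float())       # keep the FFT itself in full precision
        out_ft = torch.zeros_like(x_ft)
        m = self.modes
        out_ft[..., :m, :m] = torch.einsum("bixy,ioxy->boxy", x_ft[..., :m, :m], weight)
        return torch.fft.irfft2(out_ft, s=x.shape[-2:])

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Sequential(TinySpectralConv(), nn.Conv2d(8, 8, 1)).to(device)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))

for _ in range(3):                              # toy training steps on random fields
    x = torch.randn(4, 8, 32, 32, device=device)
    y = torch.randn(4, 8, 32, 32, device=device)
    opt.zero_grad()
    with torch.autocast(device_type=device, enabled=(device == "cuda")):
        loss = ((model(x) - y) ** 2).mean()     # pointwise/conv ops may run in fp16
    scaler.scale(loss).backward()
    scaler.step(opt)
    scaler.update()
```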

Self-Supervised Graph Transformer for Deepfake Detection

  • paper_url: http://arxiv.org/abs/2307.15019
  • repo_url: None
  • paper_authors: Aminollah Khormali, Jiann-Shiun Yuan
  • for: 本研究旨在提出一种可靠的深伪检测系统,能够普适地检测不同类型的深伪视频。
  • methods: 该系统基于自我超vision transformer架构,采用自我超vised contrastive learning方法进行预训练,并将graph convolutional network和transformer discriminator相结合,以及图 transformer relevancy map提供更好的解释性。
  • results: 在多种实验中,该系统表现出色,在不同的 dataset、扰动类型和加工程度下均能够保持高度的检测性能,并且在常见的后期制作扰动下也能够保持robustness。
    Abstract Deepfake detection methods have shown promising results in recognizing forgeries within a given dataset, where training and testing take place on the in-distribution dataset. However, their performance deteriorates significantly when presented with unseen samples. As a result, a reliable deepfake detection system must remain impartial to forgery types, appearance, and quality for guaranteed generalizable detection performance. Despite various attempts to enhance cross-dataset generalization, the problem remains challenging, particularly when testing against common post-processing perturbations, such as video compression or blur. Hence, this study introduces a deepfake detection framework, leveraging a self-supervised pre-training model that delivers exceptional generalization ability, withstanding common corruptions and enabling feature explainability. The framework comprises three key components: a feature extractor based on vision Transformer architecture that is pre-trained via self-supervised contrastive learning methodology, a graph convolution network coupled with a Transformer discriminator, and a graph Transformer relevancy map that provides a better understanding of manipulated regions and further explains the model's decision. To assess the effectiveness of the proposed framework, several challenging experiments are conducted, including in-data distribution performance, cross-dataset, cross-manipulation generalization, and robustness against common post-production perturbations. The results achieved demonstrate the remarkable effectiveness of the proposed deepfake detection framework, surpassing the current state-of-the-art approaches.
    摘要 深度伪造(deepfake)检测方法在训练与测试同分布的给定数据集内已能较好地识别伪造内容,然而面对未见过的样本时,其性能会显著下降。因此,一个可靠的深度伪造检测系统必须对伪造类型、外观和质量保持不敏感,才能保证可泛化的检测性能。尽管已有多种提升跨数据集泛化能力的尝试,这一问题仍然具有挑战性,尤其是在面对视频压缩或模糊等常见后期处理扰动时。为此,本研究提出了一个深度伪造检测框架,利用具有出色泛化能力的自监督预训练模型,能够抵抗常见的数据损伤并提供特征可解释性。该框架包含三个关键组成部分:基于视觉 Transformer 架构、通过自监督对比学习预训练的特征提取器;与 Transformer 判别器相结合的图卷积网络;以及能够更好地揭示被篡改区域并进一步解释模型决策的图 Transformer 相关性图。为评估该框架的有效性,我们进行了多项具有挑战性的实验,包括同分布性能、跨数据集与跨篡改方式的泛化能力,以及对常见后期制作扰动的鲁棒性。结果表明,所提出的深度伪造检测框架超越了当前最先进的方法。

Samplable Anonymous Aggregation for Private Federated Data Analysis

  • paper_url: http://arxiv.org/abs/2307.15017
  • repo_url: None
  • paper_authors: Kunal Talwar, Shan Wang, Audra McMillan, Vojta Jina, Vitaly Feldman, Bailey Basile, Aine Cahill, Yi Sheng Chan, Mike Chatzidakis, Junye Chen, Oliver Chick, Mona Chitnis, Suman Ganta, Yusuf Goren, Filip Granqvist, Kristine Guo, Frederic Jacobs, Omid Javidbakht, Albert Liu, Richard Low, Dan Mascenik, Steve Myers, David Park, Wonhee Park, Gianni Parsa, Tommy Pauly, Christian Priebe, Rehan Rishi, Guy Rothblum, Michael Scaria, Linmao Song, Congzheng Song, Karl Tarbe, Sebastian Vogt, Luke Winstrom, Shundong Zhou
  • for: 这个论文是为了设计扩展性强的协议,以实现隐私统计和隐私联合学习,每个设备都保持私有数据。
  • methods: 该论文提出了一种简单的原理,可以实现许多常用的算法,同时允许隐私评估,与中央设置相似,而不需要强大的信任假设。
  • results: 该论文提出了一种系统架构,实现了该原理,并进行了安全分析。
    Abstract We revisit the problem of designing scalable protocols for private statistics and private federated learning when each device holds its private data. Our first contribution is to propose a simple primitive that allows for efficient implementation of several commonly used algorithms, and allows for privacy accounting that is close to that in the central setting without requiring the strong trust assumptions it entails. Second, we propose a system architecture that implements this primitive and perform a security analysis of the proposed system.
    摘要 我们回归到每个设备保留自己私人数据时的私人统计和联邦学习协议设计的问题。我们的首要贡献是提出了一个简单的基本 primitives,它可以高效实现许多通常使用的算法,并且可以在中央设定下实现隐私账户,不需要强信任假设。其次,我们提出了一种系统架构,实现了这个基本 primitives,并进行了安全分析。

How Good is Google Bard’s Visual Understanding? An Empirical Study on Open Challenges

  • paper_url: http://arxiv.org/abs/2307.15016
  • repo_url: https://github.com/htqin/googlebard-visunderstand
  • paper_authors: Haotong Qin, Ge-Peng Ji, Salman Khan, Deng-Ping Fan, Fahad Shahbaz Khan, Luc Van Gool
  • for: The paper aims to evaluate the performance of Google’s Bard in understanding and interpreting visual data (images) conditioned by text questions, and to identify the gaps in Bard’s vision-based understanding.
  • methods: The paper uses 15 diverse task scenarios to comprehensively evaluate Bard’s performance in handling visual data, including regular, camouflaged, medical, under-water, and remote sensing data.
  • results: The primary finding of the study is that Bard still struggles in these vision scenarios, highlighting the significant gap in vision-based understanding that needs to be bridged in future developments.
    Abstract Google's Bard has emerged as a formidable competitor to OpenAI's ChatGPT in the field of conversational AI. Notably, Bard has recently been updated to handle visual inputs alongside text prompts during conversations. Given Bard's impressive track record in handling textual inputs, we explore its capabilities in understanding and interpreting visual data (images) conditioned by text questions. This exploration holds the potential to unveil new insights and challenges for Bard and other forthcoming multi-modal Generative models, especially in addressing complex computer vision problems that demand accurate visual and language understanding. Specifically, in this study, we focus on 15 diverse task scenarios encompassing regular, camouflaged, medical, under-water and remote sensing data to comprehensively evaluate Bard's performance. Our primary finding indicates that Bard still struggles in these vision scenarios, highlighting the significant gap in vision-based understanding that needs to be bridged in future developments. We expect that this empirical study will prove valuable in advancing future models, leading to enhanced capabilities in comprehending and interpreting fine-grained visual data. Our project is released on https://github.com/htqin/GoogleBard-VisUnderstand
    摘要 Google的Bard在对话AI领域已经出现为OpenAI的ChatGPT的强力竞争对手。值得注意的是,Bard最近更新了对视觉输入进行对话的能力。基于Bard的出色的文本输入处理能力,我们探索了它对于图像数据(图像)的理解和解释的能力。这种探索具有探索新的发现和挑战的潜在性,尤其是在解决复杂的计算机视觉问题方面。在这种研究中,我们选择了15种多样化的任务场景,包括常见、掩体、医疗、水下和Remote感知数据,以全面评估Bard的表现。我们的主要发现表明Bard在这些视觉场景中仍然努力,强调了将来发展中需要覆盖的视觉理解悬峰。我们期望这个实验室的研究将对未来模型的发展产生帮助,导致对细化的视觉数据的理解和解释能力得到提高。我们的项目在https://github.com/htqin/GoogleBard-VisUnderstand上发布。

Harnessing Synthetic Active Particles for Physical Reservoir Computing

  • paper_url: http://arxiv.org/abs/2307.15010
  • repo_url: None
  • paper_authors: Xiangzun Wang, Frank Cichos
  • for: 这篇论文探讨了基于活动微颗件系统的物理储存计算方法,以实现高效的信息处理。
  • methods: 该方法使用了具有延迟响应的微颗件系统自组织成非线性动力系统,并使用历史储存来降低噪声。
  • results: 研究人员发现,使用这种特殊架构可以实现高效的预测任务,即使在噪声强的情况下。这些结果为人工生物系统的信息处理研究开辟了新的可能性。
    Abstract The processing of information is an indispensable property of living systems realized by networks of active processes with enormous complexity. They have inspired many variants of modern machine learning one of them being reservoir computing, in which stimulating a network of nodes with fading memory enables computations and complex predictions. Reservoirs are implemented on computer hardware, but also on unconventional physical substrates such as mechanical oscillators, spins, or bacteria often summarized as physical reservoir computing. Here we demonstrate physical reservoir computing with a synthetic active microparticle system that self-organizes from an active and passive component into inherently noisy nonlinear dynamical units. The self-organization and dynamical response of the unit is the result of a delayed propulsion of the microswimmer to a passive target. A reservoir of such units with a self-coupling via the delayed response can perform predictive tasks despite the strong noise resulting from Brownian motion of the microswimmers. To achieve efficient noise suppression, we introduce a special architecture that uses historical reservoir states for output. Our results pave the way for the study of information processing in synthetic self-organized active particle systems.
    摘要 生物系统中信息处理是不可或缺的性能,通过活动过程网络实现了复杂性的计算。它们激发了现代机器学习的多种变种,其中之一是储存计算,通过启动网络节点的减弱记忆来实现计算和复杂预测。储存器在计算机硬件上实现,也可以在不同的物理基础结构上实现,如机械振荡器、螺旋体或细菌。在这里,我们使用自适应微体系来实现物理储存计算。这种系统由活动和无活动组件自组织而成,并且具有内生的噪声。我们通过延迟微型游子的推动到潜在目标来实现单元的自组织和动态响应。一个由这些单元组成的储存器,通过自相互关联来实现预测任务,即使面临强噪声。为了有效地减少噪声,我们提出了一种特殊的建筑方案,使用历史储存器状态来输出。我们的结果为人工自组织活体系统的信息处理研究开辟了新的道路。
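This is not the microswimmer system itself, but a generic echo-state-style illustration of the two ingredients described above: a noisy nonlinear reservoir driven by an input signal, and a linear readout fit by ridge regression over a window of historical reservoir states, which is the noise-suppression architecture the abstract mentions. All sizes and the sine-wave prediction task are arbitrary choices for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, H = 100, 2_000, 5                            # reservoir size, length, history depth
W = 0.9 * rng.normal(size=(N, N)) / np.sqrt(N)     # recurrent weights (echo-state style)
w_in = rng.normal(size=N)

u = np.sin(np.linspace(0, 60, T))                  # input signal; task: one-step prediction
states = np.zeros((T, N))
x = np.zeros(N)
for t in range(T):
    # Noisy nonlinear units, loosely standing in for the Brownian microswimmer dynamics.
    x = np.tanh(W @ x + w_in * u[t]) + 0.05 * rng.normal(size=N)
    states[t] = x

def history_features(states, h):
    """Concatenate the last h reservoir states for every prediction time step."""
    return np.hstack([states[h - 1 - k: len(states) - 1 - k] for k in range(h)])

X = history_features(states, H)                    # rows: [x_{t-1}, ..., x_{t-H}]
y = u[H:]                                          # target: the next input value
w_out = np.linalg.solve(X.T @ X + 1e-2 * np.eye(X.shape[1]), X.T @ y)  # ridge readout
rmse = float(np.sqrt(np.mean((X @ w_out - y) ** 2)))
print("readout RMSE with a history of", H, "states:", round(rmse, 4))
```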

Verifiable Feature Attributions: A Bridge between Post Hoc Explainability and Inherent Interpretability

  • paper_url: http://arxiv.org/abs/2307.15007
  • repo_url: None
  • paper_authors: Usha Bhalla, Suraj Srinivas, Himabindu Lakkaraju
  • for: 该论文的目的是解释机器学习模型的行为,并提供可靠的、可验证的特征归因方法。
  • methods: 该论文提出了一种名为Verifiability Tuning(VerT)的方法,可以将黑盒模型转化成具有自然的、可靠和可验证的特征归因方法。
  • results: 该论文通过对半 sintetic和实际数据集进行了广泛的实验,并证明了VerT可以生成的模型和特征归因方法是正确、可靠和 faithful于原始黑盒模型。
    Abstract With the increased deployment of machine learning models in various real-world applications, researchers and practitioners alike have emphasized the need for explanations of model behaviour. To this end, two broad strategies have been outlined in prior literature to explain models. Post hoc explanation methods explain the behaviour of complex black-box models by highlighting features that are critical to model predictions; however, prior work has shown that these explanations may not be faithful, and even more concerning is our inability to verify them. Specifically, it is nontrivial to evaluate if a given attribution is correct with respect to the underlying model. Inherently interpretable models, on the other hand, circumvent these issues by explicitly encoding explanations into model architecture, meaning their explanations are naturally faithful and verifiable, but they often exhibit poor predictive performance due to their limited expressive power. In this work, we aim to bridge the gap between the aforementioned strategies by proposing Verifiability Tuning (VerT), a method that transforms black-box models into models that naturally yield faithful and verifiable feature attributions. We begin by introducing a formal theoretical framework to understand verifiability and show that attributions produced by standard models cannot be verified. We then leverage this framework to propose a method to build verifiable models and feature attributions out of fully trained black-box models. Finally, we perform extensive experiments on semi-synthetic and real-world datasets, and show that VerT produces models that (1) yield explanations that are correct and verifiable and (2) are faithful to the original black-box models they are meant to explain.
    摘要 随着机器学习模型在各种实际应用中的普及,研究人员和实践者们都强调了模型行为的解释的重要性。为此,先前的文献中提出了两种广泛的解释策略:一是后期解释方法,通过强调模型预测中的关键特征来解释模型行为,但是先前的研究表明这些解释可能不准确,而且无法验证它们的正确性。另一方面,内置解释的模型可以自动编码解释到模型结构中,因此其解释是自然的、可靠的和验证的,但是它们通常具有较弱的预测性能。在这篇文章中,我们希望通过提出验证化调整(VerT)来bridge这两种策略之间的差距,以生成可靠的、可验证的特征归属。我们首先引入了一个正式的理论框架,以理解验证性的概念,并证明标准模型生成的归属无法验证。然后,我们利用这个框架,提出了一种方法,可以将黑盒模型转化为可靠、可验证的模型和特征归属。最后,我们对半synthetic和实际数据集进行了广泛的实验,并证明VerT可以生成符合预期的、可靠的和可验证的模型和特征归属。

Improved Neural Radiance Fields Using Pseudo-depth and Fusion

  • paper_url: http://arxiv.org/abs/2308.03772
  • repo_url: None
  • paper_authors: Jingliang Li, Qiang Zhou, Chaohui Yu, Zhengda Lu, Jun Xiao, Zhibin Wang, Fan Wang
  • for: 本研究主要探讨了如何使用多尺度编码体Volume和多尺度几何信息来提高NeRF模型的视野 sintesis和精密几何建模性能。
  • methods: 本研究提出了一种基于多尺度编码体Volume和多尺度几何信息的NeRF模型,并提出了一种同时进行深度预测和场景场景 reconstruction的方法,以及一种基于深度指导的点云特征融合方法来增强点云特征。
  • results: 实验结果显示,提出的方法可以在不需要Scene-specific优化的情况下实现高质量的视野 sintesis和精密几何建模,并且在点云特征融合方面提高了性能。
    Abstract Since the advent of Neural Radiance Fields, novel view synthesis has received tremendous attention. The existing approach for the generalization of radiance field reconstruction primarily constructs an encoding volume from nearby source images as additional inputs. However, these approaches cannot efficiently encode the geometric information of real scenes with various scale objects/structures. In this work, we propose constructing multi-scale encoding volumes and providing multi-scale geometry information to NeRF models. To make the constructed volumes as close as possible to the surfaces of objects in the scene and the rendered depth more accurate, we propose to perform depth prediction and radiance field reconstruction simultaneously. The predicted depth map will be used to supervise the rendered depth, narrow the depth range, and guide points sampling. Finally, the geometric information contained in point volume features may be inaccurate due to occlusion, lighting, etc. To this end, we propose enhancing the point volume feature from depth-guided neighbor feature fusion. Experiments demonstrate the superior performance of our method in both novel view synthesis and dense geometry modeling without per-scene optimization.
    摘要 自 Neural Radiance Fields 出现以来,新视图合成受到了广泛关注。现有的可泛化辐射场重建方法主要从邻近源图像构建编码体作为额外输入。然而,这些方法无法高效地编码真实场景中不同尺度物体/结构的几何信息。在这项工作中,我们提出构建多尺度编码体,并向 NeRF 模型提供多尺度几何信息。为了使构建的编码体尽可能贴近场景中物体的表面、并使渲染深度更加准确,我们提出同时进行深度预测与辐射场重建:预测得到的深度图用于监督渲染深度、收窄深度范围并引导点采样。最后,由于遮挡、光照等因素,点体特征中蕴含的几何信息可能并不准确;为此,我们提出通过深度引导的邻域特征融合来增强点体特征。实验表明,我们的方法在新视图合成和稠密几何建模上均表现优越,且无需针对单个场景进行优化。

Detection of Children Abuse by Voice and Audio Classification by Short-Time Fourier Transform Machine Learning implemented on Nvidia Edge GPU device

  • paper_url: http://arxiv.org/abs/2307.15101
  • repo_url: None
  • paper_authors: Jiuqi Yan, Yingxian Chen, W. W. T. Fok
  • for: 增强儿童安全性,预测儿童被虐待情况。
  • methods: 机器学习技术应用于儿童声音识别和检测儿童被虐待情况。
  • results: 实验结果显示,使用机器学习模型可以准确地识别儿童的声音,并且可以在儿童被虐待情况下发出警示。模型的准确率达到了约92%。
    Abstract The safety of children in children home has become an increasing social concern, and the purpose of this experiment is to use machine learning applied to detect the scenarios of child abuse to increase the safety of children. This experiment uses machine learning to classify and recognize a child's voice and predict whether the current sound made by the child is crying, screaming or laughing. If a child is found to be crying or screaming, an alert is immediately sent to the relevant personnel so that they can perceive what the child may be experiencing in a surveillance blind spot and respond in a timely manner. Together with a hybrid use of video image classification, the accuracy of child abuse detection can be significantly increased. This greatly reduces the likelihood that a child will receive violent abuse in the nursery and allows personnel to stop an imminent or incipient child abuse incident in time. The datasets collected from this experiment is entirely from sounds recorded on site at the children home, including crying, laughing, screaming sound and background noises. These sound files are transformed into spectrograms using Short-Time Fourier Transform, and then these image data are imported into a CNN neural network for classification, and the final trained model can achieve an accuracy of about 92% for sound detection.
    摘要 儿童之家中的儿童安全已日益成为社会关注的问题,本实验的目的是运用机器学习来检测儿童受虐情形,从而提升儿童的安全性。实验使用机器学习对儿童的声音进行分类与识别,预测儿童当前发出的声音是哭泣、尖叫还是笑声。一旦发现儿童在哭泣或尖叫,系统会立即向相关人员发送警报,使他们能够察觉监控盲区中儿童可能正在经历的状况并及时作出响应。与视频图像分类混合使用,可以显著提高儿童受虐检测的准确率。这将大幅降低儿童在托育机构中遭受暴力虐待的可能性,并使工作人员能够及时制止正在发生或即将发生的虐待事件。本实验收集的数据全部来自儿童之家现场录制的声音,包括哭声、笑声、尖叫声和背景噪音。这些声音文件通过短时傅里叶变换(Short-Time Fourier Transform)转换为声谱图,再将这些图像数据输入 CNN 神经网络进行分类,最终训练出的模型在声音检测上可达到约 92% 的准确率。
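A compact sketch of the described pipeline: an audio clip is converted to a log-magnitude STFT spectrogram (here with SciPy) and passed to a small CNN with three output classes. The window length, network architecture, and alerting logic are illustrative stand-ins, not the deployed edge model.

```python
import numpy as np
import torch
import torch.nn as nn
from scipy.signal import stft

def audio_to_spectrogram(waveform, sr=16_000, nperseg=512):
    """Log-magnitude STFT image of a mono audio clip, shaped (1, freq, time)."""
    _, _, Z = stft(waveform, fs=sr, nperseg=nperseg)
    spec = np.log1p(np.abs(Z)).astype(np.float32)
    return torch.from_numpy(spec).unsqueeze(0)

classes = ["crying", "screaming", "laughing"]
cnn = nn.Sequential(                      # illustrative classifier, not the paper's
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, len(classes)),
)

# Toy inference on one second of (random) audio; a real deployment would raise
# an alert when 'crying' or 'screaming' is predicted with high confidence.
clip = np.random.randn(16_000).astype(np.float32)
spec = audio_to_spectrogram(clip).unsqueeze(0)       # add a batch dimension
probs = torch.softmax(cnn(spec), dim=1)
print(dict(zip(classes, probs.detach().numpy().ravel().round(3))))
```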

Thinker: Learning to Plan and Act

  • paper_url: http://arxiv.org/abs/2307.14993
  • repo_url: https://github.com/anonymous-scrl/thinker
  • paper_authors: Stephen Chung, Ivan Anokhin, David Krueger
  • for: 本研究旨在开掘了一种名为 Thinker 算法,该算法可以让学习型决策机器人自主地与学习的世界模型交互,从而实现了自动化的规划。
  • methods: Thinker 算法将环境包装在一个世界模型中,并引入了新的模型交互动作,允许机器人通过提议不同的计划来让世界模型进行规划,从而消除了手动设计的规划算法的需求。
  • results: experimental 结果表明, Thinker 算法在杯球游戏和 Atari 2600 测试中实现了状态之最好的性能和竞争性能。视觉化显示了机器人培育了 Thinker 算法后,它们已经学习了如何有效地规划使用世界模型选择更好的动作。
    Abstract We propose the Thinker algorithm, a novel approach that enables reinforcement learning agents to autonomously interact with and utilize a learned world model. The Thinker algorithm wraps the environment with a world model and introduces new actions designed for interacting with the world model. These model-interaction actions enable agents to perform planning by proposing alternative plans to the world model before selecting a final action to execute in the environment. This approach eliminates the need for hand-crafted planning algorithms by enabling the agent to learn how to plan autonomously and allows for easy interpretation of the agent's plan with visualization. We demonstrate the algorithm's effectiveness through experimental results in the game of Sokoban and the Atari 2600 benchmark, where the Thinker algorithm achieves state-of-the-art performance and competitive results, respectively. Visualizations of agents trained with the Thinker algorithm demonstrate that they have learned to plan effectively with the world model to select better actions. The algorithm's generality opens a new research direction on how a world model can be used in reinforcement learning and how planning can be seamlessly integrated into an agent's decision-making process.
    摘要 我们提出了 Thinker 算法,这是一种让强化学习智能体能够自主地与所学世界模型交互并加以利用的新方法。Thinker 算法用世界模型将环境包装起来,并引入了专为与世界模型交互而设计的新动作:智能体在环境中执行最终动作之前,可以先向世界模型提出多个备选计划,从而实现规划。这种方法消除了对手工设计规划算法的需求,使智能体能够自主学会规划,并且其计划可以方便地通过可视化加以解释。在推箱子(Sokoban)游戏和 Atari 2600 基准上的实验结果表明,Thinker 算法分别取得了最先进的性能和具有竞争力的结果。对经 Thinker 算法训练的智能体进行可视化显示,它们已学会利用世界模型进行有效规划,以选择更好的动作。该算法的通用性开启了一个新的研究方向:如何在强化学习中利用世界模型,以及如何将规划无缝融入智能体的决策过程。

Incrementally-Computable Neural Networks: Efficient Inference for Dynamic Inputs

  • paper_url: http://arxiv.org/abs/2307.14988
  • repo_url: None
  • paper_authors: Or Sharir, Anima Anandkumar
  • for: 这篇论文的目的是提出一种增量计算方法,以提高深度学习模型在处理动态输入时的效率。
  • methods: 这篇论文使用了 вектор量化来精简中继值,从而使得模型可以更好地重复计算。
  • results: 实验结果显示,使用增量计算方法可以与传统的批量计算方法相比,在运算序列中的字串编译时间比例降低了12.1倍(中位数)。
    Abstract Deep learning often faces the challenge of efficiently processing dynamic inputs, such as sensor data or user inputs. For example, an AI writing assistant is required to update its suggestions in real time as a document is edited. Re-running the model each time is expensive, even with compression techniques like knowledge distillation, pruning, or quantization. Instead, we take an incremental computing approach, looking to reuse calculations as the inputs change. However, the dense connectivity of conventional architectures poses a major obstacle to incremental computation, as even minor input changes cascade through the network and restrict information reuse. To address this, we use vector quantization to discretize intermediate values in the network, which filters out noisy and unnecessary modifications to hidden neurons, facilitating the reuse of their values. We apply this approach to the transformers architecture, creating an efficient incremental inference algorithm with complexity proportional to the fraction of the modified inputs. Our experiments with adapting the OPT-125M pre-trained language model demonstrate comparable accuracy on document classification while requiring 12.1X (median) fewer operations for processing sequences of atomic edits.

Take-A-Photo: 3D-to-2D Generative Pre-training of Point Cloud Models

  • paper_url: http://arxiv.org/abs/2307.14971
  • repo_url: https://github.com/wangzy22/tap
  • paper_authors: Ziyi Wang, Xumin Yu, Yongming Rao, Jie Zhou, Jiwen Lu
  • for: Improve the performance of 3D vision models for point clouds through generative pre-training.
  • methods: A cross-attention mechanism generates view images from different instructed poses, and these generated views serve as the pre-training supervision for the 3D backbone.
  • results: Achieves state-of-the-art performance when fine-tuning on ScanObjectNN classification and ShapeNetPart segmentation, surpassing previous pre-training methods.
    Abstract With the overwhelming trend of mask image modeling led by MAE, generative pre-training has shown a remarkable potential to boost the performance of fundamental models in 2D vision. However, in 3D vision, the over-reliance on Transformer-based backbones and the unordered nature of point clouds have restricted the further development of generative pre-training. In this paper, we propose a novel 3D-to-2D generative pre-training method that is adaptable to any point cloud model. We propose to generate view images from different instructed poses via the cross-attention mechanism as the pre-training scheme. Generating view images has more precise supervision than its point cloud counterpart, thus assisting 3D backbones to have a finer comprehension of the geometrical structure and stereoscopic relations of the point cloud. Experimental results have proved the superiority of our proposed 3D-to-2D generative pre-training over previous pre-training methods. Our method is also effective in boosting the performance of architecture-oriented approaches, achieving state-of-the-art performance when fine-tuning on ScanObjectNN classification and ShapeNetPart segmentation tasks. Code is available at https://github.com/wangzy22/TAP.
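The methods bullet above mentions generating view images from instructed poses via cross-attention. A hypothetical PyTorch sketch of such a pose-conditioned decoder follows; the layer sizes, the flattened 3x4 camera-matrix pose encoding, and the patch-wise pixel head are assumptions chosen for readability, not the authors' implementation (the official code is at the repo_url above).

```python
import torch
import torch.nn as nn

class PoseConditionedViewDecoder(nn.Module):
    """Illustrative sketch: generate a view image from point-cloud features via
    cross-attention, conditioned on an instructed camera pose."""

    def __init__(self, feat_dim=256, num_heads=8, img_size=32, patch=4):
        super().__init__()
        self.num_queries = (img_size // patch) ** 2              # one query per image patch
        self.query_embed = nn.Embedding(self.num_queries, feat_dim)
        self.pose_proj = nn.Linear(12, feat_dim)                 # pose as a flattened 3x4 matrix (assumed)
        self.cross_attn = nn.MultiheadAttention(feat_dim, num_heads, batch_first=True)
        self.to_pixels = nn.Linear(feat_dim, patch * patch * 3)  # RGB values per patch

    def forward(self, point_feats, pose):
        # point_feats: (B, N, feat_dim) features from any 3D backbone
        # pose:        (B, 12) instructed camera pose
        B = point_feats.size(0)
        queries = self.query_embed.weight.unsqueeze(0).expand(B, -1, -1)
        queries = queries + self.pose_proj(pose).unsqueeze(1)    # inject the pose into every query
        attended, _ = self.cross_attn(queries, point_feats, point_feats)
        return self.to_pixels(attended)                          # (B, num_queries, patch*patch*3)
```

Pre-training would then compare the predicted patches against the image rendered from that pose (for example with a pixel-wise reconstruction loss), giving the 3D backbone the denser supervision the abstract describes.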

Learning locally dominant force balances in active particle systems

  • paper_url: http://arxiv.org/abs/2307.14970
  • repo_url: None
  • paper_authors: Dominik Sturm, Suryanarayana Maddu, Ivo F. Sbalzarini
  • for: Explain macroscopic pattern formation in self-organized active particle systems.
  • methods: A combination of unsupervised clustering and sparsity-promoting inference algorithms learns the locally dominant force balances in active particle systems.
  • results: Propagating density bands are formed by local alignment interactions driven by density gradients, while steady-state asters arise from splay-induced negative compressibility caused by strong particle interactions; the method also uncovers analogous mechanisms when particle speed depends on local density, in agreement with analytical scaling arguments and experimental observations.
    Abstract We use a combination of unsupervised clustering and sparsity-promoting inference algorithms to learn locally dominant force balances that explain macroscopic pattern formation in self-organized active particle systems. The self-organized emergence of macroscopic patterns from microscopic interactions between self-propelled particles can be widely observed in nature. Although hydrodynamic theories help us better understand the physical basis of this phenomenon, identifying a sufficient set of local interactions that shape, regulate, and sustain self-organized structures in active particle systems remains challenging. We investigate a classic hydrodynamic model of self-propelled particles that produces a wide variety of patterns, like asters and moving density bands. Our data-driven analysis shows that propagating bands are formed by local alignment interactions driven by density gradients, while steady-state asters are shaped by a mechanism of splay-induced negative compressibility arising from strong particle interactions. Our method also reveals analogous physical principles of pattern formation in a system where the speed of the particle is influenced by local density. This demonstrates the ability of our method to reveal physical commonalities across models. The physical mechanisms inferred from the data are in excellent agreement with analytical scaling arguments and experimental observations.
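The methods bullet describes combining unsupervised clustering with sparsity-promoting inference; a compact sketch of that general recipe (not necessarily the authors' exact pipeline) is given below, using a Gaussian mixture to cluster space-time points by their hydrodynamic equation terms and sparse PCA to pick out the dominant terms in each cluster. The function name, the use of scikit-learn, and the threshold are assumptions for illustration.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.decomposition import SparsePCA

def dominant_balances(terms, n_clusters=4, threshold=0.1):
    """terms: (n_points, n_terms) array holding each term of the governing
    equation evaluated at every space-time point.

    1. Cluster points by the relative magnitude of the terms (GMM).
    2. Within each cluster, keep only the terms with large sparse loadings,
       which identifies the locally dominant force balance there.
    """
    gmm = GaussianMixture(n_components=n_clusters, random_state=0).fit(terms)
    labels = gmm.predict(terms)

    balances = {}
    for k in range(n_clusters):
        cluster = terms[labels == k]
        if len(cluster) == 0:
            continue
        spca = SparsePCA(n_components=1, alpha=1.0, random_state=0).fit(cluster)
        loadings = np.abs(spca.components_[0])
        active = np.where(loadings > threshold * loadings.max())[0]
        balances[k] = active  # indices of the terms forming the dominant balance
    return labels, balances
```

Applied to the model in the paper, this kind of analysis is what lets one read off, region by region, which interactions (alignment, pressure, splay) dominate the observed bands and asters.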