cs.LG - 2023-08-05

Edge of stability echo state networks

  • paper_url: http://arxiv.org/abs/2308.02902
  • repo_url: None
  • paper_authors: Andrea Ceni, Claudio Gallicchio
  • for: This paper proposes a new Reservoir Computing (RC) architecture called the Edge of Stability Echo State Network (ES$^2$N).
  • methods: The ES$^2$N model defines the reservoir layer as a convex combination of a nonlinear reservoir (as in the standard ESN) and a linear reservoir that implements an orthogonal transformation. A mathematical analysis of the model proves that the whole eigenspectrum of its Jacobian can be contained in an annular neighbourhood of a complex circle of controllable radius, a property exploited to ensure that the forward dynamics of ES$^2$N evolve close to the edge-of-chaos regime by design.
  • results: Experiments show that the introduced reservoir model can reach the theoretical maximum short-term memory capacity and, compared to the standard ESN, offers a better trade-off between memory and nonlinearity as well as a significant improvement in autoregressive nonlinear modeling.
    Abstract In this paper, we propose a new Reservoir Computing (RC) architecture, called the Edge of Stability Echo State Network (ES$^2$N). The introduced ES$^2$N model is based on defining the reservoir layer as a convex combination of a nonlinear reservoir (as in the standard ESN), and a linear reservoir that implements an orthogonal transformation. We provide a thorough mathematical analysis of the introduced model, proving that the whole eigenspectrum of the Jacobian of the ES2N map can be contained in an annular neighbourhood of a complex circle of controllable radius, and exploit this property to demonstrate that the ES$^2$N's forward dynamics evolves close to the edge-of-chaos regime by design. Remarkably, our experimental analysis shows that the newly introduced reservoir model is able to reach the theoretical maximum short-term memory capacity. At the same time, in comparison to standard ESN, ES$^2$N is shown to offer a favorable trade-off between memory and nonlinearity, as well as a significant improvement of performance in autoregressive nonlinear modeling.
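The update rule the abstract describes — a convex combination of a standard nonlinear reservoir and an orthogonal linear map — can be sketched in a few lines of numpy. This is only an illustration of the idea, not the paper's implementation: the mixing weight `beta`, the `tanh` nonlinearity, and the weight scalings are assumed placeholder choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, beta = 50, 0.1  # reservoir size and convex-combination weight (assumed values)

W = rng.normal(scale=1.0 / np.sqrt(n), size=(n, n))  # nonlinear-reservoir weights
W_in = rng.normal(size=(n, 1))                       # input weights
Q, _ = np.linalg.qr(rng.normal(size=(n, n)))         # orthogonal linear reservoir

def es2n_step(x, u):
    """One ES^2N-style update: convex mix of a tanh reservoir and an orthogonal map."""
    return beta * np.tanh(W @ x + W_in @ u) + (1.0 - beta) * (Q @ x)

x = np.zeros((n, 1))
for t in range(100):  # drive the reservoir with a scalar sine input
    x = es2n_step(x, np.array([[np.sin(0.1 * t)]]))
```

Because the orthogonal part preserves norms and `beta` is small, the state stays bounded while the linear part keeps the dynamics near the unit circle, which is the intuition behind the edge-of-stability behaviour.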

Textual Data Mining for Financial Fraud Detection: A Deep Learning Approach

  • paper_url: http://arxiv.org/abs/2308.03800
  • repo_url: None
  • paper_authors: Qiuru Li
  • for: This study applies deep learning to a natural language processing (NLP) binary classification task on financial-fraud texts in order to detect fraudulent behavior by listed companies.
  • methods: Several neural network models are used for the text classification task, including Multilayer Perceptrons, vanilla Recurrent Neural Networks (RNN), Long Short-Term Memory networks (LSTM), and Gated Recurrent Units (GRU).
  • results: The results show that these models can detect financial fraud accurately and provide valuable insights for regulators, practitioners, and researchers pursuing more robust and effective fraud detection strategies.
    Abstract In this report, I present a deep learning approach to conduct a natural language processing (hereafter NLP) binary classification task for analyzing financial-fraud texts. First, I searched for regulatory announcements and enforcement bulletins from HKEX news to define fraudulent companies and to extract their MD&A reports before I organized the sentences from the reports with labels and reporting time. My methodology involved different kinds of neural network models, including Multilayer Perceptrons with Embedding layers, vanilla Recurrent Neural Network (RNN), Long-Short Term Memory (LSTM), and Gated Recurrent Unit (GRU) for the text classification task. By utilizing this diverse set of models, I aim to perform a comprehensive comparison of their accuracy in detecting financial fraud. My results bring significant implications for financial fraud detection as this work contributes to the growing body of research at the intersection of deep learning, NLP, and finance, providing valuable insights for industry practitioners, regulators, and researchers in the pursuit of more robust and effective fraud detection methodologies.
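Of the recurrent models listed in the methodology, the GRU is the most compact; its standard cell can be written in plain numpy as below. This is a generic textbook GRU cell, not the paper's configuration — the dimensions and random weights are placeholders.

```python
import numpy as np

rng = np.random.default_rng(1)
d_in, d_h = 8, 16  # embedding and hidden sizes (assumed toy values)

# GRU parameters: update gate z, reset gate r, candidate state h~.
Wz, Uz = rng.normal(size=(d_h, d_in)), rng.normal(size=(d_h, d_h))
Wr, Ur = rng.normal(size=(d_h, d_in)), rng.normal(size=(d_h, d_h))
Wh, Uh = rng.normal(size=(d_h, d_in)), rng.normal(size=(d_h, d_h))

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(h, x):
    """Standard GRU cell: gates interpolate between old and candidate state."""
    z = sigmoid(Wz @ x + Uz @ h)            # update gate
    r = sigmoid(Wr @ x + Ur @ h)            # reset gate
    h_cand = np.tanh(Wh @ x + Uh @ (r * h))
    return (1.0 - z) * h + z * h_cand

h = np.zeros(d_h)
for x in rng.normal(size=(20, d_in)):       # a sequence of 20 token embeddings
    h = gru_step(h, x)                      # final h would feed a classifier head
```

In a fraud classifier, the final hidden state `h` would be passed through a dense layer with a sigmoid output for the binary label.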

Elucidate Gender Fairness in Singing Voice Transcription

  • paper_url: http://arxiv.org/abs/2308.02898
  • repo_url: https://github.com/guxm2021/svt_speechbrain
  • paper_authors: Xiangming Gu, Wei Zeng, Ye Wang
  • for: This study investigates gender-based differences in singing voice transcription (SVT) and whether these differences lead to gender unfairness in SVT systems.
  • methods: The authors demonstrate the female superiority of SVT systems across different models and datasets, and propose using an attribute predictor to predict gender labels while adversarially training the SVT system to enforce gender-invariance of acoustic representations.
  • results: Experiments show that the method significantly reduces gender bias (by up to more than 50%) with negligible degradation of overall SVT performance.
    Abstract It is widely known that males and females typically possess different sound characteristics when singing, such as timbre and pitch, but it has never been explored whether these gender-based characteristics lead to a performance disparity in singing voice transcription (SVT), whose target includes pitch. Such a disparity could cause fairness issues and severely affect the user experience of downstream SVT applications. Motivated by this, we first demonstrate the female superiority of SVT systems, which is observed across different models and datasets. We find that different pitch distributions, rather than gender data imbalance, contribute to this disparity. To address this issue, we propose using an attribute predictor to predict gender labels and adversarially training the SVT system to enforce the gender-invariance of acoustic representations. Leveraging the prior knowledge that pitch distributions may contribute to the gender bias, we propose conditionally aligning acoustic representations between demographic groups by feeding note events to the attribute predictor. Empirical experiments on multiple benchmark SVT datasets show that our method significantly reduces gender bias (up to more than 50%) with negligible degradation of overall SVT performance, on both in-domain and out-of-domain singing data, thus offering a better fairness-utility trade-off.
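The adversarial idea — train an attribute predictor on the representations, then update the representations against it so the attribute becomes unreadable — can be illustrated with a toy numpy experiment. This is a stand-in sketch, not the paper's system: the logistic-regression predictor, the synthetic data, and the gradient-ascent step sizes are all invented, and the ascent on the predictor's loss plays the role of gradient reversal.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 200, 5
g = rng.integers(0, 2, size=n)                       # binary attribute labels (toy)
Z = rng.normal(size=(n, d)) + (2 * g - 1)[:, None]   # representations leaking the attribute

def sigmoid(a):
    return 0.5 * (1.0 + np.tanh(0.5 * a))  # numerically stable sigmoid

# Train a logistic-regression attribute predictor on the representations.
w = np.zeros(d)
for _ in range(50):
    p = sigmoid(Z @ w)
    w -= 0.05 / n * Z.T @ (p - g)
acc_before = np.mean((sigmoid(Z @ w) > 0.5) == (g == 1))

# Adversarial step: ascend the predictor's loss w.r.t. the representations so
# the attribute can no longer be read off (a stand-in for gradient reversal).
for _ in range(300):
    p = sigmoid(Z @ w)
    Z += 0.5 * (p - g)[:, None] * w[None, :]
acc_after = np.mean((sigmoid(Z @ w) > 0.5) == (g == 1))
```

After the adversarial updates, the fixed predictor's accuracy collapses, mirroring how adversarial training pushes acoustic representations toward gender-invariance.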

Physics-informed Gaussian process model for Euler-Bernoulli beam elements

  • paper_url: http://arxiv.org/abs/2308.02894
  • repo_url: None
  • paper_authors: Gledson Rodrigo Tondo, Sebastian Rau, Igor Kavrakov, Guido Morgenthal
  • for: This work develops a physics-informed machine learning model for assessing structural bending stiffness, for use in structural health monitoring and diagnostics.
  • methods: The model is a multi-output Gaussian process formulated from the Euler-Bernoulli beam equation, with model parameters learned in a data-driven manner.
  • results: The regressed bending stiffness is evaluated on a numerically simulated cantilever beam, the influence of measurement noise on prediction quality is investigated, and the Mahalanobis distance is used to reason about the possible location and extent of damage in the structural system.
    Abstract A physics-informed machine learning model, in the form of a multi-output Gaussian process, is formulated using the Euler-Bernoulli beam equation. Given appropriate datasets, the model can be used to regress the analytical value of the structure's bending stiffness, interpolate responses, and make probabilistic inferences on latent physical quantities. The developed model is applied on a numerically simulated cantilever beam, where the regressed bending stiffness is evaluated and the influence measurement noise on the prediction quality is investigated. Further, the regressed probabilistic stiffness distribution is used in a structural health monitoring context, where the Mahalanobis distance is employed to reason about the possible location and extent of damage in the structural system. To validate the developed framework, an experiment is conducted and measured heterogeneous datasets are used to update the assumed analytical structural model.
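The damage-reasoning step rests on the Mahalanobis distance $d(x) = \sqrt{(x-\mu)^\top \Sigma^{-1} (x-\mu)}$ from a healthy-state reference distribution. A minimal sketch, with invented numbers standing in for the paper's regressed stiffness distribution:

```python
import numpy as np

rng = np.random.default_rng(3)

# Healthy-state reference: samples of a 3-dimensional feature (e.g. stiffness
# regressed at three beam locations; purely illustrative values).
healthy = rng.normal(loc=[1.0, 1.0, 1.0], scale=0.05, size=(500, 3))
mu = healthy.mean(axis=0)
S_inv = np.linalg.inv(np.cov(healthy, rowvar=False))

def mahalanobis(x):
    """Distance of observation x from the healthy-state distribution."""
    d = x - mu
    return float(np.sqrt(d @ S_inv @ d))

d_ok = mahalanobis(np.array([1.0, 1.01, 0.99]))      # near the healthy mean
d_damaged = mahalanobis(np.array([1.0, 0.7, 1.0]))   # stiffness drop at location 2
```

A large distance at a particular location flags both where and roughly how strongly the structure deviates from its healthy state.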

Secure Deep-JSCC Against Multiple Eavesdroppers

  • paper_url: http://arxiv.org/abs/2308.02892
  • repo_url: None
  • paper_authors: Seyyed Amirhossein Ameli Kalkhoran, Mehdi Letafati, Ecenaz Erdemir, Babak Hossein Khalaj, Hamid Behroozi, Deniz Gündüz
  • for: This paper proposes a deep learning-based secure communication scheme that protects private information in transmitted images from eavesdroppers.
  • methods: An end-to-end learning-based joint source-channel coding approach is used for secure communication against multiple eavesdroppers over complex-valued fading channels, covering both colluding and non-colluding eavesdropper scenarios.
  • results: The scheme achieves a favorable secrecy-utility trade-off, decreasing the adversarial accuracy of eavesdroppers by 28%. It is validated on the CIFAR-10 dataset and across Rayleigh fading, Nakagami-m, and AWGN channels.
    Abstract In this paper, a generalization of deep learning-aided joint source channel coding (Deep-JSCC) approach to secure communications is studied. We propose an end-to-end (E2E) learning-based approach for secure communication against multiple eavesdroppers over complex-valued fading channels. Both scenarios of colluding and non-colluding eavesdroppers are studied. For the colluding strategy, eavesdroppers share their logits to collaboratively infer private attributes based on ensemble learning method, while for the non-colluding setup they act alone. The goal is to prevent eavesdroppers from inferring private (sensitive) information about the transmitted images, while delivering the images to a legitimate receiver with minimum distortion. By generalizing the ideas of privacy funnel and wiretap channel coding, the trade-off between the image recovery at the legitimate node and the information leakage to the eavesdroppers is characterized. To solve this secrecy funnel framework, we implement deep neural networks (DNNs) to realize a data-driven secure communication scheme, without relying on a specific data distribution. Simulations over CIFAR-10 dataset verifies the secrecy-utility trade-off. Adversarial accuracy of eavesdroppers are also studied over Rayleigh fading, Nakagami-m, and AWGN channels to verify the generalization of the proposed scheme. Our experiments show that employing the proposed secure neural encoding can decrease the adversarial accuracy by 28%.
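The difference between the two threat models can be sketched in a few lines: non-colluding eavesdroppers each infer from their own logits, while colluding ones share logits and ensemble them. The abstract only says logits are shared for ensemble inference; the simple averaging rule below is an assumption, and the random logits are placeholders.

```python
import numpy as np

rng = np.random.default_rng(4)
k, n_classes = 3, 2  # three eavesdroppers, binary private attribute (toy)

# Each eavesdropper produces its own logits for the private attribute.
logits = rng.normal(size=(k, n_classes))

def softmax(a):
    e = np.exp(a - a.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Non-colluding: each eavesdropper infers alone from its own logits.
solo_preds = softmax(logits).argmax(axis=-1)

# Colluding: shared logits are averaged (a simple ensemble) before inference.
joint_pred = softmax(logits.mean(axis=0)).argmax()
```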

Private Federated Learning with Dynamic Power Control via Non-Coherent Over-the-Air Computation

  • paper_url: http://arxiv.org/abs/2308.02881
  • repo_url: None
  • paper_authors: Anbang Zhang, Shuaishuai Guo, Shuai Liu
  • for: To improve model performance and privacy protection in Federated Learning (FL) through an over-the-air computation (AirComp) scheme with dynamic power control.
  • methods: Edge devices (EDs) transmit the signs of local stochastic gradients by activating two adjacent orthogonal frequency division multiplexing (OFDM) subcarriers, and the edge server (ES) obtains majority votes (MVs) by exploiting the energy accumulated on the subcarriers.
  • results: A dynamic power control algorithm is proposed to offset the biased aggregation of the MV values, and the whole scheme is shown to mitigate the impact of time synchronization errors, channel fading, and noise.
    Abstract To further preserve model weight privacy and improve model performance in Federated Learning (FL), FL via Over-the-Air Computation (AirComp) scheme based on dynamic power control is proposed. The edge devices (EDs) transmit the signs of local stochastic gradients by activating two adjacent orthogonal frequency division multi-plexing (OFDM) subcarriers, and majority votes (MVs) at the edge server (ES) are obtained by exploiting the energy accumulation on the subcarriers. Then, we propose a dynamic power control algorithm to further offset the biased aggregation of the MV aggregation values. We show that the whole scheme can mitigate the impact of the time synchronization error, channel fading and noise. The theoretical convergence proof of the scheme is re-derived.
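The sign-vote mechanism can be sketched in numpy: each device maps the sign of each gradient entry onto one of two subcarriers, and the server decides by comparing accumulated energies. This is an idealized noiseless sketch (no fading, no power control), so it only shows the majority-vote logic, not the paper's channel model.

```python
import numpy as np

rng = np.random.default_rng(5)
n_devices, n_params = 7, 4
grads = rng.normal(size=(n_devices, n_params))  # local stochastic gradients (toy)
signs = np.sign(grads)

# Each device activates one of two adjacent OFDM subcarriers per parameter:
# subcarrier 0 encodes sign -1, subcarrier 1 encodes sign +1. The server only
# observes the accumulated energy per subcarrier (non-coherent reception).
energy = np.zeros((2, n_params))
for s in signs:
    energy[0] += (s < 0)  # energy on the "minus" subcarrier
    energy[1] += (s > 0)  # energy on the "plus" subcarrier

# Majority vote: the subcarrier with more accumulated energy wins.
majority_vote = np.where(energy[1] >= energy[0], 1.0, -1.0)
```

The server then updates the global model with `majority_vote` in place of a full gradient average, which is what makes the scheme robust to non-coherent reception.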

Meta-learning in healthcare: A survey

  • paper_url: http://arxiv.org/abs/2308.02877
  • repo_url: None
  • paper_authors: Alireza Rafiei, Ronald Moore, Sina Jahromi, Farshid Hajati, Rishikesan Kamaleswaran
  • for: This survey examines the applications of meta-learning in healthcare, providing insight into how and where it can address critical healthcare challenges.
  • methods: The surveyed meta-learning approaches are divided into two main categories: multi/single-task learning and many/few-shot learning.
  • results: The survey finds that meta-learning can improve model capabilities and address healthcare challenges such as insufficient data and domain shifts, and it highlights remaining open challenges in meta-learning research together with potential solutions and future perspectives.
    Abstract As a subset of machine learning, meta-learning, or learning to learn, aims at improving the model's capabilities by employing prior knowledge and experience. A meta-learning paradigm can appropriately tackle the conventional challenges of traditional learning approaches, such as insufficient number of samples, domain shifts, and generalization. These unique characteristics position meta-learning as a suitable choice for developing influential solutions in various healthcare contexts, where the available data is often insufficient, and the data collection methodologies are different. This survey discusses meta-learning broad applications in the healthcare domain to provide insight into how and where it can address critical healthcare challenges. We first describe the theoretical foundations and pivotal methods of meta-learning. We then divide the employed meta-learning approaches in the healthcare domain into two main categories of multi/single-task learning and many/few-shot learning and survey the studies. Finally, we highlight the current challenges in meta-learning research, discuss the potential solutions and provide future perspectives on meta-learning in healthcare.

Data-Based Design of Multi-Model Inferential Sensors

  • paper_url: http://arxiv.org/abs/2308.02872
  • repo_url: None
  • paper_authors: Martin Mojto, Karol Lubušký, Miroslav Fikar, Radoslav Paulen
  • for: This paper deals with the design of inferential (soft) sensors, aiming to improve predictive accuracy while maintaining a linear sensor structure.
  • methods: Two novel approaches for the design of multi-model inferential sensors are proposed, mitigating drawbacks of state-of-the-art approaches.
  • results: A multi-model inferential sensor is designed for a Vacuum Gasoil Hydrogenation unit, a real-world petrochemical refinery unit, and compared against various single-model inferential sensors and the refinery's current (referential) sensor, showing substantial improvements over state-of-the-art design techniques.
    Abstract This paper deals with the problem of inferential (soft) sensor design. The nonlinear character of industrial processes is usually the main limitation to designing simple linear inferential sensors with sufficient accuracy. In order to increase the inferential sensor predictive performance and yet to maintain its linear structure, multi-model inferential sensors represent a straightforward option. In this contribution, we propose two novel approaches for the design of multi-model inferential sensors aiming to mitigate some drawbacks of the state-of-the-art approaches. For a demonstration of the developed techniques, we design inferential sensors for a Vacuum Gasoil Hydrogenation unit, which is a real-world petrochemical refinery unit. The performance of the multi-model inferential sensor is compared against various single-model inferential sensors and the current (referential) inferential sensor used in the refinery. The results show substantial improvements over the state-of-the-art design techniques for single-/multi-model inferential sensors.
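The appeal of the multi-model approach — several linear models switched by operating region, instead of one global linear model fighting the process nonlinearity — can be seen in a toy example. The region boundary and the piecewise process below are invented for illustration; the paper's actual design methods for choosing models and regions are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(6)

# Toy nonlinear process: the output slope changes at the operating point u = 1.
u = rng.uniform(0.0, 2.0, size=300)
y = np.where(u < 1.0, 2.0 * u, 2.0 + 0.5 * (u - 1.0)) + rng.normal(scale=0.01, size=300)

def fit_linear(uu, yy):
    """Least-squares fit of y = a*u + b."""
    A = np.column_stack([uu, np.ones_like(uu)])
    coef, *_ = np.linalg.lstsq(A, yy, rcond=None)
    return coef

# Multi-model sensor: one linear model per operating region, switched at u = 1.
low = u < 1.0
m_low, m_high = fit_linear(u[low], y[low]), fit_linear(u[~low], y[~low])

def predict(uu):
    a = np.where(uu < 1.0, m_low[0], m_high[0])
    b = np.where(uu < 1.0, m_low[1], m_high[1])
    return a * uu + b

# A single global linear model for comparison.
m_one = fit_linear(u, y)
err_multi = np.mean((predict(u) - y) ** 2)
err_single = np.mean((m_one[0] * u + m_one[1] - y) ** 2)
```

Each sub-model stays linear (and hence easy to maintain and interpret), yet the switched ensemble tracks the nonlinear process far better than any single line.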

NP-SemiSeg: When Neural Processes meet Semi-Supervised Semantic Segmentation

  • paper_url: http://arxiv.org/abs/2308.02866
  • repo_url: https://github.com/jianf-wang/np-semiseg
  • paper_authors: Jianfeng Wang, Daniela Massiceti, Xiaolin Hu, Vladimir Pavlovic, Thomas Lukasiewicz
  • for: This paper addresses semi-supervised semantic segmentation, i.e., assigning pixel-wise labels to unlabeled images at training time.
  • methods: An uncertainty quantification method based on neural processes (NPs) is adapted to semi-supervised semantic segmentation, resulting in a new model called NP-SemiSeg.
  • results: Experiments on the public benchmarks PASCAL VOC 2012 and Cityscapes, under different training settings, verify the effectiveness of NP-SemiSeg.
    Abstract Semi-supervised semantic segmentation involves assigning pixel-wise labels to unlabeled images at training time. This is useful in a wide range of real-world applications where collecting pixel-wise labels is not feasible in time or cost. Current approaches to semi-supervised semantic segmentation work by predicting pseudo-labels for each pixel from a class-wise probability distribution output by a model. If the predicted probability distribution is incorrect, however, this leads to poor segmentation results, which can have knock-on consequences in safety critical systems, like medical images or self-driving cars. It is, therefore, important to understand what a model does not know, which is mainly achieved by uncertainty quantification. Recently, neural processes (NPs) have been explored in semi-supervised image classification, and they have been a computationally efficient and effective method for uncertainty quantification. In this work, we move one step forward by adapting NPs to semi-supervised semantic segmentation, resulting in a new model called NP-SemiSeg. We experimentally evaluated NP-SemiSeg on the public benchmarks PASCAL VOC 2012 and Cityscapes, with different training settings, and the results verify its effectiveness.
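The baseline mechanism the abstract describes — per-pixel pseudo-labels taken from a predicted class distribution — is easy to sketch, together with the confidence masking that uncertainty estimates (such as those an NP provides) make possible. The threshold value and toy logits below are assumptions, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(7)
n_pixels, n_classes = 6, 3
logits = rng.normal(size=(n_pixels, n_classes))  # model outputs for 6 pixels (toy)

def softmax(a):
    e = np.exp(a - a.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

probs = softmax(logits)
conf = probs.max(axis=-1)       # per-pixel confidence
pseudo = probs.argmax(axis=-1)  # per-pixel pseudo-label

# Keep only confident pixels; uncertain ones are excluded from the
# pseudo-label loss, which is where calibrated uncertainty pays off.
tau = 0.5
mask = conf > tau
```

If the predicted distribution is wrong but confident, the pseudo-label is wrong too — which is exactly the failure mode that motivates explicit uncertainty quantification.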

Generative Adversarial Networks for Stain Normalisation in Histopathology

  • paper_url: http://arxiv.org/abs/2308.02851
  • repo_url: None
  • paper_authors: Jack Breen, Kieran Zucker, Katie Allen, Nishant Ravikumar, Nicolas M. Orsi
  • for: This chapter surveys stain normalisation methods in digital pathology, which standardise the visual profile of images to make AI models more accurate and generalisable.
  • methods: Different stain normalisation techniques are reviewed, with a focus on approaches that use generative adversarial networks (GANs). GAN-based methods typically outperform non-generative approaches, but at the cost of much greater computational requirements.
  • results: No single method is best in general: different GAN and non-GAN approaches outperform each other in different scenarios and according to different performance metrics, making this an ongoing field of study.
    Abstract The rapid growth of digital pathology in recent years has provided an ideal opportunity for the development of artificial intelligence-based tools to improve the accuracy and efficiency of clinical diagnoses. One of the significant roadblocks to current research is the high level of visual variability across digital pathology images, causing models to generalise poorly to unseen data. Stain normalisation aims to standardise the visual profile of digital pathology images without changing the structural content of the images. In this chapter, we explore different techniques which have been used for stain normalisation in digital pathology, with a focus on approaches which utilise generative adversarial networks (GANs). Typically, GAN-based methods outperform non-generative approaches but at the cost of much greater computational requirements. However, it is not clear which method is best for stain normalisation in general, with different GAN and non-GAN approaches outperforming each other in different scenarios and according to different performance metrics. This is an ongoing field of study as researchers aim to identify a method which efficiently and effectively normalises pathology images to make AI models more robust and generalisable.
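As a point of reference for the non-generative approaches the chapter contrasts with GANs, the classic statistics-matching idea (Reinhard-style normalisation) shifts each colour channel of a source tile to the mean and standard deviation of a target tile. The sketch below applies it directly in RGB for brevity, whereas the original method works in the LAB colour space; the toy tiles are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(8)

# Toy "source" and "target" H&E tiles as float RGB arrays in [0, 1].
source = rng.uniform(0.2, 0.8, size=(32, 32, 3))
target = rng.uniform(0.4, 0.6, size=(32, 32, 3))

def match_channel_stats(img, ref):
    """Shift/scale each channel of img to the mean/std of ref (Reinhard-style,
    applied here in RGB rather than the usual LAB colour space)."""
    out = np.empty_like(img)
    for c in range(3):
        mu_i, sd_i = img[..., c].mean(), img[..., c].std()
        mu_r, sd_r = ref[..., c].mean(), ref[..., c].std()
        out[..., c] = (img[..., c] - mu_i) / sd_i * sd_r + mu_r
    return out

normalised = match_channel_stats(source, target)
```

Such methods are cheap and deterministic; GAN-based normalisers aim to beat them by learning structure-preserving colour mappings rather than matching global statistics.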

Approximating Positive Homogeneous Functions with Scale Invariant Neural Networks

  • paper_url: http://arxiv.org/abs/2308.02836
  • repo_url: None
  • paper_authors: Stefan Bamberger, Reinhard Heckel, Felix Krahmer
  • for: Investigate to what extent linear inverse problems can be solved with $ReLu$ networks.
  • methods: Analyze bias-free (positive homogeneous) $ReLu$ networks for recovering sparse vectors and low-rank matrices from linear measurements.
  • results: Show that $ReLu$ networks with one hidden layer cannot recover $1$-sparse vectors even approximately, whereas networks with two hidden layers can approximately recover sparse vectors with arbitrary precision and arbitrary sparsity level in a stable way; new results on approximating general positive homogeneous functions with neural networks are also established.
    Abstract We investigate to what extent it is possible to solve linear inverse problems with $ReLu$ networks. Due to the scaling invariance arising from the linearity, an optimal reconstruction function $f$ for such a problem is positive homogeneous, i.e., satisfies $f(\lambda x) = \lambda f(x)$ for all non-negative $\lambda$. In a $ReLu$ network, this condition translates to considering networks without bias terms. We first consider recovery of sparse vectors from few linear measurements. We prove that $ReLu$- networks with only one hidden layer cannot even recover $1$-sparse vectors, not even approximately, and regardless of the width of the network. However, with two hidden layers, approximate recovery with arbitrary precision and arbitrary sparsity level $s$ is possible in a stable way. We then extend our results to a wider class of recovery problems including low-rank matrix recovery and phase retrieval. Furthermore, we also consider the approximation of general positive homogeneous functions with neural networks. Extending previous work, we establish new results explaining under which conditions such functions can be approximated with neural networks. Our results also shed some light on the seeming contradiction between previous works showing that neural networks for inverse problems typically have very large Lipschitz constants, but still perform very well also for adversarial noise. Namely, the error bounds in our expressivity results include a combination of a small constant term and a term that is linear in the noise level, indicating that robustness issues may occur only for very small noise levels.
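The structural fact underpinning the analysis — a ReLU network without bias terms satisfies $f(\lambda x) = \lambda f(x)$ for all $\lambda \ge 0$ — is easy to verify numerically. The sketch below builds a bias-free two-hidden-layer ReLU network with arbitrary random weights and checks positive homogeneity; the layer sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(9)

# A bias-free two-hidden-layer ReLU network: f(x) = W3 relu(W2 relu(W1 x)).
W1 = rng.normal(size=(20, 5))
W2 = rng.normal(size=(20, 20))
W3 = rng.normal(size=(5, 20))

def f(x):
    return W3 @ np.maximum(W2 @ np.maximum(W1 @ x, 0.0), 0.0)

x = rng.normal(size=5)
lam = 3.7  # any non-negative scale factor
```

Positive homogeneity holds because $\mathrm{relu}(\lambda a) = \lambda\,\mathrm{relu}(a)$ for $\lambda \ge 0$ and matrix multiplication is linear; adding bias terms would break the identity, which is why the paper restricts to bias-free networks.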

Reinforcement Learning for Financial Index Tracking

  • paper_url: http://arxiv.org/abs/2308.02820
  • repo_url: https://github.com/dppalomar/sparseindextracking
  • paper_authors: Xianhua Peng, Chenyin Gong, Xue Dong He
  • for: This study proposes a dynamic-programming formulation of the financial index tracking problem that handles both return-based tracking error and value-based tracking error.
  • methods: The portfolio rebalancing equation is solved using a Banach fixed-point iteration, which allows accurate calculation of transaction costs specified as nonlinear functions of trading volume, and an extension of deep reinforcement learning (RL) is proposed to solve the dynamic formulation.
  • results: Empirical results show that the proposed method outperforms a benchmark method in tracking accuracy and has the potential to earn extra profit through a cash withdrawal strategy.
    Abstract We propose the first discrete-time infinite-horizon dynamic formulation of the financial index tracking problem under both return-based tracking error and value-based tracking error. The formulation overcomes the limitations of existing models by incorporating the intertemporal dynamics of market information variables not limited to prices, allowing exact calculation of transaction costs, accounting for the tradeoff between overall tracking error and transaction costs, allowing effective use of data in a long time period, etc. The formulation also allows novel decision variables of cash injection or withdraw. We propose to solve the portfolio rebalancing equation using a Banach fixed point iteration, which allows to accurately calculate the transaction costs specified as nonlinear functions of trading volumes in practice. We propose an extension of deep reinforcement learning (RL) method to solve the dynamic formulation. Our RL method resolves the issue of data limitation resulting from the availability of a single sample path of financial data by a novel training scheme. A comprehensive empirical study based on a 17-year-long testing set demonstrates that the proposed method outperforms a benchmark method in terms of tracking accuracy and has the potential for earning extra profit through cash withdraw strategy.
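The Banach fixed-point step can be illustrated with a toy scalar version of the problem: the traded amount $x$ must satisfy $x = g(x)$, where $g$ adds a nonlinear transaction cost to a target transfer. The cost function below is invented for illustration (the paper's costs are nonlinear functions of trading volume, but their exact form is not reproduced here); the iteration converges because $g$ is a contraction near the solution.

```python
import numpy as np

# Toy rebalancing equation x = g(x): the traded amount x must cover a target
# transfer t plus a nonlinear transaction cost paid out of the trade.
t = 100.0

def cost(x):
    """Illustrative nonlinear transaction cost (invented form)."""
    return 0.002 * x + 0.05 * np.sqrt(x)

def g(x):
    return t + cost(x)

x = t  # initial guess
for _ in range(50):  # Banach fixed-point iteration: x_{k+1} = g(x_k)
    x = g(x)
```

Since $|g'(x)| \approx 0.0045 \ll 1$ near the solution, the iterates converge geometrically to the unique fixed point, giving the exact trade size including its own cost.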

A generative model for surrogates of spatial-temporal wildfire nowcasting

  • paper_url: http://arxiv.org/abs/2308.02810
  • repo_url: None
  • paper_authors: Sibo Cheng, Yike Guo, Rossella Arcucci
  • for: This work proposes a generative-model-based approach to wildfire nowcasting, improving the accuracy and efficiency of wildfire prediction.
  • methods: A three-dimensional Vector-Quantized Variational Autoencoder (VQ-VAE) is used to generate spatial-temporal sequences of unseen wildfire burned areas in a given ecoregion.
  • results: Experiments show that the model generates coherent and structured fire scenarios that account for geophysical variables such as vegetation and slope. The generated data are also used to train a surrogate model for predicting wildfire spread, tested on both simulation data and the real Chimney fire event.
    Abstract Recent increase in wildfires worldwide has led to the need for real-time fire nowcasting. Physics-driven models, such as cellular automata and computational fluid dynamics can provide high-fidelity fire spread simulations but they are computationally expensive and time-consuming. Much effort has been put into developing machine learning models for fire prediction. However, these models are often region-specific and require a substantial quantity of simulation data for training purpose. This results in a significant amount of computational effort for different ecoregions. In this work, a generative model is proposed using a three-dimensional Vector-Quantized Variational Autoencoders to generate spatial-temporal sequences of unseen wildfire burned areas in a given ecoregion. The model is tested in the ecoregion of a recent massive wildfire event in California, known as the Chimney fire. Numerical results show that the model succeed in generating coherent and structured fire scenarios, taking into account the impact from geophysical variables, such as vegetation and slope. Generated data are also used to train a surrogate model for predicting wildfire dissemination, which has been tested on both simulation data and the real Chimney fire event.
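The defining operation of a VQ-VAE is the quantisation step: each encoder output is snapped to its nearest vector in a learned codebook, and generation happens over the discrete code indices. A minimal numpy sketch of that lookup (toy codebook and latents, not the paper's three-dimensional model):

```python
import numpy as np

rng = np.random.default_rng(10)
K, d = 8, 4                        # codebook size and latent dimension (toy)
codebook = rng.normal(size=(K, d))  # in a real VQ-VAE this is learned

def quantize(z):
    """VQ-VAE quantisation: snap each latent vector to its nearest code."""
    dists = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    idx = dists.argmin(axis=1)
    return codebook[idx], idx

z = rng.normal(size=(5, d))  # encoder outputs for 5 latent positions
z_q, idx = quantize(z)
```

The decoder only ever sees quantised latents `z_q`, so a prior over the discrete indices `idx` suffices to generate new spatial-temporal fire scenarios.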

MiAMix: Enhancing Image Classification through a Multi-stage Augmented Mixed Sample Data Augmentation Method

  • paper_url: http://arxiv.org/abs/2308.02804
  • repo_url: None
  • paper_authors: Wen Liang, Youzhi Liang, Jianguo Jia
  • for: To improve the generalization ability and performance of deep learning models for image classification.
  • methods: A novel multi-stage mixed sample data augmentation method (MiAMix) is introduced, which integrates image augmentation into the mixup framework, applies multiple diversified mixing methods concurrently, and randomly selects mixing mask augmentation methods.
  • results: A comprehensive evaluation on four image benchmarks shows that MiAMix improves performance without heavy computational overhead.
    Abstract Despite substantial progress in the field of deep learning, overfitting persists as a critical challenge, and data augmentation has emerged as a particularly promising approach due to its capacity to enhance model generalization in various computer vision tasks. While various strategies have been proposed, Mixed Sample Data Augmentation (MSDA) has shown great potential for enhancing model performance and generalization. We introduce a novel mixup method called MiAMix, which stands for Multi-stage Augmented Mixup. MiAMix integrates image augmentation into the mixup framework, utilizes multiple diversified mixing methods concurrently, and improves the mixing method by randomly selecting mixing mask augmentation methods. Recent methods utilize saliency information and the MiAMix is designed for computational efficiency as well, reducing additional overhead and offering easy integration into existing training pipelines. We comprehensively evaluate MiaMix using four image benchmarks and pitting it against current state-of-the-art mixed sample data augmentation techniques to demonstrate that MIAMix improves performance without heavy computational overhead.
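The baseline mixup operation that MSDA methods such as MiAMix build on is a convex combination of two samples and their labels, with the weight drawn from a Beta distribution. A minimal sketch (toy arrays in place of images; MiAMix's multi-stage augmentation and mask sampling are not reproduced):

```python
import numpy as np

rng = np.random.default_rng(11)

def mixup(x1, y1, x2, y2, alpha=1.0):
    """Classic mixup: convex combination of two samples and their labels."""
    lam = rng.beta(alpha, alpha)
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2

# Toy "images" and one-hot labels.
x1, x2 = rng.uniform(size=(8, 8)), rng.uniform(size=(8, 8))
y1, y2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
x_mix, y_mix = mixup(x1, y1, x2, y2)
```

MiAMix generalizes this by augmenting the inputs before mixing and by sampling the mixing mask itself from a pool of diversified strategies, rather than using a single global `lam`.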

OBESEYE: Interpretable Diet Recommender for Obesity Management using Machine Learning and Explainable AI

  • paper_url: http://arxiv.org/abs/2308.02796
  • repo_url: None
  • paper_authors: Mrinmoy Roy, Srabonti Das, Anica Tasnim Protity
  • for: The paper develops a novel machine learning-based system to predict the amount of nutrients an individual requires to stay healthy, with a focus on patients with comorbidities.
  • methods: Different machine learning algorithms, including linear regression, support vector machines (SVM), decision trees, random forests, XGBoost, and LightGBM, are applied to predict fluid, carbohydrate, protein, and fat consumption.
  • results: High accuracy with low root mean square error (RMSE) is achieved using linear regression for fluid prediction, random forests for carbohydrate prediction, and LightGBM for protein and fat prediction. The resulting diet recommender system, OBESEYE, recommends diets that account for comorbidities and physical conditions and encourages users to overcome obesity.
    Abstract Obesity, the leading cause of many non-communicable diseases, occurs mainly from eating more than our body requires and from a lack of proper activity. Staying healthy therefore requires a healthy diet plan, especially for patients with comorbidities. But it is difficult to determine the exact quantity of each nutrient, because nutrient requirements vary with physical and disease conditions. In our study we propose a novel machine learning-based system to predict the amount of nutrients an individual requires to stay healthy. We applied several machine learning algorithms, including linear regression, support vector machine (SVM), decision tree, random forest, XGBoost, and LightGBM, to predict the consumption of fluid and three major macronutrients: carbohydrate, protein, and fat. We achieved high accuracy with low root mean square error (RMSE) using linear regression for fluid prediction, random forest for carbohydrate prediction, and LightGBM for protein and fat prediction. We believe our diet recommender system, OBESEYE, is the only one of its kind that recommends diets with consideration of comorbidities and physical conditions and encourages users to overcome obesity.
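As a rough, self-contained illustration of the fit-and-score-by-RMSE workflow described above (synthetic data and feature names are invented, and plain least squares stands in for the paper's model zoo):

```python
import numpy as np

rng = np.random.default_rng(42)

def rmse(y_true, y_pred):
    """Root mean square error, the paper's reported metric."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Synthetic stand-in for per-patient features (e.g. age, weight, activity)
X = rng.normal(size=(200, 3))
true_w = np.array([0.5, 1.2, -0.7])
y = X @ true_w + 0.1 * rng.normal(size=200)   # e.g. daily fluid requirement

# Fit ordinary least squares with a bias term
Xb = np.hstack([X, np.ones((200, 1))])
w, *_ = np.linalg.lstsq(Xb, y, rcond=None)
pred = Xb @ w
```

In practice one would fit each candidate model (SVM, random forest, gradient boosting, etc.) on held-out splits and keep the lowest-RMSE model per target nutrient.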

OrcoDCS: An IoT-Edge Orchestrated Online Deep Compressed Sensing Framework

  • paper_url: http://arxiv.org/abs/2308.05757
  • repo_url: None
  • paper_authors: Cheng-Wei Ching, Chirag Gupta, Zi Huang, Liting Hu
  • for: Improve the flexibility and adaptivity of compressed data aggregation (CDA) over wireless sensor networks (WSNs), so that it can handle distinct sensing tasks and environmental changes.
  • methods: An IoT-Edge orchestrated online deep compressed sensing (DCS) framework built around a specially designed asymmetric autoencoder, which largely reduces the encoding overhead and improves reconstruction performance and robustness.
  • results: Analytical and empirical results show that OrcoDCS outperforms the state-of-the-art DCDA on training time, flexibility, and adaptability, and achieves higher performance for follow-up applications.
    Abstract Compressed data aggregation (CDA) over wireless sensor networks (WSNs) is task-specific and subject to environmental changes. However, existing CDA frameworks (e.g., compressed sensing-based data aggregation and deep learning (DL)-based data aggregation) do not possess the flexibility and adaptivity required to handle distinct sensing tasks and environmental changes. Additionally, they do not consider the performance of follow-up IoT data-driven DL-based applications. To address these shortcomings, we propose OrcoDCS, an IoT-Edge orchestrated online deep compressed sensing framework that offers high flexibility and adaptability to distinct IoT device groups and their sensing tasks, as well as high performance for follow-up applications. The novelty of our work is the design and deployment of an IoT-Edge orchestrated online training framework over WSNs that leverages a specially designed asymmetric autoencoder, which can largely reduce the encoding overhead and improve the reconstruction performance and robustness. We show analytically and empirically that OrcoDCS outperforms the state-of-the-art DCDA on training time, significantly improves flexibility and adaptability when distinct reconstruction tasks are given, and achieves higher performance for follow-up applications.
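A minimal sketch of the asymmetric-autoencoder idea: a cheap linear encoder on the sensor side and a deeper nonlinear decoder on the edge. All dimensions and the random-projection encoder are illustrative assumptions, not OrcoDCS's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 64, 16          # signal dimension, compressed dimension (4x compression)

# Shallow linear encoder: a fixed random projection keeps sensor-side cost tiny
A = rng.normal(size=(m, n)) / np.sqrt(m)

def encode(x):
    """Sensor-side encoding: a single matrix-vector product."""
    return A @ x

# Deeper nonlinear decoder: runs on the edge, so it can afford more parameters
W1, b1 = rng.normal(size=(128, m)) * 0.1, np.zeros(128)
W2, b2 = rng.normal(size=(n, 128)) * 0.1, np.zeros(n)

def decode(z):
    """Edge-side reconstruction: ReLU hidden layer, then linear readout."""
    h = np.maximum(0.0, W1 @ z + b1)
    return W2 @ h + b2

x = rng.normal(size=n)
z = encode(x)          # what gets transmitted over the WSN
x_hat = decode(z)      # reconstructed at the edge
```

The asymmetry is the point: the encoder has far fewer parameters than the decoder, so nearly all compute (and online training) lives at the edge rather than on the sensor.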

Semi-supervised Contrastive Regression for Estimation of Eye Gaze

  • paper_url: http://arxiv.org/abs/2308.02784
  • repo_url: None
  • paper_authors: Somsukla Maiti, Akshansh Gupta
  • for: Develop a semi-supervised contrastive learning framework for gaze direction estimation.
  • methods: A deep learning model with a new contrastive loss paradigm that maximizes similarity agreement between similar images while reducing redundancy in the embedding representations.
  • results: The framework learns a generalized solution from a small labeled gaze dataset and performs well against several state-of-the-art contrastive learning techniques.
    Abstract With the escalating demand for human-machine interfaces in intelligent systems, the development of gaze-controlled systems has become a necessity. Gaze, being a non-intrusive form of human interaction, is one of the best-suited approaches. Appearance-based deep learning models are the most widely used for gaze estimation, but their performance is heavily influenced by the size of the labeled gaze dataset, which in turn limits generalization. This paper develops a semi-supervised contrastive learning framework for estimation of gaze direction. With a small labeled gaze dataset, the framework is able to find a generalized solution even for unseen face images. We propose a new contrastive loss paradigm that maximizes the similarity agreement between similar images while reducing the redundancy in embedding representations. Our contrastive regression framework performs well in comparison to several state-of-the-art contrastive learning techniques used for gaze estimation.
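One way such a similarity-agreement loss for regression targets can look is sketched below; the Gaussian label weighting and hyperparameters are illustrative assumptions, not the paper's exact loss:

```python
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def contrastive_regression_loss(embeds, labels, tau=0.5, sigma=5.0):
    """For each anchor, pull together samples whose gaze labels are close
    (Gaussian weight on label distance) and push apart the rest. This is a
    generic sketch of a contrastive loss for continuous targets."""
    n = len(embeds)
    total = 0.0
    for i in range(n):
        others = [j for j in range(n) if j != i]
        logits = np.array([cosine(embeds[i], embeds[j]) / tau for j in others])
        w = np.array([np.exp(-(labels[i] - labels[j]) ** 2 / (2 * sigma ** 2))
                      for j in others])
        w /= w.sum()                                  # soft "positive" weights
        log_probs = logits - np.log(np.exp(logits).sum())
        total += -(w * log_probs).sum()
    return total / n

labels = np.array([0.0, 0.1, 10.0, 10.1])             # e.g. gaze yaw in degrees
good = [np.array([1.0, 0.0]), np.array([1.0, 0.05]),  # nearby labels, nearby embeddings
        np.array([0.0, 1.0]), np.array([0.05, 1.0])]
bad = [good[0], good[2], good[1], good[3]]            # nearby labels, far embeddings
```

An embedding space where close gaze angles map to close embeddings should score a lower loss than a shuffled one.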

Dataopsy: Scalable and Fluid Visual Exploration using Aggregate Query Sculpting

  • paper_url: http://arxiv.org/abs/2308.02764
  • repo_url: None
  • paper_authors: Md Naimul Hoque, Niklas Elmqvist
  • for: A faceted visual query technique for large-scale multidimensional data.
  • methods: The P6 operations (pivot, partition, peek, pile, project, prune).
  • results: Two case studies and three application examples demonstrate the scalability and fluidity of AQS.
    Abstract We present aggregate query sculpting (AQS), a faceted visual query technique for large-scale multidimensional data. As a "born scalable" query technique, AQS starts visualization with a single visual mark representing an aggregation of the entire dataset. The user can then progressively explore the dataset through a sequence of operations abbreviated as P6: pivot (facet an aggregate based on an attribute), partition (lay out a facet in space), peek (see inside a subset using an aggregate visual representation), pile (merge two or more subsets), project (extracting a subset into a new substrate), and prune (discard an aggregate not currently of interest). We validate AQS with Dataopsy, a prototype implementation of AQS that has been designed for fluid interaction on desktop and touch-based mobile devices. We demonstrate AQS and Dataopsy using two case studies and three application examples.
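A few of the P6 operations are simple to sketch over an in-memory table; the snippet below is a toy illustration (dataset and field names invented), not Dataopsy's implementation:

```python
from collections import defaultdict

# Toy dataset: each record is one row of a multidimensional table
data = [
    {"region": "EU", "year": 2020, "sales": 10},
    {"region": "EU", "year": 2021, "sales": 14},
    {"region": "US", "year": 2020, "sales": 20},
    {"region": "US", "year": 2021, "sales": 25},
]

def pivot(subset, attribute):
    """P6 'pivot': facet one aggregate into sub-aggregates by an attribute."""
    facets = defaultdict(list)
    for row in subset:
        facets[row[attribute]].append(row)
    return dict(facets)

def peek(subset, measure):
    """P6 'peek': see inside a subset via an aggregate representation."""
    values = [row[measure] for row in subset]
    return {"count": len(values), "sum": sum(values)}

def pile(*subsets):
    """P6 'pile': merge two or more subsets back into one aggregate."""
    return [row for s in subsets for row in s]

by_region = pivot(data, "region")   # start from the whole-dataset aggregate
```

The "born scalable" property comes from always operating on aggregates: the user never sees raw rows until a peek or project asks for them.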

Neural Collapse in the Intermediate Hidden Layers of Classification Neural Networks

  • paper_url: http://arxiv.org/abs/2308.02760
  • repo_url: None
  • paper_authors: Liam Parker, Emre Onal, Anton Stengel, Jake Intrater
  • for: Study the Neural Collapse (NC) phenomenon in the intermediate hidden layers of classification neural networks.
  • methods: Empirical analysis across a variety of network architectures, activation functions, and datasets to examine the emergence of NC at different depths.
  • results: Some degree of NC emerges in most intermediate hidden layers, with the degree of collapse typically positively correlated with layer depth. Moreover, almost all of the reduction in intra-class variance occurs in the shallower layers, the angular separation between class means increases consistently with depth, and simple datasets require only the shallower layers to be fully learned, whereas more difficult ones require the entire network.
    Abstract Neural Collapse (NC) gives a precise description of the representations of classes in the final hidden layer of classification neural networks. This description provides insights into how these networks learn features and generalize well when trained past zero training error. However, to date, (NC) has only been studied in the final layer of these networks. In the present paper, we provide the first comprehensive empirical analysis of the emergence of (NC) in the intermediate hidden layers of these classifiers. We examine a variety of network architectures, activations, and datasets, and demonstrate that some degree of (NC) emerges in most of the intermediate hidden layers of the network, where the degree of collapse in any given layer is typically positively correlated with the depth of that layer in the neural network. Moreover, we remark that: (1) almost all of the reduction in intra-class variance in the samples occurs in the shallower layers of the networks, (2) the angular separation between class means increases consistently with hidden layer depth, and (3) simple datasets require only the shallower layers of the networks to fully learn them, whereas more difficult ones require the entire network. Ultimately, these results provide granular insights into the structural propagation of features through classification neural networks.
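The layer-wise NC diagnostics discussed above (within-class variance relative to total variance, and the angle between class means) can be computed as follows; the toy features standing in for "shallow" and "deep" layer activations are synthetic:

```python
import numpy as np

def nc_metrics(features, labels):
    """Per-layer NC diagnostics: within-class sum of squares as a fraction
    of the total, and the angle between the two class means (recentered
    at the global mean) for the two-class case."""
    classes = np.unique(labels)
    means = {c: features[labels == c].mean(axis=0) for c in classes}
    global_mean = features.mean(axis=0)
    within = sum(((features[labels == c] - means[c]) ** 2).sum() for c in classes)
    total = ((features - global_mean) ** 2).sum()
    m0 = means[classes[0]] - global_mean
    m1 = means[classes[1]] - global_mean
    cos = m0 @ m1 / (np.linalg.norm(m0) * np.linalg.norm(m1))
    angle = np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))
    return within / total, angle

rng = np.random.default_rng(0)
labels = np.array([0] * 50 + [1] * 50)
# "Shallow layer": classes overlap heavily; "deep layer": tightly collapsed
shallow = np.vstack([rng.normal(0, 1.0, (50, 8)), rng.normal(1, 1.0, (50, 8))])
deep = np.vstack([rng.normal(0, 0.05, (50, 8)), rng.normal(1, 0.05, (50, 8))])
```

In a collapsed layer the within-class fraction approaches zero and the recentered class means approach maximal angular separation (180 degrees for two classes).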

WeldMon: A Cost-effective Ultrasonic Welding Machine Condition Monitoring System

  • paper_url: http://arxiv.org/abs/2308.05756
  • repo_url: None
  • paper_authors: Beitong Tian, Kuan-Chieh Lu, Ahmadreza Eslaminia, Yaohui Wang, Chenhui Shao, Klara Nahrstedt
  • for: Develop a reliable, high-performance ultrasonic welding machine condition monitoring system, addressing the cost, downtime, and adaptability challenges of existing monitoring methods.
  • methods: A custom data acquisition system and a data analysis pipeline designed for real-time analysis. The classification algorithm combines auto-generated and hand-crafted features, achieving 95.8% cross-validation accuracy on condition classification tasks (versus 92.5% for the prior state of the art). A data augmentation approach improves tool condition classification accuracy by 8.3%. All algorithms run locally, requiring only 385 milliseconds to process the data of each welding cycle.
  • results: Deploying WeldMon alongside a commercial system on an actual ultrasonic welding machine, a comprehensive comparison shows that WeldMon achieves high-performance, reliable tool condition monitoring more cost-effectively than existing methods.
    Abstract Ultrasonic welding machines play a critical role in the lithium battery industry, facilitating the bonding of batteries with conductors. Ensuring high-quality welding is vital, making tool condition monitoring systems essential for early-stage quality control. However, existing monitoring methods face challenges in cost, downtime, and adaptability. In this paper, we present WeldMon, an affordable ultrasonic welding machine condition monitoring system that utilizes a custom data acquisition system and a data analysis pipeline designed for real-time analysis. Our classification algorithm combines auto-generated features and hand-crafted features, achieving superior cross-validation accuracy (95.8% on average over all testing tasks) compared to the state-of-the-art method (92.5%) in condition classification tasks. Our data augmentation approach alleviates the concept drift problem, enhancing tool condition classification accuracy by 8.3%. All algorithms run locally, requiring only 385 milliseconds to process data for each welding cycle. We deploy WeldMon and a commercial system on an actual ultrasonic welding machine, performing a comprehensive comparison. Our findings highlight the potential for developing cost-effective, high-performance, and reliable tool condition monitoring systems.

DaMSTF: Domain Adversarial Learning Enhanced Meta Self-Training for Domain Adaptation

  • paper_url: http://arxiv.org/abs/2308.02753
  • repo_url: None
  • paper_authors: Menglong Lu, Zhen Huang, Yunxiang Zhao, Zhiliang Tian, Yang Liu, Dongsheng Li
  • for: Improving self-training for domain adaptation, where the model's predictions serve as pseudo labels for target-domain data.
  • methods: A new self-training framework, the Domain adversarial learning enhanced Self-Training Framework (DaMSTF), which uses meta-learning to estimate the importance of each pseudo instance, simultaneously reducing label noise and preserving hard examples.
  • results: Theoretical and experimental analyses demonstrate DaMSTF's effectiveness; on cross-domain sentiment classification it improves the performance of BERT by an average of nearly 4%.
    Abstract Self-training emerges as an important research line on domain adaptation. By taking the model's prediction as the pseudo labels of the unlabeled data, self-training bootstraps the model with pseudo instances in the target domain. However, the prediction errors of pseudo labels (label noise) challenge the performance of self-training. To address this problem, previous approaches only use reliable pseudo instances, i.e., pseudo instances with high prediction confidence, to retrain the model. Although these strategies effectively reduce the label noise, they are prone to miss the hard examples. In this paper, we propose a new self-training framework for domain adaptation, namely Domain adversarial learning enhanced Self-Training Framework (DaMSTF). Firstly, DaMSTF involves meta-learning to estimate the importance of each pseudo instance, so as to simultaneously reduce the label noise and preserve hard examples. Secondly, we design a meta constructor for constructing the meta-validation set, which guarantees the effectiveness of the meta-learning module by improving the quality of the meta-validation set. Thirdly, we find that the meta-learning module suffers from the training guidance vanishment and tends to converge to an inferior optimal. To this end, we employ domain adversarial learning as a heuristic neural network initialization method, which can help the meta-learning module converge to a better optimal. Theoretically and experimentally, we demonstrate the effectiveness of the proposed DaMSTF. On the cross-domain sentiment classification task, DaMSTF improves the performance of BERT with an average of nearly 4%.
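The contrast between hard confidence filtering (prior self-training) and instance weighting (DaMSTF's idea) can be sketched as below. DaMSTF learns the weights with meta-learning; here a simple confidence-based proxy stands in for the learned weights:

```python
import numpy as np

def pseudo_label_with_weights(probs, threshold=0.9):
    """Baseline self-training keeps only confident pseudo instances;
    an instance-weighting scheme instead keeps every instance but
    down-weights likely-noisy ones, so hard examples are not discarded.
    The confidence-based weights here are a proxy, not DaMSTF's
    meta-learned weights."""
    labels = probs.argmax(axis=1)
    confidence = probs.max(axis=1)
    kept_mask = confidence >= threshold          # hard filtering (baseline)
    weights = confidence / confidence.sum()      # soft weighting (proxy)
    return labels, kept_mask, weights

probs = np.array([[0.95, 0.05],    # easy, confident instance
                  [0.60, 0.40],    # hard example: filtered out by baseline
                  [0.55, 0.45]])   # hard example: filtered out by baseline
labels, kept, weights = pseudo_label_with_weights(probs)
```

Under hard filtering, both hard examples vanish from retraining; under weighting, they still contribute (with reduced influence), which is what preserving hard examples means here.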

NeRFs: The Search for the Best 3D Representation

  • paper_url: http://arxiv.org/abs/2308.02751
  • repo_url: None
  • paper_authors: Ravi Ramamoorthi
  • For: The paper is written for those interested in 3D representation and view synthesis, particularly in the context of Neural Radiance Fields (NeRFs) and their applications in computer graphics and vision.
  • Methods: The paper uses a historical perspective to review the development of NeRFs, and describes the NeRF representation as a continuous volume with view-dependent radiance and volume density obtained by querying a neural network.
  • Results: The paper describes the widespread adoption of NeRFs in the field, with thousands of papers extending or building on the original work, and numerous industrial applications and startup companies. It also provides some observations and insights regarding the future of 3D representations.
    Abstract Neural Radiance Fields or NeRFs have become the representation of choice for problems in view synthesis or image-based rendering, as well as in many other applications across computer graphics and vision, and beyond. At their core, NeRFs describe a new representation of 3D scenes or 3D geometry. Instead of meshes, disparity maps, multiplane images or even voxel grids, they represent the scene as a continuous volume, with volumetric parameters like view-dependent radiance and volume density obtained by querying a neural network. The NeRF representation has now been widely used, with thousands of papers extending or building on it every year, multiple authors and websites providing overviews and surveys, and numerous industrial applications and startup companies. In this article, we briefly review the NeRF representation, and describe the three decades-long quest to find the best 3D representation for view synthesis and related problems, culminating in the NeRF papers. We then describe new developments in terms of NeRF representations and make some observations and insights regarding the future of 3D representations.
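At the heart of NeRF rendering is the volume rendering quadrature that composites view-dependent radiance along a ray using volume density. A minimal sketch with hand-picked sample values (the neural-network query that would normally supply density and color is replaced by fixed arrays):

```python
import numpy as np

def render_ray(densities, colors, deltas):
    """NeRF's volume rendering quadrature:
        alpha_i = 1 - exp(-sigma_i * delta_i)
        T_i     = prod_{j < i} (1 - alpha_j)      (transmittance)
        C       = sum_i T_i * alpha_i * c_i       (composited color)
    """
    alphas = 1.0 - np.exp(-densities * deltas)
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1]]))
    weights = trans * alphas
    return weights @ colors, weights

densities = np.array([0.0, 5.0, 50.0])      # empty space, mist, solid surface
colors = np.array([[0.0, 0.0, 0.0],
                   [0.2, 0.2, 0.2],
                   [1.0, 0.0, 0.0]])        # the dense sample is red
deltas = np.array([0.1, 0.1, 0.1])          # spacing between ray samples
rgb, w = render_ray(densities, colors, deltas)
```

Because the weights are differentiable in the densities and colors, the same quadrature lets NeRF train the underlying network from 2D images alone.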

Exploiting On-chip Heterogeneity of Versal Architecture for GNN Inference Acceleration

  • paper_url: http://arxiv.org/abs/2308.02749
  • repo_url: None
  • paper_authors: Paul Chen, Pavan Manjunath, Sasindu Wijeratne, Bingyi Zhang, Viktor Prasanna
  • for: Accelerate the inference of Graph Neural Networks (GNNs), enabling machine learning (ML) applications such as social network analysis and bioinformatics.
  • methods: Exploit data sparsity to accelerate GNN inference, leveraging the heterogeneous computing capabilities of the AMD Versal ACAP architecture: sparse primitives run on the Programmable Logic (PL) while dense primitives run on the AI Engine (AIE), with a runtime kernel mapping strategy that dynamically assigns computation tasks based on data sparsity.
  • results: On the VCK5000 ACAP platform, the implementation outperforms implementations on CPU, GPU, ACAP, and other custom GNN accelerators, with average runtime speedups of 162.42x, 17.01x, 9.90x, and 27.23x, respectively. For Graph Convolutional Network (GCN) inference, the approach is 3.9-96.7x faster than designs using the PL only on the same ACAP device.
    Abstract Graph Neural Networks (GNNs) have revolutionized many Machine Learning (ML) applications, such as social network analysis, bioinformatics, etc. GNN inference can be accelerated by exploiting data sparsity in the input graph, vertex features, and intermediate data in GNN computations. For dynamic sparsity exploitation, we leverage the heterogeneous computing capabilities of AMD Versal ACAP architecture to accelerate GNN inference. We develop a custom hardware module that executes the sparse primitives of the computation kernel on the Programmable Logic (PL) and efficiently computes the dense primitives using the AI Engine (AIE). To exploit data sparsity during inference, we devise a runtime kernel mapping strategy that dynamically assigns computation tasks to the PL and AIE based on data sparsity. Our implementation on the VCK5000 ACAP platform leads to superior performance compared with the state-of-the-art implementations on CPU, GPU, ACAP, and other custom GNN accelerators. Compared with these implementations, we achieve significant average runtime speedup across various models and datasets of 162.42x, 17.01x, 9.90x, and 27.23x, respectively. Furthermore, for Graph Convolutional Network (GCN) inference, our approach leads to a speedup of 3.9-96.7x compared to designs using PL only on the same ACAP device.
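The sparse primitive at the core of GNN inference is a sparse-dense matrix multiply over the graph adjacency. A plain CSR implementation shows why exploiting sparsity pays off (work scales with the number of nonzeros, not n^2); the hardware mapping itself is of course not reproduced here:

```python
import numpy as np

def csr_from_dense(A):
    """Build CSR arrays (indptr, indices, values) from a dense matrix."""
    indptr, indices, values = [0], [], []
    for row in A:
        for j, v in enumerate(row):
            if v != 0:
                indices.append(j)
                values.append(v)
        indptr.append(len(indices))
    return indptr, indices, values

def csr_spmm(indptr, indices, values, X):
    """Sparse-dense matmul A @ X: only nonzero entries of A do work."""
    out = np.zeros((len(indptr) - 1, X.shape[1]))
    for i in range(len(indptr) - 1):
        for k in range(indptr[i], indptr[i + 1]):
            out[i] += values[k] * X[indices[k]]
    return out

A = np.array([[0, 1.0, 0],
              [2.0, 0, 0],
              [0, 0, 3.0]])                    # sparse adjacency matrix
X = np.arange(6, dtype=float).reshape(3, 2)    # dense node-feature matrix
Y = csr_spmm(*csr_from_dense(A), X)
```

In a heterogeneous design, this irregular gather loop is what maps naturally onto reconfigurable logic, while the dense feature transformations map onto vector engines.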

SABRE: Robust Bayesian Peer-to-Peer Federated Learning

  • paper_url: http://arxiv.org/abs/2308.02747
  • repo_url: None
  • paper_authors: Nasimeh Heydaribeni, Ruisi Zhang, Tara Javidi, Cristina Nita-Rotaru, Farinaz Koushanfar
  • for: A novel robust variational Bayesian peer-to-peer federated learning framework that improves resilience to poisoning attacks.
  • methods: A new SABRE aggregation methodology that overcomes the limitations of existing frameworks and does not require the benign nodes to outnumber the compromised ones.
  • results: SABRE works well in non-IID settings, outperforms the baseline algorithm even in benign settings, and its robustness against data/model poisoning attacks is established both theoretically and empirically.
    Abstract We introduce SABRE, a novel framework for robust variational Bayesian peer-to-peer federated learning. We analyze the robustness of the known variational Bayesian peer-to-peer federated learning framework (BayP2PFL) against poisoning attacks and subsequently show that BayP2PFL is not robust against those attacks. The new SABRE aggregation methodology is then devised to overcome the limitations of the existing frameworks. SABRE works well in non-IID settings, does not require the majority of the benign nodes over the compromised ones, and even outperforms the baseline algorithm in benign settings. We theoretically prove the robustness of our algorithm against data / model poisoning attacks in a decentralized linear regression setting. Proof-of-Concept evaluations on benchmark data from image classification demonstrate the superiority of SABRE over the existing frameworks under various poisoning attacks.

Meta-Tsallis-Entropy Minimization: A New Self-Training Approach for Domain Adaptation on Text Classification

  • paper_url: http://arxiv.org/abs/2308.02746
  • repo_url: None
  • paper_authors: Menglong Lu, Zhen Huang, Zhiliang Tian, Yunxiang Zhao, Xuanyu Fei, Dongsheng Li
  • for: Adapting language models for text classification from one domain to another.
  • methods: Meta-Tsallis Entropy Minimization (MTEM), which applies a meta-learning algorithm to optimize the instance-adaptive Tsallis entropy on the target domain; an approximation technique reduces the computation cost by simplifying the second-order derivatives involved in meta-learning.
  • results: MTEM improves the adaptation performance of BERT by an average of 4% on the benchmark dataset.
    Abstract Text classification is a fundamental task in natural language processing, and adapting text classification models across domains has broad applications. Self-training generates pseudo-examples from the model's predictions and iteratively trains on them, i.e., it minimizes the loss on the source domain and the Gibbs entropy on the target domain. However, Gibbs entropy is sensitive to prediction errors, and thus self-training tends to fail when the domain shift is large. In this paper, we propose Meta-Tsallis Entropy Minimization (MTEM), which applies a meta-learning algorithm to optimize the instance-adaptive Tsallis entropy on the target domain. To reduce the computation cost of MTEM, we propose an approximation technique for the second-order derivatives involved in the meta-learning. To efficiently generate pseudo labels, we propose an annealing sampling mechanism for exploring the model's prediction probability. Theoretically, we prove the convergence of the meta-learning algorithm in MTEM and analyze its effectiveness in achieving domain adaptation. Experimentally, MTEM improves the adaptation performance of BERT by an average of 4 percent on the benchmark dataset.
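The Tsallis entropy MTEM optimizes generalizes the Gibbs/Shannon entropy used by plain entropy minimization. A direct transcription of the formula, with the q -> 1 limit recovering Shannon entropy (the instance-adaptive choice of q and the meta-learning loop are not reproduced here):

```python
import numpy as np

def tsallis_entropy(p, q):
    """Tsallis entropy S_q(p) = (1 - sum_i p_i^q) / (q - 1).
    As q -> 1 it converges to the Shannon (Gibbs) entropy; larger q makes
    the entropy less sensitive to low-probability (error-prone) classes."""
    p = np.asarray(p, dtype=float)
    return (1.0 - np.sum(p ** q)) / (q - 1.0)

def shannon_entropy(p):
    """Shannon entropy H(p) = -sum_i p_i * log(p_i), for p_i > 0."""
    p = np.asarray(p, dtype=float)
    return -np.sum(p * np.log(p))

p = [0.7, 0.2, 0.1]   # a model's predictive distribution over 3 classes
```

Minimizing S_q over unlabeled target instances sharpens predictions, exactly as Gibbs-entropy minimization does, but with a per-instance knob q.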

Learning to Schedule in Non-Stationary Wireless Networks With Unknown Statistics

  • paper_url: http://arxiv.org/abs/2308.02734
  • repo_url: None
  • paper_authors: Quang Minh Nguyen, Eytan Modiano
  • for: Efficient scheduling algorithms for wireless networks subject to generalized interference constraints, where mean arrival and mean service rates are unknown and non-stationary; this model captures the realistic characteristics of edge devices in modern wireless networks.
  • methods: A novel MW-UCB algorithm for generalized wireless network scheduling, based on the Max-Weight policy and leveraging a Sliding-Window Upper-Confidence Bound to learn the channels' statistics under non-stationarity. MW-UCB is provably throughput-optimal as long as the total variation in mean service rates over any time period grows sub-linearly in time.
  • results: Extensive simulations validate the theoretical results and demonstrate MW-UCB's favorable performance; it achieves a stability region arbitrarily close to that of the class of policies with full knowledge of the channel statistics.
    Abstract The emergence of large-scale wireless networks with partially-observable and time-varying dynamics has imposed new challenges on the design of optimal control policies. This paper studies efficient scheduling algorithms for wireless networks subject to generalized interference constraint, where mean arrival and mean service rates are unknown and non-stationary. This model exemplifies realistic edge devices' characteristics of wireless communication in modern networks. We propose a novel algorithm termed MW-UCB for generalized wireless network scheduling, which is based on the Max-Weight policy and leverages the Sliding-Window Upper-Confidence Bound to learn the channels' statistics under non-stationarity. MW-UCB is provably throughput-optimal under mild assumptions on the variability of mean service rates. Specifically, as long as the total variation in mean service rates over any time period grows sub-linearly in time, we show that MW-UCB can achieve the stability region arbitrarily close to the stability region of the class of policies with full knowledge of the channel statistics. Extensive simulations validate our theoretical results and demonstrate the favorable performance of MW-UCB.
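The two ingredients of MW-UCB (a sliding-window UCB rate estimate per link, and Max-Weight scheduling over queue-length-times-rate weights) can be sketched as follows; the exact confidence radius and window size are illustrative assumptions:

```python
import numpy as np
from collections import deque

class SlidingWindowUCB:
    """Sliding-window UCB estimate of one link's unknown, possibly
    non-stationary mean service rate: old observations age out of the
    window, so the estimate can track drifting channel statistics."""
    def __init__(self, window=100, c=1.0):
        self.obs = deque(maxlen=window)
        self.c = c

    def update(self, served):
        self.obs.append(served)

    def ucb(self, t):
        if not self.obs:
            return 1.0                      # optimistic value for unexplored links
        mean = sum(self.obs) / len(self.obs)
        return mean + self.c * np.sqrt(np.log(t + 1) / len(self.obs))

def max_weight_schedule(queues, estimators, t):
    """Max-Weight with learned rates: serve the link maximizing Q_l * UCB_l."""
    weights = [q * e.ucb(t) for q, e in zip(queues, estimators)]
    return int(np.argmax(weights))

links = [SlidingWindowUCB(), SlidingWindowUCB()]
for _ in range(50):
    links[0].update(0.9)   # fast channel
    links[1].update(0.1)   # slow channel
```

The optimism (UCB bonus plus the default value for empty windows) is what drives exploration of links whose rates have not been observed recently.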

Personalization of Stress Mobile Sensing using Self-Supervised Learning

  • paper_url: http://arxiv.org/abs/2308.02731
  • repo_url: None
  • paper_authors: Tanvir Islam, Peter Washington
  • for: Predicting stress from biosignal data recorded by wearables is a key area of mobile sensing research, because real-time stress prediction enables digital interventions to react immediately at the onset of stress, helping to avoid psychological and physiological symptoms such as heart rhythm irregularities.
  • methods: Model personalization, i.e., training a separate stress prediction model for each user. A 1-dimensional convolutional neural network is pre-trained with self-supervised learning (SSL) so that it learns the temporal dynamics of each individual's baseline biosignal patterns, enabling personalization with very few labels.
  • results: Embeddings learned with the pre-training method outperform supervised baselines without SSL pre-training, requiring fewer than 30% of the labels to reach equivalent performance. This personalized learning approach can enable precision health systems tailored to each subject that require few annotations from the end user.
    Abstract Stress is widely recognized as a major contributor to a variety of health issues. Stress prediction using biosignal data recorded by wearables is a key area of study in mobile sensing research because real-time stress prediction can enable digital interventions to immediately react at the onset of stress, helping to avoid many psychological and physiological symptoms such as heart rhythm irregularities. Electrodermal activity (EDA) is often used to measure stress. However, major challenges with the prediction of stress using machine learning include the subjectivity and sparseness of the labels, a large feature space, relatively few labels, and a complex nonlinear and subjective relationship between the features and outcomes. To tackle these issues, we examine the use of model personalization: training a separate stress prediction model for each user. To allow the neural network to learn the temporal dynamics of each individual's baseline biosignal patterns, thus enabling personalization with very few labels, we pre-train a 1-dimensional convolutional neural network (CNN) using self-supervised learning (SSL). We evaluate our method using the Wearable Stress and Affect prediction (WESAD) dataset. We fine-tune the pre-trained networks to the stress prediction task and compare against equivalent models without any self-supervised pre-training. We discover that embeddings learned using our pre-training method outperform supervised baselines with significantly fewer labeled data points: the models trained with SSL require less than 30% of the labels to reach equivalent performance without personalized SSL. This personalized learning method can enable precision health systems which are tailored to each subject and require few annotations by the end user, thus allowing for the mobile sensing of increasingly complex, heterogeneous, and subjective outcomes such as stress.
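A typical way to mint SSL pretext data from a single user's unlabeled biosignal stream is temporal-proximity pairing, sketched below. This is a generic illustration with invented window sizes, not the paper's specific pretext task:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_ssl_pairs(signal, win=50, n_pairs=100):
    """Pretext-task data from one user's *unlabeled* biosignal stream:
    two overlapping crops form a positive pair (label 1), and crops from
    distant time regions form a negative pair (label 0). No stress
    annotations are needed, which is the point of SSL pre-training."""
    pairs, targets = [], []
    for _ in range(n_pairs):
        i = int(rng.integers(0, len(signal) - 2 * win))
        pos = (signal[i:i + win], signal[i + win // 2:i + win // 2 + win])
        j = (i + len(signal) // 2) % (len(signal) - win)
        neg = (signal[i:i + win], signal[j:j + win])
        pairs += [pos, neg]
        targets += [1, 0]
    return pairs, targets

# Random-walk stand-in for an EDA trace from one wearable
eda = np.cumsum(rng.normal(size=5000))
pairs, targets = make_ssl_pairs(eda)
```

A 1-D CNN trained to classify these pairs learns the user's baseline temporal dynamics; the few available stress labels are then only needed to fine-tune a small head.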

Synthesizing Programmatic Policies with Actor-Critic Algorithms and ReLU Networks

  • paper_url: http://arxiv.org/abs/2308.02729
  • repo_url: None
  • paper_authors: Spyros Orfanos, Levi H. S. Lelis
  • for: Show that PIRL-specific algorithms are not needed to synthesize programmatic policies; actor-critic algorithms can be used to obtain them directly.
  • methods: A connection between ReLU neural networks and oblique decision trees is used to translate policies learned with actor-critic algorithms into programmatic policies with if-then-else structures, linear transformations of the input values, and PID operations.
  • results: Experiments on several control problems show that this translation approach learns short and effective policies, which are at least competitive with, and often far superior to, the policies synthesized by PIRL algorithms.
    Abstract Programmatically Interpretable Reinforcement Learning (PIRL) encodes policies in human-readable computer programs. Novel algorithms were recently introduced with the goal of handling the lack of gradient signal to guide the search in the space of programmatic policies. Most of such PIRL algorithms first train a neural policy that is used as an oracle to guide the search in the programmatic space. In this paper, we show that such PIRL-specific algorithms are not needed, depending on the language used to encode the programmatic policies. This is because one can use actor-critic algorithms to directly obtain a programmatic policy. We use a connection between ReLU neural networks and oblique decision trees to translate the policy learned with actor-critic algorithms into programmatic policies. This translation from ReLU networks allows us to synthesize policies encoded in programs with if-then-else structures, linear transformations of the input values, and PID operations. Empirical results on several control problems show that this translation approach is capable of learning short and effective policies. Moreover, the translated policies are at least competitive and often far superior to the policies PIRL algorithms synthesize.
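The ReLU-to-oblique-tree translation can be seen on a single neuron: the ReLU's activation boundary becomes an if-then-else test over a linear transformation of the inputs. A hand-built toy example (the weights are arbitrary, and real policies compose many such tests):

```python
def relu_policy(x1, x2):
    """A tiny one-neuron ReLU 'policy': linear layer, ReLU, linear readout."""
    h = max(0.0, 2.0 * x1 - 1.0 * x2 + 0.5)
    return 3.0 * h + 0.25

def programmatic_policy(x1, x2):
    """Equivalent oblique-decision-tree program: the neuron's activation
    boundary 2*x1 - x2 + 0.5 > 0 becomes the branch condition, and each
    branch is a linear expression over the inputs."""
    if 2.0 * x1 - 1.0 * x2 + 0.5 > 0.0:
        return 3.0 * (2.0 * x1 - 1.0 * x2 + 0.5) + 0.25
    else:
        return 0.25
```

With k hidden ReLUs the translation yields a tree over the 2^k activation patterns (pruned to the reachable ones), which is why the resulting programs can stay short.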

Towards Improving Harmonic Sensitivity and Prediction Stability for Singing Melody Extraction

  • paper_url: http://arxiv.org/abs/2308.02723
  • repo_url: https://github.com/smoothken/kknet
  • paper_authors: Keren Shao, Ke Chen, Taylor Berg-Kirkpatrick, Shlomo Dubnov
  • for: 提高歌唱旋律抽取模型的性能
  • methods: 输入特征修改和训练目标修改,基于两个假设:1) 音频声谱图中的谐波沿频率轴快速衰减;2) 时长极短的声/非声片段很少见。
  • results: 将这些修改应用于多种模型(包括 MSNet、FTANet 以及新引入的 PianoNet),实验结果表明所提修改能有效提升歌唱旋律抽取性能。
    Abstract In deep learning research, many melody extraction models rely on redesigning neural network architectures to improve performance. In this paper, we propose an input feature modification and a training objective modification based on two assumptions. First, harmonics in the spectrograms of audio data decay rapidly along the frequency axis. To enhance the model's sensitivity on the trailing harmonics, we modify the Combined Frequency and Periodicity (CFP) representation using discrete z-transform. Second, the vocal and non-vocal segments with extremely short duration are uncommon. To ensure a more stable melody contour, we design a differentiable loss function that prevents the model from predicting such segments. We apply these modifications to several models, including MSNet, FTANet, and a newly introduced model, PianoNet, modified from a piano transcription network. Our experimental results demonstrate that the proposed modifications are empirically effective for singing melody extraction.
    摘要 在深度学习研究中,许多旋律抽取模型依靠重新设计神经网络架构来提升性能。本文基于以下两个假设,提出输入特征修改与训练目标修改。其一,音频声谱图中的谐波沿频率轴快速衰减;为增强模型对尾部谐波的敏感度,我们用离散 z 变换修改组合频率与周期性(CFP)表示。其二,时长极短的声/非声片段很少见;为得到更稳定的旋律轮廓,我们设计了可导的损失函数,防止模型预测出此类片段。我们将这些修改应用于多个模型,包括 MSNet、FTANet 以及由钢琴转录网络改造而来的新模型 PianoNet。实验结果表明,所提修改对歌唱旋律抽取切实有效。
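下面是一个假设性的可导惩罚项示意(并非论文中实际设计的损失函数):对帧级"是否有人声"概率序列施加二阶差分惩罚,剧烈的升降跳变(即极短片段)会使惩罚增大,从而抑制模型输出此类片段。

```python
import numpy as np

def short_segment_penalty(p):
    # 离散二阶差分:平滑序列的二阶差分接近 0,
    # 极短片段造成的突升突降会使其显著增大
    d2 = p[2:] - 2.0 * p[1:-1] + p[:-2]
    return float(np.mean(d2 ** 2))

smooth = np.linspace(0.0, 1.0, 50)                 # 平滑的声/非声过渡
spiky = smooth.copy()
spiky[20:22] = 1.0 - spiky[20:22]                  # 人为插入 2 帧短片段
loss_smooth = short_segment_penalty(smooth)
loss_spiky = short_segment_penalty(spiky)
```

该惩罚对概率逐点可导,因而可直接加入基于梯度的训练目标。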

Fluid Property Prediction Leveraging AI and Robotics

  • paper_url: http://arxiv.org/abs/2308.02715
  • repo_url: https://github.com/baratilab/vid2visc
  • paper_authors: Jong Hoon Park, Gauri Pramod Dalwankar, Alison Bartsch, Abraham George, Amir Barati Farimani
  • for: 这篇论文旨在开发一种基于视觉的液体属性推断方法,以便在自动化液体处理系统中直接从视觉信息中推断液体属性。
  • methods: 该方法使用 3D 卷积自编码器学习视频中液体振荡模式的潜在表示,再据此从视觉上推断液体类别或其动力粘度。
  • results: 实验与对比分析表明,该方法能够从视频中准确推断液体类别或动力粘度,并可显著加快液体表征过程。
    Abstract Inferring liquid properties from vision is a challenging task due to the complex nature of fluids, both in behavior and detection. Nevertheless, the ability to infer their properties directly from visual information is highly valuable for autonomous fluid handling systems, as cameras are readily available. Moreover, predicting fluid properties purely from vision can accelerate the process of fluid characterization saving considerable time and effort in various experimental environments. In this work, we present a purely vision-based approach to estimate viscosity, leveraging the fact that the behavior of the fluid oscillations is directly related to the viscosity. Specifically, we utilize a 3D convolutional autoencoder to learn latent representations of different fluid-oscillating patterns present in videos. We leverage this latent representation to visually infer the category of fluid or the dynamics viscosity of fluid from video.
    摘要 由于流体在行为与检测上的复杂性,从视觉推断液体属性是一项颇具挑战的任务。然而,摄像头随处可得,能够直接从视觉信息推断流体属性对自动化流体处理系统极具价值;仅凭视觉预测流体属性还能加快流体表征过程,在各类实验环境中节省大量时间和精力。在本工作中,我们提出一种纯视觉方法来估计粘度,其依据是流体振荡行为与粘度直接相关。具体而言,我们用 3D 卷积自编码器学习视频中不同液体振荡模式的潜在表示,再利用该潜在表示从视频中推断液体类别或其动力粘度。
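论文依据的物理直觉是"振荡行为与粘度直接相关"。下面是一个大幅简化的 numpy 示意(并非论文的 3D 卷积自编码器):把液体抖动近似为阻尼振荡,从一维振荡信号的对数包络斜率估计阻尼率(阻尼率与粘度相关),其中信号与参数均为假设构造。

```python
import numpy as np

def estimate_damping(signal, t):
    # 取局部极大值作为包络样本,对 log 包络做线性拟合,
    # 斜率的相反数即为阻尼率估计
    peaks = [i for i in range(1, len(signal) - 1)
             if signal[i] > signal[i - 1] and signal[i] > signal[i + 1]]
    slope, _ = np.polyfit(t[peaks], np.log(signal[peaks]), 1)
    return -slope

t = np.linspace(0.0, 5.0, 2000)
true_damping = 0.8
# 合成的阻尼振荡信号,模拟液体表面抖动幅度随时间衰减
signal = np.exp(-true_damping * t) * np.abs(np.cos(12.0 * t)) + 1e-9
est = estimate_damping(signal, t)
```

实际方法改以自编码器的潜在表示端到端地捕获这类衰减模式,无需显式的包络拟合。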

Exploring the Effect of Sparse Recovery on the Quality of Image Superresolution

  • paper_url: http://arxiv.org/abs/2308.02714
  • repo_url: None
  • paper_authors: Antonio Castro
  • for: 该论文研究字典学习用于图像超分辨的问题。
  • methods: 该方法从高/低分辨率图像对中学习一对耦合的图像块词典,使对应的图像块对在两个词典下共享同一个稀疏向量,从而能基于稀疏恢复由低分辨率输入重建对应的高分辨率图像块。
  • results: 论文通过实证实验研究了不同稀疏恢复算法对重建图像质量的影响,并寻找最适合此目的的稀疏恢复算法。
    Abstract Dictionary learning can be used for image superresolution by learning a pair of coupled dictionaries of image patches from high-resolution and low-resolution image pairs such that the corresponding pairs share the same sparse vector when represented by the coupled dictionaries. These dictionaries then can be used to to reconstruct the corresponding high-resolution patches from low-resolution input images based on sparse recovery. The idea is to recover the shared sparse vector using the low-resolution dictionary and then multiply it by the high-resolution dictionary to recover the corresponding high-resolution image patch. In this work, we study the effect of the sparse recovery algorithm that we use on the quality of the reconstructed images. We offer empirical experiments to search for the best sparse recovery algorithm that can be used for this purpose.
    摘要 字典学习可用于图像超分辨:从高分辨率与低分辨率图像对中学习一对耦合的图像块词典,使对应的图像块对在两个词典下共享同一个稀疏向量。随后即可基于稀疏恢复,用这对词典从低分辨率输入图像重建对应的高分辨率图像块:先用低分辨率词典恢复共享的稀疏向量,再将其乘以高分辨率词典,得到对应的高分辨率图像块。在本工作中,我们研究所用稀疏恢复算法对重建图像质量的影响,并通过实证实验寻找最适合此目的的稀疏恢复算法。
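耦合词典超分辨的核心流程可以用如下示意性代码表达(非论文代码,词典为随机构造;为便于演示,低分辨词典取为正交词典,此时 OMP 可精确恢复;实际方法中两个词典由图像块对联合学习得到):

```python
import numpy as np

rng = np.random.default_rng(1)
n_atoms, dim_hi, k = 32, 64, 3
D_lo, _ = np.linalg.qr(rng.normal(size=(n_atoms, n_atoms)))  # 低分辨词典(正交)
D_hi = rng.normal(size=(dim_hi, n_atoms))                    # 高分辨词典

def omp(D, y, k):
    # 正交匹配追踪:贪心选取与残差最相关的原子,并做最小二乘更新
    residual, support = y.copy(), []
    coef = np.zeros(0)
    for _ in range(k):
        support.append(int(np.argmax(np.abs(D.T @ residual))))
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    alpha = np.zeros(D.shape[1])
    alpha[support] = coef
    return alpha

alpha_true = np.zeros(n_atoms)
alpha_true[[3, 10, 20]] = [1.5, -2.0, 0.7]   # 共享的稀疏表示
patch_lo = D_lo @ alpha_true                 # 低分辨观测
alpha_hat = omp(D_lo, patch_lo, k)           # 用低分辨词典恢复稀疏向量
patch_hi = D_hi @ alpha_hat                  # 乘以高分辨词典得到高分辨图像块
err = float(np.linalg.norm(patch_hi - D_hi @ alpha_true))
```

论文比较的正是此处 `omp` 这一环节:换用不同稀疏恢复算法,考察其对重建质量的影响。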

Scalable Computation of Causal Bounds

  • paper_url: http://arxiv.org/abs/2308.02709
  • repo_url: None
  • paper_authors: Madhumitha Shridharan, Garud Iyengar
  • for: computing bounds for causal queries on causal graphs with unobserved confounders and discrete valued observed variables, where identifiability does not hold.
  • methods: significantly pruning a linear programming (LP) formulation to compute bounds, and extending the pruning methodology to fractional LPs for incorporating additional observations.
  • results: significant runtime improvement compared to benchmarks in experiments, and proposal of an efficient greedy heuristic for high-quality bounds that scales to larger problems.
    Abstract We consider the problem of computing bounds for causal queries on causal graphs with unobserved confounders and discrete valued observed variables, where identifiability does not hold. Existing non-parametric approaches for computing such bounds use linear programming (LP) formulations that quickly become intractable for existing solvers because the size of the LP grows exponentially in the number of edges in the causal graph. We show that this LP can be significantly pruned, allowing us to compute bounds for significantly larger causal inference problems compared to existing techniques. This pruning procedure allows us to compute bounds in closed form for a special class of problems, including a well-studied family of problems where multiple confounded treatments influence an outcome. We extend our pruning methodology to fractional LPs which compute bounds for causal queries which incorporate additional observations about the unit. We show that our methods provide significant runtime improvement compared to benchmarks in experiments and extend our results to the finite data setting. For causal inference without additional observations, we propose an efficient greedy heuristic that produces high quality bounds, and scales to problems that are several orders of magnitude larger than those for which the pruned LP can be solved.
    摘要 我们考虑在含未观测混杂因素、观测变量取离散值且可识别性不成立的因果图上,为因果查询计算界的问题。现有的非参数方法用线性规划(LP)来计算这类界,但 LP 的规模随因果图边数呈指数增长,很快就超出现有求解器的能力。我们证明该 LP 可以被大幅剪枝,从而能比现有技术处理大得多的因果推断问题。对一类特殊问题(包括被广泛研究的"多个受混杂的处理共同影响一个结局"的问题族),这种剪枝还能给出闭式形式的界。我们进一步把剪枝方法推广到分数线性规划,用以计算纳入关于个体的额外观测的因果查询的界。实验表明,与基准方法相比,我们的方法带来显著的运行时改进;我们还把结果推广到有限数据情形。对不含额外观测的因果推断,我们提出一种高效的贪心启发式算法,它能产生高质量的界,并可扩展到比剪枝 LP 可解问题大几个数量级的问题。
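"可识别性不成立时仍可给出区间界"这一思想,可用最简单的经典例子说明(这只是无假设的 Manski 界,并非论文的剪枝 LP):对二值处理 X 与二值结局 Y,在允许任意未观测混杂的情况下,干预量 P(Y=1 | do(X=1)) 被观测分布夹在一个闭式区间内。

```python
# Manski 无假设界:
#   P(Y=1, X=1) <= P(Y=1 | do(X=1)) <= P(Y=1, X=1) + P(X=0)
# 直觉:未接受处理的个体(概率 P(X=0))在反事实下的结局完全未知,
# 只能取 0 或 1 两个极端,由此得到区间的上下端点。
def manski_bounds(p_y1_given_x1, p_x1):
    joint = p_y1_given_x1 * p_x1      # P(Y=1, X=1)
    return joint, joint + (1.0 - p_x1)

lo, hi = manski_bounds(p_y1_given_x1=0.7, p_x1=0.6)
```

论文处理的是一般因果图上的同类界:把上述"对未知反事实取极端"的枚举写成 LP,再对该 LP 做剪枝使其可扩展。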

FPR Estimation for Fraud Detection in the Presence of Class-Conditional Label Noise

  • paper_url: http://arxiv.org/abs/2308.02695
  • repo_url: https://github.com/jtittelfitz/fpr-estimation
  • paper_authors: Justin Tittelfitz
  • for: 该论文研究当验证集中存在错误标签(标签噪声)时,如何估计二分类模型的假阳性率(FPR)与真阳性率(TPR)。
  • methods: 论文分析了力求最小化总清洗错误的现有清洗方法,以及用模型自身直接清洗其验证数据的做法,并考察它们对 FPR/TPR 估计的影响。
  • results: 论文表明,即使总清洗错误很低,用模型直接清洗自身验证数据也会低估真实的 FPR 或 TPR;这说明需要在减少总错误之外,设法使清洗错误与模型分数去相关。
    Abstract We consider the problem of estimating the false-/ true-positive-rate (FPR/TPR) for a binary classification model when there are incorrect labels (label noise) in the validation set. Our motivating application is fraud prevention where accurate estimates of FPR are critical to preserving the experience for good customers, and where label noise is highly asymmetric. Existing methods seek to minimize the total error in the cleaning process - to avoid cleaning examples that are not noise, and to ensure cleaning of examples that are. This is an important measure of accuracy but insufficient to guarantee good estimates of the true FPR or TPR for a model, and we show that using the model to directly clean its own validation data leads to underestimates even if total error is low. This indicates a need for researchers to pursue methods that not only reduce total error but also seek to de-correlate cleaning error with model scores.
    摘要 我们考虑当验证集中存在错误标签(标签噪声)时,估计二分类模型假阳性率/真阳性率(FPR/TPR)的问题。我们的动机应用是欺诈防控:准确的 FPR 估计对维护优质客户的体验至关重要,而此场景中的标签噪声高度不对称。现有方法力求最小化清洗过程中的总错误——既避免清洗并非噪声的样本,又确保清洗确属噪声的样本。这是重要的准确性度量,但不足以保证模型真实 FPR 或 TPR 的良好估计;我们证明,即便总错误很低,用模型直接清洗其自身的验证数据也会导致低估。这表明研究者不仅要减少总错误,还应设法使清洗错误与模型分数去相关。
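论文的核心观察("自清洗导致低估")可以用一个假设性的蒙特卡洛模拟来直观呈现(非论文实验,所有分布参数均为演示用假设):用模型分数清洗验证标签时,清洗错误与分数相关,会系统性地把本应计入 FPR 的样本移出统计。

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000
y_true = (rng.random(n) < 0.1).astype(int)        # 真实标签(10% 正类)
score = 0.25 * y_true + 0.15 * rng.normal(size=n) # 模型分数,与标签相关但有噪声
pred = (score > 0.2).astype(int)

fpr_true = pred[y_true == 0].mean()               # 用真实标签算出的 FPR

# 标签噪声:5% 的负样本被错标为正
y_noisy = y_true.copy()
flip = (y_true == 0) & (rng.random(n) < 0.05)
y_noisy[flip] = 1
# "自清洗":分数低且标签为正的样本被改回负——
# 清洗错误与模型分数相关,导致 FPR 被系统性低估
y_clean = y_noisy.copy()
y_clean[(score < 0.2) & (y_noisy == 1)] = 0
fpr_cleaned = pred[y_clean == 0].mean()
```

在该模拟中 `fpr_cleaned` 低于 `fpr_true`,与论文"即便总清洗错误低也会低估"的结论方向一致。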

Explainable Deep Learning-based Solar Flare Prediction with post hoc Attention for Operational Forecasting

  • paper_url: http://arxiv.org/abs/2308.02682
  • repo_url: https://bitbucket.org/gsudmlab/explainingfulldisk
  • paper_authors: Chetraj Pandey, Rafal A. Angryk, Manolis K. Georgoulis, Berkay Aydin
  • for: 该论文旨在对一种基于深度学习的全日面太阳耀斑预测模型做事后(post hoc)可解释性分析,该模型预测未来 24 小时内是否发生 $\geq$M1.0 级耀斑。
  • methods: 该论文使用自定义的数据增强与样本加权来应对固有的类别不均衡问题,并以真实技能统计量(TSS)和 Heidke 技能分数(HSS)作为评估指标;再用三种事后注意力方法(Guided Grad-CAM、Deep SHAP、积分梯度)解释模型。
  • results: 分析显示,全日面耀斑预测与活动区相关特征一致。主要发现包括:(1) 全日面模型能够切实定位并预测临边(near-limb)耀斑,这是业务化耀斑预报的关键能力;(2) 候选模型取得平均 TSS=0.51$\pm$0.05 与 HSS=0.38$\pm$0.08。
    Abstract This paper presents a post hoc analysis of a deep learning-based full-disk solar flare prediction model. We used hourly full-disk line-of-sight magnetogram images and selected binary prediction mode to predict the occurrence of $\geq$M1.0-class flares within 24 hours. We leveraged custom data augmentation and sample weighting to counter the inherent class-imbalance problem and used true skill statistic and Heidke skill score as evaluation metrics. Recent advancements in gradient-based attention methods allow us to interpret models by sending gradient signals to assign the burden of the decision on the input features. We interpret our model using three post hoc attention methods: (i) Guided Gradient-weighted Class Activation Mapping, (ii) Deep Shapley Additive Explanations, and (iii) Integrated Gradients. Our analysis shows that full-disk predictions of solar flares align with characteristics related to the active regions. The key findings of this study are: (1) We demonstrate that our full disk model can tangibly locate and predict near-limb solar flares, which is a critical feature for operational flare forecasting, (2) Our candidate model achieves an average TSS=0.51$\pm$0.05 and HSS=0.38$\pm$0.08, and (3) Our evaluation suggests that these models can learn conspicuous features corresponding to active regions from full-disk magnetograms.
    摘要 本文对一种基于深度学习的全日面太阳耀斑预测模型进行事后分析。我们使用逐小时的全日面视向磁图,并选用二分类预测模式,预测未来 24 小时内是否发生 $\geq$M1.0 级耀斑。我们利用自定义的数据增强与样本加权来应对固有的类别不均衡问题,并以真实技能统计量(TSS)和 Heidke 技能分数(HSS)作为评估指标。基于梯度的注意力方法的最新进展,使我们可以通过向输入特征回传梯度信号来归因决策,从而解释模型。我们用三种事后注意力方法解释模型:(i) Guided Gradient-weighted Class Activation Mapping;(ii) Deep Shapley Additive Explanations;(iii) Integrated Gradients。分析表明,全日面耀斑预测与活动区相关特征一致。本研究的主要发现为:(1) 我们的全日面模型能够切实定位并预测临边耀斑,这是业务化耀斑预报的关键能力;(2) 候选模型取得平均 TSS=0.51$\pm$0.05 与 HSS=0.38$\pm$0.08;(3) 评估表明,这些模型能从全日面磁图中学到与活动区对应的显著特征。
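上文列出的三种事后注意力方法中,积分梯度(Integrated Gradients)最易于自包含地演示。下面是一个极简 numpy 示意:沿基线到输入的直线路径累积梯度,并验证其完备性公理(各维归因之和等于模型输出之差)。此处用一个简单可微函数替代论文中的深度模型,仅为说明方法本身。

```python
import numpy as np

def f(x):
    # 演示用的简单可微"模型"
    return float(np.sum(np.tanh(x)))

def grad_f(x):
    return 1.0 - np.tanh(x) ** 2

def integrated_gradients(x, baseline, steps=512):
    # 中点黎曼和近似路径积分:IG_i = (x_i - x'_i) * 平均梯度_i
    alphas = (np.arange(steps) + 0.5) / steps
    path = baseline + alphas[:, None] * (x - baseline)
    avg_grad = grad_f(path).mean(axis=0)
    return (x - baseline) * avg_grad

x = np.array([0.8, -1.2, 0.3])
baseline = np.zeros(3)
ig = integrated_gradients(x, baseline)
# 完备性公理:sum_i IG_i = f(x) - f(baseline)
gap = abs(float(ig.sum()) - (f(x) - f(baseline)))
```

对深度模型只需把 `grad_f` 换成自动微分得到的输入梯度,流程不变。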

A Review of Change of Variable Formulas for Generative Modeling

  • paper_url: http://arxiv.org/abs/2308.02652
  • repo_url: None
  • paper_authors: Ullrich Köthe
  • for: 这篇论文是为了总结 Change-of-variables(CoV)公式的各种应用和特性,并提供一个系统性的对待方法。
  • methods: 论文使用了encoder/decoder架构来总结CoV公式,并收集了28种CoV公式在一处。
  • results: 论文揭示了看似迥异的方法之间的有趣联系,强调了文献中并不总是明晰的重要区别,并指出了未来研究中值得填补的空白。
    Abstract Change-of-variables (CoV) formulas allow to reduce complicated probability densities to simpler ones by a learned transformation with tractable Jacobian determinant. They are thus powerful tools for maximum-likelihood learning, Bayesian inference, outlier detection, model selection, etc. CoV formulas have been derived for a large variety of model types, but this information is scattered over many separate works. We present a systematic treatment from the unifying perspective of encoder/decoder architectures, which collects 28 CoV formulas in a single place, reveals interesting relationships between seemingly diverse methods, emphasizes important distinctions that are not always clear in the literature, and identifies surprising gaps for future research.
    摘要 变量替换(CoV)公式通过一个雅可比行列式可计算的学习变换,把复杂的概率密度化归为更简单的密度,因而是最大似然学习、贝叶斯推断、离群检测、模型选择等任务的有力工具。CoV 公式已针对多种模型类型被推导出来,但这些结果散落在许多独立的工作中。我们从编码器/解码器架构这一统一视角出发给出系统性梳理,把 28 个 CoV 公式汇集一处,揭示看似迥异的方法之间的有趣联系,强调文献中并不总是明晰的重要区别,并指出未来研究中令人意外的空白。
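CoV 公式的最简单一维实例可以直接数值验证:若 y = g(x) = exp(x) 且 x ~ N(0,1),则 p_Y(y) = p_X(g⁻¹(y)) · |dg⁻¹/dy| = p_X(log y)/y(即对数正态密度)。下面的 numpy 片段按该公式构造 p_Y,并用梯形法检验其积分为 1。

```python
import numpy as np

def p_x(x):
    # 标准正态密度
    return np.exp(-0.5 * x ** 2) / np.sqrt(2.0 * np.pi)

def p_y(y):
    # 变量替换公式:p_Y(y) = p_X(log y) * |d(log y)/dy| = p_X(log y) / y
    return p_x(np.log(y)) / y

# 数值验证:p_Y 在 (0, 60] 上的积分应接近 1(尾部质量可忽略)
y = np.linspace(1e-6, 60.0, 400_000)
vals = p_y(y)
total = float(np.sum((vals[1:] + vals[:-1]) * 0.5 * np.diff(y)))
```

归一化流等生成模型把同一恒等式用于高维可逆变换,此时 |dg⁻¹/dy| 换成雅可比行列式的绝对值。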

ReCLIP: Refine Contrastive Language Image Pre-Training with Source Free Domain Adaptation

  • paper_url: http://arxiv.org/abs/2308.03793
  • repo_url: None
  • paper_authors: Xuefeng Hu, Ke Zhang, Lu Xia, Albert Chen, Jiajia Luo, Yuyin Sun, Ken Wang, Nan Qiao, Xiao Zeng, Min Sun, Cheng-Hao Kuo, Ram Nevatia
  • for: 提升 CLIP 模型在下游目标域上的表现,解决视觉-文本域差与跨模态错位问题。
  • methods: 提出一种无需源数据或目标标注数据的无源域自适应方法:先学习一个投影空间以缓解视觉-文本嵌入错位并生成伪标签,再通过基于伪标签的跨模态自训练更新视觉与文本编码器,迭代细化标签并缩小域差与错位。
  • results: 在22张图像分类benchmark上,使用ReCLIP方法可以将CLIP模型的平均错误率从30.17%降低至25.06%。
    Abstract Large-scale Pre-Training Vision-Language Model such as CLIP has demonstrated outstanding performance in zero-shot classification, e.g. achieving 76.3% top-1 accuracy on ImageNet without seeing any example, which leads to potential benefits to many tasks that have no labeled data. However, while applying CLIP to a downstream target domain, the presence of visual and text domain gaps and cross-modality misalignment can greatly impact the model performance. To address such challenges, we propose ReCLIP, the first source-free domain adaptation method for vision-language models, which does not require any source data or target labeled data. ReCLIP first learns a projection space to mitigate the misaligned visual-text embeddings and learns pseudo labels, and then deploys cross-modality self-training with the pseudo labels, to update visual and text encoders, refine labels and reduce domain gaps and misalignments iteratively. With extensive experiments, we demonstrate ReCLIP reduces the average error rate of CLIP from 30.17% to 25.06% on 22 image classification benchmarks.
    摘要 大规模预训练视觉语言模型(如 CLIP)在零样本分类上表现突出,例如在未见任何样例的情况下于 ImageNet 上取得 76.3% 的 top-1 准确率,这有望惠及许多没有标注数据的任务。然而,把 CLIP 应用到下游目标域时,视觉与文本的域差以及跨模态错位会严重影响模型表现。为应对这些挑战,我们提出 ReCLIP——首个面向视觉语言模型的无源域自适应方法,它既不需要源数据,也不需要目标域的标注数据。ReCLIP 首先学习一个投影空间以缓解视觉-文本嵌入的错位并生成伪标签,随后利用伪标签进行跨模态自训练,迭代地更新视觉与文本编码器、细化标签并缩小域差与错位。大量实验证明,ReCLIP 在 22 个图像分类基准上把 CLIP 的平均错误率从 30.17% 降至 25.06%。
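ReCLIP 自训练所依赖的基本操作是 CLIP 式零样本伪标签:图像嵌入与各类别文本嵌入做余弦相似度,取最大者作为伪标签。下面的 numpy 片段仅演示这一操作的流程(并非 ReCLIP 实现,所有嵌入均为随机构造的演示数据):

```python
import numpy as np

rng = np.random.default_rng(0)
n_classes, dim = 4, 32
# 演示用的"类别文本嵌入",单位化后余弦相似度退化为内积
text_emb = rng.normal(size=(n_classes, dim))
text_emb /= np.linalg.norm(text_emb, axis=1, keepdims=True)

# 构造 3 张"图像"的嵌入:各自靠近某个类别的文本嵌入
true_cls = np.array([2, 0, 3])
img_emb = text_emb[true_cls] + 0.05 * rng.normal(size=(3, dim))
img_emb /= np.linalg.norm(img_emb, axis=1, keepdims=True)

sim = img_emb @ text_emb.T            # 余弦相似度矩阵
pseudo_labels = sim.argmax(axis=1)    # 伪标签,供后续自训练使用
```

ReCLIP 的贡献在于先用学习到的投影空间缓解两种模态嵌入的错位,使此处的伪标签更可靠,再以其驱动跨模态自训练。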

Learning from Topology: Cosmological Parameter Estimation from the Large-scale Structure

  • paper_url: http://arxiv.org/abs/2308.02636
  • repo_url: None
  • paper_authors: Jacky H. T. Yip, Adam Rouhiainen, Gary Shiu
  • for: 研究宇宙大尺度结构的拓扑特征,从中提取宇宙学参数的信息。
  • methods: 使用神经网络模型把持续图像(persistence image)映射到宇宙学参数。
  • results: 参数恢复测试表明,该模型能准确而精密地估计宇宙学参数,明显优于传统的贝叶斯推断方法。
    Abstract The topology of the large-scale structure of the universe contains valuable information on the underlying cosmological parameters. While persistent homology can extract this topological information, the optimal method for parameter estimation from the tool remains an open question. To address this, we propose a neural network model to map persistence images to cosmological parameters. Through a parameter recovery test, we demonstrate that our model makes accurate and precise estimates, considerably outperforming conventional Bayesian inference approaches.
    摘要 宇宙大尺度结构的拓扑蕴含着关于底层宇宙学参数的宝贵信息。持续同调(persistent homology)可以提取这种拓扑信息,但如何基于该工具进行最优的参数估计仍是悬而未决的问题。为此,我们提出一个神经网络模型,把持续图像(persistence image)映射到宇宙学参数。参数恢复测试表明,我们的模型能给出准确而精密的估计,明显优于传统的贝叶斯推断方法。
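神经网络的输入"持续图像"由持续图(persistence diagram)栅格化而来。下面是该通用转换的极简 numpy 示意(并非论文的完整流水线,分辨率、带宽与权重函数均为演示用假设):把每个 (birth, death) 点对映射到 (birth, persistence) 平面,以持续时间加权的高斯叠加栅格化为固定尺寸图像。

```python
import numpy as np

def persistence_image(diagram, res=16, sigma=0.1, lim=1.0):
    births = diagram[:, 0]
    pers = diagram[:, 1] - diagram[:, 0]   # persistence = death - birth
    xs = np.linspace(0.0, lim, res)
    gx, gy = np.meshgrid(xs, xs, indexing="ij")
    img = np.zeros((res, res))
    for b, p in zip(births, pers):
        # 常用的线性权重:持续越久的拓扑特征权重越大
        img += p * np.exp(-((gx - b) ** 2 + (gy - p) ** 2) / (2.0 * sigma ** 2))
    return img

# 演示用持续图:三个 (birth, death) 点对
diagram = np.array([[0.1, 0.6], [0.3, 0.4], [0.2, 0.9]])
img = persistence_image(diagram)
```

这样得到的固定尺寸图像可直接作为卷积网络的输入,用于回归宇宙学参数。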

MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities

  • paper_url: http://arxiv.org/abs/2308.02490
  • repo_url: https://github.com/yuweihao/mm-vet
  • paper_authors: Weihao Yu, Zhengyuan Yang, Linjie Li, Jianfeng Wang, Kevin Lin, Zicheng Liu, Xinchao Wang, Lijuan Wang
  • for: 本文提出 MM-Vet——一个评估大型多模态模型(LMM)的基准,用于考察 LMM 在复杂多模态任务上的能力。
  • methods: 本文基于 6 种核心视觉语言(VL)能力定义评估框架,考察由这些能力组合衍生出的 16 种受关注的能力整合;并提出基于大语言模型(LLM)的开放式输出评估器,使不同题型与答案风格可用统一的评分指标衡量。
  • results: 论文在 MM-Vet 上评估了具有代表性的 LMM,为不同 LMM 系统范式与模型的能力提供了洞见。
    Abstract We propose MM-Vet, an evaluation benchmark that examines large multimodal models (LMMs) on complicated multimodal tasks. Recent LMMs have shown various intriguing abilities, such as solving math problems written on the blackboard, reasoning about events and celebrities in news images, and explaining visual jokes. Rapid model advancements pose challenges to evaluation benchmark development. Problems include: (1) How to systematically structure and evaluate the complicated multimodal tasks; (2) How to design evaluation metrics that work well across question and answer types; and (3) How to give model insights beyond a simple performance ranking. To this end, we present MM-Vet, designed based on the insight that the intriguing ability to solve complicated tasks is often achieved by a generalist model being able to integrate different core vision-language (VL) capabilities. MM-Vet defines 6 core VL capabilities and examines the 16 integrations of interest derived from the capability combination. For evaluation metrics, we propose an LLM-based evaluator for open-ended outputs. The evaluator enables the evaluation across different question types and answer styles, resulting in a unified scoring metric. We evaluate representative LMMs on MM-Vet, providing insights into the capabilities of different LMM system paradigms and models. Code and data are available at https://github.com/yuweihao/MM-Vet.
    摘要 我们提出 MM-Vet,一个考察大型多模态模型(LMM)在复杂多模态任务上表现的评估基准。近期的 LMM 展现出多种有趣的能力,例如求解写在黑板上的数学题、推理新闻图片中的事件与名人、解释视觉笑话等。模型的快速进步给评估基准的构建带来挑战,问题包括:(1) 如何系统地组织并评估复杂的多模态任务;(2) 如何设计在不同题型与答案风格间都适用的评估指标;(3) 如何在简单的性能排名之外提供对模型的洞见。为此,我们基于如下洞察设计 MM-Vet:解决复杂任务的有趣能力,往往来自通才模型对不同核心视觉语言(VL)能力的整合。MM-Vet 定义了 6 种核心 VL 能力,并考察由能力组合衍生出的 16 种受关注的能力整合。在评估指标方面,我们提出基于大语言模型(LLM)的开放式输出评估器,它可跨不同题型与答案风格进行评估,从而得到统一的评分指标。我们在 MM-Vet 上评估了具有代表性的 LMM,为不同 LMM 系统范式与模型的能力提供洞见。代码与数据见 https://github.com/yuweihao/MM-Vet 。

Generation of Realistic Synthetic Raw Radar Data for Automated Driving Applications using Generative Adversarial Networks

  • paper_url: http://arxiv.org/abs/2308.02632
  • repo_url: None
  • paper_authors: Eduardo C. Fidelis, Fabio Reway, Herick Y. S. Ribeiro, Pietro L. Campos, Werner Huber, Christian Icking, Lester A. Faria, Torsten Schön
  • for: 该论文旨在提供一种更快的 FMCW 雷达仿真方法,用生成对抗网络(GAN)生成合成的雷达原始数据。
  • methods: 这个方法使用 GAN 生成雷达数据,并使用距离和高斯噪声作为输入参数。
  • results: 研究发现,使用这种方法可以生成高度真实的雷达数据,包括雷达反射和背景噪声。此外,这种方法还可以增加数据的扩充,例如生成不可能或安全关键的场景数据。
    Abstract The main approaches for simulating FMCW radar are based on ray tracing, which is usually computationally intensive and do not account for background noise. This work proposes a faster method for FMCW radar simulation capable of generating synthetic raw radar data using generative adversarial networks (GAN). The code and pre-trained weights are open-source and available on GitHub. This method generates 16 simultaneous chirps, which allows the generated data to be used for the further development of algorithms for processing radar data (filtering and clustering). This can increase the potential for data augmentation, e.g., by generating data in non-existent or safety-critical scenarios that are not reproducible in real life. In this work, the GAN was trained with radar measurements of a motorcycle and used to generate synthetic raw radar data of a motorcycle traveling in a straight line. For generating this data, the distance of the motorcycle and Gaussian noise are used as input to the neural network. The synthetic generated radar chirps were evaluated using the Frechet Inception Distance (FID). Then, the Range-Azimuth (RA) map is calculated twice: first, based on synthetic data using this GAN and, second, based on real data. Based on these RA maps, an algorithm with adaptive threshold and edge detection is used for object detection. The results have shown that the data is realistic in terms of coherent radar reflections of the motorcycle and background noise based on the comparison of chirps, the RA maps and the object detection results. Thus, the proposed method in this work has shown to minimize the simulation-to-reality gap for the generation of radar data.
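The abstract evaluates the synthetic chirps with the Fréchet Inception Distance (FID). As a minimal sketch of that metric (the feature vectors below are random stand-ins, not real radar embeddings or Inception features), the FID between two sets of feature vectors is the Fréchet distance between Gaussians fitted to each set:

```python
import numpy as np

def _sqrtm_psd(mat):
    """Matrix square root of a symmetric PSD matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(mat)
    vals = np.clip(vals, 0.0, None)
    return vecs @ np.diag(np.sqrt(vals)) @ vecs.T

def frechet_distance(feats_a, feats_b):
    """Frechet distance between Gaussians fitted to two (n, d) feature sets."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    # Tr((Sa Sb)^(1/2)) computed via the symmetric form (Sa^(1/2) Sb Sa^(1/2))^(1/2)
    root_a = _sqrtm_psd(cov_a)
    tr_covmean = np.trace(_sqrtm_psd(root_a @ cov_b @ root_a))
    return float(np.sum((mu_a - mu_b) ** 2)
                 + np.trace(cov_a) + np.trace(cov_b) - 2.0 * tr_covmean)

rng = np.random.default_rng(0)
same = rng.normal(size=(500, 8))
shifted = rng.normal(loc=2.0, size=(500, 8))
print(frechet_distance(same, same))      # near 0: identical distributions
print(frechet_distance(same, shifted))   # large: the distributions differ
```

A low FID between synthetic and real chirp features is what the paper uses as evidence that the GAN output matches the real data distribution.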

BlindSage: Label Inference Attacks against Node-level Vertical Federated Graph Neural Networks

  • paper_url: http://arxiv.org/abs/2308.02465
  • repo_url: None
  • paper_authors: Marco Arazzi, Mauro Conti, Stefanos Koffas, Marina Krcek, Antonino Nocera, Stjepan Picek, Jing Xu
  • for: This study investigates label inference attacks on Vertical Federated Learning (VFL), focusing on the case where the attacker has no background knowledge at all.
  • methods: The attack follows a zero-background-knowledge strategy against VFL, targeting Graph Neural Networks (GNNs) on node classification tasks.
  • results: Experiments show that the proposed attack, BlindSage, accurately infers labels in VFL, reaching nearly 100% accuracy in most cases. Moreover, well-known defense mechanisms cannot mitigate the attack without degrading the model's performance on the main classification task.
    Abstract Federated learning enables collaborative training of machine learning models by keeping the raw data of the involved workers private. One of its main objectives is to improve the models' privacy, security, and scalability. Vertical Federated Learning (VFL) offers an efficient cross-silo setting where a few parties collaboratively train a model without sharing the same features. In such a scenario, classification labels are commonly considered sensitive information held exclusively by one (active) party, while other (passive) parties use only their local information. Recent works have uncovered important flaws of VFL, leading to possible label inference attacks under the assumption that the attacker has some, even limited, background knowledge on the relation between labels and data. In this work, we are the first (to the best of our knowledge) to investigate label inference attacks on VFL using a zero-background knowledge strategy. To concretely formulate our proposal, we focus on Graph Neural Networks (GNNs) as a target model for the underlying VFL. In particular, we refer to node classification tasks, which are widely studied, and GNNs have shown promising results. Our proposed attack, BlindSage, provides impressive results in the experiments, achieving nearly 100% accuracy in most cases. Even when the attacker has no information about the used architecture or the number of classes, the accuracy remained above 85% in most instances. Finally, we observe that well-known defenses cannot mitigate our attack without affecting the model's performance on the main classification task.
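BlindSage's zero-knowledge attack is considerably more involved, but the basic reason labels can leak through federated training signals can be seen in a classic toy illustration (hypothetical numbers; this is not the paper's method): for softmax cross-entropy, the gradient with respect to the logits is p − y, so only the true class receives a negative gradient entry.

```python
import numpy as np

rng = np.random.default_rng(8)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Toy illustration of why shared gradients leak labels in split/federated
# training (a classic observation, not the BlindSage attack itself).
n_classes = 5
logits = rng.normal(size=n_classes)
true_label = 3
y = np.eye(n_classes)[true_label]

# Gradient of the cross-entropy loss w.r.t. the logits: p - y.
grad = softmax(logits) - y

# Only the true class has a negative gradient entry, so an attacker
# observing this gradient can read the label off directly.
inferred = int(np.argmin(grad))
print("inferred label:", inferred)  # 3
```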

Universal Approximation of Linear Time-Invariant (LTI) Systems through RNNs: Power of Randomness in Reservoir Computing

  • paper_url: http://arxiv.org/abs/2308.02464
  • repo_url: None
  • paper_authors: Shashank Jere, Lizhong Zheng, Karim Said, Lingjia Liu
  • for: This paper aims to provide a principled account of how recurrent neural networks (RNNs) can approximate linear time-invariant (LTI) systems.
  • methods: The paper uses reservoir computing (RC), a special class of RNNs whose recurrent weights are randomized and left untrained.
  • results: The authors show that RC with randomly initialized recurrent weights can universally approximate a general LTI system, and use a signal processing analysis to explain RC's behavior on the LTI simulation problem. They also derive the optimal probability distribution for generating the RC's recurrent weights and validate its optimality through extensive numerical evaluations.
    Abstract Recurrent neural networks (RNNs) are known to be universal approximators of dynamic systems under fairly mild and general assumptions, making them good tools to process temporal information. However, RNNs usually suffer from the issues of vanishing and exploding gradients in the standard RNN training. Reservoir computing (RC), a special RNN where the recurrent weights are randomized and left untrained, has been introduced to overcome these issues and has demonstrated superior empirical performance in fields as diverse as natural language processing and wireless communications especially in scenarios where training samples are extremely limited. On the contrary, the theoretical grounding to support this observed performance has not been fully developed at the same pace. In this work, we show that RNNs can provide universal approximation of linear time-invariant (LTI) systems. Specifically, we show that RC can universally approximate a general LTI system. We present a clear signal processing interpretation of RC and utilize this understanding in the problem of simulating a generic LTI system through RC. Under this setup, we analytically characterize the optimal probability distribution function for generating the recurrent weights of the underlying RNN of the RC. We provide extensive numerical evaluations to validate the optimality of the derived optimum distribution of the recurrent weights of the RC for the LTI system simulation problem. Our work results in clear signal processing-based model interpretability of RC and provides theoretical explanation for the power of randomness in setting instead of training RC's recurrent weights. It further provides a complete optimum analytical characterization for the untrained recurrent weights, marking an important step towards explainable machine learning (XML) which is extremely important for applications where training samples are limited.
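As a hedged illustration of the paper's setting (random, untrained recurrent weights plus a trained linear readout; the target filter, reservoir size, and scaling below are arbitrary choices, not the paper's), a linear echo state network can closely reproduce a simple LTI system:

```python
import numpy as np

rng = np.random.default_rng(1)
n_res, T = 100, 2000

# Target: a simple LTI system, y[t] = 0.8*y[t-1] + 0.2*u[t] (first-order low-pass)
u = rng.uniform(-1, 1, size=T)
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.8 * y[t - 1] + 0.2 * u[t]

# Reservoir: random, untrained recurrent weights scaled below unit spectral radius
W = rng.normal(size=(n_res, n_res))
W *= 0.95 / max(abs(np.linalg.eigvals(W)))
w_in = rng.normal(size=n_res)

# Drive the reservoir with a linear activation (matching the LTI setting)
X = np.zeros((T, n_res))
for t in range(1, T):
    X[t] = W @ X[t - 1] + w_in * u[t]

# Train only the linear readout, via ridge regression
w_out = np.linalg.solve(X.T @ X + 1e-6 * np.eye(n_res), X.T @ y)
pred = X @ w_out
print("train MSE:", np.mean((pred - y) ** 2))
```

Only `w_out` is learned; the recurrent weights stay at their random initialization, which is exactly the regime whose optimal weight distribution the paper characterizes.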

Fast and Accurate Reduced-Order Modeling of a MOOSE-based Additive Manufacturing Model with Operator Learning

  • paper_url: http://arxiv.org/abs/2308.02462
  • repo_url: None
  • paper_authors: Mahmoud Yaseen, Dewen Yushu, Peter German, Xu Wu
  • for: This work aims to develop a fast and accurate reduced-order model (ROM) for an additive manufacturing (AM) model, reducing the computational burden of AM control and optimization processes.
  • methods: The study adopts an operator learning (OL) approach to learn a family of differential equations produced by altering process variables in the laser's Gaussian point heat source. Specifically, the authors used the Fourier neural operator (FNO) and the deep operator network (DeepONet) to develop ROMs for time-dependent responses.
  • results: Compared to a conventional deep neural network (DNN)-based ROM, the OL methods offer comparable performance and even outperform the DNN in accuracy and generalizability. Both FNO and DeepONet can predict time-series data without requiring dimensionality reduction techniques.
    Abstract One predominant challenge in additive manufacturing (AM) is to achieve specific material properties by manipulating manufacturing process parameters during the runtime. Such manipulation tends to increase the computational load imposed on existing simulation tools employed in AM. The goal of the present work is to construct a fast and accurate reduced-order model (ROM) for an AM model developed within the Multiphysics Object-Oriented Simulation Environment (MOOSE) framework, ultimately reducing the time/cost of AM control and optimization processes. Our adoption of the operator learning (OL) approach enabled us to learn a family of differential equations produced by altering process variables in the laser's Gaussian point heat source. More specifically, we used the Fourier neural operator (FNO) and deep operator network (DeepONet) to develop ROMs for time-dependent responses. Furthermore, we benchmarked the performance of these OL methods against a conventional deep neural network (DNN)-based ROM. Ultimately, we found that OL methods offer comparable performance and, in terms of accuracy and generalizability, even outperform DNN at predicting scalar model responses. The DNN-based ROM afforded the fastest training time. Furthermore, all the ROMs were faster than the original MOOSE model yet still provided accurate predictions. FNO had a smaller mean prediction error than DeepONet, with a larger variance for time-dependent responses. Unlike DNN, both FNO and DeepONet were able to simulate time series data without the need for dimensionality reduction techniques. The present work can help facilitate the AM optimization process by enabling faster execution of simulation tools while still preserving evaluation accuracy.
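As a minimal sketch of the DeepONet architecture the paper employs (random untrained weights and hypothetical layer sizes; training and the actual AM inputs are omitted), the operator output G(u)(y) is the inner product of a branch net encoding the input function and a trunk net encoding the query coordinate:

```python
import numpy as np

rng = np.random.default_rng(2)

def mlp(sizes, rng):
    """Randomly initialised MLP; returns a forward function (tanh hidden layers)."""
    params = [(rng.normal(size=(m, n)) / np.sqrt(m), np.zeros(n))
              for m, n in zip(sizes[:-1], sizes[1:])]
    def forward(x):
        for i, (W, b) in enumerate(params):
            x = x @ W + b
            if i < len(params) - 1:
                x = np.tanh(x)
        return x
    return forward

# DeepONet: branch net encodes the input function u sampled at m sensors,
# trunk net encodes the query coordinate y; output is their inner product.
m_sensors, p_latent = 50, 32
branch = mlp([m_sensors, 64, p_latent], rng)
trunk = mlp([1, 64, p_latent], rng)

def deeponet(u_sensors, y_queries):
    b = branch(u_sensors)               # (batch, p)
    t = trunk(y_queries)                # (n_queries, p)
    return b @ t.T                      # (batch, n_queries): G(u)(y)

u = rng.normal(size=(4, m_sensors))     # 4 input functions, e.g. process profiles
y = np.linspace(0, 1, 20)[:, None]      # 20 query times
out = deeponet(u, y)
print(out.shape)                        # (4, 20)
```

Because the trunk net takes arbitrary query coordinates, the ROM can evaluate time-dependent responses directly, which is why no dimensionality reduction of the time series is needed.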

Nonprehensile Planar Manipulation through Reinforcement Learning with Multimodal Categorical Exploration

  • paper_url: http://arxiv.org/abs/2308.02459
  • repo_url: None
  • paper_authors: Juan Del Aguila Ferrandis, João Moura, Sethu Vijayakumar
  • for: The goal of this paper is to develop robot controllers capable of dexterous nonprehensile manipulation, such as pushing an object on a table.
  • methods: The paper uses a reinforcement learning (RL) framework and proposes a multimodal exploration approach, based on categorical distributions, to learn nonprehensile pushing policies.
  • results: The authors validate the approach in simulation and on physical hardware, showing that the learned policies handle arbitrary initial and target object poses with high accuracy and remain robust to external disturbances and observation noise.
    Abstract Developing robot controllers capable of achieving dexterous nonprehensile manipulation, such as pushing an object on a table, is challenging. The underactuated and hybrid-dynamics nature of the problem, further complicated by the uncertainty resulting from the frictional interactions, requires sophisticated control behaviors. Reinforcement Learning (RL) is a powerful framework for developing such robot controllers. However, previous RL literature addressing the nonprehensile pushing task achieves low accuracy, non-smooth trajectories, and only simple motions, i.e. without rotation of the manipulated object. We conjecture that previously used unimodal exploration strategies fail to capture the inherent hybrid-dynamics of the task, arising from the different possible contact interaction modes between the robot and the object, such as sticking, sliding, and separation. In this work, we propose a multimodal exploration approach through categorical distributions, which enables us to train planar pushing RL policies for arbitrary starting and target object poses, i.e. positions and orientations, and with improved accuracy. We show that the learned policies are robust to external disturbances and observation noise, and scale to tasks with multiple pushers. Furthermore, we validate the transferability of the learned policies, trained entirely in simulation, to a physical robot hardware using the KUKA iiwa robot arm. See our supplemental video: https://youtu.be/vTdva1mgrk4.
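The multimodal categorical exploration idea can be sketched as follows (the bin count and logits are illustrative, not the paper's): each continuous action dimension is discretised into bins, and the policy outputs a categorical distribution over them, which, unlike a unimodal Gaussian, can place mass on two distinct contact strategies at once:

```python
import numpy as np

rng = np.random.default_rng(3)

# Discretise one action dimension (e.g. pusher velocity) into bins and
# parameterise the policy as a categorical distribution over those bins.
n_bins = 11
bin_centers = np.linspace(-1.0, 1.0, n_bins)

# Hypothetical logits with two modes (e.g. "push left" vs "push right"),
# something a single Gaussian policy cannot represent.
logits = -10.0 * np.minimum((bin_centers - 0.6) ** 2, (bin_centers + 0.6) ** 2)
probs = np.exp(logits - logits.max())
probs /= probs.sum()

actions = bin_centers[rng.choice(n_bins, size=5000, p=probs)]
print("fraction near -0.6:", np.mean(np.abs(actions + 0.6) < 0.3))
print("fraction near +0.6:", np.mean(np.abs(actions - 0.6) < 0.3))
```

Sampling covers both modes, which is the property the paper argues is needed to explore the distinct contact interaction modes (sticking, sliding, separation).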

Uncertainty Estimation and Propagation in Accelerated MRI Reconstruction

  • paper_url: http://arxiv.org/abs/2308.02631
  • repo_url: https://github.com/paulkogni/mr-recon-uq
  • paper_authors: Paul Fischer, Thomas Küstner, Christian F. Baumgartner
  • for: This paper proposes a deep-learning-based probabilistic MRI reconstruction technique that improves reconstruction quality, particularly in highly accelerated settings.
  • methods: The paper introduces PHiRec, a novel probabilistic reconstruction technique built on conditional hierarchical variational autoencoders.
  • results: PHiRec produces high-quality reconstructions while quantifying reconstruction uncertainty with substantially better calibration than several strong baselines. The paper further shows how uncertainties arising in MR reconstruction can be propagated to a downstream segmentation task, with PHiRec providing well-calibrated segmentation uncertainty estimates.
    Abstract MRI reconstruction techniques based on deep learning have led to unprecedented reconstruction quality especially in highly accelerated settings. However, deep learning techniques are also known to fail unexpectedly and hallucinate structures. This is particularly problematic if reconstructions are directly used for downstream tasks such as real-time treatment guidance or automated extraction of clinical parameters (e.g. via segmentation). Well-calibrated uncertainty quantification will be a key ingredient for safe use of this technology in clinical practice. In this paper we propose a novel probabilistic reconstruction technique (PHiRec) building on the idea of conditional hierarchical variational autoencoders. We demonstrate that our proposed method produces high-quality reconstructions as well as uncertainty quantification that is substantially better calibrated than several strong baselines. We furthermore demonstrate how uncertainties arising in the MR reconstruction can be propagated to a downstream segmentation task, and show that PHiRec also allows well-calibrated estimation of segmentation uncertainties that originated in the MR reconstruction process.
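Propagating reconstruction uncertainty into segmentation can be sketched with a generic sampling approach (a stand-in soft-edged disc image and a threshold segmenter, not PHiRec's actual networks): draw several plausible reconstructions, segment each one, and read per-pixel uncertainty off the disagreement between the resulting masks:

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical stand-in for a probabilistic reconstructor: returns n plausible
# reconstructions of the same scan (a soft-edged disc plus noise).
def sample_reconstructions(n_samples, size=32):
    yy, xx = np.mgrid[:size, :size]
    r = np.hypot(yy - size / 2, xx - size / 2)
    disc = 1.0 / (1.0 + np.exp((r - size / 4) / 1.5))
    return disc + 0.2 * rng.normal(size=(n_samples, size, size))

recons = sample_reconstructions(100)

# Downstream segmentation: simple thresholding applied to every sample.
segs = (recons > 0.5).astype(float)

# Propagated uncertainty: per-pixel foreground frequency across samples.
p_fg = segs.mean(axis=0)
seg_uncertainty = p_fg * (1 - p_fg)   # Bernoulli variance, peaks at the edge

print("max uncertainty:", seg_uncertainty.max())        # highest near the disc edge
print("interior uncertainty:", seg_uncertainty[16, 16]) # confident in the interior
```

The segmentation uncertainty map is high exactly where the reconstructions disagree, which is the kind of calibrated downstream uncertainty the paper evaluates.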

Generative Modelling of Lévy Area for High Order SDE Simulation

  • paper_url: http://arxiv.org/abs/2308.02452
  • repo_url: None
  • paper_authors: Andraž Jelinčič, Jiajie Tao, William F. Turner, Thomas Cass, James Foster, Hao Ni
  • for: This paper addresses the accurate high-order numerical solution of stochastic differential equations (SDEs).
  • methods: The paper proposes a deep-learning-based model that generates samples of Brownian motion's "Lévy area" conditional on a Brownian increment, with output samples matching all joint and conditional odd moments exactly.
  • results: For 4-dimensional Brownian motion, the model achieves state-of-the-art performance across several metrics. A numerical experiment on the log-Heston model from mathematical finance demonstrates that high-quality synthetic Lévy area leads to high-order weak convergence and variance reduction under multilevel Monte Carlo (MLMC).
    Abstract It is well known that, when numerically simulating solutions to SDEs, achieving a strong convergence rate better than O(√h) (where h is the step size) requires the use of certain iterated integrals of Brownian motion, commonly referred to as its "Lévy areas". However, these stochastic integrals are difficult to simulate due to their non-Gaussian nature and for a d-dimensional Brownian motion with d > 2, no fast almost-exact sampling algorithm is known. In this paper, we propose LévyGAN, a deep-learning-based model for generating approximate samples of Lévy area conditional on a Brownian increment. Due to our "Bridge-flipping" operation, the output samples match all joint and conditional odd moments exactly. Our generator employs a tailored GNN-inspired architecture, which enforces the correct dependency structure between the output distribution and the conditioning variable. Furthermore, we incorporate a mathematically principled characteristic-function based discriminator. Lastly, we introduce a novel training mechanism termed "Chen-training", which circumvents the need for expensive-to-generate training data-sets. This new training procedure is underpinned by our two main theoretical results. For 4-dimensional Brownian motion, we show that LévyGAN exhibits state-of-the-art performance across several metrics which measure both the joint and marginal distributions. We conclude with a numerical experiment on the log-Heston model, a popular SDE in mathematical finance, demonstrating that high-quality synthetic Lévy area can lead to high order weak convergence and variance reduction when using multilevel Monte Carlo (MLMC).
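The "Lévy area" of a 2-dimensional Brownian motion is the iterated integral A = ½∫(W¹dW² − W²dW¹). A brute-force Monte Carlo discretisation (the slow baseline that a fast generative sampler is meant to replace) recovers its basic statistics, e.g. Var(A) = t²/4 at t = 1:

```python
import numpy as np

rng = np.random.default_rng(5)
n_paths, n_steps, t_end = 5000, 500, 1.0
dt = t_end / n_steps

# Brownian increments for a 2-D Brownian motion
dW = np.sqrt(dt) * rng.normal(size=(n_paths, n_steps, 2))
W = np.cumsum(dW, axis=1)
W_prev = np.concatenate([np.zeros((n_paths, 1, 2)), W[:, :-1]], axis=1)

# Levy area A = 1/2 * int(W1 dW2 - W2 dW1), left-point (Ito) discretisation
A = 0.5 * np.sum(W_prev[:, :, 0] * dW[:, :, 1]
                 - W_prev[:, :, 1] * dW[:, :, 0], axis=1)

print("mean:", A.mean())        # close to 0
print("variance:", A.var())     # close to t^2/4 = 0.25 for t = 1
```

Each sample here costs a full fine-grained path simulation, which is exactly the expense that conditional generative sampling of the Lévy area avoids.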

Pruning a neural network using Bayesian inference

  • paper_url: http://arxiv.org/abs/2308.02451
  • repo_url: None
  • paper_authors: Sunil Mathew, Daniel B. Rowe
  • for: To reduce the computational and memory demands of large neural networks.
  • methods: Neural network pruning guided by Bayesian inference, which integrates seamlessly into the training procedure.
  • results: The proposed pruning method achieves the desired levels of sparsity while maintaining competitive accuracy.
    Abstract Neural network pruning is a highly effective technique aimed at reducing the computational and memory demands of large neural networks. In this research paper, we present a novel approach to pruning neural networks utilizing Bayesian inference, which can seamlessly integrate into the training procedure. Our proposed method leverages the posterior probabilities of the neural network prior to and following pruning, enabling the calculation of Bayes factors. The calculated Bayes factors guide the iterative pruning. Through comprehensive evaluations conducted on multiple benchmarks, we demonstrate that our method achieves desired levels of sparsity while maintaining competitive accuracy.
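A toy analogue of Bayes-factor-guided pruning (a Bayesian linear model with closed-form evidence standing in for the paper's neural network posteriors; the prior precision `alpha` and noise precision `beta` are illustrative): compare the model evidence with and without each weight, and prune the weights whose removal the Bayes factor favours:

```python
import numpy as np

rng = np.random.default_rng(6)

def log_evidence(X, y, alpha=1.0, beta=25.0):
    """Log marginal likelihood of Bayesian linear regression,
    prior w ~ N(0, I/alpha), Gaussian noise with precision beta."""
    n, d = X.shape
    A = alpha * np.eye(d) + beta * X.T @ X
    m = beta * np.linalg.solve(A, X.T @ y)
    return (0.5 * d * np.log(alpha) + 0.5 * n * np.log(beta)
            - 0.5 * beta * np.sum((y - X @ m) ** 2) - 0.5 * alpha * m @ m
            - 0.5 * np.log(np.linalg.det(A)) - 0.5 * n * np.log(2 * np.pi))

# Toy data: only the first two of five weights matter
X = rng.normal(size=(200, 5))
y = X @ np.array([2.0, -1.5, 0.0, 0.0, 0.0]) + 0.2 * rng.normal(size=200)

full = log_evidence(X, y)
for j in range(5):
    pruned = log_evidence(np.delete(X, j, axis=1), y)
    log_bf = pruned - full            # > 0 favours pruning weight j
    print(f"weight {j}: log Bayes factor = {log_bf:.1f}")
```

Removing an irrelevant weight leaves the fit intact while shrinking the Occam penalty, so the Bayes factor favours pruning it; removing a relevant weight collapses the evidence, so it is kept — the same decision rule the paper applies iteratively during training.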

From Military to Healthcare: Adopting and Expanding Ethical Principles for Generative Artificial Intelligence

  • paper_url: http://arxiv.org/abs/2308.02448
  • repo_url: None
  • paper_authors: David Oniani, Jordan Hilsman, Yifan Peng, COL, Ronald K. Poropatich, COL Jeremy C. Pamplin, LTC Gary L. Legault, Yanshan Wang
  • for: The paper proposes ethical principles for the use of generative AI in healthcare, addressing concerns about transparency, bias, and ethical dilemmas.
  • methods: The paper comprehensively reviews existing literature on generative AI and healthcare to identify key ethical challenges, and proposes the GREAT PLEA ethical principles as a framework for addressing them.
  • results: The paper provides a proactive approach to addressing the ethical dilemmas posed by the integration of generative AI in healthcare, with the goal of ensuring that the technology is used in a responsible and equitable manner.
    Abstract In 2020, the U.S. Department of Defense officially disclosed a set of ethical principles to guide the use of Artificial Intelligence (AI) technologies on future battlefields. Despite stark differences, there are core similarities between the military and medical service. Warriors on battlefields often face life-altering circumstances that require quick decision-making. Medical providers experience similar challenges in a rapidly changing healthcare environment, such as in the emergency department or during surgery treating a life-threatening condition. Generative AI, an emerging technology designed to efficiently generate valuable information, holds great promise. As computing power becomes more accessible and the abundance of health data, such as electronic health records, electrocardiograms, and medical images, increases, it is inevitable that healthcare will be revolutionized by this technology. Recently, generative AI has captivated the research community, leading to debates about its application in healthcare, mainly due to concerns about transparency and related issues. Meanwhile, concerns about the potential exacerbation of health disparities due to modeling biases have raised notable ethical concerns regarding the use of this technology in healthcare. However, the ethical principles for generative AI in healthcare have been understudied, and decision-makers often fail to consider the significance of generative AI. In this paper, we propose GREAT PLEA ethical principles, encompassing governance, reliability, equity, accountability, traceability, privacy, lawfulness, empathy, and autonomy, for generative AI in healthcare. We aim to proactively address the ethical dilemmas and challenges posed by the integration of generative AI in healthcare.

Adaptive Preferential Attached kNN Graph with Distribution-Awareness

  • paper_url: http://arxiv.org/abs/2308.02442
  • repo_url: https://github.com/4alexmin/knnsotas
  • paper_authors: Shaojie Min, Ji Liu
  • for: To improve generalization in machine learning tasks, particularly when facing complex data distributions.
  • methods: Distribution-aware adaptive-k kNN graph construction, which "pulls" ambiguous samples towards their original classes and thereby improves overall generalization.
  • results: Across diverse experimental datasets, paNNG outperforms state-of-the-art algorithms, demonstrating its adaptability and efficacy in various real-world scenarios.
    Abstract Graph-based kNN algorithms have garnered widespread popularity for machine learning tasks due to their simplicity and effectiveness. However, as factual data often inherit complex distributions, the conventional kNN graph's reliance on a unified k-value can hinder its performance. A crucial factor behind this challenge is the presence of ambiguous samples along decision boundaries that are inevitably more prone to incorrect classifications. To address the situation, we propose the Preferential Attached k-Nearest Neighbors Graph (paNNG), which adopts distribution-aware adaptive-k into graph construction. By incorporating distribution information as a cohesive entity, paNNG can significantly improve performance on ambiguous samples by "pulling" them towards their original classes and hence enhance overall generalization capability. Through rigorous evaluations on diverse datasets, paNNG outperforms state-of-the-art algorithms, showcasing its adaptability and efficacy across various real-world scenarios.
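The abstract does not spell out paNNG's adaptive-k rule, so the sketch below uses a plausible density-aware stand-in (the k-range, density proxy, and scaling are assumptions, not the paper's actual construction): instead of one unified k, each point's k is chosen from local distribution information when building the kNN graph:

```python
import numpy as np

rng = np.random.default_rng(7)

# Two clusters with different densities
dense = rng.normal(0.0, 0.3, size=(60, 2))
sparse = rng.normal(4.0, 1.2, size=(20, 2))
X = np.vstack([dense, sparse])
n = len(X)

dists = np.linalg.norm(X[:, None] - X[None, :], axis=2)
np.fill_diagonal(dists, np.inf)

# Distribution-aware adaptive k (an illustrative rule): use each point's
# distance to its 5th neighbour as an inverse density proxy and give
# denser points a larger k, clipped to [k_min, k_max].
k_min, k_max = 3, 15
r5 = np.sort(dists, axis=1)[:, 4]
density = 1.0 / r5
scaled = (density - density.min()) / (density.max() - density.min())
k_adaptive = (k_min + scaled * (k_max - k_min)).round().astype(int)

# Build the adjacency list of the adaptive kNN graph
neighbours = [np.argsort(dists[i])[:k_adaptive[i]] for i in range(n)]
print("mean k, dense cluster:", k_adaptive[:60].mean())
print("mean k, sparse cluster:", k_adaptive[60:].mean())
```

The point of the sketch is only that k varies with the local distribution rather than being unified, which is the aspect of paNNG the abstract emphasises.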