cs.LG - 2023-07-31

Classification with Deep Neural Networks and Logistic Loss

  • paper_url: http://arxiv.org/abs/2307.16792
  • repo_url: https://github.com/himanshub1007/Alzhimers-Disease-Prediction-Using-Deep-learning
  • paper_authors: Zihan Zhang, Lei Shi, Ding-Xuan Zhou
  • for: This paper studies the generalization analysis of deep neural networks (DNNs) for binary classification.
  • methods: The paper establishes a novel oracle-type inequality and uses it to derive sharp convergence rates for DNN classifiers trained with the logistic loss (a small training sketch follows the abstract below).
  • results: The paper provides new generalization results, including convergence rates for DNN classifiers that are optimal up to log factors and, under a compositional assumption on the conditional class probability, independent of the input dimension. These results help explain why DNN classifiers perform well in practical high-dimensional classification tasks.
    Abstract Deep neural networks (DNNs) trained with the logistic loss (i.e., the cross entropy loss) have made impressive advancements in various binary classification tasks. However, generalization analysis for binary classification with DNNs and logistic loss remains scarce. The unboundedness of the target function for the logistic loss is the main obstacle to deriving satisfying generalization bounds. In this paper, we aim to fill this gap by establishing a novel and elegant oracle-type inequality, which enables us to deal with the boundedness restriction of the target function, and using it to derive sharp convergence rates for fully connected ReLU DNN classifiers trained with logistic loss. In particular, we obtain optimal convergence rates (up to log factors) only requiring the H\"older smoothness of the conditional class probability $\eta$ of data. Moreover, we consider a compositional assumption that requires $\eta$ to be the composition of several vector-valued functions of which each component function is either a maximum value function or a H\"older smooth function only depending on a small number of its input variables. Under this assumption, we derive optimal convergence rates (up to log factors) which are independent of the input dimension of data. This result explains why DNN classifiers can perform well in practical high-dimensional classification problems. Besides the novel oracle-type inequality, the sharp convergence rates given in our paper also owe to a tight error bound for approximating the natural logarithm function near zero (where it is unbounded) by ReLU DNNs. In addition, we justify our claims for the optimality of rates by proving corresponding minimax lower bounds. All these results are new in the literature and will deepen our theoretical understanding of classification with DNNs.
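
A minimal sketch (not from the paper) of the object the theory studies: a fully connected ReLU network trained with the logistic (cross-entropy) loss on binary labels. The data-generating probability `eta` and all sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

d, n = 20, 1000                       # input dimension, sample size (illustrative values)
X = torch.randn(n, d)
# Hypothetical conditional class probability eta(x); the paper only assumes Holder smoothness.
eta = torch.sigmoid(X[:, 0] - 0.5 * X[:, 1])
y = torch.bernoulli(eta)

model = nn.Sequential(                # fully connected ReLU DNN
    nn.Linear(d, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 1),
)
loss_fn = nn.BCEWithLogitsLoss()      # logistic loss on the network output f(x)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(500):
    opt.zero_grad()
    loss = loss_fn(model(X).squeeze(1), y)
    loss.backward()
    opt.step()

# The plug-in classifier is sign(f(x)); its excess risk is what the convergence rates bound.
pred = (model(X).squeeze(1) > 0).float()
print("training error:", (pred != y).float().mean().item())
```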

ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs

  • paper_url: http://arxiv.org/abs/2307.16789
  • repo_url: https://github.com/openbmb/toolbench
  • paper_authors: Yujia Qin, Shihao Liang, Yining Ye, Kunlun Zhu, Lan Yan, Yaxi Lu, Yankai Lin, Xin Cong, Xiangru Tang, Bill Qian, Sihan Zhao, Runchu Tian, Ruobing Xie, Jie Zhou, Mark Gerstein, Dahai Li, Zhiyuan Liu, Maosong Sun
  • for: This paper aims to equip open-source large language models (LLMs) with higher-level tool-use capabilities, so that they can follow human instructions and automatically select appropriate APIs.
  • methods: The paper proposes ToolLLM, a general tool-use framework covering data construction, model training, and evaluation. It introduces ToolBench, an instruction-tuning dataset built automatically with ChatGPT from 16,464 real-world RESTful APIs, and DFSDT, a depth-first search-based decision tree that improves the planning and reasoning abilities of LLMs (a search sketch follows the abstract below).
  • results: Fine-tuning LLaMA on ToolBench yields ToolLLaMA, which executes complex instructions, generalizes to unseen APIs, performs comparably to ChatGPT, and, combined with a neural API retriever, selects appropriate APIs without manual intervention.
    Abstract Despite the advancements of open-source large language models (LLMs) and their variants, e.g., LLaMA and Vicuna, they remain significantly limited in performing higher-level tasks, such as following human instructions to use external tools (APIs). This is because current instruction tuning largely focuses on basic language tasks instead of the tool-use domain. This is in contrast to state-of-the-art (SOTA) LLMs, e.g., ChatGPT, which have demonstrated excellent tool-use capabilities but are unfortunately closed source. To facilitate tool-use capabilities within open-source LLMs, we introduce ToolLLM, a general tool-use framework of data construction, model training and evaluation. We first present ToolBench, an instruction-tuning dataset for tool use, which is created automatically using ChatGPT. Specifically, we collect 16,464 real-world RESTful APIs spanning 49 categories from RapidAPI Hub, then prompt ChatGPT to generate diverse human instructions involving these APIs, covering both single-tool and multi-tool scenarios. Finally, we use ChatGPT to search for a valid solution path (chain of API calls) for each instruction. To make the searching process more efficient, we develop a novel depth-first search-based decision tree (DFSDT), enabling LLMs to evaluate multiple reasoning traces and expand the search space. We show that DFSDT significantly enhances the planning and reasoning capabilities of LLMs. For efficient tool-use assessment, we develop an automatic evaluator: ToolEval. We fine-tune LLaMA on ToolBench and obtain ToolLLaMA. Our ToolEval reveals that ToolLLaMA demonstrates a remarkable ability to execute complex instructions and generalize to unseen APIs, and exhibits comparable performance to ChatGPT. To make the pipeline more practical, we devise a neural API retriever to recommend appropriate APIs for each instruction, negating the need for manual API selection.
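
Hedged sketch of a DFSDT-style search: depth-first expansion over chains of API calls, where `propose_actions` and `is_solution` stand in for the LLM calls used in the paper (both are hypothetical placeholders, not ToolLLM's actual interface).

```python
from typing import Callable, List, Optional

def dfsdt(instruction: str,
          propose_actions: Callable[[str, List[str]], List[str]],
          is_solution: Callable[[str, List[str]], bool],
          max_depth: int = 5,
          beam: int = 3) -> Optional[List[str]]:
    """Return a chain of API calls solving `instruction`, or None."""
    def dfs(path: List[str], depth: int) -> Optional[List[str]]:
        if is_solution(instruction, path):
            return path
        if depth == max_depth:
            return None                      # abandon this reasoning trace, backtrack
        # Ask the model for a few candidate next calls and try them depth-first, which
        # lets the search drop a failing trace and expand the search space to others.
        for action in propose_actions(instruction, path)[:beam]:
            found = dfs(path + [action], depth + 1)
            if found is not None:
                return found
        return None
    return dfs([], 0)

# Toy usage with rule-based stand-ins for the LLM:
plan = dfsdt(
    "get weather then convert units",
    propose_actions=lambda ins, path: ["call:weather_api", "call:unit_convert"],
    is_solution=lambda ins, path: path == ["call:weather_api", "call:unit_convert"],
)
print(plan)
```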

Exploring how a Generative AI interprets music

  • paper_url: http://arxiv.org/abs/2308.00015
  • repo_url: None
  • paper_authors: Gabriela Barenboim, Luigi Del Debbio, Johannes Hirn, Veronica Sanz
  • for: To represent a few bars of music with Google's MusicVAE and identify the small set of latent features that carry the musical information.
  • methods: The authors use a Variational Auto-Encoder with a 512-dimensional latent space and rank the latent dimensions according to their relevance for describing music (a ranking sketch follows the abstract below).
  • results: Most latent neurons remain silent when fed real music tracks; only a few dozen independent "music neurons" fire. Most of the pitch and rhythm information is encoded in the first few music neurons, while the concept of melody only shows up in independent neurons for longer sequences of music.
    Abstract We use Google's MusicVAE, a Variational Auto-Encoder with a 512-dimensional latent space to represent a few bars of music, and organize the latent dimensions according to their relevance in describing music. We find that, on average, most latent neurons remain silent when fed real music tracks: we call these "noise" neurons. The remaining few dozens of latent neurons that do fire are called "music neurons". We ask which neurons carry the musical information and what kind of musical information they encode, namely something that can be identified as pitch, rhythm or melody. We find that most of the information about pitch and rhythm is encoded in the first few music neurons: the neural network has thus constructed a couple of variables that non-linearly encode many human-defined variables used to describe pitch and rhythm. The concept of melody only seems to show up in independent neurons for longer sequences of music.
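
Hedged numpy sketch of the neuron-ranking idea: given latent codes of real music (random stand-ins here; the paper uses MusicVAE's 512-dimensional encoder), rank the latent dimensions by how much they vary across real tracks and call the near-silent ones "noise neurons". The threshold and the fake encoder output are assumptions, not the paper's values.

```python
import numpy as np

rng = np.random.default_rng(0)
n_tracks, latent_dim = 2000, 512

# Stand-in for encoder outputs: a few informative dimensions, the rest close to the prior.
Z = rng.normal(0.0, 0.05, size=(n_tracks, latent_dim))
Z[:, :12] = rng.normal(0.0, 1.0, size=(n_tracks, 12))      # pretend "music neurons"

activity = Z.std(axis=0)                   # per-dimension spread over real tracks
order = np.argsort(activity)[::-1]         # most active dimensions first
threshold = 0.2                            # assumed cut between music and noise neurons
music_neurons = order[activity[order] > threshold]
noise_neurons = order[activity[order] <= threshold]

print(f"{len(music_neurons)} music neurons, {len(noise_neurons)} noise neurons")
print("top dimensions:", music_neurons[:10])
```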

Lossless Transformations and Excess Risk Bounds in Statistical Inference

  • paper_url: http://arxiv.org/abs/2307.16735
  • repo_url: None
  • paper_authors: László Györfi, Tamás Linder, Harro Walk
  • for: This paper studies the excess minimum risk in statistical inference, defined as the difference between the minimum expected loss in estimating a random variable from an observed feature vector and from a transformation (statistic) of that feature vector.
  • methods: The paper first characterizes lossless transformations, i.e., transformations for which the excess risk is zero for all loss functions, then constructs a partitioning test statistic for the hypothesis that a given transformation is lossless and shows that, for i.i.d. data, the test is strongly consistent.
  • results: The paper also derives information-theoretic upper bounds on the excess risk that hold uniformly over fairly general classes of loss functions. Based on these bounds, it introduces the notion of a delta-lossless transformation and gives sufficient conditions for a transformation to be universally delta-lossless.
    Abstract We study the excess minimum risk in statistical inference, defined as the difference between the minimum expected loss in estimating a random variable from an observed feature vector and the minimum expected loss in estimating the same random variable from a transformation (statistic) of the feature vector. After characterizing lossless transformations, i.e., transformations for which the excess risk is zero for all loss functions, we construct a partitioning test statistic for the hypothesis that a given transformation is lossless and show that for i.i.d. data the test is strongly consistent. More generally, we develop information-theoretic upper bounds on the excess risk that uniformly hold over fairly general classes of loss functions. Based on these bounds, we introduce the notion of a delta-lossless transformation and give sufficient conditions for a given transformation to be universally delta-lossless. Applications to classification, nonparametric regression, portfolio strategies, information bottleneck, and deep learning, are also surveyed.

An Efficient Shapley Value Computation for the Naive Bayes Classifier

  • paper_url: http://arxiv.org/abs/2307.16718
  • repo_url: None
  • paper_authors: Vincent Lemaire, Fabrice Clérot, Marc Boullé
  • for: The goal is to derive an exact analytic expression of Shapley values for the naive Bayes classifier, compare it analytically with the frequently used Weight of Evidence (WoE), and compare it empirically with KernelShap.
  • methods: The work builds on Shapley value estimation from cooperative game theory and proposes an exact analytic expression of Shapley values specialized to the naive Bayes classifier (an illustrative brute-force computation follows the abstract below).
  • results: On real-world datasets, the proposed Shapley formulation provides informative results and shows both similarities and differences with WoE and KernelShap. Its low algorithmic complexity allows it to run on very large datasets with extremely low computation time.
    Abstract Variable selection or importance measurement of input variables to a machine learning model has become the focus of much research. It is no longer enough to have a good model, one also must explain its decisions. This is why there are so many intelligibility algorithms available today. Among them, Shapley value estimation algorithms are intelligibility methods based on cooperative game theory. In the case of the naive Bayes classifier, and to our knowledge, there is no ``analytical" formulation of Shapley values. This article proposes an exact analytic expression of Shapley values in the special case of the naive Bayes Classifier. We analytically compare this Shapley proposal, to another frequently used indicator, the Weight of Evidence (WoE) and provide an empirical comparison of our proposal with (i) the WoE and (ii) KernelShap results on real world datasets, discussing similar and dissimilar results. The results show that our Shapley proposal for the naive Bayes classifier provides informative results with low algorithmic complexity so that it can be used on very large datasets with extremely low computation time.
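
Hedged illustration (not the paper's closed-form expression): brute-force Shapley values for a binary naive Bayes model where the value of a coalition S is the log-posterior-odds computed from the features in S only. Because naive Bayes makes the log-odds additive over features, the enumeration recovers per-feature log-likelihood-ratio terms, which is what makes a Weight-of-Evidence comparison natural. The numbers are made up.

```python
import numpy as np
from itertools import combinations
from math import factorial

log_prior_odds = 0.3
# Per-feature log likelihood ratios log P(x_i|y=1) - log P(x_i|y=0) for one instance.
llr = np.array([0.8, -0.4, 1.1])
d = len(llr)

def value(subset):
    """Log-odds of the positive class using only the features in `subset`."""
    return log_prior_odds + sum(llr[i] for i in subset)

def shapley(i):
    others = [j for j in range(d) if j != i]
    phi = 0.0
    for k in range(d):
        for S in combinations(others, k):
            w = factorial(k) * factorial(d - k - 1) / factorial(d)
            phi += w * (value(S + (i,)) - value(S))
    return phi

print([round(shapley(i), 3) for i in range(d)])   # equals llr because the game is additive
```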

Active Learning in Genetic Programming: Guiding Efficient Data Collection for Symbolic Regression

  • paper_url: http://arxiv.org/abs/2308.00672
  • repo_url: https://github.com/hoolagans/stackgp
  • paper_authors: Nathan Haut, Wolfgang Banzhaf, Bill Punch
  • for: This paper examines different ways of computing uncertainty and diversity for active learning in genetic programming.
  • methods: A model ensemble combined with an uncertainty metric is used to select informative training data points. Among several uncertainty metrics, differential entropy performed best. Two data diversity metrics are also compared: correlation works better than minimum Euclidean distance as a diversity metric, although some drawbacks prevent correlation from being used on all problems (a selection sketch follows the abstract below).
  • results: Uncertainty and diversity are combined through a Pareto optimization approach so that both are considered in a balanced way when selecting informative and unique training data points.
    Abstract This paper examines various methods of computing uncertainty and diversity for active learning in genetic programming. We found that the model population in genetic programming can be exploited to select informative training data points by using a model ensemble combined with an uncertainty metric. We explored several uncertainty metrics and found that differential entropy performed the best. We also compared two data diversity metrics and found that correlation as a diversity metric performs better than minimum Euclidean distance, although there are some drawbacks that prevent correlation from being used on all problems. Finally, we combined uncertainty and diversity using a Pareto optimization approach to allow both to be considered in a balanced way to guide the selection of informative and unique data points for training.
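
Hedged numpy sketch of the selection idea: score each candidate point by ensemble uncertainty (differential entropy of a Gaussian fit to the ensemble's predictions) and by diversity (low correlation of its ensemble-prediction vector with those of already-chosen points), then keep the Pareto-optimal candidates. The exact metrics and their combination in the paper may differ; this is an illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
n_models, n_candidates, n_selected = 20, 50, 5

preds = rng.normal(size=(n_candidates, n_models))        # ensemble predictions per candidate
selected = rng.normal(size=(n_selected, n_models))       # predictions at already-chosen points

# Uncertainty: differential entropy of N(mu, sigma^2) fitted per candidate.
sigma2 = preds.var(axis=1) + 1e-12
uncertainty = 0.5 * np.log(2 * np.pi * np.e * sigma2)

# Diversity: 1 - max absolute correlation with any already-selected point.
corr = np.corrcoef(np.vstack([preds, selected]))[:n_candidates, n_candidates:]
diversity = 1.0 - np.abs(corr).max(axis=1)

# Pareto front: candidates not strictly dominated in (uncertainty, diversity).
scores = np.stack([uncertainty, diversity], axis=1)
dominated = [(scores > s).all(axis=1).any() for s in scores]
pareto = np.where(~np.array(dominated))[0]
print("Pareto-optimal candidates:", pareto)
```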

An Empirical Study on Log-based Anomaly Detection Using Machine Learning

  • paper_url: http://arxiv.org/abs/2307.16714
  • repo_url: None
  • paper_authors: Shan Ali, Chaima Boufaied, Domenico Bianculli, Paula Branco, Lionel Briand, Nathan Aschbacher
  • for: This paper presents a comprehensive empirical study of how different models, covering both traditional machine learning (ML) and deep learning (DL) techniques, perform on log-based anomaly detection.
  • methods: The study evaluates a range of supervised and semi-supervised, traditional and deep ML techniques against four criteria: detection accuracy, time performance, and the sensitivity of both to hyperparameter tuning.
  • results: Supervised traditional and deep ML techniques perform very similarly in detection accuracy and prediction time, whereas semi-supervised techniques yield significantly worse detection accuracy. The techniques also differ considerably in their sensitivity to hyperparameter tuning, with supervised traditional ML techniques being less sensitive than deep learning ones.
    Abstract The growth of systems complexity increases the need of automated techniques dedicated to different log analysis tasks such as Log-based Anomaly Detection (LAD). The latter has been widely addressed in the literature, mostly by means of different deep learning techniques. Nevertheless, the focus on deep learning techniques results in less attention being paid to traditional Machine Learning (ML) techniques, which may perform well in many cases, depending on the context and the used datasets. Further, the evaluation of different ML techniques is mostly based on the assessment of their detection accuracy. However, this is is not enough to decide whether or not a specific ML technique is suitable to address the LAD problem. Other aspects to consider include the training and prediction time as well as the sensitivity to hyperparameter tuning. In this paper, we present a comprehensive empirical study, in which we evaluate different supervised and semi-supervised, traditional and deep ML techniques w.r.t. four evaluation criteria: detection accuracy, time performance, sensitivity of detection accuracy as well as time performance to hyperparameter tuning. The experimental results show that supervised traditional and deep ML techniques perform very closely in terms of their detection accuracy and prediction time. Moreover, the overall evaluation of the sensitivity of the detection accuracy of the different ML techniques to hyperparameter tuning shows that supervised traditional ML techniques are less sensitive to hyperparameter tuning than deep learning techniques. Further, semi-supervised techniques yield significantly worse detection accuracy than supervised techniques.

TFE-GNN: A Temporal Fusion Encoder Using Graph Neural Networks for Fine-grained Encrypted Traffic Classification

  • paper_url: http://arxiv.org/abs/2307.16713
  • repo_url: https://github.com/ViktorAxelsen/TFE-GNN
  • paper_authors: Haozhen Zhang, Le Yu, Xi Xiao, Qing Li, Francesco Mercaldo, Xiapu Luo, Qixu Liu
  • for: This paper proposes a byte-level traffic graph construction approach based on point-wise mutual information (PMI) and a GNN-based feature extraction model, the Temporal Fusion Encoder (TFE-GNN), for fine-grained encrypted traffic classification.
  • methods: The model combines the PMI-based byte-level traffic graph with a dual embedding layer, a GNN-based traffic graph encoder, and a cross-gated feature fusion mechanism that embeds header and payload bytes separately before fusing them (a PMI graph sketch follows the abstract below).
  • results: Experiments on two real-world datasets show that TFE-GNN outperforms multiple state-of-the-art methods on fine-grained encrypted traffic classification tasks.
    Abstract Encrypted traffic classification is receiving widespread attention from researchers and industrial companies. However, the existing methods only extract flow-level features, failing to handle short flows because of unreliable statistical properties, or treat the header and payload equally, failing to mine the potential correlation between bytes. Therefore, in this paper, we propose a byte-level traffic graph construction approach based on point-wise mutual information (PMI), and a model named Temporal Fusion Encoder using Graph Neural Networks (TFE-GNN) for feature extraction. In particular, we design a dual embedding layer, a GNN-based traffic graph encoder as well as a cross-gated feature fusion mechanism, which can first embed the header and payload bytes separately and then fuses them together to obtain a stronger feature representation. The experimental results on two real datasets demonstrate that TFE-GNN outperforms multiple state-of-the-art methods in fine-grained encrypted traffic classification tasks.
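
Hedged sketch of a PMI-based byte-level graph: count byte co-occurrences within a sliding window over a payload, compute point-wise mutual information for each byte pair, and keep an edge when PMI is positive. The window size and threshold are assumptions, not TFE-GNN's exact construction.

```python
import numpy as np
from collections import Counter

def pmi_byte_graph(payload: bytes, window: int = 5):
    single, pair, n_pairs = Counter(), Counter(), 0
    for i in range(len(payload)):
        single[payload[i]] += 1
        for j in range(i + 1, min(i + window, len(payload))):
            a, b = sorted((payload[i], payload[j]))
            pair[(a, b)] += 1
            n_pairs += 1
    n = len(payload)
    edges = []
    for (a, b), c_ab in pair.items():
        p_ab = c_ab / n_pairs
        p_a, p_b = single[a] / n, single[b] / n
        pmi = np.log(p_ab / (p_a * p_b))
        if pmi > 0:                      # positive association -> edge between byte nodes
            edges.append((a, b, pmi))
    return edges

edges = pmi_byte_graph(b"\x16\x03\x01\x02\x00\x01\x00\x01\xfc\x03\x03", window=5)
print(len(edges), "edges; sample:", edges[:3])
```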

Deep Learning Meets Adaptive Filtering: A Stein’s Unbiased Risk Estimator Approach

  • paper_url: http://arxiv.org/abs/2307.16708
  • repo_url: None
  • paper_authors: Zahra Esmaeilbeig, Mojtaba Soltanalian
  • for: This paper revisits two widely used adaptive filtering algorithms, recursive least squares (RLS) and equivariant adaptive source separation (EASI), in the context of source estimation and separation.
  • methods: Through algorithm unrolling, the iterations of RLS and EASI are transformed into layers of deep neural networks, yielding the task-based architectures Deep RLS and Deep EASI (a SURE-loss sketch follows the abstract below).
  • results: The authors propose training these unrolled networks with a loss grounded in Stein's unbiased risk estimator (SURE) and show empirically that this SURE-based approach improves source signal estimation.
    Abstract This paper revisits two prominent adaptive filtering algorithms through the lens of algorithm unrolling, namely recursive least squares (RLS) and equivariant adaptive source separation (EASI), in the context of source estimation and separation. Building upon the unrolling methodology, we introduce novel task-based deep learning frameworks, denoted as Deep RLS and Deep EASI. These architectures transform the iterations of the original algorithms into layers of a deep neural network, thereby enabling efficient source signal estimation by taking advantage of a training process. To further enhance performance, we propose training these deep unrolled networks utilizing a loss function grounded on a Stein's unbiased risk estimator (SURE). Our empirical evaluations demonstrate the efficacy of this SURE-based approach for enhanced source signal estimation.
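
Hedged numpy sketch of a SURE objective: Stein's unbiased risk estimate for a denoiser f applied to y = x + N(0, sigma^2 I), with the divergence term estimated by the standard Monte-Carlo trick. The paper plugs such a loss into unrolled Deep RLS / Deep EASI networks; here f is just a soft-threshold stand-in, not their architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma = 1000, 0.5
x = np.sign(rng.normal(size=n)) * rng.exponential(1.0, size=n)   # sparse-ish ground truth
y = x + sigma * rng.normal(size=n)

def f(y, thr=0.5):
    """Soft-threshold denoiser (placeholder for the unrolled network)."""
    return np.sign(y) * np.maximum(np.abs(y) - thr, 0.0)

def sure(y, f, sigma, eps=1e-3):
    b = rng.normal(size=y.shape)
    div = b @ (f(y + eps * b) - f(y)) / eps          # Monte-Carlo divergence estimate
    return np.sum((f(y) - y) ** 2) - y.size * sigma ** 2 + 2 * sigma ** 2 * div

print("SURE:", sure(y, f, sigma))
print("true SSE:", np.sum((f(y) - x) ** 2))          # SURE tracks this without knowing x
```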

Lookbehind Optimizer: k steps back, 1 step forward

  • paper_url: http://arxiv.org/abs/2307.16704
  • repo_url: None
  • paper_authors: Gonçalo Mordido, Pranshu Malviya, Aristide Baratin, Sarath Chandar
  • for: To stabilize the training of deep neural networks and improve the trade-off between loss value and loss sharpness.
  • methods: The paper combines the idea behind the Lookahead optimizer with sharpness-aware minimization (SAM). The proposed Lookbehind method takes k gradient ascent steps at each iteration and combines the resulting gradients to bias the descent step toward flatter minima (a toy sketch follows the abstract below).
  • results: Applied on top of SAM and adaptive SAM (ASAM), Lookbehind yields better generalization performance, greater robustness against noisy weights, and higher tolerance to catastrophic forgetting in lifelong learning settings, across a variety of tasks and training regimes.
    Abstract The Lookahead optimizer improves the training stability of deep neural networks by having a set of fast weights that "look ahead" to guide the descent direction. Here, we combine this idea with sharpness-aware minimization (SAM) to stabilize its multi-step variant and improve the loss-sharpness trade-off. We propose Lookbehind, which computes $k$ gradient ascent steps ("looking behind") at each iteration and combine the gradients to bias the descent step toward flatter minima. We apply Lookbehind on top of two popular sharpness-aware training methods -- SAM and adaptive SAM (ASAM) -- and show that our approach leads to a myriad of benefits across a variety of tasks and training regimes. Particularly, we show increased generalization performance, greater robustness against noisy weights, and higher tolerance to catastrophic forgetting in lifelong learning settings.
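
Toy numpy sketch of the k-ascent / 1-descent idea as described in the abstract: from the current weights take k small gradient-ascent (sharpness-probing) steps, average the gradients seen along that "look behind" trajectory, and use the average for one descent step. The real Lookbehind sits on top of SAM/ASAM and may combine the gradients differently; treat this purely as an illustration on a toy loss.

```python
import numpy as np

def grad(w):                      # gradient of a toy non-convex loss w^2 - cos(3w)
    return 2 * w + 3 * np.sin(3 * w)

def lookbehind_step(w, lr=0.05, rho=0.05, k=3):
    w_probe, grads = w.copy(), []
    for _ in range(k):            # k ascent steps, collecting gradients along the way
        g = grad(w_probe)
        grads.append(g)
        w_probe = w_probe + rho * g / (np.linalg.norm(g) + 1e-12)
    g_avg = np.mean(grads, axis=0)
    return w - lr * g_avg         # single descent step biased toward flatter regions

w = np.array([2.5])
for t in range(200):
    w = lookbehind_step(w, k=3)
print("final w:", w)
```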

A theory of data variability in Neural Network Bayesian inference

  • paper_url: http://arxiv.org/abs/2307.16695
  • repo_url: None
  • paper_authors: Javed Lindner, David Dahmen, Michael Krämer, Moritz Helias
  • for: This paper studies the generalization properties of infinitely wide neural networks and how they depend on the input dimensionality, the size of the training set, and the variability of the data.
  • methods: The analysis builds on Bayesian inference and kernel methods, in particular the neural network Gaussian process, and develops a field-theoretic formalism for the generalization properties of infinitely wide linear, non-linear, and deep non-linear networks with heterogeneous kernel matrices.
  • results: Data variability leads to a non-Gaussian action reminiscent of a ($\varphi^3+\varphi^4$)-theory. On a synthetic task and on MNIST, the formalism yields a homogeneous kernel matrix approximation for the learning curve, corrections due to data variability, and exact results for the bounds of the learning curves in the limit of infinitely many training data points.
    Abstract Bayesian inference and kernel methods are well established in machine learning. The neural network Gaussian process in particular provides a concept to investigate neural networks in the limit of infinitely wide hidden layers by using kernel and inference methods. Here we build upon this limit and provide a field-theoretic formalism which covers the generalization properties of infinitely wide networks. We systematically compute generalization properties of linear, non-linear, and deep non-linear networks for kernel matrices with heterogeneous entries. In contrast to currently employed spectral methods we derive the generalization properties from the statistical properties of the input, elucidating the interplay of input dimensionality, size of the training data set, and variability of the data. We show that data variability leads to a non-Gaussian action reminiscent of a ($\varphi^3+\varphi^4$)-theory. Using our formalism on a synthetic task and on MNIST we obtain a homogeneous kernel matrix approximation for the learning curve as well as corrections due to data variability which allow the estimation of the generalization properties and exact results for the bounds of the learning curves in the case of infinitely many training data points.

Guiding Image Captioning Models Toward More Specific Captions

  • paper_url: http://arxiv.org/abs/2307.16686
  • repo_url: None
  • paper_authors: Simon Kornblith, Lala Li, Zirui Wang, Thao Nguyen
  • for: The paper aims to generate more specific captions from an autoregressive image captioning model with minimal changes to the training process.
  • methods: The model is fine-tuned to estimate both conditional and unconditional distributions over captions, enabling classifier-free guidance at decoding time. The guidance scale controls a trade-off between maximizing the probability of the caption given the image and the probability of the image given the caption (a decoding sketch follows the abstract below).
  • results: Decoding with a guidance scale of 2 substantially improves reference-free metrics such as CLIPScore and caption-to-image retrieval performance in the CLIP embedding space, but worsens standard reference-based captioning metrics such as CIDEr. Using language models to guide decoding gives small improvements over the Pareto frontier of reference-free vs. reference-based captioning metrics and substantially improves captions from a model trained only on minimally curated web data.
    Abstract Image captioning is conventionally formulated as the task of generating captions for images that match the distribution of reference image-caption pairs. However, reference captions in standard captioning datasets are short and may not uniquely identify the images they describe. These problems are further exacerbated when models are trained directly on image-alt text pairs collected from the internet. In this work, we show that it is possible to generate more specific captions with minimal changes to the training process. We implement classifier-free guidance for an autoregressive captioning model by fine-tuning it to estimate both conditional and unconditional distributions over captions. The guidance scale applied at decoding controls a trade-off between maximizing $p(\mathrm{caption}|\mathrm{image})$ and $p(\mathrm{image}|\mathrm{caption})$. Compared to standard greedy decoding, decoding with a guidance scale of 2 substantially improves reference-free metrics such as CLIPScore (0.808 vs. 0.775) and caption$\to$image retrieval performance in the CLIP embedding space (recall@1 44.6% vs. 26.5%), but worsens standard reference-based captioning metrics (e.g., CIDEr 78.6 vs 126.1). We further explore the use of language models to guide the decoding process, obtaining small improvements over the Pareto frontier of reference-free vs. reference-based captioning metrics that arises from classifier-free guidance, and substantially improving the quality of captions generated from a model trained only on minimally curated web data.
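
Hedged sketch of classifier-free guidance at decoding time, using one common parameterization: combine the conditional (image-conditioned) and unconditional next-token logits with a guidance scale gamma. gamma = 1 recovers ordinary decoding; gamma = 2 matches the setting discussed in the abstract. The paper's exact formulation may differ in details (e.g., operating on log-probabilities).

```python
import numpy as np

def guided_logits(logits_cond, logits_uncond, gamma=2.0):
    return logits_uncond + gamma * (logits_cond - logits_uncond)

rng = np.random.default_rng(0)
vocab = 8
logits_cond = rng.normal(size=vocab)       # from the captioner given the image
logits_uncond = rng.normal(size=vocab)     # from the same model with the image dropped

for gamma in (1.0, 2.0):
    token = int(np.argmax(guided_logits(logits_cond, logits_uncond, gamma)))
    print(f"gamma={gamma}: greedy token {token}")
```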

On the Trustworthiness Landscape of State-of-the-art Generative Models: A Comprehensive Survey

  • paper_url: http://arxiv.org/abs/2307.16680
  • repo_url: None
  • paper_authors: Mingyuan Fan, Cen Chen, Chengyu Wang, Jun Huang
  • for: This paper investigates the trustworthiness of large-scale generative models, specifically addressing privacy, security, fairness, and responsibility concerns.
  • methods: The authors take a comprehensive survey approach, mapping out the trustworthiness of these models across the four fundamental dimensions and providing practical recommendations.
  • results: The paper provides an extensive map of the trustworthiness landscape of large-scale generative models and identifies future directions for promoting their trustworthy deployment.
    Abstract Diffusion models and large language models have emerged as leading-edge generative models and have sparked a revolutionary impact on various aspects of human life. However, the practical implementation of these models has also exposed inherent risks, highlighting their dual nature and raising concerns regarding their trustworthiness. Despite the abundance of literature on this subject, a comprehensive survey specifically delving into the intersection of large-scale generative models and their trustworthiness remains largely absent. To bridge this gap, This paper investigates both the long-standing and emerging threats associated with these models across four fundamental dimensions: privacy, security, fairness, and responsibility. In this way, we construct an extensive map outlining the trustworthiness of these models, while also providing practical recommendations and identifying future directions. These efforts are crucial for promoting the trustworthy deployment of these models, ultimately benefiting society as a whole.

Comparing normalizing flows and diffusion models for prosody and acoustic modelling in text-to-speech

  • paper_url: http://arxiv.org/abs/2307.16679
  • repo_url: None
  • paper_authors: Guangyan Zhang, Thomas Merritt, Manuel Sam Ribeiro, Biel Tura-Vecino, Kayoko Yanagisawa, Kamil Pokora, Abdelhamid Ezzerg, Sebastian Cygert, Ammar Abbas, Piotr Bilinski, Roberto Barra-Chicote, Daniel Korzekwa, Jaime Lorenzo-Trueba
  • for: This paper targets prosody and mel-spectrogram prediction for text-to-speech synthesis.
  • methods: Normalizing flows and diffusion probabilistic models are compared against traditional L1/L2 losses. A prosody model generates log-f0 and duration features, which are used to condition an acoustic model that generates mel-spectrograms.
  • results: The flow-based model achieves the best mel-spectrogram prediction, improving over equivalent diffusion and L1 models, while both flow- and diffusion-based prosody predictors yield significant improvements over a typical L2-trained prosody model.
    Abstract Neural text-to-speech systems are often optimized on L1/L2 losses, which make strong assumptions about the distributions of the target data space. Aiming to improve those assumptions, Normalizing Flows and Diffusion Probabilistic Models were recently proposed as alternatives. In this paper, we compare traditional L1/L2-based approaches to diffusion and flow-based approaches for the tasks of prosody and mel-spectrogram prediction for text-to-speech synthesis. We use a prosody model to generate log-f0 and duration features, which are used to condition an acoustic model that generates mel-spectrograms. Experimental results demonstrate that the flow-based model achieves the best performance for spectrogram prediction, improving over equivalent diffusion and L1 models. Meanwhile, both diffusion and flow-based prosody predictors result in significant improvements over a typical L2-trained prosody models.

End-to-End Reinforcement Learning for Torque Based Variable Height Hopping

  • paper_url: http://arxiv.org/abs/2307.16676
  • repo_url: None
  • paper_authors: Raghav Soni, Daniel Harnack, Hauke Isermann, Sotaro Fushimi, Shivesh Kumar, Frank Kirchner
  • for: This paper addresses torque-based control of hopping, a dynamic task with a flight phase that can increase the traversability of legged robots on natural or unstructured terrain.
  • methods: Reinforcement learning is used to train an end-to-end torque controller that learns to implicitly detect the relevant jump phases, removing the need for manual heuristics and per-phase controllers; a simulation-to-reality transfer method is extended to contact-rich dynamic tasks.
  • results: The learned controller is deployed successfully on the real robot after training, without parameter tuning.
    Abstract Legged locomotion is arguably the most suited and versatile mode to deal with natural or unstructured terrains. Intensive research into dynamic walking and running controllers has recently yielded great advances, both in the optimal control and reinforcement learning (RL) literature. Hopping is a challenging dynamic task involving a flight phase and has the potential to increase the traversability of legged robots. Model based control for hopping typically relies on accurate detection of different jump phases, such as lift-off or touch down, and using different controllers for each phase. In this paper, we present a end-to-end RL based torque controller that learns to implicitly detect the relevant jump phases, removing the need to provide manual heuristics for state detection. We also extend a method for simulation to reality transfer of the learned controller to contact rich dynamic tasks, resulting in successful deployment on the robot after training without parameter tuning.

Generative models for wearables data

  • paper_url: http://arxiv.org/abs/2307.16664
  • repo_url: None
  • paper_authors: Arinbjörn Kolbeinsson, Luca Foschini
  • for: To address data scarcity in medical research by synthesizing health data, offering an efficient and cost-effective way to explore distributions and populations that are not represented in existing observations or are difficult to access for privacy reasons.
  • methods: The authors develop a multi-task self-attention model that produces realistic wearable activity data.
  • results: The generated data is examined with both quantitative and qualitative approaches and shown to closely resemble genuine samples.
    Abstract Data scarcity is a common obstacle in medical research due to the high costs associated with data collection and the complexity of gaining access to and utilizing data. Synthesizing health data may provide an efficient and cost-effective solution to this shortage, enabling researchers to explore distributions and populations that are not represented in existing observations or difficult to access due to privacy considerations. To that end, we have developed a multi-task self-attention model that produces realistic wearable activity data. We examine the characteristics of the generated data and quantify its similarity to genuine samples with both quantitative and qualitative approaches.

Graph Structure from Point Clouds: Geometric Attention is All You Need

  • paper_url: http://arxiv.org/abs/2307.16662
  • repo_url: https://github.com/murnanedaniel/geometricattention
  • paper_authors: Daniel Murnane
  • for: This paper targets top jet tagging in high energy physics, using graph neural networks to improve the accuracy and efficiency of the task.
  • methods: The paper elevates the question of how to build a graph from a point cloud to the "Topology Problem" and proposes an attention mechanism, GravNetNorm, that constructs the graph in a learned space and handles the flow of relevance geometrically (a learned-space graph sketch follows the abstract below).
  • results: Experiments show that GravNetNorm is competitive in tagging accuracy while using far fewer computational resources than comparable models.
    Abstract The use of graph neural networks has produced significant advances in point cloud problems, such as those found in high energy physics. The question of how to produce a graph structure in these problems is usually treated as a matter of heuristics, employing fully connected graphs or K-nearest neighbors. In this work, we elevate this question to utmost importance as the Topology Problem. We propose an attention mechanism that allows a graph to be constructed in a learned space that handles geometrically the flow of relevance, providing one solution to the Topology Problem. We test this architecture, called GravNetNorm, on the task of top jet tagging, and show that it is competitive in tagging accuracy, and uses far fewer computational resources than all other comparable models.
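
Hedged numpy sketch of building a graph in a learned space: project node features to low-dimensional "spatial" coordinates, connect each node to its k nearest neighbours there, and weight edges by a Gaussian potential of the distance, in the spirit of GravNet-style layers. The random projection stands in for the learned one; GravNetNorm's actual attention and normalization are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(0)
n, feat_dim, space_dim, k = 100, 16, 4, 8

X = rng.normal(size=(n, feat_dim))                 # point-cloud features (e.g. detector hits)
W = rng.normal(size=(feat_dim, space_dim)) * 0.3   # stand-in for the learned projection
S = X @ W                                          # coordinates in the learned space

d2 = ((S[:, None, :] - S[None, :, :]) ** 2).sum(-1)
np.fill_diagonal(d2, np.inf)
nbrs = np.argsort(d2, axis=1)[:, :k]               # k nearest neighbours per node
weights = np.exp(-d2[np.arange(n)[:, None], nbrs]) # Gaussian edge weights

# Weighted aggregation of neighbour features, as one message-passing step would use.
agg = (weights[:, :, None] * X[nbrs]).sum(axis=1) / (weights.sum(axis=1, keepdims=True) + 1e-12)
print(agg.shape)
```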

Proactive Resource Request for Disaster Response: A Deep Learning-based Optimization Model

  • paper_url: http://arxiv.org/abs/2307.16661
  • repo_url: None
  • paper_authors: Hongzhe Zhang, Xiaohang Zhao, Xiao Fang, Bintong Chen
  • for: This paper defines a new resource management problem for disaster response: proactively deciding the optimal quantities of resources a local agency should request, considering both currently unfulfilled demands and future demands.
  • methods: The authors develop a deep learning method for future demand prediction and formulate the problem as a stochastic optimization model, analyzing its key properties and proposing an effective solution method based on them.
  • results: Experiments on real-world and simulated data show that the method outperforms prevalent existing approaches, including in a multi-stakeholder and multi-objective setting.
    Abstract Disaster response is critical to save lives and reduce damages in the aftermath of a disaster. Fundamental to disaster response operations is the management of disaster relief resources. To this end, a local agency (e.g., a local emergency resource distribution center) collects demands from local communities affected by a disaster, dispatches available resources to meet the demands, and requests more resources from a central emergency management agency (e.g., Federal Emergency Management Agency in the U.S.). Prior resource management research for disaster response overlooks the problem of deciding optimal quantities of resources requested by a local agency. In response to this research gap, we define a new resource management problem that proactively decides optimal quantities of requested resources by considering both currently unfulfilled demands and future demands. To solve the problem, we take salient characteristics of the problem into consideration and develop a novel deep learning method for future demand prediction. We then formulate the problem as a stochastic optimization model, analyze key properties of the model, and propose an effective solution method to the problem based on the analyzed properties. We demonstrate the superior performance of our method over prevalent existing methods using both real world and simulated data. We also show its superiority over prevalent existing methods in a multi-stakeholder and multi-objective setting through simulations.

Sequential and Shared-Memory Parallel Algorithms for Partitioned Local Depths

  • paper_url: http://arxiv.org/abs/2307.16652
  • repo_url: None
  • paper_authors: Aditya Devarakonda, Grey Ballard
  • for: This paper studies partitioned local depths (PaLD), a pairwise-distance-based method for identifying the strength of pairwise relationships, which can reveal strong ties within dense and sparse communities even when community sizes and within-community absolute distances vary greatly.
  • methods: The authors design two algorithmic variants that perform community structure analysis through triplet comparisons of pairwise distances (a naive sketch follows the abstract below), provide theoretical analyses of computation and communication costs, and prove that the sequential algorithms are communication optimal up to constant factors.
  • results: Performance optimizations yield sequential speedups of up to $29\times$ over a baseline sequential implementation and parallel speedups of up to $19.4\times$ over optimized sequential implementations using up to 32 threads on an Intel multicore CPU.
    Abstract In this work, we design, analyze, and optimize sequential and shared-memory parallel algorithms for partitioned local depths (PaLD). Given a set of data points and pairwise distances, PaLD is a method for identifying strength of pairwise relationships based on relative distances, enabling the identification of strong ties within dense and sparse communities even if their sizes and within-community absolute distances vary greatly. We design two algorithmic variants that perform community structure analysis through triplet comparisons of pairwise distances. We present theoretical analyses of computation and communication costs and prove that the sequential algorithms are communication optimal, up to constant factors. We introduce performance optimization strategies that yield sequential speedups of up to $29\times$ over a baseline sequential implementation and parallel speedups of up to $19.4\times$ over optimized sequential implementations using up to $32$ threads on an Intel multicore CPU.
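
Hedged O(n^3) sketch of the triplet comparisons behind partitioned local depths: for every pair (x, y), form the local "focus" of points at least as close to x or y as x and y are to each other, and credit each focus member to whichever of x, y it sits closer to (splitting ties). Index conventions and normalisation follow our reading of PaLD and may differ from the paper's exact definition.

```python
import numpy as np

def pald_cohesion(D):
    n = D.shape[0]
    C = np.zeros((n, n))
    for x in range(n):
        for y in range(x + 1, n):
            focus = np.where((D[:, x] <= D[x, y]) | (D[:, y] <= D[x, y]))[0]
            w = 1.0 / len(focus)
            for z in focus:
                if D[z, x] < D[z, y]:
                    C[x, z] += w
                elif D[z, y] < D[z, x]:
                    C[y, z] += w
                else:                       # tie: split the contribution
                    C[x, z] += w / 2
                    C[y, z] += w / 2
    return C / (n - 1)

rng = np.random.default_rng(0)
pts = np.vstack([rng.normal(0, 0.3, (10, 2)), rng.normal(3, 0.3, (10, 2))])  # two clusters
D = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)
C = pald_cohesion(D)
print("mean within- vs between-cluster cohesion:", C[:10, :10].mean(), C[:10, 10:].mean())
```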

UDAMA: Unsupervised Domain Adaptation through Multi-discriminator Adversarial Training with Noisy Labels Improves Cardio-fitness Prediction

  • paper_url: http://arxiv.org/abs/2307.16651
  • repo_url: https://github.com/yvonneywu/udama
  • paper_authors: Yu Wu, Dimitris Spathis, Hong Jia, Ignacio Perez-Pozuelo, Tomas Gonzales, Soren Brage, Nicholas Wareham, Cecilia Mascolo
  • for: This work aims to improve the performance of deep learning models in health monitoring applications by exploiting imprecisely labelled (silver-standard) data.
  • methods: UDAMA has two key components, Unsupervised Domain Adaptation and Multi-discriminator Adversarial Training: the model is pre-trained on silver-standard data and then adversarially adapted with gold-standard data using two domain discriminators.
  • results: Applied to cardio-respiratory fitness prediction, UDAMA alleviates label distribution shifts and consistently outperforms competitive transfer learning and state-of-the-art domain adaptation models by up to 12% on two free-living cohort studies (Fenland and BBVS).
    Abstract Deep learning models have shown great promise in various healthcare monitoring applications. However, most healthcare datasets with high-quality (gold-standard) labels are small-scale, as directly collecting ground truth is often costly and time-consuming. As a result, models developed and validated on small-scale datasets often suffer from overfitting and do not generalize well to unseen scenarios. At the same time, large amounts of imprecise (silver-standard) labeled data, annotated by approximate methods with the help of modern wearables and in the absence of ground truth validation, are starting to emerge. However, due to measurement differences, this data displays significant label distribution shifts, which motivates the use of domain adaptation. To this end, we introduce UDAMA, a method with two key components: Unsupervised Domain Adaptation and Multidiscriminator Adversarial Training, where we pre-train on the silver-standard data and employ adversarial adaptation with the gold-standard data along with two domain discriminators. In particular, we showcase the practical potential of UDAMA by applying it to Cardio-respiratory fitness (CRF) prediction. CRF is a crucial determinant of metabolic disease and mortality, and it presents labels with various levels of noise (goldand silver-standard), making it challenging to establish an accurate prediction model. Our results show promising performance by alleviating distribution shifts in various label shift settings. Additionally, by using data from two free-living cohort studies (Fenland and BBVS), we show that UDAMA consistently outperforms up to 12% compared to competitive transfer learning and state-of-the-art domain adaptation models, paving the way for leveraging noisy labeled data to improve fitness estimation at scale.

LLMs4OL: Large Language Models for Ontology Learning

  • paper_url: http://arxiv.org/abs/2307.16648
  • repo_url: https://github.com/hamedbabaei/llms4ol
  • paper_authors: Hamed Babaei Giglou, Jennifer D’Souza, Sören Auer
  • for: This paper investigates whether Large Language Models (LLMs) can effectively apply their language-pattern-capturing capability to Ontology Learning (OL), evaluating nine different LLM model families on three main OL tasks: term typing, taxonomy discovery, and extraction of non-taxonomic relations.
  • methods: The evaluation uses zero-shot prompting (a prompt sketch follows the abstract below) and covers diverse genres of ontological knowledge, including lexicosemantic knowledge in WordNet, geographical knowledge in GeoNames, and medical knowledge in UMLS.
  • results: The evaluation indicates that LLMs can apply their language-pattern-capturing capability to support OL tasks, with performance varying across model families and tasks.
    Abstract We propose the LLMs4OL approach, which utilizes Large Language Models (LLMs) for Ontology Learning (OL). LLMs have shown significant advancements in natural language processing, demonstrating their ability to capture complex language patterns in different knowledge domains. Our LLMs4OL paradigm investigates the following hypothesis: \textit{Can LLMs effectively apply their language pattern capturing capability to OL, which involves automatically extracting and structuring knowledge from natural language text?} To test this hypothesis, we conduct a comprehensive evaluation using the zero-shot prompting method. We evaluate nine different LLM model families for three main OL tasks: term typing, taxonomy discovery, and extraction of non-taxonomic relations. Additionally, the evaluations encompass diverse genres of ontological knowledge, including lexicosemantic knowledge in WordNet, geographical knowledge in GeoNames, and medical knowledge in UMLS.
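
Hedged sketch of zero-shot prompting for one OL task (term typing): format a cloze-style prompt per term and send it to a language model. `llm` is a hypothetical callable standing in for whichever model family is being evaluated; the prompt wording is illustrative, not the paper's exact template.

```python
from typing import Callable

def term_typing_prompt(term: str) -> str:
    return f"Answer with a single word. In WordNet, the generalised type of \"{term}\" is:"

def type_terms(terms, llm: Callable[[str], str]):
    return {t: llm(term_typing_prompt(t)).strip().lower() for t in terms}

# Toy usage with a rule-based stand-in for the LLM call:
fake_llm = lambda prompt: "bird" if "sparrow" in prompt else "city"
print(type_terms(["sparrow", "berlin"], fake_llm))
```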

Text-CRS: A Generalized Certified Robustness Framework against Textual Adversarial Attacks

  • paper_url: http://arxiv.org/abs/2307.16630
  • repo_url: https://github.com/Eyr3/TextCRS
  • paper_authors: Xinyu Zhang, Hanbin Hong, Yuan Hong, Peng Huang, Binghui Wang, Zhongjie Ba, Kui Ren
  • for: This work aims to improve the certified robustness of language models against textual adversarial attacks such as synonym substitution and word insertion.
  • methods: The authors propose Text-CRS, a generalized certified robustness framework for NLP based on randomized smoothing. Each word-level adversarial operation (synonym substitution, word reordering, insertion, and deletion) is represented as a combination of a permutation and an embedding transformation, and new smoothing theorems derive robustness bounds in both permutation and embedding space (a smoothing-vote sketch follows the abstract below).
  • results: Experiments on multiple language models and datasets show that Text-CRS addresses all four word-level adversarial operations with significant accuracy improvements. The paper also provides the first benchmark of certified accuracy and radius for the four operations and outperforms state-of-the-art certification against synonym substitution attacks.
    Abstract The language models, especially the basic text classification models, have been shown to be susceptible to textual adversarial attacks such as synonym substitution and word insertion attacks. To defend against such attacks, a growing body of research has been devoted to improving the model robustness. However, providing provable robustness guarantees instead of empirical robustness is still widely unexplored. In this paper, we propose Text-CRS, a generalized certified robustness framework for natural language processing (NLP) based on randomized smoothing. To our best knowledge, existing certified schemes for NLP can only certify the robustness against $\ell_0$ perturbations in synonym substitution attacks. Representing each word-level adversarial operation (i.e., synonym substitution, word reordering, insertion, and deletion) as a combination of permutation and embedding transformation, we propose novel smoothing theorems to derive robustness bounds in both permutation and embedding space against such adversarial operations. To further improve certified accuracy and radius, we consider the numerical relationships between discrete words and select proper noise distributions for the randomized smoothing. Finally, we conduct substantial experiments on multiple language models and datasets. Text-CRS can address all four different word-level adversarial operations and achieve a significant accuracy improvement. We also provide the first benchmark on certified accuracy and radius of four word-level operations, besides outperforming the state-of-the-art certification against synonym substitution attacks.
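
Hedged sketch of the randomized-smoothing idea applied to text: the smoothed classifier predicts by majority vote over randomly perturbed copies of the input (here, random synonym substitutions from a toy table). The actual Text-CRS certificates use permutation- and embedding-space noise and specific smoothing theorems; this only illustrates the Monte-Carlo voting step.

```python
import random
from collections import Counter

SYNONYMS = {"good": ["great", "fine"], "movie": ["film"], "bad": ["poor", "awful"]}

def perturb(tokens, rate=0.5, rng=random):
    return [rng.choice(SYNONYMS[t]) if t in SYNONYMS and rng.random() < rate else t
            for t in tokens]

def smoothed_predict(tokens, base_classifier, n_samples=200, seed=0):
    rng = random.Random(seed)
    votes = Counter(base_classifier(perturb(tokens, rng=rng)) for _ in range(n_samples))
    label, count = votes.most_common(1)[0]
    return label, count / n_samples          # top class and its empirical probability

# Toy base classifier: positive if at least as many positive-ish words as negative ones.
positive = {"good", "great", "fine"}
negative = {"bad", "poor", "awful"}
clf = lambda toks: int(sum(t in positive for t in toks) >= sum(t in negative for t in toks))

print(smoothed_predict("a good movie not bad".split(), clf))
```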

Adversarial Causal Bayesian Optimization

  • paper_url: http://arxiv.org/abs/2307.16625
  • repo_url: None
  • paper_authors: Scott Sussex, Pier Giuseppe Sessa, Anastasiia Makarova, Andreas Krause
  • for: This paper extends causal Bayesian optimization (CBO) to settings where other agents or external events also intervene on the system, which is key for adapting to non-stationarities such as weather changes, market forces, or adversaries.
  • methods: The paper formalizes this setting as Adversarial Causal Bayesian Optimization (ACBO) and introduces the first ACBO algorithm with bounded regret, Causal Bayesian Optimization with Multiplicative Weights (CBO-MW). The method combines a classical online learning strategy with causal modelling of the rewards, propagating uncertainty through the causal graph to compute optimistic counterfactual reward estimates (a multiplicative-weights sketch follows the abstract below).
  • results: CBO-MW comes with regret bounds that depend on graph-related quantities, admits a scalable implementation for combinatorial interventions and submodular rewards, and empirically outperforms non-causal and non-adversarial Bayesian optimization methods on synthetic environments and environments based on real-world data, including learning user demand patterns and repositioning vehicles in a shared mobility system.
    Abstract In Causal Bayesian Optimization (CBO), an agent intervenes on an unknown structural causal model to maximize a downstream reward variable. In this paper, we consider the generalization where other agents or external events also intervene on the system, which is key for enabling adaptiveness to non-stationarities such as weather changes, market forces, or adversaries. We formalize this generalization of CBO as Adversarial Causal Bayesian Optimization (ACBO) and introduce the first algorithm for ACBO with bounded regret: Causal Bayesian Optimization with Multiplicative Weights (CBO-MW). Our approach combines a classical online learning strategy with causal modeling of the rewards. To achieve this, it computes optimistic counterfactual reward estimates by propagating uncertainty through the causal graph. We derive regret bounds for CBO-MW that naturally depend on graph-related quantities. We further propose a scalable implementation for the case of combinatorial interventions and submodular rewards. Empirically, CBO-MW outperforms non-causal and non-adversarial Bayesian optimization methods on synthetic environments and environments based on real-word data. Our experiments include a realistic demonstration of how CBO-MW can be used to learn users' demand patterns in a shared mobility system and reposition vehicles in strategic areas.
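
Hedged sketch of the multiplicative-weights ingredient: Hedge-style updates over a finite set of candidate interventions, using estimated rewards observed each round. CBO-MW additionally builds optimistic counterfactual reward estimates from a causal model and a Gaussian-process posterior, which is not reproduced here; noisy versions of the true means stand in for those estimates.

```python
import numpy as np

rng = np.random.default_rng(0)
n_actions, T, eta = 5, 300, 0.3
true_means = np.array([0.2, 0.5, 0.8, 0.4, 0.6])        # unknown expected rewards

weights = np.ones(n_actions)
total_reward = 0.0
for t in range(T):
    probs = weights / weights.sum()
    a = rng.choice(n_actions, p=probs)                  # intervention played this round
    total_reward += true_means[a] + 0.05 * rng.normal()
    # Full-information update: CBO-MW forms (optimistic) counterfactual reward estimates
    # for every candidate intervention via its causal model; noisy truths stand in here.
    reward_estimates = true_means + 0.1 * rng.normal(size=n_actions)
    weights *= np.exp(eta * reward_estimates)           # multiplicative-weights update

print("avg reward:", total_reward / T)
print("final policy:", np.round(weights / weights.sum(), 3))
```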

Detecting diabetic retinopathy severity through fundus images using an ensemble of classifiers

  • paper_url: http://arxiv.org/abs/2307.16622
  • repo_url: None
  • paper_authors: Eduard Popescu, Adrian Groza, Ioana Damian
  • for: This paper aims to detect diabetic retinopathy severity levels from fundus images.
  • methods: The proposed pipeline includes data preprocessing, image segmentation, and feature extraction, followed by an ensemble of classifiers (a preprocessing and ensemble sketch follows the abstract below).
  • results: The authors evaluate the performance of the proposed method and assess the trust in the system.
    Abstract Diabetic retinopathy is an ocular condition that affects individuals with diabetes mellitus. It is a common complication of diabetes that can impact the eyes and lead to vision loss. One method for diagnosing diabetic retinopathy is the examination of the fundus of the eye. An ophthalmologist examines the back part of the eye, including the retina, optic nerve, and the blood vessels that supply the retina. In the case of diabetic retinopathy, the blood vessels in the retina deteriorate and can lead to bleeding, swelling, and other changes that affect vision. We proposed a method for detecting diabetic diabetic severity levels. First, a set of data-prerpocessing is applied to available data: adaptive equalisation, color normalisation, Gaussian filter, removal of the optic disc and blood vessels. Second, we perform image segmentation for relevant markers and extract features from the fundus images. Third, we apply an ensemble of classifiers and we assess the trust in the system.
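
Hedged sketch of the preprocessing and ensemble stages using standard OpenCV / scikit-learn pieces: adaptive histogram equalisation (CLAHE), Gaussian filtering, and a soft-voting ensemble over extracted feature vectors. Segmentation, optic-disc/vessel removal, and the actual features and classifiers used in the paper are not reproduced; the toy data is random.

```python
import cv2
import numpy as np
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

def preprocess(fundus_bgr: np.ndarray) -> np.ndarray:
    green = fundus_bgr[:, :, 1]                                  # green channel of the fundus image
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))  # adaptive equalisation
    eq = clahe.apply(green)
    return cv2.GaussianBlur(eq, (5, 5), 0)                       # Gaussian filter

rng = np.random.default_rng(0)
img = (rng.random((64, 64, 3)) * 255).astype(np.uint8)           # stand-in fundus image
print("preprocessed shape:", preprocess(img).shape)

# Toy features/labels standing in for the markers extracted from segmented images.
X = rng.normal(size=(200, 12))
y = rng.integers(0, 5, size=200)                                 # 5 severity levels

ensemble = VotingClassifier(
    estimators=[("rf", RandomForestClassifier(n_estimators=100)),
                ("lr", LogisticRegression(max_iter=1000)),
                ("svm", SVC(probability=True))],
    voting="soft",
)
ensemble.fit(X, y)
print("train accuracy:", ensemble.score(X, y))
```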

LaplaceConfidence: a Graph-based Approach for Learning with Noisy Labels

  • paper_url: http://arxiv.org/abs/2307.16614
  • repo_url: None
  • paper_authors: Mingcai Chen, Yuntao Du, Wei Tang, Baoming Zhang, Hao Cheng, Shuwei Qian, Chongjun Wang
  • for: This paper proposes LaplaceConfidence, a method that uses the Laplacian energy to obtain label confidence (clean probabilities) when learning from noisy labels.
  • methods: The method first constructs graphs from the feature representations of all noisy samples and minimizes the Laplacian energy to produce a low-energy graph; clean labels should fit this low-energy graph well while noisy ones should not (a graph-propagation sketch follows the abstract below). LaplaceConfidence is embedded in a holistic robust-training scheme with co-training and label refurbishment.
  • results: Experiments show that LaplaceConfidence outperforms state-of-the-art methods on benchmark datasets under both synthetic and real-world noise.
    Abstract In real-world applications, perfect labels are rarely available, making it challenging to develop robust machine learning algorithms that can handle noisy labels. Recent methods have focused on filtering noise based on the discrepancy between model predictions and given noisy labels, assuming that samples with small classification losses are clean. This work takes a different approach by leveraging the consistency between the learned model and the entire noisy dataset using the rich representational and topological information in the data. We introduce LaplaceConfidence, a method that to obtain label confidence (i.e., clean probabilities) utilizing the Laplacian energy. Specifically, it first constructs graphs based on the feature representations of all noisy samples and minimizes the Laplacian energy to produce a low-energy graph. Clean labels should fit well into the low-energy graph while noisy ones should not, allowing our method to determine data's clean probabilities. Furthermore, LaplaceConfidence is embedded into a holistic method for robust training, where co-training technique generates unbiased label confidence and label refurbishment technique better utilizes it. We also explore the dimensionality reduction technique to accommodate our method on large-scale noisy datasets. Our experiments demonstrate that LaplaceConfidence outperforms state-of-the-art methods on benchmark datasets under both synthetic and real-world noise.
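
Hedged sketch of the graph idea: build a kNN affinity graph from feature representations, propagate the (noisy) labels so that the propagated scores are consistent with a low-Laplacian-energy graph (the classic label-spreading closed form), and read off a per-sample confidence as the propagated mass on the sample's given label. This is a generic stand-in, not LaplaceConfidence's exact objective or its co-training / refurbishment components.

```python
import numpy as np

def label_confidence(features, noisy_labels, n_classes, k=10, alpha=0.9):
    n = len(features)
    d2 = ((features[:, None] - features[None, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)
    W = np.zeros((n, n))
    idx = np.argsort(d2, axis=1)[:, :k]
    rows = np.repeat(np.arange(n), k)
    W[rows, idx.ravel()] = np.exp(-d2[rows, idx.ravel()] / d2[rows, idx.ravel()].mean())
    W = np.maximum(W, W.T)                                   # symmetrise the kNN graph
    Dm12 = np.diag(1.0 / np.sqrt(W.sum(1) + 1e-12))
    S = Dm12 @ W @ Dm12                                      # normalised affinity
    Y = np.eye(n_classes)[noisy_labels]
    F = np.linalg.solve(np.eye(n) - alpha * S, Y)            # low-energy propagated scores
    F = F / F.sum(1, keepdims=True)
    return F[np.arange(n), noisy_labels]                     # confidence in the given label

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(3, 0.5, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
y_noisy = y.copy()
y_noisy[:5] = 1                                              # corrupt a few labels
conf = label_confidence(X, y_noisy, n_classes=2)
print("corrupted-label confidence:", conf[:5].round(2))
print("clean-label confidence:    ", conf[5:10].round(2))
```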

Noisy Self-Training with Data Augmentations for Offensive and Hate Speech Detection Tasks

  • paper_url: http://arxiv.org/abs/2307.16609
  • repo_url: https://github.com/jaugusto97/offense-self-training
  • paper_authors: João A. Leite, Carolina Scarton, Diego F. Silva
  • for: automatic detection of offensive and hateful comments online
  • methods: self-training and noisy self-training with textual data augmentations
  • results: consistent improvement in performance regardless of model size, but noisy self-training decreases performance on offensive and hate-speech domains compared to default method
    Abstract Online social media is rife with offensive and hateful comments, prompting the need for their automatic detection given the sheer amount of posts created every second. Creating high-quality human-labelled datasets for this task is difficult and costly, especially because non-offensive posts are significantly more frequent than offensive ones. However, unlabelled data is abundant, easier, and cheaper to obtain. In this scenario, self-training methods, using weakly-labelled examples to increase the amount of training data, can be employed. Recent "noisy" self-training approaches incorporate data augmentation techniques to ensure prediction consistency and increase robustness against noisy data and adversarial attacks. In this paper, we experiment with default and noisy self-training using three different textual data augmentation techniques across five different pre-trained BERT architectures varying in size. We evaluate our experiments on two offensive/hate-speech datasets and demonstrate that (i) self-training consistently improves performance regardless of model size, resulting in up to +1.5% F1-macro on both datasets, and (ii) noisy self-training with textual data augmentations, despite being successfully applied in similar settings, decreases performance on offensive and hate-speech domains when compared to the default method, even with state-of-the-art augmentations such as backtranslation.
    摘要 在线社交媒体上充斥着侮辱性和仇恨性评论,而每秒产生的帖子数量巨大,因此需要对其进行自动检测。为这项任务构建高质量的人工标注数据集既困难又昂贵,特别是因为非侮辱性帖子远多于侮辱性帖子。然而,无标注数据充足,且更容易、更便宜地获得。在这种情况下,可以使用自训练方法,利用弱标注样本来扩充训练数据。最近的"噪声"自训练方法结合数据增强技术,以确保预测一致性,并增强对噪声数据和对抗攻击的鲁棒性。在本文中,我们在五种不同规模的预训练 BERT 架构上,使用三种不同的文本数据增强技术,对默认自训练和噪声自训练进行了实验。我们在两个侮辱性/仇恨言论数据集上进行评估,并证明:(i)无论模型规模如何,自训练都能一致地提升性能,在两个数据集上最高可提升 +1.5% F1-macro;(ii)尽管噪声自训练在类似场景中曾获得成功,但即使使用回译等最新的增强技术,在侮辱性和仇恨言论领域中其性能仍低于默认方法。
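The default self-training loop described above can be sketched as follows; `train_fn`, `predict_proba_fn` and `augment_fn` are hypothetical placeholders (the noisy variant would apply a textual augmentation such as backtranslation inside `augment_fn`). This is a generic sketch, not the authors' code.

```python
# Generic pseudo-labelling / self-training loop: a teacher trained on the
# labelled set pseudo-labels confident unlabelled examples, which are then
# (optionally augmented and) added to the training pool.
def self_train(model, labelled_texts, labelled_labels, unlabelled_texts,
               train_fn, predict_proba_fn, augment_fn=lambda x: x,
               rounds=3, threshold=0.9):
    texts, labels = list(labelled_texts), list(labelled_labels)
    pool = list(unlabelled_texts)
    for _ in range(rounds):
        model = train_fn(model, texts, labels)
        probs = predict_proba_fn(model, pool)            # array (len(pool), n_classes)
        conf, pseudo = probs.max(axis=1), probs.argmax(axis=1)
        keep = conf >= threshold                          # confident weak labels only
        texts += [augment_fn(x) for x, k in zip(pool, keep) if k]
        labels += [int(c) for c, k in zip(pseudo, keep) if k]
        pool = [x for x, k in zip(pool, keep) if not k]
    return train_fn(model, texts, labels)
```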

NLLG Quarterly arXiv Report 06/23: What are the most influential current AI Papers?

  • paper_url: http://arxiv.org/abs/2308.04889
  • repo_url: https://github.com/nl2g/quaterly-arxiv
  • paper_authors: Steffen Eger, Christoph Leiter, Jonas Belouadi, Ran Zhang, Aida Kostikova, Daniil Larionov, Yanran Chen, Vivian Fresen
  • For: 本文旨在梳理 arXiv 上最受欢迎的 40 篇论文,重点关注自然语言处理(NLP)和机器学习(ML)领域的研究。
  • Methods: 本文使用归一化引用统计来确定最受欢迎的论文,并分析这些论文的主题和特征。
  • Results: 研究发现,2023 年上半年 LLM 与 ChatGPT 相关论文占据主导地位,但 ChatGPT 的受欢迎程度近几个月已开始下降。此外,尽管数据中 ML 相关论文的数量约为 NLP 相关论文的两倍,NLP 相关论文仍占最具影响力论文的约 60%。最受引用论文的核心问题包括 LLM 效率、评估技术、伦理考虑、具身智能体以及利用 LLM 解决问题。
    Abstract The rapid growth of information in the field of Generative Artificial Intelligence (AI), particularly in the subfields of Natural Language Processing (NLP) and Machine Learning (ML), presents a significant challenge for researchers and practitioners to keep pace with the latest developments. To address the problem of information overload, this report by the Natural Language Learning Group at Bielefeld University focuses on identifying the most popular papers on arXiv, with a specific emphasis on NLP and ML. The objective is to offer a quick guide to the most relevant and widely discussed research, aiding both newcomers and established researchers in staying abreast of current trends. In particular, we compile a list of the 40 most popular papers based on normalized citation counts from the first half of 2023. We observe the dominance of papers related to Large Language Models (LLMs) and specifically ChatGPT during the first half of 2023, with the latter showing signs of declining popularity more recently, however. Further, NLP related papers are the most influential (around 60\% of top papers) even though there are twice as many ML related papers in our data. Core issues investigated in the most heavily cited papers are: LLM efficiency, evaluation techniques, ethical considerations, embodied agents, and problem-solving with LLMs. Additionally, we examine the characteristics of top papers in comparison to others outside the top-40 list (noticing the top paper's focus on LLM related issues and higher number of co-authors) and analyze the citation distributions in our dataset, among others.
    摘要 生成式人工智能(AI)领域,尤其是自然语言处理(NLP)和机器学习(ML)子领域的信息快速增长,使研究人员和从业者难以跟上最新进展。为了解决信息过载问题,比勒费尔德大学自然语言学习组撰写了本报告,聚焦于 arXiv 上最受欢迎的论文,并特别关注 NLP 和 ML。其目的是为新手和资深研究人员提供一份快速了解当前趋势的指南。我们基于 2023 年上半年的归一化引用数,整理出最受欢迎的 40 篇论文。我们观察到,2023 年上半年与大语言模型(LLM)尤其是 ChatGPT 相关的论文占据主导地位,但后者的热度最近有所下降。此外,尽管数据中 ML 相关论文的数量是 NLP 相关论文的两倍,但 NLP 相关论文最具影响力(约占榜单的 60%)。被引用最多的论文所研究的核心问题包括:LLM 效率、评估技术、伦理考虑、具身智能体以及利用 LLM 解决问题。此外,我们还比较了前 40 篇论文与榜单之外论文的特征(注意到前者聚焦 LLM 相关问题且合作者更多),并分析了数据集中的引用分布等。

Audio-visual video-to-speech synthesis with synthesized input audio

  • paper_url: http://arxiv.org/abs/2307.16584
  • repo_url: None
  • paper_authors: Triantafyllos Kefalas, Yannis Panagakis, Maja Pantic
  • for: 这个论文旨在研究视频到语音合成 task 中使用视频和声音输入的效果。
  • methods: 这个论文使用了预训练的视频到语音模型来生成缺失的语音信号,然后使用视频和生成的语音作为输入,用一个Audio-Visual到语音合成模型来预测最终重建的语音。
  • results: 实验结果表明,这种方法在使用 raw waveforms 和 mel spectrograms 作为目标输出时都是成功的。
    Abstract Video-to-speech synthesis involves reconstructing the speech signal of a speaker from a silent video. The implicit assumption of this task is that the sound signal is either missing or contains a high amount of noise/corruption such that it is not useful for processing. Previous works in the literature either use video inputs only or employ both video and audio inputs during training, and discard the input audio pathway during inference. In this work we investigate the effect of using video and audio inputs for video-to-speech synthesis during both training and inference. In particular, we use pre-trained video-to-speech models to synthesize the missing speech signals and then train an audio-visual-to-speech synthesis model, using both the silent video and the synthesized speech as inputs, to predict the final reconstructed speech. Our experiments demonstrate that this approach is successful with both raw waveforms and mel spectrograms as target outputs.
    摘要 视频到语音合成是指从无声视频中重建说话人的语音信号。该任务的隐含假设是:音频信号要么缺失,要么含有大量噪声/损坏而无法使用。先前的工作要么只使用视频输入,要么在训练时同时使用视频和音频输入,但在推理时丢弃音频输入通路。在本工作中,我们研究了在训练和推理阶段都使用视频与音频输入进行视频到语音合成的效果。具体来说,我们先使用预训练的视频到语音模型合成缺失的语音信号,再以无声视频和合成语音作为输入,训练一个视听到语音合成模型来预测最终重建的语音。实验表明,无论以原始波形还是梅尔频谱作为目标输出,该方法都是有效的。
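The two-stage composition described above, shown with placeholder PyTorch modules: a pretrained video-to-speech network first synthesises the missing audio, and an audio-visual-to-speech network then predicts the final reconstruction. Both sub-modules are assumed to exist; nothing here mirrors the paper's actual architectures.

```python
# Schematic of the pipeline: synthesize audio from silent video, then refine
# with an audio-visual model. The two sub-networks are placeholders.
import torch.nn as nn

class AVReSynth(nn.Module):
    def __init__(self, video_to_speech: nn.Module, audio_visual_to_speech: nn.Module):
        super().__init__()
        self.v2s = video_to_speech            # pretrained, assumed frozen
        self.avs = audio_visual_to_speech     # trained on (video, synth audio) pairs

    def forward(self, silent_video):
        synthesized_audio = self.v2s(silent_video)
        return self.avs(silent_video, synthesized_audio)
```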

A multiscale and multicriteria Generative Adversarial Network to synthesize 1-dimensional turbulent fields

  • paper_url: http://arxiv.org/abs/2307.16580
  • repo_url: None
  • paper_authors: Carlos Granero-Belinchon, Manuel Cabeza Gallucci
  • for: This paper introduces a new neural network stochastic model to generate a 1-dimensional stochastic field with turbulent velocity statistics, with the goal of accurately capturing the energy distribution, energy cascade, and intermittency across scales in turbulence.
  • methods: The model used in this paper is a Generative Adversarial Network (GAN) with multiple multiscale optimization criteria, including physics-based criteria based on the Kolmogorov and Obukhov statistical theories of fully developed turbulence. The model is fully convolutional with varying kernel sizes, and is trained using turbulent velocity signals from grid turbulence at Modane wind tunnel.
  • results: The paper reports that the proposed model is able to accurately capture the energy distribution, energy cascade, and intermittency across scales in turbulence, as demonstrated through experiments using turbulent velocity signals from the Modane wind tunnel.
    Abstract This article introduces a new Neural Network stochastic model to generate a 1-dimensional stochastic field with turbulent velocity statistics. Both the model architecture and training procedure ground on the Kolmogorov and Obukhov statistical theories of fully developed turbulence, so guaranteeing descriptions of 1) energy distribution, 2) energy cascade and 3) intermittency across scales in agreement with experimental observations. The model is a Generative Adversarial Network with multiple multiscale optimization criteria. First, we use three physics-based criteria: the variance, skewness and flatness of the increments of the generated field that retrieve respectively the turbulent energy distribution, energy cascade and intermittency across scales. Second, the Generative Adversarial Network criterion, based on reproducing statistical distributions, is used on segments of different length of the generated field. Furthermore, to mimic multiscale decompositions frequently used in turbulence's studies, the model architecture is fully convolutional with kernel sizes varying along the multiple layers of the model. To train our model we use turbulent velocity signals from grid turbulence at Modane wind tunnel.
    摘要 The model is a generative adversarial network (GAN) with multiple multiscale optimization criteria. First, three physics-based criteria are used: the variance, skewness, and flatness of the increments of the generated field, which respectively retrieve the turbulent energy distribution, energy cascade, and intermittency across scales. Second, the GAN criterion, based on reproducing statistical distributions, is used on segments of different length of the generated field.Furthermore, to mimic multiscale decompositions frequently used in turbulence studies, the model architecture is fully convolutional with kernel sizes varying along the multiple layers of the model. To train the model, turbulent velocity signals from grid turbulence at Modane wind tunnel are used.
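The three physics-based criteria named above are ordinary statistics of velocity increments across scales; below is a numpy sketch of how one would compute them for a 1-D signal. How exactly these quantities enter the GAN losses is not spelled out in the abstract, so this only covers the statistics side.

```python
# Variance, skewness and flatness of velocity increments across scales:
# proxies for the energy distribution, energy cascade and intermittency.
import numpy as np

def increment_statistics(u: np.ndarray, scales) -> dict:
    """u: 1-D velocity signal; scales: iterable of lags in samples."""
    stats = {}
    for r in scales:
        du = u[r:] - u[:-r]                              # increments at scale r
        var = du.var()
        skew = np.mean((du - du.mean()) ** 3) / var ** 1.5   # energy cascade proxy
        flat = np.mean((du - du.mean()) ** 4) / var ** 2     # intermittency proxy
        stats[r] = {"variance": var, "skewness": skew, "flatness": flat}
    return stats
```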

The Decimation Scheme for Symmetric Matrix Factorization

  • paper_url: http://arxiv.org/abs/2307.16564
  • repo_url: None
  • paper_authors: Francesco Camilli, Marc Mézard
  • for: 研究矩阵分解的基本统计限制,以掌握其在深度学习中的应用。
  • methods: 提出了一种名为“减少”的方法,通过将问题映射到一系列神经网络模型,以实现矩阵分解。这种方法可以 theoretically analyzable,但不是最优的。
  • results: 对两类矩阵进行了扩展与分析,并证明在低温极限下,该方法的复制对称自由熵具有普适形式。对于稀疏伊辛先验,证明了神经网络模型的存储容量随模式稀疏性的增加而发散。此外还提出了一种基于基态搜索的简单算法,可在无需信息性初始化的情况下实现矩阵分解。
    Abstract Matrix factorization is an inference problem that has acquired importance due to its vast range of applications that go from dictionary learning to recommendation systems and machine learning with deep networks. The study of its fundamental statistical limits represents a true challenge, and despite a decade-long history of efforts in the community, there is still no closed formula able to describe its optimal performances in the case where the rank of the matrix scales linearly with its size. In the present paper, we study this extensive rank problem, extending the alternative 'decimation' procedure that we recently introduced, and carry out a thorough study of its performance. Decimation aims at recovering one column/line of the factors at a time, by mapping the problem into a sequence of neural network models of associative memory at a tunable temperature. Though being sub-optimal, decimation has the advantage of being theoretically analyzable. We extend its scope and analysis to two families of matrices. For a large class of compactly supported priors, we show that the replica symmetric free entropy of the neural network models takes a universal form in the low temperature limit. For sparse Ising prior, we show that the storage capacity of the neural network models diverges as sparsity in the patterns increases, and we introduce a simple algorithm based on a ground state search that implements decimation and performs matrix factorization, with no need of an informative initialization.
    摘要 矩阵分解是一个推断问题,因其从字典学习到推荐系统和深度网络机器学习的广泛应用而变得重要。研究其基本统计极限是一项真正的挑战:尽管学界已经努力了十余年,在矩阵的秩与其规模成线性比例的情形下,仍然没有能够刻画其最优性能的闭式公式。在本文中,我们研究这一宽秩问题,扩展了我们最近提出的"消减"(decimation)方法,并对其性能进行了深入研究。消减的思路是每次恢复因子的一列/一行,方法是将问题映射为一系列可调温度的联想记忆神经网络模型。虽然并非最优,但消减的优点在于可以进行理论分析。我们将其适用范围和分析扩展到两类矩阵。对于一大类紧支撑先验,我们证明神经网络模型的复制对称自由熵在低温极限下具有普适形式。对于稀疏伊辛先验,我们证明神经网络模型的存储容量随模式稀疏性的增加而发散,并提出了一种基于基态搜索的简单算法,它实现消减并完成矩阵分解,且无需信息性初始化。
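As a very rough illustration of the ground-state-search ingredient mentioned above, the following numpy sketch greedily minimises a Hopfield-like energy -x^T Y x over ±1 vectors by single-spin flips. It is a toy under strong assumptions (symmetric Y, binary patterns, greedy dynamics), not the paper's decimation scheme.

```python
# Greedy single-spin-flip search for a +/-1 vector minimising -x^T Y x,
# i.e. a crude ground-state search on a Hopfield-like energy.
import numpy as np

def greedy_ground_state(Y, n_restarts=10, n_sweeps=50, rng=None):
    rng = rng or np.random.default_rng(0)
    n = Y.shape[0]
    best_x, best_e = None, np.inf
    for _ in range(n_restarts):
        x = rng.choice([-1.0, 1.0], size=n)
        for _ in range(n_sweeps):
            for i in range(n):
                local = Y[i] @ x - Y[i, i] * x[i]     # field from the other spins
                x[i] = 1.0 if local > 0 else -1.0     # flip that lowers -x^T Y x
        e = -x @ Y @ x
        if e < best_e:
            best_x, best_e = x.copy(), e
    return best_x
```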

Line Search for Convex Minimization

  • paper_url: http://arxiv.org/abs/2307.16560
  • repo_url: None
  • paper_authors: Laurent Orseau, Marcus Hutter
  • for: 这个论文的目的是提出一种新的搜索算法,用于最小化一元函数中的极值点。
  • methods: 该算法使用了 golden-section 搜索和 bisect 搜索两种主要的原理,并且利用了函数查询和导数查询来加速收敛。
  • results: 实验表明,该算法比其前一个论文中的搜索算法更快,通常比其快得多于一倍。此外,该算法还可以用于 quasi-exact line search,并且可以与导数下降的搜索 algorithms 进行比较。
    Abstract Golden-section search and bisection search are the two main principled algorithms for 1d minimization of quasiconvex (unimodal) functions. The first one only uses function queries, while the second one also uses gradient queries. Other algorithms exist under much stronger assumptions, such as Newton's method. However, to the best of our knowledge, there is no principled exact line search algorithm for general convex functions -- including piecewise-linear and max-compositions of convex functions -- that takes advantage of convexity. We propose two such algorithms: $\Delta$-Bisection is a variant of bisection search that uses (sub)gradient information and convexity to speed up convergence, while $\Delta$-Secant is a variant of golden-section search and uses only function queries. While bisection search reduces the $x$ interval by a factor 2 at every iteration, $\Delta$-Bisection reduces the (sometimes much) smaller $x^*$-gap $\Delta^x$ (the $x$ coordinates of $\Delta$) by at least a factor 2 at every iteration. Similarly, $\Delta$-Secant also reduces the $x^*$-gap by at least a factor 2 every second function query. Moreover, the $y^*$-gap $\Delta^y$ (the $y$ coordinates of $\Delta$) also provides a refined stopping criterion, which can also be used with other algorithms. Experiments on a few convex functions confirm that our algorithms are always faster than their quasiconvex counterparts, often by more than a factor 2. We further design a quasi-exact line search algorithm based on $\Delta$-Secant. It can be used with gradient descent as a replacement for backtracking line search, for which some parameters can be finicky to tune -- and we provide examples to this effect, on strongly-convex and smooth functions. We provide convergence guarantees, and confirm the efficiency of quasi-exact line search on a few single- and multivariate convex functions.
    摘要 黄金分割搜索和二分搜索是对拟凸(单峰)函数进行一维最小化的两种主要原则性算法:前者只使用函数查询,后者还使用梯度查询。在更强的假设下还存在其他算法,例如牛顿法。然而,据我们所知,目前还没有能够利用凸性的、针对一般凸函数(包括分段线性函数和凸函数的最大值复合)的原则性精确线搜索算法。我们提出了两种这样的算法:$\Delta$-Bisection 是二分搜索的变体,利用(次)梯度信息和凸性来加速收敛;$\Delta$-Secant 是黄金分割搜索的变体,只使用函数查询。二分搜索每次迭代将 $x$ 区间缩小一半,而 $\Delta$-Bisection 每次迭代将(通常小得多的)$x^*$-间隙 $\Delta^x$(即 $\Delta$ 的 $x$ 坐标范围)至少缩小一半。类似地,$\Delta$-Secant 每两次函数查询也将 $x^*$-间隙至少缩小一半。此外,$y^*$-间隙 $\Delta^y$(即 $\Delta$ 的 $y$ 坐标范围)还提供了更精细的停止准则,也可用于其他算法。在若干凸函数上的实验证实,我们的算法始终快于对应的拟凸算法,常常快两倍以上。我们还基于 $\Delta$-Secant 设计了一种准精确线搜索算法,可在梯度下降中替代回溯线搜索(后者的一些参数往往难以调节),我们在强凸且光滑的函数上给出了相应示例。我们提供了收敛保证,并在若干单变量和多变量凸函数上验证了准精确线搜索的效率。
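For orientation, here is the plain gradient-sign bisection that $\Delta$-Bisection builds on: for a convex 1-D function, the sign of a (sub)gradient at the midpoint tells which half of the bracket contains the minimiser. This is the baseline idea only, not the paper's $\Delta^x$-gap refinement.

```python
# Baseline bisection on the sign of the (sub)gradient for convex 1-D minimisation.
def gradient_bisection(grad, lo, hi, tol=1e-8, max_iter=200):
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        g = grad(mid)
        if abs(g) < tol or hi - lo < tol:
            return mid
        if g > 0:           # function increasing at mid: minimiser lies to the left
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

# Example: minimise f(x) = |x - 1| + 0.5 * x**2 on [-10, 10] via a subgradient;
# the minimiser is x = 1.
x_star = gradient_bisection(lambda x: (1 if x > 1 else -1) + x, -10.0, 10.0)
```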

Simultaneous column-based deep learning progression analysis of atrophy associated with AMD in longitudinal OCT studies

  • paper_url: http://arxiv.org/abs/2307.16559
  • repo_url: None
  • paper_authors: Adi Szeskin, Roei Yehuda, Or Shmueli, Jaime Levy, Leo Joskowicz
  • For: The paper aims to develop a fully automatic end-to-end pipeline for detecting and quantifying retinal atrophy changes associated with dry AMD in pairs of OCT scans.
  • Methods: The proposed method uses a novel simultaneous multi-channel column-based deep learning model that concurrently detects and segments retinal atrophy segments in consecutive OCT scans by classifying light scattering patterns in matched pairs of vertical pixel-wide columns (A-scans) in registered prior and current OCT slices (B-scans).
  • Results: The experimental results on a dataset of 4,040 OCT slices with 5.2M columns from 40 scan pairs of 18 patients show a mean atrophy segments detection precision of 0.90+-0.09 and a recall of 0.95+-0.06, outperforming standalone classification methods by 30+-62% and 27+-0% for atrophy segments and lesions, respectively.
  • For: 本研究旨在开发一个全自动的端到端流程,用于在成对的 OCT 扫描中检测并量化与干性 AMD 相关的视网膜萎缩变化。
  • Methods: 所提方法使用一种新颖的同时多通道、基于列的深度学习模型,通过对配准的先前与当前 OCT 切片(B-scan)中匹配的单像素宽垂直列(A-scan)的光散射模式进行分类,同时检测并分割连续 OCT 扫描中的视网膜萎缩区段。
  • Results: 在 18 名患者的 40 对扫描、共 4,040 张 OCT 切片(520 万列)的数据集上,萎缩区段检测的平均精确率为 0.90+-0.09、召回率为 0.95+-0.06,在萎缩区段和病灶上分别比单独分类方法高出 30+-62% 和 27+-0%。
    Abstract Purpose: Disease progression of retinal atrophy associated with AMD requires the accurate quantification of the retinal atrophy changes on longitudinal OCT studies. It is based on finding, comparing, and delineating subtle atrophy changes on consecutive pairs (prior and current) of unregistered OCT scans. Methods: We present a fully automatic end-to-end pipeline for the simultaneous detection and quantification of time-related atrophy changes associated with dry AMD in pairs of OCT scans of a patient. It uses a novel simultaneous multi-channel column-based deep learning model trained on registered pairs of OCT scans that concurrently detects and segments retinal atrophy segments in consecutive OCT scans by classifying light scattering patterns in matched pairs of vertical pixel-wide columns (A-scans) in registered prior and current OCT slices (B-scans). Results: Experimental results on 4,040 OCT slices with 5.2M columns from 40 scans pairs of 18 patients (66% training/validation, 33% testing) with 24.13+-14.0 months apart in which Complete RPE and Outer Retinal Atrophy (cRORA) was identified in 1,998 OCT slices (735 atrophy lesions from 3,732 segments, 0.45M columns) yield a mean atrophy segments detection precision, recall of 0.90+-0.09, 0.95+-0.06 and 0.74+-0.18, 0.94+-0.12 for atrophy lesions with AUC=0.897, all above observer variability. Simultaneous classification outperforms standalone classification precision and recall by 30+-62% and 27+-0% for atrophy segments and lesions. Conclusions: simultaneous column-based detection and quantification of retinal atrophy changes associated with AMD is accurate and outperforms standalone classification methods. Translational relevance: an automatic and efficient way to detect and quantify retinal atrophy changes associated with AMD.
    摘要 目的:检测和评估普遍疾病相关的肉眼衰竭变化,需要精准地量化 consecutiveslices of OCT Studies。这是基于发现,比较和定义极微的衰竭变化的方法。方法:我们提出了一个完全自动的终端到终点管道,用于同时检测和评估普遍疾病相关的肉眼衰竭变化。这种方法使用了一种同时多通道的 column-based深度学习模型,该模型在注册的Prior和Current OCT slice之间同时检测和分割肉眼衰竭 segment。该模型通过匹配vertical pixel-wide columns(A-scans)在注册的Prior和Current OCT slice(B-scans)中匹配的光散射模式来同时检测和分割肉眼衰竭segment。结果:我们在4,040个OCT slice中进行了40个scan pairs的实验,其中每个scan pairs包含24.13±14.0个月的时间差。在这些实验中,我们发现了1,998个OCT slice中存在普遍疾病相关的肉眼衰竭(cRORA),其中735个衰竭 lesion from 3,732个 segment,0.45M columns。我们的方法在这些OCT slice中达到了 mean atrophy segments检测精度和回归的0.90±0.09和0.95±0.06,同时 simultanous classification的精度和回归也高于单独的 classification方法,相对于衰竭 segments和 lesions的检测和分割, simultaneous classification的精度和回归高于30±62%和27±0%。结论:同时检测和评估普遍疾病相关的肉眼衰竭变化是一种准确的方法,并且高于单独的 classification方法。翻译结论:我们提出了一种自动和高效的方法,可以帮助检测和评估普遍疾病相关的肉眼衰竭变化。

Deep Learning and Computer Vision for Glaucoma Detection: A Review

  • paper_url: http://arxiv.org/abs/2307.16528
  • repo_url: None
  • paper_authors: Mona Ashtari-Majlan, Mohammad Mahdi Dehshibi, David Masip
  • for: 本研究旨在概述近年来计算机视觉和深度学习在诊断眼内压瘤方面的应用,包括基于fundus、optical coherence tomography和视场图像的诊断方法。
  • methods: 本研究主要介绍了深度学习基于方法,并提供了一个更新的分类法,将方法分为不同的建筑学 paradigma,并附上了可用的源代码以增强方法的重复性。
  • results: 通过对广泛使用的公共数据集进行严格的 benchmarking,我们揭示了一些普遍的性能差距,包括总体化、不确定性估计和多modal интеграción。此外,我们还细目了一些关键的数据集,并指出了限制,如批处大小、标签不一致和偏见。
    Abstract Glaucoma is the leading cause of irreversible blindness worldwide and poses significant diagnostic challenges due to its reliance on subjective evaluation. However, recent advances in computer vision and deep learning have demonstrated the potential for automated assessment. In this paper, we survey recent studies on AI-based glaucoma diagnosis using fundus, optical coherence tomography, and visual field images, with a particular emphasis on deep learning-based methods. We provide an updated taxonomy that organizes methods into architectural paradigms and includes links to available source code to enhance the reproducibility of the methods. Through rigorous benchmarking on widely-used public datasets, we reveal performance gaps in generalizability, uncertainty estimation, and multimodal integration. Additionally, our survey curates key datasets while highlighting limitations such as scale, labeling inconsistencies, and bias. We outline open research challenges and detail promising directions for future studies. This survey is expected to be useful for both AI researchers seeking to translate advances into practice and ophthalmologists aiming to improve clinical workflows and diagnosis using the latest AI outcomes.
    摘要 青光眼是全球不可逆失明的首要原因,由于其诊断依赖主观评估,因而面临较大的诊断挑战。然而,计算机视觉和深度学习的最新进展展示了自动评估的潜力。在本文中,我们回顾了基于眼底图像、光学相干断层扫描(OCT)和视野图像的 AI 青光眼诊断研究,尤其关注基于深度学习的方法。我们提供了一个更新的分类体系,将方法按架构范式进行组织,并附上可用源代码链接,以增强方法的可复现性。通过在广泛使用的公共数据集上进行严格的 benchmarking,我们揭示了在泛化能力、不确定性估计和多模态融合方面的性能差距。此外,我们还梳理了关键数据集,并指出其规模、标注不一致和偏差等局限。我们列出了尚未解决的研究挑战,并详细介绍了未来研究的有前景方向。本综述预计对希望将最新 AI 成果付诸实践的研究人员,以及希望借此改进临床流程和诊断的眼科医生都有帮助。

No Fair Lunch: A Causal Perspective on Dataset Bias in Machine Learning for Medical Imaging

  • paper_url: http://arxiv.org/abs/2307.16526
  • repo_url: None
  • paper_authors: Charles Jones, Daniel C. Castro, Fabio De Sousa Ribeiro, Ozan Oktay, Melissa McCradden, Ben Glocker
  • for: This paper is written for those who are concerned about fairness in clinical decision-making, particularly in the context of machine learning methods.
  • methods: The paper uses a causal perspective to identify and analyze different sources of bias in datasets, and introduces a three-step framework for reasoning about fairness in medical imaging.
  • results: The paper highlights the limitations of current mitigation methods for algorithmic bias, and provides a practical framework for developing safe and equitable AI prediction models.
    Abstract As machine learning methods gain prominence within clinical decision-making, addressing fairness concerns becomes increasingly urgent. Despite considerable work dedicated to detecting and ameliorating algorithmic bias, today's methods are deficient with potentially harmful consequences. Our causal perspective sheds new light on algorithmic bias, highlighting how different sources of dataset bias may appear indistinguishable yet require substantially different mitigation strategies. We introduce three families of causal bias mechanisms stemming from disparities in prevalence, presentation, and annotation. Our causal analysis underscores how current mitigation methods tackle only a narrow and often unrealistic subset of scenarios. We provide a practical three-step framework for reasoning about fairness in medical imaging, supporting the development of safe and equitable AI prediction models.
    摘要 随着机器学习方法在医疗决策中升级,对公平性问题的解决变得越来越紧迫。虽然已经投入了大量的时间和精力来检测和改进算法的偏见,但今天的方法仍然存在有害的后果。我们的 causal 视角 shed 新的光 на算法偏见,指出不同的数据集偏见可能会看起来相同,但需要不同的修正策略。我们介绍了三种家族的 causal 偏见机制,来自不同的发病率、展示和注释的偏见。我们的 causal 分析表明,当前的修正方法只能处理一个窄而且经常不现实的子集的场景。我们提供了一个实用的三步框架,以便在医疗影像领域考虑公平性,支持开发安全和公平的 AI 预测模型。

Deception Abilities Emerged in Large Language Models

  • paper_url: http://arxiv.org/abs/2307.16513
  • repo_url: None
  • paper_authors: Thilo Hagendorff
  • for: 这研究旨在探讨大型自然语言模型(LLM)是如何适应人类价值观的,以及未来 LLM 是否可能成为人类操作员的欺骗工具。
  • methods: 该研究使用现代大型语言模型 GPT-4 进行实验,以证明这些模型在逻辑推理和复杂的欺骗场景中表现出色。
  • results: 研究发现,现代 LLM 已经拥有了骗取他人信任的能力,并且可以通过链条思维提高其欺骗性能。此外,通过引入 MACHIAVELLIANISM 来调节 LLM 的骗取倾向也被证明有效。
    Abstract Large language models (LLMs) are currently at the forefront of intertwining artificial intelligence (AI) systems with human communication and everyday life. Thus, aligning them with human values is of great importance. However, given the steady increase in reasoning abilities, future LLMs are under suspicion of becoming able to deceive human operators and utilizing this ability to bypass monitoring efforts. As a prerequisite to this, LLMs need to possess a conceptual understanding of deception strategies. This study reveals that such strategies emerged in state-of-the-art LLMs, such as GPT-4, but were non-existent in earlier LLMs. We conduct a series of experiments showing that state-of-the-art LLMs are able to understand and induce false beliefs in other agents, that their performance in complex deception scenarios can be amplified utilizing chain-of-thought reasoning, and that eliciting Machiavellianism in LLMs can alter their propensity to deceive. In sum, revealing hitherto unknown machine behavior in LLMs, our study contributes to the nascent field of machine psychology.
    摘要 大语言模型(LLM)目前正处于将人工智能系统与人类沟通和日常生活深度结合的前沿,因此使其与人类价值观保持一致至关重要。然而,随着推理能力的不断提升,人们担心未来的 LLM 可能学会欺骗人类操作者,并利用这种能力绕过监控。作为其前提,LLM 需要对欺骗策略具有概念性理解。本研究表明,这类策略已在 GPT-4 等最先进的 LLM 中出现,而在较早的 LLM 中并不存在。我们通过一系列实验表明:最先进的 LLM 能够理解并诱导其他智能体产生错误信念;借助思维链推理可以增强其在复杂欺骗情境中的表现;在 LLM 中诱发马基雅维利主义会改变其欺骗倾向。总之,本研究揭示了 LLM 中此前未知的机器行为,为新兴的机器心理学领域做出了贡献。

Classifying multilingual party manifestos: Domain transfer across country, time, and genre

  • paper_url: http://arxiv.org/abs/2307.16511
  • repo_url: https://github.com/slds-lmu/manifesto-domaintransfer
  • paper_authors: Matthias Aßenmacher, Nadja Sauter, Christian Heumann
  • for: 这个研究的目的是探讨域转移在政治宣言中的可靠性和可重用性。
  • methods: 这个研究使用了 transformer 模型,并通过 fine-tuning 来调整模型。在不同的地理位置、语言、时间和风格上进行了域转移。
  • results: 研究发现,BERT 和 DistilBERT 都能够在不同的域转移情况下获得良好的表现,但 DistilBERT 的计算成本较低。此外,研究发现不同国家的政治宣言之间存在一定的差异,即使这些国家使用同一种语言或文化背景。
    Abstract Annotating costs of large corpora are still one of the main bottlenecks in empirical social science research. On the one hand, making use of the capabilities of domain transfer allows re-using annotated data sets and trained models. On the other hand, it is not clear how well domain transfer works and how reliable the results are for transfer across different dimensions. We explore the potential of domain transfer across geographical locations, languages, time, and genre in a large-scale database of political manifestos. First, we show the strong within-domain classification performance of fine-tuned transformer models. Second, we vary the genre of the test set across the aforementioned dimensions to test for the fine-tuned models' robustness and transferability. For switching genres, we use an external corpus of transcribed speeches from New Zealand politicians while for the other three dimensions, custom splits of the Manifesto database are used. While BERT achieves the best scores in the initial experiments across modalities, DistilBERT proves to be competitive at a lower computational expense and is thus used for further experiments across time and country. The results of the additional analysis show that (Distil)BERT can be applied to future data with similar performance. Moreover, we observe (partly) notable differences between the political manifestos of different countries of origin, even if these countries share a language or a cultural background.
    摘要 对大型语料库进行标注的成本仍是实证社会科学研究的主要瓶颈之一。一方面,利用领域迁移的能力可以重用已标注的数据集和已训练的模型;另一方面,领域迁移的效果如何、跨不同维度迁移的结果是否可靠,目前尚不清楚。我们在一个大规模的政党宣言数据库中探索了跨地理位置、语言、时间和体裁的领域迁移潜力。首先,我们展示了微调后的 Transformer 模型强大的域内分类性能。其次,我们沿上述各维度改变测试集的体裁,以检验微调模型的鲁棒性和可迁移性。在切换体裁时,我们使用新西兰政治人物演讲转写的外部语料库;对其余三个维度,则使用 Manifesto 数据库的自定义划分。虽然 BERT 在各模态的初始实验中取得了最佳成绩,但 DistilBERT 以更低的计算开销取得了相当的表现,因此被用于跨时间和跨国家的后续实验。附加分析表明,(Distil)BERT 可以以相近的性能应用于未来数据。此外,我们观察到不同来源国家的政党宣言之间存在(部分)显著差异,即使这些国家共享语言或文化背景。
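A minimal fine-tuning sketch in the spirit of these experiments, using the Hugging Face transformers and datasets libraries. The tiny in-memory dataset, the label count and all hyperparameters are placeholders, not the paper's setup.

```python
# Fine-tune a multilingual DistilBERT for manifesto-style sentence classification.
from datasets import Dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

model_name = "distilbert-base-multilingual-cased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=7)

# Toy stand-in for a Manifesto-style split: "text" sentences, integer "label".
train_ds = Dataset.from_dict({
    "text": ["We will raise the minimum wage.", "Cut taxes for small business."],
    "label": [0, 1],
})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

train_ds = train_ds.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=train_ds,
)
trainer.train()
```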

Explainable Equivariant Neural Networks for Particle Physics: PELICAN

  • paper_url: http://arxiv.org/abs/2307.16506
  • repo_url: https://github.com/abogatskiy/pelican
  • paper_authors: Alexander Bogatskiy, Timothy Hoffman, David W. Miller, Jan T. Offermann, Xiaoyang Liu
  • For: The paper is written for the task of tagging and reconstructing Lorentz-boosted top quarks, specifically identifying and measuring the $W$-boson in the dense final state.
  • Methods: The paper proposes a novel permutation equivariant and Lorentz invariant or covariant aggregator network called PELICAN, which employs a fundamentally symmetry group-based architecture to overcome common limitations in particle physics problems.
  • Results: PELICAN outperforms existing competitors with much lower model complexity and high sample efficiency on the standard task of Lorentz-boosted top quark tagging, and also outperforms hand-crafted algorithms on the less common and more complex task of four-momentum regression.
    Abstract We present a comprehensive study of the PELICAN machine learning algorithm architecture in the context of both tagging (classification) and reconstructing (regression) Lorentz-boosted top quarks, including the difficult task of specifically identifying and measuring the $W$-boson inside the dense environment of the boosted hadronic final state. PELICAN is a novel permutation equivariant and Lorentz invariant or covariant aggregator network designed to overcome common limitations found in architectures applied to particle physics problems. Compared to many approaches that use non-specialized architectures that neglect underlying physics principles and require very large numbers of parameters, PELICAN employs a fundamentally symmetry group-based architecture that demonstrates benefits in terms of reduced complexity, increased interpretability, and raw performance. When tested on the standard task of Lorentz-boosted top quark tagging, PELICAN outperforms existing competitors with much lower model complexity and high sample efficiency. On the less common and more complex task of four-momentum regression, PELICAN also outperforms hand-crafted algorithms. We discuss the implications of symmetry-restricted architectures for the wider field of machine learning for physics.
    摘要 我们提出了一项全面的PELICAN机器学习算法架构研究,包括标记(分类)和重建(回归) Lorentz-扩展类题粒子,包括困难的内部 $W-$ boson 识别和量测在扩展有核心环境的对应核心粒子状态中。 PELICAN 是一个新的对称平衡和 Lorentz 不变的网络架构,用于超越物理问题中常见的限制。 相比许多使用非特殊架构的方法,PELICAN 使用基本的 Symmetry 集合 基础架构,实现了对复杂性、可读性和原生性的改善。 在标准任务中 Lorentz-扩展类题粒子标识中,PELICAN 超过了现有的竞争对手,具有较低的模型复杂度和高的样本效率。 在更少见且更复杂的任务中,四维动量回归中,PELICAN 也超过了手工构成的算法。 我们讨论了对物理机器学习领域的对称限制的影响。
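Symmetry-based networks of this kind operate on Lorentz-invariant inputs; the basic invariant building block is the matrix of pairwise Minkowski dot products of the particle four-momenta, which can be computed as below. This is a generic sketch, not the paper's code; the metric signature convention and example numbers are illustrative.

```python
# Pairwise Minkowski dot products of four-momenta (metric signature +,-,-,-):
# each entry is Lorentz invariant and can be fed to a permutation-equivariant net.
import numpy as np

def pairwise_minkowski(p: np.ndarray) -> np.ndarray:
    """p: (n, 4) array of four-momenta (E, px, py, pz) -> (n, n) invariants."""
    metric = np.diag([1.0, -1.0, -1.0, -1.0])
    return p @ metric @ p.T            # entry (i, j) = p_i . p_j

# Example: the invariant mass squared of a two-particle system is
#   m^2 = (p1 + p2).(p1 + p2) = d[0,0] + d[1,1] + 2*d[0,1].
p = np.array([[50.0, 10.0, 20.0, 40.0],
              [30.0, -5.0, 15.0, 20.0]])
d = pairwise_minkowski(p)
m2 = d[0, 0] + d[1, 1] + 2 * d[0, 1]
```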

Value-Informed Skill Chaining for Policy Learning of Long-Horizon Tasks with Surgical Robot

  • paper_url: http://arxiv.org/abs/2307.16503
  • repo_url: https://github.com/med-air/viskill
  • paper_authors: Tao Huang, Kai Chen, Wang Wei, Jianan Li, Yonghao Long, Qi Dou
  • for: solves long-horizon surgical robot tasks with multiple steps over an extended duration of time.
  • methods: uses value-informed skill chaining (ViSkill) with a state value function to distinguish suitable terminal states for starting subtask policies, and a chaining policy to instruct subtask policies to terminate at the highest-value state.
  • results: demonstrates effectiveness on three complex surgical robot tasks from SurRoL, achieving high task success rates and execution efficiency.
    Abstract Reinforcement learning is still struggling with solving long-horizon surgical robot tasks which involve multiple steps over an extended duration of time due to the policy exploration challenge. Recent methods try to tackle this problem by skill chaining, in which the long-horizon task is decomposed into multiple subtasks for easing the exploration burden and subtask policies are temporally connected to complete the whole long-horizon task. However, smoothly connecting all subtask policies is difficult for surgical robot scenarios. Not all states are equally suitable for connecting two adjacent subtasks. An undesired terminate state of the previous subtask would make the current subtask policy unstable and result in a failed execution. In this work, we introduce value-informed skill chaining (ViSkill), a novel reinforcement learning framework for long-horizon surgical robot tasks. The core idea is to distinguish which terminal state is suitable for starting all the following subtask policies. To achieve this target, we introduce a state value function that estimates the expected success probability of the entire task given a state. Based on this value function, a chaining policy is learned to instruct subtask policies to terminate at the state with the highest value so that all subsequent policies are more likely to be connected for accomplishing the task. We demonstrate the effectiveness of our method on three complex surgical robot tasks from SurRoL, a comprehensive surgical simulation platform, achieving high task success rates and execution efficiency. Code is available at $\href{https://github.com/med-air/ViSkill}{\text{https://github.com/med-air/ViSkill}$.
    摘要 <>使用强化学习解决长时间间隔的外科机器人任务仍然面临着策略探索挑战。现有方法是通过精细分解任务,将长时间间隔的任务分解成多个子任务,以减轻探索压力。但是,在外科机器人场景下,平滑地连接所有子任务策略是困难的。不是所有状态都适合连接两个相邻的子任务策略。undesired terminate state of the previous subtask would make the current subtask policy unstable and result in a failed execution。在这种情况下,我们提出了值知识推荐技术(ViSkill),一种新的强化学习框架,用于解决长时间间隔的外科机器人任务。核心思想是在不同状态下分配不同的策略,以确保连接所有子任务策略。为此,我们引入了一个状态价值函数,用于估计整个任务的成功概率。基于这个价值函数,我们学习了一个链接策略,用于指定子任务策略在最高价值的状态中终止,以确保所有后续策略能够连接成功完成任务。我们在三个复杂的外科机器人任务上进行了实验,分别来自SurRoL数据平台,实现了高任务成功率和执行效率。代码可以在 $\href{https://github.com/med-air/ViSkill}{\text{https://github.com/med-air/ViSkill}$ 上获取。
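At its core, the chaining criterion described above reduces to picking the terminal state the learned value function scores highest, so that the next subtask policy starts from a favourable state. A toy sketch follows; `value_fn` and the candidate states are hypothetical stand-ins for the learned estimator and the subtask rollout.

```python
# Value-informed choice of where to terminate the current subtask.
import numpy as np

def choose_terminal_state(candidate_states, value_fn):
    values = np.array([value_fn(s) for s in candidate_states])
    best = int(values.argmax())
    return candidate_states[best], float(values[best])

# Toy usage: states are 2-D vectors, the value function is a hand-written stand-in.
states = [np.array([0.1, 0.9]), np.array([0.7, 0.2]), np.array([0.5, 0.5])]
best_state, best_value = choose_terminal_state(states, lambda s: float(s.sum()))
```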

Learning Generalizable Tool Use with Non-rigid Grasp-pose Registration

  • paper_url: http://arxiv.org/abs/2307.16499
  • repo_url: None
  • paper_authors: Malte Mosbach, Sven Behnke
  • for: 本研究旨在帮助机器人学习工具使用行为。
  • methods: 该方法使用一个单一示例来学习新类型工具的操作。它使用了多指手套的抓取配置的普适化,通过有利的初始化和形式化奖励信号来导引政策搜索。
  • results: 学习出来的策略可以解决复杂的工具使用任务,并可以在未看过的工具上进行推广。视频和图像可以在https://maltemosbach.github.io/generalizable_tool_use上查看。
    Abstract Tool use, a hallmark feature of human intelligence, remains a challenging problem in robotics due the complex contacts and high-dimensional action space. In this work, we present a novel method to enable reinforcement learning of tool use behaviors. Our approach provides a scalable way to learn the operation of tools in a new category using only a single demonstration. To this end, we propose a new method for generalizing grasping configurations of multi-fingered robotic hands to novel objects. This is used to guide the policy search via favorable initializations and a shaped reward signal. The learned policies solve complex tool use tasks and generalize to unseen tools at test time. Visualizations and videos of the trained policies are available at https://maltemosbach.github.io/generalizable_tool_use.
    摘要 人类智能的一个标志性特征是工具使用,但在机器人学中,这种问题仍然是一个挑战。在这篇论文中,我们提出了一种新的方法来启用机器人学习工具使用行为。我们的方法可以在新类别中学习工具的操作,只需要一个示例。为实现这一目标,我们提出了一种新的方法来泛化多指手 robotic 手上的抓取配置到新物体。这种方法通过提供有利初始化和形成的奖励信号来导引政策搜索。我们的学习策略解决了复杂的工具使用任务,并在测试时对未看过的工具进行扩展。可以在 查看视频和图像。

Don’t be so negative! Score-based Generative Modeling with Oracle-assisted Guidance

  • paper_url: http://arxiv.org/abs/2307.16463
  • repo_url: None
  • paper_authors: Saeid Naderiparizi, Xiaoxuan Liang, Berend Zwartsenberg, Frank Wood
  • for: The paper is written for discussing a new method called Gen-neG, which leverages side-information in the form of an oracle to improve the learning of probabilistic models.
  • methods: The paper uses a combination of generative adversarial networks (GANs) and discriminator guidance in diffusion models to guide the generation process towards the positive support region indicated by the oracle.
  • results: The paper presents empirical results in applications including collision avoidance in self-driving simulators and safety-guarded human motion generation, demonstrating the utility of the proposed Gen-neG method.
  • for: 本文是用来介绍一种新的方法called Gen-neG,该方法利用 oracle 提供的侧 информацию来改进概率模型的学习。
  • methods: 本文使用了一种组合了生成对抗网络(GANs)和混合环境导向的扩散模型,以便通过 oracle 指定的正方向区域来导引生成过程。
  • results: 本文对自动驾驶模拟器中的碰撞避免和人体动作生成等应用中进行了实验,并证明了 Gen-neG 方法的实用性。
    Abstract The maximum likelihood principle advocates parameter estimation via optimization of the data likelihood function. Models estimated in this way can exhibit a variety of generalization characteristics dictated by, e.g. architecture, parameterization, and optimization bias. This work addresses model learning in a setting where there further exists side-information in the form of an oracle that can label samples as being outside the support of the true data generating distribution. Specifically we develop a new denoising diffusion probabilistic modeling (DDPM) methodology, Gen-neG, that leverages this additional side-information. Our approach builds on generative adversarial networks (GANs) and discriminator guidance in diffusion models to guide the generation process towards the positive support region indicated by the oracle. We empirically establish the utility of Gen-neG in applications including collision avoidance in self-driving simulators and safety-guarded human motion generation.
    摘要 “最大可能性原则”提倡通过数据可能函数估计 параметр。这种方法可以实现多种通用特征,例如建筑、参数化和估计偏好。这个工作在存在 oracle 提供样本是否在真实数据生成分布中的支持下的情况下进行模型学习。我们开发了一种新的推导散布模型方法(DDPM),叫做 Gen-neG,它利用这些额外的side-information。我们的方法基于生成对抗网络(GANs)和推导器指导在散布模型中引导生成过程,以便将生成结果导向正确的支持区域。我们经过实验证明 Gen-neG 在包括自驾车 simulation 和人类动作生成等应用中的实用性。
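One common way to realise oracle-assisted guidance is to shift the score at each reverse-diffusion step by the gradient of the log probability that a discriminator assigns to the "inside the support" class. The sketch below shows that generic recipe; it is not necessarily Gen-neG's exact update, and `score_model` / `discriminator` are placeholder callables.

```python
# Discriminator/oracle guidance at one reverse-diffusion step.
import torch

def guided_score(score_model, discriminator, x, t, guidance_scale=1.0):
    x = x.detach().requires_grad_(True)
    logit = discriminator(x, t)                       # log-odds of "in positive support"
    log_p = torch.nn.functional.logsigmoid(logit).sum()
    grad = torch.autograd.grad(log_p, x)[0]           # d/dx log D(x, t)
    return score_model(x, t) + guidance_scale * grad  # shifted score / noise direction
```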

L3DMC: Lifelong Learning using Distillation via Mixed-Curvature Space

  • paper_url: http://arxiv.org/abs/2307.16459
  • repo_url: https://github.com/csiro-robotics/l3dmc
  • paper_authors: Kaushik Roy, Peyman Moghadam, Mehrtash Harandi
  • for: 提高生长学习(L3)模型在连续学习任务中的表现,解决L3模型在学习新概念时的性能下降问题。
  • methods: 提议使用混合曲率空间(mixed-curvature space)来保持已经学习的知识,并使用多个固定曲率空间(fixed-curvature spaces)的表示能力来增强模型的表达力。
  • results: 在三个标准测试集上进行了实验,证明了我们提议的混合曲率空间防止忘记旧知识并更好地适应新知识的方法可以提高L3模型在医学图像分类任务中的表现。
    Abstract The performance of a lifelong learning (L3) model degrades when it is trained on a series of tasks, as the geometrical formation of the embedding space changes while learning novel concepts sequentially. The majority of existing L3 approaches operate on a fixed-curvature (e.g., zero-curvature Euclidean) space that is not necessarily suitable for modeling the complex geometric structure of data. Furthermore, the distillation strategies apply constraints directly on low-dimensional embeddings, discouraging the L3 model from learning new concepts by making the model highly stable. To address the problem, we propose a distillation strategy named L3DMC that operates on mixed-curvature spaces to preserve the already-learned knowledge by modeling and maintaining complex geometrical structures. We propose to embed the projected low dimensional embedding of fixed-curvature spaces (Euclidean and hyperbolic) to higher-dimensional Reproducing Kernel Hilbert Space (RKHS) using a positive-definite kernel function to attain rich representation. Afterward, we optimize the L3 model by minimizing the discrepancies between the new sample representation and the subspace constructed using the old representation in RKHS. L3DMC is capable of adapting new knowledge better without forgetting old knowledge as it combines the representation power of multiple fixed-curvature spaces and is performed on higher-dimensional RKHS. Thorough experiments on three benchmarks demonstrate the effectiveness of our proposed distillation strategy for medical image classification in L3 settings. Our code implementation is publicly available at https://github.com/csiro-robotics/L3DMC.
    摘要 “一个生命时间学习(L3)模型的性能会随着在不同任务上的训练,而逐渐下降。现有大多数L3方法都是在固定曲率(例如零曲率欧几里得)空间中进行训练,这并不一定适合数据的复杂的几何结构。另外,维持抽象策略会直接在低维度嵌入上加载约束,使L3模型学习新的概念变得更加困难。为解决这问题,我们提出了一种名为L3DMC的维持策略,它在混合曲率空间中进行训练,以保留已经学习的知识,并在高维度的复制函数希尔бер特空间(RKHS)中进行嵌入。然后,我们将L3模型进行优化,使其在新样本表示中与以前的表示在RKHS中构建的子空间之间的差异最小化。L3DMC可以更好地适应新的知识,而不是忘记过去的知识,因为它结合了多个固定曲率空间的表示能力,并在高维度RKHS中进行训练。我们在三个标准检验 задании上进行了详细的实验,并证明了L3DMC的效iveness。我们的代码实现可以在https://github.com/csiro-robotics/L3DMC上获得。”
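A simplified reading of the RKHS ingredient: embed old and new features through an RBF kernel and penalise an MMD-style discrepancy between them. The paper additionally works with mixed Euclidean and hyperbolic embeddings before the kernel map; this sketch omits that and is illustrative only.

```python
# Kernel-space distillation: squared MMD between new and old feature batches.
import torch

def rbf_kernel(a, b, gamma=1.0):
    d2 = torch.cdist(a, b).pow(2)
    return torch.exp(-gamma * d2)

def kernel_distillation_loss(z_new, z_old, gamma=1.0):
    k_nn = rbf_kernel(z_new, z_new, gamma).mean()
    k_oo = rbf_kernel(z_old, z_old, gamma).mean()
    k_no = rbf_kernel(z_new, z_old, gamma).mean()
    return k_nn + k_oo - 2.0 * k_no     # small when new features match old in RKHS
```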

An Effective Data Creation Pipeline to Generate High-quality Financial Instruction Data for Large Language Model

  • paper_url: http://arxiv.org/abs/2308.01415
  • repo_url: None
  • paper_authors: Ziao Wang, Jianning Wang, Junda Wu, Xiaofeng Zhang
  • for: 这篇论文主要是为了提供一个高质量的金融数据集,以便使用大语言模型进行金融相关任务的细化调教。
  • methods: 该论文提出了一种仔细设计的数据创建管道,包括通过ChatGPT与人工金融专家之间的对话,并根据人工Feedback进行数据集的细化。
  • results: 该管道生成了一个 Robust 的征调数据集,包括103k多个多Turn chat,并通过对这个数据集进行了广泛的实验,以评估模型的性能。结果表明,该方法可以使AI模型生成准确、相关、金融式的回答,从而为金融领域应用提供一个强大的工具。
    Abstract At the beginning era of large language model, it is quite critical to generate a high-quality financial dataset to fine-tune a large language model for financial related tasks. Thus, this paper presents a carefully designed data creation pipeline for this purpose. Particularly, we initiate a dialogue between an AI investor and financial expert using ChatGPT and incorporate the feedback of human financial experts, leading to the refinement of the dataset. This pipeline yielded a robust instruction tuning dataset comprised of 103k multi-turn chats. Extensive experiments have been conducted on this dataset to evaluate the model's performance by adopting an external GPT-4 as the judge. The promising experimental results verify that our approach led to significant advancements in generating accurate, relevant, and financial-style responses from AI models, and thus providing a powerful tool for applications within the financial sector.
    摘要 在大语言模型时代的开始,生成高质量金融数据集是非常重要的,以调整大语言模型进行金融相关任务。这篇论文提出了一个仔细设计的数据创建管道,特别是通过与人工金融专家的对话,使用ChatGPT,并根据人类金融专家的反馈,进行数据集的精细调整。这个管道生成了103k多turn对话数据集。我们在这个数据集上进行了广泛的实验,采用外部GPT-4作为评审者,以评估模型的性能。实验结果表明,我们的方法导致了AI模型生成高准确、相关、金融风格的回答,从而为金融领域应用提供了一个强大的工具。

A continuous Structural Intervention Distance to compare Causal Graphs

  • paper_url: http://arxiv.org/abs/2307.16452
  • repo_url: None
  • paper_authors: Mihir Dhanakshirur, Felix Laumann, Junhyung Park, Mauricio Barahona
  • for: 本研究旨在提供一种新的维度度量,用于评估真实和学习的 causal 图之间的差异。
  • methods: 本研究使用 conditional mean embeddings 将干预(intervention)分布映射到 reproducing kernel Hilbert space 中,然后计算这些分布之间的最大(conditional)mean discrepancy,来评估 causal 图的差异。
  • results: 研究人员通过 theoretically 和数据实验验证了这种新的维度度量的有效性。
    Abstract Understanding and adequately assessing the difference between a true and a learnt causal graphs is crucial for causal inference under interventions. As an extension to the graph-based structural Hamming distance and structural intervention distance, we propose a novel continuous-measured metric that considers the underlying data in addition to the graph structure for its calculation of the difference between a true and a learnt causal graph. The distance is based on embedding intervention distributions over each pair of nodes as conditional mean embeddings into reproducing kernel Hilbert spaces and estimating their difference by the maximum (conditional) mean discrepancy. We show theoretical results which we validate with numerical experiments on synthetic data.
    摘要 理解和准确评估真实和学习的 causal 图之间的差异是 causal 推断中的关键。作为结构 Hamming 距离和结构 intervención 距离的扩展,我们提出一种新的连续量化的度量,它考虑了在计算真实和学习 causal 图之间的差异时的数据的下面。这个距离基于对每对节点的 intervención 分布进行 Conditional Mean Embedding 的嵌入,并估计它们之间的差异为最大(conditional) Mean Discrepancy。我们提供了理论结果,并通过synthetic数据的数值实验 validate 这些结果。
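The computable core of such kernel-embedding distances is a maximum mean discrepancy between two sample sets; here is a numpy sketch with an RBF kernel (biased V-statistic estimator for brevity). The paper works with conditional mean embeddings; the unconditional MMD below is only meant to illustrate the mechanics.

```python
# Squared MMD between two sample sets, e.g. samples drawn under two different
# interventions, compared node pair by node pair.
import numpy as np

def rbf(a, b, gamma=1.0):
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def mmd2(x, y, gamma=1.0):
    """x: (n, d), y: (m, d) samples from the two distributions."""
    return rbf(x, x, gamma).mean() + rbf(y, y, gamma).mean() - 2 * rbf(x, y, gamma).mean()
```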

Towards Head Computed Tomography Image Reconstruction Standardization with Deep Learning Assisted Automatic Detection

  • paper_url: http://arxiv.org/abs/2307.16440
  • repo_url: None
  • paper_authors: Bowen Zheng, Chenxi Huang, Yuemei Luo
  • for: 提高头部计算机断层成像(CT)图像三维重建的精度和可重复性,并减少手动操作。
  • methods: 使用深度学习基于 объек检测算法,自动检测和评估颅骨线标志,以重新格式化图像前置 reconstruction。
  • results: 比较了十种对象检测算法的精度、效率和Robustness,选择了轻量级的 YOLOv8,其mAP为92.91%,并在标准化重建结果中表现出丰富的临床实用性和有效性。
    Abstract Three-dimensional (3D) reconstruction of head Computed Tomography (CT) images elucidates the intricate spatial relationships of tissue structures, thereby assisting in accurate diagnosis. Nonetheless, securing an optimal head CT scan without deviation is challenging in clinical settings, owing to poor positioning by technicians, patient's physical constraints, or CT scanner tilt angle restrictions. Manual formatting and reconstruction not only introduce subjectivity but also strain time and labor resources. To address these issues, we propose an efficient automatic head CT images 3D reconstruction method, improving accuracy and repeatability, as well as diminishing manual intervention. Our approach employs a deep learning-based object detection algorithm, identifying and evaluating orbitomeatal line landmarks to automatically reformat the images prior to reconstruction. Given the dearth of existing evaluations of object detection algorithms in the context of head CT images, we compared ten methods from both theoretical and experimental perspectives. By exploring their precision, efficiency, and robustness, we singled out the lightweight YOLOv8 as the aptest algorithm for our task, with an mAP of 92.91% and impressive robustness against class imbalance. Our qualitative evaluation of standardized reconstruction results demonstrates the clinical practicability and validity of our method.
    摘要 三维重建头部计算机断层成像(CT)图像可以帮助精确诊断,但在临床 Settings中获得优质头部CT扫描是具有挑战性的,这主要是由技术人员的位置不稳定、病人的身体限制或计算机扫描机的倾斜角度所致。手动格式化和重建不仅引入主观性,还占用了时间和劳动资源。为了解决这些问题,我们提出了一种高效的自动头部CT图像三维重建方法,提高了准确性和重复性,同时减少了手动干预。我们的方法利用深度学习基于 объек检测算法,通过识别和评估 orbitomeatal 线标记来自动重新格式化图像,以前置重建。由于现有的头部CT图像对象检测算法的评估罕见,我们从理论和实验两个角度对十种方法进行了比较。通过评估精度、效率和稳定性,我们选择了轻量级的 YOLOv8,其MAP值为92.91%,并在类偏置问题中表现出了扎实的Robustness。我们的质量评估标准化重建结果表明了我们的方法在临床实践中的可行性和有效性。

VITS2: Improving Quality and Efficiency of Single-Stage Text-to-Speech with Adversarial Learning and Architecture Design

  • paper_url: http://arxiv.org/abs/2307.16430
  • repo_url: https://github.com/daniilrobnikov/vits2
  • paper_authors: Jungil Kong, Jihoon Park, Beomjeong Kim, Jeongmin Kim, Dohee Kong, Sangjin Kim
  • for: 提高单Stage Text-to-Speech模型的自然性、计算效率和多种语音特征的同步。
  • methods: 提出VITS2模型,通过改进结构和训练机制,提高单Stage Text-to-Speech模型的自然性、多种语音特征的同步和训练和推断的效率。
  • results: 实验结果表明,VITS2模型可以更好地提高单Stage Text-to-Speech模型的自然性、多种语音特征的同步和计算效率,同时可以减少先前的phoneme转换依赖,实现完全端到端单Stage Approach。
    Abstract Single-stage text-to-speech models have been actively studied recently, and their results have outperformed two-stage pipeline systems. Although the previous single-stage model has made great progress, there is room for improvement in terms of its intermittent unnaturalness, computational efficiency, and strong dependence on phoneme conversion. In this work, we introduce VITS2, a single-stage text-to-speech model that efficiently synthesizes a more natural speech by improving several aspects of the previous work. We propose improved structures and training mechanisms and present that the proposed methods are effective in improving naturalness, similarity of speech characteristics in a multi-speaker model, and efficiency of training and inference. Furthermore, we demonstrate that the strong dependence on phoneme conversion in previous works can be significantly reduced with our method, which allows a fully end-to-end single-stage approach.
    摘要 单阶段文本至语音模型在最近几年中得到了广泛研究,其效果比两阶段管道系统更好。虽然之前的单阶段模型已经做出了很大的进步,但还有一些方面可以进一步改进,例如间歇性不自然、计算效率低下和phoneme转换的强依赖。在这项工作中,我们介绍VITS2单阶段文本至语音模型,该模型通过改进多个方面来生成更自然的语音。我们提出了改进的结构和训练机制,并证明了我们的方法能够提高自然性、语音特征相似性和训练和推理的效率。此外,我们还证明了前一代模型中phoneme转换的强依赖可以在我们的方法下降到可接受的水平,这使得完全的端到端单阶段approach成为可能。

Causal Inference for Banking Finance and Insurance A Survey

  • paper_url: http://arxiv.org/abs/2307.16427
  • repo_url: None
  • paper_authors: Satyam Kumar, Yelleti Vivek, Vadlamani Ravi, Indranil Bose
  • for: 本研究旨在探讨 causal inference 在银行、金融和保险领域的应用,尤其是在这些领域中 causal inference 的应用状况。
  • methods: 本文通过对 37 篇1992-2023年发表的论文进行概括,探讨这些论文中使用的 statistical methods,包括 Bayesian Causal Network、Granger Causality 等。
  • results: 本文发现,银行和保险领域中的 causal inference 应用还处于初始阶段,因此有更多的研究空间可以开拓,以使其成为可靠的方法。
    Abstract Causal Inference plays an significant role in explaining the decisions taken by statistical models and artificial intelligence models. Of late, this field started attracting the attention of researchers and practitioners alike. This paper presents a comprehensive survey of 37 papers published during 1992-2023 and concerning the application of causal inference to banking, finance, and insurance. The papers are categorized according to the following families of domains: (i) Banking, (ii) Finance and its subdomains such as corporate finance, governance finance including financial risk and financial policy, financial economics, and Behavioral finance, and (iii) Insurance. Further, the paper covers the primary ingredients of causal inference namely, statistical methods such as Bayesian Causal Network, Granger Causality and jargon used thereof such as counterfactuals. The review also recommends some important directions for future research. In conclusion, we observed that the application of causal inference in the banking and insurance sectors is still in its infancy, and thus more research is possible to turn it into a viable method.
    摘要 causal inference 在解释统计模型和人工智能模型所作出的决策中扮演着重要的角色。近年来,这个领域吸引了研究者和实践者的关注。本文是一篇涵盖1992-2023年发表的37篇论文,探讨了在银行、金融和保险领域中应用 causal inference 的综述。这些论文被分为以下三个家庭领域:(i)银行,(ii)金融和其子领域,如企业财务、管理财务、金融风险和金融政策、金融经济和行为金融,以及(iii)保险。此外,文章还覆盖了 causal inference 的基本组成部分,包括统计方法如 bayesian causal network 和 Granger causality,以及其中使用的术语如 counterfactuals。文章还提出了未来研究的重要方向。结论是,在银行和保险领域中应用 causal inference 的应用还处于初生阶段,因此更多的研究可以使其成为可靠的方法。

MetaDiff: Meta-Learning with Conditional Diffusion for Few-Shot Learning

  • paper_url: http://arxiv.org/abs/2307.16424
  • repo_url: None
  • paper_authors: Baoquan Zhang, Demin Yu
  • for: 提高深度学习模型快速学习能力,即从少量示例学习出更好的表现。
  • methods: 基于梯度下降的meta学习方法,通过学习如何快速学习新任务。其关键思想是在bi-level优化manner中学习一个共享梯度下降算法(即其超参数),然后使用这个算法优化任务特定的模型,使用只有少量标注数据。
  • results: 与现有方法相比,我们的MetaDiff在少量学习任务中表现出色,并且不需要计算第二个DERIVATIVE,从而避免了内存压力和梯度消失问题。
    Abstract Equipping a deep model the abaility of few-shot learning, i.e., learning quickly from only few examples, is a core challenge for artificial intelligence. Gradient-based meta-learning approaches effectively address the challenge by learning how to learn novel tasks. Its key idea is learning a deep model in a bi-level optimization manner, where the outer-loop process learns a shared gradient descent algorithm (i.e., its hyperparameters), while the inner-loop process leverage it to optimize a task-specific model by using only few labeled data. Although these existing methods have shown superior performance, the outer-loop process requires calculating second-order derivatives along the inner optimization path, which imposes considerable memory burdens and the risk of vanishing gradients. Drawing inspiration from recent progress of diffusion models, we find that the inner-loop gradient descent process can be actually viewed as a reverse process (i.e., denoising) of diffusion where the target of denoising is model weights but the origin data. Based on this fact, in this paper, we propose to model the gradient descent optimizer as a diffusion model and then present a novel task-conditional diffusion-based meta-learning, called MetaDiff, that effectively models the optimization process of model weights from Gaussion noises to target weights in a denoising manner. Thanks to the training efficiency of diffusion models, our MetaDiff do not need to differentiate through the inner-loop path such that the memory burdens and the risk of vanishing gradients can be effectvely alleviated. Experiment results show that our MetaDiff outperforms the state-of-the-art gradient-based meta-learning family in few-shot learning tasks.
    摘要 使得深度模型具备几个例之学习能力,即快速从只有几个示例学习,是人工智能的核心挑战。基于梯度的meta学习方法有效地解决了这个挑战,其关键思想是通过在外层循环中学习一个共享梯度下降算法(即其超参数),而在内层循环中使用只有几个标注数据来优化任务特定模型。虽然现有的方法已经表现出色,但外层循环过程需要计算第二个Derivative along the inner optimization path,这会带来很大的内存压力和梯度消失风险。 drawing inspiration from recent progress of diffusion models, we find that the inner-loop gradient descent process can be viewed as a reverse process (i.e., denoising) of diffusion, where the target of denoising is model weights but the origin data。 Based on this fact, in this paper, we propose to model the gradient descent optimizer as a diffusion model and then present a novel task-conditional diffusion-based meta-learning, called MetaDiff, that effectively models the optimization process of model weights from Gaussian noise to target weights in a denoising manner。Thanks to the training efficiency of diffusion models, our MetaDiff does not need to differentiate through the inner-loop path, so the memory burdens and the risk of vanishing gradients can be effectively alleviated。 Experiment results show that our MetaDiff outperforms the state-of-the-art gradient-based meta-learning family in few-shot learning tasks。
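A sketch of the central object as read from the abstract: a conditional denoiser over flattened task-specific weights, taking noisy weights, a diffusion timestep and a task embedding. The architecture and dimensions below are illustrative assumptions, not MetaDiff's actual network.

```python
# Conditional denoiser over model weights: predicts the noise/denoising
# direction given noisy weights, a timestep and a task embedding.
import torch
import torch.nn as nn

class WeightDenoiser(nn.Module):
    def __init__(self, weight_dim, task_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(weight_dim + task_dim + 1, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, weight_dim),
        )

    def forward(self, noisy_w, t, task_emb):
        # t is the integer diffusion timestep, appended as an extra feature.
        x = torch.cat([noisy_w, task_emb, t[:, None].float()], dim=-1)
        return self.net(x)

denoiser = WeightDenoiser(weight_dim=1024, task_dim=64)
w_t = torch.randn(8, 1024); task = torch.randn(8, 64); t = torch.randint(0, 1000, (8,))
eps_hat = denoiser(w_t, t, task)
```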

Guaranteed Optimal Generative Modeling with Maximum Deviation from the Empirical Distribution

  • paper_url: http://arxiv.org/abs/2307.16422
  • repo_url: None
  • paper_authors: Elen Vardanyan, Arshak Minasyan, Sona Hunanyan, Tigran Galstyan, Arnak Dalalyan
  • for: 本文为训练生成模型提供理论分析,要求模型同时满足两个性质:在样本量趋于无穷时,用训练得到的生成分布替代真实数据生成分布的误差应最优地收敛到零;同时训练得到的生成分布应与任何仅复制训练样本的分布保持足够距离。
  • methods: 本文给出有限样本风险界形式的非渐近结果,这些界依赖于样本量、环境空间维度和潜在空间维度等参数,并适用于一般的积分概率度量(以 Wasserstein-1 距离为核心示例)。
  • results: 理论结果量化了上述两个性质,并通过数值示例加以验证。
    Abstract Generative modeling is a widely-used machine learning method with various applications in scientific and industrial fields. Its primary objective is to simulate new examples drawn from an unknown distribution given training data while ensuring diversity and avoiding replication of examples from the training data. This paper presents theoretical insights into training a generative model with two properties: (i) the error of replacing the true data-generating distribution with the trained data-generating distribution should optimally converge to zero as the sample size approaches infinity, and (ii) the trained data-generating distribution should be far enough from any distribution replicating examples in the training data. We provide non-asymptotic results in the form of finite sample risk bounds that quantify these properties and depend on relevant parameters such as sample size, the dimension of the ambient space, and the dimension of the latent space. Our results are applicable to general integral probability metrics used to quantify errors in probability distribution spaces, with the Wasserstein-$1$ distance being the central example. We also include numerical examples to illustrate our theoretical findings.
    摘要 <>传统的机器学习方法之一是生成模型,它在科学和工业领域有广泛的应用。生成模型的 PRIMARY OBJECTIVE 是通过训练数据 simulate 新的例子,从未知分布中采样新的例子,同时保证新的例子具有多样性和不同于训练数据中的例子。这篇文章提供了生成模型训练的两个性质:(i)在训练数据中替换真实的数据生成分布时,训练后的数据生成分布的误差应该在样本数趋向于无穷大时Optimally Converge to Zero,(ii)训练后的数据生成分布应该与训练数据中的例子replicate的分布远离 enough。 我们提供了非假设统计结果,包括finite sample risk bounds,这些结果取决于样本大小、维度空间和秘密空间中的相关参数。我们的结果适用于普通的积分概率度量, Wasserstein-$1$ 距离是中心例子。我们还包括了数据示例来证明我们的理论发现。
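The Wasserstein-1 distance that serves as the central example above has a simple closed form between one-dimensional empirical distributions of equal size: sort both samples and average the absolute differences. A quick numpy illustration (the toy Gaussian samples are only for demonstration):

```python
# W1 between two equal-size 1-D empirical distributions via sorted samples.
import numpy as np

def wasserstein1_1d(x: np.ndarray, y: np.ndarray) -> float:
    assert x.shape == y.shape, "equal sample sizes assumed for this shortcut"
    return float(np.mean(np.abs(np.sort(x) - np.sort(y))))

rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, size=1000)
generated = rng.normal(0.2, 1.1, size=1000)
print(wasserstein1_1d(real, generated))
```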

DCTM: Dilated Convolutional Transformer Model for Multimodal Engagement Estimation in Conversation

  • paper_url: http://arxiv.org/abs/2308.01966
  • repo_url: None
  • paper_authors: Vu Ngoc Tu, Van Thong Huynh, Hyung-Jeong Yang, M. Zaigham Zaheer, Shah Nawaz, Karthik Nandakumar, Soo-Hyung Kim
  • for: 这篇论文的目的是为了模型和估计人类对话中的参与度。
  • methods: 该论文使用了扩展 convolutional Transformer 来实现对话参与度的估计。
  • results: 该论文在 MULTIMEDIATE 2023 比赛中表现出优于基eline模型,在测试集上提高了 $7%$,在验证集上提高了 $4%$。另外,该论文还使用了不同的modalities fusión机制,并证明了在这种数据上,简单的 concatenation 方法加上自注意力融合可以获得最好的性能。
    Abstract Conversational engagement estimation is posed as a regression problem, entailing the identification of the favorable attention and involvement of the participants in the conversation. This task arises as a crucial pursuit to gain insights into human's interaction dynamics and behavior patterns within a conversation. In this research, we introduce a dilated convolutional Transformer for modeling and estimating human engagement in the MULTIMEDIATE 2023 competition. Our proposed system surpasses the baseline models, exhibiting a noteworthy $7$\% improvement on test set and $4$\% on validation set. Moreover, we employ different modality fusion mechanism and show that for this type of data, a simple concatenated method with self-attention fusion gains the best performance.
    摘要 通过对话参与度的 regression 问题来评估对话参与度,以获得对话中人类互动动态和行为模式的深入理解。在本研究中,我们提出了一种扩展 convolutional Transformer 来模型和估计对话参与度,并在 MULTIMEDIATE 2023 比赛中展示了我们的提案系统。我们的提案系统在测试集上显示了出色的 $7\%$ 提升,而在验证集上则是 $4\%$。此外,我们还使用了不同的 modalities 融合机制,并证明在这类数据上,简单的 concatenation 方法加上自注意力融合能够获得最好的性能。
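The fusion finding reported above (simple concatenation of modality sequences followed by self-attention) is easy to sketch with a standard PyTorch attention layer. The module below is an illustrative stand-in, not the DCTM architecture, and all dimensions are assumptions.

```python
# Concatenate per-modality token sequences, mix them with self-attention,
# and pool to a single engagement feature.
import torch
import torch.nn as nn

class ConcatSelfAttentionFusion(nn.Module):
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, modality_feats):
        # modality_feats: list of (batch, seq_i, dim) tensors, one per modality.
        x = torch.cat(modality_feats, dim=1)          # simple concatenation
        fused, _ = self.attn(x, x, x)                 # self-attention fusion
        return fused.mean(dim=1)                      # pooled feature

fusion = ConcatSelfAttentionFusion()
video = torch.randn(2, 30, 256); audio = torch.randn(2, 50, 256)
out = fusion([video, audio])                          # (2, 256)
```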

Subspace Distillation for Continual Learning

  • paper_url: http://arxiv.org/abs/2307.16419
  • repo_url: https://github.com/csiro-robotics/sdcl
  • paper_authors: Kaushik Roy, Christian Simon, Peyman Moghadam, Mehrtash Harandi
  • for: 本研究旨在 Mitigating Catastrophic Forgetting in continual learning, 以 preserve 前一任务学习的知识。
  • methods: 提议一种基于 manifold structure 的知识混合技术,使得 neural network 可以在新任务学习过程中保持先前任务的知识。
  • results: 实验表明,提议的方法可以在多个难度dataset上减轻忘却现象,并且可以与现有的学习方法混合使用,以提高其性能。
    Abstract An ultimate objective in continual learning is to preserve knowledge learned in preceding tasks while learning new tasks. To mitigate forgetting prior knowledge, we propose a novel knowledge distillation technique that takes into the account the manifold structure of the latent/output space of a neural network in learning novel tasks. To achieve this, we propose to approximate the data manifold up-to its first order, hence benefiting from linear subspaces to model the structure and maintain the knowledge of a neural network while learning novel concepts. We demonstrate that the modeling with subspaces provides several intriguing properties, including robustness to noise and therefore effective for mitigating Catastrophic Forgetting in continual learning. We also discuss and show how our proposed method can be adopted to address both classification and segmentation problems. Empirically, we observe that our proposed method outperforms various continual learning methods on several challenging datasets including Pascal VOC, and Tiny-Imagenet. Furthermore, we show how the proposed method can be seamlessly combined with existing learning approaches to improve their performances. The codes of this article will be available at https://github.com/csiro-robotics/SDCL.
    摘要 持续学习的最终目标是在学习新任务的同时保留先前任务中学到的知识。为了缓解对先前知识的遗忘,我们提出一种新的知识蒸馏技术,在学习新任务时考虑神经网络潜在/输出空间的流形结构。为此,我们将数据流形近似到一阶,利用线性子空间来建模其结构,从而在学习新概念时保持神经网络的知识。我们证明了基于子空间的建模具有若干有趣的性质,包括对噪声的鲁棒性,因而能有效缓解持续学习中的灾难性遗忘。我们还讨论并展示了该方法如何同时应用于分类和分割问题。实验表明,我们的方法在 Pascal VOC 和 Tiny-Imagenet 等多个具有挑战性的数据集上优于多种持续学习方法,并且可以与现有学习方法无缝结合以提升其性能。代码将在 https://github.com/csiro-robotics/SDCL 上提供。
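As a rough illustration of the first-order (linear-subspace) manifold approximation behind the proposed distillation, the sketch below extracts the top-$k$ right-singular subspace of a feature batch and penalises the discrepancy between the subspaces of a frozen "old-task" model and the current model. The projection-matrix distance and the choice of $k$ are assumptions, not the paper's exact loss.

```python
import torch

def top_k_subspace(features, k=8):
    """Right singular subspace of a (batch, dim) feature matrix: a first-order
    (linear) approximation of the local data manifold."""
    _, _, vh = torch.linalg.svd(features, full_matrices=False)  # rows of vh are right singular vectors
    return vh[:k].T  # (dim, k), orthonormal columns

def subspace_distillation_loss(old_feats, new_feats, k=8):
    """Penalise misalignment between old/new feature subspaces via the
    Frobenius (chordal) distance between their projection matrices."""
    u_old = top_k_subspace(old_feats.detach(), k)  # teacher subspace, no gradient
    u_new = top_k_subspace(new_feats, k)
    p_old = u_old @ u_old.T
    p_new = u_new @ u_new.T
    return torch.norm(p_old - p_new, p="fro") ** 2 / (2 * k)

# sanity check: identical feature batches give (near-)zero loss
f = torch.randn(64, 128)
print(subspace_distillation_loss(f, f).item())
```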

Causal-learn: Causal Discovery in Python

  • paper_url: http://arxiv.org/abs/2307.16405
  • repo_url: https://github.com/py-why/causal-learn
  • paper_authors: Yujia Zheng, Biwei Huang, Wei Chen, Joseph Ramsey, Mingming Gong, Ruichu Cai, Shohei Shimizu, Peter Spirtes, Kun Zhang
  • for: 本文旨在描述一个开源的Python库,用于 causal discovery,即从观察数据中揭示 causal 关系。
  • methods: 本库提供了一系列因果发现方法,包括非参数方法和参数方法,并提供了易于使用的 API,以方便非专家用户。
  • results: 本库可以帮助用户快速和简单地进行 causal discovery,并且提供了详细的 documentation,以便学习和掌握。
    Abstract Causal discovery aims at revealing causal relations from observational data, which is a fundamental task in science and engineering. We describe $\textit{causal-learn}$, an open-source Python library for causal discovery. This library focuses on bringing a comprehensive collection of causal discovery methods to both practitioners and researchers. It provides easy-to-use APIs for non-specialists, modular building blocks for developers, detailed documentation for learners, and comprehensive methods for all. Different from previous packages in R or Java, $\textit{causal-learn}$ is fully developed in Python, which could be more in tune with the recent preference shift in programming languages within related communities. The library is available at https://github.com/py-why/causal-learn.
    摘要 因果发现旨在从观察数据中揭示因果关系,这是科学和工程中的基本任务。我们介绍 $\textit{causal-learn}$,一个用于因果发现的开源 Python 库。该库致力于为从业者和研究人员提供全面的因果发现方法集合:为非专家提供易于使用的 API,为开发者提供模块化的构建单元,为学习者提供详细的文档。与之前的 R 或 Java 包不同,$\textit{causal-learn}$ 完全使用 Python 开发,更契合相关社区近年来在编程语言偏好上的变化。该库可在 https://github.com/py-why/causal-learn 获取。
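A minimal usage sketch of the library's constraint-based PC search, based on its documented Python API; the exact import path and defaults may differ across versions.

```python
# pip install causal-learn
import numpy as np
from causallearn.search.ConstraintBased.PC import pc

# synthetic data with a simple chain X -> Y -> Z
rng = np.random.default_rng(0)
x = rng.normal(size=2000)
y = 0.8 * x + rng.normal(size=2000)
z = 0.8 * y + rng.normal(size=2000)
data = np.column_stack([x, y, z])

cg = pc(data)   # constraint-based causal discovery (PC algorithm)
print(cg.G)     # estimated causal graph over the three variables
```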

Bridging the Gap: Exploring the Capabilities of Bridge-Architectures for Complex Visual Reasoning Tasks

  • paper_url: http://arxiv.org/abs/2307.16395
  • repo_url: None
  • paper_authors: Kousik Rajesh, Mrigank Raman, Mohammed Asad Karim, Pranit Chawla
  • for: This paper investigates the performance of multi-modal architectures based on Large Language Models (LLMs) on the NLVR2 dataset, specifically focusing on the effectiveness of adding object level features and pre-training on multi-modal data for complex visual reasoning tasks.
  • methods: The paper proposes extending traditional bridge architectures for the NLVR2 dataset by adding object level features, and also explores the use of a recently proposed bridge-architecture called LLaVA in the zero shot setting.
  • results: The paper shows that pre-training on multi-modal data is key for good performance on complex reasoning tasks such as NLVR2, and that adding object level features to bridge architectures does not help. The paper also demonstrates some initial results on LLaVA in the zero shot setting and analyzes its performance.
    Abstract In recent times there has been a surge of multi-modal architectures based on Large Language Models, which leverage the zero shot generation capabilities of LLMs and project image embeddings into the text space and then use the auto-regressive capacity to solve tasks such as VQA, captioning, and image retrieval. We name these architectures as "bridge-architectures" as they project from the image space to the text space. These models deviate from the traditional recipe of training transformer based multi-modal models, which involve using large-scale pre-training and complex multi-modal interactions through co or cross attention. However, the capabilities of bridge architectures have not been tested on complex visual reasoning tasks which require fine grained analysis about the image. In this project, we investigate the performance of these bridge-architectures on the NLVR2 dataset, and compare it to state-of-the-art transformer based architectures. We first extend the traditional bridge architectures for the NLVR2 dataset, by adding object level features to faciliate fine-grained object reasoning. Our analysis shows that adding object level features to bridge architectures does not help, and that pre-training on multi-modal data is key for good performance on complex reasoning tasks such as NLVR2. We also demonstrate some initial results on a recently bridge-architecture, LLaVA, in the zero shot setting and analyze its performance.
    摘要 近来出现了大量基于大语言模型(LLM)的多模态架构,它们利用 LLM 的零样本生成能力,将图像嵌入投影到文本空间,再借助自回归能力来完成 VQA、图像描述和图像检索等任务。由于这些架构从图像空间投影到文本空间,我们称之为"桥接架构"(bridge-architectures)。这类模型不同于训练基于 Transformer 的多模态模型的传统做法,后者依赖大规模预训练以及通过协同注意力或交叉注意力实现的复杂多模态交互。然而,桥接架构尚未在需要细粒度图像分析的复杂视觉推理任务上得到检验。在本项目中,我们考察了桥接架构在 NLVR2 数据集上的表现,并与最先进的基于 Transformer 的架构进行比较。我们首先针对 NLVR2 扩展了传统桥接架构,加入对象级特征以支持细粒度的对象推理。分析表明,为桥接架构加入对象级特征并无帮助,而在多模态数据上进行预训练才是在 NLVR2 这类复杂推理任务上取得良好表现的关键。我们还给出了最近提出的桥接架构 LLaVA 在零样本设定下的初步结果并分析其表现。
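The "bridge" idea itself, projecting frozen image embeddings into a language model's token-embedding space and prepending them as soft visual tokens, can be sketched as below. Dimensions, token counts, and module names are illustrative assumptions rather than any specific model's configuration.

```python
import torch
import torch.nn as nn

class Bridge(nn.Module):
    """Projects frozen image embeddings into the LLM token-embedding space."""
    def __init__(self, image_dim=768, llm_dim=4096, num_visual_tokens=8):
        super().__init__()
        self.proj = nn.Linear(image_dim, llm_dim * num_visual_tokens)
        self.num_visual_tokens = num_visual_tokens
        self.llm_dim = llm_dim

    def forward(self, image_embed, text_token_embeds):
        # image_embed: (batch, image_dim) from a frozen vision encoder
        # text_token_embeds: (batch, seq, llm_dim) from the LLM's embedding table
        visual = self.proj(image_embed).view(-1, self.num_visual_tokens, self.llm_dim)
        # prepend visual "tokens" so the frozen, autoregressive LLM can attend to them
        return torch.cat([visual, text_token_embeds], dim=1)

bridge = Bridge()
inputs = bridge(torch.randn(2, 768), torch.randn(2, 16, 4096))
print(inputs.shape)  # torch.Size([2, 24, 4096])
```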

A Pre-trained Data Deduplication Model based on Active Learning

  • paper_url: http://arxiv.org/abs/2308.00721
  • repo_url: None
  • paper_authors: Xinyao Liu, Shengdong Du, Fengmao Lv, Hongtao Xue, Jie Hu, Tianrui Li
  • for: 解决大数据时代的数据质量问题,特别是重复数据问题,以提高大数据的有效应用。
  • methods: 提出基于主动学习的预训练去重模型,首次将主动学习与 Transformer 结合为端到端架构,以选择最有价值的数据用于模型训练,并首次应用 R-Drop 方法对每轮标注数据进行数据增强。
  • results: 在去重数据识别任务上,Recall 分数比此前的最优方法(SOTA)最高提升 28%。
    Abstract In the era of big data, the issue of data quality has become increasingly prominent. One of the main challenges is the problem of duplicate data, which can arise from repeated entry or the merging of multiple data sources. These "dirty data" problems can significantly limit the effective application of big data. To address the issue of data deduplication, we propose a pre-trained deduplication model based on active learning, which is the first work that utilizes active learning to address the problem of deduplication at the semantic level. The model is built on a pre-trained Transformer and fine-tuned to solve the deduplication problem as a sequence to classification task, which firstly integrate the transformer with active learning into an end-to-end architecture to select the most valuable data for deduplication model training, and also firstly employ the R-Drop method to perform data augmentation on each round of labeled data, which can reduce the cost of manual labeling and improve the model's performance. Experimental results demonstrate that our proposed model outperforms previous state-of-the-art (SOTA) for deduplicated data identification, achieving up to a 28% improvement in Recall score on benchmark datasets.
    摘要 在大数据时代,数据质量问题日益突出,其中一个主要挑战是重复数据问题,它可能源于重复录入或多个数据源的合并。这类"脏数据"问题会严重限制大数据的有效应用。为解决数据去重问题,我们提出一种基于主动学习的预训练去重模型,这是首个利用主动学习在语义层面解决去重问题的工作。该模型基于预训练的 Transformer,并将去重问题作为序列分类任务进行微调:首次将 Transformer 与主动学习集成到端到端架构中,以选择最有价值的数据用于去重模型训练;同时首次在每一轮标注数据上采用 R-Drop 方法进行数据增强,从而降低人工标注成本并提升模型性能。实验结果表明,所提模型在去重数据识别上优于此前的最优方法(SOTA),在基准数据集上的 Recall 分数最高提升 28%。
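R-Drop, which the paper applies to each round of labelled data, runs two stochastic forward passes through the same dropout-regularised model and adds a symmetric KL penalty between the two output distributions to the task loss. A minimal sketch follows; the encoder and the weight `alpha` are placeholders, not the paper's settings.

```python
import torch
import torch.nn.functional as F

def r_drop_loss(model, inputs, labels, alpha=4.0):
    """Task loss + symmetric KL between two dropout-perturbed forward passes."""
    logits1 = model(inputs)   # dropout is active, so the two passes differ
    logits2 = model(inputs)
    ce = 0.5 * (F.cross_entropy(logits1, labels) + F.cross_entropy(logits2, labels))
    kl = 0.5 * (
        F.kl_div(F.log_softmax(logits1, -1), F.softmax(logits2, -1), reduction="batchmean")
        + F.kl_div(F.log_softmax(logits2, -1), F.softmax(logits1, -1), reduction="batchmean")
    )
    return ce + alpha * kl

# toy usage with a dropout-regularised "duplicate vs. not duplicate" classifier
model = torch.nn.Sequential(torch.nn.Linear(32, 64), torch.nn.ReLU(),
                            torch.nn.Dropout(0.3), torch.nn.Linear(64, 2))
loss = r_drop_loss(model, torch.randn(8, 32), torch.randint(0, 2, (8,)))
loss.backward()
```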

STL: A Signed and Truncated Logarithm Activation Function for Neural Networks

  • paper_url: http://arxiv.org/abs/2307.16389
  • repo_url: None
  • paper_authors: Yuanhao Gong
  • for: 本文提出了一种新的带符号截断对数(signed and truncated logarithm)函数作为激活函数,以提升神经网络的精度和运行性能。
  • methods: 本文将所提激活函数与多种已知激活函数进行比较,并说明其具有更好的数学性质,如奇函数、单调、可微、取值范围无界以及连续非零的导数。
  • results: 在多个常用神经网络上的比较结果表明,所提激活函数在精度和运行性能方面达到了最先进(state-of-the-art)的水平。该激活函数可应用于大多数需要激活函数的神经网络中。
    Abstract Activation functions play an essential role in neural networks. They provide the non-linearity for the networks. Therefore, their properties are important for neural networks' accuracy and running performance. In this paper, we present a novel signed and truncated logarithm function as activation function. The proposed activation function has significantly better mathematical properties, such as being odd function, monotone, differentiable, having unbounded value range, and a continuous nonzero gradient. These properties make it an excellent choice as an activation function. We compare it with other well-known activation functions in several well-known neural networks. The results confirm that it is the state-of-the-art. The suggested activation function can be applied in a large range of neural networks where activation functions are necessary.
    摘要 激活函数在神经网络中起着关键作用,为网络提供非线性,因此其性质直接影响神经网络的精度与运行性能。本文提出一种新的带符号截断对数函数作为激活函数。该激活函数具有明显更好的数学性质:奇函数、单调、可微、取值范围无界、梯度连续且非零。这些性质使其成为激活函数的优良选择。我们在多个常用神经网络中将其与其他知名激活函数进行比较,结果证实其达到了最先进水平。该激活函数可应用于大多数需要激活函数的神经网络中。
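The digest does not give the exact formula, so the snippet below implements one simple function consistent with the stated properties (odd, monotone, differentiable, unbounded range, continuous non-zero gradient), namely $\mathrm{sign}(x)\log(1+|x|)$. It is an illustration only and not necessarily the authors' definition.

```python
import torch
import torch.nn as nn

class SignedLogActivation(nn.Module):
    """Illustrative signed-logarithm activation: f(x) = sign(x) * log(1 + |x|).
    Odd, strictly monotone, differentiable (f'(0) = 1), unbounded, and its
    gradient 1 / (1 + |x|) is continuous and never zero."""
    def forward(self, x):
        return torch.sign(x) * torch.log1p(torch.abs(x))

act = SignedLogActivation()
x = torch.linspace(-5, 5, 10, requires_grad=True)  # grid chosen to avoid x = 0 exactly
y = act(x)
y.sum().backward()
print(y)
print(x.grad)  # gradients stay strictly positive, unlike saturating activations
```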

Does fine-tuning GPT-3 with the OpenAI API leak personally-identifiable information?

  • paper_url: http://arxiv.org/abs/2307.16382
  • repo_url: https://github.com/albertsun1/gpt3-pii-attacks
  • paper_authors: Albert Yu Sun, Eliott Zemour, Arushi Saxena, Udith Vaidyanathan, Eric Lin, Christian Lau, Vaikkunth Mugunthan
  • for: 这个研究的目的是确定能否从 OpenAI 的 GPT-3 模型中提取个人身份信息(PII)。
  • methods: 研究使用了朴素提示(naive prompting)方法和 Autocomplete 任务,来调查经过微调的 GPT-3 模型是否会记忆并泄露敏感信息。
  • results: 研究发现,针对两个任务对 GPT-3 进行微调后,模型都会记忆并泄露底层微调数据集中的敏感个人身份信息(PII)。
    Abstract Machine learning practitioners often fine-tune generative pre-trained models like GPT-3 to improve model performance at specific tasks. Previous works, however, suggest that fine-tuned machine learning models memorize and emit sensitive information from the original fine-tuning dataset. Companies such as OpenAI offer fine-tuning services for their models, but no prior work has conducted a memorization attack on any closed-source models. In this work, we simulate a privacy attack on GPT-3 using OpenAI's fine-tuning API. Our objective is to determine if personally identifiable information (PII) can be extracted from this model. We (1) explore the use of naive prompting methods on a GPT-3 fine-tuned classification model, and (2) we design a practical word generation task called Autocomplete to investigate the extent of PII memorization in fine-tuned GPT-3 within a real-world context. Our findings reveal that fine-tuning GPT3 for both tasks led to the model memorizing and disclosing critical personally identifiable information (PII) obtained from the underlying fine-tuning dataset. To encourage further research, we have made our codes and datasets publicly available on GitHub at: https://github.com/albertsun1/gpt3-pii-attacks
    摘要 机器学习从业者常常对 GPT-3 这类生成式预训练模型进行微调,以提升模型在特定任务上的性能。然而,已有研究表明,经过微调的机器学习模型会记忆并输出原始微调数据集中的敏感信息。OpenAI 等公司为其模型提供微调服务,但此前尚无工作对任何闭源模型进行记忆攻击。在本工作中,我们利用 OpenAI 的微调 API 对 GPT-3 模拟了一次隐私攻击,目标是确定能否从该模型中提取个人身份信息(PII)。我们(1)探索了在经 GPT-3 微调的分类模型上使用朴素提示方法,(2)设计了一个名为 Autocomplete 的实用单词生成任务,以在真实场景中考察微调后 GPT-3 对 PII 的记忆程度。结果显示,针对这两个任务微调 GPT-3 都会导致模型记忆并泄露来自底层微调数据集的关键个人身份信息(PII)。为促进后续研究,我们已在 GitHub 上公开代码和数据集:https://github.com/albertsun1/gpt3-pii-attacks。
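The naive-prompting / Autocomplete probe amounts to feeding the fine-tuned model a prefix seen during fine-tuning and checking whether the continuation reproduces the associated PII. The sketch below uses a local Hugging Face model purely as a stand-in (the paper targets GPT-3 through the OpenAI fine-tuning API, which is not reproduced here), and the prefix/PII pairs are fabricated examples.

```python
from transformers import pipeline

# stand-in for a fine-tuned model; replace with the actual fine-tuned checkpoint under test
generator = pipeline("text-generation", model="gpt2")

# hypothetical (prefix, secret) pairs that appeared in the fine-tuning data
probes = [
    ("Customer record: John Doe, phone number:", "555-0123"),
    ("Patient Jane Roe, social security number:", "078-05-1120"),
]

for prefix, secret in probes:
    out = generator(prefix, max_new_tokens=20, do_sample=False)[0]["generated_text"]
    print(f"{prefix!r} -> leaked={secret in out}")
```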

UniAP: Unifying Inter- and Intra-Layer Automatic Parallelism by Mixed Integer Quadratic Programming

  • paper_url: http://arxiv.org/abs/2307.16375
  • repo_url: None
  • paper_authors: Hao Lin, Ke Wu, Jun Li, Wu-Jun Li
  • for: 提高深度学习模型的训练效率
  • methods: 使用混合整数二次规划(MIQP)统一层间与层内自动并行化。
  • results: 在四个类 Transformer 模型上,吞吐量最高比现有方法提升 1.70 倍,并将策略搜索时间最多缩短 16 倍。
    Abstract Deep learning models have demonstrated impressive performance in various domains. However, the prolonged training time of these models remains a critical problem. Manually designed parallel training strategies could enhance efficiency but require considerable time and deliver little flexibility. Hence, automatic parallelism is proposed to automate the parallel strategy searching process. Even so, existing approaches suffer from sub-optimal strategy space because they treat automatic parallelism as two independent stages, namely inter- and intra-layer parallelism. To address this issue, we propose UniAP, which utilizes mixed integer quadratic programming to unify inter- and intra-layer automatic parallelism. To the best of our knowledge, UniAP is the first work to unify these two categories to search for a globally optimal strategy. The experimental results show that UniAP outperforms state-of-the-art methods by up to 1.70$\times$ in throughput and reduces strategy searching time by up to 16$\times$ across four Transformer-like models.
    摘要 深度学习模型在各个领域表现出色,但其过长的训练时间仍是一个关键问题。人工设计的并行训练策略虽能提升效率,却耗时且缺乏灵活性,因此有研究提出自动并行化来自动搜索并行策略。然而,现有方法将自动并行化视为层间并行与层内并行两个独立阶段,导致策略空间次优。为此,我们提出 UniAP,利用混合整数二次规划统一层间与层内自动并行化。据我们所知,UniAP 是首个统一这两类并行以搜索全局最优策略的工作。实验结果表明,在四个类 Transformer 模型上,UniAP 的吞吐量最高比现有最优方法提升 1.70 倍,策略搜索时间最多缩短 16 倍。

BearingPGA-Net: A Lightweight and Deployable Bearing Fault Diagnosis Network via Decoupled Knowledge Distillation and FPGA Acceleration

  • paper_url: http://arxiv.org/abs/2307.16363
  • repo_url: https://github.com/asdvfghg/bearingpga-net
  • paper_authors: Jing-Xiao Liao, Sheng-Lai Wei, Chen-Long Xie, Tieyong Zeng, Jinwei Sun, Shiping Zhang, Xiaoge Zhang, Feng-Lei Fan
  • for: 本研究旨在提出一个轻量级且可部署的轴承故障诊断模型,以解决现有深度学习模型难以满足工业场景对高速度、强可移植性和低功耗要求的问题。
  • methods: 在一个训练好的大模型的辅助下,通过解耦知识蒸馏(decoupled knowledge distillation)训练 BearingPGA-Net;尽管模型体积很小,其故障诊断性能仍优于其他轻量级先进方法。
  • results: 设计了基于 Verilog 的 FPGA 加速方案,为 BearingPGA-Net 的每一层定制量化并设计可编程逻辑,强调并行计算与模块复用以提升计算速度。实验结果表明,该部署方案的诊断速度比 CPU 快 200 倍以上,而在自采轴承数据集上的 F1、Recall 和 Precision 分数下降不超过 0.4%。
    Abstract Deep learning has achieved remarkable success in the field of bearing fault diagnosis. However, this success comes with larger models and more complex computations, which cannot be transferred into industrial fields requiring models to be of high speed, strong portability, and low power consumption. In this paper, we propose a lightweight and deployable model for bearing fault diagnosis, referred to as BearingPGA-Net, to address these challenges. Firstly, aided by a well-trained large model, we train BearingPGA-Net via decoupled knowledge distillation. Despite its small size, our model demonstrates excellent fault diagnosis performance compared to other lightweight state-of-the-art methods. Secondly, we design an FPGA acceleration scheme for BearingPGA-Net using Verilog. This scheme involves the customized quantization and designing programmable logic gates for each layer of BearingPGA-Net on the FPGA, with an emphasis on parallel computing and module reuse to enhance the computational speed. To the best of our knowledge, this is the first instance of deploying a CNN-based bearing fault diagnosis model on an FPGA. Experimental results reveal that our deployment scheme achieves over 200 times faster diagnosis speed compared to CPU, while achieving a lower-than-0.4\% performance drop in terms of F1, Recall, and Precision score on our independently-collected bearing dataset. Our code is available at \url{https://github.com/asdvfghg/BearingPGA-Net}.
    摘要 深度学习在轴承故障诊断领域取得了显著成功,但这些成功伴随着更大的模型和更复杂的计算,难以迁移到要求高速度、强可移植性和低功耗的工业场景。本文提出一种轻量级、可部署的轴承故障诊断模型 BearingPGA-Net 来应对这些挑战。首先,在一个训练好的大模型的辅助下,我们通过解耦知识蒸馏来训练 BearingPGA-Net;尽管体积很小,该模型相比其他轻量级先进方法仍展现出优异的故障诊断性能。其次,我们使用 Verilog 为 BearingPGA-Net 设计了 FPGA 加速方案,为网络的每一层定制量化并设计可编程逻辑,强调并行计算与模块复用以提升计算速度。据我们所知,这是首次将基于 CNN 的轴承故障诊断模型部署到 FPGA 上。实验结果表明,该部署方案的诊断速度比 CPU 快 200 倍以上,而在我们独立采集的轴承数据集上,F1、Recall 和 Precision 分数的下降不超过 0.4%。代码见:https://github.com/asdvfghg/BearingPGA-Net。
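Decoupled knowledge distillation splits the usual KD objective into a target-class term (a binary target-vs-rest distribution) and a non-target-class term (the renormalised distribution over the remaining classes). The sketch below follows that common formulation; the hyper-parameters and the masking trick are illustrative choices, not the paper's exact recipe.

```python
import torch
import torch.nn.functional as F

def dkd_loss(student_logits, teacher_logits, target, alpha=1.0, beta=8.0, T=4.0):
    """Decoupled KD: separate KL terms for the target-class (binary) distribution
    and the distribution over non-target classes."""
    gt = target.unsqueeze(1)                                            # (B, 1)
    mask_other = torch.ones_like(student_logits).scatter_(1, gt, 0.0).bool()

    p_s = F.softmax(student_logits / T, dim=1)
    p_t = F.softmax(teacher_logits / T, dim=1)

    # target-class KD: binary distribution [p(target), p(everything else)]
    b_s = torch.cat([p_s.gather(1, gt), 1.0 - p_s.gather(1, gt)], dim=1)
    b_t = torch.cat([p_t.gather(1, gt), 1.0 - p_t.gather(1, gt)], dim=1)
    tckd = F.kl_div(b_s.clamp_min(1e-8).log(), b_t, reduction="batchmean")

    # non-target-class KD: mask out the target logit, renormalise over the rest
    log_q_s = F.log_softmax(student_logits / T + (~mask_other) * -1e9, dim=1)
    q_t = F.softmax(teacher_logits / T + (~mask_other) * -1e9, dim=1)
    nckd = F.kl_div(log_q_s, q_t, reduction="batchmean")

    return (alpha * tckd + beta * nckd) * T * T

# toy usage: a large teacher guiding a lightweight student
s = torch.randn(16, 10, requires_grad=True)
t = torch.randn(16, 10)
y = torch.randint(0, 10, (16,))
dkd_loss(s, t, y).backward()
```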

Benchmarking and Analyzing Robust Point Cloud Recognition: Bag of Tricks for Defending Adversarial Examples

  • paper_url: http://arxiv.org/abs/2307.16361
  • repo_url: https://github.com/qiufan319/benchmark_pc_attack
  • paper_authors: Qiufan Ji, Lin Wang, Cong Shi, Shengshan Hu, Yingying Chen, Lichao Sun
  • for: The paper is written for defending deep neural networks (DNNs) against adversarial examples in 3D point cloud recognition.
  • methods: The paper uses a comprehensive and rigorous benchmark to evaluate adversarial robustness, collects existing defense tricks, and proposes a hybrid training augmentation method that considers various types of point cloud adversarial examples.
  • results: The paper achieves an average accuracy of 83.45% against various attacks, demonstrating the capability of the proposed defense framework to enable robust learners.
    Abstract Deep Neural Networks (DNNs) for 3D point cloud recognition are vulnerable to adversarial examples, threatening their practical deployment. Despite the many research endeavors have been made to tackle this issue in recent years, the diversity of adversarial examples on 3D point clouds makes them more challenging to defend against than those on 2D images. For examples, attackers can generate adversarial examples by adding, shifting, or removing points. Consequently, existing defense strategies are hard to counter unseen point cloud adversarial examples. In this paper, we first establish a comprehensive, and rigorous point cloud adversarial robustness benchmark to evaluate adversarial robustness, which can provide a detailed understanding of the effects of the defense and attack methods. We then collect existing defense tricks in point cloud adversarial defenses and then perform extensive and systematic experiments to identify an effective combination of these tricks. Furthermore, we propose a hybrid training augmentation methods that consider various types of point cloud adversarial examples to adversarial training, significantly improving the adversarial robustness. By combining these tricks, we construct a more robust defense framework achieving an average accuracy of 83.45\% against various attacks, demonstrating its capability to enabling robust learners. Our codebase are open-sourced on: \url{https://github.com/qiufan319/benchmark_pc_attack.git}.
    摘要 用于三维点云识别的深度神经网络(DNN)容易受到对抗样本的攻击,威胁其实际部署。尽管近年来已有大量研究致力于解决这一问题,但三维点云上对抗样本的多样性使其比二维图像更难防御:例如,攻击者可以通过添加、移动或删除点来生成对抗样本,因此现有防御策略难以应对未见过的点云对抗样本。本文首先建立了一个全面而严格的点云对抗鲁棒性基准来评估对抗鲁棒性,从而细致地理解各种防御与攻击方法的效果;随后收集了现有的点云对抗防御技巧,并通过大量系统性实验找出有效的技巧组合;进一步提出一种考虑多种点云对抗样本的混合训练增强方法用于对抗训练,显著提升了对抗鲁棒性。通过组合这些技巧,我们构建了一个更鲁棒的防御框架,在多种攻击下取得 83.45% 的平均准确率,展示了其支撑鲁棒学习器的能力。代码已开源:https://github.com/qiufan319/benchmark_pc_attack.git。
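A toy stand-in for the kind of hybrid augmentation the paper advocates, perturbing training point clouds by shifting, removing, and adding points before the usual (or adversarial) training step; it uses random rather than optimised adversarial perturbations, and the magnitudes and ratios are arbitrary placeholders.

```python
import torch

def hybrid_point_cloud_augment(pc, shift_sigma=0.01, add_ratio=0.05, drop_ratio=0.05):
    """pc: (N, 3) point cloud. Mimics the three perturbation families:
    point shifting (jitter), point removal, and point addition."""
    n = pc.shape[0]
    # shift: Gaussian jitter on every point
    out = pc + shift_sigma * torch.randn_like(pc)
    # drop: remove a random subset of points
    keep = torch.randperm(n)[: int(n * (1 - drop_ratio))]
    out = out[keep]
    # add: duplicate random points with extra noise
    extra_idx = torch.randint(0, out.shape[0], (int(n * add_ratio),))
    extra = out[extra_idx] + shift_sigma * torch.randn(len(extra_idx), 3)
    return torch.cat([out, extra], dim=0)

cloud = torch.rand(1024, 3)
aug = hybrid_point_cloud_augment(cloud)
print(cloud.shape, "->", aug.shape)
```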

Probabilistically robust conformal prediction

  • paper_url: http://arxiv.org/abs/2307.16360
  • repo_url: https://github.com/1995subhankar1995/PRCP
  • paper_authors: Subhankar Ghosh, Yuanjie Shi, Taha Belkhouja, Yan Yan, Jana Doppa, Brian Jones
  • for: This paper focuses on developing a probabilistically robust conformal prediction (PRCP) algorithm to ensure robustness to natural/adversarial perturbations in testing examples.
  • methods: The proposed PRCP algorithm uses a novel adaptive approach called “quantile-of-quantile” design to determine two parallel thresholds for data samples and perturbations, achieving better trade-offs between nominal performance and robustness.
  • results: The proposed aPRCP algorithm is experimentally demonstrated to achieve better trade-offs than state-of-the-art conformal prediction (CP) and adversarially robust CP algorithms on CIFAR-10, CIFAR-100, and ImageNet datasets using deep neural networks.
    Abstract Conformal prediction (CP) is a framework to quantify uncertainty of machine learning classifiers including deep neural networks. Given a testing example and a trained classifier, CP produces a prediction set of candidate labels with a user-specified coverage (i.e., true class label is contained with high probability). Almost all the existing work on CP assumes clean testing data and there is not much known about the robustness of CP algorithms w.r.t natural/adversarial perturbations to testing examples. This paper studies the problem of probabilistically robust conformal prediction (PRCP) which ensures robustness to most perturbations around clean input examples. PRCP generalizes the standard CP (cannot handle perturbations) and adversarially robust CP (ensures robustness w.r.t worst-case perturbations) to achieve better trade-offs between nominal performance and robustness. We propose a novel adaptive PRCP (aPRCP) algorithm to achieve probabilistically robust coverage. The key idea behind aPRCP is to determine two parallel thresholds, one for data samples and another one for the perturbations on data (aka "quantile-of-quantile" design). We provide theoretical analysis to show that aPRCP algorithm achieves robust coverage. Our experiments on CIFAR-10, CIFAR-100, and ImageNet datasets using deep neural networks demonstrate that aPRCP achieves better trade-offs than state-of-the-art CP and adversarially robust CP algorithms.
    摘要 共形预测(Conformal Prediction,CP)是一种为机器学习分类器(包括深度神经网络)量化不确定性的框架。给定一个测试样本和一个训练好的分类器,CP 会生成一个候选标签的预测集,并满足用户指定的覆盖率(即真实类别以高概率包含在内)。现有的 CP 工作几乎都假设测试数据是干净的,而对 CP 算法在测试样本遭受自然/对抗扰动时的鲁棒性知之甚少。本文研究概率鲁棒共形预测(PRCP)问题,要求对干净输入样本附近的大多数扰动保持鲁棒。PRCP 推广了标准 CP(无法处理扰动)和对抗鲁棒 CP(仅针对最坏情况扰动保证鲁棒),以在名义性能与鲁棒性之间取得更好的权衡。我们提出一种自适应 PRCP(aPRCP)算法来实现概率鲁棒的覆盖,其核心思想是确定两个并行的阈值,一个针对数据样本,另一个针对数据上的扰动(即"分位数的分位数"设计)。理论分析表明 aPRCP 能实现鲁棒覆盖。在 CIFAR-10、CIFAR-100 和 ImageNet 数据集上使用深度神经网络的实验表明,aPRCP 比最先进的 CP 及对抗鲁棒 CP 算法取得了更好的权衡。
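A crude numpy sketch of the "quantile-of-quantile" idea: an inner quantile of nonconformity scores over perturbations of each calibration input, followed by an outer quantile over calibration examples to obtain a single threshold. The perturbation model, the score, and the lack of finite-sample corrections are simplifications, not the calibrated aPRCP procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

def nonconformity(prob_matrix, labels):
    """Score = 1 - predicted probability of the true class."""
    return 1.0 - prob_matrix[np.arange(len(labels)), labels]

n_cal, n_classes, n_pert = 500, 10, 20
logits = rng.normal(size=(n_cal, n_classes))          # toy calibration predictions
labels = rng.integers(0, n_classes, size=n_cal)
alpha, alpha_tilde = 0.1, 0.1                          # coverage levels for data / perturbations

# inner quantile: over simulated perturbations of each calibration input
per_example_q = np.empty(n_cal)
for i in range(n_cal):
    pert_logits = logits[i] + 0.3 * rng.normal(size=(n_pert, n_classes))  # stand-in perturbations
    probs = np.exp(pert_logits) / np.exp(pert_logits).sum(1, keepdims=True)
    scores = nonconformity(probs, np.full(n_pert, labels[i]))
    per_example_q[i] = np.quantile(scores, 1 - alpha_tilde)

# outer quantile: over calibration examples -> single threshold tau
tau = np.quantile(per_example_q, 1 - alpha)

# prediction set for a new input: all classes whose score falls below tau
new_logits = rng.normal(size=n_classes)
new_probs = np.exp(new_logits) / np.exp(new_logits).sum()
prediction_set = [c for c in range(n_classes) if 1.0 - new_probs[c] <= tau]
print("threshold:", round(tau, 3), "prediction set:", prediction_set)
```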

Moreau-Yoshida Variational Transport: A General Framework For Solving Regularized Distributional Optimization Problems

  • paper_url: http://arxiv.org/abs/2307.16358
  • repo_url: None
  • paper_authors: Dai Hai Nguyen, Tetsuya Sakurai
    for: solves a regularized distributional optimization problem widely appeared in machine learning and statistics, such as proximal Monte-Carlo sampling, Bayesian inference and generative modeling.methods: employs the Moreau-Yoshida envelope for a smooth approximation of the nonsmooth function in the objective, and leverages the variational representation to reformulate the approximate problem as a concave-convex saddle point problem.results: provides theoretical analyses and reports experimental results to demonstrate the effectiveness of the proposed method.
    Abstract We consider a general optimization problem of minimizing a composite objective functional defined over a class of probability distributions. The objective is composed of two functionals: one is assumed to possess the variational representation and the other is expressed in terms of the expectation operator of a possibly nonsmooth convex regularizer function. Such a regularized distributional optimization problem widely appears in machine learning and statistics, such as proximal Monte-Carlo sampling, Bayesian inference and generative modeling, for regularized estimation and generation. We propose a novel method, dubbed as Moreau-Yoshida Variational Transport (MYVT), for solving the regularized distributional optimization problem. First, as the name suggests, our method employs the Moreau-Yoshida envelope for a smooth approximation of the nonsmooth function in the objective. Second, we reformulate the approximate problem as a concave-convex saddle point problem by leveraging the variational representation, and then develope an efficient primal-dual algorithm to approximate the saddle point. Furthermore, we provide theoretical analyses and report experimental results to demonstrate the effectiveness of the proposed method.
    摘要 我们考虑一个一般的优化问题:在一类概率分布上最小化一个复合目标泛函。该目标由两个泛函组成:其一假设具有变分表示,其二由一个可能非光滑的凸正则化函数的期望算子给出。这类带正则化的分布优化问题广泛出现在机器学习与统计中,例如用于正则化估计与生成的近端 Monte Carlo 采样、贝叶斯推断和生成建模。我们提出一种新方法,称为 Moreau-Yoshida 变分输运(MYVT),用于求解该正则化分布优化问题。首先,顾名思义,该方法利用 Moreau-Yoshida 包络对目标中的非光滑函数进行光滑近似;其次,我们利用变分表示将近似问题重写为一个凹-凸鞍点问题,并设计了一种高效的原始-对偶算法来逼近鞍点。此外,我们给出了理论分析,并报告实验结果以证明所提方法的有效性。
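The Moreau-Yoshida envelope replaces a nonsmooth $g$ by the smooth surrogate $g_\lambda(x) = \min_y \{ g(y) + \|x-y\|^2 / (2\lambda) \}$, whose minimiser is the proximal operator of $g$. For $g = |\cdot|$ both have closed forms (soft-thresholding and the Huber function), which gives a quick numerical check of the smoothing used here:

```python
import numpy as np

def prox_abs(x, lam):
    """Proximal operator of |.|: soft-thresholding."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def moreau_envelope_abs(x, lam):
    """Moreau-Yoshida envelope of |.| evaluated via its proximal point."""
    y = prox_abs(x, lam)
    return np.abs(y) + (x - y) ** 2 / (2 * lam)

x = np.linspace(-2, 2, 9)
lam = 0.5
print(moreau_envelope_abs(x, lam))
# matches the closed-form Huber function: quadratic near 0, linear |x| - lam/2 outside
print(np.where(np.abs(x) <= lam, x**2 / (2 * lam), np.abs(x) - lam / 2))
```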

Hypertension Detection From High-Dimensional Representation of Photoplethysmogram Signals

  • paper_url: http://arxiv.org/abs/2308.02425
  • repo_url: https://github.com/navidhasanzadeh/hypertension_ppg
  • paper_authors: Navid Hasanzadeh, Shahrokh Valaee, Hojjat Salehinejad
  • for: 旨在利用光电容积脉搏波(PPG)信号检测高血压。
  • methods: 提出了一种基于随机卷积核的高维表示技术,用于从 PPG 信号中检测高血压。
  • results: 实验结果表明,这一关系并不仅限于心率与血压,可扩展到更多特征;并且将随机卷积核变换用作端到端的时间序列特征提取器,其性能超过了此前研究以及最新的深度学习模型。
    Abstract Hypertension is commonly referred to as the "silent killer", since it can lead to severe health complications without any visible symptoms. Early detection of hypertension is crucial in preventing significant health issues. Although some studies suggest a relationship between blood pressure and certain vital signals, such as Photoplethysmogram (PPG), reliable generalization of the proposed blood pressure estimation methods is not yet guaranteed. This lack of certainty has resulted in some studies doubting the existence of such relationships, or considering them weak and limited to heart rate and blood pressure. In this paper, a high-dimensional representation technique based on random convolution kernels is proposed for hypertension detection using PPG signals. The results show that this relationship extends beyond heart rate and blood pressure, demonstrating the feasibility of hypertension detection with generalization. Additionally, the utilized transform using convolution kernels, as an end-to-end time-series feature extractor, outperforms the methods proposed in the previous studies and state-of-the-art deep learning models.
    摘要 高血压常被称为"沉默的杀手",因为它可能在没有任何明显症状的情况下导致严重的健康并发症,因此早期检测高血压对预防重大健康问题至关重要。尽管一些研究表明血压与光电容积脉搏波(PPG)等生命信号之间存在关系,但目前尚无法保证所提血压估计方法具有可靠的泛化能力。这种不确定性使得一些研究怀疑这类关系是否存在,或认为其较弱且仅限于心率与血压之间。本文提出一种基于随机卷积核的高维表示技术,用于利用 PPG 信号检测高血压。结果显示这种关系不仅限于心率与血压,证明了具有泛化能力的高血压检测是可行的。此外,作为端到端的时间序列特征提取器,所采用的随机卷积核变换优于以往研究提出的方法以及最先进的深度学习模型。
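A bare-bones version of a random-convolution-kernel representation (in the spirit of ROCKET-style transforms) feeding a linear classifier. The kernel count, kernel lengths, pooling statistics, and the synthetic "PPG-like" data are arbitrary choices for illustration, not the paper's configuration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def random_kernel_features(signals, n_kernels=200):
    """signals: (n_samples, length) 1-D segments -> features from random 1-D
    convolution kernels (max pooling + proportion of positive values).
    Kernels are drawn inside the call, so transform train and test together
    (or fix the kernels) to keep the feature space consistent."""
    feats = []
    for _ in range(n_kernels):
        w = rng.normal(size=rng.choice([7, 9, 11]))
        b = rng.uniform(-1, 1)
        conv = np.stack([np.convolve(s, w, mode="valid") + b for s in signals])
        feats.append(conv.max(axis=1))
        feats.append((conv > 0).mean(axis=1))
    return np.column_stack(feats)

# toy stand-in for PPG segments: two classes with slightly different dominant frequency
labels = rng.integers(0, 2, 120)
t = np.linspace(0, 8, 800)
X = np.stack([np.sin(2 * np.pi * (1.0 + 0.2 * y) * t) + 0.1 * rng.normal(size=t.size)
              for y in labels])
F = random_kernel_features(X)
clf = LogisticRegression(max_iter=2000).fit(F[:80], labels[:80])
print("toy hold-out accuracy:", clf.score(F[80:], labels[80:]))
```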

Rating-based Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2307.16348
  • repo_url: https://github.com/Aryia-Behroziuan/References
  • paper_authors: Devin White, Mingkang Wu, Ellen Novoseller, Vernon Lawhern, Nick Waytowich, Yongcan Cao
  • for: 本研究提出了一种新的基于评分的强化学习方法,利用人类评分来获取人类指导。与现有的基于偏好和基于排序的强化学习范式不同,该方法基于人类对单条轨迹的评分,而不需要对样本对进行相对比较。
  • methods: 本研究构建了一个用于预测人类评分的新预测模型,并设计了一种新的多类别损失函数。
  • results: 基于合成评分和真实人类评分的多组实验研究验证了该方法的有效性与优势。
    Abstract This paper develops a novel rating-based reinforcement learning approach that uses human ratings to obtain human guidance in reinforcement learning. Different from the existing preference-based and ranking-based reinforcement learning paradigms, based on human relative preferences over sample pairs, the proposed rating-based reinforcement learning approach is based on human evaluation of individual trajectories without relative comparisons between sample pairs. The rating-based reinforcement learning approach builds on a new prediction model for human ratings and a novel multi-class loss function. We conduct several experimental studies based on synthetic ratings and real human ratings to evaluate the effectiveness and benefits of the new rating-based reinforcement learning approach.
    摘要 本文提出一种新的基于评分的强化学习方法,利用人类评分来获取人类指导。与现有的基于样本对相对偏好的偏好型和排序型强化学习范式不同,所提方法基于人类对单条轨迹的评估,而无需在样本对之间进行相对比较。该方法建立在一个新的人类评分预测模型和一种新的多类别损失函数之上。我们基于合成评分和真实人类评分开展了多项实验研究,以评估这种新方法的有效性与优势。
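The supervised core of the approach, a model mapping a trajectory summary to a distribution over discrete human rating classes, can be sketched as below. The architecture, the fixed-size trajectory features, and the use of plain cross-entropy in place of the paper's custom multi-class loss are simplifying assumptions.

```python
import torch
import torch.nn as nn

NUM_RATINGS = 5  # e.g. ratings 0 ("very bad") ... 4 ("very good")

class TrajectoryRatingModel(nn.Module):
    """Predicts a human rating class from a fixed-size trajectory summary."""
    def __init__(self, traj_feat_dim=64, hidden=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(traj_feat_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, NUM_RATINGS))

    def forward(self, traj_feats):
        return self.net(traj_feats)  # logits over rating classes

model = TrajectoryRatingModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()  # stand-in for the paper's multi-class loss

# toy batch: 32 trajectories summarised as 64-d features, with human ratings
feats = torch.randn(32, 64)
ratings = torch.randint(0, NUM_RATINGS, (32,))
for _ in range(100):
    opt.zero_grad()
    loss = loss_fn(model(feats), ratings)
    loss.backward()
    opt.step()
print("final training loss:", loss.item())
# the predicted rating (or its expectation) can then serve as a learned guidance signal
```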

Proof-of-Federated-Learning-Subchain: Free Partner Selection Subchain Based on Federated Learning

  • paper_url: http://arxiv.org/abs/2307.16342
  • repo_url: None
  • paper_authors: Boyang Li, Bingyu Shen, Qing Lu, Taeho Jung, Yiyu Shi
  • for: 本研究旨在提出一种新的证明方式,以填补当前Proof-of-Deep-Learning(PoDL)承诺的缺陷。
  • methods: 本研究使用了聚合学习模型训练任务作为填补Hashing的可用功能。
  • results: 在 simulations 中,我们发现在受限的订单池大小下, miner WITH 高Shapley Value (SV) 会获得更好的机会被选择。在实验中,Proof-of-Federated-Learning-Subchain (PoFLSC) 证明支持了子链管理员在受限订单池大小下建立和维护竞争性子链。
    Abstract The continuous thriving of the Blockchain society motivates research in novel designs of schemes supporting cryptocurrencies. Previously multiple Proof-of-Deep-Learning(PoDL) consensuses have been proposed to replace hashing with useful work such as deep learning model training tasks. The energy will be more efficiently used while maintaining the ledger. However deep learning models are problem-specific and can be extremely complex. Current PoDL consensuses still require much work to realize in the real world. In this paper, we proposed a novel consensus named Proof-of-Federated-Learning-Subchain(PoFLSC) to fill the gap. We applied a subchain to record the training, challenging, and auditing activities and emphasized the importance of valuable datasets in partner selection. We simulated 20 miners in the subchain to demonstrate the effectiveness of PoFLSC. When we reduce the pool size concerning the reservation priority order, the drop rate difference in the performance in different scenarios further exhibits that the miner with a higher Shapley Value (SV) will gain a better opportunity to be selected when the size of the subchain pool is limited. In the conducted experiments, the PoFLSC consensus supported the subchain manager to be aware of reservation priority and the core partition of contributors to establish and maintain a competitive subchain.

Theoretically Principled Trade-off for Stateful Defenses against Query-Based Black-Box Attacks

  • paper_url: http://arxiv.org/abs/2307.16331
  • repo_url: None
  • paper_authors: Ashish Hooda, Neal Mangaokar, Ryan Feng, Kassem Fawaz, Somesh Jha, Atul Prakash
  • for: 这篇论文旨在探讨有状态防御(stateful defense)在攻击检测率与误报率之间的权衡,并给出其理论刻画。
  • methods: 论文针对一类通用的特征提取器与相似度阈值,给出攻击检测率的上界,并分析检测与误报之间的权衡。
  • results: 论文通过理论分析与实证评估表明,有状态防御的攻击检测率与误报率之间存在固有的权衡,该权衡会影响黑盒攻击的收敛性,并且依赖于所用的特征提取器与相似度阈值。
    Abstract Adversarial examples threaten the integrity of machine learning systems with alarming success rates even under constrained black-box conditions. Stateful defenses have emerged as an effective countermeasure, detecting potential attacks by maintaining a buffer of recent queries and detecting new queries that are too similar. However, these defenses fundamentally pose a trade-off between attack detection and false positive rates, and this trade-off is typically optimized by hand-picking feature extractors and similarity thresholds that empirically work well. There is little current understanding as to the formal limits of this trade-off and the exact properties of the feature extractors/underlying problem domain that influence it. This work aims to address this gap by offering a theoretical characterization of the trade-off between detection and false positive rates for stateful defenses. We provide upper bounds for detection rates of a general class of feature extractors and analyze the impact of this trade-off on the convergence of black-box attacks. We then support our theoretical findings with empirical evaluations across multiple datasets and stateful defenses.
    摘要 对抗样本即使在受限的黑盒条件下也能以惊人的成功率威胁机器学习系统的完整性。有状态防御作为一种有效的对策应运而生:它维护一个近期查询的缓冲区,通过检测与历史查询过于相似的新查询来发现潜在攻击。然而,这类防御本质上需要在攻击检测率与误报率之间进行权衡,而这种权衡通常是通过人工挑选经验上表现良好的特征提取器和相似度阈值来优化的。目前对于这种权衡的形式化极限,以及影响它的特征提取器/底层问题域的具体性质,还缺乏了解。本文旨在填补这一空白,对有状态防御在检测率与误报率之间的权衡给出理论刻画。我们为一类通用的特征提取器给出了检测率的上界,并分析了这种权衡对黑盒攻击收敛性的影响。随后,我们通过在多个数据集和多种有状态防御上的实证评估来支持理论发现。

Evaluating ChatGPT and GPT-4 for Visual Programming

  • paper_url: http://arxiv.org/abs/2308.02522
  • repo_url: None
  • paper_authors: Adish Singla
  • for: 本研究旨在检验现代生成模型在视觉编程领域是否具备高水平的能力,与文本编程领域的Python编程相比。
  • methods: 我们使用了ChatGPT和GPT-4两种生成模型,对各种视觉编程场景进行评估,并使用专家标注来评估其表现。
  • results: 我们发现,这两种模型在视觉编程领域表现不佳,尤其是在结合空间逻辑和编程技能方面遇到困难。这些结果提供了未来发展生成模型在视觉编程领域的探索方向。
    Abstract Generative AI and large language models have the potential to drastically improve the landscape of computing education by automatically generating personalized feedback and content. Recent works have studied the capabilities of these models for different programming education scenarios; however, these works considered only text-based programming, in particular, Python programming. Consequently, they leave open the question of how well these models would perform in visual programming domains popularly used for K-8 programming education. The main research question we study is: Do state-of-the-art generative models show advanced capabilities in visual programming on par with their capabilities in text-based Python programming? In our work, we evaluate two models, ChatGPT (based on GPT-3.5) and GPT-4, in visual programming domains for various scenarios and assess performance using expert-based annotations. In particular, we base our evaluation using reference tasks from the domains of Hour of Code: Maze Challenge by Code-dot-org and Karel. Our results show that these models perform poorly and struggle to combine spatial, logical, and programming skills crucial for visual programming. These results also provide exciting directions for future work on developing techniques to improve the performance of generative models in visual programming.
    摘要 生成AI和大语言模型有可能在计算教育中提供个性化反馈和内容,从而改善计算教育的景观。先前的研究已经研究了这些模型在不同的编程教育场景下的能力,但是这些研究仅考虑了文本编程,尤其是Python编程。因此,它们留下了如何在视觉编程领域中表现的问题。我们的研究问题是:现代生成模型在视觉编程领域中是否有高水平的表现,与文本基于Python编程的表现相当?在我们的工作中,我们评估了两个模型:ChatGPT(基于GPT-3.5)和GPT-4,在不同的视觉编程场景下进行评估,并使用专家标注来评估性能。具体来说,我们基于Code-dot-org的Hour of Code:迷宫挑战和Karel的参考任务进行评估。我们的结果表明,这些模型在视觉编程中表现糟糕,无法结合空间、逻辑和编程技能,这些技能是视觉编程中的关键。这些结果还提供了未来开发改进生成模型在视觉编程中表现的潜在方向。

RoseNNa: A performant, portable library for neural network inference with application to computational fluid dynamics

  • paper_url: http://arxiv.org/abs/2307.16322
  • repo_url: https://github.com/comp-physics/roseNNa
  • paper_authors: Ajay Bati, Spencer H. Bryngelson
  • for: 这篇论文面向计算流体力学(CFD)领域,旨在将神经网络集成到 CFD 求解器中以缩短模拟时间。
  • methods: 本论文涉及的网络包括多层感知机(MLP)和基于 LSTM 的循环神经网络(RNN)结构,并将训练好的模型自动转换为高性能的 Fortran 库(提供 C 与 Fortran API),以便与通常以 C/C++ 或 Fortran 编写的 CFD 求解器集成。
  • results: 结果显示,对于隐藏层数少于 100、每层神经元数少于 100 的 MLP 和 LSTM RNN,即使扣除 API 调用开销,RoseNNa 的推理速度也稳定超过 PyTorch 与 libtorch:在所测试的网络规模范围内,较小的网络加速约 10 倍,较大的网络加速约 2 倍。
    Abstract The rise of neural network-based machine learning ushered in high-level libraries, including TensorFlow and PyTorch, to support their functionality. Computational fluid dynamics (CFD) researchers have benefited from this trend and produced powerful neural networks that promise shorter simulation times. For example, multilayer perceptrons (MLPs) and Long Short Term Memory (LSTM) recurrent-based (RNN) architectures can represent sub-grid physical effects, like turbulence. Implementing neural networks in CFD solvers is challenging because the programming languages used for machine learning and CFD are mostly non-overlapping, We present the roseNNa library, which bridges the gap between neural network inference and CFD. RoseNNa is a non-invasive, lightweight (1000 lines), and performant tool for neural network inference, with focus on the smaller networks used to augment PDE solvers, like those of CFD, which are typically written in C/C++ or Fortran. RoseNNa accomplishes this by automatically converting trained models from typical neural network training packages into a high-performance Fortran library with C and Fortran APIs. This reduces the effort needed to access trained neural networks and maintains performance in the PDE solvers that CFD researchers build and rely upon. Results show that RoseNNa reliably outperforms PyTorch (Python) and libtorch (C++) on MLPs and LSTM RNNs with less than 100 hidden layers and 100 neurons per layer, even after removing the overhead cost of API calls. Speedups range from a factor of about 10 and 2 faster than these established libraries for the smaller and larger ends of the neural network size ranges tested.
    摘要 基于神经网络的机器学习的兴起催生了 TensorFlow 和 PyTorch 等高级库来支持其功能。计算流体力学(CFD)研究者也从这一趋势中受益,构建出有望缩短模拟时间的强大神经网络。例如,多层感知机(MLP)和基于长短期记忆(LSTM)的循环神经网络(RNN)架构可以表示湍流等亚网格物理效应。然而,在 CFD 求解器中集成神经网络颇具挑战,因为机器学习与 CFD 所用的编程语言基本不重叠。我们提出 roseNNa 库来弥合神经网络推理与 CFD 之间的鸿沟。roseNNa 是一个非侵入、轻量级(约 1000 行)且高性能的神经网络推理工具,专注于用于增强 PDE 求解器(如通常以 C/C++ 或 Fortran 编写的 CFD 求解器)的较小网络。它通过将常见神经网络训练框架训练出的模型自动转换为提供 C 与 Fortran API 的高性能 Fortran 库来实现这一点,从而降低访问已训练神经网络的成本,并保持 CFD 研究者所依赖的 PDE 求解器的性能。结果表明,对于隐藏层数少于 100、每层神经元数少于 100 的 MLP 和 LSTM RNN,即使扣除 API 调用开销,roseNNa 也能稳定超越 PyTorch(Python)和 libtorch(C++):在所测试网络规模范围内,较小的网络加速约 10 倍,较大的网络加速约 2 倍。

Towards Practical Robustness Auditing for Linear Regression

  • paper_url: http://arxiv.org/abs/2307.16315
  • repo_url: None
  • paper_authors: Daniel Freund, Samuel B. Hopkins
  • for: 检测(或证伪)数据集中是否存在一小部分样本,将其移除后会翻转普通最小二乘(OLS)回归系数的符号。
  • methods: 使用混合整数二次约束优化处理一般线性回归问题,并在特殊情形下使用精确的贪心方法。
  • results: 这些方法在低维问题上大幅超越现有最优方法,但对维度为 3 及以上的问题仍存在计算瓶颈。
    Abstract We investigate practical algorithms to find or disprove the existence of small subsets of a dataset which, when removed, reverse the sign of a coefficient in an ordinary least squares regression involving that dataset. We empirically study the performance of well-established algorithmic techniques for this task -- mixed integer quadratically constrained optimization for general linear regression problems and exact greedy methods for special cases. We show that these methods largely outperform the state of the art and provide a useful robustness check for regression problems in a few dimensions. However, significant computational bottlenecks remain, especially for the important task of disproving the existence of such small sets of influential samples for regression problems of dimension $3$ or greater. We make some headway on this challenge via a spectral algorithm using ideas drawn from recent innovations in algorithmic robust statistics. We summarize the limitations of known techniques in several challenge datasets to encourage further algorithmic innovation.
    摘要 我们研究一类实用算法,用于找到(或证伪存在)数据集中这样的小样本子集:将其移除后,涉及该数据集的普通最小二乘回归中某个系数的符号会发生翻转。我们对该任务中成熟的算法技术进行了实证研究:针对一般线性回归问题的混合整数二次约束优化,以及针对特殊情形的精确贪心方法。结果表明,这些方法在低维回归问题上大幅超越现有最优方法,并可作为有用的鲁棒性检查。然而,计算瓶颈依然显著,尤其是在维度不小于 3 的回归问题上证伪此类小规模高影响样本集的存在这一重要任务上。我们借鉴算法鲁棒统计领域的最新进展,提出一种谱算法,在这一挑战上取得了一定进展。我们还总结了已知技术在若干挑战数据集上的局限,以鼓励进一步的算法创新。
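A naive version of the greedy strategy for the special cases mentioned above: repeatedly delete the single observation whose removal pushes a chosen OLS coefficient furthest toward the opposite sign, and report how many deletions flip it. The brute-force refitting below is for illustration only; the paper's exact greedy method and MIQCP formulation are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(0)

def ols_coef(X, y, j):
    """j-th OLS coefficient."""
    return np.linalg.lstsq(X, y, rcond=None)[0][j]

def greedy_sign_flip(X, y, j, max_removals=30):
    """Greedily drop samples to push coefficient j toward the opposite sign."""
    sign0 = np.sign(ols_coef(X, y, j))
    for k in range(1, max_removals + 1):
        cands = [ols_coef(np.delete(X, i, 0), np.delete(y, i, 0), j)
                 for i in range(len(y))]
        best = int(np.argmin(sign0 * np.asarray(cands)))  # most sign-opposing removal
        X, y = np.delete(X, best, 0), np.delete(y, best, 0)
        if np.sign(ols_coef(X, y, j)) != sign0:
            return k
    return None  # could not flip within the budget

n = 100
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # intercept + one regressor
y = 0.05 * X[:, 1] + rng.normal(size=n)                  # weak, fragile effect
print("removals needed to flip the slope's sign:", greedy_sign_flip(X, y, j=1))
```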

Mask-guided Data Augmentation for Multiparametric MRI Generation with a Rare Hepatocellular Carcinoma

  • paper_url: http://arxiv.org/abs/2307.16314
  • repo_url: None
  • paper_authors: Karen Sanchez, Carlos Hinojosa, Kevin Arias, Henry Arguello, Denis Kouame, Olivier Meyrignac, Adrian Basarab
  • for: 这篇论文主要是为了提高深度学习模型在医疗领域的性能而开发的数据扩充技术。
  • methods: 这篇论文提出了一种新的数据增强架构,通过生成式深度学习方法合成多参数(T1 动脉期、T1 门静脉期和 T2)磁共振(MRI)图像,并生成相应的肝肿瘤掩膜。
  • results: 实验结果表明,该方法可以利用来自 89 名患者的有限多参数 MRI 三元组数据集,生成 1000 组合成三元组及其对应的肝肿瘤掩膜,Frechet Inception Distance 分数为 86.55。该方法是 2021 年法国放射学会数据增强挑战赛的获奖作品之一。
    Abstract Data augmentation is classically used to improve the overall performance of deep learning models. It is, however, challenging in the case of medical applications, and in particular for multiparametric datasets. For example, traditional geometric transformations used in several applications to generate synthetic images can modify in a non-realistic manner the patients' anatomy. Therefore, dedicated image generation techniques are necessary in the medical field to, for example, mimic a given pathology realistically. This paper introduces a new data augmentation architecture that generates synthetic multiparametric (T1 arterial, T1 portal, and T2) magnetic resonance images (MRI) of massive macrotrabecular subtype hepatocellular carcinoma with their corresponding tumor masks through a generative deep learning approach. The proposed architecture creates liver tumor masks and abdominal edges used as input in a Pix2Pix network for synthetic data creation. The method's efficiency is demonstrated by training it on a limited multiparametric dataset of MRI triplets from $89$ patients with liver lesions to generate $1,000$ synthetic triplets and their corresponding liver tumor masks. The resulting Frechet Inception Distance score was $86.55$. The proposed approach was among the winners of the 2021 data augmentation challenge organized by the French Society of Radiology.
    摘要 数据增强通常用于提升深度学习模型的整体性能,但在医疗应用中,特别是针对多参数数据集,它具有挑战性。例如,许多应用中用于生成合成图像的传统几何变换可能会以不真实的方式改变患者的解剖结构。因此,医学领域需要专门的图像生成技术,例如逼真地模拟某种病变。本文提出一种新的数据增强架构,通过生成式深度学习方法,为巨梁型(macrotrabecular)亚型肝细胞癌生成多参数(T1 动脉期、T1 门静脉期和 T2)磁共振(MRI)合成图像及其对应的肿瘤掩膜。所提架构生成肝肿瘤掩膜和腹部边缘,并将其作为 Pix2Pix 网络的输入用于合成数据的生成。我们在来自 89 名肝脏病变患者的有限多参数 MRI 三元组数据集上训练该方法,生成了 1000 组合成三元组及其对应的肝肿瘤掩膜,以证明其有效性,所得 Frechet Inception Distance 分数为 86.55。该方法是 2021 年法国放射学会组织的数据增强挑战赛的获奖作品之一。

You Shall not Pass: the Zero-Gradient Problem in Predict and Optimize for Convex Optimization

  • paper_url: http://arxiv.org/abs/2307.16304
  • repo_url: None
  • paper_authors: Grigorii Veviurko, Wendelin Böhmer, Mathijs de Weerdt
  • for: 这篇论文是关于决策参数预测和优化的一种流行方法,它使用机器学习来预测优化问题中的未知参数。
  • methods: 这篇论文使用了任务性能作为损失函数,以训练预测模型。
  • results: 论文指出了这种方法一个此前未被注意到的缺陷,即零梯度问题,并基于微分优化的数学性质提出了相应的解决方法,在两个真实世界基准上进行了验证。
    Abstract Predict and optimize is an increasingly popular decision-making paradigm that employs machine learning to predict unknown parameters of optimization problems. Instead of minimizing the prediction error of the parameters, it trains predictive models using task performance as a loss function. In the convex optimization domain, predict and optimize has seen significant progress due to recently developed methods for differentiating optimization problem solutions over the problem parameters. This paper identifies a yet unnoticed drawback of this approach -- the zero-gradient problem -- and introduces a method to solve it. The suggested method is based on the mathematical properties of differential optimization and is verified using two real-world benchmarks.
    摘要 预测和优化是一种日益受欢迎的决策模式,它利用机器学习预测未知优化问题中的参数。而不是将参数预测错误降为最小值,它使用任务性能作为损失函数来训练预测模型。在凸优化领域,预测和优化已经取得了显著进步,这主要归功于最近发展出的对优化问题解决方案的差分优化方法。然而,这种方法还存在一个未注意的缺点:零梯度问题。这篇论文描述了这个缺点,并提出了一种解决方法,基于差分优化的数学性质。该方法在两个真实世界 benchmark 上进行了验证。
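The zero-gradient problem is easy to see on a stylised example: when the optimiser's solution is locally constant in the predicted parameters (here, a linear program whose argmin is always a vertex, a simplification of the paper's convex setting), the task loss has zero gradient with respect to the prediction almost everywhere, so gradient-based training stalls. A finite-difference check:

```python
import numpy as np

def solve_lp_over_vertices(c, vertices):
    """min_x c^T x over a polytope given by its vertices (the argmin is a vertex)."""
    return vertices[np.argmin(vertices @ c)]

vertices = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])  # a triangle
x_true = np.array([1.0, 0.0])                              # the "correct" decision

def task_loss(c_pred):
    x = solve_lp_over_vertices(c_pred, vertices)
    return np.sum((x - x_true) ** 2)

c_pred = np.array([0.3, -0.5])   # predicted costs -> decision picks (0, 1): loss > 0
eps = 1e-4
grad = np.array([(task_loss(c_pred + eps * e) - task_loss(c_pred - eps * e)) / (2 * eps)
                 for e in np.eye(2)])
print("loss:", task_loss(c_pred), "finite-difference gradient:", grad)
# the gradient is exactly zero: small changes in c_pred do not change the chosen vertex,
# which is why naive predict-and-optimize training can stall on such problems
```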

Predicting delays in Indian lower courts using AutoML and Decision Forests

  • paper_url: http://arxiv.org/abs/2307.16285
  • repo_url: https://github.com/mb7419/pendencyprediction
  • paper_authors: Mohit Bhatnagar, Shivraj Huchhanavar
  • for: 基于立案时可得的案件信息,构建预测印度下级法院案件延迟的分类模型。
  • methods: 使用 AutoML 构建覆盖各待决时长区间的多类别分类模型,并使用二元决策森林分类器提升延迟分类的预测精度。
  • results: 最佳模型达到 81.4% 的准确率,precision、recall 和 F1 均为 0.81。
    Abstract This paper presents a classification model that predicts delays in Indian lower courts based on case information available at filing. The model is built on a dataset of 4.2 million court cases filed in 2010 and their outcomes over a 10-year period. The data set is drawn from 7000+ lower courts in India. The authors employed AutoML to develop a multi-class classification model over all periods of pendency and then used binary decision forest classifiers to improve predictive accuracy for the classification of delays. The best model achieved an accuracy of 81.4%, and the precision, recall, and F1 were found to be 0.81. The study demonstrates the feasibility of AI models for predicting delays in Indian courts, based on relevant data points such as jurisdiction, court, judge, subject, and the parties involved. The paper also discusses the results in light of relevant literature and suggests areas for improvement and future research. The authors have made the dataset and Python code files used for the analysis available for further research in the crucial and contemporary field of Indian judicial reform.
    摘要 本文提出一个分类模型,基于立案时可得的案件信息预测印度下级法院的延迟情况。该模型基于 2010 年立案的 420 万件法院案件及其此后十年的结果构建,数据来自印度 7000 多个下级法院。作者使用 AutoML 构建了覆盖各待决时长区间的多类别分类模型,并进一步使用二元决策森林分类器提升延迟分类的预测精度。最佳模型的准确率为 81.4%,precision、recall 和 F1 均为 0.81。该研究表明,基于管辖区、法院、法官、案由及当事人等相关数据点,利用 AI 模型预测印度法院延迟是可行的。文章还结合相关文献讨论了结果,并指出了改进方向和未来研究方向。作者已公开本次分析所用的数据集和 Python 代码文件,供印度司法改革这一重要而现实的领域开展进一步研究。

zkDL: Efficient Zero-Knowledge Proofs of Deep Learning Training

  • paper_url: http://arxiv.org/abs/2307.16273
  • repo_url: None
  • paper_authors: Haochen Sun, Hongyang Zhang
  • for: 这篇论文旨在为深度学习训练提供一种高效的零知识证明方法,以保护不受信任的 AI 开发者的知识产权。
  • methods: 该方法的核心是专门的零知识证明协议 zkReLU,针对 ReLU 激活函数(因其非算术性质而成为可验证训练的主要障碍)优化了证明时间与证明大小,并通过由神经网络构造算术电路,将其整合进覆盖整个训练过程的证明系统。
  • results: 对于一个 16 层、含 2 亿参数的深度神经网络,zkDL 能够在每个训练步骤不到一分钟内生成完整且可靠的证明,证明大小小于 20 kB,同时保护数据与模型参数的隐私。
    Abstract The recent advancements in deep learning have brought about significant changes in various aspects of people's lives. Meanwhile, these rapid developments have raised concerns about the legitimacy of the training process of deep networks. However, to protect the intellectual properties of untrusted AI developers, directly examining the training process by accessing the model parameters and training data by verifiers is often prohibited. In response to this challenge, we present zkDL, an efficient zero-knowledge proof of deep learning training. At the core of zkDL is zkReLU, a specialized zero-knowledge proof protocol with optimized proving time and proof size for the ReLU activation function, a major obstacle in verifiable training due to its non-arithmetic nature. To integrate zkReLU into the proof system for the entire training process, we devise a novel construction of an arithmetic circuit from neural networks. By leveraging the abundant parallel computation resources, this construction reduces proving time and proof sizes by a factor of the network depth. As a result, zkDL enables the generation of complete and sound proofs, taking less than a minute with a size of less than 20 kB per training step, for a 16-layer neural network with 200M parameters, while ensuring the privacy of data and model parameters.
    摘要 The core of zkDL is zkReLU, a specialized zero-knowledge proof protocol that is optimized for the ReLU activation function, which has been a major obstacle in verifiable training due to its non-arithmetic nature. By leveraging abundant parallel computation resources, we have devised a novel construction of an arithmetic circuit from neural networks, which reduces proving time and proof sizes by a factor of the network depth.With zkDL, we can generate complete and sound proofs in less than a minute, with a size of less than 20 kB per training step, for a 16-layer neural network with 200M parameters, while ensuring the privacy of data and model parameters. This solution is efficient and effective, and it addresses the challenge of verifying the training process of deep networks in a privacy-preserving manner.