cs.LG - 2023-07-14

Exploiting Counter-Examples for Active Learning with Partial labels

  • paper_url: http://arxiv.org/abs/2307.07413
  • repo_url: https://github.com/Ferenas/APLL
  • paper_authors: Fei Zhang, Yunjie Ye, Lei Feng, Zhongwen Rao, Jieming Zhu, Marcus Kalander, Chen Gong, Jianye Hao, Bo Han
  • for: This paper studies a new problem, active learning with partial labels (ALPL), in which an oracle annotates query samples with partial labels only, relieving the oracle of the demanding accurate-labeling process.
  • methods: The authors first build an intuitive baseline that can be seamlessly incorporated into existing AL frameworks. Because this baseline still suffers from overfitting and falls short of selecting representative partial-label-based samples during querying, they draw inspiration from human inference in cognitive science: counter-examples (CEs) are constructed by reversing the partial labels of each instance, and a simple but effective WorseNet learns directly from this complementary pattern.
  • results: Experiments on five real-world datasets and four benchmark datasets show comprehensive improvements over ten representative AL frameworks, highlighting the superiority of WorseNet. Code will be available at \url{https://github.com/Ferenas/APLL}.
    Abstract This paper studies a new problem, \emph{active learning with partial labels} (ALPL). In this setting, an oracle annotates the query samples with partial labels, relaxing the oracle from the demanding accurate labeling process. To address ALPL, we first build an intuitive baseline that can be seamlessly incorporated into existing AL frameworks. Though effective, this baseline is still susceptible to the \emph{overfitting}, and falls short of the representative partial-label-based samples during the query process. Drawing inspiration from human inference in cognitive science, where accurate inferences can be explicitly derived from \emph{counter-examples} (CEs), our objective is to leverage this human-like learning pattern to tackle the \emph{overfitting} while enhancing the process of selecting representative samples in ALPL. Specifically, we construct CEs by reversing the partial labels for each instance, and then we propose a simple but effective WorseNet to directly learn from this complementary pattern. By leveraging the distribution gap between WorseNet and the predictor, this adversarial evaluation manner could enhance both the performance of the predictor itself and the sample selection process, allowing the predictor to capture more accurate patterns in the data. Experimental results on five real-world datasets and four benchmark datasets show that our proposed method achieves comprehensive improvements over ten representative AL frameworks, highlighting the superiority of WorseNet. The source code will be available at \url{https://github.com/Ferenas/APLL}.
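A minimal sketch of the counter-example construction described in the abstract, assuming partial labels are given as a binary candidate-label matrix; the function name and data layout are illustrative, not taken from the authors' code.

```python
import numpy as np

def build_counter_examples(partial_labels: np.ndarray) -> np.ndarray:
    """Reverse the partial labels: the counter-example label set of each
    instance is every class that is *not* among its candidate labels.

    partial_labels: (n_samples, n_classes) binary matrix, 1 = candidate label.
    """
    return 1 - partial_labels

# toy example: 3 samples, 4 classes
Y = np.array([[1, 1, 0, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
print(build_counter_examples(Y))
```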

HuCurl: Human-induced Curriculum Discovery

  • paper_url: http://arxiv.org/abs/2307.07412
  • repo_url: None
  • paper_authors: Mohamed Elgaar, Hadi Amiri
  • for: This work introduces the curriculum discovery problem and proposes a curriculum learning framework that can discover effective curricula in a curriculum space based on prior knowledge about sample difficulty.
  • methods: Annotation entropy and loss are used as difficulty measures, and candidate curricula are evaluated over models and datasets to find the best-performing ones.
  • results: The top-performing discovered curricula are often non-monotonic, the prevailing easy-to-hard or hard-to-easy transition curricula are often at risk of underperforming, and curricula discovered for smaller datasets and models also perform well on larger datasets and models.
    Abstract We introduce the problem of curriculum discovery and describe a curriculum learning framework capable of discovering effective curricula in a curriculum space based on prior knowledge about sample difficulty. Using annotation entropy and loss as measures of difficulty, we show that (i): the top-performing discovered curricula for a given model and dataset are often non-monotonic as opposed to monotonic curricula in existing literature, (ii): the prevailing easy-to-hard or hard-to-easy transition curricula are often at the risk of underperforming, and (iii): the curricula discovered for smaller datasets and models perform well on larger datasets and models respectively. The proposed framework encompasses some of the existing curriculum learning approaches and can discover curricula that outperform them across several NLP tasks.

Improved Convergence Analysis and SNR Control Strategies for Federated Learning in the Presence of Noise

  • paper_url: http://arxiv.org/abs/2307.07406
  • repo_url: None
  • paper_authors: Antesh Upadhyay, Abolfazl Hashemi
  • for: This paper presents an improved convergence analysis of federated learning (FL) that accounts for the imperfect/noisy uplink and downlink communications arising in practical deployments.
  • methods: The new analysis separates the effects of downlink and uplink noise on convergence, and the resulting insight is used to design improved signal-to-noise-ratio (SNR) control strategies that recover a noise-free-like convergence rate while using significantly less power than existing solutions.
  • results: The analysis reveals an asymmetry: downlink noise is more detrimental to the convergence of FL algorithms than uplink noise. Based on this, the proposed SNR control maintains the $O(\frac{1}{\sqrt{K}})$ noise-free rate by scaling down the uplink and downlink noise by $\Omega(\sqrt{k})$ and $\Omega(k)$ respectively, where $k$ is the communication round.
    Abstract We propose an improved convergence analysis technique that characterizes the distributed learning paradigm of federated learning (FL) with imperfect/noisy uplink and downlink communications. Such imperfect communication scenarios arise in the practical deployment of FL in emerging communication systems and protocols. The analysis developed in this paper demonstrates, for the first time, that there is an asymmetry in the detrimental effects of uplink and downlink communications in FL. In particular, the adverse effect of the downlink noise is more severe on the convergence of FL algorithms. Using this insight, we propose improved Signal-to-Noise (SNR) control strategies that, discarding the negligible higher-order terms, lead to a similar convergence rate for FL as in the case of a perfect, noise-free communication channel while incurring significantly less power resources compared to existing solutions. In particular, we establish that to maintain the $O(\frac{1}{\sqrt{K}})$ rate of convergence like in the case of noise-free FL, we need to scale down the uplink and downlink noise by $\Omega(\sqrt{k})$ and $\Omega(k)$ respectively, where $k$ denotes the communication round, $k=1,\dots, K$. Our theoretical result is further characterized by two major benefits: firstly, it does not assume the somewhat unrealistic assumption of bounded client dissimilarity, and secondly, it only requires smooth non-convex loss functions, a function class better suited for modern machine learning and deep learning models. We also perform extensive empirical analysis to verify the validity of our theoretical findings.
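A toy simulation of the proposed noise-scaling idea, assuming a scalar model, a quadratic local objective, and additive Gaussian channel noise whose standard deviation is scaled down by $\Omega(k)$ on the downlink and $\Omega(\sqrt{k})$ on the uplink; all constants are illustrative, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
K, n_clients, lr = 200, 10, 0.1
w = 5.0  # global model (scalar for illustration); the optimum is w = 0

for k in range(1, K + 1):
    # downlink: broadcast w over a noisy channel, noise std scaled down by k
    w_received = w + rng.normal(0, 1.0 / k, size=n_clients)
    # local gradient of f_i(w) = 0.5 * w^2 evaluated at the received model
    grads = w_received
    # uplink: each client's update is perturbed, noise std scaled down by sqrt(k)
    uplink = grads + rng.normal(0, 1.0 / np.sqrt(k), size=n_clients)
    w -= lr * uplink.mean()

print(f"final |w| after {K} noisy rounds: {abs(w):.4f}")
```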

Performance of $\ell_1$ Regularization for Sparse Convex Optimization

  • paper_url: http://arxiv.org/abs/2307.07405
  • repo_url: None
  • paper_authors: Kyriakos Axiotis, Taisuke Yasuda
  • for: This paper provides guarantees for the LASSO and Group LASSO beyond statistical problems, i.e., for sparse convex optimization on deterministic inputs.
  • methods: The analysis studies Group LASSO regularization over vector-valued features for strictly convex objectives, connecting the selected features to those of Orthogonal Matching Pursuit, whose guarantees follow from restricted strong convexity and smoothness via weak submodularity arguments.
  • results: With a sufficiently large Group LASSO regularization, the minimizer of a strictly convex function $l$ is a sparse vector supported on the vector-valued features with the largest $\ell_2$ norm of the gradient. This answers open questions of Tibshirani et al. and Yasuda et al., generalizes the provable guarantees of the Sequential Attention algorithm, and yields new results for column subset selection under general loss functions.
    Abstract Despite widespread adoption in practice, guarantees for the LASSO and Group LASSO are strikingly lacking in settings beyond statistical problems, and these algorithms are usually considered to be a heuristic in the context of sparse convex optimization on deterministic inputs. We give the first recovery guarantees for the Group LASSO for sparse convex optimization with vector-valued features. We show that if a sufficiently large Group LASSO regularization is applied when minimizing a strictly convex function $l$, then the minimizer is a sparse vector supported on vector-valued features with the largest $\ell_2$ norm of the gradient. Thus, repeating this procedure selects the same set of features as the Orthogonal Matching Pursuit algorithm, which admits recovery guarantees for any function $l$ with restricted strong convexity and smoothness via weak submodularity arguments. This answers open questions of Tibshirani et al. and Yasuda et al. Our result is the first to theoretically explain the empirical success of the Group LASSO for convex functions under general input instances assuming only restricted strong convexity and smoothness. Our result also generalizes provable guarantees for the Sequential Attention algorithm, which is a feature selection algorithm inspired by the attention mechanism proposed by Yasuda et al. As an application of our result, we give new results for the column subset selection problem, which is well-studied when the loss is the Frobenius norm or other entrywise matrix losses. We give the first result for general loss functions for this problem that requires only restricted strong convexity and smoothness.
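For concreteness, a standard way to write the Group LASSO-regularized problem referenced above (the block structure over vector-valued features and the notation are assumptions based on the abstract, not the paper's exact setup):
\[
\min_{x \in \mathbb{R}^{dT}} \; l(x) + \lambda \sum_{t=1}^{T} \|x_t\|_2,
\]
where $x_t \in \mathbb{R}^d$ collects the coefficients of the $t$-th vector-valued feature. The abstract's claim is that for sufficiently large $\lambda$ the minimizer is supported on the features whose gradient blocks have the largest $\ell_2$ norm, mirroring the selection rule of Orthogonal Matching Pursuit.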

Improving Zero-Shot Generalization for CLIP with Synthesized Prompts

  • paper_url: http://arxiv.org/abs/2307.07397
  • repo_url: https://github.com/mrflogs/SHIP
  • paper_authors: Zhengbo Wang, Jian Liang, Ran He, Nan Xu, Zilei Wang, Tieniu Tan
  • for: Improving the generalization of CLIP in real-world applications where, due to the long tail and Zipf's law, some classes may lack labeled data entirely.
  • methods: A plug-and-play generative approach, SyntHesIzed Prompts (SHIP): following variational autoencoders, a generator reconstructs visual features by feeding synthesized prompts and the corresponding class names to CLIP's text encoder, producing synthesized features for the label-only classes.
  • results: Fine-tuning CLIP with off-the-shelf methods on the combined labeled and synthesized features yields superior performance on base-to-new generalization, cross-dataset transfer learning, and generalized zero-shot learning.
    Abstract With the growing interest in pretrained vision-language models like CLIP, recent research has focused on adapting these models to downstream tasks. Despite achieving promising results, most existing methods require labeled data for all classes, which may not hold in real-world applications due to the long tail and Zipf's law. For example, some classes may lack labeled data entirely, such as emerging concepts. To address this problem, we propose a plug-and-play generative approach called \textbf{S}ynt\textbf{H}es\textbf{I}zed \textbf{P}rompts~(\textbf{SHIP}) to improve existing fine-tuning methods. Specifically, we follow variational autoencoders to introduce a generator that reconstructs the visual features by inputting the synthesized prompts and the corresponding class names to the textual encoder of CLIP. In this manner, we easily obtain the synthesized features for the remaining label-only classes. Thereafter, we fine-tune CLIP with off-the-shelf methods by combining labeled and synthesized features. Extensive experiments on base-to-new generalization, cross-dataset transfer learning, and generalized zero-shot learning demonstrate the superiority of our approach. The code is available at \url{https://github.com/mrflogs/SHIP}.

Visualizing Overlapping Biclusterings and Boolean Matrix Factorizations

  • paper_url: http://arxiv.org/abs/2307.07396
  • repo_url: https://github.com/tmarette/biclustervisualization
  • paper_authors: Thibault Marette, Pauli Miettinen, Stefan Neumann
  • for: This paper studies how to visualize a given clustering with overlapping clusters in bipartite graphs, together with the related problem of visualizing Boolean matrix factorizations.
  • methods: Three objective functions are proposed to measure the quality of a visualization, along with algorithms, including a novel heuristic, that optimize them.
  • results: Experiments on real-world datasets show that the novel heuristic, which locally places rows and columns with similar cluster membership next to each other, achieves the best trade-off between the competing goals and reduces clutter in the visualization.
    Abstract Finding (bi-)clusters in bipartite graphs is a popular data analysis approach. Analysts typically want to visualize the clusters, which is simple as long as the clusters are disjoint. However, many modern algorithms find overlapping clusters, making visualization more complicated. In this paper, we study the problem of visualizing \emph{a given clustering} of overlapping clusters in bipartite graphs and the related problem of visualizing Boolean Matrix Factorizations. We conceptualize three different objectives that any good visualization should satisfy: (1) proximity of cluster elements, (2) large consecutive areas of elements from the same cluster, and (3) large uninterrupted areas in the visualization, regardless of the cluster membership. We provide objective functions that capture these goals and algorithms that optimize these objective functions. Interestingly, in experiments on real-world datasets, we find that the best trade-off between these competing goals is achieved by a novel heuristic, which locally aims to place rows and columns with similar cluster membership next to each other.

CAMP: A Context-Aware Cricket Players Performance Metric

  • paper_url: http://arxiv.org/abs/2307.13700
  • repo_url: https://github.com/sohaibayub/camp
  • paper_authors: Muhammad Sohaib Ayub, Naimat Ullah, Sarwan Ali, Imdad Ullah Khan, Mian Muhammad Awais, Muhammad Asad Khan, Safiullah Faizullah
  • for: This paper proposes a Context-Aware Metric of player Performance (CAMP) to evaluate the contribution of individual cricket players to a match outcome.
  • methods: CAMP employs data mining methods and incorporates the exact context of performance, such as opponents' strengths and pressure situations, enabling data-driven decisions for selection and drafting, coaching and training, team line-ups, and strategy development.
  • results: In an empirical evaluation on limited-over matches from 2001 to 2019, the top two players rated by CAMP match the expert-declared best player (Man of the Match, MoM) in 83% of 961 games, and CAMP outperforms the best-player assessment based on the Duckworth-Lewis-Stern (DLS) method.
    Abstract Cricket is the second most popular sport after soccer in terms of viewership. However, the assessment of individual player performance, a fundamental task in team sports, is currently primarily based on aggregate performance statistics, including average runs and wickets taken. We propose Context-Aware Metric of player Performance, CAMP, to quantify individual players' contributions toward a cricket match outcome. CAMP employs data mining methods and enables effective data-driven decision-making for selection and drafting, coaching and training, team line-ups, and strategy development. CAMP incorporates the exact context of performance, such as opponents' strengths and specific circumstances of games, such as pressure situations. We empirically evaluate CAMP on data of limited-over cricket matches between 2001 and 2019. In every match, a committee of experts declares one player as the best player, called Man of the Match (MoM). The top two rated players by CAMP match with MoM in 83\% of the 961 games. Thus, the CAMP rating of the best player closely matches that of the domain experts. By this measure, CAMP significantly outperforms the current best-known players' contribution measure based on the Duckworth-Lewis-Stern (DLS) method.

Brain in the Dark: Design Principles for Neuro-mimetic Learning and Inference

  • paper_url: http://arxiv.org/abs/2307.08613
  • repo_url: None
  • paper_authors: Mehran H. Bazargani, Szymon Urbas, Karl Friston
  • for: This paper examines how the brain, operating in complete darkness within the skull, can infer the most likely causes of its sensory input, framed in terms of a generative model of the world.
  • methods: Perception is modeled as inverting a brain-inspired generative model to infer the hidden causes behind sensory stimuli, which raises questions about how to formulate such models, how to invert them for inference and learning, and which loss function to optimize.
  • results: The paper lays out design principles for neuro-mimetic inference, in particular the different choices of mean-field approximation (MFA) and their implications for variational inference (VI).
    Abstract Even though the brain operates in pure darkness, within the skull, it can infer the most likely causes of its sensory input. An approach to modelling this inference is to assume that the brain has a generative model of the world, which it can invert to infer the hidden causes behind its sensory stimuli, that is, perception. This assumption raises key questions: how to formulate the problem of designing brain-inspired generative models, how to invert them for the tasks of inference and learning, what is the appropriate loss function to be optimised, and, most importantly, what are the different choices of mean field approximation (MFA) and their implications for variational inference (VI).

Learning Sparse Neural Networks with Identity Layers

  • paper_url: http://arxiv.org/abs/2307.07389
  • repo_url: https://github.com/sccdnmj/Learning-Sparse-Neural-Networks-with-Identity-Layers
  • paper_authors: Mingjian Ni, Guangyao Chen, Xiawu Zheng, Peixi Peng, Li Yuan, Yonghong Tian
  • for: This paper aims to improve the sparsity of deep neural networks by reducing interlayer feature similarity.
  • methods: The proposed method uses Centered Kernel Alignment (CKA) in a plug-and-play sparsity regularization (CKA-SR) that reduces feature similarity between layers and increases network sparsity.
  • results: CKA-SR consistently improves the performance of several state-of-the-art sparse training methods, especially at extremely high sparsity.
    Abstract The sparsity of Deep Neural Networks is well investigated to maximize the performance and reduce the size of overparameterized networks as possible. Existing methods focus on pruning parameters in the training process by using thresholds and metrics. Meanwhile, feature similarity between different layers has not been discussed sufficiently before, which could be rigorously proved to be highly correlated to the network sparsity in this paper. Inspired by interlayer feature similarity in overparameterized models, we investigate the intrinsic link between network sparsity and interlayer feature similarity. Specifically, we prove that reducing interlayer feature similarity based on Centered Kernel Alignment (CKA) improves the sparsity of the network by using information bottleneck theory. Applying such theory, we propose a plug-and-play CKA-based Sparsity Regularization for sparse network training, dubbed CKA-SR, which utilizes CKA to reduce feature similarity between layers and increase network sparsity. In other words, layers of our sparse network tend to have their own identity compared to each other. Experimentally, we plug the proposed CKA-SR into the training process of sparse network training methods and find that CKA-SR consistently improves the performance of several State-Of-The-Art sparse training methods, especially at extremely high sparsity. Code is included in the supplementary materials.
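A minimal sketch of linear CKA between two layers' features, the quantity a CKA-based sparsity regularizer would penalize; the weighting coefficient and the choice of layer pairs are assumptions, not the authors' exact formulation.

```python
import torch

def linear_cka(X: torch.Tensor, Y: torch.Tensor) -> torch.Tensor:
    """Linear CKA between feature matrices X (n, d1) and Y (n, d2)."""
    X = X - X.mean(dim=0, keepdim=True)
    Y = Y - Y.mean(dim=0, keepdim=True)
    hsic = (Y.T @ X).norm(p="fro") ** 2
    return hsic / ((X.T @ X).norm(p="fro") * (Y.T @ Y).norm(p="fro"))

def cka_sr_penalty(layer_feats, beta=1e-3):
    """Sum pairwise CKA across layers; add the result to the training loss."""
    penalty = 0.0
    for i in range(len(layer_feats)):
        for j in range(i + 1, len(layer_feats)):
            penalty = penalty + linear_cka(layer_feats[i].flatten(1),
                                           layer_feats[j].flatten(1))
    return beta * penalty
```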

Higher-order topological kernels via quantum computation

  • paper_url: http://arxiv.org/abs/2307.07383
  • repo_url: None
  • paper_authors: Massimiliano Incudini, Francesco Martini, Alessandra Di Pierro
  • for: This paper proposes a quantum approach to kernels based on topological data analysis (TDA), aimed at extracting meaningful insights from complex data.
  • methods: TDA embeds objects into a simplicial complex and extracts global properties such as Betti numbers (the numbers of multidimensional holes), which can be used to define kernel methods that integrate easily with existing machine learning algorithms; since computing higher-dimensional Betti numbers is prohibitively expensive on classical hardware, the proposed topological kernels are built from Betti curves, topological fingerprints of filtrations of increasing order, which quantum algorithms can approximate in polynomial time.
  • results: A working prototype implemented on a noiseless simulator, together with empirical results, suggests that topological approaches may offer an advantage in quantum machine learning.
    Abstract Topological data analysis (TDA) has emerged as a powerful tool for extracting meaningful insights from complex data. TDA enhances the analysis of objects by embedding them into a simplicial complex and extracting useful global properties such as the Betti numbers, i.e. the number of multidimensional holes, which can be used to define kernel methods that are easily integrated with existing machine-learning algorithms. These kernel methods have found broad applications, as they rely on powerful mathematical frameworks which provide theoretical guarantees on their performance. However, the computation of higher-dimensional Betti numbers can be prohibitively expensive on classical hardware, while quantum algorithms can approximate them in polynomial time in the instance size. In this work, we propose a quantum approach to defining topological kernels, which is based on constructing Betti curves, i.e. topological fingerprint of filtrations with increasing order. We exhibit a working prototype of our approach implemented on a noiseless simulator and show its robustness by means of some empirical results suggesting that topological approaches may offer an advantage in quantum machine learning.
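A classical sketch of how Betti curves could be turned into a kernel: the curves are assumed to be precomputed (by the quantum subroutine the paper proposes or by a classical TDA library), and an RBF kernel on the sampled curves stands in for the topological kernel; names and the RBF choice are illustrative.

```python
import numpy as np

def betti_curve_kernel(curves_a, curves_b, gamma=0.1):
    """RBF kernel between Betti curves sampled on a common filtration grid.

    curves_*: arrays of shape (n_samples, n_dims, n_filtration_steps),
    e.g. Betti-0 and Betti-1 curves stacked along the second axis.
    """
    A = curves_a.reshape(len(curves_a), -1)
    B = curves_b.reshape(len(curves_b), -1)
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq_dists)

# toy usage with random arrays standing in for Betti curves
rng = np.random.default_rng(0)
K = betti_curve_kernel(rng.random((5, 2, 20)), rng.random((3, 2, 20)))
print(K.shape)  # (5, 3)
```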

Composition-contrastive Learning for Sentence Embeddings

  • paper_url: http://arxiv.org/abs/2307.07380
  • repo_url: https://github.com/perceptiveshawty/compcse
  • paper_authors: Sachin J. Chanchani, Ruihong Huang
  • for: This work aims to improve vector representations of natural language for search applications.
  • methods: Contrastive learning is used, but instead of maximizing alignment between minimally-perturbed embeddings of the same text, the proposed objective maximizes alignment between texts and a composition of their phrasal constituents; several realizations of this objective are considered.
  • results: Experiments on semantic textual similarity tasks show improvements over baselines that are comparable with state-of-the-art approaches, without incurring costs in auxiliary training objectives or additional network parameters.
    Abstract Vector representations of natural language are ubiquitous in search applications. Recently, various methods based on contrastive learning have been proposed to learn textual representations from unlabelled data; by maximizing alignment between minimally-perturbed embeddings of the same text, and encouraging a uniform distribution of embeddings across a broader corpus. Differently, we propose maximizing alignment between texts and a composition of their phrasal constituents. We consider several realizations of this objective and elaborate the impact on representations in each case. Experimental results on semantic textual similarity tasks show improvements over baselines that are comparable with state-of-the-art approaches. Moreover, this work is the first to do so without incurring costs in auxiliary training objectives or additional network parameters.
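One possible reading of the composition objective as an InfoNCE-style loss in which each sentence embedding is aligned with an average of its phrase embeddings, with other sentences in the batch acting as negatives; the composition operator (mean pooling), the temperature, and the tensor layout are assumptions rather than the paper's exact recipe.

```python
import torch
import torch.nn.functional as F

def composition_contrastive_loss(sent_emb, phrase_emb, phrase_mask, tau=0.05):
    """sent_emb: (B, d) sentence embeddings.
    phrase_emb: (B, P, d) embeddings of up to P phrasal constituents per sentence.
    phrase_mask: (B, P) with 1 for real phrases and 0 for padding.
    """
    phrase_mask = phrase_mask.float()
    denom = phrase_mask.sum(dim=1, keepdim=True).clamp(min=1.0)
    composed = (phrase_emb * phrase_mask.unsqueeze(-1)).sum(dim=1) / denom  # (B, d)
    z1 = F.normalize(sent_emb, dim=-1)
    z2 = F.normalize(composed, dim=-1)
    logits = z1 @ z2.T / tau                       # (B, B) scaled cosine similarities
    targets = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, targets)
```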

Defect Classification in Additive Manufacturing Using CNN-Based Vision Processing

  • paper_url: http://arxiv.org/abs/2307.07378
  • repo_url: None
  • paper_authors: Xiao Liu, Alessandra Mileo, Alan F. Smeaton
  • for: Improving quality in the additive manufacturing (AM) process.
  • methods: Convolutional neural networks (CNNs) classify defects in AM image data, and active learning builds a human-in-the-loop mechanism that reduces the amount of data needed to train and to generate training data.
  • results: Accurate classification of defects in additive manufacturing with less training data.
    Abstract The development of computer vision and in-situ monitoring using visual sensors allows the collection of large datasets from the additive manufacturing (AM) process. Such datasets could be used with machine learning techniques to improve the quality of AM. This paper examines two scenarios: first, using convolutional neural networks (CNNs) to accurately classify defects in an image dataset from AM and second, applying active learning techniques to the developed classification model. This allows the construction of a human-in-the-loop mechanism to reduce the size of the data required to train and generate training data.

AIC-AB NET: A Neural Network for Image Captioning with Spatial Attention and Text Attributes

  • paper_url: http://arxiv.org/abs/2307.07370
  • repo_url: None
  • paper_authors: Guoyun Tu, Ying Liu, Vladimir Vlassov
  • for: This paper presents an attention-based image captioning model that combines spatial attention with text attributes to improve captioning accuracy.
  • methods: The proposed Attribute-Information-Combined Attention-Based Network (AIC-AB NET) uses adaptive spatial attention in an encoder-decoder to determine which image region best represents the image and whether to attend to the visual features or the visual sentinel, while text attribute information is fed into the decoder to aid image recognition and reduce uncertainty.
  • results: On MS COCO and a newly proposed Fashion dataset of single-object images, AIC-AB NET outperforms the state-of-the-art baseline and ablated models, improving the CIDEr score over the adaptive attention baseline by 0.017 on MS COCO and 0.095 on the Fashion dataset.
    Abstract Image captioning is a significant field across computer vision and natural language processing. We propose and present AIC-AB NET, a novel Attribute-Information-Combined Attention-Based Network that combines spatial attention architecture and text attributes in an encoder-decoder. For caption generation, adaptive spatial attention determines which image region best represents the image and whether to attend to the visual features or the visual sentinel. Text attribute information is synchronously fed into the decoder to help image recognition and reduce uncertainty. We have tested and evaluated our AICAB NET on the MS COCO dataset and a new proposed Fashion dataset. The Fashion dataset is employed as a benchmark of single-object images. The results show the superior performance of the proposed model compared to the state-of-the-art baseline and ablated models on both the images from MSCOCO and our single-object images. Our AIC-AB NET outperforms the baseline adaptive attention network by 0.017 (CIDEr score) on the MS COCO dataset and 0.095 (CIDEr score) on the Fashion dataset.

Source-Free Domain Adaptation with Temporal Imputation for Time Series Data

  • paper_url: http://arxiv.org/abs/2307.07542
  • repo_url: https://github.com/mohamedr002/mapu_sfda_ts
  • paper_authors: Mohamed Ragab, Emadeldeen Eldele, Min Wu, Chuan-Sheng Foo, Xiaoli Li, Zhenghua Chen
  • for: This work adapts a pretrained time-series model to an unlabeled target domain without access to the source-domain data, preserving source-domain privacy.
  • methods: The proposed MAsk and imPUte (MAPU) method randomly masks the time-series signals and trains a novel temporal imputer to recover the original signal in the embedding space; during adaptation, the imputer network guides the target model to produce features that are temporally consistent with the source features.
  • results: Extensive experiments on three real-world time-series datasets show that MAPU achieves significant performance gains over existing methods. Code is available at \url{https://github.com/mohamedr002/MAPU_SFDA_TS}.
    Abstract Source-free domain adaptation (SFDA) aims to adapt a pretrained model from a labeled source domain to an unlabeled target domain without access to the source domain data, preserving source domain privacy. Despite its prevalence in visual applications, SFDA is largely unexplored in time series applications. The existing SFDA methods that are mainly designed for visual applications may fail to handle the temporal dynamics in time series, leading to impaired adaptation performance. To address this challenge, this paper presents a simple yet effective approach for source-free domain adaptation on time series data, namely MAsk and imPUte (MAPU). First, to capture temporal information of the source domain, our method performs random masking on the time series signals while leveraging a novel temporal imputer to recover the original signal from a masked version in the embedding space. Second, in the adaptation step, the imputer network is leveraged to guide the target model to produce target features that are temporally consistent with the source features. To this end, our MAPU can explicitly account for temporal dependency during the adaptation while avoiding the imputation in the noisy input space. Our method is the first to handle temporal consistency in SFDA for time series data and can be seamlessly equipped with other existing SFDA methods. Extensive experiments conducted on three real-world time series datasets demonstrate that our MAPU achieves significant performance gain over existing methods. Our code is available at \url{https://github.com/mohamedr002/MAPU_SFDA_TS}.
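A minimal sketch of the masking step and an imputation loss in the embedding space, in the spirit of the method described above; the encoder and imputer modules, the masking ratio, and the use of a fixed (detached) target are simplifying assumptions.

```python
import torch
import torch.nn.functional as F

def random_mask(x, mask_ratio=0.5):
    """Zero out a random subset of time steps of a batch of signals x: (B, L, C)."""
    keep = (torch.rand(x.size(0), x.size(1), 1, device=x.device) > mask_ratio).float()
    return x * keep

def imputation_loss(encoder, imputer, x):
    """Recover the embedding of the full signal from the embedding of a masked copy."""
    with torch.no_grad():
        target = encoder(x)            # embedding of the original signal,
                                       # treated as a fixed target here for simplicity
    recovered = imputer(encoder(random_mask(x)))
    return F.mse_loss(recovered, target)
```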

Inverse Optimization for Routing Problems

  • paper_url: http://arxiv.org/abs/2307.07357
  • repo_url: https://github.com/pedroszattoni/amazon-challenge
  • paper_authors: Pedro Zattoni Scroccaro, Piet van Beek, Peyman Mohajerin Esfahani, Bilge Atasoy
  • for: This study aims to learn decision-makers' behavior in routing problems using Inverse Optimization (IO).
  • methods: An IO methodology with a hypothesis function, loss function, and stochastic first-order algorithm tailored to routing problems.
  • results: On the Amazon Last Mile Routing Research Challenge, where the goal is to replicate the routing preferences of human drivers from thousands of real-world routing examples, the final IO-learned routing model ranks 2nd among the 48 models that qualified for the final round, showcasing the flexibility and real-world potential of the IO methodology.
    Abstract We propose a method for learning decision-makers' behavior in routing problems using Inverse Optimization (IO). The IO framework falls into the supervised learning category and builds on the premise that the target behavior is an optimizer of an unknown cost function. This cost function is to be learned through historical data, and in the context of routing problems, can be interpreted as the routing preferences of the decision-makers. In this view, the main contributions of this study are to propose an IO methodology with a hypothesis function, loss function, and stochastic first-order algorithm tailored to routing problems. We further test our IO approach in the Amazon Last Mile Routing Research Challenge, where the goal is to learn models that replicate the routing preferences of human drivers, using thousands of real-world routing examples. Our final IO-learned routing model achieves a score that ranks 2nd compared with the 48 models that qualified for the final round of the challenge. Our results showcase the flexibility and real-world potential of the proposed IO methodology to learn from decision-makers' decisions in routing problems.

On the Sublinear Regret of GP-UCB

  • paper_url: http://arxiv.org/abs/2307.07539
  • repo_url: None
  • paper_authors: Justin Whitehouse, Zhiwei Steven Wu, Aaditya Ramdas
  • for: In the kernelized bandit problem, a learner sequentially seeks the optimum of a function lying in a reproducing kernel Hilbert space given only noisy evaluations at chosen points, aiming to minimize regret, a measure of the suboptimality of its choices.
  • methods: The paper analyzes the Gaussian Process Upper Confidence Bound (GP-UCB) algorithm, which acts based on a simple linear estimator of the unknown function; the key technical contribution is regularizing kernel ridge estimators in proportion to the smoothness of the underlying kernel.
  • results: The paper resolves a long-standing open question by showing that GP-UCB enjoys nearly optimal regret; in particular, it obtains sublinear regret rates for the Matérn kernel, improving over state-of-the-art analyses and partially resolving a COLT open problem posed by Vakili et al.
    Abstract In the kernelized bandit problem, a learner aims to sequentially compute the optimum of a function lying in a reproducing kernel Hilbert space given only noisy evaluations at sequentially chosen points. In particular, the learner aims to minimize regret, which is a measure of the suboptimality of the choices made. Arguably the most popular algorithm is the Gaussian Process Upper Confidence Bound (GP-UCB) algorithm, which involves acting based on a simple linear estimator of the unknown function. Despite its popularity, existing analyses of GP-UCB give a suboptimal regret rate, which fails to be sublinear for many commonly used kernels such as the Mat\'ern kernel. This has led to a longstanding open question: are existing regret analyses for GP-UCB tight, or can bounds be improved by using more sophisticated analytical techniques? In this work, we resolve this open question and show that GP-UCB enjoys nearly optimal regret. In particular, our results yield sublinear regret rates for the Mat\'ern kernel, improving over the state-of-the-art analyses and partially resolving a COLT open problem posed by Vakili et al. Our improvements rely on a key technical contribution -- regularizing kernel ridge estimators in proportion to the smoothness of the underlying kernel $k$. Applying this key idea together with a largely overlooked concentration result in separable Hilbert spaces (for which we provide an independent, simplified derivation), we are able to provide a tighter analysis of the GP-UCB algorithm.
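A minimal kernel-ridge-based GP-UCB step for reference; the RBF kernel, the confidence width beta, and the regularization constant (which the paper ties to the smoothness of the kernel) are placeholders.

```python
import numpy as np

def rbf(A, B, ls=0.5):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * ls ** 2))

def gp_ucb_next_point(X, y, candidates, reg=1e-2, beta=2.0):
    """Pick the candidate maximizing mean + sqrt(beta) * std of a kernel ridge posterior.

    X: (n, d) observed inputs, y: (n,) noisy values, candidates: (m, d)."""
    K_inv = np.linalg.inv(rbf(X, X) + reg * np.eye(len(X)))
    Ks = rbf(candidates, X)                                   # (m, n)
    mean = Ks @ K_inv @ y
    var = 1.0 - np.einsum("ij,jk,ik->i", Ks, K_inv, Ks)       # k(x, x) = 1 for RBF
    ucb = mean + np.sqrt(beta) * np.sqrt(np.clip(var, 0, None))
    return candidates[np.argmax(ucb)]
```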

A testing-based approach to assess the clusterability of categorical data

  • paper_url: http://arxiv.org/abs/2307.07346
  • repo_url: https://github.com/hulianyu/TestCat
  • paper_authors: Lianyu Hu, Junjie Dong, Mudi Jiang, Yan Liu, Zengyou He
  • for: This work evaluates the clusterability of categorical data, a crucial yet often-overlooked issue in cluster analysis.
  • methods: A testing-based approach, TestCat, assesses the clusterability of categorical data in terms of an analytical p-value, using the sum of chi-squared statistics over all attribute pairs as the test statistic (clusterable categorical data possess many strongly correlated attribute pairs).
  • results: On a set of benchmark categorical data sets, TestCat outperforms solutions based on existing clusterability evaluation methods for numerical data, recognizing the clusterability of categorical data in a statistically sound manner.
    Abstract The objective of clusterability evaluation is to check whether a clustering structure exists within the data set. As a crucial yet often-overlooked issue in cluster analysis, it is essential to conduct such a test before applying any clustering algorithm. If a data set is unclusterable, any subsequent clustering analysis would not yield valid results. Despite its importance, the majority of existing studies focus on numerical data, leaving the clusterability evaluation issue for categorical data as an open problem. Here we present TestCat, a testing-based approach to assess the clusterability of categorical data in terms of an analytical $p$-value. The key idea underlying TestCat is that clusterable categorical data possess many strongly correlated attribute pairs and hence the sum of chi-squared statistics of all attribute pairs is employed as the test statistic for $p$-value calculation. We apply our method to a set of benchmark categorical data sets, showing that TestCat outperforms those solutions based on existing clusterability evaluation methods for numeric data. To the best of our knowledge, our work provides the first way to effectively recognize the clusterability of categorical data in a statistically sound manner.
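A rough sketch of the test statistic described in the abstract: summing chi-squared statistics over all attribute pairs of a categorical table. Converting the sum to a p-value via the summed degrees of freedom is an independence approximation and not necessarily how TestCat calibrates its p-value.

```python
import numpy as np
import pandas as pd
from itertools import combinations
from scipy.stats import chi2, chi2_contingency

def pairwise_chi2_pvalue(df: pd.DataFrame) -> float:
    """Sum chi-squared statistics over all attribute pairs and convert to a p-value
    using the summed degrees of freedom (small p-value suggests clusterability)."""
    total_stat, total_dof = 0.0, 0
    for a, b in combinations(df.columns, 2):
        stat, _, dof, _ = chi2_contingency(pd.crosstab(df[a], df[b]))
        total_stat += stat
        total_dof += dof
    return chi2.sf(total_stat, total_dof)

# toy usage with random (hence unclusterable) categorical data
rng = np.random.default_rng(0)
toy = pd.DataFrame(rng.integers(0, 3, size=(200, 4)), columns=list("abcd"))
print(pairwise_chi2_pvalue(toy))
```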

Inverse Evolution Layers: Physics-informed Regularizers for Deep Neural Networks

  • paper_url: http://arxiv.org/abs/2307.07344
  • repo_url: None
  • paper_authors: Chaoyu Liu, Zhonghua Qiao, Chao Li, Carola-Bibiane Schönlieb
  • for: This paper proposes a new type of regularization that integrates partial differential equation (PDE)-based evolution models into neural networks.
  • methods: Inverse evolution layers (IELs), built from evolution equations, achieve specific regularization objectives and endow network outputs with the corresponding properties of the physical evolution models; they are straightforward to construct, easy to design for various evolutions and networks, and provide intuitive, mathematical interpretability.
  • results: Heat-diffusion IELs, designed to endow semantic segmentation models with a smoothness property, effectively mitigate the overfitting caused by noisy labels.
    Abstract This paper proposes a novel approach to integrating partial differential equation (PDE)-based evolution models into neural networks through a new type of regularization. Specifically, we propose inverse evolution layers (IELs) based on evolution equations. These layers can achieve specific regularization objectives and endow neural networks' outputs with corresponding properties of the evolution models. Moreover, IELs are straightforward to construct and implement, and can be easily designed for various physical evolutions and neural networks. Additionally, the design process for these layers can provide neural networks with intuitive and mathematical interpretability, thus enhancing the transparency and explainability of the approach. To demonstrate the effectiveness, efficiency, and simplicity of our approach, we present an example of endowing semantic segmentation models with the smoothness property based on the heat diffusion model. To achieve this goal, we design heat-diffusion IELs and apply them to address the challenge of semantic segmentation with noisy labels. The experimental results demonstrate that the heat-diffusion IELs can effectively mitigate the overfitting problem caused by noisy labels.
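An illustrative smoothness regularizer in the spirit of the heat-diffusion example: penalizing the discrete Laplacian of the predicted class probabilities (one explicit diffusion residual). The paper's inverse evolution layers are constructed differently, so this is only a sketch of the regularization goal, with an assumed penalty weight.

```python
import torch
import torch.nn.functional as F

# 5-point discrete Laplacian stencil, applied channel-wise
_LAPLACIAN = torch.tensor([[0., 1., 0.],
                           [1., -4., 1.],
                           [0., 1., 0.]]).view(1, 1, 3, 3)

def smoothness_penalty(logits: torch.Tensor, weight: float = 0.1) -> torch.Tensor:
    """Penalize non-smooth segmentation maps: mean squared discrete Laplacian
    of the per-class probabilities for logits of shape (B, C, H, W)."""
    probs = logits.softmax(dim=1)
    c = probs.size(1)
    lap = F.conv2d(probs, _LAPLACIAN.to(probs).repeat(c, 1, 1, 1),
                   padding=1, groups=c)
    return weight * lap.pow(2).mean()
```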

MaxMin-L2-SVC-NCH: A Novel Approach for Support Vector Classifier Training and Parameter Selection

  • paper_url: http://arxiv.org/abs/2307.07343
  • repo_url: None
  • paper_authors: Linkai Luo, Qiaoling Yang, Hong Peng, Yiding Wang, Ziyang Chen
  • for: Improving the efficiency of support vector classification (SVC) by avoiding the time-consuming k-fold cross validation with grid search (CV) for Gaussian kernel parameter selection.
  • methods: Training and parameter selection are formulated as a minimax optimization problem, MaxMin-L2-SVC-NCH: the minimization finds the closest points between two normal convex hulls (L2-SVC-NCH), solved with a projected gradient algorithm (PGA), while the maximization finds the optimal Gaussian kernel parameters via gradient ascent with a dynamic learning rate.
  • results: Comparative experiments on public datasets show that MaxMin-L2-SVC-NCH greatly reduces the number of models to be trained while maintaining competitive test accuracy, making it a better choice for SVC tasks.
    Abstract The selection of Gaussian kernel parameters plays an important role in the applications of support vector classification (SVC). A commonly used method is the k-fold cross validation with grid search (CV), which is extremely time-consuming because it needs to train a large number of SVC models. In this paper, a new approach is proposed to train SVC and optimize the selection of Gaussian kernel parameters. We first formulate the training and parameter selection of SVC as a minimax optimization problem named as MaxMin-L2-SVC-NCH, in which the minimization problem is an optimization problem of finding the closest points between two normal convex hulls (L2-SVC-NCH) while the maximization problem is an optimization problem of finding the optimal Gaussian kernel parameters. A lower time complexity can be expected in MaxMin-L2-SVC-NCH because CV is not needed. We then propose a projected gradient algorithm (PGA) for training L2-SVC-NCH. The famous sequential minimal optimization (SMO) algorithm is a special case of the PGA. Thus, the PGA can provide more flexibility than the SMO. Furthermore, the solution of the maximization problem is done by a gradient ascent algorithm with dynamic learning rate. The comparative experiments between MaxMin-L2-SVC-NCH and the previous best approaches on public datasets show that MaxMin-L2-SVC-NCH greatly reduces the number of models to be trained while maintaining competitive test accuracy. These findings indicate that MaxMin-L2-SVC-NCH is a better choice for SVC tasks.
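A minimal sketch of the geometric subproblem behind L2-SVC-NCH: projected gradient descent for the closest points between the convex hulls of two point sets, using a standard Euclidean projection onto the simplex. The soft-margin (L2) term and the kernelized form used in the paper are omitted; step size and iteration count are placeholders.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1
    rho = np.nonzero(u - css / (np.arange(len(v)) + 1) > 0)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1), 0)

def closest_hull_points(X_pos, X_neg, lr=0.01, iters=2000):
    """Minimize ||X_pos^T a - X_neg^T b||^2 over simplex weights a and b."""
    a = np.full(len(X_pos), 1 / len(X_pos))
    b = np.full(len(X_neg), 1 / len(X_neg))
    for _ in range(iters):
        diff = X_pos.T @ a - X_neg.T @ b          # vector between the two hull points
        a = project_simplex(a - lr * (X_pos @ diff))
        b = project_simplex(b + lr * (X_neg @ diff))
    return X_pos.T @ a, X_neg.T @ b
```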

Time for aCTIon: Automated Analysis of Cyber Threat Intelligence in the Wild

  • paper_url: http://arxiv.org/abs/2307.10214
  • repo_url: None
  • paper_authors: Giuseppe Siracusano, Davide Sanvito, Roberto Gonzalez, Manikantan Srinivasan, Sivakaman Kamatchi, Wataru Takahashi, Masaru Kawakita, Takahiro Kakumaru, Roberto Bifulco
  • for: This paper provides a large open benchmark dataset and an automated structured cyber threat intelligence (CTI) extraction tool based on large language models, helping organizations assess risks and enhance security.
  • methods: The tool, aCTIon, uses two custom information extraction pipelines that leverage recently introduced large language models (GPT3.5); the dataset contains 204 real-world reports with their structured CTI in STIX format, curated by three independent groups of CTI analysts.
  • results: aCTIon outperforms 10 solutions from previous work for structured CTI extraction, with F1-score improvements ranging from 10 to 50 percentage points across all tasks.
    Abstract Cyber Threat Intelligence (CTI) plays a crucial role in assessing risks and enhancing security for organizations. However, the process of extracting relevant information from unstructured text sources can be expensive and time-consuming. Our empirical experience shows that existing tools for automated structured CTI extraction have performance limitations. Furthermore, the community lacks a common benchmark to quantitatively assess their performance. We fill these gaps providing a new large open benchmark dataset and aCTIon, a structured CTI information extraction tool. The dataset includes 204 real-world publicly available reports and their corresponding structured CTI information in STIX format. Our team curated the dataset involving three independent groups of CTI analysts working over the course of several months. To the best of our knowledge, this dataset is two orders of magnitude larger than previously released open source datasets. We then design aCTIon, leveraging recently introduced large language models (GPT3.5) in the context of two custom information extraction pipelines. We compare our method with 10 solutions presented in previous work, for which we develop our own implementations when open-source implementations were lacking. Our results show that aCTIon outperforms previous work for structured CTI extraction with an improvement of the F1-score from 10%points to 50%points across all tasks.

How Different Is Stereotypical Bias Across Languages?

  • paper_url: http://arxiv.org/abs/2307.07331
  • repo_url: https://github.com/slds-lmu/stereotypes-multi
  • paper_authors: Ibrahim Tolga Öztürk, Rostislav Nedelchev, Christian Heumann, Esteban Garces Arias, Marius Roger, Bernd Bischl, Matthias Aßenmacher
  • for: This study investigates stereotypical bias in pretrained language models, extending prior English-only work to mono- and multilingual models of different underlying architectures across multiple languages.
  • methods: The English StereoSet dataset (Nadeem et al., 2021) is semi-automatically translated into German, French, Spanish, and Turkish, and the models are evaluated on all of them.
  • results: The multilingual setting proves important, revealing a much more nuanced picture and notable differences from the English-only analysis: mGPT-2 (partly) shows surprising anti-stereotypical behavior across languages, English (monolingual) models exhibit the strongest bias, and the stereotypes reflected in the dataset are least present in Turkish models. The codebase, the translated datasets, and practical guidelines for the semi-automatic translation are released to encourage extensions to other languages.
    Abstract Recent studies have demonstrated how to assess the stereotypical bias in pre-trained English language models. In this work, we extend this branch of research in multiple different dimensions by systematically investigating (a) mono- and multilingual models of (b) different underlying architectures with respect to their bias in (c) multiple different languages. To that end, we make use of the English StereoSet data set (Nadeem et al., 2021), which we semi-automatically translate into German, French, Spanish, and Turkish. We find that it is of major importance to conduct this type of analysis in a multilingual setting, as our experiments show a much more nuanced picture as well as notable differences from the English-only analysis. The main takeaways from our analysis are that mGPT-2 (partly) shows surprising anti-stereotypical behavior across languages, English (monolingual) models exhibit the strongest bias, and the stereotypes reflected in the data set are least present in Turkish models. Finally, we release our codebase alongside the translated data sets and practical guidelines for the semi-automatic translation to encourage a further extension of our work to other languages.

Boosting Backdoor Attack with A Learnable Poisoning Sample Selection Strategy

  • paper_url: http://arxiv.org/abs/2307.07328
  • repo_url: None
  • paper_authors: Zihao Zhu, Mingda Zhang, Shaokui Wei, Li Shen, Yanbo Fan, Baoyuan Wu
  • for: Boosting backdoor attacks that insert a backdoor into a model by manipulating its training set, without controlling the training process of the target model.
  • methods: A learnable poisoning sample selection strategy: a poisoning mask is introduced into the regular backdoor training loss and learned jointly with the model parameters through a min-max optimization, so that hard poisoning samples with high contribution to backdoor injection are selected.
  • results: Extensive experiments on benchmark datasets demonstrate improved backdoor attack performance with high efficiency.
    Abstract Data-poisoning based backdoor attacks aim to insert backdoor into models by manipulating training datasets without controlling the training process of the target model. Existing attack methods mainly focus on designing triggers or fusion strategies between triggers and benign samples. However, they often randomly select samples to be poisoned, disregarding the varying importance of each poisoning sample in terms of backdoor injection. A recent selection strategy filters a fixed-size poisoning sample pool by recording forgetting events, but it fails to consider the remaining samples outside the pool from a global perspective. Moreover, computing forgetting events requires significant additional computing resources. Therefore, how to efficiently and effectively select poisoning samples from the entire dataset is an urgent problem in backdoor attacks. To address it, firstly, we introduce a poisoning mask into the regular backdoor training loss. We suppose that a backdoored model training with hard poisoning samples has a more backdoor effect on easy ones, which can be implemented by hindering the normal training process (i.e., maximizing loss w.r.t. the mask). To further integrate it with normal training process, we then propose a learnable poisoning sample selection strategy to learn the mask together with the model parameters through a min-max optimization. Specifically, the outer loop aims to achieve the backdoor attack goal by minimizing the loss based on the selected samples, while the inner loop selects hard poisoning samples that impede this goal by maximizing the loss. After several rounds of adversarial training, we finally select effective poisoning samples with high contribution. Extensive experiments on benchmark datasets demonstrate the effectiveness and efficiency of our approach in boosting backdoor attack performance.

On the Sensitivity of Deep Load Disaggregation to Adversarial Attacks

  • paper_url: http://arxiv.org/abs/2307.10209
  • repo_url: None
  • paper_authors: Hafsa Bousbiat, Yassine Himeur, Abbes Amira, Wathiq Mansoor
  • for: This paper is written to investigate the vulnerability of deep neural network-based non-intrusive load monitoring (NILM) algorithms to adversarial attacks, and to provide evidence for the potential impact of these attacks on energy management systems.
  • methods: The paper uses two commonly employed CNN-based NILM baselines, the Sequence-to-Sequence (S2S) and Sequence-to-Point (S2P) models, and applies an adversarial attack called the Fast Gradient Sign Method (FGSM) to perturb the input sequences fed into these models.
  • results: The paper finds that both NILM baselines are vulnerable to adversarial attacks, with the S2P model exhibiting a significant decline in the F1-score (an average of 20%) even with small amounts of noise. This suggests that these models may not be reliable for energy management systems in residential and industrial sectors.
    Abstract Non-intrusive Load Monitoring (NILM) algorithms, commonly referred to as load disaggregation algorithms, are fundamental tools for effective energy management. Despite the success of deep models in load disaggregation, they face various challenges, particularly those pertaining to privacy and security. This paper investigates the sensitivity of prominent deep NILM baselines to adversarial attacks, which have proven to be a significant threat in domains such as computer vision and speech recognition. Adversarial attacks entail the introduction of imperceptible noise into the input data with the aim of misleading the neural network into generating erroneous outputs. We investigate the Fast Gradient Sign Method (FGSM), a well-known adversarial attack, to perturb the input sequences fed into two commonly employed CNN-based NILM baselines: the Sequence-to-Sequence (S2S) and Sequence-to-Point (S2P) models. Our findings provide compelling evidence for the vulnerability of these models, particularly the S2P model which exhibits an average decline of 20\% in the F1-score even with small amounts of noise. Such weakness has the potential to generate profound implications for energy management systems in residential and industrial sectors reliant on NILM models.
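A standard FGSM perturbation of an input batch, the attack used in the evaluation above; the model, loss function, and epsilon are placeholders.

```python
import torch

def fgsm_perturb(model, x, y, loss_fn, eps=0.01):
    """Return x + eps * sign(grad_x loss): the Fast Gradient Sign Method perturbation."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = loss_fn(model(x_adv), y)
    loss.backward()
    return (x_adv + eps * x_adv.grad.sign()).detach()
```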

Representation Learning With Hidden Unit Clustering For Low Resource Speech Applications

  • paper_url: http://arxiv.org/abs/2307.07325
  • repo_url: None
  • paper_authors: Varun Krishna, Tarun Sai, Sriram Ganapathy
  • for: This paper proposes a method for learning speech representations from raw audio without textual resources, targeting low-resource speech applications.
  • methods: A hidden unit clustering (HUC) framework: windowed audio samples are processed with 1-D convolutional layers to produce time-frequency representations, long short-term memory (LSTM) layers generate a contextual vector representation for each windowed segment, and the HUC framework categorizes these representations into a small number of phoneme-like units, with targets generated by an iterative k-means algorithm.
  • results: State-of-the-art results on various tasks of the ZeroSpeech 2021 challenge; on semi-supervised automatic speech recognition (ASR) with the TIMIT dataset and the GramVaani challenge Hindi dataset, the HUC representations improve significantly over established benchmarks based on Wav2vec, HuBERT, and Best-RQ.
    Abstract The representation learning of speech, without textual resources, is an area of significant interest for many low resource speech applications. In this paper, we describe an approach to self-supervised representation learning from raw audio using a hidden unit clustering (HUC) framework. The input to the model consists of audio samples that are windowed and processed with 1-D convolutional layers. The learned "time-frequency" representations from the convolutional neural network (CNN) module are further processed with long short term memory (LSTM) layers which generate a contextual vector representation for every windowed segment. The HUC framework, allowing the categorization of the representations into a small number of phoneme-like units, is used to train the model for learning semantically rich speech representations. The targets consist of phoneme-like pseudo labels for each audio segment and these are generated with an iterative k-means algorithm. We explore techniques that improve the speaker invariance of the learned representations and illustrate the effectiveness of the proposed approach on two settings, i) completely unsupervised speech applications on the sub-tasks described as part of the ZeroSpeech 2021 challenge and ii) semi-supervised automatic speech recognition (ASR) applications on the TIMIT dataset and on the GramVaani challenge Hindi dataset. In these experiments, we achieve state-of-art results for various ZeroSpeech tasks. Further, on the ASR experiments, the HUC representations are shown to improve significantly over other established benchmarks based on Wav2vec, HuBERT and Best-RQ.
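A minimal sketch of generating phoneme-like pseudo-labels by k-means over contextual segment embeddings, which is what the HUC targets described above amount to; the number of units and the random embeddings standing in for CNN+LSTM outputs are placeholders.

```python
import numpy as np
from sklearn.cluster import KMeans

def hidden_unit_pseudo_labels(segment_embeddings: np.ndarray, n_units: int = 50):
    """Cluster contextual segment embeddings (n_segments, dim) into a small
    number of phoneme-like units and return one pseudo-label per segment."""
    km = KMeans(n_clusters=n_units, n_init=10, random_state=0)
    return km.fit_predict(segment_embeddings)

# toy usage with random embeddings standing in for CNN+LSTM outputs
labels = hidden_unit_pseudo_labels(np.random.default_rng(0).random((1000, 256)))
print(np.bincount(labels)[:10])
```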

A Context-Aware Cutting Plane Selection Algorithm for Mixed-Integer Programming

  • paper_url: http://arxiv.org/abs/2307.07322
  • repo_url: https://github.com/opt-mucca/context-aware-cut-selection
  • paper_authors: Mark Turner, Timo Berthold, Mathieu Besançon
  • for: Improves the cut selection algorithm used in mixed-integer programming solvers.
  • methods: Proposes new cut scoring measures, cut filtering techniques, and stopping criteria that extend the current state-of-the-art cut selection algorithm, implemented in SCIP.
  • results: Obtains a 5% performance improvement for SCIP over the MIPLIB 2017 benchmark set.
    Abstract The current cut selection algorithm used in mixed-integer programming solvers has remained largely unchanged since its creation. In this paper, we propose a set of new cut scoring measures, cut filtering techniques, and stopping criteria, extending the current state-of-the-art algorithm and obtaining a 5\% performance improvement for SCIP over the MIPLIB 2017 benchmark set.
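For context, cut selection rules typically combine a few standard scores; the sketch below shows two classical quantities (efficacy and a parallelism measure used for filtering near-duplicate cuts). The paper's new scoring measures, filtering techniques, and stopping criteria are its own contribution and are not reproduced here.

```python
import numpy as np

def efficacy(a, b, x_lp):
    """Euclidean distance by which the LP solution x_lp violates the cut a^T x <= b."""
    return (a @ x_lp - b) / np.linalg.norm(a)

def parallelism(a1, a2):
    """Cosine of the angle between two cut normals; values near 1 flag redundant cuts."""
    return abs(a1 @ a2) / (np.linalg.norm(a1) * np.linalg.norm(a2))
```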

Adaptive Linear Estimating Equations

  • paper_url: http://arxiv.org/abs/2307.07320
  • repo_url: https://github.com/mufangying/ALEE
  • paper_authors: Mufang Ying, Koulik Khamaru, Cun-Hui Zhang
  • for: Addresses the statistical inference difficulties introduced by sequential (adaptive) data collection, such as the non-normal asymptotic behavior of the OLS estimator in adaptive linear regression.
  • methods: Proposes a debiased estimator based on adaptive linear estimating equations and establishes theoretical guarantees for it.
  • results: The estimator attains asymptotic normality with near-optimal asymptotic variance, and in the multi-armed bandit setting it retains the non-asymptotic performance of the least squares estimator while gaining the asymptotic normality property.
    Abstract Sequential data collection has emerged as a widely adopted technique for enhancing the efficiency of data gathering processes. Despite its advantages, such data collection mechanism often introduces complexities to the statistical inference procedure. For instance, the ordinary least squares (OLS) estimator in an adaptive linear regression model can exhibit non-normal asymptotic behavior, posing challenges for accurate inference and interpretation. In this paper, we propose a general method for constructing debiased estimator which remedies this issue. It makes use of the idea of adaptive linear estimating equations, and we establish theoretical guarantees of asymptotic normality, supplemented by discussions on achieving near-optimal asymptotic variance. A salient feature of our estimator is that in the context of multi-armed bandits, our estimator retains the non-asymptotic performance of the least square estimator while obtaining asymptotic normality property. Consequently, this work helps connect two fruitful paradigms of adaptive inference: a) non-asymptotic inference using concentration inequalities and b) asymptotic inference via asymptotic normality.

Scalable Deep Learning for RNA Secondary Structure Prediction

  • paper_url: http://arxiv.org/abs/2307.10073
  • repo_url: https://github.com/automl/rnaformer
  • paper_authors: Jörg K. H. Franke, Frederic Runge, Frank Hutter
  • for: Proposes a deep learning model for RNA secondary structure prediction.
  • methods: The RNAformer uses axial attention and recycling in the latent space, models the adjacency matrix directly in the latent space, and scales up the model size to improve performance.
  • results: Achieves state-of-the-art performance on the TS0 benchmark dataset, outperforming even methods that use external information; experiments further show that the RNAformer can learn a biophysical model of the RNA folding process.
    Abstract The field of RNA secondary structure prediction has made significant progress with the adoption of deep learning techniques. In this work, we present the RNAformer, a lean deep learning model using axial attention and recycling in the latent space. We gain performance improvements by designing the architecture for modeling the adjacency matrix directly in the latent space and by scaling the size of the model. Our approach achieves state-of-the-art performance on the popular TS0 benchmark dataset and even outperforms methods that use external information. Further, we show experimentally that the RNAformer can learn a biophysical model of the RNA folding process.

  • paper_url: http://arxiv.org/abs/2307.07317
  • repo_url: None
  • paper_authors: Cedric Waterschoot, Antal van den Bosch
  • for: Supports content moderation in online news outlets by helping moderators choose featured posts from user-generated comments, a time-consuming task.
  • methods: A recommender system based on ranking class probabilities, combining user and textual content features.
  • results: Adding text features yields the best classification performance (F1-score of 0.44 on the test set, mean NDCG@5 of 0.87 on a large set of validation articles), and expert content moderators assessing the recommendations on a random selection of articles produced an NDCG score of 0.83.
    Abstract Online news outlets are grappling with the moderation of user-generated content within their comment section. We present a recommender system based on ranking class probabilities to support and empower the moderator in choosing featured posts, a time-consuming task. By combining user and textual content features we obtain an optimal classification F1-score of 0.44 on the test set. Furthermore, we observe an optimum mean NDCG@5 of 0.87 on a large set of validation articles. As an expert evaluation, content moderators assessed the output of a random selection of articles by choosing comments to feature based on the recommendations, which resulted in a NDCG score of 0.83. We conclude that first, adding text features yields the best score and second, while choosing featured content remains somewhat subjective, content moderators found suitable comments in all but one evaluated recommendations. We end the paper by analyzing our best-performing model, a step towards transparency and explainability in hybrid content moderation.

HEAL-SWIN: A Vision Transformer On The Sphere

  • paper_url: http://arxiv.org/abs/2307.07313
  • repo_url: https://github.com/janegerken/heal-swin
  • paper_authors: Oscar Carlsson, Jan E. Gerken, Hampus Linander, Heiner Spieß, Fredrik Ohlsson, Christoffer Petersson, Daniel Persson
  • for: High-resolution wide-angle fisheye images are increasingly important for robotics applications such as autonomous driving, but ordinary CNNs or vision transformers struggle on this data because of the projection and distortion losses introduced when mapping onto a planar grid.
  • methods: Introduces the HEAL-SWIN transformer, which combines the highly uniform Hierarchical Equal Area iso-Latitude Pixelation (HEALPix) grid used in astrophysics and cosmology with the Hierarchical Shifted-Window (SWIN) transformer, yielding an efficient and flexible model that trains on high-resolution, distortion-free spherical data; the nested structure of the HEALPix grid is used for the patching and windowing operations, giving a one-dimensional representation with minimal computational overhead.
  • results: Demonstrates superior performance on semantic segmentation and depth regression tasks on both synthetic and real automotive datasets. Code is available at https://github.com/JanEGerken/HEAL-SWIN.
    Abstract High-resolution wide-angle fisheye images are becoming more and more important for robotics applications such as autonomous driving. However, using ordinary convolutional neural networks or vision transformers on this data is problematic due to projection and distortion losses introduced when projecting to a rectangular grid on the plane. We introduce the HEAL-SWIN transformer, which combines the highly uniform Hierarchical Equal Area iso-Latitude Pixelation (HEALPix) grid used in astrophysics and cosmology with the Hierarchical Shifted-Window (SWIN) transformer to yield an efficient and flexible model capable of training on high-resolution, distortion-free spherical data. In HEAL-SWIN, the nested structure of the HEALPix grid is used to perform the patching and windowing operations of the SWIN transformer, resulting in a one-dimensional representation of the spherical data with minimal computational overhead. We demonstrate the superior performance of our model for semantic segmentation and depth regression tasks on both synthetic and real automotive datasets. Our code is available at https://github.com/JanEGerken/HEAL-SWIN.

Solving higher-order Lane-Emden-Fowler type equations using physics-informed neural networks: benchmark tests comparing soft and hard constraints

  • paper_url: http://arxiv.org/abs/2307.07302
  • repo_url: https://github.com/hubertbaty/pinns-lebis
  • paper_authors: Hubert Baty
  • for: Solves higher-order ordinary differential equations (ODEs) numerically, focusing on singular Lane-Emden-Fowler type equations.
  • methods: Applies physics-informed neural networks (PINNs) to different classes of singular ODEs: the second-order Lane-Emden equations, third-order Emden-Fowler equations, and fourth-order Lane-Emden-Fowler equations.
  • results: Compares two PINN variants: one that minimizes a total loss combining a weighted equation-residual (physics) term with a training-data loss containing the initial/boundary conditions (soft constraints), and one that builds those conditions into the trial solution as hard constraints; the advantages and drawbacks of each variant are highlighted.
    Abstract In this paper, numerical methods using Physics-Informed Neural Networks (PINNs) are presented with the aim to solve higher-order ordinary differential equations (ODEs). Indeed, this deep-learning technique is successfully applied for solving different classes of singular ODEs, namely the well known second-order Lane-Emden equations, third order-order Emden-Fowler equations, and fourth-order Lane-Emden-Fowler equations. Two variants of PINNs technique are considered and compared. First, a minimization procedure is used to constrain the total loss function of the neural network, in which the equation residual is considered with some weight to form a physics-based loss and added to the training data loss that contains the initial/boundary conditions. Second, a specific choice of trial solutions ensuring these conditions as hard constraints is done in order to satisfy the differential equation, contrary to the first variant based on training data where the constraints appear as soft ones. Advantages and drawbacks of PINNs variants are highlighted.
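A minimal soft-constraint PINN for the second-order Lane-Emden equation $y'' + \frac{2}{x}\,y' + y^n = 0$, $y(0)=1$, $y'(0)=0$ might look as follows; the network size, collocation grid, and loss weighting are illustrative choices rather than the paper's settings, and the hard-constraint variant (building the initial conditions into the trial solution) is omitted.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Sequential(nn.Linear(1, 32), nn.Tanh(),
                    nn.Linear(32, 32), nn.Tanh(),
                    nn.Linear(32, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
n_poly = 3                                            # polytropic index

x = torch.linspace(0.01, 6.0, 200).reshape(-1, 1).requires_grad_(True)

for step in range(5000):
    y = net(x)
    dy = torch.autograd.grad(y, x, torch.ones_like(y), create_graph=True)[0]
    d2y = torch.autograd.grad(dy, x, torch.ones_like(dy), create_graph=True)[0]
    residual = d2y + 2.0 / x * dy + y ** n_poly       # equation residual (soft constraint)

    x0 = torch.full((1, 1), 0.01, requires_grad=True) # initial conditions near the origin
    y0 = net(x0)
    dy0 = torch.autograd.grad(y0, x0, torch.ones_like(y0), create_graph=True)[0]

    loss = (residual ** 2).mean() + ((y0 - 1.0) ** 2).sum() + (dy0 ** 2).sum()
    opt.zero_grad()
    loss.backward()
    opt.step()
```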

Similarity-based Memory Enhanced Joint Entity and Relation Extraction

  • paper_url: http://arxiv.org/abs/2307.11762
  • repo_url: https://github.com/kosciukiewicz/similarity_based_memory_re
  • paper_authors: Witold Kosciukiewicz, Mateusz Wojcik, Tomasz Kajdanowicz, Adam Gonczarek
  • for: joint entity and relation extraction
  • methods: bidirectional memory-like dependency between tasks
  • results: outperforms existing methods, achieves state-of-the-art results on BioCreative V CDR corpus
    Abstract Document-level joint entity and relation extraction is a challenging information extraction problem that requires a unified approach where a single neural network performs four sub-tasks: mention detection, coreference resolution, entity classification, and relation extraction. Existing methods often utilize a sequential multi-task learning approach, in which the arbitral decomposition causes the current task to depend only on the previous one, missing the possible existence of the more complex relationships between them. In this paper, we present a multi-task learning framework with bidirectional memory-like dependency between tasks to address those drawbacks and perform the joint problem more accurately. Our empirical studies show that the proposed approach outperforms the existing methods and achieves state-of-the-art results on the BioCreative V CDR corpus.

3D Shape-Based Myocardial Infarction Prediction Using Point Cloud Classification Networks

  • paper_url: http://arxiv.org/abs/2307.07298
  • repo_url: None
  • paper_authors: Marcel Beetz, Yilong Yang, Abhirup Banerjee, Lei Li, Vicente Grau
  • for: Improves myocardial infarction (MI) detection and prediction by using complete 3D cardiac shapes rather than single-valued imaging biomarkers.
  • methods: Proposes a fully automatic multi-step pipeline consisting of 3D cardiac surface reconstruction followed by a point cloud classification network, exploiting recent advances in geometric deep learning for direct, efficient multi-scale learning on high-resolution anatomical surface models.
  • results: On 1068 UK Biobank subjects, improves prevalent MI detection by ~13% and incident MI prediction by ~5% over clinical benchmarks; the paper also analyzes the role of each ventricle and cardiac phase and conducts a visual analysis of the morphological and physiological patterns associated with MI outcomes.
    Abstract Myocardial infarction (MI) is one of the most prevalent cardiovascular diseases with associated clinical decision-making typically based on single-valued imaging biomarkers. However, such metrics only approximate the complex 3D structure and physiology of the heart and hence hinder a better understanding and prediction of MI outcomes. In this work, we investigate the utility of complete 3D cardiac shapes in the form of point clouds for an improved detection of MI events. To this end, we propose a fully automatic multi-step pipeline consisting of a 3D cardiac surface reconstruction step followed by a point cloud classification network. Our method utilizes recent advances in geometric deep learning on point clouds to enable direct and efficient multi-scale learning on high-resolution surface models of the cardiac anatomy. We evaluate our approach on 1068 UK Biobank subjects for the tasks of prevalent MI detection and incident MI prediction and find improvements of ~13% and ~5% respectively over clinical benchmarks. Furthermore, we analyze the role of each ventricle and cardiac phase for 3D shape-based MI detection and conduct a visual analysis of the morphological and physiological patterns typically associated with MI outcomes.

Reinforcement Learning with Frontier-Based Exploration via Autonomous Environment

  • paper_url: http://arxiv.org/abs/2307.07296
  • repo_url: None
  • paper_authors: Kenji Leong
  • for: The paper aims to improve the exploration and mapping process of autonomous robots by combining Visual-Graph SLAM with reinforcement learning.
  • methods: The proposed algorithm uses frontier-based exploration to detect unexplored areas and reinforcement learning to optimize the robot’s movement. The algorithm also integrates the robot’s sensory data using Graph SLAM to build an accurate map of the environment.
  • results: The proposed approach is expected to improve the efficiency and accuracy of ExploreORB by optimizing the exploration process of frontiers and building a more accurate map. The effectiveness of the proposed approach will be evaluated through experiments in various virtual environments using Gazebo.
    Abstract Active Simultaneous Localisation and Mapping (SLAM) is a critical problem in autonomous robotics, enabling robots to navigate to new regions while building an accurate model of their surroundings. Visual SLAM is a popular technique that uses virtual elements to enhance the experience. However, existing frontier-based exploration strategies can lead to a non-optimal path in scenarios where there are multiple frontiers with similar distance. This issue can impact the efficiency and accuracy of Visual SLAM, which is crucial for a wide range of robotic applications, such as search and rescue, exploration, and mapping. To address this issue, this research combines both an existing Visual-Graph SLAM known as ExploreORB with reinforcement learning. The proposed algorithm allows the robot to learn and optimize exploration routes through a reward-based system to create an accurate map of the environment with proper frontier selection. Frontier-based exploration is used to detect unexplored areas, while reinforcement learning optimizes the robot's movement by assigning rewards for optimal frontier points. Graph SLAM is then used to integrate the robot's sensory data and build an accurate map of the environment. The proposed algorithm aims to improve the efficiency and accuracy of ExploreORB by optimizing the exploration process of frontiers to build a more accurate map. To evaluate the effectiveness of the proposed approach, experiments will be conducted in various virtual environments using Gazebo, a robot simulation software. Results of these experiments will be compared with existing methods to demonstrate the potential of the proposed approach as an optimal solution for SLAM in autonomous robotics.

A Topical Approach to Capturing Customer Insight In Social Media

  • paper_url: http://arxiv.org/abs/2307.11775
  • repo_url: None
  • paper_authors: Miguel Palencia-Olivar
  • for: This research aims to address the challenge of fully unsupervised topic extraction in noisy, Big Data contexts.
  • methods: Three approaches built on the Variational Autoencoder framework: the Embedded Dirichlet Process, the Embedded Hierarchical Dirichlet Process, and the time-aware Dynamic Embedded Dirichlet Process. These nonparametric approaches determine word embeddings and topic embeddings without requiring transfer learning, while still allowing knowledge transfer.
  • results: The proposed models achieve equal or better performance than state-of-the-art methods on benchmark and automotive-industry datasets from a real-world use case, and the work argues that the field of topic modeling would benefit from improved evaluation metrics.
    Abstract The age of social media has opened new opportunities for businesses. This flourishing wealth of information is outside traditional channels and frameworks of classical marketing research, including that of Marketing Mix Modeling (MMM). Textual data, in particular, poses many challenges that data analysis practitioners must tackle. Social media constitute massive, heterogeneous, and noisy document sources. Industrial data acquisition processes include some amount of ETL. However, the variability of noise in the data and the heterogeneity induced by different sources create the need for ad-hoc tools. Put otherwise, customer insight extraction in fully unsupervised, noisy contexts is an arduous task. This research addresses the challenge of fully unsupervised topic extraction in noisy, Big Data contexts. We present three approaches we built on the Variational Autoencoder framework: the Embedded Dirichlet Process, the Embedded Hierarchical Dirichlet Process, and the time-aware Dynamic Embedded Dirichlet Process. These nonparametric approaches concerning topics present the particularity of determining word embeddings and topic embeddings. These embeddings do not require transfer learning, but knowledge transfer remains possible. We test these approaches on benchmark and automotive industry-related datasets from a real-world use case. We show that our models achieve equal to better performance than state-of-the-art methods and that the field of topic modeling would benefit from improved evaluation metrics.

Frequency Domain Adversarial Training for Robust Volumetric Medical Segmentation

  • paper_url: http://arxiv.org/abs/2307.07269
  • repo_url: https://github.com/asif-hanif/vafa
  • paper_authors: Asif Hanif, Muzammal Naseer, Salman Khan, Mubarak Shah, Fahad Shahbaz Khan
  • for: This paper is written for researchers and practitioners working in the field of medical image segmentation, particularly those interested in the robustness of deep learning models against adversarial attacks.
  • methods: The paper presents a 3D frequency domain adversarial attack for volumetric medical image segmentation models, which is a novel approach that exploits the vulnerability of these models to frequency-based attacks. The authors also propose a frequency domain adversarial training approach to optimize a robust model against both voxel and frequency domain attacks.
  • results: The authors demonstrate the effectiveness of their proposed attack and training approach through experiments on several publicly available datasets. They show that their approach can be used to launch successful attacks on state-of-the-art volumetric medical image segmentation models, and that the proposed frequency consistency loss achieves a better tradeoff between model performance on clean and adversarial samples.
    Abstract It is imperative to ensure the robustness of deep learning models in critical applications such as, healthcare. While recent advances in deep learning have improved the performance of volumetric medical image segmentation models, these models cannot be deployed for real-world applications immediately due to their vulnerability to adversarial attacks. We present a 3D frequency domain adversarial attack for volumetric medical image segmentation models and demonstrate its advantages over conventional input or voxel domain attacks. Using our proposed attack, we introduce a novel frequency domain adversarial training approach for optimizing a robust model against voxel and frequency domain attacks. Moreover, we propose frequency consistency loss to regulate our frequency domain adversarial training that achieves a better tradeoff between model's performance on clean and adversarial samples. Code is publicly available at https://github.com/asif-hanif/vafa.
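To illustrate the idea of perturbing the input in the frequency domain rather than in voxel space, the sketch below takes one FGSM-style step on an additive perturbation of the volume's DFT coefficients. This is only a schematic of the general idea under assumed shapes (a 3D volume in the last three dimensions); it is not the paper's VAFA attack or its frequency consistency loss.

```python
import torch

def frequency_domain_step(model, x, y, loss_fn, eps=0.05):
    """One sign-gradient step on a complex perturbation of the volume's 3D spectrum."""
    spec = torch.fft.fftn(x, dim=(-3, -2, -1))
    d_re = torch.zeros_like(x, requires_grad=True)    # real part of spectral perturbation
    d_im = torch.zeros_like(x, requires_grad=True)    # imaginary part
    x_adv = torch.fft.ifftn(spec + torch.complex(d_re, d_im), dim=(-3, -2, -1)).real
    loss = loss_fn(model(x_adv), y)
    g_re, g_im = torch.autograd.grad(loss, (d_re, d_im))
    with torch.no_grad():
        pert = torch.complex(eps * g_re.sign(), eps * g_im.sign())
        return torch.fft.ifftn(spec + pert, dim=(-3, -2, -1)).real
```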

On Interpolating Experts and Multi-Armed Bandits

  • paper_url: http://arxiv.org/abs/2307.07264
  • repo_url: None
  • paper_authors: Houshuang Chen, Yuchen He, Chihao Zhang
  • for: This paper studies a family of online decision problems, the $\mathbf{m}$-Multi-Armed Bandit ($\mathbf{m}$-MAB) problem, which interpolates between learning with expert advice (full information) and the classical MAB problem.
  • methods: The paper uses techniques from online learning, including minimax regret analysis, to develop an optimal PAC algorithm for the pure exploration version of the $\mathbf{m}$-MAB problem, the $\mathbf{m}$-BAI (best arm identification) problem, where the goal is to identify the arm with minimum loss in as few rounds as possible.
  • results: The paper proves tight minimax regret bounds for the $\mathbf{m}$-MAB problem and shows that the minimum number of pulls for an $(\epsilon,0.05)$-PAC algorithm of the $\mathbf{m}$-BAI problem is $\Theta\left(\frac{1}{\epsilon^2}\cdot \sum_{k=1}^K\log (m_k+1)\right)$. Additionally, the paper extends the results to a more general setting, the bandit with graph feedback, and obtains tight minimax regret bounds for several families of feedback graphs.
    Abstract Learning with expert advice and multi-armed bandit are two classic online decision problems which differ on how the information is observed in each round of the game. We study a family of problems interpolating the two. For a vector $\mathbf{m}=(m_1,\dots,m_K)\in \mathbb{N}^K$, an instance of $\mathbf{m}$-MAB indicates that the arms are partitioned into $K$ groups and the $i$-th group contains $m_i$ arms. Once an arm is pulled, the losses of all arms in the same group are observed. We prove tight minimax regret bounds for $\mathbf{m}$-MAB and design an optimal PAC algorithm for its pure exploration version, $\mathbf{m}$-BAI, where the goal is to identify the arm with minimum loss with as few rounds as possible. We show that the minimax regret of $\mathbf{m}$-MAB is $\Theta\left(\sqrt{T\sum_{k=1}^K\log (m_k+1)}\right)$ and the minimum number of pulls for an $(\epsilon,0.05)$-PAC algorithm of $\mathbf{m}$-BAI is $\Theta\left(\frac{1}{\epsilon^2}\cdot \sum_{k=1}^K\log (m_k+1)\right)$. Both our upper bounds and lower bounds for $\mathbf{m}$-MAB can be extended to a more general setting, namely the bandit with graph feedback, in terms of the clique cover and related graph parameters. As consequences, we obtained tight minimax regret bounds for several families of feedback graphs.
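Two limiting cases make the stated rate concrete (a quick sanity check, not taken from the paper's text): singleton groups recover the classical $K$-armed bandit rate, while a single group containing all $N$ arms recovers the full-information experts rate.

```latex
% m = (1,\dots,1): each pull reveals only its own arm (classical K-armed bandit)
\sqrt{T \textstyle\sum_{k=1}^{K} \log(1+1)} = \Theta\!\left(\sqrt{TK}\right)
% m = (N): one group, each pull reveals all N losses (experts / full information)
\sqrt{T \log(N+1)} = \Theta\!\left(\sqrt{T \log N}\right)
```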

Visual Explanations with Attributions and Counterfactuals on Time Series Classification

  • paper_url: http://arxiv.org/abs/2307.08494
  • repo_url: None
  • paper_authors: Udo Schlegel, Daniela Oelke, Daniel A. Keim, Mennatallah El-Assady
  • for: Provides a visual analytics workflow that supports seamless transitions between global and local explanations, addressing the growing need for explainable AI (XAI).
  • methods: Adapts local XAI techniques (attributions), originally developed for images and text, to time series classification, a data type typically less intelligible to humans; local attributions computed over the whole dataset are projected onto two dimensions to reveal model behavior trends, strategies, and decision boundaries, and a what-if analysis supports hypothesis generation and verification at both the global and local levels.
  • results: Three use cases verify that the technique enables users to (1) explore data transformations and feature relevance, (2) identify model behavior and decision boundaries, and (3) understand the reasons for misclassifications.
    Abstract With the rising necessity of explainable artificial intelligence (XAI), we see an increase in task-dependent XAI methods on varying abstraction levels. XAI techniques on a global level explain model behavior and on a local level explain sample predictions. We propose a visual analytics workflow to support seamless transitions between global and local explanations, focusing on attributions and counterfactuals on time series classification. In particular, we adapt local XAI techniques (attributions) that are developed for traditional datasets (images, text) to analyze time series classification, a data type that is typically less intelligible to humans. To generate a global overview, we apply local attribution methods to the data, creating explanations for the whole dataset. These explanations are projected onto two dimensions, depicting model behavior trends, strategies, and decision boundaries. To further inspect the model decision-making as well as potential data errors, a what-if analysis facilitates hypothesis generation and verification on both the global and local levels. We constantly collected and incorporated expert user feedback, as well as insights based on their domain knowledge, resulting in a tailored analysis workflow and system that tightly integrates time series transformations into explanations. Lastly, we present three use cases, verifying that our technique enables users to (1)~explore data transformations and feature relevance, (2)~identify model behavior and decision boundaries, as well as, (3)~the reason for misclassifications.

Mitigating Adversarial Vulnerability through Causal Parameter Estimation by Adversarial Double Machine Learning

  • paper_url: http://arxiv.org/abs/2307.07250
  • repo_url: https://github.com/ByungKwanLee/Double-Debiased-Adversary
  • paper_authors: Byung-Kwan Lee, Junho Kim, Yong Man Ro
  • for: Defends deep neural networks against adversarial examples; despite rapid progress in adversarial-training-based defenses, adversarial vulnerability varies across targets and certain vulnerabilities persist even with deeper architectures and advanced defenses.
  • methods: Proposes Adversarial Double Machine Learning (ADML), a causal approach that quantifies the degree of adversarial vulnerability of network predictions, directly estimates the causal parameter of adversarial perturbations, and mitigates negative effects that can damage robustness.
  • results: Extensive experiments on various CNN and Transformer architectures show that ADML improves adversarial robustness by large margins and relieves the observed vulnerability phenomenon.
    Abstract Adversarial examples derived from deliberately crafted perturbations on visual inputs can easily harm decision process of deep neural networks. To prevent potential threats, various adversarial training-based defense methods have grown rapidly and become a de facto standard approach for robustness. Despite recent competitive achievements, we observe that adversarial vulnerability varies across targets and certain vulnerabilities remain prevalent. Intriguingly, such peculiar phenomenon cannot be relieved even with deeper architectures and advanced defense methods. To address this issue, in this paper, we introduce a causal approach called Adversarial Double Machine Learning (ADML), which allows us to quantify the degree of adversarial vulnerability for network predictions and capture the effect of treatments on outcome of interests. ADML can directly estimate causal parameter of adversarial perturbations per se and mitigate negative effects that can potentially damage robustness, bridging a causal perspective into the adversarial vulnerability. Through extensive experiments on various CNN and Transformer architectures, we corroborate that ADML improves adversarial robustness with large margins and relieve the empirical observation.

Knowledge Boosting: Rethinking Medical Contrastive Vision-Language Pre-Training

  • paper_url: http://arxiv.org/abs/2307.07246
  • repo_url: None
  • paper_authors: Xiaofei Chen, Yuting He, Cheng Xue, Rongjun Ge, Shuo Li, Guanyu Yang
  • for: Advances medical computer-aided diagnosis by pre-training models that can be deployed broadly in practice.
  • methods: Uses contrastive vision-language pre-training, which requires no human annotations and guides representation learning with the description information in diagnostic reports; the proposed Knowledge-Boosting Contrastive Vision-Language Pre-training framework (KoBo) additionally integrates clinical knowledge into the learning of vision-language semantic consistency, addressing the large-scale semantic overlap and shift problems of the medical domain.
  • results: Experiments show comparable or better performance on eight tasks, including classification, segmentation, retrieval, and semantic relatedness, under zero-shot or few-shot settings.
    Abstract The foundation models based on pre-training technology have significantly advanced artificial intelligence from theoretical to practical applications. These models have facilitated the feasibility of computer-aided diagnosis for widespread use. Medical contrastive vision-language pre-training, which does not require human annotations, is an effective approach for guiding representation learning using description information in diagnostic reports. However, the effectiveness of pre-training is limited by the large-scale semantic overlap and shifting problems in medical field. To address these issues, we propose the Knowledge-Boosting Contrastive Vision-Language Pre-training framework (KoBo), which integrates clinical knowledge into the learning of vision-language semantic consistency. The framework uses an unbiased, open-set sample-wise knowledge representation to measure negative sample noise and supplement the correspondence between vision-language mutual information and clinical knowledge. Extensive experiments validate the effect of our framework on eight tasks including classification, segmentation, retrieval, and semantic relatedness, achieving comparable or better performance with the zero-shot or few-shot settings. Our code is open on https://github.com/ChenXiaoFei-CS/KoBo.

The Role of Transparency in Repeated First-Price Auctions with Unknown Valuations

  • paper_url: http://arxiv.org/abs/2307.09478
  • repo_url: None
  • paper_authors: Nicolò Cesa-Bianchi, Tommaso Cesari, Roberto Colomboni, Federico Fusco, Stefano Leonardi
  • for: Studies regret minimization for a single bidder in a sequence of first-price auctions, where the bidder learns the item's value only when an auction is won.
  • methods: Analyzes the bidder's minimax regret under different assumptions on the environment generating valuations and competing bids (stochastic, adversarial, and their smoothed variants).
  • results: Gives a complete characterization, up to logarithmic factors, of the minimax regret in terms of the auction's transparency, i.e., how much information about competing bids the auctioneer discloses at the end of each auction, revealing how the interplay between transparency and the environment determines how fast one can learn to bid optimally.
    Abstract We study the problem of regret minimization for a single bidder in a sequence of first-price auctions where the bidder knows the item's value only if the auction is won. Our main contribution is a complete characterization, up to logarithmic factors, of the minimax regret in terms of the auction's transparency, which regulates the amount of information on competing bids disclosed by the auctioneer at the end of each auction. Our results hold under different assumptions (stochastic, adversarial, and their smoothed variants) on the environment generating the bidder's valuations and competing bids. These minimax rates reveal how the interplay between transparency and the nature of the environment affects how fast one can learn to bid optimally in first-price auctions.

Ed-Fed: A generic federated learning framework with resource-aware client selection for edge devices

  • paper_url: http://arxiv.org/abs/2307.07199
  • repo_url: None
  • paper_authors: Zitha Sasindran, Harsha Yelchuri, T. V. Prabhakar
  • for: Provides a comprehensive and generic federated learning (FL) framework that supports practical deployment, including automatic speech recognition tasks on heterogeneous edge devices.
  • methods: Proposes a novel resource-aware client selection algorithm that optimizes the waiting time in FL settings, handles straggler devices, and dynamically sets the training time for the selected devices in each round.
  • results: Evaluation shows that the proposed approach significantly optimizes waiting time in FL compared with conventional random client selection methods.
    Abstract Federated learning (FL) has evolved as a prominent method for edge devices to cooperatively create a unified prediction model while securing their sensitive training data local to the device. Despite the existence of numerous research frameworks for simulating FL algorithms, they do not facilitate comprehensive deployment for automatic speech recognition tasks on heterogeneous edge devices. This is where Ed-Fed, a comprehensive and generic FL framework, comes in as a foundation for future practical FL system research. We also propose a novel resource-aware client selection algorithm to optimise the waiting time in the FL settings. We show that our approach can handle the straggler devices and dynamically set the training time for the selected devices in a round. Our evaluation has shown that the proposed approach significantly optimises waiting time in FL compared to conventional random client selection methods.

Omnipotent Adversarial Training for Unknown Label-noisy and Imbalanced Datasets

  • paper_url: http://arxiv.org/abs/2307.08596
  • repo_url: https://github.com/guanlinlee/oat
  • paper_authors: Guanlin Li, Kangjie Chen, Yuan Xu, Han Qiu, Tianwei Zhang
  • for: Trains deep learning models on imbalanced and label-noisy datasets so that they achieve both high clean accuracy and adversarial robustness in realistic settings.
  • methods: Proposes Omnipotent Adversarial Training (OAT) with two innovations: an oracle introduced into the adversarial training process that provides correct label annotations so the model learns the correct data-label conditional distribution, and logits-adjustment adversarial training that counters data imbalance and helps the model learn a Bayes-optimal distribution.
  • results: OAT outperforms other baselines by more than 20% in clean accuracy and more than 10% in robust accuracy under complex combinations of data imbalance and label noise. Code is available at https://github.com/GuanlinLee/OAT.
    Abstract Adversarial training is an important topic in robust deep learning, but the community lacks attention to its practical usage. In this paper, we aim to resolve a real-world application challenge, i.e., training a model on an imbalanced and noisy dataset to achieve high clean accuracy and robustness, with our proposed Omnipotent Adversarial Training (OAT). Our strategy consists of two innovative methodologies to address the label noise and data imbalance in the training set. We first introduce an oracle into the adversarial training process to help the model learn a correct data-label conditional distribution. This carefully-designed oracle can provide correct label annotations for adversarial training. We further propose logits adjustment adversarial training to overcome the data imbalance challenge, which can help the model learn a Bayes-optimal distribution. Our comprehensive evaluation results show that OAT outperforms other baselines by more than 20% clean accuracy improvement and 10% robust accuracy improvement under the complex combinations of data imbalance and label noise scenarios. The code can be found in https://github.com/GuanlinLee/OAT.

Controlling dynamical systems to complex target states using machine learning: next-generation vs. classical reservoir computing

  • paper_url: http://arxiv.org/abs/2307.07195
  • repo_url: None
  • paper_authors: Alexander Haluszczynski, Daniel Köglmayr, Christoph Räth
  • for: Controlling nonlinear dynamical systems with machine learning makes it possible to drive systems not only into simple behavior such as periodicity but also into more complex, arbitrary target dynamics.
  • methods: A machine learning system is trained to reproduce the target dynamics sufficiently well; classical reservoir computing and next-generation reservoir computing are compared for different amounts of training data.
  • results: On the task of forcing a chaotic parametrization of the Lorenz system into intermittent dynamics, classical reservoir computing excels; next-generation reservoir computing delivers comparable performance for usual amounts of training data and significantly outperforms when only very limited data is available, opening further control applications where data is restricted.
    Abstract Controlling nonlinear dynamical systems using machine learning allows to not only drive systems into simple behavior like periodicity but also to more complex arbitrary dynamics. For this, it is crucial that a machine learning system can be trained to reproduce the target dynamics sufficiently well. On the example of forcing a chaotic parametrization of the Lorenz system into intermittent dynamics, we show first that classical reservoir computing excels at this task. In a next step, we compare those results based on different amounts of training data to an alternative setup, where next-generation reservoir computing is used instead. It turns out that while delivering comparable performance for usual amounts of training data, next-generation RC significantly outperforms in situations where only very limited data is available. This opens even further practical control applications in real world problems where data is restricted.
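A bare-bones version of the classical reservoir computing baseline (fixed random reservoir, ridge-regression readout) is sketched below; the reservoir size, spectral radius, and regularization strength are assumed values, and the next-generation variant, which replaces the random reservoir with nonlinear features of time-delayed inputs, is not shown.

```python
import numpy as np

def train_esn(u, y_target, n_res=500, rho=0.9, ridge=1e-6, seed=0):
    """Drive a fixed random reservoir with input sequence u and fit a linear readout."""
    rng = np.random.default_rng(seed)
    W_in = rng.uniform(-0.5, 0.5, (n_res, u.shape[1]))
    W = rng.uniform(-0.5, 0.5, (n_res, n_res))
    W *= rho / np.max(np.abs(np.linalg.eigvals(W)))    # rescale to spectral radius rho
    states = np.zeros((len(u), n_res))
    s = np.zeros(n_res)
    for t in range(len(u)):
        s = np.tanh(W_in @ u[t] + W @ s)
        states[t] = s
    A = states.T @ states + ridge * np.eye(n_res)      # ridge-regression readout
    W_out = np.linalg.solve(A, states.T @ y_target)
    return W_in, W, W_out
```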

Adversarial Training Over Long-Tailed Distribution

  • paper_url: http://arxiv.org/abs/2307.10205
  • repo_url: https://github.com/guanlinlee/reat
  • paper_authors: Guanlin Li, Guowen Xu, Tianwei Zhang
  • for: Studies adversarial training on datasets that follow a long-tailed distribution, a practical setting rarely explored in previous work.
  • methods: Proposes Re-balancing Adversarial Training (REAT), consisting of a new training strategy, inspired by the effective number of samples, that guides the model to generate more balanced and informative adversarial examples, and a carefully constructed penalty function that enforces a satisfactory feature space.
  • results: Evaluations on different datasets and model structures show that REAT effectively enhances robustness while preserving clean accuracy. Code is available at https://github.com/GuanlinLee/REAT.
    Abstract In this paper, we study adversarial training on datasets that obey the long-tailed distribution, which is practical but rarely explored in previous works. Compared with conventional adversarial training on balanced datasets, this process falls into the dilemma of generating uneven adversarial examples (AEs) and an unbalanced feature embedding space, causing the resulting model to exhibit low robustness and accuracy on tail data. To combat that, we propose a new adversarial training framework -- Re-balancing Adversarial Training (REAT). This framework consists of two components: (1) a new training strategy inspired by the term effective number to guide the model to generate more balanced and informative AEs; (2) a carefully constructed penalty function to force a satisfactory feature space. Evaluation results on different datasets and model structures prove that REAT can effectively enhance the model's robustness and preserve the model's clean accuracy. The code can be found in https://github.com/GuanlinLee/REAT.
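The "effective number" that the training strategy draws on is the class-balanced weighting of Cui et al. (2019); a sketch of that standard formula is below. REAT's actual rule for generating balanced adversarial examples is its own design and is not reproduced here.

```python
import numpy as np

def class_balanced_weights(class_counts, beta=0.999):
    """Weights proportional to 1 / effective number, E_c = (1 - beta^{n_c}) / (1 - beta)."""
    counts = np.asarray(class_counts, dtype=float)
    effective_num = (1.0 - np.power(beta, counts)) / (1.0 - beta)
    weights = 1.0 / effective_num
    return weights * len(counts) / weights.sum()       # normalise to sum to #classes
```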

Benchmarks and Custom Package for Electrical Load Forecasting

  • paper_url: http://arxiv.org/abs/2307.07191
  • repo_url: https://github.com/leo-vk/proenfo
  • paper_authors: Zhixian Wang, Qingsong Wen, Chaoli Zhang, Liang Sun, Leandro Von Krannichfeldt, Yi Wang
  • for: Load forecasting is of great significance in the power industry, as it can provide a reference for subsequent tasks such as power grid dispatch, thus bringing huge economic benefits.
  • methods: The paper provides a comprehensive load forecasting archive, including load domain-specific feature engineering to help forecasting models better model load data. The paper also customizes the loss function based on the forecasting error, integrating it into the forecasting framework.
  • results: The paper conducts extensive experiments on load data at different levels, providing a reference for researchers to compare different load forecasting models.
    Abstract Load forecasting is of great significance in the power industry as it can provide a reference for subsequent tasks such as power grid dispatch, thus bringing huge economic benefits. However, there are many differences between load forecasting and traditional time series forecasting. On the one hand, load forecasting aims to minimize the cost of subsequent tasks such as power grid dispatch, rather than simply pursuing prediction accuracy. On the other hand, the load is largely influenced by many external factors, such as temperature or calendar variables. In addition, the scale of predictions (such as building-level loads and aggregated-level loads) can also significantly impact the predicted results. In this paper, we provide a comprehensive load forecasting archive, which includes load domain-specific feature engineering to help forecasting models better model load data. In addition, different from the traditional loss function which only aims for accuracy, we also provide a method to customize the loss function based on the forecasting error, integrating it into our forecasting framework. Based on this, we conducted extensive experiments on load data at different levels, providing a reference for researchers to compare different load forecasting models.
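As an example of swapping a cost-aware objective in for plain MSE, a quantile (pinball) loss penalizes under- and over-forecasts asymmetrically; the quantile level below is an arbitrary choice, and the archive's actual error-based loss customization may differ.

```python
import torch

def pinball_loss(pred: torch.Tensor, target: torch.Tensor, q: float = 0.9) -> torch.Tensor:
    """Quantile (pinball) loss; q > 0.5 penalises under-forecasting the load more heavily."""
    diff = target - pred
    return torch.mean(torch.maximum(q * diff, (q - 1.0) * diff))
```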

Multiplicative update rules for accelerating deep learning training and increasing robustness

  • paper_url: http://arxiv.org/abs/2307.07189
  • repo_url: None
  • paper_authors: Manos Kirtas, Nikolaos Passalis, Anastasios Tefas
  • for: Accelerates deep learning (DL) training and builds more robust DL models.
  • methods: Proposes an optimization framework that fits a wide range of optimization algorithms and enables alternative parameter update rules; a novel multiplicative update rule is introduced and combined with the traditional additive update term in a new hybrid update method.
  • results: Experiments ranging from convex and non-convex optimization to difficult image classification benchmarks, across a wide range of optimizers and DNN architectures, show that the approach accelerates training and leads to more robust models than the traditionally used additive update rule.
    Abstract Even nowadays, where Deep Learning (DL) has achieved state-of-the-art performance in a wide range of research domains, accelerating training and building robust DL models remains a challenging task. To this end, generations of researchers have pursued to develop robust methods for training DL architectures that can be less sensitive to weight distributions, model architectures and loss landscapes. However, such methods are limited to adaptive learning rate optimizers, initialization schemes, and clipping gradients without investigating the fundamental rule of parameters update. Although multiplicative updates have contributed significantly to the early development of machine learning and hold strong theoretical claims, to best of our knowledge, this is the first work that investigate them in context of DL training acceleration and robustness. In this work, we propose an optimization framework that fits to a wide range of optimization algorithms and enables one to apply alternative update rules. To this end, we propose a novel multiplicative update rule and we extend their capabilities by combining it with a traditional additive update term, under a novel hybrid update method. We claim that the proposed framework accelerates training, while leading to more robust models in contrast to traditionally used additive update rule and we experimentally demonstrate their effectiveness in a wide range of task and optimization methods. Such tasks ranging from convex and non-convex optimization to difficult image classification benchmarks applying a wide range of traditionally used optimization methods and Deep Neural Network (DNN) architectures.
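For reference, the classical multiplicative update alluded to above is exponentiated gradient, contrasted below with the usual additive step; the `hybrid_step` is only a schematic combination for illustration, not the paper's proposed rule.

```python
import numpy as np

def additive_step(w, grad, lr=0.01):
    """Standard gradient descent: parameters change by an additive increment."""
    return w - lr * grad

def multiplicative_step(w, grad, lr=0.1):
    """Exponentiated gradient: parameters are rescaled by factors exp(-lr * grad)."""
    w_new = w * np.exp(-lr * grad)
    return w_new / w_new.sum()                 # renormalise onto the probability simplex

def hybrid_step(w, grad, lr_mul=0.1, lr_add=0.01):
    """Schematic mix of a multiplicative factor and an additive term (illustration only)."""
    return w * np.exp(-lr_mul * grad) - lr_add * grad
```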

DISPEL: Domain Generalization via Domain-Specific Liberating

  • paper_url: http://arxiv.org/abs/2307.07181
  • repo_url: None
  • paper_authors: Chia-Yuan Chang, Yu-Neng Chuang, Guanchu Wang, Mengnan Du, Na Zou
  • for: Improves domain generalization, i.e., learning models that perform well on unseen test domains, without requiring the collection of domain labels.
  • methods: Categorizes underlying features into domain-shared and domain-specific groups and proposes DomaIn-SPEcific Liberating (DISPEL), a post-processing fine-grained masking approach whose mask generator produces a unique mask for each input to filter out undefined and indistinguishable domain-specific features in the embedding space; the framework applies to any fine-tuned model, and a generalization error bound is derived.
  • results: Experiments on five benchmarks show that DISPEL outperforms existing methods and can further generalize various algorithms.
    Abstract Domain generalization aims to learn a generalization model that can perform well on unseen test domains by only training on limited source domains. However, existing domain generalization approaches often bring in prediction-irrelevant noise or require the collection of domain labels. To address these challenges, we consider the domain generalization problem from a different perspective by categorizing underlying feature groups into domain-shared and domain-specific features. Nevertheless, the domain-specific features are difficult to be identified and distinguished from the input data. In this work, we propose DomaIn-SPEcific Liberating (DISPEL), a post-processing fine-grained masking approach that can filter out undefined and indistinguishable domain-specific features in the embedding space. Specifically, DISPEL utilizes a mask generator that produces a unique mask for each input data to filter domain-specific features. The DISPEL framework is highly flexible to be applied to any fine-tuned models. We derive a generalization error bound to guarantee the generalization performance by optimizing a designed objective loss. The experimental results on five benchmarks demonstrate DISPEL outperforms existing methods and can further generalize various algorithms.

A Surrogate Data Assimilation Model for the Estimation of Dynamical System in a Limited Area

  • paper_url: http://arxiv.org/abs/2307.07178
  • repo_url: None
  • paper_authors: Wei Kang, Liang Xu, Hong Zhou
  • for: Proposes a learning-based surrogate data assimilation (DA) model for efficient state estimation in a limited area.
  • methods: Uses a feedforward neural network for online computation, eliminating the need to integrate high-dimensional limited-area models and removing the requirement of lateral boundary conditions in both online and offline computations, which gives significant computational advantages over traditional DA algorithms.
  • results: The surrogate DA model rests on a robust theoretical framework built on two concepts: observability, which quantifies the amount of observation data needed for accurate DA, and the effective region, which substantially reduces the computational burden of computing observability and generating training data.
    Abstract We propose a novel learning-based surrogate data assimilation (DA) model for efficient state estimation in a limited area. Our model employs a feedforward neural network for online computation, eliminating the need for integrating high-dimensional limited-area models. This approach offers significant computational advantages over traditional DA algorithms. Furthermore, our method avoids the requirement of lateral boundary conditions for the limited-area model in both online and offline computations. The design of our surrogate DA model is built upon a robust theoretical framework that leverages two fundamental concepts: observability and effective region. The concept of observability enables us to quantitatively determine the optimal amount of observation data necessary for accurate DA. Meanwhile, the concept of effective region substantially reduces the computational burden associated with computing observability and generating training data.

Safe DreamerV3: Safe Reinforcement Learning with World Models

  • paper_url: http://arxiv.org/abs/2307.07176
  • repo_url: None
  • paper_authors: Weidong Huang, Jiaming Ji, Borong Zhang, Chunhe Xia, Yaodong Yang
  • for: Addresses the inability of existing RL methods to satisfy the safety requirements of real-world applications.
  • methods: Integrates Lagrangian-based and planning-based safe RL methods within a world model (Safe DreamerV3).
  • results: Achieves nearly zero cost on both low-dimensional and vision-only tasks within the Safety-Gymnasium benchmark, making it the first SafeRL algorithm to do so.
    Abstract The widespread application of Reinforcement Learning (RL) in real-world situations is yet to come to fruition, largely as a result of its failure to satisfy the essential safety demands of such systems. Existing safe reinforcement learning (SafeRL) methods, employing cost functions to enhance safety, fail to achieve zero-cost in complex scenarios, including vision-only tasks, even with comprehensive data sampling and training. To address this, we introduce Safe DreamerV3, a novel algorithm that integrates both Lagrangian-based and planning-based methods within a world model. Our methodology represents a significant advancement in SafeRL as the first algorithm to achieve nearly zero-cost in both low-dimensional and vision-only tasks within the Safety-Gymnasium benchmark. Our project website can be found in: https://sites.google.com/view/safedreamerv3.

FedBIAD: Communication-Efficient and Accuracy-Guaranteed Federated Learning with Bayesian Inference-Based Adaptive Dropout

  • paper_url: http://arxiv.org/abs/2307.07172
  • repo_url: None
  • paper_authors: Jingjing Xue, Min Liu, Sheng Sun, Yuwei Wang, Hui Jiang, Xuefeng Jiang
  • for: Proposes FedBIAD, a Bayesian-inference-based adaptive dropout method that improves both communication efficiency and accuracy in federated learning.
  • methods: Regards weight rows of local models as probability distributions and adaptively drops partial weight rows based on importance indicators correlated with the trend of the local training loss; only the parameters of non-dropped weight rows are transmitted to the server.
  • results: Compared with status quo approaches, FedBIAD provides a 2x uplink reduction with an accuracy increase of up to 2.41%, even on non-Independent and Identically Distributed (non-IID) data, and up to a 72% decrease in training time.
    Abstract Federated Learning (FL) emerges as a distributed machine learning paradigm without end-user data transmission, effectively avoiding privacy leakage. Participating devices in FL are usually bandwidth-constrained, and the uplink is much slower than the downlink in wireless networks, which causes a severe uplink communication bottleneck. A prominent direction to alleviate this problem is federated dropout, which drops fractional weights of local models. However, existing federated dropout studies focus on random or ordered dropout and lack theoretical support, resulting in unguaranteed performance. In this paper, we propose Federated learning with Bayesian Inference-based Adaptive Dropout (FedBIAD), which regards weight rows of local models as probability distributions and adaptively drops partial weight rows based on importance indicators correlated with the trend of local training loss. By applying FedBIAD, each client adaptively selects a high-quality dropping pattern with accurate approximations and only transmits parameters of non-dropped weight rows to mitigate uplink costs while improving accuracy. Theoretical analysis demonstrates that the convergence rate of the average generalization error of FedBIAD is minimax optimal up to a squared logarithmic factor. Extensive experiments on image classification and next-word prediction show that compared with status quo approaches, FedBIAD provides 2x uplink reduction with an accuracy increase of up to 2.41% even on non-Independent and Identically Distributed (non-IID) data, which brings up to 72% decrease in training time.

HYTREL: Hypergraph-enhanced Tabular Data Representation Learning

  • paper_url: http://arxiv.org/abs/2307.08623
  • repo_url: None
  • paper_authors: Pei Chen, Soumajyoti Sarkar, Leonard Lausen, Balasubramaniam Srinivasan, Sheng Zha, Ruihong Huang, George Karypis
  • for: Proposes HYTREL, a hypergraph-enhanced tabular language model that captures the permutation invariances and hierarchical structure of tabular data.
  • methods: Models table cells as hypergraph nodes and forms three types of hyperedges from the cells that occur jointly in each row, in each column, and in the entire table.
  • results: HYTREL consistently outperforms competitive baselines on four downstream tasks with minimal pretraining and is maximally permutation-invariant under certain conditions; qualitative analyses show it assimilates table structure to produce robust representations of cells, rows, columns, and whole tables.
    Abstract Language models pretrained on large collections of tabular data have demonstrated their effectiveness in several downstream tasks. However, many of these models do not take into account the row/column permutation invariances, hierarchical structure, etc. that exist in tabular data. To alleviate these limitations, we propose HYTREL, a tabular language model, that captures the permutation invariances and three more structural properties of tabular data by using hypergraphs - where the table cells make up the nodes and the cells occurring jointly together in each row, column, and the entire table are used to form three different types of hyperedges. We show that HYTREL is maximally invariant under certain conditions for tabular data, i.e., two tables obtain the same representations via HYTREL iff the two tables are identical up to permutations. Our empirical results demonstrate that HYTREL consistently outperforms other competitive baselines on four downstream tasks with minimal pretraining, illustrating the advantages of incorporating the inductive biases associated with tabular data into the representations. Finally, our qualitative analyses showcase that HYTREL can assimilate the table structures to generate robust representations for the cells, rows, columns, and the entire table.
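
The cell/row/column/table hyperedge construction can be sketched in a few lines. The function below only builds the incidence structure (which cells belong to which hyperedge) and says nothing about HYTREL's actual encoder or training; all names are illustrative.

```python
def table_to_hyperedges(table):
    """Build the three hyperedge types used by a HYTREL-style encoder.

    table : list of rows, each row a list of cell values.
    Returns (nodes, hyperedges): nodes are (row, col) cell ids and hyperedges
    maps an edge name to the cell ids it contains.
    """
    n_rows, n_cols = len(table), len(table[0])
    nodes = [(r, c) for r in range(n_rows) for c in range(n_cols)]

    hyperedges = {}
    for r in range(n_rows):                      # one hyperedge per row
        hyperedges[f"row_{r}"] = [(r, c) for c in range(n_cols)]
    for c in range(n_cols):                      # one hyperedge per column
        hyperedges[f"col_{c}"] = [(r, c) for r in range(n_rows)]
    hyperedges["table"] = nodes[:]               # one hyperedge for the whole table
    return nodes, hyperedges

# Toy 2x3 table: 6 cell nodes, 2 row + 3 column + 1 table hyperedges.
nodes, edges = table_to_hyperedges([["a", "b", "c"], ["d", "e", "f"]])
print(len(nodes), len(edges))
```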

Certified Robustness for Large Language Models with Self-Denoising

  • paper_url: http://arxiv.org/abs/2307.07171
  • repo_url: None
  • paper_authors: Zhen Zhang, Guanhua Zhang, Bairu Hou, Wenqi Fan, Qing Li, Sijia Liu, Yang Zhang, Shiyu Chang
  • for: Improving the robustness and stability of large language models (LLMs) so they can be deployed reliably in high-stakes environments.
  • methods: A self-denoising scheme that exploits the multitasking nature of LLMs to remove the noise added by randomized smoothing before prediction, improving certified and empirical robustness without training a separate denoiser.
  • results: Outperforms existing certification methods on both certified robustness and empirical robustness, while being more efficient and flexible.
    Abstract Although large language models (LLMs) have achieved great success in vast real-world applications, their vulnerabilities towards noisy inputs have significantly limited their uses, especially in high-stake environments. In these contexts, it is crucial to ensure that every prediction made by large language models is stable, i.e., LLM predictions should be consistent given minor differences in the input. This largely falls into the study of certified robust LLMs, i.e., all predictions of LLM are certified to be correct in a local region around the input. Randomized smoothing has demonstrated great potential in certifying the robustness and prediction stability of LLMs. However, randomized smoothing requires adding noise to the input before model prediction, and its certification performance depends largely on the model's performance on corrupted data. As a result, its direct application to LLMs remains challenging and often results in a small certification radius. To address this issue, we take advantage of the multitasking nature of LLMs and propose to denoise the corrupted inputs with LLMs in a self-denoising manner. Different from previous works like denoised smoothing, which requires training a separate model to robustify LLM, our method enjoys far better efficiency and flexibility. Our experiment results show that our method outperforms the existing certification methods under both certified robustness and empirical robustness. The codes are available at https://github.com/UCSB-NLP-Chang/SelfDenoise.
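
A rough sketch of the smoothing-plus-self-denoising loop is shown below, with placeholder `classify` and `denoise` callables standing in for prompting the LLM; the masking rate, sample count, and toy stand-ins are assumptions, and the certification computation itself (e.g., the certified radius) is not reproduced.

```python
import random
from collections import Counter

def smoothed_predict(text, classify, denoise, mask_rate=0.3, n_samples=20, mask_token="<mask>"):
    """Monte-Carlo estimate of a smoothed classifier with self-denoising."""
    words = text.split()
    votes = Counter()
    for _ in range(n_samples):
        # Randomly mask words, as randomized smoothing does for text inputs.
        noisy = [mask_token if random.random() < mask_rate else w for w in words]
        # Self-denoising: the (same) LLM reconstructs the corrupted input ...
        recovered = denoise(" ".join(noisy))
        # ... and the prediction is made on the recovered text.
        votes[classify(recovered)] += 1
    return votes.most_common(1)[0][0], votes

# Toy stand-ins so the sketch runs end to end.
label, votes = smoothed_predict(
    "the movie was great",
    classify=lambda t: "positive" if "great" in t else "negative",
    denoise=lambda t: t.replace("<mask>", "great"),
)
print(label, votes)
```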

Vulnerability-Aware Instance Reweighting For Adversarial Training

  • paper_url: http://arxiv.org/abs/2307.07167
  • repo_url: None
  • paper_authors: Olukorede Fakorede, Ashutosh Kumar Nirala, Modeste Atsague, Jin Tian
  • for: Improving the robustness of deep learning classifiers against adversarial examples through adversarial training (AT), which includes adversarial examples when training the classifier.
  • methods: A novel instance-wise reweighting scheme that assigns unequal weights to the robust losses of individual training examples, based on the vulnerability of each natural example and the information loss its adversarial counterpart suffers under attack.
  • results: Extensive experiments show the proposed method significantly improves over existing reweighting schemes, especially against strong white-box and black-box attacks.
    Abstract Adversarial Training (AT) has been found to substantially improve the robustness of deep learning classifiers against adversarial attacks. AT involves obtaining robustness by including adversarial examples in training a classifier. Most variants of AT algorithms treat every training example equally. However, recent works have shown that better performance is achievable by treating them unequally. In addition, it has been observed that AT exerts an uneven influence on different classes in a training set and unfairly hurts examples corresponding to classes that are inherently harder to classify. Consequently, various reweighting schemes have been proposed that assign unequal weights to robust losses of individual examples in a training set. In this work, we propose a novel instance-wise reweighting scheme. It considers the vulnerability of each natural example and the resulting information loss on its adversarial counterpart occasioned by adversarial attacks. Through extensive experiments, we show that our proposed method significantly improves over existing reweighting schemes, especially against strong white and black-box attacks.
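
The following sketch shows one way an instance-wise weight could modulate the robust loss, using the clean-prediction margin as an assumed proxy for vulnerability; the paper's actual vulnerability and information-loss indicators differ, so the weighting rule, temperature, and normalization here are illustrative only.

```python
import numpy as np

def vulnerability_weights(clean_margin, temperature=1.0):
    """Assumed proxy: a smaller clean-prediction margin means a more vulnerable
    example, which receives a larger weight (softmax over negative margins)."""
    w = np.exp(-np.asarray(clean_margin) / temperature)
    return w / w.sum() * len(w)          # normalize so the mean weight is 1

def reweighted_robust_loss(per_example_adv_loss, clean_margin):
    """Weighted average of per-example adversarial (robust) losses."""
    w = vulnerability_weights(clean_margin)
    return float(np.mean(w * np.asarray(per_example_adv_loss)))

# Toy batch: 4 examples with their margins and adversarial losses.
margins = np.array([2.0, 0.1, 1.0, 0.5])        # example 2 is most vulnerable
adv_losses = np.array([0.3, 1.2, 0.6, 0.9])
print(reweighted_robust_loss(adv_losses, margins))
```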

ISAC-NET: Model-driven Deep Learning for Integrated Passive Sensing and Communication

  • paper_url: http://arxiv.org/abs/2307.15074
  • repo_url: None
  • paper_authors: Wangjun Jiang, Dingyou Ma, Zhiqing Wei, Zhiyong Feng, Ping Zhang
  • for: Integrated sensing and communication (ISAC), focusing on how to achieve high passive-sensing performance despite communication demodulation errors.
  • methods: ISAC-NET, a model-driven deep learning network that combines passive sensing with communication signal detection and obtains sensing results and demodulated symbols simultaneously, organized block by block into passive sensing, signal detection, and channel reconstruction modules.
  • results: Simulations show communication performance better than traditional signal demodulation and close to OAMP-Net2, together with significantly enhanced sensing performance compared with the 2D-DFT algorithm.
    Abstract Recent advances in wireless communication with the enormous demands of sensing ability have given rise to the integrated sensing and communication (ISAC) technology, among which passive sensing plays an important role. The main challenge of passive sensing is how to achieve high sensing performance in the condition of communication demodulation errors. In this paper, we propose an ISAC network (ISAC-NET) that combines passive sensing with communication signal detection by using model-driven deep learning (DL). Dissimilar to existing passive sensing algorithms that first demodulate the transmitted symbols and then obtain passive sensing results from the demodulated symbols, ISAC-NET obtains passive sensing results and communication demodulated symbols simultaneously. Different from the data-driven DL method, we adopt the block-by-block signal processing method that divides the ISAC-NET into the passive sensing module, signal detection module and channel reconstruction module. From the simulation results, ISAC-NET obtains better communication performance than the traditional signal demodulation algorithm, which is close to OAMP-Net2. Compared to the 2D-DFT algorithm, ISAC-NET demonstrates significantly enhanced sensing performance. In summary, ISAC-NET is a promising tool for passive sensing and communication in wireless communications.

Do not Mask Randomly: Effective Domain-adaptive Pre-training by Masking In-domain Keywords

  • paper_url: http://arxiv.org/abs/2307.07160
  • repo_url: None
  • paper_authors: Shahriar Golchin, Mihai Surdeanu, Nazgol Tavabi, Ata Kiapour
  • for: Improving in-domain (domain-adaptive) pre-training, positioned between generic pre-training and fine-tuning.
  • methods: Selectively masks in-domain keywords, identified with KeyBERT, during continued pre-training.
  • results: Across six settings (three datasets combined with two pre-trained language models), models adapted this way outperform both random-masking in-domain pre-training and the standard pre-train-then-fine-tune pipeline; identifying keywords costs roughly 7-15% of the pre-training time for two epochs of BERT Large.
    Abstract We propose a novel task-agnostic in-domain pre-training method that sits between generic pre-training and fine-tuning. Our approach selectively masks in-domain keywords, i.e., words that provide a compact representation of the target domain. We identify such keywords using KeyBERT (Grootendorst, 2020). We evaluate our approach using six different settings: three datasets combined with two distinct pre-trained language models (PLMs). Our results reveal that the fine-tuned PLMs adapted using our in-domain pre-training strategy outperform PLMs that used in-domain pre-training with random masking as well as those that followed the common pre-train-then-fine-tune paradigm. Further, the overhead of identifying in-domain keywords is reasonable, e.g., 7-15% of the pre-training time (for two epochs) for BERT Large (Devlin et al., 2019).
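
A minimal sketch of the masking step is shown below, assuming the in-domain keywords have already been extracted (the paper uses KeyBERT for this); the masking probability, mask token, and whitespace tokenization are illustrative simplifications, not the authors' pipeline.

```python
import random

def mask_in_domain_keywords(text, keywords, mask_token="[MASK]", mask_prob=0.8):
    """Mask occurrences of in-domain keywords for continued MLM pre-training."""
    keyword_set = {k.lower() for k in keywords}
    out = []
    for token in text.split():
        stripped = token.strip(".,;:!?").lower()
        if stripped in keyword_set and random.random() < mask_prob:
            out.append(mask_token)
        else:
            out.append(token)
    return " ".join(out)

doc = "The MRI scan showed a torn meniscus requiring arthroscopic repair."
domain_keywords = ["MRI", "meniscus", "arthroscopic"]   # e.g. output of a keyword extractor
print(mask_in_domain_keywords(doc, domain_keywords))
```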

Drug Discovery under Covariate Shift with Domain-Informed Prior Distributions over Functions

  • paper_url: http://arxiv.org/abs/2307.15073
  • repo_url: https://github.com/leojklarner/q-savi
  • paper_authors: Leo Klarner, Tim G. J. Rudner, Michael Reutlinger, Torsten Schindler, Garrett M. Morris, Charlotte Deane, Yee Whye Teh
  • for: Accelerating the discovery of novel, more effective therapeutics with deep learning, despite the label scarcity and covariate shift that characterize real-world drug discovery tasks.
  • methods: Q-SAVI, a probabilistic model that explicitly encodes prior knowledge of the data-generating process into a prior distribution over functions, giving researchers a transparent and probabilistically principled way to encode data-driven modeling preferences.
  • results: On a challenging extrapolative evaluation built from a new gold-standard bioactivity dataset, integrating contextualized prior knowledge of drug-like chemical space with Q-SAVI yields substantial gains in predictive accuracy and calibration, outperforming a broad range of state-of-the-art self-supervised pre-training and domain adaptation techniques.
    Abstract Accelerating the discovery of novel and more effective therapeutics is an important pharmaceutical problem in which deep learning is playing an increasingly significant role. However, real-world drug discovery tasks are often characterized by a scarcity of labeled data and significant covariate shift -- a setting that poses a challenge to standard deep learning methods. In this paper, we present Q-SAVI, a probabilistic model able to address these challenges by encoding explicit prior knowledge of the data-generating process into a prior distribution over functions, presenting researchers with a transparent and probabilistically principled way to encode data-driven modeling preferences. Building on a novel, gold-standard bioactivity dataset that facilitates a meaningful comparison of models in an extrapolative regime, we explore different approaches to induce data shift and construct a challenging evaluation setup. We then demonstrate that using Q-SAVI to integrate contextualized prior knowledge of drug-like chemical space into the modeling process affords substantial gains in predictive accuracy and calibration, outperforming a broad range of state-of-the-art self-supervised pre-training and domain adaptation techniques.

Global path preference and local response: A reward decomposition approach for network path choice analysis in the presence of locally perceived attributes

  • paper_url: http://arxiv.org/abs/2307.08646
  • repo_url: None
  • paper_authors: Yuki Oyama
  • for: Analyzing the global and local path preferences of network travelers.
  • methods: A reward decomposition approach integrated into a link-based recursive (Markovian) path choice model, splitting the instantaneous reward into a global utility over attributes perceived from anywhere in the network and a local utility over attributes perceived only from the current state; the model is estimated from revealed path observations without plan information.
  • results: Applied to pedestrian path choices in an urban street network with a green view index extracted from Google Street View images, the model indicates that pedestrians perceive and react to visual street quality locally rather than holding pre-trip global perceptions; simulations highlight the importance of where interventions are located when policy-related attributes are only locally perceived.
    Abstract This study performs an attribute-level analysis of the global and local path preferences of network travelers. To this end, a reward decomposition approach is proposed and integrated into a link-based recursive (Markovian) path choice model. The approach decomposes the instantaneous reward function associated with each state-action pair into the global utility, a function of attributes globally perceived from anywhere in the network, and the local utility, a function of attributes that are only locally perceived from the current state. Only the global utility then enters the value function of each state, representing the future expected utility toward the destination. This global-local path choice model with decomposed reward functions allows us to analyze to what extent and which attributes affect the global and local path choices of agents. Moreover, unlike most adaptive path choice models, the proposed model can be estimated based on revealed path observations (without the information of plans) and as efficiently as deterministic recursive path choice models. The model was applied to the real pedestrian path choice observations in an urban street network where the green view index was extracted as a visual street quality from Google Street View images. The result revealed that pedestrians locally perceive and react to the visual street quality, rather than they have the pre-trip global perception on it. Furthermore, the simulation results using the estimated models suggested the importance of location selection of interventions when policy-related attributes are only locally perceived by travelers.

Looking deeper into interpretable deep learning in neuroimaging: a comprehensive survey

  • paper_url: http://arxiv.org/abs/2307.09615
  • repo_url: None
  • paper_authors: Md. Mahfuzur Rahman, Vince D. Calhoun, Sergey M. Plis
  • for: This paper aims to comprehensively review interpretable deep learning models in the neuroimaging domain, discussing the current status of interpretability resources, challenges, and limitations, as well as offering insights and guidance for future research directions.
  • methods: The paper focuses on interpretable deep learning models in neuroimaging, including multiple recent studies that have leveraged model interpretability to capture anatomical and functional brain alterations most relevant to model predictions.
  • results: The paper discusses the limitations of current practices and offers valuable insights and guidance for future research directions to make deep learning models substantially interpretable and advance scientific understanding of brain disorders.
    Abstract Deep learning (DL) models have been popular due to their ability to learn directly from the raw data in an end-to-end paradigm, alleviating the concern of a separate error-prone feature extraction phase. Recent DL-based neuroimaging studies have also witnessed a noticeable performance advancement over traditional machine learning algorithms. But the challenges of deep learning models still exist because of the lack of transparency in these models for their successful deployment in real-world applications. In recent years, Explainable AI (XAI) has undergone a surge of developments mainly to get intuitions of how the models reached the decisions, which is essential for safety-critical domains such as healthcare, finance, and law enforcement agencies. While the interpretability domain is advancing noticeably, researchers are still unclear about what aspect of model learning a post hoc method reveals and how to validate its reliability. This paper comprehensively reviews interpretable deep learning models in the neuroimaging domain. Firstly, we summarize the current status of interpretability resources in general, focusing on the progression of methods, associated challenges, and opinions. Secondly, we discuss how multiple recent neuroimaging studies leveraged model interpretability to capture anatomical and functional brain alterations most relevant to model predictions. Finally, we discuss the limitations of the current practices and offer some valuable insights and guidance on how we can steer our future research directions to make deep learning models substantially interpretable and thus advance scientific understanding of brain disorders.

Multi-Dimensional Ability Diagnosis for Machine Learning Algorithms

  • paper_url: http://arxiv.org/abs/2307.07134
  • repo_url: https://github.com/kellygong/camilla
  • paper_authors: Qi Liu, Zheng Gong, Zhenya Huang, Chuanren Liu, Hengshu Zhu, Zhi Li, Enhong Chen, Hui Xiong
  • for: A task-agnostic evaluation framework, Camilla, for measuring the multifaceted strengths of machine learning algorithms.
  • methods: Inspired by psychometric theory, Camilla defines a multi-dimensional Ability metric and combines cognitive diagnosis assumptions with neural networks to learn, from response logs, the interactions among algorithms, samples, and skills, jointly quantifying each algorithm's abilities and sample factors such as difficulty.
  • results: Extensive experiments with hundreds of algorithms on four public datasets show that Camilla captures the pros and cons of each algorithm more precisely and outperforms state-of-the-art baselines on metric reliability, rank consistency, and rank stability.
    Abstract Machine learning algorithms have become ubiquitous in a number of applications (e.g. image classification). However, due to the insufficient measurement of traditional metrics (e.g. the coarse-grained Accuracy of each classifier), substantial gaps are usually observed between the real-world performance of these algorithms and their scores in standardized evaluations. In this paper, inspired by the psychometric theories from human measurement, we propose a task-agnostic evaluation framework Camilla, where a multi-dimensional diagnostic metric Ability is defined for collaboratively measuring the multifaceted strength of each machine learning algorithm. Specifically, given the response logs from different algorithms to data samples, we leverage cognitive diagnosis assumptions and neural networks to learn the complex interactions among algorithms, samples and the skills (explicitly or implicitly pre-defined) of each sample. In this way, both the abilities of each algorithm on multiple skills and some of the sample factors (e.g. sample difficulty) can be simultaneously quantified. We conduct extensive experiments with hundreds of machine learning algorithms on four public datasets, and our experimental results demonstrate that Camilla not only can capture the pros and cons of each algorithm more precisely, but also outperforms state-of-the-art baselines on the metric reliability, rank consistency and rank stability.

Efficient Strongly Polynomial Algorithms for Quantile Regression

  • paper_url: http://arxiv.org/abs/2307.08706
  • repo_url: None
  • paper_authors: Suraj Shetiya, Shohedul Hasan, Abolfazl Asudeh, Gautam Das
  • for: Revisiting Quantile Regression (QR), a statistically more robust alternative to Ordinary Least Squares, and closing the gap that nearly all known QR algorithms are only weakly polynomial.
  • methods: Several efficient strongly polynomial algorithms, both deterministic and randomized; the two-dimensional case exploits a connection to the geometric concept of $k$-sets, and a randomized divide-and-conquer algorithm (RandomizedQR) handles both two-dimensional and higher-dimensional settings.
  • results: For two-dimensional QR, a deterministic algorithm with worst-case time $\mathcal{O}(n^{4/3} polylog(n))$ and a randomized variant with expected time $\mathcal{O}(n^{4/3})$; RandomizedQR runs in expected time $\mathcal{O}(n\log^2{(n)})$ in two dimensions and $\mathcal{O}(n^{d-1}\log^2{(n)})$ in $d$ dimensions.
    Abstract Linear Regression is a seminal technique in statistics and machine learning, where the objective is to build linear predictive models between a response (i.e., dependent) variable and one or more predictor (i.e., independent) variables. In this paper, we revisit the classical technique of Quantile Regression (QR), which is statistically a more robust alternative to the other classical technique of Ordinary Least Square Regression (OLS). However, while there exist efficient algorithms for OLS, almost all of the known results for QR are only weakly polynomial. Towards filling this gap, this paper proposes several efficient strongly polynomial algorithms for QR for various settings. For two dimensional QR, making a connection to the geometric concept of $k$-set, we propose an algorithm with a deterministic worst-case time complexity of $\mathcal{O}(n^{4/3} polylog(n))$ and an expected time complexity of $\mathcal{O}(n^{4/3})$ for the randomized version. We also propose a randomized divide-and-conquer algorithm -- RandomizedQR with an expected time complexity of $\mathcal{O}(n\log^2{(n)})$ for two dimensional QR problem. For the general case with more than two dimensions, our RandomizedQR algorithm has an expected time complexity of $\mathcal{O}(n^{d-1}\log^2{(n)})$.
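
The paper concerns the algorithmics of QR rather than its objective, but for readers unfamiliar with quantile regression, the sketch below illustrates the pinball (check) loss it minimizes and numerically verifies that the minimizer over a constant prediction is the empirical quantile; it does not implement the strongly polynomial algorithms above.

```python
import numpy as np

def pinball_loss(y_true, y_pred, tau):
    """Quantile (pinball/check) loss: tau*max(y-yhat,0) + (1-tau)*max(yhat-y,0)."""
    diff = y_true - y_pred
    return np.mean(np.maximum(tau * diff, (tau - 1) * diff))

# Minimizing the pinball loss over a constant prediction recovers the
# empirical tau-quantile of the sample.
rng = np.random.default_rng(0)
y = rng.normal(size=10_000)
tau = 0.9
grid = np.linspace(-3, 3, 601)
losses = [pinball_loss(y, np.full_like(y, c), tau) for c in grid]
print(grid[int(np.argmin(losses))], np.quantile(y, tau))   # both near the 0.9-quantile
```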

DataAssist: A Machine Learning Approach to Data Cleaning and Preparation

  • paper_url: http://arxiv.org/abs/2307.07119
  • repo_url: None
  • paper_authors: Kartikay Goyle, Quin Xie, Vakul Goyle
  • for: Reducing the time spent on data cleaning and wrangling, which dominates most data analysis workflows.
  • methods: DataAssist, an automated, ML-informed data preparation and cleaning platform that generates visualizations for user-selected variables, unifies data annotation, suggests anomaly removal, and preprocesses data; the exported dataset integrates readily with other AutoML tools or user-specified models.
  • results: Applicable across fields such as economics, business, and forecasting, saving over 50% of the time otherwise spent on data cleansing and preparation.
    Abstract Current automated machine learning (ML) tools are model-centric, focusing on model selection and parameter optimization. However, the majority of the time in data analysis is devoted to data cleaning and wrangling, for which limited tools are available. Here we present DataAssist, an automated data preparation and cleaning platform that enhances dataset quality using ML-informed methods. We show that DataAssist provides a pipeline for exploratory data analysis and data cleaning, including generating visualization for user-selected variables, unifying data annotation, suggesting anomaly removal, and preprocessing data. The exported dataset can be readily integrated with other autoML tools or user-specified model for downstream analysis. Our data-centric tool is applicable to a variety of fields, including economics, business, and forecasting applications saving over 50% time of the time spent on data cleansing and preparation.
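
As an illustration of the kind of anomaly-removal suggestion such a tool might surface, here is a simple IQR-based outlier flag in pandas; the rule, threshold `k`, and column names are assumptions for the sketch, not DataAssist's actual logic.

```python
import pandas as pd

def flag_iqr_outliers(df: pd.DataFrame, column: str, k: float = 1.5) -> pd.Series:
    """Flag rows whose value falls outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = df[column].quantile([0.25, 0.75])
    iqr = q3 - q1
    return (df[column] < q1 - k * iqr) | (df[column] > q3 + k * iqr)

sales = pd.DataFrame({"revenue": [120, 130, 125, 118, 122, 5000, 127]})
mask = flag_iqr_outliers(sales, "revenue")
print(sales[mask])              # the 5000 row is suggested for user review
print(sales[~mask].describe())  # summary of the cleaned column
```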

An IPW-based Unbiased Ranking Metric in Two-sided Markets

  • paper_url: http://arxiv.org/abs/2307.10204
  • repo_url: None
  • paper_authors: Keisho Oh, Naoki Nishimura, Minje Sung, Ken Kobayashi, Kazuhide Nakata
  • for: Unbiased learning-to-rank (LTR) from biased implicit feedback such as clicks in two-sided markets (e.g., job platforms or dating services), where a successful conversion requires matching preferences from both sides.
  • methods: Formulates the feedback mechanism of two-sided matching platforms, observes that implicit feedback carries position bias from both user groups, and extends Inverse Propensity Weighting (IPW) to a two-sided IPW estimator that is provably unbiased for the ground-truth ranking metric.
  • results: Numerical experiments on real-world two-sided platforms show better precision and robustness than baselines, especially when handling rare items that are seldom observed in the training data.
    Abstract In modern recommendation systems, unbiased learning-to-rank (LTR) is crucial for prioritizing items from biased implicit user feedback, such as click data. Several techniques, such as Inverse Propensity Weighting (IPW), have been proposed for single-sided markets. However, less attention has been paid to two-sided markets, such as job platforms or dating services, where successful conversions require matching preferences from both users. This paper addresses the complex interaction of biases between users in two-sided markets and proposes a tailored LTR approach. We first present a formulation of feedback mechanisms in two-sided matching platforms and point out that their implicit feedback may include position bias from both user groups. On the basis of this observation, we extend the IPW estimator and propose a new estimator, named two-sided IPW, to address the position bases in two-sided markets. We prove that the proposed estimator satisfies the unbiasedness for the ground-truth ranking metric. We conducted numerical experiments on real-world two-sided platforms and demonstrated the effectiveness of our proposed method in terms of both precision and robustness. Our experiments showed that our method outperformed baselines especially when handling rare items, which are less frequently observed in the training data.
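
A rough sketch of an IPW-corrected, DCG-style ranking metric for a two-sided setting is shown below, assuming the joint examination propensity factorizes into the two sides' propensities; the paper's exact estimator and its unbiasedness argument are not reproduced, and all names are illustrative.

```python
import numpy as np

def two_sided_ipw_dcg(clicks, prop_seeker, prop_provider, ranks):
    """IPW-corrected DCG-style metric for a two-sided market.

    clicks        : 1 if the interaction converted (both sides matched), else 0
    prop_seeker   : propensity that the seeker side examined the item at its rank
    prop_provider : propensity that the provider side examined the seeker
    ranks         : 1-based display ranks
    """
    clicks = np.asarray(clicks, dtype=float)
    weights = 1.0 / (np.asarray(prop_seeker) * np.asarray(prop_provider))
    gains = clicks * weights / np.log2(np.asarray(ranks) + 1)
    return gains.mean()

# Toy session with three displayed candidates.
print(two_sided_ipw_dcg(clicks=[1, 0, 1],
                        prop_seeker=[0.9, 0.6, 0.3],
                        prop_provider=[0.8, 0.8, 0.5],
                        ranks=[1, 2, 3]))
```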

Variance-reduced accelerated methods for decentralized stochastic double-regularized nonconvex strongly-concave minimax problems

  • paper_url: http://arxiv.org/abs/2307.07113
  • repo_url: https://github.com/rpi-opt/vrlm
  • paper_authors: Gabriel Mancino-Ball, Yangyang Xu
  • for: solves the decentralized, stochastic nonconvex strongly-concave (NCSC) minimax problem with nonsmooth regularization terms on both primal and dual variables.
  • methods: uses a Lagrangian multiplier to eliminate the consensus constraint on the dual variable, and variance-reduction (VR) techniques to achieve a sample complexity of $\mathcal{O}(\kappa^3\varepsilon^{-3})$ and a communication complexity of $\mathcal{O}(\kappa^2\varepsilon^{-2})$ under the general stochastic setting.
  • results: achieves an $\mathcal{O}(\kappa^3\varepsilon^{-3})$ sample complexity and $\mathcal{O}(\kappa^2\varepsilon^{-2})$ communication complexity under the general stochastic setting, and an $\mathcal{O}(n + \sqrt{n} \kappa^2\varepsilon^{-2})$ sample complexity and $\mathcal{O}(\kappa^2\varepsilon^{-2})$ communication complexity under the special finite-sum setting, which matches the best-known results achieved by a few existing methods for solving special cases of the problem.
    Abstract In this paper, we consider the decentralized, stochastic nonconvex strongly-concave (NCSC) minimax problem with nonsmooth regularization terms on both primal and dual variables, wherein a network of $m$ computing agents collaborate via peer-to-peer communications. We consider when the coupling function is in expectation or finite-sum form and the double regularizers are convex functions, applied separately to the primal and dual variables. Our algorithmic framework introduces a Lagrangian multiplier to eliminate the consensus constraint on the dual variable. Coupling this with variance-reduction (VR) techniques, our proposed method, entitled VRLM, by a single neighbor communication per iteration, is able to achieve an $\mathcal{O}(\kappa^3\varepsilon^{-3})$ sample complexity under the general stochastic setting, with either a big-batch or small-batch VR option, where $\kappa$ is the condition number of the problem and $\varepsilon$ is the desired solution accuracy. With a big-batch VR, we can additionally achieve $\mathcal{O}(\kappa^2\varepsilon^{-2})$ communication complexity. Under the special finite-sum setting, our method with a big-batch VR can achieve an $\mathcal{O}(n + \sqrt{n} \kappa^2\varepsilon^{-2})$ sample complexity and $\mathcal{O}(\kappa^2\varepsilon^{-2})$ communication complexity, where $n$ is the number of components in the finite sum. All complexity results match the best-known results achieved by a few existing methods for solving special cases of the problem we consider. To the best of our knowledge, this is the first work which provides convergence guarantees for NCSC minimax problems with general convex nonsmooth regularizers applied to both the primal and dual variables in the decentralized stochastic setting. Numerical experiments are conducted on two machine learning problems. Our code is downloadable from https://github.com/RPI-OPT/VRLM.

Graph Positional and Structural Encoder

  • paper_url: http://arxiv.org/abs/2307.07107
  • repo_url: https://github.com/g-taxonomy-workgroup/gpse
  • paper_authors: Renming Liu, Semih Cantürk, Olivier Lapointe-Gagné, Vincent Létourneau, Guy Wolf, Dominique Beaini, Ladislav Rampášek
  • for: Learning positional and structural encodings (PSEs) that can augment any GNN and improve performance across a wide range of graph prediction tasks.
  • methods: The Graph Positional and Structural Encoder (GPSE), a graph encoder trained to capture a common latent representation for multiple PSEs; the trained encoder is highly transferable to datasets drawn from significantly different distributions and even modalities.
  • results: GPSE-enhanced models significantly improve performance on certain benchmarks and perform on par with models that use explicitly computed PSEs elsewhere, positioning GPSE as a viable alternative to explicitly computed PSEs and to existing self-supervised pre-training approaches.
    Abstract Positional and structural encodings (PSE) enable better identifiability of nodes within a graph, as in general graphs lack a canonical node ordering. This renders PSEs essential tools for empowering modern GNNs, and in particular graph Transformers. However, designing PSEs that work optimally for a variety of graph prediction tasks is a challenging and unsolved problem. Here, we present the graph positional and structural encoder (GPSE), a first-ever attempt to train a graph encoder that captures rich PSE representations for augmenting any GNN. GPSE can effectively learn a common latent representation for multiple PSEs, and is highly transferable. The encoder trained on a particular graph dataset can be used effectively on datasets drawn from significantly different distributions and even modalities. We show that across a wide range of benchmarks, GPSE-enhanced models can significantly improve the performance in certain tasks, while performing on par with those that employ explicitly computed PSEs in other cases. Our results pave the way for the development of large pre-trained models for extracting graph positional and structural information and highlight their potential as a viable alternative to explicitly computed PSEs as well as to existing self-supervised pre-training approaches.
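
For context on what a PSE is, the sketch below computes a classic explicitly computed encoding, Laplacian eigenvector positional features, which is the kind of signal GPSE learns to produce rather than compute; it is not GPSE itself, and the choice of k and the normalization are illustrative.

```python
import numpy as np

def laplacian_positional_encoding(adj: np.ndarray, k: int) -> np.ndarray:
    """k non-trivial eigenvectors of the symmetric normalized graph Laplacian,
    used as per-node positional features (a standard explicitly computed PSE)."""
    deg = adj.sum(axis=1)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(deg, 1e-12)))
    lap = np.eye(len(adj)) - d_inv_sqrt @ adj @ d_inv_sqrt
    eigvals, eigvecs = np.linalg.eigh(lap)          # eigenvalues in ascending order
    return eigvecs[:, 1:k + 1]                      # skip the trivial eigenvector

# 4-node cycle graph: each node gets a 2-dimensional positional feature.
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
print(laplacian_positional_encoding(A, k=2))
```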

MaxCorrMGNN: A Multi-Graph Neural Network Framework for Generalized Multimodal Fusion of Medical Data for Outcome Prediction

  • paper_url: http://arxiv.org/abs/2307.07093
  • repo_url: None
  • paper_authors: Niharika S. D’Souza, Hongzhi Wang, Andrea Giovannini, Antonio Foncubierta-Rodriguez, Kristen L. Beck, Orest Boyko, Tanveer Syeda-Mahmood
  • for: Outcome prediction from multimodal medical data spanning clinical, imaging, and genomic modalities.
  • methods: MaxCorr MGNN, which models non-linear modality correlations within and across patients through Hirschfeld-Gebelein-Renyi maximal correlation (MaxCorr) embeddings, yielding a multi-layered patient-modality graph, and reasons over it with a generalized multi-layered graph neural network (MGNN) trained end to end.
  • results: On a Tuberculosis (TB) dataset, the approach consistently outperforms several state-of-the-art neural, graph-based, and traditional fusion methods for outcome prediction.
    Abstract With the emergence of multimodal electronic health records, the evidence for an outcome may be captured across multiple modalities ranging from clinical to imaging and genomic data. Predicting outcomes effectively requires fusion frameworks capable of modeling fine-grained and multi-faceted complex interactions between modality features within and across patients. We develop an innovative fusion approach called MaxCorr MGNN that models non-linear modality correlations within and across patients through Hirschfeld-Gebelein-Renyi maximal correlation (MaxCorr) embeddings, resulting in a multi-layered graph that preserves the identities of the modalities and patients. We then design, for the first time, a generalized multi-layered graph neural network (MGNN) for task-informed reasoning in multi-layered graphs, that learns the parameters defining patient-modality graph connectivity and message passing in an end-to-end fashion. We evaluate our model an outcome prediction task on a Tuberculosis (TB) dataset consistently outperforming several state-of-the-art neural, graph-based and traditional fusion techniques.

Robotic Manipulation Datasets for Offline Compositional Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2307.07091
  • repo_url: https://github.com/lifelong-ml/offline-compositional-rl-datasets
  • paper_authors: Marcel Hussing, Jorge A. Mendez, Anisha Singrodia, Cassandra Kent, Eric Eaton
  • for: Advancing offline reinforcement learning (RL) by providing large-scale datasets that let agents pre-train without the recurring cost of data collection.
  • methods: Compositional RL, which generates many tasks from few components so that learned components can be recombined to solve new tasks; four datasets of 256 million transitions each are collected from agents with different performance levels on the 256 CompoSuite tasks, together with training and evaluation settings for assessing compositional task policies.
  • results: Benchmarking shows that current offline RL methods learn the training tasks to some extent and that compositional methods significantly outperform non-compositional ones, but existing methods still fail to extract the tasks' compositional structure well enough to generalize to unseen tasks.
    Abstract Offline reinforcement learning (RL) is a promising direction that allows RL agents to pre-train on large datasets, avoiding the recurrence of expensive data collection. To advance the field, it is crucial to generate large-scale datasets. Compositional RL is particularly appealing for generating such large datasets, since 1) it permits creating many tasks from few components, 2) the task structure may enable trained agents to solve new tasks by combining relevant learned components, and 3) the compositional dimensions provide a notion of task relatedness. This paper provides four offline RL datasets for simulated robotic manipulation created using the 256 tasks from CompoSuite [Mendez et al., 2022a]. Each dataset is collected from an agent with a different degree of performance, and consists of 256 million transitions. We provide training and evaluation settings for assessing an agent's ability to learn compositional task policies. Our benchmarking experiments on each setting show that current offline RL methods can learn the training tasks to some extent and that compositional methods significantly outperform non-compositional methods. However, current methods are still unable to extract the tasks' compositional structure to generalize to unseen tasks, showing a need for further research in offline compositional RL.

Choice Models and Permutation Invariance

  • paper_url: http://arxiv.org/abs/2307.07090
  • repo_url: None
  • paper_authors: Amandeep Singh, Ye Liu, Hema Yoganarasimhan
  • for: A fundamental characterization of choice functions that encompasses a wide variety of existing choice models in economics, operations, and marketing.
  • methods: Nonparametric estimators such as neural networks that approximate the characterized functionals while overcoming the curse of dimensionality; the framework is extended to estimation under endogenous features and includes a formal inference procedure for confidence intervals on quantities such as price elasticity.
  • results: Extensive simulations show the proposed functionals flexibly capture underlying consumer behavior in a fully data-driven fashion and outperform traditional parametric models; an empirical analysis of the Berry, Levinsohn, and Pakes (1995) data yields realistic own- and cross-price elasticities consistent with the existing literature.
    Abstract Choice Modeling is at the core of many economics, operations, and marketing problems. In this paper, we propose a fundamental characterization of choice functions that encompasses a wide variety of extant choice models. We demonstrate how nonparametric estimators like neural nets can easily approximate such functionals and overcome the curse of dimensionality that is inherent in the non-parametric estimation of choice functions. We demonstrate through extensive simulations that our proposed functionals can flexibly capture underlying consumer behavior in a completely data-driven fashion and outperform traditional parametric models. As demand settings often exhibit endogenous features, we extend our framework to incorporate estimation under endogenous features. Further, we also describe a formal inference procedure to construct valid confidence intervals on objects of interest like price elasticity. Finally, to assess the practical applicability of our estimator, we utilize a real-world dataset from S. Berry, Levinsohn, and Pakes (1995). Our empirical analysis confirms that the estimator generates realistic and comparable own- and cross-price elasticities that are consistent with the observations reported in the existing literature.

Safe Reinforcement Learning as Wasserstein Variational Inference: Formal Methods for Interpretability

  • paper_url: http://arxiv.org/abs/2307.07084
  • repo_url: None
  • paper_authors: Yanran Wang, David Boyle
  • for: Sequential decision-making under variable dynamics, where interpreting the reward function and the corresponding optimal policy remains a persistent challenge.
  • methods: Adaptive Wasserstein Variational Optimization (AWaVO), which casts sequential decision-making as probabilistic inference and uses formal methods to provide interpretations of reward design, transparency of training convergence, and a probabilistic interpretation of sequential decisions.
  • results: Demonstrates convergent training with guaranteed global convergence rates in simulation and on real robot tasks, and empirically verifies a reasonable trade-off between high performance and conservative interpretability.
    Abstract Reinforcement Learning or optimal control can provide effective reasoning for sequential decision-making problems with variable dynamics. Such reasoning in practical implementation, however, poses a persistent challenge in interpreting the reward function and corresponding optimal policy. Consequently, formalizing the sequential decision-making problems as inference has a considerable value, as probabilistic inference in principle offers diverse and powerful mathematical tools to infer the stochastic dynamics whilst suggesting a probabilistic interpretation of the reward design and policy convergence. In this study, we propose a novel Adaptive Wasserstein Variational Optimization (AWaVO) to tackle these challenges in sequential decision-making. Our approach utilizes formal methods to provide interpretations of reward design, transparency of training convergence, and probabilistic interpretation of sequential decisions. To demonstrate practicality, we show convergent training with guaranteed global convergence rates not only in simulation but also in real robot tasks, and empirically verify a reasonable tradeoff between high performance and conservative interpretability.

A Scenario-Based Functional Testing Approach to Improving DNN Performance

  • paper_url: http://arxiv.org/abs/2307.07083
  • repo_url: None
  • paper_authors: Hong Zhu, Thi Minh Tam Tran, Aduen Benjumea, Andrew Bradley
  • for: Improving the performance of machine learning (ML) applications, demonstrated on the perception DNN of an autonomous racing car.
  • methods: An iterative scenario-based functional testing loop: test the model across scenarios to identify weaknesses, statistically confirm the suspected weak scenarios, retrain via transfer learning on data targeting those scenarios plus a random subset of the original training data to prevent catastrophic forgetting, and re-test to check effectiveness and side effects.
  • results: A case study on a real DNN perception model shows the method improves performance while requiring far less human and compute resource than retraining from scratch.
    Abstract This paper proposes a scenario-based functional testing approach for enhancing the performance of machine learning (ML) applications. The proposed method is an iterative process that starts with testing the ML model on various scenarios to identify areas of weakness. It follows by a further testing on the suspected weak scenarios and statistically evaluate the model's performance on the scenarios to confirm the diagnosis. Once the diagnosis of weak scenarios is confirmed by test results, the treatment of the model is performed by retraining the model using a transfer learning technique with the original model as the base and applying a set of training data specifically targeting the treated scenarios plus a subset of training data selected at random from the original train dataset to prevent the so-call catastrophic forgetting effect. Finally, after the treatment, the model is assessed and evaluated again by testing on the treated scenarios as well as other scenarios to check if the treatment is effective and no side effect caused. The paper reports a case study with a real ML deep neural network (DNN) model, which is the perception system of an autonomous racing car. It is demonstrated that the method is effective in the sense that DNN model's performance can be improved. It provides an efficient method of enhancing ML model's performance with much less human and compute resource than retrain from scratch.
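
A minimal sketch of the "identify weak scenarios" step is given below, using a simple accuracy-gap rule as a stand-in for the statistical evaluation the paper performs before treating a scenario; scenario names, thresholds, and data are invented for illustration.

```python
import numpy as np

def weak_scenarios(per_scenario_results, margin=0.05):
    """Flag scenarios whose accuracy falls clearly below the overall accuracy.

    per_scenario_results maps a scenario name to a list of 0/1 test outcomes.
    """
    overall = np.mean([o for outcomes in per_scenario_results.values() for o in outcomes])
    flagged = {}
    for name, outcomes in per_scenario_results.items():
        acc = float(np.mean(outcomes))
        if acc < overall - margin:
            flagged[name] = acc
    return overall, flagged

results = {
    "dry_daylight": [1] * 95 + [0] * 5,
    "wet_night":    [1] * 70 + [0] * 30,   # suspected weak scenario
    "glare":        [1] * 88 + [0] * 12,
}
overall, weak = weak_scenarios(results)
print(round(overall, 3), weak)             # wet_night is flagged for treatment
```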

Kernel t-distributed stochastic neighbor embedding

  • paper_url: http://arxiv.org/abs/2307.07081
  • repo_url: https://github.com/DanShai/kernalized-tsne
  • paper_authors: Denis C. Ilie-Ablachim, Bogdan Dumitrescu, Cristian Rusu
  • for: A kernelized version of t-SNE that maps high-dimensional data to a low-dimensional space while preserving pairwise distances under a non-Euclidean metric.
  • methods: Applies the kernel trick either only in the high-dimensional space or in both spaces, the latter yielding an end-to-end kernelized version of t-SNE.
  • results: On several datasets the kernelized version produces neater clustering of points from different classes, which can improve performance and accuracy in applications such as classification with kernel methods.
    Abstract This paper presents a kernelized version of the t-SNE algorithm, capable of mapping high-dimensional data to a low-dimensional space while preserving the pairwise distances between the data points in a non-Euclidean metric. This can be achieved using a kernel trick only in the high dimensional space or in both spaces, leading to an end-to-end kernelized version. The proposed kernelized version of the t-SNE algorithm can offer new views on the relationships between data points, which can improve performance and accuracy in particular applications, such as classification problems involving kernel methods. The differences between t-SNE and its kernelized version are illustrated for several datasets, showing a neater clustering of points belonging to different classes.
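
One way to approximate kernelizing only the high-dimensional side is to feed kernel-induced distances to standard t-SNE, as sketched below with an RBF kernel; the gamma, perplexity, and the use of scikit-learn's precomputed-metric TSNE are assumptions of this sketch, and the paper's end-to-end kernelized variant goes further than this.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.manifold import TSNE
from sklearn.metrics.pairwise import rbf_kernel

# Kernel-induced squared distance: d^2(x, y) = k(x,x) + k(y,y) - 2 k(x,y).
X = load_iris().data
K = rbf_kernel(X, gamma=0.5)                        # illustrative kernel and gamma
diag = np.diag(K)
D = np.sqrt(np.maximum(diag[:, None] + diag[None, :] - 2.0 * K, 0.0))

# Standard t-SNE on the kernel-induced distances (high-dimensional side only).
emb = TSNE(n_components=2, metric="precomputed", init="random",
           perplexity=30, random_state=0).fit_transform(D)
print(emb.shape)                                    # (150, 2) embedding
```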

Unsupervised Learning of Distributional Properties can Supplement Human Labeling and Increase Active Learning Efficiency in Anomaly Detection

  • paper_url: http://arxiv.org/abs/2307.08782
  • repo_url: None
  • paper_authors: Jaturong Kongmanee, Mark Chignell, Khilan Jerath, Abhay Raman
  • for: Detecting data exfiltration via email, a serious cybersecurity threat where anomaly detection typically requires human labeling to keep false alarms manageable.
  • methods: An adaptive Active Learning (AL) sampling strategy that leverages the underlying prior data distribution and model uncertainty to assemble labeling batches containing instances of rare anomalies; unsupervised anomaly detection helps build the classifier in the early stages, when little labeling has been done.
  • results: The strategy outperforms existing AL approaches on three highly unbalanced UCI benchmarks and on a real-world redacted email dataset.
    Abstract Exfiltration of data via email is a serious cybersecurity threat for many organizations. Detecting data exfiltration (anomaly) patterns typically requires labeling, most often done by a human annotator, to reduce the high number of false alarms. Active Learning (AL) is a promising approach for labeling data efficiently, but it needs to choose an efficient order in which cases are to be labeled, and there are uncertainties as to what scoring procedure should be used to prioritize cases for labeling, especially when detecting rare cases of interest is crucial. We propose an adaptive AL sampling strategy that leverages the underlying prior data distribution, as well as model uncertainty, to produce batches of cases to be labeled that contain instances of rare anomalies. We show that (1) the classifier benefits from a batch of representative and informative instances of both normal and anomalous examples, (2) unsupervised anomaly detection plays a useful role in building the classifier in the early stages of training when relatively little labeling has been done thus far. Our approach to AL for anomaly detection outperformed existing AL approaches on three highly unbalanced UCI benchmarks and on one real-world redacted email data set.
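
The sketch below illustrates one plausible way to mix classifier uncertainty with an unsupervised anomaly score when assembling a labeling batch, using IsolationForest and logistic regression as stand-ins; the mixing rule, `alpha`, and the models are assumptions rather than the paper's scoring procedure.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.linear_model import LogisticRegression

def select_labeling_batch(X_unlabeled, clf, anomaly_scores, batch_size=10, alpha=0.5):
    """Rank unlabeled cases by a mix of classifier uncertainty and anomaly score,
    then request labels for the top batch."""
    proba = clf.predict_proba(X_unlabeled)[:, 1]
    uncertainty = 1.0 - np.abs(proba - 0.5) * 2.0          # 1 at p=0.5, 0 at p in {0,1}
    score = alpha * uncertainty + (1 - alpha) * anomaly_scores
    return np.argsort(score)[::-1][:batch_size]

rng = np.random.default_rng(0)
X_lab = rng.normal(size=(100, 5)); y_lab = (X_lab[:, 0] > 1.0).astype(int)
X_unl = rng.normal(size=(500, 5))

iso = IsolationForest(random_state=0).fit(X_unl)
anom = -iso.score_samples(X_unl)                           # higher = more anomalous
anom = (anom - anom.min()) / (anom.max() - anom.min())     # rescale to [0, 1]

clf = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)
print(select_labeling_batch(X_unl, clf, anom, batch_size=5))
```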

CaRT: Certified Safety and Robust Tracking in Learning-based Motion Planning for Multi-Agent Systems

  • paper_url: http://arxiv.org/abs/2307.08602
  • repo_url: None
  • paper_authors: Hiroyasu Tsukamoto, Benjamin Rivière, Changrak Choi, Amir Rahmani, Soon-Jo Chung
  • for: guaranteeing the safety and robustness of learning-based motion planning policies in nonlinear multi-agent systems
  • methods: analytical form of the CaRT safety/robust filter, which uses contraction theory to ensure safety and exponential boundedness of the trajectory tracking error, and a log-barrier formulation for distributed implementation in multi-agent settings
  • results: effectiveness of CaRT in several examples of nonlinear motion planning and control problems, including optimal, multi-spacecraft reconfiguration
    Abstract The key innovation of our analytical method, CaRT, lies in establishing a new hierarchical, distributed architecture to guarantee the safety and robustness of a given learning-based motion planning policy. First, in a nominal setting, the analytical form of our CaRT safety filter formally ensures safe maneuvers of nonlinear multi-agent systems, optimally with minimal deviation from the learning-based policy. Second, in off-nominal settings, the analytical form of our CaRT robust filter optimally tracks the certified safe trajectory, generated by the previous layer in the hierarchy, the CaRT safety filter. We show using contraction theory that CaRT guarantees safety and the exponential boundedness of the trajectory tracking error, even under the presence of deterministic and stochastic disturbance. Also, the hierarchical nature of CaRT enables enhancing its robustness for safety just by its superior tracking to the certified safe trajectory, thereby making it suitable for off-nominal scenarios with large disturbances. This is a major distinction from conventional safety function-driven approaches, where the robustness originates from the stability of a safe set, which could pull the system over-conservatively to the interior of the safe set. Our log-barrier formulation in CaRT allows for its distributed implementation in multi-agent settings. We demonstrate the effectiveness of CaRT in several examples of nonlinear motion planning and control problems, including optimal, multi-spacecraft reconfiguration.
    摘要 我们的分析方法 CaRT 的关键创新在于建立了一种新的分层、分布式架构,以保证给定的基于学习的运动规划策略的安全性与鲁棒性。首先,在标称(nominal)设定下,CaRT 安全滤波器的解析形式能够形式化地保证非线性多智能体系统的安全机动,并在与基于学习的策略偏差最小的意义下达到最优。其次,在非标称设定下,CaRT 鲁棒滤波器的解析形式能够最优地跟踪由上一层(即 CaRT 安全滤波器)生成的经认证的安全轨迹。我们利用收缩理论证明,即使存在确定性和随机扰动,CaRT 仍能保证安全性以及轨迹跟踪误差的指数有界性。此外,CaRT 的分层结构使其仅凭对认证安全轨迹的出色跟踪即可增强面向安全的鲁棒性,因而适用于扰动较大的非标称场景。这与传统的基于安全函数的方法有本质区别:后者的鲁棒性来源于安全集的稳定性,可能过于保守地将系统拉向安全集内部。CaRT 中的对数障碍(log-barrier)形式使其可以在多智能体设定下分布式实现。我们在若干非线性运动规划与控制问题(包括最优的多航天器重构)中展示了 CaRT 的有效性。

Rician likelihood loss for quantitative MRI using self-supervised deep learning

  • paper_url: http://arxiv.org/abs/2307.07072
  • repo_url: None
  • paper_authors: Christopher S. Parker, Anna Schroder, Sean C. Epstein, James Cole, Daniel C. Alexander, Hui Zhang
  • for: 提高量化MR成像技术中参数估计的准确性和稳定性
  • methods: 提出了negative log Rician likelihood(NLR)损失函数,并实现了其 numerically stable和准确的计算方法
  • results: 对于 Apparent Diffusion Coefficient(ADC)和Intra-voxel Incoherent Motion(IVIM)分布模型中的参数估计,Networks trained with NLR loss show higher estimation accuracy than MSE as SNR decreases, with minimal loss of precision or total error.
    Abstract Purpose: Previous quantitative MR imaging studies using self-supervised deep learning have reported biased parameter estimates at low SNR. Such systematic errors arise from the choice of Mean Squared Error (MSE) loss function for network training, which is incompatible with Rician-distributed MR magnitude signals. To address this issue, we introduce the negative log Rician likelihood (NLR) loss. Methods: A numerically stable and accurate implementation of the NLR loss was developed to estimate quantitative parameters of the apparent diffusion coefficient (ADC) model and intra-voxel incoherent motion (IVIM) model. Parameter estimation accuracy, precision and overall error were evaluated in terms of bias, variance and root mean squared error and compared against the MSE loss over a range of SNRs (5 - 30). Results: Networks trained with NLR loss show higher estimation accuracy than MSE for the ADC and IVIM diffusion coefficients as SNR decreases, with minimal loss of precision or total error. At high effective SNR (high SNR and small diffusion coefficients), both losses show comparable accuracy and precision for all parameters of both models. Conclusion: The proposed NLR loss is numerically stable and accurate across the full range of tested SNRs and improves parameter estimation accuracy of diffusion coefficients using self-supervised deep learning. We expect the development to benefit quantitative MR imaging techniques broadly, enabling more accurate parameter estimation from noisy data.
    摘要 目的:先前使用自监督深度学习的定量 MR 成像研究发现,在低信噪比(SNR)下参数估计存在偏差。这种系统性误差源于网络训练所用的均方误差(MSE)损失函数,它与服从 Rician 分布的 MR 幅值信号不兼容。为解决这一问题,我们引入了负对数 Rician 似然(NLR)损失。方法:我们开发了 NLR 损失的数值稳定且精确的实现,用于估计表观扩散系数(ADC)模型和体素内不相干运动(IVIM)模型的定量参数。我们以偏差、方差和均方根误差衡量参数估计的准确度、精密度和总体误差,并在 5-30 的 SNR 范围内与 MSE 损失进行比较。结果:随着 SNR 降低,使用 NLR 损失训练的网络对 ADC 和 IVIM 扩散系数的估计准确度高于 MSE,且精密度或总体误差几乎没有损失。在高有效 SNR(高 SNR 且扩散系数较小)时,两种损失对两个模型所有参数的准确度和精密度相当。结论:所提出的 NLR 损失在全部测试 SNR 范围内数值稳定且精确,并能提高自监督深度学习对扩散系数的参数估计准确度。我们预计这一进展将广泛惠及定量 MR 成像技术,使得从含噪数据中得到更准确的参数估计。
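For readers who want the mechanics, here is a minimal NumPy/SciPy sketch of a negative log Rician likelihood with the usual numerical stabilization via the exponentially scaled Bessel function (log I0(x) = log i0e(x) + x). The fixed, known noise level `sigma` is an assumption; the paper's exact parameterization and implementation may differ.

```python
import numpy as np
from scipy.special import i0e  # exponentially scaled modified Bessel function I0

def negative_log_rician_likelihood(m, v, sigma):
    """Mean negative log-likelihood of Rician-distributed magnitudes m,
    given predicted noise-free signal v and (assumed known) noise level sigma.

    Rician pdf: p(m) = m/sigma^2 * exp(-(m^2 + v^2) / (2 sigma^2)) * I0(m v / sigma^2)
    Stability:  log I0(x) = log(i0e(x)) + x  avoids overflow of I0 at high SNR.
    """
    m, v = np.asarray(m, float), np.asarray(v, float)
    x = m * v / sigma**2
    log_i0 = np.log(i0e(x)) + x
    nll = -np.log(m / sigma**2) + (m**2 + v**2) / (2 * sigma**2) - log_i0
    return nll.mean()
```

In a self-supervised fitting loop this term would replace the usual MSE between measured magnitudes and the model-predicted signal.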

Proof of Training (PoT): Harnessing Crypto Mining Power for Distributed AI Training

  • paper_url: http://arxiv.org/abs/2307.07066
  • repo_url: https://github.com/p-how/proof-of-training
  • paper_authors: Peihao Li
  • for: bridging the gap between artificial intelligence (AI) and crypto mining
  • methods: utilizes practical Byzantine fault tolerance (PBFT) consensus mechanism, decentralized training network (DTN)
  • results: considerable potential in terms of task throughput, system robustness, and network security
    Abstract In the midst of the emerging trend of integrating artificial intelligence (AI) with crypto mining, we identify three major challenges that create a gap between these two fields. To bridge this gap, we introduce the proof-of-training (PoT) protocol, an approach that combines the strengths of both AI and blockchain technology. The PoT protocol utilizes the practical Byzantine fault tolerance (PBFT) consensus mechanism to synchronize global states. To evaluate the performance of the protocol design, we present an implementation of a decentralized training network (DTN) that adopts the PoT protocol. Our results indicate that the protocol exhibits considerable potential in terms of task throughput, system robustness, and network security.
    摘要 在人工智能(AI)与加密货币挖矿相融合的新兴趋势中,我们识别出造成这两个领域之间差距的三大挑战。为弥合这一差距,我们提出了训练证明(PoT)协议,它结合了 AI 与区块链技术的优势。PoT 协议利用实用拜占庭容错(PBFT)共识机制来同步全局状态。为评估该协议设计的性能,我们给出了一个采用 PoT 协议的去中心化训练网络(DTN)实现。结果表明,该协议在任务吞吐量、系统鲁棒性和网络安全性方面具有可观的潜力。

Bootstrapping Vision-Language Learning with Decoupled Language Pre-training

  • paper_url: http://arxiv.org/abs/2307.07063
  • repo_url: https://github.com/yiren-jian/blitext
  • paper_authors: Yiren Jian, Chongyang Gao, Soroush Vosoughi
  • for: 优化大型语言模型(LLM)在资源充足的视觉语言(VL)预训练中的应用。
  • methods: 提出了一种新的方法,即通过预测最佳提示来匹配语言特征,而不是通过视觉特征来导引语言模型。引入了一种新的模型——Prompt-Transformer(P-Former),该模型通过只在语言数据上训练来预测最佳提示。
  • results: 对一种robust image-to-text基eline(BLIP-2)进行了改进,并将模型训练集的数据量从4M变为129M,显著提高了模型的性能。此外,模型在不同的基模块和视频学习任务中也表现出了高效性。
    Abstract We present a novel methodology aimed at optimizing the application of frozen large language models (LLMs) for resource-intensive vision-language (VL) pre-training. The current paradigm uses visual features as prompts to guide language models, with a focus on determining the most relevant visual features for corresponding text. Our approach diverges by concentrating on the language component, specifically identifying the optimal prompts to align with visual features. We introduce the Prompt-Transformer (P-Former), a model that predicts these ideal prompts, which is trained exclusively on linguistic data, bypassing the need for image-text pairings. This strategy subtly bifurcates the end-to-end VL training process into an additional, separate stage. Our experiments reveal that our framework significantly enhances the performance of a robust image-to-text baseline (BLIP-2), and effectively narrows the performance gap between models trained with either 4M or 129M image-text pairs. Importantly, our framework is modality-agnostic and flexible in terms of architectural design, as validated by its successful application in a video learning task using varied base modules. The code is available at https://github.com/yiren-jian/BLIText
    摘要 我们提出了一种新方法,旨在优化冻结的大型语言模型(LLM)在资源密集的视觉语言(VL)预训练中的应用。当前范式使用视觉特征作为提示来引导语言模型,重点在于确定与对应文本最相关的视觉特征。我们的方法则专注于语言部分,即找出与视觉特征对齐的最优提示。我们提出了提示变换器(P-Former)模型来预测这些理想提示,它仅在语言数据上训练,无需图文配对。这一策略巧妙地将端到端的 VL 训练过程拆分出一个额外的独立阶段。实验表明,我们的框架显著提升了强大的图像到文本基线(BLIP-2)的性能,并有效缩小了使用 4M 与 129M 图文对训练的模型之间的性能差距。重要的是,我们的框架与模态无关,且在架构设计上十分灵活,这一点已通过其在使用不同基础模块的视频学习任务中的成功应用得到验证。代码可在 https://github.com/yiren-jian/BLIText 获取。

Controllable Emphasis with zero data for text-to-speech

  • paper_url: http://arxiv.org/abs/2307.07062
  • repo_url: None
  • paper_authors: Arnaud Joly, Marco Nicolis, Ekaterina Peterova, Alessandro Lombardi, Ammar Abbas, Arent van Korlaar, Aman Hussain, Parul Sharma, Alexis Moinet, Mateusz Lajszczak, Penny Karanasou, Antonio Bonafonte, Thomas Drugman, Elena Sokolova
  • for: 这项研究的目的是开发一种可扩展的文本到语音转换(TTS)技术,不需要录音或标注。
  • methods: 这种技术使用了一种简单 yet effective的方法,即通过增加预测的强调词语的持续时间来实现强调语音。
  • results: 对比spectrogram修改技术,这种方法可以提高自然性的提升率达7.3%,并在测试 sentence中提高correct identifier的率达40%。此外,这种技术还可以适用于不同的语言(英语、西班牙语、意大利语、德语)、不同的voice和多种说话风格。
    Abstract We present a scalable method to produce high quality emphasis for text-to-speech (TTS) that does not require recordings or annotations. Many TTS models include a phoneme duration model. A simple but effective method to achieve emphasized speech consists in increasing the predicted duration of the emphasised word. We show that this is significantly better than spectrogram modification techniques improving naturalness by $7.3\%$ and correct testers' identification of the emphasized word in a sentence by $40\%$ on a reference female en-US voice. We show that this technique significantly closes the gap to methods that require explicit recordings. The method proved to be scalable and preferred in all four languages tested (English, Spanish, Italian, German), for different voices and multiple speaking styles.
    摘要 我们提出了一种可扩展的方法,无需录音或标注即可为文本到语音(TTS)生成高质量的重音(强调)。许多 TTS 模型都包含音素时长模型。一种简单而有效的实现强调语音的方法,是增加被强调单词的预测时长。我们表明,这种方法显著优于频谱图修改技术:在参考的美式英语女声上,自然度提升了 $7.3\%$,测试者正确识别句中被强调单词的比例提升了 $40\%$。该技术显著缩小了与需要显式录音的方法之间的差距。该方法具有可扩展性,并在全部四种测试语言(英语、西班牙语、意大利语、德语)、不同音色和多种说话风格中都更受青睐。
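The core trick, lengthening the predicted duration of the emphasized word, is simple enough to sketch. The frame-based representation and the scaling factor below are illustrative assumptions, not values taken from the paper.

```python
def apply_emphasis(durations, word_ids, emphasized_word, scale=1.5):
    """Lengthen predicted phoneme durations (e.g., in frames) for the phonemes
    belonging to the emphasized word; all other phonemes are left untouched.
    `scale` is a hypothetical factor, not taken from the paper."""
    return [d * scale if w == emphasized_word else d
            for d, w in zip(durations, word_ids)]

# Example: phoneme durations for "the CAT sat", emphasizing word index 1.
durations = [5, 8, 7, 6, 9, 4]
word_ids  = [0, 1, 1, 1, 2, 2]
print(apply_emphasis(durations, word_ids, emphasized_word=1))
# -> [5, 12.0, 10.5, 9.0, 9, 4]
```

Because the change happens in the duration model's output rather than in the spectrogram, no emphasis recordings or annotations are needed, which is what makes the approach scalable across voices and languages.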

Corticomorphic Hybrid CNN-SNN Architecture for EEG-based Low-footprint Low-latency Auditory Attention Detection

  • paper_url: http://arxiv.org/abs/2307.08501
  • repo_url: None
  • paper_authors: Richard Gall, Deniz Kocanaogullari, Murat Akcakaya, Deniz Erdogmus, Rajkumar Kubendran
  • for: 这个研究旨在开发一个基于电生物学的听力注意力检测系统,以便在听觉装置上进行轻量级计算。
  • methods: 这个研究使用了一种受听觉皮层启发的混合卷积神经网络-脉冲神经网络(CNN-SNN)架构,利用多说话者的语音包络成功解码听觉注意力,并具有低延迟(1秒)、高准确率(91.03%)和低计算开销的优点。
  • results: 这个研究获得了较高的准确率(91.03%)和较低的延迟(1秒),并且使用了少约15%的参数和更低的位精度,使内存占用减少了57%。
    Abstract In a multi-speaker "cocktail party" scenario, a listener can selectively attend to a speaker of interest. Studies into the human auditory attention network demonstrate cortical entrainment to speech envelopes resulting in highly correlated Electroencephalography (EEG) measurements. Current trends in EEG-based auditory attention detection (AAD) using artificial neural networks (ANN) are not practical for edge-computing platforms due to longer decision windows using several EEG channels, with higher power consumption and larger memory footprint requirements. Nor are ANNs capable of accurately modeling the brain's top-down attention network since the cortical organization is complex and layer. In this paper, we propose a hybrid convolutional neural network-spiking neural network (CNN-SNN) corticomorphic architecture, inspired by the auditory cortex, which uses EEG data along with multi-speaker speech envelopes to successfully decode auditory attention with low latency down to 1 second, using only 8 EEG electrodes strategically placed close to the auditory cortex, at a significantly higher accuracy of 91.03%, compared to the state-of-the-art. Simultaneously, when compared to a traditional CNN reference model, our model uses ~15% fewer parameters at a lower bit precision resulting in ~57% memory footprint reduction. The results show great promise for edge-computing in brain-embedded devices, like smart hearing aids.
    摘要 在多说话者的“鸡尾酒会”场景中,听者可以选择性地关注感兴趣的说话者。对人类听觉注意网络的研究表明,皮层会对语音包络产生同步,从而产生高度相关的脑电(EEG)测量信号。当前基于人工神经网络(ANN)的 EEG 听觉注意检测(AAD)方法并不适合边缘计算平台,因为它们需要较长的决策窗口和多个 EEG 通道,功耗更高、内存占用更大。此外,由于皮层组织复杂且分层,ANN 也难以准确建模大脑自上而下的注意网络。本文提出了一种受听觉皮层启发的混合卷积神经网络-脉冲神经网络(CNN-SNN)类皮层架构,利用 EEG 数据和多说话者语音包络成功解码听觉注意力,延迟低至 1 秒,仅使用策略性放置在听觉皮层附近的 8 个 EEG 电极,准确率达到 91.03%,显著高于当前最优方法。同时,与传统 CNN 参考模型相比,我们的模型参数减少约 15%,位精度更低,内存占用减少约 57%。这些结果表明该方法在智能助听器等脑嵌入式设备的边缘计算中具有很大潜力。

Reward-Directed Conditional Diffusion: Provable Distribution Estimation and Reward Improvement

  • paper_url: http://arxiv.org/abs/2307.07055
  • repo_url: None
  • paper_authors: Hui Yuan, Kaixuan Huang, Chengzhuo Ni, Minshuo Chen, Mengdi Wang
  • for: 这个研究旨在探讨受赏导向生成的方法论和模型,具有广泛应用在生成AI、增强学习和计算生物学等领域。
  • methods: 我们的方法是使用 conditional diffusion models,并在小规模的数据集上学习伪标签。从理论上评估,这个受赏导向生成器可以有效地学习和抽取受赏条件的数据分布。此外,我们还证明了模型可以重建数据集的隐藏空间表示。
  • results: 我们的实验结果显示,这个受赏导向生成器可以将新的人造数据集传递到使用者指定的目标受赏值附近,并且这个改善的受赏与数据分布的迁移程度有关。此外,我们还发现了干扰因素之间的交互作用,包括受赏信号的强度、数据分布的变化和外支援抽象的成本。
    Abstract We explore the methodology and theory of reward-directed generation via conditional diffusion models. Directed generation aims to generate samples with desired properties as measured by a reward function, which has broad applications in generative AI, reinforcement learning, and computational biology. We consider the common learning scenario where the data set consists of unlabeled data along with a smaller set of data with noisy reward labels. Our approach leverages a learned reward function on the smaller data set as a pseudolabeler. From a theoretical standpoint, we show that this directed generator can effectively learn and sample from the reward-conditioned data distribution. Additionally, our model is capable of recovering the latent subspace representation of data. Moreover, we establish that the model generates a new population that moves closer to a user-specified target reward value, where the optimality gap aligns with the off-policy bandit regret in the feature subspace. The improvement in rewards obtained is influenced by the interplay between the strength of the reward signal, the distribution shift, and the cost of off-support extrapolation. We provide empirical results to validate our theory and highlight the relationship between the strength of extrapolation and the quality of generated samples.
    摘要 我们研究了基于条件扩散模型的奖励导向生成的方法与理论。奖励导向生成的目标是生成在奖励函数度量下具有期望属性的样本,在生成式 AI、强化学习和计算生物学中有广泛应用。我们考虑常见的学习场景:数据集由无标签数据和一小部分带噪声奖励标签的数据组成。我们的方法在较小的数据集上学习奖励函数,并将其用作伪标注器。从理论上看,这种导向生成器能够有效地学习并从奖励条件下的数据分布中采样;此外,模型还能恢复数据的潜在子空间表示。我们进一步证明,模型生成的新样本群体会向用户指定的目标奖励值靠拢,其最优性差距与特征子空间中的离线(off-policy)赌博机遗憾相一致。奖励的提升幅度受奖励信号强度、分布偏移以及支撑集外外推代价三者相互作用的影响。我们给出的实验结果验证了理论,并强调了外推强度与生成样本质量之间的关系。

Making the Most Out of the Limited Context Length: Predictive Power Varies with Clinical Note Type and Note Section

  • paper_url: http://arxiv.org/abs/2307.07051
  • repo_url: None
  • paper_authors: Hongyi Zheng, Yixin Zhu, Lavender Yao Jiang, Kyunghyun Cho, Eric Karl Oermann
  • for: 这篇论文旨在探讨在医疗纪录中使用自然语言处理,特别是针对长时间的临床护理纪录。
  • methods: 论文使用了大型语言模型,并提出了一个框架来分析临床护理纪录中的predictive power。
  • results: 研究结果显示,临床护理纪录中的predictive power分布不同,对于护士笔记和释出笔记有不同的特征。此外,结合不同类型的护理纪录可以在长时间上提高性能。
    Abstract Recent advances in large language models have led to renewed interest in natural language processing in healthcare using the free text of clinical notes. One distinguishing characteristic of clinical notes is their long time span over multiple long documents. The unique structure of clinical notes creates a new design choice: when the context length for a language model predictor is limited, which part of clinical notes should we choose as the input? Existing studies either choose the inputs with domain knowledge or simply truncate them. We propose a framework to analyze the sections with high predictive power. Using MIMIC-III, we show that: 1) predictive power distribution is different between nursing notes and discharge notes and 2) combining different types of notes could improve performance when the context length is large. Our findings suggest that a carefully selected sampling function could enable more efficient information extraction from clinical notes.
    摘要 大型语言模型的最新进展重新激发了利用临床笔记自由文本进行医疗自然语言处理的兴趣。临床笔记的一个显著特点是其时间跨度长、由多篇长文档构成。这种独特结构带来了新的设计选择:当语言模型预测器的上下文长度受限时,应选择临床笔记的哪一部分作为输入?现有研究要么依据领域知识选择输入,要么简单地截断。我们提出了一个框架来分析具有高预测力的笔记部分。基于 MIMIC-III,我们发现:1)护理记录与出院记录的预测力分布不同;2)当上下文长度较大时,组合不同类型的笔记可以提升性能。我们的发现表明,精心设计的采样函数能够更高效地从临床笔记中提取信息。
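As a toy illustration of the kind of "carefully selected sampling function" the abstract alludes to, the sketch below scores note sections and greedily fills a fixed context budget with the highest-scoring ones. The section names, token counts, scores, and budget are all hypothetical; the paper measures predictive power empirically on MIMIC-III rather than using hand-set scores.

```python
def fill_context(sections, scores, token_budget):
    """Greedily pick note sections in descending score order until the
    context-length budget is exhausted. `sections` maps name -> token count;
    `scores` maps name -> (hypothetical) predictive-power score."""
    chosen, used = [], 0
    for name in sorted(sections, key=lambda s: scores.get(s, 0.0), reverse=True):
        if used + sections[name] <= token_budget:
            chosen.append(name)
            used += sections[name]
    return chosen

# Hypothetical section lengths (tokens) and predictive-power scores.
sections = {"chief_complaint": 60, "history": 900, "medications": 300, "plan": 500}
scores   = {"chief_complaint": 0.9, "history": 0.6, "medications": 0.4, "plan": 0.8}
print(fill_context(sections, scores, token_budget=1024))
# -> ['chief_complaint', 'plan', 'medications']
```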

AnyStar: Domain randomized universal star-convex 3D instance segmentation

  • paper_url: http://arxiv.org/abs/2307.07044
  • repo_url: https://github.com/neel-dey/anystar
  • paper_authors: Neel Dey, S. Mazdak Abulnaga, Benjamin Billot, Esra Abaci Turk, P. Ellen Grant, Adrian V. Dalca, Polina Golland
  • for: 这个论文是用于解决 bio-微型scopy 和 radiology 中的星形结构分类问题,而且不需要大量的手动标注数据。
  • methods: 这个论文使用了域随机生成模型,将星形物体的Randomized appearance、环境和摄影物理传入给生成模型,以训练通用的星形结构分类网络。
  • results: 这个论文的网络可以对不同的数据集和传感器模式进行3D星形结构分类,并且不需要任何再训练、调整或域对应。
    Abstract Star-convex shapes arise across bio-microscopy and radiology in the form of nuclei, nodules, metastases, and other units. Existing instance segmentation networks for such structures train on densely labeled instances for each dataset, which requires substantial and often impractical manual annotation effort. Further, significant reengineering or finetuning is needed when presented with new datasets and imaging modalities due to changes in contrast, shape, orientation, resolution, and density. We present AnyStar, a domain-randomized generative model that simulates synthetic training data of blob-like objects with randomized appearance, environments, and imaging physics to train general-purpose star-convex instance segmentation networks. As a result, networks trained using our generative model do not require annotated images from unseen datasets. A single network trained on our synthesized data accurately 3D segments C. elegans and P. dumerilii nuclei in fluorescence microscopy, mouse cortical nuclei in micro-CT, zebrafish brain nuclei in EM, and placental cotyledons in human fetal MRI, all without any retraining, finetuning, transfer learning, or domain adaptation. Code is available at https://github.com/neel-dey/AnyStar.
    摘要 星形凸(star-convex)结构广泛出现在生物显微成像与放射学中,例如细胞核、结节、转移灶等单元。现有针对此类结构的实例分割网络需要为每个数据集提供密集标注的实例进行训练,这需要大量且往往不切实际的人工标注工作。此外,面对新的数据集和成像模态时,由于对比度、形状、方向、分辨率和密度的变化,还需要进行大量的重新设计或微调。我们提出 AnyStar,一种域随机化的生成模型,通过模拟外观、环境和成像物理均随机化的斑块状目标来合成训练数据,从而训练通用的星形凸实例分割网络。因此,使用我们生成的数据训练的网络无需来自未见数据集的标注图像。单个在合成数据上训练的网络即可准确地对荧光显微镜中的 C. elegans 和 P. dumerilii 细胞核、micro-CT 中的小鼠皮层细胞核、电镜(EM)中的斑马鱼脑细胞核以及人类胎儿 MRI 中的胎盘绒毛叶进行 3D 分割,且无需任何再训练、微调、迁移学习或域适应。代码见 https://github.com/neel-dey/AnyStar。

Tapestry of Time and Actions: Modeling Human Activity Sequences using Temporal Point Process Flows

  • paper_url: http://arxiv.org/abs/2307.10305
  • repo_url: None
  • paper_authors: Vinayak Gupta, Srikanta Bedathur
  • for: 本研究旨在理解人类活动序列中的动态,以便进行Activity Length Prediction、Goal Prediction和Next Action Recommendation等下游任务。
  • methods: 该研究提出了ProActive模型,基于神经网络和 temporal marked temporal point process(MTPP)框架,可以同时解决下游任务中的下一个动作预测、序列目标预测和端到端序列生成等问题。
  • results: 对于三个活动识别数据集的测试,ProActive模型表现出了明显的性能提升,包括动作和目标预测等,同时也实现了端到端序列生成的首次应用。
    Abstract Human beings always engage in a vast range of activities and tasks that demonstrate their ability to adapt to different scenarios. Any human activity can be represented as a temporal sequence of actions performed to achieve a certain goal. Unlike the time series datasets extracted from electronics or machines, these action sequences are highly disparate in their nature -- the time to finish a sequence of actions can vary between different persons. Therefore, understanding the dynamics of these sequences is essential for many downstream tasks such as activity length prediction, goal prediction, next action recommendation, etc. Existing neural network-based approaches that learn a continuous-time activity sequence (or CTAS) are limited to the presence of only visual data or are designed specifically for a particular task, i.e., limited to next action or goal prediction. In this paper, we present ProActive, a neural marked temporal point process (MTPP) framework for modeling the continuous-time distribution of actions in an activity sequence while simultaneously addressing three high-impact problems -- next action prediction, sequence-goal prediction, and end-to-end sequence generation. Specifically, we utilize a self-attention module with temporal normalizing flows to model the influence and the inter-arrival times between actions in a sequence. In addition, we propose a novel addition over the ProActive model that can handle variations in the order of actions, i.e., different methods of achieving a given goal. We demonstrate that this variant can learn the order in which the person or actor prefers to do their actions. Extensive experiments on sequences derived from three activity recognition datasets show the significant accuracy boost of ProActive over the state-of-the-art in terms of action and goal prediction, and the first-ever application of end-to-end action sequence generation.
    摘要 人类总是参与各种各样的活动和任务,展现出适应不同场景的能力。任何人类活动都可以表示为为达成某一目标而执行的动作时间序列。与从电子设备或机器中提取的时间序列数据不同,这些动作序列本质上高度异质:完成同一动作序列所需的时间在不同个体之间差异很大。因此,理解这些序列的动态对许多下游任务(如活动时长预测、目标预测、下一动作推荐等)至关重要。现有基于神经网络的连续时间活动序列(CTAS)学习方法要么仅适用于视觉数据,要么只针对特定任务(如下一动作或目标预测)设计。本文提出 ProActive,一种基于神经标记时间点过程(MTPP)的框架,用于建模活动序列中动作的连续时间分布,并同时解决三个高影响力问题:下一动作预测、序列目标预测和端到端序列生成。具体而言,我们使用带时间归一化流的自注意力模块来建模序列中动作之间的影响与间隔时间。此外,我们在 ProActive 之上提出了一个新的变体,能够处理动作顺序的变化,即实现同一目标的不同方式,并可学习行为者偏好的动作顺序。在三个活动识别数据集导出的序列上进行的大量实验表明,ProActive 在动作与目标预测方面相比当前最优方法有显著的准确率提升,并首次实现了端到端动作序列生成。

Accelerated gradient methods for nonconvex optimization: Escape trajectories from strict saddle points and convergence to local minima

  • paper_url: http://arxiv.org/abs/2307.07030
  • repo_url: None
  • paper_authors: Rishabh Dixit, Mert Gurbuzbalaban, Waheed U. Bajwa
  • for: 本研究探讨一般加速梯度方法在光滑非凸函数上的行为。
  • methods: 本文提出了一类广义的 Nesterov 型加速方法,并通过渐近与非渐近分析对这类方法进行了严格研究,包括从严格鞍点的逃逸以及向局部极小值的收敛。
  • results: 本文回答了 Nesterov 加速梯度法(NAG)在动量参数随时间变化时能否几乎必然避开严格鞍点这一公开问题,并提出了两种渐近收敛/发散速率度量,用以评估若干常用加速方法(如 NAG 和 NCM)在严格鞍点附近的行为。此外,本文还给出了这些方法的轨迹在严格鞍点邻域内的“线性”退出时间估计,以及此类轨迹存在的必要条件。最后,本文研究了一类加速方法,它们能够在非凸函数的凸邻域内以近乎最优的速率收敛到局部极小值,同时具有比 NAG 更好的鞍点逃逸行为。
    Abstract This paper considers the problem of understanding the behavior of a general class of accelerated gradient methods on smooth nonconvex functions. Motivated by some recent works that have proposed effective algorithms, based on Polyak's heavy ball method and the Nesterov accelerated gradient method, to achieve convergence to a local minimum of nonconvex functions, this work proposes a broad class of Nesterov-type accelerated methods and puts forth a rigorous study of these methods encompassing the escape from saddle-points and convergence to local minima through a both asymptotic and a non-asymptotic analysis. In the asymptotic regime, this paper answers an open question of whether Nesterov's accelerated gradient method (NAG) with variable momentum parameter avoids strict saddle points almost surely. This work also develops two metrics of asymptotic rate of convergence and divergence, and evaluates these two metrics for several popular standard accelerated methods such as the NAG, and Nesterov's accelerated gradient with constant momentum (NCM) near strict saddle points. In the local regime, this work provides an analysis that leads to the "linear" exit time estimates from strict saddle neighborhoods for trajectories of these accelerated methods as well the necessary conditions for the existence of such trajectories. Finally, this work studies a sub-class of accelerated methods that can converge in convex neighborhoods of nonconvex functions with a near optimal rate to a local minima and at the same time this sub-class offers superior saddle-escape behavior compared to that of NAG.
    摘要 In the asymptotic regime, the paper answers an open question about whether Nesterov's accelerated gradient method (NAG) with a variable momentum parameter avoids strict saddle points almost surely. The study also develops two metrics of asymptotic rate of convergence and divergence, and evaluates these metrics for several popular standard accelerated methods, including NAG and Nesterov's accelerated gradient with constant momentum (NCM), near strict saddle points.In the local regime, the paper provides an analysis that leads to "linear" exit time estimates from strict saddle neighborhoods for trajectories of these accelerated methods, as well as the necessary conditions for the existence of such trajectories.Finally, the paper studies a sub-class of accelerated methods that can converge in convex neighborhoods of nonconvex functions with a near-optimal rate to a local minimum, while also exhibiting superior saddle-escape behavior compared to NAG.
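For concreteness, here is a minimal NumPy sketch of Nesterov's accelerated gradient with the standard time-varying momentum parameter beta_t = (t - 1) / (t + 2), the kind of variable-momentum NAG whose saddle behavior the paper analyzes. The step size, iteration count, and the toy test function are illustrative assumptions.

```python
import numpy as np

def nag_variable_momentum(grad, x0, lr=1e-2, iters=5000):
    """Nesterov accelerated gradient with momentum beta_t = (t - 1) / (t + 2)."""
    x_prev = np.array(x0, dtype=float)
    y = x_prev.copy()
    for t in range(1, iters + 1):
        x = y - lr * grad(y)            # gradient step at the look-ahead point
        beta = (t - 1) / (t + 2)        # time-varying momentum parameter
        y = x + beta * (x - x_prev)     # momentum extrapolation
        x_prev = x
    return x_prev

# Toy nonconvex function with a strict saddle at (0, 0) and minima at (+/-1, 0):
# f(x, y) = 0.25 * (x^2 - 1)^2 + 0.5 * y^2, so grad f = (x (x^2 - 1), y).
grad = lambda v: np.array([v[0] * (v[0]**2 - 1), v[1]])
print(nag_variable_momentum(grad, x0=[1e-3, 0.5]))  # should settle near the local minimum (1, 0)
```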

Multi-Player Zero-Sum Markov Games with Networked Separable Interactions

  • paper_url: http://arxiv.org/abs/2307.09470
  • repo_url: None
  • paper_authors: Chanwoo Park, Kaiqing Zhang, Asuman Ozdaglar
  • for: 模型了多个自主决策者之间的非合作多人决策问题,使用了多个零额game的网络化分解结构。
  • methods: 提出了一种新的多个零额game模型(MZNMG),并证明了这种模型下的Markov极值 equilibria(MNE)的存在和唯一性。
  • results: 证明了在 infinitem-horizon 折扣MZNMG中找到Markov stationary NE是PPAD困难的,除非网络具有星形结构。此外,提出了一种基于 fictitious-play 的动力学,并证明了其在星形网络上的收敛性。
    Abstract We study a new class of Markov games (MGs), \textit{Multi-player Zero-sum Markov Games} with {\it Networked separable interactions} (MZNMGs), to model the local interaction structure in non-cooperative multi-agent sequential decision-making. We define an MZNMG as a model where {the payoffs of the auxiliary games associated with each state are zero-sum and} have some separable (i.e., polymatrix) structure across the neighbors over some interaction network. We first identify the necessary and sufficient conditions under which an MG can be presented as an MZNMG, and show that the set of Markov coarse correlated equilibrium (CCE) collapses to the set of Markov Nash equilibrium (NE) in these games, in that the {product of} per-state marginalization of the former for all players yields the latter. Furthermore, we show that finding approximate Markov \emph{stationary} CCE in infinite-horizon discounted MZNMGs is \texttt{PPAD}-hard, unless the underlying network has a ``star topology''. Then, we propose fictitious-play-type dynamics, the classical learning dynamics in normal-form games, for MZNMGs, and establish convergence guarantees to Markov stationary NE under a star-shaped network structure. Finally, in light of the hardness result, we focus on computing a Markov \emph{non-stationary} NE and provide finite-iteration guarantees for a series of value-iteration-based algorithms. We also provide numerical experiments to corroborate our theoretical results.
    摘要 我们研究一类新的马尔可夫博弈(MG),即具有网络化可分交互的多人零和马尔可夫博弈(MZNMG),用于建模非合作多智能体序贯决策中的局部交互结构。在 MZNMG 中,与每个状态相关联的辅助博弈的收益是零和的,并且在交互网络的邻居之间具有可分(即 polymatrix)结构。我们首先给出了一个 MG 可以表示为 MZNMG 的充要条件,并证明在这类博弈中马尔可夫粗相关均衡(CCE)的集合会坍缩到马尔可夫纳什均衡(NE)的集合:对所有玩家按状态边缘化后的乘积即为后者。此外,我们证明在无限时域折扣 MZNMG 中寻找近似的马尔可夫平稳 CCE 是 PPAD 困难的,除非网络具有“星形拓扑”。随后,我们为 MZNMG 提出了虚拟对局(fictitious play)类型的动力学,即标准型博弈中的经典学习动力学,并证明其在星形网络结构下收敛到马尔可夫平稳 NE。鉴于上述困难性结果,我们进一步关注计算马尔可夫非平稳 NE,并为一系列基于值迭代的算法给出了有限迭代保证。我们还提供了数值实验来佐证理论结果。

Multi-view self-supervised learning for multivariate variable-channel time series

  • paper_url: http://arxiv.org/abs/2307.09614
  • repo_url: https://github.com/theabrusch/multiview_ts_ssl
  • paper_authors: Thea Brüsch, Mikkel N. Schmidt, Tommy S. Alstrøm
  • for: 这篇论文是针对多重生物医疗时间序列数据进行标签,并且解决了大量、昂贵的标签数据问题。
  • methods: 本文提出了一种自我超级学习对称学习方法,通过预训练在无标签数据上,以便不需要大量、昂贵的标签数据。但是,多重时间序列数据中的输入通道集通常在应用中会改变,而现有的方法不能将数据集转换到不同的输入通道集。因此,我们提出了一个将单一encoder套用到所有输入通道上的方法。然后,我们使用一个传递讯息神经网络将各个输入通道的内容整合成单一的表示。
  • results: 我们透过将模型预训练在六个EEG通道上,然后精致化在两个不同的EEG通道上,并与和 без传递讯息神经网络的模型进行比较。我们发现,我们的方法,结合TS2Vec损失函数,在大多数情况下都能够超过其他方法的表现。
    Abstract Labeling of multivariate biomedical time series data is a laborious and expensive process. Self-supervised contrastive learning alleviates the need for large, labeled datasets through pretraining on unlabeled data. However, for multivariate time series data, the set of input channels often varies between applications, and most existing work does not allow for transfer between datasets with different sets of input channels. We propose learning one encoder to operate on all input channels individually. We then use a message passing neural network to extract a single representation across channels. We demonstrate the potential of this method by pretraining our model on a dataset with six EEG channels and then fine-tuning it on a dataset with two different EEG channels. We compare models with and without the message passing neural network across different contrastive loss functions. We show that our method, combined with the TS2Vec loss, outperforms all other methods in most settings.
    摘要 多变量生物医学时间序列数据的标注是一个费时费力且成本高昂的过程。自监督对比学习通过在无标签数据上进行预训练,缓解了对大规模标注数据集的需求。然而,多变量时间序列数据的输入通道集合往往因应用而异,现有的大多数方法无法在输入通道集合不同的数据集之间迁移。我们提出学习一个对所有输入通道逐一作用的编码器,然后使用消息传递神经网络将各通道信息提取为单一表示。我们在一个包含六个 EEG 通道的数据集上预训练模型,再在一个包含两个不同 EEG 通道的数据集上微调,以展示该方法的潜力。我们在不同的对比损失函数下比较了带有与不带消息传递神经网络的模型,结果表明,我们的方法结合 TS2Vec 损失,在大多数设置下优于所有其他方法。
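A minimal PyTorch sketch of the general idea: one shared encoder applied to every channel independently, followed by a permutation-invariant aggregation across channels so the same model handles datasets with different channel sets. The simple mean-pooling aggregator below stands in for the paper's message passing neural network, and all layer sizes are assumptions.

```python
import torch
import torch.nn as nn

class ChannelAgnosticEncoder(nn.Module):
    """Apply one shared 1-D conv encoder to each channel of a (batch, C, T)
    time series, then aggregate channel embeddings into a single vector.
    Mean pooling is a simplified stand-in for the paper's message passing network."""
    def __init__(self, hidden=64, emb_dim=128):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(1, hidden, kernel_size=7, padding=3), nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel_size=7, padding=3), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(), nn.Linear(hidden, emb_dim),
        )

    def forward(self, x):                         # x: (batch, channels, time)
        b, c, t = x.shape
        z = self.encoder(x.reshape(b * c, 1, t))  # encode each channel on its own
        z = z.reshape(b, c, -1)
        return z.mean(dim=1)                      # aggregate over a variable channel set

emb = ChannelAgnosticEncoder()(torch.randn(4, 6, 256))  # works for any channel count
print(emb.shape)  # torch.Size([4, 128])
```

Because nothing in the forward pass depends on the number of channels, a model pretrained on six EEG channels can be fine-tuned on two different channels without architectural changes.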

Near-Optimal Bounds for Learning Gaussian Halfspaces with Random Classification Noise

  • paper_url: http://arxiv.org/abs/2307.08438
  • repo_url: None
  • paper_authors: Ilias Diakonikolas, Jelena Diakonikolas, Daniel M. Kane, Puqian Wang, Nikos Zarifis
  • for: 学习通用半空间(不一定是同种)的Random Classification Noise问题。
  • methods: 提出了一种 computationally efficient 的学习算法,以及nearly-matching 的 statistically query 下界结果。
  • results: 学习问题的样本复杂度为 $\widetilde{\Theta}(d/\epsilon)$,其中 $d$ 是维度和 $\epsilon$ 是过度误差。
    Abstract We study the problem of learning general (i.e., not necessarily homogeneous) halfspaces with Random Classification Noise under the Gaussian distribution. We establish nearly-matching algorithmic and Statistical Query (SQ) lower bound results revealing a surprising information-computation gap for this basic problem. Specifically, the sample complexity of this learning problem is $\widetilde{\Theta}(d/\epsilon)$, where $d$ is the dimension and $\epsilon$ is the excess error. Our positive result is a computationally efficient learning algorithm with sample complexity $\tilde{O}(d/\epsilon + d/(\max\{p, \epsilon\})^2)$, where $p$ quantifies the bias of the target halfspace. On the lower bound side, we show that any efficient SQ algorithm (or low-degree test) for the problem requires sample complexity at least $\Omega(d^{1/2}/(\max\{p, \epsilon\})^2)$. Our lower bound suggests that this quadratic dependence on $1/\epsilon$ is inherent for efficient algorithms.
    摘要 我们研究在高斯分布下带随机分类噪声学习一般(即不一定过原点)半空间的问题。我们建立了几乎匹配的算法上界与统计查询(SQ)下界,揭示了这一基本问题令人惊讶的信息-计算差距。具体而言,该学习问题的样本复杂度为 $\widetilde{\Theta}(d/\epsilon)$,其中 $d$ 为维度,$\epsilon$ 为超额误差。我们的正面结果是一个计算高效的学习算法,其样本复杂度为 $\tilde{O}(d/\epsilon + d/(\max\{p, \epsilon\})^2)$,其中 $p$ 刻画目标半空间的偏置。在下界方面,我们证明任何高效的 SQ 算法(或低次检验)都需要至少 $\Omega(d^{1/2}/(\max\{p, \epsilon\})^2)$ 的样本复杂度。该下界表明,对 $1/\epsilon$ 的二次依赖对于高效算法而言是内在的。

Retrieving Continuous Time Event Sequences using Neural Temporal Point Processes with Learnable Hashing

  • paper_url: http://arxiv.org/abs/2307.09613
  • repo_url: None
  • paper_authors: Vinayak Gupta, Srikanta Bedathur, Abir De
  • for: 这篇论文是设计用于搜寻和推断时间序列资料(CTES)的框架,以提高CTES Retrieval的精度和效率。
  • methods: 本论文使用了具有标记的时间点 проце数(MTPP)的predictive modeling,并开发了四种不同的对应模型,以满足不同应用的要求。
  • results: 实验结果显示,NeuroSeqRet框架可以提供 significanly 高的准确率和效率,并且可以适应不同的应用需求。
    Abstract Temporal sequences have become pervasive in various real-world applications. Consequently, the volume of data generated in the form of continuous time-event sequence(s) or CTES(s) has increased exponentially in the past few years. Thus, a significant fraction of the ongoing research on CTES datasets involves designing models to address downstream tasks such as next-event prediction, long-term forecasting, sequence classification etc. The recent developments in predictive modeling using marked temporal point processes (MTPP) have enabled an accurate characterization of several real-world applications involving the CTESs. However, due to the complex nature of these CTES datasets, the task of large-scale retrieval of temporal sequences has been overlooked by the past literature. In detail, by CTES retrieval we mean that for an input query sequence, a retrieval system must return a ranked list of relevant sequences from a large corpus. To tackle this, we propose NeuroSeqRet, a first-of-its-kind framework designed specifically for end-to-end CTES retrieval. Specifically, NeuroSeqRet introduces multiple enhancements over standard retrieval frameworks and first applies a trainable unwarping function on the query sequence which makes it comparable with corpus sequences, especially when a relevant query-corpus pair has individually different attributes. Next, it feeds the unwarped query sequence and the corpus sequence into MTPP-guided neural relevance models. We develop four variants of the relevance model for different kinds of applications based on the trade-off between accuracy and efficiency. We also propose an optimization framework to learn binary sequence embeddings from the relevance scores, suitable for the locality-sensitive hashing. Our experiments show the significant accuracy boost of NeuroSeqRet as well as the efficacy of our hashing mechanism.
    摘要 时间序列在各类现实应用中已无处不在,因此近年来以连续时间事件序列(CTES)形式产生的数据量呈指数增长。当前围绕 CTES 数据集的研究,很大一部分致力于为下一事件预测、长期预测、序列分类等下游任务设计模型。近期基于标记时间点过程(MTPP)的预测建模进展,使得对多种涉及 CTES 的现实应用的准确刻画成为可能。然而,由于 CTES 数据集的复杂性,大规模时间序列检索这一任务被以往文献所忽视。具体而言,CTES 检索是指:给定一个查询序列,检索系统须从大规模语料库中返回按相关性排序的序列列表。为此,我们提出 NeuroSeqRet,这是首个专为端到端 CTES 检索设计的框架。NeuroSeqRet 在标准检索框架之上引入了多项改进:它首先对查询序列应用一个可学习的解扭(unwarping)函数,使其可与语料库序列进行比较,尤其是当相关的查询-语料对在各自属性上存在差异时;随后,它将解扭后的查询序列与语料库序列一同输入由 MTPP 引导的神经相关性模型。针对不同类型的应用,我们基于准确率与效率之间的权衡开发了四种相关性模型变体。我们还提出了一个优化框架,用于从相关性分数中学习适用于局部敏感哈希的二值序列嵌入。实验表明,NeuroSeqRet 带来了显著的准确率提升,同时验证了我们哈希机制的有效性。

Student Assessment in Cybersecurity Training Automated by Pattern Mining and Clustering

  • paper_url: http://arxiv.org/abs/2307.10260
  • repo_url: None
  • paper_authors: Valdemar Švábenský, Jan Vykopal, Pavel Čeleda, Kristián Tkáčik, Daniel Popovič
  • for: 这篇论文旨在描述一种基于数据挖掘和机器学习技术的cybersecurity培训数据分析方法,以帮助教育研究人员和实践者更好地评估学生和专业人员在培训过程中的学习进度和问题。
  • methods: 这篇论文使用了数据挖掘和机器学习技术来分析18个cybersecurity培训 sessio中的113名学生所输入的8834个命令,揭示了学生们的常见行为、错误、解决方案和培训阶段的困难。
  • results: 研究发现,数据挖掘和机器学习技术是分析网络安全培训数据的有效方法,可以帮助教育研究人员和实践者评估学生的学习进度,提供有针对性的支持,并改进培训设计。
    Abstract Hands-on cybersecurity training allows students and professionals to practice various tools and improve their technical skills. The training occurs in an interactive learning environment that enables completing sophisticated tasks in full-fledged operating systems, networks, and applications. During the training, the learning environment allows collecting data about trainees' interactions with the environment, such as their usage of command-line tools. These data contain patterns indicative of trainees' learning processes, and revealing them allows to assess the trainees and provide feedback to help them learn. However, automated analysis of these data is challenging. The training tasks feature complex problem-solving, and many different solution approaches are possible. Moreover, the trainees generate vast amounts of interaction data. This paper explores a dataset from 18 cybersecurity training sessions using data mining and machine learning techniques. We employed pattern mining and clustering to analyze 8834 commands collected from 113 trainees, revealing their typical behavior, mistakes, solution strategies, and difficult training stages. Pattern mining proved suitable in capturing timing information and tool usage frequency. Clustering underlined that many trainees often face the same issues, which can be addressed by targeted scaffolding. Our results show that data mining methods are suitable for analyzing cybersecurity training data. Educational researchers and practitioners can apply these methods in their contexts to assess trainees, support them, and improve the training design. Artifacts associated with this research are publicly available.
    摘要 动手实践式的网络安全培训可以帮助学生和从业人员练习各种工具并提升技术能力。这种培训在交互式学习环境中进行,学员可以在完整的操作系统、网络和应用程序中完成复杂任务。培训期间,学习环境可以收集学员与环境交互的数据,例如命令行工具的使用情况。这些数据中蕴含着反映学员学习过程的模式,揭示这些模式有助于评估学员并提供反馈以帮助其学习。然而,自动分析这些数据具有挑战性:培训任务涉及复杂的问题求解,存在多种不同的解决思路,而且学员会产生海量的交互数据。本文使用数据挖掘和机器学习技术,对 18 场网络安全培训中 113 名学员输入的 8834 条命令进行了分析,揭示了他们的典型行为、错误、解题策略以及培训中的困难阶段。模式挖掘适合捕捉时间信息和工具使用频率;聚类则表明许多学员经常面临相同的问题,而这些问题可以通过有针对性的辅助(scaffolding)加以解决。我们的结果表明,数据挖掘方法适用于分析网络安全培训数据。教育研究者和实践者可以在自身场景中应用这些方法来评估学员、为其提供支持并改进培训设计。与本研究相关的材料已公开发布。
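As a rough illustration of the clustering side of such an analysis (not the authors' pipeline), one can vectorize each trainee's command history and cluster trainees to surface groups who struggle in similar ways. The TF-IDF representation, the toy command strings, and the cluster count are assumptions for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

# Each string is one trainee's command history, concatenated (hypothetical data).
histories = [
    "nmap -sV 10.0.0.5 ; msfconsole ; exploit",
    "nmap 10.0.0.5 ; nmap -sV 10.0.0.5 ; msfconsole",
    "ls ; cd /tmp ; cat notes.txt ; man nmap",
    "man nmap ; ls ; cat hints.txt",
]

# Tokenize on whitespace/semicolons so flags and tool names become features.
X = TfidfVectorizer(token_pattern=r"[^\s;]+").fit_transform(histories)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)  # trainees with similar tool usage fall into the same cluster
```

The pattern-mining side of the paper additionally uses timing information and tool-usage frequency, which a bag-of-commands view like this deliberately ignores.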

Leveraging Factored Action Spaces for Off-Policy Evaluation

  • paper_url: http://arxiv.org/abs/2307.07014
  • repo_url: https://github.com/ai4ai-lab/factored-action-spaces-for-ope
  • paper_authors: Aaman Rebello, Shengpu Tang, Jenna Wiens, Sonali Parbhoo
  • for: 这篇论文目的是为了估计对于执行不同的动作序列而言,对于实际执行的数据进行评估。
  • methods: 该论文使用了分解动作空间的方法,即将每个动作表示为多个独立的子动作,从更小的动作空间中选择。这种方法使得对于不同的动作的影响进行更细致的分析。
  • results: 该论文提出了一类基于分解动作空间的重要性采样(IS)估计器,并证明在具有大规模组合动作空间的问题中,这类估计器的方差低于未分解版本,同时保持无偏性。论文还通过仿真实验验证了这些理论结论。
    Abstract Off-policy evaluation (OPE) aims to estimate the benefit of following a counterfactual sequence of actions, given data collected from executed sequences. However, existing OPE estimators often exhibit high bias and high variance in problems involving large, combinatorial action spaces. We investigate how to mitigate this issue using factored action spaces i.e. expressing each action as a combination of independent sub-actions from smaller action spaces. This approach facilitates a finer-grained analysis of how actions differ in their effects. In this work, we propose a new family of "decomposed" importance sampling (IS) estimators based on factored action spaces. Given certain assumptions on the underlying problem structure, we prove that the decomposed IS estimators have less variance than their original non-decomposed versions, while preserving the property of zero bias. Through simulations, we empirically verify our theoretical results, probing the validity of various assumptions. Provided with a technique that can derive the action space factorisation for a given problem, our work shows that OPE can be improved "for free" by utilising this inherent problem structure.
    摘要 离线策略评估(OPE)旨在利用已执行动作序列所收集的数据,估计遵循某一反事实动作序列所能带来的收益。然而,在动作空间庞大且具有组合结构的问题中,现有的 OPE 估计器往往表现出高偏差和高方差。我们研究如何利用分解动作空间(即把每个动作表示为来自更小动作空间的若干独立子动作的组合)来缓解这一问题,这种做法有助于对各动作效果差异进行更细粒度的分析。本文提出了一族基于分解动作空间的“分解式”重要性采样(IS)估计器。在对问题底层结构的特定假设下,我们证明分解式 IS 估计器的方差低于原始未分解版本,同时保持无偏性。通过仿真实验,我们验证了理论结果,并考察了各项假设的有效性。只要能为给定问题导出动作空间的分解方式,我们的工作表明,利用这种问题固有结构即可“免费”地改进 OPE。
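To make the factorization concrete: when both the behavior and evaluation policies factor across sub-actions, the per-trajectory importance weight is a product of per-sub-action ratios. The sketch below shows only that building block under the stated independence assumption, feeding it into an ordinary (non-decomposed) IS estimate; it is not the paper's proposed decomposed estimator.

```python
import numpy as np

def trajectory_is_weight(pi_e_subs, pi_b_subs):
    """Importance weight for one trajectory when each policy factors as
    pi(a|s) = prod_k pi_k(a_k|s).  Inputs have shape (T, K): per-step,
    per-sub-action probabilities of the taken sub-actions under the
    evaluation (pi_e) and behavior (pi_b) policies."""
    ratios = np.asarray(pi_e_subs) / np.asarray(pi_b_subs)  # (T, K)
    return ratios.prod()                                    # product over steps and sub-actions

def ordinary_is_estimate(returns, pi_e_subs_list, pi_b_subs_list):
    """Plain IS estimate of the evaluation-policy value over a batch of trajectories."""
    w = np.array([trajectory_is_weight(e, b)
                  for e, b in zip(pi_e_subs_list, pi_b_subs_list)])
    return np.mean(w * np.asarray(returns))
```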

Impact of Free-carrier Nonlinearities on Silicon Microring-based Reservoir Computing

  • paper_url: http://arxiv.org/abs/2307.07011
  • repo_url: None
  • paper_authors: Bernard J. Giron Castro, Christophe Peucheret, Darko Zibar, Francesco Da Ros
  • for: 这个论文的目的是研究热光学效应和自由载流子效应对时延储备池计算(time-delay reservoir computing)的影响。
  • methods: 这个论文使用硅微环谐振器来评估时延储备池计算的性能。
  • results: 研究发现,在热光学和自由载流子效应的影响下,可以在NARMA-10任务中实现NMSE低于0.05,该结果取决于两种效应的时间常数。
    Abstract We quantify the impact of thermo-optic and free-carrier effects on time-delay reservoir computing using a silicon microring resonator. We identify pump power and frequency detuning ranges with NMSE less than 0.05 for the NARMA-10 task depending on the time constants of the two considered effects.
    摘要 我们量化了热光学效应和自由载流子效应对基于硅微环谐振器的时延储备池计算的影响。我们确定了在NARMA-10任务上使NMSE低于0.05的泵浦功率与频率失谐范围,该范围取决于所考虑的两种效应的时间常数。
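For context, NARMA-10 is a standard reservoir-computing benchmark whose target is a tenth-order nonlinear autoregressive moving average of a random input; a commonly used formulation is sketched below. The parameter values follow the usual benchmark convention and are not specific to this paper.

```python
import numpy as np

def narma10(T, seed=0):
    """Generate the commonly used NARMA-10 benchmark:
    y[t+1] = 0.3 y[t] + 0.05 y[t] * sum_{i=0..9} y[t-i] + 1.5 u[t-9] u[t] + 0.1,
    with i.i.d. input u[t] ~ Uniform(0, 0.5)."""
    rng = np.random.default_rng(seed)
    u = rng.uniform(0.0, 0.5, size=T)
    y = np.zeros(T)
    for t in range(9, T - 1):
        y[t + 1] = (0.3 * y[t]
                    + 0.05 * y[t] * y[t - 9:t + 1].sum()
                    + 1.5 * u[t - 9] * u[t]
                    + 0.1)
    return u, y

u, y = narma10(2000)  # the reservoir is trained to map u -> y; NMSE is computed on held-out data
```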

HyperDreamBooth: HyperNetworks for Fast Personalization of Text-to-Image Models

  • paper_url: http://arxiv.org/abs/2307.06949
  • repo_url: https://github.com/JiauZhang/hyperdreambooth
  • paper_authors: Nataniel Ruiz, Yuanzhen Li, Varun Jampani, Wei Wei, Tingbo Hou, Yael Pritch, Neal Wadhwa, Michael Rubinstein, Kfir Aberman
  • for: 该论文旨在提出一种高效的人脸个性化生成方法,以快速生成多种Context和Style下的人脸,保持高准确率和个性化特征。
  • methods: 该方法使用了Hypernetwork来生成少量的个性化权重,并将其与扩散模型相结合,通过快速训练实现人脸的多样化生成。
  • results: 相比 DreamBooth 和 Textual Inversion,HyperDreamBooth 可以在20秒内实现人脸个性化生成,使用单个参考图片,并保持同样的质量和样式多样性。此外,HyperDreamBooth 的模型尺寸为10000倍小于normal DreamBooth模型。
    Abstract Personalization has emerged as a prominent aspect within the field of generative AI, enabling the synthesis of individuals in diverse contexts and styles, while retaining high-fidelity to their identities. However, the process of personalization presents inherent challenges in terms of time and memory requirements. Fine-tuning each personalized model needs considerable GPU time investment, and storing a personalized model per subject can be demanding in terms of storage capacity. To overcome these challenges, we propose HyperDreamBooth-a hypernetwork capable of efficiently generating a small set of personalized weights from a single image of a person. By composing these weights into the diffusion model, coupled with fast finetuning, HyperDreamBooth can generate a person's face in various contexts and styles, with high subject details while also preserving the model's crucial knowledge of diverse styles and semantic modifications. Our method achieves personalization on faces in roughly 20 seconds, 25x faster than DreamBooth and 125x faster than Textual Inversion, using as few as one reference image, with the same quality and style diversity as DreamBooth. Also our method yields a model that is 10000x smaller than a normal DreamBooth model. Project page: https://hyperdreambooth.github.io
    摘要 个性化已成为生成式 AI 领域的一个重要方向,它能够在保持身份高保真度的同时,在多样的场景和风格中合成特定个体。然而,个性化过程在时间和内存方面存在固有挑战:为每个个性化模型进行微调需要大量 GPU 时间投入,而为每个对象存储一个个性化模型对存储容量的要求也很高。为克服这些挑战,我们提出 HyperDreamBooth——一个能够从单张人物图像高效生成一小组个性化权重的超网络(hypernetwork)。将这些权重组合进扩散模型并配合快速微调,HyperDreamBooth 能够在多种场景和风格下生成该人物的面部,既保留丰富的主体细节,又保持模型关于多样风格与语义修改的关键知识。我们的方法只需一张参考图像,约 20 秒即可完成人脸个性化,比 DreamBooth 快 25 倍、比 Textual Inversion 快 125 倍,同时保持与 DreamBooth 相同的质量与风格多样性;并且得到的模型比常规 DreamBooth 模型小 10000 倍。项目页面:https://hyperdreambooth.github.io

In-context Autoencoder for Context Compression in a Large Language Model

  • paper_url: http://arxiv.org/abs/2307.06945
  • repo_url: None
  • paper_authors: Tao Ge, Jing Hu, Xun Wang, Si-Qing Chen, Furu Wei
  • for: 解决大语言模型(LLM)中长Context问题
  • methods: 提出了In-context Autoencoder(ICAE)模型,包括学习Encoder和固定Decoder,可以压缩长Context到有限的内存槽中
  • results: 经过预训练和细化Objective,ICAE可以生成高精度、涵盖性好的内存槽,可以conditioning by target LLM для多种提示生成恰当的响应。
    Abstract We propose the In-context Autoencoder (ICAE) for context compression in a large language model (LLM). The ICAE has two modules: a learnable encoder adapted with LoRA from an LLM for compressing a long context into a limited number of memory slots, and a fixed decoder which is the target LLM that can condition on the memory slots for various purposes. We first pretrain the ICAE using both autoencoding and language modeling objectives on massive text data, enabling it to generate memory slots that accurately and comprehensively represent the original context. Then, we fine-tune the pretrained ICAE on a small amount of instruct data to enhance its interaction with various prompts for producing desirable responses. Our experimental results demonstrate that the ICAE learned with our proposed pretraining and fine-tuning paradigm can effectively produce memory slots with $4\times$ context compression, which can be well conditioned on by the target LLM to respond to various prompts. The promising results demonstrate significant implications of the ICAE for its novel approach to the long context problem and its potential to reduce computation and memory overheads for LLM inference in practice, suggesting further research effort in context management for an LLM. Our code and data will be released shortly.
    摘要 我们提出了上下文内自编码器(In-context Autoencoder,ICAE),用于在大语言模型(LLM)中进行上下文压缩。ICAE 包含两个模块:一个可学习的编码器,由 LLM 经 LoRA 适配而来,用于将长上下文压缩为有限数量的记忆槽;以及一个固定的解码器,即目标 LLM,它可以基于这些记忆槽完成各种任务。我们首先在海量文本数据上以自编码和语言建模两种目标对 ICAE 进行预训练,使其生成能够准确而全面地表示原始上下文的记忆槽;随后在少量指令数据上微调预训练的 ICAE,以增强其与各类提示的交互能力,从而产生理想的回复。实验结果表明,按照我们提出的预训练与微调范式学习得到的 ICAE 能够有效生成具有 4 倍压缩率的记忆槽,并且目标 LLM 可以很好地以这些记忆槽为条件来响应各种提示。这些可喜的结果表明 ICAE 为长上下文问题提供了新颖思路,并有望在实际中降低 LLM 推理的计算与内存开销,值得在 LLM 的上下文管理方面进一步研究。我们的代码和数据将很快发布。

On the Connection between Game-Theoretic Feature Attributions and Counterfactual Explanations

  • paper_url: http://arxiv.org/abs/2307.06941
  • repo_url: None
  • paper_authors: Emanuele Albini, Shubham Sharma, Saumitra Mishra, Danial Dervovic, Daniele Magazzeni
  • for: 本研究探讨了两类最受欢迎的可解释人工智能(XAI)方法之间的关系,即特征归因和反事实解释。
  • methods: 本研究使用了游戏理论的特征归因和 counterfactual 解释方法,并对其进行了修改。
  • results: 研究发现,在满足特定条件时,特征归因和反事实解释方法实际上是等价的。此外,研究还揭示了直接使用反事实解释来提供特征重要性的局限性。
    Abstract Explainable Artificial Intelligence (XAI) has received widespread interest in recent years, and two of the most popular types of explanations are feature attributions, and counterfactual explanations. These classes of approaches have been largely studied independently and the few attempts at reconciling them have been primarily empirical. This work establishes a clear theoretical connection between game-theoretic feature attributions, focusing on but not limited to SHAP, and counterfactuals explanations. After motivating operative changes to Shapley values based feature attributions and counterfactual explanations, we prove that, under conditions, they are in fact equivalent. We then extend the equivalency result to game-theoretic solution concepts beyond Shapley values. Moreover, through the analysis of the conditions of such equivalence, we shed light on the limitations of naively using counterfactual explanations to provide feature importances. Experiments on three datasets quantitatively show the difference in explanations at every stage of the connection between the two approaches and corroborate the theoretical findings.
    摘要 可解释人工智能(XAI)近年来受到广泛关注,其中最受欢迎的两类解释是特征归因和反事实解释。这两类方法在很大程度上被独立研究,而少数试图将二者联系起来的工作主要停留在经验层面。本文在博弈论特征归因(以 SHAP 为主但不限于此)与反事实解释之间建立了明确的理论联系。在对基于 Shapley 值的特征归因和反事实解释进行适当的操作性调整之后,我们证明在一定条件下二者实际上是等价的,并进一步将该等价性结果推广到 Shapley 值之外的博弈论解概念。此外,通过分析等价成立的条件,我们揭示了简单地用反事实解释来提供特征重要性的局限性。在三个数据集上的实验定量展示了两类方法在联系的每个阶段所给出解释的差异,并印证了理论发现。

Domain-Agnostic Tuning-Encoder for Fast Personalization of Text-To-Image Models

  • paper_url: http://arxiv.org/abs/2307.06925
  • repo_url: None
  • paper_authors: Moab Arar, Rinon Gal, Yuval Atzmon, Gal Chechik, Daniel Cohen-Or, Ariel Shamir, Amit H. Bermano
  • For: This paper focuses on improving the text-to-image (T2I) personalization process by developing a domain-agnostic method that can handle diverse concepts without requiring specialized datasets or prior information.* Methods: The proposed method uses a contrastive-based regularization technique to maintain high fidelity to the target concept characteristics while keeping the predicted embeddings close to editable regions of the latent space. This is achieved by pushing the predicted tokens towards their nearest existing CLIP tokens.* Results: The experimental results demonstrate the effectiveness of the proposed approach, showing that the learned tokens are more semantic than tokens predicted by unregularized models. This leads to a better representation that achieves state-of-the-art performance while being more flexible than previous methods.Here’s the Chinese translation of the three key points:* For: 这篇论文关注提高文本到图像(T2I)个性化过程,开发了不需要特殊数据集或先知信息的领域独立方法,可以处理多样的概念。* Methods: 该方法使用了对比基于的正则化技术,以保持高度准确地表现目标概念特征,同时将预测的符号靠近CLIP符号的 editable 区域。* Results: 实验结果表明,提出的方法有效,学习的符号比未正则化模型预测的符号更加 semantics,从而实现了更好的表示,并且比先前的方法更 flexible。
    Abstract Text-to-image (T2I) personalization allows users to guide the creative image generation process by combining their own visual concepts in natural language prompts. Recently, encoder-based techniques have emerged as a new effective approach for T2I personalization, reducing the need for multiple images and long training times. However, most existing encoders are limited to a single-class domain, which hinders their ability to handle diverse concepts. In this work, we propose a domain-agnostic method that does not require any specialized dataset or prior information about the personalized concepts. We introduce a novel contrastive-based regularization technique to maintain high fidelity to the target concept characteristics while keeping the predicted embeddings close to editable regions of the latent space, by pushing the predicted tokens toward their nearest existing CLIP tokens. Our experimental results demonstrate the effectiveness of our approach and show how the learned tokens are more semantic than tokens predicted by unregularized models. This leads to a better representation that achieves state-of-the-art performance while being more flexible than previous methods.
    摘要 文本到图像(T2I)个性化允许用户通过在自然语言提示中结合自己的视觉概念来引导创意图像生成。最近,基于编码器的技术成为 T2I 个性化的一种新的有效途径,减少了对多张图像和长时间训练的需求。然而,现有的大多数编码器仅限于单一类别域,难以处理多样的概念。本文提出一种领域无关的方法,不需要任何专门的数据集或关于个性化概念的先验信息。我们引入一种新的基于对比的正则化技术,在保持对目标概念特征高保真度的同时,通过将预测的词元推向最近的已有 CLIP 词元,使预测的嵌入保持在潜空间中可编辑的区域附近。实验结果证明了该方法的有效性,并表明所学词元比未经正则化的模型预测的词元更具语义性,由此得到的表示在达到最先进性能的同时也比以往方法更灵活。
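The idea of keeping predicted embeddings near existing word embeddings can be sketched as below. This is a simplified, non-contrastive stand-in for the paper's regularizer: it merely penalizes the distance between each predicted token embedding and its nearest entry in a frozen text encoder's token-embedding table. The vocabulary size, embedding width, and loss form are assumptions.

```python
import torch

def nearest_token_penalty(pred_emb, token_table):
    """Mean squared distance from each predicted embedding to its nearest
    existing token embedding.  pred_emb: (B, D); token_table: (V, D).
    A simplified stand-in for the paper's contrastive regularizer."""
    d2 = torch.cdist(pred_emb, token_table, p=2) ** 2  # (B, V) squared distances
    return d2.min(dim=1).values.mean()

# Hypothetical shapes: a frozen CLIP-like vocabulary of 49408 tokens of width 768.
token_table = torch.randn(49408, 768)
pred_emb = torch.randn(8, 768, requires_grad=True)
loss = nearest_token_penalty(pred_emb, token_table)
loss.backward()  # gradients pull predictions toward editable regions of the embedding space
```

In practice this term would be added, with some weight, to the usual personalization objective so that fidelity to the concept and editability are balanced.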

DRAGON: A Dialogue-Based Robot for Assistive Navigation with Visual Language Grounding

  • paper_url: http://arxiv.org/abs/2307.06924
  • repo_url: None
  • paper_authors: Shuijing Liu, Aamir Hasan, Kaiwen Hong, Runxuan Wang, Peixin Chang, Zachary Mizrachi, Justin Lin, D. Livingston McPherson, Wendy A. Rogers, Katherine Driggs-Campbell
  • for: 帮助人们 WITH visual impairments (PwVI) 更好地理解和导航他们周围的空间。
  • methods: 使用对话系统和自然语言相关的环境映射技术,以便从用户的自由形式描述约束下导航。
  • results: 在一个日常的室内环境中,DRAGON 能够与用户进行流畅的交互,提供良好的导航体验,并使用自然语言连接用户与周围环境的概念。
    Abstract Persons with visual impairments (PwVI) have difficulties understanding and navigating spaces around them. Current wayfinding technologies either focus solely on navigation or provide limited communication about the environment. Motivated by recent advances in visual-language grounding and semantic navigation, we propose DRAGON, a guiding robot powered by a dialogue system and the ability to associate the environment with natural language. By understanding the commands from the user, DRAGON is able to guide the user to the desired landmarks on the map, describe the environment, and answer questions from visual observations. Through effective utilization of dialogue, the robot can ground the user's free-form descriptions to landmarks in the environment, and give the user semantic information through spoken language. We conduct a user study with blindfolded participants in an everyday indoor environment. Our results demonstrate that DRAGON is able to communicate with the user smoothly, provide a good guiding experience, and connect users with their surrounding environment in an intuitive manner.
    摘要 视障人群(PwVI)在理解和导航周围空间时存在困难。现有的导航寻路技术要么只关注导航本身,要么只能提供有限的环境信息。受视觉-语言对齐和语义导航最新进展的启发,我们提出了 DRAGON,一个由对话系统驱动、能够将环境与自然语言关联起来的导航机器人。通过理解用户的指令,DRAGON 能够引导用户到达地图上的目标地点、描述周围环境,并根据视觉观察回答问题。借助对话的有效利用,机器人可以将用户的自由形式描述对应到环境中的地标,并通过语音向用户提供语义信息。我们在日常室内环境中对蒙眼参与者进行了用户研究。结果表明,DRAGON 能够与用户顺畅交流,提供良好的引导体验,并以直观的方式将用户与周围环境联系起来。

Weighted Averaged Stochastic Gradient Descent: Asymptotic Normality and Optimality

  • paper_url: http://arxiv.org/abs/2307.06915
  • repo_url: None
  • paper_authors: Ziyang Wei, Wanrong Zhu, Wei Biao Wu
  • for: 本文探讨了一种通用的SGD平均方案,以提高SGD的计算和存储效率。
  • methods: 本文使用了一些常见的平均方案,并提出了一种适应性的平均方案,基于线性模型中的优化最佳质量。
  • results: 本文证明了一大类加权平均 SGD 方案的渐近正态性,并提供了有效的在线推断方法。此外,本文还提出了一种自适应加权平均方案,可同时实现最优的统计速率和良好的非渐近收敛性。
    Abstract Stochastic Gradient Descent (SGD) is one of the simplest and most popular algorithms in modern statistical and machine learning due to its computational and memory efficiency. Various averaging schemes have been proposed to accelerate the convergence of SGD in different settings. In this paper, we explore a general averaging scheme for SGD. Specifically, we establish the asymptotic normality of a broad range of weighted averaged SGD solutions and provide asymptotically valid online inference approaches. Furthermore, we propose an adaptive averaging scheme that exhibits both optimal statistical rate and favorable non-asymptotic convergence, drawing insights from the optimal weight for the linear model in terms of non-asymptotic mean squared error (MSE).
    摘要 随机梯度下降(SGD)因其计算和内存效率而成为现代统计与机器学习中最简单、最流行的算法之一。为了在不同场景下加速 SGD 的收敛,人们提出了多种平均方案。本文研究一种通用的 SGD 加权平均方案:我们建立了一大类加权平均 SGD 解的渐近正态性,并给出渐近有效的在线推断方法。此外,我们从线性模型在非渐近均方误差(MSE)意义下的最优权重中获得启发,提出了一种自适应加权平均方案,它同时具有最优的统计速率和良好的非渐近收敛性。
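The general weighted-averaging idea can be sketched in a few lines: run SGD as usual and maintain an online weighted average of the iterates. The polynomial weights w_t = (t + 1)**gamma below are one common family (gamma = 0 recovers plain uniform averaging); the paper's adaptive scheme chooses the weighting more carefully, so treat the step-size rule and gamma purely as illustrative assumptions.

```python
import numpy as np

def weighted_averaged_sgd(grad, x0, lr=0.05, gamma=2.0, iters=10000, seed=0):
    """SGD with an online weighted average of iterates,
    x_bar = sum_t w_t x_t / sum_t w_t, using w_t = (t + 1)**gamma."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    x_bar, weight_sum = np.zeros_like(x), 0.0
    for t in range(iters):
        g = grad(x, rng)                          # stochastic gradient
        x = x - lr / np.sqrt(t + 1) * g           # decaying step size (assumption)
        w = (t + 1) ** gamma
        weight_sum += w
        x_bar += (w / weight_sum) * (x - x_bar)   # online update of the weighted average
    return x_bar

# Toy example: noisy gradients of f(x) = 0.5 * ||x||^2, whose minimizer is 0.
grad = lambda x, rng: x + rng.normal(scale=1.0, size=x.shape)
print(weighted_averaged_sgd(grad, x0=np.ones(5)))
```

Larger gamma puts more weight on late iterates, which is the kind of trade-off between statistical efficiency and non-asymptotic behavior the paper analyzes.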

Uncovering Unique Concept Vectors through Latent Space Decomposition

  • paper_url: http://arxiv.org/abs/2307.06913
  • repo_url: None
  • paper_authors: Mara Graziani, Laura O’ Mahony, An-Phi Nguyen, Henning Müller, Vincent Andrearczyk
  • for: 这 paper 的目的是解释深度学习模型的内部工作机制,以建立信任和确保模型的安全性。
  • methods: 这 paper 使用了一种新的后处置无监督方法,自动找出深度模型在训练过程中学习的概念。这种方法包括分解层的积分空间为单个向量,并通过无监督 clustering 精炼这些向量,以获得与模型预测有关的概念向量。
  • results: experiments 表明,大多数这些概念向量是人类可理解的,具有凝聚性,并与任务有关。此外,这种方法还可以成功地在数据集探索中标识受到各种干扰因素影响的训练样本。这种新的探索技术具有数据类型和模型架构的弹性,可以帮助发现训练数据中的偏见和错误来源。
    Abstract Interpreting the inner workings of deep learning models is crucial for establishing trust and ensuring model safety. Concept-based explanations have emerged as a superior approach that is more interpretable than feature attribution estimates such as pixel saliency. However, defining the concepts for the interpretability analysis biases the explanations by the user's expectations on the concepts. To address this, we propose a novel post-hoc unsupervised method that automatically uncovers the concepts learned by deep models during training. By decomposing the latent space of a layer in singular vectors and refining them by unsupervised clustering, we uncover concept vectors aligned with directions of high variance that are relevant to the model prediction, and that point to semantically distinct concepts. Our extensive experiments reveal that the majority of our concepts are readily understandable to humans, exhibit coherency, and bear relevance to the task at hand. Moreover, we showcase the practical utility of our method in dataset exploration, where our concept vectors successfully identify outlier training samples affected by various confounding factors. This novel exploration technique has remarkable versatility to data types and model architectures and it will facilitate the identification of biases and the discovery of sources of error within training data.
    摘要 理解深度学习模型的内部工作机制对于建立信任和确保模型安全至关重要。与像素显著性等特征归因估计相比,基于概念的解释方法具有更好的可解释性。然而,为可解释性分析预先定义概念会使解释带有用户对概念预期的偏差。为此,我们提出了一种新的事后无监督方法,能够自动发现深度模型在训练过程中学到的概念。该方法将某一层的潜空间分解为奇异向量,并通过无监督聚类加以精炼,从而得到与模型预测相关的高方差方向对齐、且指向语义上彼此不同概念的概念向量。大量实验表明,我们得到的多数概念易于被人理解、具有一致性,并与任务相关。此外,我们还展示了该方法在数据集探索中的实际价值:我们的概念向量能够成功识别受各种混杂因素影响的离群训练样本。这种新的探索技术对数据类型和模型架构具有很强的通用性,有助于发现训练数据中的偏差和误差来源。
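A loose sketch of the decomposition step described above: take the matrix of one layer's activations over a dataset, compute its singular vectors, and refine candidate directions with unsupervised clustering. The choice of KMeans, the number of retained components, and clustering the projected samples (rather than some other object) are all assumptions; the paper's exact refinement procedure may differ.

```python
import numpy as np
from sklearn.cluster import KMeans

def candidate_concept_vectors(acts, n_components=10, n_concepts=5, seed=0):
    """acts: (n_samples, n_features) activations of one layer.
    1) SVD of the centered activation matrix gives high-variance directions.
    2) Cluster samples in that reduced basis and return unit-norm cluster
       centroids (mapped back to activation space) as candidate concept vectors."""
    mean = acts.mean(axis=0, keepdims=True)
    U, S, Vt = np.linalg.svd(acts - mean, full_matrices=False)
    basis = Vt[:n_components]                       # (n_components, n_features)
    proj = (acts - mean) @ basis.T                  # samples in the singular basis
    km = KMeans(n_clusters=n_concepts, n_init=10, random_state=seed).fit(proj)
    concepts = km.cluster_centers_ @ basis          # back to activation space
    return concepts / np.linalg.norm(concepts, axis=1, keepdims=True)

concepts = candidate_concept_vectors(np.random.randn(1000, 256))
print(concepts.shape)  # (5, 256), one unit vector per candidate concept direction
```

Samples whose activations project strongly onto one of these directions can then be inspected to judge whether the direction corresponds to a human-understandable concept or to a confounding factor.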

Provable Multi-Task Representation Learning by Two-Layer ReLU Neural Networks

  • paper_url: http://arxiv.org/abs/2307.06887
  • repo_url: None
  • paper_authors: Liam Collins, Hamed Hassani, Mahdi Soltanolkotabi, Aryan Mokhtari, Sanjay Shakkottai
  • for: 这个论文的目的是解释如何使用多任务学习来学习有意义的特征表示。
  • methods: 这篇论文使用的方法是使用梯度下降来训练多任务 neural network,并证明了这种方法可以在多任务 Setting中实现有意义的特征学习。
  • results: 这篇论文的结果表明,当任务是二分类问题,且标签取决于输入空间中只有r个方向时,执行一种简单的梯度下降多任务学习算法可以学习出真实的r个方向。这意味着,任何后续任务在r个真实坐标上可以通过学习一个线性分类器来解决,而Random Feature模型需要对维度d进行指数增长来获得这样的保证。
    Abstract Feature learning, i.e. extracting meaningful representations of data, is quintessential to the practical success of neural networks trained with gradient descent, yet it is notoriously difficult to explain how and why it occurs. Recent theoretical studies have shown that shallow neural networks optimized on a single task with gradient-based methods can learn meaningful features, extending our understanding beyond the neural tangent kernel or random feature regime in which negligible feature learning occurs. But in practice, neural networks are increasingly often trained on {\em many} tasks simultaneously with differing loss functions, and these prior analyses do not generalize to such settings. In the multi-task learning setting, a variety of studies have shown effective feature learning by simple linear models. However, multi-task learning via {\em nonlinear} models, arguably the most common learning paradigm in practice, remains largely mysterious. In this work, we present the first results proving feature learning occurs in a multi-task setting with a nonlinear model. We show that when the tasks are binary classification problems with labels depending on only $r$ directions within the ambient $d\gg r$-dimensional input space, executing a simple gradient-based multitask learning algorithm on a two-layer ReLU neural network learns the ground-truth $r$ directions. In particular, any downstream task on the $r$ ground-truth coordinates can be solved by learning a linear classifier with sample and neuron complexity independent of the ambient dimension $d$, while a random feature model requires exponential complexity in $d$ for such a guarantee.
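The toy experiment below is a hedged sketch of the setting described in the abstract: several binary tasks whose labels depend on only r of the d input coordinates, fit by a shared two-layer ReLU representation with one linear head per task via plain gradient descent. All dimensions, the loss, the optimizer, and the subspace-recovery check are illustrative assumptions, not the paper's exact algorithm or proof setting.

```python
import torch

torch.manual_seed(0)
d, r, n_tasks, n = 50, 3, 8, 2000              # ambient dim, signal dim, tasks, samples per task
U = torch.linalg.qr(torch.randn(d, r)).Q       # ground-truth r directions (d x r, orthonormal)

# Synthetic binary tasks: each label depends only on the r coordinates U^T x.
X = torch.randn(n_tasks, n, d)
task_w = torch.randn(n_tasks, r)
y = torch.sign((X @ U) @ task_w.unsqueeze(-1)).squeeze(-1)   # labels in {-1, +1}, shape (n_tasks, n)

# Shared two-layer ReLU representation plus one linear head per task.
width = 64
W1 = (0.1 * torch.randn(d, width)).requires_grad_()
heads = (0.1 * torch.randn(n_tasks, width)).requires_grad_()

opt = torch.optim.SGD([W1, heads], lr=0.1)
for step in range(500):
    feats = torch.relu(X @ W1)                              # shared features, (n_tasks, n, width)
    logits = (feats * heads.unsqueeze(1)).sum(-1)           # per-task linear predictions
    loss = torch.nn.functional.soft_margin_loss(logits, y)  # logistic loss on +/-1 labels
    opt.zero_grad()
    loss.backward()
    opt.step()

# How much of the learned first-layer weight lies inside the ground-truth subspace?
proj = U @ U.T
ratio = (proj @ W1).norm() / W1.norm()
print(f"fraction of first-layer weight norm in the true r-dim subspace: {ratio:.2f}")
```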

Min-Max Optimization under Delays

  • paper_url: http://arxiv.org/abs/2307.06886
  • repo_url: https://github.com/dataclergy/Simulation-and-Optimization-with-Python
  • paper_authors: Arman Adibi, Aritra Mitra, Hamed Hassani
  • for: Studies the delays and asynchrony inherent in large-scale machine learning and examines the performance of min-max optimization algorithms with delayed gradient updates.
  • methods: Analyzes delayed-update versions of two standard min-max optimization algorithms, Gradient Descent-Ascent (\texttt{GDA}) and Extra-gradient (\texttt{EG}).
  • results: An empirical study shows that even small delays can cause \texttt{EG} to diverge on simple instances, motivating a careful analysis of delayed min-max algorithms. Under suitable technical assumptions, the paper proves that \texttt{GDA} and \texttt{EG} with delayed updates still converge to saddle points in convex-concave and strongly convex-strongly concave settings, and the complexity bounds transparently reveal the slow-down in convergence caused by delays.
    Abstract Delays and asynchrony are inevitable in large-scale machine-learning problems where communication plays a key role. As such, several works have extensively analyzed stochastic optimization with delayed gradients. However, as far as we are aware, no analogous theory is available for min-max optimization, a topic that has gained recent popularity due to applications in adversarial robustness, game theory, and reinforcement learning. Motivated by this gap, we examine the performance of standard min-max optimization algorithms with delayed gradient updates. First, we show (empirically) that even small delays can cause prominent algorithms like Extra-gradient (\texttt{EG}) to diverge on simple instances for which \texttt{EG} guarantees convergence in the absence of delays. Our empirical study thus suggests the need for a careful analysis of delayed versions of min-max optimization algorithms. Accordingly, under suitable technical assumptions, we prove that Gradient Descent-Ascent (\texttt{GDA}) and \texttt{EG} with delayed updates continue to guarantee convergence to saddle points for convex-concave and strongly convex-strongly concave settings. Our complexity bounds reveal, in a transparent manner, the slow-down in convergence caused by delays.
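A small numerical sketch of the phenomenon discussed above: Gradient Descent-Ascent with stale gradients on the strongly convex-strongly concave toy objective f(x, y) = x^2/2 + xy - y^2/2, whose unique saddle point is the origin. The fixed-lag delay model, step size, and iteration budget are assumptions chosen for illustration, not the paper's analysis.

```python
import numpy as np

def delayed_gda(tau: int, eta: float = 0.05, steps: int = 1000) -> float:
    """GDA on f(x, y) = 0.5*x**2 + x*y - 0.5*y**2.
    Each update uses the gradient evaluated at the iterate from `tau` steps ago."""
    grad_x = lambda x, y: x + y      # df/dx
    grad_y = lambda x, y: x - y      # df/dy
    history = [(1.0, 1.0)]           # iterate history, needed to look up stale gradients
    x, y = history[-1]
    for t in range(steps):
        xs, ys = history[max(0, t - tau)]                       # stale iterate (delay tau)
        x, y = x - eta * grad_x(xs, ys), y + eta * grad_y(xs, ys)
        history.append((x, y))
    return np.hypot(x, y)            # distance to the saddle point

# With larger delays, convergence slows down or the iterates may even diverge
# (a very large or nan distance below indicates divergence).
for tau in (0, 5, 20, 50):
    print(f"delay {tau:3d}: distance to saddle after 1000 steps = {delayed_gda(tau):.3e}")
```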

Sequential Monte Carlo Learning for Time Series Structure Discovery

  • paper_url: http://arxiv.org/abs/2307.09607
  • repo_url: https://github.com/fsaad/autogp.jl
  • paper_authors: Feras A. Saad, Brian J. Patton, Matthew D. Hoffman, Rif A. Saurous, Vikash K. Mansinghka
  • for: This paper aims to automatically discover accurate models of complex time series data.
  • methods: The paper performs Bayesian nonparametric posterior inference over a symbolic space of Gaussian process time series models, integrating sequential Monte Carlo (SMC) with involutive MCMC for structure learning.
  • results: Experiments on real-world time series show 10x-100x runtime speedups over previous MCMC and greedy-search structure learning algorithms, enabling the first large-scale evaluation of Gaussian process time series structure learning on a benchmark of 1,428 econometric datasets. On this benchmark the method discovers sensible models that deliver more accurate point forecasts and interval forecasts over multiple horizons than widely used statistical and neural baselines.
    Abstract This paper presents a new approach to automatically discovering accurate models of complex time series data. Working within a Bayesian nonparametric prior over a symbolic space of Gaussian process time series models, we present a novel structure learning algorithm that integrates sequential Monte Carlo (SMC) and involutive MCMC for highly effective posterior inference. Our method can be used both in "online" settings, where new data is incorporated sequentially in time, and in "offline" settings, by using nested subsets of historical data to anneal the posterior. Empirical measurements on real-world time series show that our method can deliver 10x--100x runtime speedups over previous MCMC and greedy-search structure learning algorithms targeting the same model family. We use our method to perform the first large-scale evaluation of Gaussian process time series structure learning on a prominent benchmark of 1,428 econometric datasets. The results show that our method discovers sensible models that deliver more accurate point forecasts and interval forecasts over multiple horizons as compared to widely used statistical and neural baselines that struggle on this challenging data.
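The snippet below is only a toy stand-in for the structure discovery problem: it scores a handful of symbolic kernel compositions by the standard Gaussian process log marginal likelihood on synthetic data. It does not implement the paper's SMC-plus-involutive-MCMC inference; the kernel forms, fixed hyperparameters, noise level, and toy series are all assumptions.

```python
import numpy as np

def rbf(x1, x2, ls=1.0):      return np.exp(-0.5 * (x1[:, None] - x2[None, :])**2 / ls**2)
def periodic(x1, x2, p=1.0):  return np.exp(-2 * np.sin(np.pi * np.abs(x1[:, None] - x2[None, :]) / p)**2)
def linear(x1, x2):           return x1[:, None] * x2[None, :]

def log_marginal_likelihood(K, y, noise=0.1):
    """Standard GP evidence: -0.5 y^T (K + s^2 I)^{-1} y - 0.5 log|K + s^2 I| - n/2 log 2pi."""
    n = len(y)
    L = np.linalg.cholesky(K + noise**2 * np.eye(n))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return -0.5 * y @ alpha - np.log(np.diag(L)).sum() - 0.5 * n * np.log(2 * np.pi)

# Toy series: trend + seasonality + noise.
rng = np.random.default_rng(0)
x = np.linspace(0, 4, 120)
y = 0.5 * x + np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(len(x))

# Symbolic candidates: each structure is a composition of base kernels.
candidates = {
    "RBF":               rbf(x, x),
    "Linear":            linear(x, x),
    "Linear + Periodic": linear(x, x) + periodic(x, x),
    "Linear + RBF":      linear(x, x) + rbf(x, x),
}
scores = {name: log_marginal_likelihood(K, y) for name, K in candidates.items()}
for name, lml in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name:18s} log evidence = {lml:.1f}")
```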

Deep reinforcement learning for the dynamic vehicle dispatching problem: An event-based approach

  • paper_url: http://arxiv.org/abs/2307.07508
  • repo_url: None
  • paper_authors: Edyvalberty Alenquer Cordeiro, Anselmo Ramalho Pitombeira-Neto
  • for: This paper aims to solve the dynamic vehicle dispatching problem, which involves assigning vehicles to requests that arise stochastically over time and space.
  • methods: The paper uses a semi-Markov decision process to model the problem, which allows for a continuous-time treatment of the decision-making process. The authors also use double deep q-learning to train decision agents and develop a new discrete-event simulator.
  • results: The authors compare their policies with heuristic policies often used in practice and show that their policies exhibit better performance in terms of average waiting times, cancellation rates, and total service times. Specifically, their policies can reduce average waiting times by up to 50% relative to the other tested heuristic policies.
    Abstract The dynamic vehicle dispatching problem corresponds to deciding which vehicles to assign to requests that arise stochastically over time and space. It emerges in diverse areas, such as in the assignment of trucks to loads to be transported; in emergency systems; and in ride-hailing services. In this paper, we model the problem as a semi-Markov decision process, which allows us to treat time as continuous. In this setting, decision epochs coincide with discrete events whose time intervals are random. We argue that an event-based approach substantially reduces the combinatorial complexity of the decision space and overcomes other limitations of discrete-time models often proposed in the literature. In order to test our approach, we develop a new discrete-event simulator and use double deep q-learning to train our decision agents. Numerical experiments are carried out in realistic scenarios using data from New York City. We compare the policies obtained through our approach with heuristic policies often used in practice. Results show that our policies exhibit better average waiting times, cancellation rates and total service times, with reduction in average waiting times of up to 50% relative to the other tested heuristic policies.
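A minimal sketch of the event-based view described in the abstract: decision epochs coincide with request-arrival events whose inter-arrival times are random, and a dispatch policy is consulted once per event. The greedy placeholder policy, the arrival and service-time distributions, and the fleet size are assumptions; in the paper the policy is instead trained with double deep Q-learning inside a purpose-built discrete-event simulator.

```python
import random

# Decision epochs are request-arrival events with random inter-arrival times,
# matching the semi-Markov, continuous-time treatment described above.
random.seed(0)
vehicles = {v: 0.0 for v in range(5)}            # vehicle id -> time it becomes free

t, arrivals = 0.0, []
for _ in range(200):                             # Poisson arrival stream
    t += random.expovariate(2.0)
    arrivals.append(t)

def dispatch_policy(now, vehicles):
    """Placeholder for the learned agent; here, a greedy 'earliest available vehicle' rule."""
    return min(vehicles, key=vehicles.get)

total_wait = 0.0
for now in arrivals:                             # one decision per event
    v = dispatch_policy(now, vehicles)
    start = max(now, vehicles[v])                # request waits if the chosen vehicle is busy
    total_wait += start - now
    vehicles[v] = start + random.uniform(0.2, 1.0)   # random service duration
print(f"average waiting time per request: {total_wait / len(arrivals):.2f}")
```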

The complexity of non-stationary reinforcement learning

  • paper_url: http://arxiv.org/abs/2307.06877
  • repo_url: None
  • paper_authors: Christos Papadimitriou, Binghui Peng
  • for: This paper targets the problem of continual learning in reinforcement learning, specifically the challenge of non-stationary reinforcement learning.
  • methods: The paper proves a worst-case complexity result: modifying the transition probabilities or the reward of a single state-action pair requires time almost as large as the number of states in order to keep the value function up to date, unless the strong exponential time hypothesis (SETH) is false.
  • results: In contrast, the paper shows that merely adding a new state-action pair is considerably easier to implement.
    Abstract The problem of continual learning in the domain of reinforcement learning, often called non-stationary reinforcement learning, has been identified as an important challenge to the application of reinforcement learning. We prove a worst-case complexity result, which we believe captures this challenge: Modifying the probabilities or the reward of a single state-action pair in a reinforcement learning problem requires an amount of time almost as large as the number of states in order to keep the value function up to date, unless the strong exponential time hypothesis (SETH) is false; SETH is a widely accepted strengthening of the P $\neq$ NP conjecture. Recall that the number of states in current applications of reinforcement learning is typically astronomical. In contrast, we show that just $\textit{adding}$ a new state-action pair is considerably easier to implement.
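The toy computation below is a numerical companion to the hardness statement, not a proof: after the reward of a single state-action pair is modified, keeping the value function up to date with plain value iteration still involves repeated sweeps over all states, even from a warm start. The MDP sizes, discount factor, and warm-start strategy are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
S, A, gamma = 200, 5, 0.95
P = rng.dirichlet(np.ones(S), size=(S, A))       # P[s, a] is a distribution over next states
R = rng.random((S, A))                           # rewards

def value_iteration(P, R, V0=None, tol=1e-6):
    V = np.zeros(len(R)) if V0 is None else V0.copy()
    while True:
        Q = R + gamma * P @ V                    # (S, A) action values, a full sweep over states
        V_new = Q.max(axis=1)
        if np.abs(V_new - V).max() < tol:
            return V_new
        V = V_new

V = value_iteration(P, R)
# Change the reward of a single state-action pair; naively maintaining the value
# function still costs sweeps over all S states.
R[17, 2] += 1.0
V_updated = value_iteration(P, R, V0=V)          # warm start, but still full sweeps
print("value changed at", int((np.abs(V_updated - V) > 1e-4).sum()), "of", S, "states")
```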

Identifying Early Help Referrals For Local Authorities With Machine Learning And Bias Analysis

  • paper_url: http://arxiv.org/abs/2307.06871
  • repo_url: None
  • paper_authors: Eufrásio de A. Lima Neto, Jonathan Bailiss, Axel Finke, Jo Miller, Georgina Cosma
  • for: This paper investigates the use of machine learning (ML) to assist experts in identifying families that may need to be referred for Early Help assessment and support.
  • methods: ML models were built and evaluated on an anonymised dataset, provided by Leicestershire County Council (LCC), of 14,360 records of young people under the age of 18, and bias mitigation techniques were applied to improve the fairness of the models.
  • results: In testing, the models were able to identify young people requiring intervention or early help, but they also produced a significant number of false positives, especially when constructed with imbalanced data. The paper empirically explores the suitability of data-driven ML models for identifying young people who may require Early Help services and discusses their appropriateness and limitations for this task.
    Abstract Local authorities in England, such as Leicestershire County Council (LCC), provide Early Help services that can be offered at any point in a young person's life when they experience difficulties that cannot be supported by universal services alone, such as schools. This paper investigates the utilisation of machine learning (ML) to assist experts in identifying families that may need to be referred for Early Help assessment and support. LCC provided an anonymised dataset comprising 14360 records of young people under the age of 18. The dataset was pre-processed, machine learning models were built, and experiments were conducted to validate and test the performance of the models. Bias mitigation techniques were applied to improve the fairness of these models. During testing, while the models demonstrated the capability to identify young people requiring intervention or early help, they also produced a significant number of false positives, especially when constructed with imbalanced data, incorrectly identifying individuals who most likely did not need an Early Help referral. This paper empirically explores the suitability of data-driven ML models for identifying young people who may require Early Help services and discusses their appropriateness and limitations for this task.
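A hedged sketch of the class-imbalance issue highlighted above: on synthetic data with roughly 5% positives, reweighting the classes trades false positives against recall. The synthetic dataset, random-forest model, and class weighting are illustrative assumptions and stand in for, rather than reproduce, the LCC data and the fairness-oriented bias mitigation applied in the paper.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the referral data: heavy class imbalance (few true referrals).
X, y = make_classification(n_samples=14360, n_features=20, weights=[0.95, 0.05], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

for weighting in (None, "balanced"):
    clf = RandomForestClassifier(class_weight=weighting, random_state=0).fit(X_tr, y_tr)
    tn, fp, fn, tp = confusion_matrix(y_te, clf.predict(X_te)).ravel()
    recall, fpr = tp / (tp + fn), fp / (fp + tn)
    print(f"class_weight={weighting}: recall={recall:.2f}  false-positive rate={fpr:.3f}")
```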

Embodied Lifelong Learning for Task and Motion Planning

  • paper_url: http://arxiv.org/abs/2307.06870
  • repo_url: None
  • paper_authors: Jorge A. Mendez, Leslie Pack Kaelbling, Tomás Lozano-Pérez
  • for: This work targets robots deployed in homes over long stretches of time, which should leverage accumulated experience through lifelong learning to become more proficient assistants.
  • methods: The work formalizes a novel lifelong learning problem in the context of learning for task and motion planning (TAMP) and develops a generative mixture model that produces candidate continuous parameters for a planner.
  • results: The method exhibits substantial improvements in planning success on simulated 2D domains and on several problems from the BEHAVIOR benchmark.
    Abstract A robot deployed in a home over long stretches of time faces a true lifelong learning problem. As it seeks to provide assistance to its users, the robot should leverage any accumulated experience to improve its own knowledge to become a more proficient assistant. We formalize this setting with a novel lifelong learning problem formulation in the context of learning for task and motion planning (TAMP). Exploiting the modularity of TAMP systems, we develop a generative mixture model that produces candidate continuous parameters for a planner. Whereas most existing lifelong learning approaches determine a priori how data is shared across task models, our approach learns shared and non-shared models and determines which to use online during planning based on auxiliary tasks that serve as a proxy for each model's understanding of a state. Our method exhibits substantial improvements in planning success on simulated 2D domains and on several problems from the BEHAVIOR benchmark.
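A minimal sketch of the idea of a generative mixture model over continuous planner parameters: fit a Gaussian mixture to parameters from past successful plans and sample candidates from it at planning time. The 3-D "grasp offset" parameters, scikit-learn's GaussianMixture, and the two-component choice are assumptions; the paper's model, and its online choice between shared and non-shared components via auxiliary tasks, is more involved.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Toy stand-in: continuous planner parameters (e.g. a 3-D grasp offset) collected
# from past successful plans, one cluster of experience per previously seen task.
rng = np.random.default_rng(0)
past_successes = np.vstack([
    rng.normal(loc=[0.05, 0.00, 0.10], scale=0.01, size=(200, 3)),   # experience from task A
    rng.normal(loc=[0.00, 0.08, 0.12], scale=0.02, size=(200, 3)),   # experience from task B
])

# Generative mixture model over accumulated experience; at planning time we draw
# candidate continuous parameters from it instead of sampling uniformly.
gmm = GaussianMixture(n_components=2, random_state=0).fit(past_successes)

def propose_parameters(n_candidates: int = 10) -> np.ndarray:
    samples, _ = gmm.sample(n_candidates)
    return samples

print("candidate grasp offsets for the planner:\n", np.round(propose_parameters(), 3))
```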

Data Augmentation for Mathematical Objects

  • paper_url: http://arxiv.org/abs/2307.06984
  • repo_url: https://github.com/djdprogramming/adfa2
  • paper_authors: Tereso del Rio, Matthew England
  • for: This paper evaluates data balancing and data augmentation for mathematical objects, which matters when the symbolic computation and satisfiability checking communities use machine learning to optimise their tools.
  • methods: New problem instances are generated by swapping the variable names in already labelled non-linear polynomial problems, so that no further labelling is required when selecting a variable ordering for cylindrical algebraic decomposition is viewed as a classification problem.
  • results: This augmentation increases the accuracy of ML models by 63% on average. Part of the improvement comes from balancing the dataset and part from increasing its size, and both effects are found to be very significant.
    Abstract This paper discusses and evaluates ideas of data balancing and data augmentation in the context of mathematical objects: an important topic for both the symbolic computation and satisfiability checking communities, when they are making use of machine learning techniques to optimise their tools. We consider a dataset of non-linear polynomial problems and the problem of selecting a variable ordering for cylindrical algebraic decomposition to tackle these with. By swapping the variable names in already labelled problems, we generate new problem instances that do not require any further labelling when viewing the selection as a classification problem. We find this augmentation increases the accuracy of ML models by 63% on average. We study what part of this improvement is due to the balancing of the dataset and what is achieved thanks to further increasing the size of the dataset, concluding that both have a very significant effect. We finish the paper by reflecting on how this idea could be applied in other uses of machine learning in mathematics.
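A hedged sketch of the augmentation described above: renaming the variables of an already labelled polynomial problem by a permutation, and applying the same permutation to its variable-ordering label, yields new labelled instances without any further labelling. The SymPy representation and the three-variable example are illustrative assumptions, not the authors' code.

```python
import itertools
from sympy import symbols

x1, x2, x3 = symbols("x1 x2 x3")

def augment_by_swapping(polys, best_ordering):
    """Generate new labelled instances by renaming variables.  If a permutation is
    applied to the variable names, the optimal ordering label is the same permutation
    applied to the original label, so no re-labelling is needed."""
    variables = (x1, x2, x3)
    augmented = []
    for perm in itertools.permutations(variables):
        rename = dict(zip(variables, perm))
        new_polys = [p.subs(rename, simultaneous=True) for p in polys]
        new_label = tuple(rename[v] for v in best_ordering)
        augmented.append((new_polys, new_label))
    return augmented

# One labelled instance: a set of non-linear polynomials and its best CAD variable ordering.
instance = [x1**2 * x3 + x2, x2**3 - x1 * x3]
label = (x2, x1, x3)
for polys, ordering in augment_by_swapping(instance, label):
    print(ordering, polys)
```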