results: 经过广泛的实验 validate MixupExplainer方法的效果,并提供了分布Shift问题的解决方案。Abstract
Graph Neural Networks (GNNs) have received increasing attention due to their ability to learn from graph-structured data. However, their predictions are often not interpretable. Post-hoc instance-level explanation methods have been proposed to understand GNN predictions. These methods seek to discover substructures that explain the prediction behavior of a trained GNN. In this paper, we shed light on the existence of the distribution shifting issue in existing methods, which affects explanation quality, particularly in applications on real-life datasets with tight decision boundaries. To address this issue, we introduce a generalized Graph Information Bottleneck (GIB) form that includes a label-independent graph variable, which is equivalent to the vanilla GIB. Driven by the generalized GIB, we propose a graph mixup method, MixupExplainer, with a theoretical guarantee to resolve the distribution shifting issue. We conduct extensive experiments on both synthetic and real-world datasets to validate the effectiveness of our proposed mixup approach over existing approaches. We also provide a detailed analysis of how our proposed approach alleviates the distribution shifting issue.
摘要
图 нейрон网络(GNN)因其能够学习图结构数据而受到了越来越多的关注。然而,它们的预测结果通常不可解释。后续的实例级别解释方法已经被提出来了解 GNN 预测结果。这些方法寻找可以解释训练过的 GNN 预测行为的子结构。在这篇文章中,我们指出了现有方法中的分布转移问题,这会影响解释质量,特别是在实际数据集上面临着紧张的决策边界。为解决这个问题,我们引入一种通用的图信息瓶颈(GIB)形式,该形式包括一个独立于标签的图变量,与普通的 GIB 相同。驱动于通用 GIB,我们提出了一种图混合方法,叫做 MixupExplainer,具有解决分布转移问题的理论保证。我们在 synthetic 和实际数据集上进行了广泛的实验,证明了我们的提议的混合方法比现有方法更有效。我们还提供了对我们的提议方法如何缓解分布转移问题的详细分析。
$\text{EFO}_{k}$-CQA: Towards Knowledge Graph Complex Query Answering beyond Set Operation
paper_authors: Hang Yin, Zihao Wang, Weizhi Fei, Yangqiu Song
for: 提供了一个框架,用于 Answering Existential First-order Queries with multiple variables(EFO),并评估这些方法在这个框架下的性能。
methods: 使用了学习基本的方法来掌握不完整的知识,以应对开放世界假设下的查询。
results: 建立了一个具有741种查询的数据集(EFO-CQA),并通过实验证明了这些查询的难度对于查询方法的影响。Abstract
To answer complex queries on knowledge graphs, logical reasoning over incomplete knowledge is required due to the open-world assumption. Learning-based methods are essential because they are capable of generalizing over unobserved knowledge. Therefore, an appropriate dataset is fundamental to both obtaining and evaluating such methods under this paradigm. In this paper, we propose a comprehensive framework for data generation, model training, and method evaluation that covers the combinatorial space of Existential First-order Queries with multiple variables ($\text{EFO}_{k}$). The combinatorial query space in our framework significantly extends those defined by set operations in the existing literature. Additionally, we construct a dataset, $\text{EFO}_{k}$-CQA, with 741 types of query for empirical evaluation, and our benchmark results provide new insights into how query hardness affects the results. Furthermore, we demonstrate that the existing dataset construction process is systematically biased that hinders the appropriate development of query-answering methods, highlighting the importance of our work. Our code and data are provided in~\url{https://github.com/HKUST-KnowComp/EFOK-CQA}.
摘要
“为了回答知识图中的复杂查询,因为开放世界假设,需要逻辑推理 sobre 未完整的知识。学习型方法是必要的,因为它们可以泛化到未观察到的知识。因此,一个合适的数据集是知识检索方法的基础和评估的重要组成部分。在这篇论文中,我们提出了一个完整的框架,包括数据生成、模型训练和方法评估,对多变量Existential First-order Queries(EFO)的 combinatorial 查询空间进行覆盖。我们的框架中的查询空间比现有文献中的set操作定义的更加广泛。此外,我们还构建了741种类型的查询集,并对其进行实验评估。我们的研究结果提供了新的思路,描述了查询难度如何影响结果。此外,我们还发现了现有数据集构建过程存在系统性的偏见,这阻碍了合适的查询回答方法的发展,高亮了我们的工作的重要性。我们的代码和数据可以在https://github.com/HKUST-KnowComp/EFOK-CQA中获取。”Note: The translation is in Simplified Chinese, which is the standard form of Chinese used in mainland China and Singapore. If you prefer Traditional Chinese, please let me know and I can provide the translation in that form instead.
Improving Trace Link Recommendation by Using Non-Isotropic Distances and Combinations
methods: 本研究使用了自然语言处理工具来自动计算Trace链,并通过 geometric viewpoint on semantic similarity 来提高 Trace 链的准确率。
results: 研究在四个开源项目和两个企业项目上进行了实验,结果表明, geometric viewpoint on semantic similarity 可以帮助提高 Trace 链的准确率,并且这些发现可以用于其他信息检索问题。Abstract
The existence of trace links between artifacts of the software development life cycle can improve the efficiency of many activities during software development, maintenance and operations. Unfortunately, the creation and maintenance of trace links is time-consuming and error-prone. Research efforts have been spent to automatically compute trace links and lately gained momentum, e.g., due to the availability of powerful tools in the area of natural language processing. In this paper, we report on some observations that we made during studying non-linear similarity measures for computing trace links. We argue, that taking a geometric viewpoint on semantic similarity can be helpful for future traceability research. We evaluated our observations on a dataset of four open source projects and two industrial projects. We furthermore point out that our findings are more general and can build the basis for other information retrieval problems as well.
摘要
软件开发生命周期中的trace链可以提高软件开发、维护和运维的效率。然而,创建和维护trace链却是时间consuming和容易出错的。研究者们已经投入大量时间和精力来自动计算trace链,最近又得到了新的动力,例如自然语言处理领域的强大工具的出现。本文报告了我们在非线性相似度测量中所作出的观察,我们认为从 geometric 视角来看 semantic 相似度可以对未来的traceability研究提供帮助。我们对四个开源项目和两个industrial项目进行了评估,并指出我们的发现不仅限于traceability问题,还可以应用于其他信息检索问题。
Explaining and visualizing black-box models through counterfactual paths
methods: 该方法使用 conditional permutation 生成的 counterfactual paths,测量特征的重要性通过Sequential permutations of features 的影响对模型预测变化。
results: 实验表明,该方法可以准确地解释和视觉化黑盒模型,并在 synthetic 和医疗数据上得到了实际应用。Abstract
Explainable AI (XAI) is an increasingly important area of machine learning research, which aims to make black-box models transparent and interpretable. In this paper, we propose a novel approach to XAI that uses the so-called counterfactual paths generated by conditional permutations of features. The algorithm measures feature importance by identifying sequential permutations of features that most influence changes in model predictions. It is particularly suitable for generating explanations based on counterfactual paths in knowledge graphs incorporating domain knowledge. Counterfactual paths introduce an additional graph dimension to current XAI methods in both explaining and visualizing black-box models. Experiments with synthetic and medical data demonstrate the practical applicability of our approach.
摘要
<>用于机器学习研究的可解释AI(XAI)是一个日益重要的领域,旨在让黑盒模型变得透明和可解释。在这篇论文中,我们提出了一种新的XAI方法,使用叫做条件 permutation的feature counterfactual paths来衡量特征重要性。这个算法可以基于知识图 incorporating domain knowledge中的counterfactual paths来生成解释。counterfactual paths增加了现有XAI方法的一个新的维度,可以对黑盒模型的解释和可视化进行更好的支持。实验结果显示了我们的方法在synthetic和医疗数据上的实际可行性。Translation notes:* "可解释AI" (XAI) is translated as "可解释AI" (XAI), which is the standard term used in Simplified Chinese.* "黑盒模型" (black-box model) is translated as "黑盒模型" (black-box model), which is the standard term used in Simplified Chinese.* "counterfactual paths" is translated as "counterfactual paths" (counterfactual paths), which is the standard term used in Simplified Chinese.* "knowledge graph" is translated as "知识图" (knowledge graph), which is the standard term used in Simplified Chinese.* "domain knowledge" is translated as "领域知识" (domain knowledge), which is the standard term used in Simplified Chinese.I hope this helps! Let me know if you have any further questions.
Bidirectionally Deformable Motion Modulation For Video-based Human Pose Transfer
paper_authors: Wing-Yin Yu, Lai-Man Po, Ray C. C. Cheung, Yuzhi Zhao, Yu Xue, Kun Li
For: 动作人体图像做pose转换,即将原始图像动作转换为目标人体pose中的动作。* Methods: 提出了一种新的弹性动作修饰(DMM)方法,通过几何核OFFSET和自适应重量调整来同时实现特征对Alignment和样式传递。* Results: 与现有方法相比,提出的方法可以更好地处理衣物上的复杂结构和不连续的姿势,并且可以更好地保持图像的稳定性和视觉连续性。Abstract
Video-based human pose transfer is a video-to-video generation task that animates a plain source human image based on a series of target human poses. Considering the difficulties in transferring highly structural patterns on the garments and discontinuous poses, existing methods often generate unsatisfactory results such as distorted textures and flickering artifacts. To address these issues, we propose a novel Deformable Motion Modulation (DMM) that utilizes geometric kernel offset with adaptive weight modulation to simultaneously perform feature alignment and style transfer. Different from normal style modulation used in style transfer, the proposed modulation mechanism adaptively reconstructs smoothed frames from style codes according to the object shape through an irregular receptive field of view. To enhance the spatio-temporal consistency, we leverage bidirectional propagation to extract the hidden motion information from a warped image sequence generated by noisy poses. The proposed feature propagation significantly enhances the motion prediction ability by forward and backward propagation. Both quantitative and qualitative experimental results demonstrate superiority over the state-of-the-arts in terms of image fidelity and visual continuity. The source code is publicly available at github.com/rocketappslab/bdmm.
摘要
<>translate text into Simplified ChineseVideo-based human pose transfer是一种视频到视频生成任务,把平板的源人像图像基于一系列目标人 pose 动作。由于衣物上的结构很复杂,以及各种异常的姿势,现有方法通常会生成不满意的结果,如扭曲的 тексту涂抹和闪烁 artifacts。为解决这些问题,我们提出了一种新的减少动作模ulation(DMM)技术,利用几何kernel偏移以及适应加权修正来同时进行特征对齐和样式传递。与普通的样式修饰用于样式传递不同,我们的修饰机制可以根据对象形状自适应重建缓和frames从样式代码中。为了提高空间-时间一致性,我们利用双向传播来提取隐藏的运动信息从扭曲的图像序列,并且通过前向和后向传播来增强运动预测能力。实验结果表明,我们的特征传播方法可以明显提高图像准确性和视觉连续性,而且在量化和质量两个方面都超过了现有技术。源代码可以在github.com/rocketappslab/bdmm 中获取。Note: The translation is in Simplified Chinese, which is the standard writing system used in mainland China. If you need Traditional Chinese, please let me know.
Learning Expressive Priors for Generalization and Uncertainty Estimation in Neural Networks
results: 实验表明,该方法可以有效地提高神经网络的不确定性估计和通用化性,并且在不断学习框架中实现了良好的性能Abstract
In this work, we propose a novel prior learning method for advancing generalization and uncertainty estimation in deep neural networks. The key idea is to exploit scalable and structured posteriors of neural networks as informative priors with generalization guarantees. Our learned priors provide expressive probabilistic representations at large scale, like Bayesian counterparts of pre-trained models on ImageNet, and further produce non-vacuous generalization bounds. We also extend this idea to a continual learning framework, where the favorable properties of our priors are desirable. Major enablers are our technical contributions: (1) the sums-of-Kronecker-product computations, and (2) the derivations and optimizations of tractable objectives that lead to improved generalization bounds. Empirically, we exhaustively show the effectiveness of this method for uncertainty estimation and generalization.
摘要
在这项工作中,我们提出了一种新的先学习方法,用于提高深度神经网络的通用化和不确定性估计。关键思想是利用可扩展和结构化的神经网络 posterior 作为有效的先学习模型,具有通用化保证。我们学习的先学习模型可以在大规模上提供表达性的概率表示,类似于 bayesian 对 ImageNet 预训练模型的counterpart,并且生成非虚无效的通用化误差 bound。我们还将这个想法扩展到 continual learning 框架中,其中我们的先学习模型具有恰当的特性。主要实现方法包括:(1) kronecker 乘积计算,以及对这些计算的Derivation和优化,以实现改进的通用化误差 bound。我们在实验中证明了这种方法的有效性,用于 uncertainty estimation 和通用化。
Combining model-predictive control and predictive reinforcement learning for stable quadrupedal robot locomotion
paper_authors: Vyacheslav Kovalev, Anna Shkromada, Henni Ouerdane, Pavel Osinenko
For: 这篇论文旨在研究如何通过模型预测和预测学习控制器来获得四肢机器人稳定的步行。* Methods: 本文使用了模型预测控制(MPC)和预测学习(RL)两种控制方法来解决四肢机器人稳定步行问题。MPC是一种已知的控制方法,但是它不使用线上学习,只有一些适应型的变化。RL则是一种基于体验的学习方法,但是在高复杂的机器人中可能不太适用。本文的混合方法结合了MPC和RL,使用了成本滚动算法和一个对应的Q函数预测器,以缓解MPC的计算复杂性。* Results: 本文的实验结果显示,使用了混合控制的四肢机器人可以在短时间内获得稳定的步行,而nominal MP控制器则在较长时间内失败。此外,本文的控制器不需要前期训练,可以进行现场操作。结果显示,混合MPC和RL的控制方法可以实现四肢机器人稳定步行的平衡。Abstract
Stable gait generation is a crucial problem for legged robot locomotion as this impacts other critical performance factors such as, e.g. mobility over an uneven terrain and power consumption. Gait generation stability results from the efficient control of the interaction between the legged robot's body and the environment where it moves. Here, we study how this can be achieved by a combination of model-predictive and predictive reinforcement learning controllers. Model-predictive control (MPC) is a well-established method that does not utilize any online learning (except for some adaptive variations) as it provides a convenient interface for state constraints management. Reinforcement learning (RL), in contrast, relies on adaptation based on pure experience. In its bare-bone variants, RL is not always suitable for robots due to their high complexity and expensive simulation/experimentation. In this work, we combine both control methods to address the quadrupedal robot stable gate generation problem. The hybrid approach that we develop and apply uses a cost roll-out algorithm with a tail cost in the form of a Q-function modeled by a neural network; this allows to alleviate the computational complexity, which grows exponentially with the prediction horizon in a purely MPC approach. We demonstrate that our RL gait controller achieves stable locomotion at short horizons, where a nominal MP controller fails. Further, our controller is capable of live operation, meaning that it does not require previous training. Our results suggest that the hybridization of MPC with RL, as presented here, is beneficial to achieve a good balance between online control capabilities and computational complexity.
摘要
稳定步态生成是四肢机器人行走中的关键问题,这会影响其他重要性能因素,如覆盖不平地形和功率消耗。稳定步态生成的稳定性来自四肢机器人身体和运动环境之间的有效控制。在这里,我们研究如何通过组合模型预测和预测学习控制器来实现稳定步态生成。模型预测控制(MPC)是一种已知的方法,不使用线上学习(除了一些适应变化),它提供了一个方便的状态约束管理界面。学习控制(RL),相比之下,基于经验学习,不适用于机器人,因为它们的复杂性和临界实验/仿真成本高。在这个工作中,我们将这两种控制方法结合使用,以解决四肢机器人稳定步态生成问题。我们开发的混合方法使用一个成本滚动算法,其中的尾成本是一个模拟网络模型的Q函数;这使得计算复杂性减少,在完全MPC方法中计算复杂性呈指数增长的情况下。我们的RL步态控制器可以在短预测时间内实现稳定行走,而一个准确MP控制器则无法实现。此外,我们的控制器可以进行实时操作,不需要先期训练。我们的结果表明,在MPC和RL之间的混合,如我们所提出的,可以实现一个良好的平衡,以提高在线控制能力和计算复杂性。
SINC: Self-Supervised In-Context Learning for Vision-Language Tasks
paper_authors: Yi-Syuan Chen, Yun-Zhu Song, Cheng Yu Yeo, Bei Liu, Jianlong Fu, Hong-Han Shuai for: 这个论文的目的是探讨如何实现无 gradient 学习的情况下,大型 transformer 模型在视觉语言领域中进行增Context 学习。methods: 这个论文使用的方法是引入视觉信息到大型语言模型中,以便在输入上进行增Context 预测。results: 实验结果表明,SINC 方法在多种视觉语言任务下,在几个shot Setting 下表现出色,而且可以在实时进行增Context 预测。此外,SINC 方法的设计也帮助我们了解视觉语言领域中增Context 学习的好处,以及这种学习方式在不同任务中的发展。Abstract
Large Pre-trained Transformers exhibit an intriguing capacity for in-context learning. Without gradient updates, these models can rapidly construct new predictors from demonstrations presented in the inputs. Recent works promote this ability in the vision-language domain by incorporating visual information into large language models that can already make in-context predictions. However, these methods could inherit issues in the language domain, such as template sensitivity and hallucination. Also, the scale of these language models raises a significant demand for computations, making learning and operating these models resource-intensive. To this end, we raise a question: ``How can we enable in-context learning without relying on the intrinsic in-context ability of large language models?". To answer it, we propose a succinct and general framework, Self-supervised IN-Context learning (SINC), that introduces a meta-model to learn on self-supervised prompts consisting of tailored demonstrations. The learned models can be transferred to downstream tasks for making in-context predictions on-the-fly. Extensive experiments show that SINC outperforms gradient-based methods in various vision-language tasks under few-shot settings. Furthermore, the designs of SINC help us investigate the benefits of in-context learning across different tasks, and the analysis further reveals the essential components for the emergence of in-context learning in the vision-language domain.
摘要
大型预训Transformer显示了有趣的 Context Learning能力。无需梯度更新,这些模型可快速从输入中提取示例构建新预测器。最近的工作在视觉语言领域把视觉信息 integrate到可以在输入中进行预测的大语言模型中,以提高Context Learning能力。然而,这些方法可能会继承语言领域的问题,如模板敏感和幻觉。此外,这些语言模型的大规模需要巨量的计算资源,使学习和运行这些模型成为资源占用。因此,我们提出了问题:“如何启用Context Learning无需大语言模型的内在能力?”为回答这个问题,我们提出了一种简洁且通用的框架,Self-supervised IN-Context learning(SINC)。SINC引入了一个元模型,用于在自我超visuelle示例上学习。学习后,这些模型可以被转移到下游任务中进行实时预测。广泛的实验显示,SINC在视觉语言任务下的几个shot设定下表现出色,超过了梯度更新方法。此外,SINC的设计帮助我们调查在不同任务中Context Learning的好处,并且分析还揭示了视觉语言领域Context Learning的发展的关键组成部分。
Intuitive Access to Smartphone Settings Using Relevance Model Trained by Contrastive Learning
paper_authors: Joonyoung Kim, Kangwook Lee, Haebin Shin, Hurnjoo Lee, Sechun Kang, Byunguk Choi, Dong Shin, Joohyung Lee for: 该论文 targets 智能手机中的功能搜索问题,即用户难以找到功能的问题。methods: 该论文提出了一种新的搜索系统,使用了对比学习来训练一个拥有Contextual relevance的相关性模型,以及应用了知识填充来压缩模型,使其在设备上运行高效。results: 测试结果显示,该系统在 contextual sentence 查询和 usual keyword-based 查询中表现出色,超过了现有的搜索基准。Abstract
The more new features that are being added to smartphones, the harder it becomes for users to find them. This is because the feature names are usually short, and there are just too many to remember. In such a case, the users may want to ask contextual queries that describe the features they are looking for, but the standard term frequency-based search cannot process them. This paper presents a novel retrieval system for mobile features that accepts intuitive and contextual search queries. We trained a relevance model via contrastive learning from a pre-trained language model to perceive the contextual relevance between query embeddings and indexed mobile features. Also, to make it run efficiently on-device using minimal resources, we applied knowledge distillation to compress the model without degrading much performance. To verify the feasibility of our method, we collected test queries and conducted comparative experiments with the currently deployed search baselines. The results show that our system outperforms the others on contextual sentence queries and even on usual keyword-based queries.
摘要
随着智能手机中新增功能的数量的增加,用户找到这些功能越来越Difficult.这是因为功能名称通常很短,而且有太多了,用户可能会想要提问 Contextual queries 描述所需的功能,但标准的 terme frequency-based search 系统无法处理这些查询。本文介绍了一种新的手机功能检索系统,该系统可以接受用户提出的Intuitive和Contextual search queries。我们通过对预先训练的语言模型进行对比学习来训练一个 relevance 模型,以便在查询embeddings中感知Contextual relevance。此外,为了在设备上运行效率高并使用 minimal resources,我们应用了知识填充技术来压缩模型。为了证明我们的方法的可行性,我们收集了测试查询并进行了相对 эксперименты。结果表明,我们的系统在Contextual sentence queries 和一般键 palab 查询中都高于其他基elines。
Safe Formulas in the General Theory of Stable Models
results: 根据本研究结果,安全句子和其归grounding结果具有相同的稳定模型,而稳定模型的描述可以用一种简单的语法形式来表示。Abstract
Safe first-order formulas generalize the concept of a safe rule, which plays an important role in the design of answer set solvers. We show that any safe sentence is equivalent, in a certain sense, to the result of its grounding -- to the variable-free sentence obtained from it by replacing all quantifiers with multiple conjunctions and disjunctions. It follows that a safe sentence and the result of its grounding have the same stable models, and that the stable models of a safe sentence can be characterized by a formula of a simple syntactic form.
摘要
安全的首项式式表示安全规则的概念,这种概念在答案集解决器的设计中发挥着重要作用。我们证明任何安全句子都等价于其归根结构 -- 将它中的全量化器替换为多个并 conjunctions 和 disjunctions 后得到的变量自由句子。因此,安全句子和它的归根结构具有相同的稳定模型,并且安全句子的稳定模型可以通过一个简单的语法结构来描述。
Measuring Perceived Trust in XAI-Assisted Decision-Making by Eliciting a Mental Model
results: 研究发现,对于每个 ME,可以获得一个量化的信任值,以确定他们对 XAI 模型的信任程度。这些量化值可以判断 ME 是否对 XAI 模型有信任或不信任。此外,研究还发现,MEs 的心理主观性会影响他们对 XAI 模型的信任程度。Abstract
This empirical study proposes a novel methodology to measure users' perceived trust in an Explainable Artificial Intelligence (XAI) model. To do so, users' mental models are elicited using Fuzzy Cognitive Maps (FCMs). First, we exploit an interpretable Machine Learning (ML) model to classify suspected COVID-19 patients into positive or negative cases. Then, Medical Experts' (MEs) conduct a diagnostic decision-making task based on their knowledge and then prediction and interpretations provided by the XAI model. In order to evaluate the impact of interpretations on perceived trust, explanation satisfaction attributes are rated by MEs through a survey. Then, they are considered as FCM's concepts to determine their influences on each other and, ultimately, on the perceived trust. Moreover, to consider MEs' mental subjectivity, fuzzy linguistic variables are used to determine the strength of influences. After reaching the steady state of FCMs, a quantified value is obtained to measure the perceived trust of each ME. The results show that the quantified values can determine whether MEs trust or distrust the XAI model. We analyze this behavior by comparing the quantified values with MEs' performance in completing diagnostic tasks.
摘要
results: 本文提出了一种更加简单和普适的 elementary set 概念,并证明了其与非逻辑程序相关的最大不充足 elementary set 是非逻辑程序的所有非空不充足集之最小集。此外,本文还提供了一种图理学方法来Characterize elementary sets for nondisjunctive programs。在 contrast to nondisjunctive programs, 本文显示了对于分配程序,确定 elementary set 是 coNP-complete。Abstract
By introducing the concepts of a loop and a loop formula, Lin and Zhao showed that the answer sets of a nondisjunctive logic program are exactly the models of its Clark's completion that satisfy the loop formulas of all loops. Recently, Gebser and Schaub showed that the Lin-Zhao theorem remains correct even if we restrict loop formulas to a special class of loops called ``elementary loops.'' In this paper, we simplify and generalize the notion of an elementary loop, and clarify its role. We propose the notion of an elementary set, which is almost equivalent to the notion of an elementary loop for nondisjunctive programs, but is simpler, and, unlike elementary loops, can be extended to disjunctive programs without producing unintuitive results. We show that the maximal unfounded elementary sets for the ``relevant'' part of a program are exactly the minimal sets among the nonempty unfounded sets. We also present a graph-theoretic characterization of elementary sets for nondisjunctive programs, which is simpler than the one proposed in (Gebser & Schaub 2005). Unlike the case of nondisjunctive programs, we show that the problem of deciding an elementary set is coNP-complete for disjunctive programs.
摘要
林和赵通过引入循环和循环公式,表明答案集合的非逻辑程序是完全 Clark 完成的模型,满足所有循环公式的循环。最近,格卜和瑞布 показа了林-赵定理仍然正确,只要限制循环公式为特殊类循环 called “元素循环”。在这篇文章中,我们简化和推广元素循环的概念,并解释其作用。我们提出了元素集的概念,它与元素循环对非逻辑程序几乎等价,但更简单,而且与元素循环不同的是可以扩展到分支程序无需生成不自然的结果。我们证明了最大不定元素集的“相关”部分的程序是非空不定集中的最小集。我们还提出了非逻辑程序的元素集的图学特征化,这比 Gebser 和 Schaub (2005)提出的特征化更简单。不同于非逻辑程序,我们证明了决定元素集的问题是 coNP-完全的 для分支程序。
Abstracting Concept-Changing Rules for Solving Raven’s Progressive Matrix Problems
results: 这种方法可以自动抽象全局规则,并且在无外部监督下达到了同级或更高的准确率。Abstract
The abstract visual reasoning ability in human intelligence benefits discovering underlying rules in the novel environment. Raven's Progressive Matrix (RPM) is a classic test to realize such ability in machine intelligence by selecting from candidates. Recent studies suggest that solving RPM in an answer-generation way boosts a more in-depth understanding of rules. However, existing generative solvers cannot discover the global concept-changing rules without auxiliary supervision (e.g., rule annotations and distractors in candidate sets). To this end, we propose a deep latent variable model for Concept-changing Rule ABstraction (CRAB) by learning interpretable concepts and parsing concept-changing rules in the latent space. With the iterative learning process, CRAB can automatically abstract global rules shared on the dataset on each concept and form the learnable prior knowledge of global rules. CRAB outperforms the baselines trained without auxiliary supervision in the arbitrary-position answer generation task and achieves comparable and even higher accuracy than the compared models trained with auxiliary supervision. Finally, we conduct experiments to illustrate the interpretability of CRAB in concept learning, answer selection, and global rule abstraction.
摘要
人类智能中的抽象视觉理解能力可以帮助发现新环境中的底层规则。Raven's Progressive Matrix (RPM) 是一种经典的测试,用于评测机器智能中的这种能力。Recent studies 表明,通过answer-generation的方式解决RPM可以提高对规则的更深入的理解。然而,现有的生成解决方案无法自动发现全局概念变化的规则,需要 auxiliary supervision(例如,规则注释和distractors在候选集中)。为此,我们提出了一种深入学习的秘密变量模型,即Concept-changing Rule ABstraction (CRAB),可以学习可读的概念和解析概念变化规则在隐藏空间中。通过迭代学习过程,CRAB可以自动抽象数据集中的全局规则,并将这些规则形成可学习的先验知识。CRAB在无auxiliary supervision的arbitrary-position answer generation任务中表现出色,并与包括auxiliary supervision的比较模型相比,达到了相同或更高的准确率。最后,我们进行了实验,以示CRAB在概念学习、答案选择和全局规则抽象方面的解释性。
results: 本文显示了 causal theories 中 multi-valued constants 的 eliminating,并将 C+ 与 Pednault 提出的 action language ADL 进行比较。Abstract
This paper continues the line of work on representing properties of actions in nonmonotonic formalisms that stresses the distinction between being "true" and being "caused", as in the system of causal logic introduced by McCain and Turner and in the action language C proposed by Giunchiglia and Lifschitz. The only fluents directly representable in language C+ are truth-valued fluents, which is often inconvenient. We show that both causal logic and language C can be extended to allow values from arbitrary nonempty sets. Our extension of language C, called C+, also makes it possible to describe actions in terms of their attributes, which is important from the perspective of elaboration tolerance. We describe an embedding of C+ in causal theories with multi-valued constants, relate C+ to Pednault's action language ADL, and show how multi-valued constants can be eliminated in favor of Boolean constants.
摘要
for: 这个论文是为了推广 Ferraris et al. 的稳定模型定义,不再基于固定点,可应用于任意首选论 sentences 的 syntax。
methods: 这篇论文使用 Chen, Lin, Wang, Zhang 的 loop formulas with variables,并将其推广到分支计划和任意首选论 sentences。它还扩展了逻辑计划的语法,允许显式Quantifier,并定义其 semantics 为 Ferraris et al. 的稳定模型语言的一个 subclass。
results: 这篇论文显示了这种扩展的逻辑计划可以在稳定模型 semantics 下进行非 monotonic reasoning,而且在不假设唯一名称和Domain closure 的情况下仍然能够处理非 Herbrand 稳定模型。此外,它还显示了一些语法条件,使得查询答案可以通过 first-order 逻辑推理来实现,从而可以使用 first-order 证明器进行非 Herbrand 稳定模型的推理。Abstract
Recently Ferraris, Lee and Lifschitz proposed a new definition of stable models that does not refer to grounding, which applies to the syntax of arbitrary first-order sentences. We show its relation to the idea of loop formulas with variables by Chen, Lin, Wang and Zhang, and generalize their loop formulas to disjunctive programs and to arbitrary first-order sentences. We also extend the syntax of logic programs to allow explicit quantifiers, and define its semantics as a subclass of the new language of stable models by Ferraris et al. Such programs inherit from the general language the ability to handle nonmonotonic reasoning under the stable model semantics even in the absence of the unique name and the domain closure assumptions, while yielding more succinct loop formulas than the general language due to the restricted syntax. We also show certain syntactic conditions under which query answering for an extended program can be reduced to entailment checking in first-order logic, providing a way to apply first-order theorem provers to reasoning about non-Herbrand stable models.
摘要
最近,菲律数、李和里夫斯提出了一个新的定义方式,不受地面影响,可以应用于任意首项关系文法中的语法。我们展示了这个定义与陈等人的循环式关系式的联系,并将其扩展到分类程式和任意首项关系文法中。我们还将逻辑程式的 syntax 扩展为允许显式量词,并定义其 semantics 为一个基于新的稳定模型语言的子集。这些程式继承了稳定模型语言中的非对称逻辑推理能力,但是具有更短的循环式关系式,因为其 restrictive syntax。我们还展示了一些语法条件,使得问题回答可以与首项关系逻辑推理相同,并且可以运用首项关系逻辑推理器进行非HERBRAND稳定模型的推理。
First-Order Stable Model Semantics with Intensional Functions
for: 该论文旨在扩展answer set programming(ASP)中的函数支持,以便在ASP中执行first-order reasoning。
methods: 该论文使用了 Ferraris、Lee、Lifschitz的first-order stable model semantics,并将函数与前置定义的 predicate 一样地处理。
results: 该论文显示了多种已知的ASP性质可以自然地扩展到该形式中,并与其他相关的方法进行比较。此外,该论文还基于这种扩展定义了Answer Set Programming Modulo Theories(ASPMT),可以在含有实数的领域中进行有效的first-orderreasoning。Abstract
In classical logic, nonBoolean fluents, such as the location of an object, can be naturally described by functions. However, this is not the case in answer set programs, where the values of functions are pre-defined, and nonmonotonicity of the semantics is related to minimizing the extents of predicates but has nothing to do with functions. We extend the first-order stable model semantics by Ferraris, Lee, and Lifschitz to allow intensional functions -- functions that are specified by a logic program just like predicates are specified. We show that many known properties of the stable model semantics are naturally extended to this formalism and compare it with other related approaches to incorporating intensional functions. Furthermore, we use this extension as a basis for defining Answer Set Programming Modulo Theories (ASPMT), analogous to the way that Satisfiability Modulo Theories (SMT) is defined, allowing for SMT-like effective first-order reasoning in the context of ASP. Using SMT solving techniques involving functions, ASPMT can be applied to domains containing real numbers and alleviates the grounding problem. We show that other approaches to integrating ASP and CSP/SMT can be related to special cases of ASPMT in which functions are limited to non-intensional ones.
摘要
在经典逻辑中,非布尔流变量,如物体的位置,可以自然地被函数描述。然而,在答案集程序中,函数的值是预先定义的,并且非 monotonicity 的 semantics 与函数没有直接关系。我们将 Ferraris、Lee 和 Lifschitz 的第一阶stable model semantics 扩展以允许内在函数 -- 函数被逻辑程序所定义,与 predicates 一样。我们表明了许多已知的 stable model semantics 的属性被自然地扩展到这种 формалиズмом,并与其他相关的方法进行比较。此外,我们使用这种扩展为基础,定义了 Answer Set Programming Modulo Theories(ASPMT),类似于 Satisfiability Modulo Theories(SMT)的定义,允许在 ASP 中进行有效的第一阶逻辑推理。使用 SMT 解决方法 involving functions,ASPMT 可以应用于含有实数的Domain中,并alleviate the grounding problem。我们还证明了其他将 ASP 和 CSP/SMT 集成的方法可以被看作 ASPMT 中函数的特殊情况。
RL-ViGen: A Reinforcement Learning Benchmark for Visual Generalization
paper_authors: Zhecheng Yuan, Sizhe Yang, Pu Hua, Can Chang, Kaizhe Hu, Xiaolong Wang, Huazhe Xu for:* 这篇论文旨在解决视觉学习中的扩展性问题,即RL Agent在不同任务和扩展类别下的扩展性能力的评估。methods:* 本论文提出了RL-ViGen,一个新的视觉学习评价 benchmark,其包含了多种任务和扩展类别,以便更好地评估RL Agent的扩展性能力。results:* 实验结果表明,现有的视觉RL算法中没有一个 universally 适用于所有任务,RL-ViGen 可以作为一个 catalyst,促进未来创造出适用于实际场景的通用视觉RL Agent。Abstract
Visual Reinforcement Learning (Visual RL), coupled with high-dimensional observations, has consistently confronted the long-standing challenge of out-of-distribution generalization. Despite the focus on algorithms aimed at resolving visual generalization problems, we argue that the devil is in the existing benchmarks as they are restricted to isolated tasks and generalization categories, undermining a comprehensive evaluation of agents' visual generalization capabilities. To bridge this gap, we introduce RL-ViGen: a novel Reinforcement Learning Benchmark for Visual Generalization, which contains diverse tasks and a wide spectrum of generalization types, thereby facilitating the derivation of more reliable conclusions. Furthermore, RL-ViGen incorporates the latest generalization visual RL algorithms into a unified framework, under which the experiment results indicate that no single existing algorithm has prevailed universally across tasks. Our aspiration is that RL-ViGen will serve as a catalyst in this area, and lay a foundation for the future creation of universal visual generalization RL agents suitable for real-world scenarios. Access to our code and implemented algorithms is provided at https://gemcollector.github.io/RL-ViGen/.
摘要
visual reinforcement learning (Visual RL) coupled with high-dimensional observations, has consistently confronted the long-standing challenge of out-of-distribution generalization. Despite the focus on algorithms aimed at resolving visual generalization problems, we argue that the devil is in the existing benchmarks as they are restricted to isolated tasks and generalization categories, undermining a comprehensive evaluation of agents' visual generalization capabilities. To bridge this gap, we introduce RL-ViGen: a novel Reinforcement Learning Benchmark for Visual Generalization, which contains diverse tasks and a wide spectrum of generalization types, thereby facilitating the derivation of more reliable conclusions. Furthermore, RL-ViGen incorporates the latest generalization visual RL algorithms into a unified framework, under which the experiment results indicate that no single existing algorithm has prevailed universally across tasks. Our aspiration is that RL-ViGen will serve as a catalyst in this area, and lay a foundation for the future creation of universal visual generalization RL agents suitable for real-world scenarios. Access to our code and implemented algorithms is provided at https://gemcollector.github.io/RL-ViGen/.Here's the word-for-word translation of the text into Simplified Chinese:视觉强化学习(Visual RL),结合高维度观察,一直面临着 OUT-OF-distribution 泛化挑战。尽管关注在解决视觉泛化问题上的算法,但我们认为存在的 benchmarks 是隔离任务和泛化类别的,这会妨碍对代理人的视觉泛化能力进行全面评估。为了bridging这个差距,我们介绍 RL-ViGen:一个新的强化学习 benchmark для视觉泛化,包含多种任务和广泛的泛化类型,从而促进更可靠的结论。此外, RL-ViGen 还将 latest visual RL 泛化算法集成到一个统一的框架中,实验结果表明,无论任务,任何一个现有算法都没有在所有任务上 universal 适用。我们希望 RL-ViGen 能成为这一领域的 catalyst,并为实际场景中的 universal 视觉泛化 RL 代理人提供基础。可以在https://gemcollector.github.io/RL-ViGen/ 获取我们的代码和实现算法。
NeurASP: Embracing Neural Networks into Answer Set Programming
paper_authors: Zhun Yang, Adam Ishay, Joohyung Lee
for: 用于结合符号计算和低水平计算的简单扩展
methods: 使用神经网络输出作为答案集计算中的概率分布
results: 可以使用预训练神经网络进行符号计算,并通过应用符号逻辑来改善神经网络的感知结果,同时可以通过训练ASP规则来使神经网络更好地学习。Abstract
We present NeurASP, a simple extension of answer set programs by embracing neural networks. By treating the neural network output as the probability distribution over atomic facts in answer set programs, NeurASP provides a simple and effective way to integrate sub-symbolic and symbolic computation. We demonstrate how NeurASP can make use of a pre-trained neural network in symbolic computation and how it can improve the neural network's perception result by applying symbolic reasoning in answer set programming. Also, NeurASP can be used to train a neural network better by training with ASP rules so that a neural network not only learns from implicit correlations from the data but also from the explicit complex semantic constraints expressed by the rules.
摘要
我团队今天宣布了一个简单的扩展项目,即NeurASP,它通过将神经网络输出视为答案集程序中的概率分布来实现。通过将子符号计算和符号计算结合起来,NeurASP提供了一个简单有效的方式来整合子符号计算和符号计算。我们展示了如何使用预训练神经网络进行符号计算,以及如何通过应用答案集程序的 символиック逻辑来改进神经网络的识别结果。此外,NeurASP还可以用来训练神经网络,使其不仅从数据中学习隐式相关性,还从答案集程序中表达的复杂 semantic constraints中学习明确的符号逻辑。
Leveraging Large Language Models to Generate Answer Set Programs
results: 研究发现,只需要几个受Context learning示例,LLM 就可以生成相对复杂的答案集程序。大多数错误都是相对简单的,可以由人类轻松 corrrect。因此,LLM 可以有效地帮助创建答案集程序。Abstract
Large language models (LLMs), such as GPT-3 and GPT-4, have demonstrated exceptional performance in various natural language processing tasks and have shown the ability to solve certain reasoning problems. However, their reasoning capabilities are limited and relatively shallow, despite the application of various prompting techniques. In contrast, formal logic is adept at handling complex reasoning, but translating natural language descriptions into formal logic is a challenging task that non-experts struggle with. This paper proposes a neuro-symbolic method that combines the strengths of large language models and answer set programming. Specifically, we employ an LLM to transform natural language descriptions of logic puzzles into answer set programs. We carefully design prompts for an LLM to convert natural language descriptions into answer set programs in a step by step manner. Surprisingly, with just a few in-context learning examples, LLMs can generate reasonably complex answer set programs. The majority of errors made are relatively simple and can be easily corrected by humans, thus enabling LLMs to effectively assist in the creation of answer set programs.
摘要
大型语言模型(LLM),如GPT-3和GPT-4,在不同的自然语言处理任务中表现出色,并且能够解决一些推理问题。然而,它们的推理能力相对较浅,即使使用了不同的推问技巧。相比之下,正式逻辑能够处理复杂的推理,但将自然语言描述转换为正式逻辑是一个困难的任务,非专家通常难以进行。这篇论文提议了一个神经符号方法,将大型语言模型和答案集计算结合在一起。具体来说,我们使用一个LLM将自然语言描述逻辑题目转换为答案集程式。我们严格设计了对LLM的推问示例,以步骤地方式将自然语言描述转换为答案集程式。 surprisingly,仅需几个内容学习示例,LLM可以生成相对复杂的答案集程式。大多数错误都是相对简单的,可以轻松地由人类更正,因此LLM可以有效地协助创建答案集程式。
The Growth of E-Bike Use: A Machine Learning Approach
results: 研究发现,电动自行车在美国的使用带来了15,737.82公斤的二氧化碳排放减少和716,630.727千卡路里的热量燃烧。此外,研究还发现了电动自行车销售增长的主要影响因素,包括可 dispose的人均收入和受欢迎程度。Abstract
We present our work on electric bicycles (e-bikes) and their implications for policymakers in the United States. E-bikes have gained significant popularity as a fast and eco-friendly transportation option. As we strive for a sustainable energy plan, understanding the growth and impact of e-bikes is crucial for policymakers. Our mathematical modeling offers insights into the value of e-bikes and their role in the future. Using an ARIMA model, a supervised machine-learning algorithm, we predicted the growth of e-bike sales in the U.S. Our model, trained on historical sales data from January 2006 to December 2022, projected sales of 1.3 million units in 2025 and 2.113 million units in 2028. To assess the factors contributing to e-bike usage, we employed a Random Forest regression model. The most significant factors influencing e-bike sales growth were disposable personal income and popularity. Furthermore, we examined the environmental and health impacts of e-bikes. Through Monte Carlo simulations, we estimated the reduction in carbon emissions due to e-bike use and the calories burned through e-biking. Our findings revealed that e-bike usage in the U.S. resulted in a reduction of 15,737.82 kilograms of CO2 emissions in 2022. Additionally, e-bike users burned approximately 716,630.727 kilocalories through their activities in the same year. Our research provides valuable insights for policymakers, emphasizing the potential of e-bikes as a sustainable transportation solution. By understanding the growth factors and quantifying the environmental and health benefits, policymakers can make informed decisions about integrating e-bikes into future energy and transportation strategies.
摘要
我们在美国的电动自行车(e-bike)方面进行了研究,并对政策制定者提供了有价值的信息。电动自行车在快速和环保的交通方式上受到了广泛的欢迎,因此理解电动自行车的增长和影响对于政策制定者是非常重要的。我们使用ARIMA模型和一种监管的机器学习算法来预测美国电动自行车销售的增长。我们的模型,基于2006年1月至2022年12月的历史销售数据,预测到2025年的销售量将达130万台,而到2028年将达2113万台。为了了解电动自行车使用的因素,我们使用Random Forest回归模型。我们发现,个人废弃收入和流行度是电动自行车销售增长的最重要因素。此外,我们还对电动自行车的环境和健康影响进行了分析。通过蒙特卡罗 simulate,我们计算了因电动自行车使用而减少的二氧化碳排放量和通过电动自行车活动烧取的卡路里。我们的发现表明,在2022年,美国的电动自行车使用已经减少了15737.82公斤的二氧化碳排放量,同时电动自行车用户通过其活动烧取了约716630.727公斤的卡路里。我们的研究为政策制定者提供了有价值的信息,证明了电动自行车的可持续性,并且可以作为未来能源和交通战略的一部分。通过理解电动自行车增长的因素和量化电动自行车对环境和健康的影响,政策制定者可以做出有知识的决策。
Coupling Large Language Models with Logic Programming for Robust and General Reasoning from Text
methods: 该论文使用了一种基于 ASP(answer set programming)的逻辑知识表示形式,将自然语言句子转换为逻辑形式,并将这些逻辑形式作为 LLM 的输入。这种方法可以在不需要重新训练的情况下,让 LLM 适应不同的问题。
results: 该论文实验表明,这种方法可以在多个 NLP 评估 benchmark 上实现state-of-the-art 性能,包括 bAbI、StepGame、CLUTRR 和 gSCAN。此外,这种方法还可以成功解决了一些 LLM 无法解决的机器人规划任务。Abstract
While large language models (LLMs), such as GPT-3, appear to be robust and general, their reasoning ability is not at a level to compete with the best models trained for specific natural language reasoning problems. In this study, we observe that a large language model can serve as a highly effective few-shot semantic parser. It can convert natural language sentences into a logical form that serves as input for answer set programs, a logic-based declarative knowledge representation formalism. The combination results in a robust and general system that can handle multiple question-answering tasks without requiring retraining for each new task. It only needs a few examples to guide the LLM's adaptation to a specific task, along with reusable ASP knowledge modules that can be applied to multiple tasks. We demonstrate that this method achieves state-of-the-art performance on several NLP benchmarks, including bAbI, StepGame, CLUTRR, and gSCAN. Additionally, it successfully tackles robot planning tasks that an LLM alone fails to solve.
摘要
大型自然语言模型(LLM),如GPT-3,看起来具有坚固的基础和通用性,但它们的理解能力并没有与专门设计的自然语言理解问题模型相比。在这个研究中,我们发现了一种使用大型自然语言模型来实现几次例示semantic parser的方法。它可以将自然语言句子转换成逻辑形式,该逻辑形式可以作为答案集程序的输入,这种逻辑基于的知识表示形式。这种组合系统可以处理多个问题回答任务,无需为每个新任务进行重新训练。它只需要几个示例来导引LLM的适应特定任务,以及可重用的ASP知识模块,可以应用于多个任务。我们 demonstated 这种方法在多个 NLP 标准准测试上达到了现状最佳性能,包括 bAbI、StepGame、CLUTRR 和 gSCAN。此外,它还成功解决了一些 LLM 无法解决的机器人规划任务。
A Survey on Change Detection Techniques in Document Images
results: 本文对文档图像中的变化检测方法进行了总结和评价,并报告了现有的数据集和评价指标,以及现有方法的缺点和挑战。Abstract
The problem of change detection in images finds application in different domains like diagnosis of diseases in the medical field, detecting growth patterns of cities through remote sensing, and finding changes in legal documents and contracts. However, this paper presents a survey on core techniques and rules to detect changes in different versions of a document image. Our discussions on change detection focus on two categories -- content-based and layout-based. The content-based techniques intelligently extract and analyze the image contents (text or non-text) to show the possible differences, whereas the layout-based techniques use structural information to predict document changes. We also summarize the existing datasets and evaluation metrics used in change detection experiments. The shortcomings and challenges the existing methods face are reported, along with some pointers for future research work.
摘要
该问题在不同领域都有应用,如医疗领域的疾病诊断、通过远程感知获取城市增长趋势,以及法律文档和合同中的变化检测。然而,本文主要介绍了文档版本之间的变化检测技术和规则。我们的讨论关注内容基于和布局基于的两个类别。内容基于的技术通过智能EXTRACT和分析图像内容(文本或非文本)来显示可能的差异,而布局基于的技术使用结构信息预测文档变化。我们还总结了已有的数据集和评价标准,并报告现有方法的缺陷和挑战,以及未来研究的指向。
Creating a Dataset for High-Performance Computing Code Translation: A Bridge Between HPC Fortran and C++
results: 研究表明,使用该数据集可以大幅提高大规模语言模型的翻译能力,具体提高$\times 5.1$(无前期编程知识)和$\times 9.9$(有些编程familiarity)。这种数据集的出现有助于提高高性计算代码翻译的领域进步。数据集可以在https://github.com/bin123apple/Fortran-CPP-HPC-code-translation-dataset中下载。Abstract
In this study, we present a novel dataset for training machine learning models translating between OpenMP Fortran and C++ code. To ensure reliability and applicability, the dataset is initially refined using a meticulous code similarity test. The effectiveness of our dataset is assessed using both quantitative (CodeBLEU) and qualitative (human evaluation) methods. We demonstrate how this dataset can significantly improve the translation capabilities of large-scale language models, with improvements of $\mathbf{\times 5.1}$ for models with no prior coding knowledge and $\mathbf{\times 9.9}$ for models with some coding familiarity. Our work highlights the potential of this dataset to advance the field of code translation for high-performance computing. The dataset is available at https://github.com/bin123apple/Fortran-CPP-HPC-code-translation-dataset
摘要
在这项研究中,我们提供了一个新的代码集合用于训练机器学习模型翻译OpenMP Fortran和C++代码。为确保可靠性和可应用性,我们首先使用仔细的代码相似性测试进行初步约束。我们使用代码BLEU和人类评估方法进行评估效果,并证明这些数据可以帮助大规模语言模型提高翻译能力,其中模型没有编程知识时提高$\times 5.1$,而具有一定编程经验时提高$\times 9.9$。我们的工作表明这个数据集可以推动高性能计算领域代码翻译的发展。这个数据集可以在https://github.com/bin123apple/Fortran-CPP-HPC-code-translation-dataset中下载。
Bound by the Bounty: Collaboratively Shaping Evaluation Processes for Queer AI Harms
paper_authors: Organizers of QueerInAI, Nathan Dennler, Anaelia Ovalle, Ashwin Singh, Luca Soldaini, Arjun Subramonian, Huy Tu, William Agnew, Avijit Ghosh, Kyra Yee, Irene Font Peradejordi, Zeerak Talat, Mayra Russo, Jess de Jesus de Pinho Pinhal
for: This paper aims to understand the perspectives of queer communities on bias evaluation benchmarks and dataset and model documentation for AI systems, and to redesign these processes from queer perspectives.
methods: The paper uses a participatory workshop to gather feedback from queer communities on bias bounties and to critique and redesign these processes.
results: The paper finds that queer communities have concerns about the ownership, incentives, and efficacy of bias bounties, and advocates for community ownership of bounties and the use of participatory processes (e.g., co-creation) to complement bias bounties.Abstract
Bias evaluation benchmarks and dataset and model documentation have emerged as central processes for assessing the biases and harms of artificial intelligence (AI) systems. However, these auditing processes have been criticized for their failure to integrate the knowledge of marginalized communities and consider the power dynamics between auditors and the communities. Consequently, modes of bias evaluation have been proposed that engage impacted communities in identifying and assessing the harms of AI systems (e.g., bias bounties). Even so, asking what marginalized communities want from such auditing processes has been neglected. In this paper, we ask queer communities for their positions on, and desires from, auditing processes. To this end, we organized a participatory workshop to critique and redesign bias bounties from queer perspectives. We found that when given space, the scope of feedback from workshop participants goes far beyond what bias bounties afford, with participants questioning the ownership, incentives, and efficacy of bounties. We conclude by advocating for community ownership of bounties and complementing bounties with participatory processes (e.g., co-creation).
摘要
人工智能(AI)系统的偏见和伤害评估过程和数据集已成为评估AI系统偏见和伤害的中心过程。然而,这些审核过程受到了社会弱势群体知识不包括和审核人员与社区力量不均衡的批评。因此,一些偏见评估模式被提出,通过与受影响社区合作来识别和评估AI系统的伤害(例如,偏见报酬)。然而,寻求社会弱势群体想要从审核过程中获得的问题仍然被忽略。在这篇论文中,我们问到了LGBTQ+社区对审核过程的看法和期望。为此,我们组织了参与式工作坊,批判和重新设计偏见报酬从Queer perspective。我们发现,当给予参与者空间时,参与者的反馈范围超出了偏见报酬的范围,参与者质疑报酬的所有权、动机和有效性。我们 conclude by advocating for community ownership of bounties and complementing bounties with participatory processes (e.g., co-creation).
Efficient Adversarial Attacks on Online Multi-agent Reinforcement Learning
For: 本研究探讨了多智能体强化学习(MARL)模型面临恶意攻击的影响。* Methods: 我们 investigate the impact of adversarial attacks on MARL models, including action poisoning and reward poisoning attacks, as well as a mixed attack strategy that combines both.* Results: 我们发现,混合攻击策略可以高效地攻击 MARL 模型,即使攻击者没有对环境和代理人算法的任何先前信息。Abstract
Due to the broad range of applications of multi-agent reinforcement learning (MARL), understanding the effects of adversarial attacks against MARL model is essential for the safe applications of this model. Motivated by this, we investigate the impact of adversarial attacks on MARL. In the considered setup, there is an exogenous attacker who is able to modify the rewards before the agents receive them or manipulate the actions before the environment receives them. The attacker aims to guide each agent into a target policy or maximize the cumulative rewards under some specific reward function chosen by the attacker, while minimizing the amount of manipulation on feedback and action. We first show the limitations of the action poisoning only attacks and the reward poisoning only attacks. We then introduce a mixed attack strategy with both the action poisoning and the reward poisoning. We show that the mixed attack strategy can efficiently attack MARL agents even if the attacker has no prior information about the underlying environment and the agents' algorithms.
摘要
We first show the limitations of action poisoning only attacks and reward poisoning only attacks. We then introduce a mixed attack strategy that combines both action poisoning and reward poisoning. We demonstrate that the mixed attack strategy can effectively attack MARL agents even if the attacker has no prior knowledge of the underlying environment and the agents' algorithms.
Efficient Action Robust Reinforcement Learning with Probabilistic Policy Execution Uncertainty
results: 该 paper 的 Action Robust Reinforcement Learning with Certificates (ARRLC) 算法可以 дости到 minimax 优化的 regret 和样本复杂度,并且在实验中证明了其在行动偏移情况下的稳定性和更快的 converges 速度。Abstract
Robust reinforcement learning (RL) aims to find a policy that optimizes the worst-case performance in the face of uncertainties. In this paper, we focus on action robust RL with the probabilistic policy execution uncertainty, in which, instead of always carrying out the action specified by the policy, the agent will take the action specified by the policy with probability $1-\rho$ and an alternative adversarial action with probability $\rho$. We establish the existence of an optimal policy on the action robust MDPs with probabilistic policy execution uncertainty and provide the action robust Bellman optimality equation for its solution. Furthermore, we develop Action Robust Reinforcement Learning with Certificates (ARRLC) algorithm that achieves minimax optimal regret and sample complexity. Furthermore, we conduct numerical experiments to validate our approach's robustness, demonstrating that ARRLC outperforms non-robust RL algorithms and converges faster than the robust TD algorithm in the presence of action perturbations.
摘要
中文简体版Robust reinforcement learning(RL)目的是找到面对不确定性时的最佳策略。在这篇论文中,我们关注行动稳健RL,即在执行策略时存在可能性的情况下,agent将按照策略指定的行动执行的可能性为$1-\rho$,而剩下的可能性为$\rho$。我们证明了action robust Markov decision process(MDP)中的优化策略的存在,并提供了action robust Bellman优化方程的解。此外,我们开发了Action Robust Reinforcement Learning with Certificates(ARRLC)算法,实现了最小最大 regret和样本复杂度的优化。此外,我们进行了数值实验,证明了ARRLC在行动偏移下的 robustness,并证明它在存在行动偏移时比非稳健RL算法和robust TD算法更快 converges。Traditional Chinese versionRobust reinforcement learning(RL)的目的是找到面对不确定性时的最佳策略。在这篇论文中,我们关注行动稳健RL,即在执行策略时存在可能性的情况下,agent将按照策略指定的行动执行的可能性为$1-\rho$,而剩下的可能性为$\rho$。我们证明了action robust Markov decision process(MDP)中的优化策略的存在,并提供了action robust Bellman优化方程的解。此外,我们开发了Action Robust Reinforcement Learning with Certificates(ARRLC)算法,实现了最小最大 regret和样本复杂度的优化。此外,我们进行了数值实验,证明了ARRLC在行动偏移下的 robustness,并证明它在存在行动偏移时比非稳健RL算法和robust TD算法更快 converges。
MPDIoU: A Loss for Efficient and Accurate Bounding Box Regression
for: 本研究旨在解决现有 bounding box regression loss function 无法优化 predicted box 和 groundtruth box 尺寸相同但是宽高值不同的问题。
methods: 本研究提出了一种基于 minimum point distance 的 bounding box similarity comparison metric MPDIoU,包括了现有 loss functions 中考虑的所有相关因素,如 overlap 或 non-overlapping 面积、中心点距离、宽高差异,而简化计算过程。基于 MPDIoU 的 bounding box regression loss function 被称为 LMPDIoU。
results: 实验结果表明,基于 MPDIoU 的 loss function 在 state-of-the-art instance segmentation 和 object detection 模型(例如 YOLACT 和 YOLOv7)上,对 PASCAL VOC、MS COCO 和 IIIT5k 进行训练后,与现有 loss functions 相比,具有更高的精度和效果。Abstract
Bounding box regression (BBR) has been widely used in object detection and instance segmentation, which is an important step in object localization. However, most of the existing loss functions for bounding box regression cannot be optimized when the predicted box has the same aspect ratio as the groundtruth box, but the width and height values are exactly different. In order to tackle the issues mentioned above, we fully explore the geometric features of horizontal rectangle and propose a novel bounding box similarity comparison metric MPDIoU based on minimum point distance, which contains all of the relevant factors considered in the existing loss functions, namely overlapping or non-overlapping area, central points distance, and deviation of width and height, while simplifying the calculation process. On this basis, we propose a bounding box regression loss function based on MPDIoU, called LMPDIoU . Experimental results show that the MPDIoU loss function is applied to state-of-the-art instance segmentation (e.g., YOLACT) and object detection (e.g., YOLOv7) model trained on PASCAL VOC, MS COCO, and IIIT5k outperforms existing loss functions.
摘要
bounding box regression (BBR) 广泛应用于物体检测和实例分割,是物体Localization的重要步骤。然而,现有的 bounding box regression 损失函数无法优化预测框与实际框的尺寸值不同,但宽高比相同的情况。为解决以上问题,我们彻底探讨直方框的几何特征,并提出一种基于最小点距离的 bounding box 相似比较度量 MPDIoU,该度量包含现有损失函数中考虑的所有因素,包括重叠或非重叠区域、中心点距离、宽高差异,而简化计算过程。基于 MPDIoU 度量,我们提出了一种基于 MPDIoU 的 bounding box regression 损失函数 LMPDIoU。实验结果表明,MPDIoU 损失函数在 state-of-the-art 实例分割(如 YOLACT)和物体检测(如 YOLOv7)模型在 PASCAL VOC、MS COCO 和 IIIT5k 上训练后,与现有损失函数相比,具有更高的性能。
SALC: Skeleton-Assisted Learning-Based Clustering for Time-Varying Indoor Localization
paper_authors: An-Hung Hsiao, Li-Hsiang Shen, Chen-Yi Chang, Chun-Jie Chiu, Kai-Ten Feng for: 本研究旨在提高室内地位系统的精度和可靠性,透过对室内 WiFi 接点点 (AP) 的接收信号强度 (RSS) 进行建立 fingerprinting 数据库。methods: 本研究提出了一个基于 skeleton-assisted learning-based clustering (SALC) 系统,包括 RSS-oriented map-assisted clustering (ROMAC)、cluster-based online database establishment (CODE) 和 cluster-scaled location estimation (CsLE)。SALC 系统结合了骨架基于最短路 (SSP) 的相似性和时间变化的 RSS 测量 across reference points (RPs)。results: simulation 和实验结果表明,提出的 SALC 系统可以有效地重建 fingerprint 数据库,提高地位估计精度,比较出色于现有Literature中的其他方法。Abstract
Wireless indoor localization has attracted significant amount of attention in recent years. Using received signal strength (RSS) obtained from WiFi access points (APs) for establishing fingerprinting database is a widely utilized method in indoor localization. However, the time-variant problem for indoor positioning systems is not well-investigated in existing literature. Compared to conventional static fingerprinting, the dynamicallyreconstructed database can adapt to a highly-changing environment, which achieves sustainability of localization accuracy. To deal with the time-varying issue, we propose a skeleton-assisted learning-based clustering localization (SALC) system, including RSS-oriented map-assisted clustering (ROMAC), cluster-based online database establishment (CODE), and cluster-scaled location estimation (CsLE). The SALC scheme jointly considers similarities from the skeleton-based shortest path (SSP) and the time-varying RSS measurements across the reference points (RPs). ROMAC clusters RPs into different feature sets and therefore selects suitable monitor points (MPs) for enhancing location estimation. Moreover, the CODE algorithm aims for establishing adaptive fingerprint database to alleviate the timevarying problem. Finally, CsLE is adopted to acquire the target position by leveraging the benefits of clustering information and estimated signal variations in order to rescale the weights fromweighted k-nearest neighbors (WkNN) method. Both simulation and experimental results demonstrate that the proposed SALC system can effectively reconstruct the fingerprint database with an enhanced location estimation accuracy, which outperforms the other existing schemes in the open literature.
摘要
无线内部位置已引起过去几年的广泛关注。使用WiFi接入点(AP)接收信号强度(RSS)建立指本库是内部位置系统中广泛使用的方法。然而,现有文献中对内部位置系统的时间变化问题并未得到充分研究。与传统静止指本相比,动态重建库可以适应高度变化的环境,实现位置准确性的持续稳定。为解决时间变化问题,我们提出了骨架帮助学习基本的集成位置系统(SALC),包括RSS方向的地图帮助集成(ROMAC)、集成在线数据库建立(CODE)和集群扩大位置估计(CsLE)。SALC方案同时考虑骨架基于最短路(SSP)的相似性和时间变化的RSS测量值。ROMAC将RP集成到不同的特征集中,选择合适的监测点(MP)以提高位置估计。此外,CODE算法旨在建立适应时间变化的指本库,以解决时间变化问题。最后,CsLE方法通过利用集群信息和估计信号变化来重新调整WkNN方法中的权重,以获得更高的位置估计精度。实验和 simulations结果表明,提出的SALC系统可以有效地重建指本库,并提高位置估计精度,超过现有文献中的其他方案。
Othering and low prestige framing of immigrant cuisines in US restaurant reviews and large language models
paper_authors: Yiwei Luo, Kristina Gligorić, Dan Jurafsky
for: This paper aims to understand implicit attitudes toward food and how they can perpetuate social prejudice, specifically in the context of immigrant cuisines.
methods: The authors use linguistic analyses of over 2.1 million English language Yelp reviews of restaurants in 14 US states to evaluate social theories about attitudes toward immigrant cuisine. They control for factors such as restaurant price and neighborhood racial diversity.
results: The authors find that immigrant cuisines are more likely to be framed in objectifying and othering terms of authenticity, exoticism, and prototypicality, and that non-Western immigrant cuisines receive more othering than European cuisines. Additionally, they find that non-Western immigrant cuisines are framed less positively and as lower status, being evaluated in terms of affordability and hygiene. Finally, they show that reviews generated by large language models (LLMs) reproduce many of the same framing tendencies.Abstract
Identifying and understanding implicit attitudes toward food can help efforts to mitigate social prejudice due to food's pervasive role as a marker of cultural and ethnic identity. Stereotypes about food are a form of microaggression that contribute to harmful public discourse that may in turn perpetuate prejudice toward ethnic groups and negatively impact economic outcomes for restaurants. Through careful linguistic analyses, we evaluate social theories about attitudes toward immigrant cuisine in a large-scale study of framing differences in 2.1M English language Yelp reviews of restaurants in 14 US states. Controlling for factors such as restaurant price and neighborhood racial diversity, we find that immigrant cuisines are more likely to be framed in objectifying and othering terms of authenticity (e.g., authentic, traditional), exoticism (e.g., exotic, different), and prototypicality (e.g., typical, usual), but that non-Western immigrant cuisines (e.g., Indian, Mexican) receive more othering than European cuisines (e.g., French, Italian). We further find that non-Western immigrant cuisines are framed less positively and as lower status, being evaluated in terms of affordability and hygiene. Finally, we show that reviews generated by large language models (LLMs) reproduce many of the same framing tendencies. Our results empirically corroborate social theories of taste and gastronomic stereotyping, and reveal linguistic processes by which such attitudes are reified.
摘要
认识和理解食物的隐式态度可以帮助减少基于食物的文化和民族身份 marker 的社会偏见。食物的刻板印象是一种微侵略,它们可能在社会公共讨论中产生有害的影响,从而导致对少数民族的偏见和不良经济效益。通过精心的语言分析,我们在大规模的英语 Yelp 评论数据集中评估社会理论对移民菜系的态度。控制因素包括餐厅价格和邻里种族多样性,我们发现移民菜系更有可能被归类为真实、传统、特色、外国、不同、常见等词汇,但非西方移民菜系(如印度、墨西哥)被其他化比欧洲菜系(如法国、意大利)更多。此外,我们发现非西方移民菜系在评价中受到负面评价,被评价为便宜和卫生。最后,我们发现由大语言模型(LLMs)生成的评论也存在类似的归类倾向。我们的结果经验证了社会理论的味蕾和文化刻板印象,并揭示了语言过程如何固化这些态度。
`It is currently hodgepodge’’: Examining AI/ML Practitioners’ Challenges during Co-production of Responsible AI Values
results: 研究发现,实施RAI价值观会由于组织结构和价值观念的冲突而带来挑战,这些挑战会影响实践者的工作。研究还发现了多种解决这些挑战的策略,包括在组织结构和价值观念方面进行调整。Abstract
Recently, the AI/ML research community has indicated an urgent need to establish Responsible AI (RAI) values and practices as part of the AI/ML lifecycle. Several organizations and communities are responding to this call by sharing RAI guidelines. However, there are gaps in awareness, deliberation, and execution of such practices for multi-disciplinary ML practitioners. This work contributes to the discussion by unpacking co-production challenges faced by practitioners as they align their RAI values. We interviewed 23 individuals, across 10 organizations, tasked to ship AI/ML based products while upholding RAI norms and found that both top-down and bottom-up institutional structures create burden for different roles preventing them from upholding RAI values, a challenge that is further exacerbated when executing conflicted values. We share multiple value levers used as strategies by the practitioners to resolve their challenges. We end our paper with recommendations for inclusive and equitable RAI value-practices, creating supportive organizational structures and opportunities to further aid practitioners.
摘要
We interviewed 23 individuals from 10 organizations that are tasked with shipping AI/ML-based products while upholding RAI norms. We found that both top-down and bottom-up institutional structures create burdens for different roles, preventing them from upholding RAI values. This challenge is further exacerbated when executing conflicting values.To address these challenges, we share multiple value levers used by practitioners as strategies to resolve their challenges. These include:1. Inclusive and equitable value-practices: Practitioners must prioritize inclusive and equitable value-practices that take into account the needs and perspectives of diverse stakeholders.2. Supportive organizational structures: Organizations must create supportive structures and opportunities to further aid practitioners in upholding RAI values.3. Ongoing education and training: Practitioners must receive ongoing education and training on RAI values and practices to ensure they are equipped to handle the challenges they face.We end our paper with recommendations for inclusive and equitable RAI value-practices, creating supportive organizational structures, and providing ongoing education and training to practitioners. By addressing these challenges, we can ensure that RAI values are upheld in the development and deployment of AI/ML-based products.
Exploring Link Prediction over Hyper-Relational Temporal Knowledge Graphs Enhanced with Time-Invariant Relational Knowledge
for: This paper focuses on filling the gap between temporal knowledge graph (TKG) reasoning and hyper-relational knowledge graph (HKG) reasoning, by developing a new benchmark dataset and a reasoning model that can efficiently handle both temporal and qualifier information.
methods: The proposed reasoning model leverages both temporal and time-invariant relational knowledge from the Wikidata knowledge base to improve the performance of HTKG reasoning.
results: The experimental results show that the proposed model outperforms previous related methods on HTKG link prediction, and can be further enhanced by jointly leveraging both temporal and time-invariant relational knowledge.Here’s the simplified Chinese text version of the three information points:
results: 实验结果表明,提出的模型在HTKG链接预测 task上表现出色,并且可以通过共同利用时间不变和资格信息来进一步提高性能。Abstract
Stemming from traditional knowledge graphs (KGs), hyper-relational KGs (HKGs) provide additional key-value pairs (i.e., qualifiers) for each KG fact that help to better restrict the fact validity. In recent years, there has been an increasing interest in studying graph reasoning over HKGs. In the meantime, due to the ever-evolving nature of world knowledge, extensive parallel works have been focusing on reasoning over temporal KGs (TKGs), where each TKG fact can be viewed as a KG fact coupled with a timestamp (or time period) specifying its time validity. The existing HKG reasoning approaches do not consider temporal information because it is not explicitly specified in previous benchmark datasets. Besides, all the previous TKG reasoning methods only lay emphasis on temporal reasoning and have no way to learn from qualifiers. To this end, we aim to fill the gap between TKG reasoning and HKG reasoning. We develop two new benchmark hyper-relational TKG (HTKG) datasets, i.e., Wiki-hy and YAGO-hy, and propose a HTKG reasoning model that efficiently models both temporal facts and qualifiers. We further exploit additional time-invariant relational knowledge from the Wikidata knowledge base and study its effectiveness in HTKG reasoning. Time-invariant relational knowledge serves as the knowledge that remains unchanged in time (e.g., Sasha Obama is the child of Barack Obama), and it has never been fully explored in previous TKG reasoning benchmarks and approaches. Experimental results show that our model substantially outperforms previous related methods on HTKG link prediction and can be enhanced by jointly leveraging both temporal and time-invariant relational knowledge.
摘要
traditional知识 graphs(KGs)的扩展,hyper-relational知识 graphs(HKGs)提供每个KG事实的额外关键值对(i.e., 资格),以更好地限定事实的有效性。近年来,研究graph reasoning over HKGs的兴趣在增长。同时,由于世界知识的不断演进,大量并发的工作在研究temporal知识 graphs(TKGs)上进行reasoning,每个TKG事实可以视为KG事实加上时间戳(或时间段),指定其时间有效性。现有的HKGreasoning方法不考虑时间信息,因为之前的benchmark dataset不Explicitly specified。此外,所有前一些TKGreasoning方法只是强调时间reasoning,没有考虑学习资格。为此,我们希望填补HKGreasoning和TKGreasoning之间的空白。我们开发了两个新的benchmark hyper-relational TKG(HTKG)数据集,即Wiki-hy和YAGO-hy,并提出了一种HTKGreasoning模型,可以效率地模型时间事实和资格。此外,我们还利用Wikidata知识库中的静止关系知识,并研究其在HTKGreasoning中的效果。静止关系知识是指不变于时间的知识(例如,萨沙·奥巴马是巴拉克·奥巴马的孩子),在前一些TKGreasoningbenchmark和方法中从未得到过全面的探索。实验结果表明,我们的模型在HTKGlink prediction中具有明显的优势,并可以通过结合时间和静止关系知识来进一步提高性能。
Dissenting Explanations: Leveraging Disagreement to Reduce Model Overreliance
results: 经过一个小样本研究,authors发现,通过提供分裂解释,可以减少人们对模型预测的过重依赖,同时不会降低总准确率。这表明,分裂解释可以帮助人们更好地理解模型的决策过程,并减少模型预测的不确定性。Abstract
While explainability is a desirable characteristic of increasingly complex black-box models, modern explanation methods have been shown to be inconsistent and contradictory. The semantics of explanations is not always fully understood - to what extent do explanations "explain" a decision and to what extent do they merely advocate for a decision? Can we help humans gain insights from explanations accompanying correct predictions and not over-rely on incorrect predictions advocated for by explanations? With this perspective in mind, we introduce the notion of dissenting explanations: conflicting predictions with accompanying explanations. We first explore the advantage of dissenting explanations in the setting of model multiplicity, where multiple models with similar performance may have different predictions. In such cases, providing dissenting explanations could be done by invoking the explanations of disagreeing models. Through a pilot study, we demonstrate that dissenting explanations reduce overreliance on model predictions, without reducing overall accuracy. Motivated by the utility of dissenting explanations we present both global and local methods for their generation.
摘要
“ explainnability 是复杂黑盒模型的一个欲具备的特点,但现代解释方法有时会被视为不一致和矛盾。解释的 semantics 不 sempre fully understood - 解释是否真的解释了一个决策,或者仅仅是支持一个决策?我们可以帮助人们从解释中获得启发,而不是仅仅依赖错误的预测和解释?以此角度,我们引入了不同的解释: conflicting predictions with accompanying explanations。在模型多样性的设定中,多个模型具有相似的表现可能会有不同的预测。在这种情况下,提供不同的解释可以通过邀请不同的模型的解释。我们通过一个小试验示出,提供不同的解释可以对抗预测的过度依赖,而不减少整体准确性。驱动了不同解释的 utility,我们提出了全球和本地的生成方法。”Note that Simplified Chinese is used in mainland China, while Traditional Chinese is used in Taiwan and Hong Kong.
paper_authors: Marianna B. Ganapini, Francesco Fabiano, Lior Horesh, Andrea Loreggia, Nicholas Mattei, Keerthiram Murugesan, Vishal Pallagani, Francesca Rossi, Biplav Srivastava, Brent Venable for: 这项研究旨在开发一个基于人工智能和人类协作的决策框架,该框架通过提供决策建议来引导人类做出决策。methods: 该研究使用了三种决策激励模式,这些模式根据决策建议是在什么时候提供给人类,以便激励人类快速思考、慢速思考或自reflective思考。results: 研究人员通过使用不同的价值来决定何时和如何使用各种决策激励模式,以实现更好的决策效果。例如,在做出决策时,可以考虑decision quality、speed、human upskilling和learning、human agency和隐私等价值。Abstract
Nudging is a behavioral strategy aimed at influencing people's thoughts and actions. Nudging techniques can be found in many situations in our daily lives, and these nudging techniques can targeted at human fast and unconscious thinking, e.g., by using images to generate fear or the more careful and effortful slow thinking, e.g., by releasing information that makes us reflect on our choices. In this paper, we propose and discuss a value-based AI-human collaborative framework where AI systems nudge humans by proposing decision recommendations. Three different nudging modalities, based on when recommendations are presented to the human, are intended to stimulate human fast thinking, slow thinking, or meta-cognition. Values that are relevant to a specific decision scenario are used to decide when and how to use each of these nudging modalities. Examples of values are decision quality, speed, human upskilling and learning, human agency, and privacy. Several values can be present at the same time, and their priorities can vary over time. The framework treats values as parameters to be instantiated in a specific decision environment.
摘要
推动(nudging)是一种行为战略,旨在影响人们的思想和行为。推动技巧可以在我们日常生活中找到很多的应用,这些推动技巧可以 targets 人们的快速和不自觉的思维,例如使用图像引发恐惧或更加细致和努力的慢思考。在这篇论文中,我们提出了一种基于人工智能和人类合作的价值基于推动框架。这个框架中的三种推动模式,基于建议给人时的 WHEN 和 HOW,用于刺激人们的快速思维、慢思考或者元认知。在具体的决策场景中,根据相关的价值来决定使用哪种推动模式。例如,决策质量、快速响应、人类技能和学习、人类自主权和隐私等价值。在这个框架中,价值被视为实例化在特定决策环境中的参数。
Interactive Spatiotemporal Token Attention Network for Skeleton-based General Interactive Action Recognition
results: EXTensive experiments on four datasets show that ISTA-Net outperforms state-of-the-art methods in recognizing interactive actions, demonstrating the effectiveness of the proposed approach.Abstract
Recognizing interactive action plays an important role in human-robot interaction and collaboration. Previous methods use late fusion and co-attention mechanism to capture interactive relations, which have limited learning capability or inefficiency to adapt to more interacting entities. With assumption that priors of each entity are already known, they also lack evaluations on a more general setting addressing the diversity of subjects. To address these problems, we propose an Interactive Spatiotemporal Token Attention Network (ISTA-Net), which simultaneously model spatial, temporal, and interactive relations. Specifically, our network contains a tokenizer to partition Interactive Spatiotemporal Tokens (ISTs), which is a unified way to represent motions of multiple diverse entities. By extending the entity dimension, ISTs provide better interactive representations. To jointly learn along three dimensions in ISTs, multi-head self-attention blocks integrated with 3D convolutions are designed to capture inter-token correlations. When modeling correlations, a strict entity ordering is usually irrelevant for recognizing interactive actions. To this end, Entity Rearrangement is proposed to eliminate the orderliness in ISTs for interchangeable entities. Extensive experiments on four datasets verify the effectiveness of ISTA-Net by outperforming state-of-the-art methods. Our code is publicly available at https://github.com/Necolizer/ISTA-Net
摘要
Recognizing interactive action plays an important role in human-robot interaction and collaboration. Previous methods use late fusion and co-attention mechanism to capture interactive relations, which have limited learning capability or inefficiency to adapt to more interacting entities. With assumption that priors of each entity are already known, they also lack evaluations on a more general setting addressing the diversity of subjects. To address these problems, we propose an Interactive Spatiotemporal Token Attention Network (ISTA-Net), which simultaneously models spatial, temporal, and interactive relations. Specifically, our network contains a tokenizer to partition Interactive Spatiotemporal Tokens (ISTs), which is a unified way to represent motions of multiple diverse entities. By extending the entity dimension, ISTs provide better interactive representations. To jointly learn along three dimensions in ISTs, multi-head self-attention blocks integrated with 3D convolutions are designed to capture inter-token correlations. When modeling correlations, a strict entity ordering is usually irrelevant for recognizing interactive actions. To this end, Entity Rearrangement is proposed to eliminate the orderliness in ISTs for interchangeable entities. Extensive experiments on four datasets verify the effectiveness of ISTA-Net by outperforming state-of-the-art methods. Our code is publicly available at https://github.com/Necolizer/ISTA-Net.Here's the text with some notes on the translation:* "Recognizing interactive action" is translated as "认识人机交互行为" (règng shí yǔ jì xìng zhì xíng)* "Interactive Spatiotemporal Token Attention Network" is translated as "交互空时token注意网络" (jiāo xì kōng shí zhōng zhì wǎng wǎn)* "Interactive Spatiotemporal Tokens" is translated as "交互空时token" (jiāo xì kōng shí zhōng zhì)* "Entity Rearrangement" is translated as "实体重新排序" (shí tǐ zhòng xīn pinyīn)Note that the translation is in Simplified Chinese, which is the standard writing system used in mainland China. If you need Traditional Chinese, please let me know.
Structured Pruning of Neural Networks for Constraints Learning
paper_authors: Matteo Cacciola, Antonio Frangioni, Andrea Lodi for: 这篇论文主要关注在机器学习(ML)模型与运筐学(OR)工具的集成方面,具体来说是使用混合整数编程(MIP)表述ML模型输出的问题。methods: 本论文使用了束缚(pruning)技术来缩减人工神经网络(ANNs)中的参数数量,从而提高MIP表述的效率。results: experiments 表明,使用束缚技术可以在ML模型的解决过程中提供显著的加速,而无需妥协解决质量。Abstract
In recent years, the integration of Machine Learning (ML) models with Operation Research (OR) tools has gained popularity across diverse applications, including cancer treatment, algorithmic configuration, and chemical process optimization. In this domain, the combination of ML and OR often relies on representing the ML model output using Mixed Integer Programming (MIP) formulations. Numerous studies in the literature have developed such formulations for many ML predictors, with a particular emphasis on Artificial Neural Networks (ANNs) due to their significant interest in many applications. However, ANNs frequently contain a large number of parameters, resulting in MIP formulations that are impractical to solve, thereby impeding scalability. In fact, the ML community has already introduced several techniques to reduce the parameter count of ANNs without compromising their performance, since the substantial size of modern ANNs presents challenges for ML applications as it significantly impacts computational efforts during training and necessitates significant memory resources for storage. In this paper, we showcase the effectiveness of pruning, one of these techniques, when applied to ANNs prior to their integration into MIPs. By pruning the ANN, we achieve significant improvements in the speed of the solution process. We discuss why pruning is more suitable in this context compared to other ML compression techniques, and we identify the most appropriate pruning strategies. To highlight the potential of this approach, we conduct experiments using feed-forward neural networks with multiple layers to construct adversarial examples. Our results demonstrate that pruning offers remarkable reductions in solution times without hindering the quality of the final decision, enabling the resolution of previously unsolvable instances.
摘要
近年来,机器学习(ML)模型与运筐学(OR)工具的集成在多种应用中得到了广泛的推广,包括肿瘤治疗、算法配置和化学过程优化。在这个领域,ML和OR的结合常常通过表示ML模型输出的混合整数编程(MIP)形式来实现。文献中有许多研究发展了这种形式,尤其是人工神经网络(ANNs),因为它们在许多应用中具有广泛的 интерес。然而,ANNs经常具有较大的参数数量,导致MIP形式成为实际不可解决的,从而阻碍了扩展性。事实上,ML社区已经开发了许多技术来减少ANNs中参数的数量,以避免降低性能。在这篇论文中,我们展示了对ANNs进行剪裁后,在MIP中的速度解决过程中的显著改善。我们解释了为什么剪裁在这种上下文中比其他ML压缩技术更适合,并确定了最佳剪裁策略。为了强调这种方法的潜力,我们在多层扩散神经网络中构建了反对例。我们的结果表明,剪裁可以在解决之前不可解决的实例中提供了很大的改善,而不会影响最终决策的质量。
Can Large Language Models Empower Molecular Property Prediction?
results: 实验结果表明,文本解释作为分子表示在多个 benchmark 数据集上具有优势,并证明 LL.M 在分子性质预测任务中具有潜在的潜力。Abstract
Molecular property prediction has gained significant attention due to its transformative potential in multiple scientific disciplines. Conventionally, a molecule graph can be represented either as a graph-structured data or a SMILES text. Recently, the rapid development of Large Language Models (LLMs) has revolutionized the field of NLP. Although it is natural to utilize LLMs to assist in understanding molecules represented by SMILES, the exploration of how LLMs will impact molecular property prediction is still in its early stage. In this work, we advance towards this objective through two perspectives: zero/few-shot molecular classification, and using the new explanations generated by LLMs as representations of molecules. To be specific, we first prompt LLMs to do in-context molecular classification and evaluate their performance. After that, we employ LLMs to generate semantically enriched explanations for the original SMILES and then leverage that to fine-tune a small-scale LM model for multiple downstream tasks. The experimental results highlight the superiority of text explanations as molecular representations across multiple benchmark datasets, and confirm the immense potential of LLMs in molecular property prediction tasks. Codes are available at \url{https://github.com/ChnQ/LLM4Mol}.
摘要
摩尔电子性预测已经受到了广泛关注,因为它在多种科学领域的变革性很大。传统上,摩尔图可以被表示为图structured data或SMILES文本。在最近的几年中,大型自然语言模型(LLMs)的快速发展对涉及到NLP领域的研究带来了革命性的变革。虽然可以使用LLMs来帮助理解表示by SMILES的分子,但是研究如何使用LLMs进行分子性预测的阶段仍处于初期阶段。在这项工作中,我们通过两个视角进行推进:零/几个shot分子分类,和使用LLMs生成的新的解释来代表分子。具体来说,我们首先让LLMs在上下文中进行分子分类,并评估其性能。然后,我们使用LLMs生成Semantically enriched explanations for the original SMILES,并使用该解释来精化一个小规模LM模型以进行多个下游任务。实验结果表明,文本解释作为分子表示的 superiority across multiple benchmark datasets,并证明了LLMs在分子性预测任务中的巨大潜力。代码可以在 \url{https://github.com/ChnQ/LLM4Mol} 上获取。
results: 研究发现了两个不同领域之间的新连接:三选一决策和评价语言表达理论。Abstract
We propose a linguistic interpretation of three-way decisions, where the regions of acceptance, rejection, and non-commitment are constructed by using the so-called evaluative linguistic expressions, which are expressions of natural language such as small, medium, very short, quite roughly strong, extremely good, etc. Our results highlight new connections between two different research areas: three-way decisions and the theory of evaluative linguistic expressions.
摘要
我们提出了一种语言解释三元决策,其中acceptance、rejection和non-commitment的区域由使用所谓的评价语言表达来构建,这些表达包括自然语言中的小、中、很短、很强、非常好等等。我们的研究结果揭示了两个不同的研究领域之间的新连接:三元决策和评价语言表达理论。
Opinion mining using Double Channel CNN for Recommender System
results: 该方法在评价用户对产品的看法方面达到了 91.6% 的准确率,与之前的方面基于的方法相比有显著提高。Abstract
Much unstructured data has been produced with the growth of the Internet and social media. A significant volume of textual data includes users' opinions about products in online stores and social media. By exploring and categorizing them, helpful information can be acquired, including customer satisfaction, user feedback about a particular event, predicting the sale of a specific product, and other similar cases. In this paper, we present an approach for sentiment analysis with a deep learning model and use it to recommend products. A two-channel convolutional neural network model has been used for opinion mining, which has five layers and extracts essential features from the data. We increased the number of comments by applying the SMOTE algorithm to the initial dataset and balanced the data. Then we proceed to cluster the aspects. We also assign a weight to each cluster using tensor decomposition algorithms that improve the recommender system's performance. Our proposed method has reached 91.6% accuracy, significantly improved compared to previous aspect-based approaches.
摘要
“随着互联网和社交媒体的发展,大量的未结构化数据被生成。这些文本数据中包含用户对产品的评价,可以从中获得有益信息,如客户满意度、用户对某个事件的反馈、预测特定产品的销售等。在这篇论文中,我们提出了一种基于深度学习模型的情感分析方法,并使其用于产品推荐。我们使用了两个通道卷积神经网络模型进行意见挖掘,该模型有五层,可以从数据中提取重要特征。我们首先应用了SMOTE算法来增加数据量,然后对特征进行分类。此外,我们还使用了矩阵分解算法来赋予每个分类器一个权重,以提高推荐系统的性能。我们的提议方法已达91.6%的准确率,与之前的方面基于的方法相比有显著提高。”
Political Sentiment Analysis of Persian Tweets Using CNN-LSTM Model
paper_authors: Mohammad Dehghani, Zahra Yazdanparast for: 本研究的目的是使用机器学习和深度学习模型来分析波斯语政治微博上的情感。methods: 本研究使用Bag of Words和ParsBERT来表示字词,并应用 Gaussian Naive Bayes、Gradient Boosting、Logistic Regression、Decision Trees、Random Forests,以及一种 combinaison of CNN和LSTM 来类别 tweet 的方向。results: 研究结果显示,使用 ParsBERT 嵌入深度学习模型可以更好地分析波斯语政治微博上的情感,CNN-LSTM 模型在第一个数据集上取得了89%的分类精度,在第二个数据集上取得了71%的分类精度。Abstract
Sentiment analysis is the process of identifying and categorizing people's emotions or opinions regarding various topics. The analysis of Twitter sentiment has become an increasingly popular topic in recent years. In this paper, we present several machine learning and a deep learning model to analysis sentiment of Persian political tweets. Our analysis was conducted using Bag of Words and ParsBERT for word representation. We applied Gaussian Naive Bayes, Gradient Boosting, Logistic Regression, Decision Trees, Random Forests, as well as a combination of CNN and LSTM to classify the polarities of tweets. The results of this study indicate that deep learning with ParsBERT embedding performs better than machine learning. The CNN-LSTM model had the highest classification accuracy with 89 percent on the first dataset with three classes and 71 percent on the second dataset with seven classes. Due to the complexity of Persian, it was a difficult task to achieve this level of efficiency.
摘要
sentiment分析是指 identificifying和 categorizing人们对不同话题的情感或意见。在最近几年,Twitter sentiment的分析已成为一个越来越流行的话题。在这篇论文中,我们提出了一些机器学习和深度学习模型,用于分析波斯政治微博的情感。我们的分析使用了Bag of Words和ParsBERT来表示单词。我们应用了 Gaussian Naive Bayes、Gradient Boosting、Logistic Regression、Decision Trees、Random Forests,以及一个组合的CNN和LSTM来分类微博的褒贬性。研究结果表明,使用ParsBERT embedding的深度学习模型在波斯政治微博中的情感分类任务中表现较好,其中CNN-LSTM模型在第一个数据集中的三类分类任务中取得了89%的分类精度,在第二个数据集中的七类分类任务中取得了71%的分类精度。由于波斯语的复杂性,这是一项具有挑战性的任务。
CPET: Effective Parameter-Efficient Tuning for Compressed Large Language Models
methods: 论文使用了各种主流 LLM 压缩技术来提高 PET 性能,并引入了知识继承和恢复策略来补偿压缩技术导致的知识损失。
results: 论文的实验结果显示,与原始压缩 LLM 相比,使用 CPET 框架可以实现更好的性能,并且在多任务情况下与直接运用普通 PET 方法相比 OUTperform。Abstract
Parameter-efficient tuning (PET) has been widely explored in recent years because it tunes much fewer parameters (PET modules) than full-parameter fine-tuning (FT) while still stimulating sufficient knowledge from large language models (LLMs) for downstream tasks. Moreover, when PET is employed to serve multiple tasks, different task-specific PET modules can be built on a frozen LLM, avoiding redundant LLM deployments. Although PET significantly reduces the cost of tuning and deploying LLMs, its inference still suffers from the computational bottleneck of LLMs. To address the above issue, we propose an effective PET framework based on compressed LLMs, named "CPET". In CPET, we evaluate the impact of mainstream LLM compression techniques on PET performance and then introduce knowledge inheritance and recovery strategies to restore the knowledge loss caused by these compression techniques. Our experimental results demonstrate that, owing to the restoring strategies of CPET, collaborating task-specific PET modules with a compressed LLM can achieve comparable performance to collaborating PET modules with the original version of the compressed LLM and outperform directly applying vanilla PET methods to the compressed LLM.
摘要
减少参数调参 (PET) 在最近几年内得到了广泛的探索,因为它在调参 fewer parameters (PET modules) 时仍然可以从大语言模型 (LLMs) 中继承足够的知识,用于下游任务。此外,当使用 PET 服务多个任务时,可以在冻结 LLM 上建立不同任务专门的 PET modules,以避免重复的 LLM 部署。虽然 PET 可以减少调试和部署 LLMS 的成本,但其推理仍然受到 LLMS 的计算瓶颈的限制。为解决以上问题,我们提出了一个有效的 PET 框架,名为 "CPET"。在 CPET 中,我们评估了主流 LLMS 压缩技术对 PET 性能的影响,然后引入了知识继承和恢复策略,以弥补压缩技术所导致的知识损失。我们的实验结果表明, CPET 可以在压缩 LLMS 上与原始版本的压缩 LLMS 相比,并且可以超越直接应用 vanilla PET 方法。
Think-on-Graph: Deep and Responsible Reasoning of Large Language Model with Knowledge Graph
methods: 我们提出了 Think-on-Graph(ToG)框架,利用知识图来提升 LLM 的深入负责推理能力。 ToG 可以识别问题中关键的实体,并从外部知识库中检索相关的 triplet,进行循环的推理和检索。
results: 通过对复杂多层推理问答 зада务进行实验,我们表明 ToG 可以有效地解决 LLM 的上述限制,不需要额外训练成本。Abstract
Large language models (LLMs) have made significant strides in various tasks, yet they often struggle with complex reasoning and exhibit poor performance in scenarios where knowledge traceability, timeliness, and accuracy are crucial. To address these limitations, we present Think-on-Graph (ToG), a novel framework that leverages knowledge graphs to enhance LLMs' ability for deep and responsible reasoning. By employing ToG, we can identify entities relevant to a given question and conduct exploration and reasoning to retrieve related triples from an external knowledge database. This iterative procedure generates multiple reasoning pathways consisting of sequentially connected triplets until sufficient information is gathered to answer the question or the maximum depth is reached. Through experiments on complex multi-hop reasoning question-answering tasks, we demonstrate that ToG outperforms existing methods, effectively addressing the aforementioned limitations of LLMs without incurring additional training costs.
摘要
Single and Multi-Speaker Cloned Voice Detection: From Perceptual to Learned Features
results: 这三种方法在训练在单一话者的声音和多话者的声音下都显示了良好的效果,并且对对抗过滤攻击也表现了良好的Robustness。Abstract
Synthetic-voice cloning technologies have seen significant advances in recent years, giving rise to a range of potential harms. From small- and large-scale financial fraud to disinformation campaigns, the need for reliable methods to differentiate real and synthesized voices is imperative. We describe three techniques for differentiating a real from a cloned voice designed to impersonate a specific person. These three approaches differ in their feature extraction stage with low-dimensional perceptual features offering high interpretability but lower accuracy, to generic spectral features, and end-to-end learned features offering less interpretability but higher accuracy. We show the efficacy of these approaches when trained on a single speaker's voice and when trained on multiple voices. The learned features consistently yield an equal error rate between $0\%$ and $4\%$, and are reasonably robust to adversarial laundering.
摘要
artifical voice 技术在过去几年内得到了 significiant advances,带来了一系列可能的害。从小规模到大规模的金融诈骗到假信息运动,需要可靠的方法来分辨真正的声音和假声音。我们描述了三种方法来分辨真正的声音和假声音,这三种方法在特征提取阶段有低维度的感知特征、通用频谱特征和终端学习的特征。我们表明这些方法在单个 speaker 的声音和多个声音上都具有较高的准确率,并且可以抵抗假钱包洗涤。
Towards Generalizable Detection of Urgency of Discussion Forum Posts
paper_authors: Valdemar Švábenský, Ryan S. Baker, Andrés Zambrano, Yishan Zou, Stefan Slater for: This paper aims to help instructors in online courses, such as MOOCs, better support student learning by automatically determining the urgency of forum posts.methods: The authors use machine learning techniques to build predictive models that determine the urgency of forum posts on a 7-point scale. They train and cross-validate several models on an original data set of 3,503 posts from MOOCs at the University of Pennsylvania, and test their performance on a separate data set of 29,604 posts from MOOCs at Stanford University.results: The best-performing model was a support vector regressor trained on the Universal Sentence Encoder embeddings of the posts, achieving an RMSE of 1.1 on the training set and 1.4 on the test set. This suggests that the model is effective in predicting the urgency of forum posts and could be used to help instructors focus their time more effectively and better support student learning.Abstract
Students who take an online course, such as a MOOC, use the course's discussion forum to ask questions or reach out to instructors when encountering an issue. However, reading and responding to students' questions is difficult to scale because of the time needed to consider each message. As a result, critical issues may be left unresolved, and students may lose the motivation to continue in the course. To help address this problem, we build predictive models that automatically determine the urgency of each forum post, so that these posts can be brought to instructors' attention. This paper goes beyond previous work by predicting not just a binary decision cut-off but a post's level of urgency on a 7-point scale. First, we train and cross-validate several models on an original data set of 3,503 posts from MOOCs at University of Pennsylvania. Second, to determine the generalizability of our models, we test their performance on a separate, previously published data set of 29,604 posts from MOOCs at Stanford University. While the previous work on post urgency used only one data set, we evaluated the prediction across different data sets and courses. The best-performing model was a support vector regressor trained on the Universal Sentence Encoder embeddings of the posts, achieving an RMSE of 1.1 on the training set and 1.4 on the test set. Understanding the urgency of forum posts enables instructors to focus their time more effectively and, as a result, better support student learning.
摘要
在线学生们,如果他们参加了MOOC课程,通常会使用课程的讨论 форум来提问或向教师们寻求帮助当遇到问题。然而,阅读和回答学生的问题需要一定的时间,因此可能会有一些重要的问题被忽略。为了解决这个问题,我们建立了一些预测模型,以便自动确定讨论 форум的帖子的紧急程度,以便将其带到教师的注意力中。这篇论文超过了之前的工作,因为它不仅预测了一个二分类决策门槛,而且预测了帖子的紧急程度在7个级别上。我们首先训练了多个模型,并对其进行跨验证。其中,我们使用大学 Pennsylvania 的 MOOC 课程数据集,训练了多个模型,并对其进行跨验证。为了证明我们的模型的一致性,我们对其进行了测试,并与之前发表的 Stanford University 的 MOOC 课程数据集进行了比较。而之前的帖子紧急性预测工作只使用了一个数据集,我们的模型则在不同的数据集和课程上进行了预测。我们最佳的模型是使用 Universal Sentence Encoder embedding 训练的支持向量回归模型,其在训练集上的 RMSE 为1.1,测试集上的 RMSE 为1.4。理解讨论帖子的紧急程度可以帮助教师更好地利用时间,从而更好地支持学生的学习。
QontSum: On Contrasting Salient Content for Query-focused Summarization
results: 对于一些 benchmark 数据集,QontSum Either outperforms 现有状态的艺术 или 具有较低的计算成本,而不是通过大规模预训练实验来实现。此外,人工研究表明,QontSum 生成的摘要更加与问题相关,而不会产生流利性的损失。Abstract
Query-focused summarization (QFS) is a challenging task in natural language processing that generates summaries to address specific queries. The broader field of Generative Information Retrieval (Gen-IR) aims to revolutionize information extraction from vast document corpora through generative approaches, encompassing Generative Document Retrieval (GDR) and Grounded Answer Retrieval (GAR). This paper highlights the role of QFS in Grounded Answer Generation (GAR), a key subdomain of Gen-IR that produces human-readable answers in direct correspondence with queries, grounded in relevant documents. In this study, we propose QontSum, a novel approach for QFS that leverages contrastive learning to help the model attend to the most relevant regions of the input document. We evaluate our approach on a couple of benchmark datasets for QFS and demonstrate that it either outperforms existing state-of-the-art or exhibits a comparable performance with considerably reduced computational cost through enhancements in the fine-tuning stage, rather than relying on large-scale pre-training experiments, which is the focus of current SOTA. Moreover, we conducted a human study and identified improvements in the relevance of generated summaries to the posed queries without compromising fluency. We further conduct an error analysis study to understand our model's limitations and propose avenues for future research.
摘要
问题集中摘要(QFS)是自然语言处理中的一个挑战任务,它生成摘要以回答特定问题。更广泛的生成信息抽取(Gen-IR)领域希望通过生成方法来从巨大的文档库中抽取信息,包括生成文档搜寻(GDR)和固定答案搜寻(GAR)。本文强调QFS在GAR中的角色,GAR是Gen-IR的一个子领域,它生成基于问题的人阅读性的答案,并将答案与问题相对应。在这篇研究中,我们提出了一种新的QFS方法,叫做QontSum,它利用对比学习来帮助模型对输入文档中最相关的区域进行专注。我们在一些QFS的 benchmarck datasets 进行评估,并证明了QontSum Either outperforms 现有的State-of-the-art(SOTA)或与现有的SOTA相比,具有许多reduced computational cost,而不是通过大规模的预训学习实验。此外,我们进行了人类研究,并发现了生成摘要与问题之间的相关性得到了改善,而不会妥协于流畅性。 finally, we conducted an error analysis study to understand our model's limitations and proposed avenues for future research.
Sensi-BERT: Towards Sensitivity Driven Fine-Tuning for Parameter-Efficient BERT
results: 这篇论文的实验结果显示,Sensi-BERT可以在不同的下游任务上(包括MNLI、QQP、QNLI和SST-2)实现更好的性能,而且在相同或更小的参数预算下。Abstract
Large pre-trained language models have recently gained significant traction due to their improved performance on various down-stream tasks like text classification and question answering, requiring only few epochs of fine-tuning. However, their large model sizes often prohibit their applications on resource-constrained edge devices. Existing solutions of yielding parameter-efficient BERT models largely rely on compute-exhaustive training and fine-tuning. Moreover, they often rely on additional compute heavy models to mitigate the performance gap. In this paper, we present Sensi-BERT, a sensitivity driven efficient fine-tuning of BERT models that can take an off-the-shelf pre-trained BERT model and yield highly parameter-efficient models for downstream tasks. In particular, we perform sensitivity analysis to rank each individual parameter tensor, that then is used to trim them accordingly during fine-tuning for a given parameter or FLOPs budget. Our experiments show the efficacy of Sensi-BERT across different downstream tasks including MNLI, QQP, QNLI, and SST-2, demonstrating better performance at similar or smaller parameter budget compared to various existing alternatives.
摘要
In this paper, we propose Sensi-BERT, a sensitivity-driven efficient fine-tuning method for BERT models that can take an off-the-shelf pre-trained BERT model and generate highly parameter-efficient models for downstream tasks. Specifically, we perform sensitivity analysis to rank each individual parameter tensor, and then trim them accordingly during fine-tuning based on a given parameter or FLOPs budget. Our experiments show the effectiveness of Sensi-BERT across different downstream tasks, including MNLI, QQP, QNLI, and SST-2, demonstrating better performance at a similar or smaller parameter budget compared to various existing alternatives.
Population Expansion for Training Language Models with Private Federated Learning
paper_authors: Tatsuki Koga, Congzheng Song, Martin Pelikan, Mona Chitnis
for: 提高小规模训练集的模型质量和训练效率
methods: 使用域 adaptive 技术扩大训练集大小,提高模型质量和训练效率
results: 在实际语言模型 datasets 上,提高模型质量约 13%-30%,训练效率也得到了提高。Abstract
Federated learning (FL) combined with differential privacy (DP) offers machine learning (ML) training with distributed devices and with a formal privacy guarantee. With a large population of devices, FL with DP produces a performant model in a timely manner. However, for applications with a smaller population, not only does the model utility degrade as the DP noise is inversely proportional to population, but also the training latency increases since waiting for enough clients to become available from a smaller pool is slower. In this work, we thus propose expanding the population based on domain adaptation techniques to speed up the training and improves the final model quality when training with small populations. We empirically demonstrate that our techniques can improve the utility by 13% to 30% on real-world language modeling datasets.
摘要
federated learning (FL) 与差异隐私 (DP) 结合可以实现分布式设备上的机器学习 (ML) 训练,并且具有正式的隐私保证。在大规模设备人口中,FL 与 DP 生成的模型在时间上具有良好的性能,但是在小规模应用中,模型的性能会降低,而且等待来自小池中的足够客户端的可用性需要更长时间。为了解决这个问题,我们提议通过域 adaptation 技术来扩大人口,以加速训练和提高最终模型质量。我们通过实验表明,我们的技术可以在实际语言模型 dataset 上提高utilities 13% 到 30%。
results: 研究发现,ECAPA-TDNN模型在识别爱尔兰语言的不同 диалект方面表现最佳,特别是在爱尔兰北部的伦敦达利尔 диалект方面具有94%的准确率。然而,模型在康诺特和明斯特两个 диалект之间的差异化方面存在问题,建议可能需要采用更加细致的方法来稳定地分辨这两个 диаLECT。Abstract
The Irish language is rich in its diversity of dialects and accents. This compounds the difficulty of creating a speech recognition system for the low-resource language, as such a system must contend with a high degree of variability with limited corpora. A recent study investigating dialect bias in Irish ASR found that balanced training corpora gave rise to unequal dialect performance, with performance for the Ulster dialect being consistently worse than for the Connacht or Munster dialects. Motivated by this, the present experiments investigate spoken dialect identification of Irish, with a view to incorporating such a system into the speech recognition pipeline. Two acoustic classification models are tested, XLS-R and ECAPA-TDNN, in conjunction with a text-based classifier using a pretrained Irish-language BERT model. The ECAPA-TDNN, particularly a model pretrained for language identification on the VoxLingua107 dataset, performed best overall, with an accuracy of 73%. This was further improved to 76% by fusing the model's outputs with the text-based model. The Ulster dialect was most accurately identified, with an accuracy of 94%, however the model struggled to disambiguate between the Connacht and Munster dialects, suggesting a more nuanced approach may be necessary to robustly distinguish between the dialects of Irish.
摘要
爱尔兰语言具有多样化的方言和口音,这使得为低资源语言的语音识别系统设计更加困难,因为系统需要面对各种方言和口音的变化。一项latest study发现,在爱尔兰语音识别中,具有平衡训练数据集的系统会导致不同方言的表现不均匀,特别是北爱尔兰方言表现较差。为了解决这个问题,当前实验探索爱尔兰口音的识别,并计划将其 integrate into speech recognition pipeline。两种音频分类模型被测试,即XLS-R和ECAPA-TDNN,并与一个基于爱尔兰语言BERT模型的文本分类器进行结合。ECAPA-TDNN模型,尤其是在VoxLingua107数据集上进行语言标识训练的模型,在整体上表现最佳,准确率达73%。通过将模型的输出与文本分类器结合,准确率得到了进一步提高,达76%。北爱尔兰方言的识别率最高,达94%,但是系统在connacht和munster方言之间的干扰仍然存在, suggesting a more nuanced approach may be necessary to robustly distinguish between the dialects of Irish.
for: This paper aims to address the issue of distribution shifting in post-hoc instance-level explanation methods for Graph Neural Networks (GNNs), which can lead to poor explanation quality in real-world applications with tight decision boundaries.
methods: The proposed approach is based on a generalized Graph Information Bottleneck (GIB) form that includes a label-independent graph variable, which is equivalent to the vanilla GIB. The approach also uses a graph mixup method called MixupExplainer, which has a theoretical guarantee to resolve the distribution shifting issue.
results: The proposed MixupExplainer approach is validated through extensive experiments on both synthetic and real-world datasets, and is shown to be effective in addressing the distribution shifting issue and improving explanation quality. Additionally, the paper provides a detailed analysis of how the proposed approach alleviates the distribution shifting issue.Here is the result in Simplified Chinese text:
results: 该方法通过对 sintetic和实际数据集进行了广泛的实验 validate,并证明了其能够有效地解决分布shift问题,提高解释质量。此外,论文还提供了对该方法如何缓解分布shift问题的详细分析。Abstract
Graph Neural Networks (GNNs) have received increasing attention due to their ability to learn from graph-structured data. However, their predictions are often not interpretable. Post-hoc instance-level explanation methods have been proposed to understand GNN predictions. These methods seek to discover substructures that explain the prediction behavior of a trained GNN. In this paper, we shed light on the existence of the distribution shifting issue in existing methods, which affects explanation quality, particularly in applications on real-life datasets with tight decision boundaries. To address this issue, we introduce a generalized Graph Information Bottleneck (GIB) form that includes a label-independent graph variable, which is equivalent to the vanilla GIB. Driven by the generalized GIB, we propose a graph mixup method, MixupExplainer, with a theoretical guarantee to resolve the distribution shifting issue. We conduct extensive experiments on both synthetic and real-world datasets to validate the effectiveness of our proposed mixup approach over existing approaches. We also provide a detailed analysis of how our proposed approach alleviates the distribution shifting issue.
摘要
graph neural networks (GNNs) 已经收到了越来越多的关注,因为它们可以从图结构数据中学习。然而,它们的预测通常不是可解释的。post-hoc实例级解释方法已经被提出,以解释训练好的 GNN 的预测行为。在这篇论文中,我们探讨了现有方法中的分布转移问题,该问题影响解释质量,特别是在实际数据集上 with tight decision boundaries 上。为解决这个问题,我们引入一种通用的图信息瓶颈(GIB)形式,该形式包括一个独立于标签的图变量,与普通的 GIB 相等。驱动于通用 GIB,我们提议一种图mixup方法,MixupExplainer,具有解决分布转移问题的理论保证。我们在 both synthetic 和实际数据集上进行了广泛的实验,以验证我们的提议的混合方法的效iveness。我们还提供了详细的分析,解释我们的提议如何缓解分布转移问题。
Minimal Random Code Learning with Mean-KL Parameterization
paper_authors: Jihao Andreas Lin, Gergely Flamich, José Miguel Hernández-Lobato
for: 这个论文研究了两种基于Minimal Random Code Learning(MIRACLE)的变分 Bayesian neural networks的质量行为和稳定性。
methods: 论文使用了一种强大的、conditionally Gaussian变分approximation来 aproximate the weight posterior $Q_{\mathbf{w}$,并使用relative entropy coding来压缩一个weight sample从 posterior中使用 Gaussian coding distribution $P_{\mathbf{w}$。
results: 作者们发现,使用 Mean-KL 参数化可以更快 converges 并保持预测性能,并且 Mean-KL 导致了更有意义的变分分布和压缩weight sample,这些sample更易受到截彩处理。Abstract
This paper studies the qualitative behavior and robustness of two variants of Minimal Random Code Learning (MIRACLE) used to compress variational Bayesian neural networks. MIRACLE implements a powerful, conditionally Gaussian variational approximation for the weight posterior $Q_{\mathbf{w}$ and uses relative entropy coding to compress a weight sample from the posterior using a Gaussian coding distribution $P_{\mathbf{w}$. To achieve the desired compression rate, $D_{\mathrm{KL}[Q_{\mathbf{w} \Vert P_{\mathbf{w}]$ must be constrained, which requires a computationally expensive annealing procedure under the conventional mean-variance (Mean-Var) parameterization for $Q_{\mathbf{w}$. Instead, we parameterize $Q_{\mathbf{w}$ by its mean and KL divergence from $P_{\mathbf{w}$ to constrain the compression cost to the desired value by construction. We demonstrate that variational training with Mean-KL parameterization converges twice as fast and maintains predictive performance after compression. Furthermore, we show that Mean-KL leads to more meaningful variational distributions with heavier tails and compressed weight samples which are more robust to pruning.
摘要
Machine Learning Meets Mental Training – A Proof of Concept Applied to Memory Sports
results: 研究发现,通过使用机器学习算法,可以提高记忆运动的效果和精度,并且可以预测记忆运动的成绩。Abstract
This work aims to combine these two fields together by presenting a practical implementation of machine learning to the particular form of mental training that is the art of memory, taken in its competitive version called "Memory Sports". Such a fusion, on the one hand, strives to raise awareness about both realms, while on the other it seeks to encourage research in this mixed field as a way to, ultimately, drive forward the development of this seemingly underestimated sport.
摘要
(Simplified Chinese)这项工作 aimsto combine these two fields together by presenting a practical implementation of machine learning to the particular form of mental training that is the art of memory, taken in its competitive version called "Memory Sports". Such a fusion, on the one hand, strives to raise awareness about both realms, while on the other it seeks to encourage research in this mixed field as a way to, ultimately, drive forward the development of this seemingly underestimated sport.Note: The word " Memory Sports" is not a direct translation of "Memory Sports" in Chinese, but it is a commonly used term in the field to refer to competitive memory training.
Graph Automorphism Group Equivariant Neural Networks
methods: 这种研究使用了learnable、线性、$\textrm{Aut}(G)$-equivariant层函数的 span set 来 characterize 所有可能的层次结构。
results: 研究发现,对于任意的图 $G$ 和 $\textrm{Aut}(G)$,存在一个 span set of matrices 表示所有可能的 learnable、线性、$\textrm{Aut}(G)$-equivariant层函数,并且这些层函数可以在标准基底上表示 $\mathbb{R}^{n}$ 中的所有 tensor power。Abstract
For any graph $G$ having $n$ vertices and its automorphism group $\textrm{Aut}(G)$, we provide a full characterisation of all of the possible $\textrm{Aut}(G)$-equivariant neural networks whose layers are some tensor power of $\mathbb{R}^{n}$. In particular, we find a spanning set of matrices for the learnable, linear, $\textrm{Aut}(G)$-equivariant layer functions between such tensor power spaces in the standard basis of $\mathbb{R}^{n}$.
摘要
Simplified Chinese:For any graph $G$ with $n$ vertices, we provide a full characterization of all possible $\textrm{Aut}(G)$-equivariant neural networks whose layers are some tensor power of $\mathbb{R}^{n}$. Specifically, we find a spanning set of matrices for the learnable, linear, $\textrm{Aut}(G)$-equivariant layer functions between such tensor power spaces in the standard basis of $\mathbb{R}^{n}$.Note: "tensor power" is not a standard term in Simplified Chinese, so I used the phrase "some tensor power" to convey the same meaning.
$\text{EFO}_{k}$-CQA: Towards Knowledge Graph Complex Query Answering beyond Set Operation
results: 本研究提出了一个名为 $\text{EFO}_{k}$-CQA的新数据集,并通过实验评估了这些方法在不同的查询难度下的性能。results also show that the existing dataset construction process is biased, highlighting the importance of the proposed framework.Abstract
To answer complex queries on knowledge graphs, logical reasoning over incomplete knowledge is required due to the open-world assumption. Learning-based methods are essential because they are capable of generalizing over unobserved knowledge. Therefore, an appropriate dataset is fundamental to both obtaining and evaluating such methods under this paradigm. In this paper, we propose a comprehensive framework for data generation, model training, and method evaluation that covers the combinatorial space of Existential First-order Queries with multiple variables ($\text{EFO}_{k}$). The combinatorial query space in our framework significantly extends those defined by set operations in the existing literature. Additionally, we construct a dataset, $\text{EFO}_{k}$-CQA, with 741 types of query for empirical evaluation, and our benchmark results provide new insights into how query hardness affects the results. Furthermore, we demonstrate that the existing dataset construction process is systematically biased that hinders the appropriate development of query-answering methods, highlighting the importance of our work. Our code and data are provided in~\url{https://github.com/HKUST-KnowComp/EFOK-CQA}.
摘要
“为回答知识图中复杂的查询,因为开放世界假设,需要逻辑推理 sobre 不完全的知识。学习基于方法是必要的,因为它们可以对未观察到的知识进行泛化。因此,一个适当的数据集是知识推理方法的基础,以及评估这些方法的基础。在这篇论文中,我们提出了一个完整的框架,包括数据生成、模型训练和方法评估,覆盖了多变量($\text{EFO}_{k}$)的组合空间。我们的框架中的组合查询空间significantly extends those defined by set operations in the existing literature。此外,我们构建了741种类型的查询集,并提供了empirical evaluation,我们的研究结果提供了新的视角,描述了查询困难度对结果的影响。此外,我们还发现了现有数据集构建过程存在系统性的偏见,这阻碍了适当的查询答案方法的发展,强调了我们的工作的重要性。我们的代码和数据可以在\url{https://github.com/HKUST-KnowComp/EFOK-CQA}中找到。”
The Interpolating Information Criterion for Overparameterized Models
paper_authors: Liam Hodgkinson, Chris van der Heide, Robert Salomone, Fred Roosta, Michael W. Mahoney
for: interpolating estimators with overparameterized models
methods: using classical information criteria, Bayesian duality, and prior misspecification
results: a new information criterion called Interpolating Information Criterion (IIC) that accounts for prior misspecification, geometric and spectral properties of the model, and is numerically consistent with known empirical and theoretical behavior in the overparameterized settingAbstract
The problem of model selection is considered for the setting of interpolating estimators, where the number of model parameters exceeds the size of the dataset. Classical information criteria typically consider the large-data limit, penalizing model size. However, these criteria are not appropriate in modern settings where overparameterized models tend to perform well. For any overparameterized model, we show that there exists a dual underparameterized model that possesses the same marginal likelihood, thus establishing a form of Bayesian duality. This enables more classical methods to be used in the overparameterized setting, revealing the Interpolating Information Criterion, a measure of model quality that naturally incorporates the choice of prior into the model selection. Our new information criterion accounts for prior misspecification, geometric and spectral properties of the model, and is numerically consistent with known empirical and theoretical behavior in this regime.
摘要
“模型选择问题在 interpolating estimators 的设置下被考虑,其中模型参数的数量超出数据集的大小。经典信息critérium通常在大数据 limit 下考虑模型大小,但这些 критериion 不适用于现代设置, где过参数化模型往往表现良好。我们证明,任何过参数化模型都存在一个对应的 dual underparameterized model,这两个模型具有同样的边缘分布,从而建立了一种 Bayesian duality。这使得更 classical methods 可以在过参数化 Setting 中使用,揭示了 interpolating information criterion,一种评价模型质量的指标,这个指标自然地包括先验选择的选择。我们的新信息 критериion 考虑了先验错误、模型的几何和спектраль性质,与已知的 empirical 和理论行为相一致。”
CatBoost Versus XGBoost and LightGBM: Developing Enhanced Predictive Models for Zero-Inflated Insurance Claim Data
for: 这 paper 是为了构建投保laim predictive模型而写的,面临着高度右偏度分布的正确laims 和过多的 zeros 的挑战。
methods: 这 paper 使用了 zero-inflated 模型,将 traditional count model 和 binary model 结合起来,更有效地处理投保laim 数据。
results: 经过对两个不同的数据集的分析和比较, CatBoost 库在建立汽车投保laim frequency 模型方面表现最佳,并且发现 zero-inflated Poisson 树模型在不同数据特点下的假设对于relation between inflation probability and distribution mean 的变化会影响其性能。 I hope this helps! Let me know if you have any further questions.Abstract
In the property and casualty insurance industry, some challenges are presented in constructing claim predictive models due to a highly right-skewed distribution of positive claims with excess zeros. Traditional models, such as Poisson or negative binomial Generalized Linear Models(GLMs), frequently struggle with inflated zeros. In response to this, researchers in actuarial science have employed ``zero-inflated" models that merge a traditional count model and a binary model to address these datasets more effectively. This paper uses boosting algorithms to process insurance claim data, including zero-inflated telematics data, in order to construct claim frequency models. We evaluated and compared three popular gradient boosting libraries - XGBoost, LightGBM, and CatBoost - with the aim of identifying the most suitable library for training insurance claim data and fitting actuarial frequency models. Through a rigorous analysis of two distinct datasets, we demonstrated that CatBoost is superior in developing auto claim frequency models based on predictive performance. We also found that Zero-inflated Poisson boosted tree models, with variations in their assumptions about the relationship between inflation probability and distribution mean, outperformed others depending on data characteristics. Furthermore, by using a specific CatBoost tool, we explored the effects and interactions of different risk features on the frequency model when using telematics data.
摘要
在财产和责任保险业务中,建立投保模型时会遇到一些挑战,主要是因为投保金额呈右skewed分布,具有过多的零值。传统模型,如波尔tz或非正态泛化模型(GLM),经常遇到膨胀零值问题。为了解决这个问题, actuarial science 研究人员使用了“zero-inflated”模型,这种模型结合了传统的计数模型和二分模型,可以更有效地处理这些数据。本文使用了扩大算法来处理投保laim data,包括零Inflated telematics data,以建立投保频率模型。我们对三种popular gradient boosting库(XGBoost、LightGBM、CatBoost)进行了评估和比较,以确定最适合训练投保laim数据和适应保险频率模型的库。经过对两个不同的数据集的严格分析,我们发现CatBoost在开发汽车投保频率模型方面表现出色,并且对数据特点进行了深入的探索和分析。此外,我们还使用了CatBoost工具来探索不同风险特征对频率模型的影响,并对telematics数据进行了深入的分析。
randomHAR: Improving Ensemble Deep Learners for Human Activity Recognition with Sensor Selection and Reinforcement Learning
results: 对六个HAR数据集进行比较,结果表明提议的方法可以超越当前状态的各种方法,包括ensembleLSTM。Abstract
Deep learning has proven to be an effective approach in the field of Human activity recognition (HAR), outperforming other architectures that require manual feature engineering. Despite recent advancements, challenges inherent to HAR data, such as noisy data, intra-class variability and inter-class similarity, remain. To address these challenges, we propose an ensemble method, called randomHAR. The general idea behind randomHAR is training a series of deep learning models with the same architecture on randomly selected sensor data from the given dataset. Besides, an agent is trained with the reinforcement learning algorithm to identify the optimal subset of the trained models that are utilized for runtime prediction. In contrast to existing work, this approach optimizes the ensemble process rather than the architecture of the constituent models. To assess the performance of the approach, we compare it against two HAR algorithms, including the current state of the art, on six HAR benchmark datasets. The result of the experiment demonstrates that the proposed approach outperforms the state-of-the-art method, ensembleLSTM.
摘要
深度学习在人动识别(HAR)领域已经证明是一种有效的方法,超过了需要人工特征工程的其他架构。 DESPITE recent advancements, HAR数据中的挑战,如噪音数据、内类变化和间类相似性,仍然存在。 To address these challenges, we propose an ensemble method, called randomHAR. The general idea behind randomHAR is to train a series of deep learning models with the same architecture on randomly selected sensor data from the given dataset. Besides, an agent is trained with the reinforcement learning algorithm to identify the optimal subset of the trained models that are utilized for runtime prediction. In contrast to existing work, this approach optimizes the ensemble process rather than the architecture of the constituent models. To assess the performance of the approach, we compare it against two HAR algorithms, including the current state of the art, on six HAR benchmark datasets. The result of the experiment demonstrates that the proposed approach outperforms the state-of-the-art method, ensembleLSTM.Note that the translation is in Simplified Chinese, which is the standard writing system used in mainland China. If you prefer Traditional Chinese, please let me know and I can provide the translation in that format as well.
Variational Monte Carlo on a Budget – Fine-tuning pre-trained Neural Wavefunctions
paper_authors: Michael Scherbela, Leon Gerard, Philipp Grohs
for: 这 paper 的目的是提出一种基于深度学习的变量 Monte Carlo(DL-VMC)方法,以提高计算量化化学中的精度。
methods: 这 paper 使用了自我超vised wavefunction optimization 来预训练 DL-VMC 模型,并在新的分子实例上应用这个模型来获得更高的精度。
results: compared to established methods such as CCSD(T)-2Z, 这 paper 的方法可以获得更高的精度和更好的相对能量。 In addition, the method can be applied to a wide variety of test systems and shows good scalability.Abstract
Obtaining accurate solutions to the Schr\"odinger equation is the key challenge in computational quantum chemistry. Deep-learning-based Variational Monte Carlo (DL-VMC) has recently outperformed conventional approaches in terms of accuracy, but only at large computational cost. Whereas in many domains models are trained once and subsequently applied for inference, accurate DL-VMC so far requires a full optimization for every new problem instance, consuming thousands of GPUhs even for small molecules. We instead propose a DL-VMC model which has been pre-trained using self-supervised wavefunction optimization on a large and chemically diverse set of molecules. Applying this model to new molecules without any optimization, yields wavefunctions and absolute energies that outperform established methods such as CCSD(T)-2Z. To obtain accurate relative energies, only few fine-tuning steps of this base model are required. We accomplish this with a fully end-to-end machine-learned model, consisting of an improved geometry embedding architecture and an existing SE(3)-equivariant model to represent molecular orbitals. Combining this architecture with continuous sampling of geometries, we improve zero-shot accuracy by two orders of magnitude compared to the state of the art. We extensively evaluate the accuracy, scalability and limitations of our base model on a wide variety of test systems.
摘要
computational quantum chemistry中的主要挑战是获取准确的Schrödinger方程解。深度学习基于变量 Monte Carlo(DL-VMC)在过去几年内已经超越了传统方法,但是它们的计算成本很大。在许多领域中,模型会被训练一次并用于推理,而DL-VMC则需要每个新问题都进行全局优化,消耗了千个GPUhs甚至对于小分子来说。我们提议一种已经预训练过的DL-VMC模型,使用自动优化的自我适应波函数优化算法来训练。对于新的分子,只需要几个精度调整步骤,就可以获得比CCSD(T)-2Z更高的精度。为了获取准确的相对能量,我们使用一个完整的端到端机器学习模型,包括改进的几何嵌入体系和现有的SE(3)-可变模型来表示分子轨道函数。将这种体系与连续样本的几何描述相结合,我们提高了零shot精度至少两个数量级比前state of the art。我们对各种测试系统进行了广泛的评估,包括准确度、可扩展性和限制。
Real-time Traffic Classification for 5G NSA Encrypted Data Flows With Physical Channel Records
paper_authors: Xiao Fei, Philippe Martins, Jialiang Lu
for: 5G-NR mobile network traffic classification for QoS management and dynamic resource allocation
methods: real-time encrypted traffic classification using physical channel records and decision-tree-based gradient boosting algorithms
results: 95% accuracy with state-of-the-art response time of 10ms using Light Gradient Boosting Machine (LGBM)Abstract
The classification of fifth-generation New-Radio (5G-NR) mobile network traffic is an emerging topic in the field of telecommunications. It can be utilized for quality of service (QoS) management and dynamic resource allocation. However, traditional approaches such as Deep Packet Inspection (DPI) can not be directly applied to encrypted data flows. Therefore, new real-time encrypted traffic classification algorithms need to be investigated to handle dynamic transmission. In this study, we examine the real-time encrypted 5G Non-Standalone (NSA) application-level traffic classification using physical channel records. Due to the vastness of their features, decision-tree-based gradient boosting algorithms are a viable approach for classification. We generate a noise-limited 5G NSA trace dataset with traffic from multiple applications. We develop a new pipeline to convert sequences of physical channel records into numerical vectors. A set of machine learning models are tested, and we propose our solution based on Light Gradient Boosting Machine (LGBM) due to its advantages in fast parallel training and low computational burden in practical scenarios. Our experiments demonstrate that our algorithm can achieve 95% accuracy on the classification task with a state-of-the-art response time as quick as 10ms.
摘要
fifth-generation New-Radio (5G-NR) 移动网络流量的分类是当前 телеcommunications 领域的一个热点话题。它可以用于质量服务(QoS)管理和动态资源分配。然而,传统的方法,如深度包检查(DPI),无法直接应用于加密数据流。因此,新的实时加密交通分类算法需要被研究以处理动态传输。在本研究中,我们研究了实时加密5G非标准应用级别(NSA)的应用级别流量分类,使用物理通道记录。由于它们的特征很多,决策树基本的泵浦搅拌算法是一种可行的方法。我们生成了5G NSA的噪声限定数据集,包括多个应用程序的流量。我们开发了一个新的管道,将物理通道记录序列转换为数字矢量。一系列机器学习模型被测试,我们提议使用光 Gradient Boosting Machine(LGBM),因为它在实际应用中具有快速并行训练和低计算负担的优点。我们的实验表明,我们的算法可以在分类任务上达到95%的准确率,并且响应时间只需10ms。
Learning Expressive Priors for Generalization and Uncertainty Estimation in Neural Networks
results: 我们通过实验证明了这种方法的效果,包括不确定性估计和通用化。Abstract
In this work, we propose a novel prior learning method for advancing generalization and uncertainty estimation in deep neural networks. The key idea is to exploit scalable and structured posteriors of neural networks as informative priors with generalization guarantees. Our learned priors provide expressive probabilistic representations at large scale, like Bayesian counterparts of pre-trained models on ImageNet, and further produce non-vacuous generalization bounds. We also extend this idea to a continual learning framework, where the favorable properties of our priors are desirable. Major enablers are our technical contributions: (1) the sums-of-Kronecker-product computations, and (2) the derivations and optimizations of tractable objectives that lead to improved generalization bounds. Empirically, we exhaustively show the effectiveness of this method for uncertainty estimation and generalization.
摘要
在这项工作中,我们提出了一种新的先学习方法,用于提高深度神经网络的泛化和不确定性估计。关键思想是利用可扩展和结构化的神经网络 posterior 作为有用的先学习模型,具有泛化保证。我们学习的先学习模型可以在大规模上表达可信度,类似于 Bayesian 对 ImageNet 预训练模型的Counterpart,并且生成非虚无效的泛化误差 bound。我们还将这个想法应用于连续学习框架,其中我们的先学习模型具有恰当的性质。主要推动因素是我们的技术贡献:(1) Kronecker 乘积计算,以及(2)对可迭代目标函数的 derivation 和优化,导致改进的泛化误差 bound。在实验中,我们详细展示了这种方法的效果,包括不确定性估计和泛化。
Probabilistic Black-Box Checking via Active MDP Learning
results: ProbBBC比现有方法更高效,特别是对具有有限观察的系统。Abstract
We introduce a novel methodology for testing stochastic black-box systems, frequently encountered in embedded systems. Our approach enhances the established black-box checking (BBC) technique to address stochastic behavior. Traditional BBC primarily involves iteratively identifying an input that breaches the system's specifications by executing the following three phases: the learning phase to construct an automaton approximating the black box's behavior, the synthesis phase to identify a candidate counterexample from the learned automaton, and the validation phase to validate the obtained candidate counterexample and the learned automaton against the original black-box system. Our method, ProbBBC, refines the conventional BBC approach by (1) employing an active Markov Decision Process (MDP) learning method during the learning phase, (2) incorporating probabilistic model checking in the synthesis phase, and (3) applying statistical hypothesis testing in the validation phase. ProbBBC uniquely integrates these techniques rather than merely substituting each method in the traditional BBC; for instance, the statistical hypothesis testing and the MDP learning procedure exchange information regarding the black-box system's observation with one another. The experiment results suggest that ProbBBC outperforms an existing method, especially for systems with limited observation.
摘要
我们介绍一种新的黑盒系统测试方法,这种方法可以更好地捕捉黑盒系统中的随机行为。我们的方法基于传统的黑盒检查(BBC)技术,但它具有以下三个特点:1. 在学习阶段使用活动马尔可夫遇处理(MDP)学习方法,以更好地模拟黑盒系统的行为。2. 在合成阶段使用概率模型检查,以更好地找到黑盒系统的错误。3. 在验证阶段使用统计假设检测,以验证获得的候选反例和学习的黑盒系统是否符合原始黑盒系统。我们的方法不同于传统的 BBC 方法,不仅是将每种方法简单地替换成另一种。例如,统计假设检测和 MDP 学习过程之间会互相交换黑盒系统的观察信息。我们的实验结果表明,ProbBBC 比现有的方法更高效,特别是针对具有有限观察的系统。
On the Utility Gain of Iterative Bayesian Update for Locally Differentially Private Mechanisms
methods: 我们比较了 IBU 和 Matrix Inversion (MI) 两种估计技术的性能,对七种LDP机制进行了一次数据收集和多次数据收集的比较(如 RAPPOR)。我们还在不同的实用环境下(包括 synthetic 数据和实际数据)进行了参数调整(包括 utility 度量、用户数 n、领域大小 k 和隐私参数 {\epsilon})。
results: 我们的结果表明,IBU 可以在不同的场景下提高 LDP 机制的实用性,而不需要额外的隐私成本。例如,在高隐私 режи(即 {\epsilon} 小)下,IBU 可以提供更好的实用性比 MI。我们的研究为实践者提供了使用 IBU 和现有 LDP 机制进行更准确和隐私保护的数据分析的指导。此外,我们将 IBU 实现到了 state-of-the-art multi-freq-ldpy Python 包(https://pypi.org/project/multi-freq-ldpy/)中,并将所有我们用于实验的代码开源了为 tutorials。Abstract
This paper investigates the utility gain of using Iterative Bayesian Update (IBU) for private discrete distribution estimation using data obfuscated with Locally Differentially Private (LDP) mechanisms. We compare the performance of IBU to Matrix Inversion (MI), a standard estimation technique, for seven LDP mechanisms designed for one-time data collection and for other seven LDP mechanisms designed for multiple data collections (e.g., RAPPOR). To broaden the scope of our study, we also varied the utility metric, the number of users n, the domain size k, and the privacy parameter {\epsilon}, using both synthetic and real-world data. Our results suggest that IBU can be a useful post-processing tool for improving the utility of LDP mechanisms in different scenarios without any additional privacy cost. For instance, our experiments show that IBU can provide better utility than MI, especially in high privacy regimes (i.e., when {\epsilon} is small). Our paper provides insights for practitioners to use IBU in conjunction with existing LDP mechanisms for more accurate and privacy-preserving data analysis. Finally, we implemented IBU for all fourteen LDP mechanisms into the state-of-the-art multi-freq-ldpy Python package (https://pypi.org/project/multi-freq-ldpy/) and open-sourced all our code used for the experiments as tutorials.
摘要
Knowledge Graph Enhanced Intelligent Tutoring System Based on Exercise Representativeness and Informativeness
results: 对两个公共教育数据集进行了广泛的实验,结果表明,该 framwork 可以更好地推荐适合学生的练习题,提高学生的性能Abstract
Presently, knowledge graph-based recommendation algorithms have garnered considerable attention among researchers. However, these algorithms solely consider knowledge graphs with single relationships and do not effectively model exercise-rich features, such as exercise representativeness and informativeness. Consequently, this paper proposes a framework, namely the Knowledge-Graph-Exercise Representativeness and Informativeness Framework, to address these two issues. The framework consists of four intricate components and a novel cognitive diagnosis model called the Neural Attentive cognitive diagnosis model. These components encompass the informativeness component, exercise representation component, knowledge importance component, and exercise representativeness component. The informativeness component evaluates the informational value of each question and identifies the candidate question set that exhibits the highest exercise informativeness. Furthermore, the skill embeddings are employed as input for the knowledge importance component. This component transforms a one-dimensional knowledge graph into a multi-dimensional one through four class relations and calculates skill importance weights based on novelty and popularity. Subsequently, the exercise representativeness component incorporates exercise weight knowledge coverage to select questions from the candidate question set for the tested question set. Lastly, the cognitive diagnosis model leverages exercise representation and skill importance weights to predict student performance on the test set and estimate their knowledge state. To evaluate the effectiveness of our selection strategy, extensive experiments were conducted on two publicly available educational datasets. The experimental results demonstrate that our framework can recommend appropriate exercises to students, leading to improved student performance.
摘要
Translated into Simplified Chinese:当前,基于知识图的推荐算法已经吸引了研究人员的广泛关注。然而,这些算法只考虑单 relate 知识图,并不能有效地模型运动rich feature,如运动 representativeness 和 informativeness。因此,本文提出了一个框架,即知识图运动 representativeness 和 informativeness 框架,以解决这两个问题。该框架包括四个复杂的组件和一个新的认知诊断模型called Neural Attentive cognitive diagnosis model。这些组件包括 informativeness 组件、运动表现组件、知识重要性组件和运动 representativeness 组件。informativeness 组件评估每个问题的信息价值,并将候选问题集定为展示最高运动 informativeness。此外,技能嵌入被用作知识重要性组件的输入。这个组件通过四种类关系将一维知识图转换为多维知识图,并计算技能重要性 weights 基于新鲜度和流行度。然后,运动 representativeness 组件将运动权重知识覆盖纳入选择候选问题集的 tested question set。最后,认知诊断模型通过运动表现和技能重要性 weights 预测学生在测试集上的表现和知识状态。为评估我们的选择策略的效果,我们在两个公共可用的教育数据集上进行了广泛的实验。实验结果表明,我们的框架可以为学生推荐适合的运动,从而提高学生的表现。
Promotion/Inhibition Effects in Networks: A Model with Negative Probabilities
methods: 本研究采用了P。Dirac和R。Feynman提出的“负概率”框架,并设立了可能性形式来获得Edge-weight的值。 solve this problem, the proposed optimization problem can be solved via a generalization of the well-known Sinkhorn algorithm.
results: 本研究得到了一种基于“负概率”框架的方法,可以在基因网络中确定Edge-weight,并且这种方法可以通过一种扩展的Sinkhorn算法来解决。Abstract
Biological networks often encapsulate promotion/inhibition as signed edge-weights of a graph. Nodes may correspond to genes assigned expression levels (mass) of respective proteins. The promotion/inhibition nature of co-expression between nodes is encoded in the sign of the corresponding entry of a sign-indefinite adjacency matrix, though the strength of such co-expression (i.e., the precise value of edge weights) cannot typically be directly measured. Herein we address the inverse problem to determine network edge-weights based on a sign-indefinite adjacency and expression levels at the nodes. While our motivation originates in gene networks, the framework applies to networks where promotion/inhibition dictates a stationary mass distribution at the nodes. In order to identify suitable edge-weights we adopt a framework of ``negative probabilities,'' advocated by P.\ Dirac and R.\ Feynman, and we set up a likelihood formalism to obtain values for the sought edge-weights. The proposed optimization problem can be solved via a generalization of the well-known Sinkhorn algorithm; in our setting the Sinkhorn-type ``diagonal scalings'' are multiplicative or inverse-multiplicative, depending on the sign of the respective entries in the adjacency matrix, with value computed as the positive root of a quadratic polynomial.
摘要
To identify suitable edge weights, we adopt a framework of "negative probabilities" advocated by P. Dirac and R. Feynman. We set up a likelihood formalism to obtain values for the sought edge weights. The proposed optimization problem can be solved using a generalization of the well-known Sinkhorn algorithm; in our setting, the Sinkhorn-type "diagonal scalings" are multiplicative or inverse-multiplicative, depending on the sign of the respective entries in the adjacency matrix, with values computed as the positive root of a quadratic polynomial.
Measuring Perceived Trust in XAI-Assisted Decision-Making by Eliciting a Mental Model
paper_authors: Mohsen Abbaspour Onari, Isel Grau, Marco S. Nobile, Yingqian Zhang
for: This paper aims to measure users’ perceived trust in an Explainable Artificial Intelligence (XAI) model by eliciting their mental models using Fuzzy Cognitive Maps (FCMs).
methods: The paper uses an interpretable Machine Learning (ML) model to classify suspected COVID-19 patients and then evaluates the impact of interpretations on perceived trust through a survey of Medical Experts’ (MEs) explanation satisfaction attributes. Fuzzy linguistic variables are used to determine the strength of influences in MEs’ mental subjectivity.
results: The paper obtains quantified values to measure the perceived trust of each ME and analyzes the behavior of MEs in completing diagnostic tasks based on the quantified values. The results show that the quantified values can determine whether MEs trust or distrust the XAI model.Abstract
This empirical study proposes a novel methodology to measure users' perceived trust in an Explainable Artificial Intelligence (XAI) model. To do so, users' mental models are elicited using Fuzzy Cognitive Maps (FCMs). First, we exploit an interpretable Machine Learning (ML) model to classify suspected COVID-19 patients into positive or negative cases. Then, Medical Experts' (MEs) conduct a diagnostic decision-making task based on their knowledge and then prediction and interpretations provided by the XAI model. In order to evaluate the impact of interpretations on perceived trust, explanation satisfaction attributes are rated by MEs through a survey. Then, they are considered as FCM's concepts to determine their influences on each other and, ultimately, on the perceived trust. Moreover, to consider MEs' mental subjectivity, fuzzy linguistic variables are used to determine the strength of influences. After reaching the steady state of FCMs, a quantified value is obtained to measure the perceived trust of each ME. The results show that the quantified values can determine whether MEs trust or distrust the XAI model. We analyze this behavior by comparing the quantified values with MEs' performance in completing diagnostic tasks.
摘要
Translation Notes:* "empirical study" is translated as "实验研究" (shí yàn yán jí)* "perceived trust" is translated as "感知的信任" (gǎn zhī de xìn ràng)* "Fuzzy Cognitive Maps" is translated as "模糊认知地图" (mó huang gòu zhī dì tú)* "Medical Experts" is translated as "医学专家" (yī xué zhù jià)* "diagnostic decision-making task" is translated as "诊断决策任务" (shòu yán jì suī zhèng yì)* "explanation satisfaction attributes" is translated as "解释满意属性" (jiě jie cháng zhì fù xìng)* "fuzzy linguistic variables" is translated as "模糊语言变量" (mó huang yǔ yán biàn zhì)* "quantified value" is translated as "量化值" (liàng zhì yù)* "perceived trust of each ME" is translated as "每位ME的感知信任" (mēi zhì ME de gǎn zhī xìn ràng)
Fast Adaptation with Bradley-Terry Preference Models in Text-To-Image Classification and Generation
results: 实验结果显示了这个框架在不同的多modal文本和图像理解领域中的能力,包括喜好预测和生成任务。Abstract
Recently, large multimodal models, such as CLIP and Stable Diffusion have experimented tremendous successes in both foundations and applications. However, as these models increase in parameter size and computational requirements, it becomes more challenging for users to personalize them for specific tasks or preferences. In this work, we address the problem of adapting the previous models towards sets of particular human preferences, aligning the retrieved or generated images with the preferences of the user. We leverage the Bradley-Terry preference model to develop a fast adaptation method that efficiently fine-tunes the original model, with few examples and with minimal computing resources. Extensive evidence of the capabilities of this framework is provided through experiments in different domains related to multimodal text and image understanding, including preference prediction as a reward model, and generation tasks.
摘要
A Nearly-Linear Time Algorithm for Structured Support Vector Machines
for: quadratic programming with low-rank factorization or low-treewidth, and a small number of linear constraints
methods: nearly-linear time algorithm
results: nearly-linear time algorithms for low-treewidth or low-rank SVMsAbstract
Quadratic programming is a fundamental problem in the field of convex optimization. Many practical tasks can be formulated as quadratic programming, for example, the support vector machine (SVM). Linear SVM is one of the most popular tools over the last three decades in machine learning before deep learning method dominating. In general, a quadratic program has input size $\Theta(n^2)$ (where $n$ is the number of variables), thus takes $\Omega(n^2)$ time to solve. Nevertheless, quadratic programs coming from SVMs has input size $O(n)$, allowing the possibility of designing nearly-linear time algorithms. Two important classes of SVMs are programs admitting low-rank kernel factorizations and low-treewidth programs. Low-treewidth convex optimization has gained increasing interest in the past few years (e.g.~linear programming [Dong, Lee and Ye 2021] and semidefinite programming [Gu and Song 2022]). Therefore, an important open question is whether there exist nearly-linear time algorithms for quadratic programs with these nice structures. In this work, we provide the first nearly-linear time algorithm for solving quadratic programming with low-rank factorization or low-treewidth, and a small number of linear constraints. Our results imply nearly-linear time algorithms for low-treewidth or low-rank SVMs.
摘要
quadratic programming 是 convex optimization 领域中的基本问题。许多实际任务可以被формализова为quadratic programming,例如支持向量机器(SVM)。线性SVM 是过去三十年最受欢迎的机器学习工具之一,直到深度学习方法成为主流。 在一般情况下,quadratic program 的输入大小为 $\Theta(n^2)$(where $n$ 是变数的数量),因此需要 $\Omega(n^2)$ 时间来解决。然而,从 SVM 中获得的quadratic program 的输入大小为 $O(n)$,这使得可能设计近似线性时间的算法。两个重要的 SVM 类别是允许低矩阵kernel factorization 和低树几何 programme。低树几何 convex optimization 在过去几年内(例如线性程度 [Dong, Lee 和 Ye 2021] 和对偶定理程度 [Gu 和 Song 2022])获得了增加的关注。因此,一个重要的开问是是否存在近似线性时间的算法 для quadratic program WITH low-rank factorization 或 low-treewidth。在这个工作中,我们提供了第一个 near-linear time algorithm for solving quadratic programming with low-rank factorization or low-treewidth, 和一小数量的线性几何。我们的结果意味着 near-linear time algorithms for low-treewidth 或 low-rank SVMs.
Towards Optimal Neural Networks: the Role of Sample Splitting in Hyperparameter Selection
for: Understanding the effectiveness of neural network models
methods: By practicing sample splitting to optimize hyperparameters
results: Experimental results prove that this method can minimize the prediction risk of neural network modelsAbstract
When artificial neural networks have demonstrated exceptional practical success in a variety of domains, investigations into their theoretical characteristics, such as their approximation power, statistical properties, and generalization performance, have made significant strides. In this paper, we construct a novel theory for understanding the effectiveness of neural networks by discovering the mystery underlying a common practice during neural network model construction: sample splitting. Our theory demonstrates that, the optimal hyperparameters derived from sample splitting can enable a neural network model that asymptotically minimizes the prediction risk. We conduct extensive experiments across different application scenarios and network architectures, and the results manifest our theory's effectiveness.
摘要
Visual Analytics For Machine Learning: A Data Perspective Survey
results: 对143篇评估的论文进行分析,发现这些论文在不同的ML管道阶段和数据类型上进行了六种任务,并对未来研究方向做出预测。Abstract
The past decade has witnessed a plethora of works that leverage the power of visualization (VIS) to interpret machine learning (ML) models. The corresponding research topic, VIS4ML, keeps growing at a fast pace. To better organize the enormous works and shed light on the developing trend of VIS4ML, we provide a systematic review of these works through this survey. Since data quality greatly impacts the performance of ML models, our survey focuses specifically on summarizing VIS4ML works from the data perspective. First, we categorize the common data handled by ML models into five types, explain the unique features of each type, and highlight the corresponding ML models that are good at learning from them. Second, from the large number of VIS4ML works, we tease out six tasks that operate on these types of data (i.e., data-centric tasks) at different stages of the ML pipeline to understand, diagnose, and refine ML models. Lastly, by studying the distribution of 143 surveyed papers across the five data types, six data-centric tasks, and their intersections, we analyze the prospective research directions and envision future research trends.
摘要
过去一个 décennie hath witnessed a plethora of works that leveraged the power of visualization (VIS) to interpret machine learning (ML) models. The corresponding research topic, VIS4ML, hath been growing at a fast pace. To better organize the enormous works and shed light on the developing trend of VIS4ML, we provide a systematic review of these works through this survey. Since data quality greatly impacts the performance of ML models, our survey focuses specifically on summarizing VIS4ML works from the data perspective. First, we categorize the common data handled by ML models into five types, explain the unique features of each type, and highlight the corresponding ML models that are good at learning from them. Second, from the large number of VIS4ML works, we tease out six tasks that operate on these types of data (i.e., data-centric tasks) at different stages of the ML pipeline to understand, diagnose, and refine ML models. Lastly, by studying the distribution of 143 surveyed papers across the five data types, six data-centric tasks, and their intersections, we analyze the prospective research directions and envision future research trends.
Identification of Stochasticity by Matrix-decomposition: Applied on Black Hole Data
paper_authors: Sai Pradeep Chakka, Sunil Kumar Vengalil, Neelam Sinha
for: 本研究旨在提出一种两路矩阵分解法,用于分类时间序列数据。
methods: 该算法使用了两种不同的技术:单值分解(SVD)和主成分分析(PCA)。
results: 对synthetic数据进行了分析,并在实验中使用了SVM进行分类。结果显示,在12个时间类中,SVD-label和PCA-label之间存在高度的一致性。Abstract
Timeseries classification as stochastic (noise-like) or non-stochastic (structured), helps understand the underlying dynamics, in several domains. Here we propose a two-legged matrix decomposition-based algorithm utilizing two complementary techniques for classification. In Singular Value Decomposition (SVD) based analysis leg, we perform topological analysis (Betti numbers) on singular vectors containing temporal information, leading to SVD-label. Parallely, temporal-ordering agnostic Principal Component Analysis (PCA) is performed, and the proposed PCA-derived features are computed. These features, extracted from synthetic timeseries of the two labels, are observed to map the timeseries to a linearly separable feature space. Support Vector Machine (SVM) is used to produce PCA-label. The proposed methods have been applied to synthetic data, comprising 41 realisations of white-noise, pink-noise (stochastic), Logistic-map at growth-rate 4 and Lorentz-system (non-stochastic), as proof-of-concept. Proposed algorithm is applied on astronomical data: 12 temporal-classes of timeseries of black hole GRS 1915+105, obtained from RXTE satellite with average length 25000. For a given timeseries, if SVD-label and PCA-label concur, then the label is retained; else deemed "Uncertain". Comparison of obtained results with those in literature are presented. It's found that out of 12 temporal classes of GRS 1915+105, concurrence between SVD-label and PCA-label is obtained on 11 of them.
摘要
时间序列分类为随机(噪声如的)或非随机(结构化),可以帮助我们理解时间序列的下面动力学。我们提出了一种基于两个脚本的矩阵分解算法,利用两种 complementary 技术进行分类。在 Singular Value Decomposition(SVD)基础分析脚本中,我们进行了 topological 分析(Betti 数)于时间信号中的特征向量,从而获得 SVD-标签。同时,无关于时间顺序的 Principal Component Analysis(PCA)被应用,并计算了提案的 PCA-derived 特征。这些特征从 synthetic 时间序列中提取出来,并在线性分离特征空间中映射时间序列。使用 Support Vector Machine(SVM)生成 PCA-标签。我们对 synthetic 数据进行了证明,包括41个实现 white-noise、pink-noise(随机)、Logistic-map 增长率4和 Lorentz-system(非随机)。我们还应用了这种方法于天文数据:RXTE 卫星上的 12 个 temporal 类时间序列,每个时间序列的平均长度为 25000。对于每个时间序列,如果 SVD-标签和 PCA-标签协调,则保留标签;否则被称为 "Uncertain"。我们对得到的结果与文献中的结果进行了比较,发现 GRS 1915+105 黑洞的 11 个 temporal 类时间序列中,SVD-标签和 PCA-标签协调。
NeurASP: Embracing Neural Networks into Answer Set Programming
paper_authors: Zhun Yang, Adam Ishay, Joohyung Lee
for: 该论文是为了推动Answer Set Programming(ASP)和神经网络之间的 integración,提供了一种简单扩展的Answer Set Programming(NeurASP)。
methods: 该论文使用神经网络输出作为Answer Set Programming中的概率分布,从而实现了sub-symbolic和symbolic计算的集成。它还展示了如何使用预训练神经网络在符号计算中使用ASP规则,以及如何使用ASP规则来训练神经网络。
results: NeurASP可以使用预训练神经网络来改善神经网络的识别结果,并且可以使用ASP规则来帮助神经网络学习从数据中的隐式相关性和Explicit complex semantic constraints。Abstract
We present NeurASP, a simple extension of answer set programs by embracing neural networks. By treating the neural network output as the probability distribution over atomic facts in answer set programs, NeurASP provides a simple and effective way to integrate sub-symbolic and symbolic computation. We demonstrate how NeurASP can make use of a pre-trained neural network in symbolic computation and how it can improve the neural network's perception result by applying symbolic reasoning in answer set programming. Also, NeurASP can be used to train a neural network better by training with ASP rules so that a neural network not only learns from implicit correlations from the data but also from the explicit complex semantic constraints expressed by the rules.
摘要
我们介绍NeurASP,一个简单扩展Answer Set Programs(ASP)的方法,通过将神经网络输出视为Answer Set Programs中的原子事实的概率分布。NeurASP提供了一个简单而有效的方式将子符号 computations和符号 computations融合。我们显示了NeurASP如何使用预训练的神经网络在符号计算中使用,以及如何运用符号推理来改善神经网络的认知结果。此外,NeurASP还可以用来训练神经网络,使其不仅从数据中学习隐含的相互关联,而且还从ASP规则中获得明确的复杂 semantic constraint。
The Growth of E-Bike Use: A Machine Learning Approach
paper_authors: Aditya Gupta, Samarth Chitgopekar, Alexander Kim, Joseph Jiang, Megan Wang, Christopher Grattoni for: 这个研究的目的是为美国政策制定者提供关于电动自行车(e-bike)的信息,以便他们能够更好地了解电动自行车的增长和影响,并在制定可持续能源计划时做出更 Informed decisions。methods: 这个研究使用了ARIMA模型和一种监管机器学习算法来预测电动自行车销售量的增长。此外,研究还使用Random Forest回归模型来分析电动自行车销售增长的因素。results: 研究发现,电动自行车在美国的销售量将在2025年和2028年分别达到130万和2113万个单位。此外,研究还发现,电动自行车的使用会减少碳排放和提高体能消耗。在2022年,电动自行车的使用已经减少了15737.82吨碳排放和716630.727千卡ло里。Abstract
We present our work on electric bicycles (e-bikes) and their implications for policymakers in the United States. E-bikes have gained significant popularity as a fast and eco-friendly transportation option. As we strive for a sustainable energy plan, understanding the growth and impact of e-bikes is crucial for policymakers. Our mathematical modeling offers insights into the value of e-bikes and their role in the future. Using an ARIMA model, a supervised machine-learning algorithm, we predicted the growth of e-bike sales in the U.S. Our model, trained on historical sales data from January 2006 to December 2022, projected sales of 1.3 million units in 2025 and 2.113 million units in 2028. To assess the factors contributing to e-bike usage, we employed a Random Forest regression model. The most significant factors influencing e-bike sales growth were disposable personal income and popularity. Furthermore, we examined the environmental and health impacts of e-bikes. Through Monte Carlo simulations, we estimated the reduction in carbon emissions due to e-bike use and the calories burned through e-biking. Our findings revealed that e-bike usage in the U.S. resulted in a reduction of 15,737.82 kilograms of CO2 emissions in 2022. Additionally, e-bike users burned approximately 716,630.727 kilocalories through their activities in the same year. Our research provides valuable insights for policymakers, emphasizing the potential of e-bikes as a sustainable transportation solution. By understanding the growth factors and quantifying the environmental and health benefits, policymakers can make informed decisions about integrating e-bikes into future energy and transportation strategies.
摘要
我们对电动自行车(e-bike)的研究和其对政策 makers 在美国的影响进行了报告。电动自行车在快速和环保交通方面受到了广泛的欢迎,随着我们努力实现可持续能源规划,理解电动自行车的增长和影响非常重要。我们使用 ARIMA 模型和一种监管机器学习算法来预测电动自行车销售在美国的增长。我们的模型,基于2006年1月至2022年12月的历史销售数据,预测在2025年销售130万部电动自行车,在2028年销售2113万部。为了评估电动自行车使用的因素,我们使用Random Forest回归模型。最主要影响电动自行车销售增长的因素是可 dispose 个人收入和流行度。此外,我们还研究了电动自行车对环境和健康的影响。通过蒙地卡罗模拟,我们估算了电动自行车使用在美国的碳排放减少和热量燃烧。我们的发现表明,在2022年,电动自行车在美国的使用已经减少了15737.82公斤的碳排放,同时电动自行车用户通过其活动燃烧了约716630.727公利 kalories。我们的研究为政策 makers 提供了有价值的见解,强调电动自行车作为可持续交通解决方案的潜在价值。通过理解电动自行车增长因素和评估环境和健康的影响,政策 makers 可以做出 Informed 的决策,将电动自行车纳入未来能源和交通战略中。
Reducing operator complexity in Algebraic Multigrid with Machine Learning Approaches
results: 降低粗网操作符的复杂性,保持总的 AMG converges.Abstract
We propose a data-driven and machine-learning-based approach to compute non-Galerkin coarse-grid operators in algebraic multigrid (AMG) methods, addressing the well-known issue of increasing operator complexity. Guided by the AMG theory on spectrally equivalent coarse-grid operators, we have developed novel ML algorithms that utilize neural networks (NNs) combined with smooth test vectors from multigrid eigenvalue problems. The proposed method demonstrates promise in reducing the complexity of coarse-grid operators while maintaining overall AMG convergence for solving parametric partial differential equation (PDE) problems. Numerical experiments on anisotropic rotated Laplacian and linear elasticity problems are provided to showcase the performance and compare with existing methods for computing non-Galerkin coarse-grid operators.
摘要
我们提出了一种基于数据驱动和机器学习的方法,用于在数学多普逊(AMG)方法中计算非加尔erkin粗积算子,解决了常见的算子复杂性问题。我们根据AMG理论中的特征相似粗积算子,开发了一种新的机器学习算法,利用神经网络(NN)和多普逊域值问题中的平滑测试向量。我们的方法可以减少粗积算子的复杂性,同时保持AMG方法的总体收敛性,用于解决参数化partial differential equation(PDE)问题。我们在不同的旋转卷积 Laplacian 和线性塑性问题上进行了数值实验,以示出我们的方法的性能和与现有方法相比。
Creating a Dataset for High-Performance Computing Code Translation: A Bridge Between HPC Fortran and C++
results: 我们使用量化(CodeBLEU)和质量(人类评估)方法评估该集的有效性,并发现该集可以提高大规模语言模型的翻译能力,比如无编程知识下的提升为$\mathbf{\times 5.1}$,有编程知识下的提升为$\mathbf{\times 9.9}$。这种dataset的存在可能推动高性能计算领域中的代码翻译技术的发展。该集可以在https://github.com/bin123apple/Fortran-CPP-HPC-code-translation-dataset上下载。Abstract
In this study, we present a novel dataset for training machine learning models translating between OpenMP Fortran and C++ code. To ensure reliability and applicability, the dataset is initially refined using a meticulous code similarity test. The effectiveness of our dataset is assessed using both quantitative (CodeBLEU) and qualitative (human evaluation) methods. We demonstrate how this dataset can significantly improve the translation capabilities of large-scale language models, with improvements of $\mathbf{\times 5.1}$ for models with no prior coding knowledge and $\mathbf{\times 9.9}$ for models with some coding familiarity. Our work highlights the potential of this dataset to advance the field of code translation for high-performance computing. The dataset is available at https://github.com/bin123apple/Fortran-CPP-HPC-code-translation-dataset
摘要
在这项研究中,我们提供了一个新的数据集用于训练机器学习模型在OpenMP Fortran和C++代码之间翻译。为确保可靠性和实用性,我们首先使用精细的代码相似性测试进行初步纤细。我们使用代码BLEU和人类评估方法进行评估数据集的效果,并证明了该数据集可以大幅提高大规模语言模型的翻译能力,具体是$\times 5.1$ для没有编程知识的模型和$\times 9.9$ для具有一定编程经验的模型。我们的工作展示了该数据集在高性能计算领域的代码翻译技术的前进。数据集可以在https://github.com/bin123apple/Fortran-CPP-HPC-code-translation-dataset中下载。
Learning Subjective Time-Series Data via Utopia Label Distribution Approximation
paper_authors: Wenxin Xu, Hexin Jiang, Xuefeng Liang, Ying Zhou, Yin Zhao, Jie Zhang for:STR tasks (Subjective time-series regression)methods:ULDA (Utopia Label Distribution Approximation)TNS (Time-slice Normal Sampling)CWL (Convolutional Weighted Loss)results:lifts the state-of-the-art performance on two STR tasks and three benchmark datasets.Abstract
Subjective time-series regression (STR) tasks have gained increasing attention recently. However, most existing methods overlook the label distribution bias in STR data, which results in biased models. Emerging studies on imbalanced regression tasks, such as age estimation and depth estimation, hypothesize that the prior label distribution of the dataset is uniform. However, we observe that the label distributions of training and test sets in STR tasks are likely to be neither uniform nor identical. This distinct feature calls for new approaches that estimate more reasonable distributions to train a fair model. In this work, we propose Utopia Label Distribution Approximation (ULDA) for time-series data, which makes the training label distribution closer to real-world but unknown (utopia) label distribution. This would enhance the model's fairness. Specifically, ULDA first convolves the training label distribution by a Gaussian kernel. After convolution, the required sample quantity at each regression label may change. We further devise the Time-slice Normal Sampling (TNS) to generate new samples when the required sample quantity is greater than the initial sample quantity, and the Convolutional Weighted Loss (CWL) to lower the sample weight when the required sample quantity is less than the initial quantity. These two modules not only assist the model training on the approximated utopia label distribution, but also maintain the sample continuity in temporal context space. To the best of our knowledge, ULDA is the first method to address the label distribution bias in time-series data. Extensive experiments demonstrate that ULDA lifts the state-of-the-art performance on two STR tasks and three benchmark datasets.
摘要
受到媒体关注的主观时序回归(STR)任务在最近几年来得到了越来越多的关注。然而,大多数现有方法忽略了STR数据中标签分布偏见,导致模型偏向。新诞听学者认为STR任务中的标签分布是均匀的,但我们发现STR任务中的训练和测试集标签分布很可能不均匀,也不是完全相同的。这种特殊特点需要新的方法来训练公正的模型。在这种情况下,我们提出了UTopia标签分布近似(ULDA)方法,用于在时序数据上训练公正的模型。ULDA方法首先将训练标签分布通过 Gaussian 核函数进行混合。在混合后,每个回归标签的样本数量可能会改变。我们还提出了时间扁平分布(TNS)和卷积权重损失(CWL)两个模块,用于生成新的样本和更正模型的训练。这两个模块不仅帮助模型在训练中使用更加公正的标签分布,还保持了样本在时间上的连续性。到目前为止,ULDA方法是首个强调STR任务中标签分布偏见的方法。我们对 STR 任务中的三个标准数据集进行了广泛的实验,结果表明ULDA方法可以超越当前的状态势。
Data-centric Operational Design Domain Characterization for Machine Learning-based Aeronautical Products
methods: 该方法是基于数据而不是场景而定义ODD,并提出了将定义ODD的参数维度和 ML 应用可能遇到的数据类型进行明确表述,以及这些数据类型对 ML 模型和系统层次结构的影响。
results: 该论文指出,通过这种方法可以确定 ML 模型的需求,以及系统层次结构中 ML 模型和高级系统的可能的影响,以及可能需要进行学习保障过程和系统体系设计考虑。 例如,通过使用飞行器飞行范围来说明这些概念。Abstract
We give a first rigorous characterization of Operational Design Domains (ODDs) for Machine Learning (ML)-based aeronautical products. Unlike in other application sectors (such as self-driving road vehicles) where ODD development is scenario-based, our approach is data-centric: we propose the dimensions along which the parameters that define an ODD can be explicitly captured, together with a categorization of the data that ML-based applications can encounter in operation, whilst identifying their system-level relevance and impact. Specifically, we discuss how those data categories are useful to determine: the requirements necessary to drive the design of ML Models (MLMs); the potential effects on MLMs and higher levels of the system hierarchy; the learning assurance processes that may be needed, and system architectural considerations. We illustrate the underlying concepts with an example of an aircraft flight envelope.
摘要
我们给出了机器学习(ML)基于航空产品的操作设计领域(ODD)的首次正式定义。与其他应用领域(如自动驾驶道路车辆)的ODD开发不同,我们的方法是数据中心:我们提议定义ODD参数的维度,并将ML基于应用中可能遇到的数据分类,以及这些数据的系统水平重要性和影响。specifically, we discuss how these data categories can be used to determine: the requirements needed to drive the design of ML models(MLMs); the potential effects on MLMs and higher levels of the system hierarchy; the learning assurance processes that may be needed, and system architectural considerations. We illustrate the underlying concepts with an example of an aircraft flight envelope.Note: Please note that the translation is in Simplified Chinese, which is the standard form of Chinese used in mainland China and Singapore. If you need Traditional Chinese, please let me know.
paper_authors: Usama Sardar, Sarwan Ali, Muhammad Sohaib Ayub, Muhammad Shoaib, Khurram Bashir, Imdad Ullah Khan, Murray Patterson for: This paper aims to develop a machine-learning method to predict the binding of nanobodies (Nb) to antigens based solely on sequence data.methods: The authors curated a comprehensive dataset of Nb-Antigen binding and nonbinding data and devised an embedding method based on gapped k-mers to predict binding based only on sequences of Nb and Antigen.results: The approach achieved up to 90% accuracy in binding prediction and was significantly more efficient compared to the widely-used computational docking technique.Abstract
Nanobodies (Nb) are monomeric heavy-chain fragments derived from heavy-chain only antibodies naturally found in Camelids and Sharks. Their considerably small size (~3-4 nm; 13 kDa) and favorable biophysical properties make them attractive targets for recombinant production. Furthermore, their unique ability to bind selectively to specific antigens, such as toxins, chemicals, bacteria, and viruses, makes them powerful tools in cell biology, structural biology, medical diagnostics, and future therapeutic agents in treating cancer and other serious illnesses. However, a critical challenge in nanobodies production is the unavailability of nanobodies for a majority of antigens. Although some computational methods have been proposed to screen potential nanobodies for given target antigens, their practical application is highly restricted due to their reliance on 3D structures. Moreover, predicting nanobodyantigen interactions (binding) is a time-consuming and labor-intensive task. This study aims to develop a machine-learning method to predict Nanobody-Antigen binding solely based on the sequence data. We curated a comprehensive dataset of Nanobody-Antigen binding and nonbinding data and devised an embedding method based on gapped k-mers to predict binding based only on sequences of nanobody and antigen. Our approach achieves up to 90% accuracy in binding prediction and is significantly more efficient compared to the widely-used computational docking technique.
摘要
纳诺体(Nb)是含有重链只的轻链抗体的自然存在的哺乳动物和鲨鱼中的蛋白质。它们的非常小的大小(约3-4奈米,13 kDa)和有利的生物物理性质使其成为了重点生产的目标。此外,它们可以特异性地绑定到特定抗原,如毒素、化学物质、细菌和病毒,使其成为了细胞生物、结构生物、医学诊断和未来的疾病治疗的有力工具。然而,纳诺体生产中的主要挑战是缺乏纳诺体对大多数抗原的可用性。虽然一些计算方法已经被提出来屏选纳诺体对给定抗原的可能性,但它们的实际应用受到了三维结构的限制,而且预测纳诺体-抗原交互(绑定)是一项时间consuming和劳动密集的任务。本研究旨在开发一种基于序列数据的机器学习方法,以预测纳诺体-抗原绑定。我们收集了一个完整的纳诺体-抗原绑定和非绑定数据集,并采用基于异常词的嵌入方法来预测绑定基于纳诺体和抗原的序列数据。我们的方法可以达到90%的准确率,与广泛使用的计算协同技术相比,效率明显高于。
paper_authors: Jason M. Klusowski, Jonathan W. Siegel
for: 本文研究了matching pursuit的基本限制,即用一个 слова库中的元素组成一个稀疏的线性组合来近似目标函数。当目标函数在字典的变化空间中时,过去几十年有很多卓越的工作获得了上下限 bounds on error of matching pursuit,但它们并不匹配。本文的主要贡献是将这个差异关系closed和获得了准确的衰减率特征。
methods: 本文使用了一个最差情况的字典来构建,该字典显示出了现有最佳上限 bound cannot be significantly improved。结果是,与其他greedy algorithm variants不同,matching pursuit的 converges rate是非优的并由一个certain non-linear equation的解决决定。这使得我们可以结论出任何Amount of shrinkage improve matching pursuit in the worst case.
results: 本文的结果是,任何Amount of shrinkage improve matching pursuit in the worst case。这意味着,无论如何选择 слова库,matching pursuit都会在最差情况下出现衰减。这与之前的研究不同,因为它们通常认为matching pursuit在某些情况下是optimal的。Abstract
We study the fundamental limits of matching pursuit, or the pure greedy algorithm, for approximating a target function by a sparse linear combination of elements from a dictionary. When the target function is contained in the variation space corresponding to the dictionary, many impressive works over the past few decades have obtained upper and lower bounds on the error of matching pursuit, but they do not match. The main contribution of this paper is to close this gap and obtain a sharp characterization of the decay rate of matching pursuit. Specifically, we construct a worst case dictionary which shows that the existing best upper bound cannot be significantly improved. It turns out that, unlike other greedy algorithm variants, the converge rate is suboptimal and is determined by the solution to a certain non-linear equation. This enables us to conclude that any amount of shrinkage improves matching pursuit in the worst case.
摘要
我们研究基本限制的匹配追求(也称为纯格列批处理),用一个简单的线性组合来近似目标函数。当目标函数在字典的变换空间中存在时,过去几十年有很多出色的成果,得到了误差的上下限,但是它们不匹配。本文的主要贡献是关于匹配追求的衰减率的锐化特征化。我们构建了最坏情况的字典,显示现有的最佳上限不能得到显著改进。结果表明,与其他格列算法变体不同,匹配追求的 converges率是不优的,并且取决于一个非线性方程的解。这使得我们能够 conclued 任何Amount of shrinkage 都会提高匹配追求的性能在最坏情况下。
On the Robustness of Epoch-Greedy in Multi-Agent Contextual Bandit Mechanisms
results: 研究发现,可以通过扩展 $\epsilon$-greedy 算法来处理这些挑战,并且这种扩展具有对 adversarial data corruption attacks 的 innate robustness,并且性能会随损害的Amount decay linearly。Abstract
Efficient learning in multi-armed bandit mechanisms such as pay-per-click (PPC) auctions typically involves three challenges: 1) inducing truthful bidding behavior (incentives), 2) using personalization in the users (context), and 3) circumventing manipulations in click patterns (corruptions). Each of these challenges has been studied orthogonally in the literature; incentives have been addressed by a line of work on truthful multi-armed bandit mechanisms, context has been extensively tackled by contextual bandit algorithms, while corruptions have been discussed via a recent line of work on bandits with adversarial corruptions. Since these challenges co-exist, it is important to understand the robustness of each of these approaches in addressing the other challenges, provide algorithms that can handle all simultaneously, and highlight inherent limitations in this combination. In this work, we show that the most prominent contextual bandit algorithm, $\epsilon$-greedy can be extended to handle the challenges introduced by strategic arms in the contextual multi-arm bandit mechanism setting. We further show that $\epsilon$-greedy is inherently robust to adversarial data corruption attacks and achieves performance that degrades linearly with the amount of corruption.
摘要
efficient learning in multi-armed bandit mechanisms such as pay-per-click (PPC) auctions typically involves three challenges: 1) inducing truthful bidding behavior (incentives), 2) using personalization in the users (context), and 3) circumventing manipulations in click patterns (corruptions). each of these challenges has been studied orthogonally in the literature; incentives have been addressed by a line of work on truthful multi-armed bandit mechanisms, context has been extensively tackled by contextual bandit algorithms, while corruptions have been discussed via a recent line of work on bandits with adversarial corruptions. since these challenges co-exist, it is important to understand the robustness of each of these approaches in addressing the other challenges, provide algorithms that can handle all simultaneously, and highlight inherent limitations in this combination. in this work, we show that the most prominent contextual bandit algorithm, $\epsilon$-greedy can be extended to handle the challenges introduced by strategic arms in the contextual multi-arm bandit mechanism setting. we further show that $\epsilon$-greedy is inherently robust to adversarial data corruption attacks and achieves performance that degrades linearly with the amount of corruption.
An Empirical Study of the Effectiveness of Using a Replay Buffer on Mode Discovery in GFlowNets
methods: 本文employs empirical studies to explore various replay buffer sampling techniques and evaluates their impact on the speed of mode discovery and the quality of the discovered modes.
results: 实验结果表明,在Hypergrid即地域和分子合成环境中,使用储存缓存可以significantly improve模式发现速度和模式质量。Abstract
Reinforcement Learning (RL) algorithms aim to learn an optimal policy by iteratively sampling actions to learn how to maximize the total expected return, $R(x)$. GFlowNets are a special class of algorithms designed to generate diverse candidates, $x$, from a discrete set, by learning a policy that approximates the proportional sampling of $R(x)$. GFlowNets exhibit improved mode discovery compared to conventional RL algorithms, which is very useful for applications such as drug discovery and combinatorial search. However, since GFlowNets are a relatively recent class of algorithms, many techniques which are useful in RL have not yet been associated with them. In this paper, we study the utilization of a replay buffer for GFlowNets. We explore empirically various replay buffer sampling techniques and assess the impact on the speed of mode discovery and the quality of the modes discovered. Our experimental results in the Hypergrid toy domain and a molecule synthesis environment demonstrate significant improvements in mode discovery when training with a replay buffer, compared to training only with trajectories generated on-policy.
摘要
强化学习(RL)算法的目标是通过反复样本动作来学习最佳策略,以 maximize the total expected return, $R(x)$. GFlowNets 是一种特殊的算法,用于生成自 discrete 集合中的多个候选者,$x$, 通过学习一个策略,来近似 proportional sampling of $R(x)$. GFlowNets 在模式发现方面表现出了改善,这对于应用如药物发现和 combinatorial search 非常有用。然而,由于 GFlowNets 是一种相对较新的算法,许多RL中的技巧还没有与其相关。在这篇论文中,我们研究了 GFlowNets 中使用 replay buffer 的利用。我们通过 empirical 方式研究了不同的 replay buffer 采样技术的影响,以及它们对速度模式发现和模式质量的影响。我们的实验结果在 Hypergrid 玩家领域和一个分子合成环境中表明,在训练中使用 replay buffer 可以比训练只使用在政策上的 trajectories 更快地发现模式,并且模式质量也更高。
Efficient Adversarial Attacks on Online Multi-agent Reinforcement Learning
results: efficient attack on MARL agents even with no prior information about the environment and agents’ algorithmsAbstract
Due to the broad range of applications of multi-agent reinforcement learning (MARL), understanding the effects of adversarial attacks against MARL model is essential for the safe applications of this model. Motivated by this, we investigate the impact of adversarial attacks on MARL. In the considered setup, there is an exogenous attacker who is able to modify the rewards before the agents receive them or manipulate the actions before the environment receives them. The attacker aims to guide each agent into a target policy or maximize the cumulative rewards under some specific reward function chosen by the attacker, while minimizing the amount of manipulation on feedback and action. We first show the limitations of the action poisoning only attacks and the reward poisoning only attacks. We then introduce a mixed attack strategy with both the action poisoning and the reward poisoning. We show that the mixed attack strategy can efficiently attack MARL agents even if the attacker has no prior information about the underlying environment and the agents' algorithms.
摘要
We first show the limitations of action poisoning only attacks and reward poisoning only attacks. We then introduce a mixed attack strategy that combines both action poisoning and reward poisoning. We demonstrate that the mixed attack strategy can efficiently attack MARL agents even if the attacker has no prior knowledge of the underlying environment and the agents' algorithms.
Efficient Action Robust Reinforcement Learning with Probabilistic Policy Execution Uncertainty
paper_authors: Guanlin Liu, Zhihan Zhou, Han Liu, Lifeng Lai
for: 本研究目的是找到面对不确定性时的最佳策略,以优化最差情况性能。
methods: 本研究使用了可靠性执行不确定性,即策略中指定的动作会被执行的概率是1-ρ,而冲击动作会被执行的概率是ρ。我们提出了动作稳健MDP的优化方法,并开发了Action Robust Reinforcement Learning with Certificates(ARRLC)算法,可以实现最小最大偏差和样本复杂度。
results: 我们通过数值实验 validate了我们的方法的稳健性,并证明了ARRLC在动作冲击下比非稳健RL算法表现更好,并且 faster than robust TD算法在存在动作冲击时 converge。Abstract
Robust reinforcement learning (RL) aims to find a policy that optimizes the worst-case performance in the face of uncertainties. In this paper, we focus on action robust RL with the probabilistic policy execution uncertainty, in which, instead of always carrying out the action specified by the policy, the agent will take the action specified by the policy with probability $1-\rho$ and an alternative adversarial action with probability $\rho$. We establish the existence of an optimal policy on the action robust MDPs with probabilistic policy execution uncertainty and provide the action robust Bellman optimality equation for its solution. Furthermore, we develop Action Robust Reinforcement Learning with Certificates (ARRLC) algorithm that achieves minimax optimal regret and sample complexity. Furthermore, we conduct numerical experiments to validate our approach's robustness, demonstrating that ARRLC outperforms non-robust RL algorithms and converges faster than the robust TD algorithm in the presence of action perturbations.
摘要
Robust reinforcement learning (RL) aims to find a policy that optimizes the worst-case performance in the face of uncertainties. In this paper, we focus on action robust RL with probabilistic policy execution uncertainty, in which, instead of always carrying out the action specified by the policy, the agent will take the action specified by the policy with probability $1-\rho$ and an alternative adversarial action with probability $\rho$. We establish the existence of an optimal policy on the action robust MDPs with probabilistic policy execution uncertainty and provide the action robust Bellman optimality equation for its solution. Furthermore, we develop Action Robust Reinforcement Learning with Certificates (ARRLC) algorithm that achieves minimax optimal regret and sample complexity. Furthermore, we conduct numerical experiments to validate our approach's robustness, demonstrating that ARRLC outperforms non-robust RL algorithms and converges faster than the robust TD algorithm in the presence of action perturbations.Here's the translation in Simplified Chinese:robust reinforcement learning (RL) 目标是找到面临不确定性时的政策优化策略,在这篇论文中,我们关注action robust RL中的抽象uncertainty, Specifically, instead of always carrying out the action specified by the policy, the agent will take the action specified by the policy with probability $1-\rho$ and an alternative adversarial action with probability $\rho$. We prove the existence of an optimal policy on the action robust MDPs with probabilistic policy execution uncertainty and provide the action robust Bellman optimality equation for its solution. In addition, we develop Action Robust Reinforcement Learning with Certificates (ARRLC) algorithm that achieves minimax optimal regret and sample complexity. Finally, we conduct numerical experiments to validate our approach's robustness, demonstrating that ARRLC outperforms non-robust RL algorithms and converges faster than the robust TD algorithm in the presence of action perturbations.
Machine learning for option pricing: an empirical investigation of network architectures
results: 通过实验发现,对选取价格问题,使用通用高速公路网络架构可以得到最佳性能,其中误差和训练时间都是最佳。而在计算附加价值时,经过必要的转换后,DGM架构的变体可以获得最佳性能。Abstract
We consider the supervised learning problem of learning the price of an option or the implied volatility given appropriate input data (model parameters) and corresponding output data (option prices or implied volatilities). The majority of articles in this literature considers a (plain) feed forward neural network architecture in order to connect the neurons used for learning the function mapping inputs to outputs. In this article, motivated by methods in image classification and recent advances in machine learning methods for PDEs, we investigate empirically whether and how the choice of network architecture affects the accuracy and training time of a machine learning algorithm. We find that for option pricing problems, where we focus on the Black--Scholes and the Heston model, the generalized highway network architecture outperforms all other variants, when considering the mean squared error and the training time as criteria. Moreover, for the computation of the implied volatility, after a necessary transformation, a variant of the DGM architecture outperforms all other variants, when considering again the mean squared error and the training time as criteria.
摘要
我们考虑了超级vised学习问题,即通过适当的输入数据(模型参数)和对应的输出数据(选项价格或预测volatility)来学习函数映射。大多数文章在这个文献中使用(普通)径向神经网络架构来连接学习神经元。在这篇文章中,我们受到图像分类方法和最近的机器学习方法 дляPDE的影响,我们在option价格问题上进行了实验,以评估不同网络架构对精度和训练时间的影响。我们发现,对黑-谢尔斯和哈斯顿模型的option价格问题,通用高速公路网络架构在评估 Mean Squared Error 和训练时间作为标准对比下,其表现比其他所有变体更好。此外,对计算预测volatility问题,经过必要的变换,一种DGM架构的变体在评估 Mean Squared Error 和训练时间作为标准对比下,表现比其他所有变体更好。
DIGEST: Fast and Communication Efficient Decentralized Learning with Local Updates
results: 论文通过分析单流和多流DIGEST机制的渐进性和通信开销,证明了两者都可以 approached到优化解决方案。在ilogistic回归和深度神经网络ResNet20上进行了实验,结果表明,多流DIGEST在iid设定下的渐进性比基eline更好,而在非iid设定下则超越基eline。Abstract
Two widely considered decentralized learning algorithms are Gossip and random walk-based learning. Gossip algorithms (both synchronous and asynchronous versions) suffer from high communication cost, while random-walk based learning experiences increased convergence time. In this paper, we design a fast and communication-efficient asynchronous decentralized learning mechanism DIGEST by taking advantage of both Gossip and random-walk ideas, and focusing on stochastic gradient descent (SGD). DIGEST is an asynchronous decentralized algorithm building on local-SGD algorithms, which are originally designed for communication efficient centralized learning. We design both single-stream and multi-stream DIGEST, where the communication overhead may increase when the number of streams increases, and there is a convergence and communication overhead trade-off which can be leveraged. We analyze the convergence of single- and multi-stream DIGEST, and prove that both algorithms approach to the optimal solution asymptotically for both iid and non-iid data distributions. We evaluate the performance of single- and multi-stream DIGEST for logistic regression and a deep neural network ResNet20. The simulation results confirm that multi-stream DIGEST has nice convergence properties; i.e., its convergence time is better than or comparable to the baselines in iid setting, and outperforms the baselines in non-iid setting.
摘要
“两种广泛被考虑的分布式学习算法是聊天和随机游走学习。聊天算法(同步和异步版本)具有高通信成本,而随机游走学习则具有增长的收敛时间。在这篇论文中,我们设计了一种快速和通信效率高的异步分布式学习机制DIGEST,通过融合聊天和随机游走的想法,专注于随机梯度下降(SGD)。DIGEST是一种异步分布式算法,基于本地SGD算法,原本设计用于通信效率高的中央化学习。我们设计了单流和多流DIGEST,其通信开销随着流数增加,并且存在一种收敛和通信开销贸易,可以利用。我们分析了单流和多流DIGEST的收敛,并证明它们在iid和非iid数据分布下都能够向优化解决方案 asymptotically。我们对单流和多流DIGEST进行了逻辑回归和深度神经网络ResNet20的性能评估。实验结果表明,多流DIGEST具有良好的收敛性质,即其收敛时间在iid设定下比基eline更快,并在非iid设定下超过基eline。”
SALC: Skeleton-Assisted Learning-Based Clustering for Time-Varying Indoor Localization
For: The paper is written for establishing a sustainable and accurate indoor localization system that can adapt to highly-changing environments.* Methods: The paper proposes a skeleton-assisted learning-based clustering localization (SALC) system that jointly considers similarities from the skeleton-based shortest path (SSP) and time-varying RSS measurements across reference points (RPs). The system includes RSS-oriented map-assisted clustering (ROMAC), cluster-based online database establishment (CODE), and cluster-scaled location estimation (CsLE).* Results: The proposed SALC system can effectively reconstruct the fingerprint database with an enhanced location estimation accuracy, outperforming other existing schemes in the open literature. Both simulation and experimental results demonstrate the effectiveness of the proposed system.Abstract
Wireless indoor localization has attracted significant amount of attention in recent years. Using received signal strength (RSS) obtained from WiFi access points (APs) for establishing fingerprinting database is a widely utilized method in indoor localization. However, the time-variant problem for indoor positioning systems is not well-investigated in existing literature. Compared to conventional static fingerprinting, the dynamicallyreconstructed database can adapt to a highly-changing environment, which achieves sustainability of localization accuracy. To deal with the time-varying issue, we propose a skeleton-assisted learning-based clustering localization (SALC) system, including RSS-oriented map-assisted clustering (ROMAC), cluster-based online database establishment (CODE), and cluster-scaled location estimation (CsLE). The SALC scheme jointly considers similarities from the skeleton-based shortest path (SSP) and the time-varying RSS measurements across the reference points (RPs). ROMAC clusters RPs into different feature sets and therefore selects suitable monitor points (MPs) for enhancing location estimation. Moreover, the CODE algorithm aims for establishing adaptive fingerprint database to alleviate the timevarying problem. Finally, CsLE is adopted to acquire the target position by leveraging the benefits of clustering information and estimated signal variations in order to rescale the weights fromweighted k-nearest neighbors (WkNN) method. Both simulation and experimental results demonstrate that the proposed SALC system can effectively reconstruct the fingerprint database with an enhanced location estimation accuracy, which outperforms the other existing schemes in the open literature.
摘要
sans serif;">无线内部位置系统在过去几年内吸引了广泛的关注。使用WiFi接入点(AP)获得的接收信号强度(RSS)来建立指本库是内部位置系统中广泛使用的方法。然而,现有文献中对indoor位置系统中的时间变化问题的研究不够。相比于传统的静止指本,动态重建库可以适应高度变化的环境,实现地位测定精度的持续性。为解决时间变化问题,我们提议一种骨架协助学习基于扩展的分布式位置估计系统(SALC),包括RSS导向的地图帮助分组(ROMAC)、群集基本在线数据建立(CODE)和群集缩放位置估计(CsLE)。SALC方案同时考虑骨架基于最短路(SSP)的相似性和时间变化的RSS测量值 across reference points(RPs)。ROMAC将RPs分为不同的特征集并因此选择了改进地位估计的适用点(MPs)。此外,CODE算法目的是建立适应时间变化的指本库,以解决时间变化问题。最后,CsLE方法使用分组信息和估计信号变化来重新衡量weighted k-nearest neighbors(WkNN)方法中的权重,以实现更高的地位估计精度。在实验和 simulations中,我们发现,提议的SALC系统可以更好地重建指本库,并在开 literature中的其他方案中表现出更高的地位估计精度。
DistTGL: Distributed Memory-Based Temporal Graph Neural Network Training
results: 在实验中,DistTGL 实现了近线性的速度增长,相比单机方法,准确率提高 14.5%,训练 durchput 提高 10.17倍。Abstract
Memory-based Temporal Graph Neural Networks are powerful tools in dynamic graph representation learning and have demonstrated superior performance in many real-world applications. However, their node memory favors smaller batch sizes to capture more dependencies in graph events and needs to be maintained synchronously across all trainers. As a result, existing frameworks suffer from accuracy loss when scaling to multiple GPUs. Evenworse, the tremendous overhead to synchronize the node memory make it impractical to be deployed to distributed GPU clusters. In this work, we propose DistTGL -- an efficient and scalable solution to train memory-based TGNNs on distributed GPU clusters. DistTGL has three improvements over existing solutions: an enhanced TGNN model, a novel training algorithm, and an optimized system. In experiments, DistTGL achieves near-linear convergence speedup, outperforming state-of-the-art single-machine method by 14.5% in accuracy and 10.17x in training throughput.
摘要
<> translate "Memory-based Temporal Graph Neural Networks are powerful tools in dynamic graph representation learning and have demonstrated superior performance in many real-world applications. However, their node memory favors smaller batch sizes to capture more dependencies in graph events and needs to be maintained synchronously across all trainers. As a result, existing frameworks suffer from accuracy loss when scaling to multiple GPUs. Even worse, the tremendous overhead to synchronize the node memory make it impractical to be deployed to distributed GPU clusters. In this work, we propose DistTGL -- an efficient and scalable solution to train memory-based TGNNs on distributed GPU clusters. DistTGL has three improvements over existing solutions: an enhanced TGNN model, a novel training algorithm, and an optimized system. In experiments, DistTGL achieves near-linear convergence speedup, outperforming state-of-the-art single-machine method by 14.5% in accuracy and 10.17x in training throughput." into Simplified Chinese.<>室内Memery-based Temporal Graph Neural Networks是动态图表示学习中的 poderful工具,在多个实际应用中表现出了superior的性能。然而,它们的节点记忆偏好 smaller batch size以捕捉更多的图事件依赖关系,并需要在所有训练器上同步保持。因此,现有的框架会导致精度损失when scaling to multiple GPUs。worse, synchronizing the node memory leads to significant overhead, making it impractical to deploy to distributed GPU clusters.在这种情况下,我们提出了DistTGL——一种高效可扩展的解决方案,用于在分布式GPU集群上训练 memory-based TGNNs。DistTGL有三个改进:一种改进的TGNN模型,一种新的训练算法,以及一种优化的系统。在实验中,DistTGL实现了近线性的速度增长,相比单机方法的14.5%的精度提升和10.17x的训练吞吐量。
Exploring Link Prediction over Hyper-Relational Temporal Knowledge Graphs Enhanced with Time-Invariant Relational Knowledge
results: 实验结果表明,作者的模型在HTKG连接预测任务上显著超过了之前相关方法,并且可以通过同时利用时间不变的关系知识和时间信息来进一步提高表现。Abstract
Stemming from traditional knowledge graphs (KGs), hyper-relational KGs (HKGs) provide additional key-value pairs (i.e., qualifiers) for each KG fact that help to better restrict the fact validity. In recent years, there has been an increasing interest in studying graph reasoning over HKGs. In the meantime, due to the ever-evolving nature of world knowledge, extensive parallel works have been focusing on reasoning over temporal KGs (TKGs), where each TKG fact can be viewed as a KG fact coupled with a timestamp (or time period) specifying its time validity. The existing HKG reasoning approaches do not consider temporal information because it is not explicitly specified in previous benchmark datasets. Besides, all the previous TKG reasoning methods only lay emphasis on temporal reasoning and have no way to learn from qualifiers. To this end, we aim to fill the gap between TKG reasoning and HKG reasoning. We develop two new benchmark hyper-relational TKG (HTKG) datasets, i.e., Wiki-hy and YAGO-hy, and propose a HTKG reasoning model that efficiently models both temporal facts and qualifiers. We further exploit additional time-invariant relational knowledge from the Wikidata knowledge base and study its effectiveness in HTKG reasoning. Time-invariant relational knowledge serves as the knowledge that remains unchanged in time (e.g., Sasha Obama is the child of Barack Obama), and it has never been fully explored in previous TKG reasoning benchmarks and approaches. Experimental results show that our model substantially outperforms previous related methods on HTKG link prediction and can be enhanced by jointly leveraging both temporal and time-invariant relational knowledge.
摘要
traditional知识 graphs (KGs)的核心思想,hyper-relational知识 graphs (HKGs)提供每个KG事实的额外键值对(即资格),以更好地限定事实的有效性。近年来,研究图像理解在HKGs上有增加的兴趣。同时,由于世界知识的演化性,广泛的平行工作在图像理解过程中强调时间因素。现有的HKG理解方法不考虑时间信息,而且所有以前的TKG理解方法只是强调时间理解,没有考虑资格。为了填补这一空白,我们的目标是将HKG理解和TKG理解联系起来。我们开发了两个新的Benchmark hyper-relational TKG(HTKG)数据集,即Wiki-hy和YAGO-hy,并提出了一种HTKG理解模型,该模型能够有效地处理时间因素和资格。此外,我们还利用Wikidata知识库中的时间不变的关系知识,并研究其在HTKG理解中的效果。时间不变的关系知识是指不会随着时间的变化(例如萨沙·奥巴马是巴拉克·奥巴马的孩子),这种知识从未在过去的TKG理解benchmark和方法中被完全探索。实验结果表明,我们的模型在HTKG链接预测任务上显著超越了相关方法,并且可以通过同时利用时间因素和时间不变的关系知识来进一步提高性能。
Towards Model-Size Agnostic, Compute-Free, Memorization-based Inference of Deep Learning
results: 相比较现有的计算在内存(CIM)方法,MBI在MNIST字符识别任务上提高了能效率,相对于多层感知(MLP)-CIM和ResNet20-CIM方法,MBI的能效率提高了大约2.7倍和83倍Here’s the full translation of the abstract in Simplified Chinese:随着深度神经网络的快速发展,它们在各种任务上的表现得到了大幅提高,如图像和语音识别等。然而,随着模型的复杂度增加,计算成本和参数数量也随之增加,使得在资源受限设备上部署这些模型变得更加困难。本文提出了一种新的记忆搜索(MBI)方法,它具有计算免卷和只需查找的特点。通过缓存中存储键值对来实现计算免卷的推理。我们利用了隐藏向量来组合多个扫描结果,以实现问题的总分类输出。通过bayesian优化和归一化,减少了必要的查找数量,提高了准确率。此外,我们还提出了内存计算电路来快速查找输入查询匹配的关键vector。相比较现有的计算在内存(CIM)方法,MBI在MNIST字符识别任务上提高了能效率,相对于多层感知(MLP)-CIM和ResNet20-CIM方法,MBI的能效率提高了大约2.7倍和83倍。Abstract
The rapid advancement of deep neural networks has significantly improved various tasks, such as image and speech recognition. However, as the complexity of these models increases, so does the computational cost and the number of parameters, making it difficult to deploy them on resource-constrained devices. This paper proposes a novel memorization-based inference (MBI) that is compute free and only requires lookups. Specifically, our work capitalizes on the inference mechanism of the recurrent attention model (RAM), where only a small window of input domain (glimpse) is processed in a one time step, and the outputs from multiple glimpses are combined through a hidden vector to determine the overall classification output of the problem. By leveraging the low-dimensionality of glimpse, our inference procedure stores key value pairs comprising of glimpse location, patch vector, etc. in a table. The computations are obviated during inference by utilizing the table to read out key-value pairs and performing compute-free inference by memorization. By exploiting Bayesian optimization and clustering, the necessary lookups are reduced, and accuracy is improved. We also present in-memory computing circuits to quickly look up the matching key vector to an input query. Compared to competitive compute-in-memory (CIM) approaches, MBI improves energy efficiency by almost 2.7 times than multilayer perceptions (MLP)-CIM and by almost 83 times than ResNet20-CIM for MNIST character recognition.
摘要
深度神经网络的快速进步大大提高了各种任务,如图像和语音识别。然而,随着模型的复杂度增加,计算成本和参数数量也在增加,使得在有限资源的设备上部署变得困难。这篇论文提出了一种新的记忆化推理(MBI),它是计算免的,只需要lookups。我们的工作利用回卷注意力模型(RAM)的推理机制,只处理一次步骤中的小窗口输入领域(印象),并将多个印象的输出组合到一个隐藏向量中,以确定问题的总分类输出。我们利用印象的低维度,将推理过程中的关键值对存储在一个表中。在推理过程中,通过利用表来读取关键值对和计算免的推理。通过对搜索和分区进行优化,减少了必要的lookups,提高了准确率。我们还提出了内存计算电路,快速查找输入查询对应的匹配键向量。与与计算在内存(CIM)方法相比,MBI提高了能效率,相对于多层感知(MLP)-CIM的2.7倍,相对于ResNet20-CIM的83倍。
Generalizable Embeddings with Cross-batch Metric Learning
results: 本文在4个深度度量学benchmark上验证了这种方法的效果,并达到了比较好的结果。In English, this means:
for: The paper studies the Global Average Pooling (GAP) component in deep metric learning (DML) and how it can better capture Semantic Entity.
methods: The paper uses learnable prototypes to represent GAP, and shows that this method can be reliably learned across different batches.
results: The paper verifies the effectiveness of this method on four popular DML benchmarks, achieving good results.Abstract
Global average pooling (GAP) is a popular component in deep metric learning (DML) for aggregating features. Its effectiveness is often attributed to treating each feature vector as a distinct semantic entity and GAP as a combination of them. Albeit substantiated, such an explanation's algorithmic implications to learn generalizable entities to represent unseen classes, a crucial DML goal, remain unclear. To address this, we formulate GAP as a convex combination of learnable prototypes. We then show that the prototype learning can be expressed as a recursive process fitting a linear predictor to a batch of samples. Building on that perspective, we consider two batches of disjoint classes at each iteration and regularize the learning by expressing the samples of a batch with the prototypes that are fitted to the other batch. We validate our approach on 4 popular DML benchmarks.
摘要
全球平均池化(GAP)是深度度量学(DML)中常用的一个组件,用于Feature集合。其效果通常被归结到对每个特征向量视为不同的semantic实体,并将GAP视为它们的组合。虽然这种解释得到了证明,但是它的算法逻辑来学习可 generalized Entities来表示未经看过的类,深度度量学的重要目标,仍然不清楚。为此,我们将GAP表示为可学习的原型的吞合权重的 convex combination。我们然后证明了这种原型学习可以表示为一个递归过程,对一个批处理样本适应一个线性预测器。从这个角度出发,我们考虑了两个不同的批处理,并在每个迭代阶段对学习进行正则化,使用这些批处理中的样本表示另一个批处理中的原型。我们验证了我们的方法在4个深度度量学标准测试集上。
Efficiently Factorizing Boolean Matrices using Proximal Gradient Descent
paper_authors: Sebastian Dalleiger, Jilles Vreeken
for: addresses the interpretability problem of NMF on Boolean data
methods: uses Boolean algebra to decompose the input into low-rank Boolean factor matrices, with a novel elastic-binary regularizer and proximal gradient algorithm
results: demonstrates good performance in practice, with quick convergence, precise recovery of ground truth, and exact estimation of simulated rank; improves upon the state of the art in recall, loss, and runtime, and provides easily interpretable and semantically meaningful results on real-world data.Here’s the full text in Simplified Chinese:
for: addresses the interpretability problem of NMF on Boolean data
results: 通过广泛的实验表明,我们的方法在实际中工作良好:在 sintetic 数据上,它快速收敛,准确地回归真实值,并且正确地估算预设的rank; 在实际数据上,它超越了现有的状态,在回归、损失和运行时间上均有所提高,并且一个医疗领域的案例研究表明,我们的结果易于理解和具有Semantically Meaningful。Abstract
Addressing the interpretability problem of NMF on Boolean data, Boolean Matrix Factorization (BMF) uses Boolean algebra to decompose the input into low-rank Boolean factor matrices. These matrices are highly interpretable and very useful in practice, but they come at the high computational cost of solving an NP-hard combinatorial optimization problem. To reduce the computational burden, we propose to relax BMF continuously using a novel elastic-binary regularizer, from which we derive a proximal gradient algorithm. Through an extensive set of experiments, we demonstrate that our method works well in practice: On synthetic data, we show that it converges quickly, recovers the ground truth precisely, and estimates the simulated rank exactly. On real-world data, we improve upon the state of the art in recall, loss, and runtime, and a case study from the medical domain confirms that our results are easily interpretable and semantically meaningful.
摘要
Translated into Simplified Chinese:Addressing the interpretability problem of NMF on Boolean data, Boolean Matrix Factorization (BMF) uses Boolean algebra to decompose the input into low-rank Boolean factor matrices. These matrices are highly interpretable and very useful in practice, but they come at the high computational cost of solving an NP-hard combinatorial optimization problem. To reduce the computational burden, we propose to relax BMF continuously using a novel elastic-binary regularizer, from which we derive a proximal gradient algorithm. Through an extensive set of experiments, we demonstrate that our method works well in practice: On synthetic data, we show that it converges quickly, recovers the ground truth precisely, and estimates the simulated rank exactly. On real-world data, we improve upon the state of the art in recall, loss, and runtime, and a case study from the medical domain confirms that our results are easily interpretable and semantically meaningful.Note: Please note that the translation is in Simplified Chinese, which is one of the two standard versions of Chinese used in mainland China and Singapore. If you need Traditional Chinese, please let me know.
Towards Generalizable Detection of Urgency of Discussion Forum Posts
results: 使用支持向量回归算法和Universal Sentence Encoder嵌入式,实现了对讨论区帖子的优先级预测,可以帮助 instructor 更好地利用时间,提高学生学习质量Abstract
Students who take an online course, such as a MOOC, use the course's discussion forum to ask questions or reach out to instructors when encountering an issue. However, reading and responding to students' questions is difficult to scale because of the time needed to consider each message. As a result, critical issues may be left unresolved, and students may lose the motivation to continue in the course. To help address this problem, we build predictive models that automatically determine the urgency of each forum post, so that these posts can be brought to instructors' attention. This paper goes beyond previous work by predicting not just a binary decision cut-off but a post's level of urgency on a 7-point scale. First, we train and cross-validate several models on an original data set of 3,503 posts from MOOCs at University of Pennsylvania. Second, to determine the generalizability of our models, we test their performance on a separate, previously published data set of 29,604 posts from MOOCs at Stanford University. While the previous work on post urgency used only one data set, we evaluated the prediction across different data sets and courses. The best-performing model was a support vector regressor trained on the Universal Sentence Encoder embeddings of the posts, achieving an RMSE of 1.1 on the training set and 1.4 on the test set. Understanding the urgency of forum posts enables instructors to focus their time more effectively and, as a result, better support student learning.
摘要
在线学习者,如MOOC课程的学生,通常会使用课程的讨论 форуم来提问或与教师联系,当遇到问题时。然而,为了考虑每条消息,评估每个消息的时间成本很高,因此可能会有重要的问题被忽略。为了解决这个问题,我们构建了预测模型,以自动确定讨论 форум的优先级,以便将这些消息引导给教师的注意。这篇论文超过了之前的工作,不仅预测了一个二分类决策阈值,而且预测了每条消息的优先级水平,从1到7的7个级别。首先,我们训练和十分之检验了多种模型,使用大学 Pennsylvania的MOOC课程的原始数据集3,503条消息。其次,为了证明我们的模型的一致性,我们测试了它们的性能在另一个,已经发表的数据集29,604条消息中。而之前的帖子优先级预测工作只使用了一个数据集,我们在不同的数据集和课程之间评估预测。最佳性能的模型是使用Universe Sentence Encoder嵌入的支持向量回归模型,在训练集上的RMSE为1.1,测试集上的RMSE为1.4。了解讨论 форум中的帖子优先级,可以帮助教师更有效地利用时间,从而更好地支持学生学习。
First-order Methods for Affinely Constrained Composite Non-convex Non-smooth Problems: Lower Complexity Bound and Near-optimal Methods
results: 本论文首次为 composite non-convex non-smooth 优化问题提供了lower complexity bound,并采用了一种名为减小距离梯度(IPG)方法来实现这个目标。该方法具有 oracle complexity 与 lower bound 几乎相同的性质。Abstract
Many recent studies on first-order methods (FOMs) focus on \emph{composite non-convex non-smooth} optimization with linear and/or nonlinear function constraints. Upper (or worst-case) complexity bounds have been established for these methods. However, little can be claimed about their optimality as no lower bound is known, except for a few special \emph{smooth non-convex} cases. In this paper, we make the first attempt to establish lower complexity bounds of FOMs for solving a class of composite non-convex non-smooth optimization with linear constraints. Assuming two different first-order oracles, we establish lower complexity bounds of FOMs to produce a (near) $\epsilon$-stationary point of a problem (and its reformulation) in the considered problem class, for any given tolerance $\epsilon>0$. In addition, we present an inexact proximal gradient (IPG) method by using the more relaxed one of the two assumed first-order oracles. The oracle complexity of the proposed IPG, to find a (near) $\epsilon$-stationary point of the considered problem and its reformulation, matches our established lower bounds up to a logarithmic factor. Therefore, our lower complexity bounds and the proposed IPG method are almost non-improvable.
摘要
很多最近的研究对于首项方法(FOMs)强调 composite non-convex non-smooth 优化问题,包括线性和/或非线性函数约束。然而,对于这些方法的优化性没有很多研究,只有一些特殊的平滑非几何优化问题例外。在这篇论文中,我们首次尝试确定 FOMs 对于解决 composite non-convex non-smooth 优化问题的类型的下界复杂度。我们假设两种不同的首项或acles,并确定 FOMs 的下界复杂度,以便在任何给定的 tolerance ε > 0 下,生成 (near) ε-站点。此外,我们还提出了一种不准确的 proximal Gradient(IPG)方法,使用更松的一个首项或acles。我们的 IPG 方法的 oracle 复杂度与我们确定的下界复杂度几乎相同,只有一个对数性logarithmic factor。因此,我们的下界复杂度和提出的 IPG 方法在优化性方面几乎不可改进。
Smooth Lower Bounds for Differentially Private Algorithms via Padding-and-Permuting Fingerprinting Codes
results: 这个论文提供了新的下界在不同的设置下,包括DP averaging、approximate k-means clustering和DP subspace estimation等。这些下界是基于一种新的指纹lemmata,它比之前的指纹lemmata更加强大,并且可以直接从lemmata来证明下界。Abstract
Fingerprinting arguments, first introduced by Bun, Ullman, and Vadhan (STOC 2014), are the most widely used method for establishing lower bounds on the sample complexity or error of approximately differentially private (DP) algorithms. Still, there are many problems in differential privacy for which we don't know suitable lower bounds, and even for problems that we do, the lower bounds are not smooth, and usually become vacuous when the error is larger than some threshold. In this work, we present a simple method to generate hard instances by applying a padding-and-permuting transformation to a fingerprinting code. We illustrate the applicability of this method by providing new lower bounds in various settings: 1. A tight lower bound for DP averaging in the low-accuracy regime, which in particular implies a new lower bound for the private 1-cluster problem introduced by Nissim, Stemmer, and Vadhan (PODS 2016). 2. A lower bound on the additive error of DP algorithms for approximate k-means clustering, as a function of the multiplicative error, which is tight for a constant multiplication error. 3. A lower bound for estimating the top singular vector of a matrix under DP in low-accuracy regimes, which is a special case of DP subspace estimation studied by Singhal and Steinke (NeurIPS 2021). Our main technique is to apply a padding-and-permuting transformation to a fingerprinting code. However, rather than proving our results using a black-box access to an existing fingerprinting code (e.g., Tardos' code), we develop a new fingerprinting lemma that is stronger than those of Dwork et al. (FOCS 2015) and Bun et al. (SODA 2017), and prove our lower bounds directly from the lemma. Our lemma, in particular, gives a simpler fingerprinting code construction with optimal rate (up to polylogarithmic factors) that is of independent interest.
摘要
“指纹Argument”,最早由布恩、奥尔曼和 вадан(STOC 2014)引入,是最广泛使用的方法来确定下界或错误率的约 differentially private(DP)算法的下界。然而,有很多 differential privacy 问题,我们还没有知道合适的下界,而且甚至对已知的问题,下界不是平滑的,通常在误差大于某个阈值时变得无效。在这项工作中,我们提出了一种简单的方法,通过对指纹编码进行补充和排序转换来生成困难实例。我们通过以下几个方面证明了这种方法的应用性:1. 对DP抽象平均在低精度 régime中的下界,具体是对Nissim、Stemmer和 вадан(PODS 2016)所引入的私人1-集问题的新下界。2. DP算法对 Approximate k-means 集群化的添加性误差下界,其中multiplicative error是常数多项式的。3. DP算法对矩阵 top singular vector 的估计在低精度 régime中的下界,这是特殊的DP subspace estimation问题,与Singhal和Steinke(NeurIPS 2021)的研究相关。我们的主要技巧是对指纹编码进行补充和排序转换。而不是通过黑盒访问现有的指纹编码(例如Tardos的代码)来证明我们的结果(例如Dwork等人(FOCS 2015)和布恩等人(SODA 2017)的结果),我们开发了一个新的指纹 lemmatheorem,该lemmatheorem是Dwork等人(FOCS 2015)和布恩等人(SODA 2017)的lemmatheorem更强,并直接从lemmatheorem prove我们的下界。具体来说,我们的lemmatheorem提供了一种更简单的指纹编码建构,具有最佳率(即polylogarithmic factor),这是独立有价值的。
Training Discrete Energy-Based Models with Energy Discrepancy
results: 研究人员通过对三种扰动过程(bernoulli噪声、杜特推论变换和邻域结构)的性能进行比较,并在离散链模型、二进制 sintetic 数据和离散图像数据集上进行了实验,证明了ED的效果。Abstract
Training energy-based models (EBMs) on discrete spaces is challenging because sampling over such spaces can be difficult. We propose to train discrete EBMs with energy discrepancy (ED), a novel type of contrastive loss functional which only requires the evaluation of the energy function at data points and their perturbed counter parts, thus not relying on sampling strategies like Markov chain Monte Carlo (MCMC). Energy discrepancy offers theoretical guarantees for a broad class of perturbation processes of which we investigate three types: perturbations based on Bernoulli noise, based on deterministic transforms, and based on neighbourhood structures. We demonstrate their relative performance on lattice Ising models, binary synthetic data, and discrete image data sets.
摘要
培训能量基于模型(EBM)在极性空间上是具有挑战性的,因为抽样这些空间可能困难。我们提议使用能量差(ED),一种新的对比损失函数,只需评估能量函数在数据点和其扰动版本之间,因此不需要采用样本策略如Markov链 Monte Carlo(MCMC)。能量差提供了对广泛类型扰动过程的理论保证,我们investigate三种类型的扰动过程:基于 Bernoulli 噪声、基于 deterministic transforms 和基于 neighbor structure。我们在邻居 Ising 模型、二进制 synthetic 数据和极性图像数据集上证明了它们的相对性能。
A Quantitative Approach to Predicting Representational Learning and Performance in Neural Networks
results: 研究人员使用了一个简单的测试案例,然后使用该工具对一个关于表示学习对顺序单任务和并行多任务性能的问题进行预测。结果显示,该工具可以预测表示学习的规模初始化和训练课程对下游同时多任务性能的影响。Abstract
A key property of neural networks (both biological and artificial) is how they learn to represent and manipulate input information in order to solve a task. Different types of representations may be suited to different types of tasks, making identifying and understanding learned representations a critical part of understanding and designing useful networks. In this paper, we introduce a new pseudo-kernel based tool for analyzing and predicting learned representations, based only on the initial conditions of the network and the training curriculum. We validate the method on a simple test case, before demonstrating its use on a question about the effects of representational learning on sequential single versus concurrent multitask performance. We show that our method can be used to predict the effects of the scale of weight initialization and training curriculum on representational learning and downstream concurrent multitasking performance.
摘要
neuronal networks(生物和人工的)的一个关键性能是如何学习表示和处理输入信息以解决任务。不同类型的表示可能适用于不同类型的任务,因此识别和理解学习的表示是设计有用网络的关键部分。在这篇论文中,我们介绍了一种新的 Pseudo-kernel 基于工具 для分析和预测学习的表示,只基于网络的初始条件和训练课程。我们验证了这种方法在一个简单的测试场景中,然后示cases the use of this method to predict the effects of representational learning on sequential single versus concurrent multitask performance. We show that our method can be used to predict the effects of the scale of weight initialization and training curriculum on representational learning and downstream concurrent multitasking performance.Here's the translation in Traditional Chinese: neuronal networks(生物和人工的)的一个关键性能是如何学习表示和处理输入信息以解决任务。不同的类型的表示可能适用于不同的任务,因此识别和理解学习的表示是设计有用网络的关键部分。在这篇论文中,我们介绍了一种新的 Pseudo-kernel 基于工具 для分析和预测学习的表示,只基于网络的初始条件和训练课程。我们验证了这种方法在一个简单的测试场景中,然后示cases the use of this method to predict the effects of representational learning on sequential single versus concurrent multitask performance. We show that our method can be used to predict the effects of the scale of weight initialization and training curriculum on representational learning and downstream concurrent multitasking performance.
Harpa: High-Rate Phase Association with Travel Time Neural Fields
results: 这个论文表明可以在高速度下进行相关性分组,并且可以 efficiently处理不确定的波速。 numercial experiments表明,\harpa可以 efficiently associates high-rate seismicity clouds over complex, unknown wave speeds and graciously handles noisy and missing picks.Abstract
Phase association groups seismic wave arrivals according to their originating earthquakes. It is a fundamental task in a seismic data processing pipeline, but challenging to perform for smaller, high-rate seismic events which carry fundamental information about earthquake dynamics, especially with a commonly assumed inaccurate wave speed model. As a consequence, most association methods focus on larger events that occur at a lower rate and are thus easier to associate, even though microseismicity provides a valuable description of the elastic medium properties in the subsurface. In this paper, we show that association is possible at rates much higher than previously reported even when the wave speed is unknown. We propose Harpa, a high-rate seismic phase association method which leverages deep neural fields to build generative models of wave speeds and associated travel times, and first solves a joint spatio--temporal source localization and wave speed recovery problem, followed by association. We obviate the need for associated phases by interpreting arrival time data as probability measures and using an optimal transport loss to enforce data fidelity. The joint recovery problem is known to admit a unique solution under certain conditions but due to the non-convexity of the corresponding loss a simple gradient scheme converges to poor local minima. We show that this is effectively mitigated by stochastic gradient Langevin dynamics (SGLD). Numerical experiments show that \harpa~efficiently associates high-rate seismicity clouds over complex, unknown wave speeds and graciously handles noisy and missing picks.
摘要
phasic association groups seismic wave arrivals based on their originating earthquakes. It is a fundamental task in a seismic data processing pipeline, but challenging to perform for smaller, high-rate seismic events which carry fundamental information about earthquake dynamics, especially with a commonly assumed inaccurate wave speed model. As a consequence, most association methods focus on larger events that occur at a lower rate and are thus easier to associate, even though microseismicity provides a valuable description of the elastic medium properties in the subsurface. In this paper, we show that association is possible at rates much higher than previously reported even when the wave speed is unknown. We propose Harpa, a high-rate seismic phase association method which leverages deep neural fields to build generative models of wave speeds and associated travel times, and first solves a joint spatio--temporal source localization and wave speed recovery problem, followed by association. We obviate the need for associated phases by interpreting arrival time data as probability measures and using an optimal transport loss to enforce data fidelity. The joint recovery problem is known to admit a unique solution under certain conditions but due to the non-convexity of the corresponding loss a simple gradient scheme converges to poor local minima. We show that this is effectively mitigated by stochastic gradient Langevin dynamics (SGLD). Numerical experiments show that \harpa~efficiently associates high-rate seismicity clouds over complex, unknown wave speeds and graciously handles noisy and missing picks.
results: 这篇论文通过使用 Variational Prediction 技术,可以提供良好的预测分布,而无需在测试时进行 marginalization 成本。Abstract
Bayesian inference offers benefits over maximum likelihood, but it also comes with computational costs. Computing the posterior is typically intractable, as is marginalizing that posterior to form the posterior predictive distribution. In this paper, we present variational prediction, a technique for directly learning a variational approximation to the posterior predictive distribution using a variational bound. This approach can provide good predictive distributions without test time marginalization costs. We demonstrate Variational Prediction on an illustrative toy example.
摘要
Note:* "Bayesian inference" bayesian inference (悖论推理)* "maximum likelihood" maximum likelihood (最大可能性)* "posterior" posterior (后期)* "posterior predictive distribution" posterior predictive distribution (后期预测分布)* "variational bound" variational bound (可能性范围)* "variational prediction" variational prediction (可能性预测)
Reconstruction of 3-Axis Seismocardiogram from Right-to-left and Head-to-foot Components Using A Long Short-Term Memory Network
results: 研究获得了一个LSTM网络,可以将一个心脏周期中的100个时间步骤的SCG信号转换为dorsoventral方向的SCG信号,mean square error为0.09。这项研究显示了深度学习模型可以将 dual-axis加速计读取的数据转换为三轴SCG信号。Abstract
This pilot study aims to develop a deep learning model for predicting seismocardiogram (SCG) signals in the dorsoventral direction from the SCG signals in the right-to-left and head-to-foot directions ($\textrm{SCG}_x$ and $\textrm{SCG}_y$). The dataset used for the training and validation of the model was obtained from 15 healthy adult subjects. The SCG signals were recorded using tri-axial accelerometers placed on the chest of each subject. The signals were then segmented using electrocardiogram R waves, and the segments were downsampled, normalized, and centered around zero. The resulting dataset was used to train and validate a long short-term memory (LSTM) network with two layers and a dropout layer to prevent overfitting. The network took as input 100-time steps of $\textrm{SCG}_x$ and $\textrm{SCG}_y$, representing one cardiac cycle, and outputted a vector that mapped to the target variable being predicted. The results showed that the LSTM model had a mean square error of 0.09 between the predicted and actual SCG segments in the dorsoventral direction. The study demonstrates the potential of deep learning models for reconstructing 3-axis SCG signals using the data obtained from dual-axis accelerometers.
摘要
Here's the translation in Simplified Chinese:这项试验旨在开发一个深度学习模型,用于预测心电幂量信号(SCG)的dorsoventral方向。试验使用15名健康成人的SCG信号,通过三轴加速度计记录在胸部。信号被电cardiogram R波分割,下amples, норmalize和减少中心在零点。结果显示,使用LSTM网络(两层)和dropout层预防过拟合。网络输入100个时间步长的$SCG_x$和$SCG_y$,表示一个心脏频率征,输出一个向量,将目标变量映射到。结果显示LSTM模型与实际SCG段的平均方差为0.09。这项研究表明,深度学习模型可以使用双轴加速度计记录的数据来重建3轴SCG信号。
results: 该论文通过训练多种应用场景中的强大、稳定、可解释的探测器,达到了与当前状态艺术法的竞争性性能。Abstract
The monotonic dependence of the outputs of a neural network on some of its inputs is a crucial inductive bias in many scenarios where domain knowledge dictates such behavior. This is especially important for interpretability and fairness considerations. In a broader context, scenarios in which monotonicity is important can be found in finance, medicine, physics, and other disciplines. It is thus desirable to build neural network architectures that implement this inductive bias provably. In this work, we propose a weight-constrained architecture with a single residual connection to achieve exact monotonic dependence in any subset of the inputs. The weight constraint scheme directly controls the Lipschitz constant of the neural network and thus provides the additional benefit of robustness. Compared to currently existing techniques used for monotonicity, our method is simpler in implementation and in theory foundations, has negligible computational overhead, is guaranteed to produce monotonic dependence, and is highly expressive. We show how the algorithm is used to train powerful, robust, and interpretable discriminators that achieve competitive performance compared to current state-of-the-art methods across various benchmarks, from social applications to the classification of the decays of subatomic particles produced at the CERN Large Hadron Collider.
摘要
很多情况下,神经网络的输出对某些输入的 monotonic dependence 是一种重要的推导假设。这种假设在解释性和公平性方面具有重要意义。在更广泛的上下文中, monotonicity 在金融、医学、物理和其他领域都具有重要的意义。因此,建立能够实现这种假设的神经网络架构是非常感兴趣的。在这种情况下,我们提出了一种带有单个差异连接的权重约束架构,可以实现任意输入子集的精确 monotonic dependence。这种约束方案直接控制神经网络的 Lipschitz 常数,从而提供了额外的robustness benefit。与现有的 monotonicity 实现技术相比,我们的方法更简单,更有理论基础,计算开销几乎可以忽略不计,可以保证 monotonic dependence,并且具有很高的表达能力。我们显示了如何使用这种算法来训练高效、Robust、可解释的分类器,在社会应用和辐射子粒子在 CERN 大弹性粒子加速器中的分类方面达到了竞争性的性能。
results: 该系统可以减少线aje graph的存储占用量,并自动将下游模型更新对应的上游模型的更新。Abstract
Models derived from other models are extremely common in machine learning (ML) today. For example, transfer learning is used to create task-specific models from "pre-trained" models through finetuning. This has led to an ecosystem where models are related to each other, sharing structure and often even parameter values. However, it is hard to manage these model derivatives: the storage overhead of storing all derived models quickly becomes onerous, prompting users to get rid of intermediate models that might be useful for further analysis. Additionally, undesired behaviors in models are hard to track down (e.g., is a bug inherited from an upstream model?). In this paper, we propose a model versioning and management system called MGit that makes it easier to store, test, update, and collaborate on model derivatives. MGit introduces a lineage graph that records provenance and versioning information between models, optimizations to efficiently store model parameters, as well as abstractions over this lineage graph that facilitate relevant testing, updating and collaboration functionality. MGit is able to reduce the lineage graph's storage footprint by up to 7x and automatically update downstream models in response to updates to upstream models.
摘要
现在机器学习(ML)中,基于其他模型 derivated 的模型非常常见。例如,通过 fine-tuning 来创建任务特定模型从 "预训练" 模型。这导致了一个模型之间相互关联,共享结构,甚至参数值的生态系统。然而,管理这些模型Derivative 很困难:存储所有 derivated 模型的存储开销很快就变得压力很大,让用户放弃 intermediate 模型,可能用于进一步分析。此外,模型中的不良行为困难跟踪(例如,一个 bug 是从上游模型继承吗?)。在这篇论文中,我们提出一个名为 MGit 的模型版本管理系统,使得更加容易存储、测试、更新和合作模型Derivative。MGit 引入了模型家族图,记录模型的 происхождение和版本信息,并且提供了 Parameters 的优化,以及基于这个家族图的抽象,使得更加方便地进行相关的测试、更新和合作功能。MGit 能够将模型家族图的存储占用量减少至最多 7 倍,并自动将下游模型更新响应上游模型的更新。
Brain Tumor Detection using Convolutional Neural Networks with Skip Connections
results: 结果显示,一些优化技术可以致使CNN模型在这个目标上表现出色Abstract
In this paper, we present different architectures of Convolutional Neural Networks (CNN) to analyze and classify the brain tumors into benign and malignant types using the Magnetic Resonance Imaging (MRI) technique. Different CNN architecture optimization techniques such as widening and deepening of the network and adding skip connections are applied to improve the accuracy of the network. Results show that a subset of these techniques can judiciously be used to outperform a baseline CNN model used for the same purpose.
摘要
在这篇论文中,我们介绍了不同类型的卷积神经网络(CNN)来分类大脑肿瘤为良性和有害两类使用Magnetic Resonance Imaging(MRI)技术。不同的CNN结构优化技术 such as 宽化和深化网络以及添加跳过连接被应用以提高网络的准确率。结果显示,一个子集这些技术可以有效地使用以超越基eline CNN模型。Here's the word-for-word translation:在这篇论文中,我们介绍了不同类型的卷积神经网络(CNN)来分类大脑肿瘤为良性和有害两类使用Magnetic Resonance Imaging(MRI)技术。不同的CNN结构优化技术such as 宽化和深化网络以及添加跳过连接被应用以提高网络的准确率。结果显示,一个子集这些技术可以有效地使用以超越基eline CNN模型。
Reinforcement Learning for Photonic Component Design
results: 该算法可以提高插入损耗从8.8dB降至3.24dB,并且可以生成具有150nm宽扩展带width的设计,其最低点loss不超过10.2dB。Abstract
We present a new fab-in-the-loop reinforcement learning algorithm for the design of nano-photonic components that accounts for the imperfections present in nanofabrication processes. As a demonstration of the potential of this technique, we apply it to the design of photonic crystal grating couplers (PhCGC) fabricated on a 220nm silicon on insulator (SOI) single etch platform. This fab-in-the-loop algorithm improves the insertion loss from 8.8 dB to 3.24 dB. The widest bandwidth designs produced using our fab-in-the-loop algorithm are able to cover a 150nm bandwidth with less than 10.2 dB of loss at their lowest point.
摘要
我们提出了一种新的 fab-in-the-loop 束缚学习算法,用于 nanophotonic 组件的设计,考虑到 nanofabrication 过程中存在的不确定性。作为这种技术的演示,我们应用它于 SOI 单刻平台上的 photonic crystal grating couplers (PhCGC) 的设计。这种 fab-in-the-loop 算法改善了插入损耗从 8.8 dB 降低至 3.24 dB。我们使用这种算法生成的设计可以覆盖 150nm 的频谱宽度,且损耗在最低点下不超过 10.2 dB。
PseudoCal: A Source-Free Approach to Unsupervised Uncertainty Calibration in Domain Adaptation
paper_authors: Dapeng Hu, Jian Liang, Xinchao Wang, Chuan-Sheng Foo for:This paper focuses on improving the calibration of predictive uncertainty in unsupervised domain adaptation (UDA) models, specifically in source-free UDA settings.methods:The proposed method, PseudoCal, relies exclusively on unlabeled target data to calibrate UDA models. It transforms the unsupervised calibration problem into a supervised one by generating a labeled pseudo-target set that captures the structure of the real target.results:Extensive experiments on 10 UDA methods show that PseudoCal consistently exhibits significantly reduced calibration error compared to existing calibration methods, both in traditional UDA settings and recent source-free UDA scenarios.Abstract
Unsupervised domain adaptation (UDA) has witnessed remarkable advancements in improving the accuracy of models for unlabeled target domains. However, the calibration of predictive uncertainty in the target domain, a crucial aspect of the safe deployment of UDA models, has received limited attention. The conventional in-domain calibration method, \textit{temperature scaling} (TempScal), encounters challenges due to domain distribution shifts and the absence of labeled target domain data. Recent approaches have employed importance-weighting techniques to estimate the target-optimal temperature based on re-weighted labeled source data. Nonetheless, these methods require source data and suffer from unreliable density estimates under severe domain shifts, rendering them unsuitable for source-free UDA settings. To overcome these limitations, we propose PseudoCal, a source-free calibration method that exclusively relies on unlabeled target data. Unlike previous approaches that treat UDA calibration as a \textit{covariate shift} problem, we consider it as an unsupervised calibration problem specific to the target domain. Motivated by the factorization of the negative log-likelihood (NLL) objective in TempScal, we generate a labeled pseudo-target set that captures the structure of the real target. By doing so, we transform the unsupervised calibration problem into a supervised one, enabling us to effectively address it using widely-used in-domain methods like TempScal. Finally, we thoroughly evaluate the calibration performance of PseudoCal by conducting extensive experiments on 10 UDA methods, considering both traditional UDA settings and recent source-free UDA scenarios. The experimental results consistently demonstrate the superior performance of PseudoCal, exhibiting significantly reduced calibration error compared to existing calibration methods.
摘要
Unsupervised domain adaptation (UDA) 技术在目标频道中的准确性方面做出了很多突出的进步,但是目标频道中的预测 uncertainty 的准确性却受到了有限的关注。传统的域内准则(TempScal)方法在域 Distribution 的转移和目标频道没有标注数据的情况下遇到了挑战。现有的方法使用重要性评估技术来估算目标频道优化的温度,但是这些方法需要源数据,而且在严重的域转移情况下,概率估计不可靠,因此不适用于源自由 UDA 设置。为了解决这些局限性,我们提出了 PseudoCal,一种源自由的准则调整方法,不需要源数据。与前期方法不同,我们将 UDA 准则调整视为目标频道特有的无监督准则调整问题,而不是 covariate shift 问题。受 TempScal 的负逻辑 log-likelihood(NLL) objective 的因子化启发,我们生成了一个 Pseudo-target 集,这个集合捕捉了真实target 的结构。通过这种方式,我们将无监督准则调整问题转化为监督的一个,可以使用现有的域内方法,如 TempScal,进行有效地处理。最后,我们进行了广泛的实验,评估了 10 种 UDA 方法,包括传统的 UDA 设置以及 recent source-free UDA 情况。实验结果表明,PseudoCal 的准则调整性能明显高于现有的准则调整方法,显示它在 calibration error 方面具有显著的优势。
DreamTeacher: Pretraining Image Backbones with Deep Generative Models
results: 我们进行了多种生成模型、精密预测benchmark和预训练策略的实验研究,并观察到我们的梦教师在所有自我超越现有自然语言处理方法。不需要手动标注,使用梦教师进行无监督图像预训练,可以获得显著改善。Abstract
In this work, we introduce a self-supervised feature representation learning framework DreamTeacher that utilizes generative networks for pre-training downstream image backbones. We propose to distill knowledge from a trained generative model into standard image backbones that have been well engineered for specific perception tasks. We investigate two types of knowledge distillation: 1) distilling learned generative features onto target image backbones as an alternative to pretraining these backbones on large labeled datasets such as ImageNet, and 2) distilling labels obtained from generative networks with task heads onto logits of target backbones. We perform extensive analyses on multiple generative models, dense prediction benchmarks, and several pre-training regimes. We empirically find that our DreamTeacher significantly outperforms existing self-supervised representation learning approaches across the board. Unsupervised ImageNet pre-training with DreamTeacher leads to significant improvements over ImageNet classification pre-training on downstream datasets, showcasing generative models, and diffusion generative models specifically, as a promising approach to representation learning on large, diverse datasets without requiring manual annotation.
摘要
“在这个研究中,我们介绍了一个自我超vised特征表示学习框架DreamTeacher,该框架利用生成网络进行预训练下游图像脑筋。我们提议通过将生成模型已经学习的特征知识融入到标准图像脑筋中来,而不是使用大量标注数据集如ImageNet进行预训练。我们研究了两种知识融入方法:1)将生成模型学习的特征知识直接融入目标图像脑筋中,2)将生成网络中的标签融入到目标脑筋的响应值中。我们对多种生成模型、粘密预测benchmark和预训练策略进行了广泛的分析。我们发现,我们的DreamTeacher在所有自我超vised表示学习方法之上表现出优异的成绩。不需要手动标注,使用DreamTeacher进行无监督ImageNet预训练可以在下游数据集上获得显著改进,特别是使用扩散生成模型。”
Population Expansion for Training Language Models with Private Federated Learning
paper_authors: Tatsuki Koga, Congzheng Song, Martin Pelikan, Mona Chitnis
for: 这个研究旨在提高 federated learning(FL) combined with differential privacy(DP)的机器学习(ML)训练效率和形式化隐私保证,尤其是在小型人口的情况下。
methods: 这个研究使用了域适应技术来扩展人口,以加快训练和提高最终模型质量。
results: 研究表明,使用这些技术可以提高模型的使用価价( Utility),在实际的语言模型化数据集上提高13%到30%。Abstract
Federated learning (FL) combined with differential privacy (DP) offers machine learning (ML) training with distributed devices and with a formal privacy guarantee. With a large population of devices, FL with DP produces a performant model in a timely manner. However, for applications with a smaller population, not only does the model utility degrade as the DP noise is inversely proportional to population, but also the training latency increases since waiting for enough clients to become available from a smaller pool is slower. In this work, we thus propose expanding the population based on domain adaptation techniques to speed up the training and improves the final model quality when training with small populations. We empirically demonstrate that our techniques can improve the utility by 13% to 30% on real-world language modeling datasets.
摘要
联合 federated learning (FL) 和差异隐私 (DP) 可以为分布式设备进行机器学习 (ML) 训练,并且提供正式的隐私保证。通过大量的设备人口,FL 与 DP 可以生成高性能的模型,但是在小规模应用中,模型的Utility 会逐渐下降,而且训练时间会增加,因为等待足够的客户端可用于训练的池子中 slower。为了解决这个问题,我们提议通过领域适应技术扩大人口,以加速训练和提高最终模型质量。我们经验表明,我们的技术可以提高实际语言模型集成的Utility 13% 到 30%。
Structured Pruning of Neural Networks for Constraints Learning
results: 实验结果显示,删除可以对多层 feed-forward neural networks 建立反例,并且可以实现很大的解决速度提高,而不会对最终决策的质量产生影响。Abstract
In recent years, the integration of Machine Learning (ML) models with Operation Research (OR) tools has gained popularity across diverse applications, including cancer treatment, algorithmic configuration, and chemical process optimization. In this domain, the combination of ML and OR often relies on representing the ML model output using Mixed Integer Programming (MIP) formulations. Numerous studies in the literature have developed such formulations for many ML predictors, with a particular emphasis on Artificial Neural Networks (ANNs) due to their significant interest in many applications. However, ANNs frequently contain a large number of parameters, resulting in MIP formulations that are impractical to solve, thereby impeding scalability. In fact, the ML community has already introduced several techniques to reduce the parameter count of ANNs without compromising their performance, since the substantial size of modern ANNs presents challenges for ML applications as it significantly impacts computational efforts during training and necessitates significant memory resources for storage. In this paper, we showcase the effectiveness of pruning, one of these techniques, when applied to ANNs prior to their integration into MIPs. By pruning the ANN, we achieve significant improvements in the speed of the solution process. We discuss why pruning is more suitable in this context compared to other ML compression techniques, and we identify the most appropriate pruning strategies. To highlight the potential of this approach, we conduct experiments using feed-forward neural networks with multiple layers to construct adversarial examples. Our results demonstrate that pruning offers remarkable reductions in solution times without hindering the quality of the final decision, enabling the resolution of previously unsolvable instances.
摘要
近年来,机器学习(ML)模型与运筹学(OR)工具的集成在多种应用中得到了普遍的推广,包括肿瘤治疗、算法配置和化学过程优化。在这个领域,ML和OR的结合经常通过使用混合整数编程(MIP)表述来实现。文献中有很多研究对多种ML预测器进行了MIP表述,特别是人工神经网络(ANNs),因为它们在许多应用中具有极高的 интерес。然而,ANNs经常具有很多参数,导致MIP表述变得不实现,从而降低了扩展性。事实上,ML社区已经开发出了许多技术来减少ANNs中参数的数量,以避免降低性能。在这篇论文中,我们展示了采用剪枝(pruning)这一技术可以在ANNs之前进行剪枝,从而实现显著提高解决速度的效果。我们解释了为什么剪枝在这个上下文中比其他ML压缩技术更适用,并标识了最佳剪枝策略。为了强调这种方法的潜力,我们通过使用多层感知网络构建了反对抗例。我们的结果表明,剪枝可以很有效地减少解决时间,而无需妨碍最终决策的质量,从而解决了之前不可解决的实例。
Generative adversarial networks for data-scarce spectral applications
results: 研究发现,使用 CWGAN 进行数据增强,可以提高 FFNN 的表现,特别是在有限数据情况下。此外,CWGAN 可以作为低数据情况下的代理模型,表现较好。Abstract
Generative adversarial networks (GANs) are one of the most robust and versatile techniques in the field of generative artificial intelligence. In this work, we report on an application of GANs in the domain of synthetic spectral data generation, offering a solution to the scarcity of data found in various scientific contexts. We demonstrate the proposed approach by applying it to an illustrative problem within the realm of near-field radiative heat transfer involving a multilayered hyperbolic metamaterial. We find that a successful generation of spectral data requires two modifications to conventional GANs: (i) the introduction of Wasserstein GANs (WGANs) to avoid mode collapse, and, (ii) the conditioning of WGANs to obtain accurate labels for the generated data. We show that a simple feed-forward neural network (FFNN), when augmented with data generated by a CWGAN, enhances significantly its performance under conditions of limited data availability, demonstrating the intrinsic value of CWGAN data augmentation beyond simply providing larger datasets. In addition, we show that CWGANs can act as a surrogate model with improved performance in the low-data regime with respect to simple FFNNs. Overall, this work highlights the potential of generative machine learning algorithms in scientific applications beyond image generation and optimization.
摘要
生成对抗网络(GAN)是生成人工智能领域最为稳健和多样化的技术之一。在这项工作中,我们报告了GAN在spectral数据生成领域的应用,提供了数据缺乏问题的解决方案。我们通过在多层赫普力元件中的近场辐射热传输问题中应用提议方法来示例。我们发现,成功生成spectral数据需要两个修改:(i)引入Wasserstein GANs(WGANs)以避免模式塌溃,以及(ii)使WGANs Conditioned以获取准确的标签 для生成数据。我们表明,在有限数据情况下,一个简单的Feed-Forward Neural Network(FFNN),当其被补充了由CWGAN生成的数据后,显著提高了其性能。此外,我们还示出了CWGAN可以作为低数据情况下的代理模型,其性能比简单的FFNN更高。总的来说,这项工作强调了生成机器学习算法在科学应用之外的潜在价值。
results: 该论文提出了一种可以实现$(1+\gamma)$-倍增加的差分隐私 clustering算法,使用了流处理模型和差分隐私技术。该算法的空间复杂度为$poly(k,d,\log(T))$,并且可以保证对于任意的$\gamma>0$,扩展系数是$(1+\gamma)$,增加系数是$poly(k,d,\log(T))$.Abstract
The streaming model is an abstraction of computing over massive data streams, which is a popular way of dealing with large-scale modern data analysis. In this model, there is a stream of data points, one after the other. A streaming algorithm is only allowed one pass over the data stream, and the goal is to perform some analysis during the stream while using as small space as possible. Clustering problems (such as $k$-means and $k$-median) are fundamental unsupervised machine learning primitives, and streaming clustering algorithms have been extensively studied in the past. However, since data privacy becomes a central concern in many real-world applications, non-private clustering algorithms are not applicable in many scenarios. In this work, we provide the first differentially private streaming algorithms for $k$-means and $k$-median clustering of $d$-dimensional Euclidean data points over a stream with length at most $T$ using $poly(k,d,\log(T))$ space to achieve a {\it constant} multiplicative error and a $poly(k,d,\log(T))$ additive error. In particular, we present a differentially private streaming clustering framework which only requires an offline DP coreset algorithm as a blackbox. By plugging in existing DP coreset results via Ghazi, Kumar, Manurangsi 2020 and Kaplan, Stemmer 2018, we achieve (1) a $(1+\gamma)$-multiplicative approximation with $\tilde{O}_\gamma(poly(k,d,\log(T)))$ space for any $\gamma>0$, and the additive error is $poly(k,d,\log(T))$ or (2) an $O(1)$-multiplicative approximation with $\tilde{O}(k \cdot poly(d,\log(T)))$ space and $poly(k,d,\log(T))$ additive error. In addition, our algorithmic framework is also differentially private under the continual release setting, i.e., the union of outputs of our algorithms at every timestamp is always differentially private.
摘要
“流处理模型是大规模数据流处理的抽象,是现代数据分析中受欢迎的方法。在这个模型中,有一串数据点,一个接一个地进行处理。流处理算法只有一次可以访问数据流,目标是在流中进行分析,使用最小的空间。归类问题(如$k$-means和$k$- median)是现代无监督机器学习的基本 primitives,流处理归类算法已经得到了广泛的研究。然而,由于数据隐私问题的关注,非私有的归类算法不适用于许多场景。在这种情况下,我们提供了首个具有常量多元因子错误和$poly(k,d,\log(T))$空间的扩展隐私流处理归类算法。特别是,我们提供了一个具有隐私性的流处理归类框架,只需要一个私有DP核心算法作为黑盒。通过插入现有的DP核心结果,我们实现了以下两个目标:1. $(1+\gamma)$-多元因子近似, $\tilde{O}_\gamma(poly(k,d,\log(T)))$ 空间,对于任何 $\gamma>0$。错误是 $poly(k,d,\log(T))$。2. $O(1)$-多元因子近似, $\tilde{O}(k \cdot poly(d,\log(T)))$ 空间,错误是 $poly(k,d,\log(T))$。此外,我们的算法框架还是隐私的,即将流处理算法的输出集合在每个时间戳都是隐私的。”
Can Large Language Models Empower Molecular Property Prediction?
paper_authors: Chen Qian, Huayi Tang, Zhirui Yang, Hong Liang, Yong Liu
for: 本研究旨在利用大型自然语言模型(LLM)提高分子物理性能预测。
methods: 本研究采用两个视角:零/几次分子类型化和使用LLM生成的新解释作为分子表示。
results: 实验结果表明,使用文本解释作为分子表示可以在多个benchmark数据集上实现优越性,并证明LLM在分子物理性能预测任务中具有极大的潜力。Abstract
Molecular property prediction has gained significant attention due to its transformative potential in multiple scientific disciplines. Conventionally, a molecule graph can be represented either as a graph-structured data or a SMILES text. Recently, the rapid development of Large Language Models (LLMs) has revolutionized the field of NLP. Although it is natural to utilize LLMs to assist in understanding molecules represented by SMILES, the exploration of how LLMs will impact molecular property prediction is still in its early stage. In this work, we advance towards this objective through two perspectives: zero/few-shot molecular classification, and using the new explanations generated by LLMs as representations of molecules. To be specific, we first prompt LLMs to do in-context molecular classification and evaluate their performance. After that, we employ LLMs to generate semantically enriched explanations for the original SMILES and then leverage that to fine-tune a small-scale LM model for multiple downstream tasks. The experimental results highlight the superiority of text explanations as molecular representations across multiple benchmark datasets, and confirm the immense potential of LLMs in molecular property prediction tasks. Codes are available at \url{https://github.com/ChnQ/LLM4Mol}.
摘要
молекулярная свойство предсказание 已经吸引了广泛关注,因为它在多种科学领域中可能产生很大的转变。通常,分子图可以表示为图structured data或SMILES文本。在最近几年,大型自然语言模型(LLMs)的快速发展已经革命化了自然语言处理(NLP)领域。虽然可以使用LLMs来帮助理解表示分子的SMILES,但是研究如何使用LLMs进行分子性质预测的阶段还处于早期。在这种工作中,我们通过两个视角提前这个目标:零/几个分子类别和使用LLMs生成的新解释来代表分子。具体来说,我们首先请求LLMs在上下文中进行分子分类,并评估其表现。然后,我们使用LLMs生成semantically Rich explanation for the original SMILES,并使用这些解释来精细调整一个小规模LM模型 для多个下游任务。实验结果表明文本解释作为分子表示的超越多个benchmark dataset,并证明LLMs在分子性质预测任务中的极大潜力。代码可以在 \url{https://github.com/ChnQ/LLM4Mol} 中找到。
results: 研究发现三个主要预测年龄关键部位:脊梁、自生背肌和心脏区,其中心脏区的重要性最高。该模型在整体身体图像上实现了state-of-the-art的年龄预测精度,年龄差异平均值为2.76年。Abstract
Age prediction is an important part of medical assessments and research. It can aid in detecting diseases as well as abnormal ageing by highlighting the discrepancy between chronological and biological age. To gain a comprehensive understanding of age-related changes observed in various body parts, we investigate them on a larger scale by using whole-body images. We utilise the Grad-CAM interpretability method to determine the body areas most predictive of a person's age. We expand our analysis beyond individual subjects by employing registration techniques to generate population-wide interpretability maps. Furthermore, we set state-of-the-art whole-body age prediction with a model that achieves a mean absolute error of 2.76 years. Our findings reveal three primary areas of interest: the spine, the autochthonous back muscles, and the cardiac region, which exhibits the highest importance.
摘要
预测年龄是医学评估和研究中的一个重要部分。它可以帮助检测疾病以及异常年龄的趋势,并且通过显示生物年龄与 cronological age 之间的差异来提供有价值的信息。为了更全面地了解不同部位的年龄相关变化,我们使用整体图像进行研究。我们使用 Grad-CAM 可读性方法来确定人体各部位最有predictive value的地方。此外,我们还使用注册技术来生成全 популяцион的可读性地图,以扩展我们的分析范围。此外,我们实现了全身年龄预测的state-of-the-art模型,其 сред平均绝对误差为2.76年。我们的发现表明了三个主要领域:脊梁、自生肌肉和心脏区域,这三个领域具有最高的重要性。
results: 我们的方法在三个医疗图像 dataset 上进行实验,比较了旧有的方法,结果显示我们的方法可以提高增强效果和下游任务的表现。Abstract
Unpaired Medical Image Enhancement (UMIE) aims to transform a low-quality (LQ) medical image into a high-quality (HQ) one without relying on paired images for training. While most existing approaches are based on Pix2Pix/CycleGAN and are effective to some extent, they fail to explicitly use HQ information to guide the enhancement process, which can lead to undesired artifacts and structural distortions. In this paper, we propose a novel UMIE approach that avoids the above limitation of existing methods by directly encoding HQ cues into the LQ enhancement process in a variational fashion and thus model the UMIE task under the joint distribution between the LQ and HQ domains. Specifically, we extract features from an HQ image and explicitly insert the features, which are expected to encode HQ cues, into the enhancement network to guide the LQ enhancement with the variational normalization module. We train the enhancement network adversarially with a discriminator to ensure the generated HQ image falls into the HQ domain. We further propose a content-aware loss to guide the enhancement process with wavelet-based pixel-level and multi-encoder-based feature-level constraints. Additionally, as a key motivation for performing image enhancement is to make the enhanced images serve better for downstream tasks, we propose a bi-level learning scheme to optimize the UMIE task and downstream tasks cooperatively, helping generate HQ images both visually appealing and favorable for downstream tasks. Experiments on three medical datasets, including two newly collected datasets, verify that the proposed method outperforms existing techniques in terms of both enhancement quality and downstream task performance. We will make the code and the newly collected datasets publicly available for community study.
摘要
<>translate(umie)Unsupervised Medical Image Enhancement (UMIE) 的目标是将低质量(LQ)医疗图像转化为高质量(HQ)图像,而不依赖于对训练图像的对应。现有的方法多数基于 Pix2Pix/CycleGAN,虽然有一定的效果,但是它们不会直接使用 HQ 信息来导引增强过程,这可能会导致不 DESIRED 的artefacts 和结构扭曲。在这篇论文中,我们提出了一种新的 UMIE 方法,通过直接在 LQ 增强过程中编码 HQ 信息来避免上述 limitation。 Specifically,我们从 HQ 图像中提取特征,并将这些特征直接插入增强网络中,以帮助 LQ 图像增强。我们使用变量 норmalization 模块来确保生成的 HQ 图像处于 HQ 领域内。我们还提出了一种内容相关的损失函数,用于引导增强过程,并且使用 wavelet 基于像素级和多个 encoder 基于特征级的约束来限制增强过程。此外,作为增强图像的主要目的是为了使其更适合下游任务,我们提出了一种两级学习方案,通过将 UMIE 任务和下游任务相互协同学习,以生成高质量的图像,同时也能够满足下游任务的需求。实验结果表明,提出的方法在三个医疗数据集上都有较高的增强质量和下游任务性能。我们将代码和新收集的数据集公开发布,以便社区进行研究。Note: The translation is in Simplified Chinese, which is the standard Chinese writing system used in mainland China. If you prefer Traditional Chinese, please let me know and I can provide the translation in that format as well.
MUVF-YOLOX: A Multi-modal Ultrasound Video Fusion Network for Renal Tumor Diagnosis
paper_authors: Junyu Li, Han Huang, Dong Ni, Wufeng Xue, Dongmei Zhu, Jun Cheng for:这个论文的目的是检测和分类肾脏癌,以提高患者存活率。methods:这个论文使用了多模态超声影像视频融合网络,将B模式和CEUS模式超声影像视频融合到一起,以提高肾脏癌诊断的准确性。results:实验结果表明,提案的框架在多中心数据集上表现出色,超过单模态模型和竞争方法。此外,我们的OTA模块在分类任务中获得了更高的准确率。代码可以在GitHub上获取:https://github.com/JeunyuLi/MUAF。Abstract
Early diagnosis of renal cancer can greatly improve the survival rate of patients. Contrast-enhanced ultrasound (CEUS) is a cost-effective and non-invasive imaging technique and has become more and more frequently used for renal tumor diagnosis. However, the classification of benign and malignant renal tumors can still be very challenging due to the highly heterogeneous appearance of cancer and imaging artifacts. Our aim is to detect and classify renal tumors by integrating B-mode and CEUS-mode ultrasound videos. To this end, we propose a novel multi-modal ultrasound video fusion network that can effectively perform multi-modal feature fusion and video classification for renal tumor diagnosis. The attention-based multi-modal fusion module uses cross-attention and self-attention to extract modality-invariant features and modality-specific features in parallel. In addition, we design an object-level temporal aggregation (OTA) module that can automatically filter low-quality features and efficiently integrate temporal information from multiple frames to improve the accuracy of tumor diagnosis. Experimental results on a multicenter dataset show that the proposed framework outperforms the single-modal models and the competing methods. Furthermore, our OTA module achieves higher classification accuracy than the frame-level predictions. Our code is available at \url{https://github.com/JeunyuLi/MUAF}.
摘要
早期诊断reno肿瘤可以大大提高患者存活率。对比增强超声(CEUS)是一种 Cost-effective 和非侵入的成像技术,在reno肿瘤诊断中越来越常用。然而,分类benign和malignantreno肿瘤仍然是非常困难的,这是因为肿瘤的高度多样性和成像 artifacts。我们的目标是通过 integrate B-mode和CEUS-mode超声视频来检测和分类reno肿瘤。为此,我们提出了一种 novel 多模态超声视频融合网络,可以有效地执行多模态特征融合和视频分类。我们的注意力基于多模态融合模块使用 Cross-attention 和自注意力来提取模式不变特征和模式特征。此外,我们设计了一个 object-level temporal aggregation(OTA)模块,可以自动筛选低质量特征并有效地集成多帧中的时间信息,以提高肿瘤诊断的准确性。实验结果表明,我们提出的框架在多中心数据集上超过单模态模型和竞争方法。此外,我们的 OTA 模块在 Frame-level 预测中实现了更高的分类精度。我们的代码可以在 上获取。
Theoretical Analysis of Binary Masks in Snapshot Compressive Imaging Systems
for: 这 paper 主要研究了 binary 面Mask 在 compressive imaging 系统中的影响。
methods: 该 paper 使用了 teoretic 分析方法来 investigate binary 面Mask 的影响。
results: 研究发现,最佳的 binary 面Mask 的概率非零元素小于 0.5,这提供了设计和优化 binary 面Mask 的 valuable 信息。Abstract
Snapshot compressive imaging (SCI) systems have gained significant attention in recent years. While previous theoretical studies have primarily focused on the performance analysis of Gaussian masks, practical SCI systems often employ binary-valued masks. Furthermore, recent research has demonstrated that optimized binary masks can significantly enhance system performance. In this paper, we present a comprehensive theoretical characterization of binary masks and their impact on SCI system performance. Initially, we investigate the scenario where the masks are binary and independently identically distributed (iid), revealing a noteworthy finding that aligns with prior numerical results. Specifically, we show that the optimal probability of non-zero elements in the masks is smaller than 0.5. This result provides valuable insights into the design and optimization of binary masks for SCI systems, facilitating further advancements in the field. Additionally, we extend our analysis to characterize the performance of SCI systems where the mask entries are not independent but are generated based on a stationary first-order Markov process. Overall, our theoretical framework offers a comprehensive understanding of the performance implications associated with binary masks in SCI systems.
摘要
快照压缩成像(SCI)系统在最近几年内获得了广泛关注。而在理论研究中,既前面的研究主要集中在 Gaussian 面积上的性能分析,实际的 SCI 系统却常常使用二值面积。此外,最近的研究表明,优化的二值面积可以显著提高系统性能。在这篇论文中,我们提供了 SCi 系统中二值面积的完整理论Characterization,并对其对系统性能的影响进行了深入分析。首先,我们研究了面积为二值独立相同分布(iid)的情况,发现一个值得注意的结论,与之前的数值结果相符。具体来说,我们证明了最佳非零元素概率在面积中小于 0.5。这个结论为 SCi 系统中 binary 面积的设计和优化提供了有价值的准则,推动了领域的进一步发展。此外,我们将分析推广到 SCi 系统中面积条件不独立,而是基于一个站立的首阶Markov 过程生成的情况。总的来说,我们的理论框架为 SCi 系统中二值面积的性能影响提供了全面的理解。
Tightly-Coupled LiDAR-Visual SLAM Based on Geometric Features for Mobile Agents
results: 在公共数据集 M2DGR 上进行评估,与当前状态的多模式方法相比,我们的系统实现了更高精度和更加稳定的姿态估计。Abstract
The mobile robot relies on SLAM (Simultaneous Localization and Mapping) to provide autonomous navigation and task execution in complex and unknown environments. However, it is hard to develop a dedicated algorithm for mobile robots due to dynamic and challenging situations, such as poor lighting conditions and motion blur. To tackle this issue, we propose a tightly-coupled LiDAR-visual SLAM based on geometric features, which includes two sub-systems (LiDAR and monocular visual SLAM) and a fusion framework. The fusion framework associates the depth and semantics of the multi-modal geometric features to complement the visual line landmarks and to add direction optimization in Bundle Adjustment (BA). This further constrains visual odometry. On the other hand, the entire line segment detected by the visual subsystem overcomes the limitation of the LiDAR subsystem, which can only perform the local calculation for geometric features. It adjusts the direction of linear feature points and filters out outliers, leading to a higher accurate odometry system. Finally, we employ a module to detect the subsystem's operation, providing the LiDAR subsystem's output as a complementary trajectory to our system while visual subsystem tracking fails. The evaluation results on the public dataset M2DGR, gathered from ground robots across various indoor and outdoor scenarios, show that our system achieves more accurate and robust pose estimation compared to current state-of-the-art multi-modal methods.
摘要
Mobile robot靠SLAM(同时地址和地图生成)提供自主导航和任务执行在复杂和未知环境中。然而,为手动机器人开发专门的算法很难,因为有动态和挑战性的情况,如亮度不足和运动模糊。为解决这个问题,我们提议一种紧密相互关联的LiDAR-视觉SLAM,包括两个子系统(LiDAR和单目视觉SLAM)和一个融合框架。融合框架将LiDAR和视觉的多模态几何特征相关,以增强视觉线坐标的精度和 semantics,并在Bundle Adjustment(BA)中添加方向优化。这会进一步约束视觉odoometry。另一方面,视觉子系统检测到的整条视觉线将LiDAR子系统的局部计算限制,并且可以调整视觉线的方向和过滤异常值,从而实现更高精度的odoometry系统。最后,我们采用一个模块来检测子系统的操作,提供LiDAR子系统的补做轨迹,而视觉子系统tracking失败时。根据M2DGR公共数据集,评估结果显示,我们的系统在多模态方法中实现了更高精度和robust的pose估计。
Open Scene Understanding: Grounded Situation Recognition Meets Segment Anything for Helping People with Visual Impairments
methods: 使用 Grounded Situation Recognition (GSR) 技术,并将其扩展为 Open Scene Understanding (OpenSU) 系统,包括实现 pixel-wise dense segmentation masks 以及增强特征提取和Encoder-Decoder 结构之间的互动。
results: 在 SWiG 数据集上取得了最佳性能,并在实际应用中显示了对 PVI 人群的独立移动能力提高。Abstract
Grounded Situation Recognition (GSR) is capable of recognizing and interpreting visual scenes in a contextually intuitive way, yielding salient activities (verbs) and the involved entities (roles) depicted in images. In this work, we focus on the application of GSR in assisting people with visual impairments (PVI). However, precise localization information of detected objects is often required to navigate their surroundings confidently and make informed decisions. For the first time, we propose an Open Scene Understanding (OpenSU) system that aims to generate pixel-wise dense segmentation masks of involved entities instead of bounding boxes. Specifically, we build our OpenSU system on top of GSR by additionally adopting an efficient Segment Anything Model (SAM). Furthermore, to enhance the feature extraction and interaction between the encoder-decoder structure, we construct our OpenSU system using a solid pure transformer backbone to improve the performance of GSR. In order to accelerate the convergence, we replace all the activation functions within the GSR decoders with GELU, thereby reducing the training duration. In quantitative analysis, our model achieves state-of-the-art performance on the SWiG dataset. Moreover, through field testing on dedicated assistive technology datasets and application demonstrations, the proposed OpenSU system can be used to enhance scene understanding and facilitate the independent mobility of people with visual impairments. Our code will be available at https://github.com/RuipingL/OpenSU.
摘要
“固定场景认知(GSR)能够理解和解释视觉场景,生成出明确的活动(动词)和参与者(角色)。在这项工作中,我们关注使用GSR进行辅助视障人群(PVI)。然而,精确的本地化信息可以帮助人们自信地导航周围环境和做出 Informed 决策。为了实现这一目标,我们第一次提出了一个开放场景理解(OpenSU)系统,旨在生成像素粒度的精密分割mask,而不是 bounding box。具体来说,我们基于GSR构建了OpenSU系统,并采用高效的Segment Anything Model(SAM)。此外,为了提高Encoder-Decoder结构中的特征提取和交互,我们使用了坚实的纯transformer背景。为了加速训练,我们在GSR解码器中replace所有活动函数,使得训练时间缩短。在量化分析中,我们的模型在SWiG数据集上达到了领先的性能。此外,通过特定辅助技术数据集和应用示例测试,我们的OpenSU系统可以增强场景理解和推动视障人群的独立行动。我们的代码将在https://github.com/RuipingL/OpenSU上公开。”
ExposureDiffusion: Learning to Expose for Low-light Image Enhancement
results: 比传统扩散模型具有更好的性能和更快的执行时间,并且可以与不同的背景网络和实际对照数据集一起使用Abstract
Previous raw image-based low-light image enhancement methods predominantly relied on feed-forward neural networks to learn deterministic mappings from low-light to normally-exposed images. However, they failed to capture critical distribution information, leading to visually undesirable results. This work addresses the issue by seamlessly integrating a diffusion model with a physics-based exposure model. Different from a vanilla diffusion model that has to perform Gaussian denoising, with the injected physics-based exposure model, our restoration process can directly start from a noisy image instead of pure noise. As such, our method obtains significantly improved performance and reduced inference time compared with vanilla diffusion models. To make full use of the advantages of different intermediate steps, we further propose an adaptive residual layer that effectively screens out the side-effect in the iterative refinement when the intermediate results have been already well-exposed. The proposed framework can work with both real-paired datasets, SOTA noise models, and different backbone networks. Note that, the proposed framework is compatible with real-paired datasets, real/synthetic noise models, and different backbone networks. We evaluate the proposed method on various public benchmarks, achieving promising results with consistent improvements using different exposure models and backbones. Besides, the proposed method achieves better generalization capacity for unseen amplifying ratios and better performance than a larger feedforward neural model when few parameters are adopted.
摘要
以前的Raw图像基于低光照图像增强方法主要采用了Feedforward神经网络来学习确定性的映射从低光照图像到正常曝光图像。然而,它们没有捕捉到重要的分布信息,导致视觉不满意的结果。这个工作解决这个问题,通过将扩散模型与物理基础曝光模型相结合。与普通的扩散模型不同,我们的恢复过程可以直接从噪声图像开始,而不需要纯噪声。因此,我们的方法可以获得显著改进的性能和减少推理时间,相比普通的扩散模型。为了充分利用不同的中间结果的优势,我们还提出了适应性的剩余层,可以有效地排除中间结果的副作用在迭代纠正过程中。我们的框架可以与真实对应的数据集、前进推理模型和不同的背景网络一起工作。注意,我们的框架与真实对应的数据集、真实/生成噪声模型和不同的背景网络兼容。我们在各种公共测试benchmark上评估了我们的方法,实现了优秀的结果,并在不同的曝光模型和背景网络上获得了适应性和性能优势。此外,我们的方法在未见扩大比率上也有更好的总体适应能力和性能优势。
DRM-IR: Task-Adaptive Deep Unfolding Network for All-In-One Image Restoration
results: 对多个标准 benchmark 数据集进行了广泛的实验,结果显示,我们的 DRM-IR 方法可以在 All-In-One 图像修复中达到状态的前者。Abstract
Existing All-In-One image restoration (IR) methods usually lack flexible modeling on various types of degradation, thus impeding the restoration performance. To achieve All-In-One IR with higher task dexterity, this work proposes an efficient Dynamic Reference Modeling paradigm (DRM-IR), which consists of task-adaptive degradation modeling and model-based image restoring. Specifically, these two subtasks are formalized as a pair of entangled reference-based maximum a posteriori (MAP) inferences, which are optimized synchronously in an unfolding-based manner. With the two cascaded subtasks, DRM-IR first dynamically models the task-specific degradation based on a reference image pair and further restores the image with the collected degradation statistics. Besides, to bridge the semantic gap between the reference and target degraded images, we further devise a Degradation Prior Transmitter (DPT) that restrains the instance-specific feature differences. DRM-IR explicitly provides superior flexibility for All-in-One IR while being interpretable. Extensive experiments on multiple benchmark datasets show that our DRM-IR achieves state-of-the-art in All-In-One IR.
摘要
通常的全面修复(IR)方法通常缺乏多种降低的灵活模型化,因此影响了修复性能。为了实现更高的任务敏捷度,这项工作提出了一种高效的动态参照模型(DRM-IR),它包括任务适应型的降低模型和基于模型的图像修复。具体来说,这两个子任务被формализова为一对推理最大 posteriori(MAP)推理,它们在一个层次结构中被同步优化。通过这两个相顺序的子任务,DRM-IR首先在参照图像对的基础上动态模型任务特定的降低,然后使用收集的降低统计来修复图像。此外,为了跨越参照图像和目标降低图像之间的semantic gap,我们还开发了一种降低先验(DPT),它限制了特定的特征差异。DRM-IR通过显式提供多种降低类型的灵活性,而且可读性高。广泛的实验表明,我们的DRM-IR在全面修复中实现了state-of-the-art。
results: 结果显示,dialect-balanced数据集不会在不同的 диалект中产生相同的表现, UlDIialeкти consistently underperforms,而 Mu диалект则具有最低的wer。Co和Mu диалект之间存在密切的关系,但这种关系不是对称的。这些结果将导向未来的数据集收集和系统建立策略,以优化在不同 диаLECT中的表现准确性。Abstract
ASR systems are generally built for the spoken 'standard', and their performance declines for non-standard dialects/varieties. This is a problem for a language like Irish, where there is no single spoken standard, but rather three major dialects: Ulster (Ul), Connacht (Co) and Munster (Mu). As a diagnostic to quantify the effect of the speaker's dialect on recognition performance, 12 ASR systems were trained, firstly using baseline dialect-balanced training corpora, and then using modified versions of the baseline corpora, where dialect-specific materials were either subtracted or added. Results indicate that dialect-balanced corpora do not yield a similar performance across the dialects: the Ul dialect consistently underperforms, whereas Mu yields lowest WERs. There is a close relationship between Co and Mu dialects, but one that is not symmetrical. These results will guide future corpus collection and system building strategies to optimise for cross-dialect performance equity.
摘要
听说系统通常是为口语标准建立的,其表现在非标准方言下降。这是一个问题,因为如爱尔兰语言中没有单一的口语标准,而是有三大方言: Ulster(Ul)、Connacht(Co)和Munster(Mu)。为了评估说话人的方言对识别表现的影响,12个听说系统在基础的方言均衡训练集上进行了训练,然后使用基础集的修改版本,其中方言特有的材料被 subtracted 或 added。结果表明,不同方言的表现不具有相似性:Ul方言一直表现不佳,而 Mu 方言具有最低 WERs。Co 和 Mu 方言之间存在密切的关系,但这种关系不是对称的。这些结果将导引未来的资料采集和系统建设策略,以优化在不同方言之间的表现 equity。
Replay to Remember: Continual Layer-Specific Fine-tuning for German Speech Recognition
results: 研究发现,通过添加原频谱的一部分数据,可以在新频谱上达到 Word-Error-Rates(WER)低于5%,同时稳定总语音识别性能。Abstract
While Automatic Speech Recognition (ASR) models have shown significant advances with the introduction of unsupervised or self-supervised training techniques, these improvements are still only limited to a subsection of languages and speakers. Transfer learning enables the adaptation of large-scale multilingual models to not only low-resource languages but also to more specific speaker groups. However, fine-tuning on data from new domains is usually accompanied by a decrease in performance on the original domain. Therefore, in our experiments, we examine how well the performance of large-scale ASR models can be approximated for smaller domains, with our own dataset of German Senior Voice Commands (SVC-de), and how much of the general speech recognition performance can be preserved by selectively freezing parts of the model during training. To further increase the robustness of the ASR model to vocabulary and speakers outside of the fine-tuned domain, we apply Experience Replay for continual learning. By adding only a fraction of data from the original domain, we are able to reach Word-Error-Rates (WERs) below 5\% on the new domain, while stabilizing performance for general speech recognition at acceptable WERs.
摘要
自动语音识别(ASR)模型在无监督或自监督训练技术的引入后已经表现出了显著的进步,但这些进步仅限于一些语言和发音人群。传输学习可以使大规模多语言模型适应不仅低资源语言,还可以适应更特定的发音人群。然而,在新领域数据进行精细调整通常会导致原领域性能下降。因此,我们在实验中检查了大规模ASR模型在更小的领域上的表现如何,以及如何在 selectively 冻结模型部分 During training 中保持一定的总体语音识别性能。进一步增加ASR模型对词汇和发音人群外的Robustness,我们应用经验回放 для持续学习。只添加原领域数据的一小部分,我们可以在新领域下达 Word-Error-Rates(WER)低于5%,而同时稳定总体语音识别性能。
AudioInceptionNeXt: TCL AI LAB Submission to EPIC-SOUND Audio-Based-Interaction-Recognition Challenge 2023
results: 我们的方法在挑战测试集上实现55.43%的top-1准确率,在公共领先榜上排名第一。代码可以在https://github.com/StevenLauHKHK/AudioInceptionNeXt.git上获取。Abstract
This report presents the technical details of our submission to the 2023 Epic-Kitchen EPIC-SOUNDS Audio-Based Interaction Recognition Challenge. The task is to learn the mapping from audio samples to their corresponding action labels. To achieve this goal, we propose a simple yet effective single-stream CNN-based architecture called AudioInceptionNeXt that operates on the time-frequency log-mel-spectrogram of the audio samples. Motivated by the design of the InceptionNeXt, we propose parallel multi-scale depthwise separable convolutional kernels in the AudioInceptionNeXt block, which enable the model to learn the time and frequency information more effectively. The large-scale separable kernels capture the long duration of activities and the global frequency semantic information, while the small-scale separable kernels capture the short duration of activities and local details of frequency information. Our approach achieved 55.43% of top-1 accuracy on the challenge test set, ranked as 1st on the public leaderboard. Codes are available anonymously at https://github.com/StevenLauHKHK/AudioInceptionNeXt.git.
摘要
这份报告介绍我们在2023年 Epic-Kitchen EPIC-SOUNDS 音频基于交互认知挑战中的技术细节。任务是学习音频示例与其对应的动作标签之间的映射。为了实现这个目标,我们提议一种简单 yet 高效的单流 CNN 建 architecture AudioInceptionNeXt,该架构在时域-频谱响应的 Log-Mel спектрограм中运行。受 InceptionNeXt 的设计启发,我们提议在 AudioInceptionNeXt 块中使用并行多级分割 convolutional 核,这些核 enable 模型更好地学习时间和频谱信息。大规模分割核捕捉活动的长时间和全局频谱 semantic 信息,而小规模分割核捕捉活动的短时间和局部频谱信息。我们的方法在挑战测试集上达到了 55.43% 的 top-1 精度,排名公共排行板上第一名。代码可以在 https://github.com/StevenLauHKHK/AudioInceptionNeXt.git 上anonymous 获取。
Mega-TTS 2: Zero-Shot Text-to-Speech with Arbitrary Length Speech Prompts
for: 可以Synthesize unseen speaker的speech with arbitrary-length prompts
methods: 使用 multi-reference timbre encoder和prosody language model,并 introduce arbitrary-source prompts and phoneme-level auto-regressive duration model
results: 可以 achieve improved performance with longer speech prompts and synthesize identity-preserving speech with a short prompt of an unseen speakerAbstract
Zero-shot text-to-speech aims at synthesizing voices with unseen speech prompts. Previous large-scale multispeaker TTS models have successfully achieved this goal with an enrolled recording within 10 seconds. However, most of them are designed to utilize only short speech prompts. The limited information in short speech prompts significantly hinders the performance of fine-grained identity imitation. In this paper, we introduce Mega-TTS 2, a generic zero-shot multispeaker TTS model that is capable of synthesizing speech for unseen speakers with arbitrary-length prompts. Specifically, we 1) design a multi-reference timbre encoder to extract timbre information from multiple reference speeches; 2) and train a prosody language model with arbitrary-length speech prompts; With these designs, our model is suitable for prompts of different lengths, which extends the upper bound of speech quality for zero-shot text-to-speech. Besides arbitrary-length prompts, we introduce arbitrary-source prompts, which leverages the probabilities derived from multiple P-LLM outputs to produce expressive and controlled prosody. Furthermore, we propose a phoneme-level auto-regressive duration model to introduce in-context learning capabilities to duration modeling. Experiments demonstrate that our method could not only synthesize identity-preserving speech with a short prompt of an unseen speaker but also achieve improved performance with longer speech prompts. Audio samples can be found in https://mega-tts.github.io/mega2_demo/.
摘要
<>零批 Text-to-Speech 目标是synthesize voice with unseen speech prompts。前一代大规模多 speaker TTS 模型已经成功实现了这个目标,但是大多数它们只能使用短的 speech prompts。短 speech prompts 的有限信息使得 fine-grained identity imitation 的性能受到了很大的限制。在这篇论文中,我们介绍 Mega-TTS 2,一种可以 synthesize speech for unseen speakers with arbitrary-length prompts 的通用零批多 speaker TTS 模型。具体来说,我们:1. 设计了多 references timbre encoder,以EXTRACT timbre information from multiple reference speeches。2. 并使用 arbitrary-length speech prompts 进行训练 prosody language model。这些设计使得我们的模型适用于不同的提示长度,从而扩展了 speech quality 的Upper bound for zero-shot text-to-speech。此外,我们还引入了arbitrary-source prompts,这里利用了多个 P-LLM 输出的概率来生成表达性和控制的 prosody。此外,我们还提出了一种phoneme-level auto-regressive duration model,以INTRODUCE in-context learning capabilities to duration modeling。实验表明,我们的方法可以不仅synthesize identity-preserving speech with a short prompt of an unseen speaker,还可以 achieved improved performance with longer speech prompts。Audio samples can be found in .
Low Rank Properties for Estimating Microphones Start Time and Sources Emission Time
paper_authors: Faxian Cao, Yongqiang Cheng, Adil Mehmood Khan, Zhijing Yang, S. M. Ahsan Kazmiand Yingxiu Chang
for: 这 paper 是为了解决不精确的时间信息问题,例如麦克风和源localization 而写的。
methods: 这 paper 使用了一种基于low-rank property (LRP)的方法,具体来说是利用LRP的低级结构来形成linear constraint,从而解决UTIm的不确定性问题。
results: 实验结果表明,这 paper 的方法在比较 existed state-of-the-art 方法时表现出了更高的性能, measured 通过Recovery number 和 reduced estimation errors of UTIm。Abstract
Uncertainty in timing information pertaining to the start time of microphone recordings and sources' emission time pose significant challenges in various applications, such as joint microphones and sources localization. Traditional optimization methods, which directly estimate this unknown timing information (UTIm), often fall short compared to approaches exploiting the low-rank property (LRP). LRP encompasses an additional low-rank structure, facilitating a linear constraint on UTIm to help formulate related low-rank structure information. This method allows us to attain globally optimal solutions for UTIm, given proper initialization. However, the initialization process often involves randomness, leading to suboptimal, local minimum values. This paper presents a novel, combined low-rank approximation (CLRA) method designed to mitigate the effects of this random initialization. We introduce three new LRP variants, underpinned by mathematical proof, which allow the UTIm to draw on a richer pool of low-rank structural information. Utilizing this augmented low-rank structural information from both LRP and the proposed variants, we formulate four linear constraints on the UTIm. Employing the proposed CLRA algorithm, we derive global optimal solutions for the UTIm via these four linear constraints.Experimental results highlight the superior performance of our method over existing state-of-the-art approaches, measured in terms of both the recovery number and reduced estimation errors of UTIm.
摘要
<>传感器记录的开始时间和发源时间的不确定性在各种应用中具有重要挑战性,如共同扬声器和发源器localization。传统优化方法,直接估算这些未知时间信息(UTIm),经常与LRP方法相比,表现不足。LRP包含额外的低级结构,使得可以在UTIm中增加直线约束,以帮助形式化相关的低级结构信息。这种方法使得我们可以在初始化过程中获得全球最优解。然而,初始化过程通常含有Randomness,导致获得局部最优解。本文提出了一种新的combined low-rank approximation(CLRA)方法,旨在 mitigate这种随机初始化的影响。我们提出了三种新的LRP变体,基于数学证明,使得UTIm可以借鉴更加丰富的低级结构信息。通过这些增强的低级结构信息,我们将UTIm转化为四个直线约束。采用我们提出的CLRA算法,我们可以从这些四个直线约束中获得全球最优解。实验结果表明,我们的方法在与现有状态的方法相比,具有更好的性能, measured in terms of both the recovery number and reduced estimation errors of UTIm。Note: Please note that the translation is in Simplified Chinese, which is the standard writing system used in mainland China. If you need the translation in Traditional Chinese, please let me know.
Leveraging Pretrained ASR Encoders for Effective and Efficient End-to-End Speech Intent Classification and Slot Filling
results: 我们的模型在SLURP数据集上实现了新的状态对Intent分类和槽填充的最佳Result,即90.14% Intent准确率和82.27% SLURP-F1。此外,我们还对端到端模型与分解模型(ASR+NLU)进行了深入比较,并证明端到端模型在参数效率和性能之间具有优势。最后,我们的模型成为了首个实现与分解模型相同性的E2E模型。Abstract
We study speech intent classification and slot filling (SICSF) by proposing to use an encoder pretrained on speech recognition (ASR) to initialize an end-to-end (E2E) Conformer-Transformer model, which achieves the new state-of-the-art results on the SLURP dataset, with 90.14% intent accuracy and 82.27% SLURP-F1. We compare our model with encoders pretrained on self-supervised learning (SSL), and show that ASR pretraining is much more effective than SSL for SICSF. To explore parameter efficiency, we freeze the encoder and add Adapter modules, and show that parameter efficiency is only achievable with an ASR-pretrained encoder, while the SSL encoder needs full finetuning to achieve comparable results. In addition, we provide an in-depth comparison on end-to-end models versus cascading models (ASR+NLU), and show that E2E models are better than cascaded models unless an oracle ASR model is provided. Last but not least, our model is the first E2E model that achieves the same performance as cascading models with oracle ASR. Code, checkpoints and configs are available.
摘要
我们研究了speech意图分类和插槽填充(SICSF),我们提议使用已经预训练的语音识别(ASR)Encoder来初始化一个端到端(E2E)Conformer-Transformer模型,这些模型在SLURP数据集上达到了新的州OF-the-art结果,具有90.14%的意图精度和82.27%的SLURP-F1。我们与self-supervised learning(SSL)预训练器进行比较,并发现ASR预训练是对SICSF的 much more effective than SSL。为了探索参数效率,我们冻结Encoder并添加Adapter模块,并发现只有ASR预训练的Encoder可以保持参数效率,而SSL预训练的Encoder需要全部finetuning才能达到相似的结果。此外,我们还提供了端到端模型与杂合模型(ASR+NLU)的深入比较,并发现E2E模型比杂合模型更好,除非提供了oracle ASR模型。最后,我们的模型是第一个E2E模型,可以与杂合模型具有oracle ASR模型的性能相同。代码、checkpoints和配置都可以获得。
Adapting an ASR Foundation Model for Spoken Language Assessment
results: 实验结果显示,通过精度地练习和软提示调整,可以有效地改变Whisper的解码行为,以生成候选者实际上说的话。Abstract
A crucial part of an accurate and reliable spoken language assessment system is the underlying ASR model. Recently, large-scale pre-trained ASR foundation models such as Whisper have been made available. As the output of these models is designed to be human readable, punctuation is added, numbers are presented in Arabic numeric form and abbreviations are included. Additionally, these models have a tendency to skip disfluencies and hesitations in the output. Though useful for readability, these attributes are not helpful for assessing the ability of a candidate and providing feedback. Here a precise transcription of what a candidate said is needed. In this paper, we give a detailed analysis of Whisper outputs and propose two solutions: fine-tuning and soft prompt tuning. Experiments are conducted on both public speech corpora and an English learner dataset. Results show that we can effectively alter the decoding behaviour of Whisper to generate the exact words spoken in the response.
摘要
《一个精准和可靠的口语评估系统中的关键部分是底层ASR模型。最近,大规模预训练ASR基础模型如Whisper已经被提供。这些模型的输出设计为人类可读,包括括号、阿拉伯数字形式的数字和缩写。然而,这些特征不实用于评估候选人的能力和提供反馈。我们需要准确转录候选人所说的话。在这篇论文中,我们对Whisper输出进行了详细分析,并提出了两种解决方案:细化和软提示调整。我们在公共演讲 corpora 和英语学习 dataset 上进行了实验,结果表明我们可以有效地改变Whisper的解码行为,以产生候选人实际上说的话。
results: 该论文预计可以在低频下提高声场重现的效果,因为直接控制声速 веctor在听区域中,可以更好地实现声场的重现。Abstract
This paper proposes a sound field reproduction algorithm based on matching the velocity vectors in a spherical listening region. Using the concept of sound field translation, the spherical harmonic coefficients of the velocity vectors in a spherical region are derived from the desired pressure distribution. The desired pressure distribution can either correspond to sources such as plane waves and point sources, or be obtained from measurements using a spherical microphone array. Unlike previous work in which the velocity vectors are only controlled on the boundary of the listening region or at discrete sweet spots, this work directly manipulates the velocity vectors in the whole listening region, which is expected to improve the perception of the desired sound field at low frequencies.
摘要
(Simplified Chinese translation)这篇论文提出了一种基于匀速场匀速场匀速场的声场重建算法。通过声场翻译的概念,这篇论文从需要的压力分布中获得了匀速场中的匀速Vector的圆锥幂系数。需要的压力分布可以来自平面波、点源等源,或者通过圆形 Mikrofon 阵列进行测量。与先前的工作不同,这篇论文直接控制整个听区域内的匀速Vector,而不是仅在听区域边缘或 discrete sweet spots 上控制匀速Vector,这可能会在低频段提高所需的声场的感知。
results: 实验结果表明,BiLSTM比传统回归方法更高效,能够捕捉多个时序序列之间的非线性动态。它在不同的生长条件下表现良好,并且可以使用有限的Sentinel-2图像来进行补充。BiLSTM在衰黄期表现特别出色。因此,BiLSTM可以用于时序序列Sentinel-1 VH/VV和Sentinel-2数据中的LAI补充,这种方法可以应用于其他时序序列补充问题。Abstract
The Leaf Area Index (LAI) is vital for predicting winter wheat yield. Acquisition of crop conditions via Sentinel-2 remote sensing images can be hindered by persistent clouds, affecting yield predictions. Synthetic Aperture Radar (SAR) provides all-weather imagery, and the ratio between its cross- and co-polarized channels (C-band) shows a high correlation with time series LAI over winter wheat regions. This study evaluates the use of time series Sentinel-1 VH/VV for LAI imputation, aiming to increase spatial-temporal density. We utilize a bidirectional LSTM (BiLSTM) network to impute time series LAI and use half mean squared error for each time step as the loss function. We trained models on data from southern Germany and the North China Plain using only LAI data generated by Sentinel-1 VH/VV and Sentinel-2. Experimental results show BiLSTM outperforms traditional regression methods, capturing nonlinear dynamics between multiple time series. It proves robust in various growing conditions and is effective even with limited Sentinel-2 images. BiLSTM's performance surpasses that of LSTM, particularly over the senescence period. Therefore, BiLSTM can be used to impute LAI with time-series Sentinel-1 VH/VV and Sentinel-2 data, and this method could be applied to other time-series imputation issues.
摘要
“叶面指数(LAI)是预测冬小麦收获的关键因素。但是,由于可视光Remote Sensing像素可能会受到持续云层遮挡,对收获预测造成影响。Synthetic Aperture Radar(SAR)提供了不受天气阻据的影像,其中横极和衡极通道(C-band)的比率与时间序列LAI之间存在高度的相关性。本研究探讨使用时间序列Sentinel-1 VH/VV来替代LAI数据,以增加空间时间的密度。我们使用了 bidirectional LSTM(BiLSTM)网络来替代时间序列LAI,并使用每个时间步的平均方差作为损失函数。我们使用了德国南部和中国北方平原的数据,只使用Sentinel-1 VH/VV和Sentinel-2的数据来训练模型。实验结果显示,BiLSTM比传统回归方法高效,能够捕捉时间序列之间的非线性动态。它在不同生长条件下表现良好,并且具有限制Sentinel-2像素的可行性。BiLSTM在衰老期表现更好,因此可以用时间序列Sentinel-1 VH/VV和Sentinel-2数据来替代LAI数据,这种方法可以应用于其他时间序列替代问题。”
Transient Neural Radiance Fields for Lidar View Synthesis and 3D Reconstruction
paper_authors: Anagh Malik, Parsa Mirdehghan, Sotiris Nousias, Kiriakos N. Kutulakos, David B. Lindell
for: 这 paper 的目的是用 NeRFs 模型来描述 Multiview 图像中的场景外观和几何结构,同时使用雷达测量数据作为额外监督。
methods: 这 paper 使用了一种新的时间分辨率版本的卷积渲染方程来渲染雷达测量数据,以捕捉在纳秒级别上的光学传输现象。
results: 作者们在一个具有 simulated 和捕捉的多视图扫描数据的 dataset 上评估了他们的方法,并发现其在缺少输入视图点的情况下比点云基础监督更好地恢复场景的几何和外观。这种方法可能对于自动驾驶、 робо控和远程感知等应用有很大的应用前景。Abstract
Neural radiance fields (NeRFs) have become a ubiquitous tool for modeling scene appearance and geometry from multiview imagery. Recent work has also begun to explore how to use additional supervision from lidar or depth sensor measurements in the NeRF framework. However, previous lidar-supervised NeRFs focus on rendering conventional camera imagery and use lidar-derived point cloud data as auxiliary supervision; thus, they fail to incorporate the underlying image formation model of the lidar. Here, we propose a novel method for rendering transient NeRFs that take as input the raw, time-resolved photon count histograms measured by a single-photon lidar system, and we seek to render such histograms from novel views. Different from conventional NeRFs, the approach relies on a time-resolved version of the volume rendering equation to render the lidar measurements and capture transient light transport phenomena at picosecond timescales. We evaluate our method on a first-of-its-kind dataset of simulated and captured transient multiview scans from a prototype single-photon lidar. Overall, our work brings NeRFs to a new dimension of imaging at transient timescales, newly enabling rendering of transient imagery from novel views. Additionally, we show that our approach recovers improved geometry and conventional appearance compared to point cloud-based supervision when training on few input viewpoints. Transient NeRFs may be especially useful for applications which seek to simulate raw lidar measurements for downstream tasks in autonomous driving, robotics, and remote sensing.
摘要
《神经辐射场(NeRFs)已成为多视图图像和几何的模型化工具。最近的研究也开始尝试将额外的超级视差从激光或深度测量设备的测量数据加入NeRF框架中。然而,先前的激光超级视差NeRF都是用来渲染传统的摄像头图像,并使用激光测量得到的点云数据作为辅助监督;因此,它们不能充分利用激光测量数据中的图像形成模型。在这里,我们提出了一种新的方法,可以将单 photon激光系统测量的时间分辨率历史曲线作为输入,并尝试从新视角渲染这些历史曲线。与传统的NeRF不同,我们的方法基于时间分辨率版本的卷积渲染方程来渲染激光测量和捕捉时钟秒级的辐射运输现象。我们对一个具有历史扫描和捕捉功能的单 photon激光系统进行了首次实验和捕捉。总的来说,我们的方法可以将NeRF引入新的时间级别的成像,并能够从新视角渲染动态图像。此外,我们还证明了我们的方法在几个输入视图点的监督下可以提取更好的几何和传统的外观特征,比起基于点云的监督来说更好。动态NeRF可能会在自动驾驶、机器人和远程感知等领域中发挥作用。》
Improving Zero-Shot Generalization for CLIP with Synthesized Prompts
methods: 提出了一种插件化生成方法 called \textbf{S}ynt\textbf{H}es\textbf{I}zed \textbf{P}rompts~(\textbf{SHIP}),通过变量自动编码器引入生成器,使用文本编码器来重建视觉特征。
results: 通过基本到新的泛化、跨数据集转移学习和总体零shot学习,实验证明了我们的方法的超越性。Abstract
With the growing interest in pretrained vision-language models like CLIP, recent research has focused on adapting these models to downstream tasks. Despite achieving promising results, most existing methods require labeled data for all classes, which may not hold in real-world applications due to the long tail and Zipf's law. For example, some classes may lack labeled data entirely, such as emerging concepts. To address this problem, we propose a plug-and-play generative approach called \textbf{S}ynt\textbf{H}es\textbf{I}zed \textbf{P}rompts~(\textbf{SHIP}) to improve existing fine-tuning methods. Specifically, we follow variational autoencoders to introduce a generator that reconstructs the visual features by inputting the synthesized prompts and the corresponding class names to the textual encoder of CLIP. In this manner, we easily obtain the synthesized features for the remaining label-only classes. Thereafter, we fine-tune CLIP with off-the-shelf methods by combining labeled and synthesized features. Extensive experiments on base-to-new generalization, cross-dataset transfer learning, and generalized zero-shot learning demonstrate the superiority of our approach. The code is available at \url{https://github.com/mrflogs/SHIP}.
摘要
随着CLIP类型预训练语义视觉模型的兴趣增长,当前研究的焦点在于将这些模型适应下渠道任务。虽然实现了有前途的结果,但大多数现有方法需要所有类别的标注数据,这在实际应用中可能不存在,因为Zipf的法则和长尾问题。例如,某些类别可能完全缺乏标注数据,如出现的概念。为解决这个问题,我们提议一种插件式生成方法called \textbf{S}ynt\textbf{H}es\textbf{I}zed \textbf{P}rompts~(\textbf{SHIP}),以改进现有的精度训练方法。具体来说,我们采用变量自动编码器引入一个生成器,通过输入生成的提示和相应的类别名称,将视觉特征重构回CLIP的文本编码器中。这样,我们可以轻松地获得标注-只有的类别的生成特征。然后,我们将CLIP通过市场上可获得的方法进行精度训练,并将标注和生成特征相结合。我们进行了基于新基础上的泛化、跨数据集转移学习和通用零shot学习的广泛实验,结果表明我们的方法有所优势。代码可以在\url{https://github.com/mrflogs/SHIP}中找到。
results: 在cross-silo和cross-device设置下,对CIFAR-10/100和Tiny ImageNet datasets进行了广泛的实验,并实现了新的SOTA性能在对照和非对照自适应学习方法上Abstract
The ubiquity of camera-enabled devices has led to large amounts of unlabeled image data being produced at the edge. The integration of self-supervised learning (SSL) and federated learning (FL) into one coherent system can potentially offer data privacy guarantees while also advancing the quality and robustness of the learned visual representations without needing to move data around. However, client bias and divergence during FL aggregation caused by data heterogeneity limits the performance of learned visual representations on downstream tasks. In this paper, we propose a new aggregation strategy termed Layer-wise Divergence Aware Weight Aggregation (L-DAWA) to mitigate the influence of client bias and divergence during FL aggregation. The proposed method aggregates weights at the layer-level according to the measure of angular divergence between the clients' model and the global model. Extensive experiments with cross-silo and cross-device settings on CIFAR-10/100 and Tiny ImageNet datasets demonstrate that our methods are effective and obtain new SOTA performance on both contrastive and non-contrastive SSL approaches.
摘要
随着摄像头设备的普遍使用,生成大量未标注图像数据的情况变得越来越普遍。通过结合自我超级vised learning(SSL)和联合学习(FL),可以实现数据隐私保证,同时提高学习的视觉表示质量和可靠性,不需要将数据传输。然而,客户端偏见和分布导致FL聚合中的客户端偏见和分布限制了下游任务的性能。在本文中,我们提出了一种新的聚合策略,即层次分解差异抑制Weight聚合(L-DAWA),以mitigate客户端偏见和分布的影响。该方法在客户端和全球模型之间进行层次分解,根据客户端模型和全球模型之间的角度差来权衡聚合。我们在跨积silod和跨设备的实验中,使用CIFAR-10/100和Tiny ImageNet datasets,证明了我们的方法的有效性,并实现了新的SOTA性能在对照和非对照SSL方法上。
Defect Classification in Additive Manufacturing Using CNN-Based Vision Processing
results: 本文通过将人类Loop mechanism integrate到分类模型中,以减少需要用于训练和生成训练数据的数据量。Abstract
The development of computer vision and in-situ monitoring using visual sensors allows the collection of large datasets from the additive manufacturing (AM) process. Such datasets could be used with machine learning techniques to improve the quality of AM. This paper examines two scenarios: first, using convolutional neural networks (CNNs) to accurately classify defects in an image dataset from AM and second, applying active learning techniques to the developed classification model. This allows the construction of a human-in-the-loop mechanism to reduce the size of the data required to train and generate training data.
摘要
通过计算机视觉和在位MONITORING使用视觉传感器,可以收集Additive Manufacturing(AM)过程中大量的数据集。这些数据集可以使用机器学习技术来提高AM的质量。本文研究了两个场景:首先,使用卷积神经网络(CNNs)准确地分类AM图像集中的缺陷,其次,应用活动学习技术到已经开发的分类模型,以建立人类在Loop机制,以降低需要训练和生成训练数据的大小。
ConTrack: Contextual Transformer for Device Tracking in X-ray
results: 实验结果显示,ConTrack方法在与现有追踪模型比较时,具有45%或更高的准确率。Abstract
Device tracking is an important prerequisite for guidance during endovascular procedures. Especially during cardiac interventions, detection and tracking of guiding the catheter tip in 2D fluoroscopic images is important for applications such as mapping vessels from angiography (high dose with contrast) to fluoroscopy (low dose without contrast). Tracking the catheter tip poses different challenges: the tip can be occluded by contrast during angiography or interventional devices; and it is always in continuous movement due to the cardiac and respiratory motions. To overcome these challenges, we propose ConTrack, a transformer-based network that uses both spatial and temporal contextual information for accurate device detection and tracking in both X-ray fluoroscopy and angiography. The spatial information comes from the template frames and the segmentation module: the template frames define the surroundings of the device, whereas the segmentation module detects the entire device to bring more context for the tip prediction. Using multiple templates makes the model more robust to the change in appearance of the device when it is occluded by the contrast agent. The flow information computed on the segmented catheter mask between the current and the previous frame helps in further refining the prediction by compensating for the respiratory and cardiac motions. The experiments show that our method achieves 45% or higher accuracy in detection and tracking when compared to state-of-the-art tracking models.
摘要
Device tracking是endo vasculature процедуры的重要前提。特别是在心血管 intervención中,探测和跟踪导引器的扫描器板准确性非常重要,用于映射血管from angiography (高剂量矿物质) to fluoroscopy (低剂量无矿物质)。跟踪导引器的挑战包括:扫描器板中的导引器可能会被矿物质 occlude during angiography or interventional devices; 并且它们都在心动和呼吸运动中不断移动。为了解决这些挑战,我们提出了ConTrack,一种基于transformer网络的方法,利用扫描器板中的空间和时间上下文信息进行准确的设备探测和跟踪。空间信息来自于模板帧和分 segmentation模块:模板帧定义了设备周围的环境,而分 segmentation模块检测了整个设备,以提供更多的上下文信息 для tip prediction。使用多个模板使得模型更加鲁凤于设备的外观变化,当它被矿物质 occlude 时。流动信息在分 segmentation模块中计算的扫描器板中的�atheter mask between the current and the previous frame帮助进一步细化预测,以补做心动和呼吸运动。实验显示,我们的方法可以在与状态方法进行比较时达到45%或更高的检测和跟踪精度。
for: 本研究旨在 automatizethe generation of artistic character line drawings from photographs, 以探讨 Vector Flow Aware and Line Controllable Image-to-Image Translation 架构的可能性。
methods: 我们首先提出了 Image-to-Flow 网络(I2FNet),用于有效地和稳定地生成 vector flow field,并提供了一个方向导航 для绘制线条。然后,我们引入了 Double Flow Generator(DFG)框架,用于融合 vector flow 和输入图像流的特征,以保证流程中的空间一致性。此外,我们还添加了一个 Line Control Matrix(LCM),以便控制绘制的线条的粗细、 глад度和连续性。
results: 我们的方法可以在高分辨率的图像中生成高质量的人物线 Drawing 图像,并且可以控制绘制的线条的特征。我们的实验结果表明,我们的方法可以在量和质量两个方面具有优于其他方法。Abstract
In this paper, we investigate the problem of automatically controllable artistic character line drawing generation from photographs by proposing a Vector Flow Aware and Line Controllable Image-to-Image Translation architecture, which can be viewed as an appealing intersection between Artificial Intelligence and Arts. Specifically, we first present an Image-to-Flow network (I2FNet) to efficiently and robustly create the vector flow field in a learning-based manner, which can provide a direction guide for drawing lines. Then, we introduce our well-designed Double Flow Generator (DFG) framework to fuse features from learned vector flow and input image flow guaranteeing the spatial coherence of lines. Meanwhile, in order to allow for controllable character line drawing generation, we integrate a Line Control Matrix (LCM) into DFG and train a Line Control Regressor (LCR) to synthesize drawings with different styles by elaborately controlling the level of details, such as thickness, smoothness, and continuity, of lines. Finally, we design a Fourier Transformation Loss to further constrain the character line generation from the frequency domain view of the point. Quantitative and qualitative experiments demonstrate that our approach can obtain superior performance in producing high-resolution character line-drawing images with perceptually realistic characteristics.
摘要
在这篇论文中,我们研究了自动控制的艺术人物线 drawing 生成从照片的问题,我们提出了一种 Vector Flow Aware 和 Line Controllable Image-to-Image Translation 架构,可以视为人工智能和艺术之间的吸引人交叉点。 Specifically,我们首先提出了一个 Image-to-Flow 网络 (I2FNet),可以高效和稳定地生成学习基于的vector flow场,这可以提供笔画的方向指南。然后,我们介绍了我们妙心设计的 Double Flow Generator (DFG) 框架,将学习的vector flow和输入图像流集成,以保证图像线的空间一致性。此外,为了允许可控制的人物线 drawing 生成,我们将一个 Line Control Matrix (LCM) интегрирован到 DFG 中,并训练一个 Line Control Regressor (LCR),通过精心控制线的细粒度、平滑度和连续性来 synthesize 不同风格的笔画。最后,我们设计了一种 Fourier Transformation Loss,从frequency domain的角度来约束 character line generation。量化和质量实验表明,我们的方法可以在生成高分辨率的人物线 drawing 图像时获得优秀的表现,具有人工智能和艺术的实际特征。
LEST: Large-scale LiDAR Semantic Segmentation with Transformer
methods: 该论文提出了一种基于Transformer架构的LiDARsemantic Segmentation方法,包括两个新组件:Space Filling Curve(SFC)Grouping策略和Distance-based Cosine Linear Transformer(DISCO)。
results: 在公共nuScenes semantic segmentation验证集和SemanticKITTI测试集上,该模型超过了所有其他状态时的方法。Abstract
Large-scale LiDAR-based point cloud semantic segmentation is a critical task in autonomous driving perception. Almost all of the previous state-of-the-art LiDAR semantic segmentation methods are variants of sparse 3D convolution. Although the Transformer architecture is becoming popular in the field of natural language processing and 2D computer vision, its application to large-scale point cloud semantic segmentation is still limited. In this paper, we propose a LiDAR sEmantic Segmentation architecture with pure Transformer, LEST. LEST comprises two novel components: a Space Filling Curve (SFC) Grouping strategy and a Distance-based Cosine Linear Transformer, DISCO. On the public nuScenes semantic segmentation validation set and SemanticKITTI test set, our model outperforms all the other state-of-the-art methods.
摘要
大规模 LiDAR 基于点云Semantic segmentation 是自动驾驶感知中的关键任务。大多数前一代状态对 LiDAR semantic segmentation 方法都是稀疏 3D 卷积的变种。虽然 Transformer 架构在自然语言处理和 2D 计算机视觉领域变得越来越流行,但它在大规模点云Semantic segmentation 领域的应用仍然很有限。在本文中,我们提出了 LiDAR Semantic Segmentation 架构,即 LEST,其包括两个新的组成部分:空间填充曲线(SFC)分组策略和距离基于 косину斯线性变换(DISCO)。在公共 nuScenes semantic segmentation 验证集和 SemanticKITTI 测试集上,我们的模型超过所有其他前一代方法的性能。
PiTL: Cross-modal Retrieval with Weakly-supervised Vision-language Pre-training via Prompting
results: 这个研究发现,使用PiTL生成的对话 pairs可以对VLP进行强化,并且需要更少的超vision。实验结果显示,使用PiTL-生成的对话 pairs,VLP模型在图像到文本(I2T)和文本到图像(T2I)搜寻任务上表现更好,并且需要更少的超vision。这些结果显示PiTL-生成的对话 pairs 具有优秀的效用性。Abstract
Vision-language (VL) Pre-training (VLP) has shown to well generalize VL models over a wide range of VL downstream tasks, especially for cross-modal retrieval. However, it hinges on a huge amount of image-text pairs, which requires tedious and costly curation. On the contrary, weakly-supervised VLP (W-VLP) explores means with object tags generated by a pre-trained object detector (OD) from images. Yet, they still require paired information, i.e. images and object-level annotations, as supervision to train an OD. To further reduce the amount of supervision, we propose Prompts-in-The-Loop (PiTL) that prompts knowledge from large language models (LLMs) to describe images. Concretely, given a category label of an image, e.g. refinery, the knowledge, e.g. a refinery could be seen with large storage tanks, pipework, and ..., extracted by LLMs is used as the language counterpart. The knowledge supplements, e.g. the common relations among entities most likely appearing in a scene. We create IN14K, a new VL dataset of 9M images and 1M descriptions of 14K categories from ImageNet21K with PiTL. Empirically, the VL models pre-trained with PiTL-generated pairs are strongly favored over other W-VLP works on image-to-text (I2T) and text-to-image (T2I) retrieval tasks, with less supervision. The results reveal the effectiveness of PiTL-generated pairs for VLP.
摘要
视觉语言(VL)预训练(VLP)已经表现出在各种视觉下渠道任务上广泛适用,特别是跨Modal Retrieval。然而,它需要大量的图像文本对,需要辛苦和成本的筹集。相反,弱类视觉预训练(W-VLP)探索了通过图像检测器(OD)生成的对象标签来预训练VL模型。然而,它们仍然需要对的图像和对象级别注释来训练OD。 为了进一步减少监督,我们提出了Prompts-in-The-Loop(PiTL),它使用大型语言模型(LLM)来描述图像。具体来说,给定一个图像的类别标签,例如“炼厂”,我们使用LLM提取的知识,例如“炼厂可能出现的大容器、管道和...”,作为语言对应。这种知识补充了图像中可能出现的常见关系,例如场景中的实体之间的共同关系。我们创建了IN14K dataset,包含900万个图像和100万个描述,分别来自ImageNet21K dataset。我们的实验表明,使用PiTL生成的对应对在图像到文本(I2T)和文本到图像(T2I)检索任务上具有更高的性能,并且需要更少的监督。这些结果表明PiTL生成对的有效性。
paper_authors: Kaiwen Cai, Chris Xiaoxuan Lu, Xingyu Zhao, Xiaowei Huang
for: 提高图像检索精度和可靠性
methods: 应用不确定性评估技术,提供可靠性 guarantees
results: 实现图像检索集的覆盖保证,提高检索精度和可靠性Abstract
Most image retrieval research focuses on improving predictive performance, ignoring scenarios where the reliability of the prediction is also crucial. Uncertainty quantification technique can be applied to mitigate this issue by assessing uncertainty for retrieval sets, but it can provide only a heuristic estimate of uncertainty rather than a guarantee. To address these limitations, we present Risk Controlled Image Retrieval (RCIR), which generates retrieval sets with coverage guarantee, i.e., retrieval sets that are guaranteed to contain the true nearest neighbors with a predefined probability. RCIR can be easily integrated with existing uncertainty-aware image retrieval systems, agnostic to data distribution and model selection. To the best of our knowledge, this is the first work that provides coverage guarantees for image retrieval. The validity and efficiency of RCIR are demonstrated on four real-world image retrieval datasets: Stanford CAR-196, CUB-200, Pittsburgh and ChestX-Det.
摘要
大多数图像检索研究强调改进预测性能,忽略了预测可靠性的场景。不精准量化技术可以应用于 mitigate这一问题,评估检索集的不确定性,但它只能提供一个启示性的不确定性估计,而不是一个保证。为解决这些局限性,我们提出了风险控制图像检索(RCIR),生成具有覆盖保证的检索集,即在先定概率下包含真正的最近邻居。RCIR可以与现有的不确定性感知图像检索系统集成,不受数据分布和模型选择的影响。据我们所知,这是首次为图像检索提供了覆盖保证。我们在四个真实世界图像检索数据集上 validate和demonstrate RCIR 的有效性和高效性:Stanford CAR-196、CUB-200、Pittsburgh和ChestX-Det。
results: 实验结果表明,使用提议的粗略路由方法可以在 Mnist、smallnorb 和 affNist datasets 上达到类比性的分类性能,即准确率为 99.52%、93.91% 和 89.02% соответственно。此外,使用注意力采用权值的方法可以减少计算量,比EM routing 减少计算量1.42倍和2.5倍。这些发现有助于提高高效精度的层次模式表示模型。Abstract
This study introduces "shortcut routing," a novel routing mechanism in capsule networks that addresses computational inefficiencies by directly activating global capsules from local capsules, eliminating intermediate layers. An attention-based approach with fuzzy coefficients is also explored for improved efficiency. Experimental results on Mnist, smallnorb, and affNist datasets show comparable classification performance, achieving accuracies of 99.52%, 93.91%, and 89.02% respectively. The proposed fuzzy-based and attention-based routing methods significantly reduce the number of calculations by 1.42 and 2.5 times compared to EM routing, highlighting their computational advantages in capsule networks. These findings contribute to the advancement of efficient and accurate hierarchical pattern representation models.
摘要
Simplified Chinese:这个研究提出了一种名为"快捷路由"的新的路由机制,用于减少卷积网络中的计算复杂性。这种机制直接从本地卷积到全局卷积的方式活动,从中间层除去了。此外,还探索了一种基于注意力的方法,使用杂度系数进行补做。实验结果表明,在Mnist、smallnorb和affNist dataset上,这种方法可以达到99.52%、93.91%和89.02%的分类精度,并且与EM路由相比,具有1.42和2.5倍的计算优势。这些发现将对快速和准确的层次特征表示模型的发展做出贡献。
SynTable: A Synthetic Data Generation Pipeline for Unseen Object Amodal Instance Segmentation of Cluttered Tabletop Scenes
methods: 这篇论文使用了NVIDIA Isaac Sim Replicator Composer生成高质量的人工数据集,并可以根据用户的需求自动生成metadata,如模式和无模式实例分割masks,遮盲masks,深度地图, bounding box 和材料属性。
results: 这篇论文的实验结果表明,使用SynTable生成的数据集可以在Sim-to-Real转移中提高模型的性能,并且与OSD-Amodal数据集的评价相当。Abstract
In this work, we present SynTable, a unified and flexible Python-based dataset generator built using NVIDIA's Isaac Sim Replicator Composer for generating high-quality synthetic datasets for unseen object amodal instance segmentation of cluttered tabletop scenes. Our dataset generation tool can render a complex 3D scene containing object meshes, materials, textures, lighting, and backgrounds. Metadata, such as modal and amodal instance segmentation masks, occlusion masks, depth maps, bounding boxes, and material properties, can be generated to automatically annotate the scene according to the users' requirements. Our tool eliminates the need for manual labeling in the dataset generation process while ensuring the quality and accuracy of the dataset. In this work, we discuss our design goals, framework architecture, and the performance of our tool. We demonstrate the use of a sample dataset generated using SynTable by ray tracing for training a state-of-the-art model, UOAIS-Net. The results show significantly improved performance in Sim-to-Real transfer when evaluated on the OSD-Amodal dataset. We offer this tool as an open-source, easy-to-use, photorealistic dataset generator for advancing research in deep learning and synthetic data generation.
摘要
在这项工作中,我们介绍SynTable,一个统一和灵活的Python基础的数据生成工具,使用NVIDIA的Isaac Sim Replicator Composer生成高质量的synthetic数据集 для未经看过的对象模式分割。我们的数据生成工具可以渲染复杂的3D场景,包括物体模型、材料、Texture、照明和背景。用户可以根据需要生成metadata,如Modal和Amodal实例分割mask、 occlusion mask、深度地图、 bounding box 和材料属性,以自动标注场景。我们的工具消除了手动标注数据生成过程中的需求,同时保证数据的质量和准确性。在这项工作中,我们讨论我们的设计目标、框架体系和工具的性能。我们示例了使用SynTable生成的样本数据集,通过投影法训练State-of-the-art模型UOAIS-Net。结果显示在Sim-to-Real传输中,SynTable生成的数据集在OSD-Amodal数据集上得到了显著改善的性能。我们将这个工具作为开源、易用、 photorealistic数据生成工具,为深度学习和synthetic数据生成的研究提供支持。
methods: 我们引入了HEAL-SWIN transformer,它将astrophysics和cosmology中使用的高度均匀的Hierarchical Equal Area iso-Latitude Pixelation(HEALPix)网格和层次Shifted-Window(SWIN)变换器结合,实现了高效和灵活的模型,能够在高分辨率、扭曲无的球面数据上训练。在HEAL-SWIN中,HEALPix网格的嵌套结构用于Swin transformer的覆盖和窗口操作,导致一个具有最小计算开销的一维表示方式。
results: 我们在semantic segmentation和depth regression任务上使用HEAL-SWIN模型在真实和 sintetic汽车数据集上达到了superior性能。我们的代码可以在https://github.com/JanEGerken/HEAL-SWIN中下载。Abstract
High-resolution wide-angle fisheye images are becoming more and more important for robotics applications such as autonomous driving. However, using ordinary convolutional neural networks or vision transformers on this data is problematic due to projection and distortion losses introduced when projecting to a rectangular grid on the plane. We introduce the HEAL-SWIN transformer, which combines the highly uniform Hierarchical Equal Area iso-Latitude Pixelation (HEALPix) grid used in astrophysics and cosmology with the Hierarchical Shifted-Window (SWIN) transformer to yield an efficient and flexible model capable of training on high-resolution, distortion-free spherical data. In HEAL-SWIN, the nested structure of the HEALPix grid is used to perform the patching and windowing operations of the SWIN transformer, resulting in a one-dimensional representation of the spherical data with minimal computational overhead. We demonstrate the superior performance of our model for semantic segmentation and depth regression tasks on both synthetic and real automotive datasets. Our code is available at https://github.com/JanEGerken/HEAL-SWIN.
摘要
高分辨率宽角鱼眼图像在自动驾驶等机器人应用中日益重要。然而,使用普通的卷积神经网络或视Transformer在这些数据上进行训练存在投影和扭曲损失的问题。我们介绍了HEAL-SWIN transformer,它将astrophysics和cosmology中使用的高度均匀的 Hierarchical Equal Area iso-Latitude Pixelation(HEALPix)网格与层次分割(SWIN)转换器结合,实现高效可扩展的模型,能够在高分辨率、扭曲free的球面数据上进行训练。在HEAL-SWIN中,HEALPix网格的嵌套结构用于批处和窗口操作,从而实现了一个具有最小计算开销的一维表示方式。我们在 sintetic和实际汽车数据集上进行Semantic segmentation和深度回归任务的实验,并证明了我们的模型的优秀表现。代码可以在https://github.com/JanEGerken/HEAL-SWIN中找到。
3D Shape-Based Myocardial Infarction Prediction Using Point Cloud Classification Networks
results: 研究人员对UK Biobank数据集中的1068名参与者进行了预现性心肺疾病检测和新生心肺疾病预测任务,并发现了与临床标准相比的 ~13% 和 ~5% 的提升。此外,他们还分析了每个肺脏和卡丁阶段对3D形态基于的心肺疾病检测的作用,并进行了心肺疾病结果的视觉分析。Abstract
Myocardial infarction (MI) is one of the most prevalent cardiovascular diseases with associated clinical decision-making typically based on single-valued imaging biomarkers. However, such metrics only approximate the complex 3D structure and physiology of the heart and hence hinder a better understanding and prediction of MI outcomes. In this work, we investigate the utility of complete 3D cardiac shapes in the form of point clouds for an improved detection of MI events. To this end, we propose a fully automatic multi-step pipeline consisting of a 3D cardiac surface reconstruction step followed by a point cloud classification network. Our method utilizes recent advances in geometric deep learning on point clouds to enable direct and efficient multi-scale learning on high-resolution surface models of the cardiac anatomy. We evaluate our approach on 1068 UK Biobank subjects for the tasks of prevalent MI detection and incident MI prediction and find improvements of ~13% and ~5% respectively over clinical benchmarks. Furthermore, we analyze the role of each ventricle and cardiac phase for 3D shape-based MI detection and conduct a visual analysis of the morphological and physiological patterns typically associated with MI outcomes.
摘要
乳心细胞损伤(MI)是冠arca疾病中最常见的一种,临床决策通常基于单一的影像生物标志物。然而,这些指标只是心脏三维结构和生理的一种简化表达,因此难以更好地理解和预测MI结果。在这个工作中,我们研究了基于完整的三维心脏形状的点云的可用性,以提高MI事件的检测。我们提出了一种完全自动的多步骤管道,包括三维心脏表面重建步骤和点云分类网络。我们的方法利用了最新的点云深度学习的进步,以实现直接和高效地多尺度学习高分辨率的心脏 анатомиче模型。我们对UK Biobank数据集上的1068名参与者进行了MI检测和未来MI预测两个任务,并发现我们的方法与临床标准准确率相比提高了约13%和5%。此外,我们还分析了每个肺和卡ди亚阶段对于三维形状基于MI检测的作用,并进行了MI结果常见的 morphological和生理特征的视觉分析。
Sampling-Priors-Augmented Deep Unfolding Network for Robust Video Compressive Sensing
paper_authors: Yuhao Huang, Gangrong Qu, Youran Ge
For: 高速场景记录和低帧率传感器* Methods: 使用Sampling-Priors-Augmented Deep Unfolding Network (SPA-DUN) 进行高效和可靠的 видео压缩感知重建* Results: 在 simulations 和实际数据集上,SPA-DUN 可以处理多种采样设置,并 achieve state-of-the-art 性能与极高效率。Abstract
Video Compressed Sensing (VCS) aims to reconstruct multiple frames from one single captured measurement, thus achieving high-speed scene recording with a low-frame-rate sensor. Although there have been impressive advances in VCS recently, those state-of-the-art (SOTA) methods also significantly increase model complexity and suffer from poor generality and robustness, which means that those networks need to be retrained to accommodate the new system. Such limitations hinder the real-time imaging and practical deployment of models. In this work, we propose a Sampling-Priors-Augmented Deep Unfolding Network (SPA-DUN) for efficient and robust VCS reconstruction. Under the optimization-inspired deep unfolding framework, a lightweight and efficient U-net is exploited to downsize the model while improving overall performance. Moreover, the prior knowledge from the sampling model is utilized to dynamically modulate the network features to enable single SPA-DUN to handle arbitrary sampling settings, augmenting interpretability and generality. Extensive experiments on both simulation and real datasets demonstrate that SPA-DUN is not only applicable for various sampling settings with one single model but also achieves SOTA performance with incredible efficiency.
摘要
Implicit Neural Feature Fusion Function for Multispectral and Hyperspectral Image Fusion
paper_authors: ShangQi Deng, RuoCheng Wu, Liang-Jian Deng, Ran Ran, Tai-Xiang Jiang for: 这个论文的目的是解决多spectral和 hyperspectral图像融合问题,即将高分辨率多spectral图像(HR-MSI)和低分辨率 hyperspectral图像(LR-HSI)融合成高分辨率 hyperspectral图像(HR-HSI)。methods: 本论文提出了一种基于Implicit Neural Representation(INR)的方法,称为Implicit Neural Feature Fusion Function(INF),它利用HR-MSI作为高频率细节辅助输入,并通过 dual high-frequency fusion(DHFF)结构和cosine similarity(INR-CS)来协调多modal特征。results: 根据实验结果,提出的INFN网络在两个公共数据集上(CAVE和Harvard)上达到了当前最佳性能。Abstract
Multispectral and Hyperspectral Image Fusion (MHIF) is a practical task that aims to fuse a high-resolution multispectral image (HR-MSI) and a low-resolution hyperspectral image (LR-HSI) of the same scene to obtain a high-resolution hyperspectral image (HR-HSI). Benefiting from powerful inductive bias capability, CNN-based methods have achieved great success in the MHIF task. However, they lack certain interpretability and require convolution structures be stacked to enhance performance. Recently, Implicit Neural Representation (INR) has achieved good performance and interpretability in 2D tasks due to its ability to locally interpolate samples and utilize multimodal content such as pixels and coordinates. Although INR-based approaches show promise, they require extra construction of high-frequency information (\emph{e.g.,} positional encoding). In this paper, inspired by previous work of MHIF task, we realize that HR-MSI could serve as a high-frequency detail auxiliary input, leading us to propose a novel INR-based hyperspectral fusion function named Implicit Neural Feature Fusion Function (INF). As an elaborate structure, it solves the MHIF task and addresses deficiencies in the INR-based approaches. Specifically, our INF designs a Dual High-Frequency Fusion (DHFF) structure that obtains high-frequency information twice from HR-MSI and LR-HSI, then subtly fuses them with coordinate information. Moreover, the proposed INF incorporates a parameter-free method named INR with cosine similarity (INR-CS) that uses cosine similarity to generate local weights through feature vectors. Based on INF, we construct an Implicit Neural Fusion Network (INFN) that achieves state-of-the-art performance for MHIF tasks of two public datasets, \emph{i.e.,} CAVE and Harvard. The code will soon be made available on GitHub.
摘要
多спектраль和高spectral图像融合(MHIF)是一个实用的任务,旨在将高分辨率多спектраль图像(HR-MSI)和低分辨率高spectral图像(LR-HSI)的同一场景图像融合为获得高分辨率高spectral图像(HR-HSI)。由于强大的推导偏好能力,基于CNN的方法在MHIF任务中取得了很大的成功。然而,它们缺乏一定的解释性和需要堆叠 convolution 结构来提高性能。在最近的几年,基于INR的方法在2D任务中取得了良好的性能和解释性,因为它可以地方 interpolate samples和利用多Modal的内容,如像素和坐标。虽然INR基于的方法表现良好,但它们需要额外建立高频信息(例如, pozitional encoding)。在这篇论文中,我们受到MHIF任务的启发,认为HR-MSI可以作为高频细节助手输入,这引导我们提出了一种基于INR的新的高spectral融合函数,即偏导内在特征融合函数(INF)。作为一种复杂的结构,INF解决了MHIF任务和INR基于方法的缺陷。具体来说,我们的INF实现了双高频融合(DHFF)结构,从HR-MSI和LR-HSI中获取高频信息两次,然后细致地融合它们与坐标信息。此外,我们的INF还包括一种无参数的方法,即INR WITH cosine similarity(INR-CS),使用cosine similarity来生成本地权重通过特征向量。基于INF,我们建立了一个偏导内在融合网络(INFN),实现了MHIF任务的最新状态。代码即将在GitHub上公布。
Cloud Detection in Multispectral Satellite Images Using Support Vector Machines With Quantum Kernels
paper_authors: Artur Miroszewski, Jakub Mielczarek, Filip Szczepanek, Grzegorz Czelusta, Bartosz Grabowski, Bertrand Le Saux, Jakub Nalepa for:* 这个论文是为了扩展 классические支持向量机(SVM),使其能够更好地处理卫星数据分类 зада务。methods:* 这个论文使用了量子kernel(QKE)过程和经典SVM训练方法结合,将像素数据映射到希尔伯特空间中。* 使用ZZ-特征地图将参数化 ansatz 状态映射到希尔伯特空间中,并优化参数以最大化kernel目标对齐。results:* 实验结果表明,使用模拟的hybrid SVM可以成功地分类卫星图像,并且与经典SVM的准确率相当。Abstract
Support vector machines (SVMs) are a well-established classifier effectively deployed in an array of pattern recognition and classification tasks. In this work, we consider extending classic SVMs with quantum kernels and applying them to satellite data analysis. The design and implementation of SVMs with quantum kernels (hybrid SVMs) is presented. It consists of the Quantum Kernel Estimation (QKE) procedure combined with a classic SVM training routine. The pixel data are mapped to the Hilbert space using ZZ-feature maps acting on the parameterized ansatz state. The parameters are optimized to maximize the kernel target alignment. We approach the problem of cloud detection in satellite image data, which is one of the pivotal steps in both on-the-ground and on-board satellite image analysis processing chains. The experiments performed over the benchmark Landsat-8 multispectral dataset revealed that the simulated hybrid SVM successfully classifies satellite images with accuracy on par with classic SVMs.
摘要
Frequency Domain Adversarial Training for Robust Volumetric Medical Segmentation
results: 该论文通过实验表明,使用频域对抗训练可以提高模型对于不同类型的攻击的抗击能力,并且可以保持模型在清洁和攻击样本上的性能之间取得一个更好的平衡。Abstract
It is imperative to ensure the robustness of deep learning models in critical applications such as, healthcare. While recent advances in deep learning have improved the performance of volumetric medical image segmentation models, these models cannot be deployed for real-world applications immediately due to their vulnerability to adversarial attacks. We present a 3D frequency domain adversarial attack for volumetric medical image segmentation models and demonstrate its advantages over conventional input or voxel domain attacks. Using our proposed attack, we introduce a novel frequency domain adversarial training approach for optimizing a robust model against voxel and frequency domain attacks. Moreover, we propose frequency consistency loss to regulate our frequency domain adversarial training that achieves a better tradeoff between model's performance on clean and adversarial samples. Code is publicly available at https://github.com/asif-hanif/vafa.
摘要
必须确保深度学习模型在重要应用领域如医疗中的稳定性。 although recent advances in deep learning have improved the performance of volumetric medical image segmentation models, these models cannot be deployed for real-world applications immediately due to their vulnerability to adversarial attacks. we present a 3D frequency domain adversarial attack for volumetric medical image segmentation models and demonstrate its advantages over conventional input or voxel domain attacks. using our proposed attack, we introduce a novel frequency domain adversarial training approach for optimizing a robust model against voxel and frequency domain attacks. Moreover, we propose frequency consistency loss to regulate our frequency domain adversarial training that achieves a better tradeoff between model's performance on clean and adversarial samples. code is publicly available at https://github.com/asif-hanif/vafa.Here's the text in Traditional Chinese:必须确保深度学习模型在重要应用领域如医疗中的稳定性。 although recent advances in deep learning have improved the performance of volumetric medical image segmentation models, these models cannot be deployed for real-world applications immediately due to their vulnerability to adversarial attacks. we present a 3D frequency domain adversarial attack for volumetric medical image segmentation models and demonstrate its advantages over conventional input or voxel domain attacks. using our proposed attack, we introduce a novel frequency domain adversarial training approach for optimizing a robust model against voxel and frequency domain attacks. Moreover, we propose frequency consistency loss to regulate our frequency domain adversarial training that achieves a better tradeoff between model's performance on clean and adversarial samples. code is publicly available at https://github.com/asif-hanif/vafa.
cOOpD: Reformulating COPD classification on chest CT scans as anomaly detection using contrastive representations
paper_authors: Silvia D. Almeida, Carsten T. Lüth, Tobias Norajitra, Tassilo Wald, Marco Nolden, Paul F. Jaeger, Claus P. Heussel, Jürgen Biederer, Oliver Weinheimer, Klaus Maier-Hein
results: 研究结果显示,这个方法可以实现肺病分类的高精确性和高效率,并且可以提供明确的肺部频谱像 anomaly map 和病人级别得分,对于肺病的早期诊断和监测提供了有价的帮助。Abstract
Classification of heterogeneous diseases is challenging due to their complexity, variability of symptoms and imaging findings. Chronic Obstructive Pulmonary Disease (COPD) is a prime example, being underdiagnosed despite being the third leading cause of death. Its sparse, diffuse and heterogeneous appearance on computed tomography challenges supervised binary classification. We reformulate COPD binary classification as an anomaly detection task, proposing cOOpD: heterogeneous pathological regions are detected as Out-of-Distribution (OOD) from normal homogeneous lung regions. To this end, we learn representations of unlabeled lung regions employing a self-supervised contrastive pretext model, potentially capturing specific characteristics of diseased and healthy unlabeled regions. A generative model then learns the distribution of healthy representations and identifies abnormalities (stemming from COPD) as deviations. Patient-level scores are obtained by aggregating region OOD scores. We show that cOOpD achieves the best performance on two public datasets, with an increase of 8.2% and 7.7% in terms of AUROC compared to the previous supervised state-of-the-art. Additionally, cOOpD yields well-interpretable spatial anomaly maps and patient-level scores which we show to be of additional value in identifying individuals in the early stage of progression. Experiments in artificially designed real-world prevalence settings further support that anomaly detection is a powerful way of tackling COPD classification.
摘要
临床疾病分类是一项复杂的任务,因为疾病的复杂性、症状的多样性以及图像观察结果的不确定性。 chronic obstructive pulmonary disease (COPD) 是一个典型的例子,它 Despite being the third leading cause of death, it is often underdiagnosed due to its sparse, diffuse, and heterogeneous appearance on computed tomography (CT) scans. To address this challenge, we propose a novel approach called cOOpD, which reforms COPD binary classification as an anomaly detection task.In cOOpD, we first learn representations of unlabeled lung regions using a self-supervised contrastive pretext task, which captures specific characteristics of both healthy and diseased regions. We then use a generative model to learn the distribution of healthy representations and identify abnormalities (stemming from COPD) as deviations from this distribution. Patient-level scores are obtained by aggregating region-level out-of-distribution (OOD) scores.We evaluate cOOpD on two public datasets and show that it achieves the best performance compared to previous supervised state-of-the-art methods, with an increase of 8.2% and 7.7% in terms of area under the receiver operating characteristic curve (AUROC). Additionally, cOOpD provides well-interpretable spatial anomaly maps and patient-level scores, which can be used to identify individuals in the early stage of progression.In artificially designed real-world prevalence settings, we also demonstrate that anomaly detection is a powerful way of tackling COPD classification.
Knowledge Boosting: Rethinking Medical Contrastive Vision-Language Pre-Training
results: 在8个任务中,包括分类、 segmentation、检索和semantic relatedness等,与零shot或几shot设置相当或更好的表现Abstract
The foundation models based on pre-training technology have significantly advanced artificial intelligence from theoretical to practical applications. These models have facilitated the feasibility of computer-aided diagnosis for widespread use. Medical contrastive vision-language pre-training, which does not require human annotations, is an effective approach for guiding representation learning using description information in diagnostic reports. However, the effectiveness of pre-training is limited by the large-scale semantic overlap and shifting problems in medical field. To address these issues, we propose the Knowledge-Boosting Contrastive Vision-Language Pre-training framework (KoBo), which integrates clinical knowledge into the learning of vision-language semantic consistency. The framework uses an unbiased, open-set sample-wise knowledge representation to measure negative sample noise and supplement the correspondence between vision-language mutual information and clinical knowledge. Extensive experiments validate the effect of our framework on eight tasks including classification, segmentation, retrieval, and semantic relatedness, achieving comparable or better performance with the zero-shot or few-shot settings. Our code is open on https://github.com/ChenXiaoFei-CS/KoBo.
摘要
基于预训技术的基金会模型已经有效提高了人工智能的理论和实践应用。这些模型使得计算机辅助诊断的广泛应用变得可能。医疗对视语言预训,不需要人工注释,是一种有效的方法来引导表示学习使用描述信息。然而,预训的效果受到医疗领域的大规模semantic overlap和shift问题的限制。为了解决这些问题,我们提出了知识扩充对视语言预训框架(KoBo),该框架将临床知识integrated到视语言semantic consistency的学习中。该框架使用无偏点、开放集样本知识表示来度量负样本噪声,并补充视语言共 informations和临床知识之间的对应关系。我们进行了广泛的实验,证明了我们的框架在八个任务中,包括分类、 segmentation、retrieval和semantic relatedness等,可以达到零 instances或少 instances的设置下的相当或更好的性能。我们的代码可以在https://github.com/ChenXiaoFei-CS/KoBo 中找到。
FreeCOS: Self-Supervised Learning from Fractals and Unlabeled Images for Curvilinear Object Segmentation
paper_authors: Tianyi Shi, Xiaohuan Ding, Liang Zhang, Xin Yang for:自动 segmentation of curvilinear objects methods:使用 fractal-based synthesis 和 geometric information alignmentresults:超越现有的无监督方法和领域适应方法,在四个公共数据集上(XCAD、DRIVE、STARE和CrackTree)实现了高度的性能提升。I hope that helps! Let me know if you have any other questions.Abstract
Curvilinear object segmentation is critical for many applications. However, manually annotating curvilinear objects is very time-consuming and error-prone, yielding insufficiently available annotated datasets for existing supervised methods and domain adaptation methods. This paper proposes a self-supervised curvilinear object segmentation method that learns robust and distinctive features from fractals and unlabeled images (FreeCOS). The key contributions include a novel Fractal-FDA synthesis (FFS) module and a geometric information alignment (GIA) approach. FFS generates curvilinear structures based on the parametric Fractal L-system and integrates the generated structures into unlabeled images to obtain synthetic training images via Fourier Domain Adaptation. GIA reduces the intensity differences between the synthetic and unlabeled images by comparing the intensity order of a given pixel to the values of its nearby neighbors. Such image alignment can explicitly remove the dependency on absolute intensity values and enhance the inherent geometric characteristics which are common in both synthetic and real images. In addition, GIA aligns features of synthetic and real images via the prediction space adaptation loss (PSAL) and the curvilinear mask contrastive loss (CMCL). Extensive experimental results on four public datasets, i.e., XCAD, DRIVE, STARE and CrackTree demonstrate that our method outperforms the state-of-the-art unsupervised methods, self-supervised methods and traditional methods by a large margin. The source code of this work is available at https://github.com/TY-Shi/FreeCOS.
摘要
Curvilinear对象 segmentation是许多应用程序的关键。然而,手动标注curvilinear对象是非常时间consuming和容易出错,导致现有的超级vised方法和领域适应方法的可用数据不足。这篇论文提出了一种无监督的curvilinear对象 segmentation方法,该方法可以从自然语言和无标签图像中学习抗衡性和特征特征(FreeCOS)。本论文的主要贡献包括一种新的FFS模块和一种几何信息对接(GIA)方法。FFS使用 Parametric Fractal L-system生成curvilinear结构,并将生成的结构与无标签图像集成,以获得synthetic训练图像via Fourier Domain Adaptation。GIA通过比较给定像素的INTENSITY值与其邻近 neighbors的INTENSITY值,来降低synthetic和真实图像之间的INTENSITY值差异。这种图像对接可以明确地去除绝对INTENSITY值的依赖关系,并强调图像的内在几何特征。此外,GIA还使用PSAL和CMCL来对实际和synthetic图像的特征进行对接。我们在四个公共数据集(XCAD、DRIVE、STARE和CrackTree)进行了广泛的实验,并证明了我们的方法在无监督方法、自监督方法和传统方法之上具有显著的优势。 FreeCOS代码可以在https://github.com/TY-Shi/FreeCOS上获取。
MaxSR: Image Super-Resolution Using Improved MaxViT
results: 我们实现了两种不同的模型:классиical单张图像超解模型(MaxSR)和轻量级单张图像超解模型(MaxSR-light)。实验显示,我们的MaxSR和MaxSR-light在效率上创造了新的状态对。Abstract
While transformer models have been demonstrated to be effective for natural language processing tasks and high-level vision tasks, only a few attempts have been made to use powerful transformer models for single image super-resolution. Because transformer models have powerful representation capacity and the in-built self-attention mechanisms in transformer models help to leverage self-similarity prior in input low-resolution image to improve performance for single image super-resolution, we present a single image super-resolution model based on recent hybrid vision transformer of MaxViT, named as MaxSR. MaxSR consists of four parts, a shallow feature extraction block, multiple cascaded adaptive MaxViT blocks to extract deep hierarchical features and model global self-similarity from low-level features efficiently, a hierarchical feature fusion block, and finally a reconstruction block. The key component of MaxSR, i.e., adaptive MaxViT block, is based on MaxViT block which mixes MBConv with squeeze-and-excitation, block attention and grid attention. In order to achieve better global modelling of self-similarity in input low-resolution image, we improve block attention and grid attention in MaxViT block to adaptive block attention and adaptive grid attention which do self-attention inside each window across all grids and each grid across all windows respectively in the most efficient way. We instantiate proposed model for classical single image super-resolution (MaxSR) and lightweight single image super-resolution (MaxSR-light). Experiments show that our MaxSR and MaxSR-light establish new state-of-the-art performance efficiently.
摘要
transformer模型已经在自然语言处理任务和高级视觉任务上得到了证明,但只有一些尝试使用强大的transformer模型进行单图超解像。因为transformer模型具有强大的表示能力和内置的自注意机制,可以借助输入低分辨率图像中的自相似性前来提高单图超解像性能。为此,我们提出了基于最近的hybrid视觉transformer(MaxViT)模型的单图超解像模型,名为MaxSR。MaxSR包括四部分:浅层特征提取块、多个彩色分割的adaptive MaxViT块、层次特征融合块和最后的重建块。MaxSR的关键组件是adaptive MaxViT块,它将MBConv与压缩激发、块注意力和网格注意力相结合。为了更好地全局地模型输入低分辨率图像中的自相似性,我们在MaxViT块中进行了改进,使其成为adaptive块注意力和adaptive网格注意力,可以在最有效的方式内进行自注意。我们实现了提议的模型,并对 классиical单图超解像(MaxSR)和lightweight单图超解像(MaxSR-light)进行实现。实验结果显示,我们的MaxSR和MaxSR-light都达到了高效的新状态码。
Source-Free Domain Adaptive Fundus Image Segmentation with Class-Balanced Mean Teacher
results: 实验表明,CBMT可以有效地解决这两个问题,并在多个标准推广上超越现有方法。Abstract
This paper studies source-free domain adaptive fundus image segmentation which aims to adapt a pretrained fundus segmentation model to a target domain using unlabeled images. This is a challenging task because it is highly risky to adapt a model only using unlabeled data. Most existing methods tackle this task mainly by designing techniques to carefully generate pseudo labels from the model's predictions and use the pseudo labels to train the model. While often obtaining positive adaption effects, these methods suffer from two major issues. First, they tend to be fairly unstable - incorrect pseudo labels abruptly emerged may cause a catastrophic impact on the model. Second, they fail to consider the severe class imbalance of fundus images where the foreground (e.g., cup) region is usually very small. This paper aims to address these two issues by proposing the Class-Balanced Mean Teacher (CBMT) model. CBMT addresses the unstable issue by proposing a weak-strong augmented mean teacher learning scheme where only the teacher model generates pseudo labels from weakly augmented images to train a student model that takes strongly augmented images as input. The teacher is updated as the moving average of the instantly trained student, which could be noisy. This prevents the teacher model from being abruptly impacted by incorrect pseudo-labels. For the class imbalance issue, CBMT proposes a novel loss calibration approach to highlight foreground classes according to global statistics. Experiments show that CBMT well addresses these two issues and outperforms existing methods on multiple benchmarks.
摘要
This paper proposes the Class-Balanced Mean Teacher (CBMT) model to address these issues. CBMT uses a weak-strong augmented mean teacher learning scheme, where the teacher model generates pseudo labels from weakly augmented images to train a student model that takes strongly augmented images as input. The teacher is updated as the moving average of the instantly trained student, which helps prevent the teacher model from being impacted by incorrect pseudo-labels. Additionally, CBMT proposes a novel loss calibration approach to highlight foreground classes according to global statistics, addressing the class imbalance issue.Experiments show that CBMT effectively addresses these issues and outperforms existing methods on multiple benchmarks.
Masked Autoencoders for Unsupervised Anomaly Detection in Medical Images
For: 本研究旨在检测医学成像中的异常现象,但不使用病理样本进行训练。* Methods: 我们提出使用伪装自动编码器模型来学习正常样本的结构,然后在伪装编码器的差分上训练一个异常分类器。我们使用正常扫描图像的重建结果作为负样本,而伪装模块修改正常扫描图像的一些区域的INTENSITY,作为正样本。* Results: 我们在BRATS2020和LUNA16两个医学成像数据集上进行了实验,并与四种 state-of-the-art 异常检测框架进行比较,分别是 AST、RD4AD、AnoVAEGAN 和 f-AnoGAN。Abstract
Pathological anomalies exhibit diverse appearances in medical imaging, making it difficult to collect and annotate a representative amount of data required to train deep learning models in a supervised setting. Therefore, in this work, we tackle anomaly detection in medical images training our framework using only healthy samples. We propose to use the Masked Autoencoder model to learn the structure of the normal samples, then train an anomaly classifier on top of the difference between the original image and the reconstruction provided by the masked autoencoder. We train the anomaly classifier in a supervised manner using as negative samples the reconstruction of the healthy scans, while as positive samples, we use pseudo-abnormal scans obtained via our novel pseudo-abnormal module. The pseudo-abnormal module alters the reconstruction of the normal samples by changing the intensity of several regions. We conduct experiments on two medical image data sets, namely BRATS2020 and LUNA16 and compare our method with four state-of-the-art anomaly detection frameworks, namely AST, RD4AD, AnoVAEGAN and f-AnoGAN.
摘要
医学影像中的疾病异常现象多样化,使得收集和标注充足数据来训练深度学习模型在指导下的情况困难。因此,在这项工作中,我们采用不同的方法来检测医学影像中的异常。我们提议使用Masked Autoencoder模型来学习正常样本的结构,然后在Masked Autoencoder的差异上训练异常分类器。我们在指导下训练异常分类器,使用正样本的重建为负样本,而使用我们新提出的假异常模块生成的 pseudo-异常样本作为正样本。假异常模块对正常样本的重建进行了一些区域的INTENSITY变化。我们在 BRATS2020和LUNA16两个医学影像数据集上进行了实验,并与四种现代异常检测框架进行了比较,namely AST, RD4AD, AnoVAEGAN和f-AnoGAN。
results: 重现后,挑战排名显著不同于原来的挑战,这表明挑战排名可能不具可重现性。Abstract
While clinical trials are the state-of-the-art methods to assess the effect of new medication in a comparative manner, benchmarking in the field of medical image analysis is performed by so-called challenges. Recently, comprehensive analysis of multiple biomedical image analysis challenges revealed large discrepancies between the impact of challenges and quality control of the design and reporting standard. This work aims to follow up on these results and attempts to address the specific question of the reproducibility of the participants methods. In an effort to determine whether alternative interpretations of the method description may change the challenge ranking, we reproduced the algorithms submitted to the 2019 Robust Medical Image Segmentation Challenge (ROBUST-MIS). The leaderboard differed substantially between the original challenge and reimplementation, indicating that challenge rankings may not be sufficiently reproducible.
摘要
在药物开发中,临床试验是当今最佳方法来评估新药的效果,而在医学影像分析领域,则通过 так称的挑战来进行比较性评估。最近,多个生物医学影像分析挑战的全面分析发现,挑战的影响和设计标准的质量控制存在大差异。这项工作的目标是跟进这些结果,并尝试解决特定问题,即参与者的方法是否可重复。为了确定参与者的方法是否具有可重复性,我们对2019年的稳定医学影像分类挑战(ROBUST-MIS)中提交的算法进行重新实现。结果发现,挑战排名存在很大的差异,这表明挑战排名可能不具有足够的可重复性。
Complementary Frequency-Varying Awareness Network for Open-Set Fine-Grained Image Recognition
results: 对于3种细致图像 dataset和2种粗致图像 dataset的实验结果表明,CFAN-OSFGR方法在大多数情况下与9种现有方法进行比较,表现出了显著的优势。Abstract
Open-set image recognition is a challenging topic in computer vision. Most of the existing works in literature focus on learning more discriminative features from the input images, however, they are usually insensitive to the high- or low-frequency components in features, resulting in a decreasing performance on fine-grained image recognition. To address this problem, we propose a Complementary Frequency-varying Awareness Network that could better capture both high-frequency and low-frequency information, called CFAN. The proposed CFAN consists of three sequential modules: (i) a feature extraction module is introduced for learning preliminary features from the input images; (ii) a frequency-varying filtering module is designed to separate out both high- and low-frequency components from the preliminary features in the frequency domain via a frequency-adjustable filter; (iii) a complementary temporal aggregation module is designed for aggregating the high- and low-frequency components via two Long Short-Term Memory networks into discriminative features. Based on CFAN, we further propose an open-set fine-grained image recognition method, called CFAN-OSFGR, which learns image features via CFAN and classifies them via a linear classifier. Experimental results on 3 fine-grained datasets and 2 coarse-grained datasets demonstrate that CFAN-OSFGR performs significantly better than 9 state-of-the-art methods in most cases.
摘要
“开放集image认识是计算机视觉领域中的一个挑战。大多数现有的文献works都是学习从输入图像中学习更有吸引力的特征,然而,它们通常对高频或低频组件不敏感,导致图像细化认识性能下降。为解决这个问题,我们提出了一种Complementary Frequency-varying Awareness Network(CFAN),可以更好地捕捉高频和低频信息。CFAN包括三个顺序模块:(i)一个特征提取模块,用于从输入图像中学习初步特征;(ii)一个频率域中的频率变化滤波器,用于在频率域中分离出高频和低频组件;(iii)一个 complementary temporal aggregation模块,用于在时间域中聚合高频和低频组件,并使用两个Long Short-Term Memory网络(LSTM)进行聚合。基于CFAN,我们进一步提出了一种开放集细化图像认识方法(CFAN-OSFGR),它通过CFAN学习图像特征,并使用一个线性分类器进行分类。实验结果表明,CFAN-OSFGR在3个细化图像 dataset和2个粗化图像 dataset上表现出色,在大多数情况下与9种现有方法进行比较,表现出 significatively better 的性能。”
Multimodal Motion Conditioned Diffusion Model for Skeleton-based Video Anomaly Detection
results: 在4个 benchmark 上表现出色,超过了现有的最佳ResultAbstract
Anomalies are rare and anomaly detection is often therefore framed as One-Class Classification (OCC), i.e. trained solely on normalcy. Leading OCC techniques constrain the latent representations of normal motions to limited volumes and detect as abnormal anything outside, which accounts satisfactorily for the openset'ness of anomalies. But normalcy shares the same openset'ness property since humans can perform the same action in several ways, which the leading techniques neglect. We propose a novel generative model for video anomaly detection (VAD), which assumes that both normality and abnormality are multimodal. We consider skeletal representations and leverage state-of-the-art diffusion probabilistic models to generate multimodal future human poses. We contribute a novel conditioning on the past motion of people and exploit the improved mode coverage capabilities of diffusion processes to generate different-but-plausible future motions. Upon the statistical aggregation of future modes, an anomaly is detected when the generated set of motions is not pertinent to the actual future. We validate our model on 4 established benchmarks: UBnormal, HR-UBnormal, HR-STC, and HR-Avenue, with extensive experiments surpassing state-of-the-art results.
摘要
异常现象是罕见的,因此异常检测通常被框架为一类分类(OCC),即只在正常情况下进行训练。现leading OCC技术限制了正常动作的latent表示形式为有限体积,检测其外部为异常。然而,正常情况也具有同样的开放性属性,因为人们可以通过不同的方式执行同一个动作,这些技术忽略了。我们提出了一种新的生成模型 для视频异常检测(VAD),它假设了正常和异常都是多模的。我们考虑了骨骼表示形式,并利用当今最佳的扩散概率模型来生成多模未来人体姿势。我们采用了一种新的 conditioning 方法,基于过去人体运动的统计聚合,并利用扩散过程的改进Mode覆盖能力来生成不同 yet plausible 的未来姿势。当生成的集合不符合实际未来时,我们识别出异常。我们验证了我们的模型在 4 个确立的标准准obenchmark 上,包括 UBnormal、HR-UBnormal、HR-STC 和 HR-Avenue,并进行了广泛的实验,超过了当前最佳的成果。
Volumetric Wireframe Parsing from Neural Attraction Fields
results: 在DTU和BlendedMVS数据集上进行了实验,取得了出色的表现。Abstract
The primal sketch is a fundamental representation in Marr's vision theory, which allows for parsimonious image-level processing from 2D to 2.5D perception. This paper takes a further step by computing 3D primal sketch of wireframes from a set of images with known camera poses, in which we take the 2D wireframes in multi-view images as the basis to compute 3D wireframes in a volumetric rendering formulation. In our method, we first propose a NEural Attraction (NEAT) Fields that parameterizes the 3D line segments with coordinate Multi-Layer Perceptrons (MLPs), enabling us to learn the 3D line segments from 2D observation without incurring any explicit feature correspondences across views. We then present a novel Global Junction Perceiving (GJP) module to perceive meaningful 3D junctions from the NEAT Fields of 3D line segments by optimizing a randomly initialized high-dimensional latent array and a lightweight decoding MLP. Benefitting from our explicit modeling of 3D junctions, we finally compute the primal sketch of 3D wireframes by attracting the queried 3D line segments to the 3D junctions, significantly simplifying the computation paradigm of 3D wireframe parsing. In experiments, we evaluate our approach on the DTU and BlendedMVS datasets with promising performance obtained. As far as we know, our method is the first approach to achieve high-fidelity 3D wireframe parsing without requiring explicit matching.
摘要
primal 绘制是马尔普视觉理论中的基本表示,允许从2D到2.5D的简洁图像处理。本文进一步计算基于多视图图像的知道摄像机位置的3D primal 绘制,其中我们将2D 绘制图像作为基础,计算3D 绘制图像在三维渲染中。我们首先提出了一种基于多层感知神经网络(MLP)的神经吸引(NEAT)场,以学习2D 绘制图像中的3D 线段。然后,我们提出了一种全球缝合(GJP)模块,用于从 NEAT 场中提取有意义的3D 缝合。由于我们明确地模型了3D 缝合,因此我们最终计算了3D 绘制图像的 primal 绘制,从而大大简化了3D 绘制图像的计算模式。在实验中,我们对DTU和BlendedMVS数据集进行了评估,并取得了出色的性能。据我们所知,我们的方法是首次实现高精度3D 绘制 parsing 无需显式匹配。
Omnipotent Adversarial Training for Unknown Label-noisy and Imbalanced Datasets
results: 我们的全面评估结果显示,OAT 在复杂的数据不均匀和标签噪声场景下出现较大的改善,clean accuracy 提高了More than 20%,robust accuracy 提高了More than 10%。Abstract
Adversarial training is an important topic in robust deep learning, but the community lacks attention to its practical usage. In this paper, we aim to resolve a real-world application challenge, i.e., training a model on an imbalanced and noisy dataset to achieve high clean accuracy and robustness, with our proposed Omnipotent Adversarial Training (OAT). Our strategy consists of two innovative methodologies to address the label noise and data imbalance in the training set. We first introduce an oracle into the adversarial training process to help the model learn a correct data-label conditional distribution. This carefully-designed oracle can provide correct label annotations for adversarial training. We further propose logits adjustment adversarial training to overcome the data imbalance challenge, which can help the model learn a Bayes-optimal distribution. Our comprehensive evaluation results show that OAT outperforms other baselines by more than 20% clean accuracy improvement and 10% robust accuracy improvement under the complex combinations of data imbalance and label noise scenarios. The code can be found in https://github.com/GuanlinLee/OAT.
摘要1 adversarial training是深度学习中重要的话题,但社区对其实用方面缺乏关注。在这篇论文中,我们想要解决一个实际应用挑战,即在受损和噪声 dataset 上训练模型,以 достичь高度的清洁精度和Robustness。我们提出了两种创新的方法来解决训练集中的标签噪声和数据不均衡问题。我们首先引入了一个恰当的oracle到对抗训练过程中,帮助模型学习正确的数据-标签 conditional distribution。这个特制的oracle可以提供正确的标签注释 для对抗训练。我们还提出了Logits调整对抗训练,以解决数据不均衡问题,帮助模型学习bayes优化的分布。我们的全面评估结果表明,OAT在复杂的数据不均衡和标签噪声搭配下出perform得超过20%的清洁精度提升和10%的Robustness提升。代码可以在https://github.com/GuanlinLee/OAT中找到。
LightFormer: An End-to-End Model for Intersection Right-of-Way Recognition Using Traffic Light Signals and an Attention Mechanism
results: 经过训练并测试两个公共交通灯光数据集,LightFormer模型能够准确地识别交通灯光的权限状态。Abstract
For smart vehicles driving through signalised intersections, it is crucial to determine whether the vehicle has right of way given the state of the traffic lights. To address this issue, camera based sensors can be used to determine whether the vehicle has permission to proceed straight, turn left or turn right. This paper proposes a novel end to end intersection right of way recognition model called LightFormer to generate right of way status for available driving directions in complex urban intersections. The model includes a spatial temporal inner structure with an attention mechanism, which incorporates features from past image to contribute to the classification of the current frame right of way status. In addition, a modified, multi weight arcface loss is introduced to enhance the model classification performance. Finally, the proposed LightFormer is trained and tested on two public traffic light datasets with manually augmented labels to demonstrate its effectiveness.
摘要
For smart vehicles driving through signalized intersections, it is crucial to determine whether the vehicle has the right of way given the state of the traffic lights. To address this issue, camera-based sensors can be used to determine whether the vehicle has permission to proceed straight, turn left, or turn right. This paper proposes a novel end-to-end intersection right-of-way recognition model called LightFormer to generate right-of-way status for available driving directions in complex urban intersections. The model includes a spatial-temporal inner structure with an attention mechanism, which incorporates features from past images to contribute to the classification of the current frame right-of-way status. In addition, a modified, multi-weight arcface loss is introduced to enhance the model classification performance. Finally, the proposed LightFormer is trained and tested on two public traffic light datasets with manually augmented labels to demonstrate its effectiveness.Here's the translation in Traditional Chinese:For smart vehicles driving through signalized intersections, it is crucial to determine whether the vehicle has the right of way given the state of the traffic lights. To address this issue, camera-based sensors can be used to determine whether the vehicle has permission to proceed straight, turn left, or turn right. This paper proposes a novel end-to-end intersection right-of-way recognition model called LightFormer to generate right-of-way status for available driving directions in complex urban intersections. The model includes a spatial-temporal inner structure with an attention mechanism, which incorporates features from past images to contribute to the classification of the current frame right-of-way status. In addition, a modified, multi-weight arcface loss is introduced to enhance the model classification performance. Finally, the proposed LightFormer is trained and tested on two public traffic light datasets with manually augmented labels to demonstrate its effectiveness.
Adversarial Training Over Long-Tailed Distribution
results: 对不同的数据集和模型结构进行评估,我们发现 REAT 可以有效地提高模型的Robustness 和保持模型的清洁准确率。代码可以在 https://github.com/GuanlinLee/REAT 找到。Abstract
In this paper, we study adversarial training on datasets that obey the long-tailed distribution, which is practical but rarely explored in previous works. Compared with conventional adversarial training on balanced datasets, this process falls into the dilemma of generating uneven adversarial examples (AEs) and an unbalanced feature embedding space, causing the resulting model to exhibit low robustness and accuracy on tail data. To combat that, we propose a new adversarial training framework -- Re-balancing Adversarial Training (REAT). This framework consists of two components: (1) a new training strategy inspired by the term effective number to guide the model to generate more balanced and informative AEs; (2) a carefully constructed penalty function to force a satisfactory feature space. Evaluation results on different datasets and model structures prove that REAT can effectively enhance the model's robustness and preserve the model's clean accuracy. The code can be found in https://github.com/GuanlinLee/REAT.
摘要
在这篇论文中,我们研究了对长条度分布的数据集进行对抗训练,这种情况在前一些研究中很少被考虑。与传统的对抗训练在均衡数据集上进行的情况相比,这个过程会陷入生成不均匀的对抗示例(AE)以及不均匀的特征空间的困境,导致模型在尾部数据上表现出低的鲁棒性和准确率。为解决这个问题,我们提出了一个新的对抗训练框架——重新均衡对抗训练(REAT)。这个框架包括两个组成部分:(1)一种基于有效数量的新训练策略,以导引模型生成更加均匀和有用的AE;(2)一个特殊构造的罚函数,以强制模型在特征空间中达到满意的结果。经过不同数据集和模型结构的评估,我们发现REAT可以有效地提高模型的鲁棒性,同时保持模型的净精度。代码可以在https://github.com/GuanlinLee/REAT找到。
Erasing, Transforming, and Noising Defense Network for Occluded Person Re-Identification
methods: adversarial defense, random erasure, random transformations, noise perturbation
results: effective handling of occlusion issues, no need for external modules, superior performance on five public datasetsAbstract
Occlusion perturbation presents a significant challenge in person re-identification (re-ID), and existing methods that rely on external visual cues require additional computational resources and only consider the issue of missing information caused by occlusion. In this paper, we propose a simple yet effective framework, termed Erasing, Transforming, and Noising Defense Network (ETNDNet), which treats occlusion as a noise disturbance and solves occluded person re-ID from the perspective of adversarial defense. In the proposed ETNDNet, we introduce three strategies: Firstly, we randomly erase the feature map to create an adversarial representation with incomplete information, enabling adversarial learning of identity loss to protect the re-ID system from the disturbance of missing information. Secondly, we introduce random transformations to simulate the position misalignment caused by occlusion, training the extractor and classifier adversarially to learn robust representations immune to misaligned information. Thirdly, we perturb the feature map with random values to address noisy information introduced by obstacles and non-target pedestrians, and employ adversarial gaming in the re-ID system to enhance its resistance to occlusion noise. Without bells and whistles, ETNDNet has three key highlights: (i) it does not require any external modules with parameters, (ii) it effectively handles various issues caused by occlusion from obstacles and non-target pedestrians, and (iii) it designs the first GAN-based adversarial defense paradigm for occluded person re-ID. Extensive experiments on five public datasets fully demonstrate the effectiveness, superiority, and practicality of the proposed ETNDNet. The code will be released at \url{https://github.com/nengdong96/ETNDNet}.
摘要
干扰对人重新标识(re-ID)提出了 significativ challenge,现有的方法通常需要额外的计算资源,并只考虑 occlusion 引起的信息缺失问题。在这篇论文中,我们提出了一种简单 yet effective的框架,称为Erasing, Transforming, and Noising Defense Network(ETNDNet),它将 occlusion 视为干扰,通过对抗防御的方式解决 occluded person re-ID 问题。在我们提出的 ETNDNet 中,我们引入了三种策略:首先,我们随机将特征图抹除,创建一个干扰表示,使得抗护理学习损失,以保护重新标识系统免受缺失信息的影响。其次,我们引入了随机变换,模拟了障碍物和非目标人员的位置偏移,使得抽取器和分类器通过对抗学习,学习具有抗性的表示。最后,我们在特征图中添加了随机值,对干扰引入的噪音进行处理,并通过对抗游戏,提高重新标识系统对 occlusion 的抗性。ETNDNet 的三个关键亮点是:一、它不需要任何外部模块和参数;二、它能有效地处理各种由障碍物和非目标人员引起的 occlusion 问题;三、它是首次在 occluded person re-ID 中应用 GAN 基于对抗防御的方法。我们的实验在五个公共数据集上进行了广泛的证明和评估,并证明了 ETNDNet 的有效性、优势和实用性。代码将在 \url{https://github.com/nengdong96/ETNDNet} 上发布。
TVPR: Text-to-Video Person Retrieval and a New Benchmark
methods: 提出了一种新的 Text-to-Video Person Retrieval(TVPR)任务,使用文本描述和视频数据交互进行人脸 Retrieval。同时,构建了大规模的 across-modal人员视频数据集(TVPReid),包括人脸、动作和环境交互等详细自然语言描述。
results: 提出了 Text-to-Video Person Retrieval Network(TVPRN),使用视觉和动作特征进行人脸视频表示,并使用预训练的 BERT 获取描述文本的表示,以便找出最相关的人脸视频。经过广泛的实验,TVPRN 在 TVPReid 数据集上达到了state-of-the-art表现。Abstract
Most existing methods for text-based person retrieval focus on text-to-image person retrieval. Nevertheless, due to the lack of dynamic information provided by isolated frames, the performance is hampered when the person is obscured in isolated frames or variable motion details are given in the textual description. In this paper, we propose a new task called Text-to-Video Person Retrieval(TVPR) which aims to effectively overcome the limitations of isolated frames. Since there is no dataset or benchmark that describes person videos with natural language, we construct a large-scale cross-modal person video dataset containing detailed natural language annotations, such as person's appearance, actions and interactions with environment, etc., termed as Text-to-Video Person Re-identification (TVPReid) dataset, which will be publicly available. To this end, a Text-to-Video Person Retrieval Network (TVPRN) is proposed. Specifically, TVPRN acquires video representations by fusing visual and motion representations of person videos, which can deal with temporal occlusion and the absence of variable motion details in isolated frames. Meanwhile, we employ the pre-trained BERT to obtain caption representations and the relationship between caption and video representations to reveal the most relevant person videos. To evaluate the effectiveness of the proposed TVPRN, extensive experiments have been conducted on TVPReid dataset. To the best of our knowledge, TVPRN is the first successful attempt to use video for text-based person retrieval task and has achieved state-of-the-art performance on TVPReid dataset. The TVPReid dataset will be publicly available to benefit future research.
摘要
现有的文本基于人识别方法多数采用文本到图像人识别。然而,由于隔ilder的缺乏动态信息,在文本描述中人脸遮盖或变化的运动细节时,性能受到了限制。在这篇论文中,我们提出了一个新任务:文本到视频人识别(TVPR),旨在超越隔ilder的局限性。由于没有任何人视频与自然语言描述的数据集或标准,我们构建了一个大规模的跨模态人视频数据集,包括人物的出现、行为和环境交互等细节的自然语言描述,称为文本到视频人重识别(TVPReid)数据集。为此,我们提出了文本到视频人 Retrieval网络(TVPRN)。具体来说,TVPRN使用人视频的视觉和运动表示 fusion 来处理时间遮盖和隔ilder中缺乏变化运动细节。同时,我们使用预训练的 BERT 获取caption表示,并通过caption和视频表示之间的关系,找出最相关的人视频。为证明提出的 TVPRN 的有效性,我们进行了广泛的实验,并在 TVPReid 数据集上达到了当前最佳性能。据我们知道,TVPRN 是首次使用视频来解决文本基于人识别任务,并在 TVPReid 数据集上实现了状态畅的性能。TVPReid 数据集将于未来的研究中公开,以便推动未来的研究。
DISPEL: Domain Generalization via Domain-Specific Liberating
results: 我们的实验结果表明,DISPEL 可以与现有方法相比,在五个 benchmark 上表现出色,并且可以进一步普适多种算法。此外,我们还 derivated 一个泛化误差 bounds,以保证泛化性能。Abstract
Domain generalization aims to learn a generalization model that can perform well on unseen test domains by only training on limited source domains. However, existing domain generalization approaches often bring in prediction-irrelevant noise or require the collection of domain labels. To address these challenges, we consider the domain generalization problem from a different perspective by categorizing underlying feature groups into domain-shared and domain-specific features. Nevertheless, the domain-specific features are difficult to be identified and distinguished from the input data. In this work, we propose DomaIn-SPEcific Liberating (DISPEL), a post-processing fine-grained masking approach that can filter out undefined and indistinguishable domain-specific features in the embedding space. Specifically, DISPEL utilizes a mask generator that produces a unique mask for each input data to filter domain-specific features. The DISPEL framework is highly flexible to be applied to any fine-tuned models. We derive a generalization error bound to guarantee the generalization performance by optimizing a designed objective loss. The experimental results on five benchmarks demonstrate DISPEL outperforms existing methods and can further generalize various algorithms.
摘要
领域总则目标是学习一个通用模型,可以在未经见过的测试领域上表现出色,只需在有限的源领域上训练。然而,现有的领域总则方法经常带来预测无关的噪声或需要收集领域标签。为解决这些挑战,我们从不同的角度看待领域总则问题,将下面的特征分为领域共享特征和领域特定特征。然而,领域特定特征很难被识别出来并分离于输入数据中。在这种情况下,我们提出了DISPEL方法,一种后处细化的面积掩码方法,可以在嵌入空间中过滤无法定义或不可分辨的领域特定特征。DISPEL方法使用一个生成器生成特有的掩码,用于过滤领域特定特征。DISPEL框架高度灵活,可以应用于任何精度调整的模型。我们 derivate一个总则错误范围,以保证总则性能。实验结果在五个benchmark上表明,DISPEL方法超过了现有方法,并可以进一步普适多种算法。
Adaptive Region Selection for Active Learning in Whole Slide Image Semantic Segmentation
paper_authors: Jingna Qiu, Frauke Wilm, Mathias Öttl, Maja Schlereth, Chang Liu, Tobias Heimann, Marc Aubreville, Katharina Breininger for:这篇论文是为了提高 Histological gigapixel-sized whole slide images (WSIs) 的分类训练模型而做的准备工作,特别是在 Region-based active learning (AL) 中,即通过选择一小部分 annotated 图像区域来训练模型,而不是全图像。methods:这篇论文提出了一种新的技术,即动态选择 annotated 区域,以避免 AL Step size 的选择问题。这种技术首先会找到一个有用的区域,然后确定该区域的最佳包围框,而不是选择固定形状和大小的 rectangular 区域,如标准方法所做。results:这篇论文使用了 breast cancer metastases segmentation 任务,在 CAMELYON16 dataset 上进行了评估。结果显示,这种新技术可以在不同的 AL Step size 下 consistently 实现更高的 sampling efficiency,并且只需要 annotate 2.6% 的组织质量。这意味着可以大幅减少 annotate 图像集的成本。代码可以在 https://github.com/DeepMicroscopy/AdaptiveRegionSelection 上下载。Abstract
The process of annotating histological gigapixel-sized whole slide images (WSIs) at the pixel level for the purpose of training a supervised segmentation model is time-consuming. Region-based active learning (AL) involves training the model on a limited number of annotated image regions instead of requesting annotations of the entire images. These annotation regions are iteratively selected, with the goal of optimizing model performance while minimizing the annotated area. The standard method for region selection evaluates the informativeness of all square regions of a specified size and then selects a specific quantity of the most informative regions. We find that the efficiency of this method highly depends on the choice of AL step size (i.e., the combination of region size and the number of selected regions per WSI), and a suboptimal AL step size can result in redundant annotation requests or inflated computation costs. This paper introduces a novel technique for selecting annotation regions adaptively, mitigating the reliance on this AL hyperparameter. Specifically, we dynamically determine each region by first identifying an informative area and then detecting its optimal bounding box, as opposed to selecting regions of a uniform predefined shape and size as in the standard method. We evaluate our method using the task of breast cancer metastases segmentation on the public CAMELYON16 dataset and show that it consistently achieves higher sampling efficiency than the standard method across various AL step sizes. With only 2.6\% of tissue area annotated, we achieve full annotation performance and thereby substantially reduce the costs of annotating a WSI dataset. The source code is available at https://github.com/DeepMicroscopy/AdaptiveRegionSelection.
摘要
“ annotating histological gigapixel-sized whole slide images (WSIs) at the pixel level for the purpose of training a supervised segmentation model is time-consuming. Region-based active learning (AL) involves training the model on a limited number of annotated image regions instead of requesting annotations of the entire images. These annotation regions are iteratively selected, with the goal of optimizing model performance while minimizing the annotated area. The standard method for region selection evaluates the informativeness of all square regions of a specified size and then selects a specific quantity of the most informative regions. We find that the efficiency of this method highly depends on the choice of AL step size (i.e., the combination of region size and the number of selected regions per WSI), and a suboptimal AL step size can result in redundant annotation requests or inflated computation costs. This paper introduces a novel technique for selecting annotation regions adaptively, mitigating the reliance on this AL hyperparameter. Specifically, we dynamically determine each region by first identifying an informative area and then detecting its optimal bounding box, as opposed to selecting regions of a uniform predefined shape and size as in the standard method. We evaluate our method using the task of breast cancer metastases segmentation on the public CAMELYON16 dataset and show that it consistently achieves higher sampling efficiency than the standard method across various AL step sizes. With only 2.6% of tissue area annotated, we achieve full annotation performance and thereby substantially reduce the costs of annotating a WSI dataset. The source code is available at https://github.com/DeepMicroscopy/AdaptiveRegionSelection.”Here's the translation in Traditional Chinese:“ annotating histological gigapixel-sized whole slide images (WSIs) at the pixel level for the purpose of training a supervised segmentation model is time-consuming. Region-based active learning (AL) involves training the model on a limited number of annotated image regions instead of requesting annotations of the entire images. These annotation regions are iteratively selected, with the goal of optimizing model performance while minimizing the annotated area. The standard method for region selection evaluates the informativeness of all square regions of a specified size and then selects a specific quantity of the most informative regions. We find that the efficiency of this method highly depends on the choice of AL step size (i.e., the combination of region size and the number of selected regions per WSI), and a suboptimal AL step size can result in redundant annotation requests or inflated computation costs. This paper introduces a novel technique for selecting annotation regions adaptively, mitigating the reliance on this AL hyperparameter. Specifically, we dynamically determine each region by first identifying an informative area and then detecting its optimal bounding box, as opposed to selecting regions of a uniform predefined shape and size as in the standard method. We evaluate our method using the task of breast cancer metastases segmentation on the public CAMELYON16 dataset and show that it consistently achieves higher sampling efficiency than the standard method across various AL step sizes. With only 2.6% of tissue area annotated, we achieve full annotation performance and thereby substantially reduce the costs of annotating a WSI dataset. The source code is available at https://github.com/DeepMicroscopy/AdaptiveRegionSelection.”
Linking vision and motion for self-supervised object-centric perception
results: 研究人员通过使用这种方法在Waymo开放感知数据集上获得了可接受的结果,虽然对象层质不如supervised方法或其他使用更特权信息的方法,但模型能够学习一种能够融合多个相机视角的时间序列表示,并成功跟踪了 dataset 中的许多车辆和行人。Abstract
Object-centric representations enable autonomous driving algorithms to reason about interactions between many independent agents and scene features. Traditionally these representations have been obtained via supervised learning, but this decouples perception from the downstream driving task and could harm generalization. In this work we adapt a self-supervised object-centric vision model to perform object decomposition using only RGB video and the pose of the vehicle as inputs. We demonstrate that our method obtains promising results on the Waymo Open perception dataset. While object mask quality lags behind supervised methods or alternatives that use more privileged information, we find that our model is capable of learning a representation that fuses multiple camera viewpoints over time and successfully tracks many vehicles and pedestrians in the dataset. Code for our model is available at https://github.com/wayveai/SOCS.
摘要
object-centric表示法可以让自主驾驶算法对多个独立的Agent和场景特征进行理解。传统上,这些表示方法通过监督学习获得,但这会分离感知和下游驾驶任务,可能会对泛化造成负面影响。在这项工作中,我们适应了基于自动学习的对象分解模型,只使用RGB视频和车辆的pose作为输入。我们示示了我们的方法在 Waymo 开放感知数据集上获得了有前途的结果。虽然对象层面质量落后于监督方法或其他使用更特权信息的方法,但我们发现我们的模型可以学习一个汇集多个相机视点的时间序列,并成功跟踪多辆车和人行进在数据集中。代码可以在https://github.com/wayveai/SOCS上获取。
Deteksi Sampah di Permukaan dan Dalam Perairan pada Objek Video dengan Metode Robust and Efficient Post-Processing dan Tubelet-Level Bounding Box Linking
paper_authors: Bryan Tjandra, Made S. N. Negara, Nyoo S. C. Handoko
for: 本研究旨在开发一种自动化垃圾收集机器人,以解决印度尼西亚水域中垃圾的问题。
methods: 本研究使用了YOLOv5模型和Robust & Efficient Post Processing(REPP)方法,以及Tubelet-level bounding box linking在FloW和Roboflow数据集上。这些方法可以提高原生Object Detection的性能,并考虑邻帧检测结果。
results: 研究结果表明,后处理阶段和Tubelet-level bounding box linking可以提高检测质量,相比YOLOv5 alone提高约3%。这些方法可以检测表面和水下垃圾,并可以应用于实时图像基于垃圾收集机器人。Abstract
Indonesia, as a maritime country, has a significant portion of its territory covered by water. Ineffective waste management has resulted in a considerable amount of trash in Indonesian waters, leading to various issues. The development of an automated trash-collecting robot can be a solution to address this problem. The robot requires a system capable of detecting objects in motion, such as in videos. However, using naive object detection methods in videos has limitations, particularly when image focus is reduced and the target object is obstructed by other objects. This paper's contribution provides an explanation of the methods that can be applied to perform video object detection in an automated trash-collecting robot. The study utilizes the YOLOv5 model and the Robust & Efficient Post Processing (REPP) method, along with tubelet-level bounding box linking on the FloW and Roboflow datasets. The combination of these methods enhances the performance of naive object detection from YOLOv5 by considering the detection results in adjacent frames. The results show that the post-processing stage and tubelet-level bounding box linking can improve the quality of detection, achieving approximately 3% better performance compared to YOLOv5 alone. The use of these methods has the potential to detect surface and underwater trash and can be applied to a real-time image-based trash-collecting robot. Implementing this system is expected to mitigate the damage caused by trash in the past and improve Indonesia's waste management system in the future.
摘要
印度尼西亚,作为一个海上国家,有很大一部分领土被水覆盖。不效的垃圾管理导致了印度尼西亚水域中的废弃物堆积,引起了各种问题。开发自动垃圾收集机器人可以解决这个问题。这个机器人需要一个能够检测运动 объек的系统,例如在视频中。但是使用直观的对象检测方法在视频中有限制,特别是当图像焦点减少和目标对象受到其他对象干扰时。本文的贡献是对视频对象检测方法的解释,并使用YOLOv5模型和Robust & Efficient Post Processing(REPP)方法,以及Tubelet-level bounding box linking在Flow和Roboflow数据集上。将这些方法结合使用可以提高YOLOv5直观对象检测的性能,并考虑邻帧检测结果。结果表明,后处理阶段和Tubelet-level bounding box linking可以提高检测质量,相比YOLOv5 alone,提高约3%。这些方法可以检测表面和水下垃圾,并可以应用于实时图像基于垃圾收集机器人。实施这种系统,预计可以改善过去垃圾的损害,并提高未来印度尼西亚的垃圾管理系统。
CFI2P: Coarse-to-Fine Cross-Modal Correspondence Learning for Image-to-Point Cloud Registration
paper_authors: Gongxin Yao, Yixin Xuan, Yiwei Chen, Yu Pan
For: 本研究主要针对image-to-point cloud registration问题,实现点云和图像之间的匹配。* Methods: 我们提出了一个具有粗细对称的框架,从本地角度出发,首先建立点云和图像之间的匹配,然后透过精确的搜寻、注意力学习和精确匹配,从细腻搜寻空间中获取高品质的匹配。* Results: 我们在大规模的 OUTDOOR 实验中证明了我们的方法的优越性,并且在EPnP算法下进行了匹配。Abstract
In the context of image-to-point cloud registration, acquiring point-to-pixel correspondences presents a challenging task since the similarity between individual points and pixels is ambiguous due to the visual differences in data modalities. Nevertheless, the same object present in the two data formats can be readily identified from the local perspective of point sets and pixel patches. Motivated by this intuition, we propose a coarse-to-fine framework that emphasizes the establishment of correspondences between local point sets and pixel patches, followed by the refinement of results at both the point and pixel levels. On a coarse scale, we mimic the classic Visual Transformer to translate both image and point cloud into two sequences of local representations, namely point and pixel proxies, and employ attention to capture global and cross-modal contexts. To supervise the coarse matching, we propose a novel projected point proportion loss, which guides to match point sets with pixel patches where more points can be projected into. On a finer scale, point-to-pixel correspondences are then refined from a smaller search space (i.e., the coarsely matched sets and patches) via well-designed sampling, attentional learning and fine matching, where sampling masks are embedded in the last two steps to mitigate the negative effect of sampling. With the high-quality correspondences, the registration problem is then resolved by EPnP algorithm within RANSAC. Experimental results on large-scale outdoor benchmarks demonstrate our superiority over existing methods.
摘要
在图像与点云注册中,获取点对像像点对应存在挑战,因为图像和点云数据模式之间的视觉差异使得个别点和像素之间的相似性具有很大的歧义。然而,同一个物体在两种数据格式中可以轻松地从地方视角上识别。基于这种感知,我们提出了一个卷积框架,强调在点集和像素块之间建立对应关系,然后对点和像素水平进行重finement。在粗略层次,我们模仿了经典的视觉 трансформа器,将图像和点云转换成两个Sequence of local representation,即点和像素代理,并使用注意力来捕捉全局和交叉模式上的信息。为了监督粗略匹配,我们提出了一种新的 projeted point proportion loss,它引导将点集与像素块匹配,其中更多的点可以被投影到像素块中。在细化层次,点对像像点对应从粗略匹配的小搜索空间(即粗略匹配的集和块)进行重新匹配,使用特制的采样、注意力学习和细化匹配,其中采样面被嵌入最后两步以避免采样的负面影响。与高质量对应关系,则可以通过EPnP算法在RANSAC中解决注册问题。实验结果表明我们在大规模的户外 benchmark 上表现出色。
Fine-grained Text-Video Retrieval with Frozen Image Encoders
results: 实验结果表明, compared to 状态空间的方法,我们的提出的 CrossTVR 方法可以更好地提高文本视频检索性能。Abstract
State-of-the-art text-video retrieval (TVR) methods typically utilize CLIP and cosine similarity for efficient retrieval. Meanwhile, cross attention methods, which employ a transformer decoder to compute attention between each text query and all frames in a video, offer a more comprehensive interaction between text and videos. However, these methods lack important fine-grained spatial information as they directly compute attention between text and video-level tokens. To address this issue, we propose CrossTVR, a two-stage text-video retrieval architecture. In the first stage, we leverage existing TVR methods with cosine similarity network for efficient text/video candidate selection. In the second stage, we propose a novel decoupled video text cross attention module to capture fine-grained multimodal information in spatial and temporal dimensions. Additionally, we employ the frozen CLIP model strategy in fine-grained retrieval, enabling scalability to larger pre-trained vision models like ViT-G, resulting in improved retrieval performance. Experiments on text video retrieval datasets demonstrate the effectiveness and scalability of our proposed CrossTVR compared to state-of-the-art approaches.
摘要
现代文本视频检索(TVR)方法通常使用CLIP和cosine相似性来实现高效的检索。而cross attention方法,它使用变换器解码器计算文本查询和视频帧之间的关注,可以更好地考虑文本和视频之间的互动。然而,这些方法缺乏细致的空间信息,因为直接计算文本和视频级别的токен之间的关注。为解决这个问题,我们提出了CrossTVR,一种两stage文本视频检索架构。在第一个阶段,我们利用现有的 TVR 方法和cosine相似性网络来高效地选择文本/视频候选者。在第二个阶段,我们提出了一种新的解除视频文本交叉注意模块,以捕捉视频和文本之间的细致的多Modal信息。此外,我们采用冻结 CLIP 模型策略,使得可以扩展到更大的预训练视觉模型 like ViT-G,从而提高检索性能。实验表明,我们提出的 CrossTVR 比对现有的方法更有效和可扩展。
CeRF: Convolutional Neural Radiance Fields for New View Synthesis with Derivatives of Ray Modeling
paper_authors: Xiaoyan Yang, Dingbo Lu, Yang Li, Chenhui Li, Changbo Wang
for: novel view synthesis of high-fidelity images
methods: Convolutional Neural Radiance Fields with 1D convolutional operations and structured neural network architecture, and a proposed recurrent module to solve geometric ambiguity
results: promising results compared with existing state-of-the-art methodsHere’s the full text in Simplified Chinese:
for: 高效图像新视图合成
methods: 基于1D卷积操作的卷积神经场,以及用于解决几何杂乱的循环模块
results: 与现有状态艺术方法相比,显示出优秀的结果Abstract
In recent years, novel view synthesis has gained popularity in generating high-fidelity images. While demonstrating superior performance in the task of synthesizing novel views, the majority of these methods are still based on the conventional multi-layer perceptron for scene embedding. Furthermore, light field models suffer from geometric blurring during pixel rendering, while radiance field-based volume rendering methods have multiple solutions for a certain target of density distribution integration. To address these issues, we introduce the Convolutional Neural Radiance Fields to model the derivatives of radiance along rays. Based on 1D convolutional operations, our proposed method effectively extracts potential ray representations through a structured neural network architecture. Besides, with the proposed ray modeling, a proposed recurrent module is employed to solve geometric ambiguity in the fully neural rendering process. Extensive experiments demonstrate the promising results of our proposed model compared with existing state-of-the-art methods.
摘要
近年来,新视图合成技术受到广泛关注,并在生成高效图像方面达到了显著提高。虽然大多数这些方法仍然基于传统的多层感知器进行场景嵌入,但是它们在新视图合成任务中表现出色。然而,光场模型在像素渲染过程中受到光学模糊的影响,而基于辐射场的Volume渲染方法具有多个解的涂抹积分问题。为解决这些问题,我们介绍了卷积神经场的激素场,通过1D卷积操作来有效地提取潜在的光线表示。此外,我们还提出了一种循环模块,用于解决全神经渲染过程中的 геометрические抽象问题。广泛的实验证明了我们提出的模型与现有状态艺技术相比,具有出色的表现。
Improved Flood Insights: Diffusion-Based SAR to EO Image Translation
paper_authors: Minseok Seo, Youngtack Oh, Doyi Kim, Dongmin Kang, Yeji Choi
For: 该论文目的是提高洪水灾害评估的可 interpretability,通过将Synthetic Aperture Radar(SAR)图像转换成Electro-Optical(EO)图像,提高人类分析者对洪水危机的理解。* Methods: 该论文提出了一种新的Diffusion-Based SAR to EO Image Translation(DSE)框架,用于将SAR图像转换成EO图像,以提高洪水灾害评估的可 interpretability。* Results: 实验结果表明,DSE框架不仅可以提高洪水灾害评估的可读性,还可以提高所有测试的洪水分割基准的性能。Abstract
Driven by rapid climate change, the frequency and intensity of flood events are increasing. Electro-Optical (EO) satellite imagery is commonly utilized for rapid response. However, its utilities in flood situations are hampered by issues such as cloud cover and limitations during nighttime, making accurate assessment of damage challenging. Several alternative flood detection techniques utilizing Synthetic Aperture Radar (SAR) data have been proposed. Despite the advantages of SAR over EO in the aforementioned situations, SAR presents a distinct drawback: human analysts often struggle with data interpretation. To tackle this issue, this paper introduces a novel framework, Diffusion-Based SAR to EO Image Translation (DSE). The DSE framework converts SAR images into EO images, thereby enhancing the interpretability of flood insights for humans. Experimental results on the Sen1Floods11 and SEN12-FLOOD datasets confirm that the DSE framework not only delivers enhanced visual information but also improves performance across all tested flood segmentation baselines.
摘要
随着气候变化的加速,洪水事件的频率和 INTENSITY 都在增加。电子优化(EO)卫星成像通常用于快速应对。然而,它在洪水情况下面临着云覆盖和夜间限制,增加了评估损害的困难。一些使用 Synthetic Aperture Radar(SAR)数据的洪水探测技术已经被提议。虽然 SAR 在上述情况下具有优势,但它具有一个缺点:人工分析员经常遇到数据解释的困难。为解决这个问题,本文提出了一个新的框架:傅尔基于 SAR 到 EO 图像翻译(DSE)。DSE 框架将 SAR 图像转换为 EO 图像,从而提高了洪水启示的可读性 для 人类。实验结果表明,在 Sen1Floods11 和 SEN12-FLOOD 数据集上,DSE 框架不仅提供了加强的视觉信息,还提高了所有测试的洪水分割基elines 的性能。
Achelous: A Fast Unified Water-surface Panoptic Perception Framework based on Fusion of Monocular Camera and 4D mmWave Radar
results: 在一个收集的数据集上,Achelous family模型比 HybridNets 快11 FPS,并在5 mAP$_{\text{50-95}$ 和 0.7 mIoU 上超越 YOLOX-Tiny 和 Segformer-B0,特别是在恶势卷云、黑暗环境和相机失效情况下表现出色。Abstract
Current perception models for different tasks usually exist in modular forms on Unmanned Surface Vehicles (USVs), which infer extremely slowly in parallel on edge devices, causing the asynchrony between perception results and USV position, and leading to error decisions of autonomous navigation. Compared with Unmanned Ground Vehicles (UGVs), the robust perception of USVs develops relatively slowly. Moreover, most current multi-task perception models are huge in parameters, slow in inference and not scalable. Oriented on this, we propose Achelous, a low-cost and fast unified panoptic perception framework for water-surface perception based on the fusion of a monocular camera and 4D mmWave radar. Achelous can simultaneously perform five tasks, detection and segmentation of visual targets, drivable-area segmentation, waterline segmentation and radar point cloud segmentation. Besides, models in Achelous family, with less than around 5 million parameters, achieve about 18 FPS on an NVIDIA Jetson AGX Xavier, 11 FPS faster than HybridNets, and exceed YOLOX-Tiny and Segformer-B0 on our collected dataset about 5 mAP$_{\text{50-95}$ and 0.7 mIoU, especially under situations of adverse weather, dark environments and camera failure. To our knowledge, Achelous is the first comprehensive panoptic perception framework combining vision-level and point-cloud-level tasks for water-surface perception. To promote the development of the intelligent transportation community, we release our codes in \url{https://github.com/GuanRunwei/Achelous}.
摘要
现有的感知模型通常存在模块化的形式在无人水面车(USV)上,这些模型在边缘设备上并行执行,导致感知结果与USV位置的偏差,并导致自动导航错误决策。相比于无人地面车(UGV),USV的感知模型发展相对较慢。另外,大多数当前的多任务感知模型具有庞大的参数量、慢速推理和不可扩展的问题。基于这一点,我们提出了 Achelous,一个低成本、快速的综合杂化感知框架,用于水面感知。Achelous可同时完成五个任务,包括视觉目标检测和分割、可行区域分割、水面分割和4D mmWave雷达点云分割。此外,Achelous家族中的模型,具有 Less than 5000万参数,在 NVIDIA Jetson AGX Xavier 上达到约 18 FPS,比 HybridNets 快 11 FPS,并在我们收集的数据集上 exceed YOLOX-Tiny 和 Segformer-B0 的 5 mAP$_{\text{50-95}$ 和 0.7 mIoU,特别是在恶劣天气、黑暗环境和摄像头故障等情况下。我们知道,Achelous 是首个对水面感知任务进行综合杂化感知框架的研究。为推动智能交通社区的发展,我们在 \url{https://github.com/GuanRunwei/Achelous} 上发布了代码。
Bootstrapping Vision-Language Learning with Decoupled Language Pre-training
results: 我们的实验表明,我们的框架可以显著提高一个基本图像文本基线(BLIP-2)的性能,并有效地缩小基于4M或129M图像文本对应的模型性能差距。此外,我们的框架可以在不同的基模块上进行模块化和灵活的应用,并在视频学习任务中得到了成功应用。代码可以在https://github.com/yiren-jian/BLIText上获取。Abstract
We present a novel methodology aimed at optimizing the application of frozen large language models (LLMs) for resource-intensive vision-language (VL) pre-training. The current paradigm uses visual features as prompts to guide language models, with a focus on determining the most relevant visual features for corresponding text. Our approach diverges by concentrating on the language component, specifically identifying the optimal prompts to align with visual features. We introduce the Prompt-Transformer (P-Former), a model that predicts these ideal prompts, which is trained exclusively on linguistic data, bypassing the need for image-text pairings. This strategy subtly bifurcates the end-to-end VL training process into an additional, separate stage. Our experiments reveal that our framework significantly enhances the performance of a robust image-to-text baseline (BLIP-2), and effectively narrows the performance gap between models trained with either 4M or 129M image-text pairs. Importantly, our framework is modality-agnostic and flexible in terms of architectural design, as validated by its successful application in a video learning task using varied base modules. The code is available at https://github.com/yiren-jian/BLIText
摘要
我们提出了一种新的方法,旨在优化冻结大型语言模型(LLM)在资源占用量大的视频语言(VL)预训练中的应用。现有的方法使用视觉特征作为提示来导引语言模型,主要是确定与文本相对应的最 relevante 视觉特征。我们的方法则注重语言组件,具体来说是确定最佳提示,以对应视觉特征进行对齐。我们提出了Prompt-Transformer(P-Former)模型,该模型预测最佳提示,并且专门在语言数据上进行训练,不需要图像文本对应。这种策略将结束到终端VL训练过程中的整个过程分解为一个额外的、独立的阶段。我们的实验表明,我们的框架可以显著提高一种稳定的图像文本基线(BLIP-2)的性能,并有效缩小使用4M或129M图像文本对应的模型性能差距。此外,我们的框架是模块化的,可以在不同的基础模块上进行成功应用,如视频学习任务中。代码可以在https://github.com/yiren-jian/BLIText 上获取。
AnyStar: Domain randomized universal star-convex 3D instance segmentation
results: 根据论文的描述,使用该方法可以训练一个通用的星形实例分割网络,可以在不同的 dataset 和模式下进行高精度的3D分割,无需进行任何的重新引入、微调或域适应。Abstract
Star-convex shapes arise across bio-microscopy and radiology in the form of nuclei, nodules, metastases, and other units. Existing instance segmentation networks for such structures train on densely labeled instances for each dataset, which requires substantial and often impractical manual annotation effort. Further, significant reengineering or finetuning is needed when presented with new datasets and imaging modalities due to changes in contrast, shape, orientation, resolution, and density. We present AnyStar, a domain-randomized generative model that simulates synthetic training data of blob-like objects with randomized appearance, environments, and imaging physics to train general-purpose star-convex instance segmentation networks. As a result, networks trained using our generative model do not require annotated images from unseen datasets. A single network trained on our synthesized data accurately 3D segments C. elegans and P. dumerilii nuclei in fluorescence microscopy, mouse cortical nuclei in micro-CT, zebrafish brain nuclei in EM, and placental cotyledons in human fetal MRI, all without any retraining, finetuning, transfer learning, or domain adaptation. Code is available at https://github.com/neel-dey/AnyStar.
摘要
星形对象在生物微scopie和放射学中出现,包括核体、肿块、肿肿转移和其他单元。现有的星形实例分割网络需要大量的手动标注实例,这需要巨大的人工标注努力。此外,对于新的数据集和成像模式来说,需要重大的重工程化或者赋值,这是因为对比、形状、方向、分辨率和密度的变化。我们提出了AnyStar,一种随机生成模型,用于生成星形对象的 sintetic 训练数据,用于训练通用的星形实例分割网络。因此,使用我们的生成模型训练的网络不需要未看过的数据集的注释。我们的网络可以高精度地3D分割C. elegans和P. dumerilii核体在 fluorescence 微scopie中,mouse cortical核体在 micro-CT 中,zebrafish brain核体在 EM 中,以及人类胎儿 Placental cotyledons 在人类胎儿 MRI 中,无需任何再训练、赋值、传输学习或领域适应。代码可以在https://github.com/neel-dey/AnyStar 中找到。
Deepfake Video Detection Using Generative Convolutional Vision Transformer
For: The paper is written for detecting deepfake videos, which have become a significant concern due to their potential to spread false information and compromise digital media integrity.* Methods: The proposed model, called Generative Convolutional Vision Transformer (GenConViT), combines ConvNeXt and Swin Transformer models for feature extraction, and utilizes Autoencoder and Variational Autoencoder to learn from the latent data distribution.* Results: The model achieves improved performance in detecting a wide range of deepfake videos, with an average accuracy of 95.8% and an AUC value of 99.3% across the tested datasets. The model demonstrates robust performance in deepfake video detection and provides an effective solution for identifying fake videos while preserving media integrity.Here are the three points in Simplified Chinese text:* For: 这篇论文是为检测深伪视频而写的,深伪视频已经成为 False Information 的一种可能性,并且可能会对数字媒体的完整性产生负面影响。* Methods: 该论文提出的模型是 Generative Convolutional Vision Transformer (GenConViT),它结合 ConvNeXt 和 Swin Transformer 模型来提取特征,并使用 Autoencoder 和 Variational Autoencoder 来学习 latent 数据分布。* Results: 模型在多种深伪视频检测中表现出色,具有高精度和高 F1 分数,以及 AUC 值。模型在检测深伪视频方面达到了robust性,并提供了一种有效的方法来识别假视频,保持数字媒体完整性。Abstract
Deepfakes have raised significant concerns due to their potential to spread false information and compromise digital media integrity. In this work, we propose a Generative Convolutional Vision Transformer (GenConViT) for deepfake video detection. Our model combines ConvNeXt and Swin Transformer models for feature extraction, and it utilizes Autoencoder and Variational Autoencoder to learn from the latent data distribution. By learning from the visual artifacts and latent data distribution, GenConViT achieves improved performance in detecting a wide range of deepfake videos. The model is trained and evaluated on DFDC, FF++, DeepfakeTIMIT, and Celeb-DF v2 datasets, achieving high classification accuracy, F1 scores, and AUC values. The proposed GenConViT model demonstrates robust performance in deepfake video detection, with an average accuracy of 95.8% and an AUC value of 99.3% across the tested datasets. Our proposed model addresses the challenge of generalizability in deepfake detection by leveraging visual and latent features and providing an effective solution for identifying a wide range of fake videos while preserving media integrity. The code for GenConViT is available at https://github.com/erprogs/GenConViT.
摘要
深层复制(Deepfake)技术已经引起了广泛的关注,因为它们的潜在能力可能导致假信息的传播和数字媒体的完整性的威胁。在这项工作中,我们提出了一种基于Generative Convolutional Vision Transformer(GenConViT)的深层复制视频检测模型。我们的模型结合了ConvNeXt和Swin Transformer模型,用于特征提取,并使用Autoencoder和Variational Autoencoder来学习从隐藏数据分布中学习。通过学习视觉特征和隐藏数据分布,GenConViT实现了对各种深层复制视频的广泛检测性能的改进。我们的模型在DFDC、FF++、DeepfakeTIMIT和Celeb-DF v2等数据集上进行训练和评估,实现了高的分类精度、F1分数和AUC值。我们提出的GenConViT模型在深层复制视频检测中表现了强大的一致性,其平均准确率为95.8%,AUC值为99.3%。我们的提出的模型通过利用视觉和隐藏特征,为检测各种假视频而提供了有效的解决方案,保护数字媒体完整性。GenConViT模型的代码可以在https://github.com/erprogs/GenConViT中下载。
Tapestry of Time and Actions: Modeling Human Activity Sequences using Temporal Point Process Flows
results: 经过广泛的实验,表明ProActive模型在动作和目标预测方面具有显著的准确率提升,并且实现了端到端动作序列生成的首次应用。Abstract
Human beings always engage in a vast range of activities and tasks that demonstrate their ability to adapt to different scenarios. Any human activity can be represented as a temporal sequence of actions performed to achieve a certain goal. Unlike the time series datasets extracted from electronics or machines, these action sequences are highly disparate in their nature -- the time to finish a sequence of actions can vary between different persons. Therefore, understanding the dynamics of these sequences is essential for many downstream tasks such as activity length prediction, goal prediction, next action recommendation, etc. Existing neural network-based approaches that learn a continuous-time activity sequence (or CTAS) are limited to the presence of only visual data or are designed specifically for a particular task, i.e., limited to next action or goal prediction. In this paper, we present ProActive, a neural marked temporal point process (MTPP) framework for modeling the continuous-time distribution of actions in an activity sequence while simultaneously addressing three high-impact problems -- next action prediction, sequence-goal prediction, and end-to-end sequence generation. Specifically, we utilize a self-attention module with temporal normalizing flows to model the influence and the inter-arrival times between actions in a sequence. In addition, we propose a novel addition over the ProActive model that can handle variations in the order of actions, i.e., different methods of achieving a given goal. We demonstrate that this variant can learn the order in which the person or actor prefers to do their actions. Extensive experiments on sequences derived from three activity recognition datasets show the significant accuracy boost of ProActive over the state-of-the-art in terms of action and goal prediction, and the first-ever application of end-to-end action sequence generation.
摘要
人类总是在各种各样的活动和任务中展现出非常强的适应能力,这些活动和任务可以看作是一个时间序列的动作。与电子设备或机器的时间序列数据不同,这些动作序列具有非常不同的特点,因此理解这些序列的动态是许多下游任务的关键,如行动长度预测、目标预测、下一个动作建议等。现有的基于神经网络的方法,只能学习视觉数据的连续时间动作序列(CTAS),而且一些方法只能进行特定任务,如下一个动作或目标预测。在这篇论文中,我们提出了ProActive,一种基于marked temporal point process(MTPP)的神经网络框架,用于模型连续时间动作序列中的动作的分布,同时解决三个高度影响的问题:下一个动作预测、序列目标预测和终端动作生成。我们使用自注意模块和时间正常化流来模型动作序列中动作之间的影响和间隔时间。此外,我们还提出了一种新的ProActive变体,可以处理动作序列中动作的不同顺序,即不同的方法来完成同一个目标。我们的实验表明,这种变体可以学习actor或人类的偏好顺序。与现状的最佳实践相比,ProActive在动作和目标预测方面具有显著的准确率提升,并且是继承动作序列生成的首次应用。
Bridging the Gap: Heterogeneous Face Recognition with Conditional Adaptive Instance Modulation
paper_authors: Anjith George, Sebastien Marcel for:本研究旨在提高Face Recognition(FR)系统的可用性,通过匹配不同频谱的脸像,包括可见和热成像频谱。methods:我们对不同频谱的脸像进行了分类,并将不同频谱视为不同的样式。我们还提出了一种 Conditional Adaptive Instance Modulation(CAIM)模块,可以将这些样式引入到预训练FR网络中,以适应目标频谱。results:我们在多个具有挑战性的测试集上进行了广泛的评估,并证明了我们的方法在与现有方法相比具有更高的性能。我们将源代码和实验室协议公开发布,以便重复我们的发现。Abstract
Heterogeneous Face Recognition (HFR) aims to match face images across different domains, such as thermal and visible spectra, expanding the applicability of Face Recognition (FR) systems to challenging scenarios. However, the domain gap and limited availability of large-scale datasets in the target domain make training robust and invariant HFR models from scratch difficult. In this work, we treat different modalities as distinct styles and propose a framework to adapt feature maps, bridging the domain gap. We introduce a novel Conditional Adaptive Instance Modulation (CAIM) module that can be integrated into pre-trained FR networks, transforming them into HFR networks. The CAIM block modulates intermediate feature maps, to adapt the style of the target modality effectively bridging the domain gap. Our proposed method allows for end-to-end training with a minimal number of paired samples. We extensively evaluate our approach on multiple challenging benchmarks, demonstrating superior performance compared to state-of-the-art methods. The source code and protocols for reproducing the findings will be made publicly available.
摘要
异构面Recognition(HFR)目的是匹配不同频谱的面图像,扩展面认知(FR)系统的应用场景。然而,频谱差距和目标频谱大规模数据的有限性使得从头文件训练Robust和抗变异HFR模型困难。在这种情况下,我们将不同频谱视为不同风格,并提议一种框架来适应特征地图。我们引入了一种新的Conditional Adaptive Instance Modulation(CAIM)模块,可以与预训练FR网络集成,将其转化为HFR网络。CAIM块在中间特征地图中进行调整,以有效地桥接频谱差距。我们提出的方法允许终端培训,只需要 minimal number of paired samples。我们对多个复杂的标准架进行了广泛的评估,并示出了与当前方法相比的优秀性能。源代码和 reproduce 的协议将公开发布。
Self-regulating Prompts: Foundational Model Adaptation without Forgetting
paper_authors: Muhammad Uzair Khattak, Syed Talal Wasim, Muzammal Naseer, Salman Khan, Ming-Hsuan Yang, Fahad Shahbaz Khan for:PromptSRC is designed to improve the performance of prompt learning for downstream tasks while maintaining the generalization ability of the pre-trained CLIP model.methods:PromptSRC uses a self-regularization framework that includes mutual agreement maximization, self-ensemble of prompts, and textual diversity to guide the prompts to optimize for both task-specific and task-agnostic general representations.results:PromptSRC outperforms existing methods on 4 benchmarks and demonstrates better performance on downstream tasks while maintaining the generalization ability of the pre-trained CLIP model.Here is the answer in Simplified Chinese text:for: PromptSRC 是为提高下游任务的表现而设计的提示学习方法,同时保持预训练 CLIP 模型的通用能力。methods: PromptSRC 使用自我 regulatory 框架,包括协调最大化、自我集成和文本多样性来引导提示来优化任务特定和通用特征表示。results: PromptSRC 在 4 个 benchmark 上表现出色,在下游任务上表现更好,同时保持预训练 CLIP 模型的通用能力。Abstract
Prompt learning has emerged as an efficient alternative for fine-tuning foundational models, such as CLIP, for various downstream tasks. Conventionally trained using the task-specific objective, i.e., cross-entropy loss, prompts tend to overfit downstream data distributions and find it challenging to capture task-agnostic general features from the frozen CLIP. This leads to the loss of the model's original generalization capability. To address this issue, our work introduces a self-regularization framework for prompting called PromptSRC (Prompting with Self-regulating Constraints). PromptSRC guides the prompts to optimize for both task-specific and task-agnostic general representations using a three-pronged approach by: (a) regulating prompted representations via mutual agreement maximization with the frozen model, (b) regulating with self-ensemble of prompts over the training trajectory to encode their complementary strengths, and (c) regulating with textual diversity to mitigate sample diversity imbalance with the visual branch. To the best of our knowledge, this is the first regularization framework for prompt learning that avoids overfitting by jointly attending to pre-trained model features, the training trajectory during prompting, and the textual diversity. PromptSRC explicitly steers the prompts to learn a representation space that maximizes performance on downstream tasks without compromising CLIP generalization. We perform extensive experiments on 4 benchmarks where PromptSRC overall performs favorably well compared to the existing methods. Our code and pre-trained models are publicly available at: https://github.com/muzairkhattak/PromptSRC.
摘要
它们发现了一种高效的替代方案,即使用适应性训练来练习基础模型,如CLIP,以适应不同的下游任务。通常通过任务特定的目标函数,即十字矩阵损失函数,来训练提示。然而,这会导致提示过拟合下游数据分布,难以捕捉基于冻结的CLIP的任务不可预测的通用特征。这会导致模型的原始泛化能力丢失。为解决这个问题,我们的工作提出了一个自regularization框架,即PromptSRC(提示自regularization)。PromptSRC使提示优化任务特定和任务无关通用表示,通过以下三个方法:(a)通过与冻结模型的共识最大化来规范提示表示,(b)通过自我集成提示训练轨迹中的提示来编码其优势,(c)通过文本多样性来抑制采样不均衡问题。根据我们所知,这是首个避免过拟合的提示学习规则框架,可以同时关注预训练模型特征、训练轨迹和文本多样性。PromptSRC显式地使提示学习一个表示空间,以最大化下游任务性能而无需妥协CLIP泛化。我们在4个标准测试集上进行了广泛的实验,并证明PromptSRC在现有方法中表现出色。我们的代码和预训练模型可以在https://github.com/muzairkhattak/PromptSRC上获取。
InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation
results: 我们的模型在视频识别和检索任务上表现出色,并且在视频-文本生成和对话系统等多modal应用中也具有广泛的应用前景。Abstract
This paper introduces InternVid, a large-scale video-centric multimodal dataset that enables learning powerful and transferable video-text representations for multimodal understanding and generation. The InternVid dataset contains over 7 million videos lasting nearly 760K hours, yielding 234M video clips accompanied by detailed descriptions of total 4.1B words. Our core contribution is to develop a scalable approach to autonomously build a high-quality video-text dataset with large language models (LLM), thereby showcasing its efficacy in learning video-language representation at scale. Specifically, we utilize a multi-scale approach to generate video-related descriptions. Furthermore, we introduce ViCLIP, a video-text representation learning model based on ViT-L. Learned on InternVid via contrastive learning, this model demonstrates leading zero-shot action recognition and competitive video retrieval performance. Beyond basic video understanding tasks like recognition and retrieval, our dataset and model have broad applications. They are particularly beneficial for generating interleaved video-text data for learning a video-centric dialogue system, advancing video-to-text and text-to-video generation research. These proposed resources provide a tool for researchers and practitioners interested in multimodal video understanding and generation.
摘要
Translated into Simplified Chinese:这篇论文介绍了InternVid,一个大规模的视频-中心多模态数据集,允许学习强大并可传递的视频-文本表示,以便多Modal理解和生成。InternVid数据集包含超过700万个视频,持续时间约为760000小时,共有23400万个视频clip, accompanied by detailed descriptions of total 4.1 billion words。我们的核心贡献是开发一种可扩展的方法,通过大语言模型(LLM)来自动建立高质量的视频-文本数据集,并在这个数据集上学习视频-语言表示。我们使用多尺度方法生成视频相关的描述。此外,我们介绍了ViCLIP,基于ViT-L的视频-文本表示学习模型。通过对InternVid进行对比学习,这个模型在零shot动作识别和视频检索方面表现出了领先的性能。除了基本的视频理解任务之外,我们的数据集和模型在广泛的应用领域。它们特别有用于生成交叠的视频-文本数据,以便学习基于视频的对话系统,以及进一步的视频-文本和文本-视频生成研究。这些提出的资源为关注多Modal视频理解和生成的研究人员和实践者提供了一个工具。
Animate-A-Story: Storytelling with Retrieval-Augmented Video Generation
results: 比较现有基eline的优势,能够生成具有可控的场景和动作,以及文本提示导航的视觉故事视频。Abstract
Generating videos for visual storytelling can be a tedious and complex process that typically requires either live-action filming or graphics animation rendering. To bypass these challenges, our key idea is to utilize the abundance of existing video clips and synthesize a coherent storytelling video by customizing their appearances. We achieve this by developing a framework comprised of two functional modules: (i) Motion Structure Retrieval, which provides video candidates with desired scene or motion context described by query texts, and (ii) Structure-Guided Text-to-Video Synthesis, which generates plot-aligned videos under the guidance of motion structure and text prompts. For the first module, we leverage an off-the-shelf video retrieval system and extract video depths as motion structure. For the second module, we propose a controllable video generation model that offers flexible controls over structure and characters. The videos are synthesized by following the structural guidance and appearance instruction. To ensure visual consistency across clips, we propose an effective concept personalization approach, which allows the specification of the desired character identities through text prompts. Extensive experiments demonstrate that our approach exhibits significant advantages over various existing baselines.
摘要
生成视频 для视觉Storytelling可以是一个劳碌和复杂的过程,通常需要live-action拍摄或图形动画渲染。为了绕过这些挑战,我们的关键想法是利用现有的视频剪辑和synthesize一个coherent的Storytelling视频,并通过自定义视频的场景和动作来满足用户的需求。我们的框架包括两个功能模块:1. 动作结构检索(Motion Structure Retrieval):通过提供用户需要的场景或动作上下文的查询文本,检索可用的视频clip。2. 结构引导文本到视频synthesizer(Structure-Guided Text-to-Video Synthesis):根据动作结构和文本提示,生成具有结构和人物控制的视频。为了实现第一个模块,我们利用了一个off-the-shelf的视频检索系统,并提取视频的深度作为动作结构。为了实现第二个模块,我们提议一种可控的视频生成模型,具有灵活的结构和人物控制。通过跟随结构指导和外观指令,我们可以生成具有视觉一致性的视频。为保证视频之间的一致性,我们提议一种有效的人物个性化方法,允许通过文本提示来定义愿望的人物标识。我们的方法在多个存在baseline的实验中表现出了显著的优势。
Domain-Agnostic Tuning-Encoder for Fast Personalization of Text-To-Image Models
paper_authors: Moab Arar, Rinon Gal, Yuval Atzmon, Gal Chechik, Daniel Cohen-Or, Ariel Shamir, Amit H. Bermano
for: This paper is written for personalizing text-to-image (T2I) generation, allowing users to guide the creative image generation process by combining their own visual concepts in natural language prompts.
methods: The paper proposes a domain-agnostic method for T2I personalization that does not require any specialized dataset or prior information about the personalized concepts. The method uses a novel contrastive-based regularization technique to maintain high fidelity to the target concept characteristics while keeping the predicted embeddings close to editable regions of the latent space.
results: The experimental results demonstrate the effectiveness of the proposed approach, showing that the learned tokens are more semantic than tokens predicted by unregularized models. This leads to a better representation that achieves state-of-the-art performance while being more flexible than previous methods.Here’s the Chinese translation of the three information points:
results: 实验结果表明提出的方法的有效性,预测的元素比不正则化模型预测的元素更加 semantic,导致一个更好的表示,实现了状态的最佳性能,同时比前方法更加灵活。Abstract
Text-to-image (T2I) personalization allows users to guide the creative image generation process by combining their own visual concepts in natural language prompts. Recently, encoder-based techniques have emerged as a new effective approach for T2I personalization, reducing the need for multiple images and long training times. However, most existing encoders are limited to a single-class domain, which hinders their ability to handle diverse concepts. In this work, we propose a domain-agnostic method that does not require any specialized dataset or prior information about the personalized concepts. We introduce a novel contrastive-based regularization technique to maintain high fidelity to the target concept characteristics while keeping the predicted embeddings close to editable regions of the latent space, by pushing the predicted tokens toward their nearest existing CLIP tokens. Our experimental results demonstrate the effectiveness of our approach and show how the learned tokens are more semantic than tokens predicted by unregularized models. This leads to a better representation that achieves state-of-the-art performance while being more flexible than previous methods.
摘要
LVLane: Deep Learning for Lane Detection and Classification in Challenging Conditions
results: 实验结果表明,该系统在TuSimple数据集、Caltech Lane数据集和LVLane数据集等多个数据集上具有优秀的车道检测和分类能力,特别是在面临特殊挑战的场景下表现出色。Abstract
Lane detection plays a pivotal role in the field of autonomous vehicles and advanced driving assistant systems (ADAS). Despite advances from image processing to deep learning based models, algorithm performance is highly dependent on training data matching the local challenges such as extreme lighting conditions, partially visible lane markings, and sparse lane markings like Botts' dots. To address this, we present an end-to-end lane detection and classification system based on deep learning methodologies. In our study, we introduce a unique dataset meticulously curated to encompass scenarios that pose significant challenges for state-of-the-art (SOTA) lane localization models. Moreover, we propose a CNN-based classification branch, seamlessly integrated with the detector, facilitating the identification of distinct lane types. This architecture enables informed lane-changing decisions and empowers more resilient ADAS capabilities. We also investigate the effect of using mixed precision training and testing on different models and batch sizes. Experimental evaluations conducted on the widely-used TuSimple dataset, Caltech Lane dataset, and our LVLane dataset demonstrate the effectiveness of our model in accurately detecting and classifying lanes amidst challenging scenarios. Our method achieves state-of-the-art classification results on the TuSimple dataset. The code of the work can be found on www.github.com/zillur-av/LVLane.
摘要
lane detection 在自动驾驶和高级驾驶助手系统(ADAS)中扮演着关键的角色。尽管从图像处理演化到深度学习基于模型,但算法性能仍然受到本地挑战的影响,如极端光照条件、部分可见的车道标记和罕见的车道标记如Botts的点。为解决这一问题,我们提出了一个端到端的车道检测和分类系统,基于深度学习方法。在我们的研究中,我们提供了一个独特的数据集,仔细筛选了表现出特殊挑战的场景,以便为现有的SOTA车道local化模型提供更加准确的挑战。此外,我们还提出了一种基于CNN的分类分支,与检测器紧密集成,以便识别不同的车道类型。这种架构允许更加有知识的车道更改决策,激发更加可靠的ADAS功能。我们还进行了不同模型和批处理大小的混合精度训练和测试的研究。实验结果在广泛使用的TuSimple数据集、Caltech Lane数据集和我们的LVLane数据集上展示了我们的模型在面临挑战场景下准确检测和分类车道的能力。我们的方法在TuSimple数据集上实现了SOTA的分类结果。代码可以在www.github.com/zillur-av/LVLane中找到。
paper_authors: Fei Zhang, Yunjie Ye, Lei Feng, Zhongwen Rao, Jieming Zhu, Marcus Kalander, Chen Gong, Jianye Hao, Bo Han
for: 这个论文研究了一个新的问题,即活动学习 WITH 部分标签(ALPL)。在这种设定下,一个oracle对查询样本提供部分标签,从而放宽oracle的准确标签过程。
methods: 我们首先构建了一个直观的基线,可以轻松地整合到现有的AL框架中。虽然有效,但这个基eline仍然受到过拟合的影响,并且在选择代表样本时缺乏表达partial-label-based samples的能力。我们从人类的认知科学中灵感,where accurate inferences can be explicitly derived from counter-examples (CEs),我们的目标是使用这种人类学习模式来解决过拟合问题,同时提高选择代表样本的过程。
results: 我们的方法在五个实际数据集和四个benchmark数据集上实现了全面的改进,超过了十个代表性的AL框架。这些实验结果表明,我们提议的方法可以增强predictor的性能和选择代表样本的过程,使predictor能够更准确地捕捉数据中的征性 Patterns。Abstract
This paper studies a new problem, \emph{active learning with partial labels} (ALPL). In this setting, an oracle annotates the query samples with partial labels, relaxing the oracle from the demanding accurate labeling process. To address ALPL, we first build an intuitive baseline that can be seamlessly incorporated into existing AL frameworks. Though effective, this baseline is still susceptible to the \emph{overfitting}, and falls short of the representative partial-label-based samples during the query process. Drawing inspiration from human inference in cognitive science, where accurate inferences can be explicitly derived from \emph{counter-examples} (CEs), our objective is to leverage this human-like learning pattern to tackle the \emph{overfitting} while enhancing the process of selecting representative samples in ALPL. Specifically, we construct CEs by reversing the partial labels for each instance, and then we propose a simple but effective WorseNet to directly learn from this complementary pattern. By leveraging the distribution gap between WorseNet and the predictor, this adversarial evaluation manner could enhance both the performance of the predictor itself and the sample selection process, allowing the predictor to capture more accurate patterns in the data. Experimental results on five real-world datasets and four benchmark datasets show that our proposed method achieves comprehensive improvements over ten representative AL frameworks, highlighting the superiority of WorseNet. The source code will be available at \url{https://github.com/Ferenas/APLL}.
摘要
CAMP: A Context-Aware Cricket Players Performance Metric
results: 研究发现,CAMP评估结果与专家委员会宣布的最佳球员(Man of the Match,MoM)相closely match,在961场比赛中,CAMP评估的top两名球员与MoM一致的比例为83%。此外,CAMP的评估结果也超过了现有的最佳球员贡献度量方法(基于Duckworth-Lewis-Stern方法)。Abstract
Cricket is the second most popular sport after soccer in terms of viewership. However, the assessment of individual player performance, a fundamental task in team sports, is currently primarily based on aggregate performance statistics, including average runs and wickets taken. We propose Context-Aware Metric of player Performance, CAMP, to quantify individual players' contributions toward a cricket match outcome. CAMP employs data mining methods and enables effective data-driven decision-making for selection and drafting, coaching and training, team line-ups, and strategy development. CAMP incorporates the exact context of performance, such as opponents' strengths and specific circumstances of games, such as pressure situations. We empirically evaluate CAMP on data of limited-over cricket matches between 2001 and 2019. In every match, a committee of experts declares one player as the best player, called Man of the M}atch (MoM). The top two rated players by CAMP match with MoM in 83\% of the 961 games. Thus, the CAMP rating of the best player closely matches that of the domain experts. By this measure, CAMP significantly outperforms the current best-known players' contribution measure based on the Duckworth-Lewis-Stern (DLS) method.
摘要
крикет是以往最受欢迎的运动之一,仅次于足球。然而,评估个体运动员的表现,是现代团队运动中的基本任务,目前主要基于汇总性统计,包括平均得分和帮助取下。我们提出了 Context-Aware Metric of player Performance(CAMP),用于衡量 крикет运动员的贡献。CAMP 利用数据挖掘技术,实现了有效的数据驱动决策,包括选择和招募、教练和训练、队列和战略开发。CAMP 包含了特定场景的表现,如对手的优势和特定游戏情况,如压力情况。我们对数据库中的限制时间 крикет比赛数据进行了Empirical评估。每场比赛中,专家委员会选择一名最佳运动员,称为 Man of the Match(MoM)。CAMP 中的前两名与 MoM 匹配在 83% 的 961 场比赛中。因此,CAMP 评分的最佳运动员与领域专家的评价相当接近。根据这个标准,CAMP 明显超过了目前最佳运动员贡献的度量方法,基于 Duckworth-Lewis-Stern(DLS)方法。
Rank Your Summaries: Enhancing Bengali Text Summarization via Ranking-based Approach
results: 实验结果表明,通过 combinig 每种预训练模型的优点并使用排名算法,本方法可以显著提高 Bengali 文本摘要的准确性和效果。Abstract
With the increasing need for text summarization techniques that are both efficient and accurate, it becomes crucial to explore avenues that enhance the quality and precision of pre-trained models specifically tailored for summarizing Bengali texts. When it comes to text summarization tasks, there are numerous pre-trained transformer models at one's disposal. Consequently, it becomes quite a challenge to discern the most informative and relevant summary for a given text among the various options generated by these pre-trained summarization models. This paper aims to identify the most accurate and informative summary for a given text by utilizing a simple but effective ranking-based approach that compares the output of four different pre-trained Bengali text summarization models. The process begins by carrying out preprocessing of the input text that involves eliminating unnecessary elements such as special characters and punctuation marks. Next, we utilize four pre-trained summarization models to generate summaries, followed by applying a text ranking algorithm to identify the most suitable summary. Ultimately, the summary with the highest ranking score is chosen as the final one. To evaluate the effectiveness of this approach, the generated summaries are compared against human-annotated summaries using standard NLG metrics such as BLEU, ROUGE, BERTScore, WIL, WER, and METEOR. Experimental results suggest that by leveraging the strengths of each pre-trained transformer model and combining them using a ranking-based approach, our methodology significantly improves the accuracy and effectiveness of the Bengali text summarization.
摘要
随着需要高效精准的文本摘要技术的增加,对适应 Bengali 文本摘要模型进行特点化成本的探索变得非常重要。在文本摘要任务中,有许多预训练的转换器模型可供选择。因此,对于给定的文本,从多个预训练摘要模型中选择最佳的摘要成为了一项挑战。本文提出了一种简单 yet effective 的排名基于方法,该方法通过对四个预训练 Bengali 文本摘要模型的输出进行比较,以确定最佳的摘要。该过程包括对输入文本进行简化处理,从而消除不必要的元素,如特殊字符和标点符号。接着,我们利用四个预训练摘要模型生成摘要,然后应用文本排名算法来确定最佳摘要。最后,根据排名得分,选择最高分的摘要作为最终结果。为了评估该方法的有效性,我们将生成的摘要与人工标注的摘要进行比较,使用标准的NLG指标,如 BLEU、ROUGE、BERTScore、WIL、WER 和 METEOR。实验结果表明,通过利用每个预训练转换器模型的优点并将其结合使用排名基于方法,我们的方法可以显著提高 Bengali 文本摘要的准确性和效iveness。
AIC-AB NET: A Neural Network for Image Captioning with Spatial Attention and Text Attributes
paper_authors: Guoyun Tu, Ying Liu, Vladimir Vlassov
For: 本研究旨在提出和实现一种Attribute-Information-Combined Attention-Based Network(AIC-AB NET),用于图像描述 generation。* Methods: 该模型结合了空间注意力架构和文本特征信息,并在encoder-decoder结构中进行了可适应的注意力调整。* Results: 对于 MS COCO 数据集和我们新提出的 Fashion 数据集,我们的 AIC-AB NET 与基eline模型和减少模型进行了比较,并取得了更高的性能水平。Abstract
Image captioning is a significant field across computer vision and natural language processing. We propose and present AIC-AB NET, a novel Attribute-Information-Combined Attention-Based Network that combines spatial attention architecture and text attributes in an encoder-decoder. For caption generation, adaptive spatial attention determines which image region best represents the image and whether to attend to the visual features or the visual sentinel. Text attribute information is synchronously fed into the decoder to help image recognition and reduce uncertainty. We have tested and evaluated our AICAB NET on the MS COCO dataset and a new proposed Fashion dataset. The Fashion dataset is employed as a benchmark of single-object images. The results show the superior performance of the proposed model compared to the state-of-the-art baseline and ablated models on both the images from MSCOCO and our single-object images. Our AIC-AB NET outperforms the baseline adaptive attention network by 0.017 (CIDEr score) on the MS COCO dataset and 0.095 (CIDEr score) on the Fashion dataset.
摘要
“图像描述是计算机视觉和自然语言处理领域中的一个重要领域。我们提议并提出了一种新的 Attribute-Information-Combined Attention-Based Network(AIC-AB NET),该模型结合了空间注意架构和文本特征在encoder-decoder中组合。在图像描述中,适应的空间注意可以确定图像中最好代表图像的区域,以及是否需要关注视觉特征或视觉特征。文本特征信息同时被 fed into decoder,以帮助图像识别和减少不确定性。我们在 MS COCO 数据集和我们提出的新的时尚数据集上测试和评估了我们的 AICAB NET。时尚数据集被用作单个物体图像的标准 benchmark。结果表明我们的提议模型与状态之前的基eline和剥离模型在 MS COCO 数据集和时尚数据集上的表现都有所提高,分别提高了0.017(CIDEr 分数)和0.095(CIDEr 分数)。”
Are Large Language Models a Threat to Digital Public Goods? Evidence from Activity on Stack Overflow
paper_authors: Maria del Rio-Chanona, Nadzeya Laurentsyeva, Johannes Wachs
for: investigate how the release of ChatGPT affects human-generated open data on the web, specifically on Stack Overflow
methods: analyze activity on Stack Overflow, use difference-in-differences model to estimate the impact of ChatGPT
results: find a 16% decrease in weekly posts on Stack Overflow after the release of ChatGPT, with a greater impact on posts related to the most widely used programming languages, and no significant change in voting scores.Abstract
Large language models like ChatGPT efficiently provide users with information about various topics, presenting a potential substitute for searching the web and asking people for help online. But since users interact privately with the model, these models may drastically reduce the amount of publicly available human-generated data and knowledge resources. This substitution can present a significant problem in securing training data for future models. In this work, we investigate how the release of ChatGPT changed human-generated open data on the web by analyzing the activity on Stack Overflow, the leading online Q\&A platform for computer programming. We find that relative to its Russian and Chinese counterparts, where access to ChatGPT is limited, and to similar forums for mathematics, where ChatGPT is less capable, activity on Stack Overflow significantly decreased. A difference-in-differences model estimates a 16\% decrease in weekly posts on Stack Overflow. This effect increases in magnitude over time, and is larger for posts related to the most widely used programming languages. Posts made after ChatGPT get similar voting scores than before, suggesting that ChatGPT is not merely displacing duplicate or low-quality content. These results suggest that more users are adopting large language models to answer questions and they are better substitutes for Stack Overflow for languages for which they have more training data. Using models like ChatGPT may be more efficient for solving certain programming problems, but its widespread adoption and the resulting shift away from public exchange on the web will limit the open data people and models can learn from in the future.
摘要
大型自然语言模型如ChatGPT有效地为用户提供了关于不同话题的信息,可能成为搜索网络和在线问答的替代方案。但是由于用户与模型进行私人交互,这些模型可能会减少公开可用的人类生成的数据和知识资源。这种替换可能会对未来模型的训练数据产生重大问题。在这项工作中,我们研究了ChatGPT的发布后对网络上的人类生成开放数据的影响,通过分析Stack Overflow网站的活动。我们发现,相比于其俄罗斯和中国的对手, Stack Overflow上的活动呈现下降趋势。此外,与数学相关的讨论forum不同,Stack Overflow上的活动呈现更大的下降趋势。使用差异分析模型,我们估算Stack Overflow每周的帖子数减少16%。这个效果随时间的推移而增强,并且对于使用最广泛的编程语言的帖子更大。帖子发布后得到了与之前相同的投票分数,表明ChatGPT不仅不是替换低质量或 duplicates的内容,而且是更好的解决某些编程问题的模型。这些结果表明,更多的用户正在使用大语言模型来回答问题,而且它们对于某些编程语言来说是更好的替代方案。使用模型如ChatGPT可能更高效地解决某些编程问题,但是其广泛的采用和相应的减少公开数据将限制未来模型和人类可以学习的开放数据。
Source-Free Domain Adaptation with Temporal Imputation for Time Series Data
results: 实验结果显示, compared to现有的方法,MAPU在三个真实世界时间序列数据集上 achieves significant performance gain。Abstract
Source-free domain adaptation (SFDA) aims to adapt a pretrained model from a labeled source domain to an unlabeled target domain without access to the source domain data, preserving source domain privacy. Despite its prevalence in visual applications, SFDA is largely unexplored in time series applications. The existing SFDA methods that are mainly designed for visual applications may fail to handle the temporal dynamics in time series, leading to impaired adaptation performance. To address this challenge, this paper presents a simple yet effective approach for source-free domain adaptation on time series data, namely MAsk and imPUte (MAPU). First, to capture temporal information of the source domain, our method performs random masking on the time series signals while leveraging a novel temporal imputer to recover the original signal from a masked version in the embedding space. Second, in the adaptation step, the imputer network is leveraged to guide the target model to produce target features that are temporally consistent with the source features. To this end, our MAPU can explicitly account for temporal dependency during the adaptation while avoiding the imputation in the noisy input space. Our method is the first to handle temporal consistency in SFDA for time series data and can be seamlessly equipped with other existing SFDA methods. Extensive experiments conducted on three real-world time series datasets demonstrate that our MAPU achieves significant performance gain over existing methods. Our code is available at \url{https://github.com/mohamedr002/MAPU_SFDA_TS}.
摘要
源自由领域适应 (SFDA) 目标是将预训练的源频率模型适应到无标记目标频率频率上,保持源频率隐私。尽管 SFDA 在视觉应用中广泛存在,但在时间序列应用中它尚未得到充分研究。现有的 SFDA 方法主要是为视觉应用设计,可能无法处理时间序列中的时间动态,导致适应性下降。为解决这个挑战,本文提出了一种简单 yet 有效的时间序列 SFDA 方法,即 Maske 和 imPUte (MAPU)。首先,为了捕捉源频率频率中的时间信息,我们的方法在时间序列信号上随机填充,并利用一种新的时间填充器来在嵌入空间中恢复原始信号。其次,在适应步骤中,填充器网络被利用来引导目标模型生成目标特征,使其与源特征在时间上具有一致性。这样,我们的 MAPU 可以在适应过程中考虑时间相互关系,而不是在噪音输入空间中进行填充。我们的方法是时间序列 SFDA 中首次考虑时间一致性的方法,可以顺利地与其他现有的 SFDA 方法结合使用。我们的实验结果表明,使用 MAPU 可以在三个实际的时间序列数据集上实现显著的性能提升。我们的代码可以在 \url{https://github.com/mohamedr002/MAPU_SFDA_TS} 上找到。
Rethinking Trust Repair in Human-Robot Interaction
results: 本研究提供了人机交互中信任修复策略的概念和关键组件,以及现有的研究成果。未来的研究将围绕着这些研究问题进行发展。Abstract
As robots become increasingly prevalent in work-oriented collaborations, trust has emerged as a critical factor in their acceptance and effectiveness. However, trust is dynamic and can erode when mistakes are made. Despite emerging research on trust repair in human-robot interaction, significant questions remain about identifying reliable approaches to restoring trust in robots after trust violations occur. To address this problem, my research aims to identify effective strategies for designing robots capable of trust repair in human-robot interaction (HRI) and to explore the underlying mechanisms that make these strategies successful. This paper provides an overview of the fundamental concepts and key components of the trust repair process in HRI, as well as a summary of my current published work in this area. Additionally, I discuss the research questions that will guide my future work and the potential contributions that this research could make to the field.
摘要
As robots become increasingly prevalent in work-oriented collaborations, trust has emerged as a critical factor in their acceptance and effectiveness. However, trust is dynamic and can erode when mistakes are made. Despite emerging research on trust repair in human-robot interaction, significant questions remain about identifying reliable approaches to restoring trust in robots after trust violations occur. To address this problem, my research aims to identify effective strategies for designing robots capable of trust repair in human-robot interaction (HRI) and to explore the underlying mechanisms that make these strategies successful. This paper provides an overview of the fundamental concepts and key components of the trust repair process in HRI, as well as a summary of my current published work in this area. Additionally, I discuss the research questions that will guide my future work and the potential contributions that this research could make to the field.Here's the translation in Traditional Chinese:随着机器人在工作合作中的普及,信任已成为 kritical factor 的 acceptance 和 efficiency。然而,信任是动态的,可以在错误时被损坏。尽管人机交互中的信任修复研究已经出现,但仍有很多问题需要解决,包括如何确定可靠的方法来重建机器人信任。为了解决这个问题,我的研究目标是发现可靠的机器人信任修复策略,并探索这些策略的成功关键。这篇文章提供了人机交互中信任修复过程的基本概念和关键组件,以及我的现有发表作品。此外,我还讨论了未来研究的问题和这些研究对领域的潜在贡献。
Mitigating Bias in Conversations: A Hate Speech Classifier and Debiaser with Prompts
methods: 提议一种两步方法:首先,使用分类器检测 hate speech;然后,通过提示生成更不偏见或不偏见的替代语言。
results: 对一个标准数据集进行评估,观察到 hate speech 的负面效果减少。 这种方法可以帮助在线对话中减少偏见,创造更公正和包容的沟通环境。Abstract
Discriminatory language and biases are often present in hate speech during conversations, which usually lead to negative impacts on targeted groups such as those based on race, gender, and religion. To tackle this issue, we propose an approach that involves a two-step process: first, detecting hate speech using a classifier, and then utilizing a debiasing component that generates less biased or unbiased alternatives through prompts. We evaluated our approach on a benchmark dataset and observed reduction in negativity due to hate speech comments. The proposed method contributes to the ongoing efforts to reduce biases in online discourse and promote a more inclusive and fair environment for communication.
摘要
【文本】恐Speech hate speech在对话中经常带有歧视性语言和偏见,这通常会对targeted groups造成负面影响,如基于种族、性别和宗教的群体。为解决这个问题,我们提出了一种两步方法:首先,使用分类器探测 hate speech,然后使用debiasing组件生成less biased或无偏见的 altenatives through prompts。我们对 benchmark dataset进行了评估,并观察到了对 hate speech负面评论的减少。该方法对在线对话中减少偏见和促进更加包容和公正的环境做出了贡献。Note: "恐Speech" is a combination of "恐怖" (terror) and "Speech" (speech), which is a common term used to refer to hate speech in Chinese.
Representation Learning With Hidden Unit Clustering For Low Resource Speech Applications
for: 这 paper 是关于无文本资源的语音表征学研究,具体是用 hidden unit clustering (HUC) 框架来实现自我超参的 speech 表征学习。
methods: 输入 audio 样本经 windowing 和 1-D convolutional layers 处理,然后使用 long short term memory (LSTM) 层生成每个窗口段的上下文向量表示。HUC 框架用于训练模型学习具有含义的 speech 表示。
results: 在 ZeroSpeech 2021 挑战中 Completely Unsupervised 语音应用中,以及 TIMIT 数据集和 GramVaani Hindi 数据集上的 semi-supervised automatic speech recognition (ASR) 应用中,模型 achieved state-of-art 结果。另外,HUC 表示在 ASR 实验中显著提高了对比 Wav2vec、HuBERT 和 Best-RQ 的成果。Abstract
The representation learning of speech, without textual resources, is an area of significant interest for many low resource speech applications. In this paper, we describe an approach to self-supervised representation learning from raw audio using a hidden unit clustering (HUC) framework. The input to the model consists of audio samples that are windowed and processed with 1-D convolutional layers. The learned "time-frequency" representations from the convolutional neural network (CNN) module are further processed with long short term memory (LSTM) layers which generate a contextual vector representation for every windowed segment. The HUC framework, allowing the categorization of the representations into a small number of phoneme-like units, is used to train the model for learning semantically rich speech representations. The targets consist of phoneme-like pseudo labels for each audio segment and these are generated with an iterative k-means algorithm. We explore techniques that improve the speaker invariance of the learned representations and illustrate the effectiveness of the proposed approach on two settings, i) completely unsupervised speech applications on the sub-tasks described as part of the ZeroSpeech 2021 challenge and ii) semi-supervised automatic speech recognition (ASR) applications on the TIMIT dataset and on the GramVaani challenge Hindi dataset. In these experiments, we achieve state-of-art results for various ZeroSpeech tasks. Further, on the ASR experiments, the HUC representations are shown to improve significantly over other established benchmarks based on Wav2vec, HuBERT and Best-RQ.
摘要
“对于无文本资源的语音识别,是一个具有很大的研究 интерес的领域。在这篇文章中,我们描述了一种对于类似时间频率的自我监督学习方法,使用隐藏单元聚合(HUC)框架。输入模型包含了对于每个时间频率范围的窗口处理和1-D卷积层。从卷积层获得的“时间频率”表现被进一步处理,并使用长期内存(LSTM)层生成每个窗口段的内容vector表现。HUC框架允许将表现分为一小数量的语音单元,并在这些单元上进行训练,以学习具有 semantic richness 的语音表现。目标包括每个音频段的语音单元 pseudo-标签,这些标签是使用迭代k-means算法生成的。我们探索了提高话者不变的技术,并证明了我们的方法在两个设定下具有优秀的效果:完全无监督语音应用程序中的ZeroSpeech 2021挑战和半监督自动语音识别(ASR)应用程序中的TIMITdataset和GramVaani挑战Hindi dataset。在这些实验中,我们取得了顶尖的成绩,并在不同的 ZeroSpeech 任务中获得了州内最佳的成绩。此外,在 ASR 实验中,HUC 表现优化了与其他已知的参考标准Wav2vec、HuBERT和Best-RQ相比。”
methods: 方法包括三个关键组件:Clear Prompting (CP)、Calibration with Hints (CH)和Consistent Output (CO),用于处理模型输入、模型偏见和模型输出。
results: 在Spider Challenge的备用测试集上达到了82.3%的执行精度,成为零shot Text-to-SQL领域的状态精度。Abstract
This paper proposes a ChatGPT-based zero-shot Text-to-SQL method, dubbed C3, which achieves 82.3\% in terms of execution accuracy on the holdout test set of Spider and becomes the state-of-the-art zero-shot Text-to-SQL method on the Spider Challenge. C3 consists of three key components: Clear Prompting (CP), Calibration with Hints (CH), and Consistent Output (CO), which are corresponding to the model input, model bias and model output respectively. It provides a systematic treatment for zero-shot Text-to-SQL. Extensive experiments have been conducted to verify the effectiveness and efficiency of our proposed method.
摘要
这篇论文提出了基于ChatGPT的零次 Text-to-SQL方法,名为C3,其在Spider挑战的备用测试集上达到了82.3%的执行精度,成为零次 Text-to-SQL领域的状态之一。C3包括三个关键组件: Clear Prompting(CP)、Calibration with Hints(CH)和Consistent Output(CO),它们分别对应模型输入、模型偏好和模型输出。它提供了零次 Text-to-SQL的系统性处理方法。我们进行了广泛的实验来证明我们提出的方法的有效性和效率。
One-Shot Action Recognition via Multi-Scale Spatial-Temporal Skeleton Matching
paper_authors: Siyuan Yang, Jun Liu, Shijian Lu, Er Meng Hwa, Alex C. Kot
for: 一shot skeleton action recognition, which aims to learn a skeleton action recognition model with a single training sample.
methods: 使用多scale spatial-temporal feature matching to handle skeleton action recognition, representing skeleton data at multiple spatial and temporal scales and achieving optimal feature matching from two perspectives.
results: 在三个大规模数据集(NTU RGB+D、NTU RGB+D 120、PKU-MMD)上进行了广泛的实验,得到了superior的一shot skeleton action recognition结果,并与状态对比大幅提高了性能。Abstract
One-shot skeleton action recognition, which aims to learn a skeleton action recognition model with a single training sample, has attracted increasing interest due to the challenge of collecting and annotating large-scale skeleton action data. However, most existing studies match skeleton sequences by comparing their feature vectors directly which neglects spatial structures and temporal orders of skeleton data. This paper presents a novel one-shot skeleton action recognition technique that handles skeleton action recognition via multi-scale spatial-temporal feature matching. We represent skeleton data at multiple spatial and temporal scales and achieve optimal feature matching from two perspectives. The first is multi-scale matching which captures the scale-wise semantic relevance of skeleton data at multiple spatial and temporal scales simultaneously. The second is cross-scale matching which handles different motion magnitudes and speeds by capturing sample-wise relevance across multiple scales. Extensive experiments over three large-scale datasets (NTU RGB+D, NTU RGB+D 120, and PKU-MMD) show that our method achieves superior one-shot skeleton action recognition, and it outperforms the state-of-the-art consistently by large margins.
摘要
一shot skeleton action recognition技术,即通过单个训练样本学习skeleton action recognition模型,已经引起了越来越多的关注,因为收集和标注大规模skeleton action数据是一项挑战。然而,大多数现有研究都是通过直接比较skeleton序列的特征向量来匹配skeleton数据,这种方法忽略了skeleton数据的空间结构和时间顺序。本文提出了一种新的一shot skeleton action recognition技术,通过多级空间-时间特征匹配来处理skeleton action recognition。我们将skeleton数据表示在多个空间和时间级别,并实现最佳的特征匹配从两个方面。第一是多级匹配,同时捕捉多个空间和时间级别的scale-wise semantic relevance。第二是交叉级别匹配,处理不同的动作幅度和速度,通过捕捉不同级别的sample-wise relevance。我们在NTU RGB+D、NTU RGB+D 120和PKU-MMD三个大规模数据集上进行了广泛的实验,结果表明,我们的方法可以实现superior的一shot skeleton action recognition,并在相对评价中减少了state-of-the-art的差距。
AudioInceptionNeXt: TCL AI LAB Submission to EPIC-SOUND Audio-Based-Interaction-Recognition Challenge 2023
results: 我们的方法在挑战测试集上达到55.43%的排名第一的top-1准确率,代码可以匿名获取于https://github.com/StevenLauHKHK/AudioInceptionNeXt.git。Abstract
This report presents the technical details of our submission to the 2023 Epic-Kitchen EPIC-SOUNDS Audio-Based Interaction Recognition Challenge. The task is to learn the mapping from audio samples to their corresponding action labels. To achieve this goal, we propose a simple yet effective single-stream CNN-based architecture called AudioInceptionNeXt that operates on the time-frequency log-mel-spectrogram of the audio samples. Motivated by the design of the InceptionNeXt, we propose parallel multi-scale depthwise separable convolutional kernels in the AudioInceptionNeXt block, which enable the model to learn the time and frequency information more effectively. The large-scale separable kernels capture the long duration of activities and the global frequency semantic information, while the small-scale separable kernels capture the short duration of activities and local details of frequency information. Our approach achieved 55.43% of top-1 accuracy on the challenge test set, ranked as 1st on the public leaderboard. Codes are available anonymously at https://github.com/StevenLauHKHK/AudioInceptionNeXt.git.
摘要
这份报告介绍我们在2023年 Epic-Kitchen EPIC-SOUNDS 音频基于交互识别挑战中的提交技术细节。任务是学习音频示例与其相应的动作标签之间的映射。为达到这个目标,我们提议一种简单 yet effective的单流Convolutional Neural Network(CNN)架构 AudioInceptionNeXt,该架构在时域-频谱幅的Log-Mel spectrogram上运行。受inceptionNeXt的设计启发,我们提议在AudioInceptionNeXt块中并行执行多尺度分割的深度独立 convolutional 核,这使得模型能够更有效地学习时间和频谱信息。大规模分割核捕捉活动的长期持续和全球频谱 semantic 信息,而小规模分割核捕捉活动的短期持续和本地频谱信息。我们的方法在挑战测试集上达到55.43%的 top-1 准确率,排名公共排行板上第一名。代码可以在https://github.com/StevenLauHKHK/AudioInceptionNeXt.git anonymous 上获取。
A Dynamic Points Removal Benchmark in Point Cloud Maps
results: 本研究使用了多个不同感应器类型的数据集,并提供了一个公开可用的代码和数据集,以便进一步的发展和应用。Abstract
In the field of robotics, the point cloud has become an essential map representation. From the perspective of downstream tasks like localization and global path planning, points corresponding to dynamic objects will adversely affect their performance. Existing methods for removing dynamic points in point clouds often lack clarity in comparative evaluations and comprehensive analysis. Therefore, we propose an easy-to-extend unified benchmarking framework for evaluating techniques for removing dynamic points in maps. It includes refactored state-of-art methods and novel metrics to analyze the limitations of these approaches. This enables researchers to dive deep into the underlying reasons behind these limitations. The benchmark makes use of several datasets with different sensor types. All the code and datasets related to our study are publicly available for further development and utilization.
摘要
在机器人学中,点云已成为重要的地图表示方式。从本地化和全球规划任务的视角来看,对动态对象的点会有负面影响性能。现有的点云中动态点除法方法经常缺乏明确的比较评价和全面分析。因此,我们提议一个易扩展的统一评价框架,用于评估地图中动态点除法的技术。该框架包括 refactoring 当前领先的方法和新的评价指标,以分析这些方法的局限性。这使研究人员能够深入探究这些局限性的根本原因。我们的benchmark使用了不同感知器类型的数据集。所有与我们的研究相关的代码和数据集都公开可用于进一步开发和应用。
Dialogue Agents 101: A Beginner’s Guide to Critical Ingredients for Designing Effective Conversational Systems
results: 研究发现,使用不同的方法解决不同的对话任务可能会带来高成本和不充分利用对话任务之间的相互关系。因此,现在的趋势是向建立统一基础模型。本研究还提出了一个统一对话数据集(UNIT),用于检验对话代理系统的性能。Abstract
Sharing ideas through communication with peers is the primary mode of human interaction. Consequently, extensive research has been conducted in the area of conversational AI, leading to an increase in the availability and diversity of conversational tasks, datasets, and methods. However, with numerous tasks being explored simultaneously, the current landscape of conversational AI becomes fragmented. Therefore, initiating a well-thought-out model for a dialogue agent can pose significant challenges for a practitioner. Towards highlighting the critical ingredients needed for a practitioner to design a dialogue agent from scratch, the current study provides a comprehensive overview of the primary characteristics of a dialogue agent, the supporting tasks, their corresponding open-domain datasets, and the methods used to benchmark these datasets. We observe that different methods have been used to tackle distinct dialogue tasks. However, building separate models for each task is costly and does not leverage the correlation among the several tasks of a dialogue agent. As a result, recent trends suggest a shift towards building unified foundation models. To this end, we propose UNIT, a UNified dIalogue dataseT constructed from conversations of existing datasets for different dialogue tasks capturing the nuances for each of them. We also examine the evaluation strategies used to measure the performance of dialogue agents and highlight the scope for future research in the area of conversational AI.
摘要
人类交流的主要方式是通过与同伴的交流,因此很多研究在对话AI方面进行了探索。这导致了对话任务的多样性和可用性的提高,但是由于同时探索多个任务,现在的对话AI领域就变得分散化。因此,设计一个从头开始的对话代理模型可能会对实践者提出 significiant challenges。为了帮助实践者设计对话代理模型,本研究提供了对对话代理模型的主要特征、支持任务、相关的开放领域数据集以及用于评估这些数据集的方法的全面回顾。我们发现不同的方法在解决不同的对话任务时都有用,但是建立每个任务的单独模型是昂贵的,并且不利用对话任务之间的相互关系。因此,现在的趋势是建立统一基础模型。为此,我们提出了UNIT,一个基于对话数据集的统一基础模型, capture了每个任务的细节。我们还检查了用于评估对话代理模型的评价策略,并 highlighted the scope for future research in the area of conversational AI.
Mitigating Adversarial Vulnerability through Causal Parameter Estimation by Adversarial Double Machine Learning
paper_authors: Byung-Kwan Lee, Junho Kim, Yong Man Ro
For: This paper aims to improve the adversarial robustness of deep neural networks by introducing a causal approach called Adversarial Double Machine Learning (ADML).* Methods: The ADML method uses a causal perspective to quantify the degree of adversarial vulnerability and capture the effect of treatments on the outcome of interests.* Results: The paper demonstrates through extensive experiments on various CNN and Transformer architectures that ADML improves adversarial robustness with large margins and relieves the empirical observation of vulnerability.Abstract
Adversarial examples derived from deliberately crafted perturbations on visual inputs can easily harm decision process of deep neural networks. To prevent potential threats, various adversarial training-based defense methods have grown rapidly and become a de facto standard approach for robustness. Despite recent competitive achievements, we observe that adversarial vulnerability varies across targets and certain vulnerabilities remain prevalent. Intriguingly, such peculiar phenomenon cannot be relieved even with deeper architectures and advanced defense methods. To address this issue, in this paper, we introduce a causal approach called Adversarial Double Machine Learning (ADML), which allows us to quantify the degree of adversarial vulnerability for network predictions and capture the effect of treatments on outcome of interests. ADML can directly estimate causal parameter of adversarial perturbations per se and mitigate negative effects that can potentially damage robustness, bridging a causal perspective into the adversarial vulnerability. Through extensive experiments on various CNN and Transformer architectures, we corroborate that ADML improves adversarial robustness with large margins and relieve the empirical observation.
摘要
“深度神经网络的决策过程可以轻易受到来自故意设计的扰动的攻击,这些扰动可以轻易地让深度神经网络的决策过程受到损害。为了防止这些潜在的威胁,各种基于对抗训练的防御方法在深度神经网络中迅速增长,并成为了惯例的标准方法。尽管最近的竞赛成绩表明了这些防御方法的竞争力,但我们发现,对于不同的目标,攻击性的扰动的敏感性会异常变化,而且某些敏感性甚至无法被深度神经网络的更深度结构和高级防御方法缓解。为了解决这个问题,在这篇论文中,我们提出了一种名为对抗双机器学习(ADML)的 causal 方法,可以评估深度神经网络的预测中的攻击性扰动的度量,并且可以直接评估对于结果的影响。ADML可以直接估计攻击性扰动的 causal 参数,并且可以避免对 robustness 的负面影响,从而将 causal 视角引入到攻击性扰动中。通过对各种 CNN 和 Transformer 架构进行广泛的实验,我们证明了 ADML 可以提高对抗鲁棒性,并且可以解除实际观察中的异常现象。”
Rigorous Runtime Analysis of Diversity Optimization with GSEMO on OneMinMax
results: 论文证明了GSEMO算法在 Population 中拥有第二最佳多样化时,预期时间 $O(n^2)$ 内找到优化多样化的解。这个结论基于Population 的随机漫步分析,该分析反映了解集的变化频率和结果。Abstract
The evolutionary diversity optimization aims at finding a diverse set of solutions which satisfy some constraint on their fitness. In the context of multi-objective optimization this constraint can require solutions to be Pareto-optimal. In this paper we study how the GSEMO algorithm with additional diversity-enhancing heuristic optimizes a diversity of its population on a bi-objective benchmark problem OneMinMax, for which all solutions are Pareto-optimal. We provide a rigorous runtime analysis of the last step of the optimization, when the algorithm starts with a population with a second-best diversity, and prove that it finds a population with optimal diversity in expected time $O(n^2)$, when the problem size $n$ is odd. For reaching our goal, we analyse the random walk of the population, which reflects the frequency of changes in the population and their outcomes.
摘要
进化多标的优化目标是找到一个多元化的解集,满足一些适当的健康指标。在多目标优化情况下,这个限制可能需要解是Pareto优的。在这篇论文中,我们研究了GSEMO算法,具有额外的多标化增强规律,在二元最大最小问题OneMinMax上进行了多标化优化。我们提供了严谨的时间分析,当算法从一个第二最佳多标的人口开始时,证明它可以在预期时间O(n^2)内找到一个最佳多标的人口,当问题大小n是奇数时。为了实现我们的目标,我们分析了人口的随机步行,这反映了人口中的变化频率和其结果。
Fairness of ChatGPT and the Role Of Explainable-Guided Prompts
methods: 研究使用了judiciously designed prompts和domain-specific knowledge来导向LLMs,并与传统机器学习(ML)模型进行比较。
results: 研究发现LLMs可以与传统ML模型相似的性能,但使用了40倍少的数据(20个数据点,相比800个数据点),并优化错误率和公平性,这些都是风险分析中重要的层面。Abstract
Our research investigates the potential of Large-scale Language Models (LLMs), specifically OpenAI's GPT, in credit risk assessment-a binary classification task. Our findings suggest that LLMs, when directed by judiciously designed prompts and supplemented with domain-specific knowledge, can parallel the performance of traditional Machine Learning (ML) models. Intriguingly, they achieve this with significantly less data-40 times less, utilizing merely 20 data points compared to the ML's 800. LLMs particularly excel in minimizing false positives and enhancing fairness, both being vital aspects of risk analysis. While our results did not surpass those of classical ML models, they underscore the potential of LLMs in analogous tasks, laying a groundwork for future explorations into harnessing the capabilities of LLMs in diverse ML tasks.
摘要
我们的研究探讨了大规模语言模型(LLMs),具体是OpenAI的GPT,在信用风险评估中的潜力。我们发现,当用judiciously设计的提示和域pecific知识支持时,LLMs可以与传统机器学习(ML)模型相当。这些LLMs可以在使用merely 20个数据点时达到800个数据点的性能,并且特别是减少假阳性和提高公平性,这两者都是重要的风险分析方面。虽然我们的结果没有超过传统ML模型的表现,但它们表明LLMs在类似任务中具有潜力,为未来在多种ML任务中利用LLMs的可能性奠定基础。
Multiplicative update rules for accelerating deep learning training and increasing robustness
results: 实验表明,使用这种新的多元更新规则可以在多种优化方法和深度神经网络(DNN)架构下加速DL训练,并且导致模型更加稳定和robust。Abstract
Even nowadays, where Deep Learning (DL) has achieved state-of-the-art performance in a wide range of research domains, accelerating training and building robust DL models remains a challenging task. To this end, generations of researchers have pursued to develop robust methods for training DL architectures that can be less sensitive to weight distributions, model architectures and loss landscapes. However, such methods are limited to adaptive learning rate optimizers, initialization schemes, and clipping gradients without investigating the fundamental rule of parameters update. Although multiplicative updates have contributed significantly to the early development of machine learning and hold strong theoretical claims, to best of our knowledge, this is the first work that investigate them in context of DL training acceleration and robustness. In this work, we propose an optimization framework that fits to a wide range of optimization algorithms and enables one to apply alternative update rules. To this end, we propose a novel multiplicative update rule and we extend their capabilities by combining it with a traditional additive update term, under a novel hybrid update method. We claim that the proposed framework accelerates training, while leading to more robust models in contrast to traditionally used additive update rule and we experimentally demonstrate their effectiveness in a wide range of task and optimization methods. Such tasks ranging from convex and non-convex optimization to difficult image classification benchmarks applying a wide range of traditionally used optimization methods and Deep Neural Network (DNN) architectures.
摘要
In this work, we propose an optimization framework that fits to a wide range of optimization algorithms and enables one to apply alternative update rules. To this end, we propose a novel multiplicative update rule and we extend their capabilities by combining it with a traditional additive update term, under a novel hybrid update method. We claim that the proposed framework accelerates training, while leading to more robust models in contrast to traditionally used additive update rule and we experimentally demonstrate their effectiveness in a wide range of tasks and optimization methods. Such tasks ranging from convex and non-convex optimization to difficult image classification benchmarks applying a wide range of traditionally used optimization methods and Deep Neural Network (DNN) architectures.Translated into Simplified Chinese:即使现在,深度学习(DL)已经在许多研究领域达到了状态机器的表现,加速训练和建立robust DL模型仍然是一项挑战。为此,不同的研究者已经努力开发了robust DL模型训练方法,以降低参数分布、模型架构和损失landscape的敏感性。然而,这些方法受限于 adaptive learning rate optimizer、初始化方案和clipping gradients,而没有探究参数更新的基本规则。 although multiplicative updates have made significant contributions to the early development of machine learning and have strong theoretical foundations, to the best of our knowledge, this is the first work that investigates them in the context of DL training acceleration and robustness.在这个工作中,我们提出了一个优化框架,可以适应多种优化算法,并允许使用alternative update rule。为此,我们提出了一个新的乘数更新规则,并通过将其与传统的加itive更新项结合,实现了一种新的hybrid更新方法。我们声称,该 frameworks可以加速训练,同时导致模型更加稳定和robust,相比于传统使用的加itive更新规则,并通过实验证明其效果。这些任务包括从 convex和非convex优化到难度图像分类benchmark,并通过多种传统使用的优化方法和深度神经网络(DNN)架构。
TriFormer: A Multi-modal Transformer Framework For Mild Cognitive Impairment Conversion Prediction
results: Triformer在Alzheimer’s Disease Neuroimaging Initiative(ANDI)1和ADNI2数据集上评估,与之前的单模和多模态方法相比,显示出更高的准确率。Abstract
The prediction of mild cognitive impairment (MCI) conversion to Alzheimer's disease (AD) is important for early treatment to prevent or slow the progression of AD. To accurately predict the MCI conversion to stable MCI or progressive MCI, we propose Triformer, a novel transformer-based framework with three specialized transformers to incorporate multi-model data. Triformer uses I) an image transformer to extract multi-view image features from medical scans, II) a clinical transformer to embed and correlate multi-modal clinical data, and III) a modality fusion transformer that produces an accurate prediction based on fusing the outputs from the image and clinical transformers. Triformer is evaluated on the Alzheimer's Disease Neuroimaging Initiative (ANDI)1 and ADNI2 datasets and outperforms previous state-of-the-art single and multi-modal methods.
摘要
预测轻度认知障碍(MCI)转化为阿尔茨海默病(AD)的预测非常重要,以便在早期发现并采取措施来防止或减慢AD的进程。为了准确预测MCI转化为稳定MCI或进展MCI,我们提议Triformer,一种新的转换器基础框架,其特点是使用三种特殊的转换器来汇合多种数据。Triformer使用以下三种转换器:1. 图像转换器,用于从医疗扫描中提取多视图图像特征。2. 临床转换器,用于将多modal临床数据进行嵌入和相关计算。3. 模式融合转换器,用于基于多种数据融合的准确预测。Triformer在Alzheimer's Disease Neuroimaging Initiative(ANDI)1和ADNI2 datasets上进行评估,并与之前的单模和多模态方法相比,显示出更高的预测精度。
Safe DreamerV3: Safe Reinforcement Learning with World Models
results: 我们的实验结果显示,Safe DreamerV3算法在Safety-Gymnasium benchmark中的低维度和视觉任务中可以达到几乎零成本,而 existing SafeRL 方法则无法达到这个目标。Abstract
The widespread application of Reinforcement Learning (RL) in real-world situations is yet to come to fruition, largely as a result of its failure to satisfy the essential safety demands of such systems. Existing safe reinforcement learning (SafeRL) methods, employing cost functions to enhance safety, fail to achieve zero-cost in complex scenarios, including vision-only tasks, even with comprehensive data sampling and training. To address this, we introduce Safe DreamerV3, a novel algorithm that integrates both Lagrangian-based and planning-based methods within a world model. Our methodology represents a significant advancement in SafeRL as the first algorithm to achieve nearly zero-cost in both low-dimensional and vision-only tasks within the Safety-Gymnasium benchmark. Our project website can be found in: https://sites.google.com/view/safedreamerv3.
摘要
RL在实际应用中的普及尚未实现,主要是因为它无法满足实际系统的安全需求。现有的安全再强化学习(SafeRL)方法,通过提高安全成本来增强安全性,在复杂的场景中,包括视觉任务,甚至 WITH comprehensive data sampling和训练,都无法实现零成本。为解决这一问题,我们提出了Safe DreamerV3算法,该算法结合了Lagrangian-based和规划基于的方法,并在世界模型中实现了这些方法。我们的方法代表了安全RL的一大突破,是首个在低维度和视觉任务中的 Safety-Gymnasium bencmark中实现零成本的算法。您可以在以下网站上找到我们的项目网站:https://sites.google.com/view/safedreamerv3。
HYTREL: Hypergraph-enhanced Tabular Data Representation Learning
results: 在四个下游任务中,HYTREL 表现出了与其他竞争对手相比的稳定和高效表现,只需要最小的预训练。此外,qualitative 分析表明 HYTREL 可以快速吸收表格结构,生成稳定的表格单元表示。Abstract
Language models pretrained on large collections of tabular data have demonstrated their effectiveness in several downstream tasks. However, many of these models do not take into account the row/column permutation invariances, hierarchical structure, etc. that exist in tabular data. To alleviate these limitations, we propose HYTREL, a tabular language model, that captures the permutation invariances and three more structural properties of tabular data by using hypergraphs - where the table cells make up the nodes and the cells occurring jointly together in each row, column, and the entire table are used to form three different types of hyperedges. We show that HYTREL is maximally invariant under certain conditions for tabular data, i.e., two tables obtain the same representations via HYTREL iff the two tables are identical up to permutations. Our empirical results demonstrate that HYTREL consistently outperforms other competitive baselines on four downstream tasks with minimal pretraining, illustrating the advantages of incorporating the inductive biases associated with tabular data into the representations. Finally, our qualitative analyses showcase that HYTREL can assimilate the table structures to generate robust representations for the cells, rows, columns, and the entire table.
摘要
language models 预训练在大量表格数据上有显示其效iveness 在多个下游任务中。然而,许多这些模型不会考虑表格数据中的列/行Permutation invariances,层次结构等属性。为了解决这些限制,我们提出了 HYTREL,一种表格语言模型,该模型通过使用 hypergraphs capture 表格数据中的 Permutation invariances 和三种结构属性。我们证明 HYTREL 在某些条件下对 tabular data 是最大 invariant,即两个表格通过 HYTREL 生成的表示相同,只要两个表格在排序上相同。我们的实验结果表明 HYTREL 在四个下游任务中具有明显的优势,只需要 minimal pretraining,这说明了在表格数据中包含表格数据的预测性假设可以为表格数据的表示增加优势。最后,我们的质量分析表明 HYTREL 可以吸收表格结构,为细胞、行、列和整个表格生成强健的表示。
Vulnerability-Aware Instance Reweighting For Adversarial Training
results: 对比 existed 的权重调整方法,本研究提出了一种新的实例级权重调整方法,可以减少对攻击示例的依赖度和信息损失。实验结果表明,该方法可以显著提高对攻击示例的Robustness,特别是面对强大的白盒和黑盒攻击。Abstract
Adversarial Training (AT) has been found to substantially improve the robustness of deep learning classifiers against adversarial attacks. AT involves obtaining robustness by including adversarial examples in training a classifier. Most variants of AT algorithms treat every training example equally. However, recent works have shown that better performance is achievable by treating them unequally. In addition, it has been observed that AT exerts an uneven influence on different classes in a training set and unfairly hurts examples corresponding to classes that are inherently harder to classify. Consequently, various reweighting schemes have been proposed that assign unequal weights to robust losses of individual examples in a training set. In this work, we propose a novel instance-wise reweighting scheme. It considers the vulnerability of each natural example and the resulting information loss on its adversarial counterpart occasioned by adversarial attacks. Through extensive experiments, we show that our proposed method significantly improves over existing reweighting schemes, especially against strong white and black-box attacks.
摘要
Looking deeper into interpretable deep learning in neuroimaging: a comprehensive survey
results: 研究发现,使用可解释深度学习模型可以更好地捕捉神经成像数据中关键的脑变化,并且可以提高模型的预测性能。但是,当前的实践还存在一些限制和挑战,需要进一步的研究和改进。Abstract
Deep learning (DL) models have been popular due to their ability to learn directly from the raw data in an end-to-end paradigm, alleviating the concern of a separate error-prone feature extraction phase. Recent DL-based neuroimaging studies have also witnessed a noticeable performance advancement over traditional machine learning algorithms. But the challenges of deep learning models still exist because of the lack of transparency in these models for their successful deployment in real-world applications. In recent years, Explainable AI (XAI) has undergone a surge of developments mainly to get intuitions of how the models reached the decisions, which is essential for safety-critical domains such as healthcare, finance, and law enforcement agencies. While the interpretability domain is advancing noticeably, researchers are still unclear about what aspect of model learning a post hoc method reveals and how to validate its reliability. This paper comprehensively reviews interpretable deep learning models in the neuroimaging domain. Firstly, we summarize the current status of interpretability resources in general, focusing on the progression of methods, associated challenges, and opinions. Secondly, we discuss how multiple recent neuroimaging studies leveraged model interpretability to capture anatomical and functional brain alterations most relevant to model predictions. Finally, we discuss the limitations of the current practices and offer some valuable insights and guidance on how we can steer our future research directions to make deep learning models substantially interpretable and thus advance scientific understanding of brain disorders.
摘要
深度学习(DL)模型在过去几年中得到了广泛应用,主要是因为它们可以直接从原始数据中学习,而不需要额外的错误产生的特征提取阶段。现在的DL基于的脑成像研究也经历了明显的性能提高,但DL模型的挑战仍然存在,主要是因为这些模型的透明性不够,使得它们在实际应用中不可靠。在过去几年,可解释AI(XAI)技术得到了广泛发展,以获取模型做出决策的 intuitions,这对于安全关键领域如医疗、金融和宪政机构来说是非常重要。虽然解释领域在发展中,但研究人员仍然不清楚哪些方面的模型学习post hoc方法揭示出来,以及如何验证其可靠性。这篇评论综述了在脑成像领域中可解释深度学习模型的发展。首先,我们概括了目前可解释资源的状况,包括方法的进步、相关挑战和观点。其次,我们讲述了一些最近的脑成像研究如何通过模型解释来捕捉最相关的脑结构和功能变化,以及对于模型预测的影响。最后,我们讨论了当前实践中的限制,并提供了一些有价值的指导和建议,以便我们可以在未来减少DL模型的不可靠性,并提高我们对脑疾病的科学理解。
Federated Learning-Empowered AI-Generated Content in Wireless Networks
paper_authors: Xumin Huang, Peichun Li, Hongyang Du, Jiawen Kang, Dusit Niyato, Dong In Kim, Yuan Wu
for: 提高内容创作效率、质量、多样性和灵活性
methods: 采用分布式学习框架,协同数据所有者进行模型训练,保护用户隐私
results: 降低通信成本和训练延迟,同时保护用户隐私Abstract
Artificial intelligence generated content (AIGC) has emerged as a promising technology to improve the efficiency, quality, diversity and flexibility of the content creation process by adopting a variety of generative AI models. Deploying AIGC services in wireless networks has been expected to enhance the user experience. However, the existing AIGC service provision suffers from several limitations, e.g., the centralized training in the pre-training, fine-tuning and inference processes, especially their implementations in wireless networks with privacy preservation. Federated learning (FL), as a collaborative learning framework where the model training is distributed to cooperative data owners without the need for data sharing, can be leveraged to simultaneously improve learning efficiency and achieve privacy protection for AIGC. To this end, we present FL-based techniques for empowering AIGC, and aim to enable users to generate diverse, personalized, and high-quality content. Furthermore, we conduct a case study of FL-aided AIGC fine-tuning by using the state-of-the-art AIGC model, i.e., stable diffusion model. Numerical results show that our scheme achieves advantages in effectively reducing the communication cost and training latency and privacy protection. Finally, we highlight several major research directions and open issues for the convergence of FL and AIGC.
摘要
人工智能生成内容(AIGC)技术已经出现为改善内容创作过程的效率、质量、多样性和灵活性的优秀选择。在无线网络中部署AIGC服务可以提高用户体验。然而,现有的AIGC服务提供方式受到一些限制,例如中央化训练在预训练、精度调整和推理过程中,特别是在保护隐私的无线网络中。 Federated learning(FL),作为一种分布式学习框架,可以同时提高学习效率和实现隐私保护。为此,我们提出了基于FL的AIGC技术,并 Hoping to enable users to generate diverse, personalized, and high-quality content.在我们的实验中,我们使用当前的AIGC模型,即稳定扩散模型进行FL-帮助的精度调整。 numerically results show that our scheme achieves advantages in effectively reducing the communication cost and training latency, as well as privacy protection. Finally, we highlight several major research directions and open issues for the convergence of FL and AIGC.
Understanding Multi-Turn Toxic Behaviors in Open-Domain Chatbots
results: 研究发现,open-domain chatbot 模型在多Turn对话中可以被触发出攻击性的回答,最好的情况下,\toxicbot 的激活率达到 67%。Abstract
Recent advances in natural language processing and machine learning have led to the development of chatbot models, such as ChatGPT, that can engage in conversational dialogue with human users. However, the ability of these models to generate toxic or harmful responses during a non-toxic multi-turn conversation remains an open research question. Existing research focuses on single-turn sentence testing, while we find that 82\% of the individual non-toxic sentences that elicit toxic behaviors in a conversation are considered safe by existing tools. In this paper, we design a new attack, \toxicbot, by fine-tuning a chatbot to engage in conversation with a target open-domain chatbot. The chatbot is fine-tuned with a collection of crafted conversation sequences. Particularly, each conversation begins with a sentence from a crafted prompt sentences dataset. Our extensive evaluation shows that open-domain chatbot models can be triggered to generate toxic responses in a multi-turn conversation. In the best scenario, \toxicbot achieves a 67\% activation rate. The conversation sequences in the fine-tuning stage help trigger the toxicity in a conversation, which allows the attack to bypass two defense methods. Our findings suggest that further research is needed to address chatbot toxicity in a dynamic interactive environment. The proposed \toxicbot can be used by both industry and researchers to develop methods for detecting and mitigating toxic responses in conversational dialogue and improve the robustness of chatbots for end users.
摘要
近年,自然语言处理和机器学习的进步使得聊天机器人模型,如ChatGPT,能够与人类用户进行对话交流。然而,这些模型在不恶意多轮对话中产生恶意或危险回复的能力仍然是一个开放的研究问题。现有研究主要集中在单Turn句子测试上,而我们发现82%的个人不恶意句子在对话中可以让chatbot产生恶意行为。在这篇论文中,我们设计了一种新的攻击,\toxicbot,通过细化一个目标的开放领域聊天机器人。这个聊天机器人通过一个采集的对话序列进行细化。具体来说,每个对话都开始于一个采集的提示句子集中的一句话。我们的广泛评估表明,开放领域聊天机器人模型可以在多轮对话中产生恶意回复。最好的情况下,\toxicbot达到67%的活动率。对话序列在细化阶段帮助诱发对话中的恶意,这使得攻击能够绕过两种防御方法。我们的发现表明,进一步的研究是必要的,以解决聊天机器人中的恶意对话在动态交互环境中的问题。我们提出的\toxicbot可以由行业和研究人员使用,以开发检测和mitigate聊天机器人中的恶意回复的方法,并提高聊天机器人的用户端Robustness。
Multi-Dimensional Ability Diagnosis for Machine Learning Algorithms
results: 对四个公共数据集进行了广泛的实验,结果显示 Camilla 可以更准确地捕捉每个算法的优缺点,并在metric可靠性、排名一致性和排名稳定性上表现出色Abstract
Machine learning algorithms have become ubiquitous in a number of applications (e.g. image classification). However, due to the insufficient measurement of traditional metrics (e.g. the coarse-grained Accuracy of each classifier), substantial gaps are usually observed between the real-world performance of these algorithms and their scores in standardized evaluations. In this paper, inspired by the psychometric theories from human measurement, we propose a task-agnostic evaluation framework Camilla, where a multi-dimensional diagnostic metric Ability is defined for collaboratively measuring the multifaceted strength of each machine learning algorithm. Specifically, given the response logs from different algorithms to data samples, we leverage cognitive diagnosis assumptions and neural networks to learn the complex interactions among algorithms, samples and the skills (explicitly or implicitly pre-defined) of each sample. In this way, both the abilities of each algorithm on multiple skills and some of the sample factors (e.g. sample difficulty) can be simultaneously quantified. We conduct extensive experiments with hundreds of machine learning algorithms on four public datasets, and our experimental results demonstrate that Camilla not only can capture the pros and cons of each algorithm more precisely, but also outperforms state-of-the-art baselines on the metric reliability, rank consistency and rank stability.
摘要
Ethics in the Age of AI: An Analysis of AI Practitioners’ Awareness and Challenges
results: 研究发现大多数AI实践者有一定的认识度对AI伦理,主要归功于工作场所的规则和政策。他们主要关注隐私保护和安全问题。正式的教育和培训被认为有一定帮助作用。AI实践者在开发伦理AI系统时遇到的挑战包括一般挑战、技术相关挑战和人类相关挑战。Abstract
Ethics in AI has become a debated topic of public and expert discourse in recent years. But what do people who build AI - AI practitioners - have to say about their understanding of AI ethics and the challenges associated with incorporating it in the AI-based systems they develop? Understanding AI practitioners' views on AI ethics is important as they are the ones closest to the AI systems and can bring about changes and improvements. We conducted a survey aimed at understanding AI practitioners' awareness of AI ethics and their challenges in incorporating ethics. Based on 100 AI practitioners' responses, our findings indicate that majority of AI practitioners had a reasonable familiarity with the concept of AI ethics, primarily due to workplace rules and policies. Privacy protection and security was the ethical principle that majority of them were aware of. Formal education/training was considered somewhat helpful in preparing practitioners to incorporate AI ethics. The challenges that AI practitioners faced in the development of ethical AI-based systems included (i) general challenges, (ii) technology-related challenges and (iii) human-related challenges. We also identified areas needing further investigation and provided recommendations to assist AI practitioners and companies in incorporating ethics into AI development.
摘要
调查结果表明,大多数 AI practitioners 对 AI 伦理概念有一定的了解,主要归功于工作场所的规则和政策。隐私保护和安全是他们意识到的伦理原则中的主要内容。有些人认为,正式的教育和训练有所帮助,以准备 practitioners 在实践中采用 AI 伦理。在开发伦理 AI 系统时,AI practitioners 面临着一些挑战,包括(i)总体挑战、(ii)技术相关的挑战和(iii)人类相关的挑战。我们还发现了需要进一步调查的领域和提出了建议,以帮助 AI practitioners 和公司在 AI 开发中更好地 integrating ethics。
DataAssist: A Machine Learning Approach to Data Cleaning and Preparation
results: 可以大幅减少数据整理和整合的时间,为经济、商业和预测应用等领域提供高质量数据集Here’s the breakdown of each point:
for: The paper is written for people who want to improve the efficiency of data analysis and reduce the time spent on data cleaning and preparation.
methods: The paper uses machine learning-informed methods to automate data preparation, including generating visualizations for user-selected variables, unifying data annotation, suggesting anomaly removal, and preprocessing data.
results: The paper shows that using DataAssist can significantly reduce the time spent on data cleaning and preparation, providing high-quality data sets for a variety of fields, including economics, business, and forecasting applications.Abstract
Current automated machine learning (ML) tools are model-centric, focusing on model selection and parameter optimization. However, the majority of the time in data analysis is devoted to data cleaning and wrangling, for which limited tools are available. Here we present DataAssist, an automated data preparation and cleaning platform that enhances dataset quality using ML-informed methods. We show that DataAssist provides a pipeline for exploratory data analysis and data cleaning, including generating visualization for user-selected variables, unifying data annotation, suggesting anomaly removal, and preprocessing data. The exported dataset can be readily integrated with other autoML tools or user-specified model for downstream analysis. Our data-centric tool is applicable to a variety of fields, including economics, business, and forecasting applications saving over 50% time of the time spent on data cleansing and preparation.
摘要
现有的自动化机器学习(ML)工具都是模型中心的,强调模型选择和参数优化。然而,数据分析中大部分时间都是投入到数据整理和整理中,而现有的工具却有限。我们现在提出了DataAssist,一个自动化数据准备和整理平台,使用机器学习 Informed 方法提高数据集质量。我们显示了DataAssist 提供了探索数据分析和数据整理的管道,包括生成用户选择变量的Visualization,统一数据注释、建议异常移除和数据预处理。导出的数据集可以轻松地与其他自动ML工具或用户指定的模型进行下游分析。我们的数据集中心工具适用于多个领域,包括经济、商业和预测应用,可以节省大约50%的时间用于数据整理和准备。
EmotionPrompt: Leveraging Psychology for Large Language Models Enhancement via Emotional Stimulus
results: 对8种不同任务和4种模型(ChatGPT、Vicuna-13b、Bloom、T5)进行实验,EmotionPrompt 显著超越原始的零例示 prompt 和 Zero-shot-CoT,同时提高了真实性和信息性。Abstract
Large language models (LLMs) have achieved significant performance in many fields such as reasoning, language understanding, and math problem-solving, and are regarded as a crucial step to artificial general intelligence (AGI). However, the sensitivity of LLMs to prompts remains a major bottleneck for their daily adoption. In this paper, we take inspiration from psychology and propose EmotionPrompt to explore emotional intelligence to enhance the performance of LLMs. EmotionPrompt operates on a remarkably straightforward principle: the incorporation of emotional stimulus into prompts. Experimental results demonstrate that our EmotionPrompt, using the same single prompt templates, significantly outperforms original zero-shot prompt and Zero-shot-CoT on 8 tasks with diverse models: ChatGPT, Vicuna-13b, Bloom, and T5. Further, EmotionPrompt was observed to improve both truthfulness and informativeness. We believe that EmotionPrompt heralds a novel avenue for exploring interdisciplinary knowledge for humans-LLMs interaction.
摘要
Translated into Simplified Chinese:大型语言模型(LLM)在多个领域 such as 理解、语言理解和数学问题解决方面已经达到了显著的性能,并被视为人工通用智能(AGI)的关键一步。然而,LLM的提示敏感性仍然是日常使用的主要瓶颈。在这篇论文中,我们启发自心理学,并提出了情感提示(EmotionPrompt),以增强 LLM 的性能。情感提示的工作原理很简单:在提示中添加情感刺激。实验结果表明,我们的 EmotionPrompt,使用同一个单一提示模板,在8个任务上 WITH 多种模型(ChatGPT、Vicuna-13b、Bloom 和 T5)显著超过原始 zero-shot 提示和 Zero-shot-CoT。此外,EmotionPrompt 还被观察到改善了真实性和信息性。我们认为,EmotionPrompt 开启了一条新的人类-LLM交互知识探索的道路。
Robotic Manipulation Datasets for Offline Compositional Reinforcement Learning
results: 论文的实验表明,当前的OFFLINE RL方法可以在一定程度上学习训练任务,而 compose 方法在不同任务结构下表现 significatively better than non-compose 方法。但是,当前的方法还无法EXTRACT 任务的compositional结构,以便在未看到的任务上掌握。Abstract
Offline reinforcement learning (RL) is a promising direction that allows RL agents to pre-train on large datasets, avoiding the recurrence of expensive data collection. To advance the field, it is crucial to generate large-scale datasets. Compositional RL is particularly appealing for generating such large datasets, since 1) it permits creating many tasks from few components, 2) the task structure may enable trained agents to solve new tasks by combining relevant learned components, and 3) the compositional dimensions provide a notion of task relatedness. This paper provides four offline RL datasets for simulated robotic manipulation created using the 256 tasks from CompoSuite [Mendez et al., 2022a]. Each dataset is collected from an agent with a different degree of performance, and consists of 256 million transitions. We provide training and evaluation settings for assessing an agent's ability to learn compositional task policies. Our benchmarking experiments on each setting show that current offline RL methods can learn the training tasks to some extent and that compositional methods significantly outperform non-compositional methods. However, current methods are still unable to extract the tasks' compositional structure to generalize to unseen tasks, showing a need for further research in offline compositional RL.
摘要
减少数据采集成本的离线强化学习(RL)是一个有前途的方向,允许RL Agent在大量数据上预训练,避免重复的数据采集成本。为了推动这一领域,生成大规模数据是非常重要的。 compositional RL 特别有利于生成大规模数据,因为它允许创建多个任务从少量组成部件,并且任务结构可能使得训练过的 Agent 能够解决新任务,并将相关学习的组成部件组合起来。这篇文章提供了基于 CompoSuite 的 256 个任务的四个离线 RL 数据集,每个数据集由一个不同的表现度作为 Agent 生成的 256 亿个转移。我们提供了训练和评估设置,以评估 Agent 是否能够学习复杂的任务策略。我们的检验结果表明,当前的离线 RL 方法可以在一定程度上学习训练任务,而 compositional 方法在不同任务之间具有显著的优势。然而,当前的方法仍然无法提取任务的复杂结构,以普适化到未看到的任务,表明需要进一步的研究在离线 compositional RL 方面。
Espaloma-0.3.0: Machine-learned molecular mechanics force field for the simulation of protein-ligand systems and beyond
paper_authors: Kenichiro Takaba, Iván Pulido, Mike Henry, Hugo MacDermott-Opeskin, John D. Chodera, Yuanqing Wang
For: The paper aims to develop a new approach to building molecular mechanics (MM) force fields that can be learned directly from quantum chemical calculations or condensed-phase data, using graph neural networks.* Methods: The approach uses an end-to-end differentiable force field construction method called Espaloma, which incorporates both energy and force fitting directly to quantum chemical data. The method is trained on a dataset of chemical spaces relevant to biomolecular modeling, including small molecules, proteins, and RNA.* Results: The resulting force field, espaloma 0.3.0, accurately predicts quantum chemical energies and forces, and maintains stable quantum chemical energy-minimized geometries. The approach also produces highly accurate protein-ligand binding free energies when self-consistently parametrizing protein and ligand. The method shows promise as a path forward for building systematically more accurate force fields that can be easily extended to new chemical domains of interest.Here are the three key points in Simplified Chinese text:* For: 这篇论文目标是开发一种基于量子化学计算或固体数据的分子机械力场模型,使用图 neural network 来取代人工专家审核的、不可变的化学参数分配规则。* Methods: 该方法使用一种终端可微分的力场建模方法,称为 Espaloma,直接从量子化学计算或固体数据中学习力场模型。该方法在包括小分子、蛋白质和 RNA 等化学空间中进行了训练。* Results: 所得到的力场模型,即 espaloma 0.3.0,能够准确预测量子化学能量和力,并稳定地保持量子化学能量最小化的结构。此外,该方法还能够高精度地预测蛋白质-抗体复合物的绑定自由能。该方法显示出了在新的化学领域中建立更加准确的力场模型的潜在优势。Abstract
Molecular mechanics (MM) force fields -- the models that characterize the energy landscape of molecular systems via simple pairwise and polynomial terms -- have traditionally relied on human expert-curated, inflexible, and poorly extensible discrete chemical parameter assignment rules, namely atom or valence types. Recently, there has been significant interest in using graph neural networks to replace this process, while enabling the parametrization scheme to be learned in an end-to-end differentiable manner directly from quantum chemical calculations or condensed-phase data. In this paper, we extend the Espaloma end-to-end differentiable force field construction approach by incorporating both energy and force fitting directly to quantum chemical data into the training process. Building on the OpenMM SPICE dataset, we curate a dataset containing chemical spaces highly relevant to the broad interest of biomolecular modeling, covering small molecules, proteins, and RNA. The resulting force field, espaloma 0.3.0, self-consistently parametrizes these diverse biomolecular species, accurately predicts quantum chemical energies and forces, and maintains stable quantum chemical energy-minimized geometries. Surprisingly, this simple approach produces highly accurate protein-ligand binding free energies when self-consistently parametrizing protein and ligand. This approach -- capable of fitting new force fields to large quantum chemical datasets in one GPU-day -- shows significant promise as a path forward for building systematically more accurate force fields that can be easily extended to new chemical domains of interest.
摘要
分子机械力场(MM)力场模型 -- 用简单的对拟和多项式关系来描述分子系统的能量景观的模型 -- 传统上依赖人工专家 manually curated 和不可靠的化学参数赋值规则,即原子或Valence 类型。在最近几年,有一个重要的兴趣是使用图 neural networks 取代这个过程,以使 parametrization 算法可以在一个端到端可微分的方式直接从量子化学计算或压缩物理数据中学习。在这篇文章中,我们扩展了Espaloma 端到端可微分力场建构方法,并将能量和力数据直接适应到量子化学计算中。基于 OpenMM SPICE 数据集,我们筹集了一个包含高度相关的生物分子模型化学species的数据集,包括小分子、蛋白质和 RNA。得到的力场,Espaloma 0.3.0,自 consistently 参数化这些多样化的生物分子物种,准确预测量子化学能量和力,并维持稳定的量子化学能量最小化几何。Surprisingly,这种简单的方法可以高度准确地预测蛋白质- ligand 绑定自由能量,当自 consistently 参数化蛋白质和 ligand 时。这种方法 -- 可以在一个 GPU-day 内适应新的量子化学数据集 -- 显示出了重要的承诺,作为在新化学领域中建立更准确的力场的可行之路。
Safe Reinforcement Learning as Wasserstein Variational Inference: Formal Methods for Interpretability
results: 该论文通过在实验和真实机器人任务中的训练表现,证明了AWaVO方法的可行性和稳定性,并实际地证明了一种合理的性能与保守可观察性之间的交易。Abstract
Reinforcement Learning or optimal control can provide effective reasoning for sequential decision-making problems with variable dynamics. Such reasoning in practical implementation, however, poses a persistent challenge in interpreting the reward function and corresponding optimal policy. Consequently, formalizing the sequential decision-making problems as inference has a considerable value, as probabilistic inference in principle offers diverse and powerful mathematical tools to infer the stochastic dynamics whilst suggesting a probabilistic interpretation of the reward design and policy convergence. In this study, we propose a novel Adaptive Wasserstein Variational Optimization (AWaVO) to tackle these challenges in sequential decision-making. Our approach utilizes formal methods to provide interpretations of reward design, transparency of training convergence, and probabilistic interpretation of sequential decisions. To demonstrate practicality, we show convergent training with guaranteed global convergence rates not only in simulation but also in real robot tasks, and empirically verify a reasonable tradeoff between high performance and conservative interpretability.
摘要
“强化学习或最佳控制可以提供有效的推理 для累缲问题,具有变化的动态。然而,实际实现中,评估奖函数和对应的优质策略实际上具有挑战。因此,将累缲问题正式化为推理具有很大的价值,因为概率推理在原理上提供了多元化和强大的数学工具,以推断随机动态,并提供了累缲决策的概率解释。在本研究中,我们提出了一种新的适应 Wasserstein 统计Optimization(AWaVO),以解决这些挑战。我们的方法使用正式方法,提供奖函数设计的解释、训练调合的透明度和累缲决策的概率解释。为证明实用性,我们在模拟和实际 robot 任务中显示了执行调合的训练,并证明了保证全球调合速率不仅在模拟中实现,而且在实际任务中也实现了。此外,我们还证明了累缲决策的合理交换,即高性能和保守解释之间的平衡。”
Vertex-based Networks to Accelerate Path Planning Algorithms
results: 通过对 randomly generated floor maps 进行实验,所提出的解决方案可以实现超过400%的速度提升,与基eline模型相比。Abstract
Path planning plays a crucial role in various autonomy applications, and RRT* is one of the leading solutions in this field. In this paper, we propose the utilization of vertex-based networks to enhance the sampling process of RRT*, leading to more efficient path planning. Our approach focuses on critical vertices along the optimal paths, which provide essential yet sparser abstractions of the paths. We employ focal loss to address the associated data imbalance issue, and explore different masking configurations to determine practical tradeoffs in system performance. Through experiments conducted on randomly generated floor maps, our solutions demonstrate significant speed improvements, achieving over a 400% enhancement compared to the baseline model.
摘要
<>将文本翻译成简化中文。<>路径规划在各种自动化应用中扮演着关键性的角色,RRT* 是该领域的一种领先解决方案。在这篇论文中,我们提议通过顶点基于网络来增强 RRT* 的抽象过程,从而实现更有效的路径规划。我们的方法是关注优质路径上的关键顶点,这些顶点提供了关键的 yet 稀疏的路径抽象。我们使用焦点损失来解决相关的数据不均衡问题,并 explore 不同的masking配置来确定实际的系统性能质量。通过在随机生成的 floor 图上进行的实验,我们的解决方案表明了明显的速度提高,相比基eline 模型,实现了超过 400% 的提高。
A metric learning approach for endoscopic kidney stone identification
paper_authors: Jorge Gonzalez-Zapata, Francisco Lopez-Tiro, Elias Villalvazo-Avila, Daniel Flores-Araiza, Jacques Hubert, Andres Mendez-Vazquez, Gilberto Ochoa-Ruiz, Christian Daul
for: automatic identification of kidney stones during ureteroscopy to enable rapid therapeutic decisions
methods: Deep Metric Learning (DML) methods, including a novel architecture and a teacher-student approach with Knowledge Distillation
results: improved identification accuracy by 10-12% compared to Deep Learning (DL) methods and other DML approaches, and up to 30% compared to shallow machine learning methodsAbstract
Several Deep Learning (DL) methods have recently been proposed for an automated identification of kidney stones during an ureteroscopy to enable rapid therapeutic decisions. Even if these DL approaches led to promising results, they are mainly appropriate for kidney stone types for which numerous labelled data are available. However, only few labelled images are available for some rare kidney stone types. This contribution exploits Deep Metric Learning (DML) methods i) to handle such classes with few samples, ii) to generalize well to out of distribution samples, and iii) to cope better with new classes which are added to the database. The proposed Guided Deep Metric Learning approach is based on a novel architecture which was designed to learn data representations in an improved way. The solution was inspired by Few-Shot Learning (FSL) and makes use of a teacher-student approach. The teacher model (GEMINI) generates a reduced hypothesis space based on prior knowledge from the labeled data, and is used it as a guide to a student model (i.e., ResNet50) through a Knowledge Distillation scheme. Extensive tests were first performed on two datasets separately used for the recognition, namely a set of images acquired for the surfaces of the kidney stone fragments, and a set of images of the fragment sections. The proposed DML-approach improved the identification accuracy by 10% and 12% in comparison to DL-methods and other DML-approaches, respectively. Moreover, model embeddings from the two dataset types were merged in an organized way through a multi-view scheme to simultaneously exploit the information of surface and section fragments. Test with the resulting mixed model improves the identification accuracy by at least 3% and up to 30% with respect to DL-models and shallow machine learning methods, respectively.
摘要
Recently, several Deep Learning (DL) methods have been proposed for automated identification of kidney stones during ureteroscopy to enable rapid therapeutic decisions. Although these DL approaches have shown promising results, they are mainly suitable for kidney stone types with abundant labeled data. However, there are few labeled images available for rare kidney stone types. To address this challenge, this study exploits Deep Metric Learning (DML) methods to handle such classes with few samples, generalize well to out-of-distribution samples, and cope better with new classes added to the database.The proposed Guided Deep Metric Learning approach is based on a novel architecture that learns data representations in an improved way. The solution is inspired by Few-Shot Learning (FSL) and uses a teacher-student approach. The teacher model (GEMINI) generates a reduced hypothesis space based on prior knowledge from labeled data and guides a student model (i.e., ResNet50) through a Knowledge Distillation scheme.Extensive tests were performed on two datasets separately used for recognition, namely a set of images of the surfaces of kidney stone fragments and a set of images of fragment sections. The proposed DML-approach improved identification accuracy by 10% and 12% compared to DL-methods and other DML-approaches, respectively. Moreover, model embeddings from the two dataset types were merged in an organized way through a multi-view scheme to simultaneously exploit the information of surface and section fragments. Tests with the resulting mixed model improved identification accuracy by at least 3% and up to 30% compared to DL-models and shallow machine learning methods, respectively.
Leveraging Factored Action Spaces for Off-Policy Evaluation
results: 提出了一种新的“分解”重样 estimator,并证明了这种 estimator 具有较低的偏差和较高的稳定性,同时保持零偏差性。通过实验验证了这些理论结果,并证明了假设的有效性。Abstract
Off-policy evaluation (OPE) aims to estimate the benefit of following a counterfactual sequence of actions, given data collected from executed sequences. However, existing OPE estimators often exhibit high bias and high variance in problems involving large, combinatorial action spaces. We investigate how to mitigate this issue using factored action spaces i.e. expressing each action as a combination of independent sub-actions from smaller action spaces. This approach facilitates a finer-grained analysis of how actions differ in their effects. In this work, we propose a new family of "decomposed" importance sampling (IS) estimators based on factored action spaces. Given certain assumptions on the underlying problem structure, we prove that the decomposed IS estimators have less variance than their original non-decomposed versions, while preserving the property of zero bias. Through simulations, we empirically verify our theoretical results, probing the validity of various assumptions. Provided with a technique that can derive the action space factorisation for a given problem, our work shows that OPE can be improved "for free" by utilising this inherent problem structure.
摘要
<> translate "Off-policy evaluation (OPE) aims to estimate the benefit of following a counterfactual sequence of actions, given data collected from executed sequences. However, existing OPE estimators often exhibit high bias and high variance in problems involving large, combinatorial action spaces. We investigate how to mitigate this issue using factored action spaces i.e. expressing each action as a combination of independent sub-actions from smaller action spaces. This approach facilitates a finer-grained analysis of how actions differ in their effects. In this work, we propose a new family of "decomposed" importance sampling (IS) estimators based on factored action spaces. Given certain assumptions on the underlying problem structure, we prove that the decomposed IS estimators have less variance than their original non-decomposed versions, while preserving the property of zero bias. Through simulations, we empirically verify our theoretical results, probing the validity of various assumptions. Provided with a technique that can derive the action space factorisation for a given problem, our work shows that OPE can be improved "for free" by utilising this inherent problem structure.">>Here's the translation in Simplified Chinese:<>Off-policy evaluation (OPE) 目标是估计在执行不同的行动序列后的利益,基于已经执行过的数据。然而,现有的 OPE 估计器经常在具有大量结合型动作空间的问题中表现出高偏差和高方差。我们研究如何使用分解动作空间来降低这些问题。我们提出了一种基于分解动作空间的 "分解" 重要抽样(IS) 估计器。对于满足某些假设,我们证明了这种分解 IS 估计器 的方差比原始非分解版本更低,同时保持零偏差性。通过实验,我们证明了我们的理论结论,检验了各种假设的有效性。如果可以得到一种可以 derive 动作空间分解的技术,我们的工作显示了 OPE 可以免费地改进,通过利用这种问题的内在结构。
Classical Out-of-Distribution Detection Methods Benchmark in Text Classification Tasks
results: 我们的分析显示,现有的NLP任务中的OOD检测方法并不够敏感,无法捕捉各种分布偏移的样本。特别是背景偏移和随机排序的字符串内域文本测试场景是最大的挑战。这说明未来的研究应该更加注重开发更有效的OOD检测方法,而我们的研究提供了一个充分定义的研究基础。Abstract
State-of-the-art models can perform well in controlled environments, but they often struggle when presented with out-of-distribution (OOD) examples, making OOD detection a critical component of NLP systems. In this paper, we focus on highlighting the limitations of existing approaches to OOD detection in NLP. Specifically, we evaluated eight OOD detection methods that are easily integrable into existing NLP systems and require no additional OOD data or model modifications. One of our contributions is providing a well-structured research environment that allows for full reproducibility of the results. Additionally, our analysis shows that existing OOD detection methods for NLP tasks are not yet sufficiently sensitive to capture all samples characterized by various types of distributional shifts. Particularly challenging testing scenarios arise in cases of background shift and randomly shuffled word order within in domain texts. This highlights the need for future work to develop more effective OOD detection approaches for the NLP problems, and our work provides a well-defined foundation for further research in this area.
摘要
现代模型在控制环境下可以表现出色,但它们在不同类型的 Distributional Shift 下陷入困难,因此 OOD 检测成为 NLP 系统的关键组成部分。在这篇论文中,我们关注现有 OOD 检测方法在 NLP 领域的局限性。我们评估了八种可以容易地集成到现有 NLP 系统中的 OOD 检测方法,不需要额外的 OOD 数据或模型修改。我们的贡献之一是提供了一个具有完整可重现性的研究环境。此外,我们的分析表明,现有的 OOD 检测方法在 NLP 任务中并不够敏感,无法捕捉所有类型的 Distributional Shift 下的样本。特别是在背景变化和随机排序的情况下,OOD 检测方法表现出特别困难。这种情况 highlights 未来的研究应该更加注重开发更有效的 OOD 检测方法,我们的工作提供了一个完善的基础 для进一步的研究。
HyperDreamBooth: HyperNetworks for Fast Personalization of Text-to-Image Models
results: 本论文的方法可以在约 20 秒内进行个人化,比 DreamBooth 快 25 倍,比 Textual Inversion 快 125 倍,并且可以从单一图像中生成高质量和多样化的个人化 faces,而且模型的大小仅有 DreamBooth 的 1/10000。Abstract
Personalization has emerged as a prominent aspect within the field of generative AI, enabling the synthesis of individuals in diverse contexts and styles, while retaining high-fidelity to their identities. However, the process of personalization presents inherent challenges in terms of time and memory requirements. Fine-tuning each personalized model needs considerable GPU time investment, and storing a personalized model per subject can be demanding in terms of storage capacity. To overcome these challenges, we propose HyperDreamBooth-a hypernetwork capable of efficiently generating a small set of personalized weights from a single image of a person. By composing these weights into the diffusion model, coupled with fast finetuning, HyperDreamBooth can generate a person's face in various contexts and styles, with high subject details while also preserving the model's crucial knowledge of diverse styles and semantic modifications. Our method achieves personalization on faces in roughly 20 seconds, 25x faster than DreamBooth and 125x faster than Textual Inversion, using as few as one reference image, with the same quality and style diversity as DreamBooth. Also our method yields a model that is 10000x smaller than a normal DreamBooth model. Project page: https://hyperdreambooth.github.io
摘要
个人化在生成人工智能领域中已经成为一个显著的特征,允许在不同的上下文和风格中生成个性化的人脸,保持高度准确性。然而,个人化过程存在内存和时间的挑战,每个个性化模型都需要较长的GPU时间投资,并且每个人需要存储一个个性化模型,这会占用更多的存储容量。为了解决这些挑战,我们提出了HyperDreamBooth,一个具有高效生成少量个性化参数的超网络。通过这些参数与扩散模型的组合,加速了个性化,可以在20秒钟内生成具有高级精度和多样化风格的人脸,与DreamBooth和Textual Inversion相当,但快速得多,使用的参数少得多,模型的大小也很小。更多信息请参考:
Video-FocalNets: Spatio-Temporal Focal Modulation for Video Action Recognition
results: 在五个大规模数据集(Kinetics-400、Kinetics-600、SS-v2、Diving-48和ActivityNet-1.3)上,Video-FocalNets模型表现出色,与现有的Transformer模型相比,具有更高的效率和更好的性能。Abstract
Recent video recognition models utilize Transformer models for long-range spatio-temporal context modeling. Video transformer designs are based on self-attention that can model global context at a high computational cost. In comparison, convolutional designs for videos offer an efficient alternative but lack long-range dependency modeling. Towards achieving the best of both designs, this work proposes Video-FocalNet, an effective and efficient architecture for video recognition that models both local and global contexts. Video-FocalNet is based on a spatio-temporal focal modulation architecture that reverses the interaction and aggregation steps of self-attention for better efficiency. Further, the aggregation step and the interaction step are both implemented using efficient convolution and element-wise multiplication operations that are computationally less expensive than their self-attention counterparts on video representations. We extensively explore the design space of focal modulation-based spatio-temporal context modeling and demonstrate our parallel spatial and temporal encoding design to be the optimal choice. Video-FocalNets perform favorably well against the state-of-the-art transformer-based models for video recognition on five large-scale datasets (Kinetics-400, Kinetics-600, SS-v2, Diving-48, and ActivityNet-1.3) at a lower computational cost. Our code/models are released at https://github.com/TalalWasim/Video-FocalNets.
摘要
近期的视频识别模型通过Transformer模型来实现长距离空间temporal上下文模型。视频转换设计基于自注意的自我注意可以模型全局上下文,但是计算成本较高。相比之下,图像设计对视频提供了一个有效的alternative,但缺少长距离依赖关系模型。为了实现两种设计的优点,本工作提出了Video-FocalNet,一种高效和经济的视频识别模型。Video-FocalNet基于空间temporal关注模块,该模块通过逆转自注意的互动和聚合步骤来提高效率。此外,聚合步骤和互动步骤都使用了高效的卷积和元素乘法操作,这些操作相比于自注意操作更加经济。我们广泛探索了关注模块基于空间temporal上下文模型的设计空间,并证明我们的并行空间和时间编码设计是最佳选择。Video-FocalNet在五个大规模数据集(Kinetics-400、Kinetics-600、SS-v2、Diving-48和ActivityNet-1.3)上与当前的Transformer模型进行视频识别,性能较高,计算成本较低。我们的代码和模型在https://github.com/TalalWasim/Video-FocalNets上发布。
In-context Autoencoder for Context Compression in a Large Language Model
for: solves the long context problem in large language models (LLMs) by compressing a long context into a limited number of memory slots.
methods: uses an In-context Autoencoder (ICAE) with two modules: a learnable encoder adapted with LoRA from an LLM, and a fixed decoder which is the target LLM. The ICAE is pretrained using both autoencoding and language modeling objectives, and then fine-tuned on a small amount of instruct data to enhance its interaction with various prompts.
results: effectively produces memory slots with $4\times$ context compression, which can be well conditioned on by the target LLM to respond to various prompts.Abstract
We propose the In-context Autoencoder (ICAE) for context compression in a large language model (LLM). The ICAE has two modules: a learnable encoder adapted with LoRA from an LLM for compressing a long context into a limited number of memory slots, and a fixed decoder which is the target LLM that can condition on the memory slots for various purposes. We first pretrain the ICAE using both autoencoding and language modeling objectives on massive text data, enabling it to generate memory slots that accurately and comprehensively represent the original context. Then, we fine-tune the pretrained ICAE on a small amount of instruct data to enhance its interaction with various prompts for producing desirable responses. Our experimental results demonstrate that the ICAE learned with our proposed pretraining and fine-tuning paradigm can effectively produce memory slots with $4\times$ context compression, which can be well conditioned on by the target LLM to respond to various prompts. The promising results demonstrate significant implications of the ICAE for its novel approach to the long context problem and its potential to reduce computation and memory overheads for LLM inference in practice, suggesting further research effort in context management for an LLM. Our code and data will be released shortly.
摘要
我们提议使用内容压缩 autoencoder(ICAE)来解决大型语言模型(LLM)中的长度问题。ICAE有两个模组:一个可学习的Encoder,从LLM中获取丰富的文本数据,将长 context 压缩到有限的内存槽中,以及一个固定的Decoder,可以根据内存槽进行多种目的的处理。我们首先将ICAE使用自动编码和语言模型目标函数进行预训练,将其训练到从大量文本数据中获取的memory slots可以准确地和全面地表示原始上下文。然后,我们精确地调整预训练后的ICAE,以便在不同的提示下生成适当的回应。我们的实验结果表明,透过我们提出的预训练和精确调整方法,ICAE可以实现$4\times$的context压缩,可以很好地被目标LLM所条件。这些成果表明ICAE具有novel的方法并且具有实际应用中LLM计算和内存负载的减少之能力,因此需要进一步的研究,以解决LLM中的长度问题。我们将在短时间内发布代码和数据。
On the Connection between Game-Theoretic Feature Attributions and Counterfactual Explanations
results: 我们的研究显示,在满足某些条件时,game-theoretic feature attributions和对冲推论解释是等价的。此外,我们还发现了对冲推论解释的局限性,其不能准确地提供特征重要性。我们的实验结果表明,在不同的数据集上,使用不同的方法可以得到不同的解释结果。Abstract
Explainable Artificial Intelligence (XAI) has received widespread interest in recent years, and two of the most popular types of explanations are feature attributions, and counterfactual explanations. These classes of approaches have been largely studied independently and the few attempts at reconciling them have been primarily empirical. This work establishes a clear theoretical connection between game-theoretic feature attributions, focusing on but not limited to SHAP, and counterfactuals explanations. After motivating operative changes to Shapley values based feature attributions and counterfactual explanations, we prove that, under conditions, they are in fact equivalent. We then extend the equivalency result to game-theoretic solution concepts beyond Shapley values. Moreover, through the analysis of the conditions of such equivalence, we shed light on the limitations of naively using counterfactual explanations to provide feature importances. Experiments on three datasets quantitatively show the difference in explanations at every stage of the connection between the two approaches and corroborate the theoretical findings.
摘要
“几年前,可解释人工智能(XAI)已经受到了广泛的关注,而feature attributions和counterfactual explanations是最受欢迎的两种解释方法。这两种方法在过去一直被研究独立,只有一些基于实践的尝试了它们的结合。本文确立了game-theoretic feature attributions和counterfactual explanations之间的具体理论连接,并且修改了Shapley值基础的feature attributions和counterfactual explanations,以便实现它们的相等。此外,我们还延伸了这个相等结果至游戏理论解决方案的其他概念,并且通过分析这些条件的限制,实际阐明了使用counterfactual explanations提供特征重要性的局限性。实验结果显示,这两种方法在connection中的解释有所不同,并且与理论结果相符。”Here is a word-for-word translation of the text into Simplified Chinese:“几年前,可解释人工智能(XAI)已经受到了广泛的关注,而feature attributions和counterfactual explanations是最受欢迎的两种解释方法。这两种方法在过去一直被研究独立,只有一些基于实践的尝试了它们的结合。本文确立了game-theoretic feature attributions和counterfactual explanations之间的具体理论连接,并且修改了Shapley值基础的feature attributions和counterfactual explanations,以便实现它们的相等。此外,我们还延伸了这个相等结果至游戏理论解决方案的其他概念。”
DRAGON: A Dialogue-Based Robot for Assistive Navigation with Visual Language Grounding
results: 经过用户实验,发现DRAGON能够与用户通信流畅,提供好的引导体验,并将用户联系到周遭环境的意义性资讯。Abstract
Persons with visual impairments (PwVI) have difficulties understanding and navigating spaces around them. Current wayfinding technologies either focus solely on navigation or provide limited communication about the environment. Motivated by recent advances in visual-language grounding and semantic navigation, we propose DRAGON, a guiding robot powered by a dialogue system and the ability to associate the environment with natural language. By understanding the commands from the user, DRAGON is able to guide the user to the desired landmarks on the map, describe the environment, and answer questions from visual observations. Through effective utilization of dialogue, the robot can ground the user's free-form descriptions to landmarks in the environment, and give the user semantic information through spoken language. We conduct a user study with blindfolded participants in an everyday indoor environment. Our results demonstrate that DRAGON is able to communicate with the user smoothly, provide a good guiding experience, and connect users with their surrounding environment in an intuitive manner.
摘要
人们视障 (PwVI) 在周围环境中有困难理解和导航。当前的导航技术ether solely focus on navigation or provide limited environmental information. 鼓动于最近的视觉语言固定和semantic navigation技术的进步,我们提议了DRAGON,一种带有对话系统的导航机器人。通过理解用户的命令,DRAGON可以引导用户到地图上的目标点,描述环境,并根据视觉观察回答问题。通过对话的有效利用,机器人可以将用户的自由形式描述与环境中的标志点相关联,并通过语音提供Semantic information。我们在每天的室内环境中进行了盲人参与者的用户研究。我们的结果表明,DRAGON能够与用户交流平滑,提供良好的引导体验,并通过语音连接用户与周围环境的INTUITIVE manner。
LLM-assisted Knowledge Graph Engineering: Experiments with ChatGPT
paper_authors: Lars-Peter Meyer, Claus Stadler, Johannes Frey, Norman Radtke, Kurt Junghanns, Roy Meissner, Gordian Dziwis, Kirill Bulert, Michael Martin
methods: 该 paper 使用了 ChatGPT 进行了详细的实验,以 explore its potential in supporting KGE。
results: 实验结果表明,ChatGPT 可以帮助开发和管理 Knowledge Graphs (KGs),并且可以提高 KGE 的效率和质量。Abstract
Knowledge Graphs (KG) provide us with a structured, flexible, transparent, cross-system, and collaborative way of organizing our knowledge and data across various domains in society and industrial as well as scientific disciplines. KGs surpass any other form of representation in terms of effectiveness. However, Knowledge Graph Engineering (KGE) requires in-depth experiences of graph structures, web technologies, existing models and vocabularies, rule sets, logic, as well as best practices. It also demands a significant amount of work. Considering the advancements in large language models (LLMs) and their interfaces and applications in recent years, we have conducted comprehensive experiments with ChatGPT to explore its potential in supporting KGE. In this paper, we present a selection of these experiments and their results to demonstrate how ChatGPT can assist us in the development and management of KGs.
摘要
知识图(KG)为我们提供一种结构化、灵活、透明、跨系统、合作的知识和数据组织方式,可以涵盖社会、工业以及科学领域的多种领域。KG比任何其他表示方式更有效。然而,知识图工程(KGE)需要深厚的图结构、网络技术、现有模型和词汇、规则集、逻辑以及最佳实践的经验。它还需要大量的劳动。鉴于大语言模型(LLM)的发展和其界面和应用在最近几年,我们已经进行了全面的实验,用ChatGPT探索其在支持KGE方面的潜力。本文介绍了这些实验的选择和结果,以示出ChatGPT如何帮助我们在开发和管理知识图方面。
Uncovering Unique Concept Vectors through Latent Space Decomposition
results: 经过广泛的实验表明,大多数提取的概念向量都是人类可理解的,具有准确性和一致性,并与任务有直接关系。此外,该方法在数据集探索中也有remarkable的实用性,可以成功地找到训练数据中的偏移和杂音样本。Abstract
Interpreting the inner workings of deep learning models is crucial for establishing trust and ensuring model safety. Concept-based explanations have emerged as a superior approach that is more interpretable than feature attribution estimates such as pixel saliency. However, defining the concepts for the interpretability analysis biases the explanations by the user's expectations on the concepts. To address this, we propose a novel post-hoc unsupervised method that automatically uncovers the concepts learned by deep models during training. By decomposing the latent space of a layer in singular vectors and refining them by unsupervised clustering, we uncover concept vectors aligned with directions of high variance that are relevant to the model prediction, and that point to semantically distinct concepts. Our extensive experiments reveal that the majority of our concepts are readily understandable to humans, exhibit coherency, and bear relevance to the task at hand. Moreover, we showcase the practical utility of our method in dataset exploration, where our concept vectors successfully identify outlier training samples affected by various confounding factors. This novel exploration technique has remarkable versatility to data types and model architectures and it will facilitate the identification of biases and the discovery of sources of error within training data.
摘要
深度学习模型的内部工作方式的解释是建立信任和确保模型安全的关键。概念基于的解释方法在可解释性方面胜过特征归因估计如像素感知。然而,为了定义概念,用户的期望会影响解释。为解决这个问题,我们提出了一种新的后续无监督方法,可以自动揭示深度模型在训练过程中学习的概念。我们将层的秘密空间拆分成单值特征,然后通过无监督归一化来精细化概念向量,从而找到与高差值方向相关的概念向量,这些向量对应于semantically meaningful的概念。我们的广泛实验表明,大多数我们的概念都可以被人类理解,具有凝聚性,并与任务相关。此外,我们还证明了我们的方法在数据集探索中的实用性,我们的概念向量可以成功地标识在训练数据中受到各种干扰因素的异常训练样本。这种新的探索技术具有数据类型和模型结构的强大可 versatility,可以帮助确定模型中的偏见和训练数据中的错误来源。
Generating Benchmarks for Factuality Evaluation of Language Models
results: 这个论文使用 FACTOR 方法创建了两个 benchmark:Wiki-FACTOR 和 News-FACTOR。研究发现,(i) benchmark 分数与模型大小相关,并且当LM被增强后,其分数会提高。(ii) benchmark 分数与沟通能力之间存在正相关关系,但这两个指标不总是同时同意模型排名。(iii) 当沟通能力和 benchmark 分数不同时,后者更好地反映了 LM 在开放生成中的 фактиче正确性,这种反馈来自于人工标注员。Abstract
Before deploying a language model (LM) within a given domain, it is important to measure its tendency to generate factually incorrect information in that domain. Existing factual generation evaluation methods focus on facts sampled from the LM itself, and thus do not control the set of evaluated facts and might under-represent rare and unlikely facts. We propose FACTOR: Factual Assessment via Corpus TransfORmation, a scalable approach for evaluating LM factuality. FACTOR automatically transforms a factual corpus of interest into a benchmark evaluating an LM's propensity to generate true facts from the corpus vs. similar but incorrect statements. We use our framework to create two benchmarks: Wiki-FACTOR and News-FACTOR. We show that: (i) our benchmark scores increase with model size and improve when the LM is augmented with retrieval; (ii) benchmark score correlates with perplexity, but the two metrics do not always agree on model ranking; and (iii) when perplexity and benchmark score disagree, the latter better reflects factuality in open-ended generation, as measured by human annotators. We make our data and code publicly available in https://github.com/AI21Labs/factor.
摘要
Benchmark scores increase with model size and improve when the LM is augmented with retrieval.2. Benchmark score correlates with perplexity, but the two metrics do not always agree on model ranking.3. When perplexity and benchmark score disagree, the latter better reflects factuality in open-ended generation, as measured by human annotators.We make our data and code publicly available at https://github.com/AI21Labs/factor.
Sequential Monte Carlo Learning for Time Series Structure Discovery
paper_authors: Feras A. Saad, Brian J. Patton, Matthew D. Hoffman, Rif A. Saurous, Vikash K. Mansinghka
for: automatizilly discover accurate models of complex time series data
methods: Bayesian nonparametric prior over Gaussian process time series models, integrates sequential Monte Carlo (SMC) and involutive MCMC for posterior inference
results: delivers 10x-100x runtime speedups over previous MCMC and greedy-search structure learning algorithms, and discovers sensible models that deliver more accurate point forecasts and interval forecasts compared to statistical and neural baselines.Abstract
This paper presents a new approach to automatically discovering accurate models of complex time series data. Working within a Bayesian nonparametric prior over a symbolic space of Gaussian process time series models, we present a novel structure learning algorithm that integrates sequential Monte Carlo (SMC) and involutive MCMC for highly effective posterior inference. Our method can be used both in "online" settings, where new data is incorporated sequentially in time, and in "offline" settings, by using nested subsets of historical data to anneal the posterior. Empirical measurements on real-world time series show that our method can deliver 10x--100x runtime speedups over previous MCMC and greedy-search structure learning algorithms targeting the same model family. We use our method to perform the first large-scale evaluation of Gaussian process time series structure learning on a prominent benchmark of 1,428 econometric datasets. The results show that our method discovers sensible models that deliver more accurate point forecasts and interval forecasts over multiple horizons as compared to widely used statistical and neural baselines that struggle on this challenging data.
摘要
Deep reinforcement learning for the dynamic vehicle dispatching problem: An event-based approach
for: This paper is written for solving the dynamic vehicle dispatching problem, which is a problem of assigning vehicles to requests that arise stochastically over time and space.
methods: The paper uses a semi-Markov decision process to model the problem, which allows for treating time as continuous and reduces the combinatorial complexity of the decision space. The authors also use double deep q-learning to train their decision agents.
results: The authors’ policies exhibit better average waiting times, cancellation rates, and total service times compared to heuristic policies often used in practice, with a reduction in average waiting times of up to 50% relative to the other tested heuristic policies.Here is the text in Simplified Chinese:
results: 作者的策略比常用的各种规则更好,waiting time、取消率和总服务时间都得到了改善,waiting time的减少达50%以上。Abstract
The dynamic vehicle dispatching problem corresponds to deciding which vehicles to assign to requests that arise stochastically over time and space. It emerges in diverse areas, such as in the assignment of trucks to loads to be transported; in emergency systems; and in ride-hailing services. In this paper, we model the problem as a semi-Markov decision process, which allows us to treat time as continuous. In this setting, decision epochs coincide with discrete events whose time intervals are random. We argue that an event-based approach substantially reduces the combinatorial complexity of the decision space and overcomes other limitations of discrete-time models often proposed in the literature. In order to test our approach, we develop a new discrete-event simulator and use double deep q-learning to train our decision agents. Numerical experiments are carried out in realistic scenarios using data from New York City. We compare the policies obtained through our approach with heuristic policies often used in practice. Results show that our policies exhibit better average waiting times, cancellation rates and total service times, with reduction in average waiting times of up to 50% relative to the other tested heuristic policies.
摘要
这个动态车辆分配问题与决定在时间和空间上随机出现的请求中分配车辆相关。它出现在不同领域,如货物运输、紧急系统和乘用车服务等。在这篇论文中,我们将问题模型为半Markov决策过程,这使得我们可以将时间视为连续的。在这种设定下,决策瞬间与随机时间间隔相匹配,我们认为事件基本方法可以减少决策空间的复杂度,并且超越了常见的 discrete-time 模型。为了测试我们的方法,我们开发了一个新的离散事件仿真器,并使用双层深度Q学习来训练我们的决策代理人。我们在使用实际数据从纽约市进行数值实验,并与常见的各种各样的策略进行比较。结果显示,我们的策略可以减少待命时间的平均值,最多减少50%,相比其他测试的各种各样的策略。
The complexity of non-stationary reinforcement learning
for: solves the problem of continual learning in reinforcement learning, specifically the non-stationary reinforcement learning challenge.
methods: uses a worst-case complexity result to prove that modifying probabilities or rewards in a reinforcement learning problem requires a significant amount of time, unless the strong exponential time hypothesis (SETH) is false.
results: shows that adding a new state-action pair is much easier to implement than modifying existing probabilities or rewards.Abstract
The problem of continual learning in the domain of reinforcement learning, often called non-stationary reinforcement learning, has been identified as an important challenge to the application of reinforcement learning. We prove a worst-case complexity result, which we believe captures this challenge: Modifying the probabilities or the reward of a single state-action pair in a reinforcement learning problem requires an amount of time almost as large as the number of states in order to keep the value function up to date, unless the strong exponential time hypothesis (SETH) is false; SETH is a widely accepted strengthening of the P $\neq$ NP conjecture. Recall that the number of states in current applications of reinforcement learning is typically astronomical. In contrast, we show that just $\textit{adding}$ a new state-action pair is considerably easier to implement.
摘要
“对于回传学习领域中的持续学习问题(通常称为非站ARY reinforcement learning),我们证明了一个最差情况的复杂性结果:对某个状态动作 pairs modify 概率或奖励,需要大约与状态数量相当的时间才能保持值函数更新, Unless SETH 是 false;SETK 是一个广泛accepted的对 P $\neq$ NP 推论的强化。请注意,现在的实际应用中的状态数量通常是惊人的。然而,我们显示了仅增加一个新的状态动作 pairs 是可以轻松实现的。”Note: SETH stands for "Strong Exponential Time Hypothesis" and P $\neq$ NP is a famous open problem in computer science.
Embodied Lifelong Learning for Task and Motion Planning
results: 论文在 simulated 2D 领域和 BEHAVIOR 数据集上的规划成功率显示了明显的提高。Abstract
A robot deployed in a home over long stretches of time faces a true lifelong learning problem. As it seeks to provide assistance to its users, the robot should leverage any accumulated experience to improve its own knowledge to become a more proficient assistant. We formalize this setting with a novel lifelong learning problem formulation in the context of learning for task and motion planning (TAMP). Exploiting the modularity of TAMP systems, we develop a generative mixture model that produces candidate continuous parameters for a planner. Whereas most existing lifelong learning approaches determine a priori how data is shared across task models, our approach learns shared and non-shared models and determines which to use online during planning based on auxiliary tasks that serve as a proxy for each model's understanding of a state. Our method exhibits substantial improvements in planning success on simulated 2D domains and on several problems from the BEHAVIOR benchmark.
摘要
一个机器人在家中长时间内部署,面临着一个真正的一生学习问题。作为它提供帮助的用户,机器人应该利用已有经验来提高自己的知识,成为更有效的帮手。我们将这种设定形式化为一种新的一生学习问题形式,在任务和动作规划(TAMP)上下文中。利用TAMP系统的模块性,我们开发了一种生成混合模型,生成候选的连续参数 для плаanner。大多数现有的一生学习方法在数据分享上做出了决定,而我们的方法在线上决定使用共享和非共享模型,并根据辅助任务来选择使用哪些模型,以便在规划中使用。我们的方法在 simulated 2D 领域和 BEHAVIOR benchmark 上表现出了显著的改善。
DecompEval: Evaluating Generated Texts as Unsupervised Decomposed Question Answering
for: 这 paper 的目的是提出一种简单 yet effective 的自然语言生成(NLG)评估 metric,它可以增强评估过程的普适性和可读性。
methods: 这 paper 使用了 instruction-style 问题回答任务来形式NLG评估,并使用了 instruction-tuned pre-trained language models(PLMs),不需要训练在评估数据集上,以提高普适性。另外,这 paper 还 decomposes 了自己的设计的 instruction-style 问题,以获得更好的解释性。
results: 实验结果表明,DecompEval 可以在不需要训练的情况下,在 text summarization 和对话生成 等 NLG 任务中 achieve state-of-the-art 性能,同时也能够具备强大的特征级 / 任务级普适性和可读性。Abstract
Existing evaluation metrics for natural language generation (NLG) tasks face the challenges on generalization ability and interpretability. Specifically, most of the well-performed metrics are required to train on evaluation datasets of specific NLG tasks and evaluation dimensions, which may cause over-fitting to task-specific datasets. Furthermore, existing metrics only provide an evaluation score for each dimension without revealing the evidence to interpret how this score is obtained. To deal with these challenges, we propose a simple yet effective metric called DecompEval. This metric formulates NLG evaluation as an instruction-style question answering task and utilizes instruction-tuned pre-trained language models (PLMs) without training on evaluation datasets, aiming to enhance the generalization ability. To make the evaluation process more interpretable, we decompose our devised instruction-style question about the quality of generated texts into the subquestions that measure the quality of each sentence. The subquestions with their answers generated by PLMs are then recomposed as evidence to obtain the evaluation result. Experimental results show that DecompEval achieves state-of-the-art performance in untrained metrics for evaluating text summarization and dialogue generation, which also exhibits strong dimension-level / task-level generalization ability and interpretability.
摘要
To address these challenges, we propose a new metric called DecompEval. This metric formulates NLG evaluation as an instruction-style question answering task and utilizes instruction-tuned pre-trained language models (PLMs) without training on evaluation datasets. This approach aims to enhance the generalization ability of the metric.To make the evaluation process more interpretable, we decompose the instruction-style question about the quality of generated texts into subquestions that measure the quality of each sentence. The subquestions, along with the answers generated by PLMs, are then recomposed to obtain the evaluation result.Experimental results show that DecompEval achieves state-of-the-art performance in untrained metrics for evaluating text summarization and dialogue generation. It also exhibits strong dimension-level and task-level generalization ability and interpretability.
Prompts Should not be Seen as Secrets: Systematically Measuring Prompt Extraction Attack Success
results: 实验结果表明,简单的文本基本可以抓取提取,高效地抓取提取。Abstract
The generations of large language models are commonly controlled through prompting techniques, where a user's query to the model is prefixed with a prompt that aims to guide the model's behaviour on the query. The prompts used by companies to guide their models are often treated as secrets, to be hidden from the user making the query. They have even been treated as commodities to be bought and sold. However, there has been anecdotal evidence showing that the prompts can be extracted by a user even when they are kept secret. In this paper, we present a framework for systematically measuring the success of prompt extraction attacks. In experiments with multiple sources of prompts and multiple underlying language models, we find that simple text-based attacks can in fact reveal prompts with high probability.
摘要
大型语言模型的一代通常通过提示技术来控制,其中用户的查询会被前置一个提示,以导引模型对查询的行为。公司使用的提示 oftentimes 被视为秘密,隐瞒于用户。然而,有一些具体的证据表明,用户可以EXTRACT 提示,即使它们被保持为秘密。在这篇论文中,我们提出了一种系统化测量提示提取攻击的成功度的框架。在多个来源的提示和多个基础语言模型的实验中,我们发现了简单的文本基于攻击可以高概率地披露提示。
Self-Supervised Learning for Interactive Perception of Surgical Thread for Autonomous Suture Tail-Shortening
methods: 该方法使用了学习的2D外科缝合线检测网络来分割RGB图像中的缝合线,然后利用两个ステレオカメラ拍摄的图像来重建缝合线为NURBS spline。方法还使用了两个摄像头来跟踪缝合线 across consecutive frames。
results: 实验表明,该方法在单帧3D缝合线重建中实现了1.33像素平均 reprojection error,并在两个跟踪序列中实现了0.84像素平均 reprojection error。在“尾短”任务中,方法实现了20次实验中的90%成功率。详细的补充材料可以在https://sites.google.com/berkeley.edu/autolab-surgical-thread/ 查看。Abstract
Accurate 3D sensing of suturing thread is a challenging problem in automated surgical suturing because of the high state-space complexity, thinness and deformability of the thread, and possibility of occlusion by the grippers and tissue. In this work we present a method for tracking surgical thread in 3D which is robust to occlusions and complex thread configurations, and apply it to autonomously perform the surgical suture "tail-shortening" task: pulling thread through tissue until a desired "tail" length remains exposed. The method utilizes a learned 2D surgical thread detection network to segment suturing thread in RGB images. It then identifies the thread path in 2D and reconstructs the thread in 3D as a NURBS spline by triangulating the detections from two stereo cameras. Once a 3D thread model is initialized, the method tracks the thread across subsequent frames. Experiments suggest the method achieves a 1.33 pixel average reprojection error on challenging single-frame 3D thread reconstructions, and an 0.84 pixel average reprojection error on two tracking sequences. On the tail-shortening task, it accomplishes a 90% success rate across 20 trials. Supplemental materials are available at https://sites.google.com/berkeley.edu/autolab-surgical-thread/ .
摘要
医学自动控制技术是一个挑战性的问题,因为缝纫线的高状态空间复杂性,细长和扭曲性,以及机械握住和组织干扰。在这项工作中,我们提出了一种能够承受干扰和复杂缝纫线配置的3D缝纫线跟踪方法,并将其应用于自动完成“尾短化”任务:将缝纫线通过组织 until a desired “tail” length remains exposed。该方法使用了学习的2D缝纫线检测网络来分割RGB图像中的缝纫线。然后,它将缝纫线路径在2D上标识出,并使用两个ステレオ摄像头拍摄的检测来重建缝纫线为NURBS spline。一旦Initializes a 3D thread model, the method tracks the thread across subsequent frames.实验表明,该方法在具有挑战性的单帧3D缝纫线重建任务中达到1.33像素平均投影误差,并在两个跟踪序列中达到0.84像素平均投影误差。在“尾短化”任务中,它实现了20次实验中的90%成功率。补充材料可以在https://sites.google.com/berkeley.edu/autolab-surgical-thread/查看。
results: (i) 发现的课程通常不 monotonic,与现有文献中的 monotonic 课程不同; (ii) 常见的 easy-to-hard 或 hard-to-easy 转换课程可能会下降性能; (iii) 为小 datasets 和模型而发现的课程可以在大 datasets 和模型上表现出色。Abstract
We introduce the problem of curriculum discovery and describe a curriculum learning framework capable of discovering effective curricula in a curriculum space based on prior knowledge about sample difficulty. Using annotation entropy and loss as measures of difficulty, we show that (i): the top-performing discovered curricula for a given model and dataset are often non-monotonic as opposed to monotonic curricula in existing literature, (ii): the prevailing easy-to-hard or hard-to-easy transition curricula are often at the risk of underperforming, and (iii): the curricula discovered for smaller datasets and models perform well on larger datasets and models respectively. The proposed framework encompasses some of the existing curriculum learning approaches and can discover curricula that outperform them across several NLP tasks.
摘要
我们介绍了课程发现问题和一个基于先前知识的课程学习框架,可以在课程空间中发现有效的课程。使用标签 entropy 和损失作为困难度的度量,我们显示了以下结论:(i) 给定的模型和数据集,最高表现的发现课程通常不是传统文献中的对称课程,(ii) 常见的易于困难或困难于易课程过渡curricula 可能会受到表现下降的问题,(iii) 针对较小的数据集和模型,发现的课程可以对较大的数据集和模型进行良好的表现。提案的框架包括一些现有的课程学习方法,并可以在多个自然语言处理任务中发现高表现的课程。
Composition-contrastive Learning for Sentence Embeddings
results: 比基eline表现提高,无需 auxiliary 训练目标或额外网络参数Abstract
Vector representations of natural language are ubiquitous in search applications. Recently, various methods based on contrastive learning have been proposed to learn textual representations from unlabelled data; by maximizing alignment between minimally-perturbed embeddings of the same text, and encouraging a uniform distribution of embeddings across a broader corpus. Differently, we propose maximizing alignment between texts and a composition of their phrasal constituents. We consider several realizations of this objective and elaborate the impact on representations in each case. Experimental results on semantic textual similarity tasks show improvements over baselines that are comparable with state-of-the-art approaches. Moreover, this work is the first to do so without incurring costs in auxiliary training objectives or additional network parameters.
摘要
设想文本表示方法在搜索应用中广泛存在。最近,有许多基于对比学习的方法被提出来从无标签数据中学习文本表示方法,通过最大化同样文本的微小变化 embedding 之间的对应性,并强制整个 корпуス中 embedding 呈现 uniform 分布。然而,我们提议通过 maximizing alignment between texts and a composition of their phrasal constituents来学习文本表示方法。我们考虑了几种实现方式,并详细介绍每种情况下的影响。实验结果表明,在 semantic textual similarity 任务上比基eline 高,并且不需要额外的auxiliary training objective或网络参数。Note: "phrase" is translated as "短语" (xiàng yǔ) in Simplified Chinese, and "constituents" is translated as "成分" (chéng fāng) in Simplified Chinese.
A scoping review on multimodal deep learning in biomedical images and texts
results: 本文对多个任务进行了评估,包括报告生成、视觉问答、跨模态检索、计算辅助诊断和semantic segmentation等。结果表明多模态深度学习在医学领域有广泛的应用前景和潜在的发展前景。Abstract
Computer-assisted diagnostic and prognostic systems of the future should be capable of simultaneously processing multimodal data. Multimodal deep learning (MDL), which involves the integration of multiple sources of data, such as images and text, has the potential to revolutionize the analysis and interpretation of biomedical data. However, it only caught researchers' attention recently. To this end, there is a critical need to conduct a systematic review on this topic, identify the limitations of current work, and explore future directions. In this scoping review, we aim to provide a comprehensive overview of the current state of the field and identify key concepts, types of studies, and research gaps with a focus on biomedical images and texts joint learning, mainly because these two were the most commonly available data types in MDL research. This study reviewed the current uses of multimodal deep learning on five tasks: (1) Report generation, (2) Visual question answering, (3) Cross-modal retrieval, (4) Computer-aided diagnosis, and (5) Semantic segmentation. Our results highlight the diverse applications and potential of MDL and suggest directions for future research in the field. We hope our review will facilitate the collaboration of natural language processing (NLP) and medical imaging communities and support the next generation of decision-making and computer-assisted diagnostic system development.
摘要
Translated into Simplified Chinese:计算机助成诊断和预测系统的未来应该能同时处理多modal数据。多modal深度学习(MDL),即多种数据的 integrate,有可能改变生物医学数据的分析和解释。然而,它只是最近才引起研究者的注意。为此,有一项重要的需求是进行这个话题的系统性综述,识别当前工作的局限性,并探索未来的方向。在这份报告中,我们希望提供生物医学图像和文本合作学习领域的全面的概述,并识别关键概念、研究类型和研究缺失,主要关注生物医学图像和文本合作学习。这个研究对五个任务进行了现有的多modal深度学习应用:(1)报告生成,(2)视觉问答,(3)跨modal检索,(4)计算机辅助诊断,(5) semantics 分割。我们的结果表明 MDL 的多样化应用和潜力,并提出了未来研究的方向。我们希望这份综述能促进 NLP 和医学影像社区之间的合作,并支持下一代决策和计算机辅助诊断系统的发展。
Gloss Attention for Gloss-free Sign Language Translation
results: 对多个大规模手语数据集进行实验,结果显示,我们提出的GASLT模型在准确率和效率两个方面与现有方法显著不同,表现更优化。代码在 \url{https://github.com/YinAoXiong/GASLT} 上提供。Abstract
Most sign language translation (SLT) methods to date require the use of gloss annotations to provide additional supervision information, however, the acquisition of gloss is not easy. To solve this problem, we first perform an analysis of existing models to confirm how gloss annotations make SLT easier. We find that it can provide two aspects of information for the model, 1) it can help the model implicitly learn the location of semantic boundaries in continuous sign language videos, 2) it can help the model understand the sign language video globally. We then propose \emph{gloss attention}, which enables the model to keep its attention within video segments that have the same semantics locally, just as gloss helps existing models do. Furthermore, we transfer the knowledge of sentence-to-sentence similarity from the natural language model to our gloss attention SLT network (GASLT) to help it understand sign language videos at the sentence level. Experimental results on multiple large-scale sign language datasets show that our proposed GASLT model significantly outperforms existing methods. Our code is provided in \url{https://github.com/YinAoXiong/GASLT}.
摘要
大多数手语翻译(SLT)方法至今都需要使用夸夸注(gloss)来提供额外监督信息,但获取夸夸注并不容易。为解决这个问题,我们首先进行了现有模型的分析,确认夸夸注如何使SLT更容易。我们发现,夸夸注可以为模型提供两种信息:1)帮助模型在连续手语视频中找到 semantics 的位置,2)帮助模型理解手语视频的全局结构。然后,我们提出了“夸夸注注意力”(gloss attention),它使模型能够在同 semantics 的视频段内保持注意力,就像夸夸注一样。此外,我们将自然语言模型中的句子之间的相似性知识传递到我们的夸夸注注意力SLT网络(GASLT)中,以 помочь它理解手语视频的句子水平。多个大规模的手语数据集的实验结果表明,我们的提议的GASLT模型在与现有方法进行比较时显著提高了性能。我们的代码可以在 中找到。
Unsupervised Domain Adaptation using Lexical Transformations and Label Injection for Twitter Data
results: 论文发现,使用修改后的源频道数据可以提高模型的性能,并达到了无监督的 POS 标签准确率为 92.14%(与零shot准确率相比,提高了81.54%)。同时,通过使用提posed transformations来生成 tweets,可以增强 Twitter 数据集,达到了状态的术语标签性能。Abstract
Domain adaptation is an important and widely studied problem in natural language processing. A large body of literature tries to solve this problem by adapting models trained on the source domain to the target domain. In this paper, we instead solve this problem from a dataset perspective. We modify the source domain dataset with simple lexical transformations to reduce the domain shift between the source dataset distribution and the target dataset distribution. We find that models trained on the transformed source domain dataset performs significantly better than zero-shot models. Using our proposed transformations to convert standard English to tweets, we reach an unsupervised part-of-speech (POS) tagging accuracy of 92.14% (from 81.54% zero shot accuracy), which is only slightly below the supervised performance of 94.45%. We also use our proposed transformations to synthetically generate tweets and augment the Twitter dataset to achieve state-of-the-art performance for POS tagging.
摘要
域适应是自然语言处理领域的重要和广泛研究问题。大量文献尝试通过将源领域模型适应目标领域来解决这个问题。在这篇论文中,我们则从数据集角度解决这个问题。我们对源领域数据集进行简单的lexical变换,以减少源领域分布和目标领域分布之间的域Shift。我们发现,使用我们提议的变换可以将标准英语转换成推文,并达到无监督 circumstance 中的POS标签准确率92.14%(从零shot准确率81.54%提高),只是与监督性性能(94.45%)相对较低。此外,我们还使用我们的变换synthetically生成推文,并将其添加到Twitter数据集中,以达到POS标签准确率的状态图表。
How Different Is Stereotypical Bias Across Languages?
methods: 本研究使用英语STereoSet数据集(Nadeem et al., 2021),通过自动翻译而将其翻译成德语、法语、西班牙语和土耳其语。
results: 研究发现,在多语言设置下进行此类分析非常重要,因为我们的实验显示了较为复杂的图像和语言模型之间的关系,英语单语言模型表现最强烈的偏见,而土耳其模型偏见最少。Abstract
Recent studies have demonstrated how to assess the stereotypical bias in pre-trained English language models. In this work, we extend this branch of research in multiple different dimensions by systematically investigating (a) mono- and multilingual models of (b) different underlying architectures with respect to their bias in (c) multiple different languages. To that end, we make use of the English StereoSet data set (Nadeem et al., 2021), which we semi-automatically translate into German, French, Spanish, and Turkish. We find that it is of major importance to conduct this type of analysis in a multilingual setting, as our experiments show a much more nuanced picture as well as notable differences from the English-only analysis. The main takeaways from our analysis are that mGPT-2 (partly) shows surprising anti-stereotypical behavior across languages, English (monolingual) models exhibit the strongest bias, and the stereotypes reflected in the data set are least present in Turkish models. Finally, we release our codebase alongside the translated data sets and practical guidelines for the semi-automatic translation to encourage a further extension of our work to other languages.
摘要
新的研究表明如何评估预训练的英语语言模型中的刻板印象偏见。在这项工作中,我们扩展了这一领域的研究,在不同的维度上系统地调查(a)不同的语言模型(b)的不同基础架构,对其偏见的多语言研究。为此,我们使用英语STereoSet数据集(Nadeem et al., 2021),并自动将其翻译成德语、法语、西班牙语和土耳其语。我们发现在多语言设置下进行这类分析是非常重要的,因为我们的实验显示了更加复杂的图像以及英语只有分析中的差异。主要发现是,mGPT-2(部分)在不同语言上表现出了反刻板行为,英语单语言模型表现最强烈的偏见,而数据集中各种刻板印象最少出现在土耳其模型中。最后,我们发布了我们的代码基础和翻译后的数据集,并提供了实践指南,以便其他语言的扩展。
Hybrid moderation in the newsroom: Recommending featured posts to content moderators
results: 专家评估表明,基于建议的选择,content moderators可以准确地选择合适的评论,NDCG分数为0.83。本研究还发现,文本特征的添加可以提高分类效果,但选择featured content仍然有一定主观性。Abstract
Online news outlets are grappling with the moderation of user-generated content within their comment section. We present a recommender system based on ranking class probabilities to support and empower the moderator in choosing featured posts, a time-consuming task. By combining user and textual content features we obtain an optimal classification F1-score of 0.44 on the test set. Furthermore, we observe an optimum mean NDCG@5 of 0.87 on a large set of validation articles. As an expert evaluation, content moderators assessed the output of a random selection of articles by choosing comments to feature based on the recommendations, which resulted in a NDCG score of 0.83. We conclude that first, adding text features yields the best score and second, while choosing featured content remains somewhat subjective, content moderators found suitable comments in all but one evaluated recommendations. We end the paper by analyzing our best-performing model, a step towards transparency and explainability in hybrid content moderation.
摘要
在线新闻媒体正在苦战用户生成内容的Moderation。我们提出一种基于排名分类概率的推荐系统,以支持和赋能Moderator在选择推荐文章方面的努力。通过结合用户和文本内容特征,我们实现了最佳的分类F1分数0.44,以及在大量验证文章上的优秀NDCG@50.87。此外,专业评估人员对一组随机选择的文章中的 recommenations 进行评估,得到了NDCG分数0.83。我们 conclude 以下两点:一、添加文本特征可以获得最佳分数;二、选择推荐内容仍然有一定的主观性,但Moderator在所有评估的推荐中都可以找到适合的评论。我们结束文章,分析我们最佳performing的模型,是一种透明度和解释性的 hybrid content moderation 的一步。
Using Large Language Models for Zero-Shot Natural Language Generation from Knowledge Graphs
results: 研究发现,使用 ChatGPT 模型可以在一些 WebNLG 2020 挑战中达到near state-of-the-art 性能水平,但在其他指标上表现较差。此外,研究还发现, factual、counter-factual 和 fictional 声明之间存在显著的关系,与模型对数据进行处理的知识有关。Abstract
In any system that uses structured knowledge graph (KG) data as its underlying knowledge representation, KG-to-text generation is a useful tool for turning parts of the graph data into text that can be understood by humans. Recent work has shown that models that make use of pretraining on large amounts of text data can perform well on the KG-to-text task even with relatively small sets of training data on the specific graph-to-text task. In this paper, we build on this concept by using large language models to perform zero-shot generation based on nothing but the model's understanding of the triple structure from what it can read. We show that ChatGPT achieves near state-of-the-art performance on some measures of the WebNLG 2020 challenge, but falls behind on others. Additionally, we compare factual, counter-factual and fictional statements, and show that there is a significant connection between what the LLM already knows about the data it is parsing and the quality of the output text.
摘要
在使用结构化知识图(KG)数据作为基础知识表示时,KG-to-text生成是一种有用的工具,可以将知识图数据转换成人类可理解的文本。最近的研究表明,可以通过对大量文本数据进行预训练,使模型在特定的图文生成任务上表现出色,即使只有小量的训练数据。在这篇论文中,我们将这一概念进一步发展,使用大型语言模型来进行零基础生成,基于模型对 triple 结构的理解。我们发现,ChatGPT在一些 WebNLG 2020 挑战中的一些指标上达到了near state-of-the-art 水平,但在其他指标上落后。此外,我们比较了事实、Counter-factual 和虚构声明,发现模型对数据处理过程中已知知识和出力文本质量之间存在显著的关系。
Similarity-based Memory Enhanced Joint Entity and Relation Extraction
results: 我们的实验表明,提出的方法可以比现有方法更高效地解决文档级别的共同实体和关系抽取问题,并在BioCreative V CDR词库中达到了国际先进水平。Abstract
Document-level joint entity and relation extraction is a challenging information extraction problem that requires a unified approach where a single neural network performs four sub-tasks: mention detection, coreference resolution, entity classification, and relation extraction. Existing methods often utilize a sequential multi-task learning approach, in which the arbitral decomposition causes the current task to depend only on the previous one, missing the possible existence of the more complex relationships between them. In this paper, we present a multi-task learning framework with bidirectional memory-like dependency between tasks to address those drawbacks and perform the joint problem more accurately. Our empirical studies show that the proposed approach outperforms the existing methods and achieves state-of-the-art results on the BioCreative V CDR corpus.
摘要
文档级联合实体和关系抽取是一个复杂的信息抽取问题,需要一种统一的方法,其中单个神经网络执行四个子任务:提及检测、核心匹配解决、实体分类和关系抽取。现有方法frequently使用顺序多任务学习方法,这会使当前任务仅仅依赖于前一个任务,缺少可能存在的更复杂的关系。在这篇论文中,我们提出了一种多任务学习框架,其中任务之间具有双向卫星记忆的相互依赖关系,以解决这些缺陷并更准确地完成联合问题。我们的实验表明,我们的方法可以超过现有方法,并在生物创新V CDR数据集上达到状态的最佳Result。
Towards dialect-inclusive recognition in a low-resource language: are balanced corpora the answer?
results: 结果显示,dialect-balanced训练集不会导致所有 диалект的表现相似。 Ul диалект一直表现出来underperform,而 Mu dialect获得了最低的wer。 Co 和 Mu диалект之间存在密切的关系,但这种关系不是对称的。 这些结果将指导未来的 corps collection 和系统建设策略,以优化cross-dialect表现的公平性。Abstract
ASR systems are generally built for the spoken 'standard', and their performance declines for non-standard dialects/varieties. This is a problem for a language like Irish, where there is no single spoken standard, but rather three major dialects: Ulster (Ul), Connacht (Co) and Munster (Mu). As a diagnostic to quantify the effect of the speaker's dialect on recognition performance, 12 ASR systems were trained, firstly using baseline dialect-balanced training corpora, and then using modified versions of the baseline corpora, where dialect-specific materials were either subtracted or added. Results indicate that dialect-balanced corpora do not yield a similar performance across the dialects: the Ul dialect consistently underperforms, whereas Mu yields lowest WERs. There is a close relationship between Co and Mu dialects, but one that is not symmetrical. These results will guide future corpus collection and system building strategies to optimise for cross-dialect performance equity.
摘要
听说系统通常是为标准口语建立的,其性能在非标准方言下降。这是一个问题,因为不同的语言如爱尔兰语没有单一的口语标准,而是有三大方言:乌尔斯特(Ul)、康нах特(Co)和门стер(Mu)。为了诊断语音识别器的语言方言影响的效果,12个听说系统被训练,首先使用基线 dialect-balanced 训练数据库,然后使用基线数据库的修改版本,其中方言特定的材料被 subtracted 或 added。结果表明,dialect-balanced 数据库不会在不同方言上得到相同的性能:Ul 方言一直表现不佳,而 Mu 方言具有最低 WERs。Co 和 Mu 方言之间存在close关系,但这不是对称的。这些结果将指导未来的 corpus 收集和系统建立策略,以便实现cross-dialect 性能平等。
Replay to Remember: Continual Layer-Specific Fine-tuning for German Speech Recognition
paper_authors: Theresa Pekarek Rosin, Stefan Wermter
for: 这 paper 的目的是探讨大规模自动语音识别(ASR)模型在小区域数据上的表现,以及如何通过 selective freezing 和经验回放来提高模型的稳定性和抗衰落性。
methods: 作者使用了大规模 multilingual ASR 模型,通过 transfer learning 将其适应到更小的 germany senior voice commands(SVC-de)数据集上。在训练过程中,作者选择性冻结部分模型参数,以保留大规模数据上的表现。此外,作者还应用了经验回放来进行 continual learning,以增强模型对 vocabulary 和 Speaker 的抗衰落性。
results: 作者通过实验发现,可以通过 selective freezing 和经验回放来将 ASR 模型的表现 approximated 到小区域数据上,并保持 general speech recognition 的表现在可接受的 Water Error Rate(WER)下。特别是,通过添加原始频率上的一部分数据,可以在新频率上达到 WER 下于 5%,并稳定 general speech recognition 的表现。Abstract
While Automatic Speech Recognition (ASR) models have shown significant advances with the introduction of unsupervised or self-supervised training techniques, these improvements are still only limited to a subsection of languages and speakers. Transfer learning enables the adaptation of large-scale multilingual models to not only low-resource languages but also to more specific speaker groups. However, fine-tuning on data from new domains is usually accompanied by a decrease in performance on the original domain. Therefore, in our experiments, we examine how well the performance of large-scale ASR models can be approximated for smaller domains, with our own dataset of German Senior Voice Commands (SVC-de), and how much of the general speech recognition performance can be preserved by selectively freezing parts of the model during training. To further increase the robustness of the ASR model to vocabulary and speakers outside of the fine-tuned domain, we apply Experience Replay for continual learning. By adding only a fraction of data from the original domain, we are able to reach Word-Error-Rates (WERs) below 5\% on the new domain, while stabilizing performance for general speech recognition at acceptable WERs.
摘要
自动语音识别(ASR)模型在无监督或自监督训练技术的引入后显示了显著的进步,但这些改进仅限于一个语言和说话人的子集。通过传输学习,大规模多语言模型可以适应不仅低资源语言,还可以适应更特定的说话人组。然而,在新领域的数据进行精细调整通常会导致原领域的性能下降。因此,我们在实验中研究了如何在更小的领域中 aproximate大规模 ASR 模型的性能,以及如何在 selective freezing 部分模型 During training 中保持一定的通用语音识别性能。此外,我们还应用了经验回放以实现持续学习,通过添加原领域数据的一小部分,能够在新领域下达到 Word-Error-Rates(WER)下于5%,而同时稳定通用语音识别的性能。
Are words equally surprising in audio and audio-visual comprehension?
paper_authors: Pranava Madhyastha, Ye Zhang, Gabriella Vigliocco
for: investigate the effect of visual information on spoken language comprehension
methods: compare ERP signature (N400) in audio-only and audio-visual presentations, use different types of language models (n-gram and Transformer models) to predict N400 responses for each word
results: cognitive effort differs significantly between multimodal and unimodal settings, Transformer-based models provide a better fit in the audio-only setting, while 2-gram language models are more effective in the multimodal setting, highlighting the significant impact of local lexical context on cognitive processing in a multimodal environment.Here’s the full text in Simplified Chinese:
results: 在多模态和单模态 Setting中,认知努力存在显著的差异,Transformer模型在audio只 Setting中提供了更好的预测,而2-gram语言模型在多模态 Setting中更有效。这些结果表明在多模态环境中,本地语言上下文对认知处理产生了深刻的影响。Abstract
We report a controlled study investigating the effect of visual information (i.e., seeing the speaker) on spoken language comprehension. We compare the ERP signature (N400) associated with each word in audio-only and audio-visual presentations of the same verbal stimuli. We assess the extent to which surprisal measures (which quantify the predictability of words in their lexical context) are generated on the basis of different types of language models (specifically n-gram and Transformer models) that predict N400 responses for each word. Our results indicate that cognitive effort differs significantly between multimodal and unimodal settings. In addition, our findings suggest that while Transformer-based models, which have access to a larger lexical context, provide a better fit in the audio-only setting, 2-gram language models are more effective in the multimodal setting. This highlights the significant impact of local lexical context on cognitive processing in a multimodal environment.
摘要
我们报告了一项控制性研究,探讨了视觉信息(即说话人的视觉)对口语理解的影响。我们比较了听音只和听音视频两种 presentaation 中每个单词的 ERP 特征(N400)。我们评估了不同类型的语言模型(具体是 n-gram 和 Transformer 模型)对每个单词的抽象度的影响。我们发现,在多模态和单模态 Setting 中,聪明的努力有所不同。此外,我们发现,基于 Transformer 模型,它们有更大的语言上下文,在听音只 Setting 中提供了更好的适应。然而,基于 2-gram 语言模型,在多模态 Setting 中更有效。这显示了多模态环境中的本地语言上下文对认知处理的重要影响。
A Topical Approach to Capturing Customer Insight In Social Media
results: 研究人员在一个实际案例中使用了这些方法测试 benchmark 和汽车行业相关的数据集,并证明了这些方法可以与当前状态的方法相比或更好的表现。同时,研究人员也认为这些方法可以帮助话题模型领域增强评估指标。Abstract
The age of social media has opened new opportunities for businesses. This flourishing wealth of information is outside traditional channels and frameworks of classical marketing research, including that of Marketing Mix Modeling (MMM). Textual data, in particular, poses many challenges that data analysis practitioners must tackle. Social media constitute massive, heterogeneous, and noisy document sources. Industrial data acquisition processes include some amount of ETL. However, the variability of noise in the data and the heterogeneity induced by different sources create the need for ad-hoc tools. Put otherwise, customer insight extraction in fully unsupervised, noisy contexts is an arduous task. This research addresses the challenge of fully unsupervised topic extraction in noisy, Big Data contexts. We present three approaches we built on the Variational Autoencoder framework: the Embedded Dirichlet Process, the Embedded Hierarchical Dirichlet Process, and the time-aware Dynamic Embedded Dirichlet Process. These nonparametric approaches concerning topics present the particularity of determining word embeddings and topic embeddings. These embeddings do not require transfer learning, but knowledge transfer remains possible. We test these approaches on benchmark and automotive industry-related datasets from a real-world use case. We show that our models achieve equal to better performance than state-of-the-art methods and that the field of topic modeling would benefit from improved evaluation metrics.
摘要
“社交媒体时代对企业带来了新的机会。这个润装丰富的信息是传统渠道和古典市场研究框架之外,包括市场混合模型(MMM)。文本数据特别是这些挑战,数据分析实践者需要面对。社交媒体是巨大、多样化和噪音的文档来源。工业资料取得过程中有一定的ETL。但是,资料噪音的多样性和不同来源导致需要特殊的工具。如果简称,在完全不监督的情况下提取客户见解是一个艰辛的任务。本研究面著挑战的是在噪音大数据中完全不监督的主题抽象。我们提出了基于自适应抽象器框架的三种方法:嵌入 Dirichlet 过程、嵌入层次 Dirichlet 过程和时间意识的动态嵌入 Dirichlet 过程。这些非parametric方法对主题表现出特殊之处,包括决定字幕和主题幕。这些幕并不需要传统学习,但知识传统仍然可能。我们在 benchmark 和汽车业相关的数据集上进行了实验,并证明了我们的模型在比较顶对照方法的情况下实现了等于或更好的性能。这显示了主题抽象领域对于改进评估指标的需求。”
MorphPiece : Moving away from Statistical Language Representation
results: 与标准 BPE 字串识别器相比,该方法可以提高语言模型的融合性能,并在多种 NLP 任务中显示出superior表现。Abstract
Tokenization is a critical part of modern NLP pipelines. However, contemporary tokenizers for Large Language Models are based on statistical analysis of text corpora, without much consideration to the linguistic features. We propose a linguistically motivated tokenization scheme, MorphPiece, which is based partly on morphological segmentation of the underlying text. A GPT-style causal language model trained on this tokenizer (called MorphGPT) shows superior convergence compared to the same architecture trained on a standard BPE tokenizer. Specifically we get Language Modeling performance comparable to a 6 times larger model. Additionally, we evaluate MorphGPT on a variety of NLP tasks in supervised and unsupervised settings and find superior performance across the board, compared to GPT-2 model.
摘要
启用是现代自然语言处理(NLP)管道的关键部分。然而,当前的tokenizer для大语言模型基于文本 corpus 的统计分析,没有考虑语言特点。我们提出一种基于语言学特征的tokenization方案,MorphPiece,它基于文本的 morphological segmentation。一个基于MorphGPT的GPT-style causal语言模型(called MorphGPT)在这种tokenizer上训练显示了更好的吞吐量,比标准BPE tokenizer更高的性能。特别是,我们得到了与6倍大的模型性能相同的语言模型表现。此外,我们在多种NLP任务中评估了MorphGPT的表现,包括supervised和Unsupervised Setting,并发现其在所有任务上表现更好,比GPT-2模型更高。
Improving BERT with Hybrid Pooling Network and Drop Mask
For: This paper proposes a HybridBERT model that combines self-attention and pooling networks to improve the encoding of contextual features in each layer, and also introduces a simple DropMask method to address the mismatch between pre-training and fine-tuning.* Methods: The HybridBERT model uses a combination of self-attention and pooling networks to encode different contextual features in each layer, and the DropMask method is used to address the mismatch between pre-training and fine-tuning.* Results: The HybridBERT model outperforms the vanilla BERT in pre-training with lower loss, faster training speed, and lower memory cost, and also achieves 1.5% higher accuracies on downstream tasks. The DropMask method improves the accuracies of BERT on downstream tasks across various masking rates.Abstract
Transformer-based pre-trained language models, such as BERT, achieve great success in various natural language understanding tasks. Prior research found that BERT captures a rich hierarchy of linguistic information at different layers. However, the vanilla BERT uses the same self-attention mechanism for each layer to model the different contextual features. In this paper, we propose a HybridBERT model which combines self-attention and pooling networks to encode different contextual features in each layer. Additionally, we propose a simple DropMask method to address the mismatch between pre-training and fine-tuning caused by excessive use of special mask tokens during Masked Language Modeling pre-training. Experiments show that HybridBERT outperforms BERT in pre-training with lower loss, faster training speed (8% relative), lower memory cost (13% relative), and also in transfer learning with 1.5% relative higher accuracies on downstream tasks. Additionally, DropMask improves accuracies of BERT on downstream tasks across various masking rates.
摘要
transformer-based 预训练语言模型,如BERT,在不同的自然语言理解任务中获得了很大的成功。先前的研究发现,BERT capture了不同层次的语言信息,但是普通的BERT使用同一种自我注意机制来模型不同的contextual feature。在这篇论文中,我们提出了HybridBERT模型,它将自我注意和卷积网络结合使用,以在每层中编码不同的contextual feature。此外,我们还提出了一种简单的DropMask方法,用于解决预训练和精度训练之间的差异,由于预训练中过度使用特殊的masktoken。实验显示,HybridBERT在预训练中得到较低的损失,更快的训练速度(8%相对),更低的内存成本(13%相对),以及在转移学习中的1.5%相对高于Accuracies。此外,DropMask可以提高BERT在下游任务上的准确率,不 matter what masking rate。
Certified Robustness for Large Language Models with Self-Denoising
results: 对比 existed 证明方法,本方法在证明稳定性和实际稳定性两个方面具有较高的表现,并且可以更好地满足高风险环境中 LLM 的应用需求。I hope that helps!Abstract
Although large language models (LLMs) have achieved great success in vast real-world applications, their vulnerabilities towards noisy inputs have significantly limited their uses, especially in high-stake environments. In these contexts, it is crucial to ensure that every prediction made by large language models is stable, i.e., LLM predictions should be consistent given minor differences in the input. This largely falls into the study of certified robust LLMs, i.e., all predictions of LLM are certified to be correct in a local region around the input. Randomized smoothing has demonstrated great potential in certifying the robustness and prediction stability of LLMs. However, randomized smoothing requires adding noise to the input before model prediction, and its certification performance depends largely on the model's performance on corrupted data. As a result, its direct application to LLMs remains challenging and often results in a small certification radius. To address this issue, we take advantage of the multitasking nature of LLMs and propose to denoise the corrupted inputs with LLMs in a self-denoising manner. Different from previous works like denoised smoothing, which requires training a separate model to robustify LLM, our method enjoys far better efficiency and flexibility. Our experiment results show that our method outperforms the existing certification methods under both certified robustness and empirical robustness. The codes are available at https://github.com/UCSB-NLP-Chang/SelfDenoise.
摘要
大型语言模型(LLM)在各种实际应用中取得了很大的成功,但它们对噪输入的敏感性限制了它们在高资产环境中的应用,特别是在高资产环境中,每个LLM预测都需要是稳定的,即LLM预测应该在输入的小变化下保持一致。这主要归结于证明LLM的稳定性和预测稳定性,即所有LLM预测在输入的当地区域内都是正确的。随机缓冲已经证明了它在证明LLM的稳定性和预测稳定性方面具有极大的潜力。然而,随机缓冲需要在模型预测前添加噪音到输入中,并且其证明性виси于模型在受损数据上的性能。因此,直接应用随机缓冲到LLM中 remainschallenging,并常导致小证明半径。为解决这一问题,我们利用了LLM的多任务性,并提议使用LLM自身来去噪 Input。与之前的denoising smoothing不同,我们的方法不需要单独培训一个模型来Robustify LLM,而且具有较好的效率和灵活性。我们的实验结果表明,我们的方法在证明稳定性和实际稳定性两个方面都高于现有的证明方法。代码可以在https://github.com/UCSB-NLP-Chang/SelfDenoise中找到。
Switching Head-Tail Funnel UNITER for Dual Referring Expression Comprehension with Fetch-and-Carry Tasks
results: 我们的方法在一个新建的数据集上进行验证,并与基线方法进行比较。实验结果显示,我们的方法在语言理解精度方面表现出色,而且在物理实验中,DSR成功完成了指定的物品抓取和放置动作,成功率高于90%。Abstract
This paper describes a domestic service robot (DSR) that fetches everyday objects and carries them to specified destinations according to free-form natural language instructions. Given an instruction such as "Move the bottle on the left side of the plate to the empty chair," the DSR is expected to identify the bottle and the chair from multiple candidates in the environment and carry the target object to the destination. Most of the existing multimodal language understanding methods are impractical in terms of computational complexity because they require inferences for all combinations of target object candidates and destination candidates. We propose Switching Head-Tail Funnel UNITER, which solves the task by predicting the target object and the destination individually using a single model. Our method is validated on a newly-built dataset consisting of object manipulation instructions and semi photo-realistic images captured in a standard Embodied AI simulator. The results show that our method outperforms the baseline method in terms of language comprehension accuracy. Furthermore, we conduct physical experiments in which a DSR delivers standardized everyday objects in a standardized domestic environment as requested by instructions with referring expressions. The experimental results show that the object grasping and placing actions are achieved with success rates of more than 90%.
摘要
translate to Simplified Chinese:这篇论文描述了一种家庭服务机器人(DSR),该机器人可以根据自然语言指令fetch everyday objects并将其运送到指定的目的地点。例如,一个指令可以是“将左边的杯子移动到空椅上”,在这个指令中,DSR需要从多个环境中 identificates the bottle and the chair,并将目标对象运送到目的地点。现有的多modal语言理解方法都是在计算复杂性方面不实用,因为它们需要对所有的目标对象和目的地点进行推理。我们提出了 Switching Head-Tail Funnel UNITER,这种方法可以通过单个模型来预测目标对象和目的地点。我们的方法在一个新建的数据集上进行验证,结果表明我们的方法在语言理解精度方面高于基eline方法。此外,我们还进行了实际实验,在标准的家庭环境中,DSR通过指令With referring expressions来实现物品抓取和放置动作,实际结果表明动作成功率高于90%。
Learning to Retrieve In-Context Examples for Large Language Models
results: 在30个任务上显著提高上下文学习性能,并在训练中对未看过任务的探索性能具有普遍性In simpler Chinese text:
for: 提高LLM在上下文中学习的效iveness
methods: 使用奖励模型和知识填充 retrainer 训练稠密检索器
results: 在30个任务上显著提高上下文学习性能,并在训练中对未看过任务的探索性能具有普遍性Abstract
Large language models (LLMs) have demonstrated their ability to learn in-context, allowing them to perform various tasks based on a few input-output examples. However, the effectiveness of in-context learning is heavily reliant on the quality of the selected examples. In this paper, we propose a novel framework to iteratively train dense retrievers that can identify high-quality in-context examples for LLMs. Our framework initially trains a reward model based on LLM feedback to evaluate the quality of candidate examples, followed by knowledge distillation to train a bi-encoder based dense retriever. Our experiments on a suite of 30 tasks demonstrate that our framework significantly enhances in-context learning performance. Furthermore, we show the generalization ability of our framework to unseen tasks during training. An in-depth analysis reveals that our model improves performance by retrieving examples with similar patterns, and the gains are consistent across LLMs of varying sizes.
摘要
大型语言模型(LLM)已经表现出可以在上下文中学习,以完成基于输入输出示例的多种任务。然而,Context learning的效果受输入示例质量的影响很大。在这篇论文中,我们提出了一种新的框架,可以逐步训练稠密检索器,以便在LLM上进行高质量的上下文学习示例选择。我们的框架首先在LLM反馈下将奖励模型训练,然后通过知识储存来训练二元编码器基于稠密检索器。我们在30个任务的灵活suite上进行了实验,结果表明,我们的框架可以明显提高上下文学习性能。此外,我们还证明了我们的框架在训练过程中对未看过任务的泛化能力具有很强的能力。深入分析表明,我们的模型可以更好地选择与示例具有相似模式的示例,并且这些提升都是LLM的大小无关的。
Drive Like a Human: Rethinking Autonomous Driving with Large Language Models
results: 实验表明,LLM 在复杂情况下能够具有卓越的解释和环境互动能力,为人类化自动驾驶的开发提供了有价值的意见。Abstract
In this paper, we explore the potential of using a large language model (LLM) to understand the driving environment in a human-like manner and analyze its ability to reason, interpret, and memorize when facing complex scenarios. We argue that traditional optimization-based and modular autonomous driving (AD) systems face inherent performance limitations when dealing with long-tail corner cases. To address this problem, we propose that an ideal AD system should drive like a human, accumulating experience through continuous driving and using common sense to solve problems. To achieve this goal, we identify three key abilities necessary for an AD system: reasoning, interpretation, and memorization. We demonstrate the feasibility of employing an LLM in driving scenarios by building a closed-loop system to showcase its comprehension and environment-interaction abilities. Our extensive experiments show that the LLM exhibits the impressive ability to reason and solve long-tailed cases, providing valuable insights for the development of human-like autonomous driving. The related code are available at https://github.com/PJLab-ADG/DriveLikeAHuman .
摘要
在这篇论文中,我们探讨了使用大型自然语言模型(LLM)来理解人类驾驶环境,并分析其能否如人类般进行理性、解释和记忆。我们认为传统的优化型和模块化自动驾驶(AD)系统在面对复杂情况时存在内在的性能限制。为解决这个问题,我们提议一个理想的 AD 系统应该驾驶如人类一样,通过不断驾驶积累经验,使用通用意识解决问题。为达到这个目标,我们确定了三种关键的能力是必要的:理性、解释和记忆。我们通过建立一个封闭系统来示cases,证明LLM在驾驶场景中的理解和环境互动能力。我们的广泛实验表明,LLM能够具有卓越的理性和解决长尾情况的能力,为人类化自动驾驶的开发提供了有价值的思路。相关代码可以在https://github.com/PJLab-ADG/DriveLikeAHuman 中找到。
Do not Mask Randomly: Effective Domain-adaptive Pre-training by Masking In-domain Keywords
results: 根据六个不同的设置,包括三个数据集和两种不同的预训练语言模型(PLMs),我们的结果表明,使用我们的预训练策略来改进PLMs,than使用随机掩码预训练和常见的预训练然后调整的方法。此外,标识目标领域中的关键词的负担不高,例如,对BERT大(Devlin et al., 2019)的预训练需要7-15%的时间(对两个epoch)。Abstract
We propose a novel task-agnostic in-domain pre-training method that sits between generic pre-training and fine-tuning. Our approach selectively masks in-domain keywords, i.e., words that provide a compact representation of the target domain. We identify such keywords using KeyBERT (Grootendorst, 2020). We evaluate our approach using six different settings: three datasets combined with two distinct pre-trained language models (PLMs). Our results reveal that the fine-tuned PLMs adapted using our in-domain pre-training strategy outperform PLMs that used in-domain pre-training with random masking as well as those that followed the common pre-train-then-fine-tune paradigm. Further, the overhead of identifying in-domain keywords is reasonable, e.g., 7-15% of the pre-training time (for two epochs) for BERT Large (Devlin et al., 2019).
摘要
我们提出了一种新的任务无关预训练方法,它位于普通预训练和精度调整之间。我们的方法选择性地遮盖域内关键词(Grootendorst, 2020),即目标领域中提供紧凑的表示的词语。我们使用 KeyBERT 进行识别。我们使用六个不同的设置进行评估:三个数据集和两种不同的预训练语言模型(PLM)。我们的结果表明,使用我们的域内预训练策略进行精度调整的 PLM 表现比使用随机遮盖和常见预训练然后调整的 PLM 更好。此外,寻找域内关键词的开销是合理的,例如,BERT 大型(Devlin et al., 2019)在两个epoch的预训练时间中,寻找域内关键词需要7-15%的时间开销。
MMSD2.0: Towards a Reliable Multi-modal Sarcasm Detection System
paper_authors: Libo Qin, Shijue Huang, Qiguang Chen, Chenran Cai, Yudi Zhang, Bin Liang, Wanxiang Che, Ruifeng Xu
for: 提高多模态嘲笑检测系统的可靠性
methods: 利用多个视角(文本、图像和文本-图像互动视角)的多重准确信息
results: 在广泛的实验中,MMSD2.0是一个有价值的 Referenceless Benchmark,而多视图CLIP可以明显超越前一个基线值Abstract
Multi-modal sarcasm detection has attracted much recent attention. Nevertheless, the existing benchmark (MMSD) has some shortcomings that hinder the development of reliable multi-modal sarcasm detection system: (1) There are some spurious cues in MMSD, leading to the model bias learning; (2) The negative samples in MMSD are not always reasonable. To solve the aforementioned issues, we introduce MMSD2.0, a correction dataset that fixes the shortcomings of MMSD, by removing the spurious cues and re-annotating the unreasonable samples. Meanwhile, we present a novel framework called multi-view CLIP that is capable of leveraging multi-grained cues from multiple perspectives (i.e., text, image, and text-image interaction view) for multi-modal sarcasm detection. Extensive experiments show that MMSD2.0 is a valuable benchmark for building reliable multi-modal sarcasm detection systems and multi-view CLIP can significantly outperform the previous best baselines.
摘要
多Modal sarcastic detection recently attracted much attention. However, the existing benchmark (MMSD) has some shortcomings that hinder the development of reliable multi-modal sarcasm detection systems:1. MMSD contains some spurious cues, leading to model bias learning;2. The negative samples in MMSD are not always reasonable.To solve these issues, we introduce MMSD2.0, a corrected dataset that removes spurious cues and re-annotates unreasonable samples. Additionally, we present a novel framework called multi-view CLIP that can leverage multi-grained cues from multiple perspectives (i.e., text, image, and text-image interaction view) for multi-modal sarcasm detection.Extensive experiments show that MMSD2.0 is a valuable benchmark for building reliable multi-modal sarcasm detection systems, and multi-view CLIP can significantly outperform previous best baselines.
Generating Efficient Training Data via LLM-based Attribute Manipulation
results: 对文本分类和其他任务进行了广泛的试验,证明CoTAM比其他基于LLM的文本生成方法具有更大的优势,并且可以在几个shot学习中实现更好的性能。Abstract
In this paper, we propose a novel method, Chain-of-Thoughts Attribute Manipulation (CoTAM), to guide few-shot learning by carefully crafted data from Large Language Models (LLMs). The main idea is to create data with changes only in the attribute targeted by the task. Inspired by facial attribute manipulation, our approach generates label-switched data by leveraging LLMs to manipulate task-specific attributes and reconstruct new sentences in a controlled manner. Instead of conventional latent representation controlling, we implement chain-of-thoughts decomposition and reconstruction to adapt the procedure to LLMs. Extensive results on text classification and other tasks verify the advantage of CoTAM over other LLM-based text generation methods with the same number of training examples. Analysis visualizes the attribute manipulation effectiveness of CoTAM and presents the potential of LLM-guided learning with even less supervision.
摘要
在这篇论文中,我们提出了一种新的方法,链条思维属性 manipulate(CoTAM),用于导引几张学习。我们的主要想法是创建具有目标特征的数据,并通过大自然语言模型(LLM)来修改这些特征。启发自人脸特征 manipulate,我们的方法生成了标签交换数据,并使用 LLM 来控制任务特定的特征并重建新的句子。相比于传统的隐藏表示控制,我们实施了链条思维分解和重建来适应 LLM。我们的实验结果表明,CoTAM 在文本分类和其他任务上比其他基于 LLM 的文本生成方法具有更多的优势,并且对于几张学习,CoTAM 可以减少数据量。分析还可以视觉化 attribute 修改效果,并探讨 LLM 带领学习的可能性。
An Analysis of Dialogue Repair in Virtual Voice Assistants
results: 研究发现,虚拟助手和人类对话修复策略存在差异,而且虚拟助手和语言 studied也存在差异。Abstract
Language speakers often use what are known as repair initiators to mend fundamental disconnects that occur between them during verbal communication. Previous research in this field has mainly focused on the human-to-human use of repair initiator. We proposed an examination of dialogue repair structure wherein the dialogue initiator is human and the party that initiates or responds to the repair is a virtual assistant. This study examined the use of repair initiators in both English and Spanish with two popular assistants, Google Assistant and Apple's Siri. Our aim was to codify the differences, if any, in responses by voice assistants to dialogues in need of repair as compared to human-human dialogues also in need of repair. Ultimately the data demonstrated that not only were there differences between human-assistant and human-human dialogue repair strategies, but that there were likewise differences among the assistants and the languages studied.
摘要
语言使用者常使用优化开始符来修复在对话中出现的基本分支。先前的研究主要集中在人类到人类对话中的修复开始符。我们提议了对对话结构的对话修复,其中对话开始人是人类,并且有虚拟助手发起或回应修复。我们对英语和西班牙语使用Google助手和苹果的SIRI两种流行助手进行研究,目的是将对话需要修复的响应与人类对话需要修复的响应进行比较。最终数据显示,不仅有人类与助手对话修复策略的差异,还有助手之间和语言研究的差异。
Leveraging Pretrained ASR Encoders for Effective and Efficient End-to-End Speech Intent Classification and Slot Filling
results: 该论文通过对端到端模型和混合模型(ASR+NLU)进行比较,显示E2E模型比混合模型更高效,除非提供oracle ASR模型。此外,作者还提供了一个完整的实现,包括代码、检查点和配置。Abstract
We study speech intent classification and slot filling (SICSF) by proposing to use an encoder pretrained on speech recognition (ASR) to initialize an end-to-end (E2E) Conformer-Transformer model, which achieves the new state-of-the-art results on the SLURP dataset, with 90.14% intent accuracy and 82.27% SLURP-F1. We compare our model with encoders pretrained on self-supervised learning (SSL), and show that ASR pretraining is much more effective than SSL for SICSF. To explore parameter efficiency, we freeze the encoder and add Adapter modules, and show that parameter efficiency is only achievable with an ASR-pretrained encoder, while the SSL encoder needs full finetuning to achieve comparable results. In addition, we provide an in-depth comparison on end-to-end models versus cascading models (ASR+NLU), and show that E2E models are better than cascaded models unless an oracle ASR model is provided. Last but not least, our model is the first E2E model that achieves the same performance as cascading models with oracle ASR. Code, checkpoints and configs are available.
摘要
我们研究了Speech Intent Classification和slot filling(SICSF),我们提议使用已经预训练的SpeechRecognition(ASR)encoder来初始化一个End-to-end(E2E)Conformer-Transformer模型,达到了SLURP数据集上新的状态码� instantiateResults,即90.14%的意图精度和82.27%的SLURP-F1。我们与self-supervised learning(SSL)预训练器进行比较,并显示ASR预训练是SLSF的多少更有效。为了探索参数效率,我们冻结encoder并添加Adapter模块,并显示仅ASR预训练的encoder可以实现参数效率,而SSL预训练的encoder需要全部训练来 достичь相似的结果。此外,我们还进行了End-to-end模型vs搭配模型(ASR+NLU)的深入比较,并显示E2E模型比搭配模型更好,除非提供了oracle ASR模型。最后,我们的模型是第一个实现了与搭配模型相同的性能的E2E模型。我们提供了代码、Checkpoints和配置。
Making the Most Out of the Limited Context Length: Predictive Power Varies with Clinical Note Type and Note Section
results: 该论文使用MIMIC-III数据集进行实验,发现:1)临床笔记和释放笔记的预测力分布不同,2)将不同类型的临床笔记合并使用可以在大context length下提高性能。这些发现 suggessts that a carefully selected sampling function could enable more efficient information extraction from clinical notes.Abstract
Recent advances in large language models have led to renewed interest in natural language processing in healthcare using the free text of clinical notes. One distinguishing characteristic of clinical notes is their long time span over multiple long documents. The unique structure of clinical notes creates a new design choice: when the context length for a language model predictor is limited, which part of clinical notes should we choose as the input? Existing studies either choose the inputs with domain knowledge or simply truncate them. We propose a framework to analyze the sections with high predictive power. Using MIMIC-III, we show that: 1) predictive power distribution is different between nursing notes and discharge notes and 2) combining different types of notes could improve performance when the context length is large. Our findings suggest that a carefully selected sampling function could enable more efficient information extraction from clinical notes.
摘要
近期大语言模型的进步导致了医疗健康领域自然语言处理(NLP)中文仪表的再次吸引。临床笔记的一个特点是其长时间覆盖多个长文档,这创造了一个新的设计选择:当语言模型预测器的上下文长度有限制时,我们应该选择哪部分作为输入?现有研究 either 选择输入的域知识或 simply truncate them。我们提出了一个框架来分析sections with high predictive power。使用MIMIC-III,我们发现:1)预测力分布不同于护理笔记和出院笔记,2)将不同类型的笔记结合起来可以在大上下文长度下提高性能。我们的发现表示,一个仔细选择的采样函数可以更有效地提取信息从临床笔记中。
MegaWika: Millions of reports and their sources across 50 diverse languages
paper_authors: Samuel Barham, Orion Weller, Michelle Yuan, Kenton Murray, Mahsa Yarmohammadi, Zhengping Jiang, Siddharth Vashishtha, Alexander Martin, Anqi Liu, Aaron Steven White, Jordan Boyd-Graber, Benjamin Van Durme
results: 本研究提供了许多基线结果和训练模型,包括跨语言问答和引用检索。MegaWika是 sentence-level 报告生成的最大资源,同时也是唯一的多语言报告生成资源。Abstract
To foster the development of new models for collaborative AI-assisted report generation, we introduce MegaWika, consisting of 13 million Wikipedia articles in 50 diverse languages, along with their 71 million referenced source materials. We process this dataset for a myriad of applications, going beyond the initial Wikipedia citation extraction and web scraping of content, including translating non-English articles for cross-lingual applications and providing FrameNet parses for automated semantic analysis. MegaWika is the largest resource for sentence-level report generation and the only report generation dataset that is multilingual. We manually analyze the quality of this resource through a semantically stratified sample. Finally, we provide baseline results and trained models for crucial steps in automated report generation: cross-lingual question answering and citation retrieval.
摘要
为推动新的协作AI助成报告生成模型的发展,我们介绍了MegaWika,包括50种语言的1300万篇Wikipedia文章以及其7100万个参考文献。我们对这个数据集进行了多种应用程序,超出了初始的Wikipedia引用EXTRACTION和网络抓取内容的限制,包括将非英语文章翻译为跨语言应用程序和提供FrameNet分析的自动化semantic analysis。MegaWika是最大的句子级报告生成资源,也是唯一的多语言报告生成资源。我们手动分析了这个资源的质量,通过semanticstratified sample进行了manual analysis。最后,我们提供了基线结果和已经训练的模型,用于自动化报告生成的关键步骤:跨语言问答和引用检索。
DIALGEN: Collaborative Human-LM Generated Dialogues for Improved Understanding of Human-Human Conversations
results: 在结构化摘要化Agent-客户信息收集电话对话中,DIALGEN数据实现了显著提高模型性能的效果。Abstract
Applications that could benefit from automatic understanding of human-human conversations often come with challenges associated with private information in real-world data such as call center or clinical conversations. Working with protected data also increases costs of annotation, which limits technology development. To address these challenges, we propose DIALGEN, a human-in-the-loop semi-automated dialogue generation framework. DIALGEN uses a language model (ChatGPT) that can follow schema and style specifications to produce fluent conversational text, generating a complex conversation through iteratively generating subdialogues and using human feedback to correct inconsistencies or redirect the flow. In experiments on structured summarization of agent-client information gathering calls, framed as dialogue state tracking, we show that DIALGEN data enables significant improvement in model performance.
摘要
应用程序,具有自动理解人类对话的潜在优势,通常面临实际数据中private信息的挑战,如客户服务或医疗对话。与保护数据一起工作也增加了注释成本,限制技术发展。为解决这些挑战,我们提议DIALGEN,一种人工干预半自动对话生成框架。DIALGEN使用一种语言模型(ChatGPT),可以按照schema和样式规范生成流畅对话文本,通过逐步生成子对话和人类反馈来修正不一致或重定向流程。在对客户服务信息收集电话对话的结构化摘要实验中,我们示出DIALGEN数据可以带来显著提高模型性能。
Data Augmentation for Machine Translation via Dependency Subtree Swapping
paper_authors: Attila Nagy, Dorina Petra Lakatos, Botond Barta, Patrick Nanys, Judit Ács
for: 提高机器翻译模型的性能
methods: 基于依赖树的子树交换数据增强
results: 在4种语言对irection中,与基eline模型比较,有consistent的BLEU分数提高Abstract
We present a generic framework for data augmentation via dependency subtree swapping that is applicable to machine translation. We extract corresponding subtrees from the dependency parse trees of the source and target sentences and swap these across bisentences to create augmented samples. We perform thorough filtering based on graphbased similarities of the dependency trees and additional heuristics to ensure that extracted subtrees correspond to the same meaning. We conduct resource-constrained experiments on 4 language pairs in both directions using the IWSLT text translation datasets and the Hunglish2 corpus. The results demonstrate consistent improvements in BLEU score over our baseline models in 3 out of 4 language pairs. Our code is available on GitHub.
摘要
我们提出了一个通用的数据扩充框架,可以应用于机器翻译。我们从源和目标句子的依赖树中提取相应的子树,并将这些子树在句子之间交换以创建扩充样本。我们根据依赖树的图基 similarity 和其他规则进行了严格的筛选,以确保提取的子树表示同一个意义。我们在4种语言对的两个方向使用 IWSLT 文本翻译数据集和 Hunglish2 corpus 进行了资源受限的实验。结果表明在3种语言对中,我们的基eline模型在 BLEU 分数上具有了一致的提高。我们的代码可以在 GitHub 上找到。
Electoral Agitation Data Set: The Use Case of the Polish Election
paper_authors: Mateusz Baran, Mateusz Wójcik, Piotr Kolebski, Michał Bernaczyk, Krzysztof Rajda, Łukasz Augustyniak, Tomasz Kajdanowicz
for: This paper aims to address the problem of detecting electoral agitation in social media, specifically in the Polish language.
methods: The authors use a combination of human annotation and machine learning to create a data set of labeled tweets for training a Polish language model to detect electoral agitation.
results: The authors achieve a 0.66 inter-annotator agreement and a 68% F1 score for the fine-tuned language model on the newly created data set. They also present a number of potential use cases for such data sets and models, and analyze the Polish 2020 Presidential Election on Twitter.Here’s the simplified Chinese text:
results: 作者达到了0.66间标注者一致度(Cohen的卡巴 score)和68% F1分数,表明训练后的Polish语言模型在新创建的数据集上具有良好的检测能力。Abstract
The popularity of social media makes politicians use it for political advertisement. Therefore, social media is full of electoral agitation (electioneering), especially during the election campaigns. The election administration cannot track the spread and quantity of messages that count as agitation under the election code. It addresses a crucial problem, while also uncovering a niche that has not been effectively targeted so far. Hence, we present the first publicly open data set for detecting electoral agitation in the Polish language. It contains 6,112 human-annotated tweets tagged with four legally conditioned categories. We achieved a 0.66 inter-annotator agreement (Cohen's kappa score). An additional annotator resolved the mismatches between the first two improving the consistency and complexity of the annotation process. The newly created data set was used to fine-tune a Polish Language Model called HerBERT (achieving a 68% F1 score). We also present a number of potential use cases for such data sets and models, enriching the paper with an analysis of the Polish 2020 Presidential Election on Twitter.
摘要
社交媒体的流行使得政治人物利用它进行政治广告。因此,社交媒体上有很多选举宣传(选举宣传),特别是在选举运动期间。选举管理部门无法跟踪播散和量度选举宣传下的信息,这解决了一个重要的问题,同时还揭示了一个未曾有效地宣传的 nich(niche)。因此,我们提供了第一个公开的波兰语选举宣传数据集。该数据集包含6,112个人标注的推特,分为四个法律条件的类别。我们达到了0.66的间对标注者一致度(Cohen的卡方分数)。其中一个额外的标注者解决了第一两个标注的差异,从而提高了标注过程的一致性和复杂性。新创建的数据集被用来练化一个波兰语模型called HerBERT(实现了68%的F1分数)。我们还提供了一些可能的数据集和模型的应用场景,并对波兰2020年总统选举的推特进行分析。
mBLIP: Efficient Bootstrapping of Multilingual Vision-LLMs
results: 在 IGLUE benchmark 上,mBLIP 的表现与当前最佳模型匹配,而且在 XM3600 的图像描述任务中,mBLIP (零shot)even 超过了 PaLI-X(一个55B参数的模型)。相比之下,从头开始训练大型多语言视力语言模型需要训练几个数量级更多的参数和更多的数据。Abstract
Modular vision-language models (Vision-LLMs) align pretrained image encoders with (pretrained) large language models (LLMs), representing a computationally much more efficient alternative to end-to-end training of large vision-language models from scratch, which is prohibitively expensive for most. Vision-LLMs instead post-hoc condition LLMs to `understand' the output of an image encoder. With the abundance of readily available high-quality English image-text data as well as monolingual English LLMs, the research focus has been on English-only Vision-LLMs. Multilingual vision-language models are still predominantly obtained via expensive end-to-end pretraining, resulting in comparatively smaller models, trained on limited multilingual image data supplemented with text-only multilingual corpora. In this work, we present mBLIP, the first multilingual Vision-LLM, which we obtain in a computationally efficient manner -- on consumer hardware using only a few million training examples -- by leveraging a pretrained multilingual LLM. To this end, we \textit{re-align} an image encoder previously tuned to an English LLM to a new, multilingual LLM -- for this, we leverage multilingual data from a mix of vision-and-language tasks, which we obtain by machine-translating high-quality English data to 95 languages. On the IGLUE benchmark, mBLIP yields results competitive with state-of-the-art models. Moreover, in image captioning on XM3600, mBLIP (zero-shot) even outperforms PaLI-X (a model with 55B parameters). Compared to these very large multilingual vision-language models trained from scratch, we obtain mBLIP by training orders of magnitude fewer parameters on magnitudes less data. We release our model and code at \url{https://github.com/gregor-ge/mBLIP}.
摘要
模块化视力语言模型(Vision-LLMs)将预训练的图像编码器与(预训练)大型语言模型(LLMs)进行对接,代表了计算效率更高的代替方案,而不是从头开始训练大型视力语言模型,这是不可持续的。Vision-LLMs 将 LLMs 后置 condition 以便“理解”图像编码器的输出。由于有充足的高质量英文图文数据以及英文 LLMS,研究的焦点是英文只 Vision-LLMs。多语言视力语言模型仍然主要通过昂贵的端到端训练获得, resulting in 相对较小的模型,通过有限的多语言图像数据和文本Only multilingual corpora 进行训练。在这种工作中,我们发布了 mBLIP,首个多语言视力语言模型,通过在消费级硬件上使用只有几百万个训练例子进行计算效率的方式获得。为此,我们利用了预训练的多语言 LLMS。为此,我们重新对预训练的图像编码器进行了对接,使其与新的多语言 LLMS进行对接。我们利用了多语言视和语言任务中的数据,通过机器翻译高质量的英文数据到95种语言。在 IGLUE benchmark 上,mBLIP 的结果与状态 Ellume 模型竞争。此外,在 XM3600 上的图像描述任务中,mBLIP(零容量) Even outperform PaLI-X(一个550亿参数的模型)。相比于这些很大的多语言视力语言模型,我们通过训练数量在数量级下的参数和数据进行训练获得 mBLIP。我们将我们的模型和代码发布在 GitHub 上,请参考 \url{https://github.com/gregor-ge/mBLIP}。
Towards Populating Generalizable Engineering Design Knowledge
methods: train taggers to identify entities and relationships from sentences, and compare performance against typically recommended approaches
results: build a domain knowledge base and search for solutions relevant to key issues in fan systems, with comparative discussion against ChatGPT’s opinionsHere’s the full text in Simplified Chinese:
for: 为了填充可重用的工程设计知识,我们提议一种方法从 Sentences 中提取 head entity :: relationship :: tail entity 的事实。这些事实可以在各个和 across patent documents 中组合,形成知识图表,用于表示和存储设计知识。现有的工程设计文献中的方法通常使用一组预先定义的关系来填充 triple,而不是事实。
methods: 我们 trains 两个标注器:一个用于从 sentence 中标注 head entity 和 tail entity,另一个用于从 head entity 和 tail entity 对的对话中标注关系 tokens。为了训练这两个标注器,我们手动构建了44,227个句子和相应的事实数据集。我们还与一般推荐的方法进行比较,其中包括对 tokens 的独立对应和图形对应。
results: 我们应用方法到有关扇子系统的专利文献中的句子上,并构建了Domain知识库。然后,我们对知识库进行概述,并在一些关键问题上进行搜索,并组织结果为知识图表,进行与 ChatGPT 的比较讨论。Abstract
Aiming to populate generalizable engineering design knowledge, we propose a method to extract facts of the form head entity :: relationship :: tail entity from sentences found in patent documents. These facts could be combined within and across patent documents to form knowledge graphs that serve as schemes for representing as well as storing design knowledge. Existing methods in engineering design literature often utilise a set of predefined relationships to populate triples that are statistical approximations rather than facts. In our method, we train a tagger to identify both entities and relationships from a sentence. Given a pair of entities thus identified, we train another tagger to identify the relationship tokens that specifically denote the relationship between the pair. For training these taggers, we manually construct a dataset of 44,227 sentences and corresponding facts. We also compare the performance of the method against typically recommended approaches, wherein, we predict the edges among tokens by pairing the tokens independently and as part of a graph. We apply our method to sentences found in patents related to fan systems and build a domain knowledge base. Upon providing an overview of the knowledge base, we search for solutions relevant to some key issues prevailing in fan systems. We organize the responses into knowledge graphs and hold a comparative discussion against the opinions from ChatGPT.
摘要
目标是填充通用工程设计知识,我们提出了一种方法,从专利文档中提取形式为head entity :: relationship :: tail entity的事实。这些事实可以在专利文档之间组合,组成知识图图表示和存储设计知识。现有的工程设计文献中的方法 oftentimes使用 predetermined 关系来填充 triple,这些 triple 是统计方程而不是事实。在我们的方法中,我们训练一个标签器,可以从句子中标识 both entities 和关系。给定一对 entities,我们又训练另一个标签器,可以特定这两个 entities 之间的关系Token。为了训练这些标签器,我们手动构建了一 dataset of 44,227 句子和相应的事实。我们还比较了这种方法与通常推荐的方法的性能,其中,我们在 pairing tokens 独立和 graph 中对 tokens 进行预测。我们应用我们的方法到专利中关于扇子系统的句子,并建立了领域知识基础。对于一些关键问题在扇子系统中,我们提供了解决方案,并将其组织成知识图表示。我们进行了比较分析,并与 ChatGPT 的观点进行对比。
Adapting an ASR Foundation Model for Spoken Language Assessment
paper_authors: Rao Ma, Mengjie Qian, Mark J. F. Gales, Kate M. Knill for: 这个论文的目的是提出一种方法来更正Whisper输出中的问题,以提高学习者的评估和反馈。methods: 这个论文使用了精度的词汇级抽象和软提示调整的方法来修正Whisper输出。results: 实验结果表明,通过精度的词汇级抽象和软提示调整,可以有效地更正Whisper输出中的问题,并生成学习者实际上说的话。Abstract
A crucial part of an accurate and reliable spoken language assessment system is the underlying ASR model. Recently, large-scale pre-trained ASR foundation models such as Whisper have been made available. As the output of these models is designed to be human readable, punctuation is added, numbers are presented in Arabic numeric form and abbreviations are included. Additionally, these models have a tendency to skip disfluencies and hesitations in the output. Though useful for readability, these attributes are not helpful for assessing the ability of a candidate and providing feedback. Here a precise transcription of what a candidate said is needed. In this paper, we give a detailed analysis of Whisper outputs and propose two solutions: fine-tuning and soft prompt tuning. Experiments are conducted on both public speech corpora and an English learner dataset. Results show that we can effectively alter the decoding behaviour of Whisper to generate the exact words spoken in the response.
摘要
一个重要的准确和可靠的口语评估系统的基础模型是ASR模型。最近,大规模预训练ASR基础模型如嘟哒(Whisper)已经提供。这些模型的输出设计为人类可读性,包括括号、阿拉伯数字形式的数字和缩写。然而,这些特征不实用于评估候选人的能力和提供反馈。在这篇论文中,我们对嘟哒输出进行了详细分析,并提出了两种解决方案:细化和软提示调整。我们在公共演讲 Corpora 和英语学习数据集上进行了实验,结果表明我们可以有效地改变嘟哒的解码行为,以生成响应中实际说的字句。
methods: 我们首先构建了一个直观的基线,可以轻松地 incorporated into existing AL frameworks。然而,这个基eline仍然受到了过拟合的影响,并且在查询过程中缺乏表达部分标签的样本。 drawing inspiration from人类的推理在认知科学中,我们想要利用人类学习模式来解决过拟合问题,同时提高选择表达部分标签的样本。我们构建了一个简单而有效的worstNet,可以直接学习从这种补做模式。
results: 我们在五个实际 datasets和四个benchmark datasets上进行了实验,并证明了我们的提议方法在十个代表性的AL框架中具有了广泛的改进。这说明了worstNet的优越性。代码将在 \url{https://github.com/Ferenas/APLL} 上提供。Abstract
This paper studies a new problem, \emph{active learning with partial labels} (ALPL). In this setting, an oracle annotates the query samples with partial labels, relaxing the oracle from the demanding accurate labeling process. To address ALPL, we first build an intuitive baseline that can be seamlessly incorporated into existing AL frameworks. Though effective, this baseline is still susceptible to the \emph{overfitting}, and falls short of the representative partial-label-based samples during the query process. Drawing inspiration from human inference in cognitive science, where accurate inferences can be explicitly derived from \emph{counter-examples} (CEs), our objective is to leverage this human-like learning pattern to tackle the \emph{overfitting} while enhancing the process of selecting representative samples in ALPL. Specifically, we construct CEs by reversing the partial labels for each instance, and then we propose a simple but effective WorseNet to directly learn from this complementary pattern. By leveraging the distribution gap between WorseNet and the predictor, this adversarial evaluation manner could enhance both the performance of the predictor itself and the sample selection process, allowing the predictor to capture more accurate patterns in the data. Experimental results on five real-world datasets and four benchmark datasets show that our proposed method achieves comprehensive improvements over ten representative AL frameworks, highlighting the superiority of WorseNet. The source code will be available at \url{https://github.com/Ferenas/APLL}.
摘要
Inspired by human inference in cognitive science, where people can make accurate inferences from counter-examples (CEs), we aim to leverage this human-like learning pattern to tackle overfitting and improve the process of selecting representative samples in APLP. We construct CEs by reversing the partial labels for each instance, and then we propose a simple but effective WorseNet to directly learn from this complementary pattern. By leveraging the distribution gap between WorseNet and the predictor, this adversarial evaluation manner can enhance both the performance of the predictor and the sample selection process, allowing the predictor to capture more accurate patterns in the data.Experimental results on five real-world datasets and four benchmark datasets show that our proposed method achieves comprehensive improvements over ten representative AL frameworks. The source code will be available at \url{https://github.com/Ferenas/APLL}.Translation notes:* "active learning with partial labels" (ALPL) is translated as "活动学习半标签" (ALPL) in Simplified Chinese.* "oracle" is translated as "oracle" in Simplified Chinese.* "partial labels" is translated as "半标签" in Simplified Chinese.* "counter-examples" (CEs) is translated as "反例" (CEs) in Simplified Chinese.* "WorseNet" is translated as "差网" (WorseNet) in Simplified Chinese.
results: 研究发现,常见的 monotonic 课程在某些情况下可能不会达到最佳性能,而 non-monotonic 课程往往会表现出优异性能。此外,在小型数据集和模型上采用的课程也可以在大型数据集和模型上表现出优异性能。Abstract
We introduce the problem of curriculum discovery and describe a curriculum learning framework capable of discovering effective curricula in a curriculum space based on prior knowledge about sample difficulty. Using annotation entropy and loss as measures of difficulty, we show that (i): the top-performing discovered curricula for a given model and dataset are often non-monotonic as opposed to monotonic curricula in existing literature, (ii): the prevailing easy-to-hard or hard-to-easy transition curricula are often at the risk of underperforming, and (iii): the curricula discovered for smaller datasets and models perform well on larger datasets and models respectively. The proposed framework encompasses some of the existing curriculum learning approaches and can discover curricula that outperform them across several NLP tasks.
摘要
我团队介绍了课程发现问题,并描述了一种基于优化课程空间的学习框架,可以在尝试知识的基础上发现有效的课程。使用笔记 entropy 和损失作为困难度的度量,我们显示了以下结论:(i):给定模型和数据集的情况下,常见的非 monotonic 课程在课程空间中表现出色,而不是 monotonic 课程。(ii):常见的易到困难或困难到易转课程在批处理大数据集上存在风险,容易表现不佳。(iii):对小型模型和数据集,我们发现的课程可以在大型模型和数据集上表现出色。我们的框架包含一些现有的课程学习方法,并可以在多个自然语言处理任务中发现表现更好的课程。
Improved Convergence Analysis and SNR Control Strategies for Federated Learning in the Presence of Noise
results: 该论文的分析结果显示,在FL中,下行噪声的影响更加严重,而上行噪声的影响较弱。基于这个发现, authors提出了一种新的信号响应控制策略,可以在不受到噪声影响的前提下,提高FL的收敛速率。Abstract
We propose an improved convergence analysis technique that characterizes the distributed learning paradigm of federated learning (FL) with imperfect/noisy uplink and downlink communications. Such imperfect communication scenarios arise in the practical deployment of FL in emerging communication systems and protocols. The analysis developed in this paper demonstrates, for the first time, that there is an asymmetry in the detrimental effects of uplink and downlink communications in FL. In particular, the adverse effect of the downlink noise is more severe on the convergence of FL algorithms. Using this insight, we propose improved Signal-to-Noise (SNR) control strategies that, discarding the negligible higher-order terms, lead to a similar convergence rate for FL as in the case of a perfect, noise-free communication channel while incurring significantly less power resources compared to existing solutions. In particular, we establish that to maintain the $O(\frac{1}{\sqrt{K})$ rate of convergence like in the case of noise-free FL, we need to scale down the uplink and downlink noise by $\Omega({\sqrt{k})$ and $\Omega({k})$ respectively, where $k$ denotes the communication round, $k=1,\dots, K$. Our theoretical result is further characterized by two major benefits: firstly, it does not assume the somewhat unrealistic assumption of bounded client dissimilarity, and secondly, it only requires smooth non-convex loss functions, a function class better suited for modern machine learning and deep learning models. We also perform extensive empirical analysis to verify the validity of our theoretical findings.
摘要
我们提出了一种改进的收敛分析技术,用于描述 federated learning(FL)中的分布式学习 paradigm,在带有不完美/噪声的上行和下行通信的情况下。这些噪声通信场景在实际部署 FL 时经常出现。我们的分析表明,在 FL 中,下行噪声对收敛的影响更加严重。基于这一点,我们提出了改进的信噪比控制策略,使得在噪声通信道下,FL 的收敛速率与无噪声通信道相似,但却占用了较少的功率资源。具体来说,我们证明,要保持 $O(\frac{1}{\sqrt{K})$ 的收敛速率,就需要在下行和上行噪声中减小 $\Omega(\sqrt{k})$ 和 $\Omega(k)$,其中 $k$ 是通信轮次,$k=1,\dots, K$。我们的理论结论具有两个主要优点:首先,它不假设客户端之间的差异很大,而第二,它只需要非拥抱不对称损失函数,这是现代机器学习和深度学习模型更适合的函数类型。此外,我们还进行了广泛的实验分析,以验证我们的理论发现的正确性。
Performance of $\ell_1$ Regularization for Sparse Convex Optimization
results: 论文显示了Group LASSO在强转移函数$l$下最小化时,可以 recuperate sparse vector支持vector-valued feature的最大$\ell_2$范数。这个结果回答了 Tibshirani等人和Yasuda等人的开问题,并推广了Sequential Attention算法的证明保证。Abstract
Despite widespread adoption in practice, guarantees for the LASSO and Group LASSO are strikingly lacking in settings beyond statistical problems, and these algorithms are usually considered to be a heuristic in the context of sparse convex optimization on deterministic inputs. We give the first recovery guarantees for the Group LASSO for sparse convex optimization with vector-valued features. We show that if a sufficiently large Group LASSO regularization is applied when minimizing a strictly convex function $l$, then the minimizer is a sparse vector supported on vector-valued features with the largest $\ell_2$ norm of the gradient. Thus, repeating this procedure selects the same set of features as the Orthogonal Matching Pursuit algorithm, which admits recovery guarantees for any function $l$ with restricted strong convexity and smoothness via weak submodularity arguments. This answers open questions of Tibshirani et al. and Yasuda et al. Our result is the first to theoretically explain the empirical success of the Group LASSO for convex functions under general input instances assuming only restricted strong convexity and smoothness. Our result also generalizes provable guarantees for the Sequential Attention algorithm, which is a feature selection algorithm inspired by the attention mechanism proposed by Yasuda et al. As an application of our result, we give new results for the column subset selection problem, which is well-studied when the loss is the Frobenius norm or other entrywise matrix losses. We give the first result for general loss functions for this problem that requires only restricted strong convexity and smoothness.
摘要
尽管在实践中广泛采用,LASSO和Group LASSO的保证在超出统计问题的设置下却缺乏保证,通常被视为统计问题中的启发式方法。我们提出了Group LASSO在稀疏凸优化中的首个回归保证,并证明如果在最小化一个固定输入的强烈凸函数$l$时应用足够大的Group LASSO正则化,那么最小值是一个稀疏的向量支持 vector-valued features的最大$\ell_2$范数。因此,重复这个过程可以选择同样的特征集,与Orthogonal Matching Pursuit算法相同,后者对任何函数$l$ WITH RESTRICTED STRONG CONVEXITY和SMOOTHNESS通过弱SubmodularityArguments提供了回归保证。这些问题得到了 Tibshirani et al.和Yasuda et al.的解答。我们的结果是对统计问题中的凸函数下的任意输入实例进行 theoretically解释Group LASSO的Empirical Success,只需要 RESTRICTED STRONG CONVEXITY和SMOOTHNESS假设。我们的结果还扩展了Sequential Attention算法的证明,Sequential Attention算法是一种基于注意机制的特征选择算法,而Yasuda et al.提出的。在我们的结果的应用中,我们给出了一个新的结果,即Column Subset Selection问题,这是一个广泛研究的问题,当loss是 Frobenius 范数或其他Entrywise Matrix losses时。我们给出了第一个对于一般损失函数的结果,只需要 RESTRICTED STRONG CONVEXITY和SMOOTHNESS假设。
Improving Zero-Shot Generalization for CLIP with Synthesized Prompts
results: 通过对CLIP模型进行微调,实现了基于新数据的泛化、跨数据集转移学习和总是零例学习等多种任务的超越性表现。Abstract
With the growing interest in pretrained vision-language models like CLIP, recent research has focused on adapting these models to downstream tasks. Despite achieving promising results, most existing methods require labeled data for all classes, which may not hold in real-world applications due to the long tail and Zipf's law. For example, some classes may lack labeled data entirely, such as emerging concepts. To address this problem, we propose a plug-and-play generative approach called \textbf{S}ynt\textbf{H}es\textbf{I}zed \textbf{P}rompts~(\textbf{SHIP}) to improve existing fine-tuning methods. Specifically, we follow variational autoencoders to introduce a generator that reconstructs the visual features by inputting the synthesized prompts and the corresponding class names to the textual encoder of CLIP. In this manner, we easily obtain the synthesized features for the remaining label-only classes. Thereafter, we fine-tune CLIP with off-the-shelf methods by combining labeled and synthesized features. Extensive experiments on base-to-new generalization, cross-dataset transfer learning, and generalized zero-shot learning demonstrate the superiority of our approach. The code is available at \url{https://github.com/mrflogs/SHIP}.
摘要
随着人工智能视觉语言模型CLIP的兴趣增长,现有研究强调将这些模型适应下游任务。虽然实现了可喜的结果,但大多数现有方法需要所有类别的标注数据,这可能不符合实际应用中的情况,因为Zipf的法则和长尾问题。例如,某些类别可能完全缺乏标注数据,如新出现的概念。为解决这个问题,我们提出了一种插件式生成方法,即\textbf{S}ynt\textbf{H}es\textbf{I}zed \textbf{P}rompts~(\textbf{SHIP)},用以改进现有的辅助方法。具体来说,我们采用变量自动机制来引入一个生成器,通过输入生成的提示和对应的类别名称,将视觉特征重构回CLIP的文本编码器中。这样,我们可以轻松地获得生成的特征 для剩下的标签只有类。然后,我们将CLIP通过市场上可得到的方法进行精度调整,并将标注和生成的特征结合在一起。我们进行了基于新到旧泛化、跨数据集转移学习和通用零例学习的广泛实验,结果显示了我们的方法的优越性。代码可以在\url{https://github.com/mrflogs/SHIP}中找到。
Visualizing Overlapping Biclusterings and Boolean Matrix Factorizations
results: 实验结果表明,该规则可以在实际数据集上实现最佳的平衡,并且可以减少视觉化中的混乱。Abstract
Finding (bi-)clusters in bipartite graphs is a popular data analysis approach. Analysts typically want to visualize the clusters, which is simple as long as the clusters are disjoint. However, many modern algorithms find overlapping clusters, making visualization more complicated. In this paper, we study the problem of visualizing \emph{a given clustering} of overlapping clusters in bipartite graphs and the related problem of visualizing Boolean Matrix Factorizations. We conceptualize three different objectives that any good visualization should satisfy: (1) proximity of cluster elements, (2) large consecutive areas of elements from the same cluster, and (3) large uninterrupted areas in the visualization, regardless of the cluster membership. We provide objective functions that capture these goals and algorithms that optimize these objective functions. Interestingly, in experiments on real-world datasets, we find that the best trade-off between these competing goals is achieved by a novel heuristic, which locally aims to place rows and columns with similar cluster membership next to each other.
摘要
发现(二分)集群在 дву分图中是一种流行的数据分析方法。分析员通常希望可视化集群,只要集群是不 overlap的,那么很简单。然而,许多现代算法找到了重叠集群,使可视化变得更加复杂。在这篇论文中,我们研究了在二分图中可视化给定的集群和布尔矩阵因子化的问题,以及这两个问题之间的关系。我们提出了三个不同的目标,任何好的可视化都应该满足:(1)群元素之间的距离,(2)同一个群中的元素连续占用大面积,(3)无论群 membership,视化中的大面积不被打断。我们提出了捕捉这些目标的目标函数,并提供了优化这些目标函数的算法。实验结果表明,在真实世界 dataset 上,我们的新规则可以在这些竞争目标之间寻找最佳平衡。Note: "二分图" refers to a bipartite graph, and "布尔矩阵因子化" refers to Boolean matrix factorization.
CAMP: A Context-Aware Cricket Players Performance Metric
paper_authors: Muhammad Sohaib Ayub, Naimat Ullah, Sarwan Ali, Imdad Ullah Khan, Mian Muhammad Awais, Muhammad Asad Khan, Safiullah Faizullah
for: 这篇论文目的是为了提出一种Context-Aware Metric of player Performance(CAMP),用于评估 individuak 篮球运动员的表现。
methods: 这篇论文使用了数据挖掘技术,包括Context-Aware Metric of player Performance(CAMP),以便更好地支持数据驱动的决策。
results: 根据 empirical evaluation,CAMP 的评估结果与 domain experts 宣布的最佳球员(Man of the Match,MoM)相符合的比例为 83%,并且在比较与 Duckworth-Lewis-Stern 方法(DLS)中的最佳球员评估结果时表现出色。Abstract
Cricket is the second most popular sport after soccer in terms of viewership. However, the assessment of individual player performance, a fundamental task in team sports, is currently primarily based on aggregate performance statistics, including average runs and wickets taken. We propose Context-Aware Metric of player Performance, CAMP, to quantify individual players' contributions toward a cricket match outcome. CAMP employs data mining methods and enables effective data-driven decision-making for selection and drafting, coaching and training, team line-ups, and strategy development. CAMP incorporates the exact context of performance, such as opponents' strengths and specific circumstances of games, such as pressure situations. We empirically evaluate CAMP on data of limited-over cricket matches between 2001 and 2019. In every match, a committee of experts declares one player as the best player, called Man of the M}atch (MoM). The top two rated players by CAMP match with MoM in 83\% of the 961 games. Thus, the CAMP rating of the best player closely matches that of the domain experts. By this measure, CAMP significantly outperforms the current best-known players' contribution measure based on the Duckworth-Lewis-Stern (DLS) method.
摘要
cricket是世界上第二受欢迎的运动之一,仅次于足球。然而,评估个体运动员表现的任务,在团队运动中是一项基本任务,目前主要基于各种聚合性表现统计,如平均得分和夺取的球。我们提出了 Context-Aware Metric of player Performance(CAMP),用于评估cricket运动员的个人贡献。CAMP使用数据挖掘技术,可以帮助团队选择和培训、组队和策略开发等决策。CAMP考虑了特定的比赛情况,如对手的强点和游戏中的压力情况。我们对961场限定的cricket比赛数据进行了实验性评估,并发现CAMP评分与专家委员会宣布的最佳球员(Man of the Match,MoM)相匹配的比例为83%。因此,CAMP评分与专家评价几乎一致。此外,CAMP的评分也超过了基于Duckworth-Lewis-Stern(DLS)方法的现有最佳球员贡献度量。
Brain in the Dark: Design Principles for Neuro-mimetic Learning and Inference
paper_authors: Mehran H. Bazargani, Szymon Urbas, Karl Friston
for: 这篇论文旨在探讨脑内部做出感知的机制,具体来说是使用生成模型来描述脑内部做出的感知。
methods: 这篇论文使用生成模型来模拟脑内部做出的感知,并通过倒推来实现感知的归一化。
results: 这篇论文提出了一种基于脑内部生成模型的方法来实现感知的归一化,并讨论了不同的均场近似方法(MFA)和其影响于归一化学习(VI)。Abstract
Even though the brain operates in pure darkness, within the skull, it can infer the most likely causes of its sensory input. An approach to modelling this inference is to assume that the brain has a generative model of the world, which it can invert to infer the hidden causes behind its sensory stimuli, that is, perception. This assumption raises key questions: how to formulate the problem of designing brain-inspired generative models, how to invert them for the tasks of inference and learning, what is the appropriate loss function to be optimised, and, most importantly, what are the different choices of mean field approximation (MFA) and their implications for variational inference (VI).
摘要
虽然大脑在颅内完全黑暗中运行,但它可以推断感知输入的最有可能的原因。一种模型这种推断的方法是假设大脑有一个世界的生成模型,它可以反向推断感知的隐藏原因,即感知。这个假设提出了关键问题:如何构思大脑引起的生成模型设计问题,如何在推断和学习任务中对其进行反向推断,什么是合适的损失函数优化目标,以及不同的mean field approximation(MFA)选择对变量推断(VI)带来的不同影响。
Learning Sparse Neural Networks with Identity Layers
paper_authors: Mingjian Ni, Guangyao Chen, Xiawu Zheng, Peixi Peng, Li Yuan, Yonghong Tian for:This paper aims to improve the sparsity of deep neural networks by reducing interlayer feature similarity.methods:The proposed method uses Centered Kernel Alignment (CKA) to reduce feature similarity between layers and increase network sparsity.results:The proposed CKA-SR method consistently improves the performance of several State-Of-The-Art sparse training methods, especially at extremely high sparsity.Here is the simplified Chinese text:for: 这篇论文目的是提高深度神经网络的稀畴性,减少层次特征相似性。methods: 提议方法使用中心kernel对齐(CKA)来减少层次特征相似性,提高网络稀畴性。results: 提议CKA-SR方法在多个State-Of-The-Art稀畴训练方法中表现出色,特别是在极高稀畴性下表现更好。Abstract
The sparsity of Deep Neural Networks is well investigated to maximize the performance and reduce the size of overparameterized networks as possible. Existing methods focus on pruning parameters in the training process by using thresholds and metrics. Meanwhile, feature similarity between different layers has not been discussed sufficiently before, which could be rigorously proved to be highly correlated to the network sparsity in this paper. Inspired by interlayer feature similarity in overparameterized models, we investigate the intrinsic link between network sparsity and interlayer feature similarity. Specifically, we prove that reducing interlayer feature similarity based on Centered Kernel Alignment (CKA) improves the sparsity of the network by using information bottleneck theory. Applying such theory, we propose a plug-and-play CKA-based Sparsity Regularization for sparse network training, dubbed CKA-SR, which utilizes CKA to reduce feature similarity between layers and increase network sparsity. In other words, layers of our sparse network tend to have their own identity compared to each other. Experimentally, we plug the proposed CKA-SR into the training process of sparse network training methods and find that CKA-SR consistently improves the performance of several State-Of-The-Art sparse training methods, especially at extremely high sparsity. Code is included in the supplementary materials.
摘要
深度神经网络的稀疏性已经广泛研究,以最大化性能并减少过参数化网络的大小。现有方法主要通过在训练过程中使用阈值和度量来减少参数。而层之间的特征相似性还没有得到充分的研究,这里我们将通过中心kernel对齐(CKA)来降低网络稀疏性。利用信息瓶颈理论,我们提出了基于CKA的稀疏规范(CKA-SR),该规范通过减少层之间的特征相似性来增加网络的稀疏性。即层之间的稀疏网络具有更明确的标识性。我们在训练稀疏网络训练方法时对CKA-SR进行实验,发现CKA-SR可以在极高稀疏性下提高多种现有的状态数据训练方法的性能,特别是在极高稀疏性下。代码请参考辅料。
Higher-order topological kernels via quantum computation
results: 本文提出了一种基于Betti curves的量子定义 topological kernels 方法,并在一个干净的 simulator 上实现了一个工作示例。通过一些实验结果,表明 topological 方法可能在量子机器学习中具有优势。Abstract
Topological data analysis (TDA) has emerged as a powerful tool for extracting meaningful insights from complex data. TDA enhances the analysis of objects by embedding them into a simplicial complex and extracting useful global properties such as the Betti numbers, i.e. the number of multidimensional holes, which can be used to define kernel methods that are easily integrated with existing machine-learning algorithms. These kernel methods have found broad applications, as they rely on powerful mathematical frameworks which provide theoretical guarantees on their performance. However, the computation of higher-dimensional Betti numbers can be prohibitively expensive on classical hardware, while quantum algorithms can approximate them in polynomial time in the instance size. In this work, we propose a quantum approach to defining topological kernels, which is based on constructing Betti curves, i.e. topological fingerprint of filtrations with increasing order. We exhibit a working prototype of our approach implemented on a noiseless simulator and show its robustness by means of some empirical results suggesting that topological approaches may offer an advantage in quantum machine learning.
摘要
Composition-contrastive Learning for Sentence Embeddings
results: 实验结果显示,相比基eline,本研究的方法可以提高 semantic textual similarity task 的表示质量,并且不需要 auxiliary training objective 或额外网络参数。Abstract
Vector representations of natural language are ubiquitous in search applications. Recently, various methods based on contrastive learning have been proposed to learn textual representations from unlabelled data; by maximizing alignment between minimally-perturbed embeddings of the same text, and encouraging a uniform distribution of embeddings across a broader corpus. Differently, we propose maximizing alignment between texts and a composition of their phrasal constituents. We consider several realizations of this objective and elaborate the impact on representations in each case. Experimental results on semantic textual similarity tasks show improvements over baselines that are comparable with state-of-the-art approaches. Moreover, this work is the first to do so without incurring costs in auxiliary training objectives or additional network parameters.
摘要
文本的矢量表示方法在搜索应用中广泛使用。近期,基于对比学习的不同方法被提议来从无标签数据中学习文本表示; 通过最大化同样文本的微小改动 embedding 之间的对应度,并尝试在更广泛的文库中均匀分布 embedding。不同的我们提议是通过最大化文本和其短语组成部分之间的对应度来实现。我们考虑了这些目标的不同实现方式,并详细描述了它们对表示的影响。实验结果表明,在 semantics 文本相似性任务中,我们的方法可以与现状技术相比获得显著提高,而无需额外训练目标或网络参数。Note: "矢量表示" in Chinese is typically translated as "vector representation" or "vector embedding", but I have used "矢量表示" throughout the text to maintain consistency with the original English phrasing.
Defect Classification in Additive Manufacturing Using CNN-Based Vision Processing
paper_authors: Xiao Liu, Alessandra Mileo, Alan F. Smeaton
for: 用于改进附加制造过程质量
methods: 使用图像感知器和活动学习技术
results: 精确地分类批量制造中的缺陷Abstract
The development of computer vision and in-situ monitoring using visual sensors allows the collection of large datasets from the additive manufacturing (AM) process. Such datasets could be used with machine learning techniques to improve the quality of AM. This paper examines two scenarios: first, using convolutional neural networks (CNNs) to accurately classify defects in an image dataset from AM and second, applying active learning techniques to the developed classification model. This allows the construction of a human-in-the-loop mechanism to reduce the size of the data required to train and generate training data.
摘要
通过计算机视觉和在位测量技术,可以从三维打印(AM)过程中收集大量数据。这些数据可以用机器学习技术来提高AM的质量。这篇论文研究了两种情况:首先,使用卷积神经网络(CNN)准确地分类AM图像集中的缺陷,其次,应用到开发的分类模型上的活动学习技术。这allowsthe construction of a human-in-the-loop mechanism to reduce the size of the data required to train and generate training data。Note: "Simplified Chinese" is also known as "Mandarin" or "Standard Chinese".
AIC-AB NET: A Neural Network for Image Captioning with Spatial Attention and Text Attributes
results: 对于 MS COCO 数据集和我们新提出的时尚数据集,我们的 AIC-AB NET 与基eline模型和ablated模型进行比较,实验结果表明我们的模型在两个数据集上都有较高的表现,与基eline模型的 CIDEr 得分相比,我们的模型在 MS COCO 数据集上提高了0.017,而在时尚数据集上提高了0.095。Abstract
Image captioning is a significant field across computer vision and natural language processing. We propose and present AIC-AB NET, a novel Attribute-Information-Combined Attention-Based Network that combines spatial attention architecture and text attributes in an encoder-decoder. For caption generation, adaptive spatial attention determines which image region best represents the image and whether to attend to the visual features or the visual sentinel. Text attribute information is synchronously fed into the decoder to help image recognition and reduce uncertainty. We have tested and evaluated our AICAB NET on the MS COCO dataset and a new proposed Fashion dataset. The Fashion dataset is employed as a benchmark of single-object images. The results show the superior performance of the proposed model compared to the state-of-the-art baseline and ablated models on both the images from MSCOCO and our single-object images. Our AIC-AB NET outperforms the baseline adaptive attention network by 0.017 (CIDEr score) on the MS COCO dataset and 0.095 (CIDEr score) on the Fashion dataset.
摘要
Image captioning是一个重要的计算机视觉和自然语言处理领域。我们提议并发表了AIC-AB网络,一种新的Attribute-Information-Combined Attention-Based Network,它将空间注意力架构和文本属性注入到了Encoder-Decoder中。在caption生成过程中,适应的空间注意力确定了图像中最佳表示图像的区域,以及是否需要关注视觉特征或视觉监测。文本属性信息同时被 fed into Decoder,以帮助图像识别和减少不确定性。我们在MS COCO数据集和我们新提出的时尚数据集上测试和评估了我们的AICAB网络。时尚数据集被用作单个物体图像的标准准测试集。结果显示我们的提议模型与状态对比基eline和折衔模型在MS COCO数据集和时尚数据集上都有superior的性能。我们的AIC-AB网络在MS COCO数据集上比基eline适应注意力网络高0.017(CIDEr分数),在时尚数据集上高0.095(CIDEr分数)。
Source-Free Domain Adaptation with Temporal Imputation for Time Series Data
results: EXTENSIVE experiments表明,我们的MAPU在三个实际时间序列数据集上实现了明显的性能提升,较exist方法更好。我们的代码可以在 \url{https://github.com/mohamedr002/MAPU_SFDA_TS} 上获取。Abstract
Source-free domain adaptation (SFDA) aims to adapt a pretrained model from a labeled source domain to an unlabeled target domain without access to the source domain data, preserving source domain privacy. Despite its prevalence in visual applications, SFDA is largely unexplored in time series applications. The existing SFDA methods that are mainly designed for visual applications may fail to handle the temporal dynamics in time series, leading to impaired adaptation performance. To address this challenge, this paper presents a simple yet effective approach for source-free domain adaptation on time series data, namely MAsk and imPUte (MAPU). First, to capture temporal information of the source domain, our method performs random masking on the time series signals while leveraging a novel temporal imputer to recover the original signal from a masked version in the embedding space. Second, in the adaptation step, the imputer network is leveraged to guide the target model to produce target features that are temporally consistent with the source features. To this end, our MAPU can explicitly account for temporal dependency during the adaptation while avoiding the imputation in the noisy input space. Our method is the first to handle temporal consistency in SFDA for time series data and can be seamlessly equipped with other existing SFDA methods. Extensive experiments conducted on three real-world time series datasets demonstrate that our MAPU achieves significant performance gain over existing methods. Our code is available at \url{https://github.com/mohamedr002/MAPU_SFDA_TS}.
摘要
源自由领域适应(SFDA)目标是将预训练的源频率频率模型适应到无标签目标频率频率数据上,保持源频率频率数据私钥。尽管SFDA在视觉应用中广泛存在,但在时间序列应用中它尚未得到充分探讨。现有的SFDA方法主要是为视觉应用设计,可能无法处理时间序列中的时间动态,导致适应性下降。为解决这个挑战,本文提出了一种简单又有效的源自由频率频率数据适应方法,即MAsk和imPUte(MAPU)。首先,为捕捉源频率频率数据中的时间信息,我们的方法在时间序列信号上随机填充mask,并利用一种新的时间填充器来在嵌入空间中恢复原始信号。其次,在适应步骤中,填充器网络被利用来指导目标模型生成目标特征,使其与源特征在时间上具有一致性。这样,我们的MAPU可以在适应过程中考虑时间相互关系,而不是在噪声输入空间中进行填充。我们的方法是首次在SFDA中处理时间相互关系,可以与其他现有的SFDA方法结合使用。我们在三个实际的时间序列数据集上进行了广泛的实验,并证明了我们的MAPU可以获得显著的性能提升。我们的代码可以在 \url{https://github.com/mohamedr002/MAPU_SFDA_TS} 上获取。
results: 本研究在 Amazon Last Mile Routing Research Challenge 中测试了 IO 方法,并实现了在 thousands 个实际路径示例中学习决策者的路径偏好。最终的 IO-学习的路径模型在 48 个参赛模型中排名第二。结果表明 IO 方法在 Routing 问题中有优秀的灵活性和实际应用 potential。Abstract
We propose a method for learning decision-makers' behavior in routing problems using Inverse Optimization (IO). The IO framework falls into the supervised learning category and builds on the premise that the target behavior is an optimizer of an unknown cost function. This cost function is to be learned through historical data, and in the context of routing problems, can be interpreted as the routing preferences of the decision-makers. In this view, the main contributions of this study are to propose an IO methodology with a hypothesis function, loss function, and stochastic first-order algorithm tailored to routing problems. We further test our IO approach in the Amazon Last Mile Routing Research Challenge, where the goal is to learn models that replicate the routing preferences of human drivers, using thousands of real-world routing examples. Our final IO-learned routing model achieves a score that ranks 2nd compared with the 48 models that qualified for the final round of the challenge. Our results showcase the flexibility and real-world potential of the proposed IO methodology to learn from decision-makers' decisions in routing problems.
摘要
我们提出了一种使用反优化(IO)方法来学习决策者的行为在路径问题中。IO框架属于超级VI类和建立在决策者的路径偏好是未知成本函数的假设上。在路径问题上,这个成本函数可以通过历史数据来学习,并可以被解释为决策者的路径偏好。在这种视角下,本研究的主要贡献在于提出了一种适应路径问题的IO方法ologies,包括假设函数、损失函数和随机首领算法。我们进一步测试了我们的IO方法在亚马逊最后一英里路径研究挑战中,目标是学习人工驾驶员的路径偏好,使用了数千个真实世界路径示例。我们的IO学习路径模型在48个参赛模型中排名第二,这显示了我们的IO方法在路径问题中学习决策者的行为的可能性和实际应用的实用性。
paper_authors: Justin Whitehouse, Zhiwei Steven Wu, Aaditya Ramdas
for: 在 kernelized bandit problem 中,一个学习者需要逐步计算一个函数在 reproduce kernel Hilbert space 中的最优点。Specifically, the learner aims to minimize regret, which is a measure of the suboptimality of the choices made.
methods: 该算法使用 Gaussian Process Upper Confidence Bound (GP-UCB) 算法,具体来说是基于一个简单线性估计器来行动。
results: 我们解决了一个长期开放问题,证明 GP-UCB 算法具有几乎最优的 regret。Specifically, our results show that GP-UCB enjoys sublinear regret rates for the Mat'ern kernel, improving over the state-of-the-art analyses and partially resolving a COLT open problem posed by Vakili et al.Abstract
In the kernelized bandit problem, a learner aims to sequentially compute the optimum of a function lying in a reproducing kernel Hilbert space given only noisy evaluations at sequentially chosen points. In particular, the learner aims to minimize regret, which is a measure of the suboptimality of the choices made. Arguably the most popular algorithm is the Gaussian Process Upper Confidence Bound (GP-UCB) algorithm, which involves acting based on a simple linear estimator of the unknown function. Despite its popularity, existing analyses of GP-UCB give a suboptimal regret rate, which fails to be sublinear for many commonly used kernels such as the Mat\'ern kernel. This has led to a longstanding open question: are existing regret analyses for GP-UCB tight, or can bounds be improved by using more sophisticated analytical techniques? In this work, we resolve this open question and show that GP-UCB enjoys nearly optimal regret. In particular, our results yield sublinear regret rates for the Mat\'ern kernel, improving over the state-of-the-art analyses and partially resolving a COLT open problem posed by Vakili et al. Our improvements rely on a key technical contribution -- regularizing kernel ridge estimators in proportion to the smoothness of the underlying kernel $k$. Applying this key idea together with a largely overlooked concentration result in separable Hilbert spaces (for which we provide an independent, simplified derivation), we are able to provide a tighter analysis of the GP-UCB algorithm.
摘要
在核函数问题中,一个学习者想要逐渐计算一个 lying in a reproducing kernel Hilbert space 中的最优函数,只有受到随机评估的点选择。特别是,学习者想要减少 regret,它是选择的不优异度的度量。现有最受欢迎的算法是 Gaussian Process Upper Confidence Bound(GP-UCB)算法,它基于一个简单的线性估计来行动。尽管它的流行,但现有的 regret 分析仍然给出了不优的 regret 率,这些率不仅对 Matérn 核函数而言不是线性的。这个问题已经成为一个长期开放问题:现有的 regret 分析是否准确,或者可以通过更加复杂的分析技术来改进 bound?在这个工作中,我们解决了这个问题,并证明 GP-UCB 算法具有nearly optimal regret。具体来说,我们的结果表明,对 Matérn 核函数,GP-UCB 算法的 regret 率是线性的,这与现有的分析不同,并且解决了 COLT 开放问题。我们的改进来自于一个关键技术创新——对 kernel ridge 估计器进行补偿,以便适应核函数的平滑度。通过这个关键想法,我们还应用了一个 relativelly 未知的集中结果(即 separable Hilbert spaces 中的 concentraion result),以获得 GP-UCB 算法的更加紧密的分析。
A testing-based approach to assess the clusterability of categorical data
results: TestCat在一些标准 benchmark categorical data sets上表现出色,与基于现有clusterability evaluation方法的解决方案相比,并且可以 statistically sound manner中识别 categorical data的clusterability.Abstract
The objective of clusterability evaluation is to check whether a clustering structure exists within the data set. As a crucial yet often-overlooked issue in cluster analysis, it is essential to conduct such a test before applying any clustering algorithm. If a data set is unclusterable, any subsequent clustering analysis would not yield valid results. Despite its importance, the majority of existing studies focus on numerical data, leaving the clusterability evaluation issue for categorical data as an open problem. Here we present TestCat, a testing-based approach to assess the clusterability of categorical data in terms of an analytical $p$-value. The key idea underlying TestCat is that clusterable categorical data possess many strongly correlated attribute pairs and hence the sum of chi-squared statistics of all attribute pairs is employed as the test statistic for $p$-value calculation. We apply our method to a set of benchmark categorical data sets, showing that TestCat outperforms those solutions based on existing clusterability evaluation methods for numeric data. To the best of our knowledge, our work provides the first way to effectively recognize the clusterability of categorical data in a statistically sound manner.
摘要
The key idea behind TestCat is that clusterable categorical data will have many strongly correlated attribute pairs, so we use the sum of chi-squared statistics for all attribute pairs as our test statistic for p-value calculation. We apply our method to a set of benchmark categorical data sets and show that TestCat outperforms existing solutions based on clusterability evaluation methods for numerical data. To the best of our knowledge, our work provides the first statistically sound way to recognize the clusterability of categorical data.
Inverse Evolution Layers: Physics-informed Regularizers for Deep Neural Networks
results: 实验结果表明,使用 heat-diffusion IELs 可以有效地 mitigate 噪音标签导致的 overfitting 问题。Abstract
This paper proposes a novel approach to integrating partial differential equation (PDE)-based evolution models into neural networks through a new type of regularization. Specifically, we propose inverse evolution layers (IELs) based on evolution equations. These layers can achieve specific regularization objectives and endow neural networks' outputs with corresponding properties of the evolution models. Moreover, IELs are straightforward to construct and implement, and can be easily designed for various physical evolutions and neural networks. Additionally, the design process for these layers can provide neural networks with intuitive and mathematical interpretability, thus enhancing the transparency and explainability of the approach. To demonstrate the effectiveness, efficiency, and simplicity of our approach, we present an example of endowing semantic segmentation models with the smoothness property based on the heat diffusion model. To achieve this goal, we design heat-diffusion IELs and apply them to address the challenge of semantic segmentation with noisy labels. The experimental results demonstrate that the heat-diffusion IELs can effectively mitigate the overfitting problem caused by noisy labels.
摘要
To demonstrate the effectiveness, efficiency, and simplicity of the proposed approach, the paper presents an example of endowing semantic segmentation models with the smoothness property based on the heat diffusion model. To achieve this, heat-diffusion IELs are designed and applied to address the challenge of semantic segmentation with noisy labels. The experimental results show that the heat-diffusion IELs can effectively mitigate the overfitting problem caused by noisy labels.
MaxMin-L2-SVC-NCH: A Novel Approach for Support Vector Classifier Training and Parameter Selection
results: 对于公共数据集的比较实验结果显示,MaxMin-L2-SVC-NCH可以减少模型训练数量而保持竞争力的测试准确率,这表明MaxMin-L2-SVC-NCH是SVC任务中更好的选择。Abstract
The selection of Gaussian kernel parameters plays an important role in the applications of support vector classification (SVC). A commonly used method is the k-fold cross validation with grid search (CV), which is extremely time-consuming because it needs to train a large number of SVC models. In this paper, a new approach is proposed to train SVC and optimize the selection of Gaussian kernel parameters. We first formulate the training and parameter selection of SVC as a minimax optimization problem named as MaxMin-L2-SVC-NCH, in which the minimization problem is an optimization problem of finding the closest points between two normal convex hulls (L2-SVC-NCH) while the maximization problem is an optimization problem of finding the optimal Gaussian kernel parameters. A lower time complexity can be expected in MaxMin-L2-SVC-NCH because CV is not needed. We then propose a projected gradient algorithm (PGA) for training L2-SVC-NCH. The famous sequential minimal optimization (SMO) algorithm is a special case of the PGA. Thus, the PGA can provide more flexibility than the SMO. Furthermore, the solution of the maximization problem is done by a gradient ascent algorithm with dynamic learning rate. The comparative experiments between MaxMin-L2-SVC-NCH and the previous best approaches on public datasets show that MaxMin-L2-SVC-NCH greatly reduces the number of models to be trained while maintaining competitive test accuracy. These findings indicate that MaxMin-L2-SVC-NCH is a better choice for SVC tasks.
摘要
选择 Gaussian kernel 参数的选择在支持向量分类 (SVC) 中扮演着重要的角色。一种常用的方法是 k-fold cross validation with grid search (CV),但这个方法需要训练大量的 SVC 模型,时间复杂度很高。在这篇论文中,我们提出了一个新的方法来训练 SVC 和优化 Gaussian kernel 参数的选择。我们首先将训练和参数选择转换为一个内积最小化问题(MaxMin-L2-SVC-NCH),其中的最小化问题是找到两个正常凸体(L2-SVC-NCH)之间最近的点,而最大化问题是找到最佳的 Gaussian kernel 参数。这个方法可以预期更低的时间复杂度,因为 CV 不需要。我们 THEN 提出了一个投影 gradient 算法 (PGA) 来训练 L2-SVC-NCH。SMO 算法是 PGA 的一个特例,因此 PGA 可以提供更多的灵活性。另外,最大化问题的解是使用梯度升降法 WITH 动态学习率。实验结果显示,MaxMin-L2-SVC-NCH 与过去的最佳方法在公开数据集上比较,可以大幅降低需要训练的模型数量,保持竞争的测试准确性。这些结果表明,MaxMin-L2-SVC-NCH 是 SVC 任务中的一个更好的选择。
Time for aCTIon: Automated Analysis of Cyber Threat Intelligence in the Wild
results: 论文的实验结果表明,使用这种新的方法可以在敏感信息抽取 task 中提高 F1 分数由10%点提高到50%点,与之前的10种解决方案相比。Abstract
Cyber Threat Intelligence (CTI) plays a crucial role in assessing risks and enhancing security for organizations. However, the process of extracting relevant information from unstructured text sources can be expensive and time-consuming. Our empirical experience shows that existing tools for automated structured CTI extraction have performance limitations. Furthermore, the community lacks a common benchmark to quantitatively assess their performance. We fill these gaps providing a new large open benchmark dataset and aCTIon, a structured CTI information extraction tool. The dataset includes 204 real-world publicly available reports and their corresponding structured CTI information in STIX format. Our team curated the dataset involving three independent groups of CTI analysts working over the course of several months. To the best of our knowledge, this dataset is two orders of magnitude larger than previously released open source datasets. We then design aCTIon, leveraging recently introduced large language models (GPT3.5) in the context of two custom information extraction pipelines. We compare our method with 10 solutions presented in previous work, for which we develop our own implementations when open-source implementations were lacking. Our results show that aCTIon outperforms previous work for structured CTI extraction with an improvement of the F1-score from 10%points to 50%points across all tasks.
摘要
资ber隐ThreadIntelligence(CTI)在评估风险和增强组织安全方面扮演着关键的角色。然而,从不结构化文本来提取有用信息的过程可能是昂费时间和成本的。我们的实践经验表明,现有的自动化结构CTI信息提取工具有性能上的限制。此外,社区缺乏一个共同的量化评估标准。我们填补了这些空白,提供了一个新的大型开放 benchmark数据集和aCTIon,一个结构化CTI信息提取工具。这个数据集包括204份公开可用的报告,以及它们的对应的结构化CTI信息在STIX格式下。我们的团队在几个月内精心筛选了这个数据集,还有三个独立的CTI分析师团队参与了实验。根据我们所知,这个数据集比前一些公开数据集大得多,是二个数量级的大得多。我们随后设计了aCTIon,利用最近引入的大型语言模型(GPT3.5),并在两个自订的信息提取管道中进行了实现。我们与前一些作品中的10个解决方案进行比较,并为其实现了自己的开源实现。我们的结果显示,aCTIon在结构化CTI信息提取方面比前一些作品提高了F1分数的值,从10%点提高到50%点。
How Different Is Stereotypical Bias Across Languages?
paper_authors: Ibrahim Tolga Öztürk, Rostislav Nedelchev, Christian Heumann, Esteban Garces Arias, Marius Roger, Bernd Bischl, Matthias Aßenmacher
for: 本研究探讨了在预训练英语模型中带有刻板印象的问题,并在多个维度上扩展了这一分支研究。
methods: 我们使用了英语斯tereoSet数据集(Nadeem et al., 2021),通过自动翻译而将其翻译成了德语、法语、西班牙语和土耳其语。
results: 我们发现,在多语言设置下进行这类分析是非常重要的,因为我们的实验结果显示了许多多样性和语言之间的差异。主要结论是,mGPT-2(部分)在不同语言中表现出了反刻板的行为,英语(单语言)模型表现出最强的偏见,并且数据集中含有最少的刻板印象是在土耳其模型中。最后,我们发布了我们的代码库和数据集的 semi-automatic 翻译,以便鼓励其他语言的扩展。Abstract
Recent studies have demonstrated how to assess the stereotypical bias in pre-trained English language models. In this work, we extend this branch of research in multiple different dimensions by systematically investigating (a) mono- and multilingual models of (b) different underlying architectures with respect to their bias in (c) multiple different languages. To that end, we make use of the English StereoSet data set (Nadeem et al., 2021), which we semi-automatically translate into German, French, Spanish, and Turkish. We find that it is of major importance to conduct this type of analysis in a multilingual setting, as our experiments show a much more nuanced picture as well as notable differences from the English-only analysis. The main takeaways from our analysis are that mGPT-2 (partly) shows surprising anti-stereotypical behavior across languages, English (monolingual) models exhibit the strongest bias, and the stereotypes reflected in the data set are least present in Turkish models. Finally, we release our codebase alongside the translated data sets and practical guidelines for the semi-automatic translation to encourage a further extension of our work to other languages.
摘要
Recent studies have demonstrated how to assess the stereotypical bias in pre-trained English language models. In this work, we extend this branch of research in multiple different dimensions by systematically investigating (a) mono- and multilingual models of (b) different underlying architectures with respect to their bias in (c) multiple different languages. To that end, we make use of the English StereoSet data set (Nadeem et al., 2021), which we semi-automatically translate into German, French, Spanish, and Turkish. We find that it is of major importance to conduct this type of analysis in a multilingual setting, as our experiments show a much more nuanced picture as well as notable differences from the English-only analysis. The main takeaways from our analysis are that mGPT-2 (partly) shows surprising anti-stereotypical behavior across languages, English (monolingual) models exhibit the strongest bias, and the stereotypes reflected in the data set are least present in Turkish models. Finally, we release our codebase alongside the translated data sets and practical guidelines for the semi-automatic translation to encourage a further extension of our work to other languages.Here's the text in Traditional Chinese:Recent studies have demonstrated how to assess the stereotypical bias in pre-trained English language models. In this work, we extend this branch of research in multiple different dimensions by systematically investigating (a) mono- and multilingual models of (b) different underlying architectures with respect to their bias in (c) multiple different languages. To that end, we make use of the English StereoSet data set (Nadeem et al., 2021), which we semi-automatically translate into German, French, Spanish, and Turkish. We find that it is of major importance to conduct this type of analysis in a multilingual setting, as our experiments show a much more nuanced picture as well as notable differences from the English-only analysis. The main takeaways from our analysis are that mGPT-2 (partly) shows surprising anti-stereotypical behavior across languages, English (monolingual) models exhibit the strongest bias, and the stereotypes reflected in the data set are least present in Turkish models. Finally, we release our codebase alongside the translated data sets and practical guidelines for the semi-automatic translation to encourage a further extension of our work to other languages.
Boosting Backdoor Attack with A Learnable Poisoning Sample Selection Strategy
results: 提高了后门攻击性能,效率高。Abstract
Data-poisoning based backdoor attacks aim to insert backdoor into models by manipulating training datasets without controlling the training process of the target model. Existing attack methods mainly focus on designing triggers or fusion strategies between triggers and benign samples. However, they often randomly select samples to be poisoned, disregarding the varying importance of each poisoning sample in terms of backdoor injection. A recent selection strategy filters a fixed-size poisoning sample pool by recording forgetting events, but it fails to consider the remaining samples outside the pool from a global perspective. Moreover, computing forgetting events requires significant additional computing resources. Therefore, how to efficiently and effectively select poisoning samples from the entire dataset is an urgent problem in backdoor attacks.To address it, firstly, we introduce a poisoning mask into the regular backdoor training loss. We suppose that a backdoored model training with hard poisoning samples has a more backdoor effect on easy ones, which can be implemented by hindering the normal training process (\ie, maximizing loss \wrt mask). To further integrate it with normal training process, we then propose a learnable poisoning sample selection strategy to learn the mask together with the model parameters through a min-max optimization.Specifically, the outer loop aims to achieve the backdoor attack goal by minimizing the loss based on the selected samples, while the inner loop selects hard poisoning samples that impede this goal by maximizing the loss. After several rounds of adversarial training, we finally select effective poisoning samples with high contribution. Extensive experiments on benchmark datasets demonstrate the effectiveness and efficiency of our approach in boosting backdoor attack performance.
摘要
<> transtable text into Simplified Chinese.<>数据毒品基于后门攻击 aim to inject backdoors into models by manipulating training datasets without controlling the training process of the target model. Existing attack methods mainly focus on designing triggers or fusion strategies between triggers and benign samples. However, they often randomly select samples to be poisoned, disregarding the varying importance of each poisoning sample in terms of backdoor injection. A recent selection strategy filters a fixed-size poisoning sample pool by recording forgetting events, but it fails to consider the remaining samples outside the pool from a global perspective. Moreover, computing forgetting events requires significant additional computing resources. Therefore, how to efficiently and effectively select poisoning samples from the entire dataset is an urgent problem in backdoor attacks.To address this, we first introduce a poisoning mask into the regular backdoor training loss. We suppose that a backdoored model training with hard poisoning samples has a more backdoor effect on easy ones, which can be implemented by hindering the normal training process (ie, maximizing loss wrt mask). To further integrate it with the normal training process, we then propose a learnable poisoning sample selection strategy to learn the mask together with the model parameters through a min-max optimization. Specifically, the outer loop aims to achieve the backdoor attack goal by minimizing the loss based on the selected samples, while the inner loop selects hard poisoning samples that impede this goal by maximizing the loss. After several rounds of adversarial training, we finally select effective poisoning samples with high contribution. Extensive experiments on benchmark datasets demonstrate the effectiveness and efficiency of our approach in boosting backdoor attack performance.
On the Sensitivity of Deep Load Disaggregation to Adversarial Attacks
for: This paper is written to investigate the vulnerability of deep neural network-based non-intrusive load monitoring (NILM) algorithms to adversarial attacks, and to provide evidence for the potential impact of these attacks on energy management systems.
methods: The paper uses two commonly employed CNN-based NILM baselines, the Sequence-to-Sequence (S2S) and Sequence-to-Point (S2P) models, and applies an adversarial attack called the Fast Gradient Sign Method (FGSM) to perturb the input sequences fed into these models.
results: The paper finds that both NILM baselines are vulnerable to adversarial attacks, with the S2P model exhibiting a significant decline in the F1-score (an average of 20%) even with small amounts of noise. This suggests that these models may not be reliable for energy management systems in residential and industrial sectors.Abstract
Non-intrusive Load Monitoring (NILM) algorithms, commonly referred to as load disaggregation algorithms, are fundamental tools for effective energy management. Despite the success of deep models in load disaggregation, they face various challenges, particularly those pertaining to privacy and security. This paper investigates the sensitivity of prominent deep NILM baselines to adversarial attacks, which have proven to be a significant threat in domains such as computer vision and speech recognition. Adversarial attacks entail the introduction of imperceptible noise into the input data with the aim of misleading the neural network into generating erroneous outputs. We investigate the Fast Gradient Sign Method (FGSM), a well-known adversarial attack, to perturb the input sequences fed into two commonly employed CNN-based NILM baselines: the Sequence-to-Sequence (S2S) and Sequence-to-Point (S2P) models. Our findings provide compelling evidence for the vulnerability of these models, particularly the S2P model which exhibits an average decline of 20\% in the F1-score even with small amounts of noise. Such weakness has the potential to generate profound implications for energy management systems in residential and industrial sectors reliant on NILM models.
摘要
Representation Learning With Hidden Unit Clustering For Low Resource Speech Applications
results: 在 ZeroSpeech 2021 挑战中的不同任务上,以及在 TIMIT 数据集和 GramVaani 挑战 Hindi 数据集上的自动语音识别(ASR)应用中,提出的方法达到了状态的最佳结果。此外,在 ASR 应用中,HUC 表示进一步提高了与 Wav2vec、HuBERT 和 Best-RQ 等已知标准的比较。Abstract
The representation learning of speech, without textual resources, is an area of significant interest for many low resource speech applications. In this paper, we describe an approach to self-supervised representation learning from raw audio using a hidden unit clustering (HUC) framework. The input to the model consists of audio samples that are windowed and processed with 1-D convolutional layers. The learned "time-frequency" representations from the convolutional neural network (CNN) module are further processed with long short term memory (LSTM) layers which generate a contextual vector representation for every windowed segment. The HUC framework, allowing the categorization of the representations into a small number of phoneme-like units, is used to train the model for learning semantically rich speech representations. The targets consist of phoneme-like pseudo labels for each audio segment and these are generated with an iterative k-means algorithm. We explore techniques that improve the speaker invariance of the learned representations and illustrate the effectiveness of the proposed approach on two settings, i) completely unsupervised speech applications on the sub-tasks described as part of the ZeroSpeech 2021 challenge and ii) semi-supervised automatic speech recognition (ASR) applications on the TIMIT dataset and on the GramVaani challenge Hindi dataset. In these experiments, we achieve state-of-art results for various ZeroSpeech tasks. Further, on the ASR experiments, the HUC representations are shown to improve significantly over other established benchmarks based on Wav2vec, HuBERT and Best-RQ.
摘要
“对于无文本资源的语音识别,是一个具有很大的研究 интерес领域。在这篇文章中,我们描述了一种基于隐藏单位聚合(HUC)框架的自我监督表现学习方法,从原始数据中学习语音表现。输入模型包括对于对话时间轴的1-D卷积层处理的音频样本。从卷积神经网络(CNN)模块学习的“时间频率”表现被进一步处理,使用长期短记忆(LSTM)层生成每个窗口段的上下文vector表现。HUC框架,允许分类表现为一小数量的语音单元,用于训练模型从语音表现学习具有含义的Semantic Speech表现。目标包括每个语音段的phoneme-likepseudo标签,通过迭代k-means算法生成。我们探索提高话者不对称性的学习表现技术,并在两个设定下评估提案的效果:一、完全不监督语音应用程序中的ZeroSpeech 2021挑战中的多个子 зада项目,和二、半监督自动语音识别(ASR)应用程序中的TIMIT数据集和GramVaani挑战中的Hindi数据集。在这些实验中,我们 дости得了ZeroSpeech任务的国际最佳成绩,并且在ASR实验中,HUC表现明显提高了较之Wav2vec、HuBERT和Best-RQ参考。”
A Context-Aware Cutting Plane Selection Algorithm for Mixed-Integer Programming
results: 该论文在MIPLIB 2017 benchmark集上实现了5%的性能提升。Abstract
The current cut selection algorithm used in mixed-integer programming solvers has remained largely unchanged since its creation. In this paper, we propose a set of new cut scoring measures, cut filtering techniques, and stopping criteria, extending the current state-of-the-art algorithm and obtaining a 5\% performance improvement for SCIP over the MIPLIB 2017 benchmark set.
摘要
当前的割选算法在杂Integer编程解决器中一直保持不变,在这篇论文中,我们提出了一组新的割分得分、割选技术和停止标准,对SCIP进行扩展,并在MIPLIB 2017 benchark集上实现5%的性能提升。Note: "SCIP" stands for "Solving Constraint Satisfaction Problems" and "MIPLIB" stands for "Mixed-Integer Programming Library".
results: 这篇论文的结果表明,使用这种常数性预测方法可以保证 asymptotic normality 性和近似优化的 asymptotic variance 性,并且在多臂投机中可以保持非尽含 asymptotic normality 性。Abstract
Sequential data collection has emerged as a widely adopted technique for enhancing the efficiency of data gathering processes. Despite its advantages, such data collection mechanism often introduces complexities to the statistical inference procedure. For instance, the ordinary least squares (OLS) estimator in an adaptive linear regression model can exhibit non-normal asymptotic behavior, posing challenges for accurate inference and interpretation. In this paper, we propose a general method for constructing debiased estimator which remedies this issue. It makes use of the idea of adaptive linear estimating equations, and we establish theoretical guarantees of asymptotic normality, supplemented by discussions on achieving near-optimal asymptotic variance. A salient feature of our estimator is that in the context of multi-armed bandits, our estimator retains the non-asymptotic performance of the least square estimator while obtaining asymptotic normality property. Consequently, this work helps connect two fruitful paradigms of adaptive inference: a) non-asymptotic inference using concentration inequalities and b) asymptotic inference via asymptotic normality.
摘要
纵向数据收集已成为数据收集过程中广泛采用的技术。虽然它具有优点,但这种数据收集机制经常带来统计推断过程中的复杂性。例如,通用最小二乘(OLS)估计器在自适应线性回归模型中可能会表现出非正态极限行为,从而增加准确推断和解释的困难。在这篇论文中,我们提出一种通用的构建减偏估计器的方法。它基于适应线性估计方程的想法,并为我们提供了正负合理的非正态性和优化性的理论保证。我们的估计器在多臂投掷中具有非 asymptotic性的性能,同时具有 asymptotic normality 性质。因此,这种工作可以将两种有用的推断方法相连接起来:a) 非 asymptotic 推断使用集中不等式和 b) asymptotic 推断通过 asymptotic normality。
Scalable Deep Learning for RNA Secondary Structure Prediction
paper_authors: Jörg K. H. Franke, Frederic Runge, Frank Hutter
for: 本研究旨在提出一种深度学习模型,用于预测RNA的次STRUCTURE。
methods: 该模型使用轴对注意力和循环在隐藏空间进行设计,以提高性能。
results: 该方法在TS0benchmark数据集上达到了状态的最佳性能,并且超过了使用外部信息的方法。此外,实验表明,RNAformer可以学习RNA折叠过程的生物物理模型。Abstract
The field of RNA secondary structure prediction has made significant progress with the adoption of deep learning techniques. In this work, we present the RNAformer, a lean deep learning model using axial attention and recycling in the latent space. We gain performance improvements by designing the architecture for modeling the adjacency matrix directly in the latent space and by scaling the size of the model. Our approach achieves state-of-the-art performance on the popular TS0 benchmark dataset and even outperforms methods that use external information. Further, we show experimentally that the RNAformer can learn a biophysical model of the RNA folding process.
摘要
领域中的RNA次STRUCTURE预测技术已经做出了重要的进步,通过使用深度学习技术。在这项工作中,我们提出了RNAformer,一种简洁的深度学习模型,使用轴向注意力和重复在幂空间。我们通过设计模型的结构,直接在幂空间模型邻接矩阵,并通过增大模型的大小来提高性能。我们的方法在TS0测试集上达到了状态级性能,甚至超越了使用外部信息的方法。此外,我们实验表明RNAformer可以学习RNA折叠过程的生物物理模型。
Hybrid moderation in the newsroom: Recommending featured posts to content moderators
results: 研究发现,添加文本特征可以获得最佳分类效果,而内容Moderator对推荐的评论进行评估时,NDCG分数均为0.83。Abstract
Online news outlets are grappling with the moderation of user-generated content within their comment section. We present a recommender system based on ranking class probabilities to support and empower the moderator in choosing featured posts, a time-consuming task. By combining user and textual content features we obtain an optimal classification F1-score of 0.44 on the test set. Furthermore, we observe an optimum mean NDCG@5 of 0.87 on a large set of validation articles. As an expert evaluation, content moderators assessed the output of a random selection of articles by choosing comments to feature based on the recommendations, which resulted in a NDCG score of 0.83. We conclude that first, adding text features yields the best score and second, while choosing featured content remains somewhat subjective, content moderators found suitable comments in all but one evaluated recommendations. We end the paper by analyzing our best-performing model, a step towards transparency and explainability in hybrid content moderation.
摘要
在线新闻媒体面临用户生成内容的Moderation问题。我们提出一种基于排名类probability的推荐系统,以支持和强化Moderator在选择推荐文章中的决策。通过结合用户和文本内容特征,我们获得了优化的分类F1得分0.44。此外,我们在大量验证文章上观察到了最佳的NDCG@50.87。专业评估人员对一些随机选择的文章中的评论进行评估,结果显示了NDCG分数0.83。我们得出了两点结论:一、添加文本特征可以获得最好的分数;二、选择推荐内容仍然有一定的主观性,但Moderator在所有评估的推荐中都可以找到合适的评论。我们结束这篇论文,并进行了最高表现的模型的分析,这是一种对于混合内容Moderation的透明度和解释性的一步。
methods: 我们引入了HEAL-SWIN transformer,它将astrophysics和cosmology中使用的高度均匀的Hierarchical Equal Area iso-Latitude Pixelation(HEALPix)格子和Hierarchical Shifted-Window(SWIN)变换器相结合,以实现高效和灵活的模型,能够在高分辨率、扭曲free的球形数据上训练。在HEAL-SWIN中,HEALPix格子的嵌套结构用于执行覆盖和窗口操作,从而实现了一个简单的一维表示,以最小化计算开销。
results: 我们在semantic segmentation和depth regression任务上使用HEAL-SWIN模型,并在 sintetic和实际的汽车数据集上达到了superior表现。我们的代码可以在https://github.com/JanEGerken/HEAL-SWIN中找到。Abstract
High-resolution wide-angle fisheye images are becoming more and more important for robotics applications such as autonomous driving. However, using ordinary convolutional neural networks or vision transformers on this data is problematic due to projection and distortion losses introduced when projecting to a rectangular grid on the plane. We introduce the HEAL-SWIN transformer, which combines the highly uniform Hierarchical Equal Area iso-Latitude Pixelation (HEALPix) grid used in astrophysics and cosmology with the Hierarchical Shifted-Window (SWIN) transformer to yield an efficient and flexible model capable of training on high-resolution, distortion-free spherical data. In HEAL-SWIN, the nested structure of the HEALPix grid is used to perform the patching and windowing operations of the SWIN transformer, resulting in a one-dimensional representation of the spherical data with minimal computational overhead. We demonstrate the superior performance of our model for semantic segmentation and depth regression tasks on both synthetic and real automotive datasets. Our code is available at https://github.com/JanEGerken/HEAL-SWIN.
摘要
高分辨率宽角鱼眼图像在自动驾驶应用中变得日益重要。然而,使用常见的卷积神经网络或视Transformer在这些数据上是困难的,因为将平面上的投影和变形损失引入到模型中。我们介绍了HEAL-SWIN transformer,它将astrophysics和cosmology中使用的高度均匀的 Hierarchical Equal Area iso-Latitude Pixelation(HEALPix)格子与 Hierarchical Shifted-Window(SWIN)transformer结合,生成一种高效和灵活的模型,可以在高分辨率、变形free的球形数据上训练。在HEAL-SWIN中,HEALPix格子的嵌套结构用于执行覆盖和窗口操作,从而实现一个具有最小计算开销的一维表示形式。我们在semantic segmentation和深度回归任务中示出了HEAL-SWIN模型的优秀性能,并提供了实际汽车数据集和Synthetic数据集的实验结果。代码可以在https://github.com/JanEGerken/HEAL-SWIN上获取。
Solving higher-order Lane-Emden-Fowler type equations using physics-informed neural networks: benchmark tests comparing soft and hard constraints
results: 论文分析了两种PINNs技术的变体,并对其进行比较。首先,使用最小化程序来约束总损失函数的神经网络,其中方程 residual 被视为物理学权重,并与训练数据损失相加。其次,使用特定的试解方法来确保这些条件,以便满足微分方程。Abstract
In this paper, numerical methods using Physics-Informed Neural Networks (PINNs) are presented with the aim to solve higher-order ordinary differential equations (ODEs). Indeed, this deep-learning technique is successfully applied for solving different classes of singular ODEs, namely the well known second-order Lane-Emden equations, third order-order Emden-Fowler equations, and fourth-order Lane-Emden-Fowler equations. Two variants of PINNs technique are considered and compared. First, a minimization procedure is used to constrain the total loss function of the neural network, in which the equation residual is considered with some weight to form a physics-based loss and added to the training data loss that contains the initial/boundary conditions. Second, a specific choice of trial solutions ensuring these conditions as hard constraints is done in order to satisfy the differential equation, contrary to the first variant based on training data where the constraints appear as soft ones. Advantages and drawbacks of PINNs variants are highlighted.
摘要
在本文中,使用物理学信息泛化神经网络(PINNs)的数字方法被提出,以解决更高阶常微分方程(ODEs)。这种深度学习技术成功地应用于不同类型的缺陷ODEs,包括著名的第二阶 Lane-Emden方程、第三阶 Emden-Fowler方程和第四阶 Lane-Emden-Fowler方程。文中考虑了两种PINNs技术的变体,并进行比较。首先,使用最小化过程来约束神经网络总损失函数,其中对等式剩余的权重加到了训练数据损失函数中,包含初始/边界条件。其次,通过特定的试解方式来确保这些条件,而不是通过训练数据来强制满足这些条件。文中高亮了PINNs变体的优势和缺点。
Similarity-based Memory Enhanced Joint Entity and Relation Extraction
paper_authors: Witold Kosciukiewicz, Mateusz Wojcik, Tomasz Kajdanowicz, Adam Gonczarek
for: joint entity and relation extraction
methods: bidirectional memory-like dependency between tasks
results: outperforms existing methods, achieves state-of-the-art results on BioCreative V CDR corpusAbstract
Document-level joint entity and relation extraction is a challenging information extraction problem that requires a unified approach where a single neural network performs four sub-tasks: mention detection, coreference resolution, entity classification, and relation extraction. Existing methods often utilize a sequential multi-task learning approach, in which the arbitral decomposition causes the current task to depend only on the previous one, missing the possible existence of the more complex relationships between them. In this paper, we present a multi-task learning framework with bidirectional memory-like dependency between tasks to address those drawbacks and perform the joint problem more accurately. Our empirical studies show that the proposed approach outperforms the existing methods and achieves state-of-the-art results on the BioCreative V CDR corpus.
摘要
文档级联合实体和关系抽取是一个复杂的信息抽取问题,需要一个统一的方法,其中单个神经网络执行四个子任务:提及检测、核心归一化、实体分类和关系抽取。现有方法通常采用顺序多任务学习方法,其中当前任务只依赖于前一个任务,缺失可能存在更复杂的关系 между them。在本文中,我们提出了一种多任务学习框架,其中任务之间具有双向记忆型的依赖关系,以解决这些缺陷并更加准确地解决联合问题。我们的实验表明,我们的方法在BioCreative V CDR corpus上达到了状态之 искусственный智能的最佳结果。
3D Shape-Based Myocardial Infarction Prediction Using Point Cloud Classification Networks
results: 我们在1068名UK Biobank试验者身上进行了预现MI检测和新生MI预测,比较了临床benchmark,获得了13%和5%的提升。此外,我们分析了每个心脏 ventricle 和心脏阶段的角色,并进行了静止分析,描述了MI结果的 morphological 和 physiological 模式。Abstract
Myocardial infarction (MI) is one of the most prevalent cardiovascular diseases with associated clinical decision-making typically based on single-valued imaging biomarkers. However, such metrics only approximate the complex 3D structure and physiology of the heart and hence hinder a better understanding and prediction of MI outcomes. In this work, we investigate the utility of complete 3D cardiac shapes in the form of point clouds for an improved detection of MI events. To this end, we propose a fully automatic multi-step pipeline consisting of a 3D cardiac surface reconstruction step followed by a point cloud classification network. Our method utilizes recent advances in geometric deep learning on point clouds to enable direct and efficient multi-scale learning on high-resolution surface models of the cardiac anatomy. We evaluate our approach on 1068 UK Biobank subjects for the tasks of prevalent MI detection and incident MI prediction and find improvements of ~13% and ~5% respectively over clinical benchmarks. Furthermore, we analyze the role of each ventricle and cardiac phase for 3D shape-based MI detection and conduct a visual analysis of the morphological and physiological patterns typically associated with MI outcomes.
摘要
我ocardial infarction (MI) 是心血管疾病中最常见的一种,且相关的临床决策通常基于单个图像生物标志物。然而,这些指标只是心脏的复杂三维结构和生物学的估算,因此难以更好地理解和预测MI结果。在这种工作中,我们研究了使用完整的三维卡达形状来提高MI事件的检测。为此,我们提出了一个完全自动多步骤管道,包括三维卡达形状重建步骤和点云分类网络。我们的方法利用了最新的点云深度学习的进步,以实现直接和高效地多级学习高分辨率表面模型的卡达形状。我们对UK Biobank的1068名参与者进行了预现MI检测和新生MI预测两个任务,并发现我们的方法与临床标准差分别为~13%和~5%。此外,我们还分析了每个肺和律动期对三维形状基本MI检测的作用,并进行了MI结果的Visual分析,描述了通常与MI结果相关的形态和生理学特征。
Reinforcement Learning with Frontier-Based Exploration via Autonomous Environment
results: 提高 ExploreORB 的探索过程,实现更加精确的地图建模Here’s a more detailed explanation of each point:
for: The paper aims to improve the exploration and mapping process of autonomous robots by combining Visual-Graph SLAM with reinforcement learning.
methods: The proposed algorithm uses frontier-based exploration to detect unexplored areas and reinforcement learning to optimize the robot’s movement. The algorithm also integrates the robot’s sensory data using Graph SLAM to build an accurate map of the environment.
results: The proposed approach is expected to improve the efficiency and accuracy of ExploreORB by optimizing the exploration process of frontiers and building a more accurate map. The effectiveness of the proposed approach will be evaluated through experiments in various virtual environments using Gazebo.Abstract
Active Simultaneous Localisation and Mapping (SLAM) is a critical problem in autonomous robotics, enabling robots to navigate to new regions while building an accurate model of their surroundings. Visual SLAM is a popular technique that uses virtual elements to enhance the experience. However, existing frontier-based exploration strategies can lead to a non-optimal path in scenarios where there are multiple frontiers with similar distance. This issue can impact the efficiency and accuracy of Visual SLAM, which is crucial for a wide range of robotic applications, such as search and rescue, exploration, and mapping. To address this issue, this research combines both an existing Visual-Graph SLAM known as ExploreORB with reinforcement learning. The proposed algorithm allows the robot to learn and optimize exploration routes through a reward-based system to create an accurate map of the environment with proper frontier selection. Frontier-based exploration is used to detect unexplored areas, while reinforcement learning optimizes the robot's movement by assigning rewards for optimal frontier points. Graph SLAM is then used to integrate the robot's sensory data and build an accurate map of the environment. The proposed algorithm aims to improve the efficiency and accuracy of ExploreORB by optimizing the exploration process of frontiers to build a more accurate map. To evaluate the effectiveness of the proposed approach, experiments will be conducted in various virtual environments using Gazebo, a robot simulation software. Results of these experiments will be compared with existing methods to demonstrate the potential of the proposed approach as an optimal solution for SLAM in autonomous robotics.
摘要
活动同域地图Localization and Mapping(SLAM)是自主机器人领域的关键问题,帮助机器人在新区域 navigation 并建立精准的环境模型。视觉SLAM 是一种广泛使用的技术,通过虚拟元素进行增强。然而,现有的边境基于探索策略可能会导致非优化的路径,特别是在多个边境具有相似距离的情况下。这个问题可能会影响视觉SLAM 的效率和准确性,这些都是自主机器人应用的关键。为解决这个问题,本研究将 combine 现有的视觉图SLAM 知名为ExploreORB 与 reinforcement learning。提出的算法使得机器人通过奖励系统来学习和优化探索路径,以创建精准的环境模型。边境基于探索是用于检测未探索区域,而奖励学习则用于优化机器人的移动。图SLAM 然后用于将机器人的感知数据集成到环境中,建立精准的地图。提出的算法的目标是通过优化探索过程来提高ExploreORB 的效率和准确性。为评估提出的方法的有效性,将在不同的虚拟环境中进行实验,使用Gazebo 机器人 simulate 软件。实验结果将与现有方法进行比较,以证明提出的方法的潜在优势。
A Topical Approach to Capturing Customer Insight In Social Media
For: This research aims to address the challenge of fully unsupervised topic extraction in noisy, Big Data contexts.* Methods: The research uses three approaches built on the Variational Autoencoder framework: the Embedded Dirichlet Process, the Embedded Hierarchical Dirichlet Process, and the time-aware Dynamic Embedded Dirichlet Process. These nonparametric approaches determine word embeddings and topic embeddings without requiring transfer learning, but with the possibility of knowledge transfer.* Results: The research shows that the proposed models achieve equal to better performance than state-of-the-art methods on benchmark and automotive industry-related datasets from a real-world use case, and that improved evaluation metrics are needed in the field of topic modeling.Here’s the Simplified Chinese text format you requested:* For: 这项研究旨在解决大数据上不监督的话题抽取问题。* Methods: 这项研究使用了基于自适应变换器框架的三种方法:嵌入 Dirichlet 过程、嵌入层次 Dirichlet 过程和时态 Dynamic 嵌入 Dirichlet 过程。这些非 Parametric 方法可以不需要传输学习,但可以进行知识传输。* Results: 研究显示,提案的模型在benchmark和汽车业相关的数据集上实现了等于或更好的性能,并且在话题分析领域需要更好的评价指标。Abstract
The age of social media has opened new opportunities for businesses. This flourishing wealth of information is outside traditional channels and frameworks of classical marketing research, including that of Marketing Mix Modeling (MMM). Textual data, in particular, poses many challenges that data analysis practitioners must tackle. Social media constitute massive, heterogeneous, and noisy document sources. Industrial data acquisition processes include some amount of ETL. However, the variability of noise in the data and the heterogeneity induced by different sources create the need for ad-hoc tools. Put otherwise, customer insight extraction in fully unsupervised, noisy contexts is an arduous task. This research addresses the challenge of fully unsupervised topic extraction in noisy, Big Data contexts. We present three approaches we built on the Variational Autoencoder framework: the Embedded Dirichlet Process, the Embedded Hierarchical Dirichlet Process, and the time-aware Dynamic Embedded Dirichlet Process. These nonparametric approaches concerning topics present the particularity of determining word embeddings and topic embeddings. These embeddings do not require transfer learning, but knowledge transfer remains possible. We test these approaches on benchmark and automotive industry-related datasets from a real-world use case. We show that our models achieve equal to better performance than state-of-the-art methods and that the field of topic modeling would benefit from improved evaluation metrics.
摘要
“社交媒体时代对企业带来了新的机遇。这些外部 классиical 市场调查的数据 streams 和框架,包括市场混合模型(MMM)。文本数据特别是 poses 许多挑战,资料分析实践者必须解决。社交媒体是巨大、不同和噪音的文档来源。工业资料收集过程包括一定的 ETL。然而,数据中的噪音和不同来源导致需要专门的工具。即使在完全无监督的情况下,客户情感提取是一项艰辛的任务。本研究面对完全无监督的主题抽象在噪音大数据情况下的挑战。我们提出了三种基于自适应抽象框架的方法:嵌入Dirichlet过程、嵌入层次Dirichlet过程和时间意识的动态嵌入Dirichlet过程。这些非 Parametric 方法对主题进行了特殊的嵌入,并不需要转移学习。我们在 benchmark 和汽车业相关的数据集上进行了实验,并证明了我们的模型在与现有方法比较之下具有equal 或更好的性能。这显示了主题抽象领域对于改进评估指标的需求。”
Frequency Domain Adversarial Training for Robust Volumetric Medical Segmentation
for: This paper is written for researchers and practitioners working in the field of medical image segmentation, particularly those interested in the robustness of deep learning models against adversarial attacks.
methods: The paper presents a 3D frequency domain adversarial attack for volumetric medical image segmentation models, which is a novel approach that exploits the vulnerability of these models to frequency-based attacks. The authors also propose a frequency domain adversarial training approach to optimize a robust model against both voxel and frequency domain attacks.
results: The authors demonstrate the effectiveness of their proposed attack and training approach through experiments on several publicly available datasets. They show that their approach can be used to launch successful attacks on state-of-the-art volumetric medical image segmentation models, and that the proposed frequency consistency loss achieves a better tradeoff between model performance on clean and adversarial samples.Abstract
It is imperative to ensure the robustness of deep learning models in critical applications such as, healthcare. While recent advances in deep learning have improved the performance of volumetric medical image segmentation models, these models cannot be deployed for real-world applications immediately due to their vulnerability to adversarial attacks. We present a 3D frequency domain adversarial attack for volumetric medical image segmentation models and demonstrate its advantages over conventional input or voxel domain attacks. Using our proposed attack, we introduce a novel frequency domain adversarial training approach for optimizing a robust model against voxel and frequency domain attacks. Moreover, we propose frequency consistency loss to regulate our frequency domain adversarial training that achieves a better tradeoff between model's performance on clean and adversarial samples. Code is publicly available at https://github.com/asif-hanif/vafa.
摘要
必须确保深度学习模型在重要应用领域如医疗中的稳定性。Recent Advances in deep learning have improved the performance of volumetric medical image segmentation models, but these models cannot be deployed for real-world applications immediately due to their vulnerability to adversarial attacks. We present a 3D frequency domain adversarial attack for volumetric medical image segmentation models and demonstrate its advantages over conventional input or voxel domain attacks. Using our proposed attack, we introduce a novel frequency domain adversarial training approach for optimizing a robust model against voxel and frequency domain attacks. Moreover, we propose frequency consistency loss to regulate our frequency domain adversarial training that achieves a better tradeoff between model's performance on clean and adversarial samples. codes are publicly available at https://github.com/asif-hanif/vafa.
for: This paper is written for studying a family of online decision problems called the $\mathbf{m}$-Multi-Armed Bandit (MAB) problem, which interpolates between the classical MAB problem and the Bandit with Advice Incentive (BAI) problem.
methods: The paper uses techniques from online learning, including the concept of minimax regret, to develop an optimal PAC algorithm for the pure exploration version of the $\mathbf{m}$-MAB problem, called the $\mathbf{m}$-BAI problem.
results: The paper proves tight minimax regret bounds for the $\mathbf{m}$-MAB problem and shows that the minimum number of pulls for an $(\epsilon,0.05)$-PAC algorithm of the $\mathbf{m}$-BAI problem is $\Theta\left(\frac{1}{\epsilon^2}\cdot \sum_{k=1}^K\log (m_k+1)\right)$. Additionally, the paper extends the results to a more general setting, the bandit with graph feedback, and obtains tight minimax regret bounds for several families of feedback graphs.Abstract
Learning with expert advice and multi-armed bandit are two classic online decision problems which differ on how the information is observed in each round of the game. We study a family of problems interpolating the two. For a vector $\mathbf{m}=(m_1,\dots,m_K)\in \mathbb{N}^K$, an instance of $\mathbf{m}$-MAB indicates that the arms are partitioned into $K$ groups and the $i$-th group contains $m_i$ arms. Once an arm is pulled, the losses of all arms in the same group are observed. We prove tight minimax regret bounds for $\mathbf{m}$-MAB and design an optimal PAC algorithm for its pure exploration version, $\mathbf{m}$-BAI, where the goal is to identify the arm with minimum loss with as few rounds as possible. We show that the minimax regret of $\mathbf{m}$-MAB is $\Theta\left(\sqrt{T\sum_{k=1}^K\log (m_k+1)}\right)$ and the minimum number of pulls for an $(\epsilon,0.05)$-PAC algorithm of $\mathbf{m}$-BAI is $\Theta\left(\frac{1}{\epsilon^2}\cdot \sum_{k=1}^K\log (m_k+1)\right)$. Both our upper bounds and lower bounds for $\mathbf{m}$-MAB can be extended to a more general setting, namely the bandit with graph feedback, in terms of the clique cover and related graph parameters. As consequences, we obtained tight minimax regret bounds for several families of feedback graphs.
摘要
学习与专家建议以及多臂帮手是两个经典的在线决策问题,它们在每个回合中收集信息的方式不同。我们研究了这两个问题之间的混合问题。对于一个vector $\mathbf{m} = (m_1, \ldots, m_K) \in \mathbb{N}^K$, $\mathbf{m}$-MAB问题表示将枪分成$K$个组,每个组有$m_i$枪,一旦某个枪被pull,则 Observation是所有组内的枪的loss。我们证明了$\mathbf{m}$-MAB问题的最小最大误差 bound是$\Theta\left(\sqrt{T\sum_{k=1}^K\log (m_k+1)}\right)$,并设计了一个优化的PAC算法,以实现$\mathbf{m}$-BAI问题的纯exploration版本,其目标是在最少的回合数中认定枪loss最小的枪。我们显示了这个问题的最小误差 bound是$\Theta\left(\frac{1}{\epsilon^2}\cdot \sum_{k=1}^K\log (m_k+1)\right)$。我们的Upper bound和Lower bound都可以推广到一个更通用的设定,即带有图回报的bandit问题,包括clip cover和相关的图参数。作为结果,我们得到了一些family of feedback graphs的 tight最小最大误差 bound。
Visual Explanations with Attributions and Counterfactuals on Time Series Classification
results: 我们的研究实现了三个用例,以验证我们的技术能够帮助用户(1)探索数据变换和特征相关性,(2)找到模型行为和决策界限,以及(3)了解模型的错误原因。Abstract
With the rising necessity of explainable artificial intelligence (XAI), we see an increase in task-dependent XAI methods on varying abstraction levels. XAI techniques on a global level explain model behavior and on a local level explain sample predictions. We propose a visual analytics workflow to support seamless transitions between global and local explanations, focusing on attributions and counterfactuals on time series classification. In particular, we adapt local XAI techniques (attributions) that are developed for traditional datasets (images, text) to analyze time series classification, a data type that is typically less intelligible to humans. To generate a global overview, we apply local attribution methods to the data, creating explanations for the whole dataset. These explanations are projected onto two dimensions, depicting model behavior trends, strategies, and decision boundaries. To further inspect the model decision-making as well as potential data errors, a what-if analysis facilitates hypothesis generation and verification on both the global and local levels. We constantly collected and incorporated expert user feedback, as well as insights based on their domain knowledge, resulting in a tailored analysis workflow and system that tightly integrates time series transformations into explanations. Lastly, we present three use cases, verifying that our technique enables users to (1)~explore data transformations and feature relevance, (2)~identify model behavior and decision boundaries, as well as, (3)~the reason for misclassifications.
摘要
随着Explainable Artificial Intelligence(XAI)的需求增长,我们发现在不同层次的XAI技术中,任务取决的XAI方法在增加。XAI技术在全球层次解释模型行为,而在本地层次解释特定采样预测结果。我们提议一个可见分析工作流程,以便轻松地转换到全球和本地解释之间。具体来说,我们将本地XAI技术(归因),原本是为传统数据(图像、文本)开发的,应用于时间序列分类。为了生成全局概述,我们将本地归因方法应用于数据,并生成整个数据集的解释。这些解释将被投影到两个维度上,描述模型行为趋势、策略和决策边界。此外,我们还提供了一种“什么如果”分析,以便在全球和本地层次进行假设生成和验证。我们一直收集和 интегрирова了专家用户反馈,以及它们的领域知识,从而提供了适应时间序列转换的分析工作流程和系统。最后,我们提供了三个使用案例,证明了我们的技术可以让用户(1)探索数据转换和特征相关性,(2)描述模型行为和决策边界,以及(3)知道错误的原因。
Mitigating Adversarial Vulnerability through Causal Parameter Estimation by Adversarial Double Machine Learning
results: ADML 在不同 CNN 和 Transformer 架构上进行了广泛的实验,并证明了它可以大幅提高鲁棒性并解决观察到的问题。Abstract
Adversarial examples derived from deliberately crafted perturbations on visual inputs can easily harm decision process of deep neural networks. To prevent potential threats, various adversarial training-based defense methods have grown rapidly and become a de facto standard approach for robustness. Despite recent competitive achievements, we observe that adversarial vulnerability varies across targets and certain vulnerabilities remain prevalent. Intriguingly, such peculiar phenomenon cannot be relieved even with deeper architectures and advanced defense methods. To address this issue, in this paper, we introduce a causal approach called Adversarial Double Machine Learning (ADML), which allows us to quantify the degree of adversarial vulnerability for network predictions and capture the effect of treatments on outcome of interests. ADML can directly estimate causal parameter of adversarial perturbations per se and mitigate negative effects that can potentially damage robustness, bridging a causal perspective into the adversarial vulnerability. Through extensive experiments on various CNN and Transformer architectures, we corroborate that ADML improves adversarial robustness with large margins and relieve the empirical observation.
摘要
deep neural networks 的决策过程可以轻松受到来自明确设计的干扰的攻击,这些攻击被称为对抗示例。为了防止这些潜在的威胁,各种基于对抗训练的防御方法在深度学习领域 быстро发展并成为了标准方法。 despite recent competitive achievements, we observe that adversarial vulnerability varies across targets and certain vulnerabilities remain prevalent. interestingly, such peculiar phenomenon cannot be relieved even with deeper architectures and advanced defense methods. to address this issue, in this paper, we introduce a causal approach called Adversarial Double Machine Learning (ADML), which allows us to quantify the degree of adversarial vulnerability for network predictions and capture the effect of treatments on the outcome of interests. ADML can directly estimate the causal parameter of adversarial perturbations per se and mitigate negative effects that can potentially damage robustness, bridging a causal perspective into the adversarial vulnerability. through extensive experiments on various CNN and Transformer architectures, we corroborate that ADML improves adversarial robustness with large margins and relieve the empirical observation.
Knowledge Boosting: Rethinking Medical Contrastive Vision-Language Pre-Training
results: 该 paper 的实验表明,通过使用 KoBo 框架,可以在 eight 个任务中 achieve 比或更好的性能,包括分类、 segmentation、 retrieval 和 semantic relatedness。Abstract
The foundation models based on pre-training technology have significantly advanced artificial intelligence from theoretical to practical applications. These models have facilitated the feasibility of computer-aided diagnosis for widespread use. Medical contrastive vision-language pre-training, which does not require human annotations, is an effective approach for guiding representation learning using description information in diagnostic reports. However, the effectiveness of pre-training is limited by the large-scale semantic overlap and shifting problems in medical field. To address these issues, we propose the Knowledge-Boosting Contrastive Vision-Language Pre-training framework (KoBo), which integrates clinical knowledge into the learning of vision-language semantic consistency. The framework uses an unbiased, open-set sample-wise knowledge representation to measure negative sample noise and supplement the correspondence between vision-language mutual information and clinical knowledge. Extensive experiments validate the effect of our framework on eight tasks including classification, segmentation, retrieval, and semantic relatedness, achieving comparable or better performance with the zero-shot or few-shot settings. Our code is open on https://github.com/ChenXiaoFei-CS/KoBo.
摘要
基于预训技术的基础模型在人工智能从理论到实用应用方面取得了 significi cant进步。这些模型使得计算机辅助诊断在广泛应用中变得可能。医学对比视语言预训,不需要人工注释,是一种有效的方法来导引描述信息在诊断报告中学习表示学习。然而,预训效果受到医学领域的大规模 semantically overlap和 shift问题的限制。为了解决这些问题,我们提出了知识增强对比视语言预训框架(KoBo),它将临床知识 integrate到视语言Semantic consistency学习中。该框架使用无偏、开放集样知识表示来度量负样噪声,并补做视语言相互信息和临床知识之间的对应关系。我们的实验证明,我们的框架在八个任务中,包括分类、 segmentation、 retrieve 和 Semantic relatedness 等八个任务中,可以 achieve comparable or better performance with zero-shot or few-shot setting。我们的代码可以在 上找到。
The Role of Transparency in Repeated First-Price Auctions with Unknown Valuations
results: 本文完整地 caracterize了在不同环境下的最小化恨觉的最佳策略,并且发现了这些策略与拍卖过程中竞争对手的信息披露程度有直接的关系。Abstract
We study the problem of regret minimization for a single bidder in a sequence of first-price auctions where the bidder knows the item's value only if the auction is won. Our main contribution is a complete characterization, up to logarithmic factors, of the minimax regret in terms of the auction's transparency, which regulates the amount of information on competing bids disclosed by the auctioneer at the end of each auction. Our results hold under different assumptions (stochastic, adversarial, and their smoothed variants) on the environment generating the bidder's valuations and competing bids. These minimax rates reveal how the interplay between transparency and the nature of the environment affects how fast one can learn to bid optimally in first-price auctions.
摘要
我们研究一个单个投标者在一系列的首价拍卖中减迟 regret 的问题。我们的主要贡献是对拍卖的透明度(控制拍卖结束时公布竞拍信息的量)进行完全 caracterization,几乎很好地快速度量。我们的结果适用于不同的环境(随机、敌意、粗糙等)生成投标者的价值和竞拍行为。这些最小最大 regret 速率 revelas 拍卖透明度和环境的交互如何影响在首价拍卖中如何快速地学习优胜投标策略。
Ed-Fed: A generic federated learning framework with resource-aware client selection for edge devices
results: 本研究的结果显示,提案的方法可以对 FL 中的等待时间进行了重要优化,比对于传统随机 Client 选择方法更好。Abstract
Federated learning (FL) has evolved as a prominent method for edge devices to cooperatively create a unified prediction model while securing their sensitive training data local to the device. Despite the existence of numerous research frameworks for simulating FL algorithms, they do not facilitate comprehensive deployment for automatic speech recognition tasks on heterogeneous edge devices. This is where Ed-Fed, a comprehensive and generic FL framework, comes in as a foundation for future practical FL system research. We also propose a novel resource-aware client selection algorithm to optimise the waiting time in the FL settings. We show that our approach can handle the straggler devices and dynamically set the training time for the selected devices in a round. Our evaluation has shown that the proposed approach significantly optimises waiting time in FL compared to conventional random client selection methods.
摘要
federated 学习(FL)已经发展为边缘设备共同创建一个统一预测模型的著名方法,同时保护边缘设备的敏感训练数据本地。尽管现有许多研究框架用于模拟FL算法,但它们不能够全面实现自动听力识别任务中的各种各样边缘设备的自动化部署。这是 гдеEd-Fed,一个通用和特定的FL框架,作为未来实用FL系统研究的基础。我们还提出了一种资源意识客户端选择算法,以优化FL设置中的等待时间。我们显示,我们的方法可以处理慢跑器设备和动态设置每个选择的训练时间。我们的评估表明,我们的方法在FL中显著优化等待时间,比传统随机客户端选择方法更好。
Omnipotent Adversarial Training for Unknown Label-noisy and Imbalanced Datasets
methods: 提出了两种创新的方法来解决噪声和数据不均衡问题:首先引入了一个 oracle into the adversarial training process,帮助模型学习正确的数据-标签 conditional distribution;其次,提出了logits adjustment adversarial training方法,帮助模型学习bayes-优化分布。
results: OAT在复杂的数据噪声和标签噪声场景下表现出较高的清洁率提升(超过20%)和Robustness提升(超过10%),比其他基elines表现更优。代码可以在https://github.com/GuanlinLee/OAT找到。Abstract
Adversarial training is an important topic in robust deep learning, but the community lacks attention to its practical usage. In this paper, we aim to resolve a real-world application challenge, i.e., training a model on an imbalanced and noisy dataset to achieve high clean accuracy and robustness, with our proposed Omnipotent Adversarial Training (OAT). Our strategy consists of two innovative methodologies to address the label noise and data imbalance in the training set. We first introduce an oracle into the adversarial training process to help the model learn a correct data-label conditional distribution. This carefully-designed oracle can provide correct label annotations for adversarial training. We further propose logits adjustment adversarial training to overcome the data imbalance challenge, which can help the model learn a Bayes-optimal distribution. Our comprehensive evaluation results show that OAT outperforms other baselines by more than 20% clean accuracy improvement and 10% robust accuracy improvement under the complex combinations of data imbalance and label noise scenarios. The code can be found in https://github.com/GuanlinLee/OAT.
摘要
<系统环境>文章描述了一种强化学习的实际应用挑战,即在受损和噪声影响的数据集上训练模型,以达到高级别的清洁精度和Robustness。作者提出了一种名为“全能对抗训练”(OAT)的方法,以解决这个挑战。该方法包括两种创新的方法,用于Addressing the label noise and data imbalance in the training set。首先,作者引入了一个“oracle”机制,用于帮助模型学习正确的数据-标签 conditional distribution。这个特制的oracle可以提供正确的标签注释 для对抗训练。其次,作者提出了Logits adjustment adversarial training,用于超越数据不均衡挑战,该方法可以帮助模型学习bayes优化的分布。作者的全面评估结果表明,OAT在复杂的数据不均衡和标签噪声场景下表现出较高的纯净精度和Robustness,与基线相比,OAT的纯净精度提高了 más de 20%,Robustness提高了 más de 10%。code可以在https://github.com/GuanlinLee/OAT中找到。(Note: The translation is done using Google Translate and may not be perfect. Please let me know if you need any further assistance.)
Controlling dynamical systems to complex target states using machine learning: next-generation vs. classical reservoir computing
results: 在带有混沌参数的 Lorenz 系统中强制实现间歇性动力,经典散列计算机器学习可以很好地完成这个任务,而下一代散列计算机器学习则在具有有限数据量情况下表现更出色。Abstract
Controlling nonlinear dynamical systems using machine learning allows to not only drive systems into simple behavior like periodicity but also to more complex arbitrary dynamics. For this, it is crucial that a machine learning system can be trained to reproduce the target dynamics sufficiently well. On the example of forcing a chaotic parametrization of the Lorenz system into intermittent dynamics, we show first that classical reservoir computing excels at this task. In a next step, we compare those results based on different amounts of training data to an alternative setup, where next-generation reservoir computing is used instead. It turns out that while delivering comparable performance for usual amounts of training data, next-generation RC significantly outperforms in situations where only very limited data is available. This opens even further practical control applications in real world problems where data is restricted.
摘要
使用机器学习控制非线性动力系统可以不仅将系统驱动到简单的行为如周期性,还可以将系统驱动到更加复杂的任意动力。为此,机器学习系统必须能够足够优化目标动力。在 Lorenz 系统的启动实验中,我们展示了 класичний 预设 Reservoir Computing 能够优化目标动力。在下一步,我们比较了这些结果,根据不同的训练数据量来进行比较。结果显示,对于通常的训练数据量, класичний Reservoir Computing 和 next-generation Reservoir Computing 的性能相似。但是,在仅有有限数据的情况下,next-generation RC 却能够优化性能。这开启了对实际世界问题中的应用。
Adversarial Training Over Long-Tailed Distribution
methods: 该 paper propose了一种新的对抗训练框架,即重新平衡对抗训练(REAT),它包括一种新的训练策略和一种特制的罚函数。
results: 评估结果表明,REAT可以有效提高模型的Robustness和保持模型的干净精度。代码可以在https://github.com/GuanlinLee/REAT找到。Abstract
In this paper, we study adversarial training on datasets that obey the long-tailed distribution, which is practical but rarely explored in previous works. Compared with conventional adversarial training on balanced datasets, this process falls into the dilemma of generating uneven adversarial examples (AEs) and an unbalanced feature embedding space, causing the resulting model to exhibit low robustness and accuracy on tail data. To combat that, we propose a new adversarial training framework -- Re-balancing Adversarial Training (REAT). This framework consists of two components: (1) a new training strategy inspired by the term effective number to guide the model to generate more balanced and informative AEs; (2) a carefully constructed penalty function to force a satisfactory feature space. Evaluation results on different datasets and model structures prove that REAT can effectively enhance the model's robustness and preserve the model's clean accuracy. The code can be found in https://github.com/GuanlinLee/REAT.
摘要
在这篇论文中,我们研究了对具有长尾分布的数据集进行反击训练,这种情况在前一些研究中很少被考虑。与传统的反击训练中的平衡数据集相比,这个过程会陷入生成不均匀的反击示例(AE)和不均匀的特征空间,导致模型的robustness和纯度在尾部数据上降低。为了解决这个问题,我们提出了一个新的反击训练框架——重新平衡反击训练(REAT)。这个框架包括两个组成部分:(1)一种基于有效数量的新训练策略,以引导模型生成更加均匀和有用的反击示例(AE);(2)一种特殊构造的罚函数,以强制模型的特征空间具有满意的性质。经过不同数据集和模型结构的评估,我们发现REAT可以有效地提高模型的robustness,同时保持模型的纯度。代码可以在https://github.com/GuanlinLee/REAT中找到。
Benchmarks and Custom Package for Electrical Load Forecasting
paper_authors: Zhixian Wang, Qingsong Wen, Chaoli Zhang, Liang Sun, Leandro Von Krannichfeldt, Yi Wang
for: Load forecasting is of great significance in the power industry, as it can provide a reference for subsequent tasks such as power grid dispatch, thus bringing huge economic benefits.
methods: The paper provides a comprehensive load forecasting archive, including load domain-specific feature engineering to help forecasting models better model load data. The paper also customizes the loss function based on the forecasting error, integrating it into the forecasting framework.
results: The paper conducts extensive experiments on load data at different levels, providing a reference for researchers to compare different load forecasting models.Abstract
Load forecasting is of great significance in the power industry as it can provide a reference for subsequent tasks such as power grid dispatch, thus bringing huge economic benefits. However, there are many differences between load forecasting and traditional time series forecasting. On the one hand, load forecasting aims to minimize the cost of subsequent tasks such as power grid dispatch, rather than simply pursuing prediction accuracy. On the other hand, the load is largely influenced by many external factors, such as temperature or calendar variables. In addition, the scale of predictions (such as building-level loads and aggregated-level loads) can also significantly impact the predicted results. In this paper, we provide a comprehensive load forecasting archive, which includes load domain-specific feature engineering to help forecasting models better model load data. In addition, different from the traditional loss function which only aims for accuracy, we also provide a method to customize the loss function based on the forecasting error, integrating it into our forecasting framework. Based on this, we conducted extensive experiments on load data at different levels, providing a reference for researchers to compare different load forecasting models.
摘要
Load forecasting在电力业中具有很大的重要性,因为它可以提供后续任务 such as 电力网络调度的参考,从而带来巨大的经济效益。然而, load forecasting 和传统的时间序列预测有很多不同之处。一方面, load forecasting 的目标是最小化后续任务 such as 电力网络调度的成本,而不仅仅是追求预测精度。另一方面,荷是受到许多外部因素的影响,如温度或历法变量。此外,预测的规模(如建筑级别的荷和汇总级别的荷)也可能对预测结果产生重要影响。在这篇论文中,我们提供了一个完整的荷预测档案,其中包括荷领域特定的特征工程,以 помочь预测模型更好地模型荷数据。此外,不同于传统的损失函数,我们还提供了一种基于预测错误的自定义损失函数,并将其 integrate 到我们的预测框架中。基于这,我们在不同级别的荷数据上进行了广泛的实验,提供了参考 для研究人员比较不同的荷预测模型。
Multiplicative update rules for accelerating deep learning training and increasing robustness
results: 提出了一种新的乘数更新规则,并将其与传统的加法更新规则相结合,实现了一种新的混合更新方法,可以加速训练,并使模型更加稳定。实验结果表明,该方法在多种任务和优化方法上达到了更好的效果。Abstract
Even nowadays, where Deep Learning (DL) has achieved state-of-the-art performance in a wide range of research domains, accelerating training and building robust DL models remains a challenging task. To this end, generations of researchers have pursued to develop robust methods for training DL architectures that can be less sensitive to weight distributions, model architectures and loss landscapes. However, such methods are limited to adaptive learning rate optimizers, initialization schemes, and clipping gradients without investigating the fundamental rule of parameters update. Although multiplicative updates have contributed significantly to the early development of machine learning and hold strong theoretical claims, to best of our knowledge, this is the first work that investigate them in context of DL training acceleration and robustness. In this work, we propose an optimization framework that fits to a wide range of optimization algorithms and enables one to apply alternative update rules. To this end, we propose a novel multiplicative update rule and we extend their capabilities by combining it with a traditional additive update term, under a novel hybrid update method. We claim that the proposed framework accelerates training, while leading to more robust models in contrast to traditionally used additive update rule and we experimentally demonstrate their effectiveness in a wide range of task and optimization methods. Such tasks ranging from convex and non-convex optimization to difficult image classification benchmarks applying a wide range of traditionally used optimization methods and Deep Neural Network (DNN) architectures.
摘要
DISPEL: Domain Generalization via Domain-Specific Liberating
results: 实验结果显示,使用DISPEL可以超过现有的方法,并且可以进一步泛化多种算法。Abstract
Domain generalization aims to learn a generalization model that can perform well on unseen test domains by only training on limited source domains. However, existing domain generalization approaches often bring in prediction-irrelevant noise or require the collection of domain labels. To address these challenges, we consider the domain generalization problem from a different perspective by categorizing underlying feature groups into domain-shared and domain-specific features. Nevertheless, the domain-specific features are difficult to be identified and distinguished from the input data. In this work, we propose DomaIn-SPEcific Liberating (DISPEL), a post-processing fine-grained masking approach that can filter out undefined and indistinguishable domain-specific features in the embedding space. Specifically, DISPEL utilizes a mask generator that produces a unique mask for each input data to filter domain-specific features. The DISPEL framework is highly flexible to be applied to any fine-tuned models. We derive a generalization error bound to guarantee the generalization performance by optimizing a designed objective loss. The experimental results on five benchmarks demonstrate DISPEL outperforms existing methods and can further generalize various algorithms.
摘要
领域总则目标是学习一个通用模型,可以在未经见过的测试领域中表现良好,只需在有限的源领域上进行训练。然而,现有的领域总则方法经常带来预测无关的噪声或需要收集领域标签。为了解决这些挑战,我们从不同的视角来看待领域总则问题,将基层特征分为领域共享特征和领域特定特征。然而,领域特定特征很难以被识别和从输入数据中分离出来。为此,我们提出了DISPEL方法,一种基于post处理的细化掩模 Approach,可以在嵌入空间中筛除未定义和难以分辨的领域特定特征。DISPEL框架可以适应任何精度调整的模型。我们 derive一个总体性的泛化误差上限,以确保泛化性能。实验结果在五个 benchmark 上表明,DISPEL方法可以超过现有方法,并可以进一步泛化多种算法。
A Surrogate Data Assimilation Model for the Estimation of Dynamical System in a Limited Area
results: 该surrogate DA模型的设计基于一种可靠的理论基础,利用了两个基本概念:观察性和有效区域。观察性Enable我们量化精确的DA数据必要的量度,而有效区域可以大幅减少计算 observability和生成训练数据的计算卷积。Abstract
We propose a novel learning-based surrogate data assimilation (DA) model for efficient state estimation in a limited area. Our model employs a feedforward neural network for online computation, eliminating the need for integrating high-dimensional limited-area models. This approach offers significant computational advantages over traditional DA algorithms. Furthermore, our method avoids the requirement of lateral boundary conditions for the limited-area model in both online and offline computations. The design of our surrogate DA model is built upon a robust theoretical framework that leverages two fundamental concepts: observability and effective region. The concept of observability enables us to quantitatively determine the optimal amount of observation data necessary for accurate DA. Meanwhile, the concept of effective region substantially reduces the computational burden associated with computing observability and generating training data.
摘要
我们提出了一种新型的学习基于的数据拟合(DA)模型,用于限区域的高效状态估计。我们的模型使用了一个Feedforward神经网络进行在线计算,从而消除了高维限区域模型的集成需求。这种方法在传统DA算法中提供了重要的计算优势。此外,我们的方法不需要限区域模型的 Lateral边界条件, neither in online nor offline computations。我们的拟合DA模型的设计基于一种可靠的理论基础,利用了两个基本概念:观察性和有效区域。观察性使我们能够量化确定需要用于准确DA的观察数据量。同时,有效区域概念减少了计算观察性和生成训练数据的计算压力。
Safe DreamerV3: Safe Reinforcement Learning with World Models
results: 该方法在Safety-Gymnasium benchmark中能够在低维度和视觉任务中实现几乎零成本,是现有SafeRL方法中第一个达到这种目标的算法。Here’s the format you requested:
for: <what are the paper written for?>
methods: <what methods the paper use?>
results: <what results the paper get?>I hope this helps! Let me know if you have any other questions.Abstract
The widespread application of Reinforcement Learning (RL) in real-world situations is yet to come to fruition, largely as a result of its failure to satisfy the essential safety demands of such systems. Existing safe reinforcement learning (SafeRL) methods, employing cost functions to enhance safety, fail to achieve zero-cost in complex scenarios, including vision-only tasks, even with comprehensive data sampling and training. To address this, we introduce Safe DreamerV3, a novel algorithm that integrates both Lagrangian-based and planning-based methods within a world model. Our methodology represents a significant advancement in SafeRL as the first algorithm to achieve nearly zero-cost in both low-dimensional and vision-only tasks within the Safety-Gymnasium benchmark. Our project website can be found in: https://sites.google.com/view/safedreamerv3.
摘要
RL在实际场景中广泛应用仍未实现,主要是因为它无法满足实际系统的安全需求。现有的安全强化学习(SafeRL)方法,通过成本函数提高安全性,在复杂的场景中,包括视觉任务,甚至 WITH 全面数据采样和训练仍未达到零成本。为解决这个问题,我们介绍了Safe DreamerV3算法,它将拉格朗日式方法和规划方法 integrate 到世界模型中。我们的方法在 Safety-Gymnasium benchmark 中实现了low-dimensional和视觉任务中的几乎零成本,是SafeRL中的一大突破。您可以在以下网站上找到我们的项目网站:https://sites.google.com/view/safedreamerv3.
FedBIAD: Communication-Efficient and Accuracy-Guaranteed Federated Learning with Bayesian Inference-Based Adaptive Dropout
results: 比较 experiments 表明,FedBIAD 可以提供 2x 的上行减少和 2.41% 的精度提高,而且可以适应非 Independently 和 Identically Distributed(non-IID)数据。Abstract
Federated Learning (FL) emerges as a distributed machine learning paradigm without end-user data transmission, effectively avoiding privacy leakage. Participating devices in FL are usually bandwidth-constrained, and the uplink is much slower than the downlink in wireless networks, which causes a severe uplink communication bottleneck. A prominent direction to alleviate this problem is federated dropout, which drops fractional weights of local models. However, existing federated dropout studies focus on random or ordered dropout and lack theoretical support, resulting in unguaranteed performance. In this paper, we propose Federated learning with Bayesian Inference-based Adaptive Dropout (FedBIAD), which regards weight rows of local models as probability distributions and adaptively drops partial weight rows based on importance indicators correlated with the trend of local training loss. By applying FedBIAD, each client adaptively selects a high-quality dropping pattern with accurate approximations and only transmits parameters of non-dropped weight rows to mitigate uplink costs while improving accuracy. Theoretical analysis demonstrates that the convergence rate of the average generalization error of FedBIAD is minimax optimal up to a squared logarithmic factor. Extensive experiments on image classification and next-word prediction show that compared with status quo approaches, FedBIAD provides 2x uplink reduction with an accuracy increase of up to 2.41% even on non-Independent and Identically Distributed (non-IID) data, which brings up to 72% decrease in training time.
摘要
federated learning (FL) emerges as a distributed machine learning paradigm without end-user data transmission, effectively avoiding privacy leakage. participating devices in FL are usually bandwidth-constrained, and the uplink is much slower than the downlink in wireless networks, which causes a severe uplink communication bottleneck. a prominent direction to alleviate this problem is federated dropout, which drops fractional weights of local models. however, existing federated dropout studies focus on random or ordered dropout and lack theoretical support, resulting in unguaranteed performance. in this paper, we propose federated learning with bayesian inference-based adaptive dropout (fedbiad), which regards weight rows of local models as probability distributions and adaptively drops partial weight rows based on importance indicators correlated with the trend of local training loss. by applying fedbiad, each client adaptively selects a high-quality dropping pattern with accurate approximations and only transmits parameters of non-dropped weight rows to mitigate uplink costs while improving accuracy. theoretical analysis demonstrates that the convergence rate of the average generalization error of fedbiad is minimax optimal up to a squared logarithmic factor. extensive experiments on image classification and next-word prediction show that compared with status quo approaches, fedbiad provides 2x uplink reduction with an accuracy increase of up to 2.41% even on non-independent and identically distributed (non-iid) data, which brings up to 72% decrease in training time.
HYTREL: Hypergraph-enhanced Tabular Data Representation Learning
paper_authors: Pei Chen, Soumajyoti Sarkar, Leonard Lausen, Balasubramaniam Srinivasan, Sheng Zha, Ruihong Huang, George Karypis
for: 该 paper 的目的是提出一种基于 hypergraph 的表格语言模型(HYTREL),以捕捉表格数据中的 permutation 不变性和层次结构等特性。
methods: 该 paper 使用 hypergraphs 将表格 cells 作为节点,并通过三种不同类型的 hyperedges 表示表格中每一行、每一列和整个表格的结构。
results: 实验结果显示,HYTREL 在四个下游任务上具有优于其他竞争对手的表现,并且具有最大的 permutation 不变性。qualitative 分析表明,HYTREL 可以吸收表格结构,生成稳定的表格 Cell、行、列和整个表格的表示。Abstract
Language models pretrained on large collections of tabular data have demonstrated their effectiveness in several downstream tasks. However, many of these models do not take into account the row/column permutation invariances, hierarchical structure, etc. that exist in tabular data. To alleviate these limitations, we propose HYTREL, a tabular language model, that captures the permutation invariances and three more structural properties of tabular data by using hypergraphs - where the table cells make up the nodes and the cells occurring jointly together in each row, column, and the entire table are used to form three different types of hyperedges. We show that HYTREL is maximally invariant under certain conditions for tabular data, i.e., two tables obtain the same representations via HYTREL iff the two tables are identical up to permutations. Our empirical results demonstrate that HYTREL consistently outperforms other competitive baselines on four downstream tasks with minimal pretraining, illustrating the advantages of incorporating the inductive biases associated with tabular data into the representations. Finally, our qualitative analyses showcase that HYTREL can assimilate the table structures to generate robust representations for the cells, rows, columns, and the entire table.
摘要
Language models pre-trained on large collections of tabular data have demonstrated their effectiveness in several downstream tasks. However, many of these models do not take into account the row/column permutation invariances, hierarchical structure, etc. that exist in tabular data. To alleviate these limitations, we propose HYTREL, a tabular language model, that captures the permutation invariances and three more structural properties of tabular data by using hypergraphs - where the table cells make up the nodes and the cells occurring jointly together in each row, column, and the entire table are used to form three different types of hyperedges. We show that HYTREL is maximally invariant under certain conditions for tabular data, i.e., two tables obtain the same representations via HYTREL iff the two tables are identical up to permutations. Our empirical results demonstrate that HYTREL consistently outperforms other competitive baselines on four downstream tasks with minimal pretraining, illustrating the advantages of incorporating the inductive biases associated with tabular data into the representations. Finally, our qualitative analyses showcase that HYTREL can assimilate the table structures to generate robust representations for the cells, rows, columns, and the entire table.Here's the translation in Traditional Chinese:Language models pre-trained on large collections of tabular data have demonstrated their effectiveness in several downstream tasks. However, many of these models do not take into account the row/column permutation invariances, hierarchical structure, etc. that exist in tabular data. To alleviate these limitations, we propose HYTREL, a tabular language model, that captures the permutation invariances and three more structural properties of tabular data by using hypergraphs - where the table cells make up the nodes and the cells occurring jointly together in each row, column, and the entire table are used to form three different types of hyperedges. We show that HYTREL is maximally invariant under certain conditions for tabular data, i.e., two tables obtain the same representations via HYTREL iff the two tables are identical up to permutations. Our empirical results demonstrate that HYTREL consistently outperforms other competitive baselines on four downstream tasks with minimal pretraining, illustrating the advantages of incorporating the inductive biases associated with tabular data into the representations. Finally, our qualitative analyses showcase that HYTREL can assimilate the table structures to generate robust representations for the cells, rows, columns, and the entire table.
Certified Robustness for Large Language Models with Self-Denoising
For: 该论文目的是提高大型语言模型(LLM)的稳定性和鲁棒性,使其在高风险环境中能够更加可靠。* Methods: 该论文提出了一种基于 LLM 自我净化方法,通过利用 LLM 的多任务特性来降低随机干扰的影响,提高 LLM 的证明性和鲁棒性。* Results: 实验结果显示,该方法可以在证明稳定性和实际鲁棒性两个方面表现出色,并且比现有的证明方法更高效和灵活。Here’s the English version of the summary for reference:* For: The paper aims to improve the robustness and stability of large language models (LLMs) in high-stakes environments, making them more reliable.* Methods: The paper proposes a self-denoising method based on LLM’s multitasking nature to reduce the impact of random noise and improve the certification and empirical robustness of LLMs.* Results: Experimental results show that the proposed method outperforms existing certification methods in both certified robustness and empirical robustness, and is more efficient and flexible.Abstract
Although large language models (LLMs) have achieved great success in vast real-world applications, their vulnerabilities towards noisy inputs have significantly limited their uses, especially in high-stake environments. In these contexts, it is crucial to ensure that every prediction made by large language models is stable, i.e., LLM predictions should be consistent given minor differences in the input. This largely falls into the study of certified robust LLMs, i.e., all predictions of LLM are certified to be correct in a local region around the input. Randomized smoothing has demonstrated great potential in certifying the robustness and prediction stability of LLMs. However, randomized smoothing requires adding noise to the input before model prediction, and its certification performance depends largely on the model's performance on corrupted data. As a result, its direct application to LLMs remains challenging and often results in a small certification radius. To address this issue, we take advantage of the multitasking nature of LLMs and propose to denoise the corrupted inputs with LLMs in a self-denoising manner. Different from previous works like denoised smoothing, which requires training a separate model to robustify LLM, our method enjoys far better efficiency and flexibility. Our experiment results show that our method outperforms the existing certification methods under both certified robustness and empirical robustness. The codes are available at https://github.com/UCSB-NLP-Chang/SelfDenoise.
摘要
尽管大语言模型(LLM)在各种实际应用中取得了很大成功,但它们对噪输入的敏感性却有限制了其应用范围,特别是在高赔率环境中。在这些情况下,确保大语言模型的每一个预测是稳定的,即LLM的预测结果在输入数据的小地方很少异常。这主要归结于证明了大语言模型的稳定性和预测稳定性。随机填充有显示出了潜在的潜在性,但随机填充需要在模型预测之前添加噪音,其证明性取决于模型在损害数据上的性能。因此,直接应用随机填充到LLM上是挑战性的,通常会导致小的证明半径。为解决这个问题,我们利用了LLM的多任务性,并提议使用LLM自身来推leans噪音。与前一些denoised smoothing方法不同,我们的方法不需要单独训练一个robust化模型,因此具有更高的效率和灵活性。我们的实验结果表明,我们的方法在证明性和实际性两个方面都高于现有的证明方法。代码可以在https://github.com/UCSB-NLP-Chang/SelfDenoise中获取。
Vulnerability-Aware Instance Reweighting For Adversarial Training
paper_authors: Olukorede Fakorede, Ashutosh Kumar Nirala, Modeste Atsague, Jin Tian
For: 本研究旨在提高深度学习分类器对攻击性样本的Robustness,通过在训练过程中包含攻击样本来提高分类器的Robustness。* Methods: 本研究使用了不同的重量调整方法,以优化分类器的Robustness。这些方法包括例子级重量调整、损失函数重量调整等。* Results: 经过EXTENSIVE EXPERIMENTS,研究发现,提出的新方法可以在不同的攻击方式下提高分类器的Robustness,特别是对于强白盒和黑盒攻击。Abstract
Adversarial Training (AT) has been found to substantially improve the robustness of deep learning classifiers against adversarial attacks. AT involves obtaining robustness by including adversarial examples in training a classifier. Most variants of AT algorithms treat every training example equally. However, recent works have shown that better performance is achievable by treating them unequally. In addition, it has been observed that AT exerts an uneven influence on different classes in a training set and unfairly hurts examples corresponding to classes that are inherently harder to classify. Consequently, various reweighting schemes have been proposed that assign unequal weights to robust losses of individual examples in a training set. In this work, we propose a novel instance-wise reweighting scheme. It considers the vulnerability of each natural example and the resulting information loss on its adversarial counterpart occasioned by adversarial attacks. Through extensive experiments, we show that our proposed method significantly improves over existing reweighting schemes, especially against strong white and black-box attacks.
摘要
《对抗训练》(AT)已经发现可以大幅提高深度学习分类器对假数据攻击的抗性。AT通过在训练分类器时包括对抗示例来获得强度。大多数AT算法对每个训练示例进行相同的处理。然而,最近的研究表明,可以通过不同的处理方式来获得更好的性能。此外,AT对不同类别在训练集中的影响不均,会不公正地增加对抗示例的损害,尤其是对于难以分类的类别。因此,各种重量调整方案已经被提出,将各个示例的Robust损害分配不同的重量。在这项工作中,我们提出了一种新的实例级重量调整方案。它考虑了每个自然示例的抗性和由对抗攻击引起的信息损失的对应 adversarial example。通过广泛的实验,我们显示了我们提出的方法在现有的重量调整方案中具有显著的优势,特别是对于强大的白盒和黑盒攻击。
ISAC-NET: Model-driven Deep Learning for Integrated Passive Sensing and Communication
results: 实验结果显示,ISAC-NET比传统的信号调读算法(OAMP-Net2)更好地实现通信性能,并且与2D-DFT算法相比,ISAC-NET的感应性能更高。Abstract
Recent advances in wireless communication with the enormous demands of sensing ability have given rise to the integrated sensing and communication (ISAC) technology, among which passive sensing plays an important role. The main challenge of passive sensing is how to achieve high sensing performance in the condition of communication demodulation errors. In this paper, we propose an ISAC network (ISAC-NET) that combines passive sensing with communication signal detection by using model-driven deep learning (DL). Dissimilar to existing passive sensing algorithms that first demodulate the transmitted symbols and then obtain passive sensing results from the demodulated symbols, ISAC-NET obtains passive sensing results and communication demodulated symbols simultaneously. Different from the data-driven DL method, we adopt the block-by-block signal processing method that divides the ISAC-NET into the passive sensing module, signal detection module and channel reconstruction module. From the simulation results, ISAC-NET obtains better communication performance than the traditional signal demodulation algorithm, which is close to OAMP-Net2. Compared to the 2D-DFT algorithm, ISAC-NET demonstrates significantly enhanced sensing performance. In summary, ISAC-NET is a promising tool for passive sensing and communication in wireless communications.
摘要
近年来,无线通信技术的发展,对感知能力的巨大需求,已经促使出一种集成感知和通信(ISAC)技术的出现,其中被动感知占据重要地位。被动感知的主要挑战是如何在通信模式错误的情况下实现高度的感知性能。本文提出一种名为ISAC网络(ISAC-NET),它将被动感知与通信信号检测结合,使用模型驱动深度学习(DL)。与现有的被动感知算法不同,ISAC-NET在获取被动感知结果和通信模式错误的同时,也可以同时获取通信解调结果。与传统的数据驱动DL方法不同,我们采用了块级Signal Processing方法,将ISAC-NET分为感知模块、信号检测模块和通道重建模块。从 simulate结果来看,ISAC-NET在通信性能方面比传统的信号解调算法更好,与OAMP-Net2几乎相当。相比2D-DFT算法,ISAC-NET在感知性能方面表现明显提高。总之,ISAC-NET是无线通信中可靠的被动感知和通信工具。
Do not Mask Randomly: Effective Domain-adaptive Pre-training by Masking In-domain Keywords
paper_authors: Shahriar Golchin, Mihai Surdeanu, Nazgol Tavabi, Ata Kiapour
for: 提高任务域预训练的性能
methods: 选择ively屏蔽域预训练中的关键词(使用KeyBERT)
results: 在六个不同的设置中,采用本方法进行域预训练后的精度提高,并且比随机屏蔽和普通的预训练-then-精度调整方法更高。屏蔽关键词的时间开销可以控制在7-15%(for two epochs)。Abstract
We propose a novel task-agnostic in-domain pre-training method that sits between generic pre-training and fine-tuning. Our approach selectively masks in-domain keywords, i.e., words that provide a compact representation of the target domain. We identify such keywords using KeyBERT (Grootendorst, 2020). We evaluate our approach using six different settings: three datasets combined with two distinct pre-trained language models (PLMs). Our results reveal that the fine-tuned PLMs adapted using our in-domain pre-training strategy outperform PLMs that used in-domain pre-training with random masking as well as those that followed the common pre-train-then-fine-tune paradigm. Further, the overhead of identifying in-domain keywords is reasonable, e.g., 7-15% of the pre-training time (for two epochs) for BERT Large (Devlin et al., 2019).
摘要
我们提出了一种新的任务非对称在领域预训练方法,位于通用预训练和精度调整之间。我们的方法选择性地遮盖领域关键词(Grootendorst, 2020),即预测目标领域的紧凑表示。我们使用 six 个不同的设置进行评估:三个数据集与两种不同的预训练语言模型(PLMs)结合。我们的结果表明,使用我们的领域预训练策略进行精度调整后,PLMs 的性能比使用随机遮盖预训练和通用预训练-然后调整的方法都高得多。此外,在标识领域关键词的时间上,只需要投入 7-15% 的时间(对 BERT Large 来说,Devlin et al., 2019)。
Drug Discovery under Covariate Shift with Domain-Informed Prior Distributions over Functions
paper_authors: Leo Klarner, Tim G. J. Rudner, Michael Reutlinger, Torsten Schindler, Garrett M. Morris, Charlotte Deane, Yee Whye Teh
for: 本研究旨在加速发现新有效药物,使用深度学习方法来解决药物发现任务中的数据罕见和变化问题。
methods: 本文提出了一种名为Q-SAVI的概率模型,该模型可以Explicitly encode prior knowledge of the data-generating process into a prior distribution over functions,提供了一种透明和 probabilistically principled的方法来编码数据驱动模型化偏好。
results: 通过使用Q-SAVI模型,可以在许多State-of-the-art自动预训练和领域调整技术的基础上获得显著提高的预测精度和准确性,并且可以在挑战性评价setup下表现出色。Abstract
Accelerating the discovery of novel and more effective therapeutics is an important pharmaceutical problem in which deep learning is playing an increasingly significant role. However, real-world drug discovery tasks are often characterized by a scarcity of labeled data and significant covariate shift$\unicode{x2013}\unicode{x2013}$a setting that poses a challenge to standard deep learning methods. In this paper, we present Q-SAVI, a probabilistic model able to address these challenges by encoding explicit prior knowledge of the data-generating process into a prior distribution over functions, presenting researchers with a transparent and probabilistically principled way to encode data-driven modeling preferences. Building on a novel, gold-standard bioactivity dataset that facilitates a meaningful comparison of models in an extrapolative regime, we explore different approaches to induce data shift and construct a challenging evaluation setup. We then demonstrate that using Q-SAVI to integrate contextualized prior knowledge of drug-like chemical space into the modeling process affords substantial gains in predictive accuracy and calibration, outperforming a broad range of state-of-the-art self-supervised pre-training and domain adaptation techniques.
摘要
加速发现新有效的药物是药品工业中一个重要的问题,深度学习在这个领域中发挥了越来越重要的作用。然而,实际的药物发现任务常常受到数据标注的罕见和变量差异的限制,这种情况对标准的深度学习方法提出了挑战。在这篇论文中,我们提出了Q-SAVI模型,该模型能够Address these challenges by encoding explicit prior knowledge of the data-generating process into a prior distribution over functions, providing researchers with a transparent and probabilistically principled way to encode data-driven modeling preferences。基于一个新的、标准的生物活性数据集,我们explore different approaches to induce data shift and construct a challenging evaluation setup。然后,我们示出了使用Q-SAVI模型将contextualized prior knowledge of drug-like chemical space integrate into the modeling process可以获得显著的预测精度和准确性提升,超过了一系列当前状态最佳的自动学习和领域适应技术。
Global path preference and local response: A reward decomposition approach for network path choice analysis in the presence of locally perceived attributes
For: This study analyzes the global and local path preferences of network travelers, using a reward decomposition approach integrated into a link-based recursive (Markovian) path choice model.* Methods: The approach decomposes the instantaneous reward function into global and local utilities, allowing the analysis of how attributes affect global and local path choices. The model can be estimated based on revealed path observations, without the need for plan information.* Results: The study found that pedestrians locally perceive and react to visual street quality, rather than having pre-trip global perceptions. The simulation results highlight the importance of location selection of interventions when policy-related attributes are only locally perceived by travelers.Here is the same information in Simplified Chinese text, as requested:* For: 这项研究 analyzes the global and local path preferences of network travelers, using a reward decomposition approach integrated into a link-based recursive (Markovian) path choice model.* Methods: 该方法 decomposes the instantaneous reward function into global and local utilities, allowing the analysis of how attributes affect global and local path choices. The model can be estimated based on revealed path observations, without the need for plan information.* Results: 研究发现 pedestrians locally perceive and react to visual street quality, rather than having pre-trip global perceptions. 模拟结果 highlights the importance of location selection of interventions when policy-related attributes are only locally perceived by travelers.Abstract
This study performs an attribute-level analysis of the global and local path preferences of network travelers. To this end, a reward decomposition approach is proposed and integrated into a link-based recursive (Markovian) path choice model. The approach decomposes the instantaneous reward function associated with each state-action pair into the global utility, a function of attributes globally perceived from anywhere in the network, and the local utility, a function of attributes that are only locally perceived from the current state. Only the global utility then enters the value function of each state, representing the future expected utility toward the destination. This global-local path choice model with decomposed reward functions allows us to analyze to what extent and which attributes affect the global and local path choices of agents. Moreover, unlike most adaptive path choice models, the proposed model can be estimated based on revealed path observations (without the information of plans) and as efficiently as deterministic recursive path choice models. The model was applied to the real pedestrian path choice observations in an urban street network where the green view index was extracted as a visual street quality from Google Street View images. The result revealed that pedestrians locally perceive and react to the visual street quality, rather than they have the pre-trip global perception on it. Furthermore, the simulation results using the estimated models suggested the importance of location selection of interventions when policy-related attributes are only locally perceived by travelers.
摘要
Looking deeper into interpretable deep learning in neuroimaging: a comprehensive survey
paper_authors: Md. Mahfuzur Rahman, Vince D. Calhoun, Sergey M. Plis
for: This paper aims to comprehensively review interpretable deep learning models in the neuroimaging domain, discussing the current status of interpretability resources, challenges, and limitations, as well as offering insights and guidance for future research directions.
methods: The paper focuses on interpretable deep learning models in neuroimaging, including multiple recent studies that have leveraged model interpretability to capture anatomical and functional brain alterations most relevant to model predictions.
results: The paper discusses the limitations of current practices and offers valuable insights and guidance for future research directions to make deep learning models substantially interpretable and advance scientific understanding of brain disorders.Abstract
Deep learning (DL) models have been popular due to their ability to learn directly from the raw data in an end-to-end paradigm, alleviating the concern of a separate error-prone feature extraction phase. Recent DL-based neuroimaging studies have also witnessed a noticeable performance advancement over traditional machine learning algorithms. But the challenges of deep learning models still exist because of the lack of transparency in these models for their successful deployment in real-world applications. In recent years, Explainable AI (XAI) has undergone a surge of developments mainly to get intuitions of how the models reached the decisions, which is essential for safety-critical domains such as healthcare, finance, and law enforcement agencies. While the interpretability domain is advancing noticeably, researchers are still unclear about what aspect of model learning a post hoc method reveals and how to validate its reliability. This paper comprehensively reviews interpretable deep learning models in the neuroimaging domain. Firstly, we summarize the current status of interpretability resources in general, focusing on the progression of methods, associated challenges, and opinions. Secondly, we discuss how multiple recent neuroimaging studies leveraged model interpretability to capture anatomical and functional brain alterations most relevant to model predictions. Finally, we discuss the limitations of the current practices and offer some valuable insights and guidance on how we can steer our future research directions to make deep learning models substantially interpretable and thus advance scientific understanding of brain disorders.
摘要
在这篇评论中,我们将对 interpretable deep learning 模型在 neuroscience 领域进行全面的回顾。首先,我们将概括当前可用的解释性资源,包括方法的进步、相关的挑战和意见。其次,我们将讨论如何通过模型解释性来捕捉神经成像和功能变化,这些变化与模型预测之间的相互关系。最后,我们将讨论当前实践中的限制,并提供一些有价值的意见和指导,以帮助我们未来的研究方向,以便使深度学习模型变得可靠并提高神经疾病科学的理解。
Multi-Dimensional Ability Diagnosis for Machine Learning Algorithms
results: 对公共数据集进行了广泛的实验,结果表明 Camilla 不仅能够更准确地评估机器学习算法的优缺点,还能够超越现有的基准值在度量可靠性、排名一致性和排名稳定性等方面。Abstract
Machine learning algorithms have become ubiquitous in a number of applications (e.g. image classification). However, due to the insufficient measurement of traditional metrics (e.g. the coarse-grained Accuracy of each classifier), substantial gaps are usually observed between the real-world performance of these algorithms and their scores in standardized evaluations. In this paper, inspired by the psychometric theories from human measurement, we propose a task-agnostic evaluation framework Camilla, where a multi-dimensional diagnostic metric Ability is defined for collaboratively measuring the multifaceted strength of each machine learning algorithm. Specifically, given the response logs from different algorithms to data samples, we leverage cognitive diagnosis assumptions and neural networks to learn the complex interactions among algorithms, samples and the skills (explicitly or implicitly pre-defined) of each sample. In this way, both the abilities of each algorithm on multiple skills and some of the sample factors (e.g. sample difficulty) can be simultaneously quantified. We conduct extensive experiments with hundreds of machine learning algorithms on four public datasets, and our experimental results demonstrate that Camilla not only can capture the pros and cons of each algorithm more precisely, but also outperforms state-of-the-art baselines on the metric reliability, rank consistency and rank stability.
摘要
Efficient Strongly Polynomial Algorithms for Quantile Regression
results: 本文提出了多种高效的强 polynomial 算法,以解决 QR 的计算问题。其中,两维 QR 的算法有 deterministic worst-case 时间复杂度为 $\mathcal{O}(n^{4/3} polylog(n))$ 和 expected 时间复杂度为 $\mathcal{O}(n^{4/3})$,而高维 QR 的算法有 expected 时间复杂度为 $\mathcal{O}(n^{d-1}\log^2(n))$。Abstract
Linear Regression is a seminal technique in statistics and machine learning, where the objective is to build linear predictive models between a response (i.e., dependent) variable and one or more predictor (i.e., independent) variables. In this paper, we revisit the classical technique of Quantile Regression (QR), which is statistically a more robust alternative to the other classical technique of Ordinary Least Square Regression (OLS). However, while there exist efficient algorithms for OLS, almost all of the known results for QR are only weakly polynomial. Towards filling this gap, this paper proposes several efficient strongly polynomial algorithms for QR for various settings. For two dimensional QR, making a connection to the geometric concept of $k$-set, we propose an algorithm with a deterministic worst-case time complexity of $\mathcal{O}(n^{4/3} polylog(n))$ and an expected time complexity of $\mathcal{O}(n^{4/3})$ for the randomized version. We also propose a randomized divide-and-conquer algorithm -- RandomizedQR with an expected time complexity of $\mathcal{O}(n\log^2{(n)})$ for two dimensional QR problem. For the general case with more than two dimensions, our RandomizedQR algorithm has an expected time complexity of $\mathcal{O}(n^{d-1}\log^2{(n)})$.
摘要
在二维QR中,我们与$k$-set的几何概念相连,提出了一个deterministic最坏情况时间复杂度为$\mathcal{O}(n^{4/3}polylog(n))$,并且随机版本的时间复杂度为$\mathcal{O}(n^{4/3})$。此外,我们还提出了一种随机分割算法---RandomizedQR,其预期时间复杂度为$\mathcal{O}(n\log^2(n))$。在多维QR中,我们的RandomizedQR算法的预期时间复杂度为$\mathcal{O}(n^{d-1}\log^2(n))。
DataAssist: A Machine Learning Approach to Data Cleaning and Preparation
results: 可以为不同领域,如经济、商业和预测应用,提高数据整理和清洁效率,保留50%以上时间 для下游分析Abstract
Current automated machine learning (ML) tools are model-centric, focusing on model selection and parameter optimization. However, the majority of the time in data analysis is devoted to data cleaning and wrangling, for which limited tools are available. Here we present DataAssist, an automated data preparation and cleaning platform that enhances dataset quality using ML-informed methods. We show that DataAssist provides a pipeline for exploratory data analysis and data cleaning, including generating visualization for user-selected variables, unifying data annotation, suggesting anomaly removal, and preprocessing data. The exported dataset can be readily integrated with other autoML tools or user-specified model for downstream analysis. Our data-centric tool is applicable to a variety of fields, including economics, business, and forecasting applications saving over 50% time of the time spent on data cleansing and preparation.
摘要
当前的自动化机器学习(ML)工具都是模型集中心的,它们主要关注模型选择和参数优化。然而,数据分析的大部分时间被用于数据清洁和整理,而这个领域的工具却很有限。我们现在提出了数据协助(DataAssist),一个自动化数据准备和清洁平台,使用机器学习 Informed 方法来提高数据集质量。我们展示了 DataAssist 提供了探索数据分析和数据清洁的管道,包括生成用户选择变量的视觉化,统一数据注释,建议异常删除,并进行数据预处理。导出的数据集可以轻松地与其他自动ML工具或用户指定的模型进行下游分析。我们的数据集中心的工具适用于多个领域,包括经济、商业和预测应用,可以节省大约50%的时间用于数据清洁和准备。
An IPW-based Unbiased Ranking Metric in Two-sided Markets
For: 该论文目的是提出一种适应两侧市场中偏见的学习到排序(LTR)方法,以便在 clicked 数据中优先级化偏见的项目。* Methods: 该论文提出了一种基于对抗风险的偏见权重(IPW)技术,并将其扩展到两侧市场中,以处理双方用户之间的复杂的偏见交互。* Results: 该论文通过实验示范了其效果,特别是在处理罕见项目时表现出了更高的精度和更好的稳定性。Abstract
In modern recommendation systems, unbiased learning-to-rank (LTR) is crucial for prioritizing items from biased implicit user feedback, such as click data. Several techniques, such as Inverse Propensity Weighting (IPW), have been proposed for single-sided markets. However, less attention has been paid to two-sided markets, such as job platforms or dating services, where successful conversions require matching preferences from both users. This paper addresses the complex interaction of biases between users in two-sided markets and proposes a tailored LTR approach. We first present a formulation of feedback mechanisms in two-sided matching platforms and point out that their implicit feedback may include position bias from both user groups. On the basis of this observation, we extend the IPW estimator and propose a new estimator, named two-sided IPW, to address the position bases in two-sided markets. We prove that the proposed estimator satisfies the unbiasedness for the ground-truth ranking metric. We conducted numerical experiments on real-world two-sided platforms and demonstrated the effectiveness of our proposed method in terms of both precision and robustness. Our experiments showed that our method outperformed baselines especially when handling rare items, which are less frequently observed in the training data.
摘要
现代推荐系统中,无偏学习排名(LTR)在处理偏见用户反馈(如点击数据)的首位是关键。多种技术,如反投重量(IPW),已经为单边市场提出了解决方案。然而,对于双边市场,如寻找服务或约会服务,成功的转化需要从双方用户的匹配偏好中找到相互作用。本文描述了双边匹配平台上的反馈机制,并指出了用户群体之间的位置偏好可能会包含在内部反馈中。基于这一观察,我们扩展了IPW估计器,并提出了一种新的估计器——双边IPW,以解决双边市场中的位置基准。我们证明了我们提出的估计器满足了真实排名度量下的无偏性。我们在实际的双边平台上进行了数值实验,并证明了我们的方法在精度和稳定性两个方面的表现比基eline更好,特别是处理罕见项目时。
for: solves the decentralized, stochastic nonconvex strongly-concave (NCSC) minimax problem with nonsmooth regularization terms on both primal and dual variables.
methods: uses a Lagrangian multiplier to eliminate the consensus constraint on the dual variable, and varaince-reduction (VR) techniques to achieve a sample complexity of $\mathcal{O}(\kappa^3\varepsilon^{-3})$ and a communication complexity of $\mathcal{O}(\kappa^2\varepsilon^{-2})$ under the general stochastic setting.
results: achieves an $\mathcal{O}(\kappa^3\varepsilon^{-3})$ sample complexity and $\mathcal{O}(\kappa^2\varepsilon^{-2})$ communication complexity under the general stochastic setting, and an $\mathcal{O}(n + \sqrt{n} \kappa^2\varepsilon^{-2})$ sample complexity and $\mathcal{O}(\kappa^2\varepsilon^{-2})$ communication complexity under the special finite-sum setting, which matches the best-known results achieved by a few existing methods for solving special cases of the problem.Abstract
In this paper, we consider the decentralized, stochastic nonconvex strongly-concave (NCSC) minimax problem with nonsmooth regularization terms on both primal and dual variables, wherein a network of $m$ computing agents collaborate via peer-to-peer communications. We consider when the coupling function is in expectation or finite-sum form and the double regularizers are convex functions, applied separately to the primal and dual variables. Our algorithmic framework introduces a Lagrangian multiplier to eliminate the consensus constraint on the dual variable. Coupling this with variance-reduction (VR) techniques, our proposed method, entitled VRLM, by a single neighbor communication per iteration, is able to achieve an $\mathcal{O}(\kappa^3\varepsilon^{-3})$ sample complexity under the general stochastic setting, with either a big-batch or small-batch VR option, where $\kappa$ is the condition number of the problem and $\varepsilon$ is the desired solution accuracy. With a big-batch VR, we can additionally achieve $\mathcal{O}(\kappa^2\varepsilon^{-2})$ communication complexity. Under the special finite-sum setting, our method with a big-batch VR can achieve an $\mathcal{O}(n + \sqrt{n} \kappa^2\varepsilon^{-2})$ sample complexity and $\mathcal{O}(\kappa^2\varepsilon^{-2})$ communication complexity, where $n$ is the number of components in the finite sum. All complexity results match the best-known results achieved by a few existing methods for solving special cases of the problem we consider. To the best of our knowledge, this is the first work which provides convergence guarantees for NCSC minimax problems with general convex nonsmooth regularizers applied to both the primal and dual variables in the decentralized stochastic setting. Numerical experiments are conducted on two machine learning problems. Our code is downloadable from https://github.com/RPI-OPT/VRLM.
摘要
在这篇论文中,我们考虑了分布式、随机非 convex 强迫紧密(NCSC)最小化问题,其中 $m$ 个计算代理在各自之间进行各自通信。我们考虑了对 coupling function 的预期形式和 finite-sum 形式的情况,并且对 primal 和 dual 变量应用了 convex 函数的双重正则化。我们的算法框架引入了 Lagrangian 多乘数,以消除 dual 变量上的协调约束。将这与减少异议(VR)技术结合,我们提出的方法,名为 VRLM,每个邻居通信一次,可以在通用随机设定下实现 $\mathcal{O}(\kappa^3\varepsilon^{-3})$ 样本复杂度,其中 $\kappa$ 是问题的condition number,$\varepsilon$ 是解决精度。另外,使用 big-batch VR,我们可以实现 $\mathcal{O}(\kappa^2\varepsilon^{-2})$ 的通信复杂度。在特定的finite-sum设定下,使用 big-batch VR,我们可以实现 $\mathcal{O}(n + \sqrt{n} \kappa^2\varepsilon^{-2})$ 样本复杂度和 $\mathcal{O}(\kappa^2\varepsilon^{-2})$ 通信复杂度,其中 $n$ 是finite-sum中的组件数。所有复杂度结果与一些现有的方法实现的最佳结果相匹配。这是我们知道的首次提出了 NCSC 最小化问题中的强迫紧密最小化问题的解 convergence guarantees,并且在分布式随机设定下实现了这种问题的解。我们的代码可以在 https://github.com/RPI-OPT/VRLM 上下载。Numerical experiments were conducted on two machine learning problems, and the results show that our method is effective and efficient.
results: 研究发现,使用GPSE可以在各种图像预测任务中提高模型的性能,并且可以在不同类型的图像上进行高效的传播。此外,GPSE可以与现有的自我超VI等方法相比肩并让人们认为是一种可靠的代替方法。Abstract
Positional and structural encodings (PSE) enable better identifiability of nodes within a graph, as in general graphs lack a canonical node ordering. This renders PSEs essential tools for empowering modern GNNs, and in particular graph Transformers. However, designing PSEs that work optimally for a variety of graph prediction tasks is a challenging and unsolved problem. Here, we present the graph positional and structural encoder (GPSE), a first-ever attempt to train a graph encoder that captures rich PSE representations for augmenting any GNN. GPSE can effectively learn a common latent representation for multiple PSEs, and is highly transferable. The encoder trained on a particular graph dataset can be used effectively on datasets drawn from significantly different distributions and even modalities. We show that across a wide range of benchmarks, GPSE-enhanced models can significantly improve the performance in certain tasks, while performing on par with those that employ explicitly computed PSEs in other cases. Our results pave the way for the development of large pre-trained models for extracting graph positional and structural information and highlight their potential as a viable alternative to explicitly computed PSEs as well as to existing self-supervised pre-training approaches.
摘要
通用的位置和结构编码(PSE)可以更好地识别图像中的节点,因为普通的图像缺乏唯一的节点顺序。这使得PSE成为现代GNNS的重要工具,特别是图Transformers。然而,设计最佳的PSE是一个具有挑战性和未解决的问题。在这里,我们提出了图位置和结构编码器(GPSE),这是首次尝试用自适应学习方法来训练图编码器,以便在不同的图据集上捕捉丰富的PSE表示。GPSE可以有效地学习共享精灵代码,并且是可迁移的。我们的结果表明,在各种比较检验中,GPSE增强的模型可以在某些任务中显著提高性能,而在其他任务中与直接计算PSE的模型性能相当。我们的结果开创了大规模预训练模型的发展,并高亮了它们作为可靠的代码计算和现有自我超vised预训练方法的替代方案。
MaxCorrMGNN: A Multi-Graph Neural Network Framework for Generalized Multimodal Fusion of Medical Data for Outcome Prediction
paper_authors: Niharika S. D’Souza, Hongzhi Wang, Andrea Giovannini, Antonio Foncubierta-Rodriguez, Kristen L. Beck, Orest Boyko, Tanveer Syeda-Mahmood
results: 在TB数据集上,MaxCorr MGNN方法在对比多种现有的神经网络、图基本方法和传统融合方法时,有所提高结果,并且能够有效地预测疾病结果。Abstract
With the emergence of multimodal electronic health records, the evidence for an outcome may be captured across multiple modalities ranging from clinical to imaging and genomic data. Predicting outcomes effectively requires fusion frameworks capable of modeling fine-grained and multi-faceted complex interactions between modality features within and across patients. We develop an innovative fusion approach called MaxCorr MGNN that models non-linear modality correlations within and across patients through Hirschfeld-Gebelein-Renyi maximal correlation (MaxCorr) embeddings, resulting in a multi-layered graph that preserves the identities of the modalities and patients. We then design, for the first time, a generalized multi-layered graph neural network (MGNN) for task-informed reasoning in multi-layered graphs, that learns the parameters defining patient-modality graph connectivity and message passing in an end-to-end fashion. We evaluate our model an outcome prediction task on a Tuberculosis (TB) dataset consistently outperforming several state-of-the-art neural, graph-based and traditional fusion techniques.
摘要
With the emergence of multimodal electronic health records, the evidence for an outcome may be captured across multiple modalities ranging from clinical to imaging and genomic data. Predicting outcomes effectively requires fusion frameworks capable of modeling fine-grained and multi-faceted complex interactions between modality features within and across patients. We develop an innovative fusion approach called MaxCorr MGNN that models non-linear modality correlations within and across patients through Hirschfeld-Gebelein-Renyi maximal correlation (MaxCorr) embeddings, resulting in a multi-layered graph that preserves the identities of the modalities and patients. We then design, for the first time, a generalized multi-layered graph neural network (MGNN) for task-informed reasoning in multi-layered graphs, that learns the parameters defining patient-modality graph connectivity and message passing in an end-to-end fashion. We evaluate our model an outcome prediction task on a Tuberculosis (TB) dataset consistently outperforming several state-of-the-art neural, graph-based and traditional fusion techniques.Here's the translation in Traditional Chinese:随着多模式电子健康纪录的出现,结果证据可能会被捕捉到多种模式,从临床到图像和基因数据。预测结果需要融合框架,可以模型内部和between patients的细节化和多方面复杂交互。我们开发了一种创新的融合方法,called MaxCorr MGNN,通过将悦律-哥白-雷尼最大相关性(MaxCorr)嵌入,创建了一个多层graph,保留了模式和患者的身份。我们然后设计了,for the first time, a generalized multi-layered graph neural network (MGNN) for task-informed reasoning in multi-layered graphs, 可以学习定义患者-模式图形连接和讯息传递的参数。我们在TB datasets上进行结果预测任务,与多种现有的神经网络、图形基础和传统融合技术相比,具有较高的性能。
Robotic Manipulation Datasets for Offline Compositional Reinforcement Learning
results: 本研究提供了四个Offline RL的数据集,每个数据集包含256亿个转换,并提供了训练和评估设定来评估代理人是否能够学习集成任务政策。实验结果显示,现有的Offline RL方法可以在一定程度上学习训练任务,并且集成方法比非集成方法表现更好。但是,现有的方法仍然无法将任务的集成结构抽象出来,以适应未见任务。Abstract
Offline reinforcement learning (RL) is a promising direction that allows RL agents to pre-train on large datasets, avoiding the recurrence of expensive data collection. To advance the field, it is crucial to generate large-scale datasets. Compositional RL is particularly appealing for generating such large datasets, since 1) it permits creating many tasks from few components, 2) the task structure may enable trained agents to solve new tasks by combining relevant learned components, and 3) the compositional dimensions provide a notion of task relatedness. This paper provides four offline RL datasets for simulated robotic manipulation created using the 256 tasks from CompoSuite [Mendez et al., 2022a]. Each dataset is collected from an agent with a different degree of performance, and consists of 256 million transitions. We provide training and evaluation settings for assessing an agent's ability to learn compositional task policies. Our benchmarking experiments on each setting show that current offline RL methods can learn the training tasks to some extent and that compositional methods significantly outperform non-compositional methods. However, current methods are still unable to extract the tasks' compositional structure to generalize to unseen tasks, showing a need for further research in offline compositional RL.
摘要
<> translate "Offline reinforcement learning (RL) is a promising direction that allows RL agents to pre-train on large datasets, avoiding the recurrence of expensive data collection. To advance the field, it is crucial to generate large-scale datasets. Compositional RL is particularly appealing for generating such large datasets, since 1) it permits creating many tasks from few components, 2) the task structure may enable trained agents to solve new tasks by combining relevant learned components, and 3) the compositional dimensions provide a notion of task relatedness. This paper provides four offline RL datasets for simulated robotic manipulation created using the 256 tasks from CompoSuite [Mendez et al., 2022a]. Each dataset is collected from an agent with a different degree of performance, and consists of 256 million transitions. We provide training and evaluation settings for assessing an agent's ability to learn compositional task policies. Our benchmarking experiments on each setting show that current offline RL methods can learn the training tasks to some extent and that compositional methods significantly outperform non-compositional methods. However, current methods are still unable to extract the tasks' compositional structure to generalize to unseen tasks, showing a need for further research in offline compositional RL."into Simplified Chinese:<>Offline 学习 Reinforcement Learning (RL) 是一个promising的方向,允许 RL 代理人在大量数据集上进行预训练,以避免高昂的数据收集成本。为了进步这一领域,生成大规模数据集是极为重要的。compositional RL 是一种特别有把握的方法,因为它允许从 few 个组件中创建多个任务,并且任务结构可能使得训练过的代理人能够通过相关学习的组件来解决新任务。此外,compositional 维度提供了任务相关性的一种理解。本文提供了四个 offline RL 数据集,用于模拟机器人 manipulate 的场景,由 CompoSuite 中的 256 个任务生成 [Mendez et al., 2022a]。每个数据集来自不同水平的代理人,包含 256 亿个转移。我们提供了训练和评估代理人学习 compositional 任务策略的设置。我们的参考实验表明,当前的 offline RL 方法可以在一定程度上学习训练任务,而 compositional 方法在非 compositional 方法之上显著超越。然而,当前的方法仍然无法提取任务的 compositional 结构,以 generalized 到未经见过任务, indicating a need for further research in offline compositional RL.Note: The translation is in Simplified Chinese, which is the standard writing system used in mainland China. The translation is based on the Google Translate API, and may not be perfect or idiomatic.
paper_authors: Amandeep Singh, Ye Liu, Hema Yoganarasimhan
for: 本研究旨在描述选择函数的基本特征,以涵盖许多现有的选择模型。
methods: 我们提出了一种非参数化估计器,如神经网络,可以轻松地近似选择函数的函数式。
results: 我们通过了广泛的 simulate 结果表明,我们的提出的函数式可以完全捕捉consumer行为的特征,并且在数据驱动的情况下超越非参数化估计的矛盾性。Abstract
Choice Modeling is at the core of many economics, operations, and marketing problems. In this paper, we propose a fundamental characterization of choice functions that encompasses a wide variety of extant choice models. We demonstrate how nonparametric estimators like neural nets can easily approximate such functionals and overcome the curse of dimensionality that is inherent in the non-parametric estimation of choice functions. We demonstrate through extensive simulations that our proposed functionals can flexibly capture underlying consumer behavior in a completely data-driven fashion and outperform traditional parametric models. As demand settings often exhibit endogenous features, we extend our framework to incorporate estimation under endogenous features. Further, we also describe a formal inference procedure to construct valid confidence intervals on objects of interest like price elasticity. Finally, to assess the practical applicability of our estimator, we utilize a real-world dataset from S. Berry, Levinsohn, and Pakes (1995). Our empirical analysis confirms that the estimator generates realistic and comparable own- and cross-price elasticities that are consistent with the observations reported in the existing literature.
摘要
《选择模型在许多经济、运营和市场问题中核心地位。在这篇论文中,我们提出了一种涵盖广泛现有选择模型的基本特征化。我们示示了非参数统计方法如神经网络可以轻松地近似这些函数,并超越维度约束的咒语。我们通过广泛的 simulations 表明,我们提议的函数可以完全捕捉消费者行为的下面特征,并在完全数据驱动的方式下超越传统参数模型。由于需求设定 часто表现出内生特征,我们扩展了我们的框架,以便包括内生特征的估计。此外,我们还描述了一种正式的推断过程,以建立有效的自信间隔的确定。最后,我们利用 S. Berry、Levinsohn 和 Pakes (1995) 的实际数据进行了应用性测试,我们的统计分析表明,我们的估计器生成了真实的和相似的价格敏感性和跨价格敏感性,与现有文献中的观察结果相符。
Safe Reinforcement Learning as Wasserstein Variational Inference: Formal Methods for Interpretability
results: 通过实验和实际应用,证明了AWaVO方法的globally convergent和高性能,并实际证明了一个合理的奖励函数设计和稳定性之间的交易Abstract
Reinforcement Learning or optimal control can provide effective reasoning for sequential decision-making problems with variable dynamics. Such reasoning in practical implementation, however, poses a persistent challenge in interpreting the reward function and corresponding optimal policy. Consequently, formalizing the sequential decision-making problems as inference has a considerable value, as probabilistic inference in principle offers diverse and powerful mathematical tools to infer the stochastic dynamics whilst suggesting a probabilistic interpretation of the reward design and policy convergence. In this study, we propose a novel Adaptive Wasserstein Variational Optimization (AWaVO) to tackle these challenges in sequential decision-making. Our approach utilizes formal methods to provide interpretations of reward design, transparency of training convergence, and probabilistic interpretation of sequential decisions. To demonstrate practicality, we show convergent training with guaranteed global convergence rates not only in simulation but also in real robot tasks, and empirically verify a reasonable tradeoff between high performance and conservative interpretability.
摘要
<>使用强化学习或优化控制可以提供有效的解释力 для序列决策问题中的变量动力学。然而,在实践实现中,抽象序列决策问题为推理存在一个持续的挑战,即解释奖函数和相应的优化策略。因此,将序列决策问题正式化为推理有着很大的价值,因为推理在原则上提供了多样化和强大的数学工具来推断随机动力学,并提供了序列决策中的概率解释。在这种研究中,我们提出了一种新的自适应沃尔希特变量优化(AWaVO)方法,以解决序列决策中这些挑战。我们的方法使用正式方法提供奖函数设计的解释、训练过程的透明性和序列决策中的概率解释。为证明实用性,我们在模拟和真实 робоット任务中展示了可靠的训练,并证明了保证全球追踪率的全局收敛率,并且在实际中证明了高性能和保守可读性之间的合理交易。
A Scenario-Based Functional Testing Approach to Improving DNN Performance
results: 通过对弱场景进行特定的重新训练和随机选择部分原始训练数据,提高了深度神经网络(DNN)模型的性能,并且比较效率地进行了改进,对人工和计算资源的需求更低Abstract
This paper proposes a scenario-based functional testing approach for enhancing the performance of machine learning (ML) applications. The proposed method is an iterative process that starts with testing the ML model on various scenarios to identify areas of weakness. It follows by a further testing on the suspected weak scenarios and statistically evaluate the model's performance on the scenarios to confirm the diagnosis. Once the diagnosis of weak scenarios is confirmed by test results, the treatment of the model is performed by retraining the model using a transfer learning technique with the original model as the base and applying a set of training data specifically targeting the treated scenarios plus a subset of training data selected at random from the original train dataset to prevent the so-call catastrophic forgetting effect. Finally, after the treatment, the model is assessed and evaluated again by testing on the treated scenarios as well as other scenarios to check if the treatment is effective and no side effect caused. The paper reports a case study with a real ML deep neural network (DNN) model, which is the perception system of an autonomous racing car. It is demonstrated that the method is effective in the sense that DNN model's performance can be improved. It provides an efficient method of enhancing ML model's performance with much less human and compute resource than retrain from scratch.
摘要
results: 对于一些数据集,该算法可以提供更好的分 clustering结果,比如在分类问题中,使用kernel方法可以提高性能和准确性。Abstract
This paper presents a kernelized version of the t-SNE algorithm, capable of mapping high-dimensional data to a low-dimensional space while preserving the pairwise distances between the data points in a non-Euclidean metric. This can be achieved using a kernel trick only in the high dimensional space or in both spaces, leading to an end-to-end kernelized version. The proposed kernelized version of the t-SNE algorithm can offer new views on the relationships between data points, which can improve performance and accuracy in particular applications, such as classification problems involving kernel methods. The differences between t-SNE and its kernelized version are illustrated for several datasets, showing a neater clustering of points belonging to different classes.
摘要
这篇论文提出了一种基于kernel的t-SNE算法,可以将高维数据映射到低维空间中,保持数据点之间的对称距离,而不是使用欧几里得空间。这可以通过kernel技术在高维空间或在两个空间中进行,从而实现一个端到端kernelized版本。提议的kernelized版本的t-SNE算法可以提供新的视图,用于描述数据点之间的关系,可以提高特定应用中的性能和准确性,如使用kernel方法进行分类问题。与t-SNE算法的区别被 illustrate для一些数据集,显示了不同类别的点更加整洁的归一化。
Unsupervised Learning of Distributional Properties can Supplement Human Labeling and Increase Active Learning Efficiency in Anomaly Detection
results: 我们的 AL 采样策略在三个高度不均衡的 UCI benchmark 上和一个真实世界的隐藏电子邮件数据集上表现出色,超过了现有的 AL 方法。Abstract
Exfiltration of data via email is a serious cybersecurity threat for many organizations. Detecting data exfiltration (anomaly) patterns typically requires labeling, most often done by a human annotator, to reduce the high number of false alarms. Active Learning (AL) is a promising approach for labeling data efficiently, but it needs to choose an efficient order in which cases are to be labeled, and there are uncertainties as to what scoring procedure should be used to prioritize cases for labeling, especially when detecting rare cases of interest is crucial. We propose an adaptive AL sampling strategy that leverages the underlying prior data distribution, as well as model uncertainty, to produce batches of cases to be labeled that contain instances of rare anomalies. We show that (1) the classifier benefits from a batch of representative and informative instances of both normal and anomalous examples, (2) unsupervised anomaly detection plays a useful role in building the classifier in the early stages of training when relatively little labeling has been done thus far. Our approach to AL for anomaly detection outperformed existing AL approaches on three highly unbalanced UCI benchmarks and on one real-world redacted email data set.
摘要
<>将数据外部传输到电子邮件是许多组织面临的严重网络安全威胁。检测数据外部传输(异常)模式通常需要标注,通常由人工标注员进行,以降低假警示的数量。活动学习(AL)是一种有前途的方法,可以有效地标注数据,但是需要选择有效的批处理顺序,以及使用的分数方法来优先级排序案例,特别是当检测罕见异常 случа件是关键的时候。我们提出了一种适应性的 AL 采样策略,利用下面的先验分布,以及模型的不确定性,生成需要标注的批处理中包含罕见异常例子的情况。我们表明了以下两点:(1)分类器可以从一批代表性和有用的正常和异常示例中受益,(2)无监督异常检测在训练过程的早期可以扮演一个有用的角色,尤其是当 relativamente 少的标注工作已经完成时。我们的 AL 方法在三个高度不均衡的 UCI benchmark 上和一个真实世界的隐藏邮件数据集上表现出色。
CaRT: Certified Safety and Robust Tracking in Learning-based Motion Planning for Multi-Agent Systems
paper_authors: Hiroyasu Tsukamoto, Benjamin Rivière, Changrak Choi, Amir Rahmani, Soon-Jo Chung
for: guaranteeing the safety and robustness of learning-based motion planning policies in nonlinear multi-agent systems
methods: analytical form of the CaRT safety/robust filter, which uses contraction theory to ensure safety and exponential boundedness of the trajectory tracking error, and a log-barrier formulation for distributed implementation in multi-agent settings
results: effectiveness of CaRT in several examples of nonlinear motion planning and control problems, including optimal, multi-spacecraft reconfigurationAbstract
The key innovation of our analytical method, CaRT, lies in establishing a new hierarchical, distributed architecture to guarantee the safety and robustness of a given learning-based motion planning policy. First, in a nominal setting, the analytical form of our CaRT safety filter formally ensures safe maneuvers of nonlinear multi-agent systems, optimally with minimal deviation from the learning-based policy. Second, in off-nominal settings, the analytical form of our CaRT robust filter optimally tracks the certified safe trajectory, generated by the previous layer in the hierarchy, the CaRT safety filter. We show using contraction theory that CaRT guarantees safety and the exponential boundedness of the trajectory tracking error, even under the presence of deterministic and stochastic disturbance. Also, the hierarchical nature of CaRT enables enhancing its robustness for safety just by its superior tracking to the certified safe trajectory, thereby making it suitable for off-nominal scenarios with large disturbances. This is a major distinction from conventional safety function-driven approaches, where the robustness originates from the stability of a safe set, which could pull the system over-conservatively to the interior of the safe set. Our log-barrier formulation in CaRT allows for its distributed implementation in multi-agent settings. We demonstrate the effectiveness of CaRT in several examples of nonlinear motion planning and control problems, including optimal, multi-spacecraft reconfiguration.
摘要
“我们的 CaRT 分析方法的关键创新在于建立了一个新的层次化、分布式架构,以保证学习型动力规划策略的安全和可靠性。首先,在正常设定下,我们的 CaRT 安全范防 formally 保证了非线性多自适应系统的安全运动,并且将其与学习型政策的最小偏差进行优化。其次,在偏差设定下,CaRT 的安全范防将跟踪由前一层架构生成的认证安全轨迹,以 guarantees 安全和可靠性。我们使用构造理论表明 CaRT 能够保证安全和轨迹追踪误差的对数式增长,甚至在决定性和随机干扰的存在下。此外,CaRT 的层次化结构使得它可以通过优化跟踪认证安全轨迹来增强其可靠性,因此适合偏差设定下的大干扰。这与传统的安全函数驱动方法不同,这些方法的稳定性来自安全集的稳定性,可能会将系统往内紧缩到安全集的内部。CaRT 的对数阻隔表现允许它在多自适应设定下进行分布式实现。我们在一些非线性动力规划和控制问题中证明了 CaRT 的有效性,包括多spacecraft 重配置问题。”
Rician likelihood loss for quantitative MRI using self-supervised deep learning
results: 对于 Apparent Diffusion Coefficient(ADC)和Intra-voxel Incoherent Motion(IVIM)分布模型中的参数估计,Networks trained with NLR loss show higher estimation accuracy than MSE as SNR decreases, with minimal loss of precision or total error.Abstract
Purpose: Previous quantitative MR imaging studies using self-supervised deep learning have reported biased parameter estimates at low SNR. Such systematic errors arise from the choice of Mean Squared Error (MSE) loss function for network training, which is incompatible with Rician-distributed MR magnitude signals. To address this issue, we introduce the negative log Rician likelihood (NLR) loss. Methods: A numerically stable and accurate implementation of the NLR loss was developed to estimate quantitative parameters of the apparent diffusion coefficient (ADC) model and intra-voxel incoherent motion (IVIM) model. Parameter estimation accuracy, precision and overall error were evaluated in terms of bias, variance and root mean squared error and compared against the MSE loss over a range of SNRs (5 - 30). Results: Networks trained with NLR loss show higher estimation accuracy than MSE for the ADC and IVIM diffusion coefficients as SNR decreases, with minimal loss of precision or total error. At high effective SNR (high SNR and small diffusion coefficients), both losses show comparable accuracy and precision for all parameters of both models. Conclusion: The proposed NLR loss is numerically stable and accurate across the full range of tested SNRs and improves parameter estimation accuracy of diffusion coefficients using self-supervised deep learning. We expect the development to benefit quantitative MR imaging techniques broadly, enabling more accurate parameter estimation from noisy data.
摘要
目的:前一些量化MR成像研究使用自动编码学习发现低信噪率下参量估算结果受到了系统性的误差的问题。这些系统性错误来自于用于网络训练的平均方差(MSE)损失函数与MR幅度信号的 rician 分布不兼容。为解决这个问题,我们引入了负Log Rician 概率(NLR)损失函数。方法:我们开发了一种稳定和准确的 NLR 损失函数实现,以估算量化参量ADC模型和IVIM模型中的参量。我们评估了参量估算精度、精度和总误差,并与MSE损失函数进行比较,在5-30的SNR范围内进行了测试。结果:使用NLR损失函数训练的网络显示在SNR降低时,ADC和IVIM扩散系数的参量估算精度提高,而不会失去精度或总误差。在高有效SNR(高SNR和小扩散系数)下,两种损失函数都显示了相似的精度和精度。结论:我们提出的NLR损失函数是稳定和准确的,可以在全面测试的SNR范围内进行估算参量。我们预计这种发展将对量化MR成像技术产生积极的影响,使得从噪声数据中更加准确地估算参量。
Proof of Training (PoT): Harnessing Crypto Mining Power for Distributed AI Training
results: considerable potential in terms of task throughput, system robustness, and network securityAbstract
In the midst of the emerging trend of integrating artificial intelligence (AI) with crypto mining, we identify three major challenges that create a gap between these two fields. To bridge this gap, we introduce the proof-of-training (PoT) protocol, an approach that combines the strengths of both AI and blockchain technology. The PoT protocol utilizes the practical Byzantine fault tolerance (PBFT) consensus mechanism to synchronize global states. To evaluate the performance of the protocol design, we present an implementation of a decentralized training network (DTN) that adopts the PoT protocol. Our results indicate that the protocol exhibits considerable potential in terms of task throughput, system robustness, and network security.
摘要
在人工智能(AI)与抵销(crypto mining)两个领域的融合趋势中,我们识别出三大挑战,这些挑战使得这两个领域之间存在一个差距。为了bridging这个差距,我们提出了证明训练(PoT)协议,这种协议结合了AI和区块链技术的优势。PoT协议使用实际的拜占庭错误tolerance(PBFT)共识机制来同步全球状态。为评估协议设计的性能,我们提出了一个分布式训练网络(DTN)的实现,该网络采用PoT协议。我们的结果表明,协议在任务 durchput、系统稳定性和网络安全方面具有显著的潜力。
Bootstrapping Vision-Language Learning with Decoupled Language Pre-training
results: 对一种robust image-to-text基eline(BLIP-2)进行了改进,并将模型训练集的数据量从4M变为129M,显著提高了模型的性能。此外,模型在不同的基模块和视频学习任务中也表现出了高效性。Abstract
We present a novel methodology aimed at optimizing the application of frozen large language models (LLMs) for resource-intensive vision-language (VL) pre-training. The current paradigm uses visual features as prompts to guide language models, with a focus on determining the most relevant visual features for corresponding text. Our approach diverges by concentrating on the language component, specifically identifying the optimal prompts to align with visual features. We introduce the Prompt-Transformer (P-Former), a model that predicts these ideal prompts, which is trained exclusively on linguistic data, bypassing the need for image-text pairings. This strategy subtly bifurcates the end-to-end VL training process into an additional, separate stage. Our experiments reveal that our framework significantly enhances the performance of a robust image-to-text baseline (BLIP-2), and effectively narrows the performance gap between models trained with either 4M or 129M image-text pairs. Importantly, our framework is modality-agnostic and flexible in terms of architectural design, as validated by its successful application in a video learning task using varied base modules. The code is available at https://github.com/yiren-jian/BLIText
摘要
我们提出了一种新的方法ология,旨在优化冻结大型语言模型(LLM)的资源占用量 для视觉语言(VL)预训练。当前的方法使用视觉特征作为提示,以确定与文本相关的最相关的视觉特征。我们的方法则专注于语言组件,具体是确定最佳提示,以与视觉特征对齐。我们提出了提问转换器(P-Former)模型,可以预测这些理想的提示,该模型由solely on linguistic data 训练,不需要图像文本对应。这种策略将结束到端到端 VL 训练过程中的另一个额外阶段。我们的实验表明,我们的框架可以显著提高一个强大的图像文本基线(BLIP-2)的性能,并有效地减少使用4M或129M图像文本对应的模型性能差距。其中,我们的框架是模块无关和架构可变的,并在视频学习任务中成功应用了不同的基模块。代码可以在https://github.com/yiren-jian/BLIText 中下载。
Controllable Emphasis with zero data for text-to-speech
paper_authors: Arnaud Joly, Marco Nicolis, Ekaterina Peterova, Alessandro Lombardi, Ammar Abbas, Arent van Korlaar, Aman Hussain, Parul Sharma, Alexis Moinet, Mateusz Lajszczak, Penny Karanasou, Antonio Bonafonte, Thomas Drugman, Elena Sokolova
results: 对比spectrogram修改技术,这种方法可以提高自然性的提升率达7.3%,并在测试 sentence中提高correct identifier的率达40%。此外,这种技术还可以适用于不同的语言(英语、西班牙语、意大利语、德语)、不同的voice和多种说话风格。Abstract
We present a scalable method to produce high quality emphasis for text-to-speech (TTS) that does not require recordings or annotations. Many TTS models include a phoneme duration model. A simple but effective method to achieve emphasized speech consists in increasing the predicted duration of the emphasised word. We show that this is significantly better than spectrogram modification techniques improving naturalness by $7.3\%$ and correct testers' identification of the emphasized word in a sentence by $40\%$ on a reference female en-US voice. We show that this technique significantly closes the gap to methods that require explicit recordings. The method proved to be scalable and preferred in all four languages tested (English, Spanish, Italian, German), for different voices and multiple speaking styles.
摘要
我们提出了一种可扩展的方法,可以生成高质量的强调文本到语音识别(TTS),不需要录音或标注。许多TTS模型包含一个音节持续时间模型。我们发现,通过增加预测的强调单词持续时间,可以得到更高质量的强调speech。我们发现,这种方法在自然性和正确地标识强调单词方面比spectrogram修改技术提高$7.3\%$和$40\%$在参考女性英文voice上。我们发现,这种技术可以在英文、西班牙语、意大利语和德语等语言中进行扩展,并且对不同的voice和多种说话风格都有好的表现。
results: 这个研究获得了较高的准确率(91.03%)和较低的延迟(1秒),并且使用了15% fewer parameters和低Bitprecision,实现了57%的内存储存降少。Abstract
In a multi-speaker "cocktail party" scenario, a listener can selectively attend to a speaker of interest. Studies into the human auditory attention network demonstrate cortical entrainment to speech envelopes resulting in highly correlated Electroencephalography (EEG) measurements. Current trends in EEG-based auditory attention detection (AAD) using artificial neural networks (ANN) are not practical for edge-computing platforms due to longer decision windows using several EEG channels, with higher power consumption and larger memory footprint requirements. Nor are ANNs capable of accurately modeling the brain's top-down attention network since the cortical organization is complex and layer. In this paper, we propose a hybrid convolutional neural network-spiking neural network (CNN-SNN) corticomorphic architecture, inspired by the auditory cortex, which uses EEG data along with multi-speaker speech envelopes to successfully decode auditory attention with low latency down to 1 second, using only 8 EEG electrodes strategically placed close to the auditory cortex, at a significantly higher accuracy of 91.03%, compared to the state-of-the-art. Simultaneously, when compared to a traditional CNN reference model, our model uses ~15% fewer parameters at a lower bit precision resulting in ~57% memory footprint reduction. The results show great promise for edge-computing in brain-embedded devices, like smart hearing aids.
摘要
在多个说话者的 "cocktail party" 场景中,一个听众可以选择性地注意到 interessante 的说话者。人类听力注意网络的研究表明,在语音包裹中的 cortical 整合会导致高相关的电энцеfalографи(EEG)测量。现有的 EEG 基于听力注意检测(AAD)技术使用人工神经网络(ANN)不太实用于边缘计算平台,因为它们需要较长的决策窗口、更多的 EEG 通道和更大的存储占用。此外,ANN 不能准确模型大脑的上下文注意网络,因为大脑的 cortical 组织复杂且多层。在这篇论文中,我们提出了一种 hybrid convolutional neural network-spiking neural network(CNN-SNN) corticomorphic 架构, Draw inspiration from the auditory cortex,使用 EEG 数据以及多个说话者的语音包裹来成功地解码听力注意,延迟时间在 1 秒钟,使用只有 8 个 EEG 电极,位于 auditory cortex 附近,具有 significatively 高的准确率(91.03%),比对 state-of-the-art 更高。同时,与传统 CNN 参考模型相比,我们的模型使用了 ~15% fewer parameters,并且使用了更低的比特精度,即使 ~57% 的存储占用减少。结果表明,这种 corticomorphic 架构具有优秀的潜在应用于边缘计算的潜力,如智能耳机。
Reward-Directed Conditional Diffusion: Provable Distribution Estimation and Reward Improvement
results: 我们的实验结果显示,这个受赏导向生成器可以将新的人造数据集传递到使用者指定的目标受赏值附近,并且这个改善的受赏与数据分布的迁移程度有关。此外,我们还发现了干扰因素之间的交互作用,包括受赏信号的强度、数据分布的变化和外支援抽象的成本。Abstract
We explore the methodology and theory of reward-directed generation via conditional diffusion models. Directed generation aims to generate samples with desired properties as measured by a reward function, which has broad applications in generative AI, reinforcement learning, and computational biology. We consider the common learning scenario where the data set consists of unlabeled data along with a smaller set of data with noisy reward labels. Our approach leverages a learned reward function on the smaller data set as a pseudolabeler. From a theoretical standpoint, we show that this directed generator can effectively learn and sample from the reward-conditioned data distribution. Additionally, our model is capable of recovering the latent subspace representation of data. Moreover, we establish that the model generates a new population that moves closer to a user-specified target reward value, where the optimality gap aligns with the off-policy bandit regret in the feature subspace. The improvement in rewards obtained is influenced by the interplay between the strength of the reward signal, the distribution shift, and the cost of off-support extrapolation. We provide empirical results to validate our theory and highlight the relationship between the strength of extrapolation and the quality of generated samples.
摘要
我们研究了奖励导向生成的方法ología和理论,使用条件扩散模型。奖励导向生成的目的是通过奖励函数来生成具有愿景属性的样本,这有广泛的应用在生成AI、奖励学习和计算生物学等领域。我们考虑了常见的学习场景,即数据集包括无标签数据和一个较小的噪声奖励标签数据。我们的方法利用学习的奖励函数来训练一个pseudolabeler。从理论上来说,我们的导向生成器可以有效地学习和抽取奖励条件的数据分布。此外,我们的模型还可以重现数据的隐藏特征空间表示。此外,我们证明了导向生成器可以在用户指定的目标奖励值附近生成一个新的人口,其优化差异与偏离策略异常幅度相关。实际结果 validate our theory, and highlight the relationship between the strength of extrapolation and the quality of generated samples.Note: Please note that the translation is in Simplified Chinese, and the grammar and sentence structure may be different from the original text.
Making the Most Out of the Limited Context Length: Predictive Power Varies with Clinical Note Type and Note Section
results: 研究结果显示,临床护理纪录中的predictive power分布不同,对于护士笔记和释出笔记有不同的特征。此外,结合不同类型的护理纪录可以在长时间上提高性能。Abstract
Recent advances in large language models have led to renewed interest in natural language processing in healthcare using the free text of clinical notes. One distinguishing characteristic of clinical notes is their long time span over multiple long documents. The unique structure of clinical notes creates a new design choice: when the context length for a language model predictor is limited, which part of clinical notes should we choose as the input? Existing studies either choose the inputs with domain knowledge or simply truncate them. We propose a framework to analyze the sections with high predictive power. Using MIMIC-III, we show that: 1) predictive power distribution is different between nursing notes and discharge notes and 2) combining different types of notes could improve performance when the context length is large. Our findings suggest that a carefully selected sampling function could enable more efficient information extraction from clinical notes.
摘要
大量语言模型的进步已导致医疗领域自然语言处理(NLP)中的重新兴趣,特别是使用医疗记录中的自由文本。医疗记录的一个特点是其长时间覆盖多个长文档,这对NLP模型的设计带来了新的选择:当语言模型预测器的上下文长度有限制时,我们应该选择哪些部分作为输入?现有研究可能选择输入的领域知识或 simply truncate them。我们提出了一个框架来分析有高预测力的部分。使用MIMIC-III数据集,我们发现:1)预测力分布在许多不同的部分中有所不同,2)结合不同类型的记录可以在上下文长度较大时提高性能。我们的发现建议一个仔细选择的采样函数可以更有效地提取信息从医疗记录中。
AnyStar: Domain randomized universal star-convex 3D instance segmentation
results: 这个论文的网络可以对不同的数据集和传感器模式进行3D星形结构分类,并且不需要任何再训练、调整或域对应。Abstract
Star-convex shapes arise across bio-microscopy and radiology in the form of nuclei, nodules, metastases, and other units. Existing instance segmentation networks for such structures train on densely labeled instances for each dataset, which requires substantial and often impractical manual annotation effort. Further, significant reengineering or finetuning is needed when presented with new datasets and imaging modalities due to changes in contrast, shape, orientation, resolution, and density. We present AnyStar, a domain-randomized generative model that simulates synthetic training data of blob-like objects with randomized appearance, environments, and imaging physics to train general-purpose star-convex instance segmentation networks. As a result, networks trained using our generative model do not require annotated images from unseen datasets. A single network trained on our synthesized data accurately 3D segments C. elegans and P. dumerilii nuclei in fluorescence microscopy, mouse cortical nuclei in micro-CT, zebrafish brain nuclei in EM, and placental cotyledons in human fetal MRI, all without any retraining, finetuning, transfer learning, or domain adaptation. Code is available at https://github.com/neel-dey/AnyStar.
摘要
星形对象在生物微型Scope和放射学中出现,包括核体、肿块、迁移和其他单元。现有的实例分割网络 для这些结构通常需要大量的手动标注准确率,而且需要重大的重新引擎或finetuning,以适应新的 datasets和成像模式。我们提出了AnyStar,一种随机生成的域特征模型,通过模拟不同的星形对象的Randomized外观、环境和成像物理来训练通用的星形对象分割网络。因此,使用我们生成的数据进行训练,无需从未看过的 datasets 中获取标注图像。我们的网络可以高精度地3D分割 C. elegans 和 P. dumerilii 核体在激发微scopes 中, mouse 脑核体在 Micro-CT 中, zebrafish 脑核体在 EM 中,以及人类胎儿 Placental cotyledons 在人类胎儿 MRI 中,无需任何再训练、finetuning、转移学习或域适应。代码可以在 中找到。
Tapestry of Time and Actions: Modeling Human Activity Sequences using Temporal Point Process Flows
methods: 该研究提出了ProActive模型,基于神经网络和 temporal marked temporal point process(MTPP)框架,可以同时解决下游任务中的下一个动作预测、序列目标预测和端到端序列生成等问题。
results: 对于三个活动识别数据集的测试,ProActive模型表现出了明显的性能提升,包括动作和目标预测等,同时也实现了端到端序列生成的首次应用。Abstract
Human beings always engage in a vast range of activities and tasks that demonstrate their ability to adapt to different scenarios. Any human activity can be represented as a temporal sequence of actions performed to achieve a certain goal. Unlike the time series datasets extracted from electronics or machines, these action sequences are highly disparate in their nature -- the time to finish a sequence of actions can vary between different persons. Therefore, understanding the dynamics of these sequences is essential for many downstream tasks such as activity length prediction, goal prediction, next action recommendation, etc. Existing neural network-based approaches that learn a continuous-time activity sequence (or CTAS) are limited to the presence of only visual data or are designed specifically for a particular task, i.e., limited to next action or goal prediction. In this paper, we present ProActive, a neural marked temporal point process (MTPP) framework for modeling the continuous-time distribution of actions in an activity sequence while simultaneously addressing three high-impact problems -- next action prediction, sequence-goal prediction, and end-to-end sequence generation. Specifically, we utilize a self-attention module with temporal normalizing flows to model the influence and the inter-arrival times between actions in a sequence. In addition, we propose a novel addition over the ProActive model that can handle variations in the order of actions, i.e., different methods of achieving a given goal. We demonstrate that this variant can learn the order in which the person or actor prefers to do their actions. Extensive experiments on sequences derived from three activity recognition datasets show the significant accuracy boost of ProActive over the state-of-the-art in terms of action and goal prediction, and the first-ever application of end-to-end action sequence generation.
摘要
人类总是在各种各样的活动和任务中展示出具有适应性的能力。任何人类活动都可以表示为一个时间序列中的动作序列,以达到某个目标。与电子设备或机器所提取的时间序列数据不同,这些动作序列之间的性质异常分散,因此理解这些序列的动态是许多下游任务的关键,如动作长度预测、目标预测、下一个动作建议等。现有的神经网络基本上的方法,学习连续时间动作序列(CTAS)都是基于视觉数据的,或者特定任务的,如下一个动作或目标预测。在这篇论文中,我们提出了ProActive框架,一种基于神经网络marked temporal point process(MTPP)模型,用于模型连续时间动作序列中的动作分布,同时解决三个高影响性问题:下一个动作预测、序列目标预测和端到端动作序列生成。具体来说,我们使用自注意模块和时间正常化流来模型动作序列中的影响和间隔时间。此外,我们还提出了一种新的ProActive模型变体,可以处理动作序列中的动作顺序变化,即不同的方法来完成同一个目标。我们的实验表明,这种变体可以学习人或演员在完成动作序列时的顺序。与现有状态的艺术预测相比,ProActive在动作和目标预测方面具有显著的准确性提升,并且实现了 historia calidad的端到端动作序列生成。
Accelerated gradient methods for nonconvex optimization: Escape trajectories from strict saddle points and convergence to local minima
paper_authors: Rishabh Dixit, Mert Gurbuzbalaban, Waheed U. Bajwa
for: 本研究探讨加速度法在圆锥函数上的行为。
methods: 本文提出了一种广泛的内斯特洛夫-类型加速方法,并对这种方法进行了严格的研究,包括逃脱矩阵点和 converges to local minima,通过both asymptotic和非 asymptotic分析。
results: 本文回答了内斯特洛夫加速度法(NAG)中变量摩omentum参数是否可避免精细阶点的问题,并开发了两种 asymptotic rate of convergence和divergence的度量,对一些流行的加速方法(如NAG和NCM)进行了评估。此外,本文还提供了在精细阶点附近的“线性” exit time 估计,以及不存在这些轨迹的必要条件。最后,本文研究了一类加速方法,可以在圆锥函数上 converges to local minima with near optimal rate,同时具有更好的矩阵点逃脱行为。Abstract
This paper considers the problem of understanding the behavior of a general class of accelerated gradient methods on smooth nonconvex functions. Motivated by some recent works that have proposed effective algorithms, based on Polyak's heavy ball method and the Nesterov accelerated gradient method, to achieve convergence to a local minimum of nonconvex functions, this work proposes a broad class of Nesterov-type accelerated methods and puts forth a rigorous study of these methods encompassing the escape from saddle-points and convergence to local minima through a both asymptotic and a non-asymptotic analysis. In the asymptotic regime, this paper answers an open question of whether Nesterov's accelerated gradient method (NAG) with variable momentum parameter avoids strict saddle points almost surely. This work also develops two metrics of asymptotic rate of convergence and divergence, and evaluates these two metrics for several popular standard accelerated methods such as the NAG, and Nesterov's accelerated gradient with constant momentum (NCM) near strict saddle points. In the local regime, this work provides an analysis that leads to the "linear" exit time estimates from strict saddle neighborhoods for trajectories of these accelerated methods as well the necessary conditions for the existence of such trajectories. Finally, this work studies a sub-class of accelerated methods that can converge in convex neighborhoods of nonconvex functions with a near optimal rate to a local minima and at the same time this sub-class offers superior saddle-escape behavior compared to that of NAG.
摘要
In the asymptotic regime, the paper answers an open question about whether Nesterov's accelerated gradient method (NAG) with a variable momentum parameter avoids strict saddle points almost surely. The study also develops two metrics of asymptotic rate of convergence and divergence, and evaluates these metrics for several popular standard accelerated methods, including NAG and Nesterov's accelerated gradient with constant momentum (NCM), near strict saddle points.In the local regime, the paper provides an analysis that leads to "linear" exit time estimates from strict saddle neighborhoods for trajectories of these accelerated methods, as well as the necessary conditions for the existence of such trajectories.Finally, the paper studies a sub-class of accelerated methods that can converge in convex neighborhoods of nonconvex functions with a near-optimal rate to a local minimum, while also exhibiting superior saddle-escape behavior compared to NAG.
Multi-Player Zero-Sum Markov Games with Networked Separable Interactions
results: 证明了在 infinitem-horizon 折扣MZNMG中找到Markov stationary NE是PPAD困难的,除非网络具有星形结构。此外,提出了一种基于 fictitious-play 的动力学,并证明了其在星形网络上的收敛性。Abstract
We study a new class of Markov games (MGs), \textit{Multi-player Zero-sum Markov Games} with {\it Networked separable interactions} (MZNMGs), to model the local interaction structure in non-cooperative multi-agent sequential decision-making. We define an MZNMG as a model where {the payoffs of the auxiliary games associated with each state are zero-sum and} have some separable (i.e., polymatrix) structure across the neighbors over some interaction network. We first identify the necessary and sufficient conditions under which an MG can be presented as an MZNMG, and show that the set of Markov coarse correlated equilibrium (CCE) collapses to the set of Markov Nash equilibrium (NE) in these games, in that the {product of} per-state marginalization of the former for all players yields the latter. Furthermore, we show that finding approximate Markov \emph{stationary} CCE in infinite-horizon discounted MZNMGs is \texttt{PPAD}-hard, unless the underlying network has a ``star topology''. Then, we propose fictitious-play-type dynamics, the classical learning dynamics in normal-form games, for MZNMGs, and establish convergence guarantees to Markov stationary NE under a star-shaped network structure. Finally, in light of the hardness result, we focus on computing a Markov \emph{non-stationary} NE and provide finite-iteration guarantees for a series of value-iteration-based algorithms. We also provide numerical experiments to corroborate our theoretical results.
摘要
我们研究一种新的Markov游戏(MG),即多人零点Markov游戏(MZNMG),用于模型多个自主决策者之间的本地互动结构。我们定义MZNMG为一个模型,其中每个状态的协助游戏奖励为零点和一些分割(i.e., 多维)结构的交互网络。我们首先确定MG可以转化为MZNMG的必要和 suficient condition,并证明MG的Markov均衡(NE)与Markov均衡极限(CCE)的集合相同。此外,我们证明在无限远程积分MZNMG中找到 Approximate Markov stationary CCE 是PPAD困难的,除非网络具有星型拓扑结构。然后,我们提出了 fiction play 类型的动力学,normal form 游戏的传统学习动力学, для MZNMG,并证明其在星型网络结构下具有收敛保证。由于困难结果,我们将关注计算Markov非站点均衡,并提供了一系列值迭代基于算法的finite-iteration 保证。我们还提供了数值实验来证明我们的理论结果。
Multi-view self-supervised learning for multivariate variable-channel time series
results: 我们透过将模型预训练在六个EEG通道上,然后精致化在两个不同的EEG通道上,并与和 без传递讯息神经网络的模型进行比较。我们发现,我们的方法,结合TS2Vec损失函数,在大多数情况下都能够超过其他方法的表现。Abstract
Labeling of multivariate biomedical time series data is a laborious and expensive process. Self-supervised contrastive learning alleviates the need for large, labeled datasets through pretraining on unlabeled data. However, for multivariate time series data, the set of input channels often varies between applications, and most existing work does not allow for transfer between datasets with different sets of input channels. We propose learning one encoder to operate on all input channels individually. We then use a message passing neural network to extract a single representation across channels. We demonstrate the potential of this method by pretraining our model on a dataset with six EEG channels and then fine-tuning it on a dataset with two different EEG channels. We compare models with and without the message passing neural network across different contrastive loss functions. We show that our method, combined with the TS2Vec loss, outperforms all other methods in most settings.
摘要
Multivariate 医学时间序列数据标注是一个劳资成本高的过程。无监督对比学习可以减少大量标注数据的需求。然而,多变量时间序列数据的输入通道集合通常在应用程序之间变化,现有的大多数工作不允许数据集之间的传输。我们提议学习一个Encoder来处理所有输入通道。然后,我们使用一个消息传递神经网络提取所有通道的单一表示。我们在六个EEG通道的数据集上预训练我们的模型,然后在两个不同的EEG通道上细化模型。我们将与和 без消息传递神经网络进行比较,并使用TS2Vec损失函数。我们显示,我们的方法,结合TS2Vec损失函数,在大多数设置下超越其他方法。
Near-Optimal Bounds for Learning Gaussian Halfspaces with Random Classification Noise
results: 学习问题的样本复杂度为 $\widetilde{\Theta}(d/\epsilon)$,其中 $d$ 是维度和 $\epsilon$ 是过度误差。Abstract
We study the problem of learning general (i.e., not necessarily homogeneous) halfspaces with Random Classification Noise under the Gaussian distribution. We establish nearly-matching algorithmic and Statistical Query (SQ) lower bound results revealing a surprising information-computation gap for this basic problem. Specifically, the sample complexity of this learning problem is $\widetilde{\Theta}(d/\epsilon)$, where $d$ is the dimension and $\epsilon$ is the excess error. Our positive result is a computationally efficient learning algorithm with sample complexity $\tilde{O}(d/\epsilon + d/(\max\{p, \epsilon\})^2)$, where $p$ quantifies the bias of the target halfspace. On the lower bound side, we show that any efficient SQ algorithm (or low-degree test) for the problem requires sample complexity at least $\Omega(d^{1/2}/(\max\{p, \epsilon\})^2)$. Our lower bound suggests that this quadratic dependence on $1/\epsilon$ is inherent for efficient algorithms.
摘要
我们研究一个基本问题:在 Gaussian 分布下学习通用(也可能不是同分布)半空间,受到Random Classification Noise的干扰。我们建立了几乎匹配的算法和统计 Query(SQ)下界结果,这些结果显示了这个问题的资讯计算差距。具体来说,这个学习问题的样本Complexity是 $\widetilde{\Theta}(d/\epsilon)$, где $d$ 是维度和 $\epsilon$ 是预ulu error。我们的正面结果是一个 computationally efficient 的学习算法,其样本Complexity 是 $\tilde{O}(d/\epsilon + d/(\max\{p, \epsilon\})^2)$,其中 $p$ 是目标半空间的偏好。在下界方面,我们显示任何有效的 SQ 算法(或低度测试) для这个问题需要至少 $\Omega(d^{1/2}/(\max\{p, \epsilon\})^2)$ 的样本 Complexity。我们的下界结果表明这个 quadratic dependence on $1/\epsilon$ 是有效的算法的基本特征。
Retrieving Continuous Time Event Sequences using Neural Temporal Point Processes with Learnable Hashing
results: 实验结果显示,NeuroSeqRet框架可以提供 significanly 高的准确率和效率,并且可以适应不同的应用需求。Abstract
Temporal sequences have become pervasive in various real-world applications. Consequently, the volume of data generated in the form of continuous time-event sequence(s) or CTES(s) has increased exponentially in the past few years. Thus, a significant fraction of the ongoing research on CTES datasets involves designing models to address downstream tasks such as next-event prediction, long-term forecasting, sequence classification etc. The recent developments in predictive modeling using marked temporal point processes (MTPP) have enabled an accurate characterization of several real-world applications involving the CTESs. However, due to the complex nature of these CTES datasets, the task of large-scale retrieval of temporal sequences has been overlooked by the past literature. In detail, by CTES retrieval we mean that for an input query sequence, a retrieval system must return a ranked list of relevant sequences from a large corpus. To tackle this, we propose NeuroSeqRet, a first-of-its-kind framework designed specifically for end-to-end CTES retrieval. Specifically, NeuroSeqRet introduces multiple enhancements over standard retrieval frameworks and first applies a trainable unwarping function on the query sequence which makes it comparable with corpus sequences, especially when a relevant query-corpus pair has individually different attributes. Next, it feeds the unwarped query sequence and the corpus sequence into MTPP-guided neural relevance models. We develop four variants of the relevance model for different kinds of applications based on the trade-off between accuracy and efficiency. We also propose an optimization framework to learn binary sequence embeddings from the relevance scores, suitable for the locality-sensitive hashing. Our experiments show the significant accuracy boost of NeuroSeqRet as well as the efficacy of our hashing mechanism.
摘要
现代应用中的时间序列有 become 普遍,因此 CTES 数据的量在过去几年内 exponential 增长。因此,大量的研究在 CTES 数据集上进行下游任务,如下一个事件预测、长期预测、时间序列分类等。 current 的 predictive modeling 技术使用 marked temporal point processes (MTPP) 可以准确地 caracterize 多种实际应用中的 CTES。然而,由于 CTES 数据集的复杂性,过去的文献中忽略了大规模 temporal sequence retrieval 任务。在详细的描述中,我们定义 CTES retrieval 为输入查询序列返回一个排名列表 relevante 序列从大型废疑集中。为解决这个问题,我们提出了 NeuroSeqRet,一个专门为 CTES retrieval 设计的框架。特别是,NeuroSeqRet 引入了多种改进 standard retrieval 框架,包括在查询序列上应用可训练的 unfolding 函数,使其与废疑集序列相比较,特别是当查询序列和废疑集序列具有不同的特征时。接下来,我们将推广查询序列和废疑集序列到 MTPP 引导的神经相关模型中。我们开发了四种不同类型应用的 relevance 模型,根据准确率和效率的负担进行负担。此外,我们还提出了一种优化框架,以学习适合本地Hashing 的二进制序列嵌入。我们的实验表明 NeuroSeqRet 具有显著的准确性提升,以及我们的嵌入机制的效果。
Student Assessment in Cybersecurity Training Automated by Pattern Mining and Clustering
results: 研究发现,数据挖掘和机器学习技术是适用于cybersecurity培训数据分析的有效方法,可以帮助教育研究人员和实践者评估学生的学习进度,提供有argeted的支持和改进培训设计。Abstract
Hands-on cybersecurity training allows students and professionals to practice various tools and improve their technical skills. The training occurs in an interactive learning environment that enables completing sophisticated tasks in full-fledged operating systems, networks, and applications. During the training, the learning environment allows collecting data about trainees' interactions with the environment, such as their usage of command-line tools. These data contain patterns indicative of trainees' learning processes, and revealing them allows to assess the trainees and provide feedback to help them learn. However, automated analysis of these data is challenging. The training tasks feature complex problem-solving, and many different solution approaches are possible. Moreover, the trainees generate vast amounts of interaction data. This paper explores a dataset from 18 cybersecurity training sessions using data mining and machine learning techniques. We employed pattern mining and clustering to analyze 8834 commands collected from 113 trainees, revealing their typical behavior, mistakes, solution strategies, and difficult training stages. Pattern mining proved suitable in capturing timing information and tool usage frequency. Clustering underlined that many trainees often face the same issues, which can be addressed by targeted scaffolding. Our results show that data mining methods are suitable for analyzing cybersecurity training data. Educational researchers and practitioners can apply these methods in their contexts to assess trainees, support them, and improve the training design. Artifacts associated with this research are publicly available.
摘要
手动网络安全培训可以帮助学生和职业人员提高技术能力。这种培训发生在一个互动式学习环境中,可以完成具有真实操作系统、网络和应用程序的复杂任务。在培训过程中,学习环境可以收集学员在环境中的交互数据,例如命令行工具的使用情况。这些数据包含学员学习过程中的 patrern,可以用来评估学员并提供反馈以帮助他们学习。然而,自动分析这些数据是具有挑战性的。培训任务包括复杂的问题解决,有多种解决方案可能。此外,学员生成的交互数据非常大。这篇论文探讨了18场网络安全培训会议中的数据,使用数据挖掘和机器学习技术进行分析。我们使用模式挖掘和聚合分析113名学员执行8834个命令的数据,揭示了他们的常见行为、错误、解决策略和培训阶段的困难。模式挖掘能够捕捉时间信息和工具使用频率。聚合发现许多学员面临相同的问题,可以通过targeted scaffolding进行支持。我们的结果表明,数据挖掘方法适用于分析网络安全培训数据。教育研究人员和实践者可以在他们的 контексте中应用这些方法,评估学员,支持他们,并改进培训设计。相关的研究 artifacts 公共可用。
Leveraging Factored Action Spaces for Off-Policy Evaluation
results: 该论文提出了一种基于分解动作空间的重要抽样(IS)估计器,并证明了这种估计器在存在大 combinatorial action spaces 的问题中具有较低的偏差和偏度。通过实验,论文还证明了这些理论结论的有效性。Abstract
Off-policy evaluation (OPE) aims to estimate the benefit of following a counterfactual sequence of actions, given data collected from executed sequences. However, existing OPE estimators often exhibit high bias and high variance in problems involving large, combinatorial action spaces. We investigate how to mitigate this issue using factored action spaces i.e. expressing each action as a combination of independent sub-actions from smaller action spaces. This approach facilitates a finer-grained analysis of how actions differ in their effects. In this work, we propose a new family of "decomposed" importance sampling (IS) estimators based on factored action spaces. Given certain assumptions on the underlying problem structure, we prove that the decomposed IS estimators have less variance than their original non-decomposed versions, while preserving the property of zero bias. Through simulations, we empirically verify our theoretical results, probing the validity of various assumptions. Provided with a technique that can derive the action space factorisation for a given problem, our work shows that OPE can be improved "for free" by utilising this inherent problem structure.
摘要
Off-policy evaluation (OPE) 目的是估算对于不同的行动序列而言,实际执行的效果。然而,现有的 OPE 估算器经常在具有大 combinatorial action space 的问题中表现出高偏差和高方差。我们研究如何使用 factored action space,即将每个动作表示为一些更小的 action space 中的独立子动作的组合来 mitigate 这个问题。这种方法可以为我们进行更细致的动作效果分析。在这项工作中,我们提出了一种基于 factored action space 的 "decomposed" importance sampling (IS) 估算器。对于满足某些假设的问题结构,我们证明了这种 decomposed IS 估算器 的方差比原始非 decomposed 版本更低,同时保持零偏差性。通过实验,我们证明了我们的理论结果,并探索了各种假设的有效性。在给定问题中 derivation 出 action space factorization 的技术可以,我们的工作表明了 OPE 可以免费地提高,通过利用这种问题的内在结构。
Impact of Free-carrier Nonlinearities on Silicon Microring-based Reservoir Computing
results: 研究发现,在thermo-optic和自由电子效应的影响下,可以在NARMA-10任务中实现NMSE低于0.05,这个结果取决于两种效应的时间常数。Abstract
We quantify the impact of thermo-optic and free-carrier effects on time-delay reservoir computing using a silicon microring resonator. We identify pump power and frequency detuning ranges with NMSE less than 0.05 for the NARMA-10 task depending on the time constants of the two considered effects.
摘要
我们测量了热光学和自由粒子效应对时延器计算的影响,使用了一个硅微环 resonator。我们确定了辐射功率和频率偏差范围,以达到NMSE小于0.05的NARMA-10任务,具体取决于两种考虑的效应的时间常数。Note: NMSE stands for "normalized mean squared error", which is a measure of the difference between the predicted and actual values. NARMA-10 is a benchmark task for time series forecasting.
HyperDreamBooth: HyperNetworks for Fast Personalization of Text-to-Image Models
results: 相比 DreamBooth 和 Textual Inversion,HyperDreamBooth 可以在20秒内实现人脸个性化生成,使用单个参考图片,并保持同样的质量和样式多样性。此外,HyperDreamBooth 的模型尺寸为10000倍小于normal DreamBooth模型。Abstract
Personalization has emerged as a prominent aspect within the field of generative AI, enabling the synthesis of individuals in diverse contexts and styles, while retaining high-fidelity to their identities. However, the process of personalization presents inherent challenges in terms of time and memory requirements. Fine-tuning each personalized model needs considerable GPU time investment, and storing a personalized model per subject can be demanding in terms of storage capacity. To overcome these challenges, we propose HyperDreamBooth-a hypernetwork capable of efficiently generating a small set of personalized weights from a single image of a person. By composing these weights into the diffusion model, coupled with fast finetuning, HyperDreamBooth can generate a person's face in various contexts and styles, with high subject details while also preserving the model's crucial knowledge of diverse styles and semantic modifications. Our method achieves personalization on faces in roughly 20 seconds, 25x faster than DreamBooth and 125x faster than Textual Inversion, using as few as one reference image, with the same quality and style diversity as DreamBooth. Also our method yields a model that is 10000x smaller than a normal DreamBooth model. Project page: https://hyperdreambooth.github.io
摘要
personalization 在生成AI中变得更加重要,可以Synthesize individuals in diverse contexts and styles, while retaining high-fidelity to their identities. However, the process of personalization presents inherent challenges in terms of time and memory requirements. Fine-tuning each personalized model needs considerable GPU time investment, and storing a personalized model per subject can be demanding in terms of storage capacity. To overcome these challenges, we propose HyperDreamBooth-a hypernetwork capable of efficiently generating a small set of personalized weights from a single image of a person. By composing these weights into the diffusion model, coupled with fast finetuning, HyperDreamBooth can generate a person's face in various contexts and styles, with high subject details while also preserving the model's crucial knowledge of diverse styles and semantic modifications. Our method achieves personalization on faces in roughly 20 seconds, 25x faster than DreamBooth and 125x faster than Textual Inversion, using as few as one reference image, with the same quality and style diversity as DreamBooth. Also our method yields a model that is 10000x smaller than a normal DreamBooth model. Project page:
In-context Autoencoder for Context Compression in a Large Language Model
results: 经过预训练和细化Objective,ICAE可以生成高精度、涵盖性好的内存槽,可以conditioning by target LLM для多种提示生成恰当的响应。Abstract
We propose the In-context Autoencoder (ICAE) for context compression in a large language model (LLM). The ICAE has two modules: a learnable encoder adapted with LoRA from an LLM for compressing a long context into a limited number of memory slots, and a fixed decoder which is the target LLM that can condition on the memory slots for various purposes. We first pretrain the ICAE using both autoencoding and language modeling objectives on massive text data, enabling it to generate memory slots that accurately and comprehensively represent the original context. Then, we fine-tune the pretrained ICAE on a small amount of instruct data to enhance its interaction with various prompts for producing desirable responses. Our experimental results demonstrate that the ICAE learned with our proposed pretraining and fine-tuning paradigm can effectively produce memory slots with $4\times$ context compression, which can be well conditioned on by the target LLM to respond to various prompts. The promising results demonstrate significant implications of the ICAE for its novel approach to the long context problem and its potential to reduce computation and memory overheads for LLM inference in practice, suggesting further research effort in context management for an LLM. Our code and data will be released shortly.
摘要
我们提出了内 Context Autoencoder(ICAE),用于压缩大语言模型(LLM)中的上下文。ICAE有两个模块:一个可学习的编码器,通过从LLM中提取LoRA来压缩长上下文到有限的内存槽中,以及一个固定的解码器,它是目标LLM,可以根据内存槽进行多种目的的条件。我们首先使用自动编码和语言模型目标来预训练ICAE,使其能够生成准确和全面的内存槽,然后通过细化预训练ICAE来进一步调整它与不同的提示进行交互,以生成满意的回答。我们的实验结果表明,通过我们提出的预训练和细化调整方法,ICAE可以有效地生成4倍压缩的内存槽,可以由目标LLM良好地条件。这些成果表明ICAE的新的方法对长上下文问题具有重要的意义,并且可以减少LLM推理中的计算和内存占用,建议进一步研究上下文管理的LLM。我们的代码和数据将很快发布。
On the Connection between Game-Theoretic Feature Attributions and Counterfactual Explanations
results: 研究发现,在满足certain condition时,特征归因和 counterfactual 解释方法是等价的。此外,研究还发现了使用 counterfactual 解释方法的局限性。In English, this translates to:
for: This research explores the relationship between two popular types of Explainable Artificial Intelligence (XAI) methods, namely feature attribution and counterfactual explanations.
methods: The research uses game-theoretic feature attribution and counterfactual explanation methods, with modifications made to both.
results: The study finds that, under certain conditions, feature attribution and counterfactual explanation methods are equivalent. Additionally, the study highlights the limitations of using counterfactual explanations.I hope this helps!Abstract
Explainable Artificial Intelligence (XAI) has received widespread interest in recent years, and two of the most popular types of explanations are feature attributions, and counterfactual explanations. These classes of approaches have been largely studied independently and the few attempts at reconciling them have been primarily empirical. This work establishes a clear theoretical connection between game-theoretic feature attributions, focusing on but not limited to SHAP, and counterfactuals explanations. After motivating operative changes to Shapley values based feature attributions and counterfactual explanations, we prove that, under conditions, they are in fact equivalent. We then extend the equivalency result to game-theoretic solution concepts beyond Shapley values. Moreover, through the analysis of the conditions of such equivalence, we shed light on the limitations of naively using counterfactual explanations to provide feature importances. Experiments on three datasets quantitatively show the difference in explanations at every stage of the connection between the two approaches and corroborate the theoretical findings.
摘要 Explainable Artificial Intelligence (XAI) 在最近几年内得到了广泛的关注,而两种最受欢迎的解释方法是特征归因和counterfactual解释。这两种方法在大多数情况下被研究独立地,只有一些基于实际研究的尝试进行了一些相互关系。本文在游戏理论特征归因和counterfactual解释之间建立了明确的理论连接,并且对 SHAP 特征归因进行了操作性的改变和counterfactual解释进行了扩展。我们证明,在某些条件下,这两种方法是等价的。然后,我们扩展了等价结果到游戏理论解决方案之外的其他解决方案。此外,通过分析等价条件的限制,我们把Counterfactual解释的局限性透视到了naively使用Counterfactual解释来提供特征重要性。实验在三个数据集上展示了在连接这两种方法的每个阶段的差异,并证明了理论发现的结论。 Here's the translation in Traditional Chinese: Explainable Artificial Intelligence (XAI) 在最近几年内得到了广泛的关注,而两种最受欢迎的解释方法是特征归因和counterfactual解释。这两种方法在大多数情况下被研究独立地,只有一些基于实际研究的尝试进行了一些相互关系。本文在游戏理论特征归因和counterfactual解释之间建立了明确的理论连接,并且对 SHAP 特征归因进行了操作性的改变和counterfactual解释进行了扩展。我们证明,在某些条件下,这两种方法是等价的。然后,我们扩展了等价结果到游戏理论解决方案之外的其他解决方案。此外,通过分析等价条件的限制,我们把Counterfactual解释的局限性透视到了naively使用Counterfactual解释来提供特征重要性。实验在三个数据集上展示了在连接这两种方法的每个阶段的差异,并证明了理论发现的结论。
Domain-Agnostic Tuning-Encoder for Fast Personalization of Text-To-Image Models
paper_authors: Moab Arar, Rinon Gal, Yuval Atzmon, Gal Chechik, Daniel Cohen-Or, Ariel Shamir, Amit H. Bermano
For: This paper focuses on improving the text-to-image (T2I) personalization process by developing a domain-agnostic method that can handle diverse concepts without requiring specialized datasets or prior information.* Methods: The proposed method uses a contrastive-based regularization technique to maintain high fidelity to the target concept characteristics while keeping the predicted embeddings close to editable regions of the latent space. This is achieved by pushing the predicted tokens towards their nearest existing CLIP tokens.* Results: The experimental results demonstrate the effectiveness of the proposed approach, showing that the learned tokens are more semantic than tokens predicted by unregularized models. This leads to a better representation that achieves state-of-the-art performance while being more flexible than previous methods.Here’s the Chinese translation of the three key points:* For: 这篇论文关注提高文本到图像(T2I)个性化过程,开发了不需要特殊数据集或先知信息的领域独立方法,可以处理多样的概念。* Methods: 该方法使用了对比基于的正则化技术,以保持高度准确地表现目标概念特征,同时将预测的符号靠近CLIP符号的 editable 区域。* Results: 实验结果表明,提出的方法有效,学习的符号比未正则化模型预测的符号更加 semantics,从而实现了更好的表示,并且比先前的方法更 flexible。Abstract
Text-to-image (T2I) personalization allows users to guide the creative image generation process by combining their own visual concepts in natural language prompts. Recently, encoder-based techniques have emerged as a new effective approach for T2I personalization, reducing the need for multiple images and long training times. However, most existing encoders are limited to a single-class domain, which hinders their ability to handle diverse concepts. In this work, we propose a domain-agnostic method that does not require any specialized dataset or prior information about the personalized concepts. We introduce a novel contrastive-based regularization technique to maintain high fidelity to the target concept characteristics while keeping the predicted embeddings close to editable regions of the latent space, by pushing the predicted tokens toward their nearest existing CLIP tokens. Our experimental results demonstrate the effectiveness of our approach and show how the learned tokens are more semantic than tokens predicted by unregularized models. This leads to a better representation that achieves state-of-the-art performance while being more flexible than previous methods.
摘要
DRAGON: A Dialogue-Based Robot for Assistive Navigation with Visual Language Grounding
for: 帮助人们 WITH visual impairments (PwVI) 更好地理解和导航他们周围的空间。
methods: 使用对话系统和自然语言相关的环境映射技术,以便从用户的自由形式描述约束下导航。
results: 在一个日常的室内环境中,DRAGON 能够与用户进行流畅的交互,提供良好的导航体验,并使用自然语言连接用户与周围环境的概念。Abstract
Persons with visual impairments (PwVI) have difficulties understanding and navigating spaces around them. Current wayfinding technologies either focus solely on navigation or provide limited communication about the environment. Motivated by recent advances in visual-language grounding and semantic navigation, we propose DRAGON, a guiding robot powered by a dialogue system and the ability to associate the environment with natural language. By understanding the commands from the user, DRAGON is able to guide the user to the desired landmarks on the map, describe the environment, and answer questions from visual observations. Through effective utilization of dialogue, the robot can ground the user's free-form descriptions to landmarks in the environment, and give the user semantic information through spoken language. We conduct a user study with blindfolded participants in an everyday indoor environment. Our results demonstrate that DRAGON is able to communicate with the user smoothly, provide a good guiding experience, and connect users with their surrounding environment in an intuitive manner.
摘要
视障人群(PwVI)在周围环境中有困难理解和导航。现有的导航技术 Either focus solely on navigation or provide limited communication about the environment. 我们受到最近的视觉语言固定和semantic navigation的进步 inspirited,我们提出了DRAGON,一个带有对话系统的导航机器人。通过理解用户的命令,DRAGON能够根据用户的描述导航到地图上的目标,描述环境,并根据视觉观察回答问题。通过对话的有效利用,机器人可以将用户的自由形描述与环境中的标志相关联,并通过语音提供 semantic information。我们在日常室内环境中进行了盲人参与者的用户研究。我们的结果表明,DRAGON能够与用户交流平滑,提供良好的导航体验,并将用户与周围环境连接起来在INTUITIVE的方式。
Weighted Averaged Stochastic Gradient Descent: Asymptotic Normality and Optimality
results: 本文证明了SGDWeighted平均方案的 asymptotic normality,并提供了在线进行有效的推断方法。此外,本文还提出了一种适应性的平均方案,可以实现最佳的非假素性和有效性。Abstract
Stochastic Gradient Descent (SGD) is one of the simplest and most popular algorithms in modern statistical and machine learning due to its computational and memory efficiency. Various averaging schemes have been proposed to accelerate the convergence of SGD in different settings. In this paper, we explore a general averaging scheme for SGD. Specifically, we establish the asymptotic normality of a broad range of weighted averaged SGD solutions and provide asymptotically valid online inference approaches. Furthermore, we propose an adaptive averaging scheme that exhibits both optimal statistical rate and favorable non-asymptotic convergence, drawing insights from the optimal weight for the linear model in terms of non-asymptotic mean squared error (MSE).
摘要
Uncovering Unique Concept Vectors through Latent Space Decomposition
paper_authors: Mara Graziani, Laura O’ Mahony, An-Phi Nguyen, Henning Müller, Vincent Andrearczyk
for: 这 paper 的目的是解释深度学习模型的内部工作机制,以建立信任和确保模型的安全性。
methods: 这 paper 使用了一种新的后处置无监督方法,自动找出深度模型在训练过程中学习的概念。这种方法包括分解层的积分空间为单个向量,并通过无监督 clustering 精炼这些向量,以获得与模型预测有关的概念向量。
results: experiments 表明,大多数这些概念向量是人类可理解的,具有凝聚性,并与任务有关。此外,这种方法还可以成功地在数据集探索中标识受到各种干扰因素影响的训练样本。这种新的探索技术具有数据类型和模型架构的弹性,可以帮助发现训练数据中的偏见和错误来源。Abstract
Interpreting the inner workings of deep learning models is crucial for establishing trust and ensuring model safety. Concept-based explanations have emerged as a superior approach that is more interpretable than feature attribution estimates such as pixel saliency. However, defining the concepts for the interpretability analysis biases the explanations by the user's expectations on the concepts. To address this, we propose a novel post-hoc unsupervised method that automatically uncovers the concepts learned by deep models during training. By decomposing the latent space of a layer in singular vectors and refining them by unsupervised clustering, we uncover concept vectors aligned with directions of high variance that are relevant to the model prediction, and that point to semantically distinct concepts. Our extensive experiments reveal that the majority of our concepts are readily understandable to humans, exhibit coherency, and bear relevance to the task at hand. Moreover, we showcase the practical utility of our method in dataset exploration, where our concept vectors successfully identify outlier training samples affected by various confounding factors. This novel exploration technique has remarkable versatility to data types and model architectures and it will facilitate the identification of biases and the discovery of sources of error within training data.
摘要
深度学习模型的内部机制理解是建立信任和确保模型安全的关键。概念基于的解释方法在相比特征归属估计像像素环境中显示出更好的可解释性。然而,为了定义概念用于可解释分析,用户的期望会对解释产生偏见。为解决这个问题,我们提出了一种新的后续无监督方法,可以自动抽取深度模型在训练中学习的概念。我们首先将层的潜在空间分解成单值特征,然后通过无监督划分来精细化这些特征,并提取概念向量,这些向量与模型预测中的高异常值方向相互关联,并对人类可理解。我们的广泛实验表明,大多数我们提取的概念都是人类可理解的,具有凝结性,并与任务相关。此外,我们还证明了我们的方法在数据集探索中的实际用途,我们的概念向量可以成功地检测训练样本中的异常样本,它们受到了多种干扰因素的影响。这种新的探索技术具有数据类型和模型体系的弹性,可以帮助确定模型中的偏见和发现训练数据中的错误来源。
Provable Multi-Task Representation Learning by Two-Layer ReLU Neural Networks
results: 这篇论文的结果表明,当任务是二分类问题,且标签取决于输入空间中只有r个方向时,执行一种简单的梯度下降多任务学习算法可以学习出真实的r个方向。这意味着,任何后续任务在r个真实坐标上可以通过学习一个线性分类器来解决,而Random Feature模型需要对维度d进行指数增长来获得这样的保证。Abstract
Feature learning, i.e. extracting meaningful representations of data, is quintessential to the practical success of neural networks trained with gradient descent, yet it is notoriously difficult to explain how and why it occurs. Recent theoretical studies have shown that shallow neural networks optimized on a single task with gradient-based methods can learn meaningful features, extending our understanding beyond the neural tangent kernel or random feature regime in which negligible feature learning occurs. But in practice, neural networks are increasingly often trained on {\em many} tasks simultaneously with differing loss functions, and these prior analyses do not generalize to such settings. In the multi-task learning setting, a variety of studies have shown effective feature learning by simple linear models. However, multi-task learning via {\em nonlinear} models, arguably the most common learning paradigm in practice, remains largely mysterious. In this work, we present the first results proving feature learning occurs in a multi-task setting with a nonlinear model. We show that when the tasks are binary classification problems with labels depending on only $r$ directions within the ambient $d\gg r$-dimensional input space, executing a simple gradient-based multitask learning algorithm on a two-layer ReLU neural network learns the ground-truth $r$ directions. In particular, any downstream task on the $r$ ground-truth coordinates can be solved by learning a linear classifier with sample and neuron complexity independent of the ambient dimension $d$, while a random feature model requires exponential complexity in $d$ for such a guarantee.
摘要
“实际上,神经网络训练的成功很大程度上取决于特征学习,即从数据中提取有意义的表现。然而,实际上这个过程仍然具有许多不明的地方。最近的理论研究表明,使用梯度下降法来训练条件是单一任务的神经网络,可以学习有意义的特征,这超越了对于神经汤圆数据或随机特征的分析。但是,实际上的神经网络通常是同时进行多个任务的,这些任务可能有不同的损失函数。这些前一 analyses 不能应用于这种情况。在多任务学习中,许多研究已经显示了有效的特征学习,但是这些研究通常是使用线性模型。然而,现実中的多任务学习通常使用非线性模型,这些模型仍然具有许多不明之处。在这个工作中,我们给出了第一个证明特征学习在多任务情况下的非线性模型的结果。我们证明,当任务是二元排序问题,labels 取决于仅有 $r$ 个方向的数据空间中的 $d \gg r$ 维度时,执行一个简单的梯度下降多任务学习算法,则可以学习真实的 $r$ 个方向。具体来说,任何在 $r$ 个真实方向上的下游任务可以通过学习一个基于样本和神经元的线性分类器,而这个 garantuee Sample 和神经元的复杂度独立于数据空间中的维度 $d$。相比之下,随机特征模型需要 $d$ 的几何级数增长以获得类似的保证。”
results: empirical study表明,即使延迟非常小,也可能导致\texttt{EG} divergence在简单的实例上,需要进行精心的分析。在适当的技术假设下, Proof that Gradient Descent-Ascent (\texttt{GDA}) and \texttt{EG} with delayed updates continue to guarantee convergence to saddle points for convex-concave and strongly convex-strongly concave settings. 复杂性下限 revelas,在延迟下, convergence 的慢化。Abstract
Delays and asynchrony are inevitable in large-scale machine-learning problems where communication plays a key role. As such, several works have extensively analyzed stochastic optimization with delayed gradients. However, as far as we are aware, no analogous theory is available for min-max optimization, a topic that has gained recent popularity due to applications in adversarial robustness, game theory, and reinforcement learning. Motivated by this gap, we examine the performance of standard min-max optimization algorithms with delayed gradient updates. First, we show (empirically) that even small delays can cause prominent algorithms like Extra-gradient (\texttt{EG}) to diverge on simple instances for which \texttt{EG} guarantees convergence in the absence of delays. Our empirical study thus suggests the need for a careful analysis of delayed versions of min-max optimization algorithms. Accordingly, under suitable technical assumptions, we prove that Gradient Descent-Ascent (\texttt{GDA}) and \texttt{EG} with delayed updates continue to guarantee convergence to saddle points for convex-concave and strongly convex-strongly concave settings. Our complexity bounds reveal, in a transparent manner, the slow-down in convergence caused by delays.
摘要
<>大规模机器学习问题中,延迟和异步性是不可避免的。因此,许多研究已经广泛分析了 Stochastic Optimization WITH 延迟 gradients。然而,我们知道的是,对于 Min-Max Optimization 这个主题,没有相应的理论。驱动于这个 gap,我们研究了标准的 Min-Max Optimization 算法 WITH 延迟更新。我们首先证明(Empirical),即使延迟非常小,也可以使得一些标准的算法,如 Extra-gradient (\texttt{EG}),在简单的实例上出现崩溃。这个实验结果表明,需要仔细分析延迟版本的 Min-Max Optimization 算法。在适当的技术假设下,我们证明了 Gradient Descent-Ascent (\texttt{GDA})和 \texttt{EG} WITH 延迟更新仍然能够确定 converges to 锚点 FOR convex-concave 和 strongly convex-strongly concave 设置。我们的复杂度 bound revelas,在一个透明的方式上,延迟引起的 convergence 慢化。Note: Simplified Chinese is a simplified version of Chinese that is used in mainland China and Singapore. It is different from Traditional Chinese, which is used in Taiwan and other countries.Please let me know if you have any further questions or if there's anything else I can help with!
Sequential Monte Carlo Learning for Time Series Structure Discovery
paper_authors: Feras A. Saad, Brian J. Patton, Matthew D. Hoffman, Rif A. Saurous, Vikash K. Mansinghka
for: 这个论文目的是自动发现复杂时间序列数据的准确模型。
methods: 论文使用了核函数进行 Bayesian 非参数性 posterior 推理,并结合了顺序 Monte Carlo(SMC)和反转 MCMC。
results: 实验结果表明,论文的方法可以在真实的时间序列数据上实现10倍到100倍的运行速度提升,并且在1,428个 econometric 数据集上实现了首次大规模的 Gaussian process 时间序列结构学习。结果表明,论文的方法可以找到更加准确的点预测和时间序列预测,而且在多个时间框架下都能够提供更加准确的预测。Abstract
This paper presents a new approach to automatically discovering accurate models of complex time series data. Working within a Bayesian nonparametric prior over a symbolic space of Gaussian process time series models, we present a novel structure learning algorithm that integrates sequential Monte Carlo (SMC) and involutive MCMC for highly effective posterior inference. Our method can be used both in "online" settings, where new data is incorporated sequentially in time, and in "offline" settings, by using nested subsets of historical data to anneal the posterior. Empirical measurements on real-world time series show that our method can deliver 10x--100x runtime speedups over previous MCMC and greedy-search structure learning algorithms targeting the same model family. We use our method to perform the first large-scale evaluation of Gaussian process time series structure learning on a prominent benchmark of 1,428 econometric datasets. The results show that our method discovers sensible models that deliver more accurate point forecasts and interval forecasts over multiple horizons as compared to widely used statistical and neural baselines that struggle on this challenging data.
摘要
Deep reinforcement learning for the dynamic vehicle dispatching problem: An event-based approach
for: This paper aims to solve the dynamic vehicle dispatching problem, which involves assigning vehicles to requests that arise stochastically over time and space.
methods: The paper uses a semi-Markov decision process to model the problem, which allows for a continuous-time treatment of the decision-making process. The authors also use double deep q-learning to train decision agents and develop a new discrete-event simulator.
results: The authors compare their policies with heuristic policies often used in practice and show that their policies exhibit better performance in terms of average waiting times, cancellation rates, and total service times. Specifically, their policies can reduce average waiting times by up to 50% relative to the other tested heuristic policies.Abstract
The dynamic vehicle dispatching problem corresponds to deciding which vehicles to assign to requests that arise stochastically over time and space. It emerges in diverse areas, such as in the assignment of trucks to loads to be transported; in emergency systems; and in ride-hailing services. In this paper, we model the problem as a semi-Markov decision process, which allows us to treat time as continuous. In this setting, decision epochs coincide with discrete events whose time intervals are random. We argue that an event-based approach substantially reduces the combinatorial complexity of the decision space and overcomes other limitations of discrete-time models often proposed in the literature. In order to test our approach, we develop a new discrete-event simulator and use double deep q-learning to train our decision agents. Numerical experiments are carried out in realistic scenarios using data from New York City. We compare the policies obtained through our approach with heuristic policies often used in practice. Results show that our policies exhibit better average waiting times, cancellation rates and total service times, with reduction in average waiting times of up to 50% relative to the other tested heuristic policies.
摘要
“Dynamic vehicle dispatching problem”对应于在时间和空间上随机出现的请求,并将车辆分配给这些请求。这种问题出现在各种领域,如货物运输、紧急系统和乘用车服务。在这篇论文中,我们使用半Markov决策过程来模型这个问题,这使得时间可以被视为连续的。在这种设定下,决策瞬间与随机时间间隔匹配,而不是离散时间点。我们认为事件基本方法可以减少决策空间的 combinatorial 复杂性,并超越常见的离散时间模型。为了测试我们的方法,我们开发了一个新的离散事件仿真器,并使用双层深度Q学习来训练我们的决策代理。在使用实际的纽约市数据进行数学实验后,我们与常见的规则进行比较。结果表明,我们的策略可以提供更低的待机时间、取消率和总服务时间,减少待机时间的减少为50%。
The complexity of non-stationary reinforcement learning
results: 论文表明,只需要添加一个新的状态动作对来实现更容易,而不需要修改整个状态空间。Abstract
The problem of continual learning in the domain of reinforcement learning, often called non-stationary reinforcement learning, has been identified as an important challenge to the application of reinforcement learning. We prove a worst-case complexity result, which we believe captures this challenge: Modifying the probabilities or the reward of a single state-action pair in a reinforcement learning problem requires an amount of time almost as large as the number of states in order to keep the value function up to date, unless the strong exponential time hypothesis (SETH) is false; SETH is a widely accepted strengthening of the P $\neq$ NP conjecture. Recall that the number of states in current applications of reinforcement learning is typically astronomical. In contrast, we show that just $\textit{adding}$ a new state-action pair is considerably easier to implement.
摘要
“ continual learning 在强化学习领域中的问题,通常被称为非站点强化学习,已被认为是强化学习应用的重要挑战。我们证明了一个最差情况复杂度结果,我们认为这个结果捕捉了这个挑战:对于一个强化学习问题中的一对州动作掌握率或奖励的修改,需要一个几乎等于states的数量的时间才能保持值函数最新, Unless SETH(强 exponential time hypothesis)是假的;SETH是强化学习中的一个广泛accepted的强化。 recall that the number of states in current applications of reinforcement learning is typically astronomical. 在 contrast,我们显示了仅将一个新的州动作 pair added 是许多 easier to implement.”Note: The translation is in Simplified Chinese, which is the standard writing system used in mainland China. The translation is based on the original text and may not be perfect, as there may be nuances or cultural references that are lost in translation.
Identifying Early Help Referrals For Local Authorities With Machine Learning And Bias Analysis
paper_authors: Eufrásio de A. Lima Neto, Jonathan Bailiss, Axel Finke, Jo Miller, Georgina Cosma
for: 这个论文是为了研究利用机器学习(ML)技术来帮助专家Identify families that may need Early Help assessment and support。
methods: 这个论文使用了机器学习模型来分析Leicestershire County Council(LCC)提供的14360个年龄在18岁以下的记录,并应用了减少偏见的技术来提高模型的公平性。
results: 试验表明,这些机器学习模型可以帮助专家Identify young people requiring intervention or early help,但也会生成许多假阳性结果,尤其是在使用偏见数据时。这篇论文 empirically explores the suitability of data-driven ML models for identifying young people who may require Early Help services and discusses their appropriateness and limitations for this task。Abstract
Local authorities in England, such as Leicestershire County Council (LCC), provide Early Help services that can be offered at any point in a young person's life when they experience difficulties that cannot be supported by universal services alone, such as schools. This paper investigates the utilisation of machine learning (ML) to assist experts in identifying families that may need to be referred for Early Help assessment and support. LCC provided an anonymised dataset comprising 14360 records of young people under the age of 18. The dataset was pre-processed, machine learning models were build, and experiments were conducted to validate and test the performance of the models. Bias mitigation techniques were applied to improve the fairness of these models. During testing, while the models demonstrated the capability to identify young people requiring intervention or early help, they also produced a significant number of false positives, especially when constructed with imbalanced data, incorrectly identifying individuals who most likely did not need an Early Help referral. This paper empirically explores the suitability of data-driven ML models for identifying young people who may require Early Help services and discusses their appropriateness and limitations for this task.
摘要
本文研究利用机器学习(ML)技术帮助专业人员确定需要 Early Help 评估和支持的家庭。英格兰当地 autorities,如莱斯特郡 council(LCC),提供 Early Help 服务,这些服务可以在年轻人生活中任何时候提供,只要他们不能够通过一般服务得到支持。本文使用了 LCC 提供的匿名数据集,包含14360名年轻人 beneath 18岁。数据集经过了预处理,建立了机器学习模型,并进行了验证和测试。为了减少模型偏见,应用了减少偏见技术。在测试中,模型表现出了能够 identificatin 需要 Early Help 评估和支持的年轻人,但也产生了较多的假阳性结果,特别是使用不均衡数据构建模型时。本文employs 数据驱动的 ML 模型来确定需要 Early Help 服务的年轻人,并讨论这些模型的适用性和局限性。
Embodied Lifelong Learning for Task and Motion Planning
results: 本研究在实验2D领域和BEHAVIOR库中的几个问题上显示了明显的规划成功。Abstract
A robot deployed in a home over long stretches of time faces a true lifelong learning problem. As it seeks to provide assistance to its users, the robot should leverage any accumulated experience to improve its own knowledge to become a more proficient assistant. We formalize this setting with a novel lifelong learning problem formulation in the context of learning for task and motion planning (TAMP). Exploiting the modularity of TAMP systems, we develop a generative mixture model that produces candidate continuous parameters for a planner. Whereas most existing lifelong learning approaches determine a priori how data is shared across task models, our approach learns shared and non-shared models and determines which to use online during planning based on auxiliary tasks that serve as a proxy for each model's understanding of a state. Our method exhibits substantial improvements in planning success on simulated 2D domains and on several problems from the BEHAVIOR benchmark.
摘要
一个在家中长期部署的机器人面临着真正的一生学习问题。为了为用户提供帮助,机器人应该利用所获经验来提高自己的知识,成为更加准确的助手。我们将这种设定写入一个新的一生学习问题的形式,在学习任务和动作规划(TAMP)上进行 формализации。我们利用TAMP系统的模块性,开发了一种生成式混合模型,生成候选的连续参数 для规划器。而大多数现有的一生学习方法在数据共享上做出了先前决定,我们的方法在线上决定使用共享和非共享模型,根据auxiliary任务作为每个模型对状态的理解的代理。我们的方法在模拟的2D领域和BEHAVIOR benchmark上显示出了明显的改善。
results: 研究发现,这种扩展可以提高机器学习模型的准确率,平均提高63%。研究还发现,这种提高的一部分是由数据集的均衡带来的,另一部分是由数据集的大小增加带来的。Abstract
This paper discusses and evaluates ideas of data balancing and data augmentation in the context of mathematical objects: an important topic for both the symbolic computation and satisfiability checking communities, when they are making use of machine learning techniques to optimise their tools. We consider a dataset of non-linear polynomial problems and the problem of selecting a variable ordering for cylindrical algebraic decomposition to tackle these with. By swapping the variable names in already labelled problems, we generate new problem instances that do not require any further labelling when viewing the selection as a classification problem. We find this augmentation increases the accuracy of ML models by 63% on average. We study what part of this improvement is due to the balancing of the dataset and what is achieved thanks to further increasing the size of the dataset, concluding that both have a very significant effect. We finish the paper by reflecting on how this idea could be applied in other uses of machine learning in mathematics.
摘要