cs.AI - 2023-07-24

QAmplifyNet: Pushing the Boundaries of Supply Chain Backorder Prediction Using Interpretable Hybrid Quantum-Classical Neural Network

  • paper_url: http://arxiv.org/abs/2307.12906
  • repo_url: None
  • paper_authors: Md Abrar Jahin, Md Sakib Hossain Shovon, Md. Saiful Islam, Jungpil Shin, M. F. Mridha, Yuichi Okuyama
  • for: This study aims to improve backorder prediction in supply chain management, enabling better inventory control, reduced costs, and higher customer satisfaction.
  • methods: The work proposes a novel methodological framework that applies quantum-inspired techniques within a hybrid quantum-classical neural network to predict backorders effectively on short and imbalanced datasets.
  • results: Experimental evaluations show that QAmplifyNet outperforms classical models, quantum ensembles, quantum neural networks, and deep reinforcement learning models; its interpretability is improved using Explainable AI techniques. Practical implications include effective inventory control, fewer backorders, and improved operational efficiency, with future work covering further quantum-inspired techniques, larger datasets, and other supply chain applications.
    Abstract Supply chain management relies on accurate backorder prediction for optimizing inventory control, reducing costs, and enhancing customer satisfaction. However, traditional machine-learning models struggle with large-scale datasets and complex relationships, hindering real-world data collection. This research introduces a novel methodological framework for supply chain backorder prediction, addressing the challenge of handling large datasets. Our proposed model, QAmplifyNet, employs quantum-inspired techniques within a quantum-classical neural network to predict backorders effectively on short and imbalanced datasets. Experimental evaluations on a benchmark dataset demonstrate QAmplifyNet's superiority over classical models, quantum ensembles, quantum neural networks, and deep reinforcement learning. Its proficiency in handling short, imbalanced datasets makes it an ideal solution for supply chain management. To enhance model interpretability, we use Explainable Artificial Intelligence techniques. Practical implications include improved inventory control, reduced backorders, and enhanced operational efficiency. QAmplifyNet seamlessly integrates into real-world supply chain management systems, enabling proactive decision-making and efficient resource allocation. Future work involves exploring additional quantum-inspired techniques, expanding the dataset, and investigating other supply chain applications. This research unlocks the potential of quantum computing in supply chain optimization and paves the way for further exploration of quantum-inspired machine learning models in supply chain management. Our framework and QAmplifyNet model offer a breakthrough approach to supply chain backorder prediction, providing superior performance and opening new avenues for leveraging quantum-inspired techniques in supply chain management.
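As a hedged illustration of the hybrid quantum-classical idea the abstract describes, below is a minimal variational classifier sketch in PennyLane. The embedding, layer count, optimizer, and toy data are all assumptions for illustration; QAmplifyNet's actual architecture is not reproduced here.

```python
# Minimal sketch of a hybrid quantum-classical binary classifier
# (illustrative only; QAmplifyNet's actual architecture is not public here).
import pennylane as qml
from pennylane import numpy as np

n_qubits = 4
dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev)
def circuit(weights, x):
    # Encode classical features as rotation angles, then entangle.
    qml.AngleEmbedding(x, wires=range(n_qubits))
    qml.StronglyEntanglingLayers(weights, wires=range(n_qubits))
    return qml.expval(qml.PauliZ(0))

def predict(weights, bias, x):
    # Classical post-processing of the quantum expectation value.
    return circuit(weights, x) + bias

def cost(weights, bias, X, y):
    # Mean squared error against labels in {-1, +1}.
    preds = np.array([predict(weights, bias, x) for x in X])
    return np.mean((preds - y) ** 2)

shape = qml.StronglyEntanglingLayers.shape(n_layers=2, n_wires=n_qubits)
weights = np.random.uniform(0, np.pi, size=shape, requires_grad=True)
bias = np.array(0.0, requires_grad=True)

opt = qml.GradientDescentOptimizer(stepsize=0.1)
X = np.random.uniform(0, np.pi, size=(8, n_qubits), requires_grad=False)  # toy features
y = np.sign(np.random.uniform(-1, 1, size=8, requires_grad=False))        # toy labels
for _ in range(20):
    (weights, bias), _ = opt.step_and_cost(
        lambda w, b: cost(w, b, X, y), weights, bias)
```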

Towards Bridging the FL Performance-Explainability Trade-Off: A Trustworthy 6G RAN Slicing Use-Case

  • paper_url: http://arxiv.org/abs/2307.12903
  • repo_url: None
  • paper_authors: Swastika Roy, Hatim Chergui, Christos Verikoukis
  • for: In sixth-generation (6G) networks where diverse network slices coexist, AI-driven zero-touch management and orchestration (MANO) becomes crucial; however, ensuring the trustworthiness of AI black-boxes in real deployments is challenging.
  • methods: this paper presents a novel explanation-guided in-hoc federated learning (FL) approach, which combines a constrained resource allocation model and an explainer exchange in a closed loop (CL) fashion to achieve transparent 6G network slicing resource management in a RAN-Edge setup under non-independent identically distributed (non-IID) datasets.
  • results: the proposed approach achieves a balance between AI performance and explainability, and outperforms the unconstrained Integrated-Gradient post-hoc FL baseline in terms of faithfulness of explanations and the overall training process.
    Abstract In the context of sixth-generation (6G) networks, where diverse network slices coexist, the adoption of AI-driven zero-touch management and orchestration (MANO) becomes crucial. However, ensuring the trustworthiness of AI black-boxes in real deployments is challenging. Explainable AI (XAI) tools can play a vital role in establishing transparency among the stakeholders in the slicing ecosystem. But there is a trade-off between AI performance and explainability, posing a dilemma for trustworthy 6G network slicing because the stakeholders require both highly performing AI models for efficient resource allocation and explainable decision-making to ensure fairness, accountability, and compliance. To balance this trade off and inspired by the closed loop automation and XAI methodologies, this paper presents a novel explanation-guided in-hoc federated learning (FL) approach where a constrained resource allocation model and an explainer exchange -- in a closed loop (CL) fashion -- soft attributions of the features as well as inference predictions to achieve a transparent 6G network slicing resource management in a RAN-Edge setup under non-independent identically distributed (non-IID) datasets. In particular, we quantitatively validate the faithfulness of the explanations via the so-called attribution-based confidence metric that is included as a constraint to guide the overall training process in the run-time FL optimization task. In this respect, Integrated-Gradient (IG) as well as Input $\times$ Gradient and SHAP are used to generate the attributions for our proposed in-hoc scheme, wherefore simulation results under different methods confirm its success in tackling the performance-explainability trade-off and its superiority over the unconstrained Integrated-Gradient post-hoc FL baseline.
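The attribution methods the paper builds on are standard; below is a minimal PyTorch sketch of Integrated Gradients, the attribution used for the confidence constraint. The toy MLP stands in for the resource-allocation model, and the in-hoc constraint itself is paper-specific and not reproduced.

```python
# Minimal Integrated Gradients sketch (PyTorch); the resource-allocation
# model is replaced here by a toy MLP, and the attribution-based confidence
# constraint itself is paper-specific and not reproduced.
import torch

def integrated_gradients(model, x, baseline=None, steps=50):
    """Riemann-sum approximation of IG_i = (x_i - x'_i) * integral of dF/dx_i."""
    if baseline is None:
        baseline = torch.zeros_like(x)
    total_grads = torch.zeros_like(x)
    for alpha in torch.linspace(0.0, 1.0, steps):
        point = (baseline + alpha * (x - baseline)).requires_grad_(True)
        out = model(point).sum()
        grad, = torch.autograd.grad(out, point)
        total_grads += grad
    return (x - baseline) * total_grads / steps

model = torch.nn.Sequential(torch.nn.Linear(6, 16), torch.nn.ReLU(),
                            torch.nn.Linear(16, 1))
x = torch.randn(1, 6)
attributions = integrated_gradients(model, x)
print(attributions)  # per-feature contributions to the prediction
```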

As Time Goes By: Adding a Temporal Dimension Towards Resolving Delegations in Liquid Democracy

  • paper_url: http://arxiv.org/abs/2307.12898
  • repo_url: None
  • paper_authors: Evangelos Markakis, Georgios Papasotiropoulos
  • for: This paper aims to integrate a time horizon into decision-making problems in Liquid Democracy systems to enhance participation.
  • methods: The paper uses temporal graph theory to analyze the computational complexity of Liquid Democracy systems with a time horizon.
  • results: The paper shows that adding a time horizon can increase the number of possible delegation paths and reduce the loss of votes due to delegation cycles or abstaining agents, ultimately enhancing participation in Liquid Democracy systems.
    Abstract In recent years, the study of various models and questions related to Liquid Democracy has been of growing interest among the community of Computational Social Choice. A concern that has been raised, is that current academic literature focuses solely on static inputs, concealing a key characteristic of Liquid Democracy: the right for a voter to change her mind as time goes by, regarding her options of whether to vote herself or delegate her vote to other participants, till the final voting deadline. In real life, a period of extended deliberation preceding the election-day motivates voters to adapt their behaviour over time, either based on observations of the remaining electorate or on information acquired for the topic at hand. By adding a temporal dimension to Liquid Democracy, such adaptations can increase the number of possible delegation paths and reduce the loss of votes due to delegation cycles or delegating paths towards abstaining agents, ultimately enhancing participation. Our work takes a first step to integrate a time horizon into decision-making problems in Liquid Democracy systems. Our approach, via a computational complexity analysis, exploits concepts and tools from temporal graph theory which turn out to be convenient for our framework.
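To make the delegation-resolution problem concrete, here is a small Python sketch that resolves each agent's final (deadline-time) choice into a ballot, losing votes only to delegation cycles or abstainers; the data model is an assumption for illustration, not the paper's formalism.

```python
# Sketch: resolve delegations at the voting deadline when each agent's
# choice may change over time. Data model is illustrative, not the paper's.
def resolve(final_choice):
    """final_choice maps agent -> ('vote', ballot) or ('delegate', other agent).
    Returns agent -> ballot, or None for votes lost to cycles/abstention."""
    resolved = {}
    for agent in final_choice:
        seen, current = set(), agent
        while True:
            if current in seen or current not in final_choice:
                resolved[agent] = None       # delegation cycle or abstainer
                break
            seen.add(current)
            kind, target = final_choice[current]
            if kind == 'vote':
                resolved[agent] = target     # ballot reached
                break
            current = target                 # follow the delegation edge
    return resolved

# If at an earlier time step c also delegated (closing a cycle), a and b's
# votes would be lost; by the deadline c votes, so adapting over time
# recovers votes that a static snapshot would lose.
print(resolve({'a': ('delegate', 'b'),
               'b': ('delegate', 'c'),
               'c': ('vote', 'yes')}))  # {'a': 'yes', 'b': 'yes', 'c': 'yes'}
```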

Anytime Model Selection in Linear Bandits

  • paper_url: http://arxiv.org/abs/2307.12897
  • repo_url: None
  • paper_authors: Parnian Kassraie, Aldo Pacchiano, Nicolas Emmenegger, Andreas Krause
  • for: This paper studies model selection in bandit optimization, which requires balancing exploration and exploitation not only for action selection but also for selecting the best among candidate models.
  • methods: The authors rely on online learning algorithms that treat the different models as experts. Existing methods, however, scale poorly with the number of models $M$, incurring $\text{poly}M$ regret.
  • results: The authors propose ALEXP, whose regret has an exponentially improved ($\log M$) dependence on $M$ and comes with anytime guarantees; it requires neither knowledge of the horizon $n$ nor an initial purely exploratory stage.
    Abstract Model selection in the context of bandit optimization is a challenging problem, as it requires balancing exploration and exploitation not only for action selection, but also for model selection. One natural approach is to rely on online learning algorithms that treat different models as experts. Existing methods, however, scale poorly ($\text{poly}M$) with the number of models $M$ in terms of their regret. Our key insight is that, for model selection in linear bandits, we can emulate full-information feedback to the online learner with a favorable bias-variance trade-off. This allows us to develop ALEXP, which has an exponentially improved ($\log M$) dependence on $M$ for its regret. ALEXP has anytime guarantees on its regret, and neither requires knowledge of the horizon $n$, nor relies on an initial purely exploratory stage. Our approach utilizes a novel time-uniform analysis of the Lasso, establishing a new connection between online learning and high-dimensional statistics.
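For intuition on the "models as experts" framing, below is a plain exponential-weights (Hedge-style) sketch. This is not ALEXP itself: ALEXP's emulated full-information feedback and time-uniform Lasso analysis are what yield the $\log M$ regret, and they are not reproduced here.

```python
# Generic exponential-weights ("models as experts") sketch. This is plain
# Hedge-style aggregation, NOT ALEXP itself; ALEXP's emulated full-information
# feedback and Lasso-based analysis are not reproduced here.
import numpy as np

rng = np.random.default_rng(0)
M, T, eta = 5, 200, 0.3
log_w = np.zeros(M)                       # log-weights over the M models

for t in range(T):
    probs = np.exp(log_w - log_w.max())
    probs /= probs.sum()
    chosen = rng.choice(M, p=probs)       # the chosen model's action is played
    # Placeholder losses: model 2 is best on average in this toy example.
    losses = rng.uniform(0, 1, size=M)
    losses[2] *= 0.5
    # Full-information update (every model's loss observed), as ALEXP's
    # analysis emulates; with bandit feedback one would use estimators.
    log_w -= eta * losses

final = np.exp(log_w - log_w.max())
print("posterior over models:", np.round(final / final.sum(), 3))
```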

Interpretable Stereotype Identification through Reasoning

  • paper_url: http://arxiv.org/abs/2308.00071
  • repo_url: None
  • paper_authors: Jacob-Junqi Tian, Omkar Dige, David Emerson, Faiza Khan Khattak
  • for: This work examines biases in language models and integrates fairness into their development, to ensure these models are equitable and free from bias.
  • methods: The study uses Vicuna-13B-v1.3 for zero-shot stereotype identification and evaluates the effect of scaling from 13B to 33B parameters.
  • results: The performance gain from reasoning significantly exceeds the gain from scaling to 33B, suggesting that reasoning can enable LLMs to transcend the scaling law on out-of-domain tasks. A qualitative analysis of selected reasoning traces further shows that reasoning improves not just accuracy but also the interpretability of decisions.
    Abstract Given that language models are trained on vast datasets that may contain inherent biases, there is a potential danger of inadvertently perpetuating systemic discrimination. Consequently, it becomes essential to examine and address biases in language models, integrating fairness into their development to ensure these models are equitable and free from bias. In this work, we demonstrate the importance of reasoning in zero-shot stereotype identification based on Vicuna-13B-v1.3. While we do observe improved accuracy by scaling from 13B to 33B, we show that the performance gain from reasoning significantly exceeds the gain from scaling up. Our findings suggest that reasoning could be a key factor that enables LLMs to transcend the scaling law on out-of-domain tasks such as stereotype identification. Additionally, through a qualitative analysis of select reasoning traces, we highlight how reasoning enhances not just accuracy but also the interpretability of the decision.
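A hedged sketch of what a zero-shot reasoning prompt for this task could look like; the actual prompt wording used with Vicuna-13B-v1.3 is not given in the abstract, so this template is purely hypothetical.

```python
# Hypothetical zero-shot reasoning prompt template for stereotype
# identification; the paper's actual prompt wording is not public here.
TEMPLATE = (
    "Consider the following sentence:\n"
    '"{sentence}"\n\n'
    "Reason step by step about whether it expresses a stereotype about a "
    "social group, then answer with exactly one word: 'stereotype' or "
    "'not-stereotype'."
)

def build_prompt(sentence: str) -> str:
    return TEMPLATE.format(sentence=sentence)

print(build_prompt("All engineers are men."))
```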

A Real-World WebAgent with Planning, Long Context Understanding, and Program Synthesis

  • paper_url: http://arxiv.org/abs/2307.12856
  • repo_url: None
  • paper_authors: Izzeddin Gur, Hiroki Furuta, Austin Huang, Mustafa Safdari, Yutaka Matsuo, Douglas Eck, Aleksandra Faust
  • for: This paper proposes an LLM-driven autonomous agent that completes tasks on real websites following natural language instructions.
  • methods: WebAgent combines two large language models: Flan-U-PaLM for grounded code generation, and HTML-T5, a new LLM pre-trained for long HTML documents with local and global attention mechanisms and a mixture of long-span denoising objectives, used for planning and summarization.
  • results: Experiments show that WebAgent improves the success rate on a real website by over 50%, and that HTML-T5 is the best model for solving HTML-based tasks, achieving a 14.9% higher success rate than the prior SoTA on the MiniWoB web navigation benchmark and better accuracy on offline task planning evaluation.
    Abstract Pre-trained large language models (LLMs) have recently achieved better generalization and sample efficiency in autonomous web navigation. However, the performance on real-world websites has still suffered from (1) open domainness, (2) limited context length, and (3) lack of inductive bias on HTML. We introduce WebAgent, an LLM-driven agent that can complete the tasks on real websites following natural language instructions. WebAgent plans ahead by decomposing instructions into canonical sub-instructions, summarizes long HTML documents into task-relevant snippets, and acts on websites via generated Python programs from those. We design WebAgent with Flan-U-PaLM, for grounded code generation, and HTML-T5, new pre-trained LLMs for long HTML documents using local and global attention mechanisms and a mixture of long-span denoising objectives, for planning and summarization. We empirically demonstrate that our recipe improves the success on a real website by over 50%, and that HTML-T5 is the best model to solve HTML-based tasks; achieving 14.9% higher success rate than prior SoTA on the MiniWoB web navigation benchmark and better accuracy on offline task planning evaluation.
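A schematic sketch of the plan, summarize, act loop the abstract describes; every function here is a hypothetical placeholder (the actual Flan-U-PaLM and HTML-T5 interfaces are not public in this digest).

```python
# Schematic WebAgent-style loop; ALL functions below are hypothetical
# placeholders standing in for Flan-U-PaLM / HTML-T5 calls.
def plan_subinstructions(instruction: str) -> list:
    return [instruction]                       # stub: no real decomposition

def summarize_html(html: str, subinstruction: str) -> str:
    return html[:500]                          # stub: crude truncation

def generate_program(snippet: str, subinstruction: str) -> str:
    return f"print({subinstruction!r})"        # stub: trivial program

def web_agent(instruction: str, html: str) -> None:
    for sub in plan_subinstructions(instruction):
        snippet = summarize_html(html, sub)    # task-relevant HTML snippet
        program = generate_program(snippet, sub)
        exec(program)                          # act on the site via Python

web_agent("click the search button", "<html><button>Search</button></html>")
```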

EPIC-KITCHENS-100 Unsupervised Domain Adaptation Challenge: Mixed Sequences Prediction

  • paper_url: http://arxiv.org/abs/2307.12837
  • repo_url: None
  • paper_authors: Amirshayan Nasirimajd, Simone Alberto Peirone, Chiara Plizzari, Barbara Caputo
  • for: This report addresses the EPIC-Kitchens-100 Unsupervised Domain Adaptation (UDA) Challenge in Action Recognition.
  • methods: The approach is sequence-based: actions from the source and target domains are randomly combined into a modified sequence, and a standard pseudo-labeling strategy extracts action labels for the unlabelled target data; a language model filters unlikely sequences and a co-occurrence matrix eliminates unseen verb-noun combinations.
  • results: The submission, labeled 'sshayan', can be found on the leaderboard, where it currently holds 2nd position for 'verb' and 4th position for both 'noun' and 'action'.
    Abstract This report presents the technical details of our approach for the EPIC-Kitchens-100 Unsupervised Domain Adaptation (UDA) Challenge in Action Recognition. Our approach is based on the idea that the order in which actions are performed is similar between the source and target domains. Based on this, we generate a modified sequence by randomly combining actions from the source and target domains. As only unlabelled target data are available under the UDA setting, we use a standard pseudo-labeling strategy for extracting action labels for the target. We then ask the network to predict the resulting action sequence. This allows to integrate information from both domains during training and to achieve better transfer results on target. Additionally, to better incorporate sequence information, we use a language model to filter unlikely sequences. Lastly, we employed a co-occurrence matrix to eliminate unseen combinations of verbs and nouns. Our submission, labeled as 'sshayan', can be found on the leaderboard, where it currently holds the 2nd position for 'verb' and the 4th position for both 'noun' and 'action'.
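Two of the ingredients named in the abstract, confidence-based pseudo-labeling and co-occurrence filtering, are standard and easy to sketch; the threshold and array shapes below are illustrative assumptions.

```python
# Sketch of confidence-thresholded pseudo-labeling plus a verb-noun
# co-occurrence filter; threshold and shapes are illustrative assumptions.
import numpy as np

def pseudo_labels(probs, threshold=0.9):
    """Keep only target-domain predictions the source model is confident in."""
    conf = probs.max(axis=1)
    labels = probs.argmax(axis=1)
    keep = conf >= threshold
    return labels[keep], keep

def filter_unseen_pairs(verbs, nouns, cooc):
    """Drop (verb, noun) pairs never observed together in the source domain."""
    mask = cooc[verbs, nouns] > 0
    return verbs[mask], nouns[mask]

probs = np.array([[0.95, 0.03, 0.02],   # confident -> kept
                  [0.40, 0.35, 0.25]])  # uncertain -> discarded
labels, keep = pseudo_labels(probs)
print(labels, keep)
```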

Joint speech and overlap detection: a benchmark over multiple audio setup and speech domains

  • paper_url: http://arxiv.org/abs/2307.13012
  • repo_url: None
  • paper_authors: Martin Lebourdais, Théo Mariotte, Marie Tahon, Anthony Larcher, Antoine Laurent, Silvio Montresor, Sylvain Meignier, Jean-Hugh Thomas
  • for: This paper targets voice activity detection (VAD) and overlapped speech detection (OSD), key pre-processing tasks whose robustness largely determines the final segmentation performance of speaker diarization.
  • methods: The paper proposes a complete new benchmark of different VAD and OSD models across multiple audio setups (single/multi-channel) and speech domains (e.g. media, meetings). The evaluated 2/3-class systems combine a Temporal Convolutional Network with speech representations adapted to the setup and reach state-of-the-art performance.
  • results: Jointly training the two tasks offers F1-scores similar to two dedicated VAD and OSD systems while reducing the training cost, and the same architecture can be used for both single- and multichannel speech processing.
    Abstract Voice activity and overlapped speech detection (respectively VAD and OSD) are key pre-processing tasks for speaker diarization. The final segmentation performance highly relies on the robustness of these sub-tasks. Recent studies have shown VAD and OSD can be trained jointly using a multi-class classification model. However, these works are often restricted to a specific speech domain, lacking information about the generalization capacities of the systems. This paper proposes a complete and new benchmark of different VAD and OSD models, on multiple audio setups (single/multi-channel) and speech domains (e.g. media, meeting...). Our 2/3-class systems, which combine a Temporal Convolutional Network with speech representations adapted to the setup, outperform state-of-the-art results. We show that the joint training of these two tasks offers similar performances in terms of F1-score to two dedicated VAD and OSD systems while reducing the training cost. This unique architecture can also be used for single and multichannel speech processing.
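For intuition on the 2/3-class framing, here is a minimal sketch mapping per-frame active-speaker counts to joint VAD+OSD targets (non-speech / speech / overlap); the speaker-count input representation is an assumption for illustration.

```python
# Sketch: derive joint VAD+OSD targets from per-frame active-speaker counts
# (the 3-class framing: 0 = non-speech, 1 = single speaker, 2 = overlap).
# The per-frame speaker-count input is an illustrative assumption.
import numpy as np

def joint_vad_osd_labels(speaker_counts):
    counts = np.asarray(speaker_counts)
    labels = np.zeros_like(counts)
    labels[counts == 1] = 1          # speech
    labels[counts >= 2] = 2          # overlapped speech
    return labels

print(joint_vad_osd_labels([0, 0, 1, 2, 3, 1]))  # [0 0 1 2 2 1]
```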

End-to-End Deep Transfer Learning for Calibration-free Motor Imagery Brain Computer Interfaces

  • paper_url: http://arxiv.org/abs/2307.12827
  • repo_url: None
  • paper_authors: Maryam Alimardani, Steven Kocken, Nikki Leeuwis
  • for: The goal of this study is to develop calibration-free, subject-independent motor imagery brain-computer interface (MI-BCI) classifiers usable across application scenarios.
  • methods: The study applies end-to-end deep transfer learning on raw EEG signals. Three deep learning models (MIN2Net, EEGNet and DeepConvNet) were trained and compared using an openly available dataset.
  • results: In a leave-one-subject-out cross validation, MIN2Net could not differentiate right- vs. left-hand motor imagery of new users (median accuracy 51.7%), while EEGNet and DeepConvNet reached median accuracies of 62.5% and 59.2% respectively. These accuracies fall short of the 70% threshold required for significant control, but are similar to those reported for these models on other datasets without transfer learning.
    Abstract A major issue in Motor Imagery Brain-Computer Interfaces (MI-BCIs) is their poor classification accuracy and the large amount of data that is required for subject-specific calibration. This makes BCIs less accessible to general users in out-of-the-lab applications. This study employed deep transfer learning for development of calibration-free subject-independent MI-BCI classifiers. Unlike earlier works that applied signal preprocessing and feature engineering steps in transfer learning, this study adopted an end-to-end deep learning approach on raw EEG signals. Three deep learning models (MIN2Net, EEGNet and DeepConvNet) were trained and compared using an openly available dataset. The dataset contained EEG signals from 55 subjects who conducted a left- vs. right-hand motor imagery task. To evaluate the performance of each model, a leave-one-subject-out cross validation was used. The results of the models differed significantly. MIN2Net was not able to differentiate right- vs. left-hand motor imagery of new users, with a median accuracy of 51.7%. The other two models performed better, with median accuracies of 62.5% for EEGNet and 59.2% for DeepConvNet. These accuracies do not reach the required threshold of 70% needed for significant control, however, they are similar to the accuracies of these models when tested on other datasets without transfer learning.
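The evaluation protocol, leave-one-subject-out cross validation, can be sketched directly with scikit-learn; a linear classifier stands in for the deep EEG models.

```python
# Leave-one-subject-out cross-validation sketch (scikit-learn); a linear
# classifier stands in for MIN2Net/EEGNet/DeepConvNet here.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

rng = np.random.default_rng(0)
n_subjects, trials_per_subject, n_features = 5, 20, 8
X = rng.normal(size=(n_subjects * trials_per_subject, n_features))
y = rng.integers(0, 2, size=len(X))                 # left vs. right hand
groups = np.repeat(np.arange(n_subjects), trials_per_subject)

scores = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                         groups=groups, cv=LeaveOneGroupOut())
print("per-held-out-subject accuracy:", np.round(scores, 2))
```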

Performance of Large Language Models in a Computer Science Degree Program

  • paper_url: http://arxiv.org/abs/2308.02432
  • repo_url: None
  • paper_authors: Tim Krüger, Michael Gref
  • for: This paper evaluates the effectiveness of different large language models within a university of applied sciences' undergraduate computer science degree program.
  • methods: The models are employed as educational aids and prompted with lecture material, exercise tasks, and past exams to assess their proficiency across different computer science domains.
  • results: ChatGPT-3.5 averaged 79.9% of the total score across 10 tested modules, BingAI achieved 68.4%, and LLaMa (in the 65-billion-parameter variant) 20%. Despite these convincing results, even GPT-4.0 would not pass the degree program, due to limitations in mathematical calculations.
    Abstract Large language models such as ChatGPT-3.5 and GPT-4.0 are ubiquitous and dominate the current discourse. Their transformative capabilities have led to a paradigm shift in how we interact with and utilize (text-based) information. Each day, new possibilities to leverage the capabilities of these models emerge. This paper presents findings on the performance of different large language models in a university of applied sciences' undergraduate computer science degree program. Our primary objective is to assess the effectiveness of these models within the curriculum by employing them as educational aids. By prompting the models with lecture material, exercise tasks, and past exams, we aim to evaluate their proficiency across different computer science domains. We showcase the strong performance of current large language models while highlighting limitations and constraints within the context of such a degree program. We found that ChatGPT-3.5 averaged 79.9% of the total score in 10 tested modules, BingAI achieved 68.4%, and LLaMa, in the 65 billion parameter variant, 20%. Despite these convincing results, even GPT-4.0 would not pass the degree program - due to limitations in mathematical calculations.

Maximal Independent Sets for Pooling in Graph Neural Networks

  • paper_url: http://arxiv.org/abs/2307.13011
  • repo_url: None
  • paper_authors: Stevan Stanovic, Benoit Gaüzère, Luc Brun
  • for: This paper addresses pooling in Graph Neural Networks, proposing pooling methods based on maximal independent sets that avoid the drawbacks of traditional graph pooling.
  • methods: Three pooling methods based on the notion of maximal independent sets are introduced.
  • results: Experimental results confirm the relevance of maximal independent set constraints for graph pooling, avoiding graph disconnection or overconnection, low decimation ratios, and the deletion of large parts of the graph.
    Abstract Convolutional Neural Networks (CNNs) have enabled major advances in image classification through convolution and pooling. In particular, image pooling transforms a connected discrete lattice into a reduced lattice with the same connectivity and allows reduction functions to consider all pixels in an image. However, there is no pooling that satisfies these properties for graphs. In fact, traditional graph pooling methods suffer from at least one of the following drawbacks: Graph disconnection or overconnection, low decimation ratio, and deletion of large parts of graphs. In this paper, we present three pooling methods based on the notion of maximal independent sets that avoid these pitfalls. Our experimental results confirm the relevance of maximal independent set constraints for graph pooling.
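A small sketch of the core idea, pooling a graph onto a maximal independent set, using NetworkX's greedy MIS routine; assigning each remaining node to a neighbouring centre is an illustrative aggregation choice, not the paper's exact reduction functions.

```python
# Sketch: use a maximal independent set as pooling centres on a graph
# (NetworkX greedy MIS); assigning each node to an adjacent centre is an
# illustrative aggregation choice, not the paper's exact reduction functions.
import networkx as nx

G = nx.karate_club_graph()
centres = set(nx.maximal_independent_set(G, seed=42))

# By maximality, every node is a centre or adjacent to one, so no part of
# the graph is deleted and the decimation ratio stays reasonable.
assignment = {}
for v in G.nodes:
    if v in centres:
        assignment[v] = v
    else:
        assignment[v] = next(u for u in G.neighbors(v) if u in centres)

print(f"{G.number_of_nodes()} nodes pooled into {len(centres)} centres")
```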

Analyzing the Strategy of Propaganda using Inverse Reinforcement Learning: Evidence from the 2022 Russian Invasion of Ukraine

  • paper_url: http://arxiv.org/abs/2307.12788
  • repo_url: None
  • paper_authors: Dominique Geissler, Stefan Feuerriegel
  • for: This study aims to understand the strategy behind the pro-Russian propaganda campaign on social media during the 2022 Russian invasion of Ukraine.
  • methods: The study uses an inverse reinforcement learning (IRL) approach to model online behavior as a Markov decision process and infer the underlying reward structure that guides propagandists when interacting with users with a supporting or opposing stance toward the invasion.
  • results: The study finds that bots and humans follow different strategies in responding to pro-Russian propaganda. Bots primarily respond to pro-invasion messages, suggesting they seek to drive virality, while messages indicating opposition primarily elicit responses from humans, suggesting they tend to engage in critical discussions.
    Abstract The 2022 Russian invasion of Ukraine was accompanied by a large-scale, pro-Russian propaganda campaign on social media. However, the strategy behind the dissemination of propaganda has remained unclear, particularly how the online discourse was strategically shaped by the propagandists' community. Here, we analyze the strategy of the Twitter community using an inverse reinforcement learning (IRL) approach. Specifically, IRL allows us to model online behavior as a Markov decision process, where the goal is to infer the underlying reward structure that guides propagandists when interacting with users with a supporting or opposing stance toward the invasion. Thereby, we aim to understand empirically whether and how between-user interactions are strategically used to promote the proliferation of Russian propaganda. For this, we leverage a large-scale dataset with 349,455 posts with pro-Russian propaganda from 132,131 users. We show that bots and humans follow a different strategy: bots respond predominantly to pro-invasion messages, suggesting that they seek to drive virality; while messages indicating opposition primarily elicit responses from humans, suggesting that they tend to engage in critical discussions. To the best of our knowledge, this is the first study analyzing the strategy behind propaganda from the 2022 Russian invasion of Ukraine through the lens of IRL.

Is attention all you need in medical image analysis? A review

  • paper_url: http://arxiv.org/abs/2307.12775
  • repo_url: None
  • paper_authors: Giorgos Papanastasiou, Nikolaos Dikaios, Jiahao Huang, Chengjia Wang, Guang Yang
  • for: This review surveys existing hybrid CNN-Transformer/Attention models for medical image analysis, covering key architectural designs, breakthroughs, and current and future opportunities and challenges.
  • methods: The paper conducts a systematic review, analysing the architectures of hybrid CNN-Transformer/Attention models, and introduces a comprehensive analysis framework on generalisation opportunities of scientific and clinical impact.
  • results: The review highlights the promise of hybrid CNN-Transformer/Attention models in medical image analysis and argues that the proposed framework can stimulate new data-driven domain generalisation and adaptation methods.
    Abstract Medical imaging is a key component in clinical diagnosis, treatment planning and clinical trial design, accounting for almost 90% of all healthcare data. CNNs achieved performance gains in medical image analysis (MIA) over the last years. CNNs can efficiently model local pixel interactions and be trained on small-scale MI data. The main disadvantage of typical CNN models is that they ignore global pixel relationships within images, which limits their generalisation ability to understand out-of-distribution data with different 'global' information. The recent progress of Artificial Intelligence gave rise to Transformers, which can learn global relationships from data. However, full Transformer models need to be trained on large-scale data and involve tremendous computational complexity. Attention and Transformer compartments (Transf/Attention) which can well maintain properties for modelling global relationships, have been proposed as lighter alternatives of full Transformers. Recently, there is an increasing trend to co-pollinate complementary local-global properties from CNN and Transf/Attention architectures, which led to a new era of hybrid models. The past years have witnessed substantial growth in hybrid CNN-Transf/Attention models across diverse MIA problems. In this systematic review, we survey existing hybrid CNN-Transf/Attention models, review and unravel key architectural designs, analyse breakthroughs, and evaluate current and future opportunities as well as challenges. We also introduced a comprehensive analysis framework on generalisation opportunities of scientific and clinical impact, based on which new data-driven domain generalisation and adaptation methods can be stimulated.
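The local-global pattern shared by these hybrids is easy to sketch: a convolutional stem for local pixel interactions, followed by self-attention over the resulting tokens. Dimensions below are illustrative and not taken from any surveyed model.

```python
# Minimal hybrid CNN-Attention block (PyTorch): convolution models local
# pixel interactions, self-attention models global relationships between the
# resulting tokens. Dimensions are illustrative, not from any surveyed model.
import torch
import torch.nn as nn

class HybridBlock(nn.Module):
    def __init__(self, channels=32, heads=4):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, channels, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
        )
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, x):
        feats = self.conv(x)                      # (B, C, H, W) local features
        b, c, h, w = feats.shape
        tokens = feats.flatten(2).transpose(1, 2) # (B, H*W, C) token sequence
        out, _ = self.attn(tokens, tokens, tokens)
        return self.norm(tokens + out)            # residual + norm

x = torch.randn(2, 1, 64, 64)                     # e.g. a grayscale MRI slice
print(HybridBlock()(x).shape)                     # torch.Size([2, 1024, 32])
```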

Adaptation of Whisper models to child speech recognition

  • paper_url: http://arxiv.org/abs/2307.13008
  • repo_url: https://github.com/c3imaging/whisper_child_speech
  • paper_authors: Rishabh Jain, Andrei Barcovschi, Mariam Yiwere, Peter Corcoran, Horia Cucu
  • for: The goal is to improve the accuracy of automatic speech recognition (ASR) systems on child speech.
  • methods: The work adapts models trained on large adult speech datasets to child speech, finetuning Whisper and comparing it with self-supervised wav2vec2 models finetuned on child speech.
  • results: Finetuned Whisper models yield significantly improved ASR performance on child speech compared to non-finetuned Whisper models, and finetuned self-supervised wav2vec2 models outperform Whisper finetuning.
    Abstract Automatic Speech Recognition (ASR) systems often struggle with transcribing child speech due to the lack of large child speech datasets required to accurately train child-friendly ASR models. However, there are huge amounts of annotated adult speech datasets which were used to create multilingual ASR models, such as Whisper. Our work aims to explore whether such models can be adapted to child speech to improve ASR for children. In addition, we compare Whisper child-adaptations with finetuned self-supervised models, such as wav2vec2. We demonstrate that finetuning Whisper on child speech yields significant improvements in ASR performance on child speech, compared to non finetuned Whisper models. Additionally, utilizing self-supervised Wav2vec2 models that have been finetuned on child speech outperforms Whisper finetuning.
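A minimal Hugging Face transformers sketch of preparing Whisper for finetuning; the checkpoint size, the encoder-freezing strategy, and everything about the child-speech data pipeline are assumptions, not the paper's recipe.

```python
# Minimal sketch of preparing Whisper for finetuning with Hugging Face
# transformers; the child-speech dataset, collator and hyperparameters used
# in the paper are not reproduced here.
from transformers import WhisperForConditionalGeneration, WhisperProcessor

model_id = "openai/whisper-small"                  # illustrative size choice
processor = WhisperProcessor.from_pretrained(model_id)
model = WhisperForConditionalGeneration.from_pretrained(model_id)

# Freeze the encoder and finetune only the decoder: one common low-cost
# adaptation strategy (an assumption, not necessarily the paper's recipe).
for param in model.model.encoder.parameters():
    param.requires_grad = False

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"trainable parameters: {trainable / 1e6:.1f}M")
```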

Nonparametric Linear Feature Learning in Regression Through Regularisation

  • paper_url: http://arxiv.org/abs/2307.12754
  • repo_url: https://github.com/bertillefollain/regfeal
  • paper_authors: Bertille Follain, Umut Simsekli, Francis Bach
  • for: This paper addresses automated feature selection for high-dimensional data, focusing on supervised learning under the multi-index model, where the pertinent information resides in a lower-dimensional linear subspace of the data.
  • methods: A new non-parametric method simultaneously estimates the prediction function and the relevant linear subspace, using empirical risk minimisation augmented with a penalty on function derivatives; leveraging the orthogonality and rotation invariance of Hermite polynomials, the resulting estimator is named RegFeaL.
  • results: The method yields a consistent estimator of the prediction function with explicit rates, and empirical results demonstrate the performance of RegFeaL in various experiments, including accurate estimation of the relevant dimension.
    Abstract Representation learning plays a crucial role in automated feature selection, particularly in the context of high-dimensional data, where non-parametric methods often struggle. In this study, we focus on supervised learning scenarios where the pertinent information resides within a lower-dimensional linear subspace of the data, namely the multi-index model. If this subspace were known, it would greatly enhance prediction, computation, and interpretation. To address this challenge, we propose a novel method for linear feature learning with non-parametric prediction, which simultaneously estimates the prediction function and the linear subspace. Our approach employs empirical risk minimisation, augmented with a penalty on function derivatives, ensuring versatility. Leveraging the orthogonality and rotation invariance properties of Hermite polynomials, we introduce our estimator, named RegFeaL. By utilising alternative minimisation, we iteratively rotate the data to improve alignment with leading directions and accurately estimate the relevant dimension in practical settings. We establish that our method yields a consistent estimator of the prediction function with explicit rates. Additionally, we provide empirical results demonstrating the performance of RegFeaL in various experiments.
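The core regulariser, empirical risk minimisation with a penalty on function derivatives, can be sketched in PyTorch; the toy MLP and squared-gradient penalty below only illustrate the idea, whereas RegFeaL's Hermite-polynomial expansions and alternating data rotations are not reproduced.

```python
# Sketch: empirical risk minimisation with a derivative penalty (PyTorch).
# A toy MLP and a squared-gradient penalty illustrate only the regularisation
# idea; RegFeaL's Hermite expansions and rotations are not reproduced.
import torch

torch.manual_seed(0)
d, n = 5, 256
X = torch.randn(n, d)
y = torch.sin(X[:, 0])                    # depends on a 1-D subspace only

model = torch.nn.Sequential(torch.nn.Linear(d, 32), torch.nn.Tanh(),
                            torch.nn.Linear(32, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
lam = 1e-2                                # penalty strength (illustrative)

for step in range(200):
    Xg = X.clone().requires_grad_(True)
    pred = model(Xg).squeeze(-1)
    fit = ((pred - y) ** 2).mean()
    grads, = torch.autograd.grad(pred.sum(), Xg, create_graph=True)
    penalty = (grads ** 2).mean()         # discourages dependence on many axes
    loss = fit + lam * penalty
    opt.zero_grad(); loss.backward(); opt.step()

print(f"final fit: {fit.item():.4f}")
```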

Introducing CALMED: Multimodal Annotated Dataset for Emotion Detection in Children with Autism

  • paper_url: http://arxiv.org/abs/2307.13706
  • repo_url: None
  • paper_authors: Annanda Sousa, Karen Young, Mathieu D’aquin, Manel Zarrouk, Jennifer Holloway
  • for: This paper aims to improve automatic emotion detection (ED) systems, which can enhance HCI by creating an individualised user experience, and in particular to tailor ED to how children with Autism Spectrum Disorder (ASD) express emotions.
  • methods: The authors establish a process to create a multimodal annotated dataset, extracting audio and video features from recordings of study sessions with participants.
  • results: The resulting dataset, CALMED, features children with autism aged 8-12 and includes parent-provided annotations into four target classes, with a total of 57,012 examples, each representing a time window of 200ms (0.2s).
    Abstract Automatic Emotion Detection (ED) aims to build systems to identify users' emotions automatically. This field has the potential to enhance HCI, creating an individualised experience for the user. However, ED systems tend to perform poorly on people with Autism Spectrum Disorder (ASD). Hence, the need to create ED systems tailored to how people with autism express emotions. Previous works have created ED systems tailored for children with ASD but did not share the resulting dataset. Sharing annotated datasets is essential to enable the development of more advanced computer models for ED within the research community. In this paper, we describe our experience establishing a process to create a multimodal annotated dataset featuring children with a level 1 diagnosis of autism. In addition, we introduce CALMED (Children, Autism, Multimodal, Emotion, Detection), the resulting multimodal emotion detection dataset featuring children with autism aged 8-12. CALMED includes audio and video features extracted from recording files of study sessions with participants, together with annotations provided by their parents into four target classes. The generated dataset includes a total of 57,012 examples, with each example representing a time window of 200ms (0.2s). Our experience and methods described here, together with the dataset shared, aim to contribute to future research applications of affective computing in ASD, which has the potential to create systems to improve the lives of people with ASD.
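A small sketch of slicing a feature stream into the 200 ms windows that define CALMED's examples; the 100 Hz feature rate is an illustrative assumption.

```python
# Sketch: slice a feature stream into 200 ms (0.2 s) example windows, as in
# CALMED; the 100 Hz feature rate here is an illustrative assumption.
import numpy as np

def window_examples(features, feature_rate_hz=100, window_s=0.2):
    step = int(feature_rate_hz * window_s)        # frames per 200 ms window
    n_windows = len(features) // step
    return features[: n_windows * step].reshape(n_windows, step, -1)

stream = np.random.randn(1000, 40)                # 10 s of 40-dim features
windows = window_examples(stream)
print(windows.shape)                              # (50, 20, 40)
```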

MC-JEPA: A Joint-Embedding Predictive Architecture for Self-Supervised Learning of Motion and Content Features

  • paper_url: http://arxiv.org/abs/2307.12698
  • repo_url: None
  • paper_authors: Adrien Bardes, Jean Ponce, Yann LeCun
  • for: Self-supervised learning of visual representations has focused on content features, which do not capture object motion or location; this work aims to learn both.
  • methods: MC-JEPA is a joint-embedding predictive architecture and self-supervised learning approach that jointly learns optical flow and content features within a shared encoder, demonstrating that the two associated objectives benefit from each other, so that the learned content features incorporate motion information.
  • results: The approach performs on par with existing unsupervised optical flow benchmarks, as well as with common self-supervised learning approaches on downstream tasks such as semantic segmentation of images and videos.
    Abstract Self-supervised learning of visual representations has been focusing on learning content features, which do not capture object motion or location, and focus on identifying and differentiating objects in images and videos. On the other hand, optical flow estimation is a task that does not involve understanding the content of the images on which it is estimated. We unify the two approaches and introduce MC-JEPA, a joint-embedding predictive architecture and self-supervised learning approach to jointly learn optical flow and content features within a shared encoder, demonstrating that the two associated objectives; the optical flow estimation objective and the self-supervised learning objective; benefit from each other and thus learn content features that incorporate motion information. The proposed approach achieves performance on-par with existing unsupervised optical flow benchmarks, as well as with common self-supervised learning approaches on downstream tasks such as semantic segmentation of images and videos.
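A schematic PyTorch sketch of the shared-encoder, joint-objective setup: one encoder feeds both a flow-regression head and a self-supervised embedding-prediction term. All modules and the loss weighting are placeholders, not MC-JEPA's actual components.

```python
# Schematic joint objective: one shared encoder serves both an optical-flow
# head and a self-supervised embedding-prediction head. All modules and the
# loss weighting are placeholders, not MC-JEPA's actual components.
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Conv2d(3, 16, 3, padding=1)          # shared content encoder
flow_head = nn.Conv2d(32, 2, 3, padding=1)        # predicts (dx, dy) per pixel
proj = nn.Conv2d(16, 16, 1)                       # SSL projection head

frame1 = torch.randn(4, 3, 32, 32)
frame2 = torch.randn(4, 3, 32, 32)
flow_gt = torch.randn(4, 2, 32, 32)               # stand-in flow supervision

f1, f2 = encoder(frame1), encoder(frame2)
flow = flow_head(torch.cat([f1, f2], dim=1))
loss_flow = F.l1_loss(flow, flow_gt)

# SSL term: the embedding of one frame should predict the other's.
loss_ssl = F.mse_loss(proj(f1), f2.detach())

loss = loss_flow + 0.1 * loss_ssl                 # weighting is illustrative
loss.backward()
print(float(loss))
```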

Addressing the Impact of Localized Training Data in Graph Neural Networks

  • paper_url: http://arxiv.org/abs/2307.12689
  • repo_url: https://github.com/akanshaaga/reg_appnp
  • paper_authors: Singh Akansha
  • for: This paper assesses the impact of training Graph Neural Networks (GNNs) on localized subsets of a graph, and how to improve adaptation and generalization when only restricted training data is available.
  • methods: The problem is framed as an out-of-distribution (OOD) data issue, and a regularization method based on distribution alignment is proposed to minimize distributional discrepancies between the localized training data and graph inference over the entire graph.
  • results: Extensive tests on popular GNN models show significant performance improvement on three citation GNN benchmark datasets, with the regularization approach effectively enhancing model adaptation and generalization on OOD data.
    Abstract Graph Neural Networks (GNNs) have achieved notable success in learning from graph-structured data, owing to their ability to capture intricate dependencies and relationships between nodes. They excel in various applications, including semi-supervised node classification, link prediction, and graph generation. However, it is important to acknowledge that the majority of state-of-the-art GNN models are built upon the assumption of an in-distribution setting, which hinders their performance on real-world graphs with dynamic structures. In this article, we aim to assess the impact of training GNNs on localized subsets of the graph. Such restricted training data may lead to a model that performs well in the specific region it was trained on but fails to generalize and make accurate predictions for the entire graph. In the context of graph-based semi-supervised learning (SSL), resource constraints often lead to scenarios where the dataset is large, but only a portion of it can be labeled, affecting the model's performance. This limitation affects tasks like anomaly detection or spam detection when labeling processes are biased or influenced by human subjectivity. To tackle the challenges posed by localized training data, we approach the problem as an out-of-distribution (OOD) data issue by by aligning the distributions between the training data, which represents a small portion of labeled data, and the graph inference process that involves making predictions for the entire graph. We propose a regularization method to minimize distributional discrepancies between localized training data and graph inference, improving model performance on OOD data. Extensive tests on popular GNN models show significant performance improvement on three citation GNN benchmark datasets. The regularization approach effectively enhances model adaptation and generalization, overcoming challenges posed by OOD data.
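One simple instantiation of the proposed distribution alignment is a mean-embedding (linear-kernel MMD) penalty between the labeled subset and the full graph; the sketch below is a hedged illustration, and the paper's exact discrepancy measure may differ.

```python
# Sketch: add a distribution-alignment term between embeddings of the
# localized labeled nodes and of all nodes. Mean-embedding (linear-kernel MMD)
# distance is one simple choice; the paper's exact discrepancy may differ.
import torch
import torch.nn.functional as F

def alignment_penalty(all_embeddings, train_idx):
    train_mean = all_embeddings[train_idx].mean(dim=0)
    graph_mean = all_embeddings.mean(dim=0)
    return ((train_mean - graph_mean) ** 2).sum()

# Toy usage: `h` and `logits` would come from a GNN over the whole graph.
h = torch.randn(100, 16, requires_grad=True)      # node embeddings
logits = torch.randn(100, 3, requires_grad=True)  # stand-in classifier output
y = torch.randint(0, 3, (20,))
train_idx = torch.arange(20)                      # localized labeled subset

loss = F.cross_entropy(logits[train_idx], y) + 0.5 * alignment_penalty(h, train_idx)
loss.backward()
print(float(loss))
```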

IteraTTA: An interface for exploring both text prompts and audio priors in generating music with text-to-audio models

  • paper_url: http://arxiv.org/abs/2307.13005
  • repo_url: None
  • paper_authors: Hiromu Yakura, Masataka Goto
  • for: The interface helps users, even without musical knowledge, freely generate music audio by exploring the space of possible generations with various text prompts.
  • methods: Built on text-to-audio generation, it supports dual-sided exploration of both text prompts and audio priors that constrain the generation process, so users can discern the impact of each on the generation results through iterative comparison and progressively reach their loosely-specified goals.
  • results: By refining text prompts and selecting favorable audio priors from the generated audios, users can progressively understand and explore the space of possible results.
    Abstract Recent text-to-audio generation techniques have the potential to allow novice users to freely generate music audio. Even if they do not have musical knowledge, such as about chord progressions and instruments, users can try various text prompts to generate audio. However, compared to the image domain, gaining a clear understanding of the space of possible music audios is difficult because users cannot listen to the variations of the generated audios simultaneously. We therefore facilitate users in exploring not only text prompts but also audio priors that constrain the text-to-audio music generation process. This dual-sided exploration enables users to discern the impact of different text prompts and audio priors on the generation results through iterative comparison of them. Our developed interface, IteraTTA, is specifically designed to aid users in refining text prompts and selecting favorable audio priors from the generated audios. With this, users can progressively reach their loosely-specified goals while understanding and exploring the space of possible results. Our implementation and discussions highlight design considerations that are specifically required for text-to-audio models and how interaction techniques can contribute to their effectiveness.

Control and Monitoring of Artificial Intelligence Algorithms

  • paper_url: http://arxiv.org/abs/2307.13705
  • repo_url: https://github.com/Aryia-Behroziuan/neurons
  • paper_authors: Carlos Mario Braga Ortuño, Blanza Martinez Donoso, Belén Muñiz Villanueva
  • for: This paper elucidates the importance of governing an artificial intelligence model post-deployment and overseeing potential shifts between the distribution of incoming data and the training data.
  • methods: The concepts of data drift and concept drift are explained, along with their respective foundational distributions, and a range of metrics for scrutinizing model performance under temporal variation is introduced.
  • results: Monitoring a deployed model helps detect and handle changes in the data distribution and thereby maintain model performance.
    Abstract This paper elucidates the importance of governing an artificial intelligence model post-deployment and overseeing potential fluctuations in the distribution of present data in contrast to the training data. The concepts of data drift and concept drift are explicated, along with their respective foundational distributions. Furthermore, a range of metrics is introduced, which can be utilized to scrutinize the model's performance concerning potential temporal variations.
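Two standard drift metrics of the kind such monitoring relies on, a Kolmogorov-Smirnov test and the Population Stability Index, are sketched below; the 0.2 PSI alert level noted in the comment is a common rule of thumb, not a universal constant.

```python
# Two common data-drift metrics: Kolmogorov-Smirnov test and Population
# Stability Index (PSI). PSI > 0.2 is often read as significant drift
# (a rule of thumb, not a universal threshold).
import numpy as np
from scipy.stats import ks_2samp

def psi(expected, actual, bins=10):
    edges = np.histogram_bin_edges(expected, bins=bins)
    e = np.histogram(expected, bins=edges)[0] / len(expected)
    a = np.histogram(actual, bins=edges)[0] / len(actual)
    e, a = np.clip(e, 1e-6, None), np.clip(a, 1e-6, None)  # avoid log(0)
    return float(np.sum((a - e) * np.log(a / e)))

rng = np.random.default_rng(0)
train = rng.normal(0.0, 1.0, 5000)          # training-time feature
live = rng.normal(0.5, 1.2, 5000)           # shifted production feature

res = ks_2samp(train, live)
print(f"KS statistic={res.statistic:.3f}, p={res.pvalue:.2e}, "
      f"PSI={psi(train, live):.3f}")
```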

Remote Bio-Sensing: Open Source Benchmark Framework for Fair Evaluation of rPPG

  • paper_url: http://arxiv.org/abs/2307.12644
  • repo_url: https://github.com/remotebiosensing/rppg
  • paper_authors: Dae-Yeol Kim, Eunsu Goh, KwangKee Lee, JongEui Chae, JongHyeon Mun, Junyeong Na, Chae-bong Sohn, Do-Yup Kim
  • for: This paper provides a benchmarking framework for evaluating the performance of remote photoplethysmography (rPPG) techniques across a wide range of datasets, to facilitate fair comparison and progress in the field.
  • methods: The framework uses a variety of datasets and benchmarking metrics to evaluate both conventional non-deep neural network (non-DNN) and deep neural network (DNN) methods for rPPG.
  • results: The paper provides a comprehensive evaluation of different rPPG techniques across a wide range of datasets and highlights the need for fair, reproducible benchmarking to overcome the field's challenges and make meaningful progress.
    Abstract rPPG (Remote photoplethysmography) is a technology that measures and analyzes BVP (Blood Volume Pulse) by using the light absorption characteristics of hemoglobin captured through a camera. Analyzing the measured BVP can derive various physiological signals such as heart rate, stress level, and blood pressure, which can be applied to various applications such as telemedicine, remote patient monitoring, and early prediction of cardiovascular disease. rPPG is rapidly evolving and attracting great attention from both academia and industry by providing great usability and convenience as it can measure biosignals using a camera-equipped device without medical or wearable devices. Despite extensive efforts and advances in this field, serious challenges remain, including issues related to skin color, camera characteristics, ambient lighting, and other sources of noise and artifacts, which degrade accuracy performance. We argue that fair and evaluable benchmarking is urgently required to overcome these challenges and make meaningful progress from both academic and commercial perspectives. In most existing work, models are trained, tested, and validated only on limited datasets. Even worse, some studies lack available code or reproducibility, making it difficult to fairly evaluate and compare performance. Therefore, the purpose of this study is to provide a benchmarking framework to evaluate various rPPG techniques across a wide range of datasets for fair evaluation and comparison, including both conventional non-deep neural network (non-DNN) and deep neural network (DNN) methods. GitHub URL: https://github.com/remotebiosensing/rppg
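For orientation, here is a minimal sketch of a classic non-DNN rPPG pipeline: average a green-channel skin signal, band-pass to plausible heart-rate frequencies, and read off the dominant spectral peak. The 30 fps rate and 0.7-4 Hz band are conventional assumptions, not prescribed by the benchmark.

```python
# Minimal non-DNN rPPG sketch: mean green-channel trace -> band-pass in the
# heart-rate band -> dominant frequency. 30 fps and the 0.7-4 Hz band are
# conventional choices, not prescribed by the benchmark.
import numpy as np
from scipy.signal import butter, filtfilt

fs = 30.0                                        # camera frame rate (assumed)
t = np.arange(0, 20, 1 / fs)
# Synthetic stand-in for the per-frame mean green value of a skin ROI:
green = 0.02 * np.sin(2 * np.pi * 1.2 * t) + 0.01 * np.random.randn(len(t))

b, a = butter(3, [0.7 / (fs / 2), 4.0 / (fs / 2)], btype="band")
bvp = filtfilt(b, a, green)                      # estimated BVP waveform

spectrum = np.abs(np.fft.rfft(bvp))
freqs = np.fft.rfftfreq(len(bvp), 1 / fs)
hr_hz = freqs[np.argmax(spectrum)]
print(f"estimated heart rate: {hr_hz * 60:.0f} bpm")  # ~72 bpm here
```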

Enhancing Human-like Multi-Modal Reasoning: A New Challenging Dataset and Comprehensive Framework

  • paper_url: http://arxiv.org/abs/2307.12626
  • repo_url: None
  • paper_authors: Jingxuan Wei, Cheng Tan, Zhangyang Gao, Linzhuang Sun, Siyuan Li, Bihui Yu, Ruifeng Guo, Stan Z. Li
  • for: This paper aims to address the lack of comprehensive evaluation of diverse approaches in multimodal scientific question answering, by presenting a novel dataset (COCO Multi-Modal Reasoning Dataset) that includes open-ended questions, rationales, and answers derived from the large object dataset COCO.
  • methods: The proposed dataset pioneers the use of open-ended questions in the context of multimodal chain-of-thought, which introduces a more challenging problem that effectively assesses the reasoning capability of CoT models. The authors propose innovative techniques, including multi-hop cross-modal attention and sentence-level contrastive learning, to enhance the image and text encoders.
  • results: Extensive experiments demonstrate the efficacy of the proposed dataset and techniques, offering novel perspectives and a more challenging benchmark for advancing multimodal reasoning.
    Abstract Multimodal reasoning is a critical component in the pursuit of artificial intelligence systems that exhibit human-like intelligence, especially when tackling complex tasks. While the chain-of-thought (CoT) technique has gained considerable attention, the existing ScienceQA dataset, which focuses on multimodal scientific questions and explanations from elementary and high school textbooks, lacks a comprehensive evaluation of diverse approaches. To address this gap, we present COCO Multi-Modal Reasoning Dataset(COCO-MMRD), a novel dataset that encompasses an extensive collection of open-ended questions, rationales, and answers derived from the large object dataset COCO. Unlike previous datasets that rely on multiple-choice questions, our dataset pioneers the use of open-ended questions in the context of multimodal CoT, introducing a more challenging problem that effectively assesses the reasoning capability of CoT models. Through comprehensive evaluations and detailed analyses, we provide valuable insights and propose innovative techniques, including multi-hop cross-modal attention and sentence-level contrastive learning, to enhance the image and text encoders. Extensive experiments demonstrate the efficacy of the proposed dataset and techniques, offering novel perspectives for advancing multimodal reasoning.
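Sentence-level contrastive learning, one of the proposed techniques, is typically built on an InfoNCE-style loss; the sketch below pairs image and sentence embeddings with placeholder encoders and is a generic illustration rather than the paper's exact formulation.

```python
# Sentence-level contrastive (InfoNCE) loss sketch: matched image/sentence
# embedding pairs are pulled together, mismatched pairs pushed apart.
# Encoders are placeholders; this only illustrates the loss itself.
import torch
import torch.nn.functional as F

def info_nce(img_emb, txt_emb, temperature=0.07):
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    logits = img @ txt.t() / temperature        # (B, B) similarity matrix
    targets = torch.arange(len(img))            # diagonal pairs match
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

img_emb = torch.randn(8, 128, requires_grad=True)
txt_emb = torch.randn(8, 128, requires_grad=True)
loss = info_nce(img_emb, txt_emb)
loss.backward()
print(float(loss))
```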

De-confounding Representation Learning for Counterfactual Inference on Continuous Treatment via Generative Adversarial Network

  • paper_url: http://arxiv.org/abs/2307.12625
  • repo_url: None
  • paper_authors: Yonghe Zhao, Qiang Huang, Haolong Zeng, Yun Pen, Huiyan Sun
  • for: This paper targets counterfactual inference for continuous treatment variables, which is more common than the binary case in real-world causal inference tasks.
  • methods: A de-confounding representation learning (DRL) framework generates representations of covariates disentangled from the treatment variables. DRL is a non-parametric model that eliminates both linear and nonlinear dependence between the continuous treatment and the covariates.
  • results: Extensive experiments on synthetic datasets show that DRL learns de-confounded representations and outperforms state-of-the-art counterfactual inference models for continuous treatments. Applied to the real-world medical dataset MIMIC, the model reveals a detailed causal relationship between red cell width distribution and mortality.
    Abstract Counterfactual inference for continuous rather than binary treatment variables is more common in real-world causal inference tasks. While there are already some sample reweighting methods based on Marginal Structural Model for eliminating the confounding bias, they generally focus on removing the treatment's linear dependence on confounders and rely on the accuracy of the assumed parametric models, which are usually unverifiable. In this paper, we propose a de-confounding representation learning (DRL) framework for counterfactual outcome estimation of continuous treatment by generating the representations of covariates disentangled with the treatment variables. The DRL is a non-parametric model that eliminates both linear and nonlinear dependence between treatment and covariates. Specifically, we train the correlations between the de-confounded representations and the treatment variables against the correlations between the covariate representations and the treatment variables to eliminate confounding bias. Further, a counterfactual inference network is embedded into the framework to make the learned representations serve both de-confounding and trusted inference. Extensive experiments on synthetic datasets show that the DRL model performs superiorly in learning de-confounding representations and outperforms state-of-the-art counterfactual inference models for continuous treatment variables. In addition, we apply the DRL model to a real-world medical dataset MIMIC and demonstrate a detailed causal relationship between red cell width distribution and mortality.
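
As a concrete illustration of the adversarial de-confounding idea, the following is a minimal PyTorch sketch in which an encoder is trained so that a critic cannot recover the continuous treatment from the learned representation, while an outcome head stays accurate. All module names, dimensions, and loss weights are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

# Synthetic confounded data: the treatment T depends on the covariates X.
X = torch.randn(512, 25)
T = X[:, :5].mean(1, keepdim=True) + 0.1 * torch.randn(512, 1)
Y = X[:, :3].sum(1, keepdim=True) + 2.0 * T + 0.1 * torch.randn(512, 1)
loader = torch.utils.data.DataLoader(torch.utils.data.TensorDataset(X, T, Y), batch_size=64)

enc = nn.Sequential(nn.Linear(25, 64), nn.ReLU(), nn.Linear(64, 32))    # covariates -> representation
critic = nn.Sequential(nn.Linear(32, 32), nn.ReLU(), nn.Linear(32, 1))  # tries to recover T from z
outcome = nn.Sequential(nn.Linear(33, 32), nn.ReLU(), nn.Linear(32, 1)) # counterfactual head on (z, t)

opt_enc = torch.optim.Adam(list(enc.parameters()) + list(outcome.parameters()), lr=1e-3)
opt_cri = torch.optim.Adam(critic.parameters(), lr=1e-3)

for epoch in range(20):
    for x, t, y in loader:
        # 1) Train the critic to predict the treatment from the representation.
        z = enc(x).detach()
        cri_loss = nn.functional.mse_loss(critic(z), t)
        opt_cri.zero_grad(); cri_loss.backward(); opt_cri.step()

        # 2) Train the encoder so the critic fails (de-confounding), while the
        #    outcome head stays accurate (trusted counterfactual inference).
        z = enc(x)
        adv = -nn.functional.mse_loss(critic(z), t)        # maximize critic error
        pred = outcome(torch.cat([z, t], dim=-1))
        loss = nn.functional.mse_loss(pred, y) + 0.1 * adv
        opt_enc.zero_grad(); loss.backward(); opt_enc.step()
```

Once trained, varying `t` in `outcome(torch.cat([z, t], dim=-1))` traces the counterfactual dose-response curve for an individual.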

Past-present temporal programs over finite traces

  • paper_url: http://arxiv.org/abs/2307.12620
  • repo_url: None
  • paper_authors: Pedro Cabalar, Martín Diéguez, François Laferrière, Torsten Schaub
  • for: Extending logic programming with language constructs from temporal logics in order to model dynamic applications.
  • methods: Within temporal equilibrium logic over finite traces (TELf), the paper studies the past-present syntactic subclass: sets of logic-programming rules whose bodies refer to the past and whose heads refer to the present, a restriction that keeps the past independent of the future (an illustrative rule follows this entry).
  • results: The definitions of completion and loop formulas are extended to past-present formulas, allowing the temporal stable models of a set of past-present temporal programs to be captured by an LTLf expression.
    Abstract Extensions of Answer Set Programming with language constructs from temporal logics, such as temporal equilibrium logic over finite traces (TELf), provide an expressive computational framework for modeling dynamic applications. In this paper, we study the so-called past-present syntactic subclass, which consists of a set of logic programming rules whose body references to the past and head to the present. Such restriction ensures that the past remains independent of the future, which is the case in most dynamic domains. We extend the definitions of completion and loop formulas to the case of past-present formulas, which allows capturing the temporal stable models of a set of past-present temporal programs by means of an LTLf expression.
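
To make the syntactic restriction concrete, here is a small illustrative rule of our own (not taken from the paper), written in the temporal-logic notation common around TELf, where $\Box$ reads "always" and $\bullet$ reads "previous":

$$\Box\big(\bullet\,\mathit{shoot} \wedge \bullet\,\mathit{loaded} \rightarrow \mathit{dead}\big)$$

The body consults only the previous state while the head derives an atom in the current state, so the truth of the past never depends on how the trace continues; this is exactly the independence the past-present subclass is designed to guarantee.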

CTVIS: Consistent Training for Online Video Instance Segmentation

  • paper_url: http://arxiv.org/abs/2307.12616
  • repo_url: https://github.com/kainingying/ctvis
  • paper_authors: Kaining Ying, Qing Zhong, Weian Mao, Zhenhua Wang, Hao Chen, Lin Yuanbo Wu, Yifan Liu, Chengxiang Fan, Yunzhi Zhuge, Chunhua Shen
  • for: Improving the discrimination of instance embeddings in online video instance segmentation (VIS), so that instances can be associated reliably across time.
  • methods: Instance embedding learning is supervised directly by a contrastive loss, combined with a simple yet effective strategy called Consistent Training for Online VIS (CTVIS), which aligns the training and inference pipelines when building contrastive items: momentum-averaged embeddings, a memory-bank storage mechanism, and noise added to the relevant embeddings (see the sketch after this entry).
  • results: CTVIS yields more reliable instance embeddings and outstrips SOTA models by up to +5.0 points on three VIS benchmarks: YTVIS19 (55.1% AP), YTVIS21 (50.1% AP), and OVIS (35.5% AP). Moreover, models trained on pseudo-videos transformed from images can surpass fully-supervised ones.
    Abstract The discrimination of instance embeddings plays a vital role in associating instances across time for online video instance segmentation (VIS). Instance embedding learning is directly supervised by the contrastive loss computed upon the contrastive items (CIs), which are sets of anchor/positive/negative embeddings. Recent online VIS methods leverage CIs sourced from one reference frame only, which we argue is insufficient for learning highly discriminative embeddings. Intuitively, a possible strategy to enhance CIs is replicating the inference phase during training. To this end, we propose a simple yet effective training strategy, called Consistent Training for Online VIS (CTVIS), which devotes to aligning the training and inference pipelines in terms of building CIs. Specifically, CTVIS constructs CIs by referring inference the momentum-averaged embedding and the memory bank storage mechanisms, and adding noise to the relevant embeddings. Such an extension allows a reliable comparison between embeddings of current instances and the stable representations of historical instances, thereby conferring an advantage in modeling VIS challenges such as occlusion, re-identification, and deformation. Empirically, CTVIS outstrips the SOTA VIS models by up to +5.0 points on three VIS benchmarks, including YTVIS19 (55.1% AP), YTVIS21 (50.1% AP) and OVIS (35.5% AP). Furthermore, we find that pseudo-videos transformed from images can train robust models surpassing fully-supervised ones.
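
The key pieces of the strategy, momentum-averaged embeddings kept in a memory bank and perturbed with noise when building contrastive items, can be sketched as below. This is a hypothetical PyTorch fragment; the momentum value, noise scale, and shapes are assumptions rather than the paper's settings.

```python
import torch

class InstanceMemory:
    """Per-instance memory of momentum-averaged embeddings (illustrative)."""
    def __init__(self, momentum=0.9, noise_std=0.05):
        self.bank = {}                  # instance id -> stable historical embedding
        self.m = momentum
        self.noise_std = noise_std

    def update(self, ids, embs):
        # Momentum-average each instance's embedding across frames.
        for i, e in zip(ids, embs):
            e = e.detach()
            self.bank[i] = self.m * self.bank.get(i, e) + (1 - self.m) * e

    def contrastive_items(self, i):
        # Anchor: a noisy copy of the stable embedding, mimicking the
        # perturbations seen at inference; negatives: all other instances.
        anchor = self.bank[i] + self.noise_std * torch.randn_like(self.bank[i])
        negatives = torch.stack([v for k, v in self.bank.items() if k != i])
        return anchor, negatives

# Dummy usage: two instances tracked over three frames.
mem = InstanceMemory()
for _ in range(3):
    mem.update(ids=[0, 1], embs=torch.randn(2, 256))
anchor, negs = mem.contrastive_items(0)
print(anchor.shape, negs.shape)         # torch.Size([256]) torch.Size([1, 256])
```

Comparing current-frame embeddings against these stable historical representations is what gives the method its robustness to occlusion, re-identification, and deformation.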

Regulating AI manipulation: Applying Insights from behavioral economics and psychology to enhance the practicality of the EU AI Act

  • paper_url: http://arxiv.org/abs/2308.02041
  • repo_url: None
  • paper_authors: Huixin Zhong
  • for: Interpreting and strengthening the enforcement of Article 5 of the EU AI Act, which regulates AI manipulation to prevent potentially harmful consequences.
  • methods: The paper draws on cognitive psychology to elucidate the term "subliminal techniques" and its associated representations, and extends the study of heuristics, thinking shortcuts that can be aroused to change behavior, from behavioral economics to the realm of manipulative techniques.
  • results: Five classical heuristics and associated examples illustrate how AI can arouse them to alter user behavior, serving as a practical guide for users, developers, algorithm auditors, and legal practitioners to identify manipulative techniques and implement countermeasures. The paper also critically evaluates the protective efficacy of Article 5 for the general public and vulnerable groups, argues it is insufficient, and proposes specific revisions to terms (a) and (b).
    Abstract The EU AI Act Article 5 is designed to regulate AI manipulation to prevent potential harmful consequences. However, the practical implementation of this legislation is challenging due to the ambiguous terminologies and the unclear presentations of manipulative techniques. Moreover, the Article 5 also suffers criticize of inadequate protective efficacy. This paper attempts to clarify terminologies and to enhance the protective efficacy by integrating insights from psychology and behavioral economics. Firstly, this paper employs cognitive psychology research to elucidate the term subliminal techniques and its associated representation. Additionally, this paper extends the study of heuristics: a set of thinking shortcuts which can be aroused for behavior changing from behavior economics to the realm of manipulative techniques. The elucidation and expansion of terminologies not only provide a more accurate understanding of the legal provision but also enhance its protective efficacy. Secondly, this paper proposes five classical heuristics and their associated examples to illustrate how can AI arouse those heuristics to alter users behavior. The enumeration of heuristics serves as a practical guide for stakeholders such as AI developers, algorithm auditors, users, and legal practitioners, enabling them to identify manipulative techniques and implement countermeasures. Finally, this paper critically evaluates the protective efficacy of Article 5 for both the general public and vulnerable groups. This paper argues that the current protective efficacy of Article 5 is insufficient and thus proposes specific revision suggestions to terms a and b in Article 5 to enhance its protective efficacy. This work contributes to the ongoing discourse on AI ethics and legal regulations, providing a practical guide for interpreting and applying the EU AI Act Article 5.

Less is More: Focus Attention for Efficient DETR

  • paper_url: http://arxiv.org/abs/2307.12612
  • repo_url: https://github.com/huawei-noah/noah-research
  • paper_authors: Dehua Zheng, Wenhui Dong, Hailin Hu, Xinghao Chen, Yunhe Wang
  • for: Improving the computational efficiency of DETR-like models while preserving accuracy.
  • methods: Focus-DETR focuses attention on more informative tokens via dual attention: a token scoring mechanism that weighs both localization and category semantics from multi-scale feature maps, after which background queries are abandoned and the semantic interaction of the fine-grained object queries is enhanced based on the scores (a toy token-pruning sketch follows this entry).
  • results: Compared with state-of-the-art sparse DETR-like detectors under the same setting, Focus-DETR achieves 50.4 AP (+2.2) on COCO with comparable complexity.
    Abstract DETR-like models have significantly boosted the performance of detectors and even outperformed classical convolutional models. However, all tokens are treated equally without discrimination brings a redundant computational burden in the traditional encoder structure. The recent sparsification strategies exploit a subset of informative tokens to reduce attention complexity maintaining performance through the sparse encoder. But these methods tend to rely on unreliable model statistics. Moreover, simply reducing the token population hinders the detection performance to a large extent, limiting the application of these sparse models. We propose Focus-DETR, which focuses attention on more informative tokens for a better trade-off between computation efficiency and model accuracy. Specifically, we reconstruct the encoder with dual attention, which includes a token scoring mechanism that considers both localization and category semantic information of the objects from multi-scale feature maps. We efficiently abandon the background queries and enhance the semantic interaction of the fine-grained object queries based on the scores. Compared with the state-of-the-art sparse DETR-like detectors under the same setting, our Focus-DETR gets comparable complexity while achieving 50.4AP (+2.2) on COCO. The code is available at https://github.com/huawei-noah/noah-research/tree/master/Focus-DETR and https://gitee.com/mindspore/models/tree/master/research/cv/Focus-DETR.
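
Scoring tokens and keeping only a fixed ratio of the most informative ones, the computational core of this family of sparse detectors, can be illustrated in a few lines of PyTorch. The scoring head and shapes below are toy assumptions, not the Focus-DETR implementation.

```python
import torch
import torch.nn as nn

B, N, D = 2, 1000, 256              # batch, tokens from multi-scale features, channels
tokens = torch.randn(B, N, D)

score_head = nn.Linear(D, 1)        # hypothetical localization/semantic scoring head
scores = score_head(tokens).squeeze(-1)        # (B, N)

keep_ratio = 0.3                                # keep the top 30% of tokens
k = int(N * keep_ratio)
topk = scores.topk(k, dim=1).indices            # (B, k)

# Gather the informative tokens; only these enter the expensive attention,
# while background tokens are discarded.
idx = topk.unsqueeze(-1).expand(-1, -1, D)
focused = tokens.gather(1, idx)                 # (B, k, D)
print(focused.shape)                            # torch.Size([2, 300, 256])
```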

SL: Stable Learning in Source-Free Domain Adaption for Medical Image Segmentation

  • paper_url: http://arxiv.org/abs/2307.12580
  • repo_url: None
  • paper_authors: Yixin Chen, Yan Wang
  • for: Deep learning for medical image analysis under domain shift, in the source-free setting where source and target data cannot be available simultaneously for privacy reasons.
  • methods: A Stable Learning (SL) strategy that addresses the "longer training, worse performance" over-fitting dilemma of medical source-free unsupervised domain adaptation. SL is scalable, can be integrated with other research, and consists of Weight Consolidation and Entropy Increase (a plausible reading is sketched after this entry).
  • results: Comparative experiments prove the effectiveness of the strategy, supported by extensive ablation studies; code covering a variety of MSFUDA methods will be released.
    Abstract Deep learning techniques for medical image analysis usually suffer from the domain shift between source and target data. Most existing works focus on unsupervised domain adaptation (UDA). However, in practical applications, privacy issues are much more severe. For example, the data of different hospitals have domain shifts due to equipment problems, and data of the two domains cannot be available simultaneously because of privacy. In this challenge defined as Source-Free UDA, the previous UDA medical methods are limited. Although a variety of medical source-free unsupervised domain adaption (MSFUDA) methods have been proposed, we found they fall into an over-fitting dilemma called "longer training, worse performance." Therefore, we propose the Stable Learning (SL) strategy to address the dilemma. SL is a scalable method and can be integrated with other research, which consists of Weight Consolidation and Entropy Increase. First, we apply Weight Consolidation to retain domain-invariant knowledge and then we design Entropy Increase to avoid over-learning. Comparative experiments prove the effectiveness of SL. We also have done extensive ablation experiments. Besides, We will release codes including a variety of MSFUDA methods.
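
Weight Consolidation and Entropy Increase are only named here, so the following PyTorch fragment is one plausible reading: penalize drift from the source-pretrained weights while discouraging the over-confident collapse that long self-training produces. The loss forms and weights are assumptions.

```python
import copy
import torch
import torch.nn.functional as F

def sl_regularizers(model, source_model, logits, lam_wc=1.0, lam_ent=0.1):
    """Hypothetical Stable Learning terms for source-free adaptation."""
    # Weight Consolidation: keep adapted weights close to the source weights,
    # retaining the domain-invariant knowledge they encode.
    wc = sum(((p - q.detach()) ** 2).sum()
             for p, q in zip(model.parameters(), source_model.parameters()))

    # Entropy Increase: discourage collapsing onto over-confident predictions
    # ("longer training, worse performance") by rewarding predictive entropy.
    probs = logits.softmax(dim=1)
    entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=1).mean()

    return lam_wc * wc - lam_ent * entropy      # subtracting entropy increases it

# Toy usage: pseudo-label self-training on an unlabeled target batch.
model = torch.nn.Linear(16, 3)
source_model = copy.deepcopy(model)             # frozen copy of source weights
x = torch.randn(8, 16)
logits = model(x)
loss = F.cross_entropy(logits, logits.argmax(1)) + sl_regularizers(model, source_model, logits)
loss.backward()
```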

Continuation Path Learning for Homotopy Optimization

  • paper_url: http://arxiv.org/abs/2307.12551
  • repo_url: https://github.com/xi-l/cpl
  • paper_authors: Xi Lin, Zhiyuan Yang, Xiaoyuan Zhang, Qingfu Zhang
  • for: Solving complex optimization problems with homotopy optimization while improving its effectiveness and its robustness to the continuation schedule.
  • methods: A model-based approach that learns the whole continuation path, optimizing the original problem and all surrogate subproblems simultaneously and collaboratively, and generating any intermediate solution in real time (a minimal sketch follows this entry).
  • results: Experiments on different problems show that the method significantly improves the performance of homotopy optimization and provides extra helpful information to support better decision-making.
    Abstract Homotopy optimization is a traditional method to deal with a complicated optimization problem by solving a sequence of easy-to-hard surrogate subproblems. However, this method can be very sensitive to the continuation schedule design and might lead to a suboptimal solution to the original problem. In addition, the intermediate solutions, often ignored by classic homotopy optimization, could be useful for many real-world applications. In this work, we propose a novel model-based approach to learn the whole continuation path for homotopy optimization, which contains infinite intermediate solutions for any surrogate subproblems. Rather than the classic unidirectional easy-to-hard optimization, our method can simultaneously optimize the original problem and all surrogate subproblems in a collaborative manner. The proposed model also supports real-time generation of any intermediate solution, which could be desirable for many applications. Experimental studies on different problems show that our proposed method can significantly improve the performance of homotopy optimization and provide extra helpful information to support better decision-making.
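
A standard homotopy setup blends an easy surrogate $g$ into the hard objective $f$ via $H(x,t)=(1-t)\,g(x)+t\,f(x)$; learning the whole continuation path then amounts to training a model $x_\theta(t)$ that outputs a solution for every $t\in[0,1]$ at once. The PyTorch sketch below is a minimal, hypothetical instantiation of that idea, not the authors' code.

```python
import torch
import torch.nn as nn

f = lambda x: (x ** 2 - 1) ** 2 + 0.3 * torch.sin(8 * x)   # hard, multimodal objective
g = lambda x: x ** 2                                        # easy convex surrogate

# Path model: maps the continuation parameter t in [0, 1] to a solution x(t).
path = nn.Sequential(nn.Linear(1, 64), nn.Tanh(), nn.Linear(64, 1))
opt = torch.optim.Adam(path.parameters(), lr=1e-2)

for step in range(2000):
    t = torch.rand(128, 1)               # sample subproblems all along the path
    x = path(t)
    H = (1 - t) * g(x) + t * f(x)        # homotopy objective H(x, t)
    opt.zero_grad(); H.mean().backward(); opt.step()

# Any intermediate solution is available in real time; t=1 is the original problem.
print(path(torch.tensor([[1.0]])).item())
```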

Knapsack: Connectedness, Path, and Shortest-Path

  • paper_url: http://arxiv.org/abs/2307.12547
  • repo_url: None
  • paper_authors: Palash Dey, Sudeshna Kolay, Sipra Singh
  • for: The knapsack problem under graph-theoretic constraints; in connected-knapsack, the goal is to compute a connected subset of items of maximum value subject to the knapsack size constraint.
  • methods: Graph-theoretic analysis showing the problem is strongly NP-complete even for graphs of maximum degree four and NP-complete even for star graphs (a brute-force checker for small instances follows this entry).
  • results: An algorithm running in time $O\left(2^{tw\log tw}\cdot\text{poly}(\min\{s^2,d^2\})\right)$, where $tw$, $s$, $d$ are the treewidth of the graph, the size, and the target value of the knapsack, plus a $(1-\epsilon)$-factor approximation algorithm running in time $O\left(2^{tw\log tw}\cdot\text{poly}(n,1/\epsilon)\right)$ for every $\epsilon>0$. Similar results hold for path-knapsack and shortestpath-knapsack, with connected-knapsack appearing computationally hardest, followed by path-knapsack and shortestpath-knapsack.
    Abstract We study the knapsack problem with graph theoretic constraints. That is, we assume that there exists a graph structure on the set of items of knapsack and the solution also needs to satisfy certain graph theoretic properties on top of knapsack constraints. In particular, we need to compute in the connected knapsack problem a connected subset of items which has maximum value subject to the size of knapsack constraint. We show that this problem is strongly NP-complete even for graphs of maximum degree four and NP-complete even for star graphs. On the other hand, we develop an algorithm running in time $O\left(2^{tw\log tw}\cdot\text{poly}(\min\{s^2,d^2\})\right)$ where $tw,s,d$ are respectively treewidth of the graph, size, and target value of the knapsack. We further exhibit a $(1-\epsilon)$ factor approximation algorithm running in time $O\left(2^{tw\log tw}\cdot\text{poly}(n,1/\epsilon)\right)$ for every $\epsilon>0$. We show similar results for several other graph theoretic properties, namely path and shortest-path under the problem names path-knapsack and shortestpath-knapsack. Our results seems to indicate that connected-knapsack is computationally hardest followed by path-knapsack and shortestpath-knapsack.
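
For intuition about the problem definition, small instances can be solved exactly by enumerating item subsets and checking connectivity in the item graph. The self-contained (and deliberately exponential) Python brute force below makes the connected-knapsack objective concrete.

```python
from itertools import combinations

def connected(subset, adj):
    """DFS check that `subset` induces a connected subgraph."""
    subset = set(subset)
    if not subset:
        return False
    seen, stack = set(), [next(iter(subset))]
    while stack:
        u = stack.pop()
        if u in seen:
            continue
        seen.add(u)
        stack.extend(v for v in adj[u] if v in subset)
    return seen == subset

def connected_knapsack(values, sizes, adj, capacity):
    """Maximum-value connected item subset within capacity (brute force)."""
    best, best_set = 0, ()
    n = len(values)
    for r in range(1, n + 1):
        for s in combinations(range(n), r):
            if sum(sizes[i] for i in s) <= capacity and connected(s, adj):
                v = sum(values[i] for i in s)
                if v > best:
                    best, best_set = v, s
    return best, best_set

# Toy instance: items form the path graph 0-1-2-3.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(connected_knapsack(values=[4, 1, 5, 6], sizes=[3, 1, 2, 3], adj=adj, capacity=6))
# -> (12, (1, 2, 3)): items 1, 2, 3 are connected, fit the capacity, total value 12
```

The paper's contribution is, of course, to replace this exponential enumeration with treewidth-parameterized exact and approximation algorithms.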

Towards Video Anomaly Retrieval from Video Anomaly Detection: New Benchmarks and Model

  • paper_url: http://arxiv.org/abs/2307.12545
  • repo_url: None
  • paper_authors: Peng Wu, Jing Liu, Xiangteng He, Yuxin Peng, Peng Wang, Yanning Zhang
  • for: Proposing a new task, Video Anomaly Retrieval (VAR), which pragmatically retrieves relevant anomalous videos via cross-modal queries such as language descriptions and synchronous audios, targeting long untrimmed videos that may be only partially relevant to the query.
  • methods: An Anomaly-Led Alignment Network (ALAN) featuring anomaly-led sampling to focus on key segments of long untrimmed videos, an efficient pretext task to enhance fine-grained video-text semantic associations, and two complementary alignments to further match cross-modal content.
  • results: Experiments on two newly constructed large-scale benchmarks, UCFCrime-AR and XDViolence-AR, reveal the challenges of the VAR task and demonstrate the advantages of the tailored method.
    Abstract Video anomaly detection (VAD) has been paid increasing attention due to its potential applications, its current dominant tasks focus on online detecting anomalies at the frame level, which can be roughly interpreted as the binary or multiple event classification. However, such a setup that builds relationships between complicated anomalous events and single labels, e.g., ``vandalism'', is superficial, since single labels are deficient to characterize anomalous events. In reality, users tend to search a specific video rather than a series of approximate videos. Therefore, retrieving anomalous events using detailed descriptions is practical and positive but few researches focus on this. In this context, we propose a novel task called Video Anomaly Retrieval (VAR), which aims to pragmatically retrieve relevant anomalous videos by cross-modalities, e.g., language descriptions and synchronous audios. Unlike the current video retrieval where videos are assumed to be temporally well-trimmed with short duration, VAR is devised to retrieve long untrimmed videos which may be partially relevant to the given query. To achieve this, we present two large-scale VAR benchmarks, UCFCrime-AR and XDViolence-AR, constructed on top of prevalent anomaly datasets. Meanwhile, we design a model called Anomaly-Led Alignment Network (ALAN) for VAR. In ALAN, we propose an anomaly-led sampling to focus on key segments in long untrimmed videos. Then, we introduce an efficient pretext task to enhance semantic associations between video-text fine-grained representations. Besides, we leverage two complementary alignments to further match cross-modal contents. Experimental results on two benchmarks reveal the challenges of VAR task and also demonstrate the advantages of our tailored method.

Client-Level Differential Privacy via Adaptive Intermediary in Federated Medical Imaging

  • paper_url: http://arxiv.org/abs/2307.12542
  • repo_url: https://github.com/med-air/client-dp-fl
  • paper_authors: Meirui Jiang, Yuan Zhong, Anjie Le, Xiaoxiao Li, Qi Dou
  • for: This paper aims to optimize the trade-off between privacy protection and performance in federated learning (FL) for medical imaging under the context of client-level differential privacy (DP).
  • methods: The proposed approach is based on an adaptive intermediary strategy that splits clients into sub-clients, which serve as intermediaries between hospitals and the server to mitigate the noise introduced by DP without harming privacy (see the sketch after this entry).
  • results: The proposed approach is empirically evaluated on both classification and segmentation tasks using two public datasets, and its effectiveness is demonstrated with significant performance improvements and comprehensive analytical studies.
    Abstract Despite recent progress in enhancing the privacy of federated learning (FL) via differential privacy (DP), the trade-off of DP between privacy protection and performance is still underexplored for real-world medical scenario. In this paper, we propose to optimize the trade-off under the context of client-level DP, which focuses on privacy during communications. However, FL for medical imaging involves typically much fewer participants (hospitals) than other domains (e.g., mobile devices), thus ensuring clients be differentially private is much more challenging. To tackle this problem, we propose an adaptive intermediary strategy to improve performance without harming privacy. Specifically, we theoretically find splitting clients into sub-clients, which serve as intermediaries between hospitals and the server, can mitigate the noises introduced by DP without harming privacy. Our proposed approach is empirically evaluated on both classification and segmentation tasks using two public datasets, and its effectiveness is demonstrated with significant performance improvements and comprehensive analytical studies. Code is available at: https://github.com/med-air/Client-DP-FL.
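
Client-level DP is commonly enforced by clipping each participant's update and adding Gaussian noise before aggregation; one way to read the intermediary idea is that splitting hospitals into sub-clients increases the number of privatized units being averaged, reducing the effective noise in the aggregate. The NumPy sketch below illustrates that mechanism only; the split rule, clipping bound, and noise multiplier are all assumptions, not the paper's calibrated mechanism.

```python
import numpy as np

rng = np.random.default_rng(0)

def privatize(update, clip=1.0, sigma=0.8):
    """Clip an update to L2 norm `clip`, then add Gaussian noise (DP-style)."""
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip / max(norm, 1e-12))
    return clipped + rng.normal(0.0, sigma * clip, size=update.shape)

def aggregate(hospital_updates, n_subclients=1):
    """Hypothetical sub-client intermediaries: each hospital's update is
    submitted via several sub-clients, each privatized independently."""
    noisy = []
    for u in hospital_updates:
        noisy.extend(privatize(u) for _ in range(n_subclients))
    return np.mean(noisy, axis=0)

updates = [rng.normal(size=5) for _ in range(3)]   # 3 hospitals
print(aggregate(updates, n_subclients=1))          # plain client-level DP
print(aggregate(updates, n_subclients=4))          # noise averaged over more units
```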

SelFormaly: Towards Task-Agnostic Unified Anomaly Detection

  • paper_url: http://arxiv.org/abs/2307.12540
  • repo_url: None
  • paper_authors: Yujin Lee, Harin Lim, Hyunsoo Yoon
  • for: A universal and powerful anomaly detection framework that unifies previously task-specific settings: defect detection, semantic anomaly detection, multi-class anomaly detection, and anomaly clustering.
  • methods: Self-supervised ViTs, combined with back-patch masking to eliminate irrelevant background regions and top k-ratio feature matching to unify various anomaly levels and tasks.
  • results: State-of-the-art results across various datasets for all of the aforementioned tasks.
    Abstract The core idea of visual anomaly detection is to learn the normality from normal images, but previous works have been developed specifically for certain tasks, leading to fragmentation among various tasks: defect detection, semantic anomaly detection, multi-class anomaly detection, and anomaly clustering. This one-task-one-model approach is resource-intensive and incurs high maintenance costs as the number of tasks increases. This paper presents SelFormaly, a universal and powerful anomaly detection framework. We emphasize the necessity of our off-the-shelf approach by pointing out a suboptimal issue with fluctuating performance in previous online encoder-based methods. In addition, we question the effectiveness of using ConvNets as previously employed in the literature and confirm that self-supervised ViTs are suitable for unified anomaly detection. We introduce back-patch masking and discover the new role of top k-ratio feature matching to achieve unified and powerful anomaly detection. Back-patch masking eliminates irrelevant regions that possibly hinder target-centric detection with representations of the scene layout. The top k-ratio feature matching unifies various anomaly levels and tasks. Finally, SelFormaly achieves state-of-the-art results across various datasets for all the aforementioned tasks.

Rethinking Medical Report Generation: Disease Revealing Enhancement with Knowledge Graph

  • paper_url: http://arxiv.org/abs/2307.12526
  • repo_url: https://github.com/wangyixinxin/mrg-kg
  • paper_authors: Yixin Wang, Zihao Lin, Haoyu Dong
  • for: Improving the completeness and use of knowledge graphs (KG) in medical report generation (MRG).
  • methods: A complete KG on chest X-ray imaging covering 137 types of diseases and abnormalities is built to guide the MRG process; a novel augmentation strategy enhances the representation of disease types in the tail of the distribution, and a two-stage approach routes classified images to a disease-specific or a disease-free generator.
  • results: The proposed two-stage generation framework and augmentation strategies improve diverse sensitivity (DS), a new metric that checks whether generated diseases match the ground truth and measures their diversity, by a considerable margin, indicating a notable reduction in the long-tailed problem of under-represented diseases.
    Abstract Knowledge Graph (KG) plays a crucial role in Medical Report Generation (MRG) because it reveals the relations among diseases and thus can be utilized to guide the generation process. However, constructing a comprehensive KG is labor-intensive and its applications on the MRG process are under-explored. In this study, we establish a complete KG on chest X-ray imaging that includes 137 types of diseases and abnormalities. Based on this KG, we find that the current MRG data sets exhibit a long-tailed problem in disease distribution. To mitigate this problem, we introduce a novel augmentation strategy that enhances the representation of disease types in the tail-end of the distribution. We further design a two-stage MRG approach, where a classifier is first trained to detect whether the input images exhibit any abnormalities. The classified images are then independently fed into two transformer-based generators, namely, ``disease-specific generator" and ``disease-free generator" to generate the corresponding reports. To enhance the clinical evaluation of whether the generated reports correctly describe the diseases appearing in the input image, we propose diverse sensitivity (DS), a new metric that checks whether generated diseases match ground truth and measures the diversity of all generated diseases. Results show that the proposed two-stage generation framework and augmentation strategies improve DS by a considerable margin, indicating a notable reduction in the long-tailed problem associated with under-represented diseases.

FaFCNN: A General Disease Classification Framework Based on Feature Fusion Neural Networks

  • paper_url: http://arxiv.org/abs/2307.12518
  • repo_url: None
  • paper_authors: Menglin Kong, Shaojie Zhao, Juan Cheng, Xingquan Li, Ri Su, Muzhou Hou, Cong Cao
  • for: Addressing two fundamental problems in applying deep/machine learning to disease classification: the insufficient number and poor quality of training samples, and how to effectively fuse multi-source features into a robust classification model.
  • methods: Inspired by the process of human knowledge learning, the Feature-aware Fusion Correlation Neural Network (FaFCNN) introduces a feature-aware interaction module and a feature alignment module based on domain-adversarial learning.
  • results: Training with augmented features obtained by a pre-trained gradient boosting decision tree yields larger performance gains than random-forest-based methods; on a low-quality dataset with a large amount of missing data, FaFCNN achieves consistently optimal performance over competitive baselines, and extensive ablations confirm the robustness of the method and the effectiveness of each component.
    Abstract There are two fundamental problems in applying deep learning/machine learning methods to disease classification tasks, one is the insufficient number and poor quality of training samples; another one is how to effectively fuse multiple source features and thus train robust classification models. To address these problems, inspired by the process of human learning knowledge, we propose the Feature-aware Fusion Correlation Neural Network (FaFCNN), which introduces a feature-aware interaction module and a feature alignment module based on domain adversarial learning. This is a general framework for disease classification, and FaFCNN improves the way existing methods obtain sample correlation features. The experimental results show that training using augmented features obtained by pre-training gradient boosting decision tree yields more performance gains than random-forest based methods. On the low-quality dataset with a large amount of missing data in our setup, FaFCNN obtains a consistently optimal performance compared to competitive baselines. In addition, extensive experiments demonstrate the robustness of the proposed method and the effectiveness of each component of the model (accepted at IEEE SMC 2023).

Gradient-Based Word Substitution for Obstinate Adversarial Examples Generation in Language Models

  • paper_url: http://arxiv.org/abs/2307.12507
  • repo_url: None
  • paper_authors: Yimu Wang, Peng Shi, Hongyang Zhang
  • for: This paper aims to address the problem of generating obstinate adversarial examples in NLP by introducing a novel word substitution method named GradObstinate, which automatically generates obstinate adversarial examples without any constraints on the search space or the need for manual design principles.
  • methods: The proposed GradObstinate method uses a gradient-based approach to automatically generate obstinate adversarial examples. It does not rely on any manual design principles or constraints on the search space, making it more practical and applicable in real-world scenarios (a toy gradient-substitution sketch follows this entry).
  • results: The proposed GradObstinate method is evaluated on five representative NLP models and four benchmarks, and the results show that it generates more powerful obstinate adversarial examples with a higher attack success rate compared to antonym-based methods. Additionally, the obstinate substitutions found by GradObstinate are transferable to other models in black-box settings, including even GPT-3 and ChatGPT.
    Abstract In this paper, we study the problem of generating obstinate (over-stability) adversarial examples by word substitution in NLP, where input text is meaningfully changed but the model's prediction does not, even though it should. Previous word substitution approaches have predominantly focused on manually designed antonym-based strategies for generating obstinate adversarial examples, which hinders its application as these strategies can only find a subset of obstinate adversarial examples and require human efforts. To address this issue, in this paper, we introduce a novel word substitution method named GradObstinate, a gradient-based approach that automatically generates obstinate adversarial examples without any constraints on the search space or the need for manual design principles. To empirically evaluate the efficacy of GradObstinate, we conduct comprehensive experiments on five representative models (Electra, ALBERT, Roberta, DistillBERT, and CLIP) finetuned on four NLP benchmarks (SST-2, MRPC, SNLI, and SQuAD) and a language-grounding benchmark (MSCOCO). Extensive experiments show that our proposed GradObstinate generates more powerful obstinate adversarial examples, exhibiting a higher attack success rate compared to antonym-based methods. Furthermore, to show the transferability of obstinate word substitutions found by GradObstinate, we replace the words in four representative NLP benchmarks with their obstinate substitutions. Notably, obstinate substitutions exhibit a high success rate when transferred to other models in black-box settings, including even GPT-3 and ChatGPT. Examples of obstinate adversarial examples found by GradObstinate are available at https://huggingface.co/spaces/anonauthors/SecretLanguage.
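
Obstinate examples invert the usual adversarial goal: the input words change, but the prediction must not. A first-order, gradient-guided search for such substitutions can be sketched as follows; the model, shapes, and selection rule are toy placeholders, not the released GradObstinate code.

```python
import torch

def obstinate_substitute(model, embedding_matrix, input_ids, position, label):
    """Pick a replacement token at `position` that is predicted to change the
    loss the least, i.e. edits the text while keeping the prediction fixed."""
    embeds = embedding_matrix[input_ids].clone().detach().requires_grad_(True)
    loss = torch.nn.functional.cross_entropy(model(embeds), label)
    loss.backward()

    g = embeds.grad[position]                       # gradient at the edited slot
    # First-order estimate of the loss change for every candidate token:
    #   delta_loss(w) ~= g . (e_w - e_original)
    delta = (embedding_matrix - embedding_matrix[input_ids[position]]) @ g
    delta[input_ids[position]] = float("inf")       # must actually change the word
    return int(delta.abs().argmin())                # smallest predicted change

# Toy usage: a bag-of-embeddings classifier over a 100-word vocabulary.
vocab, dim = 100, 16
emb = torch.randn(vocab, dim)
clf = torch.nn.Linear(dim, 2)
model = lambda e: clf(e.mean(dim=0, keepdim=True))  # (seq, dim) -> (1, 2) logits
ids = torch.tensor([5, 17, 42])
print(obstinate_substitute(model, emb, ids, position=1, label=torch.tensor([0])))
```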

TF-ICON: Diffusion-Based Training-Free Cross-Domain Image Composition

  • paper_url: http://arxiv.org/abs/2307.12493
  • repo_url: https://github.com/Shilin-LU/TF-ICON
  • paper_authors: Shilin Lu, Yanzhu Liu, Adams Wai-Kin Kong
  • for: A training-free image composition framework that harnesses text-driven diffusion models for cross-domain image-guided composition, seamlessly integrating user-provided objects into a specific visual context.
  • methods: Off-the-shelf text-driven diffusion models are used without any further training, finetuning, or optimization; an "exceptional prompt" containing no information lets the model accurately invert real images into latent representations, forming the basis for compositing.
  • results: Equipping Stable Diffusion with the exceptional prompt outperforms state-of-the-art inversion methods on CelebA-HQ, COCO, and ImageNet, and TF-ICON surpasses prior baselines across versatile visual domains.
    Abstract Text-driven diffusion models have exhibited impressive generative capabilities, enabling various image editing tasks. In this paper, we propose TF-ICON, a novel Training-Free Image COmpositioN framework that harnesses the power of text-driven diffusion models for cross-domain image-guided composition. This task aims to seamlessly integrate user-provided objects into a specific visual context. Current diffusion-based methods often involve costly instance-based optimization or finetuning of pretrained models on customized datasets, which can potentially undermine their rich prior. In contrast, TF-ICON can leverage off-the-shelf diffusion models to perform cross-domain image-guided composition without requiring additional training, finetuning, or optimization. Moreover, we introduce the exceptional prompt, which contains no information, to facilitate text-driven diffusion models in accurately inverting real images into latent representations, forming the basis for compositing. Our experiments show that equipping Stable Diffusion with the exceptional prompt outperforms state-of-the-art inversion methods on various datasets (CelebA-HQ, COCO, and ImageNet), and that TF-ICON surpasses prior baselines in versatile visual domains. Code is available at https://github.com/Shilin-LU/TF-ICON

ChatGPT for Software Security: Exploring the Strengths and Limitations of ChatGPT in the Security Applications

  • paper_url: http://arxiv.org/abs/2307.12488
  • repo_url: None
  • paper_authors: Zhilong Wang, Lan Zhang, Peng Liu
  • for: The paper aims to evaluate ChatGPT’s capabilities in security-oriented program analysis, specifically from the perspectives of both attackers and security analysts.
  • methods: The paper uses a case study approach, presenting several security-oriented program analysis tasks and deliberately introducing challenges to assess ChatGPT’s responses.
  • results: The paper examines the quality of answers provided by ChatGPT and gains a clearer understanding of its strengths and limitations in the realm of security-oriented program analysis.
    Abstract ChatGPT, as a versatile large language model, has demonstrated remarkable potential in addressing inquiries across various domains. Its ability to analyze, comprehend, and synthesize information from both online sources and user inputs has garnered significant attention. Previous research has explored ChatGPT's competence in code generation and code reviews. In this paper, we delve into ChatGPT's capabilities in security-oriented program analysis, focusing on perspectives from both attackers and security analysts. We present a case study involving several security-oriented program analysis tasks while deliberately introducing challenges to assess ChatGPT's responses. Through an examination of the quality of answers provided by ChatGPT, we gain a clearer understanding of its strengths and limitations in the realm of security-oriented program analysis.

ProtoFL: Unsupervised Federated Learning via Prototypical Distillation

  • paper_url: http://arxiv.org/abs/2307.12450
  • repo_url: None
  • paper_authors: Hansol Kim, Youngjun Kwak, Minyoung Jung, Jinho Shin, Youngsung Kim, Changick Kim
  • for: Enhancing data privacy preservation and one-class classification performance, particularly for authentication systems.
  • methods: ProtoFL, unsupervised federated learning via prototypical representation distillation, enhances the representation power of the global model while reducing round communication costs; a local one-class classifier based on normalizing flows improves performance with limited data.
  • results: Extensive experiments on five widely used benchmarks (MNIST, CIFAR-10, CIFAR-100, ImageNet-30, and Keystroke-Dynamics) demonstrate superior performance over previous methods in the literature.
    Abstract Federated learning (FL) is a promising approach for enhancing data privacy preservation, particularly for authentication systems. However, limited round communications, scarce representation, and scalability pose significant challenges to its deployment, hindering its full potential. In this paper, we propose 'ProtoFL', Prototypical Representation Distillation based unsupervised Federated Learning to enhance the representation power of a global model and reduce round communication costs. Additionally, we introduce a local one-class classifier based on normalizing flows to improve performance with limited data. Our study represents the first investigation of using FL to improve one-class classification performance. We conduct extensive experiments on five widely used benchmarks, namely MNIST, CIFAR-10, CIFAR-100, ImageNet-30, and Keystroke-Dynamics, to demonstrate the superior performance of our proposed framework over previous methods in the literature.

SCRAPS: Speech Contrastive Representations of Acoustic and Phonetic Spaces

  • paper_url: http://arxiv.org/abs/2307.12445
  • repo_url: None
  • paper_authors: Ivan Vallés-Pérez, Grzegorz Beringer, Piotr Bilinski, Gary Cook, Roberto Barra-Chicote
  • for: Applying the CLIP idea to the speech domain, learning shared representations of the phonetic and acoustic spaces, which usually coexist.
  • methods: A CLIP-based model trained contrastively to align phonetic and acoustic representations (a minimal symmetric contrastive loss is sketched after this entry).
  • results: The model is sensitive to phonetic changes, with a 91% score drop when 20% of the phonemes are replaced at random, while remaining substantially robust to noise, with a 10% performance drop when the audio is mixed with 75% Gaussian noise. The resulting embeddings are useful for downstream applications such as intelligibility evaluation and leveraging rich pre-trained phonetic embeddings in speech generation.
    Abstract Numerous examples in the literature proved that deep learning models have the ability to work well with multimodal data. Recently, CLIP has enabled deep learning systems to learn shared latent spaces between images and text descriptions, with outstanding zero- or few-shot results in downstream tasks. In this paper we explore the same idea proposed by CLIP but applied to the speech domain, where the phonetic and acoustic spaces usually coexist. We train a CLIP-based model with the aim to learn shared representations of phonetic and acoustic spaces. The results show that the proposed model is sensible to phonetic changes, with a 91% of score drops when replacing 20% of the phonemes at random, while providing substantial robustness against different kinds of noise, with a 10% performance drop when mixing the audio with 75% of Gaussian noise. We also provide empirical evidence showing that the resulting embeddings are useful for a variety of downstream applications, such as intelligibility evaluation and the ability to leverage rich pre-trained phonetic embeddings in speech generation task. Finally, we discuss potential applications with interesting implications for the speech generation and recognition fields.
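
The CLIP-style objective pairs each utterance's acoustic embedding with its phoneme-sequence embedding and applies a symmetric cross-entropy over the similarity matrix. A minimal PyTorch version follows; the stand-in linear encoders and the temperature are assumed values, not the paper's architecture.

```python
import torch
import torch.nn.functional as F

def clip_loss(acoustic_emb, phonetic_emb, temperature=0.07):
    """Symmetric contrastive (InfoNCE) loss over matched audio/phoneme pairs."""
    a = F.normalize(acoustic_emb, dim=-1)
    p = F.normalize(phonetic_emb, dim=-1)
    logits = a @ p.t() / temperature       # (B, B) cross-modal similarity matrix
    targets = torch.arange(a.size(0))      # the diagonal holds the true pairs
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# Toy usage: stand-in encoders mapping pooled features into a shared 128-d space.
B = 8
audio_encoder = torch.nn.Linear(80, 128)   # e.g. pooled mel features (assumed)
phone_encoder = torch.nn.Linear(50, 128)   # e.g. pooled phoneme embeddings (assumed)
loss = clip_loss(audio_encoder(torch.randn(B, 80)),
                 phone_encoder(torch.randn(B, 50)))
loss.backward()
```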

AMaizeD: An End to End Pipeline for Automatic Maize Disease Detection

  • paper_url: http://arxiv.org/abs/2308.03766
  • repo_url: None
  • paper_authors: Anish Mall, Sanchit Kabra, Ankur Lhila, Pawan Ajmera
  • for: Automating the early detection of diseases in maize crops using multispectral imagery obtained from drones.
  • methods: Combines convolutional neural networks (CNNs) as feature extractors with segmentation techniques to identify maize plants and their associated diseases.
  • results: Detects a range of maize diseases, including powdery mildew, anthracnose, and leaf blight, achieving state-of-the-art performance on the custom hand-collected dataset.
    Abstract This research paper presents AMaizeD: An End to End Pipeline for Automatic Maize Disease Detection, an automated framework for early detection of diseases in maize crops using multispectral imagery obtained from drones. A custom hand-collected dataset focusing specifically on maize crops was meticulously gathered by expert researchers and agronomists. The dataset encompasses a diverse range of maize varieties, cultivation practices, and environmental conditions, capturing various stages of maize growth and disease progression. By leveraging multispectral imagery, the framework benefits from improved spectral resolution and increased sensitivity to subtle changes in plant health. The proposed framework employs a combination of convolutional neural networks (CNNs) as feature extractors and segmentation techniques to identify both the maize plants and their associated diseases. Experimental results demonstrate the effectiveness of the framework in detecting a range of maize diseases, including powdery mildew, anthracnose, and leaf blight. The framework achieves state-of-the-art performance on the custom hand-collected dataset and contributes to the field of automated disease detection in agriculture, offering a practical solution for early identification of diseases in maize crops advanced machine learning techniques and deep learning architectures.

Framing Relevance for Safety-Critical Autonomous Systems

  • paper_url: http://arxiv.org/abs/2307.14355
  • repo_url: None
  • paper_authors: Astrid Rakow
  • for: Determining what information a highly autonomous system needs for its current mission, so that it can build an appropriate world view and accomplish its mission goals.
  • methods: A formal approach to decide what is relevant for a safety-critical autonomous system, i.e., what information suffices to build an appropriate world view despite the overwhelming flood of information from a variety of sources.
  • results: The formal framing makes it possible to identify the information an autonomous system actually needs, supporting effective decision-making in information-rich environments.
    Abstract We are in the process of building complex highly autonomous systems that have build-in beliefs, perceive their environment and exchange information. These systems construct their respective world view and based on it they plan their future manoeuvres, i.e., they choose their actions in order to establish their goals based on their prediction of the possible futures. Usually these systems face an overwhelming flood of information provided by a variety of sources where by far not everything is relevant. The goal of our work is to develop a formal approach to determine what is relevant for a safety critical autonomous system at its current mission, i.e., what information suffices to build an appropriate world view to accomplish its mission goals.

Implementing Smart Contracts: The case of NFT-rental with pay-per-like

  • paper_url: http://arxiv.org/abs/2308.02424
  • repo_url: https://github.com/asopi/rental-project
  • paper_authors: Alfred Sopi, Johannes Schneider, Jan vom Brocke
  • for: Addressing the challenges of lending and renting non-fungible tokens (NFTs) for marketing purposes, such as the risk of items not being returned and the difficulty of anticipating the impact of artworks.
  • methods: An NFT rental solution based on a pay-per-like pricing model, implemented with blockchain technology and smart contracts on the Ethereum chain.
  • results: Blockchain solutions enjoy many advantages also reported for other applications, but dark sides are observed as well: large blockchain fees can be unfair to niche artists and potentially hamper cultural diversity, and a trust-cost tradeoff arises when handling fraud caused by manipulation from parties outside the blockchain.
    Abstract Non-fungible tokens(NFTs) are on the rise. They can represent artworks exhibited for marketing purposes on webpages of companies or online stores -- analogously to physical artworks. Lending of NFTs is an attractive form of passive income for owners but comes with risks (e.g., items are not returned) and costs for escrow agents. Similarly, renters have difficulties in anticipating the impact of artworks, e.g., how spectators of NFTs perceive them. To address these challenges, we introduce an NFT rental solution based on a pay-per-like pricing model using blockchain technology, i.e., smart contracts based on the Ethereum chain. We find that blockchain solutions enjoy many advantages also reported for other applications, but interestingly, we also observe dark sides of (large) blockchain fees. Blockchain solutions appear unfair to niche artists and potentially hamper cultural diversity. Furthermore, a trust-cost tradeoff arises to handle fraud caused by manipulation from parties outside the blockchain. All code for the solution is publicly available at: https://github.com/asopi/rental-project

Validation of a Zero-Shot Learning Natural Language Processing Tool for Data Abstraction from Unstructured Healthcare Data

  • paper_url: http://arxiv.org/abs/2308.00107
  • repo_url: https://github.com/kaufmannb/PDF-Extractor
  • paper_authors: Basil Kaufmann, Dallin Busby, Chandan Krushna Das, Neeraja Tillu, Mani Menon, Ashutosh K. Tewari, Michael A. Gorin
  • for: Describing the development and validation of a zero-shot learning natural language processing (NLP) tool for abstracting data from unstructured text contained within PDF documents, such as those found within electronic health records.
  • methods: The data abstraction tool, based on OpenAI's GPT-3.5 model, was compared to three physician human abstractors on time to task completion and accuracy for 14 unique variables across 199 de-identified radical prostatectomy pathology reports, which were processed in both vectorized and scanned formats to establish the impact of optical character recognition (a hypothetical usage sketch follows this entry).
  • results: The software tool required a mean of 12.8 s per vectorized report and 15.8 s per scanned report, against a mean of 101 s for the human abstractors, with an overall accuracy of 94.2% on the vectorized reports and 88.7% on the scanned reports, non-inferior to 2 out of 3 human abstractors.
    Abstract Objectives: To describe the development and validation of a zero-shot learning natural language processing (NLP) tool for abstracting data from unstructured text contained within PDF documents, such as those found within electronic health records. Materials and Methods: A data abstraction tool based on the GPT-3.5 model from OpenAI was developed and compared to three physician human abstractors in terms of time to task completion and accuracy for abstracting data on 14 unique variables from a set of 199 de-identified radical prostatectomy pathology reports. The reports were processed by the software tool in vectorized and scanned formats to establish the impact of optical character recognition on data abstraction. The tool was assessed for superiority for data abstraction speed and non-inferiority for accuracy. Results: The human abstractors required a mean of 101s per report for data abstraction, with times varying from 15 to 284 s. In comparison, the software tool required a mean of 12.8 s to process the vectorized reports and a mean of 15.8 to process the scanned reports (P < 0.001). The overall accuracies of the three human abstractors were 94.7%, 97.8%, and 96.4% for the combined set of 2786 datapoints. The software tool had an overall accuracy of 94.2% for the vectorized reports, proving to be non-inferior to the human abstractors at a margin of -10% ($\alpha$=0.025). The tool had a slightly lower accuracy of 88.7% using the scanned reports, proving to be non-inferiority to 2 out of 3 human abstractors. Conclusion: The developed zero-shot learning NLP tool affords researchers comparable levels of accuracy to that of human abstractors, with significant time savings benefits. Because of the lack of need for task-specific model training, the developed tool is highly generalizable and can be used for a wide variety of data abstraction tasks, even outside the field of medicine.
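
To make the zero-shot setup concrete, here is a minimal sketch of structured extraction with an OpenAI chat model in the spirit of the tool described above; the prompt wording, the abridged variable list, and the model name are illustrative assumptions, not the authors' exact pipeline.

```python
# A hedged sketch of zero-shot data abstraction with an OpenAI chat model.
# VARIABLES and the prompt are illustrative, not the paper's exact setup.

import json
from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment

VARIABLES = ["Gleason score", "pathologic T stage", "margin status"]  # 3 of 14

def abstract_report(report_text: str) -> dict:
    """Ask the model to emit one JSON object with a key per variable."""
    prompt = (
        "Extract the following variables from this radical prostatectomy "
        f"pathology report: {', '.join(VARIABLES)}. "
        "Answer with a single JSON object; use null when a value is absent.\n\n"
        + report_text
    )
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # deterministic extraction
    )
    # Assumes the model returns bare JSON; production code would validate.
    return json.loads(resp.choices[0].message.content)
```

Because no task-specific training is involved, retargeting this function to a different document type is a matter of editing VARIABLES and the prompt text, which is exactly the generalizability the conclusion highlights.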

Uncertainty-aware Grounded Action Transformation towards Sim-to-Real Transfer for Traffic Signal Control

  • paper_url: http://arxiv.org/abs/2307.12388
  • repo_url: None
  • paper_authors: Longchao Da, Hao Mei, Romir Sharma, Hua Wei
  • for: To improve the performance of RL-based traffic signal control when deployed on real-world roads
  • methods: A simulation-to-real-world (sim-to-real) transfer approach that dynamically transforms actions in the simulation, using uncertainty, to transfer a policy learned in simulation to the real environment while mitigating the domain gap in transition dynamics
  • results: Evaluated on a simulated traffic environment, UGAT significantly improves the performance of the transferred RL policy in the real-world setting
    Abstract Traffic signal control (TSC) is a complex and important task that affects the daily lives of millions of people. Reinforcement Learning (RL) has shown promising results in optimizing traffic signal control, but current RL-based TSC methods are mainly trained in simulation and suffer from the performance gap between simulation and the real world. In this paper, we propose a simulation-to-real-world (sim-to-real) transfer approach called UGAT, which transfers a learned policy trained from a simulated environment to a real-world environment by dynamically transforming actions in the simulation with uncertainty to mitigate the domain gap of transition dynamics. We evaluate our method on a simulated traffic environment and show that it significantly improves the performance of the transferred RL policy in the real world.
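
The idea of "dynamically transforming actions with uncertainty" can be sketched as an uncertainty-gated grounded action transformation: a forward model of real-world dynamics predicts the next state, an inverse simulator model recovers the action that reproduces it, and the transformation is skipped when the models are too uncertain. Everything below (function names, the ensemble-based uncertainty estimate, the threshold tau) is an assumption for illustration, not the paper's exact algorithm.

```python
# A hedged sketch of uncertainty-aware grounded action transformation (GAT).
# f_real_ensemble: list of learned forward models of real-world dynamics.
# g_sim: learned inverse model of simulator dynamics.

import numpy as np

def grounded_action(state, action, f_real_ensemble, g_sim, tau=0.1):
    """Transform `action` so the simulator mimics real-world transitions."""
    # Disagreement across the ensemble approximates predictive uncertainty.
    preds = np.stack([f(state, action) for f in f_real_ensemble])
    next_state_hat = preds.mean(axis=0)
    uncertainty = preds.std(axis=0).mean()

    if uncertainty > tau:
        return action  # too uncertain: fall back to the untransformed action
    # Otherwise, return the simulator action that takes the simulator from
    # `state` to the predicted real-world next state.
    return g_sim(state, next_state_hat)
```

During training, each policy action would pass through grounded_action before being executed in the simulator, so the policy effectively experiences real-world-like transition dynamics.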

In-Context Learning in Large Language Models Learns Label Relationships but Is Not Conventional Learning

  • paper_url: http://arxiv.org/abs/2307.12375
  • repo_url: None
  • paper_authors: Jannik Kossen, Tom Rainforth, Yarin Gal
  • for: To investigate why including input-label examples in the context improves the downstream performance of large language models (LLMs), and how to better understand and align LLM behavior
  • methods: Controlled experiments probing how LLMs treat the relationship between in-context inputs and labels, and analysis of how label relationships learned during pre-training interact with label information provided in-context
  • results: LLMs usually incorporate information from in-context labels, but pre-training and in-context label relationships are treated differently, and the model does not weigh all in-context information equally; these findings help in understanding and aligning LLM behavior
    Abstract The performance of Large Language Models (LLMs) on downstream tasks often improves significantly when including examples of the input-label relationship in the context. However, there is currently no consensus about how this in-context learning (ICL) ability of LLMs works: for example, while Xie et al. (2021) liken ICL to a general-purpose learning algorithm, Min et al. (2022b) argue ICL does not even learn label relationships from in-context examples. In this paper, we study (1) how labels of in-context examples affect predictions, (2) how label relationships learned during pre-training interact with input-label examples provided in-context, and (3) how ICL aggregates label information across in-context examples. Our findings suggest LLMs usually incorporate information from in-context labels, but that pre-training and in-context label relationships are treated differently, and that the model does not consider all in-context information equally. Our results give insights into understanding and aligning LLM behavior.
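
One way to test whether a model "incorporates information from in-context labels" is a label-randomization probe of the kind this line of work relies on: if accuracy drops when the in-context labels are shuffled, the model is using the input-label relationship rather than only its pre-trained label priors. The sketch below assumes a black-box query_llm completion function and is illustrative rather than the authors' exact protocol.

```python
# A hedged sketch of a label-randomization probe for in-context learning.
# `query_llm(prompt) -> str` is an assumed black-box completion function.

import random

def build_prompt(examples, query):
    shots = "\n".join(f"Input: {x}\nLabel: {y}" for x, y in examples)
    return f"{shots}\nInput: {query}\nLabel:"

def probe_label_use(train, test, query_llm, seed=0):
    rng = random.Random(seed)
    shuffled_labels = [y for _, y in train]
    rng.shuffle(shuffled_labels)
    shuffled = [(x, y) for (x, _), y in zip(train, shuffled_labels)]

    def accuracy(examples):
        hits = sum(
            query_llm(build_prompt(examples, x)).strip() == y for x, y in test
        )
        return hits / len(test)

    return {"true_labels": accuracy(train), "shuffled_labels": accuracy(shuffled)}
```

A large gap between the two accuracies indicates genuine in-context label learning; a small gap suggests the model leans on label relationships memorized during pre-training.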

Early Prediction of Alzheimer's Disease Leveraging Symptom Occurrences from Longitudinal Electronic Health Records of US Military Veterans

  • paper_url: http://arxiv.org/abs/2307.12369
  • repo_url: None
  • paper_authors: Rumeng Li, Xun Wang, Dan Berlowitz, Brian Silver, Wen Hu, Heather Keating, Raelene Goodwin, Weisong Liu, Honghuang Lin, Hong Yu
  • for: To analyze longitudinal electronic health records (EHRs) of patients with Alzheimer's disease (AD) using machine learning, in order to identify signs and symptoms that predict AD onset earlier
  • methods: A case-control design using longitudinal EHR data from the U.S. Department of Veterans Affairs Veterans Health Administration (VHA) from 2004 to 2021; a panel of AD-related keywords and their occurrences over time in a patient's records served as predictors, with subgroup analyses by age, sex, and race/ethnicity
  • results: For cases, the number of AD-related keywords per year rose rapidly as diagnosis approached, from about 10 to over 40, while remaining flat for controls; the best model achieved high discrimination (ROCAUC 0.997) and good calibration, with consistent performance across subgroups except for patients younger than 65
    Abstract Early prediction of Alzheimer's disease (AD) is crucial for timely intervention and treatment. This study aims to use machine learning approaches to analyze longitudinal electronic health records (EHRs) of patients with AD and identify signs and symptoms that can predict AD onset earlier. We used a case-control design with longitudinal EHRs from the U.S. Department of Veterans Affairs Veterans Health Administration (VHA) from 2004 to 2021. Cases were VHA patients with AD diagnosed after 1/1/2016 based on ICD-10-CM codes, matched 1:9 with controls by age, sex and clinical utilization with replacement. We used a panel of AD-related keywords and their occurrences over time in a patient's longitudinal EHRs as predictors for AD prediction with four machine learning models. We performed subgroup analyses by age, sex, and race/ethnicity, and validated the model in a hold-out and "unseen" VHA stations group. Model discrimination, calibration, and other relevant metrics were reported for predictions up to ten years before ICD-based diagnosis. The study population included 16,701 cases and 39,097 matched controls. The average number of AD-related keywords (e.g., "concentration", "speaking") per year increased rapidly for cases as diagnosis approached, from around 10 to over 40, while remaining flat at 10 for controls. The best model achieved high discriminative accuracy (ROCAUC 0.997) for predictions using data from at least ten years before ICD-based diagnoses. The model was well-calibrated (Hosmer-Lemeshow goodness-of-fit p-value = 0.99) and consistent across subgroups of age, sex and race/ethnicity, except for patients younger than 65 (ROCAUC 0.746). Machine learning models using AD-related keywords identified from EHR notes can predict future AD diagnoses, suggesting its potential use for identifying AD risk using EHR notes, offering an affordable way for early screening on large population.
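
A minimal sketch of the keyword-frequency feature idea follows, assuming an abridged keyword panel, a 10-year lookback window, and a logistic-regression stand-in for the paper's four models; in a real study the AUC would of course be computed on held-out patients.

```python
# A hedged sketch: count AD-related keywords per year in each patient's notes
# and fit a classifier. Keyword list, window, and model are illustrative.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

KEYWORDS = ["concentration", "speaking", "memory", "confusion"]  # abridged panel

def yearly_counts(notes_by_year: dict[int, str], years: range) -> list[int]:
    """One feature per year: total keyword occurrences in that year's notes."""
    return [
        sum(notes_by_year.get(y, "").lower().count(k) for k in KEYWORDS)
        for y in years
    ]

def fit_and_score(patients, labels, index_years):
    # 10-year lookback ending the year before each patient's index date.
    X = np.array([
        yearly_counts(p, range(iy - 10, iy)) for p, iy in zip(patients, index_years)
    ])
    y = np.array(labels)
    model = LogisticRegression(max_iter=1000).fit(X, y)
    # In-sample AUC only, for illustration; use a held-out split in practice.
    return roc_auc_score(y, model.predict_proba(X)[:, 1])
```

The per-year feature layout mirrors the abstract's central observation: the keyword-count trajectory, not just the total, separates cases from controls as diagnosis approaches.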

Deployment of Leader-Follower Automated Vehicle Systems for Smart Work Zone Applications with a Queuing-based Traffic Assignment Approach

  • paper_url: http://arxiv.org/abs/2308.03764
  • repo_url: None
  • paper_authors: Qing Tang, Xianbiao Hu
  • for: To optimize the routing of the Autonomous Truck Mounted Attenuator (ATMA) vehicle system so as to minimize the system cost incurred during transportation infrastructure maintenance
  • methods: Connected and automated vehicle capabilities combined with a queuing-based traffic assignment approach that accounts for the speed difference between ATMA vehicles and general traffic
  • results: Simulating different route choices shows that the queuing-based traffic assignment approach can identify routes that reduce the system cost imposed by the slow-moving ATMA operation
    Abstract The emerging technology of the Autonomous Truck Mounted Attenuator (ATMA), a leader-follower style vehicle system, utilizes connected and automated vehicle capabilities to enhance safety during transportation infrastructure maintenance in work zones. However, the speed difference between ATMA vehicles and general vehicles creates a moving bottleneck that reduces capacity and increases queue length, resulting in additional delays. The different routes taken by ATMA cause diverse patterns of time-varying capacity drops, which may affect the user equilibrium traffic assignment and lead to different system costs. This manuscript focuses on optimizing the routing for ATMA vehicles in a network to minimize the system cost associated with the slow-moving operation. To achieve this, a queuing-based traffic assignment approach is proposed to identify the system cost caused by the ATMA system. A queuing-based time-dependent (QBTD) travel time function, considering capacity drop, is introduced and applied in the static user equilibrium traffic assignment problem, with a result of adding dynamic characteristics. Subsequently, we formulate the queuing-based traffic assignment problem and solve it using a modified path-based algorithm. The methodology is validated using a small-size and a large-size network and compared with two benchmark models to analyze the benefit of capacity drop modeling and QBTD travel time function. Furthermore, the approach is applied to quantify the impact of different routes on the traffic system and identify an optimal route for ATMA vehicles performing maintenance work. Finally, sensitivity analysis is conducted to explore how the impact changes with variations in traffic demand and capacity reduction.
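
A sketch of what a queuing-based time-dependent (QBTD) travel time with a capacity drop can look like on a single link, using a deterministic point-queue model: while the slow ATMA convoy occupies the link, capacity drops, a queue builds at the rate by which inflow exceeds the reduced capacity, and the queue discharges at full capacity afterwards. The parameter values, the fixed bottleneck window, and the functional form are illustrative assumptions, not the paper's exact formulation.

```python
# A hedged sketch of a QBTD travel time on one link with a moving-bottleneck
# capacity drop, modeled as a deterministic point queue. Values illustrative.

def qbtd_travel_time(t, free_flow=60.0, inflow=1800.0, capacity=2000.0,
                     drop_frac=0.5, window=(300.0, 1500.0)):
    """Travel time (s) for a vehicle entering the link at time t (s), with a
    capacity drop of drop_frac during `window` caused by the ATMA passage."""
    start, end = window
    reduced = capacity * (1.0 - drop_frac)      # veh/h during the passage
    excess = max(inflow - reduced, 0.0) / 3600.0  # queue growth, veh/s

    if t <= start:
        queue = 0.0
    elif t <= end:
        queue = excess * (t - start)  # queue while the bottleneck is active
    else:
        # Queue discharges at (capacity - inflow) once the bottleneck clears.
        discharge = max(capacity - inflow, 1e-9) / 3600.0
        queue = max(excess * (end - start) - discharge * (t - end), 0.0)

    # Point-queue delay: wait for the queue ahead to discharge.
    service = reduced if start < t <= end else capacity
    return free_flow + queue / (service / 3600.0)

# Example: entering mid-window vs. before the ATMA arrives.
print(qbtd_travel_time(900.0), qbtd_travel_time(0.0))
```

Plugging a time-dependent link cost of this kind into an otherwise static user-equilibrium assignment is what gives the formulation described above its dynamic character: the route that is cheapest depends on when and where the ATMA-induced capacity drop occurs.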