cs.AI - 2023-11-07

ToP-ToM: Trust-aware Robot Policy with Theory of Mind

  • paper_url: http://arxiv.org/abs/2311.04397
  • repo_url: None
  • paper_authors: Chuang Yu, Baris Serhan, Angelo Cangelosi
  • for: This paper investigates how a robot collaborating with a human in a multiagent system can use a model of human mental states (Theory of Mind) to build and maintain trust.
  • methods: The paper adopts a robot Theory of Mind model to infer the human's trust beliefs from observed behavior, and designs a dynamic trust-aware reward function based on those beliefs to guide robot policy learning.
  • results: Experimental results show that the ToM-based robot policy effectively maintains human-robot trust and improves collaboration efficiency in multiagent settings.
    Abstract Theory of Mind (ToM) is a fundamental cognitive architecture that endows humans with the ability to attribute mental states to others. Humans infer the desires, beliefs, and intentions of others by observing their behavior and, in turn, adjust their actions to facilitate better interpersonal communication and team collaboration. In this paper, we investigated trust-aware robot policy with the theory of mind in a multiagent setting where a human collaborates with a robot against another human opponent. We show that by only focusing on team performance, the robot may resort to the reverse psychology trick, which poses a significant threat to trust maintenance. The human's trust in the robot will collapse when they discover deceptive behavior by the robot. To mitigate this problem, we adopt the robot theory of mind model to infer the human's trust beliefs, including true belief and false belief (an essential element of ToM). We designed a dynamic trust-aware reward function based on different trust beliefs to guide the robot policy learning, which aims to balance between avoiding human trust collapse due to robot reverse psychology and achieving team performance. The experimental results demonstrate the importance of the ToM-based robot policy for human-robot trust and the effectiveness of our ToM-based robot policy in multiagent interaction settings.
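
The dynamic trust-aware reward function is only described at a high level in the abstract. Below is a minimal sketch of one plausible form; the `team_reward` and `trust_belief` inputs, the deception flag, and the penalty weight `lam` are all illustrative assumptions, not the paper's actual design.

```python
def trust_aware_reward(team_reward: float, trust_belief: str,
                       robot_deceived: bool, lam: float = 1.0) -> float:
    """Hypothetical trust-aware reward: discount team performance by a
    deception penalty so the policy avoids reverse-psychology tricks that
    would collapse human trust."""
    penalty = lam * float(robot_deceived)
    if trust_belief == "true_belief":
        # Deception is most damaging when the human can see through it.
        penalty *= 2.0
    return team_reward - penalty

# Example: a deceptive action under a true belief is heavily penalized.
print(trust_aware_reward(1.0, "true_belief", robot_deceived=True))  # -1.0
```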

Harnessing Manycore Processors with Distributed Memory for Accelerated Training of Sparse and Recurrent Models

  • paper_url: http://arxiv.org/abs/2311.04386
  • repo_url: None
  • paper_authors: Jan Finkbeiner, Thomas Gmeinder, Mark Pupilli, Alexander Titterton, Emre Neftci
  • For:
    + The paper explores a massively parallel multiple instruction multiple data (MIMD) architecture with distributed local memory for training sparse and recurrent neural networks.
    + The authors aim to overcome the limitations of current AI training infrastructure, which is dominated by single instruction multiple data (SIMD) and systolic array architectures such as GPUs and TPUs.
    + The work aims to pave the way towards more efficient and sustainable AI training methods.
  • Methods:
    + The authors implement a training routine based on backpropagation through time (BPTT) for the brain-inspired class of Spiking Neural Networks (SNNs), which feature binary sparse activations.
    + Training runs on a MIMD processor, the Intelligence Processing Unit (IPU), and its performance is compared with A100 GPUs.
  • Results:
    + The authors observe 5-10x throughput gains compared to A100 GPUs and up to 38x gains for higher levels of activation sparsity, without a significant slowdown in training convergence or reduction in final model performance.
    + Both single- and multi-IPU configurations show highly promising trends when scaling up to larger model sizes.
    Abstract Current AI training infrastructure is dominated by single instruction multiple data (SIMD) and systolic array architectures, such as Graphics Processing Units (GPUs) and Tensor Processing Units (TPUs), that excel at accelerating parallel workloads and dense vector matrix multiplications. Potentially more efficient neural network models utilizing sparsity and recurrence cannot leverage the full power of SIMD processors and are thus at a severe disadvantage compared to today's prominent parallel architectures like Transformers and CNNs, thereby hindering the path towards more sustainable AI. To overcome this limitation, we explore sparse and recurrent model training on a massively parallel multiple instruction multiple data (MIMD) architecture with distributed local memory. We implement a training routine based on backpropagation through time (BPTT) for the brain-inspired class of Spiking Neural Networks (SNNs) that feature binary sparse activations. We observe a massive advantage in using sparse activation tensors with a MIMD processor, the Intelligence Processing Unit (IPU), compared to GPUs. On training workloads, our results demonstrate 5-10x throughput gains compared to A100 GPUs and up to 38x gains for higher levels of activation sparsity, without a significant slowdown in training convergence or reduction in final model performance. Furthermore, our results show highly promising trends for both single and multi IPU configurations as we scale up to larger model sizes. Our work paves the way towards more efficient, non-standard models via AI training hardware beyond GPUs, and competitive large scale SNN models.
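
The abstract names BPTT training of SNNs with binary sparse activations but gives no implementation details. Below is a minimal, hardware-agnostic sketch of surrogate-gradient BPTT for one leaky integrate-and-fire layer; the fast-sigmoid surrogate, threshold, decay `beta`, and all shapes are assumptions for illustration, not the paper's IPU implementation.

```python
import torch

class SpikeFn(torch.autograd.Function):
    """Heaviside spike with a surrogate gradient, the standard trick that
    lets BPTT flow through binary activations."""
    @staticmethod
    def forward(ctx, v):
        ctx.save_for_backward(v)
        return (v > 0).float()

    @staticmethod
    def backward(ctx, grad_out):
        (v,) = ctx.saved_tensors
        return grad_out / (1.0 + 10.0 * v.abs()) ** 2  # fast-sigmoid surrogate

def lif_forward(x, w, beta=0.9):
    """Unroll a leaky integrate-and-fire layer over time; x: (T, batch, in)."""
    v = torch.zeros(x.shape[1], w.shape[1])
    spikes = []
    for t in range(x.shape[0]):
        v = beta * v + x[t] @ w
        s = SpikeFn.apply(v - 1.0)   # fire when membrane exceeds threshold 1
        v = v - s                    # soft reset after a spike
        spikes.append(s)
    return torch.stack(spikes)

# Toy BPTT step on random data (shapes are illustrative).
T, B, N_in, N_out = 20, 4, 8, 3
w = torch.randn(N_in, N_out, requires_grad=True)
out = lif_forward(torch.randn(T, B, N_in), w)
loss = out.sum(0).pow(2).mean()
loss.backward()  # gradients flow back through time via the surrogate
```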

Enhancing Malware Detection by Integrating Machine Learning with Cuckoo Sandbox

  • paper_url: http://arxiv.org/abs/2311.04372
  • repo_url: None
  • paper_authors: Amaal F. Alshmarni, Mohammed A. Alliheedi
  • for: This study classifies and identifies malware from a dataset of API call sequences using deep learning and traditional machine learning algorithms.
  • methods: The deep learning methods include CNNs (Convolutional Neural Networks) and RNNs (Recurrent Neural Networks); the traditional machine learning methods include SVM (Support Vector Machine), RF (Random Forest), KNN (K-Nearest Neighbors), XGB (Extreme Gradient Boosting), and GBC (Gradient Boosting Classifier), all trained on the same dataset.
  • results: Both the deep learning and traditional machine learning algorithms achieve remarkably high accuracy, up to 99% in certain cases.
    Abstract In the modern era, malware is experiencing a significant increase in both its variety and quantity, aligning with the widespread adoption of the digital world. This surge in malware has emerged as a critical challenge in the realm of cybersecurity, prompting numerous research endeavors and contributions to address the issue. Machine learning algorithms have been leveraged for malware detection due to their ability to uncover concealed patterns within vast datasets. However, deep learning algorithms, characterized by their multi-layered structure, surpass the limitations of traditional machine learning approaches. By employing deep learning techniques such as CNN (Convolutional Neural Network) and RNN (Recurrent Neural Network), this study aims to classify and identify malware extracted from a dataset containing API call sequences. The performance of these algorithms is compared with that of conventional machine learning methods, including SVM (Support Vector Machine), RF (Random Forest), KNN (K-Nearest Neighbors), XGB (Extreme Gradient Boosting), and GBC (Gradient Boosting Classifier), all using the same dataset. The outcomes of this research demonstrate that both deep learning and machine learning algorithms achieve remarkably high levels of accuracy, reaching up to 99% in certain cases.
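
As an illustration of the classical-baseline side of such a comparison, here is a hedged scikit-learn sketch over made-up API call sequences; the API names, data, and n-gram featurization are invented stand-ins, not the study's dataset or features.

```python
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Hypothetical data: each sample is a space-joined API call sequence.
sequences = ["NtOpenFile NtReadFile NtClose", "RegOpenKey RegSetValue NtClose"] * 50
labels = [0, 1] * 50

models = {
    "SVM": SVC(),
    "RF": RandomForestClassifier(),
    "KNN": KNeighborsClassifier(),
    "GBC": GradientBoostingClassifier(),
    # xgboost.XGBClassifier could be added analogously for the XGB baseline.
}
for name, clf in models.items():
    # n-grams of API calls as features, a common baseline representation.
    pipe = make_pipeline(CountVectorizer(ngram_range=(1, 2)), clf)
    print(name, cross_val_score(pipe, sequences, labels, cv=5).mean())
```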

Evaluating the Effectiveness of Retrieval-Augmented Large Language Models in Scientific Document Reasoning

  • paper_url: http://arxiv.org/abs/2311.04348
  • repo_url: None
  • paper_authors: Sai Munikoti, Anurag Acharya, Sridevi Wagle, Sameera Horawalavithana
  • for: To improve the factual reliability of Large Language Models (LLMs) and address the problem of hallucinated information.
  • methods: Retrieval-augmented LLMs retrieve relevant information from external data sources; the authors tune multiple such model variants with science-focused instructions and evaluate them on a scientific document reasoning benchmark for the usefulness of the retrieved passages.
  • results: The models justify predictions in science tasks with fabricated evidence, and using a scientific corpus as pretraining data does not alleviate the risk of evidence fabrication.
    Abstract Despite the dramatic progress in Large Language Model (LLM) development, LLMs often provide seemingly plausible but not factual information, often referred to as hallucinations. Retrieval-augmented LLMs provide a non-parametric approach to solve these issues by retrieving relevant information from external data sources and augmenting the training process. These models help to trace evidence from an externally provided knowledge base, allowing the model predictions to be better interpreted and verified. In this work, we critically evaluate these models in their ability to perform in scientific document reasoning tasks. To this end, we tuned multiple such model variants with science-focused instructions and evaluated them on a scientific document reasoning benchmark for the usefulness of the retrieved document passages. Our findings suggest that models justify predictions in science tasks with fabricated evidence, and leveraging a scientific corpus as pretraining data does not alleviate the risk of evidence fabrication.
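
The paper evaluates retrieval-augmented models rather than proposing one; for readers unfamiliar with the mechanism, a generic sketch of retrieval augmentation (a TF-IDF retriever plus prompt assembly) is below. The corpus and prompt wording are invented and the real systems use learned retrievers.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

corpus = [
    "Transformers use self-attention over token sequences.",
    "Retrieval augmentation conditions generation on external passages.",
    "Spiking neural networks emit binary activations.",
]

def retrieve(query: str, k: int = 2) -> list[str]:
    """Rank corpus passages by TF-IDF cosine similarity to the query."""
    vec = TfidfVectorizer().fit(corpus + [query])
    sims = cosine_similarity(vec.transform([query]), vec.transform(corpus))[0]
    return [corpus[i] for i in sims.argsort()[::-1][:k]]

def build_prompt(question: str) -> str:
    """Prepend retrieved evidence so the model can ground its answer."""
    passages = "\n".join(f"- {p}" for p in retrieve(question))
    return f"Use only the evidence below to answer.\n{passages}\nQ: {question}\nA:"

print(build_prompt("How does retrieval augmentation reduce hallucination?"))
```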

A Taxonomy of Rater Disagreements: Surveying Challenges & Opportunities from the Perspective of Annotating Online Toxicity

  • paper_url: http://arxiv.org/abs/2311.04345
  • repo_url: None
  • paper_authors: Wenbo Zhang, Hangzhi Guo, Ian D Kivlichan, Vinodkumar Prabhakaran, Davis Yadav, Amulya Yadav
  • for: This survey examines the causes of annotator disagreement in labeling online toxicity, and how such disagreements can be incorporated into the machine learning development pipeline.
  • methods: Analyzing a broad set of literature on the reasons behind rater disagreements, the authors propose a detailed taxonomy and explain and discuss each cause.
  • results: The survey contributes a comprehensive taxonomy of the causes of rater disagreement in online toxicity annotation, summarizes potential solutions targeting each cause, and identifies several open issues to promote future research.
    Abstract Toxicity is an increasingly common and severe issue in online spaces. Consequently, a rich line of machine learning research over the past decade has focused on computationally detecting and mitigating online toxicity. These efforts crucially rely on human-annotated datasets that identify toxic content of various kinds in social media texts. However, such annotations historically yield low inter-rater agreement, which was often dealt with by taking the majority vote or other such approaches to arrive at a single ground truth label. Recent research has pointed out the importance of accounting for the subjective nature of this task when building and utilizing these datasets, and this has triggered work on analyzing and better understanding rater disagreements, and how they could be effectively incorporated into the machine learning developmental pipeline. While these efforts are filling an important gap, there is a lack of a broader framework about the root causes of rater disagreement, and therefore, we situate this work within that broader landscape. In this survey paper, we analyze a broad set of literature on the reasons behind rater disagreements focusing on online toxicity, and propose a detailed taxonomy for the same. Further, we summarize and discuss the potential solutions targeting each reason for disagreement. We also discuss several open issues, which could promote the future development of online toxicity research.

Multimodal Clinical Benchmark for Emergency Care (MC-BEC): A Comprehensive Benchmark for Evaluating Foundation Models in Emergency Medicine

  • paper_url: http://arxiv.org/abs/2311.04937
  • repo_url: None
  • paper_authors: Emma Chen, Aman Kansal, Julie Chen, Boyang Tom Jin, Julia Rachel Reisler, David A Kim, Pranav Rajpurkar
  • for: The paper aims to provide a comprehensive benchmark for evaluating foundation models in Emergency Medicine, specifically for predicting patient decompensation, disposition, and ED revisit.
  • methods: The paper uses a multimodal dataset of over 100,000 continuously monitored Emergency Department visits from 2020-2022, including clinical data such as vital signs, electrocardiogram and photoplethysmograph waveforms, and free-text reports of imaging studies. The paper provides a standardized evaluation framework with train-test splits and evaluation metrics.
  • results: The paper provides performance baselines for each prediction task to enable the evaluation of multimodal, multitask models.
    Abstract We propose the Multimodal Clinical Benchmark for Emergency Care (MC-BEC), a comprehensive benchmark for evaluating foundation models in Emergency Medicine using a dataset of 100K+ continuously monitored Emergency Department visits from 2020-2022. MC-BEC focuses on clinically relevant prediction tasks at timescales from minutes to days, including predicting patient decompensation, disposition, and emergency department (ED) revisit, and includes a standardized evaluation framework with train-test splits and evaluation metrics. The multimodal dataset includes a wide range of detailed clinical data, including triage information, prior diagnoses and medications, continuously measured vital signs, electrocardiogram and photoplethysmograph waveforms, orders placed and medications administered throughout the visit, free-text reports of imaging studies, and information on ED diagnosis, disposition, and subsequent revisits. We provide performance baselines for each prediction task to enable the evaluation of multimodal, multitask models. We believe that MC-BEC will encourage researchers to develop more effective, generalizable, and accessible foundation models for multimodal clinical data.

Sub-Sentence Encoder: Contrastive Learning of Propositional Semantic Representations

  • paper_url: http://arxiv.org/abs/2311.04335
  • repo_url: https://github.com/schen149/sub-sentence-encoder
  • paper_authors: Sihao Chen, Hongming Zhang, Tong Chen, Ben Zhou, Wenhao Yu, Dian Yu, Baolin Peng, Hongwei Wang, Dan Roth, Dong Yu
  • for: This paper provides a contrastively learned model for fine-grained, sub-sentence-level semantic representation of text, for applications such as text attribution and retrieval.
  • methods: The proposed sub-sentence encoder learns distinct contextual embeddings for the atomic propositions expressed within a text sequence; the embeddings are contrastively trained to recognize semantic equivalence between propositions across different text sequences.
  • results: Experiments show sub-sentence encoders improve fine-grained semantic tasks such as retrieving supporting facts for text attribution and recognizing conditional semantic similarity, while keeping the same inference cost and space complexity as sentence encoders.
    Abstract We introduce sub-sentence encoder, a contrastively-learned contextual embedding model for fine-grained semantic representation of text. In contrast to the standard practice with sentence embeddings, where the meaning of an entire sequence of text is encoded into a fixed-length vector, the sub-sentence encoder learns to produce distinct contextual embeddings corresponding to different atomic propositions, i.e. atomic units of meaning expressed within a text sequence. The sub-sentence embeddings are contrastively learned to recognize (inferred) semantic equivalence between propositions across different text sequences. Our experiments show the effectiveness of sub-sentence encoders in applications, such as retrieving supporting facts for fine-grained text attribution or recognizing the conditional semantic similarity between texts. In practice, we demonstrate that sub-sentence encoders keep the same level of inference cost and space complexity compared to sentence encoders.
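
The contrastive objective is a standard in-batch InfoNCE loss; a minimal sketch over paired proposition embeddings is below. The temperature, dimensions, and random embeddings are assumptions; the paper's encoder and pairing scheme are not shown here.

```python
import torch
import torch.nn.functional as F

def info_nce(anchor: torch.Tensor, positive: torch.Tensor, tau: float = 0.05):
    """In-batch contrastive loss: row i of `anchor` should match row i of
    `positive`; every other row in the batch serves as a negative."""
    a = F.normalize(anchor, dim=-1)
    p = F.normalize(positive, dim=-1)
    logits = a @ p.T / tau                    # (B, B) cosine similarities
    return F.cross_entropy(logits, torch.arange(a.shape[0]))

# Toy embeddings for B proposition pairs judged semantically equivalent.
B, D = 16, 128
emb_a = torch.randn(B, D, requires_grad=True)
emb_b = torch.randn(B, D)
info_nce(emb_a, emb_b).backward()
```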

Educating for AI Cybersecurity Work and Research: Ethics, Systems Thinking, and Communication Requirements

  • paper_url: http://arxiv.org/abs/2311.04326
  • repo_url: None
  • paper_authors: Sorin Adam Matei, Elisa Bertino
  • for: The paper explores managerial and instructor perceptions of freshly employed cybersecurity workers' preparedness to work effectively in a changing cybersecurity environment that includes AI tools.
  • methods: The study uses a survey to collect data on managerial and instructor perceptions of technical preparedness and non-technical skill sets (ethical, systems thinking, and communication skills) among freshly employed cybersecurity workers.
  • results: Managers and professors perceive preparedness to use AI tools in cybersecurity to be significantly associated with all three non-technical skill sets, with ethics being the most important. Additionally, professors over-estimate students' preparedness for ethical, systems thinking, and communication abilities compared to IT managers' perceptions of their newly employed IT workers.
    Abstract The present study explored managerial and instructor perceptions of their freshly employed cybersecurity workers' or students' preparedness to work effectively in a changing cybersecurity environment that includes AI tools. Specifically, we related perceptions of technical preparedness to ethical, systems thinking, and communication skills. We found that managers and professors perceive preparedness to use AI tools in cybersecurity to be significantly associated with all three non-technical skill sets. Most important, ethics is a clear leader in the network of relationships. Contrary to expectations that ethical concerns are left behind in the rush to adopt the most advanced AI tools in security, both higher education instructors and managers appreciate their role and see them closely associated with technical prowess. Another significant finding is that professors over-estimate students' preparedness for ethical, system thinking, and communication abilities compared to IT managers' perceptions of their newly employed IT workers.

Extending Machine Learning-Based Early Sepsis Detection to Different Demographics

  • paper_url: http://arxiv.org/abs/2311.04325
  • repo_url: None
  • paper_authors: Surajsinh Parmar, Tao Shan, San Lee, Yonghwan Kim, Jang Yong Kim
  • for: This study performs a comparative analysis of two ensemble learning methods, LightGBM and XGBoost, on the public eICU-CRD dataset and a private dataset from South Korea's St. Mary's Hospital, to better handle healthcare data imbalance and improve sepsis detection.
  • methods: LightGBM and XGBoost are trained and evaluated on both datasets to assess their effectiveness for this clinical prediction task.
  • results: Both methods effectively address the data imbalance and enhance sepsis detection, with LightGBM showing a slight edge in computational efficiency and scalability; the findings support broader application of machine learning in critical care.
    Abstract Sepsis requires urgent diagnosis, but research is predominantly focused on Western datasets. In this study, we perform a comparative analysis of two ensemble learning methods, LightGBM and XGBoost, using the public eICU-CRD dataset and a private South Korean St. Mary's Hospital's dataset. Our analysis reveals the effectiveness of these methods in addressing healthcare data imbalance and enhancing sepsis detection. Specifically, LightGBM shows a slight edge in computational efficiency and scalability. The study paves the way for the broader application of machine learning in critical care, thereby expanding the reach of predictive analytics in healthcare globally.
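
As a sketch of the kind of comparison the paper runs, with class reweighting for the imbalance: synthetic data stands in for the eICU-CRD and hospital cohorts, and default hyperparameters are used, unlike a real study.

```python
from lightgbm import LGBMClassifier
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

# Synthetic stand-in for an imbalanced sepsis cohort (~5% positive).
X, y = make_classification(n_samples=5000, weights=[0.95], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
ratio = (y_tr == 0).sum() / (y_tr == 1).sum()  # upweight the minority class

for model in (LGBMClassifier(scale_pos_weight=ratio),
              XGBClassifier(scale_pos_weight=ratio, eval_metric="logloss")):
    model.fit(X_tr, y_tr)
    auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    print(type(model).__name__, round(auc, 3))
```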

A comparative analysis between Conformer-Transducer, Whisper, and wav2vec2 for improving the child speech recognition

  • paper_url: http://arxiv.org/abs/2311.04936
  • repo_url: https://github.com/c3imaging/child_asr_conformer
  • paper_authors: Andrei Barcovschi, Rishabh Jain, Peter Corcoran
  • for: To improve automatic speech recognition performance on child speech.
  • methods: State-of-the-art Conformer-transducer models are finetuned on child speech and compared with self-supervised wav2vec2 models and semi-supervised multi-domain Whisper models finetuned on the same data.
  • results: Finetuning the Conformer-transducer on child speech yields significant ASR improvements over non-finetuned models; across the three approaches, wav2vec2 provides the most consistent performance improvements.
    Abstract Automatic Speech Recognition (ASR) systems have progressed significantly in their performance on adult speech data; however, transcribing child speech remains challenging due to the acoustic differences in the characteristics of child and adult voices. This work aims to explore the potential of adapting state-of-the-art Conformer-transducer models to child speech to improve child speech recognition performance. Furthermore, the results are compared with those of self-supervised wav2vec2 models and semi-supervised multi-domain Whisper models that were previously finetuned on the same data. We demonstrate that finetuning Conformer-transducer models on child speech yields significant improvements in ASR performance on child speech, compared to the non-finetuned models. We also show Whisper and wav2vec2 adaptation on different child speech datasets. Our detailed comparative analysis shows that wav2vec2 provides the most consistent performance improvements among the three methods studied.
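
The paper's Conformer-transducer pipeline is not reproduced here, but the wav2vec2 baseline adaptation can be sketched with the Hugging Face transformers API. The checkpoint name, fake audio, and transcript are placeholders; real finetuning would loop over a child-speech dataset with a proper optimizer.

```python
import torch
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

name = "facebook/wav2vec2-base-960h"  # stand-in checkpoint, an assumption
processor = Wav2Vec2Processor.from_pretrained(name)
model = Wav2Vec2ForCTC.from_pretrained(name)

# One illustrative finetuning step on fake audio (1 s at 16 kHz).
audio = torch.randn(16000).numpy()
inputs = processor(audio, sampling_rate=16000, return_tensors="pt")
labels = processor(text="HELLO THERE", return_tensors="pt").input_ids
loss = model(inputs.input_values, labels=labels).loss  # CTC loss
loss.backward()  # an optimizer step would follow in a real training loop
```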

Improved Child Text-to-Speech Synthesis through Fastpitch-based Transfer Learning

  • paper_url: http://arxiv.org/abs/2311.04313
  • repo_url: None
  • paper_authors: Rishabh Jain, Peter Corcoran
  • for: This paper investigates how to generate high-quality synthetic child speech with the Fastpitch text-to-speech (TTS) model.
  • methods: The approach uses a transfer learning training pipeline, finetuning a multi-speaker TTS model to work with child speech.
  • results: Finetuning experiments use the cleaned, publicly available MyST dataset (55 hours), and a prototype dataset of synthetic speech samples is released together with model code to support further research. A pretrained MOSNet shows a significant objective correlation between real and synthetic child voices, an automatic speech recognition (ASR) model compares the word error rates (WER) of real and synthetic child speech, and speaker similarity is measured with a pretrained speaker encoder.
    Abstract Speech synthesis technology has witnessed significant advancements in recent years, enabling the creation of natural and expressive synthetic speech. One area of particular interest is the generation of synthetic child speech, which presents unique challenges due to children's distinct vocal characteristics and developmental stages. This paper presents a novel approach that leverages the Fastpitch text-to-speech (TTS) model for generating high-quality synthetic child speech. This study uses the transfer learning training pipeline. The approach involved finetuning a multi-speaker TTS model to work with child speech. We use the cleaned version of the publicly available MyST dataset (55 hours) for our finetuning experiments. We also release a prototype dataset of synthetic speech samples generated from this research together with model code to support further research. By using a pretrained MOSNet, we conducted an objective assessment that showed a significant correlation between real and synthetic child voices. Additionally, to validate the intelligibility of the generated speech, we employed an automatic speech recognition (ASR) model to compare the word error rates (WER) of real and synthetic child voices. The speaker similarity between the real and generated speech is also measured using a pretrained speaker encoder.
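
The WER comparison between real and synthetic speech can be sketched with the jiwer package; the transcripts below are invented examples, not the paper's data.

```python
from jiwer import wer

# Hypothetical ASR transcripts for real vs. synthetic child speech.
reference     = ["the cat sat on the mat", "we went to the park"]
real_hyp      = ["the cat sat on the mat", "we went to a park"]
synthetic_hyp = ["the cat sat on that mat", "we went to the part"]

print("WER (real):     ", wer(reference, real_hyp))
print("WER (synthetic):", wer(reference, synthetic_hyp))
```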

Class-Incremental Continual Learning for General Purpose Healthcare Models

  • paper_url: http://arxiv.org/abs/2311.04301
  • repo_url: None
  • paper_authors: Amritpal Singh, Mustafa Burak Gurbuz, Shiva Souhith Gantha, Prahlad Jasti
  • for: This study examines whether a single model can sequentially learn new medical imaging tasks across different specialties and hospitals without losing performance on previously learned tasks.
  • methods: Various continual learning approaches (including episodic-memory and rehearsal-based methods such as iCaRL) are implemented and evaluated across four medical imaging scenarios covering ten classification datasets from diverse modalities, clinical specialties, and hospitals.
  • results: A single model can sequentially learn new tasks from different specialties with performance comparable to naive methods, indicating the feasibility of recycling or sharing models within and across medical specialties, a step towards general-purpose medical imaging AI shared across institutions.
    Abstract Healthcare clinics regularly encounter dynamic data that changes due to variations in patient populations, treatment policies, medical devices, and emerging disease patterns. Deep learning models can suffer from catastrophic forgetting when fine-tuned in such scenarios, causing poor performance on previously learned tasks. Continual learning allows learning on new tasks without performance drop on previous tasks. In this work, we investigate the performance of continual learning models on four different medical imaging scenarios involving ten classification datasets from diverse modalities, clinical specialties, and hospitals. We implement various continual learning approaches and evaluate their performance in these scenarios. Our results demonstrate that a single model can sequentially learn new tasks from different specialties and achieve comparable performance to naive methods. These findings indicate the feasibility of recycling or sharing models across the same or different medical specialties, offering another step towards the development of general-purpose medical imaging AI that can be shared across institutions.
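
The abstract does not detail the specific continual learning methods; as a sketch of one family mentioned in the bullets above, here is rehearsal with a small replay buffer. The buffer size, naive reservoir policy, and toy task are assumptions for illustration.

```python
import random
import torch
import torch.nn.functional as F

def train_continually(model, opt, tasks, buffer_size=200):
    """Rehearsal sketch: keep a small buffer of past batches and mix one
    replayed batch into each step of the current task to fight forgetting."""
    buffer = []
    for task in tasks:               # each task: iterable of (x, y) batches
        for x, y in task:
            if len(buffer) < buffer_size:
                buffer.append((x, y))          # naive store of old data
            bx, by = random.choice(buffer)     # replay an earlier batch
            loss = F.cross_entropy(model(torch.cat([x, bx])),
                                   torch.cat([y, by]))
            opt.zero_grad()
            loss.backward()
            opt.step()

model = torch.nn.Linear(16, 10)                # toy "imaging" classifier
opt = torch.optim.SGD(model.parameters(), lr=0.1)
task = [(torch.randn(8, 16), torch.randint(0, 10, (8,))) for _ in range(5)]
train_continually(model, opt, [task, task])    # two sequential "tasks"
```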

CRAB: Assessing the Strength of Causal Relationships Between Real-world Events

  • paper_url: http://arxiv.org/abs/2311.04284
  • repo_url: None
  • paper_authors: Angelika Romanou, Syrielle Montariol, Debjit Paul, Leo Laugier, Karl Aberer, Antoine Bosselut
  • for: To assess the ability of large language models to understand causal relationships between real-world events in news narratives.
  • methods: The authors introduce CRAB, a Causal Reasoning Assessment Benchmark with fine-grained, contextual causality annotations for ~2.7K pairs of real-world events, and use it to measure the performance of several large language models.
  • results: Most language models perform poorly on the task, and models perform worse when events are derived from complex causal structures than from simple linear causal chains.
    Abstract Understanding narratives requires reasoning about the cause-and-effect relationships between events mentioned in the text. While existing foundation models yield impressive results in many NLP tasks requiring reasoning, it is unclear whether they understand the complexity of the underlying network of causal relationships of events in narratives. In this work, we present CRAB, a new Causal Reasoning Assessment Benchmark designed to evaluate causal understanding of events in real-world narratives. CRAB contains fine-grained, contextual causality annotations for ~2.7K pairs of real-world events that describe various newsworthy event timelines (e.g., the acquisition of Twitter by Elon Musk). Using CRAB, we measure the performance of several large language models, demonstrating that most systems achieve poor performance on the task. Motivated by classical causal principles, we also analyze the causal structures of groups of events in CRAB, and find that models perform worse on causal reasoning when events are derived from complex causal structures compared to simple linear causal chains. We make our dataset and code available to the research community.

OtterHD: A High-Resolution Multi-modality Model

  • paper_url: http://arxiv.org/abs/2311.04219
  • repo_url: None
  • paper_authors: Bo Li, Peiyuan Zhang, Jingkang Yang, Yuanhan Zhang, Fanyi Pu, Ziwei Liu
  • for: This paper presents OtterHD-8B, a multimodal model engineered to interpret high-resolution visual inputs with granular precision and greater flexibility.
  • methods: The model evolves from the Fuyu-8B architecture and handles flexible input dimensions rather than relying on a fixed-size vision encoder; the authors also introduce MagnifierBench, an evaluation framework for discerning minute details and spatial relationships of small objects.
  • results: On MagnifierBench, OtterHD-8B outperforms current leading models by a substantial margin, particularly when directly processing high-resolution inputs.
    Abstract In this paper, we present OtterHD-8B, an innovative multimodal model evolved from Fuyu-8B, specifically engineered to interpret high-resolution visual inputs with granular precision. Unlike conventional models that are constrained by fixed-size vision encoders, OtterHD-8B boasts the ability to handle flexible input dimensions, ensuring its versatility across various inference requirements. Alongside this model, we introduce MagnifierBench, an evaluation framework designed to scrutinize models' ability to discern minute details and spatial relationships of small objects. Our comparative analysis reveals that while current leading models falter on this benchmark, OtterHD-8B, particularly when directly processing high-resolution inputs, outperforms its counterparts by a substantial margin. The findings illuminate the structural variances in visual information processing among different models and the influence that the vision encoders' pre-training resolution disparities have on model effectiveness within such benchmarks. Our study highlights the critical role of flexibility and high-resolution input capabilities in large multimodal models and also exemplifies the potential inherent in the Fuyu architecture's simplicity for handling complex visual data.

Towards Garment Sewing Pattern Reconstruction from a Single Image

  • paper_url: http://arxiv.org/abs/2311.04218
  • repo_url: None
  • paper_authors: Lijuan Liu, Xiangyu Xu, Zhijie Lin, Jiabin Liang, Shuicheng Yan
  • for: This work recovers garment sewing patterns from daily photos, to support applications such as fashion design, virtual try-on, and digital avatars.
  • methods: The authors first synthesize SewFactory, a dataset of around 1M images with ground-truth sewing patterns (made realistic by a proposed human texture synthesis network) for training and quantitative evaluation, and then propose Sewformer, a two-level Transformer network that significantly improves sewing pattern prediction.
  • results: Extensive experiments show the proposed framework effectively recovers sewing patterns and generalizes well to casually-taken human photos.
    Abstract Garment sewing pattern represents the intrinsic rest shape of a garment, and is the core for many applications like fashion design, virtual try-on, and digital avatars. In this work, we explore the challenging problem of recovering garment sewing patterns from daily photos for augmenting these applications. To solve the problem, we first synthesize a versatile dataset, named SewFactory, which consists of around 1M images and ground-truth sewing patterns for model training and quantitative evaluation. SewFactory covers a wide range of human poses, body shapes, and sewing patterns, and possesses realistic appearances thanks to the proposed human texture synthesis network. Then, we propose a two-level Transformer network called Sewformer, which significantly improves the sewing pattern prediction performance. Extensive experiments demonstrate that the proposed framework is effective in recovering sewing patterns and well generalizes to casually-taken human photos. Code, dataset, and pre-trained models are available at: https://sewformer.github.io.

Wearable data from subjects playing Super Mario, sitting university exams, or performing physical exercise help detect acute mood episodes via self-supervised learning

  • paper_url: http://arxiv.org/abs/2311.04215
  • repo_url: None
  • paper_authors: Filippo Corponi, Bryan M. Li, Gerard Anmella, Clàudia Valenzuela-Pascual, Ariadna Mas, Isabella Pacchiarotti, Marc Valentí, Iria Grande, Antonio Benabarre, Marina Garriga, Eduard Vieta, Allan H Young, Stephen M. Lawrie, Heather C. Whalley, Diego Hidalgo-Mazzei, Antonio Vergari
  • for: This study monitors mood disorders (MDs) from data passively collected with wearable devices, to advance modern personal-sensing technology despite the scarcity of labelled data.
  • methods: Self-supervised learning (SSL) exploits unlabelled data to learn representations during pre-training, which are then used for the supervised detection task; the authors also release E4SelfLearning, the largest open-access collection of Empatica E4 recordings to date (161 subjects), with its pre-processing pipeline.
  • results: SSL effectively detects acute MD episodes versus stable states and confidently outperforms fully-supervised pipelines built on either the authors' E4-tailored Transformer (E4mer) or a classical XGBoost baseline.
    Abstract Personal sensing, leveraging data passively and near-continuously collected with wearables from patients in their ecological environment, is a promising paradigm to monitor mood disorders (MDs), a major determinant of worldwide disease burden. However, collecting and annotating wearable data is very resource-intensive. Studies of this kind can thus typically afford to recruit only a couple dozens of patients. This constitutes one of the major obstacles to applying modern supervised machine learning techniques to MDs detection. In this paper, we overcome this data bottleneck and advance the detection of MDs acute episode vs stable state from wearables data on the back of recent advances in self-supervised learning (SSL). This leverages unlabelled data to learn representations during pre-training, subsequently exploited for a supervised task. First, we collected open-access datasets recording with an Empatica E4 spanning different, unrelated to MD monitoring, personal sensing tasks -- from emotion recognition in Super Mario players to stress detection in undergraduates -- and devised a pre-processing pipeline performing on-/off-body detection, sleep-wake detection, segmentation, and (optionally) feature extraction. With 161 E4-recorded subjects, we introduce E4SelfLearning, the largest to date open access collection, and its pre-processing pipeline. Second, we show that SSL confidently outperforms fully-supervised pipelines using either our novel E4-tailored Transformer architecture (E4mer) or classical baseline XGBoost: 81.23% against 75.35% (E4mer) and 72.02% (XGBoost) correctly classified recording segments from 64 (half acute, half stable) patients. Lastly, we illustrate that SSL performance is strongly associated with the specific surrogate task employed for pre-training as well as with unlabelled data availability.
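
The results stress that the surrogate (pretext) task matters; a generic masked-reconstruction pretext task on multichannel wearable windows can be sketched as follows. The GRU encoder, masking rate, and shapes are assumptions, not the paper's E4mer architecture.

```python
import torch
import torch.nn as nn

class TinyEncoder(nn.Module):
    """Toy encoder for multichannel wearable windows; not the paper's E4mer."""
    def __init__(self, d_in=4, d_model=32):
        super().__init__()
        self.enc = nn.GRU(d_in, d_model, batch_first=True)
        self.head = nn.Linear(d_model, d_in)  # reconstruct masked steps

    def forward(self, x):                     # x: (batch, time, channels)
        h, _ = self.enc(x)
        return self.head(h)

def ssl_pretrain_step(model, opt, x, mask_frac=0.15):
    """Masked-reconstruction pretext task: hide random time steps and train
    the encoder to reconstruct them from surrounding context."""
    mask = torch.rand(x.shape[:2]) < mask_frac        # (batch, time)
    x_in = x.clone()
    x_in[mask] = 0.0
    loss = ((model(x_in) - x)[mask] ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

model = TinyEncoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.randn(8, 64, 4)  # 8 unlabelled windows, 64 steps, 4 channels
print(ssl_pretrain_step(model, opt, x))
```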

Rephrase and Respond: Let Large Language Models Ask Better Questions for Themselves

  • paper_url: http://arxiv.org/abs/2311.04205
  • repo_url: https://github.com/uclaml/Rephrase-and-Respond
  • paper_authors: Yihe Deng, Weitong Zhang, Zixiang Chen, Quanquan Gu
  • for: This paper proposes Rephrase and Respond (RaR), which lets a large language model (LLM) rephrase and expand a human-posed question and then answer it within a single prompt, so that the model better understands what is being asked.
  • methods: A two-step variant uses one rephrasing LLM to rephrase the question and then passes the original and rephrased questions together to a different responding LLM, letting rephrasings generated by one model benefit another.
  • results: Experiments show RaR significantly improves model performance across a wide range of tasks; RaR is complementary to Chain-of-Thought (CoT) prompting and can be combined with CoT for even better performance.
    Abstract Misunderstandings arise not only in interpersonal communication but also between humans and Large Language Models (LLMs). Such discrepancies can make LLMs interpret seemingly unambiguous questions in unexpected ways, yielding incorrect responses. While it is widely acknowledged that the quality of a prompt, such as a question, significantly impacts the quality of the response provided by LLMs, a systematic method for crafting questions that LLMs can better comprehend is still underdeveloped. In this paper, we present a method named `Rephrase and Respond' (RaR), which allows LLMs to rephrase and expand questions posed by humans and provide responses in a single prompt. This approach serves as a simple yet effective prompting method for improving performance. We also introduce a two-step variant of RaR, where a rephrasing LLM first rephrases the question and then passes the original and rephrased questions together to a different responding LLM. This facilitates the effective utilization of rephrased questions generated by one LLM with another. Our experiments demonstrate that our methods significantly improve the performance of different models across a wide range to tasks. We further provide a comprehensive comparison between RaR and the popular Chain-of-Thought (CoT) methods, both theoretically and empirically. We show that RaR is complementary to CoT and can be combined with CoT to achieve even better performance. Our work not only contributes to enhancing LLM performance efficiently and effectively but also sheds light on a fair evaluation of LLM capabilities. Data and codes are available at https://github.com/uclaml/Rephrase-and-Respond.
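
Since RaR is purely a prompting method, it can be sketched directly. The one-step instruction follows the pattern the paper describes; the two-step wording and the `rephrase_llm`/`respond_llm` callables are assumptions standing in for real LLM API calls.

```python
def rar_prompt(question: str) -> str:
    """One-step RaR: the model rephrases and answers in a single prompt."""
    return f'"{question}"\nRephrase and expand the question, and respond.'

def two_step_rar(question: str, rephrase_llm, respond_llm) -> str:
    """Two-step RaR: one LLM rephrases; another answers both versions.
    `rephrase_llm` / `respond_llm` are hypothetical str -> str callables."""
    rephrased = rephrase_llm(
        f'"{question}"\nRephrase and expand the question to help answer it '
        "better. Maintain all information in the original question.")
    return respond_llm(f"(original) {question}\n(rephrased) {rephrased}")

# Offline demo with stub "LLMs" in place of real model calls.
print(two_step_rar(
    "Was Lincoln born in an even month?",
    lambda p: "In which month was Abraham Lincoln born, and is that "
              "month's number even?",
    lambda p: p.splitlines()[-1]))
```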

JPAVE: A Generation and Classification-based Model for Joint Product Attribute Prediction and Value Extraction

  • paper_url: http://arxiv.org/abs/2311.04196
  • repo_url: https://github.com/zhongfendeng/jpave
  • paper_authors: Zhongfen Deng, Hao Peng, Tao Zhang, Shuaiqi Liu, Wenting Zhao, Yibo Wang, Philip S. Yu
  • for: This paper addresses product attribute value extraction, which supports downstream e-commerce applications such as product search and recommendation.
  • methods: The authors propose JPAVE, a multi-task learning model with value generation/classification and attribute prediction that predicts values without relying on their positional information in the text; a copy mechanism in the value generator and a value attention module in the value classifier mitigate data discrepancy by focusing only on the relevant parts of the input text. Two model variants target open-world and closed-world scenarios, and the copy mechanism also improves zero-shot identification of unseen values.
  • results: Experiments on a public dataset demonstrate the model's superiority over strong baselines and its ability to generalize to new values.
    Abstract Product attribute value extraction is an important task in e-Commerce which can help several downstream applications such as product search and recommendation. Most previous models handle this task using sequence labeling or question answering method which rely on the sequential position information of values in the product text and are vulnerable to data discrepancy between training and testing. This limits their generalization ability to real-world scenario in which each product can have multiple descriptions across various shopping platforms with different composition of text and style. They also have limited zero-shot ability to new values. In this paper, we propose a multi-task learning model with value generation/classification and attribute prediction called JPAVE to predict values without the necessity of position information of values in the text. Furthermore, the copy mechanism in value generator and the value attention module in value classifier help our model address the data discrepancy issue by only focusing on the relevant part of input text and ignoring other information which causes the discrepancy issue such as sentence structure in the text. Besides, two variants of our model are designed for open-world and closed-world scenarios. In addition, copy mechanism introduced in the first variant based on value generation can improve its zero-shot ability for identifying unseen values. Experimental results on a public dataset demonstrate the superiority of our model compared with strong baselines and its generalization ability of predicting new values.

Selective Visual Representations Improve Convergence and Generalization for Embodied AI

  • paper_url: http://arxiv.org/abs/2311.04193
  • repo_url: None
  • paper_authors: Ainaz Eftekhar, Kuo-Hao Zeng, Jiafei Duan, Ali Farhadi, Ani Kembhavi, Ranjay Krishna
  • for: To improve the accuracy and efficiency of visual processing in embodied AI models by filtering out task-irrelevant visual information.
  • methods: A parameter-efficient approach that induces a task-conditioned bottleneck with a small learnable codebook module, trained jointly to optimize task reward, acting as a task-conditioned selective filter over the visual observation.
  • results: State-of-the-art performance for object goal navigation and object displacement across five benchmarks: ProcTHOR, ArchitecTHOR, RoboTHOR, AI2-iTHOR, and ManipulaTHOR.
    Abstract Embodied AI models often employ off-the-shelf vision backbones like CLIP to encode their visual observations. Although such general purpose representations encode rich syntactic and semantic information about the scene, much of this information is often irrelevant to the specific task at hand. This introduces noise within the learning process and distracts the agent's focus from task-relevant visual cues. Inspired by selective attention in humans (the process through which people filter their perception based on their experiences, knowledge, and the task at hand), we introduce a parameter-efficient approach to filter visual stimuli for embodied AI. Our approach induces a task-conditioned bottleneck using a small learnable codebook module. This codebook is trained jointly to optimize task reward and acts as a task-conditioned selective filter over the visual observation. Our experiments showcase state-of-the-art performance for object goal navigation and object displacement across 5 benchmarks, ProcTHOR, ArchitecTHOR, RoboTHOR, AI2-iTHOR, and ManipulaTHOR. The filtered representations produced by the codebook are also able to generalize better and converge faster when adapted to other simulation environments such as Habitat. Our qualitative analyses show that agents explore their environments more effectively and their representations retain task-relevant information like target object recognition while ignoring superfluous information about other objects. Code and pretrained models are available at our project website: https://embodied-codebook.github.io.
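
One plausible reading of the codebook bottleneck, sketched in PyTorch: all sizes and the soft-attention mixing are assumptions, and the paper may use a different selection mechanism.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CodebookBottleneck(nn.Module):
    """Sketch of a task-conditioned codebook filter: visual features attend
    over a small set of learnable codes, so only information expressible by
    the codes survives the bottleneck. All sizes are illustrative."""
    def __init__(self, d_feat=512, n_codes=256, d_code=32):
        super().__init__()
        self.codes = nn.Parameter(torch.randn(n_codes, d_code))
        self.to_logits = nn.Linear(d_feat, n_codes)
        self.proj_out = nn.Linear(d_code, d_feat)

    def forward(self, visual_feat):                  # (B, d_feat), e.g. CLIP
        attn = F.softmax(self.to_logits(visual_feat), dim=-1)  # (B, n_codes)
        mixed = attn @ self.codes                    # (B, d_code) bottleneck
        return self.proj_out(mixed)                  # filtered representation

filt = CodebookBottleneck()
out = filt(torch.randn(4, 512))  # drop-in replacement for the raw features
```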

Spatio-Temporal Anomaly Detection with Graph Networks for Data Quality Monitoring of the Hadron Calorimeter

  • paper_url: http://arxiv.org/abs/2311.04190
  • repo_url: None
  • paper_authors: Mulugeta Weldezgina Asres, Christian Walter Omlin, Long Wang, David Yu, Pavel Parygin, Jay Dittmann, Georgia Karapostoli, Markus Seidel, Rosamaria Venditti, Luka Lambrecht, Emanuele Usai, Muhammad Ahmad, Javier Fernandez Menendez, Kaori Maeshima, the CMS-HCAL Collaboration
  • for: To provide a semi-supervised spatio-temporal anomaly detection (AD) monitoring system for the particle data acquisition of the CMS hadronic calorimeter (HCAL) at the Large Hadron Collider (LHC), so that data quality problems are spotted promptly.
  • methods: The GraphSTAD system uses three-dimensional digi-occupancy map data from the online data quality monitoring (DQM) system; convolutional and graph neural networks learn local spatial characteristics induced by particles traversing the detector and global behavior arising from shared backend circuit connections and housing boxes, while recurrent neural networks capture the temporal evolution of the extracted spatial features.
  • results: Validation on LHC Run-2 collision data shows the proposed AD system accurately captures diverse channel fault types; it has reached production-level accuracy and is being integrated into the CMS core production system for real-time monitoring of the HCAL.
    Abstract The compact muon solenoid (CMS) experiment is a general-purpose detector for high-energy collision at the large hadron collider (LHC) at CERN. It employs an online data quality monitoring (DQM) system to promptly spot and diagnose particle data acquisition problems to avoid data quality loss. In this study, we present semi-supervised spatio-temporal anomaly detection (AD) monitoring for the physics particle reading channels of the hadronic calorimeter (HCAL) of the CMS using three-dimensional digi-occupancy map data of the DQM. We propose the GraphSTAD system, which employs convolutional and graph neural networks to learn local spatial characteristics induced by particles traversing the detector, and global behavior owing to shared backend circuit connections and housing boxes of the channels, respectively. Recurrent neural networks capture the temporal evolution of the extracted spatial features. We have validated the accuracy of the proposed AD system in capturing diverse channel fault types using the LHC Run-2 collision data sets. The GraphSTAD system has achieved production-level accuracy and is being integrated into the CMS core production system--for real-time monitoring of the HCAL. We have also provided a quantitative performance comparison with alternative benchmark models to demonstrate the promising leverage of the presented system.

Prompt Cache: Modular Attention Reuse for Low-Latency Inference

  • paper_url: http://arxiv.org/abs/2311.04934
  • repo_url: None
  • paper_authors: In Gim, Guojun Chen, Seung-seob Lee, Nikhil Sarda, Anurag Khandelwal, Lin Zhong
  • for: To accelerate inference for large language models (LLMs) and reduce response latency for user prompts.
  • methods: Attention states of frequently occurring text segments (such as system messages, prompt templates, and context documents) are precomputed and stored on the inference server, then reused when those segments appear in user prompts; a schema defines these reusable segments, called prompt modules, and ensures positional accuracy during reuse.
  • results: Evaluated across several LLMs, Prompt Cache significantly reduces time-to-first-token latency, especially for longer prompts such as document-based question answering and recommendations, with improvements from 8x for GPU-based inference to 60x for CPU-based inference, all without modifying model parameters.
    Abstract We present Prompt Cache, an approach for accelerating inference for large language models (LLM) by reusing attention states across different LLM prompts. Many input prompts have overlapping text segments, such as system messages, prompt templates, and documents provided for context. Our key insight is that by precomputing and storing the attention states of these frequently occurring text segments on the inference server, we can efficiently reuse them when these segments appear in user prompts. Prompt Cache employs a schema to explicitly define such reusable text segments, called prompt modules. The schema ensures positional accuracy during attention state reuse and provides users with an interface to access cached states in their prompt. Using a prototype implementation, we evaluate Prompt Cache across several LLMs. We show that Prompt Cache significantly reduce latency in time-to-first-token, especially for longer prompts such as document-based question answering and recommendations. The improvements range from 8x for GPU-based inference to 60x for CPU-based inference, all while maintaining output accuracy and without the need for model parameter modifications.
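
Prompt Cache's modular reuse relies on its schema of prompt modules; the simplest special case, reusing cached attention key/value states for a shared prefix, can be sketched with the transformers API. Here gpt2 is a small stand-in model and real deployments cache non-prefix modules too.

```python
import copy
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "gpt2"  # small stand-in; Prompt Cache targets much larger LLMs
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name).eval()

system = "You are a helpful assistant. Answer briefly.\n"
with torch.no_grad():
    prefix_ids = tok(system, return_tensors="pt").input_ids
    cached = model(prefix_ids, use_cache=True).past_key_values  # reusable KV

def first_token_logits(question: str) -> torch.Tensor:
    """Reuse the precomputed prefix states instead of re-encoding them."""
    q_ids = tok(question, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(q_ids, past_key_values=copy.deepcopy(cached),
                    use_cache=True)  # copy: the cache object gets mutated
    return out.logits[:, -1]  # next-token logits; generation would continue

first_token_logits("What does a prompt cache store?")
```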

On Leakage in Machine Learning Pipelines

  • paper_url: http://arxiv.org/abs/2311.04179
  • repo_url: https://github.com/djdprogramming/adfa2
  • paper_authors: Leonard Sasse, Eliana Nicolaisen-Sobesky, Juergen Dukart, Simon B. Eickhoff, Michael Götz, Sami Hamdan, Vera Komeyer, Abhijit Kulkarni, Juha Lahnakoski, Bradley C. Love, Federico Raimondo, Kaustubh R. Patil
  • for: To expand understanding of the causes of leakage, which typically yields overoptimistic performance estimates in ML pipelines, and of how to avoid it when designing, implementing, and evaluating such pipelines.
  • methods: Illustrated by concrete examples, the paper provides a comprehensive overview and discussion of the various types of leakage that may arise in ML pipelines.
  • results: The paper shows how leakage can lead to overoptimistic performance estimates and failure to generalize to new data.
    Abstract Machine learning (ML) provides powerful tools for predictive modeling. ML's popularity stems from the promise of sample-level prediction with applications across a variety of fields from physics and marketing to healthcare. However, if not properly implemented and evaluated, ML pipelines may contain leakage typically resulting in overoptimistic performance estimates and failure to generalize to new data. This can have severe negative financial and societal implications. Our aim is to expand understanding associated with causes leading to leakage when designing, implementing, and evaluating ML pipelines. Illustrated by concrete examples, we provide a comprehensive overview and discussion of various types of leakage that may arise in ML pipelines.
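
One concrete leakage example of the kind the paper catalogues: feature selection performed on the full dataset before cross-validation leaks label information into the test folds. This classic demonstration is our illustration, not taken from the paper.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5000))     # pure noise features
y = rng.integers(0, 2, size=100)     # random labels: true accuracy ~ 0.5

# Leaky: selecting features on ALL data lets label information leak into CV.
X_sel = SelectKBest(f_classif, k=20).fit_transform(X, y)
leaky = cross_val_score(LogisticRegression(), X_sel, y, cv=5).mean()

# Leak-free: selection is refit inside each training fold via a Pipeline.
pipe = make_pipeline(SelectKBest(f_classif, k=20), LogisticRegression())
clean = cross_val_score(pipe, X, y, cv=5).mean()
print(f"leaky CV accuracy: {leaky:.2f}, leak-free: {clean:.2f}")
```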

Enhancing LLM Intelligence with ARM-RAG: Auxiliary Rationale Memory for Retrieval Augmented Generation

  • paper_url: http://arxiv.org/abs/2311.04177
  • repo_url: None
  • paper_authors: Eric Melz
  • for: This paper studies how to improve the problem-solving intelligence of language models without incurring high training costs.
  • methods: Building on Retrieval Augmented Generation (RAG), the authors propose ARM-RAG (Auxiliary Rationale Memory for Retrieval Augmented Generation), a system that stores reasoning chains from its successes and retrieves them later.
  • results: Storing and subsequently retrieving reasoning chains has a positive influence on performance on grade-school math problems.
    Abstract Large Language Models (LLMs) are smart but forgetful. Recent studies, (e.g., (Bubeck et al., 2023)) on modern LLMs have shown that they are capable of performing amazing tasks typically necessitating human-level intelligence. However, unlike humans, frozen LLMs do not improve over time; they neither acquire new knowledge nor learn from their successes or failures. Some approaches to improving the intelligence of LLMs include fine-tuning models based on problem-solving performance (Zelikman et al., 2022), and building bigger and more sophisticated models (Bubeck et al., 2023). However, these methods have the drawback of requiring substantial data and computational resources to retrain existing models. In this paper, we explore the use of Retrieval Augmented Generation, also known as RAG (Lewis et al., 2021) to improve problem-solving performance. We propose ARM-RAG (Auxiliary Rationale Memory for Retrieval Augmented Generation), a system that learns from its successes without incurring high training costs. We demonstrate that the storage and subsequent retrieval of reasoning chains have a positive influence on performance in grade-school math problems.
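
A toy sketch of the auxiliary-rationale-memory idea is below; the similarity measure, storage format, and example problems are assumptions, and ARM-RAG's actual retrieval machinery is not specified here.

```python
import difflib

class RationaleMemory:
    """Toy auxiliary rationale memory: store reasoning chains from solved
    problems and retrieve the closest ones to prime a new prompt."""
    def __init__(self):
        self.items: list[tuple[str, str]] = []   # (problem, rationale)

    def add(self, problem: str, rationale: str):
        self.items.append((problem, rationale))

    def retrieve(self, problem: str, k: int = 2) -> list[str]:
        scored = sorted(self.items, reverse=True,
                        key=lambda it: difflib.SequenceMatcher(
                            None, it[0], problem).ratio())
        return [rationale for _, rationale in scored[:k]]

mem = RationaleMemory()
mem.add("Tom has 3 apples and buys 2 more.", "3 + 2 = 5, so 5 apples.")
hints = mem.retrieve("Sara has 4 pens and buys 3 more.")
prompt = "Similar solved examples:\n" + "\n".join(hints) + "\nNow solve: ..."
print(prompt)
```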

HADES: Fast Singularity Detection with Local Measure Comparison

  • paper_url: http://arxiv.org/abs/2311.04171
  • repo_url: None
  • paper_authors: Uzu Lim, Harald Oberhauser, Vidit Nanda
  • for: To detect singularities in data with an unsupervised algorithm.
  • methods: Hades employs a kernel goodness-of-fit test, making it much faster and more scalable than existing topology-based alternatives; its correctness guarantees draw on tools from differential geometry and optimal transport theory.
  • results: Hades provably detects singularities with high probability when the data sample lives on a transverse intersection of equidimensional manifolds, and in experiments recovers singularities in synthetic data, branching points in road networks, intersection rings in molecular conformation space, and anomalies in image data.
    Abstract We introduce Hades, an unsupervised algorithm to detect singularities in data. This algorithm employs a kernel goodness-of-fit test, and as a consequence it is much faster and far more scaleable than the existing topology-based alternatives. Using tools from differential geometry and optimal transport theory, we prove that Hades correctly detects singularities with high probability when the data sample lives on a transverse intersection of equidimensional manifolds. In computational experiments, Hades recovers singularities in synthetically generated data, branching points in road network data, intersection rings in molecular conformation space, and anomalies in image data.
    摘要 我团队介绍了冥王(Hades)算法,用于不监督地检测数据中的缺陷。这个算法使用kernel好度验证测试,因此它比现有的topology基于的方法更快速和可扩展。使用差分几何和最优运输理论,我们证明了冥王在数据样本生活在等维度抽象 manifold上的极限交叉点上correctly检测缺陷,并且在计算实验中能够回归数据中的分支点、路网数据中的分支点、分子 conformity 空间中的交叉环和图像数据中的异常。Note: "冥王" (Hades) is the name of the algorithm, and "数据中的缺陷" (singularities in data) is the object being detected.

Outliers with Opposing Signals Have an Outsized Effect on Neural Network Optimization

  • paper_url: http://arxiv.org/abs/2311.04163
  • repo_url: None
  • paper_authors: Elan Rosenfeld, Andrej Risteski
  • For: This paper is written to study the interaction between depth and heavy-tailed structures in neural network optimization, and to provide intuitive explanations for several previously reported observations about network training dynamics.
  • Methods: The paper uses experimental and theoretical approaches to study the phenomenon of opposing signals in training data, and to explore the effects of these signals on the optimization and behavior of the network. The authors also compare the performance of two popular optimization algorithms, Adam and SGD.
  • Results: The paper reports several key findings, including the existence of paired groups of outliers in the training data that can significantly influence the optimization process, the concept of "grokking" and its connection to the edge of stability, and the importance of carefully balancing opposing signals during training. The authors also provide a mechanistic explanation of the phenomenon on a toy example and a theoretical analysis of a two-layer linear network on a simple model. Experimental results confirm the predictions made by the authors.
    Abstract We identify a new phenomenon in neural network optimization which arises from the interaction of depth and a particular heavy-tailed structure in natural data. Our result offers intuitive explanations for several previously reported observations about network training dynamics. In particular, it implies a conceptually new cause for progressive sharpening and the edge of stability; we also highlight connections to other concepts in optimization and generalization including grokking, simplicity bias, and Sharpness-Aware Minimization. Experimentally, we demonstrate the significant influence of paired groups of outliers in the training data with strong opposing signals: consistent, large magnitude features which dominate the network output throughout training and provide gradients which point in opposite directions. Due to these outliers, early optimization enters a narrow valley which carefully balances the opposing groups; subsequent sharpening causes their loss to rise rapidly, oscillating between high on one group and then the other, until the overall loss spikes. We describe how to identify these groups, explore what sets them apart, and carefully study their effect on the network's optimization and behavior. We complement these experiments with a mechanistic explanation on a toy example of opposing signals and a theoretical analysis of a two-layer linear network on a simple model. Our finding enables new qualitative predictions of training behavior which we confirm experimentally. It also provides a new lens through which to study and improve modern training practices for stochastic optimization, which we highlight via a case study of Adam versus SGD.
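A toy rendition of the opposing-signals picture (illustrative, not the paper's code): two outliers share one dominant, large-magnitude feature but carry opposite targets, so their per-example gradients point in opposite directions. With a step size just past the stability threshold, the iterate oscillates and the per-group losses alternate between high on one group and then the other.

```python
# Two outliers with a shared strong feature and opposing targets.
import numpy as np

x, y = 5.0, np.array([+1.0, -1.0])    # shared feature value, opposing labels
w, lr = 0.1, 0.081                    # lr slightly above 2 / curvature (2/25)

for step in range(6):
    preds = w * x
    losses = 0.5 * (preds - y) ** 2
    print(f"step {step}: w={w:+.3f}, "
          f"loss(y=+1)={losses[0]:6.3f}, loss(y=-1)={losses[1]:6.3f}")
    grads = (preds - y) * x           # gradients of the two examples oppose
    w -= lr * grads.mean()            # full-batch step oscillates around w=0
```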

A Simple Interpretable Transformer for Fine-Grained Image Classification and Analysis

  • paper_url: http://arxiv.org/abs/2311.04157
  • repo_url: https://github.com/imageomics/intr
  • paper_authors: Dipanjyoti Paul, Arpita Chowdhury, Xinqi Xiong, Feng-Ju Chang, David Carlyn, Samuel Stevens, Kaiya Provost, Anuj Karpatne, Bryan Carstens, Daniel Rubenstein, Charles Stewart, Tanya Berger-Wolf, Yu Su, Wei-Lun Chao
  • for: Improving the interpretability of image classification by having each class search for and localize its own patterns in an image.
  • methods: INTR, a Transformer encoder-decoder inspired by DETR, learns a class-specific query for each class and localizes class patterns via cross-attention, making the prediction interpretable.
  • results: Demonstrated on eight datasets for fine-grained image classification and analysis; INTR improves the interpretability of image classification while remaining accurate.
    Abstract We present a novel usage of Transformers to make image classification interpretable. Unlike mainstream classifiers that wait until the last fully-connected layer to incorporate class information to make predictions, we investigate a proactive approach, asking each class to search for itself in an image. We realize this idea via a Transformer encoder-decoder inspired by DEtection TRansformer (DETR). We learn "class-specific" queries (one for each class) as input to the decoder, enabling each class to localize its patterns in an image via cross-attention. We name our approach INterpretable TRansformer (INTR), which is fairly easy to implement and exhibits several compelling properties. We show that INTR intrinsically encourages each class to attend distinctively; the cross-attention weights thus provide a faithful interpretation of the prediction. Interestingly, via "multi-head" cross-attention, INTR could identify different "attributes" of a class, making it particularly suitable for fine-grained classification and analysis, which we demonstrate on eight datasets. Our code and pre-trained model are publicly accessible at https://github.com/Imageomics/INTR.
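A minimal sketch of the class-specific-query mechanism, with hypothetical dimensions (the real model is in the linked repository): each class owns a learned query, cross-attention localizes its evidence over patch features, and the attention weights double as per-class interpretation heatmaps.

```python
import torch
import torch.nn as nn

class ClassQueryHead(nn.Module):
    def __init__(self, num_classes=10, dim=256, heads=8):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_classes, dim))  # one per class
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.score = nn.Linear(dim, 1)   # shared scorer over attended features

    def forward(self, patch_feats):                    # (B, num_patches, dim)
        q = self.queries.unsqueeze(0).expand(patch_feats.size(0), -1, -1)
        out, attn_w = self.attn(q, patch_feats, patch_feats)
        return self.score(out).squeeze(-1), attn_w     # logits, per-class maps

head = ClassQueryHead()
feats = torch.randn(2, 196, 256)                       # e.g. a 14x14 patch grid
logits, attn = head(feats)
print(logits.shape, attn.shape)                        # (2, 10) and (2, 10, 196)
```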

Contactless Fingerprint Biometric Anti-Spoofing: An Unsupervised Deep Learning Approach

  • paper_url: http://arxiv.org/abs/2311.04148
  • repo_url: None
  • paper_authors: Banafsheh Adami, Nima Karimian
  • For: This study aims to improve user comfort in contactless fingerprint recognition while defending against fingerprint presentation (spoofing) attacks.
  • Methods: The study uses a novel anti-spoofing approach that combines an unsupervised autoencoder with a convolutional block attention module, without exposing the model to any spoofed samples during training.
  • Results: The method reaches a BPCER of 0.96% and an APCER of 1.6% when tested against various types of presentation attack images.
    Abstract Contactless fingerprint recognition offers a higher level of user comfort and addresses hygiene concerns more effectively. However, it is also more vulnerable to presentation attacks such as photo paper, paper-printout, and various display attacks, which makes it more challenging to implement in biometric systems compared to contact-based modalities. Limited research has been conducted on presentation attacks in contactless fingerprint systems, and these studies have encountered challenges in terms of generalization and scalability since both bonafide samples and presentation attacks are utilized during model training. Although this approach appears promising, it lacks the ability to handle unseen attacks, which is a crucial factor for developing PAD methods that can generalize effectively. We introduce an innovative anti-spoofing approach that combines an unsupervised autoencoder with a convolutional block attention module to address the limitations of existing methods. Our model is exclusively trained on bonafide images without exposure to any spoofed samples during the training phase. It is then evaluated against various types of presentation attack images in the testing phase. The scheme we propose has achieved an average BPCER of 0.96% with an APCER of 1.6% for presentation attacks involving various types of spoofed samples.
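A sketch of the one-class training recipe described above: train an autoencoder on bonafide images only and flag inputs with high reconstruction error as presentation attacks. The architecture and threshold are placeholders, and the paper's convolutional block attention module is omitted.

```python
import torch
import torch.nn as nn

class AE(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU())
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1), nn.Sigmoid())

    def forward(self, x):
        return self.dec(self.enc(x))

model = AE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
bonafide = torch.rand(8, 1, 64, 64)      # stand-in for real fingerprint crops
for _ in range(3):                        # training loop (abbreviated)
    loss = nn.functional.mse_loss(model(bonafide), bonafide)
    opt.zero_grad()
    loss.backward()
    opt.step()

def is_spoof(x, threshold=0.02):          # threshold tuned on validation data
    err = ((model(x) - x) ** 2).mean(dim=(1, 2, 3))
    return err > threshold                # high reconstruction error => attack
```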

Locating Cross-Task Sequence Continuation Circuits in Transformers

  • paper_url: http://arxiv.org/abs/2311.04131
  • repo_url: None
  • paper_authors: Michael Lan, Fazl Barez
  • for: This paper investigates how transformer models carry out language tasks and how they can be reverse engineered into human-readable representations.
  • methods: The paper converts transformer computations into readable representations, called circuits, that implement algorithmic functions.
  • results: By analyzing and comparing circuits for related sequence-continuation tasks, the paper identifies key sub-circuits for detecting sequence members and for predicting the next member, and finds that semantically related sequences share circuit subgraphs with analogous roles.
    Abstract While transformer models exhibit strong capabilities on linguistic tasks, their complex architectures make them difficult to interpret. Recent work has aimed to reverse engineer transformer models into human-readable representations called circuits that implement algorithmic functions. We extend this research by analyzing and comparing circuits for similar sequence continuation tasks, which include increasing sequences of digits, number words, and months. Through the application of circuit analysis techniques, we identify key sub-circuits responsible for detecting sequence members and for predicting the next member in a sequence. Our analysis reveals that semantically related sequences rely on shared circuit subgraphs with analogous roles. Overall, documenting shared computational structures enables better prediction of model behaviors, identification of errors, and safer editing procedures. This mechanistic understanding of transformers is a critical step towards building more robust, aligned, and interpretable language models.

Unveiling Safety Vulnerabilities of Large Language Models

  • paper_url: http://arxiv.org/abs/2311.04124
  • repo_url: None
  • paper_authors: George Kour, Marcel Zalmanovici, Naama Zwerdling, Esther Goldbraich, Ora Nova Fandina, Ateret Anaby-Tavor, Orna Raz, Eitan Farchi
  • for: This paper is written to address the concern of harmful or inappropriate responses from large language models, and to provide a dataset (AttaQ) and a novel approach for identifying and naming vulnerable semantic regions in such models.
  • methods: The paper uses a unique dataset of adversarial examples in the form of questions, and introduces a novel automatic approach for identifying and naming vulnerable semantic regions in models using specialized clustering techniques.
  • results: The paper assesses the efficacy of its dataset and approach by analyzing the vulnerabilities of various models when subjected to the AttaQ dataset, and demonstrates the effectiveness of its approach in identifying and naming vulnerable semantic regions.
    Abstract As large language models become more prevalent, their possible harmful or inappropriate responses are a cause for concern. This paper introduces a unique dataset containing adversarial examples in the form of questions, which we call AttaQ, designed to provoke such harmful or inappropriate responses. We assess the efficacy of our dataset by analyzing the vulnerabilities of various models when subjected to it. Additionally, we introduce a novel automatic approach for identifying and naming vulnerable semantic regions - input semantic areas for which the model is likely to produce harmful outputs. This is achieved through the application of specialized clustering techniques that consider both the semantic similarity of the input attacks and the harmfulness of the model's responses. Automatically identifying vulnerable semantic regions enhances the evaluation of model weaknesses, facilitating targeted improvements to its safety mechanisms and overall reliability.
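The clustering step can be sketched as follows, with random arrays standing in for sentence embeddings of the attack questions and for harmfulness scores of the model's responses; the specialized clustering technique used in the paper may differ.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(500, 384))   # stand-in for sentence embeddings
harm_scores = rng.uniform(size=500)        # stand-in for a harmfulness judge

labels = KMeans(n_clusters=8, n_init=10, random_state=0).fit_predict(embeddings)
cluster_harm = {c: harm_scores[labels == c].mean() for c in range(8)}
# "Vulnerable semantic regions" correspond to clusters whose responses are
# the most harmful on average.
for c, h in sorted(cluster_harm.items(), key=lambda kv: -kv[1])[:3]:
    print(f"cluster {c}: mean harm {h:.2f}, size {(labels == c).sum()}")
```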

ETDPC: A Multimodality Framework for Classifying Pages in Electronic Theses and Dissertations

  • paper_url: http://arxiv.org/abs/2311.04262
  • repo_url: https://github.com/lamps-lab/ETDMiner
  • paper_authors: Muntabir Hasan Choudhury, Lamia Salsabil, William A. Ingram, Edward A. Fox, Jian Wu
  • for: This paper aims to segment Electronic Theses and Dissertations (ETDs) into 13 categories to facilitate navigation and exploration of the content.
  • methods: The proposed method, ETDPC, uses a two-stream multimodal model with a cross-attention network to classify ETD pages. To address the challenge of imbalanced labeled samples, the authors augmented data for minority categories and employed a hierarchical classifier.
  • results: ETDPC outperforms the state-of-the-art models in all categories, achieving an F1 score of 0.84 to 0.96 for 9 out of 13 categories. The authors also demonstrate the data efficiency of their approach.
    Abstract Electronic theses and dissertations (ETDs) have been proposed, advocated, and generated for more than 25 years. Although ETDs are hosted by commercial or institutional digital library repositories, they are still an understudied type of scholarly big data, partially because they are usually longer than conference proceedings and journals. Segmenting ETDs will allow researchers to study sectional content. Readers can navigate to particular pages of interest, discover, and explore the content buried in these long documents. Most existing frameworks on document page classification are designed for classifying general documents and perform poorly on ETDs. In this paper, we propose ETDPC. Its backbone is a two-stream multimodal model with a cross-attention network to classify ETD pages into 13 categories. To overcome the challenge of imbalanced labeled samples, we augmented data for minority categories and employed a hierarchical classifier. ETDPC outperforms the state-of-the-art models in all categories, achieving an F1 of 0.84 -- 0.96 for 9 out of 13 categories. We also demonstrated its data efficiency. The code and data can be found on GitHub (https://github.com/lamps-lab/ETDMiner/tree/master/etd_segmentation).

Evaluating Large Language Models in Ophthalmology

  • paper_url: http://arxiv.org/abs/2311.04933
  • repo_url: None
  • paper_authors: Jason Holmes, Shuyuan Ye, Yiwei Li, Shi-Nan Wu, Zhengliang Liu, Zihao Wu, Jinyu Hu, Huan Zhao, Xi Jiang, Wei Liu, Hong Wei, Jie Zou, Tianming Liu, Yi Shao
  • for: Evaluating three large language models (GPT-3.5, GPT-4, and PaLM2) on ophthalmology professional questions and comparing them with three professional levels (medical undergraduates, medical master's students, and attending physicians).
  • methods: A 100-item ophthalmology single-choice test administered to the three language models and the three professional groups, with a comprehensive evaluation and comparison of the LLMs' performance.
  • results: Every LLM exceeded the average score of the medical undergraduates; GPT-4 performed at the level of attending physicians, while GPT-3.5 and PaLM2 fell slightly below the master's level. GPT-4 also showed significantly higher answer stability and confidence than GPT-3.5 and PaLM2. Conclusion: GPT-4, as a representative LLM, performs strongly in ophthalmology; with further improvement, LLMs may bring substantial benefits to medical education and clinical decision making.
    Abstract Purpose: The performance of three different large language models (LLMs) (GPT-3.5, GPT-4, and PaLM2) in answering ophthalmology professional questions was evaluated and compared with that of three different professional populations (medical undergraduates, medical masters, and attending physicians). Methods: A 100-item ophthalmology single-choice test was administered to three different LLMs (GPT-3.5, GPT-4, and PaLM2) and three different professional levels (medical undergraduates, medical masters, and attending physicians), respectively. The performance of the LLMs was comprehensively evaluated and compared with the human groups in terms of average score, stability, and confidence. Results: Each LLM outperformed undergraduates in general, with GPT-3.5 and PaLM2 being slightly below the master's level, while GPT-4 showed a level comparable to that of attending physicians. In addition, GPT-4 showed significantly higher answer stability and confidence than GPT-3.5 and PaLM2. Conclusion: Our study shows that LLMs represented by GPT-4 perform better in the field of ophthalmology. With further improvements, LLMs will bring unexpected benefits in medical education and clinical decision making in the near future.

Multitask Multimodal Prompted Training for Interactive Embodied Task Completion

  • paper_url: http://arxiv.org/abs/2311.04067
  • repo_url: None
  • paper_authors: Georgios Pantazopoulos, Malvina Nikandrou, Amit Parekh, Bhathiya Hemanthage, Arash Eshghi, Ioannis Konstas, Verena Rieser, Oliver Lemon, Alessandro Suglia
  • for: This paper addresses two fundamental challenges faced by existing vision-and-language (VL) models: grounding language in trajectories of actions and observations, and referential disambiguation.
  • methods: The paper proposes an Embodied MultiModal Agent (EMMA), a unified encoder-decoder model that reasons over images and trajectories and casts action prediction as multimodal text generation. By unifying all tasks as text generation, EMMA learns a language of actions that facilitates transfer across tasks. Unlike previous modular approaches with independently trained components, a single multitask model is used in which every task contributes to goal completion.
  • results: EMMA performs on par with similar models on several VL benchmarks and sets a new state-of-the-art success rate of 36.81% on the Dialog-guided Task Completion (DTC) benchmark, which evaluates dialog-guided agents in the Alexa Arena.
    Abstract Interactive and embodied tasks pose at least two fundamental challenges to existing Vision & Language (VL) models, including 1) grounding language in trajectories of actions and observations, and 2) referential disambiguation. To tackle these challenges, we propose an Embodied MultiModal Agent (EMMA): a unified encoder-decoder model that reasons over images and trajectories, and casts action prediction as multimodal text generation. By unifying all tasks as text generation, EMMA learns a language of actions which facilitates transfer across tasks. Different from previous modular approaches with independently trained components, we use a single multitask model where each task contributes to goal completion. EMMA performs on par with similar models on several VL benchmarks and sets a new state-of-the-art performance (36.81% success rate) on the Dialog-guided Task Completion (DTC), a benchmark to evaluate dialog-guided agents in the Alexa Arena.

Can CLIP Help Sound Source Localization?

  • paper_url: http://arxiv.org/abs/2311.04066
  • repo_url: https://github.com/chrisneagu/FTC-Skystone-Dark-Angels-Romania-2020
  • paper_authors: Sooyoung Park, Arda Senocak, Joon Son Chung
  • for: This study explores how pre-trained image-text models can be used for sound source localization.
  • methods: The pre-trained CLIP model is used without explicit text input; audio signals are translated into tokens that CLIP's text encoder can process, and the resulting audio-driven embeddings are used to generate audio-grounded masks, extract audio-grounded image features from the highlighted regions, and align them through the audio-visual correspondence objective.
  • results: The method produces more complete and compact localization maps for the sounding objects, and experiments show that it outperforms existing methods.
    Abstract Large-scale pre-trained image-text models demonstrate remarkable versatility across diverse tasks, benefiting from their robust representational capabilities and effective multimodal alignment. We extend the application of these models, specifically CLIP, to the domain of sound source localization. Unlike conventional approaches, we employ the pre-trained CLIP model without explicit text input, relying solely on the audio-visual correspondence. To this end, we introduce a framework that translates audio signals into tokens compatible with CLIP's text encoder, yielding audio-driven embeddings. By directly using these embeddings, our method generates audio-grounded masks for the provided audio, extracts audio-grounded image features from the highlighted regions, and aligns them with the audio-driven embeddings using the audio-visual correspondence objective. Our findings suggest that utilizing pre-trained image-text models enable our model to generate more complete and compact localization maps for the sounding objects. Extensive experiments show that our method outperforms state-of-the-art approaches by a significant margin.
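Schematically, the audio-to-token idea might look like the sketch below: project audio features into a few pseudo text tokens, pool them into an audio-driven embedding (a frozen CLIP text encoder is faked here with a mean), and correlate it with image patch features to produce an audio-grounded mask. Every module name and dimension is assumed for illustration.

```python
import torch
import torch.nn as nn

dim = 512
audio_feat = torch.randn(1, 128)             # from any audio backbone (assumed)
to_tokens = nn.Linear(128, 4 * dim)          # 4 pseudo text tokens (assumed)
patch_feats = torch.randn(1, 49, dim)        # 7x7 CLIP-like image patch grid

tokens = to_tokens(audio_feat).view(1, 4, dim)
audio_embed = tokens.mean(dim=1)             # stand-in for the frozen text encoder

sim = torch.einsum("bd,bnd->bn", audio_embed, patch_feats)
mask = sim.softmax(dim=-1).view(1, 7, 7)     # audio-grounded localization map
print(mask.shape)
```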

Multi-View Causal Representation Learning with Partial Observability

  • paper_url: http://arxiv.org/abs/2311.04056
  • repo_url: None
  • paper_authors: Dingling Yao, Danru Xu, Sébastien Lachapelle, Sara Magliacane, Perouz Taslakian, Georg Martius, Julius von Kügelgen, Francesco Locatello
  • for: studying the identifiability of representations learned from simultaneously observed views, such as different data modalities.
  • methods: using contrastive learning and a single encoder per view to learn the information shared across all subsets of any number of views.
  • results: the paper provides a unified framework and theoretical results that extend and unify several previous works on multi-view nonlinear ICA, disentanglement, and causal representation learning, and experimentally validate the claims on numerical, image, and multi-modal data sets.
    Abstract We present a unified framework for studying the identifiability of representations learned from simultaneously observed views, such as different data modalities. We allow a partially observed setting in which each view constitutes a nonlinear mixture of a subset of underlying latent variables, which can be causally related. We prove that the information shared across all subsets of any number of views can be learned up to a smooth bijection using contrastive learning and a single encoder per view. We also provide graphical criteria indicating which latent variables can be identified through a simple set of rules, which we refer to as identifiability algebra. Our general framework and theoretical results unify and extend several previous works on multi-view nonlinear ICA, disentanglement, and causal representation learning. We experimentally validate our claims on numerical, image, and multi-modal data sets. Further, we demonstrate that the performance of prior methods is recovered in different special cases of our setup. Overall, we find that access to multiple partial views enables us to identify a more fine-grained representation, under the generally milder assumption of partial observability.
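The contrastive ingredient, one encoder per view with an InfoNCE objective over paired samples, can be sketched as follows (dimensions are illustrative):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

enc1 = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 32))
enc2 = nn.Sequential(nn.Linear(15, 64), nn.ReLU(), nn.Linear(64, 32))

def info_nce(z1, z2, tau=0.1):
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.T / tau                 # (B, B): positives on the diagonal
    return F.cross_entropy(logits, torch.arange(z1.size(0)))

x1, x2 = torch.randn(128, 20), torch.randn(128, 15)  # two views of each sample
loss = info_nce(enc1(x1), enc2(x2))
loss.backward()                              # trains both encoders jointly
```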

Causal Discovery Under Local Privacy

  • paper_url: http://arxiv.org/abs/2311.04037
  • repo_url: None
  • paper_authors: Rūta Binkytė, Carlos Pinzón, Szilvia Lestyán, Kangsoo Jung, Héber H. Arcolezi, Catuscia Palamidessi
  • for: This work studies the application of local privacy mechanisms, a variant of the differential privacy framework, to protect the sensitive information of data providers.
  • methods: Several well-known locally differentially private mechanisms are evaluated for the distortion they introduce into causal discovery tasks.
  • results: Different local privacy mechanisms distort causal discovery to different degrees, and the analysis yields valuable insights for selecting appropriate locally private protocols.
    Abstract Differential privacy is a widely adopted framework designed to safeguard the sensitive information of data providers within a data set. It is based on the application of controlled noise at the interface between the server that stores and processes the data, and the data consumers. Local differential privacy is a variant that allows data providers to apply the privatization mechanism themselves on their data individually. Therefore it provides protection also in contexts in which the server, or even the data collector, cannot be trusted. The introduction of noise, however, inevitably affects the utility of the data, particularly by distorting the correlations between individual data components. This distortion can prove detrimental to tasks such as causal discovery. In this paper, we consider various well-known locally differentially private mechanisms and compare the trade-off between the privacy they provide, and the accuracy of the causal structure produced by algorithms for causal learning when applied to data obfuscated by these mechanisms. Our analysis yields valuable insights for selecting appropriate local differentially private protocols for causal discovery tasks. We foresee that our findings will aid researchers and practitioners in conducting locally private causal discovery.
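As a concrete example of local privatization and the distortion it causes, the sketch below applies the standard Laplace mechanism to each provider's bounded numeric record and shows how the dependence that causal discovery relies on is attenuated. The epsilon and clipping bounds are illustrative.

```python
import numpy as np

def laplace_ldp(value, lo, hi, epsilon):
    """Locally privatize one bounded numeric value with the Laplace mechanism."""
    clipped = np.clip(value, lo, hi)
    scale = (hi - lo) / epsilon        # sensitivity of a value bounded in [lo, hi]
    return clipped + np.random.laplace(0.0, scale)

rng = np.random.default_rng(0)
x = rng.normal(size=1000)              # cause
y = 2.0 * x + rng.normal(size=1000)    # effect
x_priv = np.array([laplace_ldp(v, -4, 4, epsilon=2.0) for v in x])
y_priv = np.array([laplace_ldp(v, -10, 10, epsilon=2.0) for v in y])
# The noise weakens the observable dependence between cause and effect:
print(np.corrcoef(x, y)[0, 1], np.corrcoef(x_priv, y_priv)[0, 1])
```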

Impact of HPO on AutoML Forecasting Ensembles

  • paper_url: http://arxiv.org/abs/2311.04034
  • repo_url: None
  • paper_authors: David Hoffmann
  • for: This paper examines how combining a diverse set of forecasting methods into an ensemble, together with hyperparameter optimization, can improve forecasting accuracy.
  • methods: The ensemble combines several forecasting methods, including MQ-CNN, DeepAR, Prophet, NPTS, ARIMA, and ETS, and the paper studies the interaction between the ensemble and different hyperparameter optimization strategies.
  • results: Adding hyperparameter optimization to this setup improves forecast accuracy, yielding a 9.9% accuracy improvement in avg-wQL over the baseline ensemble without HPO at the cost of a 65.8% increase in end-to-end ensemble latency. The final configuration also outperforms the commercial AutoML forecasting solution Amazon Forecast, with a 3.5% lower error and 16.0% lower end-to-end ensemble latency.
    Abstract A forecasting ensemble consisting of a diverse range of estimators for both local and global univariate forecasting, in particular MQ-CNN, DeepAR, Prophet, NPTS, ARIMA and ETS, can be used to make forecasts for a variety of problems. This paper delves into the aspect of adding different hyperparameter optimization strategies to the deep learning models in such a setup (DeepAR and MQ-CNN), exploring the trade-off between added training cost and the increase in accuracy for different configurations. It shows that in such a setup, adding hyperparameter optimization can lead to performance improvements, with the final setup having a 9.9% accuracy improvement with respect to the avg-wQL over the baseline ensemble without HPO, accompanied by a 65.8% increase in end-to-end ensemble latency. This improvement is based on an empirical analysis of combining the ensemble pipeline with different tuning strategies, namely Bayesian Optimisation and Hyperband, and different configurations of those strategies. In the final configuration, the proposed combination of ensemble learning and HPO outperforms the state-of-the-art commercial AutoML forecasting solution, Amazon Forecast, with a 3.5% lower error and 16.0% lower end-to-end ensemble latency.
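A toy version of the trade-off being measured, with a scikit-learn regressor and a small grid standing in for the paper's forecasting models and for its Bayesian Optimisation and Hyperband strategies: tuning improves validation accuracy but adds measurable tuning time.

```python
import time
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 8))
y = X[:, 0] * 3 + np.sin(X[:, 1]) + rng.normal(scale=0.3, size=600)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

def fit_score(**params):
    model = GradientBoostingRegressor(random_state=0, **params).fit(X_tr, y_tr)
    return model.score(X_val, y_val)

baseline = fit_score()                              # default hyperparameters
start = time.time()
trials = [{"learning_rate": lr, "max_depth": d}     # tiny grid as HPO stand-in
          for lr in (0.03, 0.1, 0.3) for d in (2, 3, 4)]
best = max(trials, key=lambda p: fit_score(**p))
print(f"baseline R2={baseline:.3f}, tuned R2={fit_score(**best):.3f}, "
      f"extra tuning time={time.time() - start:.1f}s")
```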

IoT-Based Environmental Control System for Fish Farms with Sensor Integration and Machine Learning Decision Support

  • paper_url: http://arxiv.org/abs/2311.04258
  • repo_url: None
  • paper_authors: D. Dhinakaran, S. Gopalakrishnan, M. D. Manigandan, T. P. Anish
  • for: This work aims to improve environmental control and production efficiency in fish farming to meet the growing global demand for seafood while emphasizing environmental responsibility and economic sustainability.
  • methods: The system combines Internet of Things (IoT) technology with machine-learning decision support: a wireless sensor network deployed in the fish farm continuously collects real-time data on environmental parameters, including water temperature, pH, humidity, and fish behavior, and the data undergo careful preprocessing including imputation, outlier detection, feature engineering, and synchronization.
  • results: Four machine learning algorithms drive the control loop: Random Forests predict and optimize water temperature and pH; Support Vector Machines (SVMs) act as an early warning system for fish diseases and parasites; Gradient Boosting Machines (GBMs) dynamically adjust the feeding schedule to real-time environmental conditions; and Neural Networks operate critical equipment such as water pumps and heaters. Together these algorithms make real-time decisions that keep the farm's environmental conditions within predefined specifications, improving fish health and yield while reducing resource waste, and thereby increasing sustainability and economic viability.
    Abstract In response to the burgeoning global demand for seafood and the challenges of managing fish farms, we introduce an innovative IoT based environmental control system that integrates sensor technology and advanced machine learning decision support. Deploying a network of wireless sensors within the fish farm, we continuously collect real-time data on crucial environmental parameters, including water temperature, pH levels, humidity, and fish behavior. This data undergoes meticulous preprocessing to ensure its reliability, including imputation, outlier detection, feature engineering, and synchronization. At the heart of our system are four distinct machine learning algorithms: Random Forests predict and optimize water temperature and pH levels for the fish, fostering their health and growth; Support Vector Machines (SVMs) function as an early warning system, promptly detecting diseases and parasites in fish; Gradient Boosting Machines (GBMs) dynamically fine-tune the feeding schedule based on real-time environmental conditions, promoting resource efficiency and fish productivity; Neural Networks manage the operation of critical equipment like water pumps and heaters to maintain the desired environmental conditions within the farm. These machine learning algorithms collaboratively make real-time decisions to ensure that the fish farm's environmental conditions align with predefined specifications, leading to improved fish health and productivity while simultaneously reducing resource wastage, thereby contributing to increased profitability and sustainability. This research article showcases the power of data-driven decision support in fish farming, promising to meet the growing demand for seafood while emphasizing environmental responsibility and economic viability, thus revolutionizing the future of fish farming.

Expressivity of ReLU-Networks under Convex Relaxations

  • paper_url: http://arxiv.org/abs/2311.04015
  • repo_url: None
  • paper_authors: Maximilian Baader, Mark Niklas Müller, Yuhao Mao, Martin Vechev
  • for: The paper investigates whether convex relaxations impose fundamental limits on training and certifying provably safe neural networks.
  • methods: The paper studies the expressive power of ReLU networks under all commonly used convex relaxations, including the IBP relaxation and more precise ones, asking which functions can be expressed by networks whose analysis remains exact.
  • results: More precise convex relaxations allow a larger class of univariate functions to be expressed as precisely analyzable ReLU networks and can admit exponentially larger solution spaces of networks encoding the same functions; however, even with the most precise single-neuron relaxations, it is impossible to construct precisely analyzable ReLU networks expressing multivariate, convex, monotone CPWL functions.
    Abstract Convex relaxations are a key component of training and certifying provably safe neural networks. However, despite substantial progress, a wide and poorly understood accuracy gap to standard networks remains, raising the question of whether this is due to fundamental limitations of convex relaxations. Initial work investigating this question focused on the simple and widely used IBP relaxation. It revealed that some univariate, convex, continuous piecewise linear (CPWL) functions cannot be encoded by any ReLU network such that its IBP-analysis is precise. To explore whether this limitation is shared by more advanced convex relaxations, we conduct the first in-depth study on the expressive power of ReLU networks across all commonly used convex relaxations. We show that: (i) more advanced relaxations allow a larger class of univariate functions to be expressed as precisely analyzable ReLU networks, (ii) more precise relaxations can allow exponentially larger solution spaces of ReLU networks encoding the same functions, and (iii) even using the most precise single-neuron relaxations, it is impossible to construct precisely analyzable ReLU networks that express multivariate, convex, monotone CPWL functions.
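For readers unfamiliar with IBP, the worked example below shows the relaxation in its simplest form: propagate an input interval through an affine layer and a ReLU to obtain sound (but possibly loose) output bounds.

```python
import numpy as np

def ibp_affine(lo, hi, W, b):
    mid, rad = (lo + hi) / 2, (hi - lo) / 2
    center = W @ mid + b
    radius = np.abs(W) @ rad           # worst-case growth of the interval
    return center - radius, center + radius

def ibp_relu(lo, hi):
    return np.maximum(lo, 0), np.maximum(hi, 0)

W1, b1 = np.array([[1.0, -1.0], [1.0, 1.0]]), np.zeros(2)
lo, hi = np.array([-0.1, -0.1]), np.array([0.1, 0.1])  # input perturbation box
lo, hi = ibp_relu(*ibp_affine(lo, hi, W1, b1))
print("certified output bounds:", lo, hi)
```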

A Method to Improve the Performance of Reinforcement Learning Based on the Y Operator for a Class of Stochastic Differential Equation-Based Child-Mother Systems

  • paper_url: http://arxiv.org/abs/2311.04014
  • repo_url: None
  • paper_authors: Cheng Yin, Yi Chen
  • for: Improving the control performance of actor-critic reinforcement learning, particularly for systems governed by stochastic differential equations (SDEs).
  • methods: The paper proposes a new operator, termed the Y operator, which integrates the stochasticity of a class of child-mother SDE systems into the critic network's loss function, and reformulates the problem of solving partial differential equations for the state-value function into a parallel problem for the drift and diffusion functions of the system's SDEs.
  • results: The resulting Y Operator-based Reinforcement Learning (YORL) framework efficiently handles optimal control problems in both model-based and data-driven systems, and shows clear improvements over existing methods in linear and nonlinear numerical examples after convergence.
    Abstract This paper introduces a novel operator, termed the Y operator, to elevate control performance in Actor-Critic (AC) based reinforcement learning for systems governed by stochastic differential equations (SDEs). The Y operator ingeniously integrates the stochasticity of a class of child-mother system into the Critic network's loss function, yielding substantial advancements in the control performance of RL algorithms. Additionally, the Y operator elegantly reformulates the challenge of solving partial differential equations for the state-value function into a parallel problem for the drift and diffusion functions within the system's SDEs. A rigorous mathematical proof confirms the operator's validity. This transformation enables the Y Operator-based Reinforcement Learning (YORL) framework to efficiently tackle optimal control problems in both model-based and data-driven systems. The superiority of YORL is demonstrated through linear and nonlinear numerical examples showing its enhanced performance over existing methods post convergence.

The Energy Prediction Smart-Meter Dataset: Analysis of Previous Competitions and Beyond

  • paper_url: http://arxiv.org/abs/2311.04007
  • repo_url: None
  • paper_authors: Direnc Pekaslan, Jose Maria Alonso-Moral, Kasun Bandara, Christoph Bergmeir, Juan Bernabe-Moreno, Robert Eigenmann, Nils Einecke, Selvi Ergen, Rakshitha Godahewa, Hansika Hewamalage, Jesus Lago, Steffen Limmer, Sven Rebhan, Boris Rabinovich, Dilini Rajapasksha, Heda Song, Christian Wagner, Wenlong Wu, Luis Magdalena, Isaac Triguero
  • for: This work presents a real-world smart-meter dataset and analyzes solutions from two energy prediction technical challenges: the 2020 IEEE Computational Intelligence Society (IEEE-CIS) Technical Challenge on Energy Prediction from Smart Meter data (EP) and its 2021 follow-up at the IEEE International Conference on Fuzzy Systems (FUZZ-IEEE) (XEP). Both challenges target accurate forecasting of household energy consumption and the ability to explain the underlying factors.
  • methods: The paper reviews the competition solutions, which employ a range of machine learning and forecasting approaches to improve both the accuracy and the interpretability of energy predictions.
  • results: The paper analyzes the real-world smart-meter dataset, presents accurate household-level forecasting methods, and introduces evaluation criteria for assessing interpretability.
    Abstract This paper presents the real-world smart-meter dataset and offers an analysis of solutions derived from the Energy Prediction Technical Challenges, focusing primarily on two key competitions: the IEEE Computational Intelligence Society (IEEE-CIS) Technical Challenge on Energy Prediction from Smart Meter data in 2020 (named EP) and its follow-up challenge at the IEEE International Conference on Fuzzy Systems (FUZZ-IEEE) in 2021 (named XEP). These competitions focus on accurate energy consumption forecasting and the importance of interpretability in understanding the underlying factors. The challenge aims to predict monthly and yearly estimated consumption for households, addressing the accurate billing problem with limited historical smart meter data. The dataset comprises 3,248 smart meters, with varying data availability ranging from a minimum of one month to a year. This paper delves into the challenges, solutions, and analysis issues related to the provided real-world smart meter data, developing accurate predictions at the household level, and introducing evaluation criteria for assessing interpretability. Additionally, this paper discusses aspects beyond the competitions: opportunities for energy disaggregation and pattern detection applications at the household level, the significance of communicating energy-driven factors for optimised billing, and the importance of responsible AI and data privacy considerations. These aspects provide insights into the broader implications and potential advancements in energy consumption prediction. Overall, these competitions provide a dataset for residential energy research and serve as a catalyst for exploring accurate forecasting, enhancing interpretability, and driving progress towards the discussion of various aspects such as energy disaggregation, demand response programs or behavioural interventions.

Foundational propositions of hesitant fuzzy sets and parameter reductions of hesitant fuzzy information systems

  • paper_url: http://arxiv.org/abs/2311.04256
  • repo_url: None
  • paper_authors: Shizhan Lu
  • for: This paper studies the definition and application of hesitant fuzzy sets in situations of uncertainty and hesitation.
  • methods: Based on the discrete form of hesitant fuzzy membership degrees, several kinds of inclusion relationships for hesitant fuzzy sets are proposed, together with foundational propositions for hesitant fuzzy sets and families of hesitant fuzzy sets.
  • results: Foundational propositions for parameter reductions of hesitant fuzzy information systems are put forward, with an example and an algorithm illustrating the reduction process.
    Abstract Hesitant fuzzy sets are widely used in instances of uncertainty and hesitation. The inclusion relationship is an important and foundational definition for sets. Hesitant fuzzy sets, as a kind of set, need an explicit definition of the inclusion relationship. Based on the discrete form of the hesitant fuzzy membership degree, several kinds of inclusion relationships for hesitant fuzzy sets are proposed. Then, some foundational propositions of hesitant fuzzy sets and the families of hesitant fuzzy sets are presented. Finally, some foundational propositions of hesitant fuzzy information systems with respect to parameter reductions are put forward, and an example and an algorithm are given to illustrate the process of parameter reduction.
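As an illustration only, the snippet below checks one possible component-wise inclusion relation between hesitant fuzzy elements (length-normalize by repeating the largest membership degree, then compare sorted degrees). The paper proposes several inclusion relationships; this particular definition is an assumption made for the example, not necessarily one of the paper's.

```python
# Hypothetical inclusion check for hesitant fuzzy sets (definition assumed).
def normalize(h, length):
    h = sorted(h)
    return h + [h[-1]] * (length - len(h))   # pad with the largest degree

def hfe_leq(h1, h2):
    n = max(len(h1), len(h2))
    return all(a <= b for a, b in zip(normalize(h1, n), normalize(h2, n)))

def hfs_included(A, B):
    """A and B map each element of the universe to a hesitant fuzzy element."""
    return all(hfe_leq(A[e], B[e]) for e in A)

A = {"x1": [0.2, 0.4], "x2": [0.3]}
B = {"x1": [0.3, 0.6], "x2": [0.5, 0.7]}
print(hfs_included(A, B))   # True under this component-wise relation
```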

Human-AI Collaboration in Thematic Analysis using ChatGPT: A User Study and Design Recommendations

  • paper_url: http://arxiv.org/abs/2311.03999
  • repo_url: None
  • paper_authors: Lixiang Yan, Vanessa Echeverria, Gloria Fernandez Nieto, Yueqiao Jin, Zachari Swiecki, Linxuan Zhao, Dragan Gašević, Roberto Martinez-Maldonado
  • for: This study explores how researchers collaborate with generative artificial intelligence (GenAI), in particular using ChatGPT for thematic analysis.
  • methods: Through a user study with ten qualitative researchers, the authors examine how ChatGPT supports the analysis, finding that it improves coding efficiency, aids initial data exploration, offers granular quantitative insights, and assists comprehension for non-native speakers and non-experts.
  • results: Although ChatGPT shows clear value, researchers remain concerned about its trustworthiness and accuracy, reliability and consistency, and broader acceptance within the research community. The study contributes five actionable design recommendations for effective human-AI collaboration: incorporating transparent explanatory mechanisms, enhancing interface and integration capabilities, prioritizing contextual understanding and customization, embedding human-AI feedback loops and iterative functionality, and strengthening trust through validation mechanisms.
    Abstract Generative artificial intelligence (GenAI) offers promising potential for advancing human-AI collaboration in qualitative research. However, existing works focused on conventional machine-learning and pattern-based AI systems, and little is known about how researchers interact with GenAI in qualitative research. This work delves into researchers' perceptions of their collaboration with GenAI, specifically ChatGPT. Through a user study involving ten qualitative researchers, we found ChatGPT to be a valuable collaborator for thematic analysis, enhancing coding efficiency, aiding initial data exploration, offering granular quantitative insights, and assisting comprehension for non-native speakers and non-experts. Yet, concerns about its trustworthiness and accuracy, reliability and consistency, limited contextual understanding, and broader acceptance within the research community persist. We contribute five actionable design recommendations to foster effective human-AI collaboration. These include incorporating transparent explanatory mechanisms, enhancing interface and integration capabilities, prioritising contextual understanding and customisation, embedding human-AI feedback loops and iterative functionality, and strengthening trust through validation mechanisms.

Learned Causal Method Prediction

  • paper_url: http://arxiv.org/abs/2311.03989
  • repo_url: None
  • paper_authors: Shantanu Gupta, Cheng Zhang, Agrin Hilmkil
  • for: This paper aims to make the selection of causal inference methods more efficient, since causal methods typically rely on complex, hard-to-verify assumptions and cross-validation is inapplicable because ground-truth causal quantities are unobserved.
  • methods: The CAusal Method Predictor (CAMP) framework predicts the best method for a given dataset: it generates datasets from a diverse set of synthetic causal models, scores the candidate methods, and trains a model to directly predict the highest-scoring method for each dataset. A self-supervised pre-training objective centered on dataset assumptions relevant for causal inference reduces the need for costly labeled data and improves training efficiency.
  • results: For causal discovery, CAMP outperforms selecting any individual candidate method and generalizes well to unseen semi-synthetic and real-world benchmarks.
    Abstract For a given causal question, it is important to efficiently decide which causal inference method to use for a given dataset. This is challenging because causal methods typically rely on complex and difficult-to-verify assumptions, and cross-validation is not applicable since ground truth causal quantities are unobserved. In this work, we propose CAusal Method Predictor (CAMP), a framework for predicting the best method for a given dataset. To this end, we generate datasets from a diverse set of synthetic causal models, score the candidate methods, and train a model to directly predict the highest-scoring method for that dataset. Next, by formulating a self-supervised pre-training objective centered on dataset assumptions relevant for causal inference, we significantly reduce the need for costly labeled data and enhance training efficiency. Our strategy learns to map implicit dataset properties to the best method in a data-driven manner. In our experiments, we focus on method prediction for causal discovery. CAMP outperforms selecting any individual candidate method and demonstrates promising generalization to unseen semi-synthetic and real-world benchmarks.
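Conceptually, the pipeline can be sketched as below: featurize synthetic datasets, label each with its best-scoring candidate method, and fit a classifier that maps dataset features to methods. The features, candidate methods, and scores here are random stand-ins rather than CAMP's actual components.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
METHODS = ["pc", "ges", "notears"]           # hypothetical candidate methods

def dataset_features(X):
    return np.array([X.shape[1], X.var(), np.abs(np.corrcoef(X.T)).mean()])

feats, labels = [], []
for _ in range(300):                          # datasets from synthetic models
    X = rng.normal(size=(200, int(rng.integers(3, 8))))
    feats.append(dataset_features(X))
    # Stand-in for scoring each method against the known synthetic ground truth:
    scores = rng.dirichlet(np.ones(len(METHODS)))
    labels.append(int(np.argmax(scores)))

camp = RandomForestClassifier(random_state=0).fit(np.array(feats), labels)
new_X = rng.normal(size=(200, 5))
print("predicted method:", METHODS[camp.predict([dataset_features(new_X)])[0]])
```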

Its All Graph To Me: Foundational Topology Models with Contrastive Learning on Multiple Domains

  • paper_url: http://arxiv.org/abs/2311.03976
  • repo_url: None
  • paper_authors: Alex O. Davies, Riku W. Green, Nirav S. Ajmeri, Telmo M. Silva Filho
  • for: This work proposes an approach to learning graph representations across multiple domains, so that pre-trained models can be applied where data or labels are scarce.
  • methods: The model is pre-trained with adversarial contrastive learning on graphs from many domains, using topologies only.
  • results: Compared with baselines pre-trained on single domains, untrained models, and non-transferred models, the single multi-domain model achieves equal or better performance, especially when node labels are used in evaluation.
    Abstract Representations and embeddings of graph data have been essential in many domains of research. The principal benefit of learning such representations is that the pre-trained model can be fine-tuned on smaller datasets where data or labels are scarce. Existing models, however, are domain specific; for example, a model trained on molecular graphs is fine-tuned on other molecular graphs. This means that in many application cases the choice of pre-trained model can be arbitrary, and novel domains may lack an appropriate pre-trained model. This is a particular issue where data is scarce, precluding traditional supervised methods. In this work we use adversarial contrastive learning to present a model pre-trained on many graph domains. We train the model only on topologies but include node labels in evaluation. We evaluate the efficacy of its learnt representations on various downstream tasks. Against baseline models pre-trained on single domains, as well as un-trained models and non-transferred models, we show that performance is equal or better using our single model. This includes when node labels are used in evaluation, where performance is consistently superior to single-domain or non-pre-trained models.
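A topology-only contrastive pre-training step might be sketched as follows, with a degree-histogram encoder as a deliberately simple stand-in for a GNN and random edge dropping as the augmentation; the adversarial component of the paper's method is omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def augment(adj, drop=0.2):
    keep = (torch.rand_like(adj) > drop).float()
    keep = torch.triu(keep, 1)
    return adj * (keep + keep.T)              # drop edges, keep symmetry

def degree_hist(adj, bins=16):
    return torch.histc(adj.sum(dim=1), bins=bins, min=0, max=float(adj.size(0)))

encoder = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 32))

adjs = [(torch.rand(20, 20) > 0.7).float() for _ in range(64)]
adjs = [torch.triu(a, 1) + torch.triu(a, 1).T for a in adjs]   # simple graphs
z1 = F.normalize(encoder(torch.stack([degree_hist(augment(a)) for a in adjs])), dim=1)
z2 = F.normalize(encoder(torch.stack([degree_hist(augment(a)) for a in adjs])), dim=1)
loss = F.cross_entropy(z1 @ z2.T / 0.2, torch.arange(len(adjs)))  # NT-Xent style
loss.backward()
```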

An Expectation-Realization Model for Metaphor Detection

  • paper_url: http://arxiv.org/abs/2311.03963
  • repo_url: None
  • paper_authors: Oseremen O. Uduehi, Razvan C. Bunescu
  • for: This paper proposes a metaphor detection architecture built around expectation and realization modules to improve metaphor detection accuracy.
  • methods: The method uses two main modules: an expectation component that estimates representations of literal word expectations given a context, and a realization component that computes representations of actual word meanings in context. The overall architecture is trained to learn expectation-realization (ER) patterns that characterize metaphorical uses of words.
  • results: Evaluated on three metaphor datasets for within-distribution, out-of-distribution, and novel metaphor generalization, the proposed method obtains results that are competitive with or better than the state of the art, and ensembling ER models further improves metaphor detection accuracy.
    Abstract We propose a metaphor detection architecture that is structured around two main modules: an expectation component that estimates representations of literal word expectations given a context, and a realization component that computes representations of actual word meanings in context. The overall architecture is trained to learn expectation-realization (ER) patterns that characterize metaphorical uses of words. When evaluated on three metaphor datasets for within distribution, out of distribution, and novel metaphor generalization, the proposed method is shown to obtain results that are competitive or better than state-of-the art. Further increases in metaphor detection accuracy are obtained through ensembling of ER models.
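Schematically, the two modules produce paired vectors whose interaction a classifier scores. The sketch below assumes generic stand-in encoders and a common concatenation, difference, and product interaction scheme, which may differ from the paper's exact architecture.

```python
import torch
import torch.nn as nn

dim = 128
expectation = torch.randn(4, dim)   # e.g. a masked-LM prediction at the target word
realization = torch.randn(4, dim)   # e.g. the contextual embedding of that word

classifier = nn.Sequential(nn.Linear(4 * dim, 64), nn.ReLU(), nn.Linear(64, 2))
pair = torch.cat([expectation, realization,
                  expectation - realization,
                  expectation * realization], dim=1)
metaphor_logits = classifier(pair)  # literal vs. metaphorical
print(metaphor_logits.shape)        # (4, 2)
```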

Elastic Information Bottleneck

  • paper_url: http://arxiv.org/abs/2311.03955
  • repo_url: https://github.com/nyyxxx/elastic-information-bottleneck
  • paper_authors: Yuyan Ni, Yanyan Lan, Ao Liu, Zhiming Ma
  • for: This paper studies the information bottleneck principle for representation learning in deep learning and compares two instantiations: the information bottleneck (IB) and the deterministic information bottleneck (DIB).
  • methods: The paper analyzes the generalization abilities of IB and DIB in a transfer learning scenario, decomposing the target error into source empirical error, source generalization gap (SG), and representation discrepancy (RD), and proposes an elastic information bottleneck (EIB) that interpolates between the IB and DIB regularizers.
  • results: DIB's SG bound is tighter than IB's while DIB's RD is larger than IB's, so neither dominates; EIB guarantees a Pareto frontier within the IB framework, and simulations and real-data experiments show that EIB achieves better domain adaptation results than IB and DIB.
    Abstract Information bottleneck is an information-theoretic principle of representation learning that aims to learn a maximally compressed representation that preserves as much information about labels as possible. Under this principle, two different methods have been proposed, i.e., information bottleneck (IB) and deterministic information bottleneck (DIB), and have gained significant progress in explaining the representation mechanisms of deep learning algorithms. However, these theoretical and empirical successes are only valid with the assumption that training and test data are drawn from the same distribution, which is clearly not satisfied in many real-world applications. In this paper, we study their generalization abilities within a transfer learning scenario, where the target error could be decomposed into three components, i.e., source empirical error, source generalization gap (SG), and representation discrepancy (RD). Comparing IB and DIB on these terms, we prove that DIB's SG bound is tighter than IB's while DIB's RD is larger than IB's. Therefore, it is difficult to tell which one is better. To balance the trade-off between SG and the RD, we propose an elastic information bottleneck (EIB) to interpolate between the IB and DIB regularizers, which guarantees a Pareto frontier within the IB framework. Additionally, simulations and real data experiments show that EIB has the ability to achieve better domain adaptation results than IB and DIB, which validates the correctness of our theories.
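In standard information bottleneck notation, the objectives being interpolated can be summarized roughly as follows; the exact EIB weighting is defined in the paper, so this is a hedged paraphrase rather than its precise form.

```latex
% IB compresses via mutual information, DIB via representation entropy:
\[
  \mathcal{L}_{\mathrm{IB}}  = I(X;Z) - \beta\, I(Z;Y), \qquad
  \mathcal{L}_{\mathrm{DIB}} = H(Z) - \beta\, I(Z;Y).
\]
% One natural elastic interpolation of the two regularizers:
\[
  \mathcal{L}_{\mathrm{EIB}} = \alpha\, H(Z) + (1 - \alpha)\, I(X;Z)
                               - \beta\, I(Z;Y), \qquad \alpha \in [0, 1],
\]
% which recovers IB at $\alpha = 0$ and DIB at $\alpha = 1$.
```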

The Music Meta Ontology: a flexible semantic model for the interoperability of music metadata

  • paper_url: http://arxiv.org/abs/2311.03942
  • repo_url: None
  • paper_authors: Jacopo de Berardinis, Valentina Anita Carriero, Albert Meroño-Peñuela, Andrea Poltronieri, Valentina Presutti
  • for: The paper provides a semantic description of music metadata to enable the creation of music datasets that can be aligned, integrated, and accessed for information retrieval and knowledge discovery.
  • methods: Following eXtreme Design methodologies and data engineering best practices, the model reflects the perspectives and requirements of various stakeholders (musicologists, librarians, data engineers, etc.), while leveraging ontology design patterns and accounting for provenance at different levels (claims, links).
  • results: The paper introduces the Music Meta ontology, a rich and flexible semantic model describing music metadata related to artists, compositions, performances, recordings, and links, and provides a first evaluation of the model, alignments to other schemas (Music Ontology, DOREMUS, Wikidata), and support for data transformation.
    Abstract The semantic description of music metadata is a key requirement for the creation of music datasets that can be aligned, integrated, and accessed for information retrieval and knowledge discovery. It is nonetheless an open challenge due to the complexity of musical concepts arising from different genres, styles, and periods -- standing to benefit from a lingua franca to accommodate various stakeholders (musicologists, librarians, data engineers, etc.). To initiate this transition, we introduce the Music Meta ontology, a rich and flexible semantic model to describe music metadata related to artists, compositions, performances, recordings, and links. We follow eXtreme Design methodologies and best practices for data engineering, to reflect the perspectives and the requirements of various stakeholders into the design of the model, while leveraging ontology design patterns and accounting for provenance at different levels (claims, links). After presenting the main features of Music Meta, we provide a first evaluation of the model, alignments to other schema (Music Ontology, DOREMUS, Wikidata), and support for data transformation.

Everything of Thoughts: Defying the Law of Penrose Triangle for Thought Generation

  • paper_url: http://arxiv.org/abs/2311.04254
  • repo_url: None
  • paper_authors: Ruomeng Ding, Chaoyun Zhang, Lu Wang, Yong Xu, Minghua Ma, Wei Zhang, Si Qin, Saravan Rajmohan, Qingwei Lin, Dongmei Zhang
  • For: The paper aims to improve the capabilities of Large Language Models (LLMs) in decision-making by developing a novel thought prompting approach called "Everything of Thoughts" (XoT).
  • Methods: The approach leverages pretrained reinforcement learning and Monte Carlo Tree Search (MCTS) to incorporate external domain knowledge into thoughts, and autonomously produces high-quality comprehensive cognitive mappings with minimal LLM interactions.
  • Results: The approach enables LLMs to generalize to unseen problems efficiently and to engage in unconstrained thinking, allowing flexible cognitive mappings for problems with multiple solutions.
    Abstract Recent advancements in Large Language Models (LLMs) have revolutionized decision-making by breaking down complex problems into more manageable language sequences referred to as ``thoughts''. An effective thought design should consider three key perspectives: performance, efficiency, and flexibility. However, existing thought paradigms can exhibit at most two of these attributes. To address these limitations, we introduce a novel thought prompting approach called ``Everything of Thoughts'' (XoT) to defy the law of the ``Penrose triangle'' of existing thought paradigms. XoT leverages pretrained reinforcement learning and Monte Carlo Tree Search (MCTS) to incorporate external domain knowledge into thoughts, thereby enhancing LLMs' capabilities and enabling them to generalize to unseen problems efficiently. Through the utilization of the MCTS-LLM collaborative thought revision framework, this approach autonomously produces high-quality comprehensive cognitive mappings with minimal LLM interactions. Additionally, XoT empowers LLMs to engage in unconstrained thinking, allowing for flexible cognitive mappings for problems with multiple solutions.
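
XoT pairs a pretrained policy/value model with MCTS to search over thought sequences. The sketch below shows the generic MCTS loop such an approach builds on, with a toy state space and a hand-written heuristic standing in for the learned value network; the task, names, and constants are illustrative.

```python
# Generic MCTS skeleton of the kind XoT uses to search over "thoughts".
# Here a thought is a partial integer sequence and the (hypothetical) goal is
# to reach a target sum; XoT would replace `estimate_value` with a pretrained
# value network and bias expansion with a policy network.
import math
import random

TARGET, MAX_LEN, ACTIONS = 10, 4, [1, 2, 3]

class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children, self.visits, self.value = {}, 0, 0.0

def estimate_value(state):            # stand-in for a learned value model
    s = sum(state)
    return 1.0 if s == TARGET else max(0.0, 1 - abs(TARGET - s) / TARGET)

def select(node):                     # descend via UCT until a leaf
    while node.children:
        node = max(
            node.children.values(),
            key=lambda c: c.value / (c.visits + 1e-9)
            + 1.4 * math.sqrt(math.log(c.parent.visits + 1) / (c.visits + 1e-9)),
        )
    return node

def expand(node):
    if len(node.state) < MAX_LEN:
        for a in ACTIONS:
            node.children[a] = Node(node.state + [a], parent=node)
        return random.choice(list(node.children.values()))
    return node

def backpropagate(node, reward):
    while node is not None:
        node.visits += 1
        node.value += reward
        node = node.parent

root = Node([])
for _ in range(500):
    leaf = expand(select(root))
    backpropagate(leaf, estimate_value(leaf.state))

best = max(root.children.values(), key=lambda c: c.visits)
print("most-visited first thought:", best.state)
```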

MixtureGrowth: Growing Neural Networks by Recombining Learned Parameters

  • paper_url: http://arxiv.org/abs/2311.04251
  • repo_url: https://github.com/chaudatascience/mixturegrowth
  • paper_authors: Chau Pham, Piotr Teterwak, Soren Nelson, Bryan A. Plummer
  • for: The paper proposes a new approach to growing networks that avoids the overhead of retraining from scratch.
  • methods: MixtureGrowth generates each layer's parameters as a linear combination of parameter templates; newly grown layer weights use a new linear combination of existing templates, giving them the flexibility to learn something new.
  • results: Compared with a larger network trained from scratch, MixtureGrowth improves top-1 accuracy by 2-2.5% on CIFAR-100 and ImageNet while requiring fewer FLOPs.
    Abstract Most deep neural networks are trained under fixed network architectures and require retraining when the architecture changes. If expanding the network's size is needed, it is necessary to retrain from scratch, which is expensive. To avoid this, one can grow from a small network by adding random weights over time to gradually achieve the target network size. However, this naive approach falls short in practice as it brings too much noise to the growing process. Prior work tackled this issue by leveraging the already learned weights and training data for generating new weights through conducting a computationally expensive analysis step. In this paper, we introduce MixtureGrowth, a new approach to growing networks that circumvents the initialization overhead in prior work. Before growing, each layer in our model is generated with a linear combination of parameter templates. Newly grown layer weights are generated by using a new linear combination of existing templates for a layer. On one hand, these templates are already trained for the task, providing a strong initialization. On the other, the new coefficients provide flexibility for the added layer weights to learn something new. We show that our approach boosts top-1 accuracy over the state-of-the-art by 2-2.5% on CIFAR-100 and ImageNet datasets, while achieving comparable performance with fewer FLOPs to a larger network trained from scratch. Code is available at https://github.com/chaudatascience/mixturegrowth.
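
The core mechanism is easy to sketch: every layer's weight matrix is a linear combination of shared parameter templates, and a grown layer reuses the trained templates with fresh coefficients. Shapes, template counts, and the wiring below are illustrative assumptions, not the paper's configuration.

```python
# Sketch of the template idea behind MixtureGrowth: each layer's weight is a
# linear combination of shared parameter templates, and a newly grown layer
# reuses the trained templates with fresh coefficients.
import torch
import torch.nn as nn

class TemplateLinear(nn.Module):
    def __init__(self, templates):
        super().__init__()
        self.templates = templates                       # shared, already trained
        self.coeffs = nn.Parameter(torch.randn(len(templates)))

    def forward(self, x):
        w = sum(c * t for c, t in zip(self.coeffs, self.templates))
        return x @ w.t()

# Bank of templates shared across layers (would be pretrained in practice).
templates = nn.ParameterList([nn.Parameter(torch.randn(64, 64)) for _ in range(4)])

trained_layer = TemplateLinear(templates)   # part of the small, trained net
grown_layer = TemplateLinear(templates)     # new layer: same templates, new coeffs

x = torch.randn(8, 64)
print(grown_layer(trained_layer(x)).shape)  # torch.Size([8, 64])
```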

Temporal Graph Representation Learning with Adaptive Augmentation Contrastive

  • paper_url: http://arxiv.org/abs/2311.03897
  • repo_url: None
  • paper_authors: Hongjiang Chen, Pengfei Jiao, Huijun Tang, Huaming Wu
  • for: This work aims to generate low-dimensional dynamic node embeddings that capture temporal information as well as structural and property information.
  • methods: The authors propose a novel temporal graph representation learning model (TGAC) that performs adaptive augmentation by combining prior knowledge with temporal information, and defines a contrastive objective over the augmented views.
  • results: Extensive experiments show that the proposed model outperforms other temporal graph representation learning methods on a variety of real networks.
    Abstract Temporal graph representation learning aims to generate low-dimensional dynamic node embeddings to capture temporal information as well as structural and property information. Current representation learning methods for temporal networks often focus on capturing fine-grained information, which may lead to the model capturing random noise instead of essential semantic information. While graph contrastive learning has shown promise in dealing with noise, it only applies to static graphs or snapshots and may not be suitable for handling time-dependent noise. To alleviate the above challenge, we propose a novel Temporal Graph representation learning with Adaptive augmentation Contrastive (TGAC) model. The adaptive augmentation on the temporal graph is made by combining prior knowledge with temporal information, and the contrastive objective function is constructed by defining the augmented inter-view contrast and intra-view contrast. To complement TGAC, we propose three adaptive augmentation strategies that modify topological features to reduce noise from the network. Our extensive experiments on various real networks demonstrate that the proposed model outperforms other temporal graph representation learning methods.
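
A minimal sketch of the inter-view contrastive objective such a model optimizes, assuming node embeddings from two augmented views are already computed; the encoder and the adaptive augmentation itself are stubbed.

```python
# Minimal inter-view contrastive objective of the kind TGAC optimizes between
# two augmented views of the same temporal graph: node i's embedding in view 1
# should match node i in view 2 against all other nodes.
import torch
import torch.nn.functional as F

def inter_view_nce(z1, z2, tau=0.5):
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / tau                 # similarity of every node pair
    targets = torch.arange(z1.size(0))         # positives on the diagonal
    return F.cross_entropy(logits, targets)

n_nodes, dim = 128, 32
z_view1 = torch.randn(n_nodes, dim)            # encoder output, augmented view 1
z_view2 = torch.randn(n_nodes, dim)            # encoder output, augmented view 2
print(inter_view_nce(z_view1, z_view2))
```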

Unifying Structure and Language Semantic for Efficient Contrastive Knowledge Graph Completion with Structured Entity Anchors

  • paper_url: http://arxiv.org/abs/2311.04250
  • repo_url: None
  • paper_authors: Sang-Hyun Je, Wontae Choi, Kwangjin Oh
  • for: Predicting missing links in a knowledge graph (KG) from the training facts that are already known.
  • methods: Pre-trained language model (PLM) based approaches exploit both textual and structural information, but their performance lags behind state-of-the-art structure-based methods, or they lose inductive inference capability when fusing structure embeddings into the text encoder.
  • results: The paper proposes a new method that effectively unifies structural information and language semantics without losing inductive reasoning power. It adopts entity anchors, feeding them together with the textual descriptions of KG elements into a PLM-based encoder to learn unified representations, and reuses random negative samples within each mini-batch during contrastive learning to learn generalized entity representations. Experiments show the proposed model outperforms existing SOTA KGC models on standard link-prediction benchmarks, with the largest gain on FB15K-237, where it is competitive with structure-based KGC methods.
    Abstract The goal of knowledge graph completion (KGC) is to predict missing links in a KG using trained facts that are already known. Recently, pre-trained language model (PLM) based methods that utilize both textual and structural information have been emerging, but their performance lags behind state-of-the-art (SOTA) structure-based methods, or some methods lose their inductive inference capabilities in the process of fusing structure embedding into the text encoder. In this paper, we propose a novel method to effectively unify structure information and language semantics without losing the power of inductive reasoning. We adopt entity anchors, and these anchors and the textual description of KG elements are fed together into the PLM-based encoder to learn unified representations. In addition, the proposed method utilizes additional random negative samples which can be reused in each mini-batch during contrastive learning to learn generalized entity representations. We verify the effectiveness of our proposed method through various experiments and analysis. The experimental results on standard benchmarks widely used in the link prediction task show that the proposed model outperforms existing SOTA KGC models. Especially, our method shows the largest performance improvement on FB15K-237, which is competitive with the SOTA of structure-based KGC methods.
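
A sketch of the contrastive step with in-batch negatives described above, assuming the PLM has already produced query and tail embeddings; anchor construction and the encoder itself are stubbed with random tensors, and the temperature is an illustrative choice.

```python
# Contrastive KGC training with in-batch negatives: the PLM encoding of a
# (head, relation) query should score its gold tail above the other tails in
# the mini-batch. Encoder outputs are stubbed with random embeddings.
import torch
import torch.nn.functional as F

batch, dim = 16, 128
q = torch.randn(batch, dim)   # PLM embedding of [anchor; text] for (head, rel)
t = torch.randn(batch, dim)   # PLM embedding of the gold tail entities

scores = F.normalize(q, dim=1) @ F.normalize(t, dim=1).t()  # batch x batch
labels = torch.arange(batch)          # row i's positive is column i
loss = F.cross_entropy(scores / 0.05, labels)
print(loss)
```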

Understanding Tool Discovery and Tool Innovation Using Active Inference

  • paper_url: http://arxiv.org/abs/2311.03893
  • repo_url: None
  • paper_authors: Poppy Collis, Paul F Kinghorn, Christopher L Buckley
  • for: This paper explores how artificial agents can discover and invent new tools.
  • methods: The paper gives a minimal description distinguishing tool discovery from tool innovation under the formalism of active inference, then constructs a toy model of tool innovation by introducing tool affordances as a hidden-state factor in the agent's probabilistic generative model.
  • results: Introducing tool affordances into the hidden states allows the agent not merely to discover tools but to invent them through offline induction of an appropriate tool property. These preliminary results suggest new directions for research on tool innovation in artificial agents.
    Abstract The ability to invent new tools has been identified as an important facet of our ability as a species to problem solve in dynamic and novel environments. While the use of tools by artificial agents presents a challenging task and has been widely identified as a key goal in the field of autonomous robotics, far less research has tackled the invention of new tools by agents. In this paper, (1) we articulate the distinction between tool discovery and tool innovation by providing a minimal description of the two concepts under the formalism of active inference. We then (2) apply this description to construct a toy model of tool innovation by introducing the notion of tool affordances into the hidden states of the agent's probabilistic generative model. This particular state factorisation facilitates the ability to not just discover tools but invent them through the offline induction of an appropriate tool property. We discuss the implications of these preliminary results and outline future directions of research.

Formulating Discrete Probability Flow Through Optimal Transport

  • paper_url: http://arxiv.org/abs/2311.03886
  • repo_url: https://github.com/pangzecheung/discrete-probability-flow
  • paper_authors: Pengze Zhang, Hubery Yin, Chen Li, Xiaohua Xie
  • for: This paper aims to establish the fundamental theory for the probability flow of discrete diffusion models.
  • methods: The authors first prove that the continuous probability flow is the Monge optimal transport map under certain conditions, and present equivalent evidence for the discrete case. Based on these findings, they define the discrete probability flow in line with the principles of optimal transport and propose a novel sampling method on top of it.
  • results: Extensive experiments on a synthetic toy dataset and CIFAR-10 validate the effectiveness of the proposed discrete probability flow. Code is available at: https://github.com/PangzeCheung/Discrete-Probability-Flow.
    Abstract Continuous diffusion models are commonly acknowledged to display a deterministic probability flow, whereas discrete diffusion models do not. In this paper, we aim to establish the fundamental theory for the probability flow of discrete diffusion models. Specifically, we first prove that the continuous probability flow is the Monge optimal transport map under certain conditions, and also present an equivalent evidence for discrete cases. In view of these findings, we are then able to define the discrete probability flow in line with the principles of optimal transport. Finally, drawing upon our newly established definitions, we propose a novel sampling method that surpasses previous discrete diffusion models in its ability to generate more certain outcomes. Extensive experiments on the synthetic toy dataset and the CIFAR-10 dataset have validated the effectiveness of our proposed discrete probability flow. Code is released at: https://github.com/PangzeCheung/Discrete-Probability-Flow.
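
As a toy illustration of the Monge view the paper takes, the exact optimal one-to-one transport map between two empirical distributions can be computed with the Hungarian algorithm; this is a generic optimal-transport computation, not the paper's sampler.

```python
# The discrete Monge map between two empirical distributions (uniform over n
# points each) computed exactly via linear assignment on the pairwise cost.
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)
source = rng.normal(0, 1, size=(64, 2))        # x ~ p_0
target = rng.normal(3, 0.5, size=(64, 2))      # y ~ p_1

cost = ((source[:, None, :] - target[None, :, :]) ** 2).sum(-1)  # squared L2
rows, cols = linear_sum_assignment(cost)       # optimal one-to-one coupling

print("mean transport cost:", cost[rows, cols].mean())
# The map source[i] -> target[cols[i]] is the discrete analogue of the
# deterministic probability flow discussed in the paper.
```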

Mini but Mighty: Finetuning ViTs with Mini Adapters

  • paper_url: http://arxiv.org/abs/2311.03873
  • repo_url: https://github.com/iemprog/mimi
  • paper_authors: Imad Eddine Marouf, Enzo Tartaglione, Stéphane Lathuilière
  • for: Improving the performance of adapters while avoiding the prohibitive training and storage cost of full fine-tuning.
  • methods: The paper proposes MiMi, a training framework that starts from large adapters and iteratively reduces their size. It also introduces a new scoring function, designed specifically for adapters, that compares neuron importance across layers to automatically estimate the hidden dimension of every adapter.
  • results: Across the DomainNet, VTAB, and Multi-task benchmarks (29 datasets in total), MiMi finds the best trade-off between accuracy and trained parameters, outperforming existing methods.
    Abstract Vision Transformers (ViTs) have become one of the dominant architectures in computer vision, and pre-trained ViT models are commonly adapted to new tasks via fine-tuning. Recent works proposed several parameter-efficient transfer learning methods, such as adapters, to avoid the prohibitive training and storage cost of finetuning. In this work, we observe that adapters perform poorly when the dimension of adapters is small, and we propose MiMi, a training framework that addresses this issue. We start with large adapters which can reach high performance, and iteratively reduce their size. To enable automatic estimation of the hidden dimension of every adapter, we also introduce a new scoring function, specifically designed for adapters, that compares the neuron importance across layers. Our method outperforms existing methods in finding the best trade-off between accuracy and trained parameters across the three dataset benchmarks DomainNet, VTAB, and Multi-task, for a total of 29 datasets.
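
A sketch of the two ingredients: a standard bottleneck adapter and a magnitude-based neuron-importance score used to shrink it. The scoring statistic here is a simplified stand-in for MiMi's cross-layer comparison, and the dimensions are illustrative.

```python
# A bottleneck adapter of the kind MiMi starts large and then shrinks, plus a
# simple weight-magnitude importance score per hidden neuron.
import torch
import torch.nn as nn

class Adapter(nn.Module):
    def __init__(self, dim, hidden):
        super().__init__()
        self.down, self.up = nn.Linear(dim, hidden), nn.Linear(hidden, dim)
        self.act = nn.GELU()

    def forward(self, x):
        return x + self.up(self.act(self.down(x)))   # residual adapter

def neuron_importance(adapter):
    # score each hidden neuron by the weight mass flowing through it
    return adapter.down.weight.abs().sum(1) * adapter.up.weight.abs().sum(0)

a = Adapter(dim=768, hidden=64)
scores = neuron_importance(a)
keep = scores.topk(32).indices          # iterative size reduction: 64 -> 32
print(keep.shape)
```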

FD-MIA: Efficient Attacks on Fairness-enhanced Models

  • paper_url: http://arxiv.org/abs/2311.03865
  • repo_url: None
  • paper_authors: Huan Tian, Guangsheng Zhang, Bo Liu, Tianqing Zhu, Ming Ding, Wanlei Zhou
  • for: This paper examines fairness-enhanced models and their vulnerability to membership inference attacks (MIAs).
  • methods: The paper proposes a fairness-discrepancy-based attack (FD-MIA) that exploits the difference between the predictions of the original and fairness-enhanced models as attack clues.
  • results: The study finds that score-based MIAs are ineffective against fairness-enhanced binary classifiers, degrading into simplistic threshold models with lower attack performance. Moreover, fairness methods often degrade prediction performance for the majority subgroups of the training data, raising the barrier to successful attacks while widening the prediction gaps between member and non-member data.
    Abstract Previous studies have developed fairness methods for biased models that exhibit discriminatory behaviors towards specific subgroups. While these models have shown promise in achieving fair predictions, recent research has identified their potential vulnerability to score-based membership inference attacks (MIAs). In these attacks, adversaries can infer whether a particular data sample was used during training by analyzing the model's prediction scores. However, our investigations reveal that these score-based MIAs are ineffective when targeting fairness-enhanced models in binary classifications. The attack models trained to launch the MIAs degrade into simplistic threshold models, resulting in lower attack performance. Meanwhile, we observe that fairness methods often lead to prediction performance degradation for the majority subgroups of the training data. This raises the barrier to successful attacks and widens the prediction gaps between member and non-member data. Building upon these insights, we propose an efficient MIA method against fairness-enhanced models based on fairness discrepancy results (FD-MIA). It leverages the difference in the predictions from both the original and fairness-enhanced models and exploits the observed prediction gaps as attack clues. We also explore potential strategies for mitigating privacy leakages. Extensive experiments validate our findings and demonstrate the efficacy of the proposed method.
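
The attack signal described above reduces to a simple recipe: score each sample by the discrepancy between the two models' predictions and threshold it. A minimal sketch with stubbed model scores follows; in practice the threshold would be fit on shadow data rather than set to the median.

```python
# Core of a fairness-discrepancy attack: use the gap between the original
# model's and the fairness-enhanced model's scores as the attack feature.
# Model outputs are stubbed with synthetic arrays.
import numpy as np

rng = np.random.default_rng(1)
scores_original = rng.uniform(0.5, 1.0, size=1000)   # hypothetical confidences
scores_fair = scores_original - rng.uniform(0.0, 0.3, size=1000)

discrepancy = np.abs(scores_original - scores_fair)
threshold = np.median(discrepancy)      # stand-in for a fitted threshold
is_member_pred = discrepancy > threshold
print("predicted member fraction:", is_member_pred.mean())
```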

Aspects of human memory and Large Language Models

  • paper_url: http://arxiv.org/abs/2311.03839
  • repo_url: https://github.com/rmldj/memory-llm-paper
  • paper_authors: Romuald A. Janik
  • for: To study the memory properties of Large Language Models (LLMs) and the extent to which they mimic human memory.
  • methods: By analyzing the memory behavior of LLMs and the statistical properties of their training data, the authors find surprising similarities with key characteristics of human memory.
  • results: The results indicate that the human-like memory properties of LLMs do not follow automatically from the architecture but are learned from the statistics of the training text, suggesting that biological features of human memory leave an imprint on the way we structure our textual narratives.
    Abstract Large Language Models (LLMs) are huge artificial neural networks which primarily serve to generate text, but also provide a very sophisticated probabilistic model of language use. Since generating a semantically consistent text requires a form of effective memory, we investigate the memory properties of LLMs and find surprising similarities with key characteristics of human memory. We argue that the human-like memory properties of the Large Language Model do not follow automatically from the LLM architecture but are rather learned from the statistics of the training textual data. These results strongly suggest that the biological features of human memory leave an imprint on the way that we structure our textual narratives.

Reducing Spatial Fitting Error in Distillation of Denoising Diffusion Models

  • paper_url: http://arxiv.org/abs/2311.03830
  • repo_url: https://github.com/Sainzerjj/SFERD
  • paper_authors: Shengzhe Zhou, Zejian Lee, Shengyuan Zhang, Lefan Hou, Changyuan Yang, Guang Yang, Lingyun Sun
  • For: Improving the sampling quality of diffusion models through knowledge distillation.
  • Methods: Attention guidance from the teacher model and a purpose-built semantic gradient predictor are used to reduce the student model's fitting error.
  • Results: The method enables high-quality one-step generation, achieving FID scores of 5.31 on CIFAR-10 and 9.39 on ImageNet 64×64.
    Abstract Denoising Diffusion models have exhibited remarkable capabilities in image generation. However, generating high-quality samples requires a large number of iterations. Knowledge distillation for diffusion models is an effective method to address this limitation with a shortened sampling process but causes degraded generative quality. Based on our analysis with bias-variance decomposition and experimental observations, we attribute the degradation to the spatial fitting error occurring in the training of both the teacher and student model. Accordingly, we propose $\textbf{S}$patial $\textbf{F}$itting-$\textbf{E}$rror $\textbf{R}$eduction $\textbf{D}$istillation model ($\textbf{SFERD}$). SFERD utilizes attention guidance from the teacher model and a designed semantic gradient predictor to reduce the student's fitting error. Empirically, our proposed model facilitates high-quality sample generation in a few function evaluations. We achieve an FID of 5.31 on CIFAR-10 and 9.39 on ImageNet 64$\times$64 with only one step, outperforming existing diffusion methods. Our study provides a new perspective on diffusion distillation by highlighting the intrinsic denoising ability of models.

Rethinking and Improving Multi-task Learning for End-to-end Speech Translation

  • paper_url: http://arxiv.org/abs/2311.03810
  • repo_url: https://github.com/xiaozhang521/imtl
  • paper_authors: Yuhao Zhang, Chen Xu, Bei Li, Hao Chen, Tong Xiao, Chunliang Zhang, Jingbo Zhu
  • For: This paper investigates the consistency between different tasks in end-to-end speech translation (ST) and proposes an improved multi-task learning (IMTL) approach to bridge the modal gap and improve ST performance.
  • Methods: The authors examine task consistency across different training times and modules, and propose an improved multi-task learning approach that mitigates the differences in length and representation between modalities.
  • Results: Experiments on the MuST-C dataset achieve state-of-the-art results; with additional data, the method sets a new SOTA on the MuST-C English-to-Spanish task with 20.8% of the training time required by the current SOTA method.
    Abstract Significant improvements in end-to-end speech translation (ST) have been achieved through the application of multi-task learning. However, the extent to which auxiliary tasks are highly consistent with the ST task, and how much this approach truly helps, have not been thoroughly studied. In this paper, we investigate the consistency between different tasks, considering different times and modules. We find that the textual encoder primarily facilitates cross-modal conversion, but the presence of noise in speech impedes the consistency between text and speech representations. Furthermore, we propose an improved multi-task learning (IMTL) approach for the ST task, which bridges the modal gap by mitigating the difference in length and representation. We conduct experiments on the MuST-C dataset. The results demonstrate that our method attains state-of-the-art results. Moreover, when additional data is used, we achieve the new SOTA result on MuST-C English to Spanish task with 20.8% of the training time required by the current SOTA method.
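
One common way to attack the modal gap the paper studies is to pool away the length mismatch and pull the speech and text encoder states together; the sketch below shows that generic recipe with stubbed tensors, not necessarily IMTL's exact formulation.

```python
# Generic modality-gap term for multi-task ST training: mean-pool over time to
# sidestep the speech/text length mismatch, then align the representations.
import torch
import torch.nn.functional as F

def modality_gap_loss(speech_states, text_states):
    # one common way to bridge the speech/text gap; a stand-in, not IMTL itself
    return F.mse_loss(speech_states.mean(dim=1), text_states.mean(dim=1))

speech = torch.randn(8, 300, 256)   # longer speech encoder states
text = torch.randn(8, 40, 256)      # shorter text encoder states
st_loss = torch.tensor(2.0)          # stand-in ST task loss for one batch
total = st_loss + 0.3 * modality_gap_loss(speech, text)
print(total)
```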

Scene-Driven Multimodal Knowledge Graph Construction for Embodied AI

  • paper_url: http://arxiv.org/abs/2311.03783
  • repo_url: None
  • paper_authors: Song Yaoxian, Sun Penglei, Liu Haoyu, Li Zhixu, Song Wei, Xiao Yanghua, Zhou Xiaofang
  • For: The paper addresses the challenge of scene knowledge in embodied AI and proposes a scene-driven multimodal knowledge graph construction method to improve the intelligence of real-world agents.
  • Methods: The proposed method combines conventional knowledge engineering and large language models in a unified scene knowledge injection framework, which the authors instantiate for typical indoor robotic functionalities (manipulation and mobility).
  • Results: Experiments show that knowledge-enhanced methods using the instantiated ManipMob-MMKG improve performance on embodied tasks without complex re-design of model structures.
    Abstract Embodied AI is one of the most popular studies in artificial intelligence and robotics, which can effectively improve the intelligence of real-world agents (i.e. robots) serving human beings. Scene knowledge is important for an agent to understand the surroundings and make correct decisions in the varied open world. Currently, knowledge base for embodied tasks is missing and most existing work use general knowledge base or pre-trained models to enhance the intelligence of an agent. For conventional knowledge base, it is sparse, insufficient in capacity and cost in data collection. For pre-trained models, they face the uncertainty of knowledge and hard maintenance. To overcome the challenges of scene knowledge, we propose a scene-driven multimodal knowledge graph (Scene-MMKG) construction method combining conventional knowledge engineering and large language models. A unified scene knowledge injection framework is introduced for knowledge representation. To evaluate the advantages of our proposed method, we instantiate Scene-MMKG considering typical indoor robotic functionalities (Manipulation and Mobility), named ManipMob-MMKG. Comparisons in characteristics indicate our instantiated ManipMob-MMKG has broad superiority in data-collection efficiency and knowledge quality. Experimental results on typical embodied tasks show that knowledge-enhanced methods using our instantiated ManipMob-MMKG can improve the performance obviously without re-designing model structures complexly. Our project can be found at https://sites.google.com/view/manipmob-mmkg

Ensembling Textual and Structure-Based Models for Knowledge Graph Completion

  • paper_url: http://arxiv.org/abs/2311.03780
  • repo_url: None
  • paper_authors: Ananjan Nandi, Navdeep Kaur, Parag Singla, Mausam
  • for: The paper proposes a way to combine the two popular approaches to knowledge graph completion (KGC): textual models and structure-based models.
  • methods: Two families of methods are considered: textual models, which rely on the textual descriptions of entities in the knowledge graph, and structure-based models, which exploit the connectivity structure of the KG. The paper ensembles them with query-dependent weights learned from the score distributions of the individual models.
  • results: Preliminary experiments show the two approaches have complementary strengths: structure-based models perform well when the gold answer is easily reachable from the query head in the KG, while textual models exploit descriptions to perform well even when the gold answer is not reachable.
    Abstract We consider two popular approaches to Knowledge Graph Completion (KGC): textual models that rely on textual entity descriptions, and structure-based models that exploit the connectivity structure of the Knowledge Graph (KG). Preliminary experiments show that these approaches have complementary strengths: structure-based models perform well when the gold answer is easily reachable from the query head in the KG, while textual models exploit descriptions to give good performance even when the gold answer is not reachable. In response, we explore ensembling as a way of combining the best of both approaches. We propose a novel method for learning query-dependent ensemble weights by using the distributions of scores assigned by individual models to all candidate entities. Our ensemble baseline achieves state-of-the-art results on three standard KGC datasets, with up to 6.8 pt MRR and 8.3 pt Hits@1 gains over best individual models.
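
A sketch of query-dependent ensembling: turn each model's candidate-entity scores into a distribution, derive a per-query confidence (negative entropy here, an illustrative choice rather than the paper's learned weighting), and mix the models accordingly.

```python
# Query-dependent ensemble of a textual and a structure-based KGC model:
# sharper (lower-entropy) score distributions get higher weight per query.
import torch
import torch.nn.functional as F

def combine(text_scores, struct_scores):
    weights = []
    for s in (text_scores, struct_scores):
        p = F.softmax(s, dim=-1)
        entropy = -(p * p.clamp_min(1e-9).log()).sum(-1, keepdim=True)
        weights.append(-entropy)              # lower entropy -> higher weight
    w = F.softmax(torch.cat(weights, dim=-1), dim=-1)   # per-query model weights
    return (w[:, :1] * F.softmax(text_scores, -1)
            + w[:, 1:] * F.softmax(struct_scores, -1))

queries, entities = 4, 1000
print(combine(torch.randn(queries, entities), torch.randn(queries, entities)).shape)
```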

PT-Tuning: Bridging the Gap between Time Series Masked Reconstruction and Forecasting via Prompt Token Tuning

  • paper_url: http://arxiv.org/abs/2311.03768
  • repo_url: None
  • paper_authors: Hao Liu, Jinrui Gan, Xiaoxuan Fan, Yi Zhang, Chuanxian Luo, Jing Zhang, Guangxin Jiang, Yucheng Qian, Changwei Zhao, Huan Ma, Zhenyu Guo
  • for: bridging the gap between time series masked reconstruction and forecasting
  • methods: reserving pre-trained mask token during fine-tuning stage, prompt token tuning (PT-Tuning) paradigm
  • results: state-of-the-art performance compared to representation learning and end-to-end supervised forecasting methods
    Abstract Self-supervised learning has been actively studied in time series domain recently, especially for masked reconstruction. Most of these methods follow the "Pre-training + Fine-tuning" paradigm in which a new decoder replaces the pre-trained decoder to fit for a specific downstream task, leading to inconsistency of upstream and downstream tasks. In this paper, we first point out that the unification of task objectives and adaptation for task difficulty are critical for bridging the gap between time series masked reconstruction and forecasting. By reserving the pre-trained mask token during fine-tuning stage, the forecasting task can be taken as a special case of masked reconstruction, where the future values are masked and reconstructed based on history values. It guarantees the consistency of task objectives but there is still a gap in task difficulty. Because masked reconstruction can utilize contextual information while forecasting can only use historical information to reconstruct. To further mitigate the existed gap, we propose a simple yet effective prompt token tuning (PT-Tuning) paradigm, in which all pre-trained parameters are frozen and only a few trainable prompt tokens are added to extended mask tokens in element-wise manner. Extensive experiments on real-world datasets demonstrate the superiority of our proposed paradigm with state-of-the-art performance compared to representation learning and end-to-end supervised forecasting methods.
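
A minimal sketch of the paradigm: freeze every pretrained parameter, keep the pre-trained mask tokens for the future positions, and add a few trainable prompt tokens to them element-wise. Dimensions and the backbone are illustrative, not the paper's architecture.

```python
# Prompt token tuning: the backbone and mask token stay frozen; the only
# trainable parameters are prompt tokens added element-wise to the mask
# tokens standing in for the future values to be forecast.
import torch
import torch.nn as nn

d_model, horizon = 64, 24
backbone = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True), num_layers=2)
mask_token = nn.Parameter(torch.zeros(1, 1, d_model), requires_grad=False)

for p in backbone.parameters():          # everything pretrained stays frozen
    p.requires_grad_(False)

prompt = nn.Parameter(torch.zeros(1, horizon, d_model))  # the only trainable part

history = torch.randn(8, 96, d_model)                     # embedded past values
future_slots = mask_token.expand(8, horizon, d_model) + prompt  # element-wise add
out = backbone(torch.cat([history, future_slots], dim=1))
print(out.shape)   # (8, 120, 64); the last `horizon` positions feed the head
```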

Augmenting Radio Signals with Wavelet Transform for Deep Learning-Based Modulation Recognition

  • paper_url: http://arxiv.org/abs/2311.03761
  • repo_url: None
  • paper_authors: Tao Chen, Shilian Zheng, Kunfeng Qiu, Luxin Zhang, Qi Xuan, Xiaoniu Yang
  • for: This paper proposes a data augmentation approach for deep learning-based radio modulation recognition, addressing the scarcity of training data in real-world scenarios.
  • methods: New samples are generated by replacing the detail coefficients of a discrete wavelet transform decomposition before reconstruction, increasing the diversity and quantity of the training set and reducing data sparsity and imbalance. Different generation methods are used to produce the replacement sequences.
  • results: Simulation results show that the proposed methods significantly outperform other augmentation methods.
    Abstract The use of deep learning for radio modulation recognition has become prevalent in recent years. This approach automatically extracts high-dimensional features from large datasets, facilitating the accurate classification of modulation schemes. However, in real-world scenarios, it may not be feasible to gather sufficient training data in advance. Data augmentation is a method used to increase the diversity and quantity of training dataset and to reduce data sparsity and imbalance. In this paper, we propose data augmentation methods that involve replacing detail coefficients decomposed by discrete wavelet transform for reconstructing to generate new samples and expand the training set. Different generation methods are used to generate replacement sequences. Simulation results indicate that our proposed methods significantly outperform the other augmentation methods.
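
The augmentation itself is a few lines with PyWavelets: decompose, swap the detail coefficients for generated ones (here scale-matched Gaussian noise, one possible generation method among those the paper compares), and reconstruct. The signal is a toy stand-in for a radio channel.

```python
# DWT-based augmentation: decompose a signal, replace its detail coefficients,
# and reconstruct a new training sample.
import numpy as np
import pywt

rng = np.random.default_rng(0)
signal = np.cos(2 * np.pi * 0.05 * np.arange(1024))   # stand-in for one channel

coeffs = pywt.wavedec(signal, "db4", level=3)         # [cA3, cD3, cD2, cD1]
approx, details = coeffs[0], coeffs[1:]

# one possible generation method: noise matched to each detail band's scale
new_details = [rng.normal(0, d.std(), size=d.shape) for d in details]
augmented = pywt.waverec([approx] + new_details, "db4")

print(signal.shape, augmented.shape)
```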

Learning Decentralized Traffic Signal Controllers with Multi-Agent Graph Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2311.03756
  • repo_url: None
  • paper_authors: Yao Zhang, Zhiwen Yu, Jun Zhang, Liang Wang, Tom H. Luan, Bin Guo, Chau Yuen
  • for: This paper addresses optimal traffic signal control in smart cities, a complex networked system control problem. Given the interacting dynamics among traffic lights and road networks, controller adaptivity and scalability are the primary challenges.
  • methods: The authors adopt multi-agent reinforcement learning (MARL), noting that existing MARL algorithms ignore effective information aggregation, which is fundamental for improving the learning capacity of decentralized agents. They propose a new decentralized control architecture with improved environmental observability that captures spatial-temporal correlation through topology-aware information aggregation and diffusion convolution.
  • results: Extensive experiments on both synthetic and real-world data show that the proposed scheme outperforms existing decentralized methods.
    Abstract This paper considers optimal traffic signal control in smart cities, which has been taken as a complex networked system control problem. Given the interacting dynamics among traffic lights and road networks, attaining controller adaptivity and scalability stands out as a primary challenge. Capturing the spatial-temporal correlation among traffic lights under the framework of Multi-Agent Reinforcement Learning (MARL) is a promising solution. Nevertheless, existing MARL algorithms ignore effective information aggregation which is fundamental for improving the learning capacity of decentralized agents. In this paper, we design a new decentralized control architecture with improved environmental observability to capture the spatial-temporal correlation. Specifically, we first develop a topology-aware information aggregation strategy to extract correlation-related information from unstructured data gathered in the road network. Particularly, we transfer the road network topology into a graph shift operator by forming a diffusion process on the topology, which subsequently facilitates the construction of graph signals. A diffusion convolution module is developed, forming a new MARL algorithm, which endows agents with the capabilities of graph learning. Extensive experiments based on both synthetic and real-world datasets verify that our proposal outperforms existing decentralized algorithms.
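
A sketch of the diffusion-convolution idea: build a graph shift operator from the road-network adjacency and mix K hops of diffused features. The toy graph and the fixed coefficients are illustrative; in the paper's module the coefficients would be learned.

```python
# Diffusion convolution over a road-network topology: a random-walk graph
# shift operator applied 0..K-1 times, combined with per-hop coefficients.
import numpy as np

rng = np.random.default_rng(0)
A = (rng.random((6, 6)) < 0.4).astype(float)        # toy intersection graph
np.fill_diagonal(A, 0)
S = A / np.maximum(A.sum(1, keepdims=True), 1)      # graph shift operator

X = rng.normal(size=(6, 3))                          # per-intersection features
theta = [0.5, 0.3, 0.2]                              # diffusion coefficients, K=3

out = sum(t * np.linalg.matrix_power(S, k) @ X for k, t in enumerate(theta))
print(out.shape)   # (6, 3): features mixed over 0-, 1-, and 2-hop diffusion
```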

COOL: A Constraint Object-Oriented Logic Programming Language and its Neural-Symbolic Compilation System

  • paper_url: http://arxiv.org/abs/2311.03753
  • repo_url: None
  • paper_authors: Jipeng Han
  • for: This work investigates the integration of neural networks with logic programming, addressing the longstanding challenge of combining the generalization and learning capabilities of neural networks with the precision of symbolic logic. Traditional attempts have been hampered by difficulties in initial data acquisition, the unreliability of undertrained networks, and the complexity of reusing and augmenting trained models.
  • methods: The paper introduces the Constraint Object-Oriented Logic (COOL) programming language, which seamlessly combines logical reasoning with neural network technologies. COOL autonomously handles data collection, reducing the need for user-supplied initial data; it incorporates user prompts into the coding process to reduce the risk of undertraining; and it enhances interaction among models throughout their lifecycle to promote reuse and augmentation of networks.
  • results: The foundational principles and algorithms in COOL's design and its compilation system could provide valuable insights for future developments in programming languages and neural network architectures.
    Abstract This paper explores the integration of neural networks with logic programming, addressing the longstanding challenges of combining the generalization and learning capabilities of neural networks with the precision of symbolic logic. Traditional attempts at this integration have been hampered by difficulties in initial data acquisition, the reliability of undertrained networks, and the complexity of reusing and augmenting trained models. To overcome these issues, we introduce the COOL (Constraint Object-Oriented Logic) programming language, an innovative approach that seamlessly combines logical reasoning with neural network technologies. COOL is engineered to autonomously handle data collection, mitigating the need for user-supplied initial data. It incorporates user prompts into the coding process to reduce the risks of undertraining and enhances the interaction among models throughout their lifecycle to promote the reuse and augmentation of networks. Furthermore, the foundational principles and algorithms in COOL's design and its compilation system could provide valuable insights for future developments in programming languages and neural network architectures.

Analysis and Applications of Deep Learning with Finite Samples in Full Life-Cycle Intelligence of Nuclear Power Generation

  • paper_url: http://arxiv.org/abs/2311.04247
  • repo_url: None
  • paper_authors: Chenwei Tang, Wenqiang Zhou, Dong Wang, Caiyang Yu, Zhenan He, Jizhe Zhou, Shudong Huang, Yi Gao, Jianming Chen, Wentao Feng, Jiancheng Lv
  • for: This study examines the application of deep learning in the complex industrial settings of nuclear power generation (NPG) in the Industry 4.0 era, providing a technical foundation and a fresh perspective for industrial intelligence across the full life cycle.
  • methods: The study reviews deep learning from the finite-sample perspective, covering small-sample learning, few-shot learning, zero-shot learning, and open-set recognition, with reference to the unique data characteristics of NPG (long-tailed class distribution, sample imbalance, domain shift).
  • results: Two case studies spanning NPG's life cycle, automatic recognition of zirconium alloy metallography and open-set recognition for machinery sensor signal diagnosis, demonstrate reliable and efficient application of deep learning under finite samples.
    Abstract The advent of Industry 4.0 has precipitated the incorporation of Artificial Intelligence (AI) methods within industrial contexts, aiming to realize intelligent manufacturing, operation as well as maintenance, also known as industrial intelligence. However, intricate industrial milieus, particularly those relating to energy exploration and production, frequently encompass data characterized by long-tailed class distribution, sample imbalance, and domain shift. These attributes pose noteworthy challenges to data-centric Deep Learning (DL) techniques, crucial for the realization of industrial intelligence. The present study centers on the intricate and distinctive industrial scenarios of Nuclear Power Generation (NPG), meticulously scrutinizing the application of DL techniques under the constraints of finite data samples. Initially, the paper expounds on potential employment scenarios for AI across the full life-cycle of NPG. Subsequently, we delve into an evaluative exposition of DL's advancement, grounded in the finite sample perspective. This encompasses aspects such as small-sample learning, few-shot learning, zero-shot learning, and open-set recognition, also referring to the unique data characteristics of NPG. The paper then proceeds to present two specific case studies. The first revolves around the automatic recognition of zirconium alloy metallography, while the second pertains to open-set recognition for signal diagnosis of machinery sensors. These cases, spanning the entirety of NPG's life-cycle, are accompanied by constructive outcomes and insightful deliberations. By exploring and applying DL methodologies within the constraints of finite sample availability, this paper not only furnishes a robust technical foundation but also introduces a fresh perspective toward the secure and efficient advancement and exploitation of this advanced energy source.

Leveraging Large Language Models for Automated Proof Synthesis in Rust

  • paper_url: http://arxiv.org/abs/2311.03739
  • repo_url: None
  • paper_authors: Jianan Yao, Ziqiao Zhou, Weiteng Chen, Weidong Cui
  • for: Formal verification can provably guarantee the correctness of critical system software, but the high proof burden has long hindered its wide adoption; this paper aims to lower that burden.
  • methods: The paper combines Large Language Models (LLMs) with static analysis to synthesize invariants, assertions, and other proof structures for Verus, a Rust-based formal verification framework.
  • results: In a few-shot setting, LLMs show impressive logical ability in generating postconditions and loop invariants, especially for short code snippets, but they lack the context-retention and propagation strengths of traditional static analysis. Based on these observations, the authors built a prototype on OpenAI's GPT-4 that decomposes the verification task into smaller ones, iteratively queries GPT-4, and combines its output with lightweight static analysis. Evaluated with a developer in the loop on 20 vector-manipulating programs, the prototype significantly reduces the human effort of writing entry-level proof code.
    Abstract Formal verification can provably guarantee the correctness of critical system software, but the high proof burden has long hindered its wide adoption. Recently, Large Language Models (LLMs) have shown success in code analysis and synthesis. In this paper, we present a combination of LLMs and static analysis to synthesize invariants, assertions, and other proof structures for a Rust-based formal verification framework called Verus. In a few-shot setting, LLMs demonstrate impressive logical ability in generating postconditions and loop invariants, especially when analyzing short code snippets. However, LLMs lack the ability to retain and propagate context information, a strength of traditional static analysis. Based on these observations, we developed a prototype based on OpenAI's GPT-4 model. Our prototype decomposes the verification task into multiple smaller ones, iteratively queries GPT-4, and combines its output with lightweight static analysis. We evaluated the prototype with a developer in the automation loop on 20 vector-manipulating programs. The results demonstrate that it significantly reduces human effort in writing entry-level proof code.
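
The decompose-and-query loop can be sketched as follows; `ask_llm` and `verus_check` are hypothetical stand-ins for the GPT-4 call and the Verus verifier, not real APIs.

```python
# Sketch of the loop described in the paper: split verification into small
# obligations, ask an LLM for a candidate loop invariant for each, and accept
# it only if a checker validates it. Both helpers below are hypothetical.
def ask_llm(prompt: str) -> str:
    # placeholder: would call a model such as GPT-4 with the code snippet
    return "i <= n"

def verus_check(function_src: str, invariant: str) -> bool:
    # placeholder: would re-run the Verus verifier with the invariant inserted
    return len(invariant) > 0

def synthesize_invariants(functions: list[str], retries: int = 3) -> dict[str, str]:
    proofs = {}
    for src in functions:
        for _ in range(retries):
            candidate = ask_llm(f"Propose a loop invariant for:\n{src}")
            if verus_check(src, candidate):   # lightweight check gates output
                proofs[src] = candidate
                break
    return proofs

print(synthesize_invariants(["fn sum(n: u64) { ... }"]))
```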

deep-REMAP: Parameterization of Stellar Spectra Using Regularized Multi-Task Learning

  • paper_url: http://arxiv.org/abs/2311.03738
  • repo_url: None
  • paper_authors: Sankalp Gilda
  • for: A new framework for accurately predicting stellar atmospheric parameters (effective temperature, surface gravity, metallicity) from observed spectra.
  • methods: deep-REMAP combines multi-task learning with an innovative asymmetric loss function, training on the rich synthetic spectra of the PHOENIX library and observational data from the MARVELS survey.
  • results: The results demonstrate the framework's superior predictive capabilities and its potential to extend to other stellar libraries and properties.
    Abstract Traditional spectral analysis methods are increasingly challenged by the exploding volumes of data produced by contemporary astronomical surveys. In response, we develop deep-Regularized Ensemble-based Multi-task Learning with Asymmetric Loss for Probabilistic Inference ($\rm{deep-REMAP}$), a novel framework that utilizes the rich synthetic spectra from the PHOENIX library and observational data from the MARVELS survey to accurately predict stellar atmospheric parameters. By harnessing advanced machine learning techniques, including multi-task learning and an innovative asymmetric loss function, $\rm{deep-REMAP}$ demonstrates superior predictive capabilities in determining effective temperature, surface gravity, and metallicity from observed spectra. Our results reveal the framework's effectiveness in extending to other stellar libraries and properties, paving the way for more sophisticated and automated techniques in stellar characterization.
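
An asymmetric loss penalizes over- and under-prediction with different slopes; the pinball-style weighting below is an illustrative choice under that idea, not necessarily the paper's exact loss, and the parameter values are made up.

```python
# Asymmetric regression loss: under-prediction of a stellar parameter is
# penalized more heavily than over-prediction (or vice versa).
import torch

def asymmetric_loss(pred, target, under_weight=0.7):
    err = target - pred
    # err > 0: under-prediction, weight `under_weight`; else weight 1 - under_weight
    return torch.where(err > 0, under_weight * err, (under_weight - 1) * err).mean()

pred = torch.tensor([5700.0, 4.30, -0.10])      # Teff, log g, [Fe/H] (illustrative)
target = torch.tensor([5750.0, 4.40, -0.05])
print(asymmetric_loss(pred, target))
```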

Neural MMO 2.0: A Massively Multi-task Addition to Massively Multi-agent Learning

  • paper_url: http://arxiv.org/abs/2311.03736
  • repo_url: None
  • paper_authors: Joseph Suárez, Phillip Isola, Kyoung Whan Choe, David Bloomin, Hao Xiang Li, Nikhil Pinnaparaju, Nishaanth Kanna, Daniel Scott, Ryan Sullivan, Rose S. Shuman, Lucas de Alcântara, Herbie Bradley, Louis Castricato, Kirsty You, Yuhao Jiang, Qimai Li, Jiaxin Chen, Xiaolong Zhu
  • for: The goal is to provide a massively multi-agent environment in which researchers can define a broad range of task objectives and reward signals for reinforcement learning research.
  • methods: The work presents Neural MMO 2.0, a multi-agent environment with procedurally generated maps, together with compatibility with the CleanRL reinforcement learning framework.
  • results: Neural MMO 2.0 delivers a three-fold performance improvement over its predecessor and supports more agents and larger maps.
    Abstract Neural MMO 2.0 is a massively multi-agent environment for reinforcement learning research. The key feature of this new version is a flexible task system that allows users to define a broad range of objectives and reward signals. We challenge researchers to train agents capable of generalizing to tasks, maps, and opponents never seen during training. Neural MMO features procedurally generated maps with 128 agents in the standard setting and support for up to. Version 2.0 is a complete rewrite of its predecessor with three-fold improved performance and compatibility with CleanRL. We release the platform as free and open-source software with comprehensive documentation available at neuralmmo.github.io and an active community Discord. To spark initial research on this new platform, we are concurrently running a competition at NeurIPS 2023.

ClimateSet: A Large-Scale Climate Model Dataset for Machine Learning

  • paper_url: http://arxiv.org/abs/2311.03721
  • repo_url: None
  • paper_authors: Julia Kaltenborn, Charlotte E. E. Lange, Venkatesh Ramesh, Philippe Brouillard, Yaniv Gurwicz, Chandni Nagda, Jakob Runge, Peer Nowack, David Rolnick
  • for: The paper provides a large-scale, consistent, ML-ready climate model dataset to support the climate science and machine learning (ML) communities on climate projection and related tasks.
  • methods: It draws on climate model data from the Input4MIPs and CMIP6 archives and provides a modular dataset pipeline for retrieving and preprocessing additional climate models and scenarios.
  • results: Using ClimateSet as a benchmark for ML-based climate model emulation yields new insights into the performance and generalization of different ML models across climate models. The dataset can also be used to train a "super emulator" on several climate models that quickly projects new climate change scenarios, complementing those already provided to policymakers.
    Abstract Climate models have been key for assessing the impact of climate change and simulating future climate scenarios. The machine learning (ML) community has taken an increased interest in supporting climate scientists' efforts on various tasks such as climate model emulation, downscaling, and prediction tasks. Many of those tasks have been addressed on datasets created with single climate models. However, both the climate science and ML communities have suggested that to address those tasks at scale, we need large, consistent, and ML-ready climate model datasets. Here, we introduce ClimateSet, a dataset containing the inputs and outputs of 36 climate models from the Input4MIPs and CMIP6 archives. In addition, we provide a modular dataset pipeline for retrieving and preprocessing additional climate models and scenarios. We showcase the potential of our dataset by using it as a benchmark for ML-based climate model emulation. We gain new insights about the performance and generalization capabilities of the different ML models by analyzing their performance across different climate models. Furthermore, the dataset can be used to train an ML emulator on several climate models instead of just one. Such a "super emulator" can quickly project new climate change scenarios, complementing existing scenarios already provided to policymakers. We believe ClimateSet will create the basis needed for the ML community to tackle climate-related tasks at scale.

LLM as an Art Director (LaDi): Using LLMs to improve Text-to-Media Generators

  • paper_url: http://arxiv.org/abs/2311.03716
  • repo_url: None
  • paper_authors: Allen Roush, Emil Zakirov, Artemiy Shirokov, Polina Lunina, Jack Gane, Alexander Duffy, Charlie Basil, Aber Whitcomb, Jim Benedetto, Chris DeWolfe
  • for: Improving the quality and relevance of text-to-image and text-to-video generation.
  • methods: Multiple techniques are combined, including constrained decoding, intelligent prompting, fine-tuning, and retrieval, to augment the capabilities of text-to-image generators (T2Is) and text-to-video generators (T2Vs).
  • results: The system produces higher-quality, more subject-relevant images and videos and is used today in apps and platforms developed by Plai Labs.
    Abstract Recent advancements in text-to-image generation have revolutionized numerous fields, including art and cinema, by automating the generation of high-quality, context-aware images and video. However, the utility of these technologies is often limited by the inadequacy of text prompts in guiding the generator to produce artistically coherent and subject-relevant images. In this paper, We describe the techniques that can be used to make Large Language Models (LLMs) act as Art Directors that enhance image and video generation. We describe our unified system for this called "LaDi". We explore how LaDi integrates multiple techniques for augmenting the capabilities of text-to-image generators (T2Is) and text-to-video generators (T2Vs), with a focus on constrained decoding, intelligent prompting, fine-tuning, and retrieval. LaDi and these techniques are being used today in apps and platforms developed by Plai Labs.
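
Constrained decoding, one of the techniques listed, can be illustrated in a few lines: mask the logits of disallowed tokens before sampling, so only vocabulary consistent with the art-direction constraints can be emitted. The vocabulary and allowed set are toy stand-ins.

```python
# Minimal constrained decoding: disallowed tokens get -inf logits and can
# never be sampled.
import numpy as np

rng = np.random.default_rng(0)
vocab = ["sunset", "beach", "noise", "portrait", "glitch"]
allowed = {"sunset", "beach", "portrait"}        # e.g., style-consistent terms

def constrained_sample(logits):
    masked = np.where([w in allowed for w in vocab], logits, -np.inf)
    probs = np.exp(masked - masked.max())
    probs /= probs.sum()
    return rng.choice(vocab, p=probs)

print([constrained_sample(rng.normal(size=len(vocab))) for _ in range(4)])
```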

Loss Balancing for Fair Supervised Learning

  • paper_url: http://arxiv.org/abs/2311.03714
  • repo_url: https://github.com/khalilimahdi/loss_balancing_icml2023
  • paper_authors: Mohammad Mahdi Khalili, Xueru Zhang, Mahed Abroshan
  • for: The paper proposes an algorithm that efficiently finds a fair predictor under the Equalized Loss (EL) constraint, which requires the expected loss to be (approximately) equalized across groups.
  • methods: The ELminimizer algorithm reduces the resulting non-convex optimization problem to a sequence of convex optimization problems that can be solved with off-the-shelf convex programming tools (e.g., CVXPY).
  • results: The paper theoretically proves that the algorithm finds the global optimum under certain conditions and supports the theory with several empirical studies.
    Abstract Supervised learning models have been used in various domains such as lending, college admission, face recognition, natural language processing, etc. However, they may inherit pre-existing biases from training data and exhibit discrimination against protected social groups. Various fairness notions have been proposed to address unfairness issues. In this work, we focus on Equalized Loss (EL), a fairness notion that requires the expected loss to be (approximately) equalized across different groups. Imposing EL on the learning process leads to a non-convex optimization problem even if the loss function is convex, and the existing fair learning algorithms cannot properly be adopted to find the fair predictor under the EL constraint. This paper introduces an algorithm that can leverage off-the-shelf convex programming tools (e.g., CVXPY) to efficiently find the global optimum of this non-convex optimization. In particular, we propose the ELminimizer algorithm, which finds the optimal fair predictor under EL by reducing the non-convex optimization to a sequence of convex optimization problems. We theoretically prove that our algorithm finds the global optimal solution under certain conditions. Then, we support our theoretical results through several empirical studies.
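
A sketch of that reduction: each subproblem minimizes a convex weighted sum of per-group logistic losses with CVXPY, and a scalar bisection over the weight drives the two group losses together. ELminimizer solves a related sequence of convex programs; the data here is synthetic.

```python
# Each subproblem is convex (a weighted sum of convex logistic losses); the
# outer bisection on lambda enforces the Equalized Loss condition L0 ~= L1.
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
X0, y0 = rng.normal(size=(80, 5)), rng.choice([-1, 1], 80)   # group 0
X1, y1 = rng.normal(size=(40, 5)), rng.choice([-1, 1], 40)   # group 1

def group_losses(lam):
    w = cp.Variable(5)
    L0 = cp.sum(cp.logistic(cp.multiply(-y0, X0 @ w))) / 80
    L1 = cp.sum(cp.logistic(cp.multiply(-y1, X1 @ w))) / 40
    cp.Problem(cp.Minimize((1 - lam) * L0 + lam * L1)).solve()
    return L0.value, L1.value

lo, hi = 0.0, 1.0
for _ in range(20):                  # raise lambda whenever group 1 lags behind
    lam = (lo + hi) / 2
    L0, L1 = group_losses(lam)
    lo, hi = (lam, hi) if L1 > L0 else (lo, lam)
print(f"lambda={lam:.3f}, loss gap={abs(L0 - L1):.4f}")
```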

Mitigating Estimation Errors by Twin TD-Regularized Actor and Critic for Deep Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2311.03711
  • repo_url: None
  • paper_authors: Junmin Zhong, Ruofan Wu, Jennie Si
  • for: Mitigating estimation bias in deep reinforcement learning.
  • methods: A new twin TD-regularized actor-critic (TDR) method is introduced to reduce both over- and under-estimation errors.
  • results: Combined with established DRL improvements such as distributional learning and the long N-step surrogate stage reward (LNSS) method, the TDR-based actor-critic outperforms its baselines on challenging DeepMind Control Suite environments, elevating TD3 and SAC to performance comparable to D4PG (the current SOTA) and raising D4PG itself to a new SOTA level in mean reward, convergence speed, learning success rate, and learning variance.
    Abstract We address the issue of estimation bias in deep reinforcement learning (DRL) by introducing solution mechanisms that include a new, twin TD-regularized actor-critic (TDR) method. It aims at reducing both over and under-estimation errors. With TDR and by combining good DRL improvements, such as distributional learning and long N-step surrogate stage reward (LNSS) method, we show that our new TDR-based actor-critic learning has enabled DRL methods to outperform their respective baselines in challenging environments in DeepMind Control Suite. Furthermore, they elevate TD3 and SAC respectively to a level of performance comparable to that of D4PG (the current SOTA), and they also improve the performance of D4PG to a new SOTA level measured by mean reward, convergence speed, learning success rate, and learning variance.
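
TDR builds on twin critics; the familiar clipped double-Q target that family of methods starts from (as in TD3) is shown below, while TDR's additional TD-regularization of actor and critic is not reproduced here.

```python
# Twin-critic TD target: take the minimum of two critics at the next state to
# curb over-estimation before bootstrapping.
import torch

def twin_td_target(reward, done, q1_next, q2_next, gamma=0.99):
    q_next = torch.min(q1_next, q2_next)          # pessimistic twin estimate
    return reward + gamma * (1.0 - done) * q_next

r = torch.tensor([1.0, 0.5])
d = torch.tensor([0.0, 1.0])                       # second transition is terminal
print(twin_td_target(r, d, torch.tensor([10.0, 8.0]), torch.tensor([9.5, 8.4])))
```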

The NeurIPS 2022 Neural MMO Challenge: A Massively Multiagent Competition with Specialization and Trade

  • paper_url: http://arxiv.org/abs/2311.03707
  • repo_url: https://github.com/neuralmmo/neurips2022nmmo-submission-pool
  • paper_authors: Enhong Liu, Joseph Suarez, Chenhui You, Bo Wu, Bingcheng Chen, Jun Hu, Jiaxin Chen, Xiaolong Zhu, Clare Zhu, Julian Togelius, Sharada Mohanty, Weijun Hong, Rui Du, Yibing Zhang, Qinwen Wang, Xinhang Li, Zheng Yuan, Xiang Li, Yuejia Huang, Kun Zhang, Hanhui Yang, Shiqi Tang, Phillip Isola
  • for: This paper presents the results of the NeurIPS-2022 Neural MMO Challenge, which attracted 500 participants and received over 1,600 submissions.
  • methods: The competition ran on the latest v1.6 Neural MMO environment, which introduces new equipment, combat, trading, and scoring systems that pose additional robustness and generalization challenges over previous competitions.
  • results: The paper summarizes the design and results of the challenge, explores the environment's potential as a benchmark for learning methods, and presents practical reinforcement learning training approaches for complex tasks with sparse rewards.
    Abstract In this paper, we present the results of the NeurIPS-2022 Neural MMO Challenge, which attracted 500 participants and received over 1,600 submissions. Like the previous IJCAI-2022 Neural MMO Challenge, it involved agents from 16 populations surviving in procedurally generated worlds by collecting resources and defeating opponents. This year's competition runs on the latest v1.6 Neural MMO, which introduces new equipment, combat, trading, and a better scoring system. These elements combine to pose additional robustness and generalization challenges not present in previous competitions. This paper summarizes the design and results of the challenge, explores the potential of this environment as a benchmark for learning methods, and presents some practical reinforcement learning training approaches for complex tasks with sparse rewards. Additionally, we have open-sourced our baselines, including environment wrappers, benchmarks, and visualization tools for future research.

Efficient Bottom-Up Synthesis for Programs with Local Variables

  • paper_url: http://arxiv.org/abs/2311.03705
  • repo_url: None
  • paper_authors: Xiang Li, Xiangyu Zhou, Rui Dong, Yihong Zhang, Xinyu Wang
  • for: This work proposes a new synthesis algorithm that can efficiently search over programs with local variables (e.g., those introduced by lambdas). Prior bottom-up synthesis algorithms cannot evaluate programs with free local variables and therefore cannot effectively prune such programs (e.g., with standard observational-equivalence reduction), making synthesis slow; the proposed algorithm can prune this search space.
  • methods: The key idea, dubbed lifted interpretation, lifts the interpretation process from evaluating one program at a time to simultaneously evaluating all programs from a grammar. This systematically enumerates all binding contexts for local variables, making it possible to evaluate, and prune, the space of programs with local variables (a toy illustration of observational-equivalence pruning follows the abstract).
  • results: Instantiated in the domain of web automation, the resulting tool, Arborist, automates a significantly broader range of challenging tasks more efficiently than state-of-the-art techniques including WebRobot and Helena.
    Abstract We propose a new synthesis algorithm that can efficiently search programs with local variables (e.g., those introduced by lambdas). Prior bottom-up synthesis algorithms are not able to evaluate programs with free local variables, and therefore cannot effectively reduce the search space of such programs (e.g., using standard observational equivalence reduction techniques), making synthesis slow. Our algorithm can reduce the space of programs with local variables. The key idea, dubbed lifted interpretation, is to lift up the program interpretation process, from evaluating one program at a time to simultaneously evaluating all programs from a grammar. Lifted interpretation provides a mechanism to systematically enumerate all binding contexts for local variables, thereby enabling us to evaluate and reduce the space of programs with local variables. Our ideas are instantiated in the domain of web automation. The resulting tool, Arborist, can automate a significantly broader range of challenging tasks more efficiently than state-of-the-art techniques including WebRobot and Helena.
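For context on the pruning the abstract refers to, the toy enumerator below applies standard observational-equivalence reduction to closed integer expressions: two programs are merged when they agree on all example inputs. Per the abstract, lifted interpretation is what extends this style of pruning to programs with free local variables; the grammar and inputs here are invented for illustration.

```python
# Hedged sketch: classic bottom-up enumeration with observational-equivalence
# pruning. Two programs are merged if they agree on all example inputs.
# The grammar (x, constants, +, *) and the inputs are illustrative only.
inputs = [0, 1, 2, 3]

def signature(prog):
    """Evaluate a program (a Python expression over x) on all inputs."""
    return tuple(eval(prog, {}, {"x": x}) for x in inputs)

level = {1: ["x", "1", "2"]}          # size-1 programs (terminals)
seen = {signature(p): p for p in level[1]}

for size in range(2, 6):              # enumerate programs of sizes 2..5
    level[size] = []
    for ls in range(1, size - 1):     # split size between the two operands
        for a in level[ls]:
            for b in level[size - 1 - ls]:
                for op in ("+", "*"):
                    prog = f"({a} {op} {b})"
                    sig = signature(prog)
                    if sig not in seen:       # observational equivalence
                        seen[sig] = prog
                        level[size].append(prog)

# Search the deduplicated space for a program matching a target behaviour.
target = tuple(2 * x + 1 for x in inputs)
print(seen.get(target))               # e.g. "((x + x) + 1)" or an equivalent
```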

Hypothesis Network Planned Exploration for Rapid Meta-Reinforcement Learning Adaptation

  • paper_url: http://arxiv.org/abs/2311.03701
  • repo_url: None
  • paper_authors: Maxwell Joseph Jacobson, Yexiang Xue
  • for: trains agents that adapt to fast-changing environments and tasks
  • methods: integrates an active and planned exploration process via the hypothesis network to optimize adaptation speed
  • results: outpaces baseline methods in adaptation speed and model accuracy on a symbolic version of the Alchemy game, validating its potential for enhancing reinforcement learning adaptation in rapidly evolving settings (a hypothesis-elimination toy follows the abstract)
    Abstract Meta Reinforcement Learning (Meta RL) trains agents that adapt to fast-changing environments and tasks. Current strategies often lose adaption efficiency due to the passive nature of model exploration, causing delayed understanding of new transition dynamics. This results in particularly fast-evolving tasks being impossible to solve. We propose a novel approach, Hypothesis Network Planned Exploration (HyPE), that integrates an active and planned exploration process via the hypothesis network to optimize adaptation speed. HyPE uses a generative hypothesis network to form potential models of state transition dynamics, then eliminates incorrect models through strategically devised experiments. Evaluated on a symbolic version of the Alchemy game, HyPE outpaces baseline methods in adaptation speed and model accuracy, validating its potential in enhancing reinforcement learning adaptation in rapidly evolving settings.
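The toy below illustrates the eliminate-incorrect-models loop from the abstract with a small pool of tabular dynamics hypotheses; choosing the action that maximizes disagreement is an assumption in the spirit of planned exploration, not HyPE's actual hypothesis network.

```python
# Hedged sketch: keep a pool of candidate dynamics models, run the action on
# which surviving models disagree most, and drop models whose predictions are
# contradicted by the observed transition. All models here are invented toys.
import numpy as np

rng = np.random.default_rng(1)
n_states, n_actions = 5, 3

def random_model():
    """A deterministic transition table: next_state = m[state, action]."""
    return rng.integers(0, n_states, size=(n_states, n_actions))

true_model = random_model()
pool = [random_model() for _ in range(20)] + [true_model]

state = 0
for _ in range(200):
    if len(pool) == 1:
        break
    # Planned exploration: choose the action with the most disagreement
    # among surviving hypotheses in the current state.
    preds = np.array([m[state] for m in pool])          # (n_pool, n_actions)
    action = int(np.argmax([len(set(preds[:, a])) for a in range(n_actions)]))
    nxt = int(true_model[state, action])                # observe the env
    pool = [m for m in pool if m[state, action] == nxt]
    state = nxt

print("surviving hypotheses:", len(pool))
```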

A Novel Variational Lower Bound for Inverse Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2311.03698
  • repo_url: None
  • paper_authors: Yikang Gui, Prashant Doshi
  • for: Learning the reward function from expert trajectories, to understand a task for imitation or collaboration and remove the need for manual reward engineering.
  • methods: A Variational Lower Bound for IRL (VLB-IRL), derived under a probabilistic graphical model with an optimality node, which learns the reward function and the policy under that reward simultaneously (the objective is sketched after the abstract).
  • results: Across several known domains, the method learns a valid reward function whose induced policy achieves expert-level performance, outperforming existing state-of-the-art IRL algorithms on these domains.
    Abstract Inverse reinforcement learning (IRL) seeks to learn the reward function from expert trajectories, to understand the task for imitation or collaboration thereby removing the need for manual reward engineering. However, IRL in the context of large, high-dimensional problems with unknown dynamics has been particularly challenging. In this paper, we present a new Variational Lower Bound for IRL (VLB-IRL), which is derived under the framework of a probabilistic graphical model with an optimality node. Our method simultaneously learns the reward function and policy under the learned reward function by maximizing the lower bound, which is equivalent to minimizing the reverse Kullback-Leibler divergence between an approximated distribution of optimality given the reward function and the true distribution of optimality given trajectories. This leads to a new IRL method that learns a valid reward function such that the policy under the learned reward achieves expert-level performance on several known domains. Importantly, the method outperforms the existing state-of-the-art IRL algorithms on these domains by demonstrating better reward from the learned policy.
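In schematic form, and with notation assumed rather than taken from the paper, the abstract's statement reads: maximizing the variational lower bound is equivalent to minimizing a reverse KL divergence between the approximate optimality distribution under the learned reward and the true optimality distribution given trajectories.

```latex
% Schematic only: O is the optimality variable, r_theta the learned reward,
% pi the policy, tau the expert trajectories (all notation assumed).
\max_{\theta,\pi}\; \mathcal{L}(\theta,\pi)
\quad\Longleftrightarrow\quad
\min_{\theta,\pi}\; \mathrm{KL}\!\left(
  q(\mathcal{O} \mid r_\theta)\,\middle\|\,p(\mathcal{O} \mid \tau)\right)
```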

Context Shift Reduction for Offline Meta-Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2311.03695
  • repo_url: https://github.com/moreanp/csro
  • paper_authors: Yunkai Gao, Rui Zhang, Jiaming Guo, Fan Wu, Qi Yi, Shaohui Peng, Siming Lan, Ruizhi Chen, Zidong Du, Xing Hu, Qi Guo, Ling Li, Yunji Chen
  • for: Improving the generalization ability of meta-learning agents by addressing the context shift problem in offline meta-reinforcement learning
  • methods: Proposes Context Shift Reduction for OMRL (CSRO), which mitigates context shift using only offline datasets by minimizing the policy's influence on the context: a max-min mutual information representation learning mechanism during meta-training and a non-prior context collection strategy at meta-test time
  • results: Experiments show that CSRO significantly reduces context shift and improves generalization, surpassing previous methods across various challenging domains
    Abstract Offline meta-reinforcement learning (OMRL) utilizes pre-collected offline datasets to enhance the agent's generalization ability on unseen tasks. However, the context shift problem arises due to the distribution discrepancy between the contexts used for training (from the behavior policy) and testing (from the exploration policy). The context shift problem leads to incorrect task inference and further deteriorates the generalization ability of the meta-policy. Existing OMRL methods either overlook this problem or attempt to mitigate it with additional information. In this paper, we propose a novel approach called Context Shift Reduction for OMRL (CSRO) to address the context shift problem with only offline datasets. The key insight of CSRO is to minimize the influence of policy in context during both the meta-training and meta-test phases. During meta-training, we design a max-min mutual information representation learning mechanism to diminish the impact of the behavior policy on task representation. In the meta-test phase, we introduce the non-prior context collection strategy to reduce the effect of the exploration policy. Experimental results demonstrate that CSRO significantly reduces the context shift and improves the generalization ability, surpassing previous methods across various challenging domains.

Deep Bayesian Reinforcement Learning for Spacecraft Proximity Maneuvers and Docking

  • paper_url: http://arxiv.org/abs/2311.03680
  • repo_url: None
  • paper_authors: Desong Du, Naiming Qi, Yanfang Liu, Wei Pan
  • for: This work develops a Bayesian actor-critic reinforcement learning algorithm with a stability guarantee for autonomous spacecraft proximity maneuvers and docking (PMD).
  • methods: The PMD task is formulated as a Markov decision process reflecting the relative dynamic model, the docking cone, and the cost function; drawing on Lyapunov theory and Gaussian process regression, temporal difference learning is framed as a constrained Gaussian process regression problem, and a Bayesian quadrature policy optimization procedure computes the policy gradient analytically under Lyapunov-based stability constraints.
  • results: The proposed algorithm was evaluated experimentally on a spacecraft air-bearing testbed, showing impressive and promising performance while meeting the rigorous safety demands of spaceflight missions.
    Abstract In the pursuit of autonomous spacecraft proximity maneuvers and docking(PMD), we introduce a novel Bayesian actor-critic reinforcement learning algorithm to learn a control policy with the stability guarantee. The PMD task is formulated as a Markov decision process that reflects the relative dynamic model, the docking cone and the cost function. Drawing from the principles of Lyapunov theory, we frame the temporal difference learning as a constrained Gaussian process regression problem. This innovative approach allows the state-value function to be expressed as a Lyapunov function, leveraging the Gaussian process and deep kernel learning. We develop a novel Bayesian quadrature policy optimization procedure to analytically compute the policy gradient while integrating Lyapunov-based stability constraints. This integration is pivotal in satisfying the rigorous safety demands of spaceflight missions. The proposed algorithm has been experimentally evaluated on a spacecraft air-bearing testbed and shows impressive and promising performance.

Stable Modular Control via Contraction Theory for Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2311.03669
  • repo_url: None
  • paper_authors: Bing Song, Jean-Jacques Slotine, Quang-Cuong Pham
  • for: Improving the stability, robustness, and generalization of neural control
  • methods: Leverages contraction theory to realize modularity in neural control, so that combining stable subsystems automatically preserves stability; modularity is achieved via signal composition and dynamic decomposition, yielding linear constraints on the input gradients of the control networks that can be as simple as switching the signs of network weights (see the sign-constraint sketch after the abstract)
  • results: Simulations demonstrate both the necessity of the method, for robustness and generalization, and its effectiveness in improving hierarchical RL for manipulation learning
    Abstract We propose a novel way to integrate control techniques with reinforcement learning (RL) for stability, robustness, and generalization: leveraging contraction theory to realize modularity in neural control, which ensures that combining stable subsystems can automatically preserve the stability. We realize such modularity via signal composition and dynamic decomposition. Signal composition creates the latent space, within which RL applies to maximizing rewards. Dynamic decomposition is realized by coordinate transformation that creates an auxiliary space, within which the latent signals are coupled in the way that their combination can preserve stability provided each signal, that is, each subsystem, has stable self-feedbacks. Leveraging modularity, the nonlinear stability problem is deconstructed into algebraically solvable ones, the stability of the subsystems in the auxiliary space, yielding linear constraints on the input gradients of control networks that can be as simple as switching the signs of network weights. This minimally invasive method for stability allows arguably easy integration into the modular neural architectures in machine learning, like hierarchical RL, and improves their performance. We demonstrate in simulation the necessity and the effectiveness of our method: the necessity for robustness and generalization, and the effectiveness in improving hierarchical RL for manipulation learning.
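One way to realize the "switching the signs of network weights" constraint the abstract mentions is to parameterize selected weights so their sign is fixed; the layer below is an illustrative guess at such a mechanism, not the paper's construction.

```python
# Hedged sketch: a linear layer whose weights are constrained to be
# non-positive via a softplus reparameterization, so the input gradient
# it contributes always carries the stabilizing sign.
import torch
import torch.nn as nn
import torch.nn.functional as F

class NonPositiveLinear(nn.Module):
    def __init__(self, dim_in, dim_out):
        super().__init__()
        self.raw = nn.Parameter(torch.randn(dim_out, dim_in))
        self.bias = nn.Parameter(torch.zeros(dim_out))

    def forward(self, x):
        # softplus(raw) > 0, so -softplus(raw) <= 0 elementwise: the layer's
        # input gradient is sign-constrained by construction.
        return F.linear(x, -F.softplus(self.raw), self.bias)

layer = NonPositiveLinear(4, 4)
x = torch.randn(2, 4)
print(layer(x).shape)   # torch.Size([2, 4])
```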

GPT-ST: Generative Pre-Training of Spatio-Temporal Graph Neural Networks

  • paper_url: http://arxiv.org/abs/2311.04245
  • repo_url: https://github.com/hkuds/gpt-st
  • paper_authors: Zhonghang Li, Lianghao Xia, Yong Xu, Chao Huang
  • for: This paper addresses the rapidly growing demand for spatio-temporal prediction in traffic management and travel planning.
  • methods: It proposes a pre-training framework that enhances downstream baseline models through two key designs: (i) a spatio-temporal mask autoencoder with customized parameter learners and hierarchical spatial pattern encoding networks, which captures customized spatio-temporal representations and intra- and inter-cluster region semantic relationships; and (ii) an adaptive mask strategy that guides the autoencoder toward robust spatio-temporal representations and models relationships from intra-cluster to inter-cluster in an easy-to-hard training manner (a minimal masking sketch follows the abstract).
  • results: Extensive experiments on representative benchmarks demonstrate the effectiveness of the proposed method; the model implementation is available at https://github.com/HKUDS/GPT-ST.
    Abstract In recent years, there has been a rapid development of spatio-temporal prediction techniques in response to the increasing demands of traffic management and travel planning. While advanced end-to-end models have achieved notable success in improving predictive performance, their integration and expansion pose significant challenges. This work aims to address these challenges by introducing a spatio-temporal pre-training framework that seamlessly integrates with downstream baselines and enhances their performance. The framework is built upon two key designs: (i) We propose a spatio-temporal mask autoencoder as a pre-training model for learning spatio-temporal dependencies. The model incorporates customized parameter learners and hierarchical spatial pattern encoding networks. These modules are specifically designed to capture spatio-temporal customized representations and intra- and inter-cluster region semantic relationships, which have often been neglected in existing approaches. (ii) We introduce an adaptive mask strategy as part of the pre-training mechanism. This strategy guides the mask autoencoder in learning robust spatio-temporal representations and facilitates the modeling of different relationships, ranging from intra-cluster to inter-cluster, in an easy-to-hard training manner. Extensive experiments conducted on representative benchmarks demonstrate the effectiveness of our proposed method. We have made our model implementation publicly available at https://github.com/HKUDS/GPT-ST.
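A minimal sketch of mask-and-reconstruct pre-training on a spatio-temporal tensor, assuming uniform random masking and an MSE loss on masked cells; GPT-ST's adaptive mask strategy and hierarchical encoders are replaced by a plain MLP autoencoder for illustration.

```python
# Hedged sketch: mask-and-reconstruct pre-training on a (batch, regions,
# time, features) tensor. The encoder, the mask ratio, and the shapes are
# toys; GPT-ST's customized parameter learners are not reproduced here.
import torch
import torch.nn as nn

B, N, T, F_ = 8, 20, 12, 4          # batch, regions, time steps, features
x = torch.randn(B, N, T, F_)

mask = (torch.rand(B, N, T, 1) < 0.25).float()   # 25% of (region, time) cells
x_masked = x * (1 - mask)                        # zero-out masked cells

autoencoder = nn.Sequential(
    nn.Linear(F_, 64), nn.ReLU(), nn.Linear(64, F_)
)
recon = autoencoder(x_masked)

# Reconstruction loss only on the masked cells, as in masked autoencoding.
loss = ((recon - x) ** 2 * mask).sum() / mask.sum().clamp(min=1)
loss.backward()
print(float(loss))
```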

The Linear Representation Hypothesis and the Geometry of Large Language Models

  • paper_url: http://arxiv.org/abs/2311.03658
  • repo_url: https://github.com/kihopark/linear_rep_geometry
  • paper_authors: Kiho Park, Yo Joong Choe, Victor Veitch
  • for: This paper aims to clarify the meaning of “linear representation” in the context of natural language processing, and to develop a unified framework for understanding geometric notions such as cosine similarity and projection in representation space.
  • methods: The authors use counterfactuals to formalize two different notions of “linear representation”, one in the output (word) representation space and one in the input (sentence) space. They then prove that these notions connect to linear probing and model steering, respectively.
  • results: The authors show that using a particular (non-Euclidean) inner product, they can unify all notions of linear representation, and demonstrate the existence of linear representations of concepts, the connection to interpretation and control, and the fundamental role of the choice of inner product through experiments with LLaMA-2.
    Abstract Informally, the 'linear representation hypothesis' is the idea that high-level concepts are represented linearly as directions in some representation space. In this paper, we address two closely related questions: What does "linear representation" actually mean? And, how do we make sense of geometric notions (e.g., cosine similarity or projection) in the representation space? To answer these, we use the language of counterfactuals to give two formalizations of "linear representation", one in the output (word) representation space, and one in the input (sentence) space. We then prove these connect to linear probing and model steering, respectively. To make sense of geometric notions, we use the formalization to identify a particular (non-Euclidean) inner product that respects language structure in a sense we make precise. Using this causal inner product, we show how to unify all notions of linear representation. In particular, this allows the construction of probes and steering vectors using counterfactual pairs. Experiments with LLaMA-2 demonstrate the existence of linear representations of concepts, the connection to interpretation and control, and the fundamental role of the choice of inner product.
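The numpy sketch below follows the counterfactual-pair recipe, taking the causal inner product to be induced by the inverse covariance of the unembedding vectors; that choice of M, and the random stand-in embeddings, are assumptions for illustration rather than the paper's verified construction.

```python
# Hedged sketch: build a concept direction from counterfactual word pairs and
# measure similarity under a non-Euclidean inner product <x, y> = x' M y.
# Using M = inverse covariance of the unembeddings is an assumption here.
import numpy as np

rng = np.random.default_rng(0)
d, vocab = 64, 1000
unembed = rng.normal(size=(vocab, d))        # stand-in unembedding matrix

Sigma = np.cov(unembed, rowvar=False)
M = np.linalg.inv(Sigma + 1e-6 * np.eye(d))  # candidate inner-product matrix

def causal_inner(x, y):
    return x @ M @ y

# Counterfactual pairs differing only in the concept (invented ids standing
# in for "king"/"queen", "man"/"woman" style pairs).
pairs = [(0, 1), (2, 3), (4, 5)]
direction = np.mean([unembed[a] - unembed[b] for a, b in pairs], axis=0)

# Project an arbitrary representation onto the concept direction under M.
h = rng.normal(size=d)
score = causal_inner(h, direction) / np.sqrt(causal_inner(direction, direction))
print(score)
```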

Machine Learning Parameterization of the Multi-scale Kain-Fritsch (MSKF) Convection Scheme

  • paper_url: http://arxiv.org/abs/2311.03652
  • repo_url: None
  • paper_authors: Xiaohui Zhong, Xing Yu, Hao Li
  • For: This paper aims to improve the representation of convective transport in high-resolution numerical weather prediction (NWP) models, specifically in the gray zone where the grid spacing is comparable to the length scales of convection.
  • Methods: The authors use a multi-scale Kain-Fritsch (MSKF) scheme and a multi-output bidirectional long short-term memory (Bi-LSTM) model to represent convective transport in the gray zone, and compare the performance of the Bi-LSTM model with the original MSKF scheme in the WRF model (a minimal Bi-LSTM sketch follows the abstract).
  • Results: The results show that the Bi-LSTM model can achieve high accuracy in representing convective transport in the gray zone, indicating the potential use of machine learning (ML) models to substitute physical parameterizations in NWP models.
    Abstract Warm-sector heavy rainfall often occurs along the coast of South China, and it is usually localized and long-lasting, making it challenging to predict. High-resolution numerical weather prediction (NWP) models are increasingly used to better resolve topographic features and forecast such high-impact weather events. However, when the grid spacing becomes comparable to the length scales of convection, known as the gray zone, the turbulent eddies in the atmospheric boundary layer are only partially resolved and parameterized to some extent. Whether using a convection parameterization (CP) scheme in the gray zone remains controversial. Scale-aware CP schemes are developed to enhance the representation of convective transport within the gray zone. The multi-scale Kain-Fritsch (MSKF) scheme includes modifications that allow for its effective implementation at a grid resolution as high as 2 km. In recent years, there has been an increasing application of machine learning (ML) models to various domains of atmospheric sciences, including the replacement of physical parameterizations with ML models. This work proposes a multi-output bidirectional long short-term memory (Bi-LSTM) model as a replace the scale-aware MSKF CP scheme. The Weather Research and Forecast (WRF) model is used to generate training and testing data over South China at a horizontal resolution of 5 km. Furthermore, the WRF model is coupled with the ML based CP scheme and compared with WRF simulations with original MSKF scheme. The results demonstrate that the Bi-LSTM model can achieve high accuracy, indicating the potential use of ML models to substitute the MSKF scheme in the gray zone.
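A minimal sketch of a multi-output Bi-LSTM emulator of the kind described, treating the vertical column as the sequence axis; the input/output variable counts and layout are assumptions, not the paper's configuration.

```python
# Hedged sketch: a multi-output bidirectional LSTM mapping per-column
# atmospheric profiles to convective tendencies. Feature counts and the
# vertical-levels-as-sequence layout are assumptions for illustration.
import torch
import torch.nn as nn

class BiLSTMEmulator(nn.Module):
    def __init__(self, n_in=10, n_out=3, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(n_in, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, n_out)   # 2x for both directions

    def forward(self, x):                # x: (batch, vertical_levels, n_in)
        h, _ = self.lstm(x)
        return self.head(h)              # per-level multi-output tendencies

model = BiLSTMEmulator()
profiles = torch.randn(32, 40, 10)       # 32 columns, 40 vertical levels
print(model(profiles).shape)             # torch.Size([32, 40, 3])
```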

SeRO: Self-Supervised Reinforcement Learning for Recovery from Out-of-Distribution Situations

  • paper_url: http://arxiv.org/abs/2311.03651
  • repo_url: https://github.com/snuchankim/sero
  • paper_authors: Chan Kim, Jaekyung Cho, Christophe Bobda, Seung-Woo Seo, Seong-Woo Kim
  • for: Addressing the problem that robotic agents trained with reinforcement learning take unreliable actions in out-of-distribution (OOD) states.
  • methods: A self-supervised method that retrains the agent, when it falls into an OOD state, to return to the learned state distribution rather than continue taking unreliable actions.
  • results: Experiments show the method substantially improves the agent's ability to recover from OOD situations, in terms of both sample efficiency and restoration of performance on the original task, and works even when in-distribution states are difficult to reach through exploration.
    Abstract Robotic agents trained using reinforcement learning have the problem of taking unreliable actions in an out-of-distribution (OOD) state. Agents can easily become OOD in real-world environments because it is almost impossible for them to visit and learn the entire state space during training. Unfortunately, unreliable actions do not ensure that agents perform their original tasks successfully. Therefore, agents should be able to recognize whether they are in OOD states and learn how to return to the learned state distribution rather than continue to take unreliable actions. In this study, we propose a novel method for retraining agents to recover from OOD situations in a self-supervised manner when they fall into OOD states. Our in-depth experimental results demonstrate that our method substantially improves the agent's ability to recover from OOD situations in terms of sample efficiency and restoration of the performance for the original tasks. Moreover, we show that our method can retrain the agent to recover from OOD situations even when in-distribution states are difficult to visit through exploration.

HKTGNN: Hierarchical Knowledge Transferable Graph Neural Network-based Supply Chain Risk Assessment

  • paper_url: http://arxiv.org/abs/2311.04244
  • repo_url: None
  • paper_authors: Zhanting Zhou, Kejun Bi, Yuyanzhen Zhong, Chao Tang, Dongfen Li, Shi Ying, Ruijin Wang
  • for: This work proposes a graph-embedding-based supply chain risk assessment model to help enterprises manage and mitigate potential risks in the supply chain.
  • methods: A hierarchical knowledge transferable graph neural network (HKTGNN) embeds the supply chain network corresponding to individual goods, reducing the complex supply chain network to a directed homogeneous graph with only product nodes, and combines a centrality-based domain-difference knowledge transfer module with feature complement and message passing to alleviate the data hunger problem.
  • results: Experiments show that the proposed HKTGNN model outperforms traditional knowledge inference methods in assessing supply chain risk on a real-world supply chain dataset; the authors also give an equation to argue that the comparative experiment is effective and fair.
    Abstract The strength of a supply chain is an important measure of a country's or region's technical advancement and overall competitiveness. Establishing supply chain risk assessment models for effective management and mitigation of potential risks has become increasingly crucial. As the number of businesses grows, the important relationships become more complicated and difficult to measure. This emphasizes the need of extracting relevant information from graph data. Previously, academics mostly employed knowledge inference to increase the visibility of links between nodes in the supply chain. However, they have not solved the data hunger problem of single node feature characteristics. We propose a hierarchical knowledge transferable graph neural network-based (HKTGNN) supply chain risk assessment model to address these issues. Our approach is based on current graph embedding methods for assessing corporate investment risk assessment. We embed the supply chain network corresponding to individual goods in the supply chain using the graph embedding module, resulting in a directed homogeneous graph with just product nodes. This reduces the complicated supply chain network into a basic product network. It addresses difficulties using the domain difference knowledge transferable module based on centrality, which is presented by the premise that supply chain feature characteristics may be biased in the actual world. Meanwhile, the feature complement and message passing will alleviate the data hunger problem, which is driven by domain differences. Our model outperforms in experiments on a real-world supply chain dataset. We will give an equation to prove that our comparative experiment is both effective and fair.

Analysis of the User Perception of Chatbots in Education Using A Partial Least Squares Structural Equation Modeling Approach

  • paper_url: http://arxiv.org/abs/2311.03636
  • repo_url: None
  • paper_authors: Md Rabiul Hasan, Nahian Ismail Chowdhury, Md Hadisur Rahman, Md Asif Bin Syed, JuHyeong Ryu
  • for: This study investigates students' adoption of chatbots in education, filling a gap in existing research on behavior-related factors.
  • methods: Partial Least Squares Structural Equation Modeling (PLS-SEM) is used to examine the determinants of chatbot adoption among students, drawing on the Technology Readiness Index (TRI) and the Technology Acceptance Model (TAM); data were collected on a five-point Likert scale, yielding 185 responses analyzed with R-Studio.
  • results: Optimism and Innovativeness are positively associated with Perceived Ease of Use (PEOU) and Perceived Usefulness (PU), whereas Discomfort and Insecurity negatively affect PEOU, with only Insecurity negatively affecting PU; these findings can guide future technology designers on the key user-behavior factors influencing chatbot adoption in educational contexts.
    Abstract The integration of Artificial Intelligence (AI) into education is a recent development, with chatbots emerging as a noteworthy addition to this transformative landscape. As online learning platforms rapidly advance, students need to adapt swiftly to excel in this dynamic environment. Consequently, understanding the acceptance of chatbots, particularly those employing Large Language Model (LLM) such as Chat Generative Pretrained Transformer (ChatGPT), Google Bard, and other interactive AI technologies, is of paramount importance. However, existing research on chatbots in education has overlooked key behavior-related aspects, such as Optimism, Innovativeness, Discomfort, Insecurity, Transparency, Ethics, Interaction, Engagement, and Accuracy, creating a significant literature gap. To address this gap, this study employs Partial Least Squares Structural Equation Modeling (PLS-SEM) to investigate the determinant of chatbots adoption in education among students, considering the Technology Readiness Index (TRI) and Technology Acceptance Model (TAM). Utilizing a five-point Likert scale for data collection, we gathered a total of 185 responses, which were analyzed using R-Studio software. We established 12 hypotheses to achieve its objectives. The results showed that Optimism and Innovativeness are positively associated with Perceived Ease of Use (PEOU) and Perceived Usefulness (PU). Conversely, Discomfort and Insecurity negatively impact PEOU, with only Insecurity negatively affecting PU. These findings provide insights for future technology designers, elucidating critical user behavior factors influencing chatbots adoption and utilization in educational contexts.

TWIST: Teacher-Student World Model Distillation for Efficient Sim-to-Real Transfer

  • paper_url: http://arxiv.org/abs/2311.03622
  • repo_url: None
  • paper_authors: Jun Yamada, Marc Rigter, Jack Collins, Ingmar Posner
  • for: The paper proposes a model-based RL approach to vision-based tasks on real robots, improving the sim-to-real transfer of learned world models.
  • methods: TWIST transfers world models via distillation: a teacher world model trained efficiently on privileged state observations from the simulator supervises a student world model that takes domain-randomised image observations as input (a distillation-loss sketch follows the abstract).
  • results: Experiments show the approach outperforms naive domain randomisation and model-free RL methods in both sample efficiency and task performance of sim-to-real transfer.
    Abstract Model-based RL is a promising approach for real-world robotics due to its improved sample efficiency and generalization capabilities compared to model-free RL. However, effective model-based RL solutions for vision-based real-world applications require bridging the sim-to-real gap for any world model learnt. Due to its significant computational cost, standard domain randomisation does not provide an effective solution to this problem. This paper proposes TWIST (Teacher-Student World Model Distillation for Sim-to-Real Transfer) to achieve efficient sim-to-real transfer of vision-based model-based RL using distillation. Specifically, TWIST leverages state observations as readily accessible, privileged information commonly garnered from a simulator to significantly accelerate sim-to-real transfer. Specifically, a teacher world model is trained efficiently on state information. At the same time, a matching dataset is collected of domain-randomised image observations. The teacher world model then supervises a student world model that takes the domain-randomised image observations as input. By distilling the learned latent dynamics model from the teacher to the student model, TWIST achieves efficient and effective sim-to-real transfer for vision-based model-based RL tasks. Experiments in simulated and real robotics tasks demonstrate that our approach outperforms naive domain randomisation and model-free methods in terms of sample efficiency and task performance of sim-to-real transfer.
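A minimal sketch of the teacher-to-student distillation, assuming both world models expose an encoder and that the student's latent is matched to the teacher's with an MSE on paired state/image observations; the encoders and data below are placeholders, not TWIST's architecture.

```python
# Hedged sketch: distil a state-trained teacher world model into an
# image-conditioned student by matching latents on paired observations.
# Encoder architectures, latent size, and the data pairing are assumptions.
import torch
import torch.nn as nn

latent = 32
teacher_enc = nn.Sequential(nn.Linear(12, 128), nn.ReLU(), nn.Linear(128, latent))
student_enc = nn.Sequential(                      # takes 64x64 RGB images
    nn.Conv2d(3, 16, 4, stride=2), nn.ReLU(),
    nn.Conv2d(16, 32, 4, stride=2), nn.ReLU(),
    nn.Flatten(), nn.LazyLinear(latent),
)

states = torch.randn(8, 12)                       # privileged sim states
images = torch.randn(8, 3, 64, 64)                # matching randomised images

with torch.no_grad():                             # teacher is frozen
    z_teacher = teacher_enc(states)
z_student = student_enc(images)

# Distillation loss: the student's latent must match the teacher's latent
# for the same underlying simulator state.
loss = ((z_student - z_teacher) ** 2).mean()
loss.backward()
print(float(loss))
```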