cs.AI - 2023-09-27

Masked autoencoders are scalable learners of cellular morphology

  • paper_url: http://arxiv.org/abs/2309.16064
  • repo_url: https://github.com/NumtraCG/614ca4eaa2b781088de64a5f20210923-160645routingmodel230921
  • paper_authors: Oren Kraus, Kian Kenyon-Dean, Saber Saberian, Maryam Fallah, Peter McLean, Jess Leung, Vasudev Sharma, Ayla Khan, Jia Balakrishnan, Safiye Celik, Maciej Sypetkowski, Chi Vicky Cheng, Kristen Morse, Maureen Makes, Ben Mabey, Berton Earnshaw
  • for: Explores inferring biological relationships from cellular phenotypes in high-content microscopy screens and the application of deep learning models in biological research.
  • methods: Trains weakly supervised and self-supervised deep learning models at increasing scale and evaluates how their performance scales with larger models and datasets (a minimal masked-autoencoder training sketch follows this entry).
  • results: Both CNN- and ViT-based masked autoencoders outperform weakly supervised models; at the largest scale, a ViT-L/8 trained on over 3.5 billion unique crops from 95 million microscopy images achieves relative improvements of up to 28% at inferring known biological relationships.
    Abstract Inferring biological relationships from cellular phenotypes in high-content microscopy screens provides significant opportunity and challenge in biological research. Prior results have shown that deep vision models can capture biological signal better than hand-crafted features. This work explores how weakly supervised and self-supervised deep learning approaches scale when training larger models on larger datasets. Our results show that both CNN- and ViT-based masked autoencoders significantly outperform weakly supervised models. At the high-end of our scale, a ViT-L/8 trained on over 3.5-billion unique crops sampled from 95-million microscopy images achieves relative improvements as high as 28% over our best weakly supervised models at inferring known biological relationships curated from public databases.
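Below is a minimal, hedged sketch of the masked-autoencoder pretraining objective at the heart of this work: random patches of each crop are hidden and the model is trained to reconstruct them. The patch size, hidden width, 75% mask ratio, and the random tensors standing in for microscopy crops are illustrative assumptions, not the paper's ViT-L/8 configuration.

```python
# Minimal masked-autoencoder sketch on pre-extracted patches (assumed shapes/sizes).
import torch
import torch.nn as nn

PATCH, N_PATCHES, DIM = 16, 64, 128          # 8x8 grid of 16x16 patches per crop

class TinyMAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(PATCH * PATCH, DIM), nn.GELU(),
                                 nn.Linear(DIM, DIM))
        self.dec = nn.Sequential(nn.Linear(DIM, DIM), nn.GELU(),
                                 nn.Linear(DIM, PATCH * PATCH))

    def forward(self, patches, mask):
        # Zero out masked patches before encoding, then reconstruct all patches.
        visible = patches * (~mask).unsqueeze(-1)
        return self.dec(self.enc(visible))

def mae_loss(model, patches, mask_ratio=0.75):
    mask = torch.rand(patches.shape[:2]) < mask_ratio      # (batch, n_patches)
    recon = model(patches, mask)
    # Loss is computed only on the masked patches, as in standard MAE training.
    return ((recon - patches) ** 2).mean(dim=-1)[mask].mean()

model = TinyMAE()
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
batch = torch.rand(8, N_PATCHES, PATCH * PATCH)             # stand-in for image crops
loss = mae_loss(model, batch)
loss.backward()
opt.step()
print(float(loss))
```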

Towards Best Practices of Activation Patching in Language Models: Metrics and Methods

  • paper_url: http://arxiv.org/abs/2309.16042
  • repo_url: None
  • paper_authors: Fred Zhang, Neel Nanda
  • for: Advances mechanistic interpretability of machine learning models, i.e., understanding a model's internal mechanisms, where localization of the important components is a key step.
  • methods: Systematically examines activation patching (also known as causal tracing or interchange intervention), varying methodological details such as evaluation metrics and corruption methods (a schematic patching sketch follows this entry).
  • results: Across several localization and circuit-discovery settings in language models, varying these hyperparameters can lead to disparate interpretability results; the paper gives conceptual arguments for preferring certain metrics or methods and recommends best practices for activation patching going forward.
    Abstract Mechanistic interpretability seeks to understand the internal mechanisms of machine learning models, where localization -- identifying the important model components -- is a key step. Activation patching, also known as causal tracing or interchange intervention, is a standard technique for this task (Vig et al., 2020), but the literature contains many variants with little consensus on the choice of hyperparameters or methodology. In this work, we systematically examine the impact of methodological details in activation patching, including evaluation metrics and corruption methods. In several settings of localization and circuit discovery in language models, we find that varying these hyperparameters could lead to disparate interpretability results. Backed by empirical observations, we give conceptual arguments for why certain metrics or methods may be preferred. Finally, we provide recommendations for the best practices of activation patching going forwards.
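The following is a schematic sketch of activation patching on a toy two-layer network using PyTorch forward hooks: the activation from a clean run is cached and then patched into a run on corrupted input, and a simple logit-recovery metric measures how much behaviour is restored. The toy model, inputs, and the particular metric are assumptions for illustration; the paper's point is precisely that such metric and corruption choices matter.

```python
# Schematic activation patching; real use targets a layer inside a language model.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
clean_x, corrupt_x = torch.randn(1, 4), torch.randn(1, 4)

cache = {}

def save_hook(module, inputs, output):
    cache["act"] = output.detach()

def patch_hook(module, inputs, output):
    return cache["act"]            # overwrite the corrupted activation with the clean one

layer = model[0]
h = layer.register_forward_hook(save_hook)
clean_logits = model(clean_x)       # run 1: cache the clean activation
h.remove()

h = layer.register_forward_hook(patch_hook)
patched_logits = model(corrupt_x)   # run 2: corrupted input, clean activation patched in
h.remove()

corrupt_logits = model(corrupt_x)   # run 3: corrupted baseline, no patch

# One common metric: the fraction of the clean-vs-corrupt logit gap the patch recovers.
gap = (clean_logits - corrupt_logits)[0, 0]
recovered = (patched_logits - corrupt_logits)[0, 0] / gap
print(float(recovered))
```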

MedEdit: Model Editing for Medical Question Answering with External Knowledge Bases

  • paper_url: http://arxiv.org/abs/2309.16035
  • repo_url: None
  • paper_authors: Yucheng Shi, Shaochen Xu, Zhengliang Liu, Tianming Liu, Xiang Li, Ninghao Liu
  • for: Improves large language model (LLM) performance on medical question answering (QA) without fine-tuning or retraining.
  • methods: Performs model editing via in-context learning, retrieving medical facts from an external knowledge base and incorporating them into the query prompt (a toy retrieve-then-prompt sketch follows this entry).
  • results: On medical QA with the MedQA-SMILE dataset, the edited Vicuna model improves in accuracy from 44.46% to 48.54%, showing that model editing can enhance LLM performance and offering a practical way to mitigate the challenges of black-box LLMs.
    Abstract Large Language Models (LLMs), although powerful in general domains, often perform poorly on domain-specific tasks like medical question answering (QA). Moreover, they tend to function as "black-boxes," making it challenging to modify their behavior. Addressing this, our study delves into model editing utilizing in-context learning, aiming to improve LLM responses without the need for fine-tuning or retraining. Specifically, we propose a comprehensive retrieval strategy to extract medical facts from an external knowledge base, and then we incorporate them into the query prompt for the LLM. Focusing on medical QA using the MedQA-SMILE dataset, we evaluate the impact of different retrieval models and the number of facts provided to the LLM. Notably, our edited Vicuna model exhibited an accuracy improvement from 44.46% to 48.54%. This work underscores the potential of model editing to enhance LLM performance, offering a practical approach to mitigate the challenges of black-box LLMs.
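A toy sketch of the retrieve-then-prompt pattern described above: score knowledge-base facts against the question, prepend the top-k to the prompt, and hand the prompt to the LLM. The Jaccard-overlap scorer, the example facts, and the prompt wording are placeholders; the paper evaluates several real retrieval models and fact counts on MedQA-SMILE.

```python
# Placeholder retriever + prompt builder for in-context model editing.
def score(question, fact):
    q, f = set(question.lower().split()), set(fact.lower().split())
    return len(q & f) / (len(q | f) or 1)          # Jaccard overlap as a stand-in retriever

def build_prompt(question, knowledge_base, k=3):
    facts = sorted(knowledge_base, key=lambda f: score(question, f), reverse=True)[:k]
    context = "\n".join(f"- {f}" for f in facts)
    return (f"Use the following medical facts to answer.\n{context}\n\n"
            f"Question: {question}\nAnswer:")

kb = [
    "Metformin is a first-line therapy for type 2 diabetes.",
    "Beta blockers reduce heart rate and blood pressure.",
    "ACE inhibitors can cause a dry cough.",
]
prompt = build_prompt("Which drug class is associated with a dry cough?", kb, k=2)
print(prompt)      # this prompt would then be sent to the LLM (e.g., Vicuna)
```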

Symbolic Imitation Learning: From Black-Box to Explainable Driving Policies

  • paper_url: http://arxiv.org/abs/2309.16025
  • repo_url: None
  • paper_authors: Iman Sharifi, Saber Fallah
  • for: Improves the reliability and safety of autonomous driving systems.
  • methods: Introduces Symbolic Imitation Learning (SIL), which uses Inductive Logic Programming (ILP) to learn transparent, explainable driving policies, and compares it against prevailing neural-network-based imitation learning methods.
  • results: Improves the interpretability and generalizability of driving policies and significantly improves their applicability across varied driving situations on the real-world highD dataset.
    Abstract Current methods of imitation learning (IL), primarily based on deep neural networks, offer efficient means for obtaining driving policies from real-world data but suffer from significant limitations in interpretability and generalizability. These shortcomings are particularly concerning in safety-critical applications like autonomous driving. In this paper, we address these limitations by introducing Symbolic Imitation Learning (SIL), a groundbreaking method that employs Inductive Logic Programming (ILP) to learn driving policies which are transparent, explainable and generalisable from available datasets. Utilizing the real-world highD dataset, we subject our method to a rigorous comparative analysis against prevailing neural-network-based IL methods. Our results demonstrate that SIL not only enhances the interpretability of driving policies but also significantly improves their applicability across varied driving situations. Hence, this work offers a novel pathway to more reliable and safer autonomous driving systems, underscoring the potential of integrating ILP into the domain of IL.

Clinical Trial Recommendations Using Semantics-Based Inductive Inference and Knowledge Graph Embeddings

  • paper_url: http://arxiv.org/abs/2309.15979
  • repo_url: None
  • paper_authors: Murthy V. Devarakonda, Smita Mohanty, Raja Rao Sunkishala, Nag Mallampalli, Xiong Liu
  • for: Proposes a recommendation methodology for designing new clinical trials, based on exhaustive mining of past clinical trial records.
  • methods: Designs a knowledge graph (KG) of clinical trial data, trains knowledge graph embeddings (KGE) on it, studies the effectiveness of various KGE methods, and introduces a novel KGE-based inductive inference used to generate recommendations (a toy embedding sketch follows this entry).
  • results: The approach achieves relevance scores of 70%-83%, measured as text similarity to actual clinical trial elements, with the most relevant recommendation found near the top of the list; training KGE with node semantics is identified as a potential further improvement.
    Abstract Designing a new clinical trial entails many decisions, such as defining a cohort and setting the study objectives to name a few, and therefore can benefit from recommendations based on exhaustive mining of past clinical trial records. Here, we propose a novel recommendation methodology, based on neural embeddings trained on a first-of-a-kind knowledge graph of clinical trials. We addressed several important research questions in this context, including designing a knowledge graph (KG) for clinical trial data, effectiveness of various KG embedding (KGE) methods for it, a novel inductive inference using KGE, and its use in generating recommendations for clinical trial design. We used publicly available data from clinicaltrials.gov for the study. Results show that our recommendations approach achieves relevance scores of 70%-83%, measured as the text similarity to actual clinical trial elements, and the most relevant recommendation can be found near the top of list. Our study also suggests potential improvement in training KGE using node semantics.
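A minimal TransE-style knowledge-graph-embedding sketch of the kind of training such a pipeline builds on: entities and relations get embeddings, and a margin loss pushes head + relation close to the true tail and away from a corrupted one. The entities, relations, margin, and the final "recommend by proximity" scoring are illustrative assumptions, not the paper's actual KG schema or KGE method.

```python
# Toy TransE training over (trial, relation, element) triples.
import torch
import torch.nn as nn

entities = ["trial_1", "cohort_adults", "endpoint_hba1c", "trial_2"]
relations = ["has_cohort", "has_endpoint"]
E, R = {e: i for i, e in enumerate(entities)}, {r: i for i, r in enumerate(relations)}

dim = 16
ent = nn.Embedding(len(E), dim)
rel = nn.Embedding(len(R), dim)
opt = torch.optim.Adam(list(ent.parameters()) + list(rel.parameters()), lr=0.01)

triples = [("trial_1", "has_cohort", "cohort_adults"),
           ("trial_1", "has_endpoint", "endpoint_hba1c")]

def dist(h, r, t):
    return (ent(h) + rel(r) - ent(t)).norm(dim=-1)   # TransE: head + relation ≈ tail

for h_s, r_s, t_s in triples:
    h = torch.tensor([E[h_s]]); r = torch.tensor([R[r_s]]); t = torch.tensor([E[t_s]])
    t_neg = torch.randint(len(E), (1,))              # corrupted tail as a negative sample
    loss = torch.relu(1.0 + dist(h, r, t) - dist(h, r, t_neg)).mean()
    opt.zero_grad(); loss.backward(); opt.step()

# Recommendation by proximity: score every entity as a candidate tail for a new trial.
print(dist(torch.tensor([E["trial_2"]]), torch.tensor([R["has_cohort"]]),
           torch.arange(len(E))))
```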

Resilience of Deep Learning applications: a systematic survey of analysis and hardening techniques

  • paper_url: http://arxiv.org/abs/2309.16733
  • repo_url: None
  • paper_authors: Cristiana Bolchini, Luca Cassano, Antonio Miele
  • for: Surveys the resilience of deep learning against hardware faults, systematically reviewing the existing body of work on analysis and hardening techniques.
  • methods: Reviews 163 scientific articles published between January 2019 and March 2023 using a classifying framework that highlights similarities and peculiarities across several parameters, including the main scope of the work, the adopted fault and error models, and reproducibility.
  • results: The framework enables comparison of the different solutions and identification of possible synergies, and the survey sets out open challenges and directions for future research.
    Abstract Machine Learning (ML) is currently being exploited in numerous applications being one of the most effective Artificial Intelligence (AI) technologies, used in diverse fields, such as vision, autonomous systems, and alike. The trend motivated a significant amount of contributions to the analysis and design of ML applications against faults affecting the underlying hardware. The authors investigate the existing body of knowledge on Deep Learning (among ML techniques) resilience against hardware faults systematically through a thoughtful review in which the strengths and weaknesses of this literature stream are presented clearly and then future avenues of research are set out. The review is based on 163 scientific articles published between January 2019 and March 2023. The authors adopt a classifying framework to interpret and highlight research similarities and peculiarities, based on several parameters, starting from the main scope of the work, the adopted fault and error models, to their reproducibility. This framework allows for a comparison of the different solutions and the identification of possible synergies. Furthermore, suggestions concerning the future direction of research are proposed in the form of open challenges to be addressed.

Unified Long-Term Time-Series Forecasting Benchmark

  • paper_url: http://arxiv.org/abs/2309.15946
  • repo_url: https://github.com/MIMUW-RL/Unified-Long-Horizon-Time-Series-Benchmark
  • paper_authors: Jacek Cyranka, Szymon Haponiuk
  • for: Presents a comprehensive dataset explicitly designed for long-term time-series forecasting, to support the advancement of machine learning methods in this area.
  • methods: Benchmarks classical and state-of-the-art models, including LSTM, DeepAR, NLinear, N-Hits, PatchTST, and LatentODE, through an extensive comparative analysis on standardized training and test trajectories (a minimal lookback-window LSTM sketch follows this entry).
  • results: Model effectiveness is dataset-dependent; a custom latent NLinear model and a DeepAR variant enhanced with a curriculum learning phase both consistently outperform their vanilla counterparts.
    Abstract In order to support the advancement of machine learning methods for predicting time-series data, we present a comprehensive dataset designed explicitly for long-term time-series forecasting. We incorporate a collection of datasets obtained from diverse, dynamic systems and real-life records. Each dataset is standardized by dividing it into training and test trajectories with predetermined lookback lengths. We include trajectories of length up to $2000$ to ensure a reliable evaluation of long-term forecasting capabilities. To determine the most effective model in diverse scenarios, we conduct an extensive benchmarking analysis using classical and state-of-the-art models, namely LSTM, DeepAR, NLinear, N-Hits, PatchTST, and LatentODE. Our findings reveal intriguing performance comparisons among these models, highlighting the dataset-dependent nature of model effectiveness. Notably, we introduce a custom latent NLinear model and enhance DeepAR with a curriculum learning phase. Both consistently outperform their vanilla counterparts.
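A minimal lookback-window LSTM forecaster in the spirit of the benchmarked baselines, trained for a few steps on a synthetic sine series; the window length, horizon, hidden size, and data are illustrative only and unrelated to the benchmark's actual trajectories.

```python
# Lookback-window LSTM forecasting sketch on a synthetic series.
import torch
import torch.nn as nn

LOOKBACK, HORIZON = 32, 8

class LSTMForecaster(nn.Module):
    def __init__(self, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, HORIZON)

    def forward(self, x):                      # x: (batch, LOOKBACK, 1)
        out, _ = self.lstm(x)
        return self.head(out[:, -1])           # predict HORIZON steps from the last state

t = torch.linspace(0, 12.56, 400)
series = torch.sin(t)                          # synthetic trajectory stand-in
xs = torch.stack([series[i:i + LOOKBACK] for i in range(300)]).unsqueeze(-1)
ys = torch.stack([series[i + LOOKBACK:i + LOOKBACK + HORIZON] for i in range(300)])

model = LSTMForecaster()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(5):                             # a few illustrative training steps
    pred = model(xs)
    loss = nn.functional.mse_loss(pred, ys)
    opt.zero_grad(); loss.backward(); opt.step()
print(float(loss))
```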

Towards Efficient and Trustworthy AI Through Hardware-Algorithm-Communication Co-Design

  • paper_url: http://arxiv.org/abs/2309.15942
  • repo_url: None
  • paper_authors: Bipin Rajendran, Osvaldo Simeone, Bashir M. Al-Hashimi
  • for: Advocates hardware-algorithm-communication co-design for AI that is both efficient and trustworthy, targeting reliable uncertainty quantification alongside accuracy.
  • methods: Highlights research directions that integrate physical insights into computational substrates, neuroscientific principles for efficient information processing, information-theoretic results on optimal uncertainty quantification, and communication-theoretic guidelines for distributed processing.
  • results: Argues that novel design methodologies can target uncertainty quantification as well as accuracy by leveraging emerging computing architectures beyond the von Neumann paradigm, such as in-memory, neuromorphic, and quantum computing, and by treating the stochasticity of the computational substrate and communication channels as a resource.
    Abstract Artificial intelligence (AI) algorithms based on neural networks have been designed for decades with the goal of maximising some measure of accuracy. This has led to two undesired effects. First, model complexity has risen exponentially when measured in terms of computation and memory requirements. Second, state-of-the-art AI models are largely incapable of providing trustworthy measures of their uncertainty, possibly `hallucinating' their answers and discouraging their adoption for decision-making in sensitive applications. With the goal of realising efficient and trustworthy AI, in this paper we highlight research directions at the intersection of hardware and software design that integrate physical insights into computational substrates, neuroscientific principles concerning efficient information processing, information-theoretic results on optimal uncertainty quantification, and communication-theoretic guidelines for distributed processing. Overall, the paper advocates for novel design methodologies that target not only accuracy but also uncertainty quantification, while leveraging emerging computing hardware architectures that move beyond the traditional von Neumann digital computing paradigm to embrace in-memory, neuromorphic, and quantum computing technologies. An important overarching principle of the proposed approach is to view the stochasticity inherent in the computational substrate and in the communication channels between processors as a resource to be leveraged for the purpose of representing and processing classical and quantum uncertainty.

SHACIRA: Scalable HAsh-grid Compression for Implicit Neural Representations

  • paper_url: http://arxiv.org/abs/2309.15848
  • repo_url: https://github.com/Sharath-girish/Shacira
  • paper_authors: Sharath Girish, Abhinav Shrivastava, Kamal Gupta
  • for: Proposes a task-agnostic framework for compressing the learned feature grids of Instant-NGP-style implicit neural representations, easing their memory footprint for storage and streaming applications.
  • methods: Reparameterizes feature grids with quantized latent weights and applies entropy regularization in the latent space, with no additional post-hoc pruning or quantization stages (a rough quantization sketch follows this entry).
  • results: Achieves high levels of compression across images, videos, and radiance fields, outperforming existing INR approaches without large datasets or domain-specific heuristics; project page: http://shacira.github.io
    Abstract Implicit Neural Representations (INR) or neural fields have emerged as a popular framework to encode multimedia signals such as images and radiance fields while retaining high-quality. Recently, learnable feature grids proposed by Instant-NGP have allowed significant speed-up in the training as well as the sampling of INRs by replacing a large neural network with a multi-resolution look-up table of feature vectors and a much smaller neural network. However, these feature grids come at the expense of large memory consumption which can be a bottleneck for storage and streaming applications. In this work, we propose SHACIRA, a simple yet effective task-agnostic framework for compressing such feature grids with no additional post-hoc pruning/quantization stages. We reparameterize feature grids with quantized latent weights and apply entropy regularization in the latent space to achieve high levels of compression across various domains. Quantitative and qualitative results on diverse datasets consisting of images, videos, and radiance fields, show that our approach outperforms existing INR approaches without the need for any large datasets or domain-specific heuristics. Our project page is available at http://shacira.github.io .
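A rough sketch of the quantize-the-latents idea: grid features are generated from latent weights that are rounded with a straight-through estimator, and a crude rate proxy stands in for the entropy-regularization term. Everything here (sizes, the decoder, the rate proxy, the synthetic target) is an assumption for illustration, not SHACIRA's actual architecture or entropy model.

```python
# Straight-through quantization of latent weights with a placeholder rate term.
import torch
import torch.nn as nn

latents = nn.Parameter(torch.randn(256, 8))            # latent weights behind the grid
decoder = nn.Linear(8, 4)                               # maps latents to grid features
mlp = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 1))
opt = torch.optim.Adam([latents, *decoder.parameters(), *mlp.parameters()], lr=1e-3)

target = torch.rand(256, 1)                             # stand-in for the signal to fit

for _ in range(5):
    q = latents + (latents.round() - latents).detach()  # straight-through quantization
    features = decoder(q)
    recon_loss = nn.functional.mse_loss(mlp(features), target)
    rate_proxy = torch.log2(1.0 + q.abs()).mean()       # placeholder for the entropy term
    loss = recon_loss + 1e-3 * rate_proxy
    opt.zero_grad(); loss.backward(); opt.step()
print(float(recon_loss), float(rate_proxy))
```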

Examining the Values Reflected by Children during AI Problem Formulation

  • paper_url: http://arxiv.org/abs/2309.15839
  • repo_url: None
  • paper_authors: Utkarsh Dwivedi, Salma Elsayed-ali, Elizabeth Bonsignore, Hernisa Kacorri
  • for: Investigates the goals and values children prioritize when designing and training AI interfaces such as teachable machines.
  • methods: A co-design session using a modified storyboard, in which a team of five children (aged 7-13) and adult co-designers engaged in AI problem formulation activities, imagining their own teachable machines; the findings are analyzed through the Rokeach Value Survey framework.
  • results: Children's proposals call for advanced system intelligence, such as emotion detection and understanding a user's social relationships; their ideas show they care about family and expect machines to understand social context before making decisions.
    Abstract Understanding how children design and what they value in AI interfaces that allow them to explicitly train their models such as teachable machines, could help increase such activities' impact and guide the design of future technologies. In a co-design session using a modified storyboard, a team of 5 children (aged 7-13 years) and adult co-designers, engaged in AI problem formulation activities where they imagine their own teachable machines. Our findings, leveraging an established psychological value framework (the Rokeach Value Survey), illuminate how children conceptualize and embed their values in AI systems that they themselves devise to support their everyday activities. Specifically, we find that children's proposed ideas require advanced system intelligence, e.g. emotion detection and understanding the social relationships of a user. The underlying models could be trained under multiple modalities and any errors would be fixed by adding more data or by anticipating negative examples. Children's ideas showed they cared about family and expected machines to understand their social context before making decisions.

OrthoPlanes: A Novel Representation for Better 3D-Awareness of GANs

  • paper_url: http://arxiv.org/abs/2309.15830
  • repo_url: None
  • paper_authors: Honglin He, Zhuoqian Yang, Shikai Li, Bo Dai, Wayne Wu
  • for: Generates realistic, view-consistent images with fine geometry from 2D image collections.
  • methods: Proposes OrthoPlanes, a hybrid explicit-implicit representation that encodes fine-grained 3D information in feature maps which can be generated efficiently by modifying 2D StyleGANs.
  • results: Handles more challenging view angles and synthesizes articulated objects with a high spatial degree of freedom, achieving state-of-the-art results on the FFHQ and SHHQ datasets both quantitatively and qualitatively; project page: https://orthoplanes.github.io/
    Abstract We present a new method for generating realistic and view-consistent images with fine geometry from 2D image collections. Our method proposes a hybrid explicit-implicit representation called \textbf{OrthoPlanes}, which encodes fine-grained 3D information in feature maps that can be efficiently generated by modifying 2D StyleGANs. Compared to previous representations, our method has better scalability and expressiveness with clear and explicit information. As a result, our method can handle more challenging view-angles and synthesize articulated objects with high spatial degree of freedom. Experiments demonstrate that our method achieves state-of-the-art results on FFHQ and SHHQ datasets, both quantitatively and qualitatively. Project page: \url{https://orthoplanes.github.io/}.

Lyra: Orchestrating Dual Correction in Automated Theorem Proving

  • paper_url: http://arxiv.org/abs/2309.15806
  • repo_url: https://github.com/chuanyang-zheng/lyra-theorem-prover
  • paper_authors: Chuanyang Zheng, Haiming Wang, Enze Xie, Zhengying Liu, Jiankai Sun, Huajian Xin, Jianhao Shen, Zhenguo Li, Yu Li
  • for: Improves the effectiveness of large language models (LLMs) in formal theorem proving, in particular by mitigating hallucinations and refining proofs with prover error feedback.
  • methods: Proposes Lyra, a framework with two correction mechanisms: Tool Correction (TC), which uses prior knowledge of predefined prover tools (e.g., Sledgehammer) to guide the replacement of incorrect tools and mitigate hallucination, and Conjecture Correction (CC), an error feedback mechanism that interacts with the prover to refine formal proof conjectures using prover error messages.
  • results: Achieves state-of-the-art performance on miniF2F, improving the validation set from 48.0% to 55.3% and the test set from 45.5% to 51.2%, and solves three International Mathematical Olympiad (IMO) problems.
    Abstract Large Language Models (LLMs) present an intriguing avenue for exploration in the field of formal theorem proving. Nevertheless, their full potential, particularly concerning the mitigation of hallucinations and refinement through prover error messages, remains an area that has yet to be thoroughly investigated. To enhance the effectiveness of LLMs in the field, we introduce the Lyra, a new framework that employs two distinct correction mechanisms: Tool Correction (TC) and Conjecture Correction (CC). To implement Tool Correction in the post-processing of formal proofs, we leverage prior knowledge to utilize predefined prover tools (e.g., Sledgehammer) for guiding the replacement of incorrect tools. Tool Correction significantly contributes to mitigating hallucinations, thereby improving the overall accuracy of the proof. In addition, we introduce Conjecture Correction, an error feedback mechanism designed to interact with prover to refine formal proof conjectures with prover error messages. Compared to the previous refinement framework, the proposed Conjecture Correction refines generation with instruction but does not collect paired (generation, error & refinement) prompts. Our method has achieved state-of-the-art (SOTA) performance on both miniF2F validation (48.0% -> 55.3%) and test (45.5% -> 51.2%). We also present 3 IMO problems solved by Lyra. We believe Tool Correction (post-process for hallucination mitigation) and Conjecture Correction (subgoal adjustment from interaction with environment) could provide a promising avenue for future research in this field.

AI in Software Engineering: Case Studies and Prospects

  • paper_url: http://arxiv.org/abs/2309.15768
  • repo_url: None
  • paper_authors: Lei Wang
  • for: Examines the relationship between artificial intelligence (AI) and software engineering (SE) and how AI techniques can be applied across software development to improve product quality.
  • methods: Analyzes, evaluates, and compares two case studies, IBM Watson and Google AlphaGo, which use different AI techniques to solve challenging real-world problems.
  • results: Using AI techniques such as deep learning and machine learning in software systems contributes to intelligent systems: Watson adopts a "decision-making support" strategy to help humans make decisions, whereas AlphaGo uses "self-decision making" to choose the operations that lead to the best outcome; Watson learns from man-made resources such as papers, while AlphaGo learns from massive online resources such as photos and uses neural networks and reinforcement learning to mimic the human brain, which could be useful in medical research for diagnosis and treatment. Reproducing the human brain in machines remains far off, however, since brains and machines are intrinsically different.
    Abstract Artificial intelligence (AI) and software engineering (SE) are two important areas in computer science. In recent years, researchers are trying to apply AI techniques in various stages of software development to improve the overall quality of software products. Moreover, there are also some researchers focus on the intersection between SE and AI. In fact, the relationship between SE and AI is very weak; however, methods and techniques in one area have been adopted in another area. More and more software products are capable of performing intelligent behaviour like human beings. In this paper, two cases studies which are IBM Watson and Google AlphaGo that use different AI techniques in solving real world challenging problems have been analysed, evaluated and compared. Based on the analysis of both case studies, using AI techniques such as deep learning and machine learning in software systems contributes to intelligent systems. Watson adopts 'decision making support' strategy to help human make decisions; whereas AlphaGo uses 'self-decision making' to choose operations that contribute to the best outcome. In addition, Watson learns from man-made resources such as paper; AlphaGo, on the other hand, learns from massive online resources such as photos. AlphaGo uses neural networks and reinforcement learning to mimic human brain, which might be very useful in medical research for diagnosis and treatment. However, there is still a long way to go if we want to reproduce human brain in machine and view computers as thinkers, because human brain and machines are intrinsically different. It would be more promising to see whether computers and software systems will become more and more intelligent to help with real world challenging problems that human beings cannot do.

Borges and AI

  • paper_url: http://arxiv.org/abs/2310.01425
  • repo_url: None
  • paper_authors: Léon Bottou, Bernhard Schölkopf
  • for: Examines the widespread belief that large language models (LLMs) open the era of artificial intelligence (AI), along with the opportunities and dangers attributed to this technology.
  • methods: Uses the literary imagery of Jorge Luis Borges, a master of 20th-century literature, as a lens for understanding LLMs and their connection to AI, in place of the imagery popularised by science fiction.
  • results: This exercise leads to a new perspective that illuminates the relation between language modelling and artificial intelligence.
    Abstract Many believe that Large Language Models (LLMs) open the era of Artificial Intelligence (AI). Some see opportunities while others see dangers. Yet both proponents and opponents grasp AI through the imagery popularised by science fiction. Will the machine become sentient and rebel against its creators? Will we experience a paperclip apocalypse? Before answering such questions, we should first ask whether this mental imagery provides a good description of the phenomenon at hand. Understanding weather patterns through the moods of the gods only goes so far. The present paper instead advocates understanding LLMs and their connection to AI through the imagery of Jorge Luis Borges, a master of 20th century literature, forerunner of magical realism, and precursor to postmodern literature. This exercise leads to a new perspective that illuminates the relation between language modelling and artificial intelligence.

Latent Graphs for Semi-Supervised Learning on Biomedical Tabular Data

  • paper_url: http://arxiv.org/abs/2309.15757
  • repo_url: None
  • paper_authors: Boshko Koloski, Nada Lavrač, Senja Pollak, Blaž Škrlj
  • for: Improves the robustness and performance of semi-supervised learning by discovering inter-instance relationships among (un)labeled data.
  • methods: Infers latent graphs that capture intrinsic data relationships; the graph-based representation lets information propagate through the graph, incorporating both global and local knowledge (a small graph-propagation sketch follows this entry).
  • results: On three biomedical tabular datasets, the proposed graph-based approach outperforms contemporary state-of-the-art methods for (semi-)supervised learning.
    Abstract In the domain of semi-supervised learning, the current approaches insufficiently exploit the potential of considering inter-instance relationships among (un)labeled data. In this work, we address this limitation by providing an approach for inferring latent graphs that capture the intrinsic data relationships. By leveraging graph-based representations, our approach facilitates the seamless propagation of information throughout the graph, effectively incorporating global and local knowledge. Through evaluations on biomedical tabular datasets, we compare the capabilities of our approach to other contemporary methods. Our work demonstrates the significance of inter-instance relationship discovery as practical means for constructing robust latent graphs to enhance semi-supervised learning techniques. The experiments show that the proposed methodology outperforms contemporary state-of-the-art methods for (semi-)supervised learning on three biomedical datasets.
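A small sketch of the "infer a latent graph, then let information flow over it" idea using a kNN graph and classic label propagation on toy tabular data; the Euclidean kNN construction, k=4, and the propagation scheme are assumptions standing in for the paper's actual latent-graph method.

```python
# kNN latent graph + label propagation on toy tabular data.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))                         # tabular features
y = np.full(30, -1); y[:5] = rng.integers(0, 2, 5)   # only 5 labeled rows

def knn_adjacency(X, k=4):
    d = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d, np.inf)
    A = np.zeros_like(d)
    for i in range(len(X)):
        A[i, np.argsort(d[i])[:k]] = 1.0
    return np.maximum(A, A.T)                        # symmetrize the latent graph

A = knn_adjacency(X)
P = A / A.sum(1, keepdims=True)                      # row-normalized propagation matrix

F = np.zeros((30, 2))
F[np.arange(30)[y >= 0], y[y >= 0]] = 1.0            # one-hot seed labels
for _ in range(20):                                  # propagate, clamping the labeled rows
    F = P @ F
    F[y >= 0] = 0.0
    F[np.arange(30)[y >= 0], y[y >= 0]] = 1.0
print(F.argmax(1)[:10])
```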

Experience and Evidence are the eyes of an excellent summarizer! Towards Knowledge Infused Multi-modal Clinical Conversation Summarization

  • paper_url: http://arxiv.org/abs/2309.15739
  • repo_url: https://github.com/nlp-rl/mm-cliconsummation
  • paper_authors: Abhisek Tiwari, Anisha Saha, Sriparna Saha, Pushpak Bhattacharyya, Minakshi Dhar
  • for: Introduces a multi-modal clinical conversation summarization task that takes a clinician-patient interaction (textual and visual information) and generates a succinct synopsis of the conversation.
  • methods: A knowledge-infused, multi-modal, multi-tasking framework for medical department identification and clinical conversation summarization (MM-CliConSummation), which uses an adapter to infuse knowledge and visual features and unifies the fused feature vector with a gated mechanism (a schematic gated-fusion sketch follows this entry).
  • results: Extensive quantitative and qualitative experiments show (a) the critical significance of visuals, (b) more precise, medical-entity-preserving summaries with additional knowledge infusion, and (c) a correlation between medical department identification and clinical synopsis generation.
    Abstract With the advancement of telemedicine, both researchers and medical practitioners are working hand-in-hand to develop various techniques to automate various medical operations, such as diagnosis report generation. In this paper, we first present a multi-modal clinical conversation summary generation task that takes a clinician-patient interaction (both textual and visual information) and generates a succinct synopsis of the conversation. We propose a knowledge-infused, multi-modal, multi-tasking medical domain identification and clinical conversation summary generation (MM-CliConSummation) framework. It leverages an adapter to infuse knowledge and visual features and unify the fused feature vector using a gated mechanism. Furthermore, we developed a multi-modal, multi-intent clinical conversation summarization corpus annotated with intent, symptom, and summary. The extensive set of experiments, both quantitatively and qualitatively, led to the following findings: (a) critical significance of visuals, (b) more precise and medical entity preserving summary with additional knowledge infusion, and (c) a correlation between medical department identification and clinical synopsis generation. Furthermore, the dataset and source code are available at https://github.com/NLP-RL/MM-CliConSummation.
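A schematic sketch of the adapter-plus-gate fusion described above at toy scale: visual and knowledge features pass through an adapter, and a sigmoid gate mixes them with the textual representation before summarization. The dimensions, concatenation layout, and sigmoid gate are assumptions, not the exact MM-CliConSummation design.

```python
# Toy gated fusion of text, visual, and knowledge features.
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.adapter = nn.Linear(dim * 2, dim)       # fuses visual + knowledge features
        self.gate = nn.Linear(dim * 2, dim)

    def forward(self, text, visual, knowledge):
        extra = self.adapter(torch.cat([visual, knowledge], dim=-1))
        g = torch.sigmoid(self.gate(torch.cat([text, extra], dim=-1)))
        return g * text + (1 - g) * extra            # gated mix fed to the summarizer

fusion = GatedFusion()
text, visual, knowledge = (torch.randn(2, 64) for _ in range(3))
print(fusion(text, visual, knowledge).shape)
```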

MindGPT: Interpreting What You See with Non-invasive Brain Recordings

  • paper_url: http://arxiv.org/abs/2309.15729
  • repo_url: https://github.com/jxuanc/mindgpt
  • paper_authors: Jiaxuan Chen, Yu Qi, Yueming Wang, Gang Pan
  • for: Decodes seen visual content into natural language from non-invasive brain recordings.
  • methods: Introduces MindGPT, a non-invasive neural decoder that interprets perceived visual stimuli as natural language from fMRI signals; it builds on a visually guided neural encoder with a cross-attention mechanism and, through collaborative use of the large language model GPT, steers latent neural representations toward a desired language-semantic direction end-to-end.
  • results: The generated word sequences truthfully represent the visual information conveyed in the seen stimuli, and the learned neural representations are explainable enough to evaluate the contribution of visual properties to language semantics; the higher visual cortex (HVC) is more semantically informative than the lower visual cortex (LVC), and using only the HVC recovers most of the semantic information.
    Abstract Decoding of seen visual contents with non-invasive brain recordings has important scientific and practical values. Efforts have been made to recover the seen images from brain signals. However, most existing approaches cannot faithfully reflect the visual contents due to insufficient image quality or semantic mismatches. Compared with reconstructing pixel-level visual images, speaking is a more efficient and effective way to explain visual information. Here we introduce a non-invasive neural decoder, termed as MindGPT, which interprets perceived visual stimuli into natural languages from fMRI signals. Specifically, our model builds upon a visually guided neural encoder with a cross-attention mechanism, which permits us to guide latent neural representations towards a desired language semantic direction in an end-to-end manner by the collaborative use of the large language model GPT. By doing so, we found that the neural representations of the MindGPT are explainable, which can be used to evaluate the contributions of visual properties to language semantics. Our experiments show that the generated word sequences truthfully represented the visual information (with essential details) conveyed in the seen stimuli. The results also suggested that with respect to language decoding tasks, the higher visual cortex (HVC) is more semantically informative than the lower visual cortex (LVC), and using only the HVC can recover most of the semantic information. The code of the MindGPT model will be publicly available at https://github.com/JxuanC/MindGPT.

Where Are We So Far? Understanding Data Storytelling Tools from the Perspective of Human-AI Collaboration

  • paper_url: http://arxiv.org/abs/2309.15723
  • repo_url: None
  • paper_authors: Haotian Li, Yun Wang, Huamin Qu
  • for: Examines data storytelling tools from the perspective of human-AI collaboration, a view largely missing from existing research, which limits how researchers can reflect on and learn from current collaborative tool designs.
  • methods: Analyzes existing tools with a framework covering the stages of the storytelling workflow a tool serves (analysis, planning, implementation, and communication) and the roles humans and AI play in each stage (creators, assistants, optimizers, and reviewers).
  • results: Identifies common collaboration patterns in existing tools, summarizes lessons learned from these patterns, and outlines research opportunities for human-AI collaboration in data storytelling.
    Abstract Data storytelling is powerful for communicating data insights, but it requires diverse skills and considerable effort from human creators. Recent research has widely explored the potential for artificial intelligence (AI) to support and augment humans in data storytelling. However, there lacks a systematic review to understand data storytelling tools from the perspective of human-AI collaboration, which hinders researchers from reflecting on the existing collaborative tool designs that promote humans' and AI's advantages and mitigate their shortcomings. This paper investigated existing tools with a framework from two perspectives: the stages in the storytelling workflow where a tool serves, including analysis, planning, implementation, and communication, and the roles of humans and AI in each stage, such as creators, assistants, optimizers, and reviewers. Through our analysis, we recognize the common collaboration patterns in existing tools, summarize lessons learned from these patterns, and further illustrate research opportunities for human-AI collaboration in data storytelling.

Model Share AI: An Integrated Toolkit for Collaborative Machine Learning Model Development, Provenance Tracking, and Deployment in Python

  • paper_url: http://arxiv.org/abs/2309.15719
  • repo_url: None
  • paper_authors: Heinrich Peters, Michael Parrott
  • for: Addresses the issue that many machine learning (ML) projects never progress past the proof-of-concept stage by introducing Model Share AI (AIMS), an easy-to-use platform designed to streamline collaborative model development, model provenance tracking, and model deployment.
  • methods: Describes the features of AIMS, including collaborative project spaces, a standardized model evaluation process that ranks submissions on unseen evaluation data, and the ability to deploy ML models built in Scikit-Learn, TensorFlow Keras, PyTorch, and ONNX into live REST APIs and automatically generated web apps with minimal code.
  • results: Highlights the potential of AIMS to make ML research more applicable to real-world challenges by facilitating collaborative model development, capturing model performance and metadata for provenance tracking, and making models accessible to non-technical end-users through web apps.
    Abstract Machine learning (ML) has the potential to revolutionize a wide range of research areas and industries, but many ML projects never progress past the proof-of-concept stage. To address this issue, we introduce Model Share AI (AIMS), an easy-to-use MLOps platform designed to streamline collaborative model development, model provenance tracking, and model deployment, as well as a host of other functions aiming to maximize the real-world impact of ML research. AIMS features collaborative project spaces and a standardized model evaluation process that ranks model submissions based on their performance on unseen evaluation data, enabling collaborative model development and crowd-sourcing. Model performance and various model metadata are automatically captured to facilitate provenance tracking and allow users to learn from and build on previous submissions. Additionally, AIMS allows users to deploy ML models built in Scikit-Learn, TensorFlow Keras, PyTorch, and ONNX into live REST APIs and automatically generated web apps with minimal code. The ability to deploy models with minimal effort and to make them accessible to non-technical end-users through web apps has the potential to make ML research more applicable to real-world challenges.
    摘要 机器学习(ML)有望革命化广泛的研究领域和行业,但许多ML项目很难进入实际应用阶段。为解决这问题,我们介绍Model Share AI(AIMS),一个易用的MLOps平台,旨在协同开发模型、追踪模型来源、模型部署以及多种其他功能,以最大化ML研究的实际影响。AIMS提供了协同项目空间和基于未见评估数据的模型评价过程,可以促进协同开发和招募模型。模型性能和多种模型元数据会自动记录,以便追踪模型来源和启发新 submission。此外,AIMS还允许用户通过 minimum code 将 Scikit-Learn、TensorFlow Keras、PyTorch 和 ONNX 中的模型部署到live REST API 和自动生成的网页应用程序中,以便让 ML 研究更加适应实际挑战。

Brave new world: Artificial Intelligence in teaching and learning

  • paper_url: http://arxiv.org/abs/2310.06856
  • repo_url: None
  • paper_authors: Adrian Groza, Anca Marginean
  • for: Examines the use of large language models in teaching and learning, the AI incidents that have already occurred in the education domain, and the urgent need for universities to introduce AI policies and for ongoing strategies to regulate AI.
  • methods: Exemplifies how large language models are used in both teaching and learning and reviews AI incidents that have already occurred in education.
  • results: Argues that every higher education institution should have a policy for AI in teaching and learning, both to raise awareness of the many educational tools that can positively or negatively affect education and to minimize the risk of AI incidents in education.
    Abstract We exemplify how Large Language Models are used in both teaching and learning. We also discuss the AI incidents that have already occurred in the education domain, and we argue for the urgent need to introduce AI policies in universities and for the ongoing strategies to regulate AI. Regarding policy for AI, our view is that each institution should have a policy for AI in teaching and learning. This is important from at least twofolds: (i) to raise awareness on the numerous educational tools that can both positively and negatively affect education; (ii) to minimise the risk of AI incidents in education.

Identifying and Mitigating Privacy Risks Stemming from Language Models: A Survey

  • paper_url: http://arxiv.org/abs/2310.01424
  • repo_url: None
  • paper_authors: Victoria Smith, Ali Shahin Shamsabadi, Carolyn Ashurst, Adrian Weller
  • for: Helps researchers and policymakers understand the state of knowledge around privacy risks stemming from language models (LMs) and their mitigations, including where more work is needed.
  • methods: Presents the first technical survey on LM privacy: identifies a taxonomy of salient dimensions along which attacks on LMs differ, surveys existing attacks using this taxonomy to highlight key trends, and reviews existing mitigation strategies.
  • results: Highlights the strengths and limitations of existing mitigation strategies, identifies key gaps, and demonstrates open problems and areas for concern.
    Abstract Rapid advancements in language models (LMs) have led to their adoption across many sectors. Alongside the potential benefits, such models present a range of risks, including around privacy. In particular, as LMs have grown in size, the potential to memorise aspects of their training data has increased, resulting in the risk of leaking private information. As LMs become increasingly widespread, it is vital that we understand such privacy risks and how they might be mitigated. To help researchers and policymakers understand the state of knowledge around privacy attacks and mitigations, including where more work is needed, we present the first technical survey on LM privacy. We (i) identify a taxonomy of salient dimensions where attacks differ on LMs, (ii) survey existing attacks and use our taxonomy of dimensions to highlight key trends, (iii) discuss existing mitigation strategies, highlighting their strengths and limitations, identifying key gaps and demonstrating open problems and areas for concern.

Integrating LLM, EEG, and Eye-Tracking Biomarker Analysis for Word-Level Neural State Classification in Semantic Inference Reading Comprehension

  • paper_url: http://arxiv.org/abs/2309.15714
  • repo_url: None
  • paper_authors: Yuhong Zhang, Qin Li, Sujal Nahata, Tasnia Jamal, Shih-kuen Cheng, Gert Cauwenberghs, Tzyy-Ping Jung
  • for: This pilot study aims to provide insights into individuals’ neural states during a semantic relation reading-comprehension task.
  • methods: The study jointly analyzes LLMs, eye-gaze, and electroencephalographic (EEG) data to study how the brain processes words with varying degrees of relevance to a keyword during reading.
  • results: The best validation accuracy in this word-level classification is over 60% across 12 subjects. Words of high relevance to the inference keyword had significantly more eye fixations per word compared to words of low relevance.
    Abstract With the recent proliferation of large language models (LLMs), such as Generative Pre-trained Transformers (GPT), there has been a significant shift in exploring human and machine comprehension of semantic language meaning. This shift calls for interdisciplinary research that bridges cognitive science and natural language processing (NLP). This pilot study aims to provide insights into individuals' neural states during a semantic relation reading-comprehension task. We propose jointly analyzing LLMs, eye-gaze, and electroencephalographic (EEG) data to study how the brain processes words with varying degrees of relevance to a keyword during reading. We also use a feature engineering approach to improve the fixation-related EEG data classification while participants read words with high versus low relevance to the keyword. The best validation accuracy in this word-level classification is over 60\% across 12 subjects. Words of high relevance to the inference keyword had significantly more eye fixations per word: 1.0584 compared to 0.6576 when excluding no-fixation words, and 1.5126 compared to 1.4026 when including them. This study represents the first attempt to classify brain states at a word level using LLM knowledge. It provides valuable insights into human cognitive abilities and the realm of Artificial General Intelligence (AGI), and offers guidance for developing potential reading-assisted technologies.

HyPoradise: An Open Baseline for Generative Speech Recognition with Large Language Models

  • paper_url: http://arxiv.org/abs/2309.15701
  • repo_url: https://github.com/hypotheses-paradise/hypo2trans
  • paper_authors: Chen Chen, Yuchen Hu, Chao-Han Huck Yang, Sabato Marco Siniscalchi, Pin-Yu Chen, Eng Siong Chng
  • for: Proposes large language model (LLM)-based error correction for automatic speech recognition (ASR), to improve ASR performance under adverse conditions.
  • methods: Releases an open-source benchmark with a new dataset, HyPoradise (HP), containing more than 334,000 pairs of N-best hypotheses and corresponding accurate transcriptions across prevalent speech domains, and examines three types of LLM-based error correction techniques with varying amounts of labeled hypothesis-transcription pairs; with a reasonable prompt, the LLM's generative capability can even restore tokens missing from the N-best list (a toy prompt-construction sketch follows this entry).
  • results: The proposed techniques achieve significant word error rate (WER) reductions and surpass the upper bound of traditional re-ranking-based methods; results and pre-trained models are publicly released for reproducible pipelines.
    Abstract Advancements in deep neural networks have allowed automatic speech recognition (ASR) systems to attain human parity on several publicly available clean speech datasets. However, even state-of-the-art ASR systems experience performance degradation when confronted with adverse conditions, as a well-trained acoustic model is sensitive to variations in the speech domain, e.g., background noise. Intuitively, humans address this issue by relying on their linguistic knowledge: the meaning of ambiguous spoken terms is usually inferred from contextual cues thereby reducing the dependency on the auditory system. Inspired by this observation, we introduce the first open-source benchmark to utilize external large language models (LLMs) for ASR error correction, where N-best decoding hypotheses provide informative elements for true transcription prediction. This approach is a paradigm shift from the traditional language model rescoring strategy that can only select one candidate hypothesis as the output transcription. The proposed benchmark contains a novel dataset, HyPoradise (HP), encompassing more than 334,000 pairs of N-best hypotheses and corresponding accurate transcriptions across prevalent speech domains. Given this dataset, we examine three types of error correction techniques based on LLMs with varying amounts of labeled hypotheses-transcription pairs, which gains a significant word error rate (WER) reduction. Experimental evidence demonstrates the proposed technique achieves a breakthrough by surpassing the upper bound of traditional re-ranking based methods. More surprisingly, LLM with reasonable prompt and its generative capability can even correct those tokens that are missing in N-best list. We make our results publicly accessible for reproducible pipelines with released pre-trained models, thus providing a new evaluation paradigm for ASR error correction with LLMs.
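A toy sketch of how an N-best list can be turned into a correction prompt for an LLM; the instruction wording and example hypotheses are made up, and the actual LLM call (as well as the paper's three specific correction techniques) is left out.

```python
# Build a correction prompt from an ASR N-best list.
def nbest_prompt(hypotheses):
    listing = "\n".join(f"{i + 1}. {h}" for i, h in enumerate(hypotheses))
    return ("The following are N-best hypotheses from a speech recognizer for one "
            "utterance. Report the most likely true transcription, fixing errors and "
            f"restoring missing words if needed.\n{listing}\nTranscription:")

nbest = ["i red the book last night",
         "i read the book last night",
         "i read a book last night"]
print(nbest_prompt(nbest))   # this prompt would then be sent to the LLM
```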

Deep Model Fusion: A Survey

  • paper_url: http://arxiv.org/abs/2309.15698
  • repo_url: https://github.com/Aryia-Behroziuan/References
  • paper_authors: Weishi Li, Yong Peng, Miao Zhang, Liang Ding, Han Hu, Li Shen
  • for: Surveys deep model fusion, which merges the parameters or predictions of multiple deep learning models into a single one, with particular attention to the challenges of fusing large-scale models such as LLMs and foundation models (high computational cost, high-dimensional parameter space, interference between heterogeneous models).
  • methods: Categorizes existing deep model fusion methods into four groups: (1) "mode connectivity", which connects solutions in weight space via paths of non-increasing loss to obtain better initialization for fusion; (2) "alignment", which matches units between neural networks to create better conditions for fusion; (3) "weight average", a classical method that averages the weights of multiple models to obtain results closer to the optimal solution (a minimal averaging sketch follows this entry); (4) "ensemble learning", which combines the outputs of diverse models and is a foundational technique for improving the accuracy and robustness of the final model.
  • results: Analyzes the challenges faced by deep model fusion and proposes possible future research directions; the comparison of methods helps clarify how the different fusion approaches relate to one another and to practical applications.
    Abstract Deep model fusion/merging is an emerging technique that merges the parameters or predictions of multiple deep learning models into a single one. It combines the abilities of different models to make up for the biases and errors of a single model to achieve better performance. However, deep model fusion on large-scale deep learning models (e.g., LLMs and foundation models) faces several challenges, including high computational cost, high-dimensional parameter space, interference between different heterogeneous models, etc. Although model fusion has attracted widespread attention due to its potential to solve complex real-world tasks, there is still a lack of complete and detailed survey research on this technique. Accordingly, in order to understand the model fusion method better and promote its development, we present a comprehensive survey to summarize the recent progress. Specifically, we categorize existing deep model fusion methods as four-fold: (1) "Mode connectivity", which connects the solutions in weight space via a path of non-increasing loss, in order to obtain better initialization for model fusion; (2) "Alignment" matches units between neural networks to create better conditions for fusion; (3) "Weight average", a classical model fusion method, averages the weights of multiple models to obtain more accurate results closer to the optimal solution; (4) "Ensemble learning" combines the outputs of diverse models, which is a foundational technique for improving the accuracy and robustness of the final model. In addition, we analyze the challenges faced by deep model fusion and propose possible research directions for model fusion in the future. Our review is helpful in deeply understanding the correlation between different model fusion methods and practical application methods, which can enlighten the research in the field of deep model fusion.
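A minimal sketch of the "weight average" fusion family (the third category above), assuming the models to be fused share an architecture so their state_dicts align key-by-key; per the survey, alignment and mode connectivity aim to create better conditions for exactly this kind of averaging.

```python
# Average the state_dicts of architecture-identical models.
import torch
import torch.nn as nn

def average_state_dicts(models):
    avg = {}
    for key in models[0].state_dict():
        avg[key] = torch.stack([m.state_dict()[key].float() for m in models]).mean(dim=0)
    return avg

models = [nn.Linear(4, 2) for _ in range(3)]   # stand-ins for separately trained models
fused = nn.Linear(4, 2)
fused.load_state_dict(average_state_dicts(models))
print(fused.weight)
```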

Genetic Algorithm-Based Dynamic Backdoor Attack on Federated Learning-Based Network Traffic Classification

  • paper_url: http://arxiv.org/abs/2310.06855
  • repo_url: None
  • paper_authors: Mahmoud Nazzal, Nura Aljaafari, Ahmed Sawalmeh, Abdallah Khreishah, Muhammad Anan, Abdulelah Algosaibi, Mohammed Alnaeem, Adel Aldalbahi, Abdulaziz Alhumam, Conrado P. Vizcarra, Shadan Alhamed
  • for: 这个研究是为了探讨基于联合学习的网络流量分类模型是否受到后门攻击的问题。
  • methods: 本研究使用了一种基于遗传算法的后门攻击方法,叫做GABAttack,它利用遗传算法来优化后门触发模式的值和位置,以 guarantees a better fit with the input and the model。
  • results: 实验结果显示GABAttack可以在实际的网络数据上得到良好的成果,并且可以在不同的情况下保持这些成果。这个研究作为一个警示,让网络安全专家和实践者为这种攻击进行防御措施。
    Abstract Federated learning enables multiple clients to collaboratively contribute to the learning of a global model orchestrated by a central server. This learning scheme promotes clients' data privacy and requires reduced communication overheads. In an application like network traffic classification, this helps hide the network vulnerabilities and weakness points. However, federated learning is susceptible to backdoor attacks, in which adversaries inject manipulated model updates into the global model. These updates inject a salient functionality in the global model that can be launched with specific input patterns. Nonetheless, the vulnerability of network traffic classification models based on federated learning to these attacks remains unexplored. In this paper, we propose GABAttack, a novel genetic algorithm-based backdoor attack against federated learning for network traffic classification. GABAttack utilizes a genetic algorithm to optimize the values and locations of backdoor trigger patterns, ensuring a better fit with the input and the model. This input-tailored dynamic attack is promising for improved attack evasiveness while being effective. Extensive experiments conducted over real-world network datasets validate the success of the proposed GABAttack in various situations while maintaining almost invisible activity. This research serves as an alarming call for network security experts and practitioners to develop robust defense measures against such attacks.
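A toy genetic-algorithm loop in the spirit of optimizing a backdoor trigger's positions and values over feature vectors; the fitness function here is a pure placeholder (the real attack would score a trigger by its success against the federated global model), and all sizes are arbitrary.

```python
# Genetic-algorithm skeleton for a (positions, values) trigger.
import random

N_FEATURES, TRIGGER_LEN, POP, GENERATIONS = 20, 3, 12, 15
random.seed(0)

def random_trigger():
    positions = random.sample(range(N_FEATURES), TRIGGER_LEN)
    values = [random.uniform(0.0, 1.0) for _ in range(TRIGGER_LEN)]
    return positions, values

def fitness(trigger):
    # Placeholder objective; a real attack would measure backdoor success rate here.
    positions, values = trigger
    return -sum(positions) + sum(values)

def crossover(a, b):
    cut = random.randint(1, TRIGGER_LEN - 1)        # duplicate positions tolerated in this toy
    return a[0][:cut] + b[0][cut:], a[1][:cut] + b[1][cut:]

def mutate(trigger, rate=0.2):
    positions, values = list(trigger[0]), list(trigger[1])
    for i in range(TRIGGER_LEN):
        if random.random() < rate:
            positions[i] = random.randrange(N_FEATURES)
            values[i] = random.uniform(0.0, 1.0)
    return positions, values

population = [random_trigger() for _ in range(POP)]
for _ in range(GENERATIONS):
    population.sort(key=fitness, reverse=True)
    parents = population[: POP // 2]
    children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                for _ in range(POP - len(parents))]
    population = parents + children

population.sort(key=fitness, reverse=True)
print(fitness(population[0]), population[0])
```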

Generative Speech Recognition Error Correction with Large Language Models and Task-Activating Prompting

  • paper_url: http://arxiv.org/abs/2309.15649
  • repo_url: None
  • paper_authors: Chao-Han Huck Yang, Yile Gu, Yi-Chieh Liu, Shalini Ghosh, Ivan Bulyko, Andreas Stolcke
  • for: 研究大型自然语言模型(LLM)是否可以作为语音识别后处理器,进行重分配和错误修正。
  • methods: 研究不同的提示方法,包括零拟合和少量拟合在Context learning中,以及一种新的任务活动提示方法,它结合了 causal instructions和示例来增加其上下文窗口。
  • results: 研究发现,通过冻结LLM进行重分配,只需在Context learning中进行几个批量训练,就可以达到与预先定制的语言模型相当的性能,并且通过结合提示技术和微调来实现错误率下降至N-best oracle水平。
    Abstract We explore the ability of large language models (LLMs) to act as speech recognition post-processors that perform rescoring and error correction. Our first focus is on instruction prompting to let LLMs perform these task without fine-tuning, for which we evaluate different prompting schemes, both zero- and few-shot in-context learning, and a novel task activation prompting method that combines causal instructions and demonstration to increase its context windows. Next, we show that rescoring only by in-context learning with frozen LLMs achieves results that are competitive with rescoring by domain-tuned LMs, using a pretrained first-pass recognition system and rescoring output on two out-of-domain tasks (ATIS and WSJ). By combining prompting techniques with fine-tuning we achieve error rates below the N-best oracle level, showcasing the generalization power of the LLMs.
    摘要 我们研究大语言模型(LLM)作为语音识别后处理器的能力,包括重新打分(rescoring)和错误纠正。我们首先关注指令提示,使LLM无需微调即可完成这些任务;为此我们评估了不同的提示方案,包括零样本和少样本的上下文学习,以及一种结合因果指令与示例、以扩展上下文窗口的新型任务激活提示方法。接着,我们表明仅通过上下文学习、使用冻结的LLM进行重新打分,即可在两个域外任务(ATIS和WSJ)上利用预训练的第一遍识别系统取得与领域微调语言模型相当的结果。通过将提示技术与微调相结合,我们将错误率降至N-best oracle水平以下,展示了LLM的泛化能力。
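The rescoring recipe itself is easy to sketch: combine each first-pass hypothesis score with a language-model log-probability and re-rank. In the toy example below, `llm_logprob` is a stand-in (a smoothed unigram scorer) for a call to a frozen LLM; the interpolation weight and the N-best list are invented for illustration.

```python
# Minimal N-best rescoring sketch in the spirit of the paper's in-context setup.
# `llm_logprob` stands in for summing token log-likelihoods under a frozen LLM;
# here it is a toy word-frequency scorer so the example runs end to end.
import math
from collections import Counter

TOY_CORPUS = "please book a flight from boston to denver tomorrow morning".split()
UNIGRAM = Counter(TOY_CORPUS)

def llm_logprob(hypothesis: str) -> float:
    """Placeholder for the sum of token log-probs under a frozen LLM."""
    total = sum(UNIGRAM.values()) + len(UNIGRAM)
    return sum(math.log((UNIGRAM[w] + 1) / total) for w in hypothesis.split())

def rescore_nbest(nbest, lm_weight=0.6):
    """nbest: list of (hypothesis, first_pass_score); higher combined score wins."""
    rescored = [(hyp, (1 - lm_weight) * am_score + lm_weight * llm_logprob(hyp))
                for hyp, am_score in nbest]
    return sorted(rescored, key=lambda t: t[1], reverse=True)

nbest = [
    ("please book a fright from boston to denver", -12.1),   # best first-pass score
    ("please book a flight from boston to denver", -12.4),
    ("police book a flight from boston to denver", -13.0),
]
# The LM evidence can flip the ranking toward the correct transcript.
for hyp, score in rescore_nbest(nbest):
    print(f"{score:8.2f}  {hyp}")
```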

Hedging Properties of Algorithmic Investment Strategies using Long Short-Term Memory and Time Series models for Equity Indices

  • paper_url: http://arxiv.org/abs/2309.15640
  • repo_url: None
  • paper_authors: Jakub Michańków, Paweł Sakowski, Robert Ślepaczuk
  • for: 这个论文旨在防范金融市场在金融危机时的风险投资 portfolio。
  • methods: 这篇论文提出了一种全新的分散化思路:不在单一资产层面,而是在由这些资产价格构建的集成算法投资策略(AIS)层面进行分散化;论文使用四类不同的理论模型(LSTM、ARIMA-GARCH、动量和反转)生成价格预测,并据此产生投资信号。
  • results: 研究发现基于LSTM的策略表现最佳,而为比特币构建的AIS是对标普500指数AIS最有效的分散化工具;此外,使用1小时频率数据的结果优于使用日度数据。
    Abstract This paper proposes a novel approach to hedging portfolios of risky assets when financial markets are affected by financial turmoils. We introduce a completely novel approach to diversification activity not on the level of single assets but on the level of ensemble algorithmic investment strategies (AIS) built based on the prices of these assets. We employ four types of diverse theoretical models (LSTM - Long Short-Term Memory, ARIMA-GARCH - Autoregressive Integrated Moving Average - Generalized Autoregressive Conditional Heteroskedasticity, momentum, and contrarian) to generate price forecasts, which are then used to produce investment signals in single and complex AIS. In such a way, we are able to verify the diversification potential of different types of investment strategies consisting of various assets (energy commodities, precious metals, cryptocurrencies, or soft commodities) in hedging ensemble AIS built for equity indices (S&P 500 index). Empirical data used in this study cover the period between 2004 and 2022. Our main conclusion is that LSTM-based strategies outperform the other models and that the best diversifier for the AIS built for the S&P 500 index is the AIS built for Bitcoin. Finally, we test the LSTM model for a higher frequency of data (1 hour). We conclude that it outperforms the results obtained using daily data.
    摘要 本文提出了一种在金融市场动荡时期对风险资产组合进行对冲的新方法。我们引入了一种全新的分散化思路:不是在单个资产层面,而是在基于这些资产价格构建的集成算法投资策略(AIS)层面进行分散化。我们采用四类不同的理论模型(LSTM——长短期记忆网络、ARIMA-GARCH、动量和反转)生成价格预测,并据此在单一及复合AIS中产生投资信号。通过这种方式,我们能够检验由不同资产(能源商品、贵金属、加密货币或软商品)构成的各类投资策略,在对冲为股票指数(标普500指数)构建的集成AIS时的分散化潜力。本研究使用的实证数据覆盖2004年至2022年。我们的主要结论是:基于LSTM的策略优于其他模型,而为标普500指数构建的AIS的最佳分散化工具是基于比特币构建的AIS。最后,我们在更高频率(1小时)的数据上测试了LSTM模型,其结果优于使用日度数据。
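A minimal sketch of the signal-generation and strategy-ensembling step is given below, assuming synthetic price paths and simple momentum/contrarian rules in place of the paper's LSTM and ARIMA-GARCH forecasts; the Sharpe-ratio summary only shows where a diversification benefit would be read off.

```python
# Hedging-by-strategy sketch under simplified assumptions (not the paper's code):
# build simple momentum/contrarian signals on two synthetic price series, turn
# them into single-asset AIS returns, and combine them into an equal-weight
# ensemble -- the level at which the paper measures diversification.
import numpy as np

rng = np.random.default_rng(1)

def momentum_signal(prices, lookback=20):
    """+1 if price rose over the lookback window, -1 otherwise (0 before warm-up)."""
    sig = np.zeros(len(prices))
    sig[lookback:] = np.sign(prices[lookback:] - prices[:-lookback])
    return sig

def strategy_returns(prices, signal):
    rets = np.diff(prices) / prices[:-1]
    return signal[:-1] * rets                 # trade yesterday's signal on today's return

equity = 100 * np.cumprod(1 + rng.normal(0.0002, 0.01, 1000))   # synthetic equity index
crypto = 100 * np.cumprod(1 + rng.normal(0.0005, 0.04, 1000))   # synthetic diversifier

ais_equity = strategy_returns(equity, momentum_signal(equity))
ais_crypto = strategy_returns(crypto, -momentum_signal(crypto))  # contrarian variant
ensemble = 0.5 * ais_equity + 0.5 * ais_crypto

for name, r in [("equity AIS", ais_equity), ("crypto AIS", ais_crypto), ("ensemble", ensemble)]:
    sharpe = r.mean() / r.std() * np.sqrt(252)
    print(f"{name:12s} annualised Sharpe ~ {sharpe:5.2f}")
```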

Learning with Noisy Labels for Human Fall Events Classification: Joint Cooperative Training with Trinity Networks

  • paper_url: http://arxiv.org/abs/2310.06854
  • repo_url: None
  • paper_authors: Leiyu Xie, Yang Sun, Syed Mohsen Naqvi
  • for: 这篇论文目的是提出一个简单 yet effective的方法来解决深度学习中的污染标签问题,以保护人类试验者的隐私。
  • methods: 这篇论文使用了一个名为“Joint Cooperative training with Trinity Networks”的方法(简称JoCoT),具有两个教师网络和一个学生网络,以改善混淆标签学习框架的稳定性和性能。
  • results: 根据实验结果,JoCoT 在高混淆率下表现出色,较前一代方法高5.17%和3.35%。具体来说,JoCoT 在 UP-Fall dataset 上的平均 pairflip 和 symmetric 混淆率下,较前一代方法高5.17%和3.35%。
    Abstract With the increasing ageing population, fall events classification has drawn much research attention. In the development of deep learning, the quality of data labels is crucial. Most of the datasets are labelled automatically or semi-automatically, and the samples may be mislabeled, which constrains the performance of Deep Neural Networks (DNNs). Recent research on noisy label learning confirms that neural networks first focus on the clean and simple instances and then follow the noisy and hard instances in the training stage. To address the learning with noisy label problem and protect the human subjects' privacy, we propose a simple but effective approach named Joint Cooperative training with Trinity Networks (JoCoT). To mitigate the privacy issue, human skeleton data are used. The robustness and performance of the noisy label learning framework is improved by using the two teacher modules and one student module in the proposed JoCoT. To mitigate the incorrect selections, the predictions from the teacher modules are applied with the consensus-based method to guide the student module training. The performance evaluation on the widely used UP-Fall dataset and comparison with the state-of-the-art, confirms the effectiveness of the proposed JoCoT in high noise rates. Precisely, JoCoT outperforms the state-of-the-art by 5.17% and 3.35% with the averaged pairflip and symmetric noises, respectively.
    摘要 随着人口老龄化加剧,跌倒事件分类受到了大量研究关注。在深度学习的发展中,数据标签的质量至关重要。大多数数据集都是自动或半自动标注的,样本可能被误标,这会限制深度神经网络(DNN)的性能。最近关于噪声标签学习的研究表明,神经网络在训练阶段会先学习干净且简单的样本,然后才是有噪声且困难的样本。为解决噪声标签学习问题并保护受试者隐私,我们提出了一种简单而有效的方法,称为三元网络联合协同训练(JoCoT)。为缓解隐私问题,我们使用人体骨骼数据。通过在JoCoT中使用两个教师模块和一个学生模块,噪声标签学习框架的鲁棒性和性能得到提升。为减少错误选择,教师模块的预测通过基于共识的方法来指导学生模块的训练。在广泛使用的UP-Fall数据集上的性能评估以及与现有最优方法的比较,验证了JoCoT在高噪声率下的有效性。具体而言,在平均pairflip噪声和对称噪声下,JoCoT分别比现有最优方法高出5.17%和3.35%。
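The consensus-guided selection can be sketched independently of the fall-detection networks. Below, two randomly initialized "teachers" and a small-loss rule pick the subset of a noisily labeled batch used to update the student; the architectures, the 34-dimensional skeleton features, and the keep ratio are placeholders, not the paper's configuration.

```python
# Conceptual sketch of consensus-style sample selection with two teachers and one
# student (JoCoT-like); everything here is a toy stand-in, not the paper's setup.
import torch
import torch.nn as nn
import torch.nn.functional as F

def mlp():
    return nn.Sequential(nn.Linear(34, 64), nn.ReLU(), nn.Linear(64, 2))  # skeleton features -> fall / no-fall

teacher_a, teacher_b, student = mlp(), mlp(), mlp()
opt = torch.optim.Adam(student.parameters(), lr=1e-3)

x = torch.randn(128, 34)                      # a batch of (possibly mislabeled) samples
noisy_y = torch.randint(0, 2, (128,))

with torch.no_grad():
    pa, pb = teacher_a(x).argmax(1), teacher_b(x).argmax(1)
    loss_a = F.cross_entropy(teacher_a(x), noisy_y, reduction="none")
    loss_b = F.cross_entropy(teacher_b(x), noisy_y, reduction="none")

# Consensus: keep samples where both teachers agree with the given label,
# then keep the small-loss fraction of those (assumed clean).
agree = (pa == noisy_y) & (pb == noisy_y)
keep_ratio = 0.7
joint_loss = (loss_a + loss_b)[agree]
kept_idx = torch.nonzero(agree).squeeze(1)[joint_loss.argsort()[: int(keep_ratio * agree.sum())]]

student_loss = F.cross_entropy(student(x[kept_idx]), noisy_y[kept_idx])
opt.zero_grad(); student_loss.backward(); opt.step()
print(f"trained student on {len(kept_idx)} of {len(x)} samples, loss={student_loss.item():.3f}")
```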

An Empirical Study of AI Generated Text Detection Tools

  • paper_url: http://arxiv.org/abs/2310.01423
  • repo_url: None
  • paper_authors: Arslan Akram
  • for: 这个研究的目的是填补在多领域ChatGPT生成材料上测试最新检测API和工具这一研究空白。
  • methods: 这个研究使用了一个大型的多Domain dataset,包括文章、摘要、故事、新闻和产品评论,并使用了六种人工智能文本标识系统进行测试。
  • results: 这个研究发现, Originality 在所有工具中表现最佳,具有97.0%的准确率。
    Abstract Since ChatGPT has emerged as a major AIGC model, providing high-quality responses across a wide range of applications (including software development and maintenance), it has attracted much interest from many individuals. ChatGPT has great promise, but there are serious problems that might arise from its misuse, especially in the realms of education and public safety. Several AIGC detectors are available, and they have all been tested on genuine text. However, more study is needed to see how effective they are for multi-domain ChatGPT material. This study aims to fill this need by creating a multi-domain dataset for testing the state-of-the-art APIs and tools for detecting artificially generated information used by universities and other research institutions. A large dataset consisting of articles, abstracts, stories, news, and product reviews was created for this study. The second step is to use the newly created dataset to put six tools through their paces. Six different artificial intelligence (AI) text identification systems, including "GPTkit," "GPTZero," "Originality," "Sapling," "Writer," and "Zylalab," have accuracy rates between 55.29 and 97.0%. Although all the tools fared well in the evaluations, originality was particularly effective across the board.
    摘要 自从ChatGPT作为主要的AIGC模型出现以来,它在软件开发与维护等广泛应用中提供了高质量的回复,吸引了许多人的关注。ChatGPT前景广阔,但其滥用也可能带来严重问题,尤其是在教育和公共安全领域。目前已有多种AIGC检测器,它们都在真实文本上进行了测试;然而,对于多领域ChatGPT生成材料的检测效果仍需更多研究。本研究旨在填补这一空白,构建了一个多领域数据集,用于测试高校及其他研究机构所使用的最新的人工生成信息检测API和工具。为此,我们创建了一个包含文章、摘要、故事、新闻和产品评论的大型数据集,并用该数据集对六种工具进行了测试。六种人工智能文本识别系统("GPTkit"、"GPTZero"、"Originality"、"Sapling"、"Writer"和"Zylalab")的准确率介于55.29%到97.0%之间。虽然所有工具在评估中都表现不错,但"Originality"在各项评估中尤为出色。

Perception for Humanoid Robots

  • paper_url: http://arxiv.org/abs/2309.15616
  • repo_url: https://github.com/openhumanoids/oh-distro
  • paper_authors: Arindam Roychoudhury, Shahram Khorshidi, Subham Agrawal, Maren Bennewitz
  • for: 本研究探讨了人工智能机器人 perceive 技术的最新发展和趋势。
  • methods: 本研究使用了多种感知模式和技术,包括视觉、听觉和感觉感知,以实现机器人与人类和环境的互动。
  • results: 研究发现,多感知模式的融合和机器学习技术在机器人内部状态估计、环境理解和人机交互方面具有广泛的应用前景。
    Abstract Purpose of Review: The field of humanoid robotics, perception plays a fundamental role in enabling robots to interact seamlessly with humans and their surroundings, leading to improved safety, efficiency, and user experience. This scientific study investigates various perception modalities and techniques employed in humanoid robots, including visual, auditory, and tactile sensing by exploring recent state-of-the-art approaches for perceiving and understanding the internal state, the environment, objects, and human activities. Recent Findings: Internal state estimation makes extensive use of Bayesian filtering methods and optimization techniques based on maximum a-posteriori formulation by utilizing proprioceptive sensing. In the area of external environment understanding, with an emphasis on robustness and adaptability to dynamic, unforeseen environmental changes, the new slew of research discussed in this study have focused largely on multi-sensor fusion and machine learning in contrast to the use of hand-crafted, rule-based systems. Human robot interaction methods have established the importance of contextual information representation and memory for understanding human intentions. Summary: This review summarizes the recent developments and trends in the field of perception in humanoid robots. Three main areas of application are identified, namely, internal state estimation, external environment estimation, and human robot interaction. The applications of diverse sensor modalities in each of these areas are considered and recent significant works are discussed.
    摘要 综述目的:在人形机器人领域,感知在使机器人与人类及周围环境无缝交互方面起着基础性作用,从而提升安全性、效率和用户体验。本研究考察了人形机器人中采用的多种感知模态与技术,包括视觉、听觉和触觉感知,并梳理了用于感知和理解内部状态、环境、物体以及人类活动的最新方法。最新进展:内部状态估计大量使用基于本体感知的贝叶斯滤波方法,以及基于最大后验形式的优化技术。在外部环境理解方面,为强调对动态、不可预见环境变化的鲁棒性和适应性,本文讨论的新近研究主要集中在多传感器融合和机器学习上,而非手工设计的基于规则的系统。人机交互方法则确立了上下文信息表示和记忆对于理解人类意图的重要性。总结:本综述总结了人形机器人感知领域的最新发展和趋势,归纳出三个主要应用方向:内部状态估计、外部环境估计和人机交互,并讨论了各方向中不同传感器模态的应用及近期重要工作。

Developing automatic verbatim transcripts for international multilingual meetings: an end-to-end solution

  • paper_url: http://arxiv.org/abs/2309.15609
  • repo_url: None
  • paper_authors: Akshat Dewan, Michal Ziemski, Henri Meylan, Lorenzo Concina, Bruno Pouliquen
  • for: 这篇论文是为了描述一种完全自动化会议记录和多种语言机器翻译的综合解决方案。
  • methods: 该工具使用了WIPO内部开发的语音转文本(S2T)和机器翻译(MT)组件,并进行了数据收集和精度调整,实现了高度定制和可靠的系统。
  • results: 这篇论文描述了技术组件的架构和演进,以及用户侧的业务影响和收益。
    Abstract This paper presents an end-to-end solution for the creation of fully automated conference meeting transcripts and their machine translations into various languages. This tool has been developed at the World Intellectual Property Organization (WIPO) using in-house developed speech-to-text (S2T) and machine translation (MT) components. Beyond describing data collection and fine-tuning, resulting in a highly customized and robust system, this paper describes the architecture and evolution of the technical components as well as highlights the business impact and benefits from the user side. We also point out particular challenges in the evolution and adoption of the system and how the new approach created a new product and replaced existing established workflows in conference management documentation.
    摘要 这篇论文介绍了一个端到端解决方案,用于自动生成会议逐字记录并将其机器翻译为多种语言。该工具由世界知识产权组织(WIPO)使用内部开发的语音转文本(S2T)和机器翻译(MT)组件构建。除了描述数据收集和微调(由此得到一个高度定制且稳健的系统)之外,论文还介绍了各技术组件的架构和演进,并着重说明了用户侧的业务影响和收益。我们还指出了系统演进和落地过程中的一些特殊挑战,以及新方法如何催生出一个新产品并取代了会议管理文档中既有的成熟工作流程。

An Evaluation of ChatGPT-4’s Qualitative Spatial Reasoning Capabilities in RCC-8

  • paper_url: http://arxiv.org/abs/2309.15577
  • repo_url: None
  • paper_authors: Anthony G Cohn
  • for: Investigating the extent to which a Large Language Model (LLM) can perform classical qualitative spatial reasoning tasks.
  • methods: Using the mereotopological calculus, RCC-8.
  • results: The study reports the extent to which the LLM can perform classical qualitative spatial reasoning tasks in RCC-8.
    Abstract Qualitative Spatial Reasoning (QSR) is well explored area of Commonsense Reasoning and has multiple applications ranging from Geographical Information Systems to Robotics and Computer Vision. Recently many claims have been made for the capabilities of Large Language Models (LLMs). In this paper we investigate the extent to which one particular LLM can perform classical qualitative spatial reasoning tasks on the mereotopological calculus, RCC-8.
    摘要 定性空间推理(QSR)是常识推理中研究较为充分的领域之一,其应用范围涵盖地理信息系统、机器人和计算机视觉等。近期,人们对大语言模型(LLM)的能力提出了诸多论断。本文研究某一特定LLM能在多大程度上基于区域连接演算RCC-8完成经典的定性空间推理任务。

Identifiability Matters: Revealing the Hidden Recoverable Condition in Unbiased Learning to Rank

  • paper_url: http://arxiv.org/abs/2309.15560
  • repo_url: None
  • paper_authors: Mouxiang Chen, Chenghao Liu, Zemin Liu, Zhuo Li, Jianling Sun
  • for: 本研究的目的是探讨ULTR中true relevance是否可以从点击数据中恢复,这是ULTR领域的基础问题。
  • methods: 我们首先定义一个排名模型为可识别的,如果它可以将true relevance恢复到一个扭曲参数下,那么它是可识别的。然后我们探讨一个等价的可识别性条件,可以用图连接问题表示:如果图 constructed on the underlying structure of the dataset是连通的,那么true relevance可以正确地恢复。如果IG不连通,那么可能会出现坏的情况,导致排名性能下降。为解决这个问题,我们提出了两种方法,即节点干扰和节点合并,以修改数据集并恢复IG的连接性。
  • results: 我们在一个模拟数据集和两个LTR基准数据集上进行了实验,结果验证了所提定理的正确性,并表明当相关性模型不可识别时,我们的方法能够缓解数据偏差的影响。
    Abstract The application of Unbiased Learning to Rank (ULTR) is widespread in modern systems for training unbiased ranking models from biased click logs. The key is to explicitly model a generation process for user behavior and fit click data based on examination hypothesis. Previous research found empirically that the true latent relevance can be recovered in most cases as long as the clicks are perfectly fitted. However, we demonstrate that this is not always achievable, resulting in a significant reduction in ranking performance. In this work, we aim to answer if or when the true relevance can be recovered from click data, which is a foundation issue for ULTR field. We first define a ranking model as identifiable if it can recover the true relevance up to a scaling transformation, which is enough for pairwise ranking objective. Then we explore an equivalent condition for identifiability that can be novely expressed as a graph connectivity test problem: if and only if a graph (namely identifiability graph, or IG) constructed on the underlying structure of the dataset is connected, we can guarantee that the relevance can be correctly recovered. When the IG is not connected, there may be bad cases leading to poor ranking performance. To address this issue, we propose two methods, namely node intervention and node merging, to modify the dataset and restore connectivity of the IG. Empirical results obtained on a simulation dataset and two LTR benchmark datasets confirm the validity of our proposed theorems and show the effectiveness of our methods in mitigating data bias when the relevance model is unidentifiable.
    摘要 无偏排序学习(ULTR)在现代系统中被广泛用于从有偏的点击日志中训练无偏的排序模型。其关键在于显式地对用户行为的生成过程建模,并基于检验假设(examination hypothesis)拟合点击数据。此前的研究在经验上发现,只要点击被完美拟合,真实的潜在相关性在大多数情况下都可以被恢复。然而,我们证明这并不总是可以实现的,从而导致排序性能显著下降。在本工作中,我们试图回答真实相关性是否以及何时能够从点击数据中恢复,这是ULTR领域的一个基础问题。我们首先将一个排序模型定义为可识别的,如果它能够在一个尺度变换的意义下恢复真实相关性——这对于成对排序目标已经足够。然后我们探索了一个等价的可识别性条件,并将其新颖地表述为一个图连通性检验问题:当且仅当基于数据集底层结构构建的图(称为可识别性图,IG)是连通的,我们才能保证相关性可以被正确恢复;当IG不连通时,可能出现导致排序性能变差的坏情形。为解决这一问题,我们提出了节点干预和节点合并两种方法,用于修改数据集并恢复IG的连通性。在一个模拟数据集和两个LTR基准数据集上得到的实验结果验证了所提定理的正确性,并表明当相关性模型不可识别时,我们的方法能有效缓解数据偏差。
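Since the recoverability condition reduces to whether the identifiability graph (IG) is connected, the test itself is simple to sketch. The snippet below checks connectivity on an abstract IG and shows how merging one node restores it; how nodes and edges are actually built from click data, and the exact semantics of node intervention, follow the paper.

```python
# Connectivity test on an abstract identifiability graph (IG); node ids and the
# merge step are schematic illustrations, not the paper's dataset construction.
from collections import defaultdict

def connected_components(nodes, edges):
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v); adj[v].add(u)
    seen, comps = set(), []
    for n in nodes:
        if n in seen:
            continue
        stack, comp = [n], set()
        while stack:
            cur = stack.pop()
            if cur in comp:
                continue
            comp.add(cur); seen.add(cur)
            stack.extend(adj[cur] - comp)
        comps.append(comp)
    return comps

nodes = {"A", "B", "C", "D", "E"}
edges = [("A", "B"), ("B", "C"), ("D", "E")]        # IG with two components -> not identifiable
print(len(connected_components(nodes, edges)))       # 2

# Node merging: treat D as the same node as C, which rewires D's edges onto C
# and reconnects the graph, restoring identifiability in this toy example.
merged_edges = [("C" if u == "D" else u, "C" if v == "D" else v) for u, v in edges]
merged_nodes = nodes - {"D"}
print(len(connected_components(merged_nodes, merged_edges)))   # 1 -> relevance recoverable
```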

Direct Models for Simultaneous Translation and Automatic Subtitling: FBK@IWSLT2023

  • paper_url: http://arxiv.org/abs/2309.15554
  • repo_url: https://github.com/hlt-mt/fbk-fairseq
  • paper_authors: Sara Papi, Marco Gaido, Matteo Negri
  • for: 本研究参加IWSLT 2023评分活动的同时翻译和自动字幕追踪两个追踪,使用直接架构进行两个任务。
  • methods: 我们使用了已经在线上训练的模型来获取实时推断,并将直接ST模型改进以生成符合标准的字幕和时间标签。
  • results: 我们的英德同时翻译系统在2021和2022年的任务中比顶对方系统具有更好的计算感知延迟,优化至多达3.5个BLEU。我们的自动字幕系统在英德和英西二 languages 中优化了3.7和1.7个SubER。
    Abstract This paper describes the FBK's participation in the Simultaneous Translation and Automatic Subtitling tracks of the IWSLT 2023 Evaluation Campaign. Our submission focused on the use of direct architectures to perform both tasks: for the simultaneous one, we leveraged the knowledge already acquired by offline-trained models and directly applied a policy to obtain the real-time inference; for the subtitling one, we adapted the direct ST model to produce well-formed subtitles and exploited the same architecture to produce timestamps needed for the subtitle synchronization with audiovisual content. Our English-German SimulST system shows a reduced computational-aware latency compared to the one achieved by the top-ranked systems in the 2021 and 2022 rounds of the task, with gains of up to 3.5 BLEU. Our automatic subtitling system outperforms the only existing solution based on a direct system by 3.7 and 1.7 SubER in English-German and English-Spanish respectively.
    摘要 本文介绍了FBK参加IWSLT 2023评测活动中同声传译与自动字幕两个赛道的情况。我们的提交方案聚焦于使用直接架构来完成这两项任务:在同声传译方面,我们利用离线训练模型已获得的知识,并直接应用一种策略来实现实时推理;在自动字幕方面,我们对直接语音翻译(ST)模型进行调整,使其生成格式规范的字幕,并利用同一架构生成字幕与音视频内容同步所需的时间戳。我们的英德同声传译系统相比2021和2022年该任务排名最高的系统,在考虑计算开销的延迟方面有所降低,BLEU提升最高可达3.5。我们的自动字幕系统在英德和英西两个语向上分别以3.7和1.7的SubER优势,超过了现有唯一基于直接系统的方案。

Identifying confounders in deep-learning-based model predictions using DeepRepViz

  • paper_url: http://arxiv.org/abs/2309.15551
  • repo_url: None
  • paper_authors: Roshan Prakash Rane, JiHoon Kim, Arjun Umesha, Didem Stark, Marc-André Schulz, Kerstin Ritter
  • for: This paper aims to help researchers detect and mitigate the impact of confounding variables in deep learning (DL) models when analyzing neuroimaging data.
  • methods: The paper proposes a framework called DeepRepViz, which consists of a metric to quantify the effect of potential confounders and a visualization tool to qualitatively inspect the DL model’s learning process.
  • results: The authors demonstrate the benefits of using DeepRepViz in combination with DL models through experiments on simulated and neuroimaging datasets. For example, the framework identifies sex as a significant confounder in a DL model predicting chronic alcohol users, and age as a confounder in a DL model predicting cognitive task performance.
    Abstract Deep Learning (DL) models are increasingly used to analyze neuroimaging data and uncover insights about the brain, brain pathologies, and psychological traits. However, extraneous `confounders' variables such as the age of the participants, sex, or imaging artifacts can bias model predictions, preventing the models from learning relevant brain-phenotype relationships. In this study, we provide a solution called the `DeepRepViz' framework that enables researchers to systematically detect confounders in their DL model predictions. The framework consists of (1) a metric that quantifies the effect of potential confounders and (2) a visualization tool that allows researchers to qualitatively inspect what the DL model is learning. By performing experiments on simulated and neuroimaging datasets, we demonstrate the benefits of using DeepRepViz in combination with DL models. For example, experiments on the neuroimaging datasets reveal that sex is a significant confounder in a DL model predicting chronic alcohol users (Con-score=0.35). Similarly, DeepRepViz identifies age as a confounder in a DL model predicting participants' performance on a cognitive task (Con-score=0.3). Overall, DeepRepViz enables researchers to systematically test for potential confounders and expose DL models that rely on extraneous information such as age, sex, or imaging artifacts.
    摘要 深度学习(DL)模型越来越多地被用于分析神经影像数据,以揭示关于大脑、脑部疾病和心理特质的规律。然而,诸如参与者年龄、性别或成像伪影等无关的“混杂变量”可能使模型预测产生偏差,妨碍模型学习真正相关的大脑-表型关系。在本研究中,我们提供了名为“DeepRepViz”的框架,使研究者能够系统地检测其DL模型预测中的混杂因素。该框架包括(1)一个量化潜在混杂因素影响的度量,以及(2)一个可视化工具,便于研究者定性地检查DL模型学到了什么。通过在模拟数据和神经影像数据集上进行实验,我们展示了将DeepRepViz与DL模型结合使用的益处。例如,在神经影像数据集上的实验表明,性别是预测长期饮酒者的DL模型中的一个显著混杂因素(Con-score=0.35);类似地,DeepRepViz发现年龄是预测参与者认知任务表现的DL模型中的混杂因素(Con-score=0.3)。总体而言,DeepRepViz使研究者能够系统地检验潜在的混杂因素,并揭示那些依赖于年龄、性别或成像伪影等无关信息的DL模型。
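As a rough stand-in for the kind of quantity the framework reports, the sketch below measures how strongly a binary confounder is linearly encoded in a model's final-layer representation on synthetic data. The real Con-score is defined in the paper; this correlation probe is only meant to show where such a metric plugs in.

```python
# Illustrative confound check (not the paper's Con-score): how strongly does a
# binary "sex" variable correlate with any axis of a model's representation?
import numpy as np

rng = np.random.default_rng(0)
n = 500
sex = rng.integers(0, 2, n)                           # potential confounder
rep = rng.normal(size=(n, 3))                         # synthetic pre-output representation
rep[:, 0] += 1.5 * sex                                # axis 0 deliberately leaks the confounder

def confound_score(representation, confounder):
    """Max |correlation| between any representation axis and the confounder."""
    c = (confounder - confounder.mean()) / confounder.std()
    z = (representation - representation.mean(0)) / representation.std(0)
    return float(np.abs(z.T @ c / len(c)).max())

print(f"confound score: {confound_score(rep, sex):.2f}")                    # high -> inspect further
print(f"null reference: {confound_score(rng.normal(size=(n, 3)), sex):.2f}")  # near zero
```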

From LAION-5B to LAION-EO: Filtering Billions of Images Using Anchor Datasets for Satellite Image Extraction

  • paper_url: http://arxiv.org/abs/2309.15535
  • repo_url: None
  • paper_authors: Mikolaj Czerkawski, Alistair Francis
  • for: 这种研究是为了提取卫星图像域的特定子集而设计的。
  • methods: 这种方法使用了参考 dataset,然后进行进一步的筛选,以提取卫星图像域的特定子集。
  • results: 这种方法导致了一个名为 LAION-EO 的 dataset 的发布,该 dataset 包含高分辨率像素的文本和卫星图像的对应对。 paper 还介绍了数据采集过程以及数据集的一些特性。
    Abstract Large datasets, such as LAION-5B, contain a diverse distribution of images shared online. However, extraction of domain-specific subsets of large image corpora is challenging. The extraction approach based on an anchor dataset, combined with further filtering, is proposed here and demonstrated for the domain of satellite imagery. This results in the release of LAION-EO, a dataset sourced from the web containing pairs of text and satellite images in high (pixel-wise) resolution. The paper outlines the acquisition procedure as well as some of the features of the dataset.
    摘要 大量的数据集,如LAION-5B,包含丰富多样化的在线图像分布。然而,抽取具有域特定特点的图像集的大数据集是一项挑战。本文提出了基于锚点集的抽取方法,并通过进一步的筛选,在卫星图像领域实现了LAION-EO数据集的生成。该数据集包含高分辨率像素的文本和卫星图像对。文章还介绍了数据集的获取过程以及一些特性。
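The anchor-based filtering step can be sketched as nearest-neighbour screening in an embedding space. Below, random unit vectors stand in for CLIP-style image embeddings and the similarity threshold is arbitrary; the actual pipeline, embeddings, and thresholds are described in the paper.

```python
# Schematic anchor-dataset filtering: embed known satellite images and keep web
# images whose embeddings fall close to them. Random vectors are stand-ins for
# real image embeddings; numbers are arbitrary assumptions.
import numpy as np

rng = np.random.default_rng(0)
DIM = 64

def normalize(v):
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

anchor = normalize(rng.normal(loc=0.5, size=(200, DIM)))     # anchor satellite-image embeddings
candidates = normalize(rng.normal(size=(10_000, DIM)))       # web-crawled image embeddings

def filter_by_anchor(cands, anchors, threshold=0.25):
    """Keep candidates whose max cosine similarity to any anchor exceeds the threshold."""
    sims = cands @ anchors.T                                  # cosine similarity of unit vectors
    return np.nonzero(sims.max(axis=1) >= threshold)[0]

kept = filter_by_anchor(candidates, anchor)
print(f"kept {len(kept)} of {len(candidates)} candidates for further (e.g. textual) filtering")
```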

Cyber Security Requirements for Platforms Enhancing AI Reproducibility

  • paper_url: http://arxiv.org/abs/2309.15525
  • repo_url: None
  • paper_authors: Polra Victor Falade
  • for: 本研究旨在应对人工智能(AI)研究中的安全挑战,并保障AI领域研究的可复现性。
  • methods: 本研究提出了一个从网络安全角度评估AI可复现性平台的新框架,并用其评估了五个流行的AI可复现性平台。
  • results: 分析发现,这些平台中没有一个完全具备稳健可复现性所必需的网络安全措施;其中Kaggle和Codalab在实施涵盖安全、隐私、可用性和信任等方面的网络安全措施上表现较好。
    Abstract Scientific research is increasingly reliant on computational methods, posing challenges for ensuring research reproducibility. This study focuses on the field of artificial intelligence (AI) and introduces a new framework for evaluating AI platforms for reproducibility from a cyber security standpoint to address the security challenges associated with AI research. Using this framework, five popular AI reproducibility platforms; Floydhub, BEAT, Codalab, Kaggle, and OpenML were assessed. The analysis revealed that none of these platforms fully incorporates the necessary cyber security measures essential for robust reproducibility. Kaggle and Codalab, however, performed better in terms of implementing cyber security measures covering aspects like security, privacy, usability, and trust. Consequently, the study provides tailored recommendations for different user scenarios, including individual researchers, small laboratories, and large corporations. It emphasizes the importance of integrating specific cyber security features into AI platforms to address the challenges associated with AI reproducibility, ultimately advancing reproducibility in this field. Moreover, the proposed framework can be applied beyond AI platforms, serving as a versatile tool for evaluating a wide range of systems and applications from a cyber security perspective.
    摘要 科学研究越来越依赖计算方法,这给保障研究的可复现性带来了挑战。本研究聚焦人工智能(AI)领域,从网络安全角度提出了一个评估AI平台可复现性的新框架,以应对AI研究相关的安全挑战。利用该框架,我们评估了五个流行的AI可复现性平台:Floydhub、BEAT、Codalab、Kaggle和OpenML。分析表明,这些平台中没有一个完全具备稳健可复现性所必需的网络安全措施;其中Kaggle和Codalab在实施涵盖安全、隐私、可用性和信任等方面的网络安全措施上表现较好。据此,本研究针对不同用户场景(包括个人研究者、小型实验室和大型企业)给出了有针对性的建议,强调在AI平台中集成特定网络安全功能以应对AI可复现性相关挑战的重要性,从而推动该领域可复现性的发展。此外,所提出的框架还可应用于AI平台之外,作为从网络安全角度评估各类系统和应用的通用工具。

Robust Internal Representations for Domain Generalization

  • paper_url: http://arxiv.org/abs/2309.15522
  • repo_url: None
  • paper_authors: Mohammad Rostami
  • for: 本研究是一篇概述我在转移学习中的研究成果,尤其是在缺乏标签数据和连续学习中遇到的挑战。
  • methods: 本研究使用 embedding space 进行转移学习,包括几何学习、零shot学习、连续学习、领域适应和分布式学习。
  • results: 本研究提供了一个抽象的转移学习概述,为未来的研究人员提供了一个前瞻性的视角,帮助他们更好地理解和发展转移学习领域。
    Abstract This paper which is part of the New Faculty Highlights Invited Speaker Program of AAAI'23, serves as a comprehensive survey of my research in transfer learning by utilizing embedding spaces. The work reviewed in this paper specifically revolves around the inherent challenges associated with continual learning and limited availability of labeled data. By providing an overview of my past and ongoing contributions, this paper aims to present a holistic understanding of my research, paving the way for future explorations and advancements in the field. My research delves into the various settings of transfer learning, including, few-shot learning, zero-shot learning, continual learning, domain adaptation, and distributed learning. I hope this survey provides a forward-looking perspective for researchers who would like to focus on similar research directions.
    摘要 这篇论文是AAAI'23年新教授精彩报告系列之一,它是我在使用嵌入空间进行转移学习的研究概述。这篇论文的工作具有继续学习和标注数据的有限性等内在挑战。通过对我过去和当前研究的概述,这篇论文希望能够为未来的探索和进步提供一个整体的理解,并为相关领域的研究者提供前瞻性的视角。我的研究涉及到转移学习的不同场景,包括几shot学习、零shot学习、继续学习、领域适应和分布式学习。我希望这份报告能够为研究者们提供一个前瞻性的视角,以便他们可以专注于类似的研究方向。

Raijū: Reinforcement Learning-Guided Post-Exploitation for Automating Security Assessment of Network Systems

  • paper_url: http://arxiv.org/abs/2309.15518
  • repo_url: None
  • paper_authors: Van-Hau Pham, Hien Do Hoang, Phan Thanh Trung, Van Dinh Quoc, Trong-Nghia To, Phan The Duy
  • for: This paper aims to propose a Reinforcement Learning (RL)-driven automation approach to assist penetration testers in quickly implementing the process of post-exploitation for security-level evaluation in network systems.
  • methods: The proposed approach uses two RL algorithms, Advantage Actor-Critic (A2C) and Proximal Policy Optimization (PPO), to train specialized agents capable of making intelligent actions, which are Metasploit modules to automatically launch attacks of privileges escalation, gathering hashdump, and lateral movement.
  • results: The agents automatically select actions and launch attacks on four real environments with over 84% successful attacks and under 55 attack steps given. The A2C algorithm has proven to be extremely effective in the selection of proper actions for automation of post-exploitation.
    Abstract In order to assess the risks of a network system, it is important to investigate the behaviors of attackers after successful exploitation, which is called post-exploitation. Although there are various efficient tools supporting post-exploitation implementation, no application can automate this process. Most of the steps of this process are completed by experts who have profound knowledge of security, known as penetration testers or pen-testers. To this end, our study proposes the Raij\=u framework, a Reinforcement Learning (RL)-driven automation approach that assists pen-testers in quickly implementing the process of post-exploitation for security-level evaluation in network systems. We implement two RL algorithms, Advantage Actor-Critic (A2C) and Proximal Policy Optimization (PPO), to train specialized agents capable of making intelligent actions, which are Metasploit modules to automatically launch attacks of privileges escalation, gathering hashdump, and lateral movement. By leveraging RL, we aim to empower these agents with the ability to autonomously select and execute actions that can exploit vulnerabilities in target systems. This approach allows us to automate certain aspects of the penetration testing workflow, making it more efficient and responsive to emerging threats and vulnerabilities. The experiments are performed in four real environments with agents trained in thousands of episodes. The agents automatically select actions and launch attacks on the environments and achieve over 84\% of successful attacks with under 55 attack steps given. Moreover, the A2C algorithm has proved extremely effective in the selection of proper actions for automation of post-exploitation.
    摘要 为评估网络系统的风险,需要调查攻击者在成功攻击后的行为,即后续利用。虽然有各种高效的工具支持后续实施,但没有应用程序可以自动化这个过程。大多数步骤都需要由具有安全知识的专家,即黑客测试员或黑客测试人员完成。为此,我们的研究提出了Raij\=u框架,一种基于强化学习(RL)驱动的自动化方法,可以帮助黑客测试员快速实施后续利用的安全水平评估在网络系统中。我们实现了两种RL算法,即Advantage Actor-Critic(A2C)和Proximal Policy Optimization(PPO),用以训练专门的代理人,以便它们可以具有智能行为,例如Metasploit模块,自动发起特权提升、 hashdump 和 lateral movement 攻击。通过RL,我们想使这些代理人具有攻击漏洞的能力,并且可以自动选择和执行攻击。这种方法可以自动化一些黑客测试工作流程,使其更加高效和应对新的威胁和漏洞。实验在四个真实环境中进行,代理人在千多个回合中被训练。代理人自动选择行动,在环境中发起攻击,达成84%以上的成功率,只需55步。此外,A2C算法在选择合适的行动方面表现出色。

Residual Scheduling: A New Reinforcement Learning Approach to Solving Job Shop Scheduling Problem

  • paper_url: http://arxiv.org/abs/2309.15517
  • repo_url: None
  • paper_authors: Kuo-Hao Ho, Ruei-Yu Jheng, Ji-Han Wu, Fan Chiang, Yen-Chi Chen, Yuan-Yu Wu, I-Chen Wu
  • for: solves the job-shop scheduling problem (JSP) and the flexible job-shop scheduling problem (FJSP)
  • methods: uses deep reinforcement learning (DRL) with graph neural networks (GNN) to construct scheduling solutions
  • results: reaches state-of-the-art (SOTA) performance among all known construction heuristics on most well-known open JSP and FJSP benchmarks, and performs well on large-size instances despite being trained on smaller instances.
    Abstract Job-shop scheduling problem (JSP) is a mathematical optimization problem widely used in industries like manufacturing, and flexible JSP (FJSP) is also a common variant. Since they are NP-hard, it is intractable to find the optimal solution for all cases within reasonable times. Thus, it becomes important to develop efficient heuristics to solve JSP/FJSP. A kind of method of solving scheduling problems is construction heuristics, which constructs scheduling solutions via heuristics. Recently, many methods for construction heuristics leverage deep reinforcement learning (DRL) with graph neural networks (GNN). In this paper, we propose a new approach, named residual scheduling, to solving JSP/FJSP. In this new approach, we remove irrelevant machines and jobs such as those finished, such that the states include the remaining (or relevant) machines and jobs only. Our experiments show that our approach reaches state-of-the-art (SOTA) among all known construction heuristics on most well-known open JSP and FJSP benchmarks. In addition, we also observe that even though our model is trained for scheduling problems of smaller sizes, our method still performs well for scheduling problems of large sizes. Interestingly in our experiments, our approach even reaches zero gap for 49 among 50 JSP instances whose job numbers are more than 150 on 20 machines.
    摘要 作业车间调度问题(JSP)是一个在制造等行业广泛应用的数学优化问题,柔性作业车间调度问题(FJSP)是其常见变体。由于二者都是NP难问题,在合理时间内为所有情形求得最优解是不现实的,因此开发高效的启发式方法来求解JSP/FJSP十分重要。求解调度问题的一类方法是构造式启发式,即通过启发式规则逐步构造调度解;近年来,许多构造式启发式方法借助图神经网络(GNN)与深度强化学习(DRL)。本文提出一种求解JSP/FJSP的新方法,称为残差调度:我们将不相关的机器和作业(例如已完成的)从状态中移除,使状态只包含剩余(即相关)的机器和作业。实验表明,在大多数知名的公开JSP和FJSP基准上,我们的方法在所有已知构造式启发式中达到了最先进(SOTA)水平。此外,尽管模型是在较小规模的调度问题上训练的,它在大规模调度问题上仍然表现良好。有趣的是,对于20台机器、作业数超过150的50个JSP实例,我们的方法在其中49个上达到了零间隙(zero gap)。
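The "residual" idea itself, i.e. dropping finished jobs and now-irrelevant machines from the state before the policy sees it, is easy to show without the GNN. The data structures below are ad-hoc illustrations, not the paper's graph encoding.

```python
# Toy illustration of residual-state construction for JSP: remove completed jobs
# and machines that no remaining operation needs, so the scheduling policy only
# sees the remaining subproblem. Structures here are hypothetical.
from dataclasses import dataclass, field

@dataclass
class Job:
    ops: list                                   # remaining (machine, duration) pairs, in precedence order
    done: list = field(default_factory=list)    # already-scheduled operations

def residual_state(jobs):
    remaining_jobs = {j: job for j, job in jobs.items() if job.ops}
    remaining_machines = {m for job in remaining_jobs.values() for m, _ in job.ops}
    return remaining_jobs, remaining_machines

jobs = {
    "J1": Job(ops=[("M2", 3), ("M3", 2)]),
    "J2": Job(ops=[], done=[("M1", 4), ("M2", 2)]),   # finished job
    "J3": Job(ops=[("M3", 5)]),
}
rjobs, rmachines = residual_state(jobs)
print(sorted(rjobs))        # ['J1', 'J3']  -> J2 no longer appears in the state
print(sorted(rmachines))    # ['M2', 'M3']  -> M1 is irrelevant for the residual problem
```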

Teaching Text-to-Image Models to Communicate

  • paper_url: http://arxiv.org/abs/2309.15516
  • repo_url: https://github.com/Sfedfcv/redesigned-pancake
  • paper_authors: Xiaowen Sun, Jiazhan Feng, Yuxuan Wang, Yuxuan Lai, Xingyu Shen, Dongyan Zhao
  • for: Given the dialog context, the model should generate a realistic image that is consistent with the specified conversation as response.
  • methods: We propose an efficient approach for dialog-to-image generation without any intermediate translation, which maximizes the extraction of the semantic information contained in the dialog. We fine-tune pre-trained text-to-image models to enable them to generate images conditioning on processed dialog context.
  • results: Our approach can consistently improve the performance of various models across multiple metrics. Experimental results on public benchmark demonstrate the effectiveness and practicability of our method.
    Abstract Various works have been extensively studied in the research of text-to-image generation. Although existing models perform well in text-to-image generation, there are significant challenges when directly employing them to generate images in dialogs. In this paper, we first highlight a new problem: dialog-to-image generation, that is, given the dialog context, the model should generate a realistic image which is consistent with the specified conversation as response. To tackle the problem, we propose an efficient approach for dialog-to-image generation without any intermediate translation, which maximizes the extraction of the semantic information contained in the dialog. Considering the characteristics of dialog structure, we put segment token before each sentence in a turn of a dialog to differentiate different speakers. Then, we fine-tune pre-trained text-to-image models to enable them to generate images conditioning on processed dialog context. After fine-tuning, our approach can consistently improve the performance of various models across multiple metrics. Experimental results on public benchmark demonstrate the effectiveness and practicability of our method.
    摘要 文本到图像生成已经得到广泛研究。尽管现有模型在文本到图像生成方面表现良好,但直接将其用于在对话中生成图像仍存在明显挑战。本文首先提出一个新问题:对话到图像生成,即给定对话上下文,模型应生成一张与指定对话一致的真实图像作为回应。为解决该问题,我们提出了一种无需任何中间转换的高效对话到图像生成方法,最大化地提取对话中蕴含的语义信息。考虑到对话结构的特点,我们在对话中每个话轮的句子前加入分段标记,以区分不同的说话人;随后微调预训练的文本到图像模型,使其能够以处理后的对话上下文为条件生成图像。经过微调,我们的方法能够在多个指标上持续提升多种模型的性能。在公开基准上的实验结果表明了该方法的有效性和实用性。
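The preprocessing step described above (a segment token before each turn to mark the speaker) can be sketched directly; the token names and the commented-out generation call are placeholders rather than the paper's exact interface.

```python
# Sketch of the dialog-to-prompt preprocessing: insert a speaker segment token
# before each turn, then flatten the dialog into the conditioning text handed to
# a fine-tuned text-to-image model. Token names are hypothetical.
SPEAKER_TOKENS = {"A": "[SPK1]", "B": "[SPK2]"}

def dialog_to_prompt(turns):
    """turns: list of (speaker, utterance) pairs in order."""
    return " ".join(f"{SPEAKER_TOKENS[s]} {u.strip()}" for s, u in turns)

dialog = [
    ("A", "We finally reached the summit before sunrise."),
    ("B", "Send me a picture of the view from up there!"),
]
prompt = dialog_to_prompt(dialog)
print(prompt)
# '[SPK1] We finally reached the summit before sunrise. [SPK2] Send me a picture ...'
# image = finetuned_t2i_model(prompt)   # conditioning a diffusion model on the processed dialog
```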

Enhancing Cross-Category Learning in Recommendation Systems with Multi-Layer Embedding Training

  • paper_url: http://arxiv.org/abs/2309.15881
  • repo_url: None
  • paper_authors: Zihao Deng, Benjamin Ghaemmaghami, Ashish Kumar Singh, Benjamin Cho, Leo Orshansky, Mattan Erez, Michael Orshansky
  • for: 提高推荐系统中 rarely-occurring category 的 embedding 质量
  • methods: 提出一种训练阶段技术(MLET)来生成更高质量的embedding,并从理论上解释其出人意料的有效性
  • results: MLET 能够生成更好的embedding,并可降低embedding维度和模型大小;在多个最新推荐模型的点击率(CTR)预测任务中表现出色,对稀有类目的提升尤为明显
    Abstract Modern DNN-based recommendation systems rely on training-derived embeddings of sparse features. Input sparsity makes obtaining high-quality embeddings for rarely-occurring categories harder as their representations are updated infrequently. We demonstrate a training-time technique to produce superior embeddings via effective cross-category learning and theoretically explain its surprising effectiveness. The scheme, termed the multi-layer embeddings training (MLET), trains embeddings using factorization of the embedding layer, with an inner dimension higher than the target embedding dimension. For inference efficiency, MLET converts the trained two-layer embedding into a single-layer one thus keeping inference-time model size unchanged. Empirical superiority of MLET is puzzling as its search space is not larger than that of the single-layer embedding. The strong dependence of MLET on the inner dimension is even more surprising. We develop a theory that explains both of these behaviors by showing that MLET creates an adaptive update mechanism modulated by the singular vectors of embeddings. When tested on multiple state-of-the-art recommendation models for click-through rate (CTR) prediction tasks, MLET consistently produces better models, especially for rare items. At constant model quality, MLET allows embedding dimension, and model size, reduction by up to 16x, and 5.8x on average, across the models.
    摘要 现代基于DNN的推荐系统依赖于训练得到的稀疏特征embedding。输入的稀疏性使得为罕见类目获得高质量embedding更加困难,因为它们的表示更新频率很低。我们展示了一种训练阶段技术,通过有效的跨类目学习生成更优的embedding,并从理论上解释其出人意料的有效性。该方案称为多层embedding训练(MLET),它通过对embedding层进行因子分解来训练embedding,其内部维度高于目标embedding维度。为保证推理效率,MLET将训练好的两层embedding折叠为单层embedding,从而保持推理时的模型大小不变。MLET的经验优势颇为令人费解,因为其搜索空间并不大于单层embedding;MLET对内部维度的强烈依赖则更加令人意外。我们提出了一个理论来解释这两种现象:MLET构造了一种由embedding奇异向量调制的自适应更新机制。在多个最新推荐模型的点击率(CTR)预测任务上的测试表明,MLET能够稳定地得到更好的模型,对罕见项的改进尤为明显。在模型质量不变的情况下,MLET可使embedding维度及模型大小最多缩减16倍,各模型平均缩减5.8倍。
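The factorization-and-collapse mechanics are compact enough to sketch. The example below parameterizes an embedding during training as two factors with a larger inner dimension and folds them into a single table for inference, confirming the mapping is unchanged; all dimensions are toy values.

```python
# Minimal MLET-style sketch: a two-factor embedding with a larger inner dimension
# at training time, folded into one standard embedding table for inference so the
# serving model size is unchanged. Dimensions are toy assumptions.
import torch
import torch.nn as nn

VOCAB, TARGET_DIM, INNER_DIM = 1000, 16, 64     # inner dimension > target dimension

class TwoLayerEmbedding(nn.Module):
    def __init__(self):
        super().__init__()
        self.e1 = nn.Embedding(VOCAB, INNER_DIM)                  # first factor
        self.e2 = nn.Linear(INNER_DIM, TARGET_DIM, bias=False)    # second factor

    def forward(self, ids):
        return self.e2(self.e1(ids))                              # effective dim = TARGET_DIM

    def collapse(self):
        """Fold both factors into one embedding table for inference."""
        single = nn.Embedding(VOCAB, TARGET_DIM)
        with torch.no_grad():
            single.weight.copy_(self.e1.weight @ self.e2.weight.T)
        return single

emb = TwoLayerEmbedding()
ids = torch.randint(0, VOCAB, (4,))
train_out = emb(ids)
infer_out = emb.collapse()(ids)
print(torch.allclose(train_out, infer_out, atol=1e-6))   # True: same mapping, single table
```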

High-Fidelity Speech Synthesis with Minimal Supervision: All Using Diffusion Models

  • paper_url: http://arxiv.org/abs/2309.15512
  • repo_url: None
  • paper_authors: Chunyu Qiang, Hao Li, Yixin Tian, Yi Zhao, Ying Zhang, Longbiao Wang, Jianwu Dang
  • for: 这个论文主要是为了提出一种基于扩散模型的微调 speech synthesis方法,以减少标注数据量并提高语音质量。
  • methods: 这种方法使用了两种类型的扩散表示(semantic和acoustic),并使用两个sequence-to-sequence任务来实现微调语音生成。它还使用了一种新的CTAP抽象方法来解决现有方法中的信息重复和维度爆炸问题。
  • results: 实验结果表明,所提方法在语音质量和多样性上优于基线方法。我们在网站上提供了音频样本,供读者直接试听。
    Abstract Text-to-speech (TTS) methods have shown promising results in voice cloning, but they require a large number of labeled text-speech pairs. Minimally-supervised speech synthesis decouples TTS by combining two types of discrete speech representations(semantic \& acoustic) and using two sequence-to-sequence tasks to enable training with minimal supervision. However, existing methods suffer from information redundancy and dimension explosion in semantic representation, and high-frequency waveform distortion in discrete acoustic representation. Autoregressive frameworks exhibit typical instability and uncontrollability issues. And non-autoregressive frameworks suffer from prosodic averaging caused by duration prediction models. To address these issues, we propose a minimally-supervised high-fidelity speech synthesis method, where all modules are constructed based on the diffusion models. The non-autoregressive framework enhances controllability, and the duration diffusion model enables diversified prosodic expression. Contrastive Token-Acoustic Pretraining (CTAP) is used as an intermediate semantic representation to solve the problems of information redundancy and dimension explosion in existing semantic coding methods. Mel-spectrogram is used as the acoustic representation. Both semantic and acoustic representations are predicted by continuous variable regression tasks to solve the problem of high-frequency fine-grained waveform distortion. Experimental results show that our proposed method outperforms the baseline method. We provide audio samples on our website.
    摘要 文本识别(TTS)方法已经在voice cloning中表现出了有前途的结果,但它们需要大量标注的文本-语音对。不过,现有的方法受到 semantic 表示中的信息重复和维度爆发的问题,以及 discrete acoustic 表示中的高频波形腐败问题。潜在的 autoregressive 框架具有典型的不稳定和不可控问题,而非潜在的框架受到duration prediction模型的平均化问题。为解决这些问题,我们提出了一种 minimally-supervised 高精度语音合成方法,其中所有模块都是基于扩散模型的。非潜在的框架提高了可控性,而 duration 扩散模型允许多样化的语音表达。我们使用 contrastive Token-Acoustic Pretraining(CTAP)作为中间的semantic表示,以解决现有的semantic coding方法中的信息重复和维度爆发问题。Mel-spectrogram 作为 acoustic 表示。两者都是通过连续变量回归任务来解决高频细腐波形腐败问题。实验结果表明,我们的提议方法比基eline方法更高效。我们在官方网站上提供了声音样本。

Towards Human-Like RL: Taming Non-Naturalistic Behavior in Deep RL via Adaptive Behavioral Costs in 3D Games

  • paper_url: http://arxiv.org/abs/2309.15484
  • repo_url: None
  • paper_authors: Kuo-Hao Ho, Ping-Chun Hsieh, Chiu-Chou Lin, You-Ren Luo, Feng-Jian Wang, I-Chen Wu
  • For: The paper aims to train a human-like agent with competitive strength in reinforcement learning, addressing the issue of peculiar gameplay experiences caused by unconstrained agents.
  • Methods: The proposed approach, called Adaptive Behavioral Costs in Reinforcement Learning (ABC-RL), augments behavioral limitations as cost signals in reinforcement learning with dynamically adjusted weights, and minimizes the behavioral costs subject to a constraint of the value function.
  • Results: Through experiments conducted on 3D games in DMLab-30 and Unity ML-Agents Toolkit, the paper demonstrates that ABC-RL achieves the same performance level while significantly reducing instances of shaking and spinning, promoting more natural and human-like behavior during gameplay.
    Abstract In this paper, we propose a new approach called Adaptive Behavioral Costs in Reinforcement Learning (ABC-RL) for training a human-like agent with competitive strength. While deep reinforcement learning agents have recently achieved superhuman performance in various video games, some of these unconstrained agents may exhibit actions, such as shaking and spinning, that are not typically observed in human behavior, resulting in peculiar gameplay experiences. To behave like humans and retain similar performance, ABC-RL augments behavioral limitations as cost signals in reinforcement learning with dynamically adjusted weights. Unlike traditional constrained policy optimization, we propose a new formulation that minimizes the behavioral costs subject to a constraint of the value function. By leveraging the augmented Lagrangian, our approach is an approximation of the Lagrangian adjustment, which handles the trade-off between the performance and the human-like behavior. Through experiments conducted on 3D games in DMLab-30 and Unity ML-Agents Toolkit, we demonstrate that ABC-RL achieves the same performance level while significantly reducing instances of shaking and spinning. These findings underscore the effectiveness of our proposed approach in promoting more natural and human-like behavior during gameplay.
    摘要 本文提出了一种名为“强化学习中的自适应行为代价”(ABC-RL)的新方法,用于训练兼具竞争实力的类人智能体。虽然深度强化学习智能体近来在多种电子游戏中取得了超越人类的表现,但其中一些不受约束的智能体可能表现出抖动、旋转等在人类行为中并不常见的动作,导致怪异的游戏体验。为了在保持相近性能的同时表现得更像人类,ABC-RL将行为限制作为代价信号引入强化学习,并动态调整其权重。与传统的约束策略优化不同,我们提出了一种新的形式化:在价值函数约束下最小化行为代价。借助增广拉格朗日方法,我们的做法可视为对拉格朗日调整的一种近似,用于在性能与类人行为之间进行权衡。通过在DMLab-30和Unity ML-Agents Toolkit的3D游戏上进行实验,我们证明ABC-RL能在保持同等性能水平的同时显著减少抖动和旋转的出现。这些结果凸显了所提方法在促进游戏过程中更自然、更类人的行为方面的有效性。
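One way to picture the adaptive weighting is as a dual-variable update on the cost weight: penalize behaviour more while the performance constraint is satisfied, back off when it is violated. The toy loop below is schematic only; the paper couples its augmented-Lagrangian-style adjustment to A2C/PPO training in 3D games, and every number here is invented.

```python
# Schematic dual-variable update in the spirit of adaptive behavioral costs.
# `run_episode` is a stand-in for an RL rollout; all constants are arbitrary.
import random

random.seed(0)
PERF_THRESHOLD = 8.0        # required episode return (the performance/value constraint)
LR_WEIGHT = 0.05            # step size for the cost weight (dual variable)
weight = 0.0

def run_episode(cost_weight):
    """Pretend behaviour responds to the penalty: more penalty -> less spinning, slightly lower raw return."""
    spin_rate = max(0.0, 1.0 - cost_weight)
    raw_return = 10.0 - 0.5 * cost_weight + random.uniform(-0.3, 0.3)
    behavior_cost = 5.0 * spin_rate
    return raw_return, behavior_cost

for step in range(200):
    ret, cost = run_episode(weight)
    shaped = ret - weight * cost          # reward signal the learner would actually optimize
    # Dual ascent: raise the penalty while performance is safe, back off otherwise.
    weight = max(0.0, weight + LR_WEIGHT * (1.0 if ret >= PERF_THRESHOLD else -1.0))

print(f"final cost weight ~ {weight:.2f}, last shaped reward {shaped:.2f}, last behavior cost {cost:.2f}")
```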

Enabling Resource-efficient AIoT System with Cross-level Optimization: A survey

  • paper_url: http://arxiv.org/abs/2309.15467
  • repo_url: None
  • paper_authors: Sicong Liu, Bin Guo, Cheng Fang, Ziqi Wang, Shiyan Luo, Zimu Zhou, Zhiwen Yu
  • for: The paper focuses on the development of resource-friendly deep learning (DL) models and model-adaptive system scheduling for artificial intelligence of things (AIoT) applications.
  • methods: The paper explores algorithm-system co-design to optimize resource availability and improve the performance of DL models on resource-scarce infrastructures; the survey covers various granularity levels, including DL models, computation graphs, operators, memory schedules, and hardware instructors in both on-device and distributed paradigms.
  • results: The paper aims to provide a broader optimization space for more free resource-performance tradeoffs and to help readers understand the connections between problems and techniques scattered over diverse levels.
    Abstract The emerging field of artificial intelligence of things (AIoT, AI+IoT) is driven by the widespread use of intelligent infrastructures and the impressive success of deep learning (DL). With the deployment of DL on various intelligent infrastructures featuring rich sensors and weak DL computing capabilities, a diverse range of AIoT applications has become possible. However, DL models are notoriously resource-intensive. Existing research strives to realize near-/realtime inference of AIoT live data and low-cost training using AIoT datasets on resource-scare infrastructures. Accordingly, the accuracy and responsiveness of DL models are bounded by resource availability. To this end, the algorithm-system co-design that jointly optimizes the resource-friendly DL models and model-adaptive system scheduling improves the runtime resource availability and thus pushes the performance boundary set by the standalone level. Unlike previous surveys on resource-friendly DL models or hand-crafted DL compilers/frameworks with partially fine-tuned components, this survey aims to provide a broader optimization space for more free resource-performance tradeoffs. The cross-level optimization landscape involves various granularity, including the DL model, computation graph, operator, memory schedule, and hardware instructor in both on-device and distributed paradigms. Furthermore, due to the dynamic nature of AIoT context, which includes heterogeneous hardware, agnostic sensing data, varying user-specified performance demands, and resource constraints, this survey explores the context-aware inter-/intra-device controllers for automatic cross-level adaptation. Additionally, we identify some potential directions for resource-efficient AIoT systems. By consolidating problems and techniques scattered over diverse levels, we aim to help readers understand their connections and stimulate further discussions.
    摘要 人工智能物联网(AIoT,AI+IoT)领域在广泛使用智能基础设施和深度学习(DL)的成功下发展。通过在具有丰富感知器和软DL计算能力的多种智能基础设施上部署DL模型,AIoT应用程序的多样化化成为可能。然而,DL模型具有资源占用率问题。现有研究寻求实现AIoT实时数据和训练成本低的准确和响应率DL模型。为此,我们需要同时优化资源充足的DL模型和模型适应系统调度。不同于之前关于资源友好DL模型或手动精细DL编译器/框架的调研,本调研旨在为更多的自由资源性能交易提供更广泛的优化空间。AIoT上下文的跨级优化景观包括DL模型、计算图、运算、内存调度和硬件指导等,在 both on-device 和分布式模式下进行跨级优化。此外,由于AIoT上下文的动态特性,包括多样化硬件、多种感知数据、用户指定性能要求和资源限制,我们需要采用智能Device Controller来自动进行跨级调整。此外,我们还提出了一些可能的资源高效AIoT系统的方向。通过汇集多级问题和技术,我们希望读者可以更好地理解他们之间的连接,并促进进一步的讨论。

LogicMP: A Neuro-symbolic Approach for Encoding First-order Logic Constraints

  • paper_url: http://arxiv.org/abs/2309.15458
  • repo_url: None
  • paper_authors: Weidi Xu, Jingwei Wang, Lele Xie, Jianshan He, Hongting Zhou, Taifeng Wang, Xiaopei Wan, Jingdong Chen, Chao Qu, Wei Chu
  • for: 本文提出了一种将逻辑学约束(FOLC)与神经网络结合的新方法,以便模型复杂的关系,满足约束。
  • methods: 本文提出了一种新的神经层,逻辑MP(LogicMP),其层使用了mean-field变量推理来处理MLN(多边网络)。这种层可以与现有的神经网络结合,以便编码FOLC。
  • results: 实验结果表明,LogicMP在图、图像和文本三类任务上的性能和效率均优于先进的竞争方法。
    Abstract Integrating first-order logic constraints (FOLCs) with neural networks is a crucial but challenging problem since it involves modeling intricate correlations to satisfy the constraints. This paper proposes a novel neural layer, LogicMP, whose layers perform mean-field variational inference over an MLN. It can be plugged into any off-the-shelf neural network to encode FOLCs while retaining modularity and efficiency. By exploiting the structure and symmetries in MLNs, we theoretically demonstrate that our well-designed, efficient mean-field iterations effectively mitigate the difficulty of MLN inference, reducing the inference from sequential calculation to a series of parallel tensor operations. Empirical results in three kinds of tasks over graphs, images, and text show that LogicMP outperforms advanced competitors in both performance and efficiency.
    摘要 将一阶逻辑约束(FOLC)与神经网络相结合是一个关键而富有挑战性的问题,因为这需要对复杂的相关性进行建模以满足约束。本文提出了一种新的神经网络层LogicMP,其各层在马尔可夫逻辑网络(MLN)上执行平均场变分推断。它可以插入任何现成的神经网络中以编码FOLC,同时保持模块化和高效性。通过利用MLN中的结构和对称性,我们从理论上证明了精心设计的高效平均场迭代能够显著缓解MLN推断的困难,将推断从串行计算化简为一系列并行的张量运算。在图、图像和文本三类任务上的实验结果表明,LogicMP在性能和效率上均优于先进的竞争方法。

Local Compressed Video Stream Learning for Generic Event Boundary Detection

  • paper_url: http://arxiv.org/abs/2309.15431
  • repo_url: https://github.com/gx77/lcvsl
  • paper_authors: Libo Zhang, Xin Gu, Congcong Li, Tiejian Luo, Heng Fan
  • for: 本研究旨在提出一种基于压缩视频表示学习方法,用于精准地检测视频中的事件边界。
  • methods: 该方法使用轻量级的ConvNet提取RGB、运动向量、差异和GOP结构中的特征,并通过针对压缩信息的批处理和双向信息流的SCAM模块进行特征提取和束缚。
  • results: 对于Kinetics-GEBD和TAPOS数据集,该方法实现了较大的改进,与之前的端到端方法相比,同时运行速度相同。
    Abstract Generic event boundary detection aims to localize the generic, taxonomy-free event boundaries that segment videos into chunks. Existing methods typically require video frames to be decoded before feeding into the network, which contains significant spatio-temporal redundancy and demands considerable computational power and storage space. To remedy these issues, we propose a novel compressed video representation learning method for event boundary detection that is fully end-to-end leveraging rich information in the compressed domain, i.e., RGB, motion vectors, residuals, and the internal group of pictures (GOP) structure, without fully decoding the video. Specifically, we use lightweight ConvNets to extract features of the P-frames in the GOPs and spatial-channel attention module (SCAM) is designed to refine the feature representations of the P-frames based on the compressed information with bidirectional information flow. To learn a suitable representation for boundary detection, we construct the local frames bag for each candidate frame and use the long short-term memory (LSTM) module to capture temporal relationships. We then compute frame differences with group similarities in the temporal domain. This module is only applied within a local window, which is critical for event boundary detection. Finally a simple classifier is used to determine the event boundaries of video sequences based on the learned feature representation. To remedy the ambiguities of annotations and speed up the training process, we use the Gaussian kernel to preprocess the ground-truth event boundaries. Extensive experiments conducted on the Kinetics-GEBD and TAPOS datasets demonstrate that the proposed method achieves considerable improvements compared to previous end-to-end approach while running at the same speed. The code is available at https://github.com/GX77/LCVSL.
    摘要 通用事件边界检测目标是将视频切分成各个事件边界,以便进行事件分类和识别。现有方法通常需要将视频帧解码为图像,然后将其传输到网络中进行处理,这会带来很大的计算成本和存储空间。为了解决这些问题,我们提出了一种新的压缩视频表示学习方法,可以在压缩域内完全结束地处理视频,而不需要完全解码视频。我们使用轻量级的ConvNet来提取P帧中的特征,并使用空间通道注意机制(SCAM)来细化P帧的特征表示,基于压缩信息的双向信息流。为了学习适合的表示,我们构建了每个候选帧的本地帧袋,并使用长短时间记忆(LSTM)模块来捕捉视频序列中的时间关系。然后,我们计算帧之间的相似性,并使用 Gaussian kernel 预处理真实的事件边界标注。我们的方法在 Kinetics-GEBD 和 TAPOS 数据集上进行了广泛的实验,并达到了较好的性能,而且与之前的端到端方法相比,运行速度相对较快。代码可以在 GitHub 上找到:https://github.com/GX77/LCVSL。
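The Gaussian preprocessing of ground-truth boundaries mentioned at the end of the abstract can be sketched in a few lines: each annotated boundary frame is spread into a soft bump so near-miss predictions are penalized less harshly. The kernel width below is an arbitrary choice, not the paper's setting.

```python
# Soft boundary targets via a Gaussian kernel, as a stand-in for the paper's
# ground-truth preprocessing; sigma and the toy annotations are assumptions.
import numpy as np

def soft_boundary_targets(num_frames, boundary_frames, sigma=2.0):
    t = np.arange(num_frames)[:, None]                   # (T, 1) frame indices
    b = np.asarray(boundary_frames)[None, :]             # (1, K) annotated boundaries
    bumps = np.exp(-0.5 * ((t - b) / sigma) ** 2)        # (T, K) one Gaussian per boundary
    return bumps.max(axis=1)                             # soft target in [0, 1] per frame

targets = soft_boundary_targets(num_frames=30, boundary_frames=[8, 21])
print(np.round(targets, 2))    # peaks of 1.0 at frames 8 and 21, decaying around them
```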

SimPINNs: Simulation-Driven Physics-Informed Neural Networks for Enhanced Performance in Nonlinear Inverse Problems

  • paper_url: http://arxiv.org/abs/2309.16729
  • repo_url: None
  • paper_authors: Sidney Besnard, Frédéric Jurie, Jalal M. Fadili
  • for: solves inverse problems by leveraging deep learning techniques, with the objective of inferring unknown parameters that govern a physical system based on observed data.
  • methods: builds upon physics-informed neural networks (PINNs) trained with a hybrid loss function that combines observed data with simulated data generated by a known (approximate) physical model.
  • results: surpasses the performance of standard PINNs, providing improved accuracy and robustness, as demonstrated by experimental results on an orbit restitution problem.
    Abstract This paper introduces a novel approach to solve inverse problems by leveraging deep learning techniques. The objective is to infer unknown parameters that govern a physical system based on observed data. We focus on scenarios where the underlying forward model demonstrates pronounced nonlinear behaviour, and where the dimensionality of the unknown parameter space is substantially smaller than that of the observations. Our proposed method builds upon physics-informed neural networks (PINNs) trained with a hybrid loss function that combines observed data with simulated data generated by a known (approximate) physical model. Experimental results on an orbit restitution problem demonstrate that our approach surpasses the performance of standard PINNs, providing improved accuracy and robustness.
    摘要 本文提出了一种借助深度学习技术求解反问题的新方法,其目标是基于观测数据推断支配物理系统的未知参数。我们关注前向模型呈现明显非线性行为、且未知参数空间的维度远小于观测维度的情形。所提方法建立在物理信息神经网络(PINN)之上,使用混合损失函数进行训练,该损失将观测数据与由已知(近似)物理模型生成的模拟数据相结合。在一个轨道复原问题上的实验结果表明,我们的方法超越了标准PINN,具有更高的精度和鲁棒性。
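The hybrid objective is straightforward to sketch: a data term on the few observed pairs plus a consistency term on pairs generated by the known approximate forward model. The quadratic/sine forward model, network sizes, and loss weight below are toy assumptions standing in for the orbit-restitution physics.

```python
# Hedged sketch of a SimPINN-style hybrid loss: fit an observation->parameter
# network with (i) scarce observed pairs and (ii) pairs produced by a known
# approximate forward model. Everything here is a toy stand-in.
import torch
import torch.nn as nn

def approx_forward(theta):                     # known (approximate) physics: parameters -> observation
    return torch.cat([theta ** 2, torch.sin(theta)], dim=-1)

net = nn.Sequential(nn.Linear(4, 32), nn.Tanh(), nn.Linear(32, 2))   # observation (4-d) -> parameters (2-d)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

theta_obs = torch.rand(16, 2)                  # "measured" ground-truth parameters (scarce)
y_obs = approx_forward(theta_obs) + 0.01 * torch.randn(16, 4)

for step in range(500):
    theta_sim = torch.rand(128, 2)             # simulated branch: sample parameters freely
    y_sim = approx_forward(theta_sim)
    loss_data = ((net(y_obs) - theta_obs) ** 2).mean()        # observed-data term
    loss_sim = ((net(y_sim) - theta_sim) ** 2).mean()         # simulation-consistency term
    loss = loss_data + 0.5 * loss_sim                         # hybrid loss, weight is arbitrary
    opt.zero_grad(); loss.backward(); opt.step()

print(f"final hybrid loss: {loss.item():.4f}")
```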

Graph Neural Prompting with Large Language Models

  • paper_url: http://arxiv.org/abs/2309.15427
  • repo_url: None
  • paper_authors: Yijun Tian, Huan Song, Zichen Wang, Haozhu Wang, Ziqing Hu, Fang Wang, Nitesh V. Chawla, Panpan Xu
  • for: 增强预训练的大语言模型(LLM)在语言理解任务中的表现,提高 LLM 的基础知识捕捉和返回能力。
  • methods: 提出 Graph Neural Prompting(GNP)方法,GNP 包括标准的图神经网络编码器、跨模态池化模块、领域投影器以及自监督链接预测目标。
  • results: 在多个数据集上,GNP 能够在不同的 LLM 大小和设置下提高各种普通常识和生物医学理解任务的表现。
    Abstract Large Language Models (LLMs) have shown remarkable generalization capability with exceptional performance in various language modeling tasks. However, they still exhibit inherent limitations in precisely capturing and returning grounded knowledge. While existing work has explored utilizing knowledge graphs to enhance language modeling via joint training and customized model architectures, applying this to LLMs is problematic owing to their large number of parameters and high computational cost. In addition, how to leverage the pre-trained LLMs and avoid training a customized model from scratch remains an open question. In this work, we propose Graph Neural Prompting (GNP), a novel plug-and-play method to assist pre-trained LLMs in learning beneficial knowledge from KGs. GNP encompasses various designs, including a standard graph neural network encoder, a cross-modality pooling module, a domain projector, and a self-supervised link prediction objective. Extensive experiments on multiple datasets demonstrate the superiority of GNP on both commonsense and biomedical reasoning tasks across different LLM sizes and settings.
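The plug-and-play flow described in the abstract can be sketched roughly as below; the module sizes, the mean pooling, the linear stand-in for the GNN encoder, and the way the prompt vectors are prepended to a frozen LLM's input embeddings are all assumptions for illustration.

```python
import torch
import torch.nn as nn

class GraphNeuralPromptSketch(nn.Module):
    """Encode a retrieved KG subgraph into a few soft prompt vectors for a frozen LLM."""
    def __init__(self, node_dim, hidden_dim, llm_dim, num_prompt_tokens=4):
        super().__init__()
        # A single linear layer stands in for the paper's GNN encoder.
        self.encoder = nn.Linear(node_dim, hidden_dim)
        self.projector = nn.Linear(hidden_dim, llm_dim * num_prompt_tokens)
        self.num_prompt_tokens = num_prompt_tokens
        self.llm_dim = llm_dim

    def forward(self, node_feats):                  # node_feats: (num_nodes, node_dim)
        h = torch.relu(self.encoder(node_feats))    # (num_nodes, hidden_dim)
        g = h.mean(dim=0, keepdim=True)             # pool the subgraph to one vector
        prompt = self.projector(g)                  # (1, llm_dim * num_prompt_tokens)
        return prompt.view(self.num_prompt_tokens, self.llm_dim)

# The returned vectors would be concatenated in front of the token embeddings
# of a frozen LLM before its usual forward pass.
```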

Towards the Vulnerability of Watermarking Artificial Intelligence Generated Content

  • paper_url: http://arxiv.org/abs/2310.07726
  • repo_url: None
  • paper_authors: Guanlin Li, Yifei Chen, Jie Zhang, Jiwei Li, Shangwei Guo, Tianwei Zhang
  • for: examining the watermarking mechanisms used by commercial AI-generated content (AIGC) services on social media, whose usage must be tightly regulated so that users do not violate usage policies (e.g., commercial abuse, generating and distributing unsafe content).
  • methods: considers two attacks on these watermarks, watermark removal and watermark forging, and proposes WMaGi, a unified framework that leverages a pre-trained diffusion model for content processing and a generative adversarial network for removing or forging watermarks.
  • results: shows that an adversary can easily break existing watermarking mechanisms; WMaGi achieves high success rates for both attacks while maintaining content quality, and is 5,050$\sim$11,000$\times$ faster than existing diffusion-model-based attacks.
    Abstract Artificial Intelligence Generated Content (AIGC) is gaining great popularity in social media, with many commercial services available. These services leverage advanced generative models, such as latent diffusion models and large language models, to generate creative content (e.g., realistic images, fluent sentences) for users. The usage of such generated content needs to be highly regulated, as the service providers need to ensure the users do not violate the usage policies (e.g., abuse for commercialization, generating and distributing unsafe content). Numerous watermarking approaches have been proposed recently. However, in this paper, we show that an adversary can easily break these watermarking mechanisms. Specifically, we consider two possible attacks. (1) Watermark removal: the adversary can easily erase the embedded watermark from the generated content and then use it freely without the regulation of the service provider. (2) Watermark forge: the adversary can create illegal content with forged watermarks from another user, causing the service provider to make wrong attributions. We propose WMaGi, a unified framework to achieve both attacks in a holistic way. The key idea is to leverage a pre-trained diffusion model for content processing, and a generative adversarial network for watermark removing or forging. We evaluate WMaGi on different datasets and embedding setups. The results prove that it can achieve high success rates while maintaining the quality of the generated content. Compared with existing diffusion model-based attacks, WMaGi is 5,050$\sim$11,000$\times$ faster.

The Triad of Failure Modes and a Possible Way Out

  • paper_url: http://arxiv.org/abs/2309.15420
  • repo_url: None
  • paper_authors: Emanuele Sansone
  • for: addressing the three failure modes of cluster-based self-supervised learning (SSL): representation collapse, cluster collapse, and invariance to permutations of cluster assignments.
  • methods: proposes a new objective function with three key components: (i) a generative term that penalizes representation collapse; (ii) a term promoting invariance to data augmentations, which addresses label permutations; and (iii) a uniformity term that penalizes cluster collapse.
  • results: the proposed objective can be interpreted as a lower bound on the data log-likelihood, can be optimized with a standard backbone architecture without asymmetric elements, and is shown to be effective in experiments on toy and real-world data.
    Abstract We present a novel objective function for cluster-based self-supervised learning (SSL) that is designed to circumvent the triad of failure modes, namely representation collapse, cluster collapse, and the problem of invariance to permutations of cluster assignments. This objective consists of three key components: (i) a generative term that penalizes representation collapse, (ii) a term that promotes invariance to data augmentations, thereby addressing the issue of label permutations, and (iii) a uniformity term that penalizes cluster collapse. Additionally, our proposed objective possesses two notable advantages. Firstly, it can be interpreted from a Bayesian perspective as a lower bound on the data log-likelihood. Secondly, it enables the training of a standard backbone architecture without the need for asymmetric elements like stop gradients, momentum encoders, or specialized clustering layers. Due to its simplicity and theoretical foundation, our proposed objective is well-suited for optimization. Experiments on both toy and real-world data demonstrate its effectiveness.
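A compact sketch of a three-term objective in the spirit described above; the concrete instantiations (a reconstruction loss for the generative term, cross-entropy agreement across augmentations for invariance, and a negative-entropy penalty on the average cluster usage for uniformity) are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def triad_objective(decoder, z1, logits1, logits2, x):
    """z1: latents for one augmentation of x; logits1/logits2: cluster logits
    for two augmentations of x. All three terms are minimized jointly."""
    # (i) generative term: reconstruction penalizes representation collapse.
    gen = F.mse_loss(decoder(z1), x)
    # (ii) invariance term: cluster assignments should agree across augmentations.
    inv = F.cross_entropy(logits1, F.softmax(logits2, dim=-1).detach())
    # (iii) uniformity term: negative entropy of the average assignment
    #       penalizes collapse onto a few clusters.
    mean_p = F.softmax(logits1, dim=-1).mean(dim=0)
    uni = (mean_p * torch.log(mean_p + 1e-8)).sum()
    return gen + inv + uni
```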

Neuro-Inspired Hierarchical Multimodal Learning

  • paper_url: http://arxiv.org/abs/2309.15877
  • repo_url: None
  • paper_authors: Xiongye Xiao, Gengshuo Liu, Gaurav Gupta, Defu Cao, Shixuan Li, Yaxing Li, Tianqing Fang, Mingxi Cheng, Paul Bogdan
  • for: improving perception in multimodal settings, drawing inspiration from neuroscience.
  • methods: proposes the Information-Theoretic Hierarchical Perception (ITHP) model built on the information bottleneck principle; unlike conventional fusion models, it designates the prime modality as the input while the remaining modalities act as detectors in the information pathway.
  • results: consistently outperforms state-of-the-art benchmarks in multimodal learning scenarios, as demonstrated on the MUStARD and CMU-MOSI datasets.
    Abstract Integrating and processing information from various sources or modalities are critical for obtaining a comprehensive and accurate perception of the real world. Drawing inspiration from neuroscience, we develop the Information-Theoretic Hierarchical Perception (ITHP) model, which utilizes the concept of information bottleneck. Distinct from most traditional fusion models that aim to incorporate all modalities as input, our model designates the prime modality as input, while the remaining modalities act as detectors in the information pathway. Our proposed perception model focuses on constructing an effective and compact information flow by achieving a balance between the minimization of mutual information between the latent state and the input modal state, and the maximization of mutual information between the latent states and the remaining modal states. This approach leads to compact latent state representations that retain relevant information while minimizing redundancy, thereby substantially enhancing the performance of downstream tasks. Experimental evaluations on both the MUStARD and CMU-MOSI datasets demonstrate that our model consistently distills crucial information in multimodal learning scenarios, outperforming state-of-the-art benchmarks.
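The balance described in the abstract can be written as an information-bottleneck-style objective; the trade-off weights β_k below are our notation for a sketch, not the paper's formulation.

```latex
% Z: latent state built from the prime modality X_0;
% X_1, ..., X_M: remaining modalities acting as detectors along the pathway.
\min_{p(z \mid x_0)} \; I(Z; X_0) \;-\; \sum_{k=1}^{M} \beta_k \, I(Z; X_k)
```

Minimizing the first term compresses the latent state, while the second term keeps it informative about the remaining modalities, which is one way to read the "compact yet relevant" representation claim.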

STAG: Enabling Low Latency and Low Staleness of GNN-based Services with Dynamic Graphs

  • paper_url: http://arxiv.org/abs/2309.15875
  • repo_url: None
  • paper_authors: Jiawen Wang, Quan Chen, Deze Zeng, Zhuo Song, Chen Chen, Minyi Guo
  • for: enabling low response latency and low staleness for GNN-based services that operate on dynamic graphs.
  • methods: proposes STAG, a GNN serving framework that combines a collaborative serving mechanism with an additivity-based incremental propagation strategy to address the neighbor explosion and duplicated computation problems.
  • results: STAG accelerates the update phase by 1.3x~90.1x and greatly reduces staleness time with only a slight increase in response latency.
    Abstract Many emerging user-facing services adopt Graph Neural Networks (GNNs) to improve serving accuracy. When the graph used by a GNN model changes, representations (embedding) of nodes in the graph should be updated accordingly. However, the node representation update is too slow, resulting in either long response latency of user queries (the inference is performed after the update completes) or high staleness problem (the inference is performed based on stale data). Our in-depth analysis shows that the slow update is mainly due to neighbor explosion problem in graphs and duplicated computation. Based on such findings, we propose STAG, a GNN serving framework that enables low latency and low staleness of GNN-based services. It comprises a collaborative serving mechanism and an additivity-based incremental propagation strategy. With the collaborative serving mechanism, only part of node representations are updated during the update phase, and the final representations are calculated in the inference phase. It alleviates the neighbor explosion problem. The additivity-based incremental propagation strategy reuses intermediate data during the update phase, eliminating duplicated computation problem. Experimental results show that STAG accelerates the update phase by 1.3x~90.1x, and greatly reduces staleness time with a slight increase in response latency.
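The additivity-based incremental propagation idea can be pictured for a sum aggregator: when a single neighbor's representation changes, the aggregated message is patched rather than recomputed over all neighbors. The cache layout below is an assumption for illustration, not STAG's actual data structure.

```python
import torch

class IncrementalSumAggregator:
    """Cache one aggregated neighbor sum per node and patch it in place."""
    def __init__(self, num_nodes, dim):
        self.agg = torch.zeros(num_nodes, dim)

    def add_edge(self, dst, h_src):
        # A newly inserted edge contributes its source feature once.
        self.agg[dst] += h_src

    def refresh_neighbor(self, dst, h_src_old, h_src_new):
        # Additivity: subtract the stale contribution and add the fresh one,
        # instead of re-aggregating every neighbor of dst.
        self.agg[dst] += h_src_new - h_src_old
```

Because summation is additive, each patch costs O(1) per affected edge instead of O(degree), which keeps the update phase cheap even for nodes with very large neighborhoods.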

A Survey of Chain of Thought Reasoning: Advances, Frontiers and Future

  • paper_url: http://arxiv.org/abs/2309.15402
  • repo_url: https://github.com/zchuz/cot-reasoning-survey
  • paper_authors: Zheng Chu, Jingchang Chen, Qianglong Chen, Weijiang Yu, Tao He, Haotian Wang, Weihua Peng, Ming Liu, Bing Qin, Ting Liu
  • for: provides a thorough survey of the chain-of-thought reasoning research field to help researchers keep up with its latest advances.
  • methods: systematically organizes current work according to a taxonomy of methods, covering XoT construction, XoT structure variants, and enhanced XoT.
  • results: summarizes frontier applications of XoT, including planning, tool use, and distillation, and discusses open challenges and future directions such as faithfulness, multi-modality, and theory.
    Abstract Chain-of-thought reasoning, a cognitive process fundamental to human intelligence, has garnered significant attention in the realm of artificial intelligence and natural language processing. However, there still remains a lack of a comprehensive survey for this arena. To this end, we take the first step and present a thorough survey of this research field carefully and widely. We use X-of-Thought to refer to Chain-of-Thought in a broad sense. In detail, we systematically organize the current research according to the taxonomies of methods, including XoT construction, XoT structure variants, and enhanced XoT. Additionally, we describe XoT with frontier applications, covering planning, tool use, and distillation. Furthermore, we address challenges and discuss some future directions, including faithfulness, multi-modal, and theory. We hope this survey serves as a valuable resource for researchers seeking to innovate within the domain of chain-of-thought reasoning.

Neural Stochastic Differential Equations for Robust and Explainable Analysis of Electromagnetic Unintended Radiated Emissions

  • paper_url: http://arxiv.org/abs/2309.15386
  • repo_url: None
  • paper_authors: Sumit Kumar Jha, Susmit Jha, Rickard Ewetz, Alvaro Velasquez
  • for: evaluating the robustness and explainability of ResNet-like models for unintended radiated emission (URE) classification and proposing a new approach based on neural stochastic differential equations (SDEs) to address their limitations.
  • methods: extensively evaluates ResNet-like models on URE classification, showing that their performance deteriorates sharply under Gaussian noise perturbations (the F1-score drops from 0.93 to 0.008 at a noise standard deviation of only 0.5) and that their explanations fail to reflect the time-invariant or periodic structure of the input data.
  • results: the proposed neural SDE models maintain an F1-score of 0.93 under the same Gaussian noise, recover the time-invariant or periodic horizontal bands in the input data, and provide more robust and interpretable predictions for real-world URE applications.
    Abstract We present a comprehensive evaluation of the robustness and explainability of ResNet-like models in the context of Unintended Radiated Emission (URE) classification and suggest a new approach leveraging Neural Stochastic Differential Equations (SDEs) to address identified limitations. We provide an empirical demonstration of the fragility of ResNet-like models to Gaussian noise perturbations, where the model performance deteriorates sharply and its F1-score drops to near insignificance at 0.008 with a Gaussian noise of only 0.5 standard deviation. We also highlight a concerning discrepancy where the explanations provided by ResNet-like models do not reflect the inherent periodicity in the input data, a crucial attribute in URE detection from stable devices. In response to these findings, we propose a novel application of Neural SDEs to build models for URE classification that are not only robust to noise but also provide more meaningful and intuitive explanations. Neural SDE models maintain a high F1-score of 0.93 even when exposed to Gaussian noise with a standard deviation of 0.5, demonstrating superior resilience to ResNet models. Neural SDE models successfully recover the time-invariant or periodic horizontal bands from the input data, a feature that was conspicuously missing in the explanations generated by ResNet-like models. This advancement presents a small but significant step in the development of robust and interpretable models for real-world URE applications where data is inherently noisy and assurance arguments demand interpretable machine learning predictions.
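A minimal sketch of a neural SDE layer integrated with Euler-Maruyama in plain PyTorch; the network shapes, step count, step size, and diffusion parameterization are illustrative assumptions and not the paper's architecture.

```python
import torch
import torch.nn as nn

class NeuralSDEBlock(nn.Module):
    """dz = f(z) dt + g(z) dW, rolled out with Euler-Maruyama for a fixed horizon."""
    def __init__(self, dim, steps=20, dt=0.05):
        super().__init__()
        self.drift = nn.Sequential(nn.Linear(dim, dim), nn.Tanh(), nn.Linear(dim, dim))
        self.diffusion = nn.Sequential(nn.Linear(dim, dim), nn.Softplus())
        self.steps, self.dt = steps, dt

    def forward(self, z):                           # z: (batch, dim) features
        for _ in range(self.steps):
            dw = torch.randn_like(z) * self.dt ** 0.5
            z = z + self.drift(z) * self.dt + self.diffusion(z) * dw
        return z
```

Training under injected noise along the integration path is one intuition for why such models can degrade more gracefully under input perturbations than deterministic ResNet blocks.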

Seeing Beyond the Patch: Scale-Adaptive Semantic Segmentation of High-resolution Remote Sensing Imagery based on Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2309.15372
  • repo_url: None
  • paper_authors: Yinhe Liu, Sunan Shi, Junjue Wang, Yanfei Zhong
  • for: proposing a dynamic scale perception framework that lets remote sensing image analysis capture context information beyond the sliding window.
  • methods: GeoAgent represents each image patch's state with a global thumbnail and a location mask, and a Scale Control Agent (SCA) adaptively selects the appropriate scale context for different geo-objects; a feature indexing module strengthens the agent's awareness of the current patch's location, and a dual-branch segmentation network extracts and fuses multi-scale patch features.
  • results: experiments on two public datasets and the newly constructed WUSU dataset show that GeoAgent outperforms previous segmentation methods, particularly for large-scale mapping applications.
    Abstract In remote sensing imagery analysis, patch-based methods have limitations in capturing information beyond the sliding window. This shortcoming poses a significant challenge in processing complex and variable geo-objects, which results in semantic inconsistency in segmentation results. To address this challenge, we propose a dynamic scale perception framework, named GeoAgent, which adaptively captures appropriate scale context information outside the image patch based on the different geo-objects. In GeoAgent, each image patch's states are represented by a global thumbnail and a location mask. The global thumbnail provides context beyond the patch, and the location mask guides the perceived spatial relationships. The scale-selection actions are performed through a Scale Control Agent (SCA). A feature indexing module is proposed to enhance the ability of the agent to distinguish the current image patch's location. The action switches the patch scale and context branch of a dual-branch segmentation network that extracts and fuses the features of multi-scale patches. The GeoAgent adjusts the network parameters to perform the appropriate scale-selection action based on the reward received for the selected scale. The experimental results, using two publicly available datasets and our newly constructed dataset WUSU, demonstrate that GeoAgent outperforms previous segmentation methods, particularly for large-scale mapping applications.
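The scale-selection step can be pictured as a small policy over a discrete set of candidate context scales; the candidate scales, the epsilon-greedy exploration, and the flat state vector below are assumptions for illustration, not the paper's implementation.

```python
import random
import torch
import torch.nn as nn

CANDIDATE_SCALES = [1, 2, 4, 8]   # context extents relative to the patch (assumed)

class ScaleControlAgentSketch(nn.Module):
    """Score candidate scales from a state built from the thumbnail and location mask."""
    def __init__(self, state_dim):
        super().__init__()
        self.q = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                               nn.Linear(64, len(CANDIDATE_SCALES)))

    def act(self, state, epsilon=0.1):              # state: (state_dim,) flat vector
        if random.random() < epsilon:               # exploration during training
            return random.randrange(len(CANDIDATE_SCALES))
        with torch.no_grad():
            return int(self.q(state).argmax())

# The chosen index would switch the context branch of the dual-branch segmentation
# network for the current patch, with the reward derived from segmentation quality.
```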

ACWA: An AI-driven Cyber-Physical Testbed for Intelligent Water Systems

  • paper_url: http://arxiv.org/abs/2310.17654
  • repo_url: https://github.com/ai-vtrc/acwa-data
  • paper_authors: Feras A. Batarseh, Ajay Kulkarni, Chhayly Sreng, Justice Lin, Siam Maksud
  • for: presenting a novel cyber-physical water testbed, the AI and Cyber for Water and Agriculture testbed (ACWA), to address challenges in water supply management.
  • methods: ACWA combines state-of-the-art AI and data-driven technologies with multiple topologies, sensors, computational nodes, pumps, tanks, and smart water devices, together with databases and AI models that control the system.
  • results: experiments show that the testbed helps address challenges in the water and agricultural domains, including cyberbiosecurity, resource management, access to water, sustainability, and data-driven decision-making.
    Abstract This manuscript presents a novel state-of-the-art cyber-physical water testbed, namely: The AI and Cyber for Water and Agriculture testbed (ACWA). ACWA is motivated by the need to advance water supply management using AI and Cybersecurity experimentation. The main goal of ACWA is to address pressing challenges in the water and agricultural domains by utilising cutting-edge AI and data-driven technologies. These challenges include Cyberbiosecurity, resources management, access to water, sustainability, and data-driven decision-making, among others. To address such issues, ACWA consists of multiple topologies, sensors, computational nodes, pumps, tanks, smart water devices, as well as databases and AI models that control the system. Moreover, we present ACWA simulator, which is a software-based water digital twin. The simulator runs on fluid and constituent transport principles that produce theoretical time series of a water distribution system. This creates a good validation point for comparing the theoretical approach with real-life results via the physical ACWA testbed. ACWA data are available to AI and water domain researchers and are hosted in an online public repository. In this paper, the system is introduced in detail and compared with existing water testbeds; additionally, example use-cases are described along with novel outcomes such as datasets, software, and AI-related scenarios.

C3Net: interatomic potential neural network for prediction of physicochemical properties in heterogenous systems

  • paper_url: http://arxiv.org/abs/2309.15334
  • repo_url: https://github.com/sehanlee/c3net
  • paper_authors: Sehan Lee, Jaechang Lim, Woo Youn Kim
  • for: proposing a deep neural network that embeds atom types in their molecular context to predict physicochemical properties in heterogeneous systems.
  • methods: the architecture couples atom-type embeddings with an interatomic potential that follows fundamental physical laws, and a single set of network weights is applied to tasks such as solvation in diverse solvents, 1-octanol-water partitioning, and PAMPA.
  • results: generalizes well to physicochemical property prediction, outperforming state-of-the-art quantum-mechanics- and neural-network-based approaches on solvation free energy, while the per-atom interatomic potentials enable quantitative analysis at atomic resolution.
    Abstract Understanding the interactions of a solute with its environment is of fundamental importance in chemistry and biology. In this work, we propose a deep neural network architecture for atom type embeddings in its molecular context and interatomic potential that follows fundamental physical laws. The architecture is applied to predict physicochemical properties in heterogeneous systems including solvation in diverse solvents, 1-octanol-water partitioning, and PAMPA with a single set of network weights. We show that our architecture is generalized well to the physicochemical properties and outperforms state-of-the-art approaches based on quantum mechanics and neural networks in the task of solvation free energy prediction. The interatomic potentials at each atom in a solute obtained from the model allow quantitative analysis of the physicochemical properties at atomic resolution consistent with chemical and physical reasoning. The software is available at https://github.com/SehanLee/C3Net.
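The general pattern of predicting a molecular property from atom-type embeddings plus pairwise interatomic terms can be sketched as follows; the embedding size, the distance featurization, and the naive double loop over atom pairs are illustrative assumptions rather than C3Net's actual architecture.

```python
import torch
import torch.nn as nn

class PairwisePotentialSketch(nn.Module):
    """Embed atom types and sum a learned pairwise term over all atom pairs."""
    def __init__(self, num_atom_types=100, emb_dim=32):
        super().__init__()
        self.embed = nn.Embedding(num_atom_types, emb_dim)
        self.pair_mlp = nn.Sequential(nn.Linear(2 * emb_dim + 1, 64), nn.SiLU(),
                                      nn.Linear(64, 1))

    def forward(self, atom_types, coords):          # atom_types: (N,), coords: (N, 3)
        e = self.embed(atom_types)                  # (N, emb_dim)
        total = coords.new_zeros(())
        for i in range(len(e)):
            for j in range(i + 1, len(e)):
                dist = torch.norm(coords[i] - coords[j]).unsqueeze(0)
                pair = torch.cat([e[i], e[j], dist])
                total = total + self.pair_mlp(pair).squeeze()
        return total   # molecule-level property as a sum of per-pair contributions
```

Summing learned pairwise contributions is what makes per-atom (and per-pair) analysis of a predicted property possible at atomic resolution.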