cs.AI - 2023-10-12

Examining the Potential and Pitfalls of ChatGPT in Science and Engineering Problem-Solving

  • paper_url: http://arxiv.org/abs/2310.08773
  • repo_url: None
  • paper_authors: Karen D. Wang, Eric Burkholder, Carl Wieman, Shima Salehi, Nick Haber
  • for: To examine the capabilities of OpenAI's ChatGPT in solving different types of physics problems.
  • methods: ChatGPT (with GPT-4) was queried to solve a total of 40 problems from a college-level engineering physics course, ranging from well-specified problems, where all required data were provided, to under-specified, real-world problems with missing data.
  • results: ChatGPT successfully solved 62.5% of the well-specified problems, but its accuracy dropped to 8.3% on the under-specified problems. Analysis of the incorrect solutions revealed three failure modes: 1) failure to construct accurate models of the physical world, 2) failure to make reasonable assumptions about missing data, and 3) calculation errors.
    Abstract The study explores the capabilities of OpenAI's ChatGPT in solving different types of physics problems. ChatGPT (with GPT-4) was queried to solve a total of 40 problems from a college-level engineering physics course. These problems ranged from well-specified problems, where all data required for solving the problem was provided, to under-specified, real-world problems where not all necessary data were given. Our findings show that ChatGPT could successfully solve 62.5% of the well-specified problems, but its accuracy drops to 8.3% for under-specified problems. Analysis of the model's incorrect solutions revealed three distinct failure modes: 1) failure to construct accurate models of the physical world, 2) failure to make reasonable assumptions about missing data, and 3) calculation errors. The study offers implications for how to leverage LLM-augmented instructional materials to enhance STEM education. The insights also contribute to the broader discourse on AI's strengths and limitations, serving both educators aiming to leverage the technology and researchers investigating human-AI collaboration frameworks for problem-solving and decision-making.
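
A minimal sketch of this kind of evaluation loop, using the official OpenAI Python client (openai >= 1.0); the example problems and the grading note are our assumptions, since the paper's exact prompts and rubric are not reproduced here:

```python
# Sketch only: pose each physics problem to GPT-4 and store the solution
# for grading against the course rubric.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

problems = [
    # well-specified: all required data are given (invented example)
    "A 2 kg block slides down a frictionless 30-degree incline. "
    "Find its acceleration.",
    # under-specified: the model must assume missing data (invented example)
    "Estimate the braking force needed to stop a car before an obstacle.",
]

solutions = []
for problem in problems:
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": problem}],
    )
    solutions.append(response.choices[0].message.content)

# Accuracy is then the fraction of solutions judged correct, tallied
# separately for well-specified and under-specified problems.
```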

Stabilizing Subject Transfer in EEG Classification with Divergence Estimation

  • paper_url: http://arxiv.org/abs/2310.08762
  • repo_url: None
  • paper_authors: Niklas Smedemark-Margulies, Ye Wang, Toshiaki Koike-Akino, Jing Liu, Kieran Parsons, Yunus Bicer, Deniz Erdogmus
  • for: To improve the performance of classification models for electroencephalogram (EEG) data on unseen test subjects.
  • methods: The authors use new regularization techniques during training to reduce the performance drop on unseen subjects. They propose several graphical models of the EEG classification task, identify statistical relationships from each model that should hold in an idealized training scenario, and design regularization penalties that enforce these relationships in practice.
  • results: The proposed methods significantly increase balanced accuracy on test subjects and decrease overfitting. They exhibit a larger benefit over a greater range of hyperparameters than the baseline, with only a small computational cost at training time.
    Abstract Classification models for electroencephalogram (EEG) data show a large decrease in performance when evaluated on unseen test subjects. We reduce this performance decrease using new regularization techniques during model training. We propose several graphical models to describe an EEG classification task. From each model, we identify statistical relationships that should hold true in an idealized training scenario (with infinite data and a globally-optimal model) but that may not hold in practice. We design regularization penalties to enforce these relationships in two stages. First, we identify suitable proxy quantities (divergences such as Mutual Information and Wasserstein-1) that can be used to measure statistical independence and dependence relationships. Second, we provide algorithms to efficiently estimate these quantities during training using secondary neural network models. We conduct extensive computational experiments using a large benchmark EEG dataset, comparing our proposed techniques with a baseline method that uses an adversarial classifier. We find our proposed methods significantly increase balanced accuracy on test subjects and decrease overfitting. The proposed methods exhibit a larger benefit over a greater range of hyperparameters than the baseline method, with only a small computational cost at training time. These benefits are largest when used for a fixed training period, though there is still a significant benefit for a subset of hyperparameters when our techniques are used in conjunction with early stopping regularization.
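
The paper estimates divergences such as Mutual Information and Wasserstein-1 with secondary neural network models. As a rough illustration of one such estimator (our sketch, not the authors' code; the feature dimension, weight-clipping scheme, and penalty weight are placeholders), a Wasserstein-1 penalty between two subjects' feature distributions can be computed with a small critic network via the Kantorovich-Rubinstein dual:

```python
import torch
import torch.nn as nn

critic = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 1))

def w1_estimate(feats_a, feats_b):
    # Kantorovich-Rubinstein dual: E[f(a)] - E[f(b)] for a 1-Lipschitz critic f
    return critic(feats_a).mean() - critic(feats_b).mean()

def clip_critic(max_val=0.01):
    # crude way to keep the critic (approximately) 1-Lipschitz
    with torch.no_grad():
        for p in critic.parameters():
            p.clamp_(-max_val, max_val)

# Training-loop fragment: the critic is trained to maximize the estimate,
# while the feature encoder is trained to minimize it, encouraging
# subject-invariant features.
feats_a, feats_b = torch.randn(16, 64), torch.randn(16, 64)  # toy subject features
penalty = w1_estimate(feats_a, feats_b)
clip_critic()  # would be called after each critic update step
classification_loss = torch.tensor(0.7)  # placeholder for the usual CE loss
loss = classification_loss + 0.1 * penalty  # 0.1 is an illustrative weight
```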

CompA: Addressing the Gap in Compositional Reasoning in Audio-Language Models

  • paper_url: http://arxiv.org/abs/2310.08753
  • repo_url: None
  • paper_authors: Sreyan Ghosh, Ashish Seth, Sonal Kumar, Utkarsh Tyagi, Chandra Kiran Evuru, S. Ramaneswaran, S. Sakshi, Oriol Nieto, Ramani Duraiswami, Dinesh Manocha
  • for: To explore the ability of audio-language models (ALMs) to perform compositional reasoning, and to propose a new benchmark (CompA) for evaluating this ability.
  • methods: The paper studies ALMs trained with a contrastive approach (e.g., CLAP) and proposes a novel learning method to improve their compositional reasoning: improvements to contrastive training with composition-aware hard negatives, and a novel modular contrastive loss.
  • results: Current ALMs perform only marginally better than random chance on the CompA benchmark. The proposed model, CompA-CLAP, significantly improves over all baseline models on the benchmark, indicating superior compositional reasoning capabilities.
    Abstract A fundamental characteristic of audio is its compositional nature. Audio-language models (ALMs) trained using a contrastive approach (e.g., CLAP) that learns a shared representation between audio and language modalities have improved performance in many downstream applications, including zero-shot audio classification, audio retrieval, etc. However, the ability of these models to effectively perform compositional reasoning remains largely unexplored and necessitates additional research. In this paper, we propose CompA, a collection of two expert-annotated benchmarks with a majority of real-world audio samples, to evaluate compositional reasoning in ALMs. Our proposed CompA-order evaluates how well an ALM understands the order or occurrence of acoustic events in audio, and CompA-attribute evaluates attribute binding of acoustic events. An instance from either benchmark consists of two audio-caption pairs, where both audios have the same acoustic events but with different compositions. An ALM is evaluated on how well it matches the right audio to the right caption. Using this benchmark, we first show that current ALMs perform only marginally better than random chance, thereby struggling with compositional reasoning. Next, we propose CompA-CLAP, where we fine-tune CLAP using a novel learning method to improve its compositional reasoning abilities. To train CompA-CLAP, we first propose improvements to contrastive training with composition-aware hard negatives, allowing for more focused training. Next, we propose a novel modular contrastive loss that helps the model learn fine-grained compositional understanding and overcomes the acute scarcity of openly available compositional audios. CompA-CLAP significantly improves over all our baseline models on the CompA benchmark, indicating its superior compositional reasoning capabilities.
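
A rough sketch of contrastive training with composition-aware hard negatives (our illustration, not the CompA-CLAP implementation; embedding sizes and the temperature are placeholders): each audio is scored against its matching caption, the other in-batch captions, and recomposed variants of its own caption that act as hard negatives.

```python
import torch
import torch.nn.functional as F

def contrastive_loss_with_hard_negatives(audio_emb, caption_emb, hard_neg_emb,
                                         temperature=0.07):
    # audio_emb: (B, d), caption_emb: (B, d), hard_neg_emb: (B, K, d)
    a = F.normalize(audio_emb, dim=-1)
    c = F.normalize(caption_emb, dim=-1)
    n = F.normalize(hard_neg_emb, dim=-1)
    in_batch = a @ c.t()                     # (B, B); positives on the diagonal
    hard = torch.einsum("bd,bkd->bk", a, n)  # (B, K) composition-aware negatives
    logits = torch.cat([in_batch, hard], dim=1) / temperature
    targets = torch.arange(a.size(0))        # index of each positive caption
    return F.cross_entropy(logits, targets)

loss = contrastive_loss_with_hard_negatives(
    torch.randn(8, 512), torch.randn(8, 512), torch.randn(8, 4, 512))
```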

Constrained Bayesian Optimization with Adaptive Active Learning of Unknown Constraints

  • paper_url: http://arxiv.org/abs/2310.08751
  • repo_url: None
  • paper_authors: Fengxue Zhang, Zejie Zhu, Yuxin Chen
  • for: This paper studies constrained Bayesian optimization (CBO) for handling complex application scenarios with black-box objective and constraint functions.
  • methods: The paper proposes a CBO framework based on the idea that both the objective and constraints can help identify high-confidence regions of interest (ROI).
  • results: The paper provides a theoretically grounded CBO framework and demonstrates its efficiency and robustness through empirical evidence.
    Abstract Optimizing objectives under constraints, where both the objectives and constraints are black box functions, is a common scenario in real-world applications such as scientific experimental design, design of medical therapies, and industrial process optimization. One popular approach to handling these complex scenarios is Bayesian Optimization (BO). In terms of theoretical behavior, BO is relatively well understood in the unconstrained setting, where its principles have been well explored and validated. However, when it comes to constrained Bayesian optimization (CBO), the existing framework often relies on heuristics or approximations without the same level of theoretical guarantees. In this paper, we delve into the theoretical and practical aspects of constrained Bayesian optimization, where the objective and constraints can be independently evaluated and are subject to noise. By recognizing that both the objective and constraints can help identify high-confidence regions of interest (ROI), we propose an efficient CBO framework that intersects the ROIs identified from each aspect to determine the general ROI. The ROI, coupled with a novel acquisition function that adaptively balances the optimization of the objective and the identification of feasible regions, enables us to derive rigorous theoretical justifications for its performance. We showcase the efficiency and robustness of our proposed CBO framework through empirical evidence and discuss the fundamental challenge of deriving practical regret bounds for CBO algorithms.
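
A minimal sketch of the ROI-intersection idea (our illustration, not the paper's algorithm; the kernel, candidate grid, and confidence width beta are illustrative): fit independent Gaussian processes to the noisy objective and constraint, keep candidates that are possibly feasible under the constraint GP, keep candidates whose optimistic objective value beats the best pessimistic feasible value, and intersect the two sets.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, (20, 1))                        # evaluated points
f = np.sin(6 * X[:, 0]) + 0.1 * rng.normal(size=20)   # noisy objective
g = X[:, 0] - 0.6 + 0.1 * rng.normal(size=20)         # noisy constraint g(x) <= 0

gp_f = GaussianProcessRegressor().fit(X, f)
gp_g = GaussianProcessRegressor().fit(X, g)

cand = np.linspace(0, 1, 200).reshape(-1, 1)
mu_f, sd_f = gp_f.predict(cand, return_std=True)
mu_g, sd_g = gp_g.predict(cand, return_std=True)
beta = 2.0  # confidence width

possibly_feasible = (mu_g - beta * sd_g) <= 0          # constraint ROI
lcb_best = np.max((mu_f - beta * sd_f)[possibly_feasible])
promising = (mu_f + beta * sd_f) >= lcb_best           # objective ROI
roi = possibly_feasible & promising                    # intersected general ROI
print(f"{roi.sum()} of {len(cand)} candidates remain in the ROI")
```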

Development and Validation of a Deep Learning-Based Microsatellite Instability Predictor from Prostate Cancer Whole-Slide Images

  • paper_url: http://arxiv.org/abs/2310.08743
  • repo_url: None
  • paper_authors: Qiyuan Hu, Abbas A. Rizvi, Geoffery Schau, Kshitij Ingale, Yoni Muller, Rachel Baits, Sebastian Pretzer, Aïcha BenTaieb, Abigail Gordhamer, Roberto Nussenzveig, Adam Cole, Matthew O. Leavitt, Rohan P. Joshi, Nike Beaubier, Martin C. Stumpe, Kunal Nagpal
  • for: To develop an AI-based microsatellite instability (MSI) predictor from hematoxylin and eosin (H&E) stained whole-slide images, in order to identify prostate cancer patients most likely to benefit from confirmatory testing and become eligible for immune checkpoint inhibitor therapy.
  • methods: Attention-based multiple instance learning models were trained to predict MSI-H from H&E whole-slide images. The development set comprised 4015 prostate cancer patients; a paired validation set of 173 patients used two serial sections per sample, one stained and scanned internally and the other at an external site; and a temporal validation set comprised 1350 patients.
  • results: The MSI-H predictor achieved areas under the receiver operating characteristic curve of 0.78, 0.72, and 0.72 on the internally prepared, externally prepared, and temporal validation sets, respectively. Although MSI-H status correlates with Gleason score, the model remained predictive within each Gleason score subgroup.
    Abstract Microsatellite instability-high (MSI-H) is a tumor agnostic biomarker for immune checkpoint inhibitor therapy. However, MSI status is not routinely tested in prostate cancer, in part due to low prevalence and assay cost. As such, prediction of MSI status from hematoxylin and eosin (H&E) stained whole-slide images (WSIs) could identify prostate cancer patients most likely to benefit from confirmatory testing and becoming eligible for immunotherapy. Prostate biopsies and surgical resections from de-identified records of consecutive prostate cancer patients referred to our institution were analyzed. Their MSI status was determined by next generation sequencing. Patients before a cutoff date were split into an algorithm development set (n=4015, MSI-H 1.8%) and a paired validation set (n=173, MSI-H 19.7%) that consisted of two serial sections from each sample, one stained and scanned internally and the other at an external site. Patients after the cutoff date formed the temporal validation set (n=1350, MSI-H 2.3%). Attention-based multiple instance learning models were trained to predict MSI-H from H&E WSIs. The MSI-H predictor achieved area under the receiver operating characteristic curve values of 0.78 (95% CI [0.69-0.86]), 0.72 (95% CI [0.63-0.81]), and 0.72 (95% CI [0.62-0.82]) on the internally prepared, externally prepared, and temporal validation sets, respectively. While MSI-H status is significantly correlated with Gleason score, the model remained predictive within each Gleason score subgroup. In summary, we developed and validated an AI-based MSI-H diagnostic model on a large real-world cohort of routine H&E slides, which effectively generalized to externally stained and scanned samples and a temporally independent validation cohort. This algorithm has the potential to direct prostate cancer patients toward immunotherapy and to identify MSI-H cases secondary to Lynch syndrome.
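
A minimal sketch of attention-based multiple instance learning pooling (our illustration in the spirit of standard attention-MIL, not the authors' model; dimensions are placeholders): tile embeddings from one whole-slide image are pooled with learned attention weights before a linear head predicts MSI-H.

```python
import torch
import torch.nn as nn

class AttentionMIL(nn.Module):
    def __init__(self, dim=512, hidden=128):
        super().__init__()
        self.attn = nn.Sequential(nn.Linear(dim, hidden), nn.Tanh(),
                                  nn.Linear(hidden, 1))
        self.head = nn.Linear(dim, 1)

    def forward(self, tiles):
        # tiles: (num_tiles, dim) embeddings from a frozen feature extractor
        weights = torch.softmax(self.attn(tiles), dim=0)  # (num_tiles, 1)
        slide = (weights * tiles).sum(dim=0)              # attention-pooled slide vector
        return torch.sigmoid(self.head(slide))            # P(MSI-H)

model = AttentionMIL()
prob_msi_h = model(torch.randn(1000, 512))  # one slide with 1000 tiles
```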

Real-Time Event Detection with Random Forests and Temporal Convolutional Networks for More Sustainable Petroleum Industry

  • paper_url: http://arxiv.org/abs/2310.08737
  • repo_url: None
  • paper_authors: Yuanwei Qu, Baifan Zhou, Arild Waaler, David Cameron
  • for: To provide more effective real-time detection of undesired events during petroleum production, helping to avoid environmental and economic damage.
  • methods: Two machine learning approaches, random forests and temporal convolutional networks, are used to detect undesired events in real time.
  • results: The proposed approaches effectively classify event types and predict the probability of their occurrence, addressing challenges uncovered in previous studies and providing a more effective solution for failure event management during production.
    Abstract The petroleum industry is crucial for modern society, but the production process is complex and risky. During the production, accidents or failures, resulting from undesired production events, can cause severe environmental and economic damage. Previous studies have investigated machine learning (ML) methods for undesired event detection. However, the prediction of event probability in real-time was insufficiently addressed, which is essential since it is important to undertake early intervention when an event is expected to happen. This paper proposes two ML approaches, random forests and temporal convolutional networks, to detect undesired events in real-time. Results show that our approaches can effectively classify event types and predict the probability of their appearance, addressing the challenges uncovered in previous studies and providing a more effective solution for failure event management during the production.
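
A minimal sketch of the random forest branch (our illustration, not the authors' pipeline; the features and labels are toy): classify windowed sensor readings into event types and read off per-class probabilities for early intervention.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 8))      # e.g., per-window sensor statistics
y_train = rng.integers(0, 3, size=500)   # 0 = normal, 1..2 = undesired events

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

window = rng.normal(size=(1, 8))            # latest real-time window
event_probs = clf.predict_proba(window)[0]  # probability of each event type
print(dict(zip(clf.classes_, event_probs.round(3))))
```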

A Simple Way to Incorporate Novelty Detection in World Models

  • paper_url: http://arxiv.org/abs/2310.08731
  • repo_url: None
  • paper_authors: Geigh Zollicoffer, Kenneth Eaton, Jonathan Balloch, Julia Kim, Mark O. Riedl, Robert Wright
  • for: To protect the performance and reliability of an RL agent when a sudden change to world mechanics or properties occurs.
  • methods: The misalignment between the world model's hallucinated states and the true observed states is used as an anomaly score for detecting novelties.
  • results: In a novel environment, the approach outperforms traditional machine learning novelty detection methods as well as currently accepted RL-focused novelty detection algorithms.
    Abstract Reinforcement learning (RL) using world models has found significant recent successes. However, when a sudden change to world mechanics or properties occurs then agent performance and reliability can dramatically decline. We refer to the sudden change in visual properties or state transitions as {\em novelties}. Implementing novelty detection within generated world model frameworks is a crucial task for protecting the agent when deployed. In this paper, we propose straightforward bounding approaches to incorporate novelty detection into world model RL agents, by utilizing the misalignment of the world model's hallucinated states and the true observed states as an anomaly score. We first provide an ontology of novelty detection relevant to sequential decision making, then we provide effective approaches to detecting novelties in a distribution of transitions learned by an agent in a world model. Finally, we show the advantage of our work in a novel environment compared to traditional machine learning novelty detection methods as well as currently accepted RL focused novelty detection algorithms.
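
A minimal sketch of the bounding idea (our illustration, not the authors' code; the dynamics network and calibration data are toy): score each step by how far the world model's hallucinated next state diverges from the true observation, and declare a novelty when the score exceeds a threshold calibrated on nominal experience.

```python
import torch
import torch.nn as nn

dynamics = nn.Sequential(nn.Linear(16 + 4, 64), nn.ReLU(), nn.Linear(64, 16))

def anomaly_score(state, action, next_obs):
    predicted = dynamics(torch.cat([state, action], dim=-1))  # hallucinated state
    return torch.norm(predicted - next_obs, dim=-1)           # misalignment score

# Calibrate a threshold as a high quantile of scores in the known environment,
# then declare novelty online whenever the score exceeds it.
nominal_scores = torch.rand(1000)                 # placeholder calibration scores
threshold = torch.quantile(nominal_scores, 0.99)
score = anomaly_score(torch.randn(16), torch.randn(4), torch.randn(16))
is_novel = bool(score > threshold)
```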

Transformer Choice Net: A Transformer Neural Network for Choice Prediction

  • paper_url: http://arxiv.org/abs/2310.08716
  • repo_url: None
  • paper_authors: Hanzhao Wang, Xiaocheng Li, Kalyan Talluri
  • for: To propose a transformer neural network architecture, the Transformer Choice Net, for predicting customers choosing multiple items.
  • methods: A transformer network that takes into account not only customer and item features but also context, such as the assortment and the customer's past choices.
  • results: On a range of benchmark datasets, the architecture shows uniformly superior out-of-sample prediction performance compared with the leading models in the literature, without requiring custom modeling or tuning for each instance.
    Abstract Discrete-choice models, such as Multinomial Logit, Probit, or Mixed-Logit, are widely used in Marketing, Economics, and Operations Research: given a set of alternatives, the customer is modeled as choosing one of the alternatives to maximize a (latent) utility function. However, extending such models to situations where the customer chooses more than one item (such as in e-commerce shopping) has proven problematic. While one can construct reasonable models of the customer's behavior, estimating such models becomes very challenging because of the combinatorial explosion in the number of possible subsets of items. In this paper we develop a transformer neural network architecture, the Transformer Choice Net, that is suitable for predicting multiple choices. Transformer networks turn out to be especially suitable for this task as they take into account not only the features of the customer and the items but also the context, which in this case could be the assortment as well as the customer's past choices. On a range of benchmark datasets, our architecture shows uniformly superior out-of-sample prediction performance compared to the leading models in the literature, without requiring any custom modeling or tuning for each instance.
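
A minimal sketch of the central idea (our illustration, not the paper's architecture; the independent per-item sigmoid output is a simplification): encode the offered assortment with a transformer so each item's score depends on the other items (and any context tokens, e.g., customer features or past choices), then read off per-item choice probabilities.

```python
import torch
import torch.nn as nn

class ChoiceScorer(nn.Module):
    def __init__(self, item_dim=16, d_model=64):
        super().__init__()
        self.embed = nn.Linear(item_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.score = nn.Linear(d_model, 1)

    def forward(self, items):
        # items: (batch, assortment_size, item_dim); customer features and
        # past choices could be prepended as extra context tokens.
        h = self.encoder(self.embed(items))
        return torch.sigmoid(self.score(h)).squeeze(-1)  # per-item choice prob.

probs = ChoiceScorer()(torch.randn(2, 10, 16))  # two assortments of 10 items
```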

Toward Joint Language Modeling for Speech Units and Text

  • paper_url: http://arxiv.org/abs/2310.08715
  • repo_url: None
  • paper_authors: Ju-Chieh Chou, Chung-Ming Chien, Wei-Ning Hsu, Karen Livescu, Arun Babu, Alexis Conneau, Alexei Baevski, Michael Auli
  • for: To model speech and text jointly within a single language model.
  • methods: Different speech tokenizers are compared for transforming continuous speech signals into discrete units, and different methods are used to construct mixed speech-text data. Automatic metrics are introduced to evaluate how well the joint LM mixes speech and text.
  • results: With the proposed mixing techniques, the joint LM improves over a speech-only baseline on spoken language understanding (SLU) tasks and shows zero-shot cross-modal transferability.
    Abstract Speech and text are two major forms of human language. The research community has been focusing on mapping speech to text or vice versa for many years. However, in the field of language modeling, very little effort has been made to model them jointly. In light of this, we explore joint language modeling for speech units and text. Specifically, we compare different speech tokenizers to transform continuous speech signals into discrete units and use different methods to construct mixed speech-text data. We introduce automatic metrics to evaluate how well the joint LM mixes speech and text. We also fine-tune the LM on downstream spoken language understanding (SLU) tasks with different modalities (speech or text) and test its performance to assess the model's learning of shared representations. Our results show that by mixing speech units and text with our proposed mixing techniques, the joint LM improves over a speech-only baseline on SLU tasks and shows zero-shot cross-modal transferability.

ELDEN: Exploration via Local Dependencies

  • paper_url: http://arxiv.org/abs/2310.08702
  • repo_url: None
  • paper_authors: Jiaheng Hu, Zizhao Wang, Peter Stone, Roberto Martin-Martin
  • for: To address tasks with large state spaces and sparse rewards through a new intrinsic reward for exploration.
  • methods: The partial derivative of the learned dynamics is used to model local dependencies between entities accurately and efficiently, and the uncertainty of the predicted dependencies serves as an intrinsic reward encouraging exploration toward new interactions.
  • results: On four domains with complex dependencies, ranging from 2D grid worlds to 3D robotic tasks, ELDEN correctly identifies local dependencies and learns successful policies, significantly outperforming previous state-of-the-art exploration methods.
    Abstract Tasks with large state space and sparse rewards present a longstanding challenge to reinforcement learning. In these tasks, an agent needs to explore the state space efficiently until it finds a reward. To deal with this problem, the community has proposed to augment the reward function with intrinsic reward, a bonus signal that encourages the agent to visit interesting states. In this work, we propose a new way of defining interesting states for environments with factored state spaces and complex chained dependencies, where an agent's actions may change the value of one entity that, in order, may affect the value of another entity. Our insight is that, in these environments, interesting states for exploration are states where the agent is uncertain whether (as opposed to how) entities such as the agent or objects have some influence on each other. We present ELDEN, Exploration via Local DepENdencies, a novel intrinsic reward that encourages the discovery of new interactions between entities. ELDEN utilizes a novel scheme -- the partial derivative of the learned dynamics to model the local dependencies between entities accurately and computationally efficiently. The uncertainty of the predicted dependencies is then used as an intrinsic reward to encourage exploration toward new interactions. We evaluate the performance of ELDEN on four different domains with complex dependencies, ranging from 2D grid worlds to 3D robotic tasks. In all domains, ELDEN correctly identifies local dependencies and learns successful policies, significantly outperforming previous state-of-the-art exploration methods.
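
A minimal sketch of the dependency-detection ingredient (our illustration, not the ELDEN code; network sizes and the threshold are placeholders): take the Jacobian of a learned dynamics model at the current state, so a near-zero partial derivative means one entity does not locally influence another. Disagreement between ensemble members on these derivatives could then supply the uncertainty used as the intrinsic reward.

```python
import torch
import torch.nn as nn

dynamics = nn.Sequential(nn.Linear(8, 32), nn.Tanh(), nn.Linear(32, 8))

state = torch.randn(8)
# jacobian[j, i] = d(next_state_j) / d(state_i) at the current state
jacobian = torch.autograd.functional.jacobian(dynamics, state)  # (8, 8)

# Binary local-dependency matrix: which entities influence which, right here.
dependencies = jacobian.abs() > 1e-3
print(dependencies.int())
```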

Virtual Augmented Reality for Atari Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2310.08683
  • repo_url: https://github.com/c-a-schiller/var4arl
  • paper_authors: Christian A. Schiller
  • for: To investigate whether state-of-the-art image segmentation models can improve the performance of RL agents playing Atari video games.
  • methods: An off-the-shelf image segmentation model (the Segment Anything Model, SAM) is used to augment the game frames the RL agent perceives.
  • results: Augmenting the agent's pixel input improves its game-playing performance under certain conditions; comparing RL agent performance on raw versus augmented pixel inputs provides insight into these conditions.
    Abstract Reinforcement Learning (RL) has achieved significant milestones in the gaming domain, most notably Google DeepMind's AlphaGo defeating human Go champion Ken Jie. This victory was also made possible through the Atari Learning Environment (ALE): The ALE has been foundational in RL research, facilitating significant RL algorithm developments such as AlphaGo and others. In current Atari video game RL research, RL agents' perceptions of its environment is based on raw pixel data from the Atari video game screen with minimal image preprocessing. Contrarily, cutting-edge ML research, external to the Atari video game RL research domain, is focusing on enhancing image perception. A notable example is Meta Research's "Segment Anything Model" (SAM), a foundation model capable of segmenting images without prior training (zero-shot). This paper addresses a novel methodical question: Can state-of-the-art image segmentation models such as SAM improve the performance of RL agents playing Atari video games? The results suggest that SAM can serve as a "virtual augmented reality" for the RL agent, boosting its Atari video game playing performance under certain conditions. Comparing RL agent performance results from raw and augmented pixel inputs provides insight into these conditions. Although this paper was limited by computational constraints, the findings show improved RL agent performance for augmented pixel inputs and can inform broader research agendas in the domain of "virtual augmented reality for video game playing RL agents".
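
A minimal sketch of the augmentation step (our illustration; the package layout and checkpoint name follow Meta's open-source segment-anything repository, and the overlay scheme is our assumption):

```python
import numpy as np
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")
mask_generator = SamAutomaticMaskGenerator(sam)

frame = np.zeros((210, 160, 3), dtype=np.uint8)  # raw Atari RGB frame
masks = mask_generator.generate(frame)           # zero-shot segmentation

# One possible augmentation: paint the segments onto the frame so the agent
# perceives object outlines ("virtual augmented reality" for the RL agent).
augmented = frame.copy()
for m in masks:
    augmented[m["segmentation"]] = 255  # m["segmentation"] is a boolean mask
```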

Can GPT models be Financial Analysts? An Evaluation of ChatGPT and GPT-4 on mock CFA Exams

  • paper_url: http://arxiv.org/abs/2310.08678
  • repo_url: None
  • paper_authors: Ethan Callanan, Amarachi Mbakwe, Antony Papadimitriou, Yulong Pei, Mathieu Sibue, Xiaodan Zhu, Zhiqiang Ma, Xiaomo Liu, Sameena Shah
  • for: This study aims to assess the financial reasoning capabilities of Large Language Models (LLMs) using mock exam questions from the Chartered Financial Analyst (CFA) Program.
  • methods: ChatGPT and GPT-4 are evaluated on financial analysis, considering Zero-Shot (ZS), Chain-of-Thought (CoT), and Few-Shot (FS) scenarios.
  • results: The study presents an in-depth analysis of the models' performance and limitations, estimates whether they would have a chance at passing the CFA exams, and outlines insights into potential strategies and improvements to enhance the applicability of LLMs in finance.
    Abstract Large Language Models (LLMs) have demonstrated remarkable performance on a wide range of Natural Language Processing (NLP) tasks, often matching or even beating state-of-the-art task-specific models. This study aims at assessing the financial reasoning capabilities of LLMs. We leverage mock exam questions of the Chartered Financial Analyst (CFA) Program to conduct a comprehensive evaluation of ChatGPT and GPT-4 in financial analysis, considering Zero-Shot (ZS), Chain-of-Thought (CoT), and Few-Shot (FS) scenarios. We present an in-depth analysis of the models' performance and limitations, and estimate whether they would have a chance at passing the CFA exams. Finally, we outline insights into potential strategies and improvements to enhance the applicability of LLMs in finance. In this perspective, we hope this work paves the way for future studies to continue enhancing LLMs for financial reasoning through rigorous evaluation.

GDL-DS: A Benchmark for Geometric Deep Learning under Distribution Shifts

  • paper_url: http://arxiv.org/abs/2310.08677
  • repo_url: https://github.com/graph-com/gdl_ds
  • paper_authors: Deyu Zou, Shikun Liu, Siqi Miao, Victor Fung, Shiyu Chang, Pan Li
  • for: To evaluate the performance of geometric deep learning (GDL) models under distribution shifts.
  • methods: A comprehensive benchmark, GDL-DS, is proposed for evaluating GDL models in scenarios with distribution shifts, covering scientific domains from particle physics and materials science to biochemistry, and a broad spectrum of shifts (conditional, covariate, and concept).
  • results: The benchmark yields 30 different experiment settings and evaluates 3 GDL backbones and 11 learning algorithms in each setting, with a thorough analysis of the results.
    Abstract Geometric deep learning (GDL) has gained significant attention in various scientific fields, chiefly for its proficiency in modeling data with intricate geometric structures. Yet, very few works have delved into its capability of tackling the distribution shift problem, a prevalent challenge in many relevant applications. To bridge this gap, we propose GDL-DS, a comprehensive benchmark designed for evaluating the performance of GDL models in scenarios with distribution shifts. Our evaluation datasets cover diverse scientific domains from particle physics and materials science to biochemistry, and encapsulate a broad spectrum of distribution shifts including conditional, covariate, and concept shifts. Furthermore, we study three levels of information access from the out-of-distribution (OOD) testing data, including no OOD information, only OOD features without labels, and OOD features with a few labels. Overall, our benchmark results in 30 different experiment settings, and evaluates 3 GDL backbones and 11 learning algorithms in each setting. A thorough analysis of the evaluation results is provided, poised to illuminate insights for DGL researchers and domain practitioners who are to use DGL in their applications.

Learning RL-Policies for Joint Beamforming Without Exploration: A Batch Constrained Off-Policy Approach

  • paper_url: http://arxiv.org/abs/2310.08660
  • repo_url: https://github.com/heasung-kim/safe-rl-deployment-for-5g
  • paper_authors: Heasung Kim, Sravan Ankireddy
  • for: The paper optimizes network parameters for rate maximization in 5G communication systems.
  • methods: Deep reinforcement learning, specifically discrete batch constrained deep Q-learning (BCQ), is used to solve the non-convex joint optimization problem of power control, beam forming, and interference cancellation.
  • results: The proposed BCQ approach achieves performance similar to deep Q-network (DQN) based control with only a fraction of the data and without the need for exploration, maximizing sample efficiency and minimizing the risk of deploying a new algorithm to commercial networks.
    Abstract In this project, we consider the problem of network parameter optimization for rate maximization. We frame this as a joint optimization problem of power control, beam forming, and interference cancellation. We consider the setting where multiple Base Stations (BSs) are communicating with multiple user equipments (UEs). Because of the exponential computational complexity of brute force search, we instead solve this non-convex optimization problem using deep reinforcement learning (RL) techniques. The modern communication systems are notorious for their difficulty in exactly modeling their behaviour. This limits us in using RL based algorithms as interaction with the environment is needed for the agent to explore and learn efficiently. Further, it is ill advised to deploy the algorithm in real world for exploration and learning because of the high cost of failure. In contrast to the previous RL-based solutions proposed, such as deep-Q network (DQN) based control, we propose taking an offline model based approach. We specifically consider discrete batch constrained deep Q-learning (BCQ) and show that performance similar to DQN can be acheived with only a fraction of the data and without the need for exploration. This results in maximizing sample efficiency and minimizing risk in the deployment of a new algorithm to commercial networks. We provide the entire resource of the project, including code and data, at the following link: https://github.com/Heasung-Kim/ safe-rl-deployment-for-5g.
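
A minimal sketch of the discrete BCQ target (our illustration, not the repository code; network shapes and the threshold tau are placeholders): a behavior-cloning head G estimates the data-collection policy, and Q-maximization is restricted to actions whose relative behavior probability exceeds tau, so the agent never bootstraps through actions unsupported by the offline batch.

```python
import torch
import torch.nn as nn

n_actions, tau, gamma = 8, 0.3, 0.99
q_net = nn.Linear(32, n_actions)  # Q(s, .)
g_net = nn.Linear(32, n_actions)  # behavior-cloning logits G(a|s)

def bcq_target(next_state, reward, done):
    q = q_net(next_state)
    probs = torch.softmax(g_net(next_state), dim=-1)
    # keep only actions well supported by the estimated behavior policy
    allowed = probs / probs.max(dim=-1, keepdim=True).values > tau
    q_masked = torch.where(allowed, q, torch.full_like(q, -1e8))
    return reward + gamma * (1 - done) * q_masked.max(dim=-1).values

target = bcq_target(torch.randn(4, 32), torch.ones(4), torch.zeros(4))
```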

LoftQ: LoRA-Fine-Tuning-Aware Quantization for Large Language Models

  • paper_url: http://arxiv.org/abs/2310.08659
  • repo_url: https://github.com/yxli2123/loftq
  • paper_authors: Yixiao Li, Yifan Yu, Chen Liang, Pengcheng He, Nikos Karampatziakis, Weizhu Chen, Tuo Zhao
  • for: To study the scenario where quantization and LoRA fine-tuning are applied together on a pre-trained model, and to close the performance gap this combination typically shows on downstream tasks.
  • methods: LoftQ (LoRA-Fine-Tuning-aware Quantization) is proposed: a framework that simultaneously quantizes an LLM and finds a proper low-rank initialization for LoRA fine-tuning, alleviating the discrepancy between the quantized and full-precision models.
  • results: Experiments on natural language understanding, question answering, summarization, and natural language generation show the method outperforms existing quantization approaches, especially in the challenging 2-bit and 2/4-bit mixed precision regimes.
    Abstract Quantization is an indispensable technique for serving Large Language Models (LLMs) and has recently found its way into LoRA fine-tuning. In this work we focus on the scenario where quantization and LoRA fine-tuning are applied together on a pre-trained model. In such cases it is common to observe a consistent gap in the performance on downstream tasks between full fine-tuning and quantization plus LoRA fine-tuning approach. In response, we propose LoftQ (LoRA-Fine-Tuning-aware Quantization), a novel quantization framework that simultaneously quantizes an LLM and finds a proper low-rank initialization for LoRA fine-tuning. Such an initialization alleviates the discrepancy between the quantized and full-precision model and significantly improves the generalization in downstream tasks. We evaluate our method on natural language understanding, question answering, summarization, and natural language generation tasks. Experiments show that our method is highly effective and outperforms existing quantization methods, especially in the challenging 2-bit and 2/4-bit mixed precision regimes. We will release our code.
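
A minimal sketch of our reading of the alternating initialization (not the released LoftQ code; uniform_quantize is a stand-in for the actual low-bit quantizer): repeatedly quantize the residual of the current low-rank approximation, then refit the low-rank factors to the new quantization residual by truncated SVD, so that Q + AB approximates the full-precision weight at initialization.

```python
import torch

def uniform_quantize(w, bits=2):
    # toy uniform quantizer; LoftQ would use its actual low-bit scheme here
    levels = 2 ** bits - 1
    scale = (w.max() - w.min()) / levels
    return torch.round((w - w.min()) / scale) * scale + w.min()

def loftq_init(w, rank=16, steps=5):
    a = torch.zeros(w.size(0), rank)
    b = torch.zeros(rank, w.size(1))
    for _ in range(steps):
        q = uniform_quantize(w - a @ b)  # quantize the low-rank residual
        u, s, vh = torch.linalg.svd(w - q, full_matrices=False)
        a = u[:, :rank] * s[:rank]       # refit LoRA init A
        b = vh[:rank]                    # refit LoRA init B
    return q, a, b                       # Q + A @ B approximates W

q, a, b = loftq_init(torch.randn(256, 256))
```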

Analyzing Textual Data for Fatality Classification in Afghanistan’s Armed Conflicts: A BERT Approach

  • paper_url: http://arxiv.org/abs/2310.08653
  • repo_url: None
  • paper_authors: Hikmatullah Mohammadi, Ziaullah Momand, Parwin Habibi, Nazifa Ramaki, Bibi Storay Fazli, Sayed Zobair Rohany, Iqbal Samsoor
  • for: To classify the outcomes of armed conflicts in Afghanistan as either fatal or non-fatal based on textual descriptions provided by the ACLED dataset.
  • methods: The paper uses the BERT model, a cutting-edge language representation model in natural language processing, to classify events from their raw textual descriptions.
  • results: The model achieved an accuracy of 98.8%, recall of 98.05%, precision of 99.6%, and an F1 score of 98.82% on the test set. These results highlight the model's robustness and indicate its potential impact on resource allocation, policymaking, and humanitarian aid efforts in Afghanistan.
    Abstract Afghanistan has witnessed many armed conflicts throughout history, especially in the past 20 years; these events have had a significant impact on human lives, including military and civilians, with potential fatalities. In this research, we aim to leverage state-of-the-art machine learning techniques to classify the outcomes of Afghanistan armed conflicts to either fatal or non-fatal based on their textual descriptions provided by the Armed Conflict Location & Event Data Project (ACLED) dataset. The dataset contains comprehensive descriptions of armed conflicts in Afghanistan that took place from August 2021 to March 2023. The proposed approach leverages the power of BERT (Bidirectional Encoder Representations from Transformers), a cutting-edge language representation model in natural language processing. The classifier utilizes the raw textual description of an event to estimate the likelihood of the event resulting in a fatality. The model achieved impressive performance on the test set with an accuracy of 98.8%, recall of 98.05%, precision of 99.6%, and an F1 score of 98.82%. These results highlight the model's robustness and indicate its potential impact in various areas such as resource allocation, policymaking, and humanitarian aid efforts in Afghanistan. The model indicates a machine learning-based text classification approach using the ACLED dataset to accurately classify fatality in Afghanistan armed conflicts, achieving robust performance with the BERT model and paving the way for future endeavors in predicting event severity in Afghanistan.
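
A minimal sketch of this kind of BERT-based binary classification with Hugging Face transformers (our illustration, not the authors' pipeline; the event description is invented):

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)  # 0 = non-fatal, 1 = fatal

text = "Armed clash reported near a checkpoint; casualties unconfirmed."
inputs = tokenizer(text, truncation=True, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
prob_fatal = torch.softmax(logits, dim=-1)[0, 1].item()
```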

Electrical Grid Anomaly Detection via Tensor Decomposition

  • paper_url: http://arxiv.org/abs/2310.08650
  • repo_url: None
  • paper_authors: Alexander Most, Maksim Eren, Nigel Lawrence, Boian Alexandrov
  • for: To improve the accuracy and specificity of anomaly detection in Supervisory Control and Data Acquisition (SCADA) systems for electrical grids.
  • methods: A non-negative tensor decomposition method, Canonical Polyadic Alternating Poisson Regression (CP-APR), is applied with a probabilistic framework to identify anomalies in SCADA systems.
  • results: Statistical behavior analysis of SCADA communication with tensor decomposition improves the specificity and accuracy of anomaly identification, as demonstrated in experiments on real-world SCADA data collected from the electrical grid operated by Los Alamos National Laboratory (LANL).
    Abstract Supervisory Control and Data Acquisition (SCADA) systems often serve as the nervous system for substations within power grids. These systems facilitate real-time monitoring, data acquisition, control of equipment, and ensure smooth and efficient operation of the substation and its connected devices. Previous work has shown that dimensionality reduction-based approaches, such as Principal Component Analysis (PCA), can be used for accurate identification of anomalies in SCADA systems. While not specifically applied to SCADA, non-negative matrix factorization (NMF) has shown strong results at detecting anomalies in wireless sensor networks. These unsupervised approaches model the normal or expected behavior and detect the unseen types of attacks or anomalies by identifying the events that deviate from the expected behavior. These approaches; however, do not model the complex and multi-dimensional interactions that are naturally present in SCADA systems. Differently, non-negative tensor decomposition is a powerful unsupervised machine learning (ML) method that can model the complex and multi-faceted activity details of SCADA events. In this work, we novelly apply the tensor decomposition method Canonical Polyadic Alternating Poisson Regression (CP-APR) with a probabilistic framework, which has previously shown state-of-the-art anomaly detection results on cyber network data, to identify anomalies in SCADA systems. We showcase that the use of statistical behavior analysis of SCADA communication with tensor decomposition improves the specificity and accuracy of identifying anomalies in electrical grid systems. In our experiments, we model real-world SCADA system data collected from the electrical grid operated by Los Alamos National Laboratory (LANL) which provides transmission and distribution service through a partnership with Los Alamos County, and detect synthetically generated anomalies.
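
A minimal sketch of tensor-based anomaly scoring (our illustration; the paper uses CP-APR with a probabilistic framework, for which we substitute tensorly's non-negative CP as a stand-in): fit a non-negative CP model to a count tensor of SCADA events and score entries by their negative Poisson log-likelihood under the reconstruction.

```python
import numpy as np
import tensorly as tl
from tensorly.decomposition import non_negative_parafac

rng = np.random.default_rng(0)
# toy event tensor, e.g., source x destination x hour-of-day counts
counts = rng.poisson(2.0, size=(10, 10, 24)).astype(float)

cp = non_negative_parafac(tl.tensor(counts), rank=4)
lam = np.clip(tl.to_numpy(tl.cp_to_tensor(cp)), 1e-9, None)  # Poisson rates

# Negative Poisson log-likelihood per cell (constant term dropped):
# high values flag activity the fitted model explains poorly.
scores = lam - counts * np.log(lam)
print(np.unravel_index(scores.argmax(), scores.shape))
```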

A Mass-Conserving-Perceptron for Machine Learning-Based Modeling of Geoscientific Systems

  • paper_url: http://arxiv.org/abs/2310.08644
  • repo_url: None
  • paper_authors: Yuan-Heng Wang, Hoshin V. Gupta
  • for: To develop a physically interpretable, machine-learning-based model for predicting the time-series evolution of geoscientific systems.
  • methods: A physically interpretable Mass Conserving Perceptron (MCP) is proposed, exploiting the isomorphism between the directed graph structures underlying physical-conceptual models and gated recurrent neural networks to represent mass conservation explicitly while learning the functional nature of the processes from data.
  • results: The MCP parsimoniously represents the rainfall-runoff dynamics of the Leaf River Basin and demonstrates utility for scientific hypothesis testing.
    Abstract Although decades of effort have been devoted to building Physical-Conceptual (PC) models for predicting the time-series evolution of geoscientific systems, recent work shows that Machine Learning (ML) based Gated Recurrent Neural Network technology can be used to develop models that are much more accurate. However, the difficulty of extracting physical understanding from ML-based models complicates their utility for enhancing scientific knowledge regarding system structure and function. Here, we propose a physically-interpretable Mass Conserving Perceptron (MCP) as a way to bridge the gap between PC-based and ML-based modeling approaches. The MCP exploits the inherent isomorphism between the directed graph structures underlying both PC models and GRNNs to explicitly represent the mass-conserving nature of physical processes while enabling the functional nature of such processes to be directly learned (in an interpretable manner) from available data using off-the-shelf ML technology. As a proof of concept, we investigate the functional expressivity (capacity) of the MCP, explore its ability to parsimoniously represent the rainfall-runoff (RR) dynamics of the Leaf River Basin, and demonstrate its utility for scientific hypothesis testing. To conclude, we discuss extensions of the concept to enable ML-based physical-conceptual representation of the coupled nature of mass-energy-information flows through geoscientific systems.
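
A minimal sketch of a mass-conserving gated node (our illustration of the concept, not the authors' model): at each step, stored mass plus inflow is split among outflow, loss, and retained storage by gates that sum to one, so mass is conserved by construction while the gating functions are learned from data.

```python
import torch
import torch.nn as nn

class MassConservingNode(nn.Module):
    def __init__(self, n_inputs=2):
        super().__init__()
        self.gate = nn.Linear(n_inputs, 3)  # logits for outflow / loss / retain

    def forward(self, storage, inflow, drivers):
        total = storage + inflow                        # available mass
        shares = torch.softmax(self.gate(drivers), -1)  # sums to 1: conservation
        outflow = shares[..., 0] * total                # e.g., streamflow
        loss = shares[..., 1] * total                   # e.g., evapotranspiration
        new_storage = shares[..., 2] * total
        return outflow, loss, new_storage

node = MassConservingNode()
out, et, s = node(torch.tensor(5.0), torch.tensor(1.2), torch.randn(2))
assert torch.isclose(out + et + s, torch.tensor(6.2))  # mass balance holds
```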

Octopus: Embodied Vision-Language Programmer from Environmental Feedback

  • paper_url: http://arxiv.org/abs/2310.08588
  • repo_url: https://github.com/dongyh20/octopus
  • paper_authors: Jingkang Yang, Yuhao Dong, Shuai Liu, Bo Li, Ziyue Wang, Chencheng Jiang, Haoran Tan, Jiamu Kang, Yuanhan Zhang, Kaiyang Zhou, Ziwei Liu
  • for: To develop a novel vision-language model (VLM) that proficiently deciphers an agent's visual and textual task objectives and formulates intricate action sequences and executable code.
  • methods: GPT-4 controls an explorative agent to generate training data, i.e., action blueprints and the corresponding executable code, within the experimental environment OctoVerse; collected feedback enables the enhanced training scheme of Reinforcement Learning with Environmental Feedback (RLEF).
  • results: A series of experiments illuminates Octopus's functionality with compelling results, and RLEF is shown to refine the agent's decision-making.
    Abstract Large vision-language models (VLMs) have achieved substantial progress in multimodal perception and reasoning. Furthermore, when seamlessly integrated into an embodied agent, it signifies a crucial stride towards the creation of autonomous and context-aware systems capable of formulating plans and executing commands with precision. In this paper, we introduce Octopus, a novel VLM designed to proficiently decipher an agent's vision and textual task objectives and to formulate intricate action sequences and generate executable code. Our design allows the agent to adeptly handle a wide spectrum of tasks, ranging from mundane daily chores in simulators to sophisticated interactions in complex video games. Octopus is trained by leveraging GPT-4 to control an explorative agent to generate training data, i.e., action blueprints and the corresponding executable code, within our experimental environment called OctoVerse. We also collect the feedback that allows the enhanced training scheme of Reinforcement Learning with Environmental Feedback (RLEF). Through a series of experiments, we illuminate Octopus's functionality and present compelling results, and the proposed RLEF turns out to refine the agent's decision-making. By open-sourcing our model architecture, simulator, and dataset, we aspire to ignite further innovation and foster collaborative applications within the broader embodied AI community.

Tree-Planner: Efficient Close-loop Task Planning with Large Language Models

  • paper_url: http://arxiv.org/abs/2310.08582
  • repo_url: None
  • paper_authors: Mengkang Hu, Yao Mu, Xinmiao Yu, Mingyu Ding, Shiguang Wu, Wenqi Shao, Qiguang Chen, Bin Wang, Yu Qiao, Ping Luo
  • for: The paper studies close-loop task planning: generating a sequence of skills (a plan) to accomplish a specific goal while adapting the plan based on real-time observations.
  • methods: Task planning with large language models (LLMs) is reframed into three distinct phases: plan sampling, action tree construction, and grounded deciding based on real-time environmental information.
  • results: By decomposing LLM queries into a single plan-sampling call and multiple grounded-deciding calls, the method achieves state-of-the-art performance while reducing token consumption by 92.2% and error corrections by 40.5% compared with the previously best-performing model.
    Abstract This paper studies close-loop task planning, which refers to the process of generating a sequence of skills (a plan) to accomplish a specific goal while adapting the plan based on real-time observations. Recently, prompting Large Language Models (LLMs) to generate actions iteratively has become a prevalent paradigm due to its superior performance and user-friendliness. However, this paradigm is plagued by two inefficiencies: high token consumption and redundant error correction, both of which hinder its scalability for large-scale testing and applications. To address these issues, we propose Tree-Planner, which reframes task planning with LLMs into three distinct phases: plan sampling, action tree construction, and grounded deciding. Tree-Planner starts by using an LLM to sample a set of potential plans before execution, followed by the aggregation of them to form an action tree. Finally, the LLM performs a top-down decision-making process on the tree, taking into account real-time environmental information. Experiments show that Tree-Planner achieves state-of-the-art performance while maintaining high efficiency. By decomposing LLM queries into a single plan-sampling call and multiple grounded-deciding calls, a considerable part of the prompt are less likely to be repeatedly consumed. As a result, token consumption is reduced by 92.2% compared to the previously best-performing model. Additionally, by enabling backtracking on the action tree as needed, the correction process becomes more flexible, leading to a 40.5% decrease in error corrections. Project page: https://tree-planner.github.io/
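
A minimal sketch of the action-tree phase (our illustration, not the paper's code; the plans are invented): the sampled plans are merged into a prefix tree so common prefixes are represented once, and grounded deciding walks the tree top-down, querying the LLM only at nodes where plans disagree.

```python
class Node:
    def __init__(self, action=None):
        self.action = action
        self.children = {}  # action name -> Node

def build_action_tree(plans):
    root = Node()
    for plan in plans:
        node = root
        for action in plan:
            node = node.children.setdefault(action, Node(action))
    return root

plans = [
    ["walk_to(fridge)", "open(fridge)", "grab(milk)"],
    ["walk_to(fridge)", "open(fridge)", "grab(juice)"],
    ["walk_to(counter)", "grab(cup)"],
]
tree = build_action_tree(plans)
# Decision points are nodes with more than one child; the LLM is queried
# there with real-time observations to pick a branch.
print(list(tree.children))  # ['walk_to(fridge)', 'walk_to(counter)']
```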

Jigsaw: Supporting Designers in Prototyping Multimodal Applications by Assembling AI Foundation Models

  • paper_url: http://arxiv.org/abs/2310.08574
  • repo_url: None
  • paper_authors: David Chuan-En Lin, Nikolas Martelaro
  • for: To help designers make better use of foundation models in the creative process.
  • methods: Jigsaw, a prototype system, employs puzzle pieces as metaphors for foundation models, allowing designers to combine different model capabilities across modalities by assembling compatible pieces.
  • results: In a user study, Jigsaw enhanced designers' understanding of available foundation model capabilities, provided guidance on combining capabilities across different modalities and tasks, and served as a canvas to support design exploration, prototyping, and documentation.
    Abstract Recent advancements in AI foundation models have made it possible for them to be utilized off-the-shelf for creative tasks, including ideating design concepts or generating visual prototypes. However, integrating these models into the creative process can be challenging as they often exist as standalone applications tailored to specific tasks. To address this challenge, we introduce Jigsaw, a prototype system that employs puzzle pieces as metaphors to represent foundation models. Jigsaw allows designers to combine different foundation model capabilities across various modalities by assembling compatible puzzle pieces. To inform the design of Jigsaw, we interviewed ten designers and distilled design goals. In a user study, we showed that Jigsaw enhanced designers' understanding of available foundation model capabilities, provided guidance on combining capabilities across different modalities and tasks, and served as a canvas to support design exploration, prototyping, and documentation.

A Lightweight Calibrated Simulation Enabling Efficient Offline Learning for Optimal Control of Real Buildings

  • paper_url: http://arxiv.org/abs/2310.08569
  • repo_url: None
  • paper_authors: Judah Goldfeder, John Sipple
  • for: To propose a reinforcement-learning-based approach to HVAC setpoint control that reduces energy consumption and carbon emissions.
  • methods: A lightweight, customized simulator, calibrated via telemetry from the building, is used to train the control agent for each building.
  • results: On a two-story, 68,000 square foot building with 127 devices, the simulator was calibrated to just over half a degree of drift from the real world over a six-hour interval, an important step toward real-world RL control that scales to many buildings.
    Abstract Modern commercial Heating, Ventilation, and Air Conditioning (HVAC) devices form a complex and interconnected thermodynamic system with the building and outside weather conditions, and current setpoint control policies are not fully optimized for minimizing energy use and carbon emission. Given a suitable training environment, a Reinforcement Learning (RL) model is able to improve upon these policies, but training such a model, especially in a way that scales to thousands of buildings, presents many real world challenges. We propose a novel simulation-based approach, where a customized simulator is used to train the agent for each building. Our open-source simulator (available online: https://github.com/google/sbsim) is lightweight and calibrated via telemetry from the building to reach a higher level of fidelity. On a two-story, 68,000 square foot building, with 127 devices, we were able to calibrate our simulator to have just over half a degree of drift from the real world over a six-hour interval. This approach is an important step toward having a real-world RL control system that can be scaled to many buildings, allowing for greater efficiency and resulting in reduced energy consumption and carbon emissions.

Transformers as Decision Makers: Provable In-Context Reinforcement Learning via Supervised Pretraining

  • paper_url: http://arxiv.org/abs/2310.08566
  • repo_url: None
  • paper_authors: Licong Lin, Yu Bai, Song Mei
  • for: To theoretically understand when and how large transformers pretrained on offline data can perform in-context reinforcement learning (ICRL).
  • methods: Analyzes two recently proposed supervised pretraining methods: algorithm distillation and decision-pretrained transformers.
  • results: Shows that a supervised-pretrained transformer imitates the conditional expectation of the expert algorithm given the observed trajectory, and that transformers with ReLU attention can efficiently approximate near-optimal online RL algorithms such as LinUCB, Thompson sampling, and UCB-VI.
    Abstract Large transformer models pretrained on offline reinforcement learning datasets have demonstrated remarkable in-context reinforcement learning (ICRL) capabilities, where they can make good decisions when prompted with interaction trajectories from unseen environments. However, when and how transformers can be trained to perform ICRL have not been theoretically well-understood. In particular, it is unclear which reinforcement-learning algorithms transformers can perform in context, and how distribution mismatch in offline training data affects the learned algorithms. This paper provides a theoretical framework that analyzes supervised pretraining for ICRL. This includes two recently proposed training methods -- algorithm distillation and decision-pretrained transformers. First, assuming model realizability, we prove the supervised-pretrained transformer will imitate the conditional expectation of the expert algorithm given the observed trajectory. The generalization error will scale with model capacity and a distribution divergence factor between the expert and offline algorithms. Second, we show transformers with ReLU attention can efficiently approximate near-optimal online reinforcement learning algorithms like LinUCB and Thompson sampling for stochastic linear bandits, and UCB-VI for tabular Markov decision processes. This provides the first quantitative analysis of the ICRL capabilities of transformers pretrained from offline trajectories.
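
LinUCB, one of the near-optimal online algorithms the paper shows ReLU-attention transformers can approximate, is compact enough to state directly. A standard reference implementation for stochastic linear bandits:

```python
import numpy as np

class LinUCB:
    """LinUCB for stochastic linear bandits, where the expected reward of
    an arm with feature vector x is <theta*, x>."""
    def __init__(self, dim, alpha=1.0):
        self.A = np.eye(dim)       # ridge-regularized Gram matrix
        self.b = np.zeros(dim)     # accumulated reward-weighted features
        self.alpha = alpha         # width of the confidence bonus

    def select(self, arm_features):
        A_inv = np.linalg.inv(self.A)
        theta_hat = A_inv @ self.b
        ucb = [x @ theta_hat + self.alpha * np.sqrt(x @ A_inv @ x)
               for x in arm_features]
        return int(np.argmax(ucb))

    def update(self, x, reward):
        self.A += np.outer(x, x)
        self.b += reward * x

rng = np.random.default_rng(1)
theta_star = np.array([0.8, -0.3])
arms = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([0.7, 0.7])]
agent = LinUCB(dim=2, alpha=0.5)
for _ in range(500):
    a = agent.select(arms)
    agent.update(arms[a], arms[a] @ theta_star + rng.normal(0, 0.1))
print(np.linalg.inv(agent.A) @ agent.b)   # ridge estimate of theta_star
```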

Security Considerations in AI-Robotics: A Survey of Current Methods, Challenges, and Opportunities

  • paper_url: http://arxiv.org/abs/2310.08565
  • repo_url: None
  • paper_authors: Subash Neupane, Shaswata Mitra, Ivan A. Fernandez, Swayamjit Saha, Sudip Mittal, Jingdao Chen, Nisha Pillai, Shahram Rahimi
  • for: To survey the security landscape of AI-Robotics systems.
  • methods: Surveys and classifies the field along three dimensions: attack surfaces, ethical and legal concerns, and Human-Robot Interaction (HRI) security.
  • results: Provides a comprehensive survey and taxonomy covering attack surfaces with mitigating defensive strategies, ethical and legal concerns, and HRI security, helping users, developers, and other stakeholders understand these areas and improve overall system security.
    Abstract Robotics and Artificial Intelligence (AI) have been inextricably intertwined since their inception. Today, AI-Robotics systems have become an integral part of our daily lives, from robotic vacuum cleaners to semi-autonomous cars. These systems are built upon three fundamental architectural elements: perception, navigation and planning, and control. However, while the integration of AI-Robotics systems has enhanced the quality of our lives, it has also presented a serious problem - these systems are vulnerable to security attacks. The physical components, algorithms, and data that make up AI-Robotics systems can be exploited by malicious actors, potentially leading to dire consequences. Motivated by the need to address the security concerns in AI-Robotics systems, this paper presents a comprehensive survey and taxonomy across three dimensions: attack surfaces, ethical and legal concerns, and Human-Robot Interaction (HRI) security. Our goal is to provide users, developers and other stakeholders with a holistic understanding of these areas to enhance the overall AI-Robotics system security. We begin by surveying potential attack surfaces and provide mitigating defensive strategies. We then delve into ethical issues, such as dependency and psychological impact, as well as the legal concerns regarding accountability for these systems. In addition, emerging trends such as HRI are discussed, considering privacy, integrity, safety, trustworthiness, and explainability concerns. Finally, we present our vision for future research directions in this dynamic and promising field.

MemGPT: Towards LLMs as Operating Systems

  • paper_url: http://arxiv.org/abs/2310.08560
  • repo_url: None
  • paper_authors: Charles Packer, Vivian Fang, Shishir G. Patil, Kevin Lin, Sarah Wooders, Joseph E. Gonzalez
  • for: To address the limited context windows of modern large language models (LLMs), improving their utility in tasks such as extended conversations and document analysis.
  • methods: Proposes virtual context management, drawing inspiration from the hierarchical memory systems of traditional operating systems to give the appearance of a large context resource, and uses interrupts to manage control flow.
  • results: In document analysis and multi-session chat, MemGPT effectively provides extended context, outperforming approaches restricted to the LLM's limited context window.
    Abstract Large language models (LLMs) have revolutionized AI, but are constrained by limited context windows, hindering their utility in tasks like extended conversations and document analysis. To enable using context beyond limited context windows, we propose virtual context management, a technique drawing inspiration from hierarchical memory systems in traditional operating systems that provide the appearance of large memory resources through data movement between fast and slow memory. Using this technique, we introduce MemGPT (Memory-GPT), a system that intelligently manages different memory tiers in order to effectively provide extended context within the LLM's limited context window, and utilizes interrupts to manage control flow between itself and the user. We evaluate our OS-inspired design in two domains where the limited context windows of modern LLMs severely handicaps their performance: document analysis, where MemGPT is able to analyze large documents that far exceed the underlying LLM's context window, and multi-session chat, where MemGPT can create conversational agents that remember, reflect, and evolve dynamically through long-term interactions with their users. We release MemGPT code and data for our experiments at https://memgpt.ai.
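
The OS analogy boils down to paging between a bounded "main context" and an unbounded external store, with function-call-style interrupts pulling evicted material back in. A toy sketch; class and method names, and the word-count tokenizer, are assumptions for illustration, not MemGPT's actual API:

```python
class VirtualContext:
    """Toy two-tier memory: a bounded main context plus an unbounded
    archival store."""
    def __init__(self, max_main_tokens=8):
        self.max_main_tokens = max_main_tokens
        self.main = []      # (text, token_count) pairs inside the prompt
        self.archive = []   # evicted entries, searchable on demand

    def _used(self):
        return sum(n for _, n in self.main)

    def append(self, text):
        self.main.append((text, len(text.split())))   # crude token count
        while self._used() > self.max_main_tokens:
            self.archive.append(self.main.pop(0))     # evict oldest first

    def archival_search(self, query, k=3):
        """Interrupt handler: page matching archived entries back in."""
        hits = [t for t, _ in self.archive if query.lower() in t.lower()]
        for h in hits[:k]:
            self.append(f"[recalled] {h}")
        return hits[:k]

ctx = VirtualContext(max_main_tokens=8)
ctx.append("user: my dog is named Rex")
ctx.append("assistant: noted!")
ctx.append("user: let's switch topics")    # overflows; the dog fact is evicted
print(ctx.archival_search("dog"))          # pages the fact back in
```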

Phenomenal Yet Puzzling: Testing Inductive Reasoning Capabilities of Language Models with Hypothesis Refinement

  • paper_url: http://arxiv.org/abs/2310.08559
  • repo_url: https://github.com/linlu-qiu/lm-inductive-reasoning
  • paper_authors: Linlu Qiu, Liwei Jiang, Ximing Lu, Melanie Sclar, Valentina Pyatkin, Chandra Bhagavatula, Bailin Wang, Yoon Kim, Yejin Choi, Nouha Dziri, Xiang Ren
  • for: To study the inductive reasoning capabilities of language models (LMs) and how their inductive process differs from that of humans.
  • methods: Uses iterative hypothesis refinement, a three-step process of proposing, selecting, and refining hypotheses in the form of textual rules, which mirrors the human inductive process more closely than standard input-output prompting.
  • results: LMs are phenomenal hypothesis proposers but puzzling inductive reasoners, with notable performance gaps in rule induction and rule application, suggesting they propose rules they cannot reliably apply; several further discrepancies between LM and human inductive reasoning are identified.
    Abstract The ability to derive underlying principles from a handful of observations and then generalize to novel situations -- known as inductive reasoning -- is central to human intelligence. Prior work suggests that language models (LMs) often fall short on inductive reasoning, despite achieving impressive success on research benchmarks. In this work, we conduct a systematic study of the inductive reasoning capabilities of LMs through iterative hypothesis refinement, a technique that more closely mirrors the human inductive process than standard input-output prompting. Iterative hypothesis refinement employs a three-step process: proposing, selecting, and refining hypotheses in the form of textual rules. By examining the intermediate rules, we observe that LMs are phenomenal hypothesis proposers (i.e., generating candidate rules), and when coupled with a (task-specific) symbolic interpreter that is able to systematically filter the proposed set of rules, this hybrid approach achieves strong results across inductive reasoning benchmarks that require inducing causal relations, language-like instructions, and symbolic concepts. However, they also behave as puzzling inductive reasoners, showing notable performance gaps in rule induction (i.e., identifying plausible rules) and rule application (i.e., applying proposed rules to instances), suggesting that LMs are proposing hypotheses without being able to actually apply the rules. Through empirical and human analyses, we further reveal several discrepancies between the inductive reasoning processes of LMs and humans, shedding light on both the potentials and limitations of using LMs in inductive reasoning tasks.
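
The propose-select-refine loop pairs an LM proposer with a symbolic interpreter that systematically filters candidate rules. A schematic sketch with the LM proposer stubbed by enumeration; the stub, the rule format, and the scoring are assumptions:

```python
def propose_rules(examples, n=5):
    """Stub for an LM proposer: enumerate simple arithmetic rules. In the
    paper this would be a language model prompted with the examples."""
    return [lambda x, k=k: x + k for k in range(-n, n + 1)]

def interpreter_score(rule, examples):
    """Symbolic interpreter: fraction of (input, output) pairs the rule
    reproduces exactly, checked systematically rather than by the LM."""
    return sum(rule(x) == y for x, y in examples) / len(examples)

def refine(examples, rounds=3):
    best, best_score = None, -1.0
    for _ in range(rounds):
        for rule in propose_rules(examples):
            s = interpreter_score(rule, examples)
            if s > best_score:
                best, best_score = rule, s
        if best_score == 1.0:   # a rule fully explains the observations
            break
        # In the real loop, failing cases are fed back into the prompt.
    return best, best_score

examples = [(1, 4), (10, 13), (-2, 1)]   # hidden rule: y = x + 3
rule, score = refine(examples)
print(score, rule(100))                  # 1.0 103
```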

Offline Retraining for Online RL: Decoupled Policy Learning to Mitigate Exploration Bias

  • paper_url: http://arxiv.org/abs/2310.08558
  • repo_url: https://github.com/MaxSobolMark/OOO
  • paper_authors: Max Sobol Mark, Archit Sharma, Fahim Tajwar, Rafael Rafailov, Sergey Levine, Chelsea Finn
  • for: To improve policy learning during online reinforcement learning (RL) or fine-tuning, especially when prior offline data lacks sufficient state coverage.
  • methods: Proposes an Offline-to-Online-to-Offline (OOO) framework in which an optimistic (exploration) policy interacts with the environment during online fine-tuning, while a separate pessimistic (exploitation) policy is trained on all observed data for evaluation.
  • results: OOO is complementary to several offline-to-online and online RL methods, improving their average performance by 14% to 26% in fine-tuning experiments, achieving state-of-the-art results on several D4RL environments, and improving online RL performance by 165% on two OpenAI Gym environments; it also enables fine-tuning from incomplete offline datasets where prior methods fail.
    Abstract It is desirable for policies to optimistically explore new states and behaviors during online reinforcement learning (RL) or fine-tuning, especially when prior offline data does not provide enough state coverage. However, exploration bonuses can bias the learned policy, and our experiments find that naive, yet standard use of such bonuses can fail to recover a performant policy. Concurrently, pessimistic training in offline RL has enabled recovery of performant policies from static datasets. Can we leverage offline RL to recover better policies from online interaction? We make a simple observation that a policy can be trained from scratch on all interaction data with pessimistic objectives, thereby decoupling the policies used for data collection and for evaluation. Specifically, we propose offline retraining, a policy extraction step at the end of online fine-tuning in our Offline-to-Online-to-Offline (OOO) framework for reinforcement learning (RL). An optimistic (exploration) policy is used to interact with the environment, and a separate pessimistic (exploitation) policy is trained on all the observed data for evaluation. Such decoupling can reduce any bias from online interaction (intrinsic rewards, primacy bias) in the evaluation policy, and can allow more exploratory behaviors during online interaction which in turn can generate better data for exploitation. OOO is complementary to several offline-to-online RL and online RL methods, and improves their average performance by 14% to 26% in our fine-tuning experiments, achieves state-of-the-art performance on several environments in the D4RL benchmarks, and improves online RL performance by 165% on two OpenAI gym environments. Further, OOO can enable fine-tuning from incomplete offline datasets where prior methods can fail to recover a performant policy. Implementation: https://github.com/MaxSobolMark/OOO
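
The heart of OOO is that the policy which collects data is never the policy that gets evaluated. A runnable skeleton with the RL internals stubbed out; the environment interface and trainer signatures are assumptions:

```python
import random

def ooo(env, offline_data, train_optimistic, train_pessimistic,
        online_steps=1000):
    """Skeleton of Offline-to-Online-to-Offline."""
    explore = train_optimistic(offline_data)     # optimistic, for collection
    data = list(offline_data)
    obs = env.reset()
    for _ in range(online_steps):                # online fine-tuning phase
        a = explore(obs)
        obs2, r, done = env.step(a)
        data.append((obs, a, r, obs2, done))
        obs = env.reset() if done else obs2
    return train_pessimistic(data)               # offline retraining phase

# Minimal stubs so the skeleton runs end to end.
class CoinEnv:
    def reset(self):
        return 0
    def step(self, a):
        return 0, float(a == 1), True            # action 1 is rewarded

train_optimistic = lambda data: (lambda obs: random.choice([0, 1]))
train_pessimistic = lambda data: (lambda obs: max(
    (0, 1), key=lambda a: sum(r for (_, act, r, *_rest) in data if act == a)))

policy = ooo(CoinEnv(), [], train_optimistic, train_pessimistic)
print("evaluation policy picks:", policy(0))     # 1, with high probability
```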

Cross-Episodic Curriculum for Transformer Agents

  • paper_url: http://arxiv.org/abs/2310.08549
  • repo_url: https://github.com/CEC-Agent/CEC
  • paper_authors: Lucy Xiaoyang Shi, Yunfan Jiang, Jake Grigsby, Linxi “Jim” Fan, Yuke Zhu
  • for: To improve the learning efficiency and generalization of Transformer agents.
  • methods: A cross-episodic curriculum that places cross-episodic experiences into the Transformer's context, sequentially structuring online learning trials and mixed-quality demonstrations.
  • results: Strong results in both multi-task reinforcement learning and imitation learning, with the resulting policies showing superior performance and strong generalization over baselines.
    Abstract We present a new algorithm, Cross-Episodic Curriculum (CEC), to boost the learning efficiency and generalization of Transformer agents. Central to CEC is the placement of cross-episodic experiences into a Transformer's context, which forms the basis of a curriculum. By sequentially structuring online learning trials and mixed-quality demonstrations, CEC constructs curricula that encapsulate learning progression and proficiency increase across episodes. Such synergy combined with the potent pattern recognition capabilities of Transformer models delivers a powerful cross-episodic attention mechanism. The effectiveness of CEC is demonstrated under two representative scenarios: one involving multi-task reinforcement learning with discrete control, such as in DeepMind Lab, where the curriculum captures the learning progression in both individual and progressively complex settings; and the other involving imitation learning with mixed-quality data for continuous control, as seen in RoboMimic, where the curriculum captures the improvement in demonstrators' expertise. In all instances, policies resulting from CEC exhibit superior performance and strong generalization. Code is open-sourced at https://cec-agent.github.io/ to facilitate research on Transformer agent learning.
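
One way to read the curriculum construction: order episodes from worst to best return and concatenate them, so the context itself encodes a learning progression the Transformer can attend over. A simplified sketch; the episode format (a list of (obs, action, reward) tuples) and the budget unit are assumptions:

```python
def cross_episodic_context(episodes, max_transitions=4096):
    """Concatenate episodes ranked by return, capped by a context budget."""
    ranked = sorted(episodes, key=lambda ep: sum(r for _, _, r in ep))
    context = []
    for ep in ranked:
        if len(context) + len(ep) > max_transitions:
            break
        context.extend(ep)
    return context

# Three toy episodes of increasing proficiency (returns 0.5, 2.5, 4.5).
eps = [[("s", "a", 0.1)] * 5, [("s", "a", 0.9)] * 5, [("s", "a", 0.5)] * 5]
ctx = cross_episodic_context(eps, max_transitions=15)
print(len(ctx))   # 15 transitions, ordered from low to high return
```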

Do pretrained Transformers Really Learn In-context by Gradient Descent?

  • paper_url: http://arxiv.org/abs/2310.08540
  • repo_url: None
  • paper_authors: Lingfeng Shen, Aayush Mishra, Daniel Khashabi
  • for: To test whether in-context learning (ICL) in large language models is implicitly equivalent to gradient descent (GD).
  • methods: Highlights unrealistic assumptions in prior constructions that simulate GD with Transformer weights, then empirically compares ICL and GD on a language model pretrained on natural data (LLaMa-7B).
  • results: ICL and GD behave inconsistently across datasets, models, and numbers of demonstrations, and adapt the model's output distribution differently, indicating that their equivalence remains an open hypothesis requiring further study.
    Abstract Is In-Context Learning (ICL) implicitly equivalent to Gradient Descent (GD)? Several recent works draw analogies between the dynamics of GD and the emergent behavior of ICL in large language models. However, these works make assumptions far from the realistic natural language setting in which language models are trained. Such discrepancies between theory and practice, therefore, necessitate further investigation to validate their applicability. We start by highlighting the weaknesses in prior works that construct Transformer weights to simulate gradient descent. Their experiments with training Transformers on ICL objective, inconsistencies in the order sensitivity of ICL and GD, sparsity of the constructed weights, and sensitivity to parameter changes are some examples of a mismatch from the real-world setting. Furthermore, we probe and compare the ICL vs. GD hypothesis in a natural setting. We conduct comprehensive empirical analyses on language models pretrained on natural data (LLaMa-7B). Our comparisons on various performance metrics highlight the inconsistent behavior of ICL and GD as a function of various factors such as datasets, models, and number of demonstrations. We observe that ICL and GD adapt the output distribution of language models differently. These results indicate that the equivalence between ICL and GD is an open hypothesis, requires nuanced considerations and calls for further studies.

Formally Specifying the High-Level Behavior of LLM-Based Agents

  • paper_url: http://arxiv.org/abs/2310.08535
  • repo_url: None
  • paper_authors: Maxwell Crouse, Ibrahim Abdelaziz, Kinjal Basu, Soham Dan, Sadhana Kumaravel, Achille Fokoue, Pavan Kapanipathi, Luis Lastras
  • for: LLM-based agents are promising tools for solving challenging problems without the need for expensive task-specific finetuned models.
  • methods: The proposed framework uses Linear Temporal Logic (LTL) to declaratively specify desired agent behaviors, and a constrained decoder guarantees the LLM will produce an output exhibiting the desired behavior.
  • results: The framework enables rapid design, implementation, and experimentation with different LLM-based agents; it can enforce complex agent behavior, formally validate prompt examples, and incorporate content-focused logical constraints into generation, and the guardrails it provides lead to improvements in agent performance. The code is released for general use.
    Abstract LLM-based agents have recently emerged as promising tools for solving challenging problems without the need for task-specific finetuned models that can be expensive to procure. Currently, the design and implementation of such agents is ad hoc, as the wide variety of tasks that LLM-based agents may be applied to naturally means there can be no one-size-fits-all approach to agent design. In this work we aim to alleviate the difficulty of designing and implementing new agents by proposing a minimalistic, high-level generation framework that simplifies the process of building agents. The framework we introduce allows the user to specify desired agent behaviors in Linear Temporal Logic (LTL). The declarative LTL specification is then used to construct a constrained decoder that guarantees the LLM will produce an output exhibiting the desired behavior. By designing our framework in this way, we obtain several benefits, including the ability to enforce complex agent behavior, the ability to formally validate prompt examples, and the ability to seamlessly incorporate content-focused logical constraints into generation. In particular, our declarative approach, in which the desired behavior is simply described without concern for how it should be implemented or enforced, enables rapid design, implementation and experimentation with different LLM-based agents. We demonstrate how the proposed framework can be used to implement recent LLM-based agents, and show how the guardrails our approach provides can lead to improvements in agent performance. In addition, we release our code for general use.
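
The key mechanism, compiling a declarative temporal specification into an automaton that constrains decoding so every generated trace satisfies the spec by construction, can be illustrated with a hand-rolled automaton over high-level agent actions. The action vocabulary and the property ("think before acting, eventually answer") are illustrative assumptions; the paper constrains LLM token output:

```python
import random

# Safety property (LTL-flavored): the agent must "think" before it may
# "act", and the episode should end with an "answer".
TRANSITIONS = {
    ("start", "think"): "ready",
    ("ready", "think"): "ready",
    ("ready", "act"): "ready",
    ("ready", "answer"): "done",
}

def allowed(state):
    return [a for (s, a) in TRANSITIONS if s == state]

def constrained_decode(propose, max_steps=10):
    """`propose(state, choices)` stands in for the LLM. The decoder only
    ever offers actions the automaton permits, so every completed trace
    satisfies the specification by construction."""
    state, trace = "start", []
    for _ in range(max_steps):
        action = propose(state, allowed(state))
        trace.append(action)
        state = TRANSITIONS[(state, action)]
        if state == "done":
            break
    return trace

print(constrained_decode(lambda s, opts: random.choice(opts)))
# e.g. ['think', 'act', 'answer'], never an 'act' before a 'think'
```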

How connectivity structure shapes rich and lazy learning in neural circuits

  • paper_url: http://arxiv.org/abs/2310.08513
  • repo_url: None
  • paper_authors: Yuhan Helena Liu, Aristide Baratin, Jonathan Cornford, Stefan Mihalas, Eric Shea-Brown, Guillaume Lajoie
  • for: To use deep learning tools to study how connectivity structure shapes learning dynamics in neural circuits.
  • methods: Empirical and theoretical analyses of how the structure of the initial weights, in particular their effective rank, influences the network's learning regime.
  • results: High-rank initializations typically yield smaller network changes indicative of lazier learning, while low-rank initializations bias learning toward a richer regime; as an exception, lazier learning can still occur under low-rank initializations aligned with task and data statistics.
    Abstract In theoretical neuroscience, recent work leverages deep learning tools to explore how some network attributes critically influence its learning dynamics. Notably, initial weight distributions with small (resp. large) variance may yield a rich (resp. lazy) regime, where significant (resp. minor) changes to network states and representation are observed over the course of learning. However, in biology, neural circuit connectivity generally has a low-rank structure and therefore differs markedly from the random initializations generally used for these studies. As such, here we investigate how the structure of the initial weights, in particular their effective rank, influences the network learning regime. Through both empirical and theoretical analyses, we discover that high-rank initializations typically yield smaller network changes indicative of lazier learning, a finding we also confirm with experimentally-driven initial connectivity in recurrent neural networks. Conversely, low-rank initialization biases learning towards richer learning. Importantly, however, as an exception to this rule, we find lazier learning can still occur with a low-rank initialization that aligns with task and data statistics. Our research highlights the pivotal role of initial weight structures in shaping learning regimes, with implications for metabolic costs of plasticity and risks of catastrophic forgetting.
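
Effective rank, the initialization property under study, has a standard entropy-of-singular-values definition (Roy & Vetterli, 2007); whether the paper uses exactly this estimator is an assumption here. A quick illustration contrasting a standard Gaussian initialization with a low-rank one:

```python
import numpy as np

def effective_rank(W, eps=1e-12):
    """Effective rank as exp(entropy) of the normalized singular values."""
    s = np.linalg.svd(W, compute_uv=False)
    p = s / (s.sum() + eps)
    return float(np.exp(-(p * np.log(p + eps)).sum()))

rng = np.random.default_rng(0)
n = 256
full_rank = rng.normal(0.0, 1.0 / np.sqrt(n), (n, n))    # standard init
u = rng.normal(size=(n, 4))
v = rng.normal(size=(4, n))
low_rank = (u @ v) / np.sqrt(n)                          # rank-4 structure

print(effective_rank(full_rank))   # large: a sizable fraction of n
print(effective_rank(low_rank))    # close to 4
```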

HoneyBee: Progressive Instruction Finetuning of Large Language Models for Materials Science

  • paper_url: http://arxiv.org/abs/2310.08511
  • repo_url: https://github.com/BangLab-UdeM-Mila/NLP4MatSci-HoneyBee
  • paper_authors: Yu Song, Santiago Miret, Huan Zhang, Bang Liu
  • for: To propose a trustworthy data-curation process for materials science (MatSci-Instruct) and apply it to progressively finetune a LLaMa-based language model (HoneyBee), addressing the scarcity of high-quality materials-science text data.
  • methods: Prompts multiple commercially available large language models, with an Instructor module (e.g., ChatGPT) generating data and an independent Verifier module (e.g., Claude) checking it, to improve the trustworthiness and relevance of the generated data.
  • results: Constructs a multi-task dataset with MatSci-Instruct and measures its quality along multiple dimensions, including accuracy, relevance, completeness, and reasonableness; iteratively generating more targeted instructions and instruction data in a finetuning-evaluation-feedback loop progressively improves the finetuned HoneyBee models.
    Abstract We propose an instruction-based process for trustworthy data curation in materials science (MatSci-Instruct), which we then apply to finetune a LLaMa-based language model targeted for materials science (HoneyBee). MatSci-Instruct helps alleviate the scarcity of relevant, high-quality materials science textual data available in the open literature, and HoneyBee is the first billion-parameter language model specialized to materials science. In MatSci-Instruct we improve the trustworthiness of generated data by prompting multiple commercially available large language models for generation with an Instructor module (e.g. Chat-GPT) and verification from an independent Verifier module (e.g. Claude). Using MatSci-Instruct, we construct a dataset of multiple tasks and measure the quality of our dataset along multiple dimensions, including accuracy against known facts, relevance to materials science, as well as completeness and reasonableness of the data. Moreover, we iteratively generate more targeted instructions and instruction-data in a finetuning-evaluation-feedback loop leading to progressively better performance for our finetuned HoneyBee models. Our evaluation on the MatSci-NLP benchmark shows HoneyBee's outperformance of existing language models on materials science tasks and iterative improvement in successive stages of instruction-data refinement. We study the quality of HoneyBee's language modeling through automatic evaluation and analyze case studies to further understand the model's capabilities and limitations. Our code and relevant datasets are publicly available at \url{https://github.com/BangLab-UdeM-Mila/NLP4MatSci-HoneyBee}.

Impact of time and note duration tokenizations on deep learning symbolic music modeling

  • paper_url: http://arxiv.org/abs/2310.08497
  • repo_url: https://github.com/Natooz/music-modeling-time-duration
  • paper_authors: Nathan Fradet, Nicolas Gutowski, Fabien Chhel, Jean-Pierre Briot
  • for: To study how symbolic music should be tokenized for deep learning tasks, including generation, transcription, synthesis, and Music Information Retrieval (MIR).
  • methods: Analyzes common tokenization methods and experiments with time and note-duration representations, measuring their impact on Transformer performance.
  • results: Depending on the task, explicit information improves performance, and time and duration representations perform differently across tasks.
    Abstract Symbolic music is widely used in various deep learning tasks, including generation, transcription, synthesis, and Music Information Retrieval (MIR). It is mostly employed with discrete models like Transformers, which require music to be tokenized, i.e., formatted into sequences of distinct elements called tokens. Tokenization can be performed in different ways. As Transformers can struggle at reasoning but more easily capture explicit information, it is important to study how the way information is represented for such models impacts their performance. In this work, we analyze the common tokenization methods and experiment with time and note duration representations. We compare the performance of these two impactful criteria on several tasks, including composer and emotion classification, music generation, and sequence representation learning. We demonstrate that explicit information leads to better results depending on the task.
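
The two representations under comparison can be made concrete on a toy note list: one closes notes with NoteOff events separated by relative TimeShifts, the other attaches an explicit Duration token to each onset. This is a simplified sketch; real tokenizers also handle velocity, tempo, and quantization:

```python
def tokenize_timeshift(notes):
    """Notes -> tokens using relative TimeShift events; each note is closed
    by a NoteOff. notes = [(onset, duration, pitch)] in abstract time units."""
    events = []
    for onset, dur, pitch in notes:
        events.append((onset, f"NoteOn_{pitch}"))
        events.append((onset + dur, f"NoteOff_{pitch}"))
    events.sort()
    tokens, now = [], 0
    for t, tok in events:
        if t > now:
            tokens.append(f"TimeShift_{t - now}")
            now = t
        tokens.append(tok)
    return tokens

def tokenize_duration(notes):
    """Same notes with an explicit Duration token attached to each onset."""
    tokens, now = [], 0
    for onset, dur, pitch in sorted(notes):
        if onset > now:
            tokens.append(f"TimeShift_{onset - now}")
            now = onset
        tokens += [f"NoteOn_{pitch}", f"Duration_{dur}"]
    return tokens

notes = [(0, 4, 60), (4, 2, 64)]   # C4 then E4
print(tokenize_timeshift(notes))
print(tokenize_duration(notes))
```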

Can We Edit Multimodal Large Language Models?

  • paper_url: http://arxiv.org/abs/2310.08475
  • repo_url: https://github.com/zjunlp/easyedit
  • paper_authors: Siyuan Cheng, Bozhong Tian, Qingbin Liu, Xi Chen, Yongheng Wang, Huajun Chen, Ningyu Zhang
  • for: To study editing multimodal large language models (MLLMs), which is more challenging than single-modal model editing and demands a higher level of scrutiny and careful consideration. To facilitate research in this area, the authors construct a new benchmark, MMEdit, and develop a suite of innovative evaluation metrics.
  • methods: Conducts comprehensive experiments with various model-editing baselines, analyzing the impact of editing different components of multimodal LLMs.
  • results: Previous baselines can edit multimodal LLMs to some extent, but the effect is still barely satisfactory, indicating the potential difficulty of the task. Code and dataset are available at https://github.com/zjunlp/EasyEdit.
    Abstract In this paper, we focus on editing Multimodal Large Language Models (MLLMs). Compared to editing single-modal LLMs, multimodal model editing is more challenging, which demands a higher level of scrutiny and careful consideration in the editing process. To facilitate research in this area, we construct a new benchmark, dubbed MMEdit, for editing multimodal LLMs and establishing a suite of innovative metrics for evaluation. We conduct comprehensive experiments involving various model editing baselines and analyze the impact of editing different components for multimodal LLMs. Empirically, we notice that previous baselines can implement editing multimodal LLMs to some extent, but the effect is still barely satisfactory, indicating the potential difficulty of this task. We hope that our work can provide the NLP community with insights. Code and dataset are available in https://github.com/zjunlp/EasyEdit.

Belief formation and the persistence of biased beliefs

  • paper_url: http://arxiv.org/abs/2310.08466
  • repo_url: None
  • paper_authors: Olivier Compte
  • for: To model how agents process information when forming beliefs, and how censored evidence produces persistently biased beliefs.
  • methods: A belief-formation model in which agents try to discriminate between two theories, and the asymmetry in strength between confirming and disconfirming evidence tilts beliefs toward theories that generate strong (possibly rare) confirming evidence and weak (frequent) disconfirming evidence.
  • results: Limitations on information processing create incentives to censor weak evidence, so that for some discrimination problems the evidence becomes mostly one-sided regardless of the true underlying theory; sophisticated agents who know the censored data-generating process are not misled, but less sophisticated agents end up with biased beliefs.
    Abstract We propose a belief-formation model where agents attempt to discriminate between two theories, and where the asymmetry in strength between confirming and disconfirming evidence tilts beliefs in favor of theories that generate strong (and possibly rare) confirming evidence and weak (and frequent) disconfirming evidence. In our model, limitations on information processing provide incentives to censor weak evidence, with the consequence that for some discrimination problems, evidence may become mostly one-sided, independently of the true underlying theory. Sophisticated agents who know the characteristics of the censored data-generating process are not lured by this accumulation of ``evidence'', but less sophisticated ones end up with biased beliefs.
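
The mechanism is easy to simulate: an agent accumulates log-likelihood ratios (LLRs) but censors signals whose strength falls below a processing threshold. The signal strengths and frequencies below are illustrative assumptions, chosen so the evidence is net disconfirming yet the censored stream looks confirming:

```python
import random

def accumulated_log_odds(trials=10_000, threshold=1.0, seed=0):
    """Sum LLRs of observed signals, dropping any signal whose |LLR|
    falls below `threshold`."""
    rng = random.Random(seed)
    log_odds = 0.0
    for _ in range(trials):
        # Rare strong confirming evidence (LLR +3) versus frequent weak
        # disconfirming evidence (LLR -0.2); note E[LLR] < 0.
        llr = 3.0 if rng.random() < 0.05 else -0.2
        if abs(llr) >= threshold:       # weak evidence never registers
            log_odds += llr
    return log_odds

print("censoring agent: ", accumulated_log_odds(threshold=1.0))  # >> 0
print("full information:", accumulated_log_odds(threshold=0.0))  # < 0
```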

DistillSpec: Improving Speculative Decoding via Knowledge Distillation

  • paper_url: http://arxiv.org/abs/2310.08461
  • repo_url: None
  • paper_authors: Yongchao Zhou, Kaifeng Lyu, Ankit Singh Rawat, Aditya Krishna Menon, Afshin Rostamizadeh, Sanjiv Kumar, Jean-François Kagy, Rishabh Agarwal
  • for: To accelerate large language model inference with speculative decoding, where a fast draft model generates multiple tokens that the larger target model verifies in parallel, producing text distributed according to the target model.
  • methods: Uses knowledge distillation to better align the draft model with the target model before applying speculative decoding, with two key design choices: on-policy data generation from the draft model, and a divergence function tailored to the task and decoding strategy.
  • results: Achieves 10-45% speedups over standard speculative decoding on a range of benchmarks with both greedy and non-greedy sampling; combined with lossy speculative decoding it offers fine-grained control over the latency vs. task-performance trade-off, and in practical settings with models of varying sizes it can reduce decoding latency by 6-10x with minimal performance drop.
    Abstract Speculative decoding (SD) accelerates large language model inference by employing a faster draft model for generating multiple tokens, which are then verified in parallel by the larger target model, resulting in the text generated according to the target model distribution. However, identifying a compact draft model that is well-aligned with the target model is challenging. To tackle this issue, we propose DistillSpec that uses knowledge distillation to better align the draft model with the target model, before applying SD. DistillSpec makes two key design choices, which we demonstrate via systematic study to be crucial to improving the draft and target alignment: utilizing on-policy data generation from the draft model, and tailoring the divergence function to the task and decoding strategy. Notably, DistillSpec yields impressive 10 - 45% speedups over standard SD on a range of standard benchmarks, using both greedy and non-greedy sampling. Furthermore, we combine DistillSpec with lossy SD to achieve fine-grained control over the latency vs. task performance trade-off. Finally, in practical scenarios with models of varying sizes, first using distillation to boost the performance of the target model and then applying DistillSpec to train a well-aligned draft model can reduce decoding latency by 6-10x with minimal performance drop, compared to standard decoding without distillation.
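
For context, the verification step of standard speculative sampling (a known prerequisite, not DistillSpec's contribution) accepts a drafted token with probability min(1, p(x)/q(x)) and otherwise resamples from the normalized residual, which preserves the exact target distribution. DistillSpec's distillation raises the overlap between p and q and hence the acceptance rate:

```python
import numpy as np

def verify_token(x, p_target, q_draft, rng):
    """One step of the standard speculative-sampling verification rule.
    The output is an exact sample from the target distribution p."""
    if rng.random() < min(1.0, p_target[x] / q_draft[x]):
        return x, True
    residual = np.maximum(p_target - q_draft, 0.0)
    residual /= residual.sum()
    return int(rng.choice(len(p_target), p=residual)), False

rng = np.random.default_rng(0)
p = np.array([0.60, 0.30, 0.10])    # target model distribution
q = np.array([0.50, 0.25, 0.25])    # draft model distribution
accepted = 0
for _ in range(10_000):
    x = rng.choice(3, p=q)          # draft model proposes a token
    _, ok = verify_token(x, p, q, rng)
    accepted += ok
print(accepted / 10_000)            # ~0.85 = sum_x min(p(x), q(x))
```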

A Survey of Heterogeneous Transfer Learning

  • paper_url: http://arxiv.org/abs/2310.08459
  • repo_url: https://github.com/ymsun99/Heterogeneous-Transfer-Learning
  • paper_authors: Runxue Bao, Yiming Sun, Yuhe Gao, Jindong Wang, Qiang Yang, Haifeng Chen, Zhi-Hong Mao, Ye Ye
  • for: To provide a comprehensive, up-to-date survey of recent developments in heterogeneous transfer learning and a systematic guide for future research.
  • methods: Reviews methodologies for diverse learning scenarios in which source and target domains differ in feature spaces, data distributions, and label spaces, and discusses the limitations of current studies.
  • results: Covers application contexts including Natural Language Processing, Computer Vision, Multimodality, and Biomedicine, fostering a deeper understanding and spurring future research.
    Abstract The application of transfer learning, an approach utilizing knowledge from a source domain to enhance model performance in a target domain, has seen a tremendous rise in recent years, underpinning many real-world scenarios. The key to its success lies in the shared common knowledge between the domains, a prerequisite in most transfer learning methodologies. These methods typically presuppose identical feature spaces and label spaces in both domains, known as homogeneous transfer learning, which, however, is not always a practical assumption. Oftentimes, the source and target domains vary in feature spaces, data distributions, and label spaces, making it challenging or costly to secure source domain data with identical feature and label spaces as the target domain. Arbitrary elimination of these differences is not always feasible or optimal. Thus, heterogeneous transfer learning, acknowledging and dealing with such disparities, has emerged as a promising approach for a variety of tasks. Despite the existence of a survey in 2017 on this topic, the fast-paced advances post-2017 necessitate an updated, in-depth review. We therefore present a comprehensive survey of recent developments in heterogeneous transfer learning methods, offering a systematic guide for future research. Our paper reviews methodologies for diverse learning scenarios, discusses the limitations of current studies, and covers various application contexts, including Natural Language Processing, Computer Vision, Multimodality, and Biomedicine, to foster a deeper understanding and spur future research.

Metrics for popularity bias in dynamic recommender systems

  • paper_url: http://arxiv.org/abs/2310.08455
  • repo_url: None
  • paper_authors: Valentijn Braun, Debarati Bhaumik, Diptish Dey
  • for: To quantify unfairness and bias, in particular popularity bias, in recommender systems.
  • methods: Proposes four metrics that quantify popularity bias in dynamic recommender systems over time and across sensitive user groups, tested with four collaborative filtering algorithms on two commonly used benchmark datasets.
  • results: Used conjointly, the proposed metrics provide a comprehensive understanding of growing disparities in treatment between sensitive groups over time.
    Abstract Despite the widespread application of recommender systems (RecSys) in our daily lives, rather limited research has been done on quantifying unfairness and biases present in such systems. Prior work largely focuses on determining whether a RecSys is discriminating or not but does not compute the amount of bias present in these systems. Biased recommendations may lead to decisions that can potentially have adverse effects on individuals, sensitive user groups, and society. Hence, it is important to quantify these biases for fair and safe commercial applications of these systems. This paper focuses on quantifying popularity bias that stems directly from the output of RecSys models, leading to over recommendation of popular items that are likely to be misaligned with user preferences. Four metrics to quantify popularity bias in RecSys over time in dynamic setting across different sensitive user groups have been proposed. These metrics have been demonstrated for four collaborative filtering based RecSys algorithms trained on two commonly used benchmark datasets in the literature. Results obtained show that the metrics proposed provide a comprehensive understanding of growing disparities in treatment between sensitive groups over time when used conjointly.
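
The paper's four metrics are not reproduced here, but one simple stand-in conveys the flavor: the average historical popularity of recommended items, computed per sensitive group, whose divergence across groups or over time signals disparate popularity bias:

```python
from collections import Counter

def avg_rec_popularity(recs_by_group, interactions):
    """Average popularity of recommended items per user group, where an
    item's popularity is its share of historical interactions. This is an
    illustrative stand-in, not one of the paper's four metrics."""
    counts = Counter(interactions)
    total = sum(counts.values())
    out = {}
    for group, rec_lists in recs_by_group.items():
        items = [i for rec in rec_lists for i in rec]
        out[group] = sum(counts[i] / total for i in items) / len(items)
    return out

interactions = ["i1"] * 80 + ["i2"] * 15 + ["i3"] * 5      # long-tail catalog
recs = {"group_a": [["i1", "i1", "i2"]],                   # head-heavy lists
        "group_b": [["i2", "i3", "i3"]]}                   # tail-heavy lists
print(avg_rec_popularity(recs, interactions))
# {'group_a': 0.583..., 'group_b': 0.083...}
```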

Towards Robust Multi-Modal Reasoning via Model Selection

  • paper_url: http://arxiv.org/abs/2310.08446
  • repo_url: None
  • paper_authors: Xiangyan Liu, Rongxue Li, Wei Ji, Tao Lin
  • for: To improve the robustness of multi-modal agents in multi-step reasoning by addressing the challenge of model selection.
  • methods: Proposes the $\textit{M}^3$ framework, a plug-in with negligible runtime overhead at test time that enables dynamic model selection considering both user inputs and subtask dependencies.
  • results: Introduces MS-GQA, a new dataset designed to investigate the model selection challenge in multi-modal agents, and shows that the framework robustifies the overall reasoning process.
    Abstract The reasoning capabilities of LLM (Large Language Model) are widely acknowledged in recent research, inspiring studies on tool learning and autonomous agents. LLM serves as the "brain" of agent, orchestrating multiple tools for collaborative multi-step task solving. Unlike methods invoking tools like calculators or weather APIs for straightforward tasks, multi-modal agents excel by integrating diverse AI models for complex challenges. However, current multi-modal agents neglect the significance of model selection: they primarily focus on the planning and execution phases, and will only invoke predefined task-specific models for each subtask, making the execution fragile. Meanwhile, other traditional model selection methods are either incompatible with or suboptimal for the multi-modal agent scenarios, due to ignorance of dependencies among subtasks arising by multi-step reasoning. To this end, we identify the key challenges therein and propose the $\textit{M}^3$ framework as a plug-in with negligible runtime overhead at test-time. This framework improves model selection and bolsters the robustness of multi-modal agents in multi-step reasoning. In the absence of suitable benchmarks, we create MS-GQA, a new dataset specifically designed to investigate the model selection challenge in multi-modal agents. Our experiments reveal that our framework enables dynamic model selection, considering both user inputs and subtask dependencies, thereby robustifying the overall reasoning process. Our code and benchmark: https://github.com/LINs-lab/M3.

Debias the Training of Diffusion Models

  • paper_url: http://arxiv.org/abs/2310.08442
  • repo_url: None
  • paper_authors: Hu Yu, Li Shen, Jie Huang, Man Zhou, Hongsheng Li, Feng Zhao
  • for: To improve the generation quality of diffusion models.
  • methods: Shows theoretically that the prevailing constant loss-weight strategy biases estimation during training, proposes an effective, theoretically unbiased weighting strategy, and systematically dissects the existence, impact, and causes of the bias.
  • results: The debiased estimation method significantly improves sample quality without relying on complex techniques, and is more efficient than the baseline in both training and sampling.
    Abstract Diffusion models have demonstrated compelling generation quality by optimizing the variational lower bound through a simple denoising score matching loss. In this paper, we provide theoretical evidence that the prevailing practice of using a constant loss weight strategy in diffusion models leads to biased estimation during the training phase. Simply optimizing the denoising network to predict Gaussian noise with constant weighting may hinder precise estimations of original images. To address the issue, we propose an elegant and effective weighting strategy grounded in the theoretically unbiased principle. Moreover, we conduct a comprehensive and systematic exploration to dissect the inherent bias problem deriving from constant weighting loss from the perspectives of its existence, impact and reasons. These analyses are expected to advance our understanding and demystify the inner workings of diffusion models. Through empirical evaluation, we demonstrate that our proposed debiased estimation method significantly enhances sample quality without the reliance on complex techniques, and exhibits improved efficiency compared to the baseline method both in training and sampling processes.
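
Structurally, the fix replaces the constant w(t) = 1 in the weighted denoising loss with a timestep-dependent schedule. The paper's derived unbiased weighting is not reproduced here; the sketch below shows the plumbing, using a min-SNR-style weight purely as one illustrative alternative to constant weighting:

```python
import torch

def denoising_loss(model, x0, t, alphas_cumprod, weights):
    """Weighted denoising loss: weights[t] replaces the constant w(t) = 1."""
    noise = torch.randn_like(x0)
    a = alphas_cumprod[t].view(-1, 1)
    x_t = a.sqrt() * x0 + (1.0 - a).sqrt() * noise   # forward diffusion
    per_sample = ((model(x_t, t) - noise) ** 2).mean(dim=1)
    return (weights[t] * per_sample).mean()

T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)
snr = alphas_cumprod / (1.0 - alphas_cumprod)
weights = torch.clamp(snr, max=5.0) / snr   # a min-SNR-style example weight

net = torch.nn.Linear(8, 8)                 # stand-in denoiser
model = lambda x_t, t: net(x_t)             # ignores t, for brevity
x0, t = torch.randn(4, 8), torch.randint(0, T, (4,))
print(denoising_loss(model, x0, t, alphas_cumprod, weights).item())
```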

The Impact of Explanations on Fairness in Human-AI Decision-Making: Protected vs Proxy Features

  • paper_url: http://arxiv.org/abs/2310.08617
  • repo_url: None
  • paper_authors: Navita Goyal, Connor Baumler, Tin Nguyen, Hal Daumé III
  • for: To investigate the effect of protected and proxy features on participants' perception of model fairness and their ability to improve demographic parity over an AI alone.
  • methods: Studies different treatments, including explanations, model bias disclosure, and proxy-correlation disclosure, and how they affect fairness perception and decision parity.
  • results: Explanations help people detect direct biases but not indirect ones, and regardless of bias type tend to increase agreement with model biases; disclosures can mitigate this effect for indirect biases, improving both unfairness recognition and decision-making fairness.
    Abstract AI systems have been known to amplify biases in real world data. Explanations may help human-AI teams address these biases for fairer decision-making. Typically, explanations focus on salient input features. If a model is biased against some protected group, explanations may include features that demonstrate this bias, but when biases are realized through proxy features, the relationship between this proxy feature and the protected one may be less clear to a human. In this work, we study the effect of the presence of protected and proxy features on participants' perception of model fairness and their ability to improve demographic parity over an AI alone. Further, we examine how different treatments -- explanations, model bias disclosure and proxy correlation disclosure -- affect fairness perception and parity. We find that explanations help people detect direct biases but not indirect biases. Additionally, regardless of bias type, explanations tend to increase agreement with model biases. Disclosures can help mitigate this effect for indirect biases, improving both unfairness recognition and the decision-making fairness. We hope that our findings can help guide further research into advancing explanations in support of fair human-AI decision-making.
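
Demographic parity, the fairness notion participants are asked to improve over the AI alone, is the gap in positive-decision rates across groups; a minimal computation:

```python
def demographic_parity_gap(decisions, groups):
    """Absolute difference in positive-decision rates between two groups."""
    def rate(g):
        picked = [d for d, grp in zip(decisions, groups) if grp == g]
        return sum(picked) / len(picked)
    return abs(rate("A") - rate("B"))

decisions = [1, 1, 0, 1, 0, 0, 1, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
print(demographic_parity_gap(decisions, groups))   # |0.75 - 0.25| = 0.5
```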

Neural Sampling in Hierarchical Exponential-family Energy-based Models

  • paper_url: http://arxiv.org/abs/2310.08431
  • repo_url: None
  • paper_authors: Xingsi Dong, Si Wu
  • for: To model how the brain performs inference and learning.
  • methods: Proposes the Hierarchical Exponential-family Energy-based (HEE) model, which performs inference and learning simultaneously: the partition function is decomposed into individual layers, and neurons with shorter time constants sample the gradient of the decomposed normalization term, circumventing the negative phase of conventional energy-based models (EBMs); neural adaptation serves as a momentum term that accelerates inference.
  • results: Learning is localized in time and space and converges easily; on natural image datasets the model exhibits representations akin to those in the biological visual system, and it can generate observations through joint or marginal generation, with marginal generation outperforming joint generation and matching other EBMs.
    Abstract Bayesian brain theory suggests that the brain employs generative models to understand the external world. The sampling-based perspective posits that the brain infers the posterior distribution through samples of stochastic neuronal responses. Additionally, the brain continually updates its generative model to approach the true distribution of the external world. In this study, we introduce the Hierarchical Exponential-family Energy-based (HEE) model, which captures the dynamics of inference and learning. In the HEE model, we decompose the partition function into individual layers and leverage a group of neurons with shorter time constants to sample the gradient of the decomposed normalization term. This allows our model to estimate the partition function and perform inference simultaneously, circumventing the negative phase encountered in conventional energy-based models (EBMs). As a result, the learning process is localized both in time and space, and the model is easy to converge. To match the brain's rapid computation, we demonstrate that neural adaptation can serve as a momentum term, significantly accelerating the inference process. On natural image datasets, our model exhibits representations akin to those observed in the biological visual system. Furthermore, for the machine learning community, our model can generate observations through joint or marginal generation. We show that marginal generation outperforms joint generation and achieves performance on par with other EBMs.
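
The sampling-based picture, stochastic dynamics whose stationary distribution is the posterior with adaptation acting like momentum, can be illustrated with underdamped Langevin dynamics on a toy Gaussian energy. This is a generic sketch, not the HEE model itself:

```python
import numpy as np

def underdamped_langevin(grad_U, x0, steps=5000, h=0.05, gamma=2.0, seed=0):
    """Euler discretization of dv = -gamma*v dt - grad_U(x) dt
    + sqrt(2*gamma) dW; the velocity plays the role of a momentum or
    adaptation variable. Samples approximately from exp(-U(x))."""
    rng = np.random.default_rng(seed)
    x, v, out = x0.copy(), np.zeros_like(x0), []
    for _ in range(steps):
        v += (-h * gamma * v - h * grad_U(x)
              + np.sqrt(2.0 * gamma * h) * rng.normal(size=x.shape))
        x = x + h * v
        out.append(x.copy())
    return np.array(out)

# Toy quadratic energy U(x) = 0.5 * x^T P x, i.e. a correlated Gaussian.
cov = np.array([[1.0, 0.8], [0.8, 1.0]])
prec = np.linalg.inv(cov)
samples = underdamped_langevin(lambda x: prec @ x, np.zeros(2))
print(np.cov(samples[1000:].T))   # approximately recovers `cov`
```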

DeltaSpace: A Semantic-aligned Feature Space for Flexible Text-guided Image Editing

  • paper_url: http://arxiv.org/abs/2310.08785
  • repo_url: https://github.com/yueming6568/deltaedit
  • paper_authors: Yueming Lyu, Kang Zhao, Bo Peng, Yue Jiang, Yingya Zhang, Jing Dong
  • for: To improve the training and inference flexibility of text-guided image editing.
  • methods: Identifies CLIP DeltaSpace, in which the CLIP visual feature difference of two images is semantically aligned with the CLIP textual feature difference of their descriptions; the proposed DeltaEdit framework maps CLIP visual feature differences to latent-space directions of a generative model during training and predicts latent directions from CLIP textual feature differences at inference.
  • results: Enables text-free training and zero-shot generalization to various text prompts; experiments validate the effectiveness and versatility of DeltaEdit with different generative models, including GAN and diffusion models.
    Abstract Text-guided image editing faces significant challenges to training and inference flexibility. Much literature collects large amounts of annotated image-text pairs to train text-conditioned generative models from scratch, which is expensive and not efficient. After that, some approaches that leverage pre-trained vision-language models are put forward to avoid data collection, but they are also limited by either per text-prompt optimization or inference-time hyper-parameters tuning. To address these issues, we investigate and identify a specific space, referred to as CLIP DeltaSpace, where the CLIP visual feature difference of two images is semantically aligned with the CLIP textual feature difference of their corresponding text descriptions. Based on DeltaSpace, we propose a novel framework called DeltaEdit, which maps the CLIP visual feature differences to the latent space directions of a generative model during the training phase, and predicts the latent space directions from the CLIP textual feature differences during the inference phase. And this design endows DeltaEdit with two advantages: (1) text-free training; (2) generalization to various text prompts for zero-shot inference. Extensive experiments validate the effectiveness and versatility of DeltaEdit with different generative models, including both the GAN model and the diffusion model, in achieving flexible text-guided image editing. Code is available at https://github.com/Yueming6568/DeltaEdit.
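
The core quantity is a CLIP-space delta: the difference of embeddings for the source and target descriptions, used as a semantic edit direction. The sketch below stubs the text encoder with a random projection (a real implementation would call CLIP) and omits the learned mapping to generator latents:

```python
import numpy as np

rng = np.random.default_rng(0)
PROJ = rng.normal(size=(512, 64))   # stand-in for a CLIP text encoder

def encode_text(prompt):
    """Stub encoder: hash words into a bag-of-words vector, project, and
    L2-normalize. A real implementation would use CLIP's text encoder."""
    v = np.zeros(512)
    for w in prompt.lower().split():
        v[hash(w) % 512] += 1.0
    e = v @ PROJ
    return e / np.linalg.norm(e)

def edit_direction(src_prompt, tgt_prompt):
    """The CLIP-space delta encoding the requested edit. DeltaEdit learns to
    map such deltas to latent directions of a generator; that mapping
    network is omitted here."""
    d = encode_text(tgt_prompt) - encode_text(src_prompt)
    return d / np.linalg.norm(d)

d = edit_direction("a photo of a face", "a photo of a smiling face")
print(d.shape, round(float(np.linalg.norm(d)), 3))   # (64,) 1.0
```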

SegLoc: Visual Self-supervised Learning Scheme for Dense Prediction Tasks of Security Inspection X-ray Images

  • paper_url: http://arxiv.org/abs/2310.08421
  • repo_url: None
  • paper_authors: Shervin Halat, Mohammad Rahmati, Ehsan Nazerfard
  • for: To improve dense prediction on security-inspection X-ray images.
  • methods: Proposes Segmentation Localization (SegLoc), a visual self-supervised learning scheme built upon Instance Localization (InsLoc); the pretraining dataset is synthesized by cutting, transforming, and pasting labeled segments from a labeled dataset (PIDray) onto instances of an unlabeled dataset (SIXray), and false negative pairs are avoided by integrating one queue per class into the MoCo-v2 memory bank.
  • results: Outperforms random initialization by 3% to 6% on AR and AP metrics at different IoU values over 20 to 30 pretraining epochs, while underperforming supervised initialization.
    Abstract Lately, remarkable advancements of artificial intelligence have been attributed to the integration of self-supervised learning (SSL) scheme. Despite impressive achievements within natural language processing (NLP), SSL in computer vision has not been able to stay on track comparatively. Recently, integration of contrastive learning on top of existing visual SSL models has established considerable progress, thereby being able to outperform supervised counterparts. Nevertheless, the improvements were mostly limited to classification tasks; moreover, few studies have evaluated visual SSL models in real-world scenarios, while the majority considered datasets containing class-wise portrait images, notably ImageNet. Thus, here, we have considered dense prediction tasks on security inspection x-ray images to evaluate our proposed model Segmentation Localization (SegLoc). Based upon the model Instance Localization (InsLoc), our model has managed to address one of the most challenging downsides of contrastive learning, i.e., false negative pairs of query embeddings. To do so, our pre-training dataset is synthesized by cutting, transforming, then pasting labeled segments, as foregrounds, from an already existing labeled dataset (PIDray) onto instances, as backgrounds, of an unlabeled dataset (SIXray); further, we fully harness the labels through integration of the notion, one queue per class, into MoCo-v2 memory bank, avoiding false negative pairs. Regarding the task in question, our approach has outperformed the random initialization method by 3% to 6%, while having underperformed supervised initialization, in AR and AP metrics at different IoU values for 20 to 30 pre-training epochs.
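
The data-synthesis step, cutting labeled segments from PIDray and pasting them as foregrounds onto unlabeled SIXray backgrounds, reduces to mask-based compositing. A minimal sketch on toy arrays (the transformation step is omitted):

```python
import numpy as np

def paste_segment(background, foreground, mask, top, left):
    """Composite a labeled foreground segment onto an unlabeled background,
    returning the synthesized image and its transplanted mask. Arrays are
    HxWx3 uint8; `mask` is an HxW boolean for the segment."""
    out = background.copy()
    h, w = mask.shape
    region = out[top:top + h, left:left + w]
    region[mask] = foreground[mask]            # write through the view
    new_mask = np.zeros(background.shape[:2], dtype=bool)
    new_mask[top:top + h, left:left + w] = mask
    return out, new_mask

bg = np.zeros((64, 64, 3), dtype=np.uint8)               # unlabeled backdrop
fg = np.full((16, 16, 3), 255, dtype=np.uint8)           # labeled segment
mask = np.zeros((16, 16), dtype=bool)
mask[4:12, 4:12] = True
img, m = paste_segment(bg, fg, mask, top=10, left=20)
print(int(m.sum()), int(img.max()))   # 64 pasted pixels, max intensity 255
```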

Jailbreaking Black Box Large Language Models in Twenty Queries

  • paper_url: http://arxiv.org/abs/2310.08419
  • repo_url: https://github.com/patrickrchao/jailbreakingllms
  • paper_authors: Patrick Chao, Alexander Robey, Edgar Dobriban, Hamed Hassani, George J. Pappas, Eric Wong
  • for: To probe the alignment of large language models (LLMs) with human values by identifying jailbreak vulnerabilities.
  • methods: Proposes Prompt Automatic Iterative Refinement (PAIR), which uses an attacker LLM to automatically generate semantic jailbreaks for a separate target LLM with only black-box access and no human intervention.
  • results: PAIR often requires fewer than twenty queries to produce a jailbreak, orders of magnitude more efficient than existing algorithms, and achieves competitive jailbreaking success rates and transferability on open- and closed-source LLMs, including GPT-3.5/4, Vicuna, and PaLM-2.
    Abstract There is growing interest in ensuring that large language models (LLMs) align with human values. However, the alignment of such models is vulnerable to adversarial jailbreaks, which coax LLMs into overriding their safety guardrails. The identification of these vulnerabilities is therefore instrumental in understanding inherent weaknesses and preventing future misuse. To this end, we propose Prompt Automatic Iterative Refinement (PAIR), an algorithm that generates semantic jailbreaks with only black-box access to an LLM. PAIR -- which is inspired by social engineering attacks -- uses an attacker LLM to automatically generate jailbreaks for a separate targeted LLM without human intervention. In this way, the attacker LLM iteratively queries the target LLM to update and refine a candidate jailbreak. Empirically, PAIR often requires fewer than twenty queries to produce a jailbreak, which is orders of magnitude more efficient than existing algorithms. PAIR also achieves competitive jailbreaking success rates and transferability on open and closed-source LLMs, including GPT-3.5/4, Vicuna, and PaLM-2.

Tightening Bounds on Probabilities of Causation By Merging Datasets

  • paper_url: http://arxiv.org/abs/2310.08406
  • repo_url: None
  • paper_authors: Numair Sani, Atalanti A. Mastakouri
  • for: To provide symbolic bounds on the Probabilities of Causation (PoC) in the challenging scenario where multiple datasets with different treatment assignment mechanisms are available.
  • methods: Assuming causal sufficiency, combines either two randomized experiments studying different treatments, or a randomized experiment and an observational study, to derive symbolic bounds on the PoC.
  • results: The bounds hold for arbitrary dimensionality of covariates and treatment, parameterize differences in treatment assignment mechanisms so that causal information can be transferred from the external dataset to the target dataset, and the paper discusses the conditions under which they are tighter than existing bounds in the literature.
    Abstract Probabilities of Causation (PoC) play a fundamental role in decision-making in law, health care and public policy. Nevertheless, their point identification is challenging, requiring strong assumptions, in the absence of which only bounds can be derived. Existing work to further tighten these bounds by leveraging extra information either provides numerical bounds, symbolic bounds for fixed dimensionality, or requires access to multiple datasets that contain the same treatment and outcome variables. However, in many clinical, epidemiological and public policy applications, there exist external datasets that examine the effect of different treatments on the same outcome variable, or study the association between covariates and the outcome variable. These external datasets cannot be used in conjunction with the aforementioned bounds, since the former may entail different treatment assignment mechanisms, or even obey different causal structures. Here, we provide symbolic bounds on the PoC for this challenging scenario. We focus on combining either two randomized experiments studying different treatments, or a randomized experiment and an observational study, assuming causal sufficiency. Our symbolic bounds work for arbitrary dimensionality of covariates and treatment, and we discuss the conditions under which these bounds are tighter than existing bounds in literature. Finally, our bounds parameterize the difference in treatment assignment mechanism across datasets, allowing the mechanisms to vary across datasets while still allowing causal information to be transferred from the external dataset to the target dataset.
    摘要 因果概率(Probabilities of Causation, PoC)在法律、医疗和公共政策的决策中发挥着基础性作用。然而,其点识别具有挑战性,需要很强的假设;在缺乏这些假设时只能推导出界(bounds)。现有利用额外信息收紧这些界的工作,要么只给出数值界,要么只针对固定维度给出符号界,要么要求能够访问多个包含相同处理变量和结果变量的数据集。然而,在许多临床、流行病学和公共政策应用中,存在这样的外部数据集:它们研究不同处理对同一结果变量的影响,或研究协变量与结果变量之间的关联。这些外部数据集无法与上述界结合使用,因为它们可能采用不同的处理分配机制,甚至遵循不同的因果结构。在本文中,我们为这一具有挑战性的场景给出了 PoC 的符号界。我们聚焦于在因果充分性(causal sufficiency)假设下,合并两个研究不同处理的随机实验,或合并一个随机实验与一个观察性研究。我们的符号界适用于任意维度的协变量和处理,并讨论了这些界在何种条件下比文献中已有的界更紧。最后,我们的界对各数据集间处理分配机制的差异进行了参数化,允许机制在数据集间变化,同时仍能将因果信息从外部数据集迁移到目标数据集。
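
As background for readers new to bounds on probabilities of causation: the classical single-dataset starting point, which the merged-dataset bounds above aim to tighten, is the Tian-Pearl bound on the probability of necessity and sufficiency (PNS) from experimental data. A hedged restatement (our notation, not the paper's):

```latex
% Classical experimental-data bounds on PNS (Tian & Pearl), shown as
% background only; the paper derives symbolic bounds for the harder
% setting of two merged datasets under causal sufficiency.
\[
\max\bigl\{\,0,\; P(y_x) - P(y_{x'})\,\bigr\}
\;\le\; \mathrm{PNS} \;\le\;
\min\bigl\{\, P(y_x),\; P(y'_{x'})\,\bigr\}
\]
```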

Performance/power assessment of CNN packages on embedded automotive platforms

  • paper_url: http://arxiv.org/abs/2310.08401
  • repo_url: None
  • paper_authors: Paolo Burgio, Gianluca Brilli
  • for: This paper aims to support engineers in choosing the most appropriate convolutional neural network (CNN) package and computing system for their autonomous driving designs, and to derive guidelines for adequately sizing their systems.
  • methods: The paper validates the effectiveness and efficiency of recent CNN networks on state-of-the-art platforms with embedded commercial-off-the-shelf System-on-Chips (SoCs), including the NVIDIA Xavier AGX, Tegra X2 and Nano, and the XCZU9EG and XCZU3EG of the Xilinx Zynq UltraScale+ family.
  • results: The paper provides guidelines for engineers to choose the most appropriate CNN package and computing system for their designs, based on the performance and power consumption of the SoCs.
    Abstract The rise of power-efficient embedded computers based on highly-parallel accelerators opens a number of opportunities and challenges for researchers and engineers, and paved the way to the era of edge computing. At the same time, advances in embedded AI for object detection and categorization such as YOLO, GoogleNet and AlexNet reached an unprecedented level of accuracy (mean-Average Precision - mAP) and performance (Frames-Per-Second - FPS). Today, edge computers based on heterogeneous many-core systems are a predominant choice to deploy such systems in industry 4.0, wearable devices, and - our focus - autonomous driving systems. In these latter systems, engineers struggle to make reduced automotive power and size budgets co-exist with the accuracy and performance targets requested by autonomous driving. We aim at validating the effectiveness and efficiency of most recent networks on state-of-the-art platforms with embedded commercial-off-the-shelf System-on-Chips, such as Xavier AGX, Tegra X2 and Nano for NVIDIA and XCZU9EG and XCZU3EG of the Zynq UltraScale+ family, for the Xilinx counterpart. Our work aims at supporting engineers in choosing the most appropriate CNN package and computing system for their designs, and deriving guidelines for adequately sizing their systems.
    摘要 基于高度并行加速器的低功耗嵌入式计算机的兴起,为研究人员和工程师带来了许多机遇与挑战,并开启了边缘计算的时代。与此同时,用于目标检测与分类的嵌入式AI(如YOLO、GoogleNet和AlexNet)在精度(mean-Average Precision, mAP)和性能(Frames-Per-Second, FPS)方面达到了前所未有的水平。如今,基于异构众核系统的边缘计算机已成为在工业4.0、可穿戴设备以及(本文关注的)自动驾驶系统中部署此类系统的主流选择。在自动驾驶系统中,工程师需要在有限的车载功耗与尺寸预算下,同时满足自动驾驶所要求的精度与性能指标。我们旨在验证最新的网络在搭载商用现成(COTS)SoC的先进平台上的效果与效率,这些平台包括NVIDIA的Xavier AGX、Tegra X2和Nano,以及Xilinx Zynq UltraScale+系列的XCZU9EG和XCZU3EG。我们的工作旨在帮助工程师为其设计选择最合适的卷积神经网络(CNN)软件包与计算系统,并推导出合理确定系统规模的指南。

Prompting Large Language Models with Chain-of-Thought for Few-Shot Knowledge Base Question Generation

  • paper_url: http://arxiv.org/abs/2310.08395
  • repo_url: None
  • paper_authors: Yuanyuan Liang, Jianing Wang, Hanlun Zhu, Lei Wang, Weining Qian, Yunshi Lan
  • for: 本研究旨在提出一种基于大语言模型的少样本知识库问题生成(KBQG)方法,以缓解现有KBQG方法对大规模标注数据的依赖。
  • methods: 我们提出了一种基于链式思维(Chain-of-Thought)的提示方法KQG-CoT:首先从无标注数据池中根据逻辑形式的特征检索支持性逻辑形式,然后基于所选示例撰写显式展示推理链的提示;并进一步将其扩展为KQG-CoT+,按复杂度对逻辑形式排序以保证提示质量。
  • results: 我们在三个公开KBQG数据集上进行了广泛的实验,结果表明我们的提示方法在各评测数据集上始终优于其他提示基线。特别地,KQG-CoT+在PathQuestions数据集上分别以18.25、10.72和10.18个绝对点的优势在BLEU-4、METEOR和ROUGE-L上超越了现有的少样本SoTA结果。
    Abstract The task of Question Generation over Knowledge Bases (KBQG) aims to convert a logical form into a natural language question. For the sake of expensive cost of large-scale question annotation, the methods of KBQG under low-resource scenarios urgently need to be developed. However, current methods heavily rely on annotated data for fine-tuning, which is not well-suited for few-shot question generation. The emergence of Large Language Models (LLMs) has shown their impressive generalization ability in few-shot tasks. Inspired by Chain-of-Thought (CoT) prompting, which is an in-context learning strategy for reasoning, we formulate KBQG task as a reasoning problem, where the generation of a complete question is splitted into a series of sub-question generation. Our proposed prompting method KQG-CoT first retrieves supportive logical forms from the unlabeled data pool taking account of the characteristics of the logical form. Then, we write a prompt to explicit the reasoning chain of generating complicated questions based on the selected demonstrations. To further ensure prompt quality, we extend KQG-CoT into KQG-CoT+ via sorting the logical forms by their complexity. We conduct extensive experiments over three public KBQG datasets. The results demonstrate that our prompting method consistently outperforms other prompting baselines on the evaluated datasets. Remarkably, our KQG-CoT+ method could surpass existing few-shot SoTA results of the PathQuestions dataset by 18.25, 10.72, and 10.18 absolute points on BLEU-4, METEOR, and ROUGE-L, respectively.
    摘要 知识库问题生成(KBQG)任务旨在将逻辑形式转换为自然语言问题。由于大规模问题标注成本高昂,低资源场景下的KBQG方法亟需开发。然而,现有方法严重依赖标注数据进行微调,不适用于少样本问题生成。大语言模型(LLM)的出现展示了其在少样本任务中出色的泛化能力。受链式思维(CoT)提示这一面向推理的上下文学习策略的启发,我们将KBQG任务建模为一个推理问题,其中完整问题的生成被拆分为一系列子问题的生成。我们提出的提示方法KQG-CoT首先考虑逻辑形式的特征,从无标注数据池中检索支持性逻辑形式;随后撰写提示,显式展示基于所选示例生成复杂问题的推理链。为进一步保证提示质量,我们按复杂度对逻辑形式排序,将KQG-CoT扩展为KQG-CoT+。我们在三个公开KBQG数据集上进行了广泛实验,结果表明我们的提示方法在各评测数据集上始终优于其他提示基线。特别地,KQG-CoT+在PathQuestions数据集上分别以18.25、10.72和10.18个绝对点的优势在BLEU-4、METEOR和ROUGE-L上超越了现有的少样本SoTA结果。
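
To make the prompting strategy concrete, here is a minimal sketch of KQG-CoT(+)-style prompt assembly. The demonstrations, the complexity proxy, and the instruction wording are illustrative assumptions, not the paper's exact retrieval procedure or templates.

```python
# Sketch of KQG-CoT(+)-style prompt assembly: pick supporting logical-form
# demonstrations, order them by a complexity proxy (the "+" variant), and
# chain them into one few-shot prompt.

def complexity(logical_form: str) -> int:
    # crude proxy: number of nested predicates in the logical form
    return logical_form.count("(")

def build_prompt(demos: list[tuple[str, str]], query_lf: str) -> str:
    # KQG-CoT+: sort demonstrations from simple to complex
    demos = sorted(demos, key=lambda d: complexity(d[0]))
    lines = ["Generate a natural language question for each logical form,",
             "reasoning step by step from sub-questions to the full question.\n"]
    for lf, question in demos:
        lines.append(f"Logical form: {lf}\nQuestion: {question}\n")
    lines.append(f"Logical form: {query_lf}\nQuestion:")
    return "\n".join(lines)

demos = [
    ("(capital_of (country Germany))", "What is the capital of Germany?"),
    ("(population (capital_of (country Germany)))",
     "What is the population of the capital of Germany?"),
]
print(build_prompt(demos, "(area (capital_of (country France)))"))
```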

Towards Better Evaluation of Instruction-Following: A Case-Study in Summarization

  • paper_url: http://arxiv.org/abs/2310.08394
  • repo_url: None
  • paper_authors: Ondrej Skopek, Rahul Aralikatte, Sian Gooding, Victor Carbune
  • for: 这篇论文旨在评估大型语言模型(LLM)遵循用户指令的能力。
  • methods: 论文对多种量化LLM指令遵循能力的评估方法(包括基于提示的方法)进行了元评估。
  • results: 研究发现,新提出的基于LLM的无参考评估方法优于已有基线,并与需要高质量摘要的昂贵的基于参考的指标表现相当。
    Abstract Despite recent advances, evaluating how well large language models (LLMs) follow user instructions remains an open problem. While evaluation methods of language models have seen a rise in prompt-based approaches, limited work on the correctness of these methods has been conducted. In this work, we perform a meta-evaluation of a variety of metrics to quantify how accurately they measure the instruction-following abilities of LLMs. Our investigation is performed on grounded query-based summarization by collecting a new short-form, real-world dataset riSum, containing 300 document-instruction pairs with 3 answers each. All 900 answers are rated by 3 human annotators. Using riSum, we analyze the agreement between evaluation methods and human judgment. Finally, we propose new LLM-based reference-free evaluation methods that improve upon established baselines and perform on par with costly reference-based metrics that require high-quality summaries.
    摘要 尽管近来取得了进展,评估大语言模型(LLM)遵循用户指令的程度仍是一个悬而未决的问题。虽然基于提示的语言模型评估方法日益增多,但对这些方法本身正确性的研究仍然有限。在本工作中,我们对多种指标进行了元评估,以量化它们衡量LLM指令遵循能力的准确程度。我们的研究基于有依据的查询式摘要任务:我们收集了一个新的短文本真实场景数据集riSum,包含300个文档-指令对,每对配有3个答案,全部900个答案均由3名人工标注员评分。借助riSum,我们分析了各评估方法与人工判断之间的一致性。最后,我们提出了新的基于LLM的无参考评估方法,其优于已有基线,并与需要高质量摘要的昂贵的基于参考的指标表现相当。
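
The agreement analysis that riSum enables reduces to correlating metric scores with human ratings. A small sketch with made-up scores (on riSum itself, each of the 900 answers carries three human ratings):

```python
# Sketch of the kind of meta-evaluation riSum enables: measure how well an
# automatic instruction-following metric agrees with human judgments.
# Scores below are invented placeholders.
from scipy.stats import kendalltau, spearmanr

human = [3.0, 1.0, 2.5, 4.5, 2.0, 5.0]          # mean of 3 annotator ratings per answer
metric = [0.62, 0.20, 0.55, 0.90, 0.35, 0.88]   # reference-free metric scores

rho, _ = spearmanr(human, metric)
tau, _ = kendalltau(human, metric)
print(f"Spearman rho = {rho:.3f}, Kendall tau = {tau:.3f}")
```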

Do Not Marginalize Mechanisms, Rather Consolidate!

  • paper_url: http://arxiv.org/abs/2310.08377
  • repo_url: None
  • paper_authors: Moritz Willig, Matej Zečević, Devendra Singh Dhami, Kristian Kersting
  • for: 本研究旨在开发一种能够简化大规模结构因果模型(SCM)的方法,以便更好地理解这类系统中复杂的因果关系。
  • methods: 本研究提出了一种整合(consolidate)因果机制的方法,可以在保持一致的干预行为的同时,将大规模SCM变换为更简单的模型。
  • results: 研究表明,整合可以大幅降低计算复杂度,同时保持SCM干预行为的一致性;此外,研究还对整合后SCM的泛化能力给出了展望。
    Abstract Structural causal models (SCMs) are a powerful tool for understanding the complex causal relationships that underlie many real-world systems. As these systems grow in size, the number of variables and complexity of interactions between them does, too. Thus, becoming convoluted and difficult to analyze. This is particularly true in the context of machine learning and artificial intelligence, where an ever increasing amount of data demands for new methods to simplify and compress large scale SCM. While methods for marginalizing and abstracting SCM already exist today, they may destroy the causality of the marginalized model. To alleviate this, we introduce the concept of consolidating causal mechanisms to transform large-scale SCM while preserving consistent interventional behaviour. We show consolidation is a powerful method for simplifying SCM, discuss reduction of computational complexity and give a perspective on generalizing abilities of consolidated SCM.
    摘要 结构因果模型(SCM)是理解许多现实世界系统背后复杂因果关系的有力工具。随着这些系统规模的增长,变量数量及其相互作用的复杂度也随之增加,模型因而变得盘根错节、难以分析。在机器学习与人工智能领域尤其如此:不断增长的数据量要求有新的方法来简化和压缩大规模SCM。虽然目前已有对SCM进行边缘化和抽象的方法,但它们可能破坏边缘化后模型的因果性。为缓解这一问题,我们引入了因果机制整合(consolidation)的概念,在保持一致的干预行为的同时对大规模SCM进行变换。我们展示了整合是简化SCM的有力方法,讨论了计算复杂度的降低,并对整合后SCM的泛化能力给出了展望。

MCU: A Task-centric Framework for Open-ended Agent Evaluation in Minecraft

  • paper_url: http://arxiv.org/abs/2310.08367
  • repo_url: https://github.com/craftjarvis/mcu
  • paper_authors: Haowei Lin, Zihao Wang, Jianzhu Ma, Yitao Liang
  • for: 本研究旨在开发一个开放式 Minecraft 代理人,因此提出了一个以任务为中心的框架(MCU)用于评估 Minecraft 代理人。
  • methods: 本研究使用了MCU框架,其基于atom任务作为基本建构件,可以生成多种多样的任务。每个任务都有六个不同的困难度分数(时间消耗、运作努力、规划复杂度、细节、创新、新颖),这些分数可以从不同的角度评估代理人的能力。
  • results: 研究发现MCU框架具有高表达力,能够覆盖所有在latest literature中使用的 Minecraft 代理人任务。此外,研究还发现了代理人开发中的一些挑战,如创新、精准控制和out-of-distribution总结。
    Abstract To pursue the goal of creating an open-ended agent in Minecraft, an open-ended game environment with unlimited possibilities, this paper introduces a task-centric framework named MCU for Minecraft agent evaluation. The MCU framework leverages the concept of atom tasks as fundamental building blocks, enabling the generation of diverse or even arbitrary tasks. Within the MCU framework, each task is measured with six distinct difficulty scores (time consumption, operational effort, planning complexity, intricacy, creativity, novelty). These scores offer a multi-dimensional assessment of a task from different angles, and thus can reveal an agent's capability on specific facets. The difficulty scores also serve as the feature of each task, which creates a meaningful task space and unveils the relationship between tasks. For efficient evaluation of Minecraft agents employing the MCU framework, we maintain a unified benchmark, namely SkillForge, which comprises representative tasks with diverse categories and difficulty distribution. We also provide convenient filters for users to select tasks to assess specific capabilities of agents. We show that MCU has the high expressivity to cover all tasks used in recent literature on Minecraft agent, and underscores the need for advancements in areas such as creativity, precise control, and out-of-distribution generalization under the goal of open-ended Minecraft agent development.
    摘要 为了实现在 Minecraft 中创造开放式的代理人,这篇论文提出了一个名为 MCU 的任务中心框架,用于评估 Minecraft 代理人的能力。MCU 框架以原子任务作为基本构建件,可以生成多种多样的任务。在 MCU 框架中,每个任务都有六种不同的难度分数(时间消耗、操作努力、计划复杂度、细节、创造力、新颖性)。这些分数可以从不同的角度评估一个任务的难度,从而揭示代理人在特定方面的能力。难度分数还构成每个任务的特征,形成了一个有意义的任务空间,并揭示了任务之间的关系。为了高效地使用 MCU 框架评估 Minecraft 代理人,我们维护了一个统一的基准套件,称为 SkillForge,该套件包含类型多样、难度分布广泛的代表性任务。我们还提供了便捷的筛选工具,以便用户选择任务来评估代理人的特定能力。我们发现 MCU 框架具有很强的表达能力,可以覆盖最近 Minecraft 代理人研究中使用的所有任务,并强调了在开放式 Minecraft 代理人发展目标下,创造力、精准控制和分布外泛化等领域亟需进一步发展。
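
One natural way to realize the task-centric representation is a small record type whose six difficulty scores double as a feature vector over the task space. The sketch below uses invented values; only the six score names follow the description above.

```python
# Sketch of MCU's task-centric representation: each atom task carries a
# six-dimensional difficulty vector, which doubles as a feature for
# defining a task space and filtering benchmarks.
from dataclasses import dataclass

@dataclass
class AtomTask:
    name: str
    time_consumption: float
    operational_effort: float
    planning_complexity: float
    intricacy: float
    creativity: float
    novelty: float

    def difficulty_vector(self) -> tuple:
        return (self.time_consumption, self.operational_effort,
                self.planning_complexity, self.intricacy,
                self.creativity, self.novelty)

tasks = [
    AtomTask("mine_diamond", 0.9, 0.7, 0.8, 0.6, 0.2, 0.3),
    AtomTask("build_snowman", 0.4, 0.5, 0.6, 0.7, 0.9, 0.8),
]
# e.g. select creativity-heavy tasks to probe that specific capability
creative = [t.name for t in tasks if t.creativity >= 0.8]
print(creative)
```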

2SFGL: A Simple And Robust Protocol For Graph-Based Fraud Detection

  • paper_url: http://arxiv.org/abs/2310.08335
  • repo_url: None
  • paper_authors: Zhirui Pan, Guangzhong Wang, Zhaoning Li, Lifeng Chen, Yang Bian, Zhongyuan Lai
  • for: 提高金融安全性和效率,避免金融犯罪者逃脱检测
  • methods: 联邦学习(FL)和虚拟图谱融合
  • results: 在常见的欺诈检测任务上,将GCN与2SFGL框架结合使用相比仅使用FedAvg,在多项典型指标上性能提升17.6%-30.2%;将GraphSAGE与2SFGL结合使用相比仅使用FedAvg,性能提升6%-16.2%。
    Abstract Financial crime detection using graph learning improves financial safety and efficiency. However, criminals may commit financial crimes across different institutions to avoid detection, which increases the difficulty of detection for financial institutions which use local data for graph learning. As most financial institutions are subject to strict regulations in regards to data privacy protection, the training data is often isolated and conventional learning technology cannot handle the problem. Federated learning (FL) allows multiple institutions to train a model without revealing their datasets to each other, hence ensuring data privacy protection. In this paper, we proposes a novel two-stage approach to federated graph learning (2SFGL): The first stage of 2SFGL involves the virtual fusion of multiparty graphs, and the second involves model training and inference on the virtual graph. We evaluate our framework on a conventional fraud detection task based on the FraudAmazonDataset and FraudYelpDataset. Experimental results show that integrating and applying a GCN (Graph Convolutional Network) with our 2SFGL framework to the same task results in a 17.6\%-30.2\% increase in performance on several typical metrics compared to the case only using FedAvg, while integrating GraphSAGE with 2SFGL results in a 6\%-16.2\% increase in performance compared to the case only using FedAvg. We conclude that our proposed framework is a robust and simple protocol which can be simply integrated to pre-existing graph-based fraud detection methods.
    摘要 利用图学习进行金融犯罪检测可以提升金融安全与效率。然而,犯罪者可能跨机构实施金融犯罪以逃避检测,这增加了依赖本地数据进行图学习的金融机构的检测难度。由于大多数金融机构在数据隐私保护方面受到严格监管,训练数据通常相互隔离,传统的学习技术无法处理这一问题。联邦学习(FL)允许多个机构在不向彼此暴露数据集的情况下共同训练模型,从而保障数据隐私。在本文中,我们提出了一种新颖的两阶段联邦图学习方法(2SFGL):第一阶段对多方图进行虚拟融合,第二阶段在虚拟图上进行模型训练与推理。我们在基于FraudAmazonDataset和FraudYelpDataset的常规欺诈检测任务上评估了该框架。实验结果表明,将图卷积网络(GCN)与2SFGL框架结合应用于同一任务,相比仅使用FedAvg,在多项典型指标上性能提升17.6%-30.2%;将GraphSAGE与2SFGL结合,相比仅使用FedAvg,性能提升6%-16.2%。我们的结论是,所提框架是一种稳健且简单的协议,可以方便地集成到已有的基于图的欺诈检测方法中。
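
The two stages map onto two small routines: block-diagonal fusion of party-local adjacency matrices into a virtual graph, and FedAvg-style weight averaging. This is a toy sketch with synthetic data; in the actual protocol only model updates are exchanged, and cross-party edges would come from shared entity identifiers rather than being absent as here.

```python
# Sketch of the two stages of 2SFGL: (1) virtually fuse party-local graphs
# into one block-diagonal adjacency, then (2) train with FedAvg-style
# weight averaging.
import numpy as np

def virtual_fusion(adjacencies: list[np.ndarray]) -> np.ndarray:
    n = sum(a.shape[0] for a in adjacencies)
    fused = np.zeros((n, n))
    off = 0
    for a in adjacencies:
        k = a.shape[0]
        fused[off:off + k, off:off + k] = a  # place each local graph on the diagonal
        off += k
    return fused

def fedavg(party_weights: list[np.ndarray], sizes: list[int]) -> np.ndarray:
    total = sum(sizes)
    return sum(w * (s / total) for w, s in zip(party_weights, sizes))

a1 = np.eye(3); a2 = np.eye(2)
print(virtual_fusion([a1, a2]).shape)           # (5, 5)
w = fedavg([np.ones(4), np.zeros(4)], [30, 10])
print(w)                                        # data-size weighted average
```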

Transport-Hub-Aware Spatial-Temporal Adaptive Graph Transformer for Traffic Flow Prediction

  • paper_url: http://arxiv.org/abs/2310.08328
  • repo_url: https://github.com/fantasy-shaw/h-stformer
  • paper_authors: Xiao Xu, Lei Zhang, Bailong Liu, Zhizhen Liang, Xuefei Zhang
  • for: 这篇论文旨在提出一种交通流量预测方法,以解决现有方法未充分利用交通流量数据内在特性以及增量学习方面的问题。
  • methods: 该方法基于枢纽感知的时空自适应图Transformer(H-STFormer),包括一个捕捉动态空间依赖的新型空间自注意模块(其中融合三个图掩码矩阵以突出短期与长期依赖)、一个检测动态时间模式的时间自注意模块,以及一个用于增量学习的时空知识蒸馏模块。
  • results: 经过广泛的实验,该方法在正常和增量交通流量预测任务中表现出色,能够更好地利用交通流量数据的特性和增量学习知识。
    Abstract As a core technology of Intelligent Transportation System (ITS), traffic flow prediction has a wide range of applications. Traffic flow data are spatial-temporal, which are not only correlated to spatial locations in road networks, but also vary with temporal time indices. Existing methods have solved the challenges in traffic flow prediction partly, focusing on modeling spatial-temporal dependencies effectively, while not all intrinsic properties of traffic flow data are utilized fully. Besides, there are very few attempts at incremental learning of spatial-temporal data mining, and few previous works can be easily transferred to the traffic flow prediction task. Motivated by the challenge of incremental learning methods for traffic flow prediction and the underutilization of intrinsic properties of road networks, we propose a Transport-Hub-aware Spatial-Temporal adaptive graph transFormer (H-STFormer) for traffic flow prediction. Specifically, we first design a novel spatial self-attention module to capture the dynamic spatial dependencies. Three graph masking matrices are integrated into spatial self-attentions to highlight both short- and long-term dependences. Additionally, we employ a temporal self-attention module to detect dynamic temporal patterns in the traffic flow data. Finally, we design an extra spatial-temporal knowledge distillation module for incremental learning of traffic flow prediction tasks. Through extensive experiments, we show the effectiveness of H-STFormer in normal and incremental traffic flow prediction tasks. The code is available at https://github.com/Fantasy-Shaw/H-STFormer.
    摘要 作为智能交通系统(ITS)的核心技术之一,交通流量预测具有广泛的应用。交通流量数据具有时空特性:它不仅与路网中的空间位置相关,还随时间索引而变化。现有方法部分解决了交通流量预测中的挑战,侧重于有效建模时空依赖,但并未充分利用交通流量数据的全部内在特性。此外,针对时空数据挖掘的增量学习尝试很少,已有工作也难以直接迁移到交通流量预测任务。受交通流量预测的增量学习挑战以及路网内在特性未被充分利用这一现状的驱动,我们提出了一种枢纽感知的时空自适应图Transformer(H-STFormer)用于交通流量预测。具体而言,我们首先设计了一个新型空间自注意模块来捕捉动态空间依赖,并在其中融合三个图掩码矩阵以突出短期与长期依赖。此外,我们采用时间自注意模块来检测交通流量数据中的动态时间模式。最后,我们设计了一个额外的时空知识蒸馏模块,用于交通流量预测任务的增量学习。通过大量实验,我们证明了H-STFormer在常规与增量交通流量预测任务中的有效性。代码见 https://github.com/Fantasy-Shaw/H-STFormer。

CHIP: Contrastive Hierarchical Image Pretraining

  • paper_url: http://arxiv.org/abs/2310.08304
  • repo_url: None
  • paper_authors: Arpit Mittal, Harshil Jhaveri, Swapnil Mallick, Abhishek Ajmera
  • for: 这篇论文旨在提出一种少样本(few-shot)对象分类模型,用于将未见类别的对象划分到相对泛化的类别中。
  • methods: 该模型使用基于三级层次对比损失的ResNet152分类器,根据从图像嵌入中提取的特征对对象进行分类。
  • results: 训练完成后,模型能够较准确地将未见类别的对象划分到相对泛化的类别中,论文对这些结果进行了详细讨论。
    Abstract Few-shot object classification is the task of classifying objects in an image with limited number of examples as supervision. We propose a one-shot/few-shot classification model that can classify an object of any unseen class into a relatively general category in an hierarchically based classification. Our model uses a three-level hierarchical contrastive loss based ResNet152 classifier for classifying an object based on its features extracted from Image embedding, not used during the training phase. For our experimentation, we have used a subset of the ImageNet (ILSVRC-12) dataset that contains only the animal classes for training our model and created our own dataset of unseen classes for evaluating our trained model. Our model provides satisfactory results in classifying the unknown objects into a generic category which has been later discussed in greater detail.
    摘要 少样本对象分类是指在监督样本数量有限的情况下对图像中的对象进行分类的任务。我们提出了一种单样本/少样本分类模型,可以在层次化分类中将任意未见类别的对象划分到相对泛化的类别中。我们的模型使用基于三级层次对比损失的ResNet152分类器,根据从图像嵌入中提取的特征对对象进行分类,这些未见类别在训练阶段并未使用。在实验中,我们使用ImageNet(ILSVRC-12)数据集中仅包含动物类别的子集训练模型,并构建了自己的未见类别数据集来评估训练后的模型。我们的模型在将未知对象划分到泛化类别方面取得了令人满意的结果,后文对此进行了更详细的讨论。
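
One plausible reading of the three-level hierarchical contrastive objective is an InfoNCE-style term per hierarchy level, summed with weights that decay toward coarser levels. The weighting scheme and the toy embeddings below are assumptions, not the paper's exact setup.

```python
# Sketch of a three-level hierarchical contrastive objective in the spirit
# of CHIP: a supervised InfoNCE-style loss is computed at each level of the
# label hierarchy (e.g. species -> family -> "animal") and summed with weights.
import torch
import torch.nn.functional as F

def info_nce(emb: torch.Tensor, labels: torch.Tensor, temp: float = 0.1):
    emb = F.normalize(emb, dim=1)
    sim = emb @ emb.t() / temp                         # pairwise similarities
    mask = labels.unsqueeze(0) == labels.unsqueeze(1)  # positives share a label
    mask.fill_diagonal_(False)
    logits = sim - 1e9 * torch.eye(len(emb))           # drop self-similarity
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    pos_counts = mask.sum(1).clamp(min=1)
    return -(log_prob * mask).sum(1).div(pos_counts).mean()

def hierarchical_loss(emb, level_labels, weights=(1.0, 0.5, 0.25)):
    # level_labels: one label tensor per hierarchy level, fine to coarse
    return sum(w * info_nce(emb, y) for w, y in zip(weights, level_labels))

emb = torch.randn(8, 128)
fine = torch.tensor([0, 0, 1, 1, 2, 2, 3, 3])
mid = torch.tensor([0, 0, 0, 0, 1, 1, 1, 1])
coarse = torch.zeros(8, dtype=torch.long)
print(hierarchical_loss(emb, [fine, mid, coarse]).item())
```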

If our aim is to build morality into an artificial agent, how might we begin to go about doing so?

  • paper_url: http://arxiv.org/abs/2310.08295
  • repo_url: None
  • paper_authors: Reneira Seeamber, Cosmin Badea
  • for: 本研究旨在强调在AI中构建道德智能体的重要性,以及需要重点考虑的道德范式与挑战。
  • methods: 本文讨论了自上而下与自下而上的设计路径,并提出了混合式设计方法与层次化结合道德范式的方法。
  • results: 本研究提出了若干解决方案,包括混合式设计方法与层次化结合方法,并强调治理与政策在确保AI道德行为方面日益关键的作用。
    Abstract As Artificial Intelligence (AI) becomes pervasive in most fields, from healthcare to autonomous driving, it is essential that we find successful ways of building morality into our machines, especially for decision-making. However, the question of what it means to be moral is still debated, particularly in the context of AI. In this paper, we highlight the different aspects that should be considered when building moral agents, including the most relevant moral paradigms and challenges. We also discuss the top-down and bottom-up approaches to design and the role of emotion and sentience in morality. We then propose solutions including a hybrid approach to design and a hierarchical approach to combining moral paradigms. We emphasize how governance and policy are becoming ever more critical in AI Ethics and in ensuring that the tasks we set for moral agents are attainable, that ethical behavior is achieved, and that we obtain good AI.
    摘要 随着人工智能(AI)在从医疗到自动驾驶的各个领域的普及,为机器(尤其是其决策)构建道德能力变得至关重要。然而,"何为道德"这一问题仍存争议,在AI语境下尤其如此。在本文中,我们梳理了构建道德智能体时应考虑的各个方面,包括最相关的道德范式与挑战。我们还讨论了自上而下与自下而上的设计路径,以及情感与感知能力在道德中的作用。随后,我们提出了包括混合式设计方法与层次化结合道德范式在内的解决方案。我们强调,治理与政策在AI伦理中正变得日益关键,它们有助于确保我们为道德智能体设定的任务是可达成的、伦理行为得以实现,并最终获得良好的AI。

Concealed Electronic Countermeasures of Radar Signal with Adversarial Examples

  • paper_url: http://arxiv.org/abs/2310.08292
  • repo_url: None
  • paper_authors: Ruinan Ma, Canjie Zhu, Mingfeng Lu, Yunjie Li, Yu-an Tan, Ruibin Zhang, Ran Tao
  • for: 本研究旨在探讨基于AI技术的雷达信号电子干扰技术,以解决传统干扰技术的缺点,即干扰信号过于明显。
  • methods: 我们提出了时频图像场景下的雷达信号攻击管道以及具有高可迁移性的DITIMI-FGSM攻击算法;此外,我们提出了基于STFT的时域信号攻击方法(STDS),以解决时频分析中的不可逆问题,从而获得干扰信号的时域表示。
  • results: 我们通过大量实验发现,我们的攻击管道是可行的,并且提出的攻击方法具有高度成功率。
    Abstract Electronic countermeasures involving radar signals are an important aspect of modern warfare. Traditional electronic countermeasures techniques typically add large-scale interference signals to ensure interference effects, which can lead to attacks being too obvious. In recent years, AI-based attack methods have emerged that can effectively solve this problem, but the attack scenarios are currently limited to time domain radar signal classification. In this paper, we focus on the time-frequency images classification scenario of radar signals. We first propose an attack pipeline under the time-frequency images scenario and DITIMI-FGSM attack algorithm with high transferability. Then, we propose STFT-based time domain signal attack(STDS) algorithm to solve the problem of non-invertibility in time-frequency analysis, thus obtaining the time-domain representation of the interference signal. A large number of experiments show that our attack pipeline is feasible and the proposed attack method has a high success rate.
    摘要 现代战争中,涉及雷达信号的电子对抗技术非常重要。传统的电子对抗技术通常添加大规模干扰信号以确保干扰效果,这可能导致攻击过于明显。近年来出现的基于人工智能的攻击方法可以有效解决这一问题,但其攻击场景目前仅限于时域雷达信号分类。在这篇论文中,我们关注雷达信号的时频图像分类场景。我们首先提出了时频图像场景下的攻击管道和具有高可迁移性的DITIMI-FGSM攻击算法。然后,我们提出了基于STFT的时域信号攻击算法(STDS),以解决时频分析中的不可逆性问题,从而获得干扰信号的时域表示。大量实验表明,我们的攻击管道是可行的,并且所提出的攻击方法具有很高的成功率。
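
DITIMI-FGSM belongs to the momentum-iterative FGSM family. The sketch below shows only the plain momentum-iterative core applied to a toy classifier over time-frequency images; the diverse-input and translation-invariant components that give DITIMI-FGSM its transferability are omitted.

```python
# Sketch of momentum-iterative FGSM (the family DITIMI-FGSM builds on),
# applied to a toy classifier over spectrogram-like tensors.
import torch
import torch.nn as nn

def mi_fgsm(model, x, y, eps=8 / 255, steps=10, mu=1.0):
    alpha = eps / steps
    g = torch.zeros_like(x)
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = nn.functional.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        g = mu * g + grad / (grad.abs().mean() + 1e-12)  # accumulate momentum
        x_adv = (x_adv + alpha * g.sign()).detach()      # ascend the loss
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv

model = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32, 5))  # toy stand-in
x = torch.rand(4, 1, 32, 32)
y = torch.randint(0, 5, (4,))
x_adv = mi_fgsm(model, x, y)
print((x_adv - x).abs().max().item())  # perturbation bounded by eps
```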

Expanding the Vocabulary of BERT for Knowledge Base Construction

  • paper_url: http://arxiv.org/abs/2310.08291
  • repo_url: https://github.com/MaastrichtU-IDS/LMKBC-2023
  • paper_authors: Dong Yang, Xu Wang, Remzi Celebi
  • for: 本研究旨在利用语言模型构建知识库,具体针对国际语义网大会(ISWC)2023"基于预训练语言模型的知识库构建"挑战中的任务。
  • methods: 我们提出了词表可扩展的BERT(Vocabulary Expandable BERT),在扩展语言模型词表的同时保留新增词语的语义嵌入;并对掩码语言模型进行面向任务的再预训练,以进一步增强语言模型。
  • results: 实验表明我们的方法行之有效:在挑战提供的隐藏测试集和验证集上,F1分数分别达到0.323和0.362。值得注意的是,我们的框架使用轻量级语言模型(BERT-base,1.3亿参数),超过了直接在大语言模型(Chatgpt-3,1750亿参数)上使用提示的方法;此外,Token-Recode取得了与再预训练(Re-pretrain)相当的表现。这项研究使多词实体得以直接嵌入,从而推进了语言理解模型,是知识图谱链接预测与数据管理元数据补全任务的一大进步。
    Abstract Knowledge base construction entails acquiring structured information to create a knowledge base of factual and relational data, facilitating question answering, information retrieval, and semantic understanding. The challenge called "Knowledge Base Construction from Pretrained Language Models" at International Semantic Web Conference 2023 defines tasks focused on constructing knowledge base using language model. Our focus was on Track 1 of the challenge, where the parameters are constrained to a maximum of 1 billion, and the inclusion of entity descriptions within the prompt is prohibited. Although the masked language model offers sufficient flexibility to extend its vocabulary, it is not inherently designed for multi-token prediction. To address this, we present Vocabulary Expandable BERT for knowledge base construction, which expand the language model's vocabulary while preserving semantic embeddings for newly added words. We adopt task-specific re-pre-training on masked language model to further enhance the language model. Through experimentation, the results show the effectiveness of our approaches. Our framework achieves F1 score of 0.323 on the hidden test set and 0.362 on the validation set, both data set is provided by the challenge. Notably, our framework adopts a lightweight language model (BERT-base, 0.13 billion parameters) and surpasses the model using prompts directly on large language model (Chatgpt-3, 175 billion parameters). Besides, Token-Recode achieves comparable performances as Re-pretrain. This research advances language understanding models by enabling the direct embedding of multi-token entities, signifying a substantial step forward in link prediction task in knowledge graph and metadata completion in data management.
    摘要 知识库构建需要获取结构化信息,以建立包含事实与关系数据的知识库,从而支持问答、信息检索与语义理解。国际语义网大会(ISWC)2023的"基于预训练语言模型的知识库构建"挑战定义了一系列利用语言模型构建知识库的任务。我们关注挑战的第一赛道,其中模型参数不得超过10亿,且禁止在提示中包含实体描述。虽然掩码语言模型具有扩展词表的足够灵活性,但它并非天然为多词预测而设计。为此,我们提出了用于知识库构建的词表可扩展BERT(Vocabulary Expandable BERT),在扩展语言模型词表的同时保留新增词语的语义嵌入。我们还对掩码语言模型进行面向任务的再预训练,以进一步增强语言模型。实验结果表明了我们方法的有效性:在挑战提供的隐藏测试集和验证集上,F1分数分别达到0.323和0.362。值得注意的是,我们的框架采用轻量级语言模型(BERT-base,1.3亿参数),超过了直接在大语言模型(Chatgpt-3,1750亿参数)上使用提示的方法。此外,Token-Recode取得了与再预训练相当的表现。这项研究使多词实体得以直接嵌入,从而推进了语言理解模型,是知识图谱链接预测与数据管理元数据补全任务的一大进步。
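
A common recipe for expanding a BERT vocabulary while keeping semantics, in line with the idea above, is to add each entity as a whole token and initialize its embedding from the mean of its original subword embeddings. The initialization rule here is our reading of "preserving semantic embeddings for newly added words"; the paper's exact recipe may differ.

```python
# Sketch of vocabulary expansion for BERT with Hugging Face transformers:
# add whole-entity tokens, then initialize each new embedding row as the
# mean of the entity's original subword embeddings.
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

new_entities = ["semantic_web_conference", "knowledge_base_construction"]
for ent in new_entities:
    # subword ids of the original phrase, before the new token exists
    sub_ids = tokenizer(ent.replace("_", " "), add_special_tokens=False)["input_ids"]
    tokenizer.add_tokens([ent])
    model.resize_token_embeddings(len(tokenizer))
    with torch.no_grad():
        emb = model.get_input_embeddings().weight
        emb[-1] = emb[sub_ids].mean(dim=0)  # mean of original subword vectors

print(len(tokenizer))  # vocabulary grew by two entries
```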

CP-KGC: Constrained-Prompt Knowledge Graph Completion with Large Language Models

  • paper_url: http://arxiv.org/abs/2310.08279
  • repo_url: https://github.com/sjlmg/CP-KGC
  • paper_authors: Rui Yang, Li Fang, Yi Zhou
  • for: 这篇论文旨在利用已有知识来推理和补全知识图中缺失的连接。
  • methods: 基于文本的方法(如SimKGC)已优于图嵌入方法,但其效果受实体文本描述质量的限制。本文提出使用基于约束的提示,以实体及其文本描述作为上下文约束,来减少LLM生成文本中的幻觉问题、提升数据质量。
  • results: 本文的约束提示知识图补全(CP-KGC)方法在低资源计算条件下表现出有效的推理能力,并在WN18RR和FB15K237数据集上超越了先前的结果,展示了LLM与KGC任务的结合,为未来研究提供了新方向。
    Abstract Knowledge graph completion (KGC) aims to utilize existing knowledge to deduce and infer missing connections within knowledge graphs. Text-based approaches, like SimKGC, have outperformed graph embedding methods, showcasing the promise of inductive KGC. However, the efficacy of text-based methods hinges on the quality of entity textual descriptions. In this paper, we identify the key issue of whether large language models (LLMs) can generate effective text. To mitigate hallucination in LLM-generated text in this paper, we introduce a constraint-based prompt that utilizes the entity and its textual description as contextual constraints to enhance data quality. Our Constrained-Prompt Knowledge Graph Completion (CP-KGC) method demonstrates effective inference under low resource computing conditions and surpasses prior results on the WN18RR and FB15K237 datasets. This showcases the integration of LLMs in KGC tasks and provides new directions for future research.
    摘要 知识图补全(KGC)旨在利用已有知识推理和推断知识图中缺失的连接。基于文本的方法(如SimKGC)已超越图嵌入方法,展示了归纳式KGC的潜力。然而,基于文本的方法的效果取决于实体文本描述的质量。在本文中,我们指出关键问题在于大语言模型(LLM)能否生成有效的文本。为减少LLM生成文本中的幻觉,我们引入了一种基于约束的提示,以实体及其文本描述作为上下文约束来提升数据质量。我们的约束提示知识图补全(CP-KGC)方法在低资源计算条件下表现出有效的推理能力,并在WN18RR和FB15K237数据集上超越了先前的结果。这展示了LLM与KGC任务的结合,并为未来研究提供了新方向。

Lag-Llama: Towards Foundation Models for Time Series Forecasting

  • paper_url: http://arxiv.org/abs/2310.08278
  • repo_url: https://github.com/kashif/pytorch-transformer-ts
  • paper_authors: Kashif Rasul, Arjun Ashok, Andrew Robert Williams, Arian Khorasani, George Adamopoulos, Rishika Bhagwatkar, Marin Biloš, Hena Ghonia, Nadhir Vincent Hassen, Anderson Schneider, Sahil Garg, Alexandre Drouin, Nicolas Chapados, Yuriy Nevmyvaka, Irina Rish
  • for: 本论文旨在构建用于时间序列预测的基础模型,并研究此类模型的缩放行为。
  • methods: 该模型是一个通用的单变量概率时间序列预测模型,在大规模时间序列数据集合上训练。
  • results: 模型在未见过的"分布外"(out-of-distribution)时间序列数据集上展现出良好的零样本预测能力,优于有监督基线;并使用平滑断裂幂律(smoothly broken power-laws)拟合和预测模型的缩放行为。
    Abstract Aiming to build foundation models for time-series forecasting and study their scaling behavior, we present here our work-in-progress on Lag-Llama, a general-purpose univariate probabilistic time-series forecasting model trained on a large collection of time-series data. The model shows good zero-shot prediction capabilities on unseen "out-of-distribution" time-series datasets, outperforming supervised baselines. We use smoothly broken power-laws to fit and predict model scaling behavior. The open source code is made available at https://github.com/kashif/pytorch-transformer-ts.
    摘要 为构建时间序列预测的基础模型并研究其缩放行为,我们在此展示正在进行中的工作 Lag-Llama:一个在大规模时间序列数据集合上训练的通用单变量概率时间序列预测模型。该模型在未见过的"分布外"时间序列数据集上展现出良好的零样本预测能力,优于有监督基线。我们使用平滑断裂幂律来拟合和预测模型的缩放行为。开源代码可以在 https://github.com/kashif/pytorch-transformer-ts 上获取。
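
The "lag" in Lag-Llama refers to representing each time step by the series values at a fixed set of lags, which the transformer consumes as covariates. A minimal sketch of such lag-feature construction, with an illustrative lag set:

```python
# Sketch of lag-feature construction: each time step is represented by the
# values at a fixed set of lags (recent steps plus longer seasonal offsets).
# The lag set is illustrative, not the model's exact configuration.
import numpy as np

def lag_features(series: np.ndarray, lags=(1, 2, 3, 7, 14, 28)):
    t0 = max(lags)
    rows = [[series[t - l] for l in lags] for t in range(t0, len(series))]
    return np.array(rows), series[t0:]  # lag covariates and aligned targets

series = np.sin(np.arange(100) / 5.0)
X, y = lag_features(series)
print(X.shape, y.shape)  # (72, 6) (72,)
```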

Direction-Oriented Visual-semantic Embedding Model for Remote Sensing Image-text Retrieval

  • paper_url: http://arxiv.org/abs/2310.08276
  • repo_url: None
  • paper_authors: Qing Ma, Jiancheng Pan, Cong Bai
  • for: 提高Remote Sensing中的图像文本检索精度,解决视觉语义不匹配问题
  • methods: 提出一种新颖的方向导向视觉语义嵌入模型(DOVE),利用区域导向注意力模块(ROAM)和轻量级文本基因挖掘助手(DTGA)挖掘视觉与语义之间的关系。
  • results: 通过广泛的实验,包括参数评估、量化比较、拆除研究和视觉分析,证明方法的效果和优越性,在RSICD和RSITMD两个标准测试集上
    Abstract Image-text retrieval has developed rapidly in recent years. However, it is still a challenge in remote sensing due to visual-semantic imbalance, which leads to incorrect matching of non-semantic visual and textual features. To solve this problem, we propose a novel Direction-Oriented Visual-semantic Embedding Model (DOVE) to mine the relationship between vision and language. Concretely, a Regional-Oriented Attention Module (ROAM) adaptively adjusts the distance between the final visual and textual embeddings in the latent semantic space, oriented by regional visual features. Meanwhile, a lightweight Digging Text Genome Assistant (DTGA) is designed to expand the range of tractable textual representation and enhance global word-level semantic connections using less attention operations. Ultimately, we exploit a global visual-semantic constraint to reduce single visual dependency and serve as an external constraint for the final visual and textual representations. The effectiveness and superiority of our method are verified by extensive experiments including parameter evaluation, quantitative comparison, ablation studies and visual analysis, on two benchmark datasets, RSICD and RSITMD.
    摘要 图文检索近年来发展迅速,但在遥感领域仍面临视觉-语义失衡的挑战,它会导致非语义的视觉特征与文本特征的错误匹配。为解决这一问题,我们提出了一种新颖的方向导向视觉语义嵌入模型(DOVE),以挖掘视觉与语言之间的关系。具体而言,区域导向注意力模块(ROAM)在潜在语义空间中、以区域视觉特征为导向,自适应地调整最终视觉嵌入与文本嵌入之间的距离。同时,我们设计了一个轻量级的文本基因挖掘助手(DTGA),用更少的注意力操作扩大可处理的文本表示范围并增强全局词级语义联系。最后,我们利用全局视觉-语义约束来减少单一视觉依赖,并将其作为最终视觉与文本表示的外部约束。我们在RSICD和RSITMD两个基准数据集上进行了包括参数评估、定量比较、消融实验和可视化分析在内的大量实验,验证了我们方法的有效性与优越性。

Impact of Co-occurrence on Factual Knowledge of Large Language Models

  • paper_url: http://arxiv.org/abs/2310.08256
  • repo_url: https://github.com/cheongwoong/impact_of_cooccurrence
  • paper_authors: Cheongwoong Kang, Jaesik Choi
  • for: 本研究旨在探讨大语言模型(LLM)常常返回错误的原因,以及如何提高LLM的可靠性。
  • methods: 本研究使用了一种定量方法,通过分析LLM在不同预训练集中的表现,探讨LLM在回答问题时是否受到预训练数据中的偏见影响。
  • results: 研究发现,LLM受预训练数据中共现偏差的影响,倾向于选择频繁共现的词语而非正确答案,导致其难以回忆主语与宾语在预训练数据中很少共现的事实。研究还发现,无论扩大模型规模还是进行微调,共现偏差依然存在;因此,研究建议在过滤掉高共现样本的去偏数据集上进行微调,以缓解这种偏差。
    Abstract Large language models (LLMs) often make factually incorrect responses despite their success in various applications. In this paper, we hypothesize that relying heavily on simple co-occurrence statistics of the pre-training corpora is one of the main factors that cause factual errors. Our results reveal that LLMs are vulnerable to the co-occurrence bias, defined as preferring frequently co-occurred words over the correct answer. Consequently, LLMs struggle to recall facts whose subject and object rarely co-occur in the pre-training dataset although they are seen during finetuning. We show that co-occurrence bias remains despite scaling up model sizes or finetuning. Therefore, we suggest finetuning on a debiased dataset to mitigate the bias by filtering out biased samples whose subject-object co-occurrence count is high. Although debiased finetuning allows LLMs to memorize rare facts in the training set, it is not effective in recalling rare facts unseen during finetuning. Further research in mitigation will help build reliable language models by preventing potential errors. The code is available at \url{https://github.com/CheongWoong/impact_of_cooccurrence}.
    摘要 尽管大语言模型(LLM)在各类应用中取得成功,它们仍经常给出与事实不符的回答。在本文中,我们假设过度依赖预训练语料的简单共现统计是导致事实性错误的主要因素之一。我们的结果显示,LLM易受共现偏差的影响,即偏好频繁共现的词语而非正确答案。因此,对于主语和宾语在预训练数据集中很少共现的事实,即使在微调阶段见过,LLM也难以回忆。我们发现,无论扩大模型规模还是进行微调,共现偏差依然存在。因此,我们建议在去偏数据集上进行微调,即过滤掉主语-宾语共现次数过高的有偏样本,以缓解该偏差。虽然去偏微调能让LLM记住训练集中的罕见事实,但对于微调阶段未见过的罕见事实仍然无效。进一步的缓解研究将有助于构建可靠的语言模型、防止潜在错误。代码可以在 \url{https://github.com/CheongWoong/impact_of_cooccurrence} 获取。
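
The debiased-finetuning recipe above reduces to counting subject-object co-occurrence in the pretraining corpus and filtering out high-count facts. A toy sketch (the corpus, matching rule, and threshold are invented):

```python
# Sketch of the debiased-finetuning idea: count subject-object co-occurrence
# in a (stand-in) pretraining corpus, then drop finetuning facts whose pair
# co-occurs more often than a threshold.
from collections import Counter

corpus = ["paris is the capital of france",
          "paris france travel guide",
          "canberra rarely appears with australia here"]
facts = [("paris", "france"), ("canberra", "australia")]

def cooccurrence_counts(corpus, facts):
    counts = Counter()
    for doc in corpus:
        for subj, obj in facts:
            if subj in doc and obj in doc:  # naive substring matching
                counts[(subj, obj)] += 1
    return counts

counts = cooccurrence_counts(corpus, facts)
threshold = 1
debiased = [f for f in facts if counts[f] <= threshold]
print(counts, debiased)  # the high-co-occurrence pair is filtered out
```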

MetaBox: A Benchmark Platform for Meta-Black-Box Optimization with Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2310.08252
  • repo_url: https://github.com/GMC-DRL/MetaBox
  • paper_authors: Zeyuan Ma, Hongshu Guo, Jiacheng Chen, Zhenrui Li, Guojun Peng, Yue-Jiao Gong, Yining Ma, Zhiguang Cao
  • for: 这个研究旨在探讨Meta-Black-Box Optimization with Reinforcement Learning(MetaBBO-RL)的可能性,并提供一个全面的 benchmark 平台 для开发和评估MetaBBO-RL 方法。
  • methods: 这个研究提供了一个可灵活配置的算法模板,让使用者可以在平台内轻松实现自己的设计。另外,它还提供了300多个问题实例,涵盖从合成到真实场景的情况,以及一个包含19个基线方法的扩展库,其中既有传统黑盒优化器,也有最新的 MetaBBO-RL 方法。
  • results: 这个研究为了证明 MetaBox 的用途,对现有的 MetaBBO-RL 方法进行了广泛的 benchmarking 研究。
    Abstract Recently, Meta-Black-Box Optimization with Reinforcement Learning (MetaBBO-RL) has showcased the power of leveraging RL at the meta-level to mitigate manual fine-tuning of low-level black-box optimizers. However, this field is hindered by the lack of a unified benchmark. To fill this gap, we introduce MetaBox, the first benchmark platform expressly tailored for developing and evaluating MetaBBO-RL methods. MetaBox offers a flexible algorithmic template that allows users to effortlessly implement their unique designs within the platform. Moreover, it provides a broad spectrum of over 300 problem instances, collected from synthetic to realistic scenarios, and an extensive library of 19 baseline methods, including both traditional black-box optimizers and recent MetaBBO-RL methods. Besides, MetaBox introduces three standardized performance metrics, enabling a more thorough assessment of the methods. In a bid to illustrate the utility of MetaBox for facilitating rigorous evaluation and in-depth analysis, we carry out a wide-ranging benchmarking study on existing MetaBBO-RL methods. Our MetaBox is open-source and accessible at: https://github.com/GMC-DRL/MetaBox.
    摘要 近期,基于强化学习的元黑盒优化(MetaBBO-RL)展示了在元层面利用强化学习来减少对底层黑盒优化器人工微调的能力。然而,该领域受制于缺乏统一的基准。为填补这一空白,我们推出了 MetaBox:首个专为开发和评估 MetaBBO-RL 方法而设计的基准平台。MetaBox 提供了灵活的算法模板,用户可以在平台内轻松实现自己的独特设计。此外,它还提供了涵盖从合成到真实场景的300多个问题实例,以及一个包含19个基线方法的扩展库,其中既有传统黑盒优化器,也有最新的 MetaBBO-RL 方法。MetaBox 还引入了三项标准化性能指标,使方法评估更加全面。为说明 MetaBox 在促进严格评估与深入分析方面的用途,我们对现有 MetaBBO-RL 方法开展了广泛的基准研究。MetaBox 开源并可在以下地址获取:https://github.com/GMC-DRL/MetaBox。
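
The "algorithmic template" idea can be pictured as a meta-level agent hooked into a low-level optimizer loop: each generation, the agent observes optimizer statistics and emits a configuration action. The method names and the toy evolutionary loop below are illustrative, not MetaBox's actual API.

```python
# Sketch of a MetaBBO-RL-style algorithmic template: a meta-level agent
# configures a low-level evolutionary optimizer each generation.
import numpy as np

class MetaBBOAgent:
    def act(self, observation: np.ndarray) -> float:
        return 0.5  # e.g. a mutation-scale decision; a real agent uses RL

def run_episode(agent, objective, dim=5, pop=16, generations=50):
    x = np.random.randn(pop, dim)
    for _ in range(generations):
        fitness = np.apply_along_axis(objective, 1, x)
        obs = np.array([fitness.mean(), fitness.std(), fitness.min()])
        sigma = agent.act(obs)                    # meta-action configures the optimizer
        elite = x[np.argsort(fitness)[: pop // 2]]
        x = np.repeat(elite, 2, axis=0) + sigma * np.random.randn(pop, dim)
    return np.apply_along_axis(objective, 1, x).min()

sphere = lambda v: float(np.sum(v ** 2))
print(run_episode(MetaBBOAgent(), sphere))
```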

GROOT: Learning to Follow Instructions by Watching Gameplay Videos

  • paper_url: http://arxiv.org/abs/2310.08235
  • repo_url: https://github.com/CraftJarvis/GROOT
  • paper_authors: Shaofei Cai, Bowei Zhang, Zihao Wang, Xiaojian Ma, Anji Liu, Yitao Liang
  • for: 这个论文目标是建立一个可以遵循开放式指令的控制器,用于在开放世界环境中进行游戏play。
  • methods: 该论文提出以参考视频作为指令,从游戏视频中学习控制器,并基于因果Transformer实现了一个简单而有效的编码器-解码器架构。
  • results: 在提出的 Minecraft SkillForge 基准上与开放世界基线和人类玩家对比,Elo 评分表明 GROOT 正在缩小人机差距,并对最佳通用智能体基线取得 70% 的胜率。代码和视频可以在 https://craftjarvis-groot.github.io 上找到。
    Abstract We study the problem of building a controller that can follow open-ended instructions in open-world environments. We propose to follow reference videos as instructions, which offer expressive goal specifications while eliminating the need for expensive text-gameplay annotations. A new learning framework is derived to allow learning such instruction-following controllers from gameplay videos while producing a video instruction encoder that induces a structured goal space. We implement our agent GROOT in a simple yet effective encoder-decoder architecture based on causal transformers. We evaluate GROOT against open-world counterparts and human players on a proposed Minecraft SkillForge benchmark. The Elo ratings clearly show that GROOT is closing the human-machine gap as well as exhibiting a 70% winning rate over the best generalist agent baseline. Qualitative analysis of the induced goal space further demonstrates some interesting emergent properties, including the goal composition and complex gameplay behavior synthesis. Code and video can be found on the website https://craftjarvis-groot.github.io.
    摘要 我们研究如何构建一个能在开放世界环境中遵循开放式指令的控制器。我们提出以参考视频作为指令:它既能提供富有表达力的目标规范,又免去了昂贵的文本-游戏标注。我们推导了一个新的学习框架,使得可以从游戏视频中学习这种指令遵循控制器,并产生一个视频指令编码器,在其中诱导出结构化的目标空间。我们以基于因果Transformer的简单而有效的编码器-解码器架构实现了我们的智能体GROOT。我们在提出的 Minecraft SkillForge 基准上将 GROOT 与开放世界基线和人类玩家进行了对比评测。Elo 评分清楚地表明,GROOT 正在缩小人机差距,并对最佳通用智能体基线取得 70% 的胜率。对诱导目标空间的定性分析进一步展示了一些有趣的涌现特性,包括目标组合与复杂游戏行为的合成。代码和视频可以在 https://craftjarvis-groot.github.io 上找到。

The Impact of Time Step Frequency on the Realism of Robotic Manipulation Simulation for Objects of Different Scales

  • paper_url: http://arxiv.org/abs/2310.08233
  • repo_url: None
  • paper_authors: Minh Q. Ta, Holly Dinkel, Hameed Abdul-Rashid, Yangfei Dai, Jessica Myers, Tan Chen, Junyi Geng, Timothy Bretl
  • for: 本研究探讨时间步频和组件尺度对机器人操作仿真精度的影响。
  • methods: 研究使用不同的时间步频和组件尺度,评估了小尺度物体机器人操作仿真的精度。
  • results: 结果显示,提高时间步频可以改善小尺度物体机器人操作仿真的精度。
    Abstract This work evaluates the impact of time step frequency and component scale on robotic manipulation simulation accuracy. Increasing the time step frequency for small-scale objects is shown to improve simulation accuracy. This simulation, demonstrating pre-assembly part picking for two object geometries, serves as a starting point for discussing how to improve Sim2Real transfer in robotic assembly processes.
    摘要 本工作评估了时间步频和组件尺度对机器人操作仿真精度的影响。结果表明,提高小尺度物体的时间步频可以改善仿真精度。该仿真演示了针对两种物体几何形状的装配前零件抓取,可作为讨论如何改进机器人装配过程中 Sim2Real 迁移的起点。(注:Sim2Real 指将仿真中学到的技能迁移到真实世界。)

Large language models can replicate cross-cultural differences in personality

  • paper_url: http://arxiv.org/abs/2310.10679
  • repo_url: None
  • paper_authors: Paweł Niszczota, Mateusz Janczak
  • for: 本研究用于测试GPT-4是否能够复制不同文化之间的五大人性特质差异,使用美国和韩国作为文化对比。
  • methods: 研究进行了大规模实验(N=8000),操纵了模拟对象(美国 vs. 韩国)、量表语言(英语 vs. 韩语)以及语言模型(GPT-4 vs. GPT-3.5),使用十项人格量表进行测量。
  • results: 研究发现GPT-4能够复制两种文化之间每个因素的差异,但均值存在上升偏差,变异程度低于人类样本,结构效度也较低。
    Abstract We use a large-scale experiment (N=8000) to determine whether GPT-4 can replicate cross-cultural differences in the Big Five, measured using the Ten-Item Personality Inventory. We used the US and South Korea as the cultural pair, given that prior research suggests substantial personality differences between people from these two countries. We manipulated the target of the simulation (US vs. Korean), the language of the inventory (English vs. Korean), and the language model (GPT-4 vs. GPT-3.5). Our results show that GPT-4 replicated the cross-cultural differences for each factor. However, mean ratings had an upward bias and exhibited lower variation than in the human samples, as well as lower structural validity. Overall, we provide preliminary evidence that LLMs can aid cross-cultural psychological research.
    摘要 我们通过大规模实验(N=8000)来确定GPT-4能否复制在"大五"人格维度上的跨文化差异(使用十项人格量表测量)。我们选择美国与韩国作为文化对,因为先前的研究表明这两个国家的人群之间存在显著的人格差异。我们操纵了模拟对象(美国 vs. 韩国)、量表语言(英语 vs. 韩语)以及语言模型(GPT-4 vs. GPT-3.5)。结果表明,GPT-4能够复制每个因素上的跨文化差异;然而,均值存在上升偏差,变异程度低于人类样本,结构效度也较低。总体而言,我们提供了初步证据,表明LLM可以辅助跨文化心理学研究。

SimCKP: Simple Contrastive Learning of Keyphrase Representations

  • paper_url: http://arxiv.org/abs/2310.08221
  • repo_url: https://github.com/brightjade/SimCKP
  • paper_authors: Minseok Choi, Chaeheon Gwak, Seho Kim, Si Hyeong Kim, Jaegul Choo
  • for: 本文旨在提出一个简单的对比学习框架,以提升关键短语生成与关键短语提取的效果。
  • methods: 该框架分为两个阶段:提取器-生成器以对比方式学习上下文感知的短语级表示来提取关键短语,同时生成未出现在文档中的关键短语;重排序器通过将每个生成短语的表示与文档表示对齐来调整其得分。
  • results: 在多个基准数据集上的实验结果表明,所提方法显著优于现有最先进的模型。
    Abstract Keyphrase generation (KG) aims to generate a set of summarizing words or phrases given a source document, while keyphrase extraction (KE) aims to identify them from the text. Because the search space is much smaller in KE, it is often combined with KG to predict keyphrases that may or may not exist in the corresponding document. However, current unified approaches adopt sequence labeling and maximization-based generation that primarily operate at a token level, falling short in observing and scoring keyphrases as a whole. In this work, we propose SimCKP, a simple contrastive learning framework that consists of two stages: 1) An extractor-generator that extracts keyphrases by learning context-aware phrase-level representations in a contrastive manner while also generating keyphrases that do not appear in the document; 2) A reranker that adapts scores for each generated phrase by likewise aligning their representations with the corresponding document. Experimental results on multiple benchmark datasets demonstrate the effectiveness of our proposed approach, which outperforms the state-of-the-art models by a significant margin.
    摘要 关键短语生成(KG)旨在根据源文档生成一组概括性的词或短语,而关键短语提取(KE)旨在从文本中识别它们。由于KE的搜索空间小得多,它常与KG结合,以预测可能出现或不出现在对应文档中的关键短语。然而,现有的统一方法采用序列标注和基于最大化的生成,主要在词元层面操作,难以从整体上观察和为关键短语打分。在本工作中,我们提出了简单的对比学习框架SimCKP,它由两个阶段组成:1)提取器-生成器,以对比方式学习上下文感知的短语级表示来提取关键短语,同时生成未出现在文档中的关键短语;2)重排序器,通过将每个生成短语的表示与文档表示对齐来调整其得分。在多个基准数据集上的实验结果表明,所提方法显著优于现有最先进的模型。

TriRE: A Multi-Mechanism Learning Paradigm for Continual Knowledge Retention and Promotion

  • paper_url: http://arxiv.org/abs/2310.08217
  • repo_url: https://github.com/NeurAI-Lab/TriRE
  • paper_authors: Preetha Vijayan, Prashant Bhat, Elahe Arani, Bahram Zonooz
  • for: The paper aims to address the challenge of continual learning (CL) in deep neural networks, specifically catastrophic forgetting (CF) of previously learned tasks.
  • methods: The proposed method, called TriRE, combines several neurophysiological processes, including neurogenesis, active forgetting, neuromodulation, metaplasticity, experience rehearsal, and context-dependent gating, to mitigate CF and improve CL performance.
  • results: TriRE significantly reduces task interference and outperforms other CL approaches considered in isolation across various CL settings.
    Abstract Continual learning (CL) has remained a persistent challenge for deep neural networks due to catastrophic forgetting (CF) of previously learned tasks. Several techniques such as weight regularization, experience rehearsal, and parameter isolation have been proposed to alleviate CF. Despite their relative success, these research directions have predominantly remained orthogonal and suffer from several shortcomings, while missing out on the advantages of competing strategies. On the contrary, the brain continually learns, accommodates, and transfers knowledge across tasks by simultaneously leveraging several neurophysiological processes, including neurogenesis, active forgetting, neuromodulation, metaplasticity, experience rehearsal, and context-dependent gating, rarely resulting in CF. Inspired by how the brain exploits multiple mechanisms concurrently, we propose TriRE, a novel CL paradigm that encompasses retaining the most prominent neurons for each task, revising and solidifying the extracted knowledge of current and past tasks, and actively promoting less active neurons for subsequent tasks through rewinding and relearning. Across CL settings, TriRE significantly reduces task interference and surpasses different CL approaches considered in isolation.
    摘要 由于对先前所学任务的灾难性遗忘(CF),持续学习(CL)一直是深度神经网络面临的顽固挑战。权重正则化、经验回放和参数隔离等多种技术已被提出以缓解CF。尽管这些研究方向相对成功,但它们大多彼此独立,各有不足,同时也错失了相互竞争策略的优势。与之相反,大脑通过同时利用多种神经生理过程(包括神经发生、主动遗忘、神经调制、元可塑性、经验回放和情境依赖门控)持续地学习、适应并在任务间迁移知识,而很少产生灾难性遗忘。受大脑并行利用多种机制的启发,我们提出了TriRE,一种新颖的持续学习范式:为每个任务保留最显著的神经元,复习并巩固从当前与过去任务提取的知识,并通过回退与再学习主动地为后续任务激活较不活跃的神经元。在各种持续学习设置下,TriRE显著减少了任务间干扰,并超越了单独使用的各类持续学习方法。

Trustworthy Machine Learning

  • paper_url: http://arxiv.org/abs/2310.08215
  • repo_url: https://github.com/matthew-mcateer/practicing_trustworthy_machine_learning
  • paper_authors: Bálint Mucsányi, Michael Kirchhof, Elisa Nguyen, Alexander Rubinstein, Seong Joon Oh
  • for: This paper is written for researchers and practitioners who want to build trustworthy machine learning models that can generalize to small changes in the distribution, provide explainability, and quantify uncertainty.
  • methods: The paper covers four key topics in trustworthy machine learning: out-of-distribution generalization, explainability, uncertainty quantification, and evaluation of trustworthiness. It discusses classical and contemporary research papers in these fields and uncovers their underlying intuitions.
  • results: The book provides a theoretical and technical background in trustworthy machine learning, including code snippets and pointers to further sources on topics of TML. It is meant to be a stand-alone product and has evolved from a course offered at the University of Tübingen.
    Abstract As machine learning technology gets applied to actual products and solutions, new challenges have emerged. Models unexpectedly fail to generalize to small changes in the distribution, tend to be confident on novel data they have never seen, or cannot communicate the rationale behind their decisions effectively with the end users. Collectively, we face a trustworthiness issue with the current machine learning technology. This textbook on Trustworthy Machine Learning (TML) covers a theoretical and technical background of four key topics in TML: Out-of-Distribution Generalization, Explainability, Uncertainty Quantification, and Evaluation of Trustworthiness. We discuss important classical and contemporary research papers of the aforementioned fields and uncover and connect their underlying intuitions. The book evolved from the homonymous course at the University of T\"ubingen, first offered in the Winter Semester of 2022/23. It is meant to be a stand-alone product accompanied by code snippets and various pointers to further sources on topics of TML. The dedicated website of the book is https://trustworthyml.io/.
    摘要 随着机器学习技术被应用到实际产品和解决方案中,新的挑战不断涌现:模型会在分布发生微小变化时意外地无法泛化,对从未见过的新数据表现得过于自信,或无法向最终用户有效传达其决策依据。总之,当前的机器学习技术面临着可信度问题。这本《可信机器学习》(TML)教材涵盖了TML四个关键主题的理论与技术背景:分布外泛化、可解释性、不确定性量化和可信度评估。我们讨论了上述领域重要的经典与当代研究论文,并揭示和联系其背后的直觉。本书源于图宾根大学的同名课程,该课程于2022/23学年冬季学期首次开设。它是一件独立的作品,配有代码片段及指向TML各主题更多资料的链接。本书专属网站为 https://trustworthyml.io/。

Long-Tailed Classification Based on Coarse-Grained Leading Forest and Multi-Center Loss

  • paper_url: http://arxiv.org/abs/2310.08206
  • repo_url: https://github.com/jinyery/cognisance
  • paper_authors: Jinye Yang, Ji Xu
  • For: This paper aims to address the long-tailed classification problem by proposing a new framework called \textbf{\textsc{Cognisance}, which uses a combination of Coarse-Grained Leading Forest (CLF) and Multi-Center Loss (MCL) to learn invariant features and improve the performance of long-tailed classification.* Methods: The proposed method uses an unsupervised learning method, CLF, to better characterize the distribution of attributes within a class, and introduces a new metric learning loss, MCL, to gradually eliminate confusing attributes during the feature learning process.* Results: The proposed method has state-of-the-art performance in both existing benchmarks ImageNet-GLT and MSCOCO-GLT, and can improve the performance of existing LT methods. The codes are available on GitHub: \url{https://github.com/jinyery/cognisance}.
    Abstract Long-tailed (LT) classification is an unavoidable and challenging problem in the real world. Most of the existing long-tailed classification methods focus only on solving the inter-class imbalance in which there are more samples in the head class than in the tail class, while ignoring the intra-class imbalance in which the number of samples of the head attribute within the same class is much larger than the number of samples of the tail attribute. The deviation in the model is caused by both of these factors, and due to the fact that attributes are implicit in most datasets and the combination of attributes is very complex, the intra-class imbalance is more difficult to handle. For this purpose, we proposed a long-tailed classification framework, known as \textbf{\textsc{Cognisance}}, which is founded on Coarse-Grained Leading Forest (CLF) and Multi-Center Loss (MCL), aiming to build a multi-granularity joint solution model by means of invariant feature learning. In this method, we designed an unsupervised learning method, i.e., CLF, to better characterize the distribution of attributes within a class. Depending on the distribution of attributes, we can flexibly construct sampling strategies suitable for different environments. In addition, we introduce a new metric learning loss (MCL), which aims to gradually eliminate confusing attributes during the feature learning process. More importantly, this approach does not depend on a specific model structure and can be integrated with existing LT methods as an independent component. We have conducted extensive experiments and our approach has state-of-the-art performance in both existing benchmarks ImageNet-GLT and MSCOCO-GLT, and can improve the performance of existing LT methods. Our codes are available on GitHub: \url{https://github.com/jinyery/cognisance}
    摘要 Traditional long-tailed classification methods only focus on solving the inter-class imbalance issue, where there are more samples in the head class than in the tail class, while ignoring the intra-class imbalance issue where the number of samples of the head attribute within the same class is much larger than the number of samples of the tail attribute. This leads to deviation in the model. Moreover, attributes are implicit in most datasets and the combination of attributes is very complex, making the intra-class imbalance more difficult to handle.To address these issues, we proposed a long-tailed classification framework called \textbf{\textsc{Cognisance} which is founded on Coarse-Grained Leading Forest (CLF) and Multi-Center Loss (MCL). The goal is to build a multi-granularity joint solution model through invariant feature learning.Our approach includes an unsupervised learning method, CLF, to better characterize the distribution of attributes within a class. Depending on the distribution of attributes, we can flexibly construct sampling strategies suitable for different environments. Additionally, we introduce a new metric learning loss, MCL, which aims to gradually eliminate confusing attributes during the feature learning process.The key advantage of our approach is that it does not depend on a specific model structure and can be integrated with existing LT methods as an independent component. We have conducted extensive experiments and our approach has achieved state-of-the-art performance in both existing benchmarks ImageNet-GLT and MSCOCO-GLT, and can improve the performance of existing LT methods. Our codes are available on GitHub: \url{https://github.com/jinyery/cognisance}.

Beyond Traditional DoE: Deep Reinforcement Learning for Optimizing Experiments in Model Identification of Battery Dynamics

  • paper_url: http://arxiv.org/abs/2310.08198
  • repo_url: None
  • paper_authors: Gokhan Budan, Francesca Damiani, Can Kurtulus, N. Kemal Ure
  • for: 该研究旨在提高电池动力学模型辨识的效率,以便更好地支持能源管理系统的优化和设计过程。
  • methods: 该研究提出一种基于深度强化学习的新型实验设计(DoE)方法,依据既往实验的统计信息即时调整实验配置,从而避免人工遍历大量预定义的电流曲线。
  • results: 仿真与真实实验表明,所提方法能给出与传统DoE同样准确的电池模型,而资源消耗减少85%。
    Abstract Model identification of battery dynamics is a central problem in energy research; many energy management systems and design processes rely on accurate battery models for efficiency optimization. The standard methodology for battery modelling is traditional design of experiments (DoE), where the battery dynamics are excited with many different current profiles and the measured outputs are used to estimate the system dynamics. However, although it is possible to obtain useful models with the traditional approach, the process is time consuming and expensive because of the need to sweep many different current-profile configurations. In the present work, a novel DoE approach is developed based on deep reinforcement learning, which alters the configuration of the experiments on the fly based on the statistics of past experiments. Instead of sticking to a library of predefined current profiles, the proposed approach modifies the current profiles dynamically by updating the output space covered by past measurements, hence only the current profiles that are informative for future experiments are applied. Simulations and real experiments are used to show that the proposed approach gives models that are as accurate as those obtained with traditional DoE but by using 85\% less resources.

EIPE-text: Evaluation-Guided Iterative Plan Extraction for Long-Form Narrative Text Generation

  • paper_url: http://arxiv.org/abs/2310.08185
  • repo_url: None
  • paper_authors: Wang You, Wenshan Wu, Yaobo Liang, Shaoguang Mao, Chenfei Wu, Maosong Cao, Yuzhe Cai, Yiduo Guo, Yan Xia, Furu Wei, Nan Duan
  • for: This paper aims to improve the quality of long-form narrative generation, making the generated narratives more coherent and relevant.
  • methods: It proposes a new framework, Evaluation-guided Iterative Plan Extraction (EIPE-text), which extracts plans from a corpus of narratives and uses them to construct a better planner. The framework has three stages: plan extraction, learning, and inference; in the plan-extraction stage, plans are iteratively extracted and refined to build a plan corpus.
  • results: Experiments in the domains of novels and storytelling show that EIPE-text generates more coherent and relevant long-form narratives; both GPT-4-based and human evaluations confirm this.
    Abstract Plan-and-Write is a common hierarchical approach in long-form narrative text generation, which first creates a plan to guide the narrative writing. Following this approach, several studies rely on simply prompting large language models for planning, which often yields suboptimal results. In this paper, we propose a new framework called Evaluation-guided Iterative Plan Extraction for long-form narrative text generation (EIPE-text), which extracts plans from the corpus of narratives and utilizes the extracted plans to construct a better planner. EIPE-text has three stages: plan extraction, learning, and inference. In the plan extraction stage, it iteratively extracts and improves plans from the narrative corpus and constructs a plan corpus. We propose a question-answering (QA) based evaluation mechanism to automatically evaluate the plans and generate detailed plan refinement instructions to guide the iterative improvement. In the learning stage, we build a better planner by fine-tuning with the plan corpus or in-context learning with examples in the plan corpus. Finally, we leverage a hierarchical approach to generate long-form narratives. We evaluate the effectiveness of EIPE-text in the domains of novels and storytelling. Both GPT-4-based evaluations and human evaluations demonstrate that our method can generate more coherent and relevant long-form narratives. Our code will be released in the future.
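The plan-extraction stage reduces to an extract-evaluate-refine loop: draft a plan from a narrative, score it with QA-based evaluation, and feed the failed questions back as refinement instructions. A minimal sketch of that loop follows, with the LLM replaced by a stub and the QA evaluation reduced to substring matching; `llm`, `qa_score`, and the stopping rule are hypothetical simplifications of the paper's mechanism.

```python
def llm(prompt: str) -> str:
    """Stub for an LLM call; a real system would query GPT-4 or similar."""
    return "1. setup 2. conflict 3. resolution"   # canned plan for the demo

def qa_score(plan: str, qa_pairs) -> float:
    """Toy QA-based evaluation: fraction of gold answers covered by the plan."""
    return sum(a.lower() in plan.lower() for _, a in qa_pairs) / len(qa_pairs)

def extract_plan(narrative: str, qa_pairs, max_rounds: int = 3, target: float = 0.99):
    plan = llm(f"Summarize this narrative as a hierarchical plan:\n{narrative}")
    for _ in range(max_rounds):
        if qa_score(plan, qa_pairs) >= target:
            break
        # Detailed refinement instructions: which questions the plan missed
        missing = [q for q, a in qa_pairs if a.lower() not in plan.lower()]
        plan = llm(f"Refine this plan so it also answers: {missing}\nPlan:\n{plan}")
    return plan, qa_score(plan, qa_pairs)

plan, score = extract_plan("a story ...", [("How does it open?", "setup"),
                                           ("How does it end?", "resolution")])
print(score, plan)
```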

Learn From Model Beyond Fine-Tuning: A Survey

  • paper_url: http://arxiv.org/abs/2310.08184
  • repo_url: https://github.com/ruthless-man/awesome-learn-from-model
  • paper_authors: Hongling Zheng, Li Shen, Anke Tang, Yong Luo, Han Hu, Bo Du, Dacheng Tao
  • for: This survey studies Learn From Model (LFM) techniques, which research, modify, and design foundation models through their model interfaces to improve performance and generalizability.
  • methods: It organizes the research into five major areas: model tuning, model distillation, model reuse, meta learning, and model editing, each covering a range of methods and strategies for enhancing foundation-model capabilities.
  • results: The paper gives a comprehensive review of existing work and highlights several critical areas for future exploration, along with open issues that require further attention.
    Abstract Foundation models (FM) have demonstrated remarkable performance across a wide range of tasks (especially in the fields of natural language processing and computer vision), primarily attributed to their ability to comprehend instructions and access extensive, high-quality data. This not only showcases their current effectiveness but also sets a promising trajectory towards the development of artificial general intelligence. Unfortunately, due to multiple constraints, the raw data used to train large models are often inaccessible, so the use of end-to-end models for downstream tasks has become a new research trend, which we call Learn From Model (LFM) in this article. LFM focuses on the research, modification, and design of FM based on the model interface, so as to better understand the model structure and weights (in a black-box environment), and to generalize the model to downstream tasks. The study of LFM techniques can be broadly categorized into five major areas: model tuning, model distillation, model reuse, meta learning and model editing. Each category encompasses a repertoire of methods and strategies that aim to enhance the capabilities and performance of FM. This paper gives a comprehensive review of the current methods based on FM from the perspective of LFM, in order to help readers better understand the current research status and ideas. To conclude, we summarize the survey by highlighting several critical areas for future exploration and addressing open issues that require further attention from the research community. The relevant papers we investigated in this article can be accessed at the repository linked above.

Multi-Scale Spatial-Temporal Recurrent Networks for Traffic Flow Prediction

  • paper_url: http://arxiv.org/abs/2310.08138
  • repo_url: None
  • paper_authors: Haiyang Liu, Chunjiang Zhu, Detian Zhang, Qing Li
  • for: traffic flow prediction
  • methods: a Multi-Scale Spatial-Temporal Recurrent Network (MSSTRN) combining a single-step gated recurrent unit and a multi-step gated recurrent unit with a spatial-temporal synchronous attention mechanism
  • results: the best prediction accuracy, with non-trivial margins over all twenty baseline methods
    Abstract Traffic flow prediction is one of the most fundamental tasks of intelligent transportation systems. The complex and dynamic spatial-temporal dependencies make traffic flow prediction quite challenging. Although existing spatial-temporal graph neural networks are prominent, they often encounter challenges such as (1) reliance on a fixed graph, which limits the predictive performance of the model, (2) insufficiently capturing complex spatial-temporal dependencies simultaneously, and (3) lacking attention to spatial-temporal information at different time lengths. In this paper, we propose a Multi-Scale Spatial-Temporal Recurrent Network for traffic flow prediction, namely MSSTRN, which consists of two different recurrent neural networks: a single-step gated recurrent unit and a multi-step gated recurrent unit, to fully capture the complex spatial-temporal information in the traffic data under different time windows. Moreover, we propose a spatial-temporal synchronous attention mechanism that integrates adaptive position graph convolutions into the self-attention mechanism to achieve synchronous capture of spatial-temporal dependencies. We conducted extensive experiments on four real traffic datasets and demonstrated that our model achieves the best prediction accuracy with non-trivial margins compared to all twenty baseline methods.

Can Large Language Models Really Improve by Self-critiquing Their Own Plans?

  • paper_url: http://arxiv.org/abs/2310.08118
  • repo_url: None
  • paper_authors: Karthik Valmeekam, Matthew Marquez, Subbarao Kambhampati
  • for: investigate the verification/self-critiquing abilities of large language models in the context of planning
  • methods: employ LLMs for both plan generation and verification, assess the verifier LLM’s performance against ground-truth verification, and evaluate the impact of self-critiquing and feedback levels on system performance
  • results: self-critiquing appears to diminish plan generation performance, LLM verifiers produce a notable number of false positives, and the nature of feedback has minimal impact on plan generation.
    Abstract There have been widespread claims about Large Language Models (LLMs) being able to successfully verify or self-critique their candidate solutions in reasoning problems in an iterative mode. Intrigued by those claims, in this paper we set out to investigate the verification/self-critiquing abilities of large language models in the context of planning. We evaluate a planning system that employs LLMs for both plan generation and verification. We assess the verifier LLM's performance against ground-truth verification, the impact of self-critiquing on plan generation, and the influence of varying feedback levels on system performance. Using GPT-4, a state-of-the-art LLM, for both generation and verification, our findings reveal that self-critiquing appears to diminish plan generation performance, especially when compared to systems with external, sound verifiers; moreover, the LLM verifiers produce a notable number of false positives, compromising the system's reliability. Additionally, the nature of feedback, whether binary or detailed, showed minimal impact on plan generation. Collectively, our results cast doubt on the effectiveness of LLMs in a self-critiquing, iterative framework for planning tasks.
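The system under study is a generate-verify loop in which the LLM verifier's verdict is trusted blindly, which is exactly where false positives hurt: an invalid plan that the verifier passes terminates the loop. A minimal sketch of that loop, with both LLM calls stubbed out, follows; the stubs and the iteration cap are illustrative only.

```python
def llm_generate(task, feedback=None):
    """Stub for plan generation; a real system would prompt GPT-4."""
    return "plan-v2" if feedback else "plan-v1"

def llm_verify(plan):
    """Stub LLM verifier returning (is_valid, critique). The paper finds such
    verifiers emit many false positives, which this loop cannot detect."""
    return (plan == "plan-v2", "step 3 unsupported" if plan == "plan-v1" else "")

def self_critique_loop(task, max_iters=4):
    feedback = None
    for i in range(max_iters):
        plan = llm_generate(task, feedback)
        ok, critique = llm_verify(plan)
        if ok:                      # the verdict is trusted, valid or not
            return plan, i + 1
        feedback = critique         # binary vs. detailed feedback both tested
    return plan, max_iters

print(self_critique_loop("blocksworld: stack A on B"))
```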

DUSA: Decoupled Unsupervised Sim2Real Adaptation for Vehicle-to-Everything Collaborative Perception

  • paper_url: http://arxiv.org/abs/2310.08117
  • repo_url: https://github.com/refkxh/DUSA
  • paper_authors: Xianghao Kong, Wentao Jiang, Jinrang Jia, Yifeng Shi, Runsheng Xu, Si Liu
  • for: Autonomous driving requires high-precision Vehicle-to-Everything (V2X) collaborative perception, but large amounts of annotated real-world data are costly and difficult to acquire. Simulated data can be generated at very low cost, yet the domain gap between simulation and the real world means models trained on simulated data often perform poorly on real data.
  • methods: This work proposes Decoupled Unsupervised Sim2Real Adaptation (DUSA), which decouples the V2X collaborative sim2real adaptation problem into two sub-problems: sim2real adaptation and inter-agent adaptation. For sim2real adaptation, a Location-adaptive Sim2Real Adapter (LSA) module adaptively aggregates features from critical locations of the feature map and aligns simulated and real features via a sim/real discriminator. For inter-agent adaptation, a Confidence-aware Inter-agent Adapter (CIA) module aligns fine-grained features from heterogeneous agents under the guidance of agent-wise confidence maps.
  • results: Experiments demonstrate the effectiveness of DUSA for unsupervised sim2real adaptation from the simulated V2XSet dataset to the real-world DAIR-V2X-C dataset.
    Abstract Vehicle-to-Everything (V2X) collaborative perception is crucial for autonomous driving. However, achieving high-precision V2X perception requires a significant amount of annotated real-world data, which is expensive and hard to acquire. Simulated data have raised much attention since they can be massively produced at an extremely low cost. Nevertheless, the significant domain gap between simulated and real-world data, including differences in sensor type, reflectance patterns, and road surroundings, often leads to poor performance of models trained on simulated data when evaluated on real-world data. In addition, there remains a domain gap between real-world collaborative agents, e.g., different types of sensors may be installed on autonomous vehicles and roadside infrastructures with different extrinsics, further increasing the difficulty of sim2real generalization. To take full advantage of simulated data, we present a new unsupervised sim2real domain adaptation method for V2X collaborative detection named Decoupled Unsupervised Sim2Real Adaptation (DUSA). Our new method decouples the V2X collaborative sim2real domain adaptation problem into two sub-problems: sim2real adaptation and inter-agent adaptation. For sim2real adaptation, we design a Location-adaptive Sim2Real Adapter (LSA) module to adaptively aggregate features from critical locations of the feature map and align the features between simulated data and real-world data via a sim/real discriminator on the aggregated global feature. For inter-agent adaptation, we further devise a Confidence-aware Inter-agent Adapter (CIA) module to align the fine-grained features from heterogeneous agents under the guidance of agent-wise confidence maps. Experiments demonstrate the effectiveness of the proposed DUSA approach on unsupervised sim2real adaptation from the simulated V2XSet dataset to the real-world DAIR-V2X-C dataset.

Promptor: A Conversational and Autonomous Prompt Generation Agent for Intelligent Text Entry Techniques

  • paper_url: http://arxiv.org/abs/2310.08101
  • repo_url: None
  • paper_authors: Junxiao Shen, John J. Dudley, Jingyao Zheng, Bill Byrne, Per Ola Kristensson
  • for: This work aims to make text entry more effective, efficient, and fluid, and to address the data-collection and fine-tuning costs that arise when deep learning models power text entry features.
  • methods: It exploits the in-context learning capability of the large language model GPT-3.5 to implement different text prediction techniques via prompting, and introduces Promptor, a conversational prompt generation agent that helps designers create suitable prompts.
  • results: In a user study, prompts generated by Promptor yielded a 35% increase in similarity and a 22% increase in coherence over prompts written by designers themselves.
    Abstract Text entry is an essential task in our day-to-day digital interactions. Numerous intelligent features have been developed to streamline this process, making text entry more effective, efficient, and fluid. These improvements include sentence prediction and user personalization. However, as deep learning-based language models become the norm for these advanced features, the necessity for data collection and model fine-tuning increases. These challenges can be mitigated by harnessing the in-context learning capability of large language models such as GPT-3.5. This unique feature allows the language model to acquire new skills through prompts, eliminating the need for data collection and fine-tuning. Consequently, large language models can learn various text prediction techniques. We initially showed that, for a sentence prediction task, merely prompting GPT-3.5 surpassed a GPT-2-backed system and is comparable with a fine-tuned GPT-3.5 model, with the latter two methods requiring costly data collection, fine-tuning and post-processing. However, the task of prompting large language models to specialize in specific text prediction tasks can be challenging, particularly for designers without expertise in prompt engineering. To address this, we introduce Promptor, a conversational prompt generation agent designed to engage proactively with designers. Promptor can automatically generate complex prompts tailored to meet specific needs, thus offering a solution to this challenge. We conducted a user study involving 24 participants creating prompts for three intelligent text entry tasks; half of the participants used Promptor while the other half designed prompts themselves. The results show that Promptor-designed prompts result in a 35% increase in similarity and a 22% increase in coherence over those by designers.

Sentinel: An Aggregation Function to Secure Decentralized Federated Learning

  • paper_url: http://arxiv.org/abs/2310.08097
  • repo_url: None
  • paper_authors: Chao Feng, Alberto Huertas Celdran, Janosch Baltensperger, Enrique Tomas Martinez Beltran, Gerome Bovet, Burkhard Stiller
  • for: This work proposes a defense strategy to counteract poisoning attacks in Decentralized Federated Learning (DFL).
  • methods: The strategy leverages the accessibility of local data and defines a three-step aggregation protocol (similarity filtering, bootstrap validation, and normalization) to safeguard against malicious model updates.
  • results: Evaluated across multiple datasets, poisoning attack types, and threat levels, Sentinel improves defense performance over the state of the art.
    Abstract The rapid integration of Federated Learning (FL) into networking encompasses various aspects such as network management, quality of service, and cybersecurity while preserving data privacy. In this context, Decentralized Federated Learning (DFL) emerges as an innovative paradigm to train collaborative models, addressing the single point of failure limitation. However, the security and trustworthiness of FL and DFL are compromised by poisoning attacks, negatively impacting its performance. Existing defense mechanisms have been designed for centralized FL and they do not adequately exploit the particularities of DFL. Thus, this work introduces Sentinel, a defense strategy to counteract poisoning attacks in DFL. Sentinel leverages the accessibility of local data and defines a three-step aggregation protocol consisting of similarity filtering, bootstrap validation, and normalization to safeguard against malicious model updates. Sentinel has been evaluated with diverse datasets and various poisoning attack types and threat levels, improving the state-of-the-art performance against both untargeted and targeted poisoning attacks.
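A node-local view of the three-step protocol can be sketched directly: filter received updates by similarity to the local model, validate the survivors against a loss computed on local data (the bootstrap step), normalize their magnitudes, and average. The numpy sketch below follows that reading; the cosine threshold, loss slack, norm clipping, and the toy `eval_loss` are assumptions rather than Sentinel's exact choices.

```python
import numpy as np

def sentinel_aggregate(local_w, neighbor_ws, eval_loss, sim_thresh=0.5, loss_slack=0.1):
    """Sketch of a Sentinel-style aggregation at one DFL node.
    local_w: this node's model (flat vector); neighbor_ws: received updates;
    eval_loss: callback scoring a model on this node's local data."""
    # 1) Similarity filtering: drop updates pointing away from the local model
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
    kept = [w for w in neighbor_ws if cos(local_w, w) >= sim_thresh]

    # 2) Bootstrap validation: drop updates that perform poorly on local data
    base = eval_loss(local_w)
    kept = [w for w in kept if eval_loss(w) <= base + loss_slack]

    # 3) Normalization: clip each surviving update to the local model's norm
    ref = np.linalg.norm(local_w)
    kept = [w * min(1.0, ref / (np.linalg.norm(w) + 1e-12)) for w in kept]

    return np.mean([local_w] + kept, axis=0)

rng = np.random.default_rng(1)
local = rng.normal(size=10)
honest = [local + 0.1 * rng.normal(size=10) for _ in range(4)]
poisoned = [-5.0 * local]                         # update pointing the wrong way
agg = sentinel_aggregate(local, honest + poisoned,
                         eval_loss=lambda w: float(np.mean((w - local) ** 2)))
print(np.round(agg, 2))
```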

Discerning Temporal Difference Learning

  • paper_url: http://arxiv.org/abs/2310.08091
  • repo_url: None
  • paper_authors: Jianfei Ma
  • for: efficiently assessing a policy's value function in reinforcement learning
  • methods: temporal difference learning in the style of TD($\lambda$) with flexible emphasis functions
  • results: improved value estimation and faster learning, applicable across diverse scenarios
    Abstract Temporal difference learning (TD) is a foundational concept in reinforcement learning (RL), aimed at efficiently assessing a policy's value function. TD($\lambda$), a potent variant, incorporates a memory trace to distribute the prediction error into the historical context. However, this approach often neglects the significance of historical states and the relative importance of propagating the TD error, influenced by challenges such as visitation imbalance or outcome noise. To address this, we propose a novel TD algorithm named discerning TD learning (DTD), which allows flexible emphasis functions, predetermined or adapted during training, to allocate efforts effectively across states. We establish the convergence properties of our method within a specific class of emphasis functions and showcase its promising potential for adaptation to deep RL contexts. Empirical results underscore that employing a judicious emphasis function not only improves value estimation but also expedites learning across diverse scenarios.
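The essential mechanism is that each state's TD update is scaled by an emphasis weight, so states deemed more important receive more of the learning effort. The sketch below shows the predetermined-emphasis variant on a tabular random walk with one-step TD; using TD(0) instead of full TD($\lambda$) traces, and the particular emphasis schedule, are simplifications of the paper's setting.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, gamma, alpha = 7, 0.95, 0.1
emphasis = np.linspace(0.2, 1.0, n_states)   # predetermined per-state weights (toy)

V = np.zeros(n_states)
for episode in range(2000):
    s = n_states // 2                        # start in the middle of the chain
    while True:
        s2 = s + (1 if rng.random() < 0.5 else -1)
        done = s2 < 0 or s2 >= n_states
        r = 1.0 if s2 >= n_states else 0.0   # reward only at the right exit
        target = r if done else r + gamma * V[s2]
        # Emphasis modulates how much effort this state's update receives
        V[s] += alpha * emphasis[s] * (target - V[s])
        if done:
            break
        s = s2

print(V.round(3))
```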

Low-Resource Clickbait Spoiling for Indonesian via Question Answering

  • paper_url: http://arxiv.org/abs/2310.08085
  • repo_url: None
  • paper_authors: Ni Putu Intan Maharani, Ayu Purwarianti, Alham Fikri Aji
  • for: This paper addresses clickbait spoiling, i.e., generating a short text that satisfies the curiosity induced by a clickbait post, for the low-resource Indonesian language.
  • methods: It constructs a manually labeled Indonesian clickbait spoiling corpus and evaluates cross-lingual zero-shot question-answering-based models, selecting among multilingual language models.
  • results: Experiments show that XLM-RoBERTa (large) performs best for phrase and passage spoilers, while mDeBERTa (base) performs best for multipart spoilers.
    Abstract Clickbait spoiling aims to generate a short text to satisfy the curiosity induced by a clickbait post. As it is a newly introduced task, the dataset is only available in English so far. Our contributions include the construction of a manually labeled clickbait spoiling corpus in Indonesian and an evaluation of using cross-lingual zero-shot question answering-based models to tackle clickbait spoiling for a low-resource language like Indonesian. We utilize a selection of multilingual language models. The experimental results suggest that the XLM-RoBERTa (large) model outperforms other models for phrase and passage spoilers, while the mDeBERTa (base) model outperforms other models for multipart spoilers.

GameGPT: Multi-agent Collaborative Framework for Game Development

  • paper_url: http://arxiv.org/abs/2310.08067
  • repo_url: None
  • paper_authors: Dake Chen, Hanbin Wang, Yunhao Huo, Yuzhao Li, Haoyang Zhang
  • for: automate and expedite game development processes
  • methods: dual collaboration, layered approaches with several in-house lexicons, and decoupling approach
  • results: mitigates hallucination and redundancy in the planning, task identification, and implementation phases, and achieves code generation with better precision.
    Abstract Large language model (LLM) based agents have demonstrated their capacity to automate and expedite software development processes. In this paper, we focus on game development and propose a multi-agent collaborative framework, dubbed GameGPT, to automate game development. While many studies have pinpointed hallucination as a primary roadblock for deploying LLMs in production, we identify another concern: redundancy. Our framework presents a series of methods to mitigate both concerns. These methods include dual collaboration and layered approaches with several in-house lexicons, to mitigate the hallucination and redundancy in the planning, task identification, and implementation phases. Furthermore, a decoupling approach is also introduced to achieve code generation with better precision.

The Search-and-Mix Paradigm in Approximate Nash Equilibrium Algorithms

  • paper_url: http://arxiv.org/abs/2310.08066
  • repo_url: None
  • paper_authors: Xiaotie Deng, Dongchen Li, Hanyu Li
  • for: This paper provides an automated search-and-mix approach to the analysis of algorithms that compute approximate Nash equilibria in two-player games.
  • methods: Such algorithms are reformulated into a search-and-mix paradigm, consisting of a search phase followed by a mixing phase; the design and analysis of the mixing phase can then be fully automated, with no hand-written proofs required.
  • results: Using a program, the approximation bounds of all the algorithms in the literature are computed automatically, matching the known bounds; since many approximation and online algorithms rely on LP relaxations, the approach may extend to automating the analysis of other algorithms.
    Abstract AI in Math deals with mathematics in a constructive manner so that reasoning becomes automated, less laborious, and less error-prone. For algorithms, the question becomes how to automate analyses for specific problems. For the first time, this work provides an automatic method for approximation analysis on a well-studied problem in theoretical computer science: computing approximate Nash equilibria in two-player games. We observe that such algorithms can be reformulated into a search-and-mix paradigm, which involves a search phase followed by a mixing phase. By doing so, we are able to fully automate the procedure of designing and analyzing the mixing phase. For example, we illustrate the method with a program that analyzes the approximation bounds of all the algorithms in the literature. The same approximation bounds are computed without any hand-written proofs. Our automatic method heavily relies on the LP-relaxation structure in approximate Nash equilibria. Since many approximation algorithms and online algorithms adopt the LP relaxation, our approach may be extended to automate the analysis of other algorithms.

Learning from Label Proportions: Bootstrapping Supervised Learners via Belief Propagation

  • paper_url: http://arxiv.org/abs/2310.08056
  • repo_url: None
  • paper_authors: Shreyas Havaldar, Navodita Sharma, Shubhi Sareen, Karthikeyan Shanmugam, Aravindan Raghuveer
  • for: This paper targets the Learning from Label Proportions (LLP) problem, where only aggregate-level labels are available for bags of instances during training, and the aim is to achieve the best instance-level performance on test data.
  • methods: It proposes a new algorithmic framework with two iterated steps: pseudo labeling, in which a Gibbs distribution over binary instance labels incorporates covariate information and the bag-level aggregated label, and is marginalized with Belief Propagation to obtain pseudo labels; and embedding refinement, in which the pseudo labels supervise a learner that yields a better embedding, which in turn provides new covariates for the next iteration.
  • results: The algorithm shows strong gains (up to 15%) over several SOTA baselines for LLP binary classification on tabular and image datasets, with minimal computational overhead even for large bag sizes and up to a million samples.
    Abstract Learning from Label Proportions (LLP) is a learning problem where only aggregate level labels are available for groups of instances, called bags, during training, and the aim is to get the best performance at the instance-level on the test data. This setting arises in domains like advertising and medicine due to privacy considerations. We propose a novel algorithmic framework for this problem that iteratively performs two main steps. For the first step (Pseudo Labeling) in every iteration, we define a Gibbs distribution over binary instance labels that incorporates a) covariate information through the constraint that instances with similar covariates should have similar labels and b) the bag level aggregated label. We then use Belief Propagation (BP) to marginalize the Gibbs distribution to obtain pseudo labels. In the second step (Embedding Refinement), we use the pseudo labels to provide supervision for a learner that yields a better embedding. Further, we iterate on the two steps again by using the second step's embeddings as new covariates for the next iteration. In the final iteration, a classifier is trained using the pseudo labels. Our algorithm displays strong gains against several SOTA baselines (up to 15%) for the LLP Binary Classification problem on various dataset types - tabular and Image. We achieve these improvements with minimal computational overhead above standard supervised learning, thanks to efficient Belief Propagation, even for large bag sizes and up to a million samples.
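A simplified version of the pseudo-labeling step can be written down directly: initialize each instance's soft label from its bag proportion, smooth labels over covariate neighbors (the "similar covariates, similar labels" constraint), and repeatedly re-impose each bag's aggregate proportion. The numpy sketch below uses a mean-field-style iteration as a stand-in for the paper's Belief Propagation over a Gibbs distribution; the kernel bandwidth, iteration count, and projection step are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def pseudo_label(X, bags, bag_props, n_iter=50, tau=2.0):
    """Mean-field stand-in for the BP step. X: (n, d) covariates; bags: bag id
    per instance; bag_props: positive fraction per bag. Returns p(y_i = 1)."""
    # Similarity kernel: instances with close covariates should agree
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / tau)
    np.fill_diagonal(W, 0.0)
    W = W / W.sum(1, keepdims=True)

    p = np.array([bag_props[b] for b in bags], dtype=float)  # init from bag label
    for _ in range(n_iter):
        p = W @ p                                 # smooth over covariate neighbors
        for b in np.unique(bags):                 # re-impose bag-level proportions
            m = bags == b
            p[m] += bag_props[b] - p[m].mean()
        p = p.clip(0.0, 1.0)
    return p

# Two covariate clusters; each bag mixes instances from both clusters
X = np.vstack([rng.normal(0.0, 0.3, (10, 2)), rng.normal(2.0, 0.3, (10, 2))])
bags = np.tile([0, 1], 10)
p = pseudo_label(X, bags, bag_props={0: 0.3, 1: 0.7})
print(p.round(2))
```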

Understanding and Controlling a Maze-Solving Policy Network

  • paper_url: http://arxiv.org/abs/2310.08043
  • repo_url: None
  • paper_authors: Ulisse Mini, Peli Grietzer, Mrinank Sharma, Austin Meek, Monte MacDiarmid, Alexander Matt Turner
  • for: studying the goals and goal representations of AI systems by empirically examining a pretrained reinforcement learning policy that navigates mazes toward multiple context-dependent targets.
  • methods: experimental analysis of the maze-solving policy, modifying and testing different parts of the network to probe how its multiple goals are represented.
  • results: the policy contains redundant, distributed, and retargetable goal representations, and its behavior can be partially controlled by modifying the corresponding parts of the network; these findings deepen our understanding of goal-direction in trained policy networks.
    Abstract To understand the goals and goal representations of AI systems, we carefully study a pretrained reinforcement learning policy that solves mazes by navigating to a range of target squares. We find this network pursues multiple context-dependent goals, and we further identify circuits within the network that correspond to one of these goals. In particular, we identified eleven channels that track the location of the goal. By modifying these channels, either with hand-designed interventions or by combining forward passes, we can partially control the policy. We show that this network contains redundant, distributed, and retargetable goal representations, shedding light on the nature of goal-direction in trained policy networks.
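Interventions of this kind are mechanically simple in modern frameworks: register a forward hook on an intermediate layer and overwrite the channels believed to encode the goal location. The sketch below shows the pattern on a toy convolutional trunk; the network, the channel indices, and the retargeting rule are all hypothetical stand-ins for the pretrained maze policy studied in the paper.

```python
import torch
import torch.nn as nn

# Toy stand-in for a convolutional policy trunk; the paper studies a
# pretrained maze-solving agent, which we do not reproduce here.
net = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                    nn.Conv2d(16, 8, 3, padding=1))

GOAL_CHANNELS = [1, 4]        # hypothetical "goal location" channels
target_yx = (2, 5)            # where we want the policy to believe the goal is

def retarget(module, inputs, output):
    out = output.clone()
    for c in GOAL_CHANNELS:
        out[:, c] = 0.0                                          # erase old goal signal
        out[:, c, target_yx[0], target_yx[1]] = out.abs().max()  # write a new one
    return out                # returning a tensor replaces the layer's output

handle = net[0].register_forward_hook(retarget)   # intervene after the first conv
obs = torch.randn(1, 3, 8, 8)
acts = net(obs)               # forward pass runs with the edited activations
handle.remove()
print(acts.shape)
```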

QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Models

  • paper_url: http://arxiv.org/abs/2310.08041
  • repo_url: None
  • paper_authors: Jing Liu, Ruihao Gong, Xiuying Wei, Zhiwei Dong, Jianfei Cai, Bohan Zhuang
  • for: Large language models (LLMs) are hard to deploy widely because of their heavy resource demands; this work targets low-bitwidth deployment.
  • methods: Quantization-Aware Training (QAT) could address the problem but its training cost is prohibitive, so the paper pursues Post-Training Quantization (PTQ) and proposes QLLM, an accurate and efficient low-bitwidth PTQ method.
  • results: QLLM quantizes LLMs quickly; for example, it quantizes 4-bit LLaMA-2-70B within 10 hours on a single A100-80G GPU and outperforms the previous state-of-the-art method by 7.89% on average accuracy across five zero-shot tasks.
    Abstract Large Language Models (LLMs) excel in NLP, but their demands hinder their widespread deployment. While Quantization-Aware Training (QAT) offers a solution, its extensive training costs make Post-Training Quantization (PTQ) a more practical approach for LLMs. In existing studies, activation outliers in particular channels are identified as the bottleneck to PTQ accuracy. These studies propose to transform the magnitudes from activations to weights, which, however, offers limited alleviation or suffers from unstable gradients, resulting in a severe performance drop at low bitwidths. In this paper, we propose QLLM, an accurate and efficient low-bitwidth PTQ method designed for LLMs. QLLM introduces an adaptive channel reassembly technique that reallocates the magnitude of outliers to other channels, thereby mitigating their impact on the quantization range. This is achieved by channel disassembly and channel assembly, which first breaks down the outlier channels into several sub-channels to ensure a more balanced distribution of activation magnitudes. Then similar channels are merged to maintain the original channel number for efficiency. Additionally, an adaptive strategy is designed to autonomously determine the optimal number of sub-channels for channel disassembly. To further compensate for the performance loss caused by quantization, we propose an efficient tuning method that only learns a small number of low-rank weights while freezing the pre-trained quantized model. After training, these low-rank parameters can be fused into the frozen weights without affecting inference. Extensive experiments on LLaMA-1 and LLaMA-2 show that QLLM can obtain accurate quantized models efficiently. For example, QLLM quantizes the 4-bit LLaMA-2-70B within 10 hours on a single A100-80G GPU, outperforming the previous state-of-the-art method by 7.89% on the average accuracy across five zero-shot tasks.
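Channel disassembly has a clean linear-algebra reading: an input channel carrying outliers is split into k sub-channels that each carry 1/k of the activation, and the matching weight column is replicated k times, so the layer's output is mathematically unchanged while the activation range the quantizer must cover shrinks by a factor of k. The numpy sketch below demonstrates that equivalence on a single linear layer; here k is fixed by hand, whereas QLLM determines it adaptively, and the complementary channel assembly (merging similar channels to restore the channel count) is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 6))            # weight of a linear layer, y = W @ x
x = rng.normal(size=6)
x[2] = 40.0                            # channel 2 carries an activation outlier

def disassemble(W, x, ch, k):
    """Split channel `ch` into k sub-channels, each carrying x[ch] / k,
    and replicate the matching weight column so the output is unchanged."""
    x2 = np.concatenate([x, np.full(k - 1, x[ch] / k)])
    x2[ch] = x[ch] / k
    W2 = np.concatenate([W, np.repeat(W[:, ch:ch + 1], k - 1, axis=1)], axis=1)
    return W2, x2

W2, x2 = disassemble(W, x, ch=2, k=4)
print(np.allclose(W @ x, W2 @ x2))       # True: the layer output is identical
print(np.abs(x).max(), np.abs(x2).max())  # outlier magnitude 40.0 shrinks to 10.0
```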

Rethinking Large-scale Pre-ranking System: Entire-chain Cross-domain Models

  • paper_url: http://arxiv.org/abs/2310.08039
  • repo_url: https://github.com/songjinbo/ECMM
  • paper_authors: Jinbo Song, Ruoran Huang, Xinyang Wang, Wei Huang, Qian Yu, Mingming Chen, Yafei Yao, Chaosheng Fan, Changping Peng, Zhangang Lin, Jinghe Hu, Jingping Shao
  • for: improving the performance of multi-stage architectures in recommender systems and online advertising by mitigating the sample selection bias (SSB) problem in pre-ranking.
  • methods: Entire-chain Cross-domain Models (ECM), which leverage samples from the entire cascaded sample space, together with a fine-grained neural architecture, ECMM, to further improve pre-ranking accuracy.
  • results: On real-world large-scale traffic logs, the ECM models outperform state-of-the-art methods while keeping time consumption at an acceptable level, achieving a better trade-off between efficiency and effectiveness.
    Abstract Industrial systems such as recommender systems and online advertising have been widely equipped with multi-stage architectures, which are divided into several cascaded modules, including matching, pre-ranking, ranking and re-ranking. As a critical bridge between matching and ranking, existing pre-ranking approaches mainly suffer from the sample selection bias (SSB) problem owing to ignoring the entire-chain data dependence, resulting in sub-optimal performances. In this paper, we rethink the pre-ranking system from the perspective of the entire sample space, and propose Entire-chain Cross-domain Models (ECM), which leverage samples from the whole cascaded stages to effectively alleviate the SSB problem. Besides, we design a fine-grained neural structure named ECMM to further improve the pre-ranking accuracy. Specifically, we propose a cross-domain multi-tower neural network to comprehensively predict each stage result, and introduce a sub-networking routing strategy with $L0$ regularization to reduce computational costs. Evaluations on real-world large-scale traffic logs demonstrate that our pre-ranking models outperform SOTA methods while time consumption is maintained within an acceptable level, which achieves a better trade-off between efficiency and effectiveness.

Receive, Reason, and React: Drive as You Say with Large Language Models in Autonomous Vehicles

  • paper_url: http://arxiv.org/abs/2310.08034
  • repo_url: None
  • paper_authors: Can Cui, Yunsheng Ma, Xu Cao, Wenqian Ye, Ziran Wang
  • for: improving the safety and effectiveness of autonomous vehicles by enhancing their decision-making with large language models.
  • methods: leveraging LLMs' linguistic and contextual understanding, integrated with specialized tools, within autonomous vehicles.
  • results: experiments show that chain-of-thought prompting improves driving decisions and that LLMs can personalize driving in real time, influencing driving behaviors based on verbal commands.
    Abstract The fusion of human-centric design and artificial intelligence (AI) capabilities has opened up new possibilities for next-generation autonomous vehicles that go beyond transportation. These vehicles can dynamically interact with passengers and adapt to their preferences. This paper proposes a novel framework that leverages Large Language Models (LLMs) to enhance the decision-making process in autonomous vehicles. By utilizing LLMs' linguistic and contextual understanding abilities with specialized tools, we aim to integrate the language and reasoning capabilities of LLMs into autonomous vehicles. Our research includes experiments in HighwayEnv, a collection of environments for autonomous driving and tactical decision-making tasks, to explore LLMs' interpretation, interaction, and reasoning in various scenarios. We also examine real-time personalization, demonstrating how LLMs can influence driving behaviors based on verbal commands. Our empirical results highlight the substantial advantages of utilizing chain-of-thought prompting, leading to improved driving decisions, and showing the potential for LLMs to enhance personalized driving experiences through ongoing verbal feedback. The proposed framework aims to transform autonomous vehicle operations, offering personalized support, transparent decision-making, and continuous learning to enhance safety and effectiveness. We achieve user-centric, transparent, and adaptive autonomous driving ecosystems supported by the integration of LLMs into autonomous vehicles.

Incorporating Domain Knowledge Graph into Multimodal Movie Genre Classification with Self-Supervised Attention and Contrastive Learning

  • paper_url: http://arxiv.org/abs/2310.08032
  • repo_url: https://github.com/aoluming/IDKG
  • paper_authors: Jiaqi Li, Guilin Qi, Chuanyi Zhang, Yongrui Chen, Yiming Tan, Chenlong Xia, Ye Tian
  • for: improving multimodal movie genre classification by addressing three issues: unutilized group relations in metadata, unreliable attention allocation, and indiscriminative fused features.
  • methods: the approach exploits a knowledge graph from several perspectives: metadata is first processed into a domain knowledge graph, and a translate model for knowledge graph embedding captures the relations between entities; an Attention Teacher module then learns the knowledge graph's distribution via self-supervised learning to produce rational attention weights; finally, a Genre-Centroid Anchored Contrastive Learning module strengthens the discriminative ability of the fused features.
  • results: experiments show the method outperforms state-of-the-art approaches on the newly collected MM-IMDb 2.0 dataset as well as on MM-IMDb.
    Abstract Multimodal movie genre classification has always been regarded as a demanding multi-label classification task due to the diversity of multimodal data such as posters, plot summaries, trailers and metadata. Although existing works have made great progress in modeling and combining each modality, they still face three issues: 1) unutilized group relations in metadata, 2) unreliable attention allocation, and 3) indiscriminative fused features. Given that the knowledge graph has been proven to contain rich information, we present a novel framework that exploits the knowledge graph from various perspectives to address the above problems. As a preparation, the metadata is processed into a domain knowledge graph. A translate model for knowledge graph embedding is adopted to capture the relations between entities. Firstly we retrieve the relevant embedding from the knowledge graph by utilizing group relations in metadata and then integrate it with other modalities. Next, we introduce an Attention Teacher module for reliable attention allocation based on self-supervised learning. It learns the distribution of the knowledge graph and produces rational attention weights. Finally, a Genre-Centroid Anchored Contrastive Learning module is proposed to strengthen the discriminative ability of fused features. The embedding space of anchors is initialized from the genre entities in the knowledge graph. To verify the effectiveness of our framework, we collect a larger and more challenging dataset named MM-IMDb 2.0 compared with the MM-IMDb dataset. The experimental results on two datasets demonstrate that our model is superior to the state-of-the-art methods. We will release the code in the near future.

Beyond Sharing Weights in Decoupling Feature Learning Network for UAV RGB-Infrared Vehicle Re-Identification

  • paper_url: http://arxiv.org/abs/2310.08026
  • repo_url: None
  • paper_authors: Xingyue Liu, Jiahao Qi, Chen Chen, Kangcheng Bin, Ping Zhong
  • for: addressing cross-modality vehicle re-identification from UAVs, with applications in video surveillance and public security, where large amounts of real-world data are costly and difficult to acquire.
  • methods: a new benchmark, UAV Cross-Modality Vehicle Re-ID (UCM-VeID), containing 753 identities with 16015 RGB and 13913 infrared images, and a hybrid weights decoupling network (HWDNet) that addresses the cross-modality discrepancy and orientation discrepancy challenges.
  • results: experiments validate that UCM-VeID effectively supports cross-modality vehicle Re-ID and that HWDNet learns shared, orientation-invariant features.
    Abstract Owing to its capacity for full-time target search, cross-modality vehicle re-identification (Re-ID) based on unmanned aerial vehicles (UAVs) is gaining more attention in both video surveillance and public security. However, this promising and innovative research direction has not been studied sufficiently due to data inadequacy. Meanwhile, the cross-modality discrepancy and orientation discrepancy challenges further aggravate the difficulty of this task. To this end, we pioneer a cross-modality vehicle Re-ID benchmark named UAV Cross-Modality Vehicle Re-ID (UCM-VeID), containing 753 identities with 16015 RGB and 13913 infrared images. Moreover, to meet the cross-modality discrepancy and orientation discrepancy challenges, we present a hybrid weights decoupling network (HWDNet) to learn shared discriminative orientation-invariant features. For the first challenge, we propose a hybrid weights siamese network with a well-designed weight restrainer and its corresponding objective function to learn both modality-specific and modality-shared information. For the second challenge, three effective decoupling structures with two pretext tasks are investigated to learn orientation-invariant features. Comprehensive experiments are carried out to validate the effectiveness of the proposed method. The dataset and codes will be released at https://github.com/moonstarL/UAV-CM-VeID.

Effects of Human Adversarial and Affable Samples on BERT Generalizability

  • paper_url: http://arxiv.org/abs/2310.08008
  • repo_url: None
  • paper_authors: Aparna Elangovan, Jiayuan He, Yuan Li, Karin Verspoor
  • for: This paper examines how training data quality, rather than quantity, affects a model's generalizability.
  • methods: BERT-based models are evaluated on text classification and relation extraction tasks while varying the composition of the training data.
  • results: For a fixed number of training samples, having 10-30% human-adversarial (h-adversarial) samples improves precision and F1 by up to 20 points, but going beyond that range can plateau or degrade performance; human-affable (h-affable) samples may not contribute to generalizability and can even degrade it.
    Abstract BERT-based models have had strong performance on leaderboards, yet have been demonstrably worse in real-world settings requiring generalization. Limited quantities of training data is considered a key impediment to achieving generalizability in machine learning. In this paper, we examine the impact of training data quality, not quantity, on a model's generalizability. We consider two characteristics of training data: the portion of human-adversarial (h-adversarial), i.e., sample pairs with seemingly minor differences but different ground-truth labels, and human-affable (h-affable) training samples, i.e., sample pairs with minor differences but the same ground-truth label. We find that for a fixed size of training samples, as a rule of thumb, having 10-30% h-adversarial instances improves the precision, and therefore F1, by up to 20 points in the tasks of text classification and relation extraction. Increasing h-adversarials beyond this range can result in performance plateaus or even degradation. In contrast, h-affables may not contribute to a model's generalizability and may even degrade generalization performance.

A Novel Statistical Measure for Out-of-Distribution Detection in Data Quality Assurance

  • paper_url: http://arxiv.org/abs/2310.07998
  • repo_url: None
  • paper_authors: Tinghui Ouyang, Isao Echizen, Yoshiki Seo
  • for: This study investigates the data domain and out-of-distribution (OOD) data in AI quality management (AIQM).
  • methods: It uses deep learning techniques for feature representation and develops a novel statistical measure for detecting OOD data.
  • results: Experiments and evaluations on image benchmark datasets and an industrial dataset show the proposed method to be a feasible and effective OOD detection approach.
    Abstract Data outside the problem domain poses significant threats to the security of AI-based intelligent systems. Aiming to investigate the data domain and out-of-distribution (OOD) data in AI quality management (AIQM) study, this paper proposes to use deep learning techniques for feature representation and develop a novel statistical measure for OOD detection. First, to extract low-dimensional representative features distinguishing normal and OOD data, the proposed research combines the deep auto-encoder (AE) architecture and neuron activation status for feature engineering. Then, using local conditional probability (LCP) in data reconstruction, a novel and superior statistical measure is developed to calculate the score of OOD detection. Experiments and evaluations are conducted on image benchmark datasets and an industrial dataset. Through comparative analysis with other common statistical measures in OOD detection, the proposed research is validated as feasible and effective in OOD and AIQM studies.
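As a runnable stand-in for the pipeline, one can use a linear autoencoder (equivalently, PCA) in place of the deep AE, score samples by reconstruction error, and calibrate a threshold on in-distribution data. The sketch below does exactly that; it omits the neuron-activation-status features and the LCP-based statistic that constitute the paper's actual contribution, so treat it only as the skeleton the proposed measure refines.

```python
import numpy as np

rng = np.random.default_rng(0)
# In-distribution data lies near a 5-dim subspace of a 20-dim space
X_in = rng.normal(size=(500, 5)) @ rng.normal(size=(5, 20)) \
       + 0.1 * rng.normal(size=(500, 20))

mu = X_in.mean(0)
U, S, Vt = np.linalg.svd(X_in - mu, full_matrices=False)
P = Vt[:5]                                   # principal subspace = linear AE

def recon_error(X):
    Z = (X - mu) @ P.T                                  # encode
    return np.linalg.norm((X - mu) - Z @ P, axis=1)     # decode + residual norm

thresh = np.percentile(recon_error(X_in), 95)   # calibrate on in-distribution data
X_ood = rng.uniform(-5.0, 5.0, size=(100, 20))
print("fraction of OOD flagged:", float((recon_error(X_ood) > thresh).mean()))
```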

Point-NeuS: Point-Guided Neural Implicit Surface Reconstruction by Volume Rendering

  • paper_url: http://arxiv.org/abs/2310.07997
  • repo_url: None
  • paper_authors: Chen Zhang, Wanjuan Su, Wenbing Tao
  • for: improving the accuracy and efficiency of multi-view surface reconstruction with a new point-guided method, Point-NeuS.
  • methods: point modeling is embedded into volume rendering to regularize the implicit surface; the aleatoric uncertainty of the point cloud is modeled to capture noise and estimate point reliability, and a Neural Projection module connects points and images to add geometric constraints to the signed distance function; high-fidelity points are further filtered into an Implicit Displacement Network to compensate for geometric bias between volume rendering and point modeling.
  • results: effective point guidance allows lightweight networks, yielding an 11x speedup over NeuS; the method produces high-quality surfaces, especially for fine-grained details and smooth regions, and is robust to noisy and sparse data.
    Abstract Recently, learning neural implicit surface by volume rendering has been a promising way for multi-view reconstruction. However, limited accuracy and excessive time complexity remain bottlenecks that current methods urgently need to overcome. To address these challenges, we propose a new method called Point-NeuS, utilizing point-guided mechanisms to achieve accurate and efficient reconstruction. Point modeling is organically embedded into the volume rendering to enhance and regularize the representation of implicit surface. Specifically, to achieve precise point guidance and noise robustness, aleatoric uncertainty of the point cloud is modeled to capture the distribution of noise and estimate the reliability of points. Additionally, a Neural Projection module connecting points and images is introduced to add geometric constraints to the Signed Distance Function (SDF). To better compensate for geometric bias between volume rendering and point modeling, high-fidelity points are filtered into an Implicit Displacement Network to improve the representation of SDF. Benefiting from our effective point guidance, lightweight networks are employed to achieve an impressive 11x speedup compared to NeuS. Extensive experiments show that our method yields high-quality surfaces, especially for fine-grained details and smooth regions. Moreover, it exhibits strong robustness to both noisy and sparse data.

HeightFormer: A Multilevel Interaction and Image-adaptive Classification-regression Network for Monocular Height Estimation with Aerial Images

  • paper_url: http://arxiv.org/abs/2310.07995
  • repo_url: None
  • paper_authors: Zhan Chen, Yidan Zhang, Xiyu Qi, Yongqiang Mao, Xin Zhou, Lulu Niu, Hui Wu, Lei Wang, Yunping Ge
  • for: This paper offers a comprehensive solution to monocular height estimation from aerial images in remote sensing, improving on the accuracy and efficiency limits of existing methods.
  • methods: It proposes HeightFormer, which combines multilevel interactions with image-adaptive classification-regression to address common failure modes of single-image height estimation.
  • results: The approach improves instance-level height estimation and sharpens object-edge depth estimates compared with mainstream regression methods based on fixed height division.
    Abstract Height estimation has long been a pivotal topic within measurement and remote sensing disciplines, proving critical for endeavours such as 3D urban modelling, MR and autonomous driving. Traditional methods utilise stereo matching or multisensor fusion, both well-established techniques that typically necessitate multiple images from varying perspectives and adjunct sensors like SAR, leading to substantial deployment costs. Single image height estimation has emerged as an attractive alternative, boasting a larger data source variety and simpler deployment. However, current methods suffer from limitations such as fixed receptive fields and a lack of global information interaction, leading to noticeable instance-level height deviations. The inherent complexity of height prediction can result in a blurry estimation of object edge depth when using mainstream regression methods based on fixed height division. This paper presents a comprehensive solution for monocular height estimation in remote sensing, termed HeightFormer, combining multilevel interactions and image-adaptive classification-regression. It features the Multilevel Interaction Backbone (MIB) and Image-adaptive Classification-regression Height Generator (ICG). MIB supplements the fixed sample grid of the conventional CNN backbone with tokens of different interaction ranges. It is complemented by a pixel-, patch-, and feature map-level hierarchical interaction mechanism, designed to relay spatial geometry information across different scales and introduce a global receptive field to enhance the quality of instance-level height estimation. The ICG dynamically generates a height partition for each image and reframes the traditional regression task, using a refinement from coarse to fine classification-regression that significantly mitigates the innate ill-posedness issue and drastically improves edge sharpness.
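The classification-regression idea behind the ICG can be decoded per pixel: the head classifies a coarse height bin and regresses a fine offset inside it, and the two are combined into a continuous height. The sketch below shows that decoding with a uniform bin layout; the ICG instead generates an image-adaptive partition, so the uniform bins and the residual parameterization here are assumptions.

```python
import numpy as np

def decode_height(bin_logits, residual, h_min, h_max):
    """Decode a classification-regression height head for one pixel.
    bin_logits: (K,) scores over height bins; residual: offset in [-0.5, 0.5]
    within the chosen bin. Uniform bins are an assumption; the paper's ICG
    generates an image-adaptive partition instead."""
    K = bin_logits.shape[0]
    edges = np.linspace(h_min, h_max, K + 1)
    centers = (edges[:-1] + edges[1:]) / 2
    width = edges[1] - edges[0]
    k = int(bin_logits.argmax())                 # coarse step: pick a bin
    return centers[k] + residual * width         # fine step: offset inside it

logits = np.array([0.1, 0.2, 3.0, 0.4])          # pixel strongly prefers bin 2
print(decode_height(logits, residual=0.21, h_min=0.0, h_max=40.0))
# bin 2 spans [20, 30): center 25.0 + 0.21 * 10 = 27.1
```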

Large Language Models for Scientific Synthesis, Inference and Explanation

  • paper_url: http://arxiv.org/abs/2310.07984
  • repo_url: https://github.com/zyzisastudyreallyhardguy/llm4sd
  • paper_authors: Yizhen Zheng, Huan Yee Koh, Jiaxin Ju, Anh T. N. Nguyen, Lauren T. May, Geoffrey I. Webb, Shirui Pan
  • for: using large language models to perform scientific synthesis, inference, and explanation.
  • methods: a general-purpose large language model makes inferences from scientific datasets of the kind usually handled by special-purpose machine learning algorithms, and augments this knowledge by synthesizing from the scientific literature; the synthesized and inferred knowledge is then combined with a conventional machine learning system.
  • results: the combined system outperforms the current state of the art on a range of benchmark tasks for predicting molecular properties, and the large language model can additionally explain the machine learning system's predictions.
    Abstract Large language models are a form of artificial intelligence systems whose primary knowledge consists of the statistical patterns, semantic relationships, and syntactical structures of language1. Despite their limited forms of "knowledge", these systems are adept at numerous complex tasks including creative writing, storytelling, translation, question-answering, summarization, and computer code generation. However, they have yet to demonstrate advanced applications in natural science. Here we show how large language models can perform scientific synthesis, inference, and explanation. We present a method for using general-purpose large language models to make inferences from scientific datasets of the form usually associated with special-purpose machine learning algorithms. We show that the large language model can augment this "knowledge" by synthesizing from the scientific literature. When a conventional machine learning system is augmented with this synthesized and inferred knowledge it can outperform the current state of the art across a range of benchmark tasks for predicting molecular properties. This approach has the further advantage that the large language model can explain the machine learning system's predictions. We anticipate that our framework will open new avenues for AI to accelerate the pace of scientific discovery.
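One way to picture the synthesis-and-inference pipeline: the LLM distills literature knowledge into human-readable rules, each rule is compiled into a binary feature function, and a simple interpretable model is fit on those features, whose per-rule weights double as explanations. The sketch below follows that reading; the three rules, the molecule encoding, and the least-squares fit are hypothetical stand-ins, not the paper's actual rules or downstream learner.

```python
import numpy as np

# Hypothetical LLM-synthesized rules for a molecular property (illustrative only)
RULES = {
    "mol_weight_under_500": lambda m: m["weight"] < 500,
    "has_ring": lambda m: m["rings"] > 0,
    "logp_in_range": lambda m: -0.4 <= m["logp"] <= 5.6,
}

def featurize(mols):
    """Compile each rule into a 0/1 feature column."""
    return np.array([[float(rule(m)) for rule in RULES.values()] for m in mols])

mols = [{"weight": 320, "rings": 2, "logp": 2.1},
        {"weight": 710, "rings": 0, "logp": 6.3},
        {"weight": 450, "rings": 1, "logp": 4.0}]
y = np.array([1, 0, 1])

X = featurize(mols)
# Least-squares as a stand-in for the downstream interpretable classifier
w, *_ = np.linalg.lstsq(X, y, rcond=None)
print(dict(zip(RULES, w.round(2))))     # per-rule weights double as explanations
```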
    摘要 We present a method for using general-purpose large language models to make inferences from scientific datasets of the kind usually handled by special-purpose machine learning algorithms. We find that the large language model can augment its knowledge by synthesizing from the scientific literature. When a conventional machine learning system is augmented with this synthesized and inferred knowledge, it can outperform the current state of the art in predicting molecular properties. This approach has the further advantage that the large language model can explain the machine learning system's predictions. We believe that our framework will open up new opportunities for AI to accelerate scientific discovery.
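
To make the synthesize-then-infer workflow concrete, here is a hedged Python sketch under stated assumptions: the prompt, the returned rules, and the toy data are hypothetical, and only the general pattern (LLM-proposed interpretable features feeding a conventional classifier) reflects the approach described above.

```python
# Illustrative sketch of the synthesize-then-infer pipeline (not the LLM4SD code).
# An LLM proposes interpretable molecular features from the literature; a
# conventional ML model is then trained on those features.
from rdkit import Chem
from rdkit.Chem import Descriptors
from sklearn.ensemble import RandomForestClassifier

# Step 1 (synthesis): prompt an LLM, e.g. "List molecular descriptors from the
# literature that predict blood-brain barrier permeability." Suppose it returns
# these rules (hypothetical output):
llm_rules = {
    "mol_weight": Descriptors.MolWt,    # heavier molecules cross less readily
    "logp":       Descriptors.MolLogP,  # lipophilicity aids permeation
    "tpsa":       Descriptors.TPSA,     # high polar surface area hinders it
}

def featurize(smiles: str) -> list[float]:
    """Turn a SMILES string into the LLM-proposed feature vector."""
    mol = Chem.MolFromSmiles(smiles)
    return [rule(mol) for rule in llm_rules.values()]

# Step 2 (inference): train a conventional model on the synthesized features.
smiles_train = ["CCO", "c1ccccc1", "CC(=O)Oc1ccccc1C(=O)O"]  # toy data
labels_train = [1, 1, 0]
clf = RandomForestClassifier().fit([featurize(s) for s in smiles_train], labels_train)

# Step 3 (explanation): feature importances map back to the LLM's named rules,
# so predictions can be explained in the vocabulary of the literature.
print(dict(zip(llm_rules, clf.feature_importances_)))
```

Because every feature carries a name and a literature-derived rationale, the downstream model's predictions remain explainable in human terms, which is the key advantage the paper highlights.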

Self-supervised visual learning for analyzing firearms trafficking activities on the Web

  • paper_url: http://arxiv.org/abs/2310.07975
  • repo_url: None
  • paper_authors: Sotirios Konstantakos, Despina Ioanna Chalkiadaki, Ioannis Mademlis, Adamantia Anna Rebolledo Chrysochoou, Georgios Th. Papadopoulos
  • for: This paper studies automated firearms classification from RGB images, a real-world task with applications in public-space security, intelligence gathering, and law-enforcement investigations.
  • methods: The paper uses Deep Neural Networks (DNNs), in particular Convolutional Neural Networks (CNNs), combined with transfer learning and Self-Supervised Learning (SSL).
  • results: The results show that SSL combined with transfer learning achieves better firearms classification while requiring smaller annotated datasets.
    Abstract Automated visual firearms classification from RGB images is an important real-world task with applications in public space security, intelligence gathering and law enforcement investigations. When applied to images massively crawled from the World Wide Web (including social media and dark Web sites), it can serve as an important component of systems that attempt to identify criminal firearms trafficking networks, by analyzing Big Data from open-source intelligence. Deep Neural Networks (DNN) are the state-of-the-art methodology for achieving this, with Convolutional Neural Networks (CNN) being typically employed. The common transfer learning approach consists of pretraining on a large-scale, generic annotated dataset for whole-image classification, such as ImageNet-1k, and then finetuning the DNN on a smaller, annotated, task-specific, downstream dataset for visual firearms classification. Neither Visual Transformer (ViT) neural architectures nor Self-Supervised Learning (SSL) approaches have been so far evaluated on this critical task. SSL essentially consists of replacing the traditional supervised pretraining objective with an unsupervised pretext task that does not require ground-truth labels.
    摘要 Automated visual firearms classification from RGB images is an important real-world task with applications in public-space security, intelligence gathering, and law-enforcement investigations. When applied to images crawled at scale from the Web, it can serve as a component of systems that identify criminal firearms-trafficking networks by analyzing open-source-intelligence big data. Deep Neural Networks (DNNs) are the state-of-the-art methodology, with Convolutional Neural Networks (CNNs) typically employed. The common transfer-learning approach first pretrains the DNN on a large-scale, generic annotated dataset and then finetunes it on a smaller, task-specific annotated downstream dataset. Neither Visual Transformer (ViT) architectures nor Self-Supervised Learning (SSL) methods have yet been evaluated on this critical task. SSL essentially replaces the traditional supervised pretraining objective with an unsupervised pretext task that requires no ground-truth labels.
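
Since the paper's contribution is evaluating SSL pretraining for this task, a minimal sketch of the general recipe may help. This is a SimCLR-style contrastive pretext task followed by supervised finetuning, written as an assumption about the general setup rather than the paper's actual configuration.

```python
# Minimal sketch of SSL pretraining followed by supervised finetuning
# (SimCLR-style contrastive pretext task; illustrative, not the paper's setup).
import torch
import torch.nn.functional as F
from torchvision import models, transforms

augment = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(0.4, 0.4, 0.4),
])

encoder = models.resnet50(weights=None)
encoder.fc = torch.nn.Linear(2048, 128)   # projection head for pretraining

def nt_xent_loss(z1, z2, tau=0.5):
    """Contrastive loss: two augmented views of the same image attract."""
    z = F.normalize(torch.cat([z1, z2]), dim=1)
    sim = z @ z.t() / tau
    sim.fill_diagonal_(float("-inf"))     # exclude self-similarity
    n = z1.size(0)
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)

# Pretraining step on *unlabeled* crawled images (no firearm labels needed):
images = torch.rand(8, 3, 224, 224)       # stand-in for a crawled batch
loss = nt_xent_loss(encoder(augment(images)), encoder(augment(images)))
loss.backward()

# Finetuning: swap the projection head for a classifier over firearm classes
# and train on the small annotated downstream dataset as usual.
encoder.fc = torch.nn.Linear(2048, 10)    # e.g., 10 hypothetical firearm categories
```

The appeal for this domain is that the pretext stage consumes exactly the kind of abundant, unlabeled Web-crawled imagery the application already collects, reserving scarce annotation effort for the finetuning stage.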

Interpretable Diffusion via Information Decomposition

  • paper_url: http://arxiv.org/abs/2310.07972
  • repo_url: https://github.com/kxh001/info-decomp
  • paper_authors: Xianghao Kong, Ollie Liu, Han Li, Dani Yogatama, Greg Ver Steeg
  • for: This paper aims to understand the fine-grained relationships learned by diffusion models, and to develop methods for quantifying and manipulating those relationships.
  • methods: The paper uses denoising diffusion models and exact expressions for mutual information and conditional mutual information to illuminate the relationships between words and parts of an image.
  • results: The paper shows that a natural non-negative decomposition of mutual information emerges, allowing for the quantification of informative relationships between words and pixels in an image, and enabling unsupervised localization of objects in images and measurement of effects through selective editing.
    Abstract Denoising diffusion models enable conditional generation and density modeling of complex relationships like images and text. However, the nature of the learned relationships is opaque, making it difficult to understand precisely what relationships between words and parts of an image are captured, or to predict the effect of an intervention. We illuminate the fine-grained relationships learned by diffusion models by noticing a precise relationship between diffusion and information decomposition. Exact expressions for mutual information and conditional mutual information can be written in terms of the denoising model. Furthermore, pointwise estimates can easily be computed as well, allowing us to ask questions about the relationships between specific images and captions. Decomposing information even further to understand which variables in a high-dimensional space carry information is a long-standing problem. For diffusion models, we show that a natural non-negative decomposition of mutual information emerges, allowing us to quantify informative relationships between words and pixels in an image. We exploit these new relations to measure the compositional understanding of diffusion models, to do unsupervised localization of objects in images, and to measure effects when selectively editing images through prompt interventions.
    摘要 Denoising diffusion models enable conditional generation and density modeling of complex relationships, such as those between images and text. However, the learned relationships are opaque, making it hard to understand precisely which relations between words and parts of an image are captured, or to predict the effect of an intervention. We illuminate the fine-grained relationships learned by diffusion models by identifying a precise connection between diffusion and information decomposition. Exact expressions for mutual information and conditional mutual information can be written in terms of the denoising model, and pointwise estimates are easy to compute, allowing us to ask questions about the relationship between a specific image and its caption. Decomposing information further, to understand which variables in a high-dimensional space carry information, is a long-standing problem; for diffusion models, a natural non-negative decomposition of mutual information emerges, allowing us to quantify informative relationships between words and pixels in an image. We exploit these new relations to measure the compositional understanding of diffusion models, to perform unsupervised localization of objects in images, and to measure the effects of selectively editing images through prompt interventions.
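
The pointwise estimator can be sketched as follows, assuming access to a trained conditional denoiser. The estimator form, the gap between unconditional and conditional denoising errors averaged over noise levels, follows the identity described above, but the noise schedule, the omitted weighting constants, and the `denoiser` interface are illustrative assumptions.

```python
# Sketch of a pointwise mutual-information estimate from a trained denoiser.
# `denoiser(x_t, t, y) -> predicted noise` is a stand-in interface, with
# y=None giving the unconditional prediction.
import torch

@torch.no_grad()
def pointwise_mi(denoiser, x, y, n_samples: int = 64):
    """Monte-Carlo sketch of i(x; y): average gap between unconditional and
    conditional denoising errors over random noise levels. Weighting constants
    from the exact identity are omitted for brevity."""
    total = 0.0
    for _ in range(n_samples):
        t = torch.rand(())                                  # noise level ~ U(0, 1)
        alpha = torch.cos(t * torch.pi / 2)                 # toy VP-style schedule
        sigma = torch.sin(t * torch.pi / 2)
        eps = torch.randn_like(x)
        x_t = alpha * x + sigma * eps                       # noised image
        err_uncond = (denoiser(x_t, t, None) - eps).pow(2)
        err_cond = (denoiser(x_t, t, y) - eps).pow(2)
        total += (err_uncond - err_cond).sum()              # per-pixel gaps, summed
    return total / n_samples
```

Keeping the per-pixel gaps unsummed yields a heat map over the image: pixels where conditioning on a caption word most reduces denoising error, which is what enables the unsupervised object localization described above.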

A New Approach Towards Autoformalization

  • paper_url: http://arxiv.org/abs/2310.07957
  • repo_url: None
  • paper_authors: Nilay Patel, Rahul Saha, Jeffrey Flanigan
  • for: This paper proposes an approach to autoformalization, i.e., automatically translating natural-language mathematics into a formal language that can be verified by a program.
  • methods: The paper decomposes the task into three more approachable subtasks: unlinked formalization (formalization with unlinked definitions and theorems), entity linking (linking proofs and definitions to the proper library entries), and type adjustment (so that the result passes the type checker).
  • results: The paper presents arXiv2Formal, a benchmark dataset for unlinked formalization consisting of 50 theorems sampled from papers on arXiv.org and formalized for the Lean theorem prover.
    Abstract Verifying mathematical proofs is difficult, but can be automated with the assistance of a computer. Autoformalization is the task of automatically translating natural language mathematics into a formal language that can be verified by a program. This is a challenging task, especially for the higher-level mathematics found in research papers. Research paper mathematics requires large amounts of background and context. In this paper, we propose an avenue towards tackling autoformalization for research-level mathematics, by breaking the task into easier and more approachable subtasks: unlinked formalization (formalization with unlinked definitions and theorems), entity linking (linking to the proper theorems and definitions), and finally adjusting types so it passes the type checker. In addition, we present arXiv2Formal, a benchmark dataset for unlinked formalization consisting of 50 theorems formalized for the Lean theorem prover sampled from papers on arXiv.org. We welcome any contributions from the community to future versions of this dataset.
    摘要 Verifying mathematical proofs is challenging, but it can be automated with the assistance of a computer. Autoformalization is the task of automatically translating natural-language mathematics into a formal language that a program can verify. This is difficult, particularly for the higher-level mathematics found in research papers. In this paper, we propose an avenue toward autoformalization for research-level mathematics by breaking the task into more approachable subtasks: unlinked formalization (with definitions and theorems left unlinked), entity linking (linking proofs and definitions to the proper library entries), and finally adjusting types so that the result passes the type checker. We also present arXiv2Formal, a benchmark dataset of 50 theorems sampled from papers on arXiv.org and formalized for the Lean theorem prover. We welcome contributions from the community to future versions of this dataset.
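
To make the three subtasks concrete, here is a small hypothetical Lean example (not drawn from arXiv2Formal) showing what an unlinked formalization might look like before entity linking and type adjustment.

```lean
import Mathlib

-- Hypothetical illustration of the pipeline (not an arXiv2Formal entry).
-- Stage 1, unlinked formalization: the statement is formalized with a local
-- stand-in for a concept that has not yet been resolved to a library name.
axiom Continuous' : (ℝ → ℝ) → Prop  -- unlinked stand-in for `Continuous`

theorem sum_continuous' (f g : ℝ → ℝ)
    (hf : Continuous' f) (hg : Continuous' g) :
    Continuous' (fun x => f x + g x) := sorry

-- Stage 2, entity linking: replace the stand-in with Mathlib's `Continuous`.
-- Stage 3, type adjustment: tweak the statement until it elaborates, after
-- which the real lemma `Continuous.add` closes the goal:
theorem sum_continuous (f g : ℝ → ℝ)
    (hf : Continuous f) (hg : Continuous g) :
    Continuous (fun x => f x + g x) := hf.add hg
```

Separating the stages in this way means the model can produce a well-formed statement without first knowing the library's exact names, which is precisely the difficulty that research-level mathematics with its heavy background and context poses.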